Cell lines and cell culture
HEK293T (CRL-3216), HepG2 (HB-8065) and HeLa (CCL-2) cell lines were obtained from the American Type Culture Collection. Cells were maintained in Dulbecco’s Modified Eagle Medium (DMEM; Gibco, cat. no. 11995073) supplemented with 10% heat-inactivated fetal bovine serum (FBS; Gibco, cat. no. A56708-01) and 1% penicillin-streptomycin (Gibco, cat. no. 15070063) at 37 °C in a humidified atmosphere containing 5% CO2.
The KOLF2.1J SNCA E46K−/− induced pluripotent stem (iPS) cell line was purchased from the Jackson Laboratory and cultured in StemFlex medium (Life Technologies, cat. no. A3349401) supplemented with 10% FBS (Life Technologies). iPS cells were plated on dishes coated with Synthemax II-SC (1 mg ml−1 stock solution; Corning, cat. no. 3535) according to the manufacturer’s instructions and treated with 10 μM Y-27632 for 24 h following each passage.
All cultures were tested routinely for Mycoplasma contamination every 3 months using the American Type Culture Collection Universal Mycoplasma Detection Kit, and all tests were negative. For passaging, adherent cell lines were dissociated using TrypLE Express (Gibco, cat. no. 12604013) for 3–5 min, whereas iPS cells were detached gently with ReLeSR (Stem Cell Technologies, cat. no. 100-0483). For experiments requiring single-cell suspensions, iPS cells were detached using Accutase (Stem Cell Technologies, cat. no. 07922) and counted using trypan blue exclusion on a Countess II automated cell counter (Thermo Fisher Scientific).
Bacterial media, reagents and plasmids
Luria broth (LB) and 2× YT medium were prepared using MP Biomedicals media capsules according to the manufacturer’s protocol. For LB and 2× YT agar, 16 g l−1 agar was added for standard plates and 7 g l−1 agar was added for soft agar. All media were sterilized by autoclaving. Oligonucleotides, primers and plasmids used in the study are listed in Supplementary Table 4. All gRNAs were cloned using the KLD Enzyme Mix (NEB, cat. no. M0554S) according to the manufacturer’s protocol; spacer sequences are listed in Supplementary Table 9.
agRNA library generation
The agRNA library consists of an upstream binding sequence that is the reverse complement of the downstream sequence of the target, a counter-loop and a downstream binding sequence that binds the upstream sequence of the target. The upstream and downstream binding sequences vary in length and bind different regions within the 1–11 bp upstream and downstream of the target. The counter-loop library includes 33 different DNA sequences, of which the longer sequences form GC-rich hairpins. The length of the counter-loop ranges from 1 to 14 nucleotides. The final library contains every combination of the possible upstream binding sequences, counter-loops and downstream binding sequences. To facilitate agRNA library design for new targets, we developed a Python script (Supplementary Code 1) that automatically generates library sequences tailored to any custom base editing site.
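The combinatorial structure of the library can be illustrated with a short Python sketch. This is not Supplementary Code 1: the function names, the toy flanks and the example counter-loops are hypothetical, and the sketch assumes that every contiguous window of a flank is a valid binding arm, which is not expected to reproduce the published library size.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def binding_windows(flank: str) -> list[str]:
    """All contiguous windows of a flank (assumption: any start, any length)."""
    n = len(flank)
    return [flank[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def build_library(upstream_flank: str, downstream_flank: str,
                  counter_loops: list[str]) -> list[str]:
    """Combine every upstream arm, counter-loop and downstream arm.

    The upstream arm is the reverse complement of the target's downstream
    flank; the downstream arm pairs with the target's upstream flank.
    """
    up_arms = [reverse_complement(w) for w in binding_windows(downstream_flank)]
    down_arms = [reverse_complement(w) for w in binding_windows(upstream_flank)]
    return [u + loop + d for u, loop, d in product(up_arms, counter_loops, down_arms)]


# Toy example with 3-nt flanks and two hypothetical counter-loops.
library = build_library("ACG", "TTA", ["A", "GCGC"])
print(len(library))
```

With real inputs, the flanks would be the 11 nt on either side of the target and the counter-loop list would hold the 33 designed sequences.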
The agRNA library for DNMT1 was ordered as an Agilent DNA Oligo Pool (64,610 oligonucleotides; Supplementary Tables 5 and 6). The oligonucleotides for the DNMT1 library contained a Gibson overhang, gRNA, gRNA scaffold, agRNA library and a terminator, followed by a short DNA sequence used as a primer binding site. The target for the DNMT1 library was already cloned on the plasmid used as the backbone. The DNA Oligo Pool library was amplified by PCR with the oligonucleotides Lib_F and Lib_R using Q5 Hot Start High-Fidelity DNA Polymerase (NEB, cat. no. M0493S). The backbone pU6-tevopreq1-GG-acceptor (Addgene, cat. no. 174038) was PCR amplified using the oligonucleotides SplitF and SplitR. The PCR product of the backbone was digested overnight with DpnI (NEB, cat. no. R0176S) at 37 °C and both PCR products were purified using the Monarch PCR and DNA Cleanup Kit (NEB, cat. no. T1130S) according to the manufacturer’s protocol. The fragments were assembled using the Gibson Assembly Master Mix (NEB, cat. no. E2611S) at a 10:1 library:backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol. A 2-µl aliquot of the Gibson assembly mix was used to directly transform Endura DUOs electrocompetent cells (Lucigen, cat. no. 60242-1), which, after recovery in 1 ml of SOC medium, were plated on carbenicillin agar plates poured in Nunc Square BioAssay Dishes (Cole-Parmer, cat. no. EW-01929-00). For cloning into the backbone LentiGuide-Puro (Addgene, cat. no. 52963), the library was amplified from the pU6-tevopreq1-GG-acceptor. The backbone was digested using PspXI (NEB, cat. no. R0656S) and Esp3I (NEB, cat. no. R0734S) and the library was cloned by Gibson assembly following the protocol described above.
For each library, colonies corresponding to at least tenfold library coverage were washed off the plates using LB medium and then spun down. The plasmid DNA was extracted from the resulting pellet using a Plasmid Plus Midi Kit (Qiagen, cat. no. 12943) according to the manufacturer’s protocol.
Lentiviral particles generation
HEK293T cells were seeded at a density of 5 × 10⁶ cells per 10-cm plate in DMEM supplemented with 10% FBS and antibiotics (Thermo Fisher, cat. no. 15240062). The following day, cells were transfected using Transporter 5 Transfection Reagent (Polysciences, cat. no. 26008) with a plasmid mix containing pVSV-G (3.86 µg), pPax2 (8.57 µg) and the lentiviral transfer vector (9.23 µg) in Opti-MEM (Thermo Fisher). The DNA-transporter complexes were incubated at room temperature for 20 min before being added to the culture medium. After 24 h, the medium was replaced with fresh DMEM, and viral supernatants were collected at 48 and 72 h post-transfection. The harvested medium was filtered through a 0.45-µm vacuum filter system and concentrated using Lenti-X Concentrator (Takara Bio, cat. no. 631231) at a 1:3 ratio (media to concentrator) by incubation at 4 °C for at least 30 min, followed by centrifugation at 1,500g for 45 min at 4 °C. The viral pellet was resuspended in phosphate-buffered saline, aliquoted and stored at −80 °C until further use.
DNMT1 library testing in HEK cells
HEK293T cells were transduced with the lentiviral library at a multiplicity of infection of 0.2. After 24–48 h, the medium was replaced with fresh medium containing 2 μg ml−1 puromycin, and selection continued for 2 weeks. To test the editing pattern across our library, 20 million cells (300× coverage) were seeded in a 225-cm2 flask and transfected the following day using Lipofectamine 3000 (Invitrogen, cat. no. L3000015) with 20 μg of pCMV-T7-ABE8e-nSpCas9-P2A-EGFP (KAC978) (Addgene, cat. no. 185910). Genomic DNA was collected from cells 5 days after transfection.
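The coverage figure quoted above follows from dividing the number of seeded cells by the number of library members. A minimal Python sketch of that arithmetic is shown here; the function names are hypothetical, and it assumes that each selected cell carries exactly one library member and that the library size equals the 64,610 oligonucleotides of the DNMT1 pool.

```python
import math


def library_coverage(n_cells: int, library_size: int) -> float:
    """Average number of cells per library member."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return n_cells / library_size


def cells_needed(library_size: int, fold_coverage: float) -> int:
    """Smallest cell number that reaches the requested fold coverage."""
    return math.ceil(library_size * fold_coverage)


DNMT1_LIBRARY_SIZE = 64_610  # oligonucleotides in the DNMT1 pool

coverage = library_coverage(20_000_000, DNMT1_LIBRARY_SIZE)
print(f"{coverage:.0f}x coverage")            # about 310x, reported as 300x
print(cells_needed(DNMT1_LIBRARY_SIZE, 300))  # cells required for 300x
```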
Evaluation of the anchor library
The efficiency and precision of the BE in combination with the agRNA were evaluated using a custom R script. The quality of the reads from NGS samples was assessed before further processing. Variant calling techniques were then applied to distinguish the perfect edit (conversion of adenine at position 8 to guanine) from bystander edits, which encompassed any conversion of the other adenines or combinations involving A8. Samples with anchor sequences yielding fewer than 20 reads were excluded to ensure robustness in the data analysis. Furthermore, a quantitative score was devised and calculated using the following formula:
$${\rm{Score}}=({\rm{ \% }}{\rm{Perfect}}{\rm{Edit}})/({({\rm{ \% }}{\rm{Perfect}}{\rm{Edit}}+{\rm{ \% }}{\rm{Bystander}})}^{2})$$
Anchors achieving the highest scores and demonstrating at least 20% overall editing efficiency were further characterized experimentally (Supplementary Tables 7 and 8).
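The scoring and filtering steps can be sketched in Python as follows. The original analysis used a custom R script that is not reproduced here; the record layout and function names are hypothetical, percentages are assumed to be on a 0–100 scale, and overall editing efficiency is assumed to be the sum of perfect and bystander editing.

```python
def anchor_score(pct_perfect: float, pct_bystander: float) -> float:
    """Score = %PerfectEdit / (%PerfectEdit + %Bystander)^2."""
    total = pct_perfect + pct_bystander
    if total == 0:
        return 0.0  # no editing at all: define the score as zero
    return pct_perfect / total ** 2


def rank_anchors(records, min_reads=20, min_total_editing=20.0):
    """Drop low-coverage or low-activity anchors, then sort by score."""
    kept = [
        r for r in records
        if r["reads"] >= min_reads
        and r["pct_perfect"] + r["pct_bystander"] >= min_total_editing
    ]
    return sorted(
        kept,
        key=lambda r: anchor_score(r["pct_perfect"], r["pct_bystander"]),
        reverse=True,
    )


# Hypothetical example records, for illustration only.
anchors = [
    {"name": "anchor_a", "reads": 150, "pct_perfect": 30.0, "pct_bystander": 10.0},
    {"name": "anchor_b", "reads": 12, "pct_perfect": 40.0, "pct_bystander": 2.0},
    {"name": "anchor_c", "reads": 300, "pct_perfect": 20.0, "pct_bystander": 20.0},
    {"name": "anchor_d", "reads": 90, "pct_perfect": 8.0, "pct_bystander": 1.0},
]
ranked = rank_anchors(anchors)
print([r["name"] for r in ranked])
```

Because the denominator is squared, an anchor with little bystander editing but also little overall editing can still score highly, which is why the separate 20% overall-editing threshold is applied.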
Context library generation
For the A-to-G BE, ~12,000 different gRNAs targeting pathogenic relevant mutations14 were cloned as a library to test the performance of different hairpins and BEs.
The generation of these context libraries differed from the generation of the agRNA library, as extensive recombination events occurred when the gRNA, gRNA scaffold, agRNA and target were introduced as one oligonucleotide. To avoid recombination issues, the gRNA and the target with 11 bp upstream and 25 bp downstream of the native genomic context were cloned as an oligonucleotide lacking the gRNA scaffold and hairpin. In their place, the oligonucleotide had two outward-facing BsaI-HF (NEB, cat. no. R3733S) cutting sites with ten randomized base pairs at that position. The DNA Oligo Pool libraries were amplified using PCR with the oligonucleotides Lib_F and Lib_R. The backbone sgBbsI (p2Tol-U6-2×BbsI-sgRNA-HygR) (Addgene, cat. no. 71485) was PCR amplified using the oligonucleotides BB_R and BB_F. The PCR product of the backbone was digested with DpnI overnight at 37 °C and both PCR products were purified using the NEB Monarch PCR and DNA Cleanup Kit according to the manufacturer’s protocol. The fragments were assembled using the NEB Gibson Assembly Master Mix at a 10:1 library:backbone ratio with 150 ng of backbone DNA according to the manufacturer’s protocol. A 2-µl aliquot of the Gibson assembly mix was used directly to transform Lucigen’s Endura Competent Cells, which, after recovery, were plated on agar plates poured in Nunc Square BioAssay Dishes. For each library, colonies corresponding to at least tenfold library coverage were washed off the plates using LB medium and then spun down. The plasmid DNA was extracted from the resulting pellet using the QIAGEN Plasmid Plus Midi Kit according to the manufacturer’s protocol.
The library was then digested using BsaI according to the manufacturer’s protocol and gel purified using the NEB Monarch Gel Extraction Kit according to the manufacturer’s protocol. The gRNA scaffold, hairpin and terminator, flanked upstream and downstream by inward-facing BsaI cutting sites, were ordered as a cloned gene synthesis product from IDT. This plasmid was also digested with BsaI, and the fragment was purified using the NEB Monarch Gel Extraction Kit according to the manufacturer’s protocol. The insert and library were ligated using NEB T4 DNA Ligase according to the manufacturer’s protocol with 150 ng of backbone and a 10:1 ratio of insert to backbone. A 2-µl aliquot of the ligation mix was used directly to transform Lucigen’s Endura Competent Cells, which, after recovery, were plated on agar plates poured in Nunc Square BioAssay Dishes. For each library, colonies corresponding to at least tenfold library coverage were washed off the plates using LB medium and then spun down. The plasmid DNA was extracted from the resulting pellet using the QIAGEN Plasmid Plus Midi Kit according to the manufacturer’s protocol.
Path_Var library testing in HEK cells
For stable Tol2 transposon-mediated library integration, 5 million HEK293T cells (~400× coverage) were seeded in 175-cm2 flasks. The following day, cells were cotransfected using Lipofectamine 3000 with 10 μg of Tol2 transposase plasmid (pCMV-Tol2; Addgene, cat. no. 31823) and 10 μg of Path_Var library. To generate stable library cell lines, cells were selected with hygromycin (25 μg ml−1) for 2 weeks, starting the day after transfection. Thereafter, 2.5 million cells (~200× coverage) were seeded in a 100-mm dish and transfected the following day with 10 μg of BE using Lipofectamine 3000. Genomic DNA was collected from cells 5 days after transfection.
Illumina sequencing and bioinformatic analysis of the libraries
To sequence the libraries before (as quality control) and after testing in the HEK293T cells, 1 ng of the isolated DNA (QuickExtract DNA Extraction Solution, Biosearch Technologies) was amplified using the oligonucleotide mix IllSeq_DNMT1_i5_F1-4 and IllSeq_DNMT1_i7_R1-4 and Q5 High-Fidelity 2× Master Mix according to the manufacturer’s protocol. The forward and reverse oligonucleotides each contained an i5/i7 overhang for indexing as well as four to seven Ns to shift the sequence register so that sequences with high identity could be distinguished. The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the NEBNext Multiplex Oligos for Illumina (NEB, cat. no. E6440S). The PCR products were purified using the Monarch Gel Extraction Kit (NEB, cat. no. T1120S) and quantified using Qubit-HS (Invitrogen, cat. no. Q33231). A 4-nM pool of the different libraries was generated and 10% 4 nM PhiX was added. The pool was sequenced using the Illumina MiSeq Reagent Kit v.2 (300 cycles) according to the manufacturer’s protocol.
Evaluation of the context library
Evaluation of the context library involved analyzing gRNA libraries, which comprised approximately 12,000 spacer sequences and their respective contextual sequences. The efficiency and editing profiles for each gRNA were established using custom scripts developed in R. First, the target sites, where each spacer binds within the context, were extracted from the NGS reads. Subsequently, for each spacer in the library, all combinations of adenine to guanine conversions were aligned against these extracted sequences. Spacers with fewer than 25 total reads were excluded from the analysis. To quantify overall editing efficiency for the different BEs, the mean A to G conversion rate was calculated by averaging the editing frequencies at each targeted position.
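The enumeration of A-to-G combinations and the per-spacer averaging can be sketched in Python as follows. The original analysis used custom R scripts; this sketch uses hypothetical function names and assumes that reads have already been reduced to extracted target-site sequences with counts, that every adenine in the target counts as a targeted position, and that the read total includes unedited and unmatched reads.

```python
from itertools import combinations


def a_to_g_variants(target: str) -> dict[str, tuple[int, ...]]:
    """Map each A-to-G edited sequence to its edited positions (1-based)."""
    a_positions = [i for i, base in enumerate(target) if base == "A"]
    variants = {}
    for k in range(1, len(a_positions) + 1):
        for combo in combinations(a_positions, k):
            seq = list(target)
            for i in combo:
                seq[i] = "G"
            variants["".join(seq)] = tuple(i + 1 for i in combo)
    return variants


def mean_a_to_g_rate(target: str, read_counts: dict[str, int], min_reads: int = 25):
    """Mean per-position A-to-G frequency, or None if the spacer is excluded."""
    total = sum(read_counts.values())
    positions = [i + 1 for i, base in enumerate(target) if base == "A"]
    if total < min_reads or not positions:
        return None
    variants = a_to_g_variants(target)
    edited = dict.fromkeys(positions, 0)
    for seq, count in read_counts.items():
        for pos in variants.get(seq, ()):
            edited[pos] += count
    return sum(edited[p] / total for p in positions) / len(positions)


# Toy target with adenines at positions 2 and 4.
reads = {"CAGA": 50, "CGGA": 30, "CGGG": 20}
print(mean_a_to_g_rate("CAGA", reads))
```

The number of variants grows as 2^n − 1 for n adenines, which stays manageable for a protospacer-sized window.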
Genome editing of endogenous human genomic loci
A total of 30,000 cells (HEK293T, HeLa and HepG2) were seeded in 96-well plates (Corning) and transfected the next day using jetOPTIMUS (Sartorius, cat. no. 101000006) or TransIT-X2 (MirusBio, cat. no. 6003) following the manufacturer’s instructions. Then, 50 ng of sgRNA or agRNA (both cloned in BPK1520; Addgene, cat. no. 65777) with 150 ng of BE variant were cotransfected, and cells were harvested after 3 days for Sanger sequencing (Azenta Life Sciences) or NGS (Quintara Biosciences or in-house Illumina MiSeq). Cas9-dependent genomic off-target sites, predicted using Cas-OFFinder32, were analyzed by targeted PCR followed by NGS. Spacer sequences of tested loci are found in Supplementary Table 9. NGS data were analyzed using CRISPRESSO33 and BE-analyzer (CRISPR RGEN tools)33 and representative substitution tables are found in Supplementary Table 3. gRNAs used in Fig. 4d and Supplementary Figs. 5g, 6b,e,f and 8b were described previously in ref. 34. gRNAs used in Supplementary Fig. 6g were described previously in ref. 16. gRNAs used in Supplementary Fig. 6b were described previously in ref. 35.
DNA Cas9-independent off-target editing analysis
Cas9-independent off-target editing was analyzed by R-loop assay following previously detailed protocols and gRNAs22. Briefly, 150 ng of BE (nSpCas9-ABE8e-SpRY or evolved variant), 100 ng of SpCas9 gRNA plasmid, 150 ng of a catalytically dead SaCas9 and 100 ng of SaCas9 gRNA plasmid (targeting a genomic locus unrelated to the on-target site) were cotransfected into HEK293T cells. DNA was extracted after 72 h and analyzed using high-throughput sequencing.
RNA sequencing
HEK293T cells were seeded in 48-well plates and transfected with 300 ng of BE plasmid and 100 ng of gRNA plasmid per well. After 48 h, cells were lysed and total RNA was extracted using TRIzol (Invitrogen, cat. no. 15596026) reagent according to the manufacturer’s instructions. RNA sequencing was performed by Azenta Life Sciences according to the following protocol.
For RNA sequencing with poly(A) selection, total RNA was quantified using a Qubit 2.0 Fluorometer (Life Technologies), and RNA integrity was assessed using the Agilent TapeStation 4200 (Agilent Technologies). ERCC RNA Spike-In Mix (Thermo Fisher Scientific) was added to normalized RNA samples following the manufacturer’s protocol.
RNA sequencing libraries were prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB, cat. no. E7770S), following the manufacturer’s instructions. Briefly, mRNA was enriched using oligo(dT) beads, then fragmented at 94 °C for 15 min. First- and second-strand cDNA synthesis was performed, followed by end repair, 3′ adenylation, adapter ligation, index addition and PCR enrichment. Libraries were validated using the Agilent TapeStation and quantified using the Qubit 2.0 Fluorometer and quantitative PCR (KAPA Biosystems).
Prepared libraries were multiplexed and sequenced on an Illumina NovaSeq X Plus system (Illumina) using 2 × 150 bp paired-end reads. Image analysis and base calling were performed using NovaSeq Control Software (NCS), and raw BCL files were converted to FASTQ format and demultiplexed using bcl2fastq v.2.20, allowing one mismatch in index sequences.
RNA Cas9-independent off-target editing analysis
To assess A-to-I RNA editing, SNP and indel calling was performed using Samtools mpileup (v.1.3.1) followed by VarScan (v.2.3.9) with the following parameters: --min-coverage 1 --min-reads2 1 --min-var-freq 0.01 --p-value 0.05. Variant call format files from BE-treated samples were analyzed for A-to-I RNA editing events using a custom pipeline. Control variants were collected from three independent control samples and merged to generate a comprehensive background set (40,625 variants), which was subtracted from treatment samples to exclude germline or systematic variants.
A-to-G substitutions were considered putative A-to-I editing sites. Stringent quality filters were applied: minimum read depth ≥5×, minimum allele frequency ≥1% and maximum allele frequency ≤99% (to remove homozygous SNPs). After background subtraction, high-confidence A-to-I editing sites were retained. Editing percentages were calculated per site as the ratio of edited reads to total reads (AD/DP in the variant call format) and per sample as the average editing frequency across all sites passing filters.
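The filtering logic described above can be sketched in Python as follows. This is not the custom pipeline used in the study: the `Variant` record and function names are hypothetical, variants are assumed to be already parsed from the variant call format files, and only A-to-G calls on the reference strand are considered, as stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # DP: total reads covering the site
    alt_reads: int  # AD: reads supporting the alternate allele

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def freq(self) -> float:
        return self.alt_reads / self.depth


def filter_a_to_i(variants, background, min_depth=5, min_af=0.01, max_af=0.99):
    """Keep A-to-G sites that pass depth and frequency filters and are not in the background set."""
    kept = []
    for v in variants:
        if (v.ref, v.alt) != ("A", "G"):
            continue  # not a putative A-to-I site
        if v.depth < min_depth:
            continue  # insufficient read depth
        if not (min_af <= v.freq <= max_af):
            continue  # too rare, or likely a homozygous SNP
        if v.key in background:
            continue  # present in merged control samples
        kept.append(v)
    return kept


def mean_editing_percent(sites) -> float:
    """Per-sample editing: average AD/DP across retained sites, in percent."""
    if not sites:
        return 0.0
    return 100 * sum(v.freq for v in sites) / len(sites)


# Hypothetical calls, for illustration only.
calls = [
    Variant("chr1", 100, "A", "G", 50, 5),
    Variant("chr1", 200, "A", "G", 4, 2),
    Variant("chr1", 300, "A", "G", 40, 40),
    Variant("chr2", 10, "C", "T", 100, 10),
    Variant("chr2", 20, "A", "G", 100, 30),
]
background = {("chr2", 20, "A", "G")}
sites = filter_a_to_i(calls, background)
print(len(sites), mean_editing_percent(sites))
```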
Using this approach, we identified 207,541 high-confidence A-to-I editing sites across 12 samples (ABE8e-SpRY, V28C-SpRY and L34W-SpRY BEs with sgRNACtrl or agRNA56114, each in duplicate).
Generation of the selection phages for PANCE
The PANCE selection phages carry the CDS for the ABE8e adenine deaminase in place of the CDS of pIII. So that the phage encodes a relatively small protein rather than the whole BE, the ABE8e adenine deaminase CDS is followed by part of the peptide linker sequence and a C-terminally fused intein CDS. The phages were generated by PCR amplifying the ABE8e adenine deaminase including the partial sequence of the peptide linker using the oligonucleotides ABE_M13_F and ABE_M13_R. The N-terminal Npu DnaE intein was ordered as a gBlock and amplified using the oligonucleotides Npu_ABE_F and Npu_M13_R. The phage backbone was amplified using the oligonucleotides GOI_M13_F and GOI_M13_R with M13KO7 helper phage (NEB, cat. no. N0315S) genomic DNA as a template. All PCRs were performed using Q5 High-Fidelity 2× Master Mix according to the manufacturer’s protocol using 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI (NEB) overnight at 37 °C in the PCR buffer and PCR purified using the NEB Monarch PCR and DNA Cleanup Kit the next day. The fragments were assembled using the NEB Gibson Assembly Master Mix according to the manufacturer’s protocol and 3 µl of the reaction was transformed directly into electrocompetent S2060 pJC175e cells. The cells were recovered in 500 µl SOC medium for 45 min, then 450 µl and 50 µl were each mixed with 500 µl freshly grown S2060 pJC175e cells. The cells were mixed immediately with 3 ml soft LB agar (0.7%) and plated on LB bottom agar plates containing 100 µg ml−1 carbenicillin. The plates were incubated at 37 °C overnight. Plaques were picked into 50 µl 2× YT medium and 1 µl was used as a template for colony PCR using the oligonucleotides ABE_M13_F and Npu_M13_R. Positive phages were amplified by adding the remaining 2× YT medium to a freshly grown S2060 pJC175e culture at an OD600 of 0.4 and cultivating for 16 h at 37 °C.
The cultures were spun down to remove the Escherichia coli cells and the phages were precipitated by adding a solution of 20% polyethylene glycol (8000) and 2.5 M sodium chloride to the culture supernatant in a 1:4 ratio. The mixture was incubated for at least 3 h at 4 °C and the phage pellet was resuspended in phosphate-buffered saline. The phage titer was quantified using the Phage Titration ELISA kit (Progen, PRPHAGE) and the phages were stored at 4 °C until use. Furthermore, 3 ml of the culture supernatant was used for phage DNA isolation using the E.Z.N.A. M13 DNA Mini kit (Omega Bio-tek, cat. no. D6900-01). The isolated DNA was sent to Plasmidsaurus for whole-phage DNA sequencing.
Generation of the selection cells for PANCE
The selection plasmids were based on pJC175e (Addgene, cat. no. 79219)36, into which mutations were introduced such that only perfect edits by the ABE restore pIII activity, whereas bystander edits lower pIII activity. The pJC175e backbone was amplified using the oligonucleotides pIII_gBlock_R and pJC175e_Cas_F. The part of the pIII CDS containing the mutation, followed downstream by the corresponding guide correcting the introduced mutation and by the C-terminal DnaE intein necessary to fuse the ABE8e encoded by the phage to the nSpCas9-SpRY encoded by the selection plasmid, was ordered as a gBlock. Each selection plasmid also encodes the agRNA to fix the mutation on pIII (Supplementary Table 8). The three different gBlocks for the three different selection plasmids, each encoding a different pIII mutation, were amplified by PCR using the oligonucleotides gBlock_R and gBlock_pIII_F. The base nSpCas9 CDS was amplified from the ABE8e plasmid (Addgene, cat. no. 138489) using the oligonucleotides BE_Npu_F and BE_pJC175e_R. All PCRs were performed using Q5 High-Fidelity 2× Master Mix according to the manufacturer’s protocol using 1 ng of template DNA and an annealing temperature of 60 °C. All fragments were digested with DpnI overnight at 37 °C in the PCR buffer and PCR purified using the NEB Monarch PCR and DNA Cleanup kit the next day. The fragments were assembled using the NEB Gibson Assembly Master Mix according to the manufacturer’s protocol and transformed into electrocompetent S2060 cells. The cells were recovered in 500 µl SOC medium for 1 h, then plated on LB agar plates with 100 µg ml−1 carbenicillin and incubated overnight at 37 °C. Colonies were screened using colony PCR and positive clones were sent for whole-plasmid sequencing. Clones with verified sequence were used to generate electrocompetent cells that were then transformed with the mutagenesis plasmid MP4 (Addgene, cat. no. 69652)37.
The cells were recovered in 1 ml SOC medium and plated on 2× YT agar plates containing 1% glucose, 100 µg ml−1 carbenicillin and 25 µg ml−1 chloramphenicol. Five colonies were used to start 50-ml shake-flask cultivations in 2× YT with 1% glucose, 100 µg ml−1 carbenicillin and 25 µg ml−1 chloramphenicol. After 16 h, the cultivations were used to prepare 20% glycerol stocks in 1-ml aliquots. Each culture was also used to isolate plasmid DNA for whole-plasmid sequencing by Plasmidsaurus to select the glycerol stocks with no mutation in MP4 and the selection plasmid.
PANCE
The evolution was performed as ten consecutive batch cultivations in triplicate using a mix of three different selection plasmids in each evolution. For the PANCE experiment, the day before the cultivation, three 3-ml overnight cultures were prepared using 2× YT medium with 1% glucose, 100 µg ml−1 carbenicillin and 25 µg ml−1 chloramphenicol. Each culture was inoculated with the glycerol stock of one of the selection plasmids. The following day, 3–4 h before phage infection, 50-ml shake flasks were inoculated to a combined OD600 of 0.1 with the pooled overnight cultures carrying the different selection plasmids. The cells were cultivated in 2× YT with 100 µg ml−1 carbenicillin and 25 µg ml−1 chloramphenicol. At 30 min before reaching an OD600 of 0.4, the cells were induced with 0.5% arabinose and, on reaching an OD600 of 0.4, they were infected with the selection phages at a multiplicity of infection of 1 for the first selection round. The evolution was performed for 12 h at 37 °C; thereafter, the entire cultivation was spun down and the supernatant was filtered with 0.2-µm filters. To infect the following evolution, 500 µl of the supernatant was used for selection rounds 2–4, 100 µl for rounds 5–6 and 5 µl for the remaining rounds. The phage titer after each selection round was determined using the Progen Phage Titration ELISA kit. A 3-ml aliquot of each culture supernatant was used for phage DNA isolation using the Omega Bio-tek E.Z.N.A. M13 DNA Mini Kit. Sequences of V28C, L34W, M151E and V28C-M151E are detailed in Supplementary Table 8.
Illumina sequencing of the PANCE experiment
To sequence the PANCE variants, 1 ng of the isolated phage DNA of each selection round was amplified using the oligonucleotide mix IllSeq_ABE_i5_F1-4 and IllSeq_ABE_i5_R1-4 and Q5 High-Fidelity 2× Master Mix according to the manufacturer’s protocol. The forward and reverse oligonucleotides each contained an i5/i7 overhang for indexing as well as four to seven Ns to shift the sequence register so that sequences with high identity could be distinguished. The resulting PCR products were amplified in a second PCR reaction using a compatible combination of the NEBNext Multiplex Oligos for Illumina. The PCR products were purified using the Monarch Gel Extraction Kit and quantified using Qubit-HS. A 4-nM pool of the different libraries was generated and 10% 4 nM PhiX was added. The pool was sequenced using the MiSeq Reagent Kit v.3 (600 cycles; Illumina, MS-102-3003) according to the manufacturer’s protocol.
PreS calculation
To quantify the specificity and efficiency of each variant in editing the target nucleotide with minimal bystander activity, we defined a PreS using the following formula:
$${\rm{PreS}}=\left({\rm{ \% }}\,\frac{{\rm{On}}\,{\rm{target}}\,{\rm{editing}}}{{\rm{Average}}\,{\rm{bystander}}\,{\rm{editing}}}\right)\times \left({\rm{ \% }}\,\frac{{\rm{On}}\,{\rm{target}}\,{\rm{editing}}\,({\rm{variant}})}{{\rm{On}}\,{\rm{target}}\,{\rm{editing}}\,({\rm{ABE}}8{\rm{eCtrl}})}\right)$$
This score integrates two key components:
Editing precision as the ratio of on-target editing to average bystander editing. This reflects the precision of editing at the intended nucleotide relative to unintended nearby edits.
Relative efficiency as the comparison of a variant’s on-target editing efficiency to that of ABE8e—our reference editor. This normalization controls for differences in baseline editing efficiency and allows comparison across variants.
A higher PreS indicates a variant that edits the intended base with high efficiency and minimal bystander activity relative to ABE8e. For example, ABE8e-SpRYCtrl has a PreS of 0.8 at the DNMT1 site. This value was calculated by dividing the editing efficiency at position A8 (15.3%) by the total bystander editing at positions A4, A5 and A6 (19%), and multiplying by a normalization factor derived from the ratio of editing at A8 to itself (15.3%/15.3%), which simplifies to 1. All PreS values above 0.8 indicate higher precision and/or activity.
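The worked example for ABE8e-SpRYCtrl can be reproduced with a few lines of Python. This is a minimal sketch with a hypothetical function name; the bystander term is passed in as a single percentage, here the 19% value quoted for positions A4, A5 and A6.

```python
def pres(on_target_pct: float, bystander_pct: float, control_on_target_pct: float) -> float:
    """PreS = (on-target / bystander) x (on-target variant / on-target control)."""
    if bystander_pct <= 0 or control_on_target_pct <= 0:
        raise ValueError("bystander and control editing must be positive")
    precision = on_target_pct / bystander_pct
    relative_efficiency = on_target_pct / control_on_target_pct
    return precision * relative_efficiency


# ABE8e-SpRYCtrl at the DNMT1 site: the control is compared with itself,
# so the relative-efficiency term equals 1.
control_pres = pres(15.3, 19.0, 15.3)
print(round(control_pres, 1))  # 0.8
```

For a variant, the same call would take the variant's on-target and bystander percentages together with the 15.3% on-target editing of the control.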
Evolutionary plausible mutations prediction with ML
We followed the approach used by Hie et al.23, which uses an ensemble of six PLMs: ESM-1b23 and ESM-1v24, the latter itself composed of five models (accessible at https://github.com/facebookresearch/esm). Together, the six models are used to predict which amino acids, if any, would be more likely than the current wild-type amino acid at each position of the input protein sequence. The number of models in the ensemble agreeing on a given prediction (that is, a specific amino acid substitution) is used to score that substitution; a substitution with a higher score is more likely to yield a positive result.
We applied the ensemble of PLMs to the TadA-8e sequence, which yielded the following predictions (number of agreeing models in parentheses): R26G (6), F84L (6), N108D (6), Y149F (6), F156K (6), V106A (5), P152R (5), H8D (5), N157K (5), R111T (4), C146S (3), R111A (2), C146Q (2), M151E (2), C146K (1), Y123H (1), M151Q (1), A48P (1), V155E (1), S109P (1), P152Q (1).
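The voting step of this ensemble approach can be sketched in Python as follows. The sketch does not call the ESM models themselves: it assumes that each model's per-position amino acid likelihoods have already been computed and are supplied as plain dictionaries, and the function name, toy sequence and toy likelihoods are hypothetical. Positions are numbered from 1 along the input sequence, which may differ from the residue numbering used for TadA-8e.

```python
from collections import Counter


def ensemble_votes(wild_type: str, model_scores) -> Counter:
    """Count, per substitution, how many models rate it above the wild type.

    model_scores holds one entry per model; each entry is a list with one
    dict per sequence position mapping amino acid -> likelihood.
    """
    votes = Counter()
    for per_position in model_scores:
        if len(per_position) != len(wild_type):
            raise ValueError("score matrix length must match the sequence")
        for idx, (wt, scores) in enumerate(zip(wild_type, per_position)):
            wt_score = scores[wt]
            for aa, score in scores.items():
                if aa != wt and score > wt_score:
                    votes[f"{wt}{idx + 1}{aa}"] += 1
    return votes


# Toy two-residue sequence scored by three hypothetical models.
toy_scores = [
    [{"M": 0.9, "L": 0.1}, {"V": 0.2, "C": 0.7, "A": 0.1}],
    [{"M": 0.8, "L": 0.2}, {"V": 0.6, "C": 0.3, "A": 0.1}],
    [{"M": 0.5, "L": 0.4}, {"V": 0.1, "C": 0.5, "A": 0.4}],
]
votes = ensemble_votes("MV", toy_scores)
print(votes.most_common())
```

With six models, the vote count ranges from 1 to 6, matching the scale of the values listed above.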
Computational analysis of structural impact on ABE8e
We used the crystal structure of ABE8e (Protein Data Bank ID 6VPC) and the software ChimeraX (v.1.6.1; 9 May 2023) for visualization and analysis of the structure.
We first attempted to predict the structures of relevant mutants in this study using AlphaFold2 (available in the colab notebook from sokrypton). However, no structural change was predicted for these single or double mutants with AlphaFold2, which agrees with previous observations on the limitations of AlphaFold2. In addition, a region relevant to this study involving residues from 151 to 160 of chain F was folded incorrectly by AlphaFold2. Hence, we decided to directly mutate our target residues on the crystal structure of ABE8e, although this assumes there is no conformational or folding change for our mutants.
Using the command line in ChimeraX, we visualized, mutated and analyzed interactions of target residues. We also analyzed interactions of nucleotides 5 to 8 in the gRNA with residues of ABE8e. We searched for hydrogen bonds, nonpolar (van der Waals) interactions between carbon atoms in the gRNA and protein at a maximum distance of 3.8 Å, and cationic interactions between nitrogen atoms within 5 Å of an aromatic carbon, involving our target residues and nucleotides.
Editing of iPS cells
Plasmid nucleofection was performed using the Neon Electroporation System (Thermo Fisher) 10-μl kit. A total of 200,000 cells were resuspended in ~10 μl of buffer R and mixed with 200 ng of gRNA vector and 200 μg of BE. Nucleofection was performed using the following parameters: voltage, 1,400 V; width, 20 ms; pulses, 2. After nucleofection, cells were plated in 12-well plates with 400 μl of StemFlex medium (Gibco, cat. no. A3349401) without antibiotic and with a 1:100 dilution of RevitaCell Supplement (Gibco, cat. no. A26445-01). After 24 h, the medium was replaced, and editing was evaluated after 48 h by NGS (Quintara Biosciences). gRNAs were generated for this study.
Statistics and reproducibility
No statistical methods were used to predetermine sample size. Sample sizes were selected based on standards from previous genome editing studies and were sufficient to ensure reproducibility. For cell-based and sequencing experiments, three to six independent replicates were performed, which reliably captured biological variability and produced consistent effect sizes across experiments.
All experiments were performed independently at least three times, unless otherwise stated. Statistical analyses and plots were generated using GraphPad Prism v.10 (GraphPad Software LLC). Two-tailed, unpaired Student’s t-tests, one-way and two-way analysis of variance (ANOVA) were used to evaluate differences between groups, with P < 0.05 considered statistically significant. In cases where adding P values directly to the figures would compromise visual clarity, the corresponding statistical comparisons are provided in Supplementary Table 10.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.






