Design, synthesis and functional characterization of diversified U6 promoters
To date, only a handful of Pol III promoters have been characterized for genome engineering in mammalian cells11,15,32. To identify sequence-diversified and activity-diversified promoters, we used two complementary approaches to design ~200 diversified Pol III U6 promoters (~100 through evolutionary diversification and ~100 through synthetic diversification; Fig. 1a). To quantify and ensure the sequence diversity, we developed an algorithm that calculates the length and identity of the longest shared repeat between every possible pair of sequences in either orientation, termed Lmax (Supplementary Fig. 1a)16. For compatibility with contemporary protocols for large-scale assembly of synthetic DNA in yeast, our goal was to identify a set of sequences that satisfied the constraint of Lmax < 40 (refs. 27,29).
a, Synthetically and evolutionarily diversified U6 promoters were tested in three human cellular contexts with a multiplex prime editing functional assay. Edit scores were defined as the frequency of an iBC at the genomic target site divided by the frequency of the same barcode in the pBC. b, Lmax distributions quantifying the maximum shared repeat length between all possible pairs of sequences for the evolutionarily diversified U6 promoter library (n = 97; 4,656 pairs), the synthetically diversified hRNU6-1p library (n = 112; 6,216 pairs) and the combined set (n = 209; 21,736 pairs) in the same orientation. See Supplementary Fig. 1b for Lmax distributions for reverse complement comparisons. c, Pairwise comparison of log-transformed edit scores between cellular contexts. Pearson correlations, calculated on barcode-normalized edit scores before log transformation, are shown. d, Sequence identity with hRNU6-1p (x axis) is not predictive of functional activity of synthetically or evolutionarily diversified U6 promoters. Spearman correlation is shown. e, Edit scores of 146 functional diversified U6 promoters ordered left to right by ascending median edit score across three human cellular contexts.
For evolutionary diversification, we selected 89 diverse orthologs of human U6 promoters with putative transcriptional activity33 from various vertebrate species, the canonical human RNU6-1 promoter that is widely used in mammalian RNAi and gRNA delivery vectors11,34,35,36,37, four mammalian promoters designed for a 3-gRNA array lentiviral Perturb-seq vector11,34,35,36,37 and finally three additional human U6 promoters that were sufficiently divergent from the human RNU6-1 promoter33 that, as a set, satisfied Lmax < 40 (n = 97; promoter length range = 249–600 bp, mean length = 475 bp). For synthetic diversification, we used the human RNU6-1 promoter as a starting template, shuffled nucleotides located in between known core transcription factor binding sites (TFBSs) and, in a subset of cases, introduced putatively tolerated single-nucleotide variants (SNVs) into core TFBSs as well as random 3-bp spacers between core TFBSs, again ensuring that, as a set, these satisfied Lmax < 40 (n = 112; promoter length range = 249–252 bp, mean length = 250 bp). Applying the Lmax algorithm to the combined set of 209 diversified Pol III U6 promoters by checking all 21,736 possible pairwise combinations, we found that they continue to satisfy Lmax < 40 (Fig. 1b, Supplementary Fig. 1b and Supplementary Table 1; Methods).
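The pairwise Lmax check described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that the "longest shared repeat" is the longest exact substring common to two sequences, with the second sequence compared both as given and as its reverse complement. The function names (`longest_shared_repeat`, `lmax`, `library_lmax`, `n_pairs`) are hypothetical.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_shared_repeat(a, b):
    """Longest exact substring shared by a and b: (length, substring).

    Standard dynamic programming over suffix match lengths, keeping
    only the previous row to limit memory.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, a[best_end - best_len:best_end]


def lmax(a, b):
    """Longest shared repeat between a and b in either orientation."""
    forward = longest_shared_repeat(a, b)
    reverse = longest_shared_repeat(a, reverse_complement(b))
    return max(forward, reverse, key=lambda hit: hit[0])


def library_lmax(seqs):
    """Largest pairwise Lmax over all unordered pairs in a library."""
    return max(lmax(a, b)[0] for a, b in combinations(seqs, 2))


def n_pairs(n):
    """Number of unordered pairs among n sequences."""
    return n * (n - 1) // 2
```

Under these assumptions, a library would satisfy the design constraint when `library_lmax(seqs) < 40`. The pair counts quoted in the text follow from `n_pairs`: 97, 112 and 209 sequences give 4,656, 6,216 and 21,736 pairs, respectively.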
We then sought to perform a multiplex experiment that quantified the relative activity of these Pol III promoters. For this, we cloned the promoters upstream of a prime editing gRNA (pegRNA) designed to install a 5-bp insertional barcode (iBC) at the HEK3 locus in the human genome, with a strategy that linked each Pol III promoter to a specific barcode (Fig. 1a)38,39. In the experiments described below, we quantify the functional activity of a given promoter as the frequency of its iBC at the genomic target site (iBC) normalized by the frequency of the same barcode in the plasmid library (pBC) encoding the promoter–pegRNA combinations. We refer to this ratio as the edit score, analogous to regulatory element activity scores of massively parallel reporter assays (Fig. 1a)39,40.
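As a minimal sketch of the edit score defined above, the ratio can be computed from two tables of barcode read counts. The count values in the usage note are made up for illustration, and the choice to compute each frequency as a fraction of all barcode-containing reads in its own sample is an assumption rather than a detail stated in the text.

```python
def frequencies(counts):
    """Convert raw barcode read counts into within-sample frequencies."""
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


def edit_scores(genomic_counts, plasmid_counts):
    """Edit score per barcode: genomic iBC frequency / plasmid pBC frequency.

    Barcodes absent from the genomic sample receive a score of 0.
    Every barcode in plasmid_counts is assumed to have a nonzero count.
    """
    genomic = frequencies(genomic_counts)
    plasmid = frequencies(plasmid_counts)
    return {bc: genomic.get(bc, 0.0) / plasmid[bc] for bc in plasmid}
```

For example, with hypothetical counts `{"AAAAA": 30, "CCCCC": 10}` at the genomic target and `{"AAAAA": 50, "CCCCC": 50}` in the plasmid library, the promoter linked to `AAAAA` scores 1.5 and the one linked to `CCCCC` scores 0.5.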
To account for the possibility that the barcodes themselves influence pegRNA abundance and/or prime editing efficiency31, we also measured the RNA abundance and insertion efficiency of every possible 5N insertion (n = 1,024 barcodes) when driven by the standard human RNU6-1 promoter (Supplementary Figs. 2 and 3 and Supplementary Table 2). The biases associated with transcription and editing are overwhelmingly uncorrelated, with the exception of seven 5N variants that contain a ‘TTTT’ polyT Pol III termination sequence, and thus exhibit consistently severe depletion in both transcription and editing data (Supplementary Fig. 3c–e). To correct for barcode bias, we use the relative editing rates of the 5N barcodes from this experiment, which reflect the combined consequences of transcription bias and editing bias, to further normalize the edit scores calculated for individual promoters or scaffolds.
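The barcode counts in this paragraph can be checked by enumeration, and the correction step can be sketched under one loudly stated assumption: that normalization means dividing a raw edit score by the relative editing rate measured for the same barcode under the standard promoter. The text says only that these rates are used to "further normalize" edit scores, so the division below is illustrative, and the function names are hypothetical.

```python
from itertools import product


def all_barcodes(k):
    """Every possible DNA barcode of length k (4**k sequences)."""
    return ["".join(bases) for bases in product("ACGT", repeat=k)]


def has_pol3_terminator(barcode, motif="TTTT"):
    """True if the barcode contains a polyT Pol III termination motif."""
    return motif in barcode


def correct_edit_score(raw_score, relative_barcode_rate):
    """Barcode-corrected edit score (ASSUMPTION: simple division).

    relative_barcode_rate is the editing rate of this barcode relative
    to other barcodes when driven by the standard promoter.
    """
    return raw_score / relative_barcode_rate


barcodes_5n = all_barcodes(5)
terminator_barcodes = [bc for bc in barcodes_5n if has_pol3_terminator(bc)]
```

Enumerating the 5N space gives 1,024 barcodes, of which seven contain ‘TTTT’ (four of the form TTTTN plus four of the form NTTTT, with TTTTT counted once), matching the numbers reported above.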
We introduced this library of Pol III promoter-driven pegRNAs to human K562 cells, HEK293T cells or induced pluripotent stem cells (iPSCs) that had been engineered to stably express a prime editor38,41. Both synthetically and evolutionarily diversified U6 promoters drove genome editing at the HEK3 locus at a broad range of levels (Fig. 1c–e, Supplementary Fig. 4 and Supplementary Table 1). Edit scores were reasonably well-correlated between technical replicates (r = 0.47–0.96) and cellular contexts (r = 0.85–0.96; Fig. 1c and Supplementary Fig. 4). Of note, evolutionarily diversified U6 promoters displayed greater variance in activity levels than synthetically diversified alternatives, consistent with their greater sequence divergence from the human RNU6-1 promoter (Fig. 1d and Supplementary Fig. 5). The canonical human RNU6-1 promoter was consistently among the most active promoters, modestly outperformed by only a U6 promoter of Ornithorhynchus anatinus, the duck-billed platypus (1.2–1.8-fold; Fig. 1e and Supplementary Table 1).
Altogether, we identified 146 of 209 (70%) promoters that drove editing in all three cellular contexts (Fig. 1e). There were 70 promoters displaying edit scores of >1 across all contexts, which correspond to activity within about 50-fold of the standard human RNU6-1 promoter (Supplementary Table 1). Among these, there were 28 promoters whose activity fell within fivefold of the standard human RNU6-1p in all three contexts, including all three other human U6 promoters tested33, 2 of 4 promoters previously tested in ref. 11 and 23 newly characterized promoters (21 evolutionarily diversified, 2 synthetically diversified; Fig. 1e and Supplementary Table 1). A total of 4 of these 23 highly functional, newly characterized U6 promoters ranked higher than previously characterized nonhuman RNU6-1p orthologs, specifically those of the common snapping turtle (Chelydra serpentina), the one-humped camel (Camelus dromedarius), the domestic muscovy duck (Cairina moschata domestica) and, finally, the aforementioned platypus (Fig. 1e and Supplementary Table 1).
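The "within N-fold in all contexts" criterion used above can be expressed as a simple filter. The sketch below assumes that "within fivefold" means an edit score of at least one-fifth of the reference promoter's score in the same context, so promoters that outperform the reference also qualify. The scores shown are hypothetical and are not taken from Supplementary Table 1.

```python
def within_fold(scores, reference, fold):
    """True if the score in every context is >= reference / fold.

    scores and reference map context name -> edit score.
    ASSUMPTION: promoters exceeding the reference also count as
    'within fold' of it.
    """
    return all(scores[ctx] >= reference[ctx] / fold for ctx in reference)


# Hypothetical edit scores, for illustration only.
reference = {"K562": 50.0, "HEK293T": 40.0, "iPSC": 60.0}
promoter_a = {"K562": 12.0, "HEK293T": 9.0, "iPSC": 15.0}
promoter_b = {"K562": 12.0, "HEK293T": 7.0, "iPSC": 15.0}
```

Here `promoter_a` passes the fivefold filter in all three contexts, whereas `promoter_b` fails because its HEK293T score falls below 40 / 5 = 8.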
We sought to validate these results using two strategies. First, we identified a subset of the diversified U6 promoters representing a broad range of activity levels in the primary screen and then recloned and independently tested them in a monoclonal PEmax-iPSC line (n = 50 diversified U6 promoters together with the standard human RNU6-1p). Results from this validation set correlated strongly with results from the primary screen (r = 0.93; Supplementary Fig. 6a). Second, we simultaneously measured transcription scores and edit scores for all 209 diversified promoters using targeted RNA-seq of pegRNA transcripts and our multiplex prime editing functional assay, respectively (Supplementary Fig. 6b). The resulting data were reproducible across transfection replicates (all r > 0.97 for edit scores; all r > 0.82 for transcription scores). Furthermore, edit scores correlated well with both edit scores from the primary screen (r = 0.87) and transcription scores from the validation screen (r = 0.83; Supplementary Fig. 6c–h and Supplementary Table 3).
Together with the primary screen, these validation experiments confirm that synthetically and evolutionarily diversified U6 promoters from across species are functional in human cells and reproducibly exhibit a broad range of activities in driving genome editing. Although both strategies yielded functional promoters with activities within fivefold of that of human RNU6-1p, the vast majority of this highly active subset were evolutionarily diversified. While human RNU6-1p was consistently among the top performers in human cells, there were a few U6 promoters from extant species that exhibited comparable activity in human cells, despite extensive sequence divergence.
Design, synthesis and functional characterization of diversified pegRNA scaffolds
Diversifying gRNA scaffolds is considerably more challenging than diversifying Pol III promoters due to extensive constraints on gRNA secondary structure12,16,42,43,44. We designed libraries of diversified pegRNA scaffolds to satisfy Lmax < 40 using two approaches. First, we introduced putatively secondary structure-retaining 5N and 4N replacements to repeat:antirepeat (R:AR) regions (‘replacement designs’). Second, we introduced 5N insertions to regions predicted to tolerate insertions based on pegRNA secondary structure, along with R:AR 5N replacements (‘extension designs’). Altogether, we designed 174 replacement scaffolds and 138 extension pegRNA scaffolds, and then specific versions of these to install a 5-bp iBC at the human HEK3 locus (Fig. 2a and Supplementary Fig. 7).
a, Diversified pegRNA scaffold designs. Complementary R:AR sequences were introduced at specific locations, producing either replacement (top) or extension (bottom) variants of the conventional pegRNA scaffold. b, Pairwise comparison of log-transformed edit scores between cellular contexts. Pearson correlation coefficients, calculated on barcode-normalized edit scores before log transformation, are listed. c, Replacement scaffolds tended to have higher edit scores than extension scaffolds. d, Diversified pegRNA scaffolds that eliminated a Pol III termination sequence consistently exhibited higher edit scores than the standard scaffold. Boxes represent the 25th and 75th percentiles, box centerline represents the median. Whiskers extend from hinge to 1.5× the interquartile range (n = 3 transfection replicates for each of two separate libraries, each with a different iBC per scaffold; these six edit scores for each scaffold are shown). Term., termination.
We synthesized and cloned these 312 pegRNA scaffold variants downstream of human RNU6-1p, each driving a specific iBC, and introduced them to human K562 cells, HEK293T cells or iPSCs that stably expressed a prime editor38,41. Because the impact of the iBC sequence on pegRNA secondary structure and insertion efficiency can be difficult to predict39,45, we also synthesized and cloned each pegRNA scaffold with an alternate iBC in a second library, which was tested independently. After sequencing 5-bp iBCs at the HEK3 locus, we quantified the edit score for each scaffold–iBC pair and normalized these for differential iBC efficiencies as above (Supplementary Table 4). Results correlated reasonably well across cellular contexts (r = 0.82–0.96; Fig. 2b and Supplementary Fig. 8) and across independent iBC sets (r = 0.58–0.75; Supplementary Fig. 9). Overall, replacement designs markedly outperformed extension designs (13-fold to 37-fold higher median edit score across cellular contexts; Fig. 2c).
Altogether, we identified 272 of 312 (87%) pegRNA scaffolds that drove editing with both iBCs across all cellular contexts (Supplementary Table 4). Among these, 58 functioned within fivefold of the standard pegRNA scaffold with both iBCs across all cellular contexts, including 7 that outperformed the standard pegRNA scaffold (Fig. 2d and Supplementary Table 4). These seven included a scaffold with a previously described A-U flip design that swaps nucleotides in the first R:AR region to remove a polythymidine Pol III termination sequence (‘TTTTA:TAAAA’ > ‘TTTAA:TTAAA’), previously reported to improve function by reducing premature termination of Pol III transcription13,46. The remaining six scaffolds that outperformed the standard pegRNA each maintain the first two ‘TT’ nucleotides in the first R:AR sequence while introducing variants that disrupt the Pol III termination sequence through means other than the A-U flip (Fig. 2d). Taken together, these results identify dozens of sequence-diversified pegRNA scaffolds that are similarly active to the conventional scaffold in human cells, and confirm two strategies to diversify (pe)gRNA scaffolds while maintaining or improving their function, namely, introducing complementary R:AR variants and/or removing Pol III termination sequences.
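The logic of the A-U flip can be illustrated with a short check on the two R:AR sequence pairs quoted above, written in DNA notation as in the text. This sketch only tests strict Watson-Crick complementarity and the presence of a ‘TTTT’ run. It does not model RNA secondary structure or wobble pairing, and the helper names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_paired(repeat, antirepeat):
    """True if the antirepeat is the exact reverse complement of the repeat."""
    return reverse_complement(repeat) == antirepeat


def has_terminator(seq, motif="TTTT"):
    """True if the sequence contains a polyT Pol III termination motif."""
    return motif in seq


# R:AR pairs quoted in the text (standard versus A-U flip).
standard = ("TTTTA", "TAAAA")
au_flip = ("TTTAA", "TTAAA")
```

Both pairs remain fully complementary, but only the standard repeat contains the ‘TTTT’ motif, which is consistent with the flip removing the termination sequence while preserving base pairing.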
Saturation mutagenesis and functional assessment of a miniaturized U6p–pegRNA cassette
The diversified parts described thus far were designed to satisfy Lmax < 40, a practical requirement for yeast-based assembly of large constructs27,29. Smaller subsets of parts can be selected from these libraries to further increase diversity. However, gaining more comprehensive knowledge about which variants can be introduced to a Pol III promoter and/or gRNA scaffold while retaining functionality would enable the design of even more diversified parts to meet more stringent Lmax requirements. To this end, we conducted saturation mutagenesis and functional assessment of a U6p–pegRNA cassette.
To focus our efforts on the most critical sequence elements, our ‘wild-type’ construct appends a miniaturized version of the canonical human RNU6-1 promoter47 that retains its four key TFBSs while deleting divergent intervening regions (shortened from 249 to 111 bp; Fig. 3a and Supplementary Table 5) to a standard pegRNA driving a 5-bp insertion (124 bp). We first sought to confirm that the wild-type version of this 235-bp minU6p–pegRNA cassette is functional, and found it drove editing at 38% of standard hRNU6-1p levels (Fig. 3a). In contrast, the deletion of TFBSs from minU6p severely diminished activity (169-fold to 2,732-fold reduction; Fig. 3a). The H1 promoter, a naturally occurring human Pol III promoter that is similarly miniaturized in the sense that the TFBSs are retained, exhibited activity similar to that of miniaturized U6p (29% of standard hRNU6-1p; Fig. 3a). Taken together, these results confirm that retention of TFBSs while deleting divergent intervening sequences is a general approach for deriving miniaturized Pol III promoters that retain function47,48.
a, Left, human Pol III promoter deletion series constructs and corresponding lengths. Locations of key TFBS are labeled. The top five rows correspond to hRNU6-1p and miniaturized variants thereof. The key TFBSs are always in the same order from 5′ to 3′ (5′–SPH–OCT–PSE–TATA). The bottom row corresponds to the 100-bp human H1 promoter, in which the positions of the OCT and SPH elements are reversed relative to hRNU6-1p. Right, log-scaled edit scores of wild-type or miniaturized Pol III promoters (n = 3 transfection replicates each with four iBCs per promoter, and mean of the edit scores of these four iBCs per transfection replicate is shown). b, Variant effect maps of saturation mutagenesis of a miniaturized hRNU6-1p–pegRNA cassette tested across three human cellular contexts. Color-scaled, log-transformed fold changes in median edit scores relative to minU6p–pegRNA are shown. Edit scores were not calculated for the unboxed region surrounding the pBC, as exact matches spanning this region were required for edit quantification.
With the wild-type miniaturized U6p–pegRNA as the baseline, we designed, synthesized and cloned two libraries encoding every possible single-nucleotide substitution and single-nucleotide deletion across its length (230 bp excluding the 5N iBC; n = 920 variants in total; a second library is identical but with a different set of iBC pairings; Supplementary Table 5). We then, as above, introduced these libraries to three human cellular contexts and quantified edit scores. These experiments revealed a biologically coherent landscape of variant effects with consistent sequence–function relationships across cellular contexts (Fig. 3b and Supplementary Figs. 10 and 11). As expected, given the flexibility of the cis-regulatory code, the U6 promoter region (positions 1–111) was more tolerant to variation than the pegRNA (positions 112–235; 1.6-fold to 1.9-fold higher median edit score across cell contexts; Fig. 3b and Supplementary Fig. 11). Single-nucleotide deletions within the U6 promoter TATA box (positions 81–89, ‘TTTATATAT’) were not tolerated (Fig. 3b). Activity was also particularly compromised by deletions in the nucleotides forming the final pegRNA stem loop (positions 198–202, ‘GAGTC’; 2.1-fold to 5.4-fold lower edit scores than all other deletions) or the PAM-proximal region of the spacer (positions 122–131, ‘GAGCACGTGA’; 1.4-fold to 1.6-fold lower edit scores than all other deletions; Fig. 3b and Supplementary Fig. 11). These results are consistent with the core roles of these elements in the editing cycle of a pegRNA: transcription, stability and target nicking, respectively.
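The library size follows directly from the design: each position contributes three substitutions and one deletion, so a 230-bp template yields 230 × 4 = 920 variants. A minimal sketch of such an enumeration is given below. The variant label format is a hypothetical convention, and the template used in the usage note is a placeholder rather than the actual minU6p–pegRNA sequence.

```python
def saturation_variants(seq):
    """All single-nucleotide substitutions and deletions of seq.

    Returns a list of (label, variant_sequence) tuples with 1-based
    positions. Variants are counted per position, so deletions inside
    a homopolymer run yield identical sequences under distinct labels.
    """
    variants = []
    for pos, ref in enumerate(seq, start=1):
        for alt in "ACGT":
            if alt != ref:
                mutated = seq[:pos - 1] + alt + seq[pos:]
                variants.append((f"{pos}{ref}>{alt}", mutated))
        variants.append((f"{pos}del{ref}", seq[:pos - 1] + seq[pos:]))
    return variants


# Placeholder 230-nt template (NOT the real cassette sequence).
placeholder_template = "ACGT" * 57 + "AC"
library = saturation_variants(placeholder_template)
```

With the 230-nt placeholder, `len(library)` is 920, in agreement with the count reported above.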
In contrast to single-nucleotide deletions, many SNVs were tolerated throughout the length of the cassette, and several displayed enhanced performance compared to the miniaturized U6p–pegRNA cassette (Fig. 3b and Supplementary Table 5). In particular, 16 of 920 variants, 15 of which were SNVs, displayed increased edit scores across both iBCs in all three cellular contexts (median 1.9-fold higher edit scores, max = 20.8-fold; Fig. 3b and Supplementary Table 5). A total of 13 of 16 (81%) of these variants were in the miniaturized promoter, of which 5 introduced substitutions to a ‘TATT’ sequence at the end of the proximal sequence element (PSE; positions 64–67), which may boost function by improving promoter conformation and/or transcription initiation from the immediately downstream TATA box. Furthermore, the remaining 3 of 16 (19%) variants with improved function, which fell in the pegRNA region, all introduced substitutions to two neighboring nucleotides near the 3′ end of the primer binding site (231 G > C; 232 T > C; 232 T > A), suggesting that these variants may yield a more optimal primer and/or more stable pegRNA. Relaxing these criteria, we identified 499 variants that functioned within fivefold of the wild-type minU6p–pegRNA cassette across barcodes and contexts, and 764 that functioned within 50-fold. These results provide a rich set of enhancing or tolerated SNVs that can be leveraged to boost sequence diversity as needed (Fig. 3b and Supplementary Table 5).
Diversified U6 promoters exhibit consistent functional activities in mouse embryonic stem cells (mESCs)
To assess whether the activities of these parts are human-specific or consistent across mammalian models, we then sought to characterize them in mESCs. As mESCs lack an endogenous HEK3 locus, we introduced synthetic human HEK3 target sites49 (synHEK3) and PEmax through piggyBac transposition at a high multiplicity of integration, and isolated a monoclonal line with an estimated 87 synHEK3 targets (29 integrations × 3 synHEK3 targets per integration; Supplementary Fig. 12). We then introduced the original library of evolutionarily or synthetically diversified U6 promoters (n = 209) to this cell line and quantified edit scores as above.
As in human cells, diversified U6 promoters drove prime editing in mESCs with a very high correlation between technical replicates (r > 0.99; Supplementary Figs. 12 and 13). We speculated that this high reproducibility was due to the much larger number of synHEK3 sites in these engineered mouse cells compared to the endogenous HEK3 sites in human cell lines (~87 versus 2–3), which is expected to decrease measurement noise. To confirm this, we generated a new monoclonal HEK293T line harboring ~146 synHEK3 target sites and retested the library of 209 diversified U6 promoters. As in mESCs, we observed that introducing many synHEK3 target sites resulted in much higher replicate correlations in human cells as well (r = 0.96–0.98; compare HEK293T results in Supplementary Figs. 13 and 14 to those in Supplementary Fig. 4).
Furthermore, results also correlated well between human and mouse cells (r = 0.73–0.80; Supplementary Figs. 12 and 13 and Supplementary Table 1). In mESCs as in human cells, evolutionarily diversified U6 promoters exhibited greater variance in activity (Supplementary Figs. 12 and 13). The human RNU6-1 promoter was again among the top-performing promoters in mESCs, consistently outperforming a commonly used, modified mouse U6 promoter11,50,51 as well as another mouse U6 promoter that was part of the evolutionarily diversified set (Supplementary Figs. 12 and 13 and Supplementary Table 1). Other evolutionarily diversified promoters that were among the most highly active in the human context were similarly highly active in the mouse context (Supplementary Figs. 12 and 13).
Taken together, these results suggest that these diversified U6 promoters can likely be used across both human and mouse model systems, with the expectation that their activities will be similar to those observed in human cell lines.
Testing thousands of ancestral, extant and mutagenized sequences reveals highly active Pol III promoters for mammalian genome editing
We then sought to scale both our evolutionary and synthetic approaches to further expand the set of sequence-diversified and activity-diversified Pol III promoters available for use in synthetic biology and genome engineering. Functional candidate parts for genome engineering can be mined from both extant and ancestral genomes, as has been done for cytidine deaminases52. We leveraged the Zoonomia Project’s 240-species Cactus genome alignment53,54,55 to identify extant and ancestral orthologs of seven Pol III promoters known to be functional in mammalian cells (RNU6-1, RNU6-2, RNU6-7, RNU6-8, RNU6-9, H1 and 7SK promoters). Altogether, we extracted 2,192 unique Pol III promoter sequences, including 1,084 that exactly match at least one extant genome, and 1,108 that solely occur in inferred, ancestral genome(s). We supplemented these mammalian Pol III promoters with saturation mutagenesis libraries that encompass all single-nucleotide substitutions and deletions of the human H1 (100 bp, 401 variants including wild-type) and 7SK (243 bp, 973 variants including wild-type) promoters. Altogether, this library contained 3,566 ancestral, extant or mutagenized mammalian Pol III promoters (Fig. 4a).
a, Library design, contents and multiplex prime editing functional assessment workflow. b, Edit score correlations across the four transfection replicates. Points represent edit scores for the three independent iBCs paired with each of the 3,566 promoters (10,698 constructs total). Pearson correlations, calculated on barcode-normalized edit scores before log transformation, are shown. c, Edit score distributions for the different promoter classes tested in this experiment. The standard human RNU6-1 promoter is shown in the top row, and its mean activity is marked with a vertical dashed line.
To facilitate the accurate quantification of the relative activities of these promoters, we leveraged insights from earlier experiments. First, given the high technical reproducibility of multiplex prime editing experiments conducted in monoclonal mESCs and HEK293Ts with large numbers of synHEK3 target sites (r ≥ 0.96; Supplementary Figs. 12–14), we used a monoclonal K562 line with 22 synHEK3 targets49 and PEmax41 as our prime editor for these experiments (Fig. 4a). Second, we paired each Pol III promoter with three independent iBCs (3,566 promoters × 3 iBCs = 10,698 constructs total), accommodating the larger library size by switching from a 5-bp to 8-bp barcode. To facilitate downstream normalization, we measured the relative insertion activity of all 65,536 possible 8N insertions when driven by the same hRNU6-1p (Supplementary Fig. 15 and Supplementary Table 6).
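The library arithmetic and the need for a longer barcode can be verified with a few lines. The sketch below uses only numbers stated in the text. The variable and function names are hypothetical.

```python
def barcode_capacity(length):
    """Number of distinct DNA barcodes of a given length (4**length)."""
    return 4 ** length


# Library composition as stated in the text.
n_orthologs = 1084 + 1108   # extant + ancestral-only sequences
n_h1_variants = 401         # 100 bp x 4 changes + wild-type
n_7sk_variants = 973        # 243 bp x 4 changes + wild-type
n_promoters = n_orthologs + n_h1_variants + n_7sk_variants
n_constructs = n_promoters * 3  # three independent iBCs per promoter

capacity_5n = barcode_capacity(5)
capacity_8n = barcode_capacity(8)
```

The 10,698 constructs exceed the 1,024 barcodes available at 5 bp but fit comfortably within the 65,536 barcodes available at 8 bp.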
After transfection and synHEK3 amplicon sequencing, we observed the expected insertional edits with strong concordance in edit scores derived from four transfection replicates (r > 0.94; Fig. 4b). We also observed strong correlation across the three independent iBCs associated with each Pol III promoter (r ≥ 0.80; Supplementary Fig. 16). This correlation was markedly improved by correcting for relative barcode insertion efficiency (r = 0.48–0.51 before versus 0.80–0.81 after barcode correction; Supplementary Fig. 16). This result reinforces the importance of having relative activity measurements for all iBCs used, particularly for longer iBCs, which exerted greater influence on raw edit scores than shorter barcodes (Supplementary Figs. 2, 3 and 15).
Global analyses of this screen revealed a broad range of mammalian Pol III promoter activity levels, with clear differences between the activity distributions of the classes of elements tested. Evolutionary orthologs of the H1 promoter exhibited weaker activity than orthologs of U6 or 7SK promoters (Fig. 4c), consistent with our earlier comparison of the short H1 and miniaturized U6 promoters against the full-length U6 promoter (Fig. 3a). Also consistent with expectation, saturation mutagenesis of the human H1 and 7SK promoters highlighted the four core TFBSs as particularly constrained, while also identifying numerous tolerated and activity-enhancing SNVs that could be leveraged for additional diversification (Supplementary Fig. 17). Notably, as compared with U6, the H1 and 7SK Pol III promoters were much more tolerant of single-nucleotide deletions in their TATA boxes, but much less tolerant of mutations in the SPH or PSE elements (Fig. 3b and Supplementary Fig. 17).
As in earlier screens, hRNU6-1p was among the most highly active promoters (Fig. 4c). Remarkably, however, we also identified 982 promoters that outperformed hRNU6-1p across all iBCs (982 of 3,566 or 28%, including 475 U6, 26 H1 and 481 7SK promoter orthologs; median 1.3-fold increase over hRNU6-1p; Fig. 4c and Supplementary Table 7). A total of 408 of 982 (42%) of these hRNU6-1p outperformers were not present in any extant mammalian genome in the Zoonomia Project, highlighting the potential value of inferred, ancestral genome(s) as a source of noncoding regulatory parts for synthetic biology. These included the most active Pol III promoter in this experiment, a 7SK promoter ortholog from an intermediate ancestral rodent genome that drove prime editing at synHEK3 sites with 2.6-fold greater activity than hRNU6-1p. Other top performers derived from saturation mutagenesis (25%) or extant genomes (33%), the latter including Pol III promoters from the genomes of the Java mouse deer (Tragulus javanicus), long-tongued fruit bat (Macroglossus sobrinus), Linnaeus’s two-toed sloth (Choloepus didactylus) and one of our closest relatives, the bonobo (Pan paniscus; Supplementary Table 7).
We then sought to validate results for these 3,566 promoters by conducting a full replication experiment with simultaneous genome editing and transcription measurements (Supplementary Fig. 18a). The resulting data were reproducible across transfection replicates (all r > 0.9 for edit scores; all r > 0.86 for transcription scores). Furthermore, edit scores correlated well with both edit scores from the primary screen (r = 0.96) and transcription scores from the validation screen (r = 0.74; Supplementary Fig. 18b–g and Supplementary Table 8). These results provide further confidence in the estimated activity levels of these 3,566 diversified Pol III promoters.
While our main goal was to generate diversified parts to facilitate genome engineering, synthetic biology and molecular recording, this experiment incidentally mapped the distribution of activities of ancestral and extant Pol III promoter orthologs across the mammalian phylogeny (Supplementary Fig. 19). For example, at least when assayed in human cells, hRNU6-9p orthologs from primates are more active than hRNU6-9p orthologs from other orders (false discovery rate < 0.1), while hRNU6-1p orthologs are not (Supplementary Fig. 20). Further investigation of such patterns with phylogenetic methods has the potential to shed light on the evolution of Pol III promoter sequences.
We suspect that the much higher proportion of Pol III promoters whose activities exceed hRNU6-1p in this screen, as compared with the primary screen, follows from sampling an order of magnitude more sequences from more closely related species, with less attention to ensuring their sequence divergence. Alternatively, this may stem from modest overestimation of hRNU6-1p activity in earlier, single-barcode screens (see further validations below, which support this interpretation). Nonetheless, this set is sufficiently large to enable the selection of subsets that are highly sequence-diverse, so as to facilitate yeast-based assembly. For example, among the 982 Pol III promoters that outperformed hRNU6-1p (481,671 possible pairwise comparisons), there exist subsets of at least 205 that satisfy Lmax < 40 (Supplementary Fig. 21). This effectively provides a large set of yeast-assembly-compatible Pol III promoters that are as active as or more active than hRNU6-1p for driving genome editing.
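One simple way to extract a mutually compatible subset from a larger pool is a greedy pass that keeps a sequence only if its longest shared repeat with every previously kept sequence stays below the threshold. The text does not state how the subsets of at least 205 promoters were found, so the sketch below is an illustrative heuristic rather than the authors' procedure. It returns a valid subset but not necessarily the largest one, and the pure-Python comparison is practical only for small inputs.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_len(a, b):
    """Length of the longest exact substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def pair_lmax(a, b):
    """Longest shared repeat length in either orientation."""
    return max(shared_len(a, b), shared_len(a, reverse_complement(b)))


def greedy_compatible_subset(seqs, threshold=40):
    """Greedily keep sequences whose pairwise Lmax with all kept ones is < threshold.

    Input order matters: sorting by activity first would favor the
    most active parts (an optional design choice, not from the text).
    """
    chosen = []
    for seq in seqs:
        if all(pair_lmax(seq, kept) < threshold for kept in chosen):
            chosen.append(seq)
    return chosen
```

As a toy usage example with a threshold of 4, the pool `["AAAAAAAA", "AAAAAACC", "GGGGCCCC", "TTTTTTTT"]` reduces to `["AAAAAAAA", "GGGGCCCC"]`: the second sequence shares a 6-nt repeat with the first, and the fourth is the reverse complement of the first.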
Validation of diversified Pol III promoters and gRNA scaffolds at additional target loci identifies parts that consistently outperform the standard components
We then sought to validate diversified Pol III promoters and gRNA scaffolds at additional genomic target loci. First, we selected 20 diversified Pol III promoters that exhibited a broad range of activity levels in the primary (n = 209) or scaled (n = 3,566) screens, including hRNU6-1p. We paired each of these 20 promoters with three pegRNAs designed to install unique 8N iBCs at each of five distinct genomic target loci: CLYBL, EMX1, FANCF, HBB and synHEK3 (20 promoters × 3 8N iBCs × 5 target loci = 300 constructs; Fig. 5a). Second, we took all 313 gRNA scaffold designs and reprogrammed them to install three unique 8N iBCs at the same five target loci. We supplemented these with an additional 100 new gRNA scaffold variants that preserve the transcription-enhancing A-U flip variant while introducing additional diversifying R:AR replacement variants (413 scaffolds × 3 8N iBCs × 5 target loci = 6,195 constructs; Supplementary Fig. 22a).
a, Library design, contents and multiplex prime editing functional assessment workflow. b, Diversified Pol III promoters drove editing across all tested target loci—CLYBL, EMX1, FANCF, HBB and synHEK3. Editing efficiencies, calculated as the percentage of reads with programmed 8-bp insertions at each locus for each transfection replicate (n = 4), are shown. Boxes represent the 25th and 75th percentiles, box centerline represents the median. Whiskers extend from the hinge to 1.5× the interquartile range. c, Reproducibility of edit scores between transfection replicates for synHEK3 target sites. Pearson correlation coefficients, calculated on edit scores for each construct before log transformation, are listed. d, Reproducibility of edit scores from the primary screen versus the validation screen. Pearson and Spearman correlation coefficients, calculated on edit scores before log transformation, are listed. e, Comparison of edit scores at synHEK3 versus exemplary alternative target locus, CLYBL. Pearson and Spearman correlation coefficients, calculated between log-transformed edit scores, are listed. f, Barplot of Pearson and Spearman correlation coefficients, calculated between log-transformed edit scores, between synHEK3 and alternative target loci. g, Diversified Pol III promoter edit scores at synHEK3. Four points are plotted for each of 20 promoters (x axis), each representing mean promoter edit scores across three 8N iBCs for one transfection replicate (points are overlapping due to high reproducibility, such that they are not visually distinguishable). The ancestral rodent 7SK promoter was the top-performing promoter in both the primary screen and cross-locus validations. Anc., ancestral.
We introduced these libraries into a monoclonal HEK293T line expressing PEmax and bearing ~146 randomly integrated synHEK3 target sites. After 3 days, we independently amplified each endogenous target locus, or all synHEK3 sites, and quantified edit scores. Diversified promoters and scaffolds successfully drove editing at all five target loci (Fig. 5b and Supplementary Fig. 22b). As expected based on our earlier screens, edit scores at synHEK3 correlated exceptionally well across transfection replicates for both diversified Pol III promoters (r > 0.99; Fig. 5c and Supplementary Fig. 23a) and gRNA scaffolds (r > 0.99; Supplementary Figs. 22c and 24a). At single-copy endogenous target loci, edit scores also correlated reasonably well across transfection replicates for both Pol III promoters (CLYBL, r = 0.87–0.92; EMX1, r = 0.86–0.94; HBB, r = 0.91–0.98; FANCF, r > 0.99) and gRNA scaffolds (CLYBL, r = 0.63–0.74; EMX1, r = 0.45–0.66; HBB, r = 0.61–0.79; FANCF, r = 0.83–0.89). Reproducibility at these alternative endogenous sites was more modest than we observed for endogenous HEK3, likely owing to a combination of sparse measurements for poorly active scaffolds and target-specific differences in iBC insertion efficiencies (that is, we did not measure baseline efficiencies for all 65,536 8N iBCs at these alternative endogenous loci as we did for HEK3/synHEK3).
Are the activities of parts at one genomic location or target site predictive of their activities at another? For the former (generalizability across genomic locations), we compared results from endogenous HEK3 (primary screen) versus synHEK3 sites (validation screen) and found them to be highly correlated (promoters, r = 0.91; scaffolds, r = 0.87; Fig. 5d, Supplementary Fig. 22 and Supplementary Tables 9 and 10). For the latter (generalizability across target sites), we compared results from synHEK3 (validation screen) versus alternative endogenous loci (validation screen) and also found them to be reasonably well correlated (promoters, r = 0.79–0.93; scaffolds, r = 0.43–0.60; Fig. 5e,f, Supplementary Fig. 22e,f and Supplementary Tables 9 and 10), despite the lack of target site-specific iBC edit score normalization at alternative targets. Once again, these correlations were more modest for diversified gRNA scaffolds, plausibly due to the greater opportunity for interaction of the iBC and/or target sequence with variable scaffold sequences (that is, spacer, PBS and reverse-transcription template (RTT)). Nonetheless, classes of gRNA scaffolds exhibited consistent patterns of activity across target loci, for example, extensions exhibiting lower activity than both replacements and A-U flip variants (Supplementary Fig. 22g).
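As a minimal sketch of the comparisons described above, the following computes Pearson correlations on raw and log-transformed edit scores, and a Spearman correlation as the Pearson correlation of average ranks. The edit scores are made-up values for five hypothetical parts, not data from this study, and the helper names are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up edit scores for five parts at two loci (illustrative only).
synhek3 = [0.2, 0.5, 1.0, 2.0, 4.0]
clybl = [0.1, 0.4, 0.3, 1.5, 2.5]

log_synhek3 = [math.log10(v) for v in synhek3]
log_clybl = [math.log10(v) for v in clybl]

r_raw = pearson(synhek3, clybl)          # Pearson before log transformation
r_log = pearson(log_synhek3, log_clybl)  # Pearson on log-transformed scores
rho = spearman(synhek3, clybl)           # rank-based, unchanged by the log
```

Because the log transform is monotonic, the Spearman coefficient is the same before and after transformation, whereas the Pearson coefficient generally is not; this is why the legends state which scale each coefficient was computed on.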
This screen also revealed promoters and gRNA scaffolds that consistently outperformed the standard components. For scaffolds, this included 17 designs that outperformed the standard across all target genomic loci, all of which were replacement or A-U flip variants (Supplementary Fig. 22 and Supplementary Table 10). Notably, these included six of seven scaffolds that outperformed the standard scaffold in the primary screen at endogenous HEK3 (Fig. 2d). The sole exception was scaffold 285, which outperformed the standard scaffold at all loci except HBB (Supplementary Table 10).
For promoters, although our validation experiments focused on a few Pol III promoters exhibiting a broad range of activities in the primary screen, 7 of 20 promoters outperformed the standard at synHEK3 (Fig. 5g), and 4 of 20 did so across all five target loci (Supplementary Table 9). Notably, these included the ancestral rodent 7SK promoter, which was the top-performing promoter both here and in our scaled screen of 3,566 promoters (Figs. 4 and 5g).
Taken together, these results show that the activities of diversified Pol III promoters and gRNA scaffold parts at HEK3 are predictive of their activities at other target sequences and genomic locations. Furthermore, they highlight several Pol III promoters and gRNA scaffolds that consistently exhibit higher levels of activity than the standard parts.
Single-step assembly and deployment of a ‘ten-key’ diversified molecular recording array
With functional parts in hand, we sought to test whether these parts were sufficiently sequence-diverse to enable their one-step assembly in yeast, and then to deploy this assembly in mammalian cells. In addition, we sought to assess whether activity measurements for isolated Pol III promoters, scaffolds and iBCs could be used to predict the activity of U6p–pegRNA–iBC combinations, as well as the relative activity of multiple U6p–pegRNA–iBC units assembled into a large array. For this, we designed a ten-unit array of ‘keys’ based on our diversified parts and DNA Typewriter31, a time-resolved, multisymbol molecular recording system that relies on sequential prime editing (Fig. 6a). In brief, DNA Typewriter leverages a ‘Tape’ composed of a tandem array of prime editing target sites, most of which lack the first 3 bp of the spacer targeted by corresponding pegRNAs, with the exception of the 5′-most site, which is complete. Each sequential round of prime editing inserts a barcode that both records information and completes the next spacer along the tandem array, enabling it to be written during the next round of prime editing (Fig. 6a). Sequential records generated with DNA Typewriter can be used to reconstruct cellular event histories, for example, of cell lineage31,39. In this analogy, pegRNAs encoding different barcodes correspond to keys on a typewriter, encoding symbols that are written sequentially to media.
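The sequential-writing logic described above can be illustrated with a toy simulation. This is a simplified sketch under stated assumptions, not the recording system itself: editing is discretized into rounds, each round succeeds with a fixed probability `p_edit`, and the three iBCs, their weights and all parameter values are hypothetical.

```python
import random

KEY_SUFFIX = "GGA"  # completes the next spacer, per the NNNGGA design


def simulate_tape(n_sites, ibcs, weights, n_rounds, p_edit, rng):
    """Simulate one Tape; return the insertions written, in 5'-to-3' order."""
    tape = []
    for _ in range(n_rounds):
        if len(tape) == n_sites:  # every site has already been written
            break
        # Only the site at index len(tape) has a complete spacer, so it is
        # the single writable position in this round.
        if rng.random() < p_edit:
            ibc = rng.choices(ibcs, weights=weights, k=1)[0]
            tape.append(ibc + KEY_SUFFIX)
    return tape


rng = random.Random(0)
ibcs = ["AAC", "CTG", "GTA"]  # hypothetical 3-bp iBCs
weights = [1.0, 1.0, 1.0]     # hypothetical relative key activities
tapes = [
    simulate_tape(6, ibcs, weights, n_rounds=10, p_edit=0.3, rng=rng)
    for _ in range(1000)
]

# Fraction of Tapes edited at each site; non-increasing by construction,
# because a site can only be written after all sites 5' of it.
edited_fraction = [
    sum(len(t) > site for t in tapes) / len(tapes) for site in range(6)
]
print(edited_fraction)
```

In this toy model, the order of insertions along each Tape is the order in which they were written, which is the property that makes the records time-resolved.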
a, Schematic of workflow. (i) Diversified parts were paired in reverse rank order based on individual part activity measurements to balance predicted activity levels. (ii) Diversified U6p–pegRNA–iBC units were one-step assembled in yeast. (iii) The assembly was recovered, sequence validated and amplified in bacteria. (iv) The assembly was delivered to mammalian cells for sequential recording with the DNA Typewriter. (v) Each insertion of an NNNGGA barcode (that is, NNN as iBC; GGA to complete the next target site for sequential editing). b, Editing efficiency and the number of unique iBCs recovered at each of the six sequential sites in DNA Tape. Each dot represents an individual transfection replicate (n = 4). Higher editing rates in earlier sites are expected due to sequential editing by the DNA Typewriter. c, Proportion of insertions derived from each of the ten units of the diversified recording array across all DNA Tape sites. Observed proportions are correlated with predicted editing rates for each U6p–pegRNA–iBC unit. Smaller dots represent individual transfection replicates (n = 4), larger dots represent the mean of transfection replicates or predicted editing rates. Boxes represent the 25th and 75th percentiles, box centerline represents the median. Whiskers extend from the hinge to 1.5× the interquartile range. H. sapiens, Homo sapiens; C. parvulus, Camarhynchus parvulus; U. thibetanus, Ursus thibetanus; C. l. dingo, Canis lupus dingo; E. telfairi, Echinops telfairi; progr., programmed.
In designing this diversified molecular recording array, we sought to balance the activity levels of individual U6p–pegRNA–iBC units, as this is expected to yield a greater diversity of sequential editing patterns and thereby maximize the information content of any resulting recordings. Specifically, we paired ten of our top promoters with ten of our top scaffolds (Fig. 6a). Furthermore, we paired each U6p–pegRNA unit with specific ‘NNNGGA’ DNA Typewriter barcodes with similar activity levels31. We ordered 494–573-bp sequences corresponding to these ten U6p–pegRNA–iBC units flanked by versatile genetic assembly system (VEGAS) adaptors56 to facilitate their assembly in yeast (Supplementary Fig. 25). Additional components of the overall design included piggyBac inverted terminal repeats (for random integration), Bxb1 attB sites (for site-specific integration), orthogonal restriction enzyme sites (for isolation of individual units or the entire array) and flanking antirepressor elements (for insulation57,58; Supplementary Fig. 25). After the pooled transformation of 14 fragments into yeast (ten U6p–pegRNA–iBC units, four auxiliary and backbone components), we successfully recovered the complete 15.8-kb ten-unit assembly (Supplementary Fig. 25). Whole-construct sequencing revealed only one single-nucleotide substitution error, which fell at the 5′ end of one of the U6 promoters, upstream of the four core TFBSs.
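The Fig. 6a legend describes pairing parts in reverse rank order to balance predicted activity. A minimal sketch of that idea follows, assuming that unit activity is approximated by the product of part activities; the scores and part names are made up, and the sketch covers only promoters and scaffolds, not the iBC matching.

```python
def reverse_rank_pairs(promoter_scores, scaffold_scores):
    """Pair the strongest promoter with the weakest scaffold, and so on."""
    strongest_first = sorted(promoter_scores, key=promoter_scores.get, reverse=True)
    weakest_first = sorted(scaffold_scores, key=scaffold_scores.get)
    return list(zip(strongest_first, weakest_first))


def fold_range(pairs, promoter_scores, scaffold_scores):
    """Max/min of the predicted (multiplicative) activity across pairs."""
    products = [promoter_scores[p] * scaffold_scores[s] for p, s in pairs]
    return max(products) / min(products)


# Hypothetical activity scores for three promoters and three scaffolds.
promoter_scores = {"P1": 4.0, "P2": 2.0, "P3": 1.0}
scaffold_scores = {"S1": 3.0, "S2": 1.5, "S3": 1.0}

balanced = reverse_rank_pairs(promoter_scores, scaffold_scores)

# For comparison: strongest with strongest, weakest with weakest.
same_rank = list(zip(
    sorted(promoter_scores, key=promoter_scores.get, reverse=True),
    sorted(scaffold_scores, key=scaffold_scores.get, reverse=True),
))

print(balanced)  # [('P1', 'S3'), ('P2', 'S2'), ('P3', 'S1')]
print(fold_range(balanced, promoter_scores, scaffold_scores))
print(fold_range(same_rank, promoter_scores, scaffold_scores))
```

With these toy values, reverse-rank pairing gives a much narrower spread of predicted activity than same-rank pairing, which is the balancing effect the design aims for.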
To more formally assess the value of diversified parts in this context, we attempted to construct a similar ten-key recording locus using fully repetitive standard parts, specifically ten repeats of the standard hRNU6-1p and gRNA scaffold (each driving one of ten different iBCs). We transformed either the diversified fragments or repetitive fragments into yeast in parallel, using the same set of VEGAS adaptors. We then performed shotgun genomic long-read sequencing on a pool of transformed yeast. Focusing on alignments to the intended assembly, the number of successfully assembled junctions per read was markedly higher for assembly with diversified parts than with repetitive parts, consistent with expectation (Supplementary Fig. 26a). Furthermore, we only identified reads harboring all nine assembly junctions when using diversified parts (5/346 reads with diversified parts (1.5%) versus 0/430 reads with repetitive parts (0%); Supplementary Fig. 26b). These results confirm and quantify the necessity of diversified parts for enabling the yeast-based assembly of arrays of Pol III-driven gRNAs.
Next, we delivered the diversified ‘ten-key’ DNA Typewriter construct to a HEK293T cell line expressing PEmax and multiple integrated copies of a synthetic DNA Tape construct, each with six editable sites for sequential recording (Fig. 6a). After 72 h, we observed all or a subset of the ten expected NNNGGA barcodes at each of the six sites, at rates that progressively decreased from the first to the sixth site, consistent with sequential editing (Fig. 6b). Notably, we observed insertions corresponding to all ten U6p–pegRNA–iBC units, and the proportion of edited reads corresponding to each unit was balanced within a few fold at each DNA Tape site where all ten iBCs were observed (4.7-fold range; Fig. 6c and Supplementary Table 11). Furthermore, the proportion of edited reads for each unit predicted by a simple Pol III × scaffold × iBC model based on our individual part measurements mirrored their observed activities throughout the length of the tandem array, with no obvious systematic bias attributable to the 5′ → 3′ position of the U6p–pegRNA–iBC units (r = 0.58; Fig. 6c and Supplementary Table 11). Of note, unit 2, which is an outlier in this correlation, has hRNU6-1p as its promoter, which is consistent with a modest overestimation of hRNU6-1p activity in the primary, single-barcode screen (see above). Taken together, these experiments confirm that our diversified parts are amenable to large-scale assembly in yeast, and that we can predict the activity of Pol III promoter–gRNA scaffold–iBC combinations (and tandem arrays thereof) based on the measured activities of individual parts.
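One plausible reading of the "Pol III × scaffold × iBC" model is that each unit's predicted share of insertions is the product of its three part activities, normalized across units. The sketch below illustrates that reading only; the exact normalization used in the study is not given here, and the part activities and unit names are hypothetical.

```python
def predicted_proportions(units):
    """Normalize promoter x scaffold x iBC products into proportions."""
    raw = {name: p * s * b for name, (p, s, b) in units.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def fold_range(proportions):
    """Ratio of the largest to the smallest proportion."""
    values = list(proportions.values())
    return max(values) / min(values)


# Hypothetical (promoter, scaffold, iBC) activities for three units.
units = {
    "unit_1": (2.0, 1.0, 1.0),
    "unit_2": (1.0, 1.0, 1.0),
    "unit_3": (0.5, 2.0, 1.0),
}

predicted = predicted_proportions(units)
print(predicted)              # {'unit_1': 0.5, 'unit_2': 0.25, 'unit_3': 0.25}
print(fold_range(predicted))  # 2.0
```

Predicted proportions of this kind can then be compared with the observed proportion of edited reads per unit, and the same `fold_range` helper applied to observed proportions gives a balance measure analogous to the fold range reported above.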











