1. Introduction
Since the discovery of the CRISPR/Cas9 genome engineering technology platform [1], the most commonly used application has been generation of microdeletions using a single sgRNA. The cellular non-homologous end-joining (NHEJ) pathway repairs the double-strand breaks, and this can lead to generation of an out-of-frame gene knockout (KO) [2]. Functional outcomes of this process need to be properly screened for. Disruption of functional DNA motifs, such as transcription factor-binding sites or splicing signals, is easily achieved by deletion or insertion of one or more nucleotides. However, if a gene KO needs to be established, it is critical to make sure that a phase shift has been generated in all alleles of the target gene. It is worth mentioning that using a single sgRNA carries the inherent danger of generating different effects in sister alleles, which may result in distinct phenotypes. Screening usually involves a PCR over the region of interest, submitting the amplicon to sequencing, and deconvolution of the genotypes generated in all alleles (e.g., [2]) (Figure 1A).
Deletion of selected genetic elements relies on a dual sgRNA-mediated strategy and the repair follows the same principle as above, including excision of the genetic segment by Cas9 and subsequent repair of the cut by NHEJ [3,4]. For knockout of genes, as in classical gene targeting approaches, a chosen critical exon is deleted to generate a functional null allele [5]. The critical exon is defined by (i) being part of all transcriptional isoforms whose expression needs to be eliminated; (ii) being contained within the first 1/3 of the coding sequence to have a good chance for nonsense-mediated decay (NMD) to occur [
6]; (iii) not being exon 1, if possible, and not having any immediate downstream in-frame ATGs (both of which could allow expression of a purely N-terminally truncated version); and (iv) having a base count indivisible by three to generate a phase shift in the transcript once the target exon is deleted. As such, deletion of the critical exon will create a defined mis-splicing and will terminate mRNA translation at a known endpoint. The targeting of intronic regions with sgRNAs needs to ensure all splicing signals are co-deleted, since only then will the previous exon be spliced to the subsequent downstream exon. A simple PCR screen over the target region discriminates presence and absence of the targeted exon/region and allows determination of deletion over WT alleles (
Figure 1B) [4]. Presence of only the short amplicon indicates deletion in all alleles analyzed and will identify the desired full KO. This is independent of ploidy and, in the primary screen, makes more complicated sequencing and deconvolution steps unnecessary. Using the dual sgRNA approach is helpful when trying to establish deletions in triploid or tetraploid lines where generation of a full KO usually requires screening of many clonal lines. Several independent studies have reported that a dual sgRNA-based strategy is very effective, if not essential, for genetic modification in diverse model systems [7,8,9,10,11].
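The four criteria for a critical exon listed above lend themselves to a simple automated pre-check. The following Python sketch is illustrative only: the `Exon` record, its field names and the `critical_exon_issues` helper are hypothetical and not part of the original study, and exon positions are assumed to be given as offsets within the coding sequence (CDS).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exon:
    """Hypothetical exon record; all fields are assumptions for illustration."""
    number: int                   # 1-based exon number in the transcript
    cds_start: int                # 0-based offset of the exon's first base within the CDS
    length: int                   # exon length in base pairs
    in_all_isoforms: bool         # criterion (i)
    downstream_inframe_atg: bool  # part of criterion (iii)


def critical_exon_issues(exon: Exon, cds_length: int) -> List[str]:
    """Return the criteria (i)-(iv) that the exon violates; an empty list means it qualifies."""
    issues = []
    if not exon.in_all_isoforms:
        issues.append("(i) not part of all isoforms to be eliminated")
    if exon.cds_start + exon.length > cds_length / 3:
        issues.append("(ii) not contained within the first third of the CDS")
    if exon.number == 1 or exon.downstream_inframe_atg:
        issues.append("(iii) exon 1 or immediate downstream in-frame ATG")
    if exon.length % 3 == 0:
        issues.append("(iv) length divisible by three, so no phase shift")
    return issues
```

An exon that returns an empty list satisfies all four criteria under these assumptions; criterion (iii) is stated in the text as a preference ("if possible"), so a hit there may be treated as a warning rather than an exclusion.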
CRISPR/Cas9 is widely used as a reliable and very precise genome engineering tool. However, accumulating evidence suggests that repair of Cas9-induced DNA double-strand breaks can lead to various degrees of genomic rearrangements. NHEJ is the default repair pathway for cells and is usually highly efficient and accurate to allow maintenance of cell viability [
12], and inversions, duplications and deletions can be reliably generated by end joining after generation of DNA double-strand breaks [13,14]. However, when analyzing CRISPR/Cas9-mediated engineering with closer scrutiny, Shin and colleagues have identified large deletions of up to 600 bp and showed asymmetric deletions and large insertions of middle repetitive sequences [15]. Boroviak and colleagues demonstrated that larger genomic sequences targeted for inversion or excision can re-integrate (demarcated by the gRNAs) in the vicinity of the edited locus [16]. Furthermore, large deletions extending over many kilobases and more complex genomic rearrangements at the targeted sites were found, identifying significant numbers of unexpected cross-over events [17]. Our own research has observed frequent larger-than-expected deletions, potentially generated by the microhomology mediated end-joining (MMEJ) repair pathway [
18]. Additionally, more recent findings report serial head-to-tail insertions of donor DNA templates [19]; error-prone repair pathways introducing unwanted deletions and insertions [20]; unintended ON-target chromosomal instabilities [21]; harmful chromosomal deletions [22]; and deleterious ON-target effects when aiming at homology directed repair (HDR) [23]. Hence, it is crucially important for precise model validation to know which outcomes CRISPR/Cas9 genome engineering can generate besides the intended edit [24].
In addition to all of the above-described side products of genome editing, we observe an unexpectedly high level of inverted re-insertion events when using the dual sgRNA approach for genomic deletions. Our study summarizes several independent experiments, and we reveal inverted re-insertions in a median range of 3–20% throughout several different cell lines and at diverse genomic loci. Keeping this hitherto unreported phenomenon in mind, we suggest new measures that are vital for correct genotyping and, moreover, are potentially highly beneficial when establishing cellular model systems in more than diploid cell lines.
2. Materials and Methods
We present only a brief description of procedures and materials used in this manuscript since the aim is to provide a technical note as opposed to an elaborated protocol. However, we are very happy to share full details of experimental procedures, and any information request can be addressed to the corresponding author at any time.
Design of sgRNA reagents. All sgRNAs in this study were designed using the CRISPOR algorithm [25]. Guides were aimed at direct excision of either the critical exon (within the first third of the coding sequence, number of base pairs not divisible by three to generate a phase shift after removal, present in all transcriptional isoforms) or the respective regulatory element, as detailed in the text. Individual ssODNs for sgRNA pairs were subcloned into pX458-eGFP (Addgene 48138) and pX458-Ruby (Addgene 110164), and ON-target activity of each individual sgRNA was evaluated by Surveyor assays (according to the manufacturer, IDT). The two best and, as far as possible, equally well-performing sgRNAs were chosen for the dual sgRNA-based deletions. All sgRNA pairs used for exon removal were designed to delete both the critical 5′ splice branching point and the 3′ splice donor sites to allow full removal of the critical exon. All sgRNAs used in this study target intronic or intergenic regions.
Cell lines. Cell lines were obtained from ATCC or SIGMA and cultured as described (HeLa, 293, 293T-Rex, HT29-MTX-E12, HCT116, and 4T1). The hiPSC line CTR M3 36S was generated from the reprogramming of keratinocytes from a neurotypical Caucasian male, aged 36 years old, as described [26]. The mouse embryonic stem cell line E14 has been described [27].
Editing procedure. Two plasmids, each encoding a single sgRNA and color-coded Cas9 (eGFP for 5′ guides, and mRuby2 for 3′ guides), were transfected in either 6- or 12-well format into 293, HCT116, HeLa and 4T1 cells by lipofection with LPF 2000 (ThermoFisher Scientific, Hemel Hempstead, UK) and into HT29-MTX-E12 and E14 by Fugene (Roche). hiPSCs were transfected with pre-assembled RNPs using Alt-R® S.p. HiFi-Cas9 nuclease 3NLS (IDT) and chemically synthesized tracrRNA and crRNA (IDT) using Lipofectamine CRISPRMAX™ reagent (ThermoFisher Scientific, Hemel Hempstead, UK), according to a previously outlined protocol [28]. All delivery procedures were used as described by the respective manufacturers. Seventy-two hours post transfection, eGFP and mRuby2 double-positive cells were single-cell sorted by FACS and grown out to individual clonal lines. hiPSCs and E14 cells were plated at low density and subsequently picked and expanded as individual clonal lines.
Isolation of gDNA. gDNA was isolated by lysing the cell pellet of a confluent 6- or 12-well plate in 500 µL lysis buffer (50 mM Tris HCl pH 8.0, 100 mM EDTA, 100 mM NaCl, 1% SDS, 100 µg proteinase K) by incubation overnight at 56 °C. gDNA was isopropanol-precipitated and (critical step) DNA pellets were resuspended overnight at 56 °C in 500 µL ddH2O before determination of the DNA concentration (Nanodrop, ThermoFisher Scientific, Hemel Hempstead, UK).
Screening and validation procedures. PCR screening procedures in this study involved the following setup: primers for amplification of the genomic fragments were designed as 23mers with the aim of having three 3′ G/C residues and a GC content of at least 55%. Primary PCR amplification was performed using Qiagen Taq polymerase and standard buffers (QIAGEN, Hilden, Germany) in the presence of 3% DMSO, using an annealing temperature of 62 °C and an elongation time of 60 s for all amplicons below 1 kb, with primers denoted FW (forward) and RV (reverse) (see Supplemental Table S1). PCR reactions were separated on 1% TAE agarose gels; pictures were archived and processed with a GelDoc XR gel documentation unit (BioRad, Hercules, USA). Whenever a WT band was amplified in conjunction with a deletion, samples were re-analyzed with co-aligned primers that allow amplification only if an inverted re-insertion has occurred (oligonucleotides FW and FW*, described in Supplemental Table S1). Inversion events were confirmed by Sanger sequencing of the amplified WT-sized bands in both directions. Cell lines were further characterized for either absence of protein expression by Western Blot, or for absence of mRNA expression by qRT-PCR.
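The primer design targets stated above (23mers, three 3′ G/C residues, GC content of at least 55%) can be checked programmatically before ordering oligonucleotides. The sketch below is a minimal illustration, not the procedure used in this study: the function name and parameters are hypothetical, and "three 3′ G/C residues" is interpreted here as the last three bases each being G or C.

```python
def primer_meets_design_rules(primer: str, length: int = 23,
                              gc_clamp: int = 3, min_gc: float = 0.55) -> bool:
    """Check a primer against the stated design aims (illustrative interpretation)."""
    seq = primer.upper()
    # Reject primers of the wrong length or containing non-ACGT characters.
    if len(seq) != length or set(seq) - set("ACGT"):
        return False
    # Fraction of G and C bases over the whole primer.
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    # Assumption: the 3' clamp means the final `gc_clamp` bases are all G or C.
    clamp_ok = all(base in "GC" for base in seq[-gc_clamp:])
    return clamp_ok and gc_fraction >= min_gc
```

Because the text describes these values as design aims rather than strict requirements, a failing primer may still be usable; the check is meant only to flag candidates for closer inspection.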
3. Results
We routinely use a dual sgRNA-mediated excision approach for both functional gene KO and excision of regulatory genetic elements. Using a dual sgRNA-based strategy has the potential to double the number of potential OFF-target modifications; however, using highly selected sgRNAs, we have successfully generated many model systems and have been unable to detect significant numbers of aberrantly generated models. This can potentially be attributed to the fact that we generally apply careful initial PCR screening procedures that always take into account potential larger-than-expected deletions [
18]. The dual sgRNA-based excision works well in a wide selection of cell lines or genomic loci (for strategy see
Figure 2, and for a summary of results see
Table 1).
A dual sgRNA-based deletion of a critical exon leads to a defined outcome in all alleles and does not pose the risk of potential hypo- or hypermorphic outcomes when independent alleles are differentially modified. Importantly, the desired edit is easy to screen for using a simple overlapping PCR (as outlined in
Figure 2A). This is especially useful when engineering cell lines that are more than diploid, which comprise most of the immortalized cancer cell lines available. A single PCR reaction will determine the allelic status at the same time (WT or KO), and this assay assumes that only clones with a ‘deleted only band’ are bona fide KO cell lines (Figure 2A). The observation of smaller bands of different sizes is indicative of excessive NHEJ taking place after Cas9-mediated excision.
When correlating phenotypes to results from such a simple primary PCR screen, we observed several clones that did not match the expected behavior. Clones potentially identified as HET (presence of a larger and a smaller band) did not express functional protein or detectable amounts of mRNA (data not shown). Investigating this matter by molecular cloning and Sanger sequencing, we found that a substantial number of clonal cell lines displayed inverted re-insertion of the excised fragment (
Table 1). We set up a simple second PCR screen, making use of two primers in co-alignment. In this assay, both forward (FW and FW’) primers would fail to amplify in cases where the target element is retained in its WT orientation (generating only two non-exponential linear amplicons). It is only when an inverted re-insertion event occurs that the previously co-aligned primers face each other and generate a productive amplicon (
Figure 2B). The example below demonstrates how a combination of our two-tier screening PCR revealed that clones previously classified as WT (clones 2 and 3) were actually HET, and only clones 4 and 10 were bona fide WT cells that went through the engineering procedure (
Figure 2A,B). This is an important finding, especially when aiming to use WT-classified clones, which have seen identical genome engineering reagents and procedures, as the generally accepted best possible control.
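The decision logic of the two-tier screen described above can be summarized in a short sketch. This is an illustrative reading of the text, not the authors' software: the function name, argument names and returned labels are hypothetical. It assumes that an inverted re-insertion yields a WT-sized band in the primary PCR and a product in the co-aligned-primer PCR, and that a WT-sized band in an inversion-positive clone may still contain true WT alleles, which is why sequencing is recommended in those cases.

```python
def call_genotype(wt_sized_band: bool, deletion_band: bool, inversion_band: bool) -> str:
    """Combine primary (FW/RV) and secondary (co-aligned primers) PCR results."""
    if not wt_sized_band and not deletion_band:
        return "no call: primary PCR failed"
    if not wt_sized_band:
        # An inversion should also give a WT-sized primary band (assumption),
        # so an inversion signal without that band is flagged for repetition.
        if inversion_band:
            return "inconsistent: repeat PCRs"
        return "KO: deletion in all alleles"
    if inversion_band:
        if deletion_band:
            return "deletion plus inversion: sequence WT-sized band"
        return "inversion present: not WT, sequence WT-sized band"
    return "HET: deletion plus WT" if deletion_band else "WT"
```

For example, a clone showing only a WT-sized band in the primary PCR is reported as WT only if the secondary PCR is negative; a positive secondary PCR marks it as carrying at least one inverted allele.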
We subsequently screened several ongoing projects with this new approach. We observed a median inverted re-insertion rate of 3–20% of all clones screened. We find varying efficiencies in different cell lines and diverse target loci in our set of 12 independent experiments; however, our data strongly suggest a common phenomenon. It is important to note that the median range does exclude sample sizes with very limited numbers of clones available (as marked by asterisks in
Table 1). These cases yielded very high levels (50%) of inverted re-insertion; however, no clear deduction of generality is possible due to low sample numbers. We decided to include those numbers (
Table 1) to give an as broad as possible overview, especially since they provide direct evidence of inverted re-insertions happening. Moreover, our data also demonstrate a high rate of efficiency in the generation of full KO in our experimental cohort, with a median range from 2 to 43%. The same cutoff for low-number projects has been applied (
Table 1, %Δ).
Given the rather high occurrence of inverted re-insertions, it is important to stress the significance of this finding and the implications it can have on isolated clonal cell lines. Inverted re-insertions result in co-inversion of splicing signals and render them unrecognizable. Supporting this, we were unable to detect protein and mRNA expression in various inverted re-insertion models by RT-PCR, qRT-PCR and Western Blot analyses (data not shown). The consequence is that cells initially screened and identified as WT (by a single PCR) could well turn out to be heterozygous deletions. Importantly, this can be highly beneficial for the generation of gene deficient models in tri- or tetraploid cell lines. Generation of a full KO in those cells is usually difficult and requires screening of tens to hundreds of cell lines to find those with the required three or four simultaneous deletion events. In light of our observations, re-screening of HET clones (some to several alleles deleted and one potential WT remaining) could increase the pool of fully deleted clones by identification of inverted re-insertion events without the requirement to screen additional lines. In line with this, we observe that a large proportion of full KO models in tetraploid HeLa and 4T1 cell lines have been generated as combinations of KO and re-inserted inversions, at 30% and 100%, respectively (
Table 1).
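The benefit for polyploid lines can be illustrated with a back-of-the-envelope calculation. The sketch below rests on a strong simplifying assumption that is not made in the text: each allele is treated as independently ending up deleted with probability `p_deletion` or inverted with probability `p_inversion`. These per-allele probabilities are hypothetical inputs and are not the clone-level frequencies reported in Table 1.

```python
from typing import Tuple


def full_null_fraction(p_deletion: float, p_inversion: float, ploidy: int) -> Tuple[float, float]:
    """Expected fraction of clones with all alleles null, under allele independence.

    Returns (deletions only, deletions or inverted re-insertions).
    """
    if p_deletion < 0 or p_inversion < 0 or p_deletion + p_inversion > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    # Every allele must be deleted.
    deletion_only = p_deletion ** ploidy
    # Every allele must be either deleted or inverted.
    deletion_or_inversion = (p_deletion + p_inversion) ** ploidy
    return deletion_only, deletion_or_inversion
```

Under this toy model, counting inverted alleles as null raises the expected yield of fully null clones from `p_deletion ** ploidy` to `(p_deletion + p_inversion) ** ploidy`, and the relative gain grows with ploidy.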
Interestingly, hiPSCs, which are notoriously hard to modify [29], did not display this outcome (Table 1, bottom row). We screened 80 clones and only one was isolated with a proper monoallelic deletion. Re-targeting with a different set of sgRNAs with the aim of achieving a biallelic deletion resulted in the screening of an additional 140 clones. In this case, no clones were identified to contain either further deletions or inverted re-insertions. This might indicate a difficult genetic context or may represent an inherent feature of repair pathways active in hiPSCs. Further work is required to address this since our observations are based on one target locus in one hiPSC line. We chose to include this example in our manuscript to raise awareness, especially since a large number of individual clones were screened (Table 1 and data not shown).
4. Discussion
We routinely use dual sgRNA approaches to generate deletions in genes or regulatory elements. We consistently find high levels of full deletions in all alleles across a variety of different cell lines, from different species and origins and targeting at several distinct loci (
Table 1).
Generally, the regions of interest are short, with critical exons or the targeted regulatory elements usually spanning less than a couple of hundred base pairs. A screen for deletion events can be performed easily using a reliable PCR over the deleted region and this is, importantly, independent of cell ploidy. When analyzing inconsistent gene or protein expression data in genotyped cell clones, we realized that this approach produced unexpected inverted re-insertions at rather high frequencies next to common NHEJ events at either sgRNA target site. We observe a range of 3–20% of events where the target exon and associated splicing signals have been inverted, contributing to the frequencies of heterozygous clones and as well to functional null alleles. Likewise, we observed inversion of regulatory elements targeted using the same dual sgRNA strategy. Our selection of cell lines (
Table 1) is not comprehensive. However, it allows us to infer that inverted re-insertions are a rather common by-product of dual sgRNA-mediated genome engineering. Our observations are backed by Birling and colleagues, who also detected variable inversion events when performing mouse and rat oocyte injections using Cas9 mRNA and in vitro transcribed sgRNAs [14]. In general, more data need to be analyzed to demonstrate general effects of cell line origin, genomic location and dependency on other factors.
Our new data are in line with recent publications where genome engineering resulted in unexpected larger rearrangements [15,17,18,19,20,21,22,23]. Our findings support the observations of Boroviak and colleagues [
16]; however, the high-frequency occurrence of inverted re-insertions using the dual sgRNA approach has not previously been described in this context. We think it is important to highlight the implications when establishing genome-engineered model systems. The frequencies we observe are variable, differ between cell lines and likely depend on intrinsic factors such as sgRNA quality, sgRNA activity or the underlying genomic context. We see a broad range from 0 up to 50% inversion frequency. The higher numbers (50% marked with * and ** in
Table 1) are most likely overstated due to low sample numbers in re-screening experiments; however, this still stresses the point that inverted re-insertions are a common phenomenon. Inverted re-insertion events in our hands did not give rise to protein in any of the full KO cell lines, reflecting successful triggering of NMD and generation of functional bona fide null alleles (data not shown). Our data, accumulated in Table 1, could even be a general underrepresentation if larger-than-expected deletions were not picked up [18]. The limited number of cell lines, target loci and delivery options does not allow a generalized conclusion; however, the fact that we consistently detect inverted re-insertions, apparently irrespective of the experimental approach, is worth taking into account when screening engineered cell lines. A more systematic screening needs to be undertaken to obtain a better representation of the overall rates of inverted re-insertions.
An important lesson we learned is that inverted re-insertions must be considered when selecting appropriate control cell lines. The best control line is always the cells that underwent the same engineering pipeline and that turn out to be “WT” or “HET” in respect to the engineered target. If inverted re-insertions are not properly screened for, such events can be missed. Cells could wrongly be classified “WT” but instead be “HET”, resulting in a “hidden genotype” issue. It is vital to ensure bona fide KO cell lines are compared to the properly characterized control lines. Equally important, inverted re-insertions can be very beneficial when engineering polyploid cell lines. In several cases, we observed that HeLa and 4T1 tetraploid lines with a full KO genotype comprise a combination of deletion and inverted re-insertion alleles to generate the full KO (
Table 1 and data not shown). An additional PCR screening step is enough to ensure a reliable primary identification of the deletion status in cell lines (
Figure 2).
Although we saw a consistent rate of inverted re-insertion throughout several immortalized human and mouse cell lines and murine embryonic stem cells, we failed to detect any such event in hiPSCs (Table 1, last row). Here, screening of initially 80 clones yielded only one HET deleted cell line with no indication of any inverted re-insertion. Re-targeting of the HET cell line and subsequent screening of >140 hiPSC clones did not result in identification of any full KO or inversion event. Addressing cell viability, it has been reported that a homozygous deletion of the target gene Clusterin is viable, at least in a mouse model [30]. Our data points are based on one experimental cell line only; however, we screened many colonies and wanted to include those data to raise awareness. This phenomenon may be locus specific; however, it might also be intrinsic to hiPSCs and the use of their respective repair pathways. It is interesting to note that the hiPSCs were the only cell line edited with RNPs instead of plasmid-delivered Cas9. Future investigation using other loci and other established hiPSC lines may help bring clarity to this issue.
The potential danger of increased OFF-targeting and/or risking potentially higher levels of genomic rearrangements by using two instead of only one sgRNA needs to be carefully considered. In our experience, the benefits prevail, allowing an easy screen and, so far, not experiencing serious OFF-targeting issues. We generally generate three to five independent KO lines and cross-compare their phenotype to ensure that we have generated the model system we intended to create. To be fully aware of any other OFF-targeting events, a comprehensive genome wide analysis is highly recommended, such as targeted locus amplification [
31] or unbiased next-generation whole-genome sequencing [32]. The frequency of apparent OFF-targeting events is under discussion, and several papers provide arguments for [33,34,35,36] or against [
37] the use of a dual sgRNA-based approach for generation of non-clinical model systems. Ultimately, a careful decision needs to be made to choose between mono or dual sgRNA-based deletions. To our understanding, it depends on the balance between ease of model development, screening complexity and added quality control procedures. We generally recommend use of several lines for direct comparison as well as demonstrative final assays per cell line, such as qRT-PCR or Western Blot analyses, to confirm complete absence of mRNA or protein production.