CRISPR-Cas: Converting A Bacterial Defence Mechanism into A State-of-the-Art Genetic Manipulation Tool

Bacteriophages are pervasive viruses that infect bacteria and rely on the host's genetic machinery to replicate. To protect themselves from these invaders, bacteria developed an ingenious adaptive defence system based on clustered regularly interspaced short palindromic repeats (CRISPR). Researchers soon realised that a specific type of CRISPR system, CRISPR-Cas9, could be adapted into a simple and efficient genetic engineering technology, with several improvements over existing systems. This discovery set in motion a revolution in genetics, with new and improved CRISPR systems being used in numerous in vitro and in vivo experiments in recent years. This review illustrates the mechanisms behind CRISPR-Cas systems as a means of bacterial immunity against phage invasion and how these systems were engineered into new genetic manipulation tools. Newfound CRISPR-Cas technologies and the emerging applications of these systems in healthcare and other fields of science are also discussed.


Introduction
Genetic engineering is of great interest for its large array of possible uses in a multitude of scientific domains. This set of technologies enables innovative practices such as the development of drought-resistant plant species [1], the modification of human pluripotent cells [2], or even the generation of genetically modified monkeys [3].
Attempts to produce genetically modified organisms (apart from rudimentary selective breeding and induced mutagenesis techniques) were unsuccessful until the 1970s, when transgenesis was used to insert exogenous DNA sequences into Escherichia coli plasmids without disrupting the bacteria's biological functions, resulting in the first genetically modified organism [4].
However, this technique had its limitations. Since it relied on the random insertion of a DNA fragment, there was a risk of mismatch and interference of the exogenous gene with endogenous sequences that were not meant to be altered.
Homologous recombination was the first precise gene-editing technique to be developed [5]. The DNA fragment delivered to the cell carried sequences homologous to a target location in the genome, which reduced non-specific integration. Although this technique was adopted by the scientific community for research purposes, its low efficiency restricted widespread use.
Over the following decades, growing knowledge of genetics and advances in DNA sequencing technologies paved the way for more efficient and precise gene-editing tools, of which CRISPR is the latest to make headlines in the scientific community.

What Is Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)?
This complex system was first mentioned in 1987 when Japanese scientists were studying the activity of the iap gene in Escherichia coli [6]. Close to the sequence of the iap gene, they noticed an unusual genetic structure composed of alternating repeat and non-repeat DNA sequences, whose biological significance was at the time unclear.
The function of these intriguing structures was only brought to light 20 years later, when a landmark study provided experimental evidence that CRISPR is a crucial element of the bacterial defence against bacteriophage infection [7]. The researchers identified two different CRISPR loci in Streptococcus thermophilus strains. Sequencing revealed that some spacers were homologous to bacteriophage and plasmid sequences, leading to the hypothesis that CRISPR was a bacterial defence mechanism against foreign elements. To test this possibility, a phage-sensitive wild-type (WT) S. thermophilus strain was challenged with two different virulent bacteriophages. This yielded nine phage-resistant S. thermophilus strains, and analysis of their CRISPR loci showed that new spacers had been inserted next to those of the WT strain. Moreover, the sequences of these new spacers matched sequences within the genomes of the phages used in the experiment. This confirmed the hypothesis that bacteria under viral pressure can integrate new spacers derived from phage genomic sequences, giving rise to diverse phage-resistance phenotypes. In the same study, the researchers also noted the proximity of CRISPR loci to a particular set of CRISPR-associated (cas) genes encoding Cas proteins, which were also required for CRISPR-mediated immunity, since inactivation of these genes disrupted CRISPR function.
Bioinformatic databases dedicated to detecting CRISPR motifs and Cas proteins in sequenced genomes (CRISPRdb and CRISPI) predict that these systems are prevalent in archaea (~87%) and present in many bacterial genomes and plasmids (~45%) [8,9].

Structure of CRISPR Loci
Several types of CRISPR exist, with varying sequences and reliance on different Cas proteins, although they all share a similar DNA-encoded, RNA-mediated activity. As the full name indicates, the CRISPR locus is composed of short repeat sequences, usually ranging from 28 to 37 bp (base pairs) [10], separated by spacers of similar length, each bearing a unique sequence. Repeats are typically palindromic, at least in part: the sequence read 5′ to 3′ on one strand matches the sequence read 5′ to 3′ on the opposite strand. Spacer sequences carry phage- or plasmid-derived genetic material and are the key elements behind the specificity of CRISPR's defence mechanisms [10]. These spacers function as an immunological memory bank, storing sequences from previous encounters with invading elements. The number of spacers within a CRISPR array can range from one to several hundred, depending on the species [8].
A region rich in adenine and thymine (A and T), known as the leader sequence, lies upstream of CRISPR loci. Leader sequences are approximately 500 bp long and carry promoter elements and adaptation signals that are crucial for the transcription of crRNA (CRISPR RNA) and for the successful integration of foreign genetic material into the CRISPR array [11,12].
The CRISPR array and the leader sequence are preceded by CRISPR-associated genes, otherwise known as cas genes. Cas proteins (encoded by cas genes) pair with crRNA transcribed from CRISPR loci, forming CRISPR-Cas effector complexes that mediate the silencing and cleavage of foreign nucleic acids. Variations in cas genes and different arrangements of CRISPR loci give rise to several types of CRISPR-Cas systems. These were originally divided into three major types, I, II and III.
Figure 1. CRISPR-Cas adaptive immunity. Upon injection of genetic material from a virus or a plasmid into the bacterium, part of the invading sequence is cleaved and incorporated into the CRISPR locus, forming a new spacer. The CRISPR array is transcribed into a crRNA precursor (pre-crRNA), which is cleaved into mature crRNAs that form effector complexes with type-specific Cas proteins (brown). When a foreign sequence matches a CRISPR spacer, the matching crRNA binds to the invading strand, activating Cas proteins with nuclease activity that silence the invader.
The integrase activity of the Cas1-Cas2 complex mediates the integration of protospacer DNA into the CRISPR array. The 3′-OH groups of the protospacer intermediate carry out two sequential nucleophilic attacks at both 5′ ends of the first repeat of the CRISPR array [28]. The result is an expanded CRISPR array in which the new spacer is flanked by two incomplete, single-stranded repeats, which are later repaired by host enzymes that remain unidentified. Because integration preferentially occurs at the repeat closest to the leader sequence, the most recently acquired spacer is the first in the CRISPR array. Spacers are therefore arranged chronologically within the array, with a few exceptions [29,30].
Some type I, II and V systems also rely on Cas4 nuclease activity for the adaptation step [14,18]. The type III-B system of Marinomonas mediterranea is particularly interesting because its Cas1 is fused to a reverse transcriptase, meaning that spacers can be acquired from RNA-based invaders and subsequently reverse-transcribed into DNA [31]. The adaptation machinery of other types of CRISPR-Cas systems still needs further study; however, since Cas1 and Cas2 are present in nearly all CRISPR systems [17], the function of this complex in most systems is thought to resemble the well-understood adaptation mechanisms of type I and type II systems.
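The repeat/spacer layout described above can be illustrated with a short Python sketch. It assumes an idealised array in which every repeat is identical and fully palindromic (real repeats are longer and often only partially palindromic), and the sequences and function names (`reverse_complement`, `is_palindromic`, `extract_spacers`) are hypothetical, chosen only for illustration.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


def extract_spacers(array: str, repeat: str) -> list[str]:
    """Split an array on an exactly known repeat and return the spacers."""
    pieces = array.split(repeat)
    # Text before the first repeat and after the last one is not a spacer.
    return [piece for piece in pieces[1:-1] if piece]


if __name__ == "__main__":
    repeat = "GAATTC"  # toy repeat, far shorter than real 28-37 bp repeats
    array = repeat + "AAACCCTTTG" + repeat + "TTGCAGCATG" + repeat
    print(is_palindromic(repeat))          # True
    print(extract_spacers(array, repeat))  # ['AAACCCTTTG', 'TTGCAGCATG']
```

Splitting on an exact repeat only works for this idealised case; real CRISPR-detection tools tolerate degenerate repeats.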
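The consequence of leader-proximal integration can be sketched in Python with a toy model in which the array is a list of alternating repeats and spacers, index 0 being next to the leader. The model ignores the chemistry of integration and the exceptions noted above; `integrate_spacer` and `spacers_in_order` are hypothetical helper names.

```python
def integrate_spacer(array: list[str], new_spacer: str, repeat: str) -> list[str]:
    """Add new_spacer at the leader-proximal end of the array.

    The array is modelled as [repeat, spacer, repeat, ..., repeat], with
    index 0 adjacent to the leader. The first repeat is duplicated, so the
    new spacer ends up between two copies of it.
    """
    if not array or array[0] != repeat:
        raise ValueError("array must start with the leader-proximal repeat")
    return [repeat, new_spacer] + array


def spacers_in_order(array: list[str]) -> list[str]:
    """Spacers sit at the odd indices, listed from the leader outwards."""
    return array[1::2]


if __name__ == "__main__":
    R = "REPEAT"  # placeholder label rather than a real repeat sequence
    array = [R]
    for fragment in ["phageA", "phageB", "phageC"]:  # order of infection
        array = integrate_spacer(array, fragment, R)
    print(spacers_in_order(array))  # ['phageC', 'phageB', 'phageA']
```

Because each new spacer enters at the leader end, reading the spacers from the leader outwards lists the encounters from most to least recent.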

CRISPR RNA Biogenesis
The transcription and processing of the CRISPR array into small crRNAs involve subtype-specific processes and enzymes. In all types of CRISPR-Cas systems, the CRISPR locus is transcribed into a crRNA precursor (pre-crRNA), which is subsequently cleaved and processed by Cas proteins or cellular ribonucleases, yielding smaller units of mature crRNA [32]. Each mature crRNA features a single spacer sequence flanked by fragments of the repeat region.
In type I systems, a Cas6 variant processes the pre-crRNA into mature crRNA fragments. In type I-E, for example, each repeat forms a stem-loop after transcription that is recognised and cleaved by Cas6e (formerly Cse3), which remains attached to the 3′ end of the crRNA after cleavage [33]. Type III systems also require Cas6 for crRNA processing, even though their repeats do not form stem-loop structures [34].
Type IV CRISPR systems are rare and lack the proteins that other CRISPR systems use for adaptation, such as Cas1, Cas2 and Cas4 [14]. Unlike other class 1 systems, many type IV systems also lack Cas6, the protein that types I and III use to process pre-crRNA into mature crRNA. The type IV multi-subunit effector module is composed of Csf1 (the signature protein of this system), Cas5 and Cas7. Further experimental work is needed to understand how type IV systems acquire spacers and confer immunity.
Type II systems lack the cas6 gene and instead rely on host RNase III, Cas9 and a small trans-activating CRISPR RNA (tracrRNA). tracrRNA is partially complementary to the repeat sequence and contains three stem-loop hairpin structures [35,36]. After transcription, tracrRNA binds to pre-crRNA, forming double-stranded repeat regions that alternate with single-stranded spacers. Cas9 acts as a molecular anchor that stabilises the tracrRNA:pre-crRNA interaction, enabling recognition and cleavage of the pre-crRNA by RNase III [37].
In type V-A and type VI systems, Cas12a and Cas13, respectively, process pre-crRNA into mature crRNA without the need for tracrRNA [18,38]. In subtype VI-A, however, the processing step is not essential, since unprocessed pre-crRNA molecules can also guide target cleavage [39].
Both type II and type III systems require a further trimming step through a ruler-based mechanism for complete crRNA processing. Trimming occurs at the 5′ end in type II systems and at the 3′ end in type III systems [40].
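A minimal Python sketch of this processing step follows, assuming a single identical repeat and one cut per repeat at a fixed offset (`cut_offset`, a hypothetical parameter; the real cleavage position depends on the processing enzyme). Each returned fragment contains one spacer flanked by the 3′ part of one repeat and the 5′ part of the next, matching the description of mature crRNA above.

```python
def process_pre_crrna(pre_crrna: str, repeat: str, cut_offset: int) -> list[str]:
    """Cut once inside every repeat and return the fragments between cuts.

    cut_offset is the number of repeat nucleotides left upstream of each cut.
    The leader side of the first cut and the trailer side of the last cut are
    discarded, so each fragment holds exactly one spacer.
    """
    if not 0 < cut_offset < len(repeat):
        raise ValueError("cut_offset must fall inside the repeat")
    cut_sites = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cut_sites.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    # Each mature unit runs from one cut site to the next.
    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


if __name__ == "__main__":
    repeat = "GUUGCAAC"  # toy 8-nt repeat (RNA alphabet)
    pre = repeat + "AUCGAUCG" + repeat + "CCCCUUUU" + repeat
    for crrna in process_pre_crrna(pre, repeat, cut_offset=5):
        print(crrna)
    # AACAUCGAUCGGUUGC
    # AACCCCCUUUUGUUGC
```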

Interference
Upon infection, the mature crRNA molecules direct the subtype-specific interference machinery towards invading nucleic acids to enable the silencing of foreign genetic material.
In type I systems, a Cascade complex (CRISPR-associated complex for antiviral defence) is formed, consisting of a multiprotein backbone of different Cas subunits bound to the crRNA [41]. Cascade recognises the protospacer adjacent motif (PAM) in the invading molecule and unwinds the DNA, enabling the crRNA to pair with the complementary strand of the invading DNA. This pairing produces a triple-stranded R-loop, which in turn triggers the recruitment of Cas3, the signature protein of type I systems [42,43]. Cas3 cleaves the displaced ssDNA strand, which is not bound by the Cascade complex. Although this degradation handicaps the invader, it might not destroy the target completely; full degradation might be achieved by other cellular nucleases or by Cascade-independent Cas3 nuclease activity, which has been documented previously [44][45][46].
Type III systems resemble type I in that they also depend on multiprotein Cascade-like complexes that incorporate crRNA (Csm in subtype III-A and Cmr in III-B), although Cas6 is absent from these complexes [47]. Subtypes III-A and III-B share the signature protein of type III systems, Cas10, and are unique among interference mechanisms because they target both RNA and DNA substrates [48,49]. Interference by type III systems occurs while the target DNA is being transcribed, since the effector complex binds to the nascent ssRNA transcript. This binding enables Cas10-mediated cleavage of the target DNA and Cas7-mediated cleavage of the ssRNA at 6-nucleotide intervals [47,50]. Recent studies also suggest that Cas10 activates the non-specific RNase Csm6 by producing cyclic oligoadenylates from ATP. Although Csm6 is not part of the effector complex, once activated by these oligoadenylates it assists interference by degrading foreign transcripts non-specifically [51,52].
Akin to the biogenesis step, interference in type II CRISPR-Cas systems depends on both Cas9 and tracrRNA. During interference, Cas9 acts as an endonuclease guided by two RNAs, crRNA and tracrRNA, which pair because tracrRNA is complementary to the repeat sequence carried by crRNA, forming a dual-RNA complex (tracrRNA:crRNA) [37,53,54]. Binding of this dual-RNA structure induces conformational changes in Cas9, leading to its activation [54]. Once activated, the guide RNA-bound complex screens foreign genetic elements for the correct PAM, located on the strand opposite the target strand. Once the PAM is found, the dsDNA is unwound and the crRNA binds to the target strand, forming an R-loop and ultimately a blunt double-strand break, produced by the two nuclease domains of Cas9, RuvC and HNH, 3 nt upstream of the PAM [25,53,54].
Type V CRISPR systems depend on subtype-specific Cas12 proteins: Cas12a (formerly Cpf1), Cas12b and Cas12c for subtypes V-A, V-B and V-C, respectively [18]. These proteins bear some similarity to Cas9, as indicated by phylogenetic analysis and their shared bilobed structure [18,55,56]. After PAM recognition and crRNA binding to the target DNA, Cas12a and Cas12b cleave both strands of the DNA duplex asymmetrically, producing staggered breaks with 5-nt and 7-nt overhangs, respectively. However, unlike Cas9 or Cas12b, Cas12a does not depend on tracrRNA for cleavage and relies solely on crRNA [38]. The structure and activity of Cas12c still await further investigation.
The recently characterised type VI is defined by the presence of the Cas13 protein (formerly C2c2). This protein contains higher eukaryotes and prokaryotes nucleotide-binding (HEPN) domains, which are ubiquitous in RNases [16,18]. Cas13 is unique among class 2 systems in its ability to cleave ssRNA molecules complementary to the crRNA, an activity complemented by non-specific cleavage of other ssRNAs, similar to that of the Csm6 enzyme of type III systems [57,58]. Cleavage occurs preferentially before uridine (U) residues. A species-dependent protospacer flanking site (PFS), analogous to PAMs in DNA targets, is also relevant for the activation of Cas13 proteins [58]. Binding to crRNA induces conformational changes in Cas13 that promote pairing with the target ssRNA. Upon target binding, Cas13 RNase activity is triggered by the approximation of the catalytic sites of both HEPN domains [59]. Table 1 highlights the main particularities of each type of CRISPR system.
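The position of the Cas9 blunt cut relative to the PAM can be illustrated with the Python sketch below. It assumes the NGG PAM of SpCas9, a 20-nt protospacer and a scan of one strand only; `cas9_cut_sites` is a hypothetical helper, not part of any library.

```python
import re


def cas9_cut_sites(seq: str, protospacer_len: int = 20) -> list[int]:
    """Blunt cut positions for NGG PAMs found on the given strand.

    A value p means the break falls between seq[p - 1] and seq[p], i.e.
    3 nt upstream (5') of the first PAM base.
    """
    sites = []
    # Zero-width lookahead so overlapping PAMs (e.g. "AGGG") are all found.
    for match in re.finditer(r"(?=[ACGT]GG)", seq.upper()):
        pam_start = match.start()
        if pam_start >= protospacer_len:  # room for a full protospacer
            sites.append(pam_start - 3)
    return sites


if __name__ == "__main__":
    target = "GACGTTACGTACGATCGATC" + "TGG" + "AAAA"
    cut = cas9_cut_sites(target)[0]
    print(cut)  # 17
    print(target[:cut], target[cut:])
```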

CRISPR-Cas Systems as a Gene-Editing Tool
In a landmark paper published in June 2012, Jinek et al. laid the foundation for what would ultimately become a revolution in genome editing and transcriptional control [54]. Jinek and colleagues hypothesised that in type II systems, the dual guide RNA complex tracrRNA:crRNA of Cas9 could be fused into a single chimeric RNA by linking the 3′ end of crRNA to the 5′ end of tracrRNA. Such a design would allow programmed DNA cleavage through engineering of the chimeric RNA molecule, later designated single guide RNA (sgRNA) or simply guide RNA (gRNA). This hypothesis was confirmed using five different gRNA molecules designed to target the green fluorescent protein (GFP) gene.
Shortly thereafter, further discoveries would unravel the full potential of CRISPR as a tool for genetic editing. In early 2013, Jiang et al. used the CRISPR-Cas9 system to induce targeted mutations (insertions, deletions and single-nucleotide substitutions) in the genomes of Streptococcus pneumoniae and E. coli strains [60].
Later in the same year, Bikard et al. demonstrated how CRISPR could be used as a new tool to regulate gene expression by either activating or repressing the transcription of bacterial genes [61].
As experimentation with CRISPR became widespread, scientists moved from bacteria to other kinds of cells. Soon, many cell types and some multicellular organisms would be the object of CRISPR-mediated manipulation, including human cell cultures, mice, plants and yeasts, among many others [62][63][64][65].

Repurposing CRISPR for Genetic Engineering
Of the two CRISPR-Cas classes, class 2 systems are the more widely used by the scientific community, owing to the simplicity of their mechanism. Whereas class 1 systems require a convoluted multiprotein Cascade complex, class 2 systems depend only on small RNA molecules and a single, type-specific Cas protein [14].
As previously mentioned (see Interference), type II Cas9 systems rely solely on a dual crRNA:tracrRNA complex, which can readily be engineered into a single chimeric gRNA molecule [54]. gRNA molecules contain both a scaffold sequence that binds to Cas9 and a targeting sequence that directs the system towards the target locus [25]. As Cas9-gRNA screens for a potential target, the first 8-12 PAM-proximal bases of the gRNA's targeting sequence, known as the seed sequence, begin pairing with the target DNA in the 3′-5′ direction, provided a PAM is recognised [66]. While mismatches in the seed sequence terminate pairing and compromise Cas9 cleavage, mismatches towards the 5′ PAM-distal end do not always jeopardise Cas9 function [67]. Complementarity between the gRNA and the target sequence results in a double-strand break (DSB) in the DNA, catalysed by both nuclease domains of Cas9, HNH and RuvC (Figure 2).
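The asymmetric tolerance to mismatches described above can be expressed as a toy decision rule. In the Python sketch below, the 10-nt seed length and the limit of three tolerated PAM-distal mismatches are illustrative assumptions, not measured parameters, and both sequences are written 5′→3′ as DNA in the protospacer orientation, so the PAM-proximal seed sits at the end of the string. `mismatch_positions` and `predicts_cleavage` are hypothetical names.

```python
def mismatch_positions(guide: str, target: str) -> list[int]:
    """0-based positions at which two equal-length sequences differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    return [i for i, (g, t) in enumerate(zip(guide, target)) if g != t]


def predicts_cleavage(guide: str, target: str,
                      seed_len: int = 10, max_distal: int = 3) -> bool:
    """Toy rule: any seed mismatch blocks cleavage; a few distal ones are tolerated.

    Sequences are written 5'->3' with the PAM-proximal end last, so the seed
    occupies the final `seed_len` positions.
    """
    seed_start = len(guide) - seed_len
    mismatches = mismatch_positions(guide, target)
    seed_mm = [i for i in mismatches if i >= seed_start]
    distal_mm = [i for i in mismatches if i < seed_start]
    return not seed_mm and len(distal_mm) <= max_distal


if __name__ == "__main__":
    guide = "GACGTTACGTACGATCGATC"
    distal_hit = "T" + guide[1:]  # mismatch at the PAM-distal end
    seed_hit = guide[:-1] + "G"   # mismatch next to the PAM
    print(predicts_cleavage(guide, distal_hit))  # True
    print(predicts_cleavage(guide, seed_hit))    # False
```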
DSB repair is mediated either by non-homologous end joining (NHEJ) or by homology-directed repair (HDR). NHEJ is an error-prone mechanism in which the endogenous repair machinery rejoins the two ends of the DSB, a process aided when the bases at both ends share some degree of complementarity [68]. This pathway requires no repair template and is the main route by which Cas9-induced DSBs are repaired.
NHEJ can introduce small insertions or deletions (indels) at the DSB site, which may in turn cause frameshift mutations [69,70]. These mutations can be beneficial when the goal is to knock out the targeted gene, since frameshifting indels often create premature stop codons and consequently render the gene inoperative. However, NHEJ is a highly random and unpredictable process, unsuitable for single-base editing or for the insertion of specific sequences.
Homology-directed repair is a more precise route for repairing DSBs and incorporating specific sequences after Cas9 cleavage. In contrast to NHEJ, HDR requires a DNA template containing the sequence to be introduced, delivered to the cell along with Cas9 and the gRNA [71,72]. For HDR to succeed, both ends of the template must be homologous to the regions flanking the DSB. To prevent Cas9 from binding to and re-cleaving the inserted sequence, the PAM should be absent from the repair template. Because Cas9 cleaves efficiently and NHEJ is considerably more efficient than HDR, three populations of sequences coexist after editing: wild-type sequences, NHEJ-repaired sequences, and a smaller population carrying the intended HDR-repaired sequence [73]. Isolating and amplifying the desired sequence is therefore of utmost importance when relying on HDR in vitro.
Beyond mutations and gene editing, CRISPR systems have also been adapted to increase or reduce gene expression. By introducing mutations in both the RuvC and HNH catalytic domains of Cas9, scientists engineered a 'dead' Cas9 (dCas9) that can still bind DNA but has no cleavage activity [61]. Repression is achieved by guiding dCas9 with gRNA molecules complementary to a selected gene region. Binding to the targeted gene prevents transcription, apparently by sterically blocking RNA polymerase (RNAP) binding and activity. Both the initiation and elongation steps of transcription can be blocked in this way, depending on the gene region targeted by dCas9. Another notable finding was that the strength of repression can be tuned by weakening the gRNA/DNA interaction through mismatches introduced by mutations at the 5′ end of the crRNA. Transcriptional activation was achieved by fusing dCas9 to the omega (ω) subunit of RNAP: dCas9 is directed to the target region, where it recruits and activates RNAP, increasing transcription of the gene. Novel Cas9-based transcription modulators quickly followed, obtained by fusing dCas9 or gRNA to different activator or repressor elements, offering multiple degrees of modulation and specificity and enabling single or multiplexed transcriptional control [74][75][76]. Because dCas9 is catalytically inactive, the changes in gene expression it causes are transient, since the genomic DNA is not altered [61,77]. However, persistent modifications can be achieved by fusing dCas9 to acetyltransferases, histone or DNA demethylases, or methyltransferases, which alter histone acetylation/methylation or DNA methylation marks and can induce potentially heritable epigenetic changes in expression when dividing cells are targeted [78][79][80].
If the aim is to correct or introduce a point mutation requiring only a single base substitution, there are simpler methods that do not depend on a DNA template and are more efficient than HDR. Introducing an aspartate-to-alanine (D10A) mutation in the RuvC domain of Cas9 produces Cas9 nickase (Cas9n), which nicks the target DNA, creating single-strand breaks rather than DSBs [54,81]. Coupling Cas9n or dCas9, the catalytically "dead" variant of Cas9, with a cytidine deaminase enables gRNA-guided deamination of cytosine (C) bases in the non-target DNA strand into uracil (U), which has the base-pairing properties of thymine (T) [82]. Through endogenous DNA replication or repair, the U is replaced by T, creating a C→T (or G→A) substitution without inducing a DSB. In that study [82], Komor et al. reported that their enhanced third-generation base editors achieved >30-fold greater editing efficiency than HDR-mediated Cas9 editing in various human cell lines, with less indel formation. Later generations of base editors built upon this system to increase efficiency and reduce off-target indel formation [83][84][85].
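The link between indel length, frameshifts and premature stop codons can be checked with a small Python sketch. The coding sequence is a made-up example, reading starts at the first base, and only the standard stop codons TAA, TAG and TGA are considered; `apply_indel`, `is_frameshift` and `first_stop_codon` are hypothetical names.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_indel(cds: str, pos: int, deleted: int = 0, inserted: str = "") -> str:
    """Delete `deleted` bases starting at `pos`, then insert `inserted` there."""
    return cds[:pos] + inserted + cds[pos + deleted:]


def is_frameshift(deleted: int, inserted: str = "") -> bool:
    """An indel shifts the frame unless its net length is a multiple of 3."""
    return (len(inserted) - deleted) % 3 != 0


def first_stop_codon(cds: str) -> Optional[int]:
    """Start index of the first in-frame stop codon, reading from index 0."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    wild_type = "ATGCTGACCAAAGGGTAA"  # toy ORF: ATG CTG ACC AAA GGG TAA
    mutant = apply_indel(wild_type, pos=3, deleted=1)  # 1-bp deletion
    print(is_frameshift(1))             # True
    print(first_stop_codon(wild_type))  # 15
    print(first_stop_codon(mutant))     # 3 (ATG TGA ...: premature stop)
```

A 3-bp deletion at the same position would remove one codon without shifting the frame, which is why not every indel produces a knockout.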
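The PAM-removal rule for repair templates can be sketched in Python as follows, assuming an SpCas9 NGG PAM on the same strand as the protospacer. Changing the first G of the PAM to C is an arbitrary illustrative choice; when the PAM lies in a coding region, a change that preserves the encoded amino acid is usually preferred. `has_cas9_site` and `block_pam` are hypothetical helpers.

```python
import re


def _site_pattern(protospacer: str) -> str:
    # Protospacer immediately followed by an NGG PAM on the same strand.
    return re.escape(protospacer) + r"[ACGT]GG"


def has_cas9_site(seq: str, protospacer: str) -> bool:
    """True if the protospacer is followed by an intact NGG PAM."""
    return re.search(_site_pattern(protospacer), seq) is not None


def block_pam(donor: str, protospacer: str) -> str:
    """Return the donor with the first G of the PAM changed to C."""
    match = re.search(_site_pattern(protospacer), donor)
    if match is None:
        return donor  # no intact site: nothing to change
    g_index = match.end() - 2  # first G of NGG
    return donor[:g_index] + "C" + donor[g_index + 1:]


if __name__ == "__main__":
    protospacer = "GACGTTACGTACGATCGATC"
    donor = "AAAA" + protospacer + "TGG" + "CCCC"  # toy donor carrying the site
    blocked = block_pam(donor, protospacer)
    print(has_cas9_site(donor, protospacer))    # True
    print(has_cas9_site(blocked, protospacer))  # False
```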
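The outcome of cytidine base editing can be simulated with the Python sketch below. It assumes an editing window at protospacer positions 4 to 8, counted from the PAM-distal end, which approximates the window reported for early base editors but varies between editors; `base_edit` is a hypothetical function, and it converts every C inside the window, whereas real editing is probabilistic.

```python
def base_edit(protospacer: str, window: tuple[int, int] = (4, 8)) -> str:
    """Simulate C->T conversion by a cytidine base editor.

    `protospacer` is the 20-nt non-target strand written 5'->3', so position 1
    is PAM-distal. Every C inside the inclusive window becomes T (deamination
    to U, then replication or repair to T).
    """
    start, end = window
    return "".join(
        "T" if base == "C" and start <= pos <= end else base
        for pos, base in enumerate(protospacer, start=1)
    )


if __name__ == "__main__":
    site = "CCCCCCCCCCAAAAAAAAAA"
    print(base_edit(site))  # CCCTTTTTCCAAAAAAAAAA
```

On the complementary strand the same event reads as G→A, which is why the text describes the edit as C→T (or G→A).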

Advantages of CRISPR Relative to Other Techniques
CRISPR is the latest addition to a set of gene-editing tools that keeps evolving and producing new possibilities in the field of genome engineering. Zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) are the other widely used approaches.
ZFNs and TALENs both result from fusing the DNA cleavage domain of the non-specific FokI restriction endonuclease to DNA-binding elements that direct the enzyme to the desired locus. In ZFNs, the FokI cleavage domain is coupled to an array of zinc fingers, each recognising and binding a nucleotide triplet [86,87]. A pair of ZFNs designed to bind opposite strands with a 6-bp offset produces a DSB in the targeted DNA, which can then be repaired by NHEJ or HDR [88]. TALENs, on the other hand, were engineered by fusing the FokI domain to transcription activator-like (TAL) effectors, proteins from plant pathogens whose DNA-binding domains (DBDs) contain tandem repeats of 33-35 amino acids [89,90]. Each repeat binds a single nucleotide, and the amino acid residues at positions 12 and 13 of the repeat (the repeat-variable diresidue, RVD) determine which nucleotide is bound [91]. TALENs can therefore be directed towards a target locus by programming DBDs that recognise DNA sequences 15-20 bp long, producing a DSB when a pair of TALENs is used [92].
These two methods have been used for gene insertion, deletion and modulation in multiple species and cell lines [93][94][95][96][97]. Nevertheless, both techniques have drawbacks that CRISPR/Cas systems overcome (Table 2). First, whereas ZFNs and TALENs require custom-made proteins to guide the enzyme towards its target, Cas9 systems depend solely on the engineering of short gRNA molecules, avoiding laborious and costly protein engineering and validation and thereby saving time and resources. The need for proteins tailored to each gene also makes multiplex gene editing a strenuous task for ZFNs and TALENs, whereas with Cas9 systems multiple genes can be targeted simply by delivering multiple gRNAs to the cells [63,98]. Additionally, because ZFNs and TALENs work as dimers, their large combined size hinders the use of delivery systems with limited loading capacity, such as adeno-associated virus vectors [99]. The markedly high efficiency of CRISPR/Cas systems, together with the advantages above, explains the CRISPR "epidemic" that the scientific community experienced in 2013 [62,81,100,101].
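The one-repeat-per-nucleotide logic of TALE DNA-binding domains can be sketched as a lookup table. The Python example below uses the commonly cited RVD assignments NI→A, HD→C, NN→G and NG→T (NN can also bind A); the function names and the half-site are hypothetical, and real TALEN design involves further constraints not modelled here.

```python
# Commonly cited repeat-variable diresidue (RVD) code; NN also binds A.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def design_rvds(target: str) -> list[str]:
    """One TALE repeat per target nucleotide, in 5'->3' order."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def predicted_target(rvds: list[str]) -> str:
    """Read the preferred target sequence back from an RVD array."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)


if __name__ == "__main__":
    half_site = "TCCAGGTGACAGCTA"  # hypothetical 15-bp TALEN half-site
    rvds = design_rvds(half_site)
    print(len(rvds), "repeats:", " ".join(rvds))
    print(predicted_target(rvds) == half_site)  # True
```

The contrast with Cas9 is that changing the target here means rebuilding a protein array, whereas Cas9 only needs a new 20-nt guide sequence.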

Limitations of CRISPR Systems
Despite the advantages described above, some limitations need to be addressed before CRISPR can be used in therapeutic and clinical applications on a larger scale.
DNA-targeting CRISPR-Cas systems require a short PAM adjacent to the target sequence; for Cas9, it lies next to the 3′ end of the target sequence [24,37]. For example, the most common Cas9, SpCas9, recognises NGG motifs, so only sequences adjacent to such a motif can be targeted [60]. This requirement limits the use of CRISPR when no suitable PAM exists near the locus one wishes to target. However, several factors attenuate this obstacle: NGG motifs occur rather frequently, on average every 8 bp in the human genome [81,102]; SpCas9 can also recognise NAG motifs, albeit with lower efficiency [60]; and Cas9 proteins from different bacteria recognise other PAMs [26], so researchers can pick the system that best suits their needs. Hu and colleagues recently engineered an SpCas9 variant, xCas9, that recognises additional PAMs, such as NG, GAA and GAT, while displaying significantly lower off-target effects [103].
One of the most significant hurdles to CRISPR adoption is its propensity to generate off-target effects. While TALEN target sites can be ~30 nt long, making them unique in the genome and lowering the chance of mismatches [104,105], Cas9 is guided by a 20-nt portion of the gRNA and retains cleavage activity even with 3-5 mismatches at the PAM-distal end of the gRNA [59,81,102]. Off-target binding and cleavage can result in collateral mutagenesis through error-prone NHEJ repair of the resulting DSBs [106,107]. Off-target mutations can have harmful consequences, such as activation of oncogenes or silencing of tumour suppressor genes [108]. New CRISPR variants that minimise off-target effects are discussed in the next section.
Some aspects of HDR can also limit CRISPR's efficiency. For precise gene editing with CRISPR to become a reliable therapeutic option, HDR needs to be the main repair mechanism after Cas9-mediated cleavage. Because HDR occurs at a low rate [109] and is only readily available in dividing cells [110,111], it needs to become more robust and flexible before it can be used in disease therapy. Techniques such as cell-cycle synchronisation, the use of single-stranded oligonucleotide DNA repair templates [112] and inhibition of the NHEJ pathway [113] have been shown to increase the efficiency of HDR after Cas9 cleavage. Researchers have also developed homology-independent targeted integration (HITI) as an alternative to HDR, a technique that allows DNA integration in both dividing and non-dividing cells, in vitro and in vivo [114,115].
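The PAM-frequency figure can be checked with a short Python sketch. With uniform base composition (an idealisation; real genomes deviate from it), a given position starts an NGG on the forward strand with probability 1/4 × 1/4 = 1/16, and the same holds for the reverse strand (read as CCN on the forward strand), giving on average one PAM every 1/(2/16) = 8 bp. `count_spcas9_pams` and `expected_pam_spacing` are hypothetical helpers.

```python
import re


def count_spcas9_pams(seq: str) -> int:
    """Count NGG PAMs on both strands of a DNA sequence.

    An NGG on the reverse strand reads as CCN on the forward strand.
    Lookaheads let overlapping motifs (e.g. "AGGG") be counted separately.
    """
    seq = seq.upper()
    forward = len(re.findall(r"(?=[ACGT]GG)", seq))
    reverse = len(re.findall(r"(?=CC[ACGT])", seq))
    return forward + reverse


def expected_pam_spacing(p_g: float = 0.25, p_c: float = 0.25) -> float:
    """Mean spacing (bp) between PAMs in a random sequence, both strands."""
    per_position = p_g * p_g + p_c * p_c  # P(xGG) + P(CCx) at one position
    return 1 / per_position


if __name__ == "__main__":
    print(expected_pam_spacing())         # 8.0
    print(count_spcas9_pams("AGGGTCCA"))  # 3
```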
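A naive off-target search can be sketched in Python by comparing the guide against every NGG-adjacent window of the same length and reporting those within a mismatch budget. This is only an illustration: it scans one strand, ignores bulges and non-canonical PAMs such as NAG, and the threshold of three mismatches is an assumption; dedicated off-target prediction tools are far more thorough. `find_candidate_sites` is a hypothetical name.

```python
import re


def find_candidate_sites(genome: str, guide: str,
                         max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (start, mismatches) for NGG-adjacent sites within the budget.

    `guide` is written as the protospacer DNA sequence; only the given
    strand is scanned and insertions/deletions are not considered.
    """
    n = len(guide)
    hits = []
    for match in re.finditer(r"(?=[ACGT]GG)", genome):
        start = match.start() - n
        if start < 0:
            continue  # not enough sequence upstream of this PAM
        site = genome[start:match.start()]
        mismatches = sum(a != b for a, b in zip(site, guide))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"
    off_target = "T" + guide[1:]  # one PAM-distal mismatch
    genome = guide + "TGG" + "AAAA" + off_target + "AGG"
    print(find_candidate_sites(genome, guide))  # [(0, 0), (27, 1)]
```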

Novel and Enhanced CRISPR/Cas Systems
With these limitations in mind, researchers have worked to improve the specificity, efficiency and consistency of CRISPR systems even further.

Cas12a (Cpf1)
As established previously (see Section 4.3 Interference), type V Cas12a resembles Cas9 in that a single RNA-guided effector protein generates DSBs, which places it among Class 2 CRISPR systems [18,38]. However, Cas12a needs only a crRNA to guide it to its target, rather than the crRNA:tracrRNA dual guide of Cas9. It also produces staggered cuts with 5-nt 5′ overhangs instead of the blunt cuts generated by Cas9 [116]. Furthermore, Cas9 enzymes recognise G-rich PAMs, whereas Cas12a preferentially recognises T-rich PAMs, and engineered versions of Cas12a have recently broadened the range of PAMs it can recognise [117]. Cas12a has two further advantages over Cas9. First, it has lower mismatch tolerance, which reduces off-target effects [118]. Second, it processes its own pre-crRNA through an intrinsic RNase activity. This facilitates multiplex gene editing, because a single pre-crRNA transcript delivered to the cell is cleaved by Cas12a into several crRNAs targeting different genes [119]. The overhangs left after Cas12a cleavage also facilitate HDR, as staggered cuts are preferentially repaired through this pathway rather than NHEJ [120]. The Cas12a orthologues from Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006, AsCpf1 and LbCpf1 respectively, show on-target efficiency in human cells similar to that of SpCas9 [121].
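The multiplexing advantage of pre-crRNA self-processing can be pictured as string manipulation. Several spacers are concatenated with repeats into one transcript, which is then cut back into individual crRNAs. The sketch below is idealised and hypothetical. The repeat `GGUUCC` is a made-up placeholder, not a real Cas12a direct repeat. The cut is also placed just before each repeat, whereas Cas12a actually cleaves within the repeat, leaving part of it as the 5′ handle of the mature crRNA.

```python
def process_array(pre_crrna: str, repeat: str) -> list:
    """Idealised Cas12a-style processing of a CRISPR array.

    Cuts immediately before every copy of `repeat`, so each product is one
    repeat followed by its spacer. (Real Cas12a cuts within the repeat.)
    """
    parts = pre_crrna.split(repeat)
    # split() yields an empty string before the first repeat; drop it.
    return [repeat + spacer for spacer in parts[1:]]


def build_array(spacers: list, repeat: str) -> str:
    """Join repeat-spacer units into one pre-crRNA (5'->3')."""
    array = "".join(repeat + sp for sp in spacers)
    # Guard against a repeat that also occurs inside or across spacers,
    # which would make the idealised processing ambiguous.
    if process_array(array, repeat) != [repeat + sp for sp in spacers]:
        raise ValueError("repeat sequence is ambiguous within this array")
    return array


REPEAT = "GGUUCC"  # hypothetical placeholder repeat
spacers = ["AAAAAA", "CACACA", "UUUUUU"]  # hypothetical spacers, one per target gene
array = build_array(spacers, REPEAT)
print(process_array(array, REPEAT))
```

The point of the design is that one transcript yields one crRNA per target, so three genes are addressed with a single delivered construct.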

Cas13a (C2c2)
The most recent addition to the CRISPR family is particularly distinctive compared with its counterparts. Type VI Cas13a is also a Class 2 CRISPR system, but it cleaves RNA exclusively, through the activity of its two HEPN domains [58]. Cas9 and Cas12a, by contrast, cleave DNA. Like Cas12a, Cas13a can process its own crRNA, so multiple loci can be targeted with one pre-crRNA template [122]. The RNA-cleaving properties of Cas13a can be harnessed for post-transcriptional repression with efficiency comparable to RNA interference (RNAi) [123,124]. Cas13a is, however, more specific and can cleave nuclear transcripts, which RNAi does only minimally [122]. Because of alternative splicing, one gene can give rise to several mRNA isoforms, so targeting the DNA with CRISPR affects all of them. Cas13a makes it possible to target a single isoform specifically, either to study its function or to block its effect without hampering the activity of the other isoforms [125]. Cas13a can also target pre-mRNA, which could be useful in diseases caused by mis-splicing [126], since the enzyme can act before the defective splicing occurs. However, Cas13a has shown indiscriminate (collateral) RNA cleavage after target recognition [57], which could limit its usefulness as a therapeutic agent. A recent study found no such effect when the Leptotrichia wadei orthologue, LwaCas13a, was used in mammalian cells [122]. This suggests that the collateral activity might be absent or undetectable in eukaryotic cells.
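Isoform-specific targeting with Cas13a amounts to choosing a target window that exists in only one transcript, typically an exon-exon junction created by alternative splicing. The sketch below checks whether such a junction is unique to one isoform and, if so, returns a spacer complementary to it. The exon sequences, the isoform structures and the `flank` length are hypothetical. Real spacer lengths and design rules depend on the Cas13 orthologue used.

```python
def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U)."""
    return seq.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def isoform_specific_spacer(exons, isoforms, target, left, right, flank):
    """Return a spacer against the `left`|`right` exon junction if that
    junction occurs in the `target` isoform only, otherwise None.

    exons:    {exon_name: RNA sequence}
    isoforms: {isoform_name: [exon_name, ...] in transcript order}
    flank:    nt taken from each side of the junction
    """
    window = exons[left][-flank:] + exons[right][:flank]
    transcripts = {
        name: "".join(exons[e] for e in order) for name, order in isoforms.items()
    }
    if window not in transcripts[target]:
        return None
    if any(window in seq for name, seq in transcripts.items() if name != target):
        return None
    # The crRNA spacer base-pairs with the target RNA: use the reverse complement.
    return revcomp_rna(window)


# Hypothetical gene: iso2 skips exon e2, creating a unique e1|e3 junction.
exons = {"e1": "AAAAACCCCC", "e2": "GGGGGUUUUU", "e3": "CACACGUGUG"}
isoforms = {"iso1": ["e1", "e2", "e3"], "iso2": ["e1", "e3"]}
print(isoform_specific_spacer(exons, isoforms, "iso2", "e1", "e3", 4))  # UGUGGGGG
```

Targeting an exon shared by both isoforms would silence both. The junction window is what confers isoform selectivity here.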

Cas9n
When a gene knockout is not the goal, the NHEJ pathway only competes with the desired HDR repair of DSBs. As previously stated (see Section 5 Repurposing CRISPR for Genetic Engineering), introducing a specific mutation in the RuvC domain of Cas9 creates a Cas9 nickase variant (Cas9n). Cas9n nicks the target DNA, producing single-strand breaks rather than DSBs [54,81]. Nicks are preferentially repaired through base excision repair [127], so Cas9n can improve the outcome of HDR-based editing by reducing the number of indel mutations that result from unwanted NHEJ repair. Nickases can also further increase the specificity of Cas9-directed genome editing. Researchers engineered a double-nicking scheme in which a pair of Cas9n enzymes targets opposite strands, with the neighbouring gRNA targets offset by a defined number of base pairs [101]. The paired nicks produce a DSB with gRNA-defined overhangs. Combined with HDR, this can yield highly specific gene edits, and repaired through NHEJ it can create precise deletions in critical alleles [128,129]. Double nicking markedly increases specificity. Even if one Cas9n acts off-target, the resulting single nick is readily repaired by high-fidelity base excision repair [127]. In contrast, an off-target blunt DSB made by wild-type Cas9 can lead to undesired mutations when repaired by NHEJ. However, this method has the drawback of requiring the design and simultaneous delivery of two distinct gRNAs.
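Candidate guide pairs for the double-nicking scheme can be found by pairing an NGG site on the bottom strand with an NGG site on the top strand. On the top strand, the bottom-strand site appears as `CCN`, and in a valid pair the two PAMs point away from each other. This is a minimal sketch under explicit assumptions. The helper names are hypothetical. The `gap` (bases separating the two protospacers) is a definition chosen here and may not match the offset convention used in the primary literature. The default 0 to 20 bp window is an assumption, not a published rule.

```python
import re


def plus_sites(seq: str, L: int = 20) -> list:
    """Start of each L-nt protospacer followed by NGG on the top strand."""
    return [m.start() for m in re.finditer(rf"(?=[ACGT]{{{L}}}[ACGT]GG)", seq)]


def minus_sites(seq: str, L: int = 20) -> list:
    """Top-strand start of each 'CCN' followed by L nt, i.e. a site whose
    NGG PAM and protospacer lie on the bottom strand."""
    return [m.start() for m in re.finditer(rf"(?=CC[ACGT][ACGT]{{{L}}})", seq)]


def pam_out_pairs(seq: str, min_gap: int = 0, max_gap: int = 20, L: int = 20):
    """Pair a bottom-strand site (left) with a top-strand site (right) so
    that both PAMs face outwards. Returns (minus_start, plus_start, gap)."""
    seq = seq.upper()
    pairs = []
    for j in minus_sites(seq, L):
        left_end = j + 3 + L  # first base after the bottom-strand protospacer
        for i in plus_sites(seq, L):
            gap = i - left_end  # bases between the two protospacers
            if min_gap <= gap <= max_gap:
                pairs.append((j, i, gap))
    return pairs


# Hypothetical locus: CCN + 20 nt, a 5-nt spacer region, then 20 nt + NGG.
locus = "CCA" + "T" * 20 + "ACACA" + "A" * 20 + "TGG"
print(pam_out_pairs(locus))  # [(0, 28, 5)]
```

Each returned pair corresponds to two gRNAs that would nick opposite strands close enough to produce the staggered DSB described above.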

dCas9
The repurposing of Cas9 into tools that modulate gene expression was addressed in more detail earlier in this paper (see Section 5 Repurposing CRISPR for Genetic Engineering). When both the RuvC and HNH catalytic domains of Cas9 are inactivated by point mutations (D10A and H840A in SpCas9), the protein can no longer cleave DNA but still binds targeted sequences [61,77]. Research has shown that this catalytically inactive Cas9 (dCas9) can repress transcription on its own. It presumably does so either by blocking the binding of RNA polymerase to promoter sequences targeted by dCas9, or by halting elongation when the target sequence lies within an open reading frame.
dCas9 can be further modified in several ways. Fusing it to direct or indirect transcriptional activators (such as VP64) increases the expression of a specific DNA sequence, while fusing it to transcriptional repressors (such as KRAB) strengthens dCas9-mediated transcriptional inhibition [130,131]. Modulating gene expression with dCas9 is transient, as it does not permanently alter the genomic DNA. However, specific and long-lasting changes in gene expression are possible by fusing epigenetic modifiers to dCas9 [78]. Brocken et al. have reviewed in detail the most recent advances and strategies for epigenetic modification and transcriptional regulation using dCas9 [132].

eSpCas9, SpCas9-HF1, and HypaCas9
A different approach to improving CRISPR targeting specificity relies on modifying the interactions between Cas9 and the bound DNA strands. Slaymaker et al. hypothesised that Cas9 cleavage is more efficient when the target and non-target strands stay stably separated. Destabilising this separation at unintended sites should therefore reduce off-target effects [133]. When Streptococcus pyogenes Cas9 (SpCas9) binds its target site, two kinds of interactions keep the strands separated. The first is base pairing between the gRNA and the target strand. The second is a positively charged groove, formed by the HNH and RuvC domains, that interacts non-specifically with the negatively charged non-target strand [36]. Reducing the positive charge in this groove weakens its interaction with the non-target strand and favours re-hybridization of the target and non-target strands. Strict base pairing between the gRNA and the target DNA then becomes necessary to keep the strands apart, so off-target effects are reduced. To weaken the groove interactions, the researchers engineered SpCas9 mutants carrying alanine substitutions of positively charged residues. This yielded two "enhanced specificity" SpCas9 variants, eSpCas9(1.0) and eSpCas9(1.1), which showed on-target efficiency similar to WT SpCas9 but significantly lower off-target cleavage.
Also focusing on the interaction between Cas9 and the target locus, Kleinstiver et al. developed the high-fidelity SpCas9-HF1, a variant whose off-target cleavage was undetectable by genome-wide methods [134]. Rather than disrupting the non-target strand interactions, Kleinstiver and colleagues modified four SpCas9 residues that form hydrogen bonds with the phosphate backbone of the target strand. This made target binding less tolerant of gRNA:DNA mismatches. Alanine substitutions at all four residues produced SpCas9-HF1, which, like eSpCas9, retained on-target activity comparable to WT SpCas9 with minimal off-target effects.
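The reasoning behind eSpCas9 and SpCas9-HF1 can be captured with a toy energy model. Suppose Cas9 binds with more energy than it needs to cleave. A mismatched site may then still clear the threshold, and trimming non-specific DNA contacts removes that surplus. All numbers below are arbitrary units chosen for illustration, and `binding_energy` and `cleaves` are hypothetical helpers. This is a conceptual sketch, not a biophysical model of either variant.

```python
def binding_energy(n_mismatch: int, nonspecific: float, n: int = 20,
                   per_match: float = 1.0, per_mismatch: float = 1.0) -> float:
    """Toy energy (arbitrary units): guide:DNA pairing plus non-specific contacts."""
    matches = n - n_mismatch
    return per_match * matches - per_mismatch * n_mismatch + nonspecific


def cleaves(n_mismatch: int, nonspecific: float, threshold: float = 22.0) -> bool:
    """Cleavage occurs only if the toy energy reaches the threshold."""
    return binding_energy(n_mismatch, nonspecific) >= threshold


for label, contacts in (("WT-like", 4.0), ("high-fidelity-like", 2.0)):
    print(label, [cleaves(k, contacts) for k in range(4)])
# WT-like [True, True, False, False]
# high-fidelity-like [True, False, False, False]
```

When the surplus is present, one mismatch is still tolerated. When the non-specific term is reduced, only the perfect match is cleaved, and on-target cleavage is preserved.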
More recently, Chen et al. used single-molecule Förster resonance energy transfer (smFRET) to investigate how SpCas9-HF1 and eSpCas9(1.1) discriminate between targets [135]. They found that both variants become trapped in an inactive conformation after binding mismatched sequences. They also characterised the role of REC3, a non-catalytic domain of Cas9 that recognises target complementarity and regulates HNH catalytic activity. Building on these findings, they introduced mutations in the REC3 domain to produce a hyper-accurate Cas9 variant (HypaCas9). HypaCas9 showed the same on-target efficacy as WT Cas9 and specificity similar to or better than that of SpCas9-HF1 and eSpCas9(1.1).

Delivering CRISPR Systems into the Cell
A key requirement for CRISPR to work is the successful delivery of its components to the cells that are meant to be altered. Different strategies and techniques have been developed. Some give better results for in vitro or in vivo research applications, while others are more promising for therapeutic and clinical use.
Traditional physical methods such as microinjection [136,137] and electroporation [138,139] have been used successfully with CRISPR to engineer embryonic stem cells and zygotes that later give rise to genetically modified animals. However, microinjection is an invasive method that can damage the cell. It also requires each cell to be injected individually, which makes it laborious and time-consuming [140]. Electroporation is less invasive and allows many cells to be edited at the same time [139], but both techniques are only suitable for in vitro use. Hydrodynamic injection is another physical delivery method that has been used to modify liver cells with CRISPR in vivo [141,142]. Although these techniques are widely used in research labs, their shortcomings, combined with low efficiency, limit their use in human gene therapy [143,144].
Viral vectors are a versatile delivery option, with diverse in vitro and in vivo applications depending on the chosen vector. For example, adeno-associated viral (AAV) vectors have been used in a dual-cassette system to deliver up to three plasmid-encoded sgRNAs to the same cell. This approach has been used to study gene function in vivo by multiplex gene editing [145] and to create disease models [146]. Each AAV serotype has a different tissue tropism, so the encapsulated system can be directed towards the tissue of interest by choosing the most suitable serotype [147]. Viral vectors can also be used for genome-wide screening of gene function through lentiviral gRNA libraries [148,149]. Viral vectors are one of the main delivery vehicles for gene therapy in clinical trials. Their usefulness and high efficiency can, however, be offset by several factors, such as immunogenicity, limited insert capacity (AAV), carcinogenesis (mainly lenti- and retroviruses), and off-target effects (lentiviruses) [99,150–152].
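The limited insert capacity of AAV can be made concrete with simple arithmetic on coding-sequence lengths. SpCas9 is 1,368 amino acids long, and the smaller Staphylococcus aureus Cas9 (SaCas9) is 1,053 amino acids long. The 4.7 kb packaging limit used below is an approximate round figure. The sizes of the promoter, tags, polyadenylation signal and sgRNA cassette are hypothetical placeholders, not values from any specific construct.

```python
# Approximate AAV packaging limit in base pairs (assumed round figure).
AAV_LIMIT_BP = 4700

# Hypothetical sizes (bp) of the non-nuclease elements of one cassette.
OTHER_ELEMENTS = {"promoter": 600, "nls_and_tags": 100, "polyA": 200, "u6_sgrna": 350}


def orf_bp(n_aa: int) -> int:
    """Coding length in bp: three bases per codon plus a stop codon."""
    return 3 * n_aa + 3


def cassette_bp(nuclease_aa: int, other: dict) -> int:
    """Total cassette length: nuclease ORF plus all other elements."""
    return orf_bp(nuclease_aa) + sum(other.values())


for name, aa in (("SpCas9", 1368), ("SaCas9", 1053)):
    total = cassette_bp(aa, OTHER_ELEMENTS)
    verdict = "fits" if total <= AAV_LIMIT_BP else "exceeds limit"
    print(f"{name}: {total} bp ({verdict})")
# SpCas9: 5357 bp (exceeds limit)
# SaCas9: 4412 bp (fits)
```

Under these assumptions, an all-in-one SpCas9 cassette overshoots the limit while SaCas9 fits. This is one reason smaller Cas9 orthologues, or designs that split components across vectors, are attractive for AAV delivery.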
Recently developed non-viral vectors have shown promise as alternatives to viral and physical methods. For in vitro and ex vivo use, for example, cell-penetrating peptides (CPPs) conjugated with Cas9 and complexed with gRNA enabled efficient gene silencing in human cell lines and disease models, with fewer off-target effects than plasmid transfection [153,154]. Another option for in vitro and ex vivo CRISPR delivery is cationic arginine gold nanoparticles (ArgNPs) combined with engineered Cas9. These enhanced the cytosolic delivery of Cas9-gRNA and achieved an editing efficiency of 30% [155]. In vivo, DNA-conjugated gold nanoparticles complexed with endosomal-disruptive polymers were used to deliver Cas9, gRNA and a DNA template to mice with Duchenne muscular dystrophy. The treatment corrected the mutation that causes this congenital myopathy through homology-directed repair [156]. Lipid nanoparticles (LNPs) are another way to deliver CRISPR systems to cells, with the advantages of being biodegradable and well tolerated. One such LNP-mediated delivery system, LNP-INT01, was used with CRISPR to edit the Ttr gene, the gene implicated in transthyretin (TTR)-mediated amyloidosis, a disease caused by the accumulation of amyloid deposits [157]. A single dose of LNP-INT01 reduced serum TTR by more than 97%.

Applications of CRISPR-Cas Systems
The versatility and ease of use of the CRISPR methodology, combined with the constant developments to mitigate its flaws, have gathered the attention of researchers from all fields of science who look for ways in which they can use CRISPR to improve and hasten their scientific endeavours.

Oncology
CRISPR can be used to dissect the diverse genetic and epigenetic factors involved in cancer and tumorigenesis. Researchers have used HDR- or NHEJ-mediated silencing or knock-in of oncogenes and tumour suppressor genes, in vitro, ex vivo or in vivo, to create cell lines and animal models of certain cancers [142,146]. The same approach has been used to study the impact of specific genes on disease progression [158,159]. Therapeutic uses of CRISPR in cancer are also being investigated. For example, in vivo CRISPR-mediated knockout of NANOG and NANOGP8, genes involved in prostate cancer, decreased tumorigenic potential in mice [160]. Similarly, Cas9-mediated knockout of MDR1 in osteosarcoma cells reduced their resistance to chemotherapeutic agents [161]. Clinical trials using CRISPR as a therapeutic agent in cancer are currently underway. In China, researchers from Sichuan University are studying CRISPR-Cas9 ex vivo engineering of autologous human T-cells as a treatment for metastatic lung cancer (ClinicalTrials.gov Identifier: NCT02793856). In the United States, another clinical trial based on CRISPR-mediated editing of human T-cells is already recruiting patients with multiple myeloma, sarcoma or melanoma (ClinicalTrials.gov Identifier: NCT03399448).

Genetic Diseases
Correcting diseases that arise from genetic aberrations is one of the most obvious uses of CRISPR, because it can make specific changes in the genome. Such diseases include cystic fibrosis, caused by mutations in the CFTR gene [162], and sickle cell disease. Sickle cell disease results from inheriting two dysfunctional copies of the β-globin (HBB) gene, at least one of which carries the sickle hemoglobin (HbS) mutation [163]. These inherited, chronic diseases shorten patients' life expectancy and reduce their quality of life [164,165]. They can only be managed with symptomatic treatments, and the only potentially curative option currently available is stem cell transplantation for sickle cell disease [166]. Using CRISPR, researchers corrected mutant intestinal stem cells from cystic fibrosis patients and restored their function in vitro [167], a first step towards CRISPR-based gene therapy for cystic fibrosis. Sickle cell disease has also seen promising breakthroughs. Li and colleagues corrected the disease-causing mutations in pluripotent stem cells through HDR-mediated Cas9 editing without noticeable off-target effects [168]. Applications for clinical trials using CRISPR/Cas9 systems to treat sickle cell disease have already been submitted, and the trials are set to start in 2019 (ClinicalTrials.gov Identifier: NCT03745287).
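At the sequence level, the sickle mutation is a single A-to-T change in codon 6 of HBB, turning GAG (glutamate) into GTG (valine). HDR-based correction supplies a donor template that carries the wild-type base between two homology arms. The sketch below builds such a template. The flanking sequence is hypothetical, and `hdr_template` is an illustrative helper. Real donor design also has to consider arm length, strand choice and preventing Cas9 from re-cutting the repaired allele.

```python
def hdr_template(mutant: str, pos: int, wt_base: str, arm: int = 40) -> str:
    """Build a repair template: `arm` nt of homology on each side of `pos`,
    with the wild-type base restored at `pos`."""
    if mutant[pos] == wt_base:
        raise ValueError("position already carries the wild-type base")
    left = mutant[max(0, pos - arm):pos]
    right = mutant[pos + 1:pos + 1 + arm]
    return left + wt_base + right


# Hypothetical flanks around a sickle-like GTG codon (not the real HBB sequence).
flank_left, codon, flank_right = "ACGTACGTAC", "GTG", "CATGCATGCA"
mutant = flank_left + codon + flank_right
pos = len(flank_left) + 1  # middle base of the codon: the T in GTG
template = hdr_template(mutant, pos, "A", arm=5)
print(template)  # GTACGAGCATG, with GAG restored in the middle
```

After HDR, the cell copies the template sequence into the genome, so the mutant GTG codon reads GAG again.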

Viral Diseases
CRISPR systems can be used to combat viral diseases by disrupting viral replication and restoring infected cells to normal. Cas9 has been used to target conserved regions of the hepatitis B virus (HBV) genome that are responsible for viral persistence and replication and are not directly targeted by current antiviral therapies [169]. Several research groups targeted this core region of the HBV genome with Cas9, both in vivo and in vitro. They successfully suppressed the virus, with significant and long-lasting reductions in viral load and antigen production, both of which are related to disease severity [170,171]. Researchers have also eradicated human immunodeficiency virus type 1 (HIV-1) from human CD4+ T-cells by using Cas9 to excise parts of its integrated genome [172]. Interestingly, the cured cells were also less susceptible to subsequent HIV-1 infection. CRISPR/Cas13a systems are also emerging as a possible answer to RNA viruses because they can cleave RNA. Given their novelty, however, ongoing research still focuses on identifying the most potent and specific Cas13 variants [173,174].

Bacterial Infections
The emergence of antimicrobial-resistant bacteria is one of the most pressing concerns for public health specialists today. Misuse and overuse of antibiotics have led to an increasing number of multidrug-resistant strains [175], creating an urgent need for new ways to fight bacterial infections. CRISPR systems might serve as an alternative to conventional antibiotics, or be used alongside them. They could act either by disabling antibiotic-resistance genes or by cleaving essential regions of the bacterial genome, exerting a bactericidal effect [60]. Using CRISPR/Cas9, Citorik et al. targeted sequences that conferred antibiotic resistance and virulence in E. coli strains [176]. The researchers used Cas9 to target β-lactamase antibiotic resistance genes commonly found on high-copy plasmids in E. coli. In a first attempt to deliver this system to the bacteria, the CRISPR machinery was included in a conjugative plasmid. However, this conjugation-based approach had low efficiency. The researchers therefore turned to bacteriophages as a delivery method, since phages readily inject DNA into particular bacterial species. The CRISPR system was packaged into phagemid vectors, plasmids that can be loaded into phage capsids [177]. With this method, E. coli carrying the antibiotic resistance plasmid sequence were made susceptible to the antibiotic, while wild-type bacteria were unaffected. In the same paper, the researchers used larvae of Galleria mellonella (wax moth) as an intestinal infection model and directed Cas9 towards the gene encoding intimin, a virulence factor of enterohemorrhagic E. coli. Treatment with Cas9 improved larval survival and was more effective than treatment with chloramphenicol, an antibiotic to which the E. coli strain was resistant.
Later the same year, a different research group reprogrammed CRISPR to target virulence genes in Staphylococcus aureus [178]. Using the same phage-based approach to deliver Cas9 systems to bacteria, Bikard et al. targeted the kanamycin resistance gene aph-3, which is carried on the chromosome of the strains used in the experiment. This strongly inhibited the growth of S. aureus, because chromosome cleavage led to cell death. In an in vivo mouse skin colonization model of S. aureus infection, CRISPR treatment also significantly reduced the number of antibiotic-resistant bacteria.
In both papers, researchers successfully employed the multiplexing capabilities of Cas9 to target multiple chromosomal/plasmid sequences at a time in bacteria [176,178]. This can be useful to target more than one species of bacteria with a single agent, or to affect two different sequences in the same species.
Both research groups also highlighted the specificity of CRISPR-based antimicrobials in treating bacterial infections, since these act selectively on virulent bacteria without affecting neighbouring bacteria. This high degree of specificity is one advantage of the strategy over antibiotics or phages. Those kill virulent and innocuous bacteria alike, thereby disturbing the microbiota and potentially selecting for resistant bacteria.
Yu et al. suggested another way in which CRISPR systems can help fight bacterial infections [179]. WAP-8294A antibiotics are produced by Lysobacter in very low amounts and only under strict conditions. These compounds are highly active against methicillin-resistant S. aureus, but the difficulty of obtaining them is an obstacle for researchers. The very low yield is thought to be a self-defence mechanism of Lysobacter against the potent activity of these compounds. Building on this idea, the researchers fused dCas9 to a transcriptional activator. This increased the expression of a selected group of genes that play a key role in protecting Lysobacter from WAP-8294A compounds. The result was a 4- to 9-fold increase in the yield of three WAP-8294A antibiotics.

Crop Industry
Medicine is not the only field to benefit from the newfound CRISPR technology, and many industries are experimenting with CRISPR systems to develop new methodologies and techniques. One such field is crop science, where researchers routinely breed new plant varieties to improve agricultural output, confer resistance to certain pathogens, or change specific traits such as fruit size. Crop engineering with CRISPR is already under way. Researchers have developed cucumbers with broad virus resistance without affecting plant development [180], improved the yield of maize under drought stress [181], and produced seedless tomatoes [182]. Because CRISPR systems are easy to use and edit precisely, they could reduce costs and breeding time in crop engineering compared with current genome editing technologies [183].

Conclusions
Gene editing techniques have been around for over forty years, yet the limitations affecting their use are still significant in several fields of science. Ethical issues are one of the key concerns among the scientific community, mainly due to the harmful consequences that can result from the genetic manipulation of human and animal germlines. The insufficient precision and efficiency of currently available techniques are also two of the main deterrents against a more widespread use of genetic manipulation.
With this in mind, CRISPR-Cas systems seem to be rapidly changing the landscape of the genetic engineering field. Although initial Cas systems used for genetic engineering were more efficient and simpler than methods such as TALENs and ZFNs, their relatively low specificity and presence of off-target effects meant that they were still not the perfect tool for genetic manipulation.
However, new variants with improved precision and reduced off-target effects, which retain the original efficiency, have since been developed. These variants largely offset the main limitation of these systems. Whether CRISPR-Cas systems can be applied on a larger scale remains to be seen, but the results of upcoming clinical trials using this technology might kick-start a new CRISPR "fever" and bring CRISPR systems into mainstream use.