Ectopic Spacer Acquisition in Streptococcus thermophilus CRISPR3 Array

Streptococcus thermophilus relies heavily on two type II-A CRISPR-Cas systems, CRISPR1 and CRISPR3, to resist siphophage infections. One hallmark of these systems is the integration of a new spacer at the 5′ end of the CRISPR arrays following phage infection. However, we have previously shown that ectopic acquisition of spacers can occur within the CRISPR1 array. Here, we present evidence of the acquisition of new spacers within the array of CRISPR3 of S. thermophilus. The analysis of randomly selected bacteriophage-insensitive mutants of the strain Uy01 obtained after phage infection, as well as the comparison with other S. thermophilus strains with similar CRISPR3 content, showed that a specific spacer within the array could be responsible for misguiding the adaptation complex. These results also indicate that while the vast majority of new spacers are added at the 5′ end of the CRISPR array, ectopic spacer acquisition is a common feature of both CRISPR1 and CRISPR3 systems in S. thermophilus, and it can still provide phage resistance. Ectopic spacer acquisition also appears to have occurred naturally in some strains of Streptococcus pyogenes, suggesting that it is a general phenomenon, at least in type II-A systems.


Introduction
Streptococcus thermophilus is a lactic acid bacterium used extensively for the manufacture of several fermented dairy products, such as yogurt and several cheeses [1][2][3][4]. Siphophage infections of these Gram-positive bacteria are the leading cause of milk fermentation failures worldwide [5][6][7][8]. One important strategy for controlling virulent phages in industrial settings is to select and use natural bacteriophage-insensitive mutant (BIM) strains as starter cultures. It is now well-documented that S. thermophilus strains rely on the CRISPR-Cas system (a prokaryotic adaptive immune system) to protect themselves against phage attacks [9][10][11][12][13][14][15]. This system is composed of a clustered regularly interspaced short palindromic repeats (CRISPR) array and its associated cas genes [16][17][18]. These cas genes encode Cas proteins, some of which are used by the bacteria to acquire new immunities by integrating short DNA sequences, called spacers, from invading DNA, such as phage genomes, at the leader (5′) end of the CRISPR array. The CRISPR array is then transcribed and processed into small interfering RNAs (called crRNAs) [19,20]. These crRNAs form ribonucleoprotein complexes with Cas proteins and destroy invading DNA through base-pair recognition and cleavage [21][22][23].
CRISPR-Cas systems are currently divided into two classes, six types, and several subtypes [24]. For most of them, the integration of new spacers is driven mainly by two proteins, Cas1 and Cas2, which form an integrase-like complex [25]. During the adaptation process, the first repeat of the CRISPR array is also duplicated upstream of the newly acquired spacer [26]. S. thermophilus strains have been shown to possess up to four different CRISPR-Cas systems: two distinct type II-A systems, one type I-E, and one type III-A [22]. So far, only the two type II-A systems, CRISPR1 (CR1) and CRISPR3 (CR3), appear to be active in spacer acquisition [27,28], with the large majority of the acquisition events taking place in the CR1 array [29], which is the predominant system in this bacterial species.
Type II-A CRISPR-Cas systems are composed of four genes coding for Cas9, Cas1, Cas2, and Csn2 [30]. All four are required for the adaptation step in vivo, which implies probable interactions between them [31]. For type II CRISPR-Cas systems, the signature gene is cas9, which encodes a multidomain protein that combines the functions of the crRNA-effector complex with target DNA cleavage and contributes to the selection of new spacers during the adaptation stage [30]. Cas9 recognizes short motifs called protospacer adjacent motifs (PAMs) within the invading DNA. When a PAM is recognized, the adjacent protospacer sequence can be integrated into the CRISPR array as a new spacer. Csn2 also appears to be critical in the adaptation stage, as a csn2 gene insertion mutant was found to be incapable of acquiring new spacers in response to phage infection [3,11]. Of note, the two type II-A systems of S. thermophilus use different PAMs [21,22,27].
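The PAM-based selection of protospacers described above can be illustrated with a short Python sketch. It scans both strands of a DNA sequence for the CR3 PAM (5′-NGGNG-3′) and reports the sequence immediately upstream of each PAM as a candidate protospacer. The function names and the 30-nt spacer length are assumptions made for illustration; this is not the pipeline used in this study.

```python
import re

# CR3 PAM from the text: 5'-NGGNG-3', taken here to lie immediately 3' of the protospacer.
# The lookahead makes overlapping PAM occurrences visible to finditer.
PAM_REGEX = re.compile(r"(?=([ACGT]GG[ACGT]G))")

# Assumed spacer length for illustration only; not stated in the text.
SPACER_LEN = 30

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(genome, spacer_len=SPACER_LEN):
    """List candidate protospacers flanked by a CR3 PAM on either strand.

    Each hit is (strand, pam_start, protospacer, pam). For the "-" strand,
    pam_start is counted on the reverse-complemented sequence.
    """
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for match in PAM_REGEX.finditer(seq):
            pam_start = match.start()
            # Keep only PAMs with a full-length protospacer upstream.
            if pam_start >= spacer_len:
                protospacer = seq[pam_start - spacer_len:pam_start]
                hits.append((strand, pam_start, protospacer, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from phage 53.
    toy = "A" * 30 + "TGGCG" + "AAAA"
    for hit in find_protospacers(toy):
        print(hit)
```

Counting the hits returned for a complete phage genome would give the number of potential CR3 protospacers, under the stated assumptions about spacer length and PAM position.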
Cryoelectron microscopy analyses of the Cas1-Cas2-Csn2 complex from the S. thermophilus CR3 system showed a large multi-subunit complex (Cas1₈-Cas2₄-Csn2₈) with a channel occupied by approximately 30 bp of double-stranded DNA, also suggesting a protective role for the complex. Spacer adaptation complexes may have quite different architectures, but the speculated model for spacer capture suggests that: (1) the Cas1-Cas2-Csn2 complex engages free DNA ends from the invading dsDNA phage genome and encircles the DNA, (2) the Cas1-Cas2-Csn2 complex slides along the DNA until it encounters Cas9 bound to a PAM, and (3) the DNA is cleaved, releasing Cas9 and the Cas1-Cas2-Csn2 complex, which encapsulates the DNA as a new spacer ready for integration [31].
One of the hallmarks of the acquisition of new immunities is that novel spacers are typically integrated at the leader-proximal region of the CRISPR array [32]. This polarity is guided by the leader sequence upstream of the CRISPR array [29]. It is likely that sequences within the leader elements of CRISPR loci are important given that novel spacers are introduced adjacent to the leader in several systems. It was reported that the integrity of the 3′ end of the leader sequence (called leader anchoring sequence or LAS) is crucial for the polarized acquisition of new spacers [33]. It has also been shown that Cas1 contains a DNA-binding region that binds this leader DNA [34][35][36]. This sequence (most notably the 5′-GAG-3′ at the 3′ end) is highly conserved across type II systems. In the absence of an appropriate LAS, other short nucleotide sequences within the Streptococcus pyogenes type II-A associated CRISPR array were shown to guide ectopic (acquired at positions other than the 5′ end) spacer integration in the heterologous host Staphylococcus aureus [33]. Ectopic spacer acquisition was also recently observed in the CRISPR array associated with the type II-A system of Streptococcus mutans [37].
In a previous study, we observed ectopic spacer acquisition in the CR1 array of S. thermophilus BIMs, obtained after a phage-sensitive host was exposed to virulent phages of the Siphoviridae family [38]. In this study, we investigated whether ectopic spacer acquisition could also occur following a phage infection in the second active type II-A CRISPR-Cas system of S. thermophilus, namely CR3, and whether the presence of a LAS was needed.

Phage and Bacterial Strains
Virulent siphophage 53 (cos-type) was previously isolated from a failed mozzarella production in Uruguay and its complete genome is available (accession no. KT717084) [38]. Phage-sensitive S. thermophilus strains (Uy series) were obtained from a local starter culture supplier and grown in LM17 medium at 42 °C. Phages and bacterial strains were stored, as frozen stocks, in LM17 supplemented with 15% (v/v) glycerol. Phage 53 was amplified in LM17 supplemented with 10 mM CaCl₂ (LM17-CaCl₂). Briefly, 0.1 mL of a fresh bacterial culture (OD₆₀₀ = 0.6) was inoculated in 10 mL of broth and incubated at 42 °C for three hours. Then, 0.1 mL of phage lysate was added and incubated at 42 °C until complete lysis was observed. The lysate was filtered (0.45 µm filters) and stored at 4 °C until used. Phage titer was determined using methods described elsewhere [39].

S. thermophilus and S. pyogenes CRISPR Loci Analysis
The CR3 loci of 50 strains of the Uy collection were amplified by PCR (NEB Phusion High-Fidelity DNA Polymerase) and sequenced using primers CR3-fwd (5′-CTGAGATTAATAGTGCGATTACG-3′) and CR3-rev (5′-GCTGGATATTCGTATAACATGTC-3′) [27]. Bioinformatics analyses to identify the CR3 spacer content of 45 Uy strains were first performed with SnapGene (version 4.1.9). In addition, 39 CR3 loci were retrieved from 64 complete S. thermophilus genomes available in GenBank as of October 2020. Similarly, type II-A CRISPR loci were searched for in S. pyogenes genomes. A set of 213 complete and circular genomes was available from GenBank as of October 2020. All type II-A loci (116 for S. pyogenes and 84 for S. thermophilus) were identified with CRISPRDetect (version 2.2.3, http://crispr.otago.ac.nz/CRISPRDetect, accessed on 1 March 2021) using default parameters [40]. The putative CRISPRs were also manually checked. CRISPRDetect gff output files were then used with the Python script CRISPRStudio (version 1) [41] to extract, align and cluster the spacer sequences and to generate an SVG file for CRISPR loci representation. Output SVG files were edited manually using Illustrator 2020 or Inkscape 1.0.0.

Bacteriophage Insensitive Mutants (BIMs)
BIMs were obtained by infecting the phage-sensitive strain S. thermophilus Uy01 with the virulent phage 53. Briefly, approximately 5 × 10⁸ CFU of S. thermophilus were mixed with 1 × 10⁸ PFU of phages in 4 mL of soft LM17-CaCl₂ (0.75% agar) and poured on an LM17-CaCl₂ agar plate (1.5% agar). Plates were incubated for at least 48 h at 42 °C. Individual colonies were recovered, streaked and re-streaked for purity. Individual colonies were then inoculated and incubated overnight in LM17 broth. The cultures were then tested for phage resistance as described elsewhere [10].

Evidence of Ectopic Acquisition Events in CRISPR3
The analysis of the spacer content of the CR3 array was performed on 64 complete public S. thermophilus genomes available at the time of the study in GenBank along with 50 strains of the Uy collection. A total of 39 (60.9%) CR3 loci were retrieved from publicly available S. thermophilus genomes and 45 (90%) CR3 loci were detected by PCR from the Uy strain collection. Analysis of these arrays showed that they contained a minimum of 5 spacers and a maximum of 44, with a median of 15 spacers per strain. Moreover, some of these wild-type strains appeared to have acquired the same spacers or to have derived from a common ancestor (Figure 1a). These data also suggested that spacers were either deleted or acquired at specific positions within the CR3 array (Figure 1b). For example, when comparing S. thermophilus strain Uy23 with the strain Uy44 and the reference strain LMD-9, it would appear that two spacers were either deleted in the CR3 array of the Uy44/LMD-9 strains or were acquired by the strain Uy23. Similar events of either deleted or ectopically acquired spacers appear to have occurred within the CR3 arrays of other strains (Figure 1b). In some strains, new spacers appear to have also been subsequently acquired at the 5′ end of the array, as with the strain APC151 when compared to strains Uy07 and Uy33.
We next sought to identify similar potential and natural ectopic acquisition events in other type II-A systems, outside S. thermophilus. The type II-A system of S. pyogenes has been shown to acquire spacers ectopically when expressed in a heterologous host [33]. Thus, we explored whether natural ectopic spacer acquisition could also be inferred from publicly available genomic sequences of S. pyogenes. Moreover, the Cas9 (1368 aa) of S. pyogenes is related to S. thermophilus CR3 Cas9 (1409 aa, 57% ID). Interestingly, the S. pyogenes system uses the PAM 5′-NGG-3′, while the CR3 system of S. thermophilus uses the PAM 5′-NGGNG-3′ [22,27].
From the 213 S. pyogenes genomes found in GenBank, 116 (54.5%) CRISPR arrays with at least three repeats were detected. Analysis of these arrays showed that they contained up to a maximum of 13 spacers, with a median of 4 spacers per strain. Of note, five strains (2.3%) with only two repeats and one spacer were also detected. Therefore, the number of spacers is generally smaller in S. pyogenes than in S. thermophilus.
Further analyses suggested that some of the strains acquired the same spacers or that the strains were derived from a common ancestor (Figure 2a). Moreover, specific spacers could have also been either deleted or acquired within the CRISPR array (Figure 2b). For example, when comparing S. pyogenes strain NCTC12044 with the strain NCTC12059, two spacers appear to have been deleted in the CRISPR array of strain NCTC12059 or acquired by the strain NCTC12044. The latter strain may also have acquired two unique spacers at the 5′ end, compared to the reference strain.
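The array comparisons described above can be sketched in Python. The sketch below aligns two spacer arrays (written leader-proximal spacer first) and labels each difference as lying at the leader end, at the trailer end, or inside the array. The spacer identifiers and the function name are hypothetical, and the use of difflib is an assumption made for illustration rather than the method used with CRISPRStudio. An internal difference cannot, by itself, distinguish an ectopic acquisition in one strain from a deletion in the other.

```python
from difflib import SequenceMatcher


def classify_differences(array_a, array_b):
    """Compare two spacer arrays listed from the leader (5') end onward.

    Returns a list of (location, spacers_only_in_a, spacers_only_in_b),
    where location is "leader end", "trailer end" or "internal".
    """
    events = []
    matcher = SequenceMatcher(a=array_a, b=array_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # block of spacers shared by both arrays
        only_a = array_a[i1:i2]
        only_b = array_b[j1:j2]
        if i1 == 0 and j1 == 0:
            location = "leader end"   # consistent with polarized acquisition
        elif i2 == len(array_a) and j2 == len(array_b):
            location = "trailer end"
        else:
            location = "internal"     # ectopic acquisition or deletion
        events.append((location, only_a, only_b))
    return events


if __name__ == "__main__":
    # Hypothetical spacer identifiers, not taken from the strains in this study.
    strain_x = ["n1", "s1", "s2", "s3", "x1", "x2", "s4", "s5"]
    strain_y = ["s1", "s2", "s3", "s4", "s5"]
    for event in classify_differences(strain_x, strain_y):
        print(event)
```

In this toy example, spacer n1 is reported at the leader end, while x1 and x2 are reported as an internal difference flanked by shared spacers on both sides.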

Evaluation of Spacer Acquisition
To test if ectopic acquisition can indeed occur in the CR3 array of S. thermophilus, bacteriophage insensitive mutants (BIMs) were generated by infecting the phage-sensitive S. thermophilus strain Uy01 (which has nine spacers in its CR3 array) with the virulent phage 53 for a prolonged period of time. Forty-six randomly selected BIMs were analyzed for new spacer acquisition. All of them acquired new spacers in the CR3 array. Only one spacer was acquired per BIM, except for BIM1, which acquired two. A total of 25 different spacers (out of 47) were acquired by these 46 BIMs (Supplementary Table S1), indicating that some BIMs acquired the same spacer. We observed clear ectopic spacer acquisition events in the CR3 of 5 BIMs (10.9%), representing three CRISPR cluster (C) types (C2, C3 and C4; Figure 3). Spacer acquisitions with concomitant spacer deletions were also observed in four BIMs (8.7%) (see C4 and C5, Figure 3). It is unclear if additional ectopic spacer acquisitions also occurred in the three BIMs grouped in C5 (Figure 3). The remaining 38 BIMs acquired spacers at the leader 5′ end of the CRISPR array (C1, Figure 3). Interestingly, over a third (16) of all the acquisition events involved protospacers from orf14 (the longest gene in the phage genome, which codes for the tape measure protein), including six of the eight ectopic events (Figure 4). All the protospacers were flanked by the previously identified PAM for CR3 (5′-NGGNG-3′) [4,8].
Most ectopic acquisition events occurred between spacers 3 and 4. A single ectopic spacer acquisition also occurred between spacers 4 and 5 in BIM1. Interestingly, loss of spacer content also occurred in three of these BIMs, namely BIMs 3, 4, and 6 (Figure 3). The newly acquired spacers were mapped in the phage genome and perfectly matched protospacers (Table 1), except for one new spacer that could not be matched to the phage genome.
The BIMs were tested against phage 53 and all showed a phage-resistant phenotype, including those that had acquired the spacer in the middle of the array. Because the region flanking spacers 3 and 4 appears to be a hot spot for ectopic spacer acquisition in CR3, we investigated the presence of a motif that could mimic the leader sequence, as previously observed in ectopic events in CR1 of S. thermophilus [38]. In both cases, the spacers did not contain the GAG motif found in the LAS of the leader sequence (Figure 5). Only the adenine at position -2 matching the GAG motif was found in five spacers, including spacers 3 and 4.
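The position-by-position comparison with the GAG motif described above can be sketched in Python as follows. The helper name and the example spacer are hypothetical (the real spacer sequences are shown in Figure 5 and are not reproduced here), and positions are numbered -3 to -1 from the 3′ end of the spacer, following the convention used in the text.

```python
# 3'-terminal motif of the leader-anchoring sequence, as described in the text.
LEADER_MOTIF = "GAG"


def motif_matches(spacer, motif=LEADER_MOTIF):
    """Compare the 3' end of a spacer with the leader motif.

    Returns a dict mapping positions -3, -2, -1 (relative to the spacer
    3' end) to True when the spacer carries the same base as the motif.
    """
    if len(spacer) < len(motif):
        raise ValueError("spacer is shorter than the motif")
    tail = spacer.upper()[-len(motif):]
    return {
        index - len(motif): tail[index] == motif[index]
        for index in range(len(motif))
    }


if __name__ == "__main__":
    # Hypothetical spacer ending in 5'-...AAT-3'; only position -2 (A) matches GAG.
    print(motif_matches("ACGTTCAAT"))
```

A spacer with a full GAG at its 3′ end would return True at all three positions, whereas the pattern reported for spacers 3 and 4 corresponds to a match at position -2 only.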

Discussion
Integration of new spacers at the 5′ end of the CRISPR loci is one of the hallmark features of the CRISPR-Cas systems in S. thermophilus [42] and other bacterial species [3], although ectopic spacer acquisition in CR1 was previously described in S. thermophilus [38].
Our results now show that ectopic spacer acquisition can also occur in the CR3 system of S. thermophilus. Spacer analysis of CRISPR type II-A loci of S. pyogenes genomes hinted at ectopic spacer acquisition in this species as well. BIMs were generated by infecting the phage-sensitive S. thermophilus strain Uy01 with the virulent phage 53. All of them acquired spacers in CR3. We also observed that many of the newly acquired spacers came from protospacers found in orf14 (coding for the tape measure protein) of phage 53, without any obvious reason other than it being the longest gene. Of interest, there are 478 CR3 PAMs in the genome of phage 53, and 68 of them (14%) are in orf14. The reason for the absence of spacer acquisition in the CR1 of this strain is unknown, but it could be related to the presence of genes coding for anti-CRISPR proteins in the phage or bacterial genome [43][44][45] or a defect in the spacer acquisition machinery.
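The orf14 figures above can be put side by side with a short calculation. The Python sketch below compares the 16 orf14-derived events (out of 47 acquired spacers, as reported in the Results) with the number expected if every CR3 PAM in the phage genome were equally likely to be sampled. That uniform, independent sampling model is an assumption made here for illustration; it was not part of the analysis in this study, and it ignores that one spacer could not be matched to the phage genome and that several BIMs share the same spacer.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Values taken from the text.
total_pams = 478   # CR3 PAMs in the phage 53 genome
orf14_pams = 68    # CR3 PAMs located in orf14
acquired = 47      # spacers acquired by the 46 BIMs
from_orf14 = 16    # acquisition events involving orf14 protospacers

# Assumed model: each acquisition picks one PAM uniformly and independently.
p_orf14 = orf14_pams / total_pams
expected = acquired * p_orf14
p_value = binomial_upper_tail(from_orf14, acquired, p_orf14)

if __name__ == "__main__":
    print(f"fraction of PAMs in orf14: {p_orf14:.3f}")
    print(f"expected orf14 events:     {expected:.1f}")
    print(f"observed orf14 events:     {from_orf14}")
    print(f"P(X >= {from_orf14}) under the model: {p_value:.2e}")
```

Under this simplified model, about 6.7 orf14 events would be expected, fewer than the 16 observed, which is in line with the remark that PAM content alone gives no obvious reason for the overrepresentation. The tail probability should be read as a rough indication only, given the assumptions listed above.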
Recent work showed the importance of a seven-nucleotide sequence at the 3′ end of the leader sequence, called the "leader-anchoring sequence" (LAS), in the acquisition of new spacers [42]. When this (or a similar) sequence is found in specific spacers, it may lead to ectopic spacer acquisition events, especially if the spacers have a GAG motif at the 3′ end [16]. In a previous study, we observed that ectopic integration occurred when the last G of the leader sequence was missing, and that new spacers could be added at five different positions within the array [38]. We noticed that in four out of the five spacers adjacent to the newly acquired one, an adenine and a guanine were found at positions -2 and -1 (3′ end), respectively. In agreement with previous studies [17][18][19], our data suggested that the LAS may be limited to only a few nucleotides, including the adenine at position -2 [21]. In the present study, the LAS of the leader did contain the GAG motif, whereas spacers 3 and 4 did not have the GAG motif at the 3′ end (just the adenine at position -2). Similar results were observed in S. pyogenes [16]. In that case, while the LAS (conserved between the two species) was critical for integration into the first repeat, one of the spacers may have led to the addition of new spacers into its downstream repeat while not fully matching the GAG motif of the leader sequence (only the G at position -3). Therefore, it is possible that other factors guide ectopic spacer acquisition. Further experimental studies are needed to address this question.
Nevertheless, this is the first study showing ectopic spacer acquisition in the CR3 of S. thermophilus. Our analyses also suggest that a similar phenomenon may be naturally occurring in S. pyogenes, supporting the previously observed ectopic spacer acquisition in this type II-A system when expressed in S. aureus [33]. It should be noted that the 5′ end of the CRISPR array is still the preferred location for the acquisition of novel spacers. However, ectopic spacer acquisition also occurs at various frequencies. While CRISPR arrays still represent molecular archives of past nucleic acid invasions, the chronology of these events may not always be correlated with the spacer position within a given array.

Supplementary Materials:
The following are available online at https://www.mdpi.com/2076-2607/9/3/512/s1, Table S1: Nucleotide sequences and positions of newly acquired spacers in CRISPR3 of all the BIMs obtained after a challenge with the virulent cos-type phage 53.