Comment on Tanmoy et al. CRISPR-Cas Diversity in Clinical Salmonella enterica Serovar Typhi Isolates from South Asian Countries. Genes 2020, 11, 1365

Tanmoy et al. [...].

Institut Pasteur, Unité des Bactéries Pathogènes Entériques, 75015 Paris, France; laetitia.fabre@pasteur.fr (L.F.); elisabeth.njamkepo-nguemkam@pasteur.fr (E.N.)
* Correspondence: francois-xavier.weill@pasteur.fr

Tanmoy et al. [1] report new findings on the organization and composition of CRISPR loci in Salmonella enterica serovar Typhi (hereafter referred to as S. Typhi). They report that S. Typhi isolates can carry up to five different CRISPR loci and that about 19% of the genomes tested had three or more CRISPR loci, whereas previous studies reported only two loci [2,3]. They suggest that these earlier studies were incomplete because they analyzed too small a set of S. Typhi genomes.
As first described by Jansen et al. [4], clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of repeated DNA sequences present in prokaryotes, and they are characterized by 24-47 bp DNA direct repeats (DRs), separated by variable 21-72 bp sequences called "spacers" [5,6]. An A-T-rich "leader sequence" and cas (CRISPR-associated sequence) genes are often identified adjacent to the CRISPR locus. Based on the 16 to 39 assembled genomes from Salmonella available at the time, including two from S. Typhi (CT18 and Ty2), the Salmonella CRISPR region was characterized by our group and others as two loci, CRISPR1 and CRISPR2, separated by less than 20 kb [2,7], with type I-E cas genes in the interval between the CRISPR loci [8] (Figure 1).
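The alternation of direct repeats and spacers described above can be illustrated with a short Python sketch. It assumes that a DR consensus is known and tolerates a few substitutions (but no insertions or deletions) in each repeat copy, then returns the sequences lying between consecutive repeats as spacers. The function names, the mismatch threshold and the toy sequences are hypothetical; this is not the algorithm used by CRISPRCasFinder or by the SalmCRISPRtyping script.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeats(seq: str, dr: str, max_mismatches: int = 2) -> list[int]:
    """Start positions of non-overlapping windows that match `dr`
    with at most `max_mismatches` substitutions (no indels)."""
    hits, i, n = [], 0, len(dr)
    while i + n <= len(seq):
        if hamming(seq[i:i + n], dr) <= max_mismatches:
            hits.append(i)
            i += n  # jump past this repeat so hits never overlap
        else:
            i += 1
    return hits


def split_array(seq: str, dr: str, max_mismatches: int = 2) -> list[str]:
    """Return the spacers lying between consecutive direct repeats."""
    hits = find_repeats(seq, dr, max_mismatches)
    return [seq[start + len(dr):nxt] for start, nxt in zip(hits, hits[1:])]


# Toy array (hypothetical sequences): DR, spacer, DR variant with one SNP, spacer, DR
dr = "GTGTTCCCCGCG"
spacer1, spacer2 = "ATCGATCGATCGATCGATCG", "TTTTAAAACCCCGGGGAAAA"
array = "AC" + dr + spacer1 + "GTGTTCCCAGCG" + spacer2 + dr + "TT"
print(split_array(array, dr))  # the two spacers, in array order
```

Allowing substitutions in the repeat matters because, as noted below, SNP variants of the DR consensus occur within real arrays.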
We also investigated CRISPR1 and CRISPR2 polymorphism by PCR (polymerase chain reaction) and Sanger sequencing in 744 Salmonella reference strains and isolates belonging to 130 serovars (including 18 clinical S. Typhi isolates). More than 3800 unique spacer sequences were identified; they are stored in an online database that can be queried at https://galaxy.pasteur.fr/root?tool_id=toolshed.pasteur.fr/repos/khillion/salmonella_crispr_typing/salmonella_crispr_typing/1.0.0 (accessed on 5 July 2021) [2]. Mean spacer length was 32 bp (range: 29-74 bp), and a 29 bp DR consensus sequence (range: 26-30 bp) was identified, although single nucleotide polymorphism (SNP) variants were also observed. The strong correlation between spacer content and serovar/multilocus sequence type [2] led to a patent describing a new Salmonella subtyping method [9]. In the two S. Typhi genomes (Ty2 and CT18) and the 18 S. Typhi isolates studied, which were diverse in terms of genotype (13 different haplotypes [10]), geographic origin (9 countries) and time period, we identified six CRISPR1 spacer sequences (Typhi1 to Typhi6), one CRISPR2 spacer sequence (EntB0var1) and seven different combined CRISPR1/CRISPR2 profiles [9]. No CRISPR1 spacer was common to all 20 S. Typhi genomes and isolates studied. By contrast, a single spacer, EntB0var1, was found in all CRISPR2 sequences. This spacer sequence was then used to develop a serovar-specific PCR assay for S. Typhi, which was validated on 188 S. Typhi reference strains and isolates of diverse genotypes (65 different haplotypes [10]), geographic origins (40 countries) and time periods. This CRISPR2 target was subsequently used in a multiplex PCR for the detection of S. Typhi in blood samples from Bangladesh [11].

Figure 1. The CRISPR/Cas system structure of S. enterica serovar Typhi. The structure shown is that of the representative S. enterica serovar Typhi strain Ty2 (GenBank accession no. AE014613.1). Two CRISPR loci (CRISPR1 and CRISPR2) are present. The type I-E CRISPR-associated (cas) genes cas2, cas1, cse3, cas5e, cse4, cse2, cse1 and cas3 are located between the CRISPR loci. Diamonds represent direct repeats (DRs), and colored rectangles indicate spacers. The primers used to extract the two CRISPR loci from genomic sequences are shown as red horizontal arrows. The coordinates of the region (based on AE014613.1) are indicated.
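Extracting the two loci with flanking primers, as depicted in Figure 1, can be mimicked in silico. The Python sketch below assumes exact primer matches and a maximum amplicon length; it searches both strands of an assembled sequence and returns the first amplicon found. The primer and template sequences are hypothetical placeholders, not the primers used in our studies, and dedicated in silico PCR tools often also tolerate primer mismatches.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (other characters are left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_len: int = 10_000) -> Optional[str]:
    """Return the first amplicon defined by an exact-match primer pair, or None.

    `fwd` and `rev` are both written 5'->3', as they would be ordered, so the
    reverse primer binds where revcomp(rev) appears on the top strand.
    Both strands of the template are searched.
    """
    rev_site = revcomp(rev)
    for strand in (template, revcomp(template)):
        start = strand.find(fwd)
        while start != -1:
            end = strand.find(rev_site, start + len(fwd))
            if end != -1 and end + len(rev_site) - start <= max_len:
                return strand[start:end + len(rev_site)]
            start = strand.find(fwd, start + 1)  # try the next forward site
    return None


# Hypothetical toy contig: forward site, insert, reverse-primer site
template = "TTTT" + "GATTACAGGC" + "GGGCCCAAATTT" + "CCATGGCATG" + "TTTT"
print(in_silico_pcr(template, "GATTACAGGC", "CATGCCATGG"))
```

Searching both strands matters for draft assemblies, in which a contig carrying the CRISPR region may be reported in either orientation.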
To compare the results reported by Tanmoy et al. [1] with those of our previous studies [2,3], we downloaded the 1059 genomes described by the authors from EBI-ENA (https://www.ebi.ac.uk/ena/browser/home, accessed on 24 November 2020) and assembled them with SPAdes [12], using the authors' parameters. The assembly metrics (N50, genome size and number of contigs) revealed evidence of contamination of some genomes (ERR2663487, ERR2663542, ERR2663589, ERR2663887 and ERR2663969) with other Salmonella serovars (Enteritidis, Paratyphi A and Worthington), which was confirmed by molecular serotyping and/or multilocus sequence typing (Supplementary Materials Table S1, "Comment" column). The genotyphi program [13] was used to check the genotypes of the downloaded genomes. Intriguingly, for more than 500 of the 1059 genomes, the genotypes presented by the authors differed from those obtained in our analysis (Supplementary Materials Table S2). These discrepancies concerned genome sequences from a previous publication by the authors [14]. We found that the strain name/accession code pairs shown in the authors' Dataset S1 [14] did not match those available from EBI-ENA (Supplementary Materials Table S3). We suspect that a shift of a single row occurred between the accession codes and the associated strain names after strain 311189_226186 (ERR2663487), either during construction of the spreadsheet or after its submission to EBI-ENA. We therefore used only the EBI-ENA accession codes as identifiers for the genomes in our analysis. This major issue also precluded the use of any metadata other than the country of origin (Bangladesh), and of the data published in the authors' papers [1,14]. We analyzed the CRISPR region both with CRISPRCasFinder [15] and with an in-house script based on in silico PCR, which extracts the CRISPR1 and CRISPR2 sequences; the spacer and DR sequences were then identified with the SalmCRISPRtyping script [2] (Supplementary Materials Table S1).
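The single-row shift suspected above can be tested by comparing a published strain/accession table with the strain names recorded in the sequence archive. The Python sketch below assumes that both are available as simple structures (the published rows in their original order, and the archive as an accession-to-strain mapping). It finds the first discordant row and checks whether one constant offset explains all subsequent rows. The function names, the offset limit and the toy identifiers are hypothetical.

```python
from typing import Optional

Row = tuple[str, str]  # (strain name, accession code) as printed in a table


def first_mismatch(rows: list[Row], archive: dict[str, str]) -> Optional[int]:
    """Index of the first row whose strain disagrees with the archive record."""
    for i, (strain, accession) in enumerate(rows):
        if archive.get(accession) != strain:
            return i
    return None


def detect_shift(rows: list[Row], archive: dict[str, str],
                 max_offset: int = 2) -> Optional[tuple[int, int]]:
    """Return (start, offset) if, from row `start` onwards, each accession's
    archived strain equals the strain printed `offset` rows away; else None."""
    start = first_mismatch(rows, archive)
    if start is None:
        return None  # the table agrees with the archive
    for offset in (d for k in range(1, max_offset + 1) for d in (k, -k)):
        pairs = [(accession, rows[j + offset][0])
                 for j, (_, accession) in enumerate(rows)
                 if j >= start and start <= j + offset < len(rows)]
        if pairs and all(archive.get(acc) == strain for acc, strain in pairs):
            return start, offset
    return None


# Toy example: strain names slip up by one row from the third row onwards.
archive = {f"ERR{i}": f"strain{i}" for i in range(5)}
rows = [("strain0", "ERR0"), ("strain1", "ERR1"),
        ("strain3", "ERR2"), ("strain4", "ERR3"), ("strain5", "ERR4")]
print(detect_shift(rows, archive))  # (2, -1)
```

Requiring one offset to explain every later row is what distinguishes a systematic shift from scattered labeling errors.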
All the S. Typhi genomes carried a type I-E CRISPR-Cas system. No new CRISPR spacer or DR sequences were identified. Eleven combined CRISPR1/CRISPR2 profiles were identified among the 1059 genomes tested (Table 1 and Supplementary Materials Table S1). The four combined CRISPR1/CRISPR2 profiles that were new relative to our previous analysis of 20 S. Typhi genomes [2,3] were due entirely to variations in the number of DR64 copies at the 3' end of CRISPR1. No clear association between the combined CRISPR1/CRISPR2 profiles and genotype was observed (Table 1).
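The observation that the new combined profiles differ only in the number of DR64 copies at the 3' end of CRISPR1 can be checked by counting distinct profiles twice: once on the full arrays and once after trimming the terminal DR64 run. In the toy Python sketch below, each array is encoded as an ordered list of DR and spacer labels; apart from Typhi1 to Typhi3, DR64, DR27 and EntB0var1, all labels and array layouts are hypothetical.

```python
def trailing_units(labels: list[str], unit: str = "DR64") -> int:
    """Number of consecutive `unit` labels at the 3' end of an array."""
    count = 0
    for label in reversed(labels):
        if label != unit:
            break
        count += 1
    return count


def core(labels: list[str], unit: str = "DR64") -> tuple[str, ...]:
    """Array labels with the 3'-terminal run of `unit` removed."""
    return tuple(labels[:len(labels) - trailing_units(labels, unit)])


def count_profiles(genomes: dict[str, tuple[list[str], list[str]]]) -> tuple[int, int]:
    """(distinct combined profiles, distinct profiles once the DR64 run is ignored)."""
    full = {(tuple(c1), tuple(c2)) for c1, c2 in genomes.values()}
    collapsed = {(core(c1), tuple(c2)) for c1, c2 in genomes.values()}
    return len(full), len(collapsed)


# Toy data; "DRa", "DRb" and "DRx" are hypothetical labels.
crispr2 = ["DR27", "EntB0var1", "DRx"]
genomes = {
    "g1": (["DRa", "Typhi1", "DRb", "Typhi2", "DR64"], crispr2),
    "g2": (["DRa", "Typhi1", "DRb", "Typhi2", "DR64", "DR64"], crispr2),
    "g3": (["DRa", "Typhi3", "DR64"], crispr2),
}
print(count_profiles(genomes))  # (3, 2): g1 and g2 differ only in DR64 copy number
```

If the first number exceeds the second, the surplus profiles are explained entirely by the terminal DR64 run.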
We took a closer look at the different CRISPR loci to understand the discrepancies between the authors' study and this analysis. First, the authors did not identify our CRISPR2 locus in any of the 1059 genomes studied. This is not particularly surprising, as we had already reported in 2012 that CRISPRCasFinder software [15] was unable to detect the short CRISPR2 locus of S. Typhi strains Ty2 and CT18, which consists of a single spacer (EntB0var1) between two DRs (DR27 and DR), one of which is degenerate (20/29 bp identity). Surprisingly, this undetected CRISPR2 target had previously been used by the first and senior authors of the Tanmoy et al. [1] article for the detection of S. Typhi by PCR in clinical samples [11]. The number of loci we identified with CRISPRCasFinder (including loci with a low evidence score; Supplementary Materials Table S1) did not match the number of loci reported by the authors. We therefore reconstructed the nucleotide sequence of each locus defined in the authors' Table 4 from the DR-spacer sequences described in their Dataset S3 [1], and searched for these sequences with the blastn algorithm of BLAST+ v.2.6.0. Among the "Group-A" loci, patterns a1 and a3 (described in their Table 4) corresponded to CRISPR1 and CRISPR2 of S. Enteritidis, respectively. Therefore, only patterns a2 and a4 to a7 consist of S. Typhi CRISPR1 spacer sequences. One of these patterns, a6, found in genomes ERR2663783 and ERR2663776, is actually identical to the 421 bp length variant of a2 and should therefore be withdrawn. By comparison, our analysis identified 11 alleles for CRISPR1 (Table 1). Some of our profiles containing few CRISPR1 spacers, such as P1 (one spacer) and P6 (two spacers), were not identified by the authors (Supplementary Materials Table S5). Our profiles P5 and P8 correspond to their patterns a4 and a7, respectively. For the other profiles, there was no clear correspondence with their patterns.
In particular, their prevalent pattern, a2, could be broken down into eight different profiles according to our analysis, due to the presence of DR64, sometimes repeated, in various profiles (Table 1). DR64 was not identified by the authors.
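Reconstructing pattern sequences from their DR-spacer sequences before a blastn search amounts to concatenating the sequences in array order and writing the result as FASTA queries. A minimal Python sketch, assuming that the DR and spacer sequences are available in a label-to-sequence dictionary (the labels and sequences below are hypothetical):

```python
def reconstruct(labels: list[str], sequences: dict[str, str]) -> str:
    """Concatenate the DR and spacer sequences of a pattern in array order."""
    missing = [label for label in labels if label not in sequences]
    if missing:
        raise KeyError(f"no sequence for: {', '.join(missing)}")
    return "".join(sequences[label] for label in labels)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record wrapped at `width` characters."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Hypothetical toy sequences; real DR and spacer sequences would come from the datasets.
sequences = {"DR": "GTGTTCCCCGCG", "S1": "ATCGATCGAT", "S2": "TTGGCCAATT"}
pattern = ["DR", "S1", "DR", "S2", "DR"]
query = reconstruct(pattern, sequences)
print(to_fasta("pattern_x", query, width=20), end="")
```

The resulting file can then be passed to BLAST+, for example with `blastn -query pattern_x.fasta -subject assembly.fasta -outfmt 6`, where both file names are placeholders. Comparing reconstructed strings directly is also a simple way to spot two patterns that are in fact identical, as with a6 and the 421 bp variant of a2.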
Regarding the new CRISPR loci identified by the authors (their "Group-B" loci), we found that patterns b4 and b21/b22 corresponded to the normal CRISPR1 array found downstream from the iap gene [2] (a "Group-A" locus according to Reference [1]) and not to new CRISPR loci. Pattern b21, in particular, corresponds to our profile P6 (see above). The other "Group-B" loci were scored as "low evidence" by CRISPRCasFinder (evidence levels 0 and 1, versus level 4 for CRISPR1), and no cas genes were detected in their vicinity. Furthermore, most of these putative CRISPR loci consist of a minimal array (DR-spacer-DR). Investigation of the most frequent "Group-B" patterns identified by the authors, namely b1, b9 and b10-b13 (Supplementary Materials Figure S1), revealed that these "Group-B" loci consisted of repeated sequences, some of which are genuine variable number tandem repeats (VNTRs). For example, in the region defining "Group-B" patterns b10-b13, a large 93 bp sequence (a fusion of Ts54a/b and Td39a), repeated up to six times depending on the genome, was misinterpreted as a CRISPR locus. Consequently, isolated loci consisting of small numbers of repeated sequences with no nearby CRISPR-Cas machinery, such as the "Group-B" loci described by the authors, should not be considered to be CRISPR loci. This analysis confirms our previous results [2,3,9] and those of other groups [7,8], by showing that the genetically homogeneous S. Typhi population contains a single CRISPR-Cas system (type I-E), with two adjacent CRISPR loci. Both CRISPR loci contain a limited number of spacers (1-6 in CRISPR1 and one in CRISPR2), as observed in other host-adapted Salmonella serovars with altered cas genes [2].
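The two grounds used above for rejecting the "Group-B" loci (near-identical repeated units, typical of VNTRs, and minimal arrays with no cas genes nearby) can be expressed as a simple screen. The Python sketch below is a hypothetical heuristic with arbitrary thresholds and an ungapped identity measure; it is not the scoring scheme of CRISPRCasFinder, and a locus passing it would still require confirmation.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Ungapped fraction of identical positions over the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / n


def plausible_crispr(spacers: list[str], cas_nearby: bool,
                     min_orphan_spacers: int = 3, max_identity: float = 0.8) -> bool:
    """Hypothetical screen applying the two rejection criteria described in the text."""
    # Near-identical "spacers" mean the unit itself is repeated: a tandem repeat.
    if any(identity(a, b) >= max_identity for a, b in combinations(spacers, 2)):
        return False
    # A short array with no cas genes nearby is not accepted on its own.
    if not cas_nearby and len(spacers) < min_orphan_spacers:
        return False
    return True


print(plausible_crispr(["ACGTACGTAC"] * 3, cas_nearby=False))  # False: VNTR-like
print(plausible_crispr(["GATTACAGGCTT"], cas_nearby=True))  # True: one spacer next to cas genes
```

The second call corresponds to a CRISPR2-like situation: a single spacer is acceptable when the locus lies next to the cas genes, which is why the number of spacers alone is not used as a criterion.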
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12081142/s1. Figure S1: Analysis of various "Group-B" CRISPR loci patterns described by Tanmoy et al. [1]. Table S1: Main characteristics of the 1059 S. enterica serovar Typhi genomes studied. Table S2: Discrepancies between genotype data from the study by Tanmoy et al. [1] and our study. Table S3: Strains described in Tanmoy et al. [14] and their different accession codes. Table S4: Correspondence between the spacer and DR sequences described by Tanmoy et al. [1] and those from Fabre et al. [2], and prevalence of these sequences among the 1059 S. enterica serovar Typhi genomes studied. Table S5: Correspondence between the "Group-A" CRISPR locus patterns from Tanmoy et al. [1] and our CRISPR profiles.

Conflicts of Interest:
The authors declare no conflict of interest.