1. Introduction
Typhoid fever is a systemic enteric infection caused by Salmonella enterica serovar Typhi (S. Typhi), a human-restricted bacterial pathogen [1,2]. It is estimated to cause 117 thousand deaths and 11 million cases of illness every year and thus remains a major global public health concern [3]. The fecal–oral transmission route of S. Typhi makes typhoid fever highly endemic in areas with poor water and sanitation systems, especially in South Asian countries such as Bangladesh, India, Nepal, and Pakistan [3,4]. Moreover, treating typhoid fever has become harder because of increasing antimicrobial resistance (AMR) [5]. Recently, a highly clonal, extensively drug-resistant (XDR) lineage of S. Typhi that is resistant to all but one oral antibiotic, azithromycin, caused a large-scale typhoid outbreak in Pakistan [6]. A highly ciprofloxacin-resistant lineage (named ‘Bdq’; as part of genotype 4.3.1.3, it will be referred to as 4.3.1.3q1 in the rest of the article) has emerged in Bangladesh and carries a qnr gene-containing plasmid, pK91 [5,7]. Isolates with high-level azithromycin resistance have been reported in Bangladesh as well [8,9]. With the availability of whole-genome sequence (WGS) data, these AMR characteristics can be readily detected, and a large amount of WGS data is publicly available for S. Typhi. WGS data can also shed light on the presence of defense mechanisms that recognize and destroy foreign genetic material [10].
One such system is the Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated genes (CRISPR-Cas) system, for which little information is available in S. Typhi [11,12,13,14]. A CRISPR locus usually contains from two to several hundred direct repeat (DR) sequences, 23–50 bp in length, separated by unique spacer sequences of similar length [15]. Spacers are complementary to sequences in foreign DNA elements (protospacers) and are acquired from phages, plasmids, and other transferable elements that previously infected the bacterium [16,17,18]. To distinguish foreign DNA from self-DNA, Cas proteins often recognize a protospacer-adjacent motif (PAM), usually at least three nucleotides long, present on the target sequence [19,20].
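To make the repeat-spacer layout concrete, the sketch below splits a CRISPR array into its spacers when the DR sequence is already known. It is a minimal illustration that assumes exact, non-degenerate repeats on a single strand. The function name `split_crispr_array` and the toy sequences are hypothetical. Dedicated tools such as CRISPRCasFinder also tolerate repeat variants and detect the DR itself.

```python
def split_crispr_array(array_seq: str, direct_repeat: str) -> list[str]:
    """Return the spacers found between consecutive copies of `direct_repeat`.

    Assumes exact repeat copies; any leader sequence before the first repeat
    and trailer after the last repeat are discarded.
    """
    array_seq = array_seq.upper()
    direct_repeat = direct_repeat.upper()
    # Collect the start position of every non-overlapping copy of the DR.
    starts = []
    pos = array_seq.find(direct_repeat)
    while pos != -1:
        starts.append(pos)
        pos = array_seq.find(direct_repeat, pos + len(direct_repeat))
    # A spacer runs from the end of one DR copy to the start of the next.
    return [
        array_seq[left + len(direct_repeat):right]
        for left, right in zip(starts, starts[1:])
    ]


# Hypothetical toy array: an 8-bp "DR" flanking three 8-bp spacers.
dr = "GTTCACTG"
array = "AAAA" + dr + "TTTGGGCC" + dr + "ACGTACGT" + dr + "CCCCAAAA" + dr + "TT"
print(split_crispr_array(array, dr))  # ['TTTGGGCC', 'ACGTACGT', 'CCCCAAAA']
```

Real DRs are 23–50 bp and spacers are of similar length. The short toy sequences only keep the example readable.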
The genus Salmonella is known to carry a class 1 type I-E system, closely related to the CRISPR-Cas system of Escherichia coli (E. coli) [21,22]. These systems have been reported to carry either one or two CRISPR loci and a cas gene cluster comprising cas3 and cse1-cse2-cas7-cas5-cas6e-cas1-cas2 [2,14]. CRISPR-Cas systems in other bacterial species have been explored extensively for typing purposes [23]. With respect to AMR, the size of CRISPR loci has been shown to correlate with the presence or absence of AMR-related genes [24,25,26,27]. In S. Typhi, the use of the CRISPR-Cas system for typing remains largely unexplored, with only a few studies to date [11,12]. Moreover, these earlier studies analyzed only small numbers of whole genomes to explore the diversity of the system. For example, Fabre et al. used WGS data from 18 S. Typhi genomes to report two different CRISPR loci (CRISPR1 and CRISPR2) and used PCR assays to amplify those loci and explore the diversity of DRs and spacers [11]. Therefore, there is an opportunity to follow up this work with a larger WGS dataset to explore the S. Typhi CRISPR-Cas system further and report on its diversity.
In this work, we analyzed the S. Typhi CRISPR-Cas system using WGS data from 1059 isolates obtained from four major typhoid-endemic countries (Bangladesh, India, Nepal, and Pakistan), together with their country of isolation, demographic data, and AMR status. Our work identified potential CRISPR-Cas system-related markers that are specifically associated with endemic and AMR-related S. Typhi isolates. We further identified unique spacer targets in bacteriophages and plasmids, which led to the identification of a putative PAM sequence for S. Typhi. Next, we annotated common and new cas genes, one of which, WYL, could be specifically linked to XDR isolates from Pakistan. Collectively, using a large dataset, our study suggests that the CRISPR-Cas system in S. Typhi could be used to monitor the dissemination of endemic AMR isolates so that their spread can be contained.
4. Discussion
Here, we show that S. Typhi isolates can carry up to five different CRISPR loci and that about 19% (203/1059) had three or more CRISPR loci (Figure S2). Previous studies reported only one or two loci [2,11,12], but they analyzed WGS data from only a handful of S. Typhi isolates (at most 18 genomes, by Fabre et al. [11]), which could explain why the third, fourth, and fifth loci were missed. Nevertheless, these isolates carried only one group-A CRISPR locus with a high spacer count, which resembles CRISPR1 in the nomenclature used by Fabre et al., in agreement with previous reports on the CRISPR-Cas system in S. Typhi [11,12]. Nearly 40% (422/1059) of our isolates had only one CRISPR locus. Such isolates were significantly more frequent among Bangladeshi surveillance isolates, whereas the Pakistani outbreak isolates had a lower average number of loci (p < 0.001; Figure 1a and Table 1). Local, highly clonal S. Typhi lineages have been reported from both countries [5,6], and none of these lineages had a high average number of CRISPR loci (Figure 1b). Hence, clonality could contribute to the lower number of CRISPR loci identified in these isolates.
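The proportions quoted above can be checked with simple arithmetic. The sketch below recomputes them from the counts stated in this Discussion. The mean number of loci per isolate is our own derived figure, obtained by dividing the 1919 CRISPR loci by the 1059 isolates reported in the spacer paragraph. It is not a value stated by the study. The helper name `percent` is illustrative.

```python
TOTAL_ISOLATES = 1059
TOTAL_LOCI = 1919  # CRISPR loci reported across all 1059 genomes


def percent(part: int, whole: int, ndigits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded to `ndigits`."""
    return round(100 * part / whole, ndigits)


share_three_or_more = percent(203, TOTAL_ISOLATES)  # isolates with >= 3 loci
share_single_locus = percent(422, TOTAL_ISOLATES)   # isolates with exactly 1 locus
# Derived (not reported) average; it hides the skew toward 1-2 loci per genome.
mean_loci_per_isolate = round(TOTAL_LOCI / TOTAL_ISOLATES, 2)

print(share_three_or_more, share_single_locus, mean_loci_per_isolate)  # 19.2 39.8 1.81
```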
Haplotype specificity of S. Typhi spacer arrangement patterns has been described previously [11]. We could not confirm those associations, primarily because we were unable to identify the same spacers, with one exception: Ts32v, which matches 31 of 32 bp of a spacer from CRISPR2 described by Fabre et al. [11]. However, our study revealed multiple spacers (Ts32g, Ts32h, and Ts32i), spacer arrangement patterns (a2 and a5), DRs (Td23a, Td35a, and Td39a-b), and DR-spacer pairing patterns (Ts34d-Td35a, Ts55a-Td23a, and Ts54a-Td39a/b) specific to particular AMR profiles, countries, genotypes, or surveillance, travel, and outbreak categories (Figure 2, Figure 3 and Figure 5, Table 2 and Table 4, Figure S6, Table S3, and Dataset S1). The identified spacer, DR, and DR-spacer patterns could therefore be exploited by CRISPR-based diagnostic platforms such as SHERLOCK or DETECTR on clinically relevant samples [36,37] to identify AMR among endemic isolates that are spreading within and beyond South Asian countries [29,38].
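A near-identical match such as Ts32v (31 of 32 bp) can be scored by counting mismatches on both strands, because a spacer may be reported in either orientation. The sketch below assumes equal-length, gap-free sequences. The function names and the 32-bp toy sequences are hypothetical. In practice an aligner such as BLASTN would handle gaps and partial overlaps.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_match(query: str, reference: str) -> tuple[int, str]:
    """Return (identical positions, strand) for the better-matching strand."""
    query = query.upper()
    forward = len(query) - hamming(query, reference.upper())
    reverse = len(query) - hamming(query, reverse_complement(reference))
    return (forward, "+") if forward >= reverse else (reverse, "-")


# Hypothetical 32-bp spacers that differ only at the first base.
ref = "AAAACCCCGGGGTTTTACGTACGTAAAACCCC"
query = "TAAACCCCGGGGTTTTACGTACGTAAAACCCC"
print(best_match(query, ref))  # (31, '+')
```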
The spacer sequences of S. Typhi showed remarkable conservation: only 47 unique spacers were detected among the 1919 CRISPR loci identified in the genomes of 1059 S. Typhi isolates (Table 3 and Dataset S3). Many spacers in group-A loci (Ts32c, e, g, h, i, and l) were present in almost all S. Typhi isolates, whereas specific spacers (Ts55a, Ts54a, and Ts34d) were frequently present in group-B loci (Figure 2 and Table 3). Reports on CRISPRs in other pathogens described higher numbers of unique spacers, e.g., 2823 spacers from 669 Pseudomonas aeruginosa isolates and 745 from 100 E. coli isolates [26,39]. In our study, 48 isolates of other Salmonella (19 different serovars) and six E. coli isolates yielded 857 unique spacers from 136 CRISPR loci and 118 unique spacers from 35 loci, respectively (data not shown). By contrast, a study of 400 Salmonella enterica isolates of four serovars (Enteritidis, Typhimurium, Newport, and Heidelberg) reported 179 unique spacers [21]. Lower numbers of unique spacers have also been reported for pathogens such as Campylobacter jejuni, Neisseria meningitidis, Pasteurella multocida, Streptococcus agalactiae, and Shigella spp. [40,41]. The conserved nature of S. Typhi spacers could be due to its host restriction.
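The spacer counts in this paragraph come from samples of very different sizes. Dividing unique spacers by the number of isolates gives a rough way to compare them. The sketch below does this using only the figures stated above. The normalization is our illustration and not a statistic reported by any of the cited studies. It ignores differences in loci per genome and in serovar diversity, so it should be read as indicative only. The dictionary labels are ours.

```python
# (unique spacers, isolates) as stated in this paragraph; labels are ours.
REPORTED = {
    "S. Typhi (this study)": (47, 1059),
    "S. enterica, four serovars [21]": (179, 400),
    "P. aeruginosa": (2823, 669),
    "E. coli (100 isolates)": (745, 100),
    "other Salmonella (this study)": (857, 48),
    "E. coli (this study)": (118, 6),
}


def spacers_per_isolate(reported: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Unique spacers divided by isolates, sorted from least to most diverse."""
    ratios = {
        label: spacers / isolates
        for label, (spacers, isolates) in reported.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1])


ranking = spacers_per_isolate(REPORTED)
for label, ratio in ranking:
    print(f"{label:32s} {ratio:6.2f}")
```

Under this crude measure, S. Typhi has roughly 0.04 unique spacers per isolate, at least an order of magnitude below every other dataset listed.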
It is now well established that spacers share complementarity with a target sequence (protospacer) in foreign DNA. S. Typhi CRISPRs have been studied before, but their PAM sequence had yet to be defined. In our work, we report for the first time a possible PAM sequence, TTTCA/T. Although this PAM is based on the protospacers of only nine different spacers (Table 6), the nearly universal presence of two phage-targeting spacers, Ts32g (n = 1054) and Ts32i (n = 976), makes this PAM motif more plausible. Moreover, Ts32i targets a Salmonella phage, which suggests a functional CRISPR-Cas-mediated antiviral immune system that protects the S. Typhi genome against bacteriophages.
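A PAM is typically inferred by collecting the bases adjacent to each protospacer match and summarizing them position by position. The sketch below illustrates that idea. It reports every base that reaches a frequency threshold and joins alternatives with "/", mirroring the "A/T" notation in TTTCA/T. The function names, the 0.3 threshold, the `side` convention and the toy flanks are all assumptions for illustration. They are not the procedure or data used in this study. The toy flanks deliberately do not reproduce the reported motif.

```python
from collections import Counter


def extract_flank(target: str, start: int, end: int, k: int, side: str) -> str:
    """Return up to k bases just before ('up') or after ('down') target[start:end]."""
    if side == "up":
        return target[max(0, start - k):start]
    if side == "down":
        return target[end:end + k]
    raise ValueError("side must be 'up' or 'down'")


def consensus_motif(flanks: list[str], min_freq: float = 0.3) -> str:
    """Per-position consensus; bases reaching `min_freq` are joined by '/'."""
    if not flanks:
        raise ValueError("need at least one flank")
    length = len(flanks[0])
    if any(len(f) != length for f in flanks):
        raise ValueError("flanks must have equal length")
    motif = []
    for i in range(length):
        counts = Counter(f[i].upper() for f in flanks)
        keep = sorted(b for b, n in counts.items() if n / len(flanks) >= min_freq)
        motif.append("/".join(keep) if keep else "N")  # N: no base passes
    return "".join(motif)


# Hypothetical flanks taken from four imaginary protospacer hits.
toy_flanks = ["CTTGA", "CTTGT", "CTTGA", "ATTGT"]
print(consensus_motif(toy_flanks))  # CTTGA/T
```

Adjacent ambiguous positions would make the "/" notation hard to read. IUPAC ambiguity codes (e.g. W for A/T) are the usual alternative.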
Furthermore, the spacers and DRs of group-A and group-B CRISPR loci were clearly distinct in our work. Very few spacers (n = 8) and DRs (n = 1) were present in both groups. Considering the spacer targets, S. Typhi group-A CRISPR loci seem more associated with phage defense, whereas group-B CRISPR loci potentially play a role in defense against plasmids (Table 3 and Table 6). This is an unusual finding, since defense against phages and plasmids in other bacterial species has mainly been linked to group-A CRISPR loci [42,43,44].
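Counting spacers or DRs shared between the two locus groups reduces to set operations on the pooled sequences of each group. The sketch below assumes that "shared" means exact sequence identity. The study's own matching criterion may differ. The function names and toy sequences are hypothetical.

```python
from collections.abc import Iterable


def pooled(loci: Iterable[Iterable[str]]) -> set[str]:
    """Union of all spacer (or DR) sequences across a collection of loci."""
    out: set[str] = set()
    for locus in loci:
        out.update(seq.upper() for seq in locus)
    return out


def compare_groups(group_a_loci, group_b_loci) -> dict[str, int]:
    """Counts of sequences unique to each group and shared by both."""
    a, b = pooled(group_a_loci), pooled(group_b_loci)
    return {"only_A": len(a - b), "only_B": len(b - a), "shared": len(a & b)}


# Hypothetical loci, each given as a list of spacer sequences.
group_a = [["AAAT", "CCCG"], ["AAAT", "GGGA"]]
group_b = [["TTTA", "GGGA"]]
print(compare_groups(group_a, group_b))  # {'only_A': 2, 'only_B': 1, 'shared': 1}
```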
Consistent with previous reports [11,12,13], all CRISPR-Cas systems identified in S. Typhi in our study belong to type I-E. Among the identified cas genes, very few (n = 5) had an incomplete reading frame (Figure 7), which could be caused by nonsense mutations or sequencing errors. All cas gene loci were detected near a group-A locus except six, where a group-B locus was present instead (Figure 7b). Thus, most group-B loci can be called “orphan” loci. According to CRISPRCasFinder, CRISPR loci with low evidence scores (which we termed group-B loci) might be false positives, but some of these arrays can be genuine. Indeed, the CRISPRCasFinder tool was specifically designed to identify these types of CRISPR loci so that they could be functionally studied [32]. To our knowledge, orphan loci have not been reported for S. Typhi before. However, as shown in other prokaryotes, they can exist and even be functional without nearby cas gene loci [15,16,18,32,45,46].
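Whether a CRISPR locus counts as "orphan" depends on whether any cas gene lies close to it on the same contig. The sketch below encodes that rule. The 20 kb default window, the `Feature` record and the function names are hypothetical choices for illustration. They are not the criteria used by CRISPRCasFinder or by this study. In draft assemblies, contig breaks can also make a locus look orphan when its cas genes simply sit on another contig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    contig: str
    start: int  # coordinates on the contig, start <= end (assumed)
    end: int


def gap(a: Feature, b: Feature) -> int:
    """Approximate bases separating two features (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def orphan_loci(loci: list[Feature], cas_genes: list[Feature],
                max_distance: int = 20_000) -> list[str]:
    """Names of loci with no cas gene on the same contig within max_distance."""
    orphans = []
    for locus in loci:
        near = any(
            gene.contig == locus.contig and gap(locus, gene) <= max_distance
            for gene in cas_genes
        )
        if not near:
            orphans.append(locus.name)
    return orphans


# Hypothetical coordinates.
loci = [
    Feature("A", "contig1", 1_000, 2_000),
    Feature("B", "contig1", 100_000, 100_500),
    Feature("C", "contig2", 500, 900),
]
cas = [Feature("cas3", "contig1", 2_500, 5_000)]
print(orphan_loci(loci, cas))  # ['B', 'C']
```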
We also identified three cas genes belonging to other types of CRISPR-Cas systems, namely DinG, DEDDh, and WYL (Figure S9 and Table S4). Although the presence of the DinG family helicase gene suggests a type IV-A CRISPR-Cas system [33], no other cas genes of that system were found. No CRISPR loci were present on the same contigs either, but that is not uncommon for this type of system [16,18]. The type IV-A system is considered a degraded derivative of class 1 CRISPR-Cas systems, hypothesized to have originated from combinations of mobile genetic elements [16,18,47]. The presence of multiple copies of the WYL gene (part of the type I system) among the S. Typhi isolates in our study was interesting, as two copies of this gene, WYL693 and WYL888, differed in origin and distribution. The former had a chromosomal match, whereas the latter was probably plasmid-borne (Table S4). WYL888 matched the plasmid sequences of genotypes 4.3.1.3q1 (Bdq lineage) and 4.3.1.1.P1 (XDR lineage) [5,6,7], making it a potential biomarker for these resistant lineages. However, the role of WYL888 on these plasmids remains to be elucidated. Remarkably, isolates of both lineages completely lacked the DEDDh558 gene (Table S4). Proteins containing the WYL domain are not uncommon in bacteria and have been reported to regulate transcription of CRISPR-Cas systems [48]. DEDDh proteins, on the other hand, have defined exonuclease activity and can be fused with cas1 and cas2 gene products to exert this function [49,50]. The presence of multiple DEDDh domains in S. Typhi genomes may indicate that they compensate for the shorter cas3 gene (compared with other Salmonella; data not shown), which also functions as an exonuclease.
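The two observations on WYL888 (present on the lineage plasmids) and DEDDh558 (absent from both lineages) could be combined into a simple presence/absence screen over annotated gene sets. The sketch below only illustrates such a rule. It is not a validated assay, and the isolate names and gene sets are hypothetical. The gene identifiers are the ones used in this Discussion.

```python
def flag_candidate_lineage(gene_sets: dict[str, set[str]],
                           marker_present: str = "WYL888",
                           marker_absent: str = "DEDDh558") -> list[str]:
    """Isolates carrying `marker_present` while lacking `marker_absent`."""
    return sorted(
        isolate
        for isolate, genes in gene_sets.items()
        if marker_present in genes and marker_absent not in genes
    )


# Hypothetical per-isolate gene annotations.
annotations = {
    "iso1": {"WYL693", "WYL888"},
    "iso2": {"WYL693", "DEDDh558"},
    "iso3": {"WYL888", "DEDDh558"},
}
print(flag_candidate_lineage(annotations))  # ['iso1']
```

Requiring both conditions trades sensitivity for specificity. Any real use would need validation against isolates of known genotype.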