Characterization of CRISPR-Cas Systems in Serratia marcescens Isolated from Rhynchophorus ferrugineus (Olivier, 1790) (Coleoptera: Curculionidae)

The CRISPR-Cas adaptive immune system has been attracting increasing scientific interest for biological functions and biotechnological applications. Data on the Serratia marcescens system are scarce. Here, we report a comprehensive characterisation of CRISPR-Cas systems identified in S. marcescens strains isolated as secondary symbionts of Rhynchophorus ferrugineus, also known as Red Palm Weevil (RPW), one of the most invasive pests of major cultivated palms. Whole genome sequencing was performed on four strains (S1, S5, S8, and S13), which were isolated from the reproductive apparatus of RPWs. Subtypes I-F and I-E were harboured by S5 and S8, respectively. No CRISPR-Cas system was detected in S1 or S13. Two CRISPR arrays (4 and 51 spacers) were detected in S5 and three arrays (11, 31, and 30 spacers) were detected in S8. The CRISPR-Cas systems were located in the genomic region spanning from ybhR to phnP, as if this were the only region where CRISPR-Cas loci were acquired. This was confirmed by analyzing the S. marcescens complete genomes available in the NCBI database. This region defines a genomic hotspot for horizontally acquired genes and/or CRISPR-Cas systems. This study also supplies the first identification of subtype I-E in S. marcescens.


Introduction
Prokaryotic genomic loci, named Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-CRISPR associated proteins (CRISPR-Cas system) are adaptive immune systems that confer resistance to the entry of intracellular parasitic elements like phages and plasmids [1][2][3]. These systems might also play additional functions, such as regulation of virulence and the stress response [4][5][6][7]. The CRISPR-Cas systems consist of at least one array where DNA sequences of 25-35 bp each (direct repeats-DRs) with a high level of nucleotide identity are regularly interspaced by unique sequences, termed spacers (typically 30-40 bp each), that may differ considerably, depending on their extracellular origin [8].
The CRISPR arrays are generally flanked at one end by an AT-rich, non-coding region (leader sequence) extending from 47 bp to a few hundred bp, which promotes CRISPR transcription and guides the insertion of new spacers [9][10][11][12]. Single or multiple CRISPR arrays may be found adjacent to Cas proteins; the latter mediate adaptive immunity and various connected functions [7,13,14]. The Cas proteins are grouped into two functional modules [14][15][16]: (i) An adaptation module that delivers exogenous DNA sequences into CRISPR arrays, and (ii) effector modules that target and cleave invading nucleic acids. The adaptation module, mainly composed of Cas1 and Cas2 proteins, is conserved, while the effector modules are highly variable among the CRISPR-Cas systems described in the literature.
Based on the different gene organizations of the effector modules, the CRISPR-Cas systems are currently grouped into classes, types, and subtypes. Class 1 includes three types (I, III, and IV) which, in turn, are subdivided into eight, five, and two subtypes, respectively. Class 2 is composed of three types (II, V, and VI) subdivided into four, ten, and four subtypes, respectively [14][15][16]. CRISPR-Cas systems are widespread in prokaryotes; the Enterobacteriaceae bacterial family has recently been reviewed by Medina-Aparicio et al., [17]. These systems were detected in 57% of the analyzed genomes, with subtypes I-E and I-F being almost exclusively identified. I-E was mainly detected in Escherichia, Klebsiella, Salmonella, and Shigella genera and I-F was detected in Yersinia and, to a lesser extent, in Escherichia genera. However, for the genus Serratia and its type species marcescens, data on CRISPR-Cas systems are still very scarce. Indeed, the only descriptions of CRISPR-Cas systems in S. marcescens are those reported by: (1) Vicente et al., 2016 [18] on the presence of a CRISPR-Cas system subtype I-F in the S. marcescens strain PWN146, isolated from the nematode Bursaphelenchus xylophilus; (2) Srinivasan et al., 2018 [19] who reported the presence of cas1, cas3, and additional csy genes (csy1-4) in the S. marcescens strain SM03 isolated from human gut; and (3) Medina-Aparicio et al., 2018 2018 [17] who mentioned the presence of the CRISPR-Cas system subtype I-F in the S. marcescens strain FG194.
The aims of this study were: (i) to identify and characterize the CRISPR-Cas systems in S. marcescens symbiont strains of Rhynchophorus ferrugineus, also known as Red Palm Weevil (RPW), one of the most invasive pests of major cultivated palms; (ii) to identify the genomic region(s) of S. marcescens involved in horizontal acquisition of CRISPR-Cas systems; and iii) to provide data on the symbiotic relationship between RPW and S. marcescens (e.g., the clonal relationship of bacteria symbionts).

Serratia marcescens Strains
S. marcescens was regularly isolated from the reproductive apparatus of adult RPWs, deposited eggs, and along the tissue of infested palms. Isolation was performed in Southern Italy from 2009 to 2014 [20]. Based on different places and years of isolation (Table 1), 4 strains (S1, S5, S8, and S13), collected from the reproductive apparatus of RPWs were randomly selected for whole genome sequencing and detection of CRISPR-Cas systems.

Genome Sequencing and Characterization of CRISPR-Cas Systems
Genomic DNA was purified using the cetyltrimethylammonium bromide method [21] and sequenced using the SELGE ("Laboratory network for the selection, characterization and conservation of germplasm and for preventing the spread of economically-relevant and quarantine pests" No. 14, subsidised by the Apulia Region, PO FESR 2007-2013-Axis I, Line of intervention 1.2, Action 1.2.1) using a 100 bp paired end protocol on an Illumina HiScanSQ platform. Whole genome sequencing reads of each strain were assembled in multiple contigs using the MyPro pipeline [22], failing to create chromosome-wide contigs due to the presence of repeated sequences (e.g., IS). Contigs positive for CRISPR-Cas systems were annotated using Prokka [23] and were graphically displayed using SnapGene software, version 4.3.11 (GSL Biotech LLC, Chicago, IL; https://www.snapgene.com/). The sequence of the S5 contig where the CRISPR-Cas system was identified was found to be interrupted in CRISPR array S5.2 and completed by primer walking of the~3.2 kb amplicon from nucleotide 30,974 to 34,322. PCRs were performed in a total volume of 50 µL containing 50-100 ng of genomic DNA of strain S5 using 1X PCR buffer (20 mM Tris-HCl, pH 8.4, and 50 mM KCl), 1.5 mM MgCl 2 , 125 µM of deoxynucleotide triphosphate mix (EuroClone S.p.A., Italy), 20 µM of each primer and 1U Platinum Taq DNA polymerase (Invitrogen, Thermo Fisher Scientific, Italy). Cycling conditions were: 94 • C for 5 min; 5 cycles of 94 • C for 30 s, 65 • C for 10 s, and 72 • C for 4 min; 10 cycles of 94 • C for 30 s, 65 • C decreasing to 60 • C (0.5 • C/cycle) for 10 s, and 72 • C for 4 min; and 15 cycles of 94 • C for 30 s, 60 • C for 10 s, and 72 • C for 4 min with final extension at 72 • C for 5 min. Primer walking was performed by GENEWIZ European Headquarters (UK). The primers used in this study are reported in Table S1. The sequence of S1, S5, S8, and S13 contigs harbouring CRISPR-Cas systems and spanning from tRNA-Leu to phnF was submitted to NCBI under the accession numbers MK507743, MK507745, MK507744, and MK507746, respectively.
Bioinformatic analysis of the spacer sequences was performed by CRISPRTarget in GenBank-PHAGE and RefSeq-PLASMID databases [26]. In order to establish if the spacers' sequences were shared among array(s), a pairwise comparison was performed by NCBI Blast. Two spacers with at least 95% identity and less than 10% difference in length were considered to be the same [27]. The online tool WebLogo (http://weblogo.berkeley.edu/logo.cgi) was used to analyze the conservation of DRs of CRISPR arrays [28].
AT rich, non-coding sequences placed upstream and downstream of CRISPR arrays were selected to detect putative leader sequences. The AT% was calculated and compared with the AT% of the whole-contig sequence. Putative leader sequences were screened for prokaryotic promoters using the BDGP Neural Network Promotor Prediction tool (http://www.fruitfly.org/seq_tools/promoter.html).
The phylogenetic tree was produced using BLAST pairwise alignments, and the Blast Tree View Widget. BLAST computes a pairwise alignment between a query (e.g., S1 contig) and a dataset of sequences (i.e., the other selected regions in this study). The algorithm used to produce the phylogenetic tree involved setting the Fast-Minimum Evolution [29] to 0.75, the maximum allowed fraction of mismatched bases in the aligned region between any pair of sequences.

PFGE
Genomic restriction was performed according to the standardised PulseNet Salmonella protocol [30]. Agarose-embedded DNA was digested with 40 U of SpeI for 3 h at 37 • C. The restriction fragments were separated by electrophoresis in Tris-borate-EDTA (44.5 mM Tris-borate, 1 mM EDTA; pH 8.0) at 14 • C using a CHEF-DRIII (Bio-Rad, Milan, Italy). Electrophoresis conditions were as follows: (Block I) initial switch time 2 s, final switch time 10 s, voltage 6 V, included angle 120 • , and run time 13 h; (block II) initial switch time 20 s, final switch time 25 s, voltage 6 V, included angle 120 • , and run time 6 h. The size standard used for all gels was XbaI-digested DNA from Salmonella Braenderup strain, the universal size standard used by all PulseNet laboratories. The PFGE agarose gels were stained with ethidium bromide (40 µg/mL) and the DNA band images were acquired by the Gel Doc-It photo documentation system (Gel Doc-It photo documentation system, UVP, Upland, CA, USA). A dendrogram of pulsotype relationships was developed through the unweighted pair group method using arithmetic averages (UPGMA) with BioNumerics software version 6.5 (Applied Maths). Pulsotypes were assigned to the same clusters if they exhibited 80% similarity in the dendrogram.

CRISPR-Cas Systems in S. marcescens from RPW
CRISPR-Cas systems were detected in two (S5 and S8) of the four strains whose genomes were sequenced in this study. The systems were chromosomally located within a variable region that extended from ybhR (encoding for an inner membrane transport permease) to phnP (encoding for a protein involved in the phosphonate metabolism). Outside of this region (i.e., from the gene for tRNA Leu to phnF) the gene organization was found conserved. A detailed description of this region, the cas genes organization, and their positions on the corresponding contig are reported in Figure 1 and Table S2. and run time 13 h; (block II) initial switch time 20 s, final switch time 25 s, voltage 6 V, included angle 120°, and run time 6 h. The size standard used for all gels was XbaI-digested DNA from Salmonella Braenderup strain, the universal size standard used by all PulseNet laboratories. The PFGE agarose gels were stained with ethidium bromide (40 µg/mL) and the DNA band images were acquired by the Gel Doc-It photo documentation system (Gel Doc-It photo documentation system, UVP, Upland, CA, USA). A dendrogram of pulsotype relationships was developed through the unweighted pair group method using arithmetic averages (UPGMA) with BioNumerics software version 6.5 (Applied Maths). Pulsotypes were assigned to the same clusters if they exhibited 80% similarity in the dendrogram.

CRISPR-Cas Systems in S. marcescens from RPW
CRISPR-Cas systems were detected in two (S5 and S8) of the four strains whose genomes were sequenced in this study. The systems were chromosomally located within a variable region that extended from ybhR (encoding for an inner membrane transport permease) to phnP (encoding for a protein involved in the phosphonate metabolism). Outside of this region (i.e. from the gene for tRNA Leu to phnF) the gene organization was found conserved. A detailed description of this region, the cas genes organization, and their positions on the corresponding contig are reported in Figure 1 and Table S2.  S5 harboured a CRISPR-Cas system subtype I-F and had two arrays (S5.1 and S5.2) containing 4 and 51 DRs, respectively. Consensus direct repeats (CDRs) of 28 bp were identified for both arrays (File S1). The I-F cas operon (cas1-cas3-cas8f-cas5-cas7-cas6f ) was found between the arrays S5.1 and S5.2 and a putative leader sequence of 131 bp was detected upstream of each array (Table S3). Of the 53 identified spacers, 19 showed putative origins from phages (6), plasmids (11), or both (2) ( Table 2). S8 harbored the CRISPR-Cas system subtype I-E. Three arrays named S8.1, S8.2, and S8.3 contained 31, 11, and 30 DRs, respectively. CDRs of 29 bp were identified for the three arrays (File S1). The subtype I-E cas operon (cas3-cas8e-cse2-cas7-cas5-cas6-cas1-cas2) was found between S8.1 and S8.2 with an 84 bp putative leader sequence detected upstream of each array (Table S3). Of the 69 identified spacers, 27 showed putative origins from phages (10), plasmids (13), or both (4) ( Table 2). All of the spacers were different. Most of them matched DNA sequences of plasmids mainly isolated from bacteria of the Enterobacteriaceae family. Likewise, phage genomes recognised by spacers came from phages predominantly detected in bacteria of the Enterobacteriaceae family and, to a lesser extent, from other bacteria families (e.g., Burkholderiaceae, Pseudomonadaceae, and Vibrionaceae) (File S2). Nine spacers hit genetic elements identified in S. marcescens and seven matched plasmids, of which one (pSMC2) was conjugative [31] and one (pCAV1492-199) was reported to harbour copper resistance genes [32]. Two spacers hit phages, of which one (phage Eta) might establish unstable lysogeny with the host [33]. Interestingly, phage Eta genome was recognised by six spacers harboured by S8 arrays.

CRISPR-Cas Systems in S. marcescens
A total of 15 S. marcescens complete genomes available in the NCBI database (www.ncbi.nlm.nih. gov/genome/genomes/1112; version Dec. 2018) were investigated to detect the presence of CRISPR-Cas systems or orphan CRISPR arrays (i.e., arrays without adjacent cas genes). This analysis, to the best of our knowledge, has never before been performed. Eight strains were found positive (Table 3). Three strains, two environmental (PWN146 and N4-5) and one clinical (12TM), harboured a CRISPR-Cas subtype I-F. Five isolates harboured orphan CRISPR arrays. The I-E subtype was detected in two environmental (KS10 and EL1) and two clinical (CAV1492 and CAV1761) strains and the I-F subtype was found in the environmental strain B3R3. It is noteworthy that I-E was only identified in orphan CRISPR arrays. Conservation of nucleotide positions within the DRs identified in this study was analyzed by WebLogo and reported in Figures S1 and S2. Subtypes I-E and I-F were localised within the same ybhR-phnP chromosomal region already identified for S5 and S8 strains. We analyzed this region present in complete genomes of S. marcescens strains devoid of detectable CRISPR-Cas systems or orphan CRISPR arrays. Apart from the presence of different CRISPR-Cas systems or orphan CRISPR arrays, this region was found to harbour genes present in specific strains ( Figure S3).
Taken together, these data suggest that the chromosomal region between ybhR and phnP is a hotospot for horizontally acquired genes and/or CRISPR-Cas systems.
The region from the gene for tRNA Leu to phnF was selected to establish the clonal relationship between our strains and those of S. marcescens that we took for comparison. S1, S5, S8, and S13 were distributed along four distant branches (Figure 2). The clonal relationship was also established by analysis of S1, S5, S8, and S13 PFGE profiles, which assigned these strains to different clusters (% of similarity lower than 80) ( Figure S4). A detailed description of the region from gene for tRNA Leu to phnF is reported in Figure S3 and Table S4. detected in two environmental (KS10 and EL1) and two clinical (CAV1492 and CAV1761) strains and the I-F subtype was found in the environmental strain B3R3. It is noteworthy that I-E was only identified in orphan CRISPR arrays. Conservation of nucleotide positions within the DRs identified in this study was analyzed by WebLogo and reported in Figure S1 and Figure S2. Subtypes I-E and I-F were localised within the same ybhR-phnP chromosomal region already identified for S5 and S8 strains. We analyzed this region present in complete genomes of S. marcescens strains devoid of detectable CRISPR-Cas systems or orphan CRISPR arrays. Apart from the presence of different CRISPR-Cas systems or orphan CRISPR arrays, this region was found to harbour genes present in specific strains ( Figure S3). Taken together, these data suggest that the chromosomal region between ybhR and phnP is a hotospot for horizontally acquired genes and/or CRISPR-Cas systems.
The region from the gene for tRNA Leu to phnF was selected to establish the clonal relationship between our strains and those of S. marcescens that we took for comparison. S1, S5, S8, and S13 were distributed along four distant branches (Figure 2). The clonal relationship was also established by analysis of S1, S5, S8, and S13 PFGE profiles, which assigned these strains to different clusters (% of similarity lower than 80) ( Figure S4). A detailed description of the region from gene for tRNA Leu to phnF is reported in Figure S3 and Table S4. Figure 2. Phylogenetic tree. The phylogenetic tree was obtained from comparative analysis of the genomic region extending from the gene for tRNA Leu to phnF in S1, S5, S8, and S13 (in red) and S. marcescens complete genomes available from the NCBI database. System types are highlighted in bold.

Discussion
Serratia is a ubiquitous genus in nature. Serratia species have been isolated from water, soil, animals (including people), and from the surface of plants [34]. In insects, Serratia spp. have been reported either as pathogens or symbionts, with marcescens and liquefaciens being the species predominantly identified [34]. For instance, S. liquefaciens and S. marcescens were found in sugar-beet root-maggot development stages (Tetanops myopaeformis), suggesting an insect microbe symbiosis as well as a nutritional interdependence [35]. S. marcescens was been found to be facultatively associated with the plant-nematode Bursaphelenchus xylophilus [18]. Concerning this, we recently reported S. marcescens as a secondary symbiont of the RPW [20], one of the most invasive pests of major cultivated palms.
Data of CRISPR-Cas systems in S. marcescens are still scarce. Despite being limited, the analysis of the four S. marcescens genome sequences considered in this study shows the presence of either subtype I-E or subtype I-F. Detection of subtype I-E, to the best of our knowledge, has not previously Microorganisms 2019, 7, 368 7 of 9 been reported in this species. In silico analysis revealed the unicity of all the spacers and memories of different foreign genetic elements that invaded S5 and S8.
The subtypes I-E and I-F identified in our strains and in the available complete genomes of S. marcescens were located within the genomic region spanning from ybhR to phnP. This region is likely to be a genomic hotspot for horizontally acquired genes and/or CRISPR-Cas systems. The tendency of CRISPRs to be distributed non-randomly on bacterial chromosomes, but occurring in specific chromosome regions, has already been reported [36][37][38]. For instance, within the Enterobacteriaceae family genome, hotspots involved in the acquisition of CRISPR-Cas systems have been described for Klebsiella spp. [39,40] and Escherichia coli [41]. Our findings extend this knowledge to S. marcescens.
We also performed a phylogenetic analysis by using the genomic sequence encompassing ybhR-phnP and extending from the gene for tRNA Leu to phnF. In the resulting dendrogram, our strains were situated on distant branches. The clonal relationship was further established by analysis of the S1, S5, S8, and S13 pulsotypes that assigned these bacteria to different clonal groups. The unrelated clonal relationship, the different immune memories, and the presence of different CRISPR subtypes or their absence suggest that RPW might have acquired S. marcescens through independent events. However more data are necessary to support this hypothesis.
CRISPR studies are receiving a lot of interest from the scientific community, particularly for their potential applications in different fields [42,43]. However, studies of CRISPR diversity can also greatly help in the understanding of evolutionary dynamics of specific bacteria populations [44]. The symbiotic association between eukaryotes and prokaryotes is now recognised as one of the main forces shaping life on our planet [45]. Concerning this, the data presented here supply both a comprehensive characterization of CRISPR-Cas systems in S. marcescens symbionts of RPW and the first comparative analysis of CRISPR loci in S. marcescens.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-2607/7/9/368/s1, Table S1: Primers used to sequence the S5.2 CRISPR array; Table S2: Position of cas genes in contigs S5 and S8; Table S3: Putative leader sequences and their predicted promoters; Table S4: Genomic region spanning from tRNA-Leu to phnF; Figure S1: WebLogo of I-E DRs. Nucleotide frequency within CRISPR-arrays was shown by the letter's height. The codes on the left identify the strain. The number of DRs is reported in brackets. Figure S2: WebLogo of I-F DRs. Nucleotide frequency within CRISPR-arrays was shown by the letter's height. The codes on the left identify the strain. The number of DRs is reported in brackets; Figure S3: comparative analysis of the genomic region extending from the gene for tRNA Leu to phnF. Names of our strains and S. marcescens strains complete genomes retrieved from the NCBI database are reported on the left side. The region was labelled both by automatic Prokka annotation and manual editing. Dashed lines follow the corresponding region among strains. Bold dashed lines delimit the variable domain. The array code, number of DRs and subtype are reported in the grey flags on the top of the corresponding CRISPR array. Genes and open reading frames are represented by arrow boxes pointing in the direction of transcription. Feature names are labelled below arrow boxes; Figure S4: PFGE. Pulsed field gel electrophoresis profiles and dendrogram showing the relationship between the S. marcescens strains S1, S5, S8 and S13; File S1: sequences of consensus direct repeats and arrays; File S2: biological origin of protospacers and their distibution among arrays.

Funding:
The study was partially supported by the CIHEAM IAMB within the event: "International meeting-Innovative and sustainable approaches to control the Red Palm Weevil (RPW)" held in CIHEAM Bari, 23-25 October 2018.