Genotyping Study of Salmonella 4,[5],12:i:- Monophasic Variant of Serovar Typhimurium and Characterization of the Second-Phase Flagellar Deletion by Whole Genome Sequencing

After Salmonella Enteritidis and S. Typhimurium, S. 4,[5],12:i:- is the most reported serovar in human clinical cases. During the past 20 years, many tools have been used for its typing and second-phase flagellar deletion characterization. Currently, whole genome sequencing (WGS) and different bioinformatic programs have shown the potential to be more accurate than earlier tools. To assess this potential, we analyzed by WGS and in silico typing a selection of 42 isolates of S. 4,[5],12:i:- and S. Typhimurium with different in vitro characteristics. Comparative analysis showed that SeqSero2 does not differentiate fljB-positive S. 4,[5],12:i:- strains from those of serovar Typhimurium. Our results proved that the strains selected for this work were non-clonal S. 4,[5],12:i:- strains circulating in Spain. Using WGS data, we identified 13 different deletion types of the second-phase flagellar genomic region. Most of the deletions were generated by IS26 insertions, showing orientation-dependent conserved deletion ends. In addition, we detected S. 4,[5],12:i:- strains of the American clonal line that would give rise to the Southern European clone in Spain. Our results suggest that new S. 4,[5],12:i:- strains are continuously emerging from different S. Typhimurium strains via different genetic events, at least in swine products.


Introduction
Salmonella enterica subsp. enterica consists of more than 2600 serovars [1]. Nontyphoidal Salmonella serovars are common causative zoonotic agents of bacterial food-borne disease worldwide. After S. Enteritidis and S. Typhimurium, the monophasic variant S. 4,[5],12:i:- is the most frequently reported serovar in clinical human infections, and is responsible for about 4.7% of total reported cases [2]. The monophasic variant S. 4,[5],12:i:- is antigenically similar to S. Typhimurium (which has the antigenic formula 4,5,12:i:1,2) but does not express the second-phase flagellar antigen, which is identified as 1,2 in the S. Typhimurium antigenic formula. The first described monophasic variant of S. Typhimurium emerged in Spain in 1997 [3] and became the fourth most common serovar in clinical isolates in 1998 [4]. Thereafter, the emergence of multiple clones of the monophasic variant of S. Typhimurium has been [...] is a clear example of the usefulness of NGS techniques to carry out a complete characterization of S. 4,[5],12:i:-, and it shows that these techniques can be useful for the monitoring of S. 4,[5],12:i:- strains circulating both in Spain and worldwide.

Isolate Collection
A total of 42 Salmonella enterica isolates collected from 1999 to 2015, from different matrices and Spanish locations, were selected for this study (Table A1). The selection was based on the different origins and characteristics studied in vitro, with the aim of reflecting the genetic variation among the monophasic strains circulating in recent years in Spain. Briefly, the isolates were: (i) 13 S. 4,[5],12:i:- from unrelated gastroenteric infection cases; (ii) 4 S. 4,[5],12:i:- from pork sausages; (iii) 15 S. 4,[5],12:i:- from asymptomatic pigs, of which 13 were from the intestinal content (IC) and 2 from mesenteric lymph nodes (MLNs); and (iv) 10 S. Typhimurium, of which 9 were from MLNs and 1 was from the IC of asymptomatic pigs. All the isolates were provided with the serotype determined by the Kauffmann-White scheme [1], the antimicrobial susceptibility determined by the Kirby-Bauer disc diffusion test [22] and the phage type defined according to Anderson et al. [23].

Whole Genome Sequencing and in Silico Genotyping
Genomic DNA from the 42 isolates was extracted and purified using the NucleoSpin Tissue DNA purification kit (Macherey-Nagel, Duren, Germany), according to the manufacturer's instructions. Sequencing libraries were prepared using the NexteraXT library preparation kit and WGS was performed on the Illumina MiSeq platform, generating 250 bp paired-end reads. The sequences were submitted to the European Nucleotide Archive (https://www.ebi.ac.uk/ena) under the project accession number PRJEB37694. Raw reads were assembled into contigs using the INNUca pipeline (https://github.com/theInnuendoProject/INNUca), which consists of several modules [24]. Firstly, INNUca calculates whether the sample raw data fulfil the expected coverage (minimum 15X). Then, INNUca uses FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) to perform a read quality analysis and Trimmomatic [25] to trim the reads. After subjecting reads to quality analysis using FastQC again, INNUca proceeds to de novo draft genome assembly with SPAdes [26]. Subsequently, coverage filtering is performed using Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) and Samtools (http://www.htslib.org/doc/samtools.html). Next, Pilon [27] improves the draft genome, removing very poorly represented sequences, correcting bases, fixing misassemblies and filling gaps. Finally, the INNUca workflow ends with species confirmation and MLST prediction of seven genes using mlst2 (https://github.com/tseemann/mlst).
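The first INNUca step described above, the check against a minimum expected coverage of 15X, can be illustrated with a minimal sketch. This is not INNUca's own code: the function names, the 4.8 Mb genome size and the read counts are assumptions chosen for illustration, and depth is estimated simply as total sequenced bases divided by the expected genome size.

```python
def estimated_coverage(read_lengths, genome_size_bp):
    """Estimate mean depth as total sequenced bases / expected genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return sum(read_lengths) / genome_size_bp


def passes_coverage(read_lengths, genome_size_bp, min_depth=15.0):
    """Return True when the estimated depth reaches the minimum (15X here)."""
    return estimated_coverage(read_lengths, genome_size_bp) >= min_depth


# Hypothetical sample: 400,000 read pairs of 250 bp (values are assumptions).
reads = [250] * (2 * 400_000)
assumed_genome_size = 4_800_000  # assumed Salmonella genome size in bp

depth = estimated_coverage(reads, assumed_genome_size)
print(f"estimated depth: {depth:.1f}X, passes: "
      f"{passes_coverage(reads, assumed_genome_size)}")
```

A sample failing this check would normally be resequenced rather than assembled, since low depth propagates into fragmented contigs.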
Serovar and antibiotic resistance predictions were performed using SeqSero2 [28] and ResFinder 4.0 (95% ID threshold, 60% minimum length) [29], available as web services at the Center for Genomic Epidemiology (http://www.genomicepidemiology.org), and were then compared with the results provided by classical microbiology.
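The two ResFinder settings quoted above (95% identity, 60% minimum length) act as a filter on candidate gene hits. The sketch below shows that filtering logic under stated assumptions: the `Hit` record, its field names and the example hits are hypothetical and do not reproduce ResFinder's internal data structures or output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical alignment hit against a resistance gene database."""
    gene: str
    identity_pct: float   # percent identity of the alignment
    aligned_len: int      # aligned length in bp
    gene_len: int         # full length of the reference gene in bp


def passes_thresholds(hit, min_identity=95.0, min_length_frac=0.60):
    """Keep a hit only if it meets both the identity and length thresholds."""
    length_frac = hit.aligned_len / hit.gene_len
    return hit.identity_pct >= min_identity and length_frac >= min_length_frac


# Illustrative hits only; gene names and numbers are placeholders.
hits = [
    Hit("geneA", 99.8, 438, 438),    # passes both thresholds
    Hit("geneB", 96.1, 500, 1200),   # fails: only about 42% of gene length
    Hit("geneC", 91.0, 1206, 1206),  # fails: identity below 95%
]

kept = [h.gene for h in hits if passes_thresholds(h)]
print(kept)
```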

Results
ResFinder detected at least one antibiotic resistance gene in all S. 4,[5],12:i:- and S. Typhimurium strains (Table 2). Aminoglycoside resistance genes (i.e., aac, aad and/or aph) were the most common, with the cryptic gene aac(6')-Iaa being found in all the isolates studied (Table 2). In addition, 14 different genotypic antimicrobial profiles were determined and 66.67% of the strains carried resistance genes to at least 3 antibiotics. The most frequent genotypic multi-resistance profile was ASSuT (19.05%). Interestingly, all the strains with this tetra-resistance profile were S. 4,[5],12:i:-, coinciding with the reported European monophasic clone. ResFinder correctly identified 92.12% (117/127) of the antibiotic resistances found phenotypically. Compared with classical antibiograms, ResFinder detected the same antibiotic resistances or more in 85.71% (36/42) of the isolates. In the remaining 6 isolates, ResFinder detected fewer antibiotic resistances than the classical antibiograms.
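The per-isolate comparison between genotypic predictions and antibiograms can be sketched as a set comparison. The rule used here is an assumption made for illustration: an isolate counts as "same or more" when every phenotypic resistance is also predicted genotypically, and as "fewer" otherwise. The three isolates and their profiles are hypothetical and are not taken from Table 2.

```python
def compare_profiles(phenotypic, genotypic):
    """Classify one isolate by whether any phenotypic resistance was missed."""
    missed = set(phenotypic) - set(genotypic)
    return "same_or_more" if not missed else "fewer"


def percent(numerator, denominator):
    """Percentage helper that guards against an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return 100.0 * numerator / denominator


# Hypothetical isolates: (phenotypic resistances, genotypic predictions).
# A = ampicillin, S = streptomycin, Su = sulfonamides, T = tetracycline,
# C = chloramphenicol.
isolates = {
    "iso1": ({"A", "S", "Su", "T"}, {"A", "S", "Su", "T"}),
    "iso2": ({"T"}, {"T", "S"}),
    "iso3": ({"A", "C"}, {"A"}),
}

calls = {name: compare_profiles(p, g) for name, (p, g) in isolates.items()}
agreeing = sum(1 for c in calls.values() if c == "same_or_more")
share = percent(agreeing, len(calls))
print(calls, f"{share:.2f}%")
```

Applied to the counts reported in the text, the same `percent` helper evaluates 36 of 42 isolates and 117 of 127 resistances.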
In contrast, among S. Typhimurium strains, the replicon most frequently identified was IncFII(S) (70.00%). All the genomes analyzed showed at least one prophage sequence by PHASTER analysis (Table 1). A total of 15 different prophage sequences were found, Gifsy1 (88.10%), Gifsy2 (50.00%) and Sal3 (47.62%) being the most frequent (Table 1). In silico PCR simulation detected the presence of the insertion sequence IS26 in 96.87% of the monophasic strains and in 30.00% of S. Typhimurium strains (Table 1).
Using a traditional Salmonella MLST scheme, formed by 7 housekeeping genes (aroC, dnaN, hemD, hisD, purE, sucA and thrA), the isolates were classified into 2 different sequence types (Table 1). On the one hand, most of the S. [...] On the other hand, 90.00% of the S. Typhimurium strains were classified in ST-19 and the other 10.00% in ST-34. In contrast, by the cgMLST scheme consisting of 3002 genes, the isolates were classified into 23 different sequence types. None of the S. 4,[5],12:i:- strains shared the same cgMLST type with the S. Typhimurium strains, even though some were isolated from the same pig farm.

Characterization of the fljAB Operon Deletion Types by WGS
As mentioned above, the fljAB operon (fljA, fljB and hin genes) and the flanking genes from the S. Typhimurium LT2 genome were used as a reference to characterize the deletions. The area searched began at STM2693 and ended at STM2774. Thirteen different deletion types were characterized (Table 3 and Figures A1-A6). Twelve fljB-negative deletion types (∆fljAB1-12) were sorted according to the deletion length (∆fljAB1 being the longest and ∆fljAB12 the shortest). One fljB-positive deletion type was detected with different variations included (∆fljAB13).
The ∆fljAB1 type was the longest deletion (77 genes) and showed an insertion of 5654 bp, which contained different fragments encoding Gifsy-2 prophage proteins, UMUC protein and two fragments encoding three Fels-2 prophage proteins (Figure A1). The deletion types ∆fljAB2-6, ∆fljAB8-9 and ∆fljAB12 varied in the length of their deletions, and they were all characterized by the insertion of one IS26 copy (Figures A2 and A3). In the deletion ∆fljAB7, two IS26 copies were inserted (Figure A4). The deletion ∆fljAB10 had the largest insertion (7663 bp), which consisted of three insertion sequences (IS1, IS10 and IS26) and other additional genes, including tetracycline resistance genes (Figure A5). The main characteristic of ∆fljAB11 was the insertion of a truncated IS1 and a complete IS26 copy (Figure A2). Finally, the deletion type ∆fljAB13 included three variants of fljB-positive strains. These strains showed an insertion of one or two copies of IS26, which generated a partial deletion of the hin gene, interrupted it or was located after it (Figure A6). The deletion types ∆fljAB2-3, ∆fljAB6 and ∆fljAB8-13 had the IS26 copy, in the 3'-5' direction, inserted at the same nucleotide of the intergenic region between the hin and iroB genes (the same ending point). However, the ∆fljAB4 and ∆fljAB5 deletions had the IS26 copy, in the 5'-3' direction, inserted 222 bp downstream of the STM2757 gene (the same starting point).
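The relationship between IS26 orientation and the conserved deletion end stated in the paragraph above can be written down as a small lookup. This is only a restatement of the text as data, not an analysis tool used in the study; the dictionary and function names are hypothetical. Types ∆fljAB1 (no IS26 reported) and ∆fljAB7 (orientation not listed in the paragraph) are deliberately left out and return None.

```python
from typing import Optional

# IS26 orientation per deletion type number, as listed in the text above.
IS26_ORIENTATION = {n: "3'-5'" for n in (2, 3, 6, 8, 9, 10, 11, 12, 13)}
IS26_ORIENTATION.update({4: "5'-3'", 5: "5'-3'"})

# Which deletion end is shared by all types with a given orientation.
CONSERVED_END = {
    "3'-5'": "ending point (intergenic region between hin and iroB)",
    "5'-3'": "starting point (222 bp downstream of STM2757)",
}


def conserved_end(deletion_type: int) -> Optional[str]:
    """Return the conserved end for a deletion type, or None if not listed."""
    orientation = IS26_ORIENTATION.get(deletion_type)
    if orientation is None:
        return None
    return CONSERVED_END[orientation]


for n in (1, 4, 7, 10):
    print(f"fljAB{n}: {conserved_end(n)}")
```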

Discussion
The emergence of the 4,[5],12:i:- monophasic variant of Salmonella Typhimurium demonstrates its evolutionary success. It has rapidly become one of the most prevalent serovars in humans in numerous countries worldwide [34]. The loss of the second-phase flagellar antigen has not prevented the emergence and worldwide spread of S. 4,[5],12:i:- monophasic variant strains. Flagella (H antigen) on the surface of S. Typhimurium have been characterized as a virulence factor that helps the bacteria move toward and adhere to host cells. However, Lockman and Curtiss [35] concluded that independent Tn10 insertions mapped to different flagellar genes did not affect the virulence of S. Typhimurium for mice and suggested that motility might be irrelevant as a virulence factor for an invasive, facultative intracellular pathogen.
Classification of Salmonella by serotyping is generally performed by accredited National Reference Centers, as an essential epidemiological tool. In the case of S. 4,[5],12:i:-, this procedure is crucial, since the non-expression or non-detection of the first and second flagellar antigens leads to the erroneous typing of S. Typhimurium as its monophasic variant. To solve this problem, the bacteria should be sequentially subcultured for a new serotyping and, in case of negative results, an additional multiplex PCR should be completed [5]. Since this method is highly time-consuming and entails an unnecessary manipulation of the pathogen, multiplex PCR is routinely used. This PCR amplifies the fljB-fljA intergenic region of the flagellin gene cluster [5] but is unable to differentiate the monophasic fljB-positive variant from S. Typhimurium [36]. In this study, we found that WGS avoids the need to amplify the long inserted fragments in the fljAB operon. Since monophasic variant studies seldom search for the deletion of the second flagellar phase, a complete strain characterization often requires applying multiple techniques. However, as we have verified through this work, NGS technologies allow a complete characterization of Salmonella strains within a few days. As an alternative to WGS, in 2018, a liquid bead array was proposed for the identification and characterization of S. 4,[5],12:i:- variants to obtain results rapidly and with simple data analysis [36]. In this work, we demonstrate that WGS can also be a rapid method that enhances traditional profiling efforts for the characterization of the monophasic variant, including the prediction of clinically relevant traits such as antibiotic resistance genes, plasmids or virulence genes. On the other hand, the specific matrix of bead arrays only allows discrimination between S. Typhimurium and S. 4,[5],12:i:-, whilst we observed that WGS allows a detailed characterization of the second flagellar phase genetic deletions involved in this serovar.
To achieve a full molecular description of strains, the development of efficient, standardized and molecular-guided laboratory surveillance is necessary and a high priority [37]. As presented in this work, WGS and freely available web services and bioinformatic tools can be extremely useful for public health laboratories and epidemiological surveillance. For instance, the bioinformatic tools used in this work allowed the complete typing of S. 4,[5],12:i:-. Regarding the serotype, SeqSero2 has been considered more reliable for the serological prediction of the monophasic variant of S. Typhimurium than other tools such as MOST and SISTR [38]. Even so, our results indicate that SeqSero2 does not correctly predict the serotype of S. [...] The in silico typing done in this work showed that, although WGS analysis seems expensive and complex, bioinformatic tools have transformed it into a cost-effective tool. Furthermore, large amounts of time and materials would be required if all typing had to be done through classical microbiology. To ensure the success of in silico typing, it is essential to generate high-quality contigs, which in turn requires evaluating sequence quality and checking for possible technical errors by establishing quality control measures. Assessing genome assembly quality is important in this process because poor-quality assemblies hamper downstream analyses, resulting in incorrect interpretations [37]. As such, it is critical to identify, evaluate and minimize technical errors occurring during sample isolation, DNA preparation, sequencing and genome assembly.
Regardless of the epidemiological information, the study of the second flagellar phase deletions could provide further insight into their origin, the genetic events yielding them and the characterization of the inserted fragment. Through the WGS carried out in this work, 13 different deletion types and subtypes of the second-phase flagellar genomic region were found in 32 S. 4,[5],12:i:- strains. The genetic diversity observed in the deletion types and in the in silico typing achieved by bioinformatic tools (ResFinder, SPIFinder, PlasmidFinder, PHASTER, in silico PCR and cgMLSTFinder) shows that the strains analyzed in this work represent non-clonal monophasic strains circulating in Spain.
The deletion ∆fljAB1, where 77 genes were absent compared to the genome of the S. Typhimurium LT2 strain, showed an insertion of 5654 bp. The sequence of this insertion is very similar to the insertion described by Soyer et al. in the US strains [10], although some differences could be detected, namely, a 5654 bp fragment inserted instead of a 7 kb fragment, and the presence of the complete STM2704 gene, which is partially deleted in US strains. Of the total number of strains analyzed, four strains of clinical origin had this deletion; nevertheless, they varied in their other characteristics. Strain 692 had the aminoglycoside resistance gene aac(6')-Iaa, which is a cryptic gene in Salmonella, and IncFIB(S) and IncFII(S) plasmids. In strain 705, the cryptic gene aac(6')-Iaa and the tetracycline resistance gene tet(B) were detected but no plasmids. Given the information provided by WGS, we considered that these strains could belong to the same lineage as the American strains described by Soyer et al. However, the other two strains with this same deletion (697 and 702) had the following characteristics: cmlA, aac(6')-Iaa, aph(3'')-Ib, aph(6)-Id, aadA1, aadA2, sul3, tet(B) and dfrA12 resistance genes (CSSuTTm multi-resistance profile) and the presence of IncR plasmids. Interestingly, these two strains have the same characteristics as the Southern Europe clone described by Mourão et al. [14]. The results suggest that strains 692 and 705 of the American clonal line are the ancestors of the Southern Europe clone in Spain. Moreover, strains 697 and 702, with the American deletion, IncR plasmids and CSSuTTm multi-resistance profile, represent the Southern Europe clone in Spain, a result of the acquisition of IncR plasmids.
In 2016, Garcia et al. described an S. 4,[5],12:i:- clonal lineage widespread in Germany, Switzerland and Italy, carrying an ASSuT tetra-resistance induced by IncH1 plasmids, which replaced the second-phase flagellar genomic region [18]. We found eight S. 4,[5],12:i:- strains with an ASSuT tetra-resistance, but none of these strains had the multidrug resistance plasmid described by Garcia et al. In six of these strains, only one or two copies of IS26 were detected in the second-phase flagellar genomic region (deletions ∆fljAB6, ∆fljAB7, ∆fljAB9, ∆fljAB12 and ∆fljAB13), and the remaining two S. 4,[5],12:i:- ASSuT strains were classified as the ∆fljAB10 deletion type, showing a 7663 bp fragment between the STM2761 and iroB genes. This fragment was composed of IS1, IS26 and a truncated IS10, the tetA and tetR genes, which are implicated in tetracycline resistance, and genes encoding a hypothetical protein and the JemC, JemB and JemA products. A similar genetic composition has been described on an STM plasmid (pSRC27-H) and in the S. Typhimurium genome (T000240 strain). The putative roles of these insertion sequences would be the following: IS1 would drag the genes that appear in the T000240 strain; IS10, possibly located on the pSRC27-H plasmid, would be inserted by recognizing the tetA, tetR and jemC genes, thus partially deleting the hypothetical protein of the T000240 strain; lastly, IS26 would interrupt IS10.
To date, two different models have been proposed to explain why the insertion sequence IS26 generates deletions in the second-phase flagellar region of S. 4,[5],12:i:- strains. On the one hand, it is suggested that most of the fljB-negative S. 4,[5],12:i:- strains observed globally could have emerged from a common ancestor containing an IS26 copy at that specific position [41]. On the other hand, it is theorized that several independent insertions may have occurred in different genetic events where an IS26 copy recognizes a hotspot [16,18]. In our study, S. 4,[5],12:i:- strains isolated in Spain containing at least one IS26 copy in the second-phase flagellar deletion have shown genetic variability both in the region adjacent to the 3'-end of IS26 and in the genotyping based on WGS (presence of resistance genes, pathogenicity islands, plasmids, prophages and cgMLST). The great diversity found shows that S. 4,[5],12:i:- strains isolated in Spain and containing an IS26 copy do not belong to a single clone.
In this research, 85.71% of strains containing an IS26 in the second flagellar phase deletion had this insertion sequence in the 3'-5' direction. In these deletions, the region adjacent to the 3'-end of IS26 (deletion starting point) varied between strains, while the region adjacent to the 5'-end (deletion ending point) was conserved. The IS26 insertion was found in the same position (i.e., 334 nucleotides upstream from the iroB gene) in fljB-negative strains isolated in the USA, South Korea and some European countries [18,36]. It is noteworthy that three strains in our work were fljB-positive (i.e., ∆fljAB13), even though they had IS26 inserted. The remaining 14.29% of strains had an IS26 in the 5'-3' direction and they all had the same deletion starting point. Interestingly, in a previous work with 60 strains of the S. 4,[5],12:i:- Spanish clone, 93.60% of the isolates shared the same starting point but different deletion ending points [9]. All these deletions had an IS26 inserted in the 5'-3' direction at nucleotide no. 1444 of the gene STM2758 [9], very close to the starting point of the ∆fljAB4 and ∆fljAB5 deletions. WGS results show that the 5'-end of IS26 generates conserved deletion ends whilst creating significant variability in the genomic region adjacent to the 3'-end of the IS26 insertions. This finding suggests that the IS26 insertion sequence has a recognition end at its 5'-end that can be inserted in certain areas depending on its direction. Furthermore, we propose that S. 4,[5],12:i:- strains are evolving from different S. Typhimurium strains with no close phylogenetic relationship through different genetic events in which at least one IS26 was involved, promoting a pool of monophasic variants that share a similar deletion due to the IS26 5'-end recognition.
Several research groups further characterized S. 4,[5],12:i:- strains, reporting the existence of non-clonal S. 4,[5],12:i:- circulating in other European countries, such as Belgium, Italy, France and Poland [19,20,42]. However, other studies have observed S. 4,[5],12:i:- clonal lineages in the United Kingdom, Italy, Germany and Switzerland [7,16,18]. The detection of clonal and non-clonal strains may depend on the objectives of the research carried out in the above works. Although greatly expanded S. 4,[5],12:i:- clonal lines have been reported worldwide, it is possible that in the future, new S. 4,[5],12:i:- strains will be detected with different deletions of the second-phase flagellar genomic region. New S. 4,[5],12:i:- strains will be generated from S. Typhimurium strains in different genetic events, especially genomic rearrangements mediated by IS26. Pigs have been the main animal reservoir for S. 4,[5],12:i:- for years [34,43]. Therefore, it can be deduced that the genetic events causing the deletion of the second flagellar phase of the Salmonella strains analyzed in our study probably occurred within pigs. In fact, S. 4,[5],12:i:- was reported amongst the three most frequent serotypes in pigs in 2017, together with S. Typhimurium and S. Derby [34], in agreement with previous European guidelines [5].

Conclusions
In conclusion, our study demonstrates that the availability of sequencing technologies and the development of bioinformatic tools turn NGS into a realistic alternative to traditional methods for the characterization of S. 4,[5],12:i:- strains. In addition, these tools were essential to study the genetic bases of the monophasic phenotype and to identify S. 4,[5],12:i:- American clonal line strains in Spain that would give rise to the Southern Europe clone due to the acquisition of the IncR plasmid. These tools were also useful in determining the implication of the insertion sequence IS26 in generating new deletions and in establishing the genetic link between S. 4,[5],12:i:- and S. Typhimurium strains. The results obtained in our study suggest that Salmonella monophasic variants are evolving from different S. Typhimurium strains through independent genetic events that may have taken place in swine. Within these genetic events, at least one IS26 was inserted whose 5'-end recognized a hotspot in the second-phase flagellar genomic region and generated conserved deletion ends. Finally, we consider that the genetic diversity observed in the S. 4,[5],12:i:- strains analyzed in this study proves that non-clonal monophasic strains are circulating in Spain. Further studies are needed to analyze the recognition mechanism of the insertion sequence IS26 in the second-phase flagellar genomic region of S. 4,[5],12:i:-. This finding would help in the understanding of the mechanism by which new S. 4,[5],12:i:- [...]

Figure A2. Structures of the ∆fljAB2, ∆fljAB3, ∆fljAB6, ∆fljAB8, ∆fljAB9, ∆fljAB11 and ∆fljAB12 deletion types. These deletion types had different starting points, but the same ending point of the deletion at the intergenic zone between the hin and iroB genes (334 nucleotides upstream of iroB) and at least one IS26 inserted (colored in green [...]

Figure A5. Structure of the ∆fljAB10 deletion type. The ∆fljAB10 deletion (strains 703 and 704) started at nucleotide 1125 of the STM2761 gene and ended at the intergenic zone between the hin and iroB genes (334 nucleotides upstream of iroB). The inserted fragment (colored in green) started with an IS1, 354 nucleotides from the protein COG1309 described in the S. Typhimurium T000240 strain, followed by the tetA, tetR, jemC, jemB and jemA genes and an IS10 truncated by an IS26 (lacking 973 nucleotides).