Serotype Screening of Salmonella enterica Subspecies I by Intergenic Sequence Ribotyping (ISR): Critical Updates

(1) Background: Foodborne illness from Salmonella enterica subspecies I is most associated with approximately 32 out of 1600 serotypes. While whole genome sequencing and other nucleic acid-based methods are preferred for serotyping, they require expertise in bioinformatics and often submission to an external agency. Intergenic Sequence Ribotyping (ISR) assigns serotype to Salmonella in coordination with information freely available at the National Center for Biotechnology Information (NCBI). ISR requires updating because it was developed from 26 genomes, whereas there are now 1804 genomes and 1685 plasmids. (2) Methods: Serotypes available for sequencing were analyzed by ISR to confirm primer efficacy and to identify any issues in application. Differences between the 2012 and 2022 ISR databases were tabulated, nomenclature was edited, and instances of multiple serotypes aligning to a single ISR were examined. (3) Results: The 2022 ISR database has 268 sequences, and 40 of these were assigned new NCBI accession numbers that were not previously available. Extending boundaries of sequences resolved hdfR cross-alignment and reduced multiplicity of alignment for 37 ISRs. Comparison of gene cyaA sequences and some cell surface epitopes provided evidence that homologous recombination was potentially impacting results for this subset. There were 99 sequences that still had no match with an NCBI submission. (4) Conclusions: The 2022 ISR database is available for use as a serotype screening method for Salmonella enterica subspecies I. Finding that 36.9% of the sequences in the ISR database still have no match within the NCBI Salmonella enterica database suggests that there is more genomic heterogeneity yet to characterize.


Introduction
Foodborne illness caused by Salmonella enterica (S. enterica) is a persistent threat to the health of people around the world, and outbreaks are closely monitored within the U.S. [1][2][3]. The genus Salmonella has two species, namely S. bongori and S. enterica. Foodborne pathogens are concentrated in one of the six subspecies (subsp.), namely S. enterica subsp. I, which has the synonym of S. enterica subsp. enterica [4]. There are some instances where other subspecies of S. enterica cause illness, but overall, they are infrequently encountered as public health issues. The information provided here focuses on S. enterica subsp. I (taxid:59201) and sometimes broadens searches to all of Salmonella in taxid:28901 (Search: Salmonella enterica-NLM (nih.gov), last accessed on 18 December 2022).
There are approximately 1600 serotypes within S. enterica subsp. I. Of these 1600 serotypes, 32 (2.0%) have genomes optimized for clonal expansion, virulence factors, environmental persistence, genetic adaptability, and the ability to be easily transferred between ecological niches associated with humans, animals, and the handling and processing of food [5]. Of the 32 serotypes, approximately twelve are of greater concern because they account for about 90% of foodborne outbreaks. In approximate order of magnitude, as evaluated from current information from the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the USDA Food Safety and Inspection Service (FSIS), the twelve serotypes are S. enterica subsp. I serotype Enteritidis (S. Enteritidis), S. Typhimurium, S. Newport, S. Javiana, S. Heidelberg, S. Hadar, S. Infantis, S. Montevideo, S. Muenchen, S. Braenderup, S. Saintpaul, and S. Senftenberg [6,7]. Also included in the list of top twelve serotypes is S. 4,[5],12:i:-, which is a serotype that expresses O-antigen and H1 flagellar epitopes; however, it lacks flagellar H2 epitopes due to mutation [8].
There are annual variations in the relative incidence of frequently isolated salmonellae. Another serotype of note is S. Kentucky because it is frequently isolated from agricultural environments, but it does not often cause human disease; however, it does harbor antibiotic resistance that can impact in-hospital nosocomial disease [9].
Serotyping of S. enterica subsp. I was founded on nearly 70 years of information produced by using a complex panel of monospecific antisera to characterize epitopes on the outer membrane of the bacterial cell. The process is referred to as the Kauffmann-White-Le Minor (KWL) scheme [10]. The targeted epitopes are associated with the complex carbohydrate O-antigen repeating unit of lipopolysaccharide (LPS) and two proteins expressed from genes fliC and fljB. Expressed proteins from these two genes, which undergo phase variation, comprise the major structural component of the flagellar organelle used for motility. These antigenic variants are called H1 and H2 in the KWL scheme. National responses and regulatory actions differ for three serotypes, namely S. Enteritidis, S. Gallinarum, and S. Pullorum. The U.S. poultry industry has eradicated S. Gallinarum and S. Pullorum due to their threat to the entire poultry industry; in contrast, S. Enteritidis remains a constant threat to the safety of the food supply [11,12]. Detection of S. Gallinarum or S. Pullorum in U.S. poultry flocks necessitates stringent quarantine and eradication measures to protect the economic viability of the egg industry. Detection of S. Enteritidis falls under regulatory guidelines and might trigger a traceback investigation or other measures intended to reduce the risk of food contamination [13].
Most of the bioinformatics pipelines for receiving, processing, analyzing, and interpreting whole genome sequences (WGS) are associated with government agencies, and both FDA and USDA-FSIS have regulatory responsibilities for the safety of food products in the United States [14]. Regulators use both multilocus sequence typing (MLST) and WGS to conduct source attribution following outbreaks, and both MLST and WGS results are associated with serotype [15]. Source attribution requires resolving genome sequence to the single nucleotide polymorphism (SNP) and stringent bioinformatics [16,17]. This level of analysis is not needed by companies wanting to keep environments associated with producing food free of Salmonella. Instead, companies need streamlined information on the presence of Salmonella, on the presence of regulated serotypes such as S. Enteritidis, and on whether serotype populations fluctuate throughout the year. The ability of ISR to distinguish between the closely related serotypes S. Gallinarum and S. Pullorum provides an example of how ISR can be used in field studies for initial screening of larger sample numbers and then informing additional genomic analyses of selected strains [18].
The Centers for Disease Control developed SeqSero2 for epidemiological investigations of outbreaks in humans, and it associates serotype to WGS [19]. Agencies across the government collaborate with each other, confirm serotype designations with the National Veterinary Services Laboratory (NVSL), and consult with other researchers and public health departments on issues involving Salmonella contamination of food sources (Participants | PulseNet USA | CDC). Large government supported databases are invaluable resources for epidemiological investigations, and they can also be used to evaluate worldwide trends by coordinating analyses with other international databases [20,21]. Companies producing food have expressed concerns about submitting samples to government-based pipelines beyond regulatory requirements because there is potential liability associated with the duty of responsibility to report and a loss of data ownership [22]. Thus, domestic and international agricultural companies are inhibited from using MLST or WGS in a manner that makes full use of their technological power.
For the reasons cited above, and especially to encourage routine screening for the presence of Salmonella enterica within any operation producing food and food products, Intergenic Sequence Ribotyping (ISR) was developed. The initial development of ISR focused on distinguishing non-motile S. enterica subsp. I S. Gallinarum and S. Pullorum, which lack both H1 and H2 antigens, from rare variants of S. Enteritidis that did not express either variant [23]. Previous research found that the dkgB-linked ISR region was the most useful for investigating poultry-associated Salmonella [23]. ISR is a PCR-based method that was developed further for the purpose of screening for contamination and assigning names to S. enterica subsp. I serotypes that have potential for causing foodborne illness [24]. ISR is not designed to make a definitive identification of serotype but instead functions best as a quality control measure. Since cultures are the starting point for analysis, companies can make later submissions if Salmonella serotypes appear to be present that might be of concern from either a regulatory viewpoint or from a general concern that products are free of Salmonella.
Following is an abbreviated description of the major steps for performing ISR, and further details for processing samples are described in Materials and Methods: (i) After purification of DNA from cultures suspected to be Salmonella, amplifying primers ISR_F1 and ISR_R1, described in detail in Section 2.4, are used to target sequence spanning part of the 23S ribosomal gene rrlH and part of the gene encoding 2,5-diketo-D-gluconate reductase B (yafB in S. Typhimurium reference strain NC_003197.2; dkgB in S. Enteritidis reference strain NC_011294.1). The amplicon product will include sequence from the end of the rrlH gene, sequence that includes all the 5S ribosomal gene rrfH and its 5′ and 3′ flanks, tRNA-asp, and part of the dkgB gene. Within the reference strain for the genus of Salmonella enterica, namely S. Typhimurium LT2 (NC_003197.2), the ISR amplicon region with primers is 1444 nt and is located between 294,123 and 295,567 bp [25]. (ii) Sequence is obtained from the amplicon product by using primers ISRs1_F8 and ISRs2_R42, which are located internal to the 5′ and 3′ ends of the amplicon, in separate PCR reactions. Forward and reverse orientations are advised for best resolution. Reactions are then submitted or processed in-house to obtain sequences. If submitted, the client receives the sequence by private link. (iii) The client then uses commercially available bioinformatics packages to trim ambiguous nucleotides, and trimmed sequences are batch aligned to the ISR database; alternatively, text recognition software can be used if bioinformatics software is not available. (iv) Trimmed sequence can also be compared by BLAST to available genomes at NCBI.
Parameters for aligning sequence to the most likely S. enterica subsp. I named serotype are 100% query coverage and 100% identity with no ambiguities.
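The alignment rule stated above (100% query coverage, 100% identity, no ambiguities) amounts to an exact-match search, which is why plain text matching can substitute for bioinformatics software. The following Python sketch illustrates that idea only; it is not the alignment procedure used by the authors. The function names, the demonstration sequences, and the choice to reject any read containing a non-ACGT character are assumptions made for illustration.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def match_isr(read: str, isr_db: Dict[str, str]) -> List[str]:
    """Return names of ISR entries contained in the read with no mismatch.

    An ISR entry is reported only if its entire sequence occurs in the read
    (or in the read's reverse complement), which corresponds to 100% query
    coverage and 100% identity when the ISR entry is the query.
    """
    read = read.upper()
    # Assumption: any ambiguous base call (N or other IUPAC code) rejects the read.
    if set(read) - set("ACGT"):
        return []
    strands = (read, reverse_complement(read))
    return sorted(
        name
        for name, isr in isr_db.items()
        if any(isr.upper() in strand for strand in strands)
    )


# Hypothetical, very short demonstration sequences (real ISRs are 257 to 556 nt).
demo_db = {"ISR_demo_1": "ACGGTCA", "ISR_demo_2": "GGGCCCAAA"}
hits = match_isr("TTACGGTCATT", demo_db)
```

A read that returns more than one name, or none, would be a candidate for the BLAST comparison described in step (iv).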
The ISR database was first developed when there were 26 completed chromosomal S. enterica genomes available at NCBI. The ISR database was expanded beyond that of NCBI by combining it with a sequencing project analyzing strains submitted from many sources and coordinating it with a DNA hybridization AOAC approved method for assigning serotype used in the EU [26]. By the end of 2012, there were 220 ISR sequences available upon request, and another 24 were added between 2012 and 2020. The NCBI database has grown since 2012 to include a list of 1804 chromosomal and 1419 plasmid completed genomes for S. enterica subsp. I (taxid:59201) (last date accessed 21 September 2022). Ten years after its initial development, it is time to review the ISR database, expand it to include more serotypes accessioned at NCBI, identify issues with interpretation of data, and identify any problems in application.

Determining the Size of the NCBI Database in 2022
ISR accessioning of the NCBI database uses only completed genomes due to assembly issues involving redundancies within ribosomal gene sequences. At site Genome-NCBI-NLM (nih.gov), 1804 genomes are listed after filtering for completeness. The number of genomes published per year can also be estimated. When conducting microbe BLAST searches for all subsp. of S. enterica (taxid:28901), 3938 genomes are listed, which include 1685 plasmids (Nucleotide BLAST: Search nucleotide databases using a nucleotide query (nih.gov)). A BLAST search for complete genomes of S. enterica subsp. I (taxid:59201) lists 3327 genomes, including 1419 plasmids. Therefore, the number of completed chromosomal genomes at NCBI for S. enterica subsp. I is between 1804 and 1908.
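The upper bound of the range above follows from subtracting plasmids from the BLAST totals. A minimal Python sketch of that arithmetic, using only the counts reported in the text (the variable and function names are illustrative):

```python
# Counts as reported in the text (NCBI, accessed 21 September 2022).
SUBSP_I_BLAST_TOTAL = 3327       # complete genomes incl. plasmids, taxid:59201
SUBSP_I_PLASMIDS = 1419
ALL_ENTERICA_BLAST_TOTAL = 3938  # complete genomes incl. plasmids, taxid:28901
ALL_ENTERICA_PLASMIDS = 1685
GENOME_SITE_FILTERED = 1804      # Genome site, filtered for completeness


def chromosomal(total: int, plasmids: int) -> int:
    """Chromosomal genome count implied by a BLAST total that includes plasmids."""
    return total - plasmids


upper_bound = chromosomal(SUBSP_I_BLAST_TOTAL, SUBSP_I_PLASMIDS)
all_enterica = chromosomal(ALL_ENTERICA_BLAST_TOTAL, ALL_ENTERICA_PLASMIDS)
print(f"subsp. I chromosomal genomes: {GENOME_SITE_FILTERED} to {upper_bound}")
print(f"all S. enterica chromosomal genomes (BLAST): {all_enterica}")
```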

Bioinformatics Software and Analytics
There are several sources of suitable software. For the analyses here, Geneious Prime® 2022.1.1 (Build 2022-03-15 11:43) was used throughout. NCBI also has applicable bioinformatics, annotations, search engines, BLAST analysis algorithms, and other bioanalytic tools (National Center for Biotechnology Information (nih.gov)). NCBI is the source for S. enterica complete genomes.

Culture and Initial DNA Extraction
To begin analysis, 200 isolates of S. enterica subsp. I were grown on Brilliant Green (BG) agar (Acumedia; Neogen Corporation, Lansing, MI, USA) from stock frozen in glycerol and maintained at −80 °C at the U.S. National Poultry Research Center (USNPRC) in Athens, GA, USA. All isolates had been stored for at least 2 years and were chosen to maximize serotype variability. However, some serotypes were duplicated to analyze variation in results. Cultures on BG plates were stored at 4 °C after culturing, and then shipped per regulations to RSI Poultry Veterinary Consulting (DeSoto, KS, USA). At RSI, Salmonella isolates were grown in Tryptic Soy Broth for 18-24 h at 37 °C. One (1) mL of broth culture was harvested and processed for DNA extraction. DNA extraction was performed using the PureLink™ Genomic DNA Mini kit (Invitrogen Cat#K1820-02). DNA was eluted in 260 µL of PCR water. DNA was then spotted onto Whatman™ FTA Cards (GE Healthcare Bio-Sciences Corp., Piscataway, NJ, USA) for storage [27]. A 15-day DNA quality control ISR PCR run was performed on 40 samples to check for the ability to repeat results.

Preparation of Primers
Handling of all primers and DNA was done within a Mystaire CleanPrep Station. PCR primers were ordered from IDT (accessed 13 January 2021 (www.idtdna.com)), and parameters were 25 nmol DNA oligos with standard desalting. Primers were diluted in PCR pure water to obtain a 100 pmol/µL concentrated solution. The concentrate was diluted to a working concentration of 10 pmol/µL in 9 µL of PCR water. The concentrate and working solutions were frozen at −20 °C until used. Amplifying primers ISR_F1 and ISR_R1 were used in the first phase to make sure DNA from the dkgB region was amplified. Product size could vary as much as 250 bp, and bands in gels are typically seen around 1400 bp markers. Sequencing primers, used separately to obtain forward and reverse sequence, were ISRs1_F8 and ISRs2_R42. Primer sequences were: CGGAACGGACGGGACTCGA (19 nt)

PCR Amplification of DNA Samples
Master mix was DreamTaq Hot Start (ThermoFisher), and 23 µL were aliquoted into 0.2 mL PCR tubes (ThermoFisher, Suwanee, GA, USA). DNA samples were extracted from Whatman FTA card spots as previously described XX. Extracted DNA samples and negative sample controls, 2 µL, were added to the aliquots of master mix. Samples were processed in a Techne Prime Thermal Cycler (ThermoFisher) using the following parameters: (i) initial denaturation at 94 °C for 2 min; (ii) 35 cycles of 94 °C for 30 s, 64 °C for 30 s, and 72 °C for 2 min; (iii) final extension at 72 °C for 5 min; (iv) hold at 4 °C. Amplified samples, 15 µL, were purified using DNA Clean & Concentrator-5™ (Zymo Research, Irvine, CA, USA) according to the manufacturer's directions.
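For laboratories that keep protocols in a machine-readable form, the cycling parameters above can be recorded as a small data structure. The Python sketch below is one possible representation under that assumption; the class and function names are hypothetical. The computed time covers programmed holds only and excludes ramping between temperatures and the final 4 °C hold.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


# Parameters transcribed from the protocol described in the text.
INITIAL = Step("initial denaturation", 94, 120)
CYCLE: Tuple[Step, ...] = (
    Step("denaturation", 94, 30),
    Step("annealing", 64, 30),
    Step("extension", 72, 120),
)
N_CYCLES = 35
FINAL = Step("final extension", 72, 300)


def programmed_seconds() -> int:
    """Total programmed hold time, excluding ramp times and the 4 °C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


def reaction_volume_ul(master_mix_ul: float = 23.0, template_ul: float = 2.0) -> float:
    """Final reaction volume: master mix aliquot plus extracted DNA or control."""
    return master_mix_ul + template_ul


print(f"{programmed_seconds() / 60:.0f} min programmed, "
      f"{reaction_volume_ul():.0f} µL per reaction")
```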

Electrophoresis
Gel electrophoresis was performed on a 1/10 dilution of the PCR amplicon as a quality control measure to assess successful amplification of a single band. Materials used for electrophoresis were Agarose I™ biotechnology grade (ThermoFisher), 0.5 mL GelRed® Nucleic Acid Stain (ThermoFisher), 100 bp DNA Molecular Weight Marker (Invitrogen, Waltham, MA, USA), 4X Gel loading buffer (BPB) (ThermoFisher), and molecular grade distilled water. GelRed stain was added to 1% warm agarose prepared in TBE, and 25 mL of agarose was loaded into the gel tray chamber with 1X TBE buffer. Each amplified DNA sample, 2 µL, was mixed with 7 µL loading dye on parafilm and then loaded into gel wells. One well was loaded with 5 µL of DNA molecular weight markers. The run was conducted at a constant 100 V for at least 30 min, and then observed under UV light (254 nm) to visualize the expected band size of approximately 1400 bp.

Sample Preparation for Sequencing
Purified PCR product, 5 µL, was combined separately with 5 µL of 10 pmol sequencing primers ISRs1_F8 and ISRs2_R42 in either 0.2 mL tubes or in 96-well plates. Premixed samples were submitted to Eurofins for Sanger sequencing (accessed 27 June 2022 (https://eurofinsgenomics.com/en/products/dna-sequencing/all-sequencing-options/)). Samples were submitted at ambient temperature on the same day as PCR product purification. Use of a company's services for sequencing does not constitute endorsement by the U.S. Department of Agriculture.

Analysis of Sequence
Two hundred (200) strains from a variety of serotypes were processed by ISR to try to (i) encounter any problem that a user might experience, (ii) address those issues, and (iii) offer solutions and advice. File management is an issue, and users should consider how they will accession both strains and files associated with strains. Accessioning by processing date with the format YYMMDD works well. The user needs to determine what metadata are important to their operation. Finally, the information should be stored in a secure location, initial data should be saved as a master file that undergoes no further processing, and personnel with access to the data should understand the parameters of interpretation. Raw sequence files could come in *.abi format, which can be directly imported by bioinformatics software. Alternatively, some sequencing companies might supply plain text files for importation.
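As an illustration of date-based accessioning, the Python sketch below builds an identifier from the processing date in YYMMDD format. The running isolate number appended after the date is an assumption added for illustration; the text specifies only the date format.

```python
from datetime import date


def accession_id(processing_date: date, isolate_number: int) -> str:
    """Build an identifier from the processing date (YYMMDD) and an isolate number.

    The '-NNN' suffix is a hypothetical convention, not part of the ISR method.
    """
    return f"{processing_date.strftime('%y%m%d')}-{isolate_number:03d}"


example = accession_id(date(2022, 6, 27), 7)
print(example)
```

Identifiers of this form sort chronologically as plain text within a century, which simplifies matching strains to their associated sequence files.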
After importing and securing raw data files, the next step in analysis is to copy raw data files to a separate file for progressive trimming of both 5′ and 3′ ends. Using Geneious software (version 2022.1.1), 35 nucleotides (nt) were trimmed off both ends and the option of removing all ambiguities was selected. Sequences that were shorter than 500 nt after trimming were individually evaluated, and most were discarded for poor quality as defined by internal ambiguous results depicted in sequence as "N". After confirming the quality of trimmed sequences, all sequences were aligned with ISR database sequences to observe if both forward (F) and reverse (R) reactions had substantial overlap. The best quality result is when both F and R sequences span the entire ISR post-trimming. If sequence in only one direction aligns with an ISR, then it is appropriate to accept results if the length of sequence covers the entire ISR. If F and R sequences align to different ISRs, then that sample will require more scrutiny, as will be discussed under troubleshooting results. Users are encouraged to submit novel sequences not included in the updated ISR to the communicating author for further analysis and possible linkage to a known serotype.
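The trimming and triage rules described above can be approximated without commercial software. The Python sketch below is a simplified stand-in and does not reproduce the Geneious trimming algorithm: it removes a fixed 35 nt from each end, flags reads shorter than 500 nt after trimming, and flags reads with internal "N" calls. The function names and the returned labels are hypothetical.

```python
TRIM_NT = 35      # nucleotides removed from each end, as in the text
MIN_LENGTH = 500  # shorter trimmed reads were evaluated individually in the text


def trim_read(raw: str, trim: int = TRIM_NT) -> str:
    """Remove a fixed number of nucleotides from both ends of a read."""
    seq = raw.upper()
    if len(seq) <= 2 * trim:
        return ""
    return seq[trim:len(seq) - trim]


def triage(raw: str) -> str:
    """Classify a raw read as 'accept', 'review: short', or 'review: ambiguous'."""
    trimmed = trim_read(raw)
    if len(trimmed) < MIN_LENGTH:
        return "review: short"
    if "N" in trimmed:
        return "review: ambiguous"
    return "accept"
```

Reads flagged for review would still need individual inspection, as the text describes, before being discarded.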

Characteristics of the 2022 ISR Database as Compared to 2012
The previous ISR database last made available upon request had 242 individual sequences, and the current one has 268 sequences; thus, there are 26 new entries. The 2022 database was blasted against the NCBI Salmonella enterica subsp. I database (taxid:59201; last date accessed 21 September 2022) and is provided as a FASTA-formatted file for immediate application by users (Supplement File S1). Within the 2022 update, the average length for conventional ISR sequences (ISR-C) was 438.2 nt, standard deviation was 79.25 nt, and the range was 257 to 556 nt. As will be discussed, the average length for extended ISR sequences (ISR-X) was 1302.9 nt, standard deviation was 158.59 nt, and the range was 898 to 1499 nt. Table 1 includes results from the first database aligned with current information at NCBI. Parameters for defining an ISR-C alignment were 100% Query Coverage (QC) and 100% identity with no ambiguities. In the updated ISR list, phenotypic designations such as "monoflagellated" or "possibly Java" were removed because ISRs are DNA based. However, O-antigen Group B immunogroups (IG), such as IG 1,4,[5],12:i:-, were retained as they have become distinguishable from related serotypes at the genomic level.
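Length statistics such as those reported above can be recomputed from any FASTA-formatted copy of the database. The following Python sketch uses a minimal FASTA reader and the standard library; the record names are hypothetical, and the sample standard deviation is used because the text does not state which estimator the authors applied.

```python
import statistics
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA text into {record name: sequence}; blank lines are ignored."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else f"record_{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def length_summary(records: Dict[str, str]) -> Dict[str, float]:
    """Mean, sample standard deviation, and range of sequence lengths."""
    lengths = [len(seq) for seq in records.values()]
    return {
        "n": len(lengths),
        "mean": statistics.mean(lengths),
        "sd": statistics.stdev(lengths),  # requires at least two records
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical two-record demonstration; a real file would be read with open().
demo = [">ISR_demo_1C", "ACGT", "AC", "", ">ISR_demo_2C", "ACGTACGT"]
summary = length_summary(parse_fasta(demo))
```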

Repeatability of 2012 and 2022 Database Sequence Alignments
Of the 242 sequences in the 2012 database, 169 had no change (69.8%). There were 40 previously reported sequences that aligned with new NCBI accession numbers in the 2022 database. Thus, 209 sequences of the 242 in the 2012 database (86.4%) had no substantial change in 2022. This percentage compares favorably to MLST analysis of serotype [16]. Finding that ISR sequences within the 2012 database were eventually matched to an NCBI accession possibly reflects that the USDA laboratory received samples from more varied environments and sources as compared to submissions sent to NCBI. Overall, 100% of the 2012 database is contained within the 2022 database, and differences between the two are catalogued in Table 1. Table 1 includes an assessment of the confidence with which serotype is associated with an ISR. The confidence groups are as follows:

Assessing Confidence in Assigning Serotype to an ISR Sequence
(A) Three or more strains align with a single serotype.
(Ab) Greater than ten strains align with a predominant serotype with completed genomes, and alignment with additional serotypes accounts for no more than 20% of the total.
(B) Fewer than 3 strains align with a single serotype.
(C) Two serotypes, distinguishable by a simplified KWL scheme, align with the same ISR.
(D) Three or more serotypes align with a single ISR, requiring additional analysis by KWL, MLST, or WGS.
The "Ab" confidence rating accounts for the database size of complete genomes available for searching. For all genomes at any stage of completion, S. Enteritidis and S. Typhimurium together include 60,478 genomes (32.7%) of the 184,731 Salmonella enterica subsp. I genomes (synonymous with Salmonella enterica subsp. enterica) at NCBI, whereas another 96 serotypes comprise the remainder of the dataset and range from 1 to 12,076 submitted genomes (Salmonella enterica subsp. enterica-NCBI-NLM (nih.gov): last date accessed 6 December 2022).
Previous determinations of serotype by alternative methods, such as DNA hybridization or submissions with completed KWL immunotyping, were kept within the 2022 database. Strains not available for further analysis are indicated in Table 1 by "not available (na)", and entries with this designation may have unique sequences without an NCBI accession or any other knowledge of serotype. Table 1 shows data for 268 ISR results classified by specificity of alignment and an assessment of confidence. An example of how to read results is ISR 147. It aligned with 4 S. enterica serotypes, namely S. Typhimurium, S. Enteritidis, S. Hissar, and S. Albert. Using the first 2 letters to indicate the respective serotype, column H reports 127Ty:1En:1Hi:1Al. The formula is interpreted as ISR-X 147 aligned perfectly to 127 strains of Typhimurium and to 1 strain each of the other 3 serotypes. Representative NCBI accession numbers, in the same order as the strains listed, are provided in an adjacent column. The confidence assessment for this ISR is thus assigned as Ab, because a typical strain of Typhimurium is about 40 times more likely to be encountered than one of the rarer serotypes. It is important for users to recognize that the NCBI database is skewed in numerical representation to those serotypes of most concern to public health. Increasing submissions of a serotype to NCBI often correlates with its relative importance for impacting human health.
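The confidence groups and the alignment formula can be expressed as a short decision rule. The Python sketch below is one reading of the published criteria, not the authors' procedure: it assumes that group Ab takes precedence over groups C and D (as the ISR 147 example implies), that "20% of the total" refers to the share of all aligned strains, and it does not model the KWL distinguishability requirement of group C or entries marked "na". Function names are hypothetical.

```python
import re
from typing import Dict


def parse_alignment_formula(formula: str) -> Dict[str, int]:
    """Parse a formula such as '127Ty:1En:1Hi:1Al' into strain counts per serotype."""
    counts: Dict[str, int] = {}
    for part in formula.split(":"):
        match = re.fullmatch(r"(\d+)([A-Za-z]+)", part.strip())
        if match is None:
            raise ValueError(f"unrecognized formula element: {part!r}")
        counts[match.group(2)] = int(match.group(1))
    return counts


def confidence_group(counts: Dict[str, int]) -> str:
    """Assign confidence group A, Ab, B, C, or D from strain counts per serotype."""
    total = sum(counts.values())
    top = max(counts.values())
    if len(counts) == 1:
        return "A" if top >= 3 else "B"
    # Assumption: a dominant serotype (>10 strains, others <= 20%) outranks C and D.
    if top > 10 and (total - top) / total <= 0.20:
        return "Ab"
    return "C" if len(counts) == 2 else "D"


isr_147 = confidence_group(parse_alignment_formula("127Ty:1En:1Hi:1Al"))
```

For ISR 147, the three minority strains make up 3 of 130 alignments (about 2.3%), so the rule returns Ab, in agreement with the assignment described in the text.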

Evaluating Multiplicity of Serotype Alignment to a Single ISR
Multiplicity of serotype alignment to a single ISR is important because one such instance, specifically ISR 37, is a known example of homologous recombination impacting serotype variability [28]. To see if there was further evidence of homologous recombination impacting ISR results, all results that had alignments to multiple serotypes with NCBI accessions are shown in Table 2. The adenylate cyclase (cyaA) sequence for each serotype was downloaded, aligned, and examined for having 100% query coverage (QC) and 100% alignment identity (ID) with no ambiguities. Adenylate cyclase was chosen as a secondary site in the genome to evaluate because it is a large housekeeping gene that infrequently generates single nucleotide polymorphisms (SNPs), and it is required for full metabolic potential and virulence of S. enterica subsp. I. Additionally, the KWL O-antigen grouping for each serotype is listed in the same order as the serotypes within the multiple alignment. Results are that cyaA sequence and O-antigen grouping can differ; thus, it is likely that homologous recombination events are impacting serotype variability and resulting in unlikely pairings. These results also suggest that submission errors to NCBI due to mixtures of serotypes in the same sample are not substantially impacting results, because processing of such data would likely show an unacceptable degree of nucleotide ambiguity.
The 2012 database used strict parameters to define an ISR sequence. The convention was to use the sequence from the first nucleotide after the end of the rrlH 23S ribosomal gene to the nucleotide that preceded tRNA-asp as the ISR sequence for BLAST searches [24]. However, additional sequence is generated during the same sequencing reactions at no extra cost. Extending the boundaries of the conventional ISR to include unambiguous sequence that was previously trimmed increased specificity for assigning serotype in some cases by decreasing multiple alignments.
Extended sequences are labeled with an "X" after the ISR number in the 2022 database, whereas the conventional length sequences are labeled "C" (Supplement File S1). For the 37 instances of multiple alignments shown in Table 2 (13.8% of the 2022 database), 6 were resolved to one serotype and 21 were resolved to 2 serotypes using ISR-X. Ten (10) ISRs were not improved by using ISR-X and had 3 or more serotype alignments. Of the 21 that were resolved to 2 serotypes, one O-antigen antiserum from the KWL scheme would differentiate 12 of them, whereas 9 shared the same O-antigen epitopes [10]. These results indicate that 93% of ISRs could be assigned a serotype name using a single O-antigen to differentiate some double alignments. As with MLST and WGS, there are always outliers that require further analysis or multiple approaches to best assign a strain to a serotype or closest evolutionary group [16,17]. The recent description of a S. Lubbock-S. Mbandaka hybrid identified by WGS was also identified by ISR as being unusual; in addition, some S. Typhimurium strains appeared to align with this hybrid, which is "(37)X" in Supplement File S1 (Table 2).
Thirty-one (31) ISRs (11.6% of the 2022 database) in Table 2 could not be resolved to a single serotype by using an extended ISR region, and thus they aligned with two or more serotypes. The most extreme example is S. Typhimurium, which appears 9 times in combinations with other serotypes (Table 2). In contrast, S. Enteritidis appears twice in Table 2, and both times together with S. Typhimurium. This result supports the theory that S. Typhimurium retains its position as one of the top 3 persistent causes of foodborne salmonellosis because it has an exceptional ability to undergo homologous recombination [29]. In contrast, S. Enteritidis is especially evolved to colonize and persist in modern food commodities and might not accept donor DNA efficiently because its genome is at a peak of optimization [30]. The next most frequently occurring serotype within Table 2 is S. Newport, which appears 6 times. If frequency of ISR appearance in multiple serotypes is an indication of donor capability, then S. Newport also appears to be a serotype that is highly competent at undergoing homologous recombination.
There are serotype pairings that appear multiple times in Table 2. Examples are S. Newport and S. Bardo (ISRs 87 and 136), S. Saintpaul and S. Stanleyville (ISRs 92 and 141), a different pairing of S. Saintpaul and S. Stanleyville (ISRs 141 and 184), and S. Newport and S. Abaetetuba (ISRs 199 and 222). These pairings resulted from ISRs that differed either by SNPs or by length of ISR-X. It is possible there is some preference for homologous recombination to occur for some pairs. Of note is that S. Newport and S. Bardo are primarily differentiated by bacteriophage content and that serotypes S. Senftenberg and S. Dessau are not differentiated by cyaA sequence or the KWL typing scheme [10]. Thus, it is possible that some pairings indicate variants within a single serotype, and thus they should not be given individual names. The Pasteur KWL reference on S. enterica serotypes provides many examples of adjustments to interpretation of serotype [10]. In addition, MLST may be preferred as an alternative to the KWL scheme for grouping commonly encountered Salmonella [20].
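The percentages in this paragraph can be traced with a short tally. The Python sketch below uses only counts stated in the text and follows the paper's own accounting, in which ISRs outside Table 2 are treated as not having multiple serotype alignments; the variable names are illustrative.

```python
TOTAL_ISRS = 268
MULTI_ALIGNED = 37            # ISRs with multiple serotype alignments (Table 2)
RESOLVED_TO_ONE = 6           # resolved to one serotype with ISR-X
RESOLVED_TO_TWO = 21          # resolved to two serotypes with ISR-X
NOT_IMPROVED = 10             # still three or more serotypes
SPLIT_BY_ONE_O_ANTIGEN = 12   # of the 21 pairs, separable by one O-antigen antiserum


def pct(n: int) -> float:
    """Percentage of the 2022 database, rounded to one decimal place."""
    return round(100 * n / TOTAL_ISRS, 1)


single_conventional = TOTAL_ISRS - MULTI_ALIGNED                 # ISR-C boundaries
single_extended = single_conventional + RESOLVED_TO_ONE           # ISR-X boundaries
with_o_antigen = single_extended + SPLIT_BY_ONE_O_ANTIGEN         # plus one antiserum
still_multiple = RESOLVED_TO_TWO + NOT_IMPROVED                   # unresolved by ISR-X

for label, n in [
    ("multiple alignments", MULTI_ALIGNED),
    ("single serotype, ISR-C", single_conventional),
    ("single serotype, ISR-X", single_extended),
    ("single serotype, ISR-X plus one O-antigen", with_o_antigen),
    ("two or more serotypes after ISR-X", still_multiple),
]:
    print(f"{label}: {n} of {TOTAL_ISRS} ({pct(n)}%)")
```

The tally gives 249 of 268 ISRs (92.9%) assignable with one additional O-antigen test, which is the basis for the 93% figure.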
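Counting how often a serotype or a serotype pairing recurs among multiply aligned ISRs is a simple tabulation. The Python sketch below shows one way to do it, using only the subset of ISRs whose serotype composition is spelled out in the text (ISRs 87, 136, 147, 199, and 222); it is not a reproduction of Table 2, so the counts it produces are smaller than those reported for the full table. Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Subset of multiply aligned ISRs named in the text; not the complete Table 2.
multi_aligned: Dict[str, List[str]] = {
    "87": ["Newport", "Bardo"],
    "136": ["Newport", "Bardo"],
    "147": ["Typhimurium", "Enteritidis", "Hissar", "Albert"],
    "199": ["Newport", "Abaetetuba"],
    "222": ["Newport", "Abaetetuba"],
}


def serotype_frequency(table: Dict[str, List[str]]) -> Counter:
    """Number of ISRs in which each serotype appears."""
    return Counter(s for serotypes in table.values() for s in set(serotypes))


def repeated_pairings(table: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Serotype pairs that share more than one ISR."""
    pairs: Counter = Counter()
    for serotypes in table.values():
        for pair in combinations(sorted(set(serotypes)), 2):
            pairs[pair] += 1
    return {pair: n for pair, n in pairs.items() if n > 1}


frequency = serotype_frequency(multi_aligned)
pairings = repeated_pairings(multi_aligned)
```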

Plasmid Association of ISR Sequence
One of the more perplexing results that differentiated the 2012 and 2022 databases was finding 5 ISR sequences (1.9% of the 2022 database) that cross-aligned with a ribosomal gene region other than dkgB, namely hdfR. We took the 258 nt ISR-C 210 sequence and blasted it separately against all of Salmonella enterica (taxid:28901) and associated plasmids. The sequence was highly conserved within S. enterica subsp. I genomes, and 901 genomes had identities > 98% with query coverage of 100%. Five (5) plasmid alignments were reported and were from S. enterica subsp. I. One unnamed plasmid from S. Typhimurium strain SJTUF10484 (CP047533.1) had a striking alignment to ISR 210 with a query coverage of 100%, identity of 98.06%, a maximum score of 449, and an E-value of 1 × 10^−125. This large plasmid (96,002 bp) had several core genes, such as cyaA, hemD, and several LPS genes. Another plasmid from S. Senftenberg (LN86894.1) had a query coverage of 100%, percent identity of 98.04%, a maximum score of 355, and an E-value of 3 × 10^−97; however, it had only 200/204 identities as compared to 253/258 for the S. Typhimurium plasmid. There were 3 other alignments of much poorer similarity, 2 more from S. Senftenberg and 1 from S. Infantis. Thus, the plasmid with the ribosomal region associated with hdfR appears to be a rare find, as it was the only one with such a high-quality alignment.
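The identity fractions quoted above can be converted to percentages directly. The following Python sketch computes percent identity as identical positions divided by aligned length, rounded to two decimals in the manner of BLAST reports; the function name is illustrative.

```python
def percent_identity(identities: int, aligned_length: int) -> float:
    """Percent identity of an alignment, rounded to two decimal places."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= identities <= aligned_length:
        raise ValueError("identities must lie between 0 and aligned_length")
    return round(100 * identities / aligned_length, 2)


typhimurium_plasmid = percent_identity(253, 258)  # S. Typhimurium plasmid alignment
senftenberg_plasmid = percent_identity(200, 204)  # S. Senftenberg plasmid alignment
print(typhimurium_plasmid, senftenberg_plasmid)
```

The two percentages are nearly equal, so the difference between the alignments lies mainly in aligned length (258 nt versus 204 nt) rather than in the proportion of matching positions.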
This result suggests that DNA associated with the sequence of ISR 210 has potential to cross genus and species boundaries by homologous recombination, and that some unusual plasmids might be involved [31]. Finding an extrachromosomal element with potential for transmitting an ISR related sequence within the Enterobacteriaceae and that also has an association with serotype of S. enterica subsp. I is an intriguing but poorly understood result due to the rarity of the find. It perhaps bolsters the concept that S. Typhimurium is especially proficient at donating DNA amongst the Enterobacteriaceae.

Conclusions
The NCBI database of completed Salmonella enterica subsp. I genomes is far larger today than it was in 2012, and it now includes approximately 3327 completed genomes, including 1419 plasmids, accessed for these analyses. Overall, the 2012 ISR database transitioned to the larger NCBI database of accessioned Salmonella enterica serotypes intact. The most challenging aspect encountered was the alignment of multiple serotypes with a single ISR sequence for 31 of 268 total sequences (11.6%). While extending the strict boundaries of the conventional ISR sequence reduced multiplicity of alignment, the phenomenon might provide insight into the evolution of S. enterica subsp. I that poses persistent challenges for protecting the safety of the food supply and the health of people and animals.
We suggest, based on evidence, that ISR sequencing is detecting instances of homologous recombination. Alternative explanations about the alignment of multiple serotypes to a single sequence are refutable. For example, a mixture of DNA from different strains is detectable during sequencing and should show conflicting associations between serotype and expected submission because of sequence differences in housekeeping genes located around the genome. Another alternative explanation to homologous recombination would be that the ISR database is more limited than expected for screening serotype. We have shown here that, of the 242 sequences available in 2012, 209 (86.4%) were not impacted by anything more than having NCBI release an associated accession number. Of the 268 sequences in the 2022 database, 237 (88.4%) could be resolved to aligning with a single serotype if the ISR was extended, and 231 (86.2%) were resolved using the conventional ISR boundaries of 2012.
The 2012 database had 40 ISR sequences that were not matched with NCBI accessions until ten years later, which supports that ISR was identifying circulating serotypes faster than submissions of whole genomes were completed within the national database. The 2022 ISR database still contains 23 unique sequences that have not been matched to any database. These results support that another use of ISR is to select new strains for analysis by whole genome sequencing, which would help to avoid redundancy and provide a larger picture of evolution occurring in an important foodborne pathogen. It is important to note that ISR is a democratized assay, requiring no reporting of results or reference to its use [32]. Small in-house laboratories capable of conducting PCR and approved to culture biosafety level 2 (BSL-2) pathogens such as Salmonella can use ISR for serotyping of Salmonella enterica subsp. I. Democratization of a simpler typing method that has been evaluated in reference to well-curated and publicly accessible information such as that at NCBI can help to put the ability to follow evolutionary trends of Salmonella enterica subsp. I occurring in association with food products in more hands.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms11010097/s1. Supplement File S1 is the 2022 ISR database with 268 sequences. To convert the PDF to a plain text FASTA formatted file, copy and paste the entire file to Word using the text only option. Then, save the Word file as plain text (Notepad). This file can be uploaded directly to the NCBI microbe BLAST site available at: Nucleotide BLAST: Search nucleotide databases using a nucleotide query (nih.gov) (accessed 19 December 2022).