Identification of Clinically Relevant Streptococcus and Enterococcus Species Based on Biochemical Methods and 16S rRNA, sodA, tuf, rpoB, and recA Gene Sequencing

Streptococci and enterococci are significant opportunistic pathogens in epidemiology and infectious medicine. High genetic and taxonomic similarity and several reclassifications within these genera make species identification challenging. The aim of this study was to identify Streptococcus and Enterococcus species using genetic and phenotypic methods and to determine the most discriminatory identification method. Thirty strains recovered from clinical samples representing 15 streptococcal species, five enterococcal species, and four nonstreptococcal species were subjected to bacterial identification by the Vitek® 2 system and Sanger-based sequencing methods targeting the 16S rRNA, sodA, tuf, rpoB, and recA genes. Phenotypic methods allowed the identification of 10 streptococcal strains, five enterococcal strains, and four nonstreptococcal strains (Leuconostoc, Granulicatella, and Globicatella genera). The combination of sequencing methods allowed the identification of 21 streptococcal strains, five enterococcal strains, and four nonstreptococcal strains. The 16S rRNA and rpoB genes had the highest identification potential. Only a combination of several molecular methods was sufficient for unambiguous confirmation of species identity. This study will be useful for comparing several identification methods, both those used as a first choice in routine microbiology and those used for final confirmation.


Introduction
Gram-positive bacteria of the Streptococcus and Enterococcus genera are of great clinical and epidemiological importance, and most species are components of the natural human microbiota [1]. The genus Streptococcus includes a large number (at least 135) [2,3] [https://www.bacterio.net/genus/streptococcus] of species that colonize human and animal mucous membranes. Species such as Streptococcus pyogenes, Streptococcus pneumoniae, and Streptococcus agalactiae are highly virulent and cause infections and diseases such as scarlet and rheumatic fevers, pneumonia, or neonatal sepsis [4][5][6]. Streptococci are classified based on colony morphology, hemolysis type, and serological specificity. The serological specificity is based on antigenic differences in cell wall carbohydrates, in cell wall pilus-associated proteins, and in the polysaccharide capsule in group B streptococci [7]. The classification and nomenclature of streptococci are based on group antigens (Lancefield serotyping system) as follows: group A Streptococcus (GAS); group B Streptococcus (GBS); group C Streptococcus; group G Streptococcus; the viridans group, with the subgroups anginosus, mitis, mutans, and salivarius; and the bovis group [8][9][10].
The members of the genus now known as Enterococcus were formerly considered to be group D Streptococcus until 1984 [11]. Isolates from the Enterococcus genus are commensals of the gastrointestinal tracts of humans and animals and include 64 species [12,13] [https://bacterio.net/genus/enterococcus]. All Enterococcus species are classified into the antigen D group by the Lancefield system [11] and exhibit gamma-hemolysis on blood agar, although some strains are alpha-hemolytic or beta-hemolytic [14,15]. Enterococcus faecalis and Enterococcus faecium can cause a variety of infections, including endocarditis and urinary tract infections [16,17].
The addition of new species, changing taxonomy, and modification of the systematic names of streptococci and enterococci pose a challenge to proper species identification. Therefore, precise identification of these species is laborious. Clinical laboratories use phenotypic biochemical methods such as Vitek® 2 (bioMérieux, La Balme Les Grottes, France) and BD Phoenix (BD Diagnostic Systems, Sparks, MD, USA), commercial rapid test kits such as API® Strep (bioMérieux, La Balme Les Grottes, France), and matrix-assisted laser desorption ionization-time-of-flight mass spectrometry (MALDI-TOF MS). In routine diagnostics, the Vitek® 2 system is used most often. This system is based on kinetic analysis that detects metabolic changes and, through additional continuous monitoring of reactions, provides much faster species identification [18]. Nevertheless, the technique has so far failed to differentiate between the mitis and bovis groups and other closely related species [19,20]. On the other hand, commercially available MALDI-TOF MS systems provide accurate identification of many clinically relevant streptococcal species. However, MALDI-TOF spectra databases are limited to only some species, and further improvements of Streptococcus and Enterococcus spectra databases seem necessary. Because phenotypic traits vary within strains and species, this method has a lower differentiation capacity than methods based on genetic discrimination; consequently, more than 50% of these bacteria are incorrectly identified [21,22].
The development of molecular biological techniques has made it possible to rapidly and reliably diagnose infections caused by bacteria of the Streptococcus and Enterococcus genera. Genetic methods are based on PCR or sequencing: a selected molecular target is amplified, sequenced, and compared to a reference sequence deposited in a nucleotide database [13]. 16S rRNA gene sequencing has proven to be one of the most powerful tools for the classification of microorganisms, including streptococci and enterococci [1,23]. However, due to low specificity, the correct identification of bacterial species should not be based on the nucleotide sequence of a single gene. For unambiguous species confirmation, it is necessary to use additional molecular markers. For the identification of Streptococcus and Enterococcus isolates, several gene targets, such as the genes encoding manganese-dependent superoxide dismutase (sodA) [24], the elongation factor Tu (tuf) [25], and the beta-subunit of RNA polymerase (rpoB) [26], have been used. Furthermore, for the closely related species of the mitis group (which currently includes about 20 different species [27,28]) and the bovis group (Streptococcus bovis, Streptococcus equinus, Streptococcus gallolyticus, Streptococcus lutetiensis, Streptococcus alactolyticus [29]), other conserved molecular targets, such as the bacterial recombinase (recA) gene, may be used [30,31].
The aim of this study was to identify clinically relevant Streptococcus and Enterococcus species using genetic and phenotypic methods and to determine the most discriminatory identification method. In our study, the Vitek® 2 system and Sanger sequencing of five genes, namely, the 16S rRNA, sodA, tuf, rpoB, and recA genes, were used.

Serotyping and Identification of Gram-Positive Cocci with the Vitek® 2 System and MALDI-TOF MS
After the isolates were recovered from clinical samples, the hospital laboratories identified all of them at the genus level as Streptococcus or Enterococcus using routine diagnostic methods. Serotyping and identification at the species level were then performed in our laboratory. Lancefield serogroups were assigned to 57% of streptococci, 60% of enterococci, and 50% of the other, nonstreptococcal isolates. The absence of visible latex agglutination, or autoagglutination with more than one antibody-coated latex reagent, was interpreted as ambiguous. Briefly, in the streptococcal serotype identification performed with the Pastorex™ Strep Test Kit (Bio-Rad, Hercules, CA, USA), a positive reaction is indicated by red clumps on a green background, visible to the naked eye. Agglutination intensity and time of appearance depend upon the strain tested. Only marked, rapid agglutination with only one of the six latex suspensions convincingly establishes the group of the strain tested. A negative reaction is indicated by a homogeneous brown suspension, without clumps, after one minute of agitation. A reaction is uninterpretable if small clumps appear on a brown background, or if agglutination appears with more than one latex reagent in the kit [32].

Analysis of the recA Gene for the Streptococcal mitis Group
The streptococcal species that belong to the mitis group (S. pneumoniae, S. pseudopneumoniae, S. mitis, S. oralis, S. gordonii, S. sanguinis, and S. parasanguinis) are closely related phylogenetically.
For precise differentiation of species within this group, sequencing of the recA gene was used. The specific nucleotide signatures of the 313-bp fragment of the recA gene sequence were compared to reference sequences in GenBank (HM572273-HM572277). Sanger sequencing of the recA gene allowed precise identification of strains from the mitis group, namely, S. pneumoniae, S. pseudopneumoniae, S. mitis, S. oralis, and S. infantis. The alignment showed six specific nucleotides at positions 97, 160, 199, 247, 250, and 280 (Figure 1). The nucleotide signature is based on homology analyses of recA gene sequences from reference strains of the aforementioned species and our strains. The recA gene sequence of the p41 strain was almost identical to the reference sequence (S. pseudopneumoniae), with a one-nucleotide difference at position 280. For PL427, differences at two nucleotide positions were observed in comparison to S. infantis. The only method that allowed unambiguous identification of S. pseudopneumoniae was Sanger sequencing of the recA gene.
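The signature comparison described above can be illustrated with a minimal Python sketch. It assumes that the query and reference recA fragments are already aligned to the same 313-nt coordinate system; the function names and the toy sequences are hypothetical and are not the data or software used in this study.

```python
# 1-based signature positions in the 313-bp recA fragment, as listed in the text.
SIGNATURE_POSITIONS = (97, 160, 199, 247, 250, 280)


def signature(seq, positions=SIGNATURE_POSITIONS):
    """Return the bases found at the 1-based signature positions."""
    if len(seq) < max(positions):
        raise ValueError("sequence is shorter than the last signature position")
    return "".join(seq[p - 1].upper() for p in positions)


def signature_mismatches(query, reference, positions=SIGNATURE_POSITIONS):
    """List the 1-based signature positions where query and reference differ."""
    q = signature(query, positions)
    r = signature(reference, positions)
    return [p for p, a, b in zip(positions, q, r) if a != b]


# Toy data (not real recA sequences): a made-up 313-nt reference and a query
# that differs from it only at position 280.
reference = "A" * 313
query = reference[:279] + "G" + reference[280:]
print(signature_mismatches(query, reference))  # [280]
```

A query whose signature matches one reference species at all six positions, or at all but one as for strain p41, would then be compared against the remaining references in the same way.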

Comparison of the Sequencing Methods
The combination of sequencing methods based on the 16S rRNA, sodA, tuf, rpoB, and recA genes allowed the identification of 21 streptococcal strains, five enterococcal strains, two Leuconostoc strains, one Globicatella sanguinis strain, and one Granulicatella adiacens strain. Due to high (or identical) similarity or a lack of similarity with the reference sequences in GenBank and leBIBI QBPP, it was not possible to identify all the strains at the species level by using the targets separately (Table 2).
For Streptococcus, Sanger sequencing of the 16S rRNA gene had the highest identification potential, allowing the identification of 19 (90%) strains. Additionally, rpoB gene sequencing had high discriminative potential, allowing the identification of 18 (86%) Streptococcus strains. Sanger sequencing of the tuf gene had moderate identification potential and identified 13 (62%) streptococcal strains. Sanger sequencing of the sodA gene had the lowest discriminatory potential, allowing the identification of 12 (57%) streptococcal strains.
Sanger sequencing of rpoB and tuf allowed the identification of five (100%) analyzed enterococcal strains. Sequencing of the 16S rRNA and sodA genes had moderate identification potential and allowed the identification of four (80%) enterococcal strains (Table 3).
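The percentages reported above follow directly from the strain counts (21 streptococcal and five enterococcal strains). The short Python sketch below reproduces this arithmetic; the dictionary layout and function name are illustrative assumptions, while the counts are taken from the text.

```python
def identification_rate(identified, total):
    """Percentage of strains identified, rounded to the nearest integer."""
    return round(100 * identified / total)


# Number of strains identified per gene target, as reported in the text.
streptococci = {"16S rRNA": 19, "rpoB": 18, "tuf": 13, "sodA": 12}  # of 21
enterococci = {"rpoB": 5, "tuf": 5, "16S rRNA": 4, "sodA": 4}       # of 5

for gene, n in streptococci.items():
    print(f"Streptococcus, {gene}: {n}/21 = {identification_rate(n, 21)}%")
for gene, n in enterococci.items():
    print(f"Enterococcus, {gene}: {n}/5 = {identification_rate(n, 5)}%")
```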

Phylogenetic Analysis of Streptococcus and Enterococcus
To show the relationships among the species, phylogenetic trees were constructed. The evolutionary distances were computed using the Jukes-Cantor method and are shown in units of the number of base substitutions per site. The computed overall means for the 16S rRNA, rpoB, sodA, and tuf genes were 0.098, 0.225, 0.348, and 0.176, respectively. In the phylogenetic tree constructed for the tuf gene, the Leuconostoc species sequences are shorter because sequences of the same length as those of other species could not be obtained. Streptococci and enterococci are grouped into separate clusters. Moreover, the Streptococcus strains are divided into mitis, bovis, and anginosus complexes. Sequencing of the 16S rRNA, rpoB, and tuf genes showed that L. lactis, L. citreum, G. sanguinis, and G. adiacens were distantly related to the other species (Figures 2-5).
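For reference, the Jukes-Cantor distance corrects the observed proportion of differing sites p between two aligned sequences as d = -(3/4) ln(1 - 4p/3). The Python sketch below is a minimal illustration of this formula and not the MEGA implementation; skipping sites that contain a gap or an ambiguous base is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor distance in substitutions per site for a p-distance."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy example: one difference in four compared sites.
p = p_distance("ACGT", "ACGA")
print(p, jukes_cantor(p))
```

The correction grows with p, so genes with more divergent sequences, such as sodA in this data set, show larger mean distances than the more conserved 16S rRNA gene.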

Discussion
Because of the variability of strains and challenging taxonomic changes of Streptococcus and Enterococcus species, it is necessary to use a reliable identification method to better understand the pathogenic potential of various streptococcal and enterococcal species. The currently used phenotypic identification methods based on morphological and biochemical characteristics appear to be unreliable and are characterized by low discriminatory potential [33][34][35].

In this study, we applied biochemical methods and genetic sequencing-based methods to identify clinically relevant Streptococcus and Enterococcus species. We showed that the Vitek® 2 system and MALDI-TOF MS did not correctly identify particular closely related species, such as S. mitis, S. oralis, and other species of the mitis group. Overall, the phenotypic methods allowed the identification of 48% of streptococcal and 100% of enterococcal strains. These data are consistent with previous data in the literature [19,36-39]. Therefore, applying genetic methods in standard microbiological diagnostics can lead to unambiguous confirmation at the species level. Genotypic methods utilizing Sanger sequencing of targeted genes were shown to be useful for both Streptococcus and Enterococcus identification [13,25]. 16S rRNA is mostly used to identify unknown organisms because of the availability of universal primers [23,40]. However, most reports show that the discriminatory power of 16S rRNA gene sequencing is very low for closely related streptococcal and enterococcal species [13,41,42]. Analysis based on only one gene target is not recommended because duplication, gene transfer, and gene loss can affect the reliability of the results [43,44].
In this study, we used a combination of four gene targets (16S rRNA, sodA, tuf, rpoB) to unambiguously confirm the identity at the species level for 21 streptococcal and five enterococcal strains. None of the individual sequencing-based methods allowed the identification of all species. In our study, Sanger sequencing of the 16S rRNA gene had the highest discriminatory power, allowing unambiguous identification of 19 (90%) of the analyzed streptococcal strains, but the rpoB gene had almost the same identification potential, allowing the identification of 18 (86%) Streptococcus strains. For Enterococcus strains, Sanger sequencing of the tuf and rpoB genes allowed the identification of all five (100%) strains. The 16S rRNA and sodA genes did not allow identification of all Enterococcus strains, but in our study, this group was very small (only five strains).
Over the years, the taxonomy of bacteria has changed, and streptococcal groups, i.e., mitis and bovis, have undergone several reclassifications. Moreover, incorrect systematic names of bacteria have been deposited in publicly available databases [45]. In our study, several problematic situations occurred. First, Streptococcus tigurinus was classified as S. oralis subsp. tigurinus, but in 2012, this species was separated into two different species. Finally, in 2016, it was again proposed that this species be classified as S. oralis subsp. tigurinus [27,46]. Our sequence was aligned to the sequence of S. oralis, but the next closest species was S. tigurinus. Incorrect taxonomic annotations of DNA sequences are often present in databases [45]. A similar situation was found for S. lutetiensis (PL428 strain), which was described as S. infantarius subsp. coli based on the sodA and tuf genes. In 2005, the International Committee on Systematics of Prokaryotes (Status of strains that contravene Rules 27 (3) and 30 of the International Code of Nomenclature of Bacteria, Opinion 81) accepted S. lutetiensis as the correct systematic name [47], but in databases, double taxonomic annotation for one organism can be found.
For Streptococcus, there were also some problematic cases in the anginosus group (also known as the S. milleri group). Strain 5898/10 was identified as S. anginosus by 16S rRNA gene sequencing, but other molecular methods showed ambiguous identification among the S. anginosus-S. milleri-S. intermedius species. Such a situation was observed by others [50,51]. A similar problem was observed in the identification of the 1107/08 and 6922/09 strains. Only 16S rRNA and rpoB allowed Streptococcus constellatus identification, while for the sodA and tuf genes, our strain sequences shared high nucleotide similarities with both the S. anginosus and S. milleri sequences. The Streptococcus milleri group proved to be challenging to identify in previous studies [51,52].
Both phenotypic and genetic methods correctly identified the nonstreptococcal species as Globicatella sanguinis, Granulicatella adiacens, Leuconostoc citreum, and Leuconostoc lactis. Globicatella sanguinis was initially described as Streptococcus uberis and Aerococcus viridans due to similar phenotypic properties. Advanced methods made it possible to distinguish G. sanguinis and classify it as a new species [9,53-55]. In our study, this species was identified by all four gene targets (16S rRNA, sodA, rpoB, and tuf).
Granulicatella adiacens was first described as Streptococcus adjacens and then as belonging to the Abiotrophia genus due to distant relations with streptococci. Collins and Lawson proposed a new genus, Granulicatella, due to significant differences [56,57]. In our study, strain PL434 was identified as G. adiacens by all sequencing methods.
The Leuconostoc genus is often misidentified as Streptococcus spp. Because similar biochemical properties and group D serotypes are observed, Leuconostoc species are difficult to detect with routine diagnostic methods [9]. It has been suggested that Leuconostoc is a pathogen that colonizes the gastrointestinal tract and is present in neutropenic patients [58,59]. For the Leuconostoc genus, strain 1113/11 was correctly identified by the Vitek® 2 system and based on the rpoB gene, but the 16S rRNA and tuf genes were ambiguous between L. lactis and L. garlicum. For the sodA gene, there was no L. lactis reference sequence available in databases, but the sequence was identical to that of S. parasanguinis. Such results were not observed by other research groups, but our study showed that in some cases two bacterial genera cannot be distinguished using only one molecular target. For both Leuconostoc strains (1113/11 and 3696/08), a different set of primers had to be used for tuf gene amplification [60].
Strain 3696/08 was correctly identified as L. citreum by 16S rRNA, tuf, and rpoB gene sequencing, but amplification of the sodA gene was problematic. The primers d1 and d2 [24] used for the sodA gene in other Streptococcus strains did not result in PCR product amplification.
In our study, S. pseudopneumoniae was not identified by any of the four Sanger sequencing-based methods (16S rRNA, sodA, tuf, rpoB) or by the phenotypic methods. Arbique et al. and Harf-Monteil et al. observed similarity between the isolates identified as S. pseudopneumoniae and S. pneumoniae, which demonstrated a high degree of homology and shared phenotypic characteristics [61,62]. However, in 2011, Zbinden et al. suggested that sequencing of the recA gene could differentiate between S. pneumoniae and S. pseudopneumoniae [31]. In our study, for additional identification within the Streptococcus mitis group, we used Sanger sequencing of the recA gene, which successfully confirmed the identities of the S. pseudopneumoniae, S. pneumoniae, S. mitis, S. oralis, and S. infantis species. Moreover, it was the only method that correctly identified the p41 strain as S. pseudopneumoniae.
In the genetic diagnostics of Streptococcus species, other molecular targets, such as the ddl or gdh genes, could also be sequenced [63,64]. However, these targets are not commonly used and are usually applied to the identification of specific species groups [65,66]. The precision of advanced molecular diagnostics could be strengthened further with methods based on next-generation sequencing, but the costs and challenging data analysis limit the use of these methods in routine diagnostic laboratories [67].
To conclude, phenotypic methods such as the Vitek® 2 system and MALDI-TOF MS constitute basic methods because the results are received after approximately 8 h and the costs are lower than those of genetic methods. However, Sanger sequencing and PCR-based approaches proved to be excellent tools for identification at the species level for both Streptococcus and Enterococcus strains. We also showed that the use of only one method is often not enough for appropriate identification at the species level.

Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.

Bacterial Isolates
The collection of bacterial isolates included 30 isolates of 15 Streptococcus species, five Enterococcus species, two Leuconostoc species, and one isolate each from Globicatella and Granulicatella species recovered from clinical samples (Table 4). Thirteen isolates were obtained from the National Medicines Institute in Warsaw, five from the University Medical Center Groningen, and 12 from Pescara Local Hospital. The isolates were cultured on blood agar medium with 5% sheep blood (bioMérieux, La Balme Les Grottes, France) and incubated at 37 °C in an atmosphere of 5% CO2 for 20 h.

Phenotypic Identification Tests
All isolates were identified using two phenotypic tests. The Vitek® 2 system (bioMérieux, La Balme Les Grottes, France) was used to identify isolates at the genus and species levels. The suspension used in the Vitek® 2 system was adjusted to a McFarland standard of 0.5 by using a densitometer, and the results were interpreted according to the manufacturer's instructions. A score of ≥96% indicated excellent species identification; 91-95% indicated very good species identification. A score of 89-92% indicated good species identification. For streptococcal serotype identification, the Pastorex™ Strep Test Kit (Bio-Rad, Hercules, CA, USA) was used. The bacterial cells were suspended in 300 µL of enzymatic extract and incubated at 37 °C for 15 min. After incubation, the reagent with antibodies and the bacterial suspension were applied to identification cards and mixed. The results were read after 30 s.

MALDI-TOF MS Identification
The MicroFlex MALDI-TOF mass spectrometer with MALDI Biotyper software 2.0 (Bruker Daltonics, Bremen, Germany) was used for isolate identification. Identification of isolates PL434, 1113/11, 3696/08, p41, and 1375/11 using MALDI-TOF MS was performed by The Microbiological Laboratory of the Jagiellonian Center of Innovation (Krakow, Poland). Sample extraction and strain identification were performed following the manufacturer's instructions. A score of >2 indicated correct genus and probable species identification.

Genomic DNA Extraction
The Qiagen DNeasy Blood & Tissue Kit (Qiagen, Germantown, MD, USA) was used for genomic DNA extraction. Bacteria were homogenized with a TissueLyser II (Qiagen, Germantown, MD, USA) for five minutes at a frequency of 50 Hz. After homogenization, the tubes were centrifuged for 10 min at 13,200 rpm. The subsequent steps were performed according to the manufacturer's instructions.

PCR Amplification of the 16S rRNA, sodA, rpoB, tuf, and recA Genes
Both bacterial DNA and the negative control (nuclease-free H2O (EurX-Molecular Biology Products, Gdansk, Poland)) were amplified with primers for a given locus. As shown in Tables 5 and 6, primers specific for the targeted locus were used as described previously [21,24-26,31,60,68]. Based on our previous studies, the PCR programs were modified slightly to obtain increased product quality [13]. All PCR products were resolved by electrophoresis in 1% agarose with 1× TAE and then purified using the DNA Clean & Concentrator™ Kit (Zymo Research, Irvine, CA, USA; A&A Biotechnology, Gdynia, Poland). Concentrations and purity were measured using a NanoDrop ND-1000. Sanger sequencing was performed at GATC Eurofins Genomics (Ebersberg, Germany) and Genomed S.A. (Warsaw, Poland) with the same primers as those used for PCR (Tables 5 and 6).

Sanger Sequencing Analysis of the 16S rRNA, sodA, rpoB, and tuf Genes
The Sanger sequencing results were analyzed using Chromas software (version 2.6.6). Nucleotide BLAST (Basic Local Alignment Search Tool; http://www.ncbi.nlm.nih.gov/BLAST/) was used to analyze the obtained sequences and align them to the reference sequences deposited in the GenBank (https://www.ncbi.nlm.nih.gov/nucleotide/) and leBIBI QBPP (leBIBI-Quick BioInformatic Phylogeny of Prokaryotes) (https://umr5558-bibiserv.univ-lyon1.fr/lebibi/lebibi.cgi) databases. The first and second best species alignments were analyzed. To identify the selected strain at the species level, the criterion of ≥99% first best match with the reference database and a difference of at least two nucleotides between the first and second best matches was used [13,69]. All sequences were aligned in ClustalW. The phylogenetic trees were constructed using the neighbor-joining method. Bootstrap support (the percentage of replicate trees in which the associated taxa clustered together; 1000 replicates) was calculated, and evolutionary distances were computed using the Jukes-Cantor method (MEGA, version 7.0.26, Pennsylvania State University, State College, PA, USA). Pairwise comparison of each pair of sequences was performed using CLC Genomics Workbench (version 8.1, Qiagen, USA).
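The species-level criterion described above can be expressed as a simple decision rule. The following Python sketch is one possible reading of that rule, assuming that the "difference of at least two nucleotides" is measured as the difference in mismatch counts between the first and second best database matches; the Hit class, the function name, and the example values are hypothetical and do not come from the study data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """One database match for a query sequence (hypothetical structure)."""
    species: str
    identity_pct: float  # percent identity to the query
    mismatches: int      # number of nucleotide differences from the query


def call_species(first: Hit, second: Optional[Hit],
                 min_identity: float = 99.0, min_nt_gap: int = 2) -> Optional[str]:
    """Return the species of the best hit, or None if the call is ambiguous.

    Rule: best hit identity >= min_identity, and the second-best species
    differs from the query by at least min_nt_gap more nucleotides.
    """
    if first.identity_pct < min_identity:
        return None
    if second is not None and second.species != first.species:
        if second.mismatches - first.mismatches < min_nt_gap:
            return None
    return first.species


# Illustrative values only.
clear = call_species(Hit("S. oralis", 99.8, 1), Hit("S. mitis", 99.2, 4))
ambiguous = call_species(Hit("S. anginosus", 99.8, 1), Hit("S. intermedius", 99.6, 2))
print(clear, ambiguous)  # S. oralis None
```

Under this reading, a strain is left unidentified for a given gene target whenever the two best-matching species are separated by fewer than two nucleotides, which is the situation reported for several strains of the anginosus group.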

Nucleotide Sequence Accession Numbers
The 124 sequences for 21 Streptococcus, five Enterococcus, and four other species were annotated using the NCBI BankIt tool and deposited in the GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) under the following accession numbers: for the 16S rRNA gene, MT535599-MT535603, MT535764 and MT535859-MT535882; for the sodA gene, MT560910-MT560938; for the tuf gene,