Epidemiology of Antimicrobial Resistance Genes in Streptococcus agalactiae Sequences from a Public Database in a One Health Perspective

Streptococcus agalactiae is a well-known pathogen in humans and food-producing animals and is therefore a paradigmatic example of a pathogen to be controlled through a One Health approach. Indeed, the zoonotic and reverse-zoonotic potential of the bacteria, the prevalence of Group B Streptococci (GBS) diseases in both the human and animal domains, and the threatening global situation of GBS antibiotic resistance make these bacteria an important target for control programs. An epidemiological analysis using a public database containing S. agalactiae sequences from all over the world was conducted to evaluate the frequency and evolution of antibiotic resistance genes in these isolates. The database we considered (NCBI Pathogen Detection Isolate Browser, NPDIB) is maintained on a voluntary basis and therefore does not follow strict epidemiological criteria. However, it may be considered representative of the bacterial population associated with human disease. The results showed that the number of reported sequences increased considerably in the last four years, and about 50% were of European origin. The frequency data and the cluster analysis showed that antimicrobial resistance (AMR) genes increased in frequency in recent years, suggesting the importance of verifying the application of prudent antimicrobial-use protocols in areas with an increasing frequency of GBS infections in both human and veterinary medicine.


Introduction
Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium able to colonize the intestinal and vaginal tracts of healthy adults. Based on the capsular polysaccharide, S. agalactiae has been categorized into ten serotypes (Ia, Ib, II, III, IV, V, VI, VII, VIII, IX). The dominant disease-causing serotype varies by geography, and invasive and colonizing strains may have different serotype distributions. The serotypes most prevalent among colonizing strains are I-V, whereas serotype Ia is the principal cause of maternal GBS disease. Additionally, a strong association with invasive infant disease has been observed for serotypes III, Ia, Ib, II, and V, whereas serotypes Ia, Ib, II, and V are the most prevalent causes of invasive disease in non-pregnant adults [1][2][3].
This commensal bacterium is the leading cause of invasive neonatal infections (e.g., bacteremia and meningitis) in industrialized countries and a pathogen of growing importance in the elderly, particularly in people with pre-existing conditions such as diabetes or cancer [1][2][3]. The most common syndromes of invasive GBS disease in adults are bacteremia without a focus and skin/soft tissue infections, but bacteremia may also lead to seeding of the cardiac valves and to endocarditis. Because of the burden of GBS disease in neonates, preventive measures have been developed to minimize invasive disease. In 1973, a program of maternal antibiotic administration began to prevent neonatal GBS disease [2]. Despite the effectiveness of intrapartum antibiotic prophylaxis (IAP), according to the World Health Organization, GBS still causes 150,000 stillbirths and infant deaths worldwide each year [4]. Depending on the established guidelines, IAP is administered on the basis of risk factors or screening protocols, and for this reason, monitoring antibiotic resistance rates is of paramount importance. The first-line antibiotic for IAP is penicillin. In cases of severe penicillin allergy, the recommendation is to use second-line antibiotics such as macrolides (erythromycin) and lincosamides (clindamycin); unfortunately, increasing resistance to both restricts their use. While penicillin is still considered effective against GBS, increasing reports of isolates with reduced susceptibility to this antibiotic are causing great concern, especially as resistance to second-line antibiotics becomes an established trait in GBS [3,5]. Indeed, the spread of these resistant pathogens in humans and animals, together with potential environmental contamination through the use of manure as fertilizer, will increase the global risk from a One Health perspective [6].
S. agalactiae is also a well-known pathogen in food-producing animals. Indeed, it is a leading cause of contagious mastitis in dairy cows [7][8][9], and streptococcosis causes several clinical diseases in fish species in the aquaculture industry [10,11]. S. agalactiae infection in cows has been considered a zoonotic disease in several countries, and transmission from humans to cows (reverse zoonosis) has also been suggested [12,13]. These aspects need further investigation, but they are not the only ones to be considered. Indeed, the control strategies for GBS in human and veterinary medicine are based mainly on antimicrobials, often of the same classes. Therefore, the increasing incidence of antibiotic resistance poses a serious threat to disease treatment and is now recognized as a significant public health concern [14,15].
S. agalactiae infections in human beings and animals are a paradigmatic example of a disease to be controlled through a One Health approach. Indeed, the zoonotic and reverse-zoonotic potential of the bacteria, the prevalence of GBS diseases in both the human and animal domains, and the threatening global situation of GBS antibiotic resistance make these bacteria an important target for control programs. These programs should take into account the genetic and phenotypic characteristics of the bacteria and the epidemiology of the infection, considering isolates from both humans and animals as well as the potential risk of bacteria and resistance genes spreading through the environment.
Within a project aiming to develop effective control programs for S. agalactiae, we performed an epidemiological analysis using a public database (NCBI Pathogen Detection Isolate Browser; NPDIB) containing S. agalactiae sequences from all over the world. The aim of this analysis was to evaluate the frequency of antibiotic resistance genes in these isolates, how it changed in recent years, and its relationship with the geographical and biological sources of the isolates, in order to identify any specific AMR patterns related to these factors.

Data Description
As of 30 April 2022, the public database included 1828 isolates. Of these, 1164 were classified as clinical, 658 were not classified, and six were from environmental sources. The overall number of isolates reported to the database increased at a steady rate from 2010 (approximately 33% of the isolates were recorded between 2010 and 2015) until 2020, after which records rose by about 25% in less than two years (Table 1). The pattern was different for clinical isolates, which showed a more erratic trend, with the highest frequencies in 2011-2015 and 2019-2020, when nearly 60% of the clinical isolates were reported, while only 1% of the clinical isolates were reported in 2010-2012. European isolates represent nearly 40% of all records, while African and Australian isolates showed the lowest frequencies (<2%). Europe also contributed the highest proportion of clinical isolates (47.5%), while each of the other geographical areas contributed about 10% of the sequences, apart from Africa and Australia, which again showed very low frequencies (Table 2). Blood was the most frequent biological source, accounting for 16% of all isolates and 25% of the clinical ones (Table 3). Interestingly, human milk was the source of 10% of all isolates but of only 2.9% of the clinical ones, while all other sources increased in frequency when only clinical cases were considered. Animal sources represented less than 1% of the reported isolates.
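The percentages quoted above follow directly from the counts. As a minimal illustrative check, the breakdown by isolation type can be expressed as shares of the 1828 isolates; the helper `percentages` is hypothetical and not part of the authors' workflow, which relied on Excel and XLSTAT.

```python
from typing import Dict

def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each category in the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Isolation types of the 1828 isolates in the database on 30 April 2022
isolation_type = {"clinical": 1164, "not classified": 658, "environmental": 6}
assert sum(isolation_type.values()) == 1828

print(percentages(isolation_type))
# {'clinical': 63.7, 'not classified': 36.0, 'environmental': 0.3}
```

Roughly two thirds of the records are therefore clinical isolates, which is the subset used for the clinical-only frequencies.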

Resistance Gene Distribution
S. agalactiae is known to carry multiple antimicrobial resistance (AMR) genes. Overall, 19 different AMR genes were considered in the database, with frequencies ranging from 0.5% for catTC and ermT up to 59.8% for tetM (Table 4). However, the clinical isolates showed a different distribution: catTC and ermT were again the least frequent genes (0.8%), while tetM again had the highest frequency (66.5%). A detailed description of the AMR associated with each gene is reported in Supplementary Table S1. A statistical analysis (χ² test) of the eight most frequently detected genes (>100 positive sequences) was carried out to assess whether their distribution differed significantly between the overall and the clinical sequences. Only tetO and ant(6)-Ia differed significantly in frequency (Table 5): ant(6)-Ia had a higher-than-expected frequency in clinical cases, while tetO had a lower-than-expected frequency in the same group of sequences. In the tables, bold type indicates that the observed frequency differs from the expected one at Fisher's exact test (α = 0.05), and the > and < signs indicate an observed frequency higher or lower than expected, respectively.
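The χ² comparison described above operates on 2 × 2 tables (gene present/absent × group). The sketch below is a minimal standard-library implementation of Pearson's χ² test (1 degree of freedom, no continuity correction) and of a two-sided Fisher's exact test, assuming such a table. It is not the XLSTAT implementation used in the study, and the counts in the usage example are hypothetical rather than taken from Table 5.

```python
import math
from typing import Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]  # ((a, b), (c, d))

def chi2_2x2(t: Table) -> Tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its p-value (1 df)."""
    (a, b), (c, d) = t
    n = a + b + c + d
    row_tot, col_tot = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    return stat, math.erfc(math.sqrt(stat / 2))

def fisher_2x2(t: Table) -> float:
    """Two-sided Fisher's exact p-value: total hypergeometric probability of all
    tables with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = t
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Small tolerance so tables as likely as the observed one are counted
    return sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))

# Hypothetical counts: rows = groups, columns = gene present / absent
table = ((30, 70), (50, 50))
stat, p_chi2 = chi2_2x2(table)
print(round(stat, 2))  # 8.33
print(p_chi2 < 0.05, fisher_2x2(table) < 0.05)  # True True
```

Fisher's exact test is usually preferred when expected cell counts are small, which is the situation for the rarer genes in Table 4.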

Cluster Analysis
The AMR pattern of a microorganism is the result of the different combinations of AMR genes and of their expression in the host. Cluster analysis (Figure 1), as described in Section 4, was performed to analyze the presence of these patterns and their relationship with the other available factors (year, area, and source). The analysis identified six different clusters based on combinations of the 19 AMR genes, with the following sizes: C1 (n = 843), C2 (n = 465), C3 (n = 128), C4 (n = 237), C5 (n = 86), and C6 (n = 69), for a total of 1828 sequences. Table 6 reports the distribution among the six clusters of the 10 genes with a frequency >50. Cluster 2 represents the isolates with the lowest AMR risk, since none of the considered genes was found. Conversely, cluster 6 includes all sequences positive for lnuB and lsaE, which were nearly absent from all other clusters. Among the remaining clusters, C1 includes sequences positive for tetM, ermB, and ant(6)-Ia, with prevalences of 100%, 12%, and 3%, respectively, while the other seven genes were not found. Clusters 3, 4, and 5 showed different distributions of AMR gene frequency.
Cluster 3 may be characterized by a relatively high frequency of tetM and ermA; cluster 4 by the presence of eight of the 10 genes and a high frequency of tetO; and cluster 5 by the presence of six of the 10 genes and a high frequency of msrD and tetM (99% and 85%, respectively). Statistical analysis was performed to identify associations between cluster and the characteristics of the sequences (year, area, and source of reporting). Table 7 shows that the frequency of cluster 2 (absence of AMR genes) was significantly lower than expected after 2019, while cluster 6 (highest frequency of AMR genes) was absent until 2015; its frequency then started to increase, peaking in the period 2019-2020 (significant at Fisher's exact test). Clusters 1 and 2 showed a declining trend, while cluster 5 significantly increased in frequency in 2019-2022.
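The per-cluster prevalences reported above (e.g., tetM in 100% of C1 sequences) are the share of sequences in each cluster that carry each gene. A minimal sketch, assuming a 0/1 presence matrix like the one built from the NPDIB genotype field and a list of cluster labels; the function name `cluster_prevalence` and the four-sequence input are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

def cluster_prevalence(rows: List[Dict[str, int]], labels: List[str],
                       genes: List[str]) -> Dict[str, Dict[str, float]]:
    """Percentage of sequences in each cluster carrying each gene (1 = present)."""
    members: Dict[str, List[Dict[str, int]]] = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    return {
        label: {g: round(100 * sum(r[g] for r in seqs) / len(seqs), 1) for g in genes}
        for label, seqs in members.items()
    }

# Hypothetical presence/absence matrix for four sequences
genes = ["tetM", "ermB"]
rows = [{"tetM": 1, "ermB": 1}, {"tetM": 1, "ermB": 0},
        {"tetM": 0, "ermB": 0}, {"tetM": 0, "ermB": 0}]
labels = ["C1", "C1", "C2", "C2"]
print(cluster_prevalence(rows, labels, genes))
# {'C1': {'tetM': 100.0, 'ermB': 50.0}, 'C2': {'tetM': 0.0, 'ermB': 0.0}}
```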
The same analysis was applied to isolates from clinical sources only (Table 8), and some differences from the whole-database analysis were observed. Indeed, cluster 4 had a significantly higher-than-expected frequency in the period 2019-2022. The same was observed for clusters 5 and 6, although the differences were not significant for the period 2021-2022. Cluster 2 showed a significantly higher-than-expected frequency in the period 2021-2022.
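The "higher-than-expected" and "lower-than-expected" statements in this section compare each observed count in a cluster × period (or area, or source) table with the count expected if cluster membership were independent of that factor, E = row total × column total / grand total. The sketch below computes these expected counts and the corresponding > and < signs. The counts are hypothetical, and the per-cell significance test (Fisher's exact test in XLSTAT in the study) is not reproduced here.

```python
from typing import List

def expected_counts(table: List[List[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row total * column total / grand total."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[rt * ct / n for ct in col_tot] for rt in row_tot]

def direction_signs(table: List[List[int]]) -> List[List[str]]:
    """'>' if observed exceeds expected, '<' if below, '=' if equal."""
    exp = expected_counts(table)
    return [[">" if o > e else "<" if o < e else "=" for o, e in zip(orow, erow)]
            for orow, erow in zip(table, exp)]

# Hypothetical counts: rows = clusters, columns = two reporting periods
counts = [[30, 10],   # a cluster that declines over time
          [10, 30]]   # a cluster that increases over time
print(expected_counts(counts))  # [[20.0, 20.0], [20.0, 20.0]]
print(direction_signs(counts))  # [['>', '<'], ['<', '>']]
```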

Area of Reporting
The analysis of the association between clusters and area of reporting (Table 9) showed that cluster 6 had a significantly higher-than-expected frequency in Asian countries, while it was rarely identified in Europe, even though most of the sequences were European. In Europe, a significantly high frequency was observed for cluster 4, while cluster 2 had a lower-than-expected frequency.
The analysis performed on clinical isolates (Table 10) gave similar results, with a few important differences. Indeed, cluster 1 was highly frequent in Europe, whereas cluster 2 had a lower-than-expected frequency in the same area and a higher-than-expected frequency in the Americas.

Source of the Isolates
The association between clusters and sequence sources (Tables 11 and 12) showed that cluster 2 had a higher-than-expected proportion in milk samples and a lower-than-expected proportion in blood. Isolates from the vagina, rectum, urine, and feces had a higher-than-expected frequency of sequences included in cluster 6. In the clinical subset, the higher frequency of cluster 6 was confirmed at least for isolates from urine and feces, while a higher-than-expected proportion of sequences included in cluster 3 was observed in blood isolates.

Relevance of the Dataset
The database we considered (NCBI Pathogen Detection Isolate Browser) is a subset of a larger one that includes 53 different bacteria and over a million sequences. Because this information source relies on voluntary submissions, it does not strictly adhere to epidemiological standards (i.e., random sampling). However, the large number of S. agalactiae sequences in the database may be considered representative of the population of these bacteria associated with human disease, and the database is of practical importance because it includes sequences from all over the world and from different biological sources. The very low frequency of sequences from animal sources should be considered a critical point from a One Health perspective, since the lack of these sequences cannot be attributed to a low prevalence of infection in animals. Indeed, livestock-associated GBS infections (LA-GBS) are common in food-producing animals [7,16,17]. Therefore, the low frequency of LA-GBS in the database may be related to the relatively high cost of these analyses, which could be unsustainable in the veterinary field. Despite these limitations, we considered the dataset of S. agalactiae sequences a useful source of information for investigating the molecular epidemiology of resistance genes and identifying potential associations and trends, which can help improve surveillance at the human and animal levels.

Epidemiological and Clinical Characteristics
The number of sequences included in the database increased year after year. Indeed, nearly 50% of all sequences and more than 50% of the clinical-related sequences were added in the last three years. This increase may be related to the greater feasibility of genetic analysis owing to decreasing costs and the availability of new technologies, but an increasing concern about the spread of these infections may also be hypothesized.
Europe is the area where about 50% of the sequences originated. In areas with a similar level of health services (the Americas and Australia), the observed frequency was much lower, suggesting that the high frequency of European sequences reflects a higher prevalence of GBS infections on the continent. The known high frequency of LA-GBS, particularly in dairy cows [8,18], underlines the importance of promoting the registration of LA-GBS sequences in the database, to support the investigation of measures that reduce the risk of transmission between humans and animals.
Blood represents 25% of the overall biological sources of GBS included in the database, while 25% was related to the urinary and intestinal tract (including urine and feces). Milk represents 10% of all sequences and only 2.9% of the clinical ones. These data suggest that direct infection through milk is of minor importance in humans compared with dairy cows, where milk is considered the major source of infection [19,20]. Moreover, it can be hypothesized that the pathogenetic characteristics of human and LA-GBS strains differ, even though acquisition of genes for adaptation to an alternative host species is common [12,21].

Antimicrobial Resistance Pattern
The database considers 19 different AMR genes, but only eight were found in more than 100 sequences. Tetracycline resistance in GBS is ubiquitously high (usually >80%), and most GBS strains carry the resistance elements tetO and tetM [3]. Indeed, tetM was the most frequent gene in both the whole and the clinical-related datasets. Tetracycline resistance in GBS involves genes encoding efflux proteins, such as TetK and TetL, or ribosomal protection proteins, such as TetM and TetO. Efflux proteins belong to the major facilitator superfamily (MFS), and all tet efflux genes encode membrane-associated proteins that export tetracycline from the cell, reducing the intracellular concentration of the antibiotic. In GBS, the TetK and TetL efflux proteins are encoded by the tetK and tetL genes, respectively, which are usually located on large plasmids or on plasmids that can integrate into the bacterial chromosome [22]. Ribosomal protection proteins, in turn, detach tetracycline molecules from the ribosome, so that aminoacyl-tRNA molecules can bind again to the ribosomal A-site and protein synthesis can continue [23,24]. Since its discovery, tetracycline has been extensively overused, and resistance to this antibiotic is now widely observed, as confirmed by the results of this study.
The gene with the second-highest frequency is ermB, which is associated with macrolide resistance. Indeed, the methyltransferases encoded by the erm gene family, which comprises more than 40 variants, represent the most common macrolide resistance mechanism in pathogenic bacteria, acting via ribosomal methylation. ermB is considered the most widespread erm gene in streptococci, including GBS [3], as confirmed by this study.
ant(6)-Ia is the most frequent gene in clinical-related sequences after the previous two, and its frequency was significantly higher than expected. This gene is associated with resistance to aminoglycosides [25]. Its presence in isolates from human disease is of particular importance from a One Health perspective because it has recently also been recovered from streptococci of animal origin [26], supporting the risk of bidirectional transfer of resistance genes.
The very low frequency of genes associated with penicillin resistance in GBS, such as pbp1a, pbp2a, pbp2b, and pbp2x, confirms that penicillin resistance is still relatively rare overall [3]. Similarly, a very low frequency of genes associated with vancomycin resistance was observed. This resistance is related to the vanG gene, which alters the vancomycin target site to D-Ala-D-Ser. It has been suggested that the reported low frequency of vancomycin resistance may be biased by the lack of vancomycin susceptibility testing, given the universal susceptibility of GBS strains to penicillin [3]. However, the very low presence of van family genes in the sequences confirms the low frequency of vancomycin resistance observed in the field. The same was observed for gentamicin resistance, associated with the aacA-aphD gene [27], which was identified in fewer than four sequences. These data are consistent with the low frequency reported for clinical isolates (<0.5%) [1,27].
The pathogenetic and AMR characteristics of pathogens result from the combination of their virulence and AMR genes. A cluster analysis was conducted to investigate the pattern of AMR genes in GBS, which identified six different clusters. One cluster (C2) included isolates without AMR genes, while another (C6) included sequences positive for the 10 most frequent AMR genes, albeit in different proportions. The other four clusters included sequences with a high proportion of tetracycline resistance genes, associated with macrolide (C3) or aminoglycoside (C4) resistance genes, or with the msrD gene (C5). The msrD gene is involved in resistance to azithromycin, acting as a ribosomal protection protein that displaces macrolides from the ribosome. It is common in other streptococci, supporting the evidence of AMR gene transmission among different streptococcal species [28].
Sequences included in C5 and C6 had a higher frequency in 2021-2022, and C6 sequences were not found until 2016, whereas C1 and C2 (with a lower proportion of AMR genes) had a lower-than-expected frequency in recent years, although not significantly for C2 in clinical isolates. In the clinical isolates, a significant increase in C4 was observed in recent years, suggesting a potential role of this specific resistance mechanism in the development of clinical cases. Moreover, this cluster also had a significantly higher frequency in Asian and European isolates, even if the association with clinical cases was confirmed only for Asian isolates. In Asia, a significant increase in C6 was also observed. C2 showed a significantly lower-than-expected frequency in Europe, supporting the previous hypothesis that the acquisition of new resistance genes plays a role in the development of GBS disease. It should be noted that Asian sequences provided a contradictory signal: both C2 (no AMR) and C6 (high AMR) had a higher-than-expected frequency. This result suggests that, in an area with an apparently low prevalence, GBS infections may spread and develop through strains without any significant AMR, but their treatment could very rapidly lead to an increase in AMR and the spread of resistant strains.
As expected, C6 was typically observed in isolates from the urovaginal and rectal tract, including urine and feces. On the other hand, milk was the major source of C2, which is consistent with the very low frequency of clinical isolates from milk and, therefore, a probably low frequency of antimicrobial treatment.

NCBI Pathogen Detection Isolate Browser and Antibacterial Data
Approximately one million isolates from 53 different bacteria are currently available in the NCBI Pathogen Detection Isolate Browser (NPDIB). For the epidemiological study of the S. agalactiae strains uploaded to this database, the following parameters were selected: scientific name, collection date, location, isolation type, and AMR genotype.
For this study, the NPDIB data were downloaded into a Microsoft Excel spreadsheet. The scientific name, collection date, location, isolation type, serovar, and AMR genotype were then organized into the columns of a matrix, in which each row represented one S. agalactiae isolate.
As retrieved from NPDIB, the AMR genotype data had the following format: "aac(6')-Ie/aph(2'')-Ia = COMPLETE, catA16 = COMPLETE, msr(D) = COMPLETE, sat4 = COMPLETE, tet(L) = COMPLETE". Before further processing, the data were transformed to generate one column per gene, filled with 1 if the gene was detected in the sample and 0 if it was not. The other columns were also recoded to harmonize their formats and to replace text entries with numbers.
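The transformation described above can be sketched as follows, assuming the comma-separated `gene = STATUS` layout shown in the example string. The function names `parse_amr_genotype` and `to_binary_matrix` are hypothetical, and the sketch treats every gene listed in the field as present whatever its status value, which is an assumption rather than a documented NPDIB rule.

```python
from typing import Dict, Iterable, List

def parse_amr_genotype(field: str) -> Dict[str, str]:
    """Split a 'gene = STATUS, gene = STATUS' string into {gene: status}."""
    genes: Dict[str, str] = {}
    for item in field.split(","):
        if not item.strip():
            continue  # tolerate empty fields and trailing commas
        gene, _, status = item.partition("=")
        genes[gene.strip()] = status.strip()
    return genes

def to_binary_matrix(fields: Iterable[str]) -> List[Dict[str, int]]:
    """One row per isolate and one column per gene: 1 if listed, 0 otherwise."""
    parsed = [parse_amr_genotype(f) for f in fields]
    all_genes = sorted({g for p in parsed for g in p})
    return [{g: int(g in p) for g in all_genes} for p in parsed]

example = ("aac(6')-Ie/aph(2'')-Ia = COMPLETE, catA16 = COMPLETE, "
           "msr(D) = COMPLETE, sat4 = COMPLETE, tet(L) = COMPLETE")
matrix = to_binary_matrix([example, "tet(M) = COMPLETE", ""])
print([sum(row.values()) for row in matrix])  # [5, 1, 0]
print(len(matrix[0]))  # 6 gene columns
```

Building the gene list from all rows first guarantees that every isolate gets the same set of columns, with 0 for genes it does not carry.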

Statistical Analysis
Data were analyzed with XLSTAT 22.3.1 (Addinsoft, New York, NY, USA, 2022), applying the χ² test, Fisher's exact test, and cluster analysis with the following parameters: Euclidean distance, Ward's agglomeration method, and truncation with the Silhouette index [29].
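To illustrate the clustering settings listed above, the sketch below implements a naive Ward agglomeration (Lance-Williams update on squared Euclidean distances) on 0/1 gene profiles and picks the number of clusters with the highest mean silhouette width. Reading "truncation with Silhouette index" as "choose the cut with the highest mean silhouette" is an assumption, not XLSTAT's documented code, and the profiles are hypothetical. The naive loop is only suitable for small inputs; for 1828 sequences, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would be the practical choice.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def sq_euclid(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def ward_cut(points: List[Vector], k: int) -> List[int]:
    """Merge clusters with Ward's criterion until k remain; return one label per point."""
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(points))}
    dist: Dict[Tuple[int, int], float] = {
        (i, j): sq_euclid(points[i], points[j])
        for i in range(len(points)) for j in range(i + 1, len(points))
    }
    next_id = len(points)
    while len(clusters) > k:
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        updated: Dict[int, float] = {}
        for other, members in clusters.items():
            if other in (i, j):
                continue
            no = len(members)
            dio = dist[(min(i, other), max(i, other))]
            djo = dist[(min(j, other), max(j, other))]
            # Lance-Williams update for Ward's method
            updated[other] = ((ni + no) * dio + (nj + no) * djo - no * dij) / (ni + nj + no)
        merged = clusters.pop(i) + clusters.pop(j)
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        for other, d in updated.items():
            dist[(other, next_id)] = d  # new ids are always larger than existing ones
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * len(points)
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels

def mean_silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette width with Euclidean distance (singletons score 0)."""
    n = len(points)
    d = [[math.sqrt(sq_euclid(points[i], points[j])) for j in range(n)] for i in range(n)]
    total = 0.0
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

def best_k(points: List[Vector], k_range: range) -> int:
    """Number of clusters whose Ward cut has the highest mean silhouette."""
    return max(k_range, key=lambda k: mean_silhouette(points, ward_cut(points, k)))

# Hypothetical 0/1 profiles over four AMR genes
profiles = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],  # two resistance genes
    [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],  # no AMR genes
    [1, 1, 1, 1], [1, 1, 1, 1],                # all four genes
]
k = best_k(profiles, range(2, 6))
labels = ward_cut(profiles, k)
print(k)  # 3
```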

Conclusions
The analysis of recorded S. agalactiae sequences, even though they were not collected with an epidemiological approach, provides useful information on the GBS strains causing infections in human beings and reveals an important gap: the scarcity of sequences of animal origin, despite the high prevalence of these infections in the animal domain and the known zoonotic and reverse-zoonotic characteristics of these bacteria. Moreover, for the development of effective control programs from a One Health perspective, further study of pathogenesis involving sequence types (STs) and single-nucleotide polymorphisms (SNPs) would also be of great interest. The recorded data confirm the importance of these infections, as suggested by the large increase in data in the last two years, and the growing importance of AMR in GBS infections. Indeed, a cluster of sequences positive for the most frequent AMR genes is prevalent in Asian countries, suggesting the importance of verifying the application of prudent antimicrobial-use protocols in areas with an increasing frequency of GBS infections.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/antibiotics11091236/s1; Table S1: Genetic location, molecular functions and drug class resistance induced by antibiotic resistance genes found in S. agalactiae genome.