Epidemiology of Antimicrobial Resistance Genes in Staphylococcus aureus Isolates from a Public Database from a One Health Perspective—Sample Origin and Geographical Distribution of Isolates

Staphylococcus aureus is a commensal bacterium found on the skin and mucosae of both humans and animals, as well as in food, water, and a variety of other settings. It is also a significant pathogen with high morbidity that can cause a variety of illnesses. The Centers for Disease Control and Prevention (CDC) lists it among the most virulent and antibiotic-resistant bacterial pathogens, along with Escherichia coli, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, Enterococcus faecalis, and Enterococcus faecium. Additionally, S. aureus is part of the global threat posed by antimicrobial resistance (AMR). An epidemiological study was conducted using 26,430 S. aureus isolates from a global public database (NPDIB; NCBI Pathogen Detection Isolate Browser). The results corroborate the evidence of notable variations in isolate distribution and antimicrobial resistance gene (ARG) clusters between isolate sources and geographic origins. Furthermore, the ARG cluster patterns suggest a link between isolates from human and animal populations. This result, together with the widespread dissemination of the pathogen among animal and human populations, highlights how crucial it is to learn more about the epidemiology of these antibiotic-resistance-related infections using a One Health approach.


Introduction
Staphylococcus aureus is a commensal bacterium present on the skin and mucosae in both humans and animals, but it can also be found in food, water, and various other environments [1,2]. Unfortunately, it is also a major pathogen of great morbidity, leading to a wide range of infections including bacteremia, infective endocarditis, complicated skin and soft tissue infections, pleuropulmonary infections, urinary tract infections, toxic shock syndrome, and prosthetic device infections [3]. Most cases of infection are observed in healthcare and community settings [4], and it has been estimated that the global mortality due to S. aureus infections reached approximately 1 million in 2019 [5].
S. aureus is also present in almost every animal species, from wild animals [6] to livestock and pets [7,8], and it can lead to different kinds of infections that can be a health and economic burden, especially in farm animals, including mastitis in ruminants [9], septicemia, osteoarticular infections, and pododermatitis in poultry [10-13], exudative epidermitis in piglets [14], and cutaneous abscesses [15] and mastitis in rabbits [16].
S. aureus, with its enterotoxins, is also considered one of the principal pathogens responsible for foodborne diseases [17]. Indeed, 241,000 illnesses per year are estimated in the United States [18], while in Europe, 640 cases were officially diagnosed in 2021, with more than 50 hospitalizations [19]. These small European numbers probably underestimate the frequency of the infections, because mild symptoms that do not require medical attention leave many foodborne infections unreported in many countries.
S. aureus is included in the list of the most virulent and antimicrobial-resistant bacterial pathogens (ESKAPE) by the Centers for Disease Control and Prevention (CDC), and it is part of the worldwide threat represented by AMR [1,19]. This problem and the wide diffusion of these pathogens among human and animal populations support the importance of gaining information on the epidemiology of these infections using a One Health approach. This approach is fundamental to developing efficient control strategies that consider isolates from both humans and animals, as well as the possible risks related to bacteria and ARGs spreading via the environment. Access to publicly available databases, which collect isolates from numerous locations and sources, may help to investigate the distribution of AMR genes among isolates, as shown in a previous study [20]. This paper reports the results of the epidemiological analysis of the distribution of S. aureus AMR genes according to the geography, source, and clinical characteristics of the isolates.

Data Description
We considered the worldwide public database NCBI Pathogen Detection Isolate Browser (NPDIB). On 30 April 2022, the public database included 35,026 S. aureus isolates. The isolates were classified into three groups: human-associated (HUA), non-human-associated (NHA), and unknown (UNK). The HUA category contains samples taken in healthcare settings, while the NHA category includes isolates from animals, food, and the environment. Isolates with little or no information about the origin of isolation were labeled as unknown origin, thus forming the UNK category. Isolates without information on their geographical origin were not considered in the analysis. After data verification, we included 26,430 isolates in the epidemiological analysis. Most of the isolates submitted were from North America (USA, Canada, and Mexico) and Europe, with 35% and 30.8% of the isolates, respectively. Asia accounted for about 20% of the isolates, whereas Oceania, South America, and Africa had the lowest percentages, with frequencies of 3-7% (Figure 1).
Table 1 reports the distribution of the isolates among the different geographical regions by source: 63% were classified as HUA and 7.8% as NHA, while the remaining isolates had an unknown origin. The HUA group was the most frequent in each geographical area; a statistically significant difference among NHA, HUA, and UNK isolates (χ² test, α = 0.05) was observed in each area except Africa.
Table 1 notes: NHA = non-human-associated, HUA = human-associated, UNK = unknown origin. Values with different letter superscripts among lines differ statistically at the χ² test (α = 0.05). The Other Asia category includes Saudi Arabia, Bangladesh, Cambodia, United Arab Emirates, Jordan, Hong Kong, Kazakhstan, Kuwait, Lebanon, Nepal, Oman, Pakistan, Russia, Singapore, Syria, Sri Lanka, South Korea, Thailand, Taiwan, Turkey, and Vietnam.
Figure 1. Distribution of S. aureus isolates by geographical region. The Other Asia category includes countries with a frequency <100: Saudi Arabia, Bangladesh, Cambodia, United Arab Emirates, Jordan, Hong Kong, Kazakhstan, Kuwait, Lebanon, Nepal, Oman, Pakistan, Russia, Singapore, Syria, Sri Lanka, South Korea, Thailand, Taiwan, Turkey, and Vietnam.

European Isolates
Given the substantial number of isolates originating from Europe and the United States, we analyzed these subsets in detail. In Europe (Figure 2), the UK had 2212 isolates (27%), followed by Germany with 1547 (19%), Denmark with 928 (11.4%), the Netherlands with 754 (9.3%), Switzerland with 599 (7.4%), and Italy with 515 (6.3%). Other European states had lower isolate frequencies. The proportion of UK isolates was significantly larger than that of the other countries, but when the number of records was related to population size, the ratio was highest for Denmark (154 records/million people), followed by Switzerland (68 records/million people) and the Netherlands (42 records/million people), whereas the UK registered 33 records/million people and Germany 19 records/million people.
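The population-adjusted ratio used above is a simple division of record counts by population. A minimal sketch follows; the isolate counts come from the text, while the population figures and the function name are rough assumptions added for illustration and are not values from the study.

```python
def records_per_million(records: int, population: int) -> float:
    """Database records per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return records / population * 1_000_000


# Isolate counts are taken from the text; the population figures are
# rough assumptions added for illustration, not values from the study.
isolates = {"UK": 2212, "Germany": 1547}
assumed_population = {"UK": 67_000_000, "Germany": 83_000_000}

rates = {
    country: round(records_per_million(n, assumed_population[country]))
    for country, n in isolates.items()
}
print(rates)
```

With these assumed populations, the rounded ratios come out at 33 for the UK and 19 for Germany, in line with the reported figures.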
The trend observed in the worldwide distribution among NHA, HUA, and UNK isolates was also observed for the European data (Table 2), where a statistically significant difference between the three groups was present in every area considered.
Table 2 notes: NHA = non-human-associated, HUA = human-associated, UNK = unknown origin. Values with different letter superscripts among lines differ statistically at the χ² test or Fisher's exact test (α = 0.05). The Other Europe category includes countries with a frequency <100: Austria, Belgium, Belarus, Croatia, Finland, Greece, Latvia, Lithuania, Luxembourg, Poland, Portugal, Czech Republic, Romania, Serbia, Slovenia, and Hungary.
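As a rough illustration of the testing rule behind these tables (a χ² test, replaced by Fisher's exact test when a cell count falls below 6, with a Bonferroni-adjusted threshold), the sketch below handles a single 2x2 comparison in plain Python. The function names, the toy counts, and the restriction to 2x2 tables are assumptions made for the example; the study itself ran its analysis in SPSS.

```python
from math import comb, erfc, sqrt


def chi2_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n
            stat += (obs - exp) ** 2 / exp
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_2x2(table):
    """Two-sided Fisher's exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


def compare(table, n_comparisons, alpha=0.05, min_cell=6):
    """Pick the test by smallest cell count and apply a Bonferroni threshold."""
    use_fisher = min(min(row) for row in table) < min_cell
    p = fisher_2x2(table) if use_fisher else chi2_2x2(table)[1]
    return ("fisher" if use_fisher else "chi2", p, p < alpha / n_comparisons)


# Hypothetical counts, not taken from the study.
print(compare([[10, 20], [20, 10]], n_comparisons=3))
print(compare([[3, 0], [0, 3]], n_comparisons=1))
```

Dividing α by the number of comparisons is the plain Bonferroni correction; how SPSS applied the adjustment across the letter-superscript comparisons is not specified in the text, so the `n_comparisons` argument is left to the caller.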

USA Isolates
Within the different states of the USA, Massachusetts had the highest frequency of isolates (1530, 17%), followed by California with 15% of the US isolates (1388 isolates), New York State with 13% (1160 isolates), and Iowa with 11% (1007 isolates). The remaining states had frequencies lower than 10% (Figure 3). For the statistical analysis, all US states with a frequency <2% were included in the category "Other States".
Table 3 reports the distribution and the statistical differences observed among NHA, HUA, and UNK isolates among the states, which were similar to the European results. A statistically significant difference among the three groups of isolates was observed for each state except New York, Iowa, Pennsylvania, and Missouri.
In addition, 314 records/million people were submitted from Iowa, a typical agricultural state, while Massachusetts, which supplied the highest number of isolates, had 218 records/million people, New York had 59 records/million people, and California had 35 records/million people.

Resistance Gene Distribution
Among the 67 ARGs reported in the database, only those with a total prevalence >2% were considered for the statistical analyses. Furthermore, regulatory genes such as blaI and blaR1 for blaZ, and mecI and mecR1 for mecA, were excluded from this study. The most frequent ARGs identified were those conferring resistance to the tetracycline antimicrobial family, with more than 77,000 positive identifications, and those responsible for resistance against penams and fosfonic acid, with 51,459 and 45,659 identifications, respectively; ARGs related to resistance to aminoglycosides and fluoroquinolones both had more than 36,000 positive identifications, and genes related to resistance to macrolides had 15,784 positive identifications. Our prior study examined single ARG prevalence in depth [20].
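The selection rule described above (keep ARGs with prevalence above 2% and drop the listed regulatory genes) can be sketched as follows. This is an illustrative outline only: the function name, the data layout, and the toy presence/absence flags are assumptions, and `rareGene` is a placeholder name rather than a real gene.

```python
REGULATORY = {"blaI", "blaR1", "mecI", "mecR1"}  # regulatory genes excluded in the text


def select_args(matrix, min_prevalence=0.02):
    """Return ARG names whose prevalence exceeds the threshold.

    `matrix` maps each gene name to a list of 0/1 flags, one per isolate.
    """
    kept = []
    for gene, flags in matrix.items():
        if gene in REGULATORY or not flags:
            continue
        if sum(flags) / len(flags) > min_prevalence:
            kept.append(gene)
    return kept


# Hypothetical toy data: 100 isolates with invented presence/absence flags.
toy = {
    "blaZ": [1] * 70 + [0] * 30,
    "mecA": [1] * 40 + [0] * 60,
    "blaI": [1] * 65 + [0] * 35,     # regulatory, dropped regardless of prevalence
    "rareGene": [1] * 1 + [0] * 99,  # placeholder name, 1% prevalence
}
print(select_args(toy))
```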

Cluster Analyses
To identify possible patterns in the distribution of the ARGs, a cluster analysis was performed on the dataset, both to recognize specific AMR profiles among the isolates and to identify potential associations between these profiles and the source or region of origin of the isolates. The analysis identified seven different clusters based on the presence of the ARGs described in Supplementary Table S1. Figure 4 shows the ARG rates grouped by antibiotic class and cluster to visualize the ARG distribution among all seven clusters, as previously reported [20]. Briefly, Clusters 2 and 3 were identified as the clusters with the lowest presence of ARGs, while Clusters 4, 5, and 6 had high rates of ARGs related to nine different antibiotic classes. Clusters 1 and 7 showed a mild resistance pattern, with a progressive increase in ARG frequency from Cluster 7 to Cluster 1. Figure 5 shows the distribution of the seven clusters identified in the subset of isolates considered in this paper.

Association between Gene Cluster and Geographical Area of Submission
The statistical analysis was performed to identify possible associations between the origin of the isolates and cluster membership (Tables 4-6). It showed large and significant variations among clusters in relation to geographical origin (Table 4). Indeed, North America had the significantly highest prevalence of Clusters 4 and 5 (64.9% and 73.5%, respectively); Cluster 1 was more prevalent in Europe, with 2034 isolates representing more than half of all isolates in this cluster (53.7%); Cluster 2 with 1153 isolates (40.3%), Cluster 7 with 2228 isolates (41.4%), and Cluster 3 with 1138 isolates (33.3%) were also most frequently reported in Europe, albeit with lower prevalence. Cluster 6 showed the highest prevalence in Other Asian Countries, with 1019 isolates (56.3%).

European and USA Isolates
Since most of the isolates were reported from Europe and the USA, and the numbers from these two areas were comparable, we also compared the distribution of clusters between them. The results of the χ² analysis (Figure 6a) and of the residuals (Figure 6b) confirmed significant differences in the cluster frequencies between the European and USA isolates. These differences are particularly pronounced for Clusters 1, 5, 6, and 7.
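The observed-versus-expected differences reported alongside the χ² analysis can be computed from a contingency table of areas by clusters. The sketch below is a minimal illustration with invented counts; the sign convention (observed minus expected) and the function names are assumptions, since the text does not state how the differences were calculated in SPSS.

```python
def expected_counts(table):
    """Expected counts under independence for a rows x columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def residuals(table):
    """Observed minus expected count for every cell (sign convention assumed)."""
    exp = expected_counts(table)
    return [
        [o - e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, exp)
    ]


# Hypothetical counts: rows are two areas, columns are three illustrative clusters.
toy = [
    [30, 10, 20],  # area A
    [10, 30, 20],  # area B
]
for area, row in zip(("A", "B"), residuals(toy)):
    print(area, [round(x, 1) for x in row])
```

A positive value marks a cluster that is over-represented in that area relative to independence, and a negative value marks one that is under-represented.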
When the cluster distribution was analyzed within European countries, large differences were also observed (Table 5). Indeed, Cluster 1 was mainly recovered in the UK, while Germany supplied about one third of the isolates of Cluster 7, and Switzerland did the same for Cluster 4. More generally, each country appeared to be characterized by one or two clusters with a prevalence much higher than all of the others.
The same analysis applied to the USA (Table 6) gave similar results, with the distribution of clusters largely associated with specific states. Indeed, Cluster 4 was mainly associated with Massachusetts isolates, Cluster 5 with New York isolates, Cluster 6 with California isolates, and Cluster 7 with Iowa isolates.

Isolates from Humans (Clinical Sources)
We also investigated in detail the human (clinical) isolates, which are those with the most precise characterization in the database. They were more frequently classified in Clusters 4, 5, and 7 (Table 7), and the statistical analysis of the frequencies among geographical areas supports the differences observed in the general database. Most isolates in North America were classified in Clusters 4 and 5, while Cluster 1 was the most frequent in Europe, which supplied nearly 50% of the isolates classified in this cluster. Cluster 2 isolates came mainly from North America, Europe, and Asian countries.
When the statistical analysis was performed within European countries, the results showed that 75% of the isolates in Cluster 1 were from the UK. Clusters 3 and 7 were more frequently associated with Germany, while Italy supplied about one third of the Cluster 6 isolates (Table 8). The same analysis performed on the USA isolates (Table 9) showed that Clusters 4 and 5 represented more than 70% of the total HUA isolates. Massachusetts was the area where Cluster 4 isolates were most frequently recovered, while New York State was the major source of Cluster 5 isolates.

Isolates from Animals, Food, and Environment
As stated before, the number of NHA isolates was very low, representing only 7.8% of the whole database. Nonetheless, the data reported in Supplementary Table S2 show that Cluster 7 is mainly associated with China and Europe, with 42.2% and 31.2% of the isolates, respectively, while Cluster 2 is prevalent in Europe and North America (40.7% and 36.1%, respectively). Notably, no European NHA isolates fall in Clusters 5 and 6 (Supplementary Table S3), while Cluster 7 is most represented in Germany (61.3%). In the USA, most of the isolates fall in Cluster 5 (166), with a large contribution from Maryland (42.2%), while there are no isolates in Cluster 6 (Supplementary Table S4).

Discussion
S. aureus is a highly adapted microorganism, with different lineages associated with specific hosts [21]; while a change in the major host is rare, spillover events can be more common and lead to infection in unusual hosts [22]. The risk of transmission should consider not only zoonotic or anthropozoonotic (reverse zoonotic) infections, but also the ARG spread among species, through pathways that still need to be investigated [23]. The presence of these risks supports the importance of epidemiological studies on the characteristics and distribution of the isolates with different genetic patterns [24]. Publicly available datasets, collecting isolates from various countries and sources, allow the monitoring of the epidemiology of S. aureus and can help foresee changes in the distribution of different lineages and ARGs, as already observed for other pathogens, like S. agalactiae [25].
The analysis of the database considered in this study supports the evidence of significant differences in the distribution of isolates and ARG clusters among geographical areas of origin and sources of the isolates. Geographical differences in the genetic characteristics of isolates were already known in the case of bovine mastitis [26-28], but these differences were only recently investigated in the case of human isolates [24,29,30]. More than 60% of the records originated in Europe and the USA, suggesting the relevance of the problem of AMR spread in these areas, but the proportion of records from Asia (19%) is not negligible and confirms the increasing importance of S. aureus infections and AMR spread in this area as well [25,31-33].
As reported in other studies, there is a scarcity of information from low- to middle-income countries, which is also evident in this study and reflects the limits of local healthcare systems where resources for the control and prevention of AMR are limited [34,35]. Indeed, one limitation of this study is that isolates are submitted voluntarily, and the uneven reporting frequencies among countries could be attributed to economic limitations, missed diagnoses, or a lack of interest in sharing the data.
Most records are related to human clinical cases, and relatively few to environmental, animal, or food isolates. This imbalance could be a source of bias in the analysis when the different sources are compared; the close values of the NHA isolates from North America, the USA, and China suggest that an imbalance between NHA and HUA isolates is common in these highly populated areas. This may also be due to the low prevalence of severe illnesses in humans, usually not requiring hospitalization, leading to an underestimation of the frequency of these infections.
Despite the population size and public health conditions being similar within different European countries, the frequency and relevance of the problem seem to be different, suggesting the presence of local factors that could influence the spread and characteristics of the infections. For example, the Netherlands and Denmark have significant food animal populations, mainly cows and pigs, that may play a role in the epidemiology of S. aureus infections, as already shown for MRSA infections [36].
These results support the importance of a One Health approach to investigate these infections and the need for a larger number of isolates from animal, environmental, and food sources to confirm the pattern that emerged from the data considered in this study. The analysis of the distribution of ARG clusters among and within continents fully supports the previous observations and suggests that the circulation of the different isolates is associated with relatively small areas, and that the development of AMR may be mainly due to the therapeutic protocols applied locally and cannot be generalized. Indeed, the results of this study suggest that the ARG clusters characterized by higher AMR (Clusters 4, 5, and 6) [20] are recovered with significantly higher frequency in North America, while the other clusters, characterized by lower AMR patterns, originate mainly from Europe. These differences are also supported by the evidence of different distributions of the clusters even when relatively homogeneous economic and political areas (USA and Europe) were compared.
In Europe, the prevalent cluster among HUA isolates is Cluster 1; this cluster is mainly associated with ARGs conferring resistance to fluoroquinolones, penams, and tetracycline, while the other prevalent clusters are Clusters 2 and 7, which involve resistance to fosfonic acid, tetracycline, and penams. It is interesting to note that in European states, the NHA isolates with ARGs are also included in Clusters 2 and 7, which are characterized by quite a high prevalence of ARGs conferring resistance to rifamycin and fosfonic acid, which are not allowed for veterinary use [37]. Clusters 5 and 6 are both characterized by ARGs for nucleosides [20]: in Europe, this antimicrobial class is not authorized for veterinary use, and this could explain why we did not find any NHA isolates in Clusters 5 and 6 among the European isolates [37].
Similarly, in the USA, Massachusetts was the major source of Cluster 4, while New York State was a major source for Cluster 5. These differences may result from a higher transmission frequency of genetically similar isolates in the specific geographical area and/or from the application of different therapeutic protocols among the different states. Indeed, Cluster 4 is characterized by a high frequency of genes leading to fluoroquinolone and glycopeptide resistance, while Cluster 5 is characterized by a high frequency of genes related to penam and nucleoside resistance [20]. It is important to highlight that Iowa is the single state with the highest frequency of isolates in Cluster 7, supporting the hypothesis of an association with livestock. Indeed, this cluster is related mostly to ARGs directed against tetracycline and penams [20], which are widely used in livestock treatments [38].
Cluster 7 was also frequently observed within Asian isolates, suggesting a similar epidemiological pattern to the other continents.
Overall, the results of this study support the evidence that isolates with peculiar characteristics, including higher morbidity and AMR, can be identified and should be considered [29]. They also imply that the preventive measures to reduce the occurrence and development of AMR should be aimed at the clusters of S. aureus with the highest frequency at the local level [24].

NCBI Pathogen Detection Isolate Browser and Antibacterial Data (NPDIB)
More than one million isolates from 80 different bacteria are available from the NCBI Pathogen Detection Isolate Browser (NPDIB). The strains submitted to the database and used in this epidemiological study were analyzed using the same parameters described in a previous study [20]. Identification data from the database were exported and organized into columns with Microsoft Excel™. Each column represented an AMR gene, and the value of each cell was a dichotomous variable: 1 if the ARG was present, and 0 if it was not. The information in the other columns (e.g., source of isolation, geographical area) was kept as in the original database.
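The dichotomous coding described above can be illustrated with a short sketch. The study built this table in Microsoft Excel, so the Python version below is only an equivalent outline; the field names (`isolate`, `amr_genes`), the comma-separated gene list, and the isolate identifiers are hypothetical and do not reflect the actual export layout.

```python
def encode_presence(isolates, genes):
    """Build one 0/1 row per isolate, with one column per gene in `genes`."""
    rows = []
    for record in isolates:
        found = {g.strip() for g in record["amr_genes"].split(",") if g.strip()}
        row = {"isolate": record["isolate"]}
        row.update({gene: int(gene in found) for gene in genes})
        rows.append(row)
    return rows


# Hypothetical export: field names and identifiers are invented for the example.
toy_export = [
    {"isolate": "iso-001", "amr_genes": "blaZ, mecA"},
    {"isolate": "iso-002", "amr_genes": "blaZ"},
    {"isolate": "iso-003", "amr_genes": ""},
]
table = encode_presence(toy_export, ["blaZ", "mecA"])
for row in table:
    print(row)
```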

Statistical Analysis
The statistical analysis was performed in SPSS 28.0.1.1 (IBM Corp., Armonk, NY, USA, 2022). We used a χ² test with Bonferroni adjustment to analyze the frequency distributions. When a cell count was below 6, Fisher's exact test was applied instead of the χ² test. To classify isolates based on the different combinations of AMR genes, cluster analysis was performed using the following parameters: squared Eu-

Figure 4. Graphical representation (heat map) of the ARG frequencies by antibiotic class in each cluster [20].

Figure 5. Cluster distribution related to the current database. The cluster analysis process was reported in the companion paper [20].

Figure 6. Comparison between the distribution of clusters (a) and residuals (difference between expected and observed frequency) (b) among European (orange bars) and USA (blue bars) S. aureus isolates.

Table 1. Geographical distribution and statistical differences of non-human-associated, human-associated, and unknown isolates.

Table 2. European distribution of NHA, HUA, and UNK isolates.

Table 3. USA distribution of NHA, HUA, and UNK isolates.

Table 4. Geographical distribution of clusters and statistical difference among geographical regions within each cluster.

Table 5. Geographical distribution of clusters in Europe and statistical differences among countries.

Table 6. Geographical distribution of clusters in USA and statistical differences among states.

Table 7. Distribution and statistical differences of HUA isolates among clusters in geographical regions.

Table 8. Distribution and statistical differences of HUA isolates among clusters in European countries.

Table 9. Distribution and statistical differences of HUA isolates among clusters in USA.