Molecular Characterization and Antimicrobial Susceptibilities of Nocardia Species Isolated from the Soil; A Comparison with Species Isolated from Humans

Nocardia species, one of the most predominant Actinobacteria of the soil microbiota, cause infection in humans following traumatic inoculation or inhalation. The identification, typing, phylogenetic relationship and antimicrobial susceptibilities of 38 soil Nocardia strains from Lara State, Venezuela, were studied by 16S rRNA and gyrB (subunit B of topoisomerase II) genes, multilocus sequence analysis (MLSA), whole-genome sequencing (WGS), and microdilution. The results were compared with those for human strains. Just seven Nocardia species with one or two strains each, except for Nocardia cyriacigeorgica with 29, were identified. MLSA confirmed the species assignments made by 16S rRNA and gyrB analyses (89.5% and 71.0% respectively), and grouped each soil strain with its corresponding reference and clinical strains, except for 19 N. cyriacigeorgica strains found at five locations which grouped into a soil-only cluster. The soil strains of N. cyriacigeorgica showed fewer gyrB haplotypes than the examined human strains (13 vs. 17) but did show a larger number of gyrB SNPs (212 vs. 77). Their susceptibilities to antimicrobials were similar except for beta-lactams, fluoroquinolones, minocycline, and clarithromycin, with the soil strains more susceptible to the first three (p ≤ 0.05). WGS was performed on four strains belonging to the soil-only cluster and on two outside it, and the results compared with public N. cyriacigeorgica genomes. The average nucleotide/amino acid identity, in silico genome-to-genome hybridization similarity, and the difference in the genomic GC content, suggest that some strains of the soil-only cluster may belong to a novel subspecies or even a new species (proposed name Nocardia venezuelensis).


Introduction
Nocardia species are found everywhere from sludge and soil to contaminated soil water, deep-sea sediments [1], and desert habitats [2]. Some even infect plants and animals [3][4][5]. They are among the most predominant Actinobacteria of the soil microbiota, including that of the extreme biosphere [6]. The members of Nocardia spp. produce diverse natural bioactive metabolites [7], such as antimicrobials, enzyme inhibitors, immunomodifiers, and plant growth-promoting substances [8,9], a result of the physiological and biochemical pressures imposed by the environmental conditions under which they live [10]. Their activity in the degradation of polycyclic aromatic hydrocarbons [11,12] has focused attention on them as potential xenobiotic bioremediators.
Although they cause a number of severe invasive diseases [3,13], the members of Nocardia spp. are mainly opportunistic pathogens in humans, usually affecting the lungs, central nervous system, and skin [14,15]. The burden of human nocardiosis differs between geographical locations.
In previous work, Nocardia strains were isolated from soil collected at different sites in Lara State (Venezuela) [16], where the prevalence of human mycetoma (a severe cutaneous infection) is high. The present work examines the identity of these strains via 16S rRNA and gyrB gene analysis, together with multilocus sequence analysis (MLSA) and whole-genome sequencing (WGS), and determines their antimicrobial susceptibilities. With a special focus on the most prevalent soil species detected, differences and similarities with clinical strains were explored.

Molecular Identification of Species
In the present work, 38 phenotypically identified strains were submitted to the National Centre for Microbiology (CNM, Majadahonda, Madrid, Spain) for molecular identification. The strains were isolated from soil samples collected during two periods (08/2002 and 05/2006) from nine sites in six municipalities in Lara State, NW Venezuela (Figure 1). Table 1 shows the climatic characteristics of each site; the soil culture and phenotypic identification procedures, described previously [16], are given in the Supplemental File. After growth on Columbia agar supplemented with 5% (v/v) sheep's blood and on buffered charcoal-yeast extract agar (BCYE) for 48-72 h at 37 °C under aerobic conditions, chromosomal DNA was extracted by the boiling method. The 16S rRNA and gyrB genes were then amplified and sequenced as previously described [17], and species were identified by comparing the sequences against type strain sequences [18,19] using the BLAST algorithm v.2.2.10 (http://www.ncbi.nlm.nih.gov/BLAST). Similarity values of ≥99.6% for 16S rRNA [20], and ≥93.5% for gyrB, were deemed to indicate the same species [19]. Sequences were assembled using SEQ-Man software (DNASTAR, Inc., Madison, WI) and, for phylogenetic analysis, trimmed with BioEdit [21] to the length of the shortest sequence for each reference strain (16S rRNA 1215 bp; gyrB 726 bp). They were then aligned using the ClustalW algorithm [22], and the 16S rRNA and gyrB phylogenetic trees were constructed using MEGA 6 software [23] following the neighbor-joining (NJ) and maximum likelihood (ML) methods [24] with 1000 bootstrap replications.
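The species-level cut-offs quoted above can be illustrated with a minimal Python sketch. It assumes the query and type strain sequences have already been aligned to equal length; the function names are hypothetical, and the identity calculation is a simple column-wise count, not the BLAST statistic actually used in this work.

```python
# Cut-offs taken from the text; everything else here is illustrative.
THRESHOLDS = {"16S rRNA": 99.6, "gyrB": 93.5}


def percent_identity(query: str, reference: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q == "-" or r == "-":
            continue  # gap columns are ignored in this simplified count
        compared += 1
        if q == r:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_species(identity: float, gene: str) -> bool:
    """Apply the gene-specific similarity cut-off (inclusive)."""
    return identity >= THRESHOLDS[gene]
```

For example, a gyrB identity of 93.4% would fall below the cut-off, whereas a 16S rRNA identity of exactly 99.6% would meet it.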

Multilocus Sequence Analysis (MLSA)
All 38 soil strains were then subjected to MLSA [25] alongside a further five Venezuelan strains (three of N. cyriacigeorgica and two of N. farcinica), eight Spanish clinical Nocardia strains, and type strain sequences retrieved from GenBank. MLSA was performed using trimmed, concatenated gyrB-16S rRNA-secA1-hsp65 sequences (1790 bp). An NJ phylogenetic tree was then constructed using MEGA 6 software [23]. It should be noted that N. elegans lacks a reference type strain sequence for all the genes examined here; the clinical N. elegans 20130578 strain was used as an alternative, and it is therefore this strain that appears in the phylogenetic tree.
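The concatenation step can be sketched as follows. This is an illustrative outline only: it assumes each locus has already been trimmed and aligned, and the data layout (`per_locus[locus][strain]`) and function name are hypothetical rather than part of the workflow described here.

```python
# Locus order as given in the text (gyrB-16S rRNA-secA1-hsp65).
LOCUS_ORDER = ("gyrB", "16S rRNA", "secA1", "hsp65")


def concatenate_loci(per_locus: dict[str, dict[str, str]]) -> dict[str, str]:
    """Join the per-locus sequences of every strain that has all four loci."""
    # Only strains with a sequence for every locus can be concatenated.
    strains = set.intersection(*(set(per_locus[locus]) for locus in LOCUS_ORDER))
    for locus in LOCUS_ORDER:
        lengths = {len(per_locus[locus][s]) for s in strains}
        if len(lengths) > 1:
            raise ValueError(f"{locus}: sequences are not trimmed to one length")
    return {
        s: "".join(per_locus[locus][s] for locus in LOCUS_ORDER)
        for s in sorted(strains)
    }
```

Strains lacking any one locus are dropped rather than padded, which mirrors the need for an alternative N. elegans strain when type strain sequences are incomplete.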

Genetic Similarities Among Soil and Clinical Nocardia cyriacigeorgica Strains
Given the strong predominance of Nocardia cyriacigeorgica strains in the sampled soils, 30 previously characterized Spanish clinical strains belonging to this species [17] were compared with them in terms of their 16S rRNA, gyrB, and GyrB (DNA gyrase subunit B) sequences. Hunter-Gaston discrimination indices (HGDI) [26], single nucleotide polymorphisms (SNPs), and haplotype numbers were examined using DnaSP software [22]. The N. cyriacigeorgica 16S rRNA, gyrB, and GyrB sequences of the type strain DSM 44484 T (GenBank accession numbers AF430027, GQ496121, and ACV89678, respectively) were used to determine SNP numbers. In addition, the population structures of the soil and clinical groups were examined via a gyrB NJ phylogenetic tree, with the inclusion of a further three Venezuelan clinical strains and the genome reference strain GUH-2 (GenBank accession number FO082843).
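For reference, the Hunter-Gaston discrimination index is D = 1 - [1 / (N(N - 1))] Σ n_j(n_j - 1), where N is the number of strains and n_j the number of strains of the j-th type. The short Python sketch below computes it from a list of type labels (for instance, gyrB haplotype identifiers); the function name is hypothetical and the calculation in this work was performed with dedicated software.

```python
from collections import Counter


def hunter_gaston(type_labels: list[str]) -> float:
    """Hunter-Gaston discrimination index for a list of per-strain type labels."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(type_labels)
    # Probability that two strains drawn without replacement share a type.
    same_type = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return 1.0 - same_type
```

An index of 1 means every strain has its own type; an index of 0 means all strains share one type.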

Bioinformatic Analysis
Six representative soil N. cyriacigeorgica strains, thought to be distinct according to their gyrB analysis results, were sequenced. Genomic DNA was extracted from single subcultured colonies using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). Paired-end libraries were prepared using the Nextera-XT DNA Library Preparation Kit (Illumina 1.9, San Diego, CA, USA) and sequencing was performed using the Illumina NextSeq 500 platform (mean sequencing depth ~90× per sample). Read quality control was undertaken using FastQC v. 0.11.8 software. Trimmomatic v. 0.33 software [28] was used to remove adapter contamination and to trim low-quality regions (Phred >20 in a 4 nt window, minimum length 70 bp). KmerFinder v. 3.0 software [29] was then used for species confirmation and to detect contamination. Assembly was performed using SPAdes v. 3.8.0 software [30]; Prokka v. 1.12 software [31] was used for genome annotation. QUAST v. 4.1 software [32] was used for assembly quality control.
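The quality-trimming rule described above (a 4 nt window, a Phred cut-off of 20, and a minimum read length of 70 bp) can be illustrated with the simplified sketch below. It is not a reimplementation of Trimmomatic, whose exact settings are not given in the text; the function name, the inclusive cut-off, and the choice to cut at the start of the first failing window are assumptions made for illustration.

```python
from typing import Optional


def sliding_window_trim(
    seq: str,
    quals: list[int],
    window: int = 4,
    min_mean_q: float = 20.0,
    min_len: int = 70,
) -> Optional[tuple[str, list[int]]]:
    """Cut the read where the mean window quality first drops below the cut-off.

    Returns None when the trimmed read is shorter than min_len.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    keep = len(seq)
    for start in range(0, max(len(seq) - window + 1, 0)):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            keep = start  # discard everything from the failing window onwards
            break
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]
```

A 100-base read whose last 20 bases have very low quality would be shortened but retained, while a 50-base read would be discarded whatever its quality.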
High-quality assemblies of the same six soil Nocardia strains, the type strain DSM 44484 T, the reference genome of N. cyriacigeorgica GUH-2, and five genome assemblies for N. cyriacigeorgica (available in NCBI at the time of publication) were subjected to core-genome gene-by-gene typing (cgMLST) using chewBBACA v. 2.0.17.2 software (open source at https://github.com/B-UMMI/chewBBACA) [40]. Those loci corresponding to potentially complete coding sequences (CDS) that were unique, but present in 95% of the strains, were used in subsequent phylogenetic analysis, using GrapeTree v. 2.0 software to visualize the results. A phylogenetic analysis was also performed using bcgTree v. 1.1.0 software (available at https://github.com/iimog/bcgTree) [41]; this searches for 107 conserved proteins among the examined bacteria and creates a concatenated gene matrix for a maximum likelihood phylogeny analysis with 100 bootstrap replications (performed using RAxML v. 8.2.9 software) [42].
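The locus-selection rule can be pictured with the following sketch, which keeps loci called in at least 95% of the strains. It is a hypothetical illustration and does not use the chewBBACA API: the profile layout (strain mapped to locus mapped to an allele identifier, with `None` for a missing call) and the reading of "present in 95%" as "at least 95%" are both assumptions.

```python
from typing import Optional

Profile = dict[str, Optional[int]]  # locus -> allele id, None if not called


def core_loci(profiles: dict[str, Profile], min_presence: float = 0.95) -> list[str]:
    """Return the loci called in at least min_presence of the strains."""
    n = len(profiles)
    if n == 0:
        return []
    all_loci = sorted({locus for p in profiles.values() for locus in p})
    selected = []
    for locus in all_loci:
        present = sum(1 for p in profiles.values() if p.get(locus) is not None)
        if present / n >= min_presence:
            selected.append(locus)
    return selected
```

With 20 strains, a locus missing from one strain (19/20) would be retained, whereas a locus missing from two strains (18/20) would be excluded.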

Distribution of Nocardia Species in the Soil
The number of Nocardia strains recovered from the soil samples ranged from 1 to 14 (mean = 6 strains per sample). The Quebrada de Oro (14 strains) and Caraquita (9 strains) sites returned the highest number of soil strains. 16S rRNA sequencing [50] identified the species of all 38 strains with the following distribution: N. cyriacigeorgica 29 strains, N. abscessus 2, N. rhamnosiphila 2, N. vermiculata 2, N. asteroides 1, N. elegans 1, and N. mexicana 1 strain. Three different species were found in Quebrada de Oro and Siquisique, in the Crespo and Urdancia municipalities, respectively. The most common species, N. cyriacigeorgica, was present at all sites except for El Padrón (in the Torres municipality) (Table 1). Species assignment via gyrB analysis [19] agreed with the 16S rDNA-based identifications for 27 strains (71.05%). Table 1 highlights those for which the results were discrepant.

Phylogenetic Analysis by MLSA
MLSA assigned all the soil strains but four to the same species as determined by 16S rRNA analysis (Table 1). The percentage similarity of each MLSA sequence with respect to the MLSA sequence of the respective type strain was: 94.0-98.3% in N. cyriacigeorgica, 93.2-94.5% in N. abscessus, 93.0-93.5% in N. rhamnosiphila, 96.3% in N. asteroides, and 93.5% in N. mexicana. In addition, MLSA confirmed the gyrB-based identification of 27 strains. The MLSA phylogenetic tree showed the 38 soil strains to group into three clusters in the NJ topology (Figure 2) and into more clusters in the ML topology (Supplementary Figure S1). Most gathered into cluster A; N. elegans, N. nova, and N. vermiculata grouped into cluster C; and N. cyriacigeorgica strains were found in all three clusters. The clinical strains fell closer to the type strain of each species than did the soil strains. Twenty of the 29 soil N. cyriacigeorgica strains fell into cluster B; the remainder were distributed across clusters C (n = 7) and A (n = 2).
Table 1 shows the antimicrobial-resistance phenotypes for each soil strain. These soil strains showed phenotypes that fitted the corresponding drug pattern type [51], except for the N. asteroides soil strain, which was susceptible to aminoglycosides and clarithromycin. The N. mexicana strain showed a wider resistance spectrum. The soil strains of N. cyriacigeorgica showed variable resistance to amoxicillin-clavulanate, clarithromycin, and ciprofloxacin. Table 2 shows the corresponding MIC50, MIC90, MIC range, and resistance rates.
Table S1 and Table 2 compare the 16S rRNA, gyrB, and GyrB sequences and antimicrobial susceptibilities of the soil N. cyriacigeorgica strains to those reported for previously studied Spanish human strains [17].
The soil strains were represented by three 16S haplotypes while the human strains were represented by just one, and by 13 gyrB haplotypes compared with 17 for the human strains. The high HGDI of the clinical N. cyriacigeorgica strains showed them to be more diverse than the soil strains (0.94 vs. 0.761). However, compared to the type strain DSM 44484 T, the soil strains had higher SNP numbers, and wider SNP ranges per strain, than did the clinical strains (212 vs. 77 and 1-109 vs. 0-38).
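The haplotype and SNP counts reported here can be pictured with the sketch below, which tallies distinct sequences and differences from a reference in a set of aligned, equal-length sequences. It is an illustration only: the function names are hypothetical, gaps are skipped by assumption, and whether a group-level SNP number is best read as the count of distinct polymorphic sites (as computed here) or as a sum over strains is not specified in the text.

```python
def snp_positions(seq: str, reference: str) -> list[int]:
    """Zero-based alignment columns where seq differs from the reference."""
    if len(seq) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(seq.upper(), reference.upper()))
        if a != b and a != "-" and b != "-"
    ]


def summarize(aligned: dict[str, str], reference: str) -> tuple[int, dict[str, int], int]:
    """Return (haplotype count, SNPs per strain, distinct polymorphic sites)."""
    haplotypes = len(set(s.upper() for s in aligned.values()))
    per_strain = {name: len(snp_positions(s, reference)) for name, s in aligned.items()}
    sites = {i for s in aligned.values() for i in snp_positions(s, reference)}
    return haplotypes, per_strain, len(sites)
```

The minimum and maximum of the per-strain counts give the SNP range per strain.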

Antimicrobial Susceptibilities
The gyrB-based phylogenetic relationships among the soil and human N. cyriacigeorgica strains, the three Venezuelan clinical strains, the type strain DSM 44484 T, and the genome reference strain GUH-2 are shown in Figure 3 and Supplementary Figure S2. The main cluster (cluster I) includes the 30 Spanish clinical strains, five soil strains, the three Venezuelan clinical strains, plus the two reference strains. Two subclusters with 20 and 17 strains were also seen, with N. cyriacigeorgica GUH-2 in one and DSM 44484 T in the other. Nineteen of the 29 N. cyriacigeorgica soil strains gathered into a soil-only cluster (cluster II), i.e., it contained no human source strains. This cluster showed similarity values ranging from 91.2% to 92.3% with respect to the type strain. Finally, three soil strains and one clinical strain grouped into a minor cluster with two independent branches (one strain each).
The soil strains showed low resistance (0-5%) to ceftriaxone, cefepime, imipenem, amikacin, tobramycin, co-trimoxazole, and linezolid, intermediate resistance to minocycline (24%), ciprofloxacin (28%), and amoxicillin-clavulanic acid (48%), and strong resistance to clarithromycin (96.5%) (Table 2). Their susceptibilities to aminoglycosides, doxycycline, tigecycline, co-trimoxazole, and linezolid were similar to those shown by the human strains. However, differences (p ≤ 0.05) were seen between the soil and clinical strains for all studied beta-lactams, fluoroquinolones, clarithromycin, and minocycline. Overall, the human N. cyriacigeorgica strains were more resistant (except for clarithromycin) than the soil strains. With respect to tigecycline (for which there are no available breakpoints for Nocardia), only one soil and three human strains returned MIC values of ≥4 mg/L.
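Summary values of the kind reported in Table 2 can be derived from a list of per-strain MICs as sketched below. The MIC values and the resistance breakpoint in the usage note are invented for illustration and are not data from this study; the function names are hypothetical, and MIC50/MIC90 are taken as the lowest MIC that covers at least 50% or 90% of the strains.

```python
def mic_percentile(mics: list[float], percent: int) -> float:
    """Lowest MIC at which at least `percent` % of strains are inhibited."""
    if not mics:
        raise ValueError("no MIC values supplied")
    ordered = sorted(mics)
    # Integer ceiling of percent * n / 100 avoids floating-point rounding.
    rank = max((percent * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]


def resistance_rate(mics: list[float], resistant_from: float) -> float:
    """Percentage of strains with an MIC at or above a resistance breakpoint."""
    if not mics:
        raise ValueError("no MIC values supplied")
    return 100.0 * sum(1 for m in mics if m >= resistant_from) / len(mics)
```

For ten hypothetical MICs of 0.5, 1, 1, 2, 2, 4, 4, 8, 8, and 16 mg/L, the MIC50 is 2 mg/L and the MIC90 is 8 mg/L; with a hypothetical breakpoint of ≥8 mg/L, the resistance rate is 30%.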

Whole-Genome Sequencing of the soil N. cyriacigeorgica Strains
Whole-genome sequences of six soil N. cyriacigeorgica strains were obtained: two belonging to the major gyrB cluster (cluster I) and four to cluster II (the soil-only cluster). Their ANI, AAI, and in silico GGDH (DDH-estimate) values, genomic G + C percentages, and other characteristics were used to determine their species. The same was performed for other N. cyriacigeorgica strains for which genomes were available (Table 3). Although all the strains showed 16S rRNA identities of ≥99.6% with respect to the type strain and the genome of the reference strain [20], the ANI and AAI values with respect to the GUH-2 strain were <95% [52] (except for the ANI of strain 20110626). Strain 20110626, together with 20110624 (both in cluster I), showed higher ANI-AAIs (>89.84% and >91.12% respectively) than the other four studied genomes. Determining the DDH-estimate and G + C content via the GGDC server (https://ggdc.dsmz.de/ggdc.php#) [35] showed that the four selected strains (20110629, 20110639, 20110648, and 20110649) of the soil-only cluster (cluster II) did not meet the conditions of ≥70% DDH-estimate plus a difference of <1% G + C with respect to strains GUH-2 and DSM 44484 T; they were therefore interpreted as being distinct species. In addition, lower gyrB identity (≤93.5%) and G + C content (all 67.2%) values were seen for all the strains of the soil-only cluster than for the strains of cluster I (≥95.3% and ≥68.3% respectively). In contrast, for two strains of cluster I, and for those with available genomes (strains 3012STDY6756504, EML 446, EML 1456, MDA3349, MDA3732) [36], one or more criteria were met, rendering their interpretation as either "distinct or belonging to the same species", i.e., they could not be clearly identified.
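The way these criteria combine can be summarised in a small decision sketch. It is a hypothetical illustration of the logic described above, not the procedure run by any of the servers: the class and function names are invented, the cut-offs are applied inclusively by assumption, and the three outcome labels paraphrase the interpretations given in the text.

```python
from dataclasses import dataclass


@dataclass
class GenomeComparison:
    ani: float            # average nucleotide identity, %
    aai: float            # average amino acid identity, %
    ddh_estimate: float   # in silico DDH estimate, %
    gc_difference: float  # absolute G + C difference, percentage points


def interpret(
    c: GenomeComparison,
    identity_cutoff: float = 95.0,
    ddh_cutoff: float = 70.0,
    gc_cutoff: float = 1.0,
) -> str:
    """Classify a query genome against a reference genome."""
    met = [
        c.ani >= identity_cutoff,
        c.aai >= identity_cutoff,
        c.ddh_estimate >= ddh_cutoff,
        c.gc_difference < gc_cutoff,
    ]
    if all(met):
        return "same species"
    if not any(met):
        return "distinct species"
    return "inconclusive"  # some criteria met, others not
```

With illustrative values of 87% ANI, 87% AAI, a 31.6% DDH-estimate, and a 1.5% G + C difference, no criterion is met and the genome is classed as distinct; a genome meeting only the G + C criterion is classed as inconclusive.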
To check these interpretations, analyses were run using the TrueBac™ ID BETA, AAI-profiler, and TYGS [37][38][39] web servers (Supplementary Table S2). Using the TrueBac™ server, the ANI value for three of the four sequenced genomes from the soil-only cluster with respect to the GUH-2 reference genome was 87.7% (0.877). With the AAI-profiler, the AAIs of the four selected strains from the soil-only cluster, and that of the EML 1456 strain, were ~75%; the remainder were ≥80%. When the TYGS server (https://tygs.dsmz.de) [39] was used to determine AAI with respect to GUH-2 and DSM 44484 T, the strains of the soil-only cluster returned a >1% difference in the G + C content; no such result was returned for any other strain. When these four selected strains were compared among themselves, the gyrB, ANI, AAI, DDH, and G + C ranges were 97.8-98.8%, 99.65-99.73%, 99.56-99.73%, 97.70-98.80%, and 0.01-0.33, respectively. Also using the TYGS, 16S rRNA gene sequence-based and whole-genome sequence-based trees were constructed with the above-mentioned genome sequence data and those of Nocardia type strains of other species. The soil-only cluster appeared separated from the other N. cyriacigeorgica strains in the whole-genome sequence-based tree (Figure 4).

Figure 4. 16S rRNA gene sequence-based and whole-genome sequence-based phylogenetic trees constructed using FastME v.2.1.6.1 software from Genome BLAST Distance Phylogeny (GBDP) distances; the branch lengths are scaled in terms of the GBDP distance formula. The numbers above the branches are GBDP pseudo-bootstrap support values (all are >60% from 100 replications), with average branch support of 91.8% and 58.8% for the 16S rRNA gene and for the genome respectively. The trees were rooted at the midpoint. The results were provided by the Type Strain Genome Server (TYGS), a free bioinformatics platform available at https://tygs.dsmz.de (the whole genome-based taxonomic analysis was performed on 8th January 2020) [39].

Using the chewBBACA platform, a novel cgMLST typing method based on 3048 loci was performed independent of any defined comparator strain [40]. The N. cyriacigeorgica type strain designated IMMIB D-1627 has several culture collection denominations, including DSM 44484 and NBRC 100375 (although their respective genomes differ in 10 alleles). In the cgMLST dendrogram, the genome of the type strain NBRC 100375 occupies a central node from which the other genomes emerge. Indeed, moving in a clockwise fashion, six distinct lineages can be seen (Figure 5), with the genome of the reference GUH-2 appearing as lineage 1 (with 3034 different alleles). The genomes of the soil strains appeared as lineages 1, 3, and 5, with lineage 5 belonging to the soil-only cluster. The strains of this latter cluster differ in 3047 alleles with respect to the central node of NBRC 100375, and among themselves by a mean of 1594 alleles. Using the 107 essential single-copy genes extracted by bcgTree analysis [41], the four selected strains of the soil-only cluster grouped into one of two clusters with a high bootstrap value (Figure 5).
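The allele differences quoted above correspond to pairwise comparisons of cgMLST profiles, which can be sketched as follows. This is an illustrative outline, not the chewBBACA or GrapeTree implementation: the profile layout (locus mapped to an allele identifier, `None` for a missing call), the function names, and the decision to skip loci missing in either genome are assumptions.

```python
from itertools import combinations
from typing import Optional

Profile = dict[str, Optional[int]]  # locus -> allele id, None if not called


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of loci, called in both genomes, at which the alleles differ."""
    return sum(
        1
        for locus, allele in a.items()
        if allele is not None and b.get(locus) is not None and allele != b[locus]
    )


def mean_pairwise_distance(profiles: dict[str, Profile]) -> float:
    """Mean allelic distance over all pairs of genomes in a group."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("at least two profiles are required")
    return sum(allelic_distance(x, y) for x, y in pairs) / len(pairs)
```

Applying `allelic_distance` between each genome and the central type strain profile, and `mean_pairwise_distance` within a cluster, gives figures of the kind reported for the soil-only cluster.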
Regarding quinolone resistance, the topoisomerase subunits GyrA and GyrB of the soil N. cyriacigeorgica strains each showed two major alleles (with 19 and 11 changes, respectively): GyrA1 and GyrB1, in CNM20110624-626 (with 6 and 4 differences outside of the quinolone resistance-determining region between both strains); and GyrA2 and GyrB2, without changes in the four strains of the soil-only cluster (Supplementary Figure S3). Lastly, no intact or questionable phages were detected in the soil strains.

Discussion
Like other actinomycetes, Nocardia spp. contribute to soil health, playing major roles in the cycling of organic matter, inhibiting the growth of plant pathogens, and decomposing complex mixtures of dead plants and animals [1,53]. As well as maintaining the biotic equilibrium of the soil, these bacteria are involved in a wide array of opportunistic infections in both immunocompromised and immunocompetent persons [13]. Mycetoma and pulmonary nocardiosis, respectively caused by traumatic inoculation and inhalation, are the most common [14,15]. The increase in the size of the immunocompromised and immunosenescent populations has led to an increase in the number of cases of nocardiosis recorded. The annual incidence rate in Canada has now reached 0.87/100,000 inhabitants [15]; in Western Europe, the hospitalization rate due to nocardiosis has reached 0.04/100,000 inhabitants [54].
Climate, vegetation type, and soil pH probably affect the frequency and diversity of soil aerobic Actinobacteria [55]. Those that cause human infections in any given area are typically those found in the local soil [56]. Thus, different Nocardia species appear as major aetiological agents in different countries. For instance, N. farcinica causes infections in Canada [15], France [57], and Japan [58], but not in Spain, where the incidence of N. cyriacigeorgica is double that of N. farcinica [59]. Nocardia spp. are present in the environment, thus posing some risk to human health [14], a fact reflected in the greater incidence of mycetoma in farmers and other people from rural areas of Lara State [16].
The genus Nocardia contains about 200 species (https://lpsn.dsmz.de/); however, in the present work, only seven species were identified, with N. cyriacigeorgica the most common (29 of 38 strains, 76.3%). Surprisingly, Nocardia brasiliensis, the main causal agent of mycetoma in Lara State, was not isolated in the previous work [16]. In south-eastern Spain, N. cyriacigeorgica (previously identified as the N. asteroides complex) [60] has been detected in soil samples [61], and it is responsible for the largest proportion of human nocardiosis cases (25%) [59]. However, N. brasiliensis, which is responsible for more than half of soft tissue/bone infections [59], was not detected in the above study [61]. This might be explained by the fact that actinomycetes are 3-5.6 times more abundant in air samples above ground than in the soil [62].
With respect to the present molecular targets, MLSA (gyrB-16S rRNA-secA1-hsp65) was the arbiter of Nocardia species identification [25], confirming the 16S rRNA- and gyrB-based assignment results for 89.47% and 71.05% of the strains, respectively. Nearly 70% of the soil N. cyriacigeorgica strains isolated from five of the nine sampling sites gathered into MLSA cluster B or gyrB cluster II (the soil-only cluster). Both clustering methods are valuable in species/subspecies identification [63], although the gyrB method, with just one studied gene, is simpler.
gyrB gene sequencing showed the soil-derived N. cyriacigeorgica strains to be less diverse (lower HGDI) than the human-isolated strains, although the number and range of SNPs per strain were significantly greater. The difference in SNPs found between the DSM 44484 T strain and the soil strains might suggest the presence of some atypical N. cyriacigeorgica strains. In addition, the soil strains were more susceptible to beta-lactams, fluoroquinolones, and minocycline than were the human strains, and more resistant to clarithromycin. Regarding fluoroquinolones, susceptibility differences could be related to variations in the amino acid composition of GyrA/GyrB. These differences might be the result of reduced exposure to antimicrobials in Venezuelan soils, or perhaps low intrinsic resistance of this variant.
To check the species assignment of the strains in the soil-only cluster (despite them belonging to the same species according to their 16S rRNA results), some were subjected to WGS along with others from outside this cluster. Several coefficients were required to reach specific thresholds for an assignment to be deemed correct: >95% ANI/AAI, >70% DDH-estimate, and a <1% difference in G + C content [34,52,64,65]. ANI resolves well between genomes that share 80-100% identity, and AAI does so for species that share <80% ANI and/or when 30% of their gene content is very divergent [34]. In the present work, the results of both AAI and ANI were taken into account, along with the DDH-estimate and the G + C content, since a query genome with an ANI of <95% likely represents a new species [66]. Indeed, with respect to the genome of reference strain GUH-2, average ANI and AAI values of around 87% were returned for the strains of the soil-only cluster, along with a mean DDH-estimate of 31.6% and G + C content differences of around 1.5%. In addition, the N. cyriacigeorgica strains of the soil-only cluster showed the greatest identity among themselves, with average ANI values of 99.7% being returned. For two strains from outside of the soil-only cluster, as well as for those for which genomes were available, the ANI and AAI values were around 90% and 92%, satisfying the criterion of a <1% difference in the G + C content, meaning they belong to the same species.
According to the commercial TrueBac™ system, and the AAI-profiler and TYGS systems (both open source) [37][38][39], the genomic evidence might suggest that a new species (Nocardia venezuelensis sp. nov.) exists among the soil-only cluster strains examined, all of which had low G + C contents. Some of the available genomes studied might also belong to a new species. In cgMLST (performed using chewBBACA software) [40], six lineages appear for 12 N. cyriacigeorgica strains around the reference strain DSM 44484 T / NBRC 100375. It may be that the DSM 44484 T strain provides a better reference genome than the current GUH-2 reference strain. Intraspecies MLSA sub-clusters of N. cyriacigeorgica have already been described [63]; thus, whole-genome analyses of N. cyriacigeorgica should be performed to determine its lineages, with the description of different species/subspecies as members of a single complex.
In conclusion, no genetic differences, nor differences in antimicrobial susceptibilities, were found between the Nocardia strains isolated from the Venezuelan soil samples and the reference or clinical strains, except for the strains of N. cyriacigeorgica. This might indicate that some of the latter belong to a new subspecies of N. cyriacigeorgica or even a new species. Should this be confirmed, the name Nocardia venezuelensis is proposed.
Supplementary Materials: The following are available online at http://www.mdpi.com/2076-2607/8/6/900/s1, Figure S1. Phylogenetic ML tree based on the MLSA analysis (gyrB-16S rRNA-secA-hsp65 genes) of the 38 Nocardia strains from soil (in blue), 5 Nocardia clinical strains from Venezuelan patients and nine Spanish clinical strains representing each species present in soil (in red), plus the type strains (in black). The asterisk indicates the strains selected for WGS. Na stands for N. abscessus, Nast for N. asteroides, Nc for N. cyriacigeorgica, Ne for N. elegans, Nf for N. farcinica, Ni for N. ignorata, Nm for N. mexicana, Nn for N. nova, Nr for N. rhamnosiphila, and Nv for N. vermiculata. The reliability of the topologies was assessed by the bootstrap method with 1000 replicates; Figure S2. Phylogenetic relationships of the 29 Venezuelan N. cyriacigeorgica soil strains (in blue), three Venezuelan and 30 Spanish N. cyriacigeorgica clinical strains (in red), as revealed by their gyrB genes. The reliability of the ML topologies was assessed by the bootstrap method (1000 replications). The asterisk indicates the strains selected for WGS; Figure S3. Alignment of the GyrA amino acid sequences from N. cyriacigeorgica genome reference strain GUH-2, type strain DSM 44484 T, CNM20110626, and CNM20110624 with major allele GyrA1, and the strains of the soil-only cluster (CNM20110629, CNM20110639, CNM20110648, and CNM20110649) with major allele GyrA2; Table S1. Comparison of the main typing characteristics (16S rDNA, gyrB, and GyrB) between the N. cyriacigeorgica soil strains from Lara State (Venezuela) and clinical samples from Spanish patients; Table S2. Interpretations of the analysis of the genomes of the soil N. cyriacigeorgica strains and the NCBI-available N. cyriacigeorgica genomes in terms of gyrB, ANI, AAI, in silico genome-to-genome distance similarity (GGDH; DDH-estimate), and differences in G+C content.