Evaluation of Genetic Diversity and Virulence Potential of Legionella pneumophila Isolated from Water Supply Systems of Residential Buildings in Latvia

Legionella is an opportunistic pathogen with a biphasic life cycle that occasionally infects humans. The aim of the study was to assess the distribution of virulence genes and genetic diversity among L. pneumophila isolated from water supply systems of residential buildings in Latvia. In total, 492 water samples from 200 residential buildings were collected. Identification of Legionella spp. was performed according to ISO 11731, and 58 isolates were subjected to whole-genome sequencing. At least one Legionella-positive sample was found in 112 out of 200 apartment buildings (56.0%). The study revealed extensive sequence-type diversity, where 58 L. pneumophila isolates fell into 36 different sequence types. A total of 420 virulence genes were identified, of which 260 genes were found in all sequenced L. pneumophila isolates. The virulence genes enhC, htpB, omp28, and mip were detected in all isolates, suggesting that adhesion, attachment, and entry into host cells are enabled for all isolates. The relative frequency of virulence genes among L. pneumophila isolates was high. The high prevalence, extensive genetic diversity, and the wide range of virulence genes indicated that the virulence potential of environmental Legionella is high, and proper risk management is of key importance to public health.


Introduction
Legionella pneumophila is a Gram-negative, intracellular bacterial pathogen that causes the potentially fatal but preventable Legionnaires' disease (LD), which manifests as a severe pneumonia. L. pneumophila is also associated with a mild, flu-like illness referred to as Pontiac fever [1]. Although more than 16 serogroups of L. pneumophila are known, only half of them have been associated with LD, with the serogroup 1 (SG1) being responsible for the majority of LD cases in Europe. A large proportion of LD cases are diagnosed by a urinary antigen test, which is the best tool for identifying illnesses caused by L. pneumophila SG1 [2]. However, widespread application of a urinary antigen test may lead to improper identification and underreporting of other L. pneumophila SGs. This may result in delayed epidemiological investigation and implementation of preventive measures against L. pneumophila in water circulation systems [3].
Legionellae are ubiquitous and have been found in natural and artificial aquatic environments around the world [4]. The range of habitats includes underground and surface bodies of water, moist soils, aquatic plants, and rainforests, although anthropogenic aquatic environments are considered the main reservoirs. Since Legionella can establish biofilms in
In total, 492 water samples from multistorey residential buildings were collected in sterile bottles, including cold water (n = 164) and hot water (n = 328). Samples were collected from August 2016 to December 2022 at 200 residential buildings from 26 municipalities in Latvia. For 131 of these residential buildings, there was information available about previous cases of Legionnaires' disease. There was no such information about the rest (n = 69) of the buildings. Buildings that have been linked to LD cases reported in residents relied on thermal disinfection as the primary disinfection method for water supply systems. In buildings without LD history, no additional disinfection or Legionella monitoring had been implemented. All the buildings included in the present study were older than 30 years.
Sampling was performed by trained staff in accordance with the requirements of ISO 19458 [18]. At each sampling point, at least one hot water sample was taken from a shower head. Additional samples were taken depending on the size of the building and the responsiveness of the residents and included a cold water sample from a shower head and a hot tap water sample. One liter of water was collected at each location. The water circulation temperature was measured during sampling. A specially equipped vehicle was used to transport the samples to the laboratory while maintaining the temperature between 0 and 6 °C during transportation. Testing of the samples was started no later than 6 h after sampling.

Microbiological Testing of Legionella spp.
Identification and enumeration of Legionella spp. were performed according to ISO 11731 [19]. One liter of water was concentrated using a 0.45 µm polyamide membrane filter (Millipore, Molsheim, France). The filter membranes were resuspended in sterile distilled water (5 mL) and shaken for two minutes (Vortex Genius, IKA, Staufen, Germany). A total of three 0.1 mL aliquots (untreated, heat-treated, and acid-treated) were spread on buffered charcoal yeast extract agar (BCYE, Biolife Italiana, Milan, Italy) and glycine vancomycin polymyxin B cycloheximide agar (GVPC, Biolife Italiana, Milan, Italy). For samples taken before November 2017, only the GVPC medium was used.
The inoculated plates were incubated at 36 °C for 10 days. At least three presumptive colonies of Legionella from each plate were subcultured on BCYE agar medium (Biolife Italiana, Milan, Italy) and BCYE agar medium without L-cysteine (BCYE-Cys, Biolife Italiana, Milan, Italy), and incubated at 36 °C for at least 48 h.
Suspected Legionella colonies were identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS, Bruker, Bremen, Germany). An agglutination test (Thermo Fisher Scientific, Bred, The Netherlands) was used for the confirmation of L. pneumophila. Individual latex reagents (Pro-Lab Diagnostics, Richmond Hill, Canada) were used for the detection of L. pneumophila serogroups.
Presumptive colonies from all BCYE and BCYE-Cys plates were counted and confirmed, and the estimated number of Legionella was expressed as CFU/liter with an indication of serogroup. All confirmed Legionella isolates were obtained in pure cultures and transferred to the culture collection for long-term storage at −80 °C.
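The conversion from a plate count to CFU/liter follows from the volumes given above (1 L filtered, 5 mL resuspension, 0.1 mL plated). The following Python sketch illustrates that arithmetic only; the function name and parameters are hypothetical, and any extra dilution introduced by heat or acid treatment is deliberately ignored.

```python
def cfu_per_liter(colonies, sample_volume_l=1.0, concentrate_ml=5.0, plated_ml=0.1):
    """Scale a confirmed colony count on one plate back to CFU per liter of water.

    Assumes the whole sample was concentrated into `concentrate_ml` and that
    `plated_ml` of this concentrate was spread on the plate (no further dilution).
    """
    if colonies < 0:
        raise ValueError("colony count cannot be negative")
    if min(sample_volume_l, concentrate_ml, plated_ml) <= 0:
        raise ValueError("volumes must be positive")
    # Colonies in the full concentrate = plate count * (concentrate / plated volume)
    colonies_in_concentrate = colonies * concentrate_ml / plated_ml
    return colonies_in_concentrate / sample_volume_l


if __name__ == "__main__":
    for count in (1, 12, 340):
        print(count, "colonies ->", round(cfu_per_liter(count)), "CFU/L")
```

Under these volumes, a single confirmed colony corresponds to 50 CFU/L, which is the smallest non-zero value this plating scheme can report.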

DNA Extraction from Legionella Isolates
From the 197 Legionella-positive samples, 58 L. pneumophila isolates were selected for genetic analysis. Isolates were selected based on the frequency of L. pneumophila serogroup in the analyzed water samples, the geographical origin of the samples, and the year of sampling (Table 1).
The isolates were thawed and cultured on BCYE agar at 37 °C for 48 h before DNA extraction. A single colony from each culture was subjected to manual DNA extraction using the NucleoSpin Tissue reagent kit (Macherey-Nagel, Düren, Germany) according to the manufacturer's instructions.

Whole-Genome Sequencing
All 58 isolates were subjected to whole-genome sequencing. The wet-lab procedures for library preparation and the bioinformatics procedures for assembly, data quality control, sequence-based typing (SBT), and core-genome MLST (cgMLST) typing were carried out as described previously [20]. Briefly, the trimmed reads were assembled into contigs using the SPAdes assembler v3.14.0 [21]. The contigs were then used as input for further SBT analysis according to the ESCMID Legionella Study Group (ESGLI) scheme [22,23] and cgMLST typing according to the scheme published by Moran-Gilad [24]. The collection of core genes among all isolates was determined using the pangenome analysis tool Roary v3.13.0 [25]. The core-gene SNP alignment produced by Roary was then used by FastTree v2.1.10 [26] to build an approximate maximum-likelihood phylogeny.
The virulence factor database (VFDB, retrieved on 12 November 2021) [27] and separately the sequence of the rtxA gene of L. pneumophila strain AA100 (nucleotide positions 949-4575 of GenBank ID AF057703.1) were used to identify virulence-encoding genes. The ABricate v1.0.1 tool (https://github.com/tseemann/abricate, accessed on 16 March 2023) was used to screen the assembled genomes for the presence of these virulence genes based on a nucleotide BLAST approach [28]. The thresholds of 80% BLAST sequence identity and 80% length coverage were set to qualify any gene as present in a genome. Additionally, in silico PCR with the rtx1/rtxA-rtx2/rtxA and rtx3/rtxA-rtx4/rtxA primer pairs [29] was simulated using the iPCRess tool from the Exonerate v2.2.0 software package [30]. The presence of antimicrobial resistance (AMR) genes was determined using the ResFinder software v4.1.7 and its associated database (version 2022-05-24) [31]. The same identity and coverage thresholds were used for AMR genes as for virulence genes.
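The presence/absence rule described above (at least 80% identity and at least 80% length coverage) can be sketched in Python as follows. This is an illustration only: the record layout, the example values, and the inclusive comparison are assumptions, not the actual ABricate output or implementation.

```python
MIN_IDENTITY = 80.0  # percent BLAST sequence identity
MIN_COVERAGE = 80.0  # percent of the reference gene length covered


def genes_present(hits, min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Return the sorted gene names with at least one hit passing both thresholds.

    `hits` is a list of dicts with hypothetical keys "gene", "identity", "coverage".
    Thresholds are treated as inclusive (an assumption).
    """
    passing = {
        hit["gene"]
        for hit in hits
        if hit["identity"] >= min_identity and hit["coverage"] >= min_coverage
    }
    return sorted(passing)


# Hypothetical hits for one assembled genome (values invented for illustration)
example_hits = [
    {"gene": "mip", "identity": 99.1, "coverage": 100.0},
    {"gene": "sidJ", "identity": 85.3, "coverage": 62.0},   # fails coverage
    {"gene": "lvhB2", "identity": 78.9, "coverage": 95.0},  # fails identity
    {"gene": "enhC", "identity": 80.0, "coverage": 80.0},   # exactly on both thresholds
]

if __name__ == "__main__":
    print(genes_present(example_hits))
```

A gene that is present but truncated or highly divergent relative to the database reference fails one of the two thresholds and is scored as absent, which is relevant to the interpretation of variable genes such as rtxA.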
In this study, we selected 11 genes encoding molecules responsible for the ability of L. pneumophila to infect humans [16] for further investigation (Table 2).

Table 2 (partial).
legK1 (LegK1): activation of NF-κB [37]
sidJ (SidJ): calmodulin-activated glutamylase for SidE [38]
[43] was used for data analysis. Chi-squared tests and ANOVA were used to calculate differences between the variables. Genotypes were visualized as a dendrogram with iTOL v6.7.3 [44] and as a minimum spanning tree with GrapeTree v1.5.0 [45].

Prevalence of L. pneumophila in Residential Buildings
Overall, 197 of 492 (40.0%) samples were Legionella spp.-positive (Table 3). Only two Legionella species were found: L. pneumophila (n = 196) and L. rubrilucens (n = 1). At least one Legionella-positive sample was found in 112 out of 200 apartment buildings (56.0%); however, there were no significant differences between buildings linked to known cases of LD (71 positive buildings of 131) and buildings without known LD cases (41 positive buildings of 69; p = 0.54).
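The building-level comparison above can be illustrated with a Pearson chi-squared test on the 2 × 2 table of positive and negative buildings (71/60 with known LD cases, 41/28 without). The Python sketch below uses only the standard library; the function name is hypothetical, and because the text does not state which test variant or software settings were used, the resulting p-value depends on the continuity-correction choice and need not equal the reported p = 0.54.

```python
import math


def chi_squared_2x2(a, b, c, d, yates=False):
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (chi2, p_value). With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Rows: buildings with / without known LD cases; columns: positive / negative
    for use_yates in (False, True):
        chi2, p = chi_squared_2x2(71, 131 - 71, 41, 69 - 41, yates=use_yates)
        print(f"yates={use_yates}: chi2={chi2:.3f}, p={p:.3f}")
```

Either variant gives a p-value well above 0.05, in line with the conclusion that prevalence at the building level did not differ significantly between the two groups.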
In buildings with known LD cases, the prevalence of Legionella spp. was significantly lower in cold water (p < 0.0001), while in buildings without known LD cases, there were no differences observed in the prevalence of Legionella between the cold and hot water (p = 0.192). Overall, the prevalence of Legionella spp. was higher both in cold (p < 0.01) and hot water samples in apartment buildings without known LD cases, although for hot water the difference was not statistically significant (p = 0.056).
In total, six L. pneumophila serogroups (SG) and three combinations of them were identified. Out of all 196 L. pneumophila-positive water samples, in 191 cases (97.4%), only one SG was identified. Overall, 55.1% (108/196) of isolates represented SG 2, and 28.1% (55/196) represented SG 3. In five water samples (2.6%), two SGs were found simultaneously (Table 4). No significant differences were observed in the prevalence of SGs in apartment buildings with and without known cases of LD (p > 0.05).
The observed levels of L. pneumophila colonization varied from 50 CFU/L up to 1.7 × 10⁴ CFU/L, with the average value of 1.8 × 10³ CFU/L. No significant differences were revealed in the levels of colonization with various SGs between buildings with or without known LD cases, except for L. pneumophila SG 3, which showed a significantly higher level of colonization (p < 0.05) in apartment buildings linked to LD cases (Table 5).

Whole-Genome Sequencing of L. pneumophila
Overall, 36 SBT sequence types (STs) were identified for 58 L. pneumophila isolates. Ten SBT sequence types have not been previously documented and are considered as new STs (Table 6).
ST 338 was the most frequent sequence type, which was determined for 10 out of 58 isolates (17%). None of the ST 338 isolates represented L. pneumophila SG 1. ST 366 and ST 2002 were detected in three isolates each (5%), while the other STs were detected in no more than two isolates each.
According to cgMLST typing, all 58 sequenced L. pneumophila isolates fell into 56 different cgMLST types. No cgMLST types specific to geographic location, serogroup, or SBT type were detected. No distinct clusters were identified, but clades around dominant STs such as ST 338 and ST 366 could be discerned, and the clade around ST 1104 was the most distant (Figure 1).
All sequenced L. pneumophila isolates had only one antibiotic resistance gene, aph(9)-Ia, encoding the antibiotic resistance factor spectinomycin phosphotransferase.
In total, 420 virulence genes representing 59 gene families were found in 58 sequenced L. pneumophila genomes. The number of genes per one isolate varied from 312 to 415 (Table 6), with the average number of 375 virulence genes per isolate. A similar diversity of genes was observed between the isolates from buildings linked to LD cases and buildings without known LD cases. Notable differences between isolates of different serogroups were not found.
The genes enhC, htpB, omp28, mip, mavC, legK1, sidJ, lvhD4, lpnE, lspC, and rtxA were selected as objects of the greatest interest in this study, and the relative frequency of these genes was evaluated in all 58 isolates (Table 7). No significant differences were observed in the relative frequency of genes (p > 0.05) between buildings linked to LD cases and buildings without known LD cases or between different serogroups, with two exceptions: sidJ was less frequent in SG 9 isolates than in SG 1, SG 2, and SG 3 isolates (p < 0.05), and rtxA detected by in silico PCR was less frequent in SG 1 isolates than in SG 3 isolates (p < 0.05).
A total of 260 genes (62.1%), including enhC, htpB, omp28, mip, lpnE, and 11 genes of the lsp family, were observed in all of the isolates (Figure 2). The core-genome SNP dendrogram showed the same clades as the cgMLST minimum spanning tree. The individual leg family virulence genes were found in only seven of the 58 isolates, all of which belonged to the clade formed around ST 1104.
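The distinction between genes found in all isolates and genes found in only some of them can be expressed as a set intersection over per-isolate gene lists. The Python sketch below is a minimal illustration with invented isolate names and a toy gene selection; it is not the pipeline used in the study.

```python
def split_shared_and_variable(presence):
    """Split detected genes into those shared by every isolate and the rest.

    `presence` maps an isolate name to the set of gene names detected in it.
    Returns (shared, variable) as two sets.
    """
    if not presence:
        raise ValueError("at least one isolate is required")
    gene_sets = list(presence.values())
    shared = set.intersection(*gene_sets)  # present in every isolate
    all_genes = set.union(*gene_sets)      # present in at least one isolate
    return shared, all_genes - shared


# Hypothetical toy data (isolate names and gene content invented for illustration)
toy_presence = {
    "isolate_A": {"enhC", "htpB", "mip", "sidJ"},
    "isolate_B": {"enhC", "htpB", "mip", "lvhB2"},
    "isolate_C": {"enhC", "htpB", "mip", "sidJ", "lvhB2"},
}

if __name__ == "__main__":
    shared, variable = split_shared_and_variable(toy_presence)
    total = len(shared) + len(variable)
    print("shared:", sorted(shared))
    print("variable:", sorted(variable))
    print(f"{100 * len(shared) / total:.1f}% of detected genes are shared by all isolates")
```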

Figure 1.
A minimum spanning tree of 58 L. pneumophila isolates from water samples taken in apartment buildings in Latvia, based on cgMLST. The node sizes are proportional to the numbers of isolates sharing an identical pattern, while the node colors represent the geographical origin of the isolates, and the node labels indicate STs of the L. pneumophila isolates.
The mav family was represented by 13 identified genes, of which nine were found in all isolates, while mavC was present in 54, mavG in 53, mavH in 55, and mavI in 57 isolates. Altogether, 29 virulence genes of the leg family have been identified. The frequency of leg genes in L. pneumophila isolates varied from seven to 58. However, only six genes of the leg family had a frequency of less than 40: legU1 was found in 23 isolates, legC1 in 13 isolates, as well as legA6, legL5, legL7, and legLC4 in seven isolates each. Out of the 11 virulence genes representing the sid family, sidA, sidE, sidF, and sidK were found in all 58 isolates. Furthermore, sidG and sidH were the rarest and were found in six and 13 isolates, respectively. All 11 identified genes of the lvh family were found in 46 isolates, except for lvhB2, which was found only in 27 isolates.

Discussion
The evaluation of the presence and diversity of Legionella in 200 residential apartment buildings across Latvia revealed high prevalence of Legionella in water supply systems of residential buildings. In general, Legionella was found in 56% of residential buildings and in 40% of water samples. Our study showed that the prevalence of L. pneumophila significantly exceeded the data reported in other studies: 20.7% in Germany [46] and 19.8% in Italy [47]. In the USA, at least one positive Legionella sample was found in 15% of single-family homes [48]. However, the prevalence of L. pneumophila reported in this study was similar to a previous report from Latvia, where 39% of water samples from apartment buildings were Legionella-positive [49]. The high prevalence of Legionella in water in Latvia could be related to ineffective water supply system maintenance strategies where water temperature requirements are not met, while water temperature is one of the main factors for Legionella persistence and proliferation in building water supply systems [9,10].
Our study showed that water contamination with Legionella was found more often (52.3% vs. 35.6%) in samples from buildings with no previous connection to LD cases. The average temperature of hot water was 50.7 °C; however, it was seven degrees lower on average in buildings without previous LD cases. Currently, the temperature of hot water circulation or at the points of consumption is not regulated in Latvia, and the maintenance companies and building managers are obliged to ensure only the temperature at the exit from the heat exchanger, which must not be lower than 55 °C according to the national legislation [50]. Due to the often considerable total length of water circulation pipelines and heat losses in the water supply system between the heat exchanger and the showerheads, the observed hot water temperature at the points of water consumption reached only 45.8 °C in buildings not related to LD cases. In contrast, managers are obliged to carry out disinfection procedures and Legionella monitoring in buildings with previous LD history, although guidelines for sampling frequency have not been set. Therefore, in buildings with a previous association with LD, the managers are more aware and, perhaps for this reason, higher hot water temperatures and lower prevalence of culturable Legionella were observed in those buildings.
In the present study, six serogroups of L. pneumophila were identified, of which SG 2 (55.1%), SG 3 (27.0%), and SG 1 (9.7%) were predominant. However, these results differed from our recent study of L. pneumophila in Latvian hotels where SG 3 was the predominant serogroup [20]. It is worth noting that the low prevalence of L. pneumophila SG 1 in residential buildings was in agreement with the results from hotels in Latvia [20]. Those results were consistent with low prevalence of L. pneumophila SG 1 antibodies (0.2%) in healthy blood donors in Latvia [51]. Living in an apartment building with a centralized hot water supply had been identified as the main environmental risk factor, and the seroprevalence of 9.5% in residents of urban apartment buildings was reported [51]. Globally, L. pneumophila SG 1 is considered to be the main causative agent of LD [2] and, accordingly, diagnostic methods for clinical cases have been adapted to identify SG 1. The urine antigen test, which is specific only for L. pneumophila SG 1, is still the first-choice method for the diagnostics of LD [2]. Additionally, the incidence of LD could be underreported, because the diagnostics and reports are biased towards SG1; thus, we assume that only the most severe cases, which most likely initially had a higher bacterial load, were detected and reported, while other cases may remain unrecognized. Also, to the best of our knowledge, the absence of clinical isolates of other SGs in Latvia until the end of 2022 may indicate insufficient diagnostics of other SGs.
In five noteworthy cases, two different L. pneumophila strains belonging to two different serogroups were found simultaneously in the same water samples. At the time of infection, a person may encounter several Legionella strains with different immunological and antibacterial resistance characteristics; thus, the choice of appropriate diagnostic methods for clinical cases can present a significant challenge.
Our study showed extensive sequence-type diversity, where 58 L. pneumophila isolates fell into 36 different sequence types, 10 of which have not been previously described. It must be admitted that the diversity of sequence types is not unusual for environmental Legionella, as confirmed by previous studies from Bosnia and Herzegovina [52], the USA [53], China [54], and Latvia [20]. The higher diversity of STs and the isolation of new STs during the present study could be explained by the focus on residential buildings. In comparison, the lower diversity and lower number of new STs found in hotels during our previous study could be related to internationally distributed STs [20].
Eleven STs found in residential buildings matched those found in Latvian hotels. Moreover, the predominant ST 338 and ST 366 were found both in hotels and in residential buildings. Several STs identified in our study have been associated with LD outbreaks and sporadic cases in other countries [55,56]. In addition, we did not find differences between the STs found in buildings with and without LD cases, and all these findings consequently suggest that Legionella strains may persist in residential buildings in Latvia and pose a long-term risk to residents.
In our study, 420 virulence genes were identified, of which 260 genes were found in all sequenced L. pneumophila isolates. Genes enhC, htpB, omp28, and mip encoding virulence factors related to the bacterial cell surface structures were detected in all isolates, suggesting that adhesion, attachment, and entry into the host cells are enabled for all isolates. The largest group of genes encoding T4SS effectors were quite variable; however, the relative frequency of virulence genes among L. pneumophila isolates was high. The wide range of genes encoding effectors demonstrated the high plasticity of the L. pneumophila genome and pointed to the possible redundancy of effectors, which is an important feature of Legionella [57]. The redundancy within the SidE effector family is well established, where members of the SidE effector family exert the same function on specific host cell targets. SidE, SdeA, SdeB, and SdeC catalyze the ubiquitination of host proteins, and the deletion of all four of these effectors together, but not individually, impairs intracellular growth, which can be restored by the insertion of just one of them [58].
New gene functions in Legionella are still being studied, and not all of the lvh locus group genes have been assigned a function yet. However, the environment in which Legionella grows before contact with the host cell also plays a role. The lvh locus gene lvhB2 is closely related to the ability of the bacterium to infect macrophages or amoebae, depending on the temperature at which Legionella grew prior to contact [59].
During the initial analysis, we noticed a remarkable difference in the prevalence of rtxA-positive isolates between our study and other studies. The rtxA gene was absent in all isolates when screening the genomes against the VFDB. This was in stark contrast with other studies [60,61], where 20.69-100% of L. pneumophila isolates were rtxA-positive. However, those studies relied on PCR to determine the presence of this gene. The commonly used rtx1/rtxA-rtx2/rtxA and rtx3/rtxA-rtx4/rtxA primers have been developed based on the L. pneumophila strain AA100 DNA sequence [29], and they only target two approximately 540-630 bp long fragments of the gene. The rtxA gene itself is known to have a modular structure and to be highly variable in length and sequence similarity between different L. pneumophila strains [62]. We therefore hypothesized that the absence of these two PCR target sites did not necessarily indicate the absence of every possible variant of the rtxA gene.
In order to test this hypothesis, we simulated PCR in silico, using the two aforementioned primer pairs and L. pneumophila reference sequences that were used to characterize the modular structure of rtxA [62]. Only the sequence of strain AA100 produced both in silico PCR products, confirming our hypothesis. Furthermore, the rtxA reference (YP_123037) that was included in the respective release of VFDB also did not produce any of the two expected in silico PCR products. Since the rtxA sequence from strain AA100 was the shortest of available references and it contained the conserved regions that are located at the start and the end of rtxA, we used it as a reference for BLAST-based screening for rtxA in our Legionella genomes.
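The principle behind simulated PCR can be sketched in a few lines of Python: a product is predicted only if the forward primer and the reverse complement of the reverse primer both occur in the template in the correct order. The sketch below is a simplified illustration that requires exact matches and checks only the first forward site; it is not the iPCRess algorithm, and the primers and sequences are invented toy data rather than the rtx primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_product=5000):
    """Return the predicted amplicon length, or None if no product is expected.

    Simplifications (assumptions of this sketch): exact primer matches only,
    one strand orientation, and only the first forward-primer site is tried.
    """
    template = template.upper()
    fwd_site = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = template.find(fwd_site)
    if start == -1:
        return None
    rev_start = template.find(rev_site, start + len(fwd_site))
    if rev_start == -1:
        return None
    length = rev_start + len(rev_site) - start
    return length if length <= max_product else None


# Hypothetical toy primers and templates (not real rtxA sequences)
FORWARD = "ATGGCTAGC"
REVERSE = "TTACGGATC"
SPACER = "TTTTGGGGAAAA"
reference_like = "CC" + FORWARD + SPACER + reverse_complement(REVERSE) + "CC"
divergent_variant = "CC" + "ATGGCAAGC" + SPACER + reverse_complement(REVERSE) + "CC"

if __name__ == "__main__":
    print("reference-like template:", in_silico_pcr(reference_like, FORWARD, REVERSE))
    print("divergent variant:", in_silico_pcr(divergent_variant, FORWARD, REVERSE))
```

In this toy case, a single substitution in the forward primer site removes the predicted product even though the rest of the sequence is unchanged, which mirrors the argument that a missing PCR product does not by itself prove that the gene is absent.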
We concluded that the method for accurate determination of the presence of rtxA should be carefully assessed, taking into account the apparent limitations of either the PCR methods or the alignment-based computational methods and their respective reference databases.
The high prevalence, extensive genetic diversity, and the wide range of virulence genes, which have been detected in all Legionella isolates from residential buildings in Latvia, indicate that the virulence potential of environmental Legionella is high and that all Legionella strains persisting in the water supply systems should be considered as potentially pathogenic. Virulence gene analysis has shown that Legionella strains persisting in residential buildings can acquire characteristics and increase the pathogenicity through horizontal gene transfer. Proper risk management, implementation of water safety plans, and microbiological monitoring would ensure the protection of residents by reducing the opportunities for Legionella to grow and evolve new virulence traits in man-made water systems.