1. Introduction
Legionella pneumophila, the causative agent of Legionnaires’ disease, is a waterborne bacterial pathogen that poses a significant public health threat [1,2]. Despite its importance, much remains to be learned about its virulence factors and genotypic diversity in man-made drinking water distribution systems (DWDSs).
Legionella, especially L. pneumophila, has emerged as an important waterborne pathogen and is responsible for more drinking water-related disease outbreaks than any other pathogen in the United States. The persistent increase in legionellosis cases, coupled with diverse transmission sources, underscores the need for tools that enable the swift identification and characterization of these pathogens, including tracing transmission sources during outbreak investigations [
3]. Given the prevalence of
Legionella in public water supplies and the relatively low incidence of disease compared to exposure events, the development of high-resolution and time-efficient strain-typing methods is particularly critical.
Genotyping
L. pneumophila is most commonly performed using Sequence-Based Typing (SBT) to define clinical strains. Although widely accepted and standardized, SBT remains limited in its applicability to ecological investigations and routine clinical practices [
4]. An alternative typing method for
Legionella is multiple-locus variable-number tandem-repeat analysis (MLVA), which uses eight loci (MLVA-8) to discriminate between closely related strains at the subspecies level [
5,
6,
7]. Comparative studies have demonstrated that MLVA achieves comparable accuracy to SBT for genotype assignment, while providing enhanced resolution within the highly prevalent and pathogenic Sequence Type 1 (ST1) lineage [
8,
9]. This increased discriminatory power is particularly significant in the context of ecological investigations, as shown in previous studies [
10,
11,
12].
While numerous studies have examined
L. pneumophila genotypes in relation to virulence-associated traits, less attention has been directed toward genotype-specific characteristics that govern persistence, growth, and survival in engineered freshwater environments. Previous studies have revealed pronounced ecophysiological and virulence-associated differences among three MLVA-8 genotypes (GT4, GT6, and GT15) [
10,
11,
12]. Notably, GT15 isolates exhibited significantly elevated hemolytic activity compared to GT4 and GT6, suggesting enhanced virulence potential [
11]. This genotype-dependent phenotypic variation raises the question of whether these functional differences are similarly reflected at the proteomic level, providing a strong rationale for exploring high-resolution subtyping strategies based on Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS). The observed phenotypic heterogeneity indicates that
L. pneumophila genotypes differ not only genetically but also functionally with respect to ecological fitness and pathogenic potential, likely influencing niche colonization within DWDSs and modulating associated human health risks.
In the last decade, MALDI-TOF MS has been incorporated into the standard procedures of microbiology laboratories for microbial identification, partially replacing conventional biochemical and molecular methods [
3]. MALDI-TOF MS facilitates species-level identification of microbes by analyzing total protein profiles and serves as an innovative, user-friendly, rapid, precise, and economical technique [
13,
14].
While MALDI-TOF MS has transformed microbial identification, its use for bacterial typing is not yet fully established for
Legionella spp. [
15,
16,
17]. MALDI-TOF MS has been evaluated for typing numerous bacterial pathogens because it requires minimal sample preparation, yields data directly and swiftly, is automated, and is cost-effective. Regarding
Legionella, MALDI-TOF MS has been increasingly evaluated for the rapid differentiation of isolates, including recent validation studies using environmental strains [
16,
18]. Fujinami et al. showed that two groups, consisting of 23
L. pneumophila isolates, exhibited identical clustering patterns by MALDI-TOF MS compared to those analyzed using mass spectrometry or pulsed-field gel electrophoresis (PFGE) [
19]. Kyritsi et al. aimed to establish a rapid and accurate method for serotyping
L. pneumophila and detecting two key pathogenicity loci,
lvh and
rtxA, which are associated with its virulence. By analyzing 150 environmental isolates using MALDI-TOF MS and comparing the results with PCR tests, they identified specific ion peaks that could distinguish between serogroups and detect the presence of virulence genes. Their method showed high accuracy, with 87.1% for serogroup assignment and over 97% for
rtxA detection. For the
lvh locus, the accuracy was lower, with 84% of the assignments being correct. These findings suggest that MALDI-TOF MS is a promising tool for environmental monitoring and risk assessment of
Legionella contamination [
16].
Moreover, Schwake et al. demonstrated, using 28
Legionella strains isolated from DWDSs in the USA, that MALDI-TOF MS can reliably differentiate between multiple
Legionella species and further discriminate
L. pneumophila strains according to their environmental origins [
20]. More recently, Pascale et al. expanded these findings by showing that MALDI-TOF MS is suitable not only for diagnostic purposes but also for environmental surveillance of
Legionella spp., exhibiting strong concordance with conventional culture-based methods and
mip-gene sequencing [
21]. Extending the application of MALDI-TOF MS beyond species-level identification, Blanco et al. demonstrated that several
L. pneumophila sequence types (STs) can be accurately identified within minutes directly from colonies grown on BCYE agar, highlighting the potential of this approach for rapid strain-level typing [
22].
Building upon these efforts, the current study provides one of the most comprehensive evaluations of MALDI-TOF MS as a high-resolution tool for the subspecies classification of
L. pneumophila. By directly linking proteomic signatures to well-defined MLVA-8 genotypic groups in a large and extensively characterized strain collection, we identified genotype-associated discriminatory
m/
z features and assessed their performance across multiple analytical frameworks. This study advances the application of MALDI-TOF MS beyond species-level identification and limited strain differentiation, highlighting its potential to complement molecular typing methods in routine surveillance, outbreak investigations, and risk assessment of DWDSs, in line with recent advances integrating MALDI-TOF MS with high-resolution analytical and computational approaches [
23].
2. Materials and Methods
2.1. Bacterial Strains
This study used 70
L. pneumophila strains, including environmental and clinical isolates. The strains were originally isolated from drinking water systems in northern Israel and from sputum samples of pneumonia patients at the Rambam Health Care Campus between 2012 and 2013 [
24]. Subsequent genotypic classification into MLVA-8 groups was performed by Sharaby et al. A detailed description of the strains is provided in the Supplementary Materials (Table S1) [10,12].
2.2. Storage and Cultivation
All isolates were stored at −80 °C. For this study, the strains were cultured on buffered charcoal yeast extract (BCYE) agar plates (Hylabs, Rehovot, Israel). Incubation was carried out at 36 °C in a 5% CO2 atmosphere for 72 h, and all strains were cultured under identical conditions and harvested after the same incubation period to ensure comparable growth stages before MALDI-TOF MS analysis.
2.3. Sample Preparation and MALDI-TOF Mass Spectrometry Analysis
After incubation on BCYE agar plates, protein extraction was performed as follows. Fresh bacterial biomass from 3–5 well-isolated colonies per strain was collected using a sterile 1 µL inoculation loop, suspended in 300 µL of high-performance liquid chromatography (HPLC)-grade water in an Eppendorf tube, and vortexed thoroughly. Colonies were selected based on typical L. pneumophila morphology and visual uniformity. The number of colonies was adjusted to achieve a comparable biomass load, considering strain-specific differences in colony size and texture (e.g., mucoid or non-mucoid appearance). Then, 900 µL of absolute ethanol was added and mixed well. The suspension was centrifuged twice for 2 min at 13,000 rpm, and the supernatant was completely removed. The pellet was air-dried at room temperature for approximately 5 min. Subsequently, 25 µL of 70% formic acid was added and mixed by pipetting until the pellet was fully resuspended. Finally, 25 µL of acetonitrile was added, and the solution was mixed thoroughly and centrifuged for 2 min at 13,000 rpm. One microliter of the prepared extract was applied onto a 96-well stainless-steel target plate (Bruker Daltonics, Bremen, Germany) and allowed to air-dry at room temperature. After drying, each spot was overlaid with 1 µL of matrix solution containing 10 mg/mL α-cyano-4-hydroxycinnamic acid (HCCA; Bruker Daltonics, Bremen, Germany) in 50% acetonitrile with 2.5% trifluoroacetic acid.
Mass spectra were generated using a Microflex LT/SH™ mass spectrometer (Bruker Daltonics, Bremen, Germany) operating in the linear positive mode, covering a mass-to-charge (m/z) range of 2000–20,000 Da. Each spot on the target plate was irradiated with a pulsed nitrogen laser at a wavelength of 337 nm and a frequency of 60 Hz. The laser power was automatically optimized within the range of 40–60%. The spectra were acquired with dynamic termination disabled. A total of 240 laser shots were accumulated per spot and distributed over six raster positions (40 shots per position). External calibration was performed daily and prior to each analytical run using the Bruker Bacterial Test Standard (BTS; Bruker Daltonics, Bremen, Germany), covering the full mass range of interest (2000–20,000 Da). The BTS consists of an Escherichia coli DH5α extract supplemented with RNase A (13,683.2 Da) and myoglobin (16,952.3 Da), which served as reference peaks to ensure mass accuracy and calibration consistency.
2.4. Mass Spectrometry Analysis and Data Processing
For each isolate, nine raw mass spectra were acquired by analyzing three distinct spots on a 96-well target plate, each measured in triplicate. The spectra were subsequently processed using FlexAnalysis v4.0 software (Bruker Daltonics, Bremen, Germany), including baseline subtraction, smoothing, and quality control procedures, according to Bruker’s standard workflow. Each spectrum was visually inspected using zoom tools to identify artifacts. Spectra were excluded based on the following criteria: (i) flatline spectra, defined as spectra showing minimal or no discernible peaks above the baseline; (ii) outlier peaks, defined as abnormally sharp or isolated peaks not reproduced across replicates; and (iii) mass shift anomalies, where selected peaks within the 6000–7000 Da range exhibited a shift exceeding the 700 parts per million (ppm) tolerance. After quality control, at least six high-quality spectra per strain were retained and used to generate representative profiles of the strains. These processed spectra were then imported into Mass-Up, an open-access software platform for MALDI data handling and analysis that incorporates machine learning and statistical methods [25]. Main spectra were generated for each isolate by applying a 700 ppm tolerance to align the peaks across replicates and excluding peaks with a relative intensity below 0.01%. Following quality control of the peak lists, a Main Spectrum Profile (MSP) was constructed for each isolate.
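The ppm-based criteria described above can be illustrated with a short Python sketch. It is not the Mass-Up or FlexAnalysis implementation: the greedy alignment strategy and the `min_fraction` reproducibility threshold are assumptions made here for illustration, and the relative-intensity filter is omitted. Only the 700 ppm tolerance is taken from the text.

```python
from statistics import mean

PPM_TOLERANCE = 700.0  # tolerance stated in the text


def ppm_difference(observed_mz, reference_mz):
    """Absolute mass deviation of observed_mz from reference_mz, in ppm."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def consensus_peaks(replicates, tolerance_ppm=PPM_TOLERANCE, min_fraction=0.5):
    """Greedily align replicate peak lists into consensus peaks.

    replicates: list of peak lists, each a list of m/z values.
    A peak joins the current cluster when it lies within tolerance_ppm of
    the cluster's running mean. min_fraction is a hypothetical
    reproducibility threshold, not a value taken from the text.
    """
    tagged = sorted((mz, idx) for idx, peaks in enumerate(replicates) for mz in peaks)
    clusters = []
    for mz, idx in tagged:
        if clusters:
            centre = mean([m for m, _ in clusters[-1]])
            if ppm_difference(mz, centre) <= tolerance_ppm:
                clusters[-1].append((mz, idx))
                continue
        clusters.append([(mz, idx)])

    consensus = []
    for cluster in clusters:
        # Fraction of replicate spectra that contribute to this cluster.
        support = len({idx for _, idx in cluster}) / len(replicates)
        if support >= min_fraction:
            consensus.append(round(mean([m for m, _ in cluster]), 1))
    return consensus
```

At m/z 6500, a 700 ppm tolerance corresponds to roughly 4.55 Da, so `ppm_difference(6505.0, 6500.0)` (about 769 ppm) would exceed it.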
2.5. L. pneumophila Genotyping and Peak Biomarker Discovery
Following initial data processing, distinctive peak biomarkers were sought within the three MLVA-8 genotypic groups: GT4, GT6, and GT15. This step was critical for developing a MALDI-TOF MS-based genotyping method. Candidate peaks were statistically evaluated for their association with the MLVA-8 groups using Fisher’s exact test. Given the non-quantitative nature of MALDI-TOF MS, the analysis was performed on binarized data, indicating the presence or absence of each peak in the spectra. To account for multiple comparisons, the Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR), yielding adjusted q-values. Peaks with statistically significant p- and q-values were considered strong biomarker candidates. These peaks represent the spectral features with the highest discriminatory potential for distinguishing between the GT4, GT6, and GT15 MLVA-8 groups, potentially offering a rapid and accurate method for L. pneumophila genotyping by MALDI-TOF MS.
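The statistical screening described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in the study: it assumes a one-genotype-versus-rest 2 × 2 contingency table for each peak (the text does not specify the table layout), uses `scipy.stats.fisher_exact` for the test, and implements the Benjamini–Hochberg adjustment by hand with NumPy. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact


def peak_vs_group_pvalue(present, in_group):
    """Two-sided Fisher's exact test for one peak, one genotype vs. the rest.

    present:  1/0 (or bool) per isolate, peak detected or not.
    in_group: 1/0 (or bool) per isolate, isolate belongs to the genotype.
    The one-vs-rest layout is an assumption made for this sketch.
    """
    present = np.asarray(present, dtype=bool)
    in_group = np.asarray(in_group, dtype=bool)
    table = [
        [int(np.sum(present & in_group)), int(np.sum(~present & in_group))],
        [int(np.sum(present & ~in_group)), int(np.sum(~present & ~in_group))],
    ]
    _, p_value = fisher_exact(table, alternative="two-sided")
    return p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest rank downward.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q
```

In use, `peak_vs_group_pvalue` would be called once per candidate peak and genotype, and the resulting p-values passed together to `benjamini_hochberg`.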
In addition, principal component analysis (PCA) was performed using Mass-Up v1.0.14 software with the complete set of processed spectral peaks to examine the distribution of strains. The resulting plots were exported to Python 3.11 for graphical representation. A separate PCA was conducted in Python 3.11 based on the discriminatory peaks, allowing for improved resolution between the MLVA-8 genotypes. The same set of peaks was used for classification using the Random Forest (RF) and Decision Tree algorithms. Hierarchical clustering was performed on a binarized matrix (values > 0 assigned as 1, otherwise 0) representing the presence or absence of the selected discriminatory peaks. A dendrogram was constructed using the Hamming distance metric and complete linkage method.
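The clustering step described above (binarization, Hamming distance, complete linkage) can be sketched with SciPy. This is an illustrative example under stated assumptions rather than the study's code: the function name, the input layout (isolates in rows, discriminatory peaks in columns), and the use of `fcluster` to cut the tree into a fixed number of clusters are choices made here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def cluster_by_peak_presence(intensities, n_clusters):
    """Cluster isolates on presence/absence of discriminatory peaks.

    intensities: 2D array-like, rows = isolates, columns = peaks.
    Returns the binarized matrix, the linkage tree, and flat cluster labels.
    """
    # Values > 0 are assigned 1, otherwise 0, as described in the text.
    binary = (np.asarray(intensities, dtype=float) > 0).astype(int)
    # Hamming distance: fraction of peaks whose presence differs.
    distances = pdist(binary, metric="hamming")
    tree = linkage(distances, method="complete")
    # Cutting the tree into n_clusters groups is an illustrative extra step.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return binary, tree, labels
```

The returned `tree` can be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram.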
4. Discussion
L. pneumophila, an opportunistic waterborne pathogen widely found in both natural and man-made water systems, is the primary cause of Legionnaires’ disease and continues to be a significant concern for public health worldwide [
26,
27,
28]. Building upon previous research by Rodríguez-Martínez et al. and Sharaby et al., new insights into the identification and characterization of previously isolated
L. pneumophila genotypes were obtained in the current study using MALDI-TOF MS [
10,
11,
12,
24]. Similarly to recent studies demonstrating the value of optimized sample preparation and in-house spectral libraries for
Legionella identification and clustering, our approach builds upon well-characterized strain collections to enhance the discriminatory power at the subspecies level [
29].
A series of studies by Sharaby et al. provided valuable insights into the genotype-specific characteristics of the
L. pneumophila strains studied. Distinct temperature-dependent growth patterns among genotypes [
12], significant variations in cytotoxicity through their interactions with red blood cells, macrophages, and amoebae [
11], and genotype-specific antimicrobial resistance patterns among MLVA-8 genotypes have been demonstrated [
10]. These genotype-specific traits influence their ability to colonize different ecological niches within drinking-water distribution systems and suggest varying levels of pathogenicity. The observed variations in virulence-associated characteristics, particularly in cytotoxicity and growth patterns, indicate that different genotypes may pose distinct risks to human health.
The current study yielded several significant findings that advance our understanding of
L. pneumophila characterization through MALDI-TOF MS analyses. These analyses revealed a consistent protein profile across all 70
L. pneumophila isolates, with 18 peaks consistently present among all
L. pneumophila strains. The observed uniformity in peak distribution suggests a conserved core proteome among different
L. pneumophila genotypes, whereas the variation in maximum mass values indicates genotype-specific protein expression. Interestingly, while Fujinami et al. identified a characteristic peak at
m/
z 7180 in their
L. pneumophila strains, our analysis revealed a similar but slightly different peak at
m/
z 7214. This variation aligns with previous observations that MALDI-TOF MS spectral patterns may reflect differences associated with the isolate origin, suggesting that the ecological or geographical context could contribute to proteomic variability [
17,
19]. However, because all strains analyzed in the present study originated from a single geographic region, this interpretation should be treated with caution. The slight difference in peak mass between our study and that of Fujinami et al. may partly reflect differences in the broader geographic or environmental background of the analyzed strain collections, and validation using strain collections from additional geographic regions will be required to assess broader applicability.
The identification of consistent spectral patterns, particularly conserved peaks, provides a robust foundation for MALDI-TOF MS-based identification of
L. pneumophila strains, specifically for Middle Eastern isolates. This geographical specificity of protein profiles suggests the importance of developing region-specific MALDI-TOF MS databases for accurate
L. pneumophila strain identification and characterization [
29].
MALDI-TOF MS as a Potential Tool for the Identification of Highly Pathogenic L. pneumophila Strains
Current
Legionella risk assessment models and public health protocols typically treat all
L. pneumophila strains as equivalent in terms of their health risks [
30,
31]. However, the distinct ecophysiological and virulent profiles among genotypes demonstrate the need for genotype-specific approaches in public health management and risk assessment. Understanding these genotype-specific characteristics is crucial for accurately evaluating the public health risks posed by
Legionella in DWDSs. Recent perspectives emphasize that integrating high-resolution analytical tools, including proteomic profiling approaches, is essential for advancing genotype-informed risk assessment frameworks [
23].
The Department of Environmental Health in Israel, like many public and environmental health agencies worldwide, routinely conducts environmental surveillance for
Legionella [
32,
33,
34]. Currently, strain characterization and genotyping rely on molecular techniques such as SBT or whole-genome sequencing, which are time-consuming, labor-intensive, and require specialized expertise and instrumentation [
32,
34]. Owing to these limitations, most public health agencies focus primarily on quantifying
Legionella spp., potentially overlooking crucial genotypic variations that influence pathogenicity [
10,
11].
MALDI-TOF MS, already widely implemented in clinical laboratories, presents a promising solution for rapid and reliable
L. pneumophila genotyping [
15,
18,
20,
21]. This approach could enable direct genotypic identification in both clinical and environmental laboratories without extensive molecular workflows, significantly reducing costs and processing times. The implementation of MALDI-TOF MS-based genotyping would enhance public health responses in several ways, including enabling the rapid identification of highly virulent strains, facilitating faster outbreak responses, improving source tracking, and allowing more efficient implementation of control measures [
21]. Moreover, during routine water monitoring, agencies could quickly identify and prioritize remediation of sites containing potentially hazardous strains, thereby reducing legionellosis outbreak risks.
Our detailed analyses of 70 isolates representing three MLVA-8 genotypes (GT4, GT6, and GT15) demonstrated promising capabilities in discriminating between
L. pneumophila genotypes, achieving over 85% accuracy in strain classification using the Random Forest (RF) algorithm. Model performance was evaluated using a single 80/20 train–test split, designed to assess the discriminatory potential of MALDI-TOF MS-derived features rather than to establish a fully optimized predictive framework; the reported classification accuracy should therefore be interpreted as a proof-of-concept result. While the results showed some limitations in fully differentiating GT4 from GT6 isolates (
Table 2), this outcome is consistent with their remarkably high genomic similarity, as confirmed by whole-genome sequencing data (unpublished results). Average Nucleotide Identity (ANI) and Genome-to-Genome Distance Calculator (GGDC) analyses revealed >99% similarity among GT4 and GT6 strains, with both groups also classified as ST-1 by SBT, underscoring their near-identical genomic background. Beyond genomic similarity, the limited separation observed between GT4 and GT6 may also reflect the specific protein mass range examined in this study. It is possible that subtle genotype-associated differences, if present, occur outside the 2000–20,000 Da mass range and are therefore not captured by MALDI-TOF MS profiles. Consistent with these factors and the identical cultivation conditions applied, the differences in protein expression between GT4 and GT6 within the examined mass range appeared to be minimal. In contrast, the GT15 strains formed a distinct cluster, exhibiting only ~97% ANI similarity to GT4/GT6 and markedly lower GGDC values (~78–81%), reflecting their greater genomic divergence. These genomic findings provide a strong rationale for the proteomic clustering observed: GT15 isolates were resolved by MALDI-TOF MS, whereas GT4 and GT6 displayed overlapping proteomic profiles in line with their near-clonal genomic relationship. Notably, despite being represented by a smaller number of isolates, GT15 strains exhibited complete consistency in several genotype-specific peaks across all samples, supporting the robustness of the observed proteomic signatures in this group. Taken together, our results highlight MALDI-TOF MS as an efficient tool for
L. pneumophila subtyping, while also emphasizing the inherent challenge of separating genotypes that are virtually indistinguishable at the genomic level.
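The single 80/20 evaluation described above can be sketched with scikit-learn. This is an illustration under several assumptions, not the study's model. Stratification by genotype, the number of trees, and the random seed are choices made here. The data are synthetic: the group sizes (30/30/10) are hypothetical and only mimic the smaller GT15 group, and the peak matrix is random apart from three columns set to behave like GT15-specific peaks.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def single_split_accuracy(features, labels, seed=0):
    """Fit a Random Forest on 80% of isolates and score it on the other 20%."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(x_train, y_train)
    return accuracy_score(y_test, model.predict(x_test)), len(y_test)


# Synthetic presence/absence matrix: 70 isolates x 6 peaks (hypothetical data).
rng = np.random.default_rng(0)
labels = np.array(["GT4"] * 30 + ["GT6"] * 30 + ["GT15"] * 10)
features = rng.integers(0, 2, size=(70, 6))
features[labels == "GT15", :3] = 1  # GT15-like peaks always present
features[labels != "GT15", :3] = 0  # and absent in the other genotypes

accuracy, n_test = single_split_accuracy(features, labels)
```

With 70 isolates, a 20% hold-out leaves only 14 test spectra, so a single misclassification shifts the accuracy by about 7 percentage points. This is one reason to read a single-split estimate as indicative only; repeated or cross-validated splits would give a more stable figure.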
Supporting these findings, both PCA and hierarchical cluster analysis provided complementary evidence for the discriminatory power of MALDI-TOF MS. The PCA revealed distinct clustering patterns, with GT15 forming a well-defined cluster separate from GT4 and GT6, which showed partial overlap, consistent with their high genomic similarity. Hierarchical cluster analysis, visualized through a dendrogram, further strengthened these observations, showing GT15 as a distinct, cohesive cluster, whereas GT4 and GT6 exhibited minor genotype-specific sub-clustering patterns. These results are consistent with those of previous studies demonstrating the ability of MALDI-TOF MS to resolve
L. pneumophila sequence types and strain-level variations using current classification approaches [
22].
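For readers unfamiliar with the PCA step referred to above, a minimal NumPy sketch is given below. It is not the Mass-Up implementation or the study's plotting code: it computes PCA scores by singular value decomposition of a mean-centered isolate-by-peak matrix, and the function name and the choice not to scale columns are assumptions made here.

```python
import numpy as np


def pca_scores(matrix, n_components=2):
    """Project isolates onto the first principal components.

    matrix: 2D array-like, rows = isolates, columns = peaks.
    Returns (scores, explained_variance_ratio) for n_components.
    """
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)  # column-wise mean centering, no scaling
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

Plotting the first two score columns, colored by MLVA-8 genotype, would give the kind of scatter plot in which separated and overlapping groups can be inspected.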
These findings support MALDI-TOF MS as a promising tool for
L. pneumophila identification and subtyping, offering advantages such as rapid analysis and minimal sample preparation requirements. Our results complement previous studies by Fujinami et al. and Kyritsi et al., which established the technique’s utility for strain-level differentiation and virulence factor detection [
16,
19]. Notably, while Kyritsi et al. reported specific peak patterns associated with different serogroups (peaks at m/z 3227.998 and 4756.435 for serogroup 1 and additional peaks at m/z 7439.687, 9181.862, and 10,688.00 for serogroup 3), our findings revealed different serogroup-specific patterns. In our study, GT15 strains (serogroup 3) were distinguished by peaks at m/z 10,713, 7510, and 5358, which were present in 100% of GT15 strains but largely absent in GT4 and GT6 strains. Conversely, the GT4 and GT6 strains (serogroup 1) consistently showed peaks at m/z 5353, 7450, and 10,643. These differences in peak patterns between studies suggest that strain-specific protein profiles may be influenced by multiple factors beyond serogroup classification.
The variations in virulence and ecological adaptability among
L. pneumophila genotypes, previously documented by Sharaby et al., underscore the importance of high-throughput methods that can link genotypic diversity with phenotypic characteristics [
11]. Notably, our MALDI-TOF MS analyses revealed distinct protein profiles in GT15 strains compared to GT4 and GT6 (
Table 1). These protein variations could potentially explain the significantly higher hemolytic capacity observed in GT15 strains, which was previously reported by Sharaby et al. to be approximately 2.5-fold higher than that of other genotypes. Hemolysis is a well-recognized virulence-associated mechanism that enhances bacterial survival through iron acquisition, and the pronounced hemolytic capacity of GT15 may therefore serve as an important phenotypic indicator of its elevated virulence potential. The proteomic distinction observed in GT15 in this study is consistent with its previously described physiological and virulence-related characteristics, reinforcing the link between genotype-specific protein expression and functional pathogenic traits. The identification of these differentially expressed proteins opens new avenues for understanding the virulence mechanisms of
L. pneumophila. However, further research is required to identify these proteins and elucidate their roles in pathogenicity. Such investigations could provide crucial insights into virulence variation among
L. pneumophila strains and could lead to new, faster genotyping capabilities.
The integration of MALDI-TOF MS into current surveillance protocols offers promising opportunities to deepen our understanding of
L. pneumophila ecology and improve public health responses to potential outbreaks. This is particularly relevant considering recent advances that integrate MALDI-TOF MS with high-resolution analytical and computational strategies, thereby expanding its utility beyond conventional identification workflows [
23].
5. Conclusions
These findings underscore the potential of MALDI-TOF MS not only as a rapid and reliable tool for L. pneumophila genotyping but also as a powerful approach for uncovering proteomic variations that underpin genotype-specific characteristics, enhancing our understanding of the pathogen’s ecology and informing targeted public health interventions. Importantly, MALDI-TOF MS enables genotype-level assessment within hours of culture, in contrast to conventional molecular typing approaches, such as SBT or MLVA, which typically require several days to complete. This rapid turnaround, combined with the low per-sample cost and high throughput, highlights the practical advantages of MALDI-TOF MS for routine environmental surveillance and outbreak response.
Future research should expand genotypic identification by targeting additional MLVA-8 genotypes using MALDI-TOF MS, thereby broadening the applicability of this technique across a diverse array of L. pneumophila strains. Additionally, isolating and characterizing proteins unique to specific genotypes could provide deeper insights into the proteomic foundations of genotypic variation, further enhancing subtyping capabilities and enabling a more nuanced understanding of genotype-specific traits.
A key limitation of the present study is the incomplete discrimination between the closely related GT4 and GT6 genotypes using MALDI-TOF MS. This limitation is consistent with their exceptionally high genomic similarity (ANI > 99%) and shared ST-1 background, indicating a near-clonal relationship that inherently constrains the level of proteomic divergence detectable using this approach. Future studies may address this limitation by integrating complementary analytical strategies, such as alternative sample preparation protocols, targeted analysis of low-abundance proteins, or a combination of MALDI-TOF MS with molecular or genomic methods. In this context, the findings suggest that improving resolution among closely related genotypes may require consideration of additional proteomic features beyond those captured within the mass range examined in the present study. Furthermore, the uneven distribution of strains across genotypes, particularly the smaller number of GT15 isolates, should be considered when interpreting the results and highlights the importance of validation using larger and more balanced strain collections in future studies. Taken together, these considerations provide a practical framework for advancing MALDI-TOF MS-based subtyping from proof-of-concept toward broader applications in environmental surveillance and public health contexts.