Water
  • Article
  • Open Access

21 January 2026

Utilizing MALDI-TOF MS for Legionella pneumophila Subspecies Typing and Classification

1
Department of Evolutionary and Environmental Biology, University of Haifa, Haifa 3498838, Israel
2
Clinical Microbiology Laboratory, Ziv Medical Center, Safed 1311502, Israel
3
Infectious Diseases Unit, Ziv Medical Center, Safed 1311502, Israel
4
Department of Biology and Environment, University of Haifa, Oranim, Tivon 3600600, Israel
This article belongs to the Section Water and One Health

Abstract

Legionella pneumophila (L. pneumophila), the primary causative agent of Legionnaires’ disease, is a waterborne bacterial pathogen that poses a significant public health concern. This opportunistic pathogen commonly inhabits both natural and man-made water systems, particularly drinking water distribution systems (DWDSs), where it can proliferate and pose a risk to human health. In this study, we evaluated the potential of Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) for rapid and accurate subtyping of L. pneumophila. Our analysis included 70 L. pneumophila strains collected from the Middle East, representing one of the largest and most comprehensive MALDI-TOF MS-based subtyping efforts for strains from this geographically underrepresented region. These strains, representing three Multi-Locus Variable Number Tandem Repeat Analysis (MLVA-8) genotypic groups (GT4, GT6, and GT15), have been extensively characterized in previous studies for their virulence traits, cytotoxicity patterns, and antimicrobial susceptibility profiles. Our findings revealed distinct genotype-associated spectral signatures with 30 discriminatory m/z peaks (p ≤ 0.005). These markers enabled genotype-level classification, achieving over 85% classification accuracy with a Random Forest model and over 71% accuracy using a Decision Tree algorithm. Importantly, the m/z peak at 5358 was uniquely present in the GT15 strains, whereas m/z 5353 was consistently detected in both GT4 and GT6 isolates, demonstrating the potential of specific mass peaks to serve as reliable genotype markers. Furthermore, GT15 strains consistently formed a separate cluster in both Principal Component Analysis (PCA) and hierarchical clustering analyses, whereas GT4 and GT6 exhibited partial overlap, reflecting their exceptionally high genomic similarity.

1. Introduction

Legionella pneumophila, the causative agent of Legionnaires’ disease, is a waterborne bacterial pathogen that poses a significant public health threat [1,2]. Despite its importance, there is still much to uncover regarding its virulence factors and genotypic diversity in man-made drinking water distribution systems (DWDSs). Legionella, especially Legionella pneumophila (L. pneumophila), has emerged as an important waterborne pathogen, responsible for more drinking water-related disease outbreaks than any other pathogen in the United States. The persistent increase in legionellosis cases, coupled with diverse transmission sources, underscores the necessity for tools that facilitate the swift identification and characterization of these pathogens for public health needs, including tracing transmission sources during outbreak investigations [3]. Given the prevalence of Legionella in public water supplies and the relatively low incidence of disease compared to exposure events, the development of high-resolution and time-efficient strain-typing is particularly critical.
Genotyping L. pneumophila is most commonly performed using Sequence-Based Typing (SBT) to define clinical strains. Although widely accepted and standardized, SBT remains limited in its applicability to ecological investigations and routine clinical practices [4]. An alternative typing method for Legionella is multi-locus variable number of tandem repeats analysis (MLVA), which utilizes eight loci to discriminate between closely related strains at the subspecies level [5,6,7]. Comparative studies have demonstrated that MLVA achieves comparable accuracy to SBT for genotype assignment, while providing enhanced resolution within the highly prevalent and pathogenic Sequence Type 1 (ST1) lineage [8,9]. This increased discriminatory power is particularly significant in the context of ecological investigations, as shown in previous studies [10,11,12].
While numerous studies have examined L. pneumophila genotypes in relation to virulence-associated traits, less attention has been directed toward genotype-specific characteristics that govern persistence, growth, and survival in engineered freshwater environments. Previous studies have revealed pronounced ecophysiological and virulence-associated differences among three MLVA-8 genotypes (GT4, GT6, and GT15) [10,11,12]. Notably, GT15 isolates exhibited significantly elevated hemolytic activity compared to GT4 and GT6, suggesting enhanced virulence potential [11]. This genotype-dependent phenotypic variation raises the question of whether these functional differences are similarly reflected at the proteomic level, providing a strong rationale for exploring high-resolution subtyping strategies based on Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS). The observed phenotypic heterogeneity indicates that L. pneumophila genotypes differ not only genetically but also functionally with respect to ecological fitness and pathogenic potential, likely influencing niche colonization within drinking water distribution systems (DWDSs) and modulating associated human health risks.
In the last decade, MALDI-TOF MS has been incorporated into the standard procedures of microbiology laboratories for microbial identification, partially replacing conventional biochemical and molecular methods [3]. MALDI-TOF MS facilitates species-level identification of microbes by analyzing total protein profiles and serves as an innovative, user-friendly, rapid, precise, and economical technique [13,14].
While MALDI-TOF MS has transformed microbial identification, its use for bacterial typing is not yet fully established for Legionella spp. [15,16,17]. MALDI-TOF MS has been evaluated for the typing of numerous bacterial pathogens because it requires minimal sample preparation, analyzes data directly and swiftly, is automated, and is cost-effective. Regarding Legionella, MALDI-TOF MS has been increasingly evaluated for the rapid differentiation of isolates, including recent validation studies using environmental strains [16,18]. Fujinami et al. showed that two groups, consisting of 23 L. pneumophila isolates, exhibited identical clustering patterns by MALDI-TOF MS compared to those analyzed using mass spectrometry or pulsed-field gel electrophoresis (PFGE) [19]. Kyritsi et al. aimed to establish a rapid and accurate method for serotyping L. pneumophila and detecting two key pathogenicity loci, lvh and rtxA, which are associated with its virulence. By analyzing 150 environmental isolates using MALDI-TOF MS and comparing the results with PCR tests, they identified specific ion peaks that could distinguish between serogroups and detect the presence of virulence genes. Their method reached 87.1% accuracy for serogroup assignment and over 97% for rtxA detection. For the lvh locus, the accuracy was lower, with 84% of the assignments being correct. These findings suggest that MALDI-TOF MS is a promising tool for environmental monitoring and risk assessment of Legionella contamination [16].
Moreover, Schwake et al. demonstrated, using 28 Legionella strains isolated from DWDSs in the USA, that MALDI-TOF MS can reliably differentiate between multiple Legionella species and further discriminate L. pneumophila strains according to their environmental origins [20]. More recently, Pascale et al. expanded these findings by showing that MALDI-TOF MS is suitable not only for diagnostic purposes but also for environmental surveillance of Legionella spp., exhibiting strong concordance with conventional culture-based methods and mip-gene sequencing [21]. Extending the application of MALDI-TOF MS beyond species-level identification, Blanco et al. demonstrated that several L. pneumophila sequence types (STs) can be accurately identified within minutes directly from colonies grown on BCYE agar, highlighting the potential of this approach for rapid strain-level typing [22].
Building upon these efforts, the current study provides one of the most comprehensive evaluations of MALDI-TOF MS as a high-resolution tool for the subspecies classification of L. pneumophila. By directly linking proteomic signatures to well-defined MLVA-8 genotypic groups in a large and extensively characterized strain collection, we identified genotype-associated discriminatory m/z features and assessed their performance across multiple analytical frameworks. This study advances the application of MALDI-TOF MS beyond species-level identification and limited strain differentiation, highlighting its potential to complement molecular typing methods in routine surveillance, outbreak investigations, and risk assessment of DWDSs, in line with recent advances integrating MALDI-TOF MS with high-resolution analytical and computational approaches [23].

2. Materials and Methods

2.1. Bacterial Strains

This study used 70 L. pneumophila strains, including environmental and clinical isolates. The strains were originally isolated from drinking water systems in northern Israel and from sputum samples of pneumonia patients at the Rambam Health Care Campus between 2012 and 2013 [24]. Subsequent genotypic classification into MLVA-8 groups was performed by Sharaby et al. A detailed description of the strains is provided in the Supplementary Materials (Table S1) [10,12].

2.2. Storage and Cultivation

All isolates were stored at −80 °C. For this study, the strains were cultured on buffered charcoal yeast extract (BCYE) agar plates (Hylabs, Rehovot, Israel). Incubation was carried out at 36 °C in a 5% CO2 atmosphere for 72 h, and all strains were cultured under identical conditions and harvested after the same incubation period to ensure comparable growth stages before MALDI-TOF MS analysis.

2.3. Sample Preparation and MALDI-TOF Mass Spectrometry Analysis

After incubation on BCYE agar plates, protein extraction was performed as follows: fresh bacterial biomass from 3–5 well-isolated colonies per strain was collected using a sterile 1 µL inoculation loop and suspended in 300 µL of high-performance liquid chromatography (HPLC)-grade water in an Eppendorf tube, followed by thorough vortexing. Colonies were selected based on the typical L. pneumophila morphology and visual uniformity. The number of colonies was adjusted to achieve a comparable biomass load, considering strain-specific differences in colony size and texture (e.g., mucoid or non-mucoid appearance). Then, 900 µL of absolute ethanol was added and mixed well. The supernatant was completely removed by two consecutive centrifugation steps for 2 min at 13,000 rpm. The pellet was air-dried at room temperature for approximately 5 min. Subsequently, 25 µL of 70% formic acid was added and mixed by pipetting until the pellet was well resuspended. Finally, 25 µL of acetonitrile was added, and the solution was mixed thoroughly and centrifuged for 2 min at 13,000 rpm. A 1 µL aliquot of the prepared extract was applied onto a 96-well stainless-steel target plate (Bruker Daltonics, Bremen, Germany) and allowed to air-dry at room temperature. After drying, each spot was overlaid with 1 µL of a matrix solution containing 10 mg/mL α-cyano-4-hydroxycinnamic acid (HCCA; Bruker Daltonics, Bremen, Germany) in 50% acetonitrile with 2.5% trifluoroacetic acid.
Mass spectra were generated using a Microflex LT/SH™ mass spectrometer (Bruker Daltonics, Bremen, Germany) operating in the linear positive mode, covering a mass-to-charge (m/z) range of 2000–20,000 Da. Each spot on the target plate was irradiated with a pulsed nitrogen laser at a wavelength of 337 nm and a frequency of 60 Hz. The laser power was automatically optimized within the range of 40–60%. The spectra were acquired with dynamic termination disabled. A total of 240 laser shots were accumulated per spot and distributed over six raster positions (40 shots per position). External calibration was performed daily and prior to each analytical run using the Bruker Bacterial Test Standard (BTS; Bruker Daltonics, Bremen, Germany), covering the full mass range of interest (2000–20,000 Da). The BTS consists of an Escherichia coli DH5α extract supplemented with RNase A (13,683.2 Da) and myoglobin (16,952.3 Da), which served as reference peaks to ensure mass accuracy and calibration consistency.

2.4. Mass Spectrometry Analysis and Data Processing

For each isolate, nine raw mass spectra were acquired by analyzing three distinct spots on a 96-well target plate, each measured in triplicate. The spectra were subsequently processed using the FlexAnalysis v4.0 software (Bruker Daltonics, Bremen, Germany), including baseline subtraction, smoothing, and quality control procedures, according to Bruker’s standard workflow. Each spectrum was visually inspected using zoom tools to identify the artifacts. Spectra were excluded based on the following criteria: (i) flatline spectra, defined as spectra showing minimal or no discernible peaks above the baseline; (ii) outlier peaks, defined as abnormally sharp or isolated peaks not reproduced across replicates; (iii) mass shift anomalies, where selected peaks within the 6000–7000 Da range exhibited a shift exceeding 700 parts per million (ppm) tolerance. After quality control, at least six high-quality spectra per strain were retained and used to generate representative profiles of the strains. These processed spectra were then introduced into Mass-Up, an open-access software platform designed for MALDI data manipulation and examination, which incorporates machine learning and statistical methodologies [25]. Main spectra were generated for each isolate by applying a 700 ppm tolerance to align the peaks across replicates and excluding peaks with a relative intensity below 0.01%. Following quality control of the peak lists, a Main Spectrum Profile (MSP) was constructed for each isolate.
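To make the 700 ppm criterion concrete, the following minimal Python sketch converts a ppm tolerance into an absolute mass window and checks whether two m/z values would be treated as the same peak. The function names are hypothetical, and measuring the difference relative to the mean of the two masses is an assumption; Mass-Up may define the tolerance differently.

```python
def tolerance_window_da(mz: float, tolerance_ppm: float = 700.0) -> float:
    """Absolute half-width (in Da) of a ppm tolerance at a given m/z."""
    return mz * tolerance_ppm / 1e6


def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Difference between two m/z values in ppm, relative to their mean (assumed convention)."""
    mean_mz = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) / mean_mz * 1e6


def same_peak(mz_a: float, mz_b: float, tolerance_ppm: float = 700.0) -> bool:
    """True if the two m/z values fall within the ppm tolerance of each other."""
    return ppm_difference(mz_a, mz_b) <= tolerance_ppm


# At m/z 6500 (middle of the 6000-7000 Da QC window), 700 ppm is 4.55 Da.
window_at_6500 = tolerance_window_da(6500.0)

# m/z 5353 and m/z 5358 differ by about 934 ppm, so they exceed the
# 700 ppm tolerance and are kept as two separate peaks.
separate_markers = not same_peak(5353.0, 5358.0)
```

Under this convention, the two genotype markers reported later (m/z 5353 and m/z 5358) lie just outside the alignment tolerance, which is consistent with their being resolved as distinct peaks.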

2.5. L. pneumophila Genotyping and Peak Biomarker Discovery

Following initial data processing, we sought to identify distinctive peak biomarkers within the three MLVA-8 genotypic groups: GT4, GT6, and GT15. This step was critical for developing a MALDI-TOF MS-based genotyping method. Candidate peaks were statistically evaluated for their association with the MLVA-8 groups using Fisher’s exact test. Given the non-quantitative nature of MALDI-TOF MS, the analysis was performed on binarized data, indicating the presence or absence of each peak in the spectra. To account for multiple comparisons, the Benjamini–Hochberg procedure was applied to control the false discovery rate (FDR), yielding adjusted q-values. Peaks with statistically significant p- and q-values were considered strong biomarker candidates. These peaks represent the spectral features with the highest discriminatory potential for distinguishing between the GT4, GT6, and GT15 MLVA-8 groups, potentially offering a rapid and accurate method for L. pneumophila genotyping by MALDI-TOF MS.
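The statistical step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Mass-Up implementation: `fisher_exact_2x3` is a hypothetical helper that computes an exact (Freeman-Halton) p-value for a 2 × 3 presence/absence table by enumerating all tables with the same margins, and `benjamini_hochberg` applies the standard step-up adjustment. The group sizes (36, 28, 6) come from this study; the presence counts are idealized examples.

```python
from math import comb


def fisher_exact_2x3(present: list[int], totals: list[int]) -> float:
    """Exact p-value for peak presence/absence across three groups.

    present[i] is the number of isolates in group i showing the peak;
    totals[i] is the size of group i.
    """
    n1, n2, n3 = totals
    n_present = sum(present)
    denom = comb(sum(totals), n_present)

    def prob(a: int, b: int, c: int) -> float:
        # Multivariate hypergeometric probability of one table.
        return comb(n1, a) * comb(n2, b) * comb(n3, c) / denom

    p_obs = prob(*present)
    p_value = 0.0
    for a in range(min(n1, n_present) + 1):
        for b in range(min(n2, n_present - a) + 1):
            c = n_present - a - b
            if c <= n3:
                p = prob(a, b, c)
                # Sum every table at least as unlikely as the observed one.
                if p <= p_obs * (1 + 1e-9):
                    p_value += p
    return min(p_value, 1.0)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


group_sizes = [36, 28, 6]  # GT4, GT6, GT15

# Idealized peak seen in every GT15 isolate and in no GT4/GT6 isolate.
p_gt15_only = fisher_exact_2x3([0, 0, 6], group_sizes)

# A peak present in every isolate carries no genotype information (p = 1).
p_core = fisher_exact_2x3([36, 28, 6], group_sizes)

q_values = benjamini_hochberg([p_gt15_only, p_core])
```

In practice, each candidate peak would contribute one p-value, and the full list would be adjusted together before applying the significance thresholds.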
In addition, PCA was performed using Mass-Up v1.0.14 software with the complete set of processed spectral peaks to examine the distribution of strains. The resulting plots were exported to Python 3.11 for graphical representation. A separate PCA was conducted in Python 3.11 based on the discriminatory peaks, allowing for improved resolution between the MLVA-8 genotypes. The same set of peaks was used for classification using the Random Forest and Decision Tree algorithms. Hierarchical clustering was performed on a binarized matrix (values > 0 assigned as 1, otherwise 0) representing the presence or absence of the selected discriminatory peaks. A dendrogram was constructed using the Hamming distance metric and complete linkage method.
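The clustering procedure can be illustrated with a small, dependency-free Python sketch: intensities are binarized (values > 0 become 1), pairwise Hamming distances are computed, and clusters are merged by complete linkage. The function names and the toy intensity matrix are hypothetical, and the naive merge loop is only meant to show the logic; an established routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="complete"` and `metric="hamming"` would normally be used for real data.

```python
def binarize(intensities: list[list[float]]) -> list[list[int]]:
    """Presence/absence matrix: 1 where intensity > 0, otherwise 0."""
    return [[1 if value > 0 else 0 for value in row] for row in intensities]


def hamming(u: list[int], v: list[int]) -> float:
    """Fraction of peak positions at which two binary profiles differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def complete_linkage(profiles: list[list[int]]) -> list[tuple[list[int], list[int], float]]:
    """Naive agglomerative clustering; returns (cluster_a, cluster_b, distance) per merge."""
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members.
                d = max(
                    hamming(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy data (not from the study): 4 isolates x 4 discriminatory peaks.
toy_intensities = [
    [0.9, 0.0, 0.5, 0.0],
    [0.7, 0.0, 0.4, 0.1],
    [0.0, 0.8, 0.0, 0.3],
    [0.0, 0.6, 0.0, 0.2],
]
merges = complete_linkage(binarize(toy_intensities))
```

In this toy example, isolates 2 and 3 share an identical presence/absence profile and merge first at distance 0, mirroring how isolates with the same discriminatory peaks fall into one tight cluster in a dendrogram.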

3. Results

3.1. MALDI-TOF MS Data Analysis

Analysis of the three MLVA-8 genotypic groups, GT4 (n = 36), GT6 (n = 28), and GT15 (n = 6), showed a consistent peak profile, with an average of 63.8 ± 1.5 peaks detected per isolate. The lower quartile ranged from 49.3 to 56.0 peaks, and the upper quartile and outliers revealed spectra with up to 114.8 peaks. Regarding the mass values, the minimum masses ranged from m/z 2000 to 2659, with an average of m/z 2135.5 ± 9.2. In contrast, the maximum mass values exhibited more variability, ranging from m/z 10,642 to 20,598, with an average of m/z 12,630.5 ± 269.6.

3.2. Within-Genotype Conservation and Biomarker Discovery of MLVA-8 Genotype-Specific Peaks

Biomarker discovery analysis within the three MLVA-8 groups yielded significant findings. Eighteen consistent peaks were identified across all groups. An additional peak at m/z 4312 was observed across all three MLVA-8 groups, except for the O171 (GT6) strain (Table S2).
The GT4 group comprised 36 strains, including 32 environmental and 4 clinical isolates. The analysis revealed that 21 peaks were consistent in all GT4 strains. In addition, 25 peaks were identified in at least 50% of the GT4 strains. Three peaks (m/z 2674, m/z 7450, and m/z 10,640) were detected in 35 out of 36 GT4 strains. Two of these three peaks were absent in the clinical strain, Cl9. Notably, no peaks were exclusively observed in either the clinical or environmental strains (Figure 1).
Figure 1. MALDI-TOF MS spectra of nine L. pneumophila strains representing the three MLVA-8 genotypes: GT4, GT6, and GT15. The spectrum of each strain is displayed as a normalized bar chart across the m/z range of 2000–20,000. The vertical axis represents the normalized signal intensity (relative intensity, scaled from 0 to 1). Genotype-specific discriminatory peaks are annotated with dashed lines and labeled above the corresponding bars using genotype-specific colors (green for GT4, blue for GT6, and orange for GT15). These peaks highlight distinct spectral patterns that differentiate GT15 from GT4 and GT6, including peaks at m/z 5358, m/z 7510, and m/z 10,713, which are unique to GT15, and peaks at m/z 5353, m/z 7450, and m/z 10,643, which are shared by GT4 and GT6 but absent in GT15.
The GT6 group comprised 28 strains, consisting of 26 environmental and 2 clinical isolates. The analysis revealed that 21 peaks were consistent across all the GT6 strains. Additionally, 22 peaks were identified in at least 50% of the GT6 strains, with two peaks identified in 27 of 28 strains (m/z 2674 and m/z 4312). As with GT4, no peaks were exclusive to the clinical or environmental strains.
The GT15 group comprised six environmental strains. The analysis revealed that 21 peaks were consistent across all the GT15 strains. An additional 35 peaks were identified in at least 50% of the GT15 strains, of which 17 were detected in five of six strains.
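The within-genotype summaries above (peaks present in all strains versus peaks present in at least 50% of strains) amount to a frequency count over aligned peak lists. A minimal sketch follows, assuming the peaks have already been aligned to common integer m/z labels; the function names and the toy peak sets are hypothetical and do not reproduce the study data.

```python
def peak_frequencies(peak_sets: list[set[int]]) -> dict[int, float]:
    """Fraction of isolates in which each aligned peak occurs."""
    counts: dict[int, int] = {}
    for peaks in peak_sets:
        for mz in peaks:
            counts[mz] = counts.get(mz, 0) + 1
    n = len(peak_sets)
    return {mz: count / n for mz, count in counts.items()}


def split_by_conservation(
    peak_sets: list[set[int]], threshold: float = 0.5
) -> tuple[list[int], list[int]]:
    """Return (peaks in every isolate, peaks in >= threshold but not all isolates)."""
    freqs = peak_frequencies(peak_sets)
    core = sorted(mz for mz, f in freqs.items() if f == 1.0)
    frequent = sorted(mz for mz, f in freqs.items() if threshold <= f < 1.0)
    return core, frequent


# Toy group of four isolates with arbitrary m/z labels.
toy_group = [
    {3000, 4000, 5000},
    {3000, 4000, 5000},
    {3000, 5000, 6000},
    {3000, 5000},
]
core_peaks, frequent_peaks = split_by_conservation(toy_group)
```

Here m/z 3000 and 5000 occur in every toy isolate, m/z 4000 occurs in half of them, and m/z 6000 falls below the 50% threshold and is excluded from both lists.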
Intergroup analysis revealed distinct spectral signatures across the MLVA-8 genotypes. Importantly, the m/z 5358 peak was found to be unique to the GT15 strains, whereas the m/z 5353 peak was identified in GT4 and GT6. Peaks at m/z 7450 and m/z 10,640 were identified in the GT6 and GT4 strains, except for Cl9. Conversely, GT15 strains and Cl9 (GT4) presented two different peaks at m/z 7510 and 10,713 (Figure 1).
This comparative analysis revealed conserved signatures and genotype-specific differences across the three MLVA-8 groups. While these results established the overall spectral framework, the next step was to determine which peaks contributed most strongly to genotype separation. By applying statistical testing in the Mass-Up software, a panel of 30 peaks was selected as the most informative discriminatory features (all peaks had a p-value ≤ 0.005 and a q-value < 0.05). From this point onward, our analyses focused on this refined set of peaks (Table 1 and Table S3).
Table 1. Intergroup comparison of discriminatory MALDI-TOF MS peak masses among the three MLVA-8 genotypes: GT4 (n = 36), GT6 (n = 28), and GT15 (n = 6). Peaks were statistically evaluated using Fisher’s exact test on binarized presence/absence data. The table presents only peaks with p-value ≤ 0.005, indicating the most significant associations with specific genotype groups. For each peak, the corresponding p-value, false discovery rate–adjusted q-value (Benjamini–Hochberg correction), and frequency of occurrence within each genotype are reported.
A heatmap was constructed to underscore the ability of MALDI-TOF MS to resolve genotype-level proteomic variation by visualizing both the presence and relative intensity of discriminatory peaks across isolates (Table S3), thereby illustrating genotype-specific expression patterns (Figure 2).
Figure 2. Heatmap representation of normalized peak intensities (range 0–1) from MALDI-TOF MS analysis of all L. pneumophila isolates. Rows represent a subset of discriminatory m/z peaks identified as significantly different between genotypes (n = 30; p ≤ 0.005, Mass-Up software), ordered by ascending mass values. Columns represent individual isolates grouped by the MLVA-8 genotype (GT4, GT6, and GT15). The peak intensities are shown without log transformation. The color gradient indicates the relative intensity of each peak across isolates, from white (low intensity) to dark blue (high intensity). The heatmap simultaneously illustrates the qualitative (presence/absence) and quantitative (relative intensity) differences in genotype-associated protein expression.
The heatmap revealed distinct, genotype-specific proteomic signatures. The m/z 5353 peak exhibited high intensity in all GT4 and GT6 isolates but was absent in GT15, which instead showed a relatively high expression of the m/z 5358 peak. Additionally, the m/z 5377 peak was highly expressed in GT15 isolates but was undetected in the other genotypes. These patterns highlight the discriminatory capability of MALDI-TOF MS in distinguishing L. pneumophila genotypes based on their characteristic protein expression profiles. Importantly, these patterns are not limited to the occurrence of binary peaks but also involve substantial differences in peak intensity, indicating quantitative variations in protein expression between genotypes. Figure 2 provides a visual synthesis of the main outcomes of this study by integrating the 30 discriminatory peaks with their relative intensities, thereby reinforcing the biological relevance of the MALDI-TOF MS profiles.
PCA was applied to display the distribution of isolates based on their MALDI-TOF MS profiles (Figure 3). Analysis of the complete set of detected peaks showed that the GT15 isolates formed a separate cluster, whereas GT4 and GT6 exhibited partially overlapping distributions. When PCA was repeated using only the 30 discriminatory peaks (p ≤ 0.005), the genotypic resolution improved. GT15 isolates remained clearly separated, and the explained variance of the first principal component (PC0) increased relative to the analysis of the complete dataset, indicating that the discriminatory peak subset provided stronger separation power. Nevertheless, GT4 and GT6 remained overlapping.
Figure 3. Principal component analysis (PCA) plots of L. pneumophila isolates based on the main spectra of MALDI-TOF MS. (A) PCA derived from the complete set of peaks detected in all isolates. (B) PCA based on a subset of 30 discriminatory m/z peaks identified as significantly different between genotypes (p ≤ 0.005, Mass-Up software; see Table S3). Both plots illustrate the distribution of the three MLVA-8 genotypes: GT4 (n = 36, light green), GT6 (n = 28, light blue), and GT15 (n = 6, light orange).

3.3. Classification Accuracy of Machine Learning Models

To evaluate the discriminatory capacity of MALDI-TOF MS data, Random Forest (RF) and Decision Tree (DT) models were applied using the 30 statistically significant peaks (p ≤ 0.005). The RF classifier achieved 85.7% accuracy, precision, recall, and F1-score, whereas the DT model achieved 71.4% accuracy, 78.2% precision, 71.4% recall, and 73.4% F1-score (Table 2).
Table 2. Classification performance metrics of Random Forest and Decision Tree models applied to MALDI-TOF MS data for genotype prediction of L. pneumophila isolates. The models were trained on 80% of the data and evaluated on a 20% hold-out test set. Analysis was performed using a subset of peaks that showed statistically significant differences between genotypes (p ≤ 0.005), as identified by the Mass-Up software (Table S3).
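A short worked example helps put these percentages in context. With 70 isolates, a 20% hold-out set contains 14 isolates, so 85.7% and 71.4% are consistent with 12 and 10 correct predictions out of 14, respectively. The sketch below computes accuracy and support-weighted precision, recall, and F1 from label lists. The weighted averaging scheme is an assumption (the text does not state which averaging was used), although it would explain why recall equals accuracy for both models. The labels shown are purely hypothetical and are not the study's confusion matrix.

```python
def weighted_metrics(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        weight = support / n  # each class weighted by its share of true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


test_size = round(0.20 * 70)  # 14 isolates in the hold-out set

# Hypothetical hold-out labels: 7 GT4, 6 GT6, 1 GT15, with one GT4/GT6
# confusion in each direction (12 of 14 correct).
y_true = ["GT4"] * 7 + ["GT6"] * 6 + ["GT15"]
y_pred = ["GT4"] * 6 + ["GT6"] + ["GT6"] * 5 + ["GT4"] + ["GT15"]
metrics = weighted_metrics(y_true, y_pred)
```

With such a small test set, each misclassified isolate shifts accuracy by about 7 percentage points, which is one reason the reported values are best read as proof-of-concept estimates.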
The feature importance analyses of both models are presented in Figure 4, which illustrates the relative contribution of the discriminatory peaks to genotype classification.
Figure 4. Relative feature importance of the top 10 discriminatory m/z peaks for the genotype classification of L. pneumophila, as determined by (A) Random Forest and (B) Decision Tree models.

3.4. Hierarchical Clustering Confirms Genotype-Level Separation

Hierarchical clustering was performed using the 30 discriminatory peaks (p ≤ 0.005), and the resulting dendrogram is presented in Figure 5. The dendrogram shows the clustering of L. pneumophila strains based on the selected peak list, revealing the distribution patterns for GT4, GT6, and GT15 genotypes. Consistent with earlier results, the GT15 isolates (orange) formed a distinct and cohesive cluster, clearly separated from the other genotypes. This separation reflects the unique protein profile of the GT15 strains within the discriminatory peak set. In contrast, GT4 (green) and GT6 (blue) did not form fully distinct clusters; instead, they appeared intermixed, although small genotype-specific sub-clusters were still observed within each group (Figure 5).
Figure 5. Hierarchical clustering dendrogram of L. pneumophila isolates constructed from a subset of 30 discriminatory m/z peaks identified as significantly different among genotypes (p ≤ 0.005, Mass-Up software; Table S3). The dendrogram illustrates the clustering patterns of the strains from the three MLVA-8 genotypes (GT4, GT6, and GT15). GT15 isolates formed a distinct cluster, whereas GT4 and GT6 strains displayed partial overlap, reflecting their close genomic relatedness.

4. Discussion

L. pneumophila, an opportunistic waterborne pathogen widely found in both natural and man-made water systems, is the primary cause of Legionnaires’ disease and continues to be a significant concern for public health worldwide [26,27,28]. Building upon previous research by Rodríguez-Martínez et al. and Sharaby et al., new insights into the identification and characterization of previously isolated L. pneumophila genotypes were obtained in the current study using MALDI-TOF MS [10,11,12,24]. Similarly to recent studies demonstrating the value of optimized sample preparation and in-house spectral libraries for Legionella identification and clustering, our approach builds upon well-characterized strain collections to enhance the discriminatory power at the subspecies level [29].
A series of studies by Sharaby et al. provided valuable insights into the genotype-specific characteristics of the L. pneumophila strains studied. Distinct temperature-dependent growth patterns among genotypes [12], significant variations in cytotoxicity through their interactions with red blood cells, macrophages, and amoebae [11], and genotype-specific antimicrobial resistance patterns among MLVA-8 genotypes have been demonstrated [10]. These genotype-specific traits influence their ability to colonize different ecological niches within drinking-water distribution systems and suggest varying levels of pathogenicity. The observed variations in virulence-associated characteristics, particularly in cytotoxicity and growth patterns, indicate that different genotypes may pose distinct risks to human health.
The current study yielded several significant findings that advance our understanding of L. pneumophila characterization through MALDI-TOF MS analyses. These analyses revealed a consistent protein profile across all 70 L. pneumophila isolates, with 18 peaks consistently present among all L. pneumophila strains. The observed uniformity in peak distribution suggests a conserved core proteome among different L. pneumophila genotypes, whereas the variation in maximum mass values indicates genotype-specific protein expression. Interestingly, while Fujinami et al. identified a characteristic peak at m/z 7180 in their L. pneumophila strains, our analysis revealed a similar but slightly different peak at m/z 7214. This variation aligns with previous observations that MALDI-TOF MS spectral patterns may reflect differences associated with the isolate origin, suggesting that the ecological or geographical context could contribute to proteomic variability [17,19]. However, as all strains analyzed in the present study originated from a single geographic region, these observations should be interpreted with caution. The slight difference in peak mass observed between our study and that of Fujinami et al. may partly reflect differences in the broader geographic or environmental background of the analyzed strain collections, while validation using strain collections from additional geographic regions will be required to assess broader global applicability.
The identification of consistent spectral patterns, particularly conserved peaks, provides a robust foundation for MALDI-TOF MS-based identification of L. pneumophila strains, specifically for Middle Eastern isolates. This geographical specificity of protein profiles suggests the importance of developing region-specific MALDI-TOF MS databases for accurate L. pneumophila strain identification and characterization [29].

MALDI-TOF MS as a Potential Tool for the Identification of Highly Pathogenic L. pneumophila Strains

Current Legionella risk assessment models and public health protocols typically treat all L. pneumophila strains as equivalent in terms of their health risks [30,31]. However, the distinct ecophysiological and virulence profiles among genotypes demonstrate the need for genotype-specific approaches in public health management and risk assessment. Understanding these genotype-specific characteristics is crucial for accurately evaluating the public health risks posed by Legionella in DWDSs. Recent perspectives emphasize that integrating high-resolution analytical tools, including proteomic profiling approaches, is essential for advancing genotype-informed risk assessment frameworks [23].
The Department of Environmental Health in Israel, like many public and environmental health agencies worldwide, routinely conducts environmental surveillance for Legionella [32,33,34]. Currently, strain characterization and genotyping rely on molecular techniques such as SBT or whole-genome sequencing, which are time-consuming, labor-intensive, and require specialized expertise and instrumentation [32,34]. Owing to these limitations, most public health agencies focus primarily on quantifying Legionella spp., potentially overlooking crucial genotypic variations that influence pathogenicity [10,11].
MALDI-TOF MS, already widely implemented in clinical laboratories, presents a promising solution for rapid and reliable L. pneumophila genotyping [15,18,20,21]. This approach could enable direct genotypic identification in both clinical and environmental laboratories without extensive molecular workflows, significantly reducing costs and processing times. The implementation of MALDI-TOF MS-based genotyping would enhance public health responses in several ways, including enabling the rapid identification of highly virulent strains, facilitating faster outbreak responses, improving source tracking, and allowing more efficient implementation of control measures [21]. Moreover, during routine water monitoring, agencies could quickly identify and prioritize remediation of sites containing potentially hazardous strains, thereby reducing legionellosis outbreak risks.
Our detailed analyses of 70 isolates representing three MLVA-8 genotypes (GT4, GT6, and GT15) demonstrated promising capabilities in discriminating between L. pneumophila genotypes, achieving over 85% accuracy in strain classification using the RF algorithm. Model performance was evaluated using a single 80/20 train–test split, designed to assess the discriminatory potential of MALDI-TOF MS–derived features rather than to establish a fully optimized predictive framework. Accordingly, the reported classification accuracy should be interpreted as a proof-of-concept result. While the results showed some limitations in fully differentiating GT4 from GT6 isolates (Table 2), this outcome is consistent with their remarkably high genomic similarity, as confirmed by whole-genome sequencing data (unpublished results). Average Nucleotide Identity (ANI) and Genome-to-Genome Distance Calculator (GGDC) analyses revealed >99% similarity between GT4 and GT6 strains, with both groups also classified as ST-1 by SBT, underscoring their near-identical genomic background. Beyond genomic similarity, the limited separation observed between GT4 and GT6 may also reflect the specific protein mass range examined in this study. It is possible that subtle genotype-associated differences, if present, occur outside the 2000–20,000 Da mass range and are therefore not captured by the MALDI-TOF MS profiles. Consistent with these factors and the identical cultivation conditions applied, the differences in protein expression between GT4 and GT6 within the examined mass range appeared to be minimal. In contrast, the GT15 strains formed a distinct cluster, exhibiting only ~97% ANI similarity to GT4/GT6 and markedly lower GGDC values (~78–81%), reflecting their greater genomic divergence.
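To make the proof-of-concept caveat concrete, the following minimal sketch works through the arithmetic of a single 80/20 split on 70 isolates. It assumes that all 70 isolates entered the split, which gives 14 test isolates, so each misclassified isolate shifts accuracy by roughly 7 percentage points. The counts of correct predictions (12 and 10) are illustrative values chosen for the example, not figures reported in this study, and the Wilson score interval is one common way to express the resulting uncertainty, not necessarily the one the authors would use.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

n_isolates = 70        # number of isolates analysed in the study
test_fraction = 0.20   # single 80/20 train-test split
# Assumption: every isolate enters the split, so the test set has 14 isolates.
n_test = round(n_isolates * test_fraction)

# Accuracy can only change in steps of 1 / n_test.
step = 1 / n_test
print(f"test isolates: {n_test}, accuracy step: {step:.1%}")

# Illustrative counts of correct predictions (hypothetical, not reported values).
for correct in (12, 10):
    lo, hi = wilson_interval(correct, n_test)
    print(f"{correct}/{n_test} = {correct / n_test:.1%}, "
          f"approx. 95% interval {lo:.1%} to {hi:.1%}")
```

With a test set of this size the interval around any single accuracy value is wide, which is why validation on larger collections, or repeated resampling, would be needed before treating the accuracy as a stable performance estimate.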
These genomic findings provide a strong rationale for the proteomic clustering observed: GT15 isolates were resolved by MALDI-TOF MS, whereas GT4 and GT6 displayed overlapping proteomic profiles in line with their near-clonal genomic relationship. Notably, despite being represented by a smaller number of isolates, GT15 strains exhibited complete consistency in several genotype-specific peaks across all samples, supporting the robustness of the observed proteomic signatures in this group. Taken together, our results highlight MALDI-TOF MS as an efficient tool for L. pneumophila subtyping, while also emphasizing the inherent challenge of separating genotypes that are virtually indistinguishable at the genomic level.
Supporting these findings, both PCA and hierarchical cluster analysis provided complementary evidence for the discriminatory power of MALDI-TOF MS. The PCA revealed distinct clustering patterns, with GT15 forming a well-defined cluster separate from GT4 and GT6, which showed partial overlap, consistent with their high genomic similarity. Hierarchical cluster analysis, visualized through a dendrogram, further strengthened these observations, showing GT15 as a distinct, cohesive cluster, whereas GT4 and GT6 exhibited minor genotype-specific sub-clustering patterns. These results are consistent with those of previous studies demonstrating the ability of MALDI-TOF MS to resolve L. pneumophila sequence types and strain-level variations using current classification approaches [22].
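The clustering behaviour described above can be illustrated with a small, self-contained sketch of average-linkage hierarchical clustering on binary peak-presence profiles. Everything in it is an assumption made for illustration: the isolate names and peak sets are invented toy data, and the Jaccard distance and average linkage are one plausible choice rather than the settings used in this study.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of peak labels."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    def dist(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(profiles[x], profiles[y])
                   for x, y in pairs) / len(pairs)

    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy peak-presence profiles (hypothetical isolates and peak sets).
toy_profiles = {
    "GT4_a": {5353, 7450, 10643, 6120},
    "GT4_b": {5353, 7450, 10643},
    "GT6_a": {5353, 7450, 10643, 6120},
    "GT6_b": {5353, 7450, 10643, 8800},
    "GT15_a": {5358, 7510, 10713, 6120},
    "GT15_b": {5358, 7510, 10713},
}

two_groups = average_linkage(toy_profiles, n_clusters=2)
print(two_groups)
```

In this toy data the GT15-like profiles share almost no peaks with the others and separate cleanly, whereas the GT4-like and GT6-like profiles differ by at most one peak and therefore interleave within a single cluster.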
These findings establish MALDI-TOF MS as a powerful tool for L. pneumophila identification and subtyping, offering advantages such as rapid analysis and minimal sample preparation requirements. Our results complement previous studies by Fujinami et al. and Kyritsi et al., which established the technique’s utility for strain-level differentiation and virulence factor detection [16,19]. Notably, while Kyritsi et al. reported specific peak patterns associated with different serogroups (peaks at 3227.998 and 4756.435 for serogroup 1 and additional peaks at 7439.687, 9181.862, and 10,688.00 for serogroup 3), our findings revealed different serogroup-specific patterns. In our study, GT15 strains (serogroup 3) were distinguished by peaks at 10,713, 7510, and 5358, present in 100% of GT15 strains but largely absent in GT4 and GT6 strains. Conversely, the GT4 and GT6 strains (serogroup 1) consistently showed peaks at 5353, 7450, and 10,643. These differences in peak patterns between studies suggest that strain-specific protein profiles may be influenced by multiple factors beyond serogroup classification.
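As a minimal sketch of how the marker peaks listed above might be applied to a peak list, the example below counts matches to each marker set within a mass tolerance. The tolerance of 2.0 Da, the majority rule, and the example peak lists are hypothetical and are not part of the workflow used in this study, which relied on RF and DT models. One constraint does follow from the reported values: because m/z 5353 and 5358 are only 5 Da apart, the matching window must stay below 2.5 Da, or a single peak could match both markers.

```python
# Marker m/z values quoted in the text above.
GT15_MARKERS = (5358.0, 7510.0, 10713.0)
GT4_GT6_MARKERS = (5353.0, 7450.0, 10643.0)

def count_hits(peaks, markers, tol_da):
    """Count markers that have at least one observed peak within tol_da."""
    return sum(any(abs(p - m) <= tol_da for p in peaks) for m in markers)

def call_group(peaks, tol_da=2.0):
    """Return 'GT15', 'GT4/GT6', or 'undetermined' for one peak list.

    tol_da is a hypothetical matching window. It must stay below 2.5 Da,
    half the gap between 5353 and 5358, or one peak could match both.
    """
    if tol_da >= 2.5:
        raise ValueError("tolerance too wide to separate m/z 5353 from 5358")
    gt15 = count_hits(peaks, GT15_MARKERS, tol_da)
    other = count_hits(peaks, GT4_GT6_MARKERS, tol_da)
    if gt15 > other:
        return "GT15"
    if other > gt15:
        return "GT4/GT6"
    return "undetermined"

# Made-up peak lists, for illustration only.
print(call_group([5358.4, 7509.2, 10712.1, 6120.0]))  # GT15
print(call_group([5352.7, 7450.9, 10644.0]))          # GT4/GT6
print(call_group([6120.0, 8800.5]))                   # undetermined
```

A rule of this kind separates GT15 from the GT4/GT6 group only; it does not attempt to distinguish GT4 from GT6, for which no unique marker peaks are reported here.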
The variations in virulence and ecological adaptability among L. pneumophila genotypes, previously documented by Sharaby et al., underscore the importance of high-throughput methods that can link genotypic diversity with phenotypic characteristics [11]. Notably, our MALDI-TOF MS analyses revealed distinct protein profiles in GT15 strains compared to GT4 and GT6 (Table 1). These protein variations could potentially explain the significantly higher hemolytic capacity observed in GT15 strains, which was previously reported by Sharaby et al. to be approximately 2.5-fold higher than that of other genotypes. Hemolysis is a well-recognized virulence-associated mechanism that enhances bacterial survival through iron acquisition, and the pronounced hemolytic capacity of GT15 may therefore serve as an important phenotypic indicator of its elevated virulence potential. The proteomic distinction observed in GT15 in this study is consistent with its previously described physiological and virulence-related characteristics, reinforcing the link between genotype-specific protein expression and functional pathogenic traits. The identification of these differentially expressed proteins opens new avenues for understanding the virulence mechanisms of L. pneumophila. However, further research is required to identify these proteins and elucidate their roles in pathogenicity. Such investigations could provide crucial insights into virulence variation among L. pneumophila strains and could lead to new, faster genotyping capabilities.
The integration of MALDI-TOF MS into current surveillance protocols offers promising opportunities to deepen our understanding of L. pneumophila ecology and improve public health responses to potential outbreaks. This is particularly relevant considering recent advances that integrate MALDI-TOF MS with high-resolution analytical and computational strategies, thereby expanding its utility beyond conventional identification workflows [23].

5. Conclusions

These findings underscore the potential of MALDI-TOF MS not only as a rapid and reliable tool for L. pneumophila genotyping but also as a powerful approach for uncovering proteomic variations that underpin genotype-specific characteristics, enhancing our understanding of the pathogen’s ecology and informing targeted public health interventions. Importantly, MALDI-TOF MS enables genotype-level assessment within hours of culture, in contrast to conventional molecular typing approaches, such as SBT or MLVA, which typically require several days to complete. This rapid turnaround, combined with the low per-sample cost and high throughput, highlights the practical advantages of MALDI-TOF MS for routine environmental surveillance and outbreak response.
Future research should expand genotypic identification by targeting additional MLVA-8 genotypes using MALDI-TOF MS, thereby broadening the applicability of this technique across a diverse array of L. pneumophila strains. Additionally, isolating and characterizing proteins unique to specific genotypes could provide deeper insights into the proteomic foundations of genotypic variation, further enhancing subtyping capabilities and enabling a more nuanced understanding of genotype-specific traits.
A key limitation of the present study is the incomplete discrimination between the closely related GT4 and GT6 genotypes using MALDI-TOF MS. This limitation is consistent with their exceptionally high genomic similarity (ANI > 99%) and shared ST-1 background, indicating a near-clonal relationship that inherently constrains the level of proteomic divergence detectable using this approach. Future studies may address this limitation by integrating complementary analytical strategies, such as alternative sample preparation protocols, targeted analysis of low-abundance proteins, or a combination of MALDI-TOF MS with molecular or genomic methods. In this context, the findings suggest that improving resolution among closely related genotypes may require consideration of additional proteomic features beyond those captured within the mass range examined in the present study. Furthermore, the uneven distribution of strains across genotypes, particularly the smaller number of GT15 isolates, should be considered when interpreting the results and highlights the importance of validation using larger and more balanced strain collections in future studies. Taken together, these considerations provide a practical framework for advancing MALDI-TOF MS-based subtyping from proof-of-concept toward broader applications in environmental surveillance and public health contexts.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w18020269/s1, Table S1: MLVA-8 classification of L. pneumophila strains from clinical and environmental samples (2012–2013); Table S2: Inter-group analysis presenting the peak masses common to the three MLVA-8 genotypes studied; Table S3: Matrix of normalized MALDI-TOF MS peak intensities used for comparative proteomic analyses of Legionella pneumophila isolates. Each row represents an individual isolate, annotated by MLVA-8 genotype (GT4, GT6, or GT15), and each column corresponds to an m/z peak identified as discriminatory (p < 0.005, Fisher’s exact test). Values indicate the relative intensity of each peak (range 0–1, normalized by total ion current). This dataset served as the basis for downstream analyses, including heatmap, principal component analysis (PCA), hierarchical clustering, and machine-learning classification using Random Forest (RF) and Decision Tree (DT) models.

Author Contributions

Conceptualization, Y.S., H.B.-A. and S.E.; methodology, L.M.; software, L.M.; validation, L.M.; formal analysis, L.M.; investigation, L.M.; resources, Y.S. and H.B.-A.; data curation, L.M.; writing—original draft preparation, L.M., H.B.-A. and Y.S.; writing—review and editing, L.M., H.B.-A., S.E. and Y.S.; visualization, H.B.-A.; supervision, Y.S.; project administration, Y.S. and H.B.-A.; funding acquisition, Y.S. and H.B.-A. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any external funding.

Data Availability Statement

All relevant data are contained within the article and Supplementary Material. Any further relevant supplementary or raw data will be made available by the authors upon request.

Acknowledgments

During the preparation of this work, the authors used ChatGPT-4 turbo (OpenAI) to assist with language editing and improve the clarity of the text. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the final version of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cunha, B.A.; Burillo, A.; Bouza, E. Legionnaires’ Disease. Lancet 2016, 387, 376–385.
  2. Fraser, D.W.; Tsai, T.R.; Orenstein, W.; Parkin, W.E.; Beecham, H.J.; Sharrar, R.G.; Harris, J.; Mallison, G.F.; Martin, S.M.; McDade, J.E. Legionnaires’ Disease: Description of an Epidemic of Pneumonia. N. Engl. J. Med. 1977, 297, 1189–1197.
  3. Calderaro, A.; Chezzi, C. MALDI-TOF MS: A Reliable Tool in the Real Life of the Clinical Microbiology Laboratory. Microorganisms 2024, 12, 322.
  4. Lück, P.C.; Ecker, C.; Reischl, U.; Linde, H.-J.; Stempka, R. Culture-Independent Identification of the Source of an Infection by Direct Amplification and Sequencing of Legionella pneumophila DNA from a Clinical Specimen. J. Clin. Microbiol. 2007, 45, 3143.
  5. Pourcel, C.; Vidgop, Y.; Ramisse, F.; Vergnaud, G.; Tram, C. Characterization of a Tandem Repeat Polymorphism in Legionella pneumophila and Its Use for Genotyping. J. Clin. Microbiol. 2003, 41, 1819–1826.
  6. Pourcel, C.; Visca, P.; Afshar, B.; d’Arezzo, S.; Vergnaud, G.; Fry, N.K. Identification of Variable-Number Tandem-Repeat (VNTR) Sequences in Legionella pneumophila and Development of an Optimized Multiple-Locus VNTR Analysis Typing Scheme. J. Clin. Microbiol. 2007, 45, 1190–1199.
  7. Sobral, D.; Le Cann, P.; Gerard, A.; Jarraud, S.; Lebeau, B.; Loisy-Hamon, F.; Vergnaud, G.; Pourcel, C. High-Throughput Typing Method to Identify a Non-Outbreak-Involved Legionella pneumophila Strain Colonizing the Entire Water Supply System in the Town of Rennes, France. Appl. Environ. Microbiol. 2011, 77, 6899–6907.
  8. Kozak-Muiznieks, N.A.; Lucas, C.E.; Brown, E.; Pondo, T.; Taylor, T.H., Jr.; Frace, M.; Miskowski, D.; Winchell, J.M. Prevalence of Sequence Types among Clinical and Environmental Isolates of Legionella pneumophila Serogroup 1 in the United States from 1982 to 2012. J. Clin. Microbiol. 2014, 52, 201–211.
  9. Mercante, J.W.; Caravas, J.A.; Ishaq, M.K.; Kozak-Muiznieks, N.A.; Raphael, B.H.; Winchell, J.M. Genomic Heterogeneity Differentiates Clinical and Environmental Subgroups of Legionella pneumophila Sequence Type 1. PLoS ONE 2018, 13, e0206110.
  10. Sharaby, Y.; Nitzan, O.; Brettar, I.; Höfle, M.G.; Peretz, A.; Halpern, M. Antimicrobial Agent Susceptibilities of Legionella pneumophila MLVA-8 Genotypes. Sci. Rep. 2019, 9, 6138.
  11. Sharaby, Y.; Rodríguez-Martínez, S.; Pecellin, M.; Sela, R.; Peretz, A.; Höfle, M.; Halpern, M.; Brettar, I. Virulence Traits of Environmental and Clinical Legionella pneumophila MLVA Genotypes. Appl. Environ. Microbiol. 2018, 84, e00429-18.
  12. Sharaby, Y.; Rodríguez-Martínez, S.; Oks, O.; Pecellin, M.; Mizrahi, H.; Peretz, A.; Brettar, I.; Höfle, M.G.; Halpern, M. Temperature Dependent Growth Modeling of Environmental and Clinical Legionella pneumophila Multilocus Variable-Number Tandem-Repeat Analysis (MLVA) Genotypes. Appl. Environ. Microbiol. 2017, 83, e03295-16.
  13. Bizzini, A.; Greub, G. Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry, a Revolution in Clinical Microbial Identification. Clin. Microbiol. Infect. 2010, 16, 1614–1619.
  14. Nomura, F. Proteome-Based Bacterial Identification Using Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry (MALDI-TOF MS): A Revolutionary Shift in Clinical Diagnostic Microbiology. Biochim. Biophys. Acta (BBA)-Proteins Proteom. 2015, 1854, 528–537.
  15. Sauget, M.; Valot, B.; Bertrand, X.; Hocquet, D. Can MALDI-TOF Mass Spectrometry Reasonably Type Bacteria? Trends Microbiol. 2017, 25, 447–455.
  16. Kyritsi, M.A.; Kristo, I.; Hadjichristodoulou, C. Serotyping and Detection of Pathogenecity Loci of Environmental Isolates of Legionella pneumophila Using MALDI-TOF MS. Int. J. Hyg. Environ. Health 2020, 224, 113441.
  17. Topić Popović, N.; Kazazić, S.P.; Bojanić, K.; Strunjak-Perović, I.; Čož-Rakovac, R. Sample Preparation and Culture Condition Effects on MALDI-TOF MS Identification of Bacteria: A Review. Mass Spectrom. Rev. 2023, 42, 1589–1603.
  18. Savonen, E.; Mentula, S.; Ikonen, J.; Miettinen, I.T.; Rossi, P.M.; Niittynen, M. Identification of Legionella anisa, Legionella longbeachae and Legionella pneumophila Using MALDI-TOF MS: A Method Validation Study with Environmental Isolates. J. Microbiol. Methods 2025, 240, 107330.
  19. Fujinami, Y.; Kikkawa, H.S.; Kurosaki, Y.; Sakurada, K.; Yoshino, M.; Yasuda, J. Rapid Discrimination of Legionella by Matrix-Assisted Laser Desorption Ionization Time-of-Flight Mass Spectrometry. Microbiol. Res. 2011, 166, 77–86.
  20. Schwake, D.O.; Sandrin, T.; Zhang, L.; Abbaszadegan, M. Strain-Level Characterization of Legionella Environmental Isolates via MALDI-TOF-MS. Microorganisms 2022, 11, 8.
  21. Pascale, M.R.; Mazzotta, M.; Salaris, S.; Girolamini, L.; Grottola, A.; Simone, M.L.; Cordovana, M.; Bisognin, F.; Dal Monte, P.; Bucci Sabattini, M.A. Evaluation of MALDI–TOF Mass Spectrometry in Diagnostic and Environmental Surveillance of Legionella Species: A Comparison with Culture and Mip-Gene Sequencing Technique. Front. Microbiol. 2020, 11, 589369.
  22. Blanco, S.; Sanz, C.; Gutiérrez, M.P.; Simarro, M.; López, I.; Escribano, I.; Eiros, J.M.; Zarzosa, P.; Orduña, A.; López, J.C. A New MALDI-TOF Approach for the Quick Sequence Type Identification of Legionella pneumophila. J. Microbiol. Methods 2021, 188, 106292.
  23. Andreadi, A.; Tsivelekidou, E.; Dermitzakis, I.; Theotokis, P.; Gargani, S.; Meditskou, S.; Manthou, M.E. Innovations in MALDI-TOF Mass Spectrometry: Bridging Modern Diagnostics and Historical Insights. Open Life Sci. 2025, 20, 20251136.
  24. Rodríguez-Martínez, S.; Sharaby, Y.; Pecellín, M.; Brettar, I.; Höfle, M.; Halpern, M. Spatial Distribution of Legionella pneumophila MLVA-Genotypes in a Drinking Water System. Water Res. 2015, 77, 119–132.
  25. López-Fernández, H.; Santos, H.M.; Capelo, J.L.; Fdez-Riverola, F.; Glez-Peña, D.; Reboiro-Jato, M. Mass-Up: An All-in-One Open Software Application for MALDI-TOF Mass Spectrometry Knowledge Discovery. BMC Bioinform. 2015, 16, 318.
  26. Bartram, J. Legionella and the Prevention of Legionellosis; World Health Organization: Geneva, Switzerland, 2007; ISBN 92-4-156297-8.
  27. Jomehzadeh, N.; Moosavian, M.; Saki, M.; Rashno, M. Legionella and Legionnaires’ Disease: An Overview. J. Acute Dis. 2019, 8, 221–232.
  28. Winn, W.C., Jr. Legionnaires Disease: Historical Perspective. Clin. Microbiol. Rev. 1988, 1, 60–81.
  29. Girolamini, L.; Caiazza, P.; Marino, F.; Pascale, M.R.; Caligaris, L.; Spiteri, S.; Derelitto, C.; Simone, M.L.; Grottola, A.; Cristino, S. Identification of Legionella by MALDI Biotyper through Three Preparation Methods and an In-House Library Comparing Phylogenetic and Hierarchical Cluster Results. Sci. Rep. 2025, 15, 2162.
  30. Hamilton, K.; Haas, C. Critical Review of Mathematical Approaches for Quantitative Microbial Risk Assessment (QMRA) of Legionella in Engineered Water Systems: Research Gaps and a New Framework. Environ. Sci. Water Res. Technol. 2016, 2, 599–613.
  31. Sharaby, Y.; Rodríguez-Martínez, S.; Höfle, M.; Brettar, I.; Halpern, M. Quantitative Microbial Risk Assessment of Legionella pneumophila in a Drinking Water Supply System in Israel. Sci. Total Environ. 2019, 671, 404–410.
  32. Moran-Gilad, J.; Mentasti, M.; Lazarovitch, T.; Huberman, Z.; Stocki, T.; Sadik, C.; Shahar, T.; Anis, E.; Valinsky, L.; Harrison, T.G. Molecular Epidemiology of Legionnaires’ Disease in Israel. Clin. Microbiol. Infect. 2014, 20, 690–696.
  33. National Academies of Sciences, Engineering, and Medicine. Management of Legionella in Water Systems; National Academies Press: Washington, DC, USA, 2020; ISBN 0-309-49382-X.
  34. de Jong, B.; Hallstrom, L.P. European Surveillance of Legionnaires’ Disease. Curr. Issues Mol. Biol. 2021, 42, 81–96.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
