Improving the Detection of Epidemic Clones in Candida parapsilosis Outbreaks by Combining MALDI-TOF Mass Spectrometry and Deep Learning Approaches

Identifying fungal clones propagated during outbreaks in hospital settings is a problem that increasingly confronts biologists. Current tools based on DNA sequencing or microsatellite analysis require specific manipulations that are difficult to implement in the context of routine diagnosis. Using deep learning to classify the mass spectra obtained during the routine identification of fungi by MALDI-TOF mass spectrometry could help differentiate isolates belonging to epidemic clones from others. As part of the management of a nosocomial outbreak due to Candida parapsilosis in two Parisian hospitals, we studied the impact of the preparation of the spectra on the performance of a deep neural network. Our purpose was to differentiate 39 fluconazole-resistant isolates belonging to a clonal subset from 56 other isolates, most of which were fluconazole-susceptible, collected during the same period and not belonging to the clonal subset. Our study, carried out on spectra obtained on four different machines from isolates cultured for 24 or 48 h on three different culture media, showed that each of these parameters had a significant impact on the performance of the classifier. In particular, using different culture times between the learning and testing steps could lead to a collapse in the accuracy of the predictions. On the other hand, including spectra obtained after both 24 and 48 h of growth during the learning step restored good results. Finally, we showed that the deleterious effect of device variability between learning and testing could be largely mitigated by including a spectra alignment step during preprocessing, before submitting the spectra to the neural network. Taken together, these experiments show the great potential of deep learning models to identify spectra of specific clones, provided that crucial parameters are controlled during both the culture and preparation steps before submitting spectra to a classifier.


Introduction
Candida parapsilosis is one of the most common yeasts responsible for human infections. Some studies rank it second just behind Candida albicans among the species most frequently responsible for candidemia [1]. Notably, this yeast has been implicated in nosocomial infection epidemics [2], including several outbreaks due to isolates resistant to fluconazole and other azoles, which are the first line of treatment [3][4][5][6]. Furthermore, some of these outbreaks are responsible for high mortality rates in intensive care units, especially if the patients are immunocompromised [7,8]. Recent publications report up to 30% of fluconazole-resistant isolates carrying an A395T mutation (Y132F substitution) in the erg11 gene to explain the observed phenotype. This mutation is likely the main mechanism that confers azole resistance to these isolates. In 2021, our team [9] described an outbreak of Candida parapsilosis resistant to fluconazole in the La Pitié-Salpêtrière Hospital (PSL) in Paris. Two clones infecting mainly ICU patients were identified; one was identified between 2012 and 2017 and the other emerged in 2017 and is unfortunately still active. The worrying spread of these resistant epidemic clones makes it necessary to build appropriate diagnostic tools for detecting clonal resistant isolates among all nonclonal fluconazole-susceptible C. parapsilosis identified in the routine flow of our microbiology departments. For now, allocating a given isolate to a clonal set requires the use of molecular methods such as microsatellite typing [10,11] or DNA sequencing. However, these methods are too expensive and time-consuming to be implemented as routine activities.
We therefore set out to find a method that would allow clones to be identified directly in the flow of routine analyses without having to implement additional biological assays based on molecular biology. Detecting an epidemiological cluster of drug-resistant microorganisms directly through routine analysis methods would allow microbiologists to alert clinicians, making it possible to rapidly adapt the treatment administered to the patient and thus improve infection management. Currently, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) represents the main routine approach to identify bacteria and yeasts in almost all microbiology laboratories around the world. MALDI-TOF mass spectrometry generates mass spectra corresponding to the main proteins and glycoproteins extracted from microorganisms [12][13][14][15]. The mass spectra can be considered as species-specific fingerprints, allowing accurate identification of purified isolates at the genus and species levels. Recent studies have opened the door for new applications of MALDI-TOF typing approaches with the use of machine learning algorithms. Delavy et al. selected a machine learning model that qualitatively detects fluconazole resistance in the azole-tolerant species C. albicans [16]. Most recently, Normand et al. developed a simple deep learning model to identify a clonal population of Aspergillus flavus by MALDI-TOF mass spectrometry with a high performance [17]. Unfortunately, unlike the examples cited above, we quickly discovered that in the case of Candida parapsilosis, the protein profiles obtained after MALDI-TOF mass spectrometry were so similar that it was impossible to obtain a good discrimination between the isolates belonging to the resistant clone and others using the model that successfully discriminated Aspergillus flavus clones.
In this study, we investigated the methods used during the preparation of samples and during the computer analysis of mass spectra to improve the learning phase and, consequently, the discriminatory power of the trained neural network. This study particularly focused on the experimental steps that can influence the performance of epidemic clone identification using deep learning applied to the MALDI-TOF MS spectra. This work constitutes the first effort to analyze the conditions required for the optimal use of MALDI-TOF MS and deep learning in investigating outbreaks in medical mycology. It may be useful for other teams experiencing difficulties in successfully distinguishing between microbial entities with highly similar MALDI-TOF MS patterns.
Step-by-step preprocessing of the spectra from the raw spectrum to the processed spectra before use in the machine learning phase.
Alignment. Alignment of the spectra was performed after the preprocessing step with MSIWarp, a Python package with a C++ implementation. MSIWarp is a flexible tool compatible with multiple instrument types to perform mass alignment of mass spectrometry imaging spectra [22]. The alignment approach works on TOF data and reduces the mass range shift by applying a recalibration function to the mass (m/z) data and by maximizing a similarity score that considers both the intensity and the m/z position of peaks matched between two spectra. It can be applied using a reference spectrum. Here, the chosen reference spectrum was the spectrum with the highest correlation coefficient with all other spectra.
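As an illustration of the reference selection described above, the following minimal sketch picks the spectrum whose mean Pearson correlation with all other spectra is highest. It assumes that the spectra are already binned on a common m/z axis and that "highest correlation coefficient with all other spectra" means the highest mean pairwise correlation. The function names and toy intensities are hypothetical, and the MSIWarp alignment itself is not reproduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / sqrt(var_x * var_y)


def pick_reference(spectra):
    """Index of the spectrum with the highest mean correlation to all the others.

    Assumption: the mean pairwise correlation is used as the selection score.
    """
    if len(spectra) < 2:
        raise ValueError("at least two spectra are needed to compare correlations")
    best_index, best_score = 0, float("-inf")
    for i, candidate in enumerate(spectra):
        scores = [pearson(candidate, s) for j, s in enumerate(spectra) if j != i]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_index, best_score = i, mean_score
    return best_index


# Toy spectra on a shared m/z grid (intensities invented for illustration).
toy_spectra = [
    [0.0, 1.0, 0.2, 0.0, 0.8],
    [0.1, 0.9, 0.3, 0.0, 0.7],
    [0.0, 0.8, 0.1, 0.1, 0.9],
    [0.9, 0.0, 0.1, 0.8, 0.0],  # deliberately atypical profile
]
reference_index = pick_reference(toy_spectra)
```

The selected spectrum would then serve as the target against which every other spectrum is recalibrated; an atypical profile such as the last toy spectrum is never selected because it correlates poorly with the rest.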
Machine learning. To further differentiate clone and nonclone spectra, a deep learning method based on an artificial neural network (ANN) was implemented with TensorFlow 2.7.0. It was composed of a convolutional part and a fully connected part, an architecture typical of a convolutional neural network (CNN). The classifier (Figure 2) was a very simple CNN [23] model taking a spectrum of 18,000 values as input (the input layer accepts the preprocessed spectra and passes them on to the rest of the network). The convolutional block was used to assist in the detection of patterns. It comprised a convolutional layer (3 filters and a kernel size of 6) to extract the characteristics, a max-pooling layer (pool size = 100) to reduce and pass on the main information [24], and a flatten layer followed by two fully connected layers (512 and 1024 units). A rectified linear unit function (ReLU) [25] was used as the activation function in the convolutional and fully connected layers. Classification was then performed with a normalization layer [26], intended to improve the class scores, and a final dense layer of dimension 2, followed by a softmax [27] function to produce the prediction probability over the 2 output classes (clones/others). The learning rate was left at its default value of 0.001, and the maximum number of epochs was set to 50, with early stopping (patience = 20). We used the Adam optimizer and the categorical cross-entropy loss [28]. The batch size was set to 60.
The preprocessed spectra were fed into a neural network with a convolutional layer, a max-pooling layer and a flattening layer to extract the main features and reduce captured information. The flattened layer was used as a transition to two fully connected layers to optimize classification. Features from the previous layer were then normalized by the normalizing layer followed by the output layer to produce results. The shape of the output layer was added with the batch size, n, set to 1 to simplify the illustration.
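To make the size of this network concrete, the following sketch works through the layer output lengths and weight counts implied by the stated hyperparameters. The source does not give the padding or the strides, so the arithmetic assumes the Keras defaults (no padding, convolution stride of 1, pooling stride equal to the pool size) and a single input channel. The normalization layer is left out of the count because its exact type is not specified.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding ('valid')."""
    return (length - kernel_size) // stride + 1


def maxpool1d_output_length(length, pool_size):
    """Output length of non-overlapping 1-D max pooling (stride = pool size)."""
    return (length - pool_size) // pool_size + 1


def dense_parameters(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Hyperparameters stated in the text.
INPUT_LENGTH = 18_000
FILTERS, KERNEL_SIZE = 3, 6
POOL_SIZE = 100
DENSE_UNITS = (512, 1024)
N_CLASSES = 2

conv_length = conv1d_output_length(INPUT_LENGTH, KERNEL_SIZE)    # 17995 positions
pooled_length = maxpool1d_output_length(conv_length, POOL_SIZE)  # 179 positions
flattened = pooled_length * FILTERS                              # 537 features

# One input channel is assumed: each filter has KERNEL_SIZE weights and one bias.
conv_parameters = FILTERS * (KERNEL_SIZE + 1)                    # 21
dense_1 = dense_parameters(flattened, DENSE_UNITS[0])            # 275456
dense_2 = dense_parameters(DENSE_UNITS[0], DENSE_UNITS[1])       # 525312
output = dense_parameters(DENSE_UNITS[1], N_CLASSES)             # 2050

total_parameters = conv_parameters + dense_1 + dense_2 + output  # 802839
```

Under these assumptions, almost all trainable weights sit in the two fully connected layers, while the convolution and the pooling mainly compress the 18,000-point spectrum into 537 features.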
MALDI-TOF mass spectrometry data analysis cross-validation. For each test, the isolates were divided into five equally sized sets using random selection to preserve the clone/nonclone distribution. We validated our classification system using a nested cross-validation (CV) technique stratified by clone/nonclone classification. Each CV fold was made of a training set composed of the data from 80% of the spectra, depending on the criteria tested, and a test set comprising the remaining 20% of the spectra. In total, 20 folds were performed with strict separation between the training and the test sets in terms of isolates, culture media, age of the culture and mass spectrometers. On each fold, the clone/others classification system was trained on the training set and validated on the test set.
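The isolate-level split described above can be sketched as follows. This is a simplified illustration, not the authors' code: it only shows how isolates (rather than individual spectra) can be dealt into five folds while keeping the clone/nonclone ratio, and it leaves out the nested part of the procedure. The isolate identifiers are hypothetical; the counts (96 isolates, 39 of them clonal) are those reported in the Results.

```python
import random


def stratified_isolate_folds(labels_by_isolate, n_folds=5, seed=0):
    """Split isolate IDs into n_folds groups while preserving the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for isolate, label in labels_by_isolate.items():
        by_class.setdefault(label, []).append(isolate)
    for label in sorted(by_class):
        isolates = sorted(by_class[label])
        rng.shuffle(isolates)
        # Deal isolates round-robin so each fold gets a near-equal share of the class.
        for position, isolate in enumerate(isolates):
            folds[position % n_folds].append(isolate)
    return folds


# Hypothetical isolate IDs: the first 39 are labelled as clones, the rest as others.
labels = {f"isolate_{i:02d}": ("clone" if i < 39 else "other") for i in range(96)}
folds = stratified_isolate_folds(labels)

splits = []
for k, test_isolates in enumerate(folds):
    # All spectra of a test isolate stay out of training, whatever the machine or medium.
    train_isolates = [iso for j, fold in enumerate(folds) if j != k for iso in fold]
    splits.append((train_isolates, test_isolates))

clones_per_fold = [sum(labels[iso] == "clone" for iso in fold) for fold in folds]
```

Splitting on isolates rather than on spectra matters here because each isolate contributes many spectra; a spectrum-level split would let the same isolate appear on both sides of a fold.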


Evaluation metrics. For each impact assessment, we used the accuracy (percentage of correct identifications), the F1-score, which is a synthesis score used in machine learning, the recall (sensitivity) and the specificity. Confidence intervals at the 95% confidence level were computed using the empirical bootstrap method [29].
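A minimal sketch of this evaluation is given here, using the standard confusion-matrix definitions of the four metrics. The source does not detail which bootstrap variant was used, so the sketch assumes the basic (pivotal) form of the empirical bootstrap, in which the resampled deviations from the point estimate are used to build the interval. The function names, the number of resamples and the toy predictions are illustrative assumptions.

```python
import random


def confusion_counts(y_true, y_pred, positive="clone"):
    """Return (TP, FP, TN, FN) for a binary clone/other task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(y_true, y_pred):
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if tn + fp else 0.0


def f1_score(y_true, y_pred):
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def empirical_bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, level=0.95, seed=0):
    """Basic (pivotal) bootstrap interval; assumed variant, not confirmed by the source."""
    rng = random.Random(seed)
    n = len(y_true)
    estimate = metric(y_true, y_pred)
    deltas = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample spectra with replacement
        resampled = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        deltas.append(resampled - estimate)
    deltas.sort()
    alpha = 1.0 - level
    low_delta = deltas[int((alpha / 2) * (n_resamples - 1))]
    high_delta = deltas[int((1 - alpha / 2) * (n_resamples - 1))]
    return estimate - high_delta, estimate - low_delta


# Toy predictions for 20 spectra (invented for illustration).
y_true = ["clone"] * 8 + ["other"] * 12
y_pred = ["clone"] * 6 + ["other"] * 2 + ["other"] * 11 + ["clone"] * 1
ci_low, ci_high = empirical_bootstrap_ci(y_true, y_pred, accuracy)
```

Because the interval is built from deviations around the point estimate, its bounds are not clipped and can fall slightly outside the [0, 1] range for metrics close to 0 or 1.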
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad \text{Specificity} = \frac{TN}{TN + FP}, \quad \text{PPV} = \frac{TP}{TP + FP}, \quad F_1 = \frac{2 \cdot \text{PPV} \cdot \text{Recall}}{\text{PPV} + \text{Recall}},$$
where TP are true positives, FP are false positives, TN are true negatives, FN are false negatives and PPV is the positive predictive value.
Study design. The study was designed in four steps (Figure 3). First, using all the spectra that were acquired after 24 h of growth on the three culture media, we assessed the machine effect. We used spectra obtained with three of the four machines for the learning phase and we tested the neural network with spectra obtained with the fourth machine. Second, using the same process, we applied the MSIWarp alignment method prior to the learning and testing phases. Third, to test the effect of the culture medium, we detailed the results depending on the culture media used for the growth of the isolates. Finally, using two of the four machines, we acquired the spectra from the 96 isolates at 24 and 48 h of growth to assess the impact of the age of the culture.
Ethical considerations. This study was carried out in accordance with the Declaration of Helsinki. The current study was not considered a study involving humans according to French law No. 2012-300, as no clinical or identifying data were used. All the strains were stored anonymously in the Pitié Salpêtrière Hospital Mycology Laboratory.

Genetic Diversity
Among the 96 selected isolates, 39 were closely related and belonged to our set of clones that we called R2 (Figure 4 and Supplemental Data

Fluconazole Susceptibility
Among our 96 isolates, 51 showed resistance to fluconazole (minimum inhibitory concentration (MIC) ≥ 4 mg/L) and 45 isolates were susceptible or intermediate to fluconazole (MIC < 4 mg/L) (Supplemental Data Table S1). All of the isolates belonging to the R2 set of clones were resistant to fluconazole, and 38/39 had an MIC ≥ 256 mg/L. Only R2 isolate 329 had an MIC of 16 mg/L. Most isolates of the R2a profile (25/26) were from the La Pitié-Salpêtrière Hospital between November 2017 and October 2020, while one isolate of this clone was identified at Bichat-Claude Bernard Hospital in June 2021. Isolates of the R2b profile were detected starting in February 2020 in La Pitié-Salpêtrière and May 2021 at Bichat-Claude Bernard. The outbreaks of C. parapsilosis from both profiles are still ongoing in the two hospitals.

MALDI-TOF Mass Spectrometry Data Analysis
A total of 2258 spectra were acquired and used for determining the machine, alignment and culture medium impacts, and 768 new spectra were acquired for the assessment of the impact of the age of the culture.

Impact of the Machine and of the Alignment with MSIWarp
The results of the CV performed for each of the four machines are compiled in Table 1. Without alignment, the mean accuracy of the CNN model ranged from 68% to 89%, depending on the tested machine. This experiment shows that the machines did not perform equally, even though the acquisition parameters were the same. MSIWarp alignment significantly improved the performance of the CNN model, especially for the two machines that showed lower performances without alignment. Notably, the recall (sensitivity) of the BACT-PSL and BICHAT machines improved from 0.30 to 0.64 and from 0.29 to 0.75, respectively, both without a loss in specificity.
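The machine-wise validation behind these results holds out one machine at a time, as stated in the study design. A minimal sketch of such a split is shown here, combined with the isolate-level separation required by the cross-validation. The record layout, the function name and the toy isolates are hypothetical. Reading the 20 folds as 5 isolate folds times 4 held-out machines would be consistent with this scheme, but the source does not spell this out.

```python
MACHINES = ("MYCO-PSL", "BACT-PSL", "BICHAT", "SAINT-ANTOINE")


def leave_one_machine_out(records, held_out, test_isolates):
    """Train on the other machines and other isolates; test on the held-out machine.

    Spectra that mix the two sides (test isolate on a training machine, or
    training isolate on the held-out machine) are discarded to keep the
    separation strict.
    """
    test_isolates = set(test_isolates)
    train = [r for r in records
             if r["machine"] != held_out and r["isolate"] not in test_isolates]
    test = [r for r in records
            if r["machine"] == held_out and r["isolate"] in test_isolates]
    return train, test


# Toy inventory: five hypothetical isolates, one spectrum per isolate and machine.
toy_records = [
    {"isolate": iso, "machine": machine, "label": "clone" if iso in "AB" else "other"}
    for iso in "ABCDE"
    for machine in MACHINES
]
train_set, test_set = leave_one_machine_out(
    toy_records, held_out="BICHAT", test_isolates={"A", "D"}
)
```

In this toy case, 9 spectra are used for training and 2 for testing, and the remaining 9 are set aside, which is the price of keeping both the machines and the isolates disjoint.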

Impact of the Culture Medium per Machine
Keeping the alignment with MSIWarp, we compared the results obtained on the three culture media per machine (Table 2). Depending on the culture medium and the machine tested, the mean accuracy of the CNN model ranged from 0.77 to 0.96. Except for the MYCO-PSL machine, for which the sensitivity was equivalent in the three culture media, better performance was obtained on Sabouraud-GC for an equivalent specificity.

Impact of the Age of the Culture on Sabouraud Medium
Keeping the alignment with MSIWarp, we compared the performances obtained after 24 h and 48 h of growth on two machines (MYCO-PSL and SAINT-ANTOINE). Spectra from both machines were pooled and CV was performed only on the age of the culture (Table 3). When considering the same ages of the culture for training and testing, the performances were found to be equal, regardless of the metric taken into account (>90%). When the ages of the culture were crossed, especially when the CNN was trained with spectra from cultures grown for 48 h and tested with spectra from cultures grown for 24 h, the performance was disastrous, with all spectra identified as nonclones.

Discussion
Artificial intelligence includes the field of machine learning, which is the development of mathematical algorithms capable of solving problems based on learning from data samples. In this regard, deep learning algorithms (DLs), which use artificial neural networks (ANNs), are a subset of machine learning. In microbiology, particularly in the detection of antimicrobial resistance, these techniques have provided interesting insights [30]. These ANNs are a set of interconnected neurons that are capable of classifying output data from input signals. There are a number of different architectures that can be used, including convolutional neural networks (CNNs), which are known to be very powerful in image recognition [31]. For example, these algorithms have demonstrated their utility in microbiology for automated Gram stain reading [32].
However, in contrast to image recognition, experimental data on MALDI-TOF mass spectra remain scarce. Although MALDI-TOF mass spectrometry has become the main method used for the routine identification of bacteria, yeasts and filamentous fungi, only a few studies have explored the benefit of deep learning algorithms in MALDI-TOF spectra classification. This observation can be applied either in studies designed to distinguish closely related species or to identify a particular characteristic within a microbial species, such as resistance to certain antimicrobial molecules or belonging to an epidemic clone.
To date, no study has focused on the preanalytical steps involved in classifying spectra using a neural network. We show here that these steps are important by highlighting the role of the culture media, the growth time, the machine used to acquire the spectra and, finally, the mathematical treatment applied to the spectrum, in particular its alignment with a reference spectrum before its classification by the neural network.
The result can be excellent, mediocre or disastrous depending on whether these parameters are controlled. Thus, a learning process performed on two machines (MYCO-PSL and SAINT-ANTOINE) from colonies grown for 48 h on Sabouraud-GC agar allowed us to correctly classify 94% of the spectra acquired under the same conditions, while trying to classify spectra acquired after 24 h of growth using the same trained neural network led to disastrous results (all spectra were classified as nonclones). Our results also show that this pitfall could be circumvented by including the two culture times in the learning process, making it possible to obtain a satisfactory classification of the isolates after both 24 and 48 h of culture. The impact of the age of the culture on the shape of the spectrum has already been observed in studies designed to assess the identification performances in medical microbiology, especially for dermatophytes [33,34]. In some cases, this has led to the inclusion of spectra acquired at various ages of the culture in the reference databases to improve the identification performances. In the special case of the search for clones within a yeast species, the degree of precision makes it essential to control this parameter.
Beyond the colony's time of growth, our study showed the importance of the culture medium on which the colonies are grown in obtaining the most reliable results. This was not a surprise for us, as this parameter has often been pointed out in studies, even though those studies concluded that the impact of such variation on identification reliability was not a hurdle. In the case of the search for clones, the level of precision is such that it would be better to consider this parameter. Our study shows that classifying clones was possible either by extending the learning process to several culture media or by restricting the use of the model to spectra obtained from isolates cultured on the same medium as that used for learning.
The same conclusions can be drawn about the machines used for the learning phase and for the tests. In a previous study on Aspergillus flavus clonal detection, we highlighted a machine effect for the learning and testing phases and pointed out difficulties in obtaining satisfactory results with one of the tested machines (BACT-PSL) that was overused [17]. Nevertheless, we show here that by using several machines in the learning phase (leading to an increase in spectra analyzed), it was possible to obtain a satisfactory classification of the spectra for another machine, with 81 to 91% of correctly classified spectra, depending on the machine used to test the model. However, to obtain these results, classification by the neural network should be preceded by an alignment step of the spectra to minimize the variability of the spectra from one machine to the other. Fortunately, such a step can be performed automatically and only takes a fraction of a second for each new spectrum tested on the trained model. Quite unexpectedly, we were able to observe that our neural network could very easily identify the machine on which the spectra had been acquired and the culture medium on which the colony had been grown.
Altogether, these results show that it is possible to use deep neural networks to carry out epidemiological studies at a local level or even across several centers, provided that some parameters are monitored. On the basis of the research carried out in this study, we recommend that any center searching for specific clones in the context of the local spread of an outbreak should perform the learning phase using locally acquired spectra and then test the subsequent model using the same MALDI-TOF mass spectrometer. In addition, the conditions, i.e., culture medium and culture time, under which the colonies were obtained must be identical between the learning and the test phases. In the event that the spectra to be tested are expected to correspond to various acquisition conditions (for example, use of several culture media or several mass spectrometers), we recommend taking these conditions into account in the learning phase. The high impacts of parameters such as culture media or time of growth have also been observed with infrared spectrometry and bacterial typing [35], for which it is recommended to run all samples to be typed in the same experiment. Here, we show that it was possible to obtain satisfactory results when learning and testing were not performed at the same time or on the same machine. This is an interesting finding that needs to be highlighted. The other notable advantage is that a technology commonly used in biomedical laboratories was used as a starting point, which was not the case with the infrared spectrometry study.
However, our study has limitations. First, the number of tested isolates (96, 39 of which corresponded to an outbreak) is low. This certainly restricted the learning abilities of our neural network, as it is well known that the more elements that are included in the learning phase, the better the results are. However, outbreaks occurring in hospital settings usually involve a limited number of cases, especially those involving fungal agents; hence, there is a need to develop approaches suitable for helping with epidemiological investigations as soon as the outbreak is discovered and when the number of cases is still low. Thus, an outbreak involving 39 cases in two different hospitals is already a problem, which is why it is necessary to establish good detection tools.
All isolate identifications in our study were confirmed by MALDI-TOF mass spectrometry using both the Bruker database and MSI-2 online, and all isolates were identified as C. parapsilosis with high scores, confirming the species. Nevertheless, we acknowledge that MALDI-TOF, even with a high score, may not be enough to ensure the quality of the identification results. Therefore, some of the isolates used for this study have been sent to the Belgian collection of microorganisms (IHEM 28980; IHEM 28981; IHEM 28982; IHEM 28983; IHEM 28984; IHEM 28985; IHEM 28986; IHEM 28987; IHEM 28988; IHEM 28989; IHEM 28990; IHEM 28991; IHEM 28992; IHEM 28993). For those 14 isolates now in collection, MALDI-TOF identification was confirmed. We believe that they can be considered as positive controls for our experiment. We did not use an outgroup for our microsatellite experiment or for our neural network. Indeed, we used very specific microsatellite primers that could not match any other Candida spp. Hence, the phylogenetic tree could not include such outgroups, and the very essence of supervised deep learning requires excluding outliers. Including an outgroup in the neural network risks giving uninterpretable results.
In this study, we did not explore the possibilities that artificial intelligence approaches different from deep neural networks could provide (such as support vector machine, PLS discriminant analysis, K nearest neighbors or random forest). We also did not try to develop more sophisticated deep neural networks (recurrent neural network, Siamese neural network, etc.). The focus of this article was rather to explore the different steps preceding the learning phase, as those steps are often overlooked in publications on the matter.

Conclusions
This study should be considered as a proof of concept aiming to highlight the issues in the use of deep learning methods with MALDI-TOF mass spectrometry for the differentiation of clonal strains from non-clonal strains in an epidemic context. The study focuses on the variations that can lead to misidentification by deep learning in the experimental phase prior to acquisition of the spectra. These are crucial elements to integrate into our knowledge in order to build a neural network model that is robust to these constraints. That being said, nothing prevents microbiologists from using a two-step sequential approach when investigating outbreaks of a certain magnitude. First, the use of CNNs could make it possible to identify the strains potentially related to the epidemic; second, confirmatory molecular methods could be implemented to verify that a strain belongs to the epidemic clone. In such cases, it is important to ensure that the CNN model is sensitive enough to detect clonal strains.
Overall, the optimization of MALDI-TOF mass spectrum preparation before classification using deep learning techniques is a newly emerging subject, and much remains to be explored on this topic. However, with this study, we demonstrate that such optimization may enhance deep learning results and should eventually allow pushing the limits of MALDI-TOF mass spectrometry. This may open the way to further improvements in the diagnosis of fungal and bacterial outbreaks as a complement to molecular methods.