Article

Classifying Dry Eye Disease Patients from Healthy Controls Using Machine Learning and Metabolomics Data

by
Sajad Amouei Sheshkal
1,2,3,*,
Morten Gundersen
3,4,
Michael Alexander Riegler
1,2,
Øygunn Aass Utheim
5,
Kjell Gunnar Gundersen
3,
Helge Rootwelt
6,
Katja Benedikte Prestø Elgstøen
6 and
Hugo Lewi Hammer
1,2
1
Department of Computer Science, Oslo Metropolitan University, 0166 Oslo, Norway
2
Department of Holistic Systems, SimulaMet, 0167 Oslo, Norway
3
Ifocus Eye Clinic, 5527 Haugesund, Norway
4
Department of Life Sciences and Health, Oslo Metropolitan University, 0166 Oslo, Norway
5
Department of Ophthalmology, Oslo University Hospital, 0450 Oslo, Norway
6
Department of Medical Biochemistry, Oslo University Hospital, 0450 Oslo, Norway
*
Author to whom correspondence should be addressed.
Diagnostics 2024, 14(23), 2696; https://doi.org/10.3390/diagnostics14232696
Submission received: 23 October 2024 / Revised: 16 November 2024 / Accepted: 21 November 2024 / Published: 29 November 2024
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract
Background: Dry eye disease is a common disorder of the ocular surface, leading patients to seek eye care. Clinical signs and symptoms are currently used to diagnose dry eye disease. Metabolomics, a method for analyzing biological systems, has been found helpful in identifying distinct metabolites in patients and in detecting metabolic profiles that may indicate dry eye disease at early stages. In this study, we explored the use of machine learning and metabolomics data to identify cataract patients who suffer from dry eye disease, a topic that, to our knowledge, has not been previously explored. As there is no one-size-fits-all machine learning model for metabolomics data, choosing the most suitable model can significantly affect the quality of predictions and subsequent metabolomics analyses. Methods: To address this challenge, we conducted a comparative analysis of eight machine learning models on two metabolomics data sets from cataract patients with and without dry eye disease. The models were evaluated and optimized using nested k-fold cross-validation. To assess the performance of these models, we selected a set of evaluation metrics tailored to the data set’s challenges. Results: The logistic regression model performed best overall, achieving the highest area under the curve score of 0.8378, balanced accuracy of 0.735, Matthews correlation coefficient of 0.5147, F1-score of 0.8513, and specificity of 0.5667. Following logistic regression, the XGBoost and Random Forest models also demonstrated good performance. Conclusions: The results show that the logistic regression model with L2 regularization can outperform more complex models on an imbalanced data set with a small sample size and a high number of features, while also avoiding overfitting and delivering consistent performance across cross-validation folds.
Additionally, the results demonstrate that it is possible to identify dry eye in cataract patients from tear film metabolomics data using machine learning models.

1. Introduction

Dry Eye Disease (DED) is a multifaceted disorder characterised by a disruption in the composition, integrity, and stability of the tear film due to various internal and external factors. It is one of the most common reasons people seek eye care, with a severity spectrum ranging from minor, fleeting discomfort to severe, persistent pain and visual function impairment. This progression not only presents a substantial economic and healthcare challenge but also significantly impacts the quality of life of sufferers and the broader community. The incidence of DED notably increases following cataract surgery, highlighting the critical need for ophthalmologists to thoroughly evaluate for existing DED and to implement proactive treatment approaches. The presence of DED before surgery can also complicate the precision of pre-surgical measurements, necessitate the reduction of intra-operative factors that could harm the ocular surface, and require the adoption of post-surgical care protocols to prevent the worsening of DED symptoms [1,2,3,4,5,6]. Clinical signs and symptoms are currently used to diagnose dry eye disease; however, the correlation between signs and symptoms is weak, leading to challenges in diagnosing and monitoring DED [7].
Previous research on diagnosing dry eye disease has utilized machine learning models applied to diverse data sets, including clinical data [8,9,10], imaging data [11,12,13], and patient-reported outcomes [14]. These studies have demonstrated the potential of machine learning to enhance diagnostic accuracy and identify relevant patterns associated with dry eye disease [14]. Different types of data represent distinct sources of information and can provide different insights into dry eye disease [15,16]. Omics data sets, such as genomics, proteomics, and metabolomics, can offer a deeper understanding of the disease by providing detailed molecular information related to genetic predisposition, underlying pathophysiology and current physiochemical status of the tear film as the end results of endogenous and exogenous processes. These molecular data provide a broader view of individuals, allowing for a variety of applications such as detection, personalized treatment, and disease monitoring with greater accuracy [1].
Advancements in omics technologies allow researchers to explore the genome, transcriptome, proteome, and more, providing in-depth insights into the molecular mechanisms underlying diseases. Despite their utility, single omics approaches are insufficient for comprehensively understanding the intricate interactions between genes, RNA, proteins, and environmental factors. Metabolomics distinguishes itself by revealing how metabolites, the dynamic output of gene, mRNA, and protein function, respond to various internal and external stimuli. This makes metabolomics a crucial instrument for unraveling complex biological processes and deepening our understanding of the origins, progression, and treatment outcomes of diseases [1].
In the context of DED, metabolomics holds the promise of identifying disease-specific metabolite profiles. These profiles can play an important role in enhancing the early diagnosis of DED, and in elucidating its etiology and pathology [1]. By identifying specific metabolic pathways and therapeutic targets, metabolomics can guide the choice of personalized treatment plans and improve the prediction of patient outcomes [1]. Furthermore, this approach can significantly enhance the monitoring of disease progression and the evaluation of treatment efficacy, ultimately leading to more effective management of DED [1,17]. By comparing metabolomic data from patients with that of a control group, researchers and clinicians can gain valuable insights, further enriching the biomedical and clinical understanding of DED.
In metabolomics research, the use of machine learning models is important for unraveling complex metabolic networks and identifying biomarkers. Each machine learning algorithm offers unique capabilities, necessitating careful selection to ensure the effectiveness of a study [18]. The choice of a machine learning algorithm significantly impacts the investigation’s success, as it must align with the specific characteristics of the data, including its dimensionality and the biological complexity [18]. Researchers dedicate efforts to evaluate the predictive performance of various machine learning algorithms, comparing them to traditional statistical methods to identify the most suitable approach for their specific needs [19].
However, selecting the optimal machine learning algorithm for metabolomics studies is challenging. Comparative studies often reveal that the effectiveness of a machine learning model can vary greatly depending on the research context and data attributes. This inconsistency underlines the fact that there is no one-size-fits-all machine learning algorithm for metabolomics [20,21,22,23,24]. Researchers are encouraged to develop a nuanced understanding of the advantages and operational intricacies of each machine learning model. By tailoring their choice of algorithm to the specific demands and nuances of their data, scientists can enhance the accuracy and interpretability of their findings [18]. Despite the lack of a universal guideline for the selection of machine learning algorithms, this iterative process of assessment and application is essential to advance metabolomics research and discover new biological insights.
This study focused on evaluating the efficacy of various machine learning models to classify cataract patients based on the presence or absence of DED, utilising metabolomics data collected from individuals awaiting cataract surgery. To our knowledge, this is the first application of machine learning methodologies to this specific group of patients.
Identifying effective machine learning models can help more accurately classify cataract patients with and without DED. This can also provide more reliable machine learning models for use by explainable artificial intelligence methods in identifying metabolomics signatures.
The following are our main contributions:
  • Conducting a comparative study of machine learning techniques on challenging data sets characterized by imbalanced classes, low sample size, and high dimensionality.
  • Classifying cataract patients with and without dry eye disease using metabolomics data from the participants’ tear films.
  • Fine-tuning models using a nested k-fold cross-validation framework to identify the most effective models.
  • Employing early-stage integration strategy to combine metabolomics data from positive and negative modes to enhance model performance.
  • Using a set of performance metrics suitable for the data set’s challenges to appropriately evaluate the developed models.
The rest of the paper is organized as follows: Section 2 details the data collection process, preprocessing steps, and the machine learning models used in this study, along with the evaluation metrics applied. Section 3 presents the performance of the machine learning models, including the impact of hyperparameter tuning, model consistency, and the interpretation of the most effective model. The limitations of the work and future directions are discussed in Section 3.5, and Section 4 presents the conclusions.

2. Materials and Methods

Our research methodology involved an initial preprocessing phase, in which the data sets were standardised and normalised. Subsequently, we employed the k-fold cross-validation technique to evaluate the performance of different machine learning models. Furthermore, for hyperparameter tuning, we ran an inner cross-validation within the overall cross-validation. This research methodology, depicted in Figure 1, allowed us to assess both the default and the optimised performance of the machine learning models on our data sets.

2.1. Data Description

The metabolomics data sets used in this study come from a clinical study of 222 patients scheduled for cataract surgery, conducted from August 2020 to January 2022 at Ifocus Eye Clinic in Haugesund, Norway [25,26,27,28,29]. The protocol was rigorous: good mental health was required, and no systemic diseases that could affect the corneal surface were allowed. Patients were excluded if they had any manifestations of corneal disease, corneal scarring, or if they had previously undergone corneal refractive procedures [28]. Prior to cataract surgery, all participants were examined for dry eye disease using a set of clinical tests, with specific cut-off values detailed in previously published studies [25,28]. This selection process ensured that only eligible participants were included in the study, based on both their general health and the condition of their corneal surface.
The patients were sub-grouped into dry versus normal eyes based on clinical examination in a standardised manner. Of the 218 participants, tear samples harvested using Schirmer strips from 54 of the dry eye positive (DED+) group and 27 from the dry eye negative (DED-) group were selected randomly and subjected to complete global metabolomics analysis based on a well-validated global Liquid Chromatography–Mass Spectrometry (LC-MS) method at Oslo University Hospital [30]. The random selection of 81 patients from the larger pool of 218 ensured that the patients chosen for metabolomics analysis were not biased in any particular direction. Comprehensive metabolome coverage was obtained by analyzing the samples in both positive and negative ionization modes on the metabolomic instrument: Electrospray Ionization Positive (ESI+) and Electrospray Ionization Negative (ESI−) [30]. As most participants in the clinical study had dry eye disease, the number of samples with dry eye is greater than those without dry eye.
In the ESI+ mode, a total of 1922 metabolites were detected, of which 611 were given a proposed annotation by the Compound Discoverer software. The remainder are metabolite features without annotation. Similarly, the ESI− mode revealed 939 metabolites, of which 401 were annotated and the rest currently unidentified. The annotated metabolites include both metabolites identified at the highest level of confidence, and metabolites with lower levels of confidence of identification. Since identifying metabolites is a complex and resource-intensive task that requires careful examination and much work by biochemical experts, our aim is to first use machine learning to identify the most important metabolite features. Subsequently, this small group of key metabolite features will be analyzed in greater detail in our lab to try to accurately identify them at the highest level of confidence.
The metabolomics study followed the tenets of the Declaration of Helsinki and was approved by the Regional Committee for Medical and Health Research Ethics in Norway (Reference number 2020/140664). The data sets were anonymized prior to the machine learning experiments by removing all information that could potentially identify the participants. Ultimately, the data set consisted solely of the metabolomics data for the participants, ensuring that the anonymization process did not introduce any limitations to this study. The regional ethics committee was consulted and, given the full anonymity of the data sets, additional approval was not required.
Previous research in the omics field has demonstrated that combining ESI+ and ESI− modes in mass spectrometry enables a more comprehensive analysis of samples, aimed at broadening the scope of analyte detection [31,32,33,34,35]. This dual-mode approach enhances the detection of diverse molecular components, increases the total number of assigned molecular features, and provides complementary data. These combined data sets result in a more complete chemical profile [36,37,38].
Merging the data sets from ESI+ and ESI− modes is complementary, as each mode ionizes and detects different types of molecules, although some metabolites can be detected in both ionization modes. The two modes together can access molecular features that are not detectable when using only one mode. This diversity in detected features enriches the data set, allowing machine learning models to learn from a broader and more varied set of data, which in turn enables the identification of more complex patterns and relationships. Consequently, this integration enhances the predictive capabilities of the machine learning models, although it increases the demand for memory and computational resources during model training. We employed an early integration approach to merge the data sets from both ESI+ and ESI− modes [39].
The metabolomics data sets from both ionization modes were merged, aiming to utilize all features of the patients for feeding the machine learning model. The integration of metabolomics data sets from both ionization modes was designed to augment the data set and enhance the predictive capability of machine learning models for dry eye disease in cataract patients. Information about the metabolomics data sets, categorized by each ionization mode is provided in Table 1.
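In practice, the early integration strategy amounts to column-wise concatenation of the two feature tables. The sketch below illustrates this on toy data; the feature names, patient identifiers, and values are hypothetical, and it assumes both tables are indexed by the same sample identifiers:

```python
import pandas as pd

# Hypothetical feature tables: rows = patients, columns = metabolite features.
# Names and values are illustrative, not the actual study data.
esi_pos = pd.DataFrame(
    {"pos_feat_1": [1.2, 0.8, 1.5], "pos_feat_2": [0.3, 0.9, 0.4]},
    index=["P01", "P02", "P03"],
)
esi_neg = pd.DataFrame(
    {"neg_feat_1": [2.1, 1.7, 2.4], "neg_feat_2": [0.6, 0.2, 0.5]},
    index=["P01", "P02", "P03"],
)

# Early integration: concatenate the two feature blocks along the column
# axis so each patient is represented by one combined feature vector.
merged = pd.concat([esi_pos, esi_neg], axis=1)
print(merged.shape)  # 3 patients, 2 + 2 features
```

Each patient row in `merged` then carries the ESI+ and ESI− features side by side, which is what the machine learning models are trained on.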

2.2. Data Preprocessing

Missing values were imputed using the default gap-filling algorithm in Compound Discoverer, which applies methods such as filling based on spectrum noise, simulated peaks, or trace area. As a result, no missing values remain in the data sets. Normalization is important in metabolomics data sets to mitigate the effects of variation arising from instrument drift [40,41]. Automatic normalization of each metabolite, based on an identical pooled quality control sample analyzed at regular intervals throughout the experiment, combined with logarithmic transformation, helps mitigate this issue by stabilizing the variance across the data set. By taking the logarithm of the metabolite intensities, we reduce the influence of outliers and compress the dynamic range, making the data more homoscedastic. To address disparities between high- and low-intensity features and to decrease the variability in data spread (heteroscedasticity), we standardized each metabolite’s logarithmic value. This standardization consisted of subtracting the mean ( x̄ ) of the logarithm of each metabolite value and dividing by its standard deviation ( s ) [42], as described in Equation (1). Next, quantile normalization was applied with the aim of further minimizing sample-to-sample variation, ensuring a more uniform data structure for subsequent analyses [43,44,45,46]. By applying quantile normalization, we adjust the distribution of metabolite intensities across all samples to a common reference distribution, thereby reducing sample-to-sample variability.
$$\hat{x}_{ij} = \frac{\log_2 x_{ij} - \bar{x}_i}{s_i} \quad (1)$$
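The log transform, per-metabolite standardization of Equation (1), and quantile normalization can be sketched as follows on a toy intensity matrix. This is a minimal illustration only: the actual pipeline additionally relies on Compound Discoverer’s gap filling and pooled-QC normalization, and the quantile normalization here is a simple rank-based implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy intensity matrix: rows = samples, columns = metabolite features.
X = rng.lognormal(mean=5.0, sigma=1.0, size=(8, 6))

# 1) Log2 transform to compress the dynamic range and tame outliers.
X_log = np.log2(X)

# 2) Standardize each metabolite (column): subtract its mean and divide
#    by its standard deviation, as in Equation (1).
X_std = (X_log - X_log.mean(axis=0)) / X_log.std(axis=0)

# 3) Quantile normalization: map every sample (row) onto a common
#    reference distribution (the mean of the sorted values over samples).
order = np.argsort(X_std, axis=1)
ranks = np.argsort(order, axis=1)          # rank of each value in its row
reference = np.sort(X_std, axis=1).mean(axis=0)
X_qn = reference[ranks]

# After quantile normalization, every sample has identical sorted values.
assert np.allclose(np.sort(X_qn, axis=1), reference)
```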

2.3. Machine Learning Models and Evaluation Metrics

For classifying cataract patients based on dry eye disease status, six supervised machine learning techniques were employed: XGBoost [47], Random Forest [48], Support Vector Machine [49], Multilayer Perceptron [50], Logistic Regression [51], and K-Nearest Neighbor [52]. In addition, two dummy classifiers with different strategies were used as baselines for the performance benchmark. These supervised machine learning models were chosen because they represent a range of common methods in metabolomics research [18]. Furthermore, machine learning models, particularly tree-based models, are more appropriate for tabular data sets than deep learning models because they handle complex and non-linear relationships better by dividing the data into regions and fitting models within each region, effectively capturing irregularities [53]. Additionally, tree-based models require much less tuning and far fewer computational resources than deep learning models, which is particularly important in studies with limited data.
Dummy classifiers with a uniform strategy and a most-frequent strategy were selected to predict each class with equal probability and to always predict the most frequently observed class in the training data set, respectively; they serve as baselines for interpreting the machine learning models’ performance. The dummy classifiers do not use information from the features and are quite basic. Therefore, if a machine learning model performs better than the dummy classifiers, it indicates that the model has been able to learn from the features.
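A minimal sketch of the two baselines, using scikit-learn’s DummyClassifier on synthetic data with the same 54/27 class split as the metabolomics cohort (the features are random numbers and purely illustrative):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(42)
# Toy data mimicking the DED+/DED- imbalance: 54 positives, 27 negatives.
X = rng.normal(size=(81, 10))
y = np.array([1] * 54 + [0] * 27)

# 'uniform' guesses each class with equal probability;
# 'most_frequent' always predicts the majority class (DED+).
for strategy in ("uniform", "most_frequent"):
    clf = DummyClassifier(strategy=strategy, random_state=0).fit(X, y)
    print(strategy, clf.score(X, y))
```

Note that the most-frequent baseline reaches 54/81 ≈ 0.67 plain accuracy while never detecting a single DED- patient, which is precisely why balanced accuracy and specificity are reported alongside the other metrics.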
To accurately assess the effectiveness of the eight machine learning models deployed, it is important to employ suitable evaluation metrics. Consequently, in this study, a set of criteria was utilized for the evaluation, including the Area Under the Curve (AUC), Balanced Accuracy, Matthews Correlation Coefficient (MCC), Specificity, and the F1-Score. The selected metrics provide a clear view of the performance of the machine learning models when applied to metabolomics data sets, ensuring a reliable analysis of their predictive capabilities and overall accuracy. More details about the machine learning models and evaluation metrics can be found in Appendix A.
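The selected metrics can be computed with scikit-learn as sketched below on hypothetical predictions for ten patients; specificity is derived from the confusion matrix, since scikit-learn does not provide a dedicated specificity function:

```python
import numpy as np
from sklearn.metrics import (balanced_accuracy_score, confusion_matrix,
                             f1_score, matthews_corrcoef, roc_auc_score)

# Hypothetical labels and scores for 10 patients (1 = DED+, 0 = DED-).
y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.55, 0.3, 0.2])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "AUC": roc_auc_score(y_true, y_score),
    "Balanced accuracy": balanced_accuracy_score(y_true, y_pred),
    "MCC": matthews_corrcoef(y_true, y_pred),
    "Specificity": tn / (tn + fp),  # true-negative rate
    "F1": f1_score(y_true, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```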

2.4. Code and Availability

The experiments for this study were conducted using the Python programming language within Google Colaboratory. This environment provided a Central Processing Unit (CPU) backend, equipped with 13 GB of RAM and 108 GB of storage space. The machine learning libraries utilized in the research included sklearn version 1.2.2, numpy version 1.23.5, seaborn version 0.13.1, pandas version 1.5.3, matplotlib version 3.7.1, and scipy version 1.11.4. The Jupyter notebooks employed in this research are accessible at the following location: https://github.com/sajadamouei/classification-metabolomics (accessed on 10 November 2024).

3. Results and Discussion

To explore the predictive capacity of machine learning models on an imbalanced metabolomics binary classification task, we adopted a structured evaluation methodology. The data sets reflect the high-dimensional structure common to metabolomics studies. Given the challenges posed by the data sets’ imbalance and complexity, we implemented a careful approach to model development and validation to ensure the reliability of our findings.
Initially, we developed eight machine learning models on the entire data set. These models were selected based on their ability to handle challenges in the data set and their various learning mechanisms, providing a broad range of evaluations. The development and evaluation of these models were carried out using a stratified 10-fold cross-validation technique. We selected 10 for k in the k-fold cross-validation as it offers a good balance between bias and variance in model evaluation, considering the sample size [54]. Stratification in the 10-fold cross-validation involves dividing the data set into ten folds of equal size, ensuring that the distribution of labels is consistent across all folds, thereby maintaining the representativeness of the data set. In each iteration, nine folds were used to train the model, and the remaining fold served as a test set. This process was repeated ten times, each fold serving as a test set once, ensuring that every data point was used for both training and testing. The performance of each model was then evaluated based on the mean of the metrics obtained from the ten iterations, providing an aggregate measure of the model’s effectiveness across the entire data set.
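The outer stratified 10-fold evaluation can be sketched as follows on synthetic data with roughly the same size and imbalance as the merged data set; the feature dimensions and the model shown are illustrative stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the merged metabolomics data: 81 samples with a
# roughly 2:1 class imbalance and many features.
X, y = make_classification(n_samples=81, n_features=200, n_informative=10,
                           weights=[1 / 3, 2 / 3], random_state=0)

# Stratified 10-fold CV: each fold preserves the DED+/DED- proportions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="roc_auc")
print(f"mean AUC over folds: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The reported performance is then the mean of the per-fold metrics, exactly as described above.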
In the field of machine learning, hyperparameters are variables that define the model’s structure or the characteristics of the learning algorithm, and are set before training begins. Unlike hyperparameters, other parameters like coefficients or weights change during training. The need and number of hyperparameters differ depending on the algorithm. To examine the impact of the model optimization, we enhanced our evaluation methodology by adopting a nested stratified cross-validation strategy for fine-tuning these hyperparameters.
Given the specific characteristics and challenges of our data sets, we opted for a limited search space tailored to each model to prevent overfitting, utilizing a grid search method for this purpose. In this approach, an additional stratified 5-fold cross-validation dedicated to hyperparameter tuning was conducted within each fold of the primary stratified 10-fold cross-validation. This nested setup enabled us to further partition the training folds from the outer loop into smaller sub-folds, helping to refine the model parameters and stabilize their performance across different folds. Here, four sub-folds were employed for training with different hyperparameter settings, while the fifth sub-fold acted as a validation set to evaluate these settings. The best-performing hyperparameter set on the validation sets from the five inner folds was then selected as the optimal configuration for each model. With these optimal hyperparameters, the model was trained anew on the complete training data set from the outer loop, prior to its assessment on the test fold.
Algorithm 1 illustrates the process of stratified 10-fold cross-validation, including the incorporation of nested stratified 5-fold cross-validation for hyperparameter tuning.
Algorithm 1: Stratified 10-Fold Cross-Validation
  • Input: Dataset D with N instances, each comprising a feature set and a class label.
  • Output: Mean performance metrics of the models.
  • Procedure:
    • Stratify D by class labels to ensure the class distribution is mirrored across folds.
    • Divide D into 10 equal parts, F1, F2, …, F10, maintaining stratification.
    • For each fold Fi:
      (a) Treat Fi as the test set and the remaining folds as the training set.
      (b) If tuning is needed, perform a stratified 5-fold CV on the training set to optimize hyperparameters.
      (c) Train the model on the training set, using the optimized hyperparameters if applicable.
      (d) Predict the label for each data point in the test set Fi and store the predictions.
    • Evaluate the performance by comparing the predictions from all the test sets F1, …, F10 with the true labels.
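Algorithm 1 maps directly onto scikit-learn primitives: a GridSearchCV with an inner stratified 5-fold CV for step (b), evaluated by cross_val_score over an outer stratified 10-fold CV. The data, parameter grid, and model below are illustrative stand-ins, not the study’s exact search space:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score)

# Synthetic stand-in for the merged data set (81 samples, imbalanced).
X, y = make_classification(n_samples=81, n_features=100, n_informative=10,
                           weights=[1 / 3, 2 / 3], random_state=0)

# Inner 5-fold CV selects hyperparameters; the outer 10-fold CV measures
# generalization of the tuned model, as in Algorithm 1.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

grid = GridSearchCV(
    LogisticRegression(penalty="l2", max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0], "class_weight": [None, "balanced"]},
    scoring="roc_auc",
    cv=inner,
)
outer_scores = cross_val_score(grid, X, y, cv=outer, scoring="roc_auc")
print(f"nested-CV AUC: {outer_scores.mean():.3f}")
```

Because the tuning happens inside each outer training fold, the outer test folds remain untouched by hyperparameter selection, avoiding optimistic bias.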

3.1. Results of Hyperparameter Tuning

To improve the performance of the machine learning models, optimization and hyperparameter tuning for Extreme Gradient Boosting (XGBoost), Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Logistic Ridge Regression (LR), and K-Nearest Neighbors (k-NN) were performed using grid search within a nested stratified k-fold cross-validation approach. The AUC metric was used in the grid search to obtain the optimal hyperparameters for the machine learning models. In this study, the LR model was selected as the most suitable based on its performance across all evaluation metrics, which will be presented in detail in the next subsection. For the hyperparameters of the LR, the L2 penalty was chosen as the regularization method to help prevent overfitting. The ‘max_iter’ hyperparameter was set to 100 to define the maximum number of iterations allowed for the solvers to converge. Other hyperparameters selected for tuning included ‘C’, which controls the inverse of the regularization strength; ‘solver’, the algorithm used to optimize the model parameters; and ‘class_weight’, which addresses imbalances in the classes. These hyperparameters were tuned using the nested k-fold cross-validation approach and a grid search strategy. The search space and the optimal hyperparameters identified for the LR model are presented in Table 2.

3.2. Performance of Machine Learning Models

Table 3, Table 4 and Table 5 summarize the efficacy of the different machine learning models across the three metabolomics data sets, using five evaluation metrics. The values presented in these tables are the averages obtained from the 10-fold cross-validation process. First, we examine the performance obtained by the machine learning models on the three metabolomics data sets for each metric, in order to identify which data set is most suitable for machine learning models to detect dry eye in cataract patients.
The best AUC result achieved by machine learning models on the merged data set was 0.8378, surpassing their effectiveness on the ESI− and ESI+ data sets by 0.0495 and 0.0211, respectively. The highest balanced accuracy recorded was 0.735 in both the merged and ESI+ data sets, outperforming the ESI− data set by 0.0233. Similarly, for the MCC metric, models performed better on the merged and ESI+ data sets, achieving a score of 0.5147, an improvement of 0.0801 over the ESI− data set. In terms of specificity, models scored highest on the ESI− data set at 0.7167, exceeding the scores in the merged and ESI+ data sets by 0.15. For the F1-score, the peak performance was observed in the ESI+ data set at 0.8669, higher than in the merged and ESI− data sets by 0.0092 and 0.0174, respectively.
These results suggest that models generally performed better on the merged data set in terms of AUC, balanced accuracy, and MCC, though they fell slightly short in specificity and F1-Score. Therefore, these outcomes indicate that the merged data set can enhance model performance across some metrics, and achieve more balanced performance across all metrics, suggesting a beneficial effect of merging the ESI+ and ESI− data sets on model efficacy.
Following, we explore the effectiveness of various machine learning models, focusing on the merged data set, as detailed in Table 5, to find the most suitable models for identifying dry eye in cataract patients. The results in Table 5 reveal that, compared to baseline approaches that involve random guesses or selecting the most frequent class (represented by Dummy Classifiers), almost all the models demonstrated better performance across multiple evaluation metrics.
According to Table 5, the LR model has the highest AUC at 0.8378. The AUC metric, commonly used in medical machine learning applications for disease prediction problems, shows that the LR model effectively differentiates between the two classes. Compared to the other tested machine learning methods, this model achieves a higher AUC. After LR, the RF and XGBoost models recorded the next highest scores of 0.7878 and 0.7861, respectively. Due to the imbalance in the data set’s classes, other metrics besides the AUC were also considered to provide a clearer comparison of the models.
In terms of balanced accuracy, LR achieved the top score of 0.735, outperforming the other models on the merged data set. This metric highlights the model’s effectiveness on an imbalanced data set by valuing the performance across both the minority and majority classes equally. XGBoost and RF followed with scores of 0.72 and 0.675, respectively.
For the MCC, LR again led with a score of 0.5147, indicating its better handling of the imbalanced data set compared to the other models. XGBoost and RF followed with scores of 0.4639 and 0.42, respectively.
For specificity, LR and XGBoost achieved the highest score of 0.5667, and MLP followed with a score of 0.45. The negative samples constitute the minority class in the data sets, making this metric important for demonstrating the models’ ability to correctly detect negative samples.
Regarding the F1-score, the RF model outperformed the others with a score of 0.8577, closely followed by LR and XGBoost with scores of 0.8513 and 0.8337, respectively.
In summary, this comparative analysis indicates that the LR model achieved the highest performance in AUC, balanced accuracy, MCC, and specificity compared to the other machine learning models on the merged data set. With an F1-score close to the highest, it maintained the most balanced performance across all metrics among the machine learning models in this study on the merged metabolomics data set.
Previous research shows that the LR model has been successful in biological and clinical contexts [55,56]. The results of this study demonstrate that the LR model outperforms the other machine learning models and suggest that it is possible to achieve high performance even with a quite simple model. The more complex models can quickly suffer from overfitting on the fairly small data set in this study. Additionally, LR is generally less computationally intensive and converges faster than more complex methods. This may be another reason for its more stable training and prediction, especially on this study’s data set, characterised by a high-dimensional feature space and a limited number of samples. This is further discussed in Section 3.3.
The next best machine learning models following the LR model on the merged data set are XGBoost and RF, which are ensemble models. They achieved satisfactory performance on some evaluation metrics, but underperformed on others, and overall did not achieve consistent performance across all evaluation metrics.
The confusion matrices and ROC curves for the two best models, LR and XGBoost, are presented in Figure 2 to complement the reported metrics, providing detailed insight into the models’ classification results and their ability to distinguish between DED-positive and DED-negative patients.
In this study, the weakest results were observed with k-NN and SVM. Despite the theoretical benefits often attributed to SVMs, especially with an RBF kernel, in bioinformatics and for high-dimensional data sets [18,57,58], the results presented in Table 5 show performance comparable to that of the baseline dummy classifiers across several evaluation metrics on the merged data set. These results emphasize the need for a careful and nuanced approach when selecting models for metabolomics data sets.
Furthermore, integrating the data sets from the two ionization modes into a single data set proved to improve model training by providing a wider range of features and patterns. This approach enriches the data pool, helping the models generalize better and improving prediction accuracy.
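The merging strategy described above can be sketched as follows. This is a minimal illustration with synthetic stand-in arrays (the variable names `X_pos` and `X_neg` and their dimensions are hypothetical, not the study's actual feature counts): features from both ionization modes are concatenated column-wise so that each patient keeps a single row.

```python
import numpy as np

# Synthetic stand-ins for the per-mode feature matrices (one row per patient).
rng = np.random.default_rng(0)
X_pos = rng.normal(size=(30, 100))  # hypothetical ESI+ feature matrix
X_neg = rng.normal(size=(30, 80))   # hypothetical ESI- feature matrix

# Rows must be aligned on the same patients before concatenation;
# the merged matrix then carries features from both ionization modes.
X_merged = np.hstack([X_pos, X_neg])
print(X_merged.shape)
```

The key requirement is row alignment: both matrices must list the same patients in the same order before concatenation.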

3.3. Model Performance Consistency

Figure 3 shows classification performance in terms of AUC along the y-axis and the standard deviation of the measured AUC across the cross-validation folds along the x-axis. The standard deviation therefore indicates how consistent the performance of each machine learning method is under repeated training. If a method has high classification performance over the folds, we also expect high consistency over the folds; put differently, it is not possible to achieve high classification performance while at the same time having low consistency. Inspecting the three panels in Figure 3, we see that this is the overall trend, but there are also clear differences in consistency between methods with similar classification performance. Intuitively, algorithms that are simple to train, with a convex loss landscape, such as logistic ridge regression (a linear classifier), can be expected to show higher consistency than models with a more complex loss landscape. Indeed, logistic ridge regression shows high consistency relative to its classification performance on all three data sets (Figure 3a–c).
Among the methods with a mean AUC above 0.75 on the ESI+ data set (Figure 3a), XGBoost, in addition to logistic ridge regression, also shows high consistency, while MLP, SVM, and RF are less consistent. k-NN likewise shows poor consistency, as expected given its low mean AUC.
Among the methods with a mean AUC above 0.7 on the ESI− data set (Figure 3b), only logistic ridge regression shows high consistency, while MLP and XGBoost show medium consistency. The remaining methods show poorer classification performance and, as expected, low consistency.
Among the methods with a mean AUC above 0.75 on the merged data set (Figure 3c), XGBoost, in addition to logistic ridge regression, shows high consistency, while MLP, RF, and SVM show medium consistency.
To summarize, logistic ridge regression shows high consistency and XGBoost medium-to-high consistency in classification performance, while the other methods show medium-to-low consistency.
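The per-fold consistency measure used above (mean and standard deviation of the AUC across cross-validation folds) can be computed as in the following sketch. The data here are a synthetic stand-in generated with `make_classification`, not the study's metabolomics data, and the hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced, high-dimensional stand-in for the metabolomics data.
X, y = make_classification(n_samples=60, n_features=200,
                           weights=[0.3, 0.7], random_state=0)

# L2-penalized (ridge) logistic regression, the study's best-performing model.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=5000)

# Per-fold AUC over stratified 10-fold cross-validation; the standard
# deviation of these scores is the consistency measure plotted in Figure 3.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC = {fold_aucs.mean():.3f}, std = {fold_aucs.std():.3f}")
```

A low standard deviation relative to the mean AUC corresponds to a point in the lower-right region of the panels in Figure 3.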

3.4. Logistic Regression Model Interpretation

Table 6 displays the Mass-to-Charge Ratio (M/Z) and Retention Time (RT) values of the features with the ten largest estimated regression parameters, which correspond to the molecules most strongly associated with dry eye. Nine of the ten features, those with ranks 1–7, 9, and 10 in Table 6, have positive estimates, indicating an association with an increased risk of dry eye. The molecule with rank 8 has a negative estimate, suggesting an association with a reduced risk of dry eye. This predominance of positive estimates was expected, as the data sets are imbalanced, with the majority being cataract patients with dry eye disease.
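The coefficient ranking behind Table 6 can be reproduced with a few lines of scikit-learn, as sketched below on synthetic stand-in data. The feature names (`mz…_rt…`) are hypothetical placeholders for the M/Z–RT pairs; the sign of each coefficient indicates the direction of association.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; feature names are hypothetical M/Z-RT labels.
X, y = make_classification(n_samples=60, n_features=50, random_state=0)
feature_names = [f"mz{i}_rt{i}" for i in range(X.shape[1])]

# Fit L2-regularized logistic regression and rank features by the
# magnitude of their estimated coefficients.
model = LogisticRegression(penalty="l2", max_iter=5000).fit(X, y)
coefs = model.coef_.ravel()
top10 = np.argsort(np.abs(coefs))[::-1][:10]
for rank, idx in enumerate(top10, start=1):
    direction = "increased" if coefs[idx] > 0 else "reduced"
    print(f"{rank:2d}. {feature_names[idx]}  coef={coefs[idx]:+.3f}  ({direction} risk)")
```

Positive coefficients raise the predicted log-odds of dry eye disease, negative ones lower it, matching the interpretation given for Table 6.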

3.5. Limitations and Future Works

The machine learning models used in this study were evaluated on data sets obtained from a single clinic. While this limits the generalizability of the findings, it ensures that the analyzed data reflect the biological characteristics of the samples rather than variations in sampling techniques or sample handling procedures. However, we are not aware of any other data set with the same properties as the one used in this study (tear film metabolomics and dry eye disease). Additionally, the size of the metabolomics data could be increased by collecting more patient samples. This would enable the exploration of more complex models, such as deep learning and transfer learning, potentially improving the classification of cataract patients with and without dry eye disease.
Machine learning models trained on metabolomics data cannot currently be used directly in clinical medical systems due to the complexity of data collection and the high thresholds required for metrics such as specificity, precision, and sensitivity in clinical applications. However, it is important to note that our study was designed to identify the best-performing machine learning model and gain insights into metabolomic features associated with dry eye disease, rather than to produce a clinically deployable tool. These insights can then be further explored in detail through biological studies to identify biomarkers associated with dry eye disease. The identified biomarkers can in the future potentially be used in clinical settings in combination with other patient information, and together reach the required thresholds for a reliable medical decision system.

4. Conclusions

In this study, we focused on applying machine learning techniques to classify cataract patients, distinguishing between those with and without dry eye disease in a cohort of patients scheduled for cataract surgery. Dry eyes adversely affect patients’ quality of life and represent a risk factor for patients undergoing cataract surgery. Selecting the most suitable machine learning models for metabolomics data sets is crucial because no single model excels in all scenarios, and their accuracy and interpretability can influence subsequent steps in the application of these models in the metabolomics field. Therefore, in response to the challenge of selecting the optimal machine learning model for our specific imbalanced binary classification metabolomics data sets, we evaluated eight machine learning models to determine the most effective method for classifying cataract patients with and without dry eye disease.
To assess these models, we employed k-fold cross-validation across the entire data sets and nested k-fold cross-validation for tuning hyperparameters. Additionally, multiple evaluation metrics were utilized to gain a clearer understanding of the models’ behavior and effectiveness, which is particularly important because the data sets in this study were imbalanced.
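The nested cross-validation scheme can be sketched as follows: an inner loop tunes the hyperparameters while an outer loop estimates generalization performance on data the tuning never saw. The data, the parameter grid, and the fold counts here are illustrative assumptions, not the study's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score)

# Synthetic stand-in for the metabolomics data.
X, y = make_classification(n_samples=60, n_features=100, random_state=0)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Inner loop: tune the L2 penalty strength C (illustrative grid).
tuned = GridSearchCV(LogisticRegression(penalty="l2", max_iter=5000),
                     param_grid={"C": [0.01, 0.1, 1, 10]},
                     cv=inner, scoring="roc_auc")

# Outer loop: each outer test fold scores a model whose hyperparameters
# were chosen without seeing that fold, avoiding optimistic bias.
outer_scores = cross_val_score(tuned, X, y, cv=outer, scoring="roc_auc")
print(f"nested-CV AUC = {outer_scores.mean():.3f}")
```

Because hyperparameter selection happens entirely inside each outer training split, the outer scores are an unbiased estimate of how the tuned model would perform on unseen patients.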
The results highlighted that the LR model outperformed the other machine learning models on the merged metabolomics data set, achieving an AUC of 0.8378 , balanced accuracy of 0.735 , and MCC of 0.5147 . It also achieved a specificity of 0.5667 and an F1-score of 0.8513 , close to that of the best model, demonstrating balanced performance across all evaluation metrics. The LR model also showed consistent performance across the 10-fold cross-validation, making it the most reliable machine learning model in this study on the metabolomics data sets. In real-world clinical settings, such consistency is important, as it implies that a model’s performance is reliable across different patient populations and generalizes well to new, unseen samples. XGBoost and RF showed good performance on some evaluation metrics but could not maintain consistent performance across all metrics and folds. The SVM with RBF kernel and the k-NN method showed poor performance in this study.
Furthermore, the results of this study demonstrate that the integration of data sets with positive and negative ionization modes enriches the models’ training process. This contributes to a more nuanced feature set that enhances generalization capabilities and prediction accuracy, positioning data set merging as a strategic approach to improve machine learning model efficacy for these specific metabolomics data sets.
The LR model outperformed the other models in this study, and the results obtained are promising. We believe that the LR model could aid medical experts in the detection of dry eye disease in cataract patients using metabolomics data.

Author Contributions

Conceptualization, S.A.S., H.L.H. and M.A.R.; methodology, S.A.S., H.L.H. and M.A.R.; software, S.A.S., H.L.H. and M.A.R.; validation, S.A.S., H.L.H., M.A.R., Ø.A.U., M.G., K.B.P.E. and H.R.; formal analysis, S.A.S., H.L.H. and M.A.R.; investigation, S.A.S., H.L.H. and M.A.R.; resources, S.A.S., H.L.H., M.A.R., Ø.A.U., M.G., K.B.P.E., H.R. and K.G.G.; data curation, S.A.S., H.L.H., M.A.R., Ø.A.U., K.B.P.E., H.R. and M.G.; writing—original draft preparation, S.A.S., H.L.H., M.A.R., K.B.P.E., H.R., Ø.A.U., M.G. and K.G.G.; writing—review and editing, S.A.S., H.L.H., M.A.R., K.B.P.E., H.R., Ø.A.U., M.G. and K.G.G.; visualization, S.A.S. and H.L.H.; supervision, H.L.H. and M.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Regional Committee for Medical and Health Research Ethics in Norway (protocol code 2020/140664).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data sets used in this study are available in the GitHub repository at https://github.com/sajadamouei/classification-metabolomics (accessed on 10 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Machine Learning Models

For classifying cataract patients based on dry eye disease status, six supervised machine learning techniques were employed, together with two dummy classifiers with different strategies as baselines for the performance benchmark. The methods employed in this study are as follows:
XGBoost: XGBoost [47] is an ensemble learning method built on decision tree models. It employs a gradient boosting (GB) framework and is a modular tree-boosting method commonly used in machine learning. XGBoost integrates collections of initially weak classifiers through iterative refinement, culminating in a classifier of superior accuracy. It offers parallel tree boosting, also known as Gradient Boosting Decision Tree (GBDT) or Gradient Boosting Machine (GBM), to address a range of data science problems quickly and accurately. Given its proficiency in handling high-dimensional sparse data and its ability to evaluate feature importance, XGBoost is a particularly suitable choice for analysing metabolomics data sets. Additionally, XGBoost has been widely used in real-world applications, such as predicting disease outcomes, due to its computational efficiency and scalability [59].
RF [48]: RF is a classification model that creates numerous decision trees, each developed from a different subset of randomly chosen input variables [60]. The method is resistant to overfitting due to its ensemble nature and does not require feature scaling prior to analysis. In essence, RF is an effective classification tool that provides an inherent unbiased estimate of the generalisation error and is also robust to outliers [61].
SVM [49]: SVMs represent a classification approach that differs from conventional methods by focusing not just on minimising training error, but on reducing the upper bound of the generalisation error by maximising the margin between the training data and the separating hyperplane. An advantage of SVMs, particularly with the Radial Basis Function (RBF) kernel, is their ability to handle non-linear relationships effectively. The RBF kernel maps the input space into a higher-dimensional space where the data points can become linearly separable, allowing a more flexible decision boundary [62].
MLP [50]: MLP is a variant of feedforward Artificial Neural Networks (ANNs). It is structured around three principal layers: an input layer, one or more hidden layers, and an output layer. MLPs operate by forwarding input through the network in a single direction, through layers of unidirectionally connected nodes, and are commonly trained using the backpropagation technique. They incorporate non-linear activation functions, which transform the incoming signal at a node into the outgoing signal, facilitating the network’s ability to model complex relationships.
LR [51]: Logistic Regression is a statistical approach for modelling binary outcomes by establishing the relationship between the outcome and various predictor variables. The technique is widely applied in fields such as medicine, biology, and epidemiology to solve binary classification challenges, where the goal is to categorise data into one of two groups [63]. When dealing with data sets that contain a vast number of features, there is a risk that logistic models will overfit. To address this, this study employs L2 (ridge) regularization [64], which is why we describe our method as logistic ridge regression. L2 regularization reduces overfitting by adding a penalty term proportional to the square of the feature weights, preventing the model from overweighting particular features and resulting in better generalization [65].
k-NN [52]: The k-NN algorithm is a non-parametric method used for classification and regression, which predicts the label of a sample by aggregating the labels of its k nearest neighbors in the feature space. Its benefits include simplicity, effectiveness in multi-class settings, and the ability to adapt to the local structure of the data.
Dummy Classifier (DC): DC generates outcomes without considering the input features. It acts as a baseline for comparing the effectiveness of more sophisticated classification models and is not intended for practical problem-solving applications. In this study, the ‘uniform’ and ‘most frequent’ strategies were selected for the DC, which we refer to as DCU and DCM, respectively. The ‘uniform’ strategy predicts each class with equal probability, regardless of its frequency in the data set, thereby serving as a baseline to ensure that any predictive model performs better than chance. The ‘most frequent’ strategy always predicts the most common class observed in the training data set and serves as a baseline for the highest possible accuracy that any model can achieve by simply predicting the most frequent class.
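The eight classifiers described above could be instantiated roughly as follows with scikit-learn. The hyperparameter values shown are illustrative placeholders, not the study's tuned settings, and `xgboost.XGBClassifier` would be used for XGBoost in practice; scikit-learn's `GradientBoostingClassifier` stands in here only to keep the sketch dependency-free.

```python
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Illustrative instantiation of the eight models compared in this study.
models = {
    # Stand-in for xgboost.XGBClassifier to avoid the extra dependency.
    "XGBoost (GB stand-in)": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM (RBF)": SVC(kernel="rbf", probability=True, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                         random_state=0),
    "LR (L2 ridge)": LogisticRegression(penalty="l2", max_iter=5000),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "DCU": DummyClassifier(strategy="uniform", random_state=0),
    "DCM": DummyClassifier(strategy="most_frequent"),
}
```

Each model exposes the same `fit`/`predict` interface, which is what makes the uniform cross-validated comparison in this study straightforward.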

Appendix A.2. Evaluation Metrics

In this study, a set of criteria was utilized for the evaluation, including the AUC, Balanced Accuracy, MCC, Specificity, and the F1-Score, which are described in detail as follows:
The AUC in binary classification refers to the area under the Receiver Operating Characteristic (ROC) curve. This curve plots the true positive rate against the false positive rate at various threshold settings, providing a measure of a model’s ability to discriminate between the positive and negative classes across all possible thresholds. An AUC value close to 1.0 indicates perfect prediction, while a value of 0.5 suggests performance no better than chance. A higher AUC value indicates a stronger capability of the model to distinguish between cataract patients with and without dry eye. The AUC is often considered a reliable performance metric for imbalanced binary classification problems [66]. However, even a high AUC on an imbalanced data set does not guarantee that the classification performance is as good as the value suggests. We therefore used additional metrics that reflect different aspects of machine learning capability and compared them across the metabolomics data sets.
In the following metrics, TP represents the count of samples with dry eye disease correctly identified by the model. TN denotes the count of samples without dry eye disease accurately predicted as negative. In contrast, FP refers to samples without dry eye disease that were incorrectly predicted to have the disease, while FN accounts for samples with dry eye disease that were mistakenly classified as negative by the model.
The specificity metric indicates the proportion of correctly identified negative instances. It highlights a model’s proficiency in accurately classifying negative classes. This metric allows us to understand the model’s ability to identify cataract patients who do not have dry eye, which is important given the lower prevalence of the negative class in the data sets of this study. The formula for specificity is provided in (A1).
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad \text{(A1)}$$
The sensitivity assesses the percentage of positive instances that are accurately identified. It addresses the issue of how many actual positives are properly classified. In this study, the sensitivity metric provides insight into the model’s effectiveness in correctly detecting dry eye, which is the majority class. The formula for sensitivity is in (A2).
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad \text{(A2)}$$
Balanced accuracy [67] calculates the average of specificity and sensitivity, making it a valuable metric for scenarios where there are unequal sample sizes across classes. It is appropriate for this study since the data sets are imbalanced, as it helps prevent overestimation of performance evaluations. This metric assesses the accuracy of each class individually and then determines the simple average of these accuracies, as demonstrated in (A3).
$$\mathrm{Balanced\ Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \quad \text{(A3)}$$
The MCC metric is appropriate for evaluating binary classification models, particularly when faced with imbalanced distributions of classes. By including true positives, true negatives, false positives, and false negatives in its calculation, MCC offers a balanced evaluation of a model’s accuracy. This metric provides insight into the model’s true predictive power in the context of predicting dry eye disease in cataract patients, where classes are imbalanced. The MCC formula is demonstrated in (A4).
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad \text{(A4)}$$
The F1-score is the harmonic mean of precision and sensitivity. This balance ensures that both sensitivity and precision are taken into account, which discourages extreme values. In this study, the F1-score metric provides a balanced view of the model’s performance by correctly identifying cataract patients with dry eye (sensitivity) while minimizing the misclassification of cataract patients without dry eye (precision). The calculation of the F1-score follows (A5).
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{sensitivity}}{\mathrm{precision} + \mathrm{sensitivity}} \quad \text{(A5)}$$
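Formulas (A1)–(A5) can be computed directly from the four confusion-matrix counts, as in the sketch below. The counts are illustrative values, not the study's actual confusion matrix.

```python
import math

# Illustrative confusion-matrix counts (not the study's actual results).
TP, TN, FP, FN = 50, 17, 13, 10

specificity = TN / (TN + FP)                       # (A1)
sensitivity = TP / (TP + FN)                       # (A2)
balanced_accuracy = (sensitivity + specificity) / 2  # (A3)
mcc = (TP * TN - FP * FN) / math.sqrt(             # (A4)
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
precision = TP / (TP + FP)
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # (A5)

print(f"specificity={specificity:.4f}, balanced accuracy={balanced_accuracy:.4f}, "
      f"MCC={mcc:.4f}, F1={f1:.4f}")
```

Note how specificity and balanced accuracy react to the minority-class errors (FP) that plain accuracy would largely ignore on an imbalanced data set.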

References

1. Yazdani, M.; Elgstøen, K.B.P.; Rootwelt, H.; Shahdadfar, A.; Utheim, Ø.A.; Utheim, T.P. Tear metabolomics in dry eye disease: A review. Int. J. Mol. Sci. 2019, 20, 3755.
2. Naderi, K.; Gormley, J.; O’Brart, D. Cataract surgery and dry eye disease: A review. Eur. J. Ophthalmol. 2020, 30, 840–855.
3. Zeev, M.S.B.; Miller, D.D.; Latkany, R. Diagnosis of dry eye disease and emerging technologies. Clin. Ophthalmol. 2014, 8, 581–590.
4. Dana, R.; Bradley, J.L.; Guerin, A.; Pivneva, I.; Stillman, I.Ö.; Evans, A.M.; Schaumberg, D.A. Estimated prevalence and incidence of dry eye disease based on coding analysis of a large, all-age United States health care system. Am. J. Ophthalmol. 2019, 202, 47–54.
5. Gomes, J.A.; Santo, R.M. The impact of dry eye disease treatment on patient satisfaction and quality of life: A review. Ocul. Surf. 2019, 17, 9–19.
6. Zheng, Y.; Wu, X.; Lin, X.; Lin, H. The prevalence of depression and depressive symptoms among eye disease patients: A systematic review and meta-analysis. Sci. Rep. 2017, 7, 46453.
7. Wolffsohn, J.S.; Arita, R.; Chalmers, R.; Djalilian, A.; Dogru, M.; Dumbleton, K.; Gupta, P.K.; Karpecki, P.; Lazreg, S.; Pult, H.; et al. TFOS DEWS II diagnostic methodology report. Ocul. Surf. 2017, 15, 539–574.
8. Choi, H.R.; Lee, J.H.; Lee, H.K.; Song, J.S.; Kim, H.C. Association between dyslipidemia and dry eye syndrome among the Korean middle-aged population. Cornea 2020, 39, 161–167.
9. Nam, S.M.; Peterson, T.A.; Butte, A.J.; Seo, K.Y.; Han, H.W. Explanatory model of dry eye disease using health and nutrition examinations: Machine learning and network-based factor analysis from a national survey. JMIR Med. Inform. 2020, 8, e16153.
10. Kaido, M.; Kawashima, M.; Yokoi, N.; Fukui, M.; Ichihashi, Y.; Kato, H.; Yamatsuji, M.; Nishida, M.; Fukagawa, K.; Kinoshita, S.; et al. Advanced dry eye screening for visual display terminal workers using functional visual acuity measurement: The Moriguchi study. Br. J. Ophthalmol. 2015, 99, 1488–1492.
11. Aggarwal, S.; Kheirkhah, A.; Cavalcanti, B.M.; Cruzat, A.; Jamali, A.; Hamrah, P. Correlation of corneal immune cell changes with clinical severity in dry eye disease: An in vivo confocal microscopy study. Ocul. Surf. 2021, 19, 183–189.
12. Deng, X.; Tian, L.; Liu, Z.; Zhou, Y.; Jie, Y. A deep learning approach for the quantification of lower tear meniscus height. Biomed. Signal Process. Control 2021, 68, 102655.
13. Elsawy, A.; Eleiwa, T.; Chase, C.; Ozcan, E.; Tolba, M.; Feuer, W.; Abdel-Mottaleb, M.; Abou Shousha, M. Multidisease deep learning neural network for the diagnosis of corneal diseases. Am. J. Ophthalmol. 2021, 226, 252–261.
14. Storås, A.M.; Strümke, I.; Riegler, M.A.; Grauslund, J.; Hammer, H.L.; Yazidi, A.; Halvorsen, P.; Gundersen, K.G.; Utheim, T.P.; Jackson, C.J. Artificial intelligence in dry eye disease. Ocul. Surf. 2022, 23, 74–86.
15. Tong, Y.; Lu, W.; Yu, Y.; Shen, Y. Application of machine learning in ophthalmic imaging modalities. Eye Vis. 2020, 7, 22.
16. Bali, A.; Mansotra, V. Analysis of deep learning techniques for prediction of eye diseases: A systematic review. Arch. Comput. Methods Eng. 2024, 31, 487–520.
17. Smoleńska, Ż.; Zdrojewski, Z. Metabolomics and its potential in diagnosis, prognosis and treatment of rheumatic diseases. Reumatol. 2015, 53, 152–156.
18. Galal, A.; Talal, M.; Moustafa, A. Applications of machine learning in metabolomics: Disease modeling and classification. Front. Genet. 2022, 13, 1017340.
19. Shah, H.A.; Liu, J.; Yang, Z.; Feng, J. Review of machine learning methods for the prediction and reconstruction of metabolic pathways. Front. Mol. Biosci. 2021, 8, 634141.
20. Mendez, K.M.; Reinke, S.N.; Broadhurst, D.I. A comparative evaluation of the generalised predictive ability of eight machine learning algorithms across ten clinical metabolomics data sets for binary classification. Metabolomics 2019, 15, 150.
21. Delafiori, J.; Navarro, L.C.; Siciliano, R.F.; de Melo, G.C.; Busanello, E.N.B.; Nicolau, J.C.; Sales, G.M.; de Oliveira, A.N.; Val, F.F.A.; de Oliveira, D.N.; et al. Covid-19 automated diagnosis and risk assessment through metabolomics and machine learning. Anal. Chem. 2021, 93, 2471–2479.
22. Yagin, F.H.; Alkhateeb, A.; Raza, A.; Samee, N.A.; Mahmoud, N.F.; Colak, C.; Yagin, B. An Explainable Artificial Intelligence Model Proposed for the Prediction of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome and the Identification of Distinctive Metabolites. Diagnostics 2023, 13, 3495.
23. Hu, C.; Li, L.; Li, Y.; Wang, F.; Hu, B.; Peng, Z. Explainable machine-learning model for prediction of in-hospital mortality in septic patients requiring intensive care unit readmission. Infect. Dis. Ther. 2022, 11, 1695–1713.
24. Tiedt, S.; Brandmaier, S.; Kollmeier, H.; Duering, M.; Artati, A.; Adamski, J.; Klein, M.; Liebig, T.; Holdt, L.M.; Teupser, D.; et al. Circulating metabolites differentiate acute ischemic stroke from stroke mimics. Ann. Neurol. 2020, 88, 736–746.
25. Nilsen, C.; Graae Jensen, P.; Gundersen, M.; Utheim, Ø.A.; Gjerdrum, B.; Gundersen, K.G.; Jahanlu, D.; Potvin, R. The significance of inter-eye osmolarity difference in dry eye diagnostics. Clin. Ophthalmol. 2023, 17, 829–835.
26. Jensen, P.; Nilsen, C.; Gundersen, M.; Gundersen, K.G.; Potvin, R.; Gazerani, P.; Chen, X.; Utheim, T.P.; Utheim, Ø.A. A Preservative-Free Approach–Effects on Dry Eye Signs and Symptoms After Cataract Surgery. Clin. Ophthalmol. 2024, 18, 591–604.
27. Gundersen, M.; Jensen, P.; Nilsen, C.; Yazdani, M.; Utheim, Ø.A.; Sandås, E.M.; Rootwelt, H.; Gundersen, K.G.; Elgstøen, K.B.P. Method Development for Omics Analyses using Schirmer Strips. Curr. Eye Res. 2024, 49, 708–716.
28. Graae Jensen, P.; Gundersen, M.; Nilsen, C.; Gundersen, K.G.; Potvin, R.; Gazerani, P.; Chen, X.; Utheim, T.P.; Utheim, Ø.A. Prevalence of dry eye disease among individuals scheduled for cataract surgery in a Norwegian cataract clinic. Clin. Ophthalmol. 2023, 17, 1233–1243.
29. Nilsen, C.; Gundersen, M.; Graae Jensen, P.; Gundersen, K.G.; Potvin, R.; Utheim, Ø.A.; Gjerdrum, B. The Significance of Dry Eye Signs on Preoperative Keratometry Measurements in Patients Scheduled for Cataract Surgery. Clin. Ophthalmol. 2024, 18, 151–161.
30. Skogvold, H.B.; Sandås, E.M.; Østeby, A.; Løkken, C.; Rootwelt, H.; Rønning, P.O.; Wilson, S.R.; Elgstøen, K.B.P. Bridging the polar and hydrophobic metabolome in single-run untargeted liquid chromatography-mass spectrometry dried blood spot metabolomics for clinical purposes. J. Proteome Res. 2021, 20, 4010–4021.
31. Ohno, T.; Sleighter, R.L.; Hatcher, P.G. Comparative study of organic matter chemical characterization using negative and positive mode electrospray ionization ultrahigh-resolution mass spectrometry. Anal. Bioanal. Chem. 2016, 408, 2497–2504.
32. Calderón-Santiago, M.; Fernández-Peralbo, M.A.; Priego-Capote, F.; Luque de Castro, M.D. MSCombine: A tool for merging untargeted metabolomic data from high-resolution mass spectrometry in the positive and negative ionization modes. Metabolomics 2016, 12, 43.
33. Sforza, S.; Silipo, A.; Molinaro, A.; Marchelli, R.; Parrilli, M.; Lanzetta, R. Determination of fatty acid positions in native lipid A by positive and negative electrospray ionization mass spectrometry. J. Mass Spectrom. 2004, 39, 378–383.
34. Amante, E.; Cerrato, A.; Alladio, E.; Capriotti, A.L.; Cavaliere, C.; Marini, F.; Montone, C.M.; Piovesana, S.; Laganà, A.; Vincenti, M. Comprehensive biomarker profiles and chemometric filtering of urinary metabolomics for effective discrimination of prostate carcinoma from benign hyperplasia. Sci. Rep. 2022, 12, 4361.
35. Panagopoulos Abrahamsson, D.; Wang, A.; Jiang, T.; Wang, M.; Siddharth, A.; Morello-Frosch, R.; Park, J.S.; Sirota, M.; Woodruff, T.J. A comprehensive non-targeted analysis study of the prenatal exposome. Environ. Sci. Technol. 2021, 55, 10542–10557.
36. Hu, M.; Müller, E.; Schymanski, E.L.; Ruttkies, C.; Schulze, T.; Brack, W.; Krauss, M. Performance of combined fragmentation and retention prediction for the identification of organic micropollutants by LC-HRMS. Anal. Bioanal. Chem. 2018, 410, 1931–1941.
37. Lin, P.; Rincon, A.G.; Kalberer, M.; Yu, J.Z. Elemental composition of HULIS in the Pearl River Delta Region, China: Results inferred from positive and negative electrospray high resolution mass spectrometric data. Environ. Sci. Technol. 2012, 46, 7454–7462.
38. Penanes, P.A.; Gorshkov, V.; Ivanov, M.V.; Gorshkov, M.V.; Kjeldsen, F. Potential of Negative-Ion-Mode Proteomics: An MS1-Only Approach. J. Proteome Res. 2023, 22, 2734–2742.
39. Cai, Z.; Poulos, R.C.; Liu, J.; Zhong, Q. Machine learning for multi-omics data integration in cancer. Iscience 2022, 25, 103798.
40. Sun, J.; Xia, Y. Pretreating and normalizing metabolomics data for statistical analysis. Genes Dis. 2023, 11, 100979.
41. Misra, B.B. Data normalization strategies in metabolomics: Current challenges, approaches, and tools. Eur. J. Mass Spectrom. 2020, 26, 165–174.
42. Van den Berg, R.A.; Hoefsloot, H.C.; Westerhuis, J.A.; Smilde, A.K.; Van der Werf, M.J. Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genom. 2006, 7, 142.
43. Jauhiainen, A.; Madhu, B.; Narita, M.; Narita, M.; Griffiths, J.; Tavare, S. Normalization of metabolomics data with applications to correlation maps. Bioinformatics 2014, 30, 2155–2161.
44. Alakwaa, F.M.; Chaudhary, K.; Garmire, L.X. Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data. J. Proteome Res. 2018, 17, 337–347.
45. Meena, J.; Hasija, Y. Application of explainable artificial intelligence in the identification of Squamous Cell Carcinoma biomarkers. Comput. Biol. Med. 2022, 146, 105505.
46. Li, B.; Tang, J.; Yang, Q.; Cui, X.; Li, S.; Chen, S.; Cao, Q.; Xue, W.; Chen, N.; Zhu, F. Performance evaluation and online realization of data-driven normalization methods used in LC/MS based untargeted metabolomics analysis. Sci. Rep. 2016, 6, 38881.
47. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
48. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
49. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
50. Baum, E.B. On the capabilities of multilayer perceptrons. J. Complex. 1988, 4, 193–215.
51. Hosmer, D.W., Jr.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013.
52. Franco-Lopez, H.; Ek, A.R.; Bauer, M.E. Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method. Remote Sens. Environ. 2001, 77, 251–274.
53. Shwartz-Ziv, R.; Armon, A. Tabular data: Deep learning is not all you need. Inf. Fusion 2022, 81, 84–90.
54. Marcot, B.G.; Hanea, A.M. What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis? Comput. Stat. 2021, 36, 2009–2031.
55. Waddington, K.E.; Papadaki, A.; Coelewij, L.; Adriani, M.; Nytrova, P.; Kubala Havrdova, E.; Fogdell-Hahn, A.; Farrell, R.; Dönnes, P.; Pineda-Torra, I.; et al. Using serum metabolomics to predict development of anti-drug antibodies in multiple sclerosis patients treated with IFNβ. Front. Immunol. 2020, 11, 544281.
56. Boateng, E.Y.; Abaye, D.A. A review of the logistic regression model with emphasis on medical research. J. Data Anal. Inf. Process. 2019, 7, 190.
57. Zheng, H.; Zheng, P.; Zhao, L.; Jia, J.; Tang, S.; Xu, P.; Xie, P.; Gao, H. Predictive diagnosis of major depression using NMR-based metabolomics and least-squares support vector machine. Clin. Chim. Acta 2017, 464, 223–227.
58. Corona, R.I.; Sudarshan, S.; Aluru, S.; Guo, J.T. An SVM-based method for assessment of transcription factor-DNA complex models. BMC Bioinform. 2018, 19, 49–57.
59. Yuan, Y.; Du, J.; Luo, J.; Zhu, Y.; Huang, Q.; Zhang, M. Discrimination of missing data types in metabolomics data based on particle swarm optimization algorithm and XGBoost model. Sci. Rep. 2024, 14, 152.
60. Cutler, D.R.; Edwards, T.C., Jr.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random forests for classification in ecology. Ecology 2007, 88, 2783–2792.
61. Strobl, C.; Malley, J.; Tutz, G. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychol. Methods 2009, 14, 323.
  62. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
  63. Nusinovici, S.; Tham, Y.C.; Yan, M.Y.C.; Ting, D.S.W.; Li, J.; Sabanayagam, C.; Wong, T.Y.; Cheng, C.Y. Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 2020, 122, 56–69. [Google Scholar] [CrossRef] [PubMed]
  64. Ng, A.Y. Feature selection, L 1 vs. L 2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; p. 78. [Google Scholar]
  65. Lewkowycz, A.; Gur-Ari, G. On the training dynamics of deep networks with L_2 regularization. Adv. Neural Inf. Process. Syst. 2020, 33, 4790–4799. [Google Scholar]
  66. Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory undersampling for class-imbalance learning. IEEE Trans. Syst. Man, Cybern. Part B (Cybern.) 2008, 39, 539–550. [Google Scholar]
  67. Jiao, Y.; Du, P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant. Biol. 2016, 4, 320–330. [Google Scholar] [CrossRef]
Figure 1. Operational flow of the proposed dry eye disease classification pipeline.
Figure 2. Confusion matrices and ROC curves for logistic regression and XGBoost models. In the confusion matrices (subfigures (a,c)), the true labels are represented on the vertical axis, and the predicted labels are on the horizontal axis, with the color intensity indicating the number of predictions in each category. The ROC curves (subfigures (b,d)) plot the True Positive Rate against the False Positive Rate, demonstrating the trade-off between the two metrics for varying classification thresholds. The diagonal dashed line represents a random classifier (AUC = 0.5), while the blue line shows the performance of the respective model.
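The ROC areas shown in Figure 2 and reported in the tables have a useful rank interpretation: the AUC equals the probability that a randomly chosen DED+ sample receives a higher model score than a randomly chosen DED− sample (the Mann–Whitney U formulation). A minimal stdlib-only sketch under that formulation, with the function name and any example scores being illustrative rather than study data:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as 1/2 (Mann-Whitney U / ROC area)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is O(n·m) and only suitable for small samples, but it makes the rank interpretation explicit; in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.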
Figure 3. Comparison of model performance: AUC scores from 10-fold cross-validation, highlighting the balance between mean AUC and AUC standard deviation. The lighter and darker shades indicate the base and optimized models, respectively.
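The two quantities Figure 3 plots can be sketched directly from per-fold scores. The fold-level AUC values below are purely illustrative assumptions, since the study reports only the aggregated results:

```python
from statistics import mean, stdev

# Hypothetical AUC score from each of the 10 cross-validation folds for one
# model; these numbers are illustrative, not values from the study.
fold_aucs = [0.78, 0.85, 0.81, 0.90, 0.76, 0.83, 0.88, 0.79, 0.84, 0.80]

mean_auc = mean(fold_aucs)   # central tendency across folds
auc_sd = stdev(fold_aucs)    # sample SD: fold-to-fold stability
```

A lower SD at a similar mean AUC indicates more consistent generalization across folds, which is the stability argument made for logistic regression in the conclusions.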
Table 1. Metabolomics datasets information.
|  | M1 ESI+ | M1 ESI− |
| --- | --- | --- |
| Samples | 81 | 81 |
| Metabolites | 1922 | 939 |
| DED− | 27 | 27 |
| DED+ | 54 | 54 |

Abbreviations: DED+ = participants with dry eye disease; DED− = participants without dry eye disease.
Table 2. Hyperparameter Tuning for Logistic Regression Model.
| Hyperparameter | Search Area | Optimal Value |
| --- | --- | --- |
| C | 1.0, 0.1 | 1.0 |
| Solver | lbfgs, newton-cg | lbfgs |
| Class Weight | None, balanced | None |
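The tuning in Table 2 is an exhaustive search over a small grid, scored by the inner loop of the nested cross-validation. A minimal stdlib sketch with a hypothetical scoring function standing in for that inner loop (the toy AUC values are assumptions, not study results):

```python
from itertools import product

# Toy stand-in for the inner cross-validation loop: any callable mapping one
# hyperparameter combination to a mean validation AUC would fit here. The
# scores below are illustrative assumptions only.
def mean_cv_auc(params):
    toy_scores = {
        ("lbfgs", 1.0): 0.81, ("lbfgs", 0.1): 0.78,
        ("newton-cg", 1.0): 0.80, ("newton-cg", 0.1): 0.77,
    }
    return toy_scores[(params["solver"], params["C"])]

search_area = {
    "C": [1.0, 0.1],
    "solver": ["lbfgs", "newton-cg"],
    "class_weight": [None, "balanced"],
}

def grid_search(grid, score_fn):
    """Score every hyperparameter combination and keep the best one."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In the study's setting, scikit-learn's `GridSearchCV` run inside each outer fold of the nested cross-validation would play the role of `grid_search` plus `mean_cv_auc`.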
Table 3. Comparative Performance of Machine Learning Models on the ESI+ data set. The values presented are the averages obtained from the 10-fold cross-validation process.
| Model | AUC | B.A. | MCC | Spec. | F1-Score |
| --- | --- | --- | --- | --- | --- |
| XGB | **0.8167** | 0.6667 | 0.3345 | 0.5167 | 0.7937 |
| RF | 0.7631 | 0.7083 | 0.4776 | 0.4333 | **0.8669** |
| SVM | 0.7733 | 0.6517 | 0.3495 | 0.3833 | 0.8304 |
| MLP | 0.7494 | 0.6117 | 0.2579 | 0.55 | 0.6885 |
| LR | 0.8111 | **0.735** | **0.5147** | **0.5667** | 0.8513 |
| K-NN | 0.6292 | 0.6217 | 0.2832 | 0.3 | 0.8236 |
| DCM | 0.5 | 0.5 | 0 | 0 | 0.7987 |
| DCU | 0.5 | 0.4867 | −0.0319 | 0.1 | 0.7522 |

Abbreviations: B.A. = Balanced Accuracy; Spec. = Specificity. Bold values indicate the best performance for the respective metric.
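Every metric in Tables 3–5 can be derived from the four confusion-matrix counts of the kind shown in Figure 2. A minimal sketch, assuming DED+ is the positive class (the function name and any example counts are illustrative):

```python
import math

def metrics_from_counts(tp, fn, fp, tn):
    """Derive the reported metrics from confusion-matrix counts
    (tp/fn/fp/tn = true positive, false negative, false positive,
    true negative), with DED+ as the positive class."""
    sensitivity = tp / (tp + fn)                  # true positive rate
    specificity = tn / (tn + fp)                  # true negative rate (Spec.)
    balanced_acc = (sensitivity + specificity) / 2
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"B.A.": balanced_acc, "MCC": mcc, "Spec.": specificity, "F1": f1}
```

Balanced accuracy and MCC are the most informative columns here because the data set is imbalanced (54 DED+ vs. 27 DED−), so plain accuracy and F1 can look favorable even for a majority-class classifier, as the DCM rows illustrate.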
Table 4. Comparative Performance of Machine Learning Models on the ESI− data set. The values presented are the averages obtained from the 10-fold cross-validation process.
| Model | AUC | B.A. | MCC | Spec. | F1-Score |
| --- | --- | --- | --- | --- | --- |
| XGB | **0.7883** | **0.7117** | **0.4346** | 0.5167 | **0.8495** |
| RF | 0.5683 | 0.5283 | 0.0717 | 0.1333 | 0.7837 |
| SVM | 0.5372 | 0.4433 | −0.1366 | 0 | 0.7367 |
| MLP | 0.7056 | 0.6333 | 0.2592 | **0.7167** | 0.637 |
| LR | 0.7311 | 0.615 | 0.2135 | 0.4167 | 0.7725 |
| K-NN | 0.4506 | 0.4533 | −0.103 | 0.15 | 0.6859 |
| DCM | 0.5 | 0.5 | 0 | 0 | 0.7987 |
| DCU | 0.5 | 0.4867 | −0.0319 | 0.1 | 0.7522 |

Abbreviations: B.A. = Balanced Accuracy; Spec. = Specificity. Bold values indicate the best performance for the respective metric.
Table 5. Comparative Performance of Machine Learning Models on the merged ESI+ and ESI− data set. The values presented are the averages obtained from the 10-fold cross-validation process.
| Model | AUC | B.A. | MCC | Spec. | F1-Score |
| --- | --- | --- | --- | --- | --- |
| XGB | 0.7861 | 0.72 | 0.4639 | **0.5667** | 0.8337 |
| RF | 0.7878 | 0.675 | 0.42 | 0.3667 | **0.8577** |
| SVM | 0.76 | 0.61 | 0.2583 | 0.3 | 0.8162 |
| MLP | 0.7811 | 0.6567 | 0.3419 | 0.45 | 0.8071 |
| LR | **0.8378** | **0.735** | **0.5147** | **0.5667** | 0.8513 |
| K-NN | 0.6614 | 0.6183 | 0.2756 | 0.3333 | 0.8083 |
| DCM | 0.5 | 0.5 | 0 | 0 | 0.7987 |
| DCU | 0.5 | 0.4867 | −0.0319 | 0.1 | 0.7522 |

Abbreviations: B.A. = Balanced Accuracy; Spec. = Specificity. Bold values indicate the best performance for the respective metric.
Table 6. Top 10 molecular features (metabolites) most strongly associated with dry eye disease, sorted by the size of the estimated logistic regression parameters.
| Rank | m/z | RT | Coefficient |
| --- | --- | --- | --- |
| 1 | 460.26924 | 15.83 | 0.06868 |
| 2 | 227.08249 | 15.831 | 0.06539 |
| 3 | 149.09612 | 15.833 | 0.0641 |
| 4 | 236.0881 | 15.83 | 0.05915 |
| 5 | 499.16388 | 15.832 | 0.05755 |
| 6 | 245.09347 | 15.827 | 0.05497 |
| 7 | 137.04576 | 4.177 | 0.05428 |
| 8 | 350.98829 | 23.703 | −0.0534 |
| 9 | 587.28264 | 15.829 | 0.0533 |
| 10 | 459.20277 | 15.866 | 0.05314 |
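Table 6's ranking follows the absolute size of the fitted logistic regression coefficients, which is why the one negatively associated feature (rank 8) sits above smaller positive coefficients. A sketch of that ranking step on a hypothetical subset of the table's (m/z, RT, coefficient) triples:

```python
# Hypothetical subset of (m/z, RT, coefficient) triples drawn from Table 6;
# ranking by absolute coefficient size reproduces the table's ordering rule.
features = [
    (350.98829, 23.703, -0.0534),  # negative coefficient still ranks high
    (460.26924, 15.830, 0.06868),
    (137.04576, 4.177, 0.05428),
]

# Sort by |coefficient|, largest first, regardless of sign.
ranked = sorted(features, key=lambda f: abs(f[2]), reverse=True)
```

Ranking by magnitude treats positive and negative associations with dry eye disease as equally important; the sign only indicates the direction of the association.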