Article

A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection

Department of Biomedical Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 80-233 Gdansk, Poland
Appl. Sci. 2024, 14(23), 11187; https://doi.org/10.3390/app142311187
Submission received: 25 October 2024 / Revised: 26 November 2024 / Accepted: 27 November 2024 / Published: 30 November 2024


Featured Application

The presented study can contribute to improving familial hypercholesterolemia classification and may help reduce the number of undiagnosed cases of the disease.

Abstract

This study presents an assessment of familial hypercholesterolemia (FH) probability using different algorithms (CatBoost, XGBoost, Random Forest, SVM) and their ensembles, leveraging electronic health record data. The primary objective is to explore an enhanced method for estimating FH probability, surpassing the currently recommended Dutch Lipid Clinic Network (DLCN) Score. The models were trained on the largest Polish cohort of patients enrolled in an FH clinic, all of whom underwent genetic testing for FH-associated mutations. The initial dataset comprised over 100 parameters per patient, which was reduced to 48 clinically accessible features to ensure applicability in routine outpatient settings. To preserve balance, the data were stratified according to DLCN score ranges (0–2, 3–5, 6–8, and ≥9), representing varying levels of FH likelihood. The dataset was then split into training and test sets with an 80/20 ratio. Machine-learning models were trained, with hyperparameters optimized via grid search. The accuracy of the DLCN score in predicting FH was first evaluated by examining the proportion of patients with positive DNA tests among those with a DLCN score of 6 and above, the threshold for genetic testing. The DLCN score demonstrated an accuracy of approximately 40%. In contrast, the CatBoost model and its ensembles achieved over 80% accuracy. While the DLCN score remains a clinically valuable tool, its diagnostic accuracy is limited. The findings indicate that the ML models offer a substantial improvement in the precision of FH diagnosis, demonstrating their potential to enhance clinical decision making in identifying patients with FH.

1. Introduction

Familial hypercholesterolemia (FH) is an autosomal dominant genetic disorder of lipid metabolism, characterized by chronically elevated low-density lipoprotein cholesterol (LDL-C) levels in blood serum. This condition significantly accelerates the development of early atherosclerosis and predisposes individuals to premature coronary heart disease [1]. In Poland, FH affects approximately one in 250 adults [2], yet a substantial number remain undiagnosed. It is estimated that nearly 98% of affected individuals remain unidentified, totaling approximately 102,000 undiagnosed patients within the Polish population [3]. Early diagnosis is pivotal, as it enables the timely implementation of intensive lifestyle modifications and pharmacotherapy, which are crucial in reducing the incidence of cardiovascular events associated with FH. Currently, the clinical diagnosis of FH relies strongly on the Dutch Lipid Clinic Network Score (DLCNS). It assigns points based on criteria such as premature myocardial infarction or stroke, family history of premature cardiovascular disease, maximal LDL-C levels, and physical findings related to cholesterol deposits. The total score classifies the clinical likelihood of FH as “possible”, “probable”, or “definite”. The DLCN diagnostic criteria have been validated and are recommended as a decision tool for initiating DNA diagnosis of FH [4]. Although the DLCNS is a recommended method for physicians in diagnosing possible FH, it may be limited by the complexity of retrieving all the essential information, suggesting a crucial role of clinical judgment in the identification of FH subjects [5]. The effectiveness of the DLCNS in detecting FH is rather low in genetically confirmed patients (30% of patients with at least probable FH, DLCNS > 5, had confirmation of a pathological or potentially pathological variant in their genome) [5,6,7].
The rapid advancements in artificial intelligence (AI) are becoming highly appealing for medical applications, opening new possibilities where algorithms can enhance medical workflows [8,9]. Progress has been most visible in areas of medicine where imaging forms the basis of diagnosis, because convolutional neural networks (CNNs) and vision transformers have been recognized as powerful tools in visual recognition and are widely used in the medical domain. The combination of transformer-based and convolutional neural networks significantly improves medical image classification and segmentation [10,11,12]. The usefulness of AI and deep-learning algorithms in image-based medicine is well documented, from breast tumor detection to applications in dermatology. However, machine-learning algorithms are also finding applications in medical domains that rely more on tabular data than on imaging. Machine-learning techniques have proven highly effective in medical diagnosis, particularly in classifying cell nuclei in cancer detection with impressive accuracy, sensitivity, and specificity, utilizing methods such as support vector machines (SVMs) [13,14]. In hematological malignancy management, ML applications extend to improving diagnosis, prognosis, and treatment by analyzing pathology, radiology, genomics, and electronic health record data [15]. Moreover, causal machine learning can enhance diagnostic accuracy by distinguishing correlation from causation, achieving performance comparable to the top quarter of medical professionals [16]. ML’s impact is especially evident in the management of rare diseases, where it is primarily used for diagnosis and prognosis, with ensemble methods being the most widely used algorithms [17].
Early detection of the disease is essential in terms of FH and the necessary treatment. Recent studies show that machine learning can improve screening for FH as well as risk assessment through the analysis of various data [18]. ML algorithms such as SVM and tree-based methods have emerged as some of the most flexible, intuitive, and powerful approaches for analyzing complex data. Machine-learning algorithms have proved very useful in disciplines such as logistics, for predictive analysis of certain operations [19], and also in healthcare applications. These methods naturally lend themselves to creating patient subgroups for risk classification, making them highly suitable for diagnostic purposes. They can provide a more reliable virtual genetic test for familial hypercholesterolemia diagnosis, outperforming the clinical Dutch Lipid Score [20]. The use of ML methods for FH classification is often limited by dataset size, especially regarding patients whose FH variants have been confirmed or ruled out through genetic testing. Therefore, machine-learning algorithms such as decision trees (DT), logistic regression (LR), naive Bayes (NB), random forest (RF) [21], and extreme gradient boosting (XGB), explored as alternative classification methods for FH diagnosis, are often combined with techniques for handling imbalanced or small datasets [22]. While machine learning holds promise in enhancing the genetic diagnosis of FH, it is not intended to replace the Dutch Lipid Score at the population level but rather to complement it in specialized clinical settings [20]. In line with these advancements, this study aims to evaluate machine-learning algorithms such as SVM, RF, and XGBoost, and also to employ the CatBoost algorithm, currently a state-of-the-art gradient-boosting technique on decision trees, to enhance the diagnostic accuracy of familial hypercholesterolemia classification.
Moreover, ensemble methods are evaluated to increase the accuracy of FH classification. In this paper, preliminary results of applying machine-learning (ML) algorithms to an extensive and balanced dataset of patients enrolled in one of the Polish lipid clinics are presented, with the goal of enhancing the accuracy and objectivity of FH classification and providing a more reliable alternative to traditional scoring methods. The conducted research helps address the challenge of insufficient validation studies on ML models for detecting FH and provides valuable insights into the potential of these models in clinical applications. To the best of my knowledge, this is the first time the accuracy of machine-learning algorithms in the classification of FH has been tested on such a large and diverse dataset.

2. Methods

The core of this research is the Electronic Health Record dataset provided by the National Center of Familial Hypercholesterolemia in Gdansk, Poland. The initial dataset comprised 1198 records with 101 parameters, corresponding to data gathered over the two visits each enrolled patient had. Every record corresponds to one patient. The average patient age was 58 years (range 21–88), the sex distribution was 58% female and 42% male, and all patients had undergone genetic testing to confirm or rule out FH variants. A total of 41% of patients tested positive for pathological or potentially pathological FH variants. The data covered laboratory results, the nurse investigation, the medical investigation, physical examination, and DNA testing for specific genetic variants. This study aims to provide a reliable method of assessing familial hypercholesterolemia based on measurements that can be ordered and gathered in regular outpatient clinics. Therefore, the number of parameters was limited to 48.
The new dataset contained information on laboratory measurements: total cholesterol (TC), LDL-C (calculated and direct), high-density lipoprotein (HDL), triglycerides (TG), glucose, creatinine level, eGFR, alanine transaminase (ALT), aspartate aminotransferase (AST), creatine kinase (CK), C-reactive protein (CRP), thyroid-stimulating hormone (TSH), urine test, glycated hemoglobin (HbA1c), lipoprotein(a) (Lp(a)), and apolipoprotein B (APOB); the results of electrocardiogram (ECG) examination (heart rate, the presence of sinus rhythm or atrial fibrillation, ST deviation, T-wave deviation, presence of LBBB/RBBB, pathological Q waves, stimulation); findings from physical examination (lung and heart auscultation, examination of the abdomen, presence of oedema of the lower extremities, tendon xanthomas, xanthelasmas, corneal arcus); and information on patient sex, age, waist and hip circumference, body mass index (BMI), waist–hip ratio (WHR), heart rate (HR), systolic and diastolic blood pressure, highest reported LDL-C, highest reported TC, and LDL-C after recalculation in patients treated with lipid-lowering therapy without known lipid results before treatment. The data were supplemented with the results of DNA tests confirming or ruling out the presence of pathological or potentially pathological variants in the LDLR, APOB, or PCSK9 genes, as well as with the DLCN score. The overall study design is presented in Figure 1.

2.1. Data Cleaning

The data cleaning and pre-processing steps were implemented to optimize the performance of the predictive model for familial hypercholesterolemia (FH) detection. The key procedures included:
  • Categorical data encoding: Categorical variables were converted to boolean data using one-hot encoding to ensure compatibility with the machine-learning algorithm.
  • Handling missing data: Rows containing empty fields were removed from the dataset to maintain the integrity and consistency of the input data.
  • Feature selection: Columns containing information on DLCN score and DNA results were excluded from the dataset to avoid potential data leakage or bias in model training.
  • Data splitting: The dataset was partitioned into training and testing sets, with an 80/20 split. Stratified sampling was performed to balance the representation of patients regarding the DLCN score. This technique splits the data in such a way that each subset has approximately the same proportion of different categories as the original dataset. I followed the recommended protocol for classifying patients based on their likelihood of having FH. This standard approach involves dividing patients into four categories according to the probability of the disease: unlikely (<3 points), possible (3–5 points), probable (6–8 points), and definite (>8 points) [23,24].
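The cleaning and splitting steps above can be sketched in pandas and scikit-learn as follows. This is an illustrative sketch: the column names (`sex`, `ldl_c`, `dlcn_score`) and the toy frame are hypothetical stand-ins, not the clinic's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the EHR export.
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M"] * 25,
    "ldl_c": [190.0, 210.0, None, 175.0] * 25,
    "dlcn_score": [1, 4, 7, 9] * 25,
})

# 1. Categorical data encoding: one-hot encode into boolean columns.
df = pd.get_dummies(df, columns=["sex"], dtype=bool)

# 2. Handling missing data: drop rows with any empty fields.
df = df.dropna()

# 3. Bin the DLCN score into the four likelihood categories.
df["dlcn_group"] = pd.cut(df["dlcn_score"], bins=[-1, 2, 5, 8, 100],
                          labels=["unlikely", "possible", "probable", "definite"])

# 4. 80/20 split, stratified on the DLCN group (the score columns are
#    afterwards dropped from the feature matrix to avoid leakage).
train, test = train_test_split(df, test_size=0.2,
                               stratify=df["dlcn_group"], random_state=0)
```

Stratifying on the binned score keeps the proportion of each likelihood group approximately equal in the training and test sets, as described above.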

2.2. Classification Algorithm

Due to the nature of the dataset, which combined numerical and boolean data, it was decided to evaluate machine-learning classification algorithms including Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), Random Forest (RF), and Support Vector Machine (SVM). Moreover, a feed-forward neural network and the TabNet architecture were also utilized. XGBoost and CatBoost are gradient-boosting algorithms that are particularly effective at handling categorical variables in machine-learning tasks. Both belong to the family of gradient-boosting methods, which sequentially combine multiple weak learners to create a strong predictive model. CatBoost uses an efficient method to convert categorical variables into numerical values during training, which reduces the pre-processing burden and potential information loss. CatBoost is considered more robust to overfitting thanks to its ordered boosting scheme. XGBoost incorporates regularization techniques, such as L1/L2 regularization, which must be addressed during hyperparameter tuning; in general, XGBoost is considered more prone to overfitting than CatBoost. The other two algorithms are also widely used machine-learning methods for classification tasks. Support Vector Machine is particularly useful for high-dimensional data, as it can handle non-linearly separable data through kernel functions. This is valuable for FH classification, where the identification of complex patterns is desired. Random Forest, on the other hand, is an ensemble method that builds multiple decision trees. It is considered robust and capable of handling imbalanced datasets, which is often the case in medical diagnosis studies. All these algorithms have advantages and disadvantages; nevertheless, they are a suitable choice for a study aimed at classifying FH probability. For this study, a feed-forward neural network was also proposed.
The network consisted of three hidden layers with 128, 64, and 32 neurons, respectively, each using the ReLU activation function. Dropout regularization of 0.3 was applied after each hidden layer. The network was designed for binary classification; therefore, a sigmoid output activation was used together with the binary cross-entropy loss function. TabNet [25] was the last model evaluated in this study. TabNet is designed to handle tabular data; it has an encoder-decoder architecture with a decision-step approach that dynamically selects and processes features. The models utilized in this study were selected due to the nature of the dataset.
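A minimal PyTorch sketch of the described feed-forward network: three hidden layers (128, 64, 32) with ReLU and 0.3 dropout, a sigmoid output for binary FH classification, and binary cross-entropy loss. The 48-feature input width follows the reduced dataset; everything else (class and variable names, the dummy batch) is illustrative.

```python
import torch
import torch.nn as nn

class FHNet(nn.Module):
    """Sketch of the described feed-forward binary classifier."""
    def __init__(self, n_features: int = 48):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(32, 1), nn.Sigmoid(),  # probability of FH
        )

    def forward(self, x):
        return self.net(x)

model = FHNet()
loss_fn = nn.BCELoss()  # binary cross-entropy, as in the text
probs = model(torch.randn(16, 48))  # dummy batch of 16 patients
```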
The performance of the selected models depends on their hyperparameters; optimizing them is therefore crucial for improving model accuracy and generalization. For this study, grid search was used to identify the optimal hyperparameters for each model. This systematic approach involves defining a grid of hyperparameter values and evaluating model performance across all possible combinations. Grid search optimization has yielded positive results in other medical research, such as a study on predicting HIV/AIDS test results, where it improved prediction accuracy and robustness [26]. The mathematical simplicity of this approach is one of its advantages. The downside is that it requires some knowledge of the models and their hyperparameters to narrow the search space. Model performance was assessed using accuracy as the scoring method. The hyperparameter combination providing the highest score was selected as the optimal configuration for each model. A different list of hyperparameters was defined for each model. For XGBoost, the optimized parameters were:
  • Sampling predictors
  • Sampling observations
  • The learning rate
  • The maximum depth of a tree
  • The number of trees
  • Minimum loss reduction required to make a further partition on a leaf node of the tree
  • Regularization parameter
For Random Forest:
  • The number of trees
  • Maximum depth of each tree
  • Minimum number of samples that must be present in a leaf node
  • Minimum number of samples required to split an internal node
  • Whether to bootstrap samples or not
For Support Vector Machine:
  • The regularization strength
  • Decision boundaries
  • The type of kernel function
For CatBoost:
  • The maximum depth of the individual decision trees
  • The number of boosting iterations
  • Regularization term that prevents overfitting
  • The step size for updating the model during each iteration
For TabNet:
  • Width of the decision prediction layer
  • Width of the attention embedding for each mask
  • Number of steps in the architecture
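The grid-search procedure can be sketched with scikit-learn's `GridSearchCV`, shown here for Random Forest with a deliberately reduced, illustrative grid (the study's full grids cover the parameters listed above) and synthetic data in place of the patient records:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the tabular patient data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Reduced, illustrative grid over three of the listed RF hyperparameters.
grid = {
    "n_estimators": [100, 300],
    "max_depth": [10, 30],
    "min_samples_leaf": [1, 2],
}

# Exhaustive search over all combinations, scored by accuracy with
# 5-fold cross-validation (k = 5, as in the study).
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      scoring="accuracy", cv=5)
search.fit(X, y)
best = search.best_params_  # combination with the highest mean CV accuracy
```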
To prevent overfitting, k-fold cross-validation was also performed. This iterative process divides the training data into k = 5 partitions. In each iteration, one partition is held out for validation while the remaining k-1 partitions are used for training; the process repeats until every partition has served once as the validation set, and the recorded performances are averaged. SVM, Random Forest, and boosting algorithms are well known to provide good results for classification problems on tabular datasets. In this study, ensemble methods, namely voting and stacked generalization, were evaluated to maximize the strengths of the selected algorithms. A voting ensemble is a machine-learning technique that merges predictions from several models to enhance overall performance. This approach aims to outperform any individual model within the ensemble by leveraging the strengths of multiple models. It has proved useful in medical diagnosis tasks such as heart disease detection [27] and tuberculosis prediction [28]. The other ensemble-learning technique used was stacked generalization (stacking). It involves training a new model to synthesize the predictions of multiple pre-trained models. This method enhances predictive performance by leveraging the strengths of the individual models, often referred to as base models or submodels. Like voting, it has been widely used to enhance results in medical applications such as drug dose estimation [29] and retinopathy diagnosis [30].
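The two ensembling strategies can be illustrated with scikit-learn's built-in wrappers. This is a stand-in sketch on synthetic data: the study combines CatBoost, RF, and SVM, whereas here CatBoost is replaced by logistic regression to keep the example dependency-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic stand-in for the patient feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Stand-in base models (logistic regression substitutes for CatBoost here).
base = [
    ("rf", RandomForestClassifier(random_state=1)),
    ("svm", SVC(probability=True, random_state=1)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Voting: merge the base models' predicted probabilities.
voting = VotingClassifier(estimators=base, voting="soft").fit(X, y)

# Stacking: train a meta-model on the base models' out-of-fold predictions.
stacking = StackingClassifier(estimators=base,
                              final_estimator=LogisticRegression()).fit(X, y)
```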
The software utilized for this research was Python 3.9.13; the algorithms were implemented using the following libraries: pytorch-tabnet 4.1.0, scikit-learn 1.5.2, xgboost 2.0.3, catboost 0.26.1, pandas 1.4.4, numpy 1.23.4, and seaborn 0.12.1. Calculations and training requiring GPU acceleration were performed on two NVIDIA GeForce GTX 1070 cards (Nvidia, Santa Clara, CA, USA).

3. Results

An important part of this study was the selection of optimal hyperparameters for the utilized models. The results of the grid search for hyperparameter optimization across the four models are presented below. For XGBoost, the optimized parameters were:
  • Sampling predictors (colsample_bytree): 0.8
  • Sampling observations (subsample): 0.6
  • The learning rate (learning_rate): 0.1
  • The maximum depth of a tree (max_depth): 3
  • The number of trees (n_estimators): 200
  • Minimum loss reduction required to make a further partition on a leaf node of the tree (gamma): 0.3
  • Regularization parameter (reg_alpha and reg_lambda): 0 and 2
For Random Forest:
  • The number of trees (n_estimators): 300
  • Maximum depth of each tree (max_depth): 30
  • Minimum number of samples that must be present in a leaf node (min_samples_leaf): 1
  • Minimum number of samples required to split an internal node (min_samples_split): 10
  • Whether to bootstrap samples or not (bootstrap): True
For Support Vector Machine:
  • The regularization strength (C): 1
  • Decision boundaries (gamma): 1
  • The type of kernel function (kernel): linear
For CatBoost:
  • The maximum depth of the individual decision trees (depth): 4
  • The number of boosting iterations (iterations): 100
  • Regularization term that prevents overfitting (l2_leaf_reg): 7
  • The step size for updating the model during each iteration (learning_rate): 0.05
The hyperparameters defined for the designed neural network were:
  • Learning rate (lr): 0.001
  • Optimizer (optimizer_fn): Adam
  • Loss function: binary cross-entropy
  • Batch size: 32
and for TabNet:
  • Learning rate (lr): 0.02
  • Optimizer (optimizer_fn): Adam
  • Batch size: 32
  • Width of the decision prediction layer (n_d): 8
  • Width of the attention embedding for each mask (n_a): 8
  • Number of steps (n_steps): 3
The selected classification models were evaluated using standard performance metrics such as accuracy, recall, precision, and F1-Score. The results of the evaluation of CatBoost, XGBoost, SVM, Random Forest, and neural networks are presented in Table 1. The evaluation of ensemble models using voting is presented in Table 2, while Table 3 presents the evaluation using stack generalization. To gain further insights into the classification results, confusion matrices were generated for each model. These matrices provided a detailed breakdown of true positives, true negatives, false positives, and false negatives, offering a clearer picture of how each model performed in distinguishing between FH and non-FH cases. First, the confusion matrices were calculated for the initial models. The results are presented in Figure 2. Corresponding results for ensemble methods of voting and stack generalization are presented in Figure 3 and Figure 4, respectively.
In the case of machine-learning algorithms implemented for disease classification, it is desirable to have deeper insight into their performance among selected groups of patients. In this study, the results of the ML models are compared with the DLCN score, which assigns patients to four groups based on the likelihood of FH. Moreover, a DLCN score above 5 is considered a reliable threshold for recommending genetic testing [31,32]. In the dataset used in this study, all patients had DNA results for FH regardless of the DLCN score. Therefore, for comparison, the diagnostic accuracy of the DLCN score was assessed by calculating the proportion of patients confirmed with FH through genetic testing among those with a score above 5. Moreover, the performance of the selected models was evaluated across the FH likelihood groups based on the DLCN score. This evaluation is presented in Table 4.
A subset accuracy measure restricted to positive cases was introduced to compare the classification models with the accuracy of the DLCN score. This metric represents the proportion of true positives among all instances where the model predicted a positive outcome.
The results of this comparison are presented in Table 5.
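A minimal implementation of this positive-subset metric (equivalent to precision on the positive class) might look like the following; the function name and toy inputs are illustrative:

```python
import numpy as np

def positive_subset_accuracy(y_true, y_pred):
    """Proportion of true positives among all instances where the model
    predicted a positive outcome (precision on the positive class)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    predicted_positive = y_pred == 1
    if predicted_positive.sum() == 0:
        return float("nan")  # no positive predictions to evaluate
    return (y_true[predicted_positive] == 1).mean()

# Toy check: 3 predicted positives, 2 of them genetically confirmed.
score = positive_subset_accuracy([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```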
To evaluate the performance of the selected models (CatBoost, Voting Cat-RF-SVM, and Stacking Cat-SVM), it was necessary to establish a baseline derived from the DLCN score. A threshold of DLCN > 5 was used to transform the DLCN score into binary outcomes. Another method used to provide a standard reference point was logistic regression applied to the DLCN score. Like the binarized DLCN score, the predictions generated by logistic regression served as a baseline for assessing whether the machine-learning models offered statistically significant improvements.
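The two baselines can be sketched as follows; the `dlcn` and `fh_confirmed` arrays are synthetic stand-ins for the real cohort data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: DLCN scores 0-14 and genetic confirmation that
# correlates with, but does not equal, a high score.
dlcn = np.arange(200) % 15
fh_confirmed = ((dlcn + (np.arange(200) % 7) - 3) > 6).astype(int)

# Baseline 1: hard threshold, DLCN > 5 counts as a positive prediction.
baseline_threshold = (dlcn > 5).astype(int)

# Baseline 2: logistic regression fitted on the score alone.
lr = LogisticRegression().fit(dlcn.reshape(-1, 1), fh_confirmed)
baseline_lr = lr.predict(dlcn.reshape(-1, 1))
```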
McNemar’s test was then employed to compare the paired predictions based on the DLCN score and each machine-learning model. This non-parametric test is particularly suitable for evaluating differences in classification performance, as it analyzes the disagreement between models’ predictions on the same dataset. The results of the statistical analysis are presented in Table 6.
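McNemar's test can be implemented directly from its definition. The sketch below uses an exact binomial test on the discordant pairs (cases where exactly one classifier is correct); the study may have used a library implementation instead, and the toy predictions are illustrative.

```python
import numpy as np
from scipy.stats import binomtest

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact McNemar test on paired predictions: tests whether the
    disagreements between two classifiers are symmetric (p = 0.5)."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    a_correct = pred_a == y_true
    b_correct = pred_b == y_true
    n01 = int(np.sum(~a_correct & b_correct))  # only B correct
    n10 = int(np.sum(a_correct & ~b_correct))  # only A correct
    if n01 + n10 == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    return binomtest(n10, n01 + n10, 0.5).pvalue

# Toy example: classifier A always wrong, classifier B always right.
y = np.arange(100) % 2
p = mcnemar_exact(y, 1 - y, y)  # tiny p-value: a significant difference
```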

4. Discussion

This study aimed to find the most effective way to use machine-learning algorithms to classify the presence of FH. To achieve this objective, the dataset contained parameters that are commonly assessed in all patients during the preliminary diagnostic process and differential diagnosis of hypercholesterolemia, focusing on excluding secondary causes of the condition. These data from physical examinations, ECG results, and biochemical parameters are typically collected and requested in primary care settings before referring patients to specialized clinics. Hyperparameter optimization was a key focus. Various algorithms can be used for this purpose; however, grid search was chosen due to its applicability across all selected models. Additionally, recent studies indicate that grid search performs comparably to Bayesian and genetic algorithms [27]. Although grid search is widely used for hyperparameter selection, it requires pre-defining the hyperparameter values, after which it performs an exhaustive search across all specified combinations. This requirement is a limitation, making it computationally expensive; consequently, the range of considered hyperparameters was limited. However, one advantage of grid search is that it can be universally applied to the chosen models and ensures that all possible parameter combinations are evaluated.
Further, in this study, the selected machine-learning models were evaluated on a well-balanced dataset initially consisting of 1198 records. However, certain records had missing values in key parameters. Given the size of the dataset, it was decided to exclude records with missing data instead of applying interpolation or imputation methods. The motivation behind this decision was that interpolation or imputation could introduce bias or inaccuracies, particularly in a clinical context where the relationships between variables are complex, and that removing incomplete records ensured the models were trained on high-quality, reliable data. Furthermore, the size of the dataset after removal remained sufficient for effective model training and evaluation, preserving the balance between classes and ensuring robust performance metrics. The resulting dataset consisted of 813 records. The selected models were trained on 80% of this data and tested on the remaining part.
Table 1 shows that among the tested models, the Categorical Boosting and Random Forest algorithms provided the best results, reflected by F1-Scores of 0.709 and 0.718, respectively. The former demonstrated higher accuracy (0.804 vs. 0.798), while the latter had better recall (0.677 vs. 0.629). As shown in Figure 2, all algorithms effectively classify non-FH patients (the best true-negative ratio of 0.960 was obtained for SVM), though model performance drops slightly for FH patient classification, as reflected by recall (the best recall of 0.709 was obtained for the NN). This issue was addressed through model ensembling, employing two ensemble methods: voting and stacking. The results in Table 2 and Figure 4 indicate that ensembling enhanced accuracy, although Figure 3 and Figure 4 show only modest improvements in FH classification among patients with confirmed DNA mutations. Both ensemble techniques were effective; however, the best voting results were achieved with three models (the highest F1-Score: 0.750), while stacking performed strongly with an ensemble of two models (the highest F1-Score: 0.745).
To compare the models with the accuracy of the DLCN score-based classification of FH, the positive-class accuracy was examined. The results are presented in Table 5. It can be noticed that the models fail to recognize FH cases in the low-probability group of patients with DLCN scores between 0 and 2, but the stacked Cat-SVM model achieved nearly 70% accuracy in classifying positive FH patients in all other groups (possible, probable, and definite). These results show that the ensemble model can give a reliable assessment of FH if the patient has a DLCN score between 3 and 5 without DNA confirmation. The accuracy of the DLCN score was lower than that of the proposed machine-learning models, indicating that an AI-based approach may offer a more precise and reliable method for FH classification. These results suggest that models like CatBoost can enhance diagnostic accuracy and reduce the variability inherent in the DLCN score. Additionally, this study demonstrates that model ensembles provide an advantage in the classification of FH patients, further supporting the potential of AI to improve diagnostic outcomes. Moreover, the statistical significance of the conducted studies was checked with McNemar’s test. Due to the ordinal nature of the DLCN score, direct comparison with machine-learning models is limited. Therefore, to enable a statistical comparison, the DLCN score was transformed into binary outcomes using two methods: a threshold and logistic regression. The results presented in Table 6 indicate statistically significant differences (p < 0.05) between the DLCN-derived outcomes and the models. This shows that the proposed machine-learning models may provide a meaningful improvement over traditional clinical scoring methods in classifying FH. The tested models demonstrated high accuracy and precision, indicating that they effectively identified FH patients while minimizing false positives and false negatives.
The F1-Score, which balances precision and recall, further confirmed the robustness of the models in handling this classification task. This suggests that machine-learning models can help inform genetic testing recommendations for FH. The selection of the methods proposed in this study was justified by the nature of the dataset. The conducted tests showed that incorporating more complex methods, such as those based on neural networks, could provide good recall (the NN obtained the highest recall, 0.709) but lower precision (0.727 for TabNet and 0.611 for the NN). This can be explained by the fact that neural networks generally perform better on unstructured data, such as images. In this study, a strict method of handling missing data was adopted. Considering the size of the utilized dataset, the approach based on model ensembles seems less prone to overfitting in comparison. It should be mentioned, though, that more advanced methods based on deep neural networks could be used to incorporate other data types, such as the results of imaging diagnostics, to further enhance the models’ predictive capabilities and provide a more comprehensive assessment.

5. Conclusions

To the best of my knowledge, this study represents the first application of machine-learning algorithms to such an extensive dataset of Polish patients with familial hypercholesterolemia. Previous studies have focused on the use of ML to assess FH based on parameters related to the DLCN score. This study aimed to evaluate the potential of ML in assessing FH, not only by utilizing the parameters included in the DLCN scale but also by incorporating additional parameters that are routinely measured and used in the differential diagnostics of hypercholesterolemia. The purpose of developing ML methods is not only to improve the clinical assessment of potential FH cases but also to create a tool capable of identifying undiagnosed cases. For this reason, the approach involves selecting and classifying FH based on a set of parameters intentionally simplified to exclude those typically ordered only in clinics specializing in FH treatment. While the DLCN score remains a valuable diagnostic tool for physicians, this study confirms its limited effectiveness. The proposed methods demonstrate their ability to effectively classify the presence of FH also in patients with a DLCN score below 6, who would not normally qualify for genetic testing. Implementing such a solution could open the pathway to further diagnostics for these patients, increasing the likelihood of early detection and more effective treatment of familial hypercholesterolemia. This suggests that ML models should complement the DLCN score, or even serve as an alternative in specific clinical cases.

Funding

This work was partially supported by the Statutory Funds of the Faculty of Electronics Telecommunication and Informatics of Gdansk University of Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code and data utilized in this study are available upon reasonable request from qualified researchers. Every inquiry will undergo an assessment; upon approval, signing an access agreement will be required.

Acknowledgments

The author would like to express sincere gratitude to the National Center for Familial Hypercholesterolemia in Gdansk for their invaluable support and collaboration throughout this study.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figure 1. The overall study design presents consecutive steps as well as the selected parameters of utilized methods.
Figure 2. Comparison of confusion matrices calculated for the test dataset. The figure illustrates the changes between classification using Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), classification by Support Vector Machine (SVM), Random Forest (RF), feed-forward neural network (NN), and TabNet.
Figure 3. Comparison of confusion matrices calculated for the test dataset. The figure illustrates the changes between classification using a voting algorithm for model ensembles between Categorical Boosting (Cat), Extreme Gradient Boosting (XGB), classification by Support Vector Machine (SVM), and Random Forest (RF).
Figure 4. Comparison of confusion matrices calculated for the test dataset. The figure illustrates the changes between classification using a stack generalization (stacking) algorithm for model ensembles between Categorical Boosting (Cat), Extreme Gradient Boosting (XGB), classification by Support Vector Machine (SVM), and Random Forest (RF).
Table 1. The results of model evaluation trained to the classification of familial hypercholesterolemia.
Model      Accuracy   Precision   Recall   F1-Score
CatBoost   0.804      0.812       0.629    0.709
XGBoost    0.724      0.660       0.565    0.609
SVM        0.773      0.879       0.468    0.611
RF         0.798      0.764       0.677    0.718
NN         0.718      0.611       0.709    0.657
TabNet     0.742      0.727       0.516    0.603
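The metrics in Table 1 follow directly from confusion-matrix counts. As a sanity check, the counts below (TP = 39, FP = 9, FN = 23) are a reconstruction consistent with the reported CatBoost row; they are not stated in the paper.

```python
# Precision, recall and F1 from confusion-matrix counts. The counts used here
# are hypothetical, chosen only to match the reported CatBoost metrics.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of positive predictions that are correct
    recall = tp / (tp + fn)     # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

p, r, f = precision_recall_f1(39, 9, 23)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")  # ≈ 0.812 / 0.629 / 0.709
```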
Table 2. The results of model ensemble evaluation trained to the classification of familial hypercholesterolemia. The utilized ensemble method is voting.
Model        Accuracy   Precision   Recall   F1-Score
Cat-RF-SVM   0.828      0.840       0.677    0.750
Cat-SVM      0.816      0.848       0.629    0.722
Cat-RF       0.798      0.796       0.629    0.703
RF-SVM       0.779      0.750       0.629    0.684
Table 3. The results of model ensemble evaluation trained to the classification of familial hypercholesterolemia. The utilized ensemble method is stack generalization.
Model        Accuracy   Precision   Recall   F1-Score
Cat-RF-SVM   0.816      0.808       0.677    0.737
Cat-SVM      0.828      0.854       0.661    0.745
Cat-RF       0.810      0.792       0.677    0.730
RF-SVM       0.804      0.778       0.677    0.724
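Stacked generalization, the ensembling method behind Table 3, can be sketched as below. The paper does not state the meta-learner used, so logistic regression (a common default) is assumed here; the data are synthetic and RF + SVM stand in for the full base-model set.

```python
# Illustrative stacking sketch: base models' out-of-fold predictions feed a
# meta-learner. Synthetic data; the logistic-regression meta-learner is assumed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=48, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),  # assumed meta-learner
    cv=5,  # cross-validated predictions avoid leaking training labels
)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
print(f"stacked accuracy = {acc:.3f}")
```

Unlike voting, the meta-learner here learns how much to trust each base model rather than weighting them equally.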
Table 4. The results of the evaluation of selected model ensembles regarding the decision into groups of FH likelihood assessed based on the DLCN score.
CatBoost
DLCN Score    Accuracy   Precision   Recall   F1-Score
<0, 2>        0.625      0.667       0.286    0.400
<3, 5>        0.812      0.828       0.686    0.750
<6, 8>        0.857      0.889       0.615    0.727
9 and above   0.800      0.714       0.714    0.714

Voting Cat-RF-SVM
DLCN Score    Accuracy   Precision   Recall   F1-Score
<0, 2>        0.750      1.000       0.429    0.600
<3, 5>        0.800      0.800       0.686    0.738
<6, 8>        0.881      0.900       0.692    0.783
9 and above   0.900      0.857       0.857    0.857

Stacking Cat-SVM
DLCN Score    Accuracy   Precision   Recall   F1-Score
<0, 2>        0.625      0.667       0.286    0.400
<3, 5>        0.835      0.862       0.714    0.781
<6, 8>        0.881      0.900       0.692    0.783
9 and above   0.850      0.833       0.714    0.769
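The per-bin evaluation in Table 4 amounts to grouping test-set predictions by DLCN range and scoring each group separately. A minimal sketch, using entirely synthetic placeholder triples of (DLCN score, true label, predicted label):

```python
# Sketch of the per-DLCN-bin evaluation behind Table 4. The records below are
# synthetic placeholders, not data from the study.
def dlcn_bin(score):
    """Map a DLCN score to the range labels used in Table 4."""
    if score <= 2:
        return "<0, 2>"
    if score <= 5:
        return "<3, 5>"
    if score <= 8:
        return "<6, 8>"
    return "9 and above"

records = [(1, 0, 0), (2, 1, 0), (4, 1, 1), (5, 0, 0), (7, 1, 1), (10, 1, 1)]

groups = {}
for score, y_true, y_pred in records:
    groups.setdefault(dlcn_bin(score), []).append((y_true, y_pred))

for name, pairs in groups.items():
    acc = sum(t == p for t, p in pairs) / len(pairs)
    print(f"{name}: accuracy={acc:.3f} (n={len(pairs)})")
```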
Table 5. The comparison of the positive class accuracy calculated within groups with the accuracy of the DLCN score.
DLCN Score    CatBoost   Voting Cat-RF-SVM   Stacking Cat-SVM   DLCN Score Accuracy
<0, 2>        0.286      0.429               0.286              -
<3, 5>        0.686      0.686               0.714              -
<6, 8>        0.615      0.692               0.692              0.310
9 and above   0.714      0.857               0.714              0.350
Table 6. The results of McNemar’s statistical comparison.
Model 1           Model 2             p-Value
DLCN bin. pred.   CatBoost            0.000 (<0.05)
DLCN bin. pred.   Voting Cat-RF-SVM   0.000 (<0.05)
DLCN bin. pred.   Stacking Cat-SVM    0.000 (<0.05)
Logistic reg.     CatBoost            0.000 (<0.05)
Logistic reg.     Voting Cat-RF-SVM   0.000 (<0.05)
Logistic reg.     Stacking Cat-SVM    0.000 (<0.05)
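McNemar’s test, used for the comparisons in Table 6, depends only on the discordant pairs — cases where exactly one of the two models is correct. A self-contained sketch of the exact (binomial) form of the test; the discordant counts passed to it below are hypothetical, since the paper reports only p-values:

```python
# Exact McNemar test from discordant-pair counts. b = model 1 right / model 2
# wrong, c = model 1 wrong / model 2 right. The example counts are hypothetical.
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value via the binomial distribution (p = 0.5)."""
    n = b + c
    k = min(b, c)
    # probability of a split at least this lopsided, doubled for two-sidedness
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)

print(f"p = {mcnemar_exact(3, 25):.4f}")  # strongly lopsided disagreement -> p < 0.05
```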
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kocejko, T. A Data-Driven Comparative Analysis of Machine-Learning Models for Familial Hypercholesterolemia Detection. Appl. Sci. 2024, 14, 11187. https://doi.org/10.3390/app142311187

