Improving the Diagnosis of Systemic Lupus Erythematosus with Machine Learning Algorithms Based on Real-World Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Population
2.2. Study Designs and Data Analysis
2.3. Statistical Analysis
2.4. Machine Learning Analysis
3. Results
3.1. Analysis Results of Baseline Characteristics
3.2. Analysis Results of Lab Tests and SLE
3.3. Feature Selection
3.4. Hyperparameter Tuning
3.5. Machine Learning Performance
4. Discussion
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

Characteristic | Group 1 n = 17,263 | Group 2 n = 7392 | Group 3 n = 276 | p-Value |
---|---|---|---|---|
Age | ||||
0–15, n (%) | 1871 (10.8) | 457 (6.2) | 15 (5.4) | <0.001 |
16–29, n (%) | 2174 (12.6) | 695 (9.4) | 68 (24.6) | <0.001 |
30–39, n (%) | 2096 (12.1) | 628 (8.5) | 53 (19.2) | <0.001 |
40–49, n (%) | 2708 (15.7) | 1039 (14.1) | 52 (18.8) | 0.001 |
50–59, n (%) | 3719 (21.5) | 1610 (21.8) | 38 (13.8) | 0.006 |
60–69, n (%) | 2690 (15.6) | 1577 (21.3) | 29 (10.5) | <0.001 |
≥70, n (%) | 2005 (11.6) | 1386 (18.8) | 21 (7.6) | <0.001 |
Overall age mean ± SD | 45.5 ± 20.6 | 51.9 ± 19.6 | 41.5 ± 18 | <0.001 |
Sex | ||||
Male, n (%) | 7327 (42.4) | 2692 (36.4) | 45 (16.3) | <0.001 |
Female, n (%) | 9936 (57.6) | 4700 (63.6) | 231 (83.7) | <0.001 |
Autoimmune diseases | ||||
Autoimmune hepatitis, n (%) | 145 (0.8) | 310 (4.2) | 4 (1.5) | <0.001 |
Idiopathic inflammatory myopathies, n (%) | 29 (0.2) | 33 (0.5) | 3 (1.1) | <0.001 |
Juvenile rheumatoid arthritis, n (%) | 35 (0.2) | 10 (0.1) | 0 (0) | 0.59 |
Overlap syndrome, n (%) | 5 (0) | 31 (0.4) | 5 (1.8) | <0.001 |
Raynaud’s disease, n (%) | 134 (0.8) | 106 (1.4) | 4 (1.5) | <0.001 |
Sjögren’s syndrome, n (%) | 95 (0.6) | 187 (2.5) | 13 (4.7) | <0.001 |
Systemic sclerosis, n (%) | 29 (0.2) | 131 (1.8) | 7 (2.5) | <0.001 |
SLE, n (%) | 0 (0) | 0 (0) | 276 (100) | <0.001 |
ANA titer | ||||
<1:80, n (%) | 17,263 (100) | 0 (0) | 0 (0) | <0.001 |
1:80, n (%) | 0 (0) | 5367 (72.6) | 67 (24.3) | <0.001 |
1:160, n (%) | 0 (0) | 1023 (13.8) | 62 (22.5) | <0.001 |
1:320, n (%) | 0 (0) | 401 (5.4) | 30 (10.9) | <0.001 |
1:640, n (%) | 0 (0) | 225 (3) | 37 (13.4) | <0.001 |
≥1:1280, n (%) | 0 (0) | 376 (5.1) | 80 (29) | <0.001 |
Autoantibodies | ||||
Anti-dsDNA IgG, n (%) | 464/2789 (16.6) | 699/2163 (32.3) | 130/266 (48.9) | <0.001 |
Anti-SS A (Anti-Ro), n (%) | 114/1893 (6) | 379/1540 (24.6) | 107/198 (54) | <0.001 |
Anti-SS B (Anti-La), n (%) | 29/1775 (1.6) | 153/1378 (11.1) | 39/197 (19.8) | <0.001 |
Anti-Smith Ab, n (%) | 17/1312 (1.3) | 24/1387 (1.7) | 46/229 (20.1) | <0.001 |
Anti-RNP Ab, n (%) | 29/919 (3.2) | 60/968 (6.2) | 27/130 (20.8) | <0.001 |
Anti-SCL-70, n (%) | 12/730 (1.6) | 48/910 (5.3) | 2/69 (2.9) | <0.001 |
Anti-Jo-1 Ab, n (%) | 16/800 (2) | 15/551 (2.7) | 0/47 (0) | 0.076 |
Anti-Centromere Ab, n (%) | 10/615 (1.6) | 94/852 (11) | 5/64 (7.8) | <0.001 |

ANA Titer | No. of Patients with Autoimmune Diseases (%) | p-Value for Trend | No. of Patients with SLE (%) | p-Value for Trend |
---|---|---|---|---|
1:80 | 297/5434 (5.5) | <0.001 | 67/5434 (1.2) | <0.001 |
1:160 | 195/1085 (18) | | 62/1085 (5.7) | |
1:320 | 124/431 (28.8) | | 30/431 (7) | |
1:640 | 131/262 (50) | | 37/262 (14.1) | |
≥1:1280 | 261/456 (57.2) | | 80/456 (17.5) | |
Total | 1008/7668 (13.2) | | 276/7668 (3.6) | |
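
The rise in SLE prevalence with ANA titer in the table above can be checked with a trend test. The sketch below applies a two-sided Cochran-Armitage test to the tabulated counts. It is an assumption that this is comparable to the test behind the reported p-value for trend, since the method is not named here, and the equally spaced scores 0 to 4 are likewise an illustrative choice.

```python
import math

# SLE counts per ANA titer, transcribed from the table above: (cases, tested)
sle_by_titer = {
    "1:80": (67, 5434),
    "1:160": (62, 1085),
    "1:320": (30, 431),
    "1:640": (37, 262),
    ">=1:1280": (80, 456),
}


def cochran_armitage(groups, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    groups: list of (cases, total) pairs ordered by exposure level.
    scores: numeric score per level; defaults to 0, 1, 2, ... (an assumption).
    """
    cases = [c for c, _ in groups]
    totals = [n for _, n in groups]
    if scores is None:
        scores = list(range(len(groups)))
    n_all = sum(totals)
    p_bar = sum(cases) / n_all  # pooled proportion under the null hypothesis
    # Score-weighted sum of observed minus expected cases
    t_stat = sum(s * (c - n * p_bar) for s, c, n in zip(scores, cases, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_all)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


z, p_value = cochran_armitage(list(sle_by_titer.values()))
for titer, (c, n) in sle_by_titer.items():
    print(f"{titer}: {100 * c / n:.1f}% with SLE")
print(f"z = {z:.2f}, p < 0.001: {p_value < 0.001}")
```

A positive z indicates that the proportion of SLE cases increases with titer, which agrees with the direction of the percentages in the table.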
Inflammation Markers | Group 1 n = 17,263 | Group 2 n = 7392 | Group 3 n = 276 | p-Value |
---|---|---|---|---|
CRP | 1.54 ± 4.01 | 1.35 ± 3.52 | 1.88 ± 4.43 | <0.001 |
WBC | 7563 ± 4169 | 7223 ± 4957 | 3898 ± 3553 | <0.001 |
Neutrophil count | 4680 ± 3379 | 4438 ± 3336 | 3880 ± 3205 | <0.001 |
Lymphocyte count | 2069 ± 1156 | 1986 ± 1805 | 1432 ± 859 | <0.001 |
Platelet count | 245,510 ± 97,094 | 241,005 ± 96,262 | 199,322 ± 107,689 | <0.001 |
NLR | 3.21 ± 5.19 | 3.02 ± 4.26 | 3.82 ± 5.19 | 0.002 |
PLR | 109.57 ± 117.42 | 80.99 ± 129.53 | 163.75 ± 169.30 | <0.001 |
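
NLR and PLR in the table above are ratios derived from the blood counts: the neutrophil-to-lymphocyte ratio and the platelet-to-lymphocyte ratio. A minimal sketch of the per-patient calculation follows. The function names and the example counts are hypothetical, and all counts are assumed to be in the same unit (cells per microliter).

```python
def nlr(neutrophil_count, lymphocyte_count):
    """Neutrophil-to-lymphocyte ratio for one patient."""
    if lymphocyte_count <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophil_count / lymphocyte_count


def plr(platelet_count, lymphocyte_count):
    """Platelet-to-lymphocyte ratio for one patient."""
    if lymphocyte_count <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelet_count / lymphocyte_count


# Hypothetical patient, not taken from the study data
patient = {"neutrophils": 4000, "lymphocytes": 1600, "platelets": 200_000}

patient_nlr = nlr(patient["neutrophils"], patient["lymphocytes"])
patient_plr = plr(patient["platelets"], patient["lymphocytes"])
print(f"NLR = {patient_nlr:.2f}, PLR = {patient_plr:.2f}")
```

Because the ratios are computed per patient and then averaged, a group's mean NLR or PLR generally differs from the ratio of that group's mean counts.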

Rank | Decision Tree Feature | Importance | Random Forest Feature | Importance | GBM Feature | Importance |
---|---|---|---|---|---|---|
1 | Antinuclear Ab (ANA) | 0.546 | Antinuclear Ab (ANA) | 0.456 | Antinuclear Ab (ANA) | 0.506 |
2 | PLR | 0.093 | PLR | 0.083 | Anti-dsDNA IgG | 0.072 |
3 | Lymphocyte count | 0.084 | Anti-dsDNA IgG | 0.006 | PLR | 0.066 |
4 | Anti-dsDNA IgG | 0.071 | Age | 0.055 | Lymphocyte count | 0.051 |
5 | Anti-SS-A | 0.058 | Lymphocyte count | 0.051 | Age | 0.050 |
6 | Age | 0.033 | WBC | 0.042 | Platelet count | 0.036 |
7 | WBC | 0.032 | Platelet count | 0.038 | Anti-SS-A | 0.036 |
8 | NLR | 0.020 | Anti-SS-A | 0.037 | WBC | 0.035 |
9 | Platelet count | 0.019 | CRP | 0.034 | CRP | 0.029 |
10 | CRP | 0.010 | Neutrophil count | 0.033 | Neutrophil count | 0.025 |
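
One way to read the three rankings above together is to look for features that every model places in its top ten and to order them by mean rank. The sketch below does this with the feature names transcribed from the table. The consensus rule is an illustrative assumption, not necessarily the selection procedure used in the study.

```python
# Top-10 features per model, in rank order, transcribed from the table above
top10 = {
    "decision_tree": [
        "Antinuclear Ab (ANA)", "PLR", "Lymphocyte count", "Anti-dsDNA IgG",
        "Anti-SS-A", "Age", "WBC", "NLR", "Platelet count", "CRP",
    ],
    "random_forest": [
        "Antinuclear Ab (ANA)", "PLR", "Anti-dsDNA IgG", "Age",
        "Lymphocyte count", "WBC", "Platelet count", "Anti-SS-A", "CRP",
        "Neutrophil count",
    ],
    "gbm": [
        "Antinuclear Ab (ANA)", "Anti-dsDNA IgG", "PLR", "Lymphocyte count",
        "Age", "Platelet count", "Anti-SS-A", "WBC", "CRP",
        "Neutrophil count",
    ],
}

# Features that appear in every model's top-10 list
common = set.intersection(*(set(features) for features in top10.values()))


def mean_rank(feature):
    """Average 1-based rank of a feature across the three models."""
    ranks = [features.index(feature) + 1 for features in top10.values()]
    return sum(ranks) / len(ranks)


ranked = sorted(common, key=mean_rank)
for feature in ranked:
    print(f"{feature}: mean rank {mean_rank(feature):.2f}")
```

Under this rule, NLR and neutrophil count drop out because each is missing from at least one model's list.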

Model | Parameter | 10 Features | 20 Features | All Features |
---|---|---|---|---|
Decision Tree | max_depth | 5 | 6 | 10 |
Decision Tree | min_samples_split | 10 | 15 | 20 |
Random Forest | n_estimators | 100 | 100 | 300 |
Random Forest | max_depth | 8 | 10 | 10 |
Random Forest | min_samples_split | 2 | 3 | 3 |
GBM | n_estimators | 300 | 300 | 300 |
GBM | max_depth | 5 | 5 | 6 |
GBM | learning_rate | 10 | 15 | 10 |
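
The tuned values above can be transcribed into keyword arguments for the three classifiers. The sketch below assumes the parameter names refer to scikit-learn's `DecisionTreeClassifier`, `RandomForestClassifier`, and `GradientBoostingClassifier`, whose constructors accept arguments with these names. The helper `build_models` and the `random_state` default are hypothetical. The GBM learning rate is left as an argument the caller supplies, because the table lists 10, 15, and 10 without stating their scale.

```python
# Tuned hyperparameters per model and feature set, transcribed from the table.
# The GBM learning_rate is deliberately not stored here (see build_models).
TUNED = {
    "decision_tree": {
        "10": {"max_depth": 5, "min_samples_split": 10},
        "20": {"max_depth": 6, "min_samples_split": 15},
        "all": {"max_depth": 10, "min_samples_split": 20},
    },
    "random_forest": {
        "10": {"n_estimators": 100, "max_depth": 8, "min_samples_split": 2},
        "20": {"n_estimators": 100, "max_depth": 10, "min_samples_split": 3},
        "all": {"n_estimators": 300, "max_depth": 10, "min_samples_split": 3},
    },
    "gbm": {
        "10": {"n_estimators": 300, "max_depth": 5},
        "20": {"n_estimators": 300, "max_depth": 5},
        "all": {"n_estimators": 300, "max_depth": 6},
    },
}


def build_models(feature_set, gbm_learning_rate, random_state=0):
    """Instantiate the three classifiers for one feature set ("10", "20", "all").

    gbm_learning_rate must be chosen by the caller; the scale of the tabulated
    values is an open question, so no default is assumed here.
    """
    # Imported lazily so the TUNED dictionary is usable without scikit-learn
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    return {
        "decision_tree": DecisionTreeClassifier(
            random_state=random_state, **TUNED["decision_tree"][feature_set]
        ),
        "random_forest": RandomForestClassifier(
            random_state=random_state, **TUNED["random_forest"][feature_set]
        ),
        "gbm": GradientBoostingClassifier(
            learning_rate=gbm_learning_rate,
            random_state=random_state,
            **TUNED["gbm"][feature_set],
        ),
    }
```

For example, `build_models("20", gbm_learning_rate=0.1)` would return the three estimators configured for the 20-feature set, with 0.1 being a placeholder value rather than one taken from the table.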

Number of Features | Decision Tree Precision | Decision Tree Recall | Decision Tree F1 Score | Decision Tree Accuracy | Random Forest Precision | Random Forest Recall | Random Forest F1 Score | Random Forest Accuracy | GBM Precision | GBM Recall | GBM F1 Score | GBM Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 0.87 ± 0.03 | 0.85 ± 0.04 | 0.85 ± 0.04 | 0.85 ± 0.04 | 0.85 ± 0.04 | 0.84 ± 0.04 | 0.84 ± 0.04 | 0.85 ± 0.04 | 0.86 ± 0.04 | 0.86 ± 0.04 | 0.86 ± 0.04 | 0.87 ± 0.04 |
20 | 0.73 ± 0.05 | 0.70 ± 0.05 | 0.70 ± 0.05 | 0.70 ± 0.05 | 0.89 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.87 ± 0.04 | 0.86 ± 0.04 | 0.86 ± 0.04 | 0.87 ± 0.04 |
All | 0.81 ± 0.04 | 0.81 ± 0.04 | 0.81 ± 0.04 | 0.81 ± 0.04 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.86 ± 0.04 | 0.85 ± 0.04 | 0.85 ± 0.04 | 0.85 ± 0.04 |
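
The four metrics reported above have standard definitions in terms of confusion-matrix counts. The sketch below computes them for a binary case. The counts are hypothetical and not taken from the study, and any averaging across classes that may underlie the tabulated values is not reproduced here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 score and accuracy from binary confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for illustration only
metrics = classification_metrics(tp=85, fp=15, fn=15, tn=85)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Precision penalizes false positives and recall penalizes false negatives, so the two can diverge even when accuracy is unchanged.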
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, M. Improving the Diagnosis of Systemic Lupus Erythematosus with Machine Learning Algorithms Based on Real-World Data. Mathematics 2024, 12, 2849. https://doi.org/10.3390/math12182849