Insights into Amyotrophic Lateral Sclerosis from a Machine Learning Perspective
Abstract
1. Introduction
2. Background to ALS
3. Methodology
3.1. Data Understanding: The PRO-ACT Database
- Static variables: Variables whose values were recorded at a single clinic visit and were not measured again at later visits. They are fixed per patient and do not change over time. Examples are gender, ethnicity, time of onset, and diagnosis.
- Temporal variables: Variables whose values can change over time. They were measured at multiple clinic visits throughout the trials and therefore appear several times for each patient. Examples are pulse, blood pressure, weight, and laboratory test results.
- Target variables: The ALSFRS values for ten items. They too were measured at all clinic visits during the trial and are therefore temporal data. We distinguish them from the regular temporal variables because the ALSFRS items serve as the target variables. Each item takes a value from 0 to 4, where 4 represents normal function and 0 represents complete loss of the respective function. Note that patients in the database have either ALSFRS or ALSFRS-R values documented. ALSFRS-R is a revised version of the ALSFRS system, which expands the respiratory function into three separate values [22]. For uniformity’s sake, we required that each patient have the same number of values for the respiratory function. Since a single ALSFRS value cannot be expanded into three, it was necessary to “collapse” the ALSFRS-R respiratory values into one value (as in [32]). After consulting with experts at Prize4Life, we chose the value of Dyspnea as the most accurate representation of respiratory capacity for patients whose disease state was documented using the ALSFRS-R rating system.
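The collapsing of the respiratory function described above can be illustrated with a minimal Python sketch. The record layout and the field names (`respiratory`, `dyspnea`, `orthopnea`, `respiratory_insufficiency`) are assumptions made for illustration; they are not the actual PRO-ACT column names, and this is not the authors' code.

```python
def collapse_respiratory(record):
    """Return a single 0-4 respiratory score for one visit record.

    `record` is a dict with hypothetical keys: `respiratory` for an
    ALSFRS visit, or `dyspnea`, `orthopnea` and
    `respiratory_insufficiency` for an ALSFRS-R visit.
    """
    if record.get("respiratory") is not None:
        # ALSFRS: a single respiratory item is already present.
        score = record["respiratory"]
    elif record.get("dyspnea") is not None:
        # ALSFRS-R: keep only Dyspnea as the representative value.
        score = record["dyspnea"]
    else:
        raise ValueError("no ALSFRS or ALSFRS-R respiratory value found")
    if not 0 <= score <= 4:
        raise ValueError("ALSFRS item values must lie between 0 and 4")
    return score


# One visit documented with ALSFRS and one with ALSFRS-R (made-up values).
visits = [
    {"respiratory": 3},
    {"dyspnea": 2, "orthopnea": 4, "respiratory_insufficiency": 4},
]
collapsed = [collapse_respiratory(v) for v in visits]
```

After this step every visit carries exactly one respiratory value, so both rating systems yield the same ten target variables.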
3.2. Data Preparation
3.3. Feature Selection
Algorithm 1: Feature selection by the criterion [11]
Input: Potential features, K the maximal feature sub-set size desired, and ALSFRS target value
Output: Selected feature sub-set for the ALSFRS target value
for k = 1:K do
end
Return: the feature sub-set with the highest accuracy on the validation set
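The loop body and the selection criterion of Algorithm 1 are not reproduced in this text, so the following Python sketch does not claim to be the authors' procedure. It only illustrates the general shape suggested by the inputs and outputs: a generic greedy forward selection that grows a sub-set up to size K and returns the sub-set with the highest validation score. The function name `forward_select` and the `score` callback (standing in for "accuracy on the validation set") are assumptions.

```python
def forward_select(features, max_size, score):
    """Greedy forward selection (generic sketch, not the paper's criterion).

    features: iterable of candidate feature names
    max_size: K, the maximal feature sub-set size
    score:    hypothetical callable mapping a tuple of feature names to a
              validation accuracy (higher is better)
    """
    selected = []
    remaining = list(features)
    best_subset, best_score = (), float("-inf")
    for _ in range(min(max_size, len(remaining))):
        # Add the single feature that gives the best score at this size.
        candidate = max(remaining, key=lambda f: score(tuple(selected + [f])))
        selected.append(candidate)
        remaining.remove(candidate)
        current = score(tuple(selected))
        # Remember the best sub-set seen over all sizes 1..K.
        if current > best_score:
            best_subset, best_score = tuple(selected), current
    return best_subset, best_score


# Toy scoring function with made-up contributions, for illustration only.
toy_gain = {"FVC": 0.3, "onset_site": 0.2, "CK": 0.1, "noise": -0.05}


def toy_score(subset):
    return 0.4 + sum(toy_gain[f] for f in subset)


subset, accuracy = forward_select(toy_gain, 4, toy_score)
```

In practice `score` would train a classifier on the training set using only the given features and report its accuracy on the validation set, once per ALSFRS target value.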
3.4. Modeling
3.4.1. Prediction by Multi-Class and Ordinal Classification
3.4.2. Knowledge Representation by Bayesian Networks
3.5. Experimentation
4. Results and Analysis
4.1. Multi-Class Classification
4.2. Ordinal Classification
4.3. Knowledge Representation and Explanation
4.3.1. Decision-Tree Based Explanation
4.3.2. Bayesian Network-Based Explanation
4.3.3. Analysis of Variable Value Combinations
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Mitchell, D.; Borasio, G. Amyotrophic lateral sclerosis. Lancet 2007, 369, 2031–2041.
- Rothstein, J.D. Current hypotheses for the underlying biology of amyotrophic lateral sclerosis. Ann. Neurol. 2009, 65, 3–9.
- Rowland, L.P.; Schneider, N.A. Amyotrophic lateral sclerosis. N. Engl. J. Med. 2001, 344, 1688–1700.
- Kiernan, M.; Vucic, S.; Cheah, B.; Turner, M.; Eisen, A.; Hardiman, O.; Burrell, J.; Zoing, M. Amyotrophic lateral sclerosis. Lancet 2011, 377, 942–955.
- Turner, M.; Kiernan, M.; Leigh, N.; Talbot, K. Biomarkers in amyotrophic lateral sclerosis. Lancet Neurol. 2009, 8, 94–109.
- Gordon, P.; Meininger, V. How can we improve clinical trials in amyotrophic lateral sclerosis? Nat. Rev. Neurol. 2011, 7, 650–654.
- Lavrac, N. Machine Learning for Data Mining in Medicine. In Artificial Intelligence in Medicine. AIMDM 1999; Horn, W., Shahar, Y., Lindberg, G., Andreassen, S., Wyatt, J., Eds.; Lecture Notes in Computer Science, vol 1620; Springer: Berlin/Heidelberg, Germany, 1999.
- Kononenko, I. Machine learning for medical diagnosis: History, state of the art and perspective. Artif. Intell. Med. 2001, 23, 89–109.
- Cooper, G.; Aliferis, C.; Ambrosino, R.; Aronis, J.; Buchanan, B.; Caruana, R.; Fine, M.; Glymour, C.; Gordon, G.; Hanusa, B.; et al. An evaluation of machine-learning methods for predicting pneumonia mortality. Artif. Intell. Med. 1997, 9, 107–138.
- Lerner, B. Bayesian fluorescence in situ hybridisation signal classification. Artif. Intell. Med. 2004, 30, 301–316.
- Lerner, B.; Clocksin, W.F.; Dhanjal, S.; Hultén, M.A.; Bishop, C.M. Feature representation and signal classification in fluorescence in-situ hybridization image analysis. IEEE Trans. Syst. Man Cybern. 2001, 31, 655–665.
- Lerner, B.; Lawrence, N. A comparison of state-of-the-art classification techniques with application to cytogenetics. Neural Comput. Appl. 2001, 10, 39–47.
- Alam, M.; Le, D.; Lim, J.I.; Chan, R.V.; Yao, X. Supervised machine learning based multi-task artificial intelligence classification of retinopathies. J. Clin. Med. 2019, 8, 872.
- Cao, Y.; Fang, X.; Ottosson, J.; Näslund, E.; Stenberg, E.A. Comparative study of machine learning algorithms in predicting severe complications after bariatric surgery. J. Clin. Med. 2019, 8, 688.
- Padmanabhan, M.; Yuan, P.; Chada, G.; Nguyen, H.V. Physician-friendly machine learning: A case study with cardiovascular disease risk prediction. J. Clin. Med. 2019, 8, 1050.
- Rau, C.S.; Wu, S.C.; Chuang, J.F.; Huang, C.Y.; Liu, H.T.; Chien, P.C.; Hsieh, C.H. Machine learning models of survival prediction in trauma patients. J. Clin. Med. 2019, 8, 799.
- Atassi, N.; Berry, J.; Shui, A.; Zach, N.; Sherman, A.; Sinani, E.; Walker, J.; Katsovskiy, I.; Schoenfeld, D.; Cudkowicz, M.; et al. The PRO-ACT database: Design, initial analyses, and predictive features. Neurology 2014, 83, 1719–1725.
- Brooks, B.R.; Sanjack, M.; Ringel, S.; England, J.; Brinkmann, J.; Pestronk, A.; Cedarbaum, J.M. The amyotrophic lateral sclerosis functional rating scale-Assessment of activities of daily living in patients with amyotrophic lateral sclerosis. Arch. Neurol. 1996, 53, 141–147.
- Gomeni, R.; Fava, M. Amyotrophic lateral sclerosis disease progression model. Amyotroph. Lateral Scler. Front. Degener. 2014, 15, 119–129.
- Guiloff, R.J. Clinical Trials in Neurology; Springer: Berlin/Heidelberg, Germany, 2001.
- Renton, A.; Chiò, A.; Traynor, B. State of play in amyotrophic lateral sclerosis genetics. Nat. Neurosci. 2014, 17, 17–23.
- Brooks, B.R.; Miller, R.G.; Swash, M.; Munsat, T.L. El Escorial revisited: Revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph. Lateral Scler. Other Mot. Neuron Disord. 2000, 5, 293–299.
- Cudkowicz, M.; Qureshi, M.; Shefner, J. Measures and markers in amyotrophic lateral sclerosis. NeuroRx 2004, 2, 273–283.
- Ajroud-Driss, S.; Siddique, T. Sporadic and hereditary amyotrophic lateral sclerosis (ALS). Biochim. Biophys. Acta 2015, 1852, 679–684.
- Mandrioli, J.; Biguzzi, S.; Guidi, C.; Sette, E.; Trelizzi, E.; Ravasio, A.; Casmiro, M.; Salvi, F.; Liguori, R.; Rizzi, R.; et al. Heterogeneity in ALSFRS-R decline and survival: A population based study in Italy. Neurol. Sci. 2015, 36, 2243–2252.
- Piaceri, I.; Del Mastio, M.; Tedde, A.; Bagnoli, S.; Latorraca, S.; Massaro, F.; Paganini, M.; Corrado, A.; Sorbi, S.; Nacmias, B. Clinical heterogeneity in Italian patients with amyotrophic lateral sclerosis. Clin. Genet. 2012, 82, 83–87.
- Kuffner, R.; Zach, N.; Norel, R.; Hawe, J.; Schoenfeld, D.; Wang, L.; Li, G.; Fang, L.; Mackey, L.; Hardiman, O.; et al. Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression. Nat. Biotechnol. 2015, 33, 51–59.
- Zach, N.; Ennist, D.; Taylor, A.; Alon, H.; Sherman, A.; Kuffner, R.; Walker, J.; Sinani, E.; Katsovskiy, I.; Cudkowicz, M.; et al. Being PRO-ACTive: What can a clinical trial database reveal about ALS. Neurotherapeutics 2015, 12, 417–423.
- Available online: https://www.synapse.org/#!Synapse:syn2873386/wiki/ (accessed on 1 October 2019).
- Available online: https://www.synapse.org/#!Synapse:syn2873386/wiki/391432 (accessed on 1 October 2019).
- Wirth, R.; Hipp, J. CRISP-DM: Towards a standard process for data mining. In Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK, 11–13 April 2000; pp. 29–39.
- Hothorn, T.; Jung, H. RandomForest4Life: A random forest for predicting ALS disease progression. Amyotroph. Lateral Scler. Front. Degener. 2014, 15, 444–452.
- Devijver, P.; Kittler, J. Pattern Recognition: A Statistical Approach; Prentice Hall: London, UK, 1982.
- Mitchell, T. Machine Learning; McGraw Hill: New York, NY, USA, 1997.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
- Agresti, A. An Introduction to Categorical Data Analysis; Springer: Berlin/Heidelberg, Germany, 2011.
- Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; CRC Press: Boca Raton, FL, USA, 1984.
- Frank, E.; Hall, M. A Simple Approach to Ordinal Classification; Springer: Berlin/Heidelberg, Germany, 2001.
- Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Heckerman, D.; Geiger, D.; Chickering, D. Learning Bayesian networks: The combination of knowledge and data. Mach. Learn. 1995, 20, 197–243.
- Kelner, R.; Lerner, B. Learning Bayesian network classifiers by risk minimization. Int. J. Approx. Reason. 2012, 35, 248–272.
- Fayyad, U.; Irani, K. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, Chambery, France, 1 September 1993.
- Saltelli, A.; Tarantola, S.; Campolongo, F.; Ratto, M. Sensitivity Analysis in Practice; John Wiley & Sons Ltd.: Chichester, UK, 2004.
- Rafiq, M.; Lee, E.; Bradburn, M.; McDermott, C.; Shaw, P. Elevated creatinine kinase suggests better prognosis in patients with amyotrophic lateral sclerosis. J. Neurol. Neurosurg. Psychiatry 2013, 84, e2.
- Iłżecka, J.; Stelmasiak, Z. Creatinine kinase activity in amyotrophic lateral sclerosis patients. Neurol. Sci. 2003, 24, 286–287.
- Wijesekera, L.C.; Leigh, N.P. Amyotrophic lateral sclerosis. Orphanet J. Rare Dis. 2009, 4, 3–25.
- Oliveira, A.S.; Pereira, R.B. Amyotrophic lateral sclerosis (ALS); three letters that change peoples lives forever. ARQ Neuropsiquiatr 2009, 67, 750–782.
- Chiò, A.; Logroscino, G.; Hardiman, O.; Swingler, R.; Mitchell, D.; Beghi, E.; Traynor, B.G. Prognostic factors in ALS: A critical review. Amyotroph. Lateral Scler. 2009, 10, 310–323.
- Chiò, A.; Calvo, A.; Bovio, G.; Canosa, A.; Bertuzzo, D.; Galmozzi, F.; Cugnasco, P. Amyotrophic lateral sclerosis outcome measures and the role of albumin and creatinine: A population-based study. JAMA Neurol. 2014, 71, 1134–1142.
Data Table | Description |
---|---|
ALSFRS | ALS functional rating scale values |
Demographics | Demographic data, such as gender, age, ethnicity, etc. |
FAMHX | Family history concerning neurological diseases |
FVC | Forced vital capacity (FVC) test values |
Riluzole | Whether the patient is taking Riluzole (the leading drug for ALS) |
Patient ALSHX | Patient’s disease history: time of onset, symptoms at onset, etc. |
Vitals | Vital signs throughout the trials |
Labs | Multiple laboratory test results throughout the trials |
Speech | Salivation | Swallowing | Handwriting | Cutting Food & Eating | Dressing/Hygiene | Turning in Bed | Walking | Climbing Stairs | Respiratory | ||
---|---|---|---|---|---|---|---|---|---|---|---|
Basophil | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Eosinophil | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Lymphocyte | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Monocyte | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Albumin | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Alkaline phosphatase | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
ALT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
AST | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Bicarbonate | |||||||||||
Bilirubin | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
BUN | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Calcium | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Chloride | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
Laboratory Variables | CK | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
Creatinine | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Glucose | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
Hematocrit | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Hemoglobin | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Phosphorus | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
Platelets | ✓ | ✓ | |||||||||
Potassium | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
Protein | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
Red blood cells | |||||||||||
Sodium | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
White blood cells |
Setting | Speech | Salivation | Swallowing | Handwriting | Cutting Food and Eating | Dressing/Hygiene | Turning in Bed | Walking | Climbing Stairs | Respiratory |
---|---|---|---|---|---|---|---|---|---|---|
Last | ||||||||||
Both | ||||||||||
First |
Setting | Alg. | Speech | Respiratory | Salivation | Swallowing | Handwriting | Cutting Food | Dressing/Hygiene | Turning in Bed | Walking | Climbing Stairs |
---|---|---|---|---|---|---|---|---|---|---|---|
Last visit | CLM | 0.78 | 0.67 | 0.81 | 0.76 | 1.07 | 1.05 | 0.82 | 0.85 | 0.80 | 0.74 |
Last visit | ODT | 0.79 | 0.66 | 0.78 | 0.77 | 1.02 | 0.99 | 0.84 | 0.89 | 0.80 | 0.73 |
Last visit | CPT | 1.77 | 2.11 | 2.02 | 2.00 | 1.46 | 1.25 | 1.13 | 1.33 | 1.20 | 0.91 |
Last visit | RF | 1.37 | 1.42 | 1.57 | 1.45 | 1.44 | 1.23 | 1.09 | 1.22 | 1.14 | 1.02 |
Both visits | CLM | 0.71 | 0.62 | 0.76 | 0.68 | 1.06 | 1.02 | 0.81 | 0.84 | 0.77 | 0.76 |
Both visits | ODT | 0.78 | 0.63 | 0.79 | 0.74 | 1.01 | 1.00 | 0.85 | 0.89 | 0.77 | 0.77 |
Both visits | CPT | 1.76 | 2.13 | 2.02 | 2.00 | 1.47 | 1.29 | 1.12 | 1.39 | 1.24 | 0.94 |
Both visits | RF | 1.39 | 1.39 | 1.61 | 1.46 | 1.45 | 1.20 | 1.10 | 1.23 | 1.14 | 1.01 |
First visit | CLM | 1.55 | 0.89 | 1.15 | 1.20 | 1.42 | 1.50 | 1.37 | 1.39 | 0.85 | 0.95 |
First visit | ODT | 1.16 | 0.77 | 1.01 | 0.99 | 1.22 | 1.15 | 1.02 | 1.10 | 0.85 | 0.96 |
First visit | CPT | 1.77 | 2.11 | 2.01 | 1.96 | 1.45 | 1.52 | 1.41 | 1.30 | 1.17 | 0.96 |
First visit | RF | 2.18 | 1.61 | 2.06 | 2.03 | 1.84 | 1.26 | 1.20 | 1.60 | 1.34 | 0.99 |
Group | ALSFRS Functions | Semantic and Medical Interpretation |
---|---|---|
1 | Salivation, Speech, Swallowing | Bulbar |
2 | Handwriting, Cutting Food and Eating | Upper limbs |
3 | Walking, Climbing Stairs | Lower limbs |
4 | Turning in Bed, Dressing/Hygiene | Full body |
5 | Respiratory | Respiratory |
Function | Variable 1 | Variable 2 | Variable 3 | Variable 4 |
---|---|---|---|---|
Speech | FVC | Onset Site | CK | chloride |
Salivation | FVC | Onset Site | creatinine | potassium |
Swallowing | FVC | Onset Site | ALT | chloride |
Handwriting | FVC | Onset Site | CK | potassium |
Cutting Food | FVC | CK | chloride | phosphorus |
Dressing/Hygiene | FVC | Onset Site | potassium | CK |
Turning in Bed | FVC | Onset Site | creatinine | potassium |
Walking | FVC | alkaline phosphatase | creatinine | CK |
Climbing Stairs | FVC | alkaline phosphatase | creatinine | phosphorus |
Respiratory | FVC | CK | hemoglobin | potassium |
No. | FVC | Onset Site | ALT | Chloride | Severe Patients (%) | Mild Patients (%) |
---|---|---|---|---|---|---|
1 | low | bulbar | high | normal | 16.04 | 1.40 |
2 | low | bulbar | high | low | 12.74 | 0.14 |
3 | low | limb | high | normal | 10.38 | 8.64 |
4 | moderate–high | bulbar | high | normal | 6.60 | 1.68 |
5 | low | limb | high | low | 6.13 | 2.38 |
6 | moderate–low | limb | high | normal | 6.13 | 11.67 |
7 | moderate–high | limb | high | normal | 2.83 | 12.51 |
8 | high | limb | high | normal | 0.47 | 12.14 |
9 | moderate–low | limb | normal | normal | 0.94 | 4.72 |
10 | moderate–high | limb | normal | normal | 0.00 | 4.44 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gordon, J.; Lerner, B. Insights into Amyotrophic Lateral Sclerosis from a Machine Learning Perspective. J. Clin. Med. 2019, 8, 1578. https://doi.org/10.3390/jcm8101578