Agreement and Reliability Analysis of Machine Learning Scaling and Wireless Monitoring in the Assessment of Acute Proximal Weakness by Experts and Non-Experts: A Feasibility Study
Abstract
1. Introduction
2. Materials and Methods
2.1. Participants and Study Protocol
2.2. Automated Proximal Weakness Scaling
2.3. Agreement and Reliability Analysis
3. Results
3.1. Sensor Measurement and Features
3.2. Manual and Machine Learning Scaling
4. Discussion
4.1. Agreement and Reliability of AI Model in Clinical Decision Making
4.2. Developing AI for Personalized Medicine with Disparity and Insufficiency of Data
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
MRC Scale (6-Point Scale) | Response |
---|---|
9 (V) | Normal power |
8 (IV+) | Muscle holds the joint against a combination of gravity and moderate resistance, but muscle holds the joint against moderate to maximal resistance |
7 (IV) | Muscle holds the joint against a combination of gravity and moderate resistance |
6 (III+) | Muscle holds the joint against a combination of gravity and moderate resistance, but muscle holds the joint only against minimal resistance |
5 (III) | Muscle moves the joint fully against gravity and is capable of transient resistance, but collapses abruptly |
4 (II+) | Muscle cannot hold the joint against resistance, but moves the joint fully against gravity |
3 (II) | Muscle moves the joint against gravity, but not through full mechanical range of motion |
2 (I+) | Muscle moves the joint when gravity is eliminated |
1 (I) | A flicker of movement is observed or felt in the muscle |
Feature | ICC (2, k) | p | CI (95%) |
---|---|---|---|
MeanDrift | 0.742 | <0.001 | (0.59–0.85) |
MaxDrift | 0.798 | <0.001 | (0.68–0.88) |
SumOsc | 0.850 | <0.001 | (0.76–0.91) |
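
As a rough illustration of the statistic reported in the table above, ICC (2, k) is the two-way random-effects, absolute-agreement, average-measures intraclass correlation. It can be computed from the mean squares of a subjects-by-raters matrix as (MSR - MSE) / (MSR + (MSC - MSE) / n), where n is the number of subjects. The sketch below is not the authors' code; the function name `icc2k` and the toy matrix are hypothetical and only show the arithmetic.

```python
def icc2k(data):
    """ICC(2,k) for an n-subjects x k-raters matrix given as a list of rows.

    Hypothetical helper for illustration; assumes no missing values.
    """
    n = len(data)          # number of subjects (rows)
    k = len(data[0])       # number of raters or repeated measurements (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (msc - mse) / n)


# Hypothetical toy data: 4 subjects, each measured twice (not study data).
toy = [[1.0, 1.2], [2.1, 1.9], [3.0, 3.3], [4.2, 4.0]]
toy_icc = icc2k(toy)
```

The closer the two columns track each other relative to the spread between subjects, the closer the value is to 1.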
Methods | Metrics | GS-TS1 | GS-TS2 | GS-TS1-TS2 |
---|---|---|---|---|
Manual | K-alpha | 0.291 | 0.206 | 0.275 |
Manual | Fleiss Kappa | 0.300 | 0.218 | 0.285 |
Machine Learning (DataAug) | K-alpha | 0.422 | 0.407 | 0.381 |
Machine Learning (DataAug) | Fleiss Kappa | 0.416 | 0.413 | 0.383 |
Machine Learning (DataAug + CostAdj) | K-alpha | 0.537 | 0.405 | 0.445 |
Machine Learning (DataAug + CostAdj) | Fleiss Kappa | 0.534 | 0.414 | 0.448 |
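
For readers unfamiliar with the Fleiss Kappa rows above, the following is a minimal sketch of how the statistic is computed from a set of categorical ratings. It is not the authors' pipeline; the function name `fleiss_kappa` and the toy ratings are hypothetical. Note that Fleiss' kappa treats grades as nominal categories, so a one-grade disagreement counts the same as a large one.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each a list of rater labels.

    Hypothetical helper for illustration; every subject must have the
    same number of ratings, and all labels must appear in `categories`.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    # counts[i][c] = number of raters who assigned category c to subject i
    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean over subjects of the share of agreeing rater pairs
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined in that case
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy ratings: 4 subjects, 3 raters, grades on a 1-9 scale (not study data).
toy_ratings = [[9, 9, 8], [7, 7, 7], [5, 6, 5], [3, 3, 3]]
toy_kappa = fleiss_kappa(toy_ratings, categories=range(1, 10))
```

Because the grades are ordinal, a coefficient that supports an ordinal distance, such as Krippendorff's alpha, can give partial credit to near-misses where this one cannot.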
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Park, E.; Lee, K.; Han, T.; Nam, H.S. Agreement and Reliability Analysis of Machine Learning Scaling and Wireless Monitoring in the Assessment of Acute Proximal Weakness by Experts and Non-Experts: A Feasibility Study. J. Pers. Med. 2022, 12, 20. https://doi.org/10.3390/jpm12010020