Advanced Sampling Technique in Radiology Free-Text Data for Efficiently Building Text Mining Models by Deep Learning in Vertebral Fracture
Abstract
1. Introduction
2. Materials and Methods
2.1. Database
2.2. Labeling of Vertebral Compression Fracture
2.3. Sampling Methods
2.4. Machine Learning Methodologies and Statistical Analyses
- Length of each text report: 200 words.
- Dimensions of the word vectors: 100.
- Window size: 6 (Supplementary Figure S1).
- Input neurons: 20,000 (200 words × 100 dimensions); each neuron received one word-vector value as input.
- Activation function: sigmoid.
- Optimizer: Adam with a learning rate of 0.001.
- Number of layers: 5.
- Loss function: binary cross-entropy (binary_crossentropy).
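The settings listed above can be illustrated with a minimal sketch in plain Python. It shows how a report of up to 200 word vectors of 100 dimensions could be flattened into the 20,000 input values, and how the sigmoid activation and binary cross-entropy loss are computed. The function names, the zero-padding of short reports, and the toy data are assumptions made for illustration and are not taken from the study.

```python
import math

MAX_WORDS = 200    # report length from the list above
VECTOR_DIM = 100   # word-vector dimensions from the list above


def flatten_report(word_vectors):
    """Pad or truncate a report to MAX_WORDS vectors, then flatten it.

    Zero-padding short reports is an assumption; the list above does not
    say how reports shorter than 200 words were handled.
    """
    padded = [list(v) for v in word_vectors[:MAX_WORDS]]
    padded += [[0.0] * VECTOR_DIM for _ in range(MAX_WORDS - len(padded))]
    return [value for vector in padded for value in vector]


def sigmoid(z):
    """Logistic activation in (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


# Toy 3-word report: every word vector is filled with 0.5 for illustration.
report = [[0.5] * VECTOR_DIM for _ in range(3)]
inputs = flatten_report(report)
print(len(inputs))  # 200 * 100 = 20000
```

In practice, a deep learning framework would supply the activation, the loss, and the Adam optimizer; the hand-written versions here only make the arithmetic behind the listed settings explicit.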
3. Results
3.1. Baseline Characteristics of the Data
3.2. Model Performance across Different Sampling Methods
4. Discussion
4.1. The Importance of Text Mining Models
4.2. The Importance of Sampling Methods in Textual Data
4.3. Implementation of Vector Sum Minimization
4.4. Strengths
4.5. Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Sampling Ratio | Sampling Method | Euclidean Distance of Each Cluster | p Value |
|---|---|---|---|
| 1/10 (N = 2766) | Vector sum minimization | 2.9 ± 2.3 | <0.001 |
| | Vector sum maximization | 19.5 ± 26.1 | |
| | Stratified sampling | 12 ± 10.3 | |
| | Simple random sampling | N/A | N/A |
| 1/20 (N = 1392) | Vector sum minimization | 4.0 ± 2.6 | <0.001 |
| | Vector sum maximization | 22.8 ± 32.1 | |
| | Stratified sampling | 9.7 ± 7.8 | |
| | Simple random sampling | N/A | N/A |
| 1/30 (N = 936) | Vector sum minimization | 4.1 ± 3.1 | <0.001 |
| | Vector sum maximization | 28.8 ± 38.0 | |
| | Stratified sampling | 14.4 ± 12.5 | |
| | Simple random sampling | N/A | N/A |
| 1/40 (N = 706) | Vector sum minimization | 5.9 ± 4.6 | <0.001 |
| | Vector sum maximization | 42.8 ± 54.8 | |
| | Stratified sampling | 18.4 ± 18.9 | |
| | Simple random sampling | N/A | N/A |
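The table does not state how the per-cluster Euclidean distance was obtained. One assumed reading, sketched below, is the distance from each document vector in a cluster to that cluster's centroid, summarized as a mean and a standard deviation. The helper names and the two-dimensional toy cluster are hypothetical and only illustrate the calculation; they are not the authors' procedure or data.

```python
import math
from statistics import mean, stdev


def centroid(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_spread(vectors):
    """Return (mean, sample SD) of Euclidean distances to the centroid.

    Measuring distance to the centroid is an assumption; pairwise
    distances within the cluster would be another reasonable choice.
    """
    center = centroid(vectors)
    distances = [math.dist(v, center) for v in vectors]
    return mean(distances), stdev(distances)


# Hypothetical 2-D cluster: four corners of a square with side length 2.
toy_cluster = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
avg_distance, sd_distance = cluster_spread(toy_cluster)
print(f"{avg_distance:.3f} ± {sd_distance:.3f}")
```

Under this reading, a smaller mean distance indicates that the sampled reports in a cluster lie closer together in the embedding space.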
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hung, W.-C.; Lin, Y.-L.; Lin, C.-W.; Chin, W.-L.; Wu, C.-H. Advanced Sampling Technique in Radiology Free-Text Data for Efficiently Building Text Mining Models by Deep Learning in Vertebral Fracture. Diagnostics 2024, 14, 137. https://doi.org/10.3390/diagnostics14020137