Text Classification in Clinical Practice Guidelines Using Machine-Learning Assisted Pattern-Based Approach
Abstract
1. Introduction
2. Related Work
3. Methodology
3.1. Document Preprocessing
3.2. Salient Terms Extraction
3.3. Pattern Extraction Process
- Introduce all team members and nominate a leader to moderate the meetings cordially. Each member receives the annotated CPG, and the leader explains the purpose of the study, its process, and the voting procedure.
- All panel members analyze the provided CPG independently and, based on their own heuristics, extract patterns that can identify recommendation statements in a CPG.
- The leader collects all patterns extracted by each member and removes the duplicate patterns. A total of 21 unique patterns were identified by all KEs as shown in Table 2.
- The panel members discuss each pattern, and the concerned member explains the reason for selecting the corresponding pattern.
- All five participants rank each pattern from one to five, where one is the lowest rank and five is the highest. The leader aggregates the ranks of each pattern.
- A threshold value (total rank ≥ 15) is selected by consensus of all team members, which corresponds to 60% agreement among team members on a pattern (15 of a maximum possible total of 25).
- Select the patterns whose cumulative rank meets the threshold value (total rank ≥ 15). Based on this criterion, 10 patterns are selected as the final patterns, shown in Table 3.
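The aggregation and threshold steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the rank values are copied from five rows of Table 2, while the names `ranks`, `THRESHOLD`, and `select_patterns` are assumptions introduced here.

```python
# Ranks given by KE-1 .. KE-5 for five of the 21 patterns (values from Table 2).
ranks = {
    r".*lead(s)? to.*": [3, 3, 2, 1, 2],
    r".*to improve.*": [4, 4, 1, 3, 5],
    r".*regardless of.*": [3, 4, 3, 3, 2],
    r".*(increase|decrease).*dose.*": [5, 4, 5, 5, 4],
    r".*can be used.*": [3, 2, 2, 1, 2],
}

# 60% of the maximum total rank: 0.6 * 5 members * top rank 5 = 15.
THRESHOLD = 15


def select_patterns(ranks, threshold=THRESHOLD):
    """Return (pattern, total) pairs whose summed rank meets the threshold."""
    totals = {pattern: sum(values) for pattern, values in ranks.items()}
    return [(p, t) for p, t in totals.items() if t >= threshold]


selected = select_patterns(ranks)
for pattern, total in selected:
    print(total, pattern)
```

Note that the comparison is inclusive (`>=`), so a pattern with a total of exactly 15, such as `.*regardless of.*`, is retained.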
3.4. Sentence Classification
4. Results and Discussion
4.1. Results: Preprocessing
4.2. Results: Salient Terms Extraction
4.3. Results: Pattern Extraction
4.3.1. Heuristic Patterns
4.3.2. POS Patterns
4.3.3. UMLS Patterns
- Null hypothesis: Model FS is not better than Model WFS
- Alternate hypothesis: Model FS is better than Model WFS
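A one-sided hypothesis of this form can be checked with a paired test over repeated evaluations of the two models. The sketch below uses an exact sign test as a stand-in; the source does not state which test was applied, and the scores, the function name `one_sided_sign_test`, and the variable names are hypothetical values chosen only for illustration.

```python
from math import comb


def one_sided_sign_test(fs_scores, wfs_scores):
    """Exact one-sided sign test p-value for H1: FS > WFS (ties are dropped)."""
    diffs = [a - b for a, b in zip(fs_scores, wfs_scores) if a != b]
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    # P(X >= wins) for X ~ Binomial(n, 0.5), i.e. under the null hypothesis.
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# Hypothetical paired scores (NOT results from the paper).
fs = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85]
wfs = [0.78, 0.80, 0.80, 0.77, 0.81, 0.79, 0.76, 0.82]

p_value = one_sided_sign_test(fs, wfs)
print(p_value)
```

The null hypothesis would be rejected when the p-value falls below the chosen significance level; the level itself is a design choice that the sketch leaves open.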
5. System Evaluation
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
S.No | Decision Tree | Rule Induction | LDA | Word2vec |
---|---|---|---|---|
1 | cosmopolitan | cosmopolitan | goal | recommend |
2 | angiotensin | reach | low | facilitate |
3 | bespeak | black | population | improve |
4 | adult | better | treatment | consideration |
5 | aged | opinion | year | evidence |
6 | animation | aged | recommendation | assess |
7 | condition | condition | evidence | condition |
8 | reach | former | pharmacological | quality |
9 | black | case | initiate | regardless |
10 | decrepit | commend | hypertension | referral |
S.No | Extracted Patterns | KE-1 | KE-2 | KE-3 | KE-4 | KE-5 | Total Score |
---|---|---|---|---|---|---|---|
1 | .*lead(s)? to.* | 3 | 3 | 2 | 1 | 2 | 11 |
2 | .*treatment (should|with|to).* | 2 | 3 | 4 | 4 | 3 | 16 |
3 | .*initiat(.*)treatment.* | 2 | 3 | 2 | 3 | 4 | 14 |
4 | .*to improve.* | 4 | 4 | 1 | 3 | 5 | 17 |
5 | .*evidence(.*)(to)? support.* | 1 | 4 | 2 | 1 | 3 | 11 |
6 | .*(patient(s)?)? with (disease).* | 3 | 3 | 4 | 5 | 4 | 19 |
7 | .*should (include|continue).* | 5 | 3 | 3 | 2 | 5 | 18 |
8 | .*appli(es|ed) (to)?.* | 2 | 1 | 3 | 2 | 3 | 11 |
9 | .*can be used.* | 3 | 2 | 2 | 1 | 2 | 10 |
10 | .*(add|remove)(.*) drug.* | 4 | 4 | 3 | 5 | 5 | 21 |
11 | .*(panel)(.*)(recommend(ed)? |conclude(d)?|include(d)?).* | 4 | 2 | 2 | 1 | 3 | 12 |
12 | .*less effective.* | 1 | 3 | 4 | 2 | 3 | 13 |
13 | .*treatment (does not)? need.* | 2 | 3 | 2 | 3 | 3 | 13 |
14 | .*regardless of.* | 3 | 4 | 3 | 3 | 2 | 15 |
15 | .*meet.*goal.* | 2 | 1 | 2 | 2 | 3 | 10 |
16 | .*(increase|decrease).*dose.* | 5 | 4 | 5 | 5 | 4 | 23 |
17 | .*(recommend(ed)?) treatment.* | 3 | 3 | 4 | 3 | 3 | 16 |
18 | .*(improve(ment)? |high quality).*dose.* | 2 | 2 | 3 | 3 | 2 | 12 |
19 | .*Recommendation \d+\s+:.* | 5 | 5 | 5 | 5 | 5 | 25 |
20 | .*expert(s)?.*opinion.* | 3 | 3 | 2 | 3 | 2 | 13 |
21 | .*(dis)?continu(e|ed|ing|ation).* | 4 | 3 | 3 | 2 | 4 | 16 |
S.No | Patterns without Salient Terms |
---|---|
1 | .*(add|remove) (.*) drug.* |
2 | .*(recommend(ed)?) treatment.* |
3 | .*to improve.* |
4 | .*(increase|decrease) .*dose.* |
5 | .*treatment (should|with|to).* |
6 | .*Recommendation \d+\s+:.* |
7 | .*should (include|continue).* |
8 | .*(dis)?continu(e|ed|ing|ation).* |
9 | .*regardless of.* |
10 | .*(patient(s)?)? with (disease).* |
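
Patterns of this kind can be applied to guideline sentences with Python's `re` module. The following is a minimal sketch under stated assumptions: it uses four of the final patterns verbatim, the example sentences are invented for illustration, and the helper name `is_recommendation` and the case-insensitive matching are choices made here rather than details given in the source.

```python
import re

# Four of the final heuristic patterns, copied as written in the table.
PATTERNS = [
    r".*(add|remove) (.*) drug.*",
    r".*(increase|decrease) .*dose.*",
    r".*should (include|continue).*",
    r".*regardless of.*",
]
# Case-insensitive matching is an assumption, not stated in the source.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def is_recommendation(sentence):
    """Return True if any pattern matches the whole sentence."""
    return any(rx.fullmatch(sentence) is not None for rx in COMPILED)


# Invented example sentences (not taken from any guideline).
examples = [
    "If goal BP is not reached, increase the dose of the initial drug.",
    "Hypertension is a common condition seen in primary care.",
    "Treatment should continue regardless of age.",
]
for sentence in examples:
    print(is_recommendation(sentence), sentence)
```

Because every pattern starts and ends with `.*`, `fullmatch` behaves like a substring search here; it would fail on sentences containing line breaks, which `.` does not match by default.
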
S.No | Patterns with Salient Terms |
---|---|
1 | .*(give|add|remove) (.*) drug.* |
2 | .*([I|i]n) (black|general) (.*) population.* |
3 | .*(recommend(ed)? |better) treatment.* |
4 | .*(increase|decrease) .*dose.* |
5 | .* ((public)? opinion) .* treatment (should|with|to).* |
6 | .*Recommendation \d+\s+:.* |
7 | .*should (include|continue).* |
8 | .*(dis)?continu(e|ed|ing|ation) |reach .* goal .* |
9 | .* (regardless of)|(having age).* |
10 | .*(patient(s)? |adult |(population group) )?with (disease).* |
Tag | Description | Tag | Description |
---|---|---|---|
CD | Cardinal number | IN | Preposition or subordinating conjunction |
MD | Modal | NN | Noun, singular or mass |
JJ | Adjective | TO | "to" |
JJR | Adjective, comparative | VBG | Verb, gerund or present participle |
VB | Verb, base form | - | - |
S.No | Patterns |
---|---|
1 | .* VB .* drug .* |
2 | .* IN .* JJ .* population .* |
3 | .* (VB|JJR) .* treatment .* |
4 | .*NN.* dose .* |
5 | .* (JJ)? NN .* treatment (MD|IN|TO) .* |
6 | NN(\s+)?:(\s+)?CD .* |
7 | .*VB (include|continue).* |
8 | .*(VB+) .* goal.* |
9 | .*(regardless of)|VBG age.* |
10 | .* (JJ|NN) IN disease.* |
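
One way to apply such mixed patterns is to rewrite each sentence as a string in which selected keywords are kept and every other token is replaced by its POS tag. The sketch below assumes that encoding; the source does not spell out the exact procedure. The keyword set, the hand-assigned tags, and the helper name `to_pos_string` are all hypothetical, and a real pipeline would obtain the tags from a POS tagger.

```python
import re

# Words kept verbatim in the encoded sentence (assumed keyword set).
KEYWORDS = {"drug", "dose", "treatment", "population", "goal", "disease"}


def to_pos_string(tagged_tokens, keywords=KEYWORDS):
    """Keep keywords, replace other tokens by their POS tag, pad with spaces."""
    parts = [w.lower() if w.lower() in keywords else tag for w, tag in tagged_tokens]
    return " " + " ".join(parts) + " "


# Hand-tagged tokens for illustration only.
tagged = [("Add", "VB"), ("a", "DT"), ("second", "JJ"), ("drug", "NN"), ("now", "RB")]

pos_string = to_pos_string(tagged)
pattern = re.compile(r".* VB .* drug .*")  # pattern 1 from the table
print(repr(pos_string), bool(pattern.fullmatch(pos_string)))
```

The surrounding spaces in the encoded string let patterns that are written with spaces around each tag match at the start and end of a sentence.
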
S.No | Patterns |
---|---|
1 | .*(give|add|remove) .* Pharmacologic Substance .* |
2 | .*([I|i]n) .* Population Group .* |
3 | .*(Health Care Activity|Qualitative Concept).* Functional Concept.* |
4 | .*Functional Concept .* Pharmacologic Substance.* |
5 | .*Qualitative Concept .*Functional Concept (should|with|to).* |
6 | .*Idea or Concept \d+\s+:.* |
7 | .* should .* (Functional Concept|Idea or Concept).* |
8 | .*Idea or Concept .* Intellectual Product .* |
9 | .* regardless of|Organism Attribute.* |
10 | .* Population Group .* with .* (Disease or Syndrome) .* |
Guideline | Total Sentences | Recommendation Sentences | Non-Recommendation Sentences |
---|---|---|---|
Hypertension | 278 | 78 (28.06%) | 200 (71.94%) |
Rhinosinusitis | 761 | 151 (19.84%) | 610 (80.16%) |
Asthma | 171 | 53 (30.99%) | 118 (69.01%) |
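
The percentages in this table follow directly from the sentence counts, and they also show how imbalanced the two classes are. The short check below recomputes them; the counts come from the table, while the names `counts` and `class_share` are introduced here for illustration.

```python
# (total sentences, recommendation sentences) per guideline, from the table.
counts = {
    "Hypertension": (278, 78),
    "Rhinosinusitis": (761, 151),
    "Asthma": (171, 53),
}


def class_share(total, recommendations):
    """Return (recommendation %, non-recommendation %) rounded to two decimals."""
    rec_pct = round(100 * recommendations / total, 2)
    return rec_pct, round(100 - rec_pct, 2)


shares = {name: class_share(total, rec) for name, (total, rec) in counts.items()}
for name, (rec_pct, non_rec_pct) in shares.items():
    print(f"{name}: {rec_pct}% recommendation, {non_rec_pct}% non-recommendation")
```

The non-recommendation share is also the accuracy of a trivial classifier that labels every sentence as non-recommendation, which is a useful floor when reading accuracy figures for these datasets.
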
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hussain, M.; Hussain, J.; Ali, T.; Ali, S.I.; Bilal, H.S.M.; Lee, S.; Chung, T. Text Classification in Clinical Practice Guidelines Using Machine-Learning Assisted Pattern-Based Approach. Appl. Sci. 2021, 11, 3296. https://doi.org/10.3390/app11083296