Automated Generation of Clinical Reports Using Sensing Technologies with Deep Learning Techniques
Abstract
1. Introduction
- Development of an NLP- and ASR-based system to transcribe and summarize patient–doctor interactions in a primary care center;
- Creation of a validation set composed of real conversations and their associated summaries written by healthcare professionals;
- Reduction of the bureaucratic burden on healthcare personnel, improving the quality of their daily work life.
2. Related Work
2.1. Natural Language Processing
2.2. Text Summarization
- Template-driven Generation: This technique utilizes predefined templates [15] or sentence structures to generate summaries. These templates may have blank spaces or markers filled with relevant details extracted from the source material. Although this method is less flexible and innovative, it proves useful in specific fields characterized by well-defined text formats.
- Language Model-based Generation: This approach relies on deep learning architectures like Seq2Seq models [16], trained on extensive datasets to understand and encapsulate text structures and semantics.
- Language Generation with Attention Mechanisms: Employing models that integrate attention mechanisms [17], such as Transformer models [18], this method focuses on the most significant segments of the source text during summary creation. Attention mechanisms allow the model to weigh the importance and relevance of different text segments while generating the summary, enhancing the coherence and salience of the final output, as illustrated in the sketch after this list.
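In the sketch below, the checkpoint name and the example dialogue are illustrative assumptions rather than the configuration evaluated in this work; it shows abstractive summarization with an attention-based encoder–decoder model through the Hugging Face pipeline API:

```python
# Minimal sketch: abstractive summarization with an attention-based
# sequence-to-sequence model (BART) via the Hugging Face pipeline API.
# The checkpoint name and input text are illustrative assumptions.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

dialogue = (
    "Doctor: How are you feeling today? "
    "Patient: The shoulder pain is better, but I still get dizzy in the mornings."
)

# max_length / min_length bound the length of the generated summary in tokens.
summary = summarizer(dialogue, max_length=60, min_length=10, do_sample=False)
print(summary[0]["summary_text"])
```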
2.3. Automatic Speech Recognition
3. Methodology
3.1. Hugging Face Transformers
3.2. Dialogues Dataset
3.3. ASR Component
3.4. Transcription Processing
- Anonymization: Efforts were made to anonymize the dialogues by removing names of individuals and locations, along with any personally identifiable or sensitive information, including remarks related to political or religious beliefs.
- Elimination of non-essential details: Segments of the patient exchanges that do not contribute to the main discussion and merely inflate the character count (which can hinder the summarization capabilities of certain models) are removed to prevent confusion and keep the focus on the primary subject matter. Such non-essential fragments typically contain information of a personal nature.
- Reduction of word repetition: Unnecessary repetition of words (whether caused by transcription inaccuracies or by habitual repetition in spoken communication) is removed to clarify the dialogue and make it easier to distinguish between speakers; a typical instance is a patient confirming with several consecutive “okay” utterances. A code sketch of these preprocessing steps follows this list.
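In the sketch below, the redaction list, the placeholder tag, and the regular expressions are illustrative assumptions, not the exact rules applied to the dataset:

```python
import re

# Illustrative list of names/places to redact; in practice this could come
# from a named-entity recognizer or a curated list (hypothetical terms).
SENSITIVE_TERMS = ["Alicante", "María"]

def anonymize(text: str) -> str:
    """Replace sensitive terms with a neutral placeholder."""
    for term in SENSITIVE_TERMS:
        text = re.sub(rf"\b{re.escape(term)}\b", "[REDACTED]", text, flags=re.IGNORECASE)
    return text

def collapse_repetitions(text: str) -> str:
    """Collapse consecutive repeated words, e.g. 'okay okay okay' -> 'okay'."""
    return re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)

def preprocess(transcription: str) -> str:
    """Apply anonymization and repetition reduction to one transcribed turn."""
    return collapse_repetitions(anonymize(transcription)).strip()

print(preprocess("Okay okay okay, María, the pain is better."))
# -> "Okay, [REDACTED], the pain is better."
```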
3.5. Text Summarization
- Segmentation of processed transcriptions: This method splits the conversation at “\n” line breaks and processes each segment independently. In each cycle, the segment is tokenized and a summary is generated with a pre-trained model.
- Segmentation in real time: This approach applies a recursive method to break the text into smaller pieces whenever its length surpasses the model’s capacity. If the text is too long, it is divided into halves and the procedure is applied recursively to each half, until every piece fits within the model’s processing capabilities. A sketch of both strategies follows this list.
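In the sketch below, the checkpoint name, token threshold, and generation lengths are assumptions (a publicly available SAMSum-fine-tuned BART checkpoint is used for illustration), not necessarily the settings of the reported experiments:

```python
# Minimal sketch of the two segmentation strategies for long transcriptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "philschmid/bart-large-cnn-samsum"  # assumed SAMSum-tuned BART checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
MAX_INPUT_TOKENS = model.config.max_position_embeddings  # 1024 for BART

def summarize(text: str) -> str:
    """Summarize a single segment that fits within the model's input size."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=MAX_INPUT_TOKENS)
    ids = model.generate(**inputs, max_length=128, min_length=10)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

def summarize_by_lines(text: str) -> str:
    """Strategy 1: summarize each '\n'-delimited segment independently."""
    return " ".join(summarize(seg) for seg in text.split("\n") if seg.strip())

def summarize_recursive(text: str) -> str:
    """Strategy 2: split the text in halves until each piece fits the model."""
    if len(tokenizer(text)["input_ids"]) <= MAX_INPUT_TOKENS:
        return summarize(text)
    half = len(text) // 2
    return " ".join(summarize_recursive(part) for part in (text[:half], text[half:]))
```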
4. Experiments
4.1. Quantitative Results
4.2. Qualitative Results
“The patient comes in for a medical leave review following a car accident. The improvement is very slight and he is undergoing rehabilitation through the insurance company. The doctor recommends that he should not exert himself but should stay active. He reports pain in a shoulder. The doctor orders an X-ray and renews the medication that works well for the pain”.
“Okay, let’s go there. Did we have to update the report today? Yes. How are you doing? In pain. Now it seems that it hurts much less, but I still have the same pains and the dizziness is still the same. How was it? Because it was an accident, right? A car accident. Yes, it was a difficult accident. And what has the insurance told you? Or haven’t they told you anything? We are undergoing rehabilitation with the insurance. What do you do? I’m a maintenance technician. And do you use strength with your arms? Yes. Is it the right shoulder? Right shoulder, yes. And about medication? I need you to send me nolotil. What works best for you? I’m taking, sometimes I take nolotil, other times enentium. When you have the X-ray, if you want to see, or if we meet at the end of the month with the X-ray”.
“It was a car accident. The driver had to update the report today. He feels better now, but still has the same pains and dizziness. The dizziness takes at least a couple of months to go away. When I come out of rehabilitation, I have problems with my back. It’s important to stay active in rehabilitation and cook if you have to cook, so try to stay active and active. If you don’t do things that give you extra effort, but try to stay active, so if you need to cook then you cook. If the muscle relaxant at night works well for you, you can take it occasionally. If not, you have to take it less. If you don’t, then you can stop taking it altogether”.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hert, S.D. Burnout in Healthcare Workers: Prevalence, Impact and Preventative Strategies. Local Reg. Anesth. 2020, 13, 171–183.
- Bradley, M.; Chahar, P. Burnout of healthcare providers during COVID-19. Clevel. Clin. J. Med. 2020, 91, 1–3.
- Bouza, E.; Gil-Monte, P.; Palomo, E.; Bouza, E.; Cortell-Alcocer, M.; Del Rosario, G.; Gil-Monte, P.; González, J.; Gracia, D.; Martínez Moreno, A.; et al. Síndrome de quemarse por el trabajo (burnout) en los médicos de España. Rev. Clínica Española 2020, 220, 359–363.
- Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 2023, 82, 3713–3744.
- Li, Y.; Yang, T. Word Embedding for Understanding Natural Language: A Survey. In Guide to Big Data Applications; Srinivasan, S., Ed.; Springer International Publishing: Cham, Switzerland, 2018; pp. 83–104.
- Iqbal, T.; Qureshi, S. The survey: Text generation models in deep learning. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 2515–2528.
- Chiche, A.; Yitagesu, B. Part of speech tagging: A systematic review of deep learning and machine learning approaches. J. Big Data 2022, 9, 10.
- Nandwani, P.; Verma, R. A review on sentiment analysis and emotion detection from text. Soc. Netw. Anal. Min. 2021, 11, 81.
- El-Kassas, W.S.; Salama, C.R.; Rafea, A.A.; Mohamed, H.K. Automatic text summarization: A comprehensive survey. Expert Syst. Appl. 2021, 165, 113679.
- Iannizzotto, G.; Bello, L.L.; Nucita, A.; Grasso, G.M. A Vision and Speech Enabled, Customizable, Virtual Assistant for Smart Environments. In Proceedings of the 2018 11th International Conference on Human System Interaction (HSI), Gdansk, Poland, 4–6 July 2018; pp. 50–56.
- Liao, J.; Eskimez, S.; Lu, L.; Shi, Y.; Gong, M.; Shou, L.; Qu, H.; Zeng, M. Improving Readability for Automatic Speech Recognition Transcription. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 5.
- Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; Tan, J. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv 2024, arXiv:2403.02901.
- Collins, E.; Augenstein, I.; Riedel, S. A Supervised Approach to Extractive Summarisation of Scientific Papers. arXiv 2017, arXiv:1706.03946.
- Fang, Y.; Zhu, H.; Muszyńska, E.; Kuhnle, A.; Teufel, S. A Proposition-Based Abstractive Summariser. In Proceedings of the 26th International Conference on Computational Linguistics (COLING 2016), Osaka, Japan, 11–16 December 2016; pp. 567–578.
- Wu, P.; Zhou, Q.; Lei, Z.; Qiu, W.; Li, X. Template Oriented Text Summarization via Knowledge Graph. In Proceedings of the 2018 International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, 16–17 July 2018; pp. 79–83.
- Shi, T.; Keneshloo, Y.; Ramakrishnan, N.; Reddy, C.K. Neural Abstractive Text Summarization with Sequence-to-Sequence Models. ACM/IMS Trans. Data Sci. 2021, 2, 1.
- Kumar, S.; Solanki, A. An abstractive text summarization technique using transformer model with self-attention mechanism. Neural Comput. Appl. 2023, 35, 18603–18622.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
- Scott, D.; Hallett, C.; Fettiplace, R. Data-to-text summarisation of patient records: Using computer-generated summaries to access patient histories. Patient Educ. Couns. 2013, 92, 153–159.
- Del-Agua, M.; Jancsary, J. Ambient Clinical Intelligence: Generating Medical Reports with PyTorch. 2022. Available online: https://pytorch.org/blog/ambient-clinical-intelligence-generating-medical-reports-with-pytorch/ (accessed on 22 March 2024).
- Ben Abacha, A.; Yim, W.W.; Fan, Y.; Lin, T. An Empirical Study of Clinical Note Generation from Doctor-Patient Encounters. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; pp. 2291–2302.
- Grambow, C.; Zhang, L.; Schaaf, T. In-Domain Pre-Training Improves Clinical Note Generation from Doctor-Patient Conversations. In Proceedings of the First Workshop on Natural Language Generation in Healthcare, Waterville, ME, USA, 18 July 2022; pp. 9–22.
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Process. Mag. 2012, 29, 82–97.
- Gomez-Donoso, F.; Orts-Escolano, S.; Garcia-Garcia, A.; Garcia-Rodriguez, J.; Castro-Vargas, J.A.; Ovidiu-Oprea, S.; Cazorla, M. A robotic platform for customized and interactive rehabilitation of persons with disabilities. Pattern Recognit. Lett. 2017, 99, 105–113.
- Metallinou, A.; Lee, S.; Narayanan, S. Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice. In Proceedings of the 2008 Tenth IEEE International Symposium on Multimedia, Berkeley, CA, USA, 15–17 December 2008; pp. 250–257.
- Han, W.; Zhang, Z.; Zhang, Y.; Yu, J.; Chiu, C.C.; Qin, J.; Gulati, A.; Pang, R.; Wu, Y. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context. arXiv 2020, arXiv:2005.03191.
- Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv 2022, arXiv:2212.04356.
- Latif, S.; Qadir, J.; Qayyum, A.; Usama, M.; Younis, S. Speech Technology for Healthcare: Opportunities, Challenges, and State of the Art. IEEE Rev. Biomed. Eng. 2021, 14, 342–356.
- Latif, S.; Rana, R.; Qadir, J. Adversarial Machine Learning and Speech Emotion Recognition: Utilizing Generative Adversarial Networks for Robustness. arXiv 2018, arXiv:1811.11402.
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. arXiv 2020, arXiv:1910.03771.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. 2019. Available online: https://paperswithcode.com/paper/language-models-are-unsupervised-multitask (accessed on 22 March 2024).
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020.
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165.
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461.
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150.
| | Dialogues: Turns | Dialogues: Sentences | Dialogues: Words | Summaries: Sentences | Summaries: Words |
|---|---|---|---|---|---|
| Total count | 1702 | 1406 | 14,377 | 195 | 2061 |
| Mean | 122 | 100 | 1027 | 7 | 74 |
| Maximum | 225 | 144 | 1468 | 15 | 125 |
| Name | Parameters | VRAM Requirements | Relative Speed |
|---|---|---|---|
| tiny | 39 M | ∼1 GB | ∼32x |
| base | 74 M | ∼1 GB | ∼16x |
| small | 244 M | ∼2 GB | ∼6x |
| medium | 769 M | ∼5 GB | ∼2x |
| large | 1550 M | ∼10 GB | ∼1x |
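Any of the Whisper variants listed above can be loaded for transcription through the Hugging Face ASR pipeline; the following minimal sketch assumes the openai/whisper-small checkpoint, a Spanish-language consultation recording, and an illustrative file path:

```python
# Minimal sketch: transcribing a consultation recording with a Whisper variant
# via the Hugging Face ASR pipeline. Checkpoint, language and path are assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",   # one of the variants listed above
    chunk_length_s=30,              # Whisper processes audio in 30-second windows
)

result = asr(
    "consultation.wav",             # hypothetical audio file
    generate_kwargs={"language": "spanish", "task": "transcribe"},
    return_timestamps=True,
)
print(result["text"])
```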
| Model | ROUGE-1 | ROUGE-L |
|---|---|---|
| BART | 0.249 | 0.227 |
| DISTILBART | 0.201 | 0.186 |
| BART-SAMSUM | 0.332 | 0.320 |
| GPT2 | 0.332 | 0.317 |
| BERT2BERT | 0.248 | 0.226 |
| T5 | 0.272 | 0.248 |
| Model | ROUGE-1 | ROUGE-L | BERTScore |
|---|---|---|---|
| BART-SAMSUM | 0.099 | 0.072 | 0.666 |
| GPT-3.5 | 0.325 | 0.186 | 0.719 |
| GPT-4 | 0.319 | 0.182 | 0.697 |
| BART-SAMSUM-F | 0.422 | 0.271 | 0.735 |
| Model | ROUGE-1 | ROUGE-L |
|---|---|---|
| BART | 0.395 | 0.372 |
| DISTILBART | 0.372 | 0.319 |
| BART-SAMSUM | 0.558 | 0.485 |
| GPT2 | 0.574 | 0.553 |
| BERT2BERT | 0.455 | 0.424 |
| T5 | 0.424 | 0.383 |