Leveraging Large Language Models for Departmental Classification of Medical Records
Abstract
1. Introduction
1.1. Background
1.2. Significance
2. Related Work
2.1. General Large Language Models
2.2. Medical Large Language Models
3. Methodology
3.1. Dataset
3.2. Data Processing
3.3. Model Development and Training
3.3.1. Loss Function
3.3.2. Optimization Algorithms
3.3.3. System Architecture of the Medical Department Classification Models
3.3.4. Pretrained LLMs
3.3.5. Methodological Details
4. Evaluation
4.1. Evaluation Methods
4.2. Evaluation Against Baselines
5. Results
5.1. Accuracy of Diagnosis
5.2. Resource Efficiency
5.3. Multilingual Support
6. Discussion
6.1. Limitations in Complex Cases
6.2. Demographic and Clinical Bias Considerations
6.3. Ethical and Privacy Concerns
6.4. Scalability and Adaptation
6.5. Expert Feedback
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
Model | PEFT | Transformers | PyTorch | Datasets | Tokenizers |
---|---|---|---|---|---|
Gemma-2-9B | 0.11.1 | 4.42.3 | 2.3.0 | 2.20.0 | 0.19.1 |
GLM-4-9B | 0.11.1 | 4.42.3 | 2.3.0 | 2.20.0 | 0.19.1 |
Qwen2-7B | 0.11.1 | 4.42.3 | 2.3.0 | 2.20.0 | 0.19.1 |
Llama-3.2-3B-Instruct | 0.12.0 | 4.52.2 | 2.4.0 | 2.21.0 | 0.20.0 |
Qwen-1.8B | 0.12.0 | 4.52.2 | 2.4.0 | 2.21.0 | 0.20.0 |
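
The version table above can be turned into a quick environment check before rerunning an experiment. The following is a minimal sketch, not the authors' setup script: the PyPI distribution names (for example `torch` for PyTorch) and the helper names `installed_version` and `find_mismatches` are assumptions introduced here for illustration.

```python
from importlib import metadata

# Versions copied from the table above for the Gemma-2-9B, GLM-4-9B and Qwen2-7B runs.
# The distribution names are an assumption ("PyTorch" is published on PyPI as "torch").
PINNED = {
    "peft": "0.11.1",
    "transformers": "4.42.3",
    "torch": "2.3.0",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to an (expected, found) pair."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Local build tags such as "2.3.0+cu121" are treated as matching "2.3.0".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


for package, (expected, found) in find_mismatches(PINNED).items():
    print(f"{package}: expected {expected}, found {found}")
```

For the Llama-3.2-3B-Instruct and Qwen-1.8B runs, the same check applies with the second set of versions from the table substituted into `PINNED`.
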
Model | Accuracy |
---|---|
Gemma-2-9B | 0.96 |
BioBERT | 0.27 |
ClinicalBERT | 0.96 |
SVM | 0.28 |
CNN | 0.91 |
Model | ROUGE-1 | ROUGE-2 |
---|---|---|
Gemma-2-9B | 96.2561 | 31.1659 |
GLM-4-9B | 94.9637 | 30.7327 |
Qwen2-7B | 95.7793 | 31.0675 |
Llama-3.2-3B-Instruct | 95.0357 | 30.7664 |
Qwen-1.8B | 94.9393 | 30.7158 |
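
For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a reference string and a generated string. The sketch below implements the standard F1 form of ROUGE-N on character-level tokens. The character-level tokenization and the function names are assumptions made for illustration; this excerpt does not state which tokenizer or evaluation tooling produced the scores in the table, so the sketch should not be read as a reproduction of them.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    """ROUGE-N F1 between two strings, using single characters as tokens (an assumption)."""
    ref_counts = ngrams(list(reference), n)
    cand_counts = ngrams(list(candidate), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both strings.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(cand_counts.values())
    return 2 * precision * recall / (precision + recall)


# Department labels taken from the misclassification examples in this article.
print(rouge_n("外科", "外科", 1))  # exact match
print(rouge_n("外科", "内科", 1))  # one of two characters shared
print(rouge_n("外科", "内科", 2))  # no shared bigram
```

Under this tokenization, an exact label match scores 1.0, while a wrong label that shares one character still earns partial ROUGE-1 credit and no ROUGE-2 credit.
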
Model | BLEU-4 | ROUGE-1 | ROUGE-2 | ROUGE-L | Runtime (s) | Samples per Second | Steps per Second |
---|---|---|---|---|---|---|---|
Gemma-2-9B | 79.1734 | 96.2561 | 31.1659 | 96.2561 | 7,144.3885 | 2.487 | 1.244 |
GLM-4-9B | 78.3302 | 94.9637 | 30.7327 | 94.9637 | 18,074.5771 | 0.983 | 0.492 |
Qwen2-7B | 78.8742 | 95.7793 | 31.0675 | 95.7793 | 19,261.7879 | 0.923 | 0.461 |
Llama-3.2-3B-Instruct | 78.3801 | 95.0357 | 30.7664 | 95.0357 | 12,565.5941 | 1.414 | 0.707 |
Qwen-1.8B | 78.2719 | 94.9393 | 30.7158 | 94.9393 | 1,622.0370 | 10.956 | 5.478 |
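
The runtime and throughput columns can be cross-checked with simple arithmetic: runtime multiplied by samples per second gives the implied number of evaluated samples, and samples per second divided by steps per second gives the implied number of samples per step. The sketch below performs that check on the values in the table. The evaluation-set size and batch size are not stated in this excerpt, so the results are inferences from the table, and the function names are introduced here for illustration.

```python
# (runtime in seconds, samples per second, steps per second), copied from the table above.
RUNS = {
    "Gemma-2-9B": (7144.3885, 2.487, 1.244),
    "GLM-4-9B": (18074.5771, 0.983, 0.492),
    "Qwen2-7B": (19261.7879, 0.923, 0.461),
    "Llama-3.2-3B-Instruct": (12565.5941, 1.414, 0.707),
    "Qwen-1.8B": (1622.0370, 10.956, 5.478),
}


def implied_samples(runtime_s, samples_per_s):
    """Number of samples implied by a runtime and a throughput."""
    return runtime_s * samples_per_s


def implied_batch_size(samples_per_s, steps_per_s):
    """Samples processed per step, implied by the two throughput columns."""
    return samples_per_s / steps_per_s


def runtime_ratio(slower, faster):
    """How many times longer the `slower` model ran than the `faster` one."""
    return RUNS[slower][0] / RUNS[faster][0]


for model, (runtime_s, samples_per_s, steps_per_s) in RUNS.items():
    samples = implied_samples(runtime_s, samples_per_s)
    batch = implied_batch_size(samples_per_s, steps_per_s)
    print(f"{model}: ~{samples:.0f} samples, ~{batch:.2f} samples per step")

print(f"Gemma-2-9B / Qwen-1.8B runtime ratio: {runtime_ratio('Gemma-2-9B', 'Qwen-1.8B'):.2f}")
```

Every row implies between roughly 17,767 and 17,779 samples and about two samples per step, which suggests that all five models were evaluated on the same set with the same batch size. The small spread is consistent with rounding in the reported throughput figures.
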
Index | Input | Label | Output |
---|---|---|---|
4528 | 我是一个打工仔,今年40岁,男,有腰椎间盘突出就已经是一个会让我难受的问题了,经常容易腰疼,问一下腰部骨质增生腰间盘突出怎么治? | 外科 | 内科 |
4528 (English translation) | I’m a 40-year-old man and have worked manual labor jobs in the past. I already suffer from lumbar disc herniation, which has been a persistent and painful problem for me. I often experience lower back pain. I’d like to ask: how can lumbar spondylosis and herniated lumbar discs be treated? | Surgery | Internal Medicine |
75 | 我晚上发现孩子的后背上出现了几个小红点点,当时我也没有在意,今天早晨发现他的全身都起了很多的红疹,小孩全身红疹怎么回事? | 皮肤性病科 | 儿科 |
75 (English translation) | I noticed a few small red dots on my child’s back at night. I didn’t pay much attention to it at that time. But this morning, I found that his whole body was covered with many red rashes. What’s going on with a child having rashes all over his body? | Department of Dermatology and Venereology | Pediatrics |
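
Error cases like the two above are typically collected by comparing each gold label with the model output and tallying the confused department pairs. The sketch below shows one way to do this. The record layout (`index`, `label`, `output`) and the function names are assumptions for illustration, and only the two rows from the table are used as data.

```python
from collections import Counter

# The two misclassified samples from the table above (gold label vs. model output).
ROWS = [
    {"index": 4528, "label": "外科", "output": "内科"},
    {"index": 75, "label": "皮肤性病科", "output": "儿科"},
]


def accuracy(rows):
    """Fraction of rows whose output exactly matches the gold label."""
    if not rows:
        return 0.0
    correct = sum(1 for row in rows if row["output"].strip() == row["label"].strip())
    return correct / len(rows)


def confusion_pairs(rows):
    """Count (gold label, predicted label) pairs over the misclassified rows."""
    return Counter(
        (row["label"].strip(), row["output"].strip())
        for row in rows
        if row["output"].strip() != row["label"].strip()
    )


for (gold, predicted), count in confusion_pairs(ROWS).items():
    print(f"{gold} -> {predicted}: {count}")
```

Applied to a full evaluation set, the most frequent pairs would point to the department boundaries that the model finds hardest to separate.
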
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ihnaini, B.; Zeng, X.; Yan, H.; Fang, F.; Sangi, A.R. Leveraging Large Language Models for Departmental Classification of Medical Records. Appl. Sci. 2025, 15, 6525. https://doi.org/10.3390/app15126525