4. Experiment Results and Discussion
Author Contributions
Conceptualization, H.L. (Hao Li) and H.L. (Hongfei Lin); Methodology, H.L. (Hao Li); Software, H.L. (Hao Li); Validation, H.L. (Hongfei Lin); Formal analysis, H.L. (Hao Li); Investigation, H.L. (Hao Li) and H.L. (Hongfei Lin); Resources, H.L. (Hongfei Lin); Data curation, H.L. (Hao Li); Writing—original draft preparation, H.L. (Hao Li); Writing—review and editing, H.L. (Hongfei Lin); Visualization, H.L. (Hao Li); Supervision, H.L. (Hongfei Lin); Project administration, H.L. (Hongfei Lin). All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data is contained within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Baber, N. International conference on harmonisation of technical requirements for registration of pharmaceuticals for human use (ICH). Br. J. Clin. Pharmacol. 1994, 37, 401.
- Khemani, B.; Malave, S.; Shinde, S.; Shukla, M.; Shikalgar, R.; Talwar, H. AI-Driven Pharmacovigilance: Enhancing Adverse Drug Reaction Detection with Deep Learning and NLP. MethodsX 2025, 15, 103460.
- Wei, Y.; Li, R.; Sun, C.; Zhu, C.; Chen, T.; Yang, H.; Liu, H. The Role of Artificial Intelligence in Adverse Drug Reaction Monitoring: Current Status and Challenges. Med. J. Peking Union Med. Coll. Hosp. 2025, 16, 1363–1370.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.
- Hsu, D.; Moh, M.; Moh, T.S.; Moh, D. Drug side effect frequency mining over a large Twitter dataset using Apache Spark. In Handbook of Artificial Intelligence in Biomedical Engineering; Apple Academic Press: Palm Bay, FL, USA, 2021; pp. 233–259.
- Golder, S.; Xu, D.; O’Connor, K.; Wang, Y.; Batra, M.; Hernandez, G.G. Leveraging natural language processing and machine learning methods for adverse drug event detection in electronic health/medical records: A scoping review. Drug Saf. 2025, 48, 321–337.
- Murphy, R.M.; Klopotowska, J.E.; de Keizer, N.F.; Jager, K.J.; Leopold, J.H.; Dongelmans, D.A.; Abu-Hanna, A.; Schut, M.C. Adverse drug event detection using natural language processing: A scoping review of supervised learning methods. PLoS ONE 2023, 18, e0279842.
- Kuhn, M.; Campillos, M.; Letunic, I.; Jensen, L.J.; Bork, P. A side effect resource to capture phenotypic effects of drugs. Mol. Syst. Biol. 2010, 6, MSB200998.
- Regev, Y.; Finkelstein-Landau, M.; Feldman, R.; Gorodetsky, M.; Zheng, X.; Levy, S.; Charlab, R.; Lawrence, C.; Lippert, R.A.; Zhang, Q.; et al. Rule-based extraction of experimental evidence in the biomedical domain: The KDD Cup 2002 (task 1). ACM SIGKDD Explor. Newsl. 2002, 4, 90–92.
- Rastegar-Mojarad, M.; Elayavilli, R.K.; Yu, Y.; Liu, H. Detecting signals in noisy data - can ensemble classifiers help identify adverse drug reaction in tweets. In Proceedings of the Social Media Mining Shared Task Workshop at the Pacific Symposium on Biocomputing, Kohala Coast, HI, USA, 4–8 January 2016.
- Liu, J.; Zhao, S.; Zhang, X. An ensemble method for extracting adverse drug events from social media. Artif. Intell. Med. 2016, 70, 62–76.
- Patki, A.; Sarker, A.; Pimpalkhute, P.; Nikfarjam, A.; Ginn, R.; O’Connor, K.; Smith, K.; Gonzalez, G. Mining adverse drug reaction signals from social media: Going beyond extraction. Proc. Biolinksig 2014, 2014, 1–8.
- Yang, M.; Wang, X.; Kiang, M.Y. Identification of Consumer Adverse Drug Reaction Messages on Social Media. In Proceedings of the Pacific Asia Conference on Information Systems (PACIS), Jeju Island, Republic of Korea, 18–22 June 2013; p. 193. Available online: https://aisel.aisnet.org/pacis2013/193 (accessed on 23 March 2026).
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 26–28 October 2014; pp. 1746–1751.
- Zhang, X.; Zhao, J.; LeCun, Y. Character-level convolutional networks for text classification. Adv. Neural Inf. Process. Syst. 2015, 28, 649–657. Available online: https://proceedings.neurips.cc/paper/2015/hash/250cf8b51c773f3f8dc8b4be867a9a02-Abstract.html (accessed on 23 March 2026).
- Huynh, T.; He, Y.; Willis, A.; Rueger, S. Adverse Drug Reaction Classification With Deep Neural Networks. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers; The COLING 2016 Organizing Committee: Osaka, Japan, 2016.
- Alimova, I.; Solovyev, V. Interactive attention network for adverse drug reaction classification. In Proceedings of the Conference on Artificial Intelligence and Natural Language; Springer: Cham, Switzerland, 2018; pp. 185–196.
- Wu, C.; Wu, F.; Liu, J.; Wu, S.; Huang, Y.; Xie, X. Detecting tweets mentioning drug name and adverse drug reaction with hierarchical tweet representation and multi-head self-attention. In Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 34–37.
- Zhang, T.; Lin, H.; Xu, B.; Yang, L.; Wang, J.; Duan, X. Adversarial neural network with sentiment-aware attention for detecting adverse drug reactions. J. Biomed. Inform. 2021, 123, 103896.
- Sun, C.; Qiu, X.; Xu, Y.; Huang, X. How to fine-tune BERT for text classification? In Proceedings of the China National Conference on Chinese Computational Linguistics; Springer: Cham, Switzerland, 2019; pp. 194–206.
- Qiu, Y.; Zhang, X.; Wang, W.; Zhang, T.; Xu, B.; Lin, H. KESDT: Knowledge enhanced shallow and deep transformer for detecting adverse drug reactions. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing; Springer: Cham, Switzerland, 2023; pp. 601–613.
- Li, H.; Qiu, Y.; Lin, H. Multi-Feature Enhanced Adverse Drug Reaction Detection for Social Media. J. Chin. Inf. Process. 2025, 39, 148–156.
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609.
- Luo, L.; Ning, J.; Zhao, Y.; Wang, Z.; Ding, Z.; Chen, P.; Fu, W.; Han, Q.; Xu, G.; Qiu, Y.; et al. Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. J. Am. Med. Inform. Assoc. 2024, 31, 1865–1874.
- Zitu, M.M.; Owen, D.; Manne, A.; Wei, P.; Li, L. Large Language Models for Adverse Drug Events: A Clinical Perspective. J. Clin. Med. 2025, 14, 5490.
- Fu, W.; Lin, H.; Xu, G.; Qiu, Y.; Wang, J.; Diao, Y.; Zheng, P. Data Augmentation and Instruction Fine-Tuning for ADR Detection. In Proceedings of the China Health Information Processing Conference; Springer: Singapore, 2024; pp. 3–20.
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst. 2023, 36, 10088–10115.
- Shum, K.; Diao, S.; Zhang, T. Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023; Association for Computational Linguistics: Singapore, 2023; pp. 12113–12139.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3980–3990.
- Karimi, S.; Metke-Jimenez, A.; Kemp, M.; Wang, C. Cadec: A corpus of adverse drug event annotations. J. Biomed. Inform. 2015, 55, 73–81.
- Alvaro, N.; Miyao, Y.; Collier, N. TwiMed: Twitter and PubMed comparable corpus of drugs, diseases, symptoms, and their relations. JMIR Public Health Surveill. 2017, 3, e6396.
- Li, Z.; Yang, Z.; Luo, L.; Xiang, Y.; Lin, H. Exploiting adversarial transfer learning for adverse drug reaction detection from texts. J. Biomed. Inform. 2020, 106, 103431.
- Gao, Y.; Ji, S.; Zhang, T.; Tiwari, P.; Marttinen, P. Contextualized graph embeddings for adverse drug event detection. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2022; pp. 605–620.
- Gao, Y.; Ji, S.; Marttinen, P. Knowledge-augmented graph neural networks with concept-aware attention for adverse drug event detection. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024); ELRA and ICCL: Torino, Italy, 2024; pp. 9787–9798.
- Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C.; et al. DeepSeek-V3 technical report. arXiv 2024, arXiv:2412.19437.
- Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al. The Llama 3 herd of models. arXiv 2024, arXiv:2407.21783.
Figure 1.
Examples of adverse drug reaction detection in social media.
Figure 3.
Automatic construction of chain-of-thought (CoT) inputs and outputs.
Figure 4.
Adverse drug reaction detection question–answer pairs constructed for supervised fine-tuning (SFT).
Figure 6.
Demonstration of templates for guiding model outputs under low-resource conditions.
Figure 7.
Performance comparison of different base models on the CADEC dataset with and without CoT.
Figure 8.
Performance comparison of different base models on the Twitter dataset with and without CoT.
Figure 9.
Impact of the number of clusters on the performance of DetectionADRGPT on the Twitter dataset.
Table 1.
Statistics of the adverse drug reaction detection datasets from social media.
| Dataset | ADR | Non-ADR | Total | MaxLength |
|---|---|---|---|---|
| CADEC | 2478 | 4996 | 7474 | 46 |
| Twitter | 232 | 393 | 625 | 241 |
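As a quick check on Table 1, the sketch below recomputes each dataset's total and the share of ADR samples. Both corpora are skewed toward Non-ADR posts (roughly one third ADR), which is one common reason to report precision, recall, and F1 for the ADR class rather than accuracy. The dictionary and helper names are illustrative only and are not part of the original work.

```python
# Class counts copied from Table 1 (names of keys are illustrative).
DATASETS = {
    "CADEC": {"adr": 2478, "non_adr": 4996, "total": 7474},
    "Twitter": {"adr": 232, "non_adr": 393, "total": 625},
}


def adr_share(adr: int, non_adr: int) -> float:
    """Fraction of samples labelled ADR (the positive class)."""
    return adr / (adr + non_adr)


for name, counts in DATASETS.items():
    # Sanity check: the Total column should equal ADR + Non-ADR.
    assert counts["adr"] + counts["non_adr"] == counts["total"], name
    share = adr_share(counts["adr"], counts["non_adr"])
    print(f"{name}: ADR share = {share:.2%}")
```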
Table 2.
Performance comparison between the proposed models and discriminative baseline models on the CADEC dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| CRNN | 61.26 | 65.96 | 63.52 |
| HTR + MSA | 60.67 | 61.70 | 61.18 |
| CNN + corpus | 52.75 | 61.28 | 56.69 |
| CNN + transfer | 61.84 | 60.00 | 60.91 |
| ATL | 63.68 | 63.40 | 63.54 |
| ANNSA | 82.73 | 83.52 | 83.06 |
| KESDT | 88.16 | 87.63 | 87.82 |
| KnowCAGE | 87.10 | 93.90 | 90.40 |
| DMFE | 88.40 | 89.63 | 89.01 |
| DetectionADRGPT | 86.37 (0.21) | 88.24 (0.34) | 87.29 (0.29) |
| LLaMA-DetectionADR | 92.13 (0.59) | 93.21 (0.26) | 92.67 (0.41) |
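The F1 column in Tables 2 to 5 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The minimal sketch below recomputes it for a few rows of Table 2; rows whose F1 was averaged over runs or taken from the original papers may differ slightly from a value recomputed from P and R. The helper `f1_score` is a hypothetical name written for this illustration, not a call into any particular library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) for a few rows of Table 2 (CADEC).
ROWS = {
    "CRNN": (61.26, 65.96, 63.52),
    "DMFE": (88.40, 89.63, 89.01),
    "DetectionADRGPT": (86.37, 88.24, 87.29),
    "LLaMA-DetectionADR": (92.13, 93.21, 92.67),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported = {reported:.2f}")
```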
Table 3.
Performance comparison between the proposed models and discriminative baseline models on the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| CRNN | 68.52 | 66.43 | 67.46 |
| HTR + MSA | 66.58 | 63.62 | 65.07 |
| CNN + corpus | 60.51 | 61.50 | 61.00 |
| CNN + transfer | 69.58 | 61.74 | 65.42 |
| ATL | 70.84 | 65.02 | 67.81 |
| ANNSA | 58.82 | 73.34 | 64.18 |
| CGEM | 84.20 | 83.70 | 83.90 |
| KnowCAGE | 84.80 | 84.10 | 84.40 |
| DetectionADRGPT | 81.39 (0.51) | 84.25 (0.89) | 82.80 (0.72) |
| LLaMA-DetectionADR | 85.99 (0.65) | 86.28 (0.43) | 86.13 (0.52) |
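The proposed models are reported as a value followed by a number in parentheses. Assuming the parenthesized number is the standard deviation over repeated runs (the number of runs and the seeds are not stated in this section), such entries could be produced as sketched below. The run scores are hypothetical placeholders, and the use of the sample standard deviation is likewise an assumption.

```python
import statistics
from typing import Sequence


def mean_std(scores: Sequence[float]) -> str:
    """Format repeated-run scores as 'mean (std)' with two decimals."""
    mean = statistics.mean(scores)
    # Sample standard deviation (n - 1 denominator) is assumed here;
    # statistics.pstdev would give the population version instead.
    std = statistics.stdev(scores)
    return f"{mean:.2f} ({std:.2f})"


# Hypothetical F1 scores (%) from five seeds; NOT results from the paper.
runs = [86.0, 86.5, 85.5, 86.2, 86.3]
print(mean_std(runs))
```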
Table 4.
Performance comparison between the proposed method and mainstream generative models on the CADEC dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| GPT-4o (zero-shot) | 81.92 | 79.21 | 80.54 |
| GPT-4o (few-shot) | 84.17 | 78.84 | 81.42 |
| DeepSeek-V3 (zero-shot) | 75.29 | 80.66 | 77.88 |
| DeepSeek-V3 (few-shot) | 76.22 | 81.13 | 78.60 |
| LLaMA3-8B-SFT | 92.12 | 91.44 | 91.78 |
| Bal-LLaMA | 92.0 | 93.00 | 92.40 |
| DetectionADRGPT | 86.37 (0.21) | 88.24 (0.34) | 87.29 (0.29) |
| LLaMA-DetectionADR | 92.13 (0.59) | 93.21 (0.26) | 92.67 (0.41) |
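Tables 4 and 5 evaluate GPT-4o and DeepSeek-V3 in zero-shot and few-shot settings. The two settings differ only in whether labelled demonstrations precede the query, as in the sketch below. The instruction wording, the `Example` type, and the `build_prompt` helper are hypothetical and do not reproduce the prompts used in the experiments.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """A labelled demonstration (hypothetical structure)."""
    text: str
    label: str  # "ADR" or "Non-ADR"


# Hypothetical instruction text; not the prompt used in the paper.
INSTRUCTION = (
    "Decide whether the following post reports an adverse drug reaction. "
    "Answer with ADR or Non-ADR."
)


def build_prompt(query: str, demos: Optional[Sequence[Example]] = None) -> str:
    """Zero-shot prompt when `demos` is empty or None, few-shot otherwise."""
    parts = [INSTRUCTION]
    # Each demonstration uses the same Post/Answer layout as the query.
    for demo in demos or []:
        parts.append(f"Post: {demo.text}\nAnswer: {demo.label}")
    # The query ends with an open "Answer:" slot for the model to complete.
    parts.append(f"Post: {query}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Invented example posts, for illustration only.
    demos = [
        Example("This new pill gives me a pounding headache every morning.", "ADR"),
        Example("Picked up my prescription refill today.", "Non-ADR"),
    ]
    query = "Since switching meds I can barely sleep."
    print(build_prompt(query))         # zero-shot
    print("-" * 40)
    print(build_prompt(query, demos))  # few-shot
```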
Table 5.
Performance comparison between the proposed method and mainstream generative models on the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| GPT-4o (zero-shot) | 80.22 | 79.37 | 79.79 |
| GPT-4o (few-shot) | 82.19 | 80.11 | 81.13 |
| DeepSeek-V3 (zero-shot) | 76.99 | 81.22 | 79.04 |
| DeepSeek-V3 (few-shot) | 75.57 | 82.98 | 79.10 |
| LLaMA3-8B-SFT | 84.16 | 86.20 | 85.17 |
| Bal-LLaMA | 85.90 | 86.30 | 85.50 |
| DetectionADRGPT | 81.39 (0.51) | 84.25 (0.89) | 82.80 (0.72) |
| LLaMA-DetectionADR | 85.99 (0.65) | 86.28 (0.43) | 86.13 (0.52) |
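The F1 values in Tables 4 and 5 can be compared directly. The sketch below computes, for each general-purpose model, the few-shot gain over zero-shot prompting and the gap that remains to LLaMA-DetectionADR. All numbers are copied from the two tables; only the `compare` helper is new and illustrative.

```python
from typing import Dict, Tuple

# F1 (%) as (zero-shot, few-shot), copied from Tables 4 (CADEC) and 5 (Twitter).
PROMPTED_F1: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("GPT-4o", "CADEC"): (80.54, 81.42),
    ("GPT-4o", "Twitter"): (79.79, 81.13),
    ("DeepSeek-V3", "CADEC"): (77.88, 78.60),
    ("DeepSeek-V3", "Twitter"): (79.04, 79.10),
}
# F1 (%) of LLaMA-DetectionADR on each dataset, from the same tables.
FINE_TUNED_F1: Dict[str, float] = {"CADEC": 92.67, "Twitter": 86.13}


def compare(
    prompted: Dict[Tuple[str, str], Tuple[float, float]],
    fine_tuned: Dict[str, float],
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(model, dataset): (few-shot gain, remaining gap)} in F1 points."""
    result = {}
    for (model, dataset), (zero, few) in prompted.items():
        result[(model, dataset)] = (few - zero, fine_tuned[dataset] - few)
    return result


for (model, dataset), (gain, gap) in compare(PROMPTED_F1, FINE_TUNED_F1).items():
    print(f"{model} on {dataset}: few-shot gain {gain:+.2f}, "
          f"gap to LLaMA-DetectionADR {gap:.2f}")
```

On these figures, demonstrations raise F1 by at most 1.34 points, while the gap to the fine-tuned model stays between 5.00 and 14.07 points.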
Table 6.
Ablation experiments of the LLaMA-DetectionADR model on the CADEC dataset.
| Model | P (%) | R (%) | F1 (%) | ΔF1 |
|---|---|---|---|---|
| LLaMA-DetectionADR | 92.13 | 93.21 | 92.67 | − |
| w/o CoT | 91.17 | 92.56 | 91.86 | 0.81 |
| w/o Multiple Prompts | 92.25 | 92.19 | 92.22 | 0.45 |
Table 7.
Ablation experiments of the LLaMA-DetectionADR model on the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) | ΔF1 |
|---|---|---|---|---|
| LLaMA-DetectionADR | 85.99 | 86.28 | 86.13 | − |
| w/o CoT | 84.85 | 86.27 | 85.55 | 0.58 |
| w/o Multiple Prompts | 85.79 | 85.71 | 85.74 | 0.39 |
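ΔF1 in Tables 6 and 7 is the F1 of the full model minus the F1 of the ablated variant, so a positive value is the drop caused by removing that component. The sketch below reproduces the column from the reported F1 values and shows that removing CoT costs more than removing multiple prompts on both datasets. The `delta_f1` helper is an illustrative name.

```python
def delta_f1(full_f1: float, ablated_f1: float) -> float:
    """F1 drop in percentage points when a component is removed."""
    return round(full_f1 - ablated_f1, 2)


# Full-model F1 and ablated-variant F1, copied from Tables 6 (CADEC) and 7 (Twitter).
ABLATIONS = {
    "CADEC": (92.67, {"w/o CoT": 91.86, "w/o Multiple Prompts": 92.22}),
    "Twitter": (86.13, {"w/o CoT": 85.55, "w/o Multiple Prompts": 85.74}),
}

for dataset, (full, variants) in ABLATIONS.items():
    for variant, f1 in variants.items():
        print(f"{dataset} {variant}: ΔF1 = {delta_f1(full, f1):.2f}")
```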
Table 8.
Ablation experiments of the DetectionADRGPT architecture on the CADEC dataset.
| Model | P (%) | R (%) | F1 (%) | ΔF1 |
|---|---|---|---|---|
| DetectionADRGPT | 86.37 | 88.24 | 87.29 | − |
| w/o clustering | 84.69 | 87.12 | 85.89 | 1.40 |
| w/o CoT | 82.49 | 86.17 | 84.29 | 2.99 |
| w/o prompt | 84.17 | 80.08 | 82.07 | 5.22 |
Table 9.
Ablation experiments of the DetectionADRGPT architecture on the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) | ΔF1 |
|---|---|---|---|---|
| DetectionADRGPT | 81.39 | 84.25 | 82.80 | − |
| w/o clustering | 81.71 | 82.74 | 82.22 | 0.58 |
| w/o CoT | 82.16 | 81.04 | 81.60 | 1.16 |
| w/o prompt | 81.67 | 79.36 | 80.49 | 2.31 |
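Sorting the ΔF1 values in Tables 8 and 9 gives the relative contribution of each DetectionADRGPT component. The sketch below uses the values as printed and yields the same order on both datasets: removing the prompt hurts most, followed by CoT and then clustering. The component keys and the `rank_components` helper are illustrative names.

```python
from typing import Dict, List

# ΔF1 values as printed in Tables 8 (CADEC) and 9 (Twitter).
REPORTED_DELTA: Dict[str, Dict[str, float]] = {
    "CADEC": {"clustering": 1.40, "CoT": 2.99, "prompt": 5.22},
    "Twitter": {"clustering": 0.58, "CoT": 1.16, "prompt": 2.31},
}


def rank_components(deltas: Dict[str, float]) -> List[str]:
    """Order ablated components from largest to smallest F1 drop."""
    return sorted(deltas, key=lambda name: deltas[name], reverse=True)


for dataset, deltas in REPORTED_DELTA.items():
    print(dataset, " > ".join(rank_components(deltas)))
```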
Table 10.
Model performance comparison on noisy text from the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| LLaMA-SFT | 79.15 | 84.31 | 81.65 |
| LLaMA-DetectionADR | 83.56 | 84.92 | 84.23 |
Table 11.
Model performance comparison on logically ambiguous text from the Twitter dataset.
| Model | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| LLaMA-SFT | 74.81 | 75.33 | 75.07 |
| LLaMA-DetectionADR | 81.27 | 82.64 | 81.95 |
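Tables 10 and 11 compare LLaMA-DetectionADR with the LLaMA-SFT baseline on two harder subsets of the Twitter data. Because the baseline starts from a different F1 level on each subset, one way to compare the gains is the share of the baseline's remaining F1 shortfall (100 - F1) that the improved model closes. This normalization is an illustrative choice, not a metric used in the tables, and `shortfall_closed` is a hypothetical helper.

```python
def shortfall_closed(baseline_f1: float, improved_f1: float) -> float:
    """Fraction of the baseline's F1 shortfall (100 - F1) removed by the improved model."""
    shortfall = 100.0 - baseline_f1
    if shortfall <= 0:
        raise ValueError("baseline F1 must be below 100")
    return (improved_f1 - baseline_f1) / shortfall


# (LLaMA-SFT F1, LLaMA-DetectionADR F1), copied from Tables 10 and 11.
SUBSETS = {
    "noisy": (81.65, 84.23),
    "logically ambiguous": (75.07, 81.95),
}

for subset, (sft_f1, detection_f1) in SUBSETS.items():
    gain = detection_f1 - sft_f1
    closed = shortfall_closed(sft_f1, detection_f1)
    print(f"{subset}: +{gain:.2f} F1 points, {closed:.1%} of the shortfall closed")
```

On these figures, the gain is 2.58 points (about 14% of the shortfall) on noisy text and 6.88 points (about 28%) on logically ambiguous text.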
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.