DAFT: Domain-Augmented Fine-Tuning for Large Language Models in Emotion Recognition of Health Misinformation
Abstract
1. Introduction
2. Research Background and Related Work
2.1. Definition of Health Misinformation
2.2. Health Misinformation Emotion Recognition Method
2.3. Prompt Fine-Tuning for LLMs
2.4. Review Summary
3. DAFT Framework for Emotion Recognition in Health Misinformation
3.1. Construction of Health Misinformation Emotion Corpus
3.1.1. Corpus Collection
3.1.2. Selection and Expansion of Basic Emotion Lexicon
3.1.3. Emotion Annotation of the Corpus
3.2. Fine-Tuning Model Based on Emotion-Annotated Corpus
3.3. Model Evaluation
4. Experimental Results and Analysis
4.1. Experimental Data
4.1.1. Dataset Construction
4.1.2. Dataset Balancing Processing
4.2. Model Parameter Settings
4.3. Validation Set Performance Monitoring
4.4. Model Comparison Experiment
5. Discussion
- (1) Effectiveness of fine-tuning: All models showed varying degrees of improvement after fine-tuning, with performance gains most evident in the recognition of negative emotions. The 4o-FT model achieved the lowest prediction error for negative emotions such as anger, fear, and sadness. This may be attributed to the dominance of negative emotions in health misinformation, which leads to a stronger representation of negative emotional features in the dataset. In addition, the GPT-4o model benefited from large-scale pretraining, which enhanced its capacity to recognize negative emotional cues.
- (2) Model capability comparison: GPT-3.5-turbo performed worse than GPT-4o and GPT-4o-mini in emotion recognition, especially in identifying positive emotions such as joy and surprise, where its accuracy was significantly lower. This suggests that smaller pretrained models struggle to capture subtle emotional differences in multidimensional emotion recognition tasks, which degrades their overall performance.
- (3) Impact of data balancing: The mixed sampling strategy effectively addressed emotional category imbalance in the dataset by increasing the number of positive emotion samples (e.g., joy and surprise) and reducing training bias. The confusion matrix results further demonstrate that, although negative emotions remain predominant, the distribution of true and predicted emotion labels became more consistent after fine-tuning, indicating that mixed sampling enhances the model’s ability to learn emotional features across categories.
- (1) The DAFT strategy significantly enhanced model performance in emotion recognition for health misinformation, improving prediction accuracy, precision, and emotional granularity. After fine-tuning, all three GPT models achieved reductions in MAE and MSE, consistent with findings that fine-tuning enhances task adaptation and semantic alignment in LLMs [12]. The F1-score of GPT-4o increased by 0.1156, and its accuracy improved from 76.95% to 84.77%, while GPT-4o-mini showed improved vector similarity (+0.1759) and reduced probability distribution divergence (−0.1498), demonstrating effective emotional representation learning.
- (2) Model performance is positively correlated with pre-training scale, as larger-scale models can encode richer emotional semantics and contextual knowledge, leading to better task generalization [13]. This trend is evident in the confusion matrix analysis: the GPT-4o model achieved higher recognition accuracy for primary emotions (219 Anger and 376 Fear instances correctly identified), with close alignment between predicted and true emotion counts. In contrast, the smaller GPT-3.5-turbo model struggled to capture complex emotional structures, correctly predicting only an average of 29 instances across the Anger, Disgust, and Joy categories.
- (3) Balanced data sampling improves performance consistency across emotional categories. The mixed sampling strategy effectively mitigated overrepresentation of negative emotions and enabled fairer learning across emotional dimensions, in line with previous work emphasizing the importance of addressing data imbalance in classification tasks [54]. Although variation in performance across emotion categories was observed, no extreme disparities were found. For example, the 4o-FT model reached a maximum precision of 0.912 and a minimum of 0.752 across emotional dimensions, with a median recall of 0.5 and only 0.101 variation between the highest and lowest F1-scores. These results suggest that balancing data distribution improves the overall stability of model recognition performance.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Muhtar, S.M.; Amir, A.S. Utilizing social media for public health advocacy and awareness in digital health communication. MSJ Major. Sci. J. 2024, 1, 270–278. [Google Scholar] [CrossRef]
- Islam, A.K.M.N.; Laato, S.; Talukder, S.; Sutinen, E. Misinformation sharing and social media fatigue during COVID-19: An affordance and cognitive load perspective. Technol. Forecast. Soc. Change 2020, 159, 120201. [Google Scholar] [CrossRef] [PubMed]
- Zhou, X.; Jain, A.; Phoha, V.V.; Zafarani, R. Fake News Early Detection: An Interdisciplinary Study. arXiv 2019, arXiv:1904.11679. [Google Scholar]
- Chen, C.; Wang, H.; Shapiro, M.; Xiao, Y.; Wang, F.; Shu, K. Combating Health Misinformation in Social Media: Characterization, Detection, Intervention, and Open Issues. arXiv 2022, arXiv:2211.05289. [Google Scholar] [CrossRef]
- Bautista, J.R.; Zhang, Y.; Gwizdka, J. Predicting healthcare professionals’ intention to correct health misinformation on social media. Telemat. Inform. 2022, 73, 101864. [Google Scholar]
- Liu, Z.; Zhang, T.; Yang, K.; Thompson, P.; Yu, Z.; Ananiadou, S. Emotion detection for misinformation: A review. Inf. Fusion 2024, 107, 102300. [Google Scholar] [CrossRef]
- Horner, C.G.; Galletta, D.; Crawford, J.; Shirsat, A. Emotions: The unexplored fuel of fake news on social media. In Fake News on the Internet; Routledge: London, UK, 2023; pp. 147–174. [Google Scholar]
- Alghamdi, J.; Luo, S.; Lin, Y. A comprehensive survey on machine learning approaches for fake news detection. Multimed. Tools Appl. 2024, 83, 51009–51067. [Google Scholar]
- Chiong, R.; Budhi, G.S.; Dhakal, S. Combining Sentiment Lexicons and Content-Based Features for Depression Detection. IEEE Intell. Syst. 2021, 36, 99–105. [Google Scholar] [CrossRef]
- Brauwers, G.; Frasincar, F. A Survey on Aspect-Based Sentiment Classification. ACM Comput. Surv. 2022, 4, 1–37. [Google Scholar] [CrossRef]
- Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
- Peng, C.; Yang, X.; Smith, K.E.; Yu, Z.; Chen, A.; Bian, J.; Wu, Y. Model Tuning or Prompt Tuning? A Study of Large Language Models for Clinical Concept and Relation Extraction. J. Biomed. Inform. 2024, 153, 104630. [Google Scholar] [CrossRef] [PubMed]
- Parthasarathy, V.B.; Zafar, A.; Khan, A.; Shahid, A. The ultimate guide to fine-tuning llms from basics to breakthroughs: An exhaustive review of technologies, research, best practices, applied research challenges and opportunities. arXiv 2024, arXiv:2408.13296. [Google Scholar] [CrossRef]
- White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv 2023, arXiv:2302.11382. [Google Scholar] [CrossRef]
- Maharjan, J.; Garikipati, A.; Preet Singh, N.; Cyrus, L.; Sharma, M.; Ciobanu, M.; Barnes, G.; Thapa, R.; Mao, Q.; Das, R. OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Sci. Rep. 2024, 14, 14156. [Google Scholar] [CrossRef] [PubMed]
- Xu, H.; Elbayad, M.; Murray, K.; Maillard, J.; Goswami, V. Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity. arXiv 2023, arXiv:2305.02176. [Google Scholar] [CrossRef]
- Fernandez, M.; Alani, H. Online misinformation: Challenges and future directions. In Proceedings of the Companion of the Web Conference 2018, Lyon, France, 23–27 April 2018; pp. 595–602. [Google Scholar]
- Rubin, V.L. Disinformation and misinformation triangle: A conceptual model for “fake news” epidemic, causal factors and interventions. J. Doc. 2019, 75, 1013–1034. [Google Scholar]
- Karami, M.; Nazer, T.H.; Liu, H. Profiling Fake News Spreaders on Social Media through Psychological and Motivational Factors. In Proceedings of the 32nd ACM Conference on Hypertext and Social Media, Online, 30 August–2 September 2021; pp. 225–230. [Google Scholar]
- Bordia, P.; DiFonzo, N. Rumor, Gossip and Urban Legends. Diogenes 2007, 54, 19–35. [Google Scholar] [CrossRef]
- Wang, B.; Ma, J.; Lin, H.; Yang, Z.; Yang, R.; Tian, Y.; Chang, Y. Explainable fake news detection with large language model via defense among competing wisdom. In Proceedings of the ACM Web Conference 2024, Association for Computing Machinery, Singapore, 13–17 May 2024; pp. 2452–2463. [Google Scholar]
- Wang, Y.; McKee, M.; Torbica, A.; Stuckler, D. Systematic Literature Review on the Spread of Health-related Misinformation on Social Media. Soc. Sci. Med. 2019, 240, 112552. [Google Scholar] [CrossRef]
- Kisa, S.; Kisa, A. A comprehensive analysis of COVID-19 misinformation, public health impacts, and communication strategies: Scoping review. J. Med. Internet Res. 2024, 26, e56931. [Google Scholar]
- Balshetwar, S.V.; Abilash, R.S.; Dani, J. Fake news detection in social media based on sentiment analysis using classifier techniques. Multimed. Tools Appl. 2023, 82, 35781–35811. [Google Scholar] [CrossRef]
- Kaur, R.; Kautish, S. Multimodal Sentiment Analysis: A Survey and Comparison. Int. J. Serv. Sci. Manag. Eng. Technol. IJSSMET 2019, 10, 38–58. [Google Scholar] [CrossRef]
- Chen, H.; Zheng, P.; Wang, X.; Hu, S.; Zhu, B.; Hu, J.; Wu, X.; Lyu, S. Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 1 June 2023; pp. 923–932. [Google Scholar]
- Yousif, A.; Buckley, J. Impact of Sentiment Analysis in Fake Review Detection. arXiv 2022, arXiv:2212.08995. [Google Scholar] [CrossRef]
- Hamed, S.K.; Ab Aziz, M.J.; Yaakub, M.R. Fake News Detection Model on Social Media by Leveraging Sentiment Analysis of News Content and Emotion Analysis of Users’ Comments. Sensors 2023, 23, 1748. [Google Scholar] [CrossRef] [PubMed]
- Xuewen, Z.; Yaxiong, P.; Xiao, G.; Gang, L. Sentiment analysis-based social network rumor detection model with bi-directional graph convolutional networks. In Proceedings of the International Conference on Computer Application and Information Security, Online, 21 March 2023. [Google Scholar]
- Shin, J.; Tang, C.; Mohati, T.; Nayebi, M.; Wang, S.; Hemmati, H. Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code. In Proceedings of the 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR), Ottawa, ON, Canada, 28–29 April 2025; pp. 490–502. [Google Scholar]
- Ray, P.P. ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet Things Cyber Phys. Syst. 2023, 3, 121–154. [Google Scholar] [CrossRef]
- Wang, Y.; Rao, S.; Lee, J.; Jobanputra, M.; Demberg, V. B-cos LM: Efficiently Transforming Pre-trained Language Models for Improved Explainability. arXiv 2025, arXiv:2502.12992. [Google Scholar]
- Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.-L. A Brief Overview of ChatGPT: The History, Status Quo and Potential Future Development. IEEE/CAA J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
- Sharma, V.; Raman, V. A reliable knowledge processing framework for combustion science using foundation models. Energy AI 2024, 16, 100365. [Google Scholar] [CrossRef]
- Swire-Thompson, B.; Lazer, D. Public Health and Online Misinformation: Challenges and Recommendations. Annu. Rev. Public Health 2020, 41, 433–451. [Google Scholar]
- Suarez-Lledo, V.; Alvarez-Galvez, J. Prevalence of Health Misinformation on Social Media: Systematic Review. J. Med. Internet Res. 2021, 23, e17187. [Google Scholar]
- Navarrete, A.S.; Martinez-Araneda, C.; Vidal-Castro, C.; Rubio-Manzano, C. A novel approach to the creation of a labelling lexicon for improving emotion analysis in text. Electron. Libr. 2021, 39, 118–136. [Google Scholar] [CrossRef]
- Ordan, N.; Wintner, S. WordNet: A Lexical Database for the English Language. Choice Rev. Online 2007, 45, 45–1196. [Google Scholar] [CrossRef]
- Tonin, F.S.; Gmünder, V.; Bonetti, A.F.; Mendes, A.M.; Fernandez-Llimos, F. Use of ‘Pharmaceutical services’ Medical Subject Headings (MeSH) in articles assessing pharmacists’ interventions. Explor. Res. Clin. Soc. Pharm. 2022, 7, 100172. [Google Scholar] [CrossRef] [PubMed]
- Karimi, Z.A.; Sayeh, M. A survey of aspect-based sentiment analysis classification with a focus on graph neural network methods. Multimed. Tools Appl. 2023, 83, 56619–56695. [Google Scholar]
- Fugate, J.M.B.; O’Hare, A.J. A Review of The Handbook of Cognition and Emotion. J. Soc. Psychol. 2014, 154, 92–95. [Google Scholar] [CrossRef]
- Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780. [Google Scholar] [CrossRef]
- Canales, L.; Strapparava, C.; Boldrini, E.; Martínez-Barco, P. Intensional Learning to Efficiently Build up Automatically Annotated Emotion Corpora. IEEE Trans. Affect. Comput. 2017, 11, 335–347. [Google Scholar] [CrossRef]
- Lv, X. Few-Shot Text Classification with an Efficient Prompt Tuning Method in Meta-Learning Framework. Int. J. Pattern Recognit. Artif. Intell. 2024, 38, 2451006. [Google Scholar] [CrossRef]
- Wankhade, M.; Kulkarni, C.; Rao, A.C.S. A survey on aspect base sentiment analysis methods and challenges. Appl. Soft Comput. 2024, 167, 112249. [Google Scholar] [CrossRef]
- Klimiuk, K.; Czoska, A.; Biernacka, K.; Balwicki, Ł. Vaccine misinformation on social media—Topic-based content and sentiment analysis of Polish vaccine-deniers’ comments on Facebook. Hum. Vaccines Immunother. 2021, 17, 10–11. [Google Scholar]
- Baraniak, K.; Sydow, M. A dataset for Sentiment analysis of Entities in News headlines (SEN). Procedia Comput. Sci. 2021, 192, 3627–3636. [Google Scholar] [CrossRef]
- Hemmatian, F.; Sohrabi, M.K. A survey on classification techniques for opinion mining and sentiment analysis. Artif. Intell. Rev. 2019, 52, 1495–1545. [Google Scholar] [CrossRef]
- Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250. [Google Scholar] [CrossRef]
- Kruthika, S.G.; Nagavi, T.C.; Mahesha, P.; Chethana, H.T.; Ravi, V.; Mazroa, A.A. Identification of Suspect Using Transformer and Cosine Similarity Model for Forensic Voice Comparison. Secur. Priv. 2025, 8, e70038. [Google Scholar] [CrossRef]
- Nielsen, F. On the Jensen–Shannon Symmetrization of Distances Relying on Abstract Means. Entropy 2019, 21, 485. [Google Scholar] [CrossRef]
- Todinov, M.T. Probabilistic interpretation of algebraic inequalities related to reliability and risk. Qual. Reliab. Eng. Int. 2023, 39, 2330–2342. [Google Scholar] [CrossRef]
- Deng, X.; Liu, Q.; Deng, Y.; Mahadevan, S. An improved method to construct basic probability assignment based on the confusion matrix for classification problem. Inf. Sci. 2016, 340, 250–261. [Google Scholar] [CrossRef]
- Chamidah, N.; Widiyanto, D.; Seta, H.B.; Aziz, A.A. The Impact of Oversampling and Undersampling on Aspect-Based Sentiment Analysis of Indramayu Tourism Using Logistic Regression. Rev. D’Intelligence Artif. 2024, 38, 795. [Google Scholar] [CrossRef]





| Concept | Author, Year | Main Contribution | Limitations |
|---|---|---|---|
| Definition of Health Misinformation | Fernandez & Alani (2018) [17] | Clarifies core concepts and challenges of online misinformation | Does not focus on the health domain or on emotional mechanisms in misinformation |
| | Rubin (2019) [18] | Differentiates misinformation, disinformation and interventions | No empirical modeling of health-specific misinformation or emotion |
| | Chen et al. (2022) [4] | Systematic overview of health misinformation and detection approaches | Emotion is treated as a feature but not modeled as multi-dimensional vectors |
| Health Misinformation Emotion Recognition Method | Balshetwar et al. (2023) [24] | Shows that emotional features can help fake news detection | Uses general emotion polarity; ignores domain-specific medical emotion expressions |
| | Kaur & Kautish (2019) [25] | Demonstrates the benefit of combining text and image modalities | Not specific to health misinformation; no domain-adapted lexicon |
| | Hamed et al. (2023) [28] | Integrates sentiment of news and users’ comments to detect fake news | Emotion representation still coarse-grained |
| | Xuewen et al. (2023) [29] | Uses graph structures and sentiment for rumor detection | Does not exploit LLMs or prompt-based fine-tuning for emotion vectors |
| Prompt Fine-Tuning for LLMs | Liu et al. (2023) [11] | Systematically reviews prompting methods for PLMs | Focuses on general NLP tasks; no health-specific emotion recognition settings |
| | Parthasarathy et al. (2024) [13] | Summarizes full-parameter and parameter-efficient fine-tuning | Lacks a concrete application to health misinformation emotion recognition |
| | Xu et al. (2023) [16] | Reduces training cost via dynamic capacity | Method is generic; not combined with emotion-specific lexicons or health misinformation |
| | Shin et al. (2025) [30] | Provides evidence on when prompting or fine-tuning works better | Application domain is software engineering, not health misinformation |
| Misinformation Dataset | Type | Original Count → Filtered Count | Health Misinformation Assessment Criteria |
|---|---|---|---|
| Fake news dataset | News | 23502 → 2409 | |
| Twitter rumor | Social media | 3213 → 1120 | |
| Monkeypox misinformation | News | 6287 → 1163 | |
| DataSet Misinfo FAKE | News and social media | 55164 → 4463 | |
| Vaccine misinformation | News and social media | 1999 → 763 | |
| COVID fake news | News | 2140 → 835 | |
| CoAID | News and social media | 4251 → 1763 | |
| Indicator | Name | Meaning |
|---|---|---|
| Regression error metrics | MAE, MSE | Mean absolute error and mean squared error for each emotion dimension |
| Structural similarity of vectors | Cosine similarity | Assessing the similarity in direction between predicted and actual distributions |
| Differences in probability distributions | Jensen-Shannon distance | Evaluating the overall distance between predicted and actual distributions |
| Classification support metrics | Precision, Recall, F1-Score | Analyzing the ability to identify whether emotion exists |
| Classification index | Confusion matrix | Analyzing misclassification cases to identify which emotion categories are prone to misclassification |
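As a concrete illustration of the vector-level indicators above, a minimal sketch — the two six-dimensional emotion vectors here are invented for illustration, not taken from the paper’s data:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical 6-dimensional emotion vectors (Anger, Disgust, Fear, Joy,
# Sadness, Surprise) for one text; both sum to 1 so they can be read as
# probability distributions.
y_true = np.array([0.10, 0.05, 0.45, 0.00, 0.30, 0.10])
y_pred = np.array([0.15, 0.05, 0.40, 0.05, 0.25, 0.10])

mae = np.abs(y_true - y_pred).mean()                 # regression error metrics
mse = ((y_true - y_pred) ** 2).mean()
cos = y_pred @ y_true / (np.linalg.norm(y_pred) * np.linalg.norm(y_true))
jsd = jensenshannon(y_true, y_pred, base=2)          # Jensen–Shannon distance

print(mae, mse, cos, jsd)
```

The classification-support metrics (precision, recall, F1) would then be computed after thresholding each dimension into “emotion present / absent”.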
| Dimension | Count (Before Balancing) | Count (After Balancing) | Proportion (Before Balancing) | Proportion (After Balancing) |
|---|---|---|---|---|
| Anger | 1768 | 1768 | 14.35% | 15.96% |
| Disgust | 1460 | 1460 | 11.85% | 13.18% |
| Fear | 3998 | 2928 | 32.45% | 26.43% |
| Joy | 1563 | 2196 | 12.69% | 14.11% |
| Sadness | 2611 | 1963 | 21.19% | 17.72% |
| Surprise | 919 | 1397 | 7.46% | 12.61% |
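The before/after counts above could be produced by a mixed sampling step along these lines. This is a sketch under assumptions: the paper does not publish its exact sampling procedure, and random duplication is only one common choice for oversampling.

```python
import random

random.seed(0)  # reproducibility for the illustration

def mixed_sample(items, target):
    """Undersample a class above its target size; oversample (duplicate) one below it."""
    if len(items) >= target:
        return random.sample(items, target)                      # undersampling
    return items + random.choices(items, k=target - len(items))  # oversampling

# Toy per-dimension corpora; sizes mirror the "before balancing" counts above.
corpus = {lab: [f"{lab}_{i}" for i in range(n)]
          for lab, n in [("Fear", 3998), ("Joy", 1563), ("Surprise", 919)]}
targets = {"Fear": 2928, "Joy": 2196, "Surprise": 1397}  # "after balancing" counts

balanced = {lab: mixed_sample(texts, targets[lab]) for lab, texts in corpus.items()}
print({lab: len(v) for lab, v in balanced.items()})
# {'Fear': 2928, 'Joy': 2196, 'Surprise': 1397}
```

In the actual corpus each text carries a multi-dimensional emotion vector rather than a single label, so sampling by dominant emotion is a simplification.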
| Full Model Version | Base Model Name (Before Fine-Tuning) | Fine-Tuned Model Name |
|---|---|---|
| GPT-4o-2024-08-06 | GPT-4o | 4o-FT |
| GPT-4o-mini-2024-07-18 | GPT-4o-mini | 4o-mini-FT |
| GPT-3.5-turbo-0125 | GPT-3.5-turbo | 3.5-turbo-FT |
| Parameter Name | Parameter Settings |
|---|---|
| model | as listed in Table 4 |
| batch size | 32 |
| learning rate | 2 × 10⁻⁵ |
| epochs | 8 |
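For reference, the settings above could be packaged as follows. Note that the OpenAI fine-tuning API exposes a relative `learning_rate_multiplier` rather than an absolute learning rate, so the commented SDK call is a hedged sketch — the training-file ID and the exact hyperparameter mapping are assumptions, not from the paper.

```python
# Hyperparameters from the table above, as a plain dict.
hyperparameters = {
    "batch_size": 32,
    "learning_rate": 2e-5,  # absolute rate as reported in the paper
    "n_epochs": 8,
}

# Hedged sketch of launching a fine-tuning job with the OpenAI SDK (not executed
# here; "file-..." stands for the ID of an uploaded JSONL training corpus):
# from openai import OpenAI
# client = OpenAI()
# client.fine_tuning.jobs.create(
#     training_file="file-...",
#     model="gpt-4o-2024-08-06",
#     hyperparameters={"n_epochs": 8, "batch_size": 32},
# )

print(hyperparameters)
```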
| Emotion Dimension | MAE: 4o-FT | MAE: GPT-4o | MAE: 4o-mini-FT | MAE: GPT-4o-mini | MAE: 3.5-turbo-FT | MAE: GPT-3.5-turbo | MSE: 4o-FT | MSE: GPT-4o | MSE: 4o-mini-FT | MSE: GPT-4o-mini | MSE: 3.5-turbo-FT | MSE: GPT-3.5-turbo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anger | 0.104 | 0.132 | 0.105 | 0.157 | 0.121 | 0.141 | 0.024 | 0.029 | 0.025 | 0.039 | 0.030 | 0.035 |
| Disgust | 0.085 | 0.118 | 0.090 | 0.126 | 0.101 | 0.161 | 0.018 | 0.023 | 0.019 | 0.026 | 0.021 | 0.048 |
| Fear | 0.125 | 0.154 | 0.126 | 0.147 | 0.133 | 0.161 | 0.034 | 0.041 | 0.034 | 0.036 | 0.035 | 0.044 |
| Joy | 0.111 | 0.168 | 0.117 | 0.143 | 0.125 | 0.183 | 0.034 | 0.064 | 0.035 | 0.044 | 0.037 | 0.070 |
| Sadness | 0.123 | 0.151 | 0.125 | 0.167 | 0.131 | 0.172 | 0.035 | 0.037 | 0.034 | 0.043 | 0.034 | 0.044 |
| Surprise | 0.104 | 0.159 | 0.104 | 0.240 | 0.135 | 0.173 | 0.029 | 0.054 | 0.027 | 0.111 | 0.041 | 0.060 |
| Model Name | Before Fine-Tuning | After Fine-Tuning | Improvement |
|---|---|---|---|
| GPT-4o | 0.6532 | 0.7818 | +0.1286 |
| GPT-4o-mini | 0.5972 | 0.7731 | +0.1759 |
| GPT-3.5-turbo | 0.6022 | 0.7266 | +0.1244 |
| Model Name | Before Fine-Tuning | After Fine-Tuning | Difference |
|---|---|---|---|
| GPT-4o | 0.4891 | 0.3909 | −0.0982 |
| GPT-4o-mini | 0.5513 | 0.4015 | −0.1498 |
| GPT-3.5-turbo | 0.5398 | 0.4401 | −0.0997 |
| Model Name | Emotion Dimension | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 4o-FT | Anger | 0.852 | 0.541 | 0.662 |
| | Disgust | 0.852 | 0.590 | 0.697 |
| | Fear | 0.912 | 0.472 | 0.622 |
| | Joy | 0.816 | 0.469 | 0.596 |
| | Sadness | 0.902 | 0.531 | 0.668 |
| | Surprise | 0.752 | 0.495 | 0.597 |
| 4o-mini-FT | Anger | 0.835 | 0.550 | 0.663 |
| | Disgust | 0.835 | 0.553 | 0.665 |
| | Fear | 0.903 | 0.483 | 0.629 |
| | Joy | 0.782 | 0.479 | 0.594 |
| | Sadness | 0.878 | 0.499 | 0.636 |
| | Surprise | 0.752 | 0.488 | 0.592 |
| 3.5-turbo-FT | Anger | 0.808 | 0.527 | 0.638 |
| | Disgust | 0.789 | 0.580 | 0.669 |
| | Fear | 0.879 | 0.487 | 0.627 |
| | Joy | 0.730 | 0.457 | 0.562 |
| | Sadness | 0.855 | 0.489 | 0.622 |
| | Surprise | 0.612 | 0.412 | 0.492 |
| GPT-4o | Anger | 0.825 | 0.483 | 0.609 |
| | Disgust | 0.748 | 0.507 | 0.604 |
| | Fear | 0.885 | 0.466 | 0.610 |
| | Joy | 0.698 | 0.236 | 0.352 |
| | Sadness | 0.881 | 0.342 | 0.493 |
| | Surprise | 0.580 | 0.410 | 0.480 |
| GPT-4o-mini | Anger | 0.834 | 0.414 | 0.553 |
| | Disgust | 0.793 | 0.463 | 0.585 |
| | Fear | 0.892 | 0.430 | 0.580 |
| | Joy | 0.752 | 0.216 | 0.335 |
| | Sadness | 0.870 | 0.271 | 0.414 |
| | Surprise | 0.543 | 0.362 | 0.434 |
| GPT-3.5-turbo | Anger | 0.810 | 0.459 | 0.586 |
| | Disgust | 0.690 | 0.481 | 0.567 |
| | Fear | 0.889 | 0.471 | 0.615 |
| | Joy | 0.655 | 0.277 | 0.389 |
| | Sadness | 0.849 | 0.472 | 0.607 |
| | Surprise | 0.623 | 0.393 | 0.482 |
| Pair of Models | t-Statistic | p-Value | Mean F1-Score Difference | 95% Confidence Interval |
|---|---|---|---|---|
| 4o-FT, 4o | 3.121 | 0.0262 | 0.1103 | (0.0195, 0.2012) |
| 4o-mini-FT, 4o-mini | 6.016 | 0.0018 | 0.1042 | (0.0597, 0.1487) |
| 3.5-turbo-FT, 3.5-turbo | 2.270 | 0.0724 | 0.0607 | (−0.0080, 0.1294) |
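A paired test of this kind can be reproduced in outline from the per-emotion F1 scores. The six pairs below are the per-dimension F1 values for 4o-FT versus GPT-4o copied from the results table; because the published scores are rounded (and the authors may have paired scores differently), the resulting statistic will not exactly match the reported 3.121.

```python
from scipy import stats

# Six per-emotion F1 pairs (Anger, Disgust, Fear, Joy, Sadness, Surprise):
# fine-tuned 4o-FT vs. base GPT-4o, from the per-dimension results table.
f1_ft   = [0.662, 0.697, 0.622, 0.596, 0.668, 0.597]
f1_base = [0.609, 0.604, 0.610, 0.352, 0.493, 0.480]

t_stat, p_value = stats.ttest_rel(f1_ft, f1_base)  # paired (dependent) t-test
print(round(t_stat, 3), round(p_value, 4))         # significant at the 0.05 level
```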
4o-FT vs. GPT-4o:

| True \ Predicted | Anger (4o-FT) | Anger (GPT-4o) | Disgust (4o-FT) | Disgust (GPT-4o) | Fear (4o-FT) | Fear (GPT-4o) | Joy (4o-FT) | Joy (GPT-4o) | Sadness (4o-FT) | Sadness (GPT-4o) | Surprise (4o-FT) | Surprise (GPT-4o) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anger | 219 | 118 | 32 | 40 | 156 | 177 | 35 | 72 | 65 | 32 | 20 | 88 |
| Disgust | 41 | 43 | 58 | 36 | 81 | 81 | 23 | 29 | 33 | 9 | 7 | 45 |
| Fear | 141 | 187 | 80 | 71 | 376 | 288 | 60 | 129 | 142 | 42 | 22 | 104 |
| Joy | 48 | 41 | 17 | 20 | 70 | 82 | 60 | 48 | 25 | 8 | 9 | 30 |
| Sadness | 35 | 108 | 19 | 28 | 147 | 109 | 24 | 55 | 143 | 33 | 13 | 48 |
| Surprise | 16 | 11 | 5 | 6 | 13 | 29 | 4 | 7 | 15 | 4 | 13 | 9 |

4o-mini-FT vs. GPT-4o-mini:

| True \ Predicted | Anger (4o-mini-FT) | Anger (GPT-4o-mini) | Disgust (4o-mini-FT) | Disgust (GPT-4o-mini) | Fear (4o-mini-FT) | Fear (GPT-4o-mini) | Joy (4o-mini-FT) | Joy (GPT-4o-mini) | Sadness (4o-mini-FT) | Sadness (GPT-4o-mini) | Surprise (4o-mini-FT) | Surprise (GPT-4o-mini) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anger | 51 | 53 | 13 | 3 | 58 | 30 | 15 | 6 | 13 | 4 | 1 | 33 |
| Disgust | 16 | 17 | 33 | 0 | 37 | 13 | 10 | 1 | 7 | 1 | 0 | 23 |
| Fear | 24 | 164 | 16 | 15 | 321 | 126 | 36 | 22 | 92 | 23 | 14 | 134 |
| Joy | 6 | 26 | 4 | 3 | 47 | 42 | 48 | 6 | 30 | 6 | 7 | 56 |
| Sadness | 10 | 119 | 12 | 1 | 115 | 80 | 19 | 14 | 165 | 22 | 17 | 101 |
| Surprise | 5 | 24 | 3 | 0 | 18 | 15 | 3 | 2 | 22 | 2 | 18 | 31 |

3.5-turbo-FT vs. GPT-3.5-turbo:

| True \ Predicted | Anger (3.5-turbo-FT) | Anger (GPT-3.5-turbo) | Disgust (3.5-turbo-FT) | Disgust (GPT-3.5-turbo) | Fear (3.5-turbo-FT) | Fear (GPT-3.5-turbo) | Joy (3.5-turbo-FT) | Joy (GPT-3.5-turbo) | Sadness (3.5-turbo-FT) | Sadness (GPT-3.5-turbo) | Surprise (3.5-turbo-FT) | Surprise (GPT-3.5-turbo) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anger | 49 | 27 | 6 | 31 | 37 | 24 | 13 | 23 | 31 | 4 | 12 | 18 |
| Disgust | 15 | 11 | 14 | 12 | 22 | 11 | 4 | 15 | 14 | 1 | 2 | 8 |
| Fear | 65 | 103 | 30 | 90 | 195 | 145 | 31 | 65 | 130 | 16 | 35 | 78 |
| Joy | 11 | 17 | 4 | 33 | 29 | 30 | 26 | 33 | 42 | 4 | 22 | 27 |
| Sadness | 32 | 61 | 11 | 64 | 54 | 92 | 22 | 56 | 143 | 17 | 52 | 68 |
| Surprise | 8 | 5 | 2 | 18 | 11 | 12 | 1 | 8 | 27 | 2 | 24 | 19 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, Y.; Zhu, X.; Tang, W.; Zhou, L.; Feng, L.; Tang, M. DAFT: Domain-Augmented Fine-Tuning for Large Language Models in Emotion Recognition of Health Misinformation. Appl. Sci. 2025, 15, 12690. https://doi.org/10.3390/app152312690

