StressSpeak: A Speech-Driven Framework for Real-Time Personalized Stress Detection and Adaptive Psychological Support
Abstract
1. Introduction
1.1. Background
1.2. Research Gap
1.3. Objectives of the Study
- To design and implement a real-time stress detection system based on large language models (LLMs), integrating audio recording, speech-to-text conversion, and multimodal input–output capabilities (text and synthesized speech).
- To evaluate and compare transformer-based models across multiple benchmark stress-related datasets, analyzing accuracy, precision, recall, F1-score, and error rates in both zero-shot and few-shot learning settings.
- To examine real-time feasibility by quantifying latency–accuracy trade-offs across different model sizes and audio input lengths.
- To validate system usability and acceptance in real-world contexts through real-time feedback from users across multiple domains.
2. Related Work
2.1. Language-Based Stress Detection
2.2. Machine Learning and Deep Learning Approaches
2.3. Large Language Models (LLMs) in Mental Health Prediction
3. Methodology
3.1. System Overview
3.2. Audio Acquisition and Speech-to-Text Conversion
3.3. Text Preprocessing
- Noise Removal: Elimination of non-informative components such as filler words, long pauses, or repeated phrases.
- Normalization: Standardization procedures including lowercasing, punctuation correction, and removal of extraneous characters.
- Error Correction: Correction of potential transcription inaccuracies using linguistic and grammatical tools.
- Tokenization: Division of text into structured units (tokens) suitable for input into large language models.
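The preprocessing steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the filler-word list, the English-only character filter, and the whitespace tokenizer are all assumptions made for illustration, and the error-correction step is omitted because it depends on an external linguistic tool that the text does not name.

```python
import re

# Hypothetical filler list; the text does not enumerate the fillers that are removed.
FILLERS = {"um", "uh", "er", "hmm"}


def preprocess(transcript: str) -> list:
    """Apply normalization, tokenization, and noise removal to one transcript."""
    # Normalization: lowercase, then replace anything that is not a letter,
    # digit, apostrophe, or whitespace with a space (English-only assumption).
    text = transcript.lower()
    text = re.sub(r"[^a-z0-9'\s]", " ", text)

    # Tokenization: plain whitespace split. An LLM would apply its own
    # subword tokenizer afterwards; this step only yields word-level units.
    tokens = text.split()

    # Noise removal: drop filler words and immediately repeated words.
    cleaned = []
    for token in tokens:
        if token in FILLERS:
            continue
        if cleaned and cleaned[-1] == token:
            continue
        cleaned.append(token)
    return cleaned
```

Dropping immediate repetitions is a deliberately crude stand-in for "repeated phrases"; it would also remove intentional emphasis such as "very very", so a production system would need a more careful rule.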
3.4. Stress Analysis Using Large Language Models
- Lexical choice (e.g., prevalence of negative language)
- Syntactic structure (e.g., sentence complexity, disruptions)
- Emotional tone and semantic coherence
- Mild Stress
- Moderate Stress
- Severe Stress
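One way to obtain one of the three labels above from a general-purpose LLM is to constrain the answer in the prompt and then parse the reply. The sketch below is an assumption-laden illustration: the prompt wording, the function names `build_prompt` and `parse_label`, and the parsing rule are hypothetical and are not taken from the study. The call to the model itself is left out.

```python
from typing import Optional

STRESS_LEVELS = ("mild", "moderate", "severe")


def build_prompt(transcript: str) -> str:
    """Build a classification prompt that names the cues listed in this section."""
    levels = ", ".join(STRESS_LEVELS)
    return (
        "Classify the stress level expressed in the text below, considering "
        "lexical choice, syntactic structure, emotional tone, and semantic "
        "coherence.\n"
        f"Answer with exactly one word from: {levels}.\n\n"
        f"Text: {transcript}"
    )


def parse_label(response: str) -> Optional[str]:
    """Return the stress level that appears earliest in the reply, or None."""
    lowered = response.lower()
    hits = [(lowered.find(level), level) for level in STRESS_LEVELS if level in lowered]
    return min(hits)[1] if hits else None
```

Returning `None` for an unparseable reply lets the caller retry or fall back rather than silently assigning a level, which matters most for the severe category.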
3.5. Personalized Recommendation Generation
- Mild Stress: Reflective journaling, gratitude exercises, or short mindful activities.
- Moderate Stress: Structured breathing exercises, mindfulness practices, or guided relaxation.
- Severe Stress: Encouragement to seek professional mental health support, crisis helpline resources, or urgent self-care strategies.
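The tiered mapping above can be sketched as a simple lookup. This shows only the level-to-intervention structure from the list; how the system personalizes the wording of each suggestion is not captured here, and the names `RECOMMENDATIONS` and `recommend` are hypothetical.

```python
# Intervention tiers copied from the list above; keys are lowercase level names.
RECOMMENDATIONS = {
    "mild": [
        "reflective journaling",
        "gratitude exercises",
        "short mindful activities",
    ],
    "moderate": [
        "structured breathing exercises",
        "mindfulness practices",
        "guided relaxation",
    ],
    "severe": [
        "professional mental health support",
        "crisis helpline resources",
        "urgent self-care strategies",
    ],
}


def recommend(level: str) -> list:
    """Return a copy of the intervention list for a stress level."""
    key = level.strip().lower()
    if key not in RECOMMENDATIONS:
        raise ValueError(f"unknown stress level: {level!r}")
    return list(RECOMMENDATIONS[key])
```

Raising on an unknown level, instead of defaulting to the mild tier, avoids under-responding when the upstream classifier returns something unexpected.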
3.6. Web-Based User Interface
- Real-time audio recording and transcription display
- Visualization of emotional and stress analysis results
- Delivery of personalized, dynamic intervention suggestions
3.7. Validation of the Proposed Model Using Benchmark Datasets
3.8. Benchmark Datasets
- Stress Annotated Dataset (SAD): Focused on identifying social anxiety from online forum posts [54].
- Dreaddit: A Reddit-based dataset annotated for varying stress levels [55].
- DepSeverity: A dataset measuring the severity of depressive symptoms [56].
- Suicide Depression Classification with Noisy Labels (SDCNL): Counseling transcript data annotated for stress and emotional distress [57].
- Columbia-Suicide Severity Rating Scale (CSSRS)-Suicide: A dataset derived from the Columbia-Suicide Severity Rating Scale to classify suicidal risk levels [58].
3.9. Learning Paradigms
- Zero-Shot Learning: The model was evaluated without any task-specific fine-tuning, relying purely on pre-trained knowledge representations. This assessed the system’s inherent ability to generalize to unseen domains.
- Few-Shot Learning: The model underwent fine-tuning using a limited number of labeled examples from each dataset. This approach evaluated the model’s adaptability and its capacity to learn task-specific features from minimal data exposure.
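For the few-shot setting, a small labeled subset has to be drawn from each dataset before fine-tuning. The number of examples and the sampling procedure are not specified above, so the following sketch assumes class-balanced sampling of `k` examples per label with a fixed seed; `sample_few_shot` is a hypothetical helper, not part of the described system.

```python
import random
from collections import defaultdict


def sample_few_shot(examples, k, seed=0):
    """Draw up to k (text, label) pairs per label, reproducibly.

    `examples` is a list of (text, label) tuples. Labels with fewer than
    k examples contribute everything they have.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # local generator, so global state is untouched
    subset = []
    for label in sorted(by_label):  # sorted for a deterministic label order
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset
```

The zero-shot setting corresponds to skipping this step entirely and evaluating the pre-trained model as is.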
4. Experimental Results
- LLaMA consistently demonstrated competitive or superior performance across most datasets.
- While RoBERTa slightly outperformed LLaMA on SAD after fine-tuning, LLaMA showed better robustness on diverse and more challenging datasets like SDCNL and CSSRS-Suicide.
4.1. Comprehensive Metric Evaluation
- Few-shot fine-tuning consistently boosted performance, lowering FN/FP rates.
- LLaMA and RoBERTa emerged as the most reliable models, especially for high-risk and structured datasets.
- Smaller models (Gemma, T5) underperformed, making them less suitable for stress-sensitive tasks.
- The trade-off between model size, accuracy, and latency (as noted in the latency analysis) is crucial when considering real-time deployment.
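The metrics discussed in this section follow from the four confusion-matrix counts. The sketch below uses the standard binary definitions and reports percentages rounded to two decimals; the function name and the example counts are illustrative assumptions, not data from the study.

```python
def _pct(numerator, denominator):
    """Percentage rounded to two decimals; 0.0 when the denominator is zero."""
    return round(100 * numerator / denominator, 2) if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, FN/FP rates, precision, recall, and F1 as percentages."""
    return {
        "accuracy": _pct(tp + tn, tp + fp + tn + fn),
        "fn_rate": _pct(fn, fn + tp),      # missed positives among actual positives
        "fp_rate": _pct(fp, fp + tn),      # false alarms among actual negatives
        "precision": _pct(tp, tp + fp),
        "recall": _pct(tp, tp + fn),
        "f1": _pct(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
    }


# Hypothetical counts for a balanced 1000-sample test set.
example = binary_metrics(tp=458, fp=43, tn=457, fn=42)
```

By these definitions the FN rate and recall always sum to 100%, so lowering the false-negative rate and raising recall are the same objective stated two ways.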
4.2. Real-Time Performance Evaluation
4.2.1. Latency Analysis
4.2.2. Multilingual Support
4.2.3. Input and Output Modes
4.3. Case Studies from Benchmark Datasets
- In SAD, expressions of social anxiety were classified as moderate stress, prompting journaling and exposure-therapy suggestions.
- In CSSRS-Suicide, signs of suicidal ideation were flagged, and immediate intervention recommendations were generated.
4.4. User Feedback from Different Domains
- Healthcare professionals appreciated the potential for non-invasive monitoring but emphasized the importance of clear disclaimers regarding clinical use.
- Students and educators valued multilingual processing and immediate coping suggestions, reporting increased engagement compared to static self-report tools.
- Corporate employees particularly favored the speech-based interface for quick stress check-ins during work hours, citing improved accessibility over text-only methods.
5. Discussion
5.1. LLM Robustness Across Diverse Contexts
5.2. Trade-Offs Between Model Size, Accuracy, and Latency
5.3. Multilingual and Multimodal Strengths
5.4. User-Centered Evaluation
5.5. Comparative Advantages over Previous Studies
- Integration of Real-Time Speech + LLMs vs. Retrospective Text-Based Methods: Many earlier stress-detection systems analyze textual or physiological signals retrospectively and do not include real-time speech input [4,5,6,7,11]. In contrast, our system integrates live audio acquisition, speech-to-text conversion, and immediate stress classification, making detection timely and context-aware. Moreover, while several studies have developed chatbot systems for mental health [52], our model extends beyond static conversational agents by integrating stress-level classification and adaptive response generation. This aligns with calls for more intelligent, context-aware digital interventions in psychological care [48,50].
- Multi-Metric Evaluation (Accuracy, FN/FP, Precision, Recall, F1) vs. Accuracy-Only or Binary Metrics: Several studies report only accuracy or binary classification (stress vs. non-stress) and often omit a detailed analysis of false negatives and false positives [63]. Our work goes further by examining FN rate, FP rate, precision, recall, and F1 in both zero-shot and few-shot settings across multiple datasets. This provides a more nuanced view of model reliability, which is especially important in sensitive contexts such as suicidal ideation detection.
- Few-Shot Fine-Tuning Improves Generalization vs. Models Requiring Large Labeled Data: Traditional machine learning (ML) and deep learning (DL) methods (e.g., Support Vector Machines (SVMs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs)) often require substantial annotated data for each domain or dataset [64]. Our study shows that few-shot fine-tuning, even with a limited number of new examples, improves performance consistently across datasets. This suggests better adaptability to new domains and less dependence on large annotation efforts.
- Multilingual Support and Dual Output Modes vs. Monolingual, Text-Only Systems: Many previous works assume English input or text-only interaction [65]. Our model supports multilingual speech and produces output as both text and synthesized speech. This improves accessibility for non-English speakers and for users who prefer voice feedback, an important advance over more constrained prior systems.
- Latency Analysis & Real-Time Feasibility vs. High Accuracy Alone: Some prior work achieves high accuracy but does not report or consider latency or real-time usability. For example, many ML/DL methods in the literature focus purely on feature extraction and classification (from speech signals or physiological data) without considering trade-offs in audio length or model size [61]. We provide a latency model (Latency = 2.0 + 0.8 × model size in billions), highlighting how larger models incur longer delays. This allows for a practical assessment of the trade-offs required for deployment in real-world scenarios.
- Domain Diversity and Risk-Sensitive Datasets vs. Simpler Data: Several studies use relatively clean datasets or datasets that do not involve high risk (e.g., general stress, depression). Our validation includes the CSSRS-Suicide dataset, which addresses suicidal ideation, a high-risk domain where false negatives have severe consequences. The high recall and F1 obtained on CSSRS-Suicide with few-shot fine-tuning are an important strength.
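The linear latency model quoted in the list above (Latency = 2.0 + 0.8 × model size in billions) can be turned into a small planning helper. The unit of latency is not stated in that passage (seconds is a plausible reading, but that is an assumption), and both function names below are hypothetical.

```python
BASE_LATENCY = 2.0  # constant term of the reported linear model
PER_BILLION = 0.8   # added latency per billion parameters


def estimated_latency(model_size_billions):
    """Evaluate Latency = 2.0 + 0.8 * model size (in billions of parameters)."""
    if model_size_billions < 0:
        raise ValueError("model size must be non-negative")
    return BASE_LATENCY + PER_BILLION * model_size_billions


def largest_size_within(budget):
    """Invert the model: largest size (in billions) that fits a latency budget."""
    if budget < BASE_LATENCY:
        raise ValueError("budget is below the fixed 2.0 overhead")
    return (budget - BASE_LATENCY) / PER_BILLION
```

Under this model, every additional billion parameters costs 0.8 latency units, so a deployment with a fixed response-time budget can bound the admissible model size before any accuracy comparison is made.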
5.6. Positioning Against State of the Art
- It establishes that few-shot fine-tuned LLMs outperform both traditional ML and smaller transformer baselines in detecting stress across varied linguistic contexts.
- It introduces multilingual, multimodal, real-time capabilities, bridging a critical gap in accessibility and inclusivity.
- It validates not only algorithmic performance but also human-centered usability across professional domains, strengthening the case for real-world deployment.
- It does not store system inputs or outputs, which mitigates privacy concerns.
5.7. Limitations
5.8. Future Directions
- Extend evaluation to more languages, dialects, and speech styles (slang, colloquial speech).
- Explore on-device or edge computing strategies to reduce latency while preserving accuracy.
- Incorporate additional modalities (e.g., facial expression, physiological signals) to strengthen detection and reduce false negatives in high-risk categories.
- Conduct larger-scale longitudinal validation across diverse populations to evaluate the sustained effectiveness, reliability, and clinical impact of the system over extended periods of real-world use.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Raj, G.; Sharma, A.K.; Arora, Y. Analyzing the effect of digital technology on mental health. In Strategies for E-Commerce Data Security: Cloud, Blockchain, AI, and Machine Learning; IGI Global: Hershey, PA, USA, 2024; pp. 54–82.
- Wu, J.R.; Chan, F.; Iwanaga, K.; Myers, O.M.; Ermis-Demirtas, H.; Bloom, Z.D. The transactional theory of stress and coping as a stress management model for students in Hispanic-serving universities. J. Am. Coll. Health 2025, 1–8.
- Mazumdar, H.; Sathvik, M.; Chakraborty, C.; Unhelkar, B.; Mahmoudi, S. Real-time mental health monitoring for metaverse consumers to ameliorate the negative impacts of escapism and post-trauma stress disorder. IEEE Trans. Consum. Electron. 2024, 70, 2129–2136.
- Keszei, A.P.; Novak, M.; Streiner, D.L. Introduction to health measurement scales. J. Psychosom. Res. 2010, 68, 319–323.
- Singh, A.; Kumar, P. Student stress and mental health crisis: Higher education institutional perspective. In Student Stress in Higher Education; IGI Global: Hershey, PA, USA, 2024; pp. 218–229.
- Facca, D.; Smith, M.J.; Shelley, J.; Lizotte, D.; Donelle, L. Exploring the ethical issues in research using digital data collection strategies with minors: A scoping review. PLoS ONE 2020, 15, e0237875.
- Ozyildirim, G. Teachers’ occupational health: A structural model of work-related stress, depressed mood at work, and organizational commitment. Psychol. Sch. 2024, 61, 2930–2948.
- Lu, Y.; Aleta, A.; Du, C.; Shi, L.; Moreno, Y. LLMs and generative agent-based models for complex systems research. Phys. Life Rev. 2024, 51, 283–293.
- Yechuri, S.; Vanambathina, S. A subconvolutional U-net with gated recurrent unit and efficient channel attention mechanism for real-time speech enhancement. Wirel. Pers. Commun. 2024.
- Haque, Y.; Zawad, R.S.; Rony, C.S.A.; Al Banna, H.; Ghosh, T.; Kaiser, M.S.; Mahmud, M. State-of-the-art of stress prediction from heart rate variability using artificial intelligence. Cogn. Comput. 2024, 16, 455–481.
- Nijhawan, T.; Attigeri, G.; Ananthakrishna, T. Stress detection using natural language processing and machine learning over social interactions. J. Big Data 2022, 9, 33.
- Montejo-Ráez, A.; Molina-González, M.D.; Jiménez-Zafra, S.M.; García-Cumbreras, M.Á.; García-López, L.J. A survey on detecting mental disorders with natural language processing: Literature review, trends and challenges. Comput. Sci. Rev. 2024, 53, 100654.
- Can, Y.S.; Arnrich, B.; Ersoy, C. Stress detection in daily life scenarios using smart phones and wearable sensors: A survey. J. Biomed. Inform. 2019, 92, 103139.
- Bucur, A.-M. Leveraging LLM-generated data for detecting depression symptoms on social media. In International Conference of the Cross-Language Evaluation Forum for European Languages; Springer: Berlin/Heidelberg, Germany, 2024; pp. 193–204.
- Baran, K. Smartphone thermal imaging for stressed people classification using CNN+ MobileNetV2. Procedia Comput. Sci. 2023, 225, 2507–2515.
- Geetha, R.; Gunanandhini, S.; Srikanth, G.U.; Sujatha, V. Human stress detection in and through sleep patterns using machine learning algorithms. J. Inst. Eng. India Ser. B 2024, 105, 1691–1713.
- Gupta, M.V.; Vaikole, S.; Oza, A.D.; Patel, A.; Burduhos-Nergis, D.P.; Burduhos-Nergis, D.D. Audio-visual stress classification using cascaded RNN-LSTM networks. Bioengineering 2022, 9, 510.
- Bromuri, S.; Henkel, A.P.; Iren, D.; Urovi, V. Using AI to predict service agent stress from emotion patterns in service interactions. J. Serv. Manag. 2024, 32, 581–611.
- Xefteris, V.-R.; Dominguez, M.; Grivolla, J.; Tsanousa, A.; Zaffanela, F.; Monego, M.; Symeonidis, S.; Diplaris, S.; Wanner, L.; Vrochidis, S.; et al. Stress detection based on physiological sensor and audio signals, and a late fusion framework: An experimental study and public dataset. Res. Sq. 2023.
- Lopez, F.S.; Condori-Fernandez, N.; Catala, A. Towards real-time automatic stress detection for office workplaces. In Information Management and Big Data, Proceedings of the 5th International Conference, SIMBig 2018, Lima, Peru, 3–5 September 2018; Springer: Cham, Switzerland, 2018; pp. 273–288.
- Suneetha, C. A survey of machine learning techniques on speech-based emotion recognition and post-traumatic stress disorder detection. NeuroQuantology 2022, 20, 69–79.
- Sohail, S.S.; Farhat, F.; Himeur, Y.; Nadeem, M.; Madsen, D.Ø.; Singh, Y.; Atalla, S.; Mansoor, W. Decoding ChatGPT: A taxonomy of existing research, current challenges, and possible future directions. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101675.
- Arushi; Dillon, R.; Teoh, A.N.; Dillon, D. Detecting public speaking stress via real-time voice analysis in virtual reality: A review. In Proceedings of the Sustainability, Economics, Innovation, Globalisation and Organisational Psychology Conference, Singapore, 1–3 March 2023; Springer: Singapore, 2023; pp. 117–152.
- Al-Saadawi, H.F.; Das, B.; Das, R. A systematic review of trimodal affective computing approaches: Text, audio, and visual integration in emotion recognition and sentiment analysis. Expert Syst. Appl. 2024, 255, 124852.
- Thirunavukarasu, A.J.; Ting, D.S.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
- Denecke, K.; Reichenpfader, D. Sentiment analysis of clinical narratives: A scoping review. J. Biomed. Inform. 2023, 140, 104336.
- Epel, E.S.; Crosswell, A.D.; Mayer, S.E.; Prather, A.A.; Slavich, G.M.; Puterman, E.; Mendes, W.B. More than a feeling: A unified view of stress measurement for population science. Front. Neuroendocrinol. 2018, 49, 146–169.
- Bentley, K.H.; Franklin, J.C.; Ribeiro, J.D.; Kleiman, E.M.; Fox, K.R.; Nock, M.K. Anxiety and its disorders as risk factors for suicidal thoughts and behaviors: A meta-analytic review. Clin. Psychol. Rev. 2016, 43, 30–46.
- Nguyen, T.; Phung, D.; Dao, B.; Venkatesh, S.; Berk, M. Affective and content analysis of online depression communities. IEEE Trans. Affect. Comput. 2014, 5, 217–226.
- Panagiotopoulos, P.; Barnett, J.; Bigdeli, A.Z.; Sams, S. Social media in emergency management: Twitter as a tool for communicating risks to the public. Technol. Forecast. Soc. Change 2016, 111, 86–96.
- Palen, L.; Hughes, A.L. Social media in disaster communication. In Handbook of Disaster Research; Rodríguez, H., Donner, W., Trainor, J.E., Eds.; Springer: Cham, Switzerland, 2018; pp. 497–518.
- Khan, A.; Ali, R. Unraveling minds in the digital era: A review on mapping mental health disorders through machine learning techniques using online social media. Soc. Netw. Anal. Min. 2024, 14, 78.
- Uban, A.-S.; Chulvi, B.; Rosso, P. An emotion and cognitive-based analysis of mental health disorders from social media data. Future Gener. Comput. Syst. 2021, 124, 480–494.
- Poudel, U.; Jakhar, S.; Mohan, P.; Nepal, A. AI in Mental Health: A Review of Technological Advancements and Ethical Issues in Psychiatry. Issues Ment. Health Nurs. 2025, 46, 693–701.
- Iyortsuun, N.K.; Kim, S.-H.; Jhon, M.; Yang, H.-J.; Pant, S. A review of machine learning and deep learning approaches on mental health diagnosis. Healthcare 2023, 11, 285.
- Balraj, C.S.; Nagaraj, P. Prediction of Mental Health Issues and Challenges Using Hybrid Machine and Deep Learning Techniques. In Proceedings of the International Conference on Mathematics and Computing, Krishnankoil, India, 2–7 January 2024; Springer Nature: Singapore, 2024; pp. 15–27.
- Saidi, A.; Othman, S.B.; Saoud, S.B. Hybrid CNN-SVM classifier for efficient depression detection system. In Proceedings of the 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC ASET), Hammamet, Tunisia, 15–18 December 2020; IEEE: New York, NY, USA, 2020; pp. 229–234.
- Xie, W.; Wang, C.; Lin, Z.; Luo, X.; Chen, W.; Xu, M.; Liang, L.; Liu, X.; Wang, Y.; Luo, H.; et al. Multimodal fusion diagnosis of depression and anxiety based on CNN-LSTM model. Comput. Med. Imaging Graph. 2022, 102, 102128.
- Garg, M. Mental health analysis in social media posts: A survey. Arch. Comput. Methods Eng. 2023, 30, 1819–1842.
- Kuttala, R.; Subramanian, R.; Oruganti, V.R.M. Multimodal hierarchical CNN feature fusion for stress detection. IEEE Access 2023, 11, 6867–6878.
- Wang, X.; Liu, K.; Wang, C. Knowledge-enhanced pre-training large language model for depression diagnosis and treatment. In Proceedings of the 2023 IEEE 9th International Conference on Cloud Computing and Intelligent Systems (CCIS), Beijing, China, 12–13 August 2023; IEEE: New York, NY, USA, 2023; pp. 532–536.
- Naegelin, M.; Weibel, R.P.; Kerr, J.I.; Schinazi, V.R.; La Marca, R.; von Wangenheim, F.; Hoelscher, C.; Ferrario, A. An interpretable machine learning approach to multimodal stress detection in a simulated office environment. J. Biomed. Inform. 2023, 139, 104299.
- Haque, F.; Nur, R.U.; Al Jahan, S.; Mahmud, Z.; Shah, F.M. A transformer-based approach to detect suicidal ideation using pre-trained language models. In Proceedings of the 2020 23rd International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 19–21 December 2020; IEEE: New York, NY, USA, 2020; pp. 1–5.
- Sadeghi, M.; Egger, B.; Agahi, R.; Richer, R.; Capito, K.; Rupp, L.H.; Schindler-Gmelch, L.; Berking, M.; Eskofier, B.M. Exploring the capabilities of a language model-only approach for depression detection in text data. In Proceedings of the 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Pittsburgh, PA, USA, 15–18 October 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Qiu, J.; Lam, K.; Li, G.; Acharya, A.; Wong, T.Y.; Darzi, A.; Yuan, W.; Topol, E.J. LLM-based agentic systems in medicine and healthcare. Nat. Mach. Intell. 2024, 6, 1418–1420.
- Anisuzzaman, D.M.; Malins, J.G.; Friedman, P.A.; Attia, Z.I. Fine-tuning LLMs for specialized use cases. Mayo Clin. Proc. Digit. Health 2024, 3, 100184.
- Wang, Y.; Fu, T.; Xu, Y.; Ma, Z.; Xu, H.; Du, B.; Lu, Y.; Gao, H.; Wu, J.; Chen, J. TWIN-GPT: Digital twins for clinical trials via large language model. ACM Trans. Multimed. Comput. Commun. Appl. 2024.
- Xiao, H.; Zhou, F.; Liu, X.; Liu, T.; Li, Z.; Liu, X.; Huang, X. A comprehensive survey of large language models and multimodal large language models in medicine. Inf. Fusion 2025, 117, 102888.
- Mellouk, W.; Handouzi, W. Multimodal Contactless Human Stress Detection Using Deep Learning. In Proceedings of the International Conference on Computing Systems and Applications, Sousse, Tunisia, 22–26 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 3–12.
- Wals Zurita, A.J.; Miras del Rio, H.; Ugarte Ruiz de Aguirre, N.; Nebrera Navarro, C.; Rubio Jimenez, M.; Muñoz Carmona, D.; Miguez Sanchez, C. The transformative potential of large language models in mining electronic health records data: Content analysis. JMIR Med. Inform. 2025, 13, e58457.
- Nassiri, K.; Akhloufi, M.A. Recent advances in large language models for healthcare. BioMedInformatics 2024, 4, 1097–1143.
- Yu, H.; McGuinness, S. An experimental study of integrating fine-tuned LLMs and prompts for enhancing mental health support chatbot system. J. Med. Artif. Intell. 2024, 7, 1–16.
- Xiang, J.Z.; Wang, Q.Y.; Fang, Z.B.; Esquivel, J.A.; Su, Z.X. A multi-modal deep learning approach for stress detection using physiological signals: Integrating time and frequency domain features. Front. Physiol. 2025, 16, 1584299.
- Mauriello, M.L.; Lincoln, T.; Hon, G.; Simon, D.; Jurafsky, D.; Paredes, P. SAD: A stress annotated dataset for recognizing everyday stressors in SMS-like conversational systems. In Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; pp. 1–7.
- Turcan, E.; McKeown, K. Dreaddit: A Reddit dataset for stress analysis in social media. arXiv 2019, arXiv:1911.00133.
- Naseem, U.; Dunn, A.G.; Kim, J.; Khushi, M. Early identification of depression severity levels on Reddit using ordinal classification. In Proceedings of the ACM Web Conference, Lyon, France, 25–29 April 2022; pp. 2563–2572.
- Haque, A.; Reddi, V.; Giallanza, T. Deep learning for suicide and depression identification with unsupervised label correction. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Part V, Bratislava, Slovakia, 14–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 436–447.
- Barzilay, S.; Yaseen, Z.S.; Hawes, M.; Kopeykina, I.; Ardalan, F.; Rosenfield, P.; Murrough, J.; Galynker, I. Determinants and predictive value of clinician assessment of short-term suicide risk. Suicide Life-Threat. Behav. 2019, 49, 614–626.
- Delobelle, P.; Winters, T.; Berendt, B. RobBERT: A Dutch RoBERTa-based language model. arXiv 2020, arXiv:2001.06286.
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
- Varma, S.; Shivam, S.; Ray, B.; Banerjee, A. Few-Shot Learning with Fine-Tuned Language Model for Suicidal Text Detection. In Proceedings of the International Conference on Frontiers in Computing and Systems, Himachal Pradesh, India, 16–17 October 2023; Springer Nature: Singapore, 2023; pp. 139–151.
- Rozand, V.; Lebon, F.; Papaxanthis, C.; Lepers, R. Effect of mental fatigue on speed–accuracy trade-off. Neuroscience 2015, 297, 219–230.
- Kumar, A.; Shaun, M.A.; Chaurasia, B.K. Identification of psychological stress from speech signal using deep learning algorithm. E-Prime-Adv. Electr. Eng. Electron. Energy 2024, 9, 100707.
- Shahapur, S.S.; Chitti, P.; Patil, S.; Nerurkar, C.A.; Shivannagol, V.S.; Rayanaikar, V.C.; Sawant, V.; Betageri, V. Decoding minds: Estimation of stress level in students using machine learning. Indian J. Sci. Technol. 2024, 17, 2002–2012.
- Ali, A.A.; Fouda, A.E.; Hanafy, R.J.; Fouda, M.E. Leveraging audio and text modalities in mental health: A study of LLMs performance. arXiv 2024, arXiv:2412.10417.
Figure: processing pipeline (Speech → Text conversion → LLM → Stress detection and its classification → Recommendation based on categorized stress level → Audio + Textual Output).





| Phase | Metric (%) | Gemma | LLaMA | BERT | RoBERTa | DistilBERT | XLNet | T5 | DeBERTa | Electra |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-Shot | Accuracy | 78 | 90.6 | 89.6 | 91.5 | 90.2 | 91 | 91.5 | 91.7 | 91.9 |
| Zero-Shot | FN Rate | 22 | 9.4 | 10.4 | 8.4 | 9.8 | 9 | 8.4 | 8.4 | 8 |
| Zero-Shot | FP Rate | 22 | 9.4 | 10.4 | 8.6 | 9.8 | 9 | 8.6 | 8.2 | 8.2 |
| Zero-Shot | Precision | 78 | 90.6 | 89.6 | 91.42 | 90.2 | 91 | 91.42 | 91.78 | 91.82 |
| Zero-Shot | Recall | 78 | 90.6 | 89.6 | 91.6 | 90.2 | 91 | 91.6 | 91.6 | 92 |
| Zero-Shot | F1 | 78 | 90.6 | 89.6 | 91.51 | 90.2 | 91 | 91.51 | 91.69 | 91.91 |
| Few-Shot | Accuracy | 82.2 | 91.5 | 90.9 | 93.8 | 91.7 | 91.7 | 92.3 | 92.4 | 93.2 |
| Few-Shot | FN Rate | 17.8 | 8.4 | 9.2 | 6.2 | 8.4 | 8.4 | 7.6 | 7.6 | 6.8 |
| Few-Shot | FP Rate | 17.8 | 8.6 | 9 | 6.2 | 8.2 | 8.2 | 7.8 | 7.6 | 6.8 |
| Few-Shot | Precision | 82.2 | 91.42 | 90.98 | 93.8 | 91.78 | 91.78 | 92.22 | 92.4 | 93.2 |
| Few-Shot | Recall | 82.2 | 91.6 | 90.8 | 93.8 | 91.6 | 91.6 | 92.4 | 92.4 | 93.2 |
| Few-Shot | F1 | 82.2 | 91.51 | 90.89 | 93.8 | 91.69 | 91.69 | 92.31 | 92.4 | 93.2 |

| Phase | Metric (%) | Gemma | LLaMA | BERT | RoBERTa | DistilBERT | XLNet | T5 | DeBERTa | Electra |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-Shot | Accuracy | 72.1 | 81.4 | 73.8 | 75.4 | 78.9 | 73 | 73.3 | 75.2 | 73.8 |
| Zero-Shot | FN Rate | 28 | 18.6 | 26.2 | 24.6 | 21.2 | 27 | 26.8 | 24.8 | 26.2 |
| Zero-Shot | FP Rate | 27.8 | 18.6 | 26.2 | 24.6 | 21 | 27 | 26.6 | 24.8 | 26.2 |
| Zero-Shot | Precision | 72.14 | 81.4 | 73.8 | 75.4 | 78.96 | 73 | 73.35 | 75.2 | 73.8 |
| Zero-Shot | Recall | 72 | 81.4 | 73.8 | 75.4 | 78.8 | 73 | 73.2 | 75.2 | 73.8 |
| Zero-Shot | F1 | 72.07 | 81.4 | 73.8 | 75.4 | 78.88 | 73 | 73.27 | 75.2 | 73.8 |
| Few-Shot | Accuracy | 75.6 | 85.1 | 76.8 | 79.5 | 81.4 | 76.4 | 75.9 | 79 | 77.4 |
| Few-Shot | FN Rate | 24.4 | 14.8 | 23.2 | 20.4 | 18.6 | 23.6 | 24 | 21 | 22.6 |
| Few-Shot | FP Rate | 24.4 | 15 | 23.2 | 20.6 | 18.6 | 23.6 | 24.2 | 21 | 22.6 |
| Few-Shot | Precision | 75.6 | 85.03 | 76.8 | 79.44 | 81.4 | 76.4 | 75.85 | 79 | 77.4 |
| Few-Shot | Recall | 75.6 | 85.2 | 76.8 | 79.6 | 81.4 | 76.4 | 76 | 79 | 77.4 |
| Few-Shot | F1 | 75.6 | 85.11 | 76.8 | 79.52 | 81.4 | 76.4 | 75.92 | 79 | 77.4 |

| Phase | Metric (%) | Gemma | LLaMA | BERT | RoBERTa | DistilBERT | XLNet | T5 | DeBERTa | Electra |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-Shot | Accuracy | 57.11 | 72.18 | 60.44 | 62.35 | 65.18 | 59.83 | 56.68 | 62.39 | 62.24 |
| Zero-Shot | FN Rate | 41.77 | 22.08 | 39.56 | 27.65 | 29.44 | 40.17 | 43.32 | 27.61 | 23.94 |
| Zero-Shot | FP Rate | 15.58 | 17.82 | 20.47 | 11.1 | 14.9 | 18.3 | 22.45 | 13.9 | 20.36 |
| Zero-Shot | Precision | 62.98 | 80.09 | 50.73 | 72.93 | 66.1 | 54.6 | 47.8 | 68.3 | 64.17 |
| Zero-Shot | Recall | 58.23 | 77.92 | 60.44 | 72.35 | 70.56 | 59.83 | 56.68 | 72.39 | 76.06 |
| Zero-Shot | F1 | 60.51 | 79.97 | 55.24 | 72.85 | 68.18 | 57.96 | 51.87 | 70.32 | 69.61 |
| Few-Shot | Accuracy | 60.39 | 76.4 | 64.14 | 65.83 | 70.05 | 63.35 | 61.73 | 66.28 | 66.96 |
| Few-Shot | FN Rate | 32.46 | 18.38 | 31.22 | 20.44 | 20.22 | 36.65 | 38.27 | 23.72 | 18.25 |
| Few-Shot | FP Rate | 9.2 | 12.15 | 14.3 | 8.05 | 10.1 | 13.55 | 16.9 | 10.55 | 16.62 |
| Few-Shot | Precision | 69.08 | 87.89 | 58.12 | 77.6 | 73.4 | 61.22 | 54.9 | 74 | 67.29 |
| Few-Shot | Recall | 67.54 | 81.62 | 68.78 | 79.56 | 79.78 | 63.35 | 61.73 | 76.28 | 81.75 |
| Few-Shot | F1 | 68.3 | 84.54 | 62.79 | 78.57 | 76.33 | 62.04 | 58.06 | 75.03 | 73.82 |

| Phase | Metric (%) | Gemma | LLaMA | BERT | RoBERTa | DistilBERT | XLNet | T5 | DeBERTa | Electra |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-Shot | Accuracy | 63.8 | 75.9 | 66.1 | 66.8 | 70 | 62.1 | 60.5 | 68.2 | 67.4 |
| Zero-Shot | FN Rate | 36.2 | 24 | 34 | 33.2 | 30 | 38 | 39.6 | 31.8 | 32.6 |
| Zero-Shot | FP Rate | 36.2 | 24.2 | 33.8 | 33.2 | 30 | 37.8 | 39.4 | 31.8 | 32.6 |
| Zero-Shot | Precision | 63.8 | 75.85 | 66.13 | 66.8 | 70 | 62.12 | 60.52 | 68.2 | 67.4 |
| Zero-Shot | Recall | 63.8 | 76 | 66 | 66.8 | 70 | 62 | 60.4 | 68.2 | 67.4 |
| Zero-Shot | F1 | 63.8 | 75.92 | 66.07 | 66.8 | 70 | 62.06 | 60.46 | 68.2 | 67.4 |
| Few-Shot | Accuracy | 68 | 78.4 | 69.2 | 71.2 | 73.6 | 66.7 | 65.1 | 72.1 | 71.3 |
| Few-Shot | FN Rate | 32 | 21.6 | 30.8 | 28.8 | 26.4 | 33.2 | 34.8 | 28 | 28.8 |
| Few-Shot | FP Rate | 32 | 21.6 | 30.8 | 28.8 | 26.4 | 33.4 | 35 | 27.8 | 28.6 |
| Few-Shot | Precision | 68 | 78.4 | 69.2 | 71.2 | 73.6 | 66.67 | 65.07 | 72.14 | 71.34 |
| Few-Shot | Recall | 68 | 78.4 | 69.2 | 71.2 | 73.6 | 66.8 | 65.2 | 72 | 71.2 |
| Few-Shot | F1 | 68 | 78.4 | 69.2 | 71.2 | 73.6 | 66.73 | 65.13 | 72.07 | 71.27 |

| Phase | Metric (%) | Gemma | LLaMA | BERT | RoBERTa | DistilBERT | XLNet | T5 | DeBERTa | Electra |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-Shot | Accuracy | 51 | 66.4 | 51.9 | 52.8 | 61.4 | 51.3 | 46.4 | 51.9 | 51.4 |
| Zero-Shot | FN Rate | 49 | 33.6 | 48 | 47.2 | 38.6 | 48.8 | 53.6 | 48 | 48.6 |
| Zero-Shot | FP Rate | 49 | 33.6 | 48.2 | 47.2 | 38.6 | 48.6 | 53.6 | 48.2 | 48.6 |
| Zero-Shot | Precision | 51 | 66.4 | 51.9 | 52.8 | 61.4 | 51.3 | 46.4 | 51.9 | 51.4 |
| Zero-Shot | Recall | 51 | 66.4 | 52 | 52.8 | 61.4 | 51.2 | 46.4 | 52 | 51.4 |
| Zero-Shot | F1 | 51 | 66.4 | 51.95 | 52.8 | 61.4 | 51.25 | 46.4 | 51.95 | 51.4 |
| Few-Shot | Accuracy | 54 | 71.5 | 55.6 | 56.7 | 65.7 | 54.1 | 49.6 | 55 | 54.8 |
| Few-Shot | FN Rate | 46 | 28.4 | 44.4 | 43.2 | 34.4 | 46 | 50.4 | 45 | 45.2 |
| Few-Shot | FP Rate | 46 | 28.6 | 44.4 | 43.4 | 34.2 | 45.8 | 50.4 | 45 | 45.2 |
| Few-Shot | Precision | 54 | 71.46 | 55.6 | 56.69 | 65.73 | 54.11 | 49.6 | 55 | 54.8 |
| Few-Shot | Recall | 54 | 71.6 | 55.6 | 56.8 | 65.6 | 54 | 49.6 | 55 | 54.8 |
| Few-Shot | F1 | 54 | 71.53 | 55.6 | 56.74 | 65.67 | 54.05 | 49.6 | 55 | 54.8 |

| Dataset Instance | Stress Classification | Recommendation |
|---|---|---|
| SAD: “I feel so nervous and judged every time I’m in a group, even at work.” | Classified as Moderate Stress. LLaMA 3 detected signs of social anxiety and self-perceived judgment. | Suggested exposure therapy techniques, journaling thoughts, and consulting a therapist specializing in social anxiety. |
| Dreaddit: “I’ve been feeling overwhelmed and can’t sleep. It’s getting worse each day.” | Classified as Severe Stress. LLaMA 3 identified escalating distress and potential sleep-related issues. | Recommended seeking immediate counseling, practicing relaxation techniques, and addressing sleep hygiene. |
| DepSeverity: “I feel hopeless and tired all the time, like there’s no point to anything.” | Classified as Severe Depression. LLaMA 3 recognized patterns of depressive severity and potential lethargy. | Suggested consulting a psychiatrist for medication, maintaining a routine, and engaging in physical activity. |
| SDCNL: “I’m constantly stressed out about balancing work and family, it’s exhausting.” | Classified as Moderate Stress. LLaMA 3 analyzed role conflict and emotional exhaustion. | Suggested creating a structured schedule, setting realistic goals, and engaging in mindfulness practices. |
| CSSRS-Suicide: “I sometimes think people would be better off without me.” | Classified as High Suicide Risk. LLaMA 3 flagged suicidal ideation and urgency for intervention. | Recommended contacting a crisis helpline, involving trusted individuals, and scheduling a mental health evaluation. |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Umer, L.; Iqbal, J.; Ayaz, Y.; Imam, H.; Ahmad, A.; Asgher, U. StressSpeak: A Speech-Driven Framework for Real-Time Personalized Stress Detection and Adaptive Psychological Support. Diagnostics 2025, 15, 2871. https://doi.org/10.3390/diagnostics15222871

