From Convolution to Spikes for Mental Health: A CNN-to-SNN Approach Using the DAIC-WOZ Dataset
Abstract
1. Introduction
2. Materials and Methods
2.1. Neural Networks
2.2. Dataset
2.3. Feature Extraction
2.4. Neural Networks Architecture
- Input Layer: 128 × 128 grayscale Mel-spectrograms (single-channel input)
- Conv Layer 1: Conv2d(1, 32, kernel_size = 3) → ReLU activation
- Pooling Layer 1: AvgPool2d(kernel_size = 2)
- Conv Layer 2: Conv2d(32, 64, kernel_size = 3) → ReLU activation
- Pooling Layer 2: AvgPool2d(kernel_size = 2)
- Flattening Layer
- Fully Connected Layer: Linear(128, 2) followed by softmax activation (an illustrative PyTorch sketch of this stack follows the list below)
- Batch size: 32
- Early stopping: patience = 5 epochs, max epochs = 50
- Average convergence epoch: 10
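
As a concrete reference, the listed layers and training settings can be expressed as a short PyTorch module. The sketch below is illustrative only: the class name DepressionCNN is invented, and because the list jumps from the flattening layer to Linear(128, 2) without stating the flattened size, nn.LazyLinear is used as a placeholder that infers its input width on the first forward pass.

```python
# Minimal sketch of the CNN described above (not the authors' released code).
import torch
import torch.nn as nn

class DepressionCNN(nn.Module):  # hypothetical name
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3),   # Conv Layer 1 + ReLU
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2),       # Pooling Layer 1
            nn.Conv2d(32, 64, kernel_size=3),  # Conv Layer 2 + ReLU
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=2),       # Pooling Layer 2
        )
        self.flatten = nn.Flatten()            # Flattening Layer
        # The list specifies Linear(128, 2); the reduction from the flattened
        # feature map down to 128 inputs is not itemized, so a lazily
        # initialized linear layer stands in for the classifier head here.
        self.classifier = nn.LazyLinear(num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)       # (N, 64, 30, 30) for 128 x 128 inputs
        x = self.flatten(x)
        return self.classifier(x)  # softmax is applied by the loss / at inference

model = DepressionCNN()
logits = model(torch.randn(32, 1, 128, 128))  # batch size 32, as reported
print(logits.shape)  # torch.Size([32, 2])
```

Training with batch size 32 and early stopping (patience 5, maximum 50 epochs) would wrap this module in a standard cross-entropy training loop; those hyperparameters are taken from the list above, while the optimizer and learning rate are not specified there.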
2.5. Software, Libraries, and Generative AI Tools
3. Results
3.1. CNN Baseline Performance
- Class 0 (Non-depressed): Precision = 0.8145, Recall = 0.8421, F1 = 0.8280
- Class 1 (Depressed): Precision = 0.8365, Recall = 0.8082, F1 = 0.8221 (a worked F1 check follows below)
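
As a quick consistency check, the per-class F1 values follow from the standard harmonic-mean definition, and the macro F1 reported later is their unweighted average:

```latex
F_1^{(0)} = \frac{2PR}{P+R}
          = \frac{2 \cdot 0.8145 \cdot 0.8421}{0.8145 + 0.8421}
          \approx 0.8280,
\qquad
F_1^{\text{macro}} = \frac{0.8280 + 0.8221}{2} \approx 0.8251
```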
3.2. SNN Performance Across Conversion Modes
4. Discussion
4.1. Analysis and Interpretation
4.2. Comparative Analysis
4.3. Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Leal, S.S.; Ntalampiras, S.; Sassi, R. Speech-Based Depression Assessment: A Comprehensive Survey. IEEE Trans. Affect. Comput. 2024, 1, 1–16.
2. Mao, K.; Wu, Y.; Chen, J. A Systematic Review on Automated Clinical Depression Diagnosis. Npj Ment. Health Res. 2023, 2, 20.
3. Moreno-Agostino, D.; Wu, Y.-T.; Daskalopoulou, C.; Hasan, M.T.; Huisman, M.; Prina, M. Global Trends in the Prevalence and Incidence of Depression: A Systematic Review and Meta-Analysis. J. Affect. Disord. 2021, 281, 235–243.
4. Brody, D.J.; Hughes, J.P. Depression Prevalence in Adolescents and Adults: United States, August 2021–August 2023. NCHS Data Brief 2025, 527, 1–11.
5. Terlizzi, E.P.; Zablotsky, B. Symptoms of Anxiety and Depression Among Adults: United States, 2019 and 2022. Natl. Health Stat. Rep. 2024, 213, CS353885.
6. Arias-de la Torre, J.; Vilagut, G.; Ronaldson, A.; Bakolis, I.; Dregan, A.; Martín, V.; Martinez-Alés, G.; Molina, A.J.; Serrano-Blanco, A.; Valderas, J.M.; et al. Prevalence and Variability of Depressive Symptoms in Europe: Update Using Representative Data from the Second and Third Waves of the European Health Interview Survey (EHIS-2 and EHIS-3). Lancet Public Health 2023, 8, e889–e898.
7. Lim, G.Y.; Tam, W.W.; Lu, Y.; Ho, C.S.; Zhang, M.W.; Ho, R.C. Prevalence of Depression in the Community from 30 Countries between 1994 and 2014. Sci. Rep. 2018, 8, 2861.
8. Chen, Q.; Huang, S.; Xu, H.; Peng, J.; Wang, P.; Li, S.; Zhao, J.; Shi, X.; Zhang, W.; Shi, L.; et al. The Burden of Mental Disorders in Asian Countries, 1990–2019: An Analysis for the Global Burden of Disease Study 2019. Transl. Psychiatry 2024, 14, 167.
9. Mahmud, S.; Mohsin, M.; Dewan, M.N.; Muyeed, A. The Global Prevalence of Depression, Anxiety, Stress, and Insomnia Among General Population During COVID-19 Pandemic: A Systematic Review and Meta-Analysis. Trends Psychol. 2023, 31, 143–170.
10. Jia, H.; Guerin, R.J.; Barile, J.P.; Okun, A.H.; McKnight-Eily, L.; Blumberg, S.J.; Njai, R.; Thompson, W.W. National and State Trends in Anxiety and Depression Severity Scores Among Adults During the COVID-19 Pandemic—United States, 2020–2021. MMWR Morb. Mortal. Wkly. Rep. 2021, 70, 1427–1432.
11. Kroenke, K.; Strine, T.W.; Spitzer, R.L.; Williams, J.B.W.; Berry, J.T.; Mokdad, A.H. The PHQ-8 as a Measure of Current Depression in the General Population. J. Affect. Disord. 2009, 114, 163–173.
12. Lu, Y.-J.; Chang, X.; Li, C.; Zhang, W.; Cornell, S.; Ni, Z.; Masuyama, Y.; Yan, B.; Scheibler, R.; Wang, Z.-Q.; et al. ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. In Proceedings of the Interspeech 2022, Incheon, Republic of Korea, 18–22 September 2022; pp. 5458–5462.
13. Wu, P.; Wang, R.; Lin, H.; Zhang, F.; Tu, J.; Sun, M. Automatic Depression Recognition by Intelligent Speech Signal Processing: A Systematic Survey. CAAI Trans. Intell. Technol. 2023, 8, 701–711.
14. Othmani, A.; Kadoch, D.; Bentounes, K.; Rejaibi, E.; Alfred, R.; Hadid, A. Towards Robust Deep Neural Networks for Affect and Depression Recognition from Speech. In Pattern Recognition. ICPR International Workshops and Challenges; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12662, pp. 5–19.
15. Liu, L.; Liu, L.; Wafa, H.A.; Tydeman, F.; Xie, W.; Wang, Y. Diagnostic Accuracy of Deep Learning Using Speech Samples in Depression: A Systematic Review and Meta-Analysis. J. Am. Med. Inform. Assoc. 2024, 31, 2394–2404.
16. Kapse, P.; Garg, V.K. Advanced Deep Learning Techniques for Depression Detection: A Review. SSRN Sch. Pap. 2022, 4180783.
17. Boulal, H.; Hamidi, M.; Abarkan, M.; Barkani, J. Amazigh CNN Speech Recognition System Based on Mel Spectrogram Feature Extraction Method. Int. J. Speech Technol. 2024, 27, 287–296.
18. Fang, W.; Chen, Y.; Ding, J.; Yu, Z.; Masquelier, T.; Chen, D.; Huang, L.; Zhou, H.; Li, G.; Tian, Y. SpikingJelly: An Open-Source Machine Learning Infrastructure Platform for Spike-Based Intelligence. Sci. Adv. 2023, 9, eadi1480.
19. Bu, T.; Fang, W.; Ding, J.; Dai, P.; Yu, Z.; Huang, T. Optimal ANN-SNN Conversion for High-Accuracy and Ultra-Low-Latency Spiking Neural Networks. arXiv 2023.
20. Izhikevich, E.M. Simple Model of Spiking Neurons. IEEE Trans. Neural Netw. 2003, 14, 1569–1572.
21. Gratch, J.; Artstein, R.; Lucas, G.; Stratou, G.; Scherer, S.; Nazarian, A.; Wood, R.; Boberg, J.; DeVault, D.; Marsella, S.; et al. The Distress Analysis Interview Corpus of Human and Computer Interviews. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; pp. 3123–3128.
22. Pandya, S. A Machine Learning Framework for Enhanced Depression Detection in Mental Health Care Setting. Int. J. Sci. Res. Sci. Eng. Technol. 2023, 10, 356–368.
23. Bhatt, N.; Jain, A.; Jain, M.; Bhatt, S. Depression Detection from Speech Using a Voting Ensemble Approach. In Proceedings of the 2024 IEEE 8th International Conference on Information and Communication Technology (CICT), Allahabad, India, 21–23 March 2024; pp. 1–6.
24. Saeed, M.; Komashinsky, V.; Mohammed, S.; Abdulqader, N.; Saif, L. Speech Signal Analysis to Predict Depression. Int. J. Adv. Netw. Appl. 2025, 16, 6460–6465.
25. Al Hanai, T.; Ghassemi, M.; Glass, J. Detecting Depression with Audio/Text Sequence Modeling of Interviews. In Proceedings of the Interspeech 2018, Hyderabad, India, 2–6 September 2018; pp. 1716–1720.
26. Chlasta, K.; Wołk, K.; Krejtz, I. Automated Speech-Based Screening of Depression Using Deep Convolutional Neural Networks. Procedia Comput. Sci. 2019, 164, 618–628.
27. Ding, H.; Du, Z.; Wang, Z.; Xue, J.; Wei, Z.; Yang, K.; Jin, S.; Zhang, Z.; Wang, J. IntervoxNet: A Novel Dual-Modal Audio-Text Fusion Network for Automatic and Efficient Depression Detection from Interviews. Front. Phys. 2024, 12, 1430035.
28. Lam, G.; Dongyan, H.; Lin, W. Context-Aware Deep Learning for Multi-Modal Depression Detection. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019.
29. Lin, L.; Chen, X.; Shen, Y.; Zhang, L. Towards Automatic Depression Detection: A BiLSTM/1D CNN-Based Model. Appl. Sci. 2020, 10, 8701.
30. Manoret, P.; Chotipurk, P.; Sunpaweravong, S.; Jantrachotechatchawan, C.; Duangrattanalert, K. Automatic Detection of Depression from Stratified Samples of Audio Data. arXiv 2021.
31. Saidi, A.; Ben Othman, S.; Ben Saoud, S. Hybrid CNN-SVM Classifier for Efficient Depression Detection System. In Proceedings of the 2020 4th International Conference on Advanced Systems and Emergent Technologies (IC_ASET), Hammamet, Tunisia, 19–22 March 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 229–234.
32. Vázquez-Romero, A.; Gallardo-Antolín, A. Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks. Entropy 2020, 22, 688.
33. Williamson, J.R.; Godoy, E.; Cha, M.; Schwarzentruber, A.; Khorrami, P.; Gwon, Y.; Kung, H.-T.; Dagli, C.; Quatieri, T.F. Detecting Depression Using Vocal, Facial and Semantic Communication Cues. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge (AVEC ’16), Amsterdam, The Netherlands, 16 October 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 11–18.
Aspect | CNN | SNN |
---|---|---|
Data type | Continuous (real-valued) | Discrete (spikes over time) |
Time dynamics | Static input | Temporal input/output |
Energy efficiency | Low (power-hungry inference) | High (sparse, event-driven) |
Training method | Standard backpropagation | Surrogate gradients, ANN-to-SNN conversion |
Biological realism | Low | High |
Use case maturity | Production-ready, standardized | Experimental, emerging |
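
Because the table above lists ANN-to-SNN conversion as an SNN training route, and SpikingJelly is among the cited tools, a conversion step along these lines can be sketched as below. This is a minimal illustration that assumes SpikingJelly's activation_based ann2snn interface; ann_model and train_loader are placeholder names, and the mode values simply mirror the conversion modes reported in the results.

```python
# Illustrative ANN-to-SNN conversion (assumes SpikingJelly's ann2snn API;
# ann_model and train_loader are placeholders, not objects from the paper).
from spikingjelly.activation_based import ann2snn

# Reported conversion modes: 'max', '99.9%', and the fractions 1/2 and 1/3,
# which select different activation-scaling thresholds during conversion.
converter = ann2snn.Converter(mode='99.9%', dataloader=train_loader)
snn_model = converter(ann_model)  # spiking copy of the trained CNN
```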
Model | Accuracy | Macro F1 | ROC-AUC | Class 0 F1 | Class 1 F1 |
---|---|---|---|---|---|
CNN | 0.8251 | 0.8251 | 0.9070 | 0.8280 | 0.8221 |
SNN (max) | 0.7925 | 0.7909 | 0.8141 | 0.7723 | 0.8094 |
SNN (99.9%) | 0.8254 | 0.8254 | 0.8558 | 0.8285 | 0.8223 |
SNN (0.5) | 0.8132 | 0.8130 | 0.8382 | 0.8069 | 0.8190 |
SNN (0.333) | 0.8119 | 0.8118 | 0.8407 | 0.8063 | 0.8173 |
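
For context on how such a converted network is typically scored, the sketch below accumulates the SNN output over T timesteps for each (repeated static) input, averages it into a rate-coded prediction, and resets the network state between batches. It assumes SpikingJelly's functional.reset_net helper; snn_model, test_loader, and T = 50 are illustrative placeholders rather than settings reported in the paper.

```python
# Illustrative timestep-accumulation evaluation for a converted SNN
# (snn_model, test_loader, and T are placeholders, not reported settings).
import torch
from spikingjelly.activation_based import functional

@torch.no_grad()
def evaluate_snn(snn_model, test_loader, T: int = 50):
    snn_model.eval()
    correct, total = 0, 0
    for images, labels in test_loader:
        out = 0.0
        for _ in range(T):                 # present the same spectrogram for T steps
            out = out + snn_model(images)  # accumulate output over time
        preds = (out / T).argmax(dim=1)    # rate-coded prediction
        correct += (preds == labels).sum().item()
        total += labels.numel()
        functional.reset_net(snn_model)    # clear membrane potentials between batches
    return correct / total
```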
Method | Precision | Recall | F1-Score | Accuracy | Other Metrics | Reference |
---|---|---|---|---|---|---|
MFCC + Mel spectrum | 82.90% | 98.97% | 91.43% | 82% 79% * | – | [24] |
Sequence (Audio) | 0.71 | 0.56 | 0.63 | – | – | [25] |
ResNet 34 (224 × 224 px) | 0.5714 | 0.6667 | 0.6154 | 81% | – | [26] |
CNN (audio) | 0.80 | 0.92 | 0.86 | 0.79 | – | [27] |
CNN-Augm (audio) | 0.78 | 0.58 | 0.67 | – | – | [28] |
1D CNN (audio) | 0.73 | 0.92 | 0.81 | – | – | [29] |
Monolingual (DAIC WOZ) | 0.70 | 0.78 | 0.74 | 0.73 | Specificity: 0.68 | [15] |
1D CNN-GRU | 64.00 | 91.67 | 75.00 | – | – | [30] |
CNN | 0.55 | 0.86 | 0.67 | 58.57 | – | [31] |
1D-CNN (1,5 kernel size) | 0.50 (0.85) | 0.72 (0.69) | 0.59 (0.76) | 0.70 (0.70) | – | [32] |
Audio fusion | – | – | 0.57 | – | AUC: 0.72 | [33] |
CNN (Our work) | 0.8145 | 0.8421 | 0.8251 | 0.8251 | ROC-AUC: 0.9070 | This work |
SNN (99.9%, Our work) | 0.8285 | 0.8082 | 0.8254 | 0.8254 | ROC-AUC: 0.8558 | This work |