Evaluating the Performance of eGeMAPS Features in Detecting Depression Using Resampling Methods
Abstract
1. Introduction
- We show that, at least for the speaker set in E-DAIC, the hypothesis of independence can be rejected with 95% confidence in many instances, providing a strong indication that eGeMAPS audio features and speakers’ depression status are in general dependent.
- Our results reinforce previous findings in the literature by providing 95% bootstrap confidence intervals for the probability of decision errors when using eGeMAPS features.
- We show that eGeMAPS features classify depression more accurately in male speakers than in female speakers.
- We show that eGeMAPS temporal features provide higher discrimination power when detecting depression in females, while eGeMAPS energy features provide higher discrimination power when detecting depression in males.
Literature Review
2. Materials and Methods
2.1. The E-DAIC Dataset
Preparation of Audio Files from Speakers
2.2. OpenSMILE
The eGeMAPS Features Set
2.3. WEKA
2.4. Resampling Methods
2.4.1. Permutation Tests
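The mechanics of a permutation test can be sketched as follows. This is an illustrative sketch only: the function name, the toy data, and the difference-in-means statistic are our assumptions, not the authors' implementation.

```python
import random

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random label permutations whose absolute
    mean difference is at least as large as the observed one (the
    Monte Carlo p-value).
    """
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # each shuffle is a uniform permutation of labels
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            count += 1
    return count / n_perm
```

A small p-value indicates that the observed group difference is unlikely under the null hypothesis that the group labels are exchangeable.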
2.4.2. Permutation Tests for Evaluating Machine Learning Classifiers
2.4.3. Bootstrap Confidence Intervals
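A percentile bootstrap confidence interval can be sketched as follows; the function name, the default mean statistic, and the parameters are our illustrative assumptions rather than the paper's exact procedure.

```python
import random

def bootstrap_ci(sample, stat=lambda s: sum(s) / len(s),
                 n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples the data with replacement n_boot times and returns the
    (alpha/2, 1 - alpha/2) quantiles of the resampled statistic.
    """
    rng = random.Random(seed)
    n = len(sample)
    stats = sorted(
        stat([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Applied to a sample of per-run error rates, this yields a 95% interval for the expected decision-error probability of a classifier.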
2.4.4. Hypothesis Testing Using Bootstrap
2.5. Testing Independence of Features and Depression Labels
2.5.1. Step 1: Preprocessing
2.5.2. Step 2: Build Random Balanced Training Sets
- Include all speakers in .
- Randomly choose speakers from the non-depressed set and include them in . Each non-depressed speaker is equally likely to be chosen but any non-depressed speaker is chosen at most once.
- Let and be the subsets of speakers from and included in . For each speaker , randomly choose segments out of the segments from that speaker and include the corresponding feature vectors and truth labels in . Each segment of speaker s is equally likely to be chosen, but any segment is chosen at most once.
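The sampling steps above can be sketched as follows. The function name, the dictionary-based inputs, and the `segments_per_speaker` parameter are our illustrative stand-ins for the paper's set notation.

```python
import random

def build_balanced_training_set(depressed, non_depressed,
                                segments_per_speaker, seed=0):
    """Build one random balanced training set.

    depressed / non_depressed map speaker ids to lists of per-segment
    feature vectors.  All depressed speakers are kept; an equal-sized
    subset of non-depressed speakers is drawn without replacement;
    then segments_per_speaker segments are drawn without replacement
    from each chosen speaker.  Returns (features, labels), label 1 =
    depressed.
    """
    rng = random.Random(seed)
    # Sample as many non-depressed speakers as there are depressed ones.
    chosen_nd = rng.sample(sorted(non_depressed), len(depressed))
    features, labels = [], []
    for speakers, label in ((sorted(depressed), 1), (chosen_nd, 0)):
        pool = depressed if label == 1 else non_depressed
        for spk in speakers:
            # Segments drawn without replacement within each speaker.
            for seg in rng.sample(pool[spk], segments_per_speaker):
                features.append(seg)
                labels.append(label)
    return features, labels
```

The resulting set is balanced both across classes and across speakers, so no single speaker dominates the training data.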
2.5.3. Step 3: Training and Testing a Classifier Using
2.5.4. Step 4: Permutation of Labels
2.5.5. Step 5: Permutation Test of
3. Results and Discussion
3.1. Testing Independence of Features and Depression Labels
3.2. eGeMAPS Dependence with Speaker Gender
- using the bootstrap method described in Section 2.4.3.
3.3. Detection Power of Subsets of eGeMAPS Features
- All eGeMAPS features (): subset with all 88 features defined in [25];
- Temporal features (): (6 features) rate of loudness peaks; mean length and standard deviation of continuously voiced regions and of unvoiced regions; and number of continuously voiced regions per second (Features 82–87 produced by the configuration file eGeMAPSv02.conf, available on the OpenSMILE website).
- eGeMAPS features excluding temporal features ().
- Frequency features (): (24 features) mean and coefficient of variation in pitch, jitter, center frequency and bandwidth of the first, second, and third formants; and the 20th, 50th, and 80th percentiles of pitch, the 20th–80th percentile of pitch, and the mean and the standard deviation of the rising/falling slopes of pitch (Features 1–10, 31, 32, 41–44, 47–50, 53–56 produced by eGeMAPSv02.conf).
- eGeMAPS features excluding frequency features ().
- Energy features (): (15 features) mean and coefficient of variation in shimmer, loudness, harmonic-to-noise ratio; equivalent sound level; and the 20th, 50th, and 80th percentiles of loudness, the 20th–80th percentile of loudness, and the mean and the standard deviation of the rising/falling slopes of loudness (Features 11–20, 33–36, and 88 produced by eGeMAPSv02.conf).
- eGeMAPS features excluding energy features ().
- Spectral features (): (43 features) mean and coefficient of variation in alpha ratio (voiced and unvoiced segments), Hammarberg index (voiced and unvoiced segments), spectral slope from 0 to 500 Hz, spectral slope from 500 to 1500 Hz; relative energy of the first, second, and third formants; ratio of the energy of the spectral harmonic peak between a formant’s center frequency and the pitch frequency for the first, second, and third formants; ratio between the energy of the first harmonic to the second and between the energy of the first to the third harmonics; MFCCs 1 through 4, and spectral flux (Features 21–30, 37–40, 45, 46, 51, 52, and 57–81 produced by eGeMAPSv02.conf).
- eGeMAPS features excluding spectral features ().
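As a consistency check, the feature indices quoted for the four subsets above do partition all 88 eGeMAPS features. A short script, using only the index ranges transcribed from the lists above:

```python
# Feature indices produced by eGeMAPSv02.conf, as listed above.
temporal = set(range(82, 88))                              # 82-87
frequency = (set(range(1, 11)) | {31, 32}
             | set(range(41, 45)) | set(range(47, 51))
             | set(range(53, 57)))                         # 24 features
energy = set(range(11, 21)) | set(range(33, 37)) | {88}    # 15 features
spectral = (set(range(21, 31)) | set(range(37, 41))
            | {45, 46, 51, 52} | set(range(57, 82)))       # 43 features

subsets = [temporal, frequency, energy, spectral]

# The four subsets are pairwise disjoint and cover features 1-88.
assert sum(len(s) for s in subsets) == 88
assert set().union(*subsets) == set(range(1, 89))
print([len(s) for s in subsets])  # -> [6, 24, 15, 43]
```

The subset sizes 6 + 24 + 15 + 43 = 88 match the counts stated in the list above.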
4. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| E-DAIC | Extended Distress Analysis Interview Corpus |
| eGeMAPS | Extended Geneva Minimalistic Acoustic Parameter Set |
| MFCC | Mel-Frequency Cepstrum Coefficients |
| SMO | Sequential Minimal Optimization |
| SVM | Support Vector Machine |
| WEKA | Waikato Environment for Knowledge Analysis |
References
- Goodwin, R.D.; Dierker, L.C.; Wu, M.; Galea, S.; Hoven, C.W.; Weinberger, A.H. Trends in US depression prevalence from 2015 to 2020: The widening treatment gap. Am. J. Prev. Med. 2022, 63, 726–733. [Google Scholar] [CrossRef] [PubMed]
- Ojala, M.; Garriga, G.C. Permutation tests for studying classifier performance. J. Mach. Learn. Res. 2010, 11, 1833–1863. [Google Scholar]
- Low, L.S.A.; Maddage, N.C.; Lech, M.; Sheeber, L.B.; Allen, N.B. Detection of clinical depression in adolescents’ speech during family interactions. IEEE Trans. Biomed. Eng. 2010, 58, 574–586. [Google Scholar] [CrossRef] [PubMed]
- Ringeval, F.; Schuller, B.; Valstar, M.; Cummins, N.; Cowie, R.; Tavabi, L.; Schmitt, M.; Alisamir, S.; Amiriparian, S.; Messner, E.M.; et al. AVEC 2019 workshop and challenge: State-of-mind, detecting depression with AI, and cross-cultural affect recognition. In Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop; Association for Computing Machinery: New York, NY, USA, 2019; pp. 3–12. [Google Scholar]
- Wang, J.; Zhang, L.; Liu, T.; Pan, W.; Hu, B.; Zhu, T. Acoustic differences between healthy and depressed people: A cross-situation study. BMC Psychiatry 2019, 19, 300. [Google Scholar] [CrossRef] [PubMed]
- Low, D.M.; Bentley, K.H.; Ghosh, S.S. Automated assessment of psychiatric disorders using speech: A systematic review. Laryngoscope Investig. Otolaryngol. 2020, 5, 96–116. [Google Scholar] [CrossRef] [PubMed]
- Kiss, G.; Vicsi, K. Mono-and multi-lingual depression prediction based on speech processing. Int. J. Speech Technol. 2017, 20, 919–935. [Google Scholar] [CrossRef]
- Cummins, N.; Epps, J.; Breakspear, M.; Goecke, R. An investigation of depressed speech detection: Features and normalization. In Proceedings of the INTERSPEECH 2011 12th Annual Conference of the International Speech Communication Association; International Speech Communication Association: Grenoble, France, 2011; pp. 2997–3000. [Google Scholar]
- Huang, Z.; Epps, J.; Joachim, D. Investigation of speech landmark patterns for depression detection. IEEE Trans. Affect. Comput. 2019, 13, 666–679. [Google Scholar] [CrossRef]
- Lyu, S.H.; Yang, L.; Zhou, Z.H. A refined margin distribution analysis for forest representation learning. Adv. Neural Inf. Process. Syst. 2019, 32, 5530–5540. [Google Scholar]
- Bailey, A.; Plumbley, M.D. Gender bias in depression detection using audio features. In Proceedings of the 2021 29th European Signal Processing Conference (EUSIPCO); IEEE: Piscataway, NJ, USA, 2021; pp. 596–600. [Google Scholar]
- Ma, X.; Yang, H.; Chen, Q.; Huang, D.; Wang, Y. Depaudionet: An efficient deep model for audio based depression classification. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge; Association for Computing Machinery: New York, NY, USA, 2016; pp. 35–42. [Google Scholar]
- Kwon, N.; Hossain, S.; Blaylock, N.; O’Connell, H.; Hachen, N.; Gwin, J. Detecting Anxiety and Depression from Phone Conversations using x-vectors. In Proceedings of the Workshop on Speech, Music and Mind, Virtual, 15 September 2022; pp. 1–5. [Google Scholar]
- Brueckner, R.; Kwon, N.; Subramanian, V.; Blaylock, N.; O’Connell, H. Audio-based detection of anxiety and depression via vocal biomarkers. In Proceedings of the Future of Information and Communication Conference; Springer: Cham, Switzerland, 2024; pp. 124–141. [Google Scholar]
- Tao, F.; Esposito, A.; Vinciarelli, A. The androids corpus: A new publicly available benchmark for speech based depression detection. Depression 2023, 47, 11–19. [Google Scholar]
- Alghowinem, S.; Goecke, R.; Epps, J.; Wagner, M.; Cohn, J.F. Cross-cultural depression recognition from vocal biomarkers. In Proceedings of the 17th Annual Conference of the International Speech Communication Association, Interspeech 2016, San Francisco, CA, USA, 8–12 September 2016; pp. 1943–1947. [Google Scholar]
- Cummins, N.; Scherer, S.; Krajewski, J.; Schnieder, S.; Epps, J.; Quatieri, T.F. A review of depression and suicide risk assessment using speech analysis. Speech Commun. 2015, 71, 10–49. [Google Scholar] [CrossRef]
- Quatieri, T.F.; Malyska, N. Vocal-source biomarkers for depression: A link to psychomotor activity. In Proceedings of the INTERSPEECH 2012 ISCA’s 13th Annual Conference, Portland, OR, USA, 9–13 September 2012; Volume 2, pp. 1059–1062. [Google Scholar]
- Alghowinem, S.; Gedeon, T.; Goecke, R.; Cohn, J.F.; Parker, G. Interpretation of depression detection models via feature selection methods. IEEE Trans. Affect. Comput. 2020, 14, 133–152. [Google Scholar] [CrossRef] [PubMed]
- Kroenke, K.; Strine, T.W.; Spitzer, R.L.; Williams, J.B.; Berry, J.T.; Mokdad, A.H. The PHQ-8 as a measure of current depression in the general population. J. Affect. Disord. 2009, 114, 163–173. [Google Scholar] [CrossRef] [PubMed]
- DeVault, D.; Artstein, R.; Benn, G.; Dey, T.; Fast, E.; Gainer, A.; Georgila, K.; Gratch, J.; Hartholt, A.; Lhommet, M.; et al. SimSensei Kiosk: A virtual human interviewer for healthcare decision support. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, Paris, France, 5–9 May 2014; pp. 1061–1068. [Google Scholar]
- Gratch, J.; Artstein, R.; Lucas, G.; Stratou, G.; Scherer, S.; Nazarian, A.; Wood, R.; Boberg, J.; DeVault, D.; Marsella, S.; et al. The Distress Analysis Interview Corpus of human and computer interviews. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, 26–31 May 2014; pp. 3123–3128. [Google Scholar]
- Eyben, F.; Wöllmer, M.; Schuller, B. Opensmile: The munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM International Conference on Multimedia; Association for Computing Machinery: New York, NY, USA, 2010; pp. 1459–1462. [Google Scholar]
- Ringeval, F.; Schuller, B.; Valstar, M.; Gratch, J.; Cowie, R.; Scherer, S.; Mozgai, S.; Cummins, N.; Schmitt, M.; Pantic, M. Avec 2017: Real-life depression, and affect recognition workshop and challenge. In Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge; Association for Computing Machinery: New York, NY, USA, 2017; pp. 3–9. [Google Scholar]
- Eyben, F.; Scherer, K.R.; Schuller, B.W.; Sundberg, J.; André, E.; Busso, C.; Devillers, L.Y.; Epps, J.; Laukka, P.; Narayanan, S.S.; et al. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 2015, 7, 190–202. [Google Scholar] [CrossRef]
- Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”; The University of Waikato: Hamilton, New Zealand, 2016. [Google Scholar]
- Platt, J. Fast Training of Support Vector Machines using Sequential Minimal Optimization. In Advances in Kernel Methods—Support Vector Learning; Schoelkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- John, G.H.; Langley, P. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 338–345. [Google Scholar]
- Quinlan, R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers: San Mateo, CA, USA, 1993. [Google Scholar]
- Good, P.I. Resampling Methods; Springer: Cham, Switzerland, 2006. [Google Scholar]
- Chernick, M.R.; LaBudde, R.A. An Introduction to Bootstrap Methods with Applications to R; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
- Campbell, E.L.; Dineley, J.; Conde, P.; Matcham, F.; White, K.M.; Oetzmann, C.; Simblett, S.; Bruce, S.; Folarin, A.A.; Wykes, T.; et al. Classifying depression symptom severity: Assessment of speech representations in personalized and generalized machine learning models. In Proceedings of the INTERSPEECH 2023; ISCA: Dublin, Ireland, 2023; Volume 2023, pp. 1738–1742. [Google Scholar]


| | 2 s | 4 s | 6 s | 8 s |
|---|---|---|---|---|
| female | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| male | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| | 2 s | 4 s | 6 s | 8 s |
|---|---|---|---|---|
| female | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| male | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| | 2 s | 4 s | 6 s | 8 s |
|---|---|---|---|---|
| female | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| male | | | | |
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
| | 2 s | 4 s | 6 s | 8 s |
|---|---|---|---|---|
| SVM | | | | |
| Bayes | | | | |
| Tree | | | | |
Share and Cite
Turnipseed, J.; Fonseca, B.J.B., Jr. Evaluating the Performance of eGeMAPS Features in Detecting Depression Using Resampling Methods. Signals 2026, 7, 41. https://doi.org/10.3390/signals7030041