Distinguishing Between Healthy and Unhealthy Newborns Based on Acoustic Features and Deep Learning Neural Networks Tuned by Bayesian Optimization and Random Search Algorithm
Abstract
1. Introduction
- (1) We conduct experiments that highlight the importance of each set of acoustic features to the performance of the DFFNN and help explain that performance.
- (2) We investigate the contribution of two optimization methods (Bayesian optimization versus random search) to improving the accuracy of the DFFNN.
2. Methods
2.1. Acoustic Features
2.1.1. Mel Frequency Cepstral Coefficients (MFCC)
2.1.2. AAM Features
2.1.3. Prosody Features
2.2. Deep Feedforward Neural Networks
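As a concrete illustration of the classifier family used in this work, the forward pass of a deep feedforward network with ReLU hidden layers and a sigmoid output for the binary healthy/unhealthy decision can be sketched as follows. This is a minimal sketch; the layer sizes, activations, and random parameters below are illustrative assumptions, not the tuned architecture from the paper.

```python
import numpy as np

def dffnn_forward(x, weights, biases):
    """Forward pass of a deep feedforward network: ReLU hidden layers,
    sigmoid output giving the probability of the positive class."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)        # hidden layer: ReLU activation
    z = a @ weights[-1] + biases[-1]          # output layer: linear logit
    return 1.0 / (1.0 + np.exp(-z))           # sigmoid -> probability in (0, 1)

# Usage with random (untrained) parameters: 13 inputs -> 8 hidden -> 1 output.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((13, 8)), rng.standard_normal((8, 1))]
bs = [np.zeros(8), np.zeros(1)]
p = dffnn_forward(rng.standard_normal(13), Ws, bs)
```

In practice the weights would be learned by backpropagation, and the hyperparameters (number of layers, units per layer, learning rate) are exactly the quantities that Bayesian optimization or random search would tune.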
2.3. Optimization Techniques
- (1) Select the starting point x and keep it as the current solution.
- (2) Produce a random vector dx from the parameter space and calculate f(x + dx).
- (3) If f(x + dx) < f(x), save the new solution as the current solution: x = x + dx.
- (4) Stop if the stopping criterion is met; otherwise, iterate from step 2.
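The four steps above can be sketched as a minimal random search minimizer. This is an illustrative implementation under simple assumptions (a fixed uniform step scale and an iteration budget as the stopping criterion), not the authors' exact code; the quadratic objective in the usage example is a placeholder.

```python
import random

def random_search(f, x0, step=0.1, max_iter=1000, seed=0):
    """Minimize f by pure random search, following the listed steps."""
    rng = random.Random(seed)
    x = list(x0)                                        # step 1: starting point
    for _ in range(max_iter):                           # step 4: stopping criterion
        dx = [rng.uniform(-step, step) for _ in x]      # step 2: random vector dx
        candidate = [xi + di for xi, di in zip(x, dx)]
        if f(candidate) < f(x):                         # step 3: keep improvements
            x = candidate
    return x

# Usage: minimize a simple quadratic (placeholder objective).
sphere = lambda v: sum(vi * vi for vi in v)
best = random_search(sphere, [1.0, -2.0], max_iter=5000)
```

Because only improving moves are accepted, the method never worsens the current solution; its cost is the many rejected evaluations, which is consistent with the longer tuning times reported for random search versus Bayesian optimization in the results below.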
3. Data and Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Al-Nasheri, A.; Muhammad, G.; Alsulaiman, M.; Ali, Z.; Malki, K.H.; Mesallam, T.A.; Ibrahim, M.F. Voice pathology detection and classification using auto-correlation and entropy features in different frequency regions. IEEE Access 2017, 6, 6961–6974. [Google Scholar] [CrossRef]
- Al-Dhief, F.T.; Baki, M.M.; Latiff, N.M.A.; Malik, N.N.N.A.; Salim, N.S.; Albader, M.A.A.; Mahyuddin, N.M.; Mohammed, M.A. Voice pathology detection and classification by adopting online sequential extreme learning machine. IEEE Access 2021, 9, 77293–77306. [Google Scholar] [CrossRef]
- Guo, C.; Chen, F.; Chang, Y.; Yan, J. Applying Random Forest classification to diagnose autism using acoustical voice-quality parameters during lexical tone production. Biomed. Signal Process. Control 2022, 77, 103811. [Google Scholar] [CrossRef]
- Kim, H.; Park, H.-Y.; Park, D.G.; Im, S.; Lee, S. Non-invasive way to diagnose dysphagia by training deep learning model with voice spectrograms. Biomed. Signal Process. Control 2023, 86 Pt B, 105259. [Google Scholar] [CrossRef]
- Gutiérrez-Serafín, B.; Andreu-Perez, J.; Pérez-Espinosa, H.; Paulmann, S.; Ding, W. Toward assessment of human voice biomarkers of brain lesions through explainable deep learning. Biomed. Signal Process. Control 2024, 87 Pt B, 105457. [Google Scholar] [CrossRef]
- Celik, G. CovidCoughNet: A new method based on convolutional neural networks and deep feature extraction using pitch-shifting data augmentation for COVID-19 detection from cough, breath, and voice signals. Comput. Biol. Med. 2023, 163, 107153. [Google Scholar] [CrossRef]
- Despotovic, V.; Ismael, M.; Cornil, M.; Mc Call, R.; Fagherazzi, G. Detection of COVID-19 from voice, cough and breathing patterns: Dataset and preliminary results. Comput. Biol. Med. 2021, 138, 104944. [Google Scholar] [CrossRef]
- Bugdol, M.D.; Bugdol, M.N.; Lipowicz, A.M.; Mitas, A.W.; Bienkowska, M.J.; Wijata, A.M. Prediction of menarcheal status of girls using voice features. Comput. Biol. Med. 2018, 100, 296–304. [Google Scholar] [CrossRef]
- Ye, W.; Jiang, Z.; Li, Q.; Liu, Y.; Mou, Z. A hybrid model for pathological voice recognition of post-stroke dysarthria by using 1DCNN and double-LSTM networks. Appl. Acoust. 2022, 197, 108934. [Google Scholar] [CrossRef]
- Yao, Y.; Powell, M.; White, J.; Feng, J.; Fu, Q.; Zhang, P.; Schmidt, D.C. A multi-stage transfer learning strategy for diagnosing a class of rare laryngeal movement disorders. Comput. Biol. Med. 2023, 166, 107534. [Google Scholar] [CrossRef]
- Turkmen, H.I.; Karsligil, M.E.; Kocak, I. Classification of laryngeal disorders based on shape and vascular defects of vocal folds. Comput. Biol. Med. 2015, 62, 76–85. [Google Scholar] [CrossRef]
- Svoboda, S.; Bořil, T.; Rusz, J.; Tykalová, T.; Horáková, D.; Guttmann, C.R.G.; Blagoev, K.B.; Hatabu, H.; Valtchinov, V.I. Assessing clinical utility of machine learning and artificial intelligence approaches to analyze speech recordings in multiple sclerosis: A pilot study. Comput. Biol. Med. 2022, 148, 105853. [Google Scholar] [CrossRef]
- Linder, R.; Albers, A.E.; Hess, M.; Pöppl, S.J.; Schönweiler, R. Artificial neural network-based classification to screen for dysphonia using psychoacoustic scaling of acoustic voice features. J. Voice 2008, 22, 155–163. [Google Scholar] [CrossRef]
- Yagnavajjula, M.K.; Mittapalle, K.R.; Alku, P.; Sreenivasa, R.K.; Mitra, P. Automatic classification of neurological voice disorders using wavelet scattering features. Speech Commun. 2024, 157, 103040. [Google Scholar] [CrossRef]
- Solana-Lavalle, G.; Rosas-Romero, R. Analysis of voice as an assisting tool for detection of Parkinson’s disease and its subsequent clinical interpretation. Biomed. Signal Process. Control 2021, 66, 102415. [Google Scholar] [CrossRef]
- Lahmiri, S.; Shmuel, A. Detection of Parkinson’s disease based on voice patterns ranking and optimized support vector machine. Biomed. Signal Process. Control 2019, 49, 427–433. [Google Scholar] [CrossRef]
- Lahmiri, S. Parkinson’s disease detection based on dysphonia measurements. Phys. A 2017, 471, 98–105. [Google Scholar] [CrossRef]
- Hireš, M.; Gazda, M.; Drotár, P.; Pah, N.D.; Abdul Motin, M.; Kumar, D.K. Convolutional neural network ensemble for Parkinson’s disease detection from voice recordings. Comput. Biol. Med. 2022, 141, 105021. [Google Scholar] [CrossRef]
- Rosales-Pérez, C.A.; Reyes-García, C.A.; Gonzalez, J.A.; Reyes-Galaviz, J.A.; Escalante, H.E.; Orlandi, S. Classifying infant cry patterns by the Genetic Selection of a Fuzzy Model. Biomed. Signal Process. Control 2015, 17, 38–46. [Google Scholar] [CrossRef]
- Chittora, A.; Patil, H.A. Spectral analysis of infant cries and adult speech. Int. J. Speech Technol. 2016, 19, 841–856. [Google Scholar] [CrossRef]
- Sachin, M.U.; Nagaraj, R.; Samiksha, M.; Rao, S.; Moharir, M. GPU based Deep Learning to Detect Asphyxia in Neonates. Indian J. Sci. Technol. 2017, 10, 1–5. [Google Scholar] [CrossRef]
- Lim, W.J.; Muthusamy, H.; Vijean, V.; Yazid, H.; Nadarajaw, T.; Yaacob, S. Dual-tree complex wavelet packet transform and feature selection techniques for infant cry classification. J. Telecommun. Electron. Comput. Eng. 2018, 10, 75–79. [Google Scholar]
- Anders, F.; Hlawitschka, M.; Fuchs, M. Automatic classification of infant vocalization sequences with convolutional neural networks. Speech Commun. 2020, 119, 36–45. [Google Scholar] [CrossRef]
- Ashwini, K.; Durai Raj, P.M.; Srinivasan, K.; Chang, C.-Y. Deep learning assisted neonatal cry classification via support vector machine models. Front. Public Health 2021, 9, 670352. [Google Scholar]
- Ting, H.-N.; Choo, Y.-M.; Kamar, A.A. Classification of asphyxia infant cry using hybrid speech features and deep learning models. Expert Syst. Appl. 2022, 208, 118064. [Google Scholar] [CrossRef]
- Abbaskhah, A.; Sedighi, H.; Marvi, H. Infant cry classification by MFCC feature extraction with MLP and CNN structures. Biomed. Signal Process. Control 2023, 86 Pt B, 105261. [Google Scholar] [CrossRef]
- Ozseven, T. Infant cry classification by using different deep neural network models and hand-crafted features. Biomed. Signal Process. Control 2023, 83, 104648. [Google Scholar] [CrossRef]
- Lahmiri, S.; Tadj, C.; Gargour, C. Biomedical diagnosis of infant cry signal based on analysis of cepstrum by deep feedforward artificial neural networks. IEEE Instrum. Meas. Mag. 2021, 24, 24–29. [Google Scholar] [CrossRef]
- Lahmiri, S.; Tadj, C.; Gargour, C.; Bekiros, S. Deep learning systems for automatic diagnosis of infant cry signals. Chaos Solitons Fractals 2022, 154, 111700. [Google Scholar] [CrossRef]
- Matikolaie, F.S.; Kheddache, Y.; Tadj, C. Automated newborn cry diagnostic system using machine learning approach. Biomed. Signal Process. Control 2022, 73, 103434. [Google Scholar]
- Lahmiri, S.; Tadj, C.; Gargour, C.; Bekiros, S. Optimal tuning of support vector machines and k-NN algorithm by using Bayesian optimization for newborn cry signal diagnosis based on audio signal processing features. Chaos Solitons Fractals 2023, 167, 112972. [Google Scholar] [CrossRef]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning, Adaptive Computation and Machine Learning Series; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
- Gelbart, M.; Snoek, J.; Adams, R.P. Bayesian Optimization with Unknown Constraints. arXiv 2014, arXiv:1403.5607. [Google Scholar] [CrossRef]
- Garnett, R. Bayesian Optimization; Cambridge University Press: Cambridge, UK, 2023. [Google Scholar]
- Rastrigin, L.A. The convergence of the random search method in the extremal control of a many parameter system. Autom. Remote Control 1963, 24, 1337–1342. [Google Scholar]
- Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
- Matikolaie, F.S.; Tadj, C. On the use of long-term features in a newborn cry diagnostic system. Biomed. Signal Process. Control 2020, 59, 101889. [Google Scholar]
- Sarria-Paja, M.; Falk, T.H. Fusion of auditory inspired amplitude modulation spectrum and cepstral features for whispered and normal speech speaker verification. Comput. Speech Lang. 2017, 45, 437–456. [Google Scholar] [CrossRef]
- Qiu, Y.; Yang, X.; Yang, S.; Gong, Y.; Lv, Q.; Yang, B. Classification of Infant Cry Based on Hybrid Audio Features and ResLSTM. J. Voice 2024, in press. [Google Scholar] [CrossRef]
- Qiao, X.; Jiao, S.; Li, H.; Liu, G.; Gao, X.; Li, Z. Infant cry classification using an efficient graph structure and attention-based model. Kuwait J. Sci. 2024, 51, 100221. [Google Scholar] [CrossRef]


| Features | Accuracy (BO) | Sensitivity (BO) | Specificity (BO) | Accuracy (RS) | Sensitivity (RS) | Specificity (RS) |
|---|---|---|---|---|---|---|
| All acoustic features | 87.80% ± 0.23 | 89.87% ± 0.51 | 86.35% ± 0.58 | 86.12% ± 0.33 | 89.17% ± 0.66 | 86.12% ± 0.43 |
| Prosody | 71.72% ± 0.41 | 73.58% ± 0.65 | 70.05% ± 0.62 | 72.92% ± 0.38 | 72.85% ± 0.59 | 69.55% ± 0.81 |
| MFCC | 78.58% ± 0.17 | 80.71% ± 0.52 | 76.11% ± 0.60 | 80.45% ± 0.20 | 79.60% ± 0.50 | 75.83% ± 0.77 |
| AAM | 87.34% ± 0.52 | 86.24% ± 0.66 | 86.24% ± 0.59 | 85.95% ± 0.49 | 89.01% ± 0.61 | 85.98% ± 0.90 |

BO = Bayesian optimization; RS = random search.
| Study | Features | Machine Learning | Accuracy |
|---|---|---|---|
| [30] | AAM | PNN | 70.70% |
| [30] | AAM | SVM | 75.75% |
| [31] | AAM | SVM + BO | 83.62% |
| [31] | AAM | kNN + BO | 80.07% |
| Current study | AAM | DFFNN + BO | 87.34% ± 0.52 |
| Current study | AAM | DFFNN + RS | 85.95% ± 0.49 |
| [30] | MFCC | PNN | 68.90% |
| [30] | MFCC | SVM | 76.50% |
| [31] | MFCC | SVM + BO | 72.37% |
| [31] | MFCC | kNN + BO | 74.07% |
| Current study | MFCC | DFFNN + BO | 78.58% ± 0.17 |
| Current study | MFCC | DFFNN + RS | 80.45% ± 0.20 |
| [30] | Prosody | PNN | 52.10% |
| [30] | Prosody | SVM | 61.50% |
| [31] | Prosody | SVM + BO | 70.65% |
| [31] | Prosody | kNN + BO | 70.43% |
| Current study | Prosody | DFFNN + BO | 71.72% ± 0.41 |
| Current study | Prosody | DFFNN + RS | 72.92% ± 0.38 |
| [30] | AAM + MFCC + Prosody | PNN | 69.10% |
| [30] | AAM + MFCC + Prosody | SVM | 77.90% |
| [31] | AAM + MFCC + Prosody | SVM + BO | 81.74% |
| [31] | AAM + MFCC + Prosody | kNN + BO | 82.88% |
| Current study | AAM + MFCC + Prosody | DFFNN + BO | 87.80% ± 0.23 |
| Current study | AAM + MFCC + Prosody | DFFNN + RS | 86.12% ± 0.33 |
| Features | BO | RS |
|---|---|---|
| AAM | 480.9293 | 822.0263 |
| MFCC | 238.9178 | 364.2461 |
| Prosody | 107.4308 | 353.5441 |
| AAM + MFCC + Prosody | 244.0757 | 531.0088 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lahmiri, S.; Tadj, C.; Gargour, C. Distinguishing Between Healthy and Unhealthy Newborns Based on Acoustic Features and Deep Learning Neural Networks Tuned by Bayesian Optimization and Random Search Algorithm. Entropy 2025, 27, 1109. https://doi.org/10.3390/e27111109

