This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
A Study of Deep Learning Models for Audio Classification of Infant Crying in a Baby Monitoring System
by Denisa Maria Herlea, Bogdan Iancu and Eugen-Richard Ardelean *
Computer Science Department, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
* Author to whom correspondence should be addressed.
Informatics 2025, 12(2), 50; https://doi.org/10.3390/informatics12020050 (registering DOI)
Submission received: 12 February 2025 / Revised: 25 April 2025 / Accepted: 15 May 2025 / Published: 16 May 2025
Abstract
This study investigates the ability of well-known deep learning models, such as ResNet and EfficientNet, to perform audio-based infant cry detection. By comparing the performance of different machine learning algorithms, this study seeks to determine the most effective approach for the detection of infant crying, enhancing the functionality of baby monitoring systems and contributing to a more advanced understanding of audio-based deep learning applications. Understanding and accurately detecting a baby’s cries is crucial for ensuring their safety and well-being, a concern shared by new and expecting parents worldwide. Despite advancements in child health, as noted by UNICEF’s 2022 report of the lowest ever recorded child mortality rate, there is still room for technological improvement. This paper presents a comprehensive evaluation of deep learning models for infant cry detection, analyzing the performance of various architectures on spectrogram and MFCC feature representations. A key focus is the comparison between pretrained and non-pretrained models, assessing their ability to generalize across diverse audio environments. Through extensive experimentation, ResNet50 and DenseNet trained on spectrograms emerged as the most effective architectures, significantly outperforming other models in classification accuracy. Additionally, the study investigates the impact of feature extraction techniques, dataset augmentation, and model fine-tuning, providing deeper insights into the role of representation learning in audio classification. The findings contribute to the growing field of audio-based deep learning applications, offering a detailed comparative study of model architectures, feature representations, and training strategies for infant cry detection.