Search Results (155)

Search Parameters:
Keywords = Mel frequency cepstral coefficients (MFCC) features

16 pages, 1530 KiB  
Article
Enhanced Respiratory Sound Classification Using Deep Learning and Multi-Channel Auscultation
by Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim and Su Hwan Lee
J. Clin. Med. 2025, 14(15), 5437; https://doi.org/10.3390/jcm14155437 - 1 Aug 2025
Viewed by 127
Abstract
Background/Objectives: Identifying and classifying abnormal lung sounds is essential for diagnosing patients with respiratory disorders. In particular, the simultaneous recording of auscultation signals from multiple clinically relevant positions offers greater diagnostic potential compared to traditional single-channel measurements. This study aims to improve the accuracy of respiratory sound classification by leveraging multichannel signals and capturing positional characteristics from multiple sites in the same patient. Methods: We evaluated the performance of respiratory sound classification using multichannel lung sound data with a deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM), based on mel-frequency cepstral coefficients (MFCCs). We analyzed the impact of the number and placement of channels on classification performance. Results: The results demonstrated that using four-channel recordings improved accuracy, sensitivity, specificity, precision, and F1-score by approximately 1.11, 1.15, 1.05, 1.08, and 1.13 times, respectively, compared to using three, two, or single-channel recordings. Conclusions: This study confirms that multichannel data capture a richer set of features corresponding to various respiratory sound characteristics, leading to significantly improved classification performance. The proposed method holds promise for enhancing sound classification accuracy not only in clinical applications but also in broader domains such as speech and audio processing. Full article
(This article belongs to the Section Respiratory Medicine)
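The pipeline this abstract describes (per-channel MFCCs feeding a combined CNN and LSTM) can be sketched roughly as follows; the channel count, MFCC settings, and layer sizes are illustrative assumptions rather than the authors' exact configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

def multichannel_mfcc(waveforms, sr=4000, n_mfcc=20):
    """Stack MFCCs from several auscultation channels into one array.
    waveforms: list of 1-D numpy arrays, one per recording position."""
    feats = [librosa.feature.mfcc(y=w, sr=sr, n_mfcc=n_mfcc) for w in waveforms]
    return np.stack(feats)                      # (channels, n_mfcc, frames)

class CnnLstmClassifier(nn.Module):
    """Illustrative CNN front-end followed by an LSTM over time frames."""
    def __init__(self, channels=4, n_mfcc=20, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32 * n_mfcc, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                        # x: (batch, channels, n_mfcc, frames)
        z = self.cnn(x)                          # (batch, 32, n_mfcc, frames)
        z = z.permute(0, 3, 1, 2).flatten(2)     # (batch, frames, 32 * n_mfcc)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])             # classify from the last time step
```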

23 pages, 3741 KiB  
Article
Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling
by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez
Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025
Viewed by 287
Abstract
Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions—studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We perform a detailed preprocessing and feature extraction procedure, evaluating multiple configurations and validating their applicability and effectiveness in improving the obtained results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with the corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year MAE for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research. Full article
(This article belongs to the Section Computational Engineering)
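The feature split reported above (mel spectrograms for the CNNs, MFCC sequences for the LSTMs) corresponds to two standard librosa calls, sketched below; the bin counts are placeholders, since the paper tunes the mel-bin count to corpus SNR, and "utterance.wav" is a hypothetical file name.

```python
import librosa

def cnn_input(y, sr, n_mels=64):
    """Log-mel spectrogram: the 2-D, image-like input used for the CNNs."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)                          # (n_mels, frames)

def lstm_input(y, sr, n_mfcc=13):
    """MFCC sequence: frames become the LSTM's time steps."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

# y, sr = librosa.load("utterance.wav")   # hypothetical corpus file
# print(cnn_input(y, sr).shape, lstm_input(y, sr).shape)
```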

10 pages, 857 KiB  
Proceeding Paper
Implementation of a Prototype-Based Parkinson’s Disease Detection System Using a RISC-V Processor
by Krishna Dharavathu, Pavan Kumar Sankula, Uma Maheswari Vullanki, Subhan Khan Mohammad, Sai Priya Kesapatnapu and Sameer Shaik
Eng. Proc. 2025, 87(1), 97; https://doi.org/10.3390/engproc2025087097 - 21 Jul 2025
Viewed by 195
Abstract
Among the wide range of human diseases, Parkinson’s disease (PD) has a high incidence, according to a recent survey by the World Health Organization (WHO). WHO records indicate that this chronic disease has affected approximately 10 million people worldwide. Patients who do not receive an early diagnosis may develop an incurable neurological disorder. PD is a degenerative disorder of the brain characterized by impairment of the nigrostriatal system and is accompanied by a wide range of motor and non-motor symptoms. In this work, PD is detected from the speech signals of patients using a fifth-generation reduced instruction set computing (RISC-V) processor. The RISC-V microcontroller unit (MCU) was designed for a voice-controlled human-machine interface (HMI). Signal processing and feature extraction methods are applied to the digitized speech signal, which reflects the impairment of the nigrostriatal system, and the resulting features are classified as normal or abnormal by classifier modules to identify PD. We use Matrix Laboratory (MATLAB R2021a_v9.10.0.1602886) to analyze the data, develop algorithms, create modules, and develop the RISC-V processor for embedded implementation. Machine learning (ML) techniques are also used to extract features such as pitch, tremor, and Mel-frequency cepstral coefficients (MFCCs). Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)

19 pages, 1039 KiB  
Article
Prediction of Parkinson Disease Using Long-Term, Short-Term Acoustic Features Based on Machine Learning
by Mehdi Rashidi, Serena Arima, Andrea Claudio Stetco, Chiara Coppola, Debora Musarò, Marco Greco, Marina Damato, Filomena My, Angela Lupo, Marta Lorenzo, Antonio Danieli, Giuseppe Maruccio, Alberto Argentiero, Andrea Buccoliero, Marcello Dorian Donzella and Michele Maffia
Brain Sci. 2025, 15(7), 739; https://doi.org/10.3390/brainsci15070739 - 10 Jul 2025
Viewed by 504
Abstract
Background: Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease, affecting countless individuals worldwide. PD is characterized by the onset of a marked motor symptomatology in association with several non-motor manifestations. The clinical phase of the disease is usually preceded by a long prodromal phase, devoid of overt motor symptomatology but often showing some conditions such as sleep disturbance, constipation, anosmia, and phonatory changes. To date, speech analysis appears to be a promising digital biomarker, able to anticipate clinical PD by as much as 10 years before onset, as well as serving as a useful prognostic tool for patient follow-up. For this reason, voice can be regarded as a non-invasive means of distinguishing PD from healthy subjects (HS). Methods: Ours was a cross-sectional study analyzing voice impairment. A dataset comprising 81 voice samples (41 from healthy individuals and 40 from PD patients) was utilized to train and evaluate common machine learning (ML) models using various types of features, including long-term (jitter, shimmer, and cepstral peak prominence (CPP)), short-term features (Mel-frequency cepstral coefficient (MFCC)), and non-standard measurements (pitch period entropy (PPE) and recurrence period density entropy (RPDE)). The study adopted multiple machine learning (ML) algorithms, including random forest (RF), K-nearest neighbors (KNN), decision tree (DT), naïve Bayes (NB), support vector machines (SVM), and logistic regression (LR). A cross-validation technique was applied to ensure the reliability of performance metrics on the training and test subsets. These metrics (accuracy, recall, and precision) help determine the most effective models for distinguishing PD from healthy subjects. Results: Among all the algorithms used in this research, random forest (RF) was the best-performing model, achieving an accuracy of 82.72% with a ROC-AUC score of 89.65%. Although other models, such as support vector machine (SVM), could be considered, with an accuracy of 75.29% and a ROC-AUC score of 82.63%, RF was by far the best when evaluated across all metrics. The K-nearest neighbors (KNN) and decision tree (DT) models performed the worst. Notably, by combining a comprehensive set of long-term, short-term, and non-standard acoustic features, unlike previous studies that typically focused on only a subset, our study achieved higher predictive performance, offering a more robust model for early PD detection. Conclusions: This study highlights the potential of combining advanced acoustic analysis with ML algorithms to develop non-invasive and reliable tools for early PD detection, offering substantial benefits for the healthcare sector. Full article
(This article belongs to the Section Neurodegenerative Diseases)
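A minimal sketch of the classification stage, assuming the long-term, short-term, and non-standard measures have already been assembled into one feature table (here replaced by random placeholders with the paper's 41/40 class split); the forest size and five-fold split are illustrative, not the study's exact settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# X: one row per voice sample; columns = jitter, shimmer, CPP, MFCC statistics, PPE, RPDE, ...
rng = np.random.default_rng(0)
X = rng.normal(size=(81, 30))          # placeholder for the 81-sample feature table
y = np.array([0] * 41 + [1] * 40)      # 0 = healthy, 1 = PD

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_validate(clf, X, y, cv=5,
                        scoring=["accuracy", "recall", "precision", "roc_auc"])
for metric, vals in scores.items():
    if metric.startswith("test_"):
        print(metric, round(vals.mean(), 3))
```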

22 pages, 4293 KiB  
Article
Speech-Based Parkinson’s Detection Using Pre-Trained Self-Supervised Automatic Speech Recognition (ASR) Models and Supervised Contrastive Learning
by Hadi Sedigh Malekroodi, Nuwan Madusanka, Byeong-il Lee and Myunggi Yi
Bioengineering 2025, 12(7), 728; https://doi.org/10.3390/bioengineering12070728 - 1 Jul 2025
Viewed by 831
Abstract
Diagnosing Parkinson’s disease (PD) through speech analysis is a promising area of research, as speech impairments are often one of the early signs of the disease. This study investigates the efficacy of fine-tuning pre-trained Automatic Speech Recognition (ASR) models, specifically Wav2Vec 2.0 and HuBERT, for PD detection using transfer learning. These models, pre-trained on large unlabeled datasets, are capable of learning rich speech representations that capture acoustic markers of PD. The study also proposes the integration of a supervised contrastive (SupCon) learning approach to enhance the models’ ability to distinguish PD-specific features. Additionally, the proposed ASR-based features were compared against two common acoustic feature sets: mel-frequency cepstral coefficients (MFCCs) and the extended Geneva minimalistic acoustic parameter set (eGeMAPS) as a baseline. We also employed a gradient-based method, Grad-CAM, to visualize important speech regions contributing to the models’ predictions. The experiments, conducted using the NeuroVoz dataset, demonstrated that features extracted from the pre-trained ASR models exhibited superior performance compared to the baseline features. The results also reveal that the method integrating SupCon consistently outperforms traditional cross-entropy (CE)-based models. Wav2Vec 2.0 and HuBERT with SupCon achieved the highest F1 scores of 90.0% and 88.99%, respectively. Additionally, their AUC scores in the ROC analysis surpassed those of the CE models, which had comparatively lower AUCs, ranging from 0.84 to 0.89. These results highlight the potential of ASR-based models as scalable, non-invasive tools for diagnosing and monitoring PD, offering a promising avenue for the early detection and management of this debilitating condition. Full article
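The supervised contrastive (SupCon) objective layered on top of the ASR embeddings can be written compactly as below. This is a generic implementation of the SupCon loss (Khosla et al.), not the authors' code; the temperature and the projection-head usage shown in the comment are assumptions.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss: pull same-label embeddings together,
    push different-label embeddings apart."""
    z = F.normalize(embeddings, dim=1)                 # (N, d) unit vectors
    sim = z @ z.T / temperature                        # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples for each anchor
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average log-probability of the positives for each anchor that has any
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()

# e.g., with a projection head over Wav2Vec 2.0 / HuBERT features (assumed names):
# loss = supcon_loss(proj(asr_features), torch.tensor(pd_labels))
```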

28 pages, 1634 KiB  
Review
AI-Powered Vocalization Analysis in Poultry: Systematic Review of Health, Behavior, and Welfare Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(13), 4058; https://doi.org/10.3390/s25134058 - 29 Jun 2025
Viewed by 989
Abstract
Artificial intelligence and bioacoustics represent a paradigm shift in non-invasive poultry welfare monitoring through advanced vocalization analysis. This comprehensive systematic review critically examines the transformative evolution from traditional acoustic feature extraction—including Mel-Frequency Cepstral Coefficients (MFCCs), spectral entropy, and spectrograms—to cutting-edge deep learning architectures encompassing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, attention mechanisms, and groundbreaking self-supervised models such as wav2vec2 and Whisper. The investigation reveals compelling evidence for edge computing deployment via TinyML frameworks, addressing critical scalability challenges in commercial poultry environments characterized by acoustic complexity and computational constraints. Advanced applications spanning emotion recognition, disease detection, and behavioral phenotyping demonstrate unprecedented potential for real-time welfare assessment. Through rigorous bibliometric co-occurrence mapping and thematic clustering analysis, this review exposes persistent methodological bottlenecks: dataset standardization deficits, evaluation protocol inconsistencies, and algorithmic interpretability limitations. Critical knowledge gaps emerge in cross-species domain generalization and contextual acoustic adaptation, demanding urgent research prioritization. The findings underscore explainable AI integration as essential for establishing stakeholder trust and regulatory compliance in automated welfare monitoring systems. This synthesis positions acoustic AI as a cornerstone technology enabling ethical, transparent, and scientifically robust precision livestock farming, bridging computational innovation with biological relevance for sustainable poultry production systems. Future research directions emphasize multi-modal sensor integration, standardized evaluation frameworks, and domain-adaptive models capable of generalizing across diverse poultry breeds, housing conditions, and environmental contexts while maintaining interpretability for practical farm deployment. Full article
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)

26 pages, 1521 KiB  
Article
AI-Based Classification of Pediatric Breath Sounds: Toward a Tool for Early Respiratory Screening
by Lichuan Liu, Wei Li and Beth Moxley
Appl. Sci. 2025, 15(13), 7145; https://doi.org/10.3390/app15137145 - 25 Jun 2025
Viewed by 441
Abstract
Context: Respiratory morbidity is a leading cause of children’s consultations with general practitioners. Auscultation, the act of listening to breath sounds, is a crucial diagnostic method for respiratory system diseases. Problem: Parents and caregivers often lack the necessary knowledge and experience to identify subtle differences in children’s breath sounds. Furthermore, obtaining reliable feedback from young children about their physical condition is challenging. Methods: The use of a human–artificial intelligence (AI) tool is an essential component for screening and monitoring young children’s respiratory diseases. Using clinical data to design and validate the proposed approaches, we propose novel methods for recognizing and classifying children’s breath sounds. Different breath sound signals were analyzed in the time domain, frequency domain, and using spectrogram representations. Breath sound detection and segmentation were performed using digital signal processing techniques. Multiple features—including Mel–Frequency Cepstral Coefficients (MFCCs), Linear Prediction Coefficients (LPCs), Linear Prediction Cepstral Coefficients (LPCCs), spectral entropy, and Dynamic Linear Prediction Coefficients (DLPCs)—were extracted to capture both time and frequency characteristics. These features were then fed into various classifiers, including K-Nearest Neighbor (KNN), artificial neural networks (ANNs), hidden Markov models (HMMs), logistic regression, and decision trees, for recognition and classification. Main Findings: Experimental results from 120 infants and preschoolers (2 months to 6 years; 30 with asthma, 30 with croup, 30 with pneumonia, and 30 normal) verified the performance of the proposed approaches. Conclusions: The proposed AI system provides a real-time diagnostic platform to improve clinical respiratory management and outcomes in young children, thereby reducing healthcare costs. Future work exploring additional respiratory diseases is warranted. Full article

15 pages, 1458 KiB  
Article
Photoplethysmography Feature Extraction for Non-Invasive Glucose Estimation by Means of MFCC and Machine Learning Techniques
by Christian Salamea-Palacios, Melissa Montalvo-López, Raquel Orellana-Peralta and Javier Viñanzaca-Figueroa
Biosensors 2025, 15(7), 408; https://doi.org/10.3390/bios15070408 - 24 Jun 2025
Viewed by 510
Abstract
Diabetes Mellitus is considered one of the most widespread diseases in the world. Traditional glucose monitoring devices carry discomfort and risks associated with the frequent extraction of blood from users. The present article proposes a noninvasive glucose estimation system based on the application of Mel Frequency Cepstral Coefficients (MFCCs) for the characterization of photoplethysmographic (PPG) signals. Two variants of the MFCC feature extraction methods are evaluated along with three machine learning techniques for the development of an effective regression function for the estimation of glucose concentration. A comparison between the performance of the algorithms revealed that the best combination achieved a mean absolute error of 9.85 mg/dL and a correlation of 0.94 between the estimated concentration and the real glucose values. Similarly, 99.53% of the validation samples were distributed within zones A and B of the Clarke Error Grid Analysis. The proposed system achieves levels of correlation comparable to analogous technologies that require prior calibration for their operation, which indicates a strong potential for the future use of the algorithm as an alternative to invasive monitoring devices. Full article
(This article belongs to the Section Wearable Biosensors)
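A hedged sketch of the regression stage: MFCC-style coefficients computed from PPG segments feed a standard regressor and are scored by mean absolute error. The PPG sampling rate, window settings, placeholder data, and the choice of support vector regression are assumptions; the paper evaluates several machine learning techniques.

```python
import numpy as np
import librosa
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def ppg_mfcc_features(segment, fs=125, n_mfcc=13):
    """Summarize one PPG segment with MFCC means (MFCCs applied to a non-speech signal)."""
    mfcc = librosa.feature.mfcc(y=segment.astype(float), sr=fs, n_mfcc=n_mfcc,
                                n_fft=256, hop_length=64, n_mels=32)
    return mfcc.mean(axis=1)

# placeholder data: 200 ten-second PPG segments with reference glucose values
rng = np.random.default_rng(0)
segments = rng.normal(size=(200, 1250))
glucose = rng.uniform(70, 180, size=200)

X = np.array([ppg_mfcc_features(s) for s in segments])
X_tr, X_te, y_tr, y_te = train_test_split(X, glucose, test_size=0.25, random_state=0)
model = SVR(C=10.0).fit(X_tr, y_tr)
print("MAE (mg/dL):", mean_absolute_error(y_te, model.predict(X_te)))
```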

22 pages, 5083 KiB  
Article
Intelligent Mobile-Assisted Language Learning: A Deep Learning Approach for Pronunciation Analysis and Personalized Feedback
by Fengqin Liu, Korawit Orkphol, Natthapon Pannurat, Thanat Sooknuan, Thanin Muangpool, Sanya Kuankid and Montri Phothisonothai
Inventions 2025, 10(4), 46; https://doi.org/10.3390/inventions10040046 - 24 Jun 2025
Viewed by 636
Abstract
This paper introduces an innovative mobile-assisted language-learning (MALL) system that harnesses deep learning technology to analyze pronunciation patterns and deliver real-time, personalized feedback. Drawing inspiration from how the human brain processes speech through neural pathways, our system analyzes multiple speech features using spectrograms, mel-frequency cepstral coefficients (MFCCs), and formant frequencies in a manner that mirrors the auditory cortex’s interpretation of sound. The core of our approach utilizes a convolutional neural network (CNN) to classify pronunciation patterns from user-recorded speech. To enhance the assessment accuracy and provide nuanced feedback, we integrated a fuzzy inference system (FIS) that helps learners identify and correct specific pronunciation errors. The experimental results demonstrate that our multi-feature model achieved accuracies of 82.41% to 90.52% in accent classification across diverse linguistic contexts. The user testing revealed statistically significant improvements in pronunciation skills, with learners showing a 5–20% improvement in accuracy after using the system. The proposed MALL system offers a portable, accessible solution for language learners while establishing a foundation for future research in multilingual functionality and mobile platform optimization. By combining advanced speech analysis with intuitive feedback mechanisms, this system addresses a critical challenge in language acquisition and promotes more effective self-directed learning. Full article
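The fuzzy-inference step that turns classifier scores into graded feedback can be approximated with simple triangular membership functions, as sketched below; the membership breakpoints and feedback labels are invented for illustration and are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def pronunciation_feedback(cnn_confidence):
    """Map the CNN's confidence for the target accent class to fuzzy feedback."""
    memberships = {
        "needs work":  triangular(cnn_confidence, -0.01, 0.0, 0.5),
        "acceptable":  triangular(cnn_confidence, 0.3, 0.6, 0.85),
        "native-like": triangular(cnn_confidence, 0.7, 1.0, 1.01),
    }
    return max(memberships, key=memberships.get), memberships

label, degrees = pronunciation_feedback(0.78)
print(label, degrees)
```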

15 pages, 49760 KiB  
Article
Rapid Diagnosis of Distributed Acoustic Sensing Vibration Signals Using Mel-Frequency Cepstral Coefficients and Liquid Neural Networks
by Haitao Liu, Yunfan Xu, Yuefeng Qi, Haosong Yang and Weihong Bi
Sensors 2025, 25(10), 3090; https://doi.org/10.3390/s25103090 - 13 May 2025
Cited by 1 | Viewed by 599
Abstract
Distributed Acoustic Sensing (DAS) systems face increasing challenges in massive data processing and real-time fault diagnosis due to the growing complexity of industrial environments and data volume. To address these issues, an end-to-end diagnostic framework is developed, integrating Mel-Frequency Cepstral Coefficients (MFCCs) for high-efficiency signal compression and Liquid Neural Networks (LNNs) for lightweight, real-time classification. The MFCC algorithm, originally used in speech processing, is adapted to extract key features from DAS vibration signals, achieving compression ratios of 60–100× without significant information loss. LNNs’ dynamic topology and sparse activation enable high accuracy with extremely low latency and minimal computational cost, making them highly suitable for edge deployment. The proposed framework was validated both in simulated environments and on a real-world conveyor belt system at Qinhuangdao Port, where it achieved 100% accuracy across four vibration modes over 14 weeks of operation. Comparative experiments show that LNNs outperform traditional models such as 1D-CNN and LSTMs in terms of accuracy, inference speed, and model size. The proposed MFCC-LNN pipeline also demonstrates strong cross-domain generalization capabilities in pipeline monitoring, seismic detection, and speech signal processing. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
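The compression claim (raw vibration frames reduced to a handful of cepstral coefficients per frame) can be illustrated as follows; the DAS sampling rate, frame length, and coefficient count are assumptions, so the exact ratio will differ from the paper's.

```python
import numpy as np
import librosa

fs = 10_000                     # assumed DAS channel sampling rate (Hz)
frame = np.random.default_rng(0).normal(size=fs)   # one second of raw vibration samples

mfcc = librosa.feature.mfcc(y=frame, sr=fs, n_mfcc=13,
                            n_fft=1024, hop_length=1024)   # non-overlapping windows
compressed = mfcc.T             # (frames, 13) features passed on to the classifier

ratio = frame.size / compressed.size
print(f"{frame.size} samples -> {compressed.size} coefficients "
      f"(~{ratio:.0f}x compression under these assumed settings)")
```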

33 pages, 4811 KiB  
Article
Enhancing the Prediction of Episodes of Aggression in Patients with Dementia Using Audio-Based Detection: A Multimodal Late Fusion Approach with a Meta-Classifier
by Ioannis Galanakis, Rigas Filippos Soldatos, Nikitas Karanikolas, Athanasios Voulodimos, Ioannis Voyiatzis and Maria Samarakou
Appl. Sci. 2025, 15(10), 5351; https://doi.org/10.3390/app15105351 - 10 May 2025
Cited by 1 | Viewed by 568
Abstract
This study extends our previous work on predicting aggressive outbursts in dementia patients by integrating audio-based violence detection with our earlier detection of visually observed aggressive body movements. By combining audio and visual information, we aim to further enhance the model’s capabilities and make it more suitable for real-world applications. The current work utilizes an audio dataset containing segments that capture vocal expressions during aggressive and non-aggressive scenarios. Noise filtering was applied to the audio files, and Mel-frequency cepstral coefficients (MFCCs), frequency filtering, and speech prosody were used to extract clear information from the audio features. We then apply a late fusion rule that merges the predictions of the two models in a trained meta-classifier to assess how much the audio modality improves the model, with the broader aim of a more precise, multimodal approach to detecting and predicting aggressive outburst behavior in patients suffering from dementia. The analysis of correlations in our multimodal approach suggests that the accuracy of the early detection models is improved, providing a proof of concept that advances the understanding of aggression prediction in clinical settings and offers caregivers more effective intervention tactics. Full article
(This article belongs to the Special Issue Big Data Analytics and Deep Learning for Predictive Maintenance)
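A minimal sketch of the late-fusion step described above, assuming the audio and visual models are already trained and only their per-clip probabilities are available; the logistic-regression meta-classifier and the placeholder scores are illustrative assumptions, not the paper's exact meta-classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400                                     # placeholder: clips with ground-truth labels
y = rng.integers(0, 2, size=n)              # 1 = aggressive episode

# probabilities emitted by the two unimodal models (placeholders)
p_audio = np.clip(y * 0.6 + rng.normal(0.2, 0.2, n), 0, 1)
p_video = np.clip(y * 0.5 + rng.normal(0.25, 0.2, n), 0, 1)

# late fusion: concatenate the unimodal scores and train a meta-classifier on them
X_meta = np.column_stack([p_audio, p_video])
meta = LogisticRegression()
print("fused accuracy:", cross_val_score(meta, X_meta, y, cv=5).mean().round(3))
```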

19 pages, 2092 KiB  
Article
Multi-Detection-Based Speech Emotion Recognition Using Autoencoder in Mobility Service Environment
by Jeong Min Oh, Jin Kwan Kim and Joon Young Kim
Electronics 2025, 14(10), 1915; https://doi.org/10.3390/electronics14101915 - 8 May 2025
Viewed by 661
Abstract
In mobility service environments, recognizing the user’s condition and driving status is critical to driving safety and experience. While speech emotion recognition is one possible way to predict driver status, current emotion recognition models have a fundamental limitation: they classify only a single emotion class, not multiple classes. This prevents a comprehensive understanding of the driver’s condition and intention while driving. In addition, mobility devices inherently generate noise that can degrade speech emotion recognition performance in the mobility service. Considering mobility service environments, we investigate models that detect multiple emotions while mitigating noise issues. In this paper, we propose a speech emotion recognition model based on an autoencoder for multi-emotion detection. First, we analyze Mel Frequency Cepstral Coefficients (MFCCs) to design the specific features. We then develop an autoencoder-based multi-emotion detection scheme that detects multiple emotions with substantially more flexibility than existing models. With the proposed scheme, we investigate and analyze mobility noise impacts and mitigation approaches and evaluate the resulting performance. Full article
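One way to read the autoencoder-based multi-emotion scheme is sketched below: a small autoencoder per emotion is trained on MFCC summary vectors of that emotion, and at inference every emotion whose autoencoder reconstructs the input well is flagged, so several emotions can be detected at once. The per-emotion design, feature dimensionality, and thresholds are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class MfccAutoencoder(nn.Module):
    """Small dense autoencoder over a fixed-length MFCC summary vector."""
    def __init__(self, n_features=39, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def detect_emotions(x, autoencoders, thresholds):
    """Flag every emotion whose autoencoder reconstructs x well (multi-label output)."""
    present = []
    with torch.no_grad():
        for name, ae in autoencoders.items():
            err = torch.mean((ae(x) - x) ** 2).item()
            if err < thresholds[name]:
                present.append(name)
    return present
```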

14 pages, 16532 KiB  
Article
Research on the UAV Sound Recognition Method Based on Frequency Band Feature Extraction
by Jilong Zhong, Aigen Fan, Kuangang Fan, Wenjie Pan and Lu Zeng
Drones 2025, 9(5), 351; https://doi.org/10.3390/drones9050351 - 5 May 2025
Viewed by 879
Abstract
The unmanned aerial vehicle (UAV) industry is developing rapidly, and UAV applications are becoming increasingly widespread. As the barrier to operating UAVs falls, uncontrolled UAV flights pose safety hazards, and research on anti-UAV technology has become imperative in response to the risks of unauthorized operation. This study proposes an improved sound feature extraction method that exploits the frequency distribution of UAV sounds. Analysis of UAV sound spectrograms showed that the classic Mel Frequency Cepstral Coefficients (MFCC) feature extraction method does not match the frequency bands of UAV sounds. Building on the MFCC feature extraction framework, an improved frequency band feature extraction method is therefore proposed. This method replaces the Mel filter in the classic algorithm with a piecewise linear function whose slope is the frequency band weight, which effectively suppresses the influence of low- and high-frequency noise and focuses on the frequency bands characteristic of UAV sounds. Actual UAV flight sounds were collected, the UAV sound feature matrix was extracted with the frequency band feature extraction method, and the features were classified and recognized using a Convolutional Neural Network (CNN). The experimental results show that the frequency band feature extraction method achieves better recognition than the classic MFCC feature extraction method. Full article
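The band-weighting idea can be sketched as an MFCC-style pipeline in which the mel filterbank is replaced by a piecewise-linear frequency weighting; the band edges, weights, and frame settings below are invented placeholders, since the paper derives its weights from measured UAV frequency distributions.

```python
import numpy as np
import librosa
from scipy.fft import dct

def band_weighted_cepstrum(y, sr, bands, n_coeff=13, n_fft=1024, hop=512):
    """MFCC-like features with a piecewise-linear band weighting instead of mel filters.
    bands: list of (f_lo, f_hi, weight) describing how strongly each band contributes."""
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)

    # weight rises linearly across each emphasized band and stays near zero elsewhere
    xp, fp = [0.0], [0.0]
    for f_lo, f_hi, w in bands:
        xp += [f_lo, f_hi]
        fp += [0.0, w]
    xp.append(sr / 2)
    fp.append(0.0)
    weights = np.interp(freqs, xp, fp)

    weighted = np.log(weights[:, None] * power + 1e-10)
    return dct(weighted, type=2, axis=0, norm="ortho")[:n_coeff]

# e.g., emphasize assumed UAV rotor bands and de-emphasize low/high-frequency noise:
# feats = band_weighted_cepstrum(y, sr, bands=[(200, 1800, 1.0), (2200, 6000, 0.5)])
```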

18 pages, 4885 KiB  
Article
Decoding Poultry Welfare from Sound—A Machine Learning Framework for Non-Invasive Acoustic Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(9), 2912; https://doi.org/10.3390/s25092912 - 5 May 2025
Cited by 2 | Viewed by 1436
Abstract
Acoustic monitoring presents a promising, non-invasive modality for assessing animal welfare in precision livestock farming. In poultry, vocalizations encode biologically relevant cues linked to health status, behavioral states, and environmental stress. This study proposes an integrated analytical framework that combines signal-level statistical analysis with machine learning and deep learning classifiers to interpret chicken vocalizations in a welfare assessment context. The framework was evaluated using three complementary datasets encompassing health-related vocalizations, behavioral call types, and stress-induced acoustic responses. The pipeline employs a multistage process comprising high-fidelity signal acquisition, feature extraction (e.g., mel-frequency cepstral coefficients, spectral contrast, zero-crossing rate), and classification using models including Random Forest, HistGradientBoosting, CatBoost, TabNet, and LSTM. Feature importance analysis and statistical tests (e.g., t-tests, correlation metrics) confirmed that specific MFCC bands and spectral descriptors were significantly associated with welfare indicators. LSTM-based temporal modeling revealed distinct acoustic trajectories under visual and auditory stress, supporting the presence of habituation and stressor-specific vocal adaptations over time. Model performance, validated through stratified cross-validation and multiple statistical metrics (e.g., F1-score, Matthews correlation coefficient), demonstrated high classification accuracy and generalizability. Importantly, the approach emphasizes model interpretability, facilitating alignment with known physiological and behavioral processes in poultry. The findings underscore the potential of acoustic sensing and interpretable AI as scalable, biologically grounded tools for real-time poultry welfare monitoring, contributing to the advancement of sustainable and ethical livestock production systems. Full article
(This article belongs to the Special Issue Sensors in 2025)
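A condensed sketch of the feature-extraction and classification stages described above; the per-clip mean summary, class names, and forest size are illustrative assumptions.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(y, sr):
    """Per-clip acoustic summary: MFCCs, spectral contrast, zero-crossing rate."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([mfcc.mean(axis=1), contrast.mean(axis=1), zcr.mean(axis=1)])

# clips: list of (waveform, sample_rate); labels: e.g. "healthy", "distress", "alarm"
# X = np.array([clip_features(y, sr) for y, sr in clips])
# model = RandomForestClassifier(n_estimators=200).fit(X, labels)
```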

20 pages, 2817 KiB  
Article
Escalate Prognosis of Parkinson’s Disease Employing Wavelet Features and Artificial Intelligence from Vowel Phonation
by Rumana Islam and Mohammed Tarique
BioMedInformatics 2025, 5(2), 23; https://doi.org/10.3390/biomedinformatics5020023 - 30 Apr 2025
Viewed by 1416
Abstract
Background: This work presents an artificial intelligence-based algorithm for detecting Parkinson’s disease (PD) from voice signals. The detection of PD at pre-symptomatic stages is imperative to slow disease progression. Speech signal processing-based PD detection can play a crucial role here, as it has been reported in the literature that PD affects the voice quality of patients at an early stage. Hence, speech samples can be used as biomarkers of PD, provided that suitable voice features and artificial intelligence algorithms are employed. Methods: Advanced signal-processing techniques are used to extract audio features from the sustained vowel ‘/a/’ sound. The extracted audio features include baseline features, intensities, formant frequencies, bandwidths, vocal fold parameters, and Mel-frequency cepstral coefficients (MFCCs) to form a feature vector. Then, this feature vector is further enriched by including wavelet-based features to form the second feature vector. For classification purposes, two popular machine learning models, namely, support vector machine (SVM) and k-nearest neighbors (kNN), are trained to distinguish patients with PD. Results: The results demonstrate that the inclusion of wavelet-based voice features enhances the performance of both the SVM and kNN models for PD detection. However, kNN provides better accuracy, detection speed, training time, and misclassification cost than SVM. Conclusions: This work concludes that wavelet-based voice features are important for detecting neurodegenerative diseases like PD. These wavelet features can enhance the classification performance of machine learning models. This work also concludes that kNN is preferable to SVM for the investigated voice features, regardless of whether the wavelet features are included. Full article
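The wavelet enrichment can be sketched with PyWavelets: subband log-energies from a multilevel decomposition of the sustained vowel are appended to the baseline feature vector before training kNN and SVM. The wavelet family, decomposition level, and energy summary are assumptions, not the paper's exact wavelet features.

```python
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def wavelet_features(signal, wavelet="db4", level=5):
    """Log-energy of each subband from a multilevel wavelet decomposition."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

# baseline: MFCCs, formants, vocal-fold measures, ...; wavelet features appended:
# X = np.hstack([baseline_features, np.array([wavelet_features(v) for v in vowels])])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# svm = SVC(kernel="rbf").fit(X_train, y_train)
```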