Search Results (188)

Search Parameters:
Keywords = Mel-Frequency Cepstral Coefficient (MFCC)

16 pages, 1530 KiB  
Article
Enhanced Respiratory Sound Classification Using Deep Learning and Multi-Channel Auscultation
by Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim and Su Hwan Lee
J. Clin. Med. 2025, 14(15), 5437; https://doi.org/10.3390/jcm14155437 - 1 Aug 2025
Viewed by 157
Abstract
Background/Objectives: Identifying and classifying abnormal lung sounds is essential for diagnosing patients with respiratory disorders. In particular, the simultaneous recording of auscultation signals from multiple clinically relevant positions offers greater diagnostic potential compared to traditional single-channel measurements. This study aims to improve the accuracy of respiratory sound classification by leveraging multichannel signals and capturing positional characteristics from multiple sites in the same patient. Methods: We evaluated the performance of respiratory sound classification using multichannel lung sound data with a deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM), based on mel-frequency cepstral coefficients (MFCCs). We analyzed the impact of the number and placement of channels on classification performance. Results: The results demonstrated that using four-channel recordings improved accuracy, sensitivity, specificity, precision, and F1-score by approximately 1.11, 1.15, 1.05, 1.08, and 1.13 times, respectively, compared to using three, two, or single-channel recordings. Conclusions: This study confirms that multichannel data capture a richer set of features corresponding to various respiratory sound characteristics, leading to significantly improved classification performance. The proposed method holds promise for enhancing sound classification accuracy not only in clinical applications but also in broader domains such as speech and audio processing. Full article
(This article belongs to the Section Respiratory Medicine)
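For context, here is a minimal sketch of a CNN-LSTM classifier operating on stacked per-channel MFCCs, in the spirit of the pipeline described above; the channel count, MFCC size, and layer widths are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CnnLstmClassifier(nn.Module):
    """CNN front end over (channels x MFCC x time), LSTM over time."""
    def __init__(self, n_channels=4, n_mfcc=20, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(input_size=64 * (n_mfcc // 4), hidden_size=128,
                            batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):                      # x: (batch, 4, n_mfcc, frames)
        f = self.cnn(x)                        # (batch, 64, n_mfcc//4, frames//4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # (batch, frames//4, features)
        _, (h, _) = self.lstm(f)               # final hidden state
        return self.head(h[-1])                # (batch, n_classes)

model = CnnLstmClassifier()
dummy = torch.randn(2, 4, 20, 128)             # two fake 4-channel MFCC stacks
print(model(dummy).shape)                       # torch.Size([2, 4])
```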

13 pages, 769 KiB  
Article
A Novel You Only Listen Once (YOLO) Deep Learning Model for Automatic Prominent Bowel Sounds Detection: Feasibility Study in Healthy Subjects
by Rohan Kalahasty, Gayathri Yerrapragada, Jieun Lee, Keerthy Gopalakrishnan, Avneet Kaur, Pratyusha Muddaloor, Divyanshi Sood, Charmy Parikh, Jay Gohri, Gianeshwaree Alias Rachna Panjwani, Naghmeh Asadimanesh, Rabiah Aslam Ansari, Swetha Rapolu, Poonguzhali Elangovan, Shiva Sankari Karuppiah, Vijaya M. Dasari, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
Sensors 2025, 25(15), 4735; https://doi.org/10.3390/s25154735 - 31 Jul 2025
Viewed by 283
Abstract
Accurate diagnosis of gastrointestinal (GI) diseases typically requires invasive procedures or imaging studies that pose the risk of various post-procedural complications or involve radiation exposure. Bowel sounds (BSs), though typically described during a GI-focused physical exam, are highly inaccurate and variable, with low clinical value in diagnosis. Interpretation of the acoustic characteristics of BSs, i.e., using a phonoenterogram (PEG), may aid in diagnosing various GI conditions non-invasively. Use of artificial intelligence (AI) and improvements in computational analysis can enhance the use of PEGs in different GI diseases and lead to a non-invasive, cost-effective diagnostic modality that has not been explored before. The purpose of this work was to develop an automated AI model, You Only Listen Once (YOLO), to detect prominent bowel sounds that can enable real-time analysis for future GI disease detection and diagnosis. A total of 110 two-minute PEGs sampled at 44.1 kHz were recorded using the Eko DUO® stethoscope from eight healthy volunteers at two locations, namely, the left upper quadrant (LUQ) and right lower quadrant (RLQ), after IRB approval. The datasets were annotated by trained physicians, categorizing BSs as prominent or obscure using version 1.7 of Label Studio Software®. Each BS recording was split into 375 ms segments with 200 ms overlap for real-time BS detection. Each segment was binned based on whether it contained a prominent BS, resulting in a dataset of 36,149 non-prominent segments and 6435 prominent segments. Our dataset was divided into training, validation, and test sets (60/20/20% split). A 1D-CNN augmented transformer was trained to classify these segments via the input of Mel-frequency cepstral coefficients. The developed AI model achieved an area under the receiver operating characteristic (ROC) curve of 0.92, accuracy of 86.6%, precision of 86.85%, and recall of 86.08%. This shows that the 1D-CNN augmented transformer with Mel-frequency cepstral coefficients achieved creditable performance metrics, signifying the YOLO model’s capability to classify prominent bowel sounds that can be further analyzed for various GI diseases. This proof-of-concept study in healthy volunteers demonstrates that automated BS detection can pave the way for developing more intuitive and efficient AI-PEG devices that can be trained and utilized to diagnose various GI conditions. To ensure the robustness and generalizability of these findings, further investigations encompassing a broader cohort, inclusive of both healthy and disease states, are needed. Full article
(This article belongs to the Special Issue Biomedical Signals, Images and Healthcare Data Analysis: 2nd Edition)
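The windowing step described above (375 ms segments with 200 ms overlap at 44.1 kHz, each reduced to an MFCC matrix) can be sketched as follows; the MFCC settings are assumptions for illustration.

```python
import numpy as np
import librosa

SR = 44_100
SEG = int(0.375 * SR)          # 375 ms segment length in samples
HOP = int(0.175 * SR)          # 200 ms overlap -> 175 ms hop

def segment_mfccs(y, sr=SR, n_mfcc=13):
    """Yield one MFCC matrix per 375 ms segment of a phonoenterogram."""
    for start in range(0, len(y) - SEG + 1, HOP):
        seg = y[start:start + SEG]
        yield librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=n_mfcc)

# Two minutes of synthetic audio stand in for one PEG recording.
y = np.random.randn(2 * 60 * SR).astype(np.float32)
feats = list(segment_mfccs(y))
print(len(feats), feats[0].shape)   # ~684 segments, each (13, frames)
```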

23 pages, 3741 KiB  
Article
Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling
by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez
Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025
Viewed by 290
Abstract
Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions—studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We perform a detailed preprocessing and feature extraction procedure, evaluating multiple configurations and validating their applicability and effectiveness in improving the obtained results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel-Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with the corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year MAE for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research. Full article
(This article belongs to the Section Computational Engineering)
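The two feature types compared above can be computed side by side as shown below (log-Mel spectrograms for the CNNs, MFCCs for the LSTMs); the bin counts here are placeholders, since the paper tunes the Mel-bin count per corpus.

```python
import numpy as np
import librosa

sr = 16_000
y = np.random.randn(3 * sr).astype(np.float32)         # 3 s of dummy speech

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)                      # CNN input, shape (64, T)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)      # LSTM input, shape (20, T)
print(log_mel.shape, mfcc.shape)
```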

10 pages, 857 KiB  
Proceeding Paper
Implementation of a Prototype-Based Parkinson’s Disease Detection System Using a RISC-V Processor
by Krishna Dharavathu, Pavan Kumar Sankula, Uma Maheswari Vullanki, Subhan Khan Mohammad, Sai Priya Kesapatnapu and Sameer Shaik
Eng. Proc. 2025, 87(1), 97; https://doi.org/10.3390/engproc2025087097 - 21 Jul 2025
Viewed by 206
Abstract
In the wide range of human diseases, Parkinson’s disease (PD) has a high incidence, according to a recent survey by the World Health Organization (WHO). According to WHO records, this chronic disease has affected approximately 10 million people worldwide. Patients who do not receive an early diagnosis may develop an incurable neurological disorder. PD is a degenerative disorder of the brain, characterized by impairment of the nigrostriatal system and accompanied by a wide range of motor and non-motor symptoms. In this work, PD is detected from patients’ speech signals using a reduced instruction set computing, fifth version (RISC-V) processor. The RISC-V microcontroller unit (MCU) was designed for a voice-controlled human-machine interface (HMI). Speech signals, which are degraded by the impairment of the nigrostriatal system, are digitized and processed with signal processing and feature extraction methods, and a range of classifier modules then classifies them as normal or abnormal to identify PD. We use Matrix Laboratory (MATLAB R2021a_v9.10.0.1602886) to analyze the data, develop algorithms, create modules, and develop the RISC-V processor for embedded implementation. Machine learning (ML) techniques are also used to extract features such as pitch, tremor, and Mel-frequency cepstral coefficients (MFCCs). Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
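The paper implements its pipeline in MATLAB; purely as an illustration, a rough Python equivalent of the feature-extraction step (pitch statistics plus MFCCs) might look like the sketch below, with the feature definitions as assumptions.

```python
import numpy as np
import librosa

def speech_features(y, sr):
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    pitch_mean = float(np.mean(f0)) if f0.size else 0.0
    pitch_var = float(np.std(f0)) if f0.size else 0.0      # crude tremor proxy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([[pitch_mean, pitch_var], mfcc])

y = np.random.randn(2 * 16_000).astype(np.float32)         # 2 s dummy recording
print(speech_features(y, 16_000).shape)                     # (15,) per recording
```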

20 pages, 1798 KiB  
Article
An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
by Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva and Massimo Villari
Big Data Cogn. Comput. 2025, 9(7), 188; https://doi.org/10.3390/bdcc9070188 - 17 Jul 2025
Viewed by 481
Abstract
Nowadays, the Metaverse is facing many challenges. In this context, Virtual Reality (VR) applications allowing voice-based human–3D object interactions are limited due to the current hardware/software limitations. In fact, adopting Automated Speech Recognition (ASR) systems to interact with 3D objects in VR applications through users’ voice commands presents significant challenges due to the hardware and software limitations of headset devices. This paper aims to bridge this gap by proposing a methodology to address these issues. In particular, starting from a Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm able to capture the unique characteristics of the user’s voice, we pass it as input to a Convolutional Neural Network (CNN) model. After that, in order to integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, we converted it into an Open Neural Network Exchange (ONNX) format, i.e., a Machine Learning (ML) interoperability open standard format. The proposed system demonstrates good performance and represents a foundation for the development of user-centric, effective computing systems, enhancing accessibility to VR environments through voice-based commands. Experiments demonstrate that a native CNN model developed through TensorFlow presents comparable performances with respect to the corresponding CNN model converted into the ONNX format, paving the way towards the development of VR applications running in headsets controlled through the user’s voice. Full article
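As a stand-in for the TensorFlow-to-ONNX conversion described above, the sketch below exports a small PyTorch CNN over MFCC inputs to the ONNX format; the architecture and input shape are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 10),                   # e.g. ten voice commands
)

dummy = torch.randn(1, 1, 40, 100)       # (batch, 1, n_mfcc, frames)
torch.onnx.export(cnn, dummy, "voice_command_cnn.onnx",
                  input_names=["mfcc"], output_names=["logits"])
# The exported .onnx file can then be loaded by an ONNX runtime embedded in
# the VR application running on the standalone headset.
```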

19 pages, 1039 KiB  
Article
Prediction of Parkinson Disease Using Long-Term, Short-Term Acoustic Features Based on Machine Learning
by Mehdi Rashidi, Serena Arima, Andrea Claudio Stetco, Chiara Coppola, Debora Musarò, Marco Greco, Marina Damato, Filomena My, Angela Lupo, Marta Lorenzo, Antonio Danieli, Giuseppe Maruccio, Alberto Argentiero, Andrea Buccoliero, Marcello Dorian Donzella and Michele Maffia
Brain Sci. 2025, 15(7), 739; https://doi.org/10.3390/brainsci15070739 - 10 Jul 2025
Viewed by 516
Abstract
Background: Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease, affecting countless individuals worldwide. PD is characterized by the onset of a marked motor symptomatology in association with several non-motor manifestations. The clinical phase of the disease is usually preceded by a long prodromal phase, devoid of overt motor symptomatology but often showing conditions such as sleep disturbance, constipation, anosmia, and phonatory changes. To date, speech analysis appears to be a promising digital biomarker, capable of anticipating clinical PD by as much as 10 years before onset, as well as serving as a useful prognostic tool for patient follow-up. For this reason, voice is a candidate non-invasive method for distinguishing PD patients from healthy subjects (HS). Methods: We conducted a cross-sectional study to analyze voice impairment. A dataset comprising 81 voice samples (41 from healthy individuals and 40 from PD patients) was utilized to train and evaluate common machine learning (ML) models using various types of features, including long-term features (jitter, shimmer, and cepstral peak prominence (CPP)), short-term features (Mel-frequency cepstral coefficients (MFCCs)), and non-standard measurements (pitch period entropy (PPE) and recurrence period density entropy (RPDE)). The study adopted multiple ML algorithms, including random forest (RF), K-nearest neighbors (KNN), decision tree (DT), naïve Bayes (NB), support vector machines (SVM), and logistic regression (LR). A cross-validation technique was applied to ensure the reliability of performance metrics on the train and test subsets. These metrics (accuracy, recall, and precision) help determine the most effective models for distinguishing PD from healthy subjects. Results: Among all the algorithms used in this research, random forest (RF) was the best-performing model, achieving an accuracy of 82.72% with a ROC-AUC score of 89.65%. Although other models, such as the support vector machine (SVM), could be considered, with an accuracy of 75.29% and a ROC-AUC score of 82.63%, RF was by far the best when evaluated across all metrics. The K-nearest neighbors (KNN) and decision tree (DT) models performed the worst. Notably, by combining a comprehensive set of long-term, short-term, and non-standard acoustic features, unlike previous studies that typically focused on only a subset, our study achieved higher predictive performance, offering a more robust model for early PD detection. Conclusions: This study highlights the potential of combining advanced acoustic analysis with ML algorithms to develop non-invasive and reliable tools for early PD detection, offering substantial benefits for the healthcare sector. Full article
(This article belongs to the Section Neurodegenerative Diseases)
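The evaluation setup (a random forest cross-validated on a table of long-term, short-term, and non-standard acoustic features) follows a standard pattern; a schematic version with a synthetic feature matrix is shown below.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(81, 30))        # 81 voice samples x 30 acoustic features
y = np.array([0] * 41 + [1] * 40)    # 41 healthy subjects, 40 PD patients

clf = RandomForestClassifier(n_estimators=300, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"accuracy {acc.mean():.3f}, ROC-AUC {auc.mean():.3f}")
```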

18 pages, 3035 KiB  
Article
Data-Driven Modeling and Enhancement of Surface Quality in Milling Based on Sound Signals
by Paschalis Charalampous
J. Manuf. Mater. Process. 2025, 9(7), 231; https://doi.org/10.3390/jmmp9070231 - 4 Jul 2025
Viewed by 383
Abstract
The present study introduces an AI (Artificial Intelligence) framework for surface roughness assessment in milling operations through sound signal processing. As industrial demands escalate for in-process quality control solutions, the proposed system leverages audio data to estimate surface finish states without interrupting production. In order to address this, a novel classification approach was developed that maps audio waveform data into predictive indicators of surface quality. In particular, an experimental dataset was employed consisting of sound signals that were captured during milling procedures applying various machining conditions, where each signal was labeled with a corresponding roughness quality obtained via offline metrology. The formulated classification pipeline commences with audio acquisition, resampling, and normalization to ensure consistency across the dataset. These signals are then transformed into Mel-Frequency Cepstral Coefficients (MFCCs), which yield a compact time–frequency representation optimized for human auditory perception. Next, several AI algorithms were trained in order to classify these MFCCs into predefined surface roughness categories. Finally, the results of the work demonstrate that sound signals could contain sufficient discriminatory information enabling a reliable classification of surface finish quality. This approach not only facilitates in-process monitoring but also provides a foundation for intelligent manufacturing systems capable of real-time quality assurance. Full article
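A minimal sketch of the described pipeline (resample and normalize each clip, summarize it with MFCC statistics, classify into roughness categories) is given below; the sampling rates, feature summary, and classifier are assumptions, and the data are synthetic stand-ins.

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def clip_features(y, sr, target_sr=16_000, n_mfcc=13):
    y = librosa.resample(y, orig_sr=sr, target_sr=target_sr)
    y = y / (np.max(np.abs(y)) + 1e-9)                        # peak normalization
    m = librosa.feature.mfcc(y=y, sr=target_sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])     # 26-dim summary

# Synthetic stand-ins for labelled milling recordings (three roughness classes).
rng = np.random.default_rng(1)
X = np.stack([clip_features(rng.normal(size=44_100), 44_100) for _ in range(30)])
y = rng.integers(0, 3, size=30)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)
print(clf.predict(X[:5]))
```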

22 pages, 4293 KiB  
Article
Speech-Based Parkinson’s Detection Using Pre-Trained Self-Supervised Automatic Speech Recognition (ASR) Models and Supervised Contrastive Learning
by Hadi Sedigh Malekroodi, Nuwan Madusanka, Byeong-il Lee and Myunggi Yi
Bioengineering 2025, 12(7), 728; https://doi.org/10.3390/bioengineering12070728 - 1 Jul 2025
Viewed by 842
Abstract
Diagnosing Parkinson’s disease (PD) through speech analysis is a promising area of research, as speech impairments are often one of the early signs of the disease. This study investigates the efficacy of fine-tuning pre-trained Automatic Speech Recognition (ASR) models, specifically Wav2Vec 2.0 and HuBERT, for PD detection using transfer learning. These models, pre-trained on large unlabeled datasets, can be capable of learning rich speech representations that capture acoustic markers of PD. The study also proposes the integration of a supervised contrastive (SupCon) learning approach to enhance the models’ ability to distinguish PD-specific features. Additionally, the proposed ASR-based features were compared against two common acoustic feature sets: mel-frequency cepstral coefficients (MFCCs) and the extended Geneva minimalistic acoustic parameter set (eGeMAPS) as a baseline. We also employed a gradient-based method, Grad-CAM, to visualize important speech regions contributing to the models’ predictions. The experiments, conducted using the NeuroVoz dataset, demonstrated that features extracted from the pre-trained ASR models exhibited superior performance compared to the baseline features. The results also reveal that the method integrating SupCon consistently outperforms traditional cross-entropy (CE)-based models. Wav2Vec 2.0 and HuBERT with SupCon achieved the highest F1 scores of 90.0% and 88.99%, respectively. Additionally, their AUC scores in the ROC analysis surpassed those of the CE models, which had comparatively lower AUCs, ranging from 0.84 to 0.89. These results highlight the potential of ASR-based models as scalable, non-invasive tools for diagnosing and monitoring PD, offering a promising avenue for the early detection and management of this debilitating condition. Full article
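The feature-extraction stage can be sketched as below: pooled Wav2Vec 2.0 embeddings for a clip, which would then feed a classification head trained with cross-entropy or a supervised contrastive objective. The checkpoint and mean pooling are assumptions, not the paper's exact configuration.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-base"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

speech = np.random.randn(3 * 16_000).astype(np.float32)      # 3 s at 16 kHz
inputs = extractor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    hidden = model(inputs.input_values).last_hidden_state    # (1, T, 768)
embedding = hidden.mean(dim=1)                                # (1, 768) clip vector
print(embedding.shape)
```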

28 pages, 1634 KiB  
Review
AI-Powered Vocalization Analysis in Poultry: Systematic Review of Health, Behavior, and Welfare Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(13), 4058; https://doi.org/10.3390/s25134058 - 29 Jun 2025
Viewed by 1006
Abstract
Artificial intelligence and bioacoustics represent a paradigm shift in non-invasive poultry welfare monitoring through advanced vocalization analysis. This comprehensive systematic review critically examines the transformative evolution from traditional acoustic feature extraction—including Mel-Frequency Cepstral Coefficients (MFCCs), spectral entropy, and spectrograms—to cutting-edge deep learning architectures encompassing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, attention mechanisms, and groundbreaking self-supervised models such as wav2vec2 and Whisper. The investigation reveals compelling evidence for edge computing deployment via TinyML frameworks, addressing critical scalability challenges in commercial poultry environments characterized by acoustic complexity and computational constraints. Advanced applications spanning emotion recognition, disease detection, and behavioral phenotyping demonstrate unprecedented potential for real-time welfare assessment. Through rigorous bibliometric co-occurrence mapping and thematic clustering analysis, this review exposes persistent methodological bottlenecks: dataset standardization deficits, evaluation protocol inconsistencies, and algorithmic interpretability limitations. Critical knowledge gaps emerge in cross-species domain generalization and contextual acoustic adaptation, demanding urgent research prioritization. The findings underscore explainable AI integration as essential for establishing stakeholder trust and regulatory compliance in automated welfare monitoring systems. This synthesis positions acoustic AI as a cornerstone technology enabling ethical, transparent, and scientifically robust precision livestock farming, bridging computational innovation with biological relevance for sustainable poultry production systems. Future research directions emphasize multi-modal sensor integration, standardized evaluation frameworks, and domain-adaptive models capable of generalizing across diverse poultry breeds, housing conditions, and environmental contexts while maintaining interpretability for practical farm deployment. Full article
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)

26 pages, 1521 KiB  
Article
AI-Based Classification of Pediatric Breath Sounds: Toward a Tool for Early Respiratory Screening
by Lichuan Liu, Wei Li and Beth Moxley
Appl. Sci. 2025, 15(13), 7145; https://doi.org/10.3390/app15137145 - 25 Jun 2025
Viewed by 446
Abstract
Context: Respiratory morbidity is a leading cause of children’s consultations with general practitioners. Auscultation, the act of listening to breath sounds, is a crucial diagnostic method for respiratory system diseases. Problem: Parents and caregivers often lack the necessary knowledge and experience to identify subtle differences in children’s breath sounds. Furthermore, obtaining reliable feedback from young children about their physical condition is challenging. Methods: A human–artificial intelligence (AI) tool is an essential component for screening and monitoring young children’s respiratory diseases. Using clinical data to design and validate the proposed approaches, we propose novel methods for recognizing and classifying children’s breath sounds. Different breath sound signals were analyzed in the time domain, in the frequency domain, and using spectrogram representations. Breath sound detection and segmentation were performed using digital signal processing techniques. Multiple features—including Mel-Frequency Cepstral Coefficients (MFCCs), Linear Prediction Coefficients (LPCs), Linear Prediction Cepstral Coefficients (LPCCs), spectral entropy, and Dynamic Linear Prediction Coefficients (DLPCs)—were extracted to capture both time and frequency characteristics. These features were then fed into various classifiers, including K-Nearest Neighbor (KNN), artificial neural networks (ANNs), hidden Markov models (HMMs), logistic regression, and decision trees, for recognition and classification. Main Findings: Experimental results from 120 infants and preschoolers (2 months to 6 years), comprising 30 with asthma, 30 with croup, 30 with pneumonia, and 30 healthy controls, verified the performance of the proposed approaches. Conclusions: The proposed AI system provides a real-time diagnostic platform to improve clinical respiratory management and outcomes in young children, thereby reducing healthcare costs. Future work exploring additional respiratory diseases is warranted. Full article
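Three of the feature families listed above (MFCCs, LPCs, spectral entropy) and one of the classifiers (KNN) can be sketched as follows; the orders, bin counts, and entropy estimate are illustrative assumptions, and the data are synthetic.

```python
import numpy as np
import librosa
from scipy.stats import entropy
from sklearn.neighbors import KNeighborsClassifier

def breath_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    lpc = librosa.lpc(y, order=12)[1:]                    # drop the leading 1.0
    spec = np.abs(librosa.stft(y)) ** 2
    power = spec.mean(axis=1)
    spec_entropy = entropy(power / power.sum())
    return np.concatenate([mfcc, lpc, [spec_entropy]])

rng = np.random.default_rng(2)
X = np.stack([breath_features(rng.normal(size=16_000), 16_000) for _ in range(20)])
labels = rng.integers(0, 4, size=20)       # asthma / croup / pneumonia / normal
print(KNeighborsClassifier(n_neighbors=3).fit(X, labels).predict(X[:4]))
```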

15 pages, 1458 KiB  
Article
Photoplethysmography Feature Extraction for Non-Invasive Glucose Estimation by Means of MFCC and Machine Learning Techniques
by Christian Salamea-Palacios, Melissa Montalvo-López, Raquel Orellana-Peralta and Javier Viñanzaca-Figueroa
Biosensors 2025, 15(7), 408; https://doi.org/10.3390/bios15070408 - 24 Jun 2025
Viewed by 518
Abstract
Diabetes Mellitus is considered one of the most widespread diseases in the world. Traditional glucose monitoring devices carry discomfort and risks associated with the frequent extraction of blood from users. The present article proposes a noninvasive glucose estimation system based on the application of Mel Frequency Cepstral Coefficients (MFCCs) for the characterization of photoplethysmographic signals (PPG). Two variants of the MFCC feature extraction methods are evaluated along with three machine learning techniques for the development of an effective regression function for the estimation of glucose concentration. A comparison between the performance of the algorithms revealed that the best combination achieved a mean absolute error of 9.85 mg/dL and a correlation of 0.94 between the estimated concentration and the real glucose values. Similarly, 99.53% of the validation samples were distributed within zones A and B of the Clarke Error Grid Analysis. The proposed system achieves levels of correlation comparable to analogous technologies that require earlier calibration for its operation, which indicates a strong potential for the future use of the algorithm as an alternative to invasive monitoring devices. Full article
(This article belongs to the Section Wearable Biosensors)
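The overall idea (summarize a PPG window with MFCC-style features, then regress glucose from them) can be sketched as follows; the PPG sampling rate, window length, and regressor are assumptions, and the data are synthetic.

```python
import numpy as np
import librosa
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

SR = 125                                     # assumed PPG sampling rate (Hz)

def ppg_mfcc(window):
    m = librosa.feature.mfcc(y=window.astype(np.float32), sr=SR,
                             n_mfcc=10, n_fft=256, hop_length=64, n_mels=20)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

rng = np.random.default_rng(3)
windows = rng.normal(size=(200, 10 * SR))    # 200 ten-second PPG windows
glucose = rng.uniform(70, 180, size=200)     # reference glucose values (mg/dL)

X = np.stack([ppg_mfcc(w) for w in windows])
model = GradientBoostingRegressor().fit(X[:150], glucose[:150])
pred = model.predict(X[150:])
print("MAE (mg/dL):", mean_absolute_error(glucose[150:], pred))
print("correlation:", np.corrcoef(glucose[150:], pred)[0, 1])
```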

22 pages, 5083 KiB  
Article
Intelligent Mobile-Assisted Language Learning: A Deep Learning Approach for Pronunciation Analysis and Personalized Feedback
by Fengqin Liu, Korawit Orkphol, Natthapon Pannurat, Thanat Sooknuan, Thanin Muangpool, Sanya Kuankid and Montri Phothisonothai
Inventions 2025, 10(4), 46; https://doi.org/10.3390/inventions10040046 - 24 Jun 2025
Viewed by 644
Abstract
This paper introduces an innovative mobile-assisted language-learning (MALL) system that harnesses deep learning technology to analyze pronunciation patterns and deliver real-time, personalized feedback. Drawing inspiration from how the human brain processes speech through neural pathways, our system analyzes multiple speech features using spectrograms, mel-frequency cepstral coefficients (MFCCs), and formant frequencies in a manner that mirrors the auditory cortex’s interpretation of sound. The core of our approach utilizes a convolutional neural network (CNN) to classify pronunciation patterns from user-recorded speech. To enhance the assessment accuracy and provide nuanced feedback, we integrated a fuzzy inference system (FIS) that helps learners identify and correct specific pronunciation errors. The experimental results demonstrate that our multi-feature model achieved 82.41% to 90.52% accuracies in accent classification across diverse linguistic contexts. The user testing revealed statistically significant improvements in pronunciation skills, where learners showed a 5–20% enhancement in accuracy after using the system. The proposed MALL system offers a portable, accessible solution for language learners while establishing a foundation for future research in multilingual functionality and mobile platform optimization. By combining advanced speech analysis with intuitive feedback mechanisms, this system addresses a critical challenge in language acquisition and promotes more effective self-directed learning. Full article
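One of the features mentioned above, formant frequencies, is commonly estimated from the roots of an LPC polynomial; a simplified sketch follows (the LPC order and the lack of pre-emphasis or voiced-frame selection are simplifications, and the paper's exact formant tracker is not described).

```python
import numpy as np
import librosa

def estimate_formants(y, sr, order=12, n_formants=3):
    a = librosa.lpc(y, order=order)              # LPC polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[:n_formants]                    # lowest resonances ~ F1..F3

sr = 16_000
t = np.arange(sr) / sr                            # 1 s vowel-like test signal
y = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
print(estimate_formants(y.astype(np.float32), sr))
```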

15 pages, 49760 KiB  
Article
Rapid Diagnosis of Distributed Acoustic Sensing Vibration Signals Using Mel-Frequency Cepstral Coefficients and Liquid Neural Networks
by Haitao Liu, Yunfan Xu, Yuefeng Qi, Haosong Yang and Weihong Bi
Sensors 2025, 25(10), 3090; https://doi.org/10.3390/s25103090 - 13 May 2025
Cited by 1 | Viewed by 604
Abstract
Distributed Acoustic Sensing (DAS) systems face increasing challenges in massive data processing and real-time fault diagnosis due to the growing complexity of industrial environments and data volume. To address these issues, an end-to-end diagnostic framework is developed, integrating Mel-Frequency Cepstral Coefficients (MFCCs) for high-efficiency signal compression and Liquid Neural Networks (LNNs) for lightweight, real-time classification. The MFCC algorithm, originally used in speech processing, is adapted to extract key features from DAS vibration signals, achieving compression ratios of 60–100× without significant information loss. LNNs’ dynamic topology and sparse activation enable high accuracy with extremely low latency and minimal computational cost, making it highly suitable for edge deployment. The proposed framework was validated both in simulated environments and on a real-world conveyor belt system at Qinhuangdao Port, where it achieved 100% accuracy across four vibration modes over 14 weeks of operation. Comparative experiments show that LNNs outperform traditional models such as 1D-CNN and LSTMs in terms of accuracy, inference speed, and model size. The proposed MFCC-LNN pipeline also demonstrates strong cross-domain generalization capabilities in pipeline monitoring, seismic detection, and speech signal processing. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
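A quick back-of-the-envelope check of the quoted 60-100x compression compares the raw samples in a one-second frame with the size of its MFCC matrix; the DAS sampling rate and MFCC settings below are assumptions chosen only to illustrate the ratio.

```python
import numpy as np
import librosa

sr = 10_000                                    # assumed per-channel DAS rate (Hz)
y = np.random.randn(sr).astype(np.float32)     # one second of vibration signal

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=1024, hop_length=1024)
ratio = y.size / mfcc.size                     # raw samples vs. MFCC values
print(f"compression ratio ~{ratio:.0f}x")      # ~77x with these settings
```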

33 pages, 4811 KiB  
Article
Enhancing the Prediction of Episodes of Aggression in Patients with Dementia Using Audio-Based Detection: A Multimodal Late Fusion Approach with a Meta-Classifier
by Ioannis Galanakis, Rigas Filippos Soldatos, Nikitas Karanikolas, Athanasios Voulodimos, Ioannis Voyiatzis and Maria Samarakou
Appl. Sci. 2025, 15(10), 5351; https://doi.org/10.3390/app15105351 - 10 May 2025
Cited by 1 | Viewed by 570
Abstract
This study presents an enhancement of our previous work on predicting aggressive outbursts in dementia patients, integrating audio-based violence detection into our earlier visual-based detection of aggressive body movements. By combining audio and visual information, we aim to further enhance the model’s capabilities and make it more suitable for real-world applications. This work utilizes an audio dataset containing segments that capture vocal expressions during aggressive and non-aggressive scenarios. Various noise-filtering techniques were applied to the audio files, using Mel-frequency cepstral coefficients (MFCCs), frequency filtering, and speech prosody to extract clear information from the audio features. Furthermore, we apply a late-fusion rule that merges the predictions of the two models in a trained meta-classifier, in order to quantify the improvement gained by integrating audio and to move toward a more precise, multimodal approach to detecting and predicting aggressive outbursts in patients suffering from dementia. The analysis of correlations in our multimodal approach suggests that the accuracy of the early-detection models is improved, providing a proof of concept that advances the understanding of aggression prediction in clinical settings and supports more effective intervention by caregivers. Full article
(This article belongs to the Special Issue Big Data Analytics and Deep Learning for Predictive Maintenance)
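The late-fusion step can be sketched as below: class probabilities from an audio model and a vision model are concatenated and passed to a trained meta-classifier. Both base models and the data are synthetic stand-ins; only the fusion pattern mirrors the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
X_audio = rng.normal(size=(300, 20))     # e.g. MFCC / prosody features
X_video = rng.normal(size=(300, 32))     # e.g. body-movement features
y = rng.integers(0, 2, size=300)         # aggressive vs. non-aggressive

audio_model = RandomForestClassifier(random_state=0).fit(X_audio[:200], y[:200])
video_model = RandomForestClassifier(random_state=1).fit(X_video[:200], y[:200])

# Late fusion: stack the two models' class probabilities as meta-features.
# (A stricter setup would use out-of-fold probabilities for meta-training.)
meta_train = np.hstack([audio_model.predict_proba(X_audio[:200]),
                        video_model.predict_proba(X_video[:200])])
meta_clf = LogisticRegression().fit(meta_train, y[:200])

meta_test = np.hstack([audio_model.predict_proba(X_audio[200:]),
                       video_model.predict_proba(X_video[200:])])
print("fused accuracy:", meta_clf.score(meta_test, y[200:]))
```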

19 pages, 2092 KiB  
Article
Multi-Detection-Based Speech Emotion Recognition Using Autoencoder in Mobility Service Environment
by Jeong Min Oh, Jin Kwan Kim and Joon Young Kim
Electronics 2025, 14(10), 1915; https://doi.org/10.3390/electronics14101915 - 8 May 2025
Viewed by 661
Abstract
In mobility service environments, recognizing the user’s condition and driving status is critical to driving safety and experience. While speech emotion recognition is one possible way to predict driver status, current emotion recognition models have a fundamental limitation: they classify only a single emotion class rather than multiple classes. This prevents a comprehensive understanding of the driver’s condition and intention during driving. In addition, mobility devices inherently generate noise that can affect speech emotion recognition performance in the mobility service. Considering mobility service environments, we investigate models that detect multiple emotions while mitigating noise issues. In this paper, we propose a speech emotion recognition model based on an autoencoder for multi-emotion detection. First, we analyze Mel-Frequency Cepstral Coefficients (MFCCs) to design the specific features. We also develop a multi-emotion detection scheme based on an autoencoder, which detects multiple emotions with substantially more flexibility than existing models. With our proposed scheme, we investigate and analyze the impact of mobility noise and mitigation approaches, and evaluate the resulting performance. Full article
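One possible reading of an autoencoder-based multi-emotion detector, not necessarily the authors' exact scheme, is to train one small autoencoder per emotion on that emotion's MFCC features and flag every emotion whose reconstruction error falls below a threshold, so that several emotions can be detected at once; a toy sketch follows.

```python
import torch
import torch.nn as nn

def make_autoencoder(dim=26):
    return nn.Sequential(nn.Linear(dim, 8), nn.ReLU(), nn.Linear(8, dim))

emotions = ["neutral", "happy", "angry", "sad"]
models = {e: make_autoencoder() for e in emotions}   # one autoencoder per emotion

def detect(x, threshold=0.5):
    """Return every emotion whose autoencoder reconstructs x well enough."""
    errors = {e: torch.mean((m(x) - x) ** 2).item() for e, m in models.items()}
    return [e for e, err in errors.items() if err < threshold]

x = torch.randn(26)            # one MFCC summary vector (e.g. means + stds)
print(detect(x))               # untrained models: output is illustrative only
```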
