Search Results (239)

Search Parameters:
Keywords = Mel Frequency Cepstral Coefficients

16 pages, 1530 KiB  
Article
Enhanced Respiratory Sound Classification Using Deep Learning and Multi-Channel Auscultation
by Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim and Su Hwan Lee
J. Clin. Med. 2025, 14(15), 5437; https://doi.org/10.3390/jcm14155437 - 1 Aug 2025
Viewed by 157
Abstract
Background/Objectives: Identifying and classifying abnormal lung sounds is essential for diagnosing patients with respiratory disorders. In particular, the simultaneous recording of auscultation signals from multiple clinically relevant positions offers greater diagnostic potential compared to traditional single-channel measurements. This study aims to improve the accuracy of respiratory sound classification by leveraging multichannel signals and capturing positional characteristics from multiple sites in the same patient. Methods: We evaluated the performance of respiratory sound classification using multichannel lung sound data with a deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM), based on mel-frequency cepstral coefficients (MFCCs). We analyzed the impact of the number and placement of channels on classification performance. Results: The results demonstrated that using four-channel recordings improved accuracy, sensitivity, specificity, precision, and F1-score by approximately 1.11, 1.15, 1.05, 1.08, and 1.13 times, respectively, compared to using three, two, or single-channel recordings. Conclusions: This study confirms that multichannel data capture a richer set of features corresponding to various respiratory sound characteristics, leading to significantly improved classification performance. The proposed method holds promise for enhancing sound classification accuracy not only in clinical applications but also in broader domains such as speech and audio processing. Full article
(This article belongs to the Section Respiratory Medicine)
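The abstract describes the pipeline but not an implementation. As an illustration only, the sketch below stacks per-channel MFCCs into a multichannel tensor and feeds them to a small CNN-LSTM of the kind described; the sampling rate, MFCC order, channel count, layer sizes, and class count are assumptions, not the authors' configuration.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

SR = 4000          # assumed sampling rate for lung-sound recordings
N_MFCC = 13        # assumed MFCC order
N_CHANNELS = 4     # four auscultation positions, as in the study

def multichannel_mfcc(wav_paths):
    """Stack MFCC matrices from several simultaneously recorded channels."""
    feats = []
    for path in wav_paths:
        y, _ = librosa.load(path, sr=SR, mono=True)
        mfcc = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC)   # (13, T)
        feats.append(mfcc)
    return np.stack(feats)                                       # (4, 13, T)

class CnnLstmClassifier(nn.Module):
    """Tiny CNN front-end over (channels, mfcc, time), LSTM over time."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(N_CHANNELS, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((2, 1)),                  # halve the MFCC axis only
        )
        self.lstm = nn.LSTM(input_size=16 * (N_MFCC // 2),
                            hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                          # x: (batch, 4, 13, T)
        h = self.conv(x)                           # (batch, 16, 6, T)
        h = h.permute(0, 3, 1, 2).flatten(2)       # (batch, T, 96)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])               # logits from the last time step
```

Training such a model on single-channel versus four-channel inputs is how the channel-count comparison reported in the abstract would be run.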

13 pages, 769 KiB  
Article
A Novel You Only Listen Once (YOLO) Deep Learning Model for Automatic Prominent Bowel Sounds Detection: Feasibility Study in Healthy Subjects
by Rohan Kalahasty, Gayathri Yerrapragada, Jieun Lee, Keerthy Gopalakrishnan, Avneet Kaur, Pratyusha Muddaloor, Divyanshi Sood, Charmy Parikh, Jay Gohri, Gianeshwaree Alias Rachna Panjwani, Naghmeh Asadimanesh, Rabiah Aslam Ansari, Swetha Rapolu, Poonguzhali Elangovan, Shiva Sankari Karuppiah, Vijaya M. Dasari, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
Sensors 2025, 25(15), 4735; https://doi.org/10.3390/s25154735 - 31 Jul 2025
Viewed by 283
Abstract
Accurate diagnosis of gastrointestinal (GI) diseases typically requires invasive procedures or imaging studies that pose the risk of various post-procedural complications or involve radiation exposure. Bowel sounds (BSs), though typically described during a GI-focused physical exam, are highly inaccurate and variable, with low clinical value in diagnosis. Interpretation of the acoustic characteristics of BSs, i.e., using a phonoenterogram (PEG), may aid in diagnosing various GI conditions non-invasively. Use of artificial intelligence (AI) and improvements in computational analysis can enhance the use of PEGs in different GI diseases and lead to a non-invasive, cost-effective diagnostic modality that has not been explored before. The purpose of this work was to develop an automated AI model, You Only Listen Once (YOLO), to detect prominent bowel sounds that can enable real-time analysis for future GI disease detection and diagnosis. A total of 110 2-minute PEGs sampled at 44.1 kHz were recorded using the Eko DUO® stethoscope from eight healthy volunteers at two locations, namely, left upper quadrant (LUQ) and right lower quadrant (RLQ) after IRB approval. The datasets were annotated by trained physicians, categorizing BSs as prominent or obscure using version 1.7 of Label Studio Software®. Each BS recording was split up into 375 ms segments with 200 ms overlap for real-time BS detection. Each segment was binned based on whether it contained a prominent BS, resulting in a dataset of 36,149 non-prominent segments and 6435 prominent segments. Our dataset was divided into training, validation, and test sets (60/20/20% split). A 1D-CNN augmented transformer was trained to classify these segments via the input of Mel-frequency cepstral coefficients. The developed AI model achieved area under the receiver operating curve (ROC) of 0.92, accuracy of 86.6%, precision of 86.85%, and recall of 86.08%. This shows that the 1D-CNN augmented transformer with Mel-frequency cepstral coefficients achieved creditable performance metrics, signifying the YOLO model’s capability to classify prominent bowel sounds that can be further analyzed for various GI diseases. This proof-of-concept study in healthy volunteers demonstrates that automated BS detection can pave the way for developing more intuitive and efficient AI-PEG devices that can be trained and utilized to diagnose various GI conditions. To ensure the robustness and generalizability of these findings, further investigations encompassing a broader cohort, inclusive of both healthy and disease states are needed. Full article
(This article belongs to the Special Issue Biomedical Signals, Images and Healthcare Data Analysis: 2nd Edition)
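For illustration, the segmentation step described above (375 ms windows with 200 ms overlap, i.e., a 175 ms hop, at 44.1 kHz) can be sketched as follows; the MFCC order and the file-loading details are assumptions, not the authors' code.

```python
import numpy as np
import librosa

SR = 44100                       # recording rate reported in the abstract
WIN = int(0.375 * SR)            # 375 ms analysis window
HOP = int((0.375 - 0.200) * SR)  # 200 ms overlap -> 175 ms hop

def segment_recording(path):
    """Slice a phonoenterogram into overlapping 375 ms windows."""
    y, _ = librosa.load(path, sr=SR, mono=True)
    starts = range(0, max(len(y) - WIN, 0) + 1, HOP)
    return np.stack([y[s:s + WIN] for s in starts])       # (n_segments, WIN)

def segment_mfcc(segments, n_mfcc=20):
    """One MFCC matrix per segment, ready for a 1D-CNN/transformer classifier."""
    return np.stack([librosa.feature.mfcc(y=seg, sr=SR, n_mfcc=n_mfcc)
                     for seg in segments])                 # (n_segments, 20, frames)
```

Each segment's binary label (prominent vs. non-prominent bowel sound) would then come from the physician annotations described above.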

23 pages, 3741 KiB  
Article
Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling
by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez
Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025
Viewed by 290
Abstract
Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions—studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We perform a detailed preprocessing and feature extraction procedure, evaluating multiple configurations and validating their applicability and effectiveness in improving the obtained results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with the corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year MAE for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research. Full article
(This article belongs to the Section Computational Engineering)
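As a hedged illustration of the feature comparison the paper reports (log-Mel spectrograms for CNNs, MFCCs for LSTMs), the sketch below computes both representations with librosa; the sampling rate, Mel-bin count, and MFCC order are placeholder values, and the reported link between the best Mel-bin count and corpus SNR would be explored by varying `n_mels`.

```python
import numpy as np
import librosa

def cnn_and_lstm_features(path, sr=16000, n_mels=64, n_mfcc=20):
    """Return a log-Mel spectrogram (image-like, for a CNN) and an MFCC
    sequence (frame-wise vectors, for an LSTM). Parameter values are
    illustrative, not the paper's tuned settings."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # (n_mels, frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return log_mel, mfcc.T                                   # LSTM input as (frames, n_mfcc)
```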

10 pages, 857 KiB  
Proceeding Paper
Implementation of a Prototype-Based Parkinson’s Disease Detection System Using a RISC-V Processor
by Krishna Dharavathu, Pavan Kumar Sankula, Uma Maheswari Vullanki, Subhan Khan Mohammad, Sai Priya Kesapatnapu and Sameer Shaik
Eng. Proc. 2025, 87(1), 97; https://doi.org/10.3390/engproc2025087097 - 21 Jul 2025
Viewed by 206
Abstract
Parkinson's disease (PD) has a high incidence among human diseases, according to a recent survey by the World Health Organization (WHO). WHO records indicate that this chronic disease affects approximately 10 million people worldwide, and patients who do not receive an early diagnosis may progress to an incurable neurological disorder. PD is a degenerative brain disorder characterized by impairment of the nigrostriatal system and is accompanied by a wide range of motor and non-motor symptoms. In this work, PD is detected from patients' speech signals using a reduced instruction set computing, 5th version (RISC-V) processor. The RISC-V microcontroller unit (MCU) was designed for a voice-controlled human-machine interface (HMI). Signal processing and feature extraction methods capture the changes that nigrostriatal impairment produces in the digitized speech signal, and classifier modules then label the speech as normal or abnormal to identify PD. We use Matrix Laboratory (MATLAB R2021a_v9.10.0.1602886) to analyze the data, develop algorithms, create modules, and develop the RISC-V processor for embedded implementation. Machine learning (ML) techniques are also used to extract features such as pitch, tremor, and Mel-frequency cepstral coefficients (MFCCs). Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
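The study's implementation is in MATLAB; purely as an illustration of the pitch and MFCC features it names, a rough Python equivalent might look like the following. The tremor measure, the classifier modules, and the RISC-V deployment are not reproduced, and all parameter values are assumptions.

```python
import numpy as np
import librosa

def pd_voice_features(path, sr=16000):
    """Illustrative pitch + MFCC feature vector for one voice recording."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                      fmax=librosa.note_to_hz('C6'), sr=sr)
    f0 = f0[voiced_flag & ~np.isnan(f0)]                   # keep voiced frames only
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([[np.mean(f0), np.std(f0)],      # pitch statistics
                           mfcc.mean(axis=1), mfcc.std(axis=1)])
```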

20 pages, 1798 KiB  
Article
An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment
by Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva and Massimo Villari
Big Data Cogn. Comput. 2025, 9(7), 188; https://doi.org/10.3390/bdcc9070188 - 17 Jul 2025
Viewed by 481
Abstract
Nowadays, the Metaverse is facing many challenges. In this context, Virtual Reality (VR) applications allowing voice-based human–3D object interactions are limited due to the current hardware/software limitations. In fact, adopting Automated Speech Recognition (ASR) systems to interact with 3D objects in VR applications through users’ voice commands presents significant challenges due to the hardware and software limitations of headset devices. This paper aims to bridge this gap by proposing a methodology to address these issues. In particular, starting from a Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm able to capture the unique characteristics of the user’s voice, we pass it as input to a Convolutional Neural Network (CNN) model. After that, in order to integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, we converted it into an Open Neural Network Exchange (ONNX) format, i.e., a Machine Learning (ML) interoperability open standard format. The proposed system demonstrates good performance and represents a foundation for the development of user-centric, effective computing systems, enhancing accessibility to VR environments through voice-based commands. Experiments demonstrate that a native CNN model developed through TensorFlow presents comparable performances with respect to the corresponding CNN model converted into the ONNX format, paving the way towards the development of VR applications running in headsets controlled through the user’s voice. Full article
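The paper converts a TensorFlow CNN to ONNX so a standalone headset can run it; the sketch below illustrates the same interchange step with a stand-in PyTorch CNN exported through torch.onnx and reloaded in ONNX Runtime. The architecture, input shape, and command count are invented for illustration and are not the authors' model.

```python
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

class TinyVoiceCnn(nn.Module):
    """Stand-in CNN over an MFCC 'image'; not the paper's architecture."""
    def __init__(self, n_commands=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(8, n_commands))

    def forward(self, x):              # x: (batch, 1, n_mfcc, frames)
        return self.net(x)

model = TinyVoiceCnn().eval()
dummy = torch.randn(1, 1, 13, 44)                        # one MFCC patch
torch.onnx.export(model, dummy, "voice_cnn.onnx",
                  input_names=["mfcc"], output_names=["logits"])

# Inference through ONNX Runtime, as a headset-side engine would consume the model
sess = ort.InferenceSession("voice_cnn.onnx")
logits = sess.run(["logits"], {"mfcc": dummy.numpy().astype(np.float32)})[0]
print(logits.shape)                                      # (1, 8)
```

Comparing the PyTorch (or, in the paper, TensorFlow) outputs against the ONNX Runtime outputs on the same input is the consistency check the abstract's experiments describe.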

19 pages, 1039 KiB  
Article
Prediction of Parkinson Disease Using Long-Term, Short-Term Acoustic Features Based on Machine Learning
by Mehdi Rashidi, Serena Arima, Andrea Claudio Stetco, Chiara Coppola, Debora Musarò, Marco Greco, Marina Damato, Filomena My, Angela Lupo, Marta Lorenzo, Antonio Danieli, Giuseppe Maruccio, Alberto Argentiero, Andrea Buccoliero, Marcello Dorian Donzella and Michele Maffia
Brain Sci. 2025, 15(7), 739; https://doi.org/10.3390/brainsci15070739 - 10 Jul 2025
Viewed by 516
Abstract
Background: Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s disease, affecting countless individuals worldwide. PD is characterized by the onset of marked motor symptomatology in association with several non-motor manifestations. The clinical phase of the disease is usually preceded by a long prodromal phase, devoid of overt motor symptomatology but often showing conditions such as sleep disturbance, constipation, anosmia, and phonatory changes. To date, speech analysis appears to be a promising digital biomarker that can anticipate clinical PD by as much as 10 years before onset, as well as serving as a useful prognostic tool for patient follow-up. Voice analysis can therefore be proposed as a non-invasive method for distinguishing PD patients from healthy subjects (HS). Methods: Our study was a cross-sectional analysis of voice impairment. A dataset comprising 81 voice samples (41 from healthy individuals and 40 from PD patients) was utilized to train and evaluate common machine learning (ML) models using various types of features, including long-term features (jitter, shimmer, and cepstral peak prominence (CPP)), short-term features (Mel-frequency cepstral coefficients (MFCCs)), and non-standard measurements (pitch period entropy (PPE) and recurrence period density entropy (RPDE)). The study adopted multiple ML algorithms, including random forest (RF), K-nearest neighbors (KNN), decision tree (DT), naïve Bayes (NB), support vector machines (SVM), and logistic regression (LR). A cross-validation technique was applied to ensure the reliability of performance metrics on the train and test subsets. These metrics (accuracy, recall, and precision) help determine the most effective models for distinguishing PD from healthy subjects. Result: Among all the algorithms used in this research, random forest (RF) was the best-performing model, achieving an accuracy of 82.72% with a ROC-AUC score of 89.65%. Although other models, such as the support vector machine (SVM), could be considered, with an accuracy of 75.29% and a ROC-AUC score of 82.63%, RF was by far the best when evaluated across all metrics. The K-nearest neighbors (KNN) and decision tree (DT) models performed the worst. Notably, by combining a comprehensive set of long-term, short-term, and non-standard acoustic features, unlike previous studies that typically focused on only a subset, our study achieved higher predictive performance, offering a more robust model for early PD detection. Conclusions: This study highlights the potential of combining advanced acoustic analysis with ML algorithms to develop non-invasive and reliable tools for early PD detection, offering substantial benefits for the healthcare sector. Full article
(This article belongs to the Section Neurodegenerative Diseases)
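As an illustration of the evaluation protocol described (several classifiers compared with cross-validated accuracy, recall, and precision over a combined long-term/short-term/non-standard feature table), here is a minimal scikit-learn sketch; the feature matrix is a random placeholder, and the forest size and fold count are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# X: one row per speaker with long-term (jitter, shimmer, CPP), short-term
# (MFCC statistics), and non-standard (PPE, RPDE) columns already computed;
# y: 0 = healthy, 1 = PD. Values here are random placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(81, 30))
y = rng.integers(0, 2, size=81)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_validate(rf, X, y, cv=5,
                        scoring=["accuracy", "recall", "precision", "roc_auc"])
for name in ["test_accuracy", "test_recall", "test_precision", "test_roc_auc"]:
    print(name, scores[name].mean())
```

Swapping `rf` for KNN, DT, NB, SVM, or LR estimators reproduces the kind of model comparison the abstract reports.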

18 pages, 3035 KiB  
Article
Data-Driven Modeling and Enhancement of Surface Quality in Milling Based on Sound Signals
by Paschalis Charalampous
J. Manuf. Mater. Process. 2025, 9(7), 231; https://doi.org/10.3390/jmmp9070231 - 4 Jul 2025
Viewed by 383
Abstract
The present study introduces an AI (Artificial Intelligence) framework for surface roughness assessment in milling operations through sound signal processing. As industrial demands escalate for in-process quality control solutions, the proposed system leverages audio data to estimate surface finish states without interrupting production. In order to address this, a novel classification approach was developed that maps audio waveform data into predictive indicators of surface quality. In particular, an experimental dataset was employed consisting of sound signals that were captured during milling procedures applying various machining conditions, where each signal was labeled with a corresponding roughness quality obtained via offline metrology. The formulated classification pipeline commences with audio acquisition, resampling, and normalization to ensure consistency across the dataset. These signals are then transformed into Mel-Frequency Cepstral Coefficients (MFCCs), which yield a compact time–frequency representation optimized for human auditory perception. Next, several AI algorithms were trained in order to classify these MFCCs into predefined surface roughness categories. Finally, the results of the work demonstrate that sound signals could contain sufficient discriminatory information enabling a reliable classification of surface finish quality. This approach not only facilitates in-process monitoring but also provides a foundation for intelligent manufacturing systems capable of real-time quality assurance. Full article
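A hedged sketch of the front end described above (acquisition, resampling, normalization, MFCC extraction) is given below; the target sampling rate, MFCC order, and the choice to summarize frames with means and standard deviations are assumptions rather than the paper's settings.

```python
import numpy as np
import librosa

TARGET_SR = 22050            # assumed common rate; the paper's value is not given here

def milling_sound_features(path, n_mfcc=20):
    """Resample, peak-normalize, and summarize a milling recording as MFCC statistics."""
    y, _ = librosa.load(path, sr=TARGET_SR, mono=True)    # load + resample
    y = y / (np.max(np.abs(y)) + 1e-9)                    # peak normalization
    mfcc = librosa.feature.mfcc(y=y, sr=TARGET_SR, n_mfcc=n_mfcc)
    # A fixed-length vector any classifier can consume, paired with the
    # roughness class measured offline as the training label.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```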

22 pages, 4293 KiB  
Article
Speech-Based Parkinson’s Detection Using Pre-Trained Self-Supervised Automatic Speech Recognition (ASR) Models and Supervised Contrastive Learning
by Hadi Sedigh Malekroodi, Nuwan Madusanka, Byeong-il Lee and Myunggi Yi
Bioengineering 2025, 12(7), 728; https://doi.org/10.3390/bioengineering12070728 - 1 Jul 2025
Viewed by 842
Abstract
Diagnosing Parkinson’s disease (PD) through speech analysis is a promising area of research, as speech impairments are often one of the early signs of the disease. This study investigates the efficacy of fine-tuning pre-trained Automatic Speech Recognition (ASR) models, specifically Wav2Vec 2.0 and HuBERT, for PD detection using transfer learning. These models, pre-trained on large unlabeled datasets, can be capable of learning rich speech representations that capture acoustic markers of PD. The study also proposes the integration of a supervised contrastive (SupCon) learning approach to enhance the models’ ability to distinguish PD-specific features. Additionally, the proposed ASR-based features were compared against two common acoustic feature sets: mel-frequency cepstral coefficients (MFCCs) and the extended Geneva minimalistic acoustic parameter set (eGeMAPS) as a baseline. We also employed a gradient-based method, Grad-CAM, to visualize important speech regions contributing to the models’ predictions. The experiments, conducted using the NeuroVoz dataset, demonstrated that features extracted from the pre-trained ASR models exhibited superior performance compared to the baseline features. The results also reveal that the method integrating SupCon consistently outperforms traditional cross-entropy (CE)-based models. Wav2Vec 2.0 and HuBERT with SupCon achieved the highest F1 scores of 90.0% and 88.99%, respectively. Additionally, their AUC scores in the ROC analysis surpassed those of the CE models, which had comparatively lower AUCs, ranging from 0.84 to 0.89. These results highlight the potential of ASR-based models as scalable, non-invasive tools for diagnosing and monitoring PD, offering a promising avenue for the early detection and management of this debilitating condition. Full article
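The supervised contrastive (SupCon) objective the study integrates follows the standard formulation; a compact PyTorch sketch is shown below, where `embeddings` would be pooled Wav2Vec 2.0 or HuBERT utterance representations and the temperature is an assumed value.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive (SupCon) loss over one batch of embeddings:
    samples that share a label are pulled together, all others pushed apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                             # (B, B) similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = positives.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~positives, 0.0).sum(dim=1) / pos_counts
    return loss[positives.any(dim=1)].mean()                # anchors with a positive
```

In practice this term would be combined with, or compared against, the cross-entropy objective the abstract uses as its baseline.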

28 pages, 1634 KiB  
Review
AI-Powered Vocalization Analysis in Poultry: Systematic Review of Health, Behavior, and Welfare Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(13), 4058; https://doi.org/10.3390/s25134058 - 29 Jun 2025
Viewed by 1006
Abstract
Artificial intelligence and bioacoustics represent a paradigm shift in non-invasive poultry welfare monitoring through advanced vocalization analysis. This comprehensive systematic review critically examines the transformative evolution from traditional acoustic feature extraction—including Mel-Frequency Cepstral Coefficients (MFCCs), spectral entropy, and spectrograms—to cutting-edge deep learning architectures encompassing Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, attention mechanisms, and groundbreaking self-supervised models such as wav2vec2 and Whisper. The investigation reveals compelling evidence for edge computing deployment via TinyML frameworks, addressing critical scalability challenges in commercial poultry environments characterized by acoustic complexity and computational constraints. Advanced applications spanning emotion recognition, disease detection, and behavioral phenotyping demonstrate unprecedented potential for real-time welfare assessment. Through rigorous bibliometric co-occurrence mapping and thematic clustering analysis, this review exposes persistent methodological bottlenecks: dataset standardization deficits, evaluation protocol inconsistencies, and algorithmic interpretability limitations. Critical knowledge gaps emerge in cross-species domain generalization and contextual acoustic adaptation, demanding urgent research prioritization. The findings underscore explainable AI integration as essential for establishing stakeholder trust and regulatory compliance in automated welfare monitoring systems. This synthesis positions acoustic AI as a cornerstone technology enabling ethical, transparent, and scientifically robust precision livestock farming, bridging computational innovation with biological relevance for sustainable poultry production systems. Future research directions emphasize multi-modal sensor integration, standardized evaluation frameworks, and domain-adaptive models capable of generalizing across diverse poultry breeds, housing conditions, and environmental contexts while maintaining interpretability for practical farm deployment. Full article
(This article belongs to the Special Issue Feature Papers in Smart Agriculture 2025)

26 pages, 1521 KiB  
Article
AI-Based Classification of Pediatric Breath Sounds: Toward a Tool for Early Respiratory Screening
by Lichuan Liu, Wei Li and Beth Moxley
Appl. Sci. 2025, 15(13), 7145; https://doi.org/10.3390/app15137145 - 25 Jun 2025
Viewed by 446
Abstract
Context: Respiratory morbidity is a leading cause of children’s consultations with general practitioners. Auscultation, the act of listening to breath sounds, is a crucial diagnostic method for respiratory system diseases. Problem: Parents and caregivers often lack the necessary knowledge and experience to identify subtle differences in children’s breath sounds. Furthermore, obtaining reliable feedback from young children about their physical condition is challenging. Methods: The use of a human–artificial intelligence (AI) tool is an essential component for screening and monitoring young children’s respiratory diseases. Using clinical data to design and validate the proposed approaches, we propose novel methods for recognizing and classifying children’s breath sounds. Different breath sound signals were analyzed in the time domain, frequency domain, and using spectrogram representations. Breath sound detection and segmentation were performed using digital signal processing techniques. Multiple features—including Mel–Frequency Cepstral Coefficients (MFCCs), Linear Prediction Coefficients (LPCs), Linear Prediction Cepstral Coefficients (LPCCs), spectral entropy, and Dynamic Linear Prediction Coefficients (DLPCs)—were extracted to capture both time and frequency characteristics. These features were then fed into various classifiers, including K-Nearest Neighbor (KNN), artificial neural networks (ANNs), hidden Markov models (HMMs), logistic regression, and decision trees, for recognition and classification. Main Findings: Experimental results from across 120 infants and preschoolers (2 months to 6 years) with respiratory disease (30 asthma, 30 croup, 30 pneumonia, and 30 normal) verified the performance of the proposed approaches. Conclusions: The proposed AI system provides a real-time diagnostic platform to improve clinical respiratory management and outcomes in young children, thereby reducing healthcare costs. Future work exploring additional respiratory diseases is warranted. Full article
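As an illustration of the multi-feature front end listed above, the sketch below combines MFCC, LPC, and spectral-entropy measures for one breath-sound clip and hands them to a KNN classifier; the sampling rate, feature orders, and classifier settings are assumptions, and the LPCC and DLPC features are omitted.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def breath_sound_features(path, sr=8000, n_mfcc=13, lpc_order=10):
    """Illustrative MFCC + LPC + spectral-entropy vector for one breath-sound clip."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
    lpc = librosa.lpc(y, order=lpc_order)[1:]              # drop the leading 1.0
    spec = np.abs(librosa.stft(y)) ** 2
    p = spec / (spec.sum(axis=0, keepdims=True) + 1e-12)   # per-frame spectral PMF
    entropy = float(np.mean(-(p * np.log2(p + 1e-12)).sum(axis=0)))
    return np.concatenate([mfcc, lpc, [entropy]])

# Example classifier over precomputed features (asthma, croup, pneumonia, normal):
# clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
```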

15 pages, 1458 KiB  
Article
Photoplethysmography Feature Extraction for Non-Invasive Glucose Estimation by Means of MFCC and Machine Learning Techniques
by Christian Salamea-Palacios, Melissa Montalvo-López, Raquel Orellana-Peralta and Javier Viñanzaca-Figueroa
Biosensors 2025, 15(7), 408; https://doi.org/10.3390/bios15070408 - 24 Jun 2025
Viewed by 518
Abstract
Diabetes Mellitus is considered one of the most widespread diseases in the world. Traditional glucose monitoring devices carry discomfort and risks associated with the frequent extraction of blood from users. The present article proposes a noninvasive glucose estimation system based on the application of Mel Frequency Cepstral Coefficients (MFCCs) for the characterization of photoplethysmographic signals (PPG). Two variants of the MFCC feature extraction methods are evaluated along with three machine learning techniques for the development of an effective regression function for the estimation of glucose concentration. A comparison between the performance of the algorithms revealed that the best combination achieved a mean absolute error of 9.85 mg/dL and a correlation of 0.94 between the estimated concentration and the real glucose values. Similarly, 99.53% of the validation samples were distributed within zones A and B of the Clarke Error Grid Analysis. The proposed system achieves levels of correlation comparable to analogous technologies that require earlier calibration for its operation, which indicates a strong potential for the future use of the algorithm as an alternative to invasive monitoring devices. Full article
(This article belongs to the Section Wearable Biosensors)
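Purely as an illustration of applying MFCCs to a slow physiological signal as described, the sketch below computes an MFCC summary of a PPG window and feeds it to a generic regressor; the PPG sampling rate, frame sizes, coefficient count, and the choice of random-forest regression are assumptions, not the paper's pipeline.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

PPG_SR = 100                       # assumed PPG sampling rate (Hz)

def ppg_mfcc_vector(ppg, n_mfcc=12):
    """MFCC summary of one PPG window; frame sizes scaled to the slow signal."""
    mfcc = librosa.feature.mfcc(y=ppg.astype(float), sr=PPG_SR, n_mfcc=n_mfcc,
                                n_fft=256, hop_length=64, n_mels=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# X = np.stack([ppg_mfcc_vector(w) for w in windows]); y = glucose_mg_dl
# model = RandomForestRegressor(n_estimators=300).fit(X_train, y_train)
# print(mean_absolute_error(y_test, model.predict(X_test)))   # paper reports 9.85 mg/dL
```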

22 pages, 5083 KiB  
Article
Intelligent Mobile-Assisted Language Learning: A Deep Learning Approach for Pronunciation Analysis and Personalized Feedback
by Fengqin Liu, Korawit Orkphol, Natthapon Pannurat, Thanat Sooknuan, Thanin Muangpool, Sanya Kuankid and Montri Phothisonothai
Inventions 2025, 10(4), 46; https://doi.org/10.3390/inventions10040046 - 24 Jun 2025
Viewed by 644
Abstract
This paper introduces an innovative mobile-assisted language-learning (MALL) system that harnesses deep learning technology to analyze pronunciation patterns and deliver real-time, personalized feedback. Drawing inspiration from how the human brain processes speech through neural pathways, our system analyzes multiple speech features using spectrograms, mel-frequency cepstral coefficients (MFCCs), and formant frequencies in a manner that mirrors the auditory cortex’s interpretation of sound. The core of our approach utilizes a convolutional neural network (CNN) to classify pronunciation patterns from user-recorded speech. To enhance the assessment accuracy and provide nuanced feedback, we integrated a fuzzy inference system (FIS) that helps learners identify and correct specific pronunciation errors. The experimental results demonstrate that our multi-feature model achieved 82.41% to 90.52% accuracies in accent classification across diverse linguistic contexts. The user testing revealed statistically significant improvements in pronunciation skills, where learners showed a 5–20% enhancement in accuracy after using the system. The proposed MALL system offers a portable, accessible solution for language learners while establishing a foundation for future research in multilingual functionality and mobile platform optimization. By combining advanced speech analysis with intuitive feedback mechanisms, this system addresses a critical challenge in language acquisition and promotes more effective self-directed learning. Full article
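The paper couples a CNN classifier with a fuzzy inference system (FIS) for feedback; as a toy illustration of that last step only, the sketch below maps a CNN pronunciation score onto feedback labels with triangular membership functions. The labels, membership shapes, and score range are invented and do not reflect the paper's rule base.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with corners a <= b <= c."""
    return float(np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                       (c - x) / (c - b + 1e-9)), 0.0))

def pronunciation_feedback(cnn_score):
    """Map a CNN pronunciation-correctness score in [0, 1] to a feedback label
    using a toy fuzzy rule base (shoulders extend past [0, 1] on purpose)."""
    degrees = {
        "needs work":      tri(cnn_score, -0.5, 0.0, 0.5),
        "almost there":    tri(cnn_score, 0.3, 0.6, 0.85),
        "well pronounced": tri(cnn_score, 0.7, 1.0, 1.5),
    }
    return max(degrees, key=degrees.get), degrees

print(pronunciation_feedback(0.72))   # -> ('almost there', {...})
```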

18 pages, 1498 KiB  
Article
Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN
by Gheed T. Waleed and Shaimaa H. Shaker
Information 2025, 16(7), 518; https://doi.org/10.3390/info16070518 - 21 Jun 2025
Viewed by 1157
Abstract
Speech emotion recognition (SER) plays a vital role in enhancing human–computer interaction (HCI) and can be applied in affective computing, virtual support, and healthcare. This research presents a high-performance SER framework based on a lightweight 1D Convolutional Neural Network (1D-CNN) and a multi-feature fusion technique. Rather than employing spectrograms as image-based input, frame-level characteristics (Mel-Frequency Cepstral Coefficients, Mel-Spectrograms, and Chroma vectors) are calculated throughout the sequences to preserve temporal information and reduce the computing expense. The model attained classification accuracies of 94.0% on MELD (multi-party dialogues) and 91.9% on RAVDESS (acted speech). Ablation experiments demonstrate that the integration of complementary features significantly outperforms the use of a single feature as a baseline. Data augmentation techniques, including Gaussian noise and time shifting, enhance model generalisation. The proposed method demonstrates significant potential for real-time emotion recognition using audio only in embedded or resource-constrained devices. Full article
(This article belongs to the Special Issue Artificial Intelligence Methods for Human-Computer Interaction)
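A hedged sketch of the frame-level feature fusion described above (MFCCs, Mel-spectrogram bands, and Chroma vectors stacked per frame for a 1D-CNN) follows; the feature sizes and STFT settings are placeholder values, not the paper's configuration.

```python
import numpy as np
import librosa

def fused_frame_features(path, sr=16000, n_mfcc=20, n_mels=40, n_chroma=12):
    """Frame-wise fusion of MFCC, log-Mel, and Chroma features for a 1D-CNN.
    The STFT settings are shared so all three feature streams stay frame-aligned."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    kw = dict(n_fft=1024, hop_length=256)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, **kw)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, **kw))
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=n_chroma, **kw)
    fused = np.concatenate([mfcc, mel, chroma], axis=0)     # (72, frames)
    return fused.T                                           # (frames, 72) sequence input
```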

22 pages, 3487 KiB  
Article
Voice-Evoked Color Prediction Using Deep Neural Networks in Sound–Color Synesthesia
by Raminta Bartulienė, Aušra Saudargienė, Karolina Reinytė, Gustavas Davidavičius, Rūta Davidavičienė, Šarūnas Ašmantas, Gailius Raškinis and Saulius Šatkauskas
Brain Sci. 2025, 15(5), 520; https://doi.org/10.3390/brainsci15050520 - 19 May 2025
Viewed by 821
Abstract
Background/Objectives: Synesthesia is an unusual neurological condition in which stimulation of one sensory modality automatically triggers a sensory sensation in another, unstimulated modality. In this study, we investigated a case of sound–color synesthesia in a female with impaired vision. After confirming a positive case of synesthesia, we aimed to determine the sound features that played a key role in the subject’s sound perception and color development. Methods: We applied deep neural networks, with binary logistic regression as a benchmark, to classify blue and pink synesthetically voice-evoked color classes using 136 voice features extracted from eight study participants’ voice recordings. Results: The minimum Redundancy Maximum Relevance algorithm was applied to select the 20 most relevant voice features. A recognition accuracy of 0.81 was already achieved using five features, and the best results were obtained utilizing the seventeen most informative features. The deep neural network classified previously unseen voice recordings with 0.84 accuracy, 0.81 specificity, 0.86 sensitivity, and 0.85 and 0.81 F1-scores for the blue and pink classes, respectively. The machine learning algorithms revealed that voice parameters such as Mel-frequency cepstral coefficients, Chroma vectors, and sound energy play the most significant role. Conclusions: Our results suggest that the pitch, tone, and energy of a person’s voice affect different color perceptions. Full article
(This article belongs to the Special Issue Perceptual Learning and Cortical Plasticity)
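The minimum Redundancy Maximum Relevance (mRMR) selection step can be illustrated with a greedy approximation: relevance from mutual information with the class, redundancy from correlation with already-chosen features. This is a sketch of the idea, not the implementation or scoring used in the study.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def greedy_mrmr(X, y, k=20):
    """Greedy minimum-Redundancy-Maximum-Relevance selection (approximate):
    relevance = mutual information with the class label, redundancy = mean
    absolute correlation with the features already selected."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        remaining = [i for i in range(X.shape[1]) if i not in selected]
        scores = [relevance[i] - corr[i, selected].mean() for i in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected            # column indices of the k retained voice features
```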

15 pages, 49760 KiB  
Article
Rapid Diagnosis of Distributed Acoustic Sensing Vibration Signals Using Mel-Frequency Cepstral Coefficients and Liquid Neural Networks
by Haitao Liu, Yunfan Xu, Yuefeng Qi, Haosong Yang and Weihong Bi
Sensors 2025, 25(10), 3090; https://doi.org/10.3390/s25103090 - 13 May 2025
Cited by 1 | Viewed by 604
Abstract
Distributed Acoustic Sensing (DAS) systems face increasing challenges in massive data processing and real-time fault diagnosis due to the growing complexity of industrial environments and data volume. To address these issues, an end-to-end diagnostic framework is developed, integrating Mel-Frequency Cepstral Coefficients (MFCCs) for high-efficiency signal compression and Liquid Neural Networks (LNNs) for lightweight, real-time classification. The MFCC algorithm, originally used in speech processing, is adapted to extract key features from DAS vibration signals, achieving compression ratios of 60–100× without significant information loss. LNNs’ dynamic topology and sparse activation enable high accuracy with extremely low latency and minimal computational cost, making it highly suitable for edge deployment. The proposed framework was validated both in simulated environments and on a real-world conveyor belt system at Qinhuangdao Port, where it achieved 100% accuracy across four vibration modes over 14 weeks of operation. Comparative experiments show that LNNs outperform traditional models such as 1D-CNN and LSTMs in terms of accuracy, inference speed, and model size. The proposed MFCC-LNN pipeline also demonstrates strong cross-domain generalization capabilities in pipeline monitoring, seismic detection, and speech signal processing. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
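To illustrate the compression bookkeeping behind the reported 60–100x ratios, the sketch below converts one DAS vibration frame to an MFCC matrix and reports the raw-samples-to-coefficients ratio. The frame length, sampling rate, and MFCC settings are assumptions, so this particular configuration lands near 30x; larger hops or fewer coefficients push the ratio toward the range the abstract reports.

```python
import numpy as np
import librosa

def das_mfcc_compress(frame, fs=10000, n_mfcc=16, n_fft=1024, hop_length=512):
    """Compress one DAS vibration frame to an MFCC matrix and report the ratio."""
    mfcc = librosa.feature.mfcc(y=frame.astype(float), sr=fs, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    ratio = frame.size / mfcc.size
    return mfcc, ratio

frame = np.random.randn(10 * 10000)             # 10 s of single-channel DAS data
mfcc, ratio = das_mfcc_compress(frame)
print(mfcc.shape, f"compression ~{ratio:.0f}x") # e.g. (16, 196) -> roughly 30x here
```

The resulting MFCC matrices, not the raw traces, would then be the inputs to the lightweight classifier stage described above.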
