Search Results (280)

Search Parameters:
Keywords = frequency cepstral coefficients

18 pages, 4190 KB  
Article
Acoustic Characteristics of Vowel Production in Children with Cochlear Implants Using a Multi-View Fusion Model
by Qingqing Xie, Jing Wang, Ling Du, Lifang Zhang and Yanan Li
Algorithms 2026, 19(1), 9; https://doi.org/10.3390/a19010009 (registering DOI) - 22 Dec 2025
Viewed by 58
Abstract
This study aims to examine the acoustic characteristics of Mandarin vowels produced by children with cochlear implants and to explore how their speech production differs from that of children with normal hearing. We propose a multiview model-based method for vowel feature analysis. This approach involves extracting and fusing formant features, Mel-frequency cepstral coefficients (MFCCs), and linear predictive coding coefficients (LPCCs) to comprehensively represent vowel articulation. We conducted k-means clustering on individual features and applied multiview clustering to the fused features. The results showed that children with cochlear implants formed discernible vowel clusters in the formant space, though with lower compactness than those of normal-hearing children. Furthermore, the MFCC and LPCC features revealed significant inter-group differences. Most importantly, the multiview model, utilizing the fused features, achieved superior clustering performance compared with any single feature. These findings demonstrate that effective fusion of frequency-domain features provides a more comprehensive representation of phonetic characteristics, offering potential value for clinical assessment and targeted speech intervention in children with hearing impairment.
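The feature-fusion step described above (MFCCs and LPC-derived coefficients concatenated before clustering) can be sketched roughly as follows. This is a minimal illustration assuming librosa and scikit-learn, with a synthetic signal standing in for a vowel recording and formant features omitted; it is not the authors' pipeline.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

sr = 16000
# Synthetic stand-in for a vowel recording (a real pipeline would load audio).
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.01 * np.random.randn(sr)

def fused_features(y, sr, n_mfcc=13, lpc_order=12):
    """Concatenate per-utterance MFCC statistics with LPC coefficients."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    lpc = librosa.lpc(y, order=lpc_order)                    # (lpc_order + 1,)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1), lpc])

# Build a toy matrix of utterances and cluster it, mirroring the k-means step.
X = np.vstack([fused_features(y + 0.01 * np.random.randn(sr), sr) for _ in range(20)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)
```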

30 pages, 4486 KB  
Article
Passive Localization in GPS-Denied Environments via Acoustic Side Channels: Harnessing Smartphone Microphones to Infer Wireless Signal Strength Using MFCC Features
by Khalid A. Darabkh, Oswa M. Amro and Feras B. Al-Qatanani
J. Sens. Actuator Netw. 2025, 14(6), 119; https://doi.org/10.3390/jsan14060119 - 16 Dec 2025
Viewed by 241
Abstract
Location provenance based on the Global Positioning System (GPS) and the Received Signal Strength Indicator (RSSI) often fails in obstructed, noisy, or densely populated urban environments. This study proposes a passive location provenance method that uses the location’s acoustics and the device’s acoustic side channel to address these limitations. With the smartphone’s internal microphone, we can effectively capture the subtle vibrations produced by the capacitors within the voltage-regulating circuit during wireless transmissions. Subsequently, we extract key features from the resulting audio signals. Meanwhile, we record the RSSI values of the WiFi access points received by the smartphone at the exact location of the audio recordings. Our analysis reveals a strong correlation between acoustic features and RSSI values, indicating that passive acoustic emissions can effectively represent the strength of WiFi signals. Hence, the audio recordings can serve as proxies for Radio-Frequency (RF)-based location signals. We propose a location-provenance framework that utilizes sound features alone, particularly the Mel-Frequency Cepstral Coefficients (MFCCs), achieving coarse localization within approximately four kilometers. This method requires no specialized hardware, works in signal-degraded environments, and introduces a previously overlooked privacy concern: internal device sounds can unintentionally leak spatial information. Our findings highlight a novel passive side channel with implications for both privacy and security in mobile systems.
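As an illustration of the correlation analysis described above, the sketch below computes per-recording MFCC summary statistics and checks how well they track simultaneously logged RSSI values with a simple regressor. It assumes librosa and scikit-learn and uses synthetic audio and RSSI arrays in place of real field recordings.

```python
import numpy as np
import librosa
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

sr = 16000
rng = np.random.default_rng(0)

def mfcc_summary(y, sr, n_mfcc=13):
    """Mean MFCC vector for one recording."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Synthetic stand-ins: 40 one-second recordings and their logged RSSI (dBm).
recordings = [rng.normal(scale=0.1, size=sr) for _ in range(40)]
rssi = rng.uniform(-90, -30, size=40)

X = np.vstack([mfcc_summary(y, sr) for y in recordings])
# Cross-validated R^2 indicates how much RSSI variance the acoustic features explain.
scores = cross_val_score(Ridge(alpha=1.0), X, rssi, cv=5, scoring="r2")
print("mean R^2:", scores.mean())
```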

27 pages, 3213 KB  
Article
Urban Sound Classification for IoT Devices in Smart City Infrastructures
by Simona Domazetovska Markovska, Viktor Gavriloski, Damjan Pecioski, Maja Anachkova, Dejan Shishkovski and Anastasija Angjusheva Ignjatovska
Urban Sci. 2025, 9(12), 517; https://doi.org/10.3390/urbansci9120517 - 5 Dec 2025
Viewed by 382
Abstract
Urban noise is a major environmental concern that affects public health and quality of life, demanding new approaches beyond conventional noise level monitoring. This study investigates the development of an AI-driven Acoustic Event Detection and Classification (AED/C) system designed for urban sound recognition and its integration into smart city applications. Using the UrbanSound8K dataset, five acoustic parameters—Mel Frequency Cepstral Coefficients (MFCC), Mel Spectrogram (MS), Spectral Contrast (SC), Tonal Centroid (TC), and Chromagram (Ch)—were mathematically modeled and applied to feature extraction. Their combinations were tested with three classical machine learning algorithms, Support Vector Machines (SVM), Random Forest (RF), and Naive Bayes (NB), as well as a deep learning approach, Convolutional Neural Networks (CNN). A total of 52 models built with the three ML algorithms were analyzed, along with 4 CNN models. The MFCC-based CNN models showed the highest accuracy, achieving up to 92.68% on test data, an improvement of approximately 2% over prior CNN-based approaches reported in similar studies. Additionally, the number of trained models, 56 in total, exceeds that of comparable research, ensuring more robust performance validation and statistical reliability. Real-time validation confirmed applicability to IoT devices, and a low-cost wireless sensor unit (WSU) was developed with fog and cloud computing for scalable data processing. The constructed WSU demonstrates a cost reduction of at least a factor of four compared with previously developed units, while maintaining good performance, enabling broader deployment potential in smart city applications. The findings demonstrate the potential of AI-based AED/C systems for continuous, source-specific noise classification, supporting sustainable urban planning and improved environmental management in smart cities.
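The MFCC-plus-CNN configuration highlighted above can be sketched roughly as below; this is a minimal illustration assuming librosa and PyTorch, with layer and input sizes chosen arbitrarily rather than taken from the paper.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# MFCC "image" for one 4-second UrbanSound8K-style clip (synthetic here).
sr = 22050
y = np.random.randn(4 * sr).astype(np.float32)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)            # (40, frames)
x = torch.tensor(mfcc).unsqueeze(0).unsqueeze(0)              # (1, 1, 40, frames)

class SmallCNN(nn.Module):
    """Tiny 2D CNN over MFCC maps; 10 outputs match UrbanSound8K's class count."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = SmallCNN()(x)
print(logits.shape)   # torch.Size([1, 10])
```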

17 pages, 2207 KB  
Article
Water Content Detection of Red Sandstone Based on Shock Acoustic Sensing and Convolutional Neural Network
by Zhaokang Qiu, Yang Liu, Yi Zhang, Xueqi Zhao, Dongdong Chen and Shengwu Tu
Sensors 2025, 25(23), 7164; https://doi.org/10.3390/s25237164 - 24 Nov 2025
Viewed by 274
Abstract
To address the changes in the physical and mechanical properties of red sandstone when it comes into contact with water during construction projects, this paper proposes a moisture content detection method for red sandstone based on the knocking method. Taking red sandstone as the research object, this study explores a moisture content detection approach by combining the knocking method with Convolutional Neural Network and Support Vector Machine algorithms (CNN-SVM). Specifically, the surface of red sandstone specimens is struck with a knocking hammer, and the acoustic signals generated during the knocking process are precisely captured with a microphone. Effective detection of the moisture content in red sandstone is then achieved through feature extraction from the knocking sound signals and a Convolutional Neural Network classification model. The method is simple to operate; by combining modern signal processing techniques with the CNN-SVM model, it enables accurate identification and non-destructive testing of the moisture content in red sandstone even with small sample datasets. Mel Frequency Cepstral Coefficients (MFCCs) and the Continuous Wavelet Transform (CWT) were separately used as features for detecting red sandstone specimens with different moisture contents. The detection results show that the classification accuracy of red sandstone moisture content using MFCCs as the feature reaches 94.4%, significantly outperforming the classification method using CWT as the feature. This study validates the effectiveness and reliability of the proposed method, providing a novel and efficient approach for rapid and non-destructive detection of the moisture content in red sandstone.
(This article belongs to the Section Physical Sensors)
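For the feature-extraction-plus-classifier idea in this entry, the sketch below classifies knock-sound recordings by moisture class from MFCC statistics with an SVM. It assumes librosa and scikit-learn, uses synthetic damped knocks, and stands in for (rather than reproduces) the paper's CNN-SVM model.

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

sr = 44100
rng = np.random.default_rng(1)

def knock(decay):
    """Synthetic damped knock; decay loosely stands in for moisture-dependent damping."""
    t = np.linspace(0, 0.25, int(0.25 * sr), endpoint=False)
    return np.exp(-decay * t) * np.sin(2 * np.pi * 1500 * t) + 0.01 * rng.normal(size=t.size)

def mfcc_vec(y):
    return librosa.feature.mfcc(y=y.astype(np.float32), sr=sr, n_mfcc=20).mean(axis=1)

# Two moisture classes emulated by different damping factors.
X = np.vstack([mfcc_vec(knock(d)) for d in ([20.0] * 30 + [60.0] * 30)])
labels = np.array([0] * 30 + [1] * 30)

print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=5).mean())
```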

15 pages, 1109 KB  
Article
A Novel Unsupervised You Only Listen Once (YOLO) Machine Learning Platform for Automatic Detection and Characterization of Prominent Bowel Sounds Towards Precision Medicine
by Gayathri Yerrapragada, Jieun Lee, Mohammad Naveed Shariff, Poonguzhali Elangovan, Keerthy Gopalakrishnan, Avneet Kaur, Divyanshi Sood, Swetha Rapolu, Jay Gohri, Gianeshwaree Alias Rachna Panjwani, Rabiah Aslam Ansari, Jahnavi Mikkilineni, Naghmeh Asadimanesh, Thangeswaran Natarajan, Jayarajasekaran Janarthanan, Shiva Sankari Karuppiah, Vivek N. Iyer, Scott A. Helgeson, Venkata S. Akshintala and Shivaram P. Arunachalam
Bioengineering 2025, 12(11), 1271; https://doi.org/10.3390/bioengineering12111271 - 19 Nov 2025
Viewed by 759
Abstract
Phonoenterography (PEG) offers a non-invasive and radiation-free technique to assess gastrointestinal activity through acoustic signal analysis. In this feasibility study, 110 high-resolution PEG recordings (44.1 kHz, 16-bit) were acquired from eight healthy individuals, yielding 6314 prominent bowel sound (PBS) segments through automated segmentation. Each event was characterized using a 279-feature acoustic profile comprising Mel-frequency cepstral coefficients (MFCCs), their first-order derivatives (Δ-MFCCs), and six global spectral parameters. After normalization and dimensionality reduction with PCA and UMAP (cosine distance, 35 neighbors, minimum distance = 0.01), five clustering strategies were evaluated. K-Means (k = 5) achieved the most favorable balance between cluster quality (silhouette = 0.60; Calinski–Harabasz = 19,165; Davies–Bouldin = 0.68) and interpretability, consistently identifying five acoustic patterns: single-burst, multiple-burst, harmonic, random-continuous, and multi-modal. Temporal modeling of clustered events further revealed distinct sequential dynamics, with single-burst events showing the longest dwell times, random-continuous events the shortest, and strong diagonal elements in the transition matrix confirming measurable state persistence. Frequent transitions between random-continuous and multi-modal states suggested dynamic exchanges between transient and overlapping motility patterns. Together, these findings demonstrate that unsupervised PEG-based analysis can capture both acoustic variability and temporal organization of bowel sounds. This annotation-free approach provides a scalable framework for real-time gastrointestinal monitoring and holds potential for clinical translation in conditions such as postoperative ileus, bowel obstruction, irritable bowel syndrome, and inflammatory bowel disease.
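The dimensionality-reduction and clustering chain summarized above (features, then PCA, then UMAP, then K-Means scored by silhouette) might look roughly like the sketch below. It assumes scikit-learn plus the umap-learn package and runs on random feature vectors rather than real PEG segments.

```python
import numpy as np
import umap                                    # pip install umap-learn
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 279))                # stand-in for 279-feature event profiles

Xs = StandardScaler().fit_transform(X)
Xp = PCA(n_components=30, random_state=0).fit_transform(Xs)
emb = umap.UMAP(n_neighbors=35, min_dist=0.01, metric="cosine",
                random_state=0).fit_transform(Xp)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)
print("silhouette:", silhouette_score(emb, labels))
```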

25 pages, 5621 KB  
Article
Balanced Neonatal Cry Classification: Integrating Preterm and Full-Term Data for RDS Screening
by Somaye Valizade Shayegh and Chakib Tadj
Information 2025, 16(11), 1008; https://doi.org/10.3390/info16111008 - 19 Nov 2025
Viewed by 346
Abstract
Respiratory distress syndrome (RDS) is one of the most serious neonatal conditions, frequently leading to respiratory failure and death in low-resource settings. Early detection is therefore critical, particularly where access to advanced diagnostic tools is limited. Recent advances in machine learning have enabled non-invasive neonatal cry diagnostic systems (NCDSs) for early screening. To the best of our knowledge, this is the first cry-based RDS detection study to include both preterm and full-term infants in a subject-balanced design, using 76 neonates (38 RDS, 38 healthy; 19 per subgroup) and 8534 expiratory cry segments (4267 per class). Cry waveforms were converted to mono, high-pass-filtered, and segmented to isolate expiratory units. Mel-Frequency Cepstral Coefficients (MFCCs) and Filterbank (FBANK) features were extracted and transformed into fixed-dimensional embeddings using a lightweight X-vector model with mean-SD or attention-based pooling, followed by a binary classifier. Model parameters were optimized via grid search. Performance was evaluated using accuracy, precision, recall, F1-score, and ROC–AUC under stratified 10-fold cross-validation. MFCC + mean–SD achieved 93.59 ± 0.48% accuracy, while MFCC + attention reached 93.53 ± 0.52% accuracy with slightly higher precision, reducing false RDS alarms and improving clinical reliability. To enhance interpretability, Integrated Gradients were applied to MFCC and FBANK features to reveal the spectral regions contributing most to the decision. Overall, the proposed NCDS reliably distinguishes RDS from healthy cries and generalizes across neonatal subgroups despite the greater variability in preterm vocalizations.
(This article belongs to the Special Issue Biomedical Signal and Image Processing with Artificial Intelligence)
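A key step in this entry is turning variable-length frame-level features into a fixed-dimensional embedding via mean-SD (statistics) pooling, as in X-vector systems. The sketch below shows that pooling in PyTorch under the assumption of MFCC-like inputs; it is not the paper's full X-vector stack.

```python
import torch
import torch.nn as nn

class StatsPoolingEmbedder(nn.Module):
    """Frame encoder + mean/std pooling -> fixed-size utterance embedding."""
    def __init__(self, n_feats=20, hidden=64, emb_dim=32):
        super().__init__()
        self.frame_net = nn.Sequential(nn.Linear(n_feats, hidden), nn.ReLU())
        self.proj = nn.Linear(2 * hidden, emb_dim)    # concat of mean and std

    def forward(self, x):                              # x: (batch, frames, n_feats)
        h = self.frame_net(x)                          # (batch, frames, hidden)
        stats = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=1)
        return self.proj(stats)                        # (batch, emb_dim)

# Four utterances of 120 frames each; a binary classifier would sit on top.
emb = StatsPoolingEmbedder()(torch.randn(4, 120, 20))
print(emb.shape)    # torch.Size([4, 32])
```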

14 pages, 1737 KB  
Article
Classification of Speech and Associated EEG Responses from Normal-Hearing and Cochlear Implant Talkers Using Support Vector Machines
by Shruthi Raghavendra, Sungmin Lee and Chin-Tuan Tan
Audiol. Res. 2025, 15(6), 158; https://doi.org/10.3390/audiolres15060158 - 18 Nov 2025
Viewed by 396
Abstract
Background/Objectives: Speech produced by individuals with hearing loss differs notably from that of normal-hearing (NH) individuals. Although cochlear implants (CIs) provide sufficient auditory input to support speech acquisition and control, there remains considerable variability in speech intelligibility among CI users. As a result, speech produced by CI talkers often exhibits distinct acoustic characteristics compared to that of NH individuals. Methods: Speech data were obtained from eight cochlear-implant (CI) and eight normal-hearing (NH) talkers, while electroencephalogram (EEG) responses were recorded from 11 NH listeners exposed to the same speech stimuli. Support Vector Machine (SVM) classifiers with four kernel functions (Linear, Polynomial, Gaussian, and Radial Basis Function) were evaluated under 3-fold cross-validation, using classification accuracy as the performance metric, to distinguish speech produced by NH and CI talkers. Six acoustic features—Log Energy, Zero-Crossing Rate (ZCR), Pitch, Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Predictive Cepstral Coefficients (PLP-CC)—were extracted. The same features were also extracted from the EEG recordings of the NH listeners, leveraging the assumption of quasi-stationarity over short time windows. Results: Classification of speech signals using SVMs yielded the highest accuracies of 100% and 94% for the Energy and MFCC features, respectively, using the Gaussian and RBF kernels. EEG responses to speech achieved classification accuracies exceeding 70% for the ZCR and Pitch features using the same kernels. Other features such as LPC and PLP-CC yielded moderate to low classification performance. Conclusions: The results indicate that both speech-derived and EEG-derived features can effectively differentiate between CI and NH talkers. Among the tested kernels, Gaussian and RBF provided superior performance, particularly when using the Energy and MFCC features. These findings support the application of SVMs for multimodal classification in hearing research, with potential applications in improving CI speech processing and auditory rehabilitation.
(This article belongs to the Section Hearing)
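The kernel comparison described in the Methods can be sketched as below, assuming scikit-learn and random feature vectors in place of the speech/EEG features. Note that scikit-learn's "rbf" kernel is its Gaussian kernel, so the Gaussian/RBF distinction drawn in the abstract is collapsed here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Stand-in feature matrix: 80 utterances x 13 features, two talker groups.
X = np.vstack([rng.normal(0.0, 1.0, (40, 13)), rng.normal(0.7, 1.0, (40, 13))])
y = np.array([0] * 40 + [1] * 40)

for kernel in ("linear", "poly", "rbf"):
    acc = cross_val_score(SVC(kernel=kernel), X, y, cv=3).mean()
    print(f"{kernel:>6}: {acc:.3f}")
```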

18 pages, 3175 KB  
Article
AudioFakeNet: A Model for Reliable Speaker Verification in Deepfake Audio
by Samia Dilbar, Muhammad Ali Qureshi, Serosh Karim Noon and Abdul Mannan
Algorithms 2025, 18(11), 716; https://doi.org/10.3390/a18110716 - 13 Nov 2025
Viewed by 757
Abstract
Deepfake audio refers to the generation of voice recordings using deep neural networks that replicate a specific individual’s voice, often for deceptive or fraudulent purposes. Although this has been an area of research for quite some time, deepfakes still pose substantial challenges for reliable speaker authentication. To address the issue, we propose AudioFakeNet, a hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) units, and Multi-Head Attention (MHA) mechanisms for robust deepfake detection. The CNN extracts spatial and spectral features, the LSTM captures temporal dependencies, and the MHA focuses the model on informative audio segments. The model is trained using Mel-Frequency Cepstral Coefficients (MFCCs) from a publicly available dataset and validated on a self-collected dataset, ensuring reproducibility. Performance comparisons with state-of-the-art machine learning and deep learning models show that the proposed AudioFakeNet achieves higher accuracy, better generalization, and a lower Equal Error Rate (EER). Its modular design allows for broader adaptability in fake-audio detection tasks, offering significant potential across diverse speech synthesis applications.
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
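A rough PyTorch sketch of the CNN + LSTM + multi-head-attention arrangement named in this entry is shown below; the layer sizes are arbitrary assumptions, not the AudioFakeNet configuration.

```python
import torch
import torch.nn as nn

class CnnLstmAttn(nn.Module):
    """1D CNN over MFCC frames -> LSTM -> multi-head self-attention -> binary logit."""
    def __init__(self, n_mfcc=40, hidden=64, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv1d(n_mfcc, hidden, kernel_size=3, padding=1),
                                 nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, n_mfcc, frames)
        h = self.cnn(x).transpose(1, 2)   # (batch, frames, hidden)
        h, _ = self.lstm(h)
        h, _ = self.attn(h, h, h)         # self-attention over frames
        return self.head(h.mean(dim=1))   # pooled real/fake logit per clip

logit = CnnLstmAttn()(torch.randn(2, 40, 200))
print(logit.shape)    # torch.Size([2, 1])
```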

19 pages, 824 KB  
Article
Cuffless Blood Pressure Estimation from Phonocardiogram Signals Using Deep Learning with Adaptive Feature Recalibration
by Talit Jumphoo, Atcharawan Rattanasak, Kasidit Kokkhunthod, Wongsathon Pathonsuwan, Rattikan Nualsri, Sittinon Thanonklang, Pattama Tongdee, Porntip Nimkuntod, Monthippa Uthansakul and Peerapong Uthansakul
Symmetry 2025, 17(11), 1943; https://doi.org/10.3390/sym17111943 - 13 Nov 2025
Viewed by 507
Abstract
Blood pressure (BP) monitoring is essential for cardiovascular health management, yet traditional cuff-based methods face limitations including patient discomfort and inapplicability to certain populations. This study presents a deep learning framework for cuffless BP estimation using phonocardiogram (PCG) signals. The proposed model integrates convolutional neural networks (CNNs) with Squeeze-and-Excitation (SE) blocks and demographic information to enhance prediction accuracy. Mel-Frequency Cepstral Coefficients (MFCCs), along with their delta and delta–delta coefficients, were employed to capture comprehensive acoustic characteristics of heart sounds. The results demonstrated that the proposed model achieved high predictive accuracy and strong consistency with reference BP measurements. Component analysis confirmed that the inclusion of SE blocks provided substantial performance gains, while demographic information further improved prediction stability. Clinical validation also verified that the model maintained close agreement with true BP values across the tested population, showing significant improvement over the baseline CNN implementation. These findings suggest potential for accessible, non-invasive BP monitoring systems suitable for continuous health tracking.
(This article belongs to the Section Computer)
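Two pieces of this entry lend themselves to a short sketch: stacking MFCCs with their delta and delta-delta coefficients, and Squeeze-and-Excitation channel reweighting. The example below assumes librosa and PyTorch, uses a synthetic signal in place of a PCG recording, and is not the authors' network.

```python
import numpy as np
import librosa
import torch
import torch.nn as nn

# MFCCs plus delta and delta-delta, stacked along the feature axis.
sr = 4000
pcg = np.random.randn(10 * sr).astype(np.float32)      # stand-in for a PCG recording
mfcc = librosa.feature.mfcc(y=pcg, sr=sr, n_mfcc=13)
feats = np.vstack([mfcc, librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)])

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels from globally pooled statistics."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU(),
                                nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                               # x: (batch, channels, H, W)
        w = self.fc(x.mean(dim=(2, 3)))                 # squeeze -> excitation weights
        return x * w.unsqueeze(-1).unsqueeze(-1)        # channel-wise rescaling

conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
x = torch.tensor(feats).unsqueeze(0).unsqueeze(0)       # (1, 1, 39, frames)
out = SEBlock(channels=8)(conv(x))
print(feats.shape, out.shape)
```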

18 pages, 1138 KB  
Article
Speech-Based Depression Recognition in Hikikomori Patients Undergoing Cognitive Behavioral Therapy
by Samara Soares Leal, Stavros Ntalampiras, Maria Gloria Rossetti, Antonio Trabacca, Marcella Bellani and Roberto Sassi
Appl. Sci. 2025, 15(21), 11750; https://doi.org/10.3390/app152111750 - 4 Nov 2025
Viewed by 527
Abstract
Major depressive disorder (MDD) affects approximately 4.4% of the global population. Its prevalence is increasing among adolescents and has led to the psychosocial condition known as hikikomori. MDD is typically assessed by self-report questionnaires, which, although informative, are subject to evaluator bias and subjectivity. To address these limitations, recent studies have explored machine learning (ML) for automated MDD detection. Among the input data used, speech signals stand out due to their low cost and minimal intrusiveness. However, many speech-based approaches lack integration with cognitive behavioral therapy (CBT) and adherence to evidence-based, patient-centered care, often aiming to replace rather than support clinical monitoring. In this context, we propose ML models to assess MDD in hikikomori patients using speech data from a real-world clinical trial. The trial is conducted in Italy, supervised by physicians, and comprises an eight-session CBT plan grounded in clinical evidence and patient-centered practices. Patients’ speech is recorded during therapy, and Mel-Frequency Cepstral Coefficients (MFCCs) and wav2vec 2.0 embeddings are extracted to train the models. The results show that the Multi-Layer Perceptron (MLP) predicted depression outcomes with a Root Mean Squared Error (RMSE) of 0.064 using only MFCCs from the first session, suggesting that early-session speech may be valuable for outcome prediction. When considering the entire CBT treatment (i.e., all sessions), the MLP achieved an RMSE of 0.063 using MFCCs and a lower RMSE of 0.057 with wav2vec 2.0, an improvement of approximately 9.5%. To aid the interpretability of the treatment outcomes, a binary task was conducted, in which Logistic Regression (LR) achieved 70% recall in predicting depression improvement among young adults using wav2vec 2.0. These findings position speech as a valuable predictive tool in clinical informatics, potentially supporting clinicians in anticipating treatment response.
(This article belongs to the Special Issue Advances in Audio Signal Processing)
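The regression setup in this entry (per-session speech features predicting a depression outcome score, evaluated by RMSE) is sketched below with scikit-learn. MFCC statistics from synthetic audio stand in for the clinical recordings; wav2vec 2.0 embeddings could be substituted for the feature matrix.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

sr = 16000
rng = np.random.default_rng(0)

def speech_features(y):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# Synthetic "sessions" and outcome scores in [0, 1].
X = np.vstack([speech_features(rng.normal(scale=0.1, size=3 * sr).astype(np.float32))
               for _ in range(60)])
scores = rng.uniform(0, 1, size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X, scores, test_size=0.25, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, mlp.predict(X_te)) ** 0.5
print("RMSE:", round(rmse, 3))
```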

11 pages, 744 KB  
Proceeding Paper
A Deep Learning Framework for Early Detection of Potential Cardiac Anomalies via Murmur Pattern Analysis in Phonocardiograms
by Aymane Edder, Fatima-Ezzahraa Ben-Bouazza, Oumaima Manchadi, Youssef Ait Bigane, Djeneba Sangare and Bassma Jioudi
Eng. Proc. 2025, 112(1), 63; https://doi.org/10.3390/engproc2025112063 - 31 Oct 2025
Viewed by 368
Abstract
Heart murmurs, resulting from turbulent blood flow within the cardiac structure, represent some of the initial acoustic manifestations of potential underlying cardiovascular anomalies, such as arrhythmias. This research presents a deep learning framework aimed at the early detection of potential cardiac anomalies through the analysis of murmur patterns in phonocardiogram (PCG) signals. Our methodology employs a spectro-temporal feature fusion technique that integrates Mel spectrograms, Mel Frequency Cepstral Coefficients (MFCCs), Root Mean Square (RMS) energy, and Power Spectral Density (PSD) representations. The features are derived from segmented 5-second PCG windows and fed into a two-dimensional convolutional neural network (CNN) for classification. To mitigate class imbalance and enhance generalization, we employ data augmentation techniques, including pitch shifting and noise injection. The model was trained and evaluated on a carefully selected subset of the CirCor DigiScope dataset. The experimental findings indicate robust performance, with a classification accuracy of 92.40% and a cross-entropy loss of 0.2242. The results indicate that murmur-informed analysis of PCG signals may serve as an effective non-invasive method for the early screening of conditions such as arrhythmias, particularly in resource-limited clinical environments.
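The feature-fusion and augmentation steps above can be sketched as follows, assuming librosa and SciPy; the window length and parameter choices are illustrative, not the paper's exact configuration.

```python
import numpy as np
import librosa
from scipy.signal import welch

sr = 4000
win = np.random.randn(5 * sr).astype(np.float32)        # stand-in 5-second PCG window

def fused_features(y, sr):
    """Stack Mel-spectrogram, MFCC, RMS, and PSD descriptors for one window."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    rms = librosa.feature.rms(y=y)
    _, psd = welch(y, fs=sr, nperseg=512)
    return {"mel": mel, "mfcc": mfcc, "rms": rms, "psd": psd}

# Simple augmentations used to balance classes: pitch shift and additive noise.
augmented = [librosa.effects.pitch_shift(win, sr=sr, n_steps=1),
             win + 0.005 * np.random.randn(win.size).astype(np.float32)]
feats = [fused_features(w, sr) for w in [win] + augmented]
print({k: v.shape for k, v in feats[0].items()}, len(feats))
```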

11 pages, 703 KB  
Article
Distinguishing Between Healthy and Unhealthy Newborns Based on Acoustic Features and Deep Learning Neural Networks Tuned by Bayesian Optimization and Random Search Algorithm
by Salim Lahmiri, Chakib Tadj and Christian Gargour
Entropy 2025, 27(11), 1109; https://doi.org/10.3390/e27111109 - 27 Oct 2025
Viewed by 394
Abstract
Voice analysis and classification for biomedical diagnosis is receiving growing attention as a way to assist physicians in clinical decision-making. In this study, we develop and test deep feedforward neural networks (DFFNN) to distinguish between healthy and unhealthy newborns. The DFFNNs are trained with acoustic features measured from newborn cries, including auditory-inspired amplitude modulation (AAM), Mel Frequency Cepstral Coefficients (MFCC), and prosody. The DFFNN configuration is optimized using Bayesian optimization (BO) and a random search (RS) algorithm. Under both optimization techniques, the experimental results show that the DFFNN yields the highest classification rate when trained with all acoustic features. Specifically, the DFFNN-BO and DFFNN-RS achieved 87.80% ± 0.23 and 86.12% ± 0.33 accuracy, respectively, under a ten-fold cross-validation protocol. Both DFFNN-BO and DFFNN-RS outperformed existing approaches tested on the same database.
(This article belongs to the Section Signal and Data Analysis)
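The random-search half of the hyperparameter tuning described above is easy to sketch with scikit-learn (Bayesian optimization would require an extra package such as scikit-optimize). The feature matrix below is random data standing in for cry-derived MFCC/AAM/prosody features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                 # stand-in acoustic feature vectors
y = rng.integers(0, 2, size=200)               # healthy vs. unhealthy labels

param_space = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32), (128, 64)],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
}
search = RandomizedSearchCV(
    MLPClassifier(max_iter=1000, random_state=0),
    param_space, n_iter=10, cv=10, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```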

16 pages, 2589 KB  
Article
A Laser-Induced Audible Metal Defect Detection Method Based on Spectral Discriminative Weights
by Bin Zhu, Tao Liu, Wuyue Hou, Sirui Wang, Yuhua Hang, Lei Shao, Zhen Cai, Jinna Mei and Xueqin Chen
Electronics 2025, 14(21), 4175; https://doi.org/10.3390/electronics14214175 - 25 Oct 2025
Viewed by 377
Abstract
This paper proposes a metal defect detection method based on laser-induced audible sound testing (LAST). Defective and defect-free martensitic stainless-steel cubes were used as study samples, and the spectral characteristics of the acoustic signals generated under laser irradiation were comparatively analyzed. Based on F-ratio analysis, weighting curves characterizing the discrimination capability of each frequency band were calculated. Subsequently, nonlinear filter banks tailored to the degree of spectral discrimination were designed according to these weights. Finally, a globally weighted cepstral coefficient (GWCC) extraction algorithm for laser-induced acoustic signals was developed to determine whether defects are present in metals. Experimental results show that the recognition rate of defective samples based on GWCC features reached 94%, higher than that achieved with traditional acoustic features, effectively enhancing feature discriminability. The results of this study demonstrate that applying LAST to metal defect detection is feasible. The method leverages laser-generated acoustic signals from a more comprehensive and economical perspective, pioneering a new solution for non-destructive testing of metal defects.
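The F-ratio weighting idea in this entry, scoring each frequency band by how well it separates defective from defect-free samples, can be sketched with NumPy as below. The spectra are synthetic, and the single-factor F-ratio (between-class over within-class variance per bin) is a common simplification, not necessarily the paper's exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 256

# Synthetic magnitude spectra: defect-free vs. defective samples differ around bin 100.
base = np.abs(rng.normal(1.0, 0.1, size=(30, n_bins)))
defect = np.abs(rng.normal(1.0, 0.1, size=(30, n_bins)))
defect[:, 90:110] += 0.5

def f_ratio(a, b):
    """Per-bin between-class variance over pooled within-class variance."""
    grand = np.mean(np.vstack([a, b]), axis=0)
    between = len(a) * (a.mean(0) - grand) ** 2 + len(b) * (b.mean(0) - grand) ** 2
    within = ((a - a.mean(0)) ** 2).sum(0) + ((b - b.mean(0)) ** 2).sum(0)
    return between / (within / (len(a) + len(b) - 2))

weights = f_ratio(base, defect)
weights /= weights.max()          # normalized weighting curve over frequency bins
print("most discriminative bin:", int(weights.argmax()))
```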

39 pages, 4554 KB  
Article
A Robust and Efficient Workflow for Heart Valve Disease Detection from PCG Signals: Integrating WCNN, MFCC Optimization, and Signal Quality Evaluation
by Shin-Chi Lai, Yen-Ching Chang, Ying-Hsiu Hung, Szu-Ting Wang, Yao-Feng Liang, Li-Chuan Hsu, Ming-Hwa Sheu and Chuan-Yu Chang
Sensors 2025, 25(21), 6562; https://doi.org/10.3390/s25216562 - 24 Oct 2025
Viewed by 620
Abstract
This study proposes a comprehensive and computationally efficient system for the recognition of heart valve diseases (HVDs) in phonocardiogram (PCG) signals, emphasizing an end-to-end workflow suitable for real-world deployment. The core of the system is a lightweight weighted convolutional neural network (WCNN) featuring a key weighting calculation (KWC) layer, which enhances noise robustness by adaptively weighting feature map channels based on global average pooling. The proposed system incorporates optimized feature extraction using Mel-frequency cepstral coefficients (MFCCs) guided by GradCAM, and a band energy ratio (BER) metric to assess signal quality, showing that lower BER values are associated with higher misclassification rates due to noise. Experimental results demonstrated classification accuracies of 99.6% and 90.74% on the GitHub PCG and PhysioNet/CinC Challenge 2016 databases, respectively, where the models were trained and tested independently. The proposed model achieved superior accuracy using significantly fewer parameters (312,357) and lower computational cost (4.5 M FLOPs) compared with previously published research. Compared with the model proposed by Karhade et al., the proposed model uses 74.9% fewer parameters and 99.3% fewer FLOPs. Furthermore, the proposed model was implemented on a Raspberry Pi, achieving real-time HVD detection with a detection time of only 1.87 ms for a 1.4 s signal.
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)
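The band energy ratio (BER) signal-quality check mentioned above can be illustrated with SciPy as below; the band limits are illustrative assumptions (energy in a presumed heart-sound band versus total energy), since the entry does not give the exact definition.

```python
import numpy as np
from scipy.signal import welch

sr = 2000
t = np.arange(0, 5, 1 / sr)
# Synthetic PCG-like signal: gated low-frequency heart-sound bursts plus broadband noise.
pcg = np.sin(2 * np.pi * 40 * t) * (np.sin(2 * np.pi * 1.2 * t) > 0.9) \
      + 0.05 * np.random.randn(t.size)

def band_energy_ratio(x, fs, band=(20.0, 200.0)):
    """Fraction of spectral power inside an assumed heart-sound band."""
    f, psd = welch(x, fs=fs, nperseg=1024)
    in_band = (f >= band[0]) & (f <= band[1])
    return psd[in_band].sum() / psd.sum()

print("BER:", round(band_energy_ratio(pcg, sr), 3))
```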

21 pages, 2200 KB  
Article
Segmented vs. Non-Segmented Heart Sound Classification: Impact of Feature Extraction and Machine Learning Models
by Ceyda Boz and Yucel Kocyigit
Appl. Sci. 2025, 15(20), 11047; https://doi.org/10.3390/app152011047 - 15 Oct 2025
Cited by 1 | Viewed by 656
Abstract
Cardiovascular diseases remain a leading cause of mortality worldwide, emphasizing the importance of early diagnosis. Heart sound analysis offers a non-invasive avenue for detecting cardiac abnormalities. This study systematically evaluates the effect of segmentation on phonocardiogram (PCG) classification performance. Unlike conventional fixed-window or HSMM-based methods, a data-adaptive segmentation approach combining Shannon energy and Otsu thresholding is proposed. After segmentation, features are extracted using Empirical Mode Decomposition (EMD) and Mel-Frequency Cepstral Coefficients (MFCCs), followed by classification with k-Nearest Neighbor (kNN), Support Vector Machine (SVM), and Random Forest (RF). Experiments on the PhysioNet/CinC 2016 and Pascal datasets revealed that segmentation markedly enhances classification accuracy. The optimal results were achieved using kNN with segmented EMD features, attaining 99.97% accuracy, 99.98% sensitivity, and 99.96% specificity; segmented MFCC features also provided high accuracy (99.37%). In contrast, non-segmented models yielded substantially lower performance. Principal Component Analysis (PCA) was applied for dimensionality reduction, preserving classification efficiency while minimizing computational cost. These findings demonstrate the critical importance of effective segmentation in heart sound classification and establish the proposed Shannon–Otsu-based method as a robust, interpretable, and resource-efficient tool for automated cardiac diagnostics. On annotated PhysioNet recordings, the segmentation achieved approximately 90% sensitivity for S1/S2 detection. A limitation is the absence of full segment annotations in the Pascal dataset, which prevents comprehensive timing-error evaluation.
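The Shannon-energy-plus-Otsu segmentation idea can be sketched as below; it assumes NumPy and scikit-image's threshold_otsu, uses a synthetic signal, and omits the smoothing and duration rules a real S1/S2 detector would need.

```python
import numpy as np
from skimage.filters import threshold_otsu   # pip install scikit-image

sr = 2000
t = np.arange(0, 3, 1 / sr)
# Synthetic heart-sound-like bursts on a quiet background.
pcg = 0.02 * np.random.randn(t.size)
for onset in (0.2, 0.5, 1.0, 1.3, 1.8, 2.1):
    idx = (t >= onset) & (t < onset + 0.08)
    pcg[idx] += np.sin(2 * np.pi * 60 * t[idx]) * np.hanning(idx.sum())

x = pcg / np.max(np.abs(pcg))
shannon = -x**2 * np.log(x**2 + 1e-12)        # Shannon energy per sample
# Smooth with a short moving average to form an envelope.
envelope = np.convolve(shannon, np.ones(100) / 100, mode="same")

mask = envelope > threshold_otsu(envelope)    # Otsu picks the on/off threshold
onsets = np.flatnonzero(np.diff(mask.astype(int)) == 1) / sr
print("detected segment onsets (s):", np.round(onsets, 2))
```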
