Search Results (66)

Search Parameters:
Keywords = voice signal classification

26 pages, 3315 KB  
Article
Remote Tower Air Traffic Controller Fatigue Detection Based on Eye-Tracking and EEG Fusion
by Dajiang Song, Weijun Pan, Zirui Yin, Boyuan Han and Huafei Gao
Aerospace 2026, 13(6), 549; https://doi.org/10.3390/aerospace13060549 - 12 Jun 2026
Viewed by 233
Abstract
Remote tower operations require air traffic controllers to maintain continuous visual monitoring and integrate information from panoramic displays, radar data, flight strips, and voice communication. Such screen-mediated and sustained surveillance tasks may lead to covert fatigue, which is difficult to capture using a single physiological or behavioral signal. To address this issue, this study proposes a Gated EEG–Eye Fusion Network (GEEF-Net) for window-level fatigue detection in remote tower controllers. EEG and eye-tracking signals were synchronously collected during simulated remote tower tasks and segmented into 5 s windows with a 2 s step. For each window, 53 EEG features and 47 eye-tracking features were extracted to construct a 100-dimensional multimodal representation. GEEF-Net adopts a lightweight modality-gating mechanism to adaptively weight EEG and eye-tracking representations before fatigue classification. Under the main subject-dependent validation setting, GEEF-Net achieved an Accuracy of 0.883, an F1-score of 0.788, and a ROC-AUC of 0.944, outperforming EEG-only, eye-only, and early-fusion baselines in most overall metrics. The gating analysis indicated that eye-tracking features received a higher average weight than EEG features, suggesting the importance of visual behavior in remote tower fatigue detection. Cross-subject validation showed that individual differences remain a major challenge, while few-shot subject-specific calibration improved model adaptation when limited target-subject samples were available. These findings suggest that EEG–eye-tracking fusion with lightweight modality gating is a feasible approach for fatigue detection in simulated remote tower tasks. However, larger datasets and operationally realistic validation considering shift work, circadian effects, and operational pressure are still required before the approach can be considered operationally reliable. Full article
(This article belongs to the Section Air Traffic and Transportation)
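
The windowing step described in the abstract above (5 s windows with a 2 s step) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `window_bounds`, the sampling rate used in the usage note, and the choice to drop a trailing partial window are all assumptions.

```python
def window_bounds(n_samples, sample_rate_hz, window_s=5.0, step_s=2.0):
    """Return (start, end) sample indices of overlapping windows.

    `end` is exclusive. Only full-length windows are returned; a trailing
    partial window is dropped (an assumption, the abstract does not say
    how the tail of a recording is handled).
    """
    win = int(round(window_s * sample_rate_hz))
    step = int(round(step_s * sample_rate_hz))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    # range() is empty when the recording is shorter than one window
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]
```

For a hypothetical 15 s recording sampled at 100 Hz, `window_bounds(1500, 100)` yields six windows, and consecutive windows overlap by 3 s. Per-window feature extraction would then run on each slice.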

33 pages, 1935 KB  
Article
Smart Industrial Safety in High-Noise Environments Using IoT and AI
by Alessia Bramanti, Luca Catarinucci, Mattia Cotardo, Rosaria Del Sorbo, Claudia Giliberti, Mazhar Jan, Luca Landi, Raffaele Mariconte, Teodoro Montanaro, Federico Paolucci, Luigi Patrono, Davide Rollo, Francesco Antonio Salzano and Ilaria Sergi
Electronics 2026, 15(6), 1311; https://doi.org/10.3390/electronics15061311 - 20 Mar 2026
Cited by 1 | Viewed by 901
Abstract
High noise levels in industrial workplaces pose significant challenges to occupational safety, particularly with hearing protection and effective communication. Traditional hearing protection devices, while effectively attenuating harmful noise, often compromise situational awareness by excessively isolating workers from the acoustic environment and preventing the perception of critical auditory cues (e.g., emergency alarms), thereby introducing additional safety risks. This paper presents a smart industrial safety system that integrates Internet of Things (IoT) and artificial intelligence (AI) and is based on intelligent hearing protection devices to (a) selectively attenuate hazardous industrial noise while (b) preserving human speech and (c) reproduce targeted audio notifications to workers near malfunctioning or hazardous machinery. A real-time voice activity detection (VAD) model is employed to distinguish vocal components from background noise to adaptively control digital signal processing filters. Furthermore, indoor localization enables the delivery of targeted audio messages to workers in proximity to relevant events. Experimental evaluations on embedded hardware demonstrate that the selected VAD model operates well within real-time constraints and effectively supports dynamic noise filtering. Objective evaluation of the filtering stage using Mean Opinion Score (MOS), signal-to-noise ratio (SNR), and Harmonics-to-Noise Ratio (HNR) shows consistent quality improvements across all tested conditions, with MOS gains up to +118%, SNR increases between +10.4 and +29.0 dB, and HNR improvements up to +6.22 dB, indicating enhanced speech intelligibility and preservation of voice harmonic structure even under high-noise scenarios. 
Robustness validation of the VAD module across varying acoustic conditions confirms reliable speech detection performance, achieving perfect classification at +10 dB SNR, very high accuracy at 0 dB (98.3%, ROC AUC 0.998), and stable operation even at −7 dB SNR (79.8% accuracy, ROC AUC 0.878). The proposed architecture achieves a balanced trade-off between hearing protection and speech intelligibility while enhancing the effectiveness of safety communications in noisy industrial environments. Full article

25 pages, 918 KB  
Review
Parkinson’s Disease Detection Using Machine Learning Algorithms: A Comprehensive Review
by Jelica Cincović, Miloš Cvetanović, Milica Djurić-Jovičić, Nebojsa Bacanin and Boško Nikolić
Algorithms 2026, 19(3), 193; https://doi.org/10.3390/a19030193 - 4 Mar 2026
Viewed by 1075
Abstract
Parkinson’s disease (PD) is a progressive neurodegenerative disorder in which early detection remains a major clinical challenge due to heterogeneous motor and non-motor manifestations and the lack of reliable biomarkers. In recent years, machine learning (ML) and deep learning (DL) methods have been increasingly investigated as decision-support tools for PD screening using diverse clinical and behavioral data. This review synthesizes PD detection studies published between 2017 and 2025, systematically analyzing 32 representative works across multiple modalities, including MRI, PET, EEG, REM sleep biomarkers, voice recordings, gait signals, handwriting/drawing tasks, and finger-tapping measurements. Across the reviewed literature, high classification performance is frequently reported, with CNN-based and hybrid DL architectures achieving particularly strong results in imaging and time-series settings, while classical ML approaches such as SVM and ensemble models remain competitive for engineered feature-based datasets. However, the review also reveals major barriers to reliable translation, including small datasets, inconsistent evaluation protocols, limited external validation, and the risk of performance inflation caused by non-subject-independent data splitting. Overall, this review provides a structured and modality-oriented reference of algorithms, datasets, and performance trends, while highlighting key methodological gaps and practical priorities for developing robust and clinically deployable PD detection systems. Full article

21 pages, 1582 KB  
Article
Tile Debonding Detection Based on Acoustic Signal Features and a Dual-Branch Convolutional Neural Network
by Dejiang Wang and Bo Kang
Buildings 2026, 16(4), 870; https://doi.org/10.3390/buildings16040870 - 21 Feb 2026
Viewed by 622
Abstract
Tiles are commonly used as architectural finishing materials, but are prone to debonding defects due to construction and environmental factors in engineering applications. Therefore, effective detection of tile debonding holds significant engineering relevance. This study proposes a tile debonding detection method based on impact sound signal features and a dual-branch convolutional neural network. The sound signals collected through tapping are transformed into two types of two-dimensional feature maps using Mel-frequency cepstral coefficients (MFCCs) and continuous wavelet transform (CWT), which are then fed in parallel into the dual-branch convolutional neural network for feature extraction and fusion. Finally, tile debonding classification is performed in the classifier module. Experimental results show that the proposed model achieves a classification accuracy of 98.5% under laboratory conditions. Moreover, it demonstrates strong robustness under varying noise levels and sound pressure conditions, maintaining an accuracy of 82% in a 75 dB human voice noise environment. Field validation in real-world engineering environments yields an accuracy of 91.5%. These findings indicate that the proposed method, which combines MFCC and CWT features with a dual-branch convolutional neural network architecture, enables high-precision identification of tile debonding defects. Full article

18 pages, 6224 KB  
Article
Voice-Based Pain Level Classification for Sensor-Assisted Intelligent Care
by Andrew Y. Lu and Wei Lu
Sensors 2026, 26(3), 892; https://doi.org/10.3390/s26030892 - 29 Jan 2026
Cited by 2 | Viewed by 707
Abstract
Various sensors are increasingly being adopted to support intelligent healthcare systems, which address the growing problem of staff shortages in assisted-living communities. In this context, detecting and assessing pain remain critical yet challenging tasks in both clinical and non-clinical settings. Traditional approaches such as self-reporting, physiological signal monitoring, and facial expression analysis often face limitations related to accessibility, equipment costs, and the need for professional support. To overcome these challenges, we investigate a sensor-assisted system for pain detection and propose a lightweight framework that enables real-time classification of pain levels using acoustic sensors. Our system exploits the spectral features of voice signals that strongly correlate with pain to train Convolutional Neural Network (CNN) models. The system has been validated through simulations in Jupyter Notebook and a Raspberry Pi-based hardware prototype. The experimental results demonstrate that the proposed three-level pain classification approach obtains an average accuracy of 72.74%, outperforming existing methods with the same pain-level granularity by 18.94–26.74% and achieving performance comparable to that of binary pain detection methods. Our hardware prototype, built from commercial off-the-shelf components for under 100 USD, achieves processing times ranging from approximately 6 to 22 s. In addition to CNN models, our experiments demonstrate that other machine learning algorithms, such as Artificial Neural Networks, XGBoost, Random Forests, and Decision Trees, are also applicable within our pain level classification framework. Full article
(This article belongs to the Special Issue Independent Living: Sensor-Assisted Intelligent Care and Healthcare)

21 pages, 359 KB  
Review
Artificial Intelligence and Neuromuscular Diseases: A Narrative Review
by Donald C. Wunsch, Daniel B. Hier and Donald C. Wunsch
AI Med. 2026, 1(1), 5; https://doi.org/10.3390/aimed1010005 - 27 Jan 2026
Viewed by 2082
Abstract
Neuromuscular diseases are biologically diverse, clinically heterogeneous, and often difficult to diagnose and treat, highlighting the need for computational tools that can help resolve overlapping phenotypes and support timely, mechanism-informed interventions. This narrative review synthesizes recent advances in artificial intelligence (AI) and machine learning applied to neuromuscular diseases across diagnosis, outcome modeling, biomarker development, and therapeutics. AI-based approaches may assist clinical and genetic diagnosis from phenotypic data; however, early phenotype-driven tools have seen limited clinician adoption due to modest accuracy, usability challenges, and poor workflow integration. Electrophysiological studies remain central to diagnosing neuromuscular diseases, and AI shows promise for accurate classification of electrophysiological signals. Predictive models for disease outcome and progression—particularly in amyotrophic lateral sclerosis—are under active investigation, but most remain at an early stage of development and are not yet ready for routine clinical use. Digital biomarkers derived from imaging, gait, voice, and wearable sensors are emerging, with MRI-based quantification of muscle fat replacement representing the most mature and widely accepted application to date. Efforts to apply AI to therapeutic discovery, including drug repurposing and optimization of gene-based therapies, are ongoing but have thus far yielded limited clinical translation. Persistent barriers to broader adoption include disease rarity, data scarcity, heterogeneous acquisition protocols, inconsistent terminology, limited external validation, insufficient model explainability, and lack of seamless integration into clinical workflows. Addressing these challenges is essential to moving AI tools from the laboratory into clinical practice. 
We conclude with a practical checklist of considerations intended to guide the development and adoption of AI tools in neuromuscular disease care. Full article
15 pages, 554 KB  
Article
Exploring Acoustic Correlates of Depression and Preliminary Screening Models Using XGBoost and SHAP
by Kwang-Ho Seok, Jaeeun Shin and Sung-Man Bae
Behav. Sci. 2025, 15(12), 1648; https://doi.org/10.3390/bs15121648 - 30 Nov 2025
Viewed by 704
Abstract
This exploratory study investigated whether voice-derived acoustic features reflect depressive symptom severity and whether they carry preliminary predictive signal for distinguishing individuals with Major Depressive Disorder (MDD) from healthy controls (HC). Using the publicly available MODMA dataset (23 MDD; 29 HC), 6553 acoustic features were extracted with openSMILE. Spearman correlation and group-difference analyses identified several MFCC-derived spectral features as moderately and systematically associated with PHQ-9 scores, indicating their potential relevance as severity-linked acoustic markers. To complement these findings, a supplementary severity-based classification using a PHQ-9 ≥ 10 threshold showed that a logistic regression model trained on the top five correlated MFCC features achieved a cross-validated AUC of 0.78 (SD = 0.15), supporting their association with clinically defined symptom burden. Four machine learning pipelines were further evaluated for an exploratory MDD–HC classification task. Among them, the PCA + XGBoost model demonstrated the most stable generalization (test AUC = 0.60), although predictive performance remained limited within the constraints of the small and high-dimensional dataset. SHAP analysis highlighted MFCC-derived features as key contributors to model decisions, providing transparent interpretability. Overall, the study presents preliminary evidence linking acoustic characteristics to depressive symptoms and outlines a reproducible analytical workflow, while underscoring the need for substantially larger and more diverse datasets to establish clinically meaningful predictive validity. Full article

22 pages, 3760 KB  
Article
Embedded Implementation of Real-Time Voice Command Recognition on PIC Microcontroller
by Mohamed Shili, Salah Hammedi, Amjad Gawanmeh and Khaled Nouri
Automation 2025, 6(4), 79; https://doi.org/10.3390/automation6040079 - 28 Nov 2025
Viewed by 4150
Abstract
This paper describes a real-time voice command recognition system for resource-constrained embedded devices, specifically a PIC microcontroller. While most existing voice command solutions rely on high-performance processing platforms or cloud computation, the system described here performs fully embedded, low-power processing locally on the device. Sound is captured through a low-cost MEMS microphone and segmented into short audio frames, from which time-domain features are extracted, namely Zero-Crossing Rate (ZCR) and Short-Time Energy (STE). These features were chosen for their low power consumption, computational efficiency, and suitability for real-time processing on a microcontroller. For this experimental system, a small vocabulary of four command words (“ON”, “OFF”, “LEFT”, and “RIGHT”) was used to simulate real voice command interfaces. The main contribution is the combination of low-complexity, lightweight signal-processing techniques with embedded neural network inference, completing a classification cycle in real time (under 50 ms). Confusion matrices and timing analysis of the classifier’s performance across vocabularies of varying complexity showed a classification accuracy of over 90%. This method is well suited to IoT and portable embedded applications, offering a low-latency alternative to more complex and resource-intensive classification architectures. Full article
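
The two time-domain features named in the abstract above, Zero-Crossing Rate and Short-Time Energy, can be illustrated with a short sketch. It uses the common textbook definitions and plain Python lists; the function names, the framing helper, and the convention of treating zero as non-negative are assumptions, and the paper's own microcontroller implementation may differ.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into full frames of frame_len, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)
```

Both features need only comparisons, multiplications, and additions per sample, which is consistent with the abstract's point about computational efficiency on a small microcontroller.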

25 pages, 3380 KB  
Article
Colour Classification Analysis Based on MFCC Acoustic Feature Sets and Machine Learning Algorithms in Sound–Colour Synaesthesia
by Raminta Bartulienė, Diana Ragaišė, Martynas Maciulevičius, Renaldas Raišutis, Gustavas Davidavičius, Aušra Saudargienė and Saulius Šatkauskas
Appl. Sci. 2025, 15(22), 12059; https://doi.org/10.3390/app152212059 - 13 Nov 2025
Viewed by 1126
Abstract
Sound–colour synaesthesia is a rare phenomenon in which auditory stimuli automatically evoke stable, subjectively real colour experiences. This study aimed to investigate whether the colours most frequently reported by a synaesthete can be reliably predicted based on objective acoustic parameters of voice signals. The study analysed the responses of a 24-year-old blind woman to different voices, which she consciously associates with distinct coloured silhouettes. A classification analysis based on MFCC acoustic feature sets and machine learning algorithms (SVM, XGBoost) demonstrated that the models could be trained to very high accuracy: up to 97–100% in binary classification and 89–90% in multi-class classification. These results provide new insights into how specific sound characteristics are linked to imagery arising from the human subconscious. Full article

29 pages, 2068 KB  
Article
Voice-Based Early Diagnosis of Parkinson’s Disease Using Spectrogram Features and AI Models
by Danish Quamar, V. D. Ambeth Kumar, Muhammad Rizwan, Ovidiu Bagdasar and Manuella Kadar
Bioengineering 2025, 12(10), 1052; https://doi.org/10.3390/bioengineering12101052 - 29 Sep 2025
Cited by 4 | Viewed by 4309
Abstract
Parkinson’s disease (PD) is a progressive neurodegenerative disorder that significantly affects motor functions, including speech production. Voice analysis offers a less invasive, faster and more cost-effective approach for diagnosing and monitoring PD over time. This research introduces an automated system to distinguish between PD and non-PD individuals based on speech signals using state-of-the-art signal processing and machine learning (ML) methods. A publicly available voice dataset (Dataset 1, 81 samples) containing speech recordings from PD patients and non-PD individuals was used for model training and evaluation. Additionally, a small supplementary dataset (Dataset 2, 15 samples) was created, although excluded from the experiments, to illustrate potential future extensions of this work. Features such as Mel-frequency cepstral coefficients (MFCCs), spectrograms, Mel spectrograms and waveform representations were extracted to capture key vocal impairments related to PD, including diminished vocal range, weak harmonics, elevated spectral entropy and impaired formant structures. These extracted features were used to train and evaluate several ML models, including support vector machine (SVM), XGBoost and logistic regression, as well as deep learning (DL) architectures such as deep neural networks (DNN), convolutional neural networks (CNN) combined with long short-term memory (LSTM), CNN + gated recurrent unit (GRU) and bidirectional LSTM (BiLSTM). Experimental results show that DL models, particularly BiLSTM, outperform traditional ML models, achieving 97% accuracy and an AUC of 0.95. The comprehensive feature extraction from both datasets enabled robust classification of PD and non-PD speech signals. These findings highlight the potential of integrating acoustic features with DL methods for early diagnosis and monitoring of Parkinson’s disease. Full article

42 pages, 1982 KB  
Article
SHAP-Based Identification of Potential Acoustic Biomarkers in Patients with Post-Thyroidectomy Voice Disorder
by Salih Celepli, Irem Bigat, Bilgi Karakas, Huseyin Mert Tezcan, Mehmet Dincay Yar, Pinar Celepli, Mehmet Feyzi Aksahin, Oguz Hancerliogullari, Yavuz Fuat Yilmaz and Osman Erogul
Diagnostics 2025, 15(16), 2065; https://doi.org/10.3390/diagnostics15162065 - 18 Aug 2025
Cited by 2 | Viewed by 2521
Abstract
Objective: The objective of this study was to identify potential robust acoustic biomarkers for functional post-thyroidectomy voice disorder (PTVD) that may support early diagnosis and personalized treatment strategies, using acoustic analysis and explainable machine learning methods. Methods: Spectral and cepstral features were extracted from /a/ and /i/ voice recordings collected preoperatively and 4–6 weeks postoperatively from a total of 126 patients. Various Support Vector Machine (SVM) and Boosting models were trained. SHapley Additive exPlanations (SHAP) analysis was applied to enhance interpretability. SHAP values from training and test sets were compared via scatter plots to identify stable candidate biomarkers with high consistency. Results: GentleBoost (AUC = 0.85) and LogitBoost (AUC = 0.81) demonstrated the highest classification performance. Performance metrics across all models were evaluated for statistical significance. DeLong’s test was conducted to assess differences between ROC curves. The features iCPP, aCPP, and aHNR were identified as stable candidate biomarkers, exhibiting consistent SHAP distributions in both training and test sets in terms of direction and magnitude. These features showed statistically significant correlations with PTVD (p < 0.05) and demonstrated strong effect sizes (Cohen’s d = −2.95, −1.13, −0.60). Their diagnostic relevance was further supported by post hoc power analyses (iCPP: 1.00; aCPP: 0.998). Conclusions: SHAP-supported machine learning models offer an objective and clinically meaningful approach for evaluating PTVD. The identified features may serve as potential biomarkers to guide individualized voice therapy decisions during the early postoperative period. Full article
(This article belongs to the Special Issue A New Era in Diagnosis: From Biomarkers to Artificial Intelligence)

26 pages, 514 KB  
Article
Improving Voice Spoofing Detection Through Extensive Analysis of Multicepstral Feature Reduction
by Leonardo Mendes de Souza, Rodrigo Capobianco Guido, Rodrigo Colnago Contreras, Monique Simplicio Viana and Marcelo Adriano dos Santos Bongarti
Sensors 2025, 25(15), 4821; https://doi.org/10.3390/s25154821 - 5 Aug 2025
Cited by 1 | Viewed by 3449
Abstract
Voice biometric systems play a critical role in numerous security applications, including electronic device authentication, banking transaction verification, and confidential communications. Despite their widespread utility, these systems are increasingly targeted by sophisticated spoofing attacks that leverage advanced artificial intelligence techniques to generate realistic synthetic speech. Addressing the vulnerabilities inherent to voice-based authentication systems has thus become both urgent and essential. This study proposes a novel experimental analysis that extensively explores various dimensionality reduction strategies in conjunction with supervised machine learning models to effectively identify spoofed voice signals. Our framework involves extracting multicepstral features followed by the application of diverse dimensionality reduction methods, such as Principal Component Analysis (PCA), Truncated Singular Value Decomposition (SVD), statistical feature selection (ANOVA F-value, Mutual Information), Recursive Feature Elimination (RFE), regularization-based LASSO selection, Random Forest feature importance, and Permutation Importance techniques. Empirical evaluation using the ASVSpoof 2017 v2.0 dataset measures the classification performance with the Equal Error Rate (EER) metric, achieving values of approximately 10%. Our comparative analysis demonstrates significant performance gains when dimensionality reduction methods are applied, underscoring their value in enhancing the security and effectiveness of voice biometric verification systems against emerging spoofing threats. Full article
(This article belongs to the Special Issue Sensors and Machine-Learning Based Signal Processing)
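
The Equal Error Rate (EER) used as the metric in the abstract above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a simple threshold sweep under stated assumptions: higher scores mean "more likely genuine", the function name `equal_error_rate` is hypothetical, and when no threshold gives exactly equal rates the two rates are averaged at the closest point. Evaluation toolkits typically interpolate instead, so values may differ slightly.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= threshold.
    FAR = fraction of spoof trials accepted,
    FRR = fraction of genuine trials rejected.
    """
    if not genuine_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for t in sorted(set(genuine_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

With perfectly separated scores the function returns 0.0; an EER of roughly 10%, as reported above, means that about one in ten trials of each class is misclassified at the balanced threshold.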

21 pages, 34246 KB  
Article
A Multi-Epiphysiological Indicator Dog Emotion Classification System Integrating Skin and Muscle Potential Signals
by Wenqi Jia, Yanzhi Hu, Zimeng Wang, Kai Song and Boyan Huang
Animals 2025, 15(13), 1984; https://doi.org/10.3390/ani15131984 - 5 Jul 2025
Viewed by 1635
Abstract
This study introduces an innovative dog emotion classification system that integrates four non-invasive physiological indicators—skin potential (SP), muscle potential (MP), respiration frequency (RF), and voice pattern (VP)—with the extreme gradient boosting (XGBoost) algorithm. A four-breed dataset was meticulously constructed by recording and labeling physiological signals from dogs exposed to four fundamental emotional states: happiness, sadness, fear, and anger. Comprehensive feature extraction (time-domain, frequency-domain, nonlinearity) was conducted for each signal modality, and inter-emotional variance was analyzed to establish discriminative patterns. Four machine learning algorithms—Neural Networks (NN), Support Vector Machines (SVM), Gradient Boosting Decision Trees (GBDT), and XGBoost—were trained and evaluated, with XGBoost achieving the highest classification accuracy of 90.54%. Notably, this is the first study to integrate a fusion of two complementary electrophysiological indicators—skin and muscle potentials—into a multi-modal dataset for canine emotion recognition. Further interpretability analysis using Shapley Additive exPlanations (SHAP) revealed skin potential and voice pattern features as the most contributive to model performance. The proposed system demonstrates high accuracy, efficiency, and portability, laying a robust groundwork for future advancements in cross-species affective computing and intelligent animal welfare technologies. Full article
(This article belongs to the Special Issue Animal–Computer Interaction: New Horizons in Animal Welfare)

13 pages, 1695 KB  
Article
Deepfake Voice Detection: An Approach Using End-to-End Transformer with Acoustic Feature Fusion by Cross-Attention
by Liang Yu Gong and Xue Jun Li
Electronics 2025, 14(10), 2040; https://doi.org/10.3390/electronics14102040 - 16 May 2025
Cited by 5 | Viewed by 5668
Abstract
Deepfake technology uses artificial intelligence to create highly realistic but fake audio, video, or images that are often difficult to distinguish from real content. Because of its potential use for misinformation, fraud, and identity theft, deepfake technology has gained a bad reputation in the digital world. Many recent works have reported on the detection of deepfake videos and images, but few studies have concentrated on developing robust deepfake voice detection systems. In most existing studies in this field, a deepfake voice detection system requires a large amount of training data and a robust backbone to distinguish real audio from logical access (LA) attack audio. For acoustic feature extraction, Mel-frequency Filter Bank (MFB)-based approaches are more suitable for representing speech signals than using the raw spectrum as input. Recurrent Neural Networks (RNNs) have been successfully applied to Natural Language Processing (NLP), but these backbones suffer from vanishing or exploding gradients when processing long sequences. In addition, most deepfake voice recognition systems perform poorly in cross-dataset evaluation, which raises a robustness issue. To address these issues, we propose an acoustic feature-fusion method that combines Mel-spectrum and pitch representations through cross-attention mechanisms. Then, we combine a Transformer encoder with a convolutional neural network block to extract global and local features as a front end. Finally, we connect the back end with one linear layer for classification. We summarized the performance of several deepfake voice detectors on the silence-segment processed ASVspoof 2019 dataset. Our proposed method can achieve an Equal Error Rate (EER) of 26.41%, while most of the existing methods result in an EER higher than 30%. We also tested our proposed method on the ASVspoof 2021 dataset and found that it can achieve an EER as low as 28.52%, while the EER values for existing methods are all higher than 28.9%. Full article
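The Equal Error Rate quoted in this abstract is the operating point at which the false acceptance rate (spoofed audio accepted as real) equals the false rejection rate (real audio rejected). The following is a minimal sketch of how such a value could be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely real", and the simple threshold sweep are all assumptions made for illustration.

```python
def equal_error_rate(real_scores, spoof_scores):
    """Estimate the EER from two lists of detector scores.

    Assumption: a higher score means the detector considers the
    utterance more likely to be real (bona fide) speech.
    Returns (eer, threshold) at the point where the false acceptance
    and false rejection rates are closest to each other.
    """
    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(real_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: real audio scored below the threshold.
        frr = sum(s < t for s in real_scores) / len(real_scores)
        # False acceptance: spoofed audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
real = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
eer, threshold = equal_error_rate(real, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a finite set of scores the two error rates rarely cross exactly, so this sketch reports their average at the nearest point; evaluation toolkits often interpolate instead, which can give slightly different values.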
(This article belongs to the Section Artificial Intelligence)

17 pages, 4114 KB  
Article
Biomimetic Computing for Efficient Spoken Language Identification
by Gaurav Kumar and Saurabh Bhardwaj
Biomimetics 2025, 10(5), 316; https://doi.org/10.3390/biomimetics10050316 - 14 May 2025
Cited by 2 | Viewed by 1776
Abstract
Spoken Language Identification (SLID)-based applications have become increasingly important in everyday life, driven by advancements in artificial intelligence and machine learning. Multilingual countries utilize the SLID method to facilitate speech detection. This is accomplished by determining the language of the spoken parts using language recognizers. However, when working with multilingual datasets, the presence of multiple languages that share a common origin makes accurate automatic language classification significantly harder. A further challenge is the significant variance in speech signals caused by factors such as different speakers, content, acoustic settings, language differences, changes in voice modulation based on age and gender, and variations in speech patterns. In this study, we introduce the DBODL-MSLIS approach, which integrates biomimetic optimization techniques inspired by natural intelligence to enhance language classification. The proposed method employs Dung Beetle Optimization (DBO) with Deep Learning, simulating the beetle’s foraging behavior to optimize feature selection and classification performance. The proposed technique integrates speech preprocessing, which encompasses pre-emphasis, windowing, and frame blocking, followed by feature extraction utilizing pitch, energy, Discrete Wavelet Transform (DWT), and zero-crossing rate (ZCR). Feature selection is then performed by the DBO algorithm, which removes redundant features and helps to improve efficiency and accuracy. Spoken languages are classified using Bayesian optimization (BO) in conjunction with a long short-term memory (LSTM) network. The DBODL-MSLIS technique has been experimentally validated using the IIIT Spoken Language dataset. The results indicate an average accuracy of 95.54% and an F-score of 84.31%. This technique surpasses various other state-of-the-art models, such as SVM, MLP, LDA, DLA-ASLISS, HMHFS-IISLFAS, GA base fusion, and VGG-16. We have evaluated the accuracy of our proposed technique against state-of-the-art biomimetic computing models such as GA, PSO, GWO, DE, and ACO. While ACO achieved up to 89.45% accuracy, our Bayesian Optimization with LSTM outperformed all others, reaching a peak accuracy of 95.55%, demonstrating its effectiveness in enhancing spoken language identification. The suggested technique demonstrates promising potential for practical applications in the field of multi-lingual voice processing. Full article
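The preprocessing chain named in this abstract (pre-emphasis, frame blocking, windowing) and two of the listed features (energy and zero-crossing rate) can be illustrated with a short standard-library sketch. This is not the authors' implementation: the function names, the pre-emphasis coefficient of 0.97, the Hamming window, and the frame and hop sizes are common defaults assumed here, since the abstract does not specify them.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_blocks(signal, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window (one common choice, assumed here) to a frame."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Synthetic 200 Hz tone sampled at 8 kHz, used only as a stand-in for speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

emphasized = pre_emphasis(tone)
frames = frame_blocks(emphasized, frame_len=200, hop=80)  # 25 ms frames, 10 ms hop
features = [(short_time_energy(hamming(f)), zero_crossing_rate(f)) for f in frames]
print(len(frames), "frames; first frame (energy, ZCR):", features[0])
```

Each frame yields one small feature vector. In the pipeline the abstract describes, such per-frame features would be passed on to feature selection and then to the classifier, steps this sketch does not cover.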
