Search Results (134)

Search Parameters:
Keywords = environment sound classification

18 pages, 615 KB  
Article
Auditory Processing and Speech Sound Disorders: Behavioral and Electrophysiological Findings
by Konstantinos Drosos, Paris Vogazianos, Dionysios Tafiadis, Louiza Voniati, Alexandra Papanicolaou, Klea Panayidou and Chryssoula Thodi
Audiol. Res. 2025, 15(5), 119; https://doi.org/10.3390/audiolres15050119 - 19 Sep 2025
Viewed by 367
Abstract
Background: Children diagnosed with Speech Sound Disorders (SSDs) encounter difficulties in speech perception, especially when listening in the presence of background noise. Recommended protocols for auditory processing evaluation include behavioral linguistic and speech processing tests, as well as objective electrophysiological measures. The present study compared the auditory processing profiles of children with SSD and typically developing (TD) children using a battery of behavioral language and auditory tests combined with auditory evoked responses. Methods: Forty (40) parents of 7- to 10-year-old Greek Cypriot children completed questionnaires related to their children's listening; their children completed an assessment comprising language, phonology, auditory processing, and auditory evoked responses. The experimental group included 24 children with a history of SSDs; the control group consisted of 16 TD children. Results: Three factors significantly differentiated SSD from TD children: Factor 1 (auditory processing screening), Factor 5 (phonological awareness), and Factor 13 (Auditory Brainstem Response (ABR) wave V latency). Among these, Factor 1 consistently predicted SSD classification both independently and in combined models, indicating strong ecological and diagnostic relevance. This predictive power suggests that real-world listening behaviors are central to SSD differentiation. The significant correlation between Factor 5 and Factor 13 may suggest an interaction between auditory processing at the brainstem level and higher-order phonological manipulation. Conclusions: This research underscores the diagnostic significance of integrating behavioral and physiological metrics through dimensional and predictive methodologies. Factor 1, which focuses on authentic listening environments, was identified as the strongest predictor. These results advocate for the inclusion of ecologically valid listening items in screening for APD. Poor discrimination of speech in noise creates discrepancies between incoming auditory information and retained phonological representations, disrupting the implicit processing mechanisms that align auditory input with representations stored in memory. Speech and language pathologists can incorporate pertinent auditory processing assessment findings to identify potential language-processing challenges and formulate more effective therapeutic intervention strategies.
(This article belongs to the Section Speech and Language)

20 pages, 6876 KB  
Article
Spatiotemporal Heterogeneity of Forest Park Soundscapes Based on Deep Learning: A Case Study of Zhangjiajie National Forest Park
by Debing Zhuo, Chenguang Yan, Wenhai Xie, Zheqian He and Zhongyu Hu
Forests 2025, 16(9), 1416; https://doi.org/10.3390/f16091416 - 4 Sep 2025
Viewed by 517
Abstract
As a perceptual representation of ecosystem structure and function, the soundscape has become an important indicator for evaluating ecological health and assessing the impacts of human disturbances. Understanding the spatiotemporal heterogeneity of soundscapes is essential for revealing ecological processes and human impacts in protected areas. This study investigates such heterogeneity in Zhangjiajie National Forest Park using deep learning approaches. To this end, we constructed a dataset comprising eight representative sound source categories by integrating field recordings with online audio (BBC Sound Effects Archive and Freesound), and trained a classification model to accurately identify biophony, geophony, and anthrophony, which enabled the subsequent analysis of spatiotemporal distribution patterns. Our results indicate that temporal variations in the soundscape are closely associated with circadian rhythms and tourist activities, while spatial patterns are strongly shaped by topography, vegetation, and human interference. Biophony is primarily concentrated in areas with minimal ecological disturbance, geophony is regulated by landforms and microclimatic conditions, and anthrophony tends to mask natural sound sources. Overall, the study highlights how deep learning-based soundscape classification can reveal the mechanisms by which natural and anthropogenic factors structure acoustic environments, offering methodological references and practical insights for ecological management and soundscape conservation.
(This article belongs to the Section Forest Ecology and Management)
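To make the classification-to-analysis step concrete, here is a minimal sketch of rolling fine-grained predictions up to biophony/geophony/anthrophony and tallying them per hour to expose diurnal patterns; the eight category names are hypothetical stand-ins, not the paper's labels:

```python
# Hypothetical mapping from eight fine-grained source categories (assumed
# names, not the paper's) to the three macro classes analyzed above.
from collections import Counter

MACRO = {
    "birdsong": "biophony", "insects": "biophony", "mammals": "biophony",
    "wind": "geophony", "rain": "geophony", "stream": "geophony",
    "speech": "anthrophony", "vehicles": "anthrophony",
}

def hourly_composition(predictions):
    """Tally macro-class counts per hour; predictions is an iterable of
    (hour_of_day, fine_label) pairs emitted by the classifier."""
    return Counter((hour, MACRO[label]) for hour, label in predictions)

print(hourly_composition([(6, "birdsong"), (6, "wind"), (12, "vehicles")]))
```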

21 pages, 3700 KB  
Article
Lung Sound Classification Model for On-Device AI
by Jinho Park, Chanhee Jeong, Yeonshik Choi, Hyuck-ki Hong and Youngchang Jo
Appl. Sci. 2025, 15(17), 9361; https://doi.org/10.3390/app15179361 - 26 Aug 2025
Viewed by 955
Abstract
Following the COVID-19 pandemic, public interest in healthcare has significantly increased, emphasizing the importance of early disease detection through lung sound analysis. Lung sounds serve as a critical biomarker in the diagnosis of pulmonary diseases, and numerous deep learning-based approaches have been actively explored for this purpose. Existing lung sound classification models have demonstrated high accuracy, benefiting from recent advances in artificial intelligence (AI) technologies. However, these models often rely on transmitting data to computationally intensive servers for processing, introducing potential security risks due to the transfer of sensitive medical information over networks. To mitigate these concerns, on-device AI has garnered growing attention as a promising solution for protecting healthcare data. On-device AI enables local data processing and inference directly on the device, thereby enhancing data security compared to server-based schemes. Despite these advantages, on-device AI is inherently limited by computational constraints, while conventional models typically require substantial processing power to maintain high performance. In this study, we propose a lightweight lung sound classification model designed specifically for on-device environments. The proposed scheme extracts audio features using Mel spectrograms, chromagrams, and Mel-Frequency Cepstral Coefficients (MFCC), which are converted into image representations and stacked to form the model input. The lightweight model performs convolution operations tailored to both temporal and frequency-domain characteristics of lung sounds. Comparative experimental results demonstrate that the proposed model achieves superior inference performance while maintaining a significantly smaller model size than conventional classification schemes, making it well-suited for deployment on resource-constrained devices.
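The feature pipeline described above (Mel spectrogram, chromagram, and MFCCs stacked as image channels) can be sketched as follows; the sampling rate, feature sizes, and resizing step are assumptions, not the paper's exact settings:

```python
# Sketch only: assumed 4 kHz sampling (common for lung-sound corpora),
# assumed 128x128 target grid and feature sizes; not the paper's settings.
import numpy as np
import librosa
from scipy.ndimage import zoom

def stacked_features(path, sr=4000, size=(128, 128)):
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    channels = []
    for feat in (mel, chroma, mfcc):
        # resize each feature map to a common grid so the three stack as channels
        factors = (size[0] / feat.shape[0], size[1] / feat.shape[1])
        channels.append(zoom(feat, factors, order=1))
    x = np.stack(channels, axis=-1)                      # (128, 128, 3) "image"
    lo, hi = x.min(axis=(0, 1)), x.max(axis=(0, 1))      # per-channel min-max
    return (x - lo) / (hi - lo + 1e-8)
```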

26 pages, 6425 KB  
Article
Deep Spectrogram Learning for Gunshot Classification: A Comparative Study of CNN Architectures and Time-Frequency Representations
by Pafan Doungpaisan and Peerapol Khunarsa
J. Imaging 2025, 11(8), 281; https://doi.org/10.3390/jimaging11080281 - 21 Aug 2025
Viewed by 849
Abstract
Gunshot sound classification plays a crucial role in public safety, forensic investigations, and intelligent surveillance systems. This study evaluates the performance of deep learning models in classifying firearm sounds by analyzing twelve time–frequency spectrogram representations, including Mel, Bark, MFCC, CQT, Cochleagram, STFT, FFT, Reassigned, Chroma, Spectral Contrast, and Wavelet. The dataset consists of 2148 gunshot recordings from four firearm types, collected in a semi-controlled outdoor environment under multi-orientation conditions. To leverage advanced computer vision techniques, all spectrograms were converted into RGB images using perceptually informed colormaps. This enabled the application of image processing approaches and fine-tuning of pre-trained Convolutional Neural Networks (CNNs) originally developed for natural image classification. Six CNN architectures—ResNet18, ResNet50, ResNet101, GoogLeNet, Inception-v3, and InceptionResNetV2—were trained on these spectrogram images. Experimental results indicate that CQT, Cochleagram, and Mel spectrograms consistently achieved high classification accuracy, exceeding 94% when paired with deep CNNs such as ResNet101 and InceptionResNetV2. These findings demonstrate that transforming time–frequency features into RGB images not only facilitates the use of image-based processing but also allows deep models to capture rich spectral–temporal patterns, providing a robust framework for accurate firearm sound classification.
(This article belongs to the Section Image and Video Processing)
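As a rough illustration of the spectrogram-to-RGB approach, the sketch below renders a CQT spectrogram with a perceptual colormap and swaps the head of a pretrained ResNet18 for the four firearm classes; the colormap, image size, and choice of CQT are assumptions drawn from the representations and architectures compared above, not the paper's exact configuration:

```python
# Sketch only: CQT + magma colormap + ResNet18 are assumed choices.
import numpy as np
import librosa
import torch
import torch.nn as nn
from matplotlib import cm
from torchvision.models import resnet18, ResNet18_Weights

def spectrogram_rgb(y, sr, size=224):
    s = librosa.amplitude_to_db(np.abs(librosa.cqt(y, sr=sr)), ref=np.max)
    s = (s - s.min()) / (s.max() - s.min() + 1e-8)        # scale to 0..1
    rgb = cm.magma(s)[..., :3]                            # colormap, drop alpha
    img = torch.from_numpy(rgb).permute(2, 0, 1).float()  # HWC -> CHW
    return nn.functional.interpolate(img[None], (size, size))[0]

model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)             # four firearm types
```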

23 pages, 1302 KB  
Article
Deep Learning-Enhanced Ocean Acoustic Tomography: A Latent Feature Fusion Framework for Hydrographic Inversion with Source Characteristic Embedding
by Jiawen Zhou, Zikang Chen, Yongxin Zhu and Xiaoying Zheng
Information 2025, 16(8), 665; https://doi.org/10.3390/info16080665 - 4 Aug 2025
Viewed by 624
Abstract
Ocean Acoustic Tomography (OAT) is an important marine remote sensing technique used for inverting large-scale ocean environmental parameters, but traditional methods face challenges in computational complexity and environmental interference. This paper proposes a causal analysis-driven AI for Science method for high-precision and rapid inversion of oceanic hydrological parameters in complex underwater environments. Based on the open-source VTUAD (Vessel Type Underwater Acoustic Data) dataset, the method first utilizes a fine-tuned Paraformer (a fast and accurate parallel transformer) model for precise classification of sound source targets. Then, using structural causal models (SCM) and potential outcome frameworks, causal embedding vectors with physical significance are constructed. Finally, a cross-modal Transformer network is employed to fuse acoustic features, sound source priors, and environmental variables, enabling inversion of temperature and salinity in the Georgia Strait of Canada. Experimental results show that the method achieves accuracies of 97.77% and 95.52% for temperature and salinity inversion tasks, respectively, significantly outperforming traditional methods. Additionally, with GPU acceleration, inference speed improves more than sixfold, enabling real-time OAT on edge computing platforms as smart hardware and validating the method's practicality. By incorporating causal inference and cross-modal data fusion, this study not only enhances inversion accuracy and model interpretability but also provides new insights for real-time applications of OAT.
(This article belongs to the Special Issue Advances in Intelligent Hardware, Systems and Applications)
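The cross-modal fusion stage might look roughly like the following sketch, with entirely hypothetical feature dimensions and token layout (the actual architecture is not specified in this abstract): acoustic features, a sound-source class prior, and environmental variables are fused by a small Transformer encoder that regresses temperature and salinity.

```python
# Hypothetical fusion sketch; all dimensions and the token layout are invented.
import torch
import torch.nn as nn

class CrossModalInversion(nn.Module):
    def __init__(self, d=128, n_source_types=5):
        super().__init__()
        self.acoustic = nn.Linear(64, d)                  # acoustic features
        self.source = nn.Embedding(n_source_types, d)     # sound-source prior
        self.env = nn.Linear(4, d)                        # environmental vars
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 2)                       # temperature, salinity

    def forward(self, acoustic_feat, source_id, env_vars):
        tokens = torch.stack([self.acoustic(acoustic_feat),
                              self.source(source_id),
                              self.env(env_vars)], dim=1)  # (B, 3, d)
        return self.head(self.encoder(tokens).mean(dim=1))
```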

19 pages, 1160 KB  
Article
Multi-User Satisfaction-Driven Bi-Level Optimization of Electric Vehicle Charging Strategies
by Boyin Chen, Jiangjiao Xu and Dongdong Li
Energies 2025, 18(15), 4097; https://doi.org/10.3390/en18154097 - 1 Aug 2025
Viewed by 565
Abstract
The accelerating integration of electric vehicles (EVs) into contemporary transportation infrastructure has underscored significant limitations in traditional charging paradigms, particularly in accommodating heterogeneous user requirements within dynamic operational environments. This study presents a differentiated optimization framework for EV charging strategies through the systematic classification of user types. A multidimensional decision-making environment is established for three representative user categories—residential, commercial, and industrial—by synthesizing time-variant electricity pricing models with dynamic carbon emission pricing mechanisms. A bi-level optimization architecture is subsequently formulated, leveraging deep reinforcement learning (DRL) to capture user-specific demand characteristics through customized reward functions and adaptive constraint structures. Validation is conducted within a high-fidelity simulation environment featuring 90 autonomous EV charging agents operating in a metropolitan parking facility. Empirical results indicate that the proposed typology-driven approach yields a 32.6% average cost reduction across user groups relative to baseline charging protocols, with statistically significant improvements in expenditure optimization (p < 0.01). Further interpretability analysis employing gradient-weighted class activation mapping (Grad-CAM) demonstrates that the model's attention mechanisms are well aligned with theoretically anticipated demand prioritization patterns across the distinct user types, thereby confirming the decision-theoretic soundness of the framework.
(This article belongs to the Section E: Electric Vehicles)
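A minimal sketch of the user-type-specific reward shaping implied above, with entirely hypothetical weights and terms:

```python
# Illustrative reward sketch (all weights hypothetical): balances time-of-use
# electricity cost, dynamic carbon price, and departure-readiness for the
# residential/commercial/industrial agent types described above.
WEIGHTS = {
    "residential": dict(cost=1.0, carbon=0.5, comfort=2.0),
    "commercial":  dict(cost=2.0, carbon=0.3, comfort=1.0),
    "industrial":  dict(cost=3.0, carbon=1.0, comfort=0.5),
}

def charging_reward(user_type, energy_kwh, price, carbon_price, soc, soc_target):
    w = WEIGHTS[user_type]
    cost_term = -price * energy_kwh            # time-of-use electricity cost
    carbon_term = -carbon_price * energy_kwh   # dynamic emission penalty
    comfort_term = -abs(soc_target - soc)      # unmet state of charge at departure
    return (w["cost"] * cost_term + w["carbon"] * carbon_term
            + w["comfort"] * comfort_term)
```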

35 pages, 5195 KB  
Article
A Multimodal AI Framework for Automated Multiclass Lung Disease Diagnosis from Respiratory Sounds with Simulated Biomarker Fusion and Personalized Medication Recommendation
by Abdullah, Zulaikha Fatima, Jawad Abdullah, José Luis Oropeza Rodríguez and Grigori Sidorov
Int. J. Mol. Sci. 2025, 26(15), 7135; https://doi.org/10.3390/ijms26157135 - 24 Jul 2025
Viewed by 1446
Abstract
Respiratory diseases represent a persistent global health challenge, underscoring the need for intelligent, accurate, and personalized diagnostic and therapeutic systems. Existing methods frequently suffer from limitations in diagnostic precision, lack of individualized treatment, and constrained adaptability to complex clinical scenarios. To address these challenges, our study introduces a modular AI-powered framework that integrates an audio-based disease classification model with simulated molecular biomarker profiles to evaluate the feasibility of future multimodal diagnostic extensions, alongside a synthetic-data-driven prescription recommendation engine. The disease classification model analyzes respiratory sound recordings and accurately distinguishes among eight clinical classes: bronchiectasis, pneumonia, upper respiratory tract infection (URTI), lower respiratory tract infection (LRTI), asthma, chronic obstructive pulmonary disease (COPD), bronchiolitis, and healthy respiratory state. The proposed model achieved a classification accuracy of 99.99% on a holdout test set, including 94.2% accuracy on pediatric samples. In parallel, the prescription module provides individualized treatment recommendations comprising drug, dosage, and frequency, trained on a carefully constructed synthetic dataset designed to emulate real-world prescribing logic. The model achieved over 99% accuracy in medication prediction tasks, outperforming baseline models such as those discussed in prior research. Minimal misclassification in the confusion matrix and strong clinician agreement on 200 prescriptions (Cohen's κ = 0.91 [0.87–0.94] for drug selection, 0.78 [0.74–0.81] for dosage, 0.96 [0.93–0.98] for frequency) further affirm the system's reliability. Adjusted clinician disagreement rates were 2.7% (drug), 6.4% (dosage), and 1.5% (frequency). SHAP analysis identified age and smoking as key predictors, enhancing model explainability. Dosage accuracy was 91.3%, and most disagreements occurred in renal-impaired and pediatric cases. However, our study is presented strictly as a proof-of-concept. The use of synthetic data and the absence of access to real patient records constitute key limitations. A trial clinical deployment was conducted in a controlled environment, with positive satisfaction ratings from experts and users, but the proposed system must undergo extensive validation with de-identified electronic medical records (EMRs) and regulatory scrutiny before it can be considered for practical application. Nonetheless, the findings offer a promising foundation for the future development of clinically viable AI-assisted respiratory care tools.
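For reference, the clinician-agreement statistic reported above (Cohen's κ) can be computed with scikit-learn; the labels here are synthetic stand-ins, not study data:

```python
# Synthetic stand-in labels; illustrates the agreement metric only.
from sklearn.metrics import cohen_kappa_score

model_drug = ["amoxicillin", "azithromycin", "amoxicillin", "salbutamol"]
clinician_drug = ["amoxicillin", "azithromycin", "salbutamol", "salbutamol"]
print(cohen_kappa_score(model_drug, clinician_drug))  # agreement beyond chance
```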

27 pages, 1533 KB  
Article
Sound Source Localization Using Hybrid Convolutional Recurrent Neural Networks in Undesirable Conditions
by Bastian Estay Zamorano, Ali Dehghan Firoozabadi, Alessio Brutti, Pablo Adasme, David Zabala-Blanco, Pablo Palacios Játiva and Cesar A. Azurdia-Meza
Electronics 2025, 14(14), 2778; https://doi.org/10.3390/electronics14142778 - 10 Jul 2025
Viewed by 961
Abstract
Sound event localization and detection (SELD) is a fundamental task in spatial audio processing that involves identifying both the type and location of sound events in acoustic scenes. Current SELD models often struggle with low signal-to-noise ratios (SNRs) and high reverberation. This article addresses SELD by reformulating direction of arrival (DOA) estimation as a multi-class classification task, leveraging deep convolutional recurrent neural networks (CRNNs). We propose and evaluate two modified architectures: M-DOAnet, an optimized version of DOAnet for localization and tracking, and M-SELDnet, a modified version of SELDnet designed for joint SELD. Both modified models were rigorously evaluated on the STARSS23 dataset, which comprises real-world indoor scenes spanning 13 sound classes and totaling over 7 h of audio, using spectrograms and acoustic intensity maps from first-order Ambisonics (FOA) signals. M-DOAnet achieved exceptional localization (6.00° DOA error, 72.8% F1-score) and perfect tracking (100% MOTA with zero identity switches). It also demonstrated high computational efficiency, training in 4.5 h (164 s/epoch). In contrast, M-SELDnet delivered strong overall SELD performance (0.32 rad DOA error, 0.75 F1-score, 0.38 error rate, 0.20 SELD score), but with significantly higher resource demands, training in 45 h (1620 s/epoch). Our findings underscore a clear trade-off between model specialization and multifunctionality, providing practical insights for designing SELD systems in real-time and computationally constrained environments.
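A compact sketch of the CRNN-with-DOA-classification idea follows, using assumed layer sizes and an assumed grid of DOA classes (e.g., 36 azimuth × 9 elevation bins); this is not the actual M-DOAnet/M-SELDnet configuration:

```python
# Sketch only: DOA estimation recast as multi-class classification over
# discretized direction bins, fed by FOA-derived feature channels.
import torch
import torch.nn as nn

class CRNNDOA(nn.Module):
    def __init__(self, n_doa_classes=324, in_ch=7):
        # in_ch=7: 4 FOA spectrogram + 3 intensity-map channels (assumed)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                # pool frequency, keep time
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * 4, 128, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(256, n_doa_classes)

    def forward(self, x):                        # x: (B, ch, time, freq=64)
        z = self.conv(x)                         # (B, 64, time, 4)
        b, c, t, f = z.shape
        z = z.permute(0, 2, 1, 3).reshape(b, t, c * f)
        z, _ = self.gru(z)
        return self.fc(z)                        # per-frame DOA class logits
```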

20 pages, 300 KB  
Review
Mapping Constructivist Active Learning for STEM: Toward Sustainable Education
by Rania Bou Saad, Ariadna Llorens Garcia and Jose M. Cabre Garcia
Sustainability 2025, 17(13), 6225; https://doi.org/10.3390/su17136225 - 7 Jul 2025
Viewed by 1525
Abstract
As STEM education evolves, educators face growing challenges in selecting and adapting active learning strategies that are pedagogically sound, scalable, and aligned with sustainability goals. This study identifies and analyzes thirteen active learning (X-BL) methods using a quantitative and qualitative, multi-criteria framework based on historical originality, conceptual distinctiveness, and compatibility with STEM education. The resulting classification—organized into the categories of originality, innovation, collaboration, and technology—provides a dynamic lens for understanding the development and context of active learning approaches. Beyond its theoretical contribution, the framework offers practical guidance for curriculum designers, school leaders, and policy makers seeking to implement context-sensitive, future-oriented STEM education. By clarifying which methods are foundational and which are more adaptive or emergent, the findings can support strategic decision making, promote pedagogical innovation, and contribute to building more sustainable and interdisciplinary learning environments. This work also sets the stage for further exploration of culturally and regionally grounded pedagogical approaches that address real-world challenges.
25 pages, 2093 KB  
Article
Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems
by Samuel Yaw Mensah, Tao Zhang, Nahid AI Mahmud and Yanzhang Geng
Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025
Viewed by 2308
Abstract
Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems.
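The evaluation metrics named above can be computed as in this sketch, which assumes the third-party pesq and pystoi packages and 16 kHz audio:

```python
# Scoring an enhanced signal against its clean reference with PESQ, STOI,
# and SNR; clean/enhanced are float numpy arrays at 16 kHz (assumed).
import numpy as np
from pesq import pesq
from pystoi import stoi

def evaluate(clean, enhanced, fs=16000):
    snr = 10 * np.log10(np.sum(clean**2) /
                        (np.sum((clean - enhanced)**2) + 1e-12))
    return {
        "PESQ": pesq(fs, clean, enhanced, "wb"),   # wideband mode
        "STOI": stoi(clean, enhanced, fs),
        "SNR_dB": snr,
    }
```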

16 pages, 1166 KB  
Article
Research on Acoustic Scene Classification Based on Time–Frequency–Wavelet Fusion Network
by Fengzheng Bi and Lidong Yang
Sensors 2025, 25(13), 3930; https://doi.org/10.3390/s25133930 - 24 Jun 2025
Viewed by 731
Abstract
Acoustic scene classification aims to recognize the scenes corresponding to sound signals in the environment, but audio differences from different cities and devices can affect the model's accuracy. In this paper, a time–frequency–wavelet fusion network is proposed to improve model performance by focusing on three dimensions: the time dimension of the spectrogram, the frequency dimension, and the high- and low-frequency information extracted by a wavelet transform through a time–frequency–wavelet module. Multidimensional information was fused through the gated temporal–spatial attention unit, and the visual state space module was introduced to enhance the contextual modeling capability of audio sequences. In addition, Kolmogorov–Arnold network layers were used in place of multilayer perceptrons in the classifier part. The experimental results show that the proposed method achieves a 56.16% average accuracy on the TAU Urban Acoustic Scenes 2022 mobile development dataset, which is an improvement of 6.53% compared to the official baseline system. This performance improvement demonstrates the effectiveness of the model in complex scenarios. In addition, the accuracy of the proposed method on the UrbanSound8K dataset reached 97.60%, which is significantly better than the existing methods, further verifying the generalization ability of the proposed model in the acoustic scene classification task.
(This article belongs to the Section Intelligent Sensors)
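As a small illustration of the wavelet branch, the sketch below splits a log-Mel spectrogram into low- and high-frequency bands with a single-level Haar decomposition (PyWavelets); the wavelet choice and librosa's bundled example clip are assumptions, not the paper's setup:

```python
# Assumes librosa's downloadable example clip and a Haar wavelet; sketch only.
import librosa
import pywt

y, sr = librosa.load(librosa.ex("trumpet"))
mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
low, high = pywt.dwt(mel, "haar", axis=0)  # approximation / detail bands
print(mel.shape, low.shape, high.shape)    # frequency axis roughly halved
```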

32 pages, 1553 KB  
Article
A Fuzzy Logic Framework for Text-Based Incident Prioritization: Mathematical Modeling and Case Study Evaluation
by Arturo Peralta, José A. Olivas and Pedro Navarro-Illana
Mathematics 2025, 13(12), 2014; https://doi.org/10.3390/math13122014 - 18 Jun 2025
Viewed by 724
Abstract
Incident prioritization is a critical task in enterprise environments, where textual descriptions of service disruptions often contain vague or ambiguous language. Traditional machine learning models, while effective in rigid classification, struggle to interpret the linguistic uncertainty inherent in natural language reports. This paper proposes a fuzzy logic-based framework for incident categorization and prioritization, integrating natural language processing (NLP) with a formal system of fuzzy inference. The framework transforms semantic embeddings from incident reports into fuzzy sets, allowing incident severity and urgency to be represented as degrees of membership in multiple categories. A mathematical model based on Mamdani-type inference and triangular membership functions is developed to capture and process imprecise inputs. The proposed system is evaluated on a real-world dataset comprising 10,000 incident descriptions from a mid-sized technology enterprise. A comparative evaluation is conducted against two baseline models: a fine-tuned BERT classifier and a traditional support vector machine (SVM). Results show that the fuzzy logic approach achieves a 7.4% improvement in F1-score over BERT (92.1% vs. 85.7%) and a 12.5% improvement over SVM (92.1% vs. 79.6%) for medium-severity incidents, where linguistic ambiguity is most prevalent. Qualitative analysis from domain experts confirmed that the fuzzy model provided more interpretable and context-aware classifications, improving operator trust and alignment with human judgment. These findings suggest that fuzzy modeling offers a mathematically sound and operationally effective solution for managing uncertainty in text-based incident management, contributing to the broader understanding of mathematical modeling in enterprise-scale social phenomena.
(This article belongs to the Special Issue Social Phenomena: Mathematical Modeling and Data Analysis)
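A toy Mamdani-style system in the spirit of the framework above, built with scikit-fuzzy; the universes, triangular membership parameters, and rules are invented for illustration:

```python
# Toy Mamdani inference with triangular membership functions (invented rules).
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

severity = ctrl.Antecedent(np.arange(0, 11, 1), "severity")
urgency = ctrl.Antecedent(np.arange(0, 11, 1), "urgency")
priority = ctrl.Consequent(np.arange(0, 11, 1), "priority")
for var in (severity, urgency, priority):
    var["low"] = fuzz.trimf(var.universe, [0, 0, 5])
    var["medium"] = fuzz.trimf(var.universe, [2, 5, 8])
    var["high"] = fuzz.trimf(var.universe, [5, 10, 10])

rules = [
    ctrl.Rule(severity["high"] | urgency["high"], priority["high"]),
    ctrl.Rule(severity["medium"] & urgency["medium"], priority["medium"]),
    ctrl.Rule(severity["low"] & urgency["low"], priority["low"]),
]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["severity"] = 7.2   # e.g., degrees derived from text embeddings
sim.input["urgency"] = 4.5
sim.compute()
print(sim.output["priority"])
```

scikit-fuzzy defuzzifies by centroid by default, which matches the usual Mamdani setup.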

15 pages, 4413 KB  
Article
Fault Diagnosis Systems for Robots: Acoustic Sensing-Based Identification of Detached Components for Fault Localization
by Woonghee Yeo and Mitsuharu Matsumoto
Appl. Sci. 2025, 15(12), 6564; https://doi.org/10.3390/app15126564 - 11 Jun 2025
Cited by 1 | Viewed by 1043
Abstract
As robotic systems become more prevalent in daily life and industrial environments, ensuring their reliability through autonomous self-diagnosis is becoming increasingly important. This study investigates whether acoustic sensing can serve as a viable foundation for such self-diagnostic systems by examining its effectiveness in localizing structural faults. This study focuses on developing a fault diagnosis framework for robots using acoustic sensing technology. The objective is to design a simple yet accurate system capable of identifying fault locations and types of robots based solely on sound data, without relying on traditional sensors or cameras. To achieve this, sweep signals were applied to a modular robot, and acoustic responses were collected under various structural configurations over five days. Frequency-domain features were extracted using the Fast Fourier Transform (FFT), and classification was performed using six machine learning models: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), XGBoost, LightGBM, and Multi-Layer Perceptron (MLP). Among these, MLP achieved the highest accuracy (71.4%), followed by SVM (65.7%), LightGBM (62.9%), KNN (60%), XGBoost (57.1%), and RF (51.4%). These results demonstrate the feasibility of diagnosing structural changes in robots using acoustic sensing alone, even with a compact hardware setup and limited training data. These findings suggest that acoustic sensing can provide a practical and efficient approach for robot fault diagnosis, offering potential applications in environments where conventional diagnostic tools are impractical. The study highlights the advantages of incorporating acoustic sensing into fault diagnosis systems and underscores its potential for developing accessible and effective diagnostic solutions for robotics.
(This article belongs to the Special Issue New Technology Trends in Smart Sensing)
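The core pipeline (FFT-magnitude features into a classifier) can be sketched as follows, with synthetic arrays standing in for the recorded sweep responses:

```python
# Synthetic stand-ins for the sweep-response recordings; sketch only.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
signals = rng.normal(size=(35, 4096))        # stand-in acoustic responses
labels = rng.integers(0, 5, size=35)         # stand-in fault locations
features = np.abs(np.fft.rfft(signals))[:, :512]  # low-band FFT magnitudes
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(features, labels)
print(clf.score(features, labels))           # training accuracy only
```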

28 pages, 13595 KB  
Article
Open-Set Recognition of Environmental Sound Based on KDE-GAN and Attractor–Reciprocal Point Learning
by Jiakuan Wu, Nan Wang, Huajie Hong, Wei Wang, Kunsheng Xing and Yujie Jiang
Acoustics 2025, 7(2), 33; https://doi.org/10.3390/acoustics7020033 - 28 May 2025
Viewed by 1007
Abstract
While open-set recognition algorithms have been extensively explored in computer vision, their application to environmental sound analysis remains understudied. To address this gap, this study investigates how to effectively recognize unknown sound categories in real-world environments by proposing a novel Kernel Density Estimation-based Generative Adversarial Network (KDE-GAN) for data augmentation combined with Attractor–Reciprocal Point Learning for open-set classification. Specifically, our approach addresses three key challenges: (1) How to generate boundary-aware synthetic samples for robust open-set training: A closed-set classifier's pre-logit layer outputs are fed into the KDE-GAN, which synthesizes samples mapped to the logit layer using the classifier's original weights. Kernel Density Estimation then enforces Density Loss and Offset Loss to ensure these samples align with class boundaries. (2) How to optimize feature space organization: The closed-set classifier is constrained by an Attractor–Reciprocal Point joint loss, maintaining intra-class compactness while pushing unknown samples toward low-density regions. (3) How to evaluate performance in highly open scenarios: We validate the method using UrbanSound8K, AudioEventDataset, and TUT Acoustic Scenes 2017 as closed sets, with ESC-50 categories as open-set samples, achieving AUROC/OSCR scores of 0.9251/0.8743, 0.7921/0.7135, and 0.8209/0.6262, respectively. The findings demonstrate the potential of this framework to enhance environmental sound monitoring systems, particularly in applications requiring adaptability to unseen acoustic events (e.g., urban noise surveillance or wildlife monitoring).
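The AUROC figure of merit used above treats unknown-class detection as a binary ranking problem; here is a minimal sketch with synthetic known/unknown scores:

```python
# Synthetic scores only; illustrates the open-set AUROC protocol.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
known_scores = rng.normal(2.0, 1.0, 500)    # higher = "looks known"
unknown_scores = rng.normal(0.0, 1.0, 500)  # open-set (e.g., ESC-50) samples
y_true = np.r_[np.ones(500), np.zeros(500)]
print(roc_auc_score(y_true, np.r_[known_scores, unknown_scores]))
```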

20 pages, 2808 KB  
Article
Deep Learning-Based Multi-Label Classification for Forest Soundscape Analysis: A Case Study in Shennongjia National Park
by Caiyun Yang, Xuanxin Liu, Yiyang Li and Xinwen Yu
Forests 2025, 16(6), 899; https://doi.org/10.3390/f16060899 - 27 May 2025
Viewed by 622
Abstract
Forest soundscapes contain rich ecological information that reflects the composition, structure, and dynamics of biodiversity within forest ecosystems. The effective monitoring of these soundscapes is essential for forest conservation and wildlife management. However, traditional manual annotation methods are time-consuming and limited in scalability, while commonly used acoustic indices such as the Normalized Difference Soundscape Index (NDSI) lack the capacity to resolve overlapping or complex sound sources often encountered in dense forest environments. To overcome these limitations, this study applied a deep learning-based multi-label classification approach to long-term field recordings collected from Shennongjia National Park, a typical subtropical forest ecosystem in China. The model automatically classifies sound sources into biophony, geophony, and anthrophony. Compared to the NDSI, the model demonstrated higher precision and robustness, especially under low-signal-to-noise-ratio conditions. While the NDSI provides an efficient overview of soundscape disturbances, it demonstrates limitations in differentiating geophonic components and detecting subtle variations. This study supports a complementary "macro–micro" analytical framework that enables capturing broad, time-averaged soundscape trends through the NDSI, while achieving fine-grained, label-specific detection of biophony, geophony, and anthrophony through the multi-label classification model. This integration enhances analytical resolution, enabling the scalable, automated monitoring of complex forest soundscapes. This study contributes a novel and adaptable approach for real-time biodiversity assessment and long-term forest conservation.
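For context, the NDSI baseline mentioned above is typically computed from band-limited power spectral density; here is a sketch under common band conventions (anthrophony 1–2 kHz, biophony 2–11 kHz; the paper's exact bands may differ):

```python
# NDSI sketch under assumed band conventions; fs must exceed 22 kHz to
# cover the biophony band. +1: biophony dominates, -1: anthrophony dominates.
from scipy.signal import welch

def ndsi(y, fs, anthro=(1000, 2000), bio=(2000, 11000)):
    f, psd = welch(y, fs=fs, nperseg=4096)
    a = psd[(f >= anthro[0]) & (f < anthro[1])].sum()
    b = psd[(f >= bio[0]) & (f < bio[1])].sum()
    return (b - a) / (b + a + 1e-12)
```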