Search Results (43)

Search Parameters:
Keywords = acoustic events recognition

19 pages, 6085 KiB  
Article
Earthquake Precursors Based on Rock Acoustic Emission and Deep Learning
by Zihan Jiang, Zhiwen Zhu, Giuseppe Lacidogna, Leandro F. Friedrich and Ignacio Iturrioz
Sci 2025, 7(3), 103; https://doi.org/10.3390/sci7030103 - 1 Aug 2025
Viewed by 151
Abstract
China is one of the countries severely affected by earthquakes, making precise and timely identification of earthquake precursors essential for reducing casualties and property damage. A novel method is proposed that combines a rock acoustic emission (AE) detection technique with deep learning to facilitate real-time monitoring and advance earthquake precursor detection. AE equipment and seismometers were installed in a granite tunnel 150 m deep in the mountains of eastern Guangdong, China, allowing for the collection of experimental data on the correlation between rock AE and seismic activity. The deep learning model uses features from rock AE time series, including AE events, rate, frequency, and amplitude, as inputs, and estimates the likelihood of seismic events as the output. Precursor features are extracted to create the AE and seismic dataset, and three deep learning models are trained as neural networks and then validated and tested. The results show that after 1000 training cycles, the deep learning model achieves an accuracy of 98.7% on the validation set. On the test set, it reaches a recognition accuracy of 97.6%, with a recall rate of 99.6% and an F1 score of 0.975. Additionally, it successfully identified the two largest seismic events during the monitoring period, confirming its effectiveness in practical applications. Compared to traditional analysis methods, the deep learning model can automatically process and analyse the massive volumes of recorded AE data, enabling real-time monitoring of seismic events and timely earthquake warning in the future. This study serves as a valuable reference for earthquake disaster prevention and intelligent early warning. Full article
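The abstract gives no implementation details; as a rough illustration of the kind of model it describes, the sketch below trains a small binary classifier on windowed AE statistics (event count, rate, dominant frequency, amplitude) to output a seismic-event likelihood. The feature layout, network sizes, and synthetic data are assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' model): a small MLP that maps windowed rock-AE
# statistics to a seismic-event likelihood. Feature layout and sizes are assumed.
import torch
import torch.nn as nn

class AEPrecursorNet(nn.Module):
    def __init__(self, n_features: int = 4, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit of seismic-event likelihood
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Toy training loop on synthetic data; real inputs would be per-window AE event
# counts, rates, dominant frequencies, and amplitudes with seismic labels.
features = torch.randn(256, 4)                           # hypothetical AE feature windows
labels = (features[:, 0] + features[:, 3] > 0).float()   # hypothetical labels
model = AEPrecursorNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(1000):                                # "1000 training cycles" in the abstract
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()

probability = torch.sigmoid(model(features[:1]))         # estimated likelihood for one window
```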

26 pages, 5535 KiB  
Article
Research on Power Cable Intrusion Identification Using a GRT-Transformer-Based Distributed Acoustic Sensing (DAS) System
by Xiaoli Huang, Xingcheng Wang, Han Qin and Zhaoliang Zhou
Informatics 2025, 12(3), 75; https://doi.org/10.3390/informatics12030075 - 21 Jul 2025
Viewed by 438
Abstract
To address the high false alarm rate of intrusion detection systems based on distributed acoustic sensing (DAS) for power cables in complex underground environments, an innovative GRT-Transformer multimodal deep learning model is proposed. The core of this model lies in its distinctive three-branch parallel collaborative architecture: two branches employ Gramian Angular Summation Field (GASF) and Recursive Pattern (RP) algorithms to convert one-dimensional intrusion waveforms into two-dimensional images, thereby capturing rich spatial patterns and dynamic characteristics, while the third branch utilizes a Gated Recurrent Unit (GRU) to focus directly on the temporal evolution of the waveform. In addition, a Transformer component is integrated to capture the overall trend and global dependencies of the signals. Finally, a Bidirectional Long Short-Term Memory (BiLSTM) network performs a deep fusion of the multidimensional features extracted from the three branches, enabling a comprehensive understanding of the bidirectional temporal dependencies within the data. Experimental validation demonstrates that the GRT-Transformer achieves an average recognition accuracy of 97.3% across three typical intrusion events (illegal tapping, mechanical operations, and vehicle passage), significantly reducing false alarms, surpassing traditional methods, and exhibiting strong practical potential in complex real-world scenarios. Full article
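For readers unfamiliar with the two image encodings named above, the following sketch shows one common way to compute a Gramian Angular Summation Field and a recurrence-style matrix from a 1-D waveform with NumPy. It illustrates the general transforms, not the paper's implementation; the normalization and threshold choices are assumptions.

```python
# Illustrative NumPy versions of the two 1-D-to-2-D encodings mentioned above
# (not the paper's code); normalization and threshold are assumed choices.
import numpy as np

def gasf(x: np.ndarray) -> np.ndarray:
    """Gramian Angular Summation Field of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1] so arccos is defined.
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])       # GASF_ij = cos(phi_i + phi_j)

def recurrence_image(x: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Binary recurrence-style image: 1 where two samples are closer than eps."""
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    return (dist < eps).astype(float)

waveform = np.sin(np.linspace(0, 8 * np.pi, 128))     # hypothetical DAS intrusion waveform
img_gasf = gasf(waveform)                             # fed to one CNN branch
img_rp = recurrence_image(waveform)                   # fed to the other CNN branch
```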

22 pages, 1268 KiB  
Article
Semi-Supervised Learned Autoencoder for Classification of Events in Distributed Fibre Acoustic Sensors
by Artem Kozmin, Oleg Kalashev, Alexey Chernenko and Alexey Redyuk
Sensors 2025, 25(12), 3730; https://doi.org/10.3390/s25123730 - 14 Jun 2025
Viewed by 380
Abstract
The global market for infrastructure security systems based on distributed acoustic sensors is rapidly expanding, driven by the need for timely detection and prevention of potential threats. However, deploying these systems is challenging due to the high costs associated with dataset creation. Additionally, advanced signal processing algorithms are necessary for accurately determining the location and nature of detected events. In this paper, we present an enhanced approach based on semi-supervised learning for developing event classification models tailored for real-time and continuous perimeter monitoring of infrastructure facilities. The proposed method leverages a hybrid architecture combining an autoencoder and a classifier to enhance the accuracy and efficiency of event classification. The autoencoder extracts essential features from raw data using unlabeled data, improving the model’s ability to learn meaningful representations. The classifier, trained on labeled data, recognizes and classifies specific events based on these features. The integrated loss function incorporates elements from both the autoencoder and the classifier, guiding the autoencoder to extract features relevant for accurate event classification. Validation using real-world datasets demonstrates that the proposed method achieves recognition performance comparable to the baseline model, while requiring less labeled data and employing a simpler architecture. These results offer practical insights for reducing deployment costs, enhancing system performance, and increasing throughput for new deployments. Full article
(This article belongs to the Special Issue Fiber Optic Sensing and Applications)
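As a rough sketch of the hybrid objective described above (reconstruction on unlabeled data plus classification on labeled data), the snippet below combines the two losses with a weighting factor. The network sizes, the weighting, and the variable names are assumptions, not the authors' architecture.

```python
# Sketch of a joint autoencoder + classifier objective for DAS event classification
# (assumed sizes and loss weighting; not the paper's exact architecture).
import torch
import torch.nn as nn

class SemiSupervisedAE(nn.Module):
    def __init__(self, n_in: int = 512, n_latent: int = 32, n_classes: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU(), nn.Linear(128, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(), nn.Linear(128, n_in))
        self.classifier = nn.Linear(n_latent, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)

model = SemiSupervisedAE()
recon_loss = nn.MSELoss()
clf_loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_unlabeled = torch.randn(64, 512)            # hypothetical raw DAS feature windows
x_labeled = torch.randn(16, 512)
y_labeled = torch.randint(0, 5, (16,))

for step in range(100):
    optimizer.zero_grad()
    recon_u, _ = model(x_unlabeled)           # reconstruction term uses unlabeled data
    _, logits_l = model(x_labeled)            # classification term uses labeled data
    loss = recon_loss(recon_u, x_unlabeled) + 1.0 * clf_loss(logits_l, y_labeled)
    loss.backward()
    optimizer.step()
```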

28 pages, 13595 KiB  
Article
Open-Set Recognition of Environmental Sound Based on KDE-GAN and Attractor–Reciprocal Point Learning
by Jiakuan Wu, Nan Wang, Huajie Hong, Wei Wang, Kunsheng Xing and Yujie Jiang
Acoustics 2025, 7(2), 33; https://doi.org/10.3390/acoustics7020033 - 28 May 2025
Viewed by 739
Abstract
While open-set recognition algorithms have been extensively explored in computer vision, their application to environmental sound analysis remains understudied. To address this gap, this study investigates how to effectively recognize unknown sound categories in real-world environments by proposing a novel Kernel Density Estimation-based Generative Adversarial Network (KDE-GAN) for data augmentation combined with Attractor–Reciprocal Point Learning for open-set classification. Specifically, our approach addresses three key challenges: (1) How to generate boundary-aware synthetic samples for robust open-set training: A closed-set classifier’s pre-logit layer outputs are fed into the KDE-GAN, which synthesizes samples mapped to the logit layer using the classifier’s original weights. Kernel Density Estimation then enforces Density Loss and Offset Loss to ensure these samples align with class boundaries. (2) How to optimize feature space organization: The closed-set classifier is constrained by an Attractor–Reciprocal Point joint loss, maintaining intra-class compactness while pushing unknown samples toward low-density regions. (3) How to evaluate performance in highly open scenarios: We validate the method using UrbanSound8K, AudioEventDataset, and TUT Acoustic Scenes 2017 as closed sets, with ESC-50 categories as open-set samples, achieving AUROC/OSCR scores of 0.9251/0.8743, 0.7921/0.7135, and 0.8209/0.6262, respectively. The findings demonstrate the potential of this framework to enhance environmental sound monitoring systems, particularly in applications requiring adaptability to unseen acoustic events (e.g., urban noise surveillance or wildlife monitoring). Full article
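The evaluation protocol above (known-class classification plus rejection of unknown samples, scored with AUROC) can be pictured with a generic distance-to-class-center rejection rule, sketched below. This is a simplified stand-in for the Attractor–Reciprocal Point criterion described in the paper; the feature dimensionality, synthetic data, and threshold logic are assumptions.

```python
# Generic open-set rejection sketch: classify by nearest class center ("attractor")
# and reject samples far from every center. Simplified stand-in for the paper's
# Attractor-Reciprocal Point criterion; shapes and data are assumed.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16))                       # learned class attractors (assumed)
known = centers[rng.integers(0, 3, 200)] + 0.3 * rng.normal(size=(200, 16))
unknown = rng.normal(size=(100, 16)) * 2.0               # open-set samples (e.g., ESC-50 stand-ins)

def openset_score(feats: np.ndarray) -> np.ndarray:
    """Higher score = more likely to belong to a known class."""
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=-1)
    return -dists.min(axis=1)                            # negative distance to nearest attractor

scores = np.concatenate([openset_score(known), openset_score(unknown)])
is_known = np.concatenate([np.ones(len(known)), np.zeros(len(unknown))])
print("AUROC:", roc_auc_score(is_known, scores))         # same metric reported in the abstract
```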

18 pages, 2345 KiB  
Article
SGM-EMA: Speech Enhancement Method Score-Based Diffusion Model and EMA Mechanism
by Yuezhou Wu, Zhiri Li and Hua Huang
Appl. Sci. 2025, 15(10), 5243; https://doi.org/10.3390/app15105243 - 8 May 2025
Viewed by 860
Abstract
The score-based diffusion model has made significant progress in the field of computer vision, surpassing the performance of generative models such as variational autoencoders, and has been extended to applications such as speech enhancement and recognition. This paper proposes a U-Net architecture using a score-based diffusion model and an efficient multi-scale attention (EMA) mechanism for the speech enhancement task. The model leverages the symmetric structure of U-Net to extract speech features and captures contextual information and local details across different scales using the EMA mechanism, improving speech quality in noisy environments. We evaluate the method on the VoiceBank-DEMAND (VB-DMD) dataset and the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus–TUT Sound Events 2017 (TIMIT-TUT) dataset. The experimental results show that the proposed model performs well in terms of perceptual evaluation of speech quality (PESQ), extended short-time objective intelligibility (ESTOI), and scale-invariant signal-to-distortion ratio (SI-SDR). Especially when processing out-of-dataset noisy speech, the proposed method achieves excellent speech enhancement results compared to other methods, demonstrating the model's strong generalization capability. We also conducted an ablation study on the SDE solver and the EMA mechanism; the results show that the reverse diffusion method outperformed the Euler–Maruyama method and that the EMA strategy improved model performance, demonstrating the effectiveness of these two techniques in our system. Nevertheless, since the model is specifically designed for Gaussian noise, its performance under non-Gaussian or complex noise conditions may be limited. Full article
(This article belongs to the Special Issue Application of Deep Learning in Speech Enhancement Technology)
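The ablation mentioned above compares samplers for the reverse (denoising) process. As a worked illustration of what a single Euler–Maruyama reverse step looks like in a score-based model, the sketch below applies the standard update for a variance-exploding SDE with a placeholder score network; the schedule, network, and shapes are assumptions, not the SGM-EMA model.

```python
# Euler-Maruyama sketch of the reverse-SDE sampling loop in a score-based model
# (variance-exploding SDE). Score network, noise schedule, and shapes are
# placeholders, not the SGM-EMA model from the paper.
import math
import torch
import torch.nn as nn

score_net = nn.Sequential(nn.Linear(129, 256), nn.ReLU(), nn.Linear(256, 128))

def score(x: torch.Tensor, t: float) -> torch.Tensor:
    """Placeholder score estimate s_theta(x, t); the paper uses a U-Net with EMA blocks."""
    t_col = torch.full((x.shape[0], 1), t)
    return score_net(torch.cat([x, t_col], dim=-1))

def g(t: float, sigma_min: float = 0.01, sigma_max: float = 10.0) -> float:
    """Assumed diffusion coefficient g(t) for a variance-exploding SDE."""
    sigma = sigma_min * (sigma_max / sigma_min) ** t
    return sigma * math.sqrt(2.0 * math.log(sigma_max / sigma_min))

x = torch.randn(8, 128) * 10.0          # start from the prior (stand-in for noisy spectrogram frames)
n_steps = 100
dt = 1.0 / n_steps
for i in range(n_steps, 0, -1):         # integrate the reverse SDE from t = 1 down to t = 0
    t = i / n_steps
    z = torch.randn_like(x)
    # x_{t - dt} = x_t + g(t)^2 * score(x_t, t) * dt + g(t) * sqrt(dt) * z
    x = x + (g(t) ** 2) * score(x, t) * dt + g(t) * math.sqrt(dt) * z
```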

21 pages, 8334 KiB  
Article
A Study Based on b-Value and Information Entropy in the 2008 Wenchuan 8.0 Earthquake
by Shasha Liang, Ziqi Wang and Xinyue Wang
Entropy 2025, 27(4), 431; https://doi.org/10.3390/e27040431 - 16 Apr 2025
Viewed by 367
Abstract
Earthquakes are serious natural disasters that have caused great harm to human beings. In recent years, the combination of acoustic emission technology and information entropy has shown good prospects in earthquake prediction. In this paper, we study the application of acoustic emission b-values and information entropy to earthquake prediction in China and analyze their changing characteristics and roles. The acoustic emission b-value is based on the Gutenberg–Richter law, which quantifies the relationship between magnitude and occurrence frequency; lower b-values are usually associated with higher earthquake risk. Meanwhile, information entropy is used to quantify the uncertainty of the system, reflecting the distribution characteristics of seismic events and their dynamic changes. In this study, acoustic emission data from several stations around the 2008 Wenchuan 8.0 earthquake are selected for analysis. By calculating the acoustic emission b-value and information entropy, the following is found: (1) Both the b-value and the information entropy show obvious changes before the main earthquake: during the seismic phase, the acoustic emission b-value decreases significantly, and the information entropy also decreases markedly. The b-values of stations AXI and DFU decrease continuously over the 40 days before the earthquake, while the b-values of stations JYA and JMG begin to decrease significantly roughly 17 days before the earthquake. The information entropy changes at the JJS and YZP stations are pronounced, especially at the YZP station, which shows stronger clustering of seismic activity. This indicates that the regional underground structure is in an extremely unstable state. (2) The stress evolution process of the rock mass is divided into three stages: in the first stage, the rock mass enters a sub-stabilized state about 40 days before the main earthquake; in the second stage, about 10 days before the earthquake, crack rupture changes from a disordered state to an ordered state; and in the third stage, shortly before the earthquake, the entire subsurface structure approaches destabilization. In summary, the combined analysis of the acoustic emission b-value and information entropy provides a novel dual-parameter synergy framework for earthquake monitoring and early warning, enhancing precursor recognition through the coupling of stress evolution and system disorder dynamics. Full article
(This article belongs to the Section Multidisciplinary Applications)
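For context, the two quantities combined in this study have standard estimators: the Gutenberg–Richter b-value is often obtained from the Aki maximum-likelihood formula b = log10(e) / (mean(M) - Mc), and information entropy can be computed as the Shannon entropy of the event-magnitude distribution. The sketch below computes both for a window of events; the catalogue values, completeness magnitude, and binning are example assumptions, not the paper's data.

```python
# Illustrative estimators for the two precursor quantities discussed above:
# Aki's maximum-likelihood b-value and Shannon entropy of the magnitude distribution.
# Magnitudes, completeness magnitude, and binning are assumed example values.
import numpy as np

def b_value(magnitudes: np.ndarray, mc: float) -> float:
    """Aki (1965) maximum-likelihood b-value: b = log10(e) / (mean(M) - Mc)."""
    m = magnitudes[magnitudes >= mc]
    return np.log10(np.e) / (m.mean() - mc)

def shannon_entropy(magnitudes: np.ndarray, bins: int = 20) -> float:
    """Shannon entropy (in nats) of the binned magnitude distribution."""
    counts, _ = np.histogram(magnitudes, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(1)
mags = rng.exponential(scale=0.5, size=500) + 1.0     # synthetic AE magnitudes, Mc = 1.0
print("b-value:", round(b_value(mags, mc=1.0), 2))
print("entropy:", round(shannon_entropy(mags), 2))
```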

21 pages, 866 KiB  
Article
An Event Recognition Method for a Φ-OTDR System Based on CNN-BiGRU Network Model with Attention
by Changli Li, Xiaoyu Chen and Yi Shi
Photonics 2025, 12(4), 313; https://doi.org/10.3390/photonics12040313 - 28 Mar 2025
Viewed by 671
Abstract
The phase-sensitive optical time domain reflectometry (Φ-OTDR) technique offers a method for distributed acoustic sensing (DAS) systems to detect external acoustic fluctuations and mechanical vibrations. By accurately identifying vibration events, DAS systems provide a non-invasive solution for security monitoring. However, limitations in temporal signal analysis and the lack of spatial features significantly impact classification accuracy in event recognition. To address these challenges, this paper proposes a network model for vibration-event recognition that integrates convolutional neural networks (CNNs), bidirectional gated recurrent units (BiGRUs), and attention mechanisms, referred to as CNN-BiGRU-Attention (CBA). First, the CBA model processes spatiotemporal matrices converted from raw signals, extracting low-level features through convolution and pooling. Subsequently, features are further extracted and separated along both the temporal and spatial dimensions. In the spatial-dimension branch, horizontal convolution and pooling generate enhanced spatial feature maps. In the temporal-dimension branch, vertical convolution and pooling are followed by BiGRU processing to capture dynamic changes in vibration events from both past and future contexts. Additionally, the attention mechanism focuses on extracted features in both dimensions. The features from the two dimensions are then fused using two cross-attention mechanisms. Finally, classification probabilities are output through a fully connected layer and a softmax activation function. In the experimental simulation section, the model is validated using real-world data. A comparison with four other typical models demonstrates that the proposed CBA model offers significant advantages in both recognition accuracy and robustness. Full article
(This article belongs to the Special Issue Distributed Optical Fiber Sensing Technology)
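The architecture described above can be pictured with a much-reduced skeleton: a convolutional front end over the spatiotemporal matrix, a bidirectional GRU over the temporal axis, an attention pooling step, and a fully connected classifier. The sketch below keeps only that skeleton; the layer sizes, the attention form, and the omission of the dual-branch cross-attention are simplifications and assumptions, not the CBA model itself.

```python
# Reduced skeleton of a CNN + BiGRU + attention classifier for phi-OTDR events.
# Simplified single-branch version with assumed sizes; not the paper's CBA model.
import torch
import torch.nn as nn

class TinyCBA(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.cnn = nn.Sequential(                       # low-level feature extraction
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.bigru = nn.GRU(input_size=16 * 16, hidden_size=32,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(64, 1)                    # scores each time step
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                               # x: (batch, 1, time=64, space=64)
        f = self.cnn(x)                                 # (batch, 16, 16, 16)
        f = f.permute(0, 2, 1, 3).flatten(2)            # (batch, time=16, 16*16)
        h, _ = self.bigru(f)                            # (batch, 16, 64)
        w = torch.softmax(self.attn(h), dim=1)          # attention weights over time
        pooled = (w * h).sum(dim=1)                     # attention-weighted summary
        return self.fc(pooled)                          # class logits (softmax applied at loss time)

logits = TinyCBA()(torch.randn(2, 1, 64, 64))           # two spatiotemporal matrices, four classes
```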

18 pages, 2018 KiB  
Article
Adapting a Large-Scale Transformer Model to Decode Chicken Vocalizations: A Non-Invasive AI Approach to Poultry Welfare
by Suresh Neethirajan
AI 2025, 6(4), 65; https://doi.org/10.3390/ai6040065 - 25 Mar 2025
Cited by 2 | Viewed by 1350
Abstract
Natural Language Processing (NLP) and advanced acoustic analysis have opened new avenues in animal welfare research by decoding the vocal signals of farm animals. This study explored the feasibility of adapting a large-scale Transformer-based model, OpenAI’s Whisper, originally developed for human speech recognition, to decode chicken vocalizations. Our primary objective was to determine whether Whisper could effectively identify acoustic patterns associated with emotional and physiological states in poultry, thereby enabling real-time, non-invasive welfare assessments. To achieve this, chicken vocal data were recorded under diverse experimental conditions, including healthy versus unhealthy birds, pre-stress versus post-stress scenarios, and quiet versus noisy environments. The audio recordings were processed through Whisper, producing text-like outputs. Although these outputs did not represent literal translations of chicken vocalizations into human language, they exhibited consistent patterns in token sequences and sentiment indicators strongly correlated with recognized poultry stressors and welfare conditions. Sentiment analysis using standard NLP tools (e.g., polarity scoring) identified notable shifts in “negative” and “positive” scores that corresponded closely with documented changes in vocal intensity associated with stress events and altered physiological states. Despite the inherent domain mismatch—given Whisper’s original training on human speech—the findings clearly demonstrate the model’s capability to reliably capture acoustic features significant to poultry welfare. Recognizing the limitations associated with applying English-oriented sentiment tools, this study proposes future multimodal validation frameworks incorporating physiological sensors and behavioral observations to further strengthen biological interpretability. To our knowledge, this work provides the first demonstration that Transformer-based architectures, even without species-specific fine-tuning, can effectively encode meaningful acoustic patterns from animal vocalizations, highlighting their transformative potential for advancing productivity, sustainability, and welfare practices in precision poultry farming. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
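The processing chain described above (audio through Whisper, then off-the-shelf sentiment scoring of the text-like output) can be sketched roughly as follows. The file path is a placeholder and VADER is an assumed choice of polarity scorer for illustration; the study's actual recording pipeline and sentiment tooling may differ.

```python
# Rough sketch of the pipeline described above: run audio through Whisper and
# score the text-like output with an off-the-shelf polarity analyzer.
# The file path is a placeholder and VADER is an assumed sentiment tool.
import whisper                                   # pip install openai-whisper
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

model = whisper.load_model("base")               # pretrained on human speech, no fine-tuning
result = model.transcribe("chicken_vocalization.wav")   # hypothetical recording
tokens = result["text"]                          # text-like output, not a literal translation

sia = SentimentIntensityAnalyzer()
polarity = sia.polarity_scores(tokens)           # {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
print(tokens)
print("negative:", polarity["neg"], "positive:", polarity["pos"])
```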

15 pages, 2866 KiB  
Article
Optical Fiber Vibration Signal Recognition Based on the EMD Algorithm and CNN-LSTM
by Kun Li, Yao Zhen, Peng Li, Xinyue Hu and Lixia Yang
Sensors 2025, 25(7), 2016; https://doi.org/10.3390/s25072016 - 23 Mar 2025
Cited by 1 | Viewed by 652
Abstract
Accurately identifying optical fiber vibration signals is crucial for ensuring the proper operation of optical fiber perimeter security warning systems. To enhance the recognition accuracy of intrusion events detected by the distributed acoustic sensing system (DAS) based on phase-sensitive optical time-domain reflectometer (φ-OTDR) technology, we propose an identification method that combines empirical mode decomposition (EMD) with convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. First, the EMD algorithm decomposes the collected original optical fiber vibration signal into several intrinsic mode functions (IMFs), and the correlation coefficient between each IMF and the original signal is calculated. The signal is then reconstructed by selecting effective IMF components based on a suitable threshold. This reconstructed signal serves as the input for the network. CNN is used to extract time-series features from the vibration signal and LSTM is employed to classify the reconstructed signal. Experimental results demonstrate that this method effectively identifies three different types of vibration signals collected from a real-world environment, achieving a recognition accuracy of 97.3% for intrusion signals. This method successfully addresses the challenge of φ-OTDR pattern recognition and provides valuable insights for the development of practical engineering products. Full article
(This article belongs to the Section Optical Sensors)
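The signal-reconstruction step described above (decompose with EMD, keep only IMFs that correlate strongly with the original signal, and sum them back) can be sketched as below. The PyEMD package and the 0.2 correlation threshold are assumptions for illustration, not the authors' exact tooling or threshold.

```python
# Sketch of EMD decomposition followed by correlation-based IMF selection and
# reconstruction, as described above. PyEMD and the 0.2 threshold are assumed choices.
import numpy as np
from PyEMD import EMD                      # pip install EMD-signal

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2048)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * rng.normal(size=t.size)   # hypothetical vibration trace

imfs = EMD().emd(signal)                   # intrinsic mode functions, shape (n_imfs, n_samples)

selected = []
for imf in imfs:
    corr = np.corrcoef(imf, signal)[0, 1]  # correlation of each IMF with the original signal
    if abs(corr) > 0.2:                    # assumed threshold for "effective" IMF components
        selected.append(imf)

reconstructed = np.sum(selected, axis=0)   # denoised input for the CNN-LSTM classifier
```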

12 pages, 1513 KiB  
Article
Emotion-Recognition System for Smart Environments Using Acoustic Information (ERSSE)
by Gabriela Santiago, Jose Aguilar and Rodrigo García
Information 2024, 15(11), 677; https://doi.org/10.3390/info15110677 - 30 Oct 2024
Viewed by 1553
Abstract
Acoustic management is very important for detecting possible events in the context of a smart environment (SE). In previous works, we proposed a reflective middleware for acoustic management (ReM-AM) and its autonomic cycles of data analysis tasks, along with its ontology-driven architecture. In this work, we aim to develop an emotion-recognition system for ReM-AM that uses sound events, rather than speech, as its main focus. The system is based on a sound pattern for emotion recognition and the autonomic cycle of intelligent sound analysis (ISA), defined by three tasks: variable extraction, sound data analysis, and emotion recommendation. We include a case study to test our emotion-recognition system in a simulation of a smart movie theater, with different situations taking place. The implementation and verification of the tasks show a promising performance in the case study, with 80% accuracy in sound recognition, and its general behavior shows that it can contribute to improving the well-being of the people present in the environment. Full article
(This article belongs to the Section Artificial Intelligence)

24 pages, 3882 KiB  
Article
Open-Set Recognition of Pansori Rhythm Patterns Based on Audio Segmentation
by Jie You and Joonwhoan Lee
Appl. Sci. 2024, 14(16), 6893; https://doi.org/10.3390/app14166893 - 6 Aug 2024
Cited by 1 | Viewed by 1206
Abstract
Pansori, a traditional Korean form of musical storytelling, is characterized by performances involving a vocalist and a drummer. It is well known for the singer's expressive narration (aniri) and delicate gestures performed with a fan in hand. The classical Pansori repertoires mostly tell stories of love, satire, and humor, as well as conveying social lessons. These performances, which can extend from three to five hours, necessitate that the vocalist adheres to precise rhythmic structures. The distinctive rhythms of Pansori are crucial for conveying both the narrative and musical expression effectively. This paper explores the challenge of open-set recognition, aiming to efficiently identify unknown Pansori rhythm patterns while applying the methodology to diverse acoustic datasets, such as sound events and genres. We propose a lightweight deep learning-based encoder–decoder segmentation model, which employs a 2-D log-Mel spectrogram as input for the encoder and produces a frame-based 1-D decision along the temporal axis. This segmentation approach, processing 2-D inputs to classify frame-wise rhythm patterns, proves effective in detecting unknown patterns within time-varying sound streams encountered in daily life. Throughout the training phase, both center and supervised contrastive losses, along with cross-entropy loss, are minimized. This strategy creates a compact cluster structure within the feature space for known classes, thereby facilitating the recognition of unknown rhythm patterns by allocating ample space for their placement within the embedded feature space. Comprehensive experiments utilizing various datasets, including Pansori rhythm patterns (91.8%), synthetic datasets of instrument sounds (95.1%), music genres (76.9%), and sound datasets from DCASE challenges (73.0%), demonstrate the efficacy of our proposed method in detecting unknown events, as evidenced by the AUROC metrics. Full article
(This article belongs to the Special Issue Algorithmic Music and Sound Computing)
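The training objective described above combines cross-entropy with center and supervised contrastive losses to tighten known-class clusters. The snippet below sketches only the simplest of these additions, a center-loss term that pulls frame embeddings toward learnable class centers before adding cross-entropy; the embedding size, weighting, and the omission of the contrastive term are assumptions, not the paper's full objective.

```python
# Sketch of adding a center-loss term to cross-entropy so that known-class
# embeddings form compact clusters (simplified; the paper also uses a
# supervised contrastive loss). Sizes and weighting are assumed.
import torch
import torch.nn as nn

n_classes, emb_dim = 5, 64
centers = nn.Parameter(torch.randn(n_classes, emb_dim))      # learnable class centers
encoder = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, emb_dim))
head = nn.Linear(emb_dim, n_classes)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(head.parameters()) + [centers], lr=1e-3)

frames = torch.randn(32, 80)                                  # hypothetical log-Mel frames
labels = torch.randint(0, n_classes, (32,))

for step in range(100):
    optimizer.zero_grad()
    z = encoder(frames)
    ce = nn.functional.cross_entropy(head(z), labels)
    center = ((z - centers[labels]) ** 2).sum(dim=1).mean()   # pull embeddings to their class center
    loss = ce + 0.1 * center                                   # assumed weighting
    loss.backward()
    optimizer.step()
```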

19 pages, 9149 KiB  
Article
Multi-Sensor Fusion Approach to Drinking Activity Identification for Improving Fluid Intake Monitoring
by Ju-Hsuan Li, Pei-Wei Yu, Hsuan-Chih Wang, Che-Yu Lin, Yen-Chen Lin, Chien-Pin Liu, Chia-Yeh Hsieh and Chia-Tai Chan
Appl. Sci. 2024, 14(11), 4480; https://doi.org/10.3390/app14114480 - 24 May 2024
Cited by 2 | Viewed by 1256
Abstract
People nowadays often ignore the importance of proper hydration. Water is indispensable to the human body's functions, including maintaining normal temperature, eliminating waste, and preventing kidney damage. When fluid intake falls below consumption, it becomes difficult to metabolize waste. Furthermore, insufficient fluid intake can also cause headaches, dizziness, and fatigue. Fluid intake monitoring therefore plays an important role in preventing dehydration. In this study, we propose a multimodal approach to drinking activity identification to improve fluid intake monitoring. The movement signals of the wrist and container, as well as acoustic signals of swallowing, are acquired. After pre-processing and feature extraction, typical machine learning algorithms are used to determine whether each sliding window corresponds to a drinking activity. Next, the recognition performance of the single-modal and multimodal methods is compared through event-based and sample-based evaluations. In the sample-based evaluation, the proposed multi-sensor fusion approach performs better with the support vector machine and extreme gradient boosting classifiers, achieving F1-scores of 83.7% and 83.9%, respectively. Similarly, in the event-based evaluation, the proposed method achieves the best F1-score of 96.5% with the support vector machine. The results demonstrate that the multimodal approach outperforms the single-modal approach in drinking activity identification. Full article
(This article belongs to the Special Issue Intelligent Electronic Monitoring Systems and Their Application)
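The per-window classification step described above can be pictured with a compact sketch: slide a fixed-length window over synchronized wrist, container, and acoustic features, and classify each window with a support vector machine. The feature choices, window length, and synthetic data are assumptions, not the study's protocol.

```python
# Sketch of sliding-window classification over fused wrist / container / acoustic
# features with an SVM, as described above. Window length, features, and data are assumed.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, win = 10_000, 200                      # synthetic synchronized stream, 200-sample windows
stream = rng.normal(size=(n_samples, 9))          # e.g., wrist IMU (3) + container IMU (3) + audio (3)
labels = (rng.random(n_samples // win) > 0.7).astype(int)   # 1 = drinking window (synthetic)

def window_features(stream: np.ndarray, win: int) -> np.ndarray:
    """Mean and standard deviation of each channel per non-overlapping window."""
    windows = stream[: (len(stream) // win) * win].reshape(-1, win, stream.shape[1])
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

X = window_features(stream, win)
clf = SVC(kernel="rbf").fit(X, labels)            # sample-based (per-window) training
print("windows flagged as drinking:", int(clf.predict(X).sum()))
```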

16 pages, 4193 KiB  
Article
Combined Data Augmentation on EANN to Identify Indoor Anomalous Sound Event
by Xiyu Song, Junhan Xiong, Mei Wang, Qingshan Mei and Xiaodong Lin
Appl. Sci. 2024, 14(4), 1327; https://doi.org/10.3390/app14041327 - 6 Feb 2024
Cited by 3 | Viewed by 1666
Abstract
Indoor abnormal sound event identification refers to the automatic detection and recognition of abnormal sounds in an indoor environment using computer auditory technology. However, the process of model training usually requires a large amount of high-quality data, which can be time-consuming and costly to collect. Utilizing limited data has become another preferred approach for such research, but it introduces overfitting issues for machine learning models on small datasets. To overcome this issue, we proposed and validated the framework of combining the offline augmentation of raw audio and online augmentation of spectral features, making the application of small datasets in indoor anomalous sound event identification more feasible. Along with this, an improved two-dimensional audio convolutional neural network (EANN) was also proposed to evaluate and compare the impacts of different data augmentation methods under the framework on the sensitivity of sound event identification. Moreover, we further investigated the performance of four combinations of data augmentation techniques. Our research shows that the proposed combined data augmentation method has an accuracy of 97.4% on the test dataset, which is 10.6% higher than the baseline method. This demonstrates the method’s potential in the identification of indoor abnormal sound events. Full article
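The combined augmentation framework above (offline transforms on the raw waveform plus online augmentation of spectral features) can be illustrated with a minimal sketch: time-shift and add noise to the audio offline, then apply SpecAugment-style time and frequency masking to the Mel spectrogram at training time. The specific transforms, parameters, and example clip are assumptions, not the paper's recipe.

```python
# Minimal sketch of the combined augmentation idea described above: offline waveform
# augmentation plus online SpecAugment-style masking. Transforms and parameters are assumed.
import numpy as np
import librosa

def augment_waveform(y: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Offline augmentation: random circular time shift plus additive Gaussian noise."""
    y = np.roll(y, rng.integers(0, len(y)))
    return y + 0.005 * rng.normal(size=y.shape)

def spec_mask(mel: np.ndarray, rng: np.random.Generator,
              max_t: int = 20, max_f: int = 8) -> np.ndarray:
    """Online augmentation: zero out one random time band and one frequency band."""
    mel = mel.copy()
    t0 = rng.integers(0, max(1, mel.shape[1] - max_t))
    f0 = rng.integers(0, max(1, mel.shape[0] - max_f))
    mel[:, t0:t0 + rng.integers(1, max_t)] = 0.0
    mel[f0:f0 + rng.integers(1, max_f), :] = 0.0
    return mel

rng = np.random.default_rng(0)
y, sr = librosa.load(librosa.example("trumpet"))        # placeholder clip standing in for indoor audio
y_aug = augment_waveform(y, rng)                        # applied once, offline, to the raw audio
mel = librosa.feature.melspectrogram(y=y_aug, sr=sr, n_mels=64)
mel_aug = spec_mask(librosa.power_to_db(mel), rng)      # applied per batch, online, to spectral features
```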

16 pages, 2437 KiB  
Article
Electrophysiological Correlates of Vocal Emotional Processing in Musicians and Non-Musicians
by Christine Nussbaum, Annett Schirmer and Stefan R. Schweinberger
Brain Sci. 2023, 13(11), 1563; https://doi.org/10.3390/brainsci13111563 - 7 Nov 2023
Cited by 2 | Viewed by 2321
Abstract
Musicians outperform non-musicians in vocal emotion recognition, but the underlying mechanisms are still debated. Behavioral measures highlight the importance of auditory sensitivity towards emotional voice cues. However, it remains unclear whether and how this group difference is reflected at the brain level. Here, we compared event-related potentials (ERPs) to acoustically manipulated voices between musicians (n = 39) and non-musicians (n = 39). We used parameter-specific voice morphing to create and present vocal stimuli that conveyed happiness, fear, pleasure, or sadness, either in all acoustic cues or selectively in either pitch contour (F0) or timbre. Although the fronto-central P200 (150–250 ms) and N400 (300–500 ms) components were modulated by pitch and timbre, differences between musicians and non-musicians appeared only for a centro-parietal late positive potential (500–1000 ms). Thus, this study does not support an early auditory specialization in musicians but suggests instead that musicality affects the manner in which listeners use acoustic voice cues during later, controlled aspects of emotion evaluation. Full article
(This article belongs to the Special Issue The Role of Sounds and Music in Emotion and Cognition)

15 pages, 11940 KiB  
Article
An Investigation of ECAPA-TDNN Audio Type Recognition Method Based on Mel Acoustic Spectrograms
by Jian Wang, Zhongzheng Wang, Xingcheng Han and Yan Han
Electronics 2023, 12(21), 4421; https://doi.org/10.3390/electronics12214421 - 27 Oct 2023
Cited by 3 | Viewed by 3450
Abstract
Audio signals play a crucial role in our perception of our surroundings. People rely on sound to assess motion, distance, direction, and environmental conditions, aiding in danger avoidance and decision making. However, in real-world environments, the acquisition and transmission of audio signals are often affected by various types of noise that interfere with the intended signals. As a result, the essential features of audio signals become significantly obscured. Under strong noise interference, identifying noise or sound segments and distinguishing audio types become pivotal for detecting specific events and sound patterns or for isolating abnormal sounds. This study analyzes the characteristics of the Mel acoustic spectrogram, explores the application of the deep learning ECAPA-TDNN method to audio type recognition, and substantiates its effectiveness through experiments. Ultimately, the experimental results demonstrate that the ECAPA-TDNN method, using Mel acoustic spectrograms as features, achieves a notably high recognition accuracy for audio type recognition. Full article
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
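The feature front end named above, a log-scaled Mel acoustic spectrogram feeding an ECAPA-TDNN classifier, can be sketched in a few lines; the librosa parameters below (FFT size, hop length, number of Mel bands) and the example clip are assumed defaults rather than the paper's settings, and the classifier itself is omitted.

```python
# Sketch of the log-Mel spectrogram front end mentioned above. Parameters are
# assumed defaults, not the paper's settings; the ECAPA-TDNN classifier is omitted.
import numpy as np
import librosa

def log_mel(y: np.ndarray, sr: int, n_mels: int = 80) -> np.ndarray:
    """Log-scaled Mel spectrogram, shape (n_mels, n_frames)."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

y, sr = librosa.load(librosa.example("trumpet"))   # placeholder clip standing in for an audio segment
features = log_mel(y, sr)                          # input features for an ECAPA-TDNN-style classifier
print(features.shape)
```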
