
Acoustic Event Detection and Sensing

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Physical Sensors".

Deadline for manuscript submissions: closed (31 December 2021) | Viewed by 13121

Special Issue Editor


Prof. Dr. Hong Kook Kim
Guest Editor
School of Electrical Engineering and Computer Science and AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, Korea
Interests: acoustic event detection; DoA estimation; audio analytics; speech enhancement; speech recognition

Special Issue Information

Dear Colleagues,

In recent years, many research efforts have used various types of sensors, such as cameras, gyroscopes, and chemical sensors, to improve users' quality of life. Hearing is one of the five senses through which human beings interact with their surroundings, and acoustic sensing becomes especially important when visual sensing from video cameras is unreliable. For example, acoustic signals can drive surveillance systems that detect and localize ambient sounds, such as calls for rescue, car crashes, and explosions, in order to protect people in danger.

In this Special Issue, we focus on machine learning and deep learning approaches to acoustic event detection and sensing, including indoor and outdoor activity monitoring, emergency and event detection (such as fire alarms or patient falls in hospitals), and the localization of acoustic sources for contextual information extraction.

Prof. Dr. Hong Kook Kim
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Acoustic sensing using microphone array and beamforming
  • Acoustic source localization
  • Acoustic scene analysis
  • Monophonic or polyphonic acoustic event detection
  • Acoustic anomaly detection
  • Contextual information processing from acoustic sensing data
  • Acoustic data collection and management

Published Papers (5 papers)

Research

9 pages, 4738 KiB  
Communication
Deep Learning-Based Estimation of Reverberant Environment for Audio Data Augmentation
by Deokgyu Yun and Seung Ho Choi
Sensors 2022, 22(2), 592; https://doi.org/10.3390/s22020592 - 13 Jan 2022
Cited by 4 | Viewed by 2025
Abstract
This paper proposes an audio data augmentation method based on deep learning to improve the performance of dereverberation. Conventionally, audio data are augmented using a room impulse response that is artificially generated by methods such as the image method. The proposed method instead estimates a reverberation environment model with a deep neural network trained using clean and recorded audio data as inputs and outputs, respectively. A large augmented database is then constructed using the trained reverberation model, and the dereverberation model is trained with this augmented database. The augmentation model was verified by the log spectral distance and mean square error between the augmented data and the recorded data. In addition, dereverberation experiments showed that the proposed method improves performance compared with the conventional method.
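As a rough illustration of the augmentation pipeline described in this abstract, the following PyTorch sketch trains a toy network to map clean spectrograms to recorded (reverberant) ones and then reuses it to synthesize augmented data. The architecture, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: learn a clean -> reverberant spectrogram mapping, then
# reuse it for data augmentation. All shapes and sizes are illustrative.
import torch
import torch.nn as nn

class ReverbEstimator(nn.Module):
    """Maps clean log-spectrogram frames to reverberant ones."""
    def __init__(self, n_freq=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq),
        )

    def forward(self, x):              # x: (batch, frames, n_freq)
        return self.net(x)

model = ReverbEstimator()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on paired (clean, recorded-reverberant) spectrograms;
# random tensors stand in for real data here.
clean = torch.randn(8, 100, 257)
recorded = torch.randn(8, 100, 257)
loss = nn.functional.mse_loss(model(clean), recorded)  # paper reports MSE/LSD
loss.backward()
optim.step()

# After training, passing any clean corpus through `model` yields a large
# reverberant-like database for training a dereverberation model.
with torch.no_grad():
    augmented = model(clean)
```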

13 pages, 1096 KiB  
Article
Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification
by Jin-Young Son and Joon-Hyuk Chang
Sensors 2021, 21(20), 6718; https://doi.org/10.3390/s21206718 - 09 Oct 2021
Cited by 3 | Viewed by 1810
Abstract
Sound event detection (SED) recognizes the sound event corresponding to an incoming signal and estimates its temporal boundary. Although SED has recently been developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to performance degradation caused by ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression (NS) network and a pretrained classification network to improve the SED performance in real noisy environments. We use a temporal convolutional network (TCN) equipped with group communication with a context codec (GC3) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduces the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also freeze the weights of some layers during joint fine-tuning and add an attention module to the SED model to further improve the performance and prevent overfitting. We evaluate the proposed method using both simulated and real recorded datasets. The experimental results show that our method improves the classification performance in noisy environments under various signal-to-noise-ratio conditions.
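The joint fine-tuning pattern described here, a pretrained noise-suppression front-end feeding a classifier with selected layers frozen, can be sketched in a few lines of PyTorch. The tiny stand-in modules below are placeholders for the paper's GC3-equipped TCN and CRNN; names and sizes are assumptions.

```python
# Sketch: freeze a pretrained NS front-end while fine-tuning the SED head.
import torch
import torch.nn as nn

# Placeholders for the pretrained networks (the paper uses a GC3-equipped
# TCN for NS and a CRNN for SED; these tiny modules only mimic the wiring).
ns = nn.Sequential(nn.Conv1d(1, 16, 3, padding=1), nn.ReLU(),
                   nn.Conv1d(16, 1, 3, padding=1))
sed = nn.Sequential(nn.Conv1d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                    nn.Linear(16, 10))          # 10 event classes, illustrative

# Freeze the NS weights (or any chosen subset of layers).
for p in ns.parameters():
    p.requires_grad = False

optim = torch.optim.Adam(sed.parameters(), lr=1e-4)  # only SED is updated

noisy = torch.randn(4, 1, 16000)                # placeholder noisy waveforms
labels = torch.randint(0, 10, (4,))
logits = sed(ns(noisy))                         # NS output feeds the classifier
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optim.step()
```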

18 pages, 974 KiB  
Article
An Incremental Class-Learning Approach with Acoustic Novelty Detection for Acoustic Event Recognition
by Barış Bayram and Gökhan İnce
Sensors 2021, 21(19), 6622; https://doi.org/10.3390/s21196622 - 05 Oct 2021
Cited by 6 | Viewed by 2094
Abstract
Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises, and human actions with objects. However, the spatio-temporal nature of the sound signals may not be stationary, and novel events may exist that eventually deteriorate the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events while tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentation, (5) incremental class-learning (ICL) of the audio features of novel events, and (6) AER. The self-learning on different types of audio features extracted from the acoustic signals of various events occurs without human supervision. For the extraction of deep audio representations, in addition to the visual geometry group (VGG) and residual neural network (ResNet) models, time-delay neural network (TDNN) and TDNN-based long short-term memory (TDNN-LSTM) networks are pre-trained on a large-scale audio dataset, Google AudioSet. The performance of ICL with AND is validated using Mel-spectrograms and the deep features extracted from them by the TDNNs, VGG, and ResNet, on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
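One simple way to realize the acoustic novelty detection step in a framework like this is a distance-to-centroid test in the deep feature space: an embedding far from every known class centroid is treated as a novel event. The sketch below is a hypothetical simplification; the threshold rule and feature dimensionality are assumptions, not the authors' method.

```python
# Toy novelty check: flag an embedding as novel if it is far from all
# known class centroids in feature space. Threshold is illustrative.
import numpy as np

def is_novel(embedding, centroids, threshold=1.5):
    dists = np.linalg.norm(centroids - embedding, axis=1)
    return dists.min() > threshold

rng = np.random.default_rng(0)
centroids = rng.normal(size=(5, 128))      # 5 known classes, 128-d features

x_known = centroids[2] + 0.1 * rng.normal(size=128)   # near a known class
x_new = 3.0 * rng.normal(size=128)                    # far from everything

print(is_novel(x_known, centroids))        # expected: False
print(is_novel(x_new, centroids))          # expected: True

# A sample flagged as novel would then be augmented and used to extend the
# classifier incrementally while mitigating catastrophic forgetting.
```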

16 pages, 3354 KiB  
Article
Automatic Detection of Chewing and Swallowing
by Akihiro Nakamura, Takato Saito, Daizo Ikeda, Ken Ohta, Hiroshi Mineno and Masafumi Nishimura
Sensors 2021, 21(10), 3378; https://doi.org/10.3390/s21103378 - 12 May 2021
Cited by 9 | Viewed by 3047
Abstract
A series of eating behaviors, including chewing and swallowing, is considered to be crucial to the maintenance of good health. However, most such behaviors occur within the human body, and highly invasive methods such as X-rays and fiberscopes must be utilized to collect accurate behavioral data. A simpler method of measurement is needed in the healthcare and medical fields; hence, the present study concerns the development of a method to automatically recognize a series of eating behaviors from the sounds produced during eating. The automatic detection of left chewing, right chewing, front biting, and swallowing was tested by deploying a hybrid CTC/attention model, which uses sound recorded through two-channel (2ch) microphones under the ear and weakly labeled data as training data to detect the balance of chewing and swallowing. N-gram-based data augmentation was first performed on the weakly labeled data to generate many weakly labeled eating sounds to augment the training data. The detection performance was improved through the use of the hybrid CTC/attention model, which can learn the context. In addition, the study confirmed similar detection performance for open and closed foods.
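The hybrid CTC/attention objective mentioned in the abstract is commonly implemented as a weighted sum of a CTC loss over encoder outputs and a cross-entropy loss over attention-decoder outputs. The sketch below shows that combination with random tensors; the mixing weight, shapes, and label inventory are illustrative assumptions.

```python
# Sketch of a hybrid CTC/attention loss: weighted sum of CTC and decoder
# cross-entropy. Shapes and the mixing weight are illustrative.
import torch
import torch.nn.functional as F

lam = 0.3                                   # CTC weight (hyperparameter)
T, B, C, L = 50, 4, 6, 10                   # frames, batch, classes, label len

# Encoder outputs as (time, batch, classes) log-probabilities for CTC.
log_probs = F.log_softmax(torch.randn(T, B, C), dim=-1)
targets = torch.randint(1, C, (B, L))       # label ids; 0 is the CTC blank
in_lens = torch.full((B,), T, dtype=torch.long)
tgt_lens = torch.full((B,), L, dtype=torch.long)
ctc = F.ctc_loss(log_probs, targets, in_lens, tgt_lens, blank=0)

# Attention-decoder outputs per target step: (batch, steps, classes).
dec_logits = torch.randn(B, L, C)
att = F.cross_entropy(dec_logits.reshape(-1, C), targets.reshape(-1))

loss = lam * ctc + (1 - lam) * att
print(float(loss))
```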

18 pages, 2450 KiB  
Article
Classification of Surface Vehicle Propeller Cavitation Noise Using Spectrogram Processing in Combination with Convolution Neural Network
by Nhat Hoang Bach, Le Ha Vu and Van Duc Nguyen
Sensors 2021, 21(10), 3353; https://doi.org/10.3390/s21103353 - 12 May 2021
Cited by 9 | Viewed by 2696
Abstract
This paper proposes a method to enhance the detection and classification of surface vehicle propeller cavitation noise (VPCN) in shallow water by using an improved Detection Envelope Modulation On Noise (DEMON) algorithm in combination with a modified convolutional neural network (CNN). To improve the quality of the VPCN spectrogram, we apply the DEMON algorithm while analyzing the amplitude variation (AV) to detect the fundamental frequencies of the VPCN signal. To enhance the performance of the traditional CNN, we adapt the size of the sliding window to the properties of the VPCN spectrogram data and restructure the CNN layers. The results show that the fundamental frequencies contained in the VPCN spectrogram data can be detected. The analytical results based on measured data show that the accuracy of VPCN classification obtained by the proposed method is above 90%, which is higher than those obtained by traditional methods.
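The core DEMON idea referenced in the abstract (band-pass the high-frequency cavitation noise, extract its envelope, and read the low-frequency propeller modulation lines from the envelope spectrum) can be demonstrated on synthetic data. The band edges, modulation rate, and sample rate below are assumptions for illustration, not the paper's improved variant.

```python
# Basic DEMON-style envelope analysis on synthetic amplitude-modulated noise.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 48000                                  # sample rate (assumed)
t = np.arange(fs) / fs                      # 1 s of signal

# Broadband noise amplitude-modulated at a 12 Hz "shaft rate".
x = (1 + 0.8 * np.cos(2 * np.pi * 12 * t)) * np.random.randn(fs)

# 1) Band-pass the high-frequency cavitation band.
b, a = butter(4, [2000, 8000], btype="band", fs=fs)
xb = filtfilt(b, a, x)

# 2) Envelope via the analytic signal (Hilbert transform).
env = np.abs(hilbert(xb))
env -= env.mean()

# 3) The envelope spectrum exposes the low-frequency modulation lines.
spec = np.abs(np.fft.rfft(env))
freqs = np.fft.rfftfreq(env.size, 1 / fs)
band = (freqs > 1) & (freqs < 100)
peak = freqs[band][np.argmax(spec[band])]
print(f"strongest modulation line near {peak:.1f} Hz")   # expect ~12 Hz
```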
