
Advances in Acoustic Sensors and Deep Audio Pattern Recognition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Electronic Sensors".

Deadline for manuscript submissions: 20 September 2024 | Viewed by 8554

Special Issue Editor


Dr. Stavros Ntalampiras
Guest Editor
Department of Computer Science, University of Milan, 20133 Milan, Italy
Interests: audio analysis; AI; computer vision; robotics; deep learning

Special Issue Information

Dear Colleagues,

Recently, there has been a steadily increasing demand for generalized sound-recognition technologies focused on non-speech signals, including environmental sounds, music, and animal vocalizations. The field is advancing rapidly, and most of the literature adopts solutions based on deep architectures. Although such solutions offer significant performance improvements, several aspects remain open, such as interpretability and out-of-distribution learning, and these currently form the center of attention of much ongoing research.

We invite original papers, communications, and review articles covering the latest advances in acoustic sensors and audio pattern recognition technologies, with a focus on the following topics and applications:

  • Self-supervised learning; cooperative deep learning methods; continual learning; multi-task learning; small-footprint models; graph neural networks; deep generative models; out-of-distribution generalization; few-shot learning; adversarial machine learning; transfer and reinforcement learning; and interpretable, verifiable, reliable, explainable, auditable, robust, and unbiased modeling.
  • Computational auditory scene analysis, bioacoustics, medical acoustics, music information retrieval, privacy in smart-home assistants, and acoustic sensor networks.

Dr. Stavros Ntalampiras
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • interpretable, verifiable, reliable, explainable, auditable, robust, and unbiased modeling
  • adversarial machine learning
  • out-of-distribution learning

Published Papers (8 papers)


Research

21 pages, 19137 KiB  
Article
Soundscape Characterization Using Autoencoders and Unsupervised Learning
by Daniel Alexis Nieto-Mora, Maria Cristina Ferreira de Oliveira, Camilo Sanchez-Giraldo, Leonardo Duque-Muñoz, Claudia Isaza-Narváez and Juan David Martínez-Vargas
Sensors 2024, 24(8), 2597; https://doi.org/10.3390/s24082597 - 18 Apr 2024
Viewed by 413
Abstract
Passive acoustic monitoring (PAM) through acoustic recorder units (ARUs) shows promise in detecting early landscape changes linked to functional and structural patterns, including species richness, acoustic diversity, community interactions, and human-induced threats. However, current approaches primarily rely on supervised methods, which require prior knowledge of collected datasets. This reliance poses challenges due to the large volumes of ARU data. In this work, we propose an unsupervised framework using autoencoders to extract soundscape features. We applied this framework to a dataset from Colombian landscapes captured by 31 AudioMoth recorders. Our method generates clusters based on autoencoder features and represents cluster information with prototype spectrograms using centroid features and the decoder part of the neural network. Our analysis provides valuable insights into the distribution and temporal patterns of various sound compositions within the study area. By utilizing autoencoders, we identify significant soundscape patterns characterized by recurring and intense sound types across multiple frequency ranges. This comprehensive understanding of the study area’s soundscape allows us to pinpoint crucial sound sources and gain deeper insights into its acoustic environment. Our results encourage further exploration of unsupervised algorithms in soundscape analysis as a promising alternative path for understanding and monitoring environmental changes.
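To illustrate the kind of pipeline this abstract describes (autoencoder features, clustering, and prototype spectrograms decoded from cluster centroids), a minimal Python sketch follows. It is not the authors' implementation; the network sizes, the 64-bin by 128-frame input, and the use of k-means are illustrative assumptions.

    # Minimal sketch (not the authors' code): autoencoder features -> k-means
    # clusters -> prototype spectrograms decoded from the cluster centroids.
    # Input size (64 mel bins x 128 frames, flattened) and k=8 are assumptions.
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    class SpectrogramAE(nn.Module):
        def __init__(self, n_in=64 * 128, n_latent=32):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(),
                                         nn.Linear(256, n_latent))
            self.decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                         nn.Linear(256, n_in))

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    def cluster_prototypes(model, specs, n_clusters=8):
        """Cluster latent features and decode centroids into prototype spectrograms."""
        model.eval()
        with torch.no_grad():
            _, z = model(specs)                    # latent feature per segment
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(z.numpy())
        centroids = torch.tensor(km.cluster_centers_, dtype=torch.float32)
        with torch.no_grad():
            prototypes = model.decoder(centroids)  # one prototype spectrogram per cluster
        return km.labels_, prototypes.reshape(n_clusters, 64, 128)

    specs = torch.randn(500, 64 * 128)             # stand-in for real log-spectrogram segments
    labels, prototypes = cluster_prototypes(SpectrogramAE(), specs)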

12 pages, 4523 KiB  
Article
A 90.9 dB SNDR 95.3 dB DR Audio Delta–Sigma Modulator with FIA-Assisted OTA
by Gongxing Huang, Cong Wei and Rongshan Wei
Sensors 2024, 24(5), 1449; https://doi.org/10.3390/s24051449 - 23 Feb 2024
Viewed by 567
Abstract
This paper presents a low-power, high-gain integrator design that uses a cascode operational transconductance amplifier (OTA) with floating inverter–amplifier (FIA) assistance. Compared to a traditional cascode, the proposed integrator can achieve a gain of 80 dB, while reducing power consumption by 30%. Upon completing the analysis, the values of the FIA drive capacitor and the clock scheme for the FIA-assisted OTA were obtained. To enhance the dynamic range (DR) and mitigate quantization noise, a tri-level quantizer was employed. The design of the feedback digital-to-analog converter (DAC) was simplified, as it does not use additional mismatch shaping techniques. A third-order, discrete-time delta–sigma modulator was designed and fabricated in a 0.18 μm complementary metal-oxide semiconductor (CMOS) process. It operated on a 1.8 V supply, consuming 221 µW with a 24 kHz bandwidth. The measured SNDR and DR were 90.9 dB and 95.3 dB, respectively.
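As a quick check on the reported numbers, the standard converter figures of merit can be computed directly from the abstract. These are generic textbook definitions (effective number of bits and the Schreier figure of merit), not the paper's own analysis.

    # Standard ADC figures of merit computed from the reported measurements
    # (generic definitions, not the paper's own analysis).
    import math

    sndr_db, dr_db = 90.9, 95.3        # measured SNDR and dynamic range
    power_w, bw_hz = 221e-6, 24e3      # power consumption and signal bandwidth

    enob = (sndr_db - 1.76) / 6.02                           # effective number of bits
    fom_schreier = dr_db + 10 * math.log10(bw_hz / power_w)  # Schreier FoM in dB

    print(f"ENOB ≈ {enob:.1f} bits")                # ≈ 14.8 bits
    print(f"Schreier FoM ≈ {fom_schreier:.1f} dB")  # ≈ 175.7 dB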

18 pages, 12509 KiB  
Article
EnViTSA: Ensemble of Vision Transformer with SpecAugment for Acoustic Event Classification
by Kian Ming Lim, Chin Poo Lee, Zhi Yang Lee and Ali Alqahtani
Sensors 2023, 23(22), 9084; https://doi.org/10.3390/s23229084 - 10 Nov 2023
Cited by 2 | Viewed by 995
Abstract
Recent successes in deep learning have inspired researchers to apply deep neural networks to Acoustic Event Classification (AEC). While deep learning methods can train effective AEC models, they are susceptible to overfitting due to the models’ high complexity. In this paper, we introduce EnViTSA, an innovative approach that tackles key challenges in AEC. EnViTSA combines an ensemble of Vision Transformers with SpecAugment, a novel data augmentation technique, to significantly enhance AEC performance. Raw acoustic signals are transformed into Log Mel-spectrograms using Short-Time Fourier Transform, resulting in a fixed-size spectrogram representation. To address data scarcity and overfitting issues, we employ SpecAugment to generate additional training samples through time masking and frequency masking. The core of EnViTSA resides in its ensemble of pre-trained Vision Transformers, harnessing the unique strengths of the Vision Transformer architecture. This ensemble approach not only reduces inductive biases but also effectively mitigates overfitting. In this study, we evaluate the EnViTSA method on three benchmark datasets: ESC-10, ESC-50, and UrbanSound8K. The experimental results underscore the efficacy of our approach, achieving impressive accuracy scores of 93.50%, 85.85%, and 83.20% on ESC-10, ESC-50, and UrbanSound8K, respectively. EnViTSA represents a substantial advancement in AEC, demonstrating the potential of Vision Transformers and SpecAugment in the acoustic domain.
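To make the SpecAugment step concrete, the sketch below applies time and frequency masking to a log-mel spectrogram. The mask widths and the use of the per-spectrogram mean as the fill value are illustrative choices, not the settings used in the paper.

    # Minimal SpecAugment-style masking on a log-mel spectrogram (freq_bins x frames).
    # Mask widths and fill value are illustrative, not the paper's settings.
    import numpy as np

    def spec_augment(spec, n_freq_masks=1, n_time_masks=1, max_f=8, max_t=16, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        out = spec.copy()
        fill = out.mean()
        n_freq, n_time = out.shape
        for _ in range(n_freq_masks):              # frequency masking
            f = rng.integers(0, max_f + 1)
            f0 = rng.integers(0, max(1, n_freq - f))
            out[f0:f0 + f, :] = fill
        for _ in range(n_time_masks):              # time masking
            t = rng.integers(0, max_t + 1)
            t0 = rng.integers(0, max(1, n_time - t))
            out[:, t0:t0 + t] = fill
        return out

    augmented = spec_augment(np.random.randn(64, 128))   # e.g. 64 mel bins x 128 frames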

19 pages, 679 KiB  
Article
Online Continual Learning in Acoustic Scene Classification: An Empirical Study
by Donghee Ha, Mooseop Kim and Chi Yoon Jeong
Sensors 2023, 23(15), 6893; https://doi.org/10.3390/s23156893 - 3 Aug 2023
Viewed by 951
Abstract
Numerous deep learning methods for acoustic scene classification (ASC) have been proposed to improve the classification accuracy of sound events. However, only a few studies have focused on continual learning (CL) wherein a model continually learns to solve issues with task changes. Therefore, in this study, we systematically analyzed the performance of ten recent CL methods to provide guidelines regarding their performances. The CL methods included two regularization-based methods and eight replay-based methods. First, we defined realistic and difficult scenarios such as online class-incremental (OCI) and online domain-incremental (ODI) cases for three public sound datasets. Then, we systematically analyzed the performance of each CL method in terms of average accuracy, average forgetting, and training time. In OCI scenarios, iCaRL and SCR showed the best performance for small buffer sizes, and GDumb showed the best performance for large buffer sizes. In ODI scenarios, SCR adopting supervised contrastive learning consistently outperformed the other methods, regardless of the memory buffer size. Most replay-based methods have an almost constant training time, regardless of the memory buffer size, and their performance increases with an increase in the memory buffer size. Based on these results, GDumb and SCR should be considered first among continual learning methods for ASC.
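Most of the replay-based methods compared in this study maintain a fixed-size memory buffer of past examples; one common way to fill such a buffer from an online stream is reservoir sampling. The sketch below shows only that generic buffer logic, not any specific method from the study.

    # Reservoir-sampling replay buffer, a generic building block of replay-based
    # continual-learning methods (not any single method's implementation).
    import random

    class ReplayBuffer:
        def __init__(self, capacity):
            self.capacity = capacity
            self.buffer = []       # stores (example, label) pairs
            self.n_seen = 0        # number of stream items observed so far

        def add(self, example, label):
            """Keep each incoming stream item with probability capacity / n_seen."""
            self.n_seen += 1
            if len(self.buffer) < self.capacity:
                self.buffer.append((example, label))
            else:
                j = random.randrange(self.n_seen)
                if j < self.capacity:
                    self.buffer[j] = (example, label)

        def sample(self, batch_size):
            """Draw a replay mini-batch to mix with the incoming batch."""
            k = min(batch_size, len(self.buffer))
            return random.sample(self.buffer, k)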

18 pages, 2268 KiB  
Article
Affective Neural Responses Sonified through Labeled Correlation Alignment
by Andrés Marino Álvarez-Meza, Héctor Fabio Torres-Cardona, Mauricio Orozco-Alzate, Hernán Darío Pérez-Nastar and German Castellanos-Dominguez
Sensors 2023, 23(12), 5574; https://doi.org/10.3390/s23125574 - 14 Jun 2023
Viewed by 1030
Abstract
Sound synthesis refers to the creation of original acoustic signals with broad applications in artistic innovation, such as music creation for games and videos. Nonetheless, machine learning architectures face numerous challenges when learning musical structures from arbitrary corpora. This issue involves adapting patterns borrowed from other contexts to a concrete composition objective. Using Labeled Correlation Alignment (LCA), we propose an approach to sonify neural responses to affective music-listening data, identifying the brain features that are most congruent with the simultaneously extracted auditory features. For dealing with inter/intra-subject variability, a combination of Phase Locking Value and Gaussian Functional Connectivity is employed. The proposed two-step LCA approach embraces a separate coupling stage of input features to a set of emotion label sets using Centered Kernel Alignment. This step is followed by canonical correlation analysis to select multimodal representations with higher relationships. LCA enables physiological explanation by adding a backward transformation to estimate the matching contribution of each extracted brain neural feature set. Correlation estimates and partition quality represent performance measures. The evaluation uses a Vector Quantized Variational AutoEncoder to create an acoustic envelope from the tested Affective Music-Listening database. Validation results demonstrate the ability of the developed LCA approach to generate low-level music based on neural activity elicited by emotions while maintaining the ability to distinguish between the acoustic outputs.
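The two-step coupling described here rests on Centered Kernel Alignment followed by canonical correlation analysis. A minimal sketch of those two generic building blocks (linear CKA and scikit-learn's CCA) is given below; it uses random placeholder data and is not the authors' full LCA pipeline.

    # Generic building blocks of the described pipeline: linear CKA between two
    # feature matrices, then CCA to extract correlated multimodal components.
    # Placeholder data; this is not the authors' LCA implementation.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    def linear_cka(X, Y):
        """Linear CKA between samples-by-features matrices X and Y."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
        norm_x = np.linalg.norm(X.T @ X, "fro")
        norm_y = np.linalg.norm(Y.T @ Y, "fro")
        return hsic / (norm_x * norm_y)

    # Example: brain connectivity features vs. auditory features (random stand-ins).
    rng = np.random.default_rng(0)
    brain = rng.standard_normal((200, 32))
    audio = rng.standard_normal((200, 20))

    print("linear CKA:", linear_cka(brain, audio))
    cca = CCA(n_components=4).fit(brain, audio)
    brain_c, audio_c = cca.transform(brain, audio)   # correlated multimodal representations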

15 pages, 4233 KiB  
Article
Acoustic Source Localization in CFRP Composite Plate Based on Wave Velocity-Direction Function Fitting
by Yu Zhang, Yu Feng, Xiaobo Rui, Lixin Xu, Lei Qi, Zi Yang, Cong Hu, Peng Liu and Haijiang Zhang
Sensors 2023, 23(6), 3052; https://doi.org/10.3390/s23063052 - 12 Mar 2023
Viewed by 1256
Abstract
Composite materials are widely used, but they are often subjected to impacts from foreign objects, causing structural damage. To ensure the safety of use, it is necessary to locate the impact point. This paper investigates impact sensing and localization technology for composite plates and proposes a method of acoustic source localization for CFRP composite plates based on wave velocity-direction function fitting. This method divides the grid of composite plates, constructs the theoretical time difference matrix of the grid points, and compares it with the actual time difference to form an error matching matrix to localize the impact source. In this paper, finite element simulation combined with a lead-break experiment is used to explore the wave velocity-angle function relationship of Lamb waves in composite materials. The simulation experiment is used to verify the feasibility of the localization method, and the lead-break experimental system is built to locate the actual impact source. The results show that the acoustic emission time-difference approximation method can effectively solve the problem of impact source localization in composite structures, and the average localization error is 1.44 cm and the maximum localization error is 3.35 cm in 49 experimental points with good stability and accuracy.
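The localization scheme described here (a grid of candidate points, theoretical time differences built from a direction-dependent wave velocity, and error matching against the measured arrival-time differences) can be sketched as below. The velocity function and sensor layout are placeholders, not the fitted values from the paper.

    # Grid-search localization from arrival-time differences with a
    # direction-dependent wave velocity. Velocity function and sensor
    # positions are placeholders, not the paper's fitted values.
    import numpy as np

    def velocity(theta):
        """Placeholder wave velocity (m/s) as a function of propagation angle."""
        return 5000.0 + 1000.0 * np.cos(2 * theta)

    def locate(sensors, measured_dt, grid_x, grid_y):
        """Return the grid point whose theoretical time differences best match."""
        best_err, best_point = np.inf, None
        for x in grid_x:
            for y in grid_y:
                d = sensors - np.array([x, y])
                theta = np.arctan2(d[:, 1], d[:, 0])
                tof = np.linalg.norm(d, axis=1) / velocity(theta)   # time of flight per sensor
                dt = tof[1:] - tof[0]                               # differences vs. reference sensor
                err = np.sum((dt - measured_dt) ** 2)
                if err < best_err:
                    best_err, best_point = err, (x, y)
        return best_point

    sensors = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])  # metres
    true_src = np.array([0.3, 0.2])                   # synthetic source for a self-check
    d = sensors - true_src
    tof = np.linalg.norm(d, axis=1) / velocity(np.arctan2(d[:, 1], d[:, 0]))
    measured_dt = tof[1:] - tof[0]                    # stands in for first-arrival picks
    grid = np.linspace(0.0, 0.5, 51)
    print(locate(sensors, measured_dt, grid, grid))   # recovers approximately (0.3, 0.2)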

19 pages, 3109 KiB  
Article
Improvement of Acoustic Models Fused with Lip Visual Information for Low-Resource Speech
by Chongchong Yu, Jiaqi Yu, Zhaopeng Qian and Yuchen Tan
Sensors 2023, 23(4), 2071; https://doi.org/10.3390/s23042071 - 12 Feb 2023
Viewed by 1063
Abstract
Endangered languages, as intangible cultural resources that cannot be renewed, generally have low-resource characteristics. Automatic speech recognition (ASR) is an effective means to protect such languages. However, for a low-resource language, native speakers are few and labeled corpora are insufficient. ASR thus suffers from deficiencies including high speaker dependence and overfitting, which greatly harm recognition accuracy. To tackle these deficiencies, the paper puts forward an approach to audiovisual speech recognition (AVSR) based on LSTM-Transformer. The approach introduces visual modality information, including lip movements, to reduce the dependence of acoustic models on speakers and on the quantity of data. Specifically, the new approach, through the fusion of audio and visual information, enhances the expression of speakers’ feature space, thus achieving the speaker adaptation that is difficult in a single modality. The approach also includes experiments on speaker dependence and evaluates to what extent audiovisual fusion is dependent on speakers. Experimental results show that the CER of AVSR is 16.9% lower than those of traditional models (optimal performance scenario), and 11.8% lower than that for lip reading. The accuracy for recognizing phonemes, especially finals, improves substantially. For recognizing initials, the accuracy improves for affricates and fricatives, where the lip movements are obvious, and deteriorates for stops, where the lip movements are not obvious. In AVSR, the generalization onto different speakers is also better than in a single modality, and the CER can drop by as much as 17.2%. Therefore, AVSR is of great significance in studying the protection and preservation of endangered languages through AI.
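A minimal sketch of the kind of audio-visual fusion this abstract describes: an LSTM encodes the lip-movement features, a Transformer encoder processes the acoustic features, and the two streams are concatenated before classification. The layer sizes and the late-concatenation choice are illustrative assumptions, not the paper's LSTM-Transformer architecture.

    # Illustrative audio-visual fusion model: LSTM over lip features, Transformer
    # encoder over acoustic features, late concatenation before classification.
    # Layer sizes and fusion choice are assumptions, not the paper's architecture.
    import torch
    import torch.nn as nn

    class AVFusionModel(nn.Module):
        def __init__(self, audio_dim=80, lip_dim=68, hidden=128, n_classes=100):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, hidden)
            enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                                   batch_first=True)
            self.audio_enc = nn.TransformerEncoder(enc_layer, num_layers=2)
            self.lip_enc = nn.LSTM(lip_dim, hidden, batch_first=True)
            self.classifier = nn.Linear(2 * hidden, n_classes)

        def forward(self, audio, lips):
            a = self.audio_enc(self.audio_proj(audio)).mean(dim=1)   # pooled audio stream
            _, (h, _) = self.lip_enc(lips)                           # last LSTM hidden state
            fused = torch.cat([a, h[-1]], dim=-1)                    # audio-visual fusion
            return self.classifier(fused)

    model = AVFusionModel()
    logits = model(torch.randn(2, 120, 80), torch.randn(2, 30, 68))  # (batch, time, features)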

20 pages, 6023 KiB  
Article
Detecting Lombard Speech Using Deep Learning Approach
by Krzysztof Kąkol, Gražina Korvel, Gintautas Tamulevičius and Bożena Kostek
Sensors 2023, 23(1), 315; https://doi.org/10.3390/s23010315 - 28 Dec 2022
Cited by 2 | Viewed by 1543
Abstract
Robust detection of Lombard speech in noise is challenging. This study proposes a strategy to detect Lombard speech using a machine learning approach for applications such as public address systems that work in near real time. The paper starts with the background concerning the Lombard effect. Then, the assumptions of the work performed for Lombard speech detection are outlined. The proposed framework combines convolutional neural networks (CNNs) and various two-dimensional (2D) speech signal representations. To reduce the computational cost without abandoning the 2D representation-based approach, a strategy for threshold-based averaging of the Lombard effect detection results is introduced. The pseudocode of the averaging process is also included. A series of experiments are performed to determine the most effective network structure and 2D speech signal representation. Investigations are carried out on German and Polish recordings containing Lombard speech. All 2D speech signal representations are tested with and without augmentation. Augmentation here means using the alpha channel to store additional data: the gender of the speaker, the F0 frequency, and the first two MFCCs. The experimental results show that Lombard and neutral speech recordings can be clearly discerned with high detection accuracy. It is also demonstrated that the proposed speech detection process is capable of working in near real time. These are the key contributions of this work.
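The threshold-based averaging of frame-level detection results mentioned in the abstract can be sketched roughly as follows; the window length and threshold are illustrative values only, and this is not the pseudocode given in the paper.

    # Rough sketch of threshold-based averaging of frame-level Lombard-detection
    # scores; window length and threshold are illustrative values only.
    import numpy as np

    def detect_lombard(frame_probs, window=10, threshold=0.5):
        """Average CNN frame probabilities over a sliding window and threshold them."""
        probs = np.asarray(frame_probs, dtype=float)
        kernel = np.ones(window) / window
        averaged = np.convolve(probs, kernel, mode="valid")   # moving average
        return averaged > threshold                           # True = Lombard speech detected

    decisions = detect_lombard(np.random.rand(100))           # stand-in for CNN frame scores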