
Machine Learning-Based Audio Signal Processing via the Use of Sensors

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: closed (31 May 2022) | Viewed by 9915

Special Issue Editor


Dr. Julio Carabias-Orti
Guest Editor
Telecommunications Engineering Department, University of Jaen, 23740 Linares, Spain
Interests: audio signal processing; machine learning; signal decomposition-based methods; music information retrieval

Special Issue Information

Dear colleagues,

Over recent years, thanks to the massive amounts of data available in many fields and to recent computational advances, machine learning approaches, and deep neural networks in particular, have been shown to outperform traditional expert-designed algorithms in a large variety of tasks, including audio classification, source separation, enhancement, and content analysis. Nonetheless, classical signal processing and signal decomposition-based methods remain a hot topic for unsupervised scenarios.

When multiple sensors are available, the multichannel information can be used to boost the performance of conventional single-sensor methods, for example, by exploiting the spatial information inherent in the observed multichannel signals. Another topic of interest for the community is distributed acoustic sensor networks, which allow flexible implementations of audio signal processing tasks.

This Special Issue aims to present the development of novel machine learning-based audio signal processing methods in which signals are collected either from a specific arrangement of sensors or a fusion of sensors of different types. The topics of interest include, but are not limited to, the following:

  • microphone array audio signal processing;
  • unsupervised and semisupervised systems for multisensor audio signal processing;
  • deep learning approaches for multisensor audio signal processing;
  • source localization in 3D spaces;
  • multi-mic-based audio event detection/scene classification;
  • biomedical audio signal processing using multiple sensors;
  • real-time multisensor audio signal processing;
  • audio enhancement/denoising via the use of multiple sensors;
  • wireless acoustic sensor networks for distributed scenarios.
 

Dr. Julio Carabias-Orti
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

 
 
 
 


Published Papers (4 papers)


Research

21 pages, 1281 KiB  
Article
BPCNN: Bi-Point Input for Convolutional Neural Networks in Speaker Spoofing Detection
by Sunghyun Yoon and Ha-Jin Yu
Sensors 2022, 22(12), 4483; https://doi.org/10.3390/s22124483 - 14 Jun 2022
Cited by 1 | Viewed by 1589
Abstract
We propose a method, called bi-point input, for convolutional neural networks (CNNs) that handle variable-length input features (e.g., speech utterances). Feeding input features into a CNN in mini-batch units requires that all features in each mini-batch have the same shape. A set of variable-length features cannot be directly fed into a CNN because they commonly have different lengths. Feature segmentation is the dominant method for handling variable-length features with CNNs: each feature is decomposed into fixed-length segments, and the CNN receives one segment as input at a time. However, the CNN can then consider only the information of one segment at a time, not the entire feature. This drawback limits the amount of information available at once and consequently leads to suboptimal solutions. Our proposed method alleviates this problem by increasing the amount of information available at one time: the CNN receives a pair of segments obtained from a feature as a single input. The two segments generally cover different time ranges and therefore carry different information. We also propose various combination methods and provide rough guidance for setting a proper segment length without evaluation. We evaluate the proposed method on spoofing detection tasks using the ASVspoof 2019 database under various conditions. The experimental results reveal that the proposed method reduces the relative equal error rate (EER) by approximately 17.2% and 43.8% on average for the logical access (LA) and physical access (PA) tasks, respectively.
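The core idea, pairing two fixed-length segments cut from one variable-length feature, can be sketched as follows. This is a minimal illustration: the function name `bi_point_pairs`, the segment length, and the random-pairing strategy are assumptions for demonstration, not the paper's exact combination methods.

```python
import numpy as np

def bi_point_pairs(feature, seg_len, rng=None):
    """Split a variable-length feature (time x dims) into fixed-length
    segments, then pair each segment with a second, randomly chosen one,
    so the network sees two different time ranges at once."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_frames = feature.shape[0]
    # Pad short features by tiling so at least one segment fits.
    if n_frames < seg_len:
        reps = int(np.ceil(seg_len / n_frames))
        feature = np.tile(feature, (reps, 1))[:seg_len]
        n_frames = seg_len
    starts = np.arange(0, n_frames - seg_len + 1, seg_len)
    segments = np.stack([feature[s:s + seg_len] for s in starts])
    # Pair each segment with another segment drawn at random.
    partners = rng.integers(0, len(segments), size=len(segments))
    return [(segments[i], segments[j]) for i, j in enumerate(partners)]

utterance = np.random.default_rng(1).normal(size=(700, 40))  # 700 frames, 40-dim
pairs = bi_point_pairs(utterance, seg_len=200)
print(len(pairs), pairs[0][0].shape)  # 3 (200, 40)
```

Each pair can then be fed to the two input branches of a CNN; how the branches are combined is exactly what the paper's combination methods explore.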
(This article belongs to the Special Issue Machine Learning-Based Audio Signal Processing via the Use of Sensors)

14 pages, 3046 KiB  
Article
Improved Estimation of End-Milling Parameters from Acoustic Emission Signals Using a Microphone Array Assisted by AI Modelling
by Andrés Sio-Sever, Juan Manuel Lopez, César Asensio-Rivera, Antonio Vizan-Idoipe and Guillermo de Arcas
Sensors 2022, 22(10), 3807; https://doi.org/10.3390/s22103807 - 17 May 2022
Cited by 3 | Viewed by 1840
Abstract
This paper presents the implementation of a measurement system that uses a four-microphone array and a data-driven algorithm to estimate the depth of cut during end-milling operations. The audible-range acoustic emission signals captured with the microphones are combined using spectral subtraction and a blind source separation algorithm to reduce the impact of noise and reverberation. Afterwards, a set of features is extracted from these signals and fed into a nonlinear regression algorithm assisted by machine learning techniques for contactless monitoring of the milling process. The main advantages of this algorithm lie in its relatively simple implementation and the good accuracy of its results, which reduce the variance of current noncontact monitoring systems. To validate the method, the results were compared with the values obtained with a precision dynamometer and a geometric model algorithm, yielding a mean error of 1% while maintaining a standard deviation below 0.2 mm.
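The noise-reduction front end can be illustrated with basic magnitude spectral subtraction, a simplified stand-in for the paper's actual pipeline; the frame length, flooring constant, and synthetic signals below are assumptions for demonstration only.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, n_fft=256, floor=0.01):
    """Magnitude spectral subtraction on non-overlapping frames:
    subtract an average noise magnitude spectrum and resynthesize
    using the noisy phase."""
    n = (len(noisy) // n_fft) * n_fft
    frames = noisy[:n].reshape(-1, n_fft)
    spec = np.fft.rfft(frames, axis=1)
    # Average noise magnitude estimated from a noise-only recording.
    m = (len(noise_est) // n_fft) * n_fft
    noise_frames = noise_est[:m].reshape(-1, n_fft)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)
    # Subtract, with a spectral floor to avoid negative magnitudes.
    mag = np.maximum(np.abs(spec) - noise_mag, floor * np.abs(spec))
    cleaned = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft, axis=1)
    return cleaned.reshape(-1)

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 50 * np.arange(4096) / 1000.0)  # tonal "machine" signal
noisy = tone + 0.3 * rng.normal(size=4096)
out = spectral_subtraction(noisy, 0.3 * rng.normal(size=4096))
print(out.shape)  # (4096,)
```

In the paper, the cleaned multichannel signals are then passed to blind source separation and feature extraction before the regression stage.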

15 pages, 1619 KiB  
Article
Deep Prior Approach for Room Impulse Response Reconstruction
by Mirco Pezzoli, Davide Perini, Alberto Bernardini, Federico Borra, Fabio Antonacci and Augusto Sarti
Sensors 2022, 22(7), 2710; https://doi.org/10.3390/s22072710 - 1 Apr 2022
Cited by 17 | Viewed by 3230
Abstract
In this paper, we propose a data-driven approach for the reconstruction of unknown room impulse responses (RIRs) based on the deep prior paradigm. We formulate RIR reconstruction as an inverse problem. More specifically, a convolutional neural network (CNN) is employed as a prior in order to obtain a regularized solution to the RIR reconstruction problem for uniform linear arrays. This approach allows us to avoid the assumptions on sound wave propagation, acoustic environment, or measurement setting made in state-of-the-art RIR reconstruction algorithms. Moreover, unlike classical deep learning solutions in the literature, the deep prior approach employs per-element training. Therefore, the proposed method does not require training data sets, and it can be applied to RIRs independently of the available data or environments. Results on simulated data demonstrate that the proposed technique provides accurate results in a wide range of scenarios, including variable direction of arrival of the source, room T60, and SNR at the sensors. The devised technique was also applied to real measurements, resulting in accurate RIR reconstruction and robustness to noise compared with state-of-the-art solutions.
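The deep prior idea, fitting an untrained network to the observed channels only so that the network's structure regularizes the unobserved ones, can be sketched numerically. This toy uses a tiny coordinate network with manual gradient descent; the network, the synthetic "RIRs", and the training schedule are illustrative assumptions and differ from the paper's CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy array "RIRs": M microphones x T samples, varying smoothly across the array.
M, T = 16, 64
t = np.arange(T)
true_rirs = np.stack([np.exp(-0.05 * t) * np.sin(0.3 * t + 0.2 * m)
                      for m in range(M)])

observed = rng.random(M) > 0.4            # which microphones were measured
mask = observed[:, None].astype(float)

# Untrained network: microphone position -> RIR. Its weights are fitted to
# the observed channels only; no training data set is involved.
x = (np.arange(M) / M)[:, None]
W1 = rng.normal(size=(1, 32)); b1 = np.zeros(32)
W2 = 0.1 * rng.normal(size=(32, T)); b2 = np.zeros(T)

def forward():
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def masked_mse():
    _, out = forward()
    return float((((out - true_rirs) * mask) ** 2).sum() / mask.sum())

loss0 = masked_mse()
lr = 0.02
for _ in range(1500):
    h, out = forward()
    err = (out - true_rirs) * mask / mask.sum()   # loss only where observed
    dW2 = h.T @ err; db2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    dW1 = x.T @ dh; db1 = dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
print(loss0, masked_mse())   # fitting error on the observed channels drops
```

The fitted network's output at the unmeasured positions serves as the reconstruction; in the paper this role is played by a CNN whose architecture encodes a much stronger prior.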

19 pages, 2313 KiB  
Article
Binaural Acoustic Scene Classification Using Wavelet Scattering, Parallel Ensemble Classifiers and Nonlinear Fusion
by Vahid Hajihashemi, Abdorreza Alavi Gharahbagh, Pedro Miguel Cruz, Marta Campos Ferreira, José J. M. Machado and João Manuel R. S. Tavares
Sensors 2022, 22(4), 1535; https://doi.org/10.3390/s22041535 - 16 Feb 2022
Cited by 11 | Viewed by 2201
Abstract
The analysis of ambient sounds can be very useful when developing sound-based intelligent systems. Acoustic scene classification (ASC) is defined as identifying the environment of a recorded sound clip among a set of predefined scenes. ASC has huge potential for use in urban sound event classification systems. This research presents a hybrid method that includes a novel mathematical fusion step, which aims to tackle the challenges of ASC accuracy and the adaptability of current state-of-the-art models. The proposed method uses a stereo signal, two ensemble classifiers (random subspace), and the novel fusion step. First, a stable, invariant representation of the stereo signal is built using the Wavelet Scattering Transform (WST). For each mono channel, i.e., left and right, a separate random subspace classifier is trained on the WST features. A novel mathematical formula for the fusion step was developed, with its parameters found using a genetic algorithm. The results on the DCASE 2017 dataset showed that the proposed method achieves higher classification accuracy (about 95%), pushing the boundaries of existing methods.
