Article

Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models

1 Information Engineering and Computer Science School, University of Trento, 38121 Trento, Italy
2 Fondazione Bruno Kessler, 38121 Trento, Italy
* Author to whom correspondence should be addressed.
Academic Editor: Leon Rothkrantz
Sensors 2022, 22(1), 374; https://doi.org/10.3390/s22010374
Received: 30 November 2021 / Revised: 29 December 2021 / Accepted: 30 December 2021 / Published: 4 January 2022
(This article belongs to the Special Issue Artificial Intelligence Based Audio Signal Processing)
Robustness against background noise and reverberation is essential for many real-world speech-based applications. One way to achieve this robustness is to employ a speech enhancement front-end that, independently of the back-end, removes the environmental perturbations from the target speech signal. However, although the enhancement front-end typically increases speech quality from an intelligibility perspective, it tends to introduce distortions that deteriorate the performance of subsequent processing modules. In this paper, we investigate strategies for jointly training neural models for both speech enhancement and the back-end, optimizing a combined loss function. In this way, the enhancement front-end is guided by the back-end to provide more effective enhancement. Differently from typical state-of-the-art approaches employing spectral features or neural embeddings, we operate in the time domain, processing raw waveforms in both components. As an application scenario, we consider intent classification in noisy environments. In particular, the front-end speech enhancement module is based on Wave-U-Net, while the intent classifier is implemented as a temporal convolutional network. Exhaustive experiments are reported on versions of the Fluent Speech Commands corpus contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, shedding light on and providing insight into the most promising training approaches.
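The joint-training idea described in the abstract can be sketched as a single combined objective that weights a time-domain enhancement loss against the intent-classification loss, so that gradients from the back-end also shape the enhancement front-end. The sketch below is an illustrative assumption, not the paper's exact formulation: the L1 waveform loss, the cross-entropy term, and the weighting scheme `alpha` are all stand-ins chosen for clarity.

```python
import math

# Hedged sketch of a combined joint-training loss:
#   L = alpha * L_enh + (1 - alpha) * L_cls
# L_enh: time-domain enhancement loss (here, L1 on raw waveform samples).
# L_cls: intent-classification loss (here, cross-entropy on class probabilities).
# All function names and the alpha weighting are illustrative assumptions.

def l1_waveform_loss(enhanced, clean):
    """Mean absolute error between enhanced and clean raw waveforms."""
    return sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)

def cross_entropy(intent_probs, target_idx):
    """Negative log-likelihood of the correct intent class."""
    return -math.log(intent_probs[target_idx])

def joint_loss(enhanced, clean, intent_probs, target_idx, alpha=0.5):
    """Combined loss letting the back-end guide the enhancement front-end."""
    return (alpha * l1_waveform_loss(enhanced, clean)
            + (1.0 - alpha) * cross_entropy(intent_probs, target_idx))
```

In an actual training loop, both networks would be updated by backpropagating this single scalar, which is what allows the enhancement module to trade off signal fidelity against downstream intent accuracy.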
Keywords: joint training; speech enhancement; intent classification

MDPI and ACS Style

Ali, M.N.; Falavigna, D.; Brutti, A. Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models. Sensors 2022, 22, 374. https://doi.org/10.3390/s22010374

