Open Access | Feature Paper | Article

Temporal Auditory Coding Features for Causal Speech Enhancement

School of Electrical and Computer Engineering, Faculty of Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Author to whom correspondence should be addressed.
Electronics 2020, 9(10), 1698
Received: 24 September 2020 / Revised: 12 October 2020 / Accepted: 14 October 2020 / Published: 16 October 2020
(This article belongs to the Special Issue Application of Neural Networks in Biosignal Process)
Perceptually motivated audio signal processing and feature extraction have played a key role in the determination of high-level semantic processes and the development of emerging systems and applications, such as mobile phone telecommunication and hearing aids. In the era of deep learning, speech enhancement methods based on neural networks have seen great success, mainly operating on the log-power spectra. Although these approaches bypass the need for exhaustive feature extraction and selection, it is still unclear whether they target the important sound characteristics related to speech perception. In this study, we propose a novel set of auditory-motivated features for single-channel speech enhancement by fusing temporal envelope and temporal fine structure information in the context of vocoder-like processing. A causal gated recurrent unit (GRU) neural network is employed to recover the low-frequency amplitude modulations of speech. Experimental results indicate that the proposed system achieves considerable gains for normal-hearing and hearing-impaired listeners in terms of objective intelligibility and quality metrics. The proposed auditory-motivated feature set achieved better objective intelligibility results than the conventional log-magnitude spectrogram features, while mixed results were observed for simulated listeners with hearing loss. Finally, we demonstrate that the proposed analysis/synthesis framework provides satisfactory reconstruction accuracy of speech signals.
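To make the two feature types named in the abstract concrete, the following is a minimal sketch of how a single band-limited signal can be split into a temporal envelope and a temporal fine structure (TFS) using the analytic signal. It is not the authors' implementation: the abstract does not specify the filterbank, envelope extraction method, or parameters, so the function names `analytic_signal` and `envelope_and_tfs`, the 16 kHz sampling rate, and the amplitude-modulated test tone are all assumptions made for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array, built with the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the standard Hilbert-transform construction.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def envelope_and_tfs(band_signal):
    """Split one band-limited signal into envelope and fine structure.

    envelope: magnitude of the analytic signal (slow amplitude modulation)
    tfs:      cosine of its instantaneous phase (rapid carrier oscillation)
    """
    z = analytic_signal(band_signal)
    envelope = np.abs(z)
    tfs = np.cos(np.angle(z))
    return envelope, tfs


# Hypothetical test band: a 1 kHz carrier with an 8 Hz amplitude modulation.
fs = 16000                                # assumed sampling rate in Hz
t = np.arange(fs) / fs                    # one second of samples
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 8 * t)
band = modulator * np.cos(2 * np.pi * 1000 * t)

env, tfs = envelope_and_tfs(band)

# Vocoder-like resynthesis of the band: envelope times fine structure.
reconstruction = env * tfs
```

In this sketch the product of envelope and TFS returns the original band signal, which is why a vocoder-like system can modify the envelope (for example, with a neural network estimate) while reusing the fine structure. A full system would apply this per band after a filterbank, which is omitted here.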
Keywords: speech enhancement; speech intelligibility; temporal envelope; temporal fine structure; neural networks

MDPI and ACS Style

Thoidis, I.; Vrysis, L.; Markou, D.; Papanikolaou, G. Temporal Auditory Coding Features for Causal Speech Enhancement. Electronics 2020, 9, 1698.
