Application of Deep Learning in Speech Enhancement Technology

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 30 May 2025 | Viewed by 577

Special Issue Editor

Dr. Lin Wang
School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, UK
Interests: signal processing; machine learning; robot perception

Special Issue Information

Dear Colleagues,

Speech enhancement aims to improve the quality of speech degraded by environmental noise using signal-processing techniques. It is used in many applications, such as voice communication, hearing aids, speech recognition, and human–robot interaction. In recent years, research in speech enhancement has advanced significantly with deep learning and artificial intelligence techniques. When sufficient training data are available, deep neural networks can learn to predict clean speech from the noisy signal, achieving promising results in non-stationary and highly noisy acoustic environments. For this reason, deep learning-based speech enhancement has been investigated intensively and is becoming an active research area in speech processing. A number of methods have been developed with the aim of solving speech enhancement problems in extremely challenging environments, developing new deep architectures, increasing the generality and explainability of deep models, incorporating deep learning into multi-channel signal processing, and enabling multi-modal speech enhancement. This Special Issue aims to accelerate research progress by reporting the latest theoretical and practical advances in applying deep learning to speech enhancement, and by discussing emerging problems, creative solutions, and novel insights in the field. This Special Issue will mainly focus on (but is not limited to) the following deep learning-related topics:

  • Single-channel speech enhancement;
  • Multi-channel speech enhancement;
  • Multi-modal speech enhancement;
  • Explainable speech enhancement;
  • Novel applications of speech enhancement.

Dr. Lin Wang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • explainable AI
  • microphone array
  • multi-modal speech processing
  • noise reduction
  • speech enhancement

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)

Research

18 pages, 2345 KiB  
Article
SGM-EMA: Speech Enhancement Method Score-Based Diffusion Model and EMA Mechanism
by Yuezhou Wu, Zhiri Li and Hua Huang
Appl. Sci. 2025, 15(10), 5243; https://doi.org/10.3390/app15105243 - 8 May 2025
Viewed by 314
Abstract
The score-based diffusion model has made significant progress in the field of computer vision, surpassing the performance of other generative models, such as variational autoencoders, and has been extended to applications such as speech enhancement and recognition. This paper proposes a U-Net architecture using a score-based diffusion model and an efficient multi-scale attention mechanism (EMA) for the speech enhancement task. The model leverages the symmetric structure of U-Net to extract speech features and captures contextual information and local details across different scales using the EMA mechanism, improving speech quality in noisy environments. We evaluate the method on the VoiceBank-DEMAND (VB-DMD) dataset and the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus–TUT Sound Events 2017 (TIMIT-TUT) dataset. The experimental results show that the proposed model performed well in terms of perceptual evaluation of speech quality (PESQ), extended short-time objective intelligibility (ESTOI), and scale-invariant signal-to-distortion ratio (SI-SDR). In particular, when processing out-of-dataset noisy speech, the proposed method achieved excellent speech enhancement results compared to other methods, demonstrating the model’s strong generalization capability. We also conducted an ablation study on the SDE solver and the EMA mechanism, and the results show that the reverse diffusion method outperformed the Euler–Maruyama method, and that the EMA strategy could improve the model performance. The results demonstrate the effectiveness of these two techniques in our system. Nevertheless, since the model is specifically designed for Gaussian noise, its performance under non-Gaussian or complex noise conditions may be limited.
(This article belongs to the Special Issue Application of Deep Learning in Speech Enhancement Technology)
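
For readers unfamiliar with the SI-SDR metric named in the abstract above, the following is a minimal sketch of its commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This is not the authors' evaluation code. The function name `si_sdr`, the mean-removal step, and the handling of degenerate inputs are assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    Both inputs are equal-length sequences of samples. Mean removal is an
    assumed convention here, not something stated in the abstract.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")

    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    ref_energy = sum(r * r for r in ref)
    if ref_energy == 0.0:
        raise ValueError("reference has zero energy after mean removal")

    # Optimal scaling factor: projection of the estimate onto the reference.
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]

    target_energy = sum(t * t for t in target)
    residual_energy = sum((e - t) ** 2 for e, t in zip(est, target))

    if residual_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    if target_energy == 0.0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / residual_energy)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    noisy = [1.1, -0.9, 0.9, -1.1]
    print(si_sdr(noisy, clean))
    # Rescaling the estimate should leave the score essentially unchanged.
    print(si_sdr([3.0 * x for x in noisy], clean))
```

Because the estimate is rescaled by the projection factor before the residual is computed, multiplying the enhanced signal by a constant gain does not change the score, which is the property the "scale-invariant" part of the name refers to.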