Article

Integrating Dilated Convolution into DenseLSTM for Audio Source Separation

1 School of Electronics Engineering, Chungbuk National University, Cheongju 28644, Korea
2 Creative Content Research Division, Electronics and Telecommunications Research Institute, Daejeon 34129, Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(2), 789; https://doi.org/10.3390/app11020789
Received: 9 December 2020 / Revised: 30 December 2020 / Accepted: 12 January 2021 / Published: 15 January 2021
(This article belongs to the Section Computing and Artificial Intelligence)
Herein, we propose a multi-scale multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of an acoustic signal can be regarded as both an image and time-series data, it is well suited to a convolutional recurrent neural network (CRNN) architecture. We improve audio source separation performance by adding a dilated block, built on dilated convolution, to the CRNN architecture. The dilated block effectively enlarges the receptive field over the spectrogram. In addition, it is designed to reflect the acoustic characteristic that the frequency and time axes of a spectrogram vary under independent influences, such as pitch and speech rate. In speech enhancement experiments, we estimated the speech signal with various deep learning architectures from a signal in which music, noise, and speech were mixed, and conducted a subjective evaluation of the estimated speech. We also measured speech quality, intelligibility, separation, and speech recognition performance. In music signal separation, we estimated the music signal with several deep learning architectures from a mixture of music and speech, and then measured separation performance and music identification accuracy on the estimated music signal. Overall, the proposed architecture outperforms the other deep learning architectures in both the speech and the music experiments.
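The paper's dilated block itself is not reproduced here; as an illustration of why dilated convolution enlarges the receptive field without adding parameters, the following is a minimal NumPy sketch (the function names are illustrative, not from the authors' implementation). A kernel of size k with dilation d spans d·(k−1)+1 input frames, so stacking layers with dilations 1, 2, 4 grows the receptive field much faster than stacking ordinary convolutions.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid-mode 1-D convolution with kernel taps spaced `dilation` apart."""
    k = len(w)
    span = dilation * (k - 1) + 1  # effective (dilated) kernel span
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf

# Three 3-tap layers with dilations 1, 2, 4 cover 15 input frames,
# versus 7 frames for three ordinary (dilation-1) layers.
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7
```

Applied along the time axis of a spectrogram, the same idea lets a stack of small kernels summarize long temporal context; the paper applies separate dilation patterns along the time and frequency axes to respect their independent variation.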
Keywords: dilated convolution; audio source separation; speech enhancement; speech recognition; music signal separation; music identification
Figure 1

MDPI and ACS Style

Heo, W.-H.; Kim, H.; Kwon, O.-W. Integrating Dilated Convolution into DenseLSTM for Audio Source Separation. Appl. Sci. 2021, 11, 789. https://doi.org/10.3390/app11020789
