The competition of speech recognition technology related to smartphones is now getting into full swing with the widespread internet of thing (IoT) devices. For robust speech recognition, it is necessary to detect speech signals in various acoustic environments. Speech/music classification that facilitates optimized signal processing from classification results has been extensively adapted as an essential part of various electronics applications, such as multi-rate audio codecs, automatic speech recognition, and multimedia document indexing. In this paper, we propose a new technique to improve robustness of a speech/music classifier for an enhanced voice service (EVS) codec adopted as a voice-over-LTE (VoLTE) speech codec using long short-term memory (LSTM). For effective speech/music classification, feature vectors implemented with the LSTM are chosen from the features of the EVS. To overcome the diversity of music data, a large scale of data is used for learning. Experiments show that LSTM-based speech/music classification provides better results than the conventional EVS speech/music classification algorithm in various conditions and types of speech/music data, especially at lower signal-to-noise ratio (SNR) than conventional EVS algorithm.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited