Article

Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

1 Department of Electrical & Computer Engineering, North South University, Bashundhara, Dhaka-1229, Bangladesh
2 Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, QC H3G, Canada
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(21), 7522; https://doi.org/10.3390/app10217522
Received: 5 October 2020 / Revised: 18 October 2020 / Accepted: 21 October 2020 / Published: 26 October 2020
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. For speaker identification, deep-convolutional-network-based approaches, such as SincNet, are used as an alternative to i-vectors. Convolution performed by parameterized sinc functions in SincNet has demonstrated superior results in this area. This system optimizes a softmax loss, which is integrated into the classification layer responsible for making predictions. Since this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome these issues, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. They are compared on a number of speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges, and demonstrate performance improvements over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
Keywords: speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
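The abstract contrasts plain softmax loss, which only separates classes, with angular margin losses, which additionally pull same-speaker embeddings together on the hypersphere. The sketch below illustrates the general additive-angular-margin (ArcFace-style) idea in PyTorch; it is not the paper's exact AF-SincNet head, and the class name, scale, and margin values are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularMarginLoss(nn.Module):
    """Additive angular margin loss (illustrative sketch, not the paper's code).

    Logits are cosine similarities between L2-normalized embeddings and
    L2-normalized class weights. A margin m is added to the angle of the
    target class before scaling, which forces same-class embeddings to
    cluster more tightly than plain softmax would.
    """

    def __init__(self, embed_dim, num_classes, scale=30.0, margin=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.scale = scale    # s: sharpens the cosine logits
        self.margin = margin  # m: angular penalty on the target class

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class weights.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        # Recover the angle; clamp to keep acos numerically stable.
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the margin only to the ground-truth class's angle.
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).float()
        logits = self.scale * torch.cos(theta + one_hot * self.margin)
        return F.cross_entropy(logits, labels)
```

In a SincNet-style pipeline, such a head would replace the final softmax classification layer, taking the network's latent speaker embedding as input during training; at test time, only the normalized embeddings are kept for comparison.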
MDPI and ACS Style

Chowdhury, L.; Zunair, H.; Mohammed, N. Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss. Appl. Sci. 2020, 10, 7522. https://doi.org/10.3390/app10217522

