Open Access Article

A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface

1 Electrical Engineering and Computer Science Department, The University of Toledo, Toledo, OH 43606, USA
2 ECE Department, Purdue University Northwest, Hammond, IN 46323, USA
* Author to whom correspondence should be addressed.
Mach. Learn. Knowl. Extr. 2019, 1(1), 504-520; https://doi.org/10.3390/make1010031
Received: 26 January 2019 / Revised: 13 March 2019 / Accepted: 15 March 2019 / Published: 19 March 2019
In this paper, we present a novel pipelined, near real-time speaker recognition architecture. It improves recognition performance through hybrid feature extraction, combining Gabor Filter (GF) features, Convolution Neural Network (CNN) features, and statistical parameters into a single matrix set. The architecture was developed to enable secure access to a voice-based user interface (UI) through speaker-based authentication and integration with an existing Natural Language Processing (NLP) system; securing access to existing NLP systems also served as motivation. We first identify the challenges of real-time speaker recognition and highlight recent research in the field. We then analyze the functional requirements of a speaker recognition system and introduce the mechanisms in our architecture that address them. Subsequently, we discuss the effect of different techniques, such as CNN, GF, and statistical parameters, on feature extraction. For classification, we investigate standard classifiers such as Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Network (DNN). To verify the validity and effectiveness of the proposed architecture, we compare its accuracy, sensitivity, and specificity with those of the standard AlexNet architecture.
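As a rough illustration of the three evaluation measures named in the abstract, the sketch below computes accuracy, sensitivity, and specificity for one speaker in a one-vs-rest fashion. It is not taken from the paper: the function name `one_vs_rest_metrics`, the one-vs-rest treatment of a multi-speaker task, and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    speaker: Hashable,
) -> Dict[str, float]:
    """Score one speaker as the positive class and all others as negative.

    Hypothetical helper for illustration; the paper's own evaluation
    procedure may differ.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == speaker and pred == speaker:
            tp += 1  # target speaker correctly accepted
        elif truth != speaker and pred == speaker:
            fp += 1  # another speaker wrongly accepted as the target
        elif truth == speaker and pred != speaker:
            fn += 1  # target speaker wrongly rejected
        else:
            tn += 1  # another speaker correctly rejected

    total = tp + tn + fp + fn
    return {
        # Fraction of all decisions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # True positive rate: TP / (TP + FN).
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negative rate: TN / (TN + FP).
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Toy labels (assumed data, not results from the paper).
    true_ids = ["a", "a", "b", "b", "c", "c"]
    pred_ids = ["a", "b", "b", "b", "c", "a"]
    print(one_vs_rest_metrics(true_ids, pred_ids, "a"))
```

For an authentication setting, specificity is the measure tied most directly to security, since a false positive corresponds to accepting the wrong speaker.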
Keywords: classifiers; convolution neural network; architecture; feature extraction; machine learning; random forest; speaker recognition; voice interface
MDPI and ACS Style

Dhakal, P.; Damacharla, P.; Javaid, A.Y.; Devabhaktuni, V. A Near Real-Time Automatic Speaker Recognition Architecture for Voice-Based User Interface. Mach. Learn. Knowl. Extr. 2019, 1, 504-520.
