MDPI - Publisher of Open Access Journals

21 pages, 6268 KiB

Open Access Article

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification

by Achintya Kumar Sarkar and Zheng-Hua Tan

Acoustics 2023, 5(3), 693-713; https://doi.org/10.3390/acoustics5030042 - 17 Jul 2023

Cited by 2 | Viewed by 2916

Deep representation learning has gained significant momentum in advancing text-dependent speaker verification (TD-SV) systems. When designing deep neural networks (DNNs) for extracting bottleneck (BN) features, the key considerations include training targets, activation functions, and loss functions. In this paper, we systematically study the impact of these choices on the performance of TD-SV. For training targets, we consider speaker identity, time-contrastive learning (TCL), and auto-regressive prediction coding, with the first being supervised and the last two being self-supervised. Furthermore, we study a range of loss functions when speaker identity is used as the training target. With regard to activation functions, we study the widely used sigmoid function, rectified linear unit (ReLU), and Gaussian error linear unit (GELU). We experimentally show that GELU is able to reduce the error rates of TD-SV significantly compared to sigmoid, irrespective of the training target. Among the three training targets, TCL performs the best. Among the various loss functions, cross-entropy, joint-softmax, and focal loss functions outperform the others. Finally, the score-level fusion of different systems is also able to reduce the error rates. To evaluate the representation learning methods, experiments are conducted on the RedDots 2016 challenge database, which consists of short utterances, for TD-SV systems based on the classic Gaussian mixture model-universal background model (GMM-UBM) and i-vector methods.

(This article belongs to the Collection Featured Position and Review Papers in Acoustics Science)
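
The three activation functions compared in the abstract above can be written down directly from their standard definitions. The following is a minimal sketch using only the Python standard library; it is not taken from the article, and the function names and sample inputs are illustrative assumptions. GELU is given in its exact form, x·Φ(x), where Φ is the standard normal cumulative distribution function.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Gaussian error linear unit, exact form: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the article.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  "
              f"relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU is smooth and gives small negative outputs for small negative inputs, while approaching the identity for large positive inputs.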

18 pages, 2718 KiB

Open Access Article

Individual Violin Recognition Method Combining Tonal and Nontonal Features

by Qi Wang and Changchun Bao

Electronics 2020, 9(6), 950; https://doi.org/10.3390/electronics9060950 - 8 Jun 2020

Cited by 3 | Viewed by 2860

Abstract

Individual recognition among instruments of the same type is a challenging problem that has rarely been investigated. In this study, the individual recognition of violins is explored. Based on the source–filter model, the spectrum can be divided into tonal content and nontonal content, which reflect the timbre from complementary aspects. The tonal/nontonal gammatone frequency cepstral coefficients (GFCC) are combined to describe the corresponding spectrum contents in this study. In the recognition system, a Gaussian mixture model–universal background model (GMM–UBM) is employed to parameterize the distribution of the combined features. In order to evaluate the recognition task of violin individuals, a solo dataset including 86 violins is developed in this study. Compared with other features, the combined features show a better performance in both individual violin recognition and violin grade classification. Experimental results also show that the GMM–UBM outperforms the CNN, especially when the training data are limited. Finally, the effect of players on the individual violin recognition is investigated.

(This article belongs to the Special Issue Recent Advances in Multimedia Signal Processing and Communications)

14 pages, 2104 KiB

Open Access Article

Development of Machine Learning for Asthmatic and Healthy Voluntary Cough Sounds: A Proof of Concept Study

by Hwan Ing Hee, BT Balamurali, Arivazhagan Karunakaran, Dorien Herremans, Onn Hoe Teoh, Khai Pin Lee, Sung Shin Teng, Simon Lui and Jer Ming Chen

Appl. Sci. 2019, 9(14), 2833; https://doi.org/10.3390/app9142833 - 16 Jul 2019

Cited by 35 | Viewed by 5286

Abstract

(1) Background: Cough is a major presentation in childhood asthma. Here, we aim to develop a machine-learning based cough sound classifier for asthmatic and healthy children. (2) Methods: Children less than 16 years old were randomly recruited in a Children’s Hospital, from February 2017 to April 2018, and were divided into 2 cohorts: healthy children and children with acute asthma presenting with cough. Children with other concurrent respiratory conditions were excluded from the asthmatic cohort. Demographic data, duration of cough, and history of respiratory status were obtained. Children were instructed to produce voluntary cough sounds. These clinically labeled cough sounds were randomly divided into training and testing sets. Audio features such as Mel-Frequency Cepstral Coefficients and Constant-Q Cepstral Coefficients were extracted. Using the training set, a classification model was developed with a Gaussian Mixture Model–Universal Background Model (GMM-UBM). Its predictive performance was tested using the test set against the physicians’ labels. (3) Results: Asthmatic cough sounds from 89 children (totaling 1192 cough sounds) and healthy coughs from 89 children (totaling 1140 cough sounds) were analyzed. The sensitivity and specificity of the audio-based classification model were 82.81% and 84.76%, respectively, when differentiating coughs from asthmatic children versus coughs from ‘healthy’ children. (4) Conclusion: Audio-based classification using machine learning is a potentially useful technique in assisting the differentiation of asthmatic cough sounds from healthy voluntary cough sounds in children.

(This article belongs to the Section Acoustics and Vibrations)
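
Sensitivity and specificity, the two figures reported in the abstract above, follow from the counts of a binary confusion matrix. The sketch below shows the standard calculation; it is not the authors' code, the function name is an assumption, and the label lists are hypothetical and unrelated to the study's data (1 stands for an asthmatic cough, 0 for a healthy one).

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels for illustration only (not data from the study).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 1]
    sens, spec = sensitivity_specificity(truth, predicted)
    print(f"sensitivity={sens:.2%}  specificity={spec:.2%}")
```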

14 pages, 3234 KiB

Open Access Article

Towards a Continuous Biometric System Based on ECG Signals Acquired on the Steering Wheel

by João Ribeiro Pinto, Jaime S. Cardoso, André Lourenço and Carlos Carreiras

Sensors 2017, 17(10), 2228; https://doi.org/10.3390/s17102228 - 28 Sep 2017

Cited by 107 | Viewed by 8429

Abstract

Electrocardiogram signals acquired through a steering wheel could be the key to seamless, highly comfortable, and continuous human recognition in driving settings. This paper focuses on enhancing the unprecedentedly low quality of such signals through a combination of Savitzky-Golay and moving average filters, followed by outlier detection and removal based on normalised cross-correlation and clustering, which yielded ensemble heartbeats of significantly higher quality. Discrete Cosine Transform (DCT) and Haar transform features were extracted and fed to decision methods based on Support Vector Machines (SVM), k-Nearest Neighbours (kNN), Multilayer Perceptrons (MLP), and Gaussian Mixture Models - Universal Background Models (GMM-UBM) classifiers, for both identification and authentication tasks. Additional techniques of user-tuned authentication and past score weighting were also studied. The method’s performance was comparable to some of the best recent state-of-the-art methods (94.9% identification rate (IDR) and 2.66% authentication equal error rate (EER)), despite weaker results with scarce training data (70.9% IDR and 11.8% EER). It was concluded that the method was suitable for biometric recognition with driving electrocardiogram signals, and could, with future developments, be used on a continuous system in seamless and highly noisy settings.

(This article belongs to the Section Biosensors)

16 pages, 630 KiB

Open Access Article

Efficient Invariant Features for Sensor Variability Compensation in Speaker Recognition

by Abdennour Alimohad, Ahmed Bouridane and Abderrezak Guessoum

Sensors 2014, 14(10), 19007-19022; https://doi.org/10.3390/s141019007 - 13 Oct 2014

Cited by 2 | Viewed by 5581

Abstract

In this paper, we investigate the use of invariant features for speaker recognition. Owing to their characteristics, these features are introduced to cope with the challenging problem of sensor variability, a source of performance degradation inherent in speaker recognition systems. Our experiments show: (1) the effectiveness of these features in matched cases; (2) the benefit of combining these features with the mel frequency cepstral coefficients (MFCC) to exploit their discrimination power under uncontrolled conditions (mismatched cases). Consequently, the proposed invariant features result in a performance improvement, as demonstrated by a reduction in the equal error rate and the minimum decision cost function compared to the GMM-UBM speaker recognition systems based on MFCC features.

(This article belongs to the Section Physical Sensors)

Search Results (5)
