Search Results (6)

Search Parameters:
Keywords = text-independent speaker verification system

23 pages, 520 KB  
Article
Investigation of Text-Independent Speaker Verification by Support Vector Machine-Based Machine Learning Approaches
by Odin Kohler and Masudul Imtiaz
Electronics 2025, 14(5), 963; https://doi.org/10.3390/electronics14050963 - 28 Feb 2025
Cited by 4 | Viewed by 2915
Abstract
Speaker verification is a common issue that has innumerable biomedical security applications. Speaker verification comes in two different forms: text-independent and text-dependent. Each of these forms can be implemented via many different machine learning and deep learning techniques. From our research, we found that there is significantly less work implementing text-independent speaker verification using machine learning techniques than there is using deep learning techniques. Because of this gap, we were motivated to build our own SVM and CNN models for text-independent speaker verification and compare them to other systems using SVMs or deep learning techniques. We limited ourselves to SVMs because they are commonly used for speech recognition and achieved very high accuracies. The main motivation behind this was two-fold. The first reason is to demonstrate that SVMs can and have been successfully used for text-independent speaker verification at a level comparable to deep learning techniques; the second reason is to make work using SVMs for text-independent speaker verification more accessible so it can be expanded upon easily. The analysis and comparison conducted in this paper will demonstrate how SVMs achieve results comparable to deep learning techniques and allow future researchers to more easily find SVMs used for text-independent speaker verification and derive a sense of what is being implemented in the field.

16 pages, 7288 KB  
Article
mmSafe: A Voice Security Verification System Based on Millimeter-Wave Radar
by Zhanjun Hao, Jianxiang Peng, Xiaochao Dang, Hao Yan and Ruidong Wang
Sensors 2022, 22(23), 9309; https://doi.org/10.3390/s22239309 - 29 Nov 2022
Cited by 4 | Viewed by 3728
Abstract
With the increasing popularity of smart devices, users can control their mobile phones, TVs, cars, and smart furniture by using voice assistants, but voice assistants are susceptible to intrusion by outsider speakers or playback attacks. In order to address this security issue, a millimeter-wave radar-based voice security authentication system is proposed in this paper. First, the speaker’s fine-grained vocal cord vibration signal is extracted by eliminating static object clutter and motion effects; second, the weighted Mel Frequency Cepstrum Coefficients (MFCCs) are obtained as biometric features; and finally, text-independent security authentication is performed by the WMHS (Weighted MFCCs and Hog-based SVM) method. This system is highly adaptable and can authenticate designated speakers, resist intrusion by other unspecified speakers as well as playback attacks, and is secure for smart devices. Extensive experiments have verified that the system achieves a 93.4% speaker verification accuracy and a 5.8% miss detection rate for playback attacks.
(This article belongs to the Special Issue Communication, Security, and Privacy in IoT)

17 pages, 3285 KB  
Article
Pseudo-Phoneme Label Loss for Text-Independent Speaker Verification
by Mengqi Niu, Liang He, Zhihua Fang, Baowei Zhao and Kai Wang
Appl. Sci. 2022, 12(15), 7463; https://doi.org/10.3390/app12157463 - 25 Jul 2022
Cited by 3 | Viewed by 2827
Abstract
Compared with text-independent speaker verification (TI-SV) systems, text-dependent speaker verification (TD-SV) counterparts often have better performance for their efficient utilization of speech content information. On this account, some TI-SV methods tried to boost performance by incorporating an extra automatic speech recognition (ASR) component to explore content information, such as c-vector. However, the introduced ASR component requires a large amount of annotated data and consumes high computation resources. In this paper, we propose a pseudo-phoneme label (PPL) loss for the TI-SR task by integrating content cluster loss at the frame level and speaker recognition loss at the segment level in a unified network by multitask learning, without additional data requirement and exhausting computation. By referring to HuBERT, we generate pseudo-phoneme labels to adjust a frame level feature distribution by deep cluster to ensure each cluster corresponds to an implicit pronunciation unit in the feature space. We compare the proposed loss with the softmax loss, center loss, triplet loss, log-likelihood-ratio cost loss, additive margin softmax loss and additive angular margin loss on the VoxCeleb database. Experimental results demonstrate the effectiveness of our proposed method.
(This article belongs to the Section Computing and Artificial Intelligence)

11 pages, 1578 KB  
Communication
Evaluating the Performance of Speaker Recognition Solutions in E-Commerce Applications
by Olja Krčadinac, Uroš Šošević and Dušan Starčević
Sensors 2021, 21(18), 6231; https://doi.org/10.3390/s21186231 - 17 Sep 2021
Cited by 6 | Viewed by 2765
Abstract
Two important tasks in many e-commerce applications are identity verification of the user accessing the system and determining the level of rights that the user has for accessing and manipulating the system’s resources. The performance of these tasks is directly dependent on the certainty of establishing the identity of the user. The main research focus of this paper is a user identity verification approach based on voice recognition techniques. The paper presents research results connected to the usage of open-source speaker recognition technologies in e-commerce applications with an emphasis on evaluating the performance of the algorithms they use. Four open-source speaker recognition solutions (SPEAR, MARF, ALIZE, and HTK) have been evaluated in cases of mismatched conditions during training and recognition phases. In practice, mismatched conditions are influenced by various lengths of spoken sentences, different types of recording devices, and the usage of different languages in training and recognition phases. All tests conducted in this research were performed in laboratory conditions using the specially designed framework for multimodal biometrics. The obtained results show consistency with the findings of recent research which proves that i-vectors and solutions based on probabilistic linear discriminant analysis (PLDA) continue to be the dominant speaker recognition approaches for text-independent tasks.
(This article belongs to the Special Issue Sensor-Based Biometrics Recognition and Processing)

14 pages, 3446 KB  
Article
Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Deep Length Normalization for Text-Independent Speaker Verification System
by Soonshin Seo and Ji-Hwan Kim
Electronics 2020, 9(10), 1706; https://doi.org/10.3390/electronics9101706 - 17 Oct 2020
Cited by 4 | Viewed by 3117
Abstract
One of the most important parts of a text-independent speaker verification system is speaker embedding generation. Previous studies demonstrated that shortcut connections-based multi-layer aggregation improves the representational power of a speaker embedding system. However, model parameters are relatively large in number, and unspecified variations increase in the multi-layer aggregation. Therefore, in this study, we propose a self-attentive multi-layer aggregation with feature recalibration and deep length normalization for a text-independent speaker verification system. To reduce the number of model parameters, we set the ResNet with the scaled channel width and layer depth as a baseline. To control the variability in the training, we apply a self-attention mechanism to perform multi-layer aggregation with dropout regularizations and batch normalizations. Subsequently, we apply a feature recalibration layer to the aggregated feature using fully-connected layers and nonlinear activation functions. Further, deep length normalization is used on a recalibrated feature in the training process. Experimental results using the VoxCeleb1 evaluation dataset showed that the performance of the proposed methods was comparable to that of state-of-the-art models (equal error rate of 4.95% and 2.86%, using the VoxCeleb1 and VoxCeleb2 training datasets, respectively).
(This article belongs to the Special Issue Human Computer Interaction for Intelligent Systems)

12 pages, 1672 KB  
Article
Addressing Text-Dependent Speaker Verification Using Singing Speech
by Yan Shi, Juanjuan Zhou, Yanhua Long, Yijie Li and Hongwei Mao
Appl. Sci. 2019, 9(13), 2636; https://doi.org/10.3390/app9132636 - 28 Jun 2019
Cited by 7 | Viewed by 3459
Abstract
Automatic speaker verification (ASV) has achieved significant progress in recent years. However, it is still very challenging to generalize the ASV technologies to new, unknown and spoofing conditions. Most previous studies focused on extracting the speaker information from natural speech. This paper attempts to address speaker verification from another perspective. The speaker identity information was exploited from singing speech. We first designed and released a new corpus for speaker verification based on singing and normal reading speech. Then, the speaker discrimination was compared and analyzed between natural and singing speech in different feature spaces. Furthermore, the conventional Gaussian mixture model, the dynamic time warping and the state-of-the-art deep neural network were investigated. They were used to build text-dependent ASV systems with different training-test conditions. Experimental results show that the voiceprint information in the singing speech was more distinguishable than the one in the normal speech. A relative reduction of more than 20% in equal error rate was obtained on both the gender-dependent and independent 1 s-1 s evaluation tasks.
(This article belongs to the Section Acoustics and Vibrations)
