Search Results (5)

Search Parameters:
Keywords = imaginary speech recognition

14 pages, 829 KB  
Article
SPaRLoRA: Spectral-Phase Residual Initialization for LoRA in Low-Resource ASR
by Liang Lan, Wenyong Wang, Guanyu Zou, Jia Wang and Daliang Wang
Electronics 2025, 14(22), 4466; https://doi.org/10.3390/electronics14224466 - 16 Nov 2025
Viewed by 593
Abstract
Parameter-efficient fine-tuning (PEFT) methods like Low-Rank Adaptation (LoRA) are widely used to adapt large pre-trained models under limited resources, yet they often underperform full fine-tuning in low-resource automatic speech recognition (ASR). This gap stems partly from initialization strategies that ignore speech signals’ inherent spectral-phase structure. Unlike SVD/QR-based approaches (PiSSA, OLoRA) that construct mathematically optimal but signal-agnostic subspaces, we propose SPaRLoRA (Spectral-Phase Residual LoRA), which leverages Discrete Fourier Transform (DFT) bases to create speech-aware low-rank adapters. SPaRLoRA explicitly incorporates both magnitude and phase information by concatenating real and imaginary parts of DFT basis vectors, and applies residual correction to focus learning exclusively on components unexplained by the spectral subspace. Evaluated on a 200-h Sichuan dialect ASR benchmark, SPaRLoRA achieves a 2.1% relative character error rate reduction over standard LoRA, outperforming variants including DoRA, PiSSA, and OLoRA. Ablation studies confirm the individual and complementary benefits of spectral basis, phase awareness, and residual correction. Our work demonstrates that signal-structure-aware initialization significantly enhances parameter-efficient fine-tuning for low-resource ASR without architectural changes or added inference cost.
(This article belongs to the Special Issue Multimodal Learning and Transfer Learning)

14 pages, 1810 KB  
Article
Efficient Speech Signal Dimensionality Reduction Using Complex-Valued Techniques
by Sungkyun Ko and Minho Park
Electronics 2024, 13(15), 3046; https://doi.org/10.3390/electronics13153046 - 1 Aug 2024
Cited by 1 | Viewed by 1427
Abstract
In this study, we propose the CVMFCC-DR (Complex-Valued Mel-Frequency Cepstral Coefficients Dimensionality Reduction) algorithm as an efficient method for reducing the dimensionality of speech signals. By utilizing the complex-valued MFCC technique, which considers both real and imaginary components, our algorithm enables dimensionality reduction without information loss while decreasing computational costs. The efficacy of the proposed algorithm is validated through experiments which demonstrate its effectiveness in building a speech recognition model using a complex-valued neural network. Additionally, a complex-valued softmax interpretation method for complex numbers is introduced. The experimental results indicate that the approach yields enhanced performance compared to traditional MFCC-based techniques, thereby highlighting its potential in the field of speech recognition.
(This article belongs to the Special Issue Advances in Artificial Intelligence Engineering)

13 pages, 2067 KB  
Article
A Dual-Branch Speech Enhancement Model with Harmonic Repair
by Lizhen Jia, Yanyan Xu and Dengfeng Ke
Appl. Sci. 2024, 14(4), 1645; https://doi.org/10.3390/app14041645 - 18 Feb 2024
Viewed by 2770
Abstract
Recent speech enhancement studies have mostly focused on completely separating noise from human voices. Due to the lack of specific structures for harmonic fitting in previous studies and the limitations of the traditional convolutional receptive field, there is an inevitable decline in the auditory quality of the enhanced speech, leading to a decrease in the performance of subsequent tasks such as speech recognition and speaker identification. To address these problems, this paper proposes a Harmonic Repair Large Frame enhancement model, called HRLF-Net, that uses a harmonic repair network for denoising, followed by a real-imaginary dual branch structure for restoration. This approach fully utilizes the harmonic overtones to match the original harmonic distribution of speech. In the subsequent branch process, it restores the speech to specifically optimize its auditory quality to the human ear. Experiments show that under HRLF-Net, the intelligibility and quality of speech are significantly improved, and harmonic information is effectively restored.
(This article belongs to the Special Issue Advanced Technology in Speech and Acoustic Signal Processing)

20 pages, 1537 KB  
Article
Imaginary Speech Recognition Using a Convolutional Network with Long-Short Memory
by Ana-Luiza Rusnac and Ovidiu Grigore
Appl. Sci. 2022, 12(22), 11873; https://doi.org/10.3390/app122211873 - 21 Nov 2022
Cited by 8 | Viewed by 3907
Abstract
In recent years, much research attention has focused on imaginary speech understanding, decoding, and even recognition. Speech is a complex mechanism, which involves multiple brain areas in the process of production, planning, and precise control of a large number of muscles and articulation involved in the actual utterance. This paper proposes an intelligent imaginary speech recognition system for eleven different utterances, seven phonemes and four words, from the Kara One database. We showed, by computing LDA for a 2D representation of the feature space, that cross-covariance in the frequency domain offers a better perspective of imaginary speech than cross-covariance in the time domain or the raw signals without any processing. In the classification stage, we used a CNNLSTM neural network and obtained 43% accuracy for all eleven different utterances. The developed system was meant to be a subject-shared system. We also showed that, using the channels corresponding to the anatomical structures of the brain involved in speech production, i.e., Broca area, primary motor cortex, and secondary motor cortex, 93% of the information is preserved, with 40% accuracy obtained using 29 electrodes out of the initial 62.
(This article belongs to the Special Issue Applied Artificial Intelligence (AI))

19 pages, 3818 KB  
Article
CNN Architectures and Feature Extraction Methods for EEG Imaginary Speech Recognition
by Ana-Luiza Rusnac and Ovidiu Grigore
Sensors 2022, 22(13), 4679; https://doi.org/10.3390/s22134679 - 21 Jun 2022
Cited by 23 | Viewed by 5810
Abstract
Speech is a complex mechanism allowing us to communicate our needs, desires and thoughts. In some cases of neural dysfunctions, this ability is highly affected, which makes everyday life activities that require communication a challenge. This paper studies different parameters of an intelligent imaginary speech recognition system to obtain the best performance according to the developed method that can be applied to a low-cost system with limited resources. In developing the system, we used signals from the Kara One database containing recordings acquired for seven phonemes and four words. In the feature extraction stage, we used a method based on covariance in the frequency domain, which performed better than the time-domain methods. Further, we observed the system performance when using different window lengths for the input signal (0.25 s, 0.5 s and 1 s) to highlight the importance of the short-term analysis of the signals for imaginary speech. The final goal being the development of a low-cost system, we studied several architectures of convolutional neural networks (CNN) and showed that a more complex architecture does not necessarily lead to better results. Our study was conducted on eight different subjects, and it is meant to be a subject-shared system. The best performance reported in this paper is up to 37% accuracy for all 11 different phonemes and words when using cross-covariance computed over the signal spectrum of a 0.25 s window and a CNN containing two convolutional layers with 64 and 128 filters connected to a dense layer with 64 neurons. The final system qualifies as a low-cost system using limited resources for decision-making and having a running time of 1.8 ms tested on an AMD Ryzen 7 4800HS CPU.
(This article belongs to the Special Issue Artificial Neural Networks for IoT-Enabled Smart Applications)