Search Results (20)

Search Parameters:
Keywords = radar speech

15 pages, 4321 KB  
Article
Feasibility Study of Real-Time Speech Detection and Characterization Using Millimeter-Wave Micro-Doppler Radar
by Nati Steinmetz and Nezah Balal
Remote Sens. 2025, 17(1), 91; https://doi.org/10.3390/rs17010091 - 29 Dec 2024
Viewed by 2165
Abstract
This study presents a novel approach to remote speech recognition using a millimeter-wave micro-Doppler radar system operating at 94 GHz. By detecting micro-Doppler speech-related vibrations, the system enables non-contact and privacy-preserving speech recognition. Initial experiments used a piezoelectric crystal to simulate vocal cord vibrations, followed by tests with actual human speech. Advanced signal processing techniques, including short-time Fourier transform (STFT), were used to generate spectrograms and reconstruct speech signals. The system demonstrated high accuracy, with cross-correlation analysis quantitatively confirming a strong correlation between radar-reconstructed and original audio signals. These results validate the effectiveness of detecting and characterizing speech-related vibrations without direct audio recording. The findings have significant implications for applications in noisy industrial environments, enabling robust voice interaction capabilities, as well as in healthcare diagnostics and assistive technologies, where contactless and privacy-preserving solutions are essential. Future research will explore diverse real-world scenarios and the integration of advanced signal processing and machine learning techniques to further enhance accuracy and robustness.
(This article belongs to the Special Issue Remote Sensing in 2024)
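The evaluation the abstract describes — STFT spectrograms plus cross-correlation between radar-reconstructed and reference audio — can be sketched in a few lines. Below is a minimal illustration on simulated signals; the sampling rate, window parameters, and stand-in waveforms are assumptions, not the paper's settings.

```python
# Hedged sketch: STFT spectrogram and normalized cross-correlation between a
# simulated "radar-reconstructed" signal and a reference signal. The paper's
# 94 GHz radar front-end is not reproduced here.
import numpy as np
from scipy.signal import stft, correlate

fs = 16_000                                   # assumed audio-band sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
reference = np.sin(2 * np.pi * 220 * t) * np.hanning(t.size)   # stand-in "speech"
reconstructed = np.roll(reference, 40) + 0.05 * np.random.randn(t.size)

# Spectrogram via the short-time Fourier transform (STFT)
f, tau, Zxx = stft(reconstructed, fs=fs, nperseg=512, noverlap=384)
print("spectrogram shape (freq bins x frames):", Zxx.shape)

# Normalized cross-correlation, the paper's quantitative similarity measure
xc = correlate(reconstructed - reconstructed.mean(),
               reference - reference.mean(), mode="full")
xc /= np.std(reconstructed) * np.std(reference) * reference.size
lag = int(np.argmax(xc)) - (reference.size - 1)
print(f"peak correlation {xc.max():.3f} at lag {lag} samples")
```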

17 pages, 1801 KB  
Article
Toward Effective Aircraft Call Sign Detection Using Fuzzy String-Matching between ASR and ADS-B Data
by Mohammed Saïd Kasttet, Abdelouahid Lyhyaoui, Douae Zbakh, Adil Aramja and Abderazzek Kachkari
Aerospace 2024, 11(1), 32; https://doi.org/10.3390/aerospace11010032 - 29 Dec 2023
Cited by 5 | Viewed by 3060
Abstract
Recently, artificial intelligence and data science have witnessed dramatic progress and rapid growth, especially Automatic Speech Recognition (ASR) technology based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs). Consequently, new end-to-end Recurrent Neural Network (RNN) toolkits were developed with higher speed and accuracy that can often achieve a Word Error Rate (WER) below 10%. These toolkits can nowadays be deployed, for instance, within aircraft cockpits and Air Traffic Control (ATC) systems in order to identify aircraft and display recognized voice messages related to flight data, especially for airports not equipped with radar. Hence, the performance of air traffic controllers and pilots can ultimately be improved by reducing workload and stress and enforcing safety standards. Our experiment, conducted at Tangier International Airport's ATC, aimed to build an ASR model able to recognize aircraft call signs quickly and accurately. The acoustic and linguistic models were trained on the Ibn Battouta Speech Corpus (IBSC), an unprecedented speech dataset with approved transcriptions that includes real aerodrome weather observation data and flight information, with call signs captured by an ADS-B receiver. All of these data were synchronized with the voice recordings in a structured format. We calculated the WER to evaluate the model's accuracy and compared different methods of dataset training for model building and adaptation. Despite high interference in the VHF radio communication channel and fast-speaking conditions that raised the WER to 20%, our standalone, low-cost ASR system with a trained RNN model, supported by the Deep Speech toolkit, achieved call sign detection rates of up to 96% for air traffic controller messages and 90% for pilot messages, while displaying related flight information from ADS-B data using a fuzzy string-matching algorithm.
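The final matching step — scoring an ASR-hypothesized call sign against the call signs captured by the ADS-B receiver — can be illustrated with a generic similarity measure. The sketch below uses Python's standard-library difflib rather than the authors' (unspecified) matcher, and the call signs are hypothetical.

```python
# Hedged sketch of fuzzy call-sign matching; difflib stands in for whatever
# string matcher the authors used, and thresholds/normalization are omitted.
from difflib import SequenceMatcher

def best_call_sign(asr_call_sign: str, adsb_candidates: list[str]) -> tuple[str, float]:
    """Return the ADS-B call sign most similar to the ASR-extracted one."""
    scores = {c: SequenceMatcher(None, asr_call_sign.upper(), c.upper()).ratio()
              for c in adsb_candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical example: a verbalized call sign ("ram four zero two") would first
# be normalized to ICAO-like form; here we start from a noisy normalized string.
print(best_call_sign("RAM402Z", ["RAM402", "AFR1234", "RYR89K"]))  # ('RAM402', ~0.92)
```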

11 pages, 2251 KB  
Article
Doppler Radar-Based Human Speech Recognition Using Mobile Vision Transformer
by Wei Li, Yongfu Geng, Yang Gao, Qining Ding, Dandan Li, Nanqi Liu and Jinheng Chen
Electronics 2023, 12(13), 2874; https://doi.org/10.3390/electronics12132874 - 29 Jun 2023
Cited by 2 | Viewed by 3715
Abstract
As one of the important vital features of the human body, the speech signal plays an important role in human–computer interaction. In this study, voice sounds are gathered and identified using Doppler radar. When a person speaks, the vibration of the vocal cords causes the skin on the neck to vibrate as well. The vibration signal received by the radar produces a distinct micro-Doppler signature for words with different pronunciations. After these signals are converted into micro-Doppler feature maps, the maps are categorized and identified. The speech recognition method used in this paper is based on neural networks. Convolutional neural networks (CNNs) suffer from reduced generalization and accuracy when training samples are insufficient or biased, and the resulting models are not well suited to mobile terminals. MobileViT is a lightweight transformer-based model that can be used for image classification tasks. It uses a lightweight attention mechanism to extract features, offering faster inference and a smaller model size while maintaining high accuracy. Our proposed method does not require large-scale data collection, which is beneficial for different users. In addition, learning is relatively fast, with an accuracy of 99.5%.
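As a rough illustration of the classification stage, the sketch below fine-tunes a MobileViT image classifier on micro-Doppler spectrogram "images" using the timm library. The variant (mobilevit_s), input size, vocabulary size, and training settings are assumptions; the abstract does not specify them.

```python
# Hedged sketch: MobileViT classification of micro-Doppler feature maps.
import torch
import timm

NUM_WORDS = 10                                    # hypothetical vocabulary size
model = timm.create_model("mobilevit_s", pretrained=False, num_classes=NUM_WORDS)

x = torch.randn(4, 3, 256, 256)                   # batch of spectrogram images (assumed size)
logits = model(x)
print(logits.shape)                               # torch.Size([4, 10])

# One standard classification training step on random labels
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = torch.nn.functional.cross_entropy(logits, torch.randint(0, NUM_WORDS, (4,)))
loss.backward()
opt.step()
```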

32 pages, 2982 KB  
Article
Validating Automatic Speech Recognition and Understanding for Pre-Filling Radar Labels—Increasing Safety While Reducing Air Traffic Controllers’ Workload
by Nils Ahrenhold, Hartmut Helmke, Thorsten Mühlhausen, Oliver Ohneiser, Matthias Kleinert, Heiko Ehr, Lucas Klamert and Juan Zuluaga-Gómez
Aerospace 2023, 10(6), 538; https://doi.org/10.3390/aerospace10060538 - 5 Jun 2023
Cited by 11 | Viewed by 3452
Abstract
Automatic speech recognition and understanding (ASRU) for air traffic control (ATC) has been investigated in different ATC environments and applications. The objective of this study was to quantify the effect of ASRU support for air traffic controllers' (ATCos') radar label maintenance in terms of safety and human performance. To this end, an implemented ASRU system was validated by ATCos within a human-in-the-loop environment in different traffic-density scenarios. In the baseline condition, ATCos performed radar label maintenance by entering verbally instructed ATC commands with a mouse and keyboard. In the proposed solution, ATCos were supported by ASRU, which achieved a command recognition rate of 92.5% with a command error rate of 2.4%. ASRU support reduced the number of wrong or missing inputs from ATCos into the radar label by a factor of two, which simultaneously improved their situational awareness. Furthermore, ATCos were able to complete more secondary tasks successfully when using ASRU support, indicating a greater capacity to handle unexpected events. The results from the NASA TLX showed that perceived workload decreased by a statistically significant 4.3% across all scenarios. In conclusion, this study provides evidence that using ASRU for radar label maintenance can significantly reduce workload and improve flight safety.

15 pages, 4317 KB  
Article
IR-UWB Radar-Based Robust Heart Rate Detection Using a Deep Learning Technique Intended for Vehicular Applications
by Faheem Khan, Stéphane Azou, Roua Youssef, Pascal Morel and Emanuel Radoi
Electronics 2022, 11(16), 2505; https://doi.org/10.3390/electronics11162505 - 11 Aug 2022
Cited by 24 | Viewed by 8737
Abstract
This paper deals with robust heart rate detection intended for the in-car monitoring of people. There are two main problems associated with radar-based heart rate detection. Firstly, the signal associated with the human heart is difficult to separate from breathing harmonics in the frequency domain. Secondly, the vital signal is affected by interference from hand gestures, lip motion during speech, or other random body motions (RBM). To handle the problem of breathing harmonics, we propose a novel algorithm based on time-series data instead of the conventionally used frequency-domain technique. In our proposed method, a deep learning classifier is used to detect the pattern of the heart rate signal. To mitigate interference from random body motions, we identify an optimum location for the radar sensor inside the car. In this paper, a commercially available Novelda Xethru X4 radar is used for signal acquisition and vital sign measurement of five people. The performance of the proposed algorithm is compared with that of the conventional frequency-domain technique and found to be superior.
(This article belongs to the Special Issue New Trends and Methods in Communication Systems)
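The first problem named above — breathing harmonics masking the heartbeat in the spectrum — is easy to reproduce with a simulated chest-displacement signal. A minimal sketch (all rates and amplitudes are illustrative, not measured values):

```python
# Hedged sketch: why spectral peak picking fails for heart rate. A breathing
# harmonic falls inside the plausible heart band and dominates the true peak.
import numpy as np

fs = 100.0                                     # slow-time sampling rate (Hz), assumed
t = np.arange(0, 30, 1 / fs)
breath = 5e-3 * np.sin(2 * np.pi * 0.3 * t)            # ~18 breaths/min
breath += 1e-3 * np.sin(2 * np.pi * 0.9 * t)           # 3rd breathing harmonic
heart = 1e-4 * np.sin(2 * np.pi * 1.2 * t)             # ~72 bpm, much weaker
x = breath + heart + 1e-5 * np.random.randn(t.size)

spec = np.abs(np.fft.rfft(x * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = (freqs > 0.8) & (freqs < 2.0)                   # plausible heart band
print(f"peak in heart band at {freqs[band][spec[band].argmax()]:.2f} Hz")
# The 0.9 Hz harmonic wins (not 1.2 Hz), which motivates the paper's
# time-domain deep learning classifier instead of spectral peak picking.
```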

17 pages, 821 KB  
Article
Exploring Silent Speech Interfaces Based on Frequency-Modulated Continuous-Wave Radar
by David Ferreira, Samuel Silva, Francisco Curado and António Teixeira
Sensors 2022, 22(2), 649; https://doi.org/10.3390/s22020649 - 14 Jan 2022
Cited by 19 | Viewed by 5110
Abstract
Speech is our most natural and efficient form of communication and offers a strong potential to improve how we interact with machines. However, speech communication can sometimes be limited by environmental (e.g., ambient noise), contextual (e.g., need for privacy), or health conditions (e.g., laryngectomy), preventing the use of audible speech. In this regard, silent speech interfaces (SSI) have been proposed as an alternative, considering technologies that do not require the production of acoustic signals (e.g., electromyography and video). Unfortunately, despite their abundance, many still face limitations regarding everyday use, e.g., being intrusive or non-portable, or raising technical (e.g., lighting conditions for video) or privacy concerns. In line with this necessity, this article explores contactless continuous-wave radar to assess its potential for SSI development. A corpus of 13 European Portuguese words was acquired for four speakers, three of whom enrolled in a second acquisition session three months later. For the speaker-dependent models, trained and tested with data from each speaker using 5-fold cross-validation, average accuracies of 84.50% and 88.00% were obtained with the Bagging (BAG) and Linear Regression (LR) classifiers, respectively. Additionally, recognition accuracies of 81.79% and 81.80% were achieved in the session-independent and speaker-independent experiments, respectively, establishing promising grounds for further exploring this technology for silent speech recognition.
(This article belongs to the Special Issue Future Speech Interfaces with Sensors and Machine Intelligence)
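The evaluation protocol — 5-fold cross-validation of Bagging and LR classifiers — maps directly onto scikit-learn. The sketch below uses random placeholder features (the radar feature extraction is not reproduced), and reads the abstract's "Linear Regression (LR)" as logistic-regression classification, which is an assumption.

```python
# Hedged sketch of the cross-validation protocol on placeholder features.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(13 * 20, 64))   # 13 words x 20 utterances, 64-dim features (assumed)
y = np.repeat(np.arange(13), 20)

for name, clf in [("BAG", BaggingClassifier(n_estimators=50, random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")  # near chance on random features
```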

24 pages, 2441 KB  
Article
Respiration Based Non-Invasive Approach for Emotion Recognition Using Impulse Radio Ultra Wide Band Radar and Machine Learning
by Hafeez Ur Rehman Siddiqui, Hina Fatima Shahzad, Adil Ali Saleem, Abdul Baqi Khan Khakwani, Furqan Rustam, Ernesto Lee, Imran Ashraf and Sandra Dudley
Sensors 2021, 21(24), 8336; https://doi.org/10.3390/s21248336 - 13 Dec 2021
Cited by 36 | Viewed by 5926
Abstract
Emotion recognition has recently attracted increasing attention from a multitude of fields due to its wide use in human-computer interaction, therapy, and advanced robotics. Human speech, gestures, facial expressions, and physiological signals can be used to recognize different emotions. Despite their discriminative properties, the first three methods have been regarded as unreliable because the possibility of a person voluntarily or involuntarily concealing their real emotions cannot be ignored. Physiological signals, on the other hand, are capable of providing more objective and reliable emotion recognition. Several methods based on physiological signals have been introduced for emotion recognition, yet such approaches are predominantly invasive, involving the placement of on-body sensors, and their efficacy and accuracy are hindered by sensor malfunction and by erroneous data caused by limb movement. This study presents a non-invasive approach in which machine learning complements impulse radio ultra-wideband (IR-UWB) signals for emotion recognition. First, the feasibility of using IR-UWB for emotion recognition is analyzed, followed by classification of the emotional state into happiness, disgust, and fear. These emotions are triggered in human subjects, both male and female, using carefully selected video clips. The convincing evidence that different breathing patterns are linked with different emotions is leveraged to discriminate between emotions. The chest movement of thirty-five subjects watching the video clips in solitude is obtained using IR-UWB radar. Extensive signal processing is applied to the chest movement signals to estimate the respiration rate per minute (RPM). The RPM estimated by the algorithm is validated by repeated measurements with a commercially available pulse oximeter. A dataset comprising gender, RPM, age, and the associated emotions is maintained and used with several machine learning algorithms for automatic recognition of human emotions. Experiments reveal that IR-UWB can differentiate between human emotions with a decent accuracy of 76% without any on-body sensors. A separate analysis of male and female participants reveals that males experience higher arousal for happiness, while females experience more intense fear; for disgust, no large difference is found between male and female participants. To the best of the authors' knowledge, this study presents the first non-invasive approach using IR-UWB radar for emotion recognition.
(This article belongs to the Special Issue Ultra Wideband (UWB) Systems in Biomedical Sensing)
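The central signal-processing output, respiration rate per minute (RPM), can be estimated from a chest-displacement trace by simple peak counting. A hedged sketch on a simulated signal (the paper's full processing chain is more extensive):

```python
# Hedged sketch: RPM estimation by peak counting; thresholds are illustrative.
import numpy as np
from scipy.signal import find_peaks

fs = 20.0                                   # slow-time rate (Hz), assumed
t = np.arange(0, 60, 1 / fs)                # one minute of chest displacement
chest = np.sin(2 * np.pi * (16 / 60) * t)   # 16 breaths per minute
chest += 0.1 * np.random.randn(t.size)

peaks, _ = find_peaks(chest, distance=fs * 2, prominence=0.5)
rpm = len(peaks) * 60 / (t[-1] - t[0])
print(f"estimated RPM: {rpm:.1f}")          # ~16; this feeds the ML stage with age/gender
```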

19 pages, 980 KB  
Article
Constant-Beamwidth Beamforming with Concentric Ring Arrays
by Avital Kleiman, Israel Cohen and Baruch Berdugo
Sensors 2021, 21(21), 7253; https://doi.org/10.3390/s21217253 - 31 Oct 2021
Cited by 13 | Viewed by 3876
Abstract
Designing beampatterns with constant beamwidth over a wide range of frequencies is useful in many applications in speech, radar, sonar and communication. In this paper, we design constant-beamwidth beamformers for concentric ring arrays. The proposed beamformers utilize the circular geometry to provide improved beamwidth consistency compared to beamformers which are designed for linear sensor arrays of the same order. In the proposed configuration, all sensors on each ring share the same weight value. This constraint significantly simplifies the beamformers and reduces the hardware and computational resources required in a physical setup. Furthermore, a theoretical justification of the beamforming method is provided. We demonstrate the advantages of the proposed beamformers compared to the one-dimensional configuration in terms of directivity index, white noise gain and sidelobe attenuation.
(This article belongs to the Special Issue Sensors in Indoor Positioning Systems)
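The key constraint — all sensors on a ring share one weight — is easy to express in a beampattern computation. The sketch below evaluates the far-field pattern of a small concentric ring array; the radii, sensor counts, shared weights, and frequency are illustrative, and the paper's actual weight-design procedure is not reproduced.

```python
# Hedged sketch: beampattern of a concentric ring array with one shared
# weight per ring (planar array, plane wave, elevation measured from broadside).
import numpy as np

c, f = 343.0, 2000.0                 # speed of sound (m/s), frequency (Hz)
k = 2 * np.pi * f / c
radii = [0.00, 0.05, 0.10]           # ring radii (m); first "ring" is a center sensor
n_per_ring = [1, 8, 16]
weights = [0.5, 0.3, 0.2]            # one shared weight per ring (illustrative)

theta = np.linspace(-np.pi / 2, np.pi / 2, 721)
pattern = np.zeros_like(theta, dtype=complex)
for r, n, w in zip(radii, n_per_ring, weights):
    for phi in 2 * np.pi * np.arange(n) / n:          # sensor azimuths on the ring
        pattern += (w / n) * np.exp(1j * k * r * np.cos(phi) * np.sin(theta))
print(f"broadside gain: {abs(pattern[theta.size // 2]):.2f}")  # 1.00 by construction
```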

17 pages, 1175 KB  
Review
Localization of Sound Sources: A Systematic Review
by Muhammad Usman Liaquat, Hafiz Suliman Munawar, Amna Rahman, Zakria Qadir, Abbas Z. Kouzani and M. A. Parvez Mahmud
Energies 2021, 14(13), 3910; https://doi.org/10.3390/en14133910 - 29 Jun 2021
Cited by 64 | Viewed by 11691
Abstract
Sound localization is a vast field of research and advancement, used in many applications to facilitate communication, radar, medical aid, and speech enhancement, to name but a few. Many different methods have been presented in this field in recent years. Various types of microphone arrays serve the purpose of sensing the incoming sound. This paper presents an overview of the importance of sound localization in different applications, along with the use and limitations of ad-hoc microphone arrays relative to other arrays. Approaches to overcoming these limitations are also presented. A detailed explanation is given of some existing methods in the recent literature that use microphone arrays for sound localization. The existing methods are studied comparatively, along with the factors that influence the choice of one method over the others. This review forms a basis for choosing the best-fit method for our use.

27 pages, 12013 KB  
Article
Sound Localization for Ad-Hoc Microphone Arrays
by Muhammad Usman Liaquat, Hafiz Suliman Munawar, Amna Rahman, Zakria Qadir, Abbas Z. Kouzani and M. A. Parvez Mahmud
Energies 2021, 14(12), 3446; https://doi.org/10.3390/en14123446 - 10 Jun 2021
Cited by 25 | Viewed by 7878
Abstract
Sound localization is a field of signal processing that deals with identifying the origin of a detected sound signal. This involves determining the direction and distance of the sound source. Useful applications of this phenomenon exist in speech enhancement, communication, radar, and the medical field. The experimental arrangement requires the use of microphone arrays that record the sound signal. Some methods involve ad-hoc arrays of microphones because of their demonstrated advantages over other arrays. In this research project, the existing sound localization methods have been explored to analyze the advantages and disadvantages of each. A novel sound localization routine has been formulated that uses both the direction of arrival (DOA) of the sound signal and location estimation in three-dimensional space to precisely locate a sound source. The experimental arrangement consists of four microphones and a single sound source. Previously, sound sources have been localized using six or more microphones, and the precision of sound localization has been shown to increase with the number of microphones. In this research, however, we minimized the number of microphones to reduce the complexity of the algorithm and the computation time. The method is novel in the field of sound source localization in that it uses fewer resources while providing results on par with more complex methods requiring more microphones and additional tools. The average accuracy of the system is found to be 96.77%, with an error factor of 3.8%.
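A common building block of such microphone-array localization is time-difference-of-arrival (TDOA) estimation between microphone pairs, e.g., with GCC-PHAT. The sketch below is a generic illustration of that step, not the authors' exact routine; geometry and delays are hypothetical.

```python
# Hedged sketch: GCC-PHAT TDOA estimation between two microphones.
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: float) -> float:
    """Return the estimated delay of `sig` relative to `ref`, in seconds."""
    n = sig.size + ref.size
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.abs(S) + 1e-12                        # PHAT weighting
    cc = np.fft.irfft(S, n)
    m = n // 2
    cc = np.concatenate((cc[-m:], cc[: m + 1]))   # center zero lag
    return (int(np.argmax(np.abs(cc))) - m) / fs

fs = 16_000
src = np.random.randn(1600)                       # broadband source, 0.1 s
mic1, mic2 = src, np.roll(src, 12)                # 12-sample delay (hypothetical geometry)
print(f"estimated TDOA: {gcc_phat(mic2, mic1, fs) * 1e3:.3f} ms")  # ~0.75 ms
```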

45 pages, 4487 KB  
Review
Application of Deep Learning on Millimeter-Wave Radar Signals: A Review
by Fahad Jibrin Abdu, Yixiong Zhang, Maozhong Fu, Yuhan Li and Zhenmiao Deng
Sensors 2021, 21(6), 1951; https://doi.org/10.3390/s21061951 - 10 Mar 2021
Cited by 98 | Viewed by 22706
Abstract
The progress brought by deep learning technology over the last decade has inspired many research domains, such as radar signal processing and speech and audio recognition, to apply it to their respective problems. Most of the prominent deep learning models exploit data representations acquired with either Lidar or camera sensors, leaving automotive radars rarely used. This is despite the vital potential of radars in adverse weather conditions, as well as their ability to simultaneously measure an object's range and radial velocity. As radar signals have not been exploited very much so far, there is a lack of available benchmark data. Recently, however, there has been considerable interest in applying radar data as input to various deep learning algorithms, as more datasets become available. To this end, this paper presents a survey of deep learning approaches that process radar signals to accomplish significant tasks in autonomous driving applications, such as detection and classification. We organize the review by radar signal representation, as this is one of the critical aspects of using radar data with deep learning models. Furthermore, we give an extensive review of recent deep learning-based multi-sensor fusion models that exploit radar signals and camera images for object detection. We then provide a summary of the available datasets containing radar data. Finally, we discuss the gaps and important innovations in the reviewed papers and highlight possible future research prospects.
(This article belongs to the Section Remote Sensors)

10 pages, 4849 KB  
Proceeding Paper
Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications
by Juan Zuluaga-Gomez, Karel Veselý, Alexander Blatt, Petr Motlicek, Dietrich Klakow, Allan Tart, Igor Szöke, Amrutha Prasad, Saeed Sarfjoo, Pavel Kolčárek, Martin Kocour, Honza Černocký, Claudia Cevenini, Khalid Choukri, Mickael Rigault and Fabian Landis
Proceedings 2020, 59(1), 14; https://doi.org/10.3390/proceedings2020059014 - 3 Dec 2020
Cited by 11 | Viewed by 4081
Abstract
Voice communication is the main channel for exchanging information between pilots and Air-Traffic Controllers (ATCos). Recently, several projects have explored the use of speech recognition technology to automatically extract spoken key information such as call signs, commands, and values, which can be used to reduce ATCos' workload and increase performance and safety in Air-Traffic Control (ATC)-related activities. Nevertheless, the collection of ATC speech data is very demanding, expensive, and limited by the intrinsic characteristics of the speakers. As a solution, this paper presents ATCO2, a project that aims to develop a unique platform to collect, organize, and pre-process ATC data collected from the airspace. Initially, the data are gathered directly through publicly accessible radio frequency channels with VHF receivers and LiveATC, which can be considered an "unlimited" source of low-quality data. The ATCO2 project explores the use of context information such as radar and air surveillance data (collected with ADS-B and Mode S) from the OpenSky Network (OSN) to correlate call signs automatically extracted from voice communication with those available from ADS-B channels, in order to increase the overall call sign detection rate. More specifically, the timestamp and location of the spoken command (issued by the ATCo by voice) are extracted, and a query is sent to the OSN server to retrieve the call sign tags in ICAO format for the airplanes in the given area. Then, a word sequence provided by an automatic speech recognition system is fed into a Natural Language Processing (NLP)-based module together with the set of call signs available from the ADS-B channels. The NLP module extracts the call sign, command, and command arguments from the spoken utterance.
(This article belongs to the Proceedings of 8th OpenSky Symposium 2020)
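The surveillance-query step — retrieving the call signs visible over a given area from the OSN — can be sketched against OpenSky's public REST API. The bounding box below is hypothetical, anonymous access is rate-limited, and error handling is minimal.

```python
# Hedged sketch: fetch current call signs over a bounding box from the
# OpenSky Network /states/all REST endpoint.
import requests

def call_signs_over(lamin: float, lomin: float, lamax: float, lomax: float) -> list[str]:
    resp = requests.get(
        "https://opensky-network.org/api/states/all",
        params={"lamin": lamin, "lomin": lomin, "lamax": lamax, "lomax": lomax},
        timeout=10,
    )
    resp.raise_for_status()
    states = resp.json().get("states") or []
    # index 1 of each state vector is the callsign field (may be padded/empty)
    return sorted({s[1].strip() for s in states if s[1] and s[1].strip()})

# Hypothetical area; these tags are then matched against the ASR output.
print(call_signs_over(46.0, 6.0, 47.0, 7.0))
```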

10 pages, 485 KB  
Letter
Fast Recursive Computation of Sliding DHT with Arbitrary Step
by Vitaly Kober
Sensors 2020, 20(19), 5556; https://doi.org/10.3390/s20195556 - 28 Sep 2020
Cited by 5 | Viewed by 2013
Abstract
The short-time (sliding) transform based on the discrete Hartley transform (DHT) is often used to estimate the power spectrum of quasi-stationary processes such as speech, audio, radar, communication, and biomedical signals. The sliding transform calculates the transform coefficients of the signal in a fixed-size moving window. In order to speed up the spectral analysis of signals with slowly changing spectra, the window can slide along the signal with a step greater than one. A fast algorithm for computing the discrete Hartley transform in windows that are equidistant from each other is proposed. The algorithm is based on a second-order recursive relation between subsequent equidistant local transform spectra. The computational complexity of the proposed algorithm is compared with that of known fast Hartley transform and sliding algorithms.
(This article belongs to the Section Intelligent Sensors)
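As a correctness baseline for such algorithms, the sliding DHT with an arbitrary hop can be computed directly (non-recursively). The sketch below does exactly that; the paper's fast second-order recursion between equidistant spectra is not reproduced.

```python
# Hedged sketch: direct sliding DHT over a moving window with arbitrary step.
import numpy as np

def sliding_dht(x: np.ndarray, win: int, hop: int) -> np.ndarray:
    n = np.arange(win)
    arg = 2 * np.pi * np.outer(n, n) / win
    cas = np.cos(arg) + np.sin(arg)               # DHT kernel cas(2*pi*k*n/N)
    starts = range(0, x.size - win + 1, hop)
    return np.stack([cas @ x[s : s + win] for s in starts])

x = np.random.randn(1024)
H = sliding_dht(x, win=128, hop=4)    # hop > 1: spectra on equidistant windows
print(H.shape)                        # (225, 128)
```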

16 pages, 3046 KB  
Article
Non-Contact Speech Recovery Technology Using a 24 GHz Portable Auditory Radar and Webcam
by Yue Ma, Hong Hong, Hui Li, Heng Zhao, Yusheng Li, Li Sun, Chen Gu and Xiaohua Zhu
Remote Sens. 2020, 12(4), 653; https://doi.org/10.3390/rs12040653 - 17 Feb 2020
Cited by 7 | Viewed by 4782
Abstract
Language has been one of the most effective means of human communication and information exchange. To address non-contact robust speech recognition, recovery, and surveillance, this paper presents a speech recovery technology based on a 24 GHz portable auditory radar and a webcam. The continuous-wave auditory radar is used to extract the vocal vibration signal, and the webcam is used to obtain the fitted formant frequencies. A traditional formant speech synthesizer is used to synthesize and recover speech, with the vocal vibration signal as the sound source excitation and the fitted formant frequencies as the vocal tract resonance characteristics. Experiments on reading single English characters and words are carried out. Using microphone recordings as a reference, the effectiveness of the proposed speech recovery technology is verified. Mean opinion scores show a relatively high consistency between the synthesized speech and the original acoustic speech.
(This article belongs to the Special Issue Radar Remote Sensing on Life Activities)
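The source-filter idea — radar-derived vibration as excitation, webcam-fitted formants as the vocal tract — can be sketched with a pulse train driving second-order resonators. The formant frequencies, bandwidths, and excitation below are illustrative stand-ins, not values from the paper.

```python
# Hedged sketch: formant synthesis with a crude excitation and three resonators.
import numpy as np
from scipy.signal import lfilter

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
excitation = (np.sin(2 * np.pi * 120 * t) > 0.95).astype(float)  # crude glottal pulses

def resonator(x, f0, bw, fs):
    """Second-order all-pole resonator at f0 Hz with bandwidth bw Hz."""
    r = np.exp(-np.pi * bw / fs)
    a = [1, -2 * r * np.cos(2 * np.pi * f0 / fs), r * r]
    return lfilter([1 - r], a, x)

speech = excitation
for f0, bw in [(730, 90), (1090, 110), (2440, 170)]:  # rough formants of /a/
    speech = resonator(speech, f0, bw, fs)
print("synthesized samples:", speech.size)
```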

19 pages, 2723 KB  
Article
Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar
by Young Hoon Shin and Jiwon Seo
Sensors 2016, 16(11), 1812; https://doi.org/10.3390/s16111812 - 29 Oct 2016
Cited by 30 | Viewed by 8839
Abstract
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
(This article belongs to the Section Physical Sensors)
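The speech activity detection step can be approximated by a generic motion-energy detector over radar slow-time frames. The sketch below is such a stand-in; the authors' actual feature-extraction and detection algorithms are more elaborate, and all thresholds and shapes here are illustrative.

```python
# Hedged sketch: flag frames whose frame-to-frame change is anomalously high.
import numpy as np

def detect_activity(frames: np.ndarray, thresh: float = 1.0) -> np.ndarray:
    """Boolean mask over frames (rows = slow time, cols = range bins)."""
    diff_energy = np.sum(np.diff(frames, axis=0) ** 2, axis=1)
    z = (diff_energy - diff_energy.mean()) / (diff_energy.std() + 1e-12)
    return np.concatenate(([False], z > thresh))

frames = 0.01 * np.random.randn(200, 64)
frames[80:120] += 0.2 * np.random.randn(40, 64)     # simulated lip/jaw motion
mask = detect_activity(frames)
print("active frames:", np.flatnonzero(mask).min(), "-", np.flatnonzero(mask).max())
```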
