Search Results (21)

Search Parameters:
Keywords = GMM-HMM

21 pages, 2624 KiB  
Article
GMM-HMM-Based Eye Movement Classification for Efficient and Intuitive Dynamic Human–Computer Interaction Systems
by Jiacheng Xie, Rongfeng Chen, Ziming Liu, Jiahao Zhou, Juan Hou and Zengxiang Zhou
J. Eye Mov. Res. 2025, 18(4), 28; https://doi.org/10.3390/jemr18040028 - 9 Jul 2025
Viewed by 299
Abstract
Human–computer interaction (HCI) plays a crucial role across various fields, with eye-tracking technology emerging as a key enabler for intuitive and dynamic control in assistive systems like Assistive Robotic Arms (ARAs). By precisely tracking eye movements, this technology allows for more natural user interaction. However, current systems primarily rely on the single gaze-dependent interaction method, which leads to the “Midas Touch” problem. This highlights the need for real-time eye movement classification in dynamic interactions to ensure accurate and efficient control. This paper proposes a novel Gaussian Mixture Model–Hidden Markov Model (GMM-HMM) classification algorithm aimed at overcoming the limitations of traditional methods in dynamic human–robot interactions. By incorporating sum of squared error (SSE)-based feature extraction and hierarchical training, the proposed algorithm achieves a classification accuracy of 94.39%, significantly outperforming existing approaches. Furthermore, it is integrated with a robotic arm system, enabling gaze trajectory-based dynamic path planning, which reduces the average path planning time to 2.97 milliseconds. The experimental results demonstrate the effectiveness of this approach, offering an efficient and intuitive solution for human–robot interaction in dynamic environments. This work provides a robust framework for future assistive robotic systems, improving interaction intuitiveness and efficiency in complex real-world scenarios. Full article

20 pages, 343 KiB  
Article
Mathematical Modeling and Parameter Estimation of Lane-Changing Vehicle Behavior Decisions
by Jianghui Wen, Yebei Xu, Min Dai and Nengchao Lyu
Mathematics 2025, 13(6), 1014; https://doi.org/10.3390/math13061014 - 20 Mar 2025
Viewed by 429
Abstract
Lane changing is a crucial scenario in traffic environments, and accurately recognizing and predicting lane-changing behavior is essential for ensuring the safety of both autonomous vehicles and drivers. Through considering the multi-vehicle information interaction characteristics in lane-changing behavior for vehicles and the impact of driver experience needs on lane-changing decisions, this paper proposes a lane-changing model for vehicles to achieve safe and comfortable driving. Firstly, a lane-changing intention recognition model incorporating interaction effects was established to obtain the initial lane-changing intention probability of the vehicles. Secondly, by accounting for individual driving styles, a lane-changing behavior decision model was constructed based on a Gaussian mixture hidden Markov model (GMM-HMM) along with a parameter estimation method. The initial lane-changing intention probability serves as the input for the decision model, and the final lane-changing decision is made by comparing the probabilities of lane-changing and non-lane-changing scenarios. Finally, the model was validated using real-world data from the Next Generation Simulation (NGSIM) dataset, with empirical results demonstrating its high accuracy in recognizing and predicting lane-changing behavior. This study provides a robust framework for enhancing lane-changing decision making in complex traffic environments. Full article

16 pages, 4341 KiB  
Article
Research on Pig Sound Recognition Based on Deep Neural Network and Hidden Markov Models
by Weihao Pan, Hualong Li, Xiaobo Zhou, Jun Jiao, Cheng Zhu and Qiang Zhang
Sensors 2024, 24(4), 1269; https://doi.org/10.3390/s24041269 - 16 Feb 2024
Cited by 10 | Viewed by 2891
Abstract
To address the low recognition accuracy of traditional pig sound recognition methods, this study used deep neural network (DNN) and hidden Markov model (HMM) theory as the basis for pig sound signal recognition. The sounds made by 10 Landrace pigs during eating, estrus, howling, humming and panting were collected and preprocessed by Kalman filtering and an improved endpoint detection algorithm based on the empirical mode decomposition-Teager energy operator (EMD-TEO) cepstral distance. The extracted 39-dimensional mel-frequency cepstral coefficients (MFCCs) were then used as a dataset for network learning and recognition to build a DNN- and HMM-based sound recognition model for pig states. The results show that on the pig sound dataset, the recognition accuracy of DNN-HMM reaches 83%, which is 22% and 17% higher than that of the baseline models HMM and GMM-HMM, respectively, giving a better recognition effect. On a sub-dataset of the publicly available AudioSet dataset, DNN-HMM achieves a recognition accuracy of 79%, which is 8% and 4% higher than the classical models SVM and ResNet18, respectively, indicating better robustness. Full article
(This article belongs to the Section Intelligent Sensors)

15 pages, 2973 KiB  
Article
An Efficient Method for the Reliability Evaluation of Power Systems Considering the Variable Photovoltaic Power Output
by Haojie He, Liweiyong Guo, Peidong Han, Changzheng Shao and Tan Xu
Appl. Sci. 2023, 13(16), 9053; https://doi.org/10.3390/app13169053 - 8 Aug 2023
Cited by 1 | Viewed by 1756
Abstract
The operational reliability of power systems is threatened by the random failure of components and uncertain power output of renewable energies, such as photovoltaics. Under such circumstances, reliability evaluation is necessary for maintaining a continuous and stable energy supply. However, traditional reliability evaluation methods are usually extremely time-consuming, considering the numerous system states that need to be analysed. Hence, the reliability evaluation process cannot follow up the dynamic changes in PV output, which makes the timeline of the evaluation disappointing. This paper proposes an efficient reliability evaluation method for power systems with PV integration. The method reveals the analytical relationship between the reliability levels of the power system and the uncertainty factors that influence the reliability, such as the PV output. In this way, the dynamic reliability evaluation is achieved, and the evaluation results can be updated timely when the output of PV changes. First, a Gaussian mixture-hidden Markov model (GMM-HMM) is used to model the distribution characteristics of PV output. Then, the state enumeration and the hyperbolic truncated polynomial chaos expansion method are used to determine the analytical relationship between the reliability indices and PV output. Lastly, based on the analytical function, the operational reliability of the power systems is dynamically evaluated considering the real-time PV output. The effectiveness of the proposed method is verified using the modified IEEE 30 system as an example. Full article

19 pages, 6405 KiB  
Article
Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition
by Young-Long Chen, Neng-Chung Wang, Jing-Fong Ciou and Rui-Qi Lin
Appl. Sci. 2023, 13(12), 7008; https://doi.org/10.3390/app13127008 - 10 Jun 2023
Cited by 7 | Viewed by 2324
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. Additionally, it also offers faster computation time compared to traditional methods. Full article

22 pages, 3221 KiB  
Article
An HMM-DNN-Based System for the Detection and Classification of Low-Frequency Acoustic Signals from Baleen Whales, Earthquakes, and Air Guns off Chile
by Susannah J. Buchan, Miguel Duran, Constanza Rojas, Jorge Wuth, Rodrigo Mahu, Kathleen M. Stafford and Nestor Becerra Yoma
Remote Sens. 2023, 15(10), 2554; https://doi.org/10.3390/rs15102554 - 13 May 2023
Cited by 6 | Viewed by 3269
Abstract
Marine passive acoustic monitoring can be used to study biological, geophysical, and anthropogenic phenomena in the ocean. The wide range of characteristics from geophysical, biological, and anthropogenic sounds sources makes the simultaneous automatic detection and classification of these sounds a significant challenge. Here, we propose a single Hidden Markov Model-based system with a Deep Neural Network (HMM-DNN) for the detection and classification of low-frequency biological (baleen whales), geophysical (earthquakes), and anthropogenic (air guns) sounds. Acoustic data were obtained from the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization station off Juan Fernandez, Chile (station HA03) and annotated by an analyst (498 h of audio data containing 30,873 events from 19 different classes), and then divided into training (60%), testing (20%), and tuning (20%) subsets. Each audio frame was represented as an observation vector obtained through a filterbank-based spectral feature extraction procedure. The HMM-DNN training procedure was carried out discriminatively by setting HMM states as targets. A model with Gaussian Mixtures Models and HMM (HMM-GMM) was trained to obtain an initial set of HMM target states. Feature transformation based on Linear Discriminant Analysis and Maximum Likelihood Linear Transform was also incorporated. The HMM-DNN system displayed good capacity for correctly detecting and classifying events, with high event-level accuracy (84.46%), high weighted average sensitivity (84.46%), and high weighted average precision (89.54%). Event-level accuracy increased with higher event signal-to-noise ratios. 
Event-level metrics per class also showed that our HMM-DNN system generalized well for most classes but performances were best for classes that either had a high number of training exemplars (e.g., generally above 50) and/or were for classes of signals that had low variability in spectral features, duration, and energy levels. Fin whale and Antarctic blue whale song and air guns performed particularly well. Full article
(This article belongs to the Section Ocean Remote Sensing)

21 pages, 659 KiB  
Article
Extracting Statistical Properties of Solar and Photovoltaic Power Production for the Scope of Building a Sophisticated Forecasting Framework
by Joseph Ndong and Ted Soubdhan
Forecasting 2023, 5(1), 1-21; https://doi.org/10.3390/forecast5010001 - 29 Dec 2022
Cited by 1 | Viewed by 2283
Abstract
Building a sophisticated forecasting framework for solar and photovoltaic power production in geographic zones with severe meteorological conditions is very challenging. This difficulty is linked to the high variability of the global solar radiation on which the energy production depends. A suitable forecasting framework might take into account this high variability and could be able to adjust/re-adjust model parameters to reduce sensitivity to estimation errors. The framework should also be able to re-adapt the model parameters whenever the atmospheric conditions change drastically or suddenly—this changes according to microscopic variations. This work presents a new methodology to analyze carefully the meaningful features of global solar radiation variability and extract some relevant information about the probabilistic laws which governs its dynamic evolution. The work establishes a framework able to identify the macroscopic variations from the solar irradiance. The different categories of variability correspond to different levels of meteorological conditions and events and can occur in different time intervals. Thereafter, the tool will be able to extract the abrupt changes, corresponding to microscopic variations, inside each level of variability. The methodology is based on a combination of probability and possibility theory. An unsupervised clustering technique based on a Gaussian mixture model is proposed to identify, first, the categories of variability and, using a hidden Markov model, we study the temporal dependency of the process to identify the dynamic evolution of the solar irradiance as different temporal states. Finally, by means of some transformations of probabilities to possibilities, we identify the abrupt changes in the solar radiation. The study is performed in Guadeloupe, where we have a long record of global solar radiation data recorded at 1 Hertz. Full article
(This article belongs to the Collection Energy Forecasting)

15 pages, 1578 KiB  
Article
sEMG-Based Continuous Hand Action Prediction by Using Key State Transition and Model Pruning
by Kaikui Zheng, Shuai Liu, Jinxing Yang, Metwalli Al-Selwi and Jun Li
Sensors 2022, 22(24), 9949; https://doi.org/10.3390/s22249949 - 16 Dec 2022
Cited by 6 | Viewed by 2321
Abstract
Conventional classification of hand motions and continuous joint angle estimation based on sEMG have been widely studied in recent years. The classification task focuses on discrete motion recognition and shows poor real-time performance, while continuous joint angle estimation evaluates the real-time joint angles by the continuity of the limb. Few researchers have investigated continuous hand action prediction based on hand motion continuity. In our study, we propose the key state transition as a condition for continuous hand action prediction and simulate the prediction process using a sliding window with long-term memory. Firstly, the key state modeled by GMM-HMMs is set as the condition. Then, the sliding window is used to dynamically look for the key state transition. The prediction results are given while finding the key state transition. To extend continuous multigesture action prediction, we use model pruning to improve reusability. Eight subjects participated in the experiment, and the results show that the average accuracy of continuous two-hand actions is 97% with a 70 ms time delay, which is better than LSTM (94.15%, 308 ms) and GRU (93.83%, 300 ms). In supplementary experiments with continuous four-hand actions, over 85% prediction accuracy is achieved with an average time delay of 90 ms. Full article

44 pages, 1693 KiB  
Review
An Overview of Machine Learning within Embedded and Mobile Devices–Optimizations and Applications
by Taiwo Samuel Ajani, Agbotiname Lucky Imoize and Aderemi A. Atayero
Sensors 2021, 21(13), 4412; https://doi.org/10.3390/s21134412 - 28 Jun 2021
Cited by 130 | Viewed by 21380
Abstract
Embedded systems technology is undergoing a phase of transformation owing to the novel advancements in computer architecture and the breakthroughs in machine learning applications. The areas of applications of embedded machine learning (EML) include accurate computer vision schemes, reliable speech recognition, innovative healthcare, robotics, and more. However, there exists a critical drawback in the efficient implementation of ML algorithms targeting embedded applications. Machine learning algorithms are generally computationally and memory intensive, making them unsuitable for resource-constrained environments such as embedded and mobile devices. In order to efficiently implement these compute and memory-intensive algorithms within the embedded and mobile computing space, innovative optimization techniques are required at the algorithm and hardware levels. To this end, this survey aims at exploring current research trends within this circumference. First, we present a brief overview of compute intensive machine learning algorithms such as hidden Markov models (HMM), k-nearest neighbors (k-NNs), support vector machines (SVMs), Gaussian mixture models (GMMs), and deep neural networks (DNNs). Furthermore, we consider different optimization techniques currently adopted to squeeze these computational and memory-intensive algorithms within resource-limited embedded and mobile environments. Additionally, we discuss the implementation of these algorithms in microcontroller units, mobile devices, and hardware accelerators. Conclusively, we give a comprehensive overview of key application areas of EML technology, point out key research directions and highlight key take-away lessons for future research exploration in the embedded machine learning domain. Full article
(This article belongs to the Special Issue Embedded Systems and Internet of Things)

14 pages, 572 KiB  
Article
Securing the Insecure: A First-Line-of-Defense for Body-Centric Nanoscale Communication Systems Operating in THz Band
by Waqas Aman, Muhammad Mahboob Ur Rahman, Hasan T. Abbas, Muhammad Arslan Khalid, Muhammad A. Imran, Akram Alomainy and Qammer H. Abbasi
Sensors 2021, 21(10), 3534; https://doi.org/10.3390/s21103534 - 19 May 2021
Cited by 5 | Viewed by 3796
Abstract
This manuscript presents a novel physical-layer mechanism for authentication and transmitter identification in a body-centric nanoscale communication system operating in the terahertz (THz) band. The unique characteristics of the propagation medium in the THz band render the existing techniques (say, for impersonation detection in cellular networks) inapplicable. In this work, we considered a body-centric network with multiple on-body nano-sensor nodes (some of which have been compromised) that communicate their sensed data to a nearby gateway node. We proposed to protect the transmissions on the link between the legitimate nano-sensor nodes and the gateway by exploiting the path loss of the THz propagation medium as the fingerprint/feature of the sender node to carry out authentication at the gateway. Specifically, we proposed a two-step hypothesis testing mechanism at the gateway to counter impersonation (false data injection) attacks by malicious nano-sensors. To this end, we computed the path loss of the THz link under consideration using the high-resolution transmission molecular absorption (HITRAN) database. Furthermore, to refine the outcome of the two-step hypothesis testing device, we modeled the impersonation attack detection problem as a hidden Markov model (HMM), which was then solved by the classical Viterbi algorithm. As a by-product of the authentication problem, we performed transmitter identification (when the two-step hypothesis testing device decides no impersonation) using (i) the maximum likelihood (ML) method and (ii) the Gaussian mixture model (GMM), whose parameters are learned via the expectation–maximization algorithm. Our simulation results showed that the two error probabilities (missed detection and false alarm) were decreasing functions of the signal-to-noise ratio (SNR). Specifically, at an SNR of 10 dB with a pre-specified false alarm rate of 0.2, the probability of correct detection was almost one. We further noticed that the HMM method outperformed the two-step hypothesis testing method at low SNRs (e.g., a 10% increase in accuracy was recorded at SNR = −5 dB), as expected. Finally, it was observed that the GMM method was useful when the ground truths (the true path loss values for all the legitimate THz links) were noisy. Full article
(This article belongs to the Special Issue Body-Centric Sensors for the Internet of Things)
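
The abstract above refines a sequence of noisy per-slot decisions by modeling them as an HMM and decoding with the Viterbi algorithm. A minimal Python sketch of that idea follows. The two-state model, every probability value, and the function name `viterbi` are illustrative assumptions, not parameters or code from the paper.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM (log domain)."""
    n_states = len(start_p)
    # score[s] = best log-probability of any path that ends in state s
    score = [math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + math.log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
        back.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical numbers: state 0 = legitimate sender, state 1 = impersonator.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]  # assumed: states tend to persist
emit_p = [[0.8, 0.2], [0.2, 0.8]]   # assumed: per-slot test is right 80% of the time
decisions = [0, 0, 1, 0, 0]         # raw per-slot decisions with one isolated flip
smoothed = viterbi(decisions, start_p, trans_p, emit_p)
```

With these assumed parameters the isolated flip in the third slot is smoothed away and the decoded path is `[0, 0, 0, 0, 0]`, which illustrates why sequence decoding can help when individual decisions are unreliable.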

17 pages, 1397 KiB  
Article
Development of Speech Recognition Systems in Emergency Call Centers
by Alakbar Valizada, Natavan Akhundova and Samir Rustamov
Symmetry 2021, 13(4), 634; https://doi.org/10.3390/sym13040634 - 9 Apr 2021
Cited by 19 | Viewed by 5687
Abstract
In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the fact that dialogue speech in call centers has specific context and noisy, emotional environments, available speech recognition systems show poor performance. Therefore, in order to accurately recognize dialogue speeches, the main modules of speech recognition systems—language models and acoustic training methodologies—as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction are compared with common labeling methods resulting in outperforming the other methods with a notable percentage. Based on the results of the experiments, we determined that DNN/HMM for an acoustic model, trigram with Kneser–Ney discounting for a language model and using spelling correction before training data for a labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. 
Hence, it is anticipated that suggested approaches can be applied to the other languages of the same group. Full article
(This article belongs to the Section Computer)

13 pages, 2232 KiB  
Article
A Hybrid Hidden Markov Model for Pipeline Leakage Detection
by Mingchi Zhang, Xuemin Chen and Wei Li
Appl. Sci. 2021, 11(7), 3138; https://doi.org/10.3390/app11073138 - 1 Apr 2021
Cited by 12 | Viewed by 3898
Abstract
In this paper, a deep neural network hidden Markov model (DNN-HMM) is proposed to detect the location of pipeline leakage. A long pipeline is divided into several sections, and a leak occurring in a given section is defined as a distinct state of the hidden Markov model (HMM). The hybrid HMM, i.e., DNN-HMM, consists of a deep neural network (DNN) with multiple layers to exploit the non-linear data. The DNN is initialized by using a deep belief network (DBN). The DBN is a pre-trained model built by stacking top-down restricted Boltzmann machines (RBMs) that compute the emission probabilities for the HMM in place of a Gaussian mixture model (GMM). Two comparative studies with different numbers of states are performed using the Gaussian mixture model-hidden Markov model (GMM-HMM) and DNN-HMM. The agreement between the detected state sequence and the actual state sequence on the test data is measured by the micro F1 score. When the pipeline is divided into three sections, the micro F1 score approaches 0.94 for the GMM-HMM method and is close to 0.95 for the DNN-HMM method. When the pipeline is divided into five sections, the micro F1 score for GMM-HMM is 0.69, while it approaches 0.96 with the DNN-HMM method. The results demonstrate that the DNN-HMM can learn a better model of non-linear data and achieve better performance than the GMM-HMM method. Full article
(This article belongs to the Special Issue Nondestructive Testing (NDT): Volume II)
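
The micro F1 score reported in the abstract above pools true positives, false positives, and false negatives over all states before computing precision and recall. A minimal sketch of that computation, assuming a hypothetical helper name `micro_f1` and made-up state sequences (none of this is data from the paper):

```python
def micro_f1(actual, detected):
    """Micro-averaged F1 between two equal-length state sequences."""
    if len(actual) != len(detected):
        raise ValueError("sequences must have the same length")
    labels = set(actual) | set(detected)
    tp = fp = fn = 0
    # Pool the counts over every state label before computing the ratios
    for label in labels:
        for a, d in zip(actual, detected):
            if d == label and a == label:
                tp += 1
            elif d == label:
                fp += 1
            elif a == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up example: three pipeline sections encoded as states 0, 1, 2.
actual = [0, 0, 1, 1, 2, 2, 2, 1]
detected = [0, 0, 1, 2, 2, 2, 1, 1]
score = micro_f1(actual, detected)
```

When every sample carries exactly one state label, as here, each misclassification counts once as a false positive and once as a false negative, so micro F1 coincides with plain accuracy (6 of 8 correct, i.e. 0.75 in this example).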

26 pages, 561 KiB  
Review
An Overview of End-to-End Automatic Speech Recognition
by Dong Wang, Xiaodong Wang and Shaohe Lv
Symmetry 2019, 11(8), 1018; https://doi.org/10.3390/sym11081018 - 7 Aug 2019
Cited by 213 | Viewed by 26244
Abstract
Automatic speech recognition, especially large-vocabulary continuous speech recognition, is an important problem in machine learning. For a long time, the hidden Markov model (HMM)-Gaussian mixture model (GMM) was the mainstream speech recognition framework. Recently, however, the HMM-deep neural network (DNN) model and the end-to-end model, both based on deep learning, have achieved performance beyond HMM-GMM, and the two have comparable performance. The HMM-DNN model is nevertheless limited by various unfavorable factors inherited from the HMM, such as forced segmentation alignment of data, independence assumptions, and separate training of multiple modules, while the end-to-end model offers a simplified architecture, joint training, direct output, and no need for forced data alignment. Therefore, the end-to-end model is an important research direction in speech recognition. This paper reviews the development of end-to-end models. It first introduces the basic ideas, advantages and disadvantages of HMM-based and end-to-end models, and argues that the end-to-end model is the development direction of speech recognition. It then focuses on the principles, progress and research hotspots of three different end-to-end models, namely connectionist temporal classification (CTC)-based, recurrent neural network (RNN)-transducer and attention-based models, and compares them in detail both theoretically and experimentally. Finally, their respective advantages and disadvantages and the possible future development of the end-to-end model are pointed out. Automatic speech recognition is a pattern recognition task in the field of computer science, which is a subject area of Symmetry. Full article

17 pages, 2799 KiB  
Article
An Unsupervised Classification Method for Flame Image of Pulverized Coal Combustion Based on Convolutional Auto-Encoder and Hidden Markov Model
by Tian Qiu, Minjian Liu, Guiping Zhou, Li Wang and Kai Gao
Energies 2019, 12(13), 2585; https://doi.org/10.3390/en12132585 - 4 Jul 2019
Cited by 49 | Viewed by 3887
Abstract
Combustion condition monitoring is a fundamental and critical issue that needs to be addressed in the wide-load operation of coal-fired boilers. In this paper, an unsupervised classification framework based on the convolutional auto-encoder (CAE), principal component analysis (PCA), and the hidden Markov model (HMM) is proposed to monitor the combustion condition using uniformly spaced flame images collected from the furnace combustion monitoring system. First, the CAE is used to extract features from the flame images, yielding sparse representations of the images. Then, PCA is applied to project the feature vectors into an orthogonal space for robustness and computational efficiency. Finally, an HMM is built to calculate the corresponding optimal states by learning the temporal behavior of the compressed representations. A coal combustion adjustment experiment was conducted in a 660 MW opposed-firing boiler, and 14,400 sequential flame images covering three different combustion states were obtained to evaluate the effectiveness of the proposed approach. We tested six different compression dimensions of the latent variable z in the CAE model and determined that the appropriate compression dimension was 1024. The proposed framework is compared with five other methods: CAE + Gaussian mixture model (GMM), CAE + k-means, CAE + fuzzy c-means, CAE + HMM, and traditional handcrafted feature extraction (TH) + HMM. The results show that the proposed framework has the highest classification accuracy (95.25% for the training samples and 97.36% for the testing samples) and the best performance in recognizing the semi-stable state (85.67% for the training samples and 77.60% for the testing samples), indicating that it is capable of identifying the change in combustion condition when combustion deteriorates as the coal feed rate falls.

21 pages, 4716 KiB  
Article
Convolutional Recurrent Neural Network-Based Event Detection in Tunnels Using Multiple Microphones
by Nam Kyun Kim, Kwang Myung Jeon and Hong Kook Kim
Sensors 2019, 19(12), 2695; https://doi.org/10.3390/s19122695 - 14 Jun 2019
Cited by 14 | Viewed by 5488
Abstract
This paper proposes a sound event detection (SED) method for tunnels to prevent further uncontrollable accidents. Tunnel accidents are accompanied by crashes and tire skids, which usually produce abnormal sounds. Since the tunnel environment always has a severe level of noise, the detection accuracy of existing methods can be greatly reduced. To deal with the noise issue in the tunnel environment, the proposed method involves the preprocessing of tunnel acoustic signals and a classifier for detecting acoustic events in tunnels. For preprocessing, a non-negative tensor factorization (NTF) technique is used to separate the acoustic event signal from the noisy signal in the tunnel. In particular, the NTF technique developed in this paper consists of source separation and online noise learning. In other words, the noise basis is adapted by an online noise learning technique for enhancement in adverse noise conditions. Next, a convolutional recurrent neural network (CRNN) is extended to accommodate the contributions of the separated event signal and noise to the event detection; thus, the proposed CRNN is composed of event convolution layers and noise convolution layers in parallel, followed by recurrent layers and the output layer. Here, a set of mel-filterbank feature parameters is used as the input features. Evaluations of the proposed method are conducted on two datasets: a publicly available road audio events dataset and a tunnel audio dataset recorded in a real traffic tunnel for six months. In the first evaluation, where the background noise is low, the proposed CRNN-based SED method with online noise learning reduces the relative recognition error rate by 56.25% when compared to the conventional CRNN-based method with noise. In the second evaluation, where the tunnel background noise is more severe than in the first evaluation, the proposed CRNN-based SED method yields superior performance when compared to the conventional methods. In particular, it is shown that among all of the compared methods, the proposed method with online noise learning provides the best recognition rate of 91.07% and reduces the recognition error rates by 47.40% and 28.56% when compared to the Gaussian mixture model (GMM)–hidden Markov model (HMM)-based and conventional CRNN-based SED methods, respectively. The computational complexity measurements also show that the proposed CRNN-based SED method requires a processing time of 599 ms for both the NTF-based source separation with online noise learning and CRNN classification when the noisy tunnel signal is one second long, which implies that the proposed method detects events in real time.
(This article belongs to the Special Issue Sensors In Target Detection)
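
The relative error-rate reductions quoted in the abstract above can be read with a small piece of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the recognition error rate is simply 100% minus the recognition rate, which the abstract does not state explicitly. The baseline error rates it computes are therefore implied values under that assumption, not figures reported by the authors.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction, in percent, of new_error with respect to baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


def implied_baseline_error(new_error: float, reduction_pct: float) -> float:
    """Baseline error rate that would give `reduction_pct` for the given `new_error`."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return new_error / (1.0 - reduction_pct / 100.0)


# Assumption: error rate = 100% - recognition rate (91.07% is quoted in the abstract).
proposed_error = 100.0 - 91.07

# Implied (not reported) baseline error rates for the two quoted reductions.
gmm_hmm_error = implied_baseline_error(proposed_error, 47.40)
crnn_error = implied_baseline_error(proposed_error, 28.56)

print(f"proposed error rate:        {proposed_error:.2f}%")
print(f"implied GMM-HMM error rate: {gmm_hmm_error:.2f}%")
print(f"implied CRNN error rate:    {crnn_error:.2f}%")
```

Under the stated assumption, the quoted reductions correspond to baseline error rates of roughly 17% for the GMM-HMM-based method and 12.5% for the conventional CRNN-based method.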