Search Results (24)

Search Parameters:
Keywords = GMM-HMM

46 pages, 7346 KB  
Review
Integrating Speech Recognition into Intelligent Information Systems: From Statistical Models to Deep Learning
by Chaoji Wu, Yi Pan, Haipan Wu and Lei Ning
Informatics 2025, 12(4), 107; https://doi.org/10.3390/informatics12040107 - 4 Oct 2025
Viewed by 1756
Abstract
Automatic speech recognition (ASR) has advanced rapidly, evolving from early template-matching systems to modern deep learning frameworks. This review systematically traces ASR’s technological evolution across four phases: the template-based era, statistical modeling approaches, the deep learning revolution, and the emergence of large-scale models under diverse learning paradigms. We analyze core technologies such as hidden Markov models (HMMs), Gaussian mixture models (GMMs), recurrent neural networks (RNNs), and recent architectures including Transformer-based models and Wav2Vec 2.0. Beyond algorithmic development, we examine how ASR integrates into intelligent information systems, analyzing real-world applications in healthcare, education, smart homes, enterprise systems, and automotive domains with attention to deployment considerations and system design. We also address persistent challenges (noise robustness, low-resource adaptation, and deployment efficiency) while exploring emerging solutions such as multimodal fusion, privacy-preserving modeling, and lightweight architectures. Finally, we outline future research directions to guide the development of robust, scalable, and intelligent ASR systems for complex, evolving environments.
(This article belongs to the Section Machine Learning)

29 pages, 5817 KB  
Article
Unsupervised Segmentation and Alignment of Multi-Demonstration Trajectories via Multi-Feature Saliency and Duration-Explicit HSMMs
by Tianci Gao, Konstantin A. Neusypin, Dmitry D. Dmitriev, Bo Yang and Shengren Rao
Mathematics 2025, 13(19), 3057; https://doi.org/10.3390/math13193057 - 23 Sep 2025
Viewed by 582
Abstract
Learning from demonstration with multiple executions must contend with time warping, sensor noise, and alternating quasi-stationary and transition phases. We propose a label-free pipeline that couples unsupervised segmentation, duration-explicit alignment, and probabilistic encoding. A dimensionless multi-feature saliency (velocity, acceleration, curvature, direction-change rate) yields scale-robust keyframes via persistent peak–valley pairs and non-maximum suppression. A hidden semi-Markov model (HSMM) with explicit duration distributions is jointly trained across demonstrations to align trajectories on a shared semantic time base. Segment-level probabilistic motion models (GMM/GMR or ProMP, optionally combined with DMP) produce mean trajectories with calibrated covariances, directly interfacing with constrained planners. Feature weights are tuned without labels by minimizing cross-demonstration structural dispersion on the simplex via CMA-ES. Across UAV flight, autonomous driving, and robotic manipulation, the method reduces phase-boundary dispersion by 31% on UAV-Sim and by 30–36% under monotone time warps, noise, and missing data (vs. HMM); improves the sparsity–fidelity trade-off (higher time compression at comparable reconstruction error) with lower jerk; and attains nominal 2σ coverage (94–96%), indicating well-calibrated uncertainty. Ablations attribute the gains to persistence plus NMS, weight self-calibration, and duration-explicit alignment. The framework is scale-aware and computationally practical, and its uncertainty outputs feed directly into MPC/OMPL for risk-aware execution.
(This article belongs to the Section E1: Mathematics and Computer Science)
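The abstract above reports "nominal 2σ coverage (94–96%)". As a rough illustration of what that check could look like, the sketch below computes the theoretical coverage of a ±kσ band under a Gaussian assumption and the empirical fraction of observations that fall inside predicted bands. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def nominal_coverage(k):
    """Probability that a Gaussian variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def empirical_coverage(targets, means, stds, k=2.0):
    """Fraction of targets that fall inside the predicted mean +/- k*std band."""
    inside = sum(abs(y - m) <= k * s for y, m, s in zip(targets, means, stds))
    return inside / len(targets)

# Toy data (assumed for illustration only): observed values, predicted means, predicted stds.
targets = [1.0, 2.1, 2.9, 4.5]
means = [1.1, 2.0, 3.0, 4.0]
stds = [0.1, 0.1, 0.1, 0.2]

expected = nominal_coverage(2.0)                      # about 0.9545 for a Gaussian
observed = empirical_coverage(targets, means, stds)   # 3 of 4 points fall inside the band
```

An empirical value close to the nominal one suggests that the predicted covariances are neither too narrow nor too wide.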

29 pages, 2766 KB  
Article
Sound-Based Detection of Slip and Trip Incidents Among Construction Workers Using Machine and Deep Learning
by Fangxin Li, Francis Xavier Duorinaah, Min-Koo Kim, Julian Thedja, JoonOh Seo and Dong-Eun Lee
Buildings 2025, 15(17), 3136; https://doi.org/10.3390/buildings15173136 - 1 Sep 2025
Viewed by 722
Abstract
Unsafe events such as slips and trips occur regularly on construction sites. Efficient identification of these events can help protect workers from accidents and improve site safety. However, current detection methods rely on subjective reporting, which has several limitations. To address these limitations, this study presents a sound-based slip and trip classification method using wearable sound sensors and machine learning. Audio signals were recorded using a smartwatch during simulated slip and trip events. Various 1D and 2D features were extracted from the processed audio signals and used to train several classifiers. Three key findings are as follows: (1) The hybrid CNN-LSTM network achieved the highest classification accuracy of 0.966 with 2D MFCC features, while GMM-HMM achieved the highest accuracy of 0.918 with 1D sound features. (2) 1D MFCC features achieved an accuracy of 0.867, outperforming time- and frequency-domain 1D features. (3) MFCC images were the best 2D features for slip and trip classification. This study presents an objective method for detecting slip and trip events, thereby providing a complementary approach to manual assessments. Practically, the findings serve as a foundation for developing automated near-miss detection systems, identification of workers constantly vulnerable to unsafe events, and detection of unsafe and hazardous areas on construction sites.
(This article belongs to the Section Construction Management, and Computers & Digitization)

21 pages, 2624 KB  
Article
GMM-HMM-Based Eye Movement Classification for Efficient and Intuitive Dynamic Human–Computer Interaction Systems
by Jiacheng Xie, Rongfeng Chen, Ziming Liu, Jiahao Zhou, Juan Hou and Zengxiang Zhou
J. Eye Mov. Res. 2025, 18(4), 28; https://doi.org/10.3390/jemr18040028 - 9 Jul 2025
Cited by 1 | Viewed by 798
Abstract
Human–computer interaction (HCI) plays a crucial role across various fields, with eye-tracking technology emerging as a key enabler for intuitive and dynamic control in assistive systems like Assistive Robotic Arms (ARAs). By precisely tracking eye movements, this technology allows for more natural user interaction. However, current systems primarily rely on the single gaze-dependent interaction method, which leads to the “Midas Touch” problem. This highlights the need for real-time eye movement classification in dynamic interactions to ensure accurate and efficient control. This paper proposes a novel Gaussian Mixture Model–Hidden Markov Model (GMM-HMM) classification algorithm aimed at overcoming the limitations of traditional methods in dynamic human–robot interactions. By incorporating sum of squared error (SSE)-based feature extraction and hierarchical training, the proposed algorithm achieves a classification accuracy of 94.39%, significantly outperforming existing approaches. Furthermore, it is integrated with a robotic arm system, enabling gaze trajectory-based dynamic path planning, which reduces the average path planning time to 2.97 milliseconds. The experimental results demonstrate the effectiveness of this approach, offering an efficient and intuitive solution for human–robot interaction in dynamic environments. This work provides a robust framework for future assistive robotic systems, improving interaction intuitiveness and efficiency in complex real-world scenarios.

20 pages, 343 KB  
Article
Mathematical Modeling and Parameter Estimation of Lane-Changing Vehicle Behavior Decisions
by Jianghui Wen, Yebei Xu, Min Dai and Nengchao Lyu
Mathematics 2025, 13(6), 1014; https://doi.org/10.3390/math13061014 - 20 Mar 2025
Viewed by 775
Abstract
Lane changing is a crucial scenario in traffic environments, and accurately recognizing and predicting lane-changing behavior is essential for ensuring the safety of both autonomous vehicles and drivers. By considering the multi-vehicle interaction characteristics of lane-changing behavior and the impact of drivers' experience needs on lane-changing decisions, this paper proposes a lane-changing model for vehicles to achieve safe and comfortable driving. Firstly, a lane-changing intention recognition model incorporating interaction effects was established to obtain the initial lane-changing intention probability of the vehicles. Secondly, by accounting for individual driving styles, a lane-changing behavior decision model was constructed based on a Gaussian mixture hidden Markov model (GMM-HMM) along with a parameter estimation method. The initial lane-changing intention probability serves as the input for the decision model, and the final lane-changing decision is made by comparing the probabilities of lane-changing and non-lane-changing scenarios. Finally, the model was validated using real-world data from the Next Generation Simulation (NGSIM) dataset, with empirical results demonstrating its high accuracy in recognizing and predicting lane-changing behavior. This study provides a robust framework for enhancing lane-changing decision making in complex traffic environments.
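The abstract above describes making the final decision "by comparing the probabilities of lane-changing and non-lane-changing scenarios". One common way to do this with GMM-HMMs is to score the same observation sequence under one model per scenario with the forward algorithm and pick the higher log-likelihood. The sketch below is a minimal illustration of that idea only: the one-dimensional "lateral velocity" feature, the two-state models, and every parameter value are assumptions made for the example, not the paper's model or its estimated parameters.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def gmm_log_pdf(x, components):
    """Log-density of a 1-D Gaussian mixture given (weight, mean, variance) triples."""
    return logsumexp([math.log(w) + log_gauss(x, mu, var) for w, mu, var in components])

def forward_log_likelihood(obs, start, trans, emissions):
    """Log-likelihood of obs under a GMM-HMM, computed with the forward algorithm in log space."""
    n = len(start)
    alpha = [math.log(start[i]) + gmm_log_pdf(obs[0], emissions[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + gmm_log_pdf(x, emissions[j])
            for j in range(n)
        ]
    return logsumexp(alpha)

# Hypothetical two-state models over a lateral-velocity feature (m/s); all values are made up.
MODELS = {
    "lane_keep": {
        "start": [0.5, 0.5],
        "trans": [[0.9, 0.1], [0.1, 0.9]],
        "emissions": [[(1.0, 0.0, 0.01)], [(0.5, -0.05, 0.02), (0.5, 0.05, 0.02)]],
    },
    "lane_change": {
        "start": [0.9, 0.1],
        "trans": [[0.7, 0.3], [0.05, 0.95]],
        "emissions": [[(1.0, 0.1, 0.02)], [(0.5, 0.5, 0.02), (0.5, 0.8, 0.02)]],
    },
}

def decide(obs):
    """Return the scenario whose model assigns the sequence the higher log-likelihood."""
    scores = {name: forward_log_likelihood(obs, **params) for name, params in MODELS.items()}
    return max(scores, key=scores.get)

decision = decide([0.05, 0.1, 0.4, 0.6, 0.7, 0.5])  # growing lateral velocity
```

Working in log space avoids numerical underflow on long sequences; transition probabilities are kept strictly positive here so that `math.log` is always defined.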

16 pages, 4341 KB  
Article
Research on Pig Sound Recognition Based on Deep Neural Network and Hidden Markov Models
by Weihao Pan, Hualong Li, Xiaobo Zhou, Jun Jiao, Cheng Zhu and Qiang Zhang
Sensors 2024, 24(4), 1269; https://doi.org/10.3390/s24041269 - 16 Feb 2024
Cited by 13 | Viewed by 3635
Abstract
To address the low recognition accuracy of traditional pig sound recognition methods, this study uses deep neural network (DNN) and Hidden Markov Model (HMM) theory as the basis of pig sound signal recognition. The sounds made by 10 landrace pigs during eating, estrus, howling, humming and panting were collected and preprocessed by Kalman filtering and an improved endpoint detection algorithm based on empirical mode decomposition-Teager energy operator (EMD-TEO) cepstral distance. The extracted 39-dimensional mel-frequency cepstral coefficients (MFCCs) were then used as a dataset for network learning and recognition to build a DNN- and HMM-based sound recognition model for pig states. The results show that in the pig sound dataset, the recognition accuracy of DNN-HMM reaches 83%, which is 22% and 17% higher than that of the baseline models HMM and GMM-HMM, respectively. In a sub-dataset of the publicly available dataset AudioSet, DNN-HMM achieves a recognition accuracy of 79%, which is 8% and 4% higher than the classical models SVM and ResNet18, respectively, with better robustness.
(This article belongs to the Section Intelligent Sensors)

15 pages, 2973 KB  
Article
An Efficient Method for the Reliability Evaluation of Power Systems Considering the Variable Photovoltaic Power Output
by Haojie He, Liweiyong Guo, Peidong Han, Changzheng Shao and Tan Xu
Appl. Sci. 2023, 13(16), 9053; https://doi.org/10.3390/app13169053 - 8 Aug 2023
Cited by 1 | Viewed by 1989
Abstract
The operational reliability of power systems is threatened by the random failure of components and the uncertain power output of renewable energies, such as photovoltaics (PV). Under such circumstances, reliability evaluation is necessary for maintaining a continuous and stable energy supply. However, traditional reliability evaluation methods are usually extremely time-consuming, considering the numerous system states that need to be analysed. Hence, the reliability evaluation process cannot keep up with the dynamic changes in PV output, which undermines the timeliness of the evaluation. This paper proposes an efficient reliability evaluation method for power systems with PV integration. The method reveals the analytical relationship between the reliability levels of the power system and the uncertainty factors that influence the reliability, such as the PV output. In this way, dynamic reliability evaluation is achieved, and the evaluation results can be updated promptly when the PV output changes. First, a Gaussian mixture-hidden Markov model (GMM-HMM) is used to model the distribution characteristics of PV output. Then, the state enumeration and the hyperbolic truncated polynomial chaos expansion method are used to determine the analytical relationship between the reliability indices and PV output. Lastly, based on the analytical function, the operational reliability of the power systems is dynamically evaluated considering the real-time PV output. The effectiveness of the proposed method is verified using the modified IEEE 30 system as an example.

19 pages, 6405 KB  
Article
Combined Bidirectional Long Short-Term Memory with Mel-Frequency Cepstral Coefficients Using Autoencoder for Speaker Recognition
by Young-Long Chen, Neng-Chung Wang, Jing-Fong Ciou and Rui-Qi Lin
Appl. Sci. 2023, 13(12), 7008; https://doi.org/10.3390/app13127008 - 10 Jun 2023
Cited by 13 | Viewed by 2714
Abstract
Recently, neural network technology has shown remarkable progress in speech recognition, including word classification, emotion recognition, and identity recognition. This paper introduces three novel speaker recognition methods to improve accuracy. The first method, called long short-term memory with mel-frequency cepstral coefficients for triplet loss (LSTM-MFCC-TL), utilizes MFCC as input features for the LSTM model and incorporates triplet loss and cluster training for effective training. The second method, bidirectional long short-term memory with mel-frequency cepstral coefficients for triplet loss (BLSTM-MFCC-TL), enhances speaker recognition accuracy by employing a bidirectional LSTM model. The third method, bidirectional long short-term memory with mel-frequency cepstral coefficients and autoencoder features for triplet loss (BLSTM-MFCCAE-TL), utilizes an autoencoder to extract additional AE features, which are then concatenated with MFCC and fed into the BLSTM model. The results showed that the performance of the BLSTM model was superior to the LSTM model, and the method of adding AE features achieved the best learning effect. Moreover, the proposed methods exhibit faster computation times compared to the reference GMM-HMM model. Therefore, utilizing pre-trained autoencoders for speaker encoding and obtaining AE features can significantly enhance the learning performance of speaker recognition. Additionally, it also offers faster computation time compared to traditional methods.

22 pages, 3221 KB  
Article
An HMM-DNN-Based System for the Detection and Classification of Low-Frequency Acoustic Signals from Baleen Whales, Earthquakes, and Air Guns off Chile
by Susannah J. Buchan, Miguel Duran, Constanza Rojas, Jorge Wuth, Rodrigo Mahu, Kathleen M. Stafford and Nestor Becerra Yoma
Remote Sens. 2023, 15(10), 2554; https://doi.org/10.3390/rs15102554 - 13 May 2023
Cited by 6 | Viewed by 3792
Abstract
Marine passive acoustic monitoring can be used to study biological, geophysical, and anthropogenic phenomena in the ocean. The wide range of characteristics from geophysical, biological, and anthropogenic sound sources makes the simultaneous automatic detection and classification of these sounds a significant challenge. Here, we propose a single Hidden Markov Model-based system with a Deep Neural Network (HMM-DNN) for the detection and classification of low-frequency biological (baleen whales), geophysical (earthquakes), and anthropogenic (air guns) sounds. Acoustic data were obtained from the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization station off Juan Fernandez, Chile (station HA03) and annotated by an analyst (498 h of audio data containing 30,873 events from 19 different classes), and then divided into training (60%), testing (20%), and tuning (20%) subsets. Each audio frame was represented as an observation vector obtained through a filterbank-based spectral feature extraction procedure. The HMM-DNN training procedure was carried out discriminatively by setting HMM states as targets. A model with Gaussian Mixture Models and HMM (HMM-GMM) was trained to obtain an initial set of HMM target states. Feature transformation based on Linear Discriminant Analysis and Maximum Likelihood Linear Transform was also incorporated. The HMM-DNN system displayed good capacity for correctly detecting and classifying events, with high event-level accuracy (84.46%), high weighted average sensitivity (84.46%), and high weighted average precision (89.54%). Event-level accuracy increased with higher event signal-to-noise ratios. Event-level metrics per class also showed that our HMM-DNN system generalized well for most classes, but performance was best for classes that had a high number of training exemplars (e.g., generally above 50) and/or low variability in spectral features, duration, and energy levels. Fin whale and Antarctic blue whale song and air guns performed particularly well.
(This article belongs to the Section Ocean Remote Sensing)

21 pages, 659 KB  
Article
Extracting Statistical Properties of Solar and Photovoltaic Power Production for the Scope of Building a Sophisticated Forecasting Framework
by Joseph Ndong and Ted Soubdhan
Forecasting 2023, 5(1), 1-21; https://doi.org/10.3390/forecast5010001 - 29 Dec 2022
Cited by 1 | Viewed by 2480
Abstract
Building a sophisticated forecasting framework for solar and photovoltaic power production in geographic zones with severe meteorological conditions is very challenging. This difficulty is linked to the high variability of the global solar radiation on which the energy production depends. A suitable forecasting framework should take this high variability into account and be able to adjust and re-adjust model parameters to reduce sensitivity to estimation errors. The framework should also be able to re-adapt the model parameters whenever the atmospheric conditions change drastically or suddenly (such abrupt changes correspond to microscopic variations). This work presents a new methodology to carefully analyze the meaningful features of global solar radiation variability and extract relevant information about the probabilistic laws that govern its dynamic evolution. The work establishes a framework able to identify the macroscopic variations from the solar irradiance. The different categories of variability correspond to different levels of meteorological conditions and events and can occur in different time intervals. Thereafter, the tool will be able to extract the abrupt changes, corresponding to microscopic variations, inside each level of variability. The methodology is based on a combination of probability and possibility theory. An unsupervised clustering technique based on a Gaussian mixture model is proposed to identify, first, the categories of variability and, using a hidden Markov model, we study the temporal dependency of the process to identify the dynamic evolution of the solar irradiance as different temporal states. Finally, by means of some transformations of probabilities to possibilities, we identify the abrupt changes in the solar radiation. The study is performed in Guadeloupe, where we have a long record of global solar radiation data recorded at 1 Hz.
(This article belongs to the Collection Energy Forecasting)

15 pages, 1578 KB  
Article
sEMG-Based Continuous Hand Action Prediction by Using Key State Transition and Model Pruning
by Kaikui Zheng, Shuai Liu, Jinxing Yang, Metwalli Al-Selwi and Jun Li
Sensors 2022, 22(24), 9949; https://doi.org/10.3390/s22249949 - 16 Dec 2022
Cited by 6 | Viewed by 2535
Abstract
Conventional classification of hand motions and continuous joint angle estimation based on sEMG have been widely studied in recent years. The classification task focuses on discrete motion recognition and shows poor real-time performance, while continuous joint angle estimation evaluates the real-time joint angles by the continuity of the limb. Few researchers have investigated continuous hand action prediction based on hand motion continuity. In our study, we propose the key state transition as a condition for continuous hand action prediction and simulate the prediction process using a sliding window with long-term memory. Firstly, the key state modeled by GMM-HMMs is set as the condition. Then, the sliding window is used to dynamically look for the key state transition. The prediction results are given while finding the key state transition. To extend continuous multigesture action prediction, we use model pruning to improve reusability. Eight subjects participated in the experiment, and the results show that the average accuracy of continuous two-hand actions is 97% with a 70 ms time delay, which is better than LSTM (94.15%, 308 ms) and GRU (93.83%, 300 ms). In supplementary experiments with continuous four-hand actions, over 85% prediction accuracy is achieved with an average time delay of 90 ms.

44 pages, 1693 KB  
Review
An Overview of Machine Learning within Embedded and Mobile Devices–Optimizations and Applications
by Taiwo Samuel Ajani, Agbotiname Lucky Imoize and Aderemi A. Atayero
Sensors 2021, 21(13), 4412; https://doi.org/10.3390/s21134412 - 28 Jun 2021
Cited by 147 | Viewed by 22579
Abstract
Embedded systems technology is undergoing a phase of transformation owing to the novel advancements in computer architecture and the breakthroughs in machine learning applications. The areas of applications of embedded machine learning (EML) include accurate computer vision schemes, reliable speech recognition, innovative healthcare, robotics, and more. However, there exists a critical drawback in the efficient implementation of ML algorithms targeting embedded applications. Machine learning algorithms are generally computationally and memory intensive, making them unsuitable for resource-constrained environments such as embedded and mobile devices. In order to efficiently implement these compute- and memory-intensive algorithms within the embedded and mobile computing space, innovative optimization techniques are required at the algorithm and hardware levels. To this end, this survey aims at exploring current research trends within this scope. First, we present a brief overview of compute-intensive machine learning algorithms such as hidden Markov models (HMMs), k-nearest neighbors (k-NNs), support vector machines (SVMs), Gaussian mixture models (GMMs), and deep neural networks (DNNs). Furthermore, we consider different optimization techniques currently adopted to squeeze these computational and memory-intensive algorithms within resource-limited embedded and mobile environments. Additionally, we discuss the implementation of these algorithms in microcontroller units, mobile devices, and hardware accelerators. Finally, we give a comprehensive overview of key application areas of EML technology, point out key research directions and highlight key take-away lessons for future research exploration in the embedded machine learning domain.
(This article belongs to the Special Issue Embedded Systems and Internet of Things)

14 pages, 572 KB  
Article
Securing the Insecure: A First-Line-of-Defense for Body-Centric Nanoscale Communication Systems Operating in THz Band
by Waqas Aman, Muhammad Mahboob Ur Rahman, Hasan T. Abbas, Muhammad Arslan Khalid, Muhammad A. Imran, Akram Alomainy and Qammer H. Abbasi
Sensors 2021, 21(10), 3534; https://doi.org/10.3390/s21103534 - 19 May 2021
Cited by 5 | Viewed by 3983
Abstract
This manuscript presents a novel mechanism (at the physical layer) for authentication and transmitter identification in a body-centric nanoscale communication system operating in the terahertz (THz) band. The unique characteristics of the propagation medium in the THz band render the existing techniques (say, for impersonation detection in cellular networks) inapplicable. In this work, we considered a body-centric network with multiple on-body nano-sensor nodes (some of which have been compromised) that communicate their sensed data to a nearby gateway node. We proposed to protect the transmissions on the link between the legitimate nano-sensor nodes and the gateway by exploiting the path loss of the THz propagation medium as the fingerprint/feature of the sender node to carry out authentication at the gateway. Specifically, we proposed a two-step hypothesis testing mechanism at the gateway to counter the impersonation (false data injection) attacks by malicious nano-sensors. To this end, we computed the path loss of the THz link under consideration using the high-resolution transmission molecular absorption (HITRAN) database. Furthermore, to refine the outcome of the two-step hypothesis testing device, we modeled the impersonation attack detection problem as a hidden Markov model (HMM), which was then solved by the classical Viterbi algorithm. As a by-product of the authentication problem, we performed transmitter identification (when the two-step hypothesis testing device decides no impersonation) using (i) the maximum likelihood (ML) method and (ii) the Gaussian mixture model (GMM), whose parameters are learned via the expectation–maximization algorithm. Our simulation results showed that the two error probabilities (missed detection and false alarm) were decreasing functions of the signal-to-noise ratio (SNR). Specifically, at an SNR of 10 dB with a pre-specified false alarm rate of 0.2, the probability of correct detection was almost one. We further noticed that the HMM method outperformed the two-step hypothesis testing method at low SNRs (e.g., a 10% increase in accuracy was recorded at SNR = −5 dB), as expected. Finally, it was observed that the GMM method was useful when the ground truths (the true path loss values for all the legitimate THz links) were noisy.
(This article belongs to the Special Issue Body-Centric Sensors for the Internet of Things)
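The abstract above refines per-slot hypothesis-test outcomes by modeling them as an HMM "solved by the classical Viterbi algorithm". The sketch below shows standard Viterbi decoding in log space on a toy two-state problem. The state names, the "ok"/"alarm" observation symbols, and all probabilities are hypothetical values chosen for illustration; they are not the paper's model.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for a discrete-observation HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a (possibly nested) dict of positive probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}

# Hypothetical model: the sender in each slot is either legitimate or an impostor,
# and the gateway's per-slot test emits a noisy "ok" or "alarm" decision.
STATES = ["legit", "impostor"]
LOG_START = logs({"legit": 0.8, "impostor": 0.2})
LOG_TRANS = logs({"legit": {"legit": 0.9, "impostor": 0.1},
                  "impostor": {"legit": 0.1, "impostor": 0.9}})
LOG_EMIT = logs({"legit": {"ok": 0.8, "alarm": 0.2},
                 "impostor": {"ok": 0.3, "alarm": 0.7}})

# An isolated alarm is smoothed away because state changes are assumed to be rare.
path = viterbi(["ok", "ok", "alarm", "ok", "ok"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

With these assumed parameters a single alarm is treated as test noise, while a sustained run of alarms switches the decoded state to the impostor.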

17 pages, 1397 KB  
Article
Development of Speech Recognition Systems in Emergency Call Centers
by Alakbar Valizada, Natavan Akhundova and Samir Rustamov
Symmetry 2021, 13(4), 634; https://doi.org/10.3390/sym13040634 - 9 Apr 2021
Cited by 21 | Viewed by 6440
Abstract
In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers, were investigated and comparatively analyzed. Because dialogue speech in call centers has a specific context and occurs in noisy, emotional environments, available speech recognition systems perform poorly. Therefore, in order to accurately recognize dialogue speech, the main modules of speech recognition systems (language models and acoustic training methodologies) as well as symmetric data labeling approaches have been investigated and analyzed. To find an effective acoustic model for dialogue data, different types of Gaussian Mixture Model/Hidden Markov Model (GMM/HMM) and Deep Neural Network/Hidden Markov Model (DNN/HMM) methodologies were trained and compared. Additionally, effective language models for dialogue systems were defined based on extrinsic and intrinsic methods. Lastly, our suggested data labeling approaches with spelling correction were compared with common labeling methods and outperformed them by a notable margin. Based on the results of the experiments, we determined that DNN/HMM for the acoustic model, a trigram with Kneser–Ney discounting for the language model, and spelling correction before training for the labeling method are effective configurations for dialogue speech recognition in emergency call centers. It should be noted that this research was conducted with two different types of datasets collected from emergency calls: the Dialogue dataset (27 h), which encapsulates call agents’ speech, and the Summary dataset (53 h), which contains voiced summaries of those dialogues describing emergency cases. Even though the speech taken from the emergency call center is in the Azerbaijani language, which belongs to the Turkic group of languages, our approaches are not tightly connected to specific language features. Hence, it is anticipated that the suggested approaches can be applied to the other languages of the same group.
(This article belongs to the Section Computer)

13 pages, 2232 KB  
Article
A Hybrid Hidden Markov Model for Pipeline Leakage Detection
by Mingchi Zhang, Xuemin Chen and Wei Li
Appl. Sci. 2021, 11(7), 3138; https://doi.org/10.3390/app11073138 - 1 Apr 2021
Cited by 12 | Viewed by 4353
Abstract
In this paper, a deep neural network hidden Markov model (DNN-HMM) is proposed to detect pipeline leakage location. A long pipeline is divided into several sections, and a leak occurring in each section is defined as a distinct state of the hidden Markov model (HMM). The hybrid HMM, i.e., DNN-HMM, consists of a deep neural network (DNN) with multiple layers to exploit the non-linear data. The DNN is initialized by using a deep belief network (DBN). The DBN is a pre-trained model built by stacking top-down restricted Boltzmann machines (RBMs) that compute the emission probabilities for the HMM instead of a Gaussian mixture model (GMM). Two comparative studies based on different numbers of states using the Gaussian mixture model-hidden Markov model (GMM-HMM) and DNN-HMM are performed. The agreement between the detected state sequence and the actual state sequence on the test data is measured by the micro F1 score. The micro F1 score approaches 0.94 for the GMM-HMM method and is close to 0.95 for the DNN-HMM method when the pipeline is divided into three sections. In the experiment that divides the pipeline into five sections, the micro F1 score for GMM-HMM is 0.69, while it approaches 0.96 with the DNN-HMM method. The results demonstrate that the DNN-HMM can learn a better model of non-linear data and achieve better performance compared to the GMM-HMM method.
(This article belongs to the Special Issue Nondestructive Testing (NDT): Volume II)
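The abstract above scores detected state sequences against actual state sequences with the micro F1 score. As a minimal sketch of that metric, the code below pools true positives, false positives, and false negatives over all states before computing F1. The example sequences are invented for illustration, with integer labels standing in for pipeline sections.

```python
from collections import Counter

def micro_f1(actual, detected):
    """Micro-averaged F1 between two equal-length state sequences."""
    if len(actual) != len(detected):
        raise ValueError("sequences must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for a, d in zip(actual, detected):
        if a == d:
            tp[a] += 1
        else:
            fp[d] += 1  # state d was reported but was not the actual state
            fn[a] += 1  # actual state a was missed
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    denom = 2 * tp_sum + fp_sum + fn_sum
    return 2 * tp_sum / denom if denom else 0.0

# Invented example: three leak-location states (0, 1, 2) over ten time steps.
actual = [0, 0, 1, 1, 2, 2, 2, 1, 0, 0]
detected = [0, 0, 1, 2, 2, 2, 2, 1, 1, 0]
score = micro_f1(actual, detected)  # 8 matches, 2 mismatches
```

When every time step carries exactly one actual and one detected state, each mismatch counts as one false positive and one false negative, so micro F1 coincides with plain accuracy (0.8 in this example).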