Human Activity Recognition with an HMM-Based Generative Model

Human activity recognition (HAR) has become an important topic in healthcare, with applications in domains such as health monitoring, elderly care, and disease diagnosis. With the increasing adoption of smart devices, large amounts of activity data are generated in our daily lives. In this work, we propose unsupervised scaled Dirichlet-based hidden Markov models to analyze human activities. Our motivation is that human activities follow sequential patterns, and hidden Markov models (HMMs) are among the strongest statistical models for data with continuous flow. In this paper, we assume that the emission probabilities of the HMM follow a scaled Dirichlet distribution, which is a proper choice for modeling proportional data. To learn our model, we applied the variational inference approach. We used a publicly available dataset to evaluate the performance of our proposed model.

VI is an approximation technique that is more accurate than maximum likelihood (ML) and faster than fully Bayesian inference [90][91][92][93][94][95][96][97][98]. Compared to a deterministic method such as ML, it is less prone to convergence to poor local maxima and to over-fitting, and it is not as computationally complex as a fully Bayesian treatment. Finally, we evaluate our proposed model, SD-HMM, on publicly available human activity recognition datasets, comparing it with an HMM using GMM emission probabilities as a widely used alternative.
In this work, we propose an improved version of the hidden Markov model in which the emission probabilities arise from scaled Dirichlet mixture models. We then apply an elegant learning method, variational inference, to estimate the parameters. Finally, we evaluate the performance of our proposed model and compare it with similar alternatives in human activity recognition.
The paper is organized as follows: In Section 2, we discuss the HMM. In Section 3, we estimate the model parameters with variational inference. In Section 4, we present the results of evaluating our proposed model on human activity recognition. We conclude in Section 5.

Hidden Markov Model
In this section, we explain the first-order Markov chain. Consider a sequence of states or events. In a first-order Markov model, the future state is assumed to depend only on the current state; thus, an event at a particular point in time t depends only on the event at time t − 1. An HMM is expressed by the following parameters:
• Transition probability: the probability of moving from a state at time t to the same or another state at time step t + 1. The transition probabilities out of any given state sum to 1.
• Emission probability: the probability of an observation being generated from a particular state.
• Initial probability π: the probability of each state at time step 0, with which the HMM starts. These probabilities also sum to 1.
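These three parameter sets can be illustrated with a toy example (hypothetical values; note that the model in this paper uses continuous scaled Dirichlet emissions, whereas discrete emissions are shown here purely for illustration):

```python
import numpy as np

# Toy first-order HMM with K = 2 hidden states (hypothetical values).
pi = np.array([0.6, 0.4])            # initial probabilities at t = 0
B = np.array([[0.7, 0.3],            # B[i, j] = P(state j at t+1 | state i at t)
              [0.2, 0.8]])
# Discrete emissions over 3 observation symbols, for illustration only.
E = np.array([[0.5, 0.4, 0.1],       # E[j, o] = P(observation o | state j)
              [0.1, 0.3, 0.6]])

assert np.isclose(pi.sum(), 1.0)          # initial probabilities sum to 1
assert np.allclose(B.sum(axis=1), 1.0)    # transitions out of each state sum to 1
assert np.allclose(E.sum(axis=1), 1.0)    # each state's emission distribution sums to 1
```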
We express the HMM with parameters θ = {B, C, ϕ, π} using the following notation:
• A sequence of observations X = {X_1, . . . , X_T} generated by the hidden states.
• B: the transition matrix, and C: the mixing matrix of the state-dependent mixtures.
• M: the number of mixture components associated with state j.
• π_j: the initial probability of starting the sequence from state j.
ϕ includes the two parameters of the SD mixture model. To describe the modification of the conventional HMM, we first explain the SD distribution. Consider a D-dimensional observation X = (x_1, . . . , x_D) drawn from the SD distribution with two parameters α and β, where α = (α_1, . . . , α_D) and β = (β_1, . . . , β_D) are the shape and scale parameters, respectively. These two parameters provide more flexibility in modeling various shapes of data. Moreover, we assume that X is a proportional vector. To express the modification of HMMs, we represent the estimates of the states and mixture components, and of the local state sequence given the whole observation set, by the responsibilities γ_t(h_t, m_t) and ξ_t(h_t, h_{t+1}) for all t ∈ [1, T], which are computed by forward–backward procedures similar to those used for HMMs with Gaussian mixtures.
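For reference, the scaled Dirichlet density is commonly written in the following form (reproduced here from the standard definition in the compositional-data literature; the exact notation may differ from the paper's original equation):

```latex
\mathrm{SD}(\boldsymbol{X}\mid\boldsymbol{\alpha},\boldsymbol{\beta})
  = \frac{\Gamma(\alpha_{+})}{\prod_{d=1}^{D}\Gamma(\alpha_{d})}
    \,\frac{\prod_{d=1}^{D}\beta_{d}^{\alpha_{d}}\,x_{d}^{\alpha_{d}-1}}
           {\bigl(\sum_{d=1}^{D}\beta_{d}\,x_{d}\bigr)^{\alpha_{+}}},
  \qquad \alpha_{+}=\sum_{d=1}^{D}\alpha_{d},
```

defined on the simplex, i.e., for x_d ≥ 0 with Σ_d x_d = 1, which matches the assumption that X is a proportional vector. Setting all β_d = 1 recovers the ordinary Dirichlet distribution.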

Parameter Estimation with Variational Inference
In this section, we discuss a powerful learning approach, variational inference, which inherits the advantages of both deterministic and Bayesian inference. This technique gives more precise results than deterministic methods while being faster than fully Bayesian inference. The idea is to minimize the distance between the true posterior and an approximating distribution, measured by the Kullback–Leibler (KL) divergence [90]. Moreover, with this approximation scheme, we can simultaneously estimate the model's parameters and the optimal number of components. Here, we explain the steps of variational learning to update all parameters of the HMM: the transition distribution, the mixing matrix, and the shape and scale parameters of the emission distributions. As the first step, we need to define a prior distribution for each parameter. Given the model parameters, the likelihood of the sequential observations X is obtained by summing over S and L, the respective sets of hidden states and mixture components:

p(X | θ) = Σ_S Σ_L p(X, S, L | θ).

Since this quantity is not computable directly, we introduce a lower bound using an approximating distribution q(B, C, π, ϕ, S, L) of the true posterior p(B, C, π, ϕ, S, L | X). By Jensen's inequality, and since KL(q || p) ≥ 0 with equality exactly when q equals the true posterior, L(q) is a lower bound on ln p(X):

ln p(X) = L(q) + KL(q || p) ≥ L(q).

Since the true posterior is not tractable, we use mean-field theory to define a restricted family of factorized distributions,

q(B, C, π, ϕ, S, L) = q(B) q(C) q(π) q(α) q(β) q(S, L),

so that the lower bound decomposes as

L(q) = F(q(π)) + F(q(C)) + F(q(B)) + F(q(α)) + F(q(β)) + F(q(S, L)).

To define the priors for the HMM parameters, we choose the Dirichlet distribution for B, C, and π, considering that all of the coefficients of these parameters are strictly positive, less than one, and sum up to one.
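The identity ln p(X) = L(q) + KL(q || p) can be verified numerically on a toy model with a single binary latent variable (hypothetical probabilities; this is only a sanity check of the decomposition, not the SD-HMM itself):

```python
import numpy as np

# Tiny model: latent z in {0, 1}, one fixed observation x.
p_xz = np.array([0.3, 0.1])      # joint p(x, z=0), p(x, z=1) (hypothetical)
p_x = p_xz.sum()                  # evidence p(x)
post = p_xz / p_x                 # true posterior p(z | x)

q = np.array([0.5, 0.5])          # an arbitrary approximating distribution

elbo = np.sum(q * (np.log(p_xz) - np.log(q)))    # lower bound L(q)
kl = np.sum(q * (np.log(q) - np.log(post)))      # KL(q || p(z|x)) >= 0

# The decomposition ln p(x) = L(q) + KL(q || p) holds exactly:
assert np.isclose(np.log(p_x), elbo + kl)
assert kl >= 0.0
```

Maximizing the lower bound L(q) over q is therefore equivalent to minimizing the KL divergence to the intractable posterior, which is exactly what the variational updates below do.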
For α and β, we choose Gamma and Dirichlet priors, respectively, denoted G(·) and D(·), with h_j = (h_{j1}, . . . , h_{jD}); here u_{jd}, v_{jd}, and h_{jd} are positive hyperparameters. The optimized variational posteriors q(B), q(C), and q(π) are again Dirichlet; for example,

q(π) = D(π_1, . . . , π_K | w^π_1, . . . , w^π_K),

and q(α) and q(β) are updated through their hyperparameters. If X_pt is assigned to state i and mixture component j, then Z_pij = 1 and zero otherwise. To compute the responsibilities, we apply the typical forward–backward procedure [99], where Z_pij = Σ_{t=1}^{T} γ^C_{pijt} = p(s = i, m = j | X). In the expectation step, we keep the parameters estimated in the previous step fixed and compute the updated values, where Ψ(·) and ⟨·⟩ denote the digamma function and the expectation, respectively. Moreover, we update the shape and scale parameters as in [100]. We summarize the whole procedure of the SD-based HMM in Algorithm 1:

Algorithm 1: Variational learning of the SD-HMM model.

1. Initialize the shape and scale parameters of the SD distribution.
2. Define the initial responsibilities.
. . .
6. Compute the data likelihood.
7. Compute the responsibilities with the forward–backward procedure.
8. Update the hyperparameters of the shape and scale parameters.
9. Update B, C, and π using w^B, w^C, and w^π.
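The responsibility computation in step 7 uses the standard scaled forward–backward recursion. A minimal Python sketch is given below (the function name is ours; the per-state emission likelihoods are passed in precomputed, whereas in the SD-HMM they would come from the scaled Dirichlet mixtures):

```python
import numpy as np

def forward_backward(pi, B, emis):
    """Scaled forward-backward pass for an HMM.

    pi   : (K,)  initial state probabilities
    B    : (K,K) transition matrix, rows summing to 1
    emis : (T,K) emission likelihoods p(x_t | state k), precomputed
    Returns per-time state responsibilities gamma (T,K) and ln p(X).
    """
    T, K = emis.shape
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    c = np.zeros(T)                      # per-step scaling factors
    alpha[0] = pi * emis[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):                # forward recursion
        alpha[t] = (alpha[t - 1] @ B) * emis[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):       # backward recursion
        beta[t] = (B @ (emis[t + 1] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                 # responsibilities p(s_t = k | X)
    return gamma, np.log(c).sum()        # ln p(X) = sum_t ln c_t
```

The scaling by c_t keeps the recursion numerically stable for long sequences, and the returned log-likelihood is the quantity monitored for convergence in step 6.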

Experimental Results
In this part, we evaluate our model on publicly available datasets; here, we explain the procedure and the data. We propose a novel clustering algorithm and compare our model with other methods. Before applying our model to the datasets, we removed all labels, so no training/testing split was used. After predicting labels with our proposed clustering algorithm, we compared the actual and predicted labels. To assess the performance, we applied the following four metrics, where TP, TN, FP, and FN represent the total numbers of true positives, true negatives, false positives, and false negatives, respectively:

Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN), F1-score = 2 × Precision × Recall / (Precision + Recall).

Normalization of features: Considering the various ranges of the features, we applied min–max scaling, which shifts and re-scales the values to lie between zero and one:

x' = (x − x_min) / (x_max − x_min).

We tested our model on publicly available, real datasets: the Opportunity [101,102] and UCI HAR [103] databases.
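The scaling and evaluation steps above can be sketched as follows (standard formulas; the function names are ours, not from the paper):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) to [0, 1]: x' = (x - min) / (max - min)."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

Note that `min_max_scale` as written assumes no constant features (max = min would divide by zero); a full pipeline would guard against that case.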

Opportunity Dataset
In this dataset, data were collected by external and wearable sensors. Volunteers attached wearable sensors to their clothes and the rest were attached to objects or installed at points of interest. Through this setting, various activities of different levels were recognized. The sensors used were (1) body-worn sensors, including 7 inertial measurement units (IMUs), 12 3D acceleration sensors, and 4 3D coordinates from a localization system.
(2) Object sensors: 12 objects instrumented with wireless sensors measuring 3D acceleration and the 2D rate of turn. (3) Ambient sensors: 13 switches and 8 3D acceleration sensors in kitchen appliances and furniture. Data were collected from 4 individuals with 6 runs per user: 5 runs of activities of daily living and 1 "drill" run, which is a scripted sequence of activities. We analyzed four daily activities of the individuals: standing, walking, lying, and sitting, and tested our model on 2 runs of individual activities.
• Oversampling: Both datasets have 108 features and, as illustrated in Table 1, there are considerable inequalities in the distribution of instances per cluster. In the first run of the test, the percentages of the four activities are 59.7%, 17.4%, 19.9%, and 3% for standing, walking, lying, and sitting, respectively; for the second run, these shares are 41%, 23.8%, 5.1%, and 30%. As this imbalance results in frequency bias, the model may be dominated by the largest class and learn mostly from the clusters containing more observations. We tackled this challenge with oversampling using the synthetic minority over-sampling technique (SMOTE), in which new data points are generated by interpolating between observations in the original dataset, yielding a balanced dataset. After this step, we had an equal number of observations in the 4 clusters, with 22,380 and 10,379 instances in the first and second runs, respectively.
• Missing values: These are reported in Table 2 for the first and second runs, respectively. This is a typical issue, especially when working with real datasets. We replaced missing values with the median of each feature to minimize the effect of outliers.
As mentioned, we have 108 features. After tackling the above challenges, we tested our algorithm on these two subsets. In the following, SD-HMM, D-HMM, and GMM-HMM stand for the scaled Dirichlet-based, Dirichlet-based, and Gaussian-based HMMs, respectively.
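The interpolation idea behind SMOTE can be sketched as follows (a simplified illustration of the core mechanism only, with no edge-case handling; actual experiments would use a complete SMOTE implementation):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by linear interpolation
    between a randomly chosen minority sample and one of its k nearest
    neighbours (simplified sketch of SMOTE's core idea)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]     # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because each synthetic point lies on a segment between two existing minority samples, the generated data stay inside the region occupied by the original minority class rather than being drawn from an assumed parametric distribution.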

UCI Dataset
To collect the data, the activities of 30 individuals aged between 19 and 48 years were analyzed. Each volunteer performed the activities while wearing a smartphone on the waist. With the help of the embedded gyroscope and accelerometer, 3-axial angular velocity and 3-axial linear acceleration were captured. To label the data, a camera was used for manual annotation. As in the previous experiments, we analyzed the standing, walking, lying, and sitting activities of the individuals. This dataset includes 561 features.

Results and Discussion
In the experimental part, we validated the performance of our proposed algorithm on three real-world datasets and compared it with D-HMM, the model closest to ours, as well as with GMM-HMM, the most widely used alternative. For the Opportunity dataset, we focused on two subsets; the results are presented in Tables 3 and 4. As shown in Table 3, SD-HMM outperforms the alternatives, reaching 89.33%, 86.54%, 85.51%, and 86.02% in accuracy, precision, recall, and F1-score, respectively. Table 4 presents the outcomes on the second subset; similarly, SD-HMM achieves the best results with 87.12%, 87.28%, 85.44%, and 86.35% in accuracy, precision, recall, and F1-score, respectively. In the next part of our experiment, dedicated to the UCI dataset, Table 5 shows that SD-HMM behaves similarly to the two previous cases and provides the best results with 86.17%, 85.05%, 86.83%, and 85.93% in accuracy, precision, recall, and F1-score, respectively. The scaled Dirichlet distribution has one more parameter than the Dirichlet distribution, and this characteristic gives SD-HMM more flexibility. The main finding from comparing our novel model against the conventional ones is that it can be considered a strong alternative.

Conclusions
In this work, we proposed scaled Dirichlet-based hidden Markov models. Our principal motivation was the varied nature of real data and the fact that the assumption of Gaussianity cannot be generalized to all cases and scenarios. In recent years, other alternatives have been evaluated, and they may provide better flexibility in fitting data. The scaled Dirichlet distribution has two parameters that modify the shape and scale of the distribution, and this characteristic helps in modeling asymmetric and variously skewed forms. Here, we assumed that the scaled Dirichlet distribution is the source of the emission probability distribution. With such modifications to the structure of GMM-HMM, we may achieve greater robustness. After constructing our model, we found that variational inference provides results with reasonable computational time and accuracy. To validate our model, we tested the proposed methodology on three human activity recognition datasets. As this application is used in numerous medical domains, we intended to demonstrate the robustness of our novel method in such a demanding domain. The datasets used in this test were collected by wearable, object, and ambient sensors. The results of our evaluation indicate that our proposed method outperforms D-HMM and GMM-HMM. In future work, we will study the activities of several individuals and test other alternatives as emission distributions. Moreover, we believe our model is more robust to outliers, and we will introduce feature selection in the future.

Data Availability Statement: The datasets are publicly available at the following links: https://archive.ics.uci.edu/ml/datasets/opportunity+activity+recognition, https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones.