Sustainable Wearable System: Human Behavior Modeling for Life-Logging Activities Using K-Ary Tree Hashing Classifier

Abstract: Human behavior modeling (HBM) is a challenging classification task for researchers seeking to develop sustainable systems that precisely monitor and record human life-logs. Several models have been proposed in recent years; however, HBM remains an open problem that is only partly solved. This paper proposes a novel HBM framework based on wearable inertial sensors. The framework is composed of data acquisition, feature extraction, optimization, and classification stages. First, inertial data is filtered with three different filters: Chebyshev, Elliptic, and Bessel filters. Next, six different features are extracted from the time and frequency domains. Then, the Probability Based Incremental Learning (PBIL) optimizer and the K-Ary tree hashing classifier are applied to model different human activities. The proposed model is evaluated on two benchmark datasets, DALIAC and PAMAP2, and on one self-annotated dataset, IM-LifeLog. For evaluation, we used a leave-one-out cross-validation scheme. The experimental results show that our model outperformed existing state-of-the-art methods, with accuracy rates of 94.23%, 94.07%, and 96.40% on the DALIAC, PAMAP2, and IM-LifeLog datasets, respectively. The proposed system can be used in healthcare, physical activity detection, surveillance systems, and medical fitness.


Introduction
Recent developments in human behavior modeling (HBM) have helped individuals shift their goals from merely counting steps to comprehensive monitoring and surveillance of human activities, especially within controlled environments [1]. Human movements can be analyzed with sophisticated sensors that are safely attached to the body. These sensors allow human life-log activities to be monitored and compiled efficiently in both indoor and outdoor environments [2]. However, recognizing human activities with wearable sensors is challenging because reliable contextual information is often missing. This gap arises from inconsistent body movements, lapses during activity recording, and other interruptions such as resting [3]. Therefore, an efficient model is required that can correctly detect complex human postures and their significance.
Wearable sensors can continuously monitor human posture and record different raw-signal attributes, such as life-log, biological, and physiological parameters [4]. Standard parameters include heart rate and physical activity, measured by wearable electrocardiogram (ECG) sensors, accelerometers, and other sensors in wearable devices [5]. Nowadays, such sensors are incorporated in body-mounted devices contextual information with behavioral transitions are maintained. The major contributions of our paper are as follows:
• We propose a multi-feature extraction methodology over the time and frequency domains that improves the feature selection process for life-logging activities.
• Multiple features from different domains are optimized with Probability Based Incremental Learning and classified with a K-Ary tree hashing classifier that provides contextual information.
• A comprehensive evaluation is performed on two public benchmark datasets and on a self-annotated dataset. On these datasets, the proposed model delivers significantly better performance than other state-of-the-art methodologies.
The rest of this paper is organized as follows. Section 2 reviews the literature in four main categories related to human behavior modeling (HBM). Section 3 describes the proposed HBM model, including the time and frequency domain features, probability-based optimization, and the K-Ary tree hashing classifier. Section 4 discusses the experimental setup and compares the proposed method with other state-of-the-art methods. Finally, Section 5 presents the conclusion and future work.

Related Work
A substantial amount of work has been done on recognizing life-logging activities using image and inertial sensors. This section divides the related work into four subsections: (1) feature-based activity recognition using images, (2) feature-based activity recognition using wearable sensors, (3) activity classification via images, and (4) activity classification via wearable sensors.

Feature Based Activity Recognition Using 2D/3D Images
Image processing techniques that enhance the perception of certain features have been used to improve the interpretability of 2D/3D videos and still images. Because feature extraction methodologies are so varied, this has become a popular research topic, and numerous methods have been proposed. Most approaches can be divided into five broad categories: (i) spatio-temporal, (ii) frequency-based, (iii) local descriptor, (iv) shape-based, and (v) appearance-based methods. For example, Hu et al. [11] proposed a local feature descriptor based on spatial displacement and direction relations for human interaction recognition. In their method, 3D skeleton data is captured by Kinect sensors and represented with a spatio-temporal saliency-based method. Anitha et al. [12] presented Laplace smoothing transforms (LST) as a feature extraction method for human action recognition. The LST extracts action recognition properties from the low-frequency characteristics of the image based on a Laplacian function. The extracted frequency features are then used for classification. The major limitation of this model lies in its KNN classifier. KNN performs poorly on high-dimensional data and takes a long time to compute the distances between data points, which drastically reduces the model's performance. Mahmood et al. [13] proposed the WHITE STAG model, which uses space-time and angular-geometric sequential features. This approach is applied to full-body silhouettes and skeleton joints to track human activities. Sreela et al. [14] proposed an image action recognition model in which a deep residual neural network is used for feature extraction. The residual network mitigates the vanishing gradient problem because errors are back-propagated through short-cut connections. This design also reduces the training time of the deep network.
The input image to the network is 224 × 224 × 3 in size, and the first convolution layer uses a 7 × 7 kernel. The model is evaluated on the Pascal VOC action dataset, where it outperforms other methods on fifteen action classes. However, such deep neural networks can perform well on the dataset they were trained on yet fail badly on external real-world images. Moreover, the model is highly sensitive to changes in image context. Jalal et al. [15] proposed invariant features of depth silhouettes using the R transform. They used body shape information to encode depth values for human activity recognition in indoor environments. In [16], Weng and Fu proposed histogram of oriented gradients (HOG) features for searching indexes by combining upper-body parts and pose estimation to recognize human activities.
Sustainability 2020, 12, 10324 4 of 21

Feature Based Activity Recognition Using Wearable Sensors
Developments in sensing technologies help researchers build various smart systems to monitor daily human life activities. Nowadays, wearable sensors can detect abnormal and unforeseen situations and provide assistance in times of dire need. Numerous researchers have proposed human activity recognition models that monitor various physiological parameters along with other symptoms. In a comparative study, Jalal et al. [17] proposed hierarchical features to recognize human behavior. These are mostly statistical features that respond to abrupt changes, temporal variation, and signal magnitude. Tahir et al. [18] combined multiple features, including statistical features, 1-D Walsh-Hadamard wavelet transform features, and 1-D local binary pattern features. The statistical features include the mean, variance, peak-to-peak value, and peak-magnitude-to-RMS ratio, which together achieve a reasonable accuracy rate. Batool et al. [19] proposed Mel-frequency cepstral coefficients and statistical features to detect physical activity using accelerometer and gyroscope sensors. Shahar et al. [20] analyzed accelerometer and gyroscope signals from sensors mounted on the chest, waist, and left and right wrists. They extracted the mean, standard deviation, maximum, and minimum peak features for hockey-playing activities. Their model outperformed configurations in which the sensors were mounted only at the waist and the left wrist. The model is best suited to sports activities, although it could be further enhanced and applied to daily life activities. The major limitation of mean, standard deviation, maximum, and minimum peak features is redundancy. The obtained features may be redundant and therefore may not give optimal performance in real-time situations. Uddin et al. [21] proposed a feature selection methodology based on a Guided Random Forest.
To select the features, the Guided Random Forest is first trained on the dataset to obtain importance scores for the features. These scores are then injected to guide the feature selection process. The Random Forest allows parallel computing, has a low computational cost, and selects high-quality features, which helped to develop an improved activity recognition model. Iqbal et al. [22] collected the CUI-HPAR dataset of physical activities, represented as feature vectors. The extracted statistical and frequency features include the mean, standard deviation, entropy, cross-correlation, root mean square, zero crossings, and maximum value. These features are classified with a supervised learning model, which achieved an accuracy rate of 90%.

Activity Classification via Images
Image classification is a vital topic in computer vision research. Many methods have been developed to improve image classification models, and they can be broadly divided into supervised and unsupervised learning. Supervised learning models usually select class label information from the same class while ignoring within-class image variability, which degrades feature selection performance. Notable contributions in the image classification field include the following. Adama et al. [23] proposed a human activity recognition (HAR) methodology for assistive robotics. Eleven features are acquired from an RGB-depth sensor, and an ensemble of three classifiers (Support Vector Machine, K-nearest neighbor, and Random Forest) is used to remove redundant features. Their methodology performed better than methods using a single classifier. Ijjina et al. [24] used a convolutional neural network to learn discriminative features from motion sequence information in RGB-D videos for human activity recognition. The performance was evaluated on the Weizmann, MIVIA, SBU, and NATOPS datasets. Although the model performed well, it has certain limitations. It cannot accurately classify images that contain any degree of rotation, and positional information can be lost, which further reduces classification accuracy. Cippitelli et al. [25] exploited a multiclass Support Vector Machine (SVM) classifier for HAR to monitor elderly people in home environments. The SVM classifier is trained on skeleton data extracted by RGB-D sensors. Koppula et al. [26] used a Markov random field to produce descriptive labels of sub-activities and of human interactions with objects. The labels are learned with a Structural Support Vector Machine (SSVM), which obtained an accuracy of 79.4% on a challenging dataset. Jalal et al. [27] trained a Hidden Markov Model (HMM) on spatiotemporal features derived from skeleton joints to detect human activities in their HAR system. The HMM is trained with code vectors and recognizes segmented human activities with a forward spotting scheme. This model gave better results than other state-of-the-art models. Zanfir et al. [28] designed a moving pose descriptor that captures low-latency pose information and classified it with a modified KNN classifier for HAR.

Activity Classification via Wearable Sensors
Human activity classification applications mostly rely on wearable sensor technologies. However, in practical applications, several factors such as noise, data loss, and data quality need to be addressed. Therefore, a data-driven classification algorithm is required that can cope with such experimental constraints. In a comprehensive study, Mannini et al. [29] presented a wavelet-based activity classifier that uses multiple accelerometers to separate dynamic motion components from gravity components. This separation significantly improved classification accuracy. Akram et al. [30] evaluated five classifiers on daily life activities using 10-fold cross-validation: multilayer perceptron, SVM, LMT, Random Forest, and Simple Logistic. Their method achieved an accuracy rate of 91.15%. Tahir et al. [31] exploited a Maximum Entropy Markov Model classifier for human activity recognition and obtained an accuracy of more than 91% on the IMSB and USC-HAD datasets. Cao et al. [32] proposed a group-based context-aware classifier, known as GCHAR, to recognize human activities on smartphones. GCHAR exploits hierarchical features through a two-level inter- and intra-group structure to detect transitions between activity groups, and it achieved an accuracy rate of 94.1%. However, the model becomes unresponsive when the input space is very large in real-time situations, which seriously degrades its performance. Golestani et al. [33] implemented a deep recurrent neural network (RNN) based on long short-term memory (LSTM) units, which can learn more complex activities than commonly used classifiers. Zebin et al. [34] proposed a feature learning methodology that classifies human activities with a convolutional neural network (CNN).
The inertial sensors were placed at five different locations on the lower body to systematically evaluate the CNN-based feature learning methodology. Murahari et al. [35] introduced attention models as a data-driven approach for exploring relevant temporal contexts. The attention model learns a set of weights over the input data, and this attention layer is added to a Deep Convolutional LSTM for classification. This model was evaluated on the PAMAP2 dataset, where it achieved an accuracy of 87.5%. Xi et al. [36] used dilated convolutional layers to automatically extract inter-sensor and intra-sensor features with a convolutional neural network (CNN) and a recurrent neural network (RNN). They also proposed a novel dilated Simple Recurrent Unit (SRU) approach to capture latent time dependencies among PAMAP2 dataset features, achieving an overall accuracy of 93.5%. Hammerla et al. [37] evaluated convolutional neural networks on the PAMAP2 dataset, which contains movement data captured with wearable sensors, and achieved an accuracy of 93.7%.

Proposed Methods
The proposed model passes the signal through a standardized pipeline to achieve enhanced classification. The block diagram of the proposed HBM is shown in Figure 1. It includes five main steps: data acquisition, data segmentation and denoising, feature extraction, data optimization, and classification. The signal is first segmented with a fixed window frame and then refined using Chebyshev, Elliptic, and Bessel filters. The filtered and denoised signals are then used for feature extraction. The extracted features include both frequency- and time-domain features.

Preprocessing Stage
Inertial sensors such as IMUs are highly prone to noise, so any unintended disturbance can alter the signal shape and complicate feature extraction. Therefore, the signal data is first preprocessed with three filters: Chebyshev, Elliptic, and Bessel. The Chebyshev filter removes power line interference, smooths the raw data, and increases the accuracy of the data (see Figure 2a). The Elliptic filter removes random oddities in the raw signals, as shown in Figure 2b. The Bessel filter preserves the shape of the filtered signals in the passband and suppresses impulsive noise. As illustrated in Figure 2, the Bessel filter provides a better response than the Chebyshev and Elliptic filters. Therefore, the Bessel filter is used to denoise the signal in the proposed model.
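The comparison of the three filters can be illustrated with SciPy's IIR design functions. The following is a minimal sketch, not the authors' configuration. The 200 Hz sampling rate, 10 Hz low-pass cut-off, fourth-order designs, 0.5 dB passband ripple, and 40 dB stopband attenuation are all hypothetical values. `design_filters` and `denoise` are illustrative helper names. Zero-phase forward-backward filtering with `scipy.signal.filtfilt` is also our own choice, since the text does not say how the filters are applied.

```python
import numpy as np
from scipy import signal

# Hypothetical settings; the text does not state the filter parameters.
FS = 200.0      # sampling rate in Hz
CUTOFF = 10.0   # low-pass cut-off in Hz
ORDER = 4       # filter order


def design_filters(fs=FS, cutoff=CUTOFF, order=ORDER):
    """Return (b, a) coefficients for the three low-pass filters compared in the text."""
    return {
        # 0.5 dB passband ripple
        "chebyshev": signal.cheby1(order, 0.5, cutoff, btype="low", fs=fs),
        # 0.5 dB passband ripple, 40 dB stopband attenuation
        "elliptic": signal.ellip(order, 0.5, 40, cutoff, btype="low", fs=fs),
        # maximally flat group delay, which preserves waveform shape
        "bessel": signal.bessel(order, cutoff, btype="low", fs=fs),
    }


def denoise(x, name="bessel", fs=FS):
    """Filter x forward and backward (zero phase lag) with the chosen design."""
    b, a = design_filters(fs)[name]
    return signal.filtfilt(b, a, x)


# Synthetic test signal: a 1 Hz "movement" component plus 50 Hz power-line interference.
t = np.arange(0, 2.0, 1.0 / FS)
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 50.0 * t)
filtered = {name: denoise(noisy, name) for name in ("chebyshev", "elliptic", "bessel")}
```

You can compare each entry of `filtered` with `clean`, for example through the RMS error away from the signal edges. This gives a rough numerical counterpart to the visual comparison in Figure 2. On real IMU data, the cut-off frequency would need to be chosen per dataset.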

Window Selection
Inertial signals are non-stationary and therefore cannot be processed as a whole. Framing, or windowing, is needed to divide the signal into short segments that are approximately stationary, so that the signal patterns of each behavior can be captured and recognized. The three datasets used in this paper reflect different behavioral patterns. For example, motion patterns such as ascending stairs, treadmill running, bicycling, and rope jumping in the DALIAC dataset require short time frames to accurately analyze and predict the subject's behavior.
The inertial signals arrive as a stream and are segmented before feature extraction. The first parameter to fix is the window size, since a fixed-size window allows signal patterns to be recognized more reliably. Conventionally, a 4-5 s window has been used because most databases follow the same activity patterns [38][39][40]. The physical activity datasets used here are more dynamic, so the conventional window length was considered less effective and required further analysis. Window sizes from 4 s to 9 s were therefore evaluated to quantify the effect of window length on the DALIAC, PAMAP2, and IM-LifeLog datasets, as shown in Table 1. Based on this analysis, a 9 s window was used in the proposed model.
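Fixed-size windowing, and the sweep over 4 s to 9 s windows, can be sketched in a few lines of Python. This is an illustration under stated assumptions. The sampling rate is dataset-specific and not given here, so `fs` is a parameter. The `overlap` argument and the choice to drop trailing samples that do not fill a window are hypothetical, since the text only mentions a fixed window frame.

```python
def segment(samples, fs, window_s=9.0, overlap=0.0):
    """Split a 1-D sample sequence into fixed-size windows.

    fs       -- sampling rate in Hz (dataset-specific; not given in the text)
    window_s -- window length in seconds (9 s in the proposed model)
    overlap  -- hypothetical fraction of overlap between consecutive windows (0 = none)
    """
    size = int(round(window_s * fs))
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    step = max(1, int(round(size * (1.0 - overlap))))
    # Trailing samples that do not fill a complete window are dropped.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


# Compare candidate window lengths (4 s to 9 s) on the same hypothetical stream.
stream = list(range(1000))  # stand-in for 1000 samples of one IMU axis at 20 Hz
counts = {w: len(segment(stream, fs=20.0, window_s=w)) for w in range(4, 10)}
```

Longer windows give fewer but richer frames per recording. This is the trade-off that Table 1 quantifies in terms of recognition accuracy.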

Discrete Hartley Transform Feature
The Discrete Hartley Transform (DHT) is a real-valued orthogonal transform closely related to the discrete Fourier transform (DFT). Each complex DFT coefficient X(k) has a real part X_real(k) and an imaginary part X_img(k), which carry the cosine and sine components of the signal. The DHT combines them into a single real coefficient, H(k) = X_real(k) − X_img(k), that retains both magnitude and phase information. The standard DHT of a sequence x(n), n = 0, …, N − 1, is defined as
$$H(k) = \sum_{n=0}^{N-1} x(n)\left[\cos\left(\frac{2\pi nk}{N}\right) + \sin\left(\frac{2\pi nk}{N}\right)\right], \quad k = 0, 1, \ldots, N-1$$
where N is the total number of samples in the sequence x(n). Figure 3 shows the Discrete Hartley Transform of the signal components.
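A direct implementation of the DHT definition makes the cos + sin ("cas") kernel explicit. The sketch below evaluates the sum in O(N²) time for clarity, and the function names are illustrative. A practical system would instead use a fast algorithm or derive H(k) from the real and imaginary parts of an FFT.

```python
import math


def dht(x):
    """Discrete Hartley Transform H(k) = sum_n x(n) * [cos(2*pi*n*k/N) + sin(2*pi*n*k/N)].

    Direct O(N^2) evaluation for clarity.
    """
    n_samples = len(x)
    out = []
    for k in range(n_samples):
        acc = 0.0
        for n, xn in enumerate(x):
            angle = 2.0 * math.pi * n * k / n_samples
            acc += xn * (math.cos(angle) + math.sin(angle))  # cas(angle) kernel
        out.append(acc)
    return out


def inverse_dht(h):
    """The DHT is its own inverse up to a factor of 1/N."""
    return [v / len(h) for v in dht(h)]


frame = [1.0, 2.0, -1.0, 0.5]   # hypothetical 4-sample frame
coefficients = dht(frame)
```

The DHT is its own inverse up to a factor of 1/N. Applying `dht` twice and dividing by N therefore recovers the frame, which makes a convenient sanity check.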

Envelope Estimation Feature
The Hilbert transform envelope detector is used to detect the amplitude envelope of the signal. The envelope is calculated by first squaring the derivative f'(t). The resulting f'(t)^2 is then summed with the product of f(t) and f''(t), and the square root of this sum is multiplied by a gain. Envelope estimation is shown in Figure 4. The final equation is defined as follows:
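The exact envelope equation cannot be recovered from the text. However, the description (squared derivative, the product of f(t) and f''(t), a square root, and a gain) resembles the energy-operator approach to amplitude demodulation. The sketch below makes two assumptions for illustration; neither is taken from the authors' equation. First, it uses the discrete Teager-Kaiser form ψ[n] = x[n]² − x[n−1]·x[n+1], with a difference rather than the sum mentioned in the text. Second, it assumes the signal's digital frequency ω is known, so that the gain 1/|sin ω| can be applied. A Hilbert-transform alternative is to take the magnitude of the analytic signal (`scipy.signal.hilbert`).

```python
import math


def energy_operator(x):
    """Discrete Teager-Kaiser energy psi[n] = x[n]**2 - x[n-1]*x[n+1] (interior samples)."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def envelope(x, omega):
    """Amplitude envelope of a narrow-band signal with digital frequency omega (rad/sample).

    Assumption: omega is known. For x[n] = A*sin(omega*n + phi), psi[n] equals
    A**2 * sin(omega)**2 exactly, so the gain 1/|sin(omega)| recovers A.
    """
    gain = 1.0 / abs(math.sin(omega))
    return [gain * math.sqrt(max(p, 0.0)) for p in energy_operator(x)]


signal_a3 = [3.0 * math.sin(0.2 * n + 0.4) for n in range(50)]  # amplitude-3 test tone
env = envelope(signal_a3, omega=0.2)
```

Because the operator needs one neighbor on each side, the envelope is two samples shorter than the input.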

Local Mean Decomposition Feature
Local Mean Decomposition (LMD) iteratively decomposes a complicated signal into a set of mono-component signals known as product functions (PFs). Each PF carries two important attributes of the signal, its envelope and its frequency, from which a time-varying instantaneous frequency can be derived. The LMD process continues until all PF components have been extracted, as shown in Figure 4. The steps of the LMD method are as follows:
• Calculate the local mean m_i of two successive local extrema n_i and n_{i+1} of the signal as
$$m_i = \frac{n_i + n_{i+1}}{2}$$
• Then, calculate the envelope estimate a_i of the same pair of extrema as
$$a_i = \frac{\left|n_i - n_{i+1}\right|}{2}$$
• Next, subtract the mean function m(t) from the current signal x(t). The frequency-modulated signal s(t) is then obtained by dividing the resulting function g(t) by the envelope function a(t):
$$g(t) = x(t) - m(t), \qquad s(t) = \frac{g(t)}{a(t)}$$
• These steps are repeated on s(t), and all estimated envelopes are multiplied together to obtain the envelope signal a(t).
• The PF component is then obtained by multiplying the frequency-modulated signal s(t) by the envelope signal a(t), i.e., PF(t) = a(t) s(t).
• Finally, the PF component is subtracted from the original signal to obtain the residue u_k(t), which is checked for monotonicity. If u_k(t) is not monotonic, all the above steps are repeated on it, starting from the first step.
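The LMD steps above can be sketched for the first product function. This is a simplified illustration that rests on several assumptions:
- The piecewise-constant local means and magnitudes are smoothed with a centered moving average of hypothetical width `w`. This is common LMD practice but is not stated in the text.
- Values are held constant before the first and after the last extremum.
- Iteration stops when the envelope estimate is within a hypothetical tolerance `tol` of 1.

All function names are illustrative.

```python
def extrema_indices(x):
    """Indices of interior local extrema (strict change of slope sign)."""
    return [i for i in range(1, len(x) - 1) if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]


def local_mean_and_magnitude(x):
    """Piecewise-constant m_i = (n_i + n_{i+1}) / 2 and a_i = |n_i - n_{i+1}| / 2."""
    ext = extrema_indices(x)
    if len(ext) < 2:
        raise ValueError("need at least two local extrema")
    m, a = [0.0] * len(x), [0.0] * len(x)
    for k in range(len(ext) - 1):
        ni, nj = x[ext[k]], x[ext[k + 1]]
        for t in range(ext[k], ext[k + 1]):
            m[t], a[t] = (ni + nj) / 2.0, abs(ni - nj) / 2.0
    # Hold the boundary segment values before the first and after the last extremum.
    for t in range(ext[0]):
        m[t], a[t] = m[ext[0]], a[ext[0]]
    for t in range(ext[-1], len(x)):
        m[t], a[t] = m[ext[-1] - 1], a[ext[-1] - 1]
    return m, a


def moving_average(v, w):
    """Centered moving average with window w (the window shrinks at the edges)."""
    half = w // 2
    out = []
    for i in range(len(v)):
        lo, hi = max(0, i - half), min(len(v), i + half + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out


def product_function(x, w=3, max_iter=10, tol=1e-3):
    """Extract one PF: iterate s <- (s - m) / a until a ~ 1; PF = (product of all a) * s."""
    s = list(x)
    envelope = [1.0] * len(x)
    for _ in range(max_iter):
        m, a = local_mean_and_magnitude(s)
        m, a = moving_average(m, w), moving_average(a, w)
        envelope = [e * ai for e, ai in zip(envelope, a)]          # multiply envelope estimates
        s = [(si - mi) / ai if ai > 0 else 0.0 for si, mi, ai in zip(s, m, a)]
        if all(abs(ai - 1.0) < tol for ai in a):
            break
    return [e * si for e, si in zip(envelope, s)]


def is_monotonic(u):
    """True if u is entirely non-decreasing or non-increasing (LMD stop criterion)."""
    return all(p <= q for p, q in zip(u, u[1:])) or all(p >= q for p, q in zip(u, u[1:]))


x = [0, 2, 0, -2, 0, 2, 0, -2, 0]              # hypothetical oscillation of amplitude 2
pf = product_function(x)
residue = [xi - pi for xi, pi in zip(x, pf)]   # u_1(t) = x(t) - PF_1(t)
```

Subsequent PFs would be obtained by repeating `product_function` on the residue until `is_monotonic` returns True.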

Empirical Mode Decomposition Feature
The Empirical Mode Decomposition (EMD) is an adaptive, data-driven analysis method. It uses the characteristics of the signal itself to decompose it and automatically adjusts to the scale of the signal. The decomposition starts by finding the local maxima and minima of the signal. These extrema are interpolated to obtain the upper envelope E_up(t) and the lower envelope E_down(t). The mean of the two envelopes, E_mean(t), is subtracted from the original signal x(t) to obtain the decomposed signal d(t):
$$E_{mean}(t) = \frac{E_{up}(t) + E_{down}(t)}{2}, \qquad d(t) = x(t) - E_{mean}(t)$$
Two conditions are then checked. First, the number of extrema and the number of zero crossings must be equal or differ by at most one. Second, the local mean of the envelopes must be close to zero. If both conditions are satisfied, the intrinsic mode function (IMF) is extracted, as shown in Figure 5. Otherwise, the whole process is repeated until both conditions are met.
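The sifting loop of EMD can be sketched as follows. As a simplifying assumption, the upper and lower envelopes are built by linear interpolation between extrema, rather than with the cubic splines normally used in EMD. Values are held constant beyond the outermost extrema. `tol` (the threshold for a mean envelope "close to zero") and `max_sifts` are hypothetical parameters.

```python
import bisect


def extrema(x):
    """Indices of interior local maxima and minima (plateaus are ignored)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope_through(idx, x, n):
    """Piecewise-linear envelope through points (i, x[i]), held constant beyond the ends."""
    env = []
    for t in range(n):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            k = bisect.bisect_right(idx, t) - 1
            t0, t1 = idx[k], idx[k + 1]
            w = (t - t0) / (t1 - t0)
            env.append(x[t0] * (1.0 - w) + x[t1] * w)
    return env


def mean_envelope(x):
    """E_mean(t) = (E_up(t) + E_down(t)) / 2."""
    maxima, minima = extrema(x)
    if not maxima or not minima:
        raise ValueError("signal needs at least one maximum and one minimum")
    e_up = envelope_through(maxima, x, len(x))
    e_down = envelope_through(minima, x, len(x))
    return [(u + l) / 2.0 for u, l in zip(e_up, e_down)]


def is_imf(d, e_mean, tol=1e-6):
    """IMF test: #extrema and #zero crossings differ by at most one, and mean envelope ~ 0."""
    maxima, minima = extrema(d)
    nonzero = [v for v in d if v != 0.0]
    crossings = sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)
    return abs(len(maxima) + len(minima) - crossings) <= 1 and max(abs(m) for m in e_mean) < tol


def extract_imf(x, max_sifts=50, tol=1e-6):
    """Sift until both IMF conditions hold (or max_sifts is reached)."""
    d = list(x)
    for _ in range(max_sifts):
        e_mean = mean_envelope(d)
        if is_imf(d, e_mean, tol):
            break
        d = [di - mi for di, mi in zip(d, e_mean)]  # d(t) = x(t) - E_mean(t)
    return d


imf = extract_imf([3, 4, 3, 2, 3, 4, 3, 2, 3])  # hypothetical oscillation riding on an offset of 3
```

In this toy input, the first sift removes the constant offset carried by E_mean(t). The result then already satisfies both IMF conditions.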

Spectral Kurtosis Feature
The Spectral Kurtosis (SKT) is a time-frequency domain method. It measures the peakedness of the spectrum, which reflects the presence of transients and impulsiveness in the signal. Hence, it is a useful tool for characterizing non-stationary or non-Gaussian behavior in the frequency domain. The SKT is defined as
$$\mathrm{SKT} = \frac{\sum_{k=b_1}^{b_2} \left(f_k - \mu_1\right)^4 s_k}{\mu_2^4 \sum_{k=b_1}^{b_2} s_k}$$
where f_k is the frequency in hertz of bin k and s_k is the spectral value of bin k. μ_1 and μ_2 are the spectral centroid and spectral spread, respectively, and b_1 and b_2 are the band edges over which the spectral kurtosis is computed.
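The spectral kurtosis can be computed directly from a spectrum. The sketch assumes the standard definitions of spectral centroid and spread, which the text names but does not define: μ1 = Σ f_k s_k / Σ s_k and μ2 = sqrt(Σ (f_k − μ1)² s_k / Σ s_k). The bin frequencies and spectral values in the example are hypothetical.

```python
import math


def spectral_kurtosis(freqs, spectrum, b1=0, b2=None):
    """Spectral kurtosis of spectral values s_k at frequencies f_k over bins b1..b2 (inclusive).

    mu1 = sum(f_k s_k) / sum(s_k)                      (spectral centroid)
    mu2 = sqrt(sum((f_k - mu1)^2 s_k) / sum(s_k))      (spectral spread)
    SKT = sum((f_k - mu1)^4 s_k) / (mu2^4 * sum(s_k))
    """
    if b2 is None:
        b2 = len(spectrum) - 1
    f = freqs[b1:b2 + 1]
    s = spectrum[b1:b2 + 1]
    total = sum(s)
    if total <= 0:
        raise ValueError("band contains no spectral energy")
    mu1 = sum(fk * sk for fk, sk in zip(f, s)) / total
    mu2 = math.sqrt(sum((fk - mu1) ** 2 * sk for fk, sk in zip(f, s)) / total)
    if mu2 == 0:
        raise ValueError("spectral spread is zero")
    return sum((fk - mu1) ** 4 * sk for fk, sk in zip(f, s)) / (mu2 ** 4 * total)


freqs = [0.0, 1.0, 2.0, 3.0, 4.0]                 # hypothetical bin frequencies in Hz
flat = spectral_kurtosis(freqs, [1, 1, 1, 1, 1])
peaked = spectral_kurtosis(freqs, [1, 0, 10, 0, 1])
```

A flat spectrum gives a lower value than one whose energy is concentrated at the centroid with a few outlying components. In practice, `spectrum` would be, for example, the magnitude or power spectrum of one window.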

The Transient Detection Principles Feature
The Transient Detection Principles (TDP) feature suppresses regions of low background activity and detects transient events in the signal data. It detects the non-harmonic phase of a signal within short frames of the signal, as shown in Figure 3. The mean absolute value of the signal is selected as the threshold t for eliminating regions of low background activity:
$$t = \frac{1}{N}\sum_{i=1}^{N} \left|x_i\right|$$
where x_i represents the samples of the signal and N is the total number of samples within a given frame.
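The thresholding step is short enough to show in full. The first function implements the mean-absolute-value threshold t. The grouping of consecutive above-threshold samples into transient segments is a hypothetical addition for illustration, since the text does not state how detected samples are post-processed.

```python
def tdp_threshold(frame):
    """Threshold t = (1/N) * sum |x_i|, the mean absolute value of the frame."""
    if not frame:
        raise ValueError("empty frame")
    return sum(abs(x) for x in frame) / len(frame)


def transient_mask(frame):
    """Mark samples whose magnitude exceeds the frame's mean absolute value."""
    t = tdp_threshold(frame)
    return [abs(x) > t for x in frame]


def transient_segments(frame):
    """Hypothetical post-processing: group consecutive above-threshold samples
    into (start, end) index pairs, with end exclusive."""
    segments, start = [], None
    for i, active in enumerate(transient_mask(frame)):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frame)))
    return segments


frame = [0.1, -0.2, 3.0, -2.5, 0.1, 0.0, 2.0, 0.1]   # hypothetical frame with two bursts
bursts = transient_segments(frame)
```

Samples below t are treated as low background activity and discarded. The remaining samples mark candidate transient events within the frame.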

Optimization Algorithm: Probability Based Incremental Learning
Probability Based Incremental Learning (PBIL) combines a genetic algorithm (GA) with competitive learning. A GA is best known as a function optimizer that relies on three operations: selection, crossover, and mutation [38]. These GA operations are theoretically and computationally complex. To overcome this disadvantage, PBIL works like a GA on a binary-encoded representation of an optimization problem. However, it maintains a real-valued probability vector from which candidate solutions are generated, instead of an explicit population. This probability vector, which represents the population, is defined as
$$\mathbf{p}(x) = \left(p_1(x_1), p_1(x_2), \ldots, p_1(x_L)\right)$$
where p_1(x_i) is the probability that the i-th gene position takes the value 1 and L is the chromosome length. All entries of the vector are initialized to 0.5. This produces a uniform distribution with equal probabilities of 1 and 0 for each bit of the chromosome. The chromosomes then gradually shift toward the solution with the highest fitness value [39]. In each generation, PBIL generates M solutions from the current probability vector p_1(x). The solutions are evaluated, and the N best solutions (N ≤ M) are selected to update the probability vector with a Hebbian-inspired rule:
$$p_1(x) \leftarrow (1 - a)\,p_1(x) + \frac{a}{N}\sum_{k=1}^{N} x^{k:M}$$
where x^{k:M} denotes the k-th best of the M solutions and a ∈ (0, 1] is the learning rate. After the update, a new set of solutions is sampled from the new probability vector. The process is repeated until the probability of each bit position has converged to either 0.0 or 1.0. Although PBIL works efficiently, it has a major drawback: it often gets trapped in a local optimum, because a single probability vector is used to generate the entire population. To overcome this issue, this paper uses the updated PBIL proposed in [41]. In the updated PBIL, each individual uses its own probability vector to generate its children.
Moreover, to further speed up convergence, each probability vector is also updated, on a random basis, from the best solution in its neighborhood, as shown in Figure 6:
$$p_i(i) = (1.0 - LR)\,p_i(i) + LR \cdot r \cdot best(i) + LR \cdot (1.0 - r) \cdot neig_{best}(i)$$
where LR is the learning rate, best(i) and neig_best(i) are the best solution of the i-th string and the best solution in its neighborhood, respectively, and r is a random parameter drawn from the interval [0, 1].
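The basic PBIL loop described above can be sketched in plain Python, together with the per-bit neighborhood update of the updated PBIL. The sketch makes the following assumptions:
- The OneMax fitness (number of ones) stands in for the actual optimization objective, which is not specified here.
- M = 50, N = 5, a = 0.1, the generation limit, and the 10⁻³ convergence tolerance are hypothetical values.
- The (1.0 − LR) weight in `neighborhood_update` follows the reconstructed equation.
- The loop uses a single probability vector, so it illustrates standard PBIL rather than the full per-individual variant of [41].

```python
import random


def pbil(fitness, n_bits, m=50, n_best=5, a=0.1, generations=200, seed=0):
    """Standard PBIL maximizing `fitness` over bit strings of length n_bits.

    m       -- solutions sampled per generation (M in the text)
    n_best  -- best solutions used for the update (N in the text, N <= M)
    a       -- learning rate, a in (0, 1]
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits                        # uniform start: P(bit = 1) = 0.5
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(m)]
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) > best_fit:
            best, best_fit = population[0], fitness(population[0])
        elite = population[:n_best]
        # Hebbian-inspired update: move p toward the mean of the N best solutions.
        for i in range(n_bits):
            mean_bit = sum(sol[i] for sol in elite) / n_best
            p[i] = (1.0 - a) * p[i] + a * mean_bit
        if all(pi < 1e-3 or pi > 1 - 1e-3 for pi in p):   # every bit has converged
            break
    return best, best_fit, p


def neighborhood_update(p_i, best_bit, neig_best_bit, lr, r):
    """Per-bit update of the updated PBIL:
    p = (1 - LR) * p + LR * r * best + LR * (1 - r) * neig_best, with r in [0, 1]."""
    return (1.0 - lr) * p_i + lr * r * best_bit + lr * (1.0 - r) * neig_best_bit


best, best_fit, probs = pbil(sum, n_bits=10)   # OneMax: fitness = number of ones
```

The weights (1 − LR), LR·r, and LR·(1 − r) sum to one. The neighborhood update is therefore a convex combination and keeps each probability within [0, 1].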

Optimization Algorithm: Probability Based Incremental Learning
Probability Based Incremental Learning (PBIL) is a combination of a genetic algorithm (GA) and competitive learning. GA is mostly known as a function optimizer that works on three complex operations, i.e., selection, crossover, and mutation [38]. These GA operations are theoretical and numerically complex. To overcome the disadvantage of GA, a PBIL algorithm is designed that works like a GA on a binary encoded representation of an optimal problem. The PBIL algorithm works on a real valued probability vector from which potential solutions are generated. The population of an individual is represented by a probability vector. It is defined as; where, p1(xi) represents the probability of ith gene position that has obtained the value of 1. The PBIL vectors are initialized to 0.5 which produces uniform distribution with an equal probability of 1 and 0 for each bit of chromosomes. Thus, the chromosomes gradually shift toward the solution with highest fitness value [39].
The evolution process in PBIL generates M candidate solutions by sampling the current probability vector $p_1(x)$. These solutions are evaluated, and the N best solutions (N ≤ M) are used to update the probability vector with a Hebbian-inspired rule:

$$p_1(x_i) \leftarrow (1 - a)\,p_1(x_i) + a\,\frac{1}{N}\sum_{k=1}^{N} x_i^{k}$$

where $x_i^{k}$ is the ith bit of the kth best solution and $a \in (0, 1]$ is the learning rate. After the update, a new set of solutions is sampled from the new probability vector. This process is repeated until every bit position of the probability vector has converged to either 0.0 or 1.0. Although PBIL is computationally efficient, a major drawback is that it often becomes trapped in a local optimum, because a single probability vector is used to generate the entire population. To overcome this issue, we adopt the improved PBIL proposed in [41], in which each individual uses its own probability vector to generate its children. To further speed up convergence, each probability vector is also updated from its neighborhood on a random basis, as shown in Figure 6:

$$p_i(i) \leftarrow (1.0 - LR)\,p_i(i) + LR \cdot r \cdot best(i) + LR \cdot (1.0 - r) \cdot neig_{best}(i) \qquad (18)$$

where LR is the learning rate, best(i) and neig_best(i) are, respectively, the best solution of the ith string and the best solution in its neighborhood, and r is a random parameter drawn from the interval [0, 1].

Figure 6. Empirical mode decomposition of signal data.
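The sketch below illustrates the standard PBIL loop and the neighborhood update of Eq. (18). It is a simplified illustration under stated assumptions, not the implementation of [41]: the Hebbian rule moves the vector toward the mean of the N best solutions, the stopping threshold of 0.01/0.99 approximates "converged to 0.0 or 1.0", and the OneMax fitness function (count of 1 bits) and all parameter values are hypothetical stand-ins for the paper's feature-selection objective.

```python
import numpy as np

def hebbian_update(p, best, a):
    # Move p toward the mean of the N best solutions; a in (0, 1] is the learning rate.
    return (1.0 - a) * p + a * best.mean(axis=0)

def neighborhood_update(p_i, best_i, neig_best_i, lr, r):
    # Eq. (18): blend the individual's best and its neighborhood's best
    # with a random weight r in [0, 1]; the three coefficients sum to 1.
    return (1.0 - lr) * p_i + lr * r * best_i + lr * (1.0 - r) * neig_best_i

def pbil(fitness, length, m=50, n=5, a=0.1, max_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    p = np.full(length, 0.5)
    for _ in range(max_iter):
        pop = (rng.random((m, length)) < p).astype(int)   # M candidate solutions
        scores = np.array([fitness(x) for x in pop])
        best = pop[np.argsort(scores)[-n:]]                # N best solutions (N <= M)
        p = hebbian_update(p, best, a)
        if np.all((p < 0.01) | (p > 0.99)):                # every bit near 0.0 or 1.0
            break
    return (p > 0.5).astype(int), p

# OneMax toy problem (hypothetical): fitness is the number of 1 bits.
solution, p_final = pbil(lambda x: x.sum(), length=16)
print(solution)
```

In the multi-vector variant, each individual would keep its own `p_i` and call `neighborhood_update` with a fresh random `r` per update instead of sharing one global vector.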

K-Ary Tree Hashing Classifier
The K-Ary Tree Hashing (KATH) classifier is a graph classification algorithm that achieves competitive accuracy with a fast run time, especially on large-scale graphs. The main idea of KATH is to project the whole graph onto a set of optimized features in a common feature space without any prior knowledge of subtree patterns. A traversal table is then constructed to keep track of similar patterns within the optimized data. KATH recursively indexes the N optimized data items to generate optimized data of depth (n − 1) that uniquely specify the patterns. Finally, MinHash is applied to the resulting sub-patterns to discriminate between different human activities [42]. Two favorable properties of KATH follow from its analysis. First, the bound on the convergence rate becomes tighter as KATH moves to higher resolutions. Second, the KATH kernel is robust in that it tolerates drift between graph chunks over a stream [43,44]. The experimental results show that KATH can classify data from significantly different activities, in contrast to generic classifiers such as SVM and logistic regression, as shown in Figure 7. Algorithm 1 gives the pseudocode of the KATH algorithm. Its inputs are a graph $g = (\mathcal{V}, \mathcal{E}, \ell)$, the number of iterations R, and the feature space M. Each node $v$ is relabeled by considering only its neighboring nodes $\mathcal{N}_v$ to assign a new label $\ell^{(r)}$, and the generated traversal table is stored in T. The algorithm works in three steps, as shown in Algorithm 2. First, the traversal table is constructed (Lines 1-7); next, each leaf is extended to new leaves based on the indexing of the optimized data (Lines 8-13); finally, a MinHash scheme is applied (Lines 15-16) to classify the data into different human activities. Table 2 reports the results on the DALIAC, PAMAP2, and IM-LifeLog datasets at k iterations.
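To make the relabel-then-hash idea concrete, the sketch below combines a simplified neighbor relabeling (each node's label is extended with the sorted labels of at most k neighbors, in the style of Weisfeiler-Lehman subtree features) with MinHash signatures and a 1-nearest-neighbor decision. This is a hedged illustration of the general technique, not the exact KATH algorithm of [42]: the traversal table, padding rules, and hash family there differ. The toy graphs, node labels, activity names, and parameters (k = 2, r = 2, 64 hashes) are hypothetical.

```python
import hashlib

def relabel(labels, adj, k):
    # One iteration: a node's new label is its own label followed by the sorted
    # labels of at most k neighbors (one k-ary level of the traversal).
    return {v: labels[v] + "(" + ",".join(sorted(labels[u] for u in adj[v])[:k]) + ")"
            for v in adj}

def subtree_features(labels, adj, k=2, r=2):
    # Collect every label produced over r relabeling iterations as a feature set.
    feats = set(labels.values())
    for _ in range(r):
        labels = relabel(labels, adj, k)
        feats |= set(labels.values())
    return feats

def _hash(seed, feature):
    # Seeded 64-bit hash built from BLAKE2b.
    digest = hashlib.blake2b(f"{seed}:{feature}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash(feats, num_hashes=64):
    # Keep the minimum of each seeded hash function over the feature set.
    return [min(_hash(s, f) for f in feats) for s in range(num_hashes)]

def similarity(sig_a, sig_b):
    # The fraction of equal minima estimates the Jaccard similarity of the two sets.
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def classify(query_sig, train):
    # 1-nearest neighbor over MinHash signatures; train holds (signature, label) pairs.
    return max(train, key=lambda item: similarity(query_sig, item[0]))[1]

# Toy graphs (hypothetical): node labels could be quantized feature values.
path = {0: [1], 1: [0, 2], 2: [1]}
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
lab = {0: "a", 1: "b", 2: "a"}
train = [(minhash(subtree_features(lab, path)), "walking"),
         (minhash(subtree_features(lab, tri)), "sitting")]
print(classify(minhash(subtree_features(lab, path)), train))  # walking
```

MinHash keeps the comparison cost fixed at `num_hashes` integer comparisons per pair, regardless of how many subtree features each graph produces, which is the property that makes this family of methods fast on large graphs.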
PCA (Principal Component Analysis) is used as a dimensionality reduction algorithm to plot the clustering results in a 3D feature space. We construct a d × k transformation matrix W that maps a sample vector x onto a new k-dimensional feature subspace with fewer dimensions than the original d-dimensional feature space:

$$z = xW, \qquad x = [x_1, x_2, \ldots, x_d] \in \mathbb{R}^{d}, \qquad z = [z_1, z_2, \ldots, z_k] \in \mathbb{R}^{k}, \qquad W \in \mathbb{R}^{d \times k}$$

where x is a sample vector with components $x_1$ to $x_d$ in $\mathbb{R}^d$, and z is its projection with components $z_1$ to $z_k$ in $\mathbb{R}^k$.
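A minimal NumPy sketch of building W from the top-k eigenvectors of the covariance matrix and applying z = xW is shown below. It assumes the samples are mean-centered before projection (standard PCA practice, not stated in the text) and uses randomly generated 6-dimensional vectors as hypothetical stand-ins for the extracted features; k = 3 matches the 3D plot.

```python
import numpy as np

def pca_projection_matrix(X, k):
    # X: (n_samples, d). Returns W (d x k) whose columns are the top-k eigenvectors
    # of the covariance matrix, ordered by decreasing eigenvalue.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # hypothetical 6-D feature vectors
W = pca_projection_matrix(X, k=3)
Z = (X - X.mean(axis=0)) @ W                              # z = xW for every centered sample
print(W.shape, Z.shape)  # (6, 3) (200, 3)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors; the columns of Z therefore have decreasing variance.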

Experimental Testing and Datasets Descriptions
A platform was established to evaluate the performance of the proposed methodology, using inertial sensors to acquire data on human daily life activities. All datasets reflect real-life situations, particularly health care assessments of physical exercise and daily life-log routines. All datasets are evaluated with Leave One Subject Out (LOSO) cross-validation, in which each subject's data is held out for testing in turn while the data of the remaining subjects is used for training.
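The LOSO protocol can be sketched as a generator of train/test index splits grouped by subject. This is an illustrative sketch; the subject assignment below is hypothetical, and scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter when that library is available.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    # Yield (held-out subject, train indices, test indices): each subject is tested
    # exactly once while the classifier is trained on all remaining subjects.
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == subject)
        train = np.flatnonzero(subject_ids != subject)
        yield subject, train, test

# Hypothetical assignment of six data windows to three subjects.
subjects = [1, 1, 2, 2, 2, 3]
folds = list(leave_one_subject_out(subjects))
for subject, train, test in folds:
    print(subject, train.tolist(), test.tolist())
```

Grouping by subject rather than by window prevents windows from the same person appearing in both the training and test sets, so the reported accuracy reflects generalization to unseen subjects.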
Three datasets are used for the experimental evaluation. The DALIAC dataset [45] was recorded from four sensors placed on the right hip, chest, right wrist, and left ankle of thirteen healthy subjects. The subjects' ages ranged from 8 to 26 years and their weights from 14 to 75 kg. The participants performed thirteen activities: sitting, lying, standing, washing dishes, vacuuming, sweeping, walking, ascending stairs, descending stairs, treadmill running, bicycling on an ergometer (50 W), bicycling on an ergometer (100 W), and rope jumping.
The PAMAP2 dataset [46] was recorded with inertial measurement units (IMUs) that provide accelerometer, gyroscope, and magnetometer signals. Three IMUs are placed on the subject's wrist, chest, and ankle. We performed experiments on thirteen activities: lying, sitting, standing, walking, running, cycling, Nordic walking, watching TV, computer work, car driving, ascending stairs, descending stairs, and vacuum cleaning.
The IM-LifeLog dataset [47] is our self-annotated human life-logging dataset. Three IMUs were placed on each subject's ankle, elbow, and chest. Nine subjects performed ten activities: having breakfast, working on a computer, eating, reading an article, exercising, watching TV, taking a call, cleaning, taking medicine, and lying down. Each activity was performed for one minute. The participants were aged 10 to 45 years and weighed 15 to 80 kg.

Hardware Platform
Each of the three sender modules consists of an IMU sensor and an nRF24L01 radio module interfaced with an Arduino UNO. The three sender modules were mounted on the participant's ankle, elbow, and chest, as shown in Figure 8, and transmit the IMU sensor data wirelessly to a fourth module, the receiver, which consists of only an nRF24L01 interfaced with an Arduino UNO. The software for the complete setup was developed with the Arduino IDE. The receiver was interfaced with a Visual Studio 2015 application for simulation in a real-time environment. Each IMU contains a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer in a small package. The main limitation of this setup was the 9-volt battery, which could only operate for up to two days; this limitation can be overcome by recharging or replacing the battery at appropriate times.
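The paper does not describe the radio payload format. As a hypothetical illustration of how a receiver could decode the 9-axis readings from each sender, the sketch below packs one sample into a fixed binary layout with Python's `struct` module and checks that it fits within the 32-byte maximum payload of the nRF24L01. The field order, integer widths, sender-id convention, and sample values are all assumptions; the actual firmware was written in the Arduino IDE and the receiver application in Visual Studio 2015.

```python
import struct

# Hypothetical payload layout (little-endian, no padding):
#   uint8  sender id (e.g., 0 = ankle, 1 = elbow, 2 = chest)
#   uint32 timestamp in milliseconds
#   9 x int16 raw readings: accel xyz, gyro xyz, mag xyz
PACKET_FORMAT = "<BI9h"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)   # 1 + 4 + 18 = 23 bytes
NRF24_MAX_PAYLOAD = 32                         # nRF24L01 payload limit in bytes
assert PACKET_SIZE <= NRF24_MAX_PAYLOAD

def encode(sender_id, timestamp_ms, accel, gyro, mag):
    # Flatten the three 3-axis readings into one fixed-size binary packet.
    return struct.pack(PACKET_FORMAT, sender_id, timestamp_ms, *accel, *gyro, *mag)

def decode(payload):
    # Inverse of encode: split the nine int16 values back into sensor groups.
    sender_id, timestamp_ms, *values = struct.unpack(PACKET_FORMAT, payload)
    return {"sender": sender_id, "t_ms": timestamp_ms,
            "accel": values[0:3], "gyro": values[3:6], "mag": values[6:9]}

packet = encode(2, 123456, (10, -20, 16384), (5, 0, -5), (300, -120, 45))
print(len(packet), decode(packet)["accel"])  # 23 [10, -20, 16384]
```

Keeping every sample in a single fixed-size packet avoids fragmentation across radio transmissions, and the sender id lets one receiver separate the ankle, elbow, and chest streams.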
The effect of sensor placement on accuracy was investigated by placing sensors at seven locations: upper arm, elbow, wrist, chest, hip, thigh, and ankle. From these locations, 12 sensor combinations were evaluated with the proposed methodology. Sensors mounted at joints (wrist, ankle, and elbow) provide 3D motion-capture information about human body movement. The sensor at the wrist captures the 3D rotation of hand movements. Similarly, a sensor at the ankle can monitor four ankle movements, namely plantar flexion (PF), dorsiflexion (DF), inversion (INV), and eversion (EVR), which are used to capture leg movement accurately. Our model achieved its best accuracy with sensors placed at the chest, elbow, and ankle, as shown in Table 3. Hence, the chest, elbow, and ankle positions are recommended for human activity recognition.

Experimental Results and Evaluation
Experiments were conducted on the two benchmark datasets and the self-annotated dataset to evaluate the performance of the proposed HBM model. Table 4 presents the confusion matrix of the 13 activities in the DALIAC dataset, where an accuracy of 94.23% was obtained. Table 5 shows a mean accuracy of 94.07% over the 13 activities in the PAMAP2 dataset. Table 6 shows the confusion matrix of the IM-LifeLog dataset over 10 activities, where the KATH classifier achieved an accuracy of 96.40%. Finally, Table 7 compares the results of the proposed approach with existing methods on the DALIAC, PAMAP2, and IM-LifeLog datasets.
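The text reports both "accuracy" and "mean accuracy" from confusion matrices without stating the exact formula. The sketch below shows the two common definitions: overall accuracy (trace over total) and mean per-class accuracy (average recall). The 2-activity matrix is hypothetical and is chosen so that the two definitions differ on an imbalanced class distribution.

```python
import numpy as np

def accuracies(cm):
    # cm[i, j] = number of samples of true class i predicted as class j.
    cm = np.asarray(cm, dtype=float)
    overall = np.trace(cm) / cm.sum()            # correctly classified / all samples
    per_class = np.diag(cm) / cm.sum(axis=1)     # recall of each activity
    return overall, per_class, per_class.mean()  # mean accuracy = average recall

cm = [[18, 2],
      [1, 4]]                                    # hypothetical 2-activity counts
overall, per_class, mean_acc = accuracies(cm)
print(overall, per_class, mean_acc)
```

Here the overall accuracy is 22/25 = 0.88, while the per-class accuracies are 0.9 and 0.8, giving a mean accuracy of 0.85; the mean per-class figure weights each activity equally regardless of how many samples it has.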

Conclusions and Future Works
In this paper, we proposed novel time- and frequency-domain features for an HBM system that recognizes life-logging activities from inertial (IMU) sensor data. The features are based on the Discrete Hartley Transform, Local Mean Decomposition, Spectral Kurtosis, Transient Detection Principles, Envelope Estimation, and Empirical Mode Decomposition. These features are optimized using Probability Based Incremental Learning (PBIL) and then classified with a K-Ary Tree Hashing classifier (KATH). The proposed model was evaluated on three datasets and achieved accuracies of 94.23%, 94.07%, and 96.40% on the DALIAC, PAMAP2, and self-annotated IM-LifeLog datasets, respectively, outperforming current state-of-the-art methods.
As future work, the proposed method will be further improved by adding features from different domains. Additionally, we are planning to develop datasets for elderly healthcare using inertial and optical sensors.