Stochastic Recognition of Physical Activity and Healthcare Using Tri-Axial Inertial Wearable Sensors

Featured Application: The proposed technique is an application of physical activity detection, analyzed on three challenging benchmark datasets. It can be applied in sports assistance systems that help physical trainers conduct exercises, track functional movements, and maximize people's performance. Furthermore, it can be applied in surveillance systems for abnormal event and action detection.

Abstract: The classification of human activity is becoming one of the most important areas of human health monitoring and physical fitness. With the use of physical activity recognition applications, people suffering from various diseases can be efficiently monitored and medical treatment can be administered in a timely fashion. These applications could improve remote services for health care monitoring and delivery. However, the fixed health monitoring devices provided in hospitals limit the subjects' movement. In particular, our work reports on wearable sensors that provide remote monitoring, periodically checking human health through different postures and activities to give people timely and effective treatment. In this paper, we propose a novel human activity recognition (HAR) system with multiple combined features to monitor human physical movements from continuous sequences via tri-axial inertial sensors. The proposed HAR system filters 1D signals using a notch filter that examines the lower/upper cutoff frequencies to calculate the optimal wearable sensor data. Then, it calculates multiple combined features, i.e., statistical features, Mel Frequency Cepstral Coefficients, and Gaussian Mixture Model features. For the classification and recognition engine, a Decision Tree classifier optimized by the Binary Grey Wolf Optimization algorithm is proposed. The proposed system is applied and tested on three challenging benchmark datasets to assess the feasibility of the model.
The experimental results show that our proposed system attained an exceptional level of performance compared to conventional solutions. We achieved accuracy rates of 88.25%, 93.95%, and 96.83% over MOTIONSENSE, MHEALTH, and the proposed self-annotated IM-AccGyro human-machine dataset, respectively.


Introduction
Chronic and physical fitness-related diseases are rapidly increasing as the population grows. Physical activity is directly associated with human health benefits. Therefore, many researchers strongly recommend 30 to 40 minutes of regular physical activity for a healthier life, since it can reduce the risk of many diseases, such as heart attacks, diabetes, cancer, cardiovascular disease, and so on [1]. In hospitals, many patients need continuous monitoring, which is quite expensive and inconvenient. Instead of relying on image data, many researchers have designed wearable sensor technologies for activity monitoring and classification. Jansi et al. [19] presented a multi-feature (time and frequency) domain approach to enhance the classification of eight different human activities from inertial sensors installed in smartphones. Tian et al. [20] proposed a two-layer diversity-enhanced multi-classifier recognition method using one triaxial accelerometer to classify four different activities. Furthermore, they extracted three-domain features (time, frequency, and AR coefficients) to optimize the performance of the multi-classifier recognition system. Tahir et al. [21] proposed a multifused model to maximize the optimal feature values. The extracted feature values are then optimized and classified using adaptive moment estimation and a maximum entropy Markov model. This method achieved an accuracy of 90.91% over the MHEALTH dataset. Haresamudram et al. [22] introduced a masked reconstruction-based BERT model for human activity recognition. The activities are pre-trained in a self-supervised manner. The transformer encoder architecture is also applied to continuous data from body-worn sensors and achieved an accuracy of 79.86% over the MOTIONSENSE dataset. Jordao et al. [23] implemented a convolutional neural network for wearable-sensor human activity recognition data.
The authors evaluated the implemented methodology on an MHEALTH dataset by using the "leave one subject out" validation protocol; they achieved an accuracy rate of 83%. Batool et al. [24] proposed a physical activity detection model based on Mel Frequency Cepstral Coefficients (MFCC) and statistical features. The extracted features were then optimized and classified with a PSO and SVM algorithm. The implemented methodology was later evaluated over a MOTIONSENSE dataset, giving an accuracy of 87.50%. Zha et al. [25] presented Logical Hidden Semi-Markov Models (LHSMMs), which are a combination of Logical Hidden Markov Models (LHMMSs) and Hidden Semi-Markov Models, to segment the duration of each activity. Moreover, a comparison of LHSMMs and LHMMs proved that the given method is more robust and has higher probability results than the LHMM method.
Optical sensors like digital and bumblebee cameras can be used to improve human lifestyles. However, there are limitations to the use of optical sensors for detecting human activities. With those sensors, the detection of a subject's movements is restricted to a particular range, and they raise privacy issues, e.g., recording in private places like restrooms or intruding on the user's personal life. It is uncomfortable for subjects to carry optical sensors around with them because such sensors are bulky, invasive, and not easily worn during working hours. Additionally, such cameras are relatively more expensive than other wearable sensors. Despite previous human activity classification research, there are still challenges in computation, multi-sensor support, and precise signal data acquisition. Therefore, we suggest a novel method for human activity classification in this paper.

Methodology
The complete framework for human activity recognition using wearable inertial sensors is depicted in Figure 1. Data are first collected by inertial sensors attached at the wrist, the knee, and the back of the participants. The collected data are preprocessed using a band-stop filter. Fourteen different features are extracted from the filtered data. Subsequently, the extracted features are optimized using BGWO. Finally, the optimized features are fed to a DT classifier to obtain the final classification. A detailed description of the system is given in the following section.

Preprocessing and Filtration of Sensor Data
Initially, signal enhancement is applied to the inertial sensor data to eliminate redundancy, irrelevancy, and inconsistency in the framed data. The band-stop notch filter [26] is used for signal representation to improve the quality of the data. A second-order band-stop (notch) filter has two cut-off frequencies, a lower cut-off and an upper cut-off, which are applied to the framed data. Such a filter passes all frequencies below the lower cutoff and above the upper cutoff, while all frequencies between the two cutoffs are rejected. The bandwidth of the notch filter is calculated by subtracting the lower cut-off frequency from the upper cut-off frequency, as shown in Figure 2. The filtered and unfiltered accelerometer data are plotted on the same axis: the red signal represents the filtered data and the blue signal represents the unfiltered data. Moreover, an on-figure magnifier is added to the plotted accelerometer data, showing a zoomed-in area of the graph (Figure 2: band-stop filtration of the accelerometer data in the preprocessing step).
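As a concrete illustration, the preprocessing step can be sketched with SciPy's Butterworth band-stop filter. The sampling rate and cutoff values below are illustrative assumptions, since the text does not specify the exact frequencies used:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandstop_filter(signal, low_cut, high_cut, fs, order=2):
    """Second-order Butterworth band-stop (notch) filter.

    Frequencies between low_cut and high_cut (Hz) are rejected;
    everything below low_cut and above high_cut is passed.
    """
    nyq = 0.5 * fs
    b, a = butter(order, [low_cut / nyq, high_cut / nyq], btype="bandstop")
    return filtfilt(b, a, signal)  # zero-phase filtering, no time shift

# Example: 50 Hz sampling rate, reject a 5-10 Hz band (illustrative values)
fs = 50.0
t = np.arange(0, 2.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 7.0 * t)
filtered = bandstop_filter(raw, 5.0, 10.0, fs)
```

The 7 Hz component falls inside the stop band and is attenuated, while the 1 Hz component passes through largely unchanged.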


Feature Extraction and Selection
After framing and preprocessing the data, fourteen different features are extracted from each frame, including statistical features, Mel Frequency Cepstral Coefficient (MFCC) features, ECG features, and GMM features. The statistical features are computationally less intensive and can be easily extracted in real time [26][27][28]. The statistical features applied in this paper are the mean, median, harmonic mean, position vector, sine, and cosine. The MFCC features represent the frequency and amplitude of the sensor signals and are individually helpful in finding the pattern of each activity. The ECG features are commonly used to find the absolute pattern of a signal. In this paper, the ECG features include the autoregressive, waveform length, slope sign change, and Willison amplitude features, which can efficiently detect the specific pattern of each activity. Furthermore, the GMM features include the weighting ratio, mean, and covariance for activity recognition. Figure 3 shows the flowchart of the feature combinations.

Mean Feature
The mean feature is the average value of the sampled signal in each frame [29]. It is calculated by taking the sum of the features and dividing it by the total number of samples.

Median Feature
The median feature is the middle value of the samples, which divides the data sample into two halves with equal numbers of observations [29]. It separates the higher half from the lower half of the sample data.

Harmonic Mean Feature
The harmonic mean feature HM_i(t) is the reciprocal of the arithmetic mean of the reciprocals, giving equal weight to each data point in the sample. It is defined as:

HM_i(t) = f_n / ( 1/f_i + 1/f_{i+1} + · · · + 1/f_{i+N} ),

where f_n is the total number of samples in each frame and f_i corresponds to the current sample of the signal, which ranges from f_i to f_{i+N}. Figure 4 presents the overall 1D plot of the means, medians, and harmonic means. The blue and red signals represent the sitting-down and standing-up data taken from the accelerometer. Moreover, the purple, green, and yellow dots on the accelerometer signals represent the mean, median, and harmonic mean values, respectively.
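The three averages above can be computed per frame in a few lines. Note that the harmonic mean requires strictly positive values; using the absolute value of the samples (plus a small guard against zeros) is our assumption, since the text does not say how negative accelerometer samples are handled:

```python
import numpy as np
from scipy.stats import hmean

def basic_stats(frame):
    """Mean, median, and harmonic mean of one signal frame."""
    frame = np.asarray(frame, dtype=float)
    return {
        "mean": frame.mean(),
        "median": np.median(frame),
        # hmean needs positive inputs: take magnitudes, guard against zeros
        "harmonic_mean": hmean(np.abs(frame) + 1e-12),
    }

feats = basic_stats([1.0, 2.0, 4.0])
```

For the frame [1, 2, 4] this yields a mean of 7/3, a median of 2, and a harmonic mean of 3 / (1 + 1/2 + 1/4) = 12/7.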

Sine Feature
The sine feature Sinθ measures the angle along the x-axis by calculating the magnitude and direction of the corresponding vector. The magnitude of the sine vector measures the length of a line segment, and the direction measures the angle of the ith coefficient signal values between two consecutive sample values t−1 and t. The sine magnitude and angle are defined in terms of f_i and f_{i+1}, the current samples of the signal, where tan−1 returns the angle between the two corresponding feature vectors f_i and f_{i+1}.

Cosine Feature
The cosine feature Cosθ measures the angle along the y-axis and calculates the magnitude and angle of a vector using the cosine. The magnitude of the cosine vector represents the length of a line segment from the origin to a target point (see Figure 5), while the direction represents the angle of the line formed along the y-axis.
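One plausible reading of these two features, offered as an assumption since the paper's exact equations do not survive in the text: treat the pair of consecutive samples (f_i, f_{i+1}) as a 2D vector, take its length as the magnitude, and take its arctangent as the direction angle, from which the sine and cosine components follow:

```python
import numpy as np

def angle_features(f_i, f_i1):
    """Magnitude and sine/cosine of the direction angle between two
    consecutive samples, read as the 2D vector (f_i, f_i1).

    This interpretation is an assumption, not the paper's verbatim formula.
    """
    magnitude = np.hypot(f_i, f_i1)   # length of the line segment
    theta = np.arctan2(f_i1, f_i)     # tan^-1 of f_{i+1} over f_i
    return magnitude, np.sin(theta), np.cos(theta)

mag, s, c = angle_features(3.0, 4.0)
```

For the pair (3, 4) this gives a magnitude of 5 with sine 0.8 and cosine 0.6.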

Position Vector Feature
The position vector feature P_i measures the difference between the ith coefficient values of consecutive samples. It generates a straight line between two points and is defined in terms of f_i, f_{i+1}, and f_{i+2}, the current three samples of the sensor signal, as shown in Figure 6. The running and walking activities are represented by a red signal and a blue signal, respectively. The position vector of the running activity is represented by a purple dotted line and that of the walking activity by a green dotted line. The purple and green dots are further extended along the x-axis and y-axis to represent them more precisely.

MFCC Vector Feature
The Mel Frequency Cepstral Coefficients (MFCC) measure the rate of change of information in a spectral band. These features capture the peak values in the periodic element of a sensor signal, and the resulting representation is neither in the frequency domain nor in the time domain but in the quefrency domain. The MFCC relates the perceived signal to the actual sensor signal and is computed as the discrete cosine transform of the log filter bank amplitudes:

MFCC_i = Σ_{n=1}^{N} f_n cos( i (n − 1/2) π / N ),

where f_n is the log filter bank amplitude and N represents the total number of filter bank channels.
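A minimal sketch of this computation, assuming precomputed log filter-bank amplitudes and a textbook orthonormal DCT-II (the paper's exact normalization is not specified):

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_from_filterbank(log_fbank_energies, num_coeffs=12):
    """MFCCs as the discrete cosine transform of log filter-bank amplitudes.

    log_fbank_energies: log amplitudes of the N mel filter-bank channels
    for one frame. Only the first num_coeffs cepstral coefficients are kept.
    """
    coeffs = dct(log_fbank_energies, type=2, norm="ortho")
    return coeffs[:num_coeffs]

# Illustrative log filter-bank amplitudes for one frame (N = 20 channels)
fbank = np.log(np.abs(np.random.default_rng(0).normal(size=20)) + 1.0)
mfcc = mfcc_from_filterbank(fbank)
```

Keeping only the low-order coefficients retains the smooth spectral envelope of the frame while discarding fine detail.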

Autoregressive Feature
The autoregressive (AR) feature models each sample of an activity signal as a linear combination of the previous samples plus an error sequence. These features map the particular pattern of an activity and return a true value if the pattern is detected. It is specifically used for feature extraction and is defined as:

f_n = Σ_{i=1}^{p} a_i f_{n−i} + e_n,

where f_{n−i} is the (n−i)th sample of the data, a_i represents the ith AR coefficient, and e_n is the error sequence at the current sample n. Second- and fourth-order AR models are used in this paper since they give the best results for each activity pattern.
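A plain least-squares sketch of AR coefficient estimation (the paper does not state which AR estimator it uses, so this is one common choice among several, e.g. Burg or Yule-Walker):

```python
import numpy as np

def ar_coefficients(signal, order=2):
    """Least-squares fit of an AR(order) model:
    f_n ~ a_1*f_{n-1} + ... + a_p*f_{n-p} + e_n.

    Returns the coefficients a_i; the fit residual is the error sequence.
    """
    signal = np.asarray(signal, dtype=float)
    p = order
    # Lagged design matrix: row for sample n holds f_{n-1}, ..., f_{n-p}
    X = np.column_stack(
        [signal[p - i - 1 : len(signal) - i - 1] for i in range(p)]
    )
    y = signal[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# A synthetic AR(2) process should be recovered almost exactly
rng = np.random.default_rng(1)
x = np.zeros(2000)
for n in range(2, 2000):
    x[n] = 0.6 * x[n - 1] - 0.2 * x[n - 2] + 0.01 * rng.normal()
a = ar_coefficients(x, order=2)
```

On the synthetic signal the estimates land close to the true coefficients 0.6 and −0.2, which is the sense in which the AR feature "maps the pattern" of a signal.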

Waveform Length Feature
The Waveform Length (WL) feature is mainly used in ECG signals to measure the complexity of a signal; here it is used to detect human physical activities from wearable inertial sensors. It efficiently estimates the complexity of each activity signal by subtracting each sample from the previous one and summing the absolute differences, providing amplitude, frequency, and time information. Its formula is:

WL = Σ_{i=1}^{N} | f_i − f_{i−1} |,

where f_i is the feature vector of the current sample, f_{i−1} is the past sample, and N is the total number of samples in the current frame.
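In code, the waveform length of a frame reduces to a one-liner over first differences:

```python
import numpy as np

def waveform_length(frame):
    """Waveform length: sum of absolute first differences,
    WL = sum_i |f_i - f_{i-1}| over the current frame."""
    frame = np.asarray(frame, dtype=float)
    return np.sum(np.abs(np.diff(frame)))

wl = waveform_length([0.0, 1.0, -1.0, 0.5])
```

For the frame [0, 1, −1, 0.5] the successive changes are 1, 2, and 1.5, so WL = 4.5; a more energetic activity produces larger swings and hence a larger WL.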

Slope Sign Change Feature
The slope sign change feature is based on the ratio of the vertical and horizontal changes between two sample points of a window segment. The vertical change between two sample points is called the rise, and the horizontal change is known as the run. The slope between the points is defined as:

slope = (k_2 − k_1) / (h_2 − h_1),

where h and k are the x and y coordinates of a sample point in the current frame, respectively, and f_1 = (h_1, k_1) and f_2 = (h_2, k_2) represent the current sample points.

Willison Amplitude Feature
The Willison amplitude is commonly used in the analysis of ECG signals. In HAR, it is defined as the number of times the change between consecutive inertial sensor samples exceeds a threshold. We used thresholds ranging from 0.01 to 0.27 to detect different human activities, since they give efficient performance over time. It is defined as:

WAMP = Σ_{n=1}^{N−1} u( | f_n − f_{n+1} | − threshold ), with u(x) = 1 if x > 0 and 0 otherwise,

where f_n and f_{n+1} are the current and next samples, respectively, and N is the total number of samples in the current frame.
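A direct implementation of this thresholded count might look like the following; the threshold of 0.1 is one value picked from the 0.01-0.27 range mentioned above:

```python
import numpy as np

def willison_amplitude(frame, threshold=0.1):
    """Count how many consecutive-sample changes exceed the threshold:
    WAMP = sum_n [ |f_n - f_{n+1}| > threshold ]."""
    frame = np.asarray(frame, dtype=float)
    return int(np.sum(np.abs(np.diff(frame)) > threshold))

wamp = willison_amplitude([0.0, 0.05, 0.5, 0.52, 1.0], threshold=0.1)
```

Here only the jumps 0.05 → 0.5 and 0.52 → 1.0 exceed the threshold, so WAMP = 2.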

GMM Mean Feature
The GMM is used statistically for density estimation and clustering. It consists of covariance matrices, mixture weights, and a mean vector. In this paper, the GMM mean vector is calculated by maximum likelihood estimation, which effectively estimates the mean vector of the inertial signal.

GMM Weighting Ratio Feature
The GMM weighting ratio calculates the maximum likelihood probability for the detection of each activity signal. In this paper, the GMM weighting ratio is calculated using an iterative maximization method in which the current feature f_i of the signal is subtracted from the next predicted feature f_{i+1} to estimate the weighted ratio of the current feature.

GMM-Based Covariance Ratio Feature
The GMM covariance measures the joint variability of two samples in the current window; it is positive when the samples vary together and negative when they vary in opposite directions. It is defined as:

Cov(f_i, f_{i−1}) = E[ (f_i − µ_x)(f_{i−1} − µ_y) ],

where µ_x is the mean of the current sample f_i and µ_y is the mean of the previous sample f_{i−1}.
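Using scikit-learn's GaussianMixture (an assumed tooling choice, not named by the authors), all three GMM features of a window — means, mixture weights, and covariances — can be read off a fitted model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a 2-component GMM to one window of illustrative accelerometer
# magnitudes; the component means, mixture weights, and covariances
# are then usable directly as features.
rng = np.random.default_rng(0)
window = np.concatenate([
    rng.normal(0.0, 0.3, 200),   # e.g. low-magnitude "still" samples
    rng.normal(2.0, 0.3, 200),   # e.g. high-magnitude "moving" samples
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(window.reshape(-1, 1))
means = gmm.means_.ravel()              # GMM mean features
weights = gmm.weights_                  # GMM weighting-ratio features
covariances = gmm.covariances_.ravel()  # GMM covariance features
```

On this well-separated toy window, EM recovers component means near 0 and 2 with roughly equal mixture weights.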

Basic Classifier
In this paper, three basic classifiers, each paired with an optimization algorithm, were used to evaluate the proposed preprocessing and feature extraction methodology: the Decision Tree classifier optimized by BGWO, the SVM optimized by Particle Swarm Optimization (PSO), and the Genetic Algorithm optimized by Ant Colony Optimization (ACO). Figure 7 shows the flowchart of feature optimization and classification.

Pre-Classification Using Binary Grey Wolf Optimization
Binary Grey Wolf Optimization (BGWO) is an optimization algorithm inspired by grey wolves, which live in groups of 5 to 12 [30]. To model the leadership within a group, four levels, named alpha, beta, delta, and omega, are considered. The alphas are the leaders of the group of males and females. The betas give suggested feedback to the alphas when making decisions. The deltas take the roles of sentinels, elders, caretakers, and scouts. The omega wolves obey all the other wolves in the group [31]. The BGWO position update is calculated as:

D = | C · X_p(t) − X(t) |,
X(t + 1) = X_p(t) − A · D,
A = 2a · r_1 − a,  C = 2 · r_2,

where X_p is the prey's position vector and X mimics the position of the wolves in n-dimensional space. r_1 and r_2 are random vectors whose components lie between 0 and 1 in each iteration. In the hunting process, alpha is the optimal solution, while beta and delta need to know the possible position of the prey. Out of all possible solutions, only the three best are selected to modify the decision space consistently with the best [32].
In BGWO, a is the main component of the algorithm. In each dimension, a decreases linearly over the iterations using the following formula to obtain the optimal solution [33]:

a = 2 − 2t / maxiter,

where t is the current iteration and maxiter is the maximum number of iterations of the optimization algorithm. BGWO is applied to three datasets: the MOTIONSENSE, MHEALTH, and IM-AccGyro datasets. The final sample data convergence results and optimization results are shown in Figures 8-10.
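The update rules above can be sketched as a compact binary GWO for feature selection. The sigmoid transfer function and the toy fitness below are our assumptions (a generic textbook variant), not the authors' exact implementation, where the fitness would be classification accuracy on the selected features:

```python
import numpy as np

def bgwo(fitness, dim, n_wolves=8, max_iter=30, seed=0):
    """Minimal Binary Grey Wolf Optimization sketch for feature selection.

    fitness(mask) scores a binary feature mask (higher is better).
    The three best wolves (alpha, beta, delta) steer the rest, `a`
    decays linearly from 2 to 0, and a sigmoid transfer function maps
    continuous position updates back to binary positions.
    """
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n_wolves, dim)).astype(float)
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                    # decays from 2 to 0
        order = np.argsort([-fitness(w) for w in X])
        leaders = [X[order[k]].copy() for k in range(3)]  # alpha, beta, delta
        for i in range(n_wolves):
            step = np.zeros(dim)
            for leader in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - X[i])
                step += (leader - A * D) / 3.0          # average the 3 pulls
            # Sigmoid transfer: continuous step -> probability of a 1-bit
            prob = 1.0 / (1.0 + np.exp(-10.0 * np.clip(step - 0.5, -10, 10)))
            X[i] = (rng.random(dim) < prob).astype(float)
    return max(X, key=fitness)

# Toy fitness: negative Hamming distance to a known-good feature subset
target = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0], dtype=float)
best = bgwo(lambda m: -np.sum(np.abs(m - target)), dim=10)
```

As the iterations progress the pack concentrates around the best-scoring mask, which is the binary analogue of the wolves encircling the prey.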

HAR Using a Decision Tree
The Decision Tree (DT) classifier [34] is commonly used as a predictive model for clustering, prediction, and recognition. A tree is built based on divide-and-rule, and certain parameters are applied to the DT to get the classification result.
The internal nodes of a DT are compared to the attribute values, and the decisions of the branches are made for the current node according to certain attribute conditions [35]. Finally, the leaf nodes provide conclusions. The above-defined process is repeated on a new node and forms a child of the root tree. Every non-leaf node is an input attribute of the sample data, and every leaf node is an output attribute of the sample data.
In a Decision Tree, each path from the root node to a leaf node represents a conjunction of attribute conditions, while the tree itself represents the disjunction of these conjunctions [36]. Therefore, a DT is easily converted into IF-THEN statements for classification procedures and rules. The Decision Tree algorithm is described as follows.

Initialization of Attributes
In this step, every internal node is labeled with an attribute Ai. Every arc is marked with a predicate that is applied to the corresponding attribute of the parent node. Finally, each leaf node is labeled with a class C1, C2, ..., CN.

Classification and Prediction
A Decision Tree is built using training data, and this is commonly known as the induction of a decision tree. Classification and prediction methods are based on the given induction data matrix.

Building a DT from Training Data
A DT is built from the training data based on the information gain and gain ratio. The training data are divided into two classes, Pi (acceptable level) and Ni (unacceptable level). The information needed to identify the class of an element is computed as follows.

The training set TS is partitioned on the features X_i into different classes using the weighted average. Info(P_i, N_i), written Info(TS), is:

Info(TS) = −( P / (P + N) ) log2( P / (P + N) ) − ( N / (P + N) ) log2( N / (P + N) ).

Next, the information gain is calculated as the difference between this value and the weighted information of the partitions induced by a feature. The information gain of the feature X_i is defined as:

Gain(X_i) = Info(TS) − Info_{X_i}(TS),

where Info_{X_i}(TS) is the weighted average of Info over the subsets produced by splitting TS on X_i. The classification decisions are made based on the greatest value of the gain, and this decision process is repeated until all the features are properly classified; each node in the DT is placed at the point of greatest gain. The gain has a shortcoming when the number of feature values is too large. To cope with this issue, the gain ratio is calculated from the gain and the split information, where the split information is computed over the split values of the two classes P_i (acceptable level) and N_i (unacceptable level) of TS. The gain ratio is defined as:

GainRatio(X_i) = Gain(X_i) / SplitInfo(X_i).

A DT is a learning model that uses the information gain to evaluate a target node, and the complete procedure uses divide-and-rule, a no-return strategy, and a top-down approach. Every branch of a node subset acts recursively, building Decision Tree nodes and leaves until the classification model is achieved [37]. The final result is shown in Figure 11.
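The gain computation can be illustrated on a small acceptable/unacceptable split; the two helper functions below are a minimal sketch of the standard entropy-based formulation, not the authors' code:

```python
import numpy as np

def entropy(p, n):
    """Info(P, N): entropy of a set with p acceptable / n unacceptable items."""
    total = p + n
    probs = np.array([p / total, n / total], dtype=float)
    probs = probs[probs > 0]              # 0 * log(0) is taken as 0
    return -np.sum(probs * np.log2(probs))

def information_gain(parent_pn, children_pn):
    """Gain = Info(TS) - weighted average Info of the child partitions.

    parent_pn is a (p, n) pair; children_pn is a list of (p, n) pairs,
    one per subset produced by splitting on a feature.
    """
    total = sum(p + n for p, n in children_pn)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in children_pn)
    return entropy(*parent_pn) - weighted

# Splitting 8 acceptable / 8 unacceptable items into two pure halves
# yields the maximum possible gain of 1 bit.
gain = information_gain((8, 8), [(8, 0), (0, 8)])
```

A feature that leaves the classes as mixed as before gives a gain near zero, which is why the greatest-gain feature is chosen at each node.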

Dataset Description
A platform is established to evaluate the performance of the proposed methodology using a three-axis accelerometer and gyroscope. Wearable sensors are used to acquire human activity data, and the data are uploaded to signal processing software. Our algorithm can be applied in real-time situations, especially for the health care assessments of children and elderly people.
Three datasets were used in this experiment. The MOTIONSENSE dataset [38] was collected from the accelerometer and gyroscope sensors of an iPhone with a time constraint of 6 seconds. Six activities were performed by 24 participants in different manners: going downstairs, jogging, going upstairs, sitting, walking, and standing.
The MHEALTH dataset [39] contains accelerometer, gyroscope, and magnetometer data. In this paper, only the accelerometer and gyroscope data were used to evaluate the proposed model. The sensors were placed on the subject's chest, right wrist, and left ankle. Twelve outdoor activities were performed, including walking, climbing stairs, standing still, sitting and relaxing, waist bending forward, cycling, jogging, running, jumping forward and backward, knee bending, frontal elevation of the arms, and lying down.
The IM-AccGyro dataset [40] is our self-annotated human-machine interactive dataset. Three inertial (accelerometer and gyroscope) sensors were attached at three different locations, namely the arm, leg, and neck of the subject, as shown in Figure 10. The participants' ages ranged from 15 to 30 years. Six different indoor and outdoor physical exercise activities were performed: boxing, walking, running, sitting down, standing up, and clapping.

Hardware Platform
In the experimental setup, the hardware platform comprised three GY-521 sensors. These sensors were interfaced with an Arduino Uno using jumper wires for electrical communication, and HC-05 Bluetooth modules were connected alongside the GY-521 sensors. All modules (GY-521, HC-05, and Arduino Uno), along with a 9-Volt battery, were fixed in a specially designed protective case that was mounted on a belt and tied to the human body at the arm, leg, and neck positions, as shown in Figure 12. This ensured that no sensor component shifted or misbehaved during the activities. The HC-05 Bluetooth transceiver modules handled wireless communication, and the 9-Volt batteries ensured uninterrupted data collection. The open-source Arduino software (IDE) was used to operate the system in a real-time environment. During the trials, no loss of data occurred.
The GY-521 sensor provides six-degree-of-freedom (DOF) motion tracking. It consists of a three-axis accelerometer and a three-axis gyroscope embedded in a small chip and is based on a micro-electro-mechanical system (MEMS). The GY-521 was chosen for its built-in digital motion processor (DMP), which handles the motion processing: we received the yaw, roll, and pitch angles directly from the GY-521, thereby minimizing the burden on the host computer for manipulating the motion data.
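As an illustration of how the streamed orientation data can be consumed on the host side, the sketch below parses comma-separated yaw/roll/pitch lines such as the DMP output might produce over the HC-05 serial link (the line format and the function name are our assumptions; the paper does not specify the wire format):

```python
from typing import Optional, Tuple

def parse_ypr_line(line: str) -> Optional[Tuple[float, float, float]]:
    """Parse one hypothetical 'yaw,roll,pitch' line (degrees) from the stream.

    Returns None for malformed lines so that transient transmission
    glitches do not interrupt data collection.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        yaw, roll, pitch = (float(p) for p in parts)
    except ValueError:
        return None
    return yaw, roll, pitch

# A valid sample and a corrupted (truncated) one
print(parse_ypr_line("12.5,-3.0,87.2"))  # (12.5, -3.0, 87.2)
print(parse_ypr_line("12.5,-3.0"))       # None
```

In the live setup, each line would be read from the HC-05 via a serial port (e.g., with the pyserial package) before being passed to such a parser.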
The limitation with the current setup is the 9-Volt battery. The power of the battery can operate the system for up to two days. There is a need to recharge or to replace the battery often to operate the prototype system for longer periods.

Experimental Results and Evaluation
The proposed system is evaluated using the leave-one-subject-out (LOSO) cross-validation method with training and testing data. The three chosen classifiers with their optimization algorithms are the following: the GA optimized by ACO, the DT optimized by Binary Grey Wolf Optimization, and the SVM optimized by PSO. The human activity classification algorithm is validated using precision, recall, and F-measure to identify the different postures and movements. Precision is defined as the number of True Positives (instances correctly assigned to the class) divided by the total number of instances predicted as that class (True Positives plus False Positives):

$$Precision = \frac{TP}{TP + FP}$$

Recall is defined as the proportion of instances of a class that are correctly classified, i.e., the True Positives (TP) divided by the total number of instances that actually belong to the class (TP plus False Negatives, FN):

$$Recall = \frac{TP}{TP + FN}$$

The F-measure combines precision and recall as their harmonic mean:

$$F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

The classification results of the three classifiers, i.e., the support vector machine, genetic algorithm, and decision tree, on the MOTIONSENSE, MHEALTH, and IM-AccGyro datasets are reported in Tables 1–3. All classifiers were trained on the training set, and the results in Tables 1–3 were obtained on the testing set. As Table 3 shows, the best classification result was achieved on the IM-AccGyro dataset, with an F-measure of more than 90%, compared with the MOTIONSENSE and MHEALTH results in Tables 1 and 2. Overall, the proposed method achieved better performance than other state-of-the-art methods.

Table 4 depicts the confusion matrix of the MOTIONSENSE dataset for six different activities, with a mean accuracy of 88.25%. Table 5 presents a mean accuracy of 93.95% on the MHEALTH dataset over 12 different activities. Table 6 shows the confusion matrix of the IM-AccGyro dataset for six different activities, with an average accuracy of 96.83%. Table 7 presents the comparison of the proposed approach with state-of-the-art methods on the MOTIONSENSE, MHEALTH, and IM-AccGyro datasets.
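The per-class metrics can be computed directly from a confusion matrix such as those in Tables 4–6. The following Python sketch (a minimal illustration, not the paper's evaluation code) derives precision, recall, F-measure, and mean accuracy from a square confusion matrix:

```python
def per_class_metrics(cm):
    """Compute (precision, recall, F-measure) per class from a square
    confusion matrix cm, where cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, actually other
        fn = sum(cm[c][p] for p in range(n)) - tp   # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        metrics.append((precision, recall, f))
    return metrics

def mean_accuracy(cm):
    """Fraction of all instances on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Two-class toy confusion matrix (not taken from the paper's tables)
cm = [[45, 5],
      [10, 40]]
print(mean_accuracy(cm))        # 0.85
print(per_class_metrics(cm)[0])
```

The same routines scale to the 6-class and 12-class matrices used in the experiments, since they only assume a square matrix with true classes along the rows.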

Conclusions and Future Works
In this paper, we proposed a novel robust framework, called the multiple combined features HAR system, which recognizes human activities from the inertial measurements captured by wearable sensors. The multiple combined features capture spatiotemporal variation, optimal patterns, structural uncertainty, rehabilitation motion, and transitional activity features. These features are then passed to three classifiers with optimization algorithms: the Genetic Algorithm (GA) optimized by Ant Colony Optimization (ACO), the Decision Tree (DT) optimized by Binary Grey Wolf Optimization (BGWO), and the Support Vector Machine (SVM) optimized by Particle Swarm Optimization (PSO). In the experiments, we used three challenging inertial sensor datasets: MOTIONSENSE, MHEALTH, and the proposed self-annotated IM-AccGyro human-machine dataset. Recent work by numerous researchers using state-of-the-art classifiers (SVM [41] and GA [42]) has shown good classification results on multiple benchmark HAR datasets, which is why we evaluated our proposed model against these classifiers. Our proposed method achieved remarkable recognition accuracy compared with these state-of-the-art methods.
As future work, we will improve the efficiency of our multiple combined features by adding wavelet and frequency-domain features. Additionally, we plan to address more complex activities in different scenarios, such as smart homes, offices, and hospitals, using various other wearable sensors.