HF-SPHR: Hybrid Features for Sustainable Physical Healthcare Pattern Recognition Using Deep Belief Networks

The daily life-log routines of elderly individuals are susceptible to numerous complications in their physical healthcare patterns. Some of these complications can cause injuries, followed by extensive and expensive recovery stages. It is important to identify physical healthcare patterns that can describe and convey the exact state of an individual’s physical health while they perform their daily life activities. In this paper, we propose a novel Sustainable Physical Healthcare Pattern Recognition (SPHR) approach using a hybrid features model that is capable of distinguishing multiple physical activities based on a system of multiple wearable sensors. Initially, we acquired raw data from well-known datasets, i.e., the mobile health and human gait databases, which comprise multiple human activities. The proposed strategy includes data pre-processing, hybrid feature detection, and feature-to-feature fusion and reduction, followed by codebook generation and classification, which can recognize sustainable physical healthcare patterns. Feature-to-feature fusion unites the cues from all of the sensors, and Gaussian mixture models are used for the codebook generation. For the classification, we recommend deep belief networks with restricted Boltzmann machines over five hidden layers. Finally, the results are compared with state-of-the-art techniques in order to demonstrate significant improvements in accuracy for physical healthcare pattern recognition. The experiments show that the proposed architecture attained improved accuracy rates for both datasets, demonstrating that it represents a significant SPHR approach. The anticipated system has potential for use in human–machine interaction domains such as continuous movement recognition, pattern-based surveillance, mobility assistance, and robot control systems.


Introduction
The global elderly population is increasing every day, and many older adults require an independent, aging-in-place lifestyle [1]. Research on Sustainable Physical Healthcare Pattern Recognition (SPHR) has a long tradition, because physical activity recognition can deliver great benefits to society. However, complex SPHR remains a challenging and active research area. A commonly-used strategy is to acquire, analyze, and classify the data for physical activity recognition [2]. It has a wide range of applications, including video surveillance systems, healthcare monitoring, uncertain event detection, interactive 3D games, and smart homes [3]. In order to examine the effectiveness of SPHR for indoor/outdoor environments, the major systems are categorized into two types of data retrieval devices, namely, vision-based and wearable-sensors-based [4]. In vision-based systems, SPHR is relatively prominent and has been studied extensively, providing acceptable recognition rates, although vision-based setups are challenging to deploy in real-life environments due to various practical constraints. In [18], Jalal et al. presented a technique using spatiotemporal multi-fused features to classify segmented human activity; their study used vector quantization for code vector generation and HMM for SPHR.
On the other hand, wearable sensors can be attached to the human body in order to capture human motion data continuously. In [19], Irvine et al. focused on data-driven approaches and proposed a new ensemble of neural networks. The authors generated four base models and integrated them using a support function fusion method to compute the output decision score for each base classifier. In a study of wearable sensors by Xi et al. [20], surface electromyography (sEMG) sensors were attached to the limbs to monitor the performance of daily activities of frail individuals. They proposed time-, frequency-, and entropy-based feature abstraction, with Gaussian Kernel Support Vector Machines (GK-SVM) and Fuzzy Min-Max Neural Networks (FMMNN) used for activity classification. In [21], Wijekoon et al. described a knowledge-light method, as opposed to knowledge-intensive methods. They proposed the use of a few seconds of data to help personalize SPHR models, and to further transfer recognition knowledge to identify unknown activities. In [22], Quaid et al. introduced a human pattern behavior recognition method using inertial sensors. They proposed extracting statistical, cepstral, temporal, and spectral features, and then reweighting these features to adapt to varying signal patterns; the classification is performed using the biological operations of crossover and mutation. Tahir et al. [23] presented a wearable inertial sensor-based activity recognition system using filters and multi-fused features, with feature optimization accomplished using adaptive moment estimation (Adam) and AdaDelta, which is further modeled using a MEMM. Debache et al. [24] proposed a low-complexity model that is comparable to heavily-featured models for SPHR.
They used the mobile health (mHealth) and Daily Life Activity (DaLiAc) datasets to compare their model's performance using logistic regression (LR), gradient boosting (GB), k-nearest-neighbors (KNN), support vector machines (SVM), and CNN. The authors of [25] proposed a novel method based on the Human Gait Database (HuGaDB) dataset. Their contributions include the identification of direction and sensor position, a best-feature selection method, and the highest recognition accuracy achieved for HuGaDB. Furthermore, their model uses four different classifiers, namely, Random Forest (RF) [26], SVM, KNN, and Decision Tree (DT). Jalal et al. [27] presented a genetic-based classifier approach for human activity recognition, proposing a reweighted genetic algorithm for SPHR using inertial data.
Considering our focal schema, we know that SPHR is ultimately associated with the real-time monitoring of activities. Additionally, it involves tradeoffs between computational time and activity pattern recognition accuracy. Despite all of these advanced research methodologies, there is still a deficiency in the classification of human activities using state-of-the-art techniques. Thus, our research is dedicated to the development of an efficient method that maintains high accuracy rates along with low computational complexity.
Here, we propose an innovative methodology for SPHR using wearable sensors, including an inertial measurement unit (IMU), electrocardiography (ECG), and electromyography (EMG). Our model was able to recognize diverse human activities with better performance measures. The proposed methodology consists of signal de-noising, pre-processing, and hybrid feature abstraction. For the hybrid features, this research proposes the following four types of features:

• Statistical nonparametric operator: a 1D local binary pattern (1D-LBP) generates a code [28] that can describe larger data in compressed form using each sample and its neighbors.
• Entropy-based features: these features are used to find the optimal characteristics of a signal [29], and can easily differentiate between noisy and plain signals.
• Wavelet transform features: these features provide an inherent multiresolution approach and wavelet transform properties [30,31] during the signal analysis.
• Mel-frequency cepstral coefficient (MFCC) features: a powerful algorithm to process signals based on Mel-frequency cepstrum coefficients, which can detect the difference between a signal's variations [32,33] for multiple activities.
After extracting the hybrid features, the proposed model performs feature-to-feature fusion, feature selection, codebook generation using Gaussian models, and classification over state-of-the-art datasets. Through experimental results, we show that the proposed model outperformed other comparative state-of-the-art approaches. The major contributions of this model are as follows:

1. We developed hybrid approaches for feature abstraction, including statistical nonparametric, entropy-based, wavelet transform, and Mel-cepstral features.
2. We designed a multi-layer sequential forward selection (MLSFS) to differentiate and select the optimal features for SPHR.
3. A combination of a Gaussian mixture model (GMM) with Gaussian mixture regression (GMR) was introduced to generate the codebook and optimum interpretation of the features.
4. We used two publicly-available benchmark datasets for our model, and fully validated it against other state-of-the-art methods, including CNN, AdaBoost, and ANN-based algorithms.
The rest of the paper is structured as follows. Section 2 presents the details of the proposed model. Section 3 reports on the investigation and dataset details, along with the results. Section 4 discusses the methodology and related work in the field of SPHR. Section 5 concludes the paper and provides some forthcoming directions.

Materials and Methods
The proposed system acquires raw signals from wearable sensors, specifically, an inertial measurement unit, an electrocardiogram, and an electromyogram, for biosignal-based datasets. Initially, a pre-processing phase removes noise via three different filters, namely, median, notch, and moving average filters. After that, we apply a sliding window algorithm to find hybrid features of different types [34]. From the perspective of multi-sensor systems, these hybrid features are then fused [35] through a feature-in-feature-out technique [36,37] to improve, refine, and obtain new merged features. The dimensions of these fused features are reduced using our novel modified multi-layer sequential forward selection algorithm. Next, in order to symbolize these reduced features, we propose a GMM along with GMR algorithms to generate a codebook. Finally, the codebook is fed to the deep belief networks along with multiple layers of RBMs. An overview of the proposed system is shown in Figure 1.

Data Acquisition and Pre-Processing
Feature abstraction is deeply reliant on the pre-processing phase; hence, it is important to reduce all of the noise in the acquired data. The data from the sensors [38] (including IMU, ECG, and EMG) are extremely susceptible to interference and random noise, which can lead to signal variations, ultimately affecting the features. Therefore, we applied three different filter types, namely, a median filter for the IMU, a notch filter for the ECG, and a moving average filter for the EMG signals, to eliminate the associated noise. Figure 2 shows the filtering effects on a selected ECG lead and an IMU axis.
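To make the filtering step concrete, the following minimal NumPy/SciPy sketch applies the three filter types to synthetic signals; the sampling rate, the 50 Hz notch frequency, the kernel/window sizes, and the signals themselves are illustrative assumptions, not the datasets' actual parameters.

```python
import numpy as np
from scipy.signal import medfilt, iirnotch, filtfilt

np.random.seed(0)
fs = 256                                   # assumed sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)
imu = np.sin(2 * np.pi * 1.5 * t) + 0.1 * np.random.randn(t.size)   # synthetic IMU axis
ecg = np.sin(2 * np.pi * 5 * t) + 0.2 * np.sin(2 * np.pi * 50 * t)  # ECG with 50 Hz hum
emg = np.random.randn(t.size)                                        # synthetic EMG

# Median filter suppresses impulsive spikes in the IMU signal.
imu_f = medfilt(imu, kernel_size=5)

# Notch filter removes 50 Hz power-line interference from the ECG.
b, a = iirnotch(w0=50, Q=30, fs=fs)
ecg_f = filtfilt(b, a, ecg)

# Moving-average filter smooths the EMG signal.
win = 9
emg_f = np.convolve(emg, np.ones(win) / win, mode="same")
```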

Data Segmentation
In the segmentation step, the signal samples are partitioned into segments of data in order to capture the dynamic motion. Each window is an approximation of the signal, which is provided for the signal analytics. A signal can be segmented in different ways: activity-defined windows, event-defined windows, and sliding windows [39,40]. After the filtering in the pre-processing step, we segmented the filtered data using windows of 5 s duration for each of the signal axes and ECG/EMG leads, as defined in Algorithm 1, in order to maximize the recognition accuracy.
Sliding windows are used to partition the bio-signal into fixed-sized time windows that can be either non-overlapping or overlapping. Overlapping sliding windows have a generalized positive impact on the performance of the proposed HF-SPHR system. Figure 3 demonstrates all of the windows generated for the x-axis of the IMU when it is placed on the chest, and for lead 1 of the ECG.
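The segmentation step above can be sketched as follows; the sampling rate and the 50% overlap are assumptions for illustration (the paper fixes only the 5 s window duration).

```python
import numpy as np

def sliding_windows(signal, fs, win_s=5.0, overlap=0.5):
    """Split a 1D signal into fixed-size windows with fractional overlap."""
    size = int(win_s * fs)
    step = max(1, int(size * (1.0 - overlap)))
    starts = range(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

fs = 50                         # assumed sampling rate (Hz)
x = np.arange(fs * 20)          # 20 s of samples
w = sliding_windows(x, fs)      # 5 s windows with 50% overlap
print(w.shape)
```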

IMU-Based Hybrid Feature Extraction
An inertial measurement unit is a mechanized device that is used to monitor and provide data on object-specific force, angular rate [41], and positioning values. It uses a combination of accelerometers, gyroscopes, and magnetometers, each of which consists of x, y, and z axes. After the pre-processing phase is completed, the second phase is to generate hybrid features from each sensor's processed signal separately. The four major domains of hybrid features employed are statistical non-parametric, entropy-based, wavelet transform, and Mel-frequency cepstral coefficient features. This paper proposes three features for IMU signals: 1D-LBP, state-space correlation entropy (SSCE), and dispersion entropy (DE), which are explained in the sections below. Algorithm 2 (SSCE and dispersion entropy [42–44]) shows the pseudocode for the overall IMU feature extraction.

1D Local Binary Pattern
1D-LBP is a non-parametric statistical feature extraction [45] technique. It focuses on the vibration of the signal, and captures the descriptive information representing the relative changes in the IMU signal amplitudes. This feature requires substantially less computational power, and has strong discriminative capabilities.
Here in Equation (1), x is the signal window for 1D-LBP, y is the threshold, T represents selected binary values, and n is the number of total values in each selected window. Figure 4 denotes 1D-LBP features for the mHealth dataset. Each IMU axis is represented on the x-axis, whereas the y-axis represents the number of windows. Each box in the figure visually represents the 1D-LBP data for every IMU axis. The central red mark in the box indicates the median, while the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively.
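A minimal sketch of a 1D-LBP in the spirit of Equation (1) is shown below; the neighbourhood radius and the bit ordering are illustrative choices, as the exact coding scheme of [28] may differ.

```python
import numpy as np

def lbp_1d(x, radius=4):
    """Compute a 1D local binary pattern code for each interior sample.
    Each of the 2*radius neighbours is thresholded against the centre
    sample and the resulting bits are packed into one integer code."""
    codes = []
    for i in range(radius, len(x) - radius):
        neighbours = np.concatenate([x[i - radius:i], x[i + 1:i + 1 + radius]])
        bits = (neighbours >= x[i]).astype(int)          # threshold against centre
        codes.append(int("".join(map(str, bits)), 2))    # pack bits into a code
    return np.array(codes)

sig = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], dtype=float)
print(lbp_1d(sig, radius=2))
```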


State-Space Correlation Entropy
The data related to the time series can be divided into embedded vectors. The state-space covariance matrix captures the correlations of the embedded vectors in a time series. The upper triangular and lower triangular elements of the matrix are identical, and the diagonal elements capture the autocorrelation of the embedded vectors. The entropy is calculated from the probability of the correlations between the embedded vectors (see Figure 5) using Equation (2). The dimension of the embedded vectors is another important parameter for SSCE; when it is small, the number of embedded vectors is high.
where P k is the probability evaluation and n is the number of bins.

Dispersion Entropy
Dispersion entropy is used to quantify the regularity of a time series and detect noise bandwidth, simultaneous frequencies, and amplitude changes. As a measure of uncertainty, DE tackles the limitations of permutation entropy and Shannon entropy, including the discrimination of different groups of similar traits with less computation time. Dispersion entropy includes four main steps, and they are formulated according to Equation (3), where x is the signal, m is the embedding dimension, c is the number of classes, d is the time delay, and p(π_{v0 v1 ... vm−1}) is the probability of each dispersion pattern, computed as in Equation (4). Meanwhile, z_i^{m,c} is the embedding vector, as shown in Figure 6.
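The four DE steps can be sketched as follows; mapping samples to classes via the normal CDF is one common choice, and the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def dispersion_entropy(x, m=2, c=3, d=1):
    """Dispersion entropy: (1) map samples to (0,1) via the normal CDF,
    (2) quantize into c classes, (3) form embedding vectors of length m
    with delay d, (4) take the Shannon entropy of the dispersion-pattern
    probabilities."""
    y = norm.cdf(x, loc=x.mean(), scale=x.std())            # step 1
    z = np.clip(np.round(c * y + 0.5), 1, c).astype(int)    # step 2: classes 1..c
    n = len(x) - (m - 1) * d
    patterns = [tuple(z[i:i + m * d:d]) for i in range(n)]  # step 3
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()                               # step 4
    return -np.sum(p * np.log(p))

x = np.random.default_rng(0).normal(size=500)
print(dispersion_entropy(x))
```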

ECG-Based Hybrid Feature Extraction
ECG-based features are classified into five types that detect possible heart problems and other abnormalities [46] related to SPHR. These ECG feature extractions are explained in Algorithm 3 (MFCC [47–49]), which is provided in the supplementary materials section.


Figure 6. 1D plot of the dispersion entropy feature extraction for the IMU device.


Wavelet Packet Entropy (WPE)
In WPE, the original signal is decomposed into two components, detail coefficients (DCs) and approximation coefficients (ACs), using a wavelet decomposition tree [50] until the decomposition level is reached. Mathematically, this procedure of decomposition can be defined as in Equation (5), where h(k) and g(k) are the two filters that are used to obtain the ACs/DCs, and d_{i,j} represents the reconstruction signal at the ith level and jth node. A wavelet decomposition tree with four-level decomposition into ACs and DCs is shown in Figure 7a, whereas a two-level wavelet packet decomposition into ACs and DCs is presented in Figure 7b.
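A toy illustration of Equation (5), using Haar filters for h(k) and g(k); the actual wavelet family used in the paper is not assumed here.

```python
import numpy as np

def haar_step(x):
    """One level of wavelet decomposition with Haar filters:
    h (lowpass) -> approximation coefficients (ACs),
    g (highpass) -> detail coefficients (DCs)."""
    h = np.array([1.0, 1.0]) / np.sqrt(2)   # lowpass h(k)
    g = np.array([1.0, -1.0]) / np.sqrt(2)  # highpass g(k)
    ac = np.convolve(x, h)[1::2]            # filter then downsample by 2
    dc = np.convolve(x, g)[1::2]
    return ac, dc

def wavelet_tree(x, levels=2):
    """Recursively split the ACs, as in the decomposition tree of Figure 7a."""
    out, a = {}, np.asarray(x, dtype=float)
    for lvl in range(1, levels + 1):
        a, d = haar_step(a)
        out[f"AC{lvl}"], out[f"DC{lvl}"] = a, d
    return out

tree = wavelet_tree([4, 6, 10, 12, 8, 6, 5, 5], levels=2)
```

With orthonormal Haar filters, the coefficient energy at each level equals the signal energy, which is a quick sanity check on the decomposition.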

P-Wave and T-Wave Detection
P- and T-wave detection features are extracted from ECG signals using the Q-wave, R-wave, and S-wave (QRS) complex and a Hamilton segmenter algorithm. According to the Hamilton segmenter algorithm, a few rules are applied to every cycle, i.e., every QRS complex, in an ECG signal. Equations (6) and (7) explain the rules adopted from the algorithm for P-wave θ_P and T-wave θ_T detection, where h(x) represents the height of the detected peak, and ω(x) represents the width of the peak. Using these formulas, we developed an algorithm, which is presented in Algorithm 3. Samples of the detected P and T waves for two different activities, jogging and lying down, are given in Figure 8. After discovering the QRS complex for each ECG cycle in Figure 8a, the red squares denote the T-wave detection, whereas the green triangles represent the P-wave detection for the jogging activity. In Figure 8b, the black triangles symbolize P waves, and the green squares represent T-wave detection for the lying down activity.
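A hedged sketch of rule-based peak picking in the spirit of Equations (6) and (7), applying height and width constraints to a synthetic trace; the thresholds and the signal are invented for illustration and this is not the Hamilton segmenter itself.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 250
t = np.arange(0, 2, 1 / fs)
# Crude synthetic ECG-like trace: tall narrow R spikes plus smaller P/T-like bumps.
ecg = (1.2 * np.maximum(0, np.cos(2 * np.pi * 1.0 * t)) ** 30
       + 0.25 * np.maximum(0, np.cos(2 * np.pi * 1.0 * (t - 0.26))) ** 6)

# Rule style of Equations (6)/(7): a candidate must satisfy a minimum
# height h(x) and a width constraint w(x) (widths in samples).
r_peaks, _ = find_peaks(ecg, height=0.8, width=(1, 20))          # tall, narrow
pt_peaks, _ = find_peaks(ecg, height=(0.1, 0.5), width=(3, 60))  # small, broad
```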

Mel-Frequency Cepstral Coefficients
During the MFCC coefficient generation, we initially pre-processed the ECG signal by applying pre-emphasis with α = 0.97. With an analysis frame duration of 3000 ms and a frame shift of 10 ms, the signal is then windowed using a Hamming window with N = 256. Next, the discrete Fourier transform of the frame is taken using Equation (8), where h(n) is an N-sample-long analysis window, K is the length of the DFT, and s_i(n) is the periodogram-based power spectral estimate for the frame. Afterwards, Mel filtering, a natural logarithm, and the DCT are applied (see Figure 9), with the number of Mel filter-bank channels being 20, the number of cepstral coefficients being 12, and the liftering parameter being 22. The filter-banks are created using Equation (9), where m is the number of filters and f is the list of m + 2 Mel-spaced frequencies. To calculate the 12 cepstral coefficients, Equation (10) is used, where d_t is the coefficient of frame t, and a typical value for N is 2. Figure 10 represents a few outcomes of MFCC for different activities.
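The per-frame MFCC pipeline can be sketched as below; the triangular Mel filterbank construction is the standard one, the test frame is synthetic, and liftering is omitted for brevity.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

def mfcc_frame(frame, fs, n_filt=20, n_ceps=12, K=256):
    # Pre-emphasis with alpha = 0.97
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # Hamming window and periodogram power-spectral estimate (Equation (8) style)
    x = x * np.hamming(len(x))
    power = np.abs(np.fft.rfft(x, K)) ** 2 / K
    # Triangular Mel filterbank over m + 2 Mel-spaced frequencies (Equation (9) style)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filt + 2)
    bins = np.floor((K + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filt, K // 2 + 1))
    for i in range(1, n_filt + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    # Log filterbank energies, then DCT; keep the first 12 cepstral coefficients
    energies = np.log(fbank @ power + 1e-12)
    return dct(energies, type=2, norm="ortho")[:n_ceps]

frame = np.sin(2 * np.pi * 5 * np.arange(256) / 256)   # synthetic test frame
coeffs = mfcc_frame(frame, fs=256)
```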

R-Point Detection and R-R Interval
The R-point is the top peak in a QRS complex [51]; therefore, we first extracted the R-points using Equation (11), where h(x) is the minimum peak height of a specific signal, and ω(x) gives the width limitations for R peaks. Then, the model calculated the difference between two consecutive R-points in the same window. These differences provide the R-R intervals in each window, which contains a maximum of 3 R peaks. We extracted three R-points from each window in order to ensure consistency in the feature extraction and to avoid bias towards a particular activity. In Figure 11a, after finding a QRS complex, the R-points are shown using blue circles, and the detected R-R intervals are presented in Figure 11b using a scatter plot.
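A minimal sketch of R-point picking and R-R interval computation on a synthetic trace; the height/width thresholds and the 0.8 s rhythm are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 200
t = np.arange(0, 3, 1 / fs)
# Synthetic trace with narrow R-like spikes every 0.8 s.
ecg = np.maximum(0, np.cos(2 * np.pi * (t - 0.4) / 0.8)) ** 40

# Equation (11) style constraints: minimum height h(x) and width limits w(x).
r_points, _ = find_peaks(ecg, height=0.6, width=(1, 15))
rr_intervals = np.diff(r_points) / fs   # seconds between consecutive R-points
print(rr_intervals)
```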

EMG-Based Hybrid Feature Extraction
EMG is a process that is used to record and assess the electrical activity produced by skeletal muscles. For the EMG feature abstraction process, we used entropy-based features, which include a nonlinear dynamic parameter [52] for the measurement of signal complexity. We used the fuzzy entropy, approximate entropy, and Renyi entropy of orders 2 and 3. Algorithm 4 (fuzzy entropy [53,54]; approximate entropy [55]; Renyi entropy [56]), provided in the supplementary materials section, explains the implementation of all three types of entropies for the EMG signal.

In Equations (12)-(14), m is the consecutive vector sequence, n is the gradient, r is the width of the boundary of the exponential function, N is the sample time series, and D_ij^m is the degree of similarity. We subsequently tried different values for n and r, which leads to a decrease in the standard deviation. Here, we selected r = 0.24 and n = 0.2 for all of the windows of both EMG leads in the HuGaDB dataset, as shown in Figure 12.
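A sketch of fuzzy entropy along the lines of Equations (12)-(14); the exponential similarity function exp(−d^n/r) is one standard formulation, and the per-vector baseline removal is an assumption.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.24, n=2.0):
    """Fuzzy entropy: similarity between embedded vectors is graded by an
    exponential function (gradient n, boundary width r) instead of a hard
    threshold; the result is ln(phi_m) - ln(phi_{m+1})."""
    def phi(m):
        N = len(x) - m
        emb = np.stack([x[i:i + m] for i in range(N)])
        emb = emb - emb.mean(axis=1, keepdims=True)   # remove local baseline
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        sim = np.exp(-(d ** n) / r)                   # degree of similarity D_ij
        np.fill_diagonal(sim, 0)                      # exclude self-matches
        return sim.sum() / (N * (N - 1))
    return np.log(phi(m)) - np.log(phi(m + 1))

x = np.random.default_rng(1).normal(size=150)
print(fuzzy_entropy(x))
```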

Figure 12. Fuzzy Entropy features extracted for EMG lead 1 and lead 2 for the HuGaDB dataset.

Approximate Entropy
During approximate entropy, we measure the randomness of a series of data without any previous knowledge [57] about the dataset. Equations (15) and (16) show the inner concept of the calculation of approximate entropy, where m is the embedding dimension and r is the noise filter:

ApEntro(m, r, N) = φ_m(r) − φ_{m+1}(r). (16)

We used m = 2 and r = 2.0 for our data. Figure 13 shows the approximate entropy calculated for the EMG leads using these parameters.

Figure 13. Approximate Entropy feature extraction using r = 2.0 and m = 2.
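Equations (15) and (16) can be sketched directly; a perfectly regular signal yields an approximate entropy of zero.

```python
import numpy as np

def approx_entropy(x, m=2, r=2.0):
    """ApEntro(m, r, N) = phi_m(r) - phi_{m+1}(r), per Equations (15)-(16):
    phi averages the log-fraction of embedded vectors within tolerance r."""
    def phi(m):
        N = len(x) - m + 1
        emb = np.stack([x[i:i + m] for i in range(N)])
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        C = (d <= r).mean(axis=1)       # fraction of vectors within tolerance r
        return np.mean(np.log(C))
    return phi(m) - phi(m + 1)

regular = np.tile([1.0, 2.0], 50)       # perfectly periodic toy signal
print(approx_entropy(regular))
```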

Renyi Entropy Order 2 and Order 3
Renyi entropy is the generalization of Shannon's entropy explained in Equation (18), which preserves the additivity of statistically-independent systems [58,59], and is commonly used for the analysis of biosignals [60]. Equation (17) presents the formula for the calculation of the Renyi entropy of order α, where s is the signal sample values, α = 2, 3, . . . is the order, M is the finite number of possible values of s, and p is the probability of each s. Figure 14 shows the Renyi entropy for α = 2 and α = 3 for the EMG signal leads.
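Equation (17) can be sketched with a histogram-based probability estimate; the bin count is an assumption. For a uniform distribution over M values, every order yields log M.

```python
import numpy as np

def renyi_entropy(signal, alpha, bins=16):
    """Renyi entropy of order alpha (Equation (17));
    alpha -> 1 recovers Shannon entropy (Equation (18))."""
    p, _ = np.histogram(signal, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    if alpha == 1:
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1 - alpha)

sig = np.repeat(np.arange(4.0), 10)   # uniform over 4 values
print(renyi_entropy(sig, 2, bins=4), renyi_entropy(sig, 3, bins=4))
```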

Feature-to-Feature Fusion
After the separate extraction of the IMU, ECG, and EMG features, the model fuses the hybrid features for each sensor type together, as described in Equations (19)-(21). Furthermore, in order to obtain more complete global information, the fused features from all three sensors are again merged together based on time. This type of data fusion is also known as feature-in-feature-out, where both the input and the output of the fusion are features, as shown in Figure 15. Equation (22) shows the formula to fuse the hybrid features from each sensor.

Figure 15. Proposed feature-to-feature fusion concept.
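The feature-in-feature-out fusion of Equations (19)-(22) amounts to a time-aligned concatenation of the per-sensor feature vectors; the feature counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_windows = 4                                  # time-aligned analysis windows
imu_feats = rng.normal(size=(n_windows, 6))    # e.g. 1D-LBP + entropy features
ecg_feats = rng.normal(size=(n_windows, 5))    # e.g. WPE, P/T, MFCC, R-R features
emg_feats = rng.normal(size=(n_windows, 4))    # e.g. fuzzy/approx/Renyi entropies

# Feature-in-feature-out fusion: concatenate the per-sensor feature vectors
# belonging to the same time window into one merged vector.
fused = np.concatenate([imu_feats, ecg_feats, emg_feats], axis=1)
print(fused.shape)
```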

Feature Reduction: Modified Multi-Layer Sequential Forward Selection
In the feature reduction phase, we eliminate unnecessary features based on a search strategy and an objective function. In search strategies, the algorithms are further categorized into sequential algorithms and randomized algorithms. Similarly, the objective functions are also categorized into filters and wrappers [61]. Dimension reduction not only helps to obtain better results for classification; it can also be used to find those features which act as the best predictors. Here, we proposed a unique algorithm for the feature reduction, designated as modified multi-layer sequential forward selection.
Whitney's implementation of sequential forward selection (SFS) has been used by many data scientists, and is based on the formula given in Equation (23), where S_d is the feature set of size d, D is the dataset values, and M is the classification model, for which KNN is used. Equation (24) explains how to maintain the monotonicity condition in two subsets of the feature set S_d, where J is the condition.
The conventional SFS selects feature sets using a single layer. The MLSFS preserves the features of a signal until the correlation rates for all of the features are established. Furthermore, MLSFS selects the most correlated features captured from the well-defined correlation rates. It achieved better accuracy in feature reduction, and it is presented in Algorithm 5, which is provided in the supplementary materials section.
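For reference, plain single-layer SFS with a KNN objective, as in Equation (23), can be sketched as below; this is the baseline that MLSFS extends, not MLSFS itself, and the leave-one-out 1-NN objective is an assumption.

```python
import numpy as np

def knn_accuracy(X, y):
    """Leave-one-out 1-NN accuracy, used as the objective J (Equation (23) style)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # exclude each sample itself
    return np.mean(y[d.argmin(axis=1)] == y)

def sequential_forward_selection(X, y, k):
    """Greedy SFS: start empty and repeatedly add the single feature that
    maximises the objective, until k features are selected."""
    selected = []
    while len(selected) < k:
        scores = {f: knn_accuracy(X[:, selected + [f]], y)
                  for f in range(X.shape[1]) if f not in selected}
        selected.append(max(scores, key=scores.get))
    return selected

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([y * 4 + rng.normal(0, 0.3, 40), rng.normal(0, 1, 40)])
print(sequential_forward_selection(X, y, k=1))
```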


Codebook Generation
In order to encode the resultant fused features, a codebook known as a Gaussian mixture model is used. It is a widely accepted method for representing complex information and feature matching [62] based on an expectation maximization (EM) algorithm. The EM algorithm estimates the unknown parameter sets Θ of probabilistic weights, and helps to find the maximum likelihood function by giving an initial parameter set Θ 1 and continuing to apply E and M steps. Then, the EM algorithm generates a sequence {Θ 1 , Θ 2 , . . . , Θ m , . . . } and considers both E and M steps, as in Equations (25) and (26): where, γ m (z j k x j , Θ m ) presents the probability of the jth sample with the kth Gaussian element at the mth iterations along weights ω m k , means µ m k , and covariance ∑ m k values. Similarly, Gaussian mixture regression provides a way of extracting a single generalized signal from the set of features given. Hence, we can clearly retrieve an analytically smooth signal through regression by encoding the temporal signal features [63] into a mixture of Gaussians. This technique takes each vector of the signals' GMM as an input of x I and finds the output x O using GMR.
Finally, GMR is considered to provide better results compared to other stochastic approaches because it gives a fast and logical means to restructure the 'best' sequence from a Gaussian model. Figure 16 provides a glimpse of GMM-GMR encoded vectors for the HuGaDB and mHealth datasets.
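The E and M steps of Equations (25) and (26) can be sketched for a one-dimensional mixture (a toy illustration; the paper's implementation is multivariate, and the GMR stage is omitted here).

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """EM for a 1D Gaussian mixture: the E-step computes responsibilities
    gamma (Equation (25) style); the M-step re-estimates the weights,
    means, and variances (Equation (26) style)."""
    w = np.full(k, 1.0 / k)
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # deterministic spread init
    var = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibility of each Gaussian element for each sample
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances from the responsibilities
        nk = gamma.sum(axis=0)
        w, mu = nk / len(x), (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 0.5, 300), rng.normal(5, 0.5, 300)])
weights, means, variances = em_gmm_1d(x, k=2)
```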


In order to train the RBMs, the visible layer is provided with the input data. Learning adapts the parameter θ such that the probability distribution in Equation (28) becomes maximally similar to the true distribution, i.e., it maximizes the log-likelihood of the observed data. The contrastive divergence (CD) algorithm samples new values for all of the hidden units in parallel, given the current input, in order to give a complete sample (v_data, h_data). Furthermore, it generates a sample for the visible layer and then samples the hidden layer again, yielding a sample from the model, (v_model, h_model). The weights can then be updated according to Equation (31).
Figure 17. Architecture of DBN using RBMs.
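The CD-1 procedure described above can be sketched as follows. This is a toy illustration under stated assumptions (a single binary RBM, two repeated 6-bit patterns, illustrative layer sizes), not the paper's five-layer configuration:

```python
import numpy as np

# Toy RBM trained with one step of contrastive divergence (CD-1).
# Sizes and data are illustrative; the paper's DBN stacks several RBMs.
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.05       # lr matches the reported 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)                  # visible biases b^(v)
b_h = np.zeros(n_hidden)                   # hidden biases b^(h)

# two repeated binary patterns as stand-in training data
V = np.tile(np.array([[1, 1, 1, 0, 0, 0],
                      [0, 0, 0, 1, 1, 1]], dtype=float), (16, 1))

for _ in range(2000):
    # positive phase: sample h given the data v -> (v_data, h_data)
    p_h = sigmoid(V @ W + b_h)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # negative phase: reconstruct v, resample h -> (v_model, h_model)
    p_v = sigmoid(h @ W.T + b_v)
    p_h_model = sigmoid(p_v @ W + b_h)
    # CD-1 update: <v h>_data - <v h>_model
    W += lr * (V.T @ p_h - p_v.T @ p_h_model) / len(V)
    b_v += lr * (V - p_v).mean(axis=0)
    b_h += lr * (p_h - p_h_model).mean(axis=0)

recon_err = np.mean((V - p_v) ** 2)  # reconstruction error after training
```

Stacking such RBMs greedily, each trained on the hidden activations of the previous one, yields the DBN of Figure 17.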


Experimental Performance
In order to evaluate the performance of the DBN classifier [65] for human activity recognition, this paper uses accuracy, sensitivity, specificity, precision, recall, F-measure, and misclassification rate as the performance measures. Accuracy is the proportion of correctly classified SPHR samples [66], as expressed in Equation (32). In Equations (32)-(36), TN, TP, FN, and FP represent true negatives, true positives, false negatives, and false positives, respectively.
Sensitivity measures the proportion of actual positives that are correctly identified, and this is called the true positive rate (TPR). Equation (33) describes the formula used to calculate the sensitivity.
Specificity is defined as the measure of the proportion of negatives that are correctly identified. Equation (34) gives us the formula to measure the specificity, given TN and FP.
Precision is the proportion of true positives correctly identified from total positives. Equation (35) describes the formula for the calculation of precision.
Recall is the proportion of all true positives out of true positives and false negatives. Equation (36) tells us the formula for recall.
where n represents the total number of classes in the classification. The F-measure combines precision and recall into a single measure that captures the quality of both. The misclassification rate can be calculated from the accuracy, as in Equation (38):

Misclassification rate = 1 − accuracy (38)
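The measures in Equations (32)-(38) reduce to simple arithmetic on the four confusion counts. A minimal sketch with hypothetical counts (the TP/TN/FP/FN values are illustrative, not results from the paper):

```python
# Performance measures from hypothetical confusion counts.
TP, TN, FP, FN = 90, 80, 10, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)     # Equation (32)
sensitivity = TP / (TP + FN)                   # Equation (33), TPR
specificity = TN / (TN + FP)                   # Equation (34)
precision = TP / (TP + FP)                     # Equation (35)
recall = TP / (TP + FN)                        # Equation (36)
f_measure = 2 * precision * recall / (precision + recall)
misclassification_rate = 1 - accuracy          # Equation (38)
```

For multi-class SPHR, the same quantities are computed per class and then averaged over the n classes.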

Datasets Description
In order to appraise the testing/training abilities of our proposed model, we used two public benchmark datasets, i.e., the mHealth dataset [67] from the UCI Machine Learning repository, and the HuGaDB dataset [68] from the GitHub repository.
In the mHealth dataset, there are a total of 12 activities with 24 attributes each. It uses 21 attributes for IMU sensors on the chest, left ankle and right arm, two attributes for the ECG sensor, and one attribute for the labels describing the activity performed. The dataset represents 10 subjects and locomotion activities: standing still, sitting and relaxing, lying down, walking, climbing stairs, waist bending forward, the frontal elevation of arms, knees bending (crouching), cycling, jogging, running, and jumping back and forth. Each subject had all of the above-mentioned sensors attached, with a frequency of 50 Hz.
The second dataset used to evaluate the performance was a human gait database. It consists of 12 activities, with 39 attributes for each activity: 36 attributes for the IMUs, two attributes for the EMG sensors, and the last attribute for the activity label. This dataset was collected from 18 subjects with repeated activities. The activities were walking, running, going up, going down, sitting, sitting down, standing up, standing, bicycling, going up by elevator, going down by elevator, and sitting in a car. Six IMUs and two EMG sensors were attached to each subject, and a sampling rate of 1000 Hz was used.
In our work, the data from all of the subjects were separated according to the nature of the sensors and then preprocessed to remove noise. Finally, the signals were split into windows of 5 s each, with 12 overlapping samples. We used the leave-one-subject-out (LOSO) [69] cross-validation technique for the training and testing.
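The windowing and LOSO protocol can be sketched as follows. The per-subject signals below are placeholders; the 5 s window with 12 overlapping samples follows the description above, assuming the mHealth 50 Hz sampling rate:

```python
import numpy as np

# Sketch of the evaluation protocol: fixed-length overlapping windows per
# subject, then leave-one-subject-out (LOSO) folds. Signals are placeholders.
def make_windows(signal, win_len, overlap):
    step = win_len - overlap
    return np.array([signal[s:s + win_len]
                     for s in range(0, len(signal) - win_len + 1, step)])

fs = 50                    # mHealth sampling rate (Hz)
win_len = 5 * fs           # 5-second windows
overlap = 12               # overlapping samples between adjacent windows

rng = np.random.default_rng(0)
subjects = {s: rng.standard_normal(2000) for s in range(10)}
windows = {s: make_windows(x, win_len, overlap) for s, x in subjects.items()}

# LOSO: each subject in turn is held out for testing.
for test_subject in subjects:
    train = np.concatenate([w for s, w in windows.items() if s != test_subject])
    test = windows[test_subject]
    # ... fit the model on `train`, evaluate on `test` ...
```

Each fold trains on nine subjects and tests on the held-out one, so the reported scores reflect generalization to unseen individuals.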

Results Evaluations
The experiments were performed on a laptop with an Intel i7-8550 CPU, 24 GB of RAM, and an NVIDIA GeForce GTX GPU with 2 GB of memory. The programming tool was MATLAB, with multiple frameworks available in the tool and online. For efficient results, the sample data from HuGaDB were sent to the Gaussian mixture models in batches of half the sample length for the walking activity. The model used a deep belief network with five RBM layers in order to minimize the reconstruction errors, and set the number of training samples according to the cross-validation. The RBMs use CD as the sampling method. The learning rate for each RBM was set to 0.05, and the model uses discriminative RBMs, as explained in Figure 18.

In the first layer of the RBMs, we set the number of nodes to the number of input variables. The second, third, fourth, and fifth RBM layers had 500, 500, 500, and 1000 nodes, respectively. All of the training and testing sample sets from the cross-validation were examined one after the other in order to see which set performed best. By training and testing on the test set, the classification confusion matrices were produced: see Table 1 for the mHealth dataset and Table 2 for the HuGaDB dataset. It can be observed from Figure 19, column 1, that some activities produce signals that lie very close to each other, i.e., standing still, sitting, and lying down.
Similarly, the walking, running, and jogging signals resemble each other in column 2. It is important to note that our proposed model is able to distinguish between such activities, with decent accuracy rates of 93.33% for the mHealth dataset and 92.50% for the HuGaDB dataset. Comparisons of the sensitivity and specificity are given in Table 3 for the mHealth dataset and Table 4 for the HuGaDB dataset. Table 5 shows the precision, recall, and F-measure for each activity for both datasets. A comparison between the different layers of the RBMs in terms of time and number of iterations is presented in Table 6. Parameter tuning [70] is an important step for a DBN. Hence, a batch size of 15 samples was used, the weight cost for each node was set to 0.0002, and a maximum of seven epochs per layer was proposed as the list of tuned parameters [71]. The reconstruction error for each layer decreases as the RBM stack moves towards the next layer. The time in seconds and the number of nodes are also reported.

Table 6. Comparisons of the RBM layers in the deep belief network for the mHealth and HuGaDB datasets.

Table 7 presents the comparative study results using the accuracies of the proposed model and other well-known classifiers and methodologies, i.e., random forest, artificial neural networks, ensemble algorithms, Adam-based optimization, decision trees, SVM, kNN, and the Hampel estimator. The overall results show that the proposed model achieved better classification results using a deep belief network and discriminative RBMs, which constitutes a novel contribution to SPHR. The proposed HF-SPHR model has to be assessed and adjusted according to the following challenges:

• In its actual implementation, pattern recognition challenges were faced when the same activity was performed by different individuals.
• Wearable-sensors-based architectures are susceptible to placement changes and other locomotion activities.

We used other state-of-the-art classifier techniques, such as random forest and AdaBoost, for comparison with the proposed DBN and RBM model. Table 8 shows that the DBN significantly outperforms the other classifiers with regard to its accuracy rate.
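A baseline comparison of the kind reported in Table 8 can be sketched with off-the-shelf classifiers. The feature matrix and labels below are placeholders, not the paper's extracted hybrid features:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder features/labels standing in for the hybrid feature vectors.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 5-fold cross-validated accuracy for each baseline classifier.
scores = {}
for name, clf in [("RandomForest", RandomForestClassifier(random_state=0)),
                  ("AdaBoost", AdaBoostClassifier(random_state=0))]:
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
```

In the paper, the same protocol is applied with LOSO folds, and the proposed DBN/RBM model is evaluated alongside these baselines.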

Discussion
This paper proposed a robust, sustainable system with consistent performance across different challenging datasets; because elderly and disabled individuals [82] mostly stay indoors, two indoor activity-based datasets were used for stability. The proposed HF-SPHR system produced a good-quality performance on both datasets, handling the problems of varying human activities and a variety of signal shapes thanks to the incorporation of multiple types of sensors. The actions performed in both datasets are complex, because the movements involved in performing most of the activities are quite similar (namely, jogging, running, walking and standing, sitting, lying down), as described in Figure 19. However, HF-SPHR remained composed and reliable in recognizing and distinguishing between similar actions due to the robust hybrid features. The proposed system showed high accuracy, specificity, precision, recall, and F-measure rates.
The ECG cycle extraction was challenging due to the similarity between actions such as lying down and sitting. In the feature extraction phase, the QRS complex was identified successfully using a few important ECG peak rules, followed by the extraction of the P wave, T wave, R wave, and R-R intervals as features of the ECG signals. However, the similarity between some actions caused the QRS complex cycles to overlap more significantly with each other in a few instances. For example, in classes such as jogging and running, the QRS complex cycles overlapped at some points. As such, the performance for such actions was compromised due to the overlapping of the QRS complexes. However, our system offered features from different domains (namely, WPE and MFCC in hybrid form) to keep the performance at a high level.
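To make the R-wave and R-R-interval features concrete, the following sketch locates R peaks in a synthetic ECG-like signal with SciPy's find_peaks. The signal, sampling rate, and thresholds are illustrative stand-ins for the paper's peak rules:

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic ECG-like trace: sharp positive peaks at ~1.2 Hz ("heartbeats").
fs = 50                                   # Hz, illustrative sampling rate
t = np.arange(0.0, 10.0, 1.0 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 63   # odd power keeps peaks sharp

# R-peak detection: amplitude threshold plus a refractory distance.
r_peaks, _ = find_peaks(ecg, height=0.5, distance=int(0.4 * fs))
rr_intervals = np.diff(r_peaks) / fs      # R-R intervals in seconds
```

A real pipeline would apply full QRS detection rules (e.g., Pan-Tompkins-style filtering) before measuring the P, T, and R waves.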

Conclusions
This paper proposed a robust model for Sustainable Physical Healthcare Pattern Recognition with hybrid feature manipulation and Gaussian mixture models. It also suggested the application of a deep belief network classifier with discriminative RBMs, which automatically extracts features and reduces the dependence on domain experts. This model achieved excellent recognition results. HF-SPHR can also serve as a deep learning model that efficiently and sustainably recognizes activities. By introducing MFCC, entropy, and other features, HF-SPHR extracts the raw data from different sensors more comprehensively, extracts more relevant features, and increases the diversity of the feature sets. The experiments also revealed the influence of the HF-SPHR model in terms of accuracy, sensitivity, specificity, precision, recall, and the F-measure. HF-SPHR helped in constructing an ideal human behavior recognition model. It is worth mentioning that the proposed HF-SPHR technique recognized static activities with lower accuracies than dynamic activities, and further improvements are necessary there. It will be of interest to see how the model performs for complex activities.