Electronics · Article · Open Access · 30 March 2023

Human Activity Recognition Based on Two-Channel Residual–GRU–ECA Module with Two Types of Sensors

1 Faculty of Electrical Engineering & Computer Science, Ningbo University, Ningbo 315040, China
2 Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315040, China
* Author to whom correspondence should be addressed.

Abstract

With the thriving development of sensor technology and pervasive computing, sensor-based human activity recognition (HAR) has become more and more widely used in healthcare, sports, health monitoring, and human interaction with smart devices. Inertial sensors are among the most commonly used sensors in HAR. In recent years, the demand for comfort and flexibility in wearable devices has gradually increased, and with the continuous development of flexible electronics technology, attempts have begun to incorporate stretch sensors into HAR. In this paper, we propose a two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU); the model performs long-term sequence modeling of the data, efficiently extracts spatial–temporal features, and classifies activities. A dataset named IS-Data was designed and collected from six subjects wearing stretch sensors and inertial sensors while performing six daily activities. We conducted experiments on IS-Data and on a public dataset called w-HAR to validate the feasibility of using stretch sensors in human action recognition and to investigate the effectiveness of combining flexible and inertial data in human activity recognition. Our proposed method showed superior performance and good generalization compared with state-of-the-art methods.

1. Introduction

Human activity recognition (HAR) is the process of extracting and capturing effective action information features from behavioral data produced by the human body, learning and understanding the actions performed by humans, and further determining the type of action. HAR combines knowledge from machine learning, intelligent biotechnology, smart wearable computing technology, computer vision, and many other disciplines. It plays a crucial role in recognizing a user’s interactions with his or her surroundings. HAR has therefore received a great deal of attention in applications such as home automation systems, medical rehabilitation, motion monitoring, fall prevention, and maladaptive behavior recognition [,]. For instance, HAR can be used to monitor the daily activities of elderly people who live alone, observing whether elderly individuals experience a fall so that they can seek help from their families in time, or even preventing such a fall []. It can also be used to collect data on the movement status of exercisers for real-time scientific exercise management [], to analyze the limb movement status of patients with neurological disorders for diagnosis and treatment, or, according to an individual patient’s movement data, to formulate and adjust the rehabilitation plans in the rehabilitation phase [].
HAR can be broadly classified into video-based and sensor-based activity recognition [].
The camera-based method extracts human activity features from images and video streams captured by cameras placed in the user's surroundings. Although this method provides a more intuitive view of the details of an action, it is restricted to specific, structured scenarios: the camera imposes strict requirements on external conditions such as weather, lighting, and viewpoint orientation when collecting data, and the clothing and height of the human targets will differ []. In addition, the presence of cameras may be considered invasive of the user's privacy, and the sheer volume of video data increases the overall computational cost.
The sensor-based method uses multimodal information that is obtained using various wearable sensors to identify, interpret, and evaluate human activities. Initially, sensor technology was mostly used to analyze human gait and joint kinematics to assist in medical diagnosis and rehabilitation monitoring. Sensor technology has since made significant breakthroughs in several key performance indicators, such as better accuracy, smaller size, and lower manufacturing costs []. These advantages have led to a wider range of sensor applications. Compared with the equipment needed for the vision-based method, sensors are portable, lightweight, and can easily be integrated with other devices. They enable the continuous sensing of movements during daily activities without limiting the user’s movement behavior, and they improve the way that users interact with each other and with their surroundings.
Inertial and flexible sensors are commonly used wearable sensors, and inertial sensors have been successfully applied to HAR. However, because inertial sensors are mostly made of rigid materials with poor ductility, wearable devices incorporating them are bulky and fit the skin poorly, leading to comfort problems, and their detection accuracy is strongly affected by the speed of movement []. Researchers have therefore turned their attention to flexible sensors, which are less affected by motion speed and are made of thin, light, and soft materials [,]. The most prominent of these flexible sensors are stretch sensors, which can be sewn onto clothing to capture bending or straightening at joints, or breathing and heartbeat. This paper focuses on HAR based on inertial and flexible sensors.
Traditional approaches to activity recognition primarily use machine learning methods to manually extract features from sensor data. These are typically statistical or structural features, such as means, medians, and standard deviations, and domain expertise is often required to obtain the most relevant set of handcrafted features. While such handcrafted features perform well when training data are limited, extracting effective features becomes complicated and difficult as the number of sensors increases.
These problems can be solved by deep learning algorithms with convolutional neural networks (CNNs), recurrent neural networks (RNNs), deep belief networks, and autoencoders. CNNs excel in local feature extraction [], but CNNs do not consider pattern sequences or remember changes in pattern sequences over time based on the lengths of gaps between them []. On the other hand, RNNs have the advantage of being able to effectively capture time series data features, though traditional RNNs suffer from the vanishing gradient problem and lack the ability to capture long-term dependencies []. Therefore, to overcome the above challenges, we propose a two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU).
The main contributions of this paper are as follows:
  • A sensor-based human activity recognition framework which consists of inertial sensors and stretch sensors is proposed to investigate the effectiveness of different combinations of sensor data in human activity recognition and to verify the feasibility of flexible stretch sensors in human action recognition.
  • A two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU), and which is capable of modelling long-term sequences of data, effectively extracting spatial–temporal features, and performing activity classification, is proposed.
  • The IS-Data dataset was produced, and data were collected using self-developed stretch sensors and inertial sensors. The proposed model was tested on the IS-Data dataset and the publicly available w-HAR dataset, and in both cases the results outperformed the existing solutions.
The remainder of this paper is organized as follows: Section 2 introduces an overview of related work on HAR and deep learning approaches. The proposed methodology, including the experimental framework and model structure, is described in Section 3. Section 4 introduces the dataset, experimental content, evaluation metrics, ablation studies, and analysis of experimental results. Finally, the conclusion of the study is discussed in Section 5.

3. Materials and Methods

In this study, we propose a deep learning-based human activity recognition framework using wearable sensors. As a first step, activity data are collected using inertial sensors and stretch sensors. In the second step, the raw data are pre-processed with data segmentation, data normalization, etc. In the third step, the processed data are fed into the proposed neural network model for feature extraction and classified using a classification layer with SoftMax activation. Finally, the model performance is evaluated by standard assessment methods such as accuracy, precision, and F1 score. The overall structure of the framework is shown in Figure 1.
Figure 1. A deep learning-based human activity recognition framework using wearable sensors.

3.1. The Proposed Network Model

The structure of the proposed network model is shown in Figure 2. The proposed model consists of two main paths: one uses a GRU to model the sensor data as a long-term time series and extract valid temporal representation features, and the other uses lightweight residual blocks and efficient channel attention (ECA) modules to extract valid spatial representation features.
Figure 2. Our network model.
Finally, the temporal and spatial representations generated by the two paths are concatenated and fed into a multilayer perceptron (MLP) with a SoftMax function for final activity classification.
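The sketch below shows one way the two paths could be assembled in Keras (TensorFlow 2.x, the framework used in Section 4.3). The layer widths, the window length of 125 samples (2.5 s at 50 Hz), and the seven input channels are illustrative assumptions rather than the exact configuration of the paper, and the spatial path is shown with a plain convolution as a placeholder for the residual and ECA blocks detailed below.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_two_channel_model(window_len=125, n_channels=7, n_classes=6):
        # Input: one window of sensor data with shape (L, N), as in Section 4.2.
        inputs = layers.Input(shape=(window_len, n_channels))

        # Temporal path: a GRU models long-term dependencies across the window.
        temporal = layers.GRU(64)(inputs)

        # Spatial path: placeholder convolution standing in for the residual
        # blocks and ECA modules described in Sections 3.1.2 and 3.1.3.
        x = layers.Conv1D(64, 3, padding="same", activation="relu")(inputs)
        spatial = layers.GlobalAveragePooling1D()(x)

        # Concatenate both representations and classify with an MLP + SoftMax.
        merged = layers.Concatenate()([temporal, spatial])
        hidden = layers.Dense(128, activation="relu")(merged)
        outputs = layers.Dense(n_classes, activation="softmax")(hidden)
        return Model(inputs, outputs)

    model = build_two_channel_model()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])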

3.1.1. Gated Recurrent Unit

Considering the temporal dependencies between data frames, RNNs can be used to model sensor data as long-term time series and extract effective temporal representation features. However, traditional RNNs suffer from the vanishing gradient problem and cannot capture long-term dependencies []. We therefore used a variant of the RNN, the GRU, to capture long-term dependencies. Although an LSTM can also address the vanishing gradient problem, the memory units of the LSTM architecture increase memory consumption and are prone to overfitting []. The GRU, in contrast, has no separate memory cell; it has only an update gate and a reset gate. The update gate manages the update level of each hidden state and determines which information is passed to the next step; it is computed according to Equation (1). The reset gate determines which unwanted information can be discarded and is computed according to Equation (2); the candidate output and the hidden state are then updated according to Equations (3) and (4).
Through these gates, the GRU can remember important information and model the temporal context of the sequence data. The GRU structure is shown in Figure 3. x_t is the current input, h_{t−1} is the previous hidden state, z_t and r_t are the update gate and reset gate, respectively, h_t is the output of the GRU unit at timestamp t, and g_t is the candidate output. h_t is updated using g_t; the update gate z_t determines when to update h_t; the reset gate r_t is used to compute the candidate value g_t and controls how much of h_{t−1} contributes to the next candidate for h_t.
z_t = σ(W_z x_t + U_z h_{t−1})    (1)
r_t = σ(W_r x_t + U_r h_{t−1})    (2)
g_t = tanh(W_g x_t + U_g (r_t ⊙ h_{t−1}))    (3)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ g_t    (4)
where W_z, W_r, W_g, U_z, U_r, and U_g are weight matrices, + denotes element-wise addition, and ⊙ denotes element-wise (Hadamard) multiplication.
Figure 3. Structure of the gated recurrent unit.
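As a sanity check on Equations (1)–(4), the minimal NumPy sketch below implements a single GRU step exactly as written above; bias terms are omitted to match the equations, and the dimensions in the usage example are arbitrary illustrative choices.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_g, U_g):
        """One GRU step following Equations (1)-(4)."""
        z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate, Eq. (1)
        r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate, Eq. (2)
        g_t = np.tanh(W_g @ x_t + U_g @ (r_t * h_prev))  # candidate output, Eq. (3)
        h_t = (1.0 - z_t) * h_prev + z_t * g_t           # new hidden state, Eq. (4)
        return h_t

    # Run the GRU over one window of sensor frames (illustrative sizes only).
    rng = np.random.default_rng(0)
    d_in, d_hid, T = 7, 16, 125
    W_z, W_r, W_g = (rng.standard_normal((d_hid, d_in)) for _ in range(3))
    U_z, U_r, U_g = (rng.standard_normal((d_hid, d_hid)) for _ in range(3))
    h = np.zeros(d_hid)
    for x_t in rng.standard_normal((T, d_in)):
        h = gru_step(x_t, h, W_z, U_z, W_r, U_r, W_g, U_g)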

3.1.2. Efficient Channel Attention (ECA) Module

The channel attention mechanism has been proven to have great potential for improving the effectiveness of convolutional neural networks. Therefore, to resolve the tension between performance and complexity [,], we applied a more efficient channel attention module (ECA-Net) in our construction [].
Unlike SENet, ECA-Net achieves significant performance gains while adding only a small number of parameters, avoiding the information loss that can result from dimensionality reduction. Its appropriate cross-channel interaction significantly reduces model complexity while preserving performance, making ECA a novel way to model the interaction between different channels. Figure 4 shows a schematic diagram of the ECA module. Given the aggregated features obtained through global average pooling (GAP), ECA generates channel weights by performing a 1D convolution of size k, where k is determined adaptively through a mapping of the channel dimension C.
Figure 4. Diagram of the efficient channel attention (ECA) block.
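The following is a minimal Keras sketch of an ECA block adapted for the 1D feature maps produced by the convolutional path (shape (batch, time, channels)). The adaptive kernel-size mapping with γ = 2 and b = 1 follows the ECA-Net paper; the layer name and the remaining configuration are assumptions, not the exact implementation used in this work.

    import math
    import tensorflow as tf
    from tensorflow.keras import layers

    class ECA1D(layers.Layer):
        """Efficient channel attention for feature maps of shape (batch, time, channels)."""
        def __init__(self, gamma=2, b=1, **kwargs):
            super().__init__(**kwargs)
            self.gamma, self.b = gamma, b

        def build(self, input_shape):
            channels = int(input_shape[-1])
            # Adaptive kernel size k from the channel dimension C (nearest odd value).
            k = int(abs((math.log2(channels) + self.b) / self.gamma))
            self.k = k if k % 2 else k + 1
            self.conv = layers.Conv1D(1, kernel_size=self.k, padding="same", use_bias=False)

        def call(self, x):
            y = tf.reduce_mean(x, axis=1)        # GAP over time -> (batch, C)
            y = tf.expand_dims(y, axis=-1)       # (batch, C, 1)
            y = tf.sigmoid(self.conv(y))         # local cross-channel interaction
            y = tf.transpose(y, [0, 2, 1])       # (batch, 1, C)
            return x * y                         # rescale the input channels

    # Usage: attended = ECA1D()(conv_features)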

3.1.3. Residual Blocks

The human activity data captured by sensors form a time series that is highly correlated over time, and the local features of this correlation must be extracted efficiently. While convolutional networks have been shown in the literature to have a powerful ability to extract and learn spatial features from time series data, information loss and vanishing gradients remain major problems as networks become deeper. ResNet [] addresses this problem by linking convolutional layers with shortcut residual connections that help gradients flow through the network. Building on this, we constructed a lightweight residual module consisting of two convolutional layers, each followed by batch normalization (BN) [] and a rectified linear unit (ReLU) activation; its structure is shown in Figure 5 and can be defined as
y^{h+1} = I(y^h) + F(y^h, ω^h)    (5)
where F is the residual function, ω^h = {ω^h_{jk} | 1 ≤ j ≤ n, 1 ≤ k ≤ m} is the set of weights of the two layers associated with the h-th residual block, and ω^h_{jk} is the weight connecting neuron k in the first layer to neuron j in the second layer. The function I denotes the identity mapping. Each block accepts an input y^h and produces an output y^{h+1}.
c(y^h) = Conv(y^h)    (6)
b(y^h) = ReLU(BN(c(y^h)))    (7)
Figure 5. Structure of the residual block.
A basic residual module unit can be reconstructed and calculated as
y^{h+1} = I(y^h) + b_2(b_1(y^h))    (8)
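A minimal Keras sketch of such a block is given below; it follows Equations (6)–(8), with two Conv1D layers each followed by BN and ReLU, added to an identity shortcut. The kernel size, filter count, and the 1×1 projection used when channel counts differ are assumptions for illustration.

    from tensorflow.keras import layers

    def residual_block(y, filters=64, kernel_size=3):
        """Lightweight residual block: two Conv1D layers, each followed by BN and
        ReLU, added to an identity shortcut as in Equation (8)."""
        shortcut = y
        x = layers.Conv1D(filters, kernel_size, padding="same")(y)    # c(y^h), Eq. (6)
        x = layers.ReLU()(layers.BatchNormalization()(x))             # b1(y^h), Eq. (7)
        x = layers.Conv1D(filters, kernel_size, padding="same")(x)
        x = layers.ReLU()(layers.BatchNormalization()(x))             # b2(b1(y^h))
        if shortcut.shape[-1] != filters:
            # Project the shortcut when channel counts differ (illustrative choice).
            shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
        return layers.Add()([shortcut, x])                            # I(y^h) + b2(b1(y^h))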

4. Experimental Results and Discussion

In order to validate the feasibility of stretch sensor data in human activity recognition, to evaluate the effectiveness of data from different sensors in human activity recognition, and to test the performance of the proposed network model, we performed experimental tests with data from inertial sensors, data from stretch sensors, and combinations of data from inertial sensors and stretch sensors in the IS-Data and w-HAR datasets, respectively. More information concerning activities recorded in the w-HAR dataset and the IS-Data dataset is shown in Table 1.
Table 1. Dataset activity lists.

4.1. Experimental Dataset

4.1.1. w-HAR Dataset

The w-HAR dataset [] was built by researchers from Washington State University. They mounted an IMU on the right ankle of the user and sewed a stretch sensor onto a knee sleeve. Twenty-two subjects aged 20–45 years (14 males and 8 females) carried out 8 activities (jumping, lying down, sitting, walking downstairs, walking upstairs, standing, walking, and transition). The acceleration and angular velocity data from the IMU were collected at a sampling frequency of 250 Hz, and the stretch sensor was sampled at 25 Hz. In this study, we only used six of these activities (jumping, sitting, walking downstairs, walking upstairs, standing, and walking). The IMU sensor data were downsampled from 250 Hz to 25 Hz to reduce the computational cost.
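One straightforward way to perform this downsampling is to decimate the 250 Hz IMU stream by a factor of 10, as sketched below; the paper does not state which method it used, and the array here is a random stand-in for one subject's recording.

    import numpy as np
    from scipy.signal import decimate

    # Stand-in for one subject's IMU recording at 250 Hz (3-axis accel + 3-axis gyro).
    imu_250hz = np.random.randn(2500, 6)

    # Decimate by 10 to match the 25 Hz stretch-sensor rate; decimate() applies an
    # anti-aliasing low-pass filter before subsampling.
    imu_25hz = decimate(imu_250hz, q=10, axis=0, zero_phase=True)
    print(imu_25hz.shape)  # (250, 6)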

4.1.2. IS-Data Dataset

The dataset produced for this paper was obtained using a self-developed six-axis inertial sensor and a stretch sensor. As is shown in Figure 6, the stretch sensor used in this study was a capacitive sensor, where the output capacitance increased with increasing strain and decreased with decreasing strain as the sensor extended and contracted.
Figure 6. Diagram of the flexible sensor.
The stretch sensor module was sewn onto the knee brace and connected to a circuit module. The bending angle was calculated by measuring the capacitance of the sensor, and the angle data were then output.
Two stretch sensors were fitted to the knees of the participants, as shown in the figure. When wearing the device, the user simply put on the knee pads and ensured that the stretch sensors were aligned with the center of the kneecap. In addition, three six-axis inertial sensors were placed on the wrist, waist, and ankle. All of the sensors used Bluetooth Low Energy (BLE) to transmit their data to a smartphone, where the data were stored in a file. The dynamic ranges of the accelerometer and gyroscope outputs were set to ±8 g and ±2000 dps, respectively. Both types of sensors used the same 50 Hz sampling frequency.
Following this, six adult participants with an average age of 26 years wore these sensors and performed the following six activities: walking, standing, sitting, stepping in place, running in place, and jumping. We required the participants to be healthy and comfortable so that they could perform these activities as naturally as possible. The resulting dataset was named IS-Data.

4.2. Pre-Processing

Since the data from the sensors were transmitted via a wireless Bluetooth connection, some of the data may have been lost, or they may have contained some noise during the acquisition process, so a linear interpolation algorithm was used to fill in the missing values and a Butterworth filter was used to filter the noise from the data signal. The raw data that were collected were normalized to a range of 0 to 1 using Equation (9).
A_i^n = (A_i − a_i^min) / (a_i^max − a_i^min), i = 1, 2, …, n    (9)
where A_i^n denotes the normalized data of the i-th channel, n denotes the number of channels, and a_i^max and a_i^min are the maximum and minimum values of the i-th channel, respectively.
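A minimal sketch of this cleaning and normalization step is shown below. The Butterworth cutoff frequency and filter order are illustrative assumptions, since the paper does not report the exact filter parameters.

    import numpy as np
    import pandas as pd
    from scipy.signal import butter, filtfilt

    def clean_and_normalize(raw, fs=50, cutoff=10, order=4):
        """Fill missing samples by linear interpolation, low-pass filter with a
        Butterworth filter, and min-max normalize each channel to [0, 1] (Eq. (9))."""
        # Linear interpolation over NaN gaps caused by dropped Bluetooth packets.
        filled = pd.DataFrame(raw).interpolate(method="linear",
                                               limit_direction="both").to_numpy()
        # Zero-phase Butterworth low-pass filter to suppress high-frequency noise.
        b, a = butter(order, cutoff, btype="low", fs=fs)
        filtered = filtfilt(b, a, filled, axis=0)
        mins, maxs = filtered.min(axis=0), filtered.max(axis=0)
        return (filtered - mins) / (maxs - mins + 1e-8)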
The processed data were then segmented using a sliding window of size L, corresponding to 2.5 s, with an overlap of 50%, where the sample data F_W are represented as
F_W = [a_x, a_y, a_z, g_x, g_y, g_z, f_1] ∈ R^{L×N}    (10)
where a_x, a_y, a_z, g_x, g_y, g_z, and f_1 are column vectors containing L samples of triaxial acceleration, triaxial angular velocity, and stretch information, respectively. N is the number of sensor channels and L is the length of the sliding window. Finally, all the column vectors are joined together into the window data F_W ∈ R^{L×N}.
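The segmentation itself can be sketched as follows; the majority-label rule used to assign one activity label per window is an assumption, since the paper does not state how window labels were derived.

    import numpy as np

    def sliding_windows(data, labels, fs=50, window_s=2.5, overlap=0.5):
        """Segment a (samples, channels) stream into windows F_W of L samples
        with 50% overlap."""
        L = int(window_s * fs)            # e.g., 125 samples at 50 Hz
        step = int(L * (1 - overlap))     # 50% overlap -> step of L/2
        windows, window_labels = [], []
        for start in range(0, len(data) - L + 1, step):
            windows.append(data[start:start + L])
            # Assign the majority activity label within the window (assumption).
            window_labels.append(np.bincount(labels[start:start + L]).argmax())
        return np.stack(windows), np.array(window_labels)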
Details concerning the data preprocessing, such as the sampling rate, window size, and overlap rate, are presented in Table 2. In the experiments, each dataset was divided into a training set (70%) and a validation set (30%). The other hyper-parameters, such as the training epochs, batch size, and learning rate, are also listed in Table 2.
Table 2. Summary of setup for datasets.
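For reference, the split and training loop could look as follows, assuming the windows and labels produced above and the model sketched in Section 3.1; the file names and hyper-parameter values are placeholders and should be replaced by the values in Table 2.

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.load("is_data_windows.npy")   # (num_windows, L, N); placeholder file name
    y = np.load("is_data_labels.npy")    # integer activity labels; placeholder file name

    # 70% training / 30% validation split, as described above.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                      random_state=42)

    # build_two_channel_model is the sketch from Section 3.1; epochs, batch size,
    # and learning rate are placeholders for the values in Table 2.
    model = build_two_channel_model(window_len=X.shape[1], n_channels=X.shape[2],
                                    n_classes=len(np.unique(y)))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=100, batch_size=64)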

4.3. Experimental Environment

The network in this paper was trained on a computer equipped with an Intel Core i7 CPU, 16 GB of RAM, and a graphics processing unit (GPU) (NVIDIA GeForce RTX 3060 with 6 GB of memory). The algorithm was implemented in Python 3.8 using Google's open-source deep learning framework TensorFlow 2.3.0, and the development environment used for the experiments was PyCharm on a 64-bit version of Windows 11. The GPU was used to speed up the training and testing of the model.

4.4. Evaluation Metrics

In this paper, the effectiveness of the proposed model was calculated using different performance metrics [], described as follows:
Accuracy: this is defined as the fraction of samples predicted correctly to the total number of samples.
Precision: the fraction of positive samples recognized correctly out of the total number of samples recognized as positive.
Recall: the fraction of positive samples recognized correctly out of the total number of positive samples.
F1-score: this is a comprehensive estimate of the model’s accuracy and can be calculated as the harmonic mean of the precision and recall.
Confusion matrix (CM): This is a square matrix that gives the complete performance of a classification model. The rows of the CM signify instances of the true class labels, and columns signify predicted class labels. The diagonal elements of this matrix define the number of points for which the predicted label is equal to the true label.
Consider a multiclass classification problem with a set A of n distinct class labels B_i (i = 1, 2, 3, …, n), represented by {B_1, B_2, B_3, …, B_n}. In this case, the confusion matrix is an n × n matrix. Each row of the matrix corresponds to an actual instance of a class, and each column corresponds to a predicted instance of a class. An element C_ij of the confusion matrix specifies the number of cases for which the actual class is i and the predicted class is j, as shown in Figure 7.
Figure 7. Confusion matrix for a multiclass classification problem.
These are mathematically expressed as:
Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100%
Precision (%) = TP / (TP + FP) × 100%
Recall (%) = TP / (TP + FN) × 100%
F1 score (%) = 2 × Precision × Recall / (Precision + Recall) × 100%
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
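These per-class quantities can be read directly from the confusion matrix; the short NumPy sketch below is one way to do so, using the row/column convention of Figure 7 (rows = true classes, columns = predicted classes).

    import numpy as np

    def per_class_metrics(conf):
        """Per-class precision, recall, and F1, plus overall accuracy, from an
        n x n confusion matrix with rows = true classes and columns = predictions."""
        tp = np.diag(conf).astype(float)
        fp = conf.sum(axis=0) - tp        # predicted as the class but actually another
        fn = conf.sum(axis=1) - tp        # of the class but predicted as another
        precision = tp / (tp + fp + 1e-8)
        recall = tp / (tp + fn + 1e-8)
        f1 = 2 * precision * recall / (precision + recall + 1e-8)
        accuracy = tp.sum() / conf.sum()
        return precision, recall, f1, accuracy

    # Example with a small 3-class confusion matrix.
    conf = np.array([[50, 2, 0],
                     [3, 45, 1],
                     [0, 2, 47]])
    precision, recall, f1, accuracy = per_class_metrics(conf)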

4.5. Experiment Analysis and Performance Comparison

4.5.1. Inertial Sensor and Stretch Sensor Results

A summary of the recognition precision, recall, and F1 scores for the various activities recorded in the IS-Data dataset and the w-HAR dataset is shown in Table 3 and Table 4. As Table 3 and Table 4 illustrate, for the three activities of walking, standing, and sitting, the stretch sensor recognition precision values are 99.16%, 99.61%, and 100% for the IS-Data dataset, which are higher than the corresponding values for the inertial sensor. For the w-HAR dataset, the stretch sensor precision values are 96.68%, 97.31%, and 98.56%, with little difference from the classification results of the inertial sensors. For jumping, the precision values of the stretch sensor are 93.6% and 94.41% in the two datasets; although these are not as high as those of the inertial data, they achieve comparable recognition performance. Thus, the stretch sensor performs very well when monitoring activities that involve more pronounced joint movement, and it has good potential for differentiating lower-limb activities with similar patterns.
Table 3. The recognition precision, recall, and F1 score for various activities in the IS-Data dataset (%).
Table 4. The recognition precision, recall, and F1 score for various activities in the W-HAR dataset (%).
Confusion matrices for the proposed model on the IS-Data dataset and the w-HAR dataset are illustrated in Figure 8 and Figure 9. As shown in Figure 8, the stretch sensor produces many misclassifications between walking in place and running in place, as these two activities involve very similar lower-limb movements. Similarly, walking upstairs and walking downstairs are easily confused, as shown in Figure 9. The degree of lower-limb joint flexion alone cannot distinguish running in place from walking in place well, and the inertial sensors are slightly more effective in this case.
Figure 8. Confusion matrix obtained for inertial sensors (a) and stretch sensors (b) on the IS-Data dataset.
Figure 9. Confusion matrix obtained for inertial sensors (a) and stretch sensors (b) on the w-HAR dataset.

4.5.2. Results of Recognition of the Combination of Inertial Sensors and Stretch Sensors

For some very similar activities, the final recognition results are not very good, whether based on the inertial sensors or the stretch sensors. We therefore combined the two types of sensor data to test whether they could improve the recognition of the activities, and we compared the experimental results with data from the inertial sensors and the stretch sensors, respectively.
The confusion matrix obtained by combining the data from the two sensors is shown in Figure 10. The results obtained from the data of both sensors using the IS-Data dataset and the w-HAR dataset are shown in Table 5 and Table 6, respectively. The combined use of data from the two sensors, as described in the previous subsection, led to an improvement in the identification of similar actions. For the activity of walking in place in the IS-Data dataset, there were increases in precision of 30.42% and 1.82% compared with the stretch data and inertial data, respectively. For the activity of running in place, the precision increased by 29.15% and 3.32%, respectively. In the w-HAR dataset, respective precision increases of 15.15% and 6.06% were obtained for walking upstairs compared with the stretch data and inertial data. The precision for walking downstairs increased by 6.66%.
Figure 10. Confusion matrices obtained for the combination of inertial sensor data and stretch sensor data using the IS-Data dataset (a) and for the combination of inertial sensor data and stretch sensor data using the w-HAR dataset (b).
Table 5. The recognition precision, recall, and F1 score for various activities in the IS-Data dataset (%).
Table 6. The recognition precision, recall, and F1 score for various activities in the W-HAR dataset (%).
In Figure 11, the accuracy of the combination of inertial sensor and stretch sensor data in the IS-Data dataset is 97.51%, which is 11.27% higher than the accuracy of stretch sensor data alone, and about 1.4% better than the accuracy of the inertial sensor data alone, which had an accuracy of 96.11%. Similarly, the accuracy of the combination of inertial sensor and stretch sensor data in the w-HAR dataset is 98.24%, which is approximately 2.49% better than that of the stretch sensor data alone, and about 0.58% better than the accuracy of the inertial sensor data alone, which was 97.66%.
Figure 11. Accuracy of the three types of data in the w-HAR dataset and the IS-Data dataset.
Overall, the combination of inertial and stretch sensor data contributes to the effectiveness of activity recognition.

4.5.3. On the Performance of the Proposed Network Model

The proposed model will now be compared with state-of-the-art deep learning approaches in the scope of HAR.
As is shown in Table 7, we begin by comparing it with baseline classification models (CNN, LSTM, and GRU). Next, three hybrid deep learning approaches are compared with the proposed model: CNN–GRU [], InnoHAR [], and ResNet + AttLSTM []. It should be noted that both the recognition accuracy and the F1 score of our model were better than those of the other models, and that our model achieved the highest accuracy (97.51%) and the highest F1-score (97.63%) when the stretch sensor and inertial sensor data were combined. The proposed model thus achieved the most stable performance.
Table 7. Results of comparison between baseline models using the IS-Data dataset.

4.6. Ablation Studies

4.6.1. Impact of Residual Blocks

In order to investigate the impact of the proposed addition of residual blocks on the model’s performance, we conducted ablation experiments on two datasets using combined data from the stretch and inertial sensors.
This experiment used a simple CNN without residual connections as the baseline architecture. The results are displayed in Table 8. The CNN demonstrated the lowest recognition performance, perhaps because information loss prevented it from capturing spatial relationships effectively. In contrast, the model using the residual connection module achieved better performance: the F1 scores on the IS-Data dataset and the w-HAR dataset improved by 2.51% and 2.41%, respectively.
Table 8. Impact of residual modules.

4.6.2. Effect of GRU

To evaluate the effectiveness of the GRU in capturing long-term dependencies in time series, we performed ablation experiments with an RNN, an LSTM, and a GRU on the two datasets, using the combined data from the stretch sensors and inertial sensors. The results are displayed in Table 9. The simple RNN exhibits the poorest recognition performance, since the vanishing gradient problem prevents it from capturing long-term dependencies in the time series. In comparison, the LSTM achieved better performance, with F1 scores improving by approximately 1.72% and 1.03%, respectively. The GRU performed better still, with F1 score increases of about 2.83% and 2.06%, respectively.
Table 9. Impact of GRU.

4.6.3. Effect of the ECA Module

We wished to determine the effect of the addition of the ECA module on the accuracy of the model recognition, and it is evident from Table 10 that the F1 scores obtained with the ECA module are slightly higher for both datasets (by 0.83% and 0.59%).
Table 10. Impact of the ECA module.
In summary, the proposed two-channel network model based on residual blocks, efficient channel attention (ECA) modules, and gated recurrent units (GRUs) effectively extracts optimal features from the sensor data. In comparison with other state-of-the-art HAR methods, much higher F1 scores are obtained on the IS-Data and w-HAR datasets.

5. Conclusions and Discussion

In recent years, most sensor-based HAR methods have used inertial sensors, but with the development of flexible sensors, and considering the demand for comfortable wearable devices and the limited accuracy of inertial sensors themselves, this may soon change. This paper uses deep learning algorithms to evaluate the feasibility of stretch sensors in human activity recognition, both on their own and in combination with inertial sensors. It has been demonstrated that stretch sensors can identify certain activities that involve human joints more accurately than inertial sensors. Combining the advantages of the two types of sensors not only improves the comfort and freedom of traditional wearable sensors but also improves recognition accuracy.
We propose a two-channel network model based on residual blocks, efficient channel attention (ECA) modules, and gated recurrent units (GRUs). Using residual blocks and GRUs enables the modelling of long-term time series data and the effective extraction of spatial–temporal features, which helps the model learn discriminative features. For further efficient learning, the ECA modules with appropriate cross-channel interaction significantly reduce the complexity of the model while guaranteeing classification performance. Our experiments on the self-collected IS-Data dataset and the public w-HAR dataset demonstrated that our proposed model achieves better accuracy.
Although we have only briefly evaluated the feasibility of stretch sensors in human activity recognition with a small amount of human joint flexion data that we collected from stretch sensors, this is enough to demonstrate that stretch sensors hold unique advantages and may soon become a novel and popular choice for multi-modal human activity recognition in this research area.

Author Contributions

Conceptualization, X.W.; model analysis, X.W.; methodology, X.W.; validation, X.W.; investigation, X.W.; resources, J.S.; data curation, J.S.; writing—original draft preparation, X.W.; writing—review and editing, X.W.; data visualization and graphic improvement, J.S.; supervision, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, S.; Li, Y.; Zhang, S.; Shahabi, F.; Xia, S.; Deng, Y.; Alshurafa, N. Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances. Sensors 2022, 22, 1476. [Google Scholar] [CrossRef]
  2. Toor, A.A.; Usman, M.; Younas, F.; M. Fong, A.C.; Khan, S.A.; Fong, S. Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems. Sensors 2020, 20, 2131. [Google Scholar] [CrossRef]
  3. Oikonomou, K.M.; Kansizoglou, I.; Manaveli, P.; Grekidis, A.; Menychtas, D.; Aggelousis, N.; Sirakoulis, G.C.; Gasteratos, A. Joint-Aware Action Recognition for Ambient Assisted Living. In Proceedings of the 2022 IEEE International Conference on Imaging Systems and Techniques (IST), Kaohsiung, Taiwan, 21–23 June 2022; pp. 1–6. [Google Scholar]
  4. Liu, G.; Ho, C.; Slappey, N.; Zhou, Z.; Snelgrove, S.E.; Brown, M.; Grabinski, A.; Guo, X.; Chen, Y.; Miller, K.; et al. A wearable conductivity sensor for wireless real-time sweat monitoring. Sens. Actuators B Chem. 2016, 227, 35–42. [Google Scholar] [CrossRef]
  5. Yan, H.; Hu, B.; Chen, G.; Zhengyuan, E. Real-Time Continuous Human Rehabilitation Action Recognition Using OpenPose and FCN. In Proceedings of the 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Shenzhen, China, 24–26 April 2020; pp. 239–242. [Google Scholar]
  6. Yadav, S.K.; Tiwari, K.; Pandey, H.M.; Akbar, S.A. A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions. Knowl. Based Syst. 2021, 223, 106970. [Google Scholar] [CrossRef]
  7. Chung, S.; Lim, J.; Noh, K.J.; Kim, G.; Jeong, H. Sensor Data Acquisition and Multimodal Sensor Fusion for Human Activity Recognition Using Deep Learning. Sensors 2019, 19, 1716. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, Y.; Tian, G.; Zhang, S.; Li, C. A Knowledge-Based Approach for Multiagent Collaboration in Smart Home: From Activity Recognition to Guidance Service. IEEE Trans. Instrum. Meas. 2020, 69, 317–329. [Google Scholar] [CrossRef]
  9. Yin, W.; Reddy, C.; Zhou, Y.; Zhang, X. A Novel Application of Flexible Inertial Sensors for Ambulatory Measurement of Gait Kinematics. IEEE Trans. Hum. Mach. Syst. 2021, 51, 346–354. [Google Scholar] [CrossRef]
  10. Totaro, M.; Poliero, T.; Mondini, A.; Lucarotti, C.; Cairoli, G.; Ortiz, J.; Beccai, L. Soft Smart Garments for Lower Limb Joint Position Analysis. Sensors 2017, 17, 2314. [Google Scholar] [CrossRef]
  11. Mokhlespour Esfahani, M.I.; Zobeiri, O.; Moshiri, B.; Narimani, R.; Mehravar, M.; Rashedi, E.; Parnianpour, M. Trunk Motion System (TMS) Using Printed Body Worn Sensor (BWS) via Data Fusion Approach. Sensors 2017, 17, 112. [Google Scholar] [CrossRef]
  12. Kansizoglou, I.; Bampis, L.; Gasteratos, A. Deep Feature Space: A Geometrical Perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6823–6838. [Google Scholar] [CrossRef]
  13. Dua, N.; Singh, S.N.; Semwal, V.B.; Challa, S.K. Inception inspired CNN-GRU hybrid network for human activity recognition. Multimed. Tools Appl. 2023, 82, 5369–5403. [Google Scholar] [CrossRef]
  14. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef] [PubMed]
  15. Margarito, J.; Helaoui, R.; Bianchi, A.M.; Sartor, F.; Bonomi, A.G. User-Independent Recognition of Sports Activities from a Single Wrist-Worn Accelerometer: A Template-Matching-Based Approach. IEEE Trans. Biomed. Eng. 2016, 63, 788–796. [Google Scholar] [CrossRef] [PubMed]
  16. Tan, T.-H.; Wu, J.-Y.; Liu, S.-H.; Gochoo, M. Human Activity Recognition Using an Ensemble Learning Algorithm with Smartphone Sensor Data. Electronics 2022, 11, 322. [Google Scholar] [CrossRef]
  17. Qin, Z.; Zhang, Y.; Meng, S.; Qin, Z.; Choo, K.-K.R. Imaging and fusing time series for wearable sensor-based human activity recognition. Inf. Fusion 2020, 53, 80–87. [Google Scholar] [CrossRef]
  18. Cha, Y.; Kim, H.; Kim, D. Flexible Piezoelectric Sensor-Based Gait Recognition. Sensors 2018, 18, 468. [Google Scholar] [CrossRef]
  19. Dauphin, Y.N.; Vries, H.d.; Bengio, Y. Equilibrated adaptive learning rates for non-convex optimization. In Proceedings of the 28th International Conference on Neural Information Processing Systems—Volume 1, Montreal, QC, Canada; 2015; pp. 1504–1512. [Google Scholar]
  20. Klaassen, B.; van Beijnum, B.-J.; Weusthof, M.; Hofs, D.; van Meulen, F.; Droog, E.; Luinge, H.; Slot, L.; Tognetti, A.; Lorussi, F.; et al. A Full Body Sensing System for Monitoring Stroke Patients in a Home Environment. In Proceedings of the Biomedical Engineering Systems and Technologies; Springer: Cham, Switzerland, 2015; pp. 378–393. [Google Scholar]
  21. Chander, H.; Stewart, E.; Saucier, D.; Nguyen, P.; Luczak, T.; Ball, J.E.; Knight, A.C.; Smith, B.K.; V Burch, R.F.; Prabhu, R.K. Closing the Wearable Gap—Part III: Use of Stretch Sensors in Detecting Ankle Joint Kinematics During Unexpected and Expected Slip and Trip Perturbations. Electronics 2019, 8, 1083. [Google Scholar] [CrossRef]
  22. Maramis, C.; Kilintzis, V.; Scholl, P.; Chouvarda, I. Objective Smoking: Towards Smoking Detection Using Smartwatch Sensors. In Proceedings of the Precision Medicine Powered by pHealth and Connected Health; Springer: Singapore, 2018; pp. 211–215. [Google Scholar]
  23. Bhandari, B.; Lu, J.; Zheng, X.; Rajasegarar, S.; Karmakar, C. Non-invasive sensor based automated smoking activity detection. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju, Republic of Korea, 11–15 July 2017; pp. 845–848. [Google Scholar]
  24. Cruciani, F.; Vafeiadis, A.; Nugent, C.; Cleland, I.; McCullagh, P.; Votis, K.; Giakoumis, D.; Tzovaras, D.; Chen, L.; Hamzaoui, R. Feature learning for Human Activity Recognition using Convolutional Neural Networks. CCF Trans. Pervasive Comput. Interact. 2020, 2, 18–32. [Google Scholar] [CrossRef]
  25. Uddin, M.Z.; Hassan, M.M.; Alsanad, A.; Savaglio, C. A body sensor data fusion and deep recurrent neural network-based behavior recognition approach for robust healthcare. Inf. Fusion 2020, 55, 105–115. [Google Scholar] [CrossRef]
  26. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1533–1540. [Google Scholar]
  27. Coelho, Y.; Rangel, L.; dos Santos, F.; Frizera-Neto, A.; Bastos-Filho, T. Human Activity Recognition Based on Convolutional Neural Network. In Proceedings of the XXVI Brazilian Congress on Biomedical Engineering; Springer: Singapore, 2019; pp. 247–252. [Google Scholar]
  28. Song-Mi, L.; Sang Min, Y.; Heeryon, C. Human activity recognition from accelerometer data using Convolutional Neural Network. In Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Republic of Korea, 13–16 February 2017; pp. 131–134. [Google Scholar]
  29. Kim, Y.; Moon, T. Human Detection and Activity Classification Based on Micro-Doppler Signatures Using Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2016, 13, 8–12. [Google Scholar] [CrossRef]
  30. Liu, J.; Shahroudy, A.; Xu, D.; Wang, G. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. arXiv 2016, arXiv:1607.07043. [Google Scholar] [CrossRef]
  31. Haque, M.N.; Tonmoy, M.T.H.; Mahmud, S.; Ali, A.A.; Khan, M.A.H.; Shoyaib, M. GRU-based Attention Mechanism for Human Activity Recognition. In Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019; pp. 1–6. [Google Scholar]
  32. Yu, T.; Chen, J.; Yan, N.; Liu, X. A Multi-Layer Parallel LSTM Network for Human Activity Recognition with Smartphone Sensors. In Proceedings of the 2018 10th International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 18–20 October 2018; pp. 1–6. [Google Scholar]
  33. Okai, J.; Paraschiakos, S.; Beekman, M.; Knobbe, A.; de Sá, C.R. Building robust models for Human Activity Recognition from raw accelerometers data using Gated Recurrent Units and Long Short Term Memory Neural Networks. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 2486–2491. [Google Scholar]
  34. Abbaspour, S.; Fotouhi, F.; Sedaghatbaf, A.; Fotouhi, H.; Vahabi, M.; Linden, M. A Comparative Analysis of Hybrid Deep Learning Models for Human Activity Recognition. Sensors 2020, 20, 5707. [Google Scholar] [CrossRef]
  35. Mekruksavanich, S.; Jitpattanakul, A. Biometric User Identification Based on Human Activity Recognition Using Wearable Sensors: An Experiment Using Deep Learning Models. Electronics 2021, 10, 308. [Google Scholar] [CrossRef]
  36. Ma, H.; Li, W.; Zhang, X.; Gao, S.; Lu, S. AttnSense: Multi-level attention mechanism for multimodal human activity recognition. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3109–3115. [Google Scholar]
  37. Zheng, Z.; Shi, L.; Wang, C.; Sun, L.; Pan, G. LSTM with Uniqueness Attention for Human Activity Recognition. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2019: Image Processing; Springer: Cham, Switzerland, 2019; pp. 498–509. [Google Scholar]
  38. Murahari, V.S.; Plötz, T. On attention models for human activity recognition. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 100–103. [Google Scholar]
  39. Zeng, M.; Gao, H.; Yu, T.; Mengshoel, O.J.; Langseth, H.; Lane, I.; Liu, X. Understanding and improving recurrent networks for human activity recognition by continuous attention. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 56–63. [Google Scholar]
  40. Challa, S.K.; Kumar, A.; Semwal, V.B. A multibranch CNN-BiLSTM model for human activity recognition using wearable sensor data. Vis. Comput. 2022, 38, 4095–4109. [Google Scholar] [CrossRef]
  41. Gao, W.; Zhang, L.; Teng, Q.; He, J.; Wu, H. DanHAR: Dual Attention Network for multimodal human activity recognition using wearable sensors. Appl. Soft Comput. 2021, 111, 107728. [Google Scholar] [CrossRef]
  42. Wang, K.; He, J.; Zhang, L. Sequential Weakly Labeled Multiactivity Localization and Recognition on Wearable Sensors Using Recurrent Attention Networks. IEEE Trans. Hum. Mach. Syst. 2021, 51, 355–364. [Google Scholar] [CrossRef]
  43. Li, X.; Wang, Y.; Zhang, B.; Ma, J. PSDRNN: An Efficient and Effective HAR Scheme Based on Feature Extraction and Deep Learning. IEEE Trans. Ind. Inform. 2020, 16, 6703–6713. [Google Scholar] [CrossRef]
  44. Chen, L.; Liu, X.; Peng, L.; Wu, M. Deep learning based multimodal complex human activity recognition using wearable devices. Appl. Intell. 2021, 51, 4029–4042. [Google Scholar] [CrossRef]
  45. Xu, C.; Chai, D.; He, J.; Zhang, X.; Duan, S. InnoHAR: A Deep Neural Network for Complex Human Activity Recognition. IEEE Access 2019, 7, 9893–9902. [Google Scholar] [CrossRef]
  46. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245. [Google Scholar] [CrossRef]
  47. Canizo, M.; Triguero, I.; Conde, A.; Onieva, E. Multi-head CNN–RNN for multi-time series anomaly detection: An industrial case study. Neurocomputing 2019, 363, 246–260. [Google Scholar] [CrossRef]
  48. Dua, N.; Singh, S.N.; Semwal, V.B. Multi-input CNN-GRU based human activity recognition using wearable sensors. Computing 2021, 103, 1461–1478. [Google Scholar] [CrossRef]
  49. Abdel-Basset, M.; Hawash, H.; Chakrabortty, R.K.; Ryan, M.; Elhoseny, M.; Song, H. ST-DeepHAR: Deep Learning Model for Human Activity Recognition in IoHT Applications. IEEE Internet Things J. 2021, 8, 4969–4979. [Google Scholar] [CrossRef]
  50. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  51. Mazzia, V.; Angarano, S.; Salvetti, F.; Angelini, F.; Chiaberge, M. Action Transformer: A self-attention model for short-time pose-based human action recognition. Pattern Recognit. 2022, 124, 108487. [Google Scholar] [CrossRef]
  52. Santavas, N.; Kansizoglou, I.; Bampis, L.; Karakasis, E.; Gasteratos, A. Attention! A Lightweight 2D Hand Pose Estimation Approach. IEEE Sens. J. 2021, 21, 11488–11496. [Google Scholar] [CrossRef]
  53. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q.J. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2019, arXiv:1910.03151. [Google Scholar] [CrossRef]
  54. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
  55. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning—Volume 37, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
  56. Bhat, G.; Tran, N.; Shill, H.; Ogras, U.Y. w-HAR: An Activity Recognition Dataset and Framework Using Low-Power Wearable Devices. Sensors 2020, 20, 5356. [Google Scholar] [CrossRef]
  57. Dewangan, D.K.; Sahu, S.P.; Sairam, B.; Agrawal, A. VLDNet: Vision-based lane region detection network for intelligent vehicle system using semantic segmentation. Computing 2021, 103, 2867–2892. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
