Electronics · Article · Open Access · 30 March 2023

Human Activity Recognition Based on Two-Channel Residual–GRU–ECA Module with Two Types of Sensors

1 Faculty of Electrical Engineering & Computer Science, Ningbo University, Ningbo 315040, China
2 Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo 315040, China
* Author to whom correspondence should be addressed.

Abstract

With the thriving development of sensor technology and pervasive computing, sensor-based human activity recognition (HAR) has become more and more widely used in healthcare, sports, health monitoring, and human interaction with smart devices. Inertial sensors are among the most commonly used sensors in HAR. In recent years, the demand for comfort and flexibility in wearable devices has gradually increased, and with the continuous development of flexible electronics technology, attempts have begun to incorporate stretch sensors into HAR. In this paper, we propose a two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU); the model performs long-term sequence modeling of the data, efficiently extracts spatial–temporal features, and classifies activities. A dataset named IS-Data was designed and collected from six subjects wearing stretch sensors and inertial sensors while performing six daily activities. We conducted experiments on IS-Data and on a public dataset called w-HAR to validate the feasibility of using stretch sensors in human action recognition and to investigate the effectiveness of combining flexible and inertial data in human activity recognition. Our proposed method showed superior performance and good generalization compared with state-of-the-art methods.

1. Introduction

Human activity recognition (HAR) is the process of extracting and capturing effective action information features from behavioral data produced by the human body, learning and understanding the actions performed by humans, and further determining the type of action. HAR combines knowledge from machine learning, intelligent biotechnology, smart wearable computing technology, computer vision, and many other disciplines. It plays a crucial role in recognizing a user’s interactions with his or her surroundings. HAR has therefore received a great deal of attention in applications such as home automation systems, medical rehabilitation, motion monitoring, fall prevention, and maladaptive behavior recognition [,]. For instance, HAR can be used to monitor the daily activities of elderly people who live alone, observing whether elderly individuals experience a fall so that they can seek help from their families in time, or even preventing such a fall []. It can also be used to collect data on the movement status of exercisers for real-time scientific exercise management [], to analyze the limb movement status of patients with neurological disorders for diagnosis and treatment, or, according to an individual patient’s movement data, to formulate and adjust the rehabilitation plans in the rehabilitation phase [].
HAR can be broadly classified into video-based and sensor-based activity recognition [].
The camera-based method extracts human activity features from images and video streams captured by cameras placed in the user's surroundings. Although this method provides a more intuitive view of the details of an action, it is restricted to specific, structured scenarios: the camera imposes strict requirements on external conditions such as weather, lighting, and viewpoint orientation when collecting data, and the clothing and height of the human targets will differ []. In addition, the presence of cameras may be considered invasive of the user's privacy, and the sheer volume of video data increases the overall computational cost.
The sensor-based method uses multimodal information that is obtained using various wearable sensors to identify, interpret, and evaluate human activities. Initially, sensor technology was mostly used to analyze human gait and joint kinematics to assist in medical diagnosis and rehabilitation monitoring. Sensor technology has since made significant breakthroughs in several key performance indicators, such as better accuracy, smaller size, and lower manufacturing costs []. These advantages have led to a wider range of sensor applications. Compared with the equipment needed for the vision-based method, sensors are portable, lightweight, and can easily be integrated with other devices. They enable the continuous sensing of movements during daily activities without limiting the user’s movement behavior, and they improve the way that users interact with each other and with their surroundings.
Inertial and flexible sensors are commonly used wearable sensors, and inertial sensors have been successfully applied to HAR. However, because inertial sensors are mostly made of rigid materials with poor ductility, wearable devices incorporating them are bulky and fit the skin poorly, leading to comfort problems, and their detection accuracy is strongly affected by the speed of movement []. Researchers have therefore turned their attention to flexible sensors, which are less affected by motion speed and are made of thin, light, and soft materials [,]. The most prominent of these flexible sensors are stretch sensors, which can be sewn onto clothing to capture bending or straightening at joints, or breathing and heartbeat. This paper focuses on HAR based on inertial and flexible sensors.
Traditional approaches to activity recognition primarily use machine learning methods to manually extract features from sensor data. These are typically statistical or structural features, such as means, medians, and standard deviations, and domain expertise is often required to obtain the most relevant set of handcrafted features. While such handcrafted features perform well when training data are limited, extracting effective features becomes complicated and difficult as the number of sensors increases.
These problems can be solved by deep learning algorithms with convolutional neural networks (CNNs), recurrent neural networks (RNNs), deep belief networks, and autoencoders. CNNs excel in local feature extraction [], but CNNs do not consider pattern sequences or remember changes in pattern sequences over time based on the lengths of gaps between them []. On the other hand, RNNs have the advantage of being able to effectively capture time series data features, though traditional RNNs suffer from the vanishing gradient problem and lack the ability to capture long-term dependencies []. Therefore, to overcome the above challenges, we propose a two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU).
The main contributions of this paper are as follows:
  • A sensor-based human activity recognition framework which consists of inertial sensors and stretch sensors is proposed to investigate the effectiveness of different combinations of sensor data in human activity recognition and to verify the feasibility of flexible stretch sensors in human action recognition.
  • A two-channel network model based on residual blocks, an efficient channel attention module (ECA), and a gated recurrent unit (GRU), and which is capable of modelling long-term sequences of data, effectively extracting spatial–temporal features, and performing activity classification, is proposed.
  • The IS-Data dataset was produced, and data were collected using self-developed stretch sensors and inertial sensors. The proposed model was tested on the IS-Data dataset and the publicly available w-HAR dataset, and in both cases the results outperformed the existing solutions.
The remainder of this paper is organized as follows: Section 2 introduces an overview of related work on HAR and deep learning approaches. The proposed methodology, including the experimental framework and model structure, is described in Section 3. Section 4 introduces the dataset, experimental content, evaluation metrics, ablation studies, and analysis of experimental results. Finally, the conclusion of the study is discussed in Section 5.

3. Materials and Methods

In this study, we propose a deep learning-based human activity recognition framework using wearable sensors. As a first step, activity data are collected using inertial sensors and stretch sensors. In the second step, the raw data are pre-processed with data segmentation, data normalization, etc. In the third step, the processed data are fed into the proposed neural network model for feature extraction and classified using a classification layer with SoftMax activation. Finally, the model performance is evaluated by standard assessment methods such as accuracy, precision, and F1 score. The overall structure of the framework is shown in Figure 1.
Figure 1. A deep learning-based human activity recognition framework using wearable sensors.

3.1. The Proposed Network Model

The structure of the proposed network model is shown in Figure 2. The proposed model consists of two main paths: one uses a GRU to model the sensor data as a long-term time series and extract valid temporal representation features, and the other uses lightweight residual blocks and efficient channel attention (ECA) modules to extract valid spatial representation features.
Figure 2. Our network model.
Finally, the temporal and spatial representations generated by the two paths are concatenated and fed into a multilayer perceptron (MLP) with a SoftMax function for final activity classification.
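The sketch below shows one way the two paths could be assembled in Keras (TensorFlow 2.x, the framework used in Section 4.3). The layer widths, the window length of 125 samples (2.5 s at 50 Hz), and the seven input channels are illustrative assumptions rather than the exact configuration of the paper, and the spatial path is shown with a plain convolution as a placeholder for the residual and ECA blocks detailed below.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_two_channel_model(window_len=125, n_channels=7, n_classes=6):
        # Input: one window of sensor data with shape (L, N), as in Section 4.2.
        inputs = layers.Input(shape=(window_len, n_channels))

        # Temporal path: a GRU models long-term dependencies across the window.
        temporal = layers.GRU(64)(inputs)

        # Spatial path: placeholder convolution standing in for the residual
        # blocks and ECA modules described in Sections 3.1.2 and 3.1.3.
        x = layers.Conv1D(64, 3, padding="same", activation="relu")(inputs)
        spatial = layers.GlobalAveragePooling1D()(x)

        # Concatenate both representations and classify with an MLP + SoftMax.
        merged = layers.Concatenate()([temporal, spatial])
        hidden = layers.Dense(128, activation="relu")(merged)
        outputs = layers.Dense(n_classes, activation="softmax")(hidden)
        return Model(inputs, outputs)

    model = build_two_channel_model()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])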

3.1.1. Gated Recurrent Unit

Considering the temporal dependencies between data frames, RNNs can be used to model sensor data as long-term time series and extract effective temporal representation features. However, traditional RNNs suffer from the vanishing gradient problem and cannot capture long-term dependencies []. We therefore used a variant of the RNN, the GRU, to capture long-term dependencies. Although an LSTM can also address the vanishing gradient problem, the memory units of the LSTM architecture increase memory consumption and are prone to overfitting []. The GRU, in contrast, has no separate memory cell; it has only an update gate and a reset gate. The update gate manages the update level of each hidden state and determines which information is passed to the next step; it is computed according to Equation (1). The reset gate determines which unwanted information can be discarded and is computed according to Equation (2); the candidate output and the hidden state are then updated according to Equations (3) and (4).
Through these gates, the GRU can remember important information and model the temporal context of the sequence data. The GRU structure is shown in Figure 3. x_t is the current input, h_{t−1} is the previous hidden state, z_t and r_t are the update gate and reset gate, respectively, h_t is the output of the GRU unit at timestamp t, and g_t is the candidate output. h_t is updated using g_t; the update gate z_t determines when to update h_t; the reset gate r_t is used to compute the candidate value g_t and controls how much of h_{t−1} contributes to the next candidate for h_t.
z_t = σ(W_z x_t + U_z h_{t−1})    (1)
r_t = σ(W_r x_t + U_r h_{t−1})    (2)
g_t = tanh(W_g x_t + U_g (r_t ⊙ h_{t−1}))    (3)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ g_t    (4)
where W_z, W_r, W_g, U_z, U_r, and U_g are weight matrices, + denotes element-wise addition, and ⊙ denotes element-wise (Hadamard) multiplication.
Figure 3. Structure of the gated recurrent unit.
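As a sanity check on Equations (1)–(4), the minimal NumPy sketch below implements a single GRU step exactly as written above; bias terms are omitted to match the equations, and the dimensions in the usage example are arbitrary illustrative choices.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_g, U_g):
        """One GRU step following Equations (1)-(4)."""
        z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate, Eq. (1)
        r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate, Eq. (2)
        g_t = np.tanh(W_g @ x_t + U_g @ (r_t * h_prev))  # candidate output, Eq. (3)
        h_t = (1.0 - z_t) * h_prev + z_t * g_t           # new hidden state, Eq. (4)
        return h_t

    # Run the GRU over one window of sensor frames (illustrative sizes only).
    rng = np.random.default_rng(0)
    d_in, d_hid, T = 7, 16, 125
    W_z, W_r, W_g = (rng.standard_normal((d_hid, d_in)) for _ in range(3))
    U_z, U_r, U_g = (rng.standard_normal((d_hid, d_hid)) for _ in range(3))
    h = np.zeros(d_hid)
    for x_t in rng.standard_normal((T, d_in)):
        h = gru_step(x_t, h, W_z, U_z, W_r, U_r, W_g, U_g)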

3.1.2. Efficient Channel Attention (ECA) Module

The channel attention mechanism has been proven to have great potential for improving the effectiveness of convolutional neural networks. Therefore, to resolve the tension between performance and complexity [,], we applied a more efficient channel attention module (ECA-Net) in our construction [].
Unlike SENet, ECA-Net achieves significant performance gains while adding only a small number of parameters, avoiding the information loss that can result from dimensionality reduction. Its appropriate cross-channel interaction significantly reduces model complexity while preserving performance, making ECA a novel way to model the interaction between different channels. Figure 4 shows a schematic diagram of the ECA module. Given the aggregated features obtained through global average pooling (GAP), ECA generates channel weights by performing a 1D convolution of size k, where k is determined adaptively through a mapping of the channel dimension C.
Figure 4. Diagram of the efficient channel attention (ECA) block.
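The following is a minimal Keras sketch of an ECA block adapted for the 1D feature maps produced by the convolutional path (shape (batch, time, channels)). The adaptive kernel-size mapping with γ = 2 and b = 1 follows the ECA-Net paper; the layer name and the remaining configuration are assumptions, not the exact implementation used in this work.

    import math
    import tensorflow as tf
    from tensorflow.keras import layers

    class ECA1D(layers.Layer):
        """Efficient channel attention for feature maps of shape (batch, time, channels)."""
        def __init__(self, gamma=2, b=1, **kwargs):
            super().__init__(**kwargs)
            self.gamma, self.b = gamma, b

        def build(self, input_shape):
            channels = int(input_shape[-1])
            # Adaptive kernel size k from the channel dimension C (nearest odd value).
            k = int(abs((math.log2(channels) + self.b) / self.gamma))
            self.k = k if k % 2 else k + 1
            self.conv = layers.Conv1D(1, kernel_size=self.k, padding="same", use_bias=False)

        def call(self, x):
            y = tf.reduce_mean(x, axis=1)        # GAP over time -> (batch, C)
            y = tf.expand_dims(y, axis=-1)       # (batch, C, 1)
            y = tf.sigmoid(self.conv(y))         # local cross-channel interaction
            y = tf.transpose(y, [0, 2, 1])       # (batch, 1, C)
            return x * y                         # rescale the input channels

    # Usage: attended = ECA1D()(conv_features)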

3.1.3. Residual Blocks

The human activity data captured by sensors form a time series that is highly correlated over time, and the local features of this correlation must be extracted efficiently. While convolutional networks have been shown in the literature to have a powerful ability to extract and learn spatial features from time series data, information loss and vanishing gradients remain major problems as networks become deeper. ResNet [] addresses this problem by linking convolutional layers with shortcut residual connections that help gradients flow through the network. Building on this, we constructed a lightweight residual module consisting of two convolutional layers, each followed by batch normalization (BN) [] and a rectified linear unit (ReLU) activation; its structure is shown in Figure 5 and can be defined as
y^{h+1} = I(y^h) + F(y^h, ω^h)    (5)
where F is the residual function, ω^h = {ω^h_{jk} | 1 ≤ j ≤ n, 1 ≤ k ≤ m} is the set of weights of the two layers associated with the h-th residual block, and ω^h_{jk} is the weight connecting neuron k in the first layer to neuron j in the second layer. The function I denotes the identity mapping. Each block accepts an input y^h and produces an output y^{h+1}.
c(y^h) = Conv(y^h)    (6)
b(y^h) = ReLU(BN(c(y^h)))    (7)
Figure 5. Structure of the residual block.
A basic residual module unit can be reconstructed and calculated as
y^{h+1} = I(y^h) + b_2(b_1(y^h))    (8)
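A minimal Keras sketch of such a block is given below; it follows Equations (6)–(8), with two Conv1D layers each followed by BN and ReLU, added to an identity shortcut. The kernel size, filter count, and the 1×1 projection used when channel counts differ are assumptions for illustration.

    from tensorflow.keras import layers

    def residual_block(y, filters=64, kernel_size=3):
        """Lightweight residual block: two Conv1D layers, each followed by BN and
        ReLU, added to an identity shortcut as in Equation (8)."""
        shortcut = y
        x = layers.Conv1D(filters, kernel_size, padding="same")(y)    # c(y^h), Eq. (6)
        x = layers.ReLU()(layers.BatchNormalization()(x))             # b1(y^h), Eq. (7)
        x = layers.Conv1D(filters, kernel_size, padding="same")(x)
        x = layers.ReLU()(layers.BatchNormalization()(x))             # b2(b1(y^h))
        if shortcut.shape[-1] != filters:
            # Project the shortcut when channel counts differ (illustrative choice).
            shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
        return layers.Add()([shortcut, x])                            # I(y^h) + b2(b1(y^h))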

4. Experimental Results and Discussion

In order to validate the feasibility of stretch sensor data in human activity recognition, to evaluate the effectiveness of data from different sensors in human activity recognition, and to test the performance of the proposed network model, we performed experimental tests with data from inertial sensors, data from stretch sensors, and combinations of data from inertial sensors and stretch sensors in the IS-Data and w-HAR datasets, respectively. More information concerning activities recorded in the w-HAR dataset and the IS-Data dataset is shown in Table 1.
Table 1. Dataset activity lists.

4.1. Experimental Dataset

4.1.1. w-HAR Dataset

The w-HAR dataset [] was built by researchers from Washington State University. They mounted an IMU on the right ankle of the user and sewed a stretch sensor onto a knee sleeve. Twenty-two subjects aged 20–45 years (14 males and 8 females) carried out 8 activities (jumping, lying down, sitting, walking downstairs, walking upstairs, standing, walking, and transition). The acceleration and angular velocity data from the IMU were collected at a sampling frequency of 250 Hz, and the stretch sensor was sampled at 25 Hz. In this study, we only used six of these activities (jumping, sitting, walking downstairs, walking upstairs, standing, and walking). The IMU sensor data were downsampled from 250 Hz to 25 Hz to reduce the computational cost.
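One straightforward way to perform this downsampling is to decimate the 250 Hz IMU stream by a factor of 10, as sketched below; the paper does not state which method it used, and the array here is a random stand-in for one subject's recording.

    import numpy as np
    from scipy.signal import decimate

    # Stand-in for one subject's IMU recording at 250 Hz (3-axis accel + 3-axis gyro).
    imu_250hz = np.random.randn(2500, 6)

    # Decimate by 10 to match the 25 Hz stretch-sensor rate; decimate() applies an
    # anti-aliasing low-pass filter before subsampling.
    imu_25hz = decimate(imu_250hz, q=10, axis=0, zero_phase=True)
    print(imu_25hz.shape)  # (250, 6)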

4.1.2. IS-Data Dataset

The dataset produced for this paper was obtained using a self-developed six-axis inertial sensor and a stretch sensor. As is shown in Figure 6, the stretch sensor used in this study was a capacitive sensor, where the output capacitance increased with increasing strain and decreased with decreasing strain as the sensor extended and contracted.
Figure 6. Diagram of the flexible sensor.
The stretch sensor module was sewn onto the knee brace and connected to a circuit module. The bending angle was calculated by measuring the capacitance of the sensor, and the angle data were then output.
Two stretch sensors were fitted to the knees of the participants, as shown in the figure. When wearing the device, the user simply put on the knee pads and ensured that the stretch sensors were aligned with the center of the kneecap. In addition, three six-axis inertial sensors were placed on the wrist, waist, and ankle. All of the sensors used Bluetooth Low Energy (BLE) to transmit their data to a smartphone, where the data were stored in a file. The dynamic ranges of the accelerometer and gyroscope outputs were set to ±8 g and ±2000 dps, respectively. Both types of sensors used the same 50 Hz sampling frequency.
Following this, six adult participants with an average age of 26 years wore these sensors and performed the following six activities: walking, standing, sitting, stepping in place, running in place, and jumping. We required the participants to be healthy and comfortable so that they could perform these activities as naturally as possible. The resulting dataset was named IS-Data.

4.2. Pre-Processing

Since the data from the sensors were transmitted via a wireless Bluetooth connection, some of the data may have been lost, or they may have contained some noise during the acquisition process, so a linear interpolation algorithm was used to fill in the missing values and a Butterworth filter was used to filter the noise from the data signal. The raw data that were collected were normalized to a range of 0 to 1 using Equation (9).
A_i^n = (A_i − a_i^min) / (a_i^max − a_i^min), i = 1, 2, …, n    (9)
where A_i^n denotes the normalized data of the i-th channel, n denotes the number of channels, and a_i^max and a_i^min are the maximum and minimum values of the i-th channel, respectively.
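A minimal sketch of this cleaning and normalization step is shown below. The Butterworth cutoff frequency and filter order are illustrative assumptions, since the paper does not report the exact filter parameters.

    import numpy as np
    import pandas as pd
    from scipy.signal import butter, filtfilt

    def clean_and_normalize(raw, fs=50, cutoff=10, order=4):
        """Fill missing samples by linear interpolation, low-pass filter with a
        Butterworth filter, and min-max normalize each channel to [0, 1] (Eq. (9))."""
        # Linear interpolation over NaN gaps caused by dropped Bluetooth packets.
        filled = pd.DataFrame(raw).interpolate(method="linear",
                                               limit_direction="both").to_numpy()
        # Zero-phase Butterworth low-pass filter to suppress high-frequency noise.
        b, a = butter(order, cutoff, btype="low", fs=fs)
        filtered = filtfilt(b, a, filled, axis=0)
        mins, maxs = filtered.min(axis=0), filtered.max(axis=0)
        return (filtered - mins) / (maxs - mins + 1e-8)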
The processed data were then segmented using a sliding window of size L, corresponding to 2.5 s, with an overlap of 50%, where the sample data F_W are represented as
F_W = [a_x, a_y, a_z, g_x, g_y, g_z, f_1] ∈ R^{L×N}    (10)
where a_x, a_y, a_z, g_x, g_y, g_z, and f_1 are column vectors containing L samples of triaxial acceleration, triaxial angular velocity, and stretch information, respectively. N is the number of sensor channels and L is the length of the sliding window. Finally, all the column vectors are joined together into the window data F_W ∈ R^{L×N}.
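The segmentation itself can be sketched as follows; the majority-label rule used to assign one activity label per window is an assumption, since the paper does not state how window labels were derived.

    import numpy as np

    def sliding_windows(data, labels, fs=50, window_s=2.5, overlap=0.5):
        """Segment a (samples, channels) stream into windows F_W of L samples
        with 50% overlap."""
        L = int(window_s * fs)            # e.g., 125 samples at 50 Hz
        step = int(L * (1 - overlap))     # 50% overlap -> step of L/2
        windows, window_labels = [], []
        for start in range(0, len(data) - L + 1, step):
            windows.append(data[start:start + L])
            # Assign the majority activity label within the window (assumption).
            window_labels.append(np.bincount(labels[start:start + L]).argmax())
        return np.stack(windows), np.array(window_labels)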
Details concerning the data preprocessing, such as the sampling rate, window size, and overlap rate, are presented in Table 2. In the experiments, each dataset was divided into a training set (70%) and a validation set (30%). The other hyper-parameters, such as the training epochs, batch size, and learning rate, are also listed in Table 2.
Table 2. Summary of setup for datasets.
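For reference, the split and training loop could look as follows, assuming the windows and labels produced above and the model sketched in Section 3.1; the file names and hyper-parameter values are placeholders and should be replaced by the values in Table 2.

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.load("is_data_windows.npy")   # (num_windows, L, N); placeholder file name
    y = np.load("is_data_labels.npy")    # integer activity labels; placeholder file name

    # 70% training / 30% validation split, as described above.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                      random_state=42)

    # build_two_channel_model is the sketch from Section 3.1; epochs, batch size,
    # and learning rate are placeholders for the values in Table 2.
    model = build_two_channel_model(window_len=X.shape[1], n_channels=X.shape[2],
                                    n_classes=len(np.unique(y)))
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=100, batch_size=64)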

4.3. Experimental Environment

The network in this paper was trained on a computer equipped with an Intel Core i7 CPU, 16 GB of RAM, and a graphics processing unit (GPU) (NVIDIA GeForce RTX 3060 with 6 GB of memory). The algorithm was implemented in Python 3.8 using Google's open-source deep learning framework TensorFlow 2.3.0, and the development environment used for the experiments was PyCharm on a 64-bit version of Windows 11. The GPU was used to speed up the training and testing of the model.

4.4. Evaluation Metrics

In this paper, the effectiveness of the proposed model was calculated using different performance metrics [], described as follows:
Accuracy: this is defined as the fraction of samples predicted correctly to the total number of samples.
Precision: the fraction of positive samples recognized correctly out of the total number of samples recognized as positive.
Recall: the fraction of positive samples recognized correctly out of the total number of positive samples.
F1-score: this is a comprehensive estimate of the model’s accuracy and can be calculated as the harmonic mean of the precision and recall.
Confusion matrix (CM): This is a square matrix that gives the complete performance of a classification model. The rows of the CM signify instances of the true class labels, and columns signify predicted class labels. The diagonal elements of this matrix define the number of points for which the predicted label is equal to the true label.
Consider a multiclass classification problem with a set A of n distinct class labels B_i (i = 1, 2, 3, …, n), represented by {B_1, B_2, B_3, …, B_n}. In this case, the confusion matrix is an n × n matrix. Each row of the matrix corresponds to an actual instance of a class, and each column corresponds to a predicted instance of a class. An element C_ij of the confusion matrix specifies the number of cases for which the actual class is i and the predicted class is j, as shown in Figure 7.
Figure 7. Confusion matrix for a multiclass classification problem.
These are mathematically expressed as:
Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) × 100%
Precision (%) = TP / (TP + FP) × 100%
Recall (%) = TP / (TP + FN) × 100%
F1 score (%) = 2 × Precision × Recall / (Precision + Recall) × 100%
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
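These per-class quantities can be read directly from the confusion matrix; the short NumPy sketch below is one way to do so, using the row/column convention of Figure 7 (rows = true classes, columns = predicted classes).

    import numpy as np

    def per_class_metrics(conf):
        """Per-class precision, recall, and F1, plus overall accuracy, from an
        n x n confusion matrix with rows = true classes and columns = predictions."""
        tp = np.diag(conf).astype(float)
        fp = conf.sum(axis=0) - tp        # predicted as the class but actually another
        fn = conf.sum(axis=1) - tp        # of the class but predicted as another
        precision = tp / (tp + fp + 1e-8)
        recall = tp / (tp + fn + 1e-8)
        f1 = 2 * precision * recall / (precision + recall + 1e-8)
        accuracy = tp.sum() / conf.sum()
        return precision, recall, f1, accuracy

    # Example with a small 3-class confusion matrix.
    conf = np.array([[50, 2, 0],
                     [3, 45, 1],
                     [0, 2, 47]])
    precision, recall, f1, accuracy = per_class_metrics(conf)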

4.5. Experiment Analysis and Performance Comparison

4.5.1. Inertial Sensor and Stretch Sensor Results

A summary of the recognition precision, recall, and F1 scores for the various activities recorded in the IS-Data dataset and the w-HAR dataset is shown in Table 3 and Table 4. As Table 3 and Table 4 illustrate, for the three activities of walking, standing, and sitting, the stretch sensor recognition precision values are 99.16%, 99.61%, and 100% for the IS-Data dataset, which are higher than the corresponding values for the inertial sensor. For the w-HAR dataset, the stretch sensor precision values are 96.68%, 97.31%, and 98.56%, with little difference from the classification results of the inertial sensors. For jumping, the precision values of the stretch sensor are 93.6% and 94.41% in the two datasets; although these are not as high as those of the inertial data, they achieve comparable recognition performance. Thus, the stretch sensor performs very well when monitoring activities that involve more pronounced joint movement, and it has good potential for differentiating lower-limb activities with similar patterns.
Table 3. The recognition precision, recall, and F1 score for various activities in the IS-Data dataset (%).
Table 4. The recognition precision, recall, and F1 score for various activities in the W-HAR dataset (%).
Confusion matrices for the proposed model on the IS-Data dataset and the w-HAR dataset are illustrated in Figure 8 and Figure 9. As shown in Figure 8, the stretch sensor produces many misclassifications between walking in place and running in place, as these two activities involve very similar lower-limb movements. Similarly, walking upstairs and walking downstairs are easily confused, as shown in Figure 9. The degree of lower-limb joint flexion alone cannot distinguish running in place from walking in place well, and the inertial sensors are slightly more effective in this case.
Figure 8. Confusion matrix obtained for inertial sensors (a) and stretch sensors (b) on the IS-Data dataset.
Figure 9. Confusion matrix obtained for inertial sensors (a) and stretch sensors (b) on the w-HAR dataset.

4.5.2. Results of Recognition of the Combination of Inertial Sensors and Stretch Sensors

For some very similar activities, the final recognition results are not very good, whether based on the inertial sensors or the stretch sensors. We therefore combined the two types of sensor data to test whether they could improve the recognition of the activities, and we compared the experimental results with data from the inertial sensors and the stretch sensors, respectively.
The confusion matrix obtained by combining the data from the two sensors is shown in Figure 10. The results obtained from the data of both sensors using the IS-Data dataset and the w-HAR dataset are shown in Table 5 and Table 6, respectively. The combined use of data from the two sensors, as described in the previous subsection, led to an improvement in the identification of similar actions. For the activity of walking in place in the IS-Data dataset, there were increases in precision of 30.42% and 1.82% compared with the stretch data and inertial data, respectively. For the activity of running in place, the precision increased by 29.15% and 3.32%, respectively. In the w-HAR dataset, respective precision increases of 15.15% and 6.06% were obtained for walking upstairs compared with the stretch data and inertial data. The precision for walking downstairs increased by 6.66%.
Figure 10. Confusion matrices obtained for the combination of inertial sensor data and stretch sensor data using the IS-Data dataset (a) and for the combination of inertial sensor data and stretch sensor data using the w-HAR dataset (b).
Table 5. The recognition precision, recall, and F1 score for various activities in the IS-Data dataset (%).
Table 6. The recognition precision, recall, and F1 score for various activities in the W-HAR dataset (%).
In Figure 11, the accuracy of the combination of inertial sensor and stretch sensor data in the IS-Data dataset is 97.51%, which is 11.27% higher than the accuracy of stretch sensor data alone, and about 1.4% better than the accuracy of the inertial sensor data alone, which had an accuracy of 96.11%. Similarly, the accuracy of the combination of inertial sensor and stretch sensor data in the w-HAR dataset is 98.24%, which is approximately 2.49% better than that of the stretch sensor data alone, and about 0.58% better than the accuracy of the inertial sensor data alone, which was 97.66%.
Figure 11. Accuracy of the three types of data in the w-HAR dataset and the IS-Data dataset.
Overall, the combination of inertial and stretch sensor data contributes to the effectiveness of activity recognition.

4.5.3. On the Performance of the Proposed Network Model

The proposed model will now be compared with state-of-the-art deep learning approaches in the scope of HAR.
As is shown in Table 7, we begin by comparing it with baseline classification models (CNN, LSTM, and GRU). Next, three hybrid deep learning approaches are compared with the proposed model: CNN–GRU [], InnoHAR [], and ResNet + AttLSTM []. It should be noted that both the recognition accuracy and the F1 score of our model were better than those of the other models, and that our model achieved the highest accuracy (97.51%) and the highest F1-score (97.63%) when the stretch sensor and inertial sensor data were combined. The proposed model thus achieved the most stable performance.
Table 7. Results of comparison between baseline models using the IS-Data dataset.

4.6. Ablation Studies

4.6.1. Impact of Residual Blocks

In order to investigate the impact of the proposed addition of residual blocks on the model’s performance, we conducted ablation experiments on two datasets using combined data from the stretch and inertial sensors.
This experiment used a simple CNN without residual connections as the baseline architecture. The results are displayed in Table 8. The CNN demonstrated the lowest recognition performance, perhaps because information loss prevented it from capturing spatial relationships effectively. In contrast, the model using the residual connection module achieved better performance: the F1 scores on the IS-Data dataset and the w-HAR dataset improved by 2.51% and 2.41%, respectively.
Table 8. Impact of residual modules.

4.6.2. Effect of GRU

To evaluate the effectiveness of the GRU in capturing long-term dependencies in time series, we performed ablation experiments with an RNN, an LSTM, and a GRU on the two datasets, using the combined data from the stretch sensors and inertial sensors. The results are displayed in Table 9. The simple RNN exhibits the poorest recognition performance, since the vanishing gradient problem prevents it from capturing long-term dependencies in the time series. In comparison, the LSTM achieved better performance, with F1 scores improving by approximately 1.72% and 1.03%, respectively. The GRU performed better still, with F1 score increases of about 2.83% and 2.06%, respectively.
Table 9. Impact of GRU.

4.6.3. Effect of the ECA Module

We wished to determine the effect of the addition of the ECA module on the accuracy of the model recognition, and it is evident from Table 10 that the F1 scores obtained with the ECA module are slightly higher for both datasets (by 0.83% and 0.59%).
Table 10. Impact of the ECA module.
In summary, the proposed two-channel network model based on residual blocks, efficient channel attention (ECA) modules, and gated recurrent units (GRUs) effectively extracts optimal features from the sensor data. In comparison with other state-of-the-art HAR methods, much higher F1 scores are obtained on the IS-Data and w-HAR datasets.

5. Conclusions and Discussion

In recent years, most sensor-based HAR methods have used inertial sensors, but with the development of flexible sensors, and considering the demand for comfortable wearable devices and the limited accuracy of inertial sensors themselves, this may soon change. This paper uses deep learning algorithms to evaluate the feasibility of stretch sensors in human activity recognition, both on their own and in combination with inertial sensors. It has been demonstrated that stretch sensors can identify certain activities that involve human joints more accurately than inertial sensors. Combining the advantages of the two types of sensors not only improves the comfort and freedom of traditional wearable sensors but also improves recognition accuracy.
We propose a two-channel network model based on residual blocks, efficient channel attention (ECA) modules, and gated recurrent units (GRUs). Using residual blocks and GRUs enables the modelling of long-term time series data and the effective extraction of spatial–temporal features, which helps the model learn discriminative features. For further efficient learning, the ECA modules with appropriate cross-channel interaction significantly reduce the complexity of the model while guaranteeing classification performance. Our experiments on the self-collected IS-Data dataset and the public w-HAR dataset demonstrated that our proposed model achieves better accuracy.
Although we have only briefly evaluated the feasibility of stretch sensors in human activity recognition with a small amount of human joint flexion data that we collected from stretch sensors, this is enough to demonstrate that stretch sensors hold unique advantages and may soon become a novel and popular choice for multi-modal human activity recognition in this research area.

Author Contributions

Conceptualization, X.W.; model analysis, X.W.; methodology, X.W.; validation, X.W.; investigation, X.W.; resources, J.S.; data curation, J.S.; writing—original draft preparation, X.W.; writing—review and editing, X.W.; data visualization and graphic improvement, J.S.; supervision, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, S.; Li, Y.; Zhang, S.; Shahabi, F.; Xia, S.; Deng, Y.; Alshurafa, N. Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances. Sensors 2022, 22, 1476. [Google Scholar] [CrossRef]
  2. Toor, A.A.; Usman, M.; Younas, F.; M. Fong, A.C.; Khan, S.A.; Fong, S. Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems. Sensors 2020, 20, 2131. [Google Scholar] [CrossRef]
  3. Oikonomou, K.M.; Kansizoglou, I.; Manaveli, P.; Grekidis, A.; Menychtas, D.; Aggelousis, N.; Sirakoulis, G.C.; Gasteratos, A. Joint-Aware Action Recognition for Ambient Assisted Living. In Proceedings of the 2022 IEEE International Conference on Imaging Systems and Techniques (IST), Kaohsiung, Taiwan, 21–23 June 2022; pp. 1–6. [Google Scholar]
  4. Liu, G.; Ho, C.; Slappey, N.; Zhou, Z.; Snelgrove, S.E.; Brown, M.; Grabinski, A.; Guo, X.; Chen, Y.; Miller, K.; et al. A wearable conductivity sensor for wireless real-time sweat monitoring. Sens. Actuators B Chem. 2016, 227, 35–42. [Google Scholar] [CrossRef]
  5. Yan, H.; Hu, B.; Chen, G.; Zhengyuan, E. Real-Time Continuous Human Rehabilitation Action Recognition Using OpenPose and FCN. In Proceedings of the 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), Shenzhen, China, 24–26 April 2020; pp. 239–242. [Google Scholar]
  6. Yadav, S.K.; Tiwari, K.; Pandey, H.M.; Akbar, S.A. A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions. Knowl. Based Syst. 2021, 223, 106970. [Google Scholar] [CrossRef]
  7. Chung, S.; Lim, J.; Noh, K.J.; Kim, G.; Jeong, H. Sensor Data Acquisition and Multimodal Sensor Fusion for Human Activity Recognition Using Deep Learning. Sensors 2019, 19, 1716. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, Y.; Tian, G.; Zhang, S.; Li, C. A Knowledge-Based Approach for Multiagent Collaboration in Smart Home: From Activity Recognition to Guidance Service. IEEE Trans. Instrum. Meas. 2020, 69, 317–329. [Google Scholar] [CrossRef]
  9. Yin, W.; Reddy, C.; Zhou, Y.; Zhang, X. A Novel Application of Flexible Inertial Sensors for Ambulatory Measurement of Gait Kinematics. IEEE Trans. Hum. Mach. Syst. 2021, 51, 346–354. [Google Scholar] [CrossRef]
  10. Totaro, M.; Poliero, T.; Mondini, A.; Lucarotti, C.; Cairoli, G.; Ortiz, J.; Beccai, L. Soft Smart Garments for Lower Limb Joint Position Analysis. Sensors 2017, 17, 2314. [Google Scholar] [CrossRef]
  11. Mokhlespour Esfahani, M.I.; Zobeiri, O.; Moshiri, B.; Narimani, R.; Mehravar, M.; Rashedi, E.; Parnianpour, M. Trunk Motion System (TMS) Using Printed Body Worn Sensor (BWS) via Data Fusion Approach. Sensors 2017, 17, 112. [Google Scholar] [CrossRef]
  12. Kansizoglou, I.; Bampis, L.; Gasteratos, A. Deep Feature Space: A Geometrical Perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6823–6838. [Google Scholar] [CrossRef]
  13. Dua, N.; Singh, S.N.; Semwal, V.B.; Challa, S.K. Inception inspired CNN-GRU hybrid network for human activity recognition. Multimed. Tools Appl. 2023, 82, 5369–5403. [Google Scholar] [CrossRef]
  14. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166. [Google Scholar] [CrossRef] [PubMed]
  15. Margarito, J.; Helaoui, R.; Bianchi, A.M.; Sartor, F.; Bonomi, A.G. User-Independent Recognition of Sports Activities from a Single Wrist-Worn Accelerometer: A Template-Matching-Based Approach. IEEE Trans. Biomed. Eng. 2016, 63, 788–796. [Google Scholar] [CrossRef] [PubMed]
  16. Tan, T.-H.; Wu, J.-Y.; Liu, S.-H.; Gochoo, M. Human Activity Recognition Using an Ensemble Learning Algorithm with Smartphone Sensor Data. Electronics 2022, 11, 322. [Google Scholar] [CrossRef]
  17. Qin, Z.; Zhang, Y.; Meng, S.; Qin, Z.; Choo, K.-K.R. Imaging and fusing time series for wearable sensor-based human activity recognition. Inf. Fusion 2020, 53, 80–87. [Google Scholar] [CrossRef]
  18. Cha, Y.; Kim, H.; Kim, D. Flexible Piezoelectric Sensor-Based Gait Recognition. Sensors 2018, 18, 468. [Google Scholar] [CrossRef]
  19. Dauphin, Y.N.; Vries, H.d.; Bengio, Y. Equilibrated adaptive learning rates for non-convex optimization. In Proceedings of the 28th International Conference on Neural Information Processing Systems—Volume 1, Montreal, QC, Canada; 2015; pp. 1504–1512. [Google Scholar]
  20. Klaassen, B.; van Beijnum, B.-J.; Weusthof, M.; Hofs, D.; van Meulen, F.; Droog, E.; Luinge, H.; Slot, L.; Tognetti, A.; Lorussi, F.; et al. A Full Body Sensing System for Monitoring Stroke Patients in a Home Environment. In Proceedings of the Biomedical Engineering Systems and Technologies; Springer: Cham, Switzerland, 2015; pp. 378–393. [Google Scholar]
  21. Chander, H.; Stewart, E.; Saucier, D.; Nguyen, P.; Luczak, T.; Ball, J.E.; Knight, A.C.; Smith, B.K.; V Burch, R.F.; Prabhu, R.K. Closing the Wearable Gap—Part III: Use of Stretch Sensors in Detecting Ankle Joint Kinematics During Unexpected and Expected Slip and Trip Perturbations. Electronics 2019, 8, 1083. [Google Scholar] [CrossRef]
  22. Maramis, C.; Kilintzis, V.; Scholl, P.; Chouvarda, I. Objective Smoking: Towards Smoking Detection Using Smartwatch Sensors. In Proceedings of the Precision Medicine Powered by pHealth and Connected Health; Springer: Singapore, 2018; pp. 211–215. [Google Scholar]
  23. Bhandari, B.; Lu, J.; Zheng, X.; Rajasegarar, S.; Karmakar, C. Non-invasive sensor based automated smoking activity detection. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju, Republic of Korea, 11–15 July 2017; pp. 845–848. [Google Scholar]
  24. Cruciani, F.; Vafeiadis, A.; Nugent, C.; Cleland, I.; McCullagh, P.; Votis, K.; Giakoumis, D.; Tzovaras, D.; Chen, L.; Hamzaoui, R. Feature learning for Human Activity Recognition using Convolutional Neural Networks. CCF Trans. Pervasive Comput. Interact. 2020, 2, 18–32. [Google Scholar] [CrossRef]
  25. Uddin, M.Z.; Hassan, M.M.; Alsanad, A.; Savaglio, C. A body sensor data fusion and deep recurrent neural network-based behavior recognition approach for robust healthcare. Inf. Fusion 2020, 55, 105–115. [Google Scholar] [CrossRef]
  26. Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 1533–1540. [Google Scholar]
  27. Coelho, Y.; Rangel, L.; dos Santos, F.; Frizera-Neto, A.; Bastos-Filho, T. Human Activity Recognition Based on Convolutional Neural Network. In Proceedings of the XXVI Brazilian Congress on Biomedical Engineering; Springer: Singapore, 2019; pp. 247–252. [Google Scholar]
  28. Song-Mi, L.; Sang Min, Y.; Heeryon, C. Human activity recognition from accelerometer data using Convolutional Neural Network. In Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Republic of Korea, 13–16 February 2017; pp. 131–134. [Google Scholar]
  29. Kim, Y.; Moon, T. Human Detection and Activity Classification Based on Micro-Doppler Signatures Using Deep Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2016, 13, 8–12. [Google Scholar] [CrossRef]
  30. Liu, J.; Shahroudy, A.; Xu, D.; Wang, G. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition. arXiv 2016, arXiv:1607.07043. [Google Scholar] [CrossRef]
  31. Haque, M.N.; Tonmoy, M.T.H.; Mahmud, S.; Ali, A.A.; Khan, M.A.H.; Shoyaib, M. GRU-based Attention Mechanism for Human Activity Recognition. In Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019; pp. 1–6. [Google Scholar]
  32. Yu, T.; Chen, J.; Yan, N.; Liu, X. A Multi-Layer Parallel LSTM Network for Human Activity Recognition with Smartphone Sensors. In Proceedings of the 2018 10th International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 18–20 October 2018; pp. 1–6. [Google Scholar]
  33. Okai, J.; Paraschiakos, S.; Beekman, M.; Knobbe, A.; de Sá, C.R. Building robust models for Human Activity Recognition from raw accelerometers data using Gated Recurrent Units and Long Short Term Memory Neural Networks. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 2486–2491. [Google Scholar]
  34. Abbaspour, S.; Fotouhi, F.; Sedaghatbaf, A.; Fotouhi, H.; Vahabi, M.; Linden, M. A Comparative Analysis of Hybrid Deep Learning Models for Human Activity Recognition. Sensors 2020, 20, 5707. [Google Scholar] [CrossRef]
  35. Mekruksavanich, S.; Jitpattanakul, A. Biometric User Identification Based on Human Activity Recognition Using Wearable Sensors: An Experiment Using Deep Learning Models. Electronics 2021, 10, 308. [Google Scholar] [CrossRef]
  36. Ma, H.; Li, W.; Zhang, X.; Gao, S.; Lu, S. AttnSense: Multi-level attention mechanism for multimodal human activity recognition. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3109–3115. [Google Scholar]
  37. Zheng, Z.; Shi, L.; Wang, C.; Sun, L.; Pan, G. LSTM with Uniqueness Attention for Human Activity Recognition. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2019: Image Processing; Springer: Cham, Switzerland, 2019; pp. 498–509. [Google Scholar]
  38. Murahari, V.S.; Plötz, T. On attention models for human activity recognition. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 100–103. [Google Scholar]
  39. Zeng, M.; Gao, H.; Yu, T.; Mengshoel, O.J.; Langseth, H.; Lane, I.; Liu, X. Understanding and improving recurrent networks for human activity recognition by continuous attention. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 56–63. [Google Scholar]
  40. Challa, S.K.; Kumar, A.; Semwal, V.B. A multibranch CNN-BiLSTM model for human activity recognition using wearable sensor data. Vis. Comput. 2022, 38, 4095–4109. [Google Scholar] [CrossRef]
  41. Gao, W.; Zhang, L.; Teng, Q.; He, J.; Wu, H. DanHAR: Dual Attention Network for multimodal human activity recognition using wearable sensors. Appl. Soft Comput. 2021, 111, 107728. [Google Scholar] [CrossRef]
  42. Wang, K.; He, J.; Zhang, L. Sequential Weakly Labeled Multiactivity Localization and Recognition on Wearable Sensors Using Recurrent Attention Networks. IEEE Trans. Hum. Mach. Syst. 2021, 51, 355–364. [Google Scholar] [CrossRef]
  43. Li, X.; Wang, Y.; Zhang, B.; Ma, J. PSDRNN: An Efficient and Effective HAR Scheme Based on Feature Extraction and Deep Learning. IEEE Trans. Ind. Inform. 2020, 16, 6703–6713. [Google Scholar] [CrossRef]
  44. Chen, L.; Liu, X.; Peng, L.; Wu, M. Deep learning based multimodal complex human activity recognition using wearable devices. Appl. Intell. 2021, 51, 4029–4042. [Google Scholar] [CrossRef]
  45. Xu, C.; Chai, D.; He, J.; Zhang, X.; Duan, S. InnoHAR: A Deep Neural Network for Complex Human Activity Recognition. IEEE Access 2019, 7, 9893–9902. [Google Scholar] [CrossRef]
  46. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245. [Google Scholar] [CrossRef]
  47. Canizo, M.; Triguero, I.; Conde, A.; Onieva, E. Multi-head CNN–RNN for multi-time series anomaly detection: An industrial case study. Neurocomputing 2019, 363, 246–260. [Google Scholar] [CrossRef]
  48. Dua, N.; Singh, S.N.; Semwal, V.B. Multi-input CNN-GRU based human activity recognition using wearable sensors. Computing 2021, 103, 1461–1478. [Google Scholar] [CrossRef]
  49. Abdel-Basset, M.; Hawash, H.; Chakrabortty, R.K.; Ryan, M.; Elhoseny, M.; Song, H. ST-DeepHAR: Deep Learning Model for Human Activity Recognition in IoHT Applications. IEEE Internet Things J. 2021, 8, 4969–4979. [Google Scholar] [CrossRef]
  50. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  51. Mazzia, V.; Angarano, S.; Salvetti, F.; Angelini, F.; Chiaberge, M. Action Transformer: A self-attention model for short-time pose-based human action recognition. Pattern Recognit. 2022, 124, 108487. [Google Scholar] [CrossRef]
  52. Santavas, N.; Kansizoglou, I.; Bampis, L.; Karakasis, E.; Gasteratos, A. Attention! A Lightweight 2D Hand Pose Estimation Approach. IEEE Sens. J. 2021, 21, 11488–11496. [Google Scholar] [CrossRef]
  53. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q.J. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. arXiv 2019, arXiv:1910.03151. [Google Scholar] [CrossRef]
  54. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar] [CrossRef]
  55. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning—Volume 37, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
  56. Bhat, G.; Tran, N.; Shill, H.; Ogras, U.Y. w-HAR: An Activity Recognition Dataset and Framework Using Low-Power Wearable Devices. Sensors 2020, 20, 5356. [Google Scholar] [CrossRef]
  57. Dewangan, D.K.; Sahu, S.P.; Sairam, B.; Agrawal, A. VLDNet: Vision-based lane region detection network for intelligent vehicle system using semantic segmentation. Computing 2021, 103, 2867–2892. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
