A Comprehensive Survey on Deep Learning Methods in Human Activity Recognition
Abstract
1. Introduction
- Introduction: This section delineates the problem of HAR, setting the context for the remainder of the paper.
- Related Work: Here, we reference the seminal and recent literature on HAR, underscoring the importance of comprehensive literature reviews in the domain.
- Methodology: In this section, we present our methodology, highlighting our data sources and the processes we employed to distill key information.
- Taxonomy of Methods: This section presents a deeper categorization of HAR methods and distinctly partitions them into sensor-based and vision-based techniques.
- Datasets: In this section, we catalog the most prevalent datasets employed in HAR research.
- Conclusion and Future Directions: Finally, this section concludes the paper and discusses potential avenues for future research in HAR.
2. Related Work
3. Methodology
3.1. Data Sources
3.2. Use of Natural Language Processing
3.3. Data Compilation
3.4. Taxonomy Method
- Cost: Under this metric, we investigate the computational cost of training and deploying each method discussed in the literature, assessed with respect to the deep learning approach it uses. Our evaluation takes into account several key factors: the memory required for training and deployment, the complexity of the inference process, and whether the authors rely on ensembles. By scrutinizing these elements, we aim to give a nuanced picture of computational cost and of the practicality and efficiency of each method in real-world scenarios. This metric matters because HAR methods are frequently deployed on memory- and compute-constrained embedded systems, where cost-effectiveness is essential.
- Approach: We identify the machine learning models the authors adopt to address the problem at hand. This aspect is crucial: by naming the types of deep learning models used, we enable readers to discern the benefits and drawbacks inherent to each approach. Such an understanding is pivotal for researchers and practitioners alike, as it provides a clear picture of the current state of the field and helps identify areas for future exploration and development. We aim to offer an overview that informs readers and encourages them to bridge gaps and contribute to the evolution of the literature.
- Performance: In this segment of our analysis, we turn to the evaluation performance of the various methods on the datasets they were validated on. We categorize performance into three tiers: low, medium, and high. Performance is deemed 'low' when scores fall below 75%, 'medium' when they range between 75% and 95%, and 'high' when they exceed 95% (a small sketch following this list illustrates this tiering). It is important to understand that these tiers are indicative rather than absolute: different methods are often evaluated with different metrics, which makes direct comparison challenging. While the tiers provide a helpful framework for an initial assessment, they should therefore be interpreted in light of the varied and specific contexts in which each method is tested. Our intention is to offer a guide that aids in gauging performance while acknowledging the complexities and nuances inherent in methodological evaluation.
- Datasets: This component of our taxonomy is essential, as it provides clear insight into the environments and conditions under which each method was tested and refined. By presenting this information, we aim to give readers a comprehensive understanding of the types of data each method is best suited for, as well as the potential limitations or biases inherent in these datasets.
- Supervision: Here we report the nature of supervision employed in training the methods we examine. This aspect is pivotal, as the type of supervision significantly affects several facets of the development process, most notably the cost and effort of data labeling. Methods that use supervised learning often require large labeled datasets, which in turn demand extensive input from human annotators and thereby increase cost. Conversely, methods based on unsupervised learning alleviate the need for labeled data but often struggle to maintain a consistent quality metric and are more prone to collapsing during training. By outlining the supervision techniques used, we aim to expose the trade-offs inherent in each approach and to show how the choice of supervision influences not just a method's development but also its potential applications and efficacy in real-world scenarios.
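To make the taxonomy criteria above concrete, the short sketch below encodes a surveyed method as a structured record and implements the indicative performance tiering described under Performance (below 75% = low, 75–95% = medium, above 95% = high). The class names, field choices, and the example row are our own illustrative assumptions, not tooling used to produce the tables in this survey.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def performance_tier(score_pct: float) -> Tier:
    """Map a reported score (in percent) onto the survey's indicative tiers."""
    if score_pct < 75.0:
        return Tier.LOW
    if score_pct <= 95.0:
        return Tier.MEDIUM
    return Tier.HIGH


@dataclass
class MethodRecord:
    """One taxonomy entry: approach, cost, performance, datasets, supervision."""
    approach: str        # e.g. "1D CNN", "Transformer"
    cost: Tier           # training/deployment cost (memory, inference, ensembling)
    performance: Tier    # indicative tier derived from the reported score
    datasets: list[str]  # datasets the method was validated on
    supervision: str     # "supervised", "semi-supervised", or "unsupervised"


# Hypothetical entry in the spirit of the tables accompanying Section 3.4
example = MethodRecord(
    approach="1D CNN",
    cost=Tier.MEDIUM,
    performance=performance_tier(93.4),  # falls in the 75-95% band -> MEDIUM
    datasets=["UCI-HAR", "MHealth"],
    supervision="supervised",
)
print(example.performance)  # Tier.MEDIUM
```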
3.5. Budget
4. HAR Devices and Processing Algorithms
4.1. Devices
4.1.1. Body-Worn Sensors
4.1.2. Object Sensors
4.1.3. Ambient Sensors
4.1.4. Hybrid Sensors
4.1.5. Vision Sensors
4.2. Algorithms
4.2.1. Convolutional Neural Networks
4.2.2. Input Adaptation
4.2.3. Data-Driven Approach
4.2.4. Model-Driven Approach
4.2.5. Weight-Sharing
4.2.6. Recurrent Neural Networks
5. Sensor-Based HAR
5.1. Accelerometer and IMU Modalities
5.2. Methods Leveraging WiFi Signals
5.3. Radar Signal HAR
5.4. Various Modalities and Modality Fusion
6. Vision-Based HAR
7. Datasets
7.1. Sensor-Based Datasets
7.2. Vision-Based Datasets
7.2.1. Action-Level Datasets
7.2.2. Behavioral-Level Datasets
7.2.3. Interaction-Level Datasets
7.2.4. Group Activity-Level Datasets
8. HAR in Robotics and Industry
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Item | Description |
---|---|
Context | “Human action recognition is a computer vision task that identifies how a person or a group acts on a video sequence. Various methods have been proposed to address this problem over the years; the most recent ones rely on deep-learning techniques. First, RGB video measurements and stacks of optical flows are fed as input, and then through two-dimensional convolutional neural networks (2D-CNNs), spatial and temporal analyses are performed. In another line of work, several applications use 2D-CNNs for feature extraction. This way, an image is represented via a feature vector employed to recurrent neural networks (RNNs) for temporal analysis. Motivated by the fact that most high complexity CNNs are utilized on human action recognition tasks and owing to the necessity for mobile implementations on platforms with restricted computational resources, this article evaluates the performance of four lightweight architectures. In particular, we examine how the models of certain mobile-oriented CNNs, viz., ShuffleNet-v2, EfficientNet-b0, MobileNet-v3, and GhostNet, execute in spatial analysis. To that end, we measure the classification accuracy on two human action datasets, the HMDB51, and the UCF101, when the presented models have been previously trained on ImageNet and BU101. The frameworks’ evaluation is based on the average, max scores, and voting generated through the three and fifteen RGB frames of each video included in the test set. Finally, via the trained mobile 2D-CNNs extracted features, RNNs performance evaluation is also assessed where the temporal analysis is achieved.” |
Question | Answer |
---|---|
What is the paper’s main contribution? | evaluates the performance of four lightweight architectures |
What problem is being addressed? | Human action recognition |
What is the input modality? | RGB video measurements and stacks of optical flows |
How is the method evaluated? | based on the average, max scores, and voting |
What is the methodology the authors approach the problem with? | deep-learning techniques |
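The appendix above illustrates the kind of question–answer extraction described in Section 3.2. As a minimal sketch only, the snippet below shows how such extraction could be reproduced with an off-the-shelf extractive question-answering model from the Hugging Face transformers library; the checkpoint name and the questions are assumptions for illustration and do not represent the pipeline actually used in this survey.

```python
# Minimal illustrative sketch; assumes the Hugging Face `transformers` package
# and a generic SQuAD-trained checkpoint, neither of which is claimed to be
# the tooling used by the survey itself.
from transformers import pipeline

abstract = (
    "Human action recognition is a computer vision task that identifies how a "
    "person or a group acts on a video sequence. [...] this article evaluates "
    "the performance of four lightweight architectures."
)

questions = [
    "What is the paper's main contribution?",
    "What problem is being addressed?",
    "What is the input modality?",
]

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

for question in questions:
    result = qa(question=question, context=abstract)
    # Each result contains the extracted answer span and a confidence score.
    print(f"{question} -> {result['answer']} (score={result['score']:.2f})")
```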
References
- Gupta, S. Deep learning based human activity recognition (HAR) using wearable sensor data. Int. J. Inf. Manag. Data Insights 2021, 1, 100046. [Google Scholar] [CrossRef]
- Diraco, G.; Rescio, G.; Caroppo, A.; Manni, A.; Leone, A. Human Action Recognition in Smart Living Services and Applications: Context Awareness, Data Availability, Personalization, and Privacy. Sensors 2023, 23, 6040. [Google Scholar] [CrossRef] [PubMed]
- Shuvo, M.M.H.; Ahmed, N.; Nouduri, K.; Palaniappan, K. A Hybrid Approach for Human Activity Recognition with Support Vector Machine and 1D Convolutional Neural Network. In Proceedings of the 2020 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 13–15 October 2020; pp. 1–5. [Google Scholar] [CrossRef]
- Rojanavasu, P.; Jantawong, P.; Jitpattanakul, A.; Mekruksavanich, S. Improving Inertial Sensor-based Human Activity Recognition using Ensemble Deep Learning. In Proceedings of the 2023 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON), Phuket, Thailand, 22–25 March 2023; pp. 488–492. [Google Scholar] [CrossRef]
- Muhoza, A.C.; Bergeret, E.; Brdys, C.; Gary, F. Multi-Position Human Activity Recognition using a Multi-Modal Deep Convolutional Neural Network. In Proceedings of the 2023 8th International Conference on Smart and Sustainable Technologies (SpliTech), Split, Croatia, 20–23 June 2023; pp. 1–5. [Google Scholar]
- Tao, S.; Goh, W.L.; Gao, Y. A Convolved Self-Attention Model for IMU-based Gait Detection and Human Activity Recognition. In Proceedings of the 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS), Hangzhou, China, 11–13 June 2023; pp. 1–5. [Google Scholar]
- Hassler, A.P.; Menasalvas, E.; García-García, F.J.; Rodríguez-Mañas, L.; Holzinger, A. Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome. BMC Med. Inform. Decis. Mak. 2019, 19, 33. [Google Scholar] [CrossRef] [PubMed]
- Xu, S.; Zhang, L.; Huang, W.; Wu, H.; Song, A. Deformable convolutional networks for multimodal human activity recognition using wearable sensors. IEEE Trans. Instrum. Meas. 2022, 71, 2505414. [Google Scholar] [CrossRef]
- Beddiar, D.R.; Nini, B.; Sabokrou, M.; Hadid, A. Vision-based human activity recognition: A survey. Multimed. Tools Appl. 2020, 79, 30509–30555. [Google Scholar] [CrossRef]
- Lara, O.D.; Labrador, M.A. A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutorials 2012, 15, 1192–1209. [Google Scholar] [CrossRef]
- Ke, S.R.; Thuc, H.L.U.; Lee, Y.J.; Hwang, J.N.; Yoo, J.H.; Choi, K.H. A review on video-based human activity recognition. Computers 2013, 2, 88–131. [Google Scholar] [CrossRef]
- Ray, A.; Kolekar, M.H.; Balasubramanian, R.; Hafiane, A. Transfer learning enhanced vision-based human activity recognition: A decade-long analysis. Int. J. Inf. Manag. Data Insights 2023, 3, 100142. [Google Scholar] [CrossRef]
- Singh, R.; Kushwaha, A.K.S.; Srivastava, R. Recent trends in human activity recognition–A comparative study. Cogn. Syst. Res. 2023, 77, 30–44. [Google Scholar] [CrossRef]
- Gu, F.; Chung, M.H.; Chignell, M.; Valaee, S.; Zhou, B.; Liu, X. A survey on deep learning for human activity recognition. ACM Comput. Surv. (CSUR) 2021, 54, 1–34. [Google Scholar] [CrossRef]
- Hussain, Z.; Sheng, M.; Zhang, W.E. Different approaches for human activity recognition: A survey. arXiv 2019, arXiv:1906.05074. [Google Scholar]
- Jobanputra, C.; Bavishi, J.; Doshi, N. Human activity recognition: A survey. Procedia Comput. Sci. 2019, 155, 698–703. [Google Scholar] [CrossRef]
- Ramasamy Ramamurthy, S.; Roy, N. Recent trends in machine learning for human activity recognition—A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1254. [Google Scholar] [CrossRef]
- Dang, L.M.; Min, K.; Wang, H.; Piran, M.J.; Lee, C.H.; Moon, H. Sensor-based and vision-based human activity recognition: A comprehensive survey. Pattern Recognit. 2020, 108, 107561. [Google Scholar] [CrossRef]
- Vrigkas, M.; Nikou, C.; Kakadiaris, I.A. A review of human activity recognition methods. Front. Robot. AI 2015, 2, 28. [Google Scholar] [CrossRef]
- Saleem, G.; Bajwa, U.I.; Raza, R.H. Toward human activity recognition: A survey. Neural Comput. Appl. 2023, 35, 4145–4182. [Google Scholar] [CrossRef]
- Morshed, M.G.; Sultana, T.; Alam, A.; Lee, Y.K. Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities. Sensors 2023, 23, 2182. [Google Scholar] [CrossRef]
- Hinton, G.E.; Roweis, S. Stochastic neighbor embedding. Adv. Neural Inf. Process. Syst. 2002, 15. [Google Scholar]
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
- Seyfioğlu, M.S.; Özbayoğlu, A.M.; Gürbüz, S.Z. Deep convolutional autoencoder for radar-based classification of similar aided and unaided human activities. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 1709–1723. [Google Scholar] [CrossRef]
- Ignatov, A. Real-time human activity recognition from accelerometer data using Convolutional Neural Networks. Appl. Soft Comput. 2018, 62, 915–922. [Google Scholar] [CrossRef]
- Hegde, N.; Bries, M.; Swibas, T.; Melanson, E.; Sazonov, E. Automatic recognition of activities of daily living utilizing insole-based and wrist-worn wearable sensors. IEEE J. Biomed. Health Inform. 2017, 22, 979–988. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; Liu, A.X.; Shahzad, M.; Ling, K.; Lu, S. Device-free human activity recognition using commercial WiFi devices. IEEE J. Sel. Areas Commun. 2017, 35, 1118–1131. [Google Scholar] [CrossRef]
- Ruan, W.; Sheng, Q.Z.; Yao, L.; Li, X.; Falkner, N.J.; Yang, L. Device-free human localization and tracking with UHF passive RFID tags: A data-driven approach. J. Netw. Comput. Appl. 2018, 104, 78–96. [Google Scholar] [CrossRef]
- Rol, L.; Lidauer, L.; Sattlecker, G.; Kickinger, F.; Auer, W.; Sturm, V.; Efrosinin, D.; Drillich, M.; Iwersen, M. Monitoring drinking behavior in bucket-fed dairy calves using an ear-attached tri-axial accelerometer: A pilot study. Comput. Electron. Agric. 2018, 145, 298–301. [Google Scholar]
- Alsinglawi, B.; Nguyen, Q.V.; Gunawardana, U.; Maeder, A.; Simoff, S.J. RFID systems in healthcare settings and activity of daily living in smart homes: A review. E-Health Telecommun. Syst. Netw. 2017, 6, 1–17. [Google Scholar] [CrossRef]
- Fan, X.; Wang, F.; Wang, F.; Gong, W.; Liu, J. When RFID meets deep learning: Exploring cognitive intelligence for activity identification. IEEE Wirel. Commun. 2019, 26, 19–25. [Google Scholar] [CrossRef]
- Qi, J.; Yang, P.; Waraich, A.; Deng, Z.; Zhao, Y.; Yang, Y. Examining sensor-based physical activity recognition and monitoring for healthcare using internet of things: A systematic review. J. Biomed. Inform. 2018, 87, 138–153. [Google Scholar] [CrossRef] [PubMed]
- Hao, J.; Bouzouane, A.; Gaboury, S. Recognizing multi-resident activities in non-intrusive sensor-based smart homes by formal concept analysis. Neurocomputing 2018, 318, 75–89. [Google Scholar] [CrossRef]
- Roy, N.; Misra, A.; Cook, D. Ambient and smartphone sensor assisted ADL recognition in multi-inhabitant smart environments. J. Ambient Intell. Humaniz. Comput. 2016, 7, 1–19. [Google Scholar] [CrossRef]
- Jalal, A.; Kim, Y.H.; Kim, Y.J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [Google Scholar] [CrossRef]
- Oyedotun, O.K.; Khashman, A. Deep learning in vision-based static hand gesture recognition. Neural Comput. Appl. 2017, 28, 3941–3951. [Google Scholar] [CrossRef]
- Herath, S.; Harandi, M.; Porikli, F. Going deeper into action recognition: A survey. Image Vis. Comput. 2017, 60, 4–21. [Google Scholar] [CrossRef]
- Xu, D.; Yan, Y.; Ricci, E.; Sebe, N. Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput. Vis. Image Underst. 2017, 156, 117–127. [Google Scholar] [CrossRef]
- Zerrouki, N.; Harrou, F.; Sun, Y.; Houacine, A. Vision-based human action classification using adaptive boosting algorithm. IEEE Sens. J. 2018, 18, 5115–5121. [Google Scholar] [CrossRef]
- Chatzitofis, A.; Saroglou, L.; Boutis, P.; Drakoulis, P.; Zioulis, N.; Subramanyam, S.; Kevelham, B.; Charbonnier, C.; Cesar, P.; Zarpalas, D.; et al. Human4d: A human-centric multimodal dataset for motions and immersive media. IEEE Access 2020, 8, 176241–176262. [Google Scholar] [CrossRef]
- Ionescu, C.; Papava, D.; Olaru, V.; Sminchisescu, C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 1325–1339. [Google Scholar] [CrossRef] [PubMed]
- Mahmood, N.; Ghorbani, N.; Troje, N.F.; Pons-Moll, G.; Black, M.J. AMASS: Archive of motion capture as surface shapes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5442–5451. [Google Scholar]
- Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional neural networks for human activity recognition using mobile sensors. In Proceedings of the 6th International Conference on Mobile Computing, Applications and Services, Austin, TX, USA, 6–7 November 2014; pp. 197–205. [Google Scholar]
- Yang, J.; Nguyen, M.N.; San, P.P.; Li, X.; Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proceedings of the IJCAI, Buenos Aires, Argentina, 25–31 July 2015; Volume 15, pp. 3995–4001. [Google Scholar]
- Ha, S.; Yun, J.M.; Choi, S. Multi-modal convolutional neural networks for activity recognition. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Kowloon Tong, Hong Kong, China, 9–12 October 2015; pp. 3017–3022. [Google Scholar]
- Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
- Ha, S.; Choi, S. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 381–388. [Google Scholar]
- Edel, M.; Köppe, E. Binarized-blstm-rnn based human activity recognition. In Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Alcala de Henares, Spain, 4–7 October 2016; pp. 1–7. [Google Scholar]
- Guan, Y.; Plötz, T. Ensembles of deep lstm learners for activity recognition using wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–28. [Google Scholar] [CrossRef]
- Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv 2016, arXiv:1604.08880. [Google Scholar]
- Inoue, M.; Inoue, S.; Nishida, T. Deep recurrent neural network for mobile human activity recognition with high throughput. Artif. Life Robot. 2018, 23, 173–185. [Google Scholar] [CrossRef]
- Maurya, R.; Teo, T.H.; Chua, S.H.; Chow, H.C.; Wey, I.C. Complex Human Activities Recognition Based on High Performance 1D CNN Model. In Proceedings of the 2022 IEEE 15th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), Penang, Malaysia, 19–22 December 2022; pp. 330–336. [Google Scholar]
- Liang, Y.; Feng, K.; Ren, Z. Human Activity Recognition Based on Transformer via Smart-phone Sensors. In Proceedings of the 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), Taiyuan, China, 26–28 May 2023; pp. 267–271. [Google Scholar]
- Aswal, V.; Sreeram, V.; Kuchik, A.; Ahuja, S.; Patel, H. Real-time human activity generation using bidirectional long short term memory networks. In Proceedings of the 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; pp. 775–780. [Google Scholar]
- Choudhury, N.A.; Moulik, S.; Roy, D.S. Physique-based human activity recognition using ensemble learning and smartphone sensors. IEEE Sens. J. 2021, 21, 16852–16860. [Google Scholar] [CrossRef]
- Thakur, D.; Biswas, S.; Ho, E.S.; Chattopadhyay, S. Convae-lstm: Convolutional autoencoder long short-term memory network for smartphone-based human activity recognition. IEEE Access 2022, 10, 4137–4156. [Google Scholar] [CrossRef]
- Dong, Y.; Zhou, R.; Zhu, C.; Cao, L.; Li, X. Hierarchical activity recognition based on belief functions theory in body sensor networks. IEEE Sens. J. 2022, 22, 15211–15221. [Google Scholar] [CrossRef]
- Teng, Q.; Wang, K.; Zhang, L.; He, J. The layer-wise training convolutional neural networks using local loss for sensor-based human activity recognition. IEEE Sens. J. 2020, 20, 7265–7274. [Google Scholar] [CrossRef]
- Zilelioglu, H.; Khodabandelou, G.; Chibani, A.; Amirat, Y. Semi-Supervised Generative Adversarial Networks with Temporal Convolutions for Human Activity Recognition. IEEE Sens. J. 2023, 23, 12355–12369. [Google Scholar] [CrossRef]
- Mekruksavanich, S.; Jantawong, P.; Hnoohom, N.; Jitpattanakul, A. A novel deep bigru-resnet model for human activity recognition using smartphone sensors. In Proceedings of the 2022 19th International Joint Conference on Computer Science and Software Engineering (JCSSE), Bangkok, Thailand, 22–25 June 2022; pp. 1–5. [Google Scholar]
- Dubey, A.; Lyons, N.; Santra, A.; Pandey, A. XAI-BayesHAR: A novel Framework for Human Activity Recognition with Integrated Uncertainty and Shapely Values. In Proceedings of the 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Nassau, Bahamas, 12–14 December 2022; pp. 1281–1288. [Google Scholar]
- Han, C.; Zhang, L.; Xu, S.; Wang, X.; Wu, H.; Song, A. An Efficient Diverse-branch Convolution Scheme for Sensor-Based Human Activity Recognition. IEEE Trans. Instrum. Meas. 2023, 72, 2509313. [Google Scholar] [CrossRef]
- Stolovas, I.; Suárez, S.; Pereyra, D.; De Izaguirre, F.; Cabrera, V. Human activity recognition using machine learning techniques in a low-resource embedded system. In Proceedings of the 2021 IEEE URUCON, Montevideo, Uruguay, 24–26 November 2021; pp. 263–267. [Google Scholar]
- Khatun, M.A.; Yousuf, M.A.; Moni, M.A. Deep CNN-GRU Based Human Activity Recognition with Automatic Feature Extraction Using Smartphone and Wearable Sensors. In Proceedings of the 2023 International Conference on Electrical, Computer and Communication Engineering (ECCE), Kolkata, India, 20–21 January 2023; pp. 1–6. [Google Scholar]
- De Vita, A.; Russo, A.; Pau, D.; Di Benedetto, L.; Rubino, A.; Licciardo, G.D. A partially binarized hybrid neural network system for low-power and resource constrained human activity recognition. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3893–3904. [Google Scholar] [CrossRef]
- Tang, Y.; Zhang, L.; Min, F.; He, J. Multiscale deep feature learning for human activity recognition using wearable sensors. IEEE Trans. Ind. Electron. 2022, 70, 2106–2116. [Google Scholar] [CrossRef]
- Rustam, F.; Reshi, A.A.; Ashraf, I.; Mehmood, A.; Ullah, S.; Khan, D.M.; Choi, G.S. Sensor-based human activity recognition using deep stacked multilayered perceptron model. IEEE Access 2020, 8, 218898–218910. [Google Scholar] [CrossRef]
- Wang, Z.; Chen, S.; Yang, W.; Xu, Y. Environment-independent wi-fi human activity recognition with adversarial network. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3330–3334. [Google Scholar]
- Hsieh, C.F.; Chen, Y.C.; Hsieh, C.Y.; Ku, M.L. Device-free indoor human activity recognition using Wi-Fi RSSI: Machine learning approaches. In Proceedings of the 2020 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 28–30 September 2020; pp. 1–2. [Google Scholar]
- Salehinejad, H.; Hasanzadeh, N.; Djogo, R.; Valaee, S. Joint Human Orientation-Activity Recognition Using WIFI Signals for Human-Machine Interaction. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
- Zhang, J.; Wu, F.; Wei, B.; Zhang, Q.; Huang, H.; Shah, S.W.; Cheng, J. Data augmentation and dense-LSTM for human activity recognition using WiFi signal. IEEE Internet Things J. 2020, 8, 4628–4641. [Google Scholar] [CrossRef]
- Ding, X.; Jiang, T.; Li, Y.; Xue, W.; Zhong, Y. Device-free location-independent human activity recognition using transfer learning based on CNN. In Proceedings of the 2020 IEEE International Conference on Communications Workshops (ICC Workshops), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar]
- Khan, D.; Ho, I.W.H. Deep learning of CSI for efficient device-free human activity recognition. In Proceedings of the 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), New Orleans, LA, USA, 14 June–31 July 2021; pp. 19–24. [Google Scholar]
- Zeeshan, M.; Pandey, A.; Kumar, S. CSI-based device-free joint activity recognition and localization using Siamese networks. In Proceedings of the 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS), Bangalore, India, 4–8 January 2022; pp. 260–264. [Google Scholar]
- Xiang, F.; Nie, X.; Cui, C.; Nie, W.; Dong, X. Radar-based human activity recognition using two-dimensional feature extraction. In Proceedings of the 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 6–8 January 2023; pp. 267–271. [Google Scholar]
- Guo, Z.; Guendel, R.G.; Yarovoy, A.; Fioranelli, F. Point Transformer-Based Human Activity Recognition Using High-Dimensional Radar Point Clouds. In Proceedings of the 2023 IEEE Radar Conference (RadarConf23), San Antonio, TX, USA, 1–5 May 2023; pp. 1–6. [Google Scholar]
- Werthen-Brabants, L.; Bhavanasi, G.; Couckuyt, I.; Dhaene, T.; Deschrijver, D. Quantifying uncertainty in real time with split BiRNN for radar human activity recognition. In Proceedings of the 2022 19th European Radar Conference (EuRAD), Milan, Italy, 28–30 September 2022; pp. 173–176. [Google Scholar]
- McQuire, J.; Watson, P.; Wright, N.; Hiden, H.; Catt, M. A Data Efficient Vision Transformer for Robust Human Activity Recognition from the Spectrograms of Wearable Sensor Data. In Proceedings of the 2023 IEEE Statistical Signal Processing Workshop (SSP), Hanoi, Vietnam, 2–5 July 2023; pp. 364–368. [Google Scholar] [CrossRef]
- Luo, Y.; Coppola, S.M.; Dixon, P.C.; Li, S.; Dennerlein, J.T.; Hu, B. A database of human gait performance on irregular and uneven surfaces collected by wearable sensors. Sci. Data 2020, 7, 219. [Google Scholar] [CrossRef] [PubMed]
- Reiss, A.; Stricker, D. Creating and benchmarking a new dataset for physical activity monitoring. In Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments, Heraklion, Crete, Greece, 6–8 June 2012; pp. 1–8. [Google Scholar]
- Reiss, A.; Stricker, D. Introducing a new benchmarked dataset for activity monitoring. In Proceedings of the 2012 16th International Symposium on Wearable Computers, Newcastle, UK, 18–22 June 2012; pp. 108–109. [Google Scholar]
- Qin, W.; Wu, H.N. Switching GMM-HMM for Complex Human Activity Modeling and Recognition. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25–27 November 2022; pp. 696–701. [Google Scholar]
- Bhuiyan, R.A.; Amiruzzaman, M.; Ahmed, N.; Islam, M.R. Efficient frequency domain feature extraction model using EPS and LDA for human activity recognition. In Proceedings of the 2020 3rd IEEE International Conference on Knowledge Innovation and Invention (ICKII), Kaohsiung, Taiwan, 21–23 August 2020; pp. 344–347. [Google Scholar]
- Zhou, Y.; Yang, Z.; Zhang, X.; Wang, Y. A hybrid attention-based deep neural network for simultaneous multi-sensor pruning and human activity recognition. IEEE Internet Things J. 2022, 9, 25363–25372. [Google Scholar] [CrossRef]
- Li, W.; Feng, X.; He, Z.; Zheng, H. Human activity recognition based on data fusion of fmcw radar and image. In Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Tianjin, China, 10–13 December 2021; pp. 943–947. [Google Scholar]
- Yen, C.T.; Liao, J.X.; Huang, Y.K. Human daily activity recognition performed using wearable inertial sensors combined with deep learning algorithms. IEEE Access 2020, 8, 174105–174114. [Google Scholar] [CrossRef]
- Chowdhury, A.I.; Ashraf, M.; Islam, A.; Ahmed, E.; Jaman, M.S.; Rahman, M.M. hActNET: An improved neural network based method in recognizing human activities. In Proceedings of the 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Istanbul, Turkey, 22–24 October 2020; pp. 1–6. [Google Scholar]
- Psychoula, I.; Singh, D.; Chen, L.; Chen, F.; Holzinger, A.; Ning, H. Users’ privacy concerns in IoT based applications. In Proceedings of the 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Guangzhou, China, 8–12 October 2018; pp. 1887–1894. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
- Keren, G.; Schuller, B. Convolutional RNN: An enhanced model for extracting features from sequential data. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 3412–3419. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Gundu, S.; Syed, H. Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques. Sensors 2023, 23, 2569. [Google Scholar] [CrossRef] [PubMed]
- Islam, M.M.; Iqbal, T. Hamlet: A hierarchical multimodal attention-based human activity recognition algorithm. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; pp. 10285–10292. [Google Scholar]
- Gupta, D.; Singh, A.K.; Gupta, N.; Vishwakarma, D.K. SDL-Net: A Combined CNN & RNN Human Activity Recognition Model. In Proceedings of the 2023 International Conference in Advances in Power, Signal, and Information Technology (APSIT), Bhubaneswar, India, 9–11 June 2023; pp. 1–5. [Google Scholar]
- Popescu, A.C.; Mocanu, I.; Cramariuc, B. Fusion mechanisms for human activity recognition using automated machine learning. IEEE Access 2020, 8, 143996–144014. [Google Scholar] [CrossRef]
- Kumar, K.V.; Harikiran, J.; Chandana, B.S. Human Activity Recognition with Privacy Preserving using Deep Learning Algorithms. In Proceedings of the 2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP), Vijayawada, India, 12–14 February 2022; pp. 1–8. [Google Scholar]
- Bukht, T.F.N.; Rahman, H.; Jalal, A. A Novel Framework for Human Action Recognition Based on Features Fusion and Decision Tree. In Proceedings of the 2023 4th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 20–22 February 2023; pp. 1–6. [Google Scholar]
- Mutegeki, R.; Han, D.S. A CNN-LSTM approach to human activity recognition. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; pp. 362–366. [Google Scholar]
- Razmah, M.; Prabha, R.; Divya, B.; Sridevi, S.; Naveen, A. LSTM Method for Human Activity Recognition of Video Using PSO Algorithm. In Proceedings of the 2022 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS), Chennai, India, 8–9 December 2022; pp. 1–6. [Google Scholar]
- Alrashdi, I.; Siddiqi, M.H.; Alhwaiti, Y.; Alruwaili, M.; Azad, M. Maximum entropy Markov model for human activity recognition using depth camera. IEEE Access 2021, 9, 160635–160645. [Google Scholar] [CrossRef]
- Ahad, M.A.R.; Antar, A.D.; Ahmed, M.; Ahad, M.A.R.; Antar, A.D.; Ahmed, M. Sensor-based benchmark datasets: Comparison and analysis. In IoT Sensor-Based Activity Recognition: Human Activity Recognition; Springer: Cham, Switzerland, 2021; pp. 95–121. [Google Scholar]
- Blunck, H.; Bhattacharya, S.; Stisen, A.; Prentow, T.S.; Kjærgaard, M.B.; Dey, A.; Jensen, M.M.; Sonne, T. Activity recognition on smart devices: Dealing with diversity in the wild. Getmobile Mob. Comput. Commun. 2016, 20, 34–38. [Google Scholar] [CrossRef]
- Torres, R.L.S.; Ranasinghe, D.C.; Shi, Q.; Sample, A.P. Sensor enabled wearable RFID technology for mitigating the risk of falls near beds. In Proceedings of the 2013 IEEE International Conference on RFID (RFID), Orlando, FL, USA, 30 April–2 May 2013; pp. 191–198. [Google Scholar]
- Palumbo, F.; Gallicchio, C.; Pucci, R.; Micheli, A. Human activity recognition using multisensor data fusion based on reservoir computing. J. Ambient. Intell. Smart Environ. 2016, 8, 87–107. [Google Scholar] [CrossRef]
- Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A public domain dataset for human activity recognition using smartphones. In Proceedings of the Esann, Bruges, Belgium, 24–26 April 2013; Volume 3, p. 3. [Google Scholar]
- Reyes-Ortiz, J.L.; Oneto, L.; Samà, A.; Parra, X.; Anguita, D. Transition-aware human activity recognition using smartphones. Neurocomputing 2016, 171, 754–767. [Google Scholar] [CrossRef]
- Casale, P.; Pujol, O.; Radeva, P. Personalization and user verification in wearable systems using biometric walking patterns. Pers. Ubiquitous Comput. 2012, 16, 563–580. [Google Scholar] [CrossRef]
- Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; Millán, J.D.R.; Roggen, D. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042. [Google Scholar] [CrossRef]
- Ordónez, F.J.; De Toledo, P.; Sanchis, A. Activity recognition using hybrid generative/discriminative models on home environments using binary sensors. Sensors 2013, 13, 5460–5477. [Google Scholar] [CrossRef] [PubMed]
- Baños, O.; Damas, M.; Pomares, H.; Rojas, I.; Tóth, M.A.; Amft, O. A benchmark dataset to evaluate sensor displacement in activity recognition. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 1026–1035. [Google Scholar]
- Altun, K.; Barshan, B.; Tunçel, O. Comparative study on classifying human activities with miniature inertial and magnetic sensors. Pattern Recognit. 2010, 43, 3605–3620. [Google Scholar] [CrossRef]
- Bacciu, D.; Barsocchi, P.; Chessa, S.; Gallicchio, C.; Micheli, A. An experimental characterization of reservoir computing in ambient assisted living applications. Neural Comput. Appl. 2014, 24, 1451–1464. [Google Scholar] [CrossRef]
- Banos, O.; Garcia, R.; Holgado-Terriza, J.A.; Damas, M.; Pomares, H.; Rojas, I.; Saez, A.; Villalonga, C. mHealthDroid: A novel framework for agile development of mobile health applications. In Proceedings of the Ambient Assisted Living and Daily Activities: 6th International Work-Conference, IWAAL 2014, Belfast, UK, 2–5 December 2014; Proceedings 6. Springer International Publishing: Berlin/Heidelberg, Germany, 2014; pp. 91–98. [Google Scholar]
- Weiss, G.M.; Yoneda, K.; Hayajneh, T. Smartphone and smartwatch-based biometrics using activities of daily living. IEEE Access 2019, 7, 133190–133202. [Google Scholar] [CrossRef]
- Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM international conference on multimodal interaction, Boulder, CO, USA, 16–20 October 2018; pp. 400–408. [Google Scholar]
- Schuldt, C.; Laptev, I.; Caputo, B. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, 26 August 2004; Volume 3, pp. 32–36. [Google Scholar]
- Ballan, L.; Bertini, M.; Del Bimbo, A.; Seidenari, L.; Serra, G. Effective codebooks for human action categorization. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, Kyoto, Japan, 27 September–4 October 2009; pp. 506–513. [Google Scholar]
- Li, W.; Wong, Y.; Liu, A.A.; Li, Y.; Su, Y.T.; Kankanhalli, M. Multi-camera action dataset (MCAD): A dataset for studying non-overlapped cross-camera action recognition. arXiv 2016, arXiv:1607.06408. [Google Scholar]
- Wang, J.; Liu, Z.; Wu, Y.; Yuan, J. Mining actionlet ensemble for action recognition with depth cameras. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 1290–1297. [Google Scholar]
- Reddy, K.K.; Shah, M. Recognizing 50 human action categories of web videos. Mach. Vis. Appl. 2013, 24, 971–981. [Google Scholar] [CrossRef]
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A large video database for human motion recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563. [Google Scholar]
- Marszalek, M.; Laptev, I.; Schmid, C. Actions in context. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 2929–2936. [Google Scholar]
- Gorelick, L.; Blank, M.; Shechtman, E.; Irani, M.; Basri, R. Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [Google Scholar] [CrossRef] [PubMed]
- Yao, B.; Jiang, X.; Khosla, A.; Lin, A.L.; Guibas, L.; Fei-Fei, L. Human action recognition by learning bases of action attributes and parts. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1331–1338. [Google Scholar]
- Weinland, D.; Ronfard, R.; Boyer, E. Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 2006, 104, 249–257. [Google Scholar]
- Stein, S.; McKenna, S.J. Combining embedded accelerometers with computer vision for recognizing food preparation activities. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, 8–12 September 2013; pp. 729–738. [Google Scholar]
- Nghiem, A.T.; Bremond, F.; Thonnat, M.; Valentin, V. ETISEO, performance evaluation for video surveillance systems. In Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, London, UK, 5–7 September 2007; pp. 476–481. [Google Scholar]
- Niebles, J.C.; Chen, C.W.; Fei-Fei, L. Modeling temporal structure of decomposable motion segments for activity classification. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part II 11. Springer: Berlin/Heidelberg, Germany, 2010; pp. 392–405. [Google Scholar]
- Ryoo, M.S.; Aggarwal, J.K. UT-interaction dataset, ICPR contest on semantic description of human activities (SDHA). In Proceedings of the IEEE International Conference on Pattern Recognition Workshops, Istanbul, Turkey, 23–26 August 2010; Volume 2, p. 4. [Google Scholar]
- Chen, C.-C.; Aggarwal, J.K. Recognizing human action from a far field of view. In Proceedings of the 2009 Workshop on Motion and Video Computing (WMVC), Snowbird, UT, USA, 8–9 December 2009; pp. 1–7. [Google Scholar]
- Caba Heilbron, F.; Escorcia, V.; Ghanem, B.; Carlos Niebles, J. Activitynet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 961–970. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The kinetics human action video dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Soomro, K.; Zamir, A.R.; Shah, M. A dataset of 101 human action classes from videos in the wild. Cent. Res. Comput. Vis. 2012, 2, 1–7. [Google Scholar]
- Liu, J.; Luo, J.; Shah, M. Recognizing realistic actions from videos “in the wild”. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1996–2003. [Google Scholar]
- Berenson, D.; Abbeel, P.; Goldberg, K. A robot path planning framework that learns from experience. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, St. Paul, MN, USA, 14–18 May 2012; pp. 3671–3678. [Google Scholar]
- Martinez, J.; Black, M.J.; Romero, J. On human motion prediction using recurrent neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2891–2900. [Google Scholar]
- Wang, H.; Dong, J.; Cheng, B.; Feng, J. PVRED: A position-velocity recurrent encoder-decoder for human motion prediction. IEEE Trans. Image Process. 2021, 30, 6096–6106. [Google Scholar] [CrossRef]
- Cao, Z.; Gao, H.; Mangalam, K.; Cai, Q.Z.; Vo, M.; Malik, J. Long-term human motion prediction with scene context. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 387–404. [Google Scholar]
- Aksan, E.; Kaufmann, M.; Cao, P.; Hilliges, O. A spatio-temporal transformer for 3d human motion prediction. In Proceedings of the 2021 International Conference on 3D Vision (3DV), Virtual Conference, 1–3 December 2021; pp. 565–574. [Google Scholar]
- Medjaouri, O.; Desai, K. Hr-stan: High-resolution spatio-temporal attention network for 3d human motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2540–2549. [Google Scholar]
- Tanberk, S.; Tükel, D.B.; Uysal, M. A Simple AI-Powered Video Analytics Framework for Human Motion Imitation. In Proceedings of the 2020 Innovations in Intelligent Systems and Applications Conference (ASYU), Istanbul, Turkey, 15–17 October 2020; pp. 1–5. [Google Scholar]
Categories | Main Focus | Future Directions Discussed | Comparison of Techniques |
---|---|---|---|
Device-free | Comprehensive survey of human activity recognition focusing on device-free solutions; taxonomy proposed. | Yes | Yes |
Multiple | Survey of HAR methods in eldercare and healthcare using IoT; compares various data collection methods and machine learning techniques. | Yes | Yes |
Wearable Device | Survey of HAR using wearable sensors; general architecture, taxonomy, key issues, challenges, and system evaluation. | Yes | Yes |
Machine Learning | Overview of machine learning techniques in activity recognition; discusses challenges and recent advances. | Yes | No |
Sensor-based and Vision-based | Comprehensive review of HAR technology; classification of methodologies and evaluation of advantages and weaknesses. | Yes | Yes |
Vision-based | Detailed review of human activity classification from videos and images; categorization of methodologies and dataset analysis. | Yes | Yes |
Video-based | Extensive survey of video-based human activity recognition; covers core technology, recognition systems, and applications. | Yes | No |
Multiple | Overview of HAR categorizing methods; comparative analysis of state-of-the-art techniques. | Yes | Yes |
Multiple | Analysis of human action recognition systems; focus on feature learning-based representations and deep learning. | Yes | Yes |
Transfer Learning in HAR | Impact of transfer learning in HAR and other areas; reviews related research articles focusing on vision sensor-based HAR. | Yes | Yes |
Video-based | Survey of human action identification from video; comparison of hand-crafted and automatic feature extraction approaches. | Yes | Yes |
Deep Learning in HAR | Extensive survey on deep learning applications in HAR; detailed review of contemporary deep learning methods. | Yes | No |
Repository Name | Title | # Citations | Abstract | Year |
---|---|---|---|---|
IEEEXplore | Yes | Yes | Yes | Yes |
arXiv | Yes | No | Yes | Yes |
MDPI | Yes | No | Yes | Yes |
Paper | Approach | Cost | Performance | Dataset | Supervised? |
---|---|---|---|---|---|
[3] | SVM + 1D CNN | Low | High | UCI-HAR | |
[4] | MLP Ensemble | High | High | REALDISP | |
[5] | 1D CNN | Medium | Medium | SHO, MHealth | |
[52] | 1D CNN | High | High | UCI-HAR, WISDM, Skoda Dataset, self-prepared | |
[6] | Self-attention + 1D CNN | High | High | UCI-HAR, MHealth | |
[53] | Transformers | High | High | WISDM | |
[54] | LSTM | Medium | Medium | WISDM | |
[55] | MLP Ensembles | High | High | MotionSense (kaggle), self-prepared | |
[56] | ConvLSTM | High | High | WISDM, UCI, PAMAP2, OPPORTUNITY | |
[8] | Deformable CNN | High (4 × 3090) | Medium | OPPORTUNITY, UNIMIB-SHAR, WISDM | |
[57] | LSTMs, Hierarchical Clustering | Medium | Medium | MHealth, UCI-HAR | |
[58] | CNN | Medium | High | UCI-HAR, OPPORTUNITY, UNIMIB-SHAR, WISDM, PAMAP2 | |
[59] | Temporal CNN | High | Medium | PAMAP2, OPPORTUNITY, LISSI | Semi |
[60] | GRU-ResNet | High | High | UCI-HAR | Supervised |
[61] | MLP | Medium | High | N/A | Unsupervised |
[62] | CNN | High | Medium | OPPORTUNITY, UNIMIB-SHAR, WISDM | Supervised |
[63] | Linear Discriminant Analysis | Low | High | Self-prepared | Supervised |
[64] | GRU-CNN | High | High | UCI-HAR, OPPORTUNITY, MHealth | Supervised |
[65] | CNN | Medium | High | PAMAP2 | Supervised |
[66] | CNN | Low | High | UCI-HAR, PAMAP2, WISDM, UNIMIB-SHAR | Supervised |
[67] | MLP | Low | High | UCI ML Repository | Supervised |
Paper | Approach | Cost | Performance | Dataset | Supervised? |
---|---|---|---|---|---|
[68] | CNN-RNN | Medium | Medium | Self collected | Unsupervised |
[69] | SVM, MLP, CNN | Low | Medium | Self collected | Unsupervised |
[70] | CNN | Low | High | Self collected | Supervised |
[71] | ConvLSTM, PCA with STFT | Low | High | Self collected | Supervised |
[72] | CNN | Low | High with Transfer Learning, Low w/o Transfer Learning | Self collected | Supervised |
[73] | CNN Unet | Low | Medium | Self collected | Supervised |
[74] | CNN | Low | High | Self collected | Unsupervised |
Paper | Approach | Cost | Performance | Dataset | Supervised? |
---|---|---|---|---|---|
[75] | 2DPCA, 2DLDA, kNN | Low | High | University of Glasgow Dataset | Unsupervised |
[76] | Transformer | High | High | 4d Imaging Radar Dataset | Supervised |
[77] | RNN | Medium | N/A | PARRad Dataset | Unsupervised |
Paper | Approach | Cost | Performance | Dataset | Supervised? |
---|---|---|---|---|---|
[78] | Transformer | High | High | [79,80,81] | Supervised |
[82] | GMM, HMM | Low | Medium | NGSIM (Next Generation Simulation, https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Vehicle-Trajector/8ect-6jqj (accessed on 12 December 2023)) | Unsupervised
[83] | SVM | Low | High | UCI-HAR | Supervised |
[84] | Attention based | High | Medium | OPPORTUNITY, UCI ML REPOSITORY, Daily life activities | Supervised |
[85] | CNN | High | High | Self-supervised | Supervised |
Paper | Approach | Cost | Performance | Dataset | Supervised? |
---|---|---|---|---|---|
[98] | Attention, LSTM | High | High | UTD-MHAD, UT-Kinect, UCSD MIT | Supervised |
[99] | CNN + LSTM | High | High | Kinetic Activity Recognition Dataset | Supervised |
[100] | 3D CNNs | High | High | MSRDailyActivity3D, NTU RGB + D and UTD-MHAD, PRECIS HAR | Supervised |
[101] | DBN | Low | High | HMDB51 | Supervised |
[102] | Decision Tree | Low | High | UT-Interaction | Unsupervised |
[103] | CNN LSTM | Medium | High | iSPL, UCI-HAR | Supervised |
[104] | LSTM | Medium | High | UCF-50 | Supervised |
[105] | HMM | Low | High | Depth Dataset Using Kinect Camera | Unsupervised |
Dataset | # Subjects | # Activities | Sensors | # Instances | Source |
---|---|---|---|---|---|
HHAR | 9 | 6 | Accelerometer, gyroscope | 44 million | [107] |
UCIBWS | 14 | 4 | RFID | 75k | [108] |
AReM | 1 | 6 | IRIS Nodes | 42k | [109] |
HAR | 30 | 6 | Accelerometer, gyroscope | 10k | [110] |
HAPT | 30 | 12 | Accelerometer, gyroscope | 10k | [111] |
Single Chest | 15 | 7 | Accelerometer | N/A | [112] |
OPPORTUNITY | 4 | 35 | Accelerometer, motion sensors, ambient sensors | 2551 | [113] |
ADLs | 2 | 10 | PIR, magnetic, pressure and electric sensor | 2747 | [114] |
REALDISP | 17 | 33 | Accelerometer, gyroscope | 1419 | [115] |
UIFWA | 22 | 2 | Accelerometer | N/A | [112] |
PAMAP2 | 9 | 19 | IMU, ECG | 3.8 million | [81] |
DSA | 8 | 19 | Accelerometer, magnetometers, gyroscope | 9120 | [116] |
Wrist ADL | 16 | 14 | Accelerometer | N/A | [116] |
RSS | N/A | 2 | N/A | 13,917 | [117] |
MHEALTH | 10 | 12 | Accelerometer, ECG | 120 | [118] |
WISDM | 51 | 18 | Accelerometer, gyroscope | 15 million | [119] |
WESAD | 15 | 3 | N/A | 63 million | [120] |
Dataset | Action | Behavior | Human–Object Interaction | Human–Human Interaction | Group Activities |
---|---|---|---|---|---|
KTH [121] | ✓ | ||||
Weizmann [128] | ✓ | ||||
Stanford 40 [129] | ✓ | ||||
IXMAS [130] | ✓ | ||||
VISOR [122] | ✓ | ||||
MCAD [123] | ✓ | ||||
MSR Daily Activity 3D [124] | ✓ | ✓ | ✓ | ||
50 Salads [131] | ✓ | ✓ | |||
UCF50 [125] | ✓ | ✓ | |||
ETISEO [132] | ✓ | ✓ | |||
Olympic Sports [133] | ✓ | ✓ | |||
UT-Interaction [134] | ✓ | ✓ | |||
UT-Tower [135] | ✓ | ✓ | ✓ | ||
ActivityNet [136] | ✓ | ✓ | ✓ | ||
Kinetics [137] | ✓ | ✓ | ✓ | ||
HMDB-51 [126] | ✓ | ✓ | ✓ | ||
Hollywood [127] | ✓ | ✓ | |||
Hollywood2 [127] | ✓ | ✓ | |||
UCF-101 [138] | ✓ | ✓ | ✓ | ||
YouTube Action [139] | ✓ | ✓ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).