Improving Wearable-Based Activity Recognition Using Image Representations †
Abstract
1. Introduction
2. Related Work
2.1. CNNs for Time Series
2.2. Image Representations of Time Series
3. Approach
3.1. Image Generation Process
3.2. Patterns over Time
- Oscillatory variation. This pattern refers to the case in which the values of the data points in the time window go up and down from one time step to the next. Specifically, if we consider three consecutive time steps $t_{i-1}$, $t_i$, and $t_{i+1}$, with $t_{i-1} < t_i < t_{i+1}$, either the values $x_{i-1}$ and $x_{i+1}$ of the data points at times $t_{i-1}$ and $t_{i+1}$ are strictly higher than the value $x_i$ of the data point at time $t_i$, or the value $x_i$ at time $t_i$ is strictly higher than both the values of the data points at $t_{i-1}$ and at $t_{i+1}$. This pattern is illustrated by the blue areas shown in Figure 2.
- Steady variation. This pattern refers to the case where the values of the data points in the time window consistently increase or consistently decrease from one time step to the next. This is illustrated by the green areas in Figure 2.
- Range. This pattern refers to the maximum and minimum data point values within the time window, as well as the difference between them. A minimal sketch of how these three patterns can be detected in a time window is given after this list.
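As an illustration of the three patterns above, the following sketch (our own; the function and variable names are not from the paper) checks, for every set of three consecutive data points in a window, whether the middle point is a strict local extremum (oscillatory variation) or whether the values keep moving in the same direction (steady variation), and computes the range statistics of the window:

```python
import numpy as np

def window_patterns(window):
    """Detect the three per-window patterns described above.

    `window` is a 1-D array of sensor readings for one time window.
    Returns, for each interior time step, whether it belongs to an
    oscillatory or a steady segment, plus the range statistics of the
    window. (Illustrative sketch only.)
    """
    w = np.asarray(window, dtype=float)
    diffs = np.diff(w)                    # w[i+1] - w[i] for consecutive steps

    # Oscillatory variation: the middle point of three consecutive steps is
    # either a strict local minimum or a strict local maximum, i.e. the two
    # surrounding differences have opposite signs.
    oscillatory = (diffs[:-1] * diffs[1:]) < 0

    # Steady variation: the value keeps moving in the same direction
    # (strictly up twice or strictly down twice).
    steady = (diffs[:-1] * diffs[1:]) > 0

    # Range: maximum, minimum, and their difference over the window.
    stats = {"max": w.max(), "min": w.min(), "range": w.max() - w.min()}
    return oscillatory, steady, stats


# Example: an oscillating segment followed by a steadily increasing one.
osc, steady, stats = window_patterns([1.0, 3.0, 0.5, 2.5, 3.0, 3.5, 4.0])
print(osc)     # [ True  True False False False]
print(steady)  # [False False  True  True  True]
print(stats)   # {'max': 4.0, 'min': 0.5, 'range': 3.5}
```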
3.3. Canvas & Regions [Locality]
3.4. Pixel Filling [Edge Detection]
3.5. Mapping Inertial Sensors to Pixels
3.6. Filling Strategies
- Counterclockwise (CCW). This strategy, which is illustrated in Figure 6a, starts at the center pixel of the region, or at the pixel right before the center in the case of a region with an even number of pixels. From that point, the ‘marked’ pixels are filled in continuously, first up one pixel, then left, then down, and so on in a counterclockwise manner (a code sketch of this path is given after this list). The path never visits the same pixel twice and leaves no pixel ‘unmarked’ along the way.
- Clockwise (CW). This strategy (illustrated in Figure 6b) starts at the pixel in the top-left corner of the region. The path then makes pixels ‘marked’ in a clockwise fashion, never visiting the same pixel twice and never skipping a pixel.
- Diagonal (Diag). This strategy, depicted in Figure 6c, has the pixel at the top-left corner of the region as its starting point. From there, the path follows 45° diagonals that go upwards. Pixels are never visited twice, and no pixel on the path is left ‘unmarked’.
- Strokes (Strk). This strategy (see Figure 6d) seeks to produce continuous ‘strokes’ that span three regions of the same row of the image representation. To do this, each region on the row is filled with either diagonal or horizontal lines, alternating the type of line from one region to the next: the first (left-most) region uses reverse diagonals, the next region uses horizontal lines, and the last region uses diagonals. The lines are drawn from left to right and are stacked alternately above and below the most recently drawn line. By default, the path begins at the top-left corner of the region. This holds for the left-most region and for any region whose left neighbor has no ‘marked’ pixel directly adjacent to the region under consideration. Otherwise, the starting point is the pixel next to the highest ‘marked’ pixel of the left neighbor that is directly adjacent to the region under consideration. Figure 6d shows an example of this strategy. As can be seen, the left-most region starts at the top-left corner (depicted by a black dot), fills that reverse diagonal, then the one above it, then the one below the first one, and finishes halfway through the diagonal above the second one. The region to its right then starts at the pixel adjacent to the highest ‘marked’ pixel of the previous region, using horizontal lines drawn from left to right, filling next the line above the first one and then the line below it. The third region is filled in a similar way, but following diagonal lines.
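As a minimal sketch of the counterclockwise strategy referenced above (the paper describes the path only informally and via Figure 6a, so the helper names and the exact bookkeeping below are our own assumptions), the following code generates the CCW spiral order over a region and marks the first n pixels along it:

```python
def ccw_spiral_order(height, width):
    """Visit every pixel of a (height x width) region in a counterclockwise
    spiral: start at the center pixel (or the pixel just before the center
    for even-sized regions), then move up, left, down, right, and so on,
    never visiting a pixel twice and skipping none along the way.
    Coordinates are (row, col) with row 0 at the top."""
    r, c = (height - 1) // 2, (width - 1) // 2
    moves = [(-1, 0), (0, -1), (1, 0), (0, 1)]   # up, left, down, right
    order, seen = [], set()
    step_len, direction = 1, 0
    while len(order) < height * width:
        for _ in range(2):                        # each step length is used twice
            dr, dc = moves[direction]
            for _ in range(step_len):
                if 0 <= r < height and 0 <= c < width and (r, c) not in seen:
                    order.append((r, c))
                    seen.add((r, c))
                r, c = r + dr, c + dc
            direction = (direction + 1) % 4
        step_len += 1
    return order


def fill_region(order, n_marked):
    """Mark the first `n_marked` pixels along the given path."""
    return set(order[:n_marked])


# Example: mark 5 pixels of a 4x4 region along the CCW spiral.
path = ccw_spiral_order(4, 4)
print(path[:5])              # [(1, 1), (0, 1), (0, 0), (1, 0), (2, 0)]
print(fill_region(path, 5))  # the same 5 coordinates, as a set of marked pixels
```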
3.7. Coloring
3.7.1. Color Channels to Represent Nearby Time Windows
3.7.2. Color Channels to Represent Multiple Sensors
3.8. Augmented Canvas for Multiple Sensors
4. Evaluation Methodology
4.1. Datasets
4.1.1. WISDM Dataset
4.1.2. UCI-HAR Dataset
4.1.3. USC-HAD Dataset
4.1.4. PAMAP2 Dataset
4.1.5. Opportunity Dataset
4.1.6. Daphnet Freezing of Gait Dataset
4.1.7. Skoda Mini Checkpoint Dataset
4.2. Metrics
5. Results
5.1. Benchmarking
5.1.1. Comparison to Existing Image Representations
5.1.2. Comparison to Best Performing Approaches for Each Dataset
WISDM Dataset
UCI-HAR Dataset
USC-HAD Dataset
PAMAP2 Dataset
Opportunity Dataset
Daphnet Freezing of Gait Dataset
Skoda Mini Checkpoint Dataset
5.1.3. Summary of Benchmarking
5.2. Comparison between Different Canvas Layouts
6. Discussion
6.1. Threats to Validity
6.2. Approach Limitations
6.3. Approach Practicality
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Weiser, M. The Computer for the 21st Century. Sci. Am. 1991, 265, 94–105.
- Myles, G.; Friday, A.; Davies, N. Preserving privacy in environments with location-based applications. IEEE Pervasive Comput. 2003, 2, 56–64.
- Guinea, A.S.; Sarabchian, M.; Mühlhäuser, M. Image-based Activity Recognition from IMU Data. In Proceedings of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Kassel, Germany, 22–26 March 2021.
- Sanchez Guinea, A.; Heinrich, S.; Mühlhäuser, M. VIDENS: Vision-based User Identification from Inertial Sensors. In Proceedings of the 2021 International Symposium on Wearable Computers, Virtual Event, 21–26 September 2021; pp. 153–155.
- Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
- Ordóñez, F.J.; Roggen, D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 2016, 16, 115.
- Ha, S.; Yun, J.M.; Choi, S. Multi-modal convolutional neural networks for activity recognition. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics, Hong Kong, China, 9–12 October 2015; pp. 3017–3022.
- Yang, J.B.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep convolutional neural networks on multichannel time series for human activity recognition. In Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 3995–4001.
- Hammerla, N.Y.; Halloran, S.; Plötz, T. Deep, convolutional, and recurrent models for human activity recognition using wearables. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; AAAI Press: Palo Alto, CA, USA, 2016; pp. 1533–1540.
- Ha, S.; Choi, S. Convolutional neural networks for human activity recognition using multiple accelerometer and gyroscope sensors. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 381–388.
- Chen, L.; Zhang, Y.; Peng, L. METIER: A Deep Multi-Task Learning Based Activity and User Recognition Model Using Wearable Sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 5.
- Huang, A.; Wang, D.; Zhao, R.; Zhang, Q. Au-Id: Automatic User Identification and Authentication Through the Motions Captured from Sequential Human Activities Using RFID. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2019, 3, 1–26.
- Sheng, T.; Huber, M. Weakly Supervised Multi-Task Representation Learning for Human Activity Analysis Using Wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 57.
- Wang, Z.; Oates, T. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In Proceedings of the Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–26 January 2015.
- Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. In Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; AAAI Press: Palo Alto, CA, USA, 2015; pp. 3939–3945.
- Hatami, N.; Gavet, Y.; Debayle, J. Classification of time-series images using deep convolutional neural networks. In Proceedings of the Tenth International Conference on Machine Vision (ICMV 2017), Vienna, Austria, 13–15 November 2017; International Society for Optics and Photonics: Bellingham, WA, USA, 2018; Volume 10696, p. 106960Y.
- Yang, C.L.; Yang, C.Y.; Chen, Z.X.; Lo, N.W. Multivariate Time Series Data Transformation for Convolutional Neural Network. In Proceedings of the 2019 IEEE/SICE International Symposium on System Integration (SII), Paris, France, 14–16 January 2019; pp. 188–192.
- Alsheikh, M.A.; Selim, A.; Niyato, D.; Doyle, L.; Lin, S.; Tan, H.P. Deep activity recognition models with triaxial accelerometers. In Proceedings of the Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Alaskar, H. Deep learning-based model architecture for time-frequency images analysis. Int. J. Adv. Comput. Sci. Appl. 2018, 9, 1–9.
- Halberstadt, A.L. Automated detection of the head-twitch response using wavelet scalograms and a deep convolutional neural network. Sci. Rep. 2020, 10, 8344.
- An, S.; Medda, A.; Sawka, M.N.; Hutto, C.J.; Millard-Stafford, M.L.; Appling, S.; Richardson, K.L.; Inan, O.T. AdaptNet: Human Activity Recognition via Bilateral Domain Adaptation Using Semi-Supervised Deep Translation Networks. IEEE Sens. J. 2021, 21, 20398–20411.
- Jiang, W.; Yin, Z. Human activity recognition using wearable sensors by deep convolutional neural networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; pp. 1307–1310.
- Ignatov, A. Real-time human activity recognition from accelerometer data using Convolutional Neural Networks. Appl. Soft Comput. 2018, 62, 915–922.
- Jafari, A.; Ganesan, A.; Thalisetty, C.S.K.; Sivasubramanian, V.; Oates, T.; Mohsenin, T. SensorNet: A scalable and low-power deep convolutional neural network for multimodal data classification. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 66, 274–287.
- Wang, R. Edge detection using convolutional neural network. In International Symposium on Neural Networks; Springer: Berlin/Heidelberg, Germany, 2016; pp. 12–20.
- Lu, J.; Tong, K.Y. Robust Single Accelerometer-Based Activity Recognition Using Modified Recurrence Plot. IEEE Sens. J. 2019, 19, 6317–6324.
- Haresamudram, H.; Anderson, D.V.; Plötz, T. On the role of features in human activity recognition. In Proceedings of the 23rd International Symposium on Wearable Computers, London, UK, 9–13 September 2019; pp. 78–88.
- Kwapisz, J.R.; Weiss, G.M.; Moore, S.A. Activity recognition using cell phone accelerometers. ACM SIGKDD Explor. Newsl. 2011, 12, 74–82.
- Hossain, H.S.; Al Haiz Khan, M.A.; Roy, N. DeActive: Scaling activity recognition with active deep learning. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2018, 2, 1–23.
- Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human Activity Recognition Using Smartphones; ESANN: Bruges, Belgium, 2013; Volume 3, p. 3.
- Murad, A.; Pyun, J.Y. Deep recurrent neural networks for human activity recognition. Sensors 2017, 17, 2556.
- Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; Millán, J.d.R.; Roggen, D. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042.
- Bachlin, M.; Plotnik, M.; Roggen, D.; Maidan, I.; Hausdorff, J.M.; Giladi, N.; Troster, G. Wearable assistant for Parkinson’s disease patients with the freezing of gait symptom. IEEE Trans. Inf. Technol. Biomed. 2009, 14, 436–446.
- Stiefmeier, T.; Roggen, D.; Ogris, G.; Lukowicz, P.; Tröster, G. Wearable activity tracking in car manufacturing. IEEE Pervasive Comput. 2008, 7, 42–50.
- Zeng, M.; Gao, H.; Yu, T.; Mengshoel, O.J.; Langseth, H.; Lane, I.; Liu, X. Understanding and improving recurrent networks for human activity recognition by continuous attention. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, Singapore, 8–12 October 2018; pp. 56–63.
- Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional neural networks for human activity recognition using mobile sensors. In Proceedings of the 6th International Conference on Mobile Computing, Applications and Services, Austin, TX, USA, 6–7 November 2014; pp. 197–205.
- Zhang, Y.; Zhang, Z.; Zhang, Y.; Bao, J.; Zhang, Y.; Deng, H. Human Activity Recognition Based on Motion Sensor Using U-Net. IEEE Access 2019, 7, 75213–75226.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
- Qian, H.; Pan, S.J.; Da, B.; Miao, C. A novel distribution-embedded neural network for sensor-based activity recognition. IJCAI 2019, 2019, 5614–5620.
- Ronao, C.A.; Cho, S.B. Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst. Appl. 2016, 59, 235–244.
- Chen, Z.; Zhang, L.; Cao, Z.; Guo, J. Distilling the knowledge from handcrafted features for human activity recognition. IEEE Trans. Ind. Inform. 2018, 14, 4334–4342.
- Xi, R.; Hou, M.; Fu, M.; Qu, H.; Liu, D. Deep dilated convolution on multimodality time series for human activity recognition. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Xia, K.; Huang, J.; Wang, H. LSTM-CNN Architecture for Human Activity Recognition. IEEE Access 2020, 8, 56855–56866.
- Liu, S.; Yao, S.; Li, J.; Liu, D.; Wang, T.; Shao, H.; Abdelzaher, T. GlobalFusion: A Global Attentional Deep Learning Framework for Multisensor Information Fusion. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–27.
- Guan, Y.; Plötz, T. Ensembles of deep LSTM learners for activity recognition using wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–28.
- Ma, H.; Li, W.; Zhang, X.; Gao, S.; Lu, S. AttnSense: Multi-level attention mechanism for multimodal human activity recognition. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; AAAI Press: Palo Alto, CA, USA, 2019; pp. 3109–3115.
- Zhang, Y.; Zhang, Y.; Zhang, Z.; Bao, J.; Song, Y. Human activity recognition based on time series analysis using U-Net. arXiv 2018, arXiv:1809.08113.
Approach | Dataset | Accuracy | Ours | Difference |
---|---|---|---|---|
Jiang and Yin [23] | UCI-HAR | 0.9518 | 0.9930 | +0.0412 |
Jiang and Yin [23] | USC-HAD | 0.9701 | 0.9900 | +0.0199 |
Ignatov [24] | UCI-HAR | 0.9760 | 0.9930 | +0.0170 |
Ignatov [24] | WISDM | 0.9332 | 0.9750 | +0.0418 |
Jafari et al. [25] | PAMAP2 | 0.9800 | 0.9900 | +0.0100 |
Approach | Details | Accuracy |
---|---|---|
Ignatov [24] | Time series treated as images + CNN | 0.9332 |
Zeng et al. [37] | CNN-based approach | 0.9475 |
Alsheikh et al. [19] | Deep learning models + HMMs | 0.9446 |
Zhang et al. [38] | Based on U-Net network [39] | 0.9640 |
Hossain et al. [30] | Deep and active learning models | 0.9724 |
Our Approach | Image Representation of time series + CNN | 0.9750 |
Approach | Details | Accuracy |
---|---|---|
Qian et al. [40] | Distribution-Embedded Deep Neural Network (DDNN) | 0.9058 |
Ronao and Cho [41] | CNN model | 0.9460 |
Jiang and Yin [23] | DCNN using 2D activity image based on inertial signals | 0.9518 |
Ignatov [24] | CNN + global statistical features | 0.9760 |
Our Approach | Image Representation of Time series + CNN | 0.9930 |
Approach | Details | Accuracy |
---|---|---|
Chen et al. [42] | Deep Long Short-Term Memory (LSTM) | 0.9770 |
Jiang and Yin [23] | DCNN using 2D activity image | 0.9701 |
Murad and Pyun [32] | LSTM-based deep RNN | 0.9780 |
Our Approach | Image Representation of Time series + CNN | 0.9900 |
Approach | Details | F1-Score |
---|---|---|
Zeng et al. [36] | LSTM + Continuous Temporal | 0.8990 |
Xi et al. [43] | Deep Dilated Convolutional networks | 0.9320 |
Qian et al. [40] | Distribution-Embedded Deep NN (DDNN) | 0.9338 |
Hammerla et al. [10] | CNN + RNN | 0.9370 |
Jafari et al. [25] | CNN for multimodal time series image representations | 0.9800 |
Our approach | Image Representation of Time series + CNN | 0.9900 |
Approach | Details | Value (Metric) |
---|---|---|
Hammerla et al. [10] | b-LSTM-S | 0.9270 (F1-score) |
Xia et al. [44] | LSTM-CNN Architecture | 0.9271 (F1-score) |
DeepConvLSTM [7] | Convolutional and LSTM network | 0.9300 (F1-score) |
Hossain et al. [30] | Deep and active learning model | 0.9406 (Accuracy) |
Our approach | Image Representation of Time series + CNN | 0.9500 (F1-score) |
Our approach | Image Representation of Time series + CNN | 0.9540 (Accuracy) |
Approach | Details | Accuracy |
---|---|---|
Liu et al. [45] | Attention modules for spatial fusion | 0.9094 |
Alsheikh et al. [19] | Deep learning models | 0.9150 |
Qian et al. [40] | Distribution-Embedded Deep NN (DDNN) | 0.9161 |
Hossain et al. [30] | Deep and active learning model | 0.9234 |
Our approach | Image Representation of Time series + CNN | 0.9360 |
Approach | Details | F1-Score |
---|---|---|
Guan and Plötz [46] | Ensembles of deep LSTM networks | 0.9260 |
AttnSense [47] | Attention with CNN and a GRU network | 0.9310 |
Zeng et al. [36] | LSTM + Continuous Temporal Attention | 0.9381 |
DeepConvLSTM [7] | Convolutional and LSTM network | 0.9580 |
Our approach | Image Representation of Time series + CNN | 0.9970 |
Dataset | Paper | Approach | Metric | State-of-the-Art Result | Our Result | Difference |
---|---|---|---|---|---|---|
WISDM | Hossain et al. [30] | Deep and active learning | Accuracy | 0.9724 | 0.9750 | +0.0026 |
UCI-HAR | Zhang et al. [48] | Based on U-Net (28 conv. layers) | Accuracy | 0.9840 | 0.9930 | +0.0090 |
USC-HAD | Murad and Pyun [32] | DRNN Model | Accuracy | 0.9780 | 0.9900 | +0.0120 |
PAMAP2 | Jafari et al. [25] | CNN (5 conv. layers) + image reps. | F1-score | 0.9800 | 0.9900 | +0.0100 |
Opportunity | Ordóñez and Roggen [7] | Convolutional and LSTM network | F1-score | 0.9300 | 0.9530 | +0.0230 |
Daphnet | Hossain et al. [30] | Deep and active learning | Accuracy | 0.9234 | 0.9360 | +0.0126 |
Skoda | Ordóñez and Roggen [7] | Convolutional and LSTM network | F1-score | 0.9580 | 0.9970 | +0.0390 |
Canvas Layout | Accuracy |
---|---|
baseline | 0.9750 |
bottom & top halves switched | 0.9572 |
flattened quadrants (over rows) | 0.9657 |
left & right halves switched | 0.9645 |
top-left & bottom-right quadrants switched | 0.9519 |
top-right & bottom-left quadrants switched | 0.9710 |
Avg. | 0.9642 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).