Wearable Sensor-Free Adult Physical Activity Monitoring Using Smartphone IMU Signals: Cross-Subject Deep Learning with Window-Length and Sensor Modality Studies
Abstract
1. Introduction
- We conducted adult HAR experiments under a realistic cross-subject GroupKFold protocol to evaluate generalization to unseen users.
- We propose a reduced 6-class activity taxonomy that mitigates label ambiguity and improves robustness.
- We provide a systematic window-length analysis (2.0 s, 4.0 s, 6.0 s) to quantify segmentation effects.
- We performed sensor modality ablation (accelerometer-only, gyroscope-only, and combined IMU) to identify informative signals.
- We analyzed multimodal fusion with smartwatch data and discussed practical implications for smartphone-based activity monitoring without additional wearable devices.
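The cross-subject protocol in the first bullet can be sketched with scikit-learn's GroupKFold, which assigns every subject's windows to exactly one fold so no user appears in both train and test sets. The shapes and variable names below are illustrative, not the paper's code:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-ins: 100 windows of (80 samples, 6 channels) from 10 subjects.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 80, 6))        # windowed IMU signals
y = rng.integers(0, 6, size=100)         # 6-class labels
subjects = np.repeat(np.arange(10), 10)  # subject ID per window

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, y, groups=subjects):
    # Strict subject independence: no user overlaps between splits.
    assert set(subjects[train_idx]).isdisjoint(set(subjects[test_idx]))
```

Reporting the mean and standard deviation of accuracy and Macro-F1 across the five folds then quantifies how performance varies with the held-out users.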
2. Related Work
3. Materials and Methods
3.1. Dataset
3.2. Preprocessing and Window Segmentation
3.3. Reduced 6-Class Activity Taxonomy
- Locomotion
- Stairs
- Static
- Eat–drink
- Sports
- Upper-body
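A reduced taxonomy like the one above can be implemented as a simple lookup from the 18 fine-grained WISDM labels to the six groups. The assignment below is an illustrative reconstruction based on the published WISDM label set, not the paper's exact mapping:

```python
# Illustrative 18 -> 6 grouping; each assignment is an assumption,
# not taken verbatim from the paper.
LABEL_GROUPS = {
    "walking": "locomotion", "jogging": "locomotion",
    "stairs": "stairs",
    "sitting": "static", "standing": "static",
    "eating soup": "eat-drink", "eating chips": "eat-drink",
    "eating pasta": "eat-drink", "eating sandwich": "eat-drink",
    "drinking": "eat-drink",
    "kicking": "sports", "playing catch": "sports", "dribbling": "sports",
    "typing": "upper-body", "writing": "upper-body",
    "brushing teeth": "upper-body", "clapping": "upper-body",
    "folding clothes": "upper-body",
}

def group_label(fine_label: str) -> str:
    """Map a fine-grained WISDM activity label to its 6-class group."""
    return LABEL_GROUPS[fine_label]
```

Grouping near-indistinguishable fine labels (e.g. the four eating variants) into one class is what reduces label ambiguity under cross-subject evaluation.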
3.4. Deep Learning Models
3.5. Evaluation Protocol and Metrics
4. Experimental Setup
4.1. Implementation
4.2. Hardware
4.3. Training Configuration
4.4. Experiments
- Model comparison: smartphone-only CNN versus multimodal fusion using smartphone and smartwatch signals.
- Window-length study: evaluating the effect of different sliding-window sizes (2.0 s, 4.0 s, 6.0 s) on recognition performance.
- Sensor modality ablation: assessing models trained on accelerometer-only, gyroscope-only, and combined accelerometer–gyroscope inputs.
- Per-class analysis: using class-wise F1-scores and confusion matrices to analyze error patterns and activity-specific behavior.
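Assuming the 20 Hz sampling rate of the WISDM recordings, the three window lengths correspond to 40, 80, and 120 samples, and the modality ablation reduces to selecting channel subsets. A minimal segmentation sketch (the 50% overlap and the channel layout are assumptions, not taken from the paper):

```python
import numpy as np

FS = 20  # Hz, WISDM sampling rate

def segment(signal: np.ndarray, window_s: float, overlap: float = 0.5) -> np.ndarray:
    """Slice a (T, C) signal into overlapping windows of shape (N, L, C)."""
    length = int(window_s * FS)
    step = max(1, int(length * (1 - overlap)))
    starts = range(0, signal.shape[0] - length + 1, step)
    return np.stack([signal[s:s + length] for s in starts])

# Channel layout assumed as [ax, ay, az, gx, gy, gz].
MODALITIES = {"accel": slice(0, 3), "gyro": slice(3, 6), "accel+gyro": slice(0, 6)}

x = np.zeros((1200, 6))              # 60 s of 6-channel phone IMU
w = segment(x, window_s=4.0)         # -> (29, 80, 6) windows
w_gyro = w[..., MODALITIES["gyro"]]  # gyroscope-only ablation input
```

Running the same training pipeline over each `window_s` value and each `MODALITIES` entry reproduces the experimental grid described above.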
4.5. Deployment Feasibility
5. Results and Discussion
5.1. Model Comparison
5.2. Window-Length Analysis
5.3. Sensor Modality Ablation
5.4. Per-Class Performance
6. Conclusions
7. Limitations and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| HAR | Human Activity Recognition |
| IMU | Inertial Measurement Unit |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| TCN | Temporal Convolutional Network |
| F1 | F1-score |
| Macro-F1 | Macro-averaged F1-score |
| GPU | Graphics Processing Unit |
| VRAM | Video Random Access Memory |
References
1. Zhang, S.; Wang, L.; Zhu, J. Deep Learning in Human Activity Recognition with Wearable Sensors: Advances and Challenges. Sensors 2022, 22, 1476.
2. Sousa Lima, W.; Souto, E.; El-Khatib, K.; Jalali, R.; Gama, J. Human Activity Recognition Using Inertial Sensors in a Smartphone: An Overview. Sensors 2019, 19, 3213.
3. Garcia-Gonzalez, D.; Rivero, D.; Fernandez-Blanco, E.; Luaces, M.R. Deep Learning Models for Real-Life Human Activity Recognition from Smartphone Sensor Data. Internet Things 2023, 24, 100925.
4. Baños, O.; Gálvez, J.M.; Damas, M.; Pomares, H.; Rojas, I. Window Size Impact in Human Activity Recognition. Sensors 2014, 14, 6474–6499.
5. Mennella, C.; Esposito, M.; De Pietro, G.; Maniscalco, U. Multiscale Activity Recognition Algorithms to Improve Cross-Subjects Performance Resilience in Rehabilitation Monitoring Systems. Comput. Methods Programs Biomed. 2025, 267, 108792.
6. Weiss, G.M. WISDM Smartphone and Smartwatch Activity and Biometrics Dataset. UCI Mach. Learn. Repos. 2019.
7. Kwapisz, J.R.; Weiss, G.M.; Moore, S.A. Activity Recognition Using Cell Phone Accelerometers. ACM SIGKDD Explor. Newsl. 2011, 12, 74–82.
8. Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
9. Hernandez, N.; Ben-Abdallah, F.; Mazzara, M.; Dragoni, N. Human Activity Recognition Using Deep Learning: A Survey. Sensors 2020, 20, 155.
10. Ronao, C.A.; Cho, S.-B. Human Activity Recognition with Smartphone Sensors Using Deep Learning Neural Networks. Expert Syst. Appl. 2016, 59, 235–244.
11. Sekaran, S.R.; Han, P.Y.; Yin, O.S. Smartphone-Based Human Activity Recognition Using Lightweight Multiheaded Temporal Convolutional Network. Expert Syst. Appl. 2023, 227, 120132.
12. Bulling, A.; Blanke, U.; Schiele, B. A Tutorial on Human Activity Recognition Using Body-Worn Inertial Sensors. ACM Comput. Surv. 2014, 46, 33.
13. Neverova, N.; Wolf, C.; Taylor, G.; Nebout, F. ModDrop: Adaptive Multi-Modal Gesture Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 1692–1706.
14. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human Activity Recognition Using Smartphones. ESANN 2013, 3, 437–442.
15. Shoaib, M.; Bosch, S.; Incel, O.D.; Scholten, H.; Havinga, P.J.M. Complex Human Activity Recognition Using Smartphone and Wrist-Worn Motion Sensors. Sensors 2016, 16, 426.
16. San-Segundo, R.; Gil-Martín, M.; Díaz-Morcillo, A.; Montero, J.M. Human Activity Recognition Using a Smartwatch and a Smartphone. Pattern Recognit. Lett. 2018, 119, 22–29.
17. Mekruksavanich, S.; Jitpattanakul, A. Efficient and Explainable Human Activity Recognition Using Deep Residual Network with Squeeze-and-Excitation Mechanism. Appl. Syst. Innov. 2025, 8, 57.
18. Lamaakal, I.; Yahyati, C.; Maleh, Y.; El Makkaoui, K.; Ouahbi, I.; Abd El-Latif, A.A.; Zomorodi, M.; Abd El-Rahiem, B. A Tiny Inertial Transformer for Human Activity Recognition via Multimodal Knowledge Distillation and Explainable AI. Sci. Rep. 2025, 15, 42335.
19. Zhou, H.; Zhang, X.; Feng, Y.; Zhang, T.; Xiong, L. Efficient Human Activity Recognition on Edge Devices Using DeepConv LSTM Architectures. Sci. Rep. 2025, 15, 13830.
20. Soleimani, E.; Nazerfard, E. Cross-Subject Transfer Learning in Human Activity Recognition Systems Using Generative Adversarial Networks. arXiv 2019, arXiv:1903.12489.
21. Logacjov, A.; Bach, K.; Kongsvold, A.; Bårdstu, H.B.; Mork, P.J. HARTH: A Human Activity Recognition Dataset for Machine Learning. Sensors 2021, 21, 7853.
22. Hoelzemann, A.; Romero, J.L.; Bock, M.; Van Laerhoven, K.; Lv, Q. Hang-Time HAR: A Benchmark Dataset for Basketball Activity Recognition Using Wrist-Worn Inertial Sensors. Sensors 2023, 23, 5879.


| Experiment | Input/Setting | Accuracy (Mean ± Std) | Macro-F1 (Mean ± Std) |
|---|---|---|---|
| CNN Phone-only (baseline) | 4.0 s window, phone accel + gyro (6 ch) | 0.4716 ± 0.0596 | 0.4626 ± 0.0408 |
| FusionTCN + ModDrop (p = 0.3) | 4.0 s window, phone + watch (12 ch) | 0.4189 ± 0.0348 | 0.4074 ± 0.0520 |
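The second row regularizes the fusion model with ModDrop [13], which stochastically silences entire sensor modalities during training so the network cannot over-rely on one stream. A minimal NumPy sketch with p = 0.3; the phone/watch channel layout is an assumption for illustration:

```python
import numpy as np

def moddrop(x, groups, p=0.3, rng=None, training=True):
    """Zero whole channel groups (modalities) with probability p per sample.

    x: (N, L, C) batch of windows; groups: list of channel slices.
    """
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    out = x.copy()
    for g in groups:
        drop = rng.random(x.shape[0]) < p  # one draw per sample per modality
        out[drop, :, g] = 0.0
    return out

# Assumed layout: phone IMU in channels 0-5, watch IMU in channels 6-11.
batch = np.ones((32, 80, 12))
augmented = moddrop(batch, [slice(0, 6), slice(6, 12)], p=0.3,
                    rng=np.random.default_rng(42))
```

At inference time (`training=False`) the input passes through unchanged, mirroring standard dropout behavior.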

| Study | Dataset | Validation Protocol | Classes | Activity Types | Primary Metric | Value | Key Note |
|---|---|---|---|---|---|---|---|
| Group A: User-dependent/random-split protocols (for comparison only) | | | | | | | |
| Ronao & Cho [10] (Expert Syst. Appl., 2016) | UCI-HAR (30 subjects, smartphone) | User-dep. split (70% train/30% test) | 6 | Walk, Upstairs, Downstairs, Sit, Stand, Laying | Accuracy | 95.75% | No subject separation; locomotion-focused |
| Mekruksavanich & Jitpattanakul [17] (ASI, 2025) | WISDM v1.1 (51 subjects, smartphone) | 5-fold CV (user-dep., no subject separation) | 6 | Walk, Jog, Upstairs, Downstairs, Sit, Stand | Accuracy & F1 | Acc: 98.78%; F1: 98.09% | Same WISDM dataset; user-dep. split only |
| Garcia-Gonzalez et al. [3] (IoT, 2023) | WISDM (smartphone) | 7-fold CV (user-independent) | 6 | Walk, Jog, Upstairs, Downstairs, Sit, Stand | F1 | 84.6% | User-indep. but only 6 locomotion classes |
| Group B: Strict cross-subject protocols (LOSO/GroupKFold), directly comparable to the present study | | | | | | | |
| Soleimani & Nazerfard [20] (arXiv, 2019) | Opportunity Challenge (4 subjects, body-worn IMU) | Cross-subject (train on 1, test on another) | 6 | ADL micro-activities: Relaxing, Coffee time, Sandwich, etc. | Weighted F1 (no transfer) | 0.21–0.48 | 22–47% perf. drop vs supervised; even w/GAN: 0.49–0.73 |
| Logacjov et al. [21] (Sensors, 2021) | HARTH (22 subjects, thigh + back accel., fixed placement) | LOSO (22 subjects, free-living) | 12 | Free-living daily activities incl. Stairs asc./desc., Cycling, etc. | Macro-F1 (best: SVM) | 0.81 (±0.18) | Stairs: 40–64% per-class F1; SD = ±0.18 across classes |
| Hoelzemann et al. [22] (Sensors, 2023) | Hang-Time HAR (24 players, wrist IMU, 2 countries) | LOSO (24 players, game + drill sessions) | 10 | Basketball-specific: Dribble, Pass, Layup, Rebound, Run, etc. | Macro-F1 (game session) | ~0.25 (sport classes) | Rebound & layup < 0.20; high intra-class variab. |
| Present Study | | | | | | | |
| Present study (CNN, smartphone-only) | WISDM (51 subjects, smartphone IMU, free placement, 18 orig. labels) | GroupKFold strict (subject-indep., 5 folds) | 6 (grouped from 18) | Locomotion, Stairs, Static, Eat–drink, Sports, Upper-body | Macro-F1 | 0.46 | Random baseline typically < 0.20 for six-class tasks; full subject separation; smartphone free placement |

| Window Length | Accuracy (Mean ± Std) | Macro-F1 (Mean ± Std) |
|---|---|---|
| 2.0 s | 0.4595 ± 0.0439 | 0.4571 ± 0.0443 |
| 4.0 s | 0.4568 ± 0.0553 | 0.4473 ± 0.0525 |
| 6.0 s | 0.4451 ± 0.0535 | 0.4297 ± 0.0558 |

| Input Modality | Accuracy (Mean ± Std) | Macro-F1 (Mean ± Std) |
|---|---|---|
| accel-only (3 ch) | 0.4030 ± 0.0603 | 0.3997 ± 0.0502 |
| gyro-only (3 ch) | 0.5030 ± 0.0467 | 0.4491 ± 0.0417 |
| accel + gyro (6 ch) | 0.4534 ± 0.0559 | 0.4535 ± 0.0643 |

| Class | F1-Score |
|---|---|
| locomotion | 0.512 |
| eat–drink | 0.462 |
| sports | 0.452 |
| upper-body | 0.325 |
| static | 0.151 |
| stairs | 0.140 |
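Macro-F1, used throughout, averages the per-class F1-scores with equal weight, so weak classes such as static and stairs depress the aggregate as much as the frequent ones. A toy scikit-learn illustration (labels are synthetic, not the study's data):

```python
from sklearn.metrics import f1_score

# Synthetic 3-class example for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

per_class = f1_score(y_true, y_pred, average=None)   # one F1 per class
macro = f1_score(y_true, y_pred, average="macro")    # unweighted mean

# Macro-F1 is exactly the unweighted mean of the per-class scores.
assert abs(macro - per_class.mean()) < 1e-12
```

This equal weighting is why per-class tables like the one above are needed to see which activities drive the aggregate score.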
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Turdalyuly, M.; Zholdassova, A.; Turdalykyzy, T.; Doshybekov, A. Wearable Sensor-Free Adult Physical Activity Monitoring Using Smartphone IMU Signals: Cross-Subject Deep Learning with Window-Length and Sensor Modality Studies. Information 2026, 17, 368. https://doi.org/10.3390/info17040368

