1. Introduction
Agricultural settings, particularly those involving open-field cultivation, often exhibit diverse and unpredictable conditions that impede efficient operations and resource management [1,2]. To address these complexities, collaborative systems between humans and robots have been proposed, aiming to foster shared goals through effective information exchange and coordinated task execution [3,4,5].
In human–robot interaction (HRI), robotic agents must interpret human intentions and respond accordingly. This requirement is supported by the application of human activity recognition (HAR), which employs technologies such as wearable sensor systems, computer vision, and machine learning to classify and identify human actions [6,7,8]. Robotic systems can align their operations with human behavior through accurate recognition, facilitating joint tasks like weeding, crop collection, and transportation to storage facilities [9,10].
The agricultural domain has recently witnessed increased research attention toward HAR. A review of related studies reveals the development of more intuitive interaction mechanisms for HRI. For instance, a vision-based method for recognizing static hand gestures was introduced to enhance communication between humans and robots [11]. However, static gesture recognition alone proved insufficient for continuous or complex tasks. Dynamic gesture recognition techniques were proposed to bridge this gap, focusing on capturing a broader range of full-body movements [9,12]. These dynamic systems enabled more precise and responsive robot behaviors, especially in harvesting activities. Nevertheless, these vision-based approaches encountered limitations due to reliance on RGB-D cameras, which are sensitive to changes in lighting and background conditions.
Wearable sensors have been explored as a robust alternative for HAR in agriculture to mitigate the limitations of visual systems. These devices offer supplemental information, allowing reliable recognition even under poor lighting or obstructed visibility. Despite their advantages, several challenges remain unresolved, particularly in the integration and synchronization of sensor data, the issue of signal drift, and the computational cost associated with processing data from multiple sources. Even so, wearable sensor-based HAR systems hold promise for real-time applications because they are easy to implement, affordable, and independent of specific environmental contexts [13].
Accelerometers, gyroscopes, and magnetometers are among the most frequently adopted sensing units in this area [14,15,16]. In addition, wearable technologies such as smartwatches and smart glasses, which integrate multiple sensors into a single device, are becoming increasingly popular [17,18]. Studies have shown that combining data from various sensors enhances overall system reliability, as deficiencies or noise in one sensor can be compensated for by the others.
A central aspect shared by these studies is the use of machine learning classifiers to analyze the input from various sensors. Most machine learning-based HAR systems rely on supervised learning, where algorithms are trained with labeled datasets to detect activity patterns and assign them to specific categories. In multi-sensor HAR frameworks, each sensor typically comprises several physical channels, each capturing motion signals along a different axis. Therefore, analyzing multi-channel data and manually deriving significant features are crucial steps in the recognition process [19].
However, as the number and diversity of sensors in wearable networks grow, traditional machine learning methods face limitations in handling such complex data efficiently. Deep learning offers a viable solution by automatically learning intricate patterns and temporal dependencies and extracting features directly from raw signals. Owing to these capabilities, researchers have increasingly turned to deep learning-based frameworks for HAR tasks [20]. Prominent models such as convolutional neural networks (CNNs) [21], long short-term memory (LSTM) networks [22], and gated recurrent units (GRUs) [23] have been widely adopted for their high accuracy and adaptability in recognizing human activities.
This study introduces a novel method for HAR based on wearable sensor data. It employs a lightweight and high-efficiency deep residual architecture that incorporates aggregated transformations, following the concept presented in [24]. The proposed approach addresses critical shortcomings of conventional models by incorporating structural enhancements into the ResNet framework, specifically designed to handle the challenges posed by human activity data in HRI within agricultural environments.
The main contributions of this work are summarized as follows:
A specialized end-to-end deep learning framework, 1D-ResNeXt, has been developed to address HAR in agricultural domains. Unlike prior adaptations of ResNeXt that merely transform two-dimensional operations into one-dimensional ones, this model incorporates three distinct innovations. First, it employs causal convolution to eliminate the risk of temporal information leakage when processing sequential sensor streams. Second, it introduces a multi-kernel aggregation mechanism tailored for agricultural activities, enabling the model to capture movement characteristics across multiple time scales in parallel. Third, it adopts a lightweight parameter reduction strategy that relies on additive feature fusion rather than concatenation, lowering computational overhead. This design ensures that the architecture can be efficiently deployed on resource-limited agricultural edge devices while preserving high recognition accuracy.
The developed model enhances the accuracy and resilience of HAR applications in agricultural scenarios. By integrating advanced feature extraction mechanisms and an attention module, it effectively addresses the complexity and variability inherent in real-world physical activities.
Extensive comparative evaluations against state-of-the-art HAR techniques are conducted using a publicly available benchmark dataset. The results demonstrate the proposed model’s superior performance and adaptability across various sensing modalities and environmental conditions.
The organization of this paper is as follows. Section 2 comprehensively reviews prior studies on wearable sensor-based HAR and deep learning approaches. Section 3 introduces the proposed framework, describing the architecture and implementation of the 1D-ResNeXt model. Section 4 details the experimental design and reports the performance evaluation results. Section 5 analyzes the findings, providing key interpretations and insights. Finally, Section 6 concludes the paper and outlines prospective directions for future research.
3. Methodology
This section outlines the systematic approach adopted for constructing an optimized, lightweight HAR system using wearable sensors, tailored explicitly for human–robot collaboration in agricultural settings.
Figure 1 depicts the full end-to-end architecture of the proposed system, comprising four primary stages: data collection, data pre-processing, model development, and performance evaluation.
The process initiates with the acquisition of sensor data from multiple wearable devices, including tri-axial accelerometers, gyroscopes, and magnetometers. These sensors are strategically placed on five key regions of the body—namely the chest, neck (cervix), right wrist, left wrist, and lower back (lumbar region)—to capture a wide range of movement patterns.
Once acquired, the raw sensor signals are subjected to a structured pre-processing sequence. This involves noise reduction, value normalization, and segmentation of the continuous signals into manageable input windows, preparing the data for downstream learning tasks.
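To make this stage concrete, the following sketch shows one plausible implementation of the denoising, normalization, and sliding-window segmentation steps. The median filter, the assumed 100 Hz sampling rate, and the window length and overlap are illustrative choices, not the exact settings used in this study.

```python
# A minimal pre-processing sketch, assuming raw data of shape (timesteps, channels)
# sampled at a hypothetical 100 Hz; filter size, window length, and overlap are
# illustrative rather than the study's exact settings.
import numpy as np
from scipy.signal import medfilt

def preprocess(raw, window_size=50, step=25):
    """Denoise, z-score normalize, and segment a multi-channel signal into windows."""
    # Simple per-channel noise reduction with a median filter.
    filtered = np.stack([medfilt(raw[:, c], kernel_size=5)
                         for c in range(raw.shape[1])], axis=1)
    # Z-score normalization per channel.
    mean, std = filtered.mean(axis=0), filtered.std(axis=0) + 1e-8
    normalized = (filtered - mean) / std
    # Sliding-window segmentation into fixed-length inputs.
    starts = range(0, len(normalized) - window_size + 1, step)
    return np.stack([normalized[s:s + window_size] for s in starts])

# Example: 60 s of 9-channel IMU data at 100 Hz segmented into 0.5 s windows.
segments = preprocess(np.random.randn(6000, 9))   # -> (n_windows, 50, 9)
```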
The refined data is subsequently utilized to train and validate the proposed deep learning model, based on a one-dimensional ResNeXt architecture. A 5-fold cross-validation strategy is employed, with an 80/20 split between training and testing datasets to ensure robust performance assessment.
In the final phase, the model’s effectiveness in identifying agricultural activities under collaborative human–robot conditions is quantitatively measured. Standard evaluation indicators—accuracy, precision, recall, and F1-score—determine the model’s predictive capability and classification reliability.
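The evaluation protocol can be sketched as follows, assuming windowed samples `X` with integer activity labels `y` and a hypothetical `build_model()` constructor standing in for the proposed network; the stratified splitting and macro-averaged metrics are assumptions where the text does not specify them.

```python
# Sketch of 5-fold cross-validation with accuracy, precision, recall, and F1-score;
# build_model() is a placeholder, not the authors' exact implementation.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def cross_validate(X, y, build_model, epochs=50, batch_size=64):
    """Each of the 5 folds corresponds to an 80/20 train/test split."""
    y_onehot = tf.keras.utils.to_categorical(y)        # targets for categorical cross-entropy
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        model = build_model(input_shape=X.shape[1:], n_classes=y_onehot.shape[1])
        model.fit(X[train_idx], y_onehot[train_idx],
                  epochs=epochs, batch_size=batch_size, verbose=0)
        y_pred = model.predict(X[test_idx], verbose=0).argmax(axis=1)
        acc = accuracy_score(y[test_idx], y_pred)
        prec, rec, f1, _ = precision_recall_fscore_support(
            y[test_idx], y_pred, average='macro', zero_division=0)
        scores.append((acc, prec, rec, f1))
    # Mean and standard deviation across folds, as reported in Tables 4-9.
    return np.mean(scores, axis=0), np.std(scores, axis=0)
```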
3.3. The Proposed Deep Learning Model
This study introduces a novel multi-branch aggregation architecture, 1D-ResNeXt, which extends the foundational principles of the original ResNeXt model [24]. While ResNeXt was originally designed for image-based tasks in computer vision, 1D-ResNeXt adapts and repurposes its architecture specifically for processing one-dimensional time-series sensor data to enhance activity recognition.
The proposed architecture introduces several significant modifications. First, traditional two-dimensional convolutional operations are restructured into dedicated one-dimensional blocks. These blocks incorporate causal convolution, ensuring that future time steps cannot influence current predictions—an essential property for real-time agricultural systems that earlier ResNeXt variants fail to provide. Second, the network enhances temporal representation by employing a multi-resolution feature extraction strategy, which captures rapid motion changes and extended activity patterns commonly observed in farming tasks. Finally, unlike the original ResNeXt—designed exclusively for two-dimensional spatial analysis—the 1D-ResNeXt integrates agricultural activity-specific temporal modeling. It applies additive feature fusion rather than concatenation, cutting computational cost by roughly 40% while achieving superior recognition performance on sequential sensor data.
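The causal-convolution property described above can be illustrated with standard Keras layers: `padding='causal'` restricts each output step to current and past inputs, whereas `'same'` padding also looks ahead. The tensor shape and filter count below are illustrative.

```python
# Illustration of causal vs. non-causal 1D convolution; shapes and filter counts
# are arbitrary examples, not the paper's configuration.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 50, 9))                                   # (batch, timesteps, channels)
causal_out = layers.Conv1D(32, kernel_size=3, padding='causal')(x)  # uses past/present inputs only
standard_out = layers.Conv1D(32, kernel_size=3, padding='same')(x)  # 'same' padding looks one step ahead
print(causal_out.shape, standard_out.shape)                          # both (1, 50, 32); only the causal
                                                                     # variant avoids future leakage
```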
The model formulation presented here emphasizes the specific modifications made to adapt ResNeXt for 1D time-series sensor inputs. These core design enhancements are what enable the model to outperform conventional ResNeXt implementations that are not optimized for sequential input.
Architecturally, the proposed 1D-ResNeXt model comprises a series of convolutional blocks along with multi-kernel branches that perform additive aggregation of feature maps. This differs from the approach used in InceptionTime [39], which combines feature maps through concatenation across different kernel sizes. The aggregation-by-addition mechanism adopted in 1D-ResNeXt reduces the total number of trainable parameters, thereby improving the model’s efficiency and making it suitable for deployment in edge computing environments where latency and computational resources are constrained.
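The parameter saving from aggregation by addition is easy to see in a small sketch: adding the branch outputs keeps the channel width of a single branch, whereas concatenation multiplies it by the number of branches and inflates the parameter count of the following layer. The filter counts here are illustrative, not the paper's configuration.

```python
# Additive fusion vs. concatenation: the layer that consumes the fused features
# has roughly 3x fewer weights in the additive case (three branches assumed).
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(50, 64))
branches = [layers.Conv1D(64, k, padding='same')(inp) for k in (3, 5, 7)]

added = layers.Add()(branches)                  # stays at 64 channels
concatenated = layers.Concatenate()(branches)   # grows to 192 channels

head_add = layers.Conv1D(64, 3, padding='same')(added)         # 3*64*64 + 64 weights
head_cat = layers.Conv1D(64, 3, padding='same')(concatenated)  # 3*192*64 + 64 weights

Model(inp, head_add).summary()   # compare the parameter counts of the two heads
Model(inp, head_cat).summary()
```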
A detailed architectural overview of the proposed model is illustrated in Figure 3.
The proposed architecture’s convolutional module consists of four primary layers: a one-dimensional convolutional layer (Conv1D), batch normalization (BN), a rectified linear unit (ReLU), and a max-pooling (MP) layer. The Conv1D layer employs learnable one-dimensional filters to extract discriminative features by producing corresponding feature maps from the input signal. These filters are aligned with the one-dimensional structure of the input time-series data.
The BN layer normalizes activations, thereby improving the stability of the training process and accelerating convergence. The ReLU activation function introduces non-linearity, enabling the model to capture more complex patterns. Meanwhile, the max-pooling layer performs spatial downsampling, reducing the dimensionality of feature maps while preserving salient information.
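A minimal sketch of this convolutional module, written with the Keras functional API, is given below; the filter count, kernel size, and pool size are placeholders rather than the study's exact hyperparameters.

```python
# Convolutional module sketch: Conv1D -> BatchNorm -> ReLU -> MaxPooling.
from tensorflow.keras import layers

def conv_block(x, filters=64, kernel_size=3):
    """One convolutional module as described in Section 3.3 (illustrative sizes)."""
    x = layers.Conv1D(filters, kernel_size, padding='causal')(x)  # learnable 1D filters
    x = layers.BatchNormalization()(x)                            # stabilizes and speeds up training
    x = layers.Activation('relu')(x)                              # non-linearity
    return layers.MaxPooling1D(pool_size=2)(x)                    # temporal downsampling
```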
In addition to the standard convolutional pipeline, the model incorporates a multi-kernel (MK) block. Each MK module utilizes three parallel convolutional filters of varying sizes—1 × 3, 1 × 5, and 1 × 7—to capture features at multiple temporal resolutions. A 1 × 1 convolution is applied before each of the larger kernel operations to reduce model complexity and minimize the number of trainable parameters. This architectural enhancement promotes computational efficiency without compromising representational power.
Figure 4 illustrates the MK block’s internal structure.
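Under assumed channel counts, the MK block can be sketched as follows: a pointwise (1 × 1) convolution reduces the channel dimension before each of the larger kernels, and the three branches are fused by addition rather than concatenation.

```python
# Multi-kernel (MK) block sketch with parallel kernel sizes 3, 5, and 7;
# the filter and bottleneck widths are illustrative assumptions.
from tensorflow.keras import layers

def mk_block(x, filters=64, bottleneck=16):
    """Parallel multi-resolution branches, each preceded by a 1x1 reduction, fused by addition."""
    branches = []
    for k in (3, 5, 7):
        b = layers.Conv1D(bottleneck, 1, padding='causal')(x)  # 1x1 convolution reduces channels
        b = layers.Conv1D(filters, k, padding='causal')(b)     # temporal features at scale k
        branches.append(b)
    out = layers.Add()(branches)                               # additive aggregation, not concatenation
    out = layers.BatchNormalization()(out)
    return layers.Activation('relu')(out)
```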
After feature extraction, the resulting maps are passed through a global average pooling (GAP) layer, compressing each feature map into a single scalar by averaging its elements. This process converts the multidimensional feature output into a one-dimensional vector. The final classification is performed through a fully connected layer, followed by a softmax activation, which produces a probability distribution over the target classes.
Model optimization is performed using the Adam optimizer, known for its adaptive learning rate properties. For loss computation, categorical cross-entropy is employed, a standard objective function for multi-class classification tasks, which measures the divergence between predicted probabilities and true class labels to guide the learning process.
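Putting the pieces together, the sketch below assembles an illustrative classification head and training configuration from the conv_block and mk_block sketches above. The Adam settings follow Table 2, with "Decay" interpreted here as weight decay; the input shape, network depth, and class count (ten activities, per Table 1) are assumptions for illustration.

```python
# Illustrative assembly of the network head and training setup; this is a sketch
# under assumed shapes and depth, not the authors' exact 1D-ResNeXt definition.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(input_shape=(50, 9), n_classes=10):
    inp = layers.Input(shape=input_shape)
    x = conv_block(inp)                        # defined in the sketches above
    x = mk_block(x)
    x = layers.GlobalAveragePooling1D()(x)     # one scalar per feature map
    out = layers.Dense(n_classes, activation='softmax')(x)
    model = Model(inp, out)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999,
                                           weight_decay=0.01),   # Table 2; "Decay" read as weight decay
        loss='categorical_crossentropy',       # expects one-hot activity labels
        metrics=['accuracy'])
    return model
```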
5. Discussion
The extensive experimental assessment of the proposed 1D-ResNeXt architecture provided valuable findings for enhancing HAR using wearable sensors in agricultural HRI. The outcomes highlight the model’s robustness across various sensor placement schemes, temporal segmentation settings, and combinations of sensor modalities, confirming the effectiveness of the proposed methodology.
6. Conclusions and Future Directions
This study demonstrates that strategic sensor placement and architectural innovation can transform HAR systems for agricultural human–robot collaboration. The central finding—that chest-mounted sensors paired with 0.5 s temporal windows achieve 99.92% accuracy—highlights a key principle: effective agricultural HAR relies not merely on computational sophistication but on recognizing the biomechanical signatures of farm work itself.
These findings carry three significant implications for the field. First, they challenge the prevailing assumption that longer temporal contexts improve activity recognition. Our results show that agricultural tasks exhibit highly discriminative features within short intervals, suggesting that real-time systems can achieve both low latency and high accuracy. This insight addresses a critical barrier to deploying HAR in time-sensitive agricultural operations, where delayed recognition may compromise safety or efficiency. Second, the multimodal sensor analysis offers a nuanced perspective on the informational value of farming contexts. While trimodal fusion (accelerometer, gyroscope, magnetometer) yields the highest accuracy, the accelerometer–gyroscope combination achieves 99.49%—a marginal 0.43% reduction that significantly decreases computational overhead and power consumption. This finding has immediate practical value: designers of agricultural wearables can make principled trade-offs between recognition performance and energy sustainability, enabling full-day operation without sacrificing worker comfort or mobility. Third, and perhaps most importantly, the architectural modifications introduced in 1D-ResNeXt—causal convolution, multi-kernel aggregation, and additive fusion—constitute domain-specific innovations rather than generic adaptations. These design choices stem from recognizing that agricultural activities differ from the daily living activities typically studied in conventional HAR research. The 40% parameter reduction achieved through additive fusion, combined with superior recognition accuracy, underscores that agricultural HAR benefits most from purpose-built architectures attuned to the temporal and biomechanical dynamics of farm work.
Beyond its technical contributions, this work addresses a fundamental challenge in agricultural robotics: enabling machines to safely and effectively share workspaces with humans. Accurate, real-time activity recognition forms the perceptual foundation of collaborative farm systems—robots that adapt to what workers are doing, not merely where they are located. The chest sensor’s consistent superiority across temporal windows suggests that core body movements, rather than limb-specific motions, provide the most reliable signals for human–robot coordination.
Future investigations should advance in four directions to ensure these findings achieve practical relevance. First, large-scale field trials need to be carried out in varied agricultural settings such as fruit harvesting, grain processing, and livestock management. These experiments must account for real-world challenges, including diverse weather conditions, irregular terrain, and heterogeneous equipment setups. Performance should also be monitored over extended durations—several months—to evaluate sensor robustness and potential model drift. Second, standardized multi-dataset evaluation frameworks should be developed by gathering HAR data from agricultural practices across different regions, crop systems, and operational methods. Such protocols would validate the system’s ability to generalize across domains. Third, seamless integration with robotic platforms should be explored by designing communication interfaces between HAR modules and agricultural robots. This would enable real-time adaptation to human activity changes, enhancing operational safety and efficiency. Finally, detailed energy optimization studies are required to extend device usability throughout full working days. These efforts should examine adaptive sampling strategies and context-aware sensor activation to minimize power consumption without sacrificing recognition accuracy.
Author Contributions
Conceptualization, S.M. and A.J.; methodology, S.M.; software, A.J.; validation, A.J.; formal analysis, S.M.; investigation, S.M.; resources, A.J.; data curation, A.J.; writing—original draft preparation, S.M.; writing—review and editing, A.J.; visualization, S.M.; supervision, A.J.; project administration, A.J.; funding acquisition, S.M. and A.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research budget was allocated by the University of Phayao; the Thailand Science Research and Innovation Fund (Fundamental Fund); National Science, Research and Innovation Fund (NSRF); and King Mongkut’s University of Technology North Bangkok (Project no. KMUTNB-FF-68-B-02).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
This research utilizes a pre-existing, publicly available dataset. The dataset has been anonymized and does not contain any personally identifiable information. The source of the dataset is cited in the manuscript, and the authors have complied with the terms of use set forth by the dataset provider.
Data Availability Statement
The original data presented in this study are openly available in the Human–Robot Collaboration in Agriculture (HRCA) dataset at https://ibo.certh.gr/open-datasets/ (accessed on 12 April 2025).
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Fue, K.G.; Porter, W.M.; Barnes, E.M.; Rains, G.C. An Extensive Review of Mobile Agricultural Robotics for Field Operations: Focus on Cotton Harvesting. AgriEngineering 2020, 2, 150–174.
- Tan, Y.; Liu, X.; Zhang, J.; Wang, Y.; Hu, Y. A Review of Research on Fruit and Vegetable Picking Robots Based on Deep Learning. Sensors 2025, 25, 3677.
- Ogenyi, U.E.; Liu, J.; Yang, C.; Ju, Z.; Liu, H. Physical Human–Robot Collaboration: Robotic Systems, Learning Methods, Collaborative Strategies, Sensors, and Actuators. IEEE Trans. Cybern. 2021, 51, 1888–1901.
- Raja, R. Software Architecture for Agricultural Robots: Systems, Requirements, Challenges, Case Studies, and Future Perspectives. IEEE Trans. AgriFood Electron. 2024, 2, 125–137.
- Han, J.; Conti, D. Recent Advances in Human–Robot Interactions. Appl. Sci. 2025, 15, 6850.
- Liu, H.; Gamboa, H.; Schultz, T. Human Activity Recognition, Monitoring, and Analysis Facilitated by Novel and Widespread Applications of Sensors. Sensors 2024, 24, 5250.
- Bhola, G.; Vishwakarma, D.K. A review of vision-based indoor HAR: State-of-the-art, challenges, and future prospects. Multimed. Tools Appl. 2023, 83, 1965–2005.
- Ankalaki, S. Simple to Complex, Single to Concurrent Sensor-Based Human Activity Recognition: Perception and Open Challenges. IEEE Access 2024, 12, 93450–93486.
- Moysiadis, V.; Benos, L.; Karras, G.; Kateris, D.; Peruzzi, A.; Berruto, R.; Papageorgiou, E.; Bochtis, D. Human–Robot Interaction through Dynamic Movement Recognition for Agricultural Environments. AgriEngineering 2024, 6, 2494–2512.
- Upadhyay, A.; Zhang, Y.; Koparan, C.; Rai, N.; Howatt, K.; Bajwa, S.; Sun, X. Advances in ground robotic technologies for site-specific weed management in precision agriculture: A review. Comput. Electron. Agric. 2024, 225, 109363.
- Moysiadis, V.; Katikaridis, D.; Benos, L.; Busato, P.; Anagnostis, A.; Kateris, D.; Pearson, S.; Bochtis, D. An Integrated Real-Time Hand Gesture Recognition Framework for Human–Robot Interaction in Agriculture. Appl. Sci. 2022, 12, 8160.
- Pal, A.; Leite, A.C.; From, P.J. A novel end-to-end vision-based architecture for agricultural human–robot collaboration in fruit picking operations. Robot. Auton. Syst. 2024, 172, 104567.
- Thottempudi, P.; Acharya, B.; Moreira, F. High-Performance Real-Time Human Activity Recognition Using Machine Learning. Mathematics 2024, 12, 3622.
- Mekruksavanich, S.; Jitpattanakul, A. A Deep Learning Network with Aggregation Residual Transformation for Human Activity Recognition Using Inertial and Stretch Sensors. Computers 2023, 12, 141.
- Aguileta, A.A.; Brena, R.F.; Mayora, O.; Molino-Minero-Re, E.; Trejo, L.A. Multi-Sensor Fusion for Activity Recognition—A Survey. Sensors 2019, 19, 3808.
- Pham, M.; Yang, D.; Sheng, W. A Sensor Fusion Approach to Indoor Human Localization Based on Environmental and Wearable Sensors. IEEE Trans. Autom. Sci. Eng. 2019, 16, 339–350.
- Mekruksavanich, S.; Jitpattanakul, A.; Youplao, P.; Yupapin, P. Enhanced Hand-Oriented Activity Recognition Based on Smartwatch Sensor Data Using LSTMs. Symmetry 2020, 12, 1570.
- Mekruksavanich, S.; Jantawong, P.; Jitpattanakul, A. Deep Learning Approaches for HAR of Daily Living Activities Using IMU Sensors in Smart Glasses. In Proceedings of the 2023 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON), Phuket, Thailand, 22–25 March 2023; pp. 474–478.
- Ye, X.; Sakurai, K.; Nair, N.K.C.; Wang, K.I.K. Machine Learning Techniques for Sensor-Based Human Activity Recognition with Data Heterogeneity—A Review. Sensors 2024, 24, 7975.
- Kaseris, M.; Kostavelis, I.; Malassiotis, S. A Comprehensive Survey on Deep Learning Methods in Human Activity Recognition. Mach. Learn. Knowl. Extr. 2024, 6, 842–876.
- Lai, Y.C.; Kan, Y.C.; Hsu, K.C.; Lin, H.C. Multiple inputs modeling of hybrid convolutional neural networks for human activity recognition. Biomed. Signal Process. Control 2024, 92, 106034.
- Sassi Hidri, M.; Hidri, A.; Alsaif, S.A.; Alahmari, M.; AlShehri, E. Enhancing Sensor-Based Human Physical Activity Recognition Using Deep Neural Networks. J. Sens. Actuator Netw. 2025, 14, 42.
- Pan, J.; Hu, Z.; Yin, S.; Li, M. GRU with Dual Attentions for Sensor-Based Human Activity Recognition. Electronics 2022, 11, 1797.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5987–5995.
- Fukatsu, T.; Nanseki, T. Monitoring System for Farming Operations with Wearable Devices Utilized Sensor Networks. Sensors 2009, 9, 6171–6184.
- Yerebakan, M.O.; Hu, B. Wearable Sensors Assess the Effects of Human–Robot Collaboration in Simulated Pollination. Sensors 2024, 24, 577.
- Dentamaro, V.; Gattulli, V.; Impedovo, D.; Manca, F. Human activity recognition with smartphone-integrated sensors: A survey. Expert Syst. Appl. 2024, 246, 123143.
- Uddin, M.Z.; Soylu, A. Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci. Rep. 2021, 11, 16455.
- Aiello, G.; Catania, P.; Vallone, M.; Venticinque, M. Worker safety in agriculture 4.0: A new approach for mapping operator’s vibration risk through Machine Learning activity recognition. Comput. Electron. Agric. 2022, 193, 106637.
- Tagarakis, A.C.; Benos, L.; Aivazidou, E.; Anagnostis, A.; Kateris, D.; Bochtis, D. Wearable Sensors for Identifying Activity Signatures in Human-Robot Collaborative Agricultural Environments. Eng. Proc. 2021, 9, 5.
- Anagnostis, A.; Benos, L.; Tsaopoulos, D.; Tagarakis, A.; Tsolakis, N.; Bochtis, D. Human Activity Recognition Through Recurrent Neural Networks for Human–Robot Interaction in Agriculture. Appl. Sci. 2021, 11, 2188.
- Chen, K.; Zhang, D.; Yao, L.; Guo, B.; Yu, Z.; Liu, Y. Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges, and Opportunities. ACM Comput. Surv. 2021, 54, 77.
- Mekruksavanich, S.; Jitpattanakul, A. RNN-based deep learning for physical activity recognition using smartwatch sensors: A case study of simple and complex activity recognition. Math. Biosci. Eng. 2022, 19, 5671–5698.
- Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition. Sensors 2016, 16, 115.
- Imran, H.A.; Hamza, K.; Mehmood, Z. HARResNext: An efficient ResNext inspired network for human activity recognition with inertial sensors. In Proceedings of the 2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2), Rawalpindi, Pakistan, 24–26 May 2022; pp. 1–4.
- Benos, L.; Tsaopoulos, D.; Tagarakis, A.C.; Kateris, D.; Bochtis, D. Optimal Sensor Placement and Multimodal Fusion for Human Activity Recognition in Agricultural Tasks. Appl. Sci. 2024, 14, 8520.
- Slattery, P.; Cofré Lizama, L.E.; Wheat, J.; Gastin, P.; Dascombe, B.; Middleton, K. The Agreement between Wearable Sensors and Force Plates for the Analysis of Stride Time Variability. Sensors 2024, 24, 3378.
- Salminen, M.; Perttunen, J.; Avela, J.; Vehkaoja, A. A novel method for accurate division of the gait cycle into seven phases using shank angular velocity. Gait Posture 2024, 111, 1–7.
- Ismail Fawaz, H.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.A.; Petitjean, F. InceptionTime: Finding AlexNet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
- Bragança, H.; Colonna, J.G.; Oliveira, H.A.B.F.; Souto, E. How Validation Methodology Influences Human Activity Recognition Mobile Systems. Sensors 2022, 22, 2360.
- Yu, J.; Zhang, L.; Cheng, D.; Bu, C.; Wu, H.; Song, A. RepMobile: A MobileNet-Like Network With Structural Reparameterization for Sensor-Based Human Activity Recognition. IEEE Sens. J. 2024, 24, 24224–24237.
- Silpa, A.S.; Benifa, J.B.; Anu, K.; Vijayakumar, A. Human Activity Recognition Using Efficientnet-B0 Deep Learning Model. In Proceedings of the 2023 Intelligent Computing and Control for Engineering and Business Systems (ICCEBS), Chennai, India, 14–15 December 2023; pp. 1–3.
- Zhou, H.; Zhao, Y.; Liu, Y.; Lu, S.; An, X.; Liu, Q. Multi-Sensor Data Fusion and CNN-LSTM Model for Human Activity Recognition System. Sensors 2023, 23, 4750.
Figure 1. End-to-end workflow of the proposed wearable sensor-based activity recognition framework.
Figure 2. A visual representation of the wearable sensors’ placement on the human body.
Figure 3. The architecture of the 1D-ResNeXt model.
Figure 4. Details of the multi-kernel module.
Figure 5. F1-score analysis of activity recognition performance across different sensor placements and time window sizes.
Figure 6. Confusion matrices of the proposed 1D-ResNeXt using sensor data from different placements: (a) chest, (b) cervix, (c) lumbar, (d) left wrist, and (e) right wrist.
Figure 7. Learning curves of the proposed 1D-ResNeXt using sensor data from different placements: (a) chest, (b) cervix, (c) lumbar, (d) left wrist, and (e) right wrist.
Table 1. Details of activities in the HRCA dataset.

| Class | Activity Label | Description |
|---|---|---|
| 0 | Standing still | The subject remains motionless, waiting for a start signal to begin the task. |
| 1 | Walking without a crate | The action starts when one foot contacts the ground to bear weight, while the opposite foot prepares to step. |
| 2 | Bending to approach an empty crate | This involves the subject initiating a forward bend at the trunk and/or knees to reach an empty crate. |
| 3 | Bending to approach a full crate | Similar to Class 2, but the target object is a full crate instead of an empty one. |
| 4 | Lifting an empty crate | The subject moves from a bent posture to a fully upright stance while lifting an empty crate. |
| 5 | Lifting a full crate | This action mirrors Class 4, but with a full crate being lifted instead of an empty one. |
| 6 | Walking with an empty crate | The walking phase begins while the participant carries an empty crate, similar in motion to Class 1. |
| 7 | Walking with a full crate | This movement replicates Class 6 but includes carrying a full crate. |
| 8 | Placing an empty crate onto the UGV | The task starts with trunk or knee flexion and ends when the empty crate is successfully placed on the UGV. |
| 9 | Placing a full crate onto the UGV | Same as Class 8, but the object placed is a full crate instead of an empty one. |
Table 2. Adam optimizer parameters.

| Parameter Name | Parameter Value |
|---|---|
| Learning rate | 0.001 |
| Beta_1 | 0.9 |
| Beta_2 | 0.999 |
| Epsilon | |
| Decay | 0.01 |
Table 3. Evaluation metrics employed in this study.

| Metric | Equation * | Description |
|---|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Defines the ratio of correctly predicted samples (both positive and negative) to the total number of evaluated instances. |
| Precision | TP / (TP + FP) | Specifies the proportion of correctly identified positive cases among all instances predicted as positive by the model. |
| Recall | TP / (TP + FN) | Indicates the fraction of actual positive samples that were successfully recognized by the classifier. |
| F1-score | 2 × (Precision × Recall) / (Precision + Recall) | Represents the harmonic mean of precision and recall, providing a balanced measure when both false positives and false negatives are significant. |

* TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Table 4. Performance evaluation of the proposed 1D-ResNeXt model using 0.5 s temporal windows across different sensor placements.

| Sensor Placements | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Chest | 99.92% (±0.10%) | 99.77% (±0.30%) | 99.96% (±0.05%) | 99.86% (±0.18%) |
| Cervix | 97.76% (±1.06%) | 96.59% (±2.37%) | 96.34% (±2.06%) | 96.39% (±2.18%) |
| Lumbar | 99.36% (±0.70%) | 99.47% (±0.57%) | 99.47% (±0.57%) | 99.46% (±0.58%) |
| Left wrist | 97.33% (±1.77%) | 97.25% (±2.06%) | 96.76% (±2.26%) | 96.88% (±2.28%) |
| Right wrist | 97.88% (±0.99%) | 98.05% (±0.85%) | 97.84% (±1.50%) | 97.90% (±1.18%) |
Table 5. Performance evaluation of the proposed 1D-ResNeXt model using 1.0 s temporal windows across different sensor placements.

| Sensor Placements | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Chest | 97.46% (±3.27%) | 96.02% (±4.72%) | 95.30% (±6.77%) | 94.74% (±7.38%) |
| Cervix | 95.16% (±3.42%) | 93.99% (±3.25%) | 89.45% (±7.21%) | 90.65% (±6.24%) |
| Lumbar | 98.82% (±0.87%) | 98.18% (±1.27%) | 97.99% (±1.47%) | 98.00% (±1.46%) |
| Left wrist | 91.62% (±3.12%) | 91.75% (±3.55%) | 86.52% (±6.12%) | 87.99% (±5.34%) |
| Right wrist | 94.80% (±1.97%) | 93.36% (±5.44%) | 92.68% (±5.76%) | 92.86% (±5.67%) |
Table 6. Performance evaluation of the proposed 1D-ResNeXt model using 1.5 s temporal windows across different sensor placements.

| Sensor Placements | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Chest | 97.36% (±2.56%) | 96.12% (±3.72%) | 97.31% (±3.53%) | 96.26% (±3.78%) |
| Cervix | 93.30% (±3.60%) | 93.00% (±3.43%) | 89.86% (±6.85%) | 90.28% (±6.09%) |
| Lumbar | 95.64% (±2.48%) | 95.35% (±2.31%) | 95.02% (±2.46%) | 94.63% (±2.41%) |
| Left wrist | 93.30% (±3.60%) | 93.00% (±3.43%) | 89.86% (±6.85%) | 90.28% (±6.09%) |
| Right wrist | 96.62% (±1.93%) | 96.76% (±1.97%) | 94.40% (±4.01%) | 95.20% (±3.28%) |
Table 7. Performance evaluation of the proposed 1D-ResNeXt model using 2.0 s temporal windows across different sensor placements.

| Sensor Placements | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Chest | 95.93% (±1.88%) | 92.48% (±6.55%) | 92.79% (±5.61%) | 92.40% (±6.27%) |
| Cervix | 86.04% (±3.05%) | 76.10% (±2.70%) | 75.31% (±4.78%) | 74.88% (±3.86%) |
| Lumbar | 92.20% (±1.29%) | 91.75% (±4.62%) | 91.05% (±3.28%) | 90.61% (±3.89%) |
| Left wrist | 89.73% (±4.97%) | 82.78% (±6.10%) | 81.67% (±6.82%) | 81.04% (±7.29%) |
| Right wrist | 94.80% (±1.97%) | 93.36% (±5.44%) | 92.68% (±5.76%) | 92.86% (±5.64%) |
Table 8. Comparison results with previous work.

| Sensor Placements | MobileNet [41] Accuracy | MobileNet [41] F1-Score | EfficientNet [42] Accuracy | EfficientNet [42] F1-Score | CNN-LSTM [43] Accuracy | CNN-LSTM [43] F1-Score | Proposed 1D-ResNeXt Accuracy | Proposed 1D-ResNeXt F1-Score |
|---|---|---|---|---|---|---|---|---|
| Chest | 99.16% | 99.23% | 98.45% | 98.05% | 95.97% | 96.45% | 99.92% | 99.86% |
| Cervix | 96.95% | 95.74% | 97.11% | 95.98% | 92.54% | 90.74% | 97.76% | 96.39% |
| Lumbar | 96.39% | 94.13% | 97.95% | 96.69% | 97.50% | 96.74% | 99.36% | 99.46% |
| Left wrist | 92.66% | 90.09% | 95.99% | 96.17% | 95.93% | 96.44% | 97.33% | 96.88% |
| Right wrist | 97.29% | 97.28% | 97.08% | 96.61% | 95.86% | 96.74% | 97.88% | 97.90% |
Table 9. Performance evaluation of the proposed 1D-ResNeXt model to show sensor-type impact.

| Sensor Types * | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Acc. | 92.29% (±7.49%) | 92.11% (±7.96%) | 88.33% (±11.77%) | 89.23% (±11.18%) |
| Gyro. | 92.93% (±1.32%) | 90.18% (±1.25%) | 88.87% (±3.09%) | 89.11% (±2.28%) |
| Mag. | 85.94% (±2.99%) | 84.61% (±3.15%) | 82.76% (±2.34%) | 82.71% (±2.47%) |
| Acc., Gyro. | 99.49% (±0.54%) | 99.17% (±0.76%) | 98.89% (±0.91%) | 99.02% (±0.82%) |
| Acc., Mag. | 97.73% (±4.15%) | 97.59% (±4.49%) | 96.37% (±6.69%) | 96.74% (±6.05%) |
| Gyro., Mag. | 94.01% (±5.11%) | 92.18% (±4.73%) | 91.79% (±6.11%) | 91.48% (±6.11%) |
| Acc., Gyro., Mag. | 99.92% (±0.10%) | 99.77% (±0.30%) | 99.96% (±0.05%) | 99.86% (±0.18%) |

* Acc. = accelerometer; Gyro. = gyroscope; Mag. = magnetometer.