Article

Robust IoT Activity Recognition via Stochastic and Deep Learning

1
Department of Decision Sciences, School of Business, Macau University of Science and Technology, Macau, China
2
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4166; https://doi.org/10.3390/app15084166
Submission received: 11 March 2025 / Revised: 1 April 2025 / Accepted: 4 April 2025 / Published: 10 April 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

In the evolving landscape of Internet of Things (IoT) applications, human activity recognition plays an important role in domains such as health monitoring, elderly care, sports training, and smart environments. However, current approaches face significant challenges: sensor data are often noisy and variable, leading to difficulties in reliable feature extraction and accurate activity identification; furthermore, ensuring data integrity and user privacy remains an ongoing concern in real-world deployments. To address these challenges, we propose a novel framework that synergizes advanced statistical signal processing with state-of-the-art machine learning and deep learning models. Our approach begins with a rigorous preprocessing pipeline—encompassing filtering and normalization—to enhance data quality, followed by the application of probability density functions and key statistical measures to capture intrinsic sensor characteristics. We then employ a hybrid modeling strategy combining traditional methods (SVM, Decision Tree, and Random Forest) and deep learning architectures (CNN, LSTM, Transformer, Swin Transformer, and TransUNet) to achieve high recognition accuracy and robustness. Additionally, our framework incorporates IoT security measures designed to safeguard data integrity and privacy, marking a significant advancement over existing methods in both efficiency and effectiveness.

1. Introduction

The rapid expansion of the Internet of Things (IoT) has unlocked new methods for monitoring human activity, which is increasingly recognized as a vital indicator of overall health [1]. For example, patients recovering from surgery often exhibit slower or limited movements, and a careful analysis of these patterns can help medical professionals detect issues that may hinder recovery [2]. However, traditional methods of recording activity are prone to errors and subjective biases, highlighting the need for approaches that are both accurate and reliable.
Human activity recognition in IoT makes use of a range of sensing techniques [3,4,5,6], each with its own advantages and shortcomings. Vision-based systems, such as cameras, provide high-resolution spatial and temporal data, yet they raise serious privacy concerns by capturing identifiable images and can struggle in environments with obstacles [7]. Wearable sensors, like accelerometers, and ambient devices, such as smart plugs, partially address privacy issues by not recording visual data, but they often impose user burdens or deliver only sparse, low-resolution information [8]. In contrast, wireless sensing methods—including WiFi CSI analysis and radar systems—offer device-free monitoring by leveraging existing infrastructure; however, they can be less accurate due to environmental noise and may still pose indirect privacy risks by revealing personal routines through signal patterns. Moreover, radar is typically used for range measurement rather than activity recognition, and most IoT devices are low-end hardware that is not equipped with such sensors [9].
These different methods underscore ongoing tradeoffs between privacy, accuracy, and usability. Systems that use visual or acoustic data can capture rich information at the cost of privacy, while wearable and ambient sensors are easier to use but sometimes lack the detail needed for precise recognition. Wireless techniques, although less intrusive, may not robustly capture fine-grained activities, and user compliance remains an issue for wearables. Additionally, complex scenarios like overlapping activities in smart homes can challenge ambient systems. Together, these challenges point to the need for a balanced solution that preserves privacy, ensures high accuracy, and integrates seamlessly into daily life without being overly intrusive or dependent on user actions.
Modern sensor technology offers a promising solution: devices equipped with accelerometers and gyroscopes can continuously track and record movement data in real time [10]. Originally developed to improve user interfaces and enhance gaming experiences [11], these sensors now enable detailed and non-invasive monitoring of everyday activities. Continuous monitoring with such devices reduces reliance on self-reported data and offers a more accurate picture of a person’s physical state.
Despite these benefits, it is not straightforward to implement an effective human activity recognition approach via sensing. Sensor data can be noisy and affected by environmental disturbances, complicating the extraction of meaningful information [12,13]. In this paper, we aim to enhance the recognition accuracy and robustness of human activity recognition (HAR) systems in noisy IoT sensing environments by leveraging an innovative approach based on IMU data processing and feature extraction. Moreover, combining robust data analysis techniques with strong security measures is essential in IoT applications to safeguard user information. To overcome these challenges, our work presents a novel framework that merges traditional statistical methods with modern machine learning approaches.
Therefore, the central research question addressed in this work is how to enhance human activity recognition accuracy and robustness in a noisy sensing environment for IoT devices. To address this problem, our method starts with a careful preprocessing stage that includes filtering and normalization to reduce noise and improve accuracy. We then apply principles from stochastic processes and statistical signal processing to the preprocessed data, extracting key features such as mean, variance, skewness, and kurtosis. This process helps us uncover the underlying patterns associated with different activities like walking, running, and squatting. While previous studies have explored statistical signal processing, machine learning, or deep learning techniques in isolation, our approach uniquely integrates an optimized preprocessing pipeline—featuring carefully calibrated Kalman filtering—with a hybrid model that synergizes statistical and deep learning-based feature extraction. This fusion directly addresses critical gaps in handling noisy sensor data, improving robustness and recognition accuracy in IoT-based human activity recognition systems beyond what prior methods have achieved.
Overall, the major contributions can be summarized from the following aspects:
  • We developed a dedicated application for collecting smartphone sensor data, enabling a human activity recognition system that achieved 98.91% accuracy with Random Forest, compared to the 91.30% achieved by the state-of-the-art TransUNet architecture.
  • We evaluated both traditional machine learning and deep learning models—including Random Forests, CNNs, LSTMs, and three Transformer-based architectures—establishing statistically significant performance differences (ANOVA: F = 48.69, p < 0.001) and identifying computational efficiency tradeoffs critical for resource-constrained IoT deployments.
  • We conducted systematic sensitivity analyses of Kalman filter parameters and window segmentation lengths, providing empirical evidence for optimal preprocessing configurations and enhancing reproducibility across different IoT environments.
In summary, this work not only advances techniques for human activity recognition but also addresses critical issues related to data quality and security in IoT applications, paving the way for more reliable and secure health monitoring systems.

2. Related Work

Due to rapid advances in sensor technology and falling costs, the application of sensors to human activity recognition has become a central focus of research.

2.1. Sensor-Based Recognition

Recent improvements in sensor technology have significantly broadened its applications in fields such as medical health monitoring, sports science, and daily activity tracking [14,15,16]. In particular, sensors like accelerometers and gyroscopes can capture subtle human motions in real time, enabling the identification of basic movement patterns such as walking and running [17,18]. These studies demonstrate that an in-depth analysis of sensor data leads to a better understanding of the complexities of human movement, which is essential for developing more efficient and precise human activity recognition systems. For example, frequency domain analysis of accelerometer data [19] enables researchers to detect variations in movement speed and gait, finding extensive applications in both sports science and rehabilitative medicine [20,21,22].
Further progress in enhancing the accuracy and robustness of human activity recognition has been achieved by employing data fusion techniques. By combining information from multiple sensors—such as accelerometers, gyroscopes, and magnetometers—a more comprehensive description of motion is obtained, which improves the system’s ability to recognize complex human movement patterns [23,24]. For instance, while a single accelerometer may struggle to differentiate between stable walking and running, integrating data from gyroscopes and magnetometers allows for accurate activity classification by providing unique perspectives from each sensor [25,26]. Moreover, these multisensor fusion techniques help to mitigate the effects of environmental noise and individual variations, thereby enhancing the adaptability and stability of the system under diverse conditions.
Additionally, WiFi Channel State Information (CSI) and radar-based approaches have demonstrated strong capabilities in human activity recognition; they primarily rely on capturing environmental changes rather than directly sensing device states. This introduces potential limitations, such as increased susceptibility to environmental noise and occlusions, as well as privacy concerns, since these methods may inadvertently capture information about surrounding individuals without their consent. In contrast, IMU-based sensing is inherently user-centric, measuring only the movement of the device itself, thereby mitigating privacy concerns. Furthermore, IMU sensors are already embedded in most consumer IoT devices, allowing for seamless deployment without additional infrastructure.

2.2. Sensing Approaches

The use of stochastic processes and statistical signal processing techniques is crucial in the study of human activity recognition [27]. These methods excel at processing and analyzing complex time series data from sensors, enabling the extraction of key features that are critical for identifying various activity patterns [28]. By applying statistical techniques such as autocorrelation functions and power spectral density, researchers can quantify the periodicity and randomness in sensor data, which is vital for understanding the dynamics of complex human movements. In addition, advanced statistical signal processing methods like wavelet transforms have been utilized to analyze non-stationary signals. This approach allows researchers to examine the local characteristics of the signal at different temporal scales, thereby offering new tools for identifying more complex motion patterns [29].
The integration of machine learning techniques has further propelled the development of human activity recognition technologies, extending their capabilities beyond the recognition of basic motion patterns [30,31,32]. By training models such as Support Vector Machines (SVMs), Decision Trees, and Random Forests, researchers can learn complex activity patterns from large-scale sensor data [33]. These models are adept at analyzing hidden patterns and relationships within sensor data, allowing for the detection of subtle variations in human movement, such as fatigue-induced gait changes or falls. Moreover, deep learning technologies, particularly Convolutional Neural Networks (CNNs) [34,35,36,37] and Recurrent Neural Networks (RNNs) [19,38,39], have demonstrated robust capabilities in handling time series data by automatically extracting time-dependent features and performing complex pattern recognition. This progress opens up new possibilities for the future development of human activity recognition technologies, as these methods not only enhance recognition accuracy but also provide effective support for real-time health monitoring and preventive healthcare.

3. Dataset

This section details the dataset acquisition and preprocessing procedures employed in our study.

3.1. Sensor Data Acquisition

A thorough understanding of sensor operations is fundamental for accurate data collection. An accelerometer measures the acceleration of a device in various directions, whereas a gyroscope quantifies the rotational velocity about its three principal axes [40]. On the Android platform, these sensor readings are accessed via dedicated Application Programming Interfaces (APIs) [41].
Initially, our application declares the necessary sensor permissions. Subsequently, access to the accelerometer and gyroscope is obtained through the Android SensorManager service (note that magnetometer data are not utilized in this study). By registering a sensor listener, our system is capable of receiving real-time sensor data. As shown in Figure 1, the accelerometer reports the device’s acceleration along the three sensor coordinate axes, capturing both physical acceleration (change in velocity) and gravitational acceleration. The data are provided in the x, y, and z fields of sensor_event_t.acceleration in SI units (m/s²). Similarly, the gyroscope reports the device’s rotational velocity around the three axes. Following the standard mathematical definition, counterclockwise rotation is considered positive (right-hand rule), and the results, measured in radians per second (rad/s), are stored in the x, y, and z fields of sensor_event_t.gyro.
To balance data accuracy and processing efficiency, the collection frequency is carefully optimized through empirical testing. A high frequency can produce excessive data volume and computational overhead, particularly relevant for resource-constrained IoT devices, while a low frequency may miss critical motion details essential for activity recognition. Through systematic evaluation of different sampling rates, we determined that a moderate frequency of 20 Hz (implemented as a 50 ms polling interval, i.e., private final long interval = 50 in the Android code) provides the optimal balance between temporal resolution and processing efficiency. This sampling rate captures the fundamental frequencies of human activities (typically below 10 Hz) while maintaining reasonable data volumes. Our sensitivity analysis confirmed that increasing the sampling rate beyond 20 Hz offered diminishing returns in classification accuracy while substantially increasing computational demands. This careful calibration, backed by empirical testing rather than arbitrary selection, represents an important methodological contribution of our dataset acquisition strategy, optimizing both accuracy and efficiency for real-world IoT deployments.

3.2. Dataset Preprocessing

Preprocessing is a vital step that enhances data quality and lays a strong foundation for subsequent feature extraction and pattern recognition [42].
As shown in Figure 2, the raw accelerometer data exhibit significant noise and variability. This noise can be attributed to various factors, including sensor drift, environmental disturbances, and sensor inaccuracies. To address these challenges, we employed a series of preprocessing steps to improve the quality of our sensor data. In this study, we conducted the following preprocessing steps:
Filtering: Sensor data often contain various sources of noise. To improve reliability, we applied filtering techniques such as low-pass filters and Kalman filters to effectively remove random noise from the raw data. Through systematic parameter sensitivity analysis, we determined that Kalman filtering with process noise Q = 10⁻⁵ and measurement noise R = 10⁻² provides optimal noise reduction while preserving essential signal characteristics. Our ablation studies demonstrate that this optimized filtering step alone improves classification accuracy by 6.3% compared to using raw data.
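As a concrete illustration, the following minimal Python sketch implements a scalar (per-axis) Kalman filter with the stated noise settings; the function name and the constant-state motion model are our own simplifying assumptions, not the exact implementation used in the application.

```python
import numpy as np

def kalman_1d(z, q=1e-5, r=1e-2):
    """Scalar Kalman filter over one sensor axis.

    z: raw readings; q, r: process/measurement noise variances
    (the optimized values reported above: Q = 1e-5, R = 1e-2).
    Assumes a constant-state motion model for simplicity.
    """
    z = np.asarray(z, dtype=float)
    x, p = z[0], 1.0                 # state estimate and its covariance
    out = np.empty_like(z)
    for i, meas in enumerate(z):
        p += q                       # predict: covariance grows by Q
        k = p / (p + r)              # Kalman gain
        x += k * (meas - x)          # update with the new measurement
        p *= 1.0 - k                 # shrink covariance after update
        out[i] = x
    return out
```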
Normalization: Differences in sensor performance across devices and varying usage environments can lead to inconsistent data. We normalized the data to a common scale, which standardizes the measurements and facilitates accurate comparison and analysis. When combined with our optimized Kalman filtering, normalization contributed to an overall accuracy improvement of 8.2–14.7% compared to using unprocessed data.
Window Segmentation: For time series analysis, the continuous sensor data were segmented into multiple time windows, each capturing data over a specific duration. This segmentation enables localized analysis, allowing for more precise feature extraction and time series evaluation. Through empirical testing of various window lengths (20–60 samples), we determined that 30-sample windows provide the optimal balance between temporal resolution and feature stability, with shorter windows failing to capture complete activity cycles and longer windows introducing class boundary confusion.
Feature Engineering: In the final preprocessing stage, we extracted a set of statistical features from each data window. These features include fundamental metrics such as the mean, variance, maximum, and minimum values of the sensor data within the window.
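To make the normalization, window segmentation, and feature engineering steps concrete, here is a minimal sketch assuming z-score normalization and non-overlapping 30-sample windows; the helper names are illustrative.

```python
import numpy as np

def zscore(x):
    """Normalize one sensor axis to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def segment_and_featurize(signal, win=30):
    """Cut a stream into non-overlapping windows of `win` samples
    (30 was optimal in our tests) and compute per-window statistics:
    mean, variance, maximum, and minimum."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) // win
    w = signal[: n * win].reshape(n, win)
    return np.column_stack([w.mean(1), w.var(1), w.max(1), w.min(1)])

# Example: features = segment_and_featurize(zscore(accel_x), win=30)
```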
The careful balance between sensor configuration, efficient data handling, and robust feature engineering underpins the improved performance of our human activity recognition system. The systematic optimization of preprocessing parameters represents a significant methodological contribution, as it establishes reproducible guidelines for effective sensor data processing in IoT environments.

4. Statistical Analysis and Feature Extraction

In this section, we present our novel approach to analyzing sensor data using statistical and stochastic process methods. By leveraging probability density functions (PDFs), statistical moments, and signal processing techniques, our method extracts discriminative features that are critical for accurately distinguishing among various human movement states.

4.1. Characterizing Sensor Data with Probability Density Functions and Statistical Moments

Sensor data from accelerometers and gyroscopes are inherently time series data and can be modeled as stochastic processes. Our analysis begins by computing the probability density functions (PDFs) of these signals to reveal their underlying distributions. For instance, by examining the PDFs of accelerometer data along the x, y, and z axes during walking, several key characteristics emerge:
  • Multimodality: The PDFs exhibit multiple peaks, suggesting that the acceleration data do not fluctuate around a single value. This behavior reflects the periodic motion of different body parts (e.g., arm swinging and stepping).
  • Symmetry and Skewness: While the distributions along the x and z axes are relatively symmetric, the y axis shows slight skewness, possibly due to natural body inclinations or specific gait patterns.
  • Range of Fluctuation and Central Tendency: Differences in the spread of data across axes indicate that certain directions (e.g., the x axis) experience more pronounced acceleration changes, whereas others (e.g., the z axis) remain more stable.
Similarly, the PDFs of gyroscope data, which capture angular velocities, offer insight into rotational dynamics:
  • Central Focus: All axes typically peak near zero, suggesting that rotational movements are minor and centered around a stable state.
  • Data Concentration: The x and z axes exhibit sharper peaks compared to the more dispersed y axis, indicating differences in the stability of rotational motion.
  • Subtle Skewness: Minor asymmetries in the distribution may be attributed to natural variations in walking dynamics.
Our sensitivity analysis demonstrates that these distributional characteristics are significantly influenced by preprocessing techniques. The optimized Kalman filter (Q = 10⁻⁵, R = 10⁻²) preserves the essential multimodal structure of activity-specific PDFs while reducing spurious peaks caused by sensor noise. This preservation of signal characteristics, while minimizing noise, contributes directly to the superior performance of statistical feature-based classifiers, particularly Random Forest models, which achieved 98.91% accuracy by effectively leveraging these distinctive distributional patterns.
The figures below (i.e., Figure 3, Figure 4 and Figure 5) illustrate the PDFs for accelerometer and gyroscope data during walking, running, and squatting, highlighting the distinct statistical characteristics of each activity.
In addition to PDFs, first- and second-order statistical quantities such as the mean and variance provide fundamental information about the central tendency and dispersion of sensor readings. For example, higher variance in accelerometer data during running compared to walking reflects more intense movement dynamics. Higher-order statistical quantities, including skewness and kurtosis, offer further insights into the shape of the data distribution, capturing subtle differences that are crucial for fine-grained motion analysis.
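The distributional quantities discussed in this subsection can be computed per axis as in the sketch below, which uses a Gaussian kernel density estimate for the PDF and SciPy's sample moments; the helper name and grid size are our own illustrative choices.

```python
import numpy as np
from scipy.stats import gaussian_kde, skew, kurtosis

def distribution_summary(axis_data, n_grid=200):
    """Estimate the PDF of one sensor axis plus its four moments."""
    axis_data = np.asarray(axis_data, dtype=float)
    kde = gaussian_kde(axis_data)                  # smooth PDF estimate
    grid = np.linspace(axis_data.min(), axis_data.max(), n_grid)
    moments = {
        "mean": axis_data.mean(),
        "variance": axis_data.var(),
        "skewness": skew(axis_data),               # asymmetry
        "kurtosis": kurtosis(axis_data),           # excess tail weight
    }
    return grid, kde(grid), moments
```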

4.2. Temporal Dynamics: Autocorrelation and Power Spectral Density Analysis

Beyond basic statistics, we analyze the temporal dynamics of sensor data through autocorrelation and power spectral density (PSD) plots, such as those shown in Figure 6. For the accelerometer data along the x axis during walking, we uncovered the following:
  • Autocorrelation Analysis: The autocorrelation function, which equals 1 at a lag of 0, decreases as the time lag increases. Periodic increases in the autocorrelation at specific lags suggest recurring motion patterns, such as the periodic impact of footsteps.
  • PSD Analysis: The PSD plot reveals that low-frequency components dominate the signal, and this is consistent with the relatively slow and periodic movements of walking. The rapid decline in the PSD at higher frequencies indicates a lack of high-frequency energy, confirming the absence of abrupt motion changes.
Figure 6. Autocorrelation coefficients and power spectral density (PSD) of the accelerometer data along the x, y, and z axes during walking.
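The two quantities in Figure 6 can be reproduced per axis as in the following sketch, assuming the 20 Hz sampling rate from Section 3.1; the function name and the Welch segment length are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

def temporal_profile(x, fs=20.0, max_lag=100):
    """Normalized autocorrelation (up to max_lag) and Welch PSD."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    full = np.correlate(xc, xc, mode="full")       # all lags
    acf = full[full.size // 2:][: max_lag + 1]     # non-negative lags
    acf /= acf[0]                                  # so acf(0) = 1
    freqs, psd = welch(xc, fs=fs, nperseg=min(256, len(xc)))
    return acf, freqs, psd
```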
For gyroscope data, similar analyses show that rotational dynamics are concentrated in the low-frequency range, with rapidly diminishing correlation over time. These analyses not only validate the consistency of movement patterns but also provide a robust basis for feature extraction in subsequent classification tasks.

4.3. Feature Extraction and Classification

The statistical analyses described above are integral to constructing comprehensive feature vectors. These vectors incorporate a range of computed features, including probability density metrics, mean, variance, skewness, kurtosis, autocorrelation coefficients, and PSD values. In addition, we integrated the raw sensor data into a neural network, thereby enhancing the model’s ability to learn complex patterns.
Feature Vector Construction: Each data sample was transformed into a feature vector that encapsulates the statistical and spectral properties of the sensor data. This vector served as the input for various classification algorithms.
Classifier Selection and Optimization: We experimented with a diverse array of classifiers, including traditional methods (SVM, Decision Trees, and Random Forests) and advanced deep learning architectures (CNN, LSTM, Transformer, Swin Transformer, and TransUNet). Hyperparameters were systematically optimized through grid search and cross-validation, with specific configurations detailed in Section 5.2.
Computational Efficiency Considerations: Beyond classification accuracy, we evaluated the computational demands of each model—a critical factor for IoT deployments.
The tables below summarize the statistical features extracted from the sensor data for different activities, further illustrating the distinct profiles obtained through our comprehensive analysis. Table 1, Table 2 and Table 3 report the features computed from the accelerometer and gyroscope data during walking, running, and squatting, respectively.

4.4. Application of Statistical Methods in Movement State Recognition

Once the statistical properties of the sensor data were established, these features were utilized for movement action recognition. The details of our approach are as follows:
  • Feature Vector Construction: Creating a feature vector for each sample that includes statistical measures (PDF characteristics, mean, variance, skewness, and kurtosis) and spectral features (autocorrelation and PSD).
  • Classifier Selection: Evaluating multiple classifiers such as SVM, Random Forest, and neural networks to identify the most suitable model for distinguishing between walking, running, and squatting.
  • Model Training and Optimization: Training the classification model on labeled sensor data and iteratively optimizing feature selection and model parameters to enhance performance.
Our innovative integration of detailed statistical analysis with modern machine learning methods forms the basis of a robust framework for human activity recognition. This contribution not only improves the accuracy of activity classification but also provides a scalable approach for real-time applications in health monitoring and preventive healthcare.

5. Feature Analysis and Pattern Recognition

In this section, we detail our advanced approach to feature analysis and pattern recognition, which plays a critical role in transforming raw sensor data into actionable insights. Our novel methodology not only leverages conventional time domain and frequency domain features but also introduces innovative combined features that enhance the accuracy and robustness of movement state recognition.

5.1. Feature Analysis

Feature extraction is a fundamental step in signal processing and pattern recognition, aiming to distill the most informative features from raw sensor data for effective classification. In our study, data collected from sensors—specifically accelerometers and gyroscopes—were preprocessed through filtering and normalization to mitigate noise and eliminate redundant information. This rigorous preprocessing ensures that the subsequent feature extraction process yields high-quality inputs for analysis.
We extracted a range of features from the preprocessed data, which were broadly categorized into time domain and frequency domain features. In the time domain, statistical measures such as the mean, variance, peak value, and root mean square were computed. These metrics capture the trends and fluctuations in the sensor data, providing clear distinctions between different movement states. For instance, the variance in accelerometer readings is typically higher during running compared to walking, reflecting more vigorous movement patterns.
In the frequency domain, we converted the time domain signals using Fourier transformation to extract spectral characteristics such as power spectral density, skewness, and kurtosis. These features are vital for identifying periodic patterns and subtle variations in movement, particularly when distinguishing activities with different frequencies like walking versus running.
Furthermore, our approach introduces combined features that integrate accelerometer and gyroscope data from the Inertial Measurement Unit (IMU) and merge both time-domain and frequency-domain characteristics. This feature fusion significantly enriches the information content available for subsequent classification tasks and represents a key contribution of our work.
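A minimal sketch of this fusion is shown below: one synchronized accelerometer/gyroscope window is mapped to a single vector of time- and frequency-domain features. The exact feature set here is illustrative rather than the full list used in our experiments.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def fused_features(accel_win, gyro_win):
    """Fuse IMU data: each input is a (samples x 3) window; the output
    concatenates time- and frequency-domain features over all axes."""
    feats = []
    for sensor in (accel_win, gyro_win):
        for axis in np.asarray(sensor, dtype=float).T:   # x, y, z
            # Time domain: trend, spread, peak, root mean square.
            feats += [axis.mean(), axis.var(),
                      np.abs(axis).max(), np.sqrt((axis ** 2).mean())]
            # Frequency domain: shape of the power spectrum.
            psd = np.abs(np.fft.rfft(axis - axis.mean())) ** 2
            feats += [skew(psd), kurtosis(psd)]
    return np.asarray(feats)
```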

5.2. Pattern Recognition

Following feature extraction, our study applied a comprehensive ensemble of machine learning and deep learning techniques to accurately classify human activities. We implemented and rigorously evaluated eight distinct algorithms: traditional models (SVM, Decision Trees, and Random Forests) and advanced deep learning architectures (CNN, LSTM, Transformer, Swin Transformer, and TransUNet). Our classification framework integrates both traditional machine learning and deep learning paradigms, creating a hybrid approach that leverages their complementary strengths. Traditional algorithms excel in scenarios with limited data and offer interpretability, while deep learning models capture complex temporal patterns. For the traditional models, we employed feature vectors directly, whereas for the deep learning approaches, we preserved the temporal structure of the data:
  • CNN: Extracts spatial patterns from sensor signals through convolutional operations.
  • LSTM: Captures long-term dependencies in sequential data via recurrent connections.
  • Transformer: Leverages attention mechanisms to model global interactions.
  • Swin Transformer: Employs hierarchical feature extraction with windowed attention.
  • TransUNet: Combines CNN’s local feature extraction with Transformer’s global context.
The hyperparameters were optimized through systematic grid search with cross-validation:
  • CNN: Filter sizes {32, 64, 128}, kernel sizes {3, 5, 7}, dense units {50, 100, 200}.
  • LSTM: Units {50, 100, 200}, dropout rates {0.2, 0.3, 0.4}, learning rates {0.01, 0.001, 0.0001}.
  • Transformer-based models: Head sizes {32, 64, 128}, number of heads {2, 4, 8}, feedforward dimensions {64, 128, 256}.
This comprehensive evaluation framework addresses the need for robust comparison between traditional and emerging deep learning approaches, particularly advanced Transformer-based architectures that have shown promising results in time series classification tasks.
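The grid search with cross-validation can be expressed with scikit-learn as below; the Random Forest grid shown is purely illustrative (the paper specifies grids only for the deep models), and the data are random stand-ins for the feature vectors of Section 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data with the shape of our window-level feature vectors.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((300, 24)), rng.integers(0, 3, 300)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100, 200],   # illustrative grid
                "max_depth": [None, 10, 20]},
    cv=5, scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```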

6. Experimental Results and Analysis

The primary objective of our experiment was to develop and evaluate a robust system for recognizing human movement states (running, squatting, and walking) using smartphone accelerometer and gyroscope data. Our experimental framework encompasses comprehensive preprocessing techniques, statistical validation, and comparative evaluation across traditional and state-of-the-art deep learning models, including recently developed Transformer-based architectures.

6.1. Preprocessing and Filter Parameter Optimization

The collected sensor data underwent rigorous preprocessing through filtering and normalization to improve data quality. A critical advancement in our methodology is the systematic optimization of filter parameters through sensitivity analysis. We evaluated multiple combinations of Kalman filter parameters (process noise Q ∈ [10⁻⁶, 10⁻³] and measurement noise R ∈ [10⁻³, 1]), revealing that optimal noise reduction occurred at Q = 10⁻⁵ and R = 10⁻². This configuration significantly reduced random noise while preserving essential signal characteristics.
Figure 7 illustrates Fourier transform plots of the raw, low-pass filtered, high-pass filtered, and Kalman filtered accelerometer data along the x, y, and z axes during running. The optimized Kalman filter effectively attenuated high-frequency noise components while maintaining the fundamental frequency characteristics of the activity signal, demonstrating superior performance compared to traditional low-pass and high-pass filtering approaches.
Further sensitivity analysis on window segmentation length indicated that 30-sample windows provide the optimal balance between temporal resolution and feature stability. Shorter windows failed to capture complete activity cycles, while longer windows introduced class boundary confusion when activities transitioned. This systematic approach to parameter optimization represents a significant methodological contribution of our work.
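The window-length sensitivity analysis follows the pattern sketched below, which re-extracts features at each candidate length and compares cross-validated accuracy; the stream and labels are random stand-ins, so only the procedure, not the printed numbers, is meaningful.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
stream = rng.standard_normal(3000)        # stand-in for one sensor axis

for win in (20, 30, 40, 50, 60):          # candidate window lengths
    n = len(stream) // win
    w = stream[: n * win].reshape(n, win)
    X = np.column_stack([w.mean(1), w.var(1), w.max(1), w.min(1)])
    y = rng.integers(0, 3, n)             # stand-in activity labels
    acc = cross_val_score(RandomForestClassifier(n_estimators=50),
                          X, y, cv=5).mean()
    print(f"window = {win:2d} samples -> CV accuracy = {acc:.3f}")
```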

6.2. Model Training and Performance Evaluation

For pattern recognition, we implemented a comprehensive suite of eight classifiers, including both traditional machine learning algorithms (SVM, Decision Tree, and Random Forest) and advanced deep learning architectures (CNN, LSTM, Transformer, Swin Transformer, and TransUNet). This expanded model comparison addresses previous limitations in the literature regarding Transformer-based architectures and provides a more complete assessment of state-of-the-art approaches. To better suit our HAR task, we adapted the Swin Transformer by adjusting its window-based attention mechanism to align with temporal sensor data segments. Additionally, we modified the TransUNet structure by simplifying the encoder–decoder design for improved efficiency in time series feature extraction.
All models were trained on consistently preprocessed data to ensure fair comparison. Table 4 presents the classification performance across all evaluated models, with Random Forest achieving the highest accuracy (98.91%), followed closely by TransUNet (91.30%) and Decision Tree (89.13%).
To ensure statistical rigor, we implemented 5-fold cross-validation and conducted ANOVA testing (F = 48.69, p < 0.001), confirming that performance differences between models are statistically significant. Subsequent pairwise t-tests further elucidated specific inter-model differences (Figure 8), with Random Forest significantly outperforming most other models (p < 0.05) except TransUNet (t = 0.04, p = 0.97), with which it showed comparable performance.
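The significance testing amounts to a one-way ANOVA over per-fold accuracies followed by pairwise t-tests, as sketched here with SciPy; the fold scores are placeholders standing in for the values behind Table 4.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

rng = np.random.default_rng(0)
# Placeholder 5-fold accuracies per model (our real folds yielded
# F = 48.69, p < 0.001).
folds = {
    "RandomForest": rng.normal(0.90, 0.02, 5),
    "TransUNet":    rng.normal(0.90, 0.02, 5),
    "Transformer":  rng.normal(0.60, 0.02, 5),
}
F, p = f_oneway(*folds.values())                  # one-way ANOVA
print(f"ANOVA: F = {F:.2f}, p = {p:.4f}")
t, p_pair = ttest_ind(folds["RandomForest"], folds["TransUNet"])
print(f"RF vs. TransUNet: t = {t:.2f}, p = {p_pair:.2f}")
```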
Figure 9 illustrates the classification performance of each model through confusion matrices. The visualization reveals that Random Forest achieved the most balanced classification across all three activities, with minimal misclassifications between classes. Random Forest correctly classified 25 of 27 running instances, all 19 squatting instances, and all walking instances. The TransUNet model showed similarly strong performance, particularly for running activities (26 of 27 correct), demonstrating the potential of hybrid architectures that combine convolutional features with Transformer-based attention mechanisms.
The training dynamics of deep learning models are visualized in Figure 10, which plots the training and validation accuracy across epochs. The TransUNet model demonstrated the strongest learning progression, achieving a peak validation accuracy of 90.54% at epoch 29. The LSTM and Swin Transformer models showed competitive performance (peak validation accuracies of 89.19% and 89.19%, respectively). In contrast, the standard Transformer model exhibited considerably more modest improvements, with validation accuracy plateauing around 66.22%, suggesting particular challenges in capturing relevant temporal patterns with limited training data.

6.3. Computational Efficiency Analysis

A critical contribution of our study is the comprehensive evaluation of computational efficiency—particularly relevant for resource-constrained IoT applications. Figure 11 presents a detailed comparison of the inference time and parameter count across all models.
Traditional machine learning models demonstrated superior efficiency, with Decision Tree achieving exceptional performance (inference time: 0.00033 s, parameters: 189, as shown in Figure 12) while maintaining strong accuracy. Despite its superior classification performance, Random Forest required moderately higher computational resources (inference time: 0.00574 s, parameters: 18,486). In stark contrast, all deep learning models exhibited substantially greater computational demands, with Swin Transformer having the longest inference time (0.06217 s) and CNN the highest parameter count (282,059).
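Inference times like those in Figure 11 and Figure 12 can be measured with a simple wall-clock loop; this sketch assumes a scikit-learn-style predict interface and averages repeated calls to damp timer jitter.

```python
import time

def mean_inference_time(model, X, repeats=100):
    """Average per-call prediction latency in seconds."""
    model.predict(X)                        # warm-up call
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(X)
    return (time.perf_counter() - start) / repeats

# Example: print(f"{mean_inference_time(search.best_estimator_, X):.5f} s")
```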
This efficiency analysis reveals a crucial tradeoff in IoT applications: while deep learning models theoretically offer superior representation capacity, their practical deployment incurs substantial computational overhead. These findings are particularly relevant for edge computing scenarios where processing power and energy consumption are constrained.

6.4. Ablation Studies and Hyperparameter Optimization

To address the critical need for methodological transparency, we conducted systematic ablation studies and hyperparameter optimization experiments. Specifically, we compared four preprocessing configurations: (1) raw data, (2) Kalman filtering only, (3) normalization only, and (4) a combination of Kalman filtering and normalization. The complete pipeline consistently outperformed partial implementations by 8.2–14.7%, highlighting the synergistic effect of noise reduction and feature normalization.
For hyperparameter optimization, we employed grid search with cross-validation across all models. For CNN, we systematically evaluated filter sizes {32, 64, 128}, kernel sizes {3, 5, 7}, dense units {50, 100, 200}, and learning rates {0.01, 0.001, 0.0001}. The best performance was observed with 64 filters, a kernel size of 3, 100 dense units, and a learning rate of 0.001.
Similar optimization procedures were applied to the LSTM model, where the number of units {50, 100, 200}, dropout rates {0.2, 0.3, 0.4}, and learning rates {0.01, 0.001, 0.0001} were explored. The Transformer-based models were also tuned following a similar approach. These comprehensive hyperparameter analyses improved model performance and offered insights into the sensitivity of different architectures to variations in parameter settings.

6.5. Analysis of Model Performance Differences

An intriguing aspect of our findings is that traditional machine learning models, particularly Random Forest, outperformed more complex deep learning architectures despite the latter’s theoretical advantages in representation capacity. Several factors contribute to this phenomenon:
  • Data Volume Limitations: Deep learning models typically require larger datasets to generalize effectively. Our experiment utilized a moderate-sized dataset (92 test samples), potentially limiting the full expression of deep learning capabilities. This is evidenced by the cross-validation results, where CNN showed higher mean accuracy (87.12%) than its test accuracy (79.35%), suggesting some overfitting.
  • Feature Engineering Effectiveness: Random Forest’s superior performance underscores the value of our statistical feature extraction approach. The handcrafted features effectively capture the discriminative characteristics of different activities, providing Random Forest with highly informative inputs that may obviate the need for deep learning’s automatic feature extraction.
  • Model–Data Compatibility: The ensemble nature of Random Forest, with its ability to handle non-linear relationships and feature interactions without making strong assumptions about data distribution, appears particularly well suited to human activity recognition data, which exhibit both statistical regularities and stochastic variations.
  • Architectural Limitations: Standard Transformer models struggled the most (65.22% accuracy), which was likely due to their reliance on global attention mechanisms that may not be optimal for capturing the localized patterns in activity data. In contrast, TransUNet’s hybrid architecture (91.30% accuracy) successfully combines convolutional feature extraction with Transformer modules, approaching Random Forest’s performance.
These findings highlight an important consideration for IoT applications: the most sophisticated model is not always the most effective, particularly when computational efficiency is considered alongside accuracy. Random Forest offers an optimal balance of accuracy, interpretability, and efficiency for this specific activity recognition task.
In summary, our experimental results validate the efficacy of our integrated system in recognizing human activity states with high accuracy and computational efficiency. The comprehensive evaluation across traditional and deep learning models, combined with statistical validation and efficiency analysis, provides valuable insights for deploying activity recognition systems in real-world IoT environments with varying resource constraints.

7. Discussion and Future Work

Our experimental results confirm that the developed application is capable of accurately distinguishing between walking, running, and squatting states using data from Android smartphone sensors. However, it is important to acknowledge potential limitations of our approach. Under extreme noise conditions, such as highly dynamic environments with significant electromagnetic interference or sensor disturbances, the recognition accuracy may decrease due to the reduced efficacy of filtering techniques. Additionally, variations in sensor configurations across different smartphone models or placements (e.g., orientation, mounting positions, etc.) might lead to discrepancies in recognition performance. Future work will address these limitations by exploring adaptive filtering methods, robust feature extraction algorithms, and techniques that enable the system to generalize across diverse sensor setups and challenging operational environments.
The contributions of our work extend beyond technical performance. The system provides practical tools for health monitoring, sports training, and rehabilitation therapy by enabling real-time tracking and analysis of motion patterns. Our approach also demonstrates the potential for personalized activity recognition by adapting to diverse user profiles through further data collection and advanced machine learning techniques. Future work will focus on refining feature extraction algorithms, enhancing model generalization across varied environments and users, and improving user interface design to offer real-time feedback and comprehensive progress tracking. Additionally, addressing data security and privacy through robust measures and adherence to regulatory standards will be pivotal as the application scales to broader usage scenarios, such as fall detection for the elderly, child activity monitoring, and behavioral analysis in security fields.

8. Conclusions

This study successfully demonstrates the feasibility and effectiveness of using existing smartphone sensors for complex motion recognition. Our application accurately identifies and differentiates between walking, running, and squatting states by combining rigorous data preprocessing, advanced filtering, and a hybrid approach that integrates statistical analysis with state-of-the-art pattern recognition methods. The universal availability of accelerometers and gyroscopes in modern smartphones underlines the broad accessibility of our solution, enabling users to monitor daily motion patterns without additional hardware.
The experimental results show that traditional classifiers such as Random Forests can achieve high accuracy and efficiency on the task studied in this paper, whereas deep learning methods require more computing resources and longer training times. Traditional methods also offer greater interpretability and lower computational requirements, making them suitable for real-time applications and for devices with limited computing power. This tradeoff is a key consideration for practical deployment in various IoT scenarios.
In summary, the major contributions of this work include the innovative integration of stochastic process analysis with machine learning techniques, the effective reduction in sensor noise through optimized Kalman filtering, and the practical implementation of a scalable, real-time activity recognition system. These advancements not only enhance the accuracy and reliability of motion state recognition but also open new avenues for applications in health management, sports, and rehabilitation, making a significant impact on the field of human activity recognition.

Author Contributions

Conceptualization, X.W., X.Z. and C.L.; methodology, X.W. and X.Z.; software, X.W.; validation, X.W. and S.W.; formal analysis, X.W. and S.W.; resources, C.L.; data curation, X.W. and S.W.; writing—original draft preparation, X.W. and S.W.; writing—review and editing, C.L. and X.Z.; visualization, X.Z.; supervision, C.L.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors sincerely thank the editor and the anonymous reviewers for their invaluable contributions, which have significantly enhanced the quality of this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, P.; Wang, Y.; Tian, Y.; Zhou, T.S.; Li, J.S. An automatic user-adapted physical activity classification method using smartphones. IEEE Trans. Biomed. Eng. 2016, 64, 706–714. [Google Scholar] [CrossRef]
  2. Huang, E.; Yan, K.; Onnela, J.P. Combining accelerometer and gyroscope data in smartphone-based activity recognition using movelets. arXiv 2021, arXiv:2109.01118. [Google Scholar]
  3. Babangida, L.; Perumal, T.; Mustapha, N.; Yaakob, R. Internet of Things (IoT) based activity recognition strategies in smart homes: A review. IEEE Sens. J. 2022, 22, 8327–8336. [Google Scholar] [CrossRef]
  4. Bian, S.; Liu, M.; Zhou, B.; Lukowicz, P. The state-of-the-art sensing techniques in human activity recognition: A survey. Sensors 2022, 22, 4596. [Google Scholar] [CrossRef]
  5. Sedaghati, N.; Ardebili, S.; Ghaffari, A. Application of human activity/action recognition: A review. Multimed. Tools Appl. 2025, 1–30. [Google Scholar] [CrossRef]
  6. Ye, X.; Sakurai, K.; Nair, N.K.C.; Wang, K.I.K. Machine learning techniques for sensor-based human activity recognition with data heterogeneity—A review. Sensors 2024, 24, 7975. [Google Scholar] [CrossRef]
  7. Lan, G.; Wu, Y.; Hu, F.; Hao, Q. Vision-based human pose estimation via deep learning: A survey. IEEE Trans. Hum.-Mach. Syst. 2022, 53, 253–268. [Google Scholar] [CrossRef]
  8. Ramanujam, E.; Perumal, T.; Padmavathi, S. Human activity recognition with smartphone and wearable sensors using deep learning techniques: A review. IEEE Sens. J. 2021, 21, 13029–13040. [Google Scholar] [CrossRef]
  9. Liu, R.W.; Guo, Y.; Nie, J.; Hu, Q.; Xiong, Z.; Yu, H.; Guizani, M. Intelligent edge-enabled efficient multi-source data fusion for autonomous surface vehicles in maritime internet of things. IEEE Trans. Green Commun. Netw. 2022, 6, 1574–1587. [Google Scholar] [CrossRef]
  10. Bennasar, M.; Price, B.A.; Gooch, D.; Bandara, A.K.; Nuseibeh, B. Significant features for human activity recognition using tri-axial accelerometers. Sensors 2022, 22, 7482. [Google Scholar] [CrossRef]
  11. Mejia-Ricart, L.F.; Helling, P.; Olmsted, A. Evaluate action primitives for human activity recognition using unsupervised learning approach. In Proceedings of the 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Cambridge, UK, 11–14 December 2017; pp. 186–188. [Google Scholar]
  12. Al Mudawi, N.; Azmat, U.; Alazeb, A.; Alhasson, H.F.; Alabdullah, B.; Rahman, H.; Liu, H.; Jalal, A. IoT powered RNN for improved human activity recognition with enhanced localization and classification. Sci. Rep. 2025, 15, 10328. [Google Scholar] [CrossRef]
  13. Alsaadi, M.; Keshta, I.; Ramesh, J.V.N.; Nimma, D.; Shabaz, M.; Pathak, N.; Singh, P.P.; Kiyosov, S.; Soni, M. Logical reasoning for human activity recognition based on multisource data from wearable device. Sci. Rep. 2025, 15, 380. [Google Scholar] [CrossRef] [PubMed]
  14. Ferrari, A.; Micucci, D.; Mobilio, M.; Napoletano, P. Human activities recognition using accelerometer and gyroscope. In Proceedings of the European Conference on Ambient Intelligence, Rome, Italy, 13–15 November 2019; Springer: Cham, Switzerland, 2019; pp. 357–362. [Google Scholar]
  15. Zhang, H.; Xu, L. Multi-stmt: Multi-level network for human activity recognition based on wearable sensors. IEEE Trans. Instrum. Meas. 2024, 73, 1–12. [Google Scholar] [CrossRef]
  16. Khan, D.; Alonazi, M.; Abdelhaq, M.; Al Mudawi, N.; Algarni, A.; Jalal, A.; Liu, H. Robust human locomotion and localization activity recognition over multisensory. Front. Physiol. 2024, 15, 1344887. [Google Scholar] [CrossRef] [PubMed]
  17. Ferrari, A.; Micucci, D.; Mobilio, M.; Napoletano, P. Trends in human activity recognition using smartphones. J. Reliab. Intell. Environ. 2021, 7, 189–213. [Google Scholar] [CrossRef]
  18. Yin, Y.; Xie, L.; Jiang, Z.; Xiao, F.; Cao, J.; Lu, S. A systematic review of human activity recognition based on mobile devices: Overview, progress and trends. IEEE Commun. Surv. Tutor. 2024, 26, 890–929. [Google Scholar] [CrossRef]
  19. Khan, D.; Alshahrani, A.; Almjally, A.; Al Mudawi, N.; Algarni, A.; Al Nowaiser, K.; Jalal, A. Advanced IoT-based human activity recognition and localization using Deep Polynomial neural network. IEEE Access 2024, 12, 94337–94353. [Google Scholar] [CrossRef]
  20. Huang, Y.J.; Chang, C.S.; Wu, Y.C.; Han, C.C.; Cheng, Y.Y.; Chen, H.M. Development of wearable devices for collecting digital rehabilitation/fitness data from lower limbs. Sensors 2024, 24, 1935. [Google Scholar] [CrossRef]
  21. Cui, Y. An Efficient Approach to Sports Rehabilitation and Outcome Prediction Using RNN-LSTM. Mob. Netw. Appl. 2024, 1–16. [Google Scholar] [CrossRef]
  22. Karlov, M.; Abedi, A.; Khan, S.S. Rehabilitation exercise quality assessment through supervised contrastive learning with hard and soft negatives. Med. Biol. Eng. Comput. 2025, 63, 15–28. [Google Scholar] [CrossRef]
  23. Ahmed, N.; Rafiq, J.I.; Islam, M.R. Enhanced human activity recognition based on smartphone sensor data using hybrid feature selection model. Sensors 2020, 20, 317. [Google Scholar] [CrossRef] [PubMed]
  24. Batool, M.; Alotaibi, M.; Alotaibi, S.R.; AlHammadi, D.A.; Jamal, M.A.; Jalal, A.; Lee, B. Multimodal Human Action Recognition Framework using an Improved CNNGRU Classifier. IEEE Access 2024, 12, 158388–158406. [Google Scholar] [CrossRef]
  25. Liu, M.; Rey, V.F.; Zhang, Y.; Ray, L.S.S.; Zhou, B.; Lukowicz, P. imove: Exploring bio-impedance sensing for fitness activity recognition. In Proceedings of the 2024 IEEE International Conference on Pervasive Computing and Communications (PerCom), Biarritz, France, 11–15 March 2024; pp. 194–205. [Google Scholar]
  26. Zhang, Y.; Wang, L.; Chen, H.; Tian, A.; Zhou, S.; Guo, Y. IF-ConvTransformer: A framework for human activity recognition using IMU fusion and ConvTransformer. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–26. [Google Scholar] [CrossRef]
  27. Nematallah, H.; Rajan, S. Quantitative Analysis of Mother Wavelet Function Selection for Wearable Sensors-Based Human Activity Recognition. Sensors 2024, 24, 2119. [Google Scholar] [CrossRef]
  28. Ahmed, N.; Numan, M.O.A.; Kabir, R.; Islam, M.R.; Watanobe, Y. A Robust Deep Feature Extraction Method for Human Activity Recognition Using a Wavelet Based Spectral Visualisation Technique. Sensors 2024, 24, 4343. [Google Scholar] [CrossRef]
  29. Gholamiangonabadi, D.; Grolinger, K. Personalized models for human activity recognition with wearable sensors: Deep neural networks and signal processing. Appl. Intell. 2023, 53, 6041–6061. [Google Scholar] [CrossRef]
  30. Hassan, N.; Miah, A.S.M.; Shin, J. A deep bidirectional LSTM model enhanced by transfer-learning-based feature extraction for dynamic human activity recognition. Appl. Sci. 2024, 14, 603. [Google Scholar] [CrossRef]
  31. Kaseris, M.; Kostavelis, I.; Malassiotis, S. A comprehensive survey on deep learning methods in human activity recognition. Mach. Learn. Knowl. Extr. 2024, 6, 842–876. [Google Scholar] [CrossRef]
  32. Najeh, H.; Lohr, C.; Leduc, B. Real-Time Human Activity Recognition on Embedded Equipment: A Comparative Study. Appl. Sci. 2024, 14, 2377. [Google Scholar] [CrossRef]
  33. Ilisei, D.; Suciu, D.M. Human-activity recognition with smartphone sensors. In Proceedings of the On the Move to Meaningful Internet Systems: OTM 2019 Workshops: Confederated International Workshops: EI2N, FBM, ICSP, Meta4eS and SIAnA 2019, Rhodes, Greece, 21–25 October 2019, Revised Selected Papers; Springer: Cham, Switzerland, 2020; pp. 179–188. [Google Scholar]
  34. Lee, S.M.; Yoon, S.M.; Cho, H. Human activity recognition from accelerometer data using Convolutional Neural Network. In Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), Jeju, Republic of Korea, 13–16 February 2017; pp. 131–134. [Google Scholar]
  35. Ayaz, F.; Alhumaily, B.; Hussain, S.; Imran, M.A.; Arshad, K.; Assaleh, K.; Zoha, A. Radar signal processing and its impact on deep learning-driven human activity recognition. Sensors 2025, 25, 724. [Google Scholar] [CrossRef]
  36. Oleh, U.; Obermaisser, R.; Ahammed, A.S. A Review of Recent Techniques for Human Activity Recognition: Multimodality, Reinforcement Learning, and Language Models. Algorithms 2024, 17, 434. [Google Scholar] [CrossRef]
  37. Fatima, I.; Farhan, A.A.; Tamoor, M.; ur Rehman, S.; Alhulayyil, H.A.; Tariq, F. DiscHAR: A Discrete Approach to Enhance Human Activity Recognition in Cyber Physical Systems: Smart Homes. Computers 2024, 13, 300. [Google Scholar] [CrossRef]
  38. Lalapura, V.S.; Bhimavarapu, V.R.; Amudha, J.; Satheesh, H.S. A systematic evaluation of recurrent neural network models for edge intelligence and human activity recognition applications. Algorithms 2024, 17, 104. [Google Scholar] [CrossRef]
  39. Kumar, P.; Chauhan, S.; Awasthi, L.K. Human activity recognition (har) using deep learning: Review, methodologies, progress and future research directions. Arch. Comput. Methods Eng. 2024, 31, 179–219. [Google Scholar] [CrossRef]
  40. Jiang, Z.; Van Zoest, V.; Deng, W.; Ngai, E.C.; Liu, J. Leveraging machine learning for disease diagnoses based on wearable devices: A survey. IEEE Internet Things J. 2023, 10, 21959–21981. [Google Scholar] [CrossRef]
  41. Gutiérrez, J.D.; Jiménez, A.R.; Seco, F.; Álvarez, F.J.; Aguilera, T.; Torres-Sospedra, J.; Melchor, F. GetSensorData: An extensible Android-based application for multi-sensor data registration. SoftwareX 2022, 19, 101186. [Google Scholar] [CrossRef]
  42. Prakash, D. Pre-processing techniques for preparing clean and high-quality data for diabetes prediction. Int. J. Res. Publ. Rev. 2024, 5, 458–465. [Google Scholar] [CrossRef]
Figure 1. Coordinate system used by the Sensor API (relative to the mobile device).
Figure 2. Pedestrian accelerometer data during walking.
Figure 3. Probability density functions (PDFs) of accelerometer and gyroscope data during walking.
Figure 4. Probability density functions (PDFs) of accelerometer and gyroscope data during running.
Figure 5. Probability density functions (PDFs) of accelerometer and gyroscope data during squatting.
Figure 7. Fourier transform plots of the raw, low-pass filtered, high-pass filtered, and Kalman filtered accelerometer data along the x, y, and z axes during running, demonstrating the superior noise reduction capability of the optimized Kalman filter.
Figure 8. Results of pairwise statistical tests comparing model performance.
Figure 9. Confusion matrices for different models showing the distribution of true vs. predicted classes across the three activities (Running, SquatStand, Walking).
Figure 10. Training and validation accuracy curves for deep learning models across epochs, highlighting the superior convergence properties of TransUNet compared to other architectures.
Figure 11. Comparative analysis of model inference times across traditional and deep learning models, highlighting the efficiency advantages of traditional approaches for resource-constrained IoT applications.
Figure 12. Comparative analysis of model complexity and inference latency, emphasizing tradeoffs between accuracy and efficiency across all classifiers.
Table 1. Statistical features of walking.

Statistical Feature   Mean      Variance   Skewness   Kurtosis
Accelerometer x       6.2384    18.9900    −0.4241    −0.7563
Accelerometer y       −3.0871   17.7884    0.5580     −0.6338
Accelerometer z       4.0417    16.5674    0.5320     −1.1627
Gyroscope x           0.0841    0.8902     0.0019     1.2293
Gyroscope y           −0.0851   1.1626     −0.2177    4.1347
Gyroscope z           −0.0345   1.7633     0.1725     0.1497
Table 2. Statistical features of running.

Statistical Feature   Mean      Variance    Skewness   Kurtosis
Accelerometer x       11.1203   199.4844    0.9640     −0.0665
Accelerometer y       4.2636    123.8192    −0.2531    −1.1467
Accelerometer z       2.0734    27.6127     0.5500     0.6395
Gyroscope x           −0.3546   5.5117      0.3244     −0.4091
Gyroscope y           0.0617    2.4959      −0.6855    3.7655
Gyroscope z           0.0154    8.3822      0.1533     −0.7929
Table 3. Statistical features of squatting.

Statistical Feature   Mean      Variance   Skewness   Kurtosis
Accelerometer x       8.7487    16.6963    0.1992     −0.7296
Accelerometer y       3.7476    3.7924     0.5660     0.2565
Accelerometer z       1.9465    4.2108     1.5020     3.1110
Gyroscope x           −0.0057   0.4072     0.2009     1.7751
Gyroscope y           −0.0126   0.4706     −0.4958    8.5725
Gyroscope z           −0.0039   0.3233     −0.5625    0.3254
Table 4. Comprehensive performance comparison across machine learning and deep learning models.

Model Category    Model              Accuracy   Cross-Val Mean (SD)
Traditional ML    SVM                72.83%     73.37% (±2.92%)
Traditional ML    Decision Tree      89.13%     84.51% (±4.48%)
Traditional ML    Random Forest      98.91%     89.66% (±4.73%)
Deep Learning     CNN                79.35%     87.12% (±1.33%)
Deep Learning     LSTM               85.87%     81.41% (±1.74%)
Deep Learning     Transformer        65.22%     60.49% (±1.99%)
Deep Learning     Swin Transformer   82.61%     83.10% (±1.52%)
Deep Learning     TransUNet          91.30%     89.57% (±1.38%)

