Author Contributions
Conceptualisation, M.I.K., P.M., A.L., D.H. and I.K.I.; methodology, M.I.K.; software, M.I.K. and I.K.I.; validation, M.I.K., P.M. and I.K.I.; formal analysis, M.I.K., P.M., A.L., D.H. and I.K.I.; investigation, M.I.K.; resources, M.I.K., P.M. and I.K.I.; writing—original draft preparation, M.I.K.; writing—review and editing, M.I.K., P.M., A.L., D.H. and I.K.I.; visualisation, M.I.K., P.M., A.L., D.H. and I.K.I.; supervision, I.K.I. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Overview of the proposed Human Activity Recognition framework using CSI data. The pipeline consists of four main stages: data acquisition through Wi-Fi signal propagation, signal preprocessing using filtering and wavelet transformation, data augmentation via random transformation techniques (jittering, scaling, slice shuffling, and magnitude warping), and classification using both baseline and hybrid deep learning models.
Figure 2.
Experimental setup for capturing human activities using the 5G frequency band. The indoor environment measures 3 m by 2.8 m. A USRP X300 (transmitter) and a USRP X310 (receiver) are positioned at opposite corners. Four chairs are arranged 1 m apart to define the human activity space.
Figure 3.
Comparison of original and Butterworth-filtered CSI signals for single-user and multi-user activity sessions. (a) One subject standing, (b) three subjects sitting.
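The Butterworth filtering shown in Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sampling rate, cutoff frequency, and filter order here are illustrative assumptions, and the zero-phase `filtfilt` variant is one common choice for offline CSI denoising.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def butterworth_lowpass(csi_amplitude, cutoff_hz, fs_hz, order=4):
    """Zero-phase low-pass Butterworth filter for a 1-D CSI amplitude stream."""
    nyquist = 0.5 * fs_hz
    b, a = butter(order, cutoff_hz / nyquist, btype="low")
    # filtfilt applies the filter forward and backward, avoiding phase distortion.
    return filtfilt(b, a, csi_amplitude)

# Hypothetical stream: a slow motion-induced component plus wideband noise.
fs = 500.0                                   # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)
clean = np.sin(2 * np.pi * 10 * t)           # 10 Hz "activity" component
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)
filtered = butterworth_lowpass(noisy, cutoff_hz=30.0, fs_hz=fs)
```

On this synthetic stream the filtered output tracks the slow component while suppressing the high-frequency noise, which is the qualitative effect visible in Figure 3.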
Figure 4.
Augmentation impact of random-transformation techniques on CSI time series for different activities. Each row displays the original signal and its augmented versions using jittering, scaling, slice shuffling, and magnitude warping, respectively.
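The four random-transformation augmentations named in Figures 1 and 4 can be sketched in a few lines each. These are generic textbook formulations under assumed noise scales and knot counts, not the paper's exact parameterisation:

```python
import numpy as np

rng = np.random.default_rng(42)

def jitter(x, sigma=0.05):
    # Add small Gaussian noise to every sample.
    return x + rng.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    # Multiply the entire series by a single random factor near 1.
    return x * rng.normal(1.0, sigma)

def slice_shuffle(x, n_slices=4):
    # Split the series into equal slices and permute their order.
    slices = np.array_split(x, n_slices)
    order = rng.permutation(n_slices)
    return np.concatenate([slices[i] for i in order])

def magnitude_warp(x, sigma=0.2, knots=4):
    # Warp the amplitude with a smooth random curve interpolated from a few knots.
    knot_pos = np.linspace(0, x.size - 1, knots + 2)
    knot_val = rng.normal(1.0, sigma, size=knots + 2)
    warp = np.interp(np.arange(x.size), knot_pos, knot_val)
    return x * warp

signal = np.sin(np.linspace(0, 4 * np.pi, 200))
augmented = [f(signal) for f in (jitter, scale, slice_shuffle, magnitude_warp)]
```

Each transform preserves the length of the input window, so augmented samples can be fed to the classifiers unchanged.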
Figure 5.
Architecture of the proposed time-series analysis model, incorporating CNN for feature extraction and a Transformer encoder for temporal representation, followed by a classification block.
Figure 6.
Sensitivity analysis comparing the distribution of mean values between original and augmented CSI data for two multi-user activity scenarios: (a) two subjects, one sitting and one standing; (b) three subjects, two sitting and one standing. Augmented data (red) show broader tails but remain centred similarly to the original data (blue).
Figure 7.
Boxplot comparison of mean values between original and augmented CSI data across multi-user activity scenarios. Augmentation maintains central tendency and spread while introducing slight variations, especially as the subject count increases: (a) one subject sitting; (b) two subjects, one sitting and one standing; (c) three subjects, two sitting and one standing; (d) four subjects, two sitting and two standing.
Figure 8.
Confusion matrix for Experiment II showing the performance of the models for multi-user sessions with 2 subjects across different activities: (a) CNN + Transformer; (b) CNN + GRU; (c) CNN + BiLSTM; (d) CNN + LSTM.
Figure 9.
Confusion matrix for Experiment II showing the performance of the models for multi-user sessions with 3 subjects across different activities: (a) CNN + Transformer; (b) CNN + GRU; (c) CNN + BiLSTM; (d) CNN + LSTM.
Figure 10.
Confusion matrix for Experiment II showing the performance of the models for multi-user sessions with 4 subjects across different activities: (a) CNN + Transformer; (b) CNN + GRU; (c) CNN + BiLSTM; (d) CNN + LSTM.
Figure 11.
Confusion matrix for Experiment II showing the performance of the models for multiple users (combined 2-, 3-, and 4-subject sessions) across different activities.
Figure 12.
Comparison of classification accuracy across different classifiers and the proposed CNN + Transformer architecture under varying user configurations.
Table 1.
Literature summary table.
Authors (Year) | Dataset/Source | Model(s) Used | Signal Processing/Augmentation | Evaluation Method | Key Findings |
---|---|---|---|---|---|
Muaaz et al. (2021) [26] | WiFi NICs CSI | CNN | CSI ratio, PCA, spectrogram | Experimental evaluation | 97.78% accuracy; robust to environmental variations |
Shi et al. (2022) [28] | WiFi CSI | CNN + Domain Adaptation | CSI enhancement | Cross-domain evaluation | One-fits-all model with improved generalisation to new environments |
Wang et al. (2022) [29] | WiFi CSI data | Domain Generalisation (AFFAR) | Adaptive feature fusion | Cross-domain testing | Combined domain-specific and domain-invariant features for robustness |
Abuhoureyah et al. (2024) [30] | Custom multi-user CSI data | Deep learning + ICA + CWT | ICA, CWT | Experimental evaluation | Separated overlapping signals; enabled robust multi-user location-independent HAR |
Wang et al. (2021) [31] | CSI-based HAR dataset | Few-shot Learning | Data augmentation | Experimental evaluation | Few-shot learning enabled improved accuracy in limited-data settings |
Zhang et al. (2022) [32] | Custom WiFi CSI | Graph Few-shot Learning | Augmented graph features | Few-shot evaluation | Generalised well across tasks using limited labelled samples effectively |
Xiao et al. (2024) [33] | Synthetic WiFi CSI | Diffusion + Contrastive Learning | Diffusion-based augmentation | Contrastive accuracy eval | Outperformed baseline models in generalisation under limited data |
Zhang et al. (2022) [34] | CSI-based HAR dataset | Zero-effort cross-domain (Widar3.0) | None specified | Cross-domain evaluation | Achieved high accuracy without requiring user calibration |
Shi et al. (2020) [35] | WiFi CSI | One-shot Learning + CSI enhancement | CSI signal denoising | Experimental testing | Enabled recognition with few samples and improved signal quality |
Xiao et al. (2023) [33] | WiFi CSI synthetic data | Diffusion Model + MLP | GAN/Diffusion-based augmentation | Comparative experiments | Improved training effectiveness using synthetic CSI data |
Elkelany et al. (2023) [27] | CSI dataset (12 activities, 3 environments) | ABiLSTM | Spectrogram conversion | 10-fold CV | Achieved up to 94.03% accuracy across environments |
Table 2.
CSI feature categories and descriptions.
Feature Category | Description |
---|---|
Time-Domain Statistical | Mean, Median, Std Dev, Min, Max, Kurtosis, Skewness, IQR, Variance, Root Mean Square |
Temporal Dynamics | Mean Absolute Difference, Mean Difference, Median Absolute Difference, Sum of Absolute Differences |
Signal Shape Characteristics | Peak-to-Peak Distance, Area Under Curve, Spectral Slope |
Frequency Domain | FFT Mean Coefficient, Spectral Centroid (Weighted frequency mean), Spectral Entropy, Spectral Kurtosis |
Signal Complexity | Permutation Entropy, Weighted Permutation Entropy, Spectral Variation (Normalised spectral std dev) |
Energy/Power Features | Absolute Energy, Average Power, Wavelet Energy (Sum of squared DWT coefficients) |
Statistical Distribution | Mean Absolute Deviation, Median Absolute Deviation, ECDF Percentile (25th percentile) |
Spectral Relationships | Power Bandwidth (Integrated spectrum), Spectral Distance (Cumulative spectral differences) |
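A handful of the Table 2 features can be computed directly from a 1-D CSI amplitude window. This is a minimal sketch covering a few of the time-domain, shape, frequency-domain, and energy categories; the exact feature definitions and windowing used in the paper may differ.

```python
import numpy as np

def csi_features(x):
    """A few representative Table 2 features for a 1-D CSI amplitude window."""
    freqs = np.fft.rfftfreq(x.size)          # normalised frequency bins
    spectrum = np.abs(np.fft.rfft(x))        # magnitude spectrum
    q75, q25 = np.percentile(x, [75, 25])
    return {
        "mean": np.mean(x),                               # time-domain statistical
        "std": np.std(x),
        "rms": np.sqrt(np.mean(x ** 2)),                  # root mean square
        "iqr": q75 - q25,                                 # interquartile range
        "peak_to_peak": np.ptp(x),                        # signal shape
        "abs_energy": np.sum(x ** 2),                     # energy/power
        "spectral_centroid": np.sum(freqs * spectrum) / np.sum(spectrum),
    }

window = np.sin(np.linspace(0, 6 * np.pi, 256))  # stand-in for a CSI window
feats = csi_features(window)
```

Features like these are concatenated into a fixed-length vector per window before classification.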
Table 3.
Hyperparameter settings for the CNN + Transformer model.
Component | Hyperparameter | Value |
---|---|---|
Architecture | Number of Conv1D Layers | 3 |
Architecture | Conv1D Filter Sizes | 32, 64, 128 |
Architecture | Kernel Size | 3 |
Architecture | Pooling Type | MaxPooling1D, GlobalMaxPooling1D |
Architecture | Batch Normalisation | Yes |
Transformer | Encoder Layers | 3 |
Transformer | Attention Heads | 4 |
Transformer | Head Size | 64 |
Transformer | Feedforward Dimension | 256 |
Transformer | Dropout Rates | 0.1 (Transformer), 0.5, 0.3 (Dense) |
Transformer | Positional Encoding | Relative (vector-based) |
Training | Optimiser | Adam |
Training | Learning Rate | |
Training | Batch Size | 32 |
Training | Epochs | 500 |
Training | Early Stopping Patience | 10 |
Training | LR Schedule | Step decay at epochs 20 and 30 |
Training | Loss Function | Sparse Categorical Cross-entropy |
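The step-decay schedule in Table 3 (drops at epochs 20 and 30) can be sketched as a plain function suitable for a Keras `LearningRateScheduler` or a manual training loop. The base rate and decay factor below are illustrative assumptions, as the table does not report them.

```python
def step_decay(epoch, base_lr=1e-3, drop=0.1, milestones=(20, 30)):
    """Step-decay schedule: multiply the learning rate by `drop` at each
    milestone epoch. base_lr and drop are illustrative values only."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= drop
    return lr

# Rate stays at base_lr until epoch 20, then drops by `drop` at each milestone.
schedule = [step_decay(e) for e in range(40)]
```

Combined with early stopping (patience 10), this keeps the initial rate for coarse fitting and shrinks it twice for fine convergence.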
Table 4.
Mann–Whitney U and Levene’s test results for original vs. augmented data.
Feature | Label | Mann–Whitney U | p-Value | Levene’s W | p-Value | Interpretation |
---|---|---|---|---|---|---|
Mean | Empty | 144,584.0 | 0.9139 | 0.1772 | 0.6739 | Fail to reject |
Mean | 1Subject-1Sit | 206,671.5 | 0.9325 | 0.0449 | 0.8322 | Fail to reject |
Mean | 2Subjects-1Sit-1Stand | 105,480.5 | 0.9384 | 0.0819 | 0.7747 | Fail to reject |
Mean | 3Subjects-2Sit-1Stand | 105,100.5 | 0.9871 | 0.0858 | 0.7696 | Fail to reject |
Mean | 4Subjects-2Sit-2Stand | 106,725.5 | 0.7811 | 1.0095 | 0.3151 | Fail to reject |
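The two tests reported in Table 4 are available in `scipy.stats`. This sketch applies them to synthetic stand-ins for the per-sample mean feature (the real comparison uses the original vs. augmented CSI means); large p-values correspond to the "fail to reject" interpretation, i.e. augmentation preserved both location (Mann–Whitney U) and spread (Levene).

```python
import numpy as np
from scipy.stats import mannwhitneyu, levene

rng = np.random.default_rng(0)
# Hypothetical per-window mean values; the paper uses real CSI statistics.
original = rng.normal(loc=5.0, scale=1.0, size=500)
augmented = original + rng.normal(0.0, 0.05, size=500)  # lightly perturbed copy

u_stat, u_p = mannwhitneyu(original, augmented, alternative="two-sided")
w_stat, w_p = levene(original, augmented)

# High p-values for both tests -> no evidence of a shift in central tendency
# (Mann-Whitney U) or in variance (Levene) after augmentation.
```

The non-parametric Mann–Whitney U test is a sensible choice here because CSI-derived features need not be normally distributed.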
Table 5.
Experiment II results: accuracy across different augmentation factors for each model and phase.
Model | Aug. Factor | Phase 1 | Phase 2 | Phase 3 | Phase 4 | All Activities |
---|---|---|---|---|---|---|
CNN + Transformer | 0 | 0.869 | 0.840 | 0.899 | 0.869 | 0.757 |
CNN + Transformer | 1 | 0.963 | 0.927 | 0.957 | 0.885 | 0.817 |
CNN + Transformer | 3 | 0.988 | 0.970 | 0.976 | 0.938 | 0.910 |
CNN + Transformer | 5 | 0.994 | 0.979 | 0.987 | 0.959 | 0.939 |
CNN + Transformer | 7 | 0.994 | 0.986 | 0.990 | 0.965 | 0.954 |
CNN + Transformer | 10 | 0.994 | 0.989 | 0.991 | 0.973 | 0.963 |
CNN + BiLSTM | 0 | 0.807 | 0.852 | 0.885 | 0.871 | 0.766 |
CNN + BiLSTM | 1 | 0.968 | 0.925 | 0.957 | 0.874 | 0.805 |
CNN + BiLSTM | 3 | 0.987 | 0.968 | 0.974 | 0.933 | 0.908 |
CNN + BiLSTM | 5 | 0.992 | 0.977 | 0.986 | 0.955 | 0.937 |
CNN + BiLSTM | 7 | 0.992 | 0.984 | 0.990 | 0.964 | 0.948 |
CNN + BiLSTM | 10 | 0.995 | 0.985 | 0.992 | 0.972 | 0.960 |
CNN + GRU | 0 | 0.762 | 0.729 | 0.832 | 0.881 | 0.772 |
CNN + GRU | 1 | 0.965 | 0.932 | 0.960 | 0.875 | 0.799 |
CNN + GRU | 3 | 0.985 | 0.969 | 0.979 | 0.935 | 0.906 |
CNN + GRU | 5 | 0.993 | 0.978 | 0.985 | 0.954 | 0.930 |
CNN + GRU | 7 | 0.993 | 0.985 | 0.989 | 0.963 | 0.952 |
CNN + GRU | 10 | 0.994 | 0.987 | 0.991 | 0.973 | 0.958 |
CNN + LSTM | 0 | 0.746 | 0.729 | 0.841 | 0.869 | 0.761 |
CNN + LSTM | 1 | 0.962 | 0.927 | 0.952 | 0.872 | 0.794 |
CNN + LSTM | 3 | 0.984 | 0.966 | 0.974 | 0.933 | 0.901 |
CNN + LSTM | 5 | 0.990 | 0.973 | 0.983 | 0.954 | 0.927 |
CNN + LSTM | 7 | 0.993 | 0.982 | 0.988 | 0.965 | 0.949 |
CNN + LSTM | 10 | 0.995 | 0.985 | 0.989 | 0.973 | 0.959 |
Table 6.
Experiment III results: multi-user presence and activity detection.
Model | Metric | One User | Two Users | Three Users | Four Users | Mixed 2/3/4 Users |
---|---|---|---|---|---|---|
CNN + Transformer | Accuracy | 0.997 | 0.956 | 0.971 | 0.923 | 0.938 |
CNN + Transformer | Precision | 0.997 | 0.956 | 0.970 | 0.923 | 0.935 |
CNN + Transformer | Recall | 0.997 | 0.955 | 0.971 | 0.923 | 0.934 |
CNN + Transformer | F1-score | 0.997 | 0.955 | 0.971 | 0.923 | 0.934 |
CNN + BiLSTM | Accuracy | 0.995 | 0.953 | 0.971 | 0.916 | 0.935 |
CNN + BiLSTM | Precision | 0.995 | 0.953 | 0.971 | 0.917 | 0.930 |
CNN + BiLSTM | Recall | 0.995 | 0.953 | 0.970 | 0.917 | 0.930 |
CNN + BiLSTM | F1-score | 0.995 | 0.953 | 0.971 | 0.917 | 0.930 |
CNN + GRU | Accuracy | 0.996 | 0.952 | 0.971 | 0.921 | 0.933 |
CNN + GRU | Precision | 0.996 | 0.952 | 0.970 | 0.921 | 0.930 |
CNN + GRU | Recall | 0.996 | 0.952 | 0.970 | 0.920 | 0.929 |
CNN + GRU | F1-score | 0.996 | 0.952 | 0.970 | 0.920 | 0.928 |
CNN + LSTM | Accuracy | 0.995 | 0.953 | 0.961 | 0.896 | 0.930 |
CNN + LSTM | Precision | 0.994 | 0.953 | 0.960 | 0.896 | 0.926 |
CNN + LSTM | Recall | 0.995 | 0.952 | 0.960 | 0.895 | 0.926 |
CNN + LSTM | F1-score | 0.995 | 0.942 | 0.960 | 0.895 | 0.926 |
Table 7.
Performance of CNN+Transformer with individual and cumulative augmentations (augmentation factor = 5) across all activity classes.
Model Variant | Accuracy | F1-Score | Precision | Recall |
---|---|---|---|---|
CNN + Transformer + No Aug. | 0.872 | 0.890 | 0.891 | 0.893 |
+ Magnitude_Warp | 0.837 | 0.819 | 0.823 | 0.819 |
+ Magnitude_Warp + Slice_Shuffle | 0.907 | 0.900 | 0.901 | 0.900 |
+ Magnitude_Warp + Slice_Shuffle + Scale | 0.928 | 0.922 | 0.923 | 0.922 |
+ Magnitude_Warp + Slice_Shuffle + Scale + Jitter | 0.939 | 0.933 | 0.940 | 0.934 |
Table 8.
Comparison of classification performance across different model architectures, including standalone CNN, LSTM, and the proposed CNN + Transformer.
Model Variant | Accuracy | F1-Score | Precision | Recall |
---|---|---|---|---|
CNN [36] | 0.857 | – | – | – |
CNN | 0.893 | 0.891 | 0.893 | 0.892 |
LSTM | 0.900 | 0.894 | 0.894 | 0.894 |
Proposed CNN + Transformer | 0.939 | 0.933 | 0.940 | 0.934 |