Article

Mitigating Data Leakage in a WiFi CSI Benchmark for Human Action Recognition

Nokia Bell Labs, 1082 Budapest, Hungary
Sensors 2024, 24(24), 8201; https://doi.org/10.3390/s24248201
Submission received: 19 November 2024 / Revised: 12 December 2024 / Accepted: 20 December 2024 / Published: 22 December 2024

Abstract

Human action recognition using WiFi channel state information (CSI) has gained attention due to its non-intrusive nature and potential applications in healthcare, smart environments, and security. However, the reliability of methods developed for CSI-based action recognition is often contingent on the quality of the datasets and evaluation protocols used. In this paper, we uncovered a critical data leakage issue, which arises from improper data partitioning, in a widely used WiFi CSI benchmark dataset. Specifically, the benchmark fails to separate individuals between the training and test sets, leading to inflated performance metrics as models inadvertently learn individual-specific features rather than generalizable action patterns. We analyzed this issue in depth, retrained several benchmarked models using corrected data partitioning methods, and demonstrated a significant drop in accuracy when individuals were properly separated across training and testing. Our findings highlight the importance of rigorous data partitioning in CSI-based action recognition and provide recommendations for mitigating data leakage in future research. This work contributes to the development of more robust and reliable human action recognition systems using WiFi CSI.

1. Introduction

Human action recognition using WiFi channel state information (CSI) has gained significant attention due to its non-intrusive nature and potential for applications in areas such as healthcare [1], security [2], and smart homes [3]. Benchmarks and datasets are crucial for evaluating and comparing methods for CSI-based human action recognition. However, recent advances in this field often rely on specific benchmarks, e.g., the WiAR [4] or Widar3.0 [5] databases, and the reported accuracy of these methods can be compromised by inherent flaws in the data.
In this paper, we identify a critical data leakage issue in a widely used WiFi CSI benchmark that results from improper partitioning methods. Specifically, the investigated benchmark published by Moshiri et al. [6] relies heavily on CSI signal-based partitioning, where data from the same individuals are included in both the training and testing sets. This introduces a form of leakage as the models learn specific characteristics of individuals rather than generalizable features of human actions. Ideally, the training and test sets should be partitioned by assigning different individuals entirely to each set, ensuring that the model is evaluated on unseen participants, not just on new action instances from the same individuals. This form of leakage leads to overly optimistic performance estimates as the models are inadvertently trained on patterns that overlap between the training and testing phases. Consequently, the reported high accuracies do not reflect the model’s ability to generalize to new, unseen subjects, which is a key requirement for real-world applications of WiFi CSI-based recognition systems.
Our paper provides an in-depth analysis of this issue, quantifies its impact on several existing models published by Moshiri et al. [6], and proposes strategies to correct the data partitioning method to prevent leakage. We also present revised results using individual-based partitioning, showing a more realistic measure of model performance. By addressing this flaw, we aim to enhance the reliability and robustness of WiFi CSI-based human action recognition research.
This paper offers the following contributions:
  • Identification of the data leakage resulting from CSI signal-based partitioning in a popular WiFi CSI benchmark.
  • Evaluation of the extent to which this leakage inflates performance metrics.
  • Practical recommendations for preventing data leakage through proper partitioning protocols based on individuals.

Structure of the Paper

The remainder of this paper is organized as follows. In Section 2, we provide a preliminary overview of WiFi CSI, outlining its relevance and role in human action recognition (HAR). Section 3 reviews related work, focusing on previous approaches to CSI-based HAR and prior studies addressing data leakage issues in machine learning. In Section 4, we describe the materials and methods proposed by the authors of the benchmark database [6], highlighting the key features of their approach. Section 5 details the data leakage issue we identified, emphasizing its impact on model performance. Section 6 presents the experimental results, where we retrain the models using both the original partitioning method and a corrected approach that ensures proper partitioning by individuals. Section 7 offers a discussion of our findings, while Section 8 concludes the paper by summarizing our contributions and suggesting future directions for research.

2. Preliminaries on WiFi CSI

WiFi CSI refers to a set of parameters that describe the properties of the wireless communication channel between a transmitter and a receiver. It captures the effects of physical factors such as signal attenuation, scattering, reflection, and diffraction as WiFi signals travel through the environment. In essence, CSI provides a detailed snapshot of how the signal interacts with objects and people in the space, offering fine-grained information about signal behavior across different subcarriers and frequency bands. These data, collected from standard WiFi devices, can reveal subtle changes in the environment, making them a powerful tool for applications beyond communication, such as human action recognition.
CSI is derived from the multiple-input multiple-output (MIMO) capabilities of modern WiFi systems, which allow multiple antennas to send and receive data across various transmission paths or channels. When a WiFi signal travels from a transmitter (such as a router) to a receiver (such as a smartphone or laptop), it interacts with various objects and people in its path. These interactions cause variations in the amplitude and phase of the signal. CSI measures these variations at the physical layer, breaking the signal into multiple subcarriers (each corresponding to a different frequency range) and tracking how each subcarrier is affected. The granularity of CSI data provides detailed information about how the signal evolves as it moves through space, capturing environmental dynamics that are not available through simpler metrics like received signal strength indicator (RSSI). Whereas RSSI only measures the overall strength of the received signal, CSI offers a multidimensional view, breaking the signal into multiple subcomponents and tracking how they fluctuate in real time. This makes CSI a rich source of environmental data that can be used to infer changes in the surrounding space. Consider a MIMO system where $N_t$ transmit antennas and $N_r$ receive antennas are used. Let $H \in \mathbb{C}^{N_r \times N_t}$ represent the channel matrix, where each element of $H$ is a complex value that describes the channel between a specific pair of transmit and receive antennas [7]. Formally, the channel matrix can be written as follows:
$$H = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1 N_t} \\ h_{21} & h_{22} & \cdots & h_{2 N_t} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N_r 1} & h_{N_r 2} & \cdots & h_{N_r N_t} \end{bmatrix},$$
where each element $h_{ij} \in \mathbb{C}$ represents the channel response between the $j$th transmit antenna and the $i$th receive antenna. Each $h_{ij}$ is a complex number that models both the amplitude attenuation and phase shift of the signal as it propagates through the channel:
$$h_{ij} = |h_{ij}| e^{j \theta_{ij}},$$
where $|h_{ij}|$ is the amplitude attenuation of the signal, and $\theta_{ij}$ represents the phase shift introduced by the channel. Here, the $j$ in the exponent denotes the imaginary unit, not the transmit antenna index.
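As a minimal sketch of the polar decomposition above, the following Python snippet splits a small complex channel matrix into its amplitude and phase parts and rebuilds it again. The 2 × 2 matrix and its values are made up for illustration and are not taken from any dataset; the helper name `decompose` is likewise hypothetical.

```python
import cmath

# Hypothetical 2x2 channel matrix H (N_r = 2 receive, N_t = 2 transmit antennas).
# The values are invented purely for illustration.
H = [
    [complex(0.8, 0.6), complex(0.0, 1.0)],
    [complex(-0.5, 0.0), complex(1.0, 1.0)],
]


def decompose(matrix):
    """Split a complex channel matrix into amplitude and phase matrices."""
    amplitudes = [[abs(h) for h in row] for row in matrix]
    phases = [[cmath.phase(h) for h in row] for row in matrix]
    return amplitudes, phases


amplitudes, phases = decompose(H)

# Rebuild each element from its polar form |h| * exp(j * theta)
# to check that amplitude and phase together carry the full information.
rebuilt = [
    [cmath.rect(a, p) for a, p in zip(a_row, p_row)]
    for a_row, p_row in zip(amplitudes, phases)
]
```

In CSI-based HAR, the amplitude matrix is the part most often fed to classifiers; the LSTM described later in this paper, for example, uses CSI amplitudes as its input features.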
The unique properties of CSI make it particularly valuable for HAR [8]. WiFi signals, being ubiquitous and capable of penetrating walls and other obstacles, offer a non-intrusive method to monitor and understand human activities in indoor environments. Unlike cameras [9], which require line-of-sight and can raise privacy concerns, WiFi CSI can be collected passively without direct visual observation [10]. This makes it an appealing solution for applications in healthcare, security, and smart homes, where continuous monitoring of human activity is required.
The core principle behind using CSI for human action recognition is that human movements, such as walking, running, or sitting down, cause measurable disturbances in the WiFi signal [11]. These movements create changes in the signal’s amplitude and phase, which are captured by the CSI. For instance, when a person walks between a WiFi transmitter and receiver, their movement induces multi-path effects, causing the signal to bounce off their body and other surfaces in the environment. These disturbances leave distinct patterns in the CSI data that can be used to identify specific actions. Machine learning algorithms can then be applied to the CSI data to classify and recognize these human actions. By analyzing the variations in the signal, models can be trained to detect a wide range of activities, from simple gestures to more complex movements. For example, researchers have successfully used CSI to differentiate between actions like sitting, standing, walking, and even more granular activities such as hand gestures [12] or breathing patterns [13]. The level of detail provided by CSI allows for high-precision activity recognition, even in challenging environments where traditional sensors or cameras may struggle.
For more detailed information on leveraging WiFi CSI in sensing applications, readers are encouraged to refer to the work of Ma et al. [14], which provides a comprehensive exploration of this topic.

3. Related Works

WiFi CSI has emerged as a promising tool for non-intrusive HAR thanks to its ability to capture the fine-grained changes in the wireless channel caused by human movement. Numerous studies have explored the use of CSI for detecting and classifying human activities. In Section 3.1, an overview of the major papers on WiFi CSI-based HAR is given. Data leakage occurs when information from the training set is inadvertently included in the test set, leading to overly optimistic performance estimates that fail to generalize to unseen data. In Section 3.2, papers that documented data leakage in various domains of machine learning research are outlined.

3.1. WiFi CSI-Based HAR

In the WiSee project, Pu et al. [15] implemented a novel gesture recognition system. Specifically, the WiSee architecture operated by detecting minute Doppler shifts and multi-path distortions in wireless signals caused by human motion. The system was able to classify a set of nine gestures. The proof-of-concept prototype was implemented using USRP-N210 hardware and a software defined radio (SDR) system. Next, it was evaluated in both an office setting and a two-bedroom apartment. Al-qaness’s [16] approach for WiFi CSI-based HAR contained four steps: data collection and normalization, pattern segmentation, feature selection, and activity classification. Specifically, six features were selected both from the CSI amplitude and CSI phase. Based on these features, a random forest [17] algorithm was applied for action classification. WiFall [18] employs anomaly detection algorithms and learns unique CSI patterns to identify falls. It introduces a wireless propagation model designed for indoor environments influenced by human activity and provides a theoretical analysis of the model’s behavior during a fall. E-eyes [19] identifies human activities by analyzing the moving variance of amplitude. This approach is particularly effective for non-stationary activities, especially those with abrupt amplitude changes, like falling and jumping. Conversely, stationary activities, such as sleeping and sitting, show minimal amplitude variation in repetitive patterns, making the moving variance less effective in these cases. The CARM (channel state information-based human activity recognition and monitoring) system [20] incorporates two theoretical models. The first, the CSI speed model, quantifies the relationship between CSI fluctuations and human movement speed, while the second, the CSI activity model, quantifies the connection between human movement speed and specific activities. 
CDHAR (CSI-based device-free HAR) [21] is a system that uses a WiFi-sensing radar mounted on unmanned aerial vehicles to detect human activities. It employs kernel density estimation to establish adaptive detection thresholds and determine activity duration. For classification, CDHAR utilizes a random subspace ensemble method.
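The moving variance of amplitude mentioned above for E-eyes can be illustrated with a short sketch. This is not the E-eyes pipeline itself, which is not detailed here; the window length, the function name `moving_variance`, and the amplitude trace are assumptions chosen only to show why abrupt amplitude changes stand out while stationary segments do not.

```python
from statistics import pvariance


def moving_variance(amplitudes, window):
    """Population variance over each sliding window of `window` samples."""
    if window < 1 or window > len(amplitudes):
        raise ValueError("window must be between 1 and len(amplitudes)")
    return [
        pvariance(amplitudes[i:i + window])
        for i in range(len(amplitudes) - window + 1)
    ]


# Made-up amplitude trace: a flat (stationary) segment followed by abrupt changes.
trace = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 5.0, 1.0]
variances = moving_variance(trace, window=4)
# The first window covers only the flat segment, so its variance is zero;
# windows that include the abrupt changes have a clearly larger variance.
```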
In recent years, deep learning has also been intensively applied to CSI-based HAR. For instance, a long short-term memory (LSTM) network was implemented by Yousefi et al. [7] to recognize human activities using CSI signals. In contrast, Chen et al. [22] implemented an attention-based LSTM for WiFi CSI-based HAR. Similarly, Yang et al. [23] utilized an attention mechanism with LSTM. Schäfer et al. [24] opted to combine the LSTM with a support vector machine for HAR. A deep learning network that integrates hidden features from temporal and spatial dimensions was implemented by Wang et al. [25]. Another line of papers combined a convolutional neural network (CNN) and a recurrent neural network (RNN) for the spatiotemporal modeling of CSI signals [26,27]. Jiao and Zhang [28] mapped the WiFi CSI-based HAR problem onto the image classification task by converting CSI signals into images using various time series imaging techniques, such as Gramian angular fields [29]. Using the CSI images, a LeNet-style [30] CNN was implemented for HAR. This approach was improved by the same authors in [31] by fine-tuning CNNs pretrained on the ImageNet [32] database. In contrast, Luo et al. [33] fine-tuned vision transformers [34] on short-time Fourier transform spectrograms [35] that were generated from CSI signals. In [36], Ahmad et al. presented a study on the recent advances in WiFi-based HAR and movement tracking using deep learning techniques.
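To give a feel for the time series imaging step mentioned above, the following sketch computes a Gramian angular summation field for a short one-dimensional series. It follows the commonly used definition (rescale to [-1, 1], take the arccosine, and form the cosine of pairwise angle sums); which Gramian variant and which preprocessing Jiao and Zhang [28] actually used is not stated here, so the function below should be read as an assumption-laden illustration only.

```python
import math


def gramian_angular_summation_field(series):
    """Return the GASF image (list of rows) of a 1D series."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Rescale to [-1, 1] so that the arccosine is defined.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Clamp to guard against tiny floating-point overshoots.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is cos(phi_i + phi_j); the image is symmetric.
    return [[math.cos(a + b) for b in angles] for a in angles]


# Tiny made-up series standing in for one CSI subcarrier over time.
gasf = gramian_angular_summation_field([0.0, 1.0, 2.0])
```

A series of length T yields a T × T image, which is what allows standard image classification networks to be applied to CSI data.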

3.2. Data Leakage in Machine Learning Research

The growing reliance on machine learning methodologies has revolutionized data analysis across various domains, yet it has also introduced critical challenges, notably data leakage. This phenomenon occurs when information from the test set inadvertently influences the model training process, leading to artificially inflated performance metrics. Such circumstances not only compromise the integrity of research findings, but can also result in misleading conclusions applied in real-world scenarios. As researchers increasingly integrate complex datasets to refine algorithms, understanding the mechanisms and implications of data leakage becomes paramount. Data leakage can lead to overly optimistic performance metrics as models appear proficient in making predictions that they should not have been trained to understand [37]. Essentially, leakage undermines the integrity of model evaluation by providing a false sense of accuracy, making it harder to discern a model’s true predictive power. Furthermore, such occurrences may be rooted in improper data preprocessing or inadequate separation of training and validation datasets, thus directly affecting model generalization in real-world applications.
Kaufman et al. [38] discussed the importance of identifying leakage in data mining to ensure accurate predictive modeling. Specifically, the authors stressed the significance of timestamping data to prevent leakage and ensure accurate predictions about the future. In [39], Kapoor and Narayanan delved into the nuances of data leakage within the realm of machine learning-based science. The authors elaborated a taxonomy of data leakage cases and grouped them into eight classes, namely, the lack of a separate test set, preprocessing on the training and test sets, joint feature selection on the training and test sets, duplicated data points, illegitimate features, temporal leakage, non-independence between the training and test sets, and sampling bias. The authors underscored that the training dataset must be clearly separated from the test dataset during the preprocessing, modeling, and evaluation steps. Further, a machine learning model should have no access to illegitimate features. For instance, Filho et al. [40] pointed out that Ye et al. [41] included the use of anti-hypertensive drugs as a feature in a hypertension prediction task. As a result, data leakage was introduced since the model would not have this information during the prediction phase.
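Of the leakage classes listed above, preprocessing on the combined training and test sets is perhaps the simplest to illustrate. The sketch below contrasts a leak-free standardization, where the mean and standard deviation are estimated on the training data only, with a leaky one that also looks at the test data. The numbers and the helper names `fit_standardizer` and `standardize` are hypothetical.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Estimate the mean and standard deviation used for z-scoring."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply previously fitted statistics to a list of values."""
    return [(v - mu) / sigma for v in values]


# Made-up feature values for illustration.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [10.0, 12.0]

# Leak-free: statistics come from the training set only and are reused on the test set.
mu, sigma = fit_standardizer(train)
train_z = standardize(train, mu, sigma)
test_z = standardize(test, mu, sigma)

# Leaky: statistics are estimated on training and test data together,
# so the test set influences how the training inputs are scaled.
leaky_mu, leaky_sigma = fit_standardizer(train + test)
```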

4. Materials and Methods

Moshiri et al. [6] collected CSI data using a Raspberry Pi device equipped with a network interface card. Specifically, the authors collected CSI data for seven different human activities, i.e., fall, stand up, sit down, lie down, run, walk, and bend. Each activity was performed twenty times by each of three different users. For more details on the experimental setup, we refer readers to Subsection 5.1 of Moshiri et al.’s [6] journal paper. The authors made the collected database publicly available (https://github.com/parisafm/CSI-HAR-Dataset, accessed on 19 December 2024); it is referred to as the CSI-HAR database in the following. The dataset consists of 420 samples (7 activities × 3 volunteers × 20 repetitions) collected from three volunteers performing the specified activities. The authors plan to expand this dataset and explore various scenarios, including interactions between multiple users and activities performed by individuals of different ages.
The generalized framework of the methods proposed by Moshiri et al. [6] is depicted in Figure 1. WiFi CSI-based HAR is formulated as a classification task, for which the authors of [6] designed four different deep learning architectures: 2D-CNN, 1D-CNN, LSTM, and BLSTM (bi-directional long short-term memory). The structure of the 2D-CNN is depicted in Figure 2. Following the first convolutional layer, which uses a leaky ReLU activation function and max pooling, batch normalization (BN) [42] is introduced to help stabilize and accelerate network training. BN normalizes the activations within each mini-batch so that their mean and standard deviation are brought close to 0 and 1, respectively. Dropout layers are incorporated between convolutional layers to reduce overfitting and enhance the network’s generalization ability. The pooled features, obtained from max pooling, are flattened by transforming the feature map matrix into a single-column matrix. This flattened matrix is then fed through two fully connected layers to obtain the predicted class outputs. Unlike the 1D-CNN, LSTM, and BLSTM, the 2D-CNN was not trained on one-dimensional CSI signals; instead, the corresponding CSI signals (52 channels in the CSI-HAR [6] database) were converted to RGB images using MATLAB’s imagesc [43] function. CSI images of various human actions are illustrated in Figure 3. The structure of the 1D-CNN, which is very similar to that of the 2D-CNN, can be seen in Figure 4.
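For reference, the batch normalization step mentioned above is usually written as follows. This is the standard formulation from the BN literature [42] rather than something specific to the networks of [6]; the symbols $m$ (mini-batch size), $x_k$ (an activation), $\epsilon$ (a small constant for numerical stability), and the learnable scale $\gamma$ and shift $\beta$ are introduced here only for this illustration.

$$\mu_B = \frac{1}{m} \sum_{k=1}^{m} x_k, \qquad \sigma_B^2 = \frac{1}{m} \sum_{k=1}^{m} (x_k - \mu_B)^2,$$
$$\hat{x}_k = \frac{x_k - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_k = \gamma \hat{x}_k + \beta.$$

The normalized activations $\hat{x}_k$ have approximately zero mean and unit standard deviation within each mini-batch, which is the stabilizing effect described in the text.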
Due to the time-series nature of CSI signals and the LSTM’s [44] capability to capture complex temporal dynamics, Moshiri et al. [6] expected that an LSTM network would demonstrate an outstanding performance in WiFi CSI-based HAR. For the HAR task, LSTM offers two key benefits: it can automatically extract features without needing preprocessing; and it retains the temporal state information of activities, leading to better performance for similar actions, like lying down versus sitting down, compared to 1D-CNN or the hidden Markov model. In [6], the authors used a simple LSTM with one hidden layer containing 128 hidden units, where the feature vector consists of a 52-dimensional vector of CSI amplitudes. The proposed LSTM architecture is shown in Figure 5.
The conventional LSTM network analyzes CSI data in a single direction, meaning the current hidden state only takes into account past CSI information. However, future CSI data are also crucial for HAR. For this reason, Moshiri et al. [6] experimented with an attention-based BLSTM to capture both past and future information, addressing long-term dependencies. Specifically, the network of Moshiri et al. [6] includes both forward and backward layers, enabling it to extract information from both directions. Essentially, it is a two-layer LSTM model where one layer processes the input in the forward direction and the other in reverse. Attention [45], as the name suggests, is a method that enables the model to focus on specific timesteps in input sequences of arbitrary length. Given that the sequential features learned by the BLSTM network for WiFi-based HAR are high-dimensional, with varying feature contributions and timesteps depending on the case, the attention model was leveraged to automatically determine the significance of features and to adjust their weights based on recognition performance. In the study of Moshiri et al. [6], as shown in Figure 6, a BLSTM with an attention layer consisting of 400 units was employed to learn the relative importance of features and timesteps, giving higher weights to more important features to improve performance.
The training parameters applied to the 2D-CNN, 1D-CNN, LSTM, and BLSTM are shown in Table 1. The methods proposed by Moshiri et al. [6] were reimplemented using Python 3.11.5 and PyTorch 2.5.1 on the computer configuration given in Table 2.
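The idea of weighting timesteps by their importance can be sketched in a few lines. The snippet below is a generic softmax-weighted pooling over per-timestep feature vectors, not a reproduction of the attention layer in [6]; in a trained network the scores would be produced by learned parameters, whereas here they are fixed, made-up numbers, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Weighted sum of per-timestep feature vectors."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Three timesteps with two features each (made-up values).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores give every timestep the same weight.
weights, context = attention_pool(hidden_states, [0.0, 0.0, 0.0])

# A much larger score on the last timestep makes it dominate the summary.
focused_weights, focused_context = attention_pool(hidden_states, [0.0, 0.0, 10.0])
```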

5. Detected Data Leakage

When data are split without considering individual separation, the training and test sets both include data from the same individuals. This approach can lead to significant data leakage as the model may inadvertently learn person-specific features rather than generalized action patterns. As shown in Figure 7, the improper data split—which was applied by Moshiri et al. [6]—allows the model to memorize unique attributes associated with individuals, leading to inflated performance metrics that do not reflect the model’s true generalization ability. To address this issue, we introduced a data split with respect to individuals, as shown in Figure 8, where distinct sets of individuals are assigned to either the training or the test set but not both. This approach ensures that the model is evaluated on previously unseen individuals, reducing the risk of data leakage and promoting the learning of action-specific features that are more generalizable across different individuals. By isolating the training and test sets based on individuals, the model is forced to focus on patterns relevant to actions rather than individual-specific characteristics. In our experiments, the data from the first two volunteers of CSI-HAR [6] were allocated to the training set, and the data of the third volunteer were allocated to the test set. Our experiments revealed that this corrected partitioning method significantly impacts the model’s accuracy, highlighting the importance of rigorous data splitting strategies in WiFi CSI-based human action recognition tasks. The findings underscore the need for careful evaluation protocols to prevent data leakage and support the development of robust and generalizable HAR systems.
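The difference between the two partitioning strategies can be made concrete with a small sketch. The records below only mimic the structure of CSI-HAR (seven activities, three volunteers, twenty repetitions each); they carry no signal data, the subject identifiers 1 to 3 are placeholders, and the 20% test fraction of the sample-level split is an assumption, since the original split ratio is not restated here.

```python
import random

ACTIVITIES = ["fall", "stand up", "sit down", "lie down", "run", "walk", "bend"]
SUBJECTS = [1, 2, 3]  # placeholder identifiers for the three volunteers
TRIALS = 20

# One record per recording: 7 activities x 3 subjects x 20 trials.
samples = [
    {"subject": s, "activity": a, "trial": t}
    for s in SUBJECTS
    for a in ACTIVITIES
    for t in range(TRIALS)
]


def random_split(records, test_fraction, seed):
    """Sample-level split that ignores who performed the activity (leaky)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_split(records, test_subjects):
    """Individual-based split: each subject appears in exactly one set."""
    train = [r for r in records if r["subject"] not in test_subjects]
    test = [r for r in records if r["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects that occur in both sets; non-empty means leakage."""
    return {r["subject"] for r in train} & {r["subject"] for r in test}


leaky_train, leaky_test = random_split(samples, test_fraction=0.2, seed=0)
clean_train, clean_test = subject_split(samples, test_subjects={3})
```

Checking `shared_subjects` before training is a cheap safeguard: for the individual-based split it returns an empty set, whereas for the sample-level split it cannot be empty here, because a training set of 336 recordings necessarily draws on all three volunteers.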
Data partitioning with respect to individuals is not only critical for evaluating model performance accurately in research, but it is also essential in the context of production development. In real-world applications of WiFi CSI-based human action recognition—such as healthcare monitoring, security, and smart home environments—models must generalize well to new users who were not part of the training data. If a model’s performance is assessed using the same individuals in both the training and test sets, its reported accuracy is likely overstated, reflecting its ability to memorize specific individual characteristics rather than recognizing generalized human actions. Without individual-based partitioning, models risk failing in real production environments where they will encounter previously unseen individuals. This misalignment between research results and real-world performance can lead to unreliable systems that struggle to maintain accuracy when deployed at scale. For example, in healthcare settings where accurate monitoring of activities is crucial, a model that cannot generalize across individuals may miss critical events or generate false positives, potentially compromising patient safety and the system’s trustworthiness. By enforcing individual-based data splits during development, we simulated real-world deployment conditions more accurately, encouraging the model to learn robust features that are independent of any particular user’s characteristics. This approach promotes the creation of generalizable models that are better equipped for large-scale deployment, where diverse populations and varied environments are the norm. In turn, this leads to more reliable and consistent performance, enhancing the model’s utility and reliability in production settings.

6. Results

6.1. Evaluation Metrics

To ensure the validity and comparability of our findings, we adopted the exact evaluation metrics used by Moshiri et al. in their paper [6]. This approach not only verified our results against the original benchmarks, but it also strengthened the credibility of our analysis of data leakage and its impact on model reliability. In machine learning-based classification, accuracy is a widely used evaluation metric that measures the proportion of correct predictions made by a model over the total number of predictions. Formally, the accuracy for multi-class classification can be defined as follows:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} (TP_i + FP_i + FN_i)},$$
where
  • $C$ represents the total number of classes;
  • $TP_i$ denotes the true positives for class $i$;
  • $FP_i$ and $FN_i$ represent the false positives and false negatives for class $i$, respectively.
In the context of classification tasks, accuracy provides a measure of the model’s overall effectiveness by indicating the percentage of correct predictions across all classes. However, it is worth noting that while accuracy is a straightforward metric, it may not fully reflect the model performance in cases of imbalanced data, where one class is significantly more frequent than others. In such scenarios, additional metrics such as precision, recall, and F1 score can offer more insight into a model’s performance across different classes. However, CSI-HAR [6] is balanced with an equal distribution across all classes. As a result, accuracy serves as an appropriate and sufficient evaluation metric, providing a clear and reliable measure of model performance without the need for additional metrics to account for class imbalance.
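As a small worked example, accuracy in the sense of correct predictions over all predictions can be read directly off a confusion matrix, and per-class recall falls out of the same table. The 3 × 3 matrix below is invented for illustration (rows are actual classes, columns are predicted classes, 20 samples per class) and does not correspond to any result in this paper.

```python
def accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Made-up balanced confusion matrix: rows = actual, columns = predicted.
confusion = [
    [18, 1, 1],
    [2, 16, 2],
    [0, 5, 15],
]

overall = accuracy(confusion)        # 49 correct out of 60 predictions
recalls = per_class_recall(confusion)
```

Because every class has the same number of samples in this example, the overall accuracy equals the mean of the per-class recalls, which is one way to see why accuracy is a reasonable summary for a balanced dataset.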

6.2. Numerical Results

The numerical results of Moshiri et al.’s [6] methods using the two different data partitioning strategies are reported in Table 3. From these results, it can be seen that we could only replicate the published results when employing a data split that did not take individuals into account. In the case of the correct data split (with respect to individuals), the results declined to approximately 63% to 72% of the published performance metrics. This substantial reduction underscores the impact of proper individual-based partitioning on model evaluation, revealing that the initially reported results may have been inflated due to data leakage.
In Figure 9 and Figure 10, the confusion matrices obtained with the data splits without and with respect to individuals are depicted, respectively. A confusion matrix with percentages in multi-class classification shows the proportions of actual versus predicted class labels, where each cell represents the percentage of instances relative to the total, with correct predictions on the diagonal and misclassifications off the diagonal [47]. In the case of the data split without respect to individuals, the confusion matrices showed uniformly high values along the diagonal, indicating that the networks achieved consistently high accuracy across all action types. This uniformity suggests that no specific action type posed a significant challenge, likely due to the networks leveraging individual-specific features rather than action-specific patterns. In the case of the individual-respecting data split, the confusion matrices revealed distinct challenging cases. For example, the “lie down” action showed notably low accuracy, indicating that this action is more difficult for the network to identify accurately. In contrast, the “fall” action maintained high accuracy, similar to the results obtained without individual-based separation, suggesting that it remains an easily recognizable action for the network.
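A percentage confusion matrix of the kind described above can be built from lists of actual and predicted labels as sketched below. The four label pairs are made up solely to exercise the code and do not reproduce any entry of Figure 9 or Figure 10; the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Count matrix with rows = actual class and columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def as_percent_of_total(matrix):
    """Express every cell as a percentage of all instances."""
    total = sum(sum(row) for row in matrix)
    return [[100.0 * count / total for count in row] for row in matrix]


# Made-up labels for illustration only.
labels = ["fall", "lie down"]
actual = ["fall", "fall", "lie down", "lie down"]
predicted = ["fall", "fall", "fall", "lie down"]

counts = confusion_matrix(actual, predicted, labels)
percent = as_percent_of_total(counts)
```

With this normalization, the diagonal entries sum to the overall accuracy in percent, while each off-diagonal entry shows which pair of actions is being confused.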
The training curves of the 2D-CNN model with data splits without and with respect to individuals are depicted in Figure 11 and Figure 12, respectively. These figures illustrate the differences in the model training behavior under each partitioning strategy, highlighting the impact of individual-based separation on the network’s learning process and final performance. Several interesting conclusions can be drawn from these two figures. When the data are split without regard for individual subjects, a close alignment between training and test accuracies can be observed, with test accuracy following the training trend closely and showing only minor differences. In contrast, when the data split respects individual subjects, a marked divergence arises between training and test accuracy. Although training accuracy continued to improve steadily, the test accuracy quickly reached a plateau, indicating that the model struggles to generalize to new, unseen data. This gap suggests that the model may be overfitting to the training data under subject-based partitioning, underscoring the importance of sound data management practices to ensure model robustness and generalizability. The striking contrast between training and test accuracy behavior highlights how crucial the choice of data partitioning strategy is for accurately assessing model performance. These findings emphasize the need for subject-based partitioning to achieve reliable estimates of model generalization and to reduce the risk of overfitting.

7. Discussion

Our analysis highlights the substantial impact of data partitioning strategies on the performance of the WiFi CSI-based human action recognition models introduced and published in [6]. In our experiments, when individuals were not separated between the training and test sets, the models achieved accuracy levels very close to those reported in the original publication. However, these results were largely influenced by data leakage as the networks were able to leverage individual-specific features, leading to inflated performance metrics. The confusion matrices for this case showed consistently high values along the diagonal, indicating uniformly strong performance across action types and suggesting that no particular actions were overly challenging for the models. This finding underscores the misleading nature of evaluations that do not account for individual-based partitioning as they do not accurately reflect the models’ generalizability to new users.
Conversely, when the correct data split (with respect to individuals) was applied, the models’ performance declined significantly, reaching only 63% to 72% of the accuracy obtained with the non-individual-respecting split (Table 3). This drop illustrates the importance of rigorously isolating individuals in training and testing to ensure fair evaluation. The confusion matrices for the individual-respecting split provide further insight, showing that certain actions, such as “lie down”, were particularly challenging for the models, resulting in a much lower accuracy. On the other hand, actions like “fall” maintained high accuracy, similar to the results seen in the original, non-individual-respecting split, which was likely due to the distinct and robust patterns associated with this action.
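The relative drop can be recomputed directly from the accuracies in Table 3. The sketch below uses illustrative variable and function names; the numbers are copied from Table 3. Computed this way, the individual-respecting accuracies amount to roughly 63% to 72% of the accuracies retrained without respect to individuals, and roughly 63% to 70% of the accuracies reported in [6].

```python
# Accuracies (%) copied from Table 3, per architecture:
# (reported in [6], retrained without respect to humans, retrained with respect to humans)
TABLE_3 = {
    "2D-CNN": (95.5, 92.2, 66.4),
    "1D-CNN": (87.4, 86.9, 55.0),
    "LSTM": (89.2, 89.3, 61.8),
    "BLSTM": (94.7, 93.6, 62.2),
}


def retained_percent(subject_wise, baseline):
    """Share of the baseline accuracy that survives the subject-wise split."""
    return round(100.0 * subject_wise / baseline, 1)


def retained_table(table):
    """Map each architecture to (share of reported, share of leaky retrained)."""
    return {
        arch: (retained_percent(subj, reported), retained_percent(subj, leaky))
        for arch, (reported, leaky, subj) in table.items()
    }


if __name__ == "__main__":
    for arch, (vs_reported, vs_leaky) in retained_table(TABLE_3).items():
        print(f"{arch}: {vs_reported}% of reported, {vs_leaky}% of leaky retrained")
```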
These findings underscore the need for more stringent evaluation practices in CSI-based human action recognition. Models developed without proper data partitioning risk poor performance when deployed in real-world environments, where they encounter previously unseen individuals [48]. Our results demonstrate that partitioning data with respect to individuals promotes the learning of action-specific rather than individual-specific features, enhancing the model’s potential for real-world application.
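Partitioning with respect to individuals can be sketched as follows: whole subjects, rather than individual samples, are assigned to the training or the test set. This is a minimal illustration and not the exact procedure used in our experiments. The sample layout `(subject_id, features, label)`, the function name, and the default `test_fraction` (which here refers to the fraction of subjects, not of signals) are assumptions.

```python
import random


def split_by_subject(samples, test_fraction=0.25, seed=0):
    """Split samples so that no subject appears in both partitions.

    samples: list of (subject_id, features, label) tuples.
    test_fraction: approximate fraction of subjects held out for testing.
    """
    subjects = sorted({sample[0] for sample in samples})
    if len(subjects) < 2:
        raise ValueError("need at least two subjects for a subject-wise split")
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out at least one subject, but always keep one for training.
    n_test = max(1, round(len(subjects) * test_fraction))
    n_test = min(n_test, len(subjects) - 1)
    test_subjects = set(subjects[:n_test])
    train = [s for s in samples if s[0] not in test_subjects]
    test = [s for s in samples if s[0] in test_subjects]
    return train, test


if __name__ == "__main__":
    # Hypothetical data: four subjects with two recordings each.
    data = [(sid, None, "walk") for sid in ["A", "A", "B", "B", "C", "C", "D", "D"]]
    train, test = split_by_subject(data)
    print(sorted({s[0] for s in train}), sorted({s[0] for s in test}))
```

Because the split is made on subject identifiers before any further processing, every recording of a held-out individual stays out of training.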
In light of these observations, we recommend that future studies prioritize rigorous data splitting protocols to avoid data leakage and ensure generalizable results. Additionally, using supplementary evaluation metrics beyond accuracy, such as per-class precision and recall, could provide deeper insight into the model’s strengths and limitations for each action class. Overall, this work contributes to establishing best practices in CSI-based human action recognition and highlights the importance of robust evaluation to advance the field toward reliable and deployable solutions.
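Per-class precision and recall can be derived from a confusion matrix using the TP, FP, and FN counts of each class. The sketch below assumes that rows correspond to true classes and columns to predicted classes; the function name and the example matrix are hypothetical and do not reproduce the values in Figures 9 and 10.

```python
def per_class_precision_recall(confusion):
    """Return a list of (precision, recall) pairs, one per class.

    confusion[i][j] is the number of samples with true class i that were
    predicted as class j (rows: true class, columns: predicted class).
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics.append((precision, recall))
    return metrics


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    example = [[8, 2],
               [1, 9]]
    for k, (p, r) in enumerate(per_class_precision_recall(example)):
        print(f"class {k}: precision={p:.3f}, recall={r:.3f}")
```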

8. Conclusions

In this work, we examined the critical impact of data partitioning strategies on the performance of WiFi CSI-based human action recognition models. Our investigation revealed that data leakage occurs when the same individuals are included in both training and test sets, leading to inflated accuracy metrics due to the models’ reliance on individual-specific features rather than generalized patterns of human actions. When an individual-respecting split was applied, the models’ accuracy dropped to 63–72% of the accuracy obtained with the leaky split (Table 3), underscoring the importance of rigorous evaluation protocols. These findings emphasize the necessity of proper data partitioning to ensure reliable model performance and generalizability, particularly for real-world applications, where models are expected to handle new, unseen individuals. By highlighting the significant discrepancy in performance caused by data leakage, our work encourages the adoption of best practices in CSI-based human action recognition, particularly regarding data management and model evaluation. Future research should continue to prioritize robust validation strategies to advance the field toward deployable, reliable systems that maintain high accuracy in practical scenarios.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study were derived from the CSI-HAR database available for download at https://github.com/parisafm/CSI-HAR-Dataset (accessed on 19 December 2024). For this study, the data were rearranged to address specific research questions, and the rearranged dataset is available for download at https://github.com/elektrische-schafen/CSI-HAR-Database (accessed on 19 December 2024).

Acknowledgments

We would like to express our sincere gratitude to our colleague Krisztián Varga for his invaluable assistance and expertise in GPU computing. His guidance and support have been instrumental in optimizing our computational workflows and accelerating the progress of this research project. We would like to express our heartfelt gratitude to the entire team of Nokia Bell Labs, Budapest, for fostering an environment of collaboration, support, and positivity throughout the duration of this project. Finally, we wish to thank the anonymous reviewers and the academic editor for their careful reading of our manuscript and their many insightful comments and suggestions.

Conflicts of Interest

Author Domonkos Varga was employed by the company Nokia Bell Labs. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BLSTM: bi-directional long short-term memory
BN: batch normalization
CNN: convolutional neural network
CPU: central processing unit
CSI: channel state information
FN: false negative
FP: false positive
GPU: graphics processing unit
HAR: human action recognition
LSTM: long short-term memory
MIMO: multiple-input multiple-output
RNN: recurrent neural network
RSSI: received signal strength indicator
SDR: software defined radio
TP: true positive

References

  1. Tan, B.; Chen, Q.; Chetty, K.; Woodbridge, K.; Li, W.; Piechocki, R. Exploiting WiFi channel state information for residential healthcare informatics. IEEE Commun. Mag. 2018, 56, 130–137.
  2. Liu, J.; He, Y.; Xiao, C.; Han, J.; Ren, K. Time to think the security of WiFi-based behavior recognition systems. IEEE Trans. Dependable Secur. Comput. 2023, 21, 449–462.
  3. Lei, Z.; Rong, B.; Jiahao, C.; Yonghong, Z. Smart City Healthcare: Non-Contact Human Respiratory Monitoring with WiFi-CSI. IEEE Trans. Consum. Electron. 2024, 70, 5960–5968.
  4. Guo, L.; Wang, L.; Lin, C.; Liu, J.; Lu, B.; Fang, J.; Liu, Z.; Shan, Z.; Yang, J.; Guo, S. Wiar: A public dataset for wifi-based activity recognition. IEEE Access 2019, 7, 154935–154945.
  5. Zhang, Y.; Zheng, Y.; Qian, K.; Zhang, G.; Liu, Y.; Wu, C.; Yang, Z. Widar3.0: Zero-effort cross-domain gesture recognition with wi-fi. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 8671–8688.
  6. Moshiri, P.F.; Shahbazian, R.; Nabati, M.; Ghorashi, S.A. A CSI-based human activity recognition using deep learning. Sensors 2021, 21, 7225.
  7. Yousefi, S.; Narui, H.; Dayal, S.; Ermon, S.; Valaee, S. A survey on behavior recognition using WiFi channel state information. IEEE Commun. Mag. 2017, 55, 98–104.
  8. Kong, Y.; Fu, Y. Human action recognition and prediction: A survey. Int. J. Comput. Vis. 2022, 130, 1366–1401.
  9. Pareek, P.; Thakkar, A. A survey on video-based human action recognition: Recent updates, datasets, challenges, and applications. Artif. Intell. Rev. 2021, 54, 2259–2322.
  10. Wu, X.; Chu, Z.; Yang, P.; Xiang, C.; Zheng, X.; Huang, W. TW-See: Human activity recognition through the wall with commodity Wi-Fi devices. IEEE Trans. Veh. Technol. 2018, 68, 306–319.
  11. Sun, Z.; Ke, Q.; Rahmani, H.; Bennamoun, M.; Wang, G.; Liu, J. Human action recognition from various data modalities: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3200–3225.
  12. Abdelnasser, H.; Youssef, M.; Harras, K.A. Wigest: A ubiquitous wifi-based gesture recognition system. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May 2015; pp. 1472–1480.
  13. Abdelnasser, H.; Harras, K.A.; Youssef, M. UbiBreathe: A ubiquitous non-invasive WiFi-based breathing estimator. In Proceedings of the 16th ACM International Symposium on Mobile ad Hoc Networking and Computing, Hangzhou, China, 22–25 June 2015; pp. 277–286.
  14. Ma, Y.; Zhou, G.; Wang, S. WiFi sensing with channel state information: A survey. ACM Comput. Surv. (CSUR) 2019, 52, 1–36.
  15. Pu, Q.; Gupta, S.; Gollakota, S.; Patel, S. Whole-home gesture recognition using wireless signals. In Proceedings of the 19th Annual International Conference on Mobile Computing & Networking, Miami, FL, USA, 30 September–4 October 2013; pp. 27–38.
  16. Al-qaness, M.A. Device-free human micro-activity recognition method using WiFi signals. Geo-Spat. Inf. Sci. 2019, 22, 128–137.
  17. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  18. Wang, Y.; Wu, K.; Ni, L.M. Wifall: Device-free fall detection by wireless networks. IEEE Trans. Mob. Comput. 2016, 16, 581–594.
  19. Wang, Y.; Liu, J.; Chen, Y.; Gruteser, M.; Yang, J.; Liu, H. E-eyes: Device-free location-oriented activity identification using fine-grained wifi signatures. In Proceedings of the 20th Annual International Conference on Mobile Computing and Networking, Maui, HI, USA, 7–11 September 2014; pp. 617–628.
  20. Wang, W.; Liu, A.X.; Shahzad, M.; Ling, K.; Lu, S. Device-free human activity recognition using commercial WiFi devices. IEEE J. Sel. Areas Commun. 2017, 35, 1118–1131.
  21. Yuan, H.; Yang, X.; He, A.; Li, Z.; Zhang, Z.; Tian, Z. Features extraction and analysis for device-free human activity recognition based on channel statement information in b5G wireless communications. EURASIP J. Wirel. Commun. Netw. 2020, 2020, 1–10.
  22. Chen, Z.; Zhang, L.; Jiang, C.; Cao, Z.; Cui, W. WiFi CSI based passive human activity recognition using attention based BLSTM. IEEE Trans. Mob. Comput. 2018, 18, 2714–2724.
  23. Yang, X.; Cao, R.; Zhou, M.; Xie, L. Temporal-frequency attention-based human activity recognition using commercial WiFi devices. IEEE Access 2020, 8, 137758–137769.
  24. Schäfer, J.; Barrsiwal, B.R.; Kokhkharova, M.; Adil, H.; Liebehenschel, J. Human activity recognition using CSI information with nexmon. Appl. Sci. 2021, 11, 8860.
  25. Wang, F.; Gong, W.; Liu, J. On spatial diversity in WiFi-based human activity recognition: A deep learning-based approach. IEEE Internet Things J. 2018, 6, 2035–2047.
  26. Yang, J.; Zou, H.; Zhou, Y.; Xie, L. Learning gestures from WiFi: A siamese recurrent convolutional architecture. IEEE Internet Things J. 2019, 6, 10763–10772.
  27. Lee, H.; Ahn, C.R.; Choi, N. Fine-grained occupant activity monitoring with Wi-Fi channel state information: Practical implementation of multiple receiver settings. Adv. Eng. Inform. 2020, 46, 101147.
  28. Jiao, W.; Zhang, C. An Efficient Human Activity Recognition System Using WiFi Channel State Information. IEEE Syst. J. 2023, 17, 6687–6690.
  29. Wang, Z.; Oates, T. Imaging time-series to improve classification and imputation. arXiv 2015, arXiv:1506.00327.
  30. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  31. Zhang, C.; Jiao, W. Imgfi: A high accuracy and lightweight human activity recognition framework using csi image. IEEE Sens. J. 2023, 23, 21966–21977.
  32. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  33. Luo, F.; Khan, S.; Jiang, B.; Wu, K. Vision Transformers for Human Activity Recognition using WiFi Channel State Information. IEEE Internet Things J. 2024, 11, 28111–28122.
  34. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  35. Brunton, S.L.; Kutz, J.N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control; Cambridge University Press: Cambridge, UK, 2022.
  36. Ahmad, I.; Ullah, A.; Choi, W. WiFi-Based Human Sensing with Deep Learning: Recent Advances, Challenges, and Opportunities. IEEE Open J. Commun. Soc. 2024, 5, 3595–3623.
  37. Kapoor, S.; Narayanan, A. Leakage and the reproducibility crisis in ML-based science. arXiv 2022, arXiv:2207.07048.
  38. Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data (TKDD) 2012, 6, 1–21.
  39. Kapoor, S.; Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 2023, 4, 100804.
  40. Chiavegatto Filho, A.; Batista, A.F.D.M.; Dos Santos, H.G. Data leakage in health outcomes prediction with machine learning. Comment on “prediction of incident hypertension within the next year: Prospective study using statewide electronic health records and machine learning”. J. Med. Internet Res. 2021, 23, e10969.
  41. Ye, C.; Fu, T.; Hao, S.; Zhang, Y.; Wang, O.; Jin, B.; Xia, M.; Liu, M.; Zhou, X.; Wu, Q.; et al. Prediction of incident hypertension within the next year: Prospective study using statewide electronic health records and machine learning. J. Med. Internet Res. 2018, 20, e22.
  42. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, PMLR, Lille, France, 6–11 July 2015; pp. 448–456.
  43. Majumdar, N.; Banerjee, S. MATLAB Graphics and Data Visualization Cookbook; PACKT Publishing: Birmingham, UK, 2012.
  44. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  45. Vaswani, A. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
  46. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
  47. Heydarian, M.; Doyle, T.E.; Samavi, R. MLCM: Multi-label confusion matrix. IEEE Access 2022, 10, 19083–19095.
  48. Saupe, D.; Hahn, F.; Hosu, V.; Zingman, I.; Rana, M.; Li, S. Crowd workers proven useful: A comparative study of subjective video quality assessment. In Proceedings of the QoMEX 2016: 8th International Conference on Quality of Multimedia Experience, Lisbon, Portugal, 6–8 June 2016.
Figure 1. Generalized deep learning framework applied by Moshiri et al. [6] for WiFi CSI-based human action recognition.
Figure 2. Structure of the 2D-CNN for the WiFi CSI-based HAR proposed by Moshiri et al. [6].
Figure 3. Illustration of CSI images of various human actions in the CSI-HAR database [6]. The CSI signals (52 channels in CSI-HAR [6]) were converted to RGB images using MATLAB 2022B’s imagesc function [43]. (a) Walk. (b) Fall. (c) Stand up. (d) Sit down.
Figure 4. Structure of the 1D-CNN for the WiFi CSI-based HAR proposed by Moshiri et al. [6].
Figure 5. Structure of the LSTM for the WiFi CSI-based HAR proposed by Moshiri et al. [6].
Figure 6. Structure of the BLSTM for the WiFi CSI-based HAR proposed by Moshiri et al. [6].
Figure 7. Data split without respect to humans. This figure illustrates the improper data partitioning method applied by Moshiri et al. [6], where data from the same individuals are included in both training and test sets. Specifically, the CSI signals are extracted first and then randomly allocated to a training set (75% of signals) and a test set (25% of signals). As a result, a machine or deep learning model learns individual-specific features rather than general patterns of human actions, leading to data leakage and inflated performance metrics.
Figure 8. Data split with respect to humans. This figure illustrates the correct data partitioning method, where different individuals are assigned entirely to the training and test sets. Next, CSI signal extraction was carried out. This approach ensures that the model is evaluated on unseen individuals, preventing data leakage and promoting generalization of human action recognition models.
Figure 9. Confusion matrices of Moshiri et al.’s [6] methods if the CSI data are split without respect to humans: (a) 2D-CNN; (b) 1D-CNN; (c) LSTM; and (d) BLSTM.
Figure 10. Confusion matrices of Moshiri et al.’s [6] methods if the CSI data are split with respect to humans: (a) 2D-CNN; (b) 1D-CNN; (c) LSTM; and (d) BLSTM.
Figure 11. Retraining of 2D-CNN on the CSI-HAR database [6] without respect to humans. In the upper section of the figure, the training accuracy is represented by a blue line and the test accuracy by a black line. In the lower section of the figure, the red line indicates training loss, while the black line shows the loss on the test set.
Figure 12. Retraining of 2D-CNN on the CSI-HAR database [6] with respect to humans. In the upper section of the figure, the training accuracy is represented by a blue line and the test accuracy by a black line. In the lower section of the figure, the red line indicates training loss, while the black line shows the loss on the test set.
Table 1. Parameter settings.
| Parameter | Value |
| --- | --- |
| Loss function | Cross-entropy |
| Optimizer | Adam [46] ($\beta_1 = 0.9$, $\beta_2 = 0.99$, $\epsilon_2 = 1 \times 10^{-9}$) |
| Learning rate | 0.001 |
| Decay rate | 0.8 |
| Batch size | 64 |
| Epochs | 50 |
Table 2. Computer configurations.
| Component | Specification |
| --- | --- |
| Computer model | STRIX Z270H Gaming |
| Operating system | Windows |
| CPU | Intel(R) Core(TM) i7-7700K CPU 4.20 GHz (8 cores) |
| Memory | 15 GB |
| GPU | NVIDIA GeForce GTX 1080 |
Table 3. Comparison of the results on CSI-HAR [6].
| Architecture | Accuracy reported in [6] | Accuracy retrained without respect to humans | Accuracy retrained with respect to humans |
| --- | --- | --- | --- |
| 2D-CNN | 95.5% | 92.2% | 66.4% |
| 1D-CNN | 87.4% | 86.9% | 55.0% |
| LSTM | 89.2% | 89.3% | 61.8% |
| BLSTM | 94.7% | 93.6% | 62.2% |

Varga, D. Mitigating Data Leakage in a WiFi CSI Benchmark for Human Action Recognition. Sensors 2024, 24, 8201. https://doi.org/10.3390/s24248201