We evaluate the WiPID authentication system through a series of experiments. We first describe the experimental configuration, then assess the overall performance and examine the impact of different methods, systems, and training sample sizes. Finally, we conduct real-time testing experiments.
5.1. Experiment Setup
The WiPID experimental setup follows the configurations of WiPIN [17] and WiDFF [18], and the environment is designed to reflect practical conditions so that the collected data are accurate and scientifically valid. In terms of hardware, a Tenda F3 router was used as the WiFi signal transmitter, while a laptop equipped with an Intel 5300 wireless network card served as the receiver. The receiver was further connected to three external antennas to enhance signal reception, thereby improving the accuracy and stability of CSI data acquisition (as shown in Figure 6).
The experiments were conducted in an indoor office environment with dimensions of approximately 3 m × 2 m. The transmitter and receiver were placed on separate tables at opposite sides of the room, with a separation distance of 2 m and a height of 1.2 m, to emulate typical indoor signal propagation conditions. Participants were instructed to stand along the perpendicular bisector of the line connecting the transmitter and receiver, ensuring consistent influence of the human body on the signal propagation path. In addition, the experimental setup was arranged near a doorway to simulate a practical “walk-through” identity authentication scenario without explicit user interaction. The experimental environment of WiPID is shown in Figure 7.
A total of 50 volunteers were recruited for the experiment, with an equal distribution of male and female participants and an age range of 18 to 30 years, to improve the diversity and representativeness of the dataset. During data collection, all participants were required to remain stationary and refrain from carrying electronic devices that could interfere with wireless signals, thereby minimizing noise in the acquired data.
During data acquisition, the receiver continuously recorded CSI data at a sampling rate of 500 Hz (in practice, higher sampling rates yield more data and potentially higher accuracy) and stored it in .dat format. Each sample contains channel state information across multiple subcarriers, including key features such as amplitude and phase, which capture the impact of the human body on WiFi signal propagation. The duration of each sample was 1–2 s, corresponding to approximately 500–1000 packets, ensuring temporal continuity. To improve data reliability, 100 valid samples were collected from each participant, along with additional redundant samples to compensate for packet loss or anomalies. Each antenna pair captured signals from 30 subcarriers, resulting in a total of 90 independent CSI subcarriers. The raw data were first preprocessed using the 802.11n CSI Tool under a Linux environment, and subsequently processed in MATLAB R2023b to ensure dataset consistency and quality. MATLAB was mainly used for data organization, format conversion, and sample construction. All data were standardized into tensors of size 3 × 30 × 500, where 3 denotes the antennas, 30 the subcarriers, and 500 the temporal length (i.e., number of packets). This standardized representation ensures consistency in model input. Detailed system settings are summarized in Table 2.
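The standardization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB pipeline used in the experiments: the names `amplitude_phase`, `crop_packets`, and `build_amplitude_tensor` are hypothetical, and keeping the first 500 packets of each recording is an assumed rule, since the text only specifies the final 3 × 30 × 500 shape.

```python
import cmath

# Target shape stated in the text: 3 antennas x 30 subcarriers x 500 packets.
N_ANTENNAS, N_SUBCARRIERS, N_PACKETS = 3, 30, 500


def amplitude_phase(h):
    """Return (amplitude, phase in radians) of one complex CSI value."""
    return abs(h), cmath.phase(h)


def crop_packets(series, length=N_PACKETS):
    """Keep the first `length` packets (assumed rule); reject short recordings."""
    if len(series) < length:
        raise ValueError(f"need at least {length} packets, got {len(series)}")
    return series[:length]


def build_amplitude_tensor(csi):
    """Convert csi[antenna][subcarrier] (a list of complex values over time)
    into a nested list of amplitudes with shape 3 x 30 x 500."""
    if len(csi) != N_ANTENNAS or any(len(a) != N_SUBCARRIERS for a in csi):
        raise ValueError("expected 3 antennas with 30 subcarriers each")
    return [[[abs(h) for h in crop_packets(sub)] for sub in ant] for ant in csi]


# Synthetic 1.2 s recording at 500 Hz (600 packets), used only as a demo.
demo = [[[complex(3, 4)] * 600 for _ in range(N_SUBCARRIERS)]
        for _ in range(N_ANTENNAS)]
tensor = build_amplitude_tensor(demo)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
print(shape)
```

Recordings shorter than 500 packets are rejected rather than padded in this sketch, which is one plausible way to handle the packet loss mentioned in the text.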
The complete dataset consists of 5394 samples, including 4249 samples for training and 1145 for testing (as shown in Table 3). During model training, data normalization was applied to improve convergence speed and generalization performance. The model was trained on an NVIDIA GeForce RTX 4060 Laptop GPU using Python 3.10 and PyTorch 2.1.1. Training ran for 80 epochs using the Adam optimizer with an initial learning rate of 0.001 and beta parameters of 0.9 and 0.999, respectively, to support stable convergence and effective optimization.
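For reference, the optimizer settings reported above map onto the standard Adam update as sketched below. This is a plain-Python illustration for a single scalar parameter, not the training code: the function name `adam_step` is hypothetical, and the epsilon value of 1e-8 (the PyTorch default) is an assumption because the text does not state it.

```python
import math

LR, BETA1, BETA2 = 0.001, 0.9, 0.999  # values reported in the text
EPS = 1e-8  # assumed; not stated in the text


def adam_step(param, grad, m, v, t):
    """Apply one Adam update to a scalar parameter.

    m and v are the running first and second moment estimates;
    t is the 1-based step index used for bias correction.
    """
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad * grad
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    param = param - LR * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


# Toy check: minimize f(w) = w^2 (gradient 2w) for three steps.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)
```

On the first step the bias-corrected update has a magnitude close to the learning rate regardless of the gradient scale, which is part of why a single initial learning rate works across layers.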
Additionally, we conducted real-time performance testing of the model. The raw .dat files collected in real time were split into fixed-size segments, transmitted, converted to .mat format, and visualized. These data were then fed into the model in real time for prediction. This pipeline allowed us to test both the response speed and the accuracy of the overall system.
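The fixed-size segmentation in this real-time pipeline can be sketched as a simple buffering generator. This is an illustrative assumption about how the stream might be chunked: the name `fixed_size_windows` is hypothetical, the window of 500 packets is borrowed from the temporal length of the model input, and non-overlapping windows are assumed because the text does not specify an overlap.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def fixed_size_windows(packets: Iterable[T], size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping windows of exactly `size` packets.

    A trailing partial window is dropped, since the model expects a
    fixed temporal length.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    buffer: List[T] = []
    for packet in packets:
        buffer.append(packet)
        if len(buffer) == size:
            yield buffer
            buffer = []


# Demo: 1200 integer stand-ins for packets give two full windows.
windows = list(fixed_size_windows(range(1200), size=500))
print(len(windows))
```

Because the generator consumes any iterable lazily, the same function could wrap a live packet source, yielding a window as soon as enough packets have arrived.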
5.2. Performance Evaluation
Overall Evaluation: To evaluate the overall performance of WiPID, four metrics are adopted: accuracy, precision, recall, and F1-score. All metrics are computed using macro-averaging to provide a balanced assessment of the classifier across different users. Accuracy is the ratio of correctly predicted samples to the total number of predictions and serves as a key indicator of overall model performance. Precision, in contrast, is the proportion of true positive samples among those predicted as positive. Figure 8 illustrates the accuracy and precision results of WiPID. Most users achieve high accuracy and precision values, both close to 1. However, accuracy is lower for some users, which may be due to the similarity of their signal features causing confusion between them. Overall, WiPID shows good classification performance in most cases, but its performance may fluctuate for certain users or environments. To reduce these fluctuations, future work could increase the diversity of the dataset or improve the model.
Recall is the fraction of actual positive samples that are correctly predicted. As shown in Figure 9, most users have a recall close to 1, but some users have a lower recall, indicating that the model has a higher miss rate for these users. For example, the recall of the 47th user is significantly lower than that of other users, possibly because this user’s signal characteristics are unusual or noisy, making it difficult for the model to identify positive samples for this user. Further analysis of the signal characteristics of such users could guide targeted improvements to the model or the data preprocessing and thus enhance the overall performance of the system.
The F1-score, calculated as the harmonic mean of precision and recall, summarizes the overall performance of the model on positive samples. Figure 10 shows the F1-score for each person in our proposed system. The overall F1-score is high, but it is low for some users, possibly because the complexity of their feature data causes the model to perform poorly in both precision and recall. This indicates that the recognition performance of the model decreases for specific users or scenarios and that there is room for improvement in these cases.
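The four metrics discussed above can be computed from predicted and true user labels as in the following sketch. It is a minimal illustration of macro-averaging, not the evaluation code used in the experiments: the function name `macro_metrics` and the toy labels are hypothetical, and treating a per-class score with a zero denominator as 0.0 is an assumed convention.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Per-class scores with a zero denominator are counted as 0.0,
    which is one common convention (assumed here).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with three hypothetical user IDs.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

Macro-averaging gives every user equal weight, so a single poorly recognized user lowers the averaged recall and F1-score even when overall accuracy stays high.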
Comparison with other systems: In this study, we compared WiPID with several mainstream WiFi identity recognition systems, including WiWho [14], FreeSense [15], NeuralWave [16], WiPIN [17], and WiDFF [18]. Table 4 compares the processing methods of these systems. Most WiFi identity recognition systems adopt signal preprocessing, which complicates the processing flow. WiPID does not use conventional data preprocessing. Instead, it replaces this step with a convolutional autoencoder, which achieves excellent results, simplifies the processing flow, and enables end-to-end model training. The feature extraction methods used by different systems also affect the final recognition performance. WiWho and WiPIN utilize statistical features, FreeSense employs the wavelet transform, while WiDFF and WiPID adopt neural-network-based feature extraction. In particular, WiPID integrates multi-scale features, enabling the system to capture more critical features and thus maintain a higher accuracy rate. Overall, WiPID optimizes data preprocessing, feature extraction, and classification, and compares favorably with the other systems in multiple aspects.
Impact of other deep models: We compared the WiFi fingerprint recognition performance of several deep learning models (Transformer [34], ResNet [30], and FasterNet [25]) with WiPID on a dataset of 50 people, and WiPID demonstrated significant advantages in handling large-scale user data. The average recognition performance of these models, including accuracy, precision, recall, and F1-score, is shown in Table 5. WiPID achieved a high accuracy of 0.981, demonstrating its ability to predict accurately in the vast majority of cases. This performance is significantly better than that of Transformer and FasterNet, possibly because the latter two are relatively weak in feature processing and model stability. WiPID also achieved a precision of 0.983, meaning it made almost no false positives when predicting positive cases, which is crucial for applications where the cost of false alarms is high, such as security monitoring and identity verification. Furthermore, WiPID achieved a recall of 0.982, highlighting its ability to cover positive samples and greatly reducing the possibility of missed detections. This is crucial for improving user experience and system reliability, especially in scenarios requiring precise identification of each user. WiPID also achieved an F1-score of 0.982, demonstrating its balance between precision and recall, which allows the model to maintain efficient recognition performance even in complex environments. These results not only demonstrate the advantages of WiPID across metrics but also suggest that the dataset is of high quality, since good performance is obtained even when different models are used.
Comparison of Training Sample Sizes: We evaluated different training sample sizes with multiple deep learning models on the 50-person WiFi fingerprint recognition dataset. Table 6 shows the effect of training sample size and model choice on the results, comparing Transformer, ResNet, FasterNet, and our proposed WiPID, each trained with 20, 40, 60, and all samples. The performance of every model improves as the number of training samples increases. However, as the number of training samples is reduced, WiPID shows the slowest decrease in recognition rate and exhibits better stability. Specifically, when the sample size is 20, WiPID achieves an accuracy of 0.718, significantly higher than the other models, indicating its ability to maintain a high recognition rate with a small number of samples. As the sample size increases to 40 and 60, the accuracy of WiPID further improves to 0.759 and 0.891, respectively, reflecting its robustness with medium-sized data. When all samples are used, WiPID achieves an accuracy of 0.981, outperforming the other models.
Impact of Pooling Strategies in ECA: To validate the effectiveness of the proposed standard deviation pooling-based attention mechanism, ablation experiments are conducted by adopting different channel descriptor strategies within the ECA (Efficient Channel Attention) module. Specifically, under the same network architecture and training settings, global average pooling (GAP), global max pooling (GMP), and the proposed standard deviation pooling (STD) are each employed to generate channel attention weights. The model performance is then compared, and the results are presented in Table 7. The experimental results demonstrate that different pooling strategies have a significant impact on model performance. When GAP is used, the model achieves an accuracy of 0.969, showing relatively stable performance but still leaving room for improvement. In contrast, using GMP results in a slight performance degradation (accuracy of 0.961), indicating that relying solely on the maximum response is insufficient to fully characterize the feature distribution. In comparison, the model with standard deviation pooling (WiPID) achieves the best performance across all metrics, with an accuracy of 0.981 and an F1-score of 0.982, an accuracy improvement of approximately 1.2% over GAP. This result indicates that standard deviation pooling provides stronger discriminative capability in channel attention modeling. This improvement can be attributed to the fact that, unlike traditional mean or max statistics, the standard deviation captures the dispersion and variability of feature distributions. In WiFi-based identity recognition tasks, the influence of different individuals on wireless signals is typically reflected in subtle yet consistent dynamic variations, which are better characterized by feature fluctuations. By modeling the magnitude of such variations, standard deviation pooling enables the attention mechanism to focus more on discriminative dynamic features, thereby enhancing the overall recognition performance of the model.
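The difference between the three channel descriptors can be seen in a small sketch. This is an illustration under stated assumptions rather than the WiPID implementation: the names `channel_descriptors` and `eca_weights` are hypothetical, the population standard deviation is assumed because the text does not say whether the population or sample form is used, and the fixed kernel stands in for the learned 1D convolution weights of an ECA-style gate.

```python
import math
from statistics import fmean, pstdev


def channel_descriptors(feature_map, mode):
    """Reduce each channel (a flat list of activations) to one scalar.

    mode: "gap" (mean), "gmp" (max), or "std" (population standard
    deviation, an assumption; see the lead-in).
    """
    reducers = {"gap": fmean, "gmp": max, "std": pstdev}
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode}")
    return [float(reducers[mode](channel)) for channel in feature_map]


def eca_weights(descriptors, kernel):
    """ECA-style gate: zero-padded 1D convolution across channels, then sigmoid.

    `kernel` stands in for the learned convolution weights (odd length).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(descriptors) + [0.0] * pad
    out = []
    for i in range(len(descriptors)):
        z = sum(k * padded[i + j] for j, k in enumerate(kernel))
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


# Two channels with the same mean but different variability.
fmap = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
gap = channel_descriptors(fmap, "gap")
gmp = channel_descriptors(fmap, "gmp")
std = channel_descriptors(fmap, "std")
weights = eca_weights(std, kernel=[0.0, 1.0, 0.0])  # identity kernel for the demo
print(gap, gmp, std, weights)
```

In this toy input, GAP assigns both channels the same descriptor, while the standard deviation separates the constant channel from the fluctuating one, which is the property the ablation attributes the improvement to.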
Impact of Iteration Number N on Model Performance: To further investigate the impact of the iteration number N in the multi-scale feature extraction module on model performance, ablation experiments are conducted on a dataset consisting of 50 subjects. Under the same experimental settings (including training strategies and hyperparameters), the iteration number is varied over N = 1, 2, 3, 4, 5, and the corresponding recognition performance is compared, as shown in Table 8. The experimental results demonstrate that the iteration number has a significant impact on model performance. When N = 1 and N = 2, the model achieves accuracies of 0.949 and 0.958, respectively, indicating relatively lower performance. This suggests that a small number of iterations is insufficient to fully extract multi-scale features, leading to limited capability in capturing individual differences in WiFi signals. When N = 3, the model achieves the best performance. At this point, multi-scale features are sufficiently fused, allowing the model to strike a good balance between feature representation capability and model complexity, thereby significantly improving recognition performance. However, when the iteration number further increases to N = 4 and N = 5, the model performance drops significantly, with accuracies decreasing to 0.866 and 0.812, respectively. This degradation can be attributed to two factors. On one hand, increasing the number of iterations leads to higher model complexity and computational cost, making training more difficult. On the other hand, excessive iterations may introduce redundant features and even amplify noise, thereby weakening the discriminative ability of the learned representations. Moreover, under the same number of training epochs, the larger model is less likely to converge sufficiently, which further degrades the final performance. In summary, N = 3 effectively balances multi-scale feature extraction capability and model complexity in this study, making it the best choice among the values tested.
5.3. Real-Time Person Identification Experiment
A real-time identity authentication test was carried out to examine the practical applicability of the WiPID system. The experimental configuration and environment were kept the same as those used during data collection. The data received at the receiver were sent in fixed-size segments through the cloud to another laptop, where they were processed in real time by the Intel 5300 CSI Tool [19] and converted into inputs for the trained WiPID model. Recognition was then performed by a Python 3.10 script. During testing, we evaluated the recognition effectiveness and response speed for each user and displayed the test results in a window, as shown in Figure 11. The system predicted the identity for each input sample and displayed the predicted probability and processing time in the window. Because our main focus in this experiment was the recognition effectiveness of the model, we did not undertake complex system design. We conducted real-time tests on three users selected from the dataset, and the results indicated that real-time recognition was essentially achievable.
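The per-sample output described here, a predicted probability together with the processing time, can be sketched as follows. This is a hypothetical illustration, not the script used in the test: `timed_predict` and `stub_model` are invented names, the stub returns fixed scores in place of the trained network, and applying a softmax to the model outputs is an assumption about how the displayed probability is obtained.

```python
import math
import time


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def timed_predict(model, sample):
    """Run `model` on one sample; return (user index, probability, seconds)."""
    start = time.perf_counter()
    probs = softmax(model(sample))
    elapsed = time.perf_counter() - start
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best], elapsed


def stub_model(sample):
    """Placeholder for the trained network: returns fixed scores for 3 users."""
    return [0.2, 2.5, -1.0]


user, prob, seconds = timed_predict(stub_model, sample=None)
print(f"user={user} prob={prob:.3f} time={seconds * 1000:.2f} ms")
```

Timing only the model call, as done here, excludes transmission and format conversion, so an end-to-end latency figure would need the timer to wrap those stages as well.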
This experiment verified the feasibility of real-time identity authentication through WiFi fingerprint extraction.
However, it should be noted that this test was conducted under the same ideal conditions as the dataset collection. More exploration and optimization are needed to extend the applicability of the method. For example, the recognition effectiveness of the system may be affected when the number of recognized users increases, when there are more environmental disturbances, or when the weight or clothing of the users changes. This experiment provides an initial validation of the potential of WiPID for real-time recognition, but further research and improvement are necessary before widespread deployment in practical applications.