Wi-CHAR: A WiFi Sensing Approach with Focus on Both Scenes and Restricted Data

Significant strides have been made in the field of WiFi-based human activity recognition, yet recent wireless sensing methodologies still grapple with the reliance on copious amounts of data. When assessed in unfamiliar domains, the majority of models experience a decline in accuracy. To address this challenge, this study introduces Wi-CHAR, a novel few-shot learning-based cross-domain activity recognition system. Wi-CHAR is meticulously designed to tackle both the intricacies of specific sensing environments and pertinent data-related issues. Initially, Wi-CHAR employs a dynamic selection methodology for sensing devices, tailored to mitigate the diminished sensing capabilities observed in specific regions within a multi-WiFi sensor device ecosystem, thereby augmenting the fidelity of sensing data. Subsequent refinement involves the utilization of the MF-DBSCAN clustering algorithm iteratively, enabling the rectification of anomalies and enhancing the quality of subsequent behavior recognition processes. Furthermore, the Re-PN module is consistently engaged, dynamically adjusting feature prototype weights to facilitate cross-domain activity sensing in scenarios with limited sample data, effectively distinguishing between accurate and noisy data samples, thus streamlining the identification of new users and environments. The experimental results show that the average accuracy is more than 93% (five-shot) in various scenarios. Even in cases where the target domain has fewer data samples, better cross-domain results can be achieved. Notably, evaluation on publicly available datasets, WiAR and Widar 3.0, corroborates Wi-CHAR’s robust performance, boasting accuracy rates of 89.7% and 92.5%, respectively. In summary, Wi-CHAR delivers recognition outcomes on par with state-of-the-art methodologies, meticulously tailored to accommodate specific sensing environments and data constraints.


Introduction
Human activity recognition (HAR) plays a pivotal role in emerging Internet of Things (IoT) technologies, encompassing domains such as smart healthcare, smart homes, and user identification [1,2]. Numerous HAR systems exist, including camera-based approaches [3], wearable sensor-based methods [4], radio frequency-based techniques [5,6], ultrasonic-based solutions [7], and FMCW-based methodologies [8,9]. Despite the commendable recognition performance demonstrated by these HAR systems, practical deployment poses several challenges, such as privacy and security concerns, high equipment costs, limited sensing distances, and installation or wearing requirements.
HAR based on WiFi Channel State Information (CSI) has emerged as a focal point in intelligent sensing research. In comparison to other sensing technologies, WiFi sensor devices offer advantages in terms of cost-effectiveness, ubiquity, security, and ease of deployment. CSI, being highly responsive to human motion, provides detailed amplitude and phase information across subcarriers in the frequency domain. Leveraging these technical merits of WiFi, researchers have proposed device-free human sensing applications utilizing WiFi CSI, including indoor localization [10], intrusion detection [11], vital sign monitoring [12], and gesture recognition [13].
Sensors 2024, 24, 2364. https://doi.org/10.3390/s24072364 https://www.mdpi.com/journal/sensors
Meanwhile, numerous deep learning-based studies [14] have made significant strides in this domain, particularly in learning the relationships between CSI patterns and activity types. However, WiFi signals are subject to absorption, diffraction, reflection, and scattering during propagation, so CSI is strongly coupled to environmental factors beyond human actions. The CSI patterns elicited by the same action in different environments or under different conditions may therefore differ. While high accuracy can be achieved if HAR models are trained and tested in identical locations, their performance declines drastically when they are confronted with new activity classes, users, or scenarios, which constitutes the cross-domain challenge [15]. To tackle this issue, numerous studies have proposed WiFi-based cross-domain HAR approaches. However, some methodologies exhibit inherent limitations. Moreover, in larger environments featuring multiple WiFi devices, dynamic sensing device selection could significantly enhance sensing accuracy and efficacy, further bolstering HAR applications.
To address the aforementioned issues, we have designed Wi-CHAR, a WiFi-based Cross-domain Human Activity Recognition system utilizing few-shot learning. Wi-CHAR comprises two key modules. Firstly, focusing on scene analysis, it utilizes access points (APs) and WiFi-enabled sensors to establish transmission pairs. We propose a method for dynamically selecting the optimal sensing receiver device based on the individual's location. The fundamental idea is to use multiple WiFi transceiver pairs so that sensing devices can be selected to suit the specific environmental layout. Secondly, Wi-CHAR focuses on constrained data: it employs similarity metrics instead of raw CSI patterns, which avoids cross-domain pattern changes and resolves the cross-domain challenge to some extent. By capturing a small volume of action data for few-shot learning, Wi-CHAR can detect human activities across multiple environments without retraining in a new domain, which reduces the data labeling and training burden and achieves generalization. A few-shot learning algorithm can derive a recognition model from a limited number of training samples; combining diverse samples enhances system robustness and helps the recognition model delineate clearer class boundaries. Additionally, this paper proposes a method to improve the structure and strengthen the noise immunity of the prototype network. Conventional prototype networks often exhibit poor noise immunity, leading to decreased model accuracy in the presence of noise interference. By reassigning feature embeddings to mitigate the impact of noise, Wi-CHAR improves noise immunity and thereby enhances overall model performance.
The system's performance is evaluated through a series of experiments on our own human activity datasets under different conditions. On this basis, we analyze how well the system recognizes new users and new scenarios when few samples are available. Comparative experiments are also conducted to verify the reliability and robustness of the system for activity recognition with limited training samples. We further evaluate performance on the public datasets WiAR [16] and Widar 3.0 [15]. The experimental results show that the system can recognize common human activities with high accuracy based on the available support sets. In summary, this paper makes the following contributions:
(1) We consider the problem of sensing with restricted data. Wi-CHAR can be used in new domains using only a small number of labelled samples, eliminating the need to retrain new models.
(2) We take the sensing scene into account and design an adaptation model for the decrease in sensing capability in parts of the area. This yields higher-quality data over a larger sensing region and improves human activity recognition.
(3) We propose a prototype network structure, Re-PN, to improve the noise immunity of the system. Compared with the basic prototype network, the average performance of the proposed method is improved by 12%.
The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 details the implementation of the system. Section 4 provides the analysis and evaluation of the experimental results, and Section 5 concludes the paper.

Related Work
Non-Few-Shot Learning with WiFi HAR
Fine-grained CSI has been widely used for human motion detection in the past few years. CrossSense [17] utilizes simulated CSI samples from the target environment to retrain the recognition model, thereby enhancing performance in new environments. Widar 3.0 [15] introduces a generalized deep learning model for cross-domain gesture recognition, requiring only one-time training and adaptable to diverse data domains. Wang et al. [13] introduced SS-GAN and ST-GAN, which augment the training sample set by generating virtual samples to address gesture recognition challenges in novel scenarios. WiDIGR [18] uses a two-dimensional Fresnel zone to eliminate the effect of walking direction on the signal spectrogram. CeHAR [19] was proposed as a parameter-free dual-feature fusion method with compact fusion of CSI amplitude and phase features. Sheng et al. [20] used a trained source domain model as a pre-trained model in a new scene. Zhang et al. [21] proposed a Dense-LSTM that expanded the training datasets with eight CSI transform methods and achieved about 90% accuracy when adapting to recognize the activities of new individuals. WiLCA [22] implemented a cross-domain authentication system using a small amount of data. Sun et al. [23] studied WiFi-based human motion detection through walls, using an iterative adaptive approach to improve Doppler resolution and further extend the potential of WiFi for through-wall sensing applications. Zhou et al. [11] combined the Back Propagation Neural Network (BPNN), the Adaptive Genetic Algorithm (AGA), and CSI tensor decomposition to improve data processing while obtaining high indoor positioning accuracy.
All these approaches aim at detecting human motion within the sensing range of WiFi devices. WiFi-based sensing systems have very large sensing ranges and fuzzy sensing boundaries, and these methods generally need additional training for each new domain. The Wi-CHAR platform in this paper is based on an accurate sensing boundary model for device selection. It achieves higher accuracy in cross-domain sensing and is robust to different environments.

Few-Shot Learning with WiFi HAR
Many recent works use few-shot learning, such as WiLISensing [24], a location-independent, limited-data human activity recognition system. Inspired by relational networks, ML-DFGR [25] proposed a WiFi gesture recognition system that is robust to new users and environments due to its transferable similarity evaluation capability. AFSL-HAR [26] achieved significant performance in identifying new categories by fine-tuning the model parameters with a small number of samples. AirFi [27] shows that the domain generalization of sensing can be further improved by few-shot learning. MatNet-eCSI [28] proposes a neural network with enhanced external memory to improve environmental robustness through one-shot learning. MetaSense [29] adopts a few-shot learning framework, enabling deep mobile sensing methods to rapidly adapt to new users and new devices. RF-Net [6] employs a metric-based meta-learning framework to achieve cross-environment HAR using two pairs of WiFi devices; however, RF-Net's cross-domain performance is limited. OneFi [30] adopts a one-shot learning framework to recognize unseen gestures, yet this requires four receivers to convert existing gestures into virtual gestures, a process that demands intricate knowledge.
Collecting a large amount of data can be very expensive and, in some cases, even impossible. Therefore, this paper draws on few-shot learning to build models from fewer samples, reducing the cost of model building and improving scalability in new environments. Wi-CHAR can be used in new scenarios using only a small number of labeled samples without the need to train a new model.

System Design
In this section, we present the system design. First, we describe the overall architecture of the framework. We then give an overview of the dynamic selection method for receiving devices across multiple links, the data processing and feature extraction steps, and the enhanced prototype classification network. Finally, we briefly outline how the activity recognition model is trained.


Dynamic Selection of Rx in n-Links
The prevalence of WiFi sensor devices in indoor environments makes it possible to choose the best sensing device. Not all Tx-Rx pairs sense equally well: the position and orientation of the target relative to the Tx-Rx pair affect the sensing accuracy, so recognition over a single transmit-receive link is position-dependent. The Sensing-Signal-to-Noise-Ratio (SSNR) [31] quantifies the sensing capability. Assuming that the settings of the WiFi transceiver pair are known, the SSNR is expressed in terms of three distances (Algorithm 1 uses the ratio $r_D^2/(r_T r_R)^2$ as the preliminary SSNR), where $r_D$ is the distance between the transmitter and receiver, i.e., the path length of the Line of Sight (LoS), and $r_T$ and $r_R$ are the distances from the target to the transmitter and receiver, respectively.
In a real indoor environment, many other objects reflect the signal. To extend the sensing coverage model to a multipath-rich environment, Equation (2) represents the power variation due to multipath, where $\gamma$ is the slope of the linear curve, $b$ is a constant, $\gamma$, $K$, and $b$ have fixed values for each transceiver pair, and $P_{LoS}$ is the static path signal power. The SSNR is thus related to the distances from the target to the transceiver devices and to the distance between the transceivers. The dynamic receiver selection steps are given in Algorithm 1: receivers with poor sensing capability are removed according to the above SSNR, and the procedure iterates to obtain the optimal receiver.
To verify the device selection model, the dynamic selection of sensing devices is performed after determining the area. The best sensing receiver within a certain area is obtained, as shown in Figure 2. The data collected in this way pave the way for the subsequent activity recognition.
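As a rough illustration of the selection step, the sketch below ranks candidate receivers by the preliminary SSNR ratio $r_D^2/(r_T r_R)^2$ that appears in Algorithm 1. It is only a sketch under stated assumptions: positions are 2-D coordinates in metres, the function names `preliminary_ssnr` and `select_receiver` and the threshold `ssnr_min` are hypothetical, and neither the angle-based exclusion nor the multipath correction of Equation (2) is modelled.

```python
import math

def preliminary_ssnr(tx, rx, target):
    """Ratio r_D^2 / (r_T * r_R)^2, the preliminary SSNR used in Algorithm 1."""
    r_d = math.dist(tx, rx)       # LoS path length between Tx and Rx
    r_t = math.dist(target, tx)   # distance from target to transmitter
    r_r = math.dist(target, rx)   # distance from target to receiver
    # Assumption: the target never coincides with Tx or Rx (r_t, r_r > 0).
    return r_d ** 2 / (r_t * r_r) ** 2

def select_receiver(tx, receivers, target, ssnr_min=0.0):
    """Return (index, score) of the receiver with the largest ratio.

    Receivers scoring below ssnr_min (a hypothetical threshold) are dropped;
    None is returned when no receiver remains.
    """
    best = None
    for i, rx in enumerate(receivers):
        score = preliminary_ssnr(tx, rx, target)
        if score < ssnr_min:
            continue
        if best is None or score > best[1]:
            best = (i, score)
    return best

# Hypothetical layout: one Tx, three candidate Rx, one target position.
tx = (0.0, 0.0)
receivers = [(4.0, 0.0), (0.0, 6.0), (8.0, 8.0)]
target = (2.0, 1.0)
best = select_receiver(tx, receivers, target)
print(best)
```

In this layout the first receiver is selected, because the target lies close to its LoS path; re-running the selection as the target moves gives the dynamic behaviour described in the text.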

Data Processing and Feature Extraction
CSI has finer subcarrier-level granularity than RSSI [32] and is easily accessible through commercial WiFi devices. Because of multipath propagation, WiFi CSI can be represented as a linear superposition of all paths, including static paths ($H_s(f,t)$), dynamic paths ($H_d(f,t)$), and noise ($H_n(f,t)$):

$$H(f,t) = H_s(f,t) + H_d(f,t) + H_n(f,t).$$

Here $\theta_s$ and $\theta_n$ denote the amplitudes of the static path signal and the noise, respectively. The Doppler frequency shift (DFS) can be obtained after a short-time Fourier transform (STFT) of the channel frequency response of the CSI signal as follows:

$$f_D(t) = -\frac{1}{\lambda}\frac{\mathrm{d}}{\mathrm{d}t}\,d(t),$$

where $\lambda$ is the wavelength and $d(t)$ is the length of the reflection path. The CSI after time-frequency analysis can be expressed as the Doppler shift $D(f,t)$:

Algorithm 1 Dynamic Device (Rx) Selection Algorithm
Input: Tx and Rx positions P_Tx, P_Rx{1, ..., N}; Rx number N; parameters r_T, r_R, r_D; position of the target x at time t: P^t; static path signal power of Rx at time t: P^t_LoS.
Output: Res (selection result) of the Rxs selected at time t.
//First exclude Rx outside the induction zone.
1: Angle A^t of the target at position P^t and r_T with Tx;
2: for i in {1, ..., n} do
3:   Angle A^t_i of the target at position P^t and r_R with Rx;
4:   r_D^2 / (r_T r_R)^2 → SSNR; //Preliminary SSNR.
5: end for
6: for j in {1, ..., Res_Rx} do
7:   Get position relationship → SSNR{j}; //Candidates.
8:   Computation (r_T r_R) b and (γ(P^t_LoS + ∆P) + b) SSNR_min;
9:   An equivalent Rx ← Res(P^t);
10: end for
11: Select an optimal Rxs with direction: Res.
where $B(f_{D_k}(t))$ is the window function for cutting out the signal segment of interest.
The raw CSI data often contain noise, and hardware devices may introduce offsets that adversely affect experimental results when the data are used directly. In this paper, after acquiring the raw CSI data, we first denoise the CSI signal with a high-pass filter and then apply PCA to extract the principal component features. Active samples are then extracted using a threshold-based segmentation method. Finally, a short-time Fourier transform (STFT) is performed to extract the Doppler frequency shift (DFS) spectrum of the action signal.
This paper uses the MF-DBSCAN clustering algorithm to cluster the obtained Doppler spectrograms and to correct or remove anomalies in two passes. Compared with the K-means algorithm, the DBSCAN algorithm does not need the number of clusters to be specified in advance. It can be applied to a wider range of data with arbitrary shapes and can also find outliers. In our experiments, we achieved improved results with reduced computation for specific sensing data. The MF-DBSCAN algorithm is detailed in Algorithm 2, and the clustering results are illustrated in Figure 3. As CSI samples for different actions may vary in length, it is crucial to normalize the sample lengths to a fixed duration.

Algorithm 2 MF-DBSCAN (excerpt)
…: Get the globally optimal MinPts, Eps;
9: else mark as noise;
10: until no tagged objects.
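The outlier-handling idea behind the clustering step can be illustrated with plain DBSCAN. The sketch below is not the MF-DBSCAN variant of Algorithm 2 (its parameter search for MinPts and Eps and its two correction passes are not reproduced); it is a minimal standard-library DBSCAN, and the 2-D points are hypothetical stand-ins for features derived from Doppler spectrograms.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; NOISE (-1) marks outliers.

    A point is a core point when at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already visited
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE             # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point reached from a core point
            if labels[j] is not None:
                continue                  # already assigned to a cluster
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)        # j is a core point: keep expanding
    return labels

# Two dense groups and one isolated point (hypothetical feature vectors).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The number of clusters is never passed in, and the isolated point is labelled as noise, which are the two properties the text contrasts with K-means.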
In existing few-shot learning studies, two types of feature embedding models are commonly used: the four-layer convolutional network (Conv4) and ResNet18 [33]. ResNet18 is deeper than Conv4 and generalizes significantly better, so the ResNet18 convolutional architecture is used as the backbone of the feature extractor after action segmentation, extracting features from the segmented DFS sequences. Let $f_\theta(\cdot)$ be the feature extraction network, where $\theta$ denotes the learnable parameters. Given the input data $x$, the feature representation is $z = f_\theta(x)$.

Re-PN Module
This paper aims to improve the generalization of the classifier obtained by training with a small amount of data. The prototype network (PN) is a simple and effective metric learning network that avoids the complexity of recursive networks and reduces memory requirements. All data samples in the training and test sets are divided into support and query sets. Suppose there is a support set of $N$ labeled samples $S = \{(x_1, y_1), \dots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^D$ is the $D$-dimensional feature vector of a sample and $y_i \in \{1, \dots, K\}$ is the corresponding label. $S_k \subseteq S$ denotes the set of samples labeled as class $k$. The $D$-dimensional original data are first mapped to the $M$-dimensional embedding space by $f_\theta$. For the support set, the features of all $|S_k|$ samples of the same class are extracted by the feature mapping function $f_\theta$, from which the class prototype $\mu_k$ is formed. A query sample $x$ is projected into the same embedding space as the support samples, $f_\theta(x)$; its distance to the prototype $\mu_k$ of each class is measured, and the class label $y$ is predicted from these distances, where $\mu_k$ denotes the prototype of action type $k$. The prototype network is optimized by gradient descent, minimizing the negative log-probability of the correct label, where $n$ is the true label of the training sample.
CSI data obtained in real-world scenarios often contain significant noise and interference, which notably degrades the accuracy of traditional PN models. Wi-CHAR therefore enhances the PN structure with a reassignment approach, termed Re-PN, to strengthen its noise immunity. Algorithm 3 outlines the Re-PN method, in which the feature prototype weights are adjusted adaptively. This adaptive adjustment enables Re-PN to differentiate between correct and noisy data samples: it emphasizes the importance weight of correct samples while mitigating the interference of potentially noisy samples on the feature prototype representations. The schematic diagram of Re-PN is depicted in Figure 4, given a test set $T$ of samples $x^T_j$, a support set $S = \{(x^S_i, y^S_i)\}_{i=1}^{M}$, and a query set $Q$. For the support set feature embedding $f_\theta(x_i)$, the improved design introduces a weight parameter $\alpha_i$ that measures how strongly the feature embedding of support sample $x_i$ influences the feature prototype computation. The feature prototype under the reassignment method is computed from the $\alpha_i$-weighted embeddings of the samples in $S_k$, where $S_k$ denotes all samples belonging to category $k$ in the support set.
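The effect of per-sample weights on a prototype can be sketched as follows. This is an illustration under assumptions, not the paper's Re-PN: the prototype is taken to be the normalized weighted mean $\sum_i \alpha_i f_\theta(x_i) / \sum_i \alpha_i$, the weights are supplied by hand rather than learned, class probabilities use the usual prototypical-network softmax over negative squared Euclidean distances, and the names `weighted_prototype` and `class_probabilities` are hypothetical.

```python
import math

def weighted_prototype(embeddings, weights):
    """Normalized weighted mean of support embeddings (assumed form)."""
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(w * z[d] for w, z in zip(weights, embeddings)) / total
            for d in range(dim)]

def class_probabilities(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    scores = {k: -sum((q - p) ** 2 for q, p in zip(query, mu))
              for k, mu in prototypes.items()}
    top = max(scores.values())            # subtract the max for numerical stability
    exps = {k: math.exp(s - top) for k, s in scores.items()}
    norm = sum(exps.values())
    return {k: e / norm for k, e in exps.items()}

# Hypothetical 2-D embeddings; the third "walk" sample is a noisy outlier
# and therefore receives a small weight.
walk = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0)]
sit = [(4.0, 0.0), (4.2, 0.2)]
proto_walk = weighted_prototype(walk, [1.0, 1.0, 0.1])
proto_plain = weighted_prototype(walk, [1.0, 1.0, 1.0])   # ordinary PN mean
proto_sit = weighted_prototype(sit, [1.0, 1.0])

query = (1.1, 0.9)
probs = class_probabilities(query, {"walk": proto_walk, "sit": proto_sit})
print(proto_walk, proto_plain, probs)
```

Down-weighting the outlier keeps the "walk" prototype near the two clean samples, whereas the unweighted mean is pulled toward the noisy one; this is the behaviour the reassignment idea is meant to achieve.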
The predicted probability distribution of the test sample $x^T_j$ over each class is calculated by Equation (7), where $d(\cdot)$ is the distance metric function. Replacing the test set $T$ with the query set $Q$ in the training phase, the loss is obtained with the center loss function, where $c_{y_i}$ denotes the feature embedding center of the samples of category $y_i$ and $x_i$ denotes the feature before the fully connected layer. The final loss function of the model includes a hyperparameter $\eta$, which is taken as $\eta = 1$ in the experiment. We use an episode-based strategy to train the Re-PN model. Finally, the loss function of Equation (12) is calculated. The model parameters are updated with the Adam optimization algorithm, and the learning rate $L_r$ is updated using the cosine annealing strategy, where epoch is the current training epoch and max_epoch is the total number of training epochs. The above process is repeated until the parameters of the network model no longer change appreciably.

Algorithm 3 Re-PN (excerpt)
Output: Re-PN loss J of the classifier model.
1: … //Few-shot task set.
2: for k in … do
…: Calculate the α_i-weighted feature prototype;
8: end for
9: Loss J ← 0;
10: for c in {1, …, N_C} do
…
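The cosine annealing update mentioned above can be sketched in its standard form, $L_r = L_{min} + \tfrac{1}{2}(L_{max} - L_{min})\,(1 + \cos(\pi \cdot \text{epoch} / \text{max\_epoch}))$. This form is assumed here rather than taken from the paper, and the initial rate `1e-3`, the floor `lr_min`, and the 100-epoch horizon are hypothetical values chosen only for illustration.

```python
import math

def cosine_annealing_lr(epoch, max_epoch, lr_max, lr_min=0.0):
    """Standard cosine annealing without restarts.

    epoch     : current epoch, 0 <= epoch <= max_epoch
    max_epoch : total number of training epochs
    lr_max    : learning rate at epoch 0 (hypothetical value in the example)
    lr_min    : learning rate reached at max_epoch
    """
    cos_term = 1.0 + math.cos(math.pi * epoch / max_epoch)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

max_epoch = 100
schedule = [cosine_annealing_lr(e, max_epoch, lr_max=1e-3)
            for e in range(max_epoch + 1)]
print(schedule[0], schedule[50], schedule[100])
```

The rate starts at `lr_max`, passes through half of it at the midpoint, and decays smoothly to `lr_min`, so late epochs make small updates, which fits the stopping condition of parameters that no longer change much.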

Experiments and Performance Analysis
In this section, we first present the experimental setup. Then, the effectiveness of Wi-CHAR on our own and public datasets is evaluated in intra-domain and cross-domain scenarios. The performance under different hyperparameter settings is also examined, and Wi-CHAR is compared with state-of-the-art HAR systems to validate system performance.

Experimental Setup
A TP-LINK AX3000 router was used as a transmitter (Tx), and multiple Google Nexus 6P smartphones with Nexmon [34] framework and Thinkpad X201i devices with Intel 5300 Tools [35] were used as receivers (Rx) to collect CSI samples of human activity during the experiments.
In order to systematically evaluate the performance of Wi-CHAR, this study was conducted with several subjects. In the movement monitoring phase, a variety of common postures were evaluated, i.e., sitting still, walking and standing up, and sitting down. Sudden states such as falls were also measured. Data were available for six categories of human activities, as shown in Table 1. The samples were collected in three scenarios: an empty room, a conference room, and a large classroom, as depicted in Figure 5. A total of six subjects (three male and three female) participated in the experiment, and we also examined the impact of their physical parameters (e.g., height, weight, age) on the results. Wi-CHAR requires at least two receivers in each region to capture the complex changes in path velocity induced by the target's motion. Initially, three thousand movement data points were generated to form the sample set. Subsequently, only a small number of data samples were collected within the experimental scenario to facilitate motion sensing. Furthermore, the performance of the Re-PN model was validated on the public datasets Widar 3.0 [15] and WiAR [16]. No additional restrictions were imposed on the participants during the experiments. Each environment was equipped with a camera to record all target activities as a reference for the experiment. The training and testing phases were conducted on a Windows desktop featuring an Intel Core i9-10700kF CPU, 24 GB RAM, an NVIDIA GeForce RTX 3080 Ti GPU, and the PyTorch 1.8.0 framework.

Performance Overview
To accurately and comprehensively evaluate the performance of Wi-CHAR, numerous experiments were conducted under various conditions. Initially, the effectiveness of the Wi-CHAR system within the same domain was tested. Subsequently, the system's performance with new users, new scenarios, and different datasets was assessed. In each cross-domain experiment, only one domain factor was altered.
This study primarily relies on recognition accuracy as an evaluation metric. It signifies the probability of correctly recognizing an action sample and is calculated using the equation:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where $TP$ and $FP$ represent true positives and false positives, respectively, and $TN$ and $FN$ represent true negatives and false negatives, respectively. $TP + TN$ is the number of correctly identified signal samples, and the denominator is the number of all samples tested. The higher the accuracy, the better the performance of the system.
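The metric can be computed directly from the four counts, or, for a multi-class task such as the six activity categories, from a confusion matrix as the share of samples on the diagonal. The sketch below shows both; the counts and the 3x3 matrix are made-up numbers for illustration, and the function names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total

def accuracy_from_confusion(matrix):
    """Multi-class accuracy: correctly classified samples (diagonal) over all samples.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("empty confusion matrix")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Made-up counts, for illustration only.
binary_acc = accuracy(tp=45, tn=48, fp=2, fn=5)
confusion = [[9, 1, 0],
             [0, 10, 0],
             [1, 0, 9]]
multi_acc = accuracy_from_confusion(confusion)
print(binary_acc, multi_acc)
```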

Evaluation within the Intra-Domain
We first evaluate the proposed method in the conventional setting, i.e., all CSI samples come from activities performed by a single user in the same scenario. Figure 6 shows the confusion matrices evaluated in the same domain on Widar 3.0, WiAR, and our own datasets. The proposed system, Wi-CHAR, achieves 93.9%, 92.5%, and 89.7% accuracy on our own datasets, Widar 3.0, and WiAR, respectively. The Euclidean distance metric is used in the experiments, and each action category in the support set contains only five samples. Of the remaining data, 80% is used as training data and 20% as the test set.
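One way to read the data split described above is sketched below: five samples per class are set aside as the support set, and the rest is shuffled and divided 80/20 into training and test data. This reading, the function name `few_shot_split`, the fixed seed, and the synthetic samples are all assumptions made for illustration; the paper does not spell out the exact procedure.

```python
import random
from collections import Counter, defaultdict

def few_shot_split(samples, k_shot=5, train_ratio=0.8, seed=0):
    """Split (features, label) pairs into support, training, and test lists.

    k_shot samples per label go to the support set; the remainder is
    shuffled and split by train_ratio.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)

    support, remaining = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        support.extend(items[:k_shot])
        remaining.extend(items[k_shot:])

    rng.shuffle(remaining)
    cut = int(len(remaining) * train_ratio)
    return support, remaining[:cut], remaining[cut:]

# Synthetic data: 6 activity classes with 25 placeholder samples each.
samples = [((c, i), c) for c in range(6) for i in range(25)]
support, train, test = few_shot_split(samples)
print(len(support), len(train), len(test))
```

With 150 samples this yields 30 support, 96 training, and 24 test samples, and the three sets do not overlap.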

Cross-Scene Recognition Effect
Empty rooms were chosen as the source domain, while conference rooms and large classrooms were designated as the target domains. Each experiment was repeated 10 times, and the objective evaluation results are depicted in Figure 7. The average accuracy of practical actions on our own data surpasses 93%, with the highest accuracy exceeding 96% (five-shot).
In the Widar 3.0 datasets, M1, M2, and M3 represent the lounge, conference room, and laboratory, respectively, while W1, W2, and W3 denote the classroom, office, and hall, respectively. The experimental results obtained are presented in Tables 2 and 3, respectively. As observed in Table 2, the additional scene data collected also exhibits superior recognition rates with Wi-CHAR, further highlighting the system's cross-scene capability.

Cross-User Recognition Effect
To evaluate the cross-user performance of Wi-CHAR, this study trained the model using CSI samples collected from one user and tested the system's performance using CSI activity samples from other users (u1, u2, u3, u4, u5). For Widar 3.0, one of the sixteen experimenters (p0) was randomly selected to provide the training set, and the activities of five participants (p1, p2, p3, p4, p5) were tested in the classroom and hall environments.
Wi-CHAR achieved the highest accuracy of 93% in the five-shot condition. The average accuracy in the "one sample per category" condition was approximately 55%. The performance difference between testing on our data and Widar 3.0 can be attributed to the number of users and types of actions. Widar 3.0 had sixteen users for testing, whereas this experiment only included six users, and there were differences in the types of actions included in the two datasets. The experimental results are depicted in Figure 8.

Cross-User and Cross-Scene Recognition Effect
In this set of experiments, the training and testing categories remain consistent, but both users and scenarios are altered. These experiments aim to identify the activity of a new user in a new scenario. The results of these experiments are illustrated in Figure 9. "Classroom-Conference" denotes the use of activity samples collected in the classroom scenario to train the Wi-CHAR system, while samples obtained from the conference room scenario are used to assess the system's performance. For instance, "u2" represents the second user.

Discussion and Analysis
As observed in Section 4.2 above, the system implemented in this study demonstrates satisfactory performance under varied conditions. The recognition accuracy on our datasets is marginally higher than that on the Widar 3.0 and WiAR datasets. This discrepancy may stem from the fact that the samples in this paper are taken after multi-WiFi device selection, resulting in improved data quality compared to the public datasets. Additionally, the action types examined in this paper primarily comprise common daily activities, which are coarse-grained and relatively less susceptible to environmental influence.

Effect of the Number of Rx and Dynamic Selection
To elucidate the impact of the number of WiFi devices, the experiments in this section vary the number of Rx from two to seven (five-shot) in both the conference room and the classroom environments. Increasing the number of Rx devices leads to higher accuracy and less variation, as dynamic device selection mitigates the performance degradation caused by improper device placement. The improvement diminishes when the number of receiving devices exceeds five. It can therefore be inferred that adding WiFi devices in a typical home environment is beneficial as long as there is sufficient space, although beyond five devices the gain in sensing accuracy is less pronounced. Each group of experiments comprises three cases: dynamic device selection (Dynamic Selection), selection by distance (Distance Selection), and no selection (No Selection), as depicted in Figure 10. Even in cross-domain scenarios, the recognition error rate of dynamic device selection remains predominantly below 0.1, which is significantly better than that obtained without dynamic selection.

Effect of Different Sample Sizes
The experiments in this section examined the impact of the number of samples on the accuracy of the Wi-CHAR platform by varying K (the number of samples in each category) when training the prototype network, as shown in Figure 11a. Additionally, the effect of different subjects across the various sample sizes was verified, as depicted in Figure 11b.

From the aforementioned experimental results, it can be deduced that our network is only minimally affected by differences in environment and subject. The average accuracy exceeds 93% in the five-shot condition, while in the ten-shot condition, the average recognition rate surpasses 97%. In other words, recognition accuracy increases gradually as the number of samples increases.

Effect of MF-DBSCAN Algorithm
To validate the impact of the improved clustering-based data processing algorithm on the system, this section compares density-based clustering (DBSCAN), improved density-based clustering (MF-DBSCAN), Gaussian mixture models (GMMs-EM), K-means clustering (K-means), learning vector quantization (LVQ), and hierarchical clustering (AGNES). The comparison results are shown in Figure 12a. The accuracy of the traditional DBSCAN algorithm is above 85%, and the improved MF-DBSCAN algorithm reaches more than 92%, which is higher than the other clustering algorithms. Therefore, the improved DBSCAN algorithm is selected for the data clustering process in this paper.
Next, we analyzed the effect of the MF-DBSCAN algorithm on the classification network used in this paper. The experimental results in Figure 12b show that the classification model with only traditional DBSCAN and the traditional prototype network (DBSCAN + PN) performs relatively poorly, with an AUC of only 0.667. The AUC of the classification model using the improved DBSCAN method with the traditional prototype network is 0.739, and the AUC under the DBSCAN + Re-PN condition is 0.802. It can be concluded that the improved prototype network clearly benefits classification in this paper. Furthermore, the AUC under the MF-DBSCAN + Re-PN condition reaches 0.926, showing that MF-DBSCAN also has a substantial impact on the classification model. Our improvements to the two traditional methods therefore each yield a significant performance gain.
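The AUC values above summarize how well a model ranks positive samples ahead of negative ones. As a hedged illustration, the sketch below computes a binary AUC from its rank-based definition (the fraction of positive/negative pairs that are ordered correctly, with ties counted as one half). How the AUC was aggregated over multiple activity classes is not stated in the source, so this binary form, the function name, and the sample values are assumptions used only for illustration.

```python
def auc_score(labels, scores):
    """Binary AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and classifier scores, for illustration only.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported values between 0.667 and 0.926.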

Comparison of Different Metrics Models
The cornerstone of the HAR system Wi-CHAR proposed in this paper is the re-weighting prototypical network (Re-PN), an improvement upon the original prototypical network (PN). To assess the effectiveness of this enhancement, the experiments in this section compare the performance of the conventional PN, the Re-PN within the current system configuration, and other similar metric-based network structures (Siamese Network (SN), Matching Network (MN), and Relation Network (RN)). As depicted in Figure 13a, the average accuracy of Re-PN is 12% higher than that of the traditional PN.
The choice of similarity metric is another crucial factor. This experiment compares the effects of two metrics, namely Euclidean distance and cosine similarity. The experiments in this section were conducted multiple times within the domain for three datasets, with DFS as the input data type. As illustrated in Figure 13b, the average accuracy of Wi-CHAR based on cosine similarity is lower than that of Wi-CHAR based on Euclidean distance. Therefore, it is more appropriate to employ Euclidean distance rather than cosine similarity in the Re-PN model.
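To make the difference between the two similarity measures concrete, the following sketch assigns a query embedding to the nearest class prototype under each metric. It is an illustration under stated assumptions, not the Re-PN implementation: the class names, prototype vectors, and helper functions are hypothetical. The toy values are chosen so that the two metrics disagree, since cosine similarity ignores vector magnitude while Euclidean distance does not.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def nearest_by_euclidean(query, prototypes):
    """Class whose prototype has the smallest Euclidean distance to the query."""
    return min(prototypes, key=lambda c: euclidean_distance(query, prototypes[c]))


def nearest_by_cosine(query, prototypes):
    """Class whose prototype has the largest cosine similarity to the query."""
    return max(prototypes, key=lambda c: cosine_similarity(query, prototypes[c]))


# Hypothetical 2-D prototypes and query, for illustration only.
prototypes = {"walk": [1.0, 1.0], "run": [4.0, 4.5]}
query = [3.0, 3.0]
euclidean_choice = nearest_by_euclidean(query, prototypes)  # closest in space
cosine_choice = nearest_by_cosine(query, prototypes)        # closest in direction
```

Here the query points in exactly the same direction as the "walk" prototype but lies closer in space to the "run" prototype, so the two metrics pick different classes.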

Algorithm Complexity Analysis
Regarding Algorithm 1 and Algorithm 2, the time complexity of Algorithm 1 is $O(n^2)$, where $n$ denotes the number of candidate Rx devices, which is small. For MF-DBSCAN, the basic time complexity depends on the amount of clustered data: density-connected points are derived according to eps_list and Minpts_list, and the procedure iterates until all core sample points have a corresponding class. The cost also depends on the time required to find the points, but this is of a smaller order of magnitude. The worst case is $O(m^2)$, where $m$ is the number of points, and the space complexity is $O(m)$. Our feature extraction uses ResNet18 [33], followed by the Euclidean distance metric, softmax, etc. The time complexity mainly comes from the convolutional operations; the time complexity of this framework is $1.8 \times 10^{9}$. This shows that our framework is significantly better than methods such as CNN + LSTM in terms of time overhead.
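The quadratic worst case for density-based clustering comes from neighborhood search: without a spatial index, every point is compared with every other point. The sketch below shows this with a brute-force eps-neighborhood query that counts its distance evaluations. It illustrates only the generic DBSCAN building block for a single (eps, MinPts) pair and is not the MF-DBSCAN procedure itself; the function names and sample points are hypothetical.

```python
import math


def region_queries(points, eps):
    """Brute-force eps-neighborhoods; returns (neighbors, number of distance evaluations)."""
    neighbors = [[] for _ in points]
    evaluations = 0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            evaluations += 1  # one distance evaluation per ordered pair, m * m in total
            if math.dist(p, q) <= eps:
                neighbors[i].append(j)
    return neighbors, evaluations


def core_points(points, eps, min_pts):
    """Indices of points with at least min_pts neighbors (the point itself included)."""
    neighbors, _ = region_queries(points, eps)
    return [i for i, found in enumerate(neighbors) if len(found) >= min_pts]


# Hypothetical 2-D points: three close together and one isolated outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
_, n_evaluations = region_queries(points, eps=1.5)
core = core_points(points, eps=1.5, min_pts=3)
```

With m = 4 points the query performs 16 distance evaluations, and the isolated point is not a core point. Only the neighbor lists and per-point labels need to be stored between passes, which is in line with the linear space complexity stated above.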

Comparison with Existing Methods
We have compared Wi-CHAR with several other recent cross-domain recognition methods in various ways to demonstrate the performance of our approach. These include transfer learning frameworks (Sheng et al. [20]), traditional CNN/RNN frameworks (CLAR [36], CDAR [37]), adversarial learning architectures (CrossGR [38]), and meta-learning frameworks (MatNet-eCSI [28], ML-WiGR [39]). We focused on the core metrics common to the above methods: accuracy, recognition target, main algorithm, and input features, using them as benchmarks for comparison while avoiding the introduction of other presentations and parameters. Although each method achieves some degree of cross-domain effect, the Wi-CHAR method can handle multiple domain factors, such as users and environments. Despite using DFS features, the MF-DBSCAN method does not consume more time. In terms of algorithms, we used only a CNN as the basic feature extraction model, which saves training time compared to frameworks that commonly use CNN + LSTM methods. Additionally, the few-shot learning method can adapt to new domains with fewer samples, while transfer learning and adversarial learning methods require additional data samples.
Wi-CHAR achieves high recognition accuracy, demonstrating that our model is robust and can achieve acceptable generalization with a small number of training samples. Further details are provided in Table 4.

Conclusions
This paper proposes the Wi-CHAR system, a WiFi-based cross-domain HAR system focusing on scenes and restricted data. It achieves high accuracy and generality in HAR over large areas with fewer samples. Wi-CHAR demonstrates robustness and versatility, delivering effective results across various scenes. It overcomes the challenge of significant degradation in model accuracy in cross-domain scenarios and eliminates the need for retraining when data acquisition in real environments is limited.

Figure 1
Figure 1 presents an overview of the Wi-CHAR framework, divided into two main components: the data processing part and the motion sensing part. In the data acquisition and processing stage, the system verifies the suitability of the device arrangement in the scene, ensures the proper functioning of Tx-Rx pairs, and selects devices based on specific locations. The data from the most suitable receiving device is used for recognition. During the activity recognition phase, the input features of the PN model are constructed to train the sensing model. Recognition results can be further obtained by adjusting the weights assigned to the PN.

Figure 2 .
Figure 2. WiFi receiving device selection for targets in different positions.

Here, $x_i \in \mathbb{R}^D$ is the D-dimensional feature vector of a sample and $y_i \in \{1, \ldots, k\}$ is the corresponding label. $S_k \in S$ denotes the set of samples labeled as class $k$. The D-dimensional original data are first mapped to the M-dimensional embedding space $\theta$. For the support set, the features of all $|S_k|$ samples of the same class are extracted by the neural network feature mapping function $f_\theta$. The query set sample $\hat{x}$ is projected into the same feature embedding space, $f_\theta(\hat{x})$, as the support set.
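The prototype step described here can be sketched as follows, assuming the embeddings have already been produced by the feature mapping function. Each class prototype is taken as the plain mean of its support embeddings, and a query is scored with a softmax over negative Euclidean distances. This is the unweighted prototypical-network baseline, not the re-weighting used by Re-PN, whose weights are not specified in this passage; the class names and vectors are hypothetical.

```python
import math


def prototypes_from_support(support):
    """Map each class label to the mean of its support embeddings."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes


def class_probabilities(query, prototypes):
    """Softmax over negative Euclidean distances from the query to each prototype."""
    logits = {label: -math.dist(query, p) for label, p in prototypes.items()}
    shift = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(v - shift) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical 2-D embeddings with two support samples per class.
support = {
    "sit": [[0.0, 0.0], [0.0, 2.0]],
    "stand": [[4.0, 4.0], [6.0, 4.0]],
}
prototypes = prototypes_from_support(support)
probabilities = class_probabilities([1.0, 1.0], prototypes)
predicted = max(probabilities, key=probabilities.get)
```

A noisy support sample shifts the plain mean directly, which is the kind of sensitivity that a re-weighted prototype is intended to reduce.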

Figure 4 .
Figure 4. Schematic diagram of activity recognition based on few-shot learning.

Algorithm 3
Re-weighting prototypical network model (Re-PN model).

The training and testing phases were conducted on a Windows desktop featuring an Intel Core i9-10700kF CPU, 24 GB RAM, an NVIDIA GeForce GTX 3080ti GPU, and the PyTorch 1.8.0 framework.

Figure 5 .
Figure 5. Scenarios for collecting human activity datasets.

Figure 6 .
Figure 6. Confusion matrix calculated in three action datasets.

Figure 7 .
Figure 7. Recognition accuracy of actions in different environments.

Figure 9 .
Figure 9. Activities performed by a new user in a new scenario.

Figure 10 .
Figure 10. Comparison of device selection accuracy across domain conditions.

Figure 12 .
Figure 12. (a) Effect of base classifier type; (b) Effect of MF-DBSCAN method on classification network.

Figure 13 .
Figure 13. (a) Comparison of different similarity computational network models; (b) Comparison of different similarity measures.

Re-weighting prototypical network model (Re-PN model)
Input: Training set $P = \{(x_1, y_1), \ldots, (x_N, y_N)\}$; the number of categories N contained in the support set; K, the number of classes in the training set.
Output: Re-PN loss J of the classifier model.
1: $V \leftarrow Rs(\{1, \ldots, K\}, N_C)$; //Few-shot task set.
2: for k in $\{1, \ldots, N_C\}$ do
...
11: //Select query set.
12: for (x, y) in $Q_k$ do //Calculate losses and update model parameters.
13: Calculate losses $L_p$, $L_c$;
14: Update loss $J \leftarrow J + L_p + L_c$.
15: end for
16: end for

Table 1 .
Types of human activity.

Table 2 .
Accuracy of HAR in different scenes.

Table 4 .
Comparison of Wi-CHAR with other cross-domain systems.