IQ-Data-Based WiFi Signal Classification Algorithm Using the Choi-Williams and Margenau-Hill-Spectrogram Features: A Case in Human Activity Recognition

Abstract: This paper presents a novel approach that applies WiFi-based IQ data and time–frequency images to classify human activities automatically and accurately. The proposed strategy first uses the Choi–Williams distribution transform and the Margenau–Hill spectrogram transform to obtain the time–frequency images, followed by offset and principal component analysis (PCA) feature extraction. The offset features were extracted from the IQ data and from several spectra with maximum energy values in the time domain, and the PCA features were extracted from the whole images and from several image slices with rich unit information. Finally, a traditional supervised learning classifier was used to label the activities. The proposed method was validated on 12,000 experimental samples from four categories of WiFi signals. The results showed that our method was robust to varying numbers of image slices or PCA values over the measured dataset. Our method with the random forest (RF) classifier surpassed the method with alternative classifiers in classification performance and obtained a 91.78% average sensitivity, 91.74% average precision, 91.73% average F1-score, 97.26% average specificity, and 95.89% average accuracy.


Introduction
With the development of life rescue technology, especially detection technology for earthquake survivors and outdoor sports victims, detecting human activities behind obstacles such as walls and debris has become a critical direction in life detection [1]. An essential characteristic of microwaves is their weak diffraction ability and almost linear propagation. The carrier frequency determines the ability of a microwave signal to pass through a wall. Microwaves with a 2-4 GHz carrier frequency penetrate concrete walls well [2]. In this frequency range, the power is low and will not harm the human body. According to the IEEE 802.11 standard, WiFi signals use a 2.4-2.4835 GHz carrier frequency [3], so WiFi signals can pass through walls [4]. Wireless behavior recognition based on WiFi is realized by detecting the characteristics of the WiFi signals reflected by the human body [5]. Using WiFi signals to carry out nonvisual behavior recognition has substantial research and application value in life rescue.
Some specific problems limit the application of WiFi for nonvisual behavior recognition, such as cochannel interference, anisotropic wireless propagation, and data traffic jams in WiFi networks. Reference [6] showed that WiFi network interference can cause radar performance deterioration by enhancing the probability of false alarms. Besides the cochannel interference, the anisotropic wireless propagation also has adverse effects on the performance. A WiFi device with a link-centric architecture, even if the underlying devices are all equipped with omnidirectional antennas, creates an anisotropic wireless propagation environment [7]. Moreover, due to data traffic jams in WiFi networks, the beacon signal interval is challenging to manipulate [8]. The important information in reflected WiFi signals received by the spectrum/signal analyzer is easily lost. The classification performance based on weak information data is likely to be poor.

Article Contribution
To improve classification performance when the received signal carries only weak information, we propose a novel classification algorithm based on time-frequency features of WiFi signals. Our method applies the Choi-Williams distribution [9] and the Margenau-Hill spectrogram distribution [10] time-frequency analysis to obtain the images of the signals. The classification features include offset parameters and principal component analysis (PCA) values. Our approach uses energy to select the central time frames of the spectra and image slices. The offset parameters are calculated from the IQ data and from several spectra with maximum energy values in the time domain, and the PCA values are calculated from the whole images and from several image slices with rich unit information. This strategy is likely to avoid the weak unit information of the whole time-frequency image because the unit information of the image slices is rich compared to that of the entire time-frequency image. Hence, combining these slice features with the features from the entire signal is likely to boost the classification performance.

Symbols and Article Organization
In this paper, scalars are denoted by lowercase letters, e.g., x, whereas vectors are denoted by bold lowercase letters, x. Matrices are denoted by bold uppercase letters, X. Furthermore, = denotes the equality operator, and (·)* and E(·) denote the complex conjugate operator and the expectation operator, respectively.
The remainder of this paper is structured as follows. Section 2 introduces the related works. Section 3 describes the details of our method. Section 4 describes the experimental environment and the recording process of the measurement data. Our algorithm performances are illustrated with numerical results from the human activity classification in Section 5. Finally, conclusions are drawn in Section 6.

Related Works
In [4], the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) proposed a transparent wall technology with a low bandwidth, low power consumption, a compact structure, and accessibility by nonmilitary entities. This technology uses the 2.4 GHz WiFi signal in the Industrial, Scientific, and Medical band and eliminates reflections from static objects, including walls. The CSAIL used a radio frequency capture device to capture the wireless signal behind a wall or occlusion and reconstructed the human image by analyzing the reflected wireless signal [11]. To realize static target positioning by sensing the micromotion caused by the target's breathing, Reference [12] proposed a multiperson positioning system based on the wireless signal in a complex environment.
Reference [7] designed a scheme with a ubiquitously deployed WiFi infrastructure and evaluated it in typical multipath-rich indoor scenarios using CSI data. In [13,14], Y. Zeng et al. applied the WiFi CSI to classify shopper status and recognize the gait of people 2-3 m away. Wang et al. used the CSI to identify human gait [15] and proposed a human activity recognition and monitoring system to quantify the relationship between human movement speed and human activity [16]. M. I. Khan et al. used the CSI to track vital signs and remove any outliers from the gathered data [17]. Yuan et al. applied the CSI to extract features for device-free human activity recognition in b5G wireless communication [18]. Sharma et al. used the CSI to train convolutional neural networks and classify human activities in different places [19]. The device-free system in [20] investigated drivers' activities as a multiclass classification problem leveraging the CSI of the WiFi signals for better discrimination of in-vehicle activities. Reference [21] developed a human activity recognition system via the CSI, and Reference [22] extracted various time and frequency domain features via the system. Furthermore, Reference [23] recognized human activities based on environment-independent fingerprints extracted from the CSI.
Reference [24] used 2.4 GHz bistatic passive radar to detect a moving human target behind a wall and obtained the range and Doppler information. In [6], A. D. Singh et al. studied the cochannel interference between WiFi and through-wall micro-Doppler radar based on the features of indoor walking at a frequency of 2.462 GHz. Our previous work [25] classified human activity via the PCA features of the short-time Fourier transform time-frequency image of samples.

Methodology
This section includes three subsections. The first one introduces the time-frequency methods used in this article. The second one displays the classification features extracted from the image. The last one presents the framework of our approach.

Time-Frequency Methods
Because the information in the reflected WiFi signal is too weak to yield a good classification performance from a single representation, we applied the Choi-Williams distribution [9] image and the Margenau-Hill-Spectrogram distribution [10] image together for the feature extraction in this article.
(1) The Choi-Williams distribution has the following expression: where x denotes the input signal and t and f denote the vectors of the time instants and normalized frequencies, respectively. σ denotes the standard deviation, i.e., the square root of the variance.
(2) The Margenau-Hill-Spectrogram distribution has the following expression: where F_x(·) denotes the short-time Fourier transform of x with the analysis window g.
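For reference, commonly used textbook forms of the two distributions are sketched below. They are an assumption, not necessarily the exact expressions intended above: the normalization of the Choi-Williams kernel varies between references, σ is written here as the positive parameter that scales the exponential kernel, and the second analysis window h and the constant K_gh in the Margenau-Hill spectrogram follow a common time-frequency toolbox convention.

```latex
% Choi-Williams distribution (one common normalization; assumed form)
CW_x(t, f) = \iint \sqrt{\frac{\sigma}{4\pi\tau^{2}}}\,
  \exp\!\left(-\frac{\sigma\,(s - t)^{2}}{4\tau^{2}}\right)
  x\!\left(s + \tfrac{\tau}{2}\right) x^{*}\!\left(s - \tfrac{\tau}{2}\right)
  e^{-j 2\pi f \tau}\, \mathrm{d}s\, \mathrm{d}\tau

% Margenau-Hill spectrogram with analysis windows g and h (assumed form)
MHS_x(t, f) = \Re\!\left\{ K_{gh}^{-1}\, F_x(t, f; g)\, F_x^{*}(t, f; h) \right\},
\qquad K_{gh} = \int h(u)\, g^{*}(u)\, \mathrm{d}u

% Short-time Fourier transform used above
F_x(t, f; g) = \int x(u)\, g^{*}(u - t)\, e^{-j 2\pi f u}\, \mathrm{d}u
```

In the Choi-Williams form, a larger σ weakens the smoothing and moves the result toward the Wigner-Ville distribution, while a smaller σ suppresses cross-terms more strongly at the cost of resolution.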

Classification Features
In this article, we applied offset parameters and PCA parameters as the classification features.
The offset parameters included the mean [27], standard deviation [28], variance [29], skewness [30], kurtosis [31], and central moment [32]. The mean parameter measures the central tendency of the signal probability distribution. The standard deviation is a measure of the amount of variation or dispersion of the input signal. The variance, measuring how far the signal spreads out from its average value, is the expectation of the squared deviation. The kurtosis measures the "tailedness" of the signal probability distribution. The skewness and central moment measure the asymmetry and moment of the probability distribution of the signal about its mean, respectively. For convenience, the skewness s, the kurtosis κ, and the k-th central moment m_k can be written as follows:
$$ s = \frac{E\left((x-\mu)^{3}\right)}{\sigma^{3}}, \qquad \kappa = \frac{E\left((x-\mu)^{4}\right)}{\sigma^{4}}, \qquad m_k = E\left((x-\mu)^{k}\right), $$
where µ and σ denote the mean and the standard deviation of x, respectively, and k denotes the order of the central moments.
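The six offset parameters can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `offset_features`, the choice of a real-valued input sequence (for example, the magnitude of the IQ samples or one spectrum), and the population (1/n) normalization are all assumptions.

```python
import math

def offset_features(x, order=3):
    """Offset parameters of a real-valued, non-constant sequence x.

    Assumptions (not from the source): population (1/n) normalization
    and a hypothetical default of order 3 for the central moment.
    """
    n = len(x)
    mu = sum(x) / n

    def central_moment(k):
        # k-th central moment: E((x - mu)^k)
        return sum((v - mu) ** k for v in x) / n

    variance = central_moment(2)
    sigma = math.sqrt(variance)
    return {
        "mean": mu,
        "std": sigma,
        "variance": variance,
        "skewness": central_moment(3) / sigma ** 3,
        "kurtosis": central_moment(4) / sigma ** 4,
        "central_moment": central_moment(order),
    }
```

For the symmetric sequence 1, 2, 3, 4, 5 the skewness is zero and the kurtosis is 1.7, which reflects a flatter shape than a Gaussian (kurtosis 3).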
Orthogonal linear PCA, which transforms the input signal into a new coordinate system, is often used to process the spectral information by extracting its main features and reducing the computational complexity [33,34]. The largest variance of any scalar projection of the signal lies on the first coordinate, i.e., the first principal component. The second-largest variance lies on the second coordinate, and so on. Figure 3 shows the first 60 PCA values of all the subfigures in Figures 1 and 2. In this article, the principal components were calculated by the singular-value decomposition of the whole image or image slices.
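The singular-value-decomposition route to the principal components can be sketched as follows. This is an illustrative sketch under stated assumptions: the image is a real-valued 2-D array whose columns are centered before the decomposition, and the "PCA values" are taken to be the variances along the leading principal components. The function name `pca_values` is hypothetical, and the source does not say which quantity was actually used as the feature.

```python
import numpy as np

def pca_values(image, n_components=10):
    """Leading principal-component variances and directions of a 2-D image.

    Assumption: rows are observations and columns are variables; the
    source does not state the orientation used.
    """
    X = np.asarray(image, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)  # center every column
    # Singular values are returned in descending order.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / max(X.shape[0] - 1, 1)
    return variances[:n_components], vt[:n_components]
```

The same function can be applied to the whole time-frequency image or to a single image slice, giving one group of PCA values per call.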

Method Framework
Due to the data traffic jams in WiFi networks and the anisotropic wireless propagation of WiFi devices, the reflected WiFi signals received by the spectrum/signal analyzer are likely to lose important information, resulting in weak classification performance. To solve this problem, we propose an approach to human activity classification based on the Choi-Williams distribution and Margenau-Hill-Spectrogram distribution time-frequency images together with the offset features of the whole IQ signal, as shown in Figure 4. In our method, the first group of offset parameters was calculated directly from the transformed IQ data. The second group was calculated from the spectrum of the time frame with the maximum energy in both the Choi-Williams distribution and Margenau-Hill-Spectrogram distribution time-frequency images, and the third group came from the spectrum of the time frame with the second-largest energy in the images. Different unit images had the same small scale, and the unit image was a subset of a spectrogram image. As shown in Figures 1 and 2, the unit information is rich on several image slices rather than on the whole image. Hence, we performed PCA not only on the entire time-frequency image but also on the image slices with rich unit information. The central time frame of each image slice was selected in the same way as the time frame of the spectrum used for calculating the offset parameters.
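The energy-based selection of time frames and image slices described above can be sketched as follows. This is a minimal illustration under assumptions: the image is stored as a list of rows (frequency bins) by columns (time frames), the energy of a time frame is taken as the sum of squared magnitudes over frequency, and the function names `top_energy_frames` and `slice_around` as well as the `half_width` parameter are hypothetical.

```python
def top_energy_frames(image, n_frames=2):
    """Indices of the time frames (columns) with the largest energy.

    Assumption: frame energy is the sum of squared magnitudes over all
    frequency bins (rows) of that column.
    """
    n_time = len(image[0])
    energy = [sum(abs(row[t]) ** 2 for row in image) for t in range(n_time)]
    ranked = sorted(range(n_time), key=lambda t: energy[t], reverse=True)
    return ranked[:n_frames]

def slice_around(image, center, half_width):
    """Image slice of up to 2 * half_width + 1 frames centered on `center`."""
    n_time = len(image[0])
    start = max(0, center - half_width)
    stop = min(n_time, center + half_width + 1)
    return [row[start:stop] for row in image]
```

The column at each selected index gives a spectrum for the offset parameters, and the slice centered on the same index gives an image slice for PCA, so both feature groups share the same time frame selection.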

Experimental Environment
In the experiment, the transmitter was an ASUS ROG GT-AX11000 tri-band WiFi gaming router, and the receiver was a Tektronix RSA 306B spectrum/signal analyzer. Moreover, the data recorder was a Thinkpad X1 with the Tektronix SignalVu-PC software. The measurement data were collected in the corridor on the 10th floor of the Science and Engineering Building A at Inner Mongolia Normal University, as shown in Figure 5. The distance between the router and receiver antennas was 10 m, with the target at the center. The heights of the router and receiver antennas were approximately 1 m, with the same centroid height as the subject. The router was operated at 2.412 GHz, with an instantaneous bandwidth of 20 MHz, satisfying the receiver antennas' range (1.5-3.5 GHz). The experimental data were collected from four categories of signals, including idle and three different activities (the marching-in-place exercise, rope skipping, and arms rotating). There were 3000 samples in every signal category, with 12,000 samples in total. The holdout partition randomly selected the training and testing samples.
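A random holdout partition of the sample indices can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `holdout_split` and the fixed seed are assumptions, and the 30% test fraction is the value reported for the evaluation in the Results section.

```python
import random

def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Randomly partition sample indices into training and testing sets.

    The seed is a hypothetical choice made only for reproducibility.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]
    train_idx = indices[n_test:]
    return train_idx, test_idx
```

With 12,000 samples and a test fraction of 0.3, this yields 8400 training and 3600 testing indices. A purely random split like this one does not guarantee equal class proportions in the two sets.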

Results and Discussion
We applied six statistics (sensitivity, precision, F1-score, specificity, accuracy, and classification rate) to measure the classification performance in this section. The first five statistics measured the result for every activity, and the classification rate measured the overall performance. Assume that P and N denote the number of positive and negative samples, respectively. TP and FP denote the number of true positives and false positives, respectively. A true positive means an activity was labeled correctly, while a false positive means a false alarm, i.e., another activity was labeled as the activity under test. Furthermore, TN and FN denote the number of true negatives and false negatives, respectively. A true negative is also known as a correct rejection, while a false negative is a missed detection. These measures can be expressed as:
$$ \text{sensitivity} = \frac{TP}{P}, \qquad \text{precision} = \frac{TP}{TP + FP}, \qquad \text{specificity} = \frac{TN}{N}, \qquad \text{accuracy} = \frac{TP + TN}{P + N}, $$
which, with P = TP + FN and N = TN + FP, yield
$$ \text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}} = \frac{2\,TP}{2\,TP + FP + FN}. $$
First, we assessed the performance of recognizing different activities (the marching-in-place exercise, rope skipping, and arms rotating) and the idle condition using the measured WiFi signals. Every category included 3000 samples, so the total number of samples was 12,000, with the scatter plot given in Figure 6. As shown in Figure 4, the extracted features in our method came from the time-frequency images, whose generation function and parameters were the same as those in the last section. Ten groups of image slices with ten PCA values each were used for the calculation. The time length of every image slice was 200 ms.
Six machine-learning-based classifiers, including two K-nearest neighbors classifiers (K = 3 or 5) [21,35], bagging [36], boosting [36,37], random forest (RF) [38,39], and support vector machine (SVM) [21,22,40], were applied for the classification. The ensemble type of the boosting classifier was AdaBoostM2. In addition, the kernel of the SVM was a second-order polynomial function with the automatic kernel scale, the box constraint was set to one, and standardization was enabled.
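The per-activity measures and the overall classification rate can be computed from a multiclass confusion matrix in a one-versus-rest fashion, as in the sketch below. The function name `per_class_metrics`, the row/column convention, and the definition of the classification rate as the fraction of correctly labeled samples are assumptions made for illustration.

```python
def per_class_metrics(confusion):
    """One-versus-rest measures from a square confusion matrix.

    Assumption: confusion[i][j] counts samples of true class i that
    were labeled as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed detections
        fp = sum(confusion[r][c] for r in range(n)) - tp  # false alarms
        tn = total - tp - fn - fp                         # correct rejections
        results.append({
            "sensitivity": tp / (tp + fn),
            "precision": tp / (tp + fp),
            "f1": 2 * tp / (2 * tp + fp + fn),
            "specificity": tn / (tn + fp),
            "accuracy": (tp + tn) / total,
        })
    # Assumed definition: fraction of all samples labeled correctly.
    rate = sum(confusion[c][c] for c in range(n)) / total
    return results, rate
```

Assuming four equally sized test classes, an overall classification rate r implies an average sensitivity of r, an average accuracy of 1 - (1 - r)/2, and an average specificity of 1 - (1 - r)/3. For r = 0.9178, these give 0.9589 and 0.9726, which agrees with the averages reported for the RF classifier.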
The holdout cross-validation partition (p = 0.3) selected 70% of the samples (8400 samples) for learning features and the remaining 30% (3600 samples) for testing. The sensitivity, precision, F1-score, specificity, and accuracy of the classifications are shown in Figure 8. Compared to the other classifiers, the RF classifier obtained the best performance in our method in this scenario. The confusion matrix of the classification via the RF classifier is given in Table 1, with the sensitivity, accuracy, and specificity in Table 2.
To evaluate the effect of the number of image slices on the classification performance, we calculated the classification performance under different numbers of image slices. The method and parameter settings of the feature extraction and classifiers were the same as those of the last evaluation. As Figure 9 shows, the classification rate improved as the slice number increased. When the image slice number was equal to zero, i.e., all the features came from the whole signal or image without the high-quality features from the image slices with rich unit information, the performance decreased. In this figure, the classification rates were 25.86% (NN3), 26.29% (NN5), 46.26% (boosting), 74.72% (bagging), 84.21% (SVM), and 85.77% (RF) without the image slice features. When the image slice number was equal to 10, the classification rates were 37.90% (NN3), 38.86% (NN5), 61.08% (boosting), 79.64% (bagging), 88.25% (SVM), and 91.78% (RF), respectively. The classification rate of the boosting classifier improved by 14.82 percentage points, and that of the RF classifier improved by 6.01 percentage points. Because the classification rate of the RF classifier was higher than those of the other classifiers, the details of the classification performance of the RF classifier are analyzed in Figure 10.
In Figure 10, the average precision, average F1-score, average specificity, and average accuracy of the four categories of WiFi signals in our method reached 91.74%, 91.73%, 97.26%, and 95.89%, respectively.
To evaluate the effect of the number of PCA values per image or image slice on the classification performance, we calculated the classification performance under different numbers of PCA values. The image slice number was set to 10, and the other parameter settings were the same as those of the last evaluation. The results are shown in Figure 11. In this figure, the PCA values negatively affected the classification performance of the boosting classifier while positively affecting the classification performance of the other classifiers. Moreover, the classification performance of the RF classifier was higher than that of the other classifiers. With the RF classifier, the classification rate improved by 2.77 percentage points, from 89.01% with no PCA features to 91.78% with 10 PCA features per image or image slice.

Conclusions
In this article, we proposed a novel approach that applies features from the IQ data and the time-frequency images to classify human activities automatically and accurately. The two images were obtained from the Choi-Williams distribution and the Margenau-Hill spectrogram distribution time-frequency transforms. There were two categories of features in the presented strategy, i.e., the offset parameters and the PCA values. The offset parameters, comprising the mean, standard deviation, variance, skewness, kurtosis, and central moments, were calculated from the IQ data and from several spectra with maximum energy values in the time domain. The PCA values were calculated from the whole images and from several image slices with rich unit information.
The proposed algorithm was validated on the experimental data. Our method was shown to be robust to varying numbers of image slices or PCA values over the measured dataset, which included three activities (the marching-in-place exercise, rope skipping, and arms rotating) and the idle signal. Experimentally, our method with the RF classifier surpassed the methods with alternative classifiers in classification performance and obtained a 91.78% average sensitivity, 91.74% average precision, 91.73% average F1-score, 97.26% average specificity, and 95.89% average accuracy. Moreover, the classification results showed that with the increase of the slice number and the PCA number, the classification rates of our method with the RF classifier improved by 6.01 and 2.77 percentage points, respectively.
In future work, we will consider denoising methods for the WiFi signal and methods of physical feature extraction. In a complex environment, various WiFi signals interfere with one another when using similar channels, and thus the received signals are accompanied by noise. Suppressing the noise to obtain a denoised signal is likely to affect the classification performance positively. Moreover, the features of our method were offset parameters and PCA values, which are not directly tied to the physical features of the activities. To map the features to the activities, the problem of physical feature extraction needs to be considered.

Funding: This research was partially funded by the CRRC corporation limited-Research on Modular technology of rail transit intelligent detection system (CIJS20-JS042-R) and the central government guiding local scientific and technological development funds-Realization of multi-input multi-output forward scattering physiological information detection radar (2021SZVUP023).

Conflicts of Interest:
The authors declare no conflict of interest.