Human Walking Direction Detection Using Wireless Signals, Machine and Deep Learning Algorithms

The use of wireless signals for device-free activity recognition and precise indoor positioning has gained significant popularity recently. By taking advantage of the characteristics of the received signals, it is possible to establish a mapping between these signals and human activities. Existing approaches for detecting human walking direction have encountered challenges in adapting to changes in the surrounding environment or different people. In this paper, we propose a new approach that uses the channel state information of received wireless signals, a Hampel filter to remove the outliers, a Discrete wavelet transform to remove the noise and extract the important features, and finally, machine and deep learning algorithms to identify the walking direction for different people and in different environments. Through experimentation, we demonstrate that our approach achieved accuracy rates of 92.9%, 95.1%, and 89% in detecting human walking directions for untrained data collected from the classroom, the meeting room, and both rooms, respectively. Our results highlight the effectiveness of our approach even for people of different genders, heights, and environments, which utilizes machine and deep learning algorithms for low-cost deployment and device-free detection of human activities in indoor environments.

One of the challenges in HAR systems is human walking direction detection (HWDD).HWDD can be used in various applications, including:

•
In security systems, to track the movement of individuals in restricted areas and detect any unauthorized access or suspicious behavior.

•
In healthcare, to monitor the movement of patients in hospitals or nursing homes and alert staff if a patient is wandering or falls [16,22,29].

•
In retail stores, to track the movement of customers and analyze their behavior, such as the areas where they spend the most time, the products with which they interact, and the path they take through the store.

•
In robotics, to control the movement of robots and ensure their safety around humans.
Therefore, HWDD using wireless signals can enable various automated and intelligent systems to be more efficient, effective, and responsive to human needs and improve their well-being in terms of health and comfort [26].
Several studies have investigated the utilization of wireless signals for detecting human walking direction, speed, and location [9,10,[31][32][33][34][35][36].The previous works are based on RSSI and CSI amplitudes; however, they have serious limitations, such as degraded accuracy in small spaces, environmental changes, and a need for a specific device setup [9,10,31,34,36].Moreover, these methods have relied only on a small subset of available signal streams from multi-antenna systems, resulting in limited predictive capabilities for determining the walking direction [34].
In this paper, we propose an approach that leverages machine and deep learning and all available signal streams of commercial multi-antenna Wi-Fi devices, enabling more accurate and reliable walking direction detection.We obtain the CSI values of signals transmitted via pairs of transmitting antennas and received via pairs of receiving antennas of 802.11nWi-Fi devices.We use the Hampel filter algorithm and the discrete wavelet transform (DWT) to extract the features of the amplitude of the signal and the calibration to extract the features of the phase of the signal.The performance of four different classifiers, namely K-nearest neighbor (KNN), random forest (RF), the support vector machine (SVM), and the one-dimensional convolutional neural network (1D-CNN), was investigated using a pre-labeled dataset.Performance evaluations showed that SVM is an optimal classifier, achieving HWDD accuracy rates of 92.9%, 95.1%, and 89% for untrained data collected from classroom, meeting room, and both rooms, respectively.This high precision indicates that our approach effectively addresses the challenges of achieving high accuracy in HWDD in the presence of environmental changes and diversity in users.The major contributions of this paper are as follows: • First, we thoroughly analyze the literature related to HAR in general and HWDD in particular.• Then, we identify the major weaknesses of existing HWDD schemes.
Next, we propose our new approach for HWDD, which uses the CSI of all available signal streams, the Hampel filter algorithm, and DWT to denoise and extract the features of amplitude, phase calibration, and finally, machine and deep learning algorithms to identify the human walking direction.• Then, we perform extensive experiments under different conditions, such as various environments and diversity in users.• Finally, we evaluate the performance of our approach using a variety of machine and deep learning classifiers, including RF, KNN, SVM, and 1D-CNN.
The remainder of this paper is organized as follows.Section 2 provides background information and presents a comprehensive review of the literature.Section 3 introduces a new approach to HWDD and describes the experiments and data collection methodologies.Section 4 presents the metrics and performance evaluation results.Finally, Section 5 concludes the article and discusses future work.

Channel State Information
Channel state information (CSI), denoted as H(f,t) for carrier frequency f and time t, can be described using the channel frequency response (CFR).The relationship between the transmitted signal X(f,t) and the received signal Y(f,t) in the frequency domain is given by In Wi-Fi, a wireless channel consists of multiple orthogonal frequency division multiplexing (OFDM) subcarriers.For example, a 20 MHz wide Wi-Fi channel includes 56 subcarriers.Upon receiving a packet (a data unit in computer networks), the Wi-Fi receiver measures CSI for each subcarrier.
The 802.11n Wi-Fi devices have multiple antennas that enable them to send and receive packets using multiple signal streams: a stream per each transmit and receive antenna pair.The 802.11n receiver measures the CSI for each subcarrier and for each signal stream.
The CSI matrix is a representation of the channel information between the transmitting and receiving antennas in a wireless communication system.It provides vital information on the amplitude, phase, and frequency response of the channel, which are essential for efficient data transmission and reliable communication.The Wi-Fi receiver has the capability to generate a sampled version of the signal spectrum for each subcarrier, including complex numbers that represent both amplitude attenuation and phase shift.These data can be summarized as where ||H i || represents the amplitude and ∠H i represents the phase of the CSI at the ith subcarrier.
The dimensions of the CSI matrix are determined by the number of transmitting and receiving antennas, as well as the number of subcarriers within the channel bandwidth.Assuming one complex value per subcarrier, the size of the CSI matrix can be calculated as M × N × 56 for a 20 MHz channel bandwidth and M × N × 114 for a 40 MHz channel bandwidth.Here, M and N represent the number of transmitting and receiving antennas, respectively [37,38].

Literature Review
Currently, many HAR techniques exist that aim to track human gestures, activities, movements, counts, and walking directions using wireless signals in indoor environments.Some of them are summarized in Appendix A in Tables A1-A4.In the following subsections, we discuss the main contributions in each category.

Human Gesture Detection
One of the potential areas where wireless signals can be employed is human gesture detection.There are many works in literature related to this area of study.For example, Wisee is the first wireless system that addressed the NLOS environment and can detect nine gestures with an accuracy of 94%.It used the analysis of Doppler shifts in wireless signals during transmission.However, the accuracy degraded with an increase in the number of interfering users [39].In another work [40], the authors used CSI and deep learning techniques to recognize 12 gestures, and its accuracy was 98%.There is a similar work called WiRoI, but it used SVM classifiers instead of deep learning to recognize human movements.However, it was trained for only a single user [27].In another similar work [41], the authors used CSI and KNN classifier to recognize five different hand gestures with an accuracy of 95% and 85% using amplitude and phase information, respectively.
The authors of WiFinger used CSI and multidimensional dynamic time wrapping to recognize eight finger gestures with an accuracy of 93% and 90% in LOS and NLOS conditions, respectively [30].FreeGesture used CSI and a convolutional neural network (CNN) classifier to recognize six gestures with an accuracy of 95.8%.However, it was applied to only one user [21].In [11], the authors used RSSI and one-dimensional CNN (1D-CNN) to recognize different hand gestures in the recognition area with an accuracy of 86.91%.AirDraw used CSI (phase and triangulation) to detect handwriting, with a median error of 2.2 cm [12].Wiga used CSI and CNN-long short-term memory (LSTM) to detect yoga exercises with an accuracy of 97.7% and 85.6% for training and testing data, respectively [23].WiADG used CSI and CNN classifiers to detect six gestures with an accuracy of 98% and 66.6% in the original environment and under environmental changes, respectively.However, noise removal was not used [24].FingerPass used CSI and LSTMbased deep neural network (DNN) to detect finger gestures with an accuracy of 80% [25].Table A1 in Appendix A includes a structured comparison of the works in this category.

Human Activity Detection
We will now describe some of the works that aim to detect human activity using wireless signals.PAWS used RSSI and KNN classifiers to recognize six activities with an accuracy of 72.47% [42].WiFall used CSI to detect falls with an accuracy of 90% and 94% using SVM and RF algorithms, respectively.However, it was designed and tested on a single user [13].E-eyes used RSSI and CSI to recognize two activities by using clustering.They can accurately identify activities that are closely associated with specific locations, indicating a high correlation between activity and location.However, it was designed and tested on a single user [29].CARM utilized CSI to recognize three activities by measuring the correlation between speed and movement and using a hidden Markov model (HMM) to identify specific activities with an accuracy rate of 96% [8].WiGest used RSSI to recognize three different activities in NLOS scenarios with an accuracy of 87.5% for different users [43].In [15], the authors used CSI and the class-estimated-basis space singular value decomposition non-negative matrix factorization (CSVD-NMF) for an activity recognition system and occupancy detection with an accuracy of 90.6% and 91%, respectively.Freedetector used CSI and RF classifiers to detect the presence of humans with an accuracy of 93.73%.However, it was designed and tested on a single user [16].DeepSense used CSI and CNN to recognize different activities with an accuracy of 97.4% [26].In [17], the authors used CSI and CNN-LSTM model to detect eight different activities with an accuracy of 96%.However, it was designed and tested on only one user, and due to the need for multiple receivers to achieve precise activity detection, their model was not considered to be cost-effective.In [18], the authors used CSI and CNN classifiers to detect five different activities with an accuracy of 78%.However, it was designed and tested on one user.CDHAR used CSI and the ensemble method to recognize five activities with an accuracy of 90% [19].In [20], the authors used CSI and different machine learning classifiers to detect three activities with an accuracy of 95.6%.FALAR used CSI and CSVD-NMF to recognize six activities with an accuracy of 90% [22].Some works in this category are summarized in Table A2 in Appendix A.

Human Movement Detection
There are several studies pertaining to detecting human movement using wireless signals.For example, WiWho uses CSI and the decision tree classifier to detect human steps with an accuracy of 80% and 92% for six and two users, respectively [14].WiDMove used CSI and SVM classifiers to identify the entrance and departure of users with an accuracy of about 80% [44].WifiU analyzed the patterns of Wi-Fi signals in the frequency domain using SVM classifier to estimate the impact of the torso and legs on the Wi-Fi spectrum with an accuracy of 79.28%, 89.52%, and 93.05% for top-1, top-2, and top-3 based on 50 subjects, respectively.However, the experiment was carried out only in one place [45].In [46], the authors used CSI and a linear discriminant classifier to count the number of people in different rooms with an accuracy of 74%.However, the accuracy decreased by about 20% in larger rooms.A more structured comparison of the abovementioned works can be found in Table A3 in Appendix A.

Human Walking Direction Detection (HWDD)
HWDD is a topic of major interest in this paper.There are several works that use wireless signals for HWDD.For example, the authors of WiDir suggested the utilization of a Fresnel zone-based model to approximate the direction of a person's movement by analyzing the phase shift in CSI, achieving a median error rate below 10 degrees.The system requires a specific device setup consisting of a minimum of three laptops to establish two Fresnel zones to operate accurately [9].WiDar used CSI to estimate the velocity (speed and direction) of walking, as well as the location at a precision of a decimeter, analyzing Doppler frequency shifts.However, accuracy decreased as the user moved further away from the wireless link and the users were required to wear special clothing for the experiment [10].
WiDet used RSSI and CNN to detect human walking directions through corridors with an accuracy of 94.5% for a total of 163 walking events.However, RSSI is known to be susceptible to the shadowing effect and multipath, which can introduce errors and inaccuracies in signal measurements, and we did not find certain experiment details such as whether the experiment was conducted for a single user or multiple users, as well as the specific data training and testing size [31].In [32], the authors used CSI amplitudes to detect the relationship between the extracted signals, the number of people, the speed of walking, and the direction of walking for 286 samples of three people with a precision greater than 90%.However, accuracy is degraded with different training sizes.
In [34], the authors proposed two techniques to determine the walking direction of a user passing through a corridor where special sensor nodes are installed on both sides at three different heights.The first technique involves an unsupervised technique that utilizes the variance of RSSI within short-term windows.This method determines the walking direction by analyzing the time delay between signal fluctuations in two selected streams.The second technique is a supervised approach that compares the measured signal in a chosen stream with a reference signal in the same stream using dynamic time warping (DTW).The accuracy of both methods ranges from 40% to 99%, depending on the chosen streams for 280 samples.However, the limitation of these techniques is their reliance on a small subset of streams to determine the direction of the walk, which may not provide the best performance.In [33], the authors used CSI to classify the direction and velocity of human walking with an accuracy rate of 91% for 136 samples from three people in four corridors and achieved a differentiation of human height with an accuracy of 76.7%.However, except for the height experiment, the collected data were limited to a single user.
Gate-ID used CSI amplitudes and deep learning techniques to detect human gait in two directions (left and right) with an accuracy of 90.7% [35].WiDIGR used CSI based on the Fresnel zone and SVM classifier to detect walking direction with an accuracy of 78.28% and 92.83% for an apartment and empty rooms, respectively.However, the accuracy decreased by about 20% during testing.In addition, the accuracy degraded in small places and with changes in the environment and an increase in the number of people [36].Interested readers can find a more structured comparison of the abovementioned works in Table A4.

System Design
In this paper, we propose a new approach for HWDD leveraging the CSI of Wi-Fi signals and machine and deep learning algorithms, as shown in Figure 1.Our system utilizes the CSI of all available signal streams, unlike the previous methods, where only a subset of the streams were used.Moreover, the proposed scheme relies on feature extraction techniques such as phase calibration, the Hampel filter algorithm, and DWT.In this section, we first discuss the setup of the experiment, the data collection method, data preprocessing, feature extraction, and walking direction recognition.

Experiment Setup
We explored three alternatives to obtain CSI data from Wi-Fi routers, namely the Linux 802.11nCSI Tool [38], Atheros CSI Tool [47], and Nexmon Channel State Information Extractor [48].These tools require custom firmware and Linux wireless drivers, which require installation on devices instead of relying on vendor-provided solutions.Due to the obsolescence of the Network Interface Controller supported by the Linux 802.11nCSI Tool and the limited device compatibility of the Nexmon CSI Extractor, we used the Atheros CSI Tool.This open-source tool is designed for 802.11n measurement and experimentation, facilitating the extraction of comprehensive wireless communication details from Atheros Wi-Fi Network Interface Cards (NICs).This information includes CSI, received packet payload, data rate, timestamp, RSSI of each antenna, and other relevant metrics.
In our experiment, we used a laptop and a pair of TP-link TL-WDR4300 Wi-Fi routers with a channel bandwidth of 20 MHz and a frequency of 2.4 GHz (f or more details about the TP-Link TL-WDR4300 router, refer to the OpenWrt documentation: https://openwrt.org/toh/tp-link/tl-wdr4300_v1 (accessed on 1 September 2023)).These routers serve as a transmitter (TX) and receiver (RX), configured to operate in access point and client modes.The routers have two transmitting and two receiving antennas, resulting in a total of four signal streams, and the wireless communication details were extracted using 56 OFDM subcarriers as shown in Table 1.We installed OpenWrt on the routers, which is a Linux-based operating system designed for embedded devices.However, we used custom OpenWrt, which is modified to extract the CSI data to the user space [47].
In our experiments, one of the Wi-Fi routers was configured as a transmitter and the other as a receiver.The transmitter periodically (every 50 ms) sends a dummy packet to a receiver.Each dummy packet is transmitted through two antennas at the transmitter.In other words, two copies of the same dummy packet are transmitted at the transmitter, a copy per antenna.The receiver uses two of its antennas to receive.When a receiver receives the dummy packet, it actually receives four copies of it.The first two copies are received at one antenna; these copies are transmitted through different antennas at the transmitter.The second two copies are received at the second antenna.
The receiver obtains the CSI vector for each received copy of the packet.Since these copies are transmitted through two different transmit antennas and received at two different receive antennas, the surrounding environment, including human activity in the vicinity, affects the signals of the packets differently, and thus, the four vectors are not identical.The four vectors constitute a CSI matrix together.The receiver then sends the CSI matrix and its associated receipt timestamp to the laptop through a wired Ethernet link.The laptop is then responsible for handling, storing, and visualizing the computed CSI data.

Data Collection
Each vector in the CSI matrix has 56 entries, an entry per OFDM subcarrier.Each entry in the CSI vector is called a CSI value.Each CSI value includes two items: an amplitude and the phase of the subcarrier.Thus, for each packet, the laptop records of 56 × 4 × 2 = 448 CSI values.
In the experiments, the user performs certain activities for five seconds.For example, the user moves from left to right.During this time, the transmitter is configured to send 100 packets, that is, at a rate of 20 packets/s or at 50 ms intervals, so user activity affects the CSI matrices of the packets received at the receiver.A hundred CSI matrixes together constitute one sample.
Figure 2 depicts the two indoor environments, namely a classroom (6.5 m × 8.0 m) and a meeting room (5.5 m × 6.0 m), used for experiments and four points, between which the users walked during experiments.There are more tables and chairs in the classroom compared to the meeting room.The position of objects can affect the propagation of the signal in different ways, leading to different multipath effects in different environments.We set the transmitter and receiver at a height of 0.75 m in each environment, with a distance of 1.2 m between them.The samples were collected for four different activities, i.e., walking directions: left (from A to B), right (from B to A), up (from C to D), and down (from D to C). Nine volunteers with different genders (seven males and two females), ages (from 17 to 29 years old), heights (from 1.68 m to 1.88 m), and walking styles participated in the experiments.Each volunteer walked in each direction about five to seven times in both rooms, resulting in 186 and 182 samples from the classroom and meeting room, respectively.For example, if each of the nine volunteers had walked in each of the four directions five times, the total number of samples collected would be 5 × 4 × 9 = 180 samples for one room.

Phase Calibration
Figure 3 depicts the phases of one sample for the first thirty subcarriers.The stream ij represents CSIs of the packet copies that are transmitted from TX antenna i and received at RX antenna j.The raw phase information becomes impractical for usage due to the impact caused by the carrier frequency offset (CFO) and sampling frequency offset (SFO).CFO arises from a lack of synchronization in timing and phases between the transmitter and receiver before transmitting the packet [49].Figures 3a and 4a depict the raw phases of the first thirty subcarriers and a single subcarrier of one sample, respectively.It is obvious that the raw phase is not sufficient to provide insights into walking activity.However, Figures 3b and 4b show that after calibration, the new phase data exhibits reduced noise, and the change in phase during walking activity is noticeable.

Denoising the CSI Amplitudes Using the Hampel Filter Algorithm
Amplitudes are affected by various types of outlier and noise, from limited bandwidth to transition rate and power adaptations, as well as thermal noise.Consequently, signal outliers arise that are not attributable to human actions.To address this problem, the Hampel filter algorithm [50] is applied.This algorithm applies the median and median absolute deviation to identify the outliers' location.The Hampel filter is a robust method for outlier detection that uses a sliding window approach to identify and remove data points that deviate significantly from the median value in that window.

Feature Extraction from CSI Amplitudes Using Discrete Wavelet Transform
The wavelet transform is a powerful technique that allows for the precise localization of events in both time and frequency domains.By utilizing the wavelet transform as a feature extractor, we can achieve higher accuracy in event detection compared to statistical features.The approach is based on the idea of analyzing time series data on multiple scales.With smaller-scale wavelets, the transformed signal will exhibit large peaks during the event, allowing the identification of the occurrence of the event in the time domain [31].
To extract the useful features of CSI amplitudes, we used DWT.The DWT algorithm can be used to decompose CSI into multiple levels of wavelet coefficients, each representing different frequency bands in the signal.The high-frequency wavelet coefficients represent the noise and other high-frequency components of the signal, while the low-frequency coefficients represent the smoother components of the signal.By filtering out the high-frequency coefficients representing noise and high impulses, as shown in Figures 5a and 6a, and retaining the low-frequency coefficients, the denoised signal is obtained in Figures 5b and 6b.

Comparison of Denoised Phase and Amplitudes for Different Activities
We removed the noise from CSI phases using calibration and the outliers and noise from amplitudes using the Hampel filter algorithm, selected the important features, and removed the remaining noise using DWT.Figures 7 and 8 show the comparison of denoised amplitudes and phases, respectively, for different walking directions.For left and right walking directions, both amplitudes and phases have sharp fluctuations for all streams, which indicates the moment when the user crosses the LOS line between the transmitter and the receiver.Moreover, the data presented in the figures demonstrate a noticeable difference in the amplitude and phase patterns produced through different activities.Our observations have led us to believe that CSI data can be utilized to recognize different walking directions.The CSI for the received packet consists of N tx × N rx × N s matrix, where N tx and N rx denote the antenna numbers for the transmitter and receiver, respectively, and N s represents the subcarriers across the OFDM channel.Specifically, with N tx = 2 and N rx = 2, one packet consists of four distinct CSI values or streams, each containing values for 56 subcarriers.We selected only four subcarriers out of the total 56, which significantly reduces the computational complexity.This reduction enables faster and more real-time direction detection in various practical scenarios.We specifically chose subcarriers from different ranges within the 56 subcarriers to ensure that they represent frequencies that are far from each other and react differently to the changes caused by human walking.These data can be represented in the following format: 22 , CSI 2,33 , CSI 1,44 }, CSI 3 = {CSI 3,11 , CSI 3,22 , CSI 3,33 , CSI 1,44 }, CSI 4 = {CSI 4,11 , CSI 4,22 , CSI 4,33 , CSI 1,44 }, where CSI i,j is the amplitude, phase for i-th stream and j-th subcarrier.

Activity Recognition
In this work, we use several machine learning and deep learning techniques to classify the directions of human walking.In particular, we employed four popular classifiers, namely SVM, RF, KNN, and 1D-CNN.SVM is a supervised learning model that identifies patterns in data.To handle non-linear classification problems, input samples are mapped to a high-dimensional feature space using a kernel function, and a maximum-margin hyperplane is then identified in this space.
RF, on the other hand, is an ensemble learning method that enhances classification performance through the fusion of multiple decision trees.Each tree is constructed using a random subset of the training data, and the final classification decision is made based on the majority vote of the trees.
KNN, a simple but effective classification method, relies on measuring the distance between feature values.The closest point in a scale space is then used to classify the new data point.The distance can be computed using various metrics, such as the Euclidean or Manhattan method.
Finally, 1D-CNN is a deep learning model specifically designed for one-dimensional data such as time series or sequences.The 1D-CNN leverages convolutional layers to automatically learn hierarchical representations from the input data, capturing local patterns that are essential for the direction of human walking in our work.
Our dataset is organized as follows.As mentioned before, we collected in total 186 + 182 = 368 samples of four walking directions from nine users in two rooms.In ad-dition, we also recorded 20 samples for the case of no activity, i.e., no people.Therefore, we have 206 samples from the classroom and 202 samples from the meeting room, totaling 408 samples when combined (classroom + meeting room).Moreover, samples are reorganized as follows.As we discussed earlier, each sample includes 100 CSI matrices, i.To train and test our model, we randomly partitioned the dataset: 80% for data training and the remaining 20% for testing.This approach was used to ensure that the model was tested on new data.We used 10-fold cross-validation to determine the best hyperparameters for each algorithm.Specifically, we assessed the performance of RF with a depth of 12, SVM with a linear kernel, and KNN with three neighbors.
The 1D-CNN model consists of two convolutional layers with rectified linear unit (ReLU) activation functions, followed by max-pooling layers to capture and emphasize key features.The flattened output is then connected to three dense layers (of sizes 512, 256, and 128), each activated via ReLU.The final layer utilizes a softmax activation function to produce probabilistic predictions across five different classes corresponding to distinct human walking directions, as shown in Figure 9.The model is trained using the Adam optimizer with a sparse categorical cross entropy loss function over 100 epochs.

Performance Evaluation 4.1. Metrics
After creating and training the model, it is necessary to evaluate how well the model performs.For this purpose, it is necessary to use a test dataset that is different from the data on which the model is trained.The predicted values are then compared with the actual values.For different types of deep learning and machine learning problems, different quality assessment tools need to be used.For classification tasks, for example, the following tools can be used: error matrix, precision, recall, f1 score, and so on.Such tools are called metrics and are selected depending on the specifics of the task [51].To assess the effectiveness of our system, we use several evaluation metrics: confusion matrix, recognition accuracy, precision, recall, and f1 score.The confusion matrix displays the classification results for each walking direction and the corresponding actual walking direction performed by the user.Each cell within the matrix denotes the proportion or percentage of accurately classified activities.However, the recognition accuracy represents the percentage of human walking directions accurately classified by our system.Precision quantifies the ratio of correctly identified positive instances to the overall number of positive classifications generated by the model.In contrast, recall represents the ratio of correctly identified positive instances to the total number of positive instances present in the dataset.The f1 score serves as the harmonic mean of precision and recall, offering a unified metric that strikes a balance between both evaluation measures.Together, these metrics provide a comprehensive evaluation of the performance of our system.

Accuracy =
TP + TN TP + TN + FP + FN × 100% where TP represents true positives, TN represents true negatives, FP represents false positives, and FN represents false negatives.The source code for preprocessing, training, and testing is published on Github [52].A raw dataset that we gathered for this study is available on Figshare [53].

Impact of Environment Change
Figure 10 depicts the comparison of the HWDD accuracy performance of different classifiers.Figure 10a shows the comparison for the case when only CSI amplitudes are used.When trained and tested on combined data from both rooms, the RF, SVM, KNN, and 1D-CNN classifiers have accuracy rates of 78%, 89%, 82.9%, and 84.2%, respectively.In general, SVM is the most consistent and performing model in both environments.Similarly, SVM achieved high accuracy rates of 92.9% and 95.1% in the case of classroom and meeting room data, respectively.This difference in performance is caused by variations in the data collected from each environment.The classroom has more tables and chairs that can reflect signals, making it more challenging to accurately classify the walking direction.In contrast, a meeting room has less furniture and is smaller, allowing the model to perform better.Figure 10c shows the comparison for the case where classifiers are trained and tested with both CSI amplitudes and phases.For both datasets in both rooms combined, SVM performed consistently well compared to RF, KNN, and 1D-CNN, achieving a high accuracy rate of 87.8%.For the classroom and meeting room, SVM achieved accuracies of 92.9% and 95.1%, respectively.However, when we used only CSI phases for training testing, the accuracy results were not comparable to the CSI amplitudes.The one-dimensional CNN achieved the highest accuracy of 66.7%, 87.8%, and 68.3% for the classroom, meeting room, and both rooms, respectively, as shown in Figure 10b.
Overall, the results of SVM are promising for HWDD using CSI amplitudes.We explore the use of CSI amplitudes, CSI phases, and a combination of both in our approach.Our analysis revealed that the CSI amplitudes exhibited the most consistent performance in the face of environmental variations, different users, and classifiers, emphasizing their superiority over other techniques.
Table 2 displays the confusion matrices of the SVM classifier, showcasing its performance in different environments using CSI amplitudes.In the classroom, our approach achieved an average accuracy of 92.9%.It accurately recognized the right and up directions with rates of 88.89% and 77.78%, respectively.Furthermore, it correctly detected all other activities with an accuracy rate of 100%.In the meeting room, our approach achieved an average accuracy of 95.1%.It successfully recognized the left direction with an accuracy rate of 85.71%, and the down and up directions with an accuracy rate over 75%.Similarly, it accurately classified all other activities with an accuracy rate of 100%.Finally, in both rooms, our approach achieved an average accuracy of 89%.It accurately recognized the down, left, right, and up directions with an accuracy rate greater than 84%.The empty activity was also identified correctly, with an accuracy rate of 100%.In Figure 11, the accuracy of the 1D-CNN classifier is presented for training and testing data using CSI amplitudes over 100 epochs.In the classroom, our model achieved accuracy rates of 100% and 88.1% for training and testing data, respectively, as shown in Figure 11a.In the meeting room, the 1D-CNN classifier performed well compared to RF, SVM, and KNN, achieving high accuracy rates of 100% for both training and testing data, as illustrated in Figure 11b.For the combined data from both rooms, our model achieved accuracy rates of 100% and 84.2% for training and testing data, respectively, as depicted in Figure 11c.Overall, the model performed well in small spaces.However, its performance in large and diverse rooms was not as consistent as that of the SVM classifier.To enhance the reliability and robustness of deep learning algorithms, it is suggested to collect more samples.Table 3 shows the comparison of the precision, recall, and f1 score for different environments.The SVM achieved the best performance using CSI amplitudes.The results showed that the system achieved high precision and a f1 greater than 80% for all activities.This indicates that the system has a high level of accuracy in identifying each activity, with an extremely low rate of false positives.The system also achieved a recall rate greater than 85% for all activities, except for the up activity, which had a recall rate of 78%.These results suggest that the system can accurately detect and classify human activities with a high level of consistency.

Impact of Individual Diversity
Figure 12 depicts the accuracy performance for an increasing number of individuals.To evaluate the consistency of our approach among diverse users, we enlisted the participation of nine volunteers in the experiment.The selected volunteers have different ages, heights, weights, and genders.Our proposed method utilizing CSI amplitudes yielded exceptional results in the meeting room, as depicted in Figure 12b.SVM and KNN achieved 100% accuracy across all cases except for one where the accuracy rate was 87.5%.Additionally, 1D-CNN also produced the highest possible accuracy rates in most cases except for two cases where accuracy was 88.9% and 75%, respectively.
According to Figure 12a, in the classroom, the 1D-CNN achieved an average accuracy of over 84%, while the SVM and KNN classifiers achieved average accuracy rates of over 77% and 72%, respectively.In contrast, the accuracy of the RF is not comparable to that of other classifiers for both environments.These findings highlight the importance of using 1D-CNN and SVM classifiers with CSI amplitudes to improve accuracy, especially for different users and environments.One-dimensional CNN and SVM outperformed RF and KNN when training each user individually because they are skilled at picking up detailed patterns from CSI amplitudes.This allows them to create more accurate models for each user's unique wireless signal variations, resulting in better performance in distinguishing activities.

Impact of Increasing the Number of People
The findings of this study, as illustrated in Figure 13, indicate that our proposed method, which utilizes CSI amplitudes and the SVM classifier, yields high accuracy in both environments as the number of participants increases from one to nine.We collected data from both rooms for each participant, where the number "1" refers to the data of both rooms for one user, "2" represents the data of both rooms for two users, and so on.As the number of participants increases, the accuracy of the SVM classifier also increases.On the contrary, the accuracy of the RF, KNN, and 1D-CNN classifiers deteriorates.

Impact of Training Size
Our proposed method has demonstrated the ability to achieve comparable results using a limited number of training samples, making it a more practical option for real-world applications.Using the CSI amplitudes and the SVM classifier in the combined data of both rooms, our approach was able to achieve high accuracy, even with varying sizes of training data for both environments, as depicted in Figure 14.The SVM classifier yielded similar results, with slight differences observed when different training sizes were used.SVM consistently performs well with varying amounts of training data due to its capability to construct an optimal hyperplane for classification.This is especially effective in highdimensional spaces like those formed from CSI amplitudes, contributing to its stability across diverse training set sizes.These results indicate the robustness of our method across different data testing sizes.To further improve the performance and effectiveness of our approach, we can collect more samples to increase the size of the data training in the future.

Robustness
The results of the experiment demonstrate that the accuracy remains consistent with the increasing number of volunteers when using the SVM classifier and the CSI amplitudes in both rooms in Figure 13.Furthermore, when using different training sizes, the accuracy remained comparable in both rooms, according to Figure 14.The recognition accuracy achieved was 92.9%, 95.1%, and 89% in the classroom, meeting room, and both classroom and meeting room, respectively, as shown in Figure 10a.These results demonstrate that our system is robust to different individuals, environments, group sizes, and training sizes.
The performance of our approach by utilizing CSI amplitudes and SVM classifier in two indoor environments was evaluated and compared with two other indoor recognition systems, namely WiFi-ID [54] and WiDIGR [43], in two common indoor scenarios.Figure 15 shows the performance of each system, from the smallest to the largest group size of volunteers.The results indicate that the average accuracy of all systems decreases as the group size increases.In particular, our approach outperformed the other two systems, which rely solely on feature extraction and cannot accurately reflect movement information.Therefore, the reason our approach performed significantly better than the other systems is that those systems were not able to obtain precise movement information.

Conclusions and Future Work
In this paper, we introduced a device-free method that can precisely identify the direction of human walk using the CSI of Wi-Fi signals and machine and deep learning algorithms.Raw CSI signals are first calibrated and effectively denoised using Hampel filter and DWT algorithms.We conducted extensive experiments in two indoor environments.Our system, using SVM and CSI amplitudes, achieved recognition accuracy rates of 92.9%, 95.1%, and 89% in the classroom, meeting room, and both rooms, respectively.Additionally, our system, employing 1D-CNN and CSI amplitudes, demonstrated recognition accuracy rates of 88.1%, 100%, and 84.2% in the classroom, meeting room, and both rooms, respectively.
Our experiments consistently proved the robustness of our system in various scenarios.The accuracy remained stable even with an increasing number of volunteers and different training sizes, different individuals, environments, group sizes, and training data variations.Our approach demonstrates versatility in its applicability in various scenarios.It can be effectively employed to track the walking directions of customers in retail stores.Furthermore, in security systems, it proves valuable for monitoring human walking directions and detecting unauthorized access in restricted areas.However, our study has limitations, including a small dataset that may impact the accuracy of deep learning algorithms.Volunteers walking in straight lines limited the diversity of observed environmental conditions, although this assumption does not significantly impact system usability in common settings like homes or offices, where straight walkways are prevalent.Additionally, the study lacks samples collected in the presence of multiple individuals.
Our future research work will focus on improving the accuracy of human walking direction detection in the presence of multiple individuals and in the collection of more samples.We aim to enhance the efficiency and accuracy of activity recognition in complex and dynamic indoor environments.

Figure 5 .Figure 6 .
Figure 5. Outlier removal and noise reduction based on Hampel filter algorithm and DWT for 30 subcarriers.Different colors represent different subcarriers.

Figure 7 .
Figure 7. Denoised amplitudes of four walking directions for one subcarrier.
e., one CSI matrix per packet.An original CSI matrix includes 56 × 4 = 224 CSI values; however, we further use CSIs of only four subcarriers, and thus now, each matrix includes 4 × 4 = 16 CSI values.

Figure 8 .
Figure 8. Calibrated phases of four different walking directions for one subcarrier.

Figure 12 .
Figure 12.Accuracy for increasing the number of individuals in two indoor environments using CSI amplitudes.

Figure 13 .
Figure 13.Accuracy of increasing the number of people in two indoor environments using amplitudes.

Figure 14 .
Figure 14.Impact of different training size using amplitudes.

Figure 15 .
Figure 15.Comparison with the baseline recognition systems.

Table 2 .
Confusion matrix of SVM classifier using CSI amplitudes.

Table 3 .
Comparison between different metrics and environments using CSI amplitudes.

Table A1 .
Comparison between Wi-Fi-based system of human gesture detection.

Table A2 .
Comparison between Wi-Fi-based system of human activity detection.

Table A3 .
Comparison between Wi-Fi-based system of human movement detection.

Table A4 .
Comparison between Wi-Fi-based system of human walking direction detection.