1. Introduction
Gait, as a significant biological characteristic of humans, can serve as an early indicator of underlying health issues. Deviations from normal gait patterns may signal conditions stemming from brain function deterioration, neurological disorders, or musculoskeletal problems, such as Parkinson’s disease [1]. Abnormal gait recognition involves the detailed analysis of biomechanical features and movement patterns during walking to identify irregularities [2]. Accurate identification of abnormal gait patterns not only facilitates prompt and targeted medical interventions but also significantly enhances treatment efficacy, potentially leading to better patient outcomes and quality of life [3].
Recently, human activity analysis based on Channel State Information (CSI) has gained much attention owing to its noninvasive nature, ubiquity, wide coverage, cost efficiency, and privacy protection [4,5,6,7,8,9]. The studies in [4,5] focused on identity recognition using gait features, specifically aiming to identify or verify an individual’s identity based on their unique walking patterns. In our previous work [6], a deep learning architecture based on a convolutional neural network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) was proposed for recognizing complex continuous human activities. The results demonstrated that the network achieves high accuracy in recognizing human movements involving rapid and drastic actions. In Ref. [7], a passenger-counting system based on Wi-Fi sensing was proposed and validated through practical deployment on buses, demonstrating its potential for real-time application scenarios. The study presented in [8] proposed a fine-grained finger gesture recognition system using commercial Wi-Fi. This system leverages the principal components of CSI and selects critical subcarriers for accurate gesture recognition. The extraction of principal components enables the system to adapt to individual diversity and gesture inconsistency.
These pioneering studies on Wi-Fi sensing highlight the effectiveness of wireless signals in recognizing human activities and identifying individuals. They have further inspired the work of [10], which focused on distinguishing between normal and abnormal gait patterns. That work addresses a coarse-grained binary classification problem. However, in practical applications, it is often necessary to identify specific types of abnormal gaits, which requires fine-grained classification of various abnormal gait types [11]. To the best of our knowledge, research on multi-class fine-grained abnormal gait recognition remains scarce. This gap motivates our study, which aims to develop a fine-grained gait recognition system capable of distinguishing multiple types of abnormal gaits.
In this work, we propose a deep learning architecture for fine-grained abnormal gait recognition from Wi-Fi CSI. The proposed framework comprises three key modules, i.e., data collection, data preprocessing, and deep learning-based classification. In the data collection phase, we construct a Wi-Fi sensing platform using two commercial Intel 5300 network interface cards (NICs), one for transmitting and the other for receiving, which allows CSI data collection and dataset building. In the data preprocessing phase, we apply wavelet filtering and linear calibration to reduce the noise in the CSI amplitude and the nonlinear distortion in the CSI phase, respectively. We then construct a deep learning classification module based on CNN-BiGRU with an attention mechanism for gait recognition from the processed CSI data. Here, the CNN is used to extract spatial features of the motion, while the BiGRU is employed to learn bidirectional temporal features from the motion’s past and future. Compared with unidirectional recurrent structures such as LSTM and GRU, BiGRU processes temporal information in both directions (i.e., it considers both past and future information) and thus captures the correlation and dependence of sequential data before and after each time step. To assess the impact of different environments on the recognition performance of the proposed method, experiments were conducted under various conditions in different locations. The results demonstrate that the method achieves high-precision recognition of abnormal gaits, with an average recognition accuracy exceeding 95%. Compared with baseline methods, the proposed approach achieves at least a 2% improvement in recognition accuracy.
The contributions of this paper are summarized as follows:
- (1) We investigate a fine-grained abnormal gait recognition method using Wi-Fi CSI. Our goal is to identify seven distinct gait classes, including six abnormal and one normal gait. This work focuses on multi-class classification, an area that has not been extensively explored in the context of Wi-Fi sensing.
- (2) We propose a novel deep learning architecture for fine-grained abnormal gait recognition, combining CNN, BiGRU, and an attention mechanism. This architecture captures both spatial and temporal features of CSI data through CNN and BiGRU, respectively, addressing the limitations of relying on a single feature extraction method. The attention mechanism is incorporated to enhance feature focus, further improving overall performance.
- (3) Unlike traditional designs that only consider amplitude, this paper takes into account both amplitude and phase information. Experiments demonstrate that phase information improves the recognition performance for gait.
2. System Model
Figure 1 illustrates the overall structure of the Wi-Fi perception system, which comprises three core modules: data acquisition, data preprocessing, and activity classification. Specifically, the data acquisition module captures raw CSI data using a network interface card. Subsequently, the data preprocessing module processes these raw data, including noise reduction and calibration of amplitude attenuation and phase shifts, and distinguishes between active and inactive regions of the CSI data based on amplitude variance. Finally, the activity classification module employs a classifier built using a neural network, which receives the denoised and calibrated CSI data from the active regions, automatically extracts spatial and temporal features of the CSI, and performs classification.
2.1. Data Collection Module
We construct a data collection platform based on two personal computers (PCs), each equipped with an Intel 5300 NIC, as shown in Figure 2. Note that we did not use a commercial Wi-Fi router as the transmitter, since we found that it sometimes lost data packets. To ensure high-quality data collection, we built a dedicated transceiver from the two NICs. At the transmit side, we used only one of the three antennas of the NIC and left the other two unused. At the receive side, we used all three antennas of the receiving NIC. In this way, a single-input multiple-output (SIMO) Wi-Fi transceiver was constructed, and we used the widely adopted CSI Tool [12] to parse the received packets and obtain three channels of CSI. Most existing commercial Wi-Fi technologies employ the IEEE 802.11 a/g/n wireless communication protocols, whose core is orthogonal frequency division multiplexing (OFDM), which modulates signals onto multiple subcarriers for parallel transmission.
The essence of OFDM is to convert a broadband channel into multiple parallel narrowband channels, where the channel on each subcarrier can be regarded as a flat-fading channel, thereby significantly reducing the complexity of the receiver equalizer [13]. The baseband signal after down-conversion in OFDM at the $m$-th receive antenna can be expressed as

$$\mathbf{Y}_m = \mathbf{H}_m \odot \mathbf{X} + \mathbf{N}_m, \quad m = 1, 2, 3, \qquad (1)$$

where $m$ denotes the index of the receive antennas, $\mathbf{H}_m \in \mathbb{C}^{K \times T}$ is the CSI matrix at the $m$-th antenna, $\mathbf{Y}_m$ is the received baseband signal matrix, $\mathbf{N}_m$ represents the noise matrix during the transmission process, $\mathbf{X}$ denotes the transmitted data at the transmit side, and $\odot$ denotes the element-wise product. $K$ and $T$ denote the number of subcarriers and time slots, respectively.
The entries of the OFDM CSI matrix depend on the wireless signal propagation environment. Factors such as path loss, reflection, scattering, and refraction affect the CSI of OFDM subcarriers. Moving objects in the physical space dynamically change the time–frequency characteristics of CSI. To show this intuitively, examples of three-dimensional (3D) plots of the collected CSI amplitude are shown in Figure 3, with the packet index representing the time domain and the subcarrier index representing the frequency domain. The objective of Wi-Fi gait recognition is to analyze the time and frequency characteristics of the CSI data to identify types of human activity.
2.2. Data Preprocessing Module
2.2.1. Amplitude Processing
To handle outliers and noise in the CSI data, we first apply a Hampel filter with a sliding window to detect and replace outliers caused by environmental interference or equipment anomalies. Data points outside the range $[\mu - \gamma\sigma, \mu + \gamma\sigma]$ are identified as outliers and replaced with the median $\mu$ of the window, where $\sigma$ is the robust standard deviation estimate of the window and $\gamma$ is typically set to 3 [14]. Next, we use the wavelet transform to reduce noise [15]. This involves decomposing the signal, applying a threshold to quantize the coefficients, and reconstructing the signal to obtain a denoised version while preserving important features of the useful signal. Figure 4 shows the effect of outlier removal using the Hampel filter and noise reduction using the wavelet transform (illustrated with an example of Scissors Gait data).
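The two amplitude-cleaning steps can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the window half-width, the MAD-based estimate of the standard deviation, the Haar wavelet, the number of decomposition levels, and the universal soft threshold are all assumptions, since the text does not specify them.

```python
import numpy as np

def hampel_filter(x, half_window=5, gamma=3.0):
    """Replace points farther than gamma * sigma from the local median."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    scale = 1.4826  # makes the MAD a consistent sigma estimate for Gaussian data
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        sigma = scale * np.median(np.abs(window - med))
        if abs(x[i] - med) > gamma * sigma:
            y[i] = med  # outlier: substitute the window median
    return y

def haar_denoise(x, levels=3, threshold=None):
    """Decompose with a Haar wavelet, soft-threshold the details, reconstruct."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = x, []
    for _ in range(levels):  # decomposition
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    if threshold is None:
        # Universal threshold; noise level estimated from the finest details.
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2 * np.log(n))
    details = [np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0) for d in details]
    for d in reversed(details):  # reconstruction
        out = np.empty(2 * len(d))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx
```

In a pipeline of this kind, each subcarrier amplitude series would typically be passed through `hampel_filter` first and `haar_denoise` second, so that isolated spikes do not leak into the wavelet coefficients.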
2.2.2. Phase Calibration
The phase information extracted from the original CSI data contains carrier frequency offset (CFO) and sampling frequency offset (SFO), which makes it unusable directly. Therefore, a linear transformation method is utilized to calibrate the phase information [16]. The measured phase on the $i$-th subcarrier is denoted as

$$\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N}\,\delta + \beta + z. \qquad (2)$$

In this context, $\hat{\phi}_i$ represents the original (measured) phase, $\phi_i$ represents the true phase, $\delta$ is the time offset caused by the SFO, $\beta$ is the unknown phase offset caused by the CFO, $z$ is measurement noise, $k_i$ denotes the index of the $i$-th subcarrier, and $N$ is the length of the Fast Fourier Transform (in IEEE 802.11n, $N = 64$). Next, by subtracting the linear term $a k_i + b$ from the original phase, $\delta$ and $\beta$ can be eliminated, resulting in the calibrated phase. Here, the linear term is defined by

$$a = \frac{\hat{\phi}_n - \hat{\phi}_1}{k_n - k_1}, \qquad b = \frac{1}{n}\sum_{j=1}^{n} \hat{\phi}_j, \qquad (3)$$

where $n$ is the number of subcarriers. After calibration using the linear transformation, the phase can be expressed as:

$$\tilde{\phi}_i = \hat{\phi}_i - a k_i - b. \qquad (4)$$

Figure 5 compares the phase before and after data preprocessing (using Scissors Gait as an example); after calibration, the phase becomes a usable signal.
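The linear calibration can be illustrated with a short sketch. It assumes the commonly used form of the linear transformation, in which the slope is estimated from the first and last subcarriers and the offset from the mean measured phase; the function name and the explicit unwrapping step are illustrative choices, not details confirmed by the text.

```python
import numpy as np

def calibrate_phase(measured, k):
    """Remove the linear SFO/CFO term from one packet's subcarrier phases.

    measured: raw phases (radians) of the n reported subcarriers.
    k:        the corresponding subcarrier indices.
    """
    # Unwrap first so that 2*pi jumps do not corrupt the slope estimate.
    measured = np.unwrap(np.asarray(measured, dtype=float))
    k = np.asarray(k, dtype=float)
    a = (measured[-1] - measured[0]) / (k[-1] - k[0])  # slope of the linear term
    b = measured.mean()                                # constant offset
    return measured - a * k - b
```

For a packet whose true phase is flat and whose subcarrier indices are symmetric about zero, the returned phase is zero regardless of the injected time offset and phase offset, which is a convenient sanity check for the routine.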
2.2.3. Activity Segmentation
Due to the presence of static information, namely the inactive parts, in CSI data, feeding this portion of the data into a neural network would increase the complexity of the algorithm. Therefore, it is essential to effectively distinguish between the active and inactive parts of the data, discard the inactive parts, and use the amplitude and phase information of the active parts as input to the neural network. Complete active data are also key to improving the classification accuracy of the neural network. For complex and vigorous continuous activities, the variance of the active part data is much greater than that of the inactive part data. Hence, based on this phenomenon, an activity threshold is preset. Additionally, due to the sensitivity of CSI, there may be brief fluctuations in the inactive parts that could be mistakenly classified as active parts. To obtain more complete and accurate active data, a window threshold is introduced, aiming to eliminate the erroneous classifications caused by these fluctuations. The specific steps of the dual-threshold-based activity segmentation method proposed in this paper are as follows.
Step 1: Apply PCA to the matrix composed of amplitudes, automatically select the principal components that represent the most common variations in the CSI time series, and obtain the principal component matrix, which reflects the variations in subcarrier amplitudes.
Step 2: Perform activity segmentation using the first principal component. By applying a sliding window approach, calculate the variance of the data points within the window and return the data sequence composed of these variances. This results in the moving variance of the first principal component, which is used as an indicator for activity segmentation.
Step 3: Given an activity threshold, activity is deemed to start when the moving variance of the first principal component exceeds the threshold and to end when it falls below the threshold. Sample points whose variance is greater than the activity threshold are marked as the active portions of the CSI data.
Step 4: Given a window threshold, active segments whose length (measured in packet indices) is smaller than the window threshold are relabeled as inactive data, thereby obtaining the final labeled data.
Figure 6 illustrates this effect, where the dashed-line boxes roughly outline the indexed segments marked as active, while solid circles represent brief fluctuations.
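The four steps above can be sketched as a single function. The parameter names and default values (`var_window`, `activity_threshold`, `min_length`) are hypothetical, and PCA is computed here through an SVD of the mean-centered amplitude matrix, which is one standard way to obtain the first principal component.

```python
import numpy as np

def segment_activity(amplitude, var_window=50, activity_threshold=0.5, min_length=100):
    """Dual-threshold segmentation; amplitude has shape (T, K). Returns a bool mask of length T."""
    amplitude = np.asarray(amplitude, dtype=float)
    centered = amplitude - amplitude.mean(axis=0)
    # Step 1: PCA via SVD; keep the scores of the first principal component.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    # Step 2: moving variance of the first principal component.
    left = var_window // 2
    padded = np.pad(pc1, (left, var_window - 1 - left), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, var_window)
    moving_var = windows.var(axis=1)
    # Step 3: activity threshold on the moving variance.
    active = moving_var > activity_threshold
    # Step 4: window threshold; relabel short active runs as inactive.
    edges = np.flatnonzero(np.diff(np.concatenate(([0], active.astype(int), [0]))))
    for start, stop in zip(edges[0::2], edges[1::2]):
        if stop - start < min_length:
            active[start:stop] = False
    return active
```

Both thresholds depend on the amplitude scale and packet rate of the recording, so in practice they would need to be tuned on the collected data rather than taken from the defaults shown here.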
2.3. Gait Recognition Module
The preprocessed CSI data are then sent to the gait classification module, where the CSI contains not only the spatial features of actions but also their temporal features. LSTM and GRU are capable of learning dependencies and correlations between long sequences of information, capturing historical information and significant events with large intervals or delays. However, LSTM and GRU networks, which possess temporal modeling capabilities, do not consider the extraction of spatial features of actions. CNN, characterized by local connections and weight sharing, possesses powerful feature extraction capabilities but neglects the correlation between temporal information. Furthermore, due to their structural characteristic of transmitting temporal information in a single direction, LSTM and GRU can only consider past temporal information of actions, neglecting the learning of patterns from future information. BiGRU can extract temporal features from both past and future directions, but it assigns equal weights to the features of all CSI, whereas different features may contribute differently to the recognition of abnormal gait actions. Therefore, activity recognition systems in existing work have not fully exploited the spatiotemporal features of actions, leading to suboptimal recognition accuracy. To address this, this paper proposes a novel Wi-Fi-based abnormal gait perception framework that integrates CNN-BiGRU with an attention mechanism.
The model architecture is illustrated in Figure 7, where the input signal is a two-dimensional matrix obtained by stacking and expanding the amplitude and phase components of the CSI matrices $\mathbf{H}_m$, $m = 1, \ldots, M$, which correspond to the three antennas of the Wi-Fi network card:

$$\mathbf{S} = \left[\mathbf{A}_1^{\top}, \ldots, \mathbf{A}_M^{\top}, \boldsymbol{\Phi}_1^{\top}, \ldots, \boldsymbol{\Phi}_M^{\top}\right]^{\top}. \qquad (5)$$

In this context, $\mathbf{A}_m$ and $\boldsymbol{\Phi}_m$ represent the amplitude matrix and phase matrix, respectively, obtained by extracting the amplitude and phase of each element in the matrix $\mathbf{H}_m$. Here, $M = 3$ indicates that there are data from a total of three antennas.
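As an illustration of how such an input matrix might be assembled, the sketch below stacks the amplitude rows of all antennas above the phase rows. The row ordering and the function name `build_input` are assumptions; in the full pipeline the denoised amplitude and calibrated phase would take the place of the raw values used here.

```python
import numpy as np

def build_input(csi):
    """Stack amplitude and phase of a complex CSI array.

    csi: complex array of shape (M, K, T) for M antennas, K subcarriers, T packets.
    Returns a real array of shape (2 * M * K, T).
    """
    csi = np.asarray(csi)
    m, k, t = csi.shape
    amplitude = np.abs(csi).reshape(m * k, t)  # rows: antenna 1 subcarriers, antenna 2, ...
    phase = np.angle(csi).reshape(m * k, t)
    return np.vstack([amplitude, phase])
```

With three antennas and the 30 subcarriers per antenna reported by the Intel 5300 CSI Tool, this layout yields 180 feature rows per packet.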
Neural networks require consistent input data dimensions. However, due to the varying durations of each action, the lengths of their data packets differ. Additionally, increasing the input data dimension will also increase the time complexity of the algorithm. Therefore, to ensure consistent input data dimensions and reduce the complexity of the algorithm, the designed network applies a sliding window at the input layer to segment the two-dimensional matrix along the time series direction. Segments with less than 60% of labeled active sample points are discarded to remove inactive data from the CSI, obtaining data segments of the same dimension. The retained segmented data segments serve as the final input to the network. The input data undergo feature extraction through two branches, with feature fusion serving as the basis for the final classification. The first branch is built on a one-dimensional CNN to extract features in the spatial dimension of gait movements. The second branch is built on GRU and BiGRU to extract features in the temporal dimension. The extracted features from both the spatial and temporal dimensions are integrated and used as the final basis for classification, with the softmax function employed to classify the actions.
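The input-layer windowing can be sketched as follows. Only the 60% rule comes from the text; the window length, the stride, and the function name are hypothetical values chosen for illustration.

```python
import numpy as np

def window_segments(features, active, window=200, stride=100, min_active=0.6):
    """Cut fixed-length segments along time and keep those that are mostly active.

    features: array of shape (F, T); active: boolean mask of length T.
    Returns an array of shape (N, F, window).
    """
    features = np.asarray(features)
    active = np.asarray(active, dtype=bool)
    segments = []
    for start in range(0, features.shape[1] - window + 1, stride):
        # Discard segments with less than min_active labeled active samples.
        if active[start:start + window].mean() >= min_active:
            segments.append(features[:, start:start + window])
    if not segments:
        return np.empty((0, features.shape[0], window), dtype=features.dtype)
    return np.stack(segments)
```

Because every retained segment has the same shape, the resulting array can be fed to the network in batches without padding.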
2.3.1. GRU and BiGRU
A recurrent neural network (RNN) has short-term memory and offers significant advantages when dealing with short time series. However, when dealing with long time series, the issue of vanishing gradients may arise. The subsequently proposed LSTM and GRU alleviate this issue. GRU is a variant of LSTM. Compared to LSTM, it simplifies the gating mechanism and does not introduce additional memory units. It controls the updating of information only through the update gate and reset gate. The GRU structure is shown in Figure 8a, which includes three parameters: the update gate $z_t$, the reset gate $r_t$, and the hidden state $h_t$. These parameters are updated through Equations (6)–(9).

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \qquad (6)$$

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \qquad (7)$$

$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t]\right) \qquad (8)$$

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \qquad (9)$$

where $W_z$, $W_r$, and $W_h$ are weight matrices, $x_t$ represents the temporal information at time $t$, $h_{t-1}$ denotes the hidden state at time ($t-1$), $\tilde{h}_t$ is the candidate hidden state, $\odot$ denotes the element-wise product, and $\sigma$ is the sigmoid activation function.
Since the collected gait movements are continuous actions, past and future information are equally important for action recognition. BiGRU can extract temporal features from both past and future directions. Therefore, BiGRU is selected to learn the bidirectional patterns of motion features in order to extract more comprehensive features. The bidirectional GRU structure is shown in Figure 8b. BiGRU consists of forward and backward GRUs, and the final state $h_t$ is jointly determined by the hidden states of both forward and backward GRUs. This state is then taken as the output of BiGRU.
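A minimal sketch of a GRU step and a bidirectional pass is given below. It uses the standard GRU update in concatenated form with biases omitted, and it assumes that the forward and backward hidden states are combined by concatenation; the text only states that the final state is jointly determined by both directions, so this combination rule is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU update; each weight matrix has shape (hidden, hidden + input)."""
    concat = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ concat)                                   # update gate
    r = sigmoid(W_r @ concat)                                   # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand

def gru_forward(xs, params, hidden):
    """Run a GRU over a sequence xs of shape (T, input); returns states of shape (T, hidden)."""
    h = np.zeros(hidden)
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, *params)
        states.append(h)
    return np.stack(states)

def bigru(xs, fwd_params, bwd_params, hidden):
    """Concatenate forward states with time-aligned backward states: shape (T, 2 * hidden)."""
    forward = gru_forward(xs, fwd_params, hidden)
    backward = gru_forward(xs[::-1], bwd_params, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)
```

In a trained model the weight matrices would be learned; here they are plain arguments so that the data flow of the forward and backward passes is easy to follow.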
2.3.2. Attention Mechanism
The CNN-BiGRU model architecture proposed above can effectively classify actions with significantly different gait patterns. However, for actions with subtle differences in gait, such as Parkinsonian and myopathic gaits, which both exhibit a forward-leaning posture but differ in hand and foot movements, how can we focus on these fine-grained distinctions? To address this, our model architecture incorporates an attention mechanism.
The attention mechanism was initially designed for machine translation and has since been widely applied in the fields of image processing and natural language processing. This concept can be intuitively explained through the analogy of human visual perception: when a person visually perceives objects, they typically focus on specific regions of interest based on their needs. In this way, when similar scenarios reappear in the future, the individual will learn to direct their attention to those relevant areas [17]. BiGRU assigns equal weights to all features of CSI, whereas different features may contribute differently to gait pattern recognition. For instance, both Parkinsonian and myopathic gaits exhibit forward-leaning postures, but their distinct hand and foot movements (one with knee and upper limb flexion, the other with uncoordinated limb movements) significantly impact CSI, as illustrated in Figure 9. These differences necessitate greater focus on variations in hand and foot movement-related CSI. Therefore, implementing an attention mechanism allows higher weights to be assigned to more critical features, enhancing the influence of key information and thereby improving the network’s recognition performance.
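The attention mechanism is illustrated in Figure 10. The input to the attention model is the sequence features learned from the BiGRU network, denoted as $\mathbf{H} = [h_1, h_2, \ldots, h_T]$, where $h_t$ is the feature vector at time step $t$. The importance score $e_t$ for each feature vector is calculated using the tanh function, expressed as:

$$e_t = \tanh\left(\mathbf{w}^{\top} h_t + b\right) \qquad (10)$$

where $\mathbf{w}$ is the weight vector and $b$ is the bias. Subsequently, the scores are normalized using $\alpha_t = \exp(e_t) / \sum_{j=1}^{T} \exp(e_j)$. Finally, the product of the feature vectors and the normalized scores is taken as the final output of the attention mechanism, expressed as:

$$\mathbf{o} = \sum_{t=1}^{T} \alpha_t h_t \qquad (11)$$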
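The attention computation can be sketched in a few lines. The sketch assumes that the "product of the feature vectors and the normalized scores" is aggregated as a weighted sum over time steps, which is the usual form of this kind of attention pooling; the function name is illustrative.

```python
import numpy as np

def attention_pool(H, w, b):
    """Score each time step with tanh, normalize with softmax, and pool.

    H: BiGRU features of shape (T, d); w: weight vector of shape (d,); b: scalar bias.
    Returns (context vector of shape (d,), attention weights of shape (T,)).
    """
    scores = np.tanh(H @ w + b)
    weights = np.exp(scores - scores.max())  # subtract the max for numerical stability
    weights /= weights.sum()
    return weights @ H, weights
```

When `w` is all zeros, every time step receives the same score, so the weights are uniform and the context vector reduces to the mean of the feature vectors; a trained `w` shifts the weights toward the time steps that discriminate between similar gaits.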
4. Conclusions
This paper proposes a fine-grained abnormal gait recognition method that integrates an attention mechanism with CNN-BiGRU. The proposed approach effectively extracts rich spatiotemporal features from abnormal gait movements, enabling high-precision recognition of complex and continuous abnormal gaits. Experimental results demonstrate that the method achieves an average recognition accuracy exceeding 95% in ideal office and laboratory scenarios. However, its accuracy slightly decreases to 92.5% in real-world corridor environments, with most misclassifications occurring between gait patterns containing similar motion components.
Future work will focus on the following directions. The current study has only validated seven gait types (six abnormal and one normal) in controlled environments, and its applicability to real clinical scenarios remains uncertain. Subsequent research should employ transfer learning techniques to address cross-domain generalization challenges. Additionally, model training and inference speeds will be optimized while maintaining recognition accuracy. While preserving data fidelity, we will explore the use of generative adversarial networks (GANs) to enhance model generalization capability and improve classification accuracy. Furthermore, we aim to extend this research to multi-person abnormal gait recognition and advance the practical implementation of Wi-Fi sensing technologies.