Article

Analysis of Sensor Location and Time–Frequency Feature Contributions in IMU-Based Gait Identity Recognition

1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 School of Electronics and Information Engineering, South China Normal University, Foshan 528225, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2025, 14(19), 3905; https://doi.org/10.3390/electronics14193905
Submission received: 23 August 2025 / Revised: 26 September 2025 / Accepted: 28 September 2025 / Published: 30 September 2025

Abstract

Inertial measurement unit (IMU)-based gait biometrics have attracted increasing attention for unobtrusive identity recognition. While recent studies often fuse signals from multiple sensor positions and time–frequency features, the actual contribution of each sensor location and signal modality remains insufficiently explored. In this work, we present a comprehensive quantitative analysis of the role of different IMU placements and feature domains in gait-based identity recognition. IMU data were collected from three body positions (shank, waist, and wrist) and processed to extract both time-domain and frequency-domain features. An attention-gated fusion network was employed to weight each signal branch adaptively, enabling interpretable assessment of their discriminative power. Experimental results show that shank IMU dominates recognition accuracy, while waist and wrist sensors primarily provide auxiliary information. Similarly, the contribution of time-domain features to classification performance is the greatest, while frequency-domain features offer complementary robustness. These findings illustrate the importance of sensor and feature selection in designing efficient, scalable IMU-based identity recognition systems for wearable applications.

1. Introduction

With the rapid development of the Internet of Things (IoT) and wearable devices, the demand for identity recognition and behavioral monitoring in daily life and work scenarios is increasing [1]. To meet these needs, biometric technologies have emerged, which are mainly divided into physiological biometrics and behavioral pattern recognition [2]. Traditional physiological biometrics, such as facial features [3], fingerprints [4], and iris patterns [5], can provide a certain degree of security and accuracy. However, they usually require active user cooperation or dedicated acquisition devices, which imposes significant limitations in continuous and unobtrusive application scenarios [6]. Therefore, researchers have begun to pay attention to identity recognition methods based on behavioral biometrics. These approaches leverage wearable sensors to capture users’ natural movement patterns during daily activities, enabling convenient identity recognition without requiring users to actively cooperate or perform explicit recognition actions.
Natural human movements such as gait, writing, and gestures contain rich and unique individual patterns, enabling identity recognition without active user cooperation. Among various behavioral biometric features, gait-based identity recognition has attracted much attention due to its non-intrusiveness and cross-scenario stability. Wearable inertial measurement units (IMUs) have become important tools in gait biometrics [7,8]. By placing IMUs at different body locations, it is possible to capture distinctive gait dynamics that reflect individual differences in movement patterns, joint range of motion, and coordination [9,10]. These characteristics make IMU-based gait biometrics a promising solution for identity recognition in smart healthcare, access control, and human–computer interaction.
Recent research has increasingly explored the use of multi-position and multi-modal IMU data. Integrating signals from different body locations and both time and frequency domains can further enhance recognition accuracy and robustness. However, in the existing studies of gait-based identity recognition, the relative importance of different sensor positions and feature domains remains underexplored. Most existing studies tend to integrate all available signals indiscriminately, without quantitatively analyzing which sensor location or feature modality contributes most to identification accuracy, or whether certain signals are redundant. In practice, such an approach may introduce unnecessary complexity and computational cost, which is undesirable for deployment in resource-constrained wearable scenarios.
In this work, we propose a time–frequency attention-gated fusion network (TFAGNet) and conduct a comprehensive analysis of sensor position and modality contributions to IMU-based identity recognition. We quantitatively investigate the discriminative power of signals from the shank, waist, and wrist, as well as the complementary value of time-domain and frequency-domain features. Our experimental results demonstrate that the lower-limb (shank) IMU dominates the identity recognition task, while the other sensor positions mainly provide redundant or auxiliary information. Furthermore, although frequency-domain features receive lower attention weights, ablation studies demonstrate their role in enhancing model robustness. These findings suggest that efficient and adaptive sensor selection is critical for practical IMU-based identity recognition systems.
The main contributions of this work are summarized as follows:
  • We proposed a time–frequency architecture that integrates time-domain and frequency-domain features from multiple IMU sensors, allowing the model to capture both short-term motion dynamics and periodic motion patterns for more accurate identity recognition.
  • We conducted a quantitative analysis of the contributions of different sensor positions and signal modalities (time-domain and frequency-domain features) in multi-IMU identity recognition.
  • We verified that lower-limb (shank) IMUs and time-domain features play dominant roles in identification performance, while signals from other positions and frequency-domain features mainly serve as auxiliary or redundant information that enhances system robustness and generalization.
The rest of this paper is structured as follows: Section 2 reviews related work on IMU-based gait identity recognition, time–frequency feature extraction, and multi-sensor position fusion. Section 3 provides a detailed description of the proposed TFAGNet architecture. Section 4 describes the dataset collection and preprocessing procedures. Section 5 details the experimental setup, performance evaluation, and analysis of modality and position contributions. Finally, Section 6 concludes the paper and discusses directions for future work.

2. Related Work

2.1. IMU-Based Gait and Identity Recognition

IMUs have been widely applied in user identity recognition and gait analysis due to their portability, low cost, and ability to capture rich motion dynamics. Early studies mainly fell into two categories: signal matching methods and machine learning-based methods [11]. Signal matching methods identify users by directly comparing gait signals or their derived features, using similarity measures such as Dynamic Time Warping (DTW) or correlation coefficients. For example, Derawi et al. [12] employed DTW to align and compare segmented gait cycles, thereby improving robustness to walking-speed variations. Mäntyjärvi et al. [13] proposed a correlation-based signal matching method that identifies users by performing normalized cross-correlation and frequency-domain similarity measurement on gait segments. These methods are intuitive and effective for temporal pattern alignment, but they can be susceptible to signal length mismatch and noise.
Machine learning-based methods, such as Support Vector Machines (SVM) [14,15], k-Nearest Neighbors (kNN) [16], Random Forests (RF) [17], and Hidden Markov Models (HMM) [18], utilize features extracted from raw IMU signals and treat each segment as an input sample for classification. Although these approaches achieved promising results, their performance was often limited by individual differences, sensor positions, and changing motion conditions. With the development of wearable devices, recent research has gradually shifted towards more robust and generalizable frameworks for IMU-based identity recognition.

2.2. Time–Frequency Feature Extraction in Wearable Biometrics

Time-domain and frequency-domain feature engineering has long been a focus in sensor-based biometric recognition. Gait recognition systems based on accelerometer and gyroscope signals have extracted time-domain features to capture short-term motion dynamics, and frequency-domain features to reveal periodicity and energy distribution [19,20,21]. Recent studies have explored more advanced frameworks for feature selection and representation. Hu et al. [22] proposed a two-stage feature extraction framework for continuous smartphone authentication, first computing time-domain motion descriptors and frequency-domain spectral features from multisensor signals, and then selecting the most discriminative ones to enhance recognition performance. Middya et al. [23] transformed accelerometer signals into spectro-temporal image representations, integrating time-domain dynamics with frequency-domain periodicity for deep learning-based user recognition.
These studies have demonstrated that combining time-domain and frequency-domain features can improve the robustness and accuracy of gait-based identification. However, the relative importance of these two domains remains insufficiently explored, especially in deep learning frameworks.

2.3. Multi-Sensor Position Fusion for Wearable Identity Recognition

Leveraging multiple IMU sensors placed at different body locations is considered an effective way to enhance identity recognition performance and system robustness. Previous studies have shown that certain sensor locations, particularly the lower limb, tend to capture more discriminative gait patterns, while other positions provide redundant or auxiliary information. For example, Chen et al. [24] proposed a deep learning framework that employs parallel convolutional and bi-directional LSTM encoders to extract temporal features from each sensor location, followed by feature-level fusion to integrate complementary information across positions for user identity recognition. Liu et al. [1] employed phase space reconstruction (PSR) and topological data analysis (TDA) on multi-location IMU data, showing that fusing signals from multiple body locations outperformed single-sensor setups in gait-based identity recognition across various surfaces. A CNN-based multi-sensor fusion approach [25] compared feature-level and decision-level fusion using sensors on the waist, wrists, and ankles, confirming that multi-position fusion consistently outperforms single-sensor configurations.
Although fusion methods have improved performance, most existing works either fuse all available signals indiscriminately or assign fixed fusion weights, without quantifying the individual contribution or redundancy of each sensor. There is still a lack of comprehensive analysis on which sensor positions are most informative and how multi-position fusion impacts recognition accuracy and efficiency.

2.4. Deep Learning and Attention-Based Fusion in Wearable Sensing

The application of deep learning in wearable sensor-based identity recognition has significantly improved the extraction of complex, nonlinear features from multi-dimensional motion data. Convolutional neural networks (CNNs) are widely applied to automatically learn local spatial-temporal patterns from raw IMU signals, achieving robust performance without handcrafted features [26,27,28]. To further improve discriminative ability, Liu et al. [29] proposed a multi-branch CNN that learns separately from positive and negative vibration directions. Moreover, lightweight and structurally optimized CNNs have been explored for real-time and low-power wearable applications [30]. Recurrent neural networks (RNNs), especially long short-term memory (LSTM) models, are effective in capturing long-range temporal dependencies in sequential motion data [31]. Hybrid architectures such as CNN-LSTM combine convolutional feature extraction with temporal modeling to exploit both spatial and sequential patterns and have achieved promising results in various wearable sensor-based recognition tasks [32,33].
Recent works have introduced attention-based modules to dynamically weight different sensor streams or feature modalities, enhancing the interpretability and adaptiveness of fusion strategies. For example, Zhao et al. [34] proposed a gated two-tower transformer for smartphone authentication, using a gating mechanism to adaptively balance complementary feature branches. Similarly, Guo et al. [35] designed a dynamic weighting behavioral transformer that integrates keystroke dynamics with IMU data and applies two multi-head attention modules in parallel, followed by an adaptive weighting generator to modulate the contribution of each modality in the fused representation. Nguyen et al. [36] introduced a Spatio-Temporal Dual-Attention Transformer (STDAT) that simultaneously focuses on both time and channel dimensions to effectively fuse multi-sensor data. Lu et al. [37] developed GaitFormer, a two-stream Transformer that processes temporal and channel information in parallel using an autocorrelation attention mechanism. Yi et al. [38] employed an enhanced Horizontal Spatial Attention Mechanism (HSAM) to assign importance scores to different channels from a multimodal sensing insole, thereby optimizing the sensor configuration. Huan et al. [39] presented a novel Channel Attention Weight Redistribution (CAWR) mechanism to intelligently re-evaluate and assign weights to different sensor channels, improving feature extraction for gait signals. Nevertheless, many studies still overlook the explicit quantification of each modality’s contribution and rarely analyze how dynamic attention weighting relates to real-world deployment needs, such as computational efficiency.
In summary, while previous research has made substantial progress in IMU-based identity recognition, time–frequency feature engineering, multi-sensor position fusion, and deep learning-based adaptive integration, there remains a lack of comprehensive and quantitative analysis of the actual contribution of each sensor location and feature modality. Addressing these challenges is crucial for designing efficient, robust, and interpretable wearable authentication systems.

3. Methods

Previous studies on IMU-based gait identity recognition have predominantly focused on time-domain feature extraction. While time-domain signals effectively capture short-term dynamics, they may miss periodicity and frequency-domain cues that are highly discriminative for identity. Moreover, the relative contributions of sensor locations and time–frequency modalities have seldom been quantified under a unified framework.
To address this limitation, we propose a gait recognition framework, TFAGNet, which integrates time-domain and frequency-domain representations from three IMU locations. The model architecture is shown in Figure 1, and a pseudo-code description of the model computation is given in Algorithm 1. The overall architecture consists of three main components:
(1) Six parallel sub-networks from three positions (shank, waist, wrist) in two modalities (time and frequency), producing comparable 512-dimensional embeddings;
(2) A multi-head attention-gated fusion module that yields a fused representation and per-branch importance scores;
(3) A lightweight classifier for identity recognition.
Algorithm 1 TFAGNet for identity identification.
Input: Six input tensors X = H^0 = [S_shank^time, S_shank^freq, S_waist^time, S_waist^freq, S_wrist^time, S_wrist^freq], where each tensor contains either the time-series signals of one body position or the magnitude of its fast Fourier transform (FFT) spectrum. Each tensor has dimension R^(Batch, Channels, L), where Batch is the batch size, Channels = 8, and L = 6000 is the sequence length.
Output: Identity recognition result y and the attention weights A of the multi-head attention fusion module.
 1: for SubNet net in [net_shank^time, net_shank^freq, net_waist^time, net_waist^freq, net_wrist^time, net_wrist^freq] do
 2:     for layer l in [1, ..., N] do
 3:         H_net^l = MaxPool(ReLU(BN(TBC(H_net^(l−1), W^l))))
 4:         TBC(X, W) = [X_1 × W, ..., X_B × W]
 5:         // The TBC shares the filters W among all B groups.
 6:     end for
 7:     f_net = (1/L) Σ_{t=1}^{L} H^N[t]    // AvgPool, H^N ∈ R^(Batch, Channels, L)
 8: end for
 9: F = Stack([f_net_shank^time, f_net_shank^freq, f_net_waist^time, f_net_waist^freq, f_net_wrist^time, f_net_wrist^freq])
10: Q = MeanPool(F) W_Q + b_Q    // Query
11: K = F W_K + b_K    // Key
12: V = F W_V + b_V    // Value
13: A = softmax(Q K^T / sqrt(d_k))    // Attention
14: F_fused = A V W_fused + b_fused
15: y = Softmax(Dropout(F_fused) W_cls + b_cls)
16: return y, A

3.1. Parallel Sub-Networks for Multi-Modal Feature Extraction

Human motion signals collected by IMUs vary significantly across different body locations and signal domains. IMU sensors on the shank, wrist, and waist can record different aspects of human movement. In order to effectively leverage this multimodal information, TFAGNet employs six parallel feature extraction sub-networks.
Each sub-network contains four layers of Tied Block Convolution (TBC), each followed by BatchNorm, ReLU, and MaxPooling. Unlike standard convolutions, TBC introduces controlled grouping and parameter sharing in the feature maps to improve parameter efficiency and representation capacity while maintaining a lightweight structure [40]. The tied blocks split the input channels into multiple logical groups that share the same set of filters, while preserving global interactions across groups.
At the end of each sub-network, a global average pooling layer is applied to compress the temporal dimension, resulting in a 512-dimensional feature vector per modality.
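To make the branch design concrete, the following is a minimal PyTorch sketch of one feature-extraction sub-network, assuming a simplified tied block convolution in which the input channels are split into B blocks that share one set of Conv1d filters; the layer widths, kernel size, and block count are illustrative assumptions rather than the exact configuration used in TFAGNet.

```python
import torch
import torch.nn as nn

class TiedBlockConv1d(nn.Module):
    """Simplified Tied Block Convolution: split the input channels into B equal
    blocks and filter every block with the same shared Conv1d weights."""
    def __init__(self, in_channels, out_channels, kernel_size=5, blocks=2):
        super().__init__()
        assert in_channels % blocks == 0 and out_channels % blocks == 0
        self.blocks = blocks
        self.conv = nn.Conv1d(in_channels // blocks, out_channels // blocks,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                  # x: (batch, C, L)
        chunks = torch.chunk(x, self.blocks, dim=1)        # B blocks of C/B channels
        return torch.cat([self.conv(c) for c in chunks], dim=1)

class SubNet(nn.Module):
    """One of the six parallel branches: four TBC blocks + global average pooling."""
    def __init__(self, in_channels=8, widths=(64, 128, 256, 512)):
        super().__init__()
        layers, c_in = [], in_channels
        for c_out in widths:
            layers += [TiedBlockConv1d(c_in, c_out),
                       nn.BatchNorm1d(c_out), nn.ReLU(), nn.MaxPool1d(2)]
            c_in = c_out
        self.body = nn.Sequential(*layers)

    def forward(self, x):                                  # x: (batch, 8, 6000)
        return self.body(x).mean(dim=-1)                   # (batch, 512) embedding

# Example: embedding = SubNet()(torch.randn(4, 8, 6000))  -> shape (4, 512)
```

Six such branches (three body positions × two signal domains) produce the six 512-dimensional embeddings that the fusion module combines.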

3.2. Attention-Gated Fusion Module

To integrate features from the six sub-networks, we design a gating network based on multi-head attention. This module is configured with 8 parallel attention heads. The six 512-dimensional feature vectors extracted by the sub-networks are first stacked. The total dimension for the Query (Q), Key (K), and Value (V) vectors is 512, which is split across the 8 heads, resulting in a per-head dimension of 64.
The attention is applied across these six feature branches. A single global query vector is computed by averaging the features from all branches. This global query then attends to the features of each individual branch, allowing the model to dynamically weigh their interdependencies. A fully connected layer followed by a softmax then generates a final importance weight for each sub-network. The final output is a globally fused feature vector, enabling data-dependent, interpretable, and efficient fusion, which is essential for robust identification under different conditions.
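The following is a minimal PyTorch sketch of this fusion step under the assumptions stated above: a single global query obtained by averaging the six 512-dimensional branch embeddings attends over them with 8 heads (64 dimensions per head), and a fully connected layer with softmax turns the attended representation into per-branch importance weights. Class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class AttentionGatedFusion(nn.Module):
    """Fuse six 512-d branch embeddings via multi-head attention with a global query."""
    def __init__(self, dim=512, heads=8, branches=6):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # 8 heads x 64 dims
        self.gate = nn.Sequential(nn.Linear(dim, branches), nn.Softmax(dim=-1))
        self.fuse = nn.Linear(dim, dim)

    def forward(self, feats):                              # feats: (batch, 6, 512)
        query = feats.mean(dim=1, keepdim=True)            # global query: (batch, 1, 512)
        attended, attn_w = self.attn(query, feats, feats)  # attended: (batch, 1, 512)
        weights = self.gate(attended.squeeze(1))           # per-branch importance: (batch, 6)
        fused = self.fuse((weights.unsqueeze(-1) * feats).sum(dim=1))   # (batch, 512)
        return fused, weights, attn_w                      # attn_w: head-averaged (batch, 1, 6)

# Example: fused, w, a = AttentionGatedFusion()(torch.stack([torch.randn(4, 512)] * 6, dim=1))
```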

3.3. Classification Layer

The fused feature vector passes through a Dropout layer to reduce overfitting, followed by a fully connected layer that performs the final classification. The classifier maps the 512-dimensional global feature to the 65 target identity classes and outputs the final prediction.
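As a small sketch continuing the modules above (the dropout probability is an assumption, since it is not reported in the text), the classification head looks like:

```python
import torch.nn as nn

class Classifier(nn.Module):
    """Dropout + fully connected layer mapping the fused feature to 65 identities."""
    def __init__(self, dim=512, num_classes=65, p_drop=0.5):  # p_drop is assumed
        super().__init__()
        self.head = nn.Sequential(nn.Dropout(p_drop), nn.Linear(dim, num_classes))

    def forward(self, fused):          # fused: (batch, 512)
        return self.head(fused)        # logits; softmax is applied in the loss or at inference
```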

4. Experiments

4.1. Data Collection

To evaluate the effectiveness of our proposed method, we conducted a data collection experiment involving 65 subjects. Table 1 provides detailed demographic information of the valid participants, including gender, age, height, and weight.
Each participant was equipped with three wearable IMUs (Shimmer3, Shimmer, Dublin, Ireland) firmly attached to the dominant wrist, the waist, and the shank using adjustable elastic straps. Each 9-DoF Shimmer3 node integrates a tri-axial accelerometer, a tri-axial gyroscope, and a tri-axial magnetometer; in this study the magnetometer was disabled and only the 6-DoF accelerometer and gyroscope signals were recorded, to avoid indoor magnetic disturbances affecting the data. The device measures 65 mm × 32 mm × 12 mm and weighs 31 g, enabling robust body-worn use. The accelerometer and gyroscope were configured with full-scale ranges of ±16 g and ±2000°/s, respectively, and each axis was sampled at 100 Hz.
Before data collection, we calibrated the IMUs following the Shimmer procedure: a six-position static test (±X, ±Y, ±Z) for the accelerometer, and a zero-rate bias estimation plus rotations about each axis for the gyroscope. The resulting offset, sensitivity, and alignment matrices were saved to device memory and applied on the host side to correct for offset and scale-factor errors.
All sensors streamed data wirelessly to a Windows laptop via Bluetooth. Shimmer ConsensysPRO was used for device configuration, live monitoring, and logging. Raw streams were recorded with device timestamps and saved as CSV files, and a common start-of-trial trigger signal ensured time synchronization across devices.
To simulate real-world walking scenarios in which individuals may walk at varying speeds, data collection was conducted on a motorized treadmill with three preset speed conditions: slow, normal, and fast paces. For each speed condition, data were recorded for a continuous 10 min interval, resulting in approximately 30 min of sensor data per participant. Short transitional periods were included between speed changes to minimize abrupt gait pattern shifts. The experimental setup is illustrated in Figure 2.
Figure 3 illustrates raw IMU signals acquired from the shank, wrist, and waist under slow, normal, and fast walking conditions, highlighting the variability of motion patterns across sensor positions and gait speeds. Clear variations in amplitude, temporal periodicity, and waveform complexity are evident, reflecting the influence of both sensor placement and locomotion speed on the recorded signals.
This study was approved by the Institutional Review Board of Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (IRB No. SIAT-IRB-200715-H0510). All participants signed a written informed consent form before the experiment.

4.2. Data Preprocessing

To ensure consistent inputs and improve robustness, we apply a standardized preprocessing pipeline to all raw IMU streams from the wrist, waist, and shank. The pipeline consists of four stages: signal filtering, magnitude computation, data segmentation, and time–frequency transformation.
(1) Signal filtering
For each tri-axial accelerometer channel (a_x, a_y, a_z) and gyroscope channel (ω_x, ω_y, ω_z), a Butterworth low-pass filter with a cutoff frequency of 10 Hz was first applied to suppress high-frequency noise unrelated to human gait. This choice follows previous biomechanical studies indicating that most gait-related energy lies below 10 Hz. Subsequently, a Wiener adaptive filter with a window length of seven samples was employed to further reduce residual random noise while preserving step-related peaks.
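The filtering stage can be reproduced with standard SciPy routines, as in the sketch below; the Butterworth filter order is an assumption, since the text only specifies the 10 Hz cutoff and the 7-sample Wiener window.

```python
import numpy as np
from scipy.signal import butter, filtfilt, wiener

FS = 100.0       # sampling rate in Hz
CUTOFF = 10.0    # low-pass cutoff in Hz (most gait energy lies below 10 Hz)
ORDER = 4        # Butterworth order (assumed; not stated in the text)

def denoise_channel(x):
    """Zero-phase Butterworth low-pass filtering followed by a 7-sample Wiener filter."""
    b, a = butter(ORDER, CUTOFF / (FS / 2), btype="low")
    x_lp = filtfilt(b, a, x)
    return wiener(x_lp, mysize=7)

# Example: apply to all six raw channels [ax, ay, az, wx, wy, wz] of one recording
# raw: (num_samples, 6) array -> filtered: same shape
# filtered = np.column_stack([denoise_channel(raw[:, i]) for i in range(raw.shape[1])])
```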
(2) Magnitude Computation
For each sensor, we computed the magnitude of the tri-axial accelerometer signal, a_mag = √(a_x² + a_y² + a_z²), and the magnitude of the tri-axial gyroscope signal, ω_mag = √(ω_x² + ω_y² + ω_z²), to obtain two additional magnitude channels.
(3) Data Segmentation
Each continuous recording was segmented into non-overlapping windows of 6000 data points (60 s at 100 Hz), yielding individual fixed-length samples for training and evaluation.
(4) Time–Frequency Transformation
We generated both time-domain signals and their corresponding frequency-domain representations for each windowed sample. The frequency-domain features were extracted using the Fast Fourier Transform (FFT), and the magnitude spectrum up to the Nyquist frequency was retained for subsequent processing.
Following the time–frequency transformation, the final data samples for model training were constructed from each data window. Specifically, for each 6000-point time window, we retained two parallel representations: (1) the processed time-domain signal; (2) the corresponding frequency-domain magnitude spectrum. These two components constitute a complete data unit for the model input, with both having dimensions of 8 × 6000 . The eight channels comprise the six-axis raw signals (from the tri-axial accelerometer and tri-axial gyroscope) and the two computed magnitude channels.
Prior to being fed into the model for training, we applied Z-score normalization to each of the eight channels independently across the temporal dimension. This step ensures that the model training is robust to variations in signal amplitude and offset across different sensors and subjects, which helps to accelerate model convergence and enhance final recognition performance.
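A compact sketch of these preparation steps is given below: magnitude channels, non-overlapping 6000-point windowing, the FFT magnitude spectrum, and per-channel z-score normalization. For the frequency branch, this sketch keeps the full (symmetric) magnitude spectrum so that both representations are 8 × 6000; how the spectrum length is handled in the authors' implementation is an assumption here.

```python
import numpy as np

WIN = 6000  # non-overlapping window length (60 s at 100 Hz)

def build_sample(window):
    """window: (WIN, 6) filtered signals [ax, ay, az, wx, wy, wz] of one sensor.
    Returns the (8, WIN) time-domain and frequency-domain representations."""
    acc, gyr = window[:, :3], window[:, 3:]
    a_mag = np.linalg.norm(acc, axis=1)                 # accelerometer magnitude channel
    w_mag = np.linalg.norm(gyr, axis=1)                 # gyroscope magnitude channel
    time_repr = np.vstack([window.T, a_mag, w_mag])     # (8, WIN)
    freq_repr = np.abs(np.fft.fft(time_repr, axis=1))   # (8, WIN) magnitude spectrum

    def zscore(m):                                      # per-channel normalization
        return (m - m.mean(axis=1, keepdims=True)) / (m.std(axis=1, keepdims=True) + 1e-8)

    return zscore(time_repr), zscore(freq_repr)

def segment(recording):
    """Split one continuous recording (N, 6) into non-overlapping 6000-point samples."""
    n_windows = recording.shape[0] // WIN
    return [build_sample(recording[i * WIN:(i + 1) * WIN]) for i in range(n_windows)]
```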
Finally, each processed and normalized sample was assigned a label corresponding to the subject’s ID (from 0 to 64) to facilitate supervised learning. Through this complete preparation pipeline, we obtained a total of 2800 labeled samples, ready for subsequent cross-validation, training, and evaluation.

4.3. Experimental Settings

Our model was implemented in Python (version 3.8.18) using the PyTorch (version 1.12.1+cu113) framework and trained on a server equipped with an NVIDIA RTX 3090 GPU (Nvidia Corporation, Santa Clara, CA, USA). We employed the Adam optimizer with an initial learning rate of 1 × 10⁻⁴. To ensure stable convergence, a StepLR learning rate scheduler was utilized, which decayed the learning rate by a factor of 0.5 every 20 epochs. The training process ran for a total of 50 epochs with a batch size of 64. The key hyperparameters for our training process are summarized in Table 2.
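A minimal training-loop sketch matching the reported settings (Adam at 1 × 10⁻⁴, StepLR halving the rate every 20 epochs, 50 epochs, batch size 64, cross-entropy loss) is shown below; the model and data loader are placeholders, and the assumption that the model takes the six branch tensors and returns logits plus attention weights follows the architecture description.

```python
import torch
import torch.nn as nn

def train(model, train_loader, device="cuda", epochs=50):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

    for epoch in range(epochs):
        model.train()
        for inputs, labels in train_loader:       # inputs: list of six (batch, 8, 6000) tensors
            inputs = [x.to(device) for x in inputs]
            labels = labels.to(device)
            logits, _ = model(inputs)             # model returns (logits, attention weights)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                          # decay the learning rate by 0.5 every 20 epochs
    return model
```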
We evaluated the proposed time–frequency attention-gated fusion network on the multi-class identity recognition task and conducted experiments using a five-fold cross-validation strategy. We evaluated model performance using six commonly used metrics for multi-class classification: Accuracy (ACC), Precision (PRE), Recall (REC), F1-Score (F1), Matthews Correlation Coefficient (MCC), and Area Under the Curve (AUC). Accuracy is defined as the proportion of correctly classified samples. Precision, Recall, and F1-score are computed for each class using a one-vs-rest strategy and averaged across classes with weights proportional to class support. MCC is calculated using the generalized multiclass formulation based on the closed-form expression of the confusion matrix. AUC is computed in a one-vs-rest manner for each class and then averaged across classes using class-support weights.
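The six metrics can be computed per fold with scikit-learn as sketched below, using support-weighted averaging and one-vs-rest AUC as described; y_true, y_pred, and y_proba denote the labels, predicted classes, and predicted class probabilities of one validation fold.

```python
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             precision_recall_fscore_support, roc_auc_score)

def evaluate_fold(y_true, y_pred, y_proba):
    """y_true, y_pred: (N,) class indices; y_proba: (N, 65) predicted probabilities."""
    acc = accuracy_score(y_true, y_pred)
    pre, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    mcc = matthews_corrcoef(y_true, y_pred)
    auc = roc_auc_score(y_true, y_proba, multi_class="ovr", average="weighted")
    return {"ACC": acc, "PRE": pre, "REC": rec, "F1": f1, "MCC": mcc, "AUC": auc}
```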

5. Results and Analysis

5.1. Gait Identification Performance

Figure 4 presents the gait identification results of the proposed TFAGNet on the 65-subject dataset, shown as a normalized confusion matrix derived from five-fold cross-validation. The matrix reveals that over 92% of predictions are concentrated along the diagonal, demonstrating the model’s strong classification capability. The remaining less than 8% of samples appear in off-diagonal positions, and these misclassifications are predominantly limited to a small subset of subjects. Such cases typically involve individuals whose gait signatures exhibit high inter-subject similarity, leading to occasional confusion. Despite these challenging instances, the high diagonal concentration and low misclassification rate indicate that TFAGNet maintains stable discriminative performance and robust generalization across diverse subjects in multi-class gait recognition.

5.2. Comparison with Baseline Methods

We compared the proposed TFAGNet with a series of representative deep learning methods for inertia-based identity identification, including CNN-based, hybrid CNN-LSTM, and lightweight convolutional architectures. All models were trained and tested on our collected IMU dataset using identical preprocessing procedures and a five-fold cross-validation protocol to ensure fairness.
Table 3 summarizes the comparative performance of each method in terms of ACC, PRE, REC, F1, MCC, AUC, and model complexity indicators (FLOPs and parameter count). TFAGNet outperforms all baseline methods by a clear margin in classification performance, achieving 96.0% average accuracy and F1-score. Moreover, this accuracy gain is accompanied by a reduced computational cost (1.12 GFLOPs, 1.54 M parameters), which is lower than that of most of the compared models.
These results indicate that TFAGNet not only has high identity recognition performance but also maintains computational efficiency. This makes it well-suited for deployment in real-world wearable systems, where both resource constraints and high recognition accuracy are critical.

5.3. Impact of Signal Modalities and Sensor Placements

To quantitatively assess the contribution of different signal modalities and sensor placements, we visualized the global average attention weights across all attention heads and the six sub-networks, as shown in Figure 5. The attention heatmap demonstrates that time-domain features from the shank and wrist receive consistently higher weights, suggesting that these modalities contribute more significantly to identity recognition. Moreover, the frequency-domain branches receive non-zero weights in most samples, indicating that the model still leverages frequency-domain information, which supports robustness under moderate sensor noise.
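As a sketch under the assumption that the model returns per-branch importance weights of shape (batch, 6), as in the fusion module outlined earlier, the global averages visualized in Figure 5 can be accumulated over the evaluation set as follows:

```python
import torch

@torch.no_grad()
def average_branch_weights(model, loader, device="cuda", branches=6):
    """Average the per-branch attention weights over all evaluation samples."""
    model.eval()
    total, count = torch.zeros(branches), 0
    for inputs, _ in loader:
        inputs = [x.to(device) for x in inputs]
        _, weights = model(inputs)               # weights: (batch, 6) branch importance scores
        total += weights.sum(dim=0).cpu()
        count += weights.shape[0]
    return total / count                         # one global weight per sub-network branch
```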
Consistent with Figure 5, Table 4 shows the ablation study results. Removing time-domain branches leads to a substantial accuracy drop from 96.0% to 68.0%, confirming their central role in identity recognition. In comparison, removing frequency-domain branches results in a smaller decrease to 93.0%, verifying their auxiliary but beneficial contribution. Regarding sensor placement, eliminating the shank subnetwork reduces accuracy to 89.0%, the largest location-wise drop, followed by the wrist (95.0%) and waist (96.0%), reflecting the same importance order observed in the attention map. Notably, the single-location “Only Shank” configuration still achieves 94.0% accuracy, outperforming “Only Wrist” (84.0%) and “Only Waist” (78.0%), demonstrating the high discriminative power of lower-limb sensors.
By jointly interpreting the attention weight distributions and ablation experiment outcomes, we confirm that TFAGNet effectively prioritizes the most informative modalities and sensor placements, namely the time-domain shank and wrist signals, while leveraging frequency-domain information to enhance noise robustness and generalization.

5.4. Impact of Architectural Components

To understand how specific design choices influence performance and computational efficiency, we conducted ablation experiments targeting two core modules: the TBC and the attention-gated fusion mechanism.
TBC was introduced in TFAGNet to reduce parameter count and computational cost while maintaining recognition capacity. As shown in Table 4, removing the Tied Block Convolution (TBC) slightly improves accuracy but increases FLOPs by nearly threefold (from 1.12 G to 3.32 G) and the parameter count from 1.54 M to 4.12 M, confirming that TBC offers a significant efficiency advantage without sacrificing performance.
We further evaluated the importance of the attention-gated fusion mechanism by replacing it with a simpler weighted concatenation strategy. The experimental result shown in Table 4 indicates that the overall classification accuracy dropped from 96.0% to 92.0% when replacing the proposed multi-head attention with a simpler weighted concatenation module. These results demonstrate the critical role of attention mechanisms in dynamically prioritizing the most informative modalities and sensor locations.
In conclusion, these findings confirm that the architecture design of the TBC for lightweight computation and the multi-head attention-gated fusion for adaptive feature integration are crucial for achieving high accuracy and model efficiency, which are essential for practical deployment in wearable identity recognition scenarios.

5.5. Practical Applications

Our findings demonstrate that IMU-based gait recognition, balanced for accuracy and efficiency, has significant potential for a variety of practical, real-world applications. The core advantage of this technology lies in its unobtrusive and continuous nature, enabling seamless user recognition without active participation. Some key application domains include the following:
In hospitals, this technology can be used for long-term patient monitoring. For example, it could automatically verify that the correct elderly or cognitively impaired individual is receiving a scheduled medication, thereby reducing the risk of medical errors. For remote rehabilitation, it can continuously authenticate the patient’s identity during therapy sessions, ensuring the integrity and compliance of the collected data.
In secure environments, an authorized person could be granted access simply by walking towards a checkpoint, eliminating the need for fobs, cards, or traditional biometric scans. Furthermore, it can act as a continuous authentication layer for personal devices; a smartphone or laptop could automatically lock if it detects a gait pattern inconsistent with that of its owner, providing an additional layer of data security.
In smart homes, the system can identify which family member has entered a room and automatically adjust ambient settings like lighting, temperature, and music to their saved preferences. In scenarios involving shared devices like AR/VR headsets, the system could recognize the current user by their movement and automatically log them into their personal profile, creating a personalized user experience.

6. Conclusions

This work proposed TFAGNet, a time–frequency attention-gated fusion network for IMU-based gait identity recognition, together with a unified, quantitative framework to attribute performance to sensor locations and time-frequency modalities. On a 65-subject dataset, TFAGNet achieves high recognition accuracy with modest complexity, demonstrating an effective accuracy-efficiency balance for wearable scenarios. Our analysis shows that time-domain features are the primary source of discriminative power, whereas frequency-domain features provide complementary robustness. Among sensor placements, the shank contributes the most to recognition, the wrist offers a secondary benefit, and the waist is least informative. These findings provide actionable guidance: prioritize shank sensing and time channels for core performance and include frequency features to enhance robustness.
In future work, we will expand the study of sensor placements, including additional and bilateral locations and robustness to mild misplacement, and extend this framework beyond controlled treadmill conditions to free-living conditions and missing-modality scenarios. Furthermore, we aim to explore online personalized adaptation technologies for practical applications.

Author Contributions

Conceptualization, F.S. and F.L.; methodology, F.L.; software, F.L.; validation, F.L. and H.W.; formal analysis, H.W. and X.L.; investigation, H.W.; data curation, X.L.; writing—original draft preparation, F.L. and H.W.; writing—review and editing, F.S.; visualization, F.L. and H.W.; funding acquisition, F.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Key Research and Development Plan of Guangdong Province under Grant 2022B1515120062; in part by the Shenzhen International Cooperation Project under Grant GJHZ20220913142808016; in part by the Shenzhen Sustainable Development Special Project under Grant KCXFZ20230731094100001 and KCXFZ20240903094300001; in part by the Joint Fund of NSFC and Chongqing under Grant U21A20447; and in part by the Joint Fund of Yeqisun and NSFC under Grant U2241210.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences (protocol code SIAT-IRB-200715-H0510; 17 July 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data used in this study are not publicly available because the institutional review board did not grant permission, but they are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Liu, Y.; Ivanov, K.; Wang, J.; Xiong, F.; Wang, J.; Wang, M.; Nie, Z.; Wang, L.; Yan, Y. Topological data analysis for robust gait biometrics based on wearable sensors. IEEE Trans. Consum. Electron. 2024, 70, 4910–4921. [Google Scholar] [CrossRef]
  2. Zhang, Z.; Ning, H.; Farha, F.; Ding, J.; Choo, K.K.R. Artificial intelligence in physiological characteristics recognition for internet of things authentication. Digit. Commun. Netw. 2024, 10, 740–755. [Google Scholar] [CrossRef]
  3. Li, M.; Huang, B.; Tian, G. A comprehensive survey on 3D face recognition methods. Eng. Appl. Artif. Intell. 2022, 110, 104669. [Google Scholar] [CrossRef]
  4. Hou, B.; Zhang, H.; Yan, R. Finger-vein biometric recognition: A review. IEEE Trans. Instrum. Meas. 2022, 71, 5020426. [Google Scholar] [CrossRef]
  5. Nguyen, K.; Proença, H.; Alonso-Fernandez, F. Deep learning for iris recognition: A survey. ACM Comput. Surv. 2024, 56, 1–35. [Google Scholar] [CrossRef]
  6. Li, Y.; Sun, X.; Yang, Z.; Huang, H. Snnauth: Sensor-based continuous authentication on smartphones using spiking neural networks. IEEE Internet Things J. 2024, 11, 15957–15968. [Google Scholar] [CrossRef]
  7. Sun, Y.; Lo, B. An artificial neural network framework for gait-based biometrics. IEEE J. Biomed. Health Inform. 2018, 23, 987–998. [Google Scholar] [CrossRef]
  8. Manupibul, U.; Tanthuwapathom, R.; Jarumethitanont, W.; Kaimuk, P.; Limroongreungrat, W.; Charoensuk, W. Integration of force and IMU sensors for developing low-cost portable gait measurement system in lower extremities. Sci. Rep. 2023, 13, 10653. [Google Scholar] [CrossRef]
  9. Papavasileiou, I.; Qiao, Z.; Zhang, C.; Zhang, W.; Bi, J.; Han, S. GaitCode: Gait-based continuous authentication using multimodal learning and wearable sensors. Smart Health 2021, 19, 100162. [Google Scholar] [CrossRef]
  10. Liu, S.; Shao, W.; Li, T.; Xu, W.; Song, L. Recent advances in biometrics-based user authentication for wearable devices: A contemporary survey. Digit. Signal Process. 2022, 125, 103120. [Google Scholar] [CrossRef]
  11. Marsico, M.D.; Mecca, A. A survey on gait recognition via wearable sensors. ACM Comput. Surv. (CSUR) 2019, 52, 1–39. [Google Scholar] [CrossRef]
  12. Derawi, M.O.; Nickel, C.; Bours, P.; Busch, C. Unobtrusive user-authentication on mobile phones using biometric gait recognition. In Proceedings of the 2010 Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Darmstadt, Germany, 15–17 October 2010; IEEE: New York, NY, USA, 2010; pp. 306–311. [Google Scholar]
  13. Mantyjarvi, J.; Lindholm, M.; Vildjiounaite, E.; Makela, S.M.; Ailisto, H. Identifying users of portable devices from gait pattern with accelerometers. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, USA, 23 March 2005; IEEE: New York, NY, USA, 2005; Volume 2, pp. ii/973–ii/976. [Google Scholar]
  14. Li, G.; Huang, L.; Xu, H. iwalk: Let your smartphone remember you. In Proceedings of the 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 21–23 July 2017; IEEE: New York, NY, USA, 2017; pp. 414–418. [Google Scholar]
  15. Sudhakar, S.R.V.; Kayastha, N.; Sha, K. ActID: An efficient framework for activity sensor based user identification. Comput. Secur. 2021, 108, 102319. [Google Scholar] [CrossRef]
  16. Nickel, C.; Wirtl, T.; Busch, C. Authentication of smartphone users based on the way they walk using k-NN algorithm. In Proceedings of the 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Piraeus-Athens, Greece, 18–20 July 2012; IEEE: New York, NY, USA, 2012; pp. 16–20. [Google Scholar]
  17. Luca, R.; Bejinariu, S.I.; Costin, H.; Rotaru, F. Inertial data based learning methods for person authentication. In Proceedings of the 2021 International Symposium on Signals, Circuits and Systems (ISSCS), Iasi, Romania, 15–16 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–4. [Google Scholar]
  18. Nickel, C.; Busch, C. Classifying accelerometer data via hidden markov models to authenticate people by the way they walk. IEEE Aerosp. Electron. Syst. Mag. 2013, 28, 29–35. [Google Scholar] [CrossRef]
  19. Rong, L.; Jianzhong, Z.; Ming, L.; Xiangfeng, H. A wearable acceleration sensor system for gait recognition. In Proceedings of the 2007 2nd IEEE Conference on Industrial Electronics and Applications, Harbin, China, 23–25 May 2007; IEEE: New York, NY, USA, 2007; pp. 2654–2659. [Google Scholar]
  20. Lu, H.; Huang, J.; Saha, T.; Nachman, L. Unobtrusive gait verification for mobile phones. In Proceedings of the 2014 ACM International Symposium on Wearable Computers, Seattle, WA, USA, 13–17 September 2014; pp. 91–98. [Google Scholar]
  21. Ahmad, M.; Alqarni, M.A.; Khan, A.; Khan, A.; Hussain Chauhdary, S.; Mazzara, M.; Umer, T.; Distefano, S. Smartwatch-Based Legitimate User Identification for Cloud-Based Secure Services. Mob. Inf. Syst. 2018, 2018, 5107024. [Google Scholar] [CrossRef]
  22. Hu, M.; Zhang, K.; You, R.; Tu, B. Multisensor-based continuous authentication of smartphone users with two-stage feature extraction. IEEE Internet Things J. 2022, 10, 4708–4724. [Google Scholar] [CrossRef]
  23. Middya, A.I.; Roy, S.; Mandal, S. User recognition in participatory sensing systems using deep learning based on spectro-temporal representation of accelerometer signals. Knowl. Based Syst. 2022, 258, 110046. [Google Scholar] [CrossRef]
  24. Chen, L.; Zhang, Y.; Peng, L. Metier: A deep multi-task learning based activity and user recognition model using wearable sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–18. [Google Scholar] [CrossRef]
  25. Dehzangi, O.; Taherisadr, M.; ChangalVala, R. IMU-based gait recognition using convolutional neural networks and multi-sensor fusion. Sensors 2017, 17, 2735. [Google Scholar] [CrossRef]
  26. Asuncion, L.V.R.; De Mesa, J.X.P.; Juan, P.K.H.; Sayson, N.T.; Cruz, A.R.D. Thigh motion-based gait analysis for human identification using inertial measurement units (IMUs). In Proceedings of the 2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Baguio City, Philippines, 29 November–2 December 2018; IEEE: New York, NY, USA, 2018; pp. 1–6. [Google Scholar]
  27. Huang, H.; Zhou, P.; Li, Y.; Sun, F. A lightweight attention-based CNN model for efficient gait recognition with wearable IMU sensors. Sensors 2021, 21, 2866. [Google Scholar] [CrossRef]
  28. Lee, Y.J.; Wu, C.C. One step of gait information from sensing walking surface for personal identification. IEEE Sens. J. 2023, 23, 5243–5250. [Google Scholar] [CrossRef]
  29. Liu, J.; Song, W.; Shen, L.; Han, J.; Ren, K. Secure user verification and continuous authentication via earphone imu. IEEE Trans. Mob. Comput. 2022, 22, 6755–6769. [Google Scholar] [CrossRef]
  30. Venkatachalam, S.; Nair, H.; Vellaisamy, P.; Zhou, Y.; Youssfi, Z.; Shen, J.P. Realtime person identification via gait analysis using imu sensors on edge devices. In Proceedings of the 2024 International Conference on Neuromorphic Systems (ICONS), Arlington, VA, USA, 30 July–2 August 2024; IEEE: New York, NY, USA, 2024; pp. 371–375. [Google Scholar]
  31. Yuan, J.; Zhang, Y.; Liu, S.; Zhu, R. Wearable leg movement monitoring system for high-precision real-time metabolic energy estimation and motion recognition. Research 2023, 6, 0214. [Google Scholar] [CrossRef] [PubMed]
  32. Cortes, V.M.P.; Chatterjee, A.; Khovalyg, D. Dynamic personalized human body energy expenditure: Prediction using time series forecasting LSTM models. Biomed. Signal Process. Control 2024, 87, 105381. [Google Scholar]
  33. Lee, Y.J.; Wu, Y.S.; Lin, P.C. Utilization of two types of feature datasets with image-based and time series deep learning models in recognizing walking status and revealing personal identification. Adv. Eng. Inform. 2024, 62, 102729. [Google Scholar] [CrossRef]
  34. Zhao, C.; Gao, F.; Shen, Z. Multi-motion sensor behavior based continuous authentication on smartphones using gated two-tower transformer fusion networks. Comput. Secur. 2024, 139, 103698. [Google Scholar] [CrossRef]
  35. Guo, Z.; Shen, W.; Xiao, M.; Cui, L.; Xie, D. Transformer-Based Biometrics Method for Smart Phone Continuous Authentication. In Proceedings of the 2024 International Conference on Networking, Sensing and Control (ICNSC), Hangzhou, China, 18–20 October 2024; IEEE: New York, NY, USA, 2024; pp. 1–6. [Google Scholar]
  36. Nguyen, K.N.; Rasnayaka, S.; Wickramanayake, S.; Meedeniya, D.; Saha, S.; Sim, T. Spatio-temporal dual-attention transformer for time-series behavioral biometrics. IEEE Trans. Biom. Behav. Identity Sci. 2024, 6, 591–601. [Google Scholar] [CrossRef]
  37. Lu, Z.; Zhou, H.; Wang, L.; Kong, D.; Lyu, H.; Wu, H.; Chen, B.; Chen, F.; Dong, N.; Yang, G. GaitFormer: Two-Stream Transformer Gait Recognition Using Wearable IMU Sensors in the Context of Industry 5.0. IEEE Sens. J. 2025, 25, 19947–19956. [Google Scholar] [CrossRef]
  38. Yi, S.; Mei, Z.; Ivanov, K.; Mei, Z.; He, T.; Zeng, H. Gait-based identification using wearable multimodal sensing and attention neural networks. Sens. Actuators A Phys. 2024, 374, 115478. [Google Scholar] [CrossRef]
  39. Huan, R.; Dong, G.; Cui, J.; Jiang, C.; Chen, P.; Liang, R. INSENGA: Inertial sensor gait recognition method using data imputation and channel attention weight redistribution. IEEE Sens. J. 2025. [Google Scholar] [CrossRef]
  40. Li, J.; Wen, Y.; He, L. Scconv: Spatial and channel reconstruction convolution for feature redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 6153–6162. [Google Scholar]
Figure 1. Time–frequency attention-gated fusion network (TFAGNet).
Figure 2. Data collection scenario.
Figure 3. Raw IMU signals collected from different body parts and at different speeds.
Figure 4. Confusion matrix of the average accuracy of the five-fold cross-validation experiment.
Figure 5. Global-average attention weights across six branches.
Table 1. Demographic information of the participants.
Statistical Characteristic | All | Males | Females
Number of Subjects | 65 | 30 | 35
Age (years) | 27.78 ± 6.12 | 28.20 ± 6.31 | 27.43 ± 5.93
Height (cm) | 168.55 ± 8.43 | 175.55 ± 5.53 | 162.56 ± 5.28
Weight (kg) | 61.97 ± 11.31 | 70.15 ± 9.84 | 54.97 ± 6.95
Table 2. Training parameters and hyperparameters.
Hyperparameter | Value
Batch Size | 64
Optimizer | Adam
Initial Learning Rate | 0.0001
Learning Rate Scheduler | StepLR (gamma = 0.5, step_size = 20)
Epochs | 50
Loss Function | Cross-Entropy Loss
Table 3. Comparison results with baseline methods (identity recognition, 2800 samples from 65 subjects).
Year | Method | ACC | PRE | REC | F1 | MCC | AUC | FLOPs | Params
2021 | CNN + CEDS [27] | 0.81 ± 0.02 | 0.86 ± 0.02 | 0.81 ± 0.02 | 0.81 ± 0.03 | 0.81 ± 0.03 | 0.97 ± 0.00 | 4.29 G | 2.45 M
2022 | Two-direction CNN [29] | 0.88 ± 0.03 | 0.92 ± 0.02 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.88 ± 0.03 | 0.99 ± 0.00 | 1.24 G | 8.44 M
2023 | SCNN [28] | 0.84 ± 0.05 | 0.90 ± 0.02 | 0.84 ± 0.05 | 0.84 ± 0.04 | 0.84 ± 0.05 | 0.99 ± 0.00 | 8.38 G | 4.78 M
2024 | Efficient CNN [30] | 0.84 ± 0.06 | 0.88 ± 0.04 | 0.84 ± 0.06 | 0.83 ± 0.06 | 0.83 ± 0.06 | 0.99 ± 0.00 | 3.29 G | 4.63 M
2024 | CNN-LSTM [32] | 0.86 ± 0.08 | 0.90 ± 0.05 | 0.86 ± 0.08 | 0.86 ± 0.09 | 0.86 ± 0.09 | 0.99 ± 0.00 | 3.79 G | 3.88 M
2024 | SW-LSTM [33] | 0.89 ± 0.02 | 0.91 ± 0.02 | 0.89 ± 0.02 | 0.89 ± 0.03 | 0.89 ± 0.02 | 0.99 ± 0.00 | 2.63 G | 4.53 M
Ours | TFAGNet | 0.96 ± 0.01 | 0.97 ± 0.00 | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.99 ± 0.00 | 1.12 G | 1.54 M
The bold font indicates the results of our proposed method.
Table 4. Ablation experiment results for different signals and modules (identity recognition, 2800 samples from 65 subjects).
Components | ACC | PRE | REC | F1 | MCC | AUC | FLOPs | Params
Without TBC | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.97 ± 0.00 | 0.96 ± 0.00 | 0.96 ± 0.00 | 1.00 ± 0.00 | 3.32 G | 4.12 M
Without Multi-head Attention | 0.92 ± 0.01 | 0.93 ± 0.01 | 0.92 ± 0.01 | 0.91 ± 0.01 | 0.91 ± 0.01 | 0.98 ± 0.00 | 1.12 G | 2.30 M
Without Time Domain | 0.68 ± 0.02 | 0.72 ± 0.02 | 0.68 ± 0.02 | 0.67 ± 0.02 | 0.68 ± 0.02 | 0.95 ± 0.01 | 0.56 G | 1.29 M
Without Frequency Domain | 0.93 ± 0.01 | 0.94 ± 0.01 | 0.93 ± 0.01 | 0.93 ± 0.01 | 0.93 ± 0.01 | 0.99 ± 0.00 | 0.56 G | 1.29 M
Without Shank SubNet | 0.89 ± 0.01 | 0.90 ± 0.01 | 0.89 ± 0.01 | 0.88 ± 0.01 | 0.88 ± 0.01 | 0.98 ± 0.00 | 0.75 G | 1.37 M
Without Waist SubNet | 0.96 ± 0.01 | 0.97 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.02 | 0.96 ± 0.01 | 0.99 ± 0.00 | 0.75 G | 1.37 M
Without Wrist SubNet | 0.95 ± 0.01 | 0.96 ± 0.01 | 0.95 ± 0.01 | 0.95 ± 0.01 | 0.95 ± 0.01 | 0.99 ± 0.00 | 0.75 G | 1.37 M
Only Shank SubNet | 0.94 ± 0.01 | 0.95 ± 0.01 | 0.94 ± 0.01 | 0.94 ± 0.01 | 0.94 ± 0.01 | 0.98 ± 0.01 | 0.37 G | 1.20 M
Only Waist SubNet | 0.78 ± 0.01 | 0.81 ± 0.02 | 0.78 ± 0.01 | 0.77 ± 0.01 | 0.78 ± 0.01 | 0.96 ± 0.01 | 0.37 G | 1.20 M
Only Wrist SubNet | 0.84 ± 0.01 | 0.86 ± 0.02 | 0.84 ± 0.01 | 0.83 ± 0.01 | 0.84 ± 0.01 | 0.97 ± 0.01 | 0.37 G | 1.20 M
ALL | 0.96 ± 0.01 | 0.97 ± 0.00 | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.96 ± 0.01 | 0.99 ± 0.00 | 1.12 G | 1.54 M
Bold font indicates the results of the proposed complete model (ALL). Underlined values represent the best result in each column.
