Article

New Gait Representation Maps for Enhanced Recognition in Clinical Gait Analysis

by
Nagwan Abdel Samee
1,†,
Mohammed A. Al-masni
2,*,†,
Eman N. Marzban
3,†,
Abobakr Khalil Al-Shamiri
4,
Mugahed A. Al-antari
2,
Maali Ibrahim Alabdulhafith
1,
Noha F. Mahmoud
5 and
Yasser M. Kadah
6
1
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
2
Department of Artificial Intelligence and Data Science, College of Artificial Intelligence Convergence, Sejong University, Seoul 05006, Republic of Korea
3
Biomedical Engineering Department, Cairo University, Giza 12613, Egypt
4
School of Computer Science, University of Southampton Malaysia, Iskandar Puteri 79100, Johor, Malaysia
5
Rehabilitation Sciences Department, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
6
Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah 22254, Saudi Arabia
*
Author to whom correspondence should be addressed.
†
These authors contributed equally to this work.
Bioengineering 2025, 12(10), 1130; https://doi.org/10.3390/bioengineering12101130
Submission received: 11 September 2025 / Revised: 10 October 2025 / Accepted: 17 October 2025 / Published: 21 October 2025

Abstract

Gait analysis is essential in the evaluation of neuromuscular and musculoskeletal disorders; however, traditional approaches based on expert visual observation remain subjective and often lack consistency. Accurate and objective assessment of gait impairments is critical for early diagnosis, monitoring rehabilitation progress, and guiding clinical decision-making. Although Gait Energy Images (GEI) have become widely used in automated, vision-based gait analysis, they are limited in capturing boundary details and time-resolved motion dynamics, both critical for robust clinical interpretation. To overcome these limitations, we introduce four novel gait representation maps: the time-coded gait boundary image (tGBI), color-coded GEI (cGEI), time-coded gait delta image (tGDI), and color-coded boundary-to-image transform (cBIT). These representations are specifically designed to embed spatial, temporal, and boundary-specific features of the gait cycle, and are constructed from binary silhouette sequences through straightforward yet effective transformations that preserve key structural and dynamic information. Experiments on the INIT GAIT dataset demonstrate that the proposed representations consistently outperform the conventional GEI across multiple machine learning models and classification tasks involving different numbers of gait impairment categories (four and six classes). These findings highlight the potential of the proposed approaches to enhance the accuracy and reliability of automated clinical gait analysis.

1. Introduction

Gait, or human locomotion on land, is a complex movement determined by the coordinated actions of the skeletal, muscular, and nervous systems. Various factors, such as gender, age, pathological conditions, and muscle fatigue, can influence gait, often resulting in abnormal walking patterns that impair mobility and overall functionality [1,2]. These abnormalities may affect both the upper and lower limbs, making clinical gait analysis a valuable tool for assessing neurological and physical disorders, including stroke, multiple sclerosis, and Parkinson’s disease. For example, hemiplegic gait, characterized by spasticity and imbalance, is frequently observed in stroke survivors [3]. Assessing the severity of this dysfunction provides critical insights into a patient’s recovery progress.
Although gait abnormalities can often be visually observed under uncontrolled conditions, the interpretation of individual gait patterns traditionally relies on rehabilitation experts who use visual observation to evaluate walking characteristics. However, this approach is subjective and highly dependent on the evaluator’s expertise [4]. Physicians typically rely on quantitative parameters, such as spatio-temporal, kinetic, kinematic, and physiological or anthropometric measures, to assess gait-related pathologies during clinical gait analysis [5].
Automated gait analysis systems have emerged as a promising alternative, offering precise identification of gait impairments, enabling early diagnosis, and supporting informed clinical decision-making. Moreover, automated gait analysis has garnered significant attention in fields such as healthcare, rehabilitation, sports performance monitoring, and security surveillance, providing a more objective and consistent assessment [6].
By analyzing biomechanical gait features such as speed, step length, and stance and swing times, these systems can detect impairments and even differentiate between disorders based on severity. Furthermore, such analyses can predict fall risks in older adults and assess head impact risks in athletes [2,7], highlighting their potential for broader applications.
Over the past two decades, there has been increasing interest in clinical applications of gait assessment. Methods for gait impairment analysis can be broadly categorized based on sensing modalities into motion and biosignal measurements (e.g., wearable sensors such as accelerometers, gyroscopes, Electromyography (EMG), and Electroencephalography (EEG)) and vision-based measurements (e.g., non-wearable sensors such as 2D and 3D cameras) [1]. Among these, vision-based approaches have gained popularity due to their ability to seamlessly integrate with computer vision and artificial intelligence systems for automated gait analysis. For instance, spatial features can be learned through model training using image data, such as frames captured from video clips of individuals walking. In some studies, even signal-based modalities are converted into images to facilitate gait pattern recognition. Examples include spectrogram images generated from acceleration signals [8] and recurrence plot images derived from vertical Ground Reaction Force (vGRF) signals [9].
A foundational concept in vision-based modalities is the use of gait silhouette images, which require a carefully designed environment to record and extract these silhouettes from video footage of a patient’s movement. However, using individual silhouette images alone is ineffective, as they lack the temporal information necessary to represent a complete gait pattern. Instead, combining a sequence of consecutive silhouette images to represent an entire gait cycle (i.e., comprising stance and swing phases) is critical for enabling machine learning models to effectively identify gait impairments.
A well-known method for generating such a combination is the Gait Energy Image (GEI) [10,11,12]. GEI is generated by averaging a sequence of aligned gait silhouette images over a complete gait cycle, producing a single composite image that captures the spatial and temporal features of the gait cycle. While GEI is widely used as an input for machine learning models in gait analysis, it has notable limitations. Specifically, GEI does not account for boundary information or time-resolved dynamics, resulting in a grayscale image that represents the subject’s motion through the temporal mean intensity of each pixel. Additionally, GEI may fail to preserve fine-grained motion details, which could limit its utility in identifying subtle gait abnormalities.
In contrast, this study introduces four new gait representation maps designed to incorporate richer and more clinically relevant information compared to GEI. The first proposed method is the time-coded gait boundary image (tGBI), which accumulates the boundaries of binary silhouette images across the gait cycle rather than the silhouettes themselves, thereby preserving boundary-specific information. The second is the color-coded GEI (cGEI), which divides the gait cycle into three segments and assigns each segment to a distinct color channel, creating a time-resolved and color-enhanced representation. The third image representation, the time-coded gait delta image (tGDI), emphasizes motion dynamics by capturing the differences between consecutive images in the gait sequence, with indices reflecting their position within the gait cycle. Finally, the color-coded boundary-to-image transform (cBIT) encodes the boundary points of each binary silhouette as distinct lines within a color image, offering a novel visualization of boundary transitions.
These proposed representations address the limitations of GEI and provide a more comprehensive approach to clinical gait analysis by preserving detailed motion dynamics and incorporating time-resolved information.
The main contributions of this paper can be summarized as follows. First, we propose four new gait image representations, each designed to emphasize distinct dynamic features of the gait cycle, including boundary information, motion dynamics, and time-resolved characteristics. Second, we conduct gait impairment recognition experiments and demonstrate how these new representation maps achieve superior performance across various machine learning approaches when compared to the conventional GEI. Third, we evaluate the proposed representations using datasets comprising two different numbers of gait impairment classes (four and six), encompassing normal gait, severe gait impairment, and impairments affecting the left and right legs and arms.

2. Related Works

Gait representation plays a crucial role in recognizing and analyzing human gait patterns, especially in clinical settings. Traditional methods have been foundational, but with advancements in technology, newer techniques have emerged to better capture the temporal and spatial dynamics of human motion.
One of the most widely used traditional methods for gait recognition is the GEI, which creates a single image by averaging silhouette frames over the entire gait cycle. GEI captures the overall shape and motion of an individual’s gait [12]. The advantage of GEI lies in its simplicity and its ability to encapsulate overall gait dynamics. It has been a foundational technique in human identification and gait analysis for detecting pathologies. However, GEI’s reliance on silhouette images can lead to the loss of finer details, such as boundary variations and subtle motion characteristics, which are essential for accurately recognizing gait abnormalities in clinical settings [6,12].
Over time, various techniques have been developed to address the limitations of GEI, enhancing the representation of gait patterns in more comprehensive and clinically relevant ways [13,14,15,16,17]. Recurrence plot images derived from signals like vertical Ground Reaction Force (vGRF) offer one such advancement. Recurrence plots visualize the temporal patterns in gait signals, providing a more nuanced view of the gait cycle that is particularly valuable for clinical assessment. Studies have demonstrated that using recurrence plots can improve the accuracy of gait classification by capturing subtle gait abnormalities that may be overlooked by GEI [9].
In addition, the use of spectrogram images derived from accelerometer data has shown potential in improving the recognition of pathological gait. Spectrograms transform raw acceleration signals into time-frequency representations, providing a richer and more comprehensive view of gait dynamics, which is critical for detecting abnormalities linked to conditions such as Parkinson’s disease [8,18,19]. For instance, El-Ziaat et al. [8] proposed a multi-feature fusion approach to enhance the detection of freezing of gait episodes in patients with Parkinson’s disease. The study combined time-domain statistical feature engineering with spectrogram-based time-frequency analysis, leveraging Convolutional Neural Networks (CNNs) to extract deep features from spectrogram images. Linda et al. [20] introduced a Color-Mapped Contour Gait Image (CCGI) technique to enhance Cross-View Gait Recognition (CVGR) using deep convolutional neural networks. CCGI addressed challenges associated with cross-view variations by encoding spatial and temporal information through color mapping, resulting in regularized contour images with fewer outliers.
Machine learning and deep learning techniques have greatly expanded the capabilities of gait recognition, enabling systems to handle large datasets and automatically extract relevant features from raw gait data. These methods are particularly advantageous in clinical applications, where the accuracy and efficiency of gait classification are paramount.
Traditional machine learning approaches, such as K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), and Neural Networks (NN), have been used for gait classification [21,22,23]. For instance, Zhang et al. [22] developed a gait recognition framework based on a Siamese neural network to extract gait features for human identification. The network employed distance metric learning to optimize similarity metrics for distinguishing between gaits of the same and different individuals. The study employed GEIs as input and utilized the KNN algorithm for classification.
The advent of deep learning, particularly CNNs, has revolutionized gait recognition by enabling automatic feature extraction from raw gait data. CNNs excel in image-based gait recognition tasks, where they can learn to detect important spatial and temporal features in gait images, such as GEIs. Several studies have successfully applied CNNs to recognize pathological gait patterns with high accuracy, highlighting the promise of deep learning in this field [24,25,26,27,28,29,30,31,32,33]. Castro et al. [24] proposed a gait identification framework leveraging CNNs to extract high-level features from low-level optical flow motion data. Their approach introduced a preprocessing pipeline to organize and normalize motion features as input for the CNN, which was designed to generate discriminative gait signatures. In [25], the authors presented a gait recognition system using CNNs and optical flow for feature extraction. The system incorporated two key modules: motion detection and tracking, and feature extraction. Motion detection employed background subtraction to isolate and segment the moving area, focusing on the spatial silhouette. The feature extraction module calculated gait cycles from silhouette sequences and constructed GEI. Optical flow is applied to GEIs to isolate dynamic components, excluding static parts. These processed features were input into a CNN to learn unique gait patterns. Al-masni et al. [28] proposed an innovative method for analyzing gait impairments by introducing silhouette sinograms, a one-dimensional representation that captures the relative variations in motion from silhouette images. These sinograms are created by calculating the distance and angle between the centroid and boundary points of detected silhouettes, effectively highlighting motion patterns. They utilized a one-dimensional convolutional neural network (1D CNN) to process consecutive silhouette sinogram signals, effectively learning spatiotemporal features through assisted knowledge learning.
In recent years, researchers have explored the integration of silhouette extraction and gait recognition within unified frameworks to address the limitations of traditional stepwise methods. By combining key processes like segmentation, feature extraction, and classification into a single learning framework, these approaches offer enhanced performance and efficiency. For instance, Nahar et al. [34] proposed a unified CNN for gait recognition that merged feature extraction and recognition into an end-to-end framework. Their method emphasized dynamic areas of gait using gait entropy images as input, which captured motion information while being robust to covariate conditions such as varying viewpoints and walking styles. Similarly, Song et al. [35] introduced GaitNet, an end-to-end network designed for gait-based human identification. This framework integrated silhouette segmentation and classification through two jointly trained CNNs. The joint learning approach overcame inefficiencies of traditional modular pipelines by adjusting each component to optimize the overall objective. The results of both studies highlighted the potential of end-to-end frameworks to enhance gait recognition performance by eliminating the limitations of fixed, independent modules and enabling more adaptive, streamlined learning.

3. Materials and Methods

3.1. Dataset

In this study, we utilized the publicly available INIT Gait Database (https://www.vision.uji.es/gaitDB/ (accessed 20 March 2025)), which provides high-quality binary silhouette sequences extracted from RGB video recordings. These recordings were captured at LABCOM, a specialized audiovisual studio at the University Jaume I. The dataset was created under controlled conditions with a uniform green background and a lateral-view camera setup, ensuring precise silhouette extraction.
The dataset includes ten healthy volunteers (nine males and one female), who were instructed to simulate a variety of gait patterns. These patterns were designed to resemble pathological gait abnormalities typically observed in clinical practice. The dataset comprises eight gait types: natural normal walking (NM), abnormal right leg (RL), abnormal left leg (LL), abnormal full body (FB), abnormal right arm (RA), abnormal left arm (LA), and partial arm impairments in which the swing of the left or right arm is restricted to approximately half of its normal range (LA-0.5 and RA-0.5). Normal walking represents the unaffected gait of a healthy subject, whereas the abnormal categories capture disturbances such as shortened stride length, reduced or absent arm swing, and shuffling gait with bent knees and small steps. Together, these categories provide a clinically inspired range of gait patterns, which in our study are treated as separate classes for multiclass recognition rather than a simple binary normal/abnormal classification.
Each participant performed two sequences per gait style, resulting in a total of 160 recorded sequences (10 subjects × 8 gait styles × 2 sequences). Each recording contained a variable number of gait cycles, depending on the simulated impairment. For example, participants walking with full-body impairment (FB) moved more slowly, producing a greater number of cycles within a single sequence compared to normal walking. From these sequences, the number of gait cycles extracted for each style was: FB (112), LL (102), RL (113), NM (85), LA (90), and RA (85). In total, all 160 recorded sequences contain 41,104 silhouette images, with each frame having a resolution of 400 × 800 pixels.
To maintain subject independence and address gender distribution, data were split at the subject level: 8 males (80%) for training and 2 subjects (20%; one male, one female) for testing. This setup ensured that the female participant was excluded from training but included in testing, thereby allowing evaluation of generalization across gender. The proposed representations maintained robust performance even on female samples in the test set. It should be noted that the ten participants varied in height and weight, although no obese individuals were included. This natural variation adds diversity to the dataset, but without introducing extreme body-size differences that could confound silhouette-based gait analysis.
This dataset serves as a valuable resource for studying gait variations and evaluating automated gait classification models.

3.2. Proposed Framework

In this study, a novel gait representation framework is introduced to enhance the detection of gait impairments. The key innovation of this work lies in the generation of new gait representation images, which are designed to capture distinct dynamic features of the gait cycle, including boundary information, motion dynamics, and time-resolved characteristics. These four newly developed gait representations provide a richer feature extraction process compared to the conventional GEI, leading to superior classification performance across different machine learning approaches.
To effectively utilize these new gait representations, a hybrid deep learning and Principal Component Analysis (PCA) approach is employed. The proposed Computer-Aided Diagnosis (CAD) system follows a structured three-stage process. First, the gait representation preparation stage involves constructing the four distinct gait image representations to emphasize key movement characteristics. Next, deep feature extraction is performed using pretrained convolutional neural networks (CNNs), including AlexNet, which operate within a transfer learning framework to extract high-level deep features while mitigating overfitting risks. Finally, in the dimensionality reduction and classification stage, PCA is applied to refine the extracted deep features by selecting the most significant principal components, addressing multicollinearity while preserving essential variance. This step ensures a more compact and efficient feature space, improving the overall classification accuracy.
The proposed CAD system is structured into four main modules. The first module, Gait Image Generation, involves constructing the new gait representations that highlight critical movement characteristics. The second module, Feature Extraction, leverages pretrained CNNs to obtain discriminative deep features from the generated images. The third module, Dimensionality Reduction, applies PCA to optimize feature selection and eliminate redundancy in the extracted features. Lastly, the Classification and Performance Evaluation module involves implementing traditional machine learning classifiers to categorize gait impairments and assess performance using standard CAD evaluation metrics.
To validate the effectiveness of the proposed framework, gait impairment recognition experiments were conducted on datasets consisting of four-class and six-class gait impairments, encompassing normal gait, severe gait impairments, and impairments affecting the left and right legs and arms. A schematic of the proposed framework is presented in Figure 1.

3.2.1. Generation of New Gait Representation Images

In this section, four new gait representation images are proposed as alternatives to the gait energy image (GEI) representation. The proposed representations are designed to incorporate additional or more clinically relevant information than GEI provides. Three of the new representations are intended for both human and machine interpretation, while the fourth is primarily designed for machine interpretation. All representations begin with preprocessing steps to align and standardize the size of binary silhouettes, followed by the extraction of their boundaries.
Binary Silhouettes Preprocessing
A.
Image Cropping:
The first step is to segment and crop all binary silhouettes in the gait cycle of each subject, so they fit within the same image size. This is achieved by projecting the original silhouettes along the horizontal and vertical axes, then identifying the start and end points as the minimum and maximum positions of nonzero elements. Given an initial binary silhouette image $n$ in the sequence, $S_n(x, y)$, the procedure can be expressed mathematically as follows:
$$S_n^{x}(y) = \sum_{x} S_n(x, y) \quad \text{and} \quad S_n^{y}(x) = \sum_{y} S_n(x, y).$$
The cropped image $S_n^{c}(x, y)$ is subsequently derived as the subset of rows and columns for which the projections in both the $x$ and $y$ directions are nonzero, such that
$$S_n^{c}(x, y) = \left\{ S_n(x, y) \;\middle|\; S_n^{x}(y) > 0 \ \text{and} \ S_n^{y}(x) > 0 \right\}.$$
The cropped image sizes $N_n^{x}$ and $N_n^{y}$ are obtained as the cardinality of all nonzero elements in the projection vectors $S_n^{y}(x)$ and $S_n^{x}(y)$, which can be calculated as the 0-norm of each:
$$N_n^{x} = \left\| S_n^{y}(x) \right\|_0, \qquad N_n^{y} = \left\| S_n^{x}(y) \right\|_0.$$
Each binary silhouette is cropped in both the horizontal and vertical directions to retain only the points between the start and end locations. This results in a sequence of binary silhouette images of variable horizontal and vertical dimensions. To standardize their sizes, all images are zero-padded equally from both directions to match the maximum height and width found among cropped images, resulting in uniformly sized binary silhouette images. The common binary silhouette sizes in a particular sequence are obtained as follows:
$$N^{x} = \max_{n} N_n^{x}, \qquad N^{y} = \max_{n} N_n^{y}.$$
The result of zero-padding all binary silhouette images in the sequence to the common sizes $N^{x}$ and $N^{y}$ is termed $S_n^{cz}(x, y)$ for image $n$ in the sequence. This ensures consistency of the subsequent movement assessment steps for the different silhouettes in the gait sequence.
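A minimal NumPy sketch of this cropping and zero-padding step is given below; it assumes the silhouettes of one gait cycle are supplied as a list of 2D binary arrays, and the function name is purely illustrative.

```python
import numpy as np

def crop_and_pad(silhouettes):
    """Crop each binary silhouette to its nonzero bounding box, then
    zero-pad all frames equally to a common size (illustrative sketch)."""
    cropped = []
    for S in silhouettes:                      # S: 2D binary array indexed (y, x)
        ys, xs = np.nonzero(S)                 # rows/columns with nonzero projections
        cropped.append(S[ys.min():ys.max() + 1, xs.min():xs.max() + 1])
    # Common size = maximum height/width among all cropped frames (N^y, N^x)
    Ny = max(c.shape[0] for c in cropped)
    Nx = max(c.shape[1] for c in cropped)
    padded = []
    for c in cropped:
        dy, dx = Ny - c.shape[0], Nx - c.shape[1]
        padded.append(np.pad(c, ((dy // 2, dy - dy // 2),
                                 (dx // 2, dx - dx // 2))))   # S_n^{cz}
    return padded
```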
B.
Centroid and Boundary Calculations:
For each gait sequence, the centroid location $(c_n^{x}, c_n^{y})$ of every uniformly sized, cropped binary silhouette image $S_n^{cz}(x, y)$ is calculated as the center of mass of the silhouette region, assuming uniform mass density throughout. This can be expressed mathematically as:
$$c_n^{x} = \frac{\sum_{y} \sum_{x} x \, S_n^{cz}(x, y)}{\sum_{y} \sum_{x} S_n^{cz}(x, y)}, \qquad c_n^{y} = \frac{\sum_{y} \sum_{x} y \, S_n^{cz}(x, y)}{\sum_{y} \sum_{x} S_n^{cz}(x, y)}.$$
Next, the uniformly sized, cropped binary silhouette images undergo a boundary tracing operation to obtain an ordered sequence of points that represent the silhouette boundary. This approach is more sophisticated than standard edge detection, as it provides an ordered sequence of boundary points rather than a simple binary contour image, an important advantage when dealing with complex silhouette shapes. Several existing methods can perform boundary tracing, such as pixel-following or vertex-following methods, which typically assume that the input binary images are zero-padded with no boundary points at the image matrix edges. In this study, the boundary tracing procedure was implemented using the Moore-Neighbor tracing algorithm, modified with Jacob’s stopping criteria [36]. The process begins at a boundary point, typically the uppermost-leftmost pixel, and examines its eight neighboring pixels one by one in the same direction (e.g., clockwise). Starting from a neighbor with a value of zero (background), it selects the first neighbor with a value of one (part of the silhouette) as the next point in the boundary sequence. This procedure is repeated until one of Jacob’s stopping criteria is satisfied, indicating that the boundary sequence has returned to the starting point. The result of this boundary detection operation is a list of the boundary points’ horizontal and vertical coordinates, $(b_n^{x}, b_n^{y})$, for each cropped binary silhouette image $S_n^{cz}(x, y)$. It should be noted that the size of these boundary point sequences may vary across different images within the same gait sequence. Figure 2 depicts an example of the boundary and centroid detection.
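The sketch below illustrates the centroid and ordered boundary extraction for a single preprocessed silhouette. OpenCV’s findContours is used here as a stand-in for the Moore-Neighbor tracing with Jacob’s stopping criteria described above; it likewise returns an ordered list of boundary coordinates, although the traversal details differ.

```python
import numpy as np
import cv2

def centroid_and_boundary(S):
    """Centroid (center of mass) and ordered boundary points of one
    uniformly sized binary silhouette (sketch)."""
    ys, xs = np.nonzero(S)
    cx, cy = xs.mean(), ys.mean()              # center of mass, uniform density
    contours, _ = cv2.findContours(S.astype(np.uint8),
                                   cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    boundary = max(contours, key=cv2.contourArea).squeeze(1)   # (K, 2) array of (x, y)
    return (cx, cy), boundary
```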
C.
Gait Cycle Detection:
Accurate detection of the gait cycle is a critical step in gait analysis, as it ensures that the extracted features correspond to a complete and consistent motion sequence. A gait cycle is defined as the time interval between two consecutive occurrences of a specific event in the walking pattern, such as heel strike or toe-off of the same foot. Detecting the gait cycle involves analyzing the temporal patterns in a sequence of gait frames and identifying key phases of the walking motion. Gait cycle detection typically relies on the analysis of temporal variations in the subject’s motion, often represented by binary silhouettes or joint trajectories. Methods such as motion energy profiles, centroid displacement, or periodicity analysis are commonly employed. Motion energy profiles, derived from the horizontal or vertical projection of silhouette pixels, are particularly effective in identifying repetitive patterns associated with walking. The detection process involves identifying the starting and ending frames of a complete gait cycle. This is achieved by locating periodic peaks or troughs in the motion energy profile that correspond to significant events, such as the maximum extension of a leg or the alignment of both feet. In this work, the detection was based on the calculation of 2D correlation coefficients across all cropped binary silhouette images $S_n^{cz}(x, y)$ in the gait sequence or their Fourier transform magnitudes. The rationale for this selection lies in the fact that the 2D correlation coefficient is a robust statistical measure for assessing similarity between binary silhouettes, with inherent properties that make it particularly advantageous in scenarios where noise, such as segmentation errors, may interfere with data analysis. This results in two signals of 2D correlation coefficients versus time, computed from the images ($\mathrm{CorrI}_n$) or their Fourier transform magnitudes ($\mathrm{CorrF}_n$), which are expected to exhibit periodic peaks and valleys that can be used to robustly detect the gait cycle. Based on practical experimentation, the valleys were found to be more reliable reference points in each cycle due to their clearly defined locations, whereas the peaks tend to be more difficult to locate because of flatter regions observed in certain gait cycles.
The valley detection algorithm employed in this study is a simple minima detection method that considers the prominence of a local minimum. Prominence measures how much a valley stands out, based on its depth and its location relative to other valleys. To compute the prominence of a valley, the closest points before and after the detected valley with the same correlation coefficient value are first identified. Each of these outer points can be either a point on the correlation coefficient curve or the end of the data. Then, the highest correlation coefficient peaks are found in the left and right intervals between the detected valley and the two outer points. The difference in correlation coefficient values between the smaller of these two peaks and the detected valley is taken as the valley prominence. To ensure reliable detection of valleys, a minimum prominence threshold of 0.1, corresponding to 10% of the range of correlation coefficient values in each signal, was used in this study. Figure 3 illustrates an example of a correlation curve used to identify gait cycles.
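A simplified sketch of this valley-based cycle detection is shown below. As an assumption made for brevity, each frame is correlated against a fixed reference frame (the first frame of the sequence) to form the correlation signal, and valleys are detected with the 10% prominence threshold; the Fourier-magnitude variant described above follows the same pattern.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_gait_cycle_valleys(frames, min_prom_frac=0.1):
    """Locate candidate gait-cycle boundaries as prominent valleys of the
    2D correlation coefficient signal (illustrative sketch)."""
    ref = frames[0].astype(float).ravel()
    corr = np.array([np.corrcoef(ref, f.astype(float).ravel())[0, 1]
                     for f in frames])                       # CorrI_n signal
    prominence = min_prom_frac * (corr.max() - corr.min())   # 10% of the signal range
    valleys, _ = find_peaks(-corr, prominence=prominence)    # valleys of corr
    return corr, valleys
```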
Gait Sequence Image Representations
A.
Gait Energy Image (GEI):
The GEI is a widely used spatiotemporal representation in gait analysis, which summarizes the motion information of a subject over a complete gait cycle into a single grayscale image. The GEI is computed from a sequence of binary silhouette images, where each silhouette represents the subject’s body in a single frame of the gait cycle. To ensure consistency across frames, all binary silhouettes are preprocessed to align and normalize the subject’s position and size. The alignment process centers the silhouettes by translating each frame so that the centroid of the silhouette coincides with a fixed reference point.
Additionally, the silhouettes are scaled to a uniform height and width to account for variations in subject size or camera distance.
In this work, the GEI representation is calculated as the average of the aligned binary silhouette images over a complete gait cycle. Given a sequence of $N$ cropped and aligned binary silhouettes $S_n^{cz}(x, y)$, where $n \in \{1, 2, \ldots, N\}$ and $(x, y)$ denotes the pixel coordinates, the GEI is computed as:
$$GEI(x, y) = \frac{1}{N} \sum_{n=1}^{N} S_n^{cz}(x, y).$$
Each pixel value in the resulting GEI represents the temporal mean intensity of that pixel across the gait cycle, providing a compact and informative representation of the subject’s motion. So, the resulting GEI is a single grayscale image summarizing the gait motion cycle. Figure 4 shows an example of a generated GEI.
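As a baseline for the new representations, the GEI computation reduces to a temporal mean of the preprocessed frames, as in the following minimal sketch (the silhouettes are assumed to be the uniformly sized frames of one detected cycle):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI: temporal mean of aligned binary silhouettes over one gait cycle."""
    stack = np.stack([s.astype(float) for s in silhouettes])   # shape (N, H, W)
    return stack.mean(axis=0)                                  # grayscale GEI
```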
B.
Time-Coded Gait Boundary Image (tGBI):
In this proposed representation, instead of accumulating the binary silhouettes across images in a gait cycle, the boundaries of the binary silhouettes are accumulated. Additionally, each boundary is weighted according to its time index within the gait cycle. Consequently, this representation is referred to as the Time-Coded Gait Boundary Image (tGBI). The motivation for this new representation is to emphasize the importance and visibility of boundary changes throughout the gait cycle, without interference or obstruction from the filled regions within the binary silhouettes. As a result, this representation has the potential to enhance both human and machine interpretation of gait patterns. An example of a generated tGBI is shown in Figure 5.
The calculation of the tGBI representation proceeds as follows. First, the boundary detection operation produces a list of boundary point coordinates, $(b_n^{x}, b_n^{y})$, for each cropped binary silhouette image $S_n^{cz}(x, y)$. A new boundary image $B_n^{cz}(x, y)$, having the same size as $S_n^{cz}(x, y)$, is then created for each binary silhouette, defined as:
$$B_n^{cz}(x, y) = \begin{cases} 1, & (x, y) \in (b_n^{x}, b_n^{y}) \\ 0, & \text{otherwise.} \end{cases}$$
Subsequently, the tGBI representation is computed by accumulating the boundary images of all binary silhouettes in the gait sequence, weighted by their respective time indices, as follows:
$$tGBI(x, y) = \frac{1}{N} \sum_{n=1}^{N} n \cdot B_n^{cz}(x, y).$$
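A short sketch of this accumulation is shown below; it assumes the per-frame boundary images $B_n^{cz}$ have already been generated from the traced boundary points.

```python
import numpy as np

def time_coded_gait_boundary_image(boundary_images):
    """tGBI: time-weighted accumulation of per-frame boundary images
    (each B_n is 1 on the traced silhouette boundary, 0 elsewhere)."""
    N = len(boundary_images)
    tgbi = np.zeros_like(boundary_images[0], dtype=float)
    for n, B in enumerate(boundary_images, start=1):   # time index n = 1..N
        tgbi += n * B.astype(float)
    return tgbi / N
```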
C.
Color-Coded Gait Energy Image (cGEI):
This proposed representation is a variant of the traditional GEI that incorporates color coding for time in the gait cycle. In particular, the gait cycle is divided into three temporal segments, and a partial GEI is computed for each. These partial images are then assigned to the red, green, and blue colors, respectively, to form the Color Gait Energy Image (cGEI). An example of a generated cGEI is illustrated in Figure 6. This representation provides a time-resolved view of the gait sequence, offering an advantage over the conventional GEI, which lacks time resolving capability. As a result, the cGEI is suitable for both human and machine interpretation. Mathematically, the cGEI representation can be defined as a color image with three color components that are expressed as follows:
$$cGEI(x, y)_{\mathrm{Red}} = \frac{1}{N} \sum_{n=1}^{N/3} S_n^{cz}(x, y),$$
$$cGEI(x, y)_{\mathrm{Green}} = \frac{1}{N} \sum_{n=N/3+1}^{2N/3} S_n^{cz}(x, y),$$
$$cGEI(x, y)_{\mathrm{Blue}} = \frac{1}{N} \sum_{n=2N/3+1}^{N} S_n^{cz}(x, y).$$
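The following sketch computes the cGEI by splitting the cycle into three near-equal temporal segments and mapping their partial sums to the R, G, and B channels; the channel stacking order and the handling of cycles whose length is not divisible by three are assumptions.

```python
import numpy as np

def color_coded_gei(silhouettes):
    """cGEI: partial GEIs of the three temporal thirds of the gait cycle,
    assigned to the red, green, and blue channels (sketch)."""
    stack = np.stack([s.astype(float) for s in silhouettes])   # shape (N, H, W)
    N = stack.shape[0]
    thirds = np.array_split(np.arange(N), 3)                   # three temporal segments
    channels = [stack[idx].sum(axis=0) / N for idx in thirds]  # 1/N scaling as in the equations
    return np.stack(channels, axis=-1)                         # (H, W, 3) = R, G, B
```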
D.
Time-Coded Gait Delta Image (tGDI):
This proposed representation is designed to highlight the differences between consecutive images in the gait sequence as representatives of the gait motion, along with their indices within the gait cycle. To construct this representation, each image in the gait sequence is subtracted from its successor, producing delta images with positive, negative, and zero values. The positive and negative parts indicate regions toward which (positive values) or from which (negative values) motion occurred. Two simple thresholding masks are applied to the difference image (i.e., delta image) to generate separate maps of positive and negative values. These maps are then multiplied by their respective time indices within the gait cycle. Finally, the resulting time-weighted positive and negative maps are assigned to the red and blue channels, respectively, to form the Time-Coded Gait Delta Image (tGDI). This representation allows a finer time-resolved view of the sequence of images in the gait cycle, with both the time variable and the movement of different body parts encoded in the final image. Owing to its intuitive concept and straightforward computational procedure, the tGDI is suitable for both human and machine interpretation. An example of a generated tGDI is presented in Figure 7. To compute the tGDI representation, each cropped binary silhouette image $S_n^{cz}(x, y)$ is subtracted from the subsequent image $S_{n+1}^{cz}(x, y)$ to create a delta image $d_n^{cz}(x, y)$ of the same size as $S_n^{cz}(x, y)$. This delta image contains positive and negative values, depending on whether the gait motion went to or from different parts of the image. A new color delta image $D_n^{cz}(x, y)$ is then created from each delta image, whereby the red channel contains the positive values of the delta image, while the blue channel contains the absolute values of the negative values. This process can be mathematically expressed as follows:
$$D_n^{cz}(x, y)_{\mathrm{Red}} = \begin{cases} 1, & d_n^{cz}(x, y) > 0 \\ 0, & \text{otherwise,} \end{cases}$$
$$D_n^{cz}(x, y)_{\mathrm{Blue}} = \begin{cases} 1, & d_n^{cz}(x, y) < 0 \\ 0, & \text{otherwise.} \end{cases}$$
The tGDI representation is obtained by accumulating the color delta images of all binary silhouettes in the gait sequence, weighted by their respective time indices, as follows:
$$tGDI(x, y) = \frac{1}{N} \sum_{n=1}^{N} n \cdot D_n^{cz}(x, y).$$
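A compact sketch of the tGDI construction is given below; the exact time-index convention for weighting each delta image is an assumption, since a cycle of N frames yields N - 1 differences.

```python
import numpy as np

def time_coded_gait_delta_image(silhouettes):
    """tGDI: time-weighted accumulation of the positive (red) and negative
    (blue) parts of consecutive-frame differences (sketch)."""
    stack = np.stack([s.astype(float) for s in silhouettes])   # shape (N, H, W)
    N, H, W = stack.shape
    tgdi = np.zeros((H, W, 3))
    for n in range(1, N):
        delta = stack[n] - stack[n - 1]        # delta image between consecutive frames
        tgdi[..., 0] += n * (delta > 0)        # red: regions motion moved into
        tgdi[..., 2] += n * (delta < 0)        # blue: regions motion moved out of
    return tgdi / N
```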
E.
Color-Coded Boundary-to-Image Transform (cBIT):
This proposed representation adopts a different approach by encoding the complete information of a gait sequence into a single image. The basic idea is to encode the boundary points of each binary silhouette as a single line in a color image, with the $x$-coordinates written to the red channel and the $y$-coordinates written to the blue channel of that line. In this way, the image lines collectively contain the full boundary information of all binary silhouettes in the gait sequence. In principle, this information enables the reconstruction of the original binary silhouettes by recovering the boundary points from the encoded image and applying a simple flood-fill algorithm. As such, this representation can be considered a unique transformation with a one-to-one correspondence between its two domains, and is thus referred to as the Color-Coded Boundary-to-Image Transform (cBIT) representation. Although the resultant image encompasses all the information in the original sequence, it is very difficult for humans to interpret, given its unintuitive appearance that does not resemble gait images. Rather, it is intended for techniques such as deep learning, which can be designed and trained to exploit the full scope of the information embedded in this representation. Hence, this representation is suitable for machine interpretation only. An example of a generated cBIT is shown in Figure 8.
The calculation of the cBIT representation begins with the result of the boundary detection operation, obtained as a list of horizontal and vertical locations of boundary points, $(b_n^{x}, b_n^{y})$, for each cropped binary silhouette image $S_n^{cz}(x, y)$. The common size of all boundary point arrays is selected as their maximum over all images in the gait sequence, $N_B$. To ensure uniformity, all boundary point arrays are resampled to this common size, if necessary, resulting in $({b'}_n^{x}, {b'}_n^{y})$. Subsequently, the cBIT image $T$ of size $N_B \times N$ is constructed for all binary silhouettes as follows:
$$T(x, n)_{\mathrm{Red}} = {b'}_n^{x},$$
$$T(x, n)_{\mathrm{Blue}} = {b'}_n^{y}.$$
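The sketch below illustrates the cBIT encoding; each boundary is assumed to be a (K, 2) array of (x, y) coordinates, and linear interpolation is used for the resampling step, which is otherwise left unspecified.

```python
import numpy as np

def color_coded_bit(boundaries):
    """cBIT: per-frame boundary x-coordinates written to the red channel and
    y-coordinates to the blue channel of one image column (sketch)."""
    N_B = max(len(b) for b in boundaries)                 # common boundary length
    T = np.zeros((N_B, len(boundaries), 3))
    for n, b in enumerate(boundaries):
        t_old = np.linspace(0.0, 1.0, len(b))
        t_new = np.linspace(0.0, 1.0, N_B)
        T[:, n, 0] = np.interp(t_new, t_old, b[:, 0])     # red  = resampled x-coordinates
        T[:, n, 2] = np.interp(t_new, t_old, b[:, 1])     # blue = resampled y-coordinates
    return T
```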

3.2.2. Gait Classification

The goal of this section is to propose example classification methods that utilize the newly developed gait representations as training and testing data in order to demonstrate their potential. These methods are deliberately kept simple and efficient to enable clear comparisons across different representations. Transfer learning is a widely utilized technique in deep learning that enables the adaptation of a pre-trained network, originally developed for one task, to another relevant task. This approach leverages the knowledge embedded in a network trained on a large, diverse dataset to improve performance on a target task with limited data, a common scenario in biomedical engineering, including gait classification.
A standard implementation of transfer learning involves using a pre-trained network as a feature extractor, processing input data up to its last learnable layer and treating the resulting activations as highly informative features. For example, AlexNet, a pioneering deep convolutional neural network originally trained on the ImageNet dataset for image classification, can be repurposed for feature extraction. In this study, we utilized the original 2012 version of AlexNet. This version represented a substantial advancement in deep learning by markedly surpassing prior methodologies in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012). AlexNet consists of five convolutional layers followed by three fully connected layers, with each layer progressively learning to capture more abstract and complex features from the input data. It accepts RGB images of size 224 × 224 × 3 as input and incorporates key design elements such as overlapping max-pooling, local response normalization (LRN), and rectified linear unit (ReLU) activation functions, which accelerate convergence and enhance model performance.
To use AlexNet as a feature extractor, the network’s architecture is preserved up to the last learnable layer, typically the second fully connected layer (fc7), before the final classification layer (fc8). The pre-trained weights from the ImageNet training are retained, as they encapsulate general patterns and features learned from a large and diverse dataset. During feature extraction, input images are passed through AlexNet, and the activations from the fc7 layer are extracted. This provides a total of 4096 features. These activations represent a compact and semantically rich feature vector for each input image, encoding high-level visual attributes such as shapes, textures, and patterns. These features are then used as input to a new model or classifier tailored to the target task.
The use of AlexNet for transfer learning offers several advantages. AlexNet’s architecture is relatively simple yet highly effective, making it computationally efficient compared to more recent, deeper models. Its design includes ReLU for non-linearity and dropout for regularization, which contribute to robust learning. Leveraging a pretrained AlexNet eliminates the need to train a deep network from scratch, which is particularly advantageous when working with limited data as mentioned above. Training a deep learning model from scratch is computationally expensive, time-consuming, and prone to overfitting, especially when the dataset is small. By using a pretrained AlexNet, the extracted features capture the generalizable knowledge of the model, ensuring a robust starting point for further analysis. It is important to note that in the proposed framework, AlexNet is not used as an end-to-end classifier but solely as a feature extractor within the transfer learning pipeline. The proposed approach is architecture-agnostic and can easily integrate other pre-trained deep networks. This design choice ensures that the novelty and generalizability of the proposed method stem from the newly developed gait representations and the feature optimization strategy, rather than the specific choice of backbone network. Consequently, if AlexNet were to produce suboptimal features, it could be replaced without affecting the overall framework structure or performance evaluation pipeline.
Since the input to AlexNet is a color image, the network training takes the color information into account, resulting in different processing weights for each of the color channels. In the proposed gait representations, three techniques utilize color coding to encode different types of information. Consequently, the feature extraction process depends on how this information is assigned to the color channels of the input image. In other words, for the same underlying data, altering the order of the color channels will produce different sets of features from AlexNet. To ensure that the most comprehensive set of features is captured, all possible permutations of the color channels (i.e., color shifts) are generated and used as inputs to AlexNet, and the resulting feature sets are concatenated. For grayscale representations, this permutation results in the same set of features repeated in the feature vector, which does not affect the outcome. Given that there are three color channels (R, G, B), the total number of possible permutations is $3! = 3 \times 2 \times 1 = 6$, comprising (R, G, B), (R, B, G), (G, R, B), (G, B, R), (B, R, G), and (B, G, R). Accordingly, for each input image, a feature vector $F$ consisting of $4096 \times 6 = 24{,}576$ features is obtained. This results in a high-dimensional feature space, which necessitates the application of dimensionality reduction techniques such as principal component analysis (PCA).
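The following sketch extracts fc7 activations for all six channel permutations using torchvision’s pretrained AlexNet. Note that the torchvision model is not identical to the original 2012 network (it omits local response normalization), and the usual ImageNet mean/std normalization is left out here for brevity.

```python
import itertools
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
# Keep everything up to the second fully connected layer (fc7), dropping the
# final 1000-way classification layer (fc8).
fc7 = torch.nn.Sequential(alexnet.features, alexnet.avgpool, torch.nn.Flatten(),
                          *list(alexnet.classifier.children())[:-1])

def extract_color_shift_features(img):
    """Concatenate fc7 activations over all 3! = 6 channel permutations of one
    gait representation image (img: float tensor of shape 3 x H x W)."""
    img = TF.resize(img, [224, 224])
    feats = []
    with torch.no_grad():
        for perm in itertools.permutations(range(3)):          # six channel orders
            feats.append(fc7(img[list(perm)].unsqueeze(0)).squeeze(0))
    return torch.cat(feats)                                    # 4096 x 6 = 24,576 features
```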
PCA is a widely used dimensionality reduction technique which transforms high-dimensional data into a lower-dimensional space while preserving as much variance as possible. It identifies new axes, called principal components, which are linear combinations of the original variables. These components are ordered by the amount of variance they explain, with the first principal component capturing the largest variance and each subsequent component capturing the maximum remaining variance under the constraint of orthogonality to the previous components.
The PCA process involves several key steps. First, the data is centered by subtracting the mean of each feature. Next, the covariance matrix of the data is computed to capture relationships between features. An eigenvalue decomposition or singular value decomposition (SVD) is then performed on the covariance matrix to obtain the eigenvectors and eigenvalues. The eigenvectors represent the directions of the principal components, while the eigenvalues indicate the amount of variance explained by each principal component. The output of PCA includes the eigenvector (loading) matrix, which contains the eigenvectors as its columns. Each eigenvector represents a principal component and defines the direction of maximum variance in the data. The entries of the eigenvectors indicate the contributions (or loadings) of the original variables to each principal component. This matrix provides insights into how the original variables are related to the principal components. By retaining only the first few principal components, PCA reduces the dimensionality of the data while preserving its most significant patterns and discarding noise or less informative dimensions. In this study, the process of dimensionality reduction using PCA was performed as follows. Given the multiclass nature of the problem, PCA is applied to each class independently to obtain a set of $P$ eigenvectors, where $P$ is selected as 15 to retain most of the variance (at least 80%) in all classes while providing significant dimensionality reduction. The eigenvectors from all classes are combined to provide a set of $N \cdot P$ eigenvectors, where $N$ denotes the number of classes. With the eigenvectors stored as columns, this combined eigenvector matrix $E$ has dimensionality $24{,}576 \times (N \cdot P)$, where 24,576 is the number of concatenated features in this study. The reduced feature vector $R$ of dimensionality $(N \cdot P) \times 1$ is then calculated as $R = E^{T} \cdot F$. For a 6-class problem and a selected number of retained eigenvectors $P = 15$, this results in a modest $(90 \times 1)$ reduced feature vector, which is much simpler to deal with for training robust classification models.
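A sketch of this class-wise PCA reduction is shown below, assuming the concatenated features are stored as an (n_samples × 24,576) array; scikit-learn’s PCA (which centers the data internally) stands in for the eigen-decomposition described above.

```python
import numpy as np
from sklearn.decomposition import PCA

def build_loading_matrix(features, labels, P=15):
    """Stack the first P principal-component directions of every class into
    the combined loading matrix E (columns = eigenvectors), so that a feature
    vector F is reduced via R = E.T @ F (sketch)."""
    components = []
    for c in np.unique(labels):
        pca = PCA(n_components=P).fit(features[labels == c])   # class-wise PCA
        components.append(pca.components_)                     # shape (P, 24576)
    return np.vstack(components).T                             # shape (24576, N_classes * P)

# Usage: E = build_loading_matrix(train_features, train_labels)
#        reduced = features @ E        # each row is R = E.T @ F
```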
The next step is to use lower-complexity conventional machine learning models to perform classification on the extracted reduced set of features. These models are computationally less intensive and require fewer parameters than a fully trained deep network. This significantly reduces the risk of overfitting, as the classifier focuses on task-specific decision boundaries while leveraging the rich, pre-learned representations from AlexNet. The combination of deep feature extraction and simpler classifiers balances performance with computational efficiency, making this approach particularly suitable for practical biomedical engineering problem scenarios where computational resources are limited and datasets are small. Furthermore, this strategy enhances model interpretability, as the contributions of individual features can often be analyzed more easily in conventional models compared to end-to-end deep learning systems.
An effective approach for classification using the extracted features is to employ a K-Nearest Neighbors (KNN) algorithm. KNN is a simple yet powerful non-parametric machine learning model that classifies data points based on the majority class of their nearest neighbors in the feature space. The advantages of KNN over other conventional models lie in its simplicity, interpretability, and absence of assumptions about the underlying data distribution. KNN does not require model training beyond the initial hyperparameter optimization, making it computationally efficient. Instead, it stores the feature vectors and their corresponding labels and performs classification during inference by comparing the distances between feature vectors. This characteristic makes KNN particularly suitable for transfer learning applications, where the extracted features from AlexNet are already highly discriminative and well-structured for distance-based classification. Another significant advantage of using KNN is its flexibility in handling multi-class problems without the need for complex modifications, as it naturally extends to multi-class settings by considering the nearest neighbors’ majority votes. Additionally, KNN’s simplicity reduces the risk of overfitting compared to more complex machine learning models. It benefits from the high-quality feature representations produced by AlexNet, allowing the classifier to focus on exploiting these discriminative features rather than learning complex decision boundaries that may not generalize well. Furthermore, the interpretability of KNN enhances transparency, as predictions can be traced back to specific training examples in the feature space, making it easier to analyze the classification results.
In order to ensure reliable estimation of performance, K-fold cross-validation, a robust technique commonly used in machine learning and statistical modeling to evaluate the performance and generalizability of a model, is utilized. It works by partitioning the dataset into K equal (or nearly equal) subsets (or folds). The model is trained on (K − 1) folds and validated on the remaining fold, ensuring that every data point is used for both training and validation exactly once. This process is repeated K times, with a different fold serving as the validation set in each iteration. The results from all K iterations are then averaged to produce a single set of performance metrics, such as accuracy, sensitivity, etc. This method reduces the variability associated with a single train-test split, providing a more reliable estimate of the model’s performance on unseen data. Additionally, K-fold cross-validation helps to detect overfitting and ensures the robustness of the model, making it an essential component of model evaluation in machine learning workflows. For this study, the value of K was selected as 5 to balance computational efficiency with performance reliability. A block diagram of the introduced classification methodology is shown in Figure 9.
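Finally, a minimal sketch of the KNN classification with 5-fold cross-validation is shown below; the number of neighbors is not specified in the text and is set to 5 here purely for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# reduced_features: (n_samples, N_classes * P) array from the PCA step above
# labels: gait-impairment class of each sample
knn = KNeighborsClassifier(n_neighbors=5)                      # illustrative K value
scores = cross_val_score(knn, reduced_features, labels, cv=5)  # 5-fold cross-validation
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")
```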

4. Results and Discussion

In this study, five gait representation maps were compared: the traditional GEI and four newly introduced maps as described above. AlexNet was used for feature extraction along with several conventional machine learning classifiers. Two classification problems were investigated to assess the robustness of the proposed methods under varying complexities: a 4-class task (FB, LL, RL, NM) and a more challenging 6-class task (FB, LL, RL, NM, LA, RA). The following performance metrics were used to assess the maps and algorithms: overall accuracy (Acc), True Positive Rate (TPR), also known as sensitivity (SE) or recall, Area Under the receiver operating characteristic (ROC) Curve (AUC), F1-score (the harmonic mean of precision and recall), and weighted average Positive Predictive Value (PPV). The number of samples for each class was FB (112), LL (102), RL (113), NM (85), LA (90), and RA (85), respectively.
Table 1 summarizes the overall accuracy, weighted-average sensitivity, F1-scores, and AUC achieved across different representations and setups. Without applying color shifts, the GEI map achieved the best performance in the 4-class experiment, with an accuracy of 95.4% and an AUC of 0.994. However, when color shifts were applied, all the proposed representation maps outperformed GEI. In contrast, the cGEI map showed superior performance in the 6-class experiment, with an accuracy of 87.6% and an AUC of 0.983.
Additionally, applying the color shift method improved performance across all experiments and all maps, as well as for the combined features from all maps, as shown in Figure 10. For the 4-class experiment, results improved by 2.5–5.8% in accuracy, 0.7–4.7% in AUC, and 2.5–5.5% in PPV. This demonstrates the substantial benefit of leveraging color permutations (i.e., color shifts) to extract richer feature sets from the representations. The cBIT map exhibited the largest performance gain from color shifts.
For the 6-class experiment, results improved by 6.6–12.4% in accuracy, 1.4–11.6% in AUC, and 6.7–12.2% in PPV. Notably, the gains were more pronounced in this higher complexity task, showing that multi-class problems benefit even more from enriched feature extraction. Classification using the combined features from all maps was maximally enhanced by applying color shifts, suggesting that integrating diverse representations coupled with channel re-encoding offers a powerful approach for robust gait impairment recognition.
In the 4-class problem, using the cGEI map resulted in a 1% decrease in the TPR of LL, and combining all features led to a decrease of 0.9% in the PPV of FB and a 4.7% reduction in the TPR of NM. Additionally, in the 6-class problem, using the tGDI map lowered the AUC of LA by 0.31%, the cBIT map reduced the TPR of RL by 0.9%, and combining all features decreased the PPV, TPR, and AUC of FB by 3.6%, 4.5%, and 0.5%, respectively. These relatively minor reductions are likely attributed to inter-class confusion where subtle impairments (e.g., left vs. right arm restrictions) share overlapping gait features, particularly when combined into high-dimensional feature spaces. This is illustrated in Figure 11. Detailed class-level performance with and without the color shift method is shown in Table 2 and Table 3. The FB class consistently achieved the highest values across all metrics in both the 4-class and 6-class experiments. Overall, performance improved in all setups, with the exception of a few specific cases.
Beyond our experiments, these findings connect well with prior research. The GEI map has been widely used and investigated for both gait recognition [6,12,13,14,15,16,17] and anomaly detection tasks [2,3,10,11,28,37]. However, changes in walking speed, clothing, or wearing additional items such as a coat or shoes can significantly degrade the performance of GEI [7,10,13,14,15,16,27,38]. In 2019, Xu et al. [16] introduced the Single Support GEI (SSGEI), a new map designed to be speed-invariant for subject recognition. In our study, we introduce four silhouette-derived feature maps, specifically aimed at classifying gait anomalies.
Earlier, in 2014, Whytock et al. [38] introduced the Skeleton Variance Image (SVIM). Through mathematical manipulation, they also derived the Skeleton Energy Image (SEIM) and Gait Variance Image (GVI) from SVIM and GEI, respectively. In their experiments, GEI performed best under normal conditions (i.e., unchanged clothing and posture). However, its performance dropped markedly when these conditions varied.
In 2020, Loureiro et al. [10] proposed the Skeleton Energy Image (SEI) and created their own dataset, GAIT-IST, consisting of five classes. In their work, they used VGG-19 for feature extraction, and either fed these features directly into VGG-19 for classification or applied PCA followed by LDA or SVM. Under the best configuration, SVM and LDA achieved similar accuracies of 96.4%, whereas using VGG-19 for the entire pipeline reached 98.5%. However, this performance dropped sharply due to overfitting when tested on a different dataset, with the highest accuracy only reaching 76.7%.
Compared to these studies, our proposed framework demonstrated strong generalization within the controlled INIT dataset, especially when leveraging multi-representation combinations and color shifts. Table 4 lists studies that employed GEI or proposed new representation maps for gait anomaly detection. These outcomes suggest that integrating novel representation maps capturing boundary, temporal, and motion-difference features, combined with multi-channel exploitation, provides a promising direction for enhancing automated gait impairment analysis.
While this study primarily utilized silhouettes from the INIT Gait Database, which are acquired under controlled imaging conditions, this choice was deliberate to isolate and evaluate the representational power of the proposed model independently of segmentation and background noise effects. In real-world clinical environments, silhouette extraction is influenced by factors such as background complexity, illumination variation, clothing, and occlusions. These factors constitute a separate preprocessing challenge that directly affects all silhouette-based representations, including GEI. Our contribution, therefore, focuses on improving the robustness and discriminative capability of the gait representation given a reasonably accurate silhouette extraction. Once integrated with advanced segmentation techniques (e.g., background subtraction based on deep learning, adaptive thresholding, or multimodal sensing), the proposed representations can be expected to generalize effectively to clinical settings.
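To make this preprocessing dependency concrete, the following is a minimal sketch of silhouette extraction using OpenCV's MOG2 background subtractor; the thresholds, morphological kernel, and minimum-area filter are illustrative assumptions, and clinical footage would likely require more robust segmentation, as discussed above.

```python
import cv2

def extract_silhouettes(video_path: str, min_area: int = 500):
    """Sketch of silhouette extraction with OpenCV's MOG2 background subtractor.

    Real clinical footage would typically need further cleanup (shadow removal,
    morphological filtering, person detection) before gait map generation.
    """
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    silhouettes = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        # Shadows are marked with value 127; keep only confident foreground.
        mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
        if cv2.countNonZero(mask) >= min_area:
            silhouettes.append(mask)
    cap.release()
    return silhouettes
```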

5. Conclusions

This study introduces four novel gait representation maps (tGBI, cGEI, tGDI, and cBIT) designed to address the limitations of traditional Gait Energy Images (GEI) by incorporating boundary information, temporal dynamics, and enhanced spatial features. Comprehensive experiments conducted on the INIT Gait dataset demonstrated that the proposed representations—particularly with the application of color shifts—consistently outperform the conventional GEI in classifying gait impairments across multiple machine learning models and varying impairment categories (n = 4 and n = 6 classes). These findings highlight the potential of the proposed methods to improve the accuracy and robustness of automated clinical gait analysis, offering valuable tools for supporting objective and reliable diagnosis in healthcare settings.

Author Contributions

Conceptualization, M.A.A.-m. and Y.M.K.; data curation, M.A.A.-m., N.F.M. and N.A.S.; formal analysis, E.N.M. and Y.M.K.; funding acquisition, N.A.S.; investigation, A.K.A.-S., M.A.A.-a. and M.I.A.; methodology, M.A.A.-m. and Y.M.K.; project administration, M.A.A.-m., N.A.S. and Y.M.K.; resources, N.A.S.; software, E.N.M. and Y.M.K.; validation, A.K.A.-S., M.A.A.-a., M.I.A. and N.F.M.; visualization, M.A.A.-m., E.N.M. and Y.M.K.; writing—original draft, M.A.A.-m., E.N.M., N.A.S., A.K.A.-S. and Y.M.K.; writing—review and editing, M.A.A.-m. and Y.M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was funded by the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, through the Program of Research Project Funding After Publication, grant No. (44-PRFA-P-7).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors confirm that the data used in this study are publicly available at https://www.vision.uji.es/gaitDB/ (accessed on 5 March 2025).

Acknowledgments

The authors would like to express their gratitude to the Deanship of Scientific Research, Princess Nourah bint Abdulrahman University, for supporting this work through the Program of Research Project Funding After Publication, grant No. (44-PRFA-P-7).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Matsushita, Y.; Tran, D.T.; Yamazoe, H.; Lee, J.-H. Recent use of deep learning techniques in clinical applications based on gait: A survey. J. Comput. Des. Eng. 2021, 8, 1499–1532. [Google Scholar] [CrossRef]
  2. Verlekar, T.T.; Soares, L.D.; Correia, P.L. Automatic classification of gait impairments using a markerless 2D video-based system. Sensors 2018, 18, 2743. [Google Scholar] [CrossRef]
  3. Zhou, C.; Feng, D.; Chen, S.; Ban, N.; Pan, J. Portable vision-based gait assessment for post-stroke rehabilitation using an attention-based lightweight CNN. Expert Syst. Appl. 2024, 238, 122074. [Google Scholar] [CrossRef]
  4. Ranjan, R.; Ahmedt-Aristizabal, D.; Armin, M.A.; Kim, J. Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset. arXiv 2024, arXiv:2407.04190. [Google Scholar] [CrossRef]
  5. Whittle, M.W. Gait Analysis: An introduction; Butterworth-Heinemann: Oxford, UK, 2014. [Google Scholar]
  6. Song, X.; Hou, S.; Huang, Y.; Cao, C.; Liu, X.; Huang, Y.; Shan, C. Gait Attribute Recognition: A New Benchmark for Learning Richer Attributes from Human Gait Patterns. IEEE Trans. Inf. Forensics Secur. 2023, 19, 1–14. [Google Scholar] [CrossRef]
  7. Wu, L.C.; Kuo, C.; Loza, J.; Kurt, M.; Laksari, K.; Yanez, L.Z.; Senif, D.; Anderson, S.C.; Miller, L.E.; Urban, J.E. Detection of American football head impacts using biomechanical features and support vector machine classification. Sci. Rep. 2017, 8, 855. [Google Scholar] [CrossRef]
  8. El-Ziaat, H.; El-Bendary, N.; Moawad, R. Using Multi-Feature Fusion for Detecting Freezing of Gait Episodes in Patients with Parkinson’s Disease. In Proceedings of the 2020 International Conference on Innovative Trends in Communication and Computer Engineering (ITCE), Aswan, Egypt, 8–9 February 2020; pp. 92–97. [Google Scholar]
  9. Lin, C.-W.; Wen, T.-C.; Setiawan, F. Evaluation of vertical ground reaction forces pattern visualization in neurodegenerative diseases identification using deep learning and recurrence plot image feature extraction. Sensors 2020, 20, 3857. [Google Scholar] [CrossRef]
  10. Loureiro, J.; Correia, P.L. Using a skeleton gait energy image for pathological gait classification. In Proceedings of the 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16–20 November 2020; pp. 503–507. [Google Scholar]
  11. Gong, L.; Li, J.; Yu, M.; Zhu, M.; Clifford, R. A novel computer vision based gait analysis technique for normal and Parkinson’s gaits classification. In Proceedings of the 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calgary, AB, Canada, 17–22 August 2020; pp. 209–215. [Google Scholar]
  12. Han, J.; Bhanu, B. Individual recognition using gait energy image. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 28, 316–322. [Google Scholar] [CrossRef]
  13. Gupta, S.K.; Chattopadhyay, P. Exploiting pose dynamics for human recognition from their gait signatures. Multimed. Tools Appl. 2021, 80, 35903–35921. [Google Scholar] [CrossRef]
  14. Bukhari, M.; Bajwa, K.B.; Gillani, S.; Maqsood, M.; Durrani, M.Y.; Mehmood, I.; Ugail, H.; Rho, S. An efficient gait recognition method for known and unknown covariate conditions. IEEE Access 2020, 9, 6465–6477. [Google Scholar] [CrossRef]
  15. Lenac, K.; Sušanj, D.; Ramakić, A.; Pinčić, D. Extending appearance based gait recognition with depth data. Appl. Sci. 2019, 9, 5529. [Google Scholar] [CrossRef]
  16. Xu, C.; Makihara, Y.; Li, X.; Yagi, Y.; Lu, J. Speed-invariant gait recognition using single-support gait energy image. Multimed. Tools Appl. 2019, 78, 26509–26536. [Google Scholar] [CrossRef]
  17. Yao, L.; Kusakunniran, W.; Wu, Q.; Zhang, J.; Tang, Z.; Yang, W. Robust gait recognition using hybrid descriptors based on skeleton gait energy image. Pattern Recognit. Lett. 2021, 150, 289–296. [Google Scholar] [CrossRef]
  18. Ahlrichs, C.; Samà, A.; Lawo, M.; Cabestany, J.; Rodríguez-Martín, D.; Pérez-López, C.; Sweeney, D.; Quinlan, L.R.; Laighin, G.Ò.; Counihan, T. Detecting freezing of gait with a tri-axial accelerometer in Parkinson’s disease patients. Med. Biol. Eng. Comput. 2016, 54, 223–233. [Google Scholar] [CrossRef]
  19. Saad, A.; Zaarour, I.; Guerin, F.; Bejjani, P.; Ayache, M.; Lefebvre, D. Detection of freezing of gait for Parkinson’s disease patients with multi-sensor device and Gaussian neural networks. Int. J. Mach. Learn. Cybern. 2017, 8, 941–954. [Google Scholar] [CrossRef]
  20. Linda, G.M.; Themozhi, G.; Bandi, S.R. Color-mapped contour gait image for cross-view gait recognition using deep convolutional neural network. Int. J. Wavelets Multiresolution Inf. Process. 2020, 18, 1941012. [Google Scholar] [CrossRef]
  21. Sharma, O.; Bansal, S. Gait Recognition System for Human Identification Using BPNN Classifier. Int. J. Innov. Technol. Explor. Eng. 2013, 3, 217–220. [Google Scholar]
  22. Zhang, C.; Liu, W.; Ma, H.; Fu, H. Siamese neural network based gait recognition for human identification. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 2832–2836. [Google Scholar]
  23. Gaba, I.; Ahuja, S.P. Gait analysis for identification by using BPNN with LDA and MDA techniques. In Proceedings of the 2014 IEEE International Conference on MOOC, Innovation and Technology in Education (MITE), Patiala, India, 19–20 December 2014; pp. 122–127. [Google Scholar]
  24. Castro, F.M.; Marín-Jiménez, M.J.; Guil, N.; Perez De La Blanca, N. Automatic learning of gait signatures for people identification. In Proceedings of the International Work-Conference on Artificial Neural Networks, Cadiz, Spain, 14–16 June 2017; Springer: Cham, Switzerland, 2017; pp. 257–270. [Google Scholar]
  25. Hawas, A.R.; El-Khobby, H.A.; Abd-Elnaby, M.; Abd El-Samie, F.E. Gait identification by convolutional neural networks and optical flow. Multimed. Tools Appl. 2019, 78, 25873–25888. [Google Scholar] [CrossRef]
  26. Alotaibi, M.; Mahmood, A. Improved gait recognition based on specialized deep convolutional neural network. Comput. Vis. Image Underst. 2017, 164, 103–110. [Google Scholar] [CrossRef]
  27. Nahar, S.; Narsingani, S.; Patel, Y. Cross View and Cross Walking Gait Recognition Using a Convolutional Neural Network. In Proceedings of the International Conference on Computer Vision and Image Processing, Jammu, India, 3–5 November 2023; Springer: Cham, Switzerland, 2023; pp. 112–123. [Google Scholar]
  28. Al-Masni, M.A.; Marzban, E.N.; Al-Shamiri, A.K.; Al-Antari, M.A.; Alabdulhafith, M.I.; Mahmoud, N.F.; Abdel Samee, N.; Kadah, Y.M. Gait Impairment Analysis Using Silhouette Sinogram Signals and Assisted Knowledge Learning. Bioengineering 2024, 11, 477. [Google Scholar] [CrossRef] [PubMed]
  29. Gul, S.; Malik, M.I.; Khan, G.M.; Shafait, F. Multi-view gait recognition system using spatio-temporal features and deep learning. Expert Syst. Appl. 2021, 179, 115057. [Google Scholar] [CrossRef]
  30. Mogan, J.N.; Lee, C.P.; Lim, K.M.; Ali, M.; Alqahtani, A. Gait-CNN-ViT: Multi-model gait recognition with convolutional neural networks and vision transformer. Sensors 2023, 23, 3809. [Google Scholar] [CrossRef]
  31. Yunardi, R.T.; Sardjono, T.A.; Mardiyanto, R. Skeleton-Based Gait Recognition Using Modified Deep Convolutional Neural Networks and Long Short-Term Memory for Person Recognition. IEEE Access 2024, 12, 121131–121143. [Google Scholar] [CrossRef]
  32. Ghosh, R. A Faster R-CNN and recurrent neural network based approach of gait recognition with and without carried objects. Expert Syst. Appl. 2022, 205, 117730. [Google Scholar] [CrossRef]
  33. Junaid, M.I.; Prakash, A.J.; Ari, S. Human gait recognition using joint spatiotemporal modulation in deep convolutional neural networks. J. Vis. Commun. Image Represent. 2024, 105, 104322. [Google Scholar] [CrossRef]
  34. Nahar, S.; Narsingani, S.; Patel, Y. A Unified Convolutional Neural Network for Gait Recognition. In Proceedings of the Asian Conference on Pattern Recognition, Kitakyushu, Japan, 5–8 November 2023; Springer: Cham, Switzerland, 2023; pp. 230–242. [Google Scholar]
  35. Song, C.; Huang, Y.; Huang, Y.; Jia, N.; Wang, L. Gaitnet: An end-to-end network for gait based human identification. Pattern Recognit. 2019, 96, 106988. [Google Scholar] [CrossRef]
  36. Gonzalez, R.C. Digital Image Processing; Pearson Education: Chennai, India, 2009. [Google Scholar]
  37. Elkholy, A.; Makihara, Y.; Gomaa, W.; Ahad, M.A.R.; Yagi, Y. Unsupervised GEI-based gait disorders detection from different views. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 5423–5426. [Google Scholar]
  38. Whytock, T.; Belyaev, A.; Robertson, N.M. Dynamic distance-based shape features for gait recognition. J. Math. Imaging Vis. 2014, 50, 314–326. [Google Scholar] [CrossRef]
  39. Iwama, H.; Okumura, M.; Makihara, Y.; Yagi, Y. The OU-ISIR gait database comprising the large population dataset and performance evaluation of gait recognition. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1511–1521. [Google Scholar] [CrossRef]
  40. Nieto-Hidalgo, M.; García-Chamizo, J.M. Classification of pathologies using a vision based feature extraction. In Proceedings of the Ubiquitous Computing and Ambient Intelligence: 11th International Conference, UCAmI 2017, Philadelphia, PA, USA, 7–10 November 2017; pp. 265–274. [Google Scholar]
Figure 1. Schematic of the proposed framework: gait map generation, feature extraction using AlexNet, PCA-based dimensionality reduction, train-validation-test splitting, training and validation of classical classifiers, and final model evaluation.
Figure 2. Example of cropping process and boundary detection on a gait silhouette sequence.
Figure 3. Gait cycle detection illustrated through correlation coefficient curves, showing from left to right the valley detection results for normal gait, arm and leg impairments, and full-body abnormalities. Blue curves represent image correlation coefficients, green curves indicate frequency correlations, and stars mark local minima corresponding to the start or end of new gait cycles.
Figure 4. Generation process of the Gait Energy Image (GEI) map from a sequence of silhouette images.
Figure 5. Generation process of the time-coded Gait Boundary Image (tGBI) from binary silhouette boundaries.
Figure 6. Generation of the color-coded Gait Energy Image (cGEI) by segmenting the gait cycle into temporal color channels.
Figure 7. Generation of the time-coded Gait Delta Image (tGDI) highlighting motion changes between consecutive frames.
Figure 8. Generation process of the color-coded Boundary-to-Image Transform (cBIT) using boundary points of silhouette images.
Figure 9. The proposed feature extraction and classification methodology.
Figure 10. Differences in overall performance with and without the application of color shifts.
Figure 11. Differences in class-level performance with and without applying color shifts. (Top): results for the 4-class experiment; (bottom): results for the 6-class experiment.
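As a companion to the map-generation figures above, the sketch below shows how a GEI and a color-coded variant could be computed from a stack of aligned binary silhouettes. The GEI is the standard pixel-wise average over one gait cycle, while the three-way temporal split used for the color channels is an assumption made for illustration and may differ from the segmentation used in this study.

```python
import numpy as np

def gait_energy_image(silhouettes: np.ndarray) -> np.ndarray:
    """GEI: pixel-wise average of the aligned binary silhouettes of one gait cycle.

    silhouettes: (T, H, W) array with values in {0, 1}.
    """
    return silhouettes.mean(axis=0)

def color_coded_gei(silhouettes: np.ndarray) -> np.ndarray:
    """Illustrative cGEI: split the cycle into three temporal segments and place
    the partial energy image of each segment into one RGB channel (assumed
    three-way split; the paper's exact segmentation may differ)."""
    t = silhouettes.shape[0]
    bounds = np.linspace(0, t, 4, dtype=int)          # three roughly equal segments
    channels = [silhouettes[bounds[i]:bounds[i + 1]].mean(axis=0) for i in range(3)]
    return np.stack(channels, axis=-1)                # (H, W, 3)

# Example with a toy cycle of 30 binary frames
cycle = (np.random.rand(30, 128, 88) > 0.5).astype(np.float32)
print(gait_energy_image(cycle).shape, color_coded_gei(cycle).shape)  # (128, 88) (128, 88, 3)
```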
Table 1. Classification performance of gait impairments using different gait representations.

| Map | Metric | 4-Class, Without Color Shift | 4-Class, With Color Shift | 6-Class, Without Color Shift | 6-Class, With Color Shift |
|---|---|---|---|---|---|
| GEI | Classifier | Discriminant Analysis | Not applicable | KNN | Not applicable |
| | Acc | 95.4% | – | 83.5% | – |
| | AUC | 99.4% | – | 98.0% | – |
| | SE | 95.38% | – | 83.48% | – |
| | F1-score | 95.47% | – | 83.98% | – |
| tGBI | Classifier | Discriminant Analysis | Not applicable | KNN | Not applicable |
| | Acc | 91.5% | – | 83.0% | – |
| | AUC | 98.8% | – | 98.0% | – |
| | SE | 91.53% | – | 82.97% | – |
| | F1-score | 91.64% | – | 83.25% | – |
| cGEI | Classifier | Discriminant Analysis | KNN | KNN | KNN |
| | Acc | 94.4% | 97.1% | 87.6% | 94.2% |
| | AUC | 99.1% | 99.8% | 98.3% | 99.7% |
| | SE | 94.42% | 97.08% | 87.58% | 94.19% |
| | F1-score | 94.49% | 97.1% | 87.58% | 94.24% |
| tGDI | Classifier | Discriminant Analysis | KNN | KNN | KNN |
| | Acc | 94.9% | 98.3% | 82.8% | 95.2% |
| | AUC | 99.1% | 99.9% | 97.1% | 99.0% |
| | SE | 94.92% | 98.29% | 82.8% | 95.24% |
| | F1-score | 94.94% | 98.31% | 82.95% | 95.29% |
| cBIT | Classifier | KNN | KNN | KNN | KNN |
| | Acc | 91.5% | 97.3% | 83.0% | 92.0% |
| | AUC | 94.5% | 99.2% | 94.8% | 99.4% |
| | SE | 91.74% | 97.3% | 82.98% | 92% |
| | F1-score | 91.78% | 97.32% | 82.96% | 92.06% |
| All Maps | Classifier | SVM | KNN | KNN | KNN |
| | Acc | 92% | 96.36% | 77.17% | 89.26% |
| | AUC | 99.0% | 99.7% | 86.47% | 98.08% |
| | SE | 91.99% | 96.36% | 77.18% | 89.25% |
| | F1-score | 92.03% | 96.35% | 77.4% | 89.35% |
Table 2. Individual class performance in the 4-class experiment.

| Maps | Class | PPV (%), Without Color Shift | PPV (%), With Color Shift | TPR/SE (%), Without Color Shift | TPR/SE (%), With Color Shift | AUC (%), Without Color Shift | AUC (%), With Color Shift |
|---|---|---|---|---|---|---|---|
| GEI | FB | 100.00 | N/A | 100.00 | N/A | 100.0 | N/A |
| | LL | 92.40 | N/A | 95.10 | N/A | 99.2 | N/A |
| | NM | 98.60 | N/A | 85.90 | N/A | 98.9 | N/A |
| | RL | 91.70 | N/A | 98.20 | N/A | 99.4 | N/A |
| | Average | 95.55 | N/A | 95.38 | N/A | 99.4 | N/A |
| tGBI | FB | 100.00 | N/A | 99.10 | N/A | 99.4 | N/A |
| | LL | 89.00 | N/A | 87.30 | N/A | 98.4 | N/A |
| | NM | 93.20 | N/A | 81.20 | N/A | 98.1 | N/A |
| | RL | 85.00 | N/A | 95.60 | N/A | 99.2 | N/A |
| | Average | 91.76 | N/A | 91.53 | N/A | 98.8 | N/A |
| cGEI | FB | 100.00 | 100.00 | 100.00 | 100.00 | 100.0 | 100.0 |
| | LL | 91.40 | 96.90 | 94.10 | 93.10 | 99.4 | 99.7 |
| | NM | 97.40 | 97.60 | 87.10 | 95.30 | 98.2 | 99.7 |
| | RL | 89.90 | 94.10 | 94.70 | 99.10 | 98.5 | 99.8 |
| | Average | 94.56 | 97.12 | 94.42 | 97.08 | 99.1 | 99.8 |
| tGDI | FB | 100.00 | 100.00 | 100.00 | 100.00 | 100.0 | 100.0 |
| | LL | 93.10 | 99.00 | 92.20 | 96.10 | 98.6 | 99.9 |
| | NM | 95.00 | 95.50 | 89.40 | 98.80 | 98.5 | 99.7 |
| | RL | 91.60 | 98.20 | 96.50 | 98.20 | 99.1 | 99.8 |
| | Average | 94.96 | 98.33 | 94.92 | 98.29 | 99.1 | 99.8 |
| cBIT | FB | 99.10 | 100.00 | 97.30 | 100.00 | 98.5 | 100.0 |
| | LL | 90.90 | 95.00 | 88.20 | 94.10 | 92.7 | 97.8 |
| | NM | 88.50 | 100.00 | 90.60 | 97.60 | 93.8 | 100.0 |
| | RL | 87.90 | 94.80 | 90.30 | 97.30 | 92.8 | 99.2 |
| | Average | 91.81 | 97.34 | 91.75 | 97.30 | 94.5 | 99.2 |
| All Maps | FB | 100.00 | 99.10 | 100.00 | 100.00 | 100.0 | 100.0 |
| | LL | 91.10 | 95.10 | 80.40 | 95.10 | 97.3 | 99.9 |
| | NM | 92.30 | 96.40 | 98.80 | 94.10 | 99.6 | 99.7 |
| | RL | 84.90 | 94.70 | 89.40 | 95.60 | 97.9 | 99.3 |
| | Average | 92.07 | 96.35 | 91.99 | 96.36 | 98.7 | 99.7 |
Table 3. Individual class performance in the 6-class experiment.

| Maps | Class | PPV (%), Without Color Shift | PPV (%), With Color Shift | TPR/SE (%), Without Color Shift | TPR/SE (%), With Color Shift | AUC (%), Without Color Shift | AUC (%), With Color Shift |
|---|---|---|---|---|---|---|---|
| GEI | FB | 100.00 | N/A | 100.00 | N/A | 100.0 | N/A |
| | LA | 76.70 | N/A | 76.70 | N/A | 96.1 | N/A |
| | LL | 95.50 | N/A | 83.30 | N/A | 98.7 | N/A |
| | NM | 74.30 | N/A | 64.70 | N/A | 95.4 | N/A |
| | RA | 61.50 | N/A | 78.80 | N/A | 95.7 | N/A |
| | RL | 90.30 | N/A | 90.30 | N/A | 98.5 | N/A |
| | Average | 84.48 | N/A | 83.48 | N/A | 97.6 | N/A |
| tGBI | FB | 100.00 | N/A | 100.00 | N/A | 100.0 | N/A |
| | LA | 73.20 | N/A | 78.90 | N/A | 96.8 | N/A |
| | LL | 94.00 | N/A | 77.50 | N/A | 97.5 | N/A |
| | NM | 77.30 | N/A | 68.20 | N/A | 95.7 | N/A |
| | RA | 72.00 | N/A | 78.80 | N/A | 96.7 | N/A |
| | RL | 79.40 | N/A | 88.50 | N/A | 98.3 | N/A |
| | Average | 83.54 | N/A | 82.97 | N/A | 97.6 | N/A |
| cGEI | FB | 100.00 | 100.00 | 100.00 | 100.00 | 100.0 | 100.0 |
| | LA | 85.10 | 94.40 | 88.90 | 93.30 | 97.8 | 99.5 |
| | LL | 90.70 | 94.20 | 86.30 | 95.10 | 98.2 | 99.8 |
| | NM | 80.30 | 85.40 | 71.80 | 89.40 | 96.3 | 99.1 |
| | RA | 78.90 | 95.20 | 83.50 | 92.90 | 97.9 | 99.8 |
| | RL | 86.40 | 94.60 | 90.30 | 92.90 | 97.9 | 99.6 |
| | Average | 87.57 | 94.29 | 87.58 | 94.19 | 98.1 | 99.6 |
| tGDI | FB | 99.10 | 100.00 | 97.30 | 100.00 | 100.0 | 100.0 |
| | LA | 82.10 | 90.90 | 86.70 | 88.90 | 97.6 | 97.3 |
| | LL | 85.90 | 99.00 | 71.60 | 94.10 | 94.2 | 99.2 |
| | NM | 77.60 | 89.90 | 77.60 | 94.10 | 96.8 | 99.3 |
| | RA | 70.30 | 91.10 | 75.30 | 96.50 | 95.0 | 98.2 |
| | RL | 79.30 | 98.20 | 85.00 | 96.50 | 97.0 | 99.4 |
| | Average | 83.11 | 95.33 | 82.80 | 95.24 | 96.9 | 99.0 |
| cBIT | FB | 100.00 | 100.00 | 99.10 | 100.00 | 99.4 | 100.0 |
| | LA | 80.60 | 89.50 | 83.30 | 94.40 | 93.9 | 99.6 |
| | LL | 82.10 | 93.00 | 76.50 | 91.20 | 90.4 | 99.3 |
| | NM | 77.20 | 91.20 | 71.80 | 85.90 | 93.5 | 98.6 |
| | RA | 69.80 | 82.60 | 70.60 | 89.40 | 93.4 | 98.6 |
| | RL | 82.90 | 93.50 | 90.30 | 89.40 | 96.5 | 99.5 |
| | Average | 82.95 | 92.13 | 82.98 | 92.0 | 94.7 | 99.3 |
| All Maps | FB | 100.00 | 96.40 | 100.00 | 95.50 | 100.0 | 99.5 |
| | LA | 66.30 | 83.70 | 72.20 | 91.10 | 82.8 | 97.8 |
| | LL | 84.90 | 90.00 | 77.50 | 88.20 | 87.3 | 98.1 |
| | NM | 63.60 | 85.70 | 57.60 | 77.60 | 76.0 | 96.3 |
| | RA | 55.80 | 82.80 | 62.40 | 90.60 | 77.0 | 97.8 |
| | RL | 84.80 | 94.40 | 84.10 | 90.30 | 90.2 | 98.4 |
| | Average | 77.61 | 89.44 | 77.18 | 89.25 | 86.5 | 98.1 |
Table 4. Summary of recent studies using GEI-based or novel gait maps for anomaly detection and their best reported performance.

| Reference | Map/Feature Extraction Approach | Classifier | Dataset(s) | Classes | Performance |
|---|---|---|---|---|---|
| Verlekar et al., 2018 [2] | GEI and silhouette features using image processing | SVM | INIT | FB, LL, RL, and NM | Acc.: 98.8% |
| Elkholy et al., 2019 [37] | GEI with a convolutional autoencoder | Isolation forests | OU-ISIR (train) [39] and INIT (test) | Normal and abnormal | AUC: 0.94 |
| Gong et al., 2020 [11] | GEI with R-CNN applied to video clips | OC-SVM | Their own dataset and YouTube videos | Normal and Parkinsonian | Acc.: 97.33% |
| Loureiro et al., 2020 [10] | GEI and SEI (VGG-19) | VGG-19 | GAIT-IST | Normal, diplegic, hemiplegic, neuropathic, and Parkinsonian | Acc.: 98.5% |
| | GEI and SEI (VGG-19 + PCA) | LDA/SVM | GAIT-IST | | Acc.: 96.4% |
| | GEI (VGG-19 + PCA) | LDA | GAIT-IST (train) and DAI 2 (test) [40] | | Acc.: 76.7% |
| Zhou et al., 2024 [3] | GEI (lightweight CNN) | CNN | GAIT-IST [10] | Normal, diplegic, hemiplegic, neuropathic, and Parkinsonian | Acc.: 98.1% |
| | GEI (lightweight CNN) | CNN | GAIT-IST | Normal + 3 levels of hemiplegia | Acc.: 96.92% |
| Al-masni et al., 2024 [28] | GEI (2D CNN) | CNN | INIT | FB, LL, RL, LA, RA, and NM | Acc.: 70.94% |
| | GEI (ResNet50) | ResNet50 | INIT | FB, LL, RL, and NM | Acc.: 86.25% |
| Proposed Method | New maps (AlexNet features) | KNN | INIT | FB, LL, RL, and NM | Acc.: 98.3%; F1-score: 98.31%; AUC: 0.999 |
| | New maps (AlexNet features) | KNN | INIT | FB, LL, RL, LA, RA, and NM | Acc.: 95.2%; F1-score: 95.29%; AUC: 0.990 |