Impacts of Occlusion on the Symmetry of Gait Representations for Age and Gender Estimation

Zheng, Ryan Qin Chin; Connie, Tee; Lim, Zhe Khae; Goh, Michael Kah Ong

doi:10.3390/sym18071082


¹ Faculty of Information Science and Technology, Multimedia University, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia

² Centre for Image and Vision Computing, CoE for Artificial Intelligence, Jalan Ayer Keroh Lama, Melaka 75450, Malaysia

* Author to whom correspondence should be addressed.

Symmetry 2026, 18(7), 1082; https://doi.org/10.3390/sym18071082 (registering DOI)

Submission received: 12 May 2026 / Revised: 19 June 2026 / Accepted: 24 June 2026 / Published: 25 June 2026

(This article belongs to the Special Issue Asymmetry and Symmetry in Computer Vision and Pattern Recognition)


Abstract

Gait refers to an individual’s unique walking pattern and is a promising biometric for age and gender estimation. Human gait exhibits inherent bilateral symmetry arising from the coordinated movement of the left and right sides of the body. However, occlusion remains a major challenge that disrupts the symmetric structure of gait patterns and degrades recognition performance. This paper investigates the impact of different occlusion types on gait-based age and gender estimation and proposes a Generative Adversarial Network (GAN)-based image restoration model to mitigate occlusion effects. Two occlusion types, namely block-wise and component-specific, are examined. A self-collected dataset of 715 side-walking gait energy images (GEIs) from 120 subjects was synthetically occluded to simulate real-life scenarios. Block-wise occlusion was applied both vertically and horizontally across GEI silhouettes, while component-specific occlusion targeted individual body parts. GAN-based restoration was subsequently applied to occluded images prior to model training. Experimental results confirm that occlusion significantly degrades recognition accuracy, with larger occluded regions causing greater performance drops. Shoulder occlusion most severely impacted age estimation, while head occlusion had the greatest effect on gender estimation. GAN-based restoration substantially recovered lost accuracy, demonstrating the potential of restoration techniques in compensating for missing body information. These findings highlight the importance of upper-body regions in gait-based soft biometrics and demonstrate the need to address occlusion in real-world gait recognition systems.

Keywords: gait recognition; age estimation; gender estimation; occlusion recovery; gait energy image

1. Introduction

Gait describes a person’s unique walking pattern, which serves as a valuable biometric for many applications like age and gender estimation [1,2,3,4]. Compared with traditional biometrics such as face and fingerprint recognition, gait offers several unique advantages. Since gait information can be extracted from a person’s walking motion and body dynamics, recognition can be performed without physical contact or active subject cooperation. Furthermore, gait can be analyzed using surveillance footage acquired at relatively long stand-off distances, making it suitable for unconstrained environments. The effective recognition distance depends on factors such as camera resolution, viewing conditions, and the recognition algorithm employed. In addition, gait recognition can be performed using low-resolution imagery, and gait patterns are generally more difficult to disguise or imitate than many other biometric traits [5,6,7,8,9]. As a result, gait recognition has gained popularity in various applications such as surveillance, criminal investigation, and forensics.

Human gait exhibits inherent bilateral symmetry arising from the coordinated movement of the left and right limbs during locomotion. This symmetry is reflected in gait representations such as GEIs, where the spatial distribution of silhouette information captures the repetitive and structured nature of walking patterns. The preservation of these symmetrical gait characteristics is important for extracting discriminative features used in soft-biometric tasks such as age and gender estimation. Consequently, disruptions to the symmetric structure of gait representations may adversely affect recognition performance and reduce the reliability of gait-based analysis systems.

Nevertheless, when it comes to practical applications, gait recognition faces several challenges. Covariates including carried objects, clothing, shoes and occlusion are the most common issues faced when capturing the subjects’ gait videos. Occlusions are caused by obstacles such as trees and buildings or a limited view when the subject moves out of view [10,11,12,13,14]. The presence of occlusion significantly degrades the performance of gait recognition by obscuring important body information such as shape, pose, and motion [12,15,16,17,18,19]. This brings challenges in extracting appropriate gait features and causes loss of identity information as well as difficulty in capturing the periodic characteristics that are important for the recognition system.

In this study, the impacts of different occlusion types are systematically analyzed. We investigate the effects of different types and locations of occlusion on the accuracy of age and gender estimation from gait patterns. To mimic real-life situations, a dataset consisting of 715 walking sequences from 120 subjects is created. Occlusion effects are synthetically applied to the GEIs [20] constructed from the video sequences. Two main types of occlusion are evaluated, namely block-wise and component-specific occlusion. In block-wise occlusion, vertical or horizontal segments of the gait images are blocked out. In component-specific occlusion, specific body parts such as the head, shoulder and calf are explicitly covered. After that, a GAN-based image reconstruction model is employed to evaluate the recoverability of gait information lost due to occlusion. The GAN model is trained on both complete and occluded GEIs. The model captures features such as textures, patterns and edges from the image and uses this information to restore the images. This study provides insights into the robustness and limitations of current gait analysis techniques under realistic conditions. It also utilizes a GAN-based restoration technique to address the effect of occlusion on gait recognition.

It should be noted that the primary objective of this work is not to propose a novel gait recognition architecture, but rather to establish a controlled experimental framework for analyzing the impact of occlusion on gait-based age and gender estimation. Existing studies on occluded gait recognition mainly focus on improving recognition performance through increasingly sophisticated architectures. In contrast, this study seeks to systematically investigate how different occlusion types, occlusion locations, and affected body regions influence soft-biometric estimation performance. The GAN-based restoration module is therefore employed as an analytical tool to evaluate the extent to which lost gait information can be recovered after occlusion.

In a recent study, Awai et al. [21] developed a robust gait recognition method that handles image occlusions in security camera footage to improve the search for missing elderly people. The proposed method used walking images covering five occlusion types. A separate model was created for each occlusion type, and the best model was selected based on the occlusion classification. Pose estimation was performed using OpenPose to acquire 17 2D coordinates. These coordinates were used as inputs to a Random Forest algorithm that classified the occlusion patterns. The dataset used was CASIA-B, which contains walking images of 124 subjects from 11 angles. Five distinct masks were applied to parts of the images to evaluate the effects of mask position and size. The results showed that the average accuracy across the occlusion patterns for NM (normal), BG (carrying bag) and CL (wearing coat) was 72.5%, while the conventional method achieved only 67.8%. This work effectively handled partial visibility problems by designing distinct models for different occlusion types.

In addition, Paul et al. [22] proposed a neural-network-based model to reconstruct occluded frame sequences and thereby improve gait recognition accuracy. Binary silhouettes extracted from RGB frames were used in this study. First, an encoded representation of each frame was obtained using a two-layer convolutional autoencoder. The resulting feature map was then flattened before being fed into the LSTM reconstruction model as an input. After that, two LSTM models were used to predict the embedding of each occluded frame from the forward and backward directions. The encoded vectors predicted by both LSTM models were decoded by a decoder network that consisted of up-sampling layers and convolutional layers. A fusion network then combined the two decoded frames to produce the final reconstructed image. CASIA-B and OU-ISIRLP datasets with synthetic occlusion were used to train the recognition model, while the TUM-IITKGP dataset was also used for training with unoccluded gait cycles. The two former datasets were used to evaluate the performance of the reconstruction model, as the latter one had a real occluded sequence. Using Random Forest on occluded images with five distinct percentages of occlusion, the experiment achieved average Dice scores of 0.866 and 0.898 and average Rank-1 accuracies of 76.5% and 77.21% on the OU-ISIRLP and CASIA-B datasets, respectively. The paper used dual LSTM models to predict occluded frames. However, the TUM-IITKGP dataset, which contains real occlusions, was not included in the test set, so the performance of the model in real-world scenarios was not assessed.

Xu et al. [23] proposed a method to register silhouettes based on occlusion estimation related to the occluding elements. Only bounding boxes with visible parts were considered. Silhouettes with visible parts were quantified by a CNN-based occlusion ratio estimator. The network had three convolutional layers, each with a batch normalization layer. The occlusion ratio was obtained by averaging the estimates across frames and was used to register occluded silhouettes. A sampler derived from the spatial transformer network was used to obtain differentiable registration. After that, a pairwise mask was applied to mitigate the side effects of residual variations that may remain within the occluded region even after the body size and position were normalized. Lastly, GaitGL was employed as the final feature extractor for the probe and gallery sequences. The study used the OU-MVLP dataset, which consisted of 14 views of 10,307 subjects’ gait sequences. Only four views and four occlusions were considered, with 5153 subjects for training and 5154 subjects for testing. The proposed method achieved a Rank-1 rate of 73.6% and an EER of 1.45%. The results were averaged over all 4 × 4 occlusion combinations and 4 × 4 view combinations for each occlusion ratio pair.

Apart from that, Chen et al. [9] proposed an end-to-end model-based gait recognition method that combined SMPL model-based (human mesh) estimation and gait recognition without a prerequisite. The model was applied on occluded RGB gait videos and only a bounding box with visible body parts was deployed. The method started by cropping and resizing the unoccluded body parts for each frame as an input. After that, SMPL parameters such as body shape and pose parameters were estimated by using a sequence encoder. The potential intra-subject variation was reduced by utilizing an occlusion attenuation module that incorporated a GRU module before the 3D joint locations were fed into the recognition module. The OU-MVLP dataset was used. It consisted of 10,307 subjects with various angles. In the study, only 0°, 30°, 60°, and 90° views were selected for training and testing. Four occlusion patterns, namely FT, FB, CT, and CB (F: Fixed, C: Changing, T: Top, B: Bottom) were applied to each angle image. The recognition performance achieved an average of 60.2% Rank-1 accuracy and 2.36% EER from the average four occlusion types and angles.

Gupta et al. [24] proposed a method to model intrinsic occlusion type awareness into any state-of-the-art occluded gait recognition. An auxiliary detection module was designed to produce occlusion encodings that contained useful information for occluded gait recognition. GREW and BRIAR datasets were used in these experiments. Silhouette masks were extracted from the BRIAR dataset. The occlusion detector and gait recognition backbone were trained on different kinds of occlusions. The occlusion detector was a CNN that was trained for occlusion type classification. The output of the occlusion detector was taken by the occlusion awareness model and was combined with intermediate features to guide the training of gait recognition backbones. The evaluation was computed with top-K rank retrieval accuracy at different distances for BRIAR and standard top-K rank retrieval accuracy for GREW.

In addition, Hasan et al. [12] developed an end-to-end unified framework to overcome the challenges of the current approaches to occlusion. The framework consisted of two key modules, namely ODR and FEGR. The purpose of the ODR module was to detect and classify the types of occlusion using a 3D CNN. After that, a 3D GAN model was used to reconstruct the occluded silhouette sequence. The reconstructed silhouette sequence was taken by the FEGR module as an input for gait recognition. FEGR had two pipelines: one was based on GaitGL to extract global features and the second one used a 2D CNN to extract frame-by-frame features. The CASIA-B and OU-MVLP datasets were utilized in the study. For each subject, five types of occlusions were applied and only the 90° view angle sequences were considered. Additionally, 74 subjects from CASIA-B were used for training while the other 50 were used for testing. OU-MVLP consisted of 10,307 subjects. The CASIA-B dataset achieved an average Rank-1 accuracy of 84.5 for silhouette sequences with five types of artificially applied occlusion under NM, BG and CL with reconstruction. On the other hand, the OU-MVLP dataset achieved an average Rank-1 accuracy of 58.9 for the silhouette sequences with reconstruction.

Recent research has also explored alternative sensing modalities to overcome the limitations of vision-based gait analysis, particularly under challenging conditions such as occlusion, poor illumination, and privacy constraints. Dong et al. [25] proposed Wi-FiAG, a fine-grained abnormal gait recognition framework based on Wi-Fi Channel State Information (CSI). The method integrates CNN, BiGRU, and an attention mechanism to jointly capture spatial and temporal characteristics of gait signals, achieving an average recognition accuracy of 95% across seven gait categories. Unlike camera-based approaches, Wi-Fi sensing can capture human motion through wireless signal variations and is less affected by visual occlusions.

Similarly, mmWave radar has emerged as a promising sensing technology for gait recognition due to its robustness to lighting conditions, privacy preservation, and ability to capture motion information without relying on visual appearance. Mazzieri et al. [26] proposed an open-set gait recognition framework using sparse mmWave radar point clouds. Their approach combines supervised gait classification with point-cloud reconstruction to learn a discriminative latent representation that is capable of identifying both known and previously unseen subjects. Experimental results demonstrate significant improvements over existing methods in open-set recognition scenarios.

In addition, wearable sensing technologies have been increasingly adopted for gait analysis. Chen et al. [27] introduced INSENGA, an inertial sensor-based gait recognition framework that incorporates data imputation and channel attention weight redistribution to enhance feature representation from wearable sensor signals. By exploiting motion information captured directly from inertial measurement units (IMUs), the method improves recognition robustness under incomplete or noisy sensing conditions. Such wearable-based approaches provide complementary gait information that is not affected by visual occlusions. Although these multimodal sensing approaches have demonstrated strong robustness against environmental challenges, the present study focuses on understanding the impact of occlusion within a vision-based gait analysis framework and investigating the extent to which lost gait information can be recovered through image restoration techniques. A summary of the related vision-based methods is presented in Table 1.

Even though considerable research has been conducted on occluded gait recognition, most existing studies focus primarily on improving recognition accuracy through occlusion-aware architectures, silhouette reconstruction methods, or feature extraction techniques. Comparatively little attention has been paid to understanding how different occlusion locations and body regions affect gait-based soft biometric estimation, especially for age and gender classification. Furthermore, the relative importance of different anatomical regions under occlusion remains insufficiently investigated. This gap motivates the need for a systematic analysis framework that can quantify the impact of occlusion on gait-based age and gender estimation and evaluate the extent to which lost information can be recovered through image restoration.

The main contributions of this study are summarized as follows:

A systematic investigation of the effects of block-wise and component-specific occlusions on gait-based age and gender estimation using Gait Energy Images (GEIs).
A quantitative analysis of the relative importance of different body regions under occlusion, providing insights into which anatomical components contribute most significantly to age and gender estimation.
An evaluation of the influence of occlusion size and location on recognition performance, highlighting the vulnerability of specific gait regions to information loss.
An assessment of GAN-based image restoration for recovering discriminative gait information and mitigating the performance degradation caused by occlusion.

2. Materials and Methods

The overall framework of this research starts by acquiring the gait videos, which are the raw videos of participants walking. After that, gait energy images (GEIs) are generated based on the gait video. Next, the GEIs are synthetically occluded to simulate real-life occlusion. The non-occluded and occluded GEIs are used in feature extraction and classification during model training. The output is labeled as Male (0) and Female (1) for the gender model, and Child (0), Adult (1), and Senior (2) for the age model.

A GAN-based image restoration module is incorporated to reconstruct the occluded segments of GEIs. GANs represent a powerful image inpainting technique capable of synthesizing structurally plausible human silhouettes. In this study, the restoration module is not intended as a novel methodological contribution. Instead, it serves as an analytical tool for evaluating the recoverability of gait information lost due to occlusion. By comparing recognition performance using non-occluded, occluded, and GAN-restored GEIs, the study quantifies the extent to which soft biometric information can be recovered after occlusion. This enables a more comprehensive analysis of the true impact of occlusion on gait-based age and gender estimation. Figure 1 shows the overall framework of this work.

2.1. Data Collection

In total, there are 120 videos of people of different genders and ages walking 3 times each, in 4 directions: front, back, left, and right. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the protocol approved by the Research Ethics Committee of Multimedia University (Approval number: EA0012020).

The gender categories are male and female, while age is categorized as child (0–14 years old), adult (15–64 years old), and senior (≥65 years old). The videos are segmented according to the walking directions, and only the side-walking parts (left and right directions) are considered in this project. Figure 2 shows some examples of the raw walking video images.
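The age grouping above can be written as a small lookup. This is only an illustrative sketch: the function name `age_to_class` is hypothetical, the class indices follow the Child (0), Adult (1), Senior (2) labeling described earlier, and the sketch assumes that a subject aged exactly 65 falls in the senior group.

```python
# Class indices follow the labeling used for the age model.
AGE_LABELS = {0: "Child", 1: "Adult", 2: "Senior"}


def age_to_class(age_years: int) -> int:
    """Map an age in whole years to the age-model class index."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 14:
        return 0  # child: 0-14 years
    if age_years <= 64:
        return 1  # adult: 15-64 years
    return 2  # senior: assumed here to include age 65 itself


for age in (7, 30, 70):
    print(age, AGE_LABELS[age_to_class(age)])
```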

During video acquisition, the camera is placed at a position that can capture the subject’s whole body, starting from the head to the toes. Next, the participants are required to walk in 4 directions, which are front, back, left, and right, three times in each direction. Participants’ consent is obtained prior to the recording. Table 2 shows the participants’ demographic information.

2.2. Gait Energy Image (GEI)

After the gait videos have been collected, each side-walking video is processed into a gait energy image. Gait refers to an individual’s walking manner or pattern, and a GEI is the average of all silhouettes in one gait cycle [28,29]. The GEI is generated by taking the edited left–right walking videos as a batch of inputs and extracting the individual frames. Each frame is then segmented using a pre-trained DeepLabv3 model, which labels pixels that represent a ‘person’ as ‘True’ and background pixels as ‘False’. A binary mask is generated in which the target ‘person’ is set to 255 and the background to 0, resulting in a series of binary frames. Next, contours are extracted from the binary mask using cv2.findContours(); the contour gives the outline of the shape, from which the width, height, and bounding box can be obtained. The gait cycle is located by searching from an initial frame to the end frame that completes one cycle, using the fact that the silhouette width becomes wider when the legs are apart and narrower when the legs overlap. Below is the formula for generating a GEI:

\mathrm{GEI}(x, y) = \frac{1}{F} \sum_{f = 1}^{F} I_f(x, y) \qquad (1)

where F represents the total number of frames in a gait cycle and I_f denotes the gait silhouette at frame f. To identify a complete gait cycle, the individual needs to walk two steps from the starting stance and must end at the same stance. Figure 3 shows an example of a GEI.
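The averaging in Equation (1) can be sketched in a few lines of NumPy. The function name `gait_energy_image` and the toy input are illustrative assumptions, and the sketch presumes that the silhouettes of one gait cycle have already been extracted and aligned to a common size.

```python
import numpy as np


def gait_energy_image(silhouettes):
    """Average F binary silhouettes (values 0 or 255) into a single GEI.

    silhouettes: array-like of shape (F, H, W), one frame per gait-cycle step.
    Returns a float array of shape (H, W) with values in [0, 255].
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty stack of shape (F, H, W)")
    # Pixel-wise mean over the frame axis, i.e. (1/F) * sum_f I_f.
    return frames.mean(axis=0)


# Toy cycle of four 2x2 frames (hypothetical data).
frames = np.zeros((4, 2, 2), dtype=np.uint8)
frames[:, 0, 0] = 255   # foreground in every frame
frames[:2, 1, 1] = 255  # foreground in half of the frames
gei = gait_energy_image(frames)
print(gei)
```

Pixels that are foreground in every frame keep full intensity, while pixels covered only part of the time (typically the swinging limbs) receive intermediate values.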

2.3. Occlusion Simulation

The GEI is synthetically occluded to simulate real-life occlusion scenarios. Two occlusion strategies, namely block-wise and component-specific occlusions, were selected to represent complementary forms of visibility loss. Block-wise occlusion simulates large-scale obstruction caused by environmental objects such as walls, vehicles, or crowds, where contiguous regions of the gait image become unavailable. In contrast, component-specific occlusion targets individual body regions and enables the investigation of the relative importance of specific anatomical components for age and gender estimation. Together, these two occlusion strategies provide both a global and localized analysis of occlusion effects while maintaining a controlled experimental framework. Other forms of occlusion, such as dynamic occlusion, self-occlusion, and irregularly shaped occlusions, are beyond the scope of the present study and will be explored in future work.

2.3.1. Block-Wise Occlusion

Block-wise occlusion was implemented to simulate large-scale visibility loss commonly encountered in practical surveillance environments, such as partial obstruction by walls, vehicles, furniture, or crowds. Unlike component-specific occlusion, which focuses on the contribution of individual body parts, block-wise occlusion enables the evaluation of system robustness when contiguous regions of the gait image are unavailable. This provides insight into the overall spatial distribution of discriminative gait information and helps identify which regions are most critical for age and gender estimation under severe occlusion conditions.

The block-wise occlusion patterns were restricted to horizontal and vertical regions to provide a controlled and interpretable framework for analyzing the spatial distribution of gait information. Horizontal occlusions enable the investigation of upper-, middle-, and lower-body contributions, whereas vertical occlusions assess the importance of left and right spatial regions. Although these occlusion patterns do not encompass all real-world scenarios, they facilitate systematic comparison across subjects and occlusion levels. More complex occlusion patterns, including irregular, dynamic, and object-induced occlusions, are reserved for future investigation.

First, the GEIs are loaded using cv2.imread(). Then, the coordinates of the silhouette region (non-zero pixels) are extracted using cv2.findNonZero(). These coordinates are passed to cv2.boundingRect() to establish a bounding rectangle encompassing the silhouette. Occlusions are then simulated by setting the pixel values within a specified range of the height or width of the bounding rectangle to 0. For vertical occlusion, the width is divided into four equal parts, and four occlusion patterns named leftA, leftB, rightA and rightB are defined. The leftA and rightA occlusions cover a quarter of the width starting from the left and right edges, respectively, while the leftB and rightB occlusions cover half of the width from the left and right.

The same applies to horizontal occlusion where the height is divided into five equal parts. Each part is individually occluded, resulting in five different horizontal occlusion patterns labeled topA, topB, topC, topD and topE. Each of the occluded segments affects 1/5 of the height in different vertical positions. In total, there are nine distinct occlusions occurring in different positions in the GEIs. Figure 4 provides a visual representation of these occlusion patterns, with the first row showing vertical occlusions and the second row showing horizontal occlusions.
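The nine block-wise patterns described above can be sketched as follows. This is an illustrative reimplementation, not the code used in the study: it replaces cv2.findNonZero() and cv2.boundingRect() with NumPy equivalents so the example has a single dependency, the names `PATTERNS`, `bounding_rect` and `occlude` are hypothetical, and it assumes that topA to topE run from the top of the silhouette to the bottom.

```python
import numpy as np

# name -> (axis, first part, last part (exclusive), number of equal parts)
PATTERNS = {
    "leftA": ("x", 0, 1, 4), "leftB": ("x", 0, 2, 4),
    "rightA": ("x", 3, 4, 4), "rightB": ("x", 2, 4, 4),
    "topA": ("y", 0, 1, 5), "topB": ("y", 1, 2, 5), "topC": ("y", 2, 3, 5),
    "topD": ("y", 3, 4, 5), "topE": ("y", 4, 5, 5),  # top-to-bottom order assumed
}


def bounding_rect(gei):
    """Return (x, y, w, h) of the non-zero silhouette region."""
    ys, xs = np.nonzero(gei)
    if ys.size == 0:
        raise ValueError("GEI contains no silhouette pixels")
    x, y = int(xs.min()), int(ys.min())
    return x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1


def occlude(gei, pattern):
    """Return a copy of the GEI with one block-wise pattern set to 0."""
    axis, start, end, parts = PATTERNS[pattern]
    x, y, w, h = bounding_rect(gei)
    out = gei.copy()
    if axis == "x":  # vertical occlusion: a range of columns
        out[y:y + h, x + w * start // parts:x + w * end // parts] = 0
    else:            # horizontal occlusion: a range of rows
        out[y + h * start // parts:y + h * end // parts, x:x + w] = 0
    return out


# Toy GEI: an 8-pixel-wide, 10-pixel-tall block standing in for a silhouette.
gei = np.zeros((12, 10), dtype=np.uint8)
gei[1:11, 1:9] = 200
for name in PATTERNS:
    print(name, np.count_nonzero(occlude(gei, name)))
```

Integer division keeps the block boundaries on whole pixels; when the width or height is not divisible by the number of parts, the blocks differ in size by at most one pixel.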

2.3.2. Component-Specific Occlusion

The second form of occlusion applied to the GEIs is component-specific occlusion. It differs from block-wise occlusion in that it focuses on more specific anatomical regions. The component-specific occlusion is manually implemented by creating filled black shapes without outlines and overlaying them onto the targeted body parts to simulate obstruction. There are nine component-specific occlusion configurations, which are the head, shoulder (s), right hand (rh), left hand (lh), butt, right thigh (rt), right calf (rc), left thigh (lt), and left calf (lc). Figure 5 illustrates the component-specific occlusion patterns.

The use of controlled synthetic occlusions is particularly suitable for identifying the relative importance of different body regions because the exact occluded area can be specified and reproduced across all subjects. Such controlled experimentation would be difficult to achieve using naturally occurring occlusions, which are highly variable and often affect multiple body regions simultaneously.

2.4. Feature Extraction and Classification

A convolutional neural network (CNN) [30] is implemented as the baseline model for gait-based age and gender estimation. The aim is to analyze the impact of occlusion on gait recognition performance rather than to propose a novel classification architecture.

Prior to model training, the GEIs are preprocessed by resizing all images to 224 × 224 pixels. This resolution was selected because it is a widely adopted input size in CNN-based image analysis and provides a good balance between computational efficiency and feature preservation. The pixel intensities are normalized to the [0, 1] range. The CNN architecture consists of three convolutional layers followed by LeakyReLU activation functions to introduce non-linearity and learn complex gait patterns. LeakyReLU activation is employed to mitigate the dead neuron problem commonly associated with the standard ReLU activation function. These convolutional layers extract discriminative features related to posture, stride dynamics, body structure, and movement patterns that are relevant for age and gender classification.

To reduce spatial dimensionality and computational complexity, three max pooling layers are applied. The resulting feature maps are flattened into one-dimensional vectors and passed to two fully connected layers with ReLU activation to learn higher-level representations. Dropout regularization is incorporated to mitigate overfitting during training.

The final classification layer differs according to the task. A sigmoid activation function is used for the gender estimation model to support binary classification. On the other hand, a softmax activation function is adopted for the age estimation model to handle multi-class classification into child, adult, and senior categories, with the predicted class determined by the highest probability score. The architectural configurations for both models are summarized in Table 3.
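The difference between the two output layers can be illustrated with a short, dependency-free sketch. The helper names and the raw scores below are hypothetical, and the 0.5 decision threshold for the sigmoid output is an assumption rather than a value stated in the study.

```python
import math

GENDER_LABELS = ("Male", "Female")          # class indices 0 and 1
AGE_LABELS = ("Child", "Adult", "Senior")   # class indices 0, 1 and 2


def sigmoid(z):
    """Squash one raw score into a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_gender(score, threshold=0.5):
    """Binary decision from a single sigmoid output (threshold assumed)."""
    return GENDER_LABELS[int(sigmoid(score) >= threshold)]


def decode_age(scores):
    """Pick the age class with the highest softmax probability."""
    probs = softmax(scores)
    return AGE_LABELS[probs.index(max(probs))]


print(decode_gender(2.0))            # hypothetical raw score
print(decode_age([0.1, 2.0, -1.0]))  # hypothetical raw scores
```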

To evaluate whether the observed occlusion effects are dependent on the selected classifier architecture, an additional comparative experiment using ResNet18 was conducted. The same training, validation, and testing protocols were applied to both models. The purpose of this comparison is to validate the robustness of the observed occlusion trends rather than to identify the best-performing recognition architecture.

2.5. GAN Image Restoration

To investigate whether gait information lost due to occlusion can be recovered, a Generative Adversarial Network (GAN) [31,32] is employed to restore synthetically occluded GEIs. The purpose of the restoration model is not to introduce a new image reconstruction architecture, but to provide a controlled mechanism for assessing the recoverability of age- and gender-related gait information. The GAN framework consists of two adversarially trained components, namely a generator and a discriminator. The generator reconstructs plausible GEIs from occluded inputs, while the discriminator distinguishes between real and restored images. Through adversarial training, the generator progressively learns to produce restorations that resemble the corresponding non-occluded GEIs.

The generator adopts a U-Net-like convolutional autoencoder that takes occluded images as input and tries to output the restored versions. It consists of an encoder (downsampling), a decoder (upsampling), and output layers. The encoder block is built on a series of Conv2D, Batch Normalization, and ReLU. The purpose is to reduce spatial dimensions while increasing the feature richness.

The Conv2D layer consists of a kernel, a small matrix that slides over the image and extracts features such as edges, textures or patterns. A larger kernel can capture more global information, while a smaller one can focus on local, specific detail. The stride determines how far the kernel moves each time it slides over the image; a stride of two reduces the width and height by half. Strides can be small in the first few layers to capture maximum detail of the image, and can gradually increase later on to reduce the spatial dimensions. Kernel sizes and strides were selected based on commonly adopted settings in the image restoration literature and were kept fixed throughout all experiments to ensure consistency across occlusion scenarios. The padding is set as ‘same’ so that, for a stride of one, the output size equals the input size and the spatial dimensions are preserved. The kernel initializer is set as ‘random_normal’ to initialize the filter weights.
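The interaction between stride and ‘same’ padding can be checked with simple arithmetic: with ‘same’ padding, the output length along each spatial axis is the input length divided by the stride, rounded up. The sketch below applies this rule to a hypothetical stack of encoder layers; the 224-pixel input and the stride sequence are illustrative assumptions, not the settings listed in Table 4.

```python
def same_padding_output_size(size: int, stride: int) -> int:
    """Output length of a 'same'-padded convolution: ceil(size / stride)."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return -(-size // stride)  # integer ceiling division


def encoder_sizes(size, strides):
    """Track the spatial size after each strided convolution in turn."""
    sizes = [size]
    for stride in strides:
        size = same_padding_output_size(size, stride)
        sizes.append(size)
    return sizes


# Hypothetical encoder: one stride-1 layer followed by three stride-2 layers.
print(encoder_sizes(224, [1, 2, 2, 2]))
```

A stride of one leaves the size unchanged, and each stride of two halves it, which is the behavior a mirrored decoder of Conv2DTranspose layers has to undo.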

The decoder block consists of Conv2DTranspose, Batch Normalization, and ReLU layers. The main difference between the decoder and encoder block is that the decoder uses Conv2DTranspose instead of a Conv2D layer. It does the opposite of Conv2D, reconstructing the image by doubling the width and height. After each Conv2D or Conv2DTranspose layer, batch normalization is added to improve training stability and speed by normalizing the inputs to each layer. The ReLU activation function is used to introduce non-linearity to the generator. The output layer outputs a single-channel grayscale image with ‘tanh’ as its activation function. The ‘tanh’ output range is [−1, 1], which is easier to train with adversarial loss later on. No dropout layers are added in the generator because it needs consistent patterns to generate outputs that are more realistic.
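Because a ‘tanh’ output lies in [−1, 1], the images compared with the generator output are usually expressed in the same range. The sketch below shows one common way to do this and to map restored images back to 8-bit grayscale. It is an assumption about the pipeline rather than a documented step of this study, and both function names are hypothetical.

```python
import numpy as np


def to_tanh_range(image_u8):
    """Map 8-bit pixel values in [0, 255] linearly onto [-1, 1]."""
    return np.asarray(image_u8, dtype=np.float64) / 127.5 - 1.0


def from_tanh_range(image):
    """Map generator-style values in [-1, 1] back to 8-bit pixels."""
    pixels = np.rint((np.asarray(image, dtype=np.float64) + 1.0) * 127.5)
    return np.clip(pixels, 0, 255).astype(np.uint8)


gei = np.array([[0, 128, 255]], dtype=np.uint8)  # toy one-row image
scaled = to_tanh_range(gei)
restored = from_tanh_range(scaled)
print(scaled, restored)
```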

The discriminator model, in turn, is used to distinguish ground truth images from fake (restored) images. It also helps train the generator: when the discriminator judges a restored image to be ‘fake’, the generator is penalized and its parameters are updated. The discriminator is built like a standard CNN classifier. It uses Conv2D, batch normalization, LeakyReLU, dropout, flatten, and dense layers. The kernel and stride sizes for the discriminator are usually smaller, as it needs to focus on fine detail to distinguish between real and fake images. The flatten layer converts the 3D feature map into a 1D vector for the dense layer. The dense layer then outputs a single probability with a sigmoid activation function, indicating whether the image is real (close to 1) or fake (close to 0). LeakyReLU activation is used in the discriminator to mitigate the dead-neuron problem, since it lets a small gradient flow even when a neuron is not activated. This matters because the discriminator also contains dropout layers; the generator has no dropout layers, so its neurons are not deactivated in this way and standard ReLU is used there instead. Table 4 shows the generator and discriminator settings. It should be noted that the objective of this study is not to optimize the GAN architecture or investigate the effects of different loss functions. A conventional U-Net-based GAN configuration was adopted to provide a stable restoration framework for evaluating the recoverability of gait information under occlusion. Consequently, architectural and loss-function ablation studies are beyond the scope of the present work.

After the components of the generator and discriminator are defined, they are compiled with binary cross-entropy as the loss function. During the discriminator training phase, the discriminator is trained on real images with real labels (1) and fake images with fake labels (0). If it distinguishes them well, d_loss_real and d_loss_fake will both be low, and so will their average. During the generator training phase, the generator feeds the restored images to the discriminator and labels them as real (1). The purpose is to fool the discriminator into treating the generated images as real. If the discriminator still judges a generated image to be fake, the output is not yet realistic enough. In that case, adv_loss and g_total_loss will be high, and the resulting gradient is used to update the generator’s parameters so that the restored images become more realistic.
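The two training phases described above can be illustrated numerically. This is a minimal sketch in plain Python, assuming the standard binary cross-entropy definition; the discriminator scores are made-up values, and the helper `binary_cross_entropy` is written here for illustration rather than taken from the authors' code.

```python
import math

def binary_cross_entropy(labels, predictions, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Hypothetical discriminator outputs (probability that an image is real).
d_out_real = [0.9, 0.8]   # scores for ground-truth images
d_out_fake = [0.2, 0.1]   # scores for restored images from the generator

# Discriminator step: real images labeled 1, restored images labeled 0.
d_loss_real = binary_cross_entropy([1, 1], d_out_real)
d_loss_fake = binary_cross_entropy([0, 0], d_out_fake)
d_loss = 0.5 * (d_loss_real + d_loss_fake)

# Generator step: restored images are labeled 1 to "fool" the discriminator,
# so a low discriminator score on them gives a high adversarial loss.
adv_loss = binary_cross_entropy([1, 1], d_out_fake)

print(f"d_loss={d_loss:.3f}, adv_loss={adv_loss:.3f}")
```

In this toy case the discriminator separates the two sets well, so d_loss is small while adv_loss is large, which is the situation in which the generator receives a strong update signal.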

It is important to note that the focus of this study is not the development of a new GAN architecture. A lightweight U-Net-based generator and a conventional CNN discriminator were intentionally selected to provide a stable and interpretable image restoration framework. This allows the observed changes in age and gender estimation performance to be attributed primarily to the effects of occlusion and information recovery rather than architectural innovations in the restoration model.

3. Results

3.1. Assessment of Full GEI

First, we present the results for the models trained on the original, non-occluded GEI data. The 715 GEIs from 120 subjects are augmented to 4290 GEIs to enhance training. The train–test split is set at 8:2. The CNN for the gender model comprises three convolutional layers, three pooling layers, three dense layers, and two dropout layers. The model was trained for 15 epochs and achieved 85% test accuracy. As the model is trained on non-occluded images, it was able to capture relevant information from the images and classify them with decent accuracy. Table 5 shows the classification report of the gender model.

The CNN for the age model likewise comprises three convolutional layers, three pooling layers, three dense layers, and two dropout layers. The model was trained for 15 epochs and achieved 80% test accuracy. As with the gender model, training on non-occluded images lets the model capture the relevant gait information, and it classifies the three age groups with reasonably high accuracy. Table 6 shows the classification report of the age model.

3.2. Block-Wise Occlusion Results

Block-wise occlusion experiments were conducted to evaluate the impact of spatially localized information loss. The models used 715 GEIs from 120 subjects. The training dataset consists of 4290 augmented non-occluded GEIs, while the testing dataset consists of 715 block-wise occluded GEIs. A total of nine block-wise occlusion configurations were tested. The CNN for both the gender and age models comprises three convolutional layers, three pooling layers, three dense layers, and two dropout layers. Both models are trained for 15 epochs. Table 7 shows the test accuracy for the gender and age models. The results show that larger occlusions (leftB, rightB) degrade performance more than smaller ones (leftA, rightA). The shoulder occlusion (topB) gives the lowest accuracy for the age model, implying that the shoulder region provides the most crucial evidence for age classification.

3.3. Component-Specific Occlusion Results

For component-specific occlusion, the dataset and CNN settings are the same as in the block-wise occlusion experiment. There are also nine types of component-specific occlusion. Each occluded region is more narrowly targeted than in block-wise occlusion. Table 8 shows the test accuracy for both the gender and age models. As the results show, the age model generally has lower accuracy on each component than the gender model. The lowest accuracy for the age model still occurs in the shoulder region, as in the block-wise occlusion results, while the gender model retains high accuracy when the calf parts are occluded.

3.4. Restored GEI

The generator used the occluded GEIs for training, while the discriminator used both occluded and unoccluded GEIs; both were trained for 1200 epochs. To improve the stability of GAN training, sample replication was applied using the NumPy tile function. The replication procedure was performed only after the dataset had been partitioned into training, validation, and testing subsets. Consequently, the replicated samples were confined to the training set, while the validation and testing sets remained unchanged. No duplicated sample appeared simultaneously in both training and testing partitions, thereby preventing data leakage and ensuring an unbiased evaluation of model performance.
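The order of operations matters here: partition first, then replicate. The following sketch shows one way this could look with `numpy.tile`. The array sizes, the simple index-based 8:2 split, and the replication factor of three are illustrative assumptions, not the values used in the study.

```python
import numpy as np

# Toy stand-ins for GEIs: 10 single-channel 64 x 64 images,
# where image i is filled with the constant value i.
images = np.stack([np.full((64, 64, 1), i, dtype=np.float32) for i in range(10)])

# Partition first (here a simple index-based split) ...
train, test = images[:8], images[8:]

# ... then replicate only the training portion along the sample axis.
repeats = 3  # hypothetical replication factor
train_replicated = np.tile(train, (repeats, 1, 1, 1))

print(train_replicated.shape)  # (24, 64, 64, 1)
print(test.shape)              # (2, 64, 64, 1)
```

Because the test partition is set aside before tiling, no replicated copy of a training image can end up in it.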

The images are resized to (64, 64) to conserve memory and are upscaled before output, resulting in a slightly blurry output. For each epoch, d_loss and g_loss are computed: d_loss measures how well the discriminator can distinguish fake and real images, while g_loss measures the generator’s ability to produce realistic images from the occluded version. The g_loss is the sum of adv_loss and l1_loss: adv_loss encourages the generator to produce realistic images that the discriminator predicts as real, while l1_loss is a reconstruction loss that keeps the restored images close to the original unoccluded images, focusing on accuracy.
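The composition of g_loss can be sketched in a few lines. This example assumes a mean absolute error for l1_loss and binary cross-entropy against the label 1 for adv_loss; the pixel values, the discriminator score, and the optional `l1_weight` parameter are hypothetical and are included only for illustration.

```python
import math

def l1_loss(restored, target):
    """Mean absolute error between two equally sized pixel lists."""
    return sum(abs(r - t) for r, t in zip(restored, target)) / len(target)

def adversarial_loss(d_score_on_restored, eps=1e-7):
    """Binary cross-entropy of one discriminator score against label 1."""
    p = min(max(d_score_on_restored, eps), 1.0 - eps)
    return -math.log(p)

def generator_loss(restored, target, d_score_on_restored, l1_weight=1.0):
    """g_loss as adv_loss plus (optionally weighted) l1_loss.

    l1_weight=1.0 reproduces the plain sum described in the text; the
    weight parameter itself is a hypothetical addition.
    """
    adv = adversarial_loss(d_score_on_restored)
    l1 = l1_loss(restored, target)
    return adv + l1_weight * l1, adv, l1

# Pixels in [-1, 1], matching a tanh output layer.
target = [-1.0, -1.0, 0.5, 1.0]
restored = [-1.0, -0.8, 0.4, 0.9]
g_loss, adv, l1 = generator_loss(restored, target, d_score_on_restored=0.5)
print(f"adv_loss={adv:.4f}, l1_loss={l1:.4f}, g_loss={g_loss:.4f}")
```

The adversarial term rewards realism as judged by the discriminator, while the L1 term ties each restored pixel to the unoccluded ground truth.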

To quantitatively assess restoration quality, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) are adopted as evaluation metrics. PSNR measures reconstruction accuracy, where values above 30 dB generally indicate good restoration quality, while SSIM evaluates perceptual similarity in terms of structural consistency and contrast, with values above 0.8 considered satisfactory.
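For reference, PSNR can be computed directly from the mean squared error between two images. The sketch below uses the standard definition, 10·log10(MAX² / MSE), on flattened pixel lists and assumes 8-bit pixel values (MAX = 255); the toy pixel values are made up. SSIM is omitted because it requires windowed local statistics.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB for two equally sized pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0, 64, 128, 255]
restored = [1, 63, 129, 254]   # every pixel off by one grey level
print(round(psnr(reference, restored), 2))  # 48.13
```

An error of one grey level per pixel already gives roughly 48 dB, which helps put the 30 dB threshold mentioned above into perspective.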

Table 9 and Table 10 show the average PSNR and SSIM for block-wise and component-specific occlusion under these settings. For clarity, they are labeled as specific settings. As the results show, component-specific occluded GEIs are generally restored with PSNR above 30 dB and SSIM above 0.8 using a (3,3) kernel with (1,1) and (2,2) strides. The block-wise occluded GEIs are also restored reasonably well, but their average PSNR is slightly lower than that of the restored component-specific occluded GEIs because the larger occluded area causes more information loss; hence, a bigger (5,5) kernel and a (2,2) stride are used to capture global information.

Another setting, referred to as the general GAN, is trained with all occlusion types as input, separately for block-wise and component-specific occlusion. There are 715 × 9 occluded GEIs used in generator training, and the unoccluded GEIs are duplicated nine times to match the number of occluded GEIs. The kernel size is (4,4) with strides of (1,1) and (2,2) in the generator, while the kernel size is (3,3) with strides of (1,1) and (2,2) in the discriminator. The average PSNR and SSIM for block-wise occlusion obtained by the general GAN are 29.16 dB and 0.9205, while component-specific occlusion achieved an average PSNR of 34.33 dB and an average SSIM of 0.9729. Although these PSNR and SSIM values are reasonably high, the images produced by the specific GAN are still better than those of the general GAN, as a single kernel or stride size is not suitable for every occlusion type. Hence, the subsequent experiments use restored GEIs produced by the specific GAN. Figure 6 shows the restored block-wise occluded GEIs from the specific GAN, while Figure 7 shows the restored component-specific occluded GEIs from the specific GAN. Figure 8 shows the restored block-wise occluded GEIs from the general GAN, while Figure 9 shows the restored component-specific occluded GEIs from the general GAN. The subject in the example is the same subject as in Figure 4 and Figure 5.

3.5. Evaluation Results Using Restored Block-Wise Occluded GEIs

The models are trained with the same CNN settings on GEIs restored with the specific GAN. Pixel values of the GEIs are rescaled to [−1, 1] before training, matching the output range of the tanh activation function. Table 11 shows that the gender model trained on restored GEIs achieves higher accuracy across the originally occluded areas. This demonstrates that GAN restoration recovers the occluded parts and provides the model with a more complete representation of the gait patterns. For horizontal occlusions, gender accuracy originally ranged from 57% to 65%; after restoration, the range increased to 71% to 83%. The age model likewise improves across all nine parts. The most notable case is topB occlusion, where missing shoulder information reduced the age model to only 29% accuracy on occluded GEIs; after the occluded region is restored, accuracy rises to 70%, showing the impact of image restoration. Restoration helps considerably under block-wise occlusion because the GAN recovers the information lost to large occluded blocks, especially for leftB and rightB, which cover a large area.
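The rescaling to [−1, 1] mentioned above is a simple linear mapping. The sketch assumes 8-bit input pixels in [0, 255]; the function names are hypothetical.

```python
def to_tanh_range(pixel, max_value=255.0):
    """Map a pixel from [0, max_value] to [-1, 1]."""
    return pixel / (max_value / 2.0) - 1.0

def from_tanh_range(value, max_value=255.0):
    """Inverse mapping, from [-1, 1] back to [0, max_value]."""
    return (value + 1.0) * (max_value / 2.0)

print([to_tanh_range(p) for p in (0, 127.5, 255)])  # [-1.0, 0.0, 1.0]
```

Applying the inverse mapping to the generator output returns the restored GEI to the usual grayscale range for visualization or for computing PSNR with MAX = 255.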

3.6. Evaluation Results Using Restored Component-Specific Occluded GEIs

Restoration of component-specific occlusion also enhances the performance of both the gender and age models. Table 12 shows that the gender model exceeds 80% accuracy for most parts, and restoring the shoulder occlusion improves the age model accuracy from 48% to 73%. GAN restoration performs well on component-specific occlusions because they are smaller than block-wise occlusions, so the average PSNR and SSIM are higher and the restored images are more realistic and accurate.

3.7. Comparison with State-of-the-Art Methods

This section compares the proposed method, i.e., the CNN + GAN pipeline, with other models. The proposed method is compared with the KNN [33] and Random Forest [21] algorithms to highlight the effectiveness of deep learning in capturing spatial features. As shown in Table 13, although KNN and Random Forest achieve decent results under block-wise occlusion, the proposed method still outperforms them in accuracy, indicating the benefit of GAN image restoration for gait recognition. KNN and Random Forest achieve better accuracy under component-specific occlusion than under block-wise occlusion because the occluded area is smaller (refer to Table 14). KNN relies on the distance between images, so if only a small area is occluded, the change in pixel values is not large enough to alter the relative distances between similar and dissimilar images. Random Forest uses random subsets of features in each tree, so many trees still observe useful parts of the image if the occlusion is small. Nevertheless, the proposed method still performs better than KNN and Random Forest owing to the image restoration of the GAN model.
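The distance argument for KNN can be made concrete with a toy example. The sketch below uses two made-up 16-pixel vectors as stand-ins for GEIs of two classes and a 1-nearest-neighbor rule with Euclidean distance; none of the data or helper names come from the study.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def occlude(image, start, length):
    """Return a copy with `length` pixels from `start` set to zero."""
    out = list(image)
    for i in range(start, min(start + length, len(out))):
        out[i] = 0.0
    return out

def nearest_label(query, gallery):
    """1-nearest-neighbor label from a list of (image, label) pairs."""
    return min(gallery, key=lambda item: euclidean(query, item[0]))[1]

# Two toy 16-pixel "GEIs" standing in for two classes (hypothetical data).
class_a = [1.0] * 8 + [0.0] * 8
class_b = [0.0] * 8 + [1.0] * 8
gallery = [(class_a, "A"), (class_b, "B")]

small = occlude(class_a, start=0, length=2)   # 2 of 16 pixels hidden
large = occlude(class_a, start=0, length=8)   # all informative pixels hidden

print(nearest_label(small, gallery))  # A
print(euclidean(small, class_a), euclidean(small, class_b))
print(euclidean(large, class_a), euclidean(large, class_b))
```

With the small occlusion the query stays much closer to its own class, whereas with the large occlusion the two distances become equal and the neighbor choice is no longer informative.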

3.8. Cross-Architecture Validation

To evaluate whether the observed occlusion effects are dependent on the selected classifier architecture, an additional validation experiment was conducted using ResNet18. Table 15 presents the results for five representative conditions: non-occluded GEIs, TopB occlusion, shoulder occlusion, and their corresponding GAN-restored versions. ResNet18 achieved higher accuracies than the lightweight CNN across all conditions, reflecting its stronger feature extraction capability. However, both architectures exhibited similar performance trends. TopB occlusion resulted in the largest reduction in accuracy, while shoulder occlusion produced a moderate impact. Furthermore, GAN-based restoration consistently improved age and gender estimation performance for both models. These findings indicate that the observed effects of occlusion and the benefits of image restoration are not specific to the lightweight CNN architecture. The consistent trends across both classifiers suggest that the conclusions regarding the importance of upper-body gait information and the effectiveness of GAN-based restoration are robust and architecture-independent.

4. Discussion

Following the test results for both age and gender estimation models, a comparative analysis was conducted between the performance achieved on non-occluded and occluded GEIs. The following are some interesting findings based on the results.

As expected, test accuracy is lower on occluded GEIs than on non-occluded GEIs. This shows the detrimental effect of information loss due to occlusion. The larger the occluded proportion, the less information can be obtained from the image for recognition.
In block-wise occlusion, the leftA and rightA occlusions yield a higher accuracy compared to others, such as leftB and rightB. This is because leftA and rightA affect smaller portions of the GEI silhouette, while leftB and rightB cover half of the body width. This confirms that the size of occlusions affects the gait recognition performance.
Horizontal occlusions have accuracies ranging from 57% to 65% for the gender model in block-wise occlusion. This suggests that even with block-wise obstruction of upper body posture, head movement and shoulder swing, there is sufficient information for gender recognition. The observed drop in accuracy with occluding upper body (topA, topB) and lower limbs (topD, topE) indicates the contribution of these regions (e.g., stride length, body posture, arm swing) to gender recognition.
Under block-wise occlusion, the age model shows results similar to those of the gender model, but its lowest accuracy, 29%, occurs for topB occlusion. The topB occlusion covers the upper torso and shoulders of the subject in the GEI. This implies that information contained around the shoulder area is critical for age estimation. A possible reason is that older people tend to have rounded shoulders, causing a stooped posture that differentiates them from subjects in other age groups.
In component-specific occlusion, the lowest accuracy occurs with head occlusion for the gender model, while the age model generally has lower accuracy for each component compared to the gender model.
For the gender model in component-specific occlusion, the left calf (lc) occlusion gives the highest accuracy. The right calf (rc) occlusion also has high accuracy. Since accuracy remains high when the calves are hidden, the calf region appears to contribute comparatively little discriminative information for gender classification, and the remaining regions carry most of the signal.
For the age model in component-specific occlusion, shoulder occlusion yields the lowest accuracy in both model evaluations. This indicates that the shoulder region carries the most vital information for age estimation, consistent with the block-wise occlusion findings.
The gender model trained on restored block-wise-occluded GEIs produces higher accuracy across various originally occluded areas.
For the age model, restoring the occluded images improves accuracy across all nine parts.
Under topB occlusion, where the age model initially struggles, accuracy improves from 29% to 70% after restoration.
Restoration in block-wise occlusion helps the models to improve significantly by restoring missing information, especially on occluded parts like leftB and rightB that occlude a large area.
Restoration of component-specific occlusion also enhances the performance of both the gender and age models.
The gender model can exceed 80% accuracy for most of the parts.
GAN restoration performs well in restoring component-specific occlusions because they are relatively smaller than block-wise occlusions, yielding higher average PSNR and SSIM and a more realistic and accurate restored image.
Specific GAN restores occluded GEIs better than general GAN because each occlusion can have its own custom settings, enabling GAN to adapt to various types of occlusions.
The restoration framework employed in this study operates on GEIs rather than raw gait sequences. Consequently, the GAN reconstructs missing spatial information from the aggregated gait representation without explicitly modeling temporal continuity between consecutive frames. While GEIs have been widely adopted due to their compact representation and computational efficiency, temporal gait characteristics may contain additional discriminative information that is not captured in the current framework. Future studies may investigate sequence-based restoration approaches that incorporate temporal dependencies using recurrent networks, ConvLSTM architectures, or transformer-based models.
Experimental results show that the Adult class achieved the highest F1-score (0.84), followed by the Child class (0.82), while the Senior class obtained a lower F1-score of 0.64. The reduced performance for the Senior class may be attributed in part to the smaller number of Senior participants in the dataset compared to the Adult group. Nevertheless, the model was still able to identify Senior subjects with reasonable precision and recall, indicating that the class remains distinguishable despite the demographic imbalance. Since all occlusion experiments were conducted using the same class distribution, the comparative analysis of occlusion effects remains consistent across experimental conditions.
The cross-architecture validation experiment demonstrates that the primary findings of this study remain consistent across different classification backbones. Although ResNet18 achieved higher overall accuracy than the lightweight CNN because of its greater representational capacity, both models exhibited similar sensitivity to occlusions affecting the upper body. This observation suggests that the identified importance of specific gait regions is attributable to the underlying gait information rather than the characteristics of a particular classifier architecture.
Although the occlusions considered in this study are synthetically generated, the findings provide practical insights into the spatial importance of different body regions for gait-based age and gender estimation. In particular, the results demonstrate that upper-body information contributes significantly to recognition performance and should therefore be prioritized when designing occlusion-aware gait analysis systems.

5. Limitations

Several limitations should be acknowledged. First, the study employs synthetically generated occlusions. Although synthetic occlusions facilitate systematic analysis and controlled experimentation, they may not fully represent the complexity of real-world occlusion patterns encountered in surveillance environments involving dynamic objects, shadows, crowds, and self-occlusion. Future work will extend the proposed framework to real-world gait datasets containing naturally occurring occlusions, such as GREW and BRIAR, to further evaluate the generalizability of the findings.

Second, the experimental evaluation primarily utilizes a lightweight CNN classifier, with additional validation performed using ResNet18. Although the consistency of the results across both architectures suggests that the findings are robust, future work should investigate whether similar trends are observed in more advanced gait recognition frameworks, including transformer-based models and state-of-the-art gait-specific architectures. The proposed restoration framework employs a conventional U-Net-based GAN. More sophisticated restoration approaches may further improve information recovery under severe occlusion conditions.

Third, the study utilizes GEIs as the primary gait representation. Although GEIs effectively summarize the spatial characteristics of a gait cycle, they compress temporal information into a single image. Consequently, the GAN-based restoration framework cannot explicitly exploit temporal continuity between gait frames. Future work may explore silhouette-sequence restoration and spatio-temporal reconstruction models to better preserve dynamic gait information under occluded conditions.

Fourth, the age distribution of the dataset is moderately imbalanced, with adults representing the majority class and seniors constituting the smallest group. As reflected by the classification report, the Senior class achieved lower precision, recall, and F1-score than the other age groups. Consequently, the reported age estimation performance for seniors should be interpreted with caution. Future work will involve collecting a larger and more balanced dataset to further validate the findings across all age categories.

Fifth, the evaluation was conducted using a single train–test split. Although the same evaluation protocol was consistently applied across all experimental conditions, the absence of repeated runs or cross-validation prevents formal statistical significance testing of the observed performance differences. Future work will incorporate repeated experiments, cross-validation, and statistical hypothesis testing to further validate the robustness of the reported findings.

Despite these findings, the current framework is not intended for direct deployment in unconstrained real-world environments. The synthetic occlusions used in this study provide controlled visibility loss but do not fully represent naturally occurring occlusions caused by dynamic objects, crowds, shadows, self-occlusion, or complex scene interactions. Consequently, the reported performance may differ from that achieved in practical surveillance settings. Validation on naturally occluded datasets and in real-world environments remains an important direction for future work.

6. Conclusions

This paper investigates the impact of occlusion on gait-based gender and age estimation and the effect of image restoration. To achieve this, CNN-based age and gender models are implemented and evaluated on synthetically occluded GEIs. The occlusions include block-wise occlusions of varying sizes and component-specific occlusions targeting different body parts. Experimental results showed that the non-occluded models achieved accuracies of 85% and 80% for gender and age estimation, respectively. Among the evaluated occlusion scenarios, TopB occlusion produced the most severe performance degradation, reducing gender estimation accuracy from 85% to 57% and age estimation accuracy from 80% to 29%. The GAN-based restoration framework partially recovered the lost gait information, improving the corresponding accuracies to 74% and 70%, respectively. These findings indicate that upper-body gait information plays a critical role in both age and gender estimation.

Cross-architecture validation using ResNet18 further confirmed the robustness of the findings. Although ResNet18 achieved higher absolute accuracies than the lightweight CNN, similar occlusion sensitivity patterns were observed across both architectures. This suggests that the identified importance of upper-body gait information is not architecture-dependent.

Future work may improve recognition performance by incorporating more advanced gait representations, transformer-based architectures, sequence-based spatio-temporal modeling, and multimodal sensing technologies such as WiFi, millimeter-wave radar, and wearable inertial sensors. In addition, evaluation on naturally occluded datasets and larger balanced populations would further strengthen the practical applicability of the proposed framework.

Author Contributions

Conceptualization, T.C.; methodology, T.C. and R.Q.C.Z.; software, R.Q.C.Z.; validation, R.Q.C.Z., Z.K.L. and M.K.O.G.; formal analysis, R.Q.C.Z.; data curation, R.Q.C.Z. and Z.K.L.; writing—original draft preparation, R.Q.C.Z.; writing—review and editing, T.C., Z.K.L. and M.K.O.G.; supervision, T.C.; funding acquisition, M.K.O.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Multimedia University through the TM R&D Fund (Project ID: MMUE/250015 (Project No. RDTC/251166)).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Research Ethics Committee of Multimedia University (Approval number: EA0012020 and date of approval on 27 March 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset used in this study is available at: https://www.kaggle.com/datasets/teechengqi/mmu-gag-dataset, accessed on 25 June 2026.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

1. Ti, Y.F.; Connie, T.; Goh, M.K.O. GenReGait: Gender Recognition Using Gait Features. J. Inform. Web Eng. 2023, 2, 129–140.
2. Vora, C.; Katkar, V.; Lunagaria, M. GAIT Analysis Based on GENDER Detection Using Pre-Trained Models and Tune Parameters. Discov. Artif. Intell. 2024, 4, 19.
3. Alguliyev, R.; Aliguliyev, R.; Sukhostat, L. Human Gender Classification and Age Estimation Based on Gait Images Using Deep Learning. Multimed. Tools Appl. 2025, 84, 49055–49069.
4. Li, X.; Makihara, Y.; Xu, C.; Yagi, Y. GaitAGE: Gait Age and Gender Estimation Based on an Age- and Gender-Specific 3D Human Model. IEEE Trans. Biom. Behav. Identity Sci. 2025, 7, 47–60.
5. Saleem, A.A.; Siddiqui, H.U.R.; Sehar, R.; Dudley, S. Gender Classification Based on Gait Analysis Using Ultrawide Band Radar Augmented with Artificial Intelligence. Expert Syst. Appl. 2024, 249, 123843.
6. Aderinola, T.B.; Connie, T.; Ong, T.S.; Teoh, A.B.J.; Goh, M.K.O. AggreGait: Automatic Gait Feature Extraction for Human Age and Gender Classification with Possible Occlusion. Array 2025, 26, 100379.
7. Tan, V.W.S.; Ooi, W.X.; Chan, Y.F.; Connie, T.; Goh, M.K.O. Vision-Based Gait Analysis for Neurodegenerative Disorders Detection. J. Inform. Web Eng. 2024, 3, 136–154.
8. Song, X.; Hou, S.; Huang, Y.; Cao, C.; Liu, X.; Huang, Y.; Shan, C. Gait Attribute Recognition: A New Benchmark for Learning Richer Attributes from Human Gait Patterns. IEEE Trans. Inf. Forensics Secur. 2024, 19, 1–14.
9. Chen, Y.-J.; Chen, L.-X.; Lee, Y.-J. Systematic Evaluation of Features From Pressure Sensors and Step Number in Gait for Age and Gender Recognition. IEEE Sens. J. 2022, 22, 1956–1963.
10. Mukherjee, M.; Faisal, A.I.; Balakrishnan, N.; Kumar, S.; Deen, M.J. An Inferential Model for Understanding the Effects of Demographic and Gait Factors and Their Interactions on the Human Gait Index: A Beta Regression Approach. IEEE J. Biomed. Health Inform. 2025, 29, 7593–7606.
11. Lu, J.; Tan, Y.P. Gait-Based Human Age Estimation. IEEE Trans. Inf. Forensics Secur. 2010, 5, 761–770.
12. Hasan, K.; Uddin, M.Z.; Ray, A.; Hasan, M.; Alnajjar, F.; Ahad, M.A.R. Improving Gait Recognition Through Occlusion Detection and Silhouette Sequence Reconstruction. IEEE Access 2024, 12, 158597–158610.
13. Bharti, J.; Tomar, D.S.; Bhattacharjee, S. Gait Estimation of Occluded ROIs Using Interpolation Techniques and Testing Their Performance in Speed Variation in Gait. Procedia Comput. Sci. 2025, 258, 3826–3856.
14. Xu, C.; Makihara, Y.; Li, X.; Yagi, Y. Occlusion-Aware Human Mesh Model-Based Gait Recognition. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1309–1321.
15. Li, T.; Ma, W.; Zheng, Y.; Fan, X.; Yang, G.; Wang, L.; Li, Z. A Survey on Gait Recognition against Occlusion: Taxonomy, Dataset and Methodology. PeerJ Comput. Sci. 2024, 10, e2602.
16. Qin, H.; Chen, Z.; Guo, Q.; Wu, Q.M.J.; Lu, M. RPNet: Gait Recognition With Relationships Between Each Body-Parts. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2990–3000.
17. Wang, Z.; Hou, S.; Zhang, M.; Liu, X.; Cao, C.; Huang, Y. GaitParsing: Human Semantic Parsing for Gait Recognition. IEEE Trans. Multimed. 2024, 26, 4736–4748.
18. Kumar, S.S.; Singh, B.; Chattopadhyay, P.; Halder, A.; Wang, L. BGaitR-Net: An Effective Neural Model for Occlusion Reconstruction in Gait Sequences by Exploiting the Key Pose Information. Expert Syst. Appl. 2024, 246, 123181.
19. Ali, Z.; Moon, J.; Gillani, S.; Afzal, S.; Bukhari, M.; Rho, S. A Region-Aware Deep Learning Model for Dual-Subject Gait Recognition in Occluded Surveillance Scenarios. CMES Comput. Model. Eng. Sci. 2025, 144, 2263–2286.
20. Zhang, S.; Wang, Y.; Li, A. Gait Energy Image-Based Human Attribute Recognition Using Two-Branch Deep Convolutional Neural Network. IEEE Trans. Biom. Behav. Identity Sci. 2023, 5, 53–63.
21. Awai, S.; Chikano, M.; Konno, T. Gait Recognition Using Occlusion Classification in Security Cameras. In Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 10–13 October 2023; pp. 520–521.
22. Paul, A.; Jain, M.M.; Jain, J.; Chattopadhyay, P. Gait Cycle Reconstruction and Human Identification from Occluded Sequences. arXiv 2022, arXiv:2206.13395.
23. Xu, C.; Tsuji, S.; Makihara, Y.; Li, X.; Yagi, Y. Occluded Gait Recognition via Silhouette Registration Guided by Automated Occlusion Degree Estimation. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 4–6 October 2023; pp. 3191–3201.
24. Gupta, A.; Chellappa, R. You Can Run but Not Hide: Improving Gait Recognition with Intrinsic Occlusion Type Awareness. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 5881–5890.
25. Dong, A.; Zhang, J.; Xu, W.; Jia, J.; Yun, S.; Yu, J. Wi-FiAG: Fine-Grained Abnormal Gait Recognition via CNN-BiGRU with Attention Mechanism from Wi-Fi CSI. Mathematics 2025, 13, 1227.
26. Mazzieri, R.; Pegoraro, J.; Rossi, M. Open-Set Gait Recognition from Sparse mmWave Radar Point Clouds. IEEE Sens. J. 2025, 25, 33051–33063.
27. Huan, R.; Dong, G.; Cui, J.; Jiang, C.; Chen, P.; Liang, R. INSENGA: Inertial Sensor Gait Recognition Method Using Data Imputation and Channel Attention Weight Redistribution. IEEE Sens. J. 2025, 25, 39197–39219.
28. Abirami, B.; Subashini, T.S.; Mahavaishnavi, V. Automatic Age-Group Estimation from Gait Energy Images. Mater. Today Proc. 2020, 33, 4646–4649.
29. Wang, X.; Yan, W.Q. Human Gait Recognition Based on Frame-by-Frame Gait Energy Images and Convolutional Long Short-Term Memory. Int. J. Neur. Syst. 2020, 30, 1950027.
30. Yan, J.; Yang, Z.; Lin, Y.; Zhou, W. Multi-Attention Augmented Spatio-Temporal Graph Convolution Network for Gait Recognition Based on IMUs Data. In Proceedings of the 2025 2nd International Conference on Electronic Engineering and Information Systems (EEISS), Nanjing, China, 23–25 May 2025; pp. 1–4.
31. Yu, S.; Liao, R.; An, W.; Chen, H.; García, E.B.; Huang, Y.; Poh, N. GaitGANv2: Invariant Gait Feature Extraction Using Generative Adversarial Networks. Pattern Recognit. 2019, 87, 179–189.
32. Bicer, M.; Phillips, A.T.M.; Melis, A.; McGregor, A.H.; Modenese, L. Generative Adversarial Networks to Create Synthetic Motion Capture Datasets Including Subject and Gait Characteristics. J. Biomech. 2024, 177, 112358.
33. Xing, H.; Zhang, R. Gait Recognition for Exoskeleton Robots Based on Improved KNN-DAGSVM Fusion Algorithm. In Proceedings of the 2022 37th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Beijing, China, 19–20 November 2022; pp. 364–369.

Figure 1. Overall framework of the proposed method.

Figure 2. Raw walking video images.

Figure 3. A sample of GEI.

Figure 4. Block-wise occlusions.

Figure 5. Component-specific occlusions.

Figure 6. Restored block-wise occluded GEIs (specific setting).

Figure 7. Restored component-specific occluded GEIs (specific setting).

Figure 8. Restored block-wise occluded GEIs (general setting).

Figure 9. Restored component-specific occluded GEIs (general setting).

Table 1. Summary of state-of-the-art methods.

Author	Method	Dataset	Classes	Results
Awai et al. [21]	Random Forest	CASIA-B	Five occlusion patterns with conditions	Average Accuracy: 72.5%
Paul et al. [22]	Random Forest	CASIA-B, OU-ISIR LP	Five distinct percentages of occlusion	Dice Score: 0.898; Average Rank-1 Accuracy (%): 77.21
Xu et al. [23]	Occlusion ratio estimator, GaitGL	OU-MVLP	4 × 4 occlusion combinations	Average Rank-1 Rate (%): 73.6; Average EER (%): 1.45
Chen et al. [9]	SMPL	OU-MVLP	4 occlusion types with 4 angles each	Average Rank-1 Rate (%): 60.2
Gupta et al. [24]	Detectron 2, CNN Occlusion Awareness	BRIAR, GREW	(1) BRIAR (Rank-1): 100 m, 400 m, 500 m, Elevated angle, Aerial; (2) GREW: Rank-1, Rank-5, Rank-10, Rank-20	(1) Retrieval Accuracy (%) (GaitGL) (BRIAR): 34.66, 20.14, 16.90, 26.72, 26.83; (2) Retrieval Accuracy (%) (GaitGL) (GREW): 13.77, 25.91, 32.70, 40.60
Hasan et al. [12]	ODR, FEGR	CASIA-B, OU-MVLP	Silhouette sequence with reconstruction under 5 types of occlusion	Average Rank-1 Accuracy (%): CASIA-B: 84.5; OU-MVLP: 58.9

Table 2. Participants’ demographic information.

Group	Male	Female	Total	Percentage
Child	15	12	27	22.50%
Adult	28	42	70	58.33%
Senior	15	8	23	19.17%

Note: The last column represents the percentage of participants within each age group relative to the total study population.
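
As a quick check of the note above, the group percentages can be recomputed from the per-gender counts in Table 2. This is only an illustrative sketch; the variable names are chosen here and do not come from the paper.

```python
# Per-gender participant counts copied from Table 2.
counts = {
    "Child": {"Male": 15, "Female": 12},
    "Adult": {"Male": 28, "Female": 42},
    "Senior": {"Male": 15, "Female": 8},
}

# Total per age group, then the size of the whole study population.
group_totals = {group: sum(by_gender.values()) for group, by_gender in counts.items()}
population = sum(group_totals.values())

# Share of each age group relative to the whole population, in percent.
percentages = {
    group: round(100 * total / population, 2)
    for group, total in group_totals.items()
}

for group, pct in percentages.items():
    print(f"{group}: {group_totals[group]} of {population} = {pct:.2f}%")
```

The totals sum to 120 subjects, which matches the dataset size stated in the abstract, and the adult group accounts for more than half of the participants.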

Table 3. CNN configuration for gender and age estimation models.

Model	Conv Layers	Pooling Layers	Dense Layers	Dropout Layers	Output Activation
Gender	3	3	3	2	Sigmoid
Age	3	3	3	2	Softmax

Table 4. GAN’s generator and discriminator configuration.

Model	Architecture	Conv Layer Types	Kernel Size	Stride
Generator	Autoencoder	Conv2D, Conv2DTranspose	(3,3), (4,4), (5,5)	(1,1), (2,2)
Discriminator	CNN	Conv2D	(3,3)	(1,1), (2,2)
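
To see how the kernel and stride settings in Table 4 shape the feature maps, the standard output-size formulas for convolution and transposed convolution can be sketched as below. Table 4 does not list padding or the GEI resolution, so the padding values and the 64-pixel input used here are assumptions for illustration only, and the formulas follow the explicit-padding convention (no dilation, no output padding).

```python
def conv_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of a standard convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


# Hypothetical 64-pixel axis; the paper's actual GEI size is not given in this section.
side = 64

# A (3,3) kernel with stride (2,2) and padding 1 halves the resolution (encoder step).
down = conv_out(side, kernel=3, stride=2, padding=1)

# A (4,4) transposed kernel with stride (2,2) and padding 1 doubles it again (decoder step).
up = conv_transpose_out(down, kernel=4, stride=2, padding=1)

# A (5,5) kernel with stride (1,1) and padding 2 leaves the resolution unchanged.
same = conv_out(side, kernel=5, stride=1, padding=2)

print(down, up, same)
```

Under these assumed paddings, stride (2,2) layers change the resolution by a factor of two, while stride (1,1) layers refine features at a fixed resolution, which is the usual role split in an autoencoder-style generator.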

Table 5. Gender model classification report.

Group	Precision	Recall	F1-Score
Male	0.86	0.84	0.85
Female	0.85	0.87	0.86
Accuracy			0.85
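
The F1-scores in Table 5 follow from the listed precision and recall as their harmonic mean. The short sketch below recomputes them from the rounded table values; the function and variable names are illustrative. Because the inputs are already rounded to two decimals, a recomputed F1 can differ from a reported one by 0.01 in other tables.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from Table 5.
table5 = {"Male": (0.86, 0.84), "Female": (0.85, 0.87)}

f1 = {group: round(f1_score(p, r), 2) for group, (p, r) in table5.items()}

for group, value in f1.items():
    print(f"{group}: F1 = {value:.2f}")
```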

Table 6. Age model classification report.

Group	Precision	Recall	F1-Score
Child	0.83	0.80	0.82
Adult	0.81	0.86	0.84
Senior	0.69	0.59	0.64
Accuracy			0.80

Table 7. Test results for block-wise occlusion.

Model	Gender Test Accuracy (%)	Age Test Accuracy (%)
leftA	88	81
leftB	66	67
rightA	87	83
rightB	56	61
topA	65	62
topB	57	29
topC	57	60
topD	63	60
topE	65	63
Average	67	63

Table 8. Test results for component-specific occlusion.

Model	Gender Test Accuracy (%)	Age Test Accuracy (%)
head	63	61
shoulder	73	48
rh	79	62
lh	68	64
butt	62	66
lt	75	71
lc	85	67
rt	73	61
rc	84	70
Average	74	63

Table 9. Restored results for block-wise occlusion (specific setting).

Model	Average PSNR (dB)	Average SSIM
leftA	36.09	0.8826
leftB	29.85	0.9384
rightA	35.80	0.9581
rightB	30.14	0.9570
topA	30.11	0.9661
topB	34.06	0.9822
topC	34.08	0.9766
topD	34.12	0.9814
topE	33.25	0.9702
Average	33.05	0.9569
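
For readers unfamiliar with the PSNR column, the metric can be sketched as follows. This is a minimal pure-Python version that assumes 8-bit grayscale images (peak value 255) stored as nested lists; the pixel range and the tiny example images are assumptions, not the paper's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    flat_ref = [pixel for row in reference for pixel in row]
    flat_res = [pixel for row in restored for pixel in row]
    if len(flat_ref) != len(flat_res):
        raise ValueError("images must have the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_res)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images

    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2 x 2 images standing in for an original and a restored GEI.
original = [[0, 255], [128, 64]]
restored = [[0, 250], [130, 64]]

print(f"PSNR: {psnr(original, restored):.2f} dB")
```

Higher values indicate a restoration closer to the original. PSNR measures only pixel-wise error, which is why the tables pair it with SSIM, a measure of structural similarity.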

Table 10. Restored results for component-specific occlusion (specific setting).

Model	Average PSNR (dB)	Average SSIM
head	31.03	0.9725
shoulder	34.18	0.9826
rh	33.62	0.8741
lh	34.76	0.9756
butt	37.82	0.9765
lt	38.14	0.9831
lc	39.05	0.9846
rt	37.63	0.9763
rc	39.15	0.9708
Average	36.15	0.9662

Table 11. Test results for restored block-wise occlusion.

Model	Gender Test Accuracy (%)	Age Test Accuracy (%)
leftA	86	76
leftB	78	68
rightA	82	78
rightB	80	73
topA	71	68
topB	74	70
topC	83	71
topD	77	74
topE	78	72
Average	78	72
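
The effect of restoration is easiest to read as a per-condition difference between Table 11 (restored) and Table 7 (occluded). The sketch below computes that difference in percentage points using the values from both tables; the dictionary layout and names are illustrative.

```python
# (gender accuracy, age accuracy) in percent, copied from Table 7 (occluded).
occluded = {
    "leftA": (88, 81), "leftB": (66, 67), "rightA": (87, 83), "rightB": (56, 61),
    "topA": (65, 62), "topB": (57, 29), "topC": (57, 60), "topD": (63, 60),
    "topE": (65, 63),
}

# Same layout, copied from Table 11 (after GAN-based restoration).
restored = {
    "leftA": (86, 76), "leftB": (78, 68), "rightA": (82, 78), "rightB": (80, 73),
    "topA": (71, 68), "topB": (74, 70), "topC": (83, 71), "topD": (77, 74),
    "topE": (78, 72),
}

# Change in percentage points per occlusion condition: (gender, age).
gain = {
    name: (restored[name][0] - occluded[name][0], restored[name][1] - occluded[name][1])
    for name in occluded
}

largest_gender_gain = max(gain, key=lambda name: gain[name][0])
largest_age_gain = max(gain, key=lambda name: gain[name][1])

for name, (gender_delta, age_delta) in gain.items():
    print(f"{name}: gender {gender_delta:+d}, age {age_delta:+d}")
```

Read this way, the largest recoveries occur for the heavily degraded conditions (topC for gender, topB for age), while the mildly occluded leftA and rightA cases score slightly lower after restoration than before it.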

Table 12. Test results for restored component-specific occlusion.

Model	Gender Test Accuracy (%)	Age Test Accuracy (%)
head	73	72
shoulder	84	73
rh	81	70
lh	83	73
butt	83	75
lt	83	76
lc	84	76
rt	81	77
rc	84	78
Average	81	74

Table 13. Comparison of restored block-wise occlusion.

Block-Wise Occlusion	Gender Accuracy (%): Proposed Method	Gender Accuracy (%): KNN	Gender Accuracy (%): Random Forest	Age Accuracy (%): Proposed Method	Age Accuracy (%): KNN	Age Accuracy (%): Random Forest
leftA	86	83	75	76	76	75
leftB	78	54	53	68	46	52
rightA	82	82	77	78	77	75
rightB	80	52	50	73	42	55
topA	71	62	70	68	62	78
topB	74	77	63	70	67	59
topC	83	78	58	71	68	57
topD	77	72	61	74	64	57
Average	78	70	63	72	62	63

Table 14. Comparison of restored component-specific occlusion.

Component-Specific Occlusion	Gender Accuracy (%): Proposed Method	Gender Accuracy (%): KNN	Gender Accuracy (%): Random Forest	Age Accuracy (%): Proposed Method	Age Accuracy (%): KNN	Age Accuracy (%): Random Forest
head	73	63	70	72	61	77
shoulder	84	84	74	73	72	76
rh	81	75	78	70	71	59
lh	83	75	66	73	68	70
butt	83	83	73	75	73	66
lt	86	82	76	76	74	70
lc	84	82	74	76	73	76
rt	81	81	78	77	72	69
Average	81	78	73	74	70	70

Table 15. Cross-architecture validation results using CNN and ResNet18 under representative occlusion scenarios.

Model	Non-Occluded	TopB	Shoulder	Restored TopB	Restored Shoulder
CNN (Gender)	85	57	73	74	84
ResNet18 (Gender)	89	60	78	78	87
CNN (Age)	80	29	48	70	73
ResNet18 (Age)	86	34	53	74	77
