Robust and Lightweight System for Gait-Based Gender Classiﬁcation toward Viewing Angle Variations

In computer vision applications, gait-based gender classification is a challenging task because a person may walk at various angles with respect to the camera viewpoint. At some viewing angles, the person's limb movement can be occluded from the camera, preventing the perception of gait-based features. To solve this problem, this study proposes a robust and lightweight system for gait-based gender classification. It uses a gait energy image (GEI) to represent the gait of an individual. A discrete cosine transform (DCT) is applied to the GEI to generate a gait-based feature vector. This DCT feature vector is then fed to an XGBoost classifier to perform gender classification. To improve the classification results, the XGBoost parameters are tuned. Finally, the results are compared with other state-of-the-art approaches. The performance of the proposed system is evaluated on the OU-MVLP dataset. The experimental results show a mean CCR (correct classification rate) of 95.33% for gender classification. The results obtained from the various viewpoints of OU-MVLP illustrate the robustness of the proposed system for gait-based gender classification.


Introduction
Gait refers to the walking style of a person. Every person possesses a unique gait that can be utilized as a behavioral feature for recognizing that person. The system does not need any cooperation from the person for gait acquisition, since it can be done from a distance. Hence, in recent years, gait recognition has gained popularity, and it can be used in several fields, such as investigation of criminal activity, surveillance, access monitoring, and forensics. As CCTV is now installed in almost all public places [1][2][3], it captures the gait of a person unobtrusively.
In the recent research literature, the majority of the work has been done on gait recognition [4][5][6][7][8][9][10][11][12][13][14], whereas gender classification based on gait has huge potential in real-time applications [15]. Gait-based gender classification can provide a cue in investigations and help in finding lost persons, especially children, at railway stations, airports, bus stands, and other public places. Gait-based gender classification can also be used in commercial activities. For example, if the gender of a customer entering a shop is known, then related advertisements can be displayed on a digital screen. A talking robot installed in a shopping mall can perceive customers through cameras and help them with gender-related queries, such as finding toilets, beauty product shops, men's wear shops, and women's wear shops.
The majority of studies on gender classification [16][17][18][19][20][21][22][23] use the gait energy image (GEI) [6]. GEI combines the silhouettes from one complete walking cycle. The brightness of every pixel in a GEI reveals the gait dynamics over one complete walking cycle. Recent studies usually perform gait-based gender recognition using fixed-direction data [20,24], which means the recognition works only for a fixed viewpoint of the subject. However, the viewpoint is not always the same in real-time scenarios. Unfortunately, among the recent research literature [13][14][15][16][17][18][25][26][27][28][29], very few studies have conducted gait-based gender classification experiments under viewing angle variations on a very large dataset, which can be considered the main motivation of this study.

Research Gap

• The performance of gender classification based on gait features mainly relies on the camera viewing angle. It can be observed in [27] that non-frontal viewing angles (22.5°/45°/67.5°/90°) provide a straightforward way to distinguish the leg joint angles in the depth image XY-plane. However, in other viewing angles, the subjects' limb movement can be occluded from the camera, which hinders perceiving the gait features. Recent studies [18,[28][29][30] have tried to investigate the problem of viewing angle variations in gait-based gender classification. However, they have used relatively smaller datasets for performance evaluation of the results.

• In recent times, most studies [16,25,[31][32][33] have adopted convolutional neural network (CNN) based methods to perform gait-based gender classification. These methods demand an extensive amount of training, which can be accomplished with high-performance GPU machines. The end result is a high implementation cost. To overcome this drawback, this study makes a cost-effective attempt by proposing a lightweight system for gait-based gender classification.

Main Contributions
This study attempts to handle the research gap mentioned in Section 1.1 in the following manner:

• It adopts a lightweight approach for gait-based gender classification under viewing angle variations of subjects.

• It initially takes the input image, extracts the silhouette, constructs the GEI, applies a discrete cosine transform (DCT) for feature extraction, and finally applies an XGBoost classifier for gender classification.

• It verifies the experimental results on the world's largest multi-view gait dataset (OU-MVLP), confirming the efficiency and effectiveness of the proposed system against the results produced by state-of-the-art models.
The contents of this paper are arranged as follows: Section 2: Related work, Section 3: Proposed system, Section 4: Experiments, and Section 5: Conclusions.

Related Work

Model-Based Systems
The model-based approaches attempt to model the subject's body to perform gait-based gender classification and recognition tasks. These systems depend on the acquisition of a 3D skeletal model of subjects [40][41][42]. The benefit of 3D modeling in a gait-based gender classification and recognition system is that it can successfully tackle viewing angle variation using the skeletal models. Since the essential parameters of the skeletal model are evaluated in 3D with the help of several calibrated 2D cameras or depth-sensing cameras, the outcome of the system shows robustness to viewing angle variations. These systems attain a recognition accuracy of 92% over 5 different viewpoints [41]. Likewise, 3D dense models developed from several 2D cameras play a vital role in handling viewing angle variations [43]. These systems can transform the features retrieved from a model to a particular viewpoint, attaining a recognition accuracy of 75% over 12 different viewing angles of 20 subjects.
The disadvantages of 3D model-based systems include the need for camera calibration prior to usage, occlusion of gait, and the limited range of depth-sensing cameras. Therefore, these systems become ineffective in natural environments.
Yoo et al. [35] introduced a technique in which the gait outline is rearranged in the form of a two-dimensional link coordinate diagram and used as a distinct feature for recognition tasks. Lee et al. [34] partitioned the gait outline of the subject's body into 7 segments such that every segment is depicted as an ellipse. Further, the center, major axis, minor axis, and orientation of every ellipse are computed as distinct features. Isaac et al. [26] proposed a gender recognition system that removes the necessity of a complete gait cycle by using the pose-based voting (PBV) method. The authors used linear discriminant analysis (LDA) in addition to Bayes' rule for classification as an alternative to the popular support vector machine (SVM). Guffanti et al. [28] proposed a method that uses depth cameras to perceive most human gait features. They included 81 participants (40 females and 41 males) in their experiment. These participants were asked to walk at a self-selected speed across a 4.8-m walkway. A detailed analysis was done in the time domain. Further, the features with significant differences by gender were used to train a support vector machine (SVM) classifier. Lee et al. [29] proposed a gender recognition method that uses a support vector machine (SVM) and random forest (RF) based on recursive feature elimination to determine the best features. They investigated temporal, kinematic, and muscle activity features to show the effects of gender-based differences on gait characteristics.
In general, it is observed that model-based systems fail to handle lower-resolution input images; additionally, they increase the computational complexity.

Appearance-Based Systems
The appearance-based approaches depend on the spatio-temporal data captured from the gait dynamics of a person to perform gender classification and recognition. The appearance-based method mostly relies on a complete gait cycle of a subject to perform gender classification. The most famous appearance-based method is GEI. Lu et al. [44] introduced a gender classification system that is capable of handling the arbitrary movement of subjects. In this study, the gait sequences of subjects are compared and assigned to a particular cluster, depending on the comparison result. Finally, a cluster-based averaged gait image is computed, which is further used as a gait-based feature. Liu et al. [45] applied a Fourier transform to the gait energy image to extract the gait-based features.
In another study [46], a technique for gender classification is introduced in which the silhouette from every picture is extracted and gait dynamics over a certain time period are calculated to trace the person. Later, a support vector machine (SVM) is applied for classification. The study [47] applied GEI and the active energy image (AEI) in combination with the k-NN classification algorithm to perform gender classification. This study demonstrated the system's performance on the CASIA-B and SOTON-A public datasets. The study [48] utilized GEI and denoised energy images as distinct features and then applied SVM to perform gender classification. Bei et al. [16] introduced a method to calculate a subGEI using fewer frames rather than a complete gait cycle. They extracted the synthetic optical flow of multi-subGEIs and used them as temporal features. Further, they applied a two-stream CNN to combine the GEI and the optical flow information for further gait analysis. Hassan et al. [17] used a wavelet 5/3 lifting scheme for gait representation. They performed PCA to form low-dimensional distinctive vectors for each walk sequence.
The advantages of appearance-based systems include lower computational complexity and reduced noise, as they work directly on the gait silhouette. Hence, this study uses an appearance-based approach.

Deep Learning-Based Systems
The deep learning approaches involving convolutional neural networks (CNNs) can be used for multiple tasks, such as gender classification, age estimation, and recognition, simultaneously. In [38], gender classification, age estimation, and recognition of subjects are carried out simultaneously. In [31], a CNN is used with GEI; silhouette images are treated as input, and body mass index is computed as output. Zhang et al. [32] introduced a deep CNN, which performs multi-task learning for estimating age along with gender classification. Sakata et al. [39] introduced a CNN involving multiple stages to handle gait-based age and gender estimation. Liu et al. [33] used a VGGNet-16 deep convolutional model in combination with SVM to perform gait-based gender recognition.
According to the available literature, deep learning-based systems for gait-based gender classification have produced superior results. However, these systems need very high-specification hardware [25]. The proposed system attempts to solve the gender classification problem using a lightweight method by adopting an appearance-based approach.

Proposed System

Silhouette Extraction
According to a study by Aslam et al. [49], the Gaussian mixture-based background/foreground segmentation algorithm is found suitable for moving object detection. Their experiment shows satisfactory results for background subtraction. It is also found that the Gaussian-based background/foreground segmentation algorithm is capable of detecting a moving object even when it is occluded. Therefore, this study performs silhouette extraction using the aforementioned algorithm. This algorithm partitions the pixels into foreground and background according to their intensity values.
Let M be the input video, (x_0, y_0) the pixel position, and t the time. The history of pixel position (x_0, y_0) can be represented with the following equation:

$$\{X_1, \ldots, X_t\} = \{G_i(x_0, y_0) : 1 \le i \le t\}$$

where G_t represents the video frame at instant t.
Let k = the number of distributions, ω = the weight associated with the ith Gaussian, Σ = the standard deviation, µ = the mean, σ = the variance, and T = the threshold value. The following equations illustrate the silhouette extraction process:

$$P(X_t) = \sum_{i=1}^{k} \omega_{i,t}\, \eta(X_t, \mu_{i,t}, \Sigma_{i,t})$$

$$B = \arg\min_{b}\left(\sum_{i=1}^{b} \omega_{i} > T\right)$$

where η denotes the Gaussian probability density function and the first B distributions model the background; pixels not matching these distributions are labeled as foreground (silhouette). Once silhouette extraction is performed, a median filter of size 8 × 8 is applied to the silhouette image to remove the noise.
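As a concrete illustration, the following minimal Python sketch shows how this step could be implemented with OpenCV's Gaussian mixture-based background subtractor (MOG2) followed by median filtering. The parameter values and the video path are illustrative assumptions, not the exact settings used in this study.

```python
import cv2
import numpy as np
from scipy.ndimage import median_filter

def extract_silhouettes(video_path, history=500, var_threshold=16):
    """Extract binary silhouettes using a Gaussian mixture background model (MOG2)."""
    # MOG2 implements Gaussian mixture-based background/foreground segmentation.
    subtractor = cv2.createBackgroundSubtractorMOG2(
        history=history, varThreshold=var_threshold, detectShadows=False)
    capture = cv2.VideoCapture(video_path)
    silhouettes = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        # Foreground mask: pixels not explained by the background Gaussians.
        mask = subtractor.apply(frame)
        # Binarize and denoise with an 8 x 8 median filter, as described above.
        binary = (mask > 127).astype(np.uint8) * 255
        binary = median_filter(binary, size=8)
        silhouettes.append(binary)
    capture.release()
    return silhouettes
```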

Gait Energy Image (GEI)
After silhouette extraction and noise removal, the GEI is evaluated. The majority of appearance-based methods have adopted the GEI. In this experiment, we have fixed the GEI size at 88 × 128 pixels. Let N_{G.F.} = the number of frames in a gait cycle of an individual, i = the frame sequence number, (x, y) = the image coordinates, and S_i = the ith gait frame. The GEI can be computed as follows:

$$G(x, y) = \frac{1}{N_{G.F.}} \sum_{i=1}^{N_{G.F.}} S_i(x, y) \quad (5)$$

Figure 2 illustrates a pictorial representation of GEI.

Justification for GEI Representation
According to Equation (5), GEI is an average template; therefore, it is not affected by random noise in individual silhouette frames. Further, robustness may be enhanced by dropping the pixels whose energy values are below a threshold. In addition, with GEI templates, it is not required to split the sequence of silhouettes into cycles and carry out time normalization of the cycle length; thus, the errors arising in that approach can be avoided. In comparison to a binary silhouette sequence, the GEI has an apparent information loss. The intensity value of a particular pixel in the GEI indicates the frequency of the silhouette occurring at that position over the whole sequence. We can partially rebuild the original silhouette sequence from the GEI with knowledge of human walking. For instance, for a pixel close to the leg contour, its GEI value may indicate that the silhouette occurs at this position in 30 frames out of 120 frames; these 30 frames are those frames in which the person is moving. Likewise, we can assign the GEI values of other limb movement regions to the corresponding frames in the silhouette sequence. Generally, energy changes in the head and torso regions are regarded as noise. GEI can maintain the crucial contour of a person's walk, and it is also helpful for understanding changes in walking.
According to the study by J. Han et al. [6], GEI is more capable of saving both storage space and computation time for gait recognition than binary silhouette sequences. Their study also concluded that GEI is less sensitive to noise than individual frames. Therefore, this study uses GEI for gait representation.
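To make the GEI construction concrete, the short Python sketch below averages the binary silhouettes of one gait cycle after resizing them to the 88 × 128 resolution used here; the silhouette alignment and cropping steps are simplified, and the helper names are illustrative.

```python
import cv2
import numpy as np

def compute_gei(silhouettes, width=88, height=128):
    """Compute the gait energy image (GEI) as the per-pixel mean of the
    normalized binary silhouettes over one gait cycle (Equation (5))."""
    frames = []
    for sil in silhouettes:
        # Resize each silhouette to the fixed GEI size (88 x 128 pixels).
        resized = cv2.resize(sil, (width, height), interpolation=cv2.INTER_NEAREST)
        # Normalize to {0, 1} so the average directly gives the energy value.
        frames.append((resized > 0).astype(np.float32))
    # G(x, y) = (1 / N_GF) * sum_i S_i(x, y)
    return np.mean(frames, axis=0)
```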

Discrete Cosine Transform (DCT)
According to N. Ahmed et al. [50], DCT is preferred over the Karhunen-Loeve Transform, Discrete Fourier Transform, Walsh-Hadamard Transform, and Haar Transform for pattern recognition applications. In that study, comparing the performance of these transforms, DCT is found to be optimal. The study [51] compared the performance of the Discrete Wavelet Transform (DWT) with DCT and concluded that DCT has excellent compaction for human image data. DCT provides an adequate trade-off between information packing ability and computational complexity, and it is faster than DWT. Because of these advantages, this study applies DCT to the GEI for extracting the distinct gait features. Figure 3 illustrates the detailed process of DCT feature extraction. The input GEI is divided into blocks, with each block having a size of 8 × 8 pixels, giving 176 such blocks. Then, DCT is applied on every block from left to right in a top-down manner. These blocks are capable of representing the entire GEI with a comparatively smaller memory requirement. The DCT matrix elements can be computed as:

$$X(u, v) = \frac{2}{N}\, C(u)\, C(v) \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} g(m, n) \cos\!\left[\frac{(2m+1)u\pi}{2N}\right] \cos\!\left[\frac{(2n+1)v\pi}{2N}\right]$$

where N represents the size of the block, $C(u) = 1/\sqrt{2}$ for $u = 0$ and $C(u) = 1$ for $u > 0$ (and similarly for C(v)), and g(m, n) represents the image matrix.
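The block-wise DCT feature extraction can be sketched in Python as follows; the number of retained low-frequency coefficients per block is an illustrative assumption, since the exact selection used in this study is not specified here.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(gei, block=8, keep=2):
    """Extract a DCT feature vector from a GEI: apply a 2D DCT to each
    8 x 8 block and keep the top-left `keep` x `keep` low-frequency coefficients."""
    height, width = gei.shape            # e.g., 128 x 88 -> 16 x 11 = 176 blocks
    features = []
    for row in range(0, height, block):          # top-down
        for col in range(0, width, block):       # left to right within a row
            patch = gei[row:row + block, col:col + block]
            coeff = dctn(patch, norm='ortho')    # 2D DCT-II of the block
            # Low-frequency coefficients (top-left corner) carry most of the
            # energy, so only these are retained (energy compaction).
            features.append(coeff[:keep, :keep].ravel())
    return np.concatenate(features)
```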

XGBoost Classifier
According to [52], the performance of XGBoost is found to be superior to support vector regression (SVR) and artificial neural network (ANN) models. In addition, the XGBoost model has shown robustness to all input combinations compared with the ANN and SVR models. Accordingly, this study uses an XGBoost classifier to perform classification based on the features obtained from the DCT. This classifier applies second-order gradients and advanced regularization to achieve higher accuracy. The objective function of the classifier is computed as a summation of intermediate loss functions, which are observed in every iteration. The classifier also applies the hessian to train the model by generating a tree. The hessian is the second-order derivative of the loss at any instant. The following steps explain the gender classification process.
Let X be the DCT feature vector set obtained from the GEIs of the individuals, Y the set of genders, α the learning rate, and M the number of base learners.

Step (1) Initialize the model.
Step (2) Perform iterations for m = 1 to M:
(i) Evaluate the gradient and hessian:

$$\hat{g}_m(x_i) = \left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = \hat{f}^{(m-1)}}, \qquad \hat{h}_m(x_i) = \left[\frac{\partial^2 L(y_i, f(x_i))}{\partial f(x_i)^2}\right]_{f = \hat{f}^{(m-1)}}$$

where L is the log loss function.
(ii) Resolve the optimization problem by fitting a base learner $\hat{\phi}_m$ to the training pairs and scale it by the learning rate:

$$\hat{f}_m(x) = \alpha\, \hat{\phi}_m(x)$$

(iii) Update the model:

$$\hat{f}^{(m)}(x) = \hat{f}^{(m-1)}(x) + \hat{f}_m(x)$$

Step (3) Estimate the result.
Based on the outcome of Step (3), the proposed system classifies the gender of the person under observation. Figure 5 illustrates the XGBoost classifier with a flowchart representation.
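The update rule above can be illustrated with a minimal Newton-boosting sketch in Python; this is a didactic approximation of Steps (1) to (3) using regression trees as base learners, not XGBoost's actual implementation (which adds regularization, column subsampling, and other refinements). The label encoding {0, 1} is assumed here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_boost(X, y, M=50, alpha=0.3, max_depth=3):
    """Toy gradient/Newton boosting with log loss; y must be in {0, 1}."""
    f = np.zeros(len(y))                      # Step (1): initialize the model
    learners = []
    for _ in range(M):                        # Step (2): m = 1 ... M
        p = sigmoid(f)
        grad = p - y                          # (i) gradient of the log loss
        hess = p * (1.0 - p)                  #     hessian of the log loss
        # (ii) fit a base learner to the Newton step -g/h, weighted by h
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, -grad / np.maximum(hess, 1e-12), sample_weight=hess)
        # (iii) update the model with the learning-rate-scaled base learner
        f += alpha * tree.predict(X)
        learners.append(tree)
    return learners

def predict_gender(learners, X_new, alpha=0.3):
    """Step (3): aggregate the base learners and threshold the probability."""
    score = sum(alpha * tree.predict(X_new) for tree in learners)
    return (sigmoid(score) >= 0.5).astype(int)   # 1/0 class labels (assumed encoding)
```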

Experiments

Dataset
This study demonstrates the performance of the proposed system for gait-based gender classification using the OU-MVLP dataset [53]. This dataset was developed by the Institute of Scientific and Industrial Research (ISIR), Osaka University (OU). It contains the gait of 10,307 individuals, of whom 5114 are male and 5193 are female. The gait of the 10,307 individuals is recorded from 14 different viewing angles. These angles vary from 0° to 90° and from 180° to 270°. The setup for gait recording includes seven network cameras, placed at intervals of 15° in azimuth angle. These cameras are placed at both ends of a walking track. The detailed setup (top view) for OU-MVLP is illustrated in Figure 6. Here, the orange-colored cameras are used for gait recording when a person walks from A to B, whereas the blue-colored cameras are used for gait recording when a person moves from B to A. During the recording, every individual is instructed to walk twice in the forward (A to B) and backward (B to A) directions. Accordingly, 28 gait sequences of each individual are captured in this setup.
The OU-MVLP gait dataset images are partitioned into two parts of similar size. The first part contains the gait images used for training, while the images in the other part are used for testing. This study adopts this train and test split for gait-based gender classification.

Classifier Tuning
This study tunes the XGBoost classifier by setting the values of boosting parameters such as max_depth, min_child_weight, gamma, subsample, colsample_bytree, and scale_pos_weight. We have tuned the learning rate in the range from 0.2 to 0.4 according to the viewing angle variations. These learning rates have helped in the evaluation of the optimum number of trees required for classification. Once the learning rate and number of trees are evaluated, the tree-specific parameters are tuned as follows: max_depth = 3-5 (according to viewing angle variations), min_child_weight = 6, gamma = 0, subsample = 0.8, colsample_bytree = 0.8, and scale_pos_weight = 1. To enhance the performance of the classifier, the regularization parameter alpha is tuned to a value of 0.005. Table 1 illustrates the tuning of max_depth, the learning rate, and the number of estimators according to the viewing angle variations.
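A minimal sketch of this configuration with the xgboost Python API is shown below; the learning_rate, max_depth, and n_estimators values are placeholders within the ranges reported above, since the exact per-angle values are listed in Table 1.

```python
from xgboost import XGBClassifier

# Hyperparameters follow the tuning described above; learning_rate, max_depth,
# and n_estimators vary per viewing angle (see Table 1), so these are examples.
clf = XGBClassifier(
    learning_rate=0.3,        # tuned in the range 0.2-0.4 per viewing angle
    max_depth=4,              # tuned in the range 3-5 per viewing angle
    n_estimators=200,         # placeholder; chosen per viewing angle
    min_child_weight=6,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=1,
    reg_alpha=0.005,          # regularization parameter alpha
    objective='binary:logistic',
)

# X_train holds the DCT feature vectors and y_train the gender labels
# (assumed encoded as 0/1 for the logistic objective).
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```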

Performance Evaluation Criteria
The performance of the proposed system for gait-based gender classification is validated under the assumption that the gender information is known at the beginning. This study carried out tests for both correct and incorrect gender classification with respect to a specific individual to observe the effect of viewing angle variations on gender classification. The correct classification rate (CCR) is evaluated by the following equation:

$$CCR = \frac{TP_g + TN_g}{N_g} \times 100\%$$

where TP_g denotes the true positives, i.e., the cases where the proposed system correctly classifies positive samples (male samples); TN_g denotes the true negatives, i.e., the cases where the proposed system correctly classifies negative samples (female samples); and N_g is the total number of samples. We have labeled the positive samples as 1 and the negative samples as -1. The CCR depicts the correct classification of male and female samples.
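For completeness, the CCR computation can be expressed in a few lines of Python; the prediction and label arrays are placeholders.

```python
import numpy as np

def ccr(y_true, y_pred):
    """Correct classification rate: fraction of samples (male or female)
    classified correctly, expressed as a percentage."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

# Example with labels +1 (male) and -1 (female):
print(ccr([1, 1, -1, -1], [1, -1, -1, -1]))   # 75.0
```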

Result Analysis
After tuning the classifier, the CCR for gender classification under every viewing angle is evaluated. Table 2 shows the angle-wise CCR for gender classification. It is observed that the proposed system attains the highest CCR of 96.32% for gender classification under a viewing angle of 90°. In the side view of a person, the majority of the gait features are observed, which helps in classification. At the 0° viewing angle, comparatively fewer features are observed; hence, the CCR declines to 92.85%. It can also be observed from the obtained results that when a person walks in the forward direction, the CCR obtained is comparatively better than when the person walks in the backward direction (refer to Figure 6 and Table 2).

Result Comparison and Discussion
In order to prove the robustness and efficiency of the proposed system, we have performed a statistical analysis of gait-based gender classification methods. Studies other than GaitSet [25] have evaluated the performance of their methods for gender classification on relatively smaller datasets. Nevertheless, the proposed system outperforms the other methods. Table 3 lists the state-of-the-art approaches for gait-based gender classification. The studies DGHEI [54], CNN+SVM [55], PBV-EFD (TUM-GAID) [26], and PBV-RCS (TUM-GAID) [26] conducted experiments on a dataset containing the gait of 305 persons. Despite using a smaller dataset, the results obtained in the aforementioned studies do not match those of our study. The studies PBV-EFD (CASIA-B) [26], PBV-RCS (CASIA-B) [26], and SRML (CASIA-B) [45] obtained better results, but these studies considered only 11 different viewing angles for the calculation of the mean CCR. Moreover, these studies conducted their experiments on a very small dataset of 62 persons, whereas our study conducted the experiment on the world's largest dataset, covering 14 different viewing angles. As can be seen in Table 3, the proposed system outperforms the studies by Hu et al. [56], SRML [45], PWC [30], and PWC+PF [30] in gender recognition accuracy. Although the studies Lifting scheme 5/3+PCA (OULP) [17], Lifting scheme 5/3+PCA (CASIA-B) [17], and SVM-RFE [29] show promising results, they considered fewer viewing angle variations and used comparatively smaller datasets for gender classification. Thus, the results elucidate that DCT can successfully determine different frequency components and is insensitive to changes in human appearance. Further, the GEI+DCT+XGBoost combination is capable of obtaining higher recognition rates than methods using contour- and texture-based features.
From the statistics presented in Table 3, it can be observed that the mean CCR is inversely related to the number of viewing angles. As the number of viewing angles increases, the possibility of occlusion of gait features in several viewpoints also increases. This leads to performance degradation of gender classification in those viewpoints; thus, the mean CCR reduces. However, the performance of gait-based gender classification depends on various factors, such as the feature extraction technique, sample size, model training, classification method, number of viewing angles, and so on. To the best of our knowledge, very few studies have performed gender classification using the OU-MVLP gait dataset. This study attempts to demonstrate the performance of the proposed system under viewing angle variations by showing an angle-wise comparison. Table 4 illustrates the angle-wise comparative analysis of the proposed study with GaitSet [25]. From Tables 3 and 4, it can be derived that the performance of the proposed system is superior to other methods for gait-based gender classification. The study [25] used a CNN-based method for feature extraction, whereas this study used DCT-based features for gender classification. This shows that the proposed method gives better results than the deep learning approach. In order to evaluate the statistical significance of Table 4, we take µ_1 as the mean CCR of the study [25] and µ_2 as the mean CCR of the proposed study. Here, the null hypothesis is H_0: µ_1 = µ_2 and the alternative hypothesis is H_a: µ_1 < µ_2. The test statistic is -2.887006, and the p-value for the two-tailed test is 0.007729. Since the p-value is less than the significance level, we can reject the null hypothesis at the 5% level of significance. Thus, we can accept the alternative hypothesis, which states that the performance of the proposed method is superior to that of the study [25].
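The significance test described here can be reproduced with a paired test over the 14 angle-wise CCRs; the exact test used in this study is not stated, so the sketch below assumes a paired t-test on hypothetical per-angle values (replace them with the actual numbers from Table 4).

```python
import numpy as np
from scipy import stats

# Hypothetical angle-wise CCRs (%) for 14 viewing angles; these are placeholders
# for the values reported in Table 4 (ccr_ref for GaitSet [25], ccr_ours for this study).
ccr_ref = np.array([93.1, 94.0, 94.5, 95.0, 94.8, 94.2, 93.7,
                    92.9, 93.5, 94.1, 94.6, 94.3, 93.8, 93.2])
ccr_ours = np.array([92.9, 94.8, 95.6, 96.0, 96.3, 95.7, 95.1,
                     94.4, 95.0, 95.6, 96.1, 95.8, 95.2, 94.6])

# Paired t-test across viewing angles (H0: mu1 = mu2).
t_stat, p_two_tailed = stats.ttest_rel(ccr_ref, ccr_ours)
print(f"t = {t_stat:.6f}, two-tailed p = {p_two_tailed:.6f}")

# For the one-sided alternative H_a: mu1 < mu2, halve the two-tailed p-value
# when the test statistic has the expected (negative) sign.
p_one_sided = p_two_tailed / 2 if t_stat < 0 else 1 - p_two_tailed / 2
print(f"one-sided p = {p_one_sided:.6f}")
```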

Computational Efficiency
This study confirms the performance of the proposed system by analyzing the computational complexity of feature extraction and classification. The proposed system takes O(N log₂ N) time [57] for feature extraction, O(d · n_trees · x · log n) time for training, and O(d · n_trees) time [58] for prediction with the classifier, where N is the size of a block, d is the depth of a tree, n_trees is the number of trees, x is the number of non-missing entries in the training data, and n is the total number of input samples.
In order to measure the complexity of CNN-based architectures, the study [59] exploits the concept of Betti numbers and concludes that Betti numbers can grow exponentially for deep network architectures (refer to Table 5). Therefore, in comparison to the complexity of several CNN-based methods (both shallow and deep architectures), the proposed system proves to be efficient in runtime complexity. The CNN-based systems exhibit exponential runtime complexity, whereas the proposed system exhibits quasi-linear runtime complexity. Moreover, CNN-based systems need a GPU [25] for implementation, whereas the proposed system is implemented on the following configuration: Intel(R) Core(TM) i5-3570K CPU @ 3.40 GHz, 16 GB RAM, 64-bit Windows 10 operating system. Thus, the proposed system is computationally efficient and cost-effective to implement.
Table 5. Runtime complexity analysis of shallow and deep CNN architectures (adapted from study [59]). Here l = hidden layers, h = hidden units, n = inputs, and r = degree of the polynomial.

Conclusions
A lightweight and robust system for gait-based gender classification is proposed. The proposed system uses GEI for human gait representation, DCT for extraction of the feature vector, and XGBoost for gender classification. GEI is more capable of saving both storage space and computation time for gait recognition than binary silhouette sequences. DCT provides an adequate trade-off between information packing ability and computational complexity; additionally, it is faster than DWT. The XGBoost model has shown robustness to all input combinations, and its performance is superior to that of support vector regression and artificial neural network models.
During this classification, the proposed system adopts an appearance-based approach for gait analysis. The performance of the proposed system is evaluated on the OU-MVLP dataset. The results obtained in the experiment are compared with conventional machine learning methods as well as deep learning methods to demonstrate the superior performance of the proposed system. The comparison shows a superior CCR for each viewing angle, surpassing the state-of-the-art models.
Most studies on gait-based gender recognition reported in the literature base their experiments on relatively smaller datasets, and the number of viewing angles considered is at most 11. This study targets the largest available dataset for classification and considers 14 different viewing angles. The proposed system shows a mean CCR of 95.33% and a runtime complexity of O(N log₂ N) for feature extraction, which is superior to CNN-based systems. This demonstrates the robustness of the system against viewing angle variations.
However, the proposed system does not consider other problems related to gait-based gender classification, such as occlusion of gait due to variations in clothing and bag-carrying conditions. Future work will focus on handling the problem of gait occlusion and making the system robust to partially available gait.

Figure 1
Figure 1 depicts the detailed functioning of gait-based gender classification. The following subsections explain the proposed system in detail.

Figure 1.
Figure 1. Proposed system for gait-based gender classification.

Every DCT coefficient shows a specific spatial frequency. The first DCT coefficient, X(0, 0), is the DC coefficient. The DC coefficient has zero frequency in both the horizontal and vertical directions. It specifies the brightness of the image block, as it is computed by averaging the pixel values in the block. The other coefficients are termed the AC coefficients. The AC coefficients near the DC coefficient have lower spatial frequencies, and the frequencies rise when moving away from the DC coefficient in all directions. AC coefficients react to gray-level variations that are in the same direction as their spatial frequencies. The values and signs of the AC coefficients are directly related to the strength, contour, and orientation of the movement in the image blocks. Once DCT is applied on the GEI, a few coefficients are chosen for the feature vector, whereas the others are discarded, resulting in a reduction of data dimensionality. The selected coefficients carry the high-energy components, reflecting the energy compaction of the DCT.

Figure 4
Figure 4 is a pictorial representation of the coefficients of the DCT along with the GEI. The DCT coefficients matrix consists of three different frequency components: low, medium, and high. It is observed that the low-frequency components hold highly relevant and meaningful data. Since the low-frequency components are sufficient to regenerate the input image, these components have been utilized in DCT feature vector generation.

Table 1.
Illustration of XGBoost parameter tuning with respect to viewing angle variations.

Table 2.
CCR for gender classification under each viewing angle.

Table 3.
Statistical analysis of gait-based gender classification methods.