Remote Pathological Gait Classification System

Several pathologies can alter the way people walk, i.e., their gait. Gait analysis can therefore be used to detect impairments, help diagnose illnesses and assess patient recovery. Using vision-based systems, diagnoses could be done at home or in a clinic, with the needed computation being done remotely. State-of-the-art vision-based gait analysis systems use deep learning, requiring large datasets for training. However, to the best of our knowledge, the largest publicly available pathological gait dataset contains only 10 subjects, simulating 4 gait pathologies. This paper presents a new dataset, called GAIT-IT, captured from 21 subjects simulating 4 gait pathologies, with 2 severity levels each, besides normal gait. It is considerably larger than the publicly available gait pathology datasets, allowing the training of a deep learning model for gait pathology classification. Moreover, it was recorded in a professional studio, making it possible to obtain nearly perfect silhouettes, free of segmentation errors. Recognizing the importance of remote healthcare, this paper also proposes the prototype of a web application that allows uploading a video of a walking person, possibly acquired with a smartphone camera, and executes a web service that classifies the person's gait as normal or as one of the considered pathologies. The web application has a user-friendly interface and can be used by healthcare professionals as well as other end users. An automatic gait analysis system is also developed and integrated with the web application for pathology classification. Compared to state-of-the-art solutions, it achieves a drastic reduction in the number of model parameters, which means significantly lower memory requirements, as well as lower training and execution times, while its classification accuracy is on par with the state of the art.


Related Work
Nowadays, a rich characterization of gait information can be obtained through the use of different types of sensors, including [3]: i) floor sensors; ii) wearable sensors; and iii) vision sensors. Floor sensors, which can be used to measure ground reaction forces [11] or the pressure exerted on each area under the foot [3], typically provide limited information for pathological gait classification, and the equipment is restricted to constrained spaces. On the other hand, wearable sensors are portable, allowing the acquisition of three-dimensional information related to walking patterns over long periods of time [12], and can be used in many applications [4]. However, their performance can be influenced by the sensor placement, which might also affect the subject's natural gait. Vision-based systems have the advantage of being unobtrusive and requiring little cooperation from the subject. Currently, in this category, marker-based systems are considered the gold standard for gait analysis. Such solutions [13] use special markers placed on key body parts to track them and obtain kinematic features from the observed motion. However, they often require specialized personnel for the setup and calibration processes, which can be very time consuming. On the other hand, a markerless approach can be more suitable for less constrained environments [14], such as the integration of gait analysis in a clinical context. For these reasons, markerless vision-based systems are considered in this paper.

Gait Representation
In vision-based systems, the representations used for gait analysis typically follow a model-based or an appearance-based approach [15].
In a model-based approach, gait representations are created by fitting a model to the input sequence of images or silhouettes, using prior knowledge of the human body (structural model) or its motion (motion model) [16]. An example uses two Kinect sensors with perpendicular viewing directions, acquiring RGB and depth data to create a 3D model based on the movement of skeleton parts [17]. This model combines static features (e.g., distances between joints) and dynamic features (e.g., speed, stride length or the movement of the body's centre of mass).

Figure 1: Example of binary silhouettes in a gait cycle and the corresponding GEI [18].
An appearance-based approach represents gait without assuming prior knowledge of human motion. A sequence of binary silhouettes is typically obtained (e.g., by background subtraction), from which the desired gait representation is derived. A widely used representation is the Gait Energy Image (GEI) [18], obtained by averaging the cropped, size-normalized and horizontally aligned binary silhouettes of a gait cycle, according to Equation 1:

GEI(x, y) = (1/N) Σ_{i=1}^{N} B_i(x, y),   (1)

where N represents the number of frames in one (or multiple) gait cycle(s) and B_i(x, y) is a binary silhouette image, with x and y being pixel coordinates. The resulting GEI is a grey-level image implicitly representing, in a single image, the subject's shape and motion along the gait cycle. The GEI representation is robust against noise in individual frames, as illustrated in Figure 1.
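As a minimal illustration of Equation 1, a GEI can be computed by simply averaging the aligned binary silhouettes; the function name and toy data below are our own, not part of the original work:

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute a GEI as in Equation 1: GEI(x, y) = (1/N) * sum_i B_i(x, y).

    `silhouettes` is an (N, H, W) array of 0/1 silhouette images that are
    assumed to be already cropped, size-normalized and horizontally aligned.
    The result is a grey-level image with values in [0, 1].
    """
    silhouettes = np.asarray(silhouettes, dtype=np.float64)
    return silhouettes.mean(axis=0)

# Toy example with two 2x2 "silhouettes": pixels present in both frames
# average to 1.0, pixels present in only one frame average to 0.5.
b1 = np.array([[1, 0], [1, 1]])
b2 = np.array([[1, 0], [0, 1]])
gei = gait_energy_image([b1, b2])
```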
In this paper, results will be reported using the GEI for gait representation, as it is commonly used and provides a good compromise between representation power and computational efficiency.
A second representation considered for the presentation of results is the Skeleton Energy Image (SEI) [6], a hybrid between model- and appearance-based approaches. It starts by fitting a skeleton model to each image of the walking person, using OpenPose [19], as illustrated in Figure 2.b. With a skeleton image available for each frame, the SEI can then be obtained with the same method used for GEI computation. The SEI was reported to achieve better pathological gait classification results than the GEI, as the SEI focuses on the dynamic movement characteristics and not on the physical constitution and clothing of a subject [6].

Pathological Gait Classification
Classification of gait related pathologies from vision-based representations typically uses the visual gait representation directly, computes a set of biomechanical features or uses a combination of both. For instance, the work in [20] describes two approaches, one using leg angles as features, and another one using the GEI. A set of normalized gait features was proposed in [21], including the step length, stance and swing phases, or the amount and broadness of limb movements, to quantify gait impairments.
The last decade has witnessed the emergence of deep learning methods for feature extraction in image recognition and classification, including gait analysis systems. The solution presented in [8] adopts the GEI for gait representation and uses the VGG-19 model [9], pre-trained on a subset of ImageNet [10], for feature extraction. Transfer learning was used to repurpose the model for pathological gait classification, with the last layers of the VGG-19 network being re-trained using GEIs computed from the INIT dataset [21]. Linear Discriminant Analysis (LDA) was used for classification and the system's performance was tested using two other pathological gait datasets: DAI [22] and DAI2 [20]. Another deep learning approach, also based on the VGG-19 model, was adopted in [6] for pathological gait classification, using both GEI and SEI gait representations. In this case, the pre-trained model was fine-tuned with data from the GAIT-IST dataset [6].
Other deep learning approaches include the use of Recurrent Neural Networks (RNNs) that are able to learn correlations between inputs in a time series, such as the application of a bidirectional Long-Short Term Memory (LSTM) [23] network for pathological gait classification based on sequences of lower limb flexion angles [7].
Given the good performance reported in the literature, this paper considers a deep learning solution based on the VGG-19 model as benchmarking for comparison against the new CNN model being proposed for gait analysis.

Pathological Gait Datasets
There are two types of gait datasets available, created either for gait recognition or for pathological gait analysis. In gait recognition datasets, subjects are required to walk normally, possibly including some covariates such as different speeds, types of shoes, clothing or carried items. Currently, a significant number of gait recognition datasets are publicly available. Pathological gait datasets, in turn, include sequences of gait impaired by some pathological condition, and much fewer of them are publicly available. Since sharing data from real patients raises ethical and data privacy issues, the publicly available pathological gait datasets were captured from healthy subjects simulating the characteristic gait impairments, after a learning and practice period.
Presently, there are four pathological gait datasets publicly available, as listed below. All the sequences in these datasets were captured from a canonical viewpoint and recorded in controlled environments.
The DAI dataset 1 [22] contains binary silhouettes of 5 walking individuals. It has 15 normal gait sequences, and 15 sequences with random abnormal gait simulations, for a total of 30 gait sequences. The individuals are captured walking over a distance of 3 m using both the RGB camera of a Kinect sensor and a smartphone.
The DAI2 dataset [20] also considers 5 walking individuals, but contains a total of 75 gait sequences. Each person simulates 4 pathologies (Parkinson's, diplegia, hemiplegia and neuropathy), as well as a normal walking gait. Each condition was recorded 3 times, while walking along a distance of 8 m.
The INIT dataset 2 [21] contains binary silhouettes of 10 individuals (9 males, 1 female), for a total of 80 sequences. Every subject is recorded 2 different times, at 30 fps, capturing multiple gait cycles and simulating seven different gait impairments (in addition to a normal gait sequence): i) right arm motionless; ii) half motion of the right arm; iii) left arm motionless; iv) half motion of the left arm; v) full body impairments; vi) half motion of the right leg; and vii) half motion of the left leg.
The GAIT-IST dataset 3 [6] considers 10 walking individuals, with a total of 360 gait sequences. The dataset includes the same 4 pathological gait types considered in DAI2, with 2 severity levels for each, 2 directions of walking, and 2 repetitions per participant, except for the normal gait. Until now, it was the largest publicly available pathological gait dataset. Video sequences were captured using a smartphone camera with a resolution of 1280 × 720 pixels, mounted on a tripod at about 1.5 m above the ground and at a distance of about 4 m from the target.

GAIT-IT Dataset
The goal of the proposed GAIT-IT dataset is to provide a larger gait pathology dataset, containing more variations. A more complete dataset, with higher quality images and better contrast between the subject and the background, allows obtaining better models, which generalize better to unknown data.
GAIT-IT was recorded in the professional studio of FCT|FCCN (Fundação para a Ciência e a Tecnologia) 4 , during two full days. The studio includes controlled artificial lighting and a green background, ideal for chroma-keying segmentation, allowing the computation of high-quality binary silhouettes of walking subjects. Two professional 4K video cameras were used to capture synchronized gait sequences, one with a side view, at approximately 3 m from the target, and the other with a front/rear view, at about half a meter from the walking start position. Both cameras stood on tripods at 1.75 m from the ground.

Dataset Acquisition
The new GAIT-IT dataset includes sequences of normal gait and the same 4 pathological gait types present in the DAI2 and GAIT-IST datasets: diplegic, hemiplegic, neuropathic and Parkinsonian. For each pathology, 2 levels of severity were considered, similarly to GAIT-IST, and the subjects were asked to provide 4 gait sequences per severity level and for their normal gait. This corresponds to a subject walking twice from left to right and twice from right to left, when imaged from the side view. The acquisition took place on two different days, with the participation of 21 volunteers (19 males and 2 females) in the age range of 20-56 years old. Considering that sequences from 2 participants were acquired on both days, the GAIT-IT dataset includes a total of 828 gait sequences. Having sequences from 2 subjects acquired on different days allows studying intra-subject variations, for instance due to wearing different clothes and shoes.

Figure 2: Pose output format of detected body parts using OpenPose [19].
Subjects were instructed on how to simulate the various gait types and severity levels, as summarized below [24].
The diplegic pathology affects both sides of the body, with a forward leaning posture, and walking involves dragging both feet in a circular motion. For the second severity level the overall bending is accentuated, as well as leg and arm movements.
The hemiplegic pathology affects only one side of the body (the right side was chosen). The leg is dragged in a circular motion, with a broader reach for the second severity, while the right arm remains still and held close to the waist, or flexed against the chest in the second severity level.
The neuropathic pathology leads to foot drop and patients tend to lift their knees higher than normal to avoid dragging their toes on the floor. In the second severity level, the lift of the leg and the forward swing are exaggerated.
The Parkinsonian pathology is characterized by a stooped posture, with the arms held close to the chest and the lower limbs flexed and rigid. Subjects were asked to attempt simulating general and erratic body shaking while taking small and relatively fast steps. The second severity level involved an overall exaggeration of these symptoms.

Gait Representations Available in GAIT-IT
The GAIT-IT dataset provides various gait representations useful for gait pathology analysis: i) sequences of binary silhouettes; ii) sequences of skeletons; iii) GEIs; and iv) SEIs. A GEI and SEI are available for each gait cycle, as well as for the complete set of gait cycles available per sequence.
The spatial dimension of the produced gait representations is 224 × 224 pixels. However, the computation of gait representations is done at the full resolution of the captured gait sequences, to preserve information. Cropping removes the background around the subject's bounding box; the width of the cropped image is then padded to match its height, while maintaining the centroid position, thus preserving the aspect ratio. Finally, the resulting square image is resized to 224 × 224 pixels. All representations consider a 10 fps framerate.
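The cropping and normalization steps above can be sketched as follows. The helper names are illustrative, and a production pipeline would use a proper image library (e.g., OpenCV) for resizing; a simple nearest-neighbour resize is used here only to keep the sketch self-contained:

```python
import numpy as np

def square_pad(silhouette):
    """Pad the width of a cropped silhouette to match its height, keeping
    the silhouette centroid horizontally centred. Assumes a binary (H, W)
    array with H >= W, as produced by cropping to the bounding box."""
    h, w = silhouette.shape
    ys, xs = np.nonzero(silhouette)
    cx = int(round(xs.mean()))       # horizontal centroid of the silhouette
    left = max(h // 2 - cx, 0)       # padding that centres the centroid
    right = max(h - w - left, 0)
    left = h - w - right             # make the total padding exact
    return np.pad(silhouette, ((0, 0), (left, right)))

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a square image to size x size."""
    h = img.shape[0]
    idx = (np.arange(size) * h / size).astype(int)
    return img[np.ix_(idx, idx)]

# Toy example: an 8x4 crop with a small off-centre blob.
sil = np.zeros((8, 4), dtype=int)
sil[2:6, 1:3] = 1
square = square_pad(sil)             # shape (8, 8), centroid centred
small = resize_nearest(square, 4)    # shape (4, 4)
```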
The main steps for obtaining the above gait representations are briefly described in the following.
The extraction of binary silhouettes relies on chroma-keying segmentation. A frame containing only the background is represented in the HSV colour space and the histograms of the hue (H), saturation (S) and value (V) components are computed. Then, all pixels in the gait sequences with HSV values outside the background range are classified as belonging to the walking person's binary silhouette, and a morphological filtering operation is applied to remove small and isolated noise blobs. A sample result is included in the lower portion of Figure 2.a.
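A minimal sketch of this chroma-keying step is shown below, assuming the background HSV range has already been estimated from the histograms of a background-only frame. The function name and range values are illustrative, and the morphological filtering mentioned above (e.g., cv2.morphologyEx in OpenCV) is omitted:

```python
import numpy as np

def chroma_key_mask(hsv_frame, bg_range):
    """Classify as foreground every pixel whose HSV value falls outside
    the estimated background range. `hsv_frame` is (H, W, 3); `bg_range`
    is ((h_lo, s_lo, v_lo), (h_hi, s_hi, v_hi))."""
    lo, hi = (np.asarray(b) for b in bg_range)
    background = np.all((hsv_frame >= lo) & (hsv_frame <= hi), axis=-1)
    return ~background  # True where the walking person is

# Toy frame: one green background pixel and one foreground pixel,
# with a hypothetical green-screen HSV range.
frame = np.array([[[60, 200, 200],
                   [10,  60,  60]]])
mask = chroma_key_mask(frame, ((50, 100, 100), (70, 255, 255)))
```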
Skeleton computation relies on locating key anatomical parts in the gait images, using the OpenPose software [19].
OpenPose is able to automatically detect a total of 135 body, hand, facial and foot keypoints in each frame of a video, operating in real-time, using a multi-stage CNN. In this case, it was used to obtain the 2D coordinates of 25 keypoints corresponding to the full body, as illustrated in Figure 2.b 5 . The computation of GEIs and SEIs follows the description provided in Section 2.1. The frames corresponding to a subject entering or leaving the camera's field of view were discarded, as well as the silhouettes and skeletons that did not correspond to a frame included in a complete gait cycle.
An example of the gait representations included in GAIT-IT, for one gait cycle, is included in Figure 3.

Gait Classification Web Application
This paper proposes the prototype of a system that allows remote gait evaluation. It can assist healthcare professionals to identify patients requiring immediate attention and further examination, as well as to monitor the evolution of an existing gait pathology, without the need of physical interaction with the patient. The usefulness of such a system is made more evident under the Covid-19 pandemic.
The proposed web-based remote gait pathology classification prototype is composed of two main modules:
• Automatic Gait Classification System - This module accepts as input a gait representation, such as a GEI or SEI, and automatically classifies it as either normal or impaired with one of the pathologies considered in the datasets used for training: Parkinsonian, hemiplegic, diplegic or neuropathic gait;
• Web Interface - This module provides an interface for access over the Internet, allowing the user to upload a gait video sequence, or directly a GEI or SEI, run the automatic classification system, and display the classification results in a way that can be easily interpreted by the end user.
To better suit users with different degrees of expertise in using the automatic gait classification system, the web application provides two different interface modes; the advanced mode additionally allows users to better understand the characteristics of the input gait sequence that contributed to the classification decision.

Automatic Gait Classification System
The state-of-the-art vision-based systems rely on transfer learning to use a CNN model, such as the VGG-19 pre-trained on ImageNet [8,6]. Transfer learning can be especially important when dealing with small amounts of training data. It allows models trained for a different and somewhat related task to be adjusted to perform the new task, thus transferring the previously acquired knowledge to solve a new problem. However, even for transfer learning, a larger dataset can improve the quality of results obtained.
The proposed pathological gait dataset, GAIT-IT, provides a considerable increase in the amount of available training data. Thus, rather than just fine-tuning a pre-trained model, it is now possible to develop a new lightweight CNN specifically for the current classification task.
The architecture of the proposed CNN for feature extraction in pathological gait analysis is illustrated in Figure 4. Its main characteristics are:
• Network Depth: The model includes 5 convolutional layers, inspired by the networks designed for the Kaggle MNIST challenge [25] [26], which also consider binary input images.
• Convolution Kernel: In light of the work leading to the VGG CNN architectures, all the convolutional layers of the proposed CNN use small convolutional filters, with a receptive field of 3 × 3. A stride of 2 × 2 is considered.
• Feature Maps: The number of filters applied in each layer determines the number of feature maps at the layer's output. Starting with 32 filters at the image input, this number doubles for the last 2 layers.
• Batch Normalization: Each convolutional layer is followed by batch normalization, a layer that adjusts and scales its outputs to have a mean value close to 0 and a standard deviation close to 1. Bounding the values that pass between layers helps to stabilize and speed up the training process.
To perform classification, the features computed by the proposed CNN are flattened and fed to a fully connected neural network with two layers and a dropout [27] of 0.5 between them. The first layer has 512 units and the second has 5 units, corresponding to the 5 considered classes (normal, neuropathic, hemiplegic, diplegic or Parkinsonian gait), with a softmax activation to output class probabilities. This classification network is trained using categorical cross-entropy as the loss function and, as optimizer, the Adam algorithm [28] with the Nesterov momentum variation [29] and a learning rate of 0.001.
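A back-of-the-envelope count of trainable parameters follows from the architecture described above. The sketch below assumes a 224 × 224 single-channel input, 'same' padding (not stated in the text, so an assumption) and the commonly cited Keras figure of about 143.7 million parameters for VGG-19; exact totals depend on implementation details:

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

# Proposed CNN: 5 conv layers, 3x3 kernels, stride 2, starting at 32
# filters and doubling to 64 for the last 2 layers (see Figure 4).
filters = [32, 32, 32, 64, 64]
c_in, size, total = 1, 224, 0
for c_out in filters:
    total += conv_params(3, c_in, c_out)
    total += 2 * c_out            # batch norm scale + shift (trainable)
    size = (size + 1) // 2        # each stride-2 convolution halves the map
    c_in = c_out

flat = size * size * c_in         # 7 * 7 * 64 = 3136 flattened features
total += flat * 512 + 512         # first fully connected layer
total += 512 * 5 + 5              # softmax layer over the 5 classes

vgg19 = 143_667_240               # reference parameter count of VGG-19
ratio = vgg19 / total             # roughly 85x fewer parameters
```

The resulting ratio (about 85×) is consistent in order of magnitude with the 83× model size reduction reported later for the stored HDF5 models.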

Web Interface
The web interface provides remote access, over the Internet, to the automatic gait pathology classification system described in Section 4.1. The web application has two interface options: • Basic -The simpler interface could be used in a clinical environment, or even at home, where a simple setup for filming a walking person with any 2D camera, such as a smartphone camera, is available. The interface, illustrated in Figure 5.a, allows end-users to upload a video, and the web application computes a GEI representation of the observed gait and runs the automatic gait classification system. Users can visualize the significance of the parts of the body that contributed to the classification process using saliency maps [30] and class activation maps (grad-CAM) [31]. If the user so desires, the classification results can be sent to a specified e-mail address. This interface can be used to remotely obtain a preliminary diagnosis, or simply to help the healthcare staff to identify the regions that contribute most to the identified gait impairment.
• Advanced -The advanced interface, illustrated in Figure 5.b, could be used by researchers interested in analysing the operation of the deep learning classification solution, as they can visualise the feature maps generated by any of the CNN layers. This interface also allows to directly upload a previously computed GEI or SEI gait representation. The additional features of the advanced mode provide the users with an insight into the classification system operation.
The global Covid-19 pandemic has highlighted the importance of remote healthcare applications, to which the proposed web application prototype intends to contribute, providing the means to obtain a preliminary diagnosis and helping healthcare staff to identify patients in need of urgent attention. Running over the Internet, the proposed web application eases the access of remote populations and people with limited resources, requiring only an Internet connection and a simple 2D camera. Since the proposed system deploys a web service, other web applications can also access the gait pathology classification system by issuing HTTP requests to the web service.
The advanced interface of the proposed web application gives users the possibility to access feature maps from different layers and channels, to visualize intermediate activation maps, as illustrated in Figures 6.b and 7.b, which display the feature map of the twelfth channel of the first convolutional layer, appearing to operate as an edge detector. To further help understand the operation of the neural network, the web application also computes saliency maps [30] and gradient-weighted class activation maps (grad-CAMs) [31]. These representations help users understand which input image areas contribute more to the CNN classification decision. As illustrated in Figure 6.c, the features highlighted when analysing a hemiplegic gait cycle are from the lower part of the body, namely related to the movement of the feet. On the other hand, for a Parkinsonian gait cycle, in which upper body movements are more affected, the proposed system pays more attention to the inclination of the head, the torso orientation and the position of the hands, as illustrated in Figure 7.c. In the current implementation, these representations are obtained using the Keras Visualisation Toolkit [32].

Performance Results
The proposed gait classification system is evaluated using a 10-fold cross-validation protocol on the newly acquired GAIT-IT pathological gait dataset. To further assess the proposed system's generalization capability, a second, more challenging cross-database test considers training on the GAIT-IT dataset and evaluation on GAIT-IST. Detailed gait classification results for each of the pathological gait types considered in GAIT-IT are reported in Section 5.3.
It should be noted that, according to [6] and [8], among all the networks originally trained on ImageNet, VGG-19 performed best when fine-tuned for the classification of gait related pathologies. Thus, to compare the proposed system with the state of the art, the system presented in [6] is fine-tuned using the proposed GAIT-IT pathological gait dataset. The fine-tuned system accepts a GEI or a SEI as input. As discussed in [6], when using GEIs as input, the last 3 convolutional blocks of the VGG-19 are fine-tuned, while for the SEI representation the best results are obtained by retraining all blocks except the first one.

Cross-validation Results on GAIT-IT
The proposed CNN architecture and the state-of-the-art VGG-19 system considered for benchmarking purposes are evaluated following a 10-fold cross-validation protocol on the GAIT-IT dataset. All GAIT-IT subjects are used, except the 2 subject repetitions. The test set for each fold is defined as V_k = {S_i, S_{i+1}, S_{i+2}}, where i = 2k − 1, k is the fold iteration and S_i represents all sequences from one of the 21 available subjects, following the numbered labels used for each subject in the dataset. This arrangement ensures the use of all subjects in the test set at least once, while reducing the training bias. Cross-validation results are presented in Table 1 as the average classification accuracy over all folds, for the state-of-the-art system [6] as well as for the classification model based on the proposed CNN. The low complexity CNN proposed here achieves a classification accuracy very close to the state of the art, reaching 93.4% and 92.6% when using the GEI and SEI as inputs, respectively.
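The fold definition above can be made concrete with a short sketch (function and variable names are our own):

```python
def cross_validation_folds(num_subjects=21, num_folds=10):
    """Reproduce the fold definition V_k = {S_i, S_{i+1}, S_{i+2}},
    with i = 2k - 1, over subjects numbered 1..num_subjects, as
    described for the GAIT-IT cross-validation protocol."""
    folds = []
    for k in range(1, num_folds + 1):
        i = 2 * k - 1
        test = [i, i + 1, i + 2]
        train = [s for s in range(1, num_subjects + 1) if s not in test]
        folds.append((train, test))
    return folds

folds = cross_validation_folds()
# Fold 1 tests subjects {1, 2, 3}; fold 10 tests subjects {19, 20, 21};
# every subject appears in at least one test set.
```

Note that consecutive folds share one subject (e.g., subject 3 appears in folds 1 and 2), which is how 10 folds of 3 subjects cover all 21 subjects.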
However, the proposed CNN architecture, built specifically for the task at hand, has the major advantage of drastically reducing the number of trainable parameters. This reduces the static and dynamic memory needed to store and execute the system, as reported in Table 2 for the proposed CNN and the state-of-the-art benchmark VGG-19 system. Notice that the VGG-19 model is significantly larger than the one proposed in this paper: converting the network models into the HDF5 [33] file format [34] results in a model 83 times smaller than the state-of-the-art VGG-19 considered for benchmarking purposes. This is an advantage of the proposed system, which achieves a classification accuracy similar to VGG-19 using a shallower and lower complexity CNN architecture. As a consequence of reducing the number of trainable parameters, the training process (i.e., the fine-tuning of the network) is significantly faster. There is also a reduction in the time needed to execute the classification system, which is of great importance when operating over the Internet. Table 3 presents the time required for training and for executing the different networks, with each entry representing the time taken to process one sample. These results show that the proposed CNN model is 15 times faster in training and 6 times faster in performing classification, when compared to the considered benchmark.

Cross-dataset Tests on GAIT-IST
To further highlight the generalization ability of the proposed CNN model, cross-dataset tests were conducted using the GAIT-IST and GAIT-IT datasets. The proposed and the state-of-the-art systems were trained using all 23 subject recordings from GAIT-IT and the resulting models were tested on the 10 subjects of the GAIT-IST dataset. Table 4 reports the average classification results obtained when training on GAIT-IT and testing on GAIT-IST. These results suggest that the proposed CNN system generalizes better than the state-of-the-art VGG-19 benchmarking system when tested across different datasets.
The proposed system improves the classification accuracy over the state of the art by 3.4% and 1.3%, when using GEIs and SEIs, respectively. This suggests that a deeper CNN architecture may be more prone to overfitting, and thus the shallower proposed CNN architecture is better suited for usage in a web application, which can receive gait video sequences captured in many different conditions and using different types of video cameras.

Performance Comparison on Different Gait Types
The proposed system classifies gait across 5 different classes, which include 4 gait related pathologies and normal gait. Apart from obtaining state-of-the-art classification results, the proposed system aims to achieve a balanced performance across the different gait related pathologies considered. To confirm that this goal was achieved, Table 5 presents the obtained classification accuracies for each gait type on GAIT-IT.
To further analyze the system's performance with respect to each gait type, a confusion matrix is presented in Table 6, including the average of the classification results obtained using both the GEI and SEI representations. The confusion matrix highlights common mislabeled predictions, providing valuable insight to analyze results, for instance highlighting that there are some similar gait impairments that appear in different gait related pathologies.
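Per-class accuracies such as those in Table 5 follow directly from a confusion matrix like Table 6, by dividing its diagonal by the row sums. The sketch below uses hypothetical counts chosen only to mirror the reported per-class accuracies; they are not the actual entries of Table 6:

```python
import numpy as np

def per_class_accuracy(confusion):
    """Per-class accuracy (recall) from a confusion matrix whose rows are
    true classes and columns are predicted classes: the diagonal divided
    by each row sum."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)

# Toy matrix (illustrative counts only), classes ordered as
# [normal, diplegic, hemiplegic, neuropathic, Parkinsonian]:
toy = [[99,  0,  0,  1,  0],
       [ 0, 89,  6,  0,  5],
       [ 0,  5, 92,  1,  2],
       [ 1,  0,  1, 97,  1],
       [ 0,  3,  1,  1, 95]]
acc = per_class_accuracy(toy)   # -> [0.99, 0.89, 0.92, 0.97, 0.95]
```

Off-diagonal entries, such as diplegic samples predicted as hemiplegic, are exactly the "common mislabeled predictions" the confusion matrix exposes.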
From the presented results, normal gait appears as the easiest to classify, with a classification accuracy of 99% using the proposed CNN model. Diplegic gait is the most difficult to classify, with a classification accuracy of 89%, sometimes being misclassified as hemiplegic or Parkinsonian. Hemiplegic gait performs slightly better, with an average classification accuracy of 92%. The distinct walking pattern of neuropathic gait allows the system to achieve an average classification accuracy of 97%, while Parkinsonian gait achieves 95%. These results show that most misclassifications occur between diplegic and hemiplegic gait, as both pathologies are characterized by a stooped posture and relatively small step lengths.

Final Remarks
This paper proposes a remote healthcare system for obtaining an automatic gait pathology assessment, presenting the prototype of a web application that can be used over the Internet. The web application implements a web service that executes a deep learning gait pathology classification system and reports results using a friendly graphical user interface. A novel gait pathology classification system based on a shallow CNN architecture was proposed, which performs at the same level as the state-of-the-art classification systems available. However, the developed system has two clear advantages over the state of the art: (i) the proposed architecture has 83 times fewer parameters than an architecture based on VGG-19, with the corresponding advantages in terms of memory requirements, as well as training and testing times; (ii) the shallower network is less prone to overfitting, as confirmed by the reported cross-database tests, which is a major benefit when operating over the Internet and accepting gait video sequences acquired in very different conditions. To allow a more complete training and testing of the proposed classification system, a new and larger pathological gait dataset, GAIT-IT, was acquired. GAIT-IT contains 828 gait sequences, featuring 21 subjects simulating 4 different gait pathologies, with 2 severity levels each, besides normal gait. Since this work focuses on the sagittal view, future work can consider integrating also frontal view analysis. The combination of orthogonal viewpoints will result in more meaningful information, leading to an improved classification system. Furthermore, alternative system architectures allowing the simultaneous processing of multiple synchronously acquired sequences [35] can be considered to replace the proposed CNN classification system. The web application can be further improved to allow training the system with new pathologies and additional representations in the advanced mode.
Table 6: Confusion matrix detailing the classification results for each gait type, averaged between both systems in Table 5, using GEI and SEI inputs.
