Automated Gait Analysis Based on a Marker-Free Pose Estimation Model

Gait analysis is an essential tool for detecting biomechanical irregularities, designing personalized rehabilitation plans, and enhancing athletic performance. Currently, gait assessment relies either on visual observation, which lacks inter-rater consistency and requires clinical expertise, or on instrumented evaluation, which is costly, invasive, time-consuming, and requires specialized equipment and trained personnel. Markerless gait analysis using 2D pose estimation techniques has emerged as a potential solution, but it still requires significant computational resources and human involvement, which limits its practicality. This research proposes an automated method for temporal gait analysis that employs MediaPipe Pose, a pose estimation model with low computational requirements. The approach was validated against the Vicon motion capture system to evaluate its reliability. The findings reveal good (ICC(2,1) > 0.75) to excellent (ICC(2,1) > 0.90) agreement for all temporal gait parameters except double support time (right leg switched to left leg) and swing time (right), which exhibit only moderate (ICC(2,1) > 0.50) agreement. Additionally, the approach produces temporal gait parameters with low mean absolute error. It could be useful for monitoring changes in gait and evaluating the effectiveness of interventions such as rehabilitation or training programs in the community.


Introduction
A person's gait refers to their style or pattern of walking. Numerous factors, such as sex [1], age [2,3], walking speed [1,4,5], and disease, can affect gait. Because gait patterns can be quite characteristic of certain disorders, such as the typical shuffling gait in parkinsonism, analyzing gait patterns can be very helpful in establishing medical diagnoses. Gait analysis has numerous applications in different fields, including clinical biomechanics, rehabilitation [6][7][8], sports science [9], robotics [9], ergonomics, and forensics [1,10]. It is a critical tool for understanding the complexities of human movement and diagnosing movement-related conditions, enabling clinicians and researchers to identify biomechanical abnormalities, evaluate the effectiveness of treatment plans, develop personalized rehabilitation programs, and improve athletic performance. By understanding the nuances of an individual's gait, it is possible to improve mobility, reduce the risk of injury, and enhance quality of life.
This study proposes a method for automated temporal gait analysis using MediaPipe Pose (a 3D top-down pose estimation model) with a single camera. The objective of the research is to evaluate and validate the reliability of this automated method against a reference-standard 3D Vicon motion capture system. The study analyzed a freely available dataset that includes synchronized digital video recordings of walking sequences and three-dimensional motion capture gait data [32]. The walking videos were first analyzed with the MediaPipe Pose model to locate each body joint in every video frame. An algorithm was then applied to automatically detect specific gait events, such as heel strike and toe-off, from the obtained joint locations. Temporal gait parameters were calculated from the detected gait events and compared with measurements obtained from the Vicon motion capture system, which is widely regarded as the gold standard for gait analysis.
The main contributions of this paper are as follows:

1. A markerless pose estimation model with low computational requirements, MediaPipe Pose, was applied to extract body key points from healthy individuals, with promising accuracy and reduced inference time.

2. An algorithm was devised to automate the assessment of temporal gait parameters from the body key points extracted with MediaPipe Pose, eliminating the need for human intervention.

Dataset
This research used a dataset featuring synchronized and calibrated multi-view video and motion capture data [32]. The dataset is publicly available at http://bytom.pja.edu.pl/projekty/hm-gpjatk/ (accessed on 9 September 2022). It includes three-dimensional motion capture data and walking video recordings of 32 healthy individuals (10 females and 22 males). The dataset contains no identifiable information about the individuals, and their faces are blurred in the video recordings.
This study included adult participants aged 20 to 65 years without injuries or conditions that could affect their ability to participate. Individuals who relied on walking aids or had cognitive issues that could interfere with the study were excluded, as were pregnant women and children.

Video Data Collection
Walking videos of the healthy individuals were recorded with four calibrated and synchronized digital video cameras (Basler Pilot piA1900-32gc, Ahrensburg, Germany) (Figure 1). This study used the subset identified as walking sequence s3, which features a single 6.5 m pathway along which individuals walked diagonally from right to left. The videos were captured at a resolution of 960 × 540 pixels and 25 frames per second (fps). The data of one healthy individual (sequence id p16s3) were excluded because they belonged to a different subset (walking sequence s4: walking diagonally from left to right). As a result, 31 healthy individuals were assessed in this research.
The study opted to use gait analysis based on videos captured from the side view, known as sagittal plane analysis, due to its advantages over frontal plane analysis. One advantage is that sagittal plane analysis is less susceptible to errors caused by changes in camera angle and position compared to frontal plane analysis. Another advantage is that it provides a clear and straightforward representation of the main gait features and overall gait pattern. To ensure accuracy, the study used video recordings captured from the right view using a specific digital camera labeled 'C1', which was found to provide more precise gait parameters for all gait cycles compared to recordings from the left view digital camera 'C3', as suggested by Stenum et al. [29].
Sensors 2023, 23, x FOR PEER REVIEW

Reference Standard-Vicon 3D Motion Capture System
To evaluate the accuracy and reliability of our system, a 3D motion tracking system with ten motion capture cameras (Vicon MX-T40, Denver, CO, USA) was used to generate a 3D skeleton of each individual while walking (Figure 1). Before the walking test, each healthy individual wore 39 retroreflective spherical markers placed on specific anatomical landmarks (Figure 2): 4 on the head, 14 on the arms, 12 on the legs, 5 on the torso, and 4 on the pelvis. The Vicon motion capture (moCap) system, consisting of ten MX-T40 cameras with a resolution of 2352 × 1728 pixels, tracked the moCap data (i.e., the 3D positions of the markers) at 100 Hz. The gait analysis tool included in the Vicon system was used to produce the gait results that served as the benchmark for comparison.
Figure 1 remarks [32]: RGB, digital video camera; IR, motion capture camera.

Pose Estimation Model for Gait Assessment
MediaPipe Pose is a Google-developed machine learning (ML) solution that tracks the body pose of an individual in RGB video frames by identifying 33 three-dimensional anatomical landmarks (body key points) (Figure 3). It is built on the BlazePose research [18], which also powers the ML Kit Pose Detection API (a lightweight solution for app developers to detect body poses in real time). It is known for its low computational cost [18,31], which allows real-time pose tracking, and for its cross-platform compatibility: it can be deployed on mobile phones, desktops/laptops, and the web, and used from programming languages such as Python. MediaPipe Pose offers three pose estimation models (BlazePose GHUM Heavy, BlazePose GHUM Full, and BlazePose GHUM Lite); this study used the BlazePose GHUM Heavy model for gait assessment in the sagittal plane because of its accurate estimation of body key points. The minimum confidence levels for human tracking and key point detection were both set to 0.5.
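The settings above could be expressed with the legacy `mediapipe.solutions.pose` Python API as sketched below. The text does not state which MediaPipe API version was used, so this is an assumption. In that API, `model_complexity=2` selects BlazePose GHUM Heavy, and `static_image_mode=False` enables the detector-tracker video mode. The landmark indices are those of MediaPipe's 33-point topology; the constant and function names are hypothetical.

```python
# Hypothetical settings for the legacy MediaPipe Pose Python solution API.
POSE_OPTIONS = {
    "static_image_mode": False,       # video mode: detection followed by landmark tracking
    "model_complexity": 2,            # 0 = Lite, 1 = Full, 2 = Heavy (BlazePose GHUM Heavy)
    "min_detection_confidence": 0.5,
    "min_tracking_confidence": 0.5,
}

# Key points used for gait events (indices in MediaPipe's 33-landmark topology).
LANDMARK_INDEX = {
    "left_hip": 23,
    "right_hip": 24,
    "left_heel": 29,
    "right_heel": 30,
    "left_foot_index": 31,
    "right_foot_index": 32,
}


def make_pose_estimator():
    """Create a Pose estimator with the settings above (requires `mediapipe`)."""
    import mediapipe as mp  # lazy import so the constants can be used without it

    return mp.solutions.pose.Pose(**POSE_OPTIONS)
```

Calling `make_pose_estimator()` requires the `mediapipe` package; the constants can be inspected without it.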

Gait Parameter Extraction
The proposed system used a Python application running on a laptop (12th Gen Intel Core i7-12700H CPU) to conduct temporal gait analysis based on the 3D markerless pose estimation model (MediaPipe Pose). The system took as input the walking video recorded with the digital video camera labeled 'C1' and output the temporal gait analysis results as a comma-separated values (CSV) file for each healthy individual (Figure 4). The pseudocode for the proposed system is presented in Algorithm 1. The following sections detail how temporal gait parameters are extracted from the x, y, and z locations of body key points using signal analysis.

Algorithm 1
Initialize MediaPipe Pose Estimator
while (current video frame ≤ last video frame) do
    Identify the region-of-interest that contains the human pose
    Extract and save the positions of body keypoints in the region-of-interest
end while
Gap-filled-body-keypoints = Gap-fill (body keypoints)
Set up 10th-order Butterworth low pass filter at normalized cut-off frequency = 0.1752
Filtered-body-keypoints = Butterworth-low-pass-filter (Gap-filled-body-keypoints)
Calculate the relative changes in distance between the hip and foot-index for the left and right legs over time
Identify the peaks and minima of the relative changes in distance between the hip and foot-index for the left and right legs over time
Heel-strike-event-timings-left-leg = Timings of peak occurrence (left leg)
Heel-strike-event-timings-right-leg = Timings of peak occurrence (right leg)
Toe-off-event-timings-left-leg = Timings of minima occurrence (left leg)
Toe-off-event-timings-right-leg = Timings of minima occurrence (right leg)
Stance-time = Time duration between heel strike and toe-off of the same leg
Swing-time = Time duration between toe-off and heel strike of the same leg
Step …

Pose Estimation Using MediaPipe Pose
The pose estimation process in MediaPipe Pose involved a well-established two-step detector-tracker machine learning pipeline. In the first step, the detector identified the region-of-interest (ROI) containing the pose within each frame. In the second step, the tracker extracted the positions of all 33 pose key points within this ROI. Each key point included the following information:
1. x and y: the key point coordinates, normalized to [0.0, 1.0] by the image width and height, respectively.
2. z: the depth of the key point relative to the midpoint of the hips, where smaller values indicate proximity to the camera. The scale of z is roughly the same as that of x.
3. Visibility: a value in [0.0, 1.0] indicating the likelihood that the key point is visible and unobstructed in the image.
Note that, for video, the detector was applied only to the first frame; for subsequent frames, the ROI was derived from the previous frame's pose key points (Figure 5).
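Because x and y are normalized by image size, converting them back to pixels for the 960 × 540 videos is a single multiplication; for example, x = 0.5 corresponds to pixel column 480. The sketch below assumes that z can be scaled by the image width, following MediaPipe's description of z as having roughly the same scale as x; `to_pixels` is a hypothetical helper.

```python
def to_pixels(x_norm, y_norm, z_norm, width=960, height=540):
    """Convert normalized MediaPipe coordinates to pixel units.

    Works on Python floats or NumPy arrays. z is scaled by the image width
    (an assumption based on z sharing x's approximate scale).
    """
    return x_norm * width, y_norm * height, z_norm * width
```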

Data Preprocessing (Gap Filling and Low Pass Filtering)
The body key point location data (x, y, and z coordinates) extracted using MediaPipe Pose were then gap-filled using cubic spline interpolation. A 10th-order Butterworth low-pass filter with a normalized cut-off frequency of 0.1752 was then applied to remove spikes in the data series. This preprocessing reduces noise that does not reflect the true positions of the body key points extracted by MediaPipe Pose.
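A minimal SciPy sketch of this preprocessing is shown below. Two details are assumptions not stated in the text: the cut-off of 0.1752 is taken as normalized to the Nyquist frequency (SciPy's convention for `butter`), which corresponds to roughly 0.1752 × 12.5 Hz ≈ 2.2 Hz at 25 fps; and the filter is applied forward and backward (`sosfiltfilt`) so that it introduces no phase lag, which would otherwise shift the detected gait event timings. Second-order sections are used because a 10th-order filter in transfer-function form can be numerically unstable. Function names are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import butter, sosfiltfilt


def gap_fill(series):
    """Fill NaN gaps in a 1-D key point trajectory with cubic spline interpolation."""
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    valid = ~np.isnan(series)
    filled = series.copy()
    if valid.all():
        return filled
    # Needs at least two valid samples; gaps at either end are extrapolated.
    spline = CubicSpline(t[valid], series[valid])
    filled[~valid] = spline(t[~valid])
    return filled


def lowpass(series, order=10, cutoff=0.1752):
    """Zero-phase Butterworth low-pass filter; `cutoff` is relative to Nyquist."""
    sos = butter(order, cutoff, btype="low", output="sos")
    return sosfiltfilt(sos, series)


def preprocess(series):
    """Gap-fill, then low-pass filter, one coordinate series (e.g. a hip x trajectory)."""
    return lowpass(gap_fill(series))
```

With the default padding of `sosfiltfilt`, each series must be longer than about 33 samples for a 10th-order filter, which a walking clip at 25 fps easily satisfies.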

Temporal Gait Parameters Extraction: Identifying Key Gait Events
Heel strike and toe-off are the key gait events that aid the extraction of the temporal gait characteristics. Based on the gait cycle in Figure 6, the heel strike event occurs when the foot index is farthest forward (maximum relative distance between hip and foot index) while the toe-off event occurs when the foot index is farthest backward (minimum relative distance between hip and foot index).
To identify heel strike and toe-off events, the horizontal relative distance between the hip and the foot index was calculated in pixels (Figure 7). In Figure 7, heel strike events are indicated by circle markers, which represent the peaks (maxima) of the relative distance between the hip and foot index, while toe-off events are indicated by cross markers, which represent its minima. To avoid misidentifying heel strike and toe-off events, a time threshold of 0.8 s was set. In addition, a peak was only detected when its value was greater than or equal to 35% (left leg) or 46% (right leg) of the maximum relative distance between the hip and foot index, and a minimum was only detected when its value was smaller than or equal to 18% of the minimum relative distance between the hip and foot index for both legs. Based on the timings of the heel strike and toe-off events, the following temporal gait parameters were calculated and saved in a CSV file for each healthy individual:
(i) Stance time: the duration between heel strike and toe-off of the same leg.
(ii) Swing time: the duration between toe-off and heel strike of the same leg.
(iii) Step time: the duration between consecutive heel strikes of both feet.
(iv) Double support time: the duration between the heel strike of one leg and the toe-off
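The event-detection rules above can be sketched with `scipy.signal.find_peaks` as follows. This is an interpretation, not the authors' implementation: the 0.8 s time threshold is assumed to be the minimum spacing between successive events of the same type (20 frames at 25 fps); the sign of the hip-to-foot-index distance depends on the walking direction in the image and is assumed here to make "foot ahead of hip" positive; and pairing each event with the next event of the other type is a hypothetical simplification that does not check for missed events. Function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def relative_distance(hip_x, foot_index_x, walking_toward_smaller_x=True):
    """Horizontal hip-to-foot-index distance in pixels, positive when the foot is ahead.

    Assumption: the sign is chosen from the walking direction in the image.
    """
    d = np.asarray(foot_index_x, dtype=float) - np.asarray(hip_x, dtype=float)
    return -d if walking_toward_smaller_x else d


def detect_events(d, fps=25.0, min_gap_s=0.8, peak_frac=0.35, min_frac=0.18):
    """Return frame indices of heel strikes (peaks of d) and toe-offs (minima of d).

    peak_frac is 0.35 for the left leg and 0.46 for the right leg in the text;
    min_frac is 0.18 for both legs.
    """
    d = np.asarray(d, dtype=float)
    gap = max(1, int(round(min_gap_s * fps)))  # 0.8 s -> 20 frames at 25 fps
    heel_strikes, _ = find_peaks(d, height=peak_frac * d.max(), distance=gap)
    # A minimum with d <= min_frac * min(d) is a peak of -d with -d >= -min_frac * min(d).
    toe_offs, _ = find_peaks(-d, height=-min_frac * d.min(), distance=gap)
    return heel_strikes, toe_offs


def next_event_intervals(from_frames, to_frames, fps=25.0):
    """For each event in from_frames, the time (s) until the next event in to_frames."""
    to_frames = np.sort(np.asarray(to_frames))
    intervals = []
    for f in from_frames:
        later = to_frames[to_frames > f]
        if later.size:  # skip events with no following event (incomplete cycle)
            intervals.append((later[0] - f) / fps)
    return intervals
```

For one leg, stance time would be `next_event_intervals(hs, to)` and swing time `next_event_intervals(to, hs)`; step time would pair one leg's heel strikes with the other leg's, e.g. `next_event_intervals(hs_right, hs_left)`.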

Statistics
Statistical analysis was conducted in IBM SPSS Statistics v26 to evaluate and validate the reliability and robustness of our system by comparing the temporal gait outcomes obtained from the Vicon motion capture system and our system. Because heel strike and toe-off are the key gait events from which the temporal gait outcomes are extracted, descriptive statistics (mean error, mean absolute error, and range of the mean error) were computed for the heel strike and toe-off event timings obtained from the Vicon motion capture system and our system. Similarly, the mean, standard deviation, mean error, and mean absolute error of the temporal gait parameters were computed. To test for statistically significant differences between the two systems, independent samples t-tests were performed. Correlation and absolute agreement between the two systems were then assessed using Pearson correlation coefficients (r) and intraclass correlation coefficients (ICC(2,1)), respectively. The significance level for all analyses was set at 0.05. ICC(2,1) values were interpreted according to an accepted guideline that categorizes agreement as poor (<0.500), moderate (0.500-0.750), good (0.750-0.900), or excellent (>0.900) [34]. Descriptive statistics, independent samples t-tests, Pearson correlation coefficients, and intraclass correlation coefficients were computed both for each gait cycle and for the means of each healthy individual. Scatter plots of MediaPipe Pose versus the Vicon motion capture system were generated for the temporal gait parameters, for each gait cycle and for the means of each healthy individual.
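The study computed these statistics in SPSS. As a cross-check, a minimal NumPy sketch of the two-way random-effects, absolute-agreement, single-measure intraclass correlation (the Shrout and Fleiss ICC(2,1) form) is given below, together with the mean error, mean absolute error, and Pearson r. Assumptions: errors are taken as MediaPipe minus Vicon, and values falling exactly on a category boundary are assigned following the thresholds quoted in the abstract; all function names are hypothetical.

```python
import numpy as np


def icc_2_1(a, b):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    a, b: paired measurements of the same quantity (e.g. Vicon and MediaPipe).
    """
    y = np.column_stack([a, b]).astype(float)  # n subjects x k = 2 systems
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)
    col_means = y.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between subjects
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between systems
    resid = y - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def agreement_summary(reference, estimate):
    """Mean error, mean absolute error, Pearson r and ICC(2,1) of estimate vs reference."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    err = est - ref  # sign convention (estimate minus reference) is an assumption
    return {
        "mean_error": err.mean(),
        "mean_absolute_error": np.abs(err).mean(),
        "pearson_r": np.corrcoef(ref, est)[0, 1],
        "icc_2_1": icc_2_1(ref, est),
    }


def icc_category(icc):
    """Map an ICC value to the poor / moderate / good / excellent bands."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"
```

For example, comparing [0.60, 0.65, 0.70, 0.62, 0.68] s with the same values shifted by +0.05 s gives a Pearson r of 1 but an ICC(2,1) of about 0.58 (moderate): a constant bias lowers absolute agreement without affecting correlation.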

Descriptive Statistics for Key Gait Events (Heel Strike and Toe-Off)
In Table 1, the number of gait events detected by the Vicon moCap system is taken as the baseline for comparison. Our MediaPipe Pose-based system identified 103 of 106 heel strike events for the left leg (97.17%) and 101 of 102 heel strike events for the right leg (99.02%). It also detected 100 of 102 toe-off events for the left leg (98.04%) and 100 of 105 toe-off events for the right leg (95.24%). However, the system produced two false detections: one left heel strike event and one right toe-off event. Relative to the Vicon moCap system, the mean error in the timing of heel strike and toe-off events ranged from −4 ms to 20 ms, and the mean absolute error ranged from 20 ms to 30 ms.
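The detection rates above are simple ratios against the Vicon event counts (for example, 103/106 ≈ 97.17%). The text does not state how MediaPipe and Vicon events were paired before computing timing errors; the sketch below assumes a hypothetical greedy nearest-neighbour pairing within a 0.1 s tolerance, counts unpaired detections as false detections, and takes errors as MediaPipe minus Vicon. Function names are hypothetical.

```python
import numpy as np


def match_events(reference_s, detected_s, tolerance_s=0.1):
    """Greedily pair each reference event with the nearest unused detected event.

    Returns (pairs, n_false_detections); pairs are (reference_time, detected_time).
    """
    detected = sorted(detected_s)
    used = set()
    pairs = []
    for r in sorted(reference_s):
        candidates = [
            (abs(d - r), i)
            for i, d in enumerate(detected)
            if i not in used and abs(d - r) <= tolerance_s
        ]
        if candidates:
            _, best = min(candidates)
            used.add(best)
            pairs.append((r, detected[best]))
    return pairs, len(detected) - len(used)


def event_timing_summary(pairs, n_reference):
    """Detection rate (%) and mean / mean absolute timing error (ms)."""
    errors = np.array([d - r for r, d in pairs])  # detected minus reference
    return {
        "detection_rate_pct": 100.0 * len(pairs) / n_reference,
        "mean_error_ms": 1000.0 * errors.mean(),
        "mean_absolute_error_ms": 1000.0 * np.abs(errors).mean(),
    }
```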

Statistical Analysis of Temporal Gait Parameters for All Gait Cycles
The statistical analysis only considered complete gait cycles when assessing temporal gait parameters. Table 2 shows that the mean error between the Vicon moCap and MediaPipe Pose systems in the temporal gait parameters (stance time, swing time, step time, and double support time, per gait cycle) ranged from −20 ms to 40 ms, and the mean absolute error ranged from 30 ms to 50 ms. The scatter plots in Figure 8 indicate that, for all gait cycles, the temporal gait parameters obtained from the MediaPipe Pose system increased linearly with those obtained from the Vicon motion capture system, suggesting a positive association between the two systems. The correlation between the two systems was strong for stance time and step time, but only moderate for swing time and double support time.

Statistical Analysis of Temporal Gait Parameters for the Means of Each Healthy Individual
In this section, the statistical analysis assessed the means of the temporal gait parameters for each healthy individual. Table 4 shows that the mean error between the Vicon moCap and MediaPipe Pose systems in the temporal gait parameters (stance time, swing time, step time, and double support time, per-individual means) ranged from −20 ms to 40 ms, and the mean absolute error ranged from 20 ms to 40 ms.
Table 5 reveals no significant differences between the Vicon moCap and MediaPipe Pose systems in stance time (left and right), swing time (left and right), step time (left and right), and double support time (left leg switched to right leg), as the two-tailed significance values of the independent samples t-test were greater than 0.05. However, double support time (right leg switched to left leg) differed significantly between the two systems, with a two-tailed significance value below 0.05. The Pearson correlation and intraclass correlation coefficients were significant at the 0.01 level (two-tailed) for all temporal gait parameters. The Pearson correlation coefficient was good for all temporal gait parameters (0.802 to 0.955) except swing time (right), which was rated as moderate (0.635). The intraclass correlation coefficient was excellent for stance time (left) and step time (left and right) (0.923 to 0.956), good for stance time (right), swing time (left), and double support time (left leg switched to right leg) (0.765 to 0.893), and moderate for swing time (right) and double support time (right leg switched to left leg) (0.551 to 0.579).
The scatter plots in Figure 9 indicate that the per-individual mean temporal gait parameters obtained from the MediaPipe Pose system increased linearly with those obtained from the Vicon motion capture system, suggesting a positive association between the two systems. The correlation between the two systems was strong for all temporal gait parameters except left swing time, right swing time, and double support time (right leg switched to left leg).
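One plausible reason the per-individual means agree better than the per-cycle values is that averaging within a subject cancels part of the random, opposite-signed cycle-to-cycle error. The sketch below, with hypothetical function names and toy values (not study data), shows the two levels of aggregation: two cycles of one subject with errors of +0.04 s and −0.04 s contribute a per-cycle mean absolute error of 0.04 s, whereas the subject's mean values coincide.

```python
from collections import defaultdict


def per_subject_means(records):
    """Average per-gait-cycle values within each subject.

    records: iterable of (subject_id, vicon_value, mediapipe_value), one per gait cycle.
    Returns {subject_id: (mean_vicon, mean_mediapipe)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for subject_id, vicon, mediapipe in records:
        t = totals[subject_id]
        t[0] += vicon
        t[1] += mediapipe
        t[2] += 1
    return {sid: (t[0] / t[2], t[1] / t[2]) for sid, t in totals.items()}


def mean_absolute_error(pairs):
    """Mean absolute difference over (vicon, mediapipe) pairs."""
    pairs = list(pairs)
    return sum(abs(m - v) for v, m in pairs) / len(pairs)
```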

Discussion
The aim of this study was to evaluate the accuracy and reliability of the MediaPipe Pose model for automated gait analysis without human intervention. The accuracy of this approach was compared with that of a three-dimensional Vicon motion capture system using ten motion capture cameras and built-in gait analysis software. This study has shown the potential of markerless, automated gait analysis based on MediaPipe Pose to enable assessment for a wider range of individuals.

Performance of MediaPipe Pose
The BlazePose GHUM Heavy model of MediaPipe Pose was used in this research to obtain precise body key point locations. However, this increased the inference latency: MediaPipe Pose estimated the body key point locations at an average inference speed of 9 fps on the CPU. MediaPipe Pose was chosen over other pose estimation models such as OpenPose and PoseNet because it employs a top-down approach, in which human candidates are first detected by a human detector and single-person pose estimation is then performed. This approach generally yields more accurate key point detection than a bottom-up approach, in which key points are predicted all at once and then assembled into full poses for all individuals. The top-down approach is usually time-consuming, because the pose of each person is estimated independently and the inference time is proportional to the number of detected persons; however, MediaPipe Pose is a single-person pose estimator and does not perform human detection in every frame, which enables faster inference [31].

Temporal Gait Parameters Assessment
The study found that the MediaPipe Pose system has potential for quantitative temporal gait analysis and may be suitable for clinical and biomechanical assessments of human walking. Based on the descriptive statistics, the system showed low mean absolute error in temporal gait parameters, both for all gait cycles and for the per-individual means.
Overall, the statistical results for the per-individual means were better than those for all gait cycles. For the per-individual means, the independent samples t-tests showed no significant difference in temporal gait parameters between the MediaPipe Pose and Vicon moCap systems, except for double support time (right leg switched to left leg). The Pearson correlation coefficient was good for all temporal gait parameters except swing time (right), which was rated as moderate. The intraclass correlation coefficient was good to excellent for stance time (left and right), step time (left and right), swing time (left), and double support time (left leg switched to right leg), and moderate for swing time (right) and double support time (right leg switched to left leg). The moderate agreement for double support time can be attributed to its very short duration (about 0.20 s), which makes it difficult for the MediaPipe Pose system to measure from video recorded at a low frame rate of 25 fps. In addition, the misidentification of the left and right lower limbs by the MediaPipe Pose system (Figure 10) affects the accuracy of the temporal gait parameters. This effect is reduced by data filtering but still affects the identification of gait events (heel strike and toe-off), on which the extraction of temporal gait parameters depends.
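The frame-rate argument can be made concrete with a short calculation. At 25 fps one frame lasts 40 ms, so a double support phase of about 0.20 s spans only 5 frames, and misplacing a single gait event by one frame changes the measured duration by 20%; at the 100 Hz sampling rate of the Vicon system, the same one-sample error is 10 ms, or 5%. The hypothetical helper below reproduces this arithmetic.

```python
def frame_quantization(duration_s, fps):
    """Temporal resolution of event timing for a given frame rate."""
    frame_interval_s = 1.0 / fps
    return {
        "frame_interval_ms": 1000.0 * frame_interval_s,
        "frames_spanned": duration_s * fps,
        # error caused by misplacing one event by a single frame, as % of the duration
        "one_frame_error_pct": 100.0 * frame_interval_s / duration_s,
    }
```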

Qualitative Comparison with Other Works
Table 6 presents gait analyses performed using Azure Kinect and Kinect v2, which can evaluate spatiotemporal gait parameters such as step time, step length, step width, stride length, and stride time. For these systems, the relative error ranges from −0.001 m to 0.040 m for spatial gait parameters and from 0.000 s to 0.010 s for temporal gait parameters. In our study, we propose a gait analysis method based on a markerless pose estimation model (MediaPipe Pose) that assesses temporal gait parameters, namely stance time, swing time, step time, and double support time. The relative error for temporal gait parameters in our analysis ranges from −0.02 s to 0.02 s. Both the Azure Kinect and Kinect v2 analyses and our proposed method therefore show low relative errors.
Although the temporal errors of the Kinect-based analyses are somewhat smaller, our approach achieves comparable temporal gait parameters at a lower video resolution and frame rate than the gait analyses conducted with Azure Kinect and Kinect v2. Note that the frame rate of Azure Kinect and Kinect v2 cameras is limited to 30 fps, which restricts their suitability for gait assessment during faster walking or running. Conversely, our method can in principle be applied to faster walking or running by using videos recorded at higher frame rates with high-speed cameras.
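The frame-rate argument can be made concrete with a simple bound. Assume that each gait event is located within half a frame of its true time (ideal rounding to the nearest frame). Because the two errors can have opposite signs, a duration bounded by two events can then be off by up to one frame period, 1/f. Under these assumptions, 25 fps gives a 40 ms frame period and a 40 ms bound, i.e. 20% of a 0.20 s double support phase. The sketch compares 25 fps (the video used here), 30 fps (Azure Kinect and Kinect v2), and 120 fps, which is an assumed example of a high-speed camera rate. The function names are hypothetical.

```python
def frame_period(fps: float) -> float:
    """Time between consecutive video frames, in seconds."""
    return 1.0 / fps


def worst_case_duration_error(fps: float, frames_per_event: float = 0.5) -> float:
    """Upper bound on the error of a duration bounded by two gait events.

    Assumes each event is located within `frames_per_event` frames of its
    true time (0.5 = ideal rounding to the nearest frame) and that the two
    event errors can have opposite signs, so they add up.
    """
    return 2.0 * frames_per_event * frame_period(fps)


DOUBLE_SUPPORT_S = 0.20  # double support duration quoted in this section

for fps in (25, 30, 120):  # 120 fps is an assumed high-speed camera rate
    bound = worst_case_duration_error(fps)
    print(f"{fps:>3} fps: frame = {frame_period(fps) * 1000:.1f} ms, "
          f"bound = {bound * 1000:.1f} ms ({bound / DOUBLE_SUPPORT_S:.0%} of 0.20 s)")
```

If event detection can miss by a whole frame rather than half a frame, passing `frames_per_event=1.0` doubles the bound.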
In the future, our study will broaden its scope to include patients with cerebellar ataxia. Compared with healthy individuals, patients with cerebellar ataxia walking at their preferred pace show reduced walking speed, cadence, step length, stride length, and swing phase, together with increased base width, stride time, step time, stance phase, and double limb support phase. They also show markedly greater variability in step and stride parameters. Of these parameters, the most significantly affected in cerebellar ataxia are speed, double limb support phase duration, and stride time variability [35]. Thus, our approach focuses on evaluating temporal gait parameters, namely stance time, swing time, step time, and double support time, in healthy individuals as a basis for future comparison with patients with cerebellar ataxia.
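Stride time variability is commonly summarized with the coefficient of variation (CV), the sample standard deviation divided by the mean, expressed as a percentage. The following is a minimal sketch, assuming that successive heel strike times of the same foot are already available from the gait event detection. The heel strike times and function names are hypothetical.

```python
from statistics import fmean, stdev
from typing import List, Sequence


def stride_times(heel_strikes: Sequence[float]) -> List[float]:
    """Stride time: interval between consecutive heel strikes of the same foot."""
    return [later - earlier for earlier, later in zip(heel_strikes, heel_strikes[1:])]


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean, as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return 100.0 * stdev(values) / fmean(values)


# Hypothetical left heel strike times (s) from one walking trial
left_heel_strikes = [0.52, 1.60, 2.70, 3.76, 4.88]
strides = stride_times(left_heel_strikes)
print("stride times (s):", [round(s, 2) for s in strides])
print(f"stride-time CV: {coefficient_of_variation(strides):.1f}%")
```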

Implications of the Proposed Approach in Clinical Settings
Automated gait analysis based on MediaPipe Pose has the potential to improve the diagnosis and treatment of gait abnormalities in clinical settings. It can help identify gait abnormalities at an early stage, provide an objective way to assess gait and track progress, and support tailoring treatment interventions to the specific needs of each patient. Automated gait analysis can also be a cost-effective alternative to traditional gait analysis methods, making it accessible to a wider range of patients. Remote monitoring of gait using automated analysis could improve patient compliance with treatment and reduce the burden on healthcare providers.
However, the effectiveness of automated gait analysis based on pose estimation must be verified in various clinical populations, including both adults and children. Previous research has indicated that current pose estimation algorithms can accurately track the gait of patients using walking aids [38], but tracking patients with prosthetic devices that are not represented in the algorithms' training data remains challenging [39]. Therefore, it is critical to validate the accuracy and reliability of automated gait analysis in diverse patient populations before implementing it in clinical settings. Although this study used a pre-trained network [18], a network trained specifically on gait data and clinical conditions may improve the accuracy of the results.

Conclusions
The automated temporal gait analysis based on a markerless pose estimation model (MediaPipe Pose) can calculate temporal gait parameters, including stance time, swing time, double support time, and step time, with low mean absolute error and without human intervention. The approach exhibits excellent intraclass correlation coefficients for stance time (left) and step time (left and right) (0.923 to 0.956), good intraclass correlation coefficients for stance time (right), swing time (left), and double support time (left leg switched to right leg) (0.765 to 0.893), and moderate intraclass correlation coefficients for double support time (right leg switched to left leg) and swing time (right) (0.551 to 0.579). These parameters are essential for monitoring changes in gait and assessing the efficacy of interventions such as rehabilitation or training programs. The method is cost-effective and accessible compared with instrumented gait analysis, making large-scale gait analysis feasible in different populations. Additionally, it could enable tracking of gait patterns in real-life situations, offering greater ecological validity and a better understanding of gait irregularities during daily activities.
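The calculation of these parameters from detected gait events can be illustrated as follows. This section does not spell out the exact definitions used, so the sketch assumes common conventions for one left-foot gait cycle, with the event order left heel strike, right toe-off, right heel strike, left toe-off, next left heel strike. Right-side stance and swing times would be obtained in the same way from a right-foot cycle. The container class, its field names, and the event times in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GaitCycleEvents:
    """Event times (s) of one left-foot gait cycle, in chronological order.

    Hypothetical container; assumes the events have already been detected.
    """
    left_heel_strike: float
    right_toe_off: float
    right_heel_strike: float
    left_toe_off: float
    next_left_heel_strike: float


def temporal_parameters(ev: GaitCycleEvents) -> Dict[str, float]:
    """Temporal gait parameters (s) under the assumed event conventions."""
    times = [ev.left_heel_strike, ev.right_toe_off, ev.right_heel_strike,
             ev.left_toe_off, ev.next_left_heel_strike]
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("events must be strictly increasing in time")
    return {
        "stance_left": ev.left_toe_off - ev.left_heel_strike,
        "swing_left": ev.next_left_heel_strike - ev.left_toe_off,
        "step_right": ev.right_heel_strike - ev.left_heel_strike,
        "step_left": ev.next_left_heel_strike - ev.right_heel_strike,
        # both feet on the ground while weight moves from the right leg to the left
        "double_support_right_to_left": ev.right_toe_off - ev.left_heel_strike,
        # both feet on the ground while weight moves from the left leg to the right
        "double_support_left_to_right": ev.left_toe_off - ev.right_heel_strike,
    }


events = GaitCycleEvents(0.00, 0.12, 0.55, 0.67, 1.10)  # hypothetical times (s)
for name, value in temporal_parameters(events).items():
    print(f"{name}: {value:.2f} s")
```

Because every parameter is a difference of two event times, an error in a single heel strike or toe-off propagates into several parameters at once.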

Limitations and Future Work
At present, the markerless MediaPipe Pose-based automated gait analysis system achieves a satisfactory level of accuracy in detecting left and right heel strike and toe-off events, with detection rates ranging from 95.24% to 99.02%. However, the system produced two false detections, which resulted in some missing or inaccurate temporal gait parameters. To improve the system, future work will explore alternative approaches, such as moving averages, for identifying the peaks (heel strike events) and minima (toe-off events). Additionally, the walking sequences of healthy individuals could be captured at a higher resolution and frame rate to minimize the misidentification of the lower limbs caused by fast walking, resulting in more precise identification of gait events.
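The moving-average idea can be sketched as smoothing followed by local-extremum detection. This assumes a one-dimensional per-frame gait signal whose local maxima correspond to heel strikes and whose local minima correspond to toe-offs. The signal is not specified in this section; a heel-to-hip horizontal distance is one possible choice. The centered window length, the minimum separation in frames, the function names, and the synthetic signal are all hypothetical choices for illustration, not the study's pipeline.

```python
import math
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def local_extrema(signal: Sequence[float], min_gap: int, find_max: bool = True) -> List[int]:
    """Indices of local maxima (or minima) at least `min_gap` frames apart."""
    sign = 1.0 if find_max else -1.0  # minima of x are maxima of -x
    idx: List[int] = []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = (sign * signal[j] for j in (i - 1, i, i + 1))
        if cur > prev and cur >= nxt:
            if idx and i - idx[-1] < min_gap:
                # two candidates too close together: keep the more extreme one
                if cur > sign * signal[idx[-1]]:
                    idx[-1] = i
            else:
                idx.append(i)
    return idx


# Hypothetical signal: one 1 s gait cycle per 25 frames plus alternating noise
FPS = 25
raw = [math.sin(2 * math.pi * i / FPS) + 0.05 * (-1) ** i for i in range(4 * FPS)]
smooth = moving_average(raw, window=5)
heel_strikes = local_extrema(smooth, min_gap=10, find_max=True)
toe_offs = local_extrema(smooth, min_gap=10, find_max=False)
print("heel strikes (s):", [round(i / FPS, 2) for i in heel_strikes])
print("toe-offs (s):", [round(i / FPS, 2) for i in toe_offs])
```

The minimum separation acts as a simple guard against the double detections mentioned above; its value should be tied to the expected cadence.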
Moreover, the proposed approach could be further enhanced by incorporating spatial gait parameters and lower limb joint kinematics, making it a powerful tool for evaluating pathology. For instance, the variability of spatial, temporal, and spatiotemporal gait parameters can be used to evaluate the progression of Friedreich ataxia [40]. Similarly, the spatiotemporal parameters and lower extremity kinematics during the gait cycle of adult patients with cervical spondylotic myelopathy differ from those of healthy individuals. By identifying the relationship between abnormal spinal alignment and lower extremity function, as well as the specific gait and biomechanical issues that myelopathic patients experience, clinicians can better understand the disease and develop more effective rehabilitation protocols [41].

Institutional Review Board Statement: Not applicable. A publicly available dataset [32] was used in this work.
Informed Consent Statement: Not applicable. A publicly available dataset [32] was used in this work.

Data Availability Statement: The GPJATK dataset, which features synchronized and calibrated multi-view video and motion capture data, is publicly available at http://bytom.pja.edu.pl/projekty/hm-gpjatk/ (accessed on 9 September 2022) [32]. The GPJATK Dataset Release Agreement was signed in order to use this dataset in this study.