Abstract
The accurate diagnosis of Hip Osteoarthritis (HOA) and the prediction of Total Hip Arthroplasty (THA) outcomes are crucial for reliable decision-making on treatment and rehabilitation strategies. Gait analysis (GA) is commonly employed for gait disorder examination in clinical settings, but it remains limited by the large volume of data it produces and by accuracy problems. Machine Learning (ML) has seen rapid growth in the past decade, but its development in the context of HOA and THA GA has not been previously examined. Thus, the novel contribution of this review is the evaluation of the current state of ML frameworks for the analysis of HOA and post-THA gaits. Five databases, namely PubMed, Embase, IEEE Xplore, ACM Digital Library, and Scopus, were searched in accordance with the PRISMA framework. Relevant publications published until May 2025 were retrieved, and information on reliability, applicability, and interpretability was extracted for quality assessment. Out of the 759 publications initially considered, 19 studies were selected, with 14 articles focused on classification and 5 articles on outcome prediction. Eight classification studies utilized kinematic features, while four outcome prediction articles utilized spatiotemporal parameters and mostly focused on post-THA gaits. The reported accuracy ranges between 70 and 100%, with the support vector machine (SVM) as the most frequently utilized ML algorithm. Scarce datasets, small sample sizes, and limited design description were the main hindrances revealed in our quality assessment. Nevertheless, this review demonstrated the recent developments in the utilization of ML techniques and evidently improved applicability through a consensus on the important gait features for HOA and post-THA gait analysis. Reliability and interpretability are still major concerns before ML models become widely accepted by medical practitioners. Future research should consider dataset quality, transparent validation protocols, model interpretability, and the explainability of results.
1. Introduction
HOA is one of the most common musculoskeletal diseases, affecting 25% of the population over a lifetime [1]. It progressively degenerates the hip joint and eventually leads to its dysfunction [2]. The resulting pain causes lateral trunk bending, which overloads other parts of the lower limb in compensation [3]. Gradual deterioration often leads to advanced HOA, which generally requires a THA operation. A steady growth in the demand for THA was recently reported, and this demand is expected to increase worldwide [4]. While THA is considered a successful operation that alleviates pain and restores walking functionality in most subjects, the literature indicates that gait function does not fully return to normal post-THA [5].
Medical imaging is the most common objective method used for diagnosis and therapeutic interventions in osteoarthritis [6]. However, clinical symptoms are not consistent with the results from imaging. In the context of assessing hip functionality, patient-based measures such as the Harris Hip Score are the accepted standard in the evaluation of rehabilitation progress post-THA [7]. Due to the subjective nature of the questionnaires, biases can be introduced by the patient and the raters.
In recent decades, GA has been widely used for the examination of gait alterations based on motion data such as kinematics, kinetics, muscle activations, and spatiotemporal (ST) information. These gait features have known discriminating abilities and have seen applications in sports, security, and health informatics [8]. More broadly, GA belongs to multiple research disciplines: biomechanics, clinical medicine, neuroscience, biomedical engineering, and forensics.
In clinical settings, GA is extensively investigated, with a focus on the pathological abnormalities of neuromuscular and musculoskeletal diseases. Specifically, reports related to HOA and THA research focus on gait classification, severity and progress prediction, and the identification of relevant parameters [9,10,11,12]. Gait alterations may already manifest prior to the onset of observable functional impairments; thus, discriminating features are important for clinical applicability.
The large volume and diversity of information produced by GA make it prohibitively challenging for clinicians to fully interpret the results. Additionally, traditional statistical methods have difficulty in synthesizing non-linear multi-dimensional gait data. To overcome these limitations, ML techniques are increasingly employed in the analysis of gait data. ML algorithms can capture non-linear patterns, and deep learning (DL) models, a recent subset of ML, can handle large datasets with outstanding accuracy [1,13,14].
In the context of HOA and THA gait research, ML algorithms are employed to address a wide range of clinical topics. Such applications include the determination of the best discriminating parameters [15,16,17], discriminating HOA or post-THA gaits from other pathologies [18,19,20], distinguishing between healthy and HOA and/or post-THA gaits [21,22], and predicting risks and recovery after THA [23,24]. However, medical practitioners are still hesitant to accept these methods. There is a gap between development and practice due to the following reasons: (1) applicability, which is the relevance and feasibility of the studies with regard to HOA or THA research trends; (2) result interpretability [25], which is the mapping from input to output and the explanation of results that can foster trust from users; and (3) reliability, which is the performance, consistency, and validity of the ML algorithms developed for GA.
A number of recent surveys were published on the application of artificial intelligence in gait analyses for neuromuscular [26,27,28] and musculoskeletal diseases [29]. Jiao et al. [27] focused on the automatic classification system for post-stroke gaits, while Kohnehshahri et al. [28] provided a comprehensive survey for cerebral palsy and stroke survivor subjects. Both survey papers focused on ML methodologies as a data-driven technique in the analysis of gait patterns. Likewise, Franco et al. [26] surveyed papers that applied DL and GA in Parkinson’s Disease for classification, diagnosis, and monitoring. The reviewed papers were categorized by gait acquisition method, namely wearable sensors and video capture. The common difficulties identified through these surveys were clinical utility and interpretability, which limit use in real-world applications. Conspicuously, only a single musculoskeletal-related survey was published that investigated gait-related modifications after knee surgery through an ML framework [29]. Even so, only six articles were included in that survey, all of which focused on the classification task utilizing non-DL methods, and five of which were conducted on subjects after total knee arthroplasty.
Notably, to the best knowledge of the authors, no previous review has comprehensively synthesized ML studies addressing the analysis of both HOA and post-THA gaits. The main objective of our systematic review was to investigate the latest relevant published reports in this context, using biomechanical data from GA. By providing a comprehensive review of the current state in the application of artificial intelligence (AI) in gait analysis, benefits and limitations can be appraised, thus providing information for future directions.
2. Methods
This systematic review follows the PRISMA guidelines in the selection of papers and the methodology applied [30].
2.1. Search Strategy
Databases were selected based on their discipline to improve search results. As such, two technological databases, namely IEEE Xplore and ACM Digital Library, and two medical databases, namely Embase and PubMed, were selected. Additionally, Scopus was searched as a hybrid of technological and medical databases. The PICO strategy [30] was utilized to formulate the search terms with Problem: HOA or THA; Interest: Machine Learning; and Context: Gait Analysis. Unique search terms are shown in Table 1; to comply with each database’s syntax requirements, these search terms were adapted as described in Appendix A. The final search was completed on 4 May 2025. All identified reports were uploaded to EndNote 21 for screening and review.
Table 1.
Unique search terms.
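As an illustration of how search terms grouped under Problem, Interest, and Context are typically combined into a Boolean query, the sketch below joins synonyms with OR and concepts with AND. The term lists are assumptions drawn from the concepts named in the text; they are not the authors' actual search strings, which are given in Table 1 and Appendix A.

```python
# Illustrative only: these term lists are assumed from the concepts named in
# the text, not copied from the review's Table 1 or Appendix A.
CONCEPT_TERMS = {
    "problem": ["hip osteoarthritis", "total hip arthroplasty"],
    "interest": ["machine learning", "deep learning"],
    "context": ["gait analysis"],
}


def build_query(concepts):
    """OR the synonyms inside each concept, then AND the concepts together."""
    groups = []
    for terms in concepts.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


query = build_query(CONCEPT_TERMS)
print(query)
```

Each database would still need its own field tags and syntax applied on top of such a generic string.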
2.2. Screening Method
Two reviewers independently screened identified articles in a two-step process, (1) title and abstract and (2) full-text, using the inclusion and exclusion criteria. At the end of each step, any discrepancies were addressed through a face-to-face meeting with a third reviewer, who made the final decision. A standard web-based tool, Covidence systematic review software (https://support.covidence.org/doc/how-can-i-cite-covidence, accessed on 3 November 2025), was utilized in the study. This streamlined the process, ensured effective collaboration through a shared real-time system, and reduced the risk of bias.
2.3. Inclusion and Exclusion Criteria
To determine the relevance of the identified reports to the objectives of this review, specific inclusion and exclusion criteria were applied. The included studies had to satisfy the following conditions:
- Studies that focus on HOA and THA in human subjects,
- Studies using artificial intelligence in the realm of ML and DL,
- Studies from January 2000 to 4 May 2025,
- Studies that deal with the analysis of gait utilizing input parameters such as kinematics, kinetics, ST, EMG, and vision data.
Thereafter, studies were excluded with the following conditions:
- Studies that simulate HOA or post-THA gaits,
- Studies involving robot rehabilitation or prosthetic legs,
- Studies focusing on other musculoskeletal diseases (e.g., knee osteoarthritis),
- Studies with input parameters that are not gait related,
- Non-English studies.
Explicitly, conference papers were considered in this review due to the emerging nature of this field and for avoiding publication bias.
2.4. Data Extraction
A single reviewer was tasked with extracting data using a Covidence extraction template based on the PICO strategy, as shown in Table 2. In addition, general information, such as authors’ names, institutions, country, and publication year, was also extracted.
Table 2.
Data extraction.
2.5. Quality Assessment
Two reviewers assessed the quality of the included reports based on a set of questions through the RoB 2 tool (Risk of Bias) using the information acquired from the extracted data. To properly synthesize the proposed methods, the extracted data were categorized into reliability, applicability, and interpretability, as described in Table 3. An article is considered of high quality for a given category if a majority of its domains are met.
Table 3.
Quality assessment description.
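The per-category rule described above can be expressed as a small check. This is a minimal sketch that assumes "a majority" means strictly more than half of the domains; the domain names and ratings are hypothetical and do not reproduce Table 3.

```python
def is_high_quality(domains_met):
    """Return True when strictly more than half of the domains are met."""
    met = sum(1 for ok in domains_met.values() if ok)
    return met > len(domains_met) / 2


# Hypothetical domain names and ratings for one article in one category.
reliability = {
    "dataset_described": True,
    "validation_reported": True,
    "tuning_reported": False,
}
print(is_high_quality(reliability))  # 2 of 3 domains met
```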
3. Results
3.1. Search Results
Figure 1 summarizes the workflow to achieve this systematic review. The initial search generated 759 articles: 375 from Scopus, 63 from PubMed, 205 from Embase, 117 from ACM Digital Library, and 14 from IEEE Xplore. A total of 95 duplicates were automatically removed by the Covidence software, leaving 664 articles to be screened.
Figure 1.
PRISMA flowchart.
Two independent reviewers screened the titles and abstracts of the remaining articles based on the inclusion criteria. A total of 31 reports were eligible for full-text review. In the event of overlapping authorship, the reviewer concerned was not involved in any step of the appraisal of that report. Subsequently, a single reviewer was tasked with downloading the eligible reports that were available and accessible. The reviewers' votes disagreed on ten reports; after the involvement of a third reviewer, eight of these were excluded and two were carried forward to the next stage. A total of 12 reports were excluded: 9 for the wrong population, 2 for using a non-ML method, and 1 for using incorrect input features. Finally, 19 articles were selected for data extraction and quality assessment.
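The screening counts reported in this section can be traced as simple arithmetic. The number of records excluded at the title and abstract stage (633) is not stated in the text; it is implied by the other reported figures.

```latex
\begin{align*}
759 - 95  &= 664 && \text{records screened by title and abstract} \\
664 - 31  &= 633 && \text{records excluded at title and abstract screening (implied)} \\
9 + 2 + 1 &= 12  && \text{reports excluded at full-text review} \\
31 - 12   &= 19  && \text{studies included}
\end{align*}
```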
To understand the relationship between keywords used in the search, a visualization of similarities was produced with VOSviewer version 1.6.20, as shown in Figure 2. Evidently, THA is the most extensively researched topic, while DL and ML are the most recent. Also, strong connections are seen between ML and the THA, GA, and HOA subjects, demonstrating clear relations among these keywords for the papers screened for this review.
Figure 2.
Similarity of keywords used in the review (https://www.vosviewer.com/, accessed on 22 July 2025).
3.2. Data Extraction
Of the 19 included articles, 15 were published in the past 5 years, most of them in the last few years, as presented in Figure 3. DL methodologies were likewise introduced in these recent years [16,17,31].
Figure 3.
Annual ML publication of gait analysis for HOA and THA.
3.2.1. Dataset Information
Most of the included studies had a small number of participants, with 10 articles utilizing 50 or fewer participants [15,18,21,22,31,32,33,34,35,36]. Notably, five studies had sample sizes of 100 or more participants [16,17,20,37,38], incorporating at least 1000 events for gait analysis. Information about sampling is summarized in Figure 4a. Four articles did not elaborate on the participants’ demographics [18,20,21,24], while three articles matched the age and Body Mass Index (BMI) of the participants across the classes [15,32,39]. The same number of articles had a controlled gender ratio [19,32,33]. Only two articles utilized a public dataset [40], which consists of healthy participants and patients before and after the THA operation [16,17]. Accordingly, a potential dataset bias is introduced due to sample size restrictions and demographic skew.
Figure 4.
(a) Sampling size information; (b) HOA/post-THA gait classes considered.
3.2.2. Problem Classification
Fourteen of the studies were classification tasks [15,16,17,18,19,20,21,22,32,33,35,36,37,39], with five studies solely focused on the diagnosis of HOA gaits [15,16,19,32,33]. Conspicuously, Choi et al. [33] employed a clustering method to further classify HOA gaits on severity levels. Four other studies were focused on THA surgery outcome predictions through binary classification [22,35,36] and multi-label classification with healthy and HOA gaits [17]. Particularly, only Ghaffari et al. aimed at finding gait pattern differences between hip and knee osteoarthritis [19].
The other five studies were prediction tasks [23,24,31,34,38] with a variety of objectives. Cornish et al. [31] predicted the hip contact forces and kinematic angles of HOA subjects using EMG and pose estimates through a vision-based marker-less system. Miyazaki et al. [38] predicted gait patterns before and after THA surgery, which are important for locomotive syndrome. Dindorf et al. [34] predicted the most significant parameters of post-THA gaits, and dimensionality reduction methods were employed. Additionally, Polus et al. [23] predicted the risk of fall for THA patients, while Surmacz et al. [24] predicted the recovery of patients after THA operation. Figure 4b shows that only three studies have considered both HOA and post-THA gaits in their article [17,23,38], while five studies focused only on post-THA gaits [22,24,34,35,36].
3.2.3. Modality and Input Feature
Nine articles have utilized state-of-the-art 3-Dimensional Gait Analysis (3DGA) [15,16,17,20,31,32,33,34,38] that may consist of optical cameras, inertial measurement units (IMUs), force plates, and bipolar surface electrodes to accurately measure gait parameters. Seven articles only require wearable sensors, of which a majority are IMUs [22,34,35,36,37,39]. Interestingly, two recent studies only needed the use of smartphones to acquire gait information [23,24], while another recent study proposed the use of a marker-less vision-based system [21].
Nine studies utilized kinematic parameters [15,16,17,19,22,34,35,36,37], eight studies employed ST parameters [20,21,22,23,24,31,38,39], two articles exclusively used kinetic parameters [32,33], and two articles employed muscle activation information through electromyography (EMG) [18,31]. Feature extraction and selection were performed and explicitly discussed in ten articles [20,21,22,23,34,35,36,37,38,39], and were the main focus of a single study conducted by Miyazaki et al. [38].
3.2.4. Machine Learning Algorithm
The majority of the included papers, sixteen in total, utilized traditional ML or non-DL methods, except for three recent papers [16,17,31]. For these traditional ML methods, the most popular algorithm used is SVM [15,20,21,22,23,32,35,36,37,39]. Other ML algorithms such as k-Nearest Neighbor (kNN) [19,20,32,39], random forest (RF) [32,34,35], decision trees (DT) [39], Fuzzy Inference System (FIS) [20], and several types of regression models [18,24,39] were also considered.
Seven articles conducted a comparative study on several ML methods [18,20,23,31,32,35,39]. Notably, the article by Choi et al. [33] is the only one that considered an unsupervised methodology, utilizing a k-means clustering algorithm. For DL models, only long short-term memory (LSTM) and convolutional neural networks (CNNs) were considered [16,17,31]. Interpretability and explainability concepts for ML methods were examined in only two articles [35,36].
Hyperparameter tuning was mentioned in seven articles [16,17,18,19,31,37,39], albeit only two papers explicitly discussed the method used: grid search by Ghaffari et al. [19] and the hyperband algorithm by Cornish et al. [31]. Figure 5 shows the hierarchical extracted data according to the problem classification, ML method, and input feature categories.
Figure 5.
Extracted data categories.
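To make the idea of grid search concrete, the sketch below exhaustively scores every combination in a small hyperparameter grid for a toy 1-D k-nearest-neighbor classifier using leave-one-out accuracy. The data values, the grid, and the helper names are assumptions for illustration only; this does not reproduce the tuning setup of Ghaffari et al. [19] or any other reviewed study.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: one feature (e.g., gait speed in m/s), label 1 = pathological.
X = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
y = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]


def knn_predict(train_x, train_y, query, k, weighted):
    """Vote among the k nearest training points, optionally weighted by 1/distance."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    votes = Counter()
    for xi, yi in nearest:
        votes[yi] += 1.0 / (abs(xi - query) + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]


def loo_accuracy(k, weighted):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        train_x = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += int(knn_predict(train_x, train_y, X[i], k, weighted) == y[i])
    return hits / len(X)


# Grid search: evaluate every combination and keep the best-scoring one.
grid = {"k": [1, 3, 5], "weighted": [False, True]}
results = {
    combo: loo_accuracy(*combo) for combo in product(grid["k"], grid["weighted"])
}
best = max(results, key=results.get)
print("best (k, weighted):", best, "accuracy:", results[best])
```

A score obtained this way is optimistic if it is also reported as the final performance; a separate held-out test set is usually kept for that purpose.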
3.2.5. Performance and Validation
With the exception of Miyazaki et al.’s [38] study on feature importance prediction, the rest of the articles split their dataset into at least two groups: a training set, a validation set, and/or a test set. Of these articles, ten studies held out a set or group, explicitly not seen in training, for performance evaluation: a test set [17,18,19,21,24,37], leave-one-subject-out (LOSO) [31], leave-one-group-out (LOGO) [23,34], and other classes [36]. Eleven studies employed some form of k-fold cross-validation (CV) [15,16,18,20,22,23,24,32,35,37,39] and trained the model iteratively to find the best performing one.
Accuracy is the most adopted performance measure among the studies, with a few exception articles, which used the symmetric mean absolute percentage error (SMAPE) [33], mean-square error (MSE) [31], and cumulative contribution [38]. Other performance measures, such as sensitivity, specificity, recall, and precision, were also added for a further performance analysis of the developed model.
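For reference, the two regression-style error measures named above can be computed as in the sketch below. SMAPE has several variants in the literature; the one shown divides by the mean of the absolute actual and predicted values, which is an assumption, since the variant used in [33] is not stated here. The sample values are hypothetical.

```python
def mse(actual, predicted):
    """Mean-square error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """SMAPE in percent, using the (|a| + |p|) / 2 denominator variant."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error to avoid division by zero.
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100 * total / len(actual)


# Hypothetical values, e.g., measured vs. predicted peak forces.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mse(actual, predicted), round(smape(actual, predicted), 3))
```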
Apart from the recent DL-related studies, SVM is found to have the highest performance among the traditional ML methods [20,23,34,39], with reported accuracies within 70–100%. DL models [16,17,31] consistently provide superior performance, with accuracies above 95% as reported in the articles.
3.2.6. Results Interpretation
In terms of ST parameters, gait speed is most frequently reported as the most discriminating feature [20,21,24,39], followed by stride time [20,39]. For kinematic parameters, the sagittal hip angle is the most discriminating feature, as reported by eight articles [15,16,17,22,33,34,35,36], followed by the sagittal angle of the knee [15,17,34,35,36]. In addition, Nair et al. [18] utilized EMG information and reported the gluteus medius as the most important muscle for classification during loading and mid-stance. The summary of the extracted data for classification and prediction tasks is presented in Table 4, Table 5, Table 6 and Table 7.
Table 4.
Summary of data extracted for classification task—Part I.
Table 5.
Summary of extracted data for classification task—Part II (continued).
Table 6.
Summary of extracted data for classification task—Part III (continued).
Table 7.
Summary of extracted data for prediction task.
3.3. Quality Assessment
Table 8 summarizes the quality assessment results of the reviewed articles. Only one study had adequately achieved all quality categories [35]. In the context of reliability, ten articles sufficiently addressed the hurdles [16,17,23,31,32,33,34,35,38,39]. Conspicuously, only two studies explicitly addressed interpretability [35,36], while all but two studies adequately addressed concerns on applicability [24,32].
Table 8.
Quality assessment results.
4. Discussion
The main objective of this systematic review was to examine the application of ML methodology in gait analysis for subjects with HOA and THA. In this section, we summarize the results based on the previous section’s quality assessment, expounding on key characteristics: dataset information, model validation and evaluation, model algorithm selection or design, interpretability and explainability, and feature identification.
4.1. Dataset Information
Firstly, the quantity and quality of data is the most important factor for reliability. Thus far, a sufficiently large and diverse dataset is lacking in most of the studies included. Fourteen studies utilized fewer than a hundred participants. While there is no standard for the size of the dataset, a relatively small sample size adversely affects the generalizability of ML algorithms for the precise prediction of unseen data [41]. To mitigate this issue, multiple trials were conducted with each participant, thus artificially increasing the dataset size, but this could have introduced bias and overfitting.
With regard to post-THA gait reports, measurements were conducted between 2 weeks and 6 months after surgery. A standardized time to conduct the measurements is strongly recommended. It can potentially enhance the reliability of benchmarking comparisons for the effectiveness of treatment, as well as for the prediction performance of ML algorithms.
The most comprehensive and publicly available dataset was provided by Bertaux et al. [40] quite recently. This fully described 3DGA dataset is composed of 80 healthy individuals and 106 patients with unilateral HOA before and after THA operation. The use of a publicly available dataset allows researchers to contribute to common bodies of knowledge, and algorithm validation can be easily performed through the comparison of performances. Consequently, the kinematic information of the dataset has been used for the development of DL models [16,17] for classification problems. In addition, healthy, pre- and post-THA gait classes, in one protocol, are necessary for ML models to identify severity outcomes and provide insights into the effects of treatment [13].
For ML research purposes, it is highly recommended to have another independent publicly accessible dataset that also contains pre-THA and post-THA gait data. Thus, the results can be validated on the effects of treatment and rehabilitation. To achieve this, a close collaboration with relevant institutions is necessary.
On the other hand, data quality is tied to data acquisition, diversity, and class balancing. Primarily, data acquisition has a significant contribution to quality; thus, a complete description is necessary. The variability of marker or IMU placements by clinicians on anatomical landmarks can greatly affect the reliability and robustness of raw measurements. The error can be introduced between subjects or before and after an operation. Notably, fewer than half of the included articles provided such a description.
Subsequently, most of the included studies have provided generic demographic descriptions of the participants (gender, age, height, weight, etc.), describing limited diversity. When compounded with class imbalance, this can lead to a risk of bias towards the majority class [41]. Thus, dataset matching is crucial to improve the generalizability of ML models. Arguably, data availability may not be known beforehand; thus, dataset matching is an ongoing challenge towards deployment and application. To address this issue, generative artificial intelligence has been proposed recently as an effective method in data augmentation for ML models [42]. Synthetic data are created with validation from domain experts to enhance dataset diversity.
To further support our premise, key recommendations have also been identified for the enhancement of data utilization in healthcare through the FAIR principles [43] to guide stakeholders in the use, management, and access of quality data.
4.2. Model Validation and Evaluation
The majority of the studies adopted methods to address overfitting, especially when dealing with small datasets. LOSO or LOGO CV methods are widely accepted and recommended to evaluate models in terms of generalizability [44], yet they were considered in only three studies [23,31,34]. These inter-subject validation processes ensure that multiple gait cycles of the same subject do not appear in both training and testing sets, avoiding the risk of data leakage. Model training is achieved through an iterative process. Only a single study [17] explicitly split the dataset into three, holding out a test set for a final evaluation. Hence, most studies were vague in their model tuning approach.
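The subject-wise splitting described above can be sketched in a few lines. The subject identifiers and the number of gait cycles are hypothetical; the point is only that every gait cycle of the held-out subject goes to the test side, so no subject appears on both sides of a split.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical data: three subjects, several recorded gait cycles each.
subject_ids = ["S1", "S1", "S1", "S2", "S2", "S3", "S3", "S3"]

folds = list(leave_one_subject_out(subject_ids))
for held_out, train, test in folds:
    train_subjects = {subject_ids[i] for i in train}
    # No subject contributes gait cycles to both sides of the split.
    assert held_out not in train_subjects
    print(held_out, "train:", train, "test:", test)
```

A random split over gait cycles, by contrast, would typically place cycles from the same subject in both sets, which is the leakage the paragraph warns about.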
All ML models considered in this review were trained and tested on the same dataset. This limits their generalizability, as the samples have a high comparability [6]. A thorough external validation of ML models is necessary to ensure reliability before they can be considered for clinical utilization [45]. Significant barriers are seen in region-related differences such as institutional policies in patient selection, demographics, culture, and environment. A common protocol with the same ML platform could potentially address this issue.
While accuracy is used as the primary performance metric in most studies, other metrics such as sensitivity, specificity, precision, and recall are important as well for clinical utility. These other metrics were considered in eleven studies, providing indicators of how a model performs with respect to false predictions, which are relevant in the medical context. A full confusion matrix of the results is also recommended to provide a whole picture of the performance [46].
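As a worked illustration of why accuracy alone can hide clinically relevant errors, the sketch below derives the metrics named above from the four cells of a binary confusion matrix. The labels and predictions are hypothetical, with 1 standing for a pathological gait.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(num, den):
    """Safe division; undefined ratios are reported as NaN."""
    return num / den if den else float("nan")


# Hypothetical labels: 4 pathological (1) and 6 healthy (0) gaits.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = ratio(tp + tn, tp + fp + tn + fn)
sensitivity = ratio(tp, tp + fn)   # also called recall
specificity = ratio(tn, tn + fp)
precision = ratio(tp, tp + fp)
print(accuracy, sensitivity, specificity, precision)
```

Here one missed pathological gait and two false alarms give 70% accuracy, while sensitivity, specificity, and precision each tell a different part of the story.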
4.3. ML Algorithm Selection and Design
Several ML algorithms were considered, with SVM classifiers as the most popular in ten studies. SVM is the best performing model among traditional ML, with a reported accuracy of 70–100%. Noticeably, two similar articles [35,36] on post-THA gait classification reported perfect accuracy. Closer scrutiny reveals a small sample size, which suggests prediction overfitting. It should be noted that relatively early reports tend to have small sample sizes, affecting the developed model’s generalizability.
Recently, DL frameworks were proposed that have shown superior performance. This is mainly due to their data-driven approach, as compared to traditional ML models, in related fields that require pattern recognition [47]. DL is a feature-extracting architecture with attributes learned from its hidden layers, eliminating the time-consuming feature engineering stage that oftentimes requires some handcrafting of features. In the past couple of years, three articles from the included reports have proposed DL methods for the classification of before and after THA gaits [16,17] and the prediction of hip contact forces (HCF) for HOA subjects [31], respectively. The former achieved above 95% performance across features and were benchmarked against previous classification studies employing SVM [15,22], which demonstrates their outstanding results. The latter is the first study to predict HCF, kinematic, and kinetic information using ML methods. The study achieved minimal errors and results in agreement with neuromusculoskeletal modeling.
On the other hand, a single study [33] utilized an unsupervised learning model through k-means clustering analysis of ground reaction force (GRF) measurements of healthy subjects and HOA patients. It revealed that the gait patterns of HOA patients can be grouped into two clusters. The first cluster is similar to a healthy gait pattern and could require strengthening exercises, while the second cluster is markedly different from the first, which points to possible hip replacement or more stringent gait training. Additionally, this is the only study that attempts to correlate with the Kellgren-Lawrence (KL) severity grade of HOA.
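The clustering idea can be sketched with a plain implementation of Lloyd's k-means algorithm for two clusters. The two features and all values below are hypothetical stand-ins for GRF-derived measures; they are not the features or data used by Choi et al. [33], and the fixed initial centroids are chosen only to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical GRF-derived features per patient (normalized units).
points = [(1.10, 0.95), (1.08, 0.97), (1.12, 0.93),
          (0.92, 0.60), (0.90, 0.62), (0.94, 0.58)]
labels, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(labels)
```

In practice the initial centroids are usually chosen randomly or with a seeding heuristic, and the result is checked over several restarts.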
Our review reveals a palpable gap in feasibility studies in terms of computational and memory requirements. While hardware constraints are mostly settled in recent decades, with the introduction of cloud-edge computing and the use of graphics processing units (GPUs) [48], a conscious consideration of this requirement can significantly improve practicality and effectiveness in real-world clinical applications.
4.4. Interpretability and Explainability
As more DL methods are applied in GA, performance appears to be approaching its ceiling. However, the lack of prediction transparency and explainability [25] is becoming a major hindrance to practical application in clinical settings. The “black-box” nature of ML methods makes their decision-making opaque due to the unknown input–output mapping within the hidden layers, leading to a poor understanding of predictions.
To address this issue, two reviewed studies explored explainable artificial intelligence (XAI) concepts. Dindorf et al. [35] examined the influence of input representation on SVM model accuracy in the gait classification of healthy and post-THA subjects. Local Interpretable Model-Agnostic Explanations (LIME) were used for the interpretation task. It was found that derived or intermediate features improve performance at the expense of interpretability. While the authors explicitly recommended combining different input representations, this might not be beneficial due to unclear connections. Instead, it is advantageous to exploit DL methods with XAI to obtain optimum performance while not sacrificing interpretability.
Subsequently, the same research group [36] further investigated XAI to examine the relevant gait features that lead to successful classification decisions, utilizing a permutation feature importance algorithm [49]. The selected input features were common, discrete gait kinematic parameters, such as the sagittal hip trajectory, instead of abstract sensor or transformed data. This provides clinicians with valuable interpretable information.
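The general idea behind permutation feature importance can be sketched as follows: shuffle one feature column at a time and record how much the model's accuracy drops. This is a generic illustration and not necessarily the exact algorithm of [49] or the setup in [36]; the threshold model, the feature meanings, and the data are all hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model reproduces the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fitted model: flags a gait as post-THA (1) when the peak
# sagittal hip angle (feature 0, degrees) is below 30; feature 1 is ignored.
def model(row):
    return 1 if row[0] < 30 else 0


X = [[20 + i, (7 * i) % 5] for i in range(20)]  # feature 0 spans 20..39
y = [model(row) for row in X]                   # labels the model fits perfectly

importances = permutation_importance(model, X, y)
print(importances)
```

The ignored feature receives an importance of exactly zero, while shuffling the hip-angle column degrades accuracy; importance is ideally computed on held-out data rather than on the training set.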
An increase in the transparency of DL models, in the context of GA, is still the subject of active ongoing research [50,51]. Particularly, attention-based DL algorithms [52] are becoming popular with the use of attention maps to visualize hidden layers’ relationships, providing transparent prediction outcomes. It is recommended that future ML algorithms provide accurate results, as well as meaningful and interpretable information. Deeper insights into the gait events of joint kinematics and muscle activation parameters can guide clinicians in decision-making for suitable treatment strategies.
4.5. Feature Identification
Feature extraction and selection are integral parts of ML algorithm design to improve prediction performance. In the context of HOA gait studies, a pending research task is the determination of significant features that describe HOA or THA.
Regarding ST gait parameters, the articles included aimed to corroborate results from previous analytical studies through supervised methods. While gait speed and stride time were found to be the most important features [20,21,39], there are other parameters reported that could be relevant and need to be researched. Accordingly, the study by Miyazaki et al. [38] focused on predicting the most significant ST gait parameters for subjects who underwent unilateral THA due to HOA through principal component analysis (PCA). Among sixteen parameters, three components were identified that account for more than 90% of the contribution: walking ability, stance phase, and asymmetry of support time. Multiple linear regression analysis further reveals that these are the most influential factors for clinical decisions.
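For readers unfamiliar with the term, the cumulative contribution in PCA is usually defined from the eigenvalues of the covariance (or correlation) matrix of the input parameters. The formulation below is the standard one and is assumed, not confirmed, to match the computation in [38]; the symbols are introduced here for illustration, with p = 16 parameters and m = 3 retained components.

```latex
% \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p are the eigenvalues of the
% covariance (or correlation) matrix of the p input parameters.
\[
  r_k = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j},
  \qquad
  C_m = \sum_{k=1}^{m} r_k ,
  \qquad
  \text{here } p = 16,\; m = 3,\; C_3 > 0.90 .
\]
```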
Likewise, there is progress towards consensus on the kinematic outcomes that can be utilized for diagnosis and prediction in HOA patients. The sagittal kinematic angles of the hip, knee, and foot are considered highly relevant [15,16,17,33,35,36]. Aside from the parameter itself, it is important to determine the gait event that leads to the correct prediction. This type of feature selection was explored by Teufl et al. [22]. It is recommended to utilize the whole trajectory as a feature, with an interpretable method then employed to detect the most important event within a single gait cycle. Hence, the input feature can be mapped to the output prediction, providing an explanation for the resulting classification.
Furthermore, a less explored gait feature is based on EMG patterns of muscle activation. Nair et al. [18] investigated ML techniques for classifying healthy and arthritic subjects, the latter comprising both HOA and rheumatoid arthritis. Simple ML algorithms were employed in that study more than a decade ago, and the gluteus medius muscle was identified as important. Even with the rapid advancement of AI in the past decade, only a few GA studies have utilized EMG inputs [53,54]. Thus, it would be beneficial to apply existing DL frameworks to EMG patterns and validate the results to further identify hidden but relevant patterns.
From these results, utilizing a multi-modal system in a common ML platform appears beneficial. Kinetic waveforms can augment other parameters, potentially explaining the relevance of gait events. Additionally, discarded features may hold clinical significance, and they could be correlated with established and more prominent gait attributes. Interdependencies between features have the potential to be discovered and verified. This will provide crucial information for personalized rehabilitation strategies and further enhance the applicability of ML models.
Conversely, two articles in this review did not utilize clinical datasets. Instead, smartphones were employed to gather gait information in order to predict the outcomes of THA. Polus et al. [23] developed a fall risk prediction model, while Surmacz et al. [24] developed a multi-class model for the prediction of recovery and rehabilitation based only on gait speed. These real-world applications are the next step in monitoring patients outside clinical settings.
5. Conclusions
ML algorithms show promising results in gait analysis for classifying and predicting the medical conditions of individuals with HOA and/or after THA. The majority of the reviewed articles were published in the past five years, underscoring the evolving nature of the research field.
With recent advancements in DL frameworks, classification studies are achieving higher performance and moving towards reliable platforms. Reports on the identification of key gait parametric outcomes are reaching a consensus, which can improve HOA diagnoses and follow-up conditions in post-THA subjects. However, small sample sizes, inadequate validation processes, and a lack of focus on multi-class studies utilizing pre- and post-THA gaits in a single protocol are ongoing challenges that considerably affect clinical utility and generalizability. Generating synthetic gait data through generative models is our planned research direction to mitigate these limitations. Another key characteristic that needs to be addressed is the interpretability of proposed ML models, which has seldom been explored. Future research activities should explore XAI and attention-based models that provide information on the hidden layers, thus increasing transparency.
The findings of this systematic review improve the understanding of the current state of ML research in the context of gait analysis applied to HOA and post-THA subjects. Furthermore, the synthesis of current evidence may guide the future development of clinically interpretable AI-assisted gait assessment methodologies.
Author Contributions
Conceptualization, R.P. and M.S. (Milan Simic); methodology, R.P.; validation, R.P., M.S. (Milan Simic) and M.S. (Mohamed Salih); investigation, R.P. and M.S. (Mohamed Salih); writing—original draft preparation, R.P.; writing—review and editing, R.P., M.S. (Mohamed Salih) and M.S. (Milan Simic); supervision, M.S. (Milan Simic). All authors have read and agreed to the published version of the manuscript.
Funding
This project is funded through an Australian Government grant for research in the Development of New AI methodologies on gait analysis for biomedical, sports, and other applications.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data was created.
Acknowledgments
The first author, Roel Pantonial, acknowledges the Research Training Program scholarship awarded by the Australian Government. The authors also acknowledge Milena Simic, Co-Lead of the Neuro-Musculoskeletal Research Collaborative Central Sydney (Patyegarang) Precinct, The University of Sydney, Discipline of Physiotherapy | School of Health Sciences | Faculty of Medicine and Health, for initial guidance and consultation in the medical domain and on the PRISMA methodology for conducting the systematic review.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
In the interest of transparency, the detailed search strings used for each database are shown in Table A1.
Table A1.
Detailed search strings.
| Database | Search Strings |
|---|---|
| PubMed | (“osteoarthritis, hip” [MeSH Terms] OR “arthroplasty, replacement, hip” [MeSH Terms] OR “Hip Prosthesis” [MeSH Terms]) AND (“walking” [MeSH Terms] OR “gait” [MeSH Terms]) AND (“machine learning” [MeSH Terms] OR “classification” [MeSH Terms] OR “prediction algorithms” [MeSH Terms] OR “cluster analysis” [MeSH Terms] OR “Regression Analysis” [MeSH Terms] OR “Biometric Identification” [MeSH Terms]) |
| Embase | (((‘hip osteoarthritis’)/exp) OR ((‘hip arthroplasty’)/exp) OR ((‘hip replacement’)/exp) OR ((‘hip prosthesis’)/exp)) AND (((‘walking’)/exp) OR ((‘gait’)/exp)) AND (((‘machine learning’)/exp) OR ((‘classification’)/exp) OR ((‘predictive model’)/exp) OR ((‘cluster analysis’)/exp) OR ((‘regression model’)/exp) OR ((‘biometry’)/exp)) |
| IEEE Xplore | (“All Metadata”: “hip osteoarthritis” OR “All Metadata”: “hip arthroplasty” OR “All Metadata”: “hip replacement” OR “All Metadata”: “hip prosthesis”) AND (“All Metadata”: “walking” OR “All Metadata”: “gait”) AND (“All Metadata”: “machine learning” OR “All Metadata”: “deep learning” OR “All Metadata”: “classification” OR “All Metadata”: “prediction” OR “All Metadata”: “clustering” OR “All Metadata”: “regression” OR “All Metadata”: “biometric”) |
| ACM Digital Library | [[All: “hip osteoarthritis”] OR [All: “hip arthroplasty”] OR [All: “hip replacement”] OR [All: “hip prosthesis”]] AND [[All: “walking”] OR [All: “gait”]] AND [[All: “machine learning”] OR [All: “deep learning”] OR [All: “classification”] OR [All: “prediction”] OR [All: “cluster”] OR [All: “regression”] OR [All: “biometric”]] |
| Scopus | TITLE-ABS-KEY((“hip osteoarthritis” OR “hip arthroplasty” OR “hip replacement” OR “hip prosthesis”) AND (“walking” OR “gait”) AND (“machine learning” OR “deep learning” OR “classification” OR “prediction algorithm” OR “clustering analysis” OR “regression model” OR “biometrics information”)) |
References
1. Murphy, L.B.; Helmick, C.G.; Schwartz, T.A.; Renner, J.B.; Tudor, G.; Koch, G.G.; Dragomir, A.D.; Kalsbeek, W.D.; Luta, G.; Jordan, J.M. One in four people may develop symptomatic hip osteoarthritis in his or her lifetime. Osteoarthr. Cartil. 2010, 18, 1372–1379.
2. Ornetti, P.; Maillefert, J.-F.; Laroche, D.; Morisset, C.; Dougados, M.; Gossec, L. Gait analysis as a quantifiable outcome measure in hip or knee osteoarthritis: A systematic review. Jt. Bone Spine 2010, 77, 421–425.
3. Whittle, M.W. Gait Analysis: An Introduction; Butterworth-Heinemann: Oxford, UK, 2014.
4. Ackerman, I.N.; Bohensky, M.A.; de Steiger, R.; Brand, C.A.; Eskelinen, A.; Fenstad, A.M.; Furnes, O.; Graves, S.E.; Haapakoski, J.; Mäkelä, K.; et al. Lifetime risk of primary total hip replacement surgery for osteoarthritis from 2003 to 2013: A multinational analysis using national registry data. Arthritis Care Res. 2017, 69, 1659–1667.
5. Beaulieu, M.L.; Lamontagne, M.; Beaulé, P.E. Lower limb biomechanics during gait do not return to normal following total hip arthroplasty. Gait Posture 2010, 32, 269–273.
6. Xuan, A.; Chen, H.; Chen, T.; Li, J.; Lu, S.; Fan, T.; Zeng, D.; Wen, Z.; Ma, J.; Hunter, D.; et al. The application of machine learning in early diagnosis of osteoarthritis: A narrative review. Ther. Adv. Musculoskelet. Dis. 2023, 15, 1759720X231158198.
7. Lee, S.Y.; Park, S.J.; Gim, J.-A.; Kang, Y.J.; Choi, S.H.; Seo, S.H.; Kim, S.J.; Kim, S.C.; Kim, H.S.; Yoo, J.-I. Correlation between Harris hip score and gait analysis through artificial intelligence pose estimation in patients after total hip arthroplasty. Asian J. Surg. 2023, 46, 5438–5443.
8. Sepas-Moghaddam, A.; Etemad, A. Deep gait recognition: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 264–284.
9. Constantinou, M.; Loureiro, A.; Carty, C.; Mills, P.; Barrett, R. Hip joint mechanics during walking in individuals with mild-to-moderate hip osteoarthritis. Gait Posture 2017, 53, 162–167.
10. Leigh, R.J.; Osis, S.T.; Ferber, R. Kinematic gait patterns and their relationship to pain in mild-to-moderate hip osteoarthritis. Clin. Biomech. 2016, 34, 12–17.
11. Longworth, J.A.; Chlosta, S.; Foucher, K.C. Inter-joint coordination of kinematics and kinetics before and after total hip arthroplasty compared to asymptomatic subjects. J. Biomech. 2018, 72, 180–186.
12. Fujii, J.; Aoyama, S.; Tezuka, T.; Kobayashi, N.; Kawakami, E.; Inaba, Y. Prediction of change in pelvic tilt after total hip arthroplasty using machine learning. J. Arthroplast. 2023, 38, 2009–2016.e3.
13. Khan, A.; Galarraga, O.; Garcia-Salicetti, S.; Vigneron, V. Deep learning for quantified gait analysis: A systematic literature review. IEEE Access 2024, 12, 138932–138957.
14. Alharthi, A.S.; Yunas, S.U.; Ozanyan, K.B. Deep learning for monitoring of human gait: A review. IEEE Sens. J. 2019, 19, 9575–9591.
15. Laroche, D.; Tolambiya, A.; Morisset, C.; Maillefert, J.F.; French, R.M.; Ornetti, P.; Thomas, E. A classification study of kinematic gait trajectories in hip osteoarthritis. Comput. Biol. Med. 2014, 55, 42–48.
16. Pantonial, R.; Simic, M. Transfer Learning Method for the Classification of Hip Osteoarthritis using Kinematic Gait Parameters. Procedia Comput. Sci. 2024, 246, 4692–4701.
17. Pantonial, R.; Simic, M. Novel Deep Learning Method in Hip Osteoarthritis Investigation Before and After Total Hip Arthroplasty. Appl. Sci. 2025, 15, 872.
18. Nair, S.S.; French, R.M.; Laroche, D.; Thomas, E. The Application of Machine Learning Algorithms to the Analysis of Electromyographic Patterns From Arthritic Patients. IEEE Trans. Neural Syst. Rehabil. Eng. 2010, 18, 174–184.
19. Ghaffari, A.; Clasen, P.D.; Boel, R.V.; Kappel, A.; Jakobsen, T.; Rasmussen, J.; Kold, S.; Rahbek, O. Multivariable model for gait pattern differentiation in elderly patients with hip and knee osteoarthritis: A wearable sensor approach. Heliyon 2024, 10, e36825.
20. Altilio, R.; Paoloni, M.; Panella, M. Selection of clinical features for pattern recognition applied to gait analysis. Med. Biol. Eng. Comput. 2017, 55, 685–695.
21. Ghidotti, A.; Regazzoni, D.; Rizzi, C.; Fiorentino, G. Applying Machine Learning to Gait Analysis Data for Hip Osteoarthritis Diagnosis. Stud. Health Technol. Inform. 2025, 324, 152–157.
22. Teufl, W.; Taetz, B.; Miezal, M.; Lorenz, M.; Pietschmann, J.; Jöllenbeck, T.; Fröhlich, M.; Bleser, G. Towards an Inertial Sensor-Based Wearable Feedback System for Patients after Total Hip Arthroplasty: Validity and Applicability for Gait Classification with Gait Kinematics-Based Features. Sensors 2019, 19, 5006.
23. Polus, J.S.; Bloomfield, R.A.; Vasarhelyi, E.M.; Lanting, B.A.; Teeter, M.G. Machine Learning Predicts the Fall Risk of Total Hip Arthroplasty Patients Based on Wearable Sensor Instrumented Performance Tests. J. Arthroplast. 2021, 36, 573–578.
24. Surmacz, K.; Redfern, R.E.; Van Andel, D.C.; Kamath, A.F. Machine learning model identifies patient gait speed throughout the episode of care, generating notifications for clinician evaluation. Gait Posture 2024, 114, 62–68.
25. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
26. Franco, A.; Russo, M.; Amboni, M.; Ponsiglione, A.M.; Di Filippo, F.; Romano, M.; Amato, F.; Ricciardi, C. The Role of Deep Learning and Gait Analysis in Parkinson’s Disease: A Systematic Review. Sensors 2024, 24, 5957.
27. Jiao, Y.; Hart, R.; Reading, S.; Zhang, Y. Systematic Review of Automatic Post-Stroke Gait Classification Systems. Gait Posture 2024, 109, 259–270.
28. Kohnehshahri, F.S.; Merlo, A.; Mazzoli, D.; Bò, M.C.; Stagni, R. Machine learning applied to gait analysis data in cerebral palsy and stroke: A systematic review. Gait Posture 2024, 111, 105–121.
29. Kokkotis, C.; Chalatsis, G.; Moustakidis, S.; Siouras, A.; Mitrousias, V.; Tsaopoulos, D.; Patikas, D.; Aggelousis, N.; Hantes, M.; Giakas, G.; et al. Identifying gait-related functional outcomes in post-knee surgery patients using machine learning: A systematic review. Int. J. Environ. Res. Public Health 2022, 20, 448.
30. Liberati, A.; Altman, D.G.; Tetzlaff, J.; Mulrow, C.; Gøtzsche, P.C.; Ioannidis, J.P.; Clarke, M.; Devereaux, P.J.; Kleijnen, J.; Moher, D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: Explanation and elaboration. BMJ 2009, 339, b2700.
31. Cornish, B.M.; Pizzolato, C.; Saxby, D.J.; Xia, Z.; Devaprakash, D.; Diamond, L.E. Hip contact forces can be predicted with a neural network using only synthesised key points and electromyography in people with hip osteoarthritis. Osteoarthr. Cartil. 2024, 32, 730–739.
32. Ahn, S.; Choi, W.; Jeong, H.; Oh, S.; Jung, T.D. One-Step Gait Pattern Analysis of Hip Osteoarthritis Patients Based on Dynamic Time Warping through Ground Reaction Force. Appl. Sci. 2023, 13, 4665.
33. Choi, W.; Jeong, H.; Oh, S.; Jung, T.D. Instant gait classification for hip osteoarthritis patients: A non-wearable sensor approach utilizing Pearson correlation, SMAPE, and GMM. Biomed. Eng. Lett. 2025, 15, 301–310.
34. Dindorf, C.; Teufl, W.; Taetz, B.; Becker, S.; Bleser, G.; Fröhlich, M. Feature extraction and gait classification in hip replacement patients on the basis of kinematic waveform data. Biomed. Hum. Kinet. 2021, 13, 177–186.
35. Dindorf, C.; Teufl, W.; Taetz, B.; Bleser, G.; Fröhlich, M. Interpretability of Input Representations for Gait Classification in Patients after Total Hip Arthroplasty. Sensors 2020, 20, 4385.
36. Teufl, W.; Taetz, B.; Miezal, M.; Dindorf, C.; Fröhlich, M.; Trinler, U.; Hogan, A.; Bleser, G. Automated detection and explainability of pathological gait patterns using a one-class support vector machine trained on inertial measurement unit based gait data. Clin. Biomech. 2021, 89, 105452.
37. Dammeyer, C.; Nüesch, C.; Visscher, R.M.S.; Kim, Y.K.; Ismailidis, P.; Wittauer, M.; Stoffel, K.; Acklin, Y.; Egloff, C.; Netzer, C.; et al. Classification of inertial sensor-based gait patterns of orthopaedic conditions using machine learning: A pilot study. J. Orthop. Res. 2024, 42, 1463–1472.
38. Miyazaki, S.; Fujii, Y.; Tsuruta, K.; Yoshinaga, S.; Hombu, A.; Funamoto, T.; Sakamoto, T.; Tajima, T.; Arakawa, H.; Kawaguchi, T.; et al. Spatiotemporal gait characteristics post-total hip arthroplasty and its impact on locomotive syndrome: A before-after comparative study in hip osteoarthritis patients. PeerJ 2024, 12, e18351.
39. Almuhammadi, W.S.; Agu, E.; King, J.; Franklin, P. OA-pain-sense: Machine learning prediction of hip and knee osteoarthritis pain from IMU data. Informatics 2022, 9, 97.
40. Bertaux, A.; Gueugnon, M.; Moissenet, F.; Orliac, B.; Martz, P.; Maillefert, J.F.; Ornetti, P.; Laroche, D. Gait analysis dataset of healthy volunteers and patients before and 6 months after total hip arthroplasty. Sci. Data 2022, 9, 399.
41. Halilaj, E.; Rajagopal, A.; Fiterau, M.; Hicks, J.L.; Hastie, T.J.; Delp, S.L. Machine learning in human movement biomechanics: Best practices, common pitfalls, and new opportunities. J. Biomech. 2018, 81, 1–11.
42. Dindorf, C.; Dully, J.; Konradi, J.; Wolf, C.; Becker, S.; Simon, S.; Huthwelker, J.; Werthmann, F.; Kniepert, J.; Drees, P.; et al. Enhancing biomechanical machine learning with limited data: Generating realistic synthetic posture data using generative artificial intelligence. Front. Bioeng. Biotechnol. 2024, 12, 1350135.
43. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.-W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
44. Staudenmayer, J.; Zhu, W.; Catellier, D.J. Statistical considerations in the analysis of accelerometry-based activity monitor data. Med. Sci. Sports Exerc. 2012, 44 (Suppl. S1), S61–S67.
45. Lee, L.S.; Chan, P.K.; Wen, C.; Fung, W.C.; Cheung, A.; Chan, V.W.K.; Cheung, M.H.; Fu, H.; Yan, C.H.; Chiu, K.Y. Artificial intelligence in diagnosis of knee osteoarthritis and prediction of arthroplasty outcomes: A review. Arthroplasty 2022, 4, 16.
46. Lavazza, L.; Morasca, S. Common problems with the usage of f-measure and accuracy metrics in medical research. IEEE Access 2023, 11, 51515–51526.
47. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
48. Kennedy, J.; Sharma, V.; Varghese, B.; Reaño, C. Multi-tier GPU virtualization for deep learning in cloud-edge systems. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 2107–2123.
49. Fisher, A.; Rudin, C.; Dominici, F. All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res. 2019, 20, 1–81.
50. Hosain, M.T.; Jim, J.R.; Mridha, M.; Kabir, M.M. Explainable AI approaches in deep learning: Advancements, applications and challenges. Comput. Electr. Eng. 2024, 117, 109246.
51. Slijepcevic, D.; Horst, F.; Lapuschkin, S.; Horsak, B.; Raberger, A.-M.; Kranzl, A.; Samek, W.; Breiteneder, C.; Schöllhorn, W.I.; Zeppelzauer, M. Explaining Machine Learning Models for Clinical Gait Analysis. ACM Trans. Comput. Healthc. 2021, 3, 14.
52. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
53. Rezaee, K.; Savarkar, S.; Yu, X.; Zhang, J. A hybrid deep transfer learning-based approach for Parkinson’s disease classification in surface electromyography signals. Biomed. Signal Process. Control 2022, 71, 103161.
54. Romano, F.; Formenti, D.; Cardone, D.; Russo, E.F.; Castiglioni, P.; Merati, G.; Merla, A.; Perpetuini, D. Data-driven identification of stroke through machine learning applied to complexity metrics in multimodal electromyography and kinematics. Entropy 2024, 26, 578.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).