Machine Learning-Based Peripheral Artery Disease Identification Using Laboratory-Based Gait Data

Al-Ramini, Ali; Hassan, Mahdi; Fallahtafti, Farahnaz; Takallou, Mohammad Ali; Rahman, Hafizur; Qolomany, Basheer; Pipinos, Iraklis I.; Alsaleem, Fadi; Myers, Sara A.

doi:10.3390/s22197432

by Ali Al-Ramini ¹, Mahdi Hassan ²,³, Farahnaz Fallahtafti ²,³, Mohammad Ali Takallou ⁴, Hafizur Rahman ², Basheer Qolomany ⁵, Iraklis I. Pipinos ³,⁶, Fadi Alsaleem ⁴ and Sara A. Myers ²,³,*

¹ Mechanical Engineering Department, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
² Department of Biomechanics, University of Nebraska at Omaha, Omaha, NE 6160, USA
³ Department of Surgery and VA Research Service, VA Nebraska-Western Iowa Health Care System, Omaha, NE 68105, USA
⁴ Durham School of Architectural Engineering and Construction, University of Nebraska–Lincoln, Omaha, NE 68182, USA
⁵ Cyber Systems Department, University of Nebraska at Kearney, Kearney, NE 68849, USA
⁶ Department of Surgery, University of Nebraska Medical Center, Omaha, NE 68105, USA
* Author to whom correspondence should be addressed.

Sensors 2022, 22(19), 7432; https://doi.org/10.3390/s22197432

Submission received: 17 August 2022 / Revised: 21 September 2022 / Accepted: 26 September 2022 / Published: 30 September 2022

(This article belongs to the Special Issue Artificial Intelligence-Enabled System for Health and Biomechanical Monitoring)


Abstract

Peripheral artery disease (PAD) manifests from atherosclerosis, which limits blood flow to the legs and causes changes in muscle structure and function, and in gait performance. PAD is underdiagnosed, which delays treatment and worsens clinical outcomes. To overcome this challenge, the purpose of this study is to develop machine learning (ML) models that distinguish individuals with and without PAD. This is the first step to using ML to identify those with PAD risk early. We built ML models based on previously acquired overground walking biomechanics data from patients with PAD and healthy controls. Gait signatures were characterized using ankle, knee, and hip joint angles, torques, and powers, as well as ground reaction forces (GRF). ML was able to classify those with and without PAD using Neural Networks or Random Forest algorithms with 89% accuracy (0.64 Matthew’s Correlation Coefficient) using all laboratory-based gait variables. Moreover, models using only GRF variables provided up to 87% accuracy (0.64 Matthew’s Correlation Coefficient). These results indicate that ML models can classify those with and without PAD using gait signatures with acceptable performance. Results also show that an ML gait signature model that uses GRF features delivers the most informative data for PAD classification.

Keywords:

peripheral artery disease; vascular disease; machine learning; gait analysis; deep learning

1. Introduction

Peripheral artery disease (PAD) is a cardiovascular disease caused by atherosclerosis, which limits blood flow to the arteries and tissues. PAD affects up to 10% of Americans over 40 years of age [1,2,3,4]. The number of patients with PAD has increased, making PAD the third most common atherosclerotic cardiovascular disease after coronary artery disease and stroke [5,6,7,8,9]. The most prevalent symptom of PAD is intermittent claudication, defined as ischemic pain that develops when working leg muscles do not have adequate oxygen [10]. Patients with PAD become progressively more sedentary [11,12] and have altered mobility [13,14,15,16,17,18,19,20]. Moreover, functional impairment frequently occurs before PAD diagnosis, and unidentified, asymptomatic PAD is associated with more adverse outcomes than intermittent claudication [21].

Diagnosing PAD early would enable treatment to slow disease progression, which would decrease the risk of major cardiovascular events [21]. However, 40–60% of patients with PAD go undiagnosed in a primary care setting [21]. The standard diagnostic method, the ankle-brachial index (ABI), is a highly specialized test that is costly and requires technologists with training in a specialized vascular lab setting [12,22,23,24]. Sheng et al. reported that pulse wave measurements could detect PAD as accurately as ABI, but the pulse wave recording technique could be affected by physiological limitations [25]. A pulse wave depends on peripheral blood flow and may be affected by sympathetic nerve input rather than vessel patency [25]. In addition, severe congestive heart failure can simulate inflow disease by reducing blood flow [25]. A review study suggested that although pulse wave velocity measurements are reliable hemodynamic measures for detecting PAD, further research is needed to establish their screening and diagnostic validity [26]. The diagnosis of PAD is challenging because no distinctive sign helps physicians distinguish PAD from the typical signs of aging and other movement-related health conditions. A non-invasive screening approach that physicians could use to identify individuals at higher risk of PAD during daily activities is needed.

Recent research has implemented a data-driven approach using machine learning (ML) to identify patients with PAD [27,28,29,30]. ML models have been implemented for PAD diagnosis using blood samples and Doppler data [16]; clinical records [17]; symptom surveys, interviews, and walking distances [18]; and arterial pulse waveforms [19]. While some of these diagnostic models achieved accuracies up to 87%, significant limitations arise in terms of time required (e.g., multiple years of medical records), resources (e.g., protein-based lab setting and interviews that are not standard of care), and involvement of experts with advanced training to obtain the required data to train the models. A model using the six-minute walk test and symptom scores has fewer barriers but led to a compromised accuracy of 69%, and it still required detailed physician evaluation to gather symptom scores [18]. ML and Neural Networks have also been used to automate the classification of arterial segments affected following PAD diagnosis. This approach used computer vision algorithms with Doppler waveforms and PAD imaging but also required manual adjustment of images, which is time consuming [31,32]. Deep learning-based arterial pulse waveform analysis was also used to detect and estimate PAD severity, but this test is not easier to access than an ankle-brachial index test [30]. Other research developed ML algorithms to identify PAD and predict the mortality risk using complete clinical, imaging, demographic, and genomic information for each patient [28]. The resulting machine-learned models surpassed stepwise logistic regression models to identify patients with PAD and predict future mortality. However, the models depended on the availability of multiple clinical information collected simultaneously, which may not be available in practice and would only be helpful after a PAD diagnosis [28].

ML has recently been applied to ground reaction forces and joint angles to characterize gait in individuals [33], including those with Parkinson’s disease [34,35,36], but not for PAD. Gait analysis has proven crucial in determining the mechanisms and severity of functional limitations, measuring treatment effectiveness, and monitoring the progression of PAD [18,37,38]. For instance, patients with PAD walk slower, have decreased step length while walking before and after pain onset, and spend more time in the double support phase compared to older healthy controls [18,19,20,39]. In addition, gait biomechanics studies have found reduced joint angular displacement, velocities, and accelerations in patients with PAD compared to older controls without PAD [40]. Based on the consistently altered gait patterns in patients with PAD [18,19,20,38,41,42,43,44], it is likely that ML can be applied to gait data to identify the presence of PAD. Thus, ML can be valuable for developing gait signatures that enable early PAD detection and monitor functional severity, disease progression, and improvements with treatment.

This paper implements ML models on gait features to distinguish individuals with PAD from healthy older individuals without PAD. The paper is organized as follows. First, we provide a detailed description of the data sources, including gait data and the derived gait features. Next, we provide a preliminary descriptive analysis of gait signatures by studying variance, F-statistic, information gain, and correlation among the gait features. Then, we describe the predictive ML models, including data extraction, grouping, and feeding of the ML model. Finally, we present our ML approach, which uses gait signatures to classify PAD by extracting the most distinguishing gait features. Figure 1 summarizes this paper’s workflow. This work provides a foundation to model PAD gait features from biomechanics data collected in the lab. This modeling approach may inspire extracting those gait features from acceleration measurements taken with wearable devices, which could be worn in real-world settings to identify potential patients with PAD. Thus, the presented work takes an essential first step toward continuously monitoring individuals’ physical and movement behavior. PAD is costly for individuals, governments, and society. These new models could be used to monitor movement in the real world, helping to alert physicians to the potential presence of PAD in general practice, enable in-home detection of worsening PAD symptoms, manage chronic PAD, and predict when significant adverse health events may occur.

2. Data Sources

This section describes the available biomechanics data and the gait feature extraction process for ML applications. Biomechanics data were gathered from studies conducted and approved by the Institutional Review Boards at the University of Nebraska Medical Center and the Nebraska-Western Iowa Veteran Affairs Medical Center. These studies consist of a total of 270 participants, including 227 patients with PAD and 43 healthy older controls.

Experimental tests were conducted in the Biomechanics Research Building gait lab at the University of Nebraska at Omaha. Reflective markers were placed at specific anatomical locations on the lower limbs, utilizing the marker systems of Vaughan [45] and Nigg [46]. Each subject walked at their self-selected pace through a ten-meter pathway containing force platforms set level with the floor. Kinematics data were recorded using a 12-camera high-speed digital motion capture system (100 Hz; Cortex 5.1, Motion Analysis Corp., Rohnert Park, CA, USA), and ground reaction forces were collected using force plates (1000 Hz; AMTI, Watertown, MA, USA). Each patient performed the walking test before (pain-free condition) and after the onset of claudication pain (pain condition). Patients were required to rest one minute between trials to prevent the onset of claudication pain during the pain-free walking condition. Healthy controls only performed the test in the pain-free condition since they do not experience claudication pain. A total of five successful overground walking trials per leg were collected in which heel-strike and toe-off events were within the boundaries of the force plate. Data were exported and processed using custom laboratory codes in MATLAB software (MathWorks Inc., Natick, MA, USA) and Visual 3D software (C-Motion, Inc., Germantown, MD, USA). Visual 3D software was used to calculate the ground reaction forces in vertical, anterior–posterior, and medial–lateral directions, as well as ankle, knee, and hip joint angles, and joint angular velocities during the stance phase of walking. Joint torques and powers were calculated using inverse dynamics for the ankle, knee, and hip joints during the stance phase of walking. Inverse dynamics combines the kinematics and the ground reaction forces, as described by Winter [47]. The joint torques and powers determine the lower extremity joint angles, muscular responses (torques), and contributions (powers) during walking.

From the biomechanics overground time-series data (Appendix A, Figure A1), peak discrete points were extracted from all trials for all subjects. Points included minimums and maximums for joint angles, torques, and powers for the ankle, knee, and hip. There were peak values from the anterior–posterior, medial–lateral, and vertical ground reaction forces. Overall, this resulted in a total of 31 predictive gait features from each trial. The peak discrete points were averaged across the five trials for each subject (Appendix A, Table A1), describing the gait features we used to develop the ML models.
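The extraction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `peak_features` and `subject_features` and the sample curves are hypothetical, and the sketch takes a single minimum and maximum per signal, whereas the study's actual peaks are defined per feature in Table A1.

```python
from statistics import mean

def peak_features(signal):
    """Return the minimum and maximum of one stance-phase time series."""
    return {"min": min(signal), "max": max(signal)}

def subject_features(trials):
    """Average each discrete peak across all of a subject's trials."""
    per_trial = [peak_features(trial) for trial in trials]
    return {key: mean(p[key] for p in per_trial) for key in per_trial[0]}

# Hypothetical joint-angle curves (degrees) for three short trials.
trials = [
    [2.0, 8.0, 12.0, 5.0, -10.0],
    [1.0, 9.0, 14.0, 4.0, -12.0],
    [3.0, 7.0, 13.0, 6.0, -14.0],
]
features = subject_features(trials)
print(features)
```

Averaging after peak extraction, rather than averaging the curves first, keeps each trial's peak magnitude even when peaks occur at slightly different times across trials.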

3. Descriptive Data Analysis

For the preliminary data analysis of gait signatures, we first used statistical methods to provide a descriptive analysis of the data and anticipate the significance of each gait feature by exploring the variance [48], ANOVA F-statistic [49], information gain [50], and correlation [51]. These methods help with feature selection, producing features with high predictive capability for our ML model. Additionally, in our analysis, we used these methods to provide insight into the most important gait feature groups for our classification task (separating patients with PAD from healthy controls). For instance, variance is necessary within the dataset because significant differences in feature variance allow the ML model to learn the different patterns hidden in the data. The F-statistic, or F-test, is a statistical test that calculates the ratio between variance values, such as the variances of two different samples. F-statistics compare and identify relevant features for a classification task. Information gain measures the association between inputs and outputs; thus, the higher the information gain, the better. Moreover, using the correlation in the data to identify and remove redundant features produces a better prediction [52]. We then use the insights from the descriptive analysis to build ML models using sub-groups of the data based on the source of the measurement (ankle, hip, knee, and GRF).
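To make the variance-ratio idea concrete, the following minimal sketch computes a one-way ANOVA F-statistic as between-group variance divided by within-group variance. The function name `anova_f` and the sample values are hypothetical and are not taken from the study's data or code.

```python
from statistics import mean

def anova_f(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical values of one gait feature in two groups.
f_separated = anova_f([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
f_overlapping = anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(f_separated, f_overlapping)
```

A feature whose group means are far apart relative to the spread inside each group gets the larger F value, which is why a higher F-statistic is read as a more relevant feature.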

Our dataset consists of 32 predictive features that include 31 numerical features, as described in Table A1 in Appendix A. First, we use variance analysis to visualize the distribution within each numeric feature and its corresponding presence or absence of PAD. The variance provides insight into the ability of each feature to distinguish between individuals with and without PAD. Because some features in the data are not normally distributed, we used Levene’s test [53], which tolerates non-normal distributions, to detect the features that significantly differed in variance between individuals with and without PAD. Figure 2 shows a boxplot of each numeric feature and its distribution for each group. We divided the figures into four groups based on the gait feature measurement source (ankle, hip, knee, and GRF). The green asterisks mark significant Levene’s test p-values for the difference in variance between patients with PAD and healthy controls. Our results show that GRF features have a higher variance than other features. For example, only one of the gait kinematic (ankle, knee, or hip) features differed in variance, whereas most GRF features did. This suggests that GRF might be a more discriminant gait measurement source for identifying PAD.
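As an illustration of what Levene's test measures, the sketch below computes the test statistic W from absolute deviations around each group mean. The function name `levene_statistic` and the sample values are hypothetical, and mean centering is an assumption because the text does not say which centering was used. Converting W to a p-value requires the F distribution with (k − 1, N − k) degrees of freedom, which a statistics library would normally supply.

```python
from statistics import mean

def levene_statistic(*groups):
    """Levene's W: an ANOVA-style ratio computed on absolute deviations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - mean of group i|
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand_mean) ** 2
                  for zi, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical feature values: same mean, different spread.
w_unequal = levene_statistic([4.0, 5.0, 6.0], [1.0, 5.0, 9.0])
# Different means, identical spread.
w_equal = levene_statistic([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
print(w_unequal, w_equal)
```

The second call returns zero even though the group means differ, which shows that the statistic responds to differences in spread and not to differences in location.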

Another commonly used feature selection method for classification is information gain, which measures the reduction in entropy by splitting a dataset according to a given value of a random variable. Entropy quantifies how much information exists in a random variable, specifically its probability distribution. For example, a skewed distribution has low entropy, whereas a distribution where events have equal probability has a higher entropy. The higher the information gain, the more informative the feature for the model classification capability. Similarly, the F-statistic calculates the ratio between two sample variances, and the higher the F-statistic, the more valuable the data feature. Figure 3 shows the aggregated F-statistic and information gain average based on the measurement source. While we applied our analysis to all gait features, we only show the aggregated average groups of joint measurements for better presentation.
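The entropy and information gain definitions above can be worked through with a short sketch. It assumes a single threshold split on one numeric feature, which is an illustrative simplification and not necessarily the estimator used in the study; the names `entropy` and `information_gain` and the sample values are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of the class-label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part)
                   for part in (left, right) if part)
    return entropy(labels) - weighted

balanced = ["PAD"] * 4 + ["control"] * 4
skewed = ["PAD"] * 7 + ["control"] * 1
print(entropy(balanced), entropy(skewed))  # the skewed distribution is lower

# Hypothetical feature that separates the two classes at 1.5.
values = [0.9, 1.0, 1.1, 1.2, 1.8, 1.9, 2.0, 2.1]
gain_good = information_gain(values, balanced, threshold=1.5)
gain_poor = information_gain(values, balanced, threshold=0.95)
print(gain_good, gain_poor)
```

A threshold that isolates each class removes all uncertainty and recovers the full entropy of the labels, while a threshold that peels off a single sample removes very little.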

The final step before applying ML is determining the correlation between gait features. This can be used to identify and eliminate redundant features. A correlation study between all numeric features shows that GRF features are highly correlated compared to other features (Figure 4), which indicates that only a few GRF features might be sufficient for distinguishing PAD. This finding, along with our observations from the variance analysis (Figure 2) and F-statistic (Figure 3), suggests that a few laboratory-based gait features, e.g., GRF features, can successfully train ML models to identify the presence or absence of PAD. We test this hypothesis in the upcoming ML section.
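One simple way to act on such a correlation study is a greedy filter that keeps a feature only if it is not strongly correlated with any feature already kept. The sketch below is an assumption-laden illustration: the 0.9 cutoff, the function names, and the feature names and values are hypothetical and are not reported in the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns (one value per subject).
features = {
    "braking_peak": [1.0, 2.0, 3.0, 4.0, 5.0],
    "braking_impulse": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to braking_peak
    "knee_flexion_max": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_redundant(features)
print(kept)
```

The result depends on the order in which features are visited, so a practical version would usually rank features first (for example by F-statistic) and filter in that order.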

4. Predictive ML Models to Diagnose PAD

This section explains how we distinguish between patients with PAD and healthy controls. It also demonstrates how the most essential gait features were extracted to identify the presence or absence of PAD. The ML model includes input, data grouping, ML training, ML testing, and performance evaluation (Figure 5). The goal is to find the lowest number of gait features that produce the most accurate ML model to classify individuals as having or not having PAD.

We divided the data into six groups to identify the most important features for classifying PAD (Figure 5). The groups range from including all features (Group 1), to features from only one joint (Groups 2, 3, and 4), all GRF features (Group 5), or all features except GRF features (Group 6). Next, we divided the data into training and testing data sets. Due to the imbalance between the number of healthy controls and patients with PAD, we oversampled the healthy subjects’ data in the training set using the Synthetic Minority Oversampling Technique (SMOTE) algorithm [54]. Subsequently, we applied several ML algorithms to Group 1 training data sets, extracted binary predictions using the test data, and compared these predictions with the original test data. Finally, we used the best algorithms obtained from Group 1 data and evaluated these algorithms for the other groups listed above. We followed this grouping criterion to assess the strength of different data sources (ankle, knee, and hip gait features and GRF) in distinguishing PAD using ML models. The descriptive data analysis in the previous section suggests that GRF gait features might be sufficient for an ML model to distinguish between patients with PAD and healthy controls. Moreover, identifying one valuable gait signature source could minimize the time and computational cost of detecting PAD.
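The core idea behind SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea, not the reference algorithm or the study's code: the function name `smote_like`, the neighbour count `k`, and the sample points are hypothetical. As in the text, it would be applied to the training set only, so that no synthetic samples leak into the test data.

```python
import random
from math import dist

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base sample
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

# Hypothetical scaled feature vectors for four healthy controls.
controls = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_controls = smote_like(controls, n_new=6)
print(len(controls) + len(new_controls))
```

Because every synthetic point lies on a line segment between two real minority samples, the oversampled class stays inside the region the real controls already occupy.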

In terms of ML, we used four well-known algorithms: Neural Networks [55,56], Random Forest [57], Support Vector Machine (SVM) [58], and Logistic Regression [59]. Previous research used these algorithms in many classification tasks for diagnostic applications in the medical fields [31,32,60,61,62]. We ran each group to achieve the best predictions and find the minimum features that produced acceptable performance. Specifically, we built the ML models to deliver the best possible performance using Group 1 data (all features), which we treated as a benchmark for comparing our predictions using the other groups.
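The six feature groups can be expressed as column subsets of one feature table, with the all-features group serving as the benchmark. In this sketch every column name is hypothetical (the real features are listed in Table A1), and the dictionary keys are descriptive labels because the text does not state which joint corresponds to Groups 2, 3, and 4.

```python
# Hypothetical column names grouped by measurement source.
SOURCES = {
    "ankle": ["ankle_angle_max", "ankle_torque_max", "ankle_power_max"],
    "knee": ["knee_angle_max", "knee_torque_max", "knee_power_max"],
    "hip": ["hip_angle_max", "hip_torque_max", "hip_power_max"],
    "grf": ["grf_braking_peak", "grf_propulsive_peak", "grf_vertical_peak"],
}

def build_groups(sources):
    """Return the six feature subsets evaluated against the benchmark."""
    joints = sources["ankle"] + sources["knee"] + sources["hip"]
    return {
        "all": joints + sources["grf"],  # Group 1, the benchmark
        "ankle": list(sources["ankle"]),  # single-joint groups
        "knee": list(sources["knee"]),
        "hip": list(sources["hip"]),
        "grf": list(sources["grf"]),      # Group 5
        "joints_only": joints,            # Group 6, everything except GRF
    }

groups = build_groups(SOURCES)
for name, columns in groups.items():
    print(name, len(columns))
```

Keeping the groups as plain column lists lets the same training and evaluation routine run unchanged on each subset.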

We used TensorFlow to build the Neural Networks models [63], and the scikit-learn library [64] was used to create the Random Forest, SVM, and logistic regression models in Python [65] (the source code for each ML model can be found in the Supplementary Materials). Then, we used the grid search method [66] and manually tuned the hyperparameters to produce the best performance for each model. Hyperparameters refer to parameters within each ML model that require optimization to produce the best possible prediction result. It is noteworthy that Neural Networks models require many parameters to be assigned to form the model architecture, including the number of hidden layers, number of neurons in each hidden layer, activation functions, optimizers, and other hyperparameters. For example, the architecture of our Neural Networks includes activation functions before each hidden layer, five hidden layers, and an output activation function (Sigmoid) for binary classification.
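Grid search itself is a simple exhaustive loop: every combination of candidate hyperparameter values is scored, and the best-scoring combination is kept. The sketch below shows only that loop. The grid values, the function names, and `toy_score` (a stand-in for a real validation score such as cross-validated Matthew's Correlation Coefficient) are hypothetical and are not the study's settings.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every hyperparameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    """Stand-in for training a model and returning its validation score."""
    return (-((params["n_estimators"] - 200) ** 2) / 1e4
            - (params["max_depth"] - 5) ** 2)

# Hypothetical Random Forest grid: 3 x 3 = 9 combinations.
grid = {"n_estimators": [100, 200, 400], "max_depth": [3, 5, 10]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The number of combinations grows multiplicatively with each added hyperparameter, which is one reason models with many hyperparameters, such as neural networks, take longer to tune.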

Random Forest, SVM, and logistic regression ML algorithms require a relatively short building and tuning time to produce the best possible model for our specific data combinations. On the other hand, Neural Networks models need more time to build and tune because they have many hyperparameters (Table 1). Neural Networks also require longer training time compared to Random Forest, SVM, and logistic regression approaches. Nevertheless, previous research has shown Neural Networks to be a valuable and accurate tool in classification tasks [56]. Each algorithm has its own hyperparameters (Table 1). (The best possible hyperparameters for each model can be found in the Supplementary Materials.)

In the current study, 81% of the total data came from patients with PAD. Therefore, we oversampled the healthy subjects’ data and used multiple performance metrics to add to the accuracy metric (the number of predictions correctly predicted divided by the total number of examples) to avoid over-optimistic results [67]. Here, we provide an overview of some of the metrics we use. These include Matthew’s Correlation Coefficient, Discriminant Power, and Geometric Mean [67,68].

Matthew’s Correlation Coefficient is calculated based on four scores (true positive, true negative, false positive, and false negative), known as confusion matrix scores. Matthew’s Correlation Coefficient provides a good score (usually between 0.5 and 1) only if the model performs well in all four confusion matrix categories, and it is the metric least influenced by data imbalance. Thus, we use Matthew’s Correlation Coefficient as the primary comparison method between the models. Matthew’s Correlation Coefficient values range from −1 to 1, with 1 indicating a perfect model, −1 the worst model, and 0 a model no better than a random naïve model [67,68]. The Geometric Mean measures the balance of the classification performance in the majority (PAD) and minority (healthy control) classes. It also helps avoid overfitting in negative and underfitting in positive classes. Finally, Discriminant Power measures the ability of the classifier to distinguish between minority and majority cases. A higher Discriminant Power value translates to better model performance.
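The sketch below computes these metrics from confusion matrix counts using their commonly cited definitions, with PAD as the positive class. It is an illustration, not the study's code: the function name `imbalance_metrics` and the counts are hypothetical, and the natural logarithm in Discriminant Power is an assumption.

```python
from math import log, pi, sqrt

def imbalance_metrics(tp, tn, fp, fn):
    """Accuracy, MCC, Geometric Mean, and Discriminant Power from counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    g_mean = sqrt(sensitivity * specificity)
    # Discriminant Power is undefined when either rate is exactly 0 or 1.
    if 0 < sensitivity < 1 and 0 < specificity < 1:
        dp = sqrt(3) / pi * (log(sensitivity / (1 - sensitivity))
                             + log(specificity / (1 - specificity)))
    else:
        dp = None
    return {"accuracy": accuracy, "mcc": mcc, "g_mean": g_mean, "dp": dp}

# Hypothetical test set: 45 patients with PAD, 9 healthy controls.
naive = imbalance_metrics(tp=45, tn=0, fp=9, fn=0)   # labels everyone as PAD
better = imbalance_metrics(tp=42, tn=7, fp=2, fn=3)
print(naive)
print(better)
```

The classifier that labels every subject as PAD still reaches an accuracy above 0.8 on this imbalanced set, yet its Matthew's Correlation Coefficient and Geometric Mean are both zero, which illustrates why accuracy alone can be over-optimistic.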

5. ML Models Results

The predictions of Group 1, which included all gait features, yielded higher performance values in all metrics with the Neural Networks and Random Forest than with SVM and logistic regression (Figure 6). Therefore, in the next step, we applied Neural Networks and Random Forest models to the rest of the groups and compared the predictive performances with Group 1 predictions to measure the ability of the models to classify PAD using a few laboratory-based gait features.

In Groups 2, 3, and 4, which included the gait data generated from one joint (ankle, knee, or hip), the model predictions show that Neural Networks models were more accurate than Random Forest (Table 2). However, all these models were less accurate than the GRF model (Group 5). Group 5 GRF data yielded a comparable prediction accuracy to Group 1 (Figure 7). GRF-based Random Forest predictions had higher Discriminant Power and Geometric Mean values than Group 1 predictions (Table 2). Moreover, all the models that included GRF (Groups 1 and 5) performed better than the models built to distinguish PAD with gait data from only one joint. Interestingly, when all the body joints data were combined (Group 6), the GRF models still performed better, indicating that GRF is a crucial measurement factor in classifying PAD.

Finally, we explored using a few GRF-based gait features by dividing GRF data into three sub-groups based on the signal origins (X: anteroposterior component, Y: mediolateral component, Z: vertical component). GRF-anteroposterior produces five gait features, GRF-mediolateral produces two gait features, and GRF-vertical produces three features (Table A1). GRF-anteroposterior performed better than GRF-mediolateral and GRF-vertical in identifying the presence of PAD (Figure 8). However, using all GRF components still provided better predictive results, indicating that limiting the GRF components compromised the model’s ability to identify PAD.

6. Discussion

We have described a proof-of-concept application of ML to classify individuals as having or not having PAD using laboratory-based gait features. The model used a dataset of 227 previously diagnosed patients with PAD and 43 healthy controls. Our preliminary analysis based on variance, F-statistic, and information gain (Figure 2 and Figure 3) suggested that GRF gait features hold the most valuable information to classify individuals compared to the joint angle features (ankle, knee, and hip).

Our results (Figure 6 and Figure 7; Table 2) suggest that individuals with PAD have distinct gait signatures compared to healthy individuals, and this ML approach may be helpful in the early identification of PAD. In addition, our findings indicate that classification of PAD is possible using a few laboratory-based gait features. For instance, the Random Forest model based on Group 5 GRF data (Matthew’s Correlation Coefficient: 0.64) performed similarly to the model using all gait biomechanics features available in Group 1. Additionally, the Neural Networks GRF-based model (Group 5; Matthew’s Correlation Coefficient: 0.57) yielded prediction quality comparable to Group 1 (Matthew’s Correlation Coefficient: 0.64). On the other hand, data from a single joint (ankle, knee, or hip) produced less accurate results than Group 1. Future research can utilize this information to build a model that monitors disease severity and progression.

Furthermore, we showed the best ML model to handle the predictions for each group. Models using Neural Networks and Random Forest classified individuals with and without PAD using all gait features. The model using the Neural Networks approach produced the highest performance values when using all gait features, while the Random Forest-based model generated the best result for PAD classification using GRF gait features. Finally, we tested the ability of a few GRF laboratory gait features to classify PAD. While some GRF features performed better than the ankle, hip, or knee data alone, the models still lost essential information to provide high-quality predictions, making the model that included all gait features the best.

Based on gait signatures, this study provided a preliminary step toward a more robust model with a larger dataset that can accurately identify the presence of PAD. Moreover, researchers can use this ML method to identify individuals at risk of developing PAD or other diseases using the same training, testing, model architecture, comparison, and performance measurement techniques discussed in this paper. For instance, by identifying the most crucial gait features, this research lays the groundwork for extracting meaningful data points from gait measures captured with wearable sensors. Identifying these features also reduces the complexity of building ML models that can identify PAD from gait data with larger datasets in the future. Eventually, such models could be implemented based on wearable, real-world data, providing alerts to physicians to order more detailed PAD screening.

There are some limitations to this study. First, the dataset is relatively small for ML applications, so the results from this paper cannot be generalized to identify PAD from a more extensive dataset. In addition, given the challenges of identifying and diagnosing PAD and its severity, the patients with PAD in this study were all pre-labeled with PAD, ensuring that PAD was the primary cause of functional impairment rather than other conditions that impact function. Therefore, this could make the results less generalizable. However, this paper demonstrates that ML can offer a high-quality prediction while distinguishing between patients with PAD and healthy controls, even with a few laboratory-based gait features.

Future work will explore the ability of machine learning models to identify early PAD risk and monitor PAD progression and treatment effectiveness. Knowledge from this work could be transferred to wearable sensors that could be integrated into shoes or other assistive devices worn by older individuals. The ability to detect abnormal gait signatures or changes that indicate worsening disease progression can become a valuable tool for managing this chronic disease.

7. Conclusions

This paper utilized ML applications to classify individuals with PAD by developing gait signatures with laboratory-based gait features. First, we provided a preliminary analysis to statistically distinguish the essential gait features. Then, we used an ML approach to extract the most valuable features for classifying PAD. Our data-driven approach provided a preliminary foundation for ML identification tasks for an underdiagnosed disease that would greatly benefit from earlier detection. Our findings showed that ML algorithms could produce informative and strong performance values when applied to identify PAD. We also demonstrated that GRF measurements provided better information for classifying individuals with and without PAD. Future research should explore ML models’ ability to identify the risk of having PAD by calculating surrogate measures of GRF and gait from measurements taken outside the laboratory. Future work could explore using ML models of wearable gait data as an indicator of PAD risk, as well as to monitor disease progression, severity, and treatment effectiveness.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22197432/s1, Python Code S1.1: Import Libraries; Python Code S1.2: Import Dataset; Python Code S1.3: Splitting the dataset into the Training set and Test set; Python Code S1.4: Feature Scaling; Python Code S1.5: SMOTE Oversampling of Training Data; Python Code S2.1: Neural Network Deep Learning Model Implementation; Python Code S2.2: Neural Network Model Scores; Python Code S2.3: SVM Machine Learning Model Implementation; Python Code S2.4: SVM Model Scores; Python Code S2.5: Random Forest Machine Learning Model Implementation; Python Code S2.6: Random Forest Model Scores; Python Code S2.7: Logistic Regression Machine Learning Model Implementation; Python Code S2.8: Logistic Regression Model Scores; Table S1: Description of Predictive feature and data types; Table S2: Neural Networks Model Hyperparameters; Table S3: SVM Model Hyperparameters; Table S4: Random Forest Model Hyperparameters; Table S5: Performance Summary of All Machine Learning Models; Figure S1: Summary of Models Results.

Author Contributions

Conceptualization, A.A.-R., B.Q., I.I.P., F.A. and S.A.M.; methodology, A.A.-R., M.H., F.F., M.A.T., H.R., B.Q., I.I.P., F.A. and S.A.M.; software, A.A.-R., M.A.T., B.Q. and F.A.; validation, B.Q., I.I.P., F.A. and S.A.M.; formal analysis, A.A.-R., M.H., F.F., M.A.T. and H.R.; investigation, M.H., F.F., H.R. and S.A.M.; resources, B.Q., I.I.P., F.A. and S.A.M.; data curation, A.A.-R., M.H., F.F., M.A.T. and H.R.; writing—original draft preparation, A.A.-R.; writing—review and editing, M.H., F.F., M.A.T., H.R., B.Q., I.I.P., F.A. and S.A.M.; visualization, A.A.-R.; supervision, B.Q., I.I.P., F.A. and S.A.M.; project administration, B.Q., I.I.P., F.A. and S.A.M.; funding acquisition, B.Q., I.I.P., F.A. and S.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institutes of Health (R01AG034995, R01HD090333, R01AG049868), United States Department of Veterans Affairs Rehabilitation Research and Development Service (I01RX000604, I01RX003266), and the University of Nebraska Collaboration Initiative.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Boards at the University of Nebraska Medical Center and the Nebraska-Western Iowa Veterans Affairs Medical Center.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data that support the findings and results in the paper can be requested from the corresponding author. A material transfer agreement must be executed between the requesting institutions and the University of Nebraska at Omaha and the Omaha Veterans Affairs Medical Center.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Biomechanics Data and Gait Features Sources and Descriptions.

Gait Feature Source	Raw Signal	Gait Signature Extracted	Definition and Explanation
Ground Reaction Forces (GRF) Figure A1a	GRF x-axis (Anteroposterior component)	Braking peak: Initial negative force component after heel contact. (N/kg) Zero-crossing: the midpoint of the anteroposterior component. (N/kg) Propulsive peak: The positive peak of the propulsion component. (N/kg) Braking impulse: The area under the anterior-posterior force curve between touch-down and zero-crossing at midstance. (N.s/kg) Propulsive impulse: The area under the anterior-posterior force curve between zero-crossing at midstance and toe-off. (N.s/kg)	GRF is recorded on overground force plates, where the center of pressure is expressed in a standard cartesian coordinate system (x, y, z). The ground reaction force is exerted by the ground on a body in contact with it and is composed of three components: vertical, anterior-posterior, and mediolateral. These forces can be combined with the limb orientation data to calculate ankle, knee, and hip joint torques and powers. The rotating effect of the force located at a distance from the joint axis is quantified using joint torques, while the joint power quantifies the power output of individual joints during walking.
	GRF y-axis (Mediolateral component)	Lateral peak: The maximum short positive force component immediately after heel contact. (N/kg) Medial peak: The minimum negative force component as the foot snatches for toe-off. (N/kg)
	GRF z-axis (Vertical component)	Loading response peak: Rapid rise in force after heel contact. (N/kg) Midstance valley: The minimum force exerted by the center of mass at midstance. (N/kg) Terminal stance peak: Second peak force that is greater than body weight (N/kg)
Ankle Figure A1b	Ankle Joint Angle	Ankle plantarflexion maximum: Peak plantarflexion during stance. (Degree) Ankle dorsiflexion maximum: Peak dorsiflexion during stance. (Degree)	The ankle is plantar flexed at heel strike in the range of 5–6 degrees, moves to 10–12 degrees of dorsiflexion, and then back to plantarflexion (15–20 degrees) at toe-off.
	Ankle Torque	Ankle dorsiflexor peak torque: Peak response of the ankle dorsiflexors during stance. (N.m/kg) Ankle plantar flexor peak torque: Peak response of the ankle plantar flexors (extensors) during stance. (N.m/kg)	During loading, the ankle has a dorsiflexor torque as the foot is lowered to the ground. Next, a plantarflexion torque occurs through midstance to control the weight transfer over the ankle as the body moves over the foot. Finally, at late stance, the plantarflexion torque continues as the plantar flexors advance the foot into the swing.
	Ankle Power	Early power absorption: (Eccentric muscular contraction) at the ankle after heel strike. (W/kg) Peak power absorption: (Eccentric muscular contraction) at the ankle during midstance. (W/kg) Peak power generation: (Concentric muscular contraction) at the ankle during late stance. (W/kg)	At loading response, power is absorbed by the dorsiflexors as the foot is lowered to the ground. Power absorption continues by the plantar flexors as the body moves over the foot. Finally, power is generated by the plantar flexors to drive the leg into the swing.
Hip Figure A1c	Hip Joint Angle	Hip Flexion Maximum: Peak hip flexion during stance. (Degree) Hip Extension Maximum: Peak hip extension during stance. (Degree)	Peak hip flexion usually occurs at heel contact and is approximately 35–50 degrees. After heel contact, hip flexion reduces throughout support until toe-off.
	Hip Torque	Hip Flexor peak torque: Peak response of the hip flexors during stance. (N.m/kg) Hip Extensor peak torque: Peak response of the hip extensors during stance. (N.m/kg)	A net hip extensor torque during the initial loading phase of support continues through midstance into late stance.
	Hip Power	Early peak power generation: (Concentric muscular contraction) at the hip after heel strike. (W/kg) Peak power absorption: (Eccentric muscular contraction) at the hip during midstance. (W/kg) Peak power generation: (Concentric muscular contraction) at the hip during late stance. (W/kg)	At heel contact, there is power generation of the hip extensors. In late stance, there is new power absorption by the hip extensors to decelerate the hip flexors, followed by power generation of the hip flexors to propel the leg into the swing.
Knee Figure A1d	Knee Joint Angle	Knee Flexion Maximum: Peak knee flexion during stance. (Degree) Knee Extension Maximum: Peak knee extension during stance. (Degree)	The knee is close to full extension at heel strike, flexes during the loading response, extends again through midstance, and then flexes in late stance in preparation for the swing.
	Knee Torque	Knee Flexor peak torque: Peak response of the knee flexors during stance. (N.m/kg) Knee Extensor peak torque: Peak response of the knee extensors during stance. (N.m/kg)	The loading response at the knee involves an extensor torque of the knee, which transfers to a flexor torque after the knee angle moves into extension towards toe-off.
	Knee Power	Early peak power absorption: (Eccentric muscular contraction) at the knee after heel strike. (W/kg) Peak power generation: (Concentric muscular contraction) at the knee during midstance. (W/kg) Peak power absorption: (Eccentric muscular contraction) at the knee during terminal stance. (W/kg)	There is knee flexion controlled by the extensors (power absorption) at heel contact moving into midstance, where there is a knee extensor torque controlled by the extensors (power generation). In late stance, there is knee flexion controlled by the extensors (power absorption).

Figure A1. Gait features extracted from raw biomechanics data. (a) Ground reaction forces (GRF) raw signals and the peak points extracted, (b) Ankle, (c) Hip, and (d) Knee raw signals with the peak points extracted.
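
The anteroposterior GRF features defined in Table A1 (braking and propulsive peaks, and the two impulses as areas under the force curve on either side of the zero-crossing) can be illustrated with a short sketch. It assumes a stance-phase signal already normalized by body mass, a sign convention in which braking is negative and propulsion is positive, and a signal that crosses zero once after the braking peak; the function names and the sinusoidal test signal are hypothetical and are not taken from the study's processing code.

```python
import math


def trapezoid(values, dt):
    """Area under a uniformly sampled curve using the trapezoidal rule."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def ap_grf_features(force, dt):
    """Extract anteroposterior GRF features from one stance phase.

    force: samples in N/kg from touch-down to toe-off (negative = braking).
    dt:    sampling interval in seconds.
    """
    # Braking peak: most negative sample after heel contact.
    braking_idx = min(range(len(force)), key=lambda i: force[i])
    # Zero-crossing: first non-negative sample after the braking peak.
    crossing = next(i for i in range(braking_idx, len(force)) if force[i] >= 0.0)
    return {
        "braking_peak": force[braking_idx],                       # N/kg
        "propulsive_peak": max(force[crossing:]),                 # N/kg
        "braking_impulse": trapezoid(force[:crossing + 1], dt),   # N.s/kg
        "propulsive_impulse": trapezoid(force[crossing:], dt),    # N.s/kg
    }


# Hypothetical signal: one sine period over a 0.6 s stance, sampled at 1000 Hz.
stance_time = 0.6
dt = 0.001
force = [-2.0 * math.sin(2 * math.pi * i * dt / stance_time) for i in range(601)]
features = ap_grf_features(force, dt)
```

For this idealized signal the two impulses are equal in magnitude and opposite in sign; measured signals would generally need filtering and a more robust zero-crossing search than the single-pass one used here.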

Figure 1. A flowchart briefly describing the utilized data and the methods applied.

Figure 2. Feature variance comparison of gait features between patients with PAD and healthy controls. (a) GRF gait features, (b) Ankle gait features, (c) Hip gait features, (d) Knee gait features. The green asterisks indicate a significant difference in variance based on Levene’s test for homogeneity of variance.

Figure 3. A comparison between gait feature sources in terms of (a) Average ANOVA F-statistic and (b) Average Information Gain. GRF features have higher F-statistic and information gain than the ankle, knee, and hip gait parameters.

Figure 4. A correlation study of all numeric gait features regardless of PAD status (healthy controls or patients with PAD). The color scale indicates the correlation coefficient between features, from −1 (dark red) to 1 (navy). A coefficient of −1 implies a perfect negative linear relationship, a coefficient of 1 implies a perfect positive linear relationship, and a coefficient of 0 implies no linear relationship.

Figure 5. Flowchart showing the 3-step ML analysis method to identify PAD and extract the most valuable gait signatures for PAD diagnosis.

Figure 6. Performance metric results that measure the ability of each ML model algorithm to distinguish PAD.

Figure 7. A measurement of the ability of a few laboratory-based gait features to distinguish PAD using Matthew’s Correlation Coefficient. Generally, the GRF-based models (Groups 1 and 5) performed better than joint data models (Groups 2, 3, 4, and 6) and provided comparable results to using all gait features.

Figure 8. A comparison between GRF-based signals in identifying PAD using Matthew’s Correlation Coefficient as a performance metric. For both Neural Network and Random Forest model approaches, including all GRF components led to better PAD classification compared with any single GRF component (x = anteroposterior, y = mediolateral, and z = vertical).

Table 1. The hyperparameters for each algorithm.

Algorithm	List of Hyperparameters
Neural Networks	Activation function, optimizer, kernel initializer, learning rate, regularization, batch size, number of epochs
Random Forest	Number of trees, maximum number of features, maximum depth of layers, criteria
SVM	Kernel, gamma, penalty parameter
Logistic Regression	Regularization

Table 2. A summary of the models’ performances on the testing data. Each model is scored on every feature group using four performance metrics. The shaded columns highlight the comparison between the models using all gait features (Group 1) and the models using GRF features only (Group 5).

Metric	Model Type	Group Category
		All	Ankle	Hip	Knee	GRF	Ankle, Hip, Knee
		Group 1	Group 2	Group 3	Group 4	Group 5	Group 6
Accuracy	Neural Networks	0.89	0.79	0.78	0.81	0.82	0.84
Accuracy	Random Forest	0.89	0.69	0.73	0.75	0.87	0.83
Discriminant Power	Neural Networks	1.94	0.95	0.82	0.90	1.87	1.33
Discriminant Power	Random Forest	1.94	0.64	0.29	0.71	2.09	1.19
Geometric Mean	Neural Networks	0.83	0.65	0.61	0.54	0.84	0.84
Geometric Mean	Random Forest	0.83	0.63	0.46	0.60	0.87	0.63
Matthew’s Correlation Coefficient	Neural Networks	0.64	0.33	0.27	0.27	0.57	0.44
Matthew’s Correlation Coefficient	Random Forest	0.64	0.22	0.09	0.24	0.64	0.39
Best model type		Neural Networks, Random Forest	Neural Networks	Neural Networks	Neural Networks	Random Forest	Neural Networks
ML Performance Metrics Description:
Accuracy: the number of correct predictions divided by the total number of examples. Range: 0 to 1; a value of 1 means the model predicts with no errors.
Discriminant Power: measures the ability of the classifier to distinguish between minority (healthy controls) and majority (patients with PAD) cases. A higher value indicates better model performance.
Geometric Mean: measures the balance of the classification performance between the majority and minority cases. A higher value indicates better model performance.
Matthew’s Correlation Coefficient: provides a good score only if the model performs well in all four confusion matrix categories. Range: −1 to 1, with 1 for a perfect model, −1 for the worst model, and 0 for a model no better than random guessing.
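
As an illustration of how the four metrics relate to a confusion matrix, the sketch below computes them for a hypothetical set of predictions. The formulas are the commonly used definitions (geometric mean of sensitivity and specificity; discriminant power as (√3/π)·[ln(sens/(1−sens)) + ln(spec/(1−spec))]); the article text here does not spell them out, so treat them as assumptions. PAD is taken as the positive class (label 1), and the counts are invented for the example rather than drawn from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Compute the four Table 2 metrics; assumes both classes are present."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    # Discriminant power is undefined when sensitivity or specificity is 0 or 1.
    if 0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0:
        dp = (math.sqrt(3) / math.pi) * (
            math.log(sensitivity / (1.0 - sensitivity))
            + math.log(specificity / (1.0 - specificity))
        )
    else:
        dp = float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "discriminant_power": dp,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical test set: 50 patients with PAD (1) and 10 healthy controls (0).
y_true = [1] * 50 + [0] * 10
y_pred = [1] * 45 + [0] * 5 + [0] * 8 + [1] * 2
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
```

With an imbalanced test set such as this one, accuracy is dominated by the majority class, while the geometric mean and Matthew’s Correlation Coefficient drop whenever either class is poorly classified.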

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

