A Machine Learning Model to Predict Knee Osteoarthritis Cartilage Volume Changes over Time Using Baseline Bone Curvature

The hallmark of osteoarthritis (OA), the most prevalent musculoskeletal disease, is the loss of cartilage. By using machine learning (ML), we aimed to assess whether baseline knee bone curvature (BC) could predict cartilage volume loss (CVL) at one year, and to develop a gender-based model. BC and cartilage volume were assessed on 1246 participants using magnetic resonance imaging. Variables included age, body mass index, and baseline values of eight BC regions. The outcome consisted of CVL at one year in 12 regions. Five ML methods were evaluated. Validation demonstrated very good accuracy for both genders (R ≥ 0.78), except for the medial tibial plateau in women. In conclusion, we demonstrated, for the first time, that knee CVL at one year could be predicted using five baseline BC region values. This would benefit patients at risk of structurally progressive knee OA.


Introduction
Osteoarthritis (OA) is the most prevalent musculoskeletal disease and a common joint degenerative disease. OA is a global health burden and is accountable for substantial health costs [1,2]. It is characterized by chronic pain and functional disability, and the knee is the most affected among the joints [3]. The hallmark of the disease is the loss of a joint tissue, the cartilage [3].
OA diagnosis often occurs late, i.e., when the destruction of articular tissues has reached an advanced stage. This is of importance because, although OA is characterized as a disease of "older age", younger people are increasingly affected [4]. Moreover, its two most prominent risk factors, age and body mass index (BMI) [5], are also of considerable concern for the healthcare system, as the growing number of aging and obese people worldwide will soon place an unsustainable burden on the system for the care of individuals with OA. Above all, there is not yet a cure (in the form of disease-modifying OA drugs [DMOADs]) for this disease [3,6]. Currently, OA treatments only relieve symptoms.
To combat the rise of this disease, there is a critical need to identify, at an early stage, individuals at risk of having a structurally progressive disease, i.e., rapid degradation of cartilage. Indeed, therapeutic strategies used early during the pathological process may reduce or stop the structural progression of the disease. In turn, this would lead to an improvement of the symptoms. This is important, as in recent years concerns have been raised about the safety of some of the symptom relief treatments, which were related to potential detrimental systemic impacts such as cardiovascular risks, increased risk of morbidity, and even mortality [7,8]. Moreover, the identification of individuals at risk of having a structurally progressive disease is also of high significance for the development of DMOADs. Indeed, a great part of the challenge in the development of such drugs is that trials often include patients with advanced OA (severe cartilage loss), in whom it is difficult to reduce or stop the degenerative process. Such patients are therefore not suitable for DMOAD therapy, and their inclusion impedes the statistical power of such trials.
Early identification of OA structural progressors currently depends on clinical judgment with the help of radiographic evaluation. However, it is well known that X-rays are not sensitive enough to detect early knee articular alteration [9,10]. Therefore, it is of great importance to develop automated and practical tools that will identify, at an early stage, OA patients for whom articular tissue alterations will progress rapidly.
A variety of fluid biomarkers has been evaluated for such discrimination. However, despite a significant body of research in this field, there is not yet a validated signature for early diagnosis or prognosis of the disease [11]. Limitations of fluid biomarkers include, among others, the frequent absence of a direct correlation with joint structural changes, the poorly defined association with age-related changes, the fact that some are related to obesity and cannot distinguish between OA and obesity, and the fact that a single fluid biomarker cannot fully reflect the complex patterns underlying this disease.
At present, for optimal forecasting of joint structural alterations, increasing evidence points toward the use of articular structural (tissue) markers. At first, cartilage alteration was evaluated as a marker for the knee. However, when cartilage begins to show degradation as evaluated by clinical features and/or radiography, the disease is already at a moderate stage. Recently, the change in knee bone was suggested as an accurate marker to identify early OA structural progressors; knee bone alteration was shown to precede cartilage losses and contribute to the development of the disease [12][13][14][15][16][17].
Over the years, many methodologies have been introduced to evaluate such bony changes, including bone attrition, joint incongruity, periarticular area, shape, and curvature [13,14,16,18–24]. However, some used radiographic determination, which could lead to imprecision due to its dependence on the acquisition method and/or on statistical modelling involving an operator-dependent component, which may introduce errors. Others used magnetic resonance imaging (MRI), and some of the developed technologies had shortcomings. For example, for the bone area, the assessment is subjective, with inconsistent associations with knee structural progression. Machine learning (ML) techniques, coupled with MRI, have opened new possibilities for large-scale data integration to assess precise measurements of OA status in a multidimensional manner. Recently, by using these two methodologies (MRI and bone change), the measurement of the bone shape vector [25] and the subchondral bone length (SBL) [26] were reported. Yet, the bone shape vector was developed for only one bone, the femur, and included osteophytes (bony projections) in its measurement, which may induce inaccuracy in the measurement of bone shape changes, while the SBL uses a 2D shape measurement. Another fully automated MRI methodology was developed to assess the bone curvature (BC) [20]. This BC assessment methodology, in addition to being quantitative, is patient-based and, while preserving the measured bone surface, removes two bone alterations (peripheral osteophytes and bone marrow lesions [BML], including edema and cysts) that could interfere with the bone measurement [20,27]. By using this system, BC alteration was shown to precede cartilage volume loss (CVL), in addition to predicting the effectiveness of OA treatment [20].
In the search for a model/tool that could offer an objective and quantitative assessment in the early forecasting of knee OA structural progressors, we hypothesized that knee BC features at baseline could predict, for an individual, CVL at one year. First, which bone regions play an effective role in such a prediction? Second, can the developed model predict CVL at one year in more than one knee subregion with the same baseline variables? Third, can the developed model be accurate for both genders? Fourth, can it be replicated and extended to another OA cohort for the prediction of outcomes? To answer these questions, we (i) applied feature selection by using ML algorithms on a fairly large sample to find the most important BC regions, (ii) developed advanced gender-based prediction models that provide high prediction performance on all the cartilage regions, and (iii) evaluated the reproducibility of the developed models by using an external cohort of OA patients from a clinical trial. Data revealed that the combination of five knee BC region values at baseline could predict CVL after one year in 12 knee regions with high accuracy and reproducibility.

Study Population
The models were developed using individuals from the Osteoarthritis Initiative (OAI) cohort. The OAI cohort, an observational study of the natural progression of knee OA, included men and women between the ages of 45 and 79, enrolled at four centers across the United States (Columbus, OH; Baltimore, MD; Pawtucket, RI; Pittsburgh, PA). The cohort included 4796 individuals at baseline (https://nda.nih.gov/oai/study-details, last accessed date: 25 October 2019). For this study, the 3395 participants who had, at baseline, the parameters needed for classification into structural progressors or no-progressors (see below for description) were included.
To validate the developed models, an external dataset consisting of knee OA patients from a clinical trial was used [28]. This cohort comprised patients with primary symptomatic knee OA from a multicenter, randomized, double-blind clinical trial evaluating the effect of Licofelone (a lipoxygenase/cyclooxygenase inhibitor). Here, 77 patients were selected from the comparator arm (Naproxen, a cyclooxygenase inhibitor) of this trial. This cohort was named Naproxen.

Classification of Participants into Structural Progressors
This study was performed using the structural progressors, as we wanted to develop a model on individuals presumed to have disease progression. To this end, each participant was assigned a label based on their probability value of being a structural progressor (PVBSP), as previously described [29]. In brief, the PVBSP label for each participant was derived from the values of five features at baseline, as well as an outcome. The features comprised two X-ray measures, the medial minimum joint space width (JSW) and medial joint space narrowing (JSN) as a score [30], and three quantitative MRI measures, the mean cartilage thickness of the peripheral, medial, and central tibial plateaus. The outcome was JSN ≥ 1 at 48 months. To discriminate structural progressors from no-progressors, a binary classification in the context of multilabel classification was obtained by applying a threshold value chosen to maximize the F1 score, as described [31].
Data revealed that for the OAI cohort, 39% of the participants were classified as structural progressors (1246; 659 women and 587 men) and were used to build the model. For the Naproxen cohort (validation), these proportions were reversed, and 69% (53; 20 women and 33 men) of the patients were labelled structural progressors.
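The F1-based thresholding step can be illustrated with a minimal sketch. It assumes each participant has a probability of being a structural progressor and a known binary outcome; the function names and the toy values are hypothetical and are not taken from [29] or [31].

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0 when nothing is predicted or present."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_f1_threshold(y_true, prob):
    """Scan every observed probability as a cut-off and keep the one maximizing F1."""
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(prob):
        f1 = f1_score(y_true, (prob >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy data (hypothetical): outcome labels and predicted probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1])
prob = np.array([0.10, 0.35, 0.40, 0.45, 0.80, 0.90])

threshold, f1 = best_f1_threshold(y_true, prob)
is_progressor = prob >= threshold  # binary progressor / no-progressor label
```

Scanning only the observed probability values is sufficient here, because the F1 score can change only when the cut-off crosses one of them.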

Knee MRI Tissue Acquisitions
For the OAI cohort, MRIs were acquired with a 3T apparatus (Magnetom Trio, Siemens, Germany) using a double-echo steady-state (DESS) imaging protocol, as per the OAI protocol. For the Naproxen cohort, the MRI acquisition was done as previously described, with a 1.5T apparatus with an integrated knee coil using 3D fast imaging with steady-state precession (FISP) with water excitation (Siemens, Erlangen, Germany) or spoiled gradient echo recalled (SPGR) with fat suppression (General Electric, Milwaukee, WI, USA) [28].

Bone Curvature
Bone curvature was evaluated using a fully automated quantitative MRI system, as previously described [20]. In brief, the mean curvature of a surface corresponds to the average of the two eigenvalues of the Weingarten matrix and is expressed in m⁻¹. The method used the cylindrical coordinate representation of the surfaces obtained by automatic segmentation [27], smoothed using a Gaussian filter of standard deviation σ = 4 and size 6σ in the configuration space, allowing for a curvature map with an average resolution of 2 mm in the image plane by 6 mm transversely to the images. For each knee bone surface, the mean curvature was computed and averaged over all the samples of a region. In this study, the knee BC regions used as variables (input) included eight "basic" regions: lateral and medial trochlea, lateral and medial central condyle, lateral and medial posterior condyle, and lateral and medial tibial plateau. These regions were termed basic because, added together, they form the global knee or its subregions.
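The definition of mean curvature as the average of the two eigenvalues of the Weingarten matrix can be illustrated with a short sketch. It is a simplification under stated assumptions: the surface is given as a Cartesian height map on a regular grid (not the cylindrical representation of [20,27]), the Gaussian smoothing step is omitted, and the function name `mean_curvature` and the test sphere are hypothetical.

```python
import numpy as np

def mean_curvature(z, dx, dy):
    """Mean curvature of a height map z[y, x] sampled on a regular grid."""
    zy, zx = np.gradient(z, dy, dx)      # first derivatives (axis 0 = y, axis 1 = x)
    zxy, zxx = np.gradient(zx, dy, dx)   # derivatives of zx along y and x
    zyy, _ = np.gradient(zy, dy, dx)     # derivative of zy along y

    # First fundamental form (metric tensor) at every sample, shape (ny, nx, 2, 2).
    first = np.stack([np.stack([1 + zx**2, zx * zy], -1),
                      np.stack([zx * zy, 1 + zy**2], -1)], -2)
    # Second fundamental form, using the upward-pointing unit normal.
    norm = np.sqrt(1 + zx**2 + zy**2)
    second = np.stack([np.stack([zxx, zxy], -1),
                       np.stack([zxy, zyy], -1)], -2) / norm[..., None, None]

    # Weingarten matrix W = first^{-1} @ second; the average of its two
    # eigenvalues equals half of its trace.
    weingarten = np.linalg.solve(first, second)
    return 0.5 * np.trace(weingarten, axis1=-2, axis2=-1)

# Check on a spherical cap of radius 0.05 m: |mean curvature| should be 1/R = 20 m^-1.
R = 0.05
x = np.linspace(-0.01, 0.01, 81)
X, Y = np.meshgrid(x, x)
Z = np.sqrt(R**2 - X**2 - Y**2)
H = mean_curvature(Z, x[1] - x[0], x[1] - x[0])
center_curvature = H[40, 40]  # negative sign: the cap is convex with an upward normal
regional_mean = H[20:61, 20:61].mean()  # averaging over a region, as done per BC region
```

The sign of the result depends on the chosen normal orientation; only the magnitude is compared with 1/R in this check.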

Model Development
The development of the prediction model was performed in two phases (Figure 1). As illustrated in Figure 1a, Phase 1, the independent variables (input) based on gender separation were grouped into two major OA risk factors (age and BMI) and eight knee BC regions, and the outcomes (output) were the CVL at one year in 12 regions. After selecting the best ML algorithm, the most representative region of CVL at one year as the outcome was identified. Further, and as illustrated in Figure 1b, Phase 2, the relevant input variable combination was identified.

Phase 1 Selecting the Best ML Algorithm
Five different ML-based methods were investigated. The ML techniques included tree- and non-tree-based methods. The tree-based methods were random forest (RF) [37], M5Rules [38], and M5P [38], and the non-tree-based methods were multilayer perceptron (MLP) [39] and the adaptive neuro-fuzzy inference system (ANFIS) [40]. The ML analyses with tree- and non-tree-based methods and the statistical analysis were implemented using MATLAB and Waikato Environment for Knowledge Analysis (WEKA) software. The main concept of each method is provided in the Supplementary Materials, Table S1.

Finding the Most Representative Region of CVL at One Year as the Outcome
In contrast to the customary ML problem, which uses one outcome as the target, this study was confronted with 12 outcomes, which was a challenging task. Our strategy was to find, as a first step, the most representative region with which to develop a model, and then assess the algorithm of the developed model with the other 11 regions. To this end, the best ML algorithm was used to analyze each outcome region (CVL at one year) using three different statistical indices (see Statistical Analysis below).

Phase 2 Selecting the Variable Combination
With the use of the representative cartilage region as the outcome, we further investigated the most influential variable combinations. Selecting the relevant variables in ML models saves resources in the data collection step during model development or model application. Having fewer misleading variables not only improves the accuracy of the ML model but also removes multicollinearity, which reduces the possibility of overfitting. The variable reduction was performed stepwise, with each step consisting of the removal of one and then two variables. This strategy not only removed the lowest-cost variable(s) among all possible input combinations but also allowed us to check the synergy between two variables during variable reduction. In total, with ten input variables, 1023 different input combinations can be defined; however, by applying the above-mentioned strategy, only 97 different variable combinations were evaluated.
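The figure of 1023 corresponds to the number of non-empty subsets of ten variables (2¹⁰ − 1). The short sketch below checks this arithmetic and also counts how many candidate subsets arise when one or two variables are removed from a set of ten; the variable names are illustrative only.

```python
from math import comb

n_variables = 10

# Every non-empty subset of the ten input variables.
all_subsets = sum(comb(n_variables, k) for k in range(1, n_variables + 1))

# Candidates produced by a single reduction step starting from ten variables.
remove_one = comb(n_variables, 1)   # nine-variable subsets
remove_two = comb(n_variables, 2)   # eight-variable subsets
```

An exhaustive search grows exponentially with the number of inputs, whereas a stepwise strategy evaluates only a small number of candidates per step.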

Systematic Controllability Variable Reduction
We then performed the systematic controllability variable reduction for CVL prediction at one year. Starting from the ten variables, the reduction began with nine and eight variables, for which all possible combinations were analyzed. The best model from this step was employed in the next step of variable reduction. Variables continued to be removed as long as the reduction did not lower the ML modelling accuracy; the process stopped when the results showed a significant decrease in accuracy.
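The logic of such a stepwise reduction can be sketched as follows. This is an illustration under several assumptions rather than the procedure used in the study: an ordinary least-squares fit stands in for ANFIS, the score is the in-sample correlation coefficient R, the tolerance `tol` is arbitrary, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def fit_and_score(X, y, cols):
    """Stand-in model: least-squares fit on the chosen columns, scored by R."""
    A = np.column_stack([X[:, cols], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(A @ coef, y)[0, 1]

def best_subset(X, y, current, n_removed):
    """Best-scoring subset obtained by removing n_removed variables from current."""
    size = len(current) - n_removed
    candidates = [list(c) for c in combinations(current, size)]
    scores = [fit_and_score(X, y, c) for c in candidates]
    i = int(np.argmax(scores))
    return candidates[i], scores[i]

def stepwise_reduction(X, y, tol=0.01):
    """Remove two variables, else one, while R stays within tol of the full model."""
    current = list(range(X.shape[1]))
    reference = fit_and_score(X, y, current)
    while True:
        for n_removed in (2, 1):
            if len(current) - n_removed < 1:
                continue
            subset, score = best_subset(X, y, current, n_removed)
            if reference - score <= tol:
                current = subset
                break
        else:
            # Neither removal kept the accuracy: stop and return the current set.
            return current

# Synthetic example: ten inputs, only columns 2, 3 and 6 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 2] + X[:, 3] + X[:, 6] + 0.1 * rng.normal(size=300)
kept = stepwise_reduction(X, y)
```

Trying the removal of two variables before one lets the search detect pairs whose joint removal is harmless, which is one way to examine the synergy between two variables.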

Statistical Analysis
To evaluate the performance of the different methods with different variable combinations, three statistical indices were employed. They included correlation coefficient (R) as a correlation-based index, root mean square error (RMSE) and mean absolute error (MAE) as two well-known absolute indices. The simultaneous use of these indices to verify the efficiency of a model provided a robust evaluation [41].
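For reference, the three indices can be computed as in the following minimal sketch, which assumes paired arrays of observed and predicted values; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def regression_indices(observed, predicted):
    """Return the correlation coefficient R, the RMSE and the MAE."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - observed
    r = np.corrcoef(observed, predicted)[0, 1]   # Pearson correlation
    rmse = np.sqrt(np.mean(error ** 2))          # penalizes large errors more
    mae = np.mean(np.abs(error))                 # average absolute deviation
    return r, rmse, mae

observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 2.0, 2.5, 4.0])
r, rmse, mae = regression_indices(observed, predicted)
```

R measures how well predictions follow the trend of the observations, whereas RMSE and MAE measure the size of the errors in the units of the outcome. For any dataset, MAE cannot exceed RMSE.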

Participant Characteristics
A comparison of the baseline characteristics of the structural progressors from the OAI and Naproxen cohorts (Table 1) showed that OAI participants had lower BMI, WOMAC scores, JSW, and BC in the medial compartment. Moreover, OAI participants had higher cartilage volume in the global and lateral compartments. These findings indicate that the patients from the Naproxen cohort had a higher level of disease severity, which also explains the higher proportion of structural progressors in the Naproxen (69%) compared to the OAI (39%) cohort.

Finding the Optimal Parameters for Each of the ML Methods
Five well-known ML techniques in solving complex nonlinear problems were evaluated: M5P, RF, M5Rules, MLP, and ANFIS. The optimal values of the parameters for each ML technique were found through a trial-and-error process and are described in the Supplementary Materials, Table S1.

Selection of ML Technologies
Next, the best ML-based modelling algorithm was investigated. All ten variables (risk factors and BC regions) were employed to estimate 12 outcomes (CVL regions) using the five mentioned ML-based models. Results showed (Supplementary Materials, Table S2) that, for all the outcomes, ANFIS had higher or equal accuracy (R) and lower or equal RMSE and MAE than the other ML methodologies, except in one region (lateral tibial plateau for the MLP, where the difference for the R was only 3%). Accordingly, ANFIS was then further used to develop gender-based models.

Finding the Representative Region of CVL at One Year as the Outcome
To find the most representative outcome among the 12 regions, the performance of each cartilage region was examined with the ANFIS methodology using the ten variables. Data showed (Table 2) that the medial condyle and the global tibial plateau had the highest accuracy. Moreover, although the R value was identical for these two regions, the RMSE and MAE were lower in the medial condyle. Therefore, the medial condyle region was selected as the most representative outcome.

Uncovering the Most Effective Input Variable (Risk Factors and BC Regions) Combination
Further, by using the medial condyle region as the representative outcome, the systematic controllability variable reduction was employed to find the optimal input combination for prediction, i.e., the best statistical indices and the lowest number of variables possible.
As the first step, 55 different variable combinations (Table 3) were analyzed, in which M1 represented the model with all ten variables (two risk factors and eight BC regions), and M2-M55 corresponded to all possible combinations for nine (M2-M11) and eight (M12-M55) variables.
As shown in Table 3, the lack of one of the variables provided in M1 resulted in a decrease in accuracy (R, RMSE, MAE) in all models except M11 (missing age), in which the accuracy was about the same as in M1. Therefore, age can be considered as a variable that can be removed with the least impact on the accuracy of the ML model in outcome prediction. Further investigation with eight variables revealed that, among the models, M20 (missing age and medial tibial plateau) performed as well as M1 and better than M11. Thus, the combination of age and medial tibial plateau can be removed from the input variables without any impact on the accuracy of outcome prediction.
Therefore, by using M20, seven and six variables were analyzed. The results (Table 4) for models with seven variables revealed that M20-8 (missing age, medial tibial plateau, and BMI) outperformed not only other combinations but also M20. When looking at six variables, M20-13 (missing age, lateral central condyle, medial posterior condyle, and medial tibial plateau) presented statistical indices similar to M1 and M20, with an R differing by 1% from that of M20-8.
Further, with M20-13, we looked at the five variables (Table 5). The model M20-13-6 with five variables (in addition to the missing M20-13 variables, BMI is also lacking) had equal statistical indices to M1. ML models with a lower number of variables were also assessed and data demonstrated a significantly reduced accuracy (data not shown). Consequently, the best prediction model for medial condyle CVL at one year was M20-13-6, which employed only five knee BC regions at baseline: the lateral and medial trochlea, lateral posterior condyle, lateral tibial plateau, and medial central condyle. Figure 2 shows a representation of the knee with the subregions in which the M20-13-6 variable combination is denoted (dark regions). Table 6 recapitulates the results obtained with the proposed systematic controllability feature reduction for the prediction of medial condyle CVL at one year. Discrimination of the model M20-13-6 for each gender (Table 7) showed that men had slightly better statistical indices than women.

Impact of Each M20-13-6 Variable in Medial Condyle Volume Loss at One Year Forecasting
Table 8 shows the statistical indices of M20-13-6, wherein the effect of each feature was assessed by removing one variable at a time (M20-13-6-1 to M20-13-6-5). Data revealed that the lateral tibial plateau (M20-13-6-3), followed by the medial central condyle (M20-13-6-1), have a higher impact on the outcome forecasting; the worst statistical values were obtained when they were excluded. A lower impact (i.e., best statistical indices) was achieved with the lateral posterior condyle and the lateral and medial trochlea, respectively.

Performance of the M20-13-6 Model on All 12 CVL Region Outcomes
Next, we assessed the predictive validity of the selected ML algorithm on the other 11 cartilage regions (Table 9). Data showed very good accuracy for both genders and all 12 cartilage regions in the testing stage. The lowest accuracy in men was for the medial tibial plateau (R, 0.82; RMSE, 0.045; MAE, 0.030) and women, the medial femur (R, 0.79; RMSE, 0.027; MAE, 1.019) both in the testing stage. These results demonstrate the high performance of the M20-13-6 algorithm in the prediction of CVL in all 12 studied regions at one year based on five BC regions at the baseline.

Validation of the Developed ML Model with an External Cohort from a Clinical Trial
The purpose of an ML-based predictive model is to offer valid outcome predictions for new individuals, which assures the generalizability of the model. To this end, the performance of the M20-13-6 model was evaluated using an external cohort (Naproxen) on all 12 cartilage regions studied, separately for men and women. The predictive model (Table 10) demonstrated very good accuracy for men and women (R ≥ 0.78), except for the medial tibial plateau for women.

Table 10. Validation of the M20-13-6 model in the prediction of cartilage volume loss at one year in 12 cartilage regions using five bone curvature regions at the baseline.


Discussion
At present, we cannot discriminate, early during the OA process, patients for whom cartilage will degrade rapidly from those for whom the progression will be slow. Such discrimination would not only help modify the disease trajectory with a personalized clinical treatment plan but would also represent a unique opportunity to intervene before cartilage degradation becomes too severe. Moreover, it would enable patient screening for clinical trials for the development of DMOADs. Indeed, such drug trials have not yet achieved significant results, which appears to be mainly due to patient recruitment, in that the OA patients enrolled have, for the most part, moderate to severe cartilage damage. Consequently, the effect of a DMOAD could not be observed with enough statistical power. This study was undertaken to fulfill these needs.
To achieve CVL forecasting, evidence points toward the use of joint tissue markers and, more recently, BC was suggested for the knee. We developed a gender-based model in which five BC regions at baseline (lateral tibial plateau, medial central condyle, lateral posterior condyle, and lateral and medial trochlea) enable the prediction of 12 global and regional CVL at one year with very good accuracy for both genders: OAI, R ≥ 0.79 (testing stage) and Naproxen (validation) R ≥ 0.78, except for the medial tibial plateau for women.
As we aimed to detect CVL for multiple (12 global/regional) outcomes, a two-phase ML-based methodology was performed. In Phase 1, after comparing the accuracy and benefits of five ML algorithms, ANFIS was found to be the most reliable for prediction. The selection of ANFIS was not surprising: because it combines a neural network with fuzzy logic, it has the advantage over other ML methodologies of capturing the nonlinear structure of a problem, with an adaptive capability and a rapid learning capacity, in addition to a significant potential for predicting systems with high uncertainty and of a dynamic nature.
Next, data showed that the most representative region of CVL (outcome) was the medial condyle. Such a finding could reflect that the medial tibiofemoral compartment of the knee, more specifically the medial condyle, displays a higher rate of cartilage change with greater sensitivity than the other regions [42][43][44][45], as well as being highly related to OA progression and total knee replacement [46][47][48].
In Phase 2, the relevant variables were selected. Reducing the number of variables for ML development and application saves resources. Moreover, having fewer misleading features not only improves the accuracy of an ML model but also removes multicollinearity, thus reducing the possibility of overfitting. To this end, we employed a systematic controllability variable reduction (removing the lowest-cost features among all input variables) to identify the relevant ones.
In this study, of the five selected BC variables, the lateral tibial plateau and medial central condyle demonstrated the highest impact in prediction forecasting. This finding contrasts with a previous one in which two other BC regions, namely, medial posterior condyle and lateral central condyle, were found to be the best regions to predict CVL at two years [20]. In the current study, the weight of these two regions appeared to be somewhat important as they were eliminated only when eight variables were examined (M30). Removing these two variables resulted in a decrease of 10.5% in R, and an increase of about 27% in RMSE and MAE, compared to model M1 (all ten variables). It should also be taken into consideration that the period examined between the two studies, as well as the methodology varied, which could be responsible for the change in the selected variables.
Here, the selection of the lateral tibial plateau and medial central condyle was not unexpected as they both showed a high level of bony remodeling during OA. Indeed, the tibial plateau demonstrated expansion and increased depression during the OA process [49][50][51][52], and bony changes in the lateral tibial plateau were associated with the presence of radiographic OA [53]. Moreover, uneven lateral support of the tibial plateau has been reported to be a key factor that leads to the non-uniform settlement of the knee and a shift of the mechanical axis to the medial compartment, more specifically, on the medial central condyle [54]. The stresses engendered could be responsible for the reported flattening of the medial central condyle bone during OA [14,23]. Bone remodeling in the medial central condyle could also be due to the presence of a high level of BMLs in the OA knee in this region [55]. Although BML was removed from our BC segmentation [27], such subchondral bone changes are suggested to increase the levels of contact stresses, thus, bone remodeling [56].
Even though all the 12 studied global and regional cartilage regions could be predicted with high accuracy with the OAI participants, validation using OA patients from a clinical trial (Naproxen) showed that generalization was attained in all cartilage regions, except in the medial tibial plateau for women. The lower accuracy in this region in women could reflect the fact that (i) compared to the OAI, participants from the Naproxen cohort displayed more disease severity, as ascertained by the clinical parameters, (ii) during the OA process, there was a high level of cartilage thinning/loss as well as inter-subject variability in this region [33,43,57], in addition to (iii) the cartilage volume of women being smaller than in men [58].
The finding that BMI was not included in the model with five variables was rather surprising as a link between BMI and knee bone remodeling has been previously reported [59]. However, this is still under debate as other studies have not shown such an association [53]. Of note, the weight of this variable was, to some extent, important as it was included when six variables were investigated.
Some challenges and limitations of this study should be acknowledged. First, ANFIS was selected as the best ML algorithm for model development. Because of the use of ten input variables, a limitation of this method could have been the high computational expense due to the high number of iterations needed to achieve high accuracy. However, care was given to the selection of the appropriate number and shape of membership function in the ANFIS model as they impact the accuracy of the final results and computational complexity of the ANFIS-based model, and although a challenging task, we were able to define the appropriate membership function, i.e., Gaussian (Table S1) for this study.
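To make the role of the Gaussian membership function concrete, the sketch below shows the forward pass of a first-order Sugeno fuzzy system, the type of inference that ANFIS tunes during training. It is a generic illustration: the rule parameters are fixed by hand instead of being learned, and neither the function names nor the values reproduce the configuration reported in Table S1.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x for a fuzzy set with given center and width."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_predict(x, centers, sigmas, coeffs, biases):
    """First-order Sugeno inference.

    x: (n_inputs,); centers, sigmas, coeffs: (n_rules, n_inputs); biases: (n_rules,).
    """
    x = np.asarray(x, dtype=float)
    memberships = gaussian_mf(x, centers, sigmas)   # one degree per rule and input
    firing = memberships.prod(axis=1)               # rule firing strength (product AND)
    weights = firing / firing.sum()                 # normalized firing strengths
    rule_outputs = coeffs @ x + biases              # linear consequent of each rule
    return float(weights @ rule_outputs)            # weighted average of rule outputs

# Two hand-set rules on two inputs (hypothetical values).
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
sigmas = np.ones((2, 2))
coeffs = np.array([[1.0, 0.0], [0.0, 1.0]])
biases = np.array([0.0, 1.0])

prediction = sugeno_predict([1.0, 1.0], centers, sigmas, coeffs, biases)
```

The number of rules grows quickly with the number of inputs and of membership functions per input, which is why the choice of their number and shape affects both accuracy and computational cost.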
Second, in practice, for a given ML problem, multiple equivalent solutions in variable selections can exist [60]. A shortcoming of some variable selection methods is that they injudiciously identify only a single solution, minimizing a loss function like mean squared error, classification error, etc. Yet, a single solution is not proper when variable selections can be considered both for building a predictive model with high accuracy and for knowledge discovery. In this study, we opted to employ the systematic controllability variable reduction as, instead of giving only one solution, it deals with achieving the highest accuracy by removing the lowest cost variables. In addition, with this step-by-step variable reduction, not only the sensitivity but also the synergy between two variables for the estimation of the outcome could be evaluated.
Third, we could have used the cartilage volume as the outcome. We favored cartilage volume loss instead, as whether baseline cartilage volume predicts future cartilage loss is questionable.
Fourth, another challenge was the CVL period to be analyzed. We chose one year to ensure both a reliable assessment of cartilage change sensitivity and high patient retention for its use in clinical practice. However, a longer period was not evaluated, and the five input variables found could differ for such a period. The next step will be to explore, for a longer observation period, whether the developed ML model using the same BC variables could also predict CVL with high accuracy for all the studied regions. Although the purpose of this study was to evaluate BC as a prognostic marker for CVL over time, assessing a longer period of cartilage loss (e.g., two to four years) could inform us on the collinearity between these two structures. In addition, validating the ML model for longer periods would convert it into an application of broader use in clinical practice.
This study has several strengths. The use of MRI to assess BC and cartilage volume at baseline permits automated assessment of these two knee structures (thus avoiding human error) and quantitative segmentation/measurement in the same knee [27,32]. In addition, the 3D nature of the MRI data, compared with radiographs, avoids difficulties in interpreting knee tissue measurements that may be related to positioning during image acquisition and to projection effects. Moreover, when studying BC in OA, care should be taken not to confuse the osteophytes and BMLs with true differences in the bone. This potential problem was circumvented by the exclusion of these two tissues in the measurement methodology used [27]. Finally, special emphasis needs to be given to the validation (reproducibility) of data using an external clinical trial cohort which, in addition to mimicking patients seen in clinical routine, adds to the robustness and generalization of the developed ML model in that the accuracy persisted.

Conclusions
In this comprehensive study, we developed, for the first time, a reliable and generalizable ML model to predict global and regional CVL at one year based on five BC regions at the baseline, including the lateral tibial plateau, medial central condyle, lateral posterior condyle, and lateral and medial trochlea. This study offers a novel automated system for forecasting knee OA cartilage degradation as an important step toward OA precision medicine, which will significantly improve clinical prognosis with real-time patient monitoring.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biomedicines10061247/s1, Table S1: A machine learning model to predict knee osteoarthritis cartilage volume changes over time using baseline bone curvature; Table S2: Performance of five machine learning algorithms in predicting cartilage volume loss at one year in 12 regions.

Informed Consent Statement: For both Osteoarthritis Initiative (OAI) and Naproxen cohorts, all patients gave their written informed consent.

Data Availability Statement: Data from the Osteoarthritis Initiative (OAI) cohort used in this study are publicly available (https://data-archive.nimh.nih.gov/oai/, last accessed date: 25 October 2019). The additional data used and/or analyzed for the current study are available from the corresponding author upon reasonable request, as long as the request is evaluated as scientifically relevant.