Machine Learning Algorithms: Prediction and Feature Selection for Clinical Refracture after Surgically Treated Fragility Fracture

Background: The number of patients with fragility fracture has been increasing. Although this increase has been accompanied by a rise in subsequent fractures (refractures), the causes of refracture are multifactorial, and its predictors have not been clarified. To address this issue, we collected a registry-based longitudinal dataset of more than 7000 patients with surgically treated fragility fractures to detect potential predictors of clinical refracture. Methods: Because machine learning algorithms are often used to analyze large-scale datasets, we developed automatic prediction models and identified the features relevant to clinical refracture. The input data were perioperative clinical information in table format. Clinical refracture was documented as the primary outcome if a fracture was diagnosed during postoperative outpatient care. Results: A decision-tree-based model, LightGBM, had moderate accuracy for prediction in the test and independent datasets, whereas the other models had poor accuracy or worse. From a clinical perspective, rheumatoid arthritis (RA) and chronic kidney disease (CKD) were noted as relevant features for patients with clinical refracture, both of which are associated with secondary osteoporosis. Conclusion: The decision-tree-based algorithm predicted clinical refracture with moderate accuracy, and RA and CKD were detected as potential predictors. Understanding these predictors may improve the management of patients with fragility fractures.


Introduction
Fragility fractures are associated with increased morbidity and an economic burden. In 1990, the number of fragility fractures worldwide was 1.7 million, and it is estimated to rise to 6.3 million by 2050 [1], which would add to the economic burden [2][3][4]. Scoring models for fragility fractures have been reported to help manage patients with risk factors precisely [5]. Notably, patients with fragility fractures are at an increased risk of sustaining another fracture (refracture) [6]. To prevent refractures, fracture liaison services have been established, and risk factors for refracture have been investigated [5,7]. Recent studies have reported that patients with rheumatoid arthritis (RA), patients with a history of multiple fractures, and older women are at an increased risk of fracture [6,8,9]. Although such predictors of refracture are crucial when managing patients with fragility fracture, to our knowledge, no study has reported prediction models for clinical refracture based on machine learning algorithms.
Machine learning algorithms use large-scale clinical data and learn patterns to assess outcomes [10]. In the medical field, automatic prediction of cancer recurrence has been reported using artificial neural networks (ANNs) [10][11][12]. Although ANNs were inspired by studies of the nervous system, they aim at nonlinear regression or classification and are not biologically realistic in their details [13]. An ANN comprises several layers of computational units (artificial neurons), and the connections between units are highly nonlinear. Convolutional neural networks (CNNs), a class of ANNs, typically have two specific types of layers: convolution layers and pooling layers. Together, these layers function efficiently as feature extractors for digital images [14]. Indeed, CNNs have been applied to make diagnoses based on images such as computed tomography, magnetic resonance imaging, radiography, ultrasound images, and pathological images [15][16][17][18][19][20].
In contrast, there is another algorithm, a decision tree model [21][22][23]. The main components of the model are nodes and branches, and an important step in building a model is splitting. As only input variables related to the target variables are used to split a parent node and make branches into child nodes, this algorithm is also based on nonparametric evaluation [24]. Previous studies have reported the effective use of the decision tree model for table data by extracting optimal features [25][26][27]. In practice, clinical data contain table data and/or images; thus, an appropriate model should be applied.
We previously investigated the refracture rates using the longitudinal cohort data of patients treated surgically for fragility fractures [28]. The number of registered patients was over 7000 and the format was table data, which allowed us to hypothesize that the prediction models would be built using decision-tree-based algorithms, ANN, or other popular models. The aims of this current study were (1) to compare the accuracy of machine learning algorithms that predict clinical refracture after fragility fracture treated surgically, and (2) to clarify the relevant features for patients with clinical refracture.

Study Design and Patients
This study was designed as a registry-based study approved by the local ethics committee of Hokkaido University Hospital (017-0448). Informed consent was obtained from all patients prior to inclusion. Figure 1 shows the flow chart of the data sets; the patients with non-vertebral fragility fractures treated surgically were registered. Among them, patients with two or more missing input data were excluded from the analysis. An independent dataset was created by randomly choosing one hospital. The remaining patients were also randomly subdivided into training (75%) and test (25%) datasets.
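The partitioning described above can be illustrated with a minimal sketch. It is not the authors' code: the field names (`hospital`, `age`, `bmi`), the use of `None` for missing values, the random seed, and the toy registry are all assumptions made for illustration.

```python
import random

def has_enough_data(patient, max_missing=1):
    # Exclude patients with two or more missing input values (None here).
    return sum(value is None for value in patient.values()) <= max_missing

def split_registry(patients, test_fraction=0.25, seed=0):
    """Return (train, test, independent) lists of patient records."""
    rng = random.Random(seed)
    eligible = [p for p in patients if has_enough_data(p)]
    # Hold out one randomly chosen hospital as the independent dataset.
    hospitals = sorted({p["hospital"] for p in eligible})
    held_out = rng.choice(hospitals)
    independent = [p for p in eligible if p["hospital"] == held_out]
    remaining = [p for p in eligible if p["hospital"] != held_out]
    # Randomly subdivide the remaining patients into 75% training, 25% test.
    rng.shuffle(remaining)
    n_test = round(len(remaining) * test_fraction)
    return remaining[n_test:], remaining[:n_test], independent

# Toy registry: 100 hypothetical patients across 5 hypothetical hospitals.
patients = [
    {"id": i, "hospital": f"H{i % 5}", "age": 70 + i % 20, "bmi": 21.0}
    for i in range(100)
]
patients[0].update(age=None, bmi=None)  # two missing values: excluded
train, test, independent = split_registry(patients)
```

Holding out an entire hospital, rather than a random subset of patients, means the independent dataset also probes differences between sites.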
The input data were in table format and comprised the following: sex; age; body mass index (BMI); primary fracture site (proximal part of the femur, proximal part of the humerus, and distal part of the radius); date of surgery; comorbidities including RA, diabetes mellitus, chronic kidney disease (CKD), and chronic obstructive pulmonary disease; presence of malignant tumor; warfarin use; glucocorticoid use; and pre- and postoperative treatments for osteoporosis (bisphosphonate, selective estrogen receptor modulator, teriparatide, and denosumab). We also recorded whether the duration of follow-up exceeded 24 months. When patients had symptoms such as pain at postoperative outpatient follow-up, radiographs were taken to diagnose clinical refracture.

Decision Tree Model
LightGBM, a decision-tree-based ensemble learning algorithm designed by Microsoft Research Asia, was used [21]. Gradient boosting is a member of the ensemble learning paradigm: the learning procedure consecutively fits new models, each improving on the current estimate, with the aim of combining multiple weak learners into a stronger and more accurate model [24]. Although such ensembles generate highly accurate models, they have several limitations, such as unsatisfactorily long training times and limited scalability [29]. To solve some of these problems, LightGBM adopts a histogram algorithm and leaf-wise tree growth, which identifies the leaf with the highest gain and splits only that leaf, resulting in an asymmetrical tree [24]. This structure decreased memory occupancy and improved accuracy compared with other variants [24,30]. In medical fields, this model has been applied successfully to assess outcomes [26,27].
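The idea of consecutively fitting weak learners can be made concrete with a minimal sketch of gradient boosting under squared loss, in which each new one-split tree (a "stump") is fitted to the residuals of the current ensemble. This is plain boosting on a single feature and does not reproduce LightGBM's histogram algorithm or leaf-wise growth; the function names, the learning rate, the number of rounds, and the toy data are assumptions for illustration.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree on a single feature by least squares."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)

# Toy data: a step function that a single constant cannot fit.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(x, y)
```

Each round shrinks the remaining residual by a constant factor here, which is why many weak learners together approximate the step closely.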

Feature Selection and Relative Importance
Feature selection was applied according to the implementation of LightGBM to detect the features relevant to clinical refracture among the input data. Feature selection removes redundant and irrelevant features to select the optimal feature subset [24,30]. Zhang stated that LightGBM provides two importance types: "split" and "gain" [28]. "Split" counts the number of times a feature is used in a model, whereas "gain" reflects the total gain of the splits that use the feature. Unlike "split", "gain" measures the actual decrease in node impurity. The gain-based feature ranking can be obtained after LightGBM fitting [31,32]; we adopted gain-based feature selection and calculated the relative importance.
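The difference between the two importance types can be shown with a minimal sketch that tallies them from a list of split records. The records, gain values, and the scaling of relative importance to a maximum of 100 are assumptions for illustration only; they are not the study's data, and the text does not state how relative importance was normalized. In the LightGBM Python package, the corresponding values are returned by `Booster.feature_importance(importance_type="split")` or `importance_type="gain"`.

```python
from collections import defaultdict

def feature_importance(splits):
    """Tally importance from (feature, gain) pairs, one per tree split."""
    split_count = defaultdict(int)   # "split": how often a feature is used
    total_gain = defaultdict(float)  # "gain": summed gain of its splits
    for feature, gain in splits:
        split_count[feature] += 1
        total_gain[feature] += gain
    return dict(split_count), dict(total_gain)

def relative_importance(total_gain):
    # Assumed normalization: scale so that the top feature scores 100.
    top = max(total_gain.values())
    return {f: 100.0 * g / top for f, g in total_gain.items()}

# Hypothetical split records with made-up gains (not the study's results).
splits = [("CKD", 8.0), ("RA", 5.0), ("age", 0.5),
          ("age", 0.5), ("age", 1.0), ("CKD", 2.0)]
split_count, total_gain = feature_importance(splits)
relative = relative_importance(total_gain)
```

In this toy case, "age" is used most often but contributes the least total gain, which illustrates why a gain-based ranking can differ from a split-based one.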

ANN: Artificial Neural Network Model
We implemented an ANN model consisting of dense layers, also known as fully connected layers, and activation layers with dropout layers. The details of the ANN model are shown in Figure 2; each dropout layer was set to discard its inputs with a probability of 0.2. The sigmoid function was adopted as the final activation layer. This model was modified from a previous study in the medical field [10].
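A minimal sketch of the forward pass of such a network follows. Only the dropout rate of 0.2 and the final sigmoid come from the description above; the single hidden layer, its size, the ReLU activation, the weights, and the use of inverted dropout (rescaling kept units during training) are assumptions for illustration and do not reproduce the architecture in Figure 2.

```python
import math
import random

def dense(x, weights, bias):
    # Fully connected layer: one weighted sum per output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def dropout(x, rate, rng, training):
    # Inverted dropout: zero each unit with probability `rate` during
    # training and rescale the kept units; identity at inference time.
    if not training:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, params, rng, training=False):
    h = relu(dense(x, params["w1"], params["b1"]))
    h = dropout(h, 0.2, rng, training)
    out = dense(h, params["w2"], params["b2"])
    return sigmoid(out[0])  # predicted probability of the positive class

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
params = {
    "w1": [[0.5, -0.2], [0.1, 0.3]], "b1": [0.0, 0.0],
    "w2": [[1.0, -1.0]], "b2": [0.0],
}
probability = forward([1.0, 2.0], params, random.Random(0))
```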

SVM: Support Vector Machine Model
The SVM algorithm was originally proposed to construct a linear classifier in 1963 by Vapnik [33]. This algorithm aims to create a decision boundary between two classes that enables the prediction of labels from one or more feature vectors [34]. It enhances classification accuracy by plotting a multidimensional hyperplane that divides classes and increases the margins between classes [10,35]. In the medical field, it has been applied to colon cancer tissue classification using selected sequence data [36].
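The notions of a separating hyperplane and its margin can be illustrated with a minimal sketch of a linear SVM's decision rule. The weight vector, bias, and data points are hypothetical, and no training procedure is shown; the sketch only evaluates a given hyperplane w·x + b = 0, whose margin width is 2/||w||.

```python
import math

def decision(w, b, x):
    # Signed score of x relative to the hyperplane w.x + b = 0.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1

def margin_width(w):
    # Distance between the two margin boundaries w.x + b = +1 and -1.
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def hinge_loss(w, b, X, y):
    # Zero when every point lies on the correct side, outside the margin.
    return sum(max(0.0, 1.0 - yi * decision(w, b, xi))
               for xi, yi in zip(X, y)) / len(y)

# Hypothetical hyperplane x1 = 2 and four labeled points.
w, b = [1.0, 0.0], -2.0
X = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
y = [-1, -1, 1, 1]
labels = [predict(w, b, xi) for xi in X]
```

In this toy case, the points [1, 1] and [3, 0] lie exactly on the margin boundaries and therefore act as the support vectors.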

Implementation Details
The experiments were performed on a computer with an AMD Ryzen 9 5950X CPU at 3.4 GHz (Advanced Micro Devices, Inc., Santa Clara, CA, USA), 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU (NVIDIA Corporation, Santa Clara, CA, USA).

Statistical Analysis
Categorical variables were evaluated using the chi-square test, while continuous variables (age and BMI) were analyzed using the Mann−Whitney U test because the Shapiro−Wilk test indicated that they were not normally distributed. Statistical analyses were conducted using a logistic regression model with JMP Pro version 14 (SAS Institute, Inc., Cary, NC, USA). The significance level was set at p < 0.05.
Table 1 shows the characteristics of the enrolled patients: the mean age was 77.2 years, and approximately 79.7% were female. The proximal femur was the main site of fragility fracture, accounting for 73.7% of the patients. Postoperatively, 28.6% of the patients received treatment for osteoporosis (bisphosphonate, selective estrogen receptor modulator, teriparatide, and denosumab). In the cohort, 39.2% of the patients were followed up for more than 24 months, and the incidence of clinical refracture was estimated to be 4.4%.

Comparison of the Models
There were no significant differences in patient demographic data, including the incidence of clinical refracture, between the training and test sets, as shown in Table 2. LightGBM had an area under the curve (AUC) of 0.75 in the test dataset and 0.90 in the training dataset, whereas the ANN had an AUC of less than 0.60 in both sets, as shown in Figure 3. Because training of the SVM was not completed successfully, a receiver operating characteristic curve could not be drawn for this model.
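For reference, the AUC used in this comparison equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of this pairwise computation follows; the labels and scores are toy values, not the study's predictions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = refracture, 0 = no refracture (hypothetical scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise form is quadratic in the number of cases, so library implementations typically use a sort-based equivalent.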

Relevant Features by the LightGBM
LightGBM captured the relevant features of patients with clinical refracture during training. The features with a higher relative importance for clinical refracture than no postoperative treatment, which tended to be associated with the incidence of refracture in our previous study [28], were CKD, RA, presence of malignant tumor, primary fracture site (proximal part of the humerus), and warfarin use (Table 3). Glucocorticoid use had a relative importance of 19.3, which was low compared with CKD, RA, or no postoperative treatment.

Discussion
We proposed a prediction model for clinical refracture after fragility fracture based on LightGBM, a decision-tree-based model. This model had an AUC of approximately 0.75 for prediction in the test and independent datasets, whereas the other models had an AUC of less than 0.60 or worse. Considering that assessment models with an AUC of 0.70−0.90 are regarded as moderate [37], our model had moderate accuracy. In addition, RA and CKD were noted as relevant features of patients with clinical refracture.
When considering the clinical application of machine learning, appropriate algorithms should be selected to ensure an acceptable outcome. Two- or three-dimensional images have been accurately analyzed by CNNs, a type of ANN [13][14][15][16][17]. Our results showed, however, that the ANN had poor accuracy, whereas the decision-tree-based model showed satisfactory performance. The input data of our study were table data, and a previous report described the effective use of the decision tree model for table data [22]. Hence, our results are in accordance with this previous report.
Although machine learning algorithms assess the clinical outcome precisely, there is still a gap for clinical application; the black box covers the process until the output. In short, how do we understand the decisions suggested by machine learning [38]? To overcome this challenge, the concept of explainable artificial intelligence has been proposed. This field is concerned with the development of new methods that explain and interpret machine learning models [39]. In this study, for example, LightGBM was able to reveal the basis of the assessment according to feature importance. Similarly, feature importance and explainable artificial intelligence are linked [39].
Feature importance in our study revealed several potential predictors of clinical refracture, which can be divided into two groups. The first group contained items already identified in previous reports [9,28], namely RA and no postoperative treatment for osteoporosis. That these items were detected among the top-ranked features suggests that our model is reasonable. In contrast, the second group contained previously unreported factors, such as CKD.
Control of phosphorus accumulation is crucial to prevent secondary osteoporosis [40]. Decreased glomerular filtration of phosphorus and hyperphosphatemia result in abnormal bone turnover and mineralization [41]. In fact, osteoporosis is more frequent in patients with CKD than in those without CKD. To prevent fragility fracture in patients with CKD, adding vitamin D, reducing phosphate intake, and evaluating whether parathyroidectomy is necessary are especially important [41]. In our study, CKD was detected as the top relevant feature of clinical refracture. As this is a well-known factor for secondary osteoporosis, further investigation is needed to verify the association with refracture.
RA was also detected as a relevant feature for clinical refracture, and RA is known to be a risk factor for refracture as well as fracture [7,9]. Osteoclasts are the main cell population responsible for bone loss in patients with RA [42]. For their differentiation, they require the intervention of macrophage colony-stimulating factor and the receptor activator of nuclear factor kappa B (RANK). As an osteoporosis treatment, denosumab inhibits osteoclast activity by targeting the RANK ligand [43]. Bisphosphonates also promote the apoptosis of osteoclasts, resulting in the suppression of bone turnover [44]. These mechanisms also support that no postoperative treatments for osteoporosis were detected as relevant features.
Previous studies showed the effectiveness of medical intervention for refracture [45]; our previous report, which used part of the data in this study, also showed the effectiveness of postoperative treatments for osteoporosis using conventional statistics [28]. Thus, this study introduced potential predictors whose relative importance was equal to or greater than that of no postoperative treatment for osteoporosis. Intriguingly, glucocorticoid use scored lower than no postoperative treatment. Univariate analysis of our original data revealed that glucocorticoid use was significantly associated with RA, as well as with clinical refracture (data not shown). This indicates that RA may be a confounding factor. Further analyses should be conducted to clarify the role of glucocorticoids according to the duration of use or cumulative dose.
We devised an automatic prediction model for clinical refracture after a fragility fracture. The top relevant features of the refractures were involved in secondary osteoporosis. The precise prediction and understanding of the relevant features can lead to improved individual activity and avoidance of social burden.
This study had several limitations. First, the duration of follow-up was irregular among the enrolled patients. Second, this study was conducted only in Japan. Literature about fragility fracture in Japan also revealed female dominance among the population with fragility fracture [46]. Third, the enrolled patients were limited to those with non-vertebral fragility fractures who were treated surgically. Fourth, cardiovascular diseases among the enrolled patients were not evaluated. Instead, we analyzed a related item, warfarin, which works against vitamin K, a vitamin that improves bone homeostasis and increases bone mineral density [47,48]. Fifth, the long-term cohort (maximum: 10-year period) might have resulted in the fragility fracture and clinical refracture being less correlated.

Conclusions
We presented a prediction model with moderate accuracy for clinical refractures after fragility fractures using LightGBM, a decision-tree-based algorithm. Our report showed the effective use of this decision-tree-based algorithm for the table data format. From a clinical perspective, RA and CKD were noted as relevant features for patients with clinical refracture.

Funding: This study was supported by a Grant-in-Aid for Young Scientists from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (20K17984).

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the local ethics committee of Hokkaido University Hospital (017-0448).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.