Spine Fragility Fracture Prediction Using TBS and BMD in Postmenopausal Women: A Bayesian Approach

The trabecular bone score (TBS) estimates bone microarchitecture and can be used to evaluate the risk of osteoporotic fractures independently of bone mineral density (BMD). In this retrospective case-control study, we tested and compared the ability of TBS and lumbar spine BMD (LS-BMD) to predict vertebral fragility fractures. The inclusion criteria were female sex, age range 50–90 years, menopause, and clinical risk factors for osteoporosis. Patients with secondary osteoporosis were excluded. LS-BMD and TBS were measured at the L1–L4 vertebral level. The ability of the two diagnostic systems in predicting vertebral fragility fractures was assessed by combining LS-BMD and TBS according to the Bayesian “OR rule” (the diagnosis is negative only for those negative for both tests, and it is positive for those who were positive for at least one test) or to the “AND rule” (the diagnosis is positive only for those positive to both tests and is negative for those negative for at least one test). Of the 992 postmenopausal women included, 86 had a documented vertebral fragility fracture. At the cutoff value used in the present study, the TBS and LS-BMD showed a similar diagnostic ability to predict vertebral fragility fractures, having positive predictive values (PPV) of, respectively, 13.19% and 13.24%. Negative predictive values (NPV) were, respectively, 95.40% and 94.95%. Compared to that of each single diagnostic system, the “OR-rule” significantly increased the NPV to 97.89%, while no statistically significant differences were found by using the “AND-rule”. In conclusion, the present study highlights the possibility that combining LS-BMD and TBS could improve their predictive ability in diagnosing vertebral fragility fractures, and that there is a significant probability of absence of fractures in women who test negative to both diagnostic systems.


Introduction
The trabecular bone score (TBS) has recently been proposed as a new diagnostic system for assessing the risk of osteoporotic fractures [1]. With TBS, it is possible to estimate bone microarchitecture by using dual-energy X-ray absorptiometry (DXA) [2,3]. It has also been reported that the TBS predicts fracture risk independently of bone mineral density (BMD) [4–6]. The combined use of TBS and BMD has also been shown to be superior to each diagnostic system alone in obtaining a reliable estimate of the risk of fracture [7,8], albeit with some exceptions [9]. In other studies, the TBS has been shown to differentiate fractures even in osteopenic subjects [5,9–14], making it a useful diagnostic tool for redefining fracture risk among women who are not classified as osteoporotic by BMD, which is currently the gold-standard method for diagnosing osteoporosis [15]. The majority of papers evaluating the ability of TBS to predict fragility fractures have used frequentist statistics, with the odds ratio used to assess the overall fracture discriminant ability of the test. Bayesian statistics, which also evaluates individually the probability that a positive or a negative diagnostic result is associated with fracture [16], has rarely been used. In this study, we used a Bayesian statistical approach to investigate the ability of TBS to estimate the probability of spine fragility fractures in the overall sample of postmenopausal women included, and also in those who tested negative to the spine BMD. Finally, we evaluated whether the combination of TBS with lumbar spine BMD (LS-BMD) improved the ability of LS-BMD alone to estimate the probability of spine fragility fractures.

Materials and Methods
The data used in this retrospective case-control study were an extension of the case series the authors had used in a previous publication [14]. Briefly, the study enrolled women who were in- or out-patients of the Rizzoli Orthopaedic Institute, Bologna, Italy, and who were referred for DXA owing to clinical risk factors for osteoporosis. Selection criteria included menopause, the presence of vertebral fragility fractures due to minor trauma, and/or the presence of clinical risk factors for osteoporosis, as well as an age range of 50-90 years. Exclusion criteria were the presence of diseases or the chronic intake of drugs known to cause secondary osteoporosis; severe obesity or thinness (BMI > 35 kg/m² or < 17 kg/m²); and previous fractures other than spine fragility fracture. Only women with spine fragility fractures documented by radiologic vertebral crushing were included in the study, together with women having clinical risk factors for osteoporosis but not reporting fractures. The required information was gathered from the requests filed by the physicians requesting the DXA. The results of LS-BMD (g/cm²) and TBS analyses, age, age at menopause, weight and height were recorded at the time of DXA. DXA of the L1-L4 vertebrae was carried out by using the technique described by the densitometer manufacturer. Only DXA acquisitions performed with a Discovery QDR densitometer (Hologic Inc., Bedford, MA, USA) and judged by the examiner to be free of accuracy errors were used in the study. Trabecular bone score iNsight (Med-Imaps TBS version 1.9.1) software was used to calculate the TBS from the L1-L4 DXA. Statistical analyses were performed by using Bayesian statistics, focusing on the role of TBS alone or combined with the LS-BMD in the assessment of osteoporotic fragility fractures in clinical practice [17]. The research was conducted in compliance with the Declaration of Helsinki and its latest amendments [18].
The study was approved by the local Ethics Committee (Comitato Etico Area Vasta Emilia Centrale, Bologna, Italy-approval number: 0005003, date 2 April 2020).

Statistical Analysis
The data analysis was carried out by using SPSS version 11 software (SPSS/PC, Chicago, IL, USA). Continuous variables were reported as mean and standard deviation (SD). The unpaired Student's t-test was used for comparison between groups of variables selected after verifying normal distribution and homogeneity of variance (Levene test). The association of fractures with diagnostic tests was obtained by using 2 × 2 contingency tables (CTs), which were subsequently used, when appropriate, to calculate the diagnostic accuracy parameters of the TBS and LS-BMD, according to the methods of Bayesian statistics. The overall accuracy of the diagnostic systems was measured by using odds ratio (OR) and 95% confidence interval (95% CI) [19]. This was calculated by using the Mantel-Haenszel test as the ratio of the fractures testing positive (positive odds) divided by the ratio of the fractures testing negative (negative odds). The Z-test was used for the OR comparison after Log transformation of the proportion. The coordinates of the receiver operating characteristic (ROC) curves, having fracture as the state variable, and LS-BMD or TBS as the test variable, were used to fix the diagnostic cutoffs of each test in the study, utilizing the maximum of the Youden index. The values of the TBS or LS-BMD greater than the respective cutoffs were defined to be negative tests, and those lower than or equal to the cutoff were defined to be positive tests. The ability of the diagnostic systems to correctly classify women with fractures (discriminative power) was measured by using sensitivity (SE) and specificity (SP), together with their "exact" Clopper-Pearson confidence intervals [20]. The between-tests comparisons of the SE and SP were carried out by comparing the likelihood ratios in a paired design [21].
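The cutoff selection described above can be illustrated with a minimal sketch. It assumes a plain exhaustive search over the observed values and uses hypothetical toy data; the function name `youden_cutoff` and the sample values are illustrative and do not come from the study, which used SPSS.

```python
def youden_cutoff(values, has_fracture):
    """Return (cutoff, J) maximizing the Youden index J = SE + SP - 1.

    Following the convention in the text, a test is positive when the
    measured value is lower than or equal to the cutoff.
    """
    n_fractured = sum(has_fracture)
    n_not_fractured = len(has_fracture) - n_fractured
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        # True positives: fractured women at or below the cutoff
        tp = sum(1 for v, f in zip(values, has_fracture) if f and v <= cutoff)
        # True negatives: non-fractured women above the cutoff
        tn = sum(1 for v, f in zip(values, has_fracture) if not f and v > cutoff)
        j = tp / n_fractured + tn / n_not_fractured - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical TBS values and fracture status (not study data)
tbs = [1.10, 1.15, 1.18, 1.22, 1.25, 1.30, 1.35, 1.40]
fractured = [True, True, False, True, False, False, False, False]

cutoff, j = youden_cutoff(tbs, fractured)
print(cutoff, round(j, 3))
```

With these toy values the search settles on the cutoff that captures all three fractured women while misclassifying only one non-fractured woman.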
The ability of the diagnostic systems to predict the post-test probability of fracture (predictive ability) was assessed by calculating the positive predictive value (PPV) (i.e., the % of women having spine fractures over the total number of women positive to the diagnostic test) and the negative predictive value (NPV) (i.e., the % of women not having spine fractures over the total number of women negative to the diagnostic test). Fracture prevalence (or pre-test probability) was calculated by dividing the number of women with fractures by the total number of women in the study sample. The CIs of the PPV and the NPV were calculated according to Mercaldo et al. [22]. The Kosinski test was adopted to compare the PPVs and the NPVs of the two diagnostic tests [23]. The SE, SP, PPV, and NPV resulting from the combination of the two diagnostic tests were calculated according to the "OR rule" and the "AND-rule", which were obtained by using the dedicated equations of Bayesian statistics. The "OR-rule" considers the diagnosis to be negative only for those negative for both tests, and to be positive for those who were positive for at least one diagnostic test. The "AND-rule" considers only those positive for both tests to be positive and those negative for at least one test to be negative [24]. Cohen's kappa statistics were used to assess the degree of concordance between the dichotomous qualitative variables of positivity or negativity to LS-BMD or to TBS in the study population. The correlations between the percentages of the pre-test fracture prevalence (at different pre-test fracture probabilities) and the PPVs of the "AND-rule" and the NPVs of the "OR-rule" were tested by linear regression.
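The two combination rules and the resulting accuracy parameters can be sketched as follows. This is a minimal illustration that combines paired per-woman results directly rather than using closed-form equations; the helper names `combine` and `accuracy` and the ten-woman data set are hypothetical.

```python
def combine(test_a, test_b, rule):
    """Combine two paired lists of boolean test results."""
    if rule == "OR":   # positive if at least one test is positive
        return [a or b for a, b in zip(test_a, test_b)]
    if rule == "AND":  # positive only if both tests are positive
        return [a and b for a, b in zip(test_a, test_b)]
    raise ValueError("rule must be 'OR' or 'AND'")


def accuracy(test_positive, has_fracture):
    """Return SE, SP, PPV and NPV from paired test and fracture status."""
    pairs = list(zip(test_positive, has_fracture))
    tp = sum(1 for t, f in pairs if t and f)
    fp = sum(1 for t, f in pairs if t and not f)
    fn = sum(1 for t, f in pairs if not t and f)
    tn = sum(1 for t, f in pairs if not t and not f)
    return {
        "SE": tp / (tp + fn),   # fractured women correctly positive
        "SP": tn / (tn + fp),   # non-fractured women correctly negative
        "PPV": tp / (tp + fp),  # fractured among test-positive women
        "NPV": tn / (tn + fn),  # non-fractured among test-negative women
    }


# Hypothetical results for ten women (not study data)
fracture = [True, True, True, False, False, False, False, False, False, False]
bmd_pos = [True, False, True, True, False, False, False, True, False, False]
tbs_pos = [True, True, False, False, True, False, False, True, False, False]

or_stats = accuracy(combine(bmd_pos, tbs_pos, "OR"), fracture)
and_stats = accuracy(combine(bmd_pos, tbs_pos, "AND"), fracture)
print(or_stats)
print(and_stats)
```

In this toy example the "OR" combination raises SE and NPV at the cost of SP, while the "AND" combination does the opposite, which is the trade-off the two rules are designed to expose.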

Results
In our study, we included 992 women who met the criteria from a sample of 1513 postmenopausal women; therefore, 521 patients were excluded. The mean age of the included patients was 68.5 ± 6.8 years (range 51-90 years). Fracture prevalence was 8.67%. The comparison between those with fractures and those without fractures is reported in Table 1. The women with fractures were significantly shorter, younger at menopause, and had lower TBS and LS-BMD values. The diagnostic fracture threshold, calculated on the entire sample of women selected, corresponded to an LS-BMD value of 0.800 g/cm² (T-score of −2.3) and to a TBS of 1.204. All the statistical analyses in the study were carried out at those cutoffs, when appropriate.

Diagnostic Concordance between the Two Tests
In the entire study sample, the women's diagnostic classifications obtained by using the TBS and LS-BMD had poor concordance at Cohen's kappa test (K = 0.355; Effect Size (ES) 0.030) (Table 2). When women with fractures and women without fractures were analyzed separately, agreement between the two diagnostic tests in classifying women as positive or negative was poor both in the women with fractures (Cohen's K = 0.100) and in those without fractures (Cohen's K = 0.367) (Table 3).
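For reference, Cohen's kappa for two dichotomous tests can be computed from the four cells of their cross-classification, as in the sketch below. The function name `cohen_kappa` and the counts are hypothetical; they are not the values of Table 2 or Table 3.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa for two binary tests applied to the same subjects.

    both_pos: positive to both tests
    a_only:   positive to test A only
    b_only:   positive to test B only
    both_neg: negative to both tests
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from the marginal totals
    a_pos, b_pos = both_pos + a_only, both_pos + b_only
    a_neg, b_neg = b_only + both_neg, a_only + both_neg
    expected = (a_pos * b_pos + a_neg * b_neg) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (not study data)
kappa = cohen_kappa(both_pos=20, a_only=5, b_only=10, both_neg=15)
print(round(kappa, 3))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, so two tests can agree on most women and still show a low kappa when one result is much more frequent than the other.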

LS-BMD Diagnostic
Diagnostic Accuracy Measurement of the Entire Sample of Women, Combining the Two Tests According to the "AND-Rule" (Fracture Prevalence 8.67%)
When carrying out the analysis by using the "AND-rule", 29.73% of the diagnostic tests were positive, and 70.27% were negative. The SE of the "AND-rule" (48.84%; 95% CI: 38.67-59.34) was lower than those of the TBS (Wald test: p < 0.001) and the LS-BMD (Wald test: p < 0.001) individually. The SP of the "AND-rule" (72.08%; 95% CI: 68.81-74.64) was greater than those of each of the two tests considered individually (in both cases, Wald test: p < 0.001). The PPV of the "AND-rule" was 14.24% (95% CI: 11.30-17.17) and the NPV was 93.69% (95% CI: 92.44-94.93). The PPV was not significantly different from those of the LS-BMD and the TBS (Wald test: p = 0.622 and p = 0.647, respectively). The NPV of the "AND rule" was lower, but not significantly so, than those of the TBS and the LS-BMD (Wald test: p = 0.091 and p = 0.177, respectively). The fracture OR of the "AND rule" combination (2.46; 95% CI: 1.58-3.85) was statistically significant (Pearson chi square test: p < 0.001); however, it was not significantly different from that of each of the two tests considered individually (Z-test on Log transformed results; LS-BMD: p = 0.305, TBS: p = 0.359).
Tables 5 and 6 show and compare the PPV and NPV percentages of the "OR-rule", the "AND-rule", the LS-BMD, and the TBS. These percentages were calculated at the pre-test probability of the sample selected for the study (8.67%) and at other pre-test probability percentages from 2 to 40%. All the calculations of the post-test probability were carried out at the values of SE and SP considered in the study, using the number of women in the present sample. The two diagnostic systems combined according to the "AND-rule" had better PPV percentages than those of each diagnostic system considered individually; however, the difference was not statistically significant (Table 5 and Figure 1).
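As a consistency check on the "AND-rule" figures reported above, the sketch below recomputes SE, SP, PPV, NPV, and the OR with its 95% CI from a 2 × 2 table. The cell counts are not given in the text: they are back-calculated here from the reported SE and SP with 86 fractured and 906 non-fractured women, and should be read as an assumption. The CI uses the log (Woolf) method, which is likewise an assumption about how the interval was obtained.

```python
import math

# Assumed cell counts (back-calculated, not reported in the text)
tp, fn = 42, 44    # fractured women: positive / negative at the AND-rule
fp, tn = 253, 653  # non-fractured women: positive / negative at the AND-rule

se = tp / (tp + fn)
sp = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

# Odds ratio as the cross-product ratio of the 2 x 2 table
odds_ratio = (tp * tn) / (fn * fp)

# 95% CI on the log scale (Woolf method, assumed)
se_log_or = math.sqrt(1 / tp + 1 / fn + 1 / fp + 1 / tn)
z = 1.96
ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"SE {se:.2%}  SP {sp:.2%}  PPV {ppv:.2%}  NPV {npv:.2%}")
print(f"OR {odds_ratio:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```

Under these assumed counts the recomputed values agree with the reported SE, SP, PPV, NPV, and OR to the second decimal place.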
The combination of the two diagnostic systems according to the "OR rule" (Table 6, Figure 2) showed the best percentages of the NPVs at each pre-test probability value considered; they were greater than those of the "AND rule" (p = 0.001, using the Kosinski method with the Bonferroni correction) at all the estimated probability percentages, and greater than those of the individual LS-BMDs and TBSs with a difference which, above the pre-test prevalence of 6%, became statistically significant (at a fracture prevalence of 6%, the statistical significance, calculated by using the Kosinski method with the Bonferroni correction, was p = 0.023 for the LS-BMD and p = 0.054 for the TBS). The R and R2 of the correlation between the percentages of the pre-test fracture prevalence and the PPVs of the "AND rule" were 0.997 (p < 0.001), and 0.994 (p = 0.001), respectively, and those of the correlation between the percentages of the pre-test fracture prevalence and the NPVs of the "OR rule" were −0.993 (p < 0.001) and −0.987 (p = 0.001), respectively.

Table 5. PPV average percentage value (and relative CIs) of the LS-BMD, the TBS, the "OR rule", and the "AND rule", calculated at different fracture prevalence values and at the same SE and SP values used in the study. The comparisons of the PPVs were carried out vs. the "AND-rule", using the Kosinski method with the Bonferroni correction; there was no statistically significant comparison with the PPVs of the "AND rule", the LS-BMD, and the TBS.

The NPV comparisons were carried out vs. the "OR rule" using the Kosinski method with the Bonferroni correction: p < 0.01 with the "AND rule" for all prevalences; p = 0.023 with the LS-BMD, and p = 0.054 with the TBS at a prevalence of 6% for both diagnostic tests.

Figure 2. Graph showing the NPV trend of the LS-BMD, the TBS, the "OR rule", and the "AND rule" (ordinates) as the fracture prevalence values varied (abscissas).
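The dependence of the PPV and NPV on the pre-test probability, which underlies Tables 5 and 6, follows from Bayes' theorem once SE and SP are fixed. The sketch below applies it to the "AND-rule" SE and SP reported in the text; the function name `post_test` and the particular grid of prevalence values are illustrative choices within the 2-40% range, not the exact values tabulated by the authors.

```python
def post_test(se, sp, prevalence):
    """Return (PPV, NPV) from SE, SP and pre-test probability (Bayes' theorem)."""
    ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return ppv, npv


# SE and SP of the "AND-rule" as reported in the text
SE_AND, SP_AND = 0.4884, 0.7208

# Illustrative pre-test probabilities (8.67% is the study prevalence)
prevalences = [0.02, 0.06, 0.0867, 0.20, 0.40]
results = [post_test(SE_AND, SP_AND, p) for p in prevalences]

for p, (ppv, npv) in zip(prevalences, results):
    print(f"prevalence {p:6.2%}  PPV {ppv:6.2%}  NPV {npv:6.2%}")
```

At the study prevalence this reproduces the PPV and NPV reported for the "AND-rule", and across the grid the PPV rises while the NPV falls as the prevalence increases.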

Discussion
Using Bayesian statistics, the ability to predict vertebral fragility fractures of two diagnostic systems (i.e., TBS and LS-BMD) was tested in a group of 992 post-menopausal women. Attention was focused on three points: the comparative ability of the TBS and LS-BMD used individually to predict vertebral fragility fractures, the ability of each of them to predict the presence of vertebral fractures only in women who tested negative when using other diagnostic systems, and, finally, the ability of LS-BMD and the TBS combined to predict spine fracture.
At the cutoff value used in the present study, the TBS and LS-BMD showed similar overall diagnostic ability to predict women with vertebral fracture, in accordance with some studies [5,9,10] but in contrast to others [7,11,13]. The estimated SEs, SPs, PPVs, and NPVs were not significantly different between the LS-BMD and the TBS. In particular, it was found that the PPV was lower than the NPV in both diagnostic systems, documenting a better ability to predict true negatives than true positives. This was analogous to other reports in the literature which used Bayesian statistics [4,20–23,25]. The relatively low pre-test prevalence of vertebral fragility fractures in the majority of studies of this type could, at least in part, explain this finding. In fact, the PPV and the NPV are directly and inversely related to the pre-test probability of disease, respectively [26]; this influenced their respective predictive abilities.
After using Cohen's kappa test to verify that the diagnostic concordance between the TBS and the LS-BMD was poor, and that, therefore, the LS-BMD and the TBS predicted partly different subsets of fractures, the ability of the TBS to predict fractures among women testing negative to LS-BMD was evaluated.
The TBS classified 31.59% of the women who tested negative to LS-BMD as positive; of these, 23.3% sustained fractures. These data confirmed those of other authors regarding the overall ability of the TBS to predict fractures, even among women having an LS-BMD outside the range of osteoporosis [7,9–11,27]. In addition, the PPVs and NPVs of the TBS obtained by analyzing the women who tested negative to the LS-BMD confirmed other data in the literature. Moreover, calculated at the same fracture prevalence, they were similar to those reported by Albrecht W. et al. [4] regarding osteopenic women. Similar results were also obtained by testing the TBS-negative women with LS-BMD, confirming the diversity of bone structural factors which could be detected by using each single method.
The fact that the TBS classified as positive some women with fractures who were LS-BMD negative suggested that, when combining LS-BMD and the TBS, a larger number of women with fractures would be detected as positive. By so doing, women could be considered to be positive when they were positive to only one or the other of the two diagnostic systems. This effectively occurred when the Bayesian "OR rule" was applied. In fact, when using the "OR rule" in the present sample, it was observed that the SE value became greater than those of each individual diagnostic system, indicating that combining LS-BMD and the TBS allowed classifying a larger percentage of women with fractures as positive. However, it should be noted that the improved sensitivity for fractures resulting from the combined use of LS-BMD and the TBS was not necessarily associated with an improvement in their ability to differentiate those with fractures from those without [28]. In fact, in the present study, the PPV of the "OR rule" was even lower than that of the LS-BMD, since the false positive rate increased in parallel. The total diagnostic performance of the "OR-rule", however, was better than that of each individual diagnostic system, in agreement with other studies [5,7], and the difference in OR even approached statistical significance with respect to the LS-BMD. The high NPV, which was significantly greater than those of both the LS-BMD and the TBS, certainly contributed to the better total diagnostic performance of the two systems combined according to the "OR rule". When using the "OR rule", the NPV became reliable enough to consider the presence of spine fragility fractures in women who were negative to both diagnostic systems highly improbable. Unfortunately, the "AND-rule" did not significantly improve the overall diagnostic ability of the two diagnostic systems considered individually, nor did it improve their positive and negative post-test fracture probability. In addition, it should be noted that the SE of the "AND rule" was also lower than that of each individual test, suggesting that the combined positivity to the LS-BMD and the TBS was not useful in improving the ability of each diagnostic system to detect fractures.
The PPV and NPV of the TBS, the LS-BMD and of their combination at different fracture pre-test probabilities were then calculated to look at the resulting variations of those parameters. The "OR rule" showed that the NPV of the women who were negative to both diagnostic systems was higher than those estimated by the LS-BMD or the TBS individually at all the fracture prevalence values tested, with a gap which became statistically significant above the pre-test probability of 6% (Figure 2). Notably, the predictive ability of the NPV of the "OR rule" had statistically significant values for fracture prevalence greater than those generally reported in free-living postmenopausal women [29]. On the contrary, the "AND rule" showed that women positive to both diagnostic systems had low PPV values at all the fracture prevalences in the present study; these values did not differ significantly from those estimated by each individual diagnostic system.
To the authors' knowledge, the only study in the literature which allowed a comparison with the present data regarding the percentage of PPVs and NPVs when the LS-BMD and the TBS were combined according to Bayesian statistics was that of Nassar et al. [10]. The present results confirmed those of these authors. In fact, the present data, recalculated at the pre-test probability of the Nassar study for homogeneity of comparison, showed similar percentages of the NPV at the "OR-rule" in women testing negative and of the PPV at the "AND-rule" for those testing positive.
In summary, according to the present data, LS-BMD and the TBS predicted fracture to a similar extent; however, the identification of true-positive women was partially discordant between the two diagnostic systems. Compared to each diagnostic system alone, their combination according to the "OR rule" allowed us to identify a greater number of women with vertebral fractures (increased SE); however, at the same time, there was a greater number of false positives, leading to a low rate of correct positive-fracture prediction (PPV did not increase). Using the "AND rule", the women positive to both diagnostic systems did not have a better sensitivity or positive predictive value for fracture than those who tested positive at each individual diagnostic system; finally, the probability of not being fractured in women testing negative to both diagnostic systems was greater than those estimated by LS-BMD and the TBS individually.
Since the simultaneous negativity to both diagnostic systems gave a strong probability of fracture absence, it appeared that, when searching for women negative to both diagnostic systems, the TBS can simply be used as a second investigation in subjects negative to LS-BMD in order to confirm their negativity and their low probability of fracture [5].
The present study had several limitations. It was limited to vertebral fractures diagnosed following investigations required for clinical symptoms already in place. It was not accompanied by other analyses aimed at ascertaining the metabolism of the bone, and it was carried out on women who were referred for DXA due to having risk factors for osteoporosis.

Conclusions
In conclusion, the present study highlights the possibility that combining LS-BMD and TBS could improve the overall predictive ability of DXA in diagnosing vertebral fragility fractures and that there is a significant probability of absence of fractures in women who test negative to both diagnostic systems. Additional research is, nevertheless, necessary to confirm the present results.

Institutional Review Board Statement:
The study was approved by the local Ethics Committee (Comitato Etico Area Vasta Emilia Centrale, Bologna, Italy-approval number: 0005003, date 2 April 2020).

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, J.C., upon reasonable request.