# Probabilistic Pairwise Model Comparisons Based on Bootstrap Estimators of the Kullback–Leibler Discrepancy

## Abstract

## 1. Introduction

## 2. Methodological Development

#### 2.1. Background

#### 2.2. The Discrepancy Comparison Probability and Bootstrap Discrepancy Comparison Probability

## 3. Bias Corrections for the BDCP

**Lemma 1.**

**Proof.**

## 4. Simulation Studies

#### 4.1. Settings for Simulation Sets

- Set 1: The null model is correctly specified, and the alternative model is overspecified.

- Set 2: The null model is underspecified, and the alternative model is correctly specified.

- Set 3: Both the null and alternative models are underspecified, but the null model is closer to the data-generating model.

- Set 4: The null and alternative models are equally underspecified.

- Set 5: The null model has the correct mean specification, and the alternative model is overspecified, but both are misspecified with respect to the error distribution, which is a Student’s t distribution.

- Set 6: The null model has the correct mean specification, and the alternative model is overspecified, but both are misspecified with respect to the error distribution, which is a mixture of normal distributions.
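The non-normal error structures named in Sets 5 and 6 can be illustrated with a short NumPy sketch. The degrees of freedom, mixture weight, and component scales below are hypothetical placeholders rather than the values used in the simulations, and the mixture is assumed to be a zero-mean scale mixture of two normals.

```python
import numpy as np

def simulate_errors(n, kind, rng, df=5, mix_weight=0.9, mix_scales=(1.0, 3.0)):
    """Draw n error terms. All parameter defaults are hypothetical, not the paper's."""
    if kind == "normal":
        # Baseline: standard normal errors
        return rng.standard_normal(n)
    if kind == "t":
        # Set 5 style: heavy-tailed Student's t errors with df degrees of freedom
        return rng.standard_t(df, size=n)
    if kind == "mixture":
        # Set 6 style: each error comes from N(0, s1^2) with probability
        # mix_weight, otherwise from N(0, s2^2)
        first = rng.random(n) < mix_weight
        scale = np.where(first, mix_scales[0], mix_scales[1])
        return scale * rng.standard_normal(n)
    raise ValueError(f"unknown error kind: {kind}")
```

A response for a given design matrix `X` and coefficient vector `beta` (both hypothetical) would then be `X @ beta + simulate_errors(n, "t", rng)`.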

#### 4.2. KLDCP Estimates from Simulations

- (1)
- (2) **BDCPb** corresponds to results based on the distribution of 5000 replicates of BDCPb. Each BDCPb is computed using (6) with 200 bootstrap samples for Sets 1–5 and 500 bootstrap samples for Set 6.
- (3) **BDCPk** corresponds to results based on the distribution of 5000 replicates of BDCPk. Each BDCPk is computed using (7) with 200 bootstrap samples for Sets 1–5 and 500 bootstrap samples for Set 6.
- (4) **BDCP** corresponds to results based on the distribution of 5000 replicates of the uncorrected BDCP. Each BDCP is computed using (1) with 200 bootstrap samples for Sets 1–5 and 500 bootstrap samples for Set 6.
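To make the structure of these proportions concrete, the sketch below assumes that the uncorrected BDCP in (1) is the fraction of bootstrap samples in which the null model's bootstrap discrepancy is smaller than the alternative model's, and that the corrected versions in (6) and (7) add a model-specific penalty to each discrepancy before the comparison. These forms are assumptions for illustration only; `bdcp`, `corr_null`, and `corr_alt` are hypothetical names, and the strict-inequality tie rule may differ from the paper's.

```python
import numpy as np

def bdcp(disc_null, disc_alt, corr_null=0.0, corr_alt=0.0):
    """Proportion of bootstrap samples in which the null model's discrepancy is smaller.

    disc_null[m], disc_alt[m]: -2 * log-likelihood of each model's bootstrap
    estimate from sample m, evaluated on the original data y (assumed form).
    corr_null, corr_alt: hypothetical additive penalties (e.g. k- or k_b-type
    corrections); leaving both at 0.0 gives the uncorrected proportion.
    """
    d1 = np.asarray(disc_null, dtype=float) + corr_null
    d2 = np.asarray(disc_alt, dtype=float) + corr_alt
    if d1.shape != d2.shape:
        raise ValueError("need one paired discrepancy per bootstrap sample")
    # Mean of a boolean array = fraction of samples favoring the null model
    return float(np.mean(d1 < d2))
```

For example, `bdcp([10, 12, 9, 11], [11, 11, 10, 12])` returns 0.75, while adding a penalty of 2 to every null discrepancy (`corr_null=2`) drives the proportion to 0.0. The example shows how a correction that penalizes one model more heavily shifts the probability toward the other.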

#### 4.3. Estimates of the Expected KLD from Simulations

- (1) **E(KLD)** corresponds to the average of 5000 discrepancies calculated using (9).
- (2) **E(BD)** corresponds to the average of 5000 replicates of BD, where each BD is calculated by
  $$\frac{1}{M}\sum_{m=1}^{M}-2\,\ell\left(\widehat{\theta}^{*}(m)\mid y\right).$$
  We have $M=200$ for Sets 1–5 and $M=500$ for Set 6.
- (3) **ΔBDb** corresponds to the difference between the estimate of E(BD), with each BD corrected by $k_b$, and the estimate of E(KLD) described in (1). In other words, let $j\in\{1,2,\dots,5000\}$ index the simulated data sets, let $\tilde{BD}_j$ be the BD estimate for data set $j$, let $k_{jb}$ be the $k_b$ correction for data set $j$, and let $\widehat{\mathrm{E}}(\mathrm{KLD})$ denote the estimate described in (1). Then
  $$\Delta\mathrm{BDb}=\frac{1}{5000}\sum_{j=1}^{5000}\left[\tilde{BD}_j+k_{jb}\right]-\widehat{\mathrm{E}}(\mathrm{KLD}).$$
- (4) **ΔBDk** shows the same difference described in (3), but using $k$ instead of $k_b$, which results in
  $$\Delta\mathrm{BDk}=\frac{1}{5000}\sum_{j=1}^{5000}\left[\tilde{BD}_j+k\right]-\widehat{\mathrm{E}}(\mathrm{KLD}).$$
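The BD estimator in item (2) and the bias summaries in items (3) and (4) can be sketched for a normal linear regression model. This sketch assumes a case-resampling (pairs) bootstrap and maximum likelihood estimates with $\widehat{\sigma}^2 = \mathrm{RSS}/n$; the paper's model class and resampling scheme may differ, and all function names are hypothetical. Each bootstrap estimate $\widehat{\theta}^{*}(m)$ is fitted to a resampled data set but evaluated on the original $y$.

```python
import numpy as np

def neg2_loglik(y, X, beta, sigma2):
    """-2 * log-likelihood of a normal linear model at (beta, sigma2)."""
    resid = y - X @ beta
    return y.size * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2

def fit_mle(y, X):
    """Least-squares beta and the ML variance estimate RSS / n."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid / y.size

def bootstrap_discrepancy(y, X, M, rng):
    """BD = (1/M) * sum_m -2 * loglik(theta*(m) | y), using case resampling."""
    n = y.size
    total = 0.0
    for _ in range(M):
        idx = rng.integers(0, n, size=n)                 # resample (x_i, y_i) rows
        beta_star, s2_star = fit_mle(y[idx], X[idx])     # bootstrap estimate theta*(m)
        total += neg2_loglik(y, X, beta_star, s2_star)   # evaluated on the original y
    return total / M

def delta_bd(bd_values, corrections, kld_values):
    """Mean of corrected BD_j minus the mean of the simulated KLDs (items 3 and 4).

    corrections may be per-data-set values (k_jb) or a single scalar (k).
    """
    corrected = np.asarray(bd_values, float) + np.asarray(corrections, float)
    return float(corrected.mean() - np.mean(kld_values))
```

Because $\widehat{\theta}$ maximizes the likelihood of $y$, every summand $-2\ell(\widehat{\theta}^{*}(m)\mid y)$ is at least $-2\ell(\widehat{\theta}\mid y)$, which gives a quick sanity check on an implementation.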

#### 4.4. Discussion of Simulation Results

## 5. Application: Creatine Kinase Levels during Football Preseason

#### 5.1. Overview of Application

- Setting 1: Testing the propriety of the model containing $CK1$.
  $$\begin{aligned} &H_1: CK3=\beta_1,\\ &H_2: CK3=\beta_1+\beta_2\,CK1. \end{aligned}$$
- Setting 2: Testing the propriety of the model containing $CK1$ and $Semesters$ over the model containing only $CK1$.
  $$\begin{aligned} &H_1: CK3=\beta_1+\beta_2\,CK1,\\ &H_2: CK3=\beta_1+\beta_2\,CK1+\beta_3\,Semesters. \end{aligned}$$
- Setting 3: Head-to-head comparison of non-nested models.
  $$\begin{aligned} &H_1: CK3=\beta_1+\beta_2\,CK1+\beta_3\,BMI,\\ &H_2: CK3=\beta_1+\beta_2\,CK1+\beta_3\,Semesters. \end{aligned}$$
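The nested comparison in Setting 2 can be mimicked end to end on simulated data. This sketch is not the creatine kinase analysis: the data are synthetic, `x1` and `x2` merely stand in for $CK1$ and $Semesters$, the bootstrap resamples cases, and the penalties passed as `k_null` and `k_alt` (here, hypothetically, parameter counts including the error variance) are assumptions rather than the paper's $k$ or $k_b$. Both models are refitted on the same resampled rows, so each bootstrap sample yields one paired comparison.

```python
import numpy as np

def neg2_loglik(y, X, beta, sigma2):
    # -2 * log-likelihood of a normal linear model
    r = y - X @ beta
    return y.size * np.log(2 * np.pi * sigma2) + r @ r / sigma2

def fit(y, X):
    # Least-squares coefficients and ML variance RSS / n
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, r @ r / y.size

def paired_bdcp(y, X_null, X_alt, M=200, k_null=0.0, k_alt=0.0, seed=0):
    """Share of bootstrap samples in which the null model has the smaller
    (optionally penalized) discrepancy; both models use the same resampled rows."""
    rng = np.random.default_rng(seed)
    n = y.size
    wins = 0
    for _ in range(M):
        idx = rng.integers(0, n, size=n)
        # Fit each model to the bootstrap sample, then score it on the original y
        d_null = neg2_loglik(y, X_null, *fit(y[idx], X_null[idx])) + k_null
        d_alt = neg2_loglik(y, X_alt, *fit(y[idx], X_alt[idx])) + k_alt
        wins += int(d_null < d_alt)
    return wins / M

# Hypothetical synthetic data: x2 is irrelevant to y by construction.
rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 1.0 + 0.8 * x1 + rng.standard_normal(n)
X_h1 = np.column_stack([np.ones(n), x1])        # analogue of H1 in Setting 2
X_h2 = np.column_stack([np.ones(n), x1, x2])    # analogue of H2 in Setting 2
print("uncorrected:", paired_bdcp(y, X_h1, X_h2))
print("penalized:  ", paired_bdcp(y, X_h1, X_h2, k_null=3, k_alt=4))
```

Pairing the two fits on identical resampled rows keeps the comparison within each bootstrap sample, so the proportion reflects differences between the models rather than differences between resamples.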

#### 5.2. Results of Application

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Data Availability Statement

## Conflicts of Interest


**Table 1.** Distribution approximations for Set 1, where the null model is correctly specified, while the alternative model is overspecified.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 1.000 | 0.878 | 0.868 | 0.515 |
Median | 1.000 | 1.000 | 1.000 | 0.515 |
SD | 0.000 | 0.233 | 0.241 | 0.282 |
**N = 100** | | | | |
Mean | 1.000 | 0.918 | 0.864 | 0.564 |
Median | 1.000 | 1.000 | 0.995 | 0.580 |
SD | 0.000 | 0.186 | 0.225 | 0.256 |
**N = 50** | | | | |
Mean | 1.000 | 0.966 | 0.875 | 0.631 |
Median | 1.000 | 1.000 | 0.980 | 0.650 |
SD | 0.000 | 0.111 | 0.193 | 0.220 |
**N = 25** | | | | |
Mean | 1.000 | 0.999 | 0.886 | 0.739 |
Median | 1.000 | 1.000 | 0.955 | 0.755 |
SD | 0.000 | 0.012 | 0.144 | 0.156 |

**Table 2.** Distribution approximations for Set 2, where the null model is underspecified, while the alternative model is correctly specified.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 0.001 | 0.022 | 0.021 | 0.011 |
Median | 0.001 | 0.000 | 0.000 | 0.000 |
SD | 0.000 | 0.088 | 0.085 | 0.043 |
**N = 100** | | | | |
Mean | 0.156 | 0.470 | 0.428 | 0.264 |
Median | 0.156 | 0.340 | 0.280 | 0.170 |
SD | 0.005 | 0.390 | 0.378 | 0.257 |
**N = 50** | | | | |
Mean | 0.372 | 0.691 | 0.597 | 0.409 |
Median | 0.372 | 0.905 | 0.630 | 0.360 |
SD | 0.007 | 0.350 | 0.354 | 0.266 |
**N = 25** | | | | |
Mean | 0.617 | 0.890 | 0.698 | 0.536 |
Median | 0.617 | 0.990 | 0.785 | 0.535 |
SD | 0.006 | 0.213 | 0.280 | 0.222 |

**Table 3.** Distribution approximations for Set 3, where the null and alternative models are underspecified, but the null model is closer to the true data-generating model.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 1.000 | 1.000 | 1.000 | 1.000 |
Median | 1.000 | 1.000 | 1.000 | 1.000 |
SD | 0.000 | 0.013 | 0.013 | 0.013 |
**N = 100** | | | | |
Mean | 0.979 | 0.910 | 0.910 | 0.910 |
Median | 0.979 | 1.000 | 1.000 | 1.000 |
SD | 0.002 | 0.244 | 0.244 | 0.244 |
**N = 50** | | | | |
Mean | 0.916 | 0.807 | 0.808 | 0.808 |
Median | 0.916 | 0.970 | 0.970 | 0.970 |
SD | 0.004 | 0.311 | 0.309 | 0.309 |
**N = 25** | | | | |
Mean | 0.804 | 0.692 | 0.699 | 0.699 |
Median | 0.805 | 0.845 | 0.840 | 0.840 |
SD | 0.005 | 0.314 | 0.303 | 0.303 |

**Table 4.** Distribution approximations for Set 4, where the null and alternative models are equally underspecified.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 0.498 | 0.507 | 0.507 | 0.507 |
Median | 0.498 | 0.570 | 0.580 | 0.580 |
SD | 0.007 | 0.478 | 0.478 | 0.478 |
**N = 100** | | | | |
Mean | 0.500 | 0.510 | 0.509 | 0.509 |
Median | 0.500 | 0.562 | 0.567 | 0.567 |
SD | 0.007 | 0.442 | 0.442 | 0.442 |
**N = 50** | | | | |
Mean | 0.500 | 0.502 | 0.502 | 0.502 |
Median | 0.500 | 0.505 | 0.515 | 0.515 |
SD | 0.007 | 0.407 | 0.406 | 0.406 |
**N = 25** | | | | |
Mean | 0.501 | 0.501 | 0.501 | 0.501 |
Median | 0.501 | 0.490 | 0.495 | 0.495 |
SD | 0.007 | 0.353 | 0.345 | 0.345 |

**Table 5.** Distribution approximations for Set 5, where the null and alternative models are misspecified with respect to the error distribution. Here, the errors are generated from a Student’s t distribution.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 1.000 | 0.794 | 0.794 | 0.499 |
Median | 1.000 | 1.000 | 1.000 | 0.500 |
SD | 0.000 | 0.329 | 0.328 | 0.289 |
**N = 100** | | | | |
Mean | 1.000 | 0.807 | 0.794 | 0.507 |
Median | 1.000 | 1.000 | 1.000 | 0.515 |
SD | 0.000 | 0.318 | 0.323 | 0.284 |
**N = 50** | | | | |
Mean | 1.000 | 0.825 | 0.790 | 0.508 |
Median | 1.000 | 1.000 | 0.995 | 0.505 |
SD | 0.000 | 0.301 | 0.315 | 0.273 |
**N = 25** | | | | |
Mean | 1.000 | 0.862 | 0.790 | 0.525 |
Median | 1.000 | 1.000 | 0.985 | 0.530 |
SD | 0.000 | 0.270 | 0.306 | 0.261 |

**Table 6.** Distribution approximations for Set 6, where the null and alternative models are misspecified with respect to the error distribution. Here, the errors are generated from a mixture of normal distributions.

Statistic | KLDCP | BDCPb | BDCPk | BDCP |
---|---|---|---|---|
**N = 500** | | | | |
Mean | 1.000 | 0.783 | 0.786 | 0.487 |
Median | 1.000 | 1.000 | 1.000 | 0.484 |
SD | 0.000 | 0.338 | 0.335 | 0.289 |
**N = 100** | | | | |
Mean | 1.000 | 0.808 | 0.793 | 0.495 |
Median | 1.000 | 1.000 | 0.998 | 0.496 |
SD | 0.000 | 0.322 | 0.325 | 0.283 |
**N = 50** | | | | |
Mean | 1.000 | 0.851 | 0.793 | 0.502 |
Median | 1.000 | 1.000 | 0.994 | 0.494 |
SD | 0.000 | 0.286 | 0.311 | 0.269 |
**N = 25** | | | | |
Mean | 1.000 | 0.906 | 0.787 | 0.509 |
Median | 1.000 | 1.000 | 0.986 | 0.490 |
SD | 0.000 | 0.229 | 0.300 | 0.246 |

**Table 7.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 1. Here, the null model is correctly specified, while the alternative model is overspecified.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 3378.949 | 3375.407 | 0.488 | 0.411 |
Alternative | 3383.138 | 3375.578 | 0.686 | 0.362 |
**N = 100** | | | | |
Null | 679.282 | 675.291 | 0.385 | −0.030 |
Alternative | 684.115 | 676.667 | 2.518 | 0.521 |
**N = 50** | | | | |
Null | 342.167 | 338.498 | 1.267 | 0.268 |
Alternative | 348.245 | 342.348 | 7.476 | 2.065 |
**N = 25** | | | | |
Null | 174.334 | 171.169 | 3.657 | 0.910 |
Alternative | 183.828 | 193.249 | 43.328 | 17.290 |

**Table 8.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 2. Here, the null model is underspecified, while the alternative model is correctly specified.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 3340.491 | 3335.733 | 0.410 | 0.290 |
Alternative | 3328.467 | 3322.581 | 0.319 | 0.143 |
**N = 100** | | | | |
Null | 672.373 | 667.928 | 1.210 | 0.520 |
Alternative | 671.137 | 665.628 | 1.493 | 0.454 |
**N = 50** | | | | |
Null | 339.515 | 334.726 | 1.891 | 0.226 |
Alternative | 339.923 | 334.181 | 2.888 | 0.305 |
**N = 25** | | | | |
Null | 174.136 | 171.376 | 7.446 | 2.223 |
Alternative | 176.073 | 174.320 | 13.270 | 4.106 |

**Table 9.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 3. Here, the null and alternative models are underspecified, but the null model is closer to the true data-generating model.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 3726.902 | 3726.159 | 3.401 | 3.332 |
Alternative | 3832.770 | 3832.395 | 3.704 | 3.626 |
**N = 100** | | | | |
Null | 745.967 | 745.809 | 4.358 | 3.943 |
Alternative | 766.212 | 766.813 | 4.947 | 4.528 |
**N = 50** | | | | |
Null | 373.419 | 373.704 | 5.309 | 4.325 |
Alternative | 383.156 | 384.020 | 5.843 | 4.858 |
**N = 25** | | | | |
Null | 187.563 | 188.745 | 8.082 | 5.245 |
Alternative | 191.924 | 194.082 | 8.878 | 6.088 |

**Table 10.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 4. Here, the null and alternative models are equally underspecified.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 3923.423 | 3923.908 | 5.022 | 4.948 |
Alternative | 3923.580 | 3924.705 | 5.475 | 5.399 |
**N = 100** | | | | |
Null | 784.021 | 784.917 | 5.080 | 4.670 |
Alternative | 784.042 | 785.026 | 5.241 | 4.823 |
**N = 50** | | | | |
Null | 391.751 | 393.155 | 6.335 | 5.343 |
Alternative | 391.753 | 393.131 | 6.222 | 5.239 |
**N = 25** | | | | |
Null | 195.732 | 198.616 | 9.602 | 6.821 |
Alternative | 195.862 | 198.690 | 9.598 | 6.804 |

**Table 11.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 5. Here, the null and alternative models are misspecified with respect to the error distribution, and the errors are generated from a Student’s t distribution.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 1678.652 | 1672.369 | −2.224 | −4.178 |
Alternative | 1679.695 | 1672.387 | −2.248 | −4.231 |
**N = 100** | | | | |
Null | 338.728 | 334.154 | −0.920 | −2.471 |
Alternative | 339.866 | 334.300 | −0.728 | −2.438 |
**N = 50** | | | | |
Null | 171.377 | 167.500 | −0.231 | −1.839 |
Alternative | 172.640 | 167.847 | 0.283 | −1.714 |
**N = 25** | | | | |
Null | 87.689 | 83.577 | −0.434 | −2.077 |
Alternative | 89.311 | 84.495 | 0.869 | −1.785 |

**Table 12.** Expected value of the KLD, its bootstrap estimate, and the bias of the corrected bootstrap estimates for the null and alternative models in Set 6. Here, the null and alternative models are misspecified with respect to the error distribution, and the errors are generated from a mixture of normal distributions.

Hypothesis | E(KLD) | E(BD) | $\mathbf{\Delta}$BDb | $\mathbf{\Delta}$BDk |
---|---|---|---|---|
**N = 500** | | | | |
Null | 2488.932 | 2480.154 | −0.389 | 6.554 |
Alternative | 2490.012 | 2480.141 | −0.310 | 6.659 |
**N = 100** | | | | |
Null | 508.122 | 497.000 | −0.383 | 8.404 |
Alternative | 509.426 | 497.237 | −0.597 | 8.459 |
**N = 50** | | | | |
Null | 263.382 | 252.424 | −2.852 | 8.590 |
Alternative | 264.974 | 253.245 | −3.930 | 8.361 |
**N = 25** | | | | |
Null | 144.895 | 131.870 | −4.361 | 10.842 |
Alternative | 147.551 | 134.298 | −7.782 | 9.956 |

**Table 13.** From left to right: results for Setting 1, Setting 2, and Setting 3. BDCPk is the BDCP corrected by $k$, BDCPb is the BDCP corrected by ${k}_{b}$, and BDCP is the uncorrected BDCP. Results are based on 200 bootstrap samples.

Statistic | Setting 1 | Setting 2 | Setting 3 |
---|---|---|---|
**BDCP** | | | |
BDCPk | 0.075 | 0.990 | 0.815 |
BDCPb | 0.075 | 0.995 | 0.780 |
BDCP | 0.055 | 0.495 | 0.815 |
**p-Value** | | | |
CK1 | 0.001 | 0.001 | 0.001 |
Semesters | | 0.734 | 0.936 |
BMI | | | 0.176 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Dajles, A.; Cavanaugh, J.
Probabilistic Pairwise Model Comparisons Based on Bootstrap Estimators of the Kullback–Leibler Discrepancy. *Entropy* **2022**, *24*, 1483.
https://doi.org/10.3390/e24101483

**AMA Style**

Dajles A, Cavanaugh J.
Probabilistic Pairwise Model Comparisons Based on Bootstrap Estimators of the Kullback–Leibler Discrepancy. *Entropy*. 2022; 24(10):1483.
https://doi.org/10.3390/e24101483

**Chicago/Turabian Style**

Dajles, Andres, and Joseph Cavanaugh.
2022. "Probabilistic Pairwise Model Comparisons Based on Bootstrap Estimators of the Kullback–Leibler Discrepancy" *Entropy* 24, no. 10: 1483.
https://doi.org/10.3390/e24101483