# Statistical Models for High-Risk Intestinal Metaplasia with DNA Methylation Profiling

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. epiTOC2 Model and TNSC Covariate

#### 2.2. Multinomial Mixed-Link Models

#### 2.3. Model Selection and Evaluation

## 3. Results

#### 3.1. Statistical Model Selection for Predicting IM Based on TNSC

#### 3.2. Statistical Model Selection for Predicting IM Based on TNSC and Gastric Atrophy

#### 3.3. Statistical Model Selection after Removing Unknown and Marked Categories

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
AIC | Akaike information criterion |
BIC | Bayesian information criterion |
CE | cross entropy |
CpG | 5′-C-phosphate-G-3′ sequence of nucleotides |
cloglog | complementary log-log link |
DNA | deoxyribonucleic acid |
GCEP | Gastric Cancer Epidemiology Program |
ID | identifier |
IM | intestinal metaplasia |
loglog | log-log link |
MIM | mild intestinal metaplasia |
MLE | maximum likelihood estimate |
npo | non-proportional odds assumption |
po | proportional odds assumption |
PRC2 | polycomb repressive complex-2 |
TNSC | total number of stem cell divisions |

## Appendix A. AIC and BIC Values of Multinomial Mixed-Link Models Using TNSC for Predicting IM

Baseline-category npo model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.49 | 159.78 | 147.84 | 159.12 | 147.09 | 158.37 | - | - |
probit | 148.95 | 160.23 | 148.27 | 159.56 | 147.47 | 158.75 | - | - |
loglog | 148.83 | 160.11 | 148.11 | 159.39 | 146.69 | 157.97 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Cumulative npo model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.69 | 159.97 | 147.69 | 158.97 | 156.46 | 167.74 | - | - |
probit | 148.35 | 159.63 | 147.39 | 158.67 | 149.97 | 161.25 | - | - |
loglog | 146.24 | 157.52 | 145.53 | 156.81 | 147.10 | 158.38 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Adjacent-categories npo model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.49 | 159.78 | 147.87 | 159.15 | 146.83 | 158.11 | 149.47 | 160.75 |
probit | 148.54 | 159.82 | 147.92 | 159.20 | 146.90 | 158.18 | 149.51 | 160.79 |
loglog | 147.65 | 158.93 | 147.01 | 158.29 | 145.97 | 157.25 | 148.56 | 159.85 |
cloglog | 150.20 | 161.49 | 149.62 | 160.90 | 148.71 | 159.99 | 151.17 | 162.46 |

Continuation-ratio npo model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.95 | 160.23 | 148.30 | 159.58 | 147.28 | 158.56 | 149.81 | 161.09 |
probit | 148.55 | 159.84 | 147.91 | 159.19 | 146.88 | 158.16 | 149.41 | 160.70 |
loglog | 147.61 | 158.89 | 146.96 | 158.24 | 145.93 | 157.21 | 148.47 | 159.75 |
cloglog | 151.95 | 163.23 | 151.30 | 162.58 | 150.27 | 161.55 | 152.81 | 164.09 |

Baseline-category po model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 173.33 | 181.79 | 202.33 | 210.79 | 197.84 | 206.30 | 188.22 | 196.68 |
probit | 151.12 | 159.58 | 172.76 | 181.22 | 164.81 | 173.27 | 162.85 | 171.31 |
loglog | 159.04 | 167.50 | 180.71 | 189.17 | 176.63 | 185.09 | 170.09 | 178.55 |
cloglog | 156.83 | 165.29 | 176.00 | 184.46 | 169.15 | 177.61 | 166.43 | 174.89 |

Cumulative po model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 150.23 | 158.70 | 194.90 | 203.36 | 204.72 | 213.18 | - | - |
probit | 147.75 | 156.21 | 149.25 | 157.71 | 155.53 | 163.99 | - | - |
loglog | 144.29 | 152.75 | 148.81 | 157.27 | 147.08 | 155.54 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Adjacent-categories po model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.87 | 157.33 | 153.58 | 162.04 | 153.82 | 162.28 | 151.66 | 160.12 |
probit | 146.57 | 155.03 | 148.59 | 157.05 | 148.14 | 156.60 | 148.17 | 156.63 |
loglog | 146.33 | 154.79 | 149.87 | 158.34 | 149.56 | 158.02 | 148.63 | 157.09 |
cloglog | 148.21 | 156.67 | 150.03 | 158.49 | 149.74 | 158.20 | 149.68 | 158.14 |

Continuation-ratio po model; rows give the first link, columns the second.

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 154.43 | 162.89 | 167.30 | 175.76 | 165.20 | 173.66 | 162.05 | 170.51 |
probit | 147.39 | 155.85 | 153.58 | 162.04 | 152.13 | 160.59 | 150.99 | 159.45 |
loglog | 146.96 | 155.42 | 153.40 | 161.86 | 151.96 | 160.43 | 150.83 | 159.29 |
cloglog | 152.17 | 160.63 | 161.47 | 169.94 | 159.67 | 168.13 | 157.42 | 165.88 |
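
Each grid in this appendix is read by locating the link pair with the smallest criterion value. The sketch below illustrates that lookup for one grid. The AIC values are copied from the second grid of this appendix (the one whose minimum is 145.53), with cells reported as "-" left out. The helper name `best_link_pair` and the dictionary layout are illustrative assumptions, not the authors' code.

```python
# AIC values from one link grid of Appendix A, keyed by (row link, column link).
# Cells shown as "-" in the table are omitted (no value is reported for them).
aic = {
    ("logit", "logit"): 148.69, ("logit", "probit"): 147.69, ("logit", "loglog"): 156.46,
    ("probit", "logit"): 148.35, ("probit", "probit"): 147.39, ("probit", "loglog"): 149.97,
    ("loglog", "logit"): 146.24, ("loglog", "probit"): 145.53, ("loglog", "loglog"): 147.10,
}


def best_link_pair(grid):
    """Return the (row link, column link) key with the smallest criterion value."""
    return min(grid, key=grid.get)


best = best_link_pair(aic)
print(best, aic[best])
```

Within a single grid every fitted cell shows the same BIC minus AIC gap (11.28 here), so ranking by BIC selects the same cell.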

**Figure 2.** Predictive probabilities ${\widehat{\pi}}_{ij}$ based on Model 1 against true response labels (left panel: $j=1$; middle panel: $j=2$; right panel: $j=3$).

**Figure 3.** Boxplots of the cross-entropy loss of Model 1 and Model 2 based on ten-fold cross-validation with ten random partitions.

**Figure 7.** Boxplots of the cross-entropy loss (on 98 samples only) of Models 1, 2, and 3 based on ten-fold cross-validation with ten random partitions.

**Figure 8.** Boxplots of the cross-entropy loss (on 26 samples only) of Models 1 and 2 based on ten-fold cross-validation with ten random partitions.
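
The cross-entropy loss shown in the boxplots can be sketched as follows. This assumes the usual definition, the mean negative log predicted probability of the true category. The section does not say whether the authors average or sum over samples, so the averaging here is an assumption. The function name `cross_entropy` and the probabilities are hypothetical and are not taken from the study data.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log predicted probability of the true category.

    probs  : list of rows; each row holds predicted probabilities over J categories
    labels : list of true category indices (0-based), one per row
    eps    : floor that keeps log() finite when a predicted probability is 0
    """
    total = 0.0
    for row, y in zip(probs, labels):
        total -= math.log(max(row[y], eps))
    return total / len(labels)


# Hypothetical predictions for three samples and J = 3 categories.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
labels = [0, 1, 2]
loss = cross_entropy(probs, labels)
print(round(loss, 4))
```

In a ten-fold cross-validation, a loss of this kind would be computed on each held-out fold from a model fitted to the other nine folds.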

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | loglog, loglog | 146.69 | 157.97 |
Cumulative npo | loglog, probit | 145.53 | 156.81 |
Adjacent-categories npo | loglog, loglog | 145.97 | 157.25 |
Continuation-ratio npo | loglog, loglog | 145.93 | 157.21 |
Baseline-category po | probit, logit | 151.12 | 159.58 |
Cumulative po | loglog, logit | 144.29 | 152.75 |
Adjacent-categories po | loglog, logit | 146.33 | 154.79 |
Continuation-ratio po | loglog, logit | 146.96 | 155.42 |
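
As a consistency check on the table above, the BIC minus AIC gap can be reproduced from the standard definitions $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\ln n - 2\log L$, which give $\mathrm{BIC} - \mathrm{AIC} = k(\ln n - 2)$. The sketch below rests on inferred values, not values stated in this section. It takes $n = 124$, the sum of the 98 and 26 samples named in the figure captions. It takes $k = 4$ for the npo models and $k = 3$ for the po models, which is what one covariate (TNSC) and three response categories would imply.

```python
import math


def bic_minus_aic(k, n):
    """Gap implied by AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L."""
    return k * (math.log(n) - 2.0)


n = 124  # assumption: 98 + 26 samples, inferred from the figure captions

gap_npo = round(bic_minus_aic(4, n), 2)  # assumption: 2 intercepts + 2 slopes
gap_po = round(bic_minus_aic(3, n), 2)   # assumption: 2 intercepts + 1 shared slope

print(gap_npo)  # compare with 157.97 - 146.69 in the table
print(gap_po)   # compare with 152.75 - 144.29 in the table
```

Under these assumptions the computed gaps agree with the tabulated differences to two decimals.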

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | logit, probit | 109.95 | 143.79 |
Cumulative npo | loglog, logit | 109.20 | 143.04 |
Adjacent-categories npo | logit, logit | 109.97 | 143.81 |
Continuation-ratio npo | logit, logit | 110.97 | 144.82 |
Baseline-category po | probit, logit | 111.31 | 131.05 |
Cumulative po | probit, probit | 110.03 | 129.77 |
Adjacent-categories po | logit, logit | 108.89 | 128.63 |
Continuation-ratio po | probit, probit | 109.32 | 129.06 |

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | logit, probit | 81.43 | 102.11 |
Cumulative npo | probit, probit | 84.29 | 104.97 |
Adjacent-categories npo | logit, probit | 83.22 | 103.90 |
Continuation-ratio npo | logit, probit | 83.56 | 104.24 |
Baseline-category po | probit, logit | 82.39 | 95.32 |
Cumulative po | probit, probit | 77.99 | 90.92 |
Adjacent-categories po | probit, probit | 77.56 | 90.48 |
Continuation-ratio po | probit, probit | 77.77 | 90.69 |

