# Statistical Models for High-Risk Intestinal Metaplasia with DNA Methylation Profiling

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. epiTOC2 Model and TNSC Covariate

#### 2.2. Multinomial Mixed-Link Models

#### 2.3. Model Selection and Evaluation

## 3. Results

#### 3.1. Statistical Model Selection for Predicting IM Based on TNSC

#### 3.2. Statistical Model Selection for Predicting IM Based on TNSC and Gastric Atrophy

#### 3.3. Statistical Model Selection after Removing Unknown and Marked Categories

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
AIC | Akaike information criterion |
BIC | Bayesian information criterion |
CE | cross entropy |
CpG | 5’-C-phosphate-G-3’ sequence of nucleotides |
cloglog | complementary log-log link |
DNA | deoxyribonucleic acid |
GCEP | Gastric Cancer Epidemiology Program |
ID | identifier |
IM | intestinal metaplasia |
loglog | log-log link |
MIM | mild intestinal metaplasia |
MLE | maximum likelihood estimate |
npo | non-proportional odds assumption |
po | proportional odds assumption |
PRC2 | polycomb repressive complex-2 |
TNSC | total number of stem cell divisions |

## Appendix A. AIC and BIC Values of Multinomial Mixed-Link Models Using TNSC for Predicting IM

Baseline-category npo models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.49 | 159.78 | 147.84 | 159.12 | 147.09 | 158.37 | - | - |
probit | 148.95 | 160.23 | 148.27 | 159.56 | 147.47 | 158.75 | - | - |
loglog | 148.83 | 160.11 | 148.11 | 159.39 | 146.69 | 157.97 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Cumulative npo models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.69 | 159.97 | 147.69 | 158.97 | 156.46 | 167.74 | - | - |
probit | 148.35 | 159.63 | 147.39 | 158.67 | 149.97 | 161.25 | - | - |
loglog | 146.24 | 157.52 | 145.53 | 156.81 | 147.10 | 158.38 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Adjacent-categories npo models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.49 | 159.78 | 147.87 | 159.15 | 146.83 | 158.11 | 149.47 | 160.75 |
probit | 148.54 | 159.82 | 147.92 | 159.20 | 146.90 | 158.18 | 149.51 | 160.79 |
loglog | 147.65 | 158.93 | 147.01 | 158.29 | 145.97 | 157.25 | 148.56 | 159.85 |
cloglog | 150.20 | 161.49 | 149.62 | 160.90 | 148.71 | 159.99 | 151.17 | 162.46 |

Continuation-ratio npo models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.95 | 160.23 | 148.30 | 159.58 | 147.28 | 158.56 | 149.81 | 161.09 |
probit | 148.55 | 159.84 | 147.91 | 159.19 | 146.88 | 158.16 | 149.41 | 160.70 |
loglog | 147.61 | 158.89 | 146.96 | 158.24 | 145.93 | 157.21 | 148.47 | 159.75 |
cloglog | 151.95 | 163.23 | 151.30 | 162.58 | 150.27 | 161.55 | 152.81 | 164.09 |

Baseline-category po models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 173.33 | 181.79 | 202.33 | 210.79 | 197.84 | 206.30 | 188.22 | 196.68 |
probit | 151.12 | 159.58 | 172.76 | 181.22 | 164.81 | 173.27 | 162.85 | 171.31 |
loglog | 159.04 | 167.50 | 180.71 | 189.17 | 176.63 | 185.09 | 170.09 | 178.55 |
cloglog | 156.83 | 165.29 | 176.00 | 184.46 | 169.15 | 177.61 | 166.43 | 174.89 |

Cumulative po models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 150.23 | 158.70 | 194.90 | 203.36 | 204.72 | 213.18 | - | - |
probit | 147.75 | 156.21 | 149.25 | 157.71 | 155.53 | 163.99 | - | - |
loglog | 144.29 | 152.75 | 148.81 | 157.27 | 147.08 | 155.54 | - | - |
cloglog | - | - | - | - | - | - | - | - |

Adjacent-categories po models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 148.87 | 157.33 | 153.58 | 162.04 | 153.82 | 162.28 | 151.66 | 160.12 |
probit | 146.57 | 155.03 | 148.59 | 157.05 | 148.14 | 156.60 | 148.17 | 156.63 |
loglog | 146.33 | 154.79 | 149.87 | 158.34 | 149.56 | 158.02 | 148.63 | 157.09 |
cloglog | 148.21 | 156.67 | 150.03 | 158.49 | 149.74 | 158.20 | 149.68 | 158.14 |

Continuation-ratio po models (rows: first link; columns: second link):

First link | logit AIC | logit BIC | probit AIC | probit BIC | loglog AIC | loglog BIC | cloglog AIC | cloglog BIC |
---|---|---|---|---|---|---|---|---|
logit | 154.43 | 162.89 | 167.30 | 175.76 | 165.20 | 173.66 | 162.05 | 170.51 |
probit | 147.39 | 155.85 | 153.58 | 162.04 | 152.13 | 160.59 | 150.99 | 159.45 |
loglog | 146.96 | 155.42 | 153.40 | 161.86 | 151.96 | 160.43 | 150.83 | 159.29 |
cloglog | 152.17 | 160.63 | 161.47 | 169.94 | 159.67 | 168.13 | 157.42 | 165.88 |
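Each appendix grid is reduced to a single "best link" pair by taking the entry with the smallest AIC. The sketch below illustrates that selection on the AIC values of the table directly above. The function name `best_link_pair`, the use of `None` for entries shown as "-", and the reading of rows as the first link and columns as the second are assumptions made for illustration, not the authors' code.

```python
from typing import List, Optional, Tuple

# Link names in the order used by the rows and columns of the appendix grids.
LINKS = ["logit", "probit", "loglog", "cloglog"]

# AIC values copied from the table directly above.
# Rows are read as the first link, columns as the second (an assumption).
AIC: List[List[Optional[float]]] = [
    [154.43, 167.30, 165.20, 162.05],
    [147.39, 153.58, 152.13, 150.99],
    [146.96, 153.40, 151.96, 150.83],
    [152.17, 161.47, 159.67, 157.42],
]


def best_link_pair(
    grid: List[List[Optional[float]]], names: List[str]
) -> Tuple[str, str, float]:
    """Return (row link, column link, value) for the smallest non-missing entry."""
    best: Optional[Tuple[str, str, float]] = None
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value is None:  # entry reported as "-" in the table
                continue
            if best is None or value < best[2]:
                best = (names[i], names[j], value)
    if best is None:
        raise ValueError("grid contains no fitted models")
    return best


print(best_link_pair(AIC, LINKS))  # ('loglog', 'logit', 146.96)
```

The same function applies to grids with missing entries, such as those where every cloglog cell is "-", provided those cells are encoded as `None`.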

**Figure 2.** Predictive probabilities ${\widehat{\pi}}_{ij}$ based on Model 1 against true response labels (left panel: $j=1$; middle panel: $j=2$; right panel: $j=3$).

**Figure 3.** Boxplots of cross-entropy loss of Model 1 and Model 2 based on ten-fold cross-validations with ten random partitions.

**Figure 7.** Boxplots of cross-entropy loss (on 98 samples only) of Models 1, 2, and 3 based on ten-fold cross-validations with ten random partitions.

**Figure 8.** Boxplots of cross-entropy loss (on 26 samples only) of Models 1 and 2 based on ten-fold cross-validations with ten random partitions.
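The captions above describe cross-entropy loss evaluated by ten-fold cross-validation over ten random partitions. The following is a minimal sketch of that evaluation loop, not the authors' implementation. It assumes the loss is the mean negative log-probability of the true label, and it swaps in a placeholder model (smoothed class frequencies that ignore all covariates) where a fitted multinomial model would go. The labels are synthetic, and the names `cross_entropy`, `fit_class_frequencies`, and `repeated_kfold_ce` are hypothetical.

```python
import numpy as np


def cross_entropy(probs: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-probability assigned to the true labels (assumed definition)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))


def fit_class_frequencies(labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Placeholder model: add-one smoothed class frequencies, no covariates."""
    counts = np.bincount(labels, minlength=n_classes) + 1.0
    return counts / counts.sum()


def repeated_kfold_ce(labels, n_classes, n_folds=10, n_partitions=10, seed=0):
    """Return one held-out cross-entropy value per random partition."""
    rng = np.random.default_rng(seed)
    n = len(labels)
    losses = []
    for _ in range(n_partitions):
        # A fresh random partition of the sample indices into n_folds folds.
        folds = np.array_split(rng.permutation(n), n_folds)
        probs = np.empty((n, n_classes))
        for test_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False
            # Fit on the other folds, predict for the held-out fold.
            probs[test_idx] = fit_class_frequencies(labels[train_mask], n_classes)
        losses.append(cross_entropy(probs, labels))
    return losses


# Synthetic three-category labels, for illustration only.
labels = np.random.default_rng(1).integers(0, 3, size=120)
losses = repeated_kfold_ce(labels, n_classes=3)
print(len(losses), min(losses), max(losses))
```

The ten values in `losses` are what a single box in such a boxplot would summarize; comparing models amounts to replacing `fit_class_frequencies` with each candidate's fit-and-predict step while reusing the same partitions.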

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | loglog, loglog | 146.69 | 157.97 |
Cumulative npo | loglog, probit | 145.53 | 156.81 |
Adjacent-categories npo | loglog, loglog | 145.97 | 157.25 |
Continuation-ratio npo | loglog, loglog | 145.93 | 157.21 |
Baseline-category po | probit, logit | 151.12 | 159.58 |
Cumulative po | loglog, logit | 144.29 | 152.75 |
Adjacent-categories po | loglog, logit | 146.33 | 154.79 |
Continuation-ratio po | loglog, logit | 146.96 | 155.42 |
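Under the standard definitions $\mathrm{AIC} = 2k - 2\log L$ and $\mathrm{BIC} = k\log n - 2\log L$, the two criteria differ only by the penalty gap $\mathrm{BIC} - \mathrm{AIC} = k(\log n - 2)$, where $k$ is the number of parameters and $n$ the sample size. The sketch below checks this arithmetic against the table directly above. The parameter counts $k = 4$ (npo) and $k = 3$ (po) are assumptions for a three-category response with one covariate; they are not stated in the text here, and the function names are hypothetical.

```python
import math

# (AIC, BIC) pairs copied from the table directly above.
SUMMARY = {
    "Baseline-category npo": (146.69, 157.97),
    "Cumulative npo": (145.53, 156.81),
    "Adjacent-categories npo": (145.97, 157.25),
    "Continuation-ratio npo": (145.93, 157.21),
    "Baseline-category po": (151.12, 159.58),
    "Cumulative po": (144.29, 152.75),
    "Adjacent-categories po": (146.33, 154.79),
    "Continuation-ratio po": (146.96, 155.42),
}


def penalty_gap(aic: float, bic: float) -> float:
    """BIC - AIC, which equals k * (log n - 2) under the standard definitions."""
    return round(bic - aic, 2)


def implied_sample_size(gap: float, k: int) -> float:
    """Solve gap = k * (log n - 2) for n, given an assumed parameter count k."""
    return math.exp(gap / k + 2.0)


gaps = {name: penalty_gap(aic, bic) for name, (aic, bic) in SUMMARY.items()}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap}")

# Assumed parameter counts: 4 for npo models, 3 for po models.
print(implied_sample_size(gaps["Cumulative npo"], k=4))
print(implied_sample_size(gaps["Cumulative po"], k=3))
```

Within each family the gap is constant, so AIC and BIC rank the four npo models identically and the four po models identically; the criteria can only disagree when comparing an npo model with a po model, where BIC penalizes the extra parameter more heavily.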

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | logit, probit | 109.95 | 143.79 |
Cumulative npo | loglog, logit | 109.20 | 143.04 |
Adjacent-categories npo | logit, logit | 109.97 | 143.81 |
Continuation-ratio npo | logit, logit | 110.97 | 144.82 |
Baseline-category po | probit, logit | 111.31 | 131.05 |
Cumulative po | probit, probit | 110.03 | 129.77 |
Adjacent-categories po | logit, logit | 108.89 | 128.63 |
Continuation-ratio po | probit, probit | 109.32 | 129.06 |

Model | Best Link | AIC | BIC |
---|---|---|---|
Baseline-category npo | logit, probit | 81.43 | 102.11 |
Cumulative npo | probit, probit | 84.29 | 104.97 |
Adjacent-categories npo | logit, probit | 83.22 | 103.90 |
Continuation-ratio npo | logit, probit | 83.56 | 104.24 |
Baseline-category po | probit, logit | 82.39 | 95.32 |
Cumulative po | probit, probit | 77.99 | 90.92 |
Adjacent-categories po | probit, probit | 77.56 | 90.48 |
Continuation-ratio po | probit, probit | 77.77 | 90.69 |

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, T.; Huang, Y.; Yang, J.
Statistical Models for High-Risk Intestinal Metaplasia with DNA Methylation Profiling. *Epigenomes* **2024**, *8*, 19.
https://doi.org/10.3390/epigenomes8020019
