Analysis of Risk Factors for Colorectal Cancer Associated with Ulcerative Colitis Using Machine Learning: A Retrospective Longitudinal Study Using a National Database in Japan
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. The National Registry for Intractable Disease
2.2. Description of Collected Data
2.3. Definition of CRC
2.4. Disease Distribution and Clinical Assessment
2.5. Machine Learning Model
2.6. K-Means Clustering Algorithm for UC
2.7. Statistical Analysis
3. Results
3.1. Study Population
3.2. Standardized Incidence of Colon Cancer and Rectal Cancer in UC
3.3. Clinical Characteristics
3.4. Identifying CRC Risk Factors
3.5. Prediction Model
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| FY | Fiscal year |
| G-CAP | Granulocyte apheresis |
| L-CAP | Leukocyte apheresis |
| ADL | Activities of daily living |
| BMI | Body mass index |
| RBC | Red blood cell |
| WBC | White blood cell |
| ESR | Erythrocyte sedimentation rate |
| CRP | C-reactive protein |




| Hyperparameter | Best Parameter |
|---|---|
| Number of epochs | 250 |
| Number of inner layers | 6 |
| Size of layers | 170 |
| Batch size | 15,085 |
| Label smoothing | 0.0069 |
| Learning rate | 0.0007 |
| Momentum | 0.987 |
| Optimization | Adam |
| Dropout rate of inner layers | 0.04 |
| Dropout rate of input layers | 0.23 |
| Regularization coefficient | 9.642 × 10⁻⁹ |
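
The table above lists the best hyperparameters found for the deep-learning model. As a minimal, hedged sketch, the Python below records these values in a configuration object and shows what one of them does, assuming that the table's smoothing parameter (0.0069) denotes standard label smoothing applied to a binary CRC outcome. `ModelConfig`, `smooth_labels`, and the 0/1 outcome encoding are hypothetical illustrations, not the authors' code; the framework they used is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the best hyperparameters reported in the table."""
    epochs: int = 250
    hidden_layers: int = 6            # "Number of inner layers"
    hidden_size: int = 170            # "Size of layers"
    batch_size: int = 15_085
    label_smoothing: float = 0.0069   # assumption: the smoothing parameter is label smoothing
    learning_rate: float = 0.0007
    momentum: float = 0.987           # with Adam, plausibly the first-moment decay (assumption)
    optimizer: str = "Adam"
    hidden_dropout: float = 0.04      # dropout rate of inner layers
    input_dropout: float = 0.23       # dropout rate of the input layer
    l2_coefficient: float = 9.642e-9  # regularization coefficient


def smooth_labels(targets, epsilon, num_classes=2):
    """Turn integer class labels into label-smoothed probability vectors.

    The true class receives 1 - epsilon + epsilon / K and every other class
    receives epsilon / K, so each vector still sums to 1.
    """
    off = epsilon / num_classes
    on = 1.0 - epsilon + off
    return [[on if k == t else off for k in range(num_classes)] for t in targets]


cfg = ModelConfig()
# Hypothetical encoding: 0 = no CRC, 1 = CRC
smoothed = smooth_labels([0, 1], cfg.label_smoothing)
print(smoothed)
```

With a value this small, a CRC-positive target moves only from 1.0 to about 0.9966, which mildly discourages overconfident predictions without changing the class ordering.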

Number of patients with completed follow-up by registry year, n (% of registered)

| Registry Year | No. of Registered | 1 yr Later | 2 yrs Later | 3 yrs Later | 4 yrs Later | 5 yrs Later | 6 yrs Later | 7 yrs Later | 8 yrs Later | 9 yrs Later | 10 yrs Later |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 | 3376 | 1434 (42.5) | 1700 (50.4) | 2142 (63.4) | 2417 (71.6) | 2131 (63.1) | 1934 (57.3) | 2187 (64.8) | 2368 (70.1) | 2385 (70.6) | 2665 (78.9) |
| 2004 | 7046 | 3439 (48.8) | 3896 (55.3) | 4550 (64.6) | 4159 (59.0) | 4124 (58.5) | 4440 (63.0) | 4691 (66.6) | 4747 (67.4) | 5334 (75.7) | |
| 2005 | 8482 | 4651 (54.8) | 5465 (64.4) | 4819 (56.8) | 4266 (50.3) | 4837 (57.0) | 5307 (62.6) | 5299 (62.5) | 6340 (74.7) | | |
| 2006 | 6280 | 3765 (60.0) | 3381 (53.8) | 2766 (44.0) | 3173 (50.5) | 3551 (56.5) | 3634 (57.9) | 4519 (72.0) | | | |
| 2007 | 6832 | 3397 (49.7) | 3138 (45.9) | 3894 (57.0) | 4002 (58.6) | 4054 (59.3) | 5064 (74.1) | | | | |
| 2008 | 10,443 | 4474 (42.8) | 5446 (52.1) | 5986 (57.3) | 6037 (57.8) | 7679 (73.5) | | | | | |
| 2009 | 12,534 | 5325 (42.5) | 6296 (50.2) | 7048 (56.2) | 9397 (75.0) | | | | | | |
| 2010 | 13,073 | 5889 (45.0) | 6964 (53.3) | 9562 (73.1) | | | | | | | |
| 2011 | 14,704 | 6592 (44.8) | 9956 (67.7) | | | | | | | | |
| Tracking completion rate (%), mean ± SD | | 47.9 ± 5.8 | 54.8 ± 6.6 | 59.1 ± 7.9 | 60.4 ± 8.9 | 61.3 ± 5.9 | 63.0 ± 6.0 | 66.5 ± 3.5 | 70.7 ± 3.0 | 73.2 ± 2.6 | 78.9 |
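
The tracking-completion row of the follow-up table can be reproduced from the per-cohort percentages. The sketch below is a consistency check rather than the authors' code: it transcribes the percentages and computes, for each follow-up year, the mean and standard deviation across registry cohorts. The reported spreads appear to match the population standard deviation (`statistics.pstdev`) rather than the sample version; that is an inference from the numbers, not a stated method. `summarize_by_year` is a hypothetical helper.

```python
import statistics

# Percent of registered patients with completed follow-up, transcribed from the
# follow-up table. Entry k of each list is the (k + 1)-year follow-up.
completion_pct = {
    2003: [42.5, 50.4, 63.4, 71.6, 63.1, 57.3, 64.8, 70.1, 70.6, 78.9],
    2004: [48.8, 55.3, 64.6, 59.0, 58.5, 63.0, 66.6, 67.4, 75.7],
    2005: [54.8, 64.4, 56.8, 50.3, 57.0, 62.6, 62.5, 74.7],
    2006: [60.0, 53.8, 44.0, 50.5, 56.5, 57.9, 72.0],
    2007: [49.7, 45.9, 57.0, 58.6, 59.3, 74.1],
    2008: [42.8, 52.1, 57.3, 57.8, 73.5],
    2009: [42.5, 50.2, 56.2, 75.0],
    2010: [45.0, 53.3, 73.1],
    2011: [44.8, 67.7],
}


def summarize_by_year(table):
    """Return {follow-up year: (mean, population SD)} across registry cohorts."""
    longest = max(len(row) for row in table.values())
    summary = {}
    for k in range(longest):
        # Only cohorts old enough to have a (k + 1)-year follow-up contribute.
        column = [row[k] for row in table.values() if len(row) > k]
        summary[k + 1] = (statistics.mean(column), statistics.pstdev(column))
    return summary


summary = summarize_by_year(completion_pct)
for year, (mean, sd) in summary.items():
    print(f"{year:>2} yr: {mean:.1f} ± {sd:.1f}")
```

The 1-year column, for instance, comes out at 47.9 ± 5.8, as in the table; the sample standard deviation (`statistics.stdev`) would give about 6.1 instead. The 10-year column has a single cohort, so its spread is zero, which is why the table reports 78.9 without an SD.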

| Characteristic | CRC (+) | CRC (−) | p-Value |
|---|---|---|---|
| No. of cases | 141 | 78,415 | |
| Age (years) | 54.5 ± 15.1 | 41.3 ± 16.5 | <0.001 |
| Onset age (years) | 50.7 ± 20.1 | 38.5 ± 16.7 | <0.001 |
| Family history (Yes/No) | 2/113 (1.8%) | 2162/68,789 (3.2%) | 0.699 |
| Sex (Male/Female, ratio) | 91/50, 1.82 | 44,570/33,845, 1.32 | 0.065 |
| BMI (kg/m²) | 20.3 ± 2.8 | 22.3 ± 3.5 | 0.002 |
| Laboratory findings | | | |
| RBC (×10⁴/μL) | 424 ± 68 | 447 ± 58 | <0.001 |
| Hemoglobin (g/dL) | 12.1 ± 2.6 | 13.1 ± 2.1 | <0.001 |
| WBC (/μL) | 7338 ± 3246 | 7058 ± 2944 | 0.2688 |
| ESR (mm/hr) | 28.4 ± 24.8 | 21.4 ± 25.5 | 0.041 |
| Total protein (g/dL) | 6.9 ± 0.7 | 7.1 ± 0.7 | 0.001 |
| Albumin (g/dL) | 3.7 ± 0.7 | 4.1 ± 0.7 | <0.001 |
| CRP (mg/dL) | 2.3 ± 4.1 | 1.7 ± 4.3 | 0.114 |
| Stool culture (positive/negative) | 7/88 (7.4%) | 4028/51,162 (7.3%) | 0.979 |
| Extent of inflammation# | | | |
| E1: ulcerative proctitis | 7 (5.0%) | 4928 (6.3%) | 0.140 |
| E2: left-sided colitis | 38 (27.0%) | 23,133 (29.5%) | |
| E3: pancolitis | 55 (39.0%) | 23,743 (30.3%) | |
| Other | 26 (18.4%) | 18,002 (23.0%) | |
| Mayo score | 4.2 ± 2.4 | 4.4 ± 2.0 | 0.211 |
| Endoscopic findings | | | |
| mucosal friability | 110/129 (85.2%) | 70,354/75,860 (92.7%) | 0.0011 |
| erosion | 96/130 (73.8%) | 63,912/75,994 (84.1%) | 0.0014 |
| pseudopolyps | 30/129 (23.3%) | 6480/68,843 (9.4%) | <0.001 |
| Biopsy findings | | | |
| crypt abscess | 75/117 (64.1%) | 48,733/70,174 (69.4%) | 0.21 |
| abnormal crypt architecture | 67/114 (58.8%) | 25,383/68,325 (37.2%) | <0.001 |
| dysplasia | 54/116 (46.6%) | 4206/68,971 (6.1%) | <0.001 |
| Therapy | | | |
| 5-ASA | 101/138 (73.2%) | 72,241/77,037 (93.8%) | <0.001 |
| corticosteroids | 33/139 (23.7%) | 22,295/74,173 (30.0%) | 0.11 |
| immunosuppressant | 0/136 (0%) | 1315/71,514 (1.8%) | 0.11 |
| G-/L-CAP | 2/141 (1.4%) | 1002/78,415 (1.3%) | 0.88 |
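
The p-values in the CRC (+) versus CRC (−) comparison come from tests that this extract does not name. As a hedged illustration only, the sketch below applies a Pearson chi-square test without continuity correction to one binary row, using the counts in the table. It is a standard 2 × 2 test, not necessarily the one the authors used, and rows with very small counts (such as family history, with two affected CRC (+) patients) would usually call for Fisher's exact test instead. `chi_square_2x2` is a hypothetical helper.

```python
import math


def chi_square_2x2(pos_a, n_a, pos_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) for two proportions.

    pos_a / n_a: positives and total in group A; likewise for group B.
    Returns (chi2, two-sided p-value). All four margins must be non-zero.
    """
    a, b = pos_a, n_a - pos_a
    c, d = pos_b, n_b - pos_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Dysplasia on biopsy: 54/116 in CRC (+) versus 4206/68,971 in CRC (−).
chi2, p = chi_square_2x2(54, 116, 4206, 68_971)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

Here the statistic is far beyond any conventional threshold, consistent with the table's p < 0.001 for dysplasia.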

| Variables | Univariate Hazard Ratio | 95% CI | p-Value | Multivariate Hazard Ratio | 95% CI | p-Value |
|---|---|---|---|---|---|---|
| Age | 1.05 | 1.03–1.06 | <0.001 | 1.09 | 1.03–1.09 | <0.001 |
| Age (onset) | 1.03 | 1.01–1.05 | <0.001 | 0.96 | 0.93–0.99 | 0.006 |
| Extent of inflammation | 1.21 | 0.79–1.86 | 0.376 | | | |
| Mayo score | 0.99 | 0.88–1.11 | 0.864 | | | |
| Mayo endoscopic score | 3.14 | 1.74–5.65 | <0.001 | 0.74 | 0.42–1.31 | 0.299 |
| Endoscopic findings | | | | | | |
| mucosal friability | 0.48 | 0.22–1.01 | 0.055 | | | |
| pseudopolyps | 2.92 | 1.58–5.43 | 0.001 | 1.31 | 0.61–2.84 | 0.488 |
| Biopsy findings | | | | | | |
| abnormal crypt architecture | 3.14 | 1.74–5.65 | <0.001 | 1.82 | 0.88–3.75 | 0.106 |
| dysplasia | 11.31 | 6.50–19.69 | <0.001 | 8.53 | 4.30–16.9 | <0.001 |
| Therapy | | | | | | |
| 5-ASA | 0.36 | 0.18–0.71 | 0.003 | 0.52 | 0.20–1.34 | 0.182 |
| corticosteroids | 0.83 | 0.48–1.43 | 0.515 | | | |
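
Hazard ratios of this kind are usually exponentiated Cox regression coefficients, HR = exp(β), with a 95% CI of exp(β ± 1.96·SE). Assuming the intervals in the hazard-ratio table are Wald intervals on the log scale (the extract does not say so explicitly), the sketch below backs out β, SE, and an approximate two-sided p-value from a reported HR and CI, which is a quick way to check a row for internal consistency. `wald_from_hr` is a hypothetical helper, not part of the study's analysis.

```python
import math

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_from_hr(hr, lower, upper):
    """Recover (beta, se, z, p) from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(beta ± 1.96 * se), so the log-scale
    width of the interval equals 2 * 1.96 * se.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


# Univariate rows transcribed from the hazard-ratio table
for name, hr, ci_low, ci_high in [
    ("dysplasia", 11.31, 6.50, 19.69),
    ("5-ASA", 0.36, 0.18, 0.71),
    ("corticosteroids", 0.83, 0.48, 1.43),
]:
    beta, se, z, p = wald_from_hr(hr, ci_low, ci_high)
    print(f"{name:16s} beta={beta:+.3f} se={se:.3f} p={p:.3g}")
```

For the univariate 5-ASA row this gives p ≈ 0.0035, in line with the reported 0.003; small discrepancies elsewhere are expected because the published HRs and bounds are rounded.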

| Characteristic | Cluster 1 | Cluster 2 | Cluster 3 | p-Value |
|---|---|---|---|---|
| No. of cases | 16,450 | 39,040 | 23,066 | |
| Colon cancer/rectal cancer | 88/0 | 0/0 | 0/53 | |
| Age (years) | 53.7 ± 17.0 * | 40.4 ± 14.9 | 33.9 ± 13.2 † | <0.001 |
| Onset age (years) | 51.9 ± 17.6 * | 37.6 ± 14.9 | 30.5 ± 12.6 † | <0.001 |
| Sex (Male/Female, ratio) | 9058/7392, 1.45 * | 21,627/17,413, 1.45 * | 13,976/9090, 1.39 † | 0.008 |
| Laboratory findings | | | | |
| RBC (×10⁴/μL) | 419 ± 65 † | 449 ± 54 | 463 ± 52 * | <0.001 |
| Hemoglobin (g/dL) | 12.4 ± 2.3 † | 13.2 ± 2.0 | 13.6 ± 2.0 * | <0.001 |
| WBC (/μL) | 7853 ± 3440 * | 7055 ± 2815 | 6501 ± 2631 † | <0.001 |
| ESR (mm/hr) | 36.6 ± 38.3 * | 19.3 ± 20.5 | 14.5 ± 15.7 † | <0.001 |
| Total protein (g/dL) | 6.8 ± 0.8 † | 7.1 ± 0.6 | 7.2 ± 0.6 * | <0.001 |
| Albumin (g/dL) | 3.7 ± 0.8 † | 4.1 ± 0.6 | 4.2 ± 0.5 * | <0.001 |
| CRP (mg/dL) | 2.6 ± 5.2 * | 1.5 ± 3.9 | 1.4 ± 4.1 † | <0.001 |
| Extent of inflammation | | | | |
| cecum | 29.8% * | 23.2% | 21.6% † | <0.001 |
| ascending colon | 31.0% * | 23.0% † | 24.4% | <0.001 |
| transverse colon | 39.2% * | 27.5% † | 27.6% | <0.001 |
| descending colon | 41.8% * | 35.7% † | 39.7% | <0.001 |
| sigmoid colon | 51.5% † | 55.1% | 66.6% * | <0.001 |
| rectum | 12.0% † | 15.9% | 25.3% * | <0.001 |
| Mayo score | 3.9 ± 2.1 † | 4.3 ± 2.0 | 4.8 ± 2.0 * | <0.001 |
| Endoscopic findings | | | | |
| mucosal friability | 88.7% † | 93.6% | 94.1% * | <0.001 |
| erosion | 89.0% * | 86.5% | 76.4% † | <0.001 |
| pseudopolyps | 9.9% * | 7.6% † | 9.5% | <0.001 |
| Histopathological findings | | | | |
| crypt abscess | 77.3% * | 71.9% | 59.0% † | <0.001 |
| abnormal crypt architecture | 40.0% * | 37.0% | 35.0% † | <0.001 |
| dysplasia | 4.5% † | 5.3% | 9.1% * | <0.001 |
| Therapy | | | | |
| 5-ASA | 98.0% * | 97.0% | 85.1% † | <0.001 |
| corticosteroids | 21.8% † | 27.4% | 40.7% * | <0.001 |
| immunosuppressant | 1.5% † | 1.7% | 2.3% * | <0.001 |
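
The three clusters above come from the k-means clustering named in Section 2.6. The features, scaling, and initialization used for the registry data are not given in this extract, so the following is only a minimal pure-Python sketch of Lloyd's k-means algorithm together with the within-cluster sum of squares (inertia) that an elbow plot charts against k; the elbow method is one common way to choose the number of clusters. The deterministic farthest-first seeding is an assumption made so the example is reproducible (libraries such as scikit-learn default to k-means++), and `kmeans`, `farthest_first`, and the toy points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical 2-D toy data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
labels, _, _ = kmeans(points, 2)
print(labels)
print(inertias)
```

On this toy data the inertia falls sharply from k = 1 to k = 2 and only slowly after that, which is the "elbow" pattern used to pick k. Real clinical variables would normally be standardized first so that, for example, WBC counts do not dominate albumin values.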

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
MDPI and ACS Style: Hirai, M.; Kanatani, Y.; Ueda, T.; Sano, M.; Arai, H.; Miyake, Y.; Tomita, N.; Nemoto, S.; Suzuki, H. Analysis of Risk Factors for Colorectal Cancer Associated with Ulcerative Colitis Using Machine Learning: A Retrospective Longitudinal Study Using a National Database in Japan. Cancers 2025, 17, 3752. https://doi.org/10.3390/cancers17233752

AMA Style: Hirai M, Kanatani Y, Ueda T, Sano M, Arai H, Miyake Y, Tomita N, Nemoto S, Suzuki H. Analysis of Risk Factors for Colorectal Cancer Associated with Ulcerative Colitis Using Machine Learning: A Retrospective Longitudinal Study Using a National Database in Japan. Cancers. 2025; 17(23):3752. https://doi.org/10.3390/cancers17233752

Chicago/Turabian Style: Hirai, Miwa, Yasuhiro Kanatani, Takashi Ueda, Masaya Sano, Hiroaki Arai, Yurin Miyake, Naoko Tomita, Shota Nemoto, and Hidekazu Suzuki. 2025. "Analysis of Risk Factors for Colorectal Cancer Associated with Ulcerative Colitis Using Machine Learning: A Retrospective Longitudinal Study Using a National Database in Japan" Cancers 17, no. 23: 3752. https://doi.org/10.3390/cancers17233752

APA Style: Hirai, M., Kanatani, Y., Ueda, T., Sano, M., Arai, H., Miyake, Y., Tomita, N., Nemoto, S., & Suzuki, H. (2025). Analysis of Risk Factors for Colorectal Cancer Associated with Ulcerative Colitis Using Machine Learning: A Retrospective Longitudinal Study Using a National Database in Japan. Cancers, 17(23), 3752. https://doi.org/10.3390/cancers17233752

