The Performance of Different Artificial Intelligence Models in Predicting Breast Cancer among Individuals Having Type 2 Diabetes Mellitus
Abstract
1. Introduction
2. Methods
2.1. Data Source
2.2. Data Availability Statement
2.3. Ethics Statement
2.4. Sampled Participants
2.5. Training Set
2.6. Algorithm Training and Evaluation
2.7. Statistical Analyses
3. Results
3.1. Patient Demographic Features
3.2. Evaluation of Prediction Models
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Variable | No breast cancer (N = 628,765): n | No breast cancer: % | Breast cancer (N = 7346): n | Breast cancer: % | p Value |
---|---|---|---|---|---|
Age group (year) | | | | | <0.001 |
≤49 | 171,724 | 27.3 | 1943 | 26.5 | |
50–64 | 251,750 | 40.0 | 3716 | 50.6 | |
65+ | 205,291 | 32.7 | 1687 | 23.0 | |
Mean (SD) (year) * | 58.4 | 14.2 | 56.9 | 10.7 | |
Urbanization level # | | | | | <0.001 |
1 (highest) | 183,283 | 29.2 | 2589 | 35.2 | |
2 | 185,090 | 29.4 | 2272 | 30.9 | |
3 | 100,217 | 15.9 | 1049 | 14.3 | |
4 (lowest) | 160,175 | 25.5 | 1436 | 19.6 | |
Occupation | | | | | <0.001 |
White collar | 281,372 | 44.8 | 3632 | 49.4 | |
Blue collar | 294,699 | 46.9 | 3127 | 42.6 | |
Others ‡ | 52,694 | 8.38 | 587 | 7.99 | |
Underlying disease | | | | | |
Hypertension | 470,048 | 74.8 | 5236 | 71.3 | <0.001 |
Hyperlipidemia | 435,254 | 69.2 | 5046 | 69.7 | 0.33 |
Stroke | 88,246 | 14.0 | 606 | 8.25 | <0.001 |
Congestive heart failure | 95,160 | 15.1 | 645 | 8.78 | <0.001 |
Benign breast condition | 111,647 | 17.8 | 4899 | 66.7 | <0.001 |
Obesity | 42,712 | 6.79 | 479 | 6.52 | 0.36 |
COPD | 164,128 | 26.1 | 1619 | 22.0 | <0.001 |
CAD | 250,789 | 39.9 | 2574 | 35.0 | <0.001 |
Asthma | 138,917 | 22.1 | 1256 | 17.1 | <0.001 |
Stop-smoking clinic | 6107 | 0.97 | 28 | 0.38 | <0.001 |
Alcohol-related illness | 26,210 | 4.17 | 216 | 2.94 | <0.001 |
CKD | 188,584 | 30.0 | 1632 | 22.2 | <0.001 |
Diabetes complication (components of the aDCSI) | | | | | |
Retinopathy | 127,829 | 20.3 | 1123 | 15.3 | <0.001 |
Nephropathy | 222,113 | 35.3 | 1925 | 26.2 | <0.001 |
Neuropathy | 212,414 | 33.8 | 2025 | 27.6 | <0.001 |
Cerebrovascular | 168,028 | 26.7 | 1257 | 17.1 | <0.001 |
Cardiovascular | 383,242 | 61.0 | 3906 | 53.2 | <0.001 |
Peripheral vascular disease | 179,865 | 28.6 | 1419 | 19.3 | <0.001 |
Metabolic | 25,411 | 4.04 | 149 | 2.03 | <0.001 |
Mean aDCSI score (SD) | | | | | |
Onset | 1.62 | 1.68 | 1.29 | 1.46 | <0.001 |
End of follow-up | 3.12 | 2.33 | 2.27 | 1.96 | <0.001 |
Medications | | | | | |
Statin | 349,906 | 55.7 | 3465 | 47.2 | <0.001 |
Aspirin | 30,561 | 4.86 | 176 | 2.40 | <0.001 |
Estrogen | 274,204 | 43.6 | 3416 | 46.5 | <0.001 |
Insulin | 191,580 | 30.5 | 1181 | 16.1 | <0.001 |
Sulfonylureas | 340,489 | 54.2 | 3698 | 50.3 | <0.001 |
Metformin | 389,319 | 61.9 | 3897 | 53.1 | <0.001 |
TZD | 101,370 | 16.1 | 815 | 11.1 | <0.001 |
Other antidiabetic drugs | 167,166 | 26.6 | 1414 | 19.3 | <0.001 |
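
The p values in the table above can be sanity-checked from the counts alone. The sketch below assumes a Pearson chi-square test on a 2×2 table without continuity correction; the test actually used is not named in this excerpt, and the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p value) for 1 degree of freedom.
    Assumption: this is an illustrative check, not the authors' stated test."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for observed, row, col in ((a, row1, col1), (b, row1, col2),
                               (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square survival function reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Obesity row: 479 of 7346 with breast cancer, 42,712 of 628,765 without.
stat, p = chi2_2x2(479, 7346 - 479, 42712, 628765 - 42712)
print(f"chi2 = {stat:.3f}, p = {p:.2f}")
```

Under these assumptions the obesity row gives a p value of roughly 0.36, in line with the tabulated value.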
Dataset | Model | F1 | Precision | Recall | AUROC | AUROC SE | AUROC 95% CI |
---|---|---|---|---|---|---|---|
All (n = 1,267,867) | ANN | 0.789 | 0.791 | 0.790 | 0.865 | <0.001 | 0.864–0.866 |
All (n = 1,267,867) | LR | 0.763 | 0.765 | 0.763 | 0.834 | <0.001 | 0.833–0.835 |
All (n = 1,267,867) | RF | 0.892 | 0.892 | 0.892 | 0.959 | <0.001 | 0.959–0.960 |
Train (n = 1,236,170) | ANN | 0.789 | 0.791 | 0.790 | 0.865 | <0.001 | 0.864–0.866 |
Train (n = 1,236,170) | LR | 0.763 | 0.765 | 0.763 | 0.834 | <0.001 | 0.833–0.835 |
Train (n = 1,236,170) | RF | 0.892 | 0.892 | 0.892 | 0.960 | <0.001 | 0.959–0.960 |
Test (n = 31,697) | ANN | 0.789 | 0.790 | 0.789 | 0.864 | 0.002 | 0.860–0.868 |
Test (n = 31,697) | LR | 0.758 | 0.761 | 0.758 | 0.829 | 0.002 | 0.824–0.833 |
Test (n = 31,697) | RF | 0.890 | 0.890 | 0.890 | 0.955 | 0.003 | 0.948–0.961 |
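
The reported 95% confidence intervals are consistent with a normal-approximation (Wald) interval, AUROC ± 1.96 × SE. This is an assumption inferred from the tabulated numbers, not a method stated in this excerpt, and `wald_ci` is a hypothetical helper.

```python
def wald_ci(auroc, se, z=1.96):
    """Normal-approximation interval for an AUROC given its standard error.
    Assumption: the interval is symmetric and clipped to the valid [0, 1] range."""
    lower = max(0.0, auroc - z * se)
    upper = min(1.0, auroc + z * se)
    return round(lower, 3), round(upper, 3)


# ANN on the test set: AUROC 0.864, SE 0.002.
lower, upper = wald_ci(0.864, 0.002)
print(lower, upper)
```

For the ANN test row this reproduces 0.860–0.868. The LR and RF test rows differ from the table by 0.001 at the lower bound, which rounding of the published AUROC and SE values could explain.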
Model | ANN | LR | RF |
---|---|---|---|
k-fold accuracy | 0.786 | 0.881 | 0.763 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Hsieh, M.-H.; Sun, L.-M.; Lin, C.-L.; Hsieh, M.-J.; Hsu, C.Y.; Kao, C.-H. The Performance of Different Artificial Intelligence Models in Predicting Breast Cancer among Individuals Having Type 2 Diabetes Mellitus. Cancers 2019, 11, 1751. https://doi.org/10.3390/cancers11111751