J. Clin. Med. 2018, 7(9), 277; doi:10.3390/jcm7090277
Development of a Prediction Model for Colorectal Cancer among Patients with Type 2 Diabetes Mellitus Using a Deep Neural Network
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
Department of Radiation Oncology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 81342, Taiwan
Management Office for Health Data, China Medical University Hospital, Taichung 40447, Taiwan
College of Medicine, China Medical University, Taichung 40402, Taiwan
Department of Medicine, Poznan University of Medical Sciences, 61-701 Poznań, Poland
Program of Computer Science, Arizona State University, Tempe, AZ 85287, USA
Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
Department of Anesthesiology, China Medical University Hospital, Taichung 40447, Taiwan
Department of Nuclear Medicine and PET Center, China Medical University Hospital, Taichung 40447, Taiwan
Department of Bioinformatics and Medical Engineering, Asia University, Taichung 41354, Taiwan
Equally contributed to this article.
Received: 21 August 2018 / Accepted: 10 September 2018 / Published: 12 September 2018
Objectives: Observational studies suggested that patients with type 2 diabetes mellitus (T2DM) presented a higher risk of developing colorectal cancer (CRC). The current study aims to create a deep neural network (DNN) to predict the onset of CRC for patients with T2DM. Methods: We employed the national health insurance database of Taiwan to create predictive models for detecting an increased risk of subsequent CRC development in T2DM patients in Taiwan. We identified a total of 1,349,640 patients between 2000 and 2012 with newly diagnosed T2DM. All the available possible risk factors for CRC were also included in the analyses. The data were split into training and test sets with 97.5% of the patients in the training set and 2.5% of the patients in the test set. The deep neural network (DNN) model was optimized using Adam with Nesterov’s accelerated gradient descent. The recall, precision, F1 values, and the area under the receiver operating characteristic (ROC) curve were used to evaluate predictor performance. Results: The F1, precision, and recall values of the DNN model across all data were 0.931, 0.982, and 0.889, respectively. The area under the ROC curve of the DNN model across all data was 0.738, compared to the ideal value of 1. The metrics indicate that the DNN model appropriately predicted CRC. In contrast, a single variable predictor using the adapted Diabetes Complication Severity Index showed poorer performance compared to the DNN model. Conclusions: Our results indicated that the DNN model is an appropriate tool to predict CRC risk in patients with T2DM in Taiwan.
Keywords: type 2 diabetes mellitus; colorectal cancer; deep neural network; the national health insurance database; receiver operating characteristic
Diabetes mellitus (DM) is a common chronic disease worldwide. According to the Global Report on Diabetes from the World Health Organization, the prevalence of DM has been steadily rising for the past three decades, becoming a major public health issue. In 2014, 422 million people in the world had diabetes, or 8.5% of the adult population. In Taiwan, the standardized incidence rate of DM is in accordance with the global trend and has remained nearly constant (1.043% in 2000 and 1.160% in 2009 among residents aged 20–79). However, its prevalence has steadily increased (4.31% in 2000 and 6.38% in 2009 among residents aged 20–79), suggesting that better DM care may have led to a decrease in mortality rates among patients with DM.
As the number of patients with chronic DM increases, certain diseases related to DM become concerns among these patients. Studies have suggested that patients with DM are at a higher risk of developing cancer overall and several individual cancers compared to the general population [3,4,5,6,7,8,9]. The risk of developing colorectal cancer (CRC) among patients with DM was revealed by earlier reports [10,11,12,13].
Cancer has been Taiwan’s leading cause of mortality since 1982 and CRC has been the most common type of malignancy recorded in the country since 2006. In 2015, the age-adjusted incidence rate of CRC in Taiwan was 43.58/100,000 people, an increase from 2005 of 20.9% and 8.3% for men and women, respectively. CRC has also been ranked as the third leading cause of cancer-related death in Taiwan, from 2013 to 2017, for both men and women, as well as both combined. Consequently, cancer continues to be a challenge for the public health field of Taiwan. It has come to our government’s attention, resulting in population-based investigations regarding early diagnosis and cancer-preventive epidemiology. Based on this concern and the suggestion of a possible link between CRC risk and type 2 DM (T2DM) by previous researchers, we proposed this study aimed at creating a deep neural network (DNN) to predict the onset of CRC among patients with T2DM in Taiwan.
2. Experimental Section
2.1. Data Source and Sampled Participants
This study was approved by the Research Ethics Committee of the China Medical University and Hospital in Taiwan (CMUH104-REC2-115-CR3). For the present study, we used a subset of data from the National Health Insurance Research Database (NHIRD) and the Longitudinal Cohort of Diabetes Patients (LHDB), which contains health data of 1,700,000 patients with newly diagnosed T2DM (International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) code 250.x0 and 250.x2) from 2000–2012. Subjects that had at least two diagnoses of T2DM within a year were eligible for inclusion in the LHDB. The first diagnosis date was defined as the index date of T2DM. T2DM patients with a CRC history (ICD-9-CM 153, 154) before the index date, those aged less than 20 years, and those with incomplete demographic information were excluded.
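The eligibility rules above can be sketched as a simple filter. This is an illustration only, not the authors' data-management code: the record fields (`t2dm_dates`, `first_crc_date`, `age_at_index`, `demographics_complete`) are hypothetical names, and the sketch assumes that "two diagnoses within a year" means a second diagnosis within 365 days of the first.

```python
from datetime import date

def index_date(t2dm_dates):
    """Return the first T2DM diagnosis date if a second one follows within
    365 days (assumed reading of the inclusion rule), otherwise None."""
    dates = sorted(t2dm_dates)
    if len(dates) >= 2 and (dates[1] - dates[0]).days <= 365:
        return dates[0]
    return None

def is_eligible(patient):
    """Apply the exclusion criteria to one hypothetical patient record."""
    idx = index_date(patient["t2dm_dates"])
    if idx is None:
        return False                      # fewer than two qualifying diagnoses
    crc = patient.get("first_crc_date")
    if crc is not None and crc < idx:
        return False                      # CRC history before the index date
    if patient["age_at_index"] < 20:
        return False                      # younger than 20 years
    return bool(patient["demographics_complete"])

example = {
    "t2dm_dates": [date(2005, 3, 1), date(2005, 9, 15)],
    "first_crc_date": None,
    "age_at_index": 57,
    "demographics_complete": True,
}
print(is_eligible(example))
```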
2.2. Outcome Measurements, Comorbidities, and Medications
All study subjects were followed from the index date to the date of CRC diagnosis, the date of withdrawal from the insurance program, or the end of 2013, whichever came first. The baseline comorbidities considered in this study included hypertension, hyperlipidemia, stroke, congestive heart failure, colorectal polyps, obesity, chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), asthma, smoking (stop-smoking clinic), inflammatory bowel disease, irritable bowel syndrome, alcohol-related illness, and chronic kidney disease. The adapted Diabetes Complication Severity Index (aDCSI) score consists of scores from 7 complication categories including retinopathy, nephropathy, neuropathy, cerebrovascular, cardiovascular, peripheral vascular disease and metabolic; it ranges from 0 to 13 [16,17]. Medications that may be associated with CRC were also evaluated, including statin, insulin, sulfonylureas, metformin, thiazolidinedione (TZD), and other antidiabetic drugs.
2.3. Constructing Training and Test Sets
The data comprised 1,349,640 data points, each representing one patient. The data had 37 input features and 2 output features. The positive output class represented the diagnosis of CRC, while the negative output class represented no diagnosis. The data were split into training and test sets: 97.5% of the data were used as the training set and 2.5% were used as the test set. This ratio was chosen both to have a sufficient number of data points for validation and to use the majority of the dataset for training. The data points were randomly allocated to each set. Table 1 shows the allocation between the training and test sets.
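The follow-up rule (index date to the earliest of CRC diagnosis, withdrawal, or the end of 2013) can be expressed as a short function. This is a hedged sketch: the function name `follow_up` and its arguments are hypothetical, and converting days to years with 365.25 is an assumption rather than something stated in the text.

```python
from datetime import date

STUDY_END = date(2013, 12, 31)  # end of follow-up stated in the text

def follow_up(index, crc_date=None, withdrawal_date=None):
    """Return (end_date, crc_event, years_followed) for one patient."""
    candidates = [STUDY_END]
    if crc_date is not None:
        candidates.append(crc_date)
    if withdrawal_date is not None:
        candidates.append(withdrawal_date)
    end = min(candidates)                      # whichever came first
    crc_event = crc_date is not None and crc_date == end
    years = (end - index).days / 365.25        # assumed day-to-year conversion
    return end, crc_event, years

end, event, years = follow_up(date(2005, 1, 1), crc_date=date(2008, 1, 1))
print(end, event, round(years, 2))
```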
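A random 97.5%/2.5% allocation can be sketched with the standard library. The function name `split_indices` and the seed are assumptions for illustration; the authors do not state which tool or seed they used.

```python
import random

def split_indices(n_samples, test_fraction, seed=0):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random allocation to each set
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

n_patients = 1_349_640
train_idx, test_idx = split_indices(n_patients, test_fraction=0.025)
print(len(train_idx), len(test_idx))
```

With 1,349,640 patients this gives 1,315,899 training points and 33,741 test points, and the training size agrees with the n reported for the training set in Table 3.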
2.4. Algorithm and Training
The average k-fold cross-validation accuracy, with a k-value of 10, was used as the metric to determine the best hyperparameters, optimizers, and loss functions of predictors.
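The selection procedure can be sketched as follows: each candidate configuration is scored by its mean validation accuracy over 10 folds, and the best-scoring one is kept. The helper names and the majority-class stand-in model are hypothetical; they only show the mechanics of k-fold cross-validation, not the authors' pipeline.

```python
import random

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def cross_val_accuracy(features, labels, fit, predict, k=10):
    """Average validation accuracy over k folds for one candidate model."""
    accuracies = []
    for train, val in k_fold_splits(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / k

# Toy stand-in for a candidate model: always predict the majority class.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

toy_labels = [0] * 90 + [1] * 10
toy_features = [[0.0]] * 100
mean_accuracy = cross_val_accuracy(toy_features, toy_labels,
                                   fit_majority, predict_majority)
print(round(mean_accuracy, 3))
```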
The DNN model is a multilayer perceptron deep neural network. The model used here consisted of one input layer of 37 dimensions, three hidden layers of 30 dimensions, and a scalar output layer. The number of dimensions corresponds to the number of artificial neurons in each layer. Each layer was densely connected, meaning that every neuron in a layer was connected to every neuron in the preceding and succeeding layers. The model was trained using mini-batch stochastic gradient descent, an iterative algorithm used to optimize the weights of neurons in the network, with a mini-batch size of 128. The model was optimized using Adam with Nesterov’s accelerated gradient descent [18,19,20]. The input and hidden layers used a Rectified Linear Unit (ReLU) activation function, while the output layer used the Softmax activation function. These activation functions were applied to the output of each neuron. Dropout, a regularization technique used to prevent overfitting, was applied at a rate of 20% to the input layer and each hidden layer. The categorical cross entropy function was used as the loss function. The neuron weights were initialized using normalized He initialization.
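The forward pass of such a network can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two Softmax output units (one per class, matching the two output features described earlier), reads "He initialization" as Gaussian weights with standard deviation sqrt(2 / fan_in), and uses inverted dropout. Training with the Adam/Nesterov optimizer is not shown.

```python
import math
import random

LAYER_SIZES = [37, 30, 30, 30, 2]  # inputs, three hidden layers, 2 classes (assumed)

def he_normal(fan_in, fan_out, rng):
    """Gaussian weights with std sqrt(2 / fan_in) (assumed form of He init)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

def init_params(rng):
    """One (weights, biases) pair per densely connected layer."""
    return [(he_normal(n_in, n_out, rng), [0.0] * n_out)
            for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])]

def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def softmax(z):
    m = max(z)                                  # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, params, rng=None, rate=0.2):
    """Return class probabilities; dropout is active only when rng is given."""
    training = rng is not None
    a = dropout(x, rate, rng) if training else list(x)
    for layer, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, a)) + b
             for row, b in zip(weights, biases)]
        if layer == len(params) - 1:
            return softmax(z)                   # output layer
        a = [max(0.0, v) for v in z]            # ReLU on hidden layers
        if training:
            a = dropout(a, rate, rng)

rng = random.Random(0)
params = init_params(rng)
probs = forward([0.5] * 37, params)             # inference mode, no dropout
print(len(probs), round(sum(probs), 6))
```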
A non-diagnosis of CRC was prevalent in the data set. Each data point in the positive class was weighted approximately 40 times greater than each data point in the negative class to ensure that the output of the prediction was not unbalanced towards the dominant class.
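The effect of class weighting on the loss can be illustrated with a small weighted cross-entropy function. This is a sketch under assumptions: the weight of 40 comes from the text, but normalizing by the total weight is a choice made here for illustration and may differ from the library behavior the authors relied on.

```python
import math

def weighted_cross_entropy(probs, labels, class_weight):
    """Weighted mean of -log p(true class), with one weight per class."""
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weight[y]
        total += -w * math.log(max(p[y], 1e-12))  # clip to avoid log(0)
        weight_sum += w
    return total / weight_sum

# Two predictions that both favour the negative class (index 0).
probs = [[0.9, 0.1], [0.9, 0.1]]
labels = [0, 1]                                   # second patient is a CRC case

unweighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 40.0})
print(weighted > unweighted)
```

With the positive class weighted 40 times more heavily, the missed CRC case dominates the loss, so the optimizer is pushed away from always predicting the majority class.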
2.5. Statistical Analyses
Distributions of socio-demographic factors, including age, gender, urbanization level, occupation, underlying disease, diabetes complication, and medications of the patient with CRC and without CRC were compared. A Chi-square test and a t-test were used to test the difference between categorical and continuous variables, respectively, between the two groups.
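As a worked example of the categorical comparison, the Pearson Chi-square statistic for a 2 × 2 table can be computed directly. The counts below are the irritable bowel syndrome row of Table 2; whether the authors applied a continuity correction is not stated, so none is used here, and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    With 1 degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Irritable bowel syndrome counts from Table 2.
non_crc_total, crc_total = 1_334_773, 14_867
non_crc_ibs, crc_ibs = 182_951, 2_781

chi2, p = chi_square_2x2(non_crc_ibs, non_crc_total - non_crc_ibs,
                         crc_ibs, crc_total - crc_ibs)
print(chi2 > 3.84, p < 0.001)   # 3.84 is the 5% critical value for 1 df
```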
Accuracy was not a reliable measurement of predictor performance due to the unbalanced data distribution. Instead, we used the weighted average recall (sensitivity), precision (positive predictive value), and F1 (harmonic mean of sensitivity and precision) values to evaluate predictor performance. These three metrics all have ideal values of one. The F1, precision, and recall values were calculated for the training set, test set, and all data using the scikit-learn library.
Additionally, the receiver operating characteristic (ROC) curve was used as a metric to measure predictor performance. The ROC was calculated between the outcome and the predicted probability of the outcome by the DNN model. The ROC curve was also computed using aDCSI as the sole predictor. The area under the ROC curve (AUROC) for the DNN model and aDCSI were compared to determine the performance of the DNN model, and both values were also compared to the ideal value of 1 and to the null hypothesis area of 0.5. The ROC curve was calculated using IBM SPSS. Data management was performed using the SAS 9.4 software (SAS Institute, Cary, NC, USA). All p-values were two-tailed and a p-value < 0.05 was considered significant.
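The support-weighted averaging described above can be sketched without external libraries. The authors used scikit-learn; the function below is a hypothetical re-implementation for illustration, in which each class's precision, recall, and F1 is weighted by the number of true samples in that class.

```python
def weighted_metrics(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) over all classes."""
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec_c = tp / (tp + fp) if tp + fp else 0.0
        rec_c = tp / (tp + fn) if tp + fn else 0.0
        f1_c = (2 * prec_c * rec_c / (prec_c + rec_c)) if prec_c + rec_c else 0.0
        support = tp + fn                       # true samples in this class
        precision += support / n * prec_c
        recall += support / n * rec_c
        f1 += support / n * f1_c
    return precision, recall, f1

# Toy labels: four negatives and one positive, with one false positive.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1]
precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```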
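The AUROC and an approximate standard error can be sketched as follows. The AUROC is computed as the probability that a randomly chosen CRC case receives a higher score than a randomly chosen non-case, and the standard error uses the Hanley and McNeil formula. This is an assumption for illustration: the authors used IBM SPSS, whose exact estimator is not stated in the text.

```python
import math

def auroc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUROC (Hanley and McNeil approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

# Small toy example of the pairwise definition.
toy_auc = auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(toy_auc)

# Approximate SE and z-statistic against the null area of 0.5, using the
# all-data AUROC and group sizes reported in this article.
se = hanley_mcneil_se(0.738, n_pos=14_867, n_neg=1_334_773)
z = (0.738 - 0.5) / se
print(se < 0.01, z > 1.96)
```

Under this approximation the standard error for the all-data AUROC is on the order of 0.002, which is why the confidence interval in Table 3 is so narrow.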
3.1. Demographic Features of Patients
Eligible study participants consisted of 1,349,640 T2DM patients, 14,867 of whom had CRC and 1,334,773 of whom did not (Table 2). The mean age was 63.7 years (SD = 11.2 years) for the CRC group and 56.2 years (SD = 14.2 years) for the non-CRC group. There were more men than women. Most patients in both groups resided in urbanized areas (58.9% vs. 58.8%). The most common occupations in both groups were white-collar jobs (45.0% vs. 48.1%). The prevalence of hypertension, congestive heart failure, colorectal polyps, COPD, CAD, and irritable bowel syndrome was significantly higher in the CRC group than in the non-CRC group. The T2DM-related cardiovascular complication was more prevalent in the CRC group than in the non-CRC group. The mean aDCSI score at the end of the follow-up was 2.75 (SD = 2.15) in the CRC group and 3.03 (SD = 2.35) in the non-CRC group. All medications listed in Table 2 had higher proportions in the non-CRC group than in the CRC group. The mean follow-up periods were 4.73 (SD = 3.33) years in the CRC group and 6.86 (SD = 3.87) years in the non-CRC group.
3.2. Evaluation of Predictor Performance
The F1, precision, and recall values and the AUROC of the DNN model across all data, the training set, and the test set are outlined in Table 3. The AUROC of aDCSI is outlined in Table 4. Figure 1 shows the ROC curve of the DNN model and aDCSI model for predicting CRC.
The AUROC of the aDCSI across all three datasets was close to the null hypothesis area of 0.5, which showed that aDCSI alone cannot be used as a predictor for CRC. This signified a necessity for a multivariate prediction model that takes into account all variables. The AUROC of the DNN model across all three datasets was significantly greater than the null hypothesis area and the AUROC of the aDCSI.
The DNN model had a high precision value across the test set (0.980), which indicated that the DNN model had a relatively low false positive rate. While the recall value across the test set was lower than the precision value, the recall value was also relatively high (0.886), which signified a low false negative rate.
This national population-based study demonstrated that the DNN model appropriately predicted CRC. In contrast, a single variable predictor using aDCSI showed poorer performance compared to the DNN model.
Earlier studies suggested that compared to the general population, patients with DM are at a higher risk of developing CRC [10,11,12,13]. Although DM and cancer share several risk factors, such as obesity, aging, unhealthy food and physical inactivity, the association between DM and the risk of CRC is biologically plausible based on the findings of previous studies. The potential mechanisms contributing to the development of diabetes-associated CRC may include insulin resistance and associated hyperglycemia, hyperinsulinemia, oxidative stress, and chronic inflammation [6,11,28]. Insulin stimulates cell proliferation and most cancer cells express the insulin-like growth factor (IGF) receptor [28,29]. The IGF system is a potent growth regulator closely linked with carcinogenesis, and several observational studies and reviews have revealed a linkage between elevated IGF levels and the risks of adenomatous polyps or CRC [31,32,33,34].
In Taiwan, several authors have used the NHIRD and traditional statistical methods to assess the cancer risk among patients with T2DM and anti-diabetic therapies [9,35,36,37]. Hsieh et al. applied logistic regression models to test the risk of T2DM and antidiabetic drugs on cancers. They found that there was a significantly higher risk for CRC (adjusted odds ratio = 1.206, 95% confidence interval = 1.142–1.274) in patients with T2DM. Chiu et al. employed a Cox proportional hazards regression analysis to evaluate T2DM and antidiabetic drugs with the risk of gastrointestinal malignancy. They indicated that T2DM was significantly associated with an increased risk of CRC (adjusted hazard ratio: 1.58, 95% confidence interval = 1.28–1.94). Tseng created multivariable Cox regression models to calculate the adjusted relative risk of T2DM on CRC and concluded that a significantly higher risk was detected. Our team carried out Cox regression analyses several years ago to determine if TZD can decrease cancer risk in T2DM patients and highlighted that no significant difference was observed for the risk of CRC in the TZD group relative to the non-TZD group. To further clarify this issue, the current study attempted to use a DNN to develop a prediction model for CRC among patients with T2DM. In addition, the drug effects were also considered in the analyses. A DNN is an artificial neural network with a multilayer perceptron architecture that uses sophisticated mathematical modelling to process data in complex ways. It can be used for prediction, forecasting, diagnosis, and decision making in different fields including the healthcare field.
We found that the AUROC of the DNN was significantly greater than that of aDCSI alone. We used aDCSI as a single variable predictor and the AUROC of aDCSI in predicting CRC was close to the null hypothesis area of 0.5, showing that using aDCSI to predict CRC was not effective. In contrast, the DNN was designed as a multivariate predictor and the AUROC of the DNN model was significantly greater than 0.5, indicating that a DNN model can be an effective prediction model for CRC. Our government launched a nationwide CRC screening program in 2004, and a free fecal immunochemical test is offered biennially to individuals aged 50 to 75. According to our findings, the program may extend to cover patients with T2DM beyond this age range.
Table 2 indicates that T2DM patients with hypertension, congestive heart failure, colorectal polyps, COPD, CAD, and irritable bowel syndrome had a significantly higher risk of CRC compared with T2DM patients without the corresponding underlying diseases. On the contrary, T2DM patients with hyperlipidemia, obesity, smoking, alcohol-related illness and chronic kidney disease (CKD) had a significantly lower risk of CRC compared with T2DM patients without the corresponding underlying diseases. As all the listed comorbidities were suggested risk factors for either T2DM or CRC or both, the findings presented here may represent the effects of intricate mechanisms among risk factors, T2DM and CRC.
The main strength of this study lies in the use of a population-based cohort with a large and nationally representative sample, which increases its generalizability in Taiwan. However, we have to acknowledge several limitations. Firstly, detection bias could exist because patients with T2DM are expected to have more clinical visits than the general population. Lewis et al. found that diabetic patients receiving medications, particularly in the first year of diagnosis, are more likely to undergo lower endoscopic examinations, so we can expect an overestimation of the incidence of CRC among these groups. However, Taylor and Schubert stated that diabetic patients have a significantly poorer response to colonoscopy bowel cleansing preparations than nondiabetic patients, which might lead to decreased detection of CRC and result in an underestimation of the relationship between T2DM and CRC. Secondly, inherent limitations of the NHIRD hinder our ability to obtain some information related to T2DM and CRC, such as smoking habits, alcohol consumption, body mass index (BMI), family history of T2DM or CRC, diet, and physical activity. To deal with this, we tried to use the comorbidities as surrogates for some risk factors of CRC, such as COPD and a stop-smoking clinic for smoking, alcohol-related illness for alcohol, and obesity for BMI. However, we should admit that the use of these comorbidities as surrogate risk factors for CRC did not allow any interpretation of data. Finally, unlike the traditional Cox proportional hazard model, our predictive models could not provide measures (e.g., 95% confidence intervals and p-values) to evaluate statistical significance; instead, we used recall, precision, F1, and AUROC to determine the predictor performance. The AUROC is 0.74 for the DNN, which is acceptable. Although these limitations suggest cautious interpretation of the results of the study, the diagnosis of T2DM and CRC is highly reliable, which makes our results more convincing.
In conclusion, based on the DNN predictive model, our findings suggest that Taiwanese patients with T2DM were at an increased risk of developing CRC. Although we have a relatively successful screening policy for CRC, our findings might encourage the government to consider a slight policy modification regarding screening guidelines to include patients with T2DM with a wider age range.
All authors have contributed significantly and all authors are in agreement with the content of the manuscript: Conception/design: M.-H.H., L.-M.S., C.-H.K.; Provision of study materials: C.-H.K.; collection and/or assembly of data: All authors; data analysis and interpretation: All authors; manuscript writing: All authors; final approval of manuscript: All authors. M.-H.H. and L.-M.S. equally contributed to this article.
This work was supported by grants from the Ministry of Health and Welfare, Taiwan (MOHW107-TDU-B-212-123004), the China Medical University Hospital (CMU107-ASIA-19, DMR-107-192), Academia Sinica Stroke Biosignature Project (BM10701010021), the MOST Clinical Trial Consortium for Stroke (MOST 106-2321-B-039-005), the Tseng-Lien Lin Foundation, Taichung, Taiwan, and the Katsuzo and Kiyo Aoshima Memorial Funds, Japan. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. No additional external funding was received for this study.
Conflicts of Interest
All authors report no conflicts of interest.
T2DM: type 2 diabetes; NHIRD: National Health Insurance Research Database; ICD-9-CM: International Classification of Diseases, Ninth Revision, Clinical Modification; DNN: deep neural network; SD: standard deviation.
- Global Report on Diabetes: World Health Organization. Available online: http://apps.who.int/iris/bitstream/handle/10665/204871/9789241565257_eng.pdf?sequence=1 (accessed on 25 July 2018).
- Jiang, Y.D.; Chang, C.H.; Tai, T.Y.; Chen, J.F.; Chuang, L.M. Incidence and prevalence rates of diabetes mellitus in Taiwan: Analysis of the 2000–2009 Nationwide Health Insurance database. J. Formos. Med. Assoc. 2012, 111, 599–604.
- Tsilidis, K.K.; Kasimis, J.C.; Lopez, D.S.; Ntzani, E.E.; Ioannidis, J.P.A. Type 2 diabetes and cancer: Umbrella review of meta-analyses of observational studies. BMJ 2015, 350, g7607.
- Wang, M.; Hu, R.Y.; Wu, H.B.; Pan, J.; Gong, W.W.; Guo, L.H.; Zhong, J.M.; Fei, F.R.; Yu, M. Cancer risk among patients with type 2 diabetes mellitus: A population-based prospective study in China. Sci. Rep. 2015, 5, 11503.
- Ballotari, P.; Vicentini, M.; Manicardi, V.; Gallo, M.; Chiatamone Ranieri, S.; Greci, M.; Giorgi Rossi, P. Diabetes and risk of cancer incidence: Results from a population-based cohort study in northern Italy. BMC Cancer 2017, 17, 703.
- Giovannucci, E.; Harlan, D.M.; Archer, M.C.; Bergenstal, R.M.; Gapstur, S.M.; Habel, L.A.; Pollak, M.; Regensteiner, J.G.; Yee, D. Diabetes and cancer: A consensus report. Diabetes Care 2010, 33, 1674–1685.
- Johnson, J.A.; Carstensen, B.; Witte, D.; Bowker, S.L.; Lipscombe, L.; Renehan, A.G. Diabetes and cancer (1): Evaluating the temporal relationship between type 2 diabetes and cancer incidence. Diabetologia 2012, 55, 1607–1618.
- Jee, S.H.; Ohrr, H.; Sull, J.W.; Yun, J.E.; Ji, M.; Samet, J.M. Fasting serum glucose level and cancer risk in Korean men and women. JAMA 2005, 293, 194–202.
- Hsieh, M.C.; Lee, T.C.; Cheng, S.M.; Tu, S.T.; Yen, M.H.; Tseng, C.H. The influence of type 2 diabetes and glucose-lowering therapies on cancer risk in the Taiwanese. Exp. Diabetes Res. 2012, 2012, 413782.
- Deng, L.; Gui, Z.; Zhao, L.; Wang, J.; Shen, L. Diabetes mellitus and the incidence of colorectal cancer: An updated systematic review and meta-analysis. Dig. Dis. Sci. 2012, 57, 1576–1585.
- Yuhara, H.; Steinmaus, C.; Cohen, S.E.; Corley, D.A.; Tei, Y.; Buffler, P.A. Is diabetes mellitus an independent risk factor for colon cancer and rectal cancer? Am. J. Gastroenterol. 2011, 106, 1911–1921.
- Sinagra, E.; Guarnotta, V.; Raimondo, D.; Mocciaro, F.; Dolcimascolo, S.; Rizzolo, C.A.; Puccia, F.; Maltese, N.; Citarrella, R.; Messina, M.; et al. Colorectal cancer risk in patients with type 2 diabetes mellitus: A single-center experience. J. Biol. Regul. Homeost. Agents 2017, 31, 1101–1107.
- Jiang, Y.; Ben, Q.; Shen, H.; Lu, W.; Zhang, Y.; Zhu, J. Diabetes mellitus and incidence and mortality of colorectal cancer: A systematic review and meta-analysis of cohort studies. Eur. J. Epidemiol. 2011, 26, 863–876.
- Cancer Statistics Annual Report: Taiwan Cancer Registry. Available online: http://tcr.cph.ntu.edu.tw/main.php?Page=N2 (accessed on 25 July 2018).
- Cancer Statistics: Cancer Incidence Trends. Taiwan Cancer Registry. Available online: http://tcr.cph.ntu.edu.tw/main.php?Page=A5B2 (accessed on 25 July 2018).
- Young, B.A.; Lin, E.; Von, K.M.; Simon, G.; Ciechanowski, P.; Ludman, E.J.; Everson-Stewart, S.; Kinder, L.; Oliver, M.; Boyko, E.J.; et al. Diabetes complications severity index and risk of mortality, hospitalization, and healthcare utilization. Am. J. Manag. Care 2008, 14, 15–23.
- Chang, H.Y.; Weiner, J.P.; Richards, T.M.; Bleich, S.N.; Segal, J.B. Validating the adapted Diabetes Complications Severity Index in claims data. Am. J. Manag. Care 2012, 18, 721–726.
- Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. PMLR 2013, 28, 1139–1147.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. Available online: https://arxiv.org/pdf/1412.6980.pdf (accessed on 25 July 2018).
- Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the International Conference on Learning Representations Workshop, San Juan, Puerto Rico, 2–4 May 2016.
- Hahnloser, R.H.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 2000, 405, 947–951.
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on Imagenet Classification. Available online: https://arxiv.org/pdf/1502.01852.pdf (accessed on 25 July 2018).
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
- He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36.
- Sandhu, M.S.; Dunger, D.B.; Giovannucci, E.L. Insulin, insulin-like growth factor-I (IGF-I), IGF binding proteins, their biologic interactions, and colorectal cancer. J. Natl. Cancer Inst. 2002, 94, 972–980.
- Yavari, K.; Taghikhani, M.; Maragheh, M.G.; Mesbah-Namin, S.A.; Babaei, M.H. Knockdown of IGF-IR by RNAi inhibits SW480 colon cancer cells growth in vitro. Arch. Med. Res. 2009, 40, 235–240.
- LeRoith, D.; Baserga, R.; Helman, L.; Roberts, C.T., Jr. Insulin-like growth factors and cancer. Ann. Intern. Med. 1995, 122, 54–59.
- Renehan, A.G.; Zwahlen, M.; Minder, C.; O’Dwyer, S.T.; Shalet, S.M.; Egger, M. Insulin-like growth factor (IGF)-I, IGF binding protein-3, and cancer risk: Systematic review and meta-regression analysis. Lancet 2004, 363, 1346–1353.
- Schoen, R.E.; Weissfeld, J.L.; Kuller, L.H.; Thaete, F.L.; Evans, R.W.; Hayes, R.B.; Rosen, C.J. Insulin-like growth factor-I and insulin are associated with the presence and advancement of adenomatous polyps. Gastroenterology 2005, 129, 4644–4675.
- Giovannucci, E. Insulin, insulin-like growth factors and colon cancer: A review of the evidence. J. Nutr. 2001, 131, 3109S–3120S.
- Davies, M.; Gupta, S.; Goldspink, G.; Winslet, M. The insulin-like growth factor system and colorectal cancer: Clinical and experimental evidence. Int. J. Colorectal. Dis. 2006, 21, 201–208.
- Chiu, C.C.; Huang, C.C.; Chen, Y.C. Increased risk of gastrointestinal malignancy in patients with diabetes mellitus and correlations with anti-diabetes drugs: A nationwide population-based study in Taiwan. Intern. Med. 2013, 52, 52939–52946.
- Tseng, C.H. Diabetes, metformin use, and colon cancer: A population-based cohort study in Taiwan. Eur. J. Endocrinol. 2012, 167, 409–416.
- Kao, C.H.; Sun, L.M.; Chen, P.C.; Lin, M.C.; Liang, J.A.; Muo, C.H.; Chang, S.N.; Sung, F.C. A population-based cohort study in Taiwan-use of insulin sensitizers can decrease cancer risk in diabetic patients. Ann. Oncol. 2013, 24, 523–530.
- Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117.
- Wang, Y.W.; Chen, H.H.; Wu, M.S.; Chiu, H.M. Current status and future challenge of population-based organized colorectal cancer screening: Lesson from the first decade of Taiwanese program. J. Formos. Med. Assoc. 2018, 117, 358–364.
- Lewis, J.D.; Capra, A.M.; Achacoso, N.S.; Ferrara, A.; Levin, T.R.; Quesenberry, C.P., Jr.; Habel, L.A. Medical therapy for diabetes is associated with increased use of lower endoscopy. Pharmacoepidemiol. Drug Saf. 2007, 16, 1195–1202.
- Taylor, C.; Schubert, M.L. Decreased efficacy of polyethylene glycol lavage solution (golytely) in the preparation of diabetic patients for outpatient colonoscopy: A prospective and blinded study. Am. J. Gastroenterol. 2001, 96, 710–714.
Figure 1. The ROC curve of the DNN model and aDCSI model in predicting CRC.
Table 1. Distribution of training and test sets.
|All Patients||Training Set||Test Set|
Table 2. Baseline characteristics of T2DM patients with and without colorectal cancer.
|Variable||Without CRC (N = 1,334,773): n or mean||% or SD||With CRC (N = 14,867): n or mean||% or SD||p-Value|
|Age group (year)||<0.001|
|Mean (SD) (year) †||56.2||14.2||63.7||11.2||<0.001|
|Urbanization level #||0.001|
|Congestive heart failure||183,790||13.8||2076||14.0||<0.001|
|Inflammatory bowel disease||49,295||3.69||575||3.87||0.26|
|Irritable bowel syndrome||182,951||13.7||2781||18.7||<0.001|
|Diabetes complication (components of the aDCSI)|
|Peripheral vascular disease||365,797||27.4||3406||22.9||<0.001|
|Mean aDCSI score (SD) †|
|End of follow-up||3.03||2.35||2.75||2.15||<0.001|
|Other antidiabetic drugs||365,662||27.4||3071||20.7||<0.001|
|Mean follow-up for endpoint, y (SD) †||6.86||3.87||4.73||3.33||<0.001|
#: The urbanization level was categorized by the population density of the residential area into 4 levels, with level 1 as the most urbanized and level 4 as the least urbanized. ‡: Other occupations included primarily retired, unemployed, or low income populations. aDCSI = adapted Diabetes Complication Severity Index. Chi-square test, and †: t-test comparing subjects with and without CRC.
Table 3. Performance of DNN across all data, the training set, and the test set.
|Dataset||F1||Precision||Recall||AUROC||AUROC 95% CI||AUROC SE|
|All data (n = 1349640)||0.931||0.982||0.889||0.738||0.734–0.742||0.002|
|Training set (n = 1315899)||0.931||0.982||0.889||0.739||0.735–0.743||0.002|
|Test set (n = 33741)||0.929||0.980||0.886||0.700||0.674–0.727||0.013|
Table 4. The receiver operating characteristic of aDCSI.
|Dataset||AUROC||AUROC 95% CI||AUROC SE|
|All data (n = 1349640)||0.492||0.487–0.497||0.003|
|Training set (n = 1315899)||0.492||0.487–0.498||0.003|
|Test set (n = 33741)||0.498||0.466–0.530||0.016|
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).