Association of Preterm Birth with Inflammatory Bowel Disease and Salivary Gland Disease: Machine Learning Analysis Using National Health Insurance Data

This study employs machine learning and population data to test the associations of preterm birth with inflammatory bowel disease (IBD), salivary gland disease, socioeconomic status and medication history, including proton pump inhibitors. The source of the population-based retrospective cohort data was the Korea National Health Insurance Service claims data for all women aged 25–40 years who experienced their first childbirth as a singleton pregnancy during 2015–2017 (402,092 women). These participants were divided into the Ulcerative Colitis (UC) Group (1782 women), the Crohn Group (1954 women) and the Non-IBD Group (398,219 women). For each group, the dependent variable was preterm birth during 2015–2017, and 51 independent variables were included. Random forest variable importance was employed to identify the main factors of preterm birth and to test its associations with salivary gland disease, socioeconomic status and medication history for each group. The proportion of preterm birth was higher for the UC Group (7.86%) and the Non-IBD Group (7.17%) than for the Crohn Group (6.76%). Based on random forest variable importance, salivary gland disease was a top 10 determinant for the prediction of preterm birth for the UC Group, but this was not the case for the Crohn Group or the Non-IBD Group. The top 5 predictors of preterm birth for the UC Group during 2015–2017, with their variable importance values, were socioeconomic status (8.58), age (8.00), proton pump inhibitors (2.35), progesterone (2.13) and salivary gland disease in 2014 (1.72). In conclusion, preterm birth has strong associations with ulcerative colitis, salivary gland disease, socioeconomic status and medication history including proton pump inhibitors.


Introduction
Preterm birth and inflammatory bowel disease (IBD) are major contributors to the global disease burden [1][2][3][4][5][6][7][8][9]. Every year, 15 million babies are born preterm worldwide, and preterm birth is a main contributor to global neonatal and childhood mortality, i.e., 1 million deaths among those aged 0 to 4 years [1,2]. For example, more than 1 out of every 10 babies was preterm in the United States during 2003-2012, that is, 5,042,982 (12.2%) of 41,206,315 newborns [3]. Cost-effective interventions are expected to prevent three-quarters of mortality from preterm birth [4]. Likewise, during 1990-2017, the global prevalence of IBD registered a rapid growth of 85.1%, from 3.7 million to 6.8 million. IBD (or "non-infectious chronic inflammation of the gastrointestinal tract") includes ulcerative colitis (UC) and Crohn's disease. Its precise etiology is still unknown, but it is assumed to result from an inappropriate immune response to environmental factors such as microbial antigens in genetically susceptible hosts [5]. IBD is common among young reproductive-aged women, and it is reported to cause salivary problems through a lack of antimicrobial peptides [6][7][8][9]. In this context, it is a plausible hypothesis that preterm birth has strong associations with IBD and salivary gland disease.
To the best of our knowledge, however, no study with population-based data has been available on the associations of preterm birth with IBD and salivary gland disease. This study presents a comprehensive machine-learning analysis on this topic, using a population-based cohort of 402,092 participants and a rich collection of 51 predictors. This study brings new results on the associations of preterm birth with UC, salivary gland disease, socioeconomic status and medication history, including proton pump inhibitors. Proton pump inhibitor medication, which is common for gastrointestinal disease, is expected to alter gastric acidity and cause compositional changes in the gut microbiota [10]. It is a plausible hypothesis that these changes can lead to increased risks of infection, IBD, salivary gland disease and preterm birth.

Participants and Variables
The source of retrospective cohort data for this study was the Korea National Health Insurance Service claims data for all women in Korea who were aged 25-40 years and had their first childbirth as a singleton pregnancy during 2015-2017 (402,092 women). These inclusion criteria were adopted because data with a wider range and a larger size would have exceeded the maximum capacity of the Korea National Health Insurance Service data analysis center located in Seoul, Korea.
The 402,092 participants were divided into the UC Group (1782 women), the Crohn Group (or Crohn's Disease Group) (1954 women) and the Non-IBD Group (398,219 women). For each group, the dependent variable was preterm birth during 2015-2017 (birth before 37 weeks of gestation). Four categories of preterm birth were introduced according to the ICD-10 Code: (1) PTB 1, preterm birth with premature rupture of membranes (PROM) only; (2) PTB 2, preterm labor and birth without PROM; (3) PTB 3, PTB 1 or PTB 2; (4) PTB 4, PTB 3 or other indicated preterm birth (Table S1, a supplementary table). Each of these categories was coded as "no" vs. "yes". A total of 51 independent variables covered the following information: (1) demographic/socioeconomic determinants in 2014 including age (years), socioeconomic status measured by an insurance fee with the range of 1 (the highest group) to 20 (the lowest group), and region (city) (no vs. yes); (2) disease information (no vs. yes) for each of the years 2002-2014, namely, diabetes, hypertension and salivary gland disease; (3) medication history (no vs. yes) in 2014, that is, benzodiazepine, calcium channel blocker, nitrate, progesterone, proton pump inhibitor, sleeping pills and antidepressant [10][11][12]; (4) obstetric information (no vs. yes) in 2014 including in vitro fertilization, myoma uteri and prior cone. The 39 disease variables were presented as Diabetes_2002, ..., Diabetes_2014, Hypertension_2002, ..., Hypertension_2014, and Salivary_Gland_2002, ..., Salivary_Gland_2014. The disease information and the medication history were screened from ICD-10 and ATC codes, respectively (Tables S1 and S2, supplementary tables). Here, the definition of diabetes was based on fasting glucose equal to or higher than 126 mg/dL or antidiabetic medication [13], while the definition of hypertension was based on systolic/diastolic blood pressure equal to or higher than 140/90 mmHg or antihypertensive medication [14].
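The variable coding described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the study itself was carried out in R, and the function names and the three input flags (`prom_only`, `labor_without_prom`, `other_indicated`) are hypothetical stand-ins for the ICD-10-based screening in Table S1. It shows how the nested categories PTB 1 to PTB 4 follow from one another and how the 39 yearly disease variables are named.

```python
def ptb_categories(prom_only, labor_without_prom, other_indicated):
    """Derive the four nested preterm-birth outcomes for one participant.

    The three boolean inputs are hypothetical flags; in the study they
    would come from ICD-10 codes in the claims data (Table S1).
    """
    ptb1 = bool(prom_only)            # PTB 1: preterm birth with PROM only
    ptb2 = bool(labor_without_prom)   # PTB 2: preterm labor and birth without PROM
    ptb3 = ptb1 or ptb2               # PTB 3: PTB 1 or PTB 2
    ptb4 = ptb3 or bool(other_indicated)  # PTB 4: PTB 3 or other indicated preterm birth
    return {"PTB 1": ptb1, "PTB 2": ptb2, "PTB 3": ptb3, "PTB 4": ptb4}


def disease_variable_names(first_year=2002, last_year=2014):
    """Build the yearly disease indicators, e.g. Diabetes_2002 ... Salivary_Gland_2014."""
    diseases = ["Diabetes", "Hypertension", "Salivary_Gland"]
    return [f"{d}_{y}" for d in diseases for y in range(first_year, last_year + 1)]


names = disease_variable_names()
print(len(names))  # 3 diseases x 13 years = 39 variables
```

Because PTB 3 and PTB 4 are unions of the earlier categories, a participant who is "yes" for PTB 1 or PTB 2 is necessarily "yes" for PTB 3 and PTB 4 as well.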
The definitions of UC, Crohn's disease and salivary gland disease were adopted from the Mayo Clinic and the National Institutes of Health: (UC) "an inflammatory bowel disease causing the inflammations and ulcers (sores) of the digestive tract" (https://www.mayoclinic.org/diseases-conditions/ulcerative-colitis/symptoms-causes/syc-20353326) (accessed on 1 December 2021); (Crohn's disease) "an inflammatory bowel disease causing the inflammations of the digestive tract, which can lead to abdominal pain, severe diarrhea, fatigue, weight loss and malnutrition" (https://www.mayoclinic.org/diseases-conditions/crohns-disease/symptoms-causes/syc-20353304) (accessed on 1 December 2021); (salivary gland disease) "if the salivary glands are damaged or aren't producing enough saliva, it can affect taste, make chewing and swallowing more difficult, and increase the risk for cavities, tooth loss, and infections in the mouth" (https://www.nidcr.nih.gov/health-info/saliva-salivary-gland-disorders) (accessed on 1 December 2021).

Analysis
Logistic regression, the random forest and the artificial neural network were employed for the prediction of preterm birth [10][11][12]. The three models were chosen based on the results of a recent review on the application of artificial intelligence in the early diagnosis of preterm birth [11]. The review suggests that different machine learning approaches are optimal for different types of data in the prediction of preterm birth, with logistic regression, the random forest and/or the artificial neural network recommended for numeric data. For the random forest, the number of trees was 100 and the Gini index was adopted as the measure of node impurity. For the artificial neural network, the number of hidden layers was 5 and the Broyden-Fletcher-Goldfarb-Shanno algorithm was used for optimization. The data of 402,092 cases were split into training and validation sets with a 70:30 ratio (281,464 vs. 120,628 cases). The criterion for the validation of the trained models was accuracy, the ratio of correct predictions among the 120,628 validation cases. Random forest variable importance (total decrease in Gini impurity averaged over 100 trees) was used to identify the main factors of preterm birth and to test its associations with salivary gland disease, socioeconomic status and medication history including proton pump inhibitors. R-Studio 1.3.959 (R-Studio Inc.: Boston, MA, USA) was employed for the analysis from 1 August 2020 to 31 January 2021.
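The quantities in this paragraph can be made concrete with a minimal sketch. The Python code below is not the analysis code of this study (which was run in R); it is a dependency-free illustration, with hypothetical function names, of the 70:30 split sizes, the accuracy criterion, and the Gini impurity decrease of a single split. In a random forest, such decreases are summed over all splits made on a given variable within a tree and then averaged over the trees to obtain that variable's importance.

```python
from collections import Counter


def split_sizes(n_cases, train_percent=70):
    """Return (training, validation) set sizes for a percentage split."""
    n_train = n_cases * train_percent // 100  # integer arithmetic avoids float rounding
    return n_train, n_cases - n_train


def accuracy(y_true, y_pred):
    """Ratio of correct predictions among all validation cases."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


print(split_sizes(402_092))  # (281464, 120628), matching the sizes reported above
```

A split that separates the two outcome classes perfectly, e.g. `gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])`, removes all of the parent node's impurity, whereas a split that leaves the class mix unchanged yields a decrease of zero.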
This retrospective study was approved by the Institutional Review Board (IRB) of Korea University Anam Hospital, Seoul, Korea, on 5 November 2018 (2018AN0365). Informed consent was waived by the IRB.

Random Forest Variable Importance
In terms of accuracy, the random forest was slightly better than logistic regression and the artificial neural network (Table 1). The average values of random forest variable importance for PTB 1, PTB 2, PTB 3 and PTB 4 are presented in Figures S1-S3 (Supplementary Figures), i.e., the UC Group (Figure S1), the Crohn Group (Figure S2) and the Non-IBD Group (Figure S3). Based on these average values, salivary gland disease was a top 10 determinant for the prediction of preterm birth for the UC Group, but this was not the case for the Crohn Group or the Non-IBD Group. The top 5 variables of preterm birth for the UC Group during 2015-2017 in Figure S1 were socioeconomic status (8.58), age (8.00), proton pump inhibitors (2.35), progesterone (2.13) and salivary gland disease in 2014 (1.72).
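The averaging and ranking step behind these figures can be sketched as follows. This Python fragment is an illustration only: the function names are hypothetical, and the only numbers used are the five averaged importance values reported in this article for the UC Group (the per-outcome values for PTB 1 to PTB 4 are not reproduced here).

```python
from statistics import mean


def average_importance(per_outcome):
    """Average each variable's importance across outcomes (e.g. PTB 1 to PTB 4).

    `per_outcome` maps an outcome name to a dict of {variable: importance};
    every outcome is assumed to contain the same variables.
    """
    variables = next(iter(per_outcome.values())).keys()
    return {v: mean(scores[v] for scores in per_outcome.values()) for v in variables}


def top_k(importance, k):
    """Return the k variables with the largest importance, in descending order."""
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)[:k]


# Averaged values for the UC Group as reported in this article.
reported_uc_averages = {
    "progesterone": 2.13,
    "salivary gland disease in 2014": 1.72,
    "age": 8.00,
    "socioeconomic status": 8.58,
    "proton pump inhibitors": 2.35,
}

for name, value in top_k(reported_uc_averages, 5):
    print(f"{name}: {value:.2f}")
```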

Logistic Regression Coefficient Estimates
The results of logistic regression (Tables 2-4) present the sign and magnitude of the effect of each major determinant on preterm birth for the UC Group (Table 2), the Crohn Group (Table 3) and the Non-IBD Group (Table 4). For instance, the odds of PTB 4 in the UC Group were greater by 4355% for those with salivary gland disease in 2014 than for those without it (Table 2).
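For readers less familiar with this interpretation, the link between a logistic regression coefficient and a percentage change in odds can be sketched as below. The sketch assumes the standard relation, percentage change = (exp(β) − 1) × 100, and the function names are hypothetical; the coefficient itself is reported in Table 2 and is not reproduced here, so the value of β shown is merely back-calculated from the 4355% figure.

```python
import math


def percent_change_in_odds(beta):
    """Percentage change in odds for a one-unit increase (no -> yes) in a predictor."""
    return (math.exp(beta) - 1.0) * 100.0


def coefficient_from_percent(percent):
    """Inverse relation: the log-odds coefficient implied by a percentage change."""
    return math.log(1.0 + percent / 100.0)


beta = coefficient_from_percent(4355)  # back-calculated, not taken from Table 2
odds_ratio = math.exp(beta)
print(f"beta = {beta:.3f}, odds ratio = {odds_ratio:.2f}")
```

Under this relation, an increase of 4355% corresponds to an odds ratio of about 44.55, i.e., a coefficient of roughly 3.80 on the log-odds scale.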

Findings of Study
In general, the proportion of preterm birth was higher for the UC Group and the Non-IBD Group than for the Crohn Group. Based on random forest variable importance, salivary gland disease was a top 10 determinant for the prediction of preterm birth for the UC Group, but it was not the case for the Crohn Group or the Non-IBD Group. The top 5 variables of preterm birth for the UC Group during 2015-2017 were socioeconomic status, age, proton pump inhibitors, progesterone and salivary gland disease in 2014.

Contributions of Study
Existing literature focuses on microbial infection as a possible mechanism between IBD and preterm birth or salivary gland disease [15][16][17]. In other words, preterm birth and salivary gland disease can be considered extra-intestinal manifestations of IBD. However, the strength of the association among the three diseases can vary between UC and Crohn's disease as two types of IBD. For example, preterm birth had a stronger association with UC than with Crohn's disease in a previous study [16]. This agrees with the result of this study: the proportion of preterm birth was higher for the UC Group and the Non-IBD Group than for the Crohn Group. One possible explanation for this finding is that the Crohn Group pays more attention to disease treatment and makes fewer attempts at pregnancy compared to the UC Group, and that this leads to a discrepancy between their proportions of preterm birth. To the best of our knowledge, however, no machine-learning study with population-based data has been available on this topic. This study presents a comprehensive machine-learning analysis on this topic, using a population-based cohort of 402,092 participants and a rich collection of 51 predictors. Specifically, this study brings new results on an association among proton pump inhibitor medication before pregnancy, UC and preterm birth. A systematic review of 60 studies with 134,940 participants in total reports no significant relationship between proton pump inhibitor medication during pregnancy and preterm birth [18]. A more recent systematic review of 26 observational studies confirmed this finding [19]. However, this study considered proton pump inhibitor medication before pregnancy rather than during pregnancy.
Based on the random forest variable importance of this study for the UC Group, proton pump inhibitors in 2014 (before pregnancy) were the third most important determinant of preterm birth during 2015-2017, following only socioeconomic status in 2014 and age in 2014 (the average for PTB 1, PTB 2, PTB 3 and PTB 4). One possible explanation for this finding is that the use of proton pump inhibitors before pregnancy alters gastric acidity and causes compositional changes in the gut microbiota, which lead to increased risks of infection, UC and preterm birth [10,20]. The number of predictors and the size of the data in this study exceed those of the existing literature. For this reason, the results of this study may be more robust than those of previous studies.

Limitations of Study
Firstly, this study did not examine possible mediating effects among variables. Secondly, this study adopted the binary category of preterm birth as no vs. yes. However, preterm birth can have multiple categories, and it will be a good topic for future study to compare different predictors for various categories of preterm birth, e.g., extremely preterm (less than 28 weeks), very preterm (28 to 32 weeks) and moderate to late preterm (32 to 37 weeks). Thirdly, this study considered the most comprehensive collection of diseases and medications regarding preterm birth available in the Korea National Health Insurance Service data, but it was beyond the scope of this study to explore and evaluate various pathways among diseases, complications, medications and preterm birth. Little research has been conducted, and more investigation is needed on this topic. Fourthly, according to a recent review, optimal machine learning methods are expected to vary with different types of data for predicting preterm birth: the random forest, logistic regression and/or the artificial neural network in the case of numeric data; the support vector machine in the case of electrohysterogram data; the convolutional neural network in the case of image data; and the recurrent neural network in the case of text data [11]. Uniting various kinds of machine learning approaches for various kinds of preterm birth data would bring new innovations and deeper insights in this line of research. Fifthly, further investigations of single vs. multiple gestation would deliver more insights and more detailed clinical implications. Sixthly, considering the following variables would extend the horizon of research in this direction much further: nutritional status, working conditions, drinking, smoking, biochemical markers, cervical length, cesarean section, familial predisposition, fetal malformation and uterine malformation. Seventhly, model calibration was not considered in this study.
Eighthly, there might be room for improvement in the areas under the receiver operating characteristic curves in this study. Ninthly, errors were likely to exist in the stage of data collection, but examining this issue was beyond the scope of this study. Little literature is available and more investigation is required on this topic. Finally, the numbers of participants with PTB 4 not included in PTB 3 and with salivary gland disease in the UC Group and the Crohn Group were very small in this study. The participants in this study were aged 25-40 years and had their first childbirth as a singleton pregnancy during 2015-2017. These inclusion criteria were adopted because data with a wider range and a larger size would have exceeded the maximum capacity of the Korea National Health Insurance Service data analysis center located in Seoul, Korea. Expanding the data is expected to improve the validity and reliability of this study further.
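As an illustration of the multi-category outcome suggested for future work, the gestational-age categories mentioned above could be coded as in the following Python sketch. The function name is hypothetical, and the handling of the boundaries (28 and 32 weeks assigned to the later category) is an assumption, since the ranges quoted above do not specify it; gestational age was not used in this form in the present study.

```python
def classify_gestational_age(weeks):
    """Map gestational age at birth (in weeks) to a preterm-birth category.

    Boundary handling is an assumption: exactly 28 weeks counts as
    "very preterm" and exactly 32 weeks as "moderate to late preterm".
    """
    if weeks < 0:
        raise ValueError("gestational age must be non-negative")
    if weeks < 28:
        return "extremely preterm"
    if weeks < 32:
        return "very preterm"
    if weeks < 37:
        return "moderate to late preterm"
    return "not preterm"  # birth at or after 37 weeks of gestation


print(classify_gestational_age(30))
```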

Conclusions
Preterm birth has strong associations with ulcerative colitis, salivary gland disease, socioeconomic status and medication history including proton pump inhibitors. For preventing preterm birth, appropriate medication would be needed alongside preventive measures for ulcerative colitis and salivary gland disease and the promotion of socioeconomic status for pregnant women.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijerph19053056/s1, Table S1: ICD-10 Code for Preterm Birth, Inflammatory Bowel Disease and Salivary Gland Disease; Table S2: ATC Code for Medication; Table S3: Descriptive Statistics on Preterm Birth and Its Determinants of All; Table S4: Descriptive Statistics on Preterm Birth and Its Determinants of Ulcerative-Colitis Group; Table S5: Descriptive Statistics on Preterm Birth and Its Determinants of Crohn Group; Table S6: Descriptive Statistics on Preterm Birth and Its Determinants of Non-Inflammatory-Bowel-Disease Group; Figure S1: Average Values of Random Forest Variable Importance for PTB 1, PTB 2, PTB 3 and PTB 4: Ulcerative-Colitis Group; Figure S2: Average Values of Random Forest Variable Importance for PTB 1, PTB 2, PTB 3 and PTB 4: Crohn Group; Figure S3: Average Values of Random Forest Variable Importance for PTB 1, PTB 2, PTB 3 and PTB 4: Non-Inflammatory-Bowel-Disease Group.

Informed Consent Statement:
Informed consent was waived by the institutional review board, given that data were deidentified.

Data Availability Statement:
The data presented in this study are not publicly available. However, the data are available from the corresponding author upon reasonable request and under the permission of Korea National Health Insurance Service.

Conflicts of Interest:
The authors declare no conflict of interest.