Risk Prediction for Gastric Cancer Using GWAS-Identified Polymorphisms, Helicobacter pylori Infection and Lifestyle-Related Risk Factors in a Japanese Population

Simple Summary: Gastric cancer remains a major cancer in Japan and worldwide. Practical intervention strategies for prevention, such as personalized approaches based on genetic risk models, are expected to be developed. Here, we developed and validated a risk prediction model for gastric cancer using genetic, biological, and lifestyle-related risk factors. The combination of selected GWAS-identified polymorphisms and other predictors provided high discriminatory accuracy and good calibration in both the derivation and validation studies; however, the contribution of genetic factors to risk prediction was limited. The greatest contributor to risk prediction was the ABCD classification (Helicobacter pylori infection-related factor).
Abstract: Background: As part of our efforts to develop practical intervention applications for cancer prevention, we investigated a risk prediction model for gastric cancer based on genetic, biological, and lifestyle-related risk factors. Methods: We conducted two independent age- and sex-matched case–control studies, the first for model derivation (696 cases and 1392 controls) and the second (795 cases and 795 controls) for external validation. Using the derivation study data, we developed a prediction model by fitting a conditional logistic regression model using the predictors age, ABCD classification defined by H. pylori infection and gastric atrophy, smoking, alcohol consumption, fruit and vegetable intake, and 3 GWAS-identified polymorphisms. Performance was assessed with regard to discrimination (area under the curve (AUC)) and calibration (calibration plots and Hosmer–Lemeshow test). Results: A combination of selected GWAS-identified polymorphisms and the other predictors provided high discriminatory accuracy and good calibration in both the derivation and validation studies, with AUCs of 0.77 (95% confidence interval: 0.75–0.79) and 0.78 (0.77–0.81), respectively. The calibration plots of both studies stayed close to the ideal calibration line. In the validation study, the environmental model (nongenetic model) was significantly more discriminative than the inclusive model, with an AUC value of 0.80 (0.77–0.82). Conclusion: The contribution of genetic factors to risk prediction was limited, and the ABCD classification (H. pylori infection-related factor) contributes most to risk prediction of gastric cancer.


Introduction
Gastric cancer is the second most common cancer [1] and the third leading cause of cancer death in men and women [2] in Japan. Despite dramatic declines in incidence and mortality rates over the last several decades, it remains a major public health issue in this country. Epidemiological evidence on the development of gastric cancer has been accumulating, and Helicobacter pylori (H. pylori) infection is now regarded as a convincing risk factor for gastric cancer in the Japanese population [3,4], as is the chronic atrophic gastritis that follows H. pylori infection [5]. Stratification by a combination of H. pylori infection and atrophic gastritis, known as the ABCD classification, was associated with gastric cancer risk in case-control studies [4,6] and predicted the incidence of gastric cancer well in prospective studies [7][8][9][10][11][12][13][14]. In contrast, consumption of fruits and vegetables is recognized as a protective factor in gastric cancer. A meta-analysis of global data showed that fruit and vegetable consumption is associated with a significant reduction in gastric cancer risk [15]. An association with tobacco smoking has been clearly established worldwide [16], including in Japan [17], and 11% of gastric cancer cases may be attributable to it [16]. Similarly, alcohol drinking is recognized as a cause of gastric cancer. A large pooled analysis found an association between heavy alcohol drinking and risk of gastric cancer [18].
In this study, we examined a risk prediction model using GWAS-identified SNPs and several risk factors of gastric cancer for possible use in distinguishing people at high and low risk of gastric cancer in personalized prevention settings.

Study Population
Two independent case-control studies were conducted to develop a risk prediction model. The study subjects were selected from the participants of the Hospital Epidemiology Research Program at Aichi Cancer Center (HERPACC)-2 (2001–2005) for the derivation study and HERPACC-3 (2005–2013) for the validation study. The frameworks of HERPACC-2 and HERPACC-3 have been described elsewhere [32][33][34]. Briefly, all first-visit outpatients aged 20–79 were recruited to participate in HERPACC-2 and -3. They were asked to fill in a questionnaire on lifestyle information before their first medical examination and to provide blood samples. The response rate for enrollment was 97% in HERPACC-2, and half of the enrolled subjects provided blood samples. In HERPACC-3, 66.4% of participants responded to the questionnaire, of whom 62% provided blood samples. In each study, cases were histologically diagnosed with gastric cancer, and controls were confirmed to have no cancer and no history of neoplasm. Controls were randomly selected and individually matched by age (± 5 years) and sex at a case-control ratio of 1:2 in the derivation study and 1:1 in the validation study. As a result, the present analysis included 696 cases/1392 controls in the derivation study and 795 cases/795 controls in the validation study. Written informed consent was obtained from all participants. The study was approved by the institutional ethics committee of Aichi Cancer Center.

Assessment of Helicobacter pylori Infection and Gastric Atrophy
All cases were examined for plasma IgG level for H. pylori using a commercially available direct enzyme-linked immunosorbent assay kit ('E Plate "Eiken" H. pylori Antibody'; Eiken Kagaku, Tokyo, Japan). This kit is commonly used in medical studies in Japan [4,35]. A positive status for H. pylori infection was defined as an anti-H. pylori IgG antibody level >10 U/mL in serum [4,35]. Serum pepsinogens (PG) were measured by chemiluminescence enzyme immunoassay, and gastric mucosal atrophy was defined by a PG I value ≤ 70 ng/mL and a PG I/PG II ratio ≤ 3 [36,37]. We applied the ABCD classification [38,39].
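The serological rules above can be sketched as a small function. This is only an illustration: the thresholds are those stated in the text, while the mapping of the four serological combinations to groups A to D follows the commonly used definition of the ABCD classification and is an assumption here, because the text does not spell it out. The function and parameter names are hypothetical.

```python
def abcd_group(hp_igg_u_ml, pg1_ng_ml, pg2_ng_ml):
    """Assign an ABCD group from serology (illustrative sketch only).

    Thresholds taken from the text:
      H. pylori positive: anti-H. pylori IgG > 10 U/mL
      atrophy positive:   PG I <= 70 ng/mL and PG I/PG II <= 3
    Group mapping (assumed, conventional definition):
      A: Hp-, atrophy-   B: Hp+, atrophy-
      C: Hp+, atrophy+   D: Hp-, atrophy+
    """
    if pg2_ng_ml <= 0:
        raise ValueError("PG II must be positive to form the PG I/PG II ratio")
    hp_positive = hp_igg_u_ml > 10.0
    atrophy = pg1_ng_ml <= 70.0 and (pg1_ng_ml / pg2_ng_ml) <= 3.0
    if not atrophy:
        return "B" if hp_positive else "A"
    return "C" if hp_positive else "D"
```

Note that both pepsinogen conditions must hold for atrophy, so a low PG I value with a high PG I/PG II ratio is still classified as non-atrophic.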

Information on Lifestyle Risk Factors
To select lifestyle factors, we referred to Development and Evaluation of Cancer Prevention Strategies in Japan [3] and extracted risk and preventive factors for gastric cancer. In this matrix, smoking and H. pylori infection are certain risk factors, and vegetable and fruit intake are possible preventive factors for both men and women. Cereal intake (possible risk factor) and salt intake (almost certain risk factor) were omitted from the lifestyle risk factors because they cannot be estimated from our food frequency questionnaire.
Information on lifestyle factors was collected by a self-administered questionnaire. Smoking status was classified into three categories of never smoker, former smoker, and current smoker, with former smokers defined as those who had quit at least 1 year before study enrolment. Alcohol consumption status was classified into four categories: never, low, moderate, and heavy. Those who seldom or never drank were defined as never drinkers. Low drinking was defined as consumption on 4 days or fewer per week, moderate drinking as consumption of less than 46 g of ethanol on 5 days or more per week, and heavy drinking as consumption of more than 46 g ethanol on 5 days or more per week. Information on family history of gastric cancer was obtained in the two categories of yes and no regarding a history of gastric cancer in any first-degree relative. Consumption of fruits and vegetables was determined using a food frequency questionnaire, which included 43 single food items in eight frequency categories [40]. The food frequency questionnaire was validated using a 3-day weighed dietary record as standard, which showed that reproducibility and validity were satisfactory [40,41]. Participants were divided into three groups based on the distribution of fruit and vegetable consumption among controls in the derivation study (tertiles).
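The alcohol categories defined above can be written as a short decision rule. The sketch below is illustrative only: the function and parameter names are hypothetical, the ethanol amount is assumed to be grams per drinking day, and because the text does not say how exactly 46 g is classified, the sketch assigns it to the heavy category.

```python
def drinking_category(drinks, days_per_week, ethanol_g_per_day):
    """Classify alcohol consumption into never / low / moderate / heavy.

    drinks            -- False for those who seldom or never drink
    days_per_week     -- number of drinking days per week
    ethanol_g_per_day -- grams of ethanol per drinking day (assumed unit)
    """
    if not drinks:
        return "never"
    if days_per_week <= 4:
        return "low"  # 4 days or fewer per week, regardless of amount
    # 5 days or more per week: split on 46 g of ethanol.
    # Assumption: exactly 46 g is treated as heavy.
    return "heavy" if ethanol_g_per_day >= 46 else "moderate"
```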

Statistical Analysis
To create a risk prediction model, we selected established environmental and lifestyle factors of gastric cancer: smoking (never, former, and current), alcohol consumption (never, moderate, high-moderate, and heavy), energy-adjusted fruit and vegetable intake (in tertiles among controls in the derivation study), family history of gastric cancer (first-degree relative), and ABCD classification (as indicator variables for A, B, C, and D). We examined the impact of each risk factor by conditional logistic regression. Age (continuous), sex, family history of gastric cancer, and referral pattern were included as adjustment factors in the model. Subjects with an unknown status for these variables were assigned dummy variables for the missing categories and included in the analysis. To assess the specific impact of a selected factor, we estimated the odds ratios (ORs) and corresponding 95% confidence intervals (CIs) using uni- and multivariable conditional logistic regression models in the derivation study. For genetic factors, we evaluated the impact of each polymorphism by OR, 95% CI, and p-value adjusted for age and sex in both studies. These were calculated using the per-allele model of conditional logistic regression. To create risk prediction models, we selected polymorphisms with a value of p < 0.01 in the derivation study as risk predictors.
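For readers unfamiliar with conditional logistic regression, the quantity it maximizes for matched sets with one case each can be sketched in a few lines. This is a minimal illustration of the standard conditional likelihood, not the Stata procedure used in this study; the function name and data layout are hypothetical. Under the per-allele model, a SNP enters as a covariate coded 0, 1, or 2 (number of risk alleles), so that the exponential of its coefficient is the per-allele OR.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Conditional log-likelihood for matched sets with one case each.

    beta         -- list of regression coefficients
    matched_sets -- list of (case_x, controls_x), where case_x is the
                    covariate list of the case and controls_x is a list
                    of covariate lists for its matched controls
    Each set contributes  x_case.beta - log(sum_j exp(x_j.beta)),
    with j running over the case and all of its matched controls.
    """
    def linear_predictor(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for case_x, controls_x in matched_sets:
        members = [case_x] + list(controls_x)
        denom = sum(math.exp(linear_predictor(x)) for x in members)
        total += linear_predictor(case_x) - math.log(denom)
    return total
```

Because the matching variables are constant within each set, they cancel out of this likelihood, which is why matched factors are handled by the design rather than estimated as ordinary coefficients.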
Performance of the risk prediction model was assessed in both the derivation study (as "internal validation") and the validation study (as "external validation") using standard methods for measurement of discrimination and calibration [42]. Discrimination was assessed by calculating the area under the receiver-operating characteristic (ROC) curve (AUC), commonly known as the concordance (c) statistic. In the ROC curve, sensitivity is shown on the y-axis and the false positive rate on the x-axis; the diagonal line indicates random classification of cases and controls, corresponding to an AUC of 0.5. An AUC value of 1 corresponds to perfect classification, while values between 0.7 and 0.8 indicate acceptable discrimination and values above 0.8 indicate excellent discrimination [43]. The AUC values were compared using the method of DeLong et al. [44]. The calibration of the models was assessed by the Hosmer-Lemeshow goodness-of-fit statistic and calibration plots. Subjects were divided into subgroups by decile of predicted probability. The Hosmer-Lemeshow statistic is based on a χ² test that compares the observed frequencies with the predicted frequencies in the ten groups; a nonsignificant p-value indicates good calibration, whereas a significant p-value indicates disagreement between the predicted and observed outcomes. In a calibration plot, the mean predicted probability was plotted against the mean observed probability for each decile. Ideally, the predicted probability equals the observed probability, so perfect predictions should lie on the 45° line [42]. In addition, with perfect calibration, the estimated calibration slope equals 1 [45]. A slope below 1.0 reflects overfitting of the model [46], which indicates the need to shrink the regression coefficients [42].
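The two performance measures described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Stata routines used in this study: the function names are hypothetical, the AUC is computed through its equivalent definition as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, and the Hosmer-Lemeshow function returns only the statistic and the conventional degrees of freedom (number of groups minus 2), leaving the p-value to a χ² table or library. Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
def auc(case_scores, control_scores):
    """Concordance (c) statistic: share of case-control pairs in which
    the case has the higher predicted risk; ties count as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def hosmer_lemeshow(outcomes, predicted, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predicted risk.

    outcomes  -- list of 0/1 observed outcomes
    predicted -- list of predicted probabilities (same order)
    Returns (chi-square statistic, degrees of freedom = groups - 2).
    """
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        # Variance term n_g * pbar * (1 - pbar), with pbar = expected / size
        statistic += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return statistic, groups - 2
```

A model that ranks every case above every control gives `auc(...) == 1.0`, while identical scores for everyone give 0.5, matching the interpretation of the ROC diagonal in the text.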
All analyses were performed using Stata/SE 14 (Stata Corp, College Station, TX, USA).

Results
The two case-control studies were largely comparable (Table 1). The proportion of current smokers was higher in cases than controls in both (42.2% and 30.9% in the derivation study and 31.7% and 22.8% in the validation study, respectively), as was the prevalence of H. pylori infection (82.2% and 55.6% in the derivation study and 71.7% and 41.9% in the validation study, respectively). Cases were more likely to have daily fruit and vegetable consumption than controls in both studies. Alcohol consumption and family history showed no apparent difference between cases and controls. In Table 1, low fruit and vegetable intake was defined as less than 109.41 g/day.
Table 2 shows associations of selected lifestyle-related or biological factors in our prediction model, namely smoking, alcohol consumption, fruit and vegetable intake, and the ABCD classification, with gastric cancer risk. We observed a statistically significant association with each selected factor in both studies, with the exception of alcohol consumption and fruit and vegetable intake. The results of the validation study and meta-analysis are presented in Table S1.
Table 3 presents the association between 14 polymorphisms and gastric cancer risk. We selected three polymorphisms, namely rs4072037, rs2294008, and rs7849280, with values of p < 0.01, to develop a risk prediction model. The results of the validation study and meta-analysis are presented in Table S2.
Next, we assessed the performance of the prediction model (Tables 4 and 5; Figures 1 and 2). The discriminative abilities in the validation study were similar to those in the derivation study. The inclusive model provided acceptable discrimination in both the derivation and validation studies, with AUC values of 0.7677 (0.7465–0.789) and 0.7823 (0.7694–0.814), respectively (Table 4 and Figure 1). In the derivation study, the inclusive model had a statistically significantly higher discriminatory ability than the other genetic and nongenetic models (p = 4.74 × 10^−53). In the validation study, however, the environmental model was significantly more discriminative than the inclusive model, with an AUC value of 0.7925 (0.7705–0.815). The calibration analysis of the inclusive model revealed reasonably good agreement between the observed and predicted number of gastric cancer cases in groups defined by deciles of predicted risk distribution in both the derivation (p for Hosmer-Lemeshow test = 0.445) and validation studies (p = 0.116) (Table 5). Moreover, the calibration plots of the inclusive model stayed close to the ideal calibration line throughout the risk spectrum in all data sets of both studies (Figure 2), and all of their calibration slopes were close to 1.0.

Discussion
In this study, we developed a risk prediction model of gastric cancer using a combination of genetic, biological, and lifestyle-related risk factors. In the derivation study, discriminatory ability was slightly higher in the inclusive model, which consisted of genetic, biological, and lifestyle-related factors, than in the model that included only biological and lifestyle-related risk factors (environmental model). In the validation study, however, the environmental model was more discriminating than the inclusive model. The addition of genetic factors (SNPs) improved the performance of the risk prediction model only slightly, which suggests that genetic factors add little to risk prediction.

This study represents the first attempt to combine genetic, biological, and lifestyle-related risk factors in the prediction of gastric cancer risk. Several previous risk prediction models for gastric cancer were investigated in large-scale population-based cohort studies in Japan, but these did not include genetic factors. Namely, Charvat et al. developed a prediction model to estimate an individual's risk of gastric cancer in Japan using a combination of age, sex, smoking, salted food consumption, family history of gastric cancer, and the ABCD classification [12], while Iida et al. developed a model in a cohort study in Japan using a combination of age, sex, combination of anti-H. pylori antibody and atrophic gastritis, hemoglobin A1c, smoking, drinking, and obesity [13]. In addition, Cai et al. recently developed a gastric cancer risk prediction rule in China based on a combination of age, sex, PG I/II ratio, gastrin-17 level, H. pylori infection, pickled food, and fried food [47]. These models showed good performance, but did not include genetic factors.
Here, we selected MUC1-rs4072037, PSCA-rs2294008, and ABO-rs7849280 as genetic risk factors. MUC1-rs4072037 was identified in GWASs [26,27] and replicated in case-control studies [30,48] in East Asian countries. The membrane mucin MUC1 is a ligand for H. pylori in the stomach, and the SNP rs4072037 is known to determine a splicing acceptor site in the second exon of MUC1 [49]. MUC1-rs4072037 is an independent risk factor that influences tumor recurrence and disease-related death in diffuse-type gastric cancer, but not in intestinal-type gastric cancer [48]. PSCA-rs2294008 is a GWAS-identified susceptibility polymorphism for gastric cancer both in Japan and worldwide [19][20][21][22]24,25,50]. PSCA is expressed in differentiating gastric epithelial cells, shows a cell proliferation inhibitory effect in vitro, and is frequently downregulated in gastric cancer. PSCA-rs2294008 is a functional SNP that influences the transcriptional activity of the PSCA promoter; the T allele significantly suppresses its transcription activity, thus affecting susceptibility to diffuse-type gastric cancer [21]. ABO-rs7849280 was identified in a Japanese GWAS [31]. An association between blood type A and gastric cancer has been previously reported [51,52]. Tanikawa et al. revealed that the AA blood type has a higher frequency of the G allele of ABO-rs7849280 than other types. The risk G allele was associated with higher ABO mRNA expression, whereas ABO mRNA expression was significantly suppressed in H. pylori-infected stomach [31]. ABO-rs7849280 is a key regulator of host-bacterial interactions of H. pylori-related diseases and gastric cancer.
Among environmental and biological factors, the ABCD classification showed a particularly high AUC value. Consistent with previous studies, we found that H. pylori infection and gastric atrophy substantially impacted gastric cancer risk. Their contribution to risk prediction was considerable, with AUCs of 0.7354 and 0.7885 for the ABCD classification in the derivation and validation studies, respectively. Group D is negative for H. pylori infection but confers a high risk of gastric cancer. It is well known that H. pylori can no longer survive when atrophy has severely progressed or in the metaplastic intestinal mucosa induced by H. pylori infection [13], and production of anti-H. pylori antibodies in these conditions may be reduced. Therefore, although the subjects in Group D were not positive for H. pylori infection, most had been previously infected and were therefore also at high risk of developing gastric cancer, like those in Group C.
The addition of novel GWAS-identified susceptibility loci may contribute to improving the performance of risk models. To date, however, the degree of such improvement has remained unclear. For example, in Szulkin et al.'s prostate cancer risk prediction study [53], a polygenic risk score for 65 established susceptibility variants provided an area under the curve (AUC) of 0.67, and the addition of 68 new variants increased the AUC to 0.68. In a similar study of the development of polygenic risk scores for prediction of breast cancer [54], the AUC of the prospective study was 0.603 with 77 SNPs, 0.630 with 313 SNPs, and 0.636 with 3820 SNPs. These findings suggest that the addition of SNPs may improve performance, albeit only to a limited degree.
In our present study, we also investigated the effects of increasing the number of SNPs. We selected SNPs with values of p < 0.05 in the derivation study (rs2294008, rs4072037, rs7849280, rs10074991, rs2294693, rs80142782) and then used these six SNPs to construct a new genetic model. Note that rs13361707 also had a value of p < 0.05. Since both rs10074991 and rs13361707 are SNPs in PRKAA1 and are in linkage disequilibrium, we chose rs10074991, which had the smaller p-value. Results showed an improvement in the AUC of the inclusive model in both the derivation (0.7728) and validation studies (0.7871). However, even with the inclusion of these six SNPs, the AUC of the environmental model was higher than that of the inclusive model in the validation study. In addition, the Hosmer-Lemeshow test gave p = 0.018 in the validation study, indicating that good calibration was not achieved. Accordingly, although the addition of genetic factors had some positive effect on model performance, the effect was not as great as expected because of the large impact of H. pylori infection and the ABCD classification. For cancers that are significantly affected by one environmental factor, such as H. pylori in gastric cancer, the contribution of genetic risk factors to risk prediction may be limited.
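The selection rule described above (keep SNPs below a p-value threshold, then keep only the SNP with the smallest p-value among those in linkage disequilibrium) can be sketched as follows. This is an illustrative sketch, not the procedure used to produce the reported results; the function name is hypothetical, and the SNP labels and p-values in the example are made-up placeholders rather than study data.

```python
def select_snps(p_values, ld_groups, threshold=0.05):
    """Select SNPs with p < threshold, keeping one SNP per LD group.

    p_values  -- dict mapping SNP identifier to p-value
    ld_groups -- list of sets of SNP identifiers that are in linkage
                 disequilibrium with each other
    Returns the selected identifiers sorted by ascending p-value.
    """
    selected = {snp for snp, p in p_values.items() if p < threshold}
    for group in ld_groups:
        ranked = sorted((s for s in group if s in selected),
                        key=lambda s: p_values[s])
        # Keep the SNP with the smallest p-value, drop the rest of the group.
        for snp in ranked[1:]:
            selected.discard(snp)
    return sorted(selected, key=lambda s: p_values[s])


# Placeholder example (hypothetical labels and p-values, not study data):
example = select_snps(
    {"snp_a": 0.001, "snp_b": 0.02, "snp_c": 0.03, "snp_d": 0.2},
    ld_groups=[{"snp_b", "snp_c"}],
)
```

In this placeholder example, `snp_d` fails the threshold and `snp_c` is dropped in favor of `snp_b`, which has the smaller p-value within the same LD group.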
Our study has several strengths. First, it was relatively large, and information was available on genetics, as well as H. pylori infection status, serologically defined gastric atrophy, and lifestyle characteristics. This allowed us to provide reliable estimates of risk factor effects and the performance of the model. Second, the constructed model was validated in a different dataset. Third, potential confounding by age and sex was addressed by matching. Fourth, the allele frequencies of each SNP in the controls of this study were similar to those reported in HapMap JPT (available at http://www.ncbi.nlm.nih.gov/snp, accessed on 10 June 2020), supporting the comparability of our results for genetic factors with those in the general population in Japan.
Several limitations should also be noted. First, our lifestyle factors were obtained in a retrospective manner, and H. pylori/atrophy information was obtained in a cross-sectional setting. Validation of the model in prospective studies is clearly warranted, and until then, application in prospective settings requires caution. Second, H. pylori infection and gastric mucosal atrophy status were defined by serological tests. Cutoff levels for defining negativity of serum anti-H. pylori antibody titers are reported to be too high [55], and H. pylori infection status might have been wrongly classified. If so, this might have introduced status misclassification, which would nevertheless have been nondifferential. Accordingly, the impact of these factors may have been underestimated. Third, although salt intake is known as a "probable" risk factor for gastric cancer [3] and the attributable fraction of salt intake is not negligible [56], it was not considered in the study. The addition of salt information might improve model performance, and should therefore be considered for future studies. Finally, residual confounding by known and unknown factors in the model might be present.

Conclusions
We developed and validated a risk prediction model for gastric cancer using genetic, biological, and lifestyle-related factors in a Japanese population. The contribution of genetic factors to risk prediction was limited due to the large impact of the ABCD classification (H. pylori infection-related factor).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13215525/s1, Table S1: Associations of epidemiological and clinical risk factors in stomach cancer (validation study and meta-analysis), Table S2: Associations with Asian GWAS-identified susceptibility polymorphism in stomach cancer risk (validation study and meta-analysis).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data are not publicly available due to ethical and data security requirements.