Scale and Scope of Gene-Alcohol Interactions in Chronic Pancreatitis: A Systematic Review

Background: Excessive alcohol consumption has long been known to be the primary cause of chronic pancreatitis (CP) but genetic risk factors have been increasingly identified over the past 25 years. The scale and scope of gene-alcohol interactions in CP nevertheless remain unclear. Methods: All studies that had obtained genetic variant data concurrently on alcoholic CP (ACP) patients, non-ACP (NACP) patients and normal controls were collated. Employing normal controls as a common baseline, paired ORACP and ORNACP (odds ratios associated with ACP and NACP, respectively) values were calculated and used to assess gene-alcohol interactions. Results: Thirteen variants involving PRSS1, SPINK1, CTRC, CLDN2, CPA1, CEL and CTRB1-CTRB2, and varying from very rare to common, were collated. Seven variants had an ORACP > ORNACP, which was regarded as an immediate indicator of gene-alcohol interactions in CP. Variants with an ORACP < ORNACP were also found to interact with alcohol consumption by virtue of their impact on age at first pancreatitis symptoms in ACP. Conclusions: This study revealed evidence for extensive gene-alcohol interactions in CP. Our findings lend support to the hypothesis that alcohol affects the expression of genetically determined CP and highlight a predominant role of weak-effect variants in the development of ACP.


Introduction
Although excessive alcohol consumption has long been known to be the primary cause of chronic pancreatitis (CP), its contribution to the etiology and development of pancreatitis remains poorly understood in many respects [1][2][3][4]. Since the discovery of the first CP-causing variant, namely c.365G>A (p.Arg122His) in the PRSS1 gene (MIM# 276000; encoding cationic trypsinogen) [5], a new puzzle has emerged pertaining to whether or how a particular genetic risk factor interacts with alcohol consumption to cause CP. For example, one of the most extensively studied genetic risk factors, c.101A>G (p.Asn34Ser) in the SPINK1 gene (MIM# 167790; encoding pancreatic-specific trypsin inhibitor (PSTI)) [6], was found to be overrepresented in alcoholic CP (ACP) patients; however, its detection rate in ACP patients was actually lower than that in non-ACP (NACP) patients [7][8][9], making it difficult to discern any gene-environment interactions (G×E) that might exist. Similar situations include, among others, a much lower detection rate of rare pathogenic CTRC (MIM# 601405; encoding chymotrypsin C) or CPA1 (MIM# 114850; encoding carboxypeptidase A1) variants in ACP patients than in NACP patients [10,11].
An interaction between alcohol consumption and the rs10273639-tagging common PRSS1-PRSS2 haplotype had however been suggested, firstly by the observation that the risk rs10273639C allele was found more frequently in ACP patients than in NACP patients and secondly by a case-only (i.e., ACP vs. NACP) analysis [12]. Recently, by fitting a more sophisticated model and by combining data from several studies, we provided compelling evidence for a synergistic interaction between the common PRSS1-PRSS2 haplotype and alcohol consumption [13]. PRSS1 was the first gene discovered to be responsible for CP [5]. Most pathogenic PRSS1 variants cause or predispose to CP by promoting increased activation or expression of trypsinogen [14], as exemplified by the p.Arg122His missense variant and trypsinogen gene duplication and triplication copy number variants [5,15,16]. PRSS2 encodes anionic trypsinogen (MIM# 601564), the second major trypsinogen isoform after cationic trypsinogen. Multiple lines of evidence support the involvement of PRSS2 in CP (see Herzig et al. [13] and references therein), although no pathogenic missense variants in the gene have been reported to date [14].
In addition to gain-of-function PRSS1 missense and copy number variants, loss-of-function variants in the SPINK1 gene [6] and CTRC gene [10,17] also predispose to CP by bringing about a gain of trypsin activity. Specifically, loss-of-function SPINK1 variants exert their effect by reducing the capability of PSTI-trypsin binding (see Szabo et al. [18] and references therein) whereas most loss-of-function CTRC variants increase trypsin activity by impairing (protective) trypsinogen degradation [10,19]. Together, genetic studies of the three genes have led to the identification of a trypsin-dependent pathological pathway in CP [20].
There is a pronounced dosage effect of the aforementioned rs10273639C risk allele on PRSS1/PRSS2 mRNA expression in human pancreatic tissue [12,13]. Given that uncontrolled trypsin expression/activity is central to CP pathogenesis [5,12,15,21–25], the synergistic interaction revealed between the common PRSS1-PRSS2 haplotype and alcohol consumption [13] may serve as an important reference for unravelling the underlying mechanisms responsible for the G×E interactions in CP.
The three trypsin-dependent pathway genes (PRSS1, SPINK1 and CTRC) are among the most extensively studied CP genes in the context of NACP as well as ACP. We propose that a systematic review and meta-analysis of the currently available genetic data, starting from a comparison of the risk (in terms of odds ratio (OR)) conferred upon ACP (termed OR ACP ) by known pathogenic variants in the trypsin-dependent pathway genes (as well as other CP susceptibility genes) with the corresponding risk conferred to NACP (termed OR NACP ), could shed new light on the scale and scope of G×E interactions in CP. This study represents just such an attempt.

Disease Definitions
ACP was defined here as in the original publications despite slight differences in terms of the amount and duration of alcohol consumed as well as the ethnicity of the studied cohorts. For example, in most studies, ACP was attributed in relation to an alcohol intake of ≥80 g/d for a male and 60 g/d for a female for at least two years, although neither the amount nor the duration of alcohol consumption was specified in some studies. NACP and idiopathic chronic pancreatitis (ICP) were used here interchangeably. In principle, NACP was diagnosed in CP patients who were not known to have a positive family history and whose disease was not caused by known etiologies such as heavy drinking, autoimmune or obstructive factors.

Research Strategy
The outline of the research strategy is provided in Figure 1A. The key step was to generate paired OR ACP and OR NACP values for the previously reported variants in the three trypsin-dependent pathway genes as well as several other CP susceptibility genes. This was achieved through the re-analysis of available studies that had obtained variant data concurrently on ACP patients, NACP patients and normal controls, with normal controls being employed as a common baseline to calculate the respective ORs. In parallel to this main analysis, and in the context of the three trypsin-dependent pathway genes, we also compared the risk allele frequencies of informative variants in simultaneously analyzed normal and alcoholic controls. Alcoholic controls denote either subjects with alcohol dependence but without chronic pancreatitis or subjects with alcohol-associated liver cirrhosis. Normal controls denote healthy or population controls. The resulting paired OR ACP and OR NACP values were used firstly to evaluate the scale of G×E interactions in CP. Thus, if a variant had an OR ACP > OR NACP , it was interpreted as denoting an interaction with alcohol consumption. When variants exhibited an OR ACP < OR NACP , a further literature search was performed with a view to establishing their impact on the age at symptom onset in ACP: if a given variant was shown to accelerate age at symptom onset in variant-positive ACP patients compared to variant-negative ACP patients, this was interpreted as denoting an interaction with alcohol consumption.
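The two-step interpretation rule described above can be written down as a short sketch. This is only an illustration in Python, not code from the study: the function name, the argument names and the returned labels are hypothetical, and the treatment of equal ORs (not addressed in the text) is an assumption.

```python
from typing import Optional


def interpret_gxe(or_acp: float, or_nacp: float,
                  earlier_onset_in_carriers: Optional[bool] = None) -> str:
    """Apply the two-step rule to one variant (hypothetical helper).

    or_acp / or_nacp: odds ratios versus normal controls.
    earlier_onset_in_carriers: True if the literature shows an earlier age at
    symptom onset in variant-positive versus variant-negative ACP patients,
    False if it does not, None if no informative report was found.
    """
    # Step 1: a higher OR in ACP than in NACP is read directly as interaction.
    if or_acp > or_nacp:
        return "interaction (OR ACP > OR NACP)"
    # Step 2: otherwise fall back on age-at-onset evidence.
    # Assumption: equal ORs are handled like OR ACP < OR NACP.
    if earlier_onset_in_carriers:
        return "interaction (earlier symptom onset in ACP)"
    return "no evidence of interaction from these criteria"
```

Note that the last outcome means only that neither criterion was met; it is not evidence that an interaction is absent.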

Literature Searches and Selection Processes
Literature searches, performed with the intention of generating paired OR ACP and OR NACP values in the context of the PRSS1, SPINK1 and CTRC genes, respectively, employed keyword queries in "all fields" of PubMed. They were run from August 2020 to November 2020 and were frozen on 20 November 2020. For each gene, different sets of keyword queries were first run, with the results from the different queries being combined to generate a single list of unique publications (for details, see Supplementary Figure S1). All keyword search-derived publications were manually reviewed; only those that described original and extractable risk allele frequency distribution data simultaneously from ACP patients, NACP patients and normal controls were retained for analysis. In cases of overlapping informative studies from the same group, the latest one was used for analysis. In the context of additional CP genes, only the first disease association report that included variant data from ACP, NACP and normal controls was used for analysis whenever possible; subsequent studies were briefly discussed regarding confirmation (or otherwise) of the first association report.

Data Extraction
All relevant data were manually extracted from the included studies.

Statistical Analysis
For a given variant (or aggregated variants), and in the case of single study-derived data, the OR ACP or OR NACP, the associated 95% confidence interval (CI), and the related χ² test of significance of the difference between risk allele frequencies were calculated from the appropriate 2-by-2 contingency table in R [26]. A difference was regarded as being statistically significant when the p value was ≤0.05. If, for a given variant, data were available from ≥2 studies, the variant was subjected to meta-analysis with respect to the calculation of its OR, 95% CI and p values. Meta-analysis and heterogeneity analysis were performed, and forest plots and funnel plots generated, using Review Manager 5.3 software [27]. The Mantel-Haenszel fixed-effect model was used to compute the pooled OR in the absence of statistical heterogeneity; otherwise, the Mantel-Haenszel random-effect model was used. Heterogeneity was considered to be significant when the p value for the test of heterogeneity was <0.05 or I² was 50% or more [27,28]. Funnel plots were generated only when the number of included studies that generated estimable ORs was ≥10 [27,29].
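The calculations described above can be illustrated with a minimal sketch. The study itself used R and Review Manager 5.3; the Python below is only an illustrative re-expression. The function names, the log-OR (Woolf) confidence interval and the uncorrected Pearson χ² are assumptions, since the text does not state which interval method or continuity correction was applied, and the sketch does not handle tables containing a zero cell.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR and 95% CI from a 2-by-2 table of allele counts.

    a, b: risk / non-risk alleles in patients (ACP or NACP)
    c, d: risk / non-risk alleles in normal controls
    Assumption: Woolf (log-OR) interval; all four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


def chi2_test(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def mantel_haenszel_or(tables):
    """Mantel-Haenszel fixed-effect pooled OR over (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical allele counts, for illustration only (not data from the study).
table = (20, 180, 10, 190)
or_, lower, upper = odds_ratio_ci(*table)
chi2, p = chi2_test(*table)
pooled = mantel_haenszel_or([table, table])
```

With a single table, or with identical tables, the Mantel-Haenszel estimate reduces to the crude OR, which gives a quick sanity check on the pooling step.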
This study essentially followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [30]. The study was not registered in PROSPERO (an international database of prospectively registered systematic reviews), which does not allow the registration of already initiated studies [31]. It should be noted here that the validity of this study was not affected in any way by its non-registration in PROSPERO; all details of the study were fully provided.

Variant Classification in Accordance with Allele Frequency
Variants of interest were classified as very rare (<0.001), rare (from 0.001 to <0.005), low frequency (from 0.005 to 0.05) or common (>0.05) in accordance with their allele frequencies in the studied normal controls, following the definitions of Manolio and colleagues [32].
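The frequency bins above translate into a simple lookup, sketched here in Python. The function name is hypothetical; the boundaries are taken from the text as written, so that 0.05 itself falls in the low-frequency class.

```python
def classify_by_allele_frequency(af: float) -> str:
    """Assign a variant to a frequency class from its allele frequency
    in normal controls (hypothetical helper; bins as stated in the text)."""
    if not 0.0 <= af <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if af < 0.001:
        return "very rare"
    if af < 0.005:
        return "rare"
    if af <= 0.05:
        return "low frequency"
    return "common"
```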

Variants Included for Analysis
In the context of the three trypsin-dependent pathway genes, the keyword searches yielded 260 PRSS1-related, 391 SPINK1-related and 76 CTRC-related publications (Supplementary Figure S1). Manual review of these publications in accordance with the criteria specified in Figure 1B identified the PRSS1-related (Supplementary Table S1), 17 SPINK1-related (Supplementary Table S2) and 5 CTRC-related (Supplementary Table S3) studies that were then used for analysis. In the end, three distinct PRSS1 variants, two distinct SPINK1 variants, three distinct CTRC variants as well as one aggregate CTRC variant (all rare/very rare pathogenic variants in exons 2, 3 and 7) were retained for analysis (Table 1). It should be noted here that CTRC c.180C>T (p.Gly60Gly) was counted as two distinct variants; it was a common variant in the European and American populations [17,33] but rare in the Chinese population [34]. See Supplementary Results, Supplementary Figures S2-S5 and Supplementary Tables S1-S4 for details about how these variants were selected and how their respective OR values were calculated. There were four other CP genes/loci for which the first disease association report concurrently analyzed ACP patients, NACP patients and normal controls, yielding three distinct informative variants (i.e., CLDN2 rs7057398, CEL-HYB1 and CTRB1-CTRB2 rs8055167) and one aggregate informative variant (i.e., all CPA1 variants with apparent activity <20%) (Table 2). A detailed description of these four variants is provided in the Supplementary Results.
In summary, a total of 13 variants were included for analysis.

Evidence Suggesting Extensive G×E Interactions in CP
The paired OR ACP and OR NACP values of the 13 included variants, either newly calculated here or directly taken from the corresponding original reports, are provided in Tables 1 and 2. Seven variants had an OR ACP > OR NACP (5 in Table 1 and 2 in Table 2), suggesting the presence of G×E interactions between the respective variants and alcohol consumption. Three of the seven variants, namely PRSS1 rs10273639C, SPINK1 c.101A>G (p.Asn34Ser) and aggregate pathogenic rare/very rare variants in exons 2, 3 and 7 of the CTRC gene, were also concurrently analyzed in alcoholic controls and normal controls (see Supplementary Results, Supplementary Figure S6 and Supplementary Table S5 for details about how these variants were selected). None of them showed a significant difference in risk allele frequency between the two control datasets (Table 3).
As for the 6 variants that had an OR ACP < OR NACP (4 in Table 1 and 2 in Table 2), we searched for reports in which the age at symptom onset was informative in variant-positive and -negative ACP patients. Four reports were found to be informative with respect to distinct variants. Three of them [35][36][37] reported no significant difference in age at symptom onset between SPINK1 c.101A>G (p.Asn34Ser)-positive and -negative ACP patients; the remaining one [38] reported no significant difference with respect to two distinct CTRC variants, p.Arg254Trp and c.738_761del. However, all these studies were limited by sample size (at most 124 patients were analyzed). We also found two reports that were informative for aggregate variants. Using the Kaplan-Meier model, Zou and colleagues showed that aggregate pathogenic genotypes involving the PRSS1, SPINK1, CTRC and/or CFTR (MIM# 602421; encoding cystic fibrosis transmembrane conductance regulator) genes were associated with a significant acceleration in age at symptom onset in ACP (p < 0.001; see Supplementary Figure S5 in the original report [34]) (NB, the pathogenic genotypes disproportionately involved PRSS1 and SPINK1 variants of strong genetic effect as defined below). The recent Lewis study demonstrated that the median age at symptom onset in ACP patients (defined as an alcohol consumption of >4 drinks per day) with a pathogenic SPINK1 variant was 39 years whereas that in SPINK1 variant-negative ACP patients was 45 years, although the difference did not achieve statistical significance [39]. Here, it is pertinent to note that Lewis and colleagues also analyzed a subgroup of CP patients termed "light to moderate drinkers" (defined as an alcohol consumption of ≤4 drinks per day) and found a statistically significantly younger median age at symptom onset in SPINK1 variant-positive patients as compared to SPINK1 variant-negative patients (24 vs. 50 years). Details of the pathogenic SPINK1 variants were not made available in this latter report, but most, if not all, of them should fall into the category of variants of strong genetic effect as defined below. Taken together, these findings supported the presence of G×E interactions in cases wherein a G×E interaction was not immediately evident from a direct comparison of paired OR ACP and OR NACP values.

Inter- and Intra-Variant Comparison of the Paired OR ACP and OR NACP Values Revealed a Dichotomized Genetic Effect
To better understand the ground rules for G×E interactions in CP, we next performed an inter- and intra-variant comparison of the paired OR ACP and OR NACP values. Using OR NACP as the baseline genetic effect for comparison, we observed a clear and consistent pattern of genetic effect-dependent dichotomization: thus, variants with OR NACP ranging from 1.09 to 4.72 were invariably found more frequently in ACP than in NACP, whereas variants with OR NACP ranging from 5.20 to 59.31 were invariably found more frequently in NACP than in ACP, with an OR NACP value of ~5 appearing to be the threshold (Figure 2). Based upon this observation, and by reference to the gene effect classifications by Manolio and colleagues [32], we defined here an OR NACP value of <5 as a weak genetic effect and an OR NACP value of ≥5 as a strong genetic effect. In other words, variants with a strong genetic effect were found more frequently in NACP than in ACP, whereas variants with a weak genetic effect were found more frequently in ACP than in NACP.
Figure 2. [...] Tables 1 and 2, going from the smallest to the largest. ACP, alcoholic chronic pancreatitis; NACP, non-alcoholic chronic pancreatitis; OR ACP, odds ratio associated with ACP; OR NACP, odds ratio associated with NACP.
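The dichotomization by OR NACP can be expressed as a small sketch, assuming the threshold of 5 defined in this section. The function names are hypothetical, and the four OR NACP values used in the example are the range endpoints quoted in the text rather than a full re-analysis of Tables 1 and 2.

```python
def classify_genetic_effect(or_nacp: float, threshold: float = 5.0) -> str:
    """Weak effect if OR NACP < threshold, strong effect if OR NACP >= threshold."""
    return "strong" if or_nacp >= threshold else "weak"


def expected_enrichment(or_nacp: float) -> str:
    """Patient group in which the variant is expected to be more frequent,
    following the observed pattern (weak -> ACP, strong -> NACP)."""
    return "NACP" if classify_genetic_effect(or_nacp) == "strong" else "ACP"


# Endpoints of the two OR NACP ranges reported in the text.
for value in (1.09, 4.72, 5.20, 59.31):
    print(value, classify_genetic_effect(value), expected_enrichment(value))
```

The threshold is kept as a parameter because the text describes the value of about 5 as an apparent, empirically observed cut-off rather than a fixed constant.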

Discussion
Prompted by findings from our recent meta- and re-analyses [13], we have herein embarked on a new analysis of published data with a view to being able to ascertain the scale and basic ground rules of G×E interactions in CP. The premise behind this attempt was that the wealth of genetic and clinical data which has accumulated over the past decades might harbor valuable information that could potentiate epidemiological exploration.
Mindful of the differences in disease definition between different studies in terms of the amount of alcohol consumed and the duration of heavy drinking [34], we opted to use only studies that analyzed variant(s) of interest concurrently in ACP patients, NACP patients and normal controls, with normal controls being employed as a common baseline to allow the simultaneous calculation of OR ACP and OR NACP values (Figure 1). A higher OR ACP than OR NACP , which was observed in 7 of the 13 re-analyzed variants, was regarded as an immediate indicator of G×E interactions in CP. Herein, it should be emphasized that we also collated variants that were concurrently analyzed in alcoholic and normal controls in the context of the three trypsin-dependent pathway genes. The respective risk allele frequencies of the three resulting variants, which collectively involved the PRSS1, SPINK1 and CTRC genes and whose allele frequencies in normal controls varied from rare to common, were remarkably similar between the two control datasets (Table 3). This allowed us to formally exclude the (albeit inherently unlikely) possibility that a higher OR in ACP than in NACP was actually due to an intrinsic association of the corresponding variant with alcohol consumption. As for variants with a higher OR NACP than OR ACP , we also found strong evidence to support the presence of G×E interactions; multiple such variants impacted on age at first pancreatitis symptoms in ACP.
The extensive scale of G×E interactions in CP has received strong support from a mechanistic standpoint. Except for the PRSS1 c.623G>C (p.Gly208Ala) variant, which may involve mutation-induced misfolding and consequent endoplasmic reticulum stress owing to its negative effect on protein secretion [43], all the distinct PRSS1, SPINK1 and CTRC variants (or a cis-linked variant in cases such as PRSS1 rs10273639 [44] and SPINK1 c.101A>G (p.Asn34Ser) [45]) listed in Table 1 are thought to predispose to CP through their action on the trypsin-dependent pathway of pathology [20]. The recently discovered CTRB1-CTRB2 inversion polymorphism, tagged by rs8055167 (Table 2), affects protective trypsinogen degradation and can therefore also be assigned to the trypsin-dependent pathway [42]. As mentioned earlier, there is a pronounced dosage effect of the rs10273639C risk allele on PRSS1/PRSS2 mRNA expression in human pancreatic tissue [12,13], and a synergistic interaction between the risk allele and alcohol consumption has been formally demonstrated [13]. Moreover, and most importantly, mice that express PRSS1 Arg122His developed more severe pancreatitis after ethanol feeding [23].
The causal variant underlying the association with CP at the CLDN2 locus remains to be identified, although the risk genotypes have been linked to ectopic expression of CLDN2 in pancreatic acinar cells [12]. Of the functionally defective CPA1 variants, the most frequently found variant in NACP patients, c.768C>G (p.Asn256Lys) [11], causes digestive enzyme misfolding and CP in mice [46]. The CEL-HYB1 allele [41] also appears to cause proteotoxic misfolding owing to missense variants present in the breakpoint junction region of the allele [47]. It should be remembered that the PRSS1 c.623G>C (p.Gly208Ala) variant may also involve a misfolding pathway that is distinct from the trypsin-dependent pathway [43]. Indeed, the misfolding-dependent pathway is now recognized as an independent mechanism underlying CP [48]. Recently, ethanol feeding was found to accelerate pancreatitis progression in CPA1 Asn256Lys mutant mice [49].
Despite the heterogeneity in relation both to the genes involved and the risk allele frequencies in normal controls, the re-analyzed variants exhibited a clear and consistent pattern of genetic effect-dependent dichotomization (Figure 2). Here, one may argue that of the seven variants with an OR ACP > OR NACP, a statistically significant G×E interaction was demonstrated only for rs10273639C [13] and CTRC c.180C>T (p.Gly60Gly) in the context of the LaRusch study [33], as evidenced by the non-overlapping 95% CIs (Table 1). By contrast, the other five variants with an OR ACP > OR NACP invariably showed overlapping 95% CIs. Given the clear and consistent pattern of genetic effect-dependent dichotomization exhibited by all analyzed variants (Figure 2), we assume that the absence of statistical significance in most cases was due to a lack of statistical power. Moreover, even in the two cases showing a statistically significant G×E interaction, the absolute differences between the respective OR ACP and OR NACP values were small. Importantly, this pattern of consistent but subtle differences accords with a hypothesis about G×E interactions in CP that was originally postulated by Lankisch and colleagues 20 years ago in the absence of genetic data; having simply evaluated the impact of alcohol intake on the clinical course of CP, they presciently proposed that common underlying genetic defects exist as the basis of most CP, with environmental factors such as alcohol consumption influencing the expression of the disease [50]. Therefore, it appears that whilst G×E interactions in CP are extensive, they are limited in degree. Additionally, the genetic effect-dependent dichotomization of the re-analyzed variants clearly demonstrates that, as compared to NACP, ACP is more often associated with variants of weak genetic effect.
An additional observation merits special mention. Variants having strong effects (i.e., OR NACP of ≥5) tend to be rare/very rare whereas variants having weak effects (i.e., OR NACP of <5) tend to be common. Consistent with this general trend, CTRC c.180C>T (p.Gly60Gly) had an OR NACP of 9.01 when it was a rare variant (in the Chinese population) [34] but had an OR NACP of 1.17 when it was a common variant (in the North American population) [33] (Table 1). This is consistent with our increasing appreciation of ethnic differences in terms of genetic susceptibility to CP [51][52][53] and highlights the importance of the use of ethnically matched controls for genetic association and replication studies.
Our study has several limitations. First and foremost, this study was by no means comprehensive. For example, the CFTR gene was not included in the analysis in part due to its extremely high allelic heterogeneity. Secondly, as for the included variants, sample size was quite small in several instances, and this was perhaps the cause of several nonsignificant associations with ACP or NACP. Thirdly, we would like to reiterate that there were inter-study differences in terms of the ACP and NACP definitions employed although we consider it unlikely that these limitations would have affected the main conclusions drawn owing to the use of paired data in each included study. Finally, it is important to point out that we had to limit our G×E interaction analysis to the ACP context since genetic data on a subgroup of CP patients termed "light to moderate drinkers" [39] have only just begun to emerge. Nonetheless, we surmise that it may be possible to extrapolate from general principles derived from the study of ACP to gene-alcohol interactions in CP as a whole.

Conclusions
By means of a systematic review, we have uncovered evidence for extensive albeit limited G×E interactions in CP and demonstrated a clear and consistent pattern of genetic effect-dependent dichotomization of the re-analyzed variants. Our findings lend strong support to the hypothesis that alcohol affects the expression of genetically determined CP and highlight a predominant role of variants with a weak genetic effect in the development of ACP. This study also strengthens our appreciation of CP as a multigenic and multifactorial disease and highlights the challenges ahead in terms of variant detection, risk assessment and disease prevention in the age of personalized medicine.

Supplementary Materials
Figure S1: Sets and outcomes of the keyword search in "All Fields" of PubMed with respect to PRSS1-, SPINK1- and CTRC-related publications (frozen on 20 November 2020); Figure S2: Meta-analysis of the association between the risk allele of PRSS1 c.365G>A (p.Arg122His) and alcoholic chronic pancreatitis (A) or non-alcoholic chronic pancreatitis (B); Figure S3: Meta-analysis of the association between the risk allele of SPINK1 c.101A>G (p.Asn34Ser) and alcoholic chronic pancreatitis (A) or non-alcoholic chronic pancreatitis (B); Figure S4: Funnel plots corresponding to data presented in Supplementary Figure S3; Figure S5: Meta-analysis of the association between the risk allele of CTRC c.760C>T (p.Arg254Trp) and alcoholic chronic pancreatitis (A) or non-alcoholic chronic pancreatitis (B); Figure S6: Comparison of the risk allele frequencies of SPINK1 c.101A>G (p.Asn34Ser) in alcoholic controls (Experimental) and normal controls (Control) by means of meta-analysis; Table S1: PRSS1-related studies used for analysis; Table S2: SPINK1-related studies used for analysis; Table S3: CTRC-related studies used for analysis; Table S4: Pathogenic variants in exons 2, 3 and 7 of the CTRC gene in German ACP patients, NACP patients and normal controls; Table S5: Pathogenic variants in exons 2, 3 and 7 of the CTRC gene in German alcoholic controls and normal controls.