Genetic Contribution to Alcohol Dependence: Investigation of a Heterogeneous German Sample of Individuals with Alcohol Dependence, Chronic Alcoholic Pancreatitis, and Alcohol-Related Cirrhosis

The present study investigated the genetic contribution to alcohol dependence (AD) using genome-wide association data from three German samples. These comprised patients with: (i) AD; (ii) chronic alcoholic pancreatitis (ACP); and (iii) alcohol-related liver cirrhosis (ALC). Single marker, gene-based, and pathway analyses were conducted. A significant association was detected for the ADH1B locus in a gene-based approach (p_uncorrected = 1.2 × 10⁻⁶; p_corrected = 0.020). This was driven by the AD subsample. No association with ADH1B was found in the combined ACP + ALC sample. On first inspection, this seems surprising, since ADH1B is a robustly replicated risk gene for AD and may therefore be expected to be associated also with subgroups of AD patients. The negative finding in the ACP + ALC sample, however, may reflect genetic stratification as well as random fluctuation of allele frequencies in the cases and controls, demonstrating the importance of large samples in which the phenotype is well assessed.


Introduction
Genetic influences play a major role in the development of alcohol use disorders, as formal genetic studies in twins and epidemiological samples have shown [1][2][3]. Candidate gene studies and genome-wide association studies (GWAS) have identified numerous candidate genes for alcohol dependence (AD) and alcohol consumption (AC).
The advantage of GWAS, which screen the entire genome with millions of variants, is that they facilitate gene identification in novel biological contexts. Indeed, recent successes have been achieved for complex traits with low heritability, such as depression [7], in large samples comprising more than one hundred thousand individuals. Previous GWAS of AD have been conducted in smaller samples [8][9][10]. Nevertheless, they have most consistently identified variants in the ADH gene cluster. The largest GWAS of AC to date, which included more than one hundred thousand individuals from the UK Biobank and identified ten loci reaching genome-wide significance, also showed its strongest finding in the ADH1B/ADH1C region (rs145452708, p = 8.93 × 10⁻²⁹) [11]. Additionally, earlier GWAS of AC identified rs1229984 and other variants at the chr4q22/q23 region in/near the ADH gene cluster [12][13][14].
A promising approach to mitigating the burden of multiple testing, which limits the single-marker approach, is to analyze the aggregated contribution of variants in single genes and in functionally related gene groups (e.g., biological pathways), under the assumption that these contain a large number of variants with a disruptive influence on gene/pathway function.
In the present study, multi-marker analyses were performed in order to detect new genes and pathways for AD and/or to confirm previously reported genes and pathways. To increase sample size and thereby maximize statistical power, individuals with alcohol-related somatic disease were also included. Gene-wide significance for the ADH1B gene was detected in the combined sample. However, the findings demonstrated that very large sample sizes are warranted to overcome heterogeneity and/or random genetic fluctuation.
The study was approved by Ethics Committee II of the Medical Faculty Mannheim of Heidelberg University (study number 2012-361N-MA), and was carried out in accordance with the Declaration of Helsinki. All subjects provided written informed consent prior to inclusion.
Alcohol dependence (AD) case-control subsample: A detailed description of the sample is provided elsewhere [9]. All patients were of self-reported German ancestry, and fulfilled Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) criteria for AD [18]. The patients were recruited from consecutive admissions to psychiatric units at university hospitals participating in the German Addiction Research Network [19]. These five study centers are located in the following areas of central and southern Germany: Mannheim, Bonn/Essen/Düsseldorf/Homburg, Regensburg, Munich, and Mainz. Controls were drawn from the following three population-based epidemiological cohorts: (i) KORA-gen [20]; (ii) PopGen [21]; and (iii) HNR [17]. Further controls were drawn at random from a Munich community sample screened using the Composite International Diagnostic Interview. The AD case-control subsample is part of the Psychiatric Genomics Consortium (PGC) [22]. The GWAS data from the samples can be made available within the context of research collaborations.
Chronic alcoholic pancreatitis (ACP) patients: A diagnosis of ACP was assigned in patients with a history of ≥2 years ingestion of ≥80 g alcohol/day (men), or ≥60 g/day (women). Most patients exceeded these cut-offs for level and/or duration. The cohort included patients from a number of European countries [15]. However, only German ACP patients were included in the present analyses. These individuals were recruited in Berlin, Dresden, Erlangen, Heidelberg, Greifswald, Leipzig, Magdeburg, Mannheim, and Munich.
Alcohol-related liver cirrhosis (ALC) patients: A detailed description of the criteria used to define case status is provided in [16]. ALC patients presented with clinically-diagnosed, or biopsy-confirmed, cirrhosis and a ≥10-year history of a past and/or present alcohol consumption level of ≥80 g/day (men), or ≥60 g/day (women). In all cases, other causes of cirrhosis were excluded. ALC cases were recruited from university hospital departments of hepatology and gastroenterology in: (i) Germany (Bonn, Regensburg, Dresden, Leipzig, Kiel, Frankfurt); (ii) Austria (Salzburg); and (iii) Switzerland (Bern).
Controls for ACP and ALC patients: Controls were drawn from the HNR study [17].
ACP patient subsample, ALC patient subsample, and control sample for ACP and ALC patients: The individuals were genotyped using the Illumina Omni Express BeadChip.

Quality Control
Quality control (QC) and single marker association testing were performed using PLINK v1.9 [23]. Prior to QC, the genotype data of the three samples were merged. QC of the merged data was performed in accordance with the protocol of the Schizophrenia Working Group of the Psychiatric Genomics Consortium [24]. The analyses were restricted to autosomal single nucleotide polymorphisms (SNPs) only. In the combined sample, genetic outliers were identified using principal component analysis (PCA). Outlier status was defined as the presence of data points located more than 6 standard deviations from the mean on any of the first 20 principal components. The respective individuals were excluded from further analysis. After the removal of PCA outliers, the first and second principal components showed a nominally significant association with AD, and were included as covariates in all association analyses. As the present sample was comparatively small, a stringent Hardy-Weinberg Equilibrium (HWE) test cutoff of p > 0.05 was applied to the controls of the combined sample in order to optimize the quality of the genotyping clusters for the purposes of multimarker analysis. A total of 6525 (out of 6894) individuals and 257,866 variants passed all filters. Of the 6525 individuals, 2841 were cases (1331 AD patients, 1110 ACP patients, 400 ALC patients), and 3684 were controls (1934 controls from the AD subsample, and 1750 controls for the combined ACP and ALC patient sample). After linkage disequilibrium (LD)-based pruning (Variance Inflation Factor of 10), a total of 194,024 SNPs remained for the combined sample and subsample analyses.
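The outlier rule described above (more than 6 standard deviations from the mean on any of the first 20 principal components) can be sketched as follows. This is a minimal illustration in plain Python, not the pipeline used in the study; the function name `flag_pca_outliers` and the simulated scores are assumptions made for the example.

```python
import random
import statistics

def flag_pca_outliers(pcs, n_components=20, sd_cutoff=6.0):
    """Flag samples lying more than sd_cutoff SDs from the mean on any PC.

    pcs: list of per-sample lists of principal component scores.
    Returns a list of booleans (True = outlier), one per sample.
    """
    n_components = min(n_components, len(pcs[0]))
    outlier = [False] * len(pcs)
    for k in range(n_components):
        column = [row[k] for row in pcs]
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        for i, value in enumerate(column):
            if abs(value - mean) > sd_cutoff * sd:
                outlier[i] = True
    return outlier

# Hypothetical data: 500 simulated samples, one of them given an extreme score.
rng = random.Random(0)
pcs = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(500)]
pcs[0][3] = 50.0  # artificial outlier on the fourth component
flags = flag_pca_outliers(pcs)
kept = [row for row, is_out in zip(pcs, flags) if not is_out]
```

In this sketch the mean and standard deviation are computed once, with the outliers still included. Whether the study recomputed the components after exclusion is not stated in the text.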

Statistical Analysis
The single marker analysis was conducted using PLINK v1.9 [23]. This involved logistic regression, an additive model of inheritance and correction for population stratification by including the first two principal components as covariates. Gene-based and pathway-based analysis was conducted using MAGMA v1.04 [25]. For the pathway analysis, output files from the gene-based analysis were used as input. SNPs were assigned to a gene if the variant was located within the gene sequence or within 20 kb of the transcript. If a variant was located within a region shared by more than one gene, the variant was assigned to all of the respective genes. Version 5.1 of the Reactome database set was retrieved from the Molecular Signatures Database [26]. The Reactome v5.1 set comprises 674 pathways.
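The SNP-to-gene assignment rule (within the gene or within 20 kb of the transcript, with shared variants assigned to every overlapping gene) can be illustrated with a short sketch. It is not MAGMA's own annotation code; the function name, gene names, and coordinates below are hypothetical and chosen only to show the overlap behaviour.

```python
from collections import defaultdict

WINDOW_BP = 20_000  # +/- 20 kb flank, as stated in the text

def assign_snps_to_genes(snps, genes, window=WINDOW_BP):
    """Map each gene to the SNPs falling inside its flanked interval.

    snps:  {snp_id: (chromosome, position)}
    genes: {gene_name: (chromosome, start, end)}
    A SNP inside the windows of several genes is assigned to all of them.
    """
    assignment = defaultdict(list)
    for snp_id, (chrom, pos) in snps.items():
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - window <= pos <= end + window:
                assignment[gene].append(snp_id)
    return dict(assignment)

# Hypothetical toy coordinates (not real gene positions).
genes = {"GENE_A": ("4", 100_000, 120_000), "GENE_B": ("4", 130_000, 150_000)}
snps = {
    "snp1": ("4", 125_000),  # between the genes, inside both windows
    "snp2": ("4", 90_000),   # inside the GENE_A window only
    "snp3": ("4", 200_000),  # outside both windows
    "snp4": ("5", 125_000),  # different chromosome
}
mapping = assign_snps_to_genes(snps, genes)
```

Here `snp1` contributes to both genes, which mirrors how a variant located between two neighbouring genes can drive the gene-based p-values of both.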
For the post hoc analysis, genotypes for the ADH variant rs1789891 were counted using the "hardy" option in PLINK. Deviation from HWE (exact test) was calculated using the DeFinetti program [27]. SNAP [28] was used to generate the regional association plot. The database dbSNP [29] was used to retrieve the nucleotide triplet for the amino acid exchange.
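For readers unfamiliar with HWE testing on genotype counts, the following sketch shows the general idea. Note that it substitutes the simpler Pearson chi-square goodness-of-fit test (1 degree of freedom) for the exact test used in the study, so p-values will differ somewhat, particularly for rare alleles. The function name and the example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes the three genotype counts and returns (chi2, p_value).
    This is an approximation, not the exact test cited in the text.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: one sample in perfect HWE, one with no heterozygotes.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```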
In the gene-based approach, only one finding achieved genome-wide significance. This was the association with alcohol dehydrogenase 1B (ADH1B) (p_uncorrected = 1.2 × 10⁻⁶; p_corrected = 0.020; Table 1). The Bonferroni corrected gene-based genome-wide significance threshold was 2.9 × 10⁻⁶ (0.05/16,853 genes). In the combined genotype data, 16,853 genes were represented. Table 2 shows the variants of the ADH1B locus that were included in the gene-based analysis. This association was driven by the AD subsample (AD: p_uncorrected = 6.5 × 10⁻¹⁰; ACP + ALC: p_uncorrected = 0.69).
Table 1. Results of the gene-based analysis in the combined sample, and the respective p-values in the subsamples. Genes with p_uncorrected < 1.0 × 10⁻⁴ are shown. If a variant was located within a region shared by more than one gene, the variant was assigned to all of the respective genes.
The ADH variant which made the strongest contribution to the gene-wide significance of ADH1B was rs1789891, which is located within ±20 kb of ADH1B and ADH1C and contributes to the gene-based p-values of both genes (Figure S1). The rs1789891 association was driven by the AD subsample (combined sample: p = 1.315 × 10⁻⁵, OR for the A-allele = 1.232; ACP + ALC subsample: p = 0.6392, OR for the A-allele = 1.033; AD subsample: p = 1.642 × 10⁻⁸, OR for the A-allele = 1.469) (Table S1). The rs1789891 "risk" allele was A and the "protective" allele was C.
In the control population of the combined cohort, the rs1789891 allele frequency was 0.154 for the A-allele. The A-allele frequency of the controls was 0.141 for the AD subsample and 0.169 for the ACP + ALC subsample (Table 3), the latter being similar to that reported in the 1000 Genomes Phase 3 data (A-allele: 0.167; CEU subpopulation) [30].
Table 2. Variants used as input for the gene-based analysis of ADH1B (±20 kb; chr4:100426552-100481581; hg18). Single marker association p_uncorrected values are shown, as calculated using logistic regression in PLINK.
In the pathway analysis of the combined sample, the top finding was the "Ethanol_Oxidation" gene set, which contains ADH1B (p_uncorrected = 2.2 × 10⁻⁴; p_corrected = 0.15; Table 4).
Table 4. Results of the genome-wide pathway analysis of the combined sample, and the respective values in the subsamples. Pathways with p_uncorrected < 1 × 10⁻² are shown. If a variant was located within a region shared by more than one gene, the variant was assigned to all of the respective genes.
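The Bonferroni arithmetic behind the gene-based result can be reproduced in a few lines. The sketch below uses the figures reported in the text (16,853 genes; uncorrected p = 1.2 × 10⁻⁶). Applying the same correction to the pathway result with the 674 Reactome pathways is an assumption on our part, although it reproduces the reported corrected value of 0.15. The function name `bonferroni` is illustrative.

```python
def bonferroni(p_uncorrected, n_tests, alpha=0.05):
    """Return (per-test significance threshold, Bonferroni-corrected p-value)."""
    threshold = alpha / n_tests
    p_corrected = min(1.0, p_uncorrected * n_tests)
    return threshold, p_corrected

# Gene-based test: 16,853 genes, ADH1B uncorrected p = 1.2e-6 (from the text).
gene_threshold, gene_p_corrected = bonferroni(1.2e-6, 16_853)

# Pathway test: assumed to be corrected over the 674 Reactome pathways.
pathway_threshold, pathway_p_corrected = bonferroni(2.2e-4, 674)

print(f"{gene_threshold:.2e}", f"{gene_p_corrected:.3f}")  # 2.97e-06 0.020
print(f"{pathway_p_corrected:.2f}")                        # 0.15
```

The computed threshold is about 2.97 × 10⁻⁶, which the text reports in truncated form as 2.9 × 10⁻⁶. The ADH1B p-value falls below it, whereas the top pathway does not survive correction.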

Discussion
To facilitate both the identification of new genes and pathways for AD and the replication of previous results, the present study combined the cohorts of three previous investigations in order to increase sample size and thus statistical power. The only association to withstand correction for multiple testing was the association with ADH1B in the gene-based test. This association was attributable to the AD subsample, and no association was detected in the ACP + ALC subsample. On first inspection, the lack of association with ADH1B in the combined ACP + ALC sample may seem surprising, since this gene is one of the most consistently reported genes for AD and AC, and achieved genome-wide significance in the present AD subsample. In the AD subsample, the frequency of the rs1789891 A allele was higher in patients (19.4% vs. 17.9%) and lower in controls (14.1% vs. 16.9%) than in the ACP + ALC subsample. The lack of association may be attributable to random or systematic genetic differences within the patient and/or control samples.

1. For ACP and ALC, no explicit diagnosis of AD was required, and these patients may therefore differ in terms of genetic disposition. However, the ACP and ALC patients were recruited from a clinic specialized in the treatment of alcohol-induced somatic disorders. Furthermore, in each patient, the respective disorder had been induced by excessive alcohol consumption, and the majority of patients were unable to abstain from alcohol despite the assignment of the somatic diagnosis.

2. The differing distribution of rs1789891 in the AD and ACP + ALC samples is non-random. ADH1B metabolizes alcohol to acetaldehyde, and research suggests that the adverse effects of acetaldehyde inhibit further drinking [31][32][33]. Alleles that confer an increased rate of alcohol metabolism may also contribute to tissue damage [34]. This was illustrated in a recent study from Japan, which analyzed rs1229984 (Arg48His) in ADH1B. The ADH1B 48His variant leads to an increased level of acetaldehyde and is thus protective in terms of AD development. The authors found that the ADH1B 48His variant was overrepresented in patients with alcoholic liver cirrhosis and chronic alcoholic calcific pancreatitis [35]. ADH1B_48His has a low frequency in Europeans [36][37][38], was not present in our genotyping arrays, and could not be imputed with sufficient imputation quality (R² = 0.44).
In the present analyses, the ADH1B variant with the lowest p-value was rs1789891 (p = 1.315 × 10⁻⁵). The rs1789891 variant is located between the genes ADH1B and ADH1C. Although both genes are expressed in the liver [39] and pancreas [40], rs1789891 has no known function according to the NCBI Phenotype-Genotype Integrator [41]. However, rs1789891 is in high LD with the functional variants rs1693482 (Arg272Gln) and rs698 (Ile350Val) in the gene ADH1C (Figure S2). These two ADH1C missense variants, Arg272Gln and Ile350Val, typically occur together (r² = 1.0), and result in two different forms of ADH1C: (i) the ADH1C isoenzyme gamma1 (ADH1C*1), in which arginine is present at amino acid position 272 and isoleucine at amino acid position 350; and (ii) the ADH1C isoenzyme gamma2 (ADH1C*2), in which glutamine is present at amino acid position 272 and valine at amino acid position 350. Although ADH1C_Arg272/Ile350 (which corresponds to the rs1789891 C-allele) confers a rapid rate of ethanol oxidation [42,43], its effect on the rate of alcohol metabolism is weaker than that of ADH1B_48His. However, imputation of rs1693482 and rs698 showed that the associations with rs1693482 (p = 1.43 × 10⁻⁵) and rs698 (p = 1.798 × 10⁻⁵) were weaker than with rs1789891 (p = 1.315 × 10⁻⁵). Thus, the issue of whether the association with rs1789891 is mainly attributable to LD with rs1693482 and rs698, or whether further variants in this region are implicated, remains unclear. Variants rs1693482/rs698 could nevertheless be the contributory factor in terms of organ damage [44][45][46], since they are in LD with rs1789891. Two plausible hypotheses can be formulated to explain how the products of the ADH reaction may increase the risk of tissue damage in the pancreas and liver.
First, acetaldehyde accumulation in response to chronic alcohol ingestion has been implicated in the etiology of liver cirrhosis, pancreatitis, brain damage, cardiomyopathy, fetal alcohol syndrome, and various forms of cancer [34]. Whereas the allele or genotype differences in some studies were non-significant, several investigations have reported a lower frequency of alcoholism-susceptibility alleles or genotypes in patients with alcoholic liver disease or alcoholic pancreatitis (reviewed in [35]). The cytotoxic acetaldehyde that is formed as an intermediate in the metabolism of ethanol is reported to induce morphological changes in the pancreas of experimental animals [47]. In addition, acetaldehyde has reported fibrogenic effects in the liver [48]. The likely molecular mechanism through which acetaldehyde causes organ damage is the promotion of adduct formation, which leads to protein and DNA damage [48].
Second, nicotinamide adenine dinucleotide (NAD⁺) is an intermediate electron carrier in the cytosolic ADH-mediated metabolism of alcohol to acetaldehyde. In this reaction, NAD⁺ is reduced to NADH by two electrons. In a subsequent step, the electrons of NADH are transferred to O₂ in the mitochondrial respiratory chain, which captures H⁺ to yield H₂O. Ethanol metabolism therefore increases the O₂ requirement of hepatocytes, and may result in hepatocyte hypoxia [49]. This may lead in turn to organ damage.

Controls
Random fluctuation seems very likely when looking at allele frequencies in the control samples. A detailed inspection of rs1789891 allele frequencies in the present control subsamples (data not shown) was therefore performed. Despite the fact that the control sample used for the ALC + ACP samples was drawn from the same study as a subcohort of the AD controls, it displayed a higher frequency of the AD risk allele A (16.9%) than the corresponding control subcohort used for AD (13.9%).
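To make the effect of the differing allele frequencies concrete, the rs1789891 A-allele frequencies quoted in this Discussion (patients: 19.4% in AD and 17.9% in ACP + ALC; controls: 14.1% and 16.9%) can be converted into crude allelic odds ratios. This is only a back-of-the-envelope sketch. It uses rounded frequencies and no principal-component covariates, so it will not exactly match the logistic-regression estimates reported in the Results (1.469 and 1.033). The function name is illustrative.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Crude (unadjusted) allelic odds ratio from two allele frequencies."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# A-allele frequencies of rs1789891 as quoted in the text.
or_ad = allelic_odds_ratio(0.194, 0.141)       # AD patients vs. AD controls
or_somatic = allelic_odds_ratio(0.179, 0.169)  # ACP + ALC patients vs. their controls

print(f"AD: {or_ad:.2f}, ACP + ALC: {or_somatic:.2f}")  # AD: 1.47, ACP + ALC: 1.07
```

The calculation shows that the contrast between the subsamples arises both from a lower risk-allele frequency in the ACP + ALC patients and from a higher frequency in their controls. A shift of a few percentage points in the control frequency is enough to change the odds ratio substantially.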

Conclusions
The aim of the present study was to identify new genes and pathways for AD, and to confirm previously reported findings, by combining three previously investigated samples. No novel data were generated. The fact that the previously reported association with ADH1B was not observed in the ACP + ALC subsample may reflect genetic stratification in cases and/or random fluctuation in allele frequencies in controls. This finding demonstrates that even strong signals can be blurred if samples are small and heterogeneous, with possibly opposing effects. Our finding therefore stresses the necessity for, and central importance of, samples that are large and well characterized, such as those investigated within the context of the Psychiatric Genomics Consortium. The present authors are optimistic that, as has been the case with other psychiatric disorders, the possibilities offered by GWAS will ultimately generate major contributions to our understanding of the genetic background of alcohol use disorders.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4425/8/7/183/s1. Figure S1: Regional association plot of the alcohol dehydrogenase gene region in the combined sample; Figure S2: Linkage disequilibrium between rs1789891 and rs1229984, rs1693482, and rs698. Table S1: Top SNPs from the single marker analysis in the combined sample and their respective p-values in the subsamples.