Multi-Cell-Type Openness-Weighted Association Studies for Trait-Associated Genomic Segments Prioritization

Openness-weighted association study (OWAS) is a method that leverages the in silico prediction of chromatin accessibility to prioritize genome-wide association studies (GWAS) signals, and can provide novel insights into the roles of non-coding variants in complex diseases. A prerequisite to apply OWAS is to choose a trait-related cell type beforehand. However, for most complex traits, the trait-relevant cell types remain elusive. In addition, many complex traits involve multiple related cell types. To address these issues, we develop OWAS-joint, an efficient framework that aggregates predicted chromatin accessibility across multiple cell types, to prioritize disease-associated genomic segments. In simulation studies, we demonstrate that OWAS-joint achieves a greater statistical power compared to OWAS. Moreover, the heritability explained by OWAS-joint segments is higher than or comparable to OWAS segments. OWAS-joint segments also have high replication rates in independent replication cohorts. Applying the method to six complex human traits, we demonstrate the advantages of OWAS-joint over a single-cell-type OWAS approach. We highlight that OWAS-joint enhances the biological interpretation of disease mechanisms, especially for non-coding regions.


Introduction
Genome-wide association studies (GWAS) have been a powerful tool for identifying genetic signals associated with complex traits [1]. In recent years, numerous trait-related single nucleotide polymorphisms (SNPs) have been inferred from GWAS, which brings novel insights into disease mechanisms and genetic architectures. However, the majority of GWAS loci (∼89%) lie within non-coding regions [2,3]; hence, their interpretation remains a significant challenge. In recent years, methods incorporating functional annotations have been developed which support the prioritization of non-coding variants, improving the understanding of the biological mechanisms underlying complex traits [4,5].
Chromatin accessibility refers to the degree to which nuclear macromolecules are able to physically contact chromatinized DNA [6,7]. As an epigenetic feature, chromatin accessibility is an informative functional annotation in revealing active regulatory elements [8]. In addition, measurements of chromatin accessibility are predictive for gene expression [9,10]. However, obtaining experimental epigenome data for large cohorts is costly. As an alternative, computational approaches have been proposed for chromatin accessibility prediction. For example, deltaSVM leverages the gkm-SVM classifier and quantifies cell-type-specific effects of variants on DNase I sensitivity in their native genomic contexts [11]. DeepCage improves the prediction of novel chromatin accessible regions with a unified deep neural network, which integrates both sequence information and the binding status of transcription factors [12].
Recently, we have developed the openness-weighted association studies (OWAS) approach, a computational framework that leverages predicted chromatin accessibility for the prioritization of GWAS signals [13]. The first step in OWAS is to choose a trait-related cell type (e.g., the liver for low-density lipoprotein (LDL), and whole-blood for Crohn's disease (CD)). However, selecting an optimal cell type for a given complex trait remains a substantial challenge [14,15]. In fact, many complex traits involve multiple related cell types, such as obesity being regulated by cells in both brain and adipose tissues [3,16,17]. Additionally, it has been shown that phenotype-genotype associations are enriched in open regions in multiple cell types for many traits [18,19]. Specifically, associations with height displayed relatively ubiquitous enrichments in DNase I hypersensitive sites of cell types including blood, fetal muscle, pancreas, etc. Even for those traits that showed relatively narrow enrichments, such as CD, the associations were also enriched in open regions of fibroblast and fetal intestine, in addition to blood cells [19]. Therefore, it is reasonable to incorporate information from multiple cell types to improve the performance of single-cell-type OWAS analysis.
In this study, we introduce OWAS-joint, a computational framework that integrates multi-cell-type chromatin accessibility predictions to prioritize trait-associated genomic segments. OWAS-joint aggregates single-cell-type OWAS test statistics via an aggregated Cauchy association test (ACAT) [20], bypassing the selection of specific cell types. We show through simulations that OWAS-joint improves statistical power compared to single-cell-type methods. We compared the performance of OWAS-joint and single-cell-type with real data applications for six complex traits, including CD, rheumatoid arthritis (RA), hypertension (HT), prostate cancer (PrCa), high-density lipoprotein (HDL), and LDL. Compared with OWAS, OWAS-joint identified more signals that were biologically interpretable. Furthermore, segments identified by OWAS-joint explain higher heritability and have high replication rates in independent cohorts. OWAS-joint takes GWAS summary statistics as the input and can be easily applied to large cohorts (e.g., UK Biobank, UKBB). The findings of OWAS-joint highlight the mediating effects of chromatin accessibility to the phenotype of interest, which improves the functional interpretation of non-coding genetic variants and provides novel insights into disease mechanisms. We provide an R package, OWASjoint, to implement the proposed method, which is available at https://github.com/shuangsong0110/OWASjoint (accessed on 1 May 2022).

Jointly Modeling Multi-Cell-Type Openness Scores
OWAS-joint integrates in silico predictions of chromatin accessibility across multiple cell types and GWAS summary statistics to prioritize trait-associated genomic segments ( Figure 1). We examine 100 KB up and down-stream from the transcription start sites of genes as regulatory regions, which cover most of the regulatory variants [21]. The regulatory regions are then divided into segments of 5 KB, and the results are robust to the choice of length [13]. In the software, users can specify a candidate segment either by length or by the number of SNPs covered. For each segment and each cell type, OWAS test statistics are calculated as described in Song et al. (2021). OWAS-joint combines the single-cell-type OWAS results with ACAT [20]. To elaborate, denote p l,s as the OWAS p-value of segment s concerning the l-th cell type. Specifically, the ACAT test statistic for segment s is where L is the total number of cell types to be aggregated, and the η l values are nonnegative weights that sum to 1, modeling the relative importance of each cell type. In particular, OWAS-joint is equivalent to single-cell-type OWAS on the k-th cell type when the weight of the k-th cell type is 1 and the weights of other cell types are 0. We set η l = 1 L (l = 1, · · ·, L) in the following context assuming that there is no prior knowledge on the related cell types. The transformed quantity tan[(0.5 − p l,s )π] follows the Cauchy distribution if the p-value p l,s is from the null (i.e., p l,s follows a uniform distribution). Then the test statistic T s , which is a weighted average of Cauchy random variables, has approximately a Cauchy tail under the null regardless of the dependency structure [20,22]. Thus, we can transform T s based on the cumulative density function of the Cauchy distribution to derive the p-value of the multi-cell-type OWAS test, We note that the approximation holds with arbitrary correlation structures of the OWAS test statistics across cell types, which ensures the computational efficiency of OWAS-joint.

OWAS-joint
Openness-weighted association studies (OWAS) The p-value cut-offs for segment-level association tests were determined using the Bonferroni correction. For each cell type, we used 0.05 divided by the total number of segments as the significance threshold (∼5 × 10 −8 ) to identify the trait-associated genomic segments.

Linkage Disequilibrium (LD) Shrinkage
In OWAS [13], the z-score for the s-th segment is derived by where γ s is the effect size of segment s for the trait; Ω s denotes the set of SNPs in segment s; w j , z j , and σ 2 j denote the openness effect, z-score, and sample variance of the j-th SNP, respectively, and σ 2 s is the sample variance of the openness scores in segment s. We estimate σ 2 s withσ 2 s = W T s Cov(X s )W s , where W s is the vector containing the openness effects of SNPs in segment s. The covariance matrix of the genotype, Cov(X s ), can be estimated from an external LD reference panel, such as the 1000 Genomes Project (1000G) [23] and UKBB [24].
In real data applications, the LD matrix was estimated from European samples from the 1000G reference panel. However, the limited sample size leads to rank deficiency of the estimated covariance matrix. The singularity of the covariance matrix may yield aσ 2 s close to zero, and further cause inflation of Z s . Therefore, we leverage a two-step procedure to improve the robustness of the proposed method. In the first step, we perform an LD-based clumping using PLINK software with an LD threshold of 0.99 to exclude SNPs with perfect LD [25]. Secondly, we compute a shrinkage estimate of the LD matrix using the R package "corpcor" [26,27]. After shrinkage, the estimated covariance matrix is always positive definite and well-conditioned.

Simulation Settings
We used genotype data from the Wellcome Trust Case Control Consortium (WTCCC) dataset in simulation studies (n = 15,757) [28]. The predicted openness weights of three cell types, Th1, GM12878, and A549, were used as true openness effects in simulations to capture the genetic architecture of openness effects. For each segment, the openness score for the l-th cell type (l = 1, 2, 3) is : where X is the genotype matrix of the segment, and W l is the openness weights vector for SNPs in the segment of the l-th cell type. A quantitative phenotype is generated using the openness scores of the segment of all three cell types, i.e., where λ l is the effect size of the openness score of the l-th cell type. The error terms are identical and are independently normally distributed.
To evaluate the type-I error rate, we set λ 1 = λ 2 = λ 3 = 0, which indicates the phenotype is not associated with any openness scores. With respect to the evaluation of statistical power, we varied the simulated genetic architectures in two respects: the number of causal cell types, and the heritability explained by the segment. Specifically, the number of causal cell types was decided by adjusting λ l with 1. λ 1 = 1, λ 2 = λ 3 = 0. Only the first cell type (Th1) is causal to the phenotype. 2. λ 1 = λ 2 = 1 2 , λ 3 = 0. The cell types Th1 and GM12878 are causal, while A549 is not.
3 . All three cell types are causal. The heritability was set to 0.02% for the low heritability setting, and 0.1% for the high heritability effect setting. We removed segments containing fewer than two SNPs in simulations. The procedure was repeated 10 times on 500 randomly selected segments on chromosome 1.

GWAS Datasets 2.4.1. GWAS Summary Statistics
We provide the details of GWAS data used in our analysis in Table S1. The UKBB GWAS summary statistics were downloaded from the second round of results released in August 2018 by the Neale group (http://www.nealelab.is/uk-biobank, accessed on 3 February 2021). The Genetic Epidemiology Research on Aging (GERA) summary statistics were used for replication analysis for HT (dbGaP: phs000674.v3.p3, http://cg.bsc.es/gera_ summary_stats/, accessed on 9 October 2021).

Individual-Level Genotype Data
We used individual-level genotype data from WTCCC for simulations and real data applications to compute heritability. For quality control, we removed SNPs with a genotyping failure rate larger than 0.02 and significant Hardy-Weinberg equilibrium with p < 10 −6 in PLINK 1.9 [25]. We also removed samples with a missing rate larger than 0.02. We excluded the HLA region (chr6: 28, 477, 797-33, 448, 354, hg19) in the heritability estimation.

Predicted Openness of Personal Genomes
We obtained the predicted openness effect (w j ) of each SNP via the deltaSVM [11]. The model was trained on DNase I-hypersensitive sites of 12 cell types from the UW ENCODE Project (http://www.beerlab.org/deltasvm/, accessed on 13 May 2021). A list of the cell types is provided in Table S2.

Pathway Enrichment Analysis
We tested the enrichment of OWAS genes in KEGG pathways [29] using the R package "clusterProfiler" [30]. The false discovery rate (FDR) was controlled at the level of 0.2 with an R package "qvalue" [31,32].

Simulations
A key improvement of the OWAS-joint framework is to aggregate the statistical evidence across multiple cell types. We performed simulations to evaluate the type-I error rate and statistical power of OWAS-joint under varying genetic settings. The openness weights from three commonly used cell types, A549, GM12878, and Th1, were considered in the simulations (Section 2). In the null simulations, type-I error rates for both OWAS-joint and single-cell-type OWAS were well controlled (Table S3). Under various heritability settings, the statistical power was improved by OWAS-joint when the phenotype was correlated with the openness scores of multiple cell types (Figure 2, settings 2-3). Even when only one cell type was causal, the statistical power of OWAS-joint was comparable to the results based on the causal cell type (Figure 2, setting 1). We also compared the statistical power of OWAS-joint to simply combining results across three cell types with the Bonferroni correction. OWAS-joint achieved a consistent power improvement from 30.9% to 38.0% under low heritability settings, and 3.7% to 5.2% under high heritability settings.

OWAS-Joint Identifies More Genetic Signals
For real data applications, we applied OWAS-joint to six complex traits including CD [33], RA [34], HT [24], PrCa [35], HDL [36], and LDL [36]. A detailed description of the GWAS datasets is provided in Table S1. We compared the results of OWAS-joint to that of single-cell-type OWAS, based on the 12 common cell types from the UW EN-CODE Project (Section 2). Consistent with the simulation studies, OWAS-joint identified more trait-associated genes and segments than the single-cell-type OWAS and the union with Bonferroni correction (Table 1 and Figure S1). For example, OWAS-joint identified 382 significant genes for CD, while the single-cell-type OWAS identified 293 genes on average (standard deviation, SD = 13). The same patterns were observed in the analysis of segments, where OWAS-joint identified 2743 trait-associated segments. In contrast, on average, the single-cell-type OWAS identified 1138 segments (SD = 48). Table 1. The number of segments and genes identified by OWAS-joint, the union of single-cell-type OWAS with 12 cell types, and the average number of single-cell-type OWAS. The standard deviations across 12 cell types are shown in brackets. The p-value cutoffs for the union of segment-level association tests were determined by Bonferroni correction. The largest numbers of identified signals are highlighted in boldface.

OWAS-Joint Provides Novel Biological Interpretation
We first performed KEGG pathway enrichment analysis on genes identified by OWASjoint and single-cell-type OWAS analysis (Figures S2 and S3). The related cell type for single-cell-type OWAS was pre-specified based on domain knowledge [13,19,37] (Table S1). Here, we focus our discussions on pathways that were enriched in OWAS-joint genes but not in single-cell-type OWAS genes. The two infection-related pathways, the hepatitis B (q-value = 0.14) and measles (q-value = 0.16) pathways, were enriched for CD. Infectious agents were implicated in the initiation, maintenance, and risk of chronic inflammation in CD [38,39]. The antigen processing and presentation pathway (q-value = 0.16) was also enriched in OWAS-joint genes for CD, which confirms the pivotal role of antigenpresenting cells in intestinal inflammation [40]. Another example is the PPAR signaling pathway, which was enriched in OWAS-joint genes for LDL (q-value = 0.12). As discussed in Gusev et al. (2018), PPARG activation increases LDL metabolism via induction of LDLR and CYP7A1 [41]. It has also been reported that PPAR agonists decrease glycated LDL uptake into macrophages via regulation of lipoprotein lipase [42]. We also found the prostate cancer pathway was enriched in OWAS-joint genes for PrCa (q-value = 0.17), which validates the results.
OWAS-joint genes can also be validated with etiological evidence. We highlight several examples of CD. As discussed in Verlaan et al. (2009), the allele-specific chromatin remodeling in ZPBP2 (OWAS-joint p-value = 2.1 × 10 −10 ) is associated with the risk of autoimmune disease [44]. Furthermore, LRRK2 (OWAS-joint p-value = 9.0 × 10 −16 ) was found to suppress the transcriptional activity of NFAT1, which has been considered a key target for treating immune disorders [45]. In addition, we also found some susceptible genes for ulcerative colitis (UC). UC is known to have a high genetic correlation (around 0.7) with CD [46]. An example is GNAI2 ( OWAS-joint p-value = 2.5 × 10 −9 ), which plays an essential role in the inflammatory process [47]. GNAI2-deficient mice develop a lethal diffuse colitis with clinical and histopathological features, which is similar to UC in humans [48].

More Heritability Explained by OWAS-Joint Segments
As discussed in Song et al. (2021), OWAS segments explain more heritability compared with the genes or annotated SNP sets found by other methods, such as FUSION [49] and CADD [50]. In the following, we show that OWAS-joint segments explain more heritability than single-cell-type OWAS. Heritability was evaluated with WTCCC individual-level genotype data for CD, HT, and RA. The samples for evaluation were independent of the samples in the discovery cohorts. From Figure 3, we can see that the heritability explained by OWAS-joint segments was higher or comparable to the highest heritability explained by the single-cell-type OWAS segments with each of the 12 cell types. The patterns were consistent across the three traits under varying p-value thresholds, and the improvement was especially notable for RA.

Replication Rates of OWAS-Joint Segments
We evaluated the replication rates of OWAS-joint segments of the six traits mentioned above. The independent GWAS cohort details for evaluating replication rates are provided in Table S4. We first binned the SNPs by GWAS p-values in the discovery cohorts. For SNPs, either identified by OWAS-joint (i.e., SNPs that are harbored by significant segments identified by OWAS-joint) or not, we calculated the replication rate in the replication GWAS cohort within each bin. OWAS-joint achieved high replication rates for all six phenotypes and all p-value bins (Figure 4). This indicated that OWAS-joint segments effectively detected the truly associated SNPs. The improvement in the replication rates was more prominent for GWAS SNPs with moderate p-values, which indicated the benefits of aggregating multiple cell types when the GWAS power is limited.

Discussion
Despite the success of GWAS in identifying tens of thousands of associations between SNPs and complex traits, the interpretation of GWAS signals remains challenging. The growing amount of cell-type-level epigenomic data has increased our understanding of non-coding variations, but it is still unclear which cell types are more important for the development of complex traits. In this article, we propose an efficient statistical framework, aggregating personalized chromatin accessibility across multiple cell types, for prioritization of disease-related genomic segments. OWAS-joint bypasses the selection of trait-related cell types, which addresses the main challenge of cell type selection in OWAS analysis. OWAS-joint tests the mediating effects of chromatin accessibility by quantifying the association between personalized openness across multiple cell types and the pheno-type of interest. The results of OWAS-joint provide a biological interpretation, especially for non-coding variants, and promote our understanding of disease mechanisms.
In both simulations and real data applications, we demonstrated that OWAS-joint improves the statistical power and identifies more potential genetic signals than singlecell-type OWAS analysis. In addition, OWAS-joint segments were shown to have greater heritability enrichment and replication rates than OWAS. OWAS-joint takes GWAS summary statistics as input, which guarantees its general applicability without privacy concerns. The method can be easily applied to large cohorts (e.g., UKBB), and maintains high computational efficiency. The computational framework can also be extended to other epigenetic features for understanding of the mediation effects of epigenetic features for complex traits.
There are several limitations of our method. First, similar to OWAS, OWAS-joint is based on the openness effects predicted by machine learning methods. Therefore, its performance is affected by the prediction accuracy of the openness effects. When provided with more data sources (e.g., RNA-seq or Chip-seq data [51,52]), it is expected that the openness will be predicted more accurately. A recent study has shown that integrating sequence information and the binding status of transcription factors further improved the prediction accuracy of chromatin accessibility [12]. Second, we assume an additive model for the openness effect at SNP level. It may be of value to consider and evaluate the performance of other statistical models, such as dominant or recessive models. Third, our method provides a novel perspective to interpret non-coding variants by incorporating the change in chromatin accessibility in multiple cell types, but the method alone cannot infer causality. Fourth, we also note that the ACAT weights in Equation (1) can be flexibly specified based on domain knowledge of the relativity between the cell type and the trait. Future work is needed to design optimal weights based on data and background knowledge to further improve the performance of OWAS-joint.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.339 0/genes13071220/s1. Figure S1: The number of significant genes identified by OWAS-joint and OWAS with each of the 12 common cell types. Figure S2: KEGG pathway enrichment analysis results of OWAS-joint genes. Figure S3: KEGG pathway enrichment analysis results of single-cell-type OWAS genes. Table S1: GWAS summary statistics used in OWAS-joint and single-cell-type OWAS analysis. Table S2: Summary information of the 12 common human cell types from UW ENCODE and the corresponding tissues in OWAS analysis in simulations. Table S3: Type-I error rates of OWAS-joint, single-cell-type OWAS with each of the three cell types, and a union of single-cell-type methods with Bonferroni correction. Table S4: GWAS summary statistics for replication studies.