Expression Quantitative Trait Locus Mapping in Pulmonary Arterial Hypertension

Expression quantitative trait loci (eQTL) can provide a link between disease susceptibility variants discovered by genetic association studies and biology. To date, eQTL mapping studies have been primarily conducted in healthy individuals from population-based cohorts. Genetic effects have been known to be context-specific and vary with changing environmental stimuli. We conducted a transcriptome- and genome-wide eQTL mapping study in a cohort of patients with idiopathic or heritable pulmonary arterial hypertension (PAH) using RNA sequencing (RNAseq) data from whole blood. We sought confirmation from three published population-based eQTL studies, including the GTEx Project, and followed up potentially novel eQTL not observed in the general population. In total, we identified 2314 eQTL of which 90% were cis-acting and 75% were confirmed by at least one of the published studies. While we observed a higher GWAS trait colocalization rate among confirmed eQTL, colocalisation rate of novel eQTL reported for lung-related phenotypes was twice as high as that of confirmed eQTL. Functional enrichment analysis of genes with novel eQTL in PAH highlighted immune-related processes, a suspected contributor to PAH. These potentially novel eQTL specific to or active in PAH could be useful in understanding genetic risk factors for other diseases that share common mechanisms with PAH.


RNA sequencing and transcript abundance estimation
Whole blood (3ml) was collected in Tempus™ Blood RNA Tubes, which were stored at -80 o C until required. RNA was extracted using a Maxwell robotic system (Promega). Samples with a 260/230 ratio >1.5 and a 260/280 ratio in the range 1.9-2.1 were further quality checked by Bioanalyser and those achieving a minimum RNA Integrity Number (RIN) of 7 were submitted for sequencing. Globin-Zero Gold rRNA Removal Kits (Illumina Inc, San Diego, CA) were used to remove ribosomal RNA contamination from whole blood RNA samples. 75bp paired-end sequencing on a Hiseq4000 was performed on pooled libraries of ~80 samples.
Fastq files (raw reads from RNAseq) were analysed using Salmon v0.9.1 (Patro et al., 2017) and GENCODE release 28 to produce transcript abundance estimates which were converted to gene expression data using tximport in R (Soneson et al., 2015). Salmon, the first transcriptome-wide quantifier to correct for fragment GC-content bias was used, which substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis (Patro et al., 2017).

eQTL validation procedure
To assess the extent to which expression quantitative trait locus (eQTL) -transcript pairs in PAH overlap with previously reported eQTL-transcript pairs described in healthy populations, we calculated the validation rate of our findings in the two largest published eQTL studies to date and the Genotype-Tissue Expression (GTEx) Project (Westra et al., 2013, Joehanes et al., 2017, Aguet et al., 2019. Validation rate was defined as the number of significant eQTL-transcript pairs in this study confirmed by the published study divided by the total number of significant eQTL-transcript pairs that were tested by the published study and multiplied by one hundred. We extracted all eQTL with effects below the study specific significance threshold from the published studies` results for all significant transcripts in this study. Ensembl identifiers were used for the transcripts and genomic coordinates (chromosome and base pair position) on the Genome Reference Consortium Human Build 37 for eQTL when matching my eQTL-transcript pairs to those published by the external studies.
An eQTL-transcript pair was considered confirmed if the lead variant was in linkage disequilibrium (r 2 ≥ 40%) with the lead eQTL of the same transcript in the published study. For this purpose, a list of correlated variants was obtained from the European population of the 1000 Genomes Project (Durbin et al., 2010) -using the R package 'proxysnps' -for each lead eQTL reported by the studies used for validation. Each eQTL-transcript pair that reached the study-specific significance threshold in at least one of the studies was considered confirmed.
Since the complete list of variants or transcripts that passed study-specific quality control is not usually made available by published studies, we had to make assumptions when determining the total number of my significant eQTL-transcript pairs tested in the external studies. This did not apply to GTEx results where all tested eQTL-transcript pairs could be retrieved.
All transcripts present on the expression array used in the other two studies were assumed to have been tested. Annotation files were downloaded from the manufacturer`s website for the complete list of transcripts present on the expression arrays. Additionally, all eQTL in this study were assumed to have been available for analysis in the studies used for validation which used genotyping array data imputed to high-density reference panels. We restricted our analyses to common variants with a minimum minor allele frequency of 5% which is at least as high as those of the published eQTL studies used for validation.

eQTL studies used for validation
We selected two of the largest published eQTL studies and the GTEx Project to compare our results to. Westra et al. (Westra et al., 2013) meta-analysed eQTL effects from seven studies totaling 5,311 individuals to identify cis-acting effects genome-wide as well as trans-acting effects of 4,542 variants implicated in diseases and traits from the GWAS Catalog (Buniello et al., 2019) at the time of their study. The other eQTL study used for validation was published by Joehanes et al. (Joehanes et al., 2017) who conducted the largest to-date single cohort transcriptome-wide analysis testing both cisand trans-acting elements genome-wide in the whole blood samples of 5,257 individuals from the Framingham Heart Study. The GTEx Project aims to create a database of genotype and gene expression correlations in multiple human tissues of consenting donors to aid the scientific community in understanding inherited disease susceptibility. The GTEx website provides a browser (https://gtexportal.org/home/testyourown) for retrieving results for any given variant and transcript pair based on Ensembl identifiers in the tissue of interest.
The current release (V8) of the GTEx Project has 838 post-mortem donor samples with genetic data of which 670 contributed to the eQTL mapping in whole blood (GTExPortal, Aguet et al., 2019). Only tissue samples that passed histological examination were accepted for the project, however tissue exclusions were not made based on cause of death. Demographics and cause of death statistics can be found on the GTEx Portal (https://gtexportal.org/). Westra et al. meta-analysed seven studies that measured gene expression in peripheral blood on one of Illumina`s whole-genome Expression BeadChips (HT12v3, HT12v4 or H8v2 arrays). Since most of the cohorts used the HT12v3 array, the analyses were restricted to transcripts present on this array (Westra et al., 2013). Joehanes et al. used the Affymetrix Human Exon ST 1.0 array for the whole cohort (Joehanes et al., 2017). The complete list of transcripts was obtained from the manifest files, namely HumanHt-12_V3_0_R3_11283641_A.bgx for Illumina and HuEx-1_0-st-v2.na33.1.hg19.probeset.csv for Affymetrix, available on the manufacturers` websites.
The genotype data from Westra et al. and Joehanes et al. were imputed to the largest haplotype reference panels available at the time of conducting their analyses. Westra et al. mapped cis-eQTL differently from our and the other two studies` approach, using a 250 kilobases (kb) maximum distance from the probe midpoint to demarcate cis effects, while eQTLs with a distance greater than 5 Mb were defined as trans-eQTLs. The validation rate of our trans eQTL was not assessed using Westra`s trans eQTL results as they have only tested a small set of selected variants. Also, cis-eQTL in this study were restricted to the ones not farther than 250 kB from the transcript`s TSS when validating cis-eQTL with Westra`s results.

Supplementary Tables
Supplementary