The rs599839 A>G Variant Disentangles Cardiovascular Risk and Hepatocellular Carcinoma in NAFLD Patients

Simple Summary Dyslipidemia is a hallmark of nonalcoholic fatty liver disease (NAFLD) and the rs599839 variant in the CELSR2-PSRC1-SORT1 genetic cluster, has been associated with a protection against cardiovascular events. Here, we revealed a novel link between the rs599839 variant and hepatocellular carcinoma (HCC) whose onset in the context of NAFLD is rapidly increasing. We found that the rs599839 variant disentangled the risk of HCC from that of cardiovascular abnormalities by modulating SORT1 and PSRC1 expressions. The latter emerged as a potential modifier of liver carcinogenesis. Abstract Background and Aims: Dyslipidemia and cardiovascular diseases (CVD) are comorbidities of nonalcoholic fatty liver disease (NAFLD), which ranges from steatosis to hepatocellular carcinoma (HCC). The rs599839 A>G variant, in the CELSR2-PSRC1-SORT1 gene cluster, has been associated CVD, but its impact on metabolic traits and on the severity liver damage in NAFLD has not been investigated yet. Methods: We evaluated the effect of the rs599839 variant in 1426 NAFLD patients (Overall cohort) of whom 131 had HCC (NAFLD-HCC), in 500,000 individuals from the UK Biobank Cohort (UKBBC), and in 366 HCC samples from The Cancer Genome Atlas (TCGA). Hepatic PSRC1, SORT1 and CELSR2 expressions were evaluated by RNAseq (n = 125). Results: The rs599839 variant was associated with reduced circulating LDL, carotid intima-media thickness, carotid plaques and hypertension (p < 0.05) in NAFLD patients and with protection against dyslipidemia in UKBBC. The minor G allele was associated with higher risk of HCC, independently of fibrosis severity (odds ratio (OR): 5.62; 95% c.i. 1.77–17.84, p = 0.003), poor prognosis and advanced tumor stage (p < 0.05) in the overall cohort. Hepatic PSRC1, SORT1 and CELSR2 expressions were increased in NAFLD patients carrying the rs599839 variant (p < 0.0001). SORT1 mRNA levels negatively correlated with circulating lipids and with those of genes involved in lipoprotein turnover (p < 0.0001). Conversely, PSRC1 expression was positively related to that of genes implicated in cell proliferation (p < 0.0001). In TCGA, PSRC1 over-expression promoted more aggressive HCC development (p < 0.05). Conclusions: In sum, the rs599839 A>G variant is associated with protection against dyslipidemia and CVD in NAFLD patients, but as one it might promote HCC development by modulating SORT1 and PSRC1 expressions which impact on lipid metabolism and cell proliferation, respectively.


Transcriptomic analysis
Gene expression was performed in a subset of 125 severely obese patients (21 without and 104 with NAFLD) (Transcriptomic cohort) belonging to the Hepatology service cohort, of whom percutaneous liver biopsy was performed during bariatric surgery at Fondazione IRCCS Cà Granda, Ospedale Policlinico, Milan, Italy. The study was conformed to the Declaration of Helsinki and approved by the Institutional Review Boards and their Ethics Committees. All participants gave written informed consent. Clinical characteristics of the Transcriptomic cohort are presented in Table S2.
Total RNA was isolated using RNeasy mini-kit (Qiagen, Hulsterweg, Germany), according to the manufacturer's instructions. RNA quality was assessed through Agilent 2100 Bioanalyzer and samples with RNA integrity numbers (RIN) greater than or equal to 7 were used for library preparation. RNA sequencing was performed in paired-end mode with a read length of 150nt using the Illumina HiSeq 4000 (Novogene, Hong Kong, China). Illumina raw reads were mapped against the Human Genome [1] using a custom pipeline based on the standard primary analysis procedure. The pipeline performed the primary analysis step including FASTQ quality check (FastQC software, Babraham Bioinformatics, Cambridge, UK), low-quality reads trimming with Trimmomatic [2] and mapping on GRCh37 reference genome using STAR mapper [3]. RNASeq quality control was performed, and samples with less than 10 million reads uniquely mapped or with less than 60% uniquely mapped/mapped reads were excluded from the analysis. Gene reads count was performed using RSEM software [4]. To quantify gene expression RSEM per-gene counts were normalized using DESeq2 package [5].

Tumor stage system classification
The stage system comprises the overall parameters that describes the tumor (primary tumor size, spreading into lymph-nodes and metastasis. Higher stage was used to define an enhanced size and number of the tumor and its spreading in nearby tissues and distant part of the body: S0: carcinoma in situ; S1-4: advanced carcinoma progressively associated to poor prognosis (Stage 1 have often a good prognosis).
Depending on the amount of abnormality, HCC were graded as: G1: well differentiated (low grade); the organization of tumor tissue appears close to normal, G2: moderately differentiated (intermediate grade), G3: poorly differentiated (high grade), G4: undifferentiated (high grade).
Tumor size (T) and/or extension defines the aspects of original primary tumor, such as size, growth and invasion to nearby tissues: T0: no evidence of primary tumor, T1-4: increasing tumor size and spreading.

Bioinformatic Resources
LD was calculated based on the 1000 genomes project population (n = 503 samples) using LD calculator from ensemble GRCh37 (https://grch37.ensembl.org/Homo_sapiens/Tools/LD). LD values were calculated by a pairwise estimation between SNPs genotyped in the same samples and within a given window of 50.00 Kb spanning the chr1: 109,800,000-109,850,000. An established method was used to estimate the maximum likelihood of the proportion that each possible haplotype contributed to the double heterozygote. R squared and D-prime calculation were based on maximum likelihood solution to the cubic equation. We reported those variants whose R-square ranged from 0.8 to 1, which we defined as indicative of the presence of strong LD.
To constitute a protein-protein interaction network between PSRC1, SORT1 and CELSR2, we performed a prediction analysis by using Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) (https://string-db.org/; version 11.0).
To clarify biological functions of genes co-regulated with PSCR1 in TCGA dataset we performed a pathway-enriched analyses on the most strictly correlated genes with PSRC1 (q-Value < 0.01 at Benjamini correction) by using Toppgene (https://toppgene.cchmc.org/). The same bioinformatic tool has been used to investigate which pathways are enriched among the differentially expressed genes, identified in the group of patients with altered expression of PSRC1.  Table S3. Associations between the rs599839 genetic variant and the spectrum of liver damage in patients from the Overall cohort (n = 1426), ┼ at recessive model.       Table S9. REACTOME pathway-enriched analysis for 1665 genes down-regulated in patients with PSRC1 altered expression by Toppgene.    109,800,000-109,850,000; Human (GRCh37.p13)), using data from 1000 genomes project, that includes 503 individuals of European descent (CEU). Manhattan plot represents the SNPs located in the region spanning from 3'UTR of CELSR2, the intergenic region and 3'UTR of PSRC1 oriented in opposite direction (on x axis). All SNPs are plotted considering r 2 values (r 2 , on y axis). Dashed red line represents the cutoff of r 2 = 0.8, dashed blue line represents the maximum r 2 = 1. SNPs with r 2 > 0.8 are considered in LD (A). Network analysis with PSRC1 protein was conducted by using STRING, a web-based bioinformatic tool that show protein-protein (nodes) interactions (edges). Filled nodes represent proteins with known 3D structure. Green lines: text-mining; Black lines: co-expression. Empty node represents protein with unknown 3D structure. Protein-protein interaction enrichment was automatically calculated. p < 0.0001 (A). Connection CELSR2 -PSRC1 score of estimated probability of interaction: 0.876; Connection CELSR2 -SORT1 score: 0.778; Connection PSRC1 -SORT1 score 0.798. The network constructed indicates that these proteins has significantly more interactions than expected. Proteinprotein interaction enrichment: p = 8.44 x 10 -07 . Thus, PSRC1, SORT1 and CELSR2 proteins are co-express (B).