Genetic Architecture of Untargeted Lipidomics in Cardiometabolic-Disease Patients Combines Strong Polygenic Control and Pleiotropy

Analysis of the genetic control of small metabolites provides powerful information on the regulation of the endpoints of genome expression. We carried out untargeted liquid chromatography–high-resolution mass spectrometry in 273 individuals characterized for pathophysiological elements of the cardiometabolic syndrome. We quantified 3013 serum lipidomic features, which we used in both genome-wide association studies (GWAS), using a panel of over 2.5 M imputed single-nucleotide polymorphisms (SNPs), and metabolome-wide association studies (MWAS) with phenotypes. Genetic analyses showed that 926 SNPs at 551 genetic loci significantly (q-value < 10−8) regulate the abundance of 74 lipidomic features in the group, with evidence of monogenic control for only 22 of these. In addition to this strong polygenic control of serum lipids, our results underscore instances of pleiotropy, when a single genetic locus controls the abundance of several distinct lipid features. Using the LIPID MAPS database, we assigned putative lipids, predominantly fatty acyls and sterol lipids, to 77% of the lipidome signals mapped to the genome. We identified significant correlations between lipids and clinical and biochemical phenotypes. These results demonstrate the power of untargeted lipidomic profiling for high-density quantitative molecular phenotyping in human-genetic studies and illustrate the complex genetic control of lipid metabolism.


Introduction
Molecular-phenotyping tools based on transcriptome, proteome and metabolome technologies provide detailed information on the molecular pathways and biomarkers relevant to disease etiopathogenesis. Their application in the context of genome-wide association studies (GWAS) of complex disorders can enhance our understanding of the genetic control of genome expression and to dissect out disease variables into multiple, intermediate disease traits and molecular phenotypes [1,2]. Metabolomics, which analyses the multivariate data representing a range of small metabolites in a biological sample, has already been used in humans to map the genetic determinants of the quantitative variations

Clinical-Data Analysis
The study group has a mean age of 57.4 ± 0.7 years and 56.4% (n = 154) of the individuals were males (Table 1). All individuals in the cohort were devoid of evidence of coronary artery stenosis, as assessed by an angiogram analysis. Analyses of the pathophysiological components of the cardiometabolic syndrome revealed that 132 individuals (49%) were obese (BMI > 30 kg/m 2 ), 46 had type 2 diabetes (17%), 147 were hypertensive (54%) and 119 were hyperlipidemic (44%), with a similar proportion of affected males and females ( Table 2). Table 1. Clinical and biochemical features of individuals in the study group used for metabolomic profiling. Individuals were selected for absence of coronary stenosis. Data are given as means ± SEM. Number of cases are reported in parentheses. Gender differences were tested using two-way ANOVA.

All
Females

General Features of Untargeted-Lipidome Data
Untargeted-lipidome profiling retrieved 3013 spectral features characterized by a specific mass-to-charge ratio (m/z) and retention time (RT) (1529 in the negative-ionization mode and 1484 in the positive-ionization mode) that met the acceptance criterion (i.e., Relative Standard Deviation (RSD) < 30%, also referred to as Coefficient of Variation CV) (Supplementary Table S1). Multivariate Principal Component Analysis (PCA) analysis showed the absence of strong technical drift during spectral-data acquisition in the cohort, as illustrated by the PCA scores' 2D plot representation of the QC samples in the two ionization modes (Supplementary Figure S1). The QC samples were tightly clustered, which indicates an acceptable reproducibility of the retained set of metabolic features as well as a good stability of the LC-MS-profiling experiments.

Genetic Analysis of Lipid Metabolism Uncovers Evidence of Pleiotropy
We identified 44 SNP loci that control two or more metabolic features, indicating potential pleiotropic effects of genetic variants, as illustrated in Figure 3B, where closely linked SNPs on chromosomes 6 and 13 are associated with a different m/z. For example, the above-mentioned SNP rs6928180 in GRIK2 was associated with several lipidome features under monogenic control (m/z 344.279, q-value = 1.89 × 10 −23 ; m/z 370.295, q-value = 1.14 × 10 −32 ; m/z 398.326, q-value = 4.96 × 10 −34 ; m/z 400.342, q-value = 3.68 × 10 −28 ; m/z 426.357, q-value = 7.38 × 10 −18 ) suggesting a pleiotropic effect of variants in GRIK2 on distinct but coordinately regulated lipids (Table 3). Along the same line, marker rs12997234 on chromosome 2 in an intron of DPP10 was exclusively associated with the monogenic control of m/z 568.34 (q-value = 1.73 × 10 −11 ) and m/z 590.32 (q-value = 2.93 × 10 −17 ) in the positive-ionization mode and with m/z 612.33 (q-value = 1.46 × 10 −9 ) in the negative-ionization mode ( Table 3). The most striking example of pleiotropy was detected on chromosome 13 at the locus rs1410818 and 11 distinct m/z values (Supplementary Table S2).

Genetic Analysis of Lipid Metabolism Uncovers Evidence of Pleiotropy
We identified 44 SNP loci that control two or more metabolic features, indicating potential pleiotropic effects of genetic variants, as illustrated in Figure 3B, where closely linked SNPs on chromosomes 6 and 13 are associated with a different m/z. For example, the above-mentioned SNP rs6928180 in GRIK2 was associated with several lipidome features under monogenic control (m/z 344.279, q-value = 1.89 × 10 −23 ; m/z 370.295, q-value = 1.14 × 10 −32 ; m/z 398.326, q-value = 4.96 × 10 −34 ; m/z 400.342, q-value = 3.68 × 10 −28 ; m/z 426.357, q-value = 7.38 × 10 −18 ) suggesting a pleiotropic effect of variants in GRIK2 on distinct but coordinately regulated lipids (Table 3). Along the same line, marker rs12997234 on chromosome 2 in an intron of DPP10 was exclusively associated with the monogenic control of m/z 568.34 (q-value = 1.73 × 10 −11 ) and m/z 590.32 (q-value = 2.93 × 10 −17 ) in the positive-ionization mode and with m/z 612.33 (q-value = 1.46 × 10 −9 ) in the negative-ionization mode ( Table 3). The most striking example of pleiotropy was detected on chromosome 13 at the locus rs1410818 and 11 distinct m/z values (Supplementary Table S2).

Assignment of Lipids to Lipidomic Features Mapped to the Human Genome
We next carried out the identification of candidate lipids for each of the 74 features showing evidence of genetic control. Using the LIPID MAPS database, we were able to annotate 26 lipidome signals with a single lipid, including 10 which were controlled by a single genetic locus (Table 3). Several lipid candidates could be proposed for the remaining 48 lipidome features, which prevented the unambiguous assignment of lipids. The vast majority of assigned lipids were fatty acyls (27), sterol lipids (23), triacylgycerols (9) and, to a lesser extent, a combination of phosphatidylcholines, phosphatidylethanolamine and phosphatidylserines (20).

Metabolome-Wide Association Studies Identify Metabolites Associated with Clinical and Biochemical Phenotypes
To test for evidence of association between clinical and variations in biochemical phenotypes and compounds from the lipidome dataset mapped to the human genome, linear regression was performed. Results from associations with a nominal p < 0.05 are given in Supplementary Table 3. Significant associations (q-value < 0.05) with multiple metabolic features were detected for cardiometabolic disease (Table 4) Table S3). Associations to family history of hypertension and diabetes independent to association to the diseases suggest that the underlying lipidomic feature may be a predictive marker of both diseases.    We did not identify statistically significant associations to LDL cholesterol or triacylglycerols. However, over 60 lipidomic features showed marginal evidence of co-association (nominal p-value < 0.05) to both LDL and HDL cholesterol (e.g., m/z 129.98 and 171.99) and five features (m/z 213.15, 367.23, 367.26, 369.27 and 722.50) were marginally associated to triacylglycerols and total, HDL and LDL cholesterol (Supplementary Table S3). No significant associations were found between spectral data and other phenotypes. We were able to assign one or several putative lipids to 14 lipidome signals, including ST 27:2;O;Hex and ST 28:1;O5, which were found to be regulated by multiple genetic loci (Table 4). Table 4. Significant associations between lipidomic features and clinical and biochemical phenotypes in the study group. Lipidomic features were independently acquired in negative-and positiveionization modes in serum samples from a study group of 273 individuals. Linear regression was used to compute a P-value statistic for each metabolic feature, which was corrected for multiple testing using the Benjamini-Hochberg method to calculate adjusted p-values. Significant evidence of association was obtained for cardiometabolic disease (CMD), family history (FH) of hypertension, body-mass index (BMI) and total and HDL cholesterol. CMD was assessed by presence of at least three anomalies (diabetes, hypertension, BMI > 30kg/m 2 , HDL < 40mg/dl). Results from association analysis for all phenotypes that did not reach statistical significance following correction for multiple testing (nominal p-value < 0.05) are shown in Supplementary Table S3

Discussion
We report results from the genome mapping of untargeted serum lipidomics in a group of individuals characterized for pathophysiological features of the cardiometabolic syndrome. We identified evidence of strong polygenic control of lipid features and instances of mechanisms of pleiotropy in the regulation of lipid metabolism. These observations illustrate the complex genetic architecture of serum lipid regulation and provide novel information beyond the genetic control of cholesterol metabolism.
Both proton nuclear magnetic resonance ( 1 H NMR) and mass spectrometry (MS) have been successfully used to map the genetic control of predominantly serum metabolites in genome-wide association studies (GWAS) in humans [17]. Collectively, over 1800 metabolomic data (i.e., known and unknown metabolites and ratios) have been associated with over 40,000 unique SNPs [18]. Among these, MS-lipidomic data provide significant advances in our understanding of the etiopathogenesis of diseases characterized by anomalies in lipid metabolism [19]. Untargeted lipidomics, a hypothesis-free strategy that has the power of deepening quantitative lipid analyses to unassigned lipids, remains challenging due to the breadth and intrinsic complexity of known lipids, which differ in terms of physicochemical properties [13,20]. As a consequence, harmonization of sample preparation for such a heterogeneous group of molecules is a problematic issue that limits detection and quantification of the broad diversity of lipid species [21]. In addition, variations in MS-instrument stability affect repeatability within and between experiments. Finally, the unambiguous assignment of putative lipids to MS-spectral signals remains an important methodological consideration in the application of untargeted MS lipidomics in GWAS.
Polygenic control is a hallmark of GWAS of human chronic diseases and complex phenotypes, and the genetic regulation of metabolomic profiling data does not make any exceptions [22][23][24]. We show that serum-lipid abundance exhibits predominant polygenic control, when a single metabolite is associated with several unlinked SNPs. Results from lipidomic GWAS have shown that about 30% of lipids are associated with several genetic loci [16]. Specifically, loci on chromosomes 2 and 4 control triglyceride TAG(50:1;0), loci on chromosomes 8 and 11 are associated with triglyceride TAG(52:3;0) and loci on chromosomes 12 and 18 control lysophosphatidylcholine LPC(14:0;0) [15]. This pattern of polygenic control suggests either functional redundancy of proteins in the regulation of lipid metabolic pathways, or the involvement of distinct proteins each contributing in parallel or in concert to interconnected mechanisms of lipid sensing, synthesis, transport and degradation.
Our association results also suggest apparent pleiotropy when a single genetic locus controls multiple, different lipidomic features. It is expected to occur in metabolic processes, since altered regulation of an individual protein involved in an enzymatic reaction or metabolite binding or transport may result in changes in interconnected biological pathways affecting multiple metabolites. An excess of distinct lipid species associated with genomic regions in lipidomic GWAS suggests the widespread occurrence of this phenomenon in the regulation of lipid metabolism [15,23,24]. Harshfield et al. reported the genetic mapping of 181 lipids to only 24 genomic regions [16], and Tabassum et al. identified associations to 42 lipid species in 11 genomic regions [15], thus implying that one genomic region is associated with several lipids. One of the most striking examples of pleiotropy in lipidomic GWAS is the GCKR locus, which is associated with over 30 lipid species [25]. The eicosanoid metabolic network, which involves 28 proteins for the production of over 150 lipids, provides a further example of pleiotropy in the regulation of lipid biology [26]. These coordinately regulated lipid clusters suggest the existence of genetically-determined "lipidotypes".
Combined with clinical data, lipidomic-based phenotyping allows the definition of disease-associated biomarkers as well as druggable-metabolite targets. Integrating genotyping data can identify instances of co-localization of disease-risk SNPs and loci associated with metabolomic features, which may represent disease-causative molecular biomarkers [15,16,27]. With the exception of SEL1L2 and SYT9, gene loci showing evidence of monogenic control of lipids in our study have been associated with disease-relevant phenotypes (e.g., body mass index), biochemical variables (e.g., creatinine) and behavioral traits in the GWAS repository (www.ebi.ac.uk/gwas/, accessed on 1 May 2022). Interestingly, multiple SNPs, the locus of the gene encoding pleckstrin and the Sec7 domain containing 3 (PSD3), which controls the level of a carnitine in our study, have been consistently associated with triglycerides and cholesterol levels as well as type 2 diabetes and obesity [28], and their downregulation results in reduced hepatic lipids in vitro and protects against fatty liver in vivo in mice [29].
Considering the breadth of circulating lipid species [7,21] and their roles in cardiovascular diseases [19], we were able to map the genetic control of several lipid species, mostly fatty acyls, phospholipids and triglycerides. On the other hand, we were unable to identify genetic loci associated with several important lipid species, including, for example, sphingomyelins and ceramides, which are involved in cardiovascular risk [9,30]. This may be caused by technical issues with data acquisition and the relatively modest sample size of the study but may also be accounted for by specific clinical features of the individuals selected in our study. Absence of coronary-artery stenosis in these individuals suggests reduced cardiovascular risk and, therefore, potentially limited quantitative variations in blood ceramides in cases and controls that may prevent genetic mapping. In support of this hypothesis, we did not identify statistically significant associations between lipidomic features and hypertension, which might nevertheless be improved with the use of intermediate, quantitative phenotypes, including measures of blood pressure. In addition, the fact that CMD patients may be under various medications, including lipid-lowering drugs (statins) or anti-diabetic treatments that result in improved control of blood pressure [31], may explain the absence of statistically significant associations between lipidomic features and hypertension in our study. However, our results suggest a role of lipids in the family history of hypertension, which may represent disease-predictive markers.

Study Subjects
The study group consisted of 273 subjects selected from a larger study recruited between 2006 and 2009 for inclusion in the FGENTCARD patient collection, primarily designed to map the genetic determinants of coronary artery stenosis [32]. Individuals from the FGENTCARD cohort were originally referred to a catheterization care unit for clinical evaluation. A 20 mL blood sample was collected in overnight fasted individuals from the peripheral femoral artery during the coronary angiography for serum preparation. Patients provided a written consent for the whole study including genomic analyses. The Institutional Review Board (IRB) at the Lebanese American University approved the study protocol.
Body weight, body-mass index (BMI) and blood chemistry (total, HDL and LDL cholesterol, triglycerides) were determined. Evidence of diabetes (fasting glucose > 125 mg/dl), hypertension (blood pressure > 10/14 mm Hg) and obesity (BMI > 30) was recorded in individuals' medical charts. Evidence of cardiometabolic disease (CMD) was assessed by presence of at least three anomalies (diabetes, hypertension, BMI > 30 kg/m 2 and HDL < 40 mg/dl). All 273 individuals selected for this genetic study were devoid of vessel stenosis, assessed through coronary angiography carried out at a single recruitment site. Family history of diabetes and hypertension, defined by presence of the disease in a sibling, parent or second-degree relative, was also recorded.
Statistical analysis of clinical and biochemical data was performed using two-way ANOVA. Differences were considered statistically significant with a p < 0.05.

Chemicals
Isopropanol, acetonitrile, formic acid and ammonium formate were LC-MS Chromasolv ® Fluka and high-performance liquid chromatography (HPLC) quality and were purchased from Sigma-Aldrich (Sigma-Aldrich, Saint-Quentin Fallavier, France). Ultra-pure water (resistivity: 18 mΩ) was obtained with a Milli-Q Integral purification system (Millipore, Molsheim, France) fitted with a 0.22 µm filter. The mobile phase was prepared with a solvent containing 400 mL of water, 600 mL of acetonitrile, 0.1% formic acid and 0.630 g of ammonium formate, and a solvent containing 100 mL of acetonitrile, 900 mL of isopropanol, 0.1% formic acid and 0.630 g of ammonium formate.

Sample Preparation
Lipid extraction from serum was performed using isopropanol (1:6, v/v), as recommended by the MS-equipment supplier, which is the most robust solvent enabling a broad coverage and recovery of lipid species from serum [33]. Experiments were carried out with 50 µL serum aliquots. Samples were then centrifuged at 14,000 g, and supernatants were then transferred to vials for injection in the UPLC system.

UPLC analysis
A Waters Acquity UPLC ® (Waters Corp, Saint-Quentin en Yvelines, France) fitted with a Acquity CSH C18 column (2.1 × 150 mm, 1.7 µm) and a corresponding guard column (Acquity CSH 1.7µM) (Waters Corp, Saint-Quentin en Yvelines, France) were used to analyse lipid compounds in serum samples as previously described [34]. The oven temperature was set at 55 • C. The flow rate used for these experiments was 400 µL/min and a volume of 5 µL of sample was injected. The total run time was 24 min. A binary gradient consisted of above-described mobile phases was used according to Waters' recommendation. Mobile phase B was maintained at 99% during 4 min at the end of the gradient.

Mass Spectrometry
Mass spectrometry was carried out as previously [34]. The UPLC system was coupled with a Q-Exactive™ Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific, Illkirch, France). Infusion of a calibration mixture (caffeine, MRFA and Ultramark ® 1621) was used for calibration of the instrument. Parameters of the heated-electrospray (HESI-II, Thermo Fisher Scientific, Illkirch, France) interface were as follows: S-Lens 50 V, Sheat gas: 65, Auxiliary gas: 25 arbitrary units, capillary voltage 3 kV, capillary temperature 350 • C and vaporization temperature 60 • C. The maximum target capacity of the C-trap (autogain control, AGC) target was defined as 3e6 ions and the maximum injection time was 200 ms. Full scans were obtained in positive and negative ion modes simultaneously with a resolution of 70,000 full width at half maximum (FWHM), in the scan range of mass-to-charge ratio (m/z) of 85-1275.

Untargeted Lipidomic Data Analysis
Analysis of MS data derived from UPLC complied with standard protocols and food and drug administration (FDA) guidelines [35,36], as previously described (34). XCMS tools implemented in R statistical language (v 3.1.0) (http://www.bioconductor.org, accessed on 10 May 2020) were used for preprocessing steps of MS data analysis (peak picking, peak grouping, retention-time correction, annotation of isotopes and adducts). Profiles of positive and negative ionization modes were separately extracted and converted into mzXML format for preprocessing by the XCMS tools. Identification of Regions of Interest (ROI) used the wavelet-based peak-picking approach (centwave). MS-data preprocessing resulted in a peak table listing lipidomic features characterized by a retention time (RT), mass-to-charge ratio (m/z) and corresponding intensity for each serum sample.
A data matrix reduction was applied to retain spectral features consistently found in the individuals. Over 40% of missing values were withdrawn. Performance and reliability of the analytical process and compliance of data with FDA-acceptance criteria [37] were also verified through a quality assurance (QA) strategy, based on analysis of a pooled qualitycontrol (QC) sample, which was injected every 10 samples throughout the analytical run. Median fold-change-normalization approach [38] was applied on the retained MS features, followed by a generalized log-transformation. A threshold of 30% calculated for each metabolic feature in the QC samples was set for relative standard deviation (RSD), which is an accepted standard to assess data reproducibility in metabolomic studies [35,36]. Four samples were identified as outliers and were discarded from the study. The resulting matrix was then used for multivariate and univariate statistical analyses (principal component analysis and linear regression).

Metabolome-Genome Wide Association Studies (mGWAS)
All individuals were genotyped by Illumina Human610-Quad BeadChip and Illumina Human660W-Quad BeadChip, respectively (552,510 overlapping SNPs), as part of the FGENTCARD consortium [32]. All SNPs with over 98% genotyping success rate, minor allele frequency above 1% and in Hardy-Weinberg equilibrium (p-value > 1 × 10 −7 ) were included in the analysis. An imputation across the whole genome to CEU HapMap population as a reference was performed using the IMPUTE2 tool [39], which yielded 2,573,690 SNPs. The plink tool [40] was used to perform both association analyses based on an additive genetic model. An FDR adjusted p-value (q-value) < 1 × 10 −8 was considered to be significant genome-wide. Plotting circles were generated using an in-house tool specifically developed to illustrate mGWAS associations.

Metabolome-Wide Association Studies (MWAS)
A linear-regression model was applied to carry out MWAS through the assessment of association, between each metabolic feature with clinical and biochemical continuous phenotypes (total, HDL and LDL cholesterol, triglycerides). Normality assumption of the residuals of each metabolic feature was investigated by Shapiro-Wilk test. The R statistical language was used to perform the linear regression and compute a p-value for each metabolic feature with a threshold of significance set to 0.05. Adjustment for age and sex was performed by including them as covariates in the statistical model. False discovery rates (FDR) were corrected using the Benjamini-Hochberg method to adjust P-values for false discovery involving multiple comparisons.

Assignment of Lipid Features
Annotation of lipid candidates corresponding to lipidome signals was carried out using the free resource LIPID MAPS (https://www.lipidmaps.org, accessed on 1 May 2022). We initially performed bulk-structure searches and subsequently refined our analysis by interrogating the LIPID MAPS Structure Database (LMSD) with a list of precursor ions. We entered the list of precursor ion m/z and chose appropriate polarity for the adduct ions. We defined a mass tolerance of ±0.001 m/z and sorted our data according to the delta between the input m/z and the m/z of candidate proposed in the database.

Conclusions
Results from our untargeted-lipidomic profiling provide information on fundamental mechanisms regulating serum lipids in humans. Replication of these findings in larger study populations and further analyses, such as MS/MS validation experiments designed to unambiguously assign lipids to lipidomic features, are required.

Supplementary Materials:
The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/metabo12070596/s1. Figure S1: 2-D Principal component analysis of mass spectrometry data in the cohort representing the scores of the first components; Table S1: Lipidome spectral features acquired in serum samples from the study population; Table S2: List of all SNPs showing evidence of statistically significant association with lipidome fearures; Table S3: Associations between lipidomic features and clinical and biochemical data.