Customized Massive Parallel Sequencing Panel for Diagnosis of Pulmonary Arterial Hypertension

Pulmonary arterial hypertension is a very infrequent disease, with a variable etiology and clinical expressivity, making sometimes the clinical diagnosis a challenge. Current classification based on clinical features does not reflect the underlying molecular profiling of these groups. The advance in massive parallel sequencing in PAH has allowed for the describing of several new causative and susceptibility genes related to PAH, improving overall patient diagnosis. In order to address the molecular diagnosis of patients with PAH we designed, validated, and routinely applied a custom panel including 21 genes. Three hundred patients from the National Spanish PAH Registry (REHAP) were included in the analysis. A custom script was developed to annotate and filter the variants. Variant classification was performed according to the ACMG guidelines. Pathogenic and likely pathogenic variants have been found in 15% of the patients with 12% of variants of unknown significance (VUS). We have found variants in patients with connective tissue disease (CTD) and congenital heart disease (CHD). In addition, in a small proportion of patients (1.75%), we observed a possible digenic mode of inheritance. These results stand out the importance of the genetic testing of patients with associated forms of PAH (i.e., CHD and CTD) additionally to the classical IPAH and HPAH forms. Molecular confirmation of the clinical presumptive diagnosis is required in cases with a high clinical overlapping to carry out proper management and follow up of the individuals with the disease.

than one variant associated with the disease might explain why some individuals with only BMPR2 mutations do not develop PAH. This can explain in part why the presence of BMPR2 pathogenic variants alone is not always enough to develop the disease.
Thus, the main objective of this study was to study a cohort of three hundred patients with idiopathic, heritable, PVOD and APAH from the Spanish registry of patients with PAH (REHAP), through a custom, in-house NGS panel of 21 genes. Secondly, phenotype-genotype correlation and survival rate were also assessed.

Cohort Description
All patients from group one and idiopathic, heritable, and PVOD and patients with associated PAH to congenital heart disease and connective tissue disease included in the national Spanish Registry of Pulmonary Arterial Hypertension (REHAP) were eligible for this study (Supplementary Figure S1). REHAP is an observational national registry with the aim to serve as a database for research purposes. It includes patients with any form of PAH [21,22]. Clinical parameters for PAH diagnosis were obtained from the 2015 ERS/ESC guidelines [3].
In Spain, genetic testing of patients included in the REHAP has been offered since 2011, with a focus on IPAH, HPAH, and PVOD at the beginning but later expanded to associated forms as well. This study followed the ethical principles of the European board of medical genetics. All patients signed a consent form and genetic counseling pre and post-test was offered.
The study was approved by the ethical committee for scientific research of each participant center and also by the ethical committee of the La Paz University Hospital (CEIC-HULP PI-1210).
Before genetic testing, family history information was collected, and DNA samples from the probands were extracted. Segregation analysis in patients carrying a variant of unknown significance was performed when possible. In cases of unaffected carriers, a complete diagnostic and specific follow up was performed to rule out PAH.

Statistical Analysis
Categorical variables are presented as absolute frequency and proportions, and continuous variables as the mean ± standard deviation or median (interquartile range). In order to validate the normality of the statistical distribution, the Kolmogorov-Smirnov test was applied, and to compare clinical features and continuous variables among patients, t-student was calculated. For variables that did not follow a normal distribution, the Wilcoxon signed-rank test was used. The rate of percentages was compared by Chi-square test, Fisher exact test, or Wilcoxon test, as appropriate.
Survival analysis was performed using the Kaplan-Meier analysis, matching the date of diagnosis with the date of the first diagnostic right heart catheterization (RHC). All-cause mortality or bilateral lung transplantation were defined as the endpoint and the log-rank test was used for comparison between groups. Multivariate Cox regression models identified significant predictors of mortality. A comparison of the survival rate among different groups was also performed (Supplementary Figure S2).
A two-sided p-value of less than 0.05 was considered statistically significant. Statistical analyses were done using Stata (v12.1 for Mac; StataCorp, College Station, Brazos County, TX, USA).
Our Institute (INGEMM) has developed a suite of quality control (QC) scripts that simplify data quality assessment. QC included an assessment of total reads, library complexity, capture efficiency, coverage distribution (95% at ≥20×), capture uniformity, raw error rates, Ti/Tv ratio in coding regions (typically 3.2 for known sites and 2.9 for novel sites), and distribution of known and novel variants relative to dbSNP and zygosity.
Annotation: an automated pipeline for the annotation of variants derived from targeted sequences in the capture genes was developed. INGEMM script application returned several variant annotations including dbSNPrs identification, gene names with accession numbers, predicted functional effect, protein positions and, for amino-acid changes (dbNSFP, CADD), conservation scores and several population frequency databases, and known clinical associations along with a vast array of annotations for non-coding sequences derived from ENCODE. Databases for pathogenic variants such as ClinVar, Human Gene Mutation Database (HGMD), Mastermind, LOVD, Alamut, and Varsome have also been reviewed.
After that, the variant prioritization was performed according to the filtering strategy described in Figure 1. Finally, variants were classified according to the ACMG guidelines [23]. A custom script developed in-house called "LACONv" has also been developed by INGEMM in order to analyze large genomic DNA gain and losses or copy number variation (CNVs) which can be found in the following repository (https://github.com/kibanez/LACONv) [24].
Genes 2020, 11, x FOR PEER REVIEW 4 of 14 was performed with the Illumina MiSeq platform (Illumina, San Diego, CA, USA) following the manufacturer's instructions. Genes were chosen because of their probed association with PAH or based on previous research data. An in-house pipeline for bioinformatic analysis was developed to perform quality control and variant annotation. Our Institute (INGEMM) has developed a suite of quality control (QC) scripts that simplify data quality assessment. QC included an assessment of total reads, library complexity, capture efficiency, coverage distribution (95% at ≥20×), capture uniformity, raw error rates, Ti/Tv ratio in coding regions (typically 3.2 for known sites and 2.9 for novel sites), and distribution of known and novel variants relative to dbSNP and zygosity.
Annotation: an automated pipeline for the annotation of variants derived from targeted sequences in the capture genes was developed. INGEMM script application returned several variant annotations including dbSNPrs identification, gene names with accession numbers, predicted functional effect, protein positions and, for amino-acid changes (dbNSFP, CADD), conservation scores and several population frequency databases, and known clinical associations along with a vast array of annotations for non-coding sequences derived from ENCODE. Databases for pathogenic variants such as ClinVar, Human Gene Mutation Database (HGMD), Mastermind, LOVD, Alamut, and Varsome have also been reviewed.
After that, the variant prioritization was performed according to the filtering strategy described in Figure 1. Finally, variants were classified according to the ACMG guidelines [23]. A custom script developed in-house called "LACONv" has also been developed by INGEMM in order to analyze large genomic DNA gain and losses or copy number variation (CNVs) which can be found in the following repository (https://github.com/kibanez/LACONv) [24].

Results
After the quality control analysis, 33 out of the 300 sequenced samples were discarded and 267 were subsequently filtered. We have identified 86 variants in 81 patients (32.2%). Out of them, 34 (39.5%) have been classified as pathogenic variants, 14 (16.3%) as likely pathogenic, and 38 (44.2%) as variants of unknown significance (VUS). In 186 samples non pathogenic, likely pathogenic, or VUS were detected in the analyzed genes. BMPR2 was the predominantly mutated gene with 25 pathogenic or likely pathogenic variants (29%, 25/86) followed by EIF2AK4, TBX4, and ACVRL1.

Results
After the quality control analysis, 33 out of the 300 sequenced samples were discarded and 267 were subsequently filtered. We have identified 86 variants in 81 patients (32.2%). Out of them, 34 (39.5%) have been classified as pathogenic variants, 14 (16.3%) as likely pathogenic, and 38 (44.2%) as variants of unknown significance (VUS). In 186 samples non pathogenic, likely pathogenic, or VUS were detected in the analyzed genes. BMPR2 was the predominantly mutated gene with 25 pathogenic or likely pathogenic variants (29%, 25/86) followed by EIF2AK4, TBX4, and ACVRL1.
Regarding the VUS variants detected, NOTCH3 and ABCC8 had the highest frequency of these variants ( Figure 2). Based on the etiology, 41 variants were detected in 38 IPAH individuals, 15 in 15 individuals with HPAH, and 10 in 8 individuals with PVOD. In associated -PAH, we have found 15 variants (Supplementary Tables S1 and S2). Finally, in all three patients with suspected heritable hemorrhagic telangiectasia (HHT), an AVCRL1 pathogenic variant was detected, confirming the clinical suspicion.
15 in 15 individuals with HPAH, and 10 in 8 individuals with PVOD. In associated -PAH, we have found 15 variants (Supplementary Tables S1 and S2). Finally, in all three patients with suspected heritable hemorrhagic telangiectasia (HHT), an AVCRL1 pathogenic variant was detected, confirming the clinical suspicion.
Strikingly, we have found five unrelated families in whom there was more than one candidate causative variant (Table 1, Figure 3). In two families, one of the variants was found in BMPR2 and the co-occurrence was with NOTCH3 (x2). In the other three families, the variants were located in ABCC8-NOTCH3, ABCC8-SARS2, and TBX4-SMAD1. Three of these families were classified as IPAH and the other two as PAH-CHD.

Figure 2.
Frequency of gene mutation detected after the analysis of 21 genes. (A) A total of 14% of pathogenic and likely pathogenic variants were identified, with a 15% of variants of unknown significance (VUS) and in approximately 65% of individuals no candidate gene variants were identified. After quality check, 6% of the analyzed samples were discarded due to low quality. (B) The gene with the highest pathogenic variants detected was BMPR2, followed by EIF2AK4, because many patients with PVOD were also included in the analysis. After quality check, 6% of the analyzed samples were discarded due to low quality. (B) The gene with the highest pathogenic variants detected was BMPR2, followed by EIF2AK4, because many patients with PVOD were also included in the analysis. Strikingly, we have found five unrelated families in whom there was more than one candidate causative variant (Table 1, Figure 3). In two families, one of the variants was found in BMPR2 and the co-occurrence was with NOTCH3 (x2). In the other three families, the variants were located in ABCC8-NOTCH3, ABCC8-SARS2, and TBX4-SMAD1. Three of these families were classified as IPAH and the other two as PAH-CHD.
In all consanguineous individuals with a clinical diagnosis of PVOD, we found the known Spanish founder missense pathogenic variant in EIF2AK4:NM_001013703.3:c.3344C>T(p.Pro1115Leu) in a homozygous state [15].
Additionally, in two siblings from a non-consanguineous couple, we found compound heterozygous pathogenic variants in EIF2AK4 ( Segregation analysis demonstrated that parents were carriers of each variant. One of the siblings is a Caucasian male with a clinical presumptive diagnosis of PVOD at 30 years of age. He underwent bilateral lung transplantation 3.5 years after diagnosis. His sister was also diagnosed with PVOD at 38 years of age leading to lung transplantation one year after diagnosis. We have found five patients from five unrelated families carrying a pathogenic or likely pathogenic variant in TBX4 [24]. All patients had a variable clinical expressivity with a moderate to severe reduction in diffusion capacity (DLCO).  A total of 14% of pathogenic and likely pathogenic variants were identified, with a 15% of variants of unknown significance (VUS) and in approximately 65% of individuals no candidate gene variants were identified. After quality check, 6% of the analyzed samples were discarded due to low quality. (B) The gene with the highest pathogenic variants detected was BMPR2, followed by EIF2AK4, because many patients with PVOD were also included in the analysis.  Finally, in CBLN2, we found one candidate variant in a patient with IPAH classified as VUS (CBLN2: NM_182511.3:c.263C>T:p.(Ser88Phe)).

Discussion
PAH is a multifactorial disease, in which many different phenotypes and etiologies can explain the prognosis and severity of the disease. During the last few years, mainly due to the advances in genomics technologies, many genes and genomic associations were described in PAH. Therefore, nowadays the diagnostic approach for PAH should include genetic analysis because it is crucial for follow up, genetic counseling, to grade the severity, or even for future personalized therapies [26,27]. Since 2014, we have been evaluating patients not only with IPAH, HPAH, and PVOD but also secondary forms from which there is not so much evidence for the relation with genetic mutations.
Thus, the HAP-panel v1.2 included 21 genes based on the literature, previous experience, and well-established genes in PAH, which influence the pathophysiology of the disease. We included patients with PAH, no matter the type of PAH (idiopathic, heritable, or associated forms). We found that 30.34% (81/267) of individuals with pathogenic, likely pathogenic, and/or VUS variants (Figure 2). A total of 53.41% (47/88) of these variants were classified as pathogenic or likely pathogenic, and the remaining 31.82% (38/88) VUS, meaning that we have a relatively high number of variants without clear effect. The majority of the VUS have been found in NOTCH3, a gene from which mutations have been recently associated with IPAH [9,28,29]. It has been suggested that NOTCH3 mutations are involved in proliferation and cell viability and impairs the NOTCH3-HES5 signaling pathway, a crucial pathway in the development of PAH [30].
Regarding NOTCH3, it has been suggested that the majority of the mutations in NOTCH3 cause a missense change in the protein. Out of the 38 VUS variants reported herein, 23.7% (9/38) have been found in NOTCH3, which means that functional studies and segregation analysis in available cases are strongly recommended to confirm or discard the possible effect for these variants in the protein function and pathways in which NOTCH3 is involved.
This means that, if all the NOTCH3 variants were classified as pathogenic or likely pathogenic, the frequency of mutations in this gene in PAH would be much more common than initially reported [9]. Strikingly, among the five unrelated families with a suggested digenic inheritance, two have had pathogenic variants in BMPR2 and NOTCH3, which also segregates with the disease ( Table 1). One of the index patients was classified as IPAH and clinical features of the proband showed one adult diagnosed at 39 years, with severe hemodynamic parameters (very low cardiac output). Oral dual therapy (riociguat and ambrisentan) was initiated with clinical and hemodynamic improvement. Eight years after diagnosis, he presents a low risk of death under dual oral therapy. It is important to remark that, as far as we know, there is no previous evidence of segregation of variants in BMPR2 and NOTCH3, and this fact, might explain in part, why penetrance of BMPR2 in idiopathic cases of PAH is only about 20% (14% in males and 42% in females) [6,13].
Furthermore, three additional families have been found to have two variants in two genes (Table 1, Figure 3); adding evidence that digenic inheritance may be a mechanism associated with PAH as a previous report suggested [18]. Two out of these three families were diagnosed with APAH-CDH and one with IPAH.
Two siblings from an unrelated couple were found to have a compound heterozygous mutation in EIF2AK4, each one inherited from healthy parents. Clinical features of the patients correlated with the molecular findings and with the presumptive diagnosis of PVOD.
In ABCC8, a gene initially included with minimal evidence of relation with PAH, we found nine variants (Table 2, Figure 4), seven in IPAH individuals, one APAH, and one patient with CHD. All detected variants are distributed along SUR1, the protein encoded by ABCC8, so there are no hotspots in the gene, in concordance with previous reports [31] where the mechanism suggested was the loss of the ATP-sensitive potassium channel function. Functional assays by minigenes (data not shown), confirmed that the variant p.Asp1132Asn affects the splicing by exon skipping of exon 27 [32] and the variants p.(Glu100Lys), p.(Val477Met), and p.(Glu1326Lys) do not alter the splicing and maybe the pathogenic effect is not by splicing defects. In fact, variant p.(Glu1326Lys) has been found together with another variant in NOTCH3 (NM_000435.2:c.6532C>T:p.(Pro2178Ser)), which means that further analysis must be performed to confirm the role of these changes.  Only one patient has been found to have a VUS in CBLN2, but further analysis needs to be done to elucidate the possible role of this gene in PAH.
In patients with PAH associated with CHD, we found ten variants in eight samples, three classified as pathogenic (two in BMPR2 in two siblings and one in TBX4) and six VUS in CPS1, ABCC8, SMAD5, SARS2, SMAD1, and NOTCH3 (Table 3).  Only one patient has been found to have a VUS in CBLN2, but further analysis needs to be done to elucidate the possible role of this gene in PAH.
In patients with PAH associated with CHD, we found ten variants in eight samples, three classified as pathogenic (two in BMPR2 in two siblings and one in TBX4) and six VUS in CPS1, ABCC8, SMAD5, SARS2, SMAD1, and NOTCH3 (Table 3).
The two BMPR2 mutation carriers came from unrelated families, inherited the variant from their healthy mother, and no additional candidate variants were detected in the NGS panel in both siblings, suggesting that there may be other factors influencing the PAH-phenotype in the siblings, such as second hits in other genes not included in the panel. One of these siblings is a male diagnosed with PH at 19 years of age. In the first work-up at diagnosis, an interventricular septal defect with a right to left shunt was observed. Right heart catheterization (RHC) confirmed suprasystemic PH. Considering these findings, ventricular septal defect closure was contraindicated, and PAH-targeted treatment was initiated. Clinical follow-up included a goal-oriented PAH therapy and the risk profile was assessed periodically according to the guidelines. Finally, he underwent bilateral lung transplantation four years after diagnosis.
His sister was also diagnosed with PAH at 26 years of age. At diagnosis, an interatrial septal defect with a left-to-right shunt was observed. In the RHC, PH with elevated pulmonary vascular resistance was confirmed. Dual oral PAH treatment was initiated, and she currently presents a low risk of death under oral therapy three years after diagnosis according to the hemodynamic parameters.
Previous studies suggested that pathogenic variants in BMPR2 are present in up to 7.5% of patients with PAH associated with CHD [33,34]. Furthermore, Liu et al. observed that BMPR2 pathogenic variants are more frequent in patients with pulmonary vascular disease (PVD) associated with CHD in comparison with those patients without PVD. To date, there is no effective method to predict the development of PH after the correction of a CHD. According to current guidelines, correction is contraindicated when PVD is established [35]. However, approximately 10% of patients with repaired atrial septal defect or ventricular septal defect develop PAH and therefore genetic background can contribute to explaining these differences. The role of germline mutations in these patients is unknown. These findings suggest that PAH mutations might contribute to the development of PAH in CHD patients. Genetic testing may allow us to identify high-risk patients. However, more studies are needed to assess the importance of genetic abnormalities in CHD. Table 3. Variants detected in associated PAH forms (APAH). Description of variants found in patients with APAH, specifically with congenital heart disease (CHD) and connective tissue disease (CTD). ID: patient identifier, GT: genotype, ACMG: American College of Medical Genetics. Hom: homozygous; Het: heterozygous; P: pathogenic; VUS: variant of unknown significance. In all patients where a pathogenic variant in TBX4 was detected, the clinical expressivity was highly variable, including an initial suspicion of PVOD, interstitial lung disease, pulmonary vascular abnormalities, and CHD. In one individual, a variant in SMAD1 has been also observed in addition to the TBX4 variant. This variant in SMAD1 was classified as VUS. He suffered with PAH associated with a non-repaired atrial septal defect. The variant observed in TBX4 is a very infrequent variant with an extremely low frequency in the analyzed control populations (gnomAD exomes: 0.0000281, gnomAD genomes, 1000G, Kaviar, Beacon, ESP, and Bravo) although the majority of the in silico bioinformatic tools did not suggest a deleterious effect of the variant. Thus, the relationship of the variant detected in SMAD1 and the phenotype of the patients remains unclear. The nonsense TBX4 variant detected in this patient (TBX4(NM_018488.3):c.1018C>T:(p.Arg340*)) was reported last year by Galambos et al. [36]. In their study, the two siblings in whom the same variant was detected presented a clinically specific phenotype with transient patent ductus arteriosus, a patent foramen oval, and interstitial lung disease (ILD). Strikingly, one individual did not manifest pulmonary hypertension; reflecting the possible contribution of other genetic, epigenetic, or external factors that can contribute to the development of the disease. The presence of left-to-right shunt causes an overflow in pulmonary circulation that can lead to PH. However, this does not fully explain these complex cases. It is necessary to further investigate the possible involvement of TBX4 in complex cardiac diseases such as CHD and ILD, and how it can contribute to increasing the risk of development of PH in these individuals.

ID
Additionally, we found three variants in CPS1, one in a homozygous state, a gene for which neonatal susceptibility to PAH has been suggested with an increase of pulmonary arterial pressure after a surgical repair of congenital heart defects [37]. Furthermore, polymorphisms in CPS1 have also been related to persistent pulmonary hypertension of the newborn (PPHN) [38]. All variants were classified as VUS and unfortunately, no parental samples were available for segregation analysis.
Between 15 and 30% of PAH cases are secondary to a CTD [39] such as systemic sclerosis or systemic lupus erythematosus. Five-year survival in PAH-CTD is around 44% in the US population [40]. In Spain, one-year survival in PAH-CTD is 81% [1,41]. There is increasing evidence of the contribution of genetic mutation in PAH-CTD, as previous reports suggested [17]. In our CTD-PAH cohort, we have found four variants (two classified as VUS in ABCC8 and NOTCH3, and two as pathogenic in TBX4 and GDF2) ( Table 3), adding further evidence of the potential relation between variants in well-known genes for PAH and CTD-PAH. Furthermore, as far as we know, this is the first time a variant in TBX4 is associated with CTD-PAH. All variants were found in different genes and de novo events were confirmed in segregation studies when available.
We have compared the clinical features of all patients with a confirmed genetic defect (Supplementary Tables S3-S5). We observed that those patients who carried a pathogenic or likely pathogenic variant had a higher mean pulmonary artery pressure (PAPm) and pulmonary vascular resistance (PVR), with a lower cardiac index. However, they were younger and less likely to respond to acute vasodilator testing. Regarding functional class, no statistical differences were observed in the clinical severity assessed by the WHO functional class. However, the six-minute walking test (6MWT) was statistically higher in the group of patients with pathogenic variants compared with the non-mutated group.
In terms of the overall survival rate, we have found no significant difference between both groups ( Figure 5). Additional analysis of survival by Kaplan-Meier does not show any statistical differences between patients with different gene variants except for EIF2AK4, as we expected due to the severity of the associated PAH form. Also, we have not found statistical differences in survival between idiopathic, hereditary, and associated PAH forms, nor between carriers and non-carriers with APAH and PVOD (Supplementary Figure S2). Genes 2020, 11, x FOR PEER REVIEW 11 of 14 Figure 5. Survival analysis between carriers' vs non-carriers. Kaplan-Meier analysis did not reflect statistical differences between individuals with PAH and pathogenic or likely pathogenic variants and individuals without any pathogenic, likely pathogenic variant detected.

Conclusions
To summarize, we applied a custom personalized panel of genes for PAH diagnosis, which demonstrated to be very useful to confirm a presumptive diagnosis in cases with overlapping clinical features of PAH. The panel allows the identification of variants in the PAH related genes in a reasonable turnaround time for clinical purposes. We also detected variants in genes that have been recently described in PAH, and that could potentially be new therapeutic targets for personalized medicine. Lastly, in secondary PAH, we found variants that may influence the phenotype of the disease. Further studies to confirm the role of these variants are needed.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: List of variants detected after prioritization. Table S2: List of variants detected after prioritization. Table S3: PAH Spanish Biobank cohort demographic and hemodynamic data. Table S4: PAH Spanish Biobank cohort demographic and hemodynamic data. Table S5: Clinical comparison between carriers' vs non carriers. Figure S1. Analyzed cohort of PAH. Figure S2: Survival analysis in PAH.

Conclusions
To summarize, we applied a custom personalized panel of genes for PAH diagnosis, which demonstrated to be very useful to confirm a presumptive diagnosis in cases with overlapping clinical features of PAH. The panel allows the identification of variants in the PAH related genes in a reasonable turnaround time for clinical purposes. We also detected variants in genes that have been recently described in PAH, and that could potentially be new therapeutic targets for personalized medicine. Lastly, in secondary PAH, we found variants that may influence the phenotype of the disease. Further studies to confirm the role of these variants are needed.