Impact of Genetic Variations on Thromboembolic Risk in Saudis with Sickle Cell Disease

Background: Sickle cell disease (SCD) is a Mendelian disease characterized by multigenic phenotypes. Previous reports indicated a higher rate of thromboembolic events (TEEs) in SCD patients. A number of candidate polymorphisms in certain genes (e.g., FVL, PRT, and MTHFR) were previously reported as risk factors for TEEs in different clinical conditions. This study aimed to genotype these genes and other loci predicted to underlie TEEs in SCD patients. Methodology: A multi-center genome-wide association study (GWAS) involving Saudi SCD adult patients with a history of TEEs (n = 65) and control patients without TEE history (n = 285) was performed. Genotyping used the 10× Affymetrix Axiom array, which includes 683,030 markers. Fisher's exact test was used to generate p-values of TEE associations with each single-nucleotide polymorphism (SNP). The haplotype analysis software tool version 1.05, designed by the University of Göttingen, Germany, was used to identify the common inherited haplotypes. Results: No association was identified between the targeted single-nucleotide polymorphism rs1801133 in MTHFR and TEEs in SCD (p = 0.79). The allele frequency of rs6025 in FVL and rs1799963 in PRT in our cohort was extremely low (<0.01); thus, both variants were excluded from the analysis as no meaningful comparison was possible. In contrast, the GWAS analysis showed novel genome-wide associations (p < 5 × 10⁻⁸) with seven signals; five of them were located on Chr 11 (rs35390334, rs331532, rs317777, rs147062602, and rs372091), one SNP on Chr 20 (rs139341092), and another on Chr 9 (rs76076035). Another 34 SNPs located on known genes were also detected at a signal threshold of p < 5 × 10⁻⁶. Seven of the identified variants are located in olfactory receptor family 51 genes (OR51B5, OR51V1, OR51A1P, and OR51E2), and five variants were related to family 52 genes (OR52A5, OR52K1, OR52K2, and OR52T1P).
The previously reported association between rs5006884-A in OR51B5 and fetal hemoglobin (HbF) levels was confirmed in our study, which showed significantly lower levels of HbF (p = 0.002) and a lower allele frequency (p = 0.003) in the TEE cases than in the controls. The assessment of the haplotype inheritance pattern involved the top ten significant markers with no LD (rs353988334, rs317777, rs14788626882, rs49188823, rs139349992, rs76076035, rs73395847, rs1368823, rs8888834548, and rs1455957). A haplotype analysis revealed significant associations between two haplotypes (a risk haplotype, TT-AA-del-AA-ins-CT-TT-CC-CC-AA, and a reverse protective haplotype, CC-GG-ins-GG-del-TT-CC-TT-GG-GG) and TEEs in SCD (p = 0.024, OR = 6.16, CI = 1.34–28.24, and p = 0.019, OR = 0.33, CI = 0.13–0.85, respectively). Conclusions: Seven markers showed novel genome-wide associations; two of them were exonic variants (rs317777 in OLFM5P and rs147062602 in OR51B5), and less significant associations (p < 5 × 10⁻⁶) were identified for 34 other variants in known genes with TEEs in SCD. Moreover, two common 10-SNP haplotypes with opposing effects were determined. Further replication of these findings is needed.

Introduction
Sickle cell disease (SCD) affects millions of people around the world and is most common in certain ethnicities in African, Caribbean, and Middle Eastern populations. A large proportion of Saudis, in particular people who live in the Eastern province (up to 24%), are affected by the disease or carry a single copy of the genetic trait [1,2]. It is an autosomal recessive disease caused by a point mutation: a transversion of a single nucleotide base (rs334 A>T) in the codon for the sixth amino acid of the hemoglobin β gene (HBB), located on chromosome (Chr) 11. This substitution changes the amino acid from glutamic acid to valine, producing sickle hemoglobin, which polymerizes when deoxygenated and tends to produce abnormal (crescent-shaped) red blood cells (RBCs) [3]. The sickled reticulocytes (immature RBCs) in SCD have increased adhesive properties, which may trigger the vaso-occlusive process [4]. Clogging of blood capillaries may lead to a disabling systemic syndrome, including chronic anemia, which may require frequent blood transfusions. Complications of SCD may also comprise difficult-to-treat leg ulcers, eye damage, stroke, vaso-occlusive (thrombotic) crises accompanied by acute pain, and organ infarction [5]. SCD patients may also develop chronic organ malfunctions, such as splenic sequestration crises, the formation of gallstones, and lung crises (the development of acute chest syndrome), that ultimately result in a poor quality of life, a poor prognosis, and a shorter life expectancy [6].
Several previous reports confirmed the increased risk of thromboembolic events (TEEs), both arterial and venous thromboembolism (VTE), in both sickle cell trait carriers (HbAS) and SCD patients (HbSS) in comparison to control groups [7–10]. The reported level of risk in African-Americans with sickle cell trait is approximately twofold for VTE in general and fourfold for pulmonary embolism (PE) [11]. These results were confirmed recently in the UK, where sickle cell carriers were found to be at a higher risk of VTE, in particular PE (OR = 2.27), than healthy individuals [12]. In addition, in a study that involved 7000 SCD patients in the United States, the risk of developing PE was found to be fourfold the risk in patients without SCD [13]. Similarly, higher rates of infarctive cerebrovascular accidents and hemorrhagic strokes were noted in individuals with SCD than in normal subjects [14,15]. The prevalence of TEEs in SCD patients reported in African-Americans is about 11.3% to 11.5% [16,17]; this percentage is similar to the findings seen in the Saudi population (11.3%) [18], although a larger study showed a slightly lower prevalence (8.4%) [19]. Rates are extremely low in healthy individuals, approximately 1 per 1000 (0.1%) [20,21]. The excess risk in SCD might be attributed to several common risk factors, such as the placement of central venous catheters, obesity, pregnancy, and thrombophilias, or to SCD-specific factors, such as a history of splenectomy and the genetic makeup [22].
A number of reported candidate genes demonstrate a potential role in inducing thromboembolism in several clinical conditions. Factor V Leiden (FVL), prothrombin (PRT), and methylenetetrahydrofolate reductase (MTHFR) are the genes most commonly associated with a hypercoagulable state and are involved in various thrombophilic conditions. The risk of TEEs in women positive for the FVL (G1691A, rs6025) mutation increases fivefold when they are exposed to oral contraceptives [23]. Thrombotic episodes and graft rejection were also noted in patients who underwent kidney transplantation and were heterozygous for the FVL variant [24]. Carriers of this variant are more susceptible to deep venous thrombosis (DVT) [25]. PRT (FII) (G20210A, rs1799963) was also identified as a risk factor for pulmonary embolism [26] and myocardial infarction [27]. Both variants in FVL and PRT were reported as the main contributing factors underlying recurrent pregnancy loss in Saudi females from different regions [28,29]. These two variants were also found to be more prominent in 250 TEE patients from Kashmir [30]. Furthermore, MTHFR variants (C677T (rs1801133) and A1298C (rs1801131)) were reported as risk markers in addition to FVL for developing DVT in two Iranian studies [31,32], although a previous study on a smaller number of patients failed to detect an association between DVT and MTHFR [33]. Different types of thrombotic conditions, such as portal vein thrombosis [34,35], post-cardiac surgery thrombosis [36], and arterial thrombosis [37], were significantly associated with MTHFR, in particular the C677T variant. The polymorphisms in MTHFR are common in the healthy Saudi population; 23.9% are positive for C677T and 33.9% are positive for A1298C. In contrast, a lower number of individuals carry the variants G1691A in FVL (an average of 2%) and G20210A in PRT (2%) [38].
Previous comparative genetic studies between SCD patients and the normal population revealed variable allele frequencies of variants in the selected thrombotic genes. Two Indian studies that collectively involved 391 patients versus 447 controls [39,40] indicated a higher frequency of risk alleles in both the FVL and MTHFR (C677T) genes in patients compared to controls. However, the result failed to be replicated in 180 patients from the western region of India [41]. Two other studies conducted on Brazilian SCD patients showed a higher variant allele frequency in MTHFR (C677T) but not FVL in patients than in controls [42,43]. A recent study conducted on Tunisians showed more frequent polymorphisms in PRT (G20210A) and MTHFR (C677T) in patients with SCD compared with healthy subjects [44]. On the other hand, investigating the three polymorphisms in these genes (FVL, PRT, and MTHFR) in the Saudi population from the eastern province showed nonsignificant differences between 87 patients and 105 healthy individuals [45]. Given the known role of these genes in the induction of thrombosis, we hypothesized that the selected mutations would be more frequent in SCD patients with a history of TEEs than in patients without a history of thrombosis. Thus, this study assessed the association of specific variants in FVL, PRT, and MTHFR with TEEs in Saudis with SCD and also performed a genome-wide association study (GWAS) to identify novel risk variants.

Study Design
This is a genetic association, case-control, multicenter study conducted to screen genes predicted to be associated with TEEs in SCD patients. The study was approved by the Institutional Review Boards of KAIMRC (Ref: IRBC/1414/19) and Qatif Central Hospital (QCH-SREC0216/2020). The recruited subjects were genotyped for selected variants in known thrombotic genes such as FVL (G1691A, rs6025), PRT (FII) (G20210A, rs1799963), and MTHFR (C677T, rs1801133). Moreover, a GWAS analysis comparing SCD patients with and without a TEE history was performed.

Sample Recruitment
Unrelated adult patients with SCD confirmed by a positive sickling test and homozygosity for the sickle mutation (rs334) (n = 350; 65 SCD patients with a history of TEEs plus 285 controls (SCD patients without a TEE history)) were recruited through their regular follow-up with hematology units in three different settings: (i) King Fahad Hospital (KFH), Ministry of National Guard Health Affairs (MNGHA), Riyadh (n = 106 SCD patients; 27 cases with a TEE history plus 79 controls), (ii) King Khalid University Hospital (KKUH), King Saud University Medical City, Riyadh (n = 82 SCD patients; 25 cases with a TEE history plus 57 controls), and (iii) Qatif Central Hospital (QCH), Eastern province, Qatif (n = 162 SCD patients; 13 cases with a TEE history plus 149 controls). Most of the patients who attended the KKUH and KFH hospitals in the central region (Riyadh) were referred from southwestern or northern areas of Saudi Arabia. The average age of participants at the time of recruitment was 32.7 ± 10.2 years. Demographic and phenotypic data were collected for all participants. The focus in this study was on HbSS patients; thus, others with different types of SCD, such as HbSC, HbSbeta-thalassemia, HbSD, and HbSO, were identified via hemoglobin electrophoresis and were excluded. Exclusion criteria for TEE cases involved obese patients (BMI ≥ 30), smokers, females on contraceptive pills, pregnant women, history of postpartum, recurrent miscarriages, diabetes mellitus, Behçet's disease, varicose veins, thrombophilia, chronic renal disease, ST elevation myocardial infarction (STEMI), cancer, immobilization, recent history of surgery, central venous catheter placement, admission to an intensive care unit, trauma, fracture, recent long-distance travel, elevated levels of homocysteine, protein C, protein S, D-dimer, thrombin-antithrombin, and those with abnormal prothrombin time (PT, APTT). Only two cases fit these criteria and were excluded from our study cohort: a female with a postoperative DVT and another with a PE during pregnancy. Informed consent was obtained from the participants, and blood samples were collected from each patient for a DNA analysis.

TEE Diagnosis
Patients with TEEs were diagnosed through physician clinical judgment in conjunction with confirmatory imaging such as ultrasonography or computed tomography (CT). Stroke and transient ischemic attack (TIA) were confirmed via physical and neurological examination, e.g., electroencephalogram (EEG), laboratory (blood) tests, and imaging tests such as a Doppler ultrasound, CT, or magnetic resonance imaging (MRI) scan.

Genomic Analysis
Total genomic DNA was extracted from whole blood using Puregene Blood Kits (Qiagen, Hilden, Germany, catalog number 158389) according to the supplier's instructions. An automated DNA extractor (KingFisher™ magnetic system, Thermo Fisher Scientific, Fresno, CA, USA) was used. A NanoDrop 2000/2000c (Thermo Fisher Scientific, Fresno, CA, USA) was used to measure the absorbance of DNA at 260 nm. Working DNA stocks were aliquoted at a concentration of 50 ng/µL and stored at 4 °C. Case and control samples were genotyped using the 10× Affymetrix Axiom array (Axiom 2.0 reagent kit designed by Applied Biosystems™, Waltham, MA, USA, catalog number 901758), which includes 683,030 markers for the GWAS. The GWAS association analysis involved both common (minor allele frequency (MAF) ≥ 5%) and rare (MAF < 0.5%) variants. To ensure the validity and reliability of the platform, a control sample with known variant calls was added to each test run. Axiom Analysis Suite software version 5.1 was used to cluster the genomic data; the average quality control call rate (CR) for the passing samples was 99.896%, and samples with >93% CR were retested. Genotype imputation was not performed, as no reference panel is currently available for the Saudi population. The five main HBB haplotypes (Benin, Arab/Indian, Cameroon, CAR/Bantu, and SEN) can be ascertained through the genotyping of four SNPs (rs3834466, rs28440105, rs10128556, and rs968857) [46]. However, these SNPs are not included in the GWAS panel used. Thus, the association between HBB haplotypes and TEEs was not assessed.
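The two quality metrics named above, the per-sample call rate and the minor allele frequency, can be illustrated with a minimal Python sketch. It is not the Axiom Analysis Suite procedure; the function names and the genotype encoding (0, 1, or 2 copies of the alternate allele, with None for a missing call) are assumptions made only for illustration.

```python
def call_rate(calls):
    """Fraction of markers with a non-missing genotype call for one sample.

    `calls` is a list with one entry per marker; None marks a failed call
    (hypothetical encoding, chosen for this sketch).
    """
    if not calls:
        raise ValueError("no markers supplied")
    called = sum(1 for g in calls if g is not None)
    return called / len(calls)


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP across samples.

    `genotypes` holds the alternate-allele count (0, 1, or 2) per sample,
    or None when the call is missing; missing calls are ignored.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no called genotypes for this SNP")
    alt_freq = sum(observed) / (2 * len(observed))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


# Toy data, not taken from the study.
sample_calls = [0, 1, None, 2]
snp_genotypes = [0, 1, 2, None, 0]
print(call_rate(sample_calls))
print(minor_allele_frequency(snp_genotypes))
```

With these two helpers, a marker could be labelled common or rare by comparing its MAF with the cutoffs stated in the text.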

Sample Size Calculation
Minitab version 16 was used to calculate the sample size needed for a representative sample of the Saudi population to provide a statistical power of 80% at a significance cutoff of p = 0.05. Our calculations based on MAF showed that genotyping of 337 cases was needed for the FVL variant and 222 cases for the MTHFR variant to provide the suggested study power (Table 1). Therefore, we decided to recruit 350 cases in total.

Statistical Analysis
Plink software version 1.9 was used for the analysis of genomic data and for calculating linkage disequilibrium (LD) between SNPs. Samples with genotypes that were not in Hardy-Weinberg equilibrium (p < 0.05) were removed from the study. The p-values were calculated for categorical covariates such as genotype differences between different groups using Fisher's exact test (GraphPad Prism version 5.0). For GWAS, a p-value < 5 × 10⁻⁸ was set as a strict significance threshold to identify risk loci [47]. However, markers with p < 5 × 10⁻⁶ were also considered. Means and standard deviations were calculated to assess age matching, and the differences were assessed using the two-independent-sample t-test. Manhattan, quantile-quantile (Q-Q), and principal component analysis (PCA) plots were generated using the R statistical package (qqman) version R-4.2.2. The permutation test (T1) was used to assess sample heterogeneity based on pairwise identity-by-state (IBS) distance. The haplotype analysis software tool version 1.05, prepared by Eliades N-G. and Eliades D. G. [48], was used to determine the common significant haplotypes.
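As an illustration of the per-SNP test described above, the following minimal Python sketch computes a two-sided Fisher's exact p-value for a 2×2 allele-count table and compares it with the two thresholds used in this study. It is a standard-library reimplementation for clarity, not the Plink or GraphPad routine; the function names and the allele counts in the usage lines are hypothetical.

```python
from math import comb

GENOME_WIDE = 5e-8   # strict threshold stated in the text
SUGGESTIVE = 5e-6    # relaxed threshold stated in the text


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows could be cases/controls and columns minor/major allele counts.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tolerance for floating-point ties
            total += p
    return min(1.0, total)


def classify(p):
    """Label a p-value against the two thresholds used in the study."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# Hypothetical allele counts: 65 cases give 130 alleles, 285 controls give 570.
p_value = fisher_exact_two_sided(40, 90, 60, 510)
print(p_value, classify(p_value))
```

The exact test is a natural choice when some cells are small, as happens for low-frequency alleles in a cohort with only 65 cases.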

Results
As shown in Table 2, matching was confirmed for age (35.7 ± 9.8 years vs. 34.4 ± 10.3 years, p = 0.35) and sex (49.2% vs. 56.8% females, p = 0.27) between cases and controls, respectively. The most frequent TEEs among the cases were stroke (41.5%), PE (38.5%), and DVT (29.2%). The recruited patients with TEEs were mainly from the southwestern province (78.5%), with few cases from the eastern province (21.5%). A Q-Q plot of the observed versus expected p-values showed a late departure of the observed p-values from the null (Supplementary Figure S1), indicating that the obtained results were not affected by genotyping quality, sample relatedness, or population stratification. Furthermore, the PCA plot indicated homogeneous clusters of genetic variations that characterize the case and control cohorts (Figure 1). The genotype data of participants shown on the scatter plot revealed a matched background distribution. This was confirmed by the permutation test (T1) (p = 0.93), which ruled out stratification effects between cases and controls. No significant difference was seen in the allele frequency of the MTHFR (rs1801133) polymorphism between the cases and control group (p = 0.74; Table 3). Moreover, the other two candidate genes, FVL (rs6025) and PRT (rs1799963), were not included in the analysis as the allele frequency of their variants was extremely low (<0.01). No meaningful comparison could be achieved in such a condition. Five SNPs (rs2229637 in ITPR3, rs10998957 in LINC02651-RPL5P26, rs10746487 in H6PD-SPSB1, rs1985317, and rs6771316 in LINC00877) were previously tested in GWASs among French [49], African-American [50], and other mixed populations [51] that showed associations at p < 5 × 10⁻⁶ between the tested markers and venous thromboembolism (VTE). These results were replicated in our Saudi cohort but at lower significance levels (p = 0.016, 0.024, 0.028, 0.0496, and 0.044, respectively). Other variants in various genes were previously reported among the German population [52] (SLCO1B1 (rs4149056), PRIM1 (rs2277339), APOB (rs676210), TYK2 (rs12720356), TSEN15 (rs1046934), CYP4F2 (rs2108622), and MST1 (rs3197999)), and the Brazilians (ADAMTS (rs1364044)) [53] as risk factors for stroke in SCD patients with p-values < 1.0 × 10⁻⁵ in GWASs. These SNPs were tested in our stroke patients (n = 27), and no significant associations were detected except for rs1364044 in ADAMTS12 (p = 0.75, 0.74, 0.55, 1.0, 0.53, 0.55, 0.38, and 0.036, respectively).

Discussion
Major progress has been made in the genetics field since the advent of GWASs, which allow genetic testing across the whole genome with improved resolution to identify variations with the highest level of association [61]. A GWAS usually requires thousands of cases and controls to detect modest to strong associations with sufficient statistical power [62]. However, for very strong associations, small numbers of cases (approximately 50 to 100) may suffice [63,64]. Various GWASs conducted on SCD patients have succeeded in identifying several distinct loci predicted to be additional phenotypic modifiers. For example, rs3115229 at Chr 4 showed a significant association with acute, severe vaso-occlusive pain in children with SCD [65]. Furthermore, GWAS data from two African cohorts showed significant associations between variable HbF levels and variants in GLP2R and near BCL11A and HBS1L-MYB [66,67].
A head-to-head single SNP comparison of variants located in selected thrombotic genes (FVL, PRT, and MTHFR) between our cases and controls showed no association with TEEs in SCD patients. The genotyping results related to MTHFR (rs1801133) were not consistent with TEE associations reported previously in other medical conditions among Iranians [31,32], Japanese [34], Italians [35], Americans [36], and Georgians [37]. This may indicate that the MTHFR variant has no general role in TEE susceptibility. In contrast, the genotyping comparisons for FVL and PRT variants in our study cohort were considered unreliable due to the extremely low frequency of their alleles (<0.01). In such a condition, inductive reasoning is not possible, and therefore the analysis of both variants was omitted from the Results. These variants in FVL and PRT are known thrombotic factors for TEEs outside of SCD across various ethnic groups, including the Saudi population, as supported by previous studies [23–30]. A recent meta-analysis that included 18 studies from mixed populations comparing 30,234 VTE cases and 172,122 controls detected FVL (rs6025) as the highest signal marker (p = 1.4 × 10⁻¹⁸⁸) [51]. The variant rs1799963 (in LD with rs191945075) in PRT was also reported with a very strong association (p = 9.5 × 10⁻³²). Thus, our data do not rule out the possibility of the involvement of both markers in TEE development. Multiple variants in the SLCO1B1, PRIM1, APOB, TYK2, TSEN15, CYP4F2, and MST1 genes were previously suggested as risk factors for stroke with p-values < 1.0 × 10⁻⁵ in a GWAS on German pediatric patients [52]. Our study on a larger number of adult stroke patients (n = 27) revealed no associations between the suggested markers and stroke. In contrast, the association between rs1364044 in ADAMTS and stroke in SCD, reported in pediatric patients from Brazil [53], was detected among our adult Saudis with stroke and SCD (p = 3 × 10⁻⁶ vs. 0.036, respectively). This finding may imply a role for the variant in stroke induction, specifically in SCD. An association suggested by Arning et al. for another stroke marker (rs2084898 in TRIM29) [52] was also confirmed in our study. Furthermore, GWAS signals at the p < 5 × 10⁻⁶ threshold for five SNPs (rs2229637 in ITPR3, rs10998957 in LINC02651-RPL5P26, rs10746487 in H6PD-SPSB1, rs1985317 in an intergenic region, and rs6771316 in LINC00877) were reported previously with TEEs [49–51]. These signals were detected in our study as well, but at lower association levels. This replication provides evidence that the reported associations are not chance findings.
Genome-wide significant or close to significant associations with 41 markers were identified in the current study, with many of them (73.2%) located on Chr 11. SCD is a monogenic disorder involving Chr 11 [3], with various complications also predicted to be partially influenced by parental genetics. Thus, it is not surprising that the majority of genetic modifiers were seen on Chr 11. Four of the top detected signals that met the GWAS significance threshold are intronic variants with no previous phenotypic association reports. Introns are noncoding loci; however, some intronic variants may play a role in altering gene function by disrupting the RNA splicing process that takes place at exon-intron boundaries [68].
Our data also showed a possible association with rs317777 in a pseudogene (OLFM5P). Pseudogenes are DNA segments that have lost their coding ability; hence, they have no direct impact on phenotype occurrence but have the potential to influence the expression and activity of other coding genes possibly related to phenotypic diversity [69]. The associations seen with variants in olfactory receptor genes, family 51 (OR51B5, OR51V1, OR51A1P, and OR51E2) and family 52 (OR52A5, OR52K1, OR52K2, and OR52T1P), represented 29.3% of the suggested risk variants. The results highlighted frameshift and missense variants in two olfactory receptor genes (rs147062602 in OR51B5 and rs7933549 in OR51V1). These genes are homologous (paralogs) and have closely similar structures and functions [70]. A frameshift variant is an insertion or deletion of a number of base pairs not divisible by three; it shifts the reading frame and often introduces a premature stop codon that truncates the encoded protein [71]. Previous reports have identified three variants in the olfactory receptor region associated with various phenotypes: rs7948471-A in OR51B5 associated with a higher degree of hemolysis in SCD (p = 3 × 10⁻¹⁰), rs5006884-A in OR51B5 associated with an increased HbF level (p = 3 × 10⁻⁸) [72], and rs7950726-A in OR51V1 associated with variable HbA2 levels in healthy adults (p = 1 × 10⁻¹¹) [73]. In line with these findings, our results confirmed the association of rs5006884-A in OR51B5, where it was significantly less frequent in the cases with lower HbF levels than in the controls. This may suggest a link between HbF levels and TEE development. Further findings in our study point to a role for olfactory receptors in TEEs. For instance, two of the detected markers were found to have a strong LD with a missense variant (rs2472530) in another olfactory receptor gene (OR52A5). Similarly, another suggested risk variant (rs12286769) is located in the upstream transcriptional region near OR52K2. A different variant (rs569117290) in this gene was previously reported with variable mean corpuscular hemoglobin concentrations [74]. The mutations in the olfactory receptor region (OR genes) may modulate the chromatin structure of the CTCF binding site within the β-globin gene and interfere with the gene-receptor interaction [75]. Moreover, selected variants within OR genes may regulate the expression of HBG [76]. A few olfactory receptor genes, such as OR2L13, OR4D6, and OR1N1, were recently suggested as risk factors for a number of TEE incidents [77]. A functional analysis indicated an upregulation of the transduction pathway through which the signaling of olfactory receptors modulates platelet activation underlying TEE [77,78].
A splice acceptor variant (rs73395847) in the C11orf40 gene was among the top-detected markers in our study that could possibly impact TEE development. Several mutations in C11orf40 were reported previously in patients with fibromyalgia syndrome, but the gene function is not clearly known [79]. This study also suggested that two variants (rs73402629 in HBG1 and rs2071348 in the human β-globin locus (HBBP1)) might be relevant to TEE risk (p = 1.41 × 10⁻⁶ and 3.25 × 10⁻⁶, respectively). The variant rs2071348 was previously reported as a predictor for disease severity in anemic patients with β⁰-thalassemia/hemoglobin E (p = 3 × 10⁻¹⁵, OR = 4.05) [80], whereas a strong association between HBG1 (rs998870472) and low hemoglobin levels was recently reported (p = 1 × 10⁻²⁷³) [81]. HBBP1 (a pseudogene) and HBG1 are both involved in the interaction of the locus control region with globin genes, which is a critical step for γ-globin regulation in adults [82]. Furthermore, rs6554634 is a predicted transcription factor binding site located approximately 12 kilobases (kb) upstream of SLC6A19, denoted in our study as a possible marker for TEE susceptibility. Certain cis-regulatory elements can be found hundreds of kb away from the actual transcriptional site [83]. The variant rs6554634 is not currently reported in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/, accessed on 1 January 2023), but was suggested as a biomarker affecting patients' response to cetuximab in the treatment of colorectal cancer (p = 1.76 × 10⁻⁶) [84]. Two haplotypes of independent markers showed significant associations: haplo-91, with a sixfold risk of TEEs in the carriers, and haplo-22, with a protective effect. The haplotype analysis involved only the top ten independent risk loci. In addition, two-thirds of the suggested variants (n = 26) showed genome-wide significance when the GWAS analysis was restricted to stroke cases only. This may imply a role for these variants in the induction of thromboembolism. The recruitment of further cases may confirm this. Four common SNPs were previously suggested to assess HBB haplotype frequencies [46], but these SNPs were missing from our GWAS markers, so it was not possible to assess the haplotype frequencies in our study cohort. A recent study on the Saudi population with SCD identified the common HBB haplotypes, where the Arab/Indian haplotype was predominant in the patients from the eastern province, whereas the Benin haplotype was most common in the southwestern individuals [85]. Moreover, the study showed a higher incidence of stroke in 318 Southwesterners with SCD than in the 159 examined patients from the eastern region. A previous study on Egyptians indicated a higher risk of stroke in homozygous Benin/Benin than in other haplotypes [86]. Consistent with this finding, we noticed a higher rate of TEE cases in the southwestern participants (34.4%), who are known to inherit the Benin haplotype more frequently, in comparison to the eastern patients (8.4%).
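To make the haplotype effect sizes discussed above easier to interpret, the following minimal Python sketch shows one common way to obtain an odds ratio with a 95% confidence interval from a 2×2 carrier table (the Woolf log-normal method). The text does not state how the reported intervals were computed, so this is an assumption; the function name and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-normal) confidence interval.

    a: carrier cases,    b: non-carrier cases,
    c: carrier controls, d: non-carrier controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction avoids division by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study's haplotype table.
print(odds_ratio_ci(10, 20, 5, 40))
```

An interval built this way is symmetric around the odds ratio on the log scale, so the geometric mean of its limits equals the odds ratio. The reported intervals (1.34–28.24 around 6.16 and 0.13–0.85 around 0.33) are roughly of that form, which is compatible with, but does not prove, such a method. An odds ratio above 1 with an interval excluding 1 indicates risk, and one below 1 indicates protection.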
This study included the largest genome-wide scan of Saudi SCD patients, and by considering markers that just failed to meet the normal GWAS threshold, it identified some interesting novel variants and a haplotype as possible risk factors; however, further work is needed to replicate the findings in an independent cohort to confirm that the detected associations are true signals and not related to technical or methodological bias [87]. Some of the detected signals in GWAS are located in noncoding regions, but these may influence the binding between enhancer elements and transcription factors, which ultimately modulates gene expression [88]. Functional studies are further needed to interpret GWAS findings and provide possible mechanisms through which the detected variants impact TEE development [89].

Conclusions
This study showed no impact of the known thrombotic gene, MTHFR, on TEEs in Saudi patients with SCD. However, significant GWAS associations, at levels of p < 5 × 10⁻⁶, were identified between 41 variants (30 of them on Chr 11) and TEEs in SCD. Seven of the markers showed novel and stronger associations (p < 5 × 10⁻⁸), and two of them were exonic variants (rs317777 in OLFM5P and rs147062602 in OR51B5); these findings need further replication studies to be confirmed.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14101919/s1, Figure S1.

Informed Consent Statement:
Written informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The detailed research data are unavailable due to privacy restrictions.

Figure 1. PCA plot showing sample stratification. The first two principal components (PC1 and PC2) of the cases in red and the controls in blue were plotted. The permutation test (T1) confirmed the samples' heterogeneity (p = 0.93).

Figure 2. Manhattan plot for SNP associations with TEEs in the Saudi SCD cohort. The GWAS association p-values of the detected SNPs were plotted across the 23 chromosomes. Only the top marker on each chromosome was annotated. The SNPs which met the genome-wide significance threshold (p < 5 × 10⁻⁸) are shown just above the red line. The rs ID numbers of some SNPs with p < 5 × 10⁻⁶ (above the blue line) are shown. The SNPs were plotted in two different colors (black and grey dots) to distinguish between the chromosomes.

Table 5. The common 10-SNP * haplotypes (observed in ≥5 individuals) seen in the tested SCD cohort of 65 cases and 285 controls. Column headers: Number of Observations; Haplotype; Haplotype Code.

Figure S1: GWAS results plot of SCD cohort (65 TEE cases vs. 285 controls). Q-Q plot of the observed and expected p-values generated from an allelic genetic model which involved a set of 683,030 variants. The expected values plot is shown in red in comparison to the observed values plot in blue.
Supplementary Excel Sheet S1: Linkage disequilibrium between 27 SNPs associated with TEE in SCD among the Saudis.
Supplementary Excel Sheet S2: SNPs associated with stroke (n = 27) in SCD among the Saudis.
Supplementary Excel Sheet S3: SNPs associated with TEE cases (n = 65) in SCD among the Saudis.

Author Contributions:
M.A.A. prepared the study proposal and demonstrated the importance of the project's concept in collaboration with F.H.A.Q., M.A. and S.A.; M.A.A. set up the inclusion criteria, and B.A. managed the financial matters and logistics. M.A.A., S.A., A.A.Z., H.H.A.S., F.H.A.Q. and M.A. managed patients' recruitment and the collection of study samples. H.A.A. performed DNA extractions and managed their storage. D.A. performed the genotyping experiments. M.A.A. performed the genetic analysis. M.R. generated the statistical plots (Manhattan, QQ, and PCA). The writing of the original draft was driven by M.A.A. and S.A. Reviewing and editing of the manuscript were conducted by F.H.A.Q., S.A. and A.K.D.; A.K.D. participated effectively in editing the manuscript and producing the final version. All authors have read and agreed to the published version of the manuscript.

Funding:
This study was funded by the King Abdullah International Medical Research Center (KAIMRC), Riyadh, Saudi Arabia, with award number RC19-083-R.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Boards of KAIMRC (Ref: IRBC/1414/19) and Qatif Central Hospital (QCH-SREC0216/2020).

Table 1. Sample size calculation based on MAF of FVL (rs6025) and MTHFR (rs1801133) at cut-off significance p-value of 0.05.

Table 2. Demographic data of SCD patients who developed TEEs versus patients with no TEE history.
Genes 2023, 14, x FOR PEER REVIEW

Table 3. Association results of selected SNPs (reported previously as risk factors for thrombosis) with TEE cases in our SCD cohort.

Table 4. Markers with the highest association significance suggested to impact TEEs in SCD. Signals located on known genes with an association threshold of p < 5 × 10⁻⁶ (n = 34).
