Genetic Modifiers of Sickle Cell Anemia Phenotype in a Cohort of Angolan Children

The aim of this study was to identify genetic markers in the HBB cluster; the HBS1L-MYB intergenic region; and the BCL11A, KLF1, FOX3, and ZBTB7A genes associated with the heterogeneous phenotypes of Sickle Cell Anemia (SCA) using next-generation sequencing, as well as to assess their influence and prevalence in an Angolan population. Hematological, biochemical, and clinical data were considered to determine patients’ severity phenotypes. Samples from 192 patients were sequenced, and 5,019,378 high-quality variants were registered. A catalog of candidate modifier genes that clustered in pathophysiological pathways important for SCA was generated, and candidate genes associated with increasing vaso-occlusive crises (VOC) and with lower fetal hemoglobin (HbF) were identified. These data support the polygenic view of the genetic architecture of SCA phenotypic variability. Two single nucleotide polymorphisms in the intronic region of 2p16.1, harboring the BCL11A gene, are associated with decreasing HbF at genome-wide significance. A set of variants nominally associated with increasing VOC was also identified; these are potential genetic modifiers of phenotypic variation among patients. To the best of our knowledge, this is the first investigation of clinical variation in SCA in Angola using a well-customized and targeted sequencing approach.


Introduction
Sickle Cell Disease (SCD) is a group of inherited diseases where a single nucleotide substitution in the gene HBB causes an amino acid substitution from glutamic acid to valine in the β-globin subunit. This substitution affects hemoglobin behavior, forming polymers under deoxygenated conditions [1], and patients are predisposed to vaso-occlusion, ischemia, hemolysis, and inflammation [2]. It is estimated that worldwide, each year, 300,000 babies are born with SCD, with more than three-fourths of these cases being reported in Sub-Saharan Africa [3].
The most common and severe form of SCD is Sickle Cell Anemia (SCA), where two βS alleles are present. Chronic hemolytic anemia, frequent painful crises, and extensive organ damage are common in these patients, although they tend to present very heterogeneous phenotypes with different levels of severity and life expectancy [1].
Fetal hemoglobin (HbF) is an important modulator of the SCA phenotype, having an impact on the clinical and hematological features of this disease, as high levels of

Sample Characterization
The data were analyzed according to two phenotype stratifications. Children with previous stroke and mean LDH > 664 U/L (measured at three different routine appointments) were classified as having the Hemolytic phenotype (n = 21, mean age 6.38 ± 2.20), children with no previous stroke and a previous vaso-occlusive/painful crisis were classified as having the VOC phenotype (n = 138, mean age 6.75 ± 2.54), and the remaining children were classified as having the less severe phenotype (n = 33, mean age 6.21 ± 2.55). Children with HbF ≥ 7.65% (third-quartile value) were included in the High-HbF phenotype (n = 48, mean age 5.90 ± 2.60), and children with HbF < 7.65% were included in the Low-HbF phenotype (n = 143, mean age 6.84 ± 2.43). Data are presented as mean (SD). The t-test was used to compare the means between two independent groups, and the non-parametric Kruskal-Wallis test was applied when comparing three separate groups. Bonferroni adjustments were used for multiple testing.
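The stratification rules above can be written as a small decision function. The following is a minimal sketch, assuming the rules are applied literally and in the order stated; the function and argument names are hypothetical, and only the two thresholds are taken from the text.

```python
# Sketch of the phenotype stratification rules described above.
# All names are hypothetical; only the thresholds come from the text.

LDH_THRESHOLD_U_PER_L = 664.0  # mean LDH across three routine appointments
HBF_THRESHOLD_PERCENT = 7.65   # third-quartile HbF value in this cohort


def severity_phenotype(previous_stroke: bool, mean_ldh: float, previous_voc: bool) -> str:
    """Return the severity group, checking the rules in the order the text gives them."""
    if previous_stroke and mean_ldh > LDH_THRESHOLD_U_PER_L:
        return "Hemolytic"
    if not previous_stroke and previous_voc:
        return "VOC"
    # Everything else falls into the remaining group.
    return "Less severe"


def hbf_phenotype(hbf_percent: float) -> str:
    """Return the HbF group; a value equal to the threshold counts as High-HbF."""
    return "High-HbF" if hbf_percent >= HBF_THRESHOLD_PERCENT else "Low-HbF"
```

Under this literal reading, a child with a previous stroke but mean LDH at or below the threshold is assigned to the less severe group, because neither of the first two rules applies.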

Targeted Sequencing
After DNA extraction using the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) and quantification with Qubit™ dsDNA HS fluorometric assay (ThermoFisher Scientific Inc., Waltham, MA, USA), the samples were sequenced with a custom enrichment panel (Supplementary Table S1A). Paired-end sequencing was performed on the NextSeq550 equipment (Illumina, Inc., San Diego, CA, USA) using the NextSeq 500/550 Mid-Output kit v2 (300 cycles). Reads were aligned with the reference GRCh37/hg19 human genome.

Variant Calling Quality Control and Annotation
Joint variant calling was conducted using GATK and BCFTOOLS [7,8]. We applied VariantMetaCaller [9] to combine and optimize the accuracy of variant calls based on the consensus of their statistical properties and discovery. The resulting VCF files were filtered using the GATK tool "VariantFiltration".

Network and Enrichment Analysis
From the obtained candidate lists of predicted mutant variants, we reconstructed their functional, physical, and co-expression interaction networks using GeneMANIA [29]. We further examined how the genes within the constructed networks were associated with human phenotypes, pathways, biological processes, and molecular functions using Enrichr [30]. The most significant pathways enriched for genes in the networks were selected from various bioinformatics databases [30]. Gene ontology terms and annotations from the Gene Ontology databases were extracted for cellular components, biological processes, and molecular functions.

Principal Component Analysis (PCA)
To evaluate the extent of substructure within the Angolan SCA cohort, we leveraged the curated phased-haplotype dataset of the 192 samples, produced with Eagle [31], to perform genetic structure analysis based on Principal Component Analysis (PCA) using smartpca [32]. Genesis software (http://www.bioinf.wits.ac.za/software/genesis, accessed on 10 January 2024) was used to plot the PCA.
We further performed a PCA to investigate the genetic structure of the Angolan HbF patients relative to other population groups. We accessed VCF files from the 1000 Genomes Project (1KGP) Consortium, 2015, and the African Genome Variation Project (AGVP), which recently characterized the admixture across 18 ethnolinguistic groups from Sub-Saharan Africa [33]. A quality control check was conducted on these VCF files using Plink [34], and we ultimately retained 2504 and 2428 samples from 1KGP and AGVP, respectively. Based on sample description (population or country labels), population ethnolinguistic information [35,36] was used to categorize the data by ethnolinguistic cultural group, as described in Supplementary Table S2, resulting in 20 ethnolinguistic cultural groups plus our samples. The first 20 principal components were computed with smartpca from the EIGENSTRAT package, comparing Angolan SCA patients with these groups; a second PCA compared SCA patients among themselves, and phylogenetic trees were also plotted.

Distribution of Minor Allele Frequency and Gene-Specific SNP Frequencies
The distribution of the minor allele frequency (MAF) was investigated to examine the extent of common and rare variants across nine selected ethnic groups (KhoeSan, Niger-Congo Bantu, Niger-Congo Volta Niger, Niger-Congo West, European South, European-USA, East Asian, South Asian, and African-American) and the Angolan SCA patient group. Similarly, a second comparison was conducted among the Angolan SCA groups only, including SCA VOC, SCA Hemolytic, and SCA Low and High HbF. To this end, the proportion of minor alleles was categorized into six ranges (0-0.05, >0.05-0.1, >0.1-0.2, >0.2-0.3, >0.3-0.4, >0.4-0.5) with respect to each ethnic group with a disease. The MAF per SNP for each category was computed using Plink software. Furthermore, the fraction of gene-specific SNP frequency for each gene was computed, assuming SNPs upstream and downstream within a gene region are close and possibly in Linkage Disequilibrium (LD), obtained from the dbSNP database [17]. MAF per SNP was aggregated as per our previous studies [37,38].
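The six-range categorization above can be illustrated in a few lines. This is a sketch only, not the Plink computation used in the study: the function names are hypothetical, and the input allele counts are assumed to be available per SNP.

```python
from collections import Counter

# Upper bounds and labels of the six MAF ranges listed in the text.
MAF_BINS = [
    (0.05, "0-0.05"),
    (0.1, ">0.05-0.1"),
    (0.2, ">0.1-0.2"),
    (0.3, ">0.2-0.3"),
    (0.4, ">0.3-0.4"),
    (0.5, ">0.4-0.5"),
]


def minor_allele_frequency(alt_count: int, total_alleles: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    freq = alt_count / total_alleles
    return min(freq, 1.0 - freq)


def maf_bin(maf: float) -> str:
    """Return the label of the range that contains the given MAF."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    for upper, label in MAF_BINS:
        if maf <= upper:
            return label
    raise AssertionError("unreachable: 0.5 is the last upper bound")


def maf_bin_proportions(mafs: list) -> dict:
    """Proportion of SNPs falling in each of the six ranges."""
    if not mafs:
        raise ValueError("at least one MAF value is required")
    counts = Counter(maf_bin(m) for m in mafs)
    return {label: counts.get(label, 0) / len(mafs) for _, label in MAF_BINS}
```

Calling `maf_bin_proportions` once per population or phenotype group gives the per-group distributions that are compared across groups.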

Identity by Descent (IBD) and Functional Genomics
Leveraging the 192 Angolan SCA samples, we examined the overall genomic identity by descent (IBD) sharing between pairs of SCA patients, aiming to identify genomic regions of interest or long shared segments. After phasing the data using Eagle 2.0 [31], we inferred IBD segments with the Refined IBD algorithm [39]. The genomic IBD segments among the 192 Angolan SCA patients were evaluated, and the shared segments between the SCA groups (VOC, Hemolytic, Low/High HbF) were compared. A cut-off of 250 kb was applied to retain segments of shared IBD, and genes were mapped to these genomic regions to examine their potential functional biological network and, in addition, their functional partners. Additional enrichment analyses were explored to gain insight into potential disease-compromised networks.
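The length filter and gene mapping described above can be sketched as follows. The record layout, the function names, and the choice to treat the 250 kb cut-off as inclusive are assumptions made for illustration; they do not reproduce the Refined IBD output format.

```python
from dataclasses import dataclass

MIN_SEGMENT_BP = 250_000  # 250 kb cut-off stated in the text


@dataclass(frozen=True)
class IbdSegment:
    """One shared segment between a pair of samples (hypothetical layout)."""
    sample_a: str
    sample_b: str
    chrom: str
    start: int  # base-pair position where the shared segment begins
    end: int    # base-pair position where the shared segment ends

    @property
    def length_bp(self) -> int:
        return self.end - self.start


def retain_long_segments(segments, min_bp: int = MIN_SEGMENT_BP):
    """Keep only segments at least min_bp long (inclusive, by assumption)."""
    return [s for s in segments if s.length_bp >= min_bp]


def genes_in_segment(segment: IbdSegment, genes):
    """Names of genes overlapping the segment; genes are (name, chrom, start, end)."""
    return [
        name
        for name, chrom, g_start, g_end in genes
        if chrom == segment.chrom and g_start <= segment.end and g_end >= segment.start
    ]
```

The gene names returned for the retained segments would then be the input to the network and enrichment steps.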

HbF Association Testing
HbF association testing was performed using EMMAX [40] on the curated dataset obtained by following genetic association quality-control guidelines. EMMAX was run to detect possible associations, and we generated a pairwise relatedness matrix from the dataset, representative of the sample structure, using EMMAX-kin. For the SNPs tested for association with HbF, we used a genome-wide significance level of 0.05/m, where m is the total number of tested variants.
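The significance level 0.05/m is a Bonferroni correction over the m tested variants. A minimal sketch of applying it to a set of association p-values follows; the function names and variant identifiers are hypothetical, and treating a p-value equal to the threshold as significant is an assumption.

```python
def genome_wide_threshold(n_tested_variants: int, alpha: float = 0.05) -> float:
    """Bonferroni-corrected significance level alpha / m."""
    if n_tested_variants <= 0:
        raise ValueError("the number of tested variants must be positive")
    return alpha / n_tested_variants


def significant_variants(p_values: dict, alpha: float = 0.05) -> list:
    """Variant IDs whose p-value is at or below alpha / m, where m = len(p_values)."""
    threshold = genome_wide_threshold(len(p_values), alpha)
    return sorted(variant for variant, p in p_values.items() if p <= threshold)
```

For example, with one million tested variants the threshold is 0.05 / 1,000,000 = 5 × 10⁻⁸.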

Meta-Analysis of Angolan HbF and Other African Ancestry HbF
To identify associations with small effect sizes, which are not usually detected by standard genetic association methods, summary statistics from Tanzania [41] and African-Americans [42] were combined with those from our study in a single association dataset. A fixed effects model [43] based on inverse-variance weighted effect size was used to combine the log odds ratio and standard error from the combined GWAS summary statistics dataset. Random effects and binary effects models, as described in the MetaSoft program, were also applied [43], and the p-values from the fixed effect model and M-values (the posterior probability that the effect exists in the study) were used to assess the level of significance. Variants were retained as significant for M-values > 8.5 across all the studies and fixed effect p-values less than or equal to 0.05/M, where M is the total number of variants tested in the meta-analysis.
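The inverse-variance weighted fixed effects combination can be illustrated for a single variant. This is a sketch of the standard formula, not the MetaSoft implementation; the function name and return layout are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed effects estimate for one variant.

    betas: per-study effect sizes (for example, log odds ratios).
    standard_errors: per-study standard errors of those effect sizes.
    Returns (pooled effect, pooled standard error, z-score, two-sided p-value).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect size")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value
```

Studies with smaller standard errors dominate the pooled estimate, which is why a consistent small effect across cohorts can reach significance even when no single cohort does.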

Rare-Variant Association and Burden Tests
To account for rare variants and sample size, and to leverage possible effects from variants not included in the association test and meta-analysis above and from those not meeting the genome-wide significance level, an optimal unified sequence kernel association test (SKAT-O) [44], aggregating SNP effects at the gene level, was performed to discriminate quantitative traits appropriately. We utilized the linear weighted kernel within SKAT-O and set the missing cut-off to 0.9 to calculate the permutation p-value while adjusting for age and principal components.

Estimating Functional Heritability from GWAS Dataset
Approaches based on association summary statistics have gained critical interest in the "Omics" era due to the privacy advantages they present and, particularly, their reduction of computational cost [45,46]. We applied LDAK [47] to estimate the functional SNP-heritability of HbF from summary statistics. Briefly, we excluded the Major Histocompatibility Complex (MHC) region (25,000,000-40,000,000) on chr6 and the sickle cell (HbS) region on chr11:2,500,000-6,500,000 to avoid potential biases. We constructed a Genomic Relatedness Matrix (GRM) from pruned, high-quality, independent autosomal SNPs (independent pairwise 50 10.2) and obtained a list of samples with a relatedness threshold >5%. We then computed GRMs using all SNPs for each cohort and excluded one of any pair of samples with relatedness threshold >5%, and the functional enrichment and SNP-heritability were estimated as recommended [48].

Results

In Silico Mutational Burden of Genes in Participants
To examine potential genetic modifiers, we performed mutation prioritization and examined in silico how biological functional pathways relate to these mutations by reconstructing their physical, functional, and co-expression networks and performing enrichment analysis. Among 192 SCA patients, we detected significant differences in the burden of non-synonymous, function-altering variants in a total of 26 genes (Supplementary Table S1B,C). A total of 5,019,378 variants (1.7% insertion, 1.9% deletion, 5.4% structural variants, 0.012% multi-nucleotide variants, and 91% SNPs) were called in the targeted sequence dataset, of which 1.3% and 54% were exonic and intergenic, respectively; they were distributed as 0.001% stop loss, 0.02% stop gain, 0.9% synonymous, 0.56% nonsynonymous, and 0.05% splice site variants. Supplementary Figure S1 illustrates the quality control of the sequence alignment data.

Population Structure and Distribution of Gene-Specific SNP Frequencies
HbF samples from Angola were merged with a combined 4932 samples from 1KGP [49] and the AGVP [33], resulting in 237,572 common variants from the study's targeted sequence data. Based on sample description population and country labels, these 4932 samples were grouped (Supplementary Table S2) based on culture and ethnolinguistic information [35,36], resulting in 20 worldwide ethnolinguistic cultural groups (WECG).
PCA based on these 237,572 common variants showed that the study samples clustered separately from the rest of these 20 WECG (Figure 2). In particular, they formed a clearly distinct cluster from the Khoisan group. PCA plots (Supplementary Figure S2 and Figure 2) showed no global population differences among the SCA patients, i.e., Hemolytic and VOC patients clustered together, except for three independent outliers among patients with VOC. Supplementary Table S3 illustrates the genetic distance (FST) among the 20 WECG and SCA Angola. We observed a variation in the distribution of minor alleles at rare variants within the MAF range 0.0-0.05, as well as in the MAF range 0.1-0.2, between SCA Angola and nine selected major WECG (Figure 3A). Among SCA Angola samples, variations in the distribution of MAF were observed in SNP frequencies ranging between 5% and 20% (Figure 3B), suggesting that possible mutations and genetic modifiers may result in the heterogeneous phenotypes of SCA observed in our study. Substantial variation of gene-specific SNP frequencies from the selected top pathogenic genes (Supplementary Table S1B) was observed within SCA Angolan samples (Figure 3D) and between Angolan samples and the selected nine WECG (Figure 3C). This may support the hypothesis that genetic modifiers may result in potential clinical variability of SCA phenotypes.
Genes 2024, 15, x FOR PEER REVIEW

Association and Meta-Analysis
We analyzed quantitative HbF data from 192 patients based on variants discovered from the study's targeted sequence data. As expected, we did not observe a substantial population substructure, and following data quality control, three sample outliers were removed. To account for both population stratification and hidden relatedness, we applied the mixed model approach EMMAX [40]. The Q-Q plot shown in Figure 4A is acceptable (genomic control factor λGC = 1.04) and suggests little departure from the null expectation, except at the right tail of the distribution. As shown in Table 3 and Figure 4A, two SNPs in the intronic region of chromosome 2p16.1, rs1427407 (p = 1.29 × 10⁻⁹, MAF = 0.22) and rs71327644 (p = 7.39 × 10⁻⁸, MAF = 0.30), are associated with decreasing HbF at genome-wide significance. These SNPs are associated with the BCL11A gene. Previous studies showed that the γ-globin repressor BCL11A is a target for the development of therapies for β-hemoglobinopathies by reactivating HbF. BCL11A interacts with 43 genes (Supplementary Figure S3A) in physical networks, co-expression networks, or both. Importantly, through the cross-HbF meta-analysis of Angola, Tanzania, and West Africa, we replicated the chromosome region 2p16.1 of BCL11A, and the meta-analysis fixed effect test enabled the recovery of several variants near BCL11A within 2p16.1, a region harboring another five genes: IFITM3P9, RPL26P13, RNU6-612P, ATP1B3P1, and PAPOLG (Table 4).
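The genomic control factor reported with the Q-Q plot is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null. The sketch below follows that convention using only the Python standard library; the function name is hypothetical, and this is not the code used to produce Figure 4A.

```python
from statistics import NormalDist, median

_STANDARD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
_CHI2_1DF_MEDIAN = _STANDARD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values) -> float:
    """Lambda GC: median observed chi-square (1 df) over its null expectation."""
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # A two-sided p-value maps to a z-score, whose square is chi-square (1 df).
        z = _STANDARD_NORMAL.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / _CHI2_1DF_MEDIAN
```

Values close to 1, such as the 1.04 reported here, indicate that the bulk of the test statistics follows the null distribution, so residual stratification or relatedness is unlikely to be driving the associations.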

Discussion
The phenotype heterogeneity of SCA presents a challenge for patients' clinical management. Our study addresses the issue of potential function-altering variants and genetic modifiers of variation associated with these heterogeneous phenotypes. We utilized a design that ascertained HbF individuals from the extremes of genetic risk, including Hemolytic and VOC phenotypes. With this, we were able to generate a targeted sequence catalog of 192 Angolan samples, calling 5,019,378 variants with high confidence. An SCD-specific population structure study was conducted within our population samples and between the 20 WECG, which showed that the study samples clustered separately from the rest of these groups (Figure 2) and, in particular, formed a clearly distinct cluster from the Khoisan group, an ethnic group from southern Africa with a lower incidence of malaria and SCA. This is not surprising because samples from Angola were not included in the 1KGP. Additionally, we observed variation in the distribution of minor alleles at rare variants within the MAF range of 0.0-0.05, as well as in the MAF range of 0.1-0.2, between SCA Angola and WECG (Figure 3A). Within the SCA Angolan samples, variation in the distribution of MAF was observed in SNP frequencies ranging between 5% and 20% (Figure 3B), suggesting that possible mutations and genetic modifiers may result in heterogeneous phenotypes of SCA.
The first key finding points to significant differences in the burden of non-synonymous, function-altering variants in a total of 26 genes (Supplementary Table S1B,C), among which a strong variation in gene-specific SNPs was observed within SCA Angolan samples (Figure 3D), as well as between Angolan samples and WECG (Figure 3C), supporting the hypothesis that genetic modifiers may result in a potential clinical variability in SCA phenotypes. These genes are enriched for deleterious and loss-of-function mutations in phenotypically defined groups of Angolan SCA patients and show evidence of genetic association with different phenotypes, providing support for the polygenic view of the genetic architecture of SCD phenotypic variability.
Notably, the pathways represented by these 26 genes (Figure 1A,B), including Oxidative phosphorylation, Respiratory electron transport, and Arginine biosynthesis, point to relevant pathophysiological mechanisms and are already therapeutic targets [37,50]. Importantly, Arginine biosynthesis is a key factor in the hemolysis-endothelial dysfunction observed in SCD and has become a target for therapeutic interventions [37,50]. This finding is novel and noteworthy and will contribute to a greater understanding of the variability in the clinical expression of SCA, and our identified genes and pathways suggest new avenues for other interventions.
The second key finding of this paper is that two SNPs in the intronic region of 2p16.1, harboring the BCL11A gene, are associated with decreasing HbF at genome-wide significance. Interestingly, through the cross-HbF meta-analysis of Angola, Tanzania, and West Africa, we replicated the chromosome region 2p16.1 of BCL11A, and the meta-analysis fixed effect test enabled the recovery of several variants near BCL11A within 2p16.1, as well as five other genes: IFITM3P9 (processed pseudogene), RPL26P13 (processed pseudogene), RNU6-612P (snRNA), ATP1B3P1 (processed pseudogene), and PAPOLG (protein coding). BCL11A is a potent silencer of fetal hemoglobin and controls the β-globin gene cluster in concert with other factors. Our study demonstrated that BCL11A interacts with 43 genes (Supplementary Figures S3A and S4) in physical networks, co-expression networks, or both. This network is enriched in the B Cell Receptor Signaling pathway and associated with the Gastrointestinal stroma tumor (HP:0100723) human phenotype (Supplementary Figure S3B).
Our study leveraged HbF association summary statistics based on targeted sequence to partition the cumulative heritability into 65 different functional categories and biological pathways. We observed cumulative heritability in fewer categories, such as in fetal DNase I hypersensitive site and lysine H3K27 acetylation (Supplementary Figure S4), supporting the polygenic view of the genetic architecture of HbF in SCD and demonstrating consistency with the hypothesis that the vast proportion of complex, heritable traits/diseases is explained by SNPs with small effect sizes.
Furthermore, this study identified a set of variants in 18 chromosomal regions (Figure 4B) nominally associated with increasing VOC (Supplementary Table S4(1)). This study also found that these variants are potential genetic modifiers causing phenotypic variation among patients with VOC and Hemolytic phenotypes in Angola SCA. This study additionally detected a set of variants spanning 12 chromosomal regions nominally associated with lower HbF (Supplementary Table S4(2)). Interestingly, most of the genes associated with these nominally significant variants, including BCL11A, are part of the BCL11A functional/physical and co-expression network (Supplementary Figure S3A).
To our knowledge, this is the first investigation of clinical variation in SCA in Angola using a well-customized and targeted sequencing approach. The strengths of the study include well-defined clinical groups, sites where treatment is unlikely to confound outcomes, the use of several different but complementary analytical approaches, and the linking of the identified genes and pathways to published therapeutic and transcriptomic data. Nonetheless, the study has some limitations; some of our findings may depend greatly on laboratory experiments, and the distribution of actionable genes across SCA phenotypic groups may depend on continuous genetic diversity, natural selection, and genetic drift. Such a study paves the way for the continuous analysis of SCA-specific actionable and therapeutic genes and the genetic mechanisms underpinning SCA.
In summary, we reported a well-customized and targeted sequence catalog of 192 Angolan samples comprising 5,019,378 high-confidence variants. We generated a catalog of candidate modifier genes that clustered in pathophysiological pathways important for SCA, supporting the polygenic view of the genetic architecture of SCD phenotypic variability with implications for therapeutic intervention. We also identified and replicated the association of BCL11A with decreasing HbF and constructed a physical, co-expression pathway network for BCL11A, harboring 43 other genes. Moreover, we generated a catalog of nominally significant candidate genes associated with increasing VOC and a set of nominally significant candidate genes associated with lower HbF. This study fills an important knowledge gap by using a precise panel in a targeted sequencing approach focusing on deleterious coding variants that are important in two specific phenotypic categories of SCA patients (VOC and Hemolytic). This study, thus, makes significant contributions to the present knowledge of the natural history and clinical heterogeneity of SCA, with the potential to inform the design of new therapeutic measures.

Figure 1. (A) Physical, co-expression, and functional networks of the 26 genes where significant differences in the burden of non-synonymous, function-altering variants were identified among the 192 SCA patients. (B) Pathways associated with the 26 genes where significant differences in the burden of non-synonymous, function-altering variants were identified among the 192 SCA patients.

Figure 2. Principal Component Analysis (PCA) of Sickle Cell Disease cohorts from Cameroon and Tanzania. (A) PCA plot of the first and the second eigenvectors for 20 ethnic groups with SCD from Cameroon and Tanzania. (B) Phylogeny tree showing evolutionary partnership between SCD cohorts and general populations from 20 ethnic groups. (C) PCA plot of only Africa-specific ethnicities with SCD cohorts in the first and the second eigenvectors.

Figure 4. Genome-wide association analysis of unimputed and imputed SCD genotype data. (A,B) Manhattan plot of the GWAS association test of both unimputed Cameroon SCD discovery and replication cohorts. (C,D) Manhattan plot of the GWAS association test of imputed combined Tanzania and Cameroon SCD cohorts. (E,F) Manhattan plot of the GWAS association test of imputed Cameroon SCD replication cohort. Red line denotes the genome-wide significance thresholds. Blue line denotes the level of suggestive significance. The inset in each Manhattan plot is the Quantile-Quantile (Q-Q) plot of expected vs. observed -log10(P) values with the genomic inflation factor (lambda GC).

Table 2. Significant genes from gene-set rare-variant association analyses in Angola Sickle Cell Disease.

Table 3. Top significant variants from the association analyses in Angola HbF Sickle Cell Disease. HbF shows significant association with BCL11A and is nominally associated with 4 other genes, including OR4C46, GFOD1, ACTR3BP2, and MUC3A.

Table 4. Cross-meta-analysis of Sickle Cell Disease cohorts: Angola, Tanzania, and West Africa. Cross-Sickle Cell Disease studies meta-analysis: African and African-American. The cross-meta-analysis shows a significant association of HbF with several variants in the chromosome region 2p16.1 near BCL11A, including 5 other genes within the region 2p16.1. P1, P2, and P3 stand for the Angola, Tanzania, and West Africa study p-values. M1, M2, and M3 stand for the posterior probabilities that the effect exists within the Angola, Tanzania, and West Africa studies, respectively.