Unravelling the Complexity of the +33 C>G [HBB:c.-18C>G] Variant in Beta Thalassemia

The +33 C>G variant [NM_000518.5(HBB):c.-18C>G] in the 5′ untranslated region (UTR) of the β-globin gene is described in the literature as both mild and silent, while it causes a phenotype of thalassemia intermedia in the presence of a severe β-thalassemia allele. Despite its potential clinical significance, the determination of its pathogenicity according to established standards requires a greater number of published cases and co-segregation evidence than what is currently available. The present study provides an extensive phenotypic characterization of +33 C>G using 26 heterozygous and 11 compound heterozygous novel cases detected in Cyprus and employs computational predictors (CADD, RegulomeDB) to better understand its impact on clinical severity. Genotype identification of globin gene variants, including α- and δ-thalassemia determinants, and rs7482144 (XmnI) was carried out using Sanger sequencing, gap-PCR, and restriction enzyme digestion methods. The heterozygous state of +33 C>G had a silent phenotype without apparent microcytosis or hypochromia, while compound heterozygosity with a β+ or β0 allele had a spectrum of clinical phenotypes. Awareness of the +33 C>G is required across Mediterranean populations where β-thalassemia is frequent, particularly in Cyprus, with significant relevance in population screening and fetal diagnostic applications.


Introduction
β-thalassemia results from quantitative defects in the β-subunit of adult hemoglobin (HbA, α2β2). Thus far, over 400 different mutant alleles have been reported that affect different levels of β-globin (HBB) gene regulation and expression [1]. They are generally classified as β0 when no chains are produced, β+ when β-globin production is severely reduced, and β++ when β-globin production is mildly reduced [2]. They have a distinct hematological phenotype in the heterozygote, essentially characterized by a variable reduction in mean cell hemoglobin (MCH) and mean cell volume (MCV) with raised HbA2 levels. In some cases, β alleles are so mild that they are phenotypically silent with normal hematological parameters. Interactions of β alleles produce variable phenotypes ranging from a transfusion-dependent to an intermediate-to-mild form of anemia [3]. The clinical severity of the condition may be modified by the co-inheritance of ameliorating genetic factors such as α-thalassemia and genetic determinants related to fetal hemoglobin (HbF) [4].
The presence of mild β alleles usually leads to intermediate forms of thalassemia during adulthood or in later stages of life. They manifest a condition of intermediate severity in the homozygous state, whereas interactions with severe alleles confer a spectrum of severity ranging from mild to severe. The mildest of the β alleles, also called silent, exert a hematological phenotype typical of heterozygous β-thalassemia when present in the homozygous state and a mild form of thalassemia intermedia in the compound heterozygous state with severe alleles [5,6]. A few mild and even fewer silent β alleles have been identified thus far, mainly affecting HBB transcription, mRNA processing, and mRNA translation [2,7], as shown in Table 1. The identification and characterization of mild β alleles have important implications for fetal diagnostic and treatment services. This is particularly important given recent findings linking milder genotypes to poorer survival [8], and the accurate pathogenicity assertion of mild β-thalassemia variants remains a priority for clinical genome interpretation.
The present study reports novel case-level data on the β +33 C>G [NM_000518.5(HBB):c.-18C>G] single nucleotide substitution located in the 5′ untranslated region (UTR) of HBB. A few studies, conducted primarily in Mediterranean populations, report that this variant causes minimally reduced or normal red cell indices in the heterozygous state. However, when combined with severe β+ or β0 alleles, it manifests a thalassemia intermedia phenotype [9,10]. Variant-to-phenotype associations have been described so far for four heterozygotes and six compound heterozygotes (available in IthaPhen [11]). Case-level and segregation data are important in the evaluation of variant pathogenicity using the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) best-practice guidelines [12,13], and insufficient numbers of independent cases can contribute to the classification of variants as variants of uncertain significance (VUS). The present study enriches the current knowledge on the genotype-phenotype landscape of mild β-thalassemia through the description of 26 heterozygous and 11 compound heterozygous novel β +33 C>G cases from the Cypriot population. Moreover, the study employs computational predictions to assess the variant's functional consequence, with the aim of generating a more precise understanding of its impact on clinical severity.

Ethics Statement and Study Participants
The study was conducted in the Molecular Genetics Thalassemia Department of the Cyprus Institute of Neurology and Genetics (CING), in collaboration with the Thalassemia Clinics in Nicosia and Limassol. Participants provided their informed consent for the analysis. Samples were collected as part of routine patient care, and data were extracted from the clinical records of participants. Molecular analyses were carried out in the CING laboratory, adhering to the CYS EN ISO 15189:2012 accreditation standard, in response to participant requests. The results underwent thorough evaluation and confirmation by the senior head of the laboratory before being sent to the ordering physician. This study reports clinical experience with the assay and withholds any description or personal information that could identify an individual case. All data were de-identified to protect the privacy of participants and submitted to IthaPhen (https://www.ithanet.eu/db/ithaphen) [11], accessed on 18 September 2023.

Hematological Analyses
Hematological studies were performed using routine methods. Hemoglobin (Hb) analysis, including the separation and quantification of Hb subtypes, was performed by cation exchange high-performance liquid chromatography (HPLC) (VARIANT™; Bio-Rad Laboratories, Hercules, CA, USA).

Bioinformatics Analyses
The Combined Annotation Dependent Depletion (CADD) [17][18][19] tool was used to predict the deleteriousness of the 5′ UTR β +33 C>G variant. The PHRED CADD score for this variant was accessed from the Ensembl Variant Effect Predictor (VEP) [20] tool and was compared to thresholds for pathogenic and benign predictions for variants in hemoglobinopathy-related globin genes as defined in Tamana et al. [21]. The RegulomeDB v2.2 [22] tool was used to annotate regulatory information on this variant and facilitate the identification of its potential causal link to disease. RegulomeDB v2.2 scores range from 1 to 6, with lower scores indicating a higher probability of a functional variant.

Compound Heterozygotes for β +33 C>G with Other β-thalassemia Alleles
Phenotypic data are presented for 11 compound heterozygotes with β alleles that are frequent in the Cypriot population [15], namely β+ IVS I-110 G>A [HBB:c.93-21G>A] (6 cases), β0 CD 39 C>T [HBB:c.118C>T] (4 cases), and β0 IVS I-1 G>A [HBB:c.92+1G>A] (1 case), as shown in Table 4. In none of the samples analyzed was homozygosity for the β +33 C>G variation detected. Among the study sample, six individuals required medical intervention, which eventually led to a diagnosis of β-thalassemia (cases 1-6, Table 4), while the remaining samples were detected during family studies and routine screening. Hematological indices were reported at the time of diagnosis (pre-transfusion) for ten patients, revealing a similar pattern of microcytic, hypochromic anemia (MCV 60.28 ± 3.59 fL (mean ± SD), 61.05 (54.5-65.2) fL (median (min-max)); MCH 19.22 ± 1.33 pg (mean ± SD), 19.4 (17.2-21.3) pg (median (min-max)); Hb (female) 8.86 ± 0.56 g/dL (mean ± SD), 8.7 (8-9.6) g/dL (median (min-max)); Hb (male) 10.13 ± 3.11 g/dL (mean ± SD), 8.7 (8-13.7) g/dL (median (min-max))), as shown in Table 1. All patients had a normal alpha genotype except for case 7 (with β+ IVS I-110 G>A), which had heterozygous α+ thalassemia (αα/-α3.7) and had also tested positive for the XmnI polymorphism. Owing to the co-inheritance of these ameliorating factors affecting disease severity, case 7 presented with a near-normal Hb level of 13.7 g/dL and mild morphologic abnormalities of blood cells on film, with no evident clinical manifestations. Another patient (case 9, with β+ IVS I-110 G>A) was found to carry the XmnI polymorphism, which likely contributed to an increase in HbF at 8.2% but without profound improvements in the patient's hematological phenotype. This patient had no history of blood transfusion, while clinical data were unavailable. Three patients had transfusion-dependent thalassemia (TDT), of whom two required regular blood transfusions in adulthood after the age of 30 (cases 3 and 4, both with β0 CD 39 C>T), while the third patient began transfusion therapy at the young age of three (case 6, with β+ IVS I-110 G>A). Of these, only case 3 presented with clinical symptoms, namely splenomegaly and facial bone deformities. Two other patients (cases 1 and 8) required occasional blood transfusions following medical procedures but were otherwise clinically normal. Two out of three patients with non-transfusion-dependent thalassemia (NTDT) presented with splenomegaly (case 5, with β+ IVS I-110 G>A) and a barely palpable spleen (case 11, with β0 IVS I-1 G>A), while the third patient only had an abnormal hematological phenotype (case 2, with β0 CD 39 C>T). Overall, case 6 appears to have the most severe phenotype among all diagnosed patients due to an early-on blood transfusion requirement; the presence of an alternate genetic determinant of disease severity is probable, warranting further investigation.
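The summary format used above, mean ± SD followed by median (min-max), can be reproduced with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, the use of the sample standard deviation is an assumption (the text does not state which estimator was used), and the numbers in the usage line are arbitrary placeholders rather than patient data.

```python
import statistics


def describe(values):
    """Return mean, sample SD, median, min and max for a list of numbers."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for a sample SD")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # assumption: sample (n - 1) estimator
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def format_summary(values, digits=2):
    """Format the summary as 'mean ± SD (mean ± SD), median (min-max) (median (min-max))'."""
    s = describe(values)
    return (
        f"{s['mean']:.{digits}f} ± {s['sd']:.{digits}f} (mean ± SD), "
        f"{s['median']:g} ({s['min']:g}-{s['max']:g}) (median (min-max))"
    )


# Placeholder numbers only, chosen to make the arithmetic easy to check.
print(format_summary([1.0, 2.0, 3.0, 4.0]))
```

Reporting the median and range alongside the mean is useful for small groups, where a single outlying value can shift the mean noticeably.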

A Use Case for Comprehensive Genetic Analysis
Clinical diagnostic laboratories, including the CING laboratory, are increasingly adopting genetic testing protocols that rely on next-generation sequencing (NGS) technologies. These advanced methods enable the comprehensive detection of various genetic variations, facilitating the diagnosis of hemoglobinopathies and other inherited diseases. Figure 1 illustrates a family that underwent β-thalassemia prenatal diagnosis at the CING. The mother is a carrier of the severe IVS I-110 G>A variation, while the father (case 7, Table 4) was initially diagnosed, through targeted molecular methods, as a carrier of the IVS I-110 G>A variation only. Sanger sequencing of HBB performed on the fetal sample for prenatal diagnosis purposes detected the β +33 C>G variation in the heterozygous state. Repeat parental testing identified β +33 C>G in the father in compound heterozygosity with IVS I-110 G>A. IVS I-110 G>A is a severe β+ allele, inducing abnormal splicing and leading to a reduced steady-state HBB protein level of only 10% in affected individuals [24]. This case of fetal diagnostics demonstrates the silent nature of β +33 C>G in the presence of a severe β allele in the hematological phenotype of the father and highlights the benefit of adopting more comprehensive approaches, such as NGS, for holistic thalassemia genetic profiling.

Computational Data
The computational predictor CADD annotated β +33 C>G (rs34135787) as pathogenic with a PHRED CADD score of 18 (threshold > 12) [21]. RegulomeDB identified this variant as having a potentially functional consequence on HBB by exploring various sources of data, including chromatin states, epigenetic marks, motif instances, transcription factor (TF) binding, and expression quantitative trait loci (eQTLs) annotations, as shown in Figure 2. β +33 C>G (rs34135787) is located 33 bases downstream of the transcriptional start site of HBB. RegulomeDB showed hits to several DNase-seq peaks (open chromatin regions) in various cell types, with significant activity in hematopoietic multipotent progenitor cells. In this cell type, this region (rs34135787-HBB) was mapped with an active enhancer state and shown to contain TF binding sites for GATA1 (GATA binding protein 1), POLR2A (RNA Polymerase II Subunit A), and CEBPA (CCAAT enhancer binding protein alpha). Furthermore, RegulomeDB TF motif evidence suggested that allele G would disrupt the binding of RFX2 (DNA-binding protein RFX2) in liver-related biosamples and RUNX3 (RUNX family transcription factor 3) in 26 different cells and tissues. RegulomeDB scored rs34135787 as a category 2 variant based on evidence of binding through ChIP-seq and DNase data, albeit in the absence of eQTL information, indicating that this variant is most likely to impact gene regulation.
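The CADD comparison reported above amounts to checking a PHRED-scaled score against a gene-specific cut-off. The following is a minimal sketch of that check, using only the two numbers quoted in the text (score 18, pathogenic threshold > 12); the class and method names are hypothetical, and the score is entered by hand rather than retrieved from VEP.

```python
from dataclasses import dataclass

# Pathogenic cut-off for globin gene variants as quoted in the text (PHRED CADD > 12).
PATHOGENIC_CADD_THRESHOLD = 12.0


@dataclass(frozen=True)
class VariantScore:
    """A variant with a manually entered PHRED-scaled CADD score (hypothetical container)."""
    rsid: str
    hgvs: str
    cadd_phred: float

    def cadd_call(self, threshold: float = PATHOGENIC_CADD_THRESHOLD) -> str:
        # Strictly greater than the cut-off, matching "threshold > 12" in the text.
        if self.cadd_phred > threshold:
            return "above pathogenic threshold"
        return "not above pathogenic threshold"


variant = VariantScore("rs34135787", "NM_000518.5(HBB):c.-18C>G", 18.0)
print(variant.rsid, variant.cadd_call())
```

The second label deliberately avoids the word "benign", since only the pathogenic cut-off is quoted here and a score below it is not by itself a benign prediction.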


Discussion
β +33 C>G (rs34135787) is a non-coding 5′ UTR variant in HBB characterized in the literature as both a mild and silent β-thalassemia allele [2,9,10,14], while its pathogenicity is not fully determined. Published case reports on β +33 C>G are scarce, while this variant is absent from reference population databases, indicating that this is possibly a rare variant. It was previously detected in individuals of Spanish [10], Turkish [25], and Greek Cypriot [9] origin, while a recent epidemiological study revealed that β +33 C>G has a carrier frequency of 0.086% (2/2335 alleles) in Cyprus [15]. This evidence shows that β +33 C>G is most likely to be encountered in people of Mediterranean heritage, particularly the population of Cyprus, having significant relevance in population screening and fetal diagnostic applications.
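The proportion quoted above follows directly from the two counts given in parentheses. As a minimal sketch of the arithmetic (the function name is hypothetical, and the counts are the ones stated in the text):

```python
def allele_frequency_percent(variant_alleles: int, total_alleles: int) -> float:
    """Return the share of variant alleles among all screened alleles, in percent."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= variant_alleles <= total_alleles:
        raise ValueError("variant_alleles must lie between 0 and total_alleles")
    return 100.0 * variant_alleles / total_alleles


# Counts quoted in the text: 2 variant alleles out of 2335.
frequency = allele_frequency_percent(2, 2335)
print(f"{frequency:.3f}%")  # prints 0.086%
```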
The present study provides the most extensive phenotypic characterization of β +33 C>G using 26 heterozygous and 11 compound heterozygous cases detected in Cyprus as part of routine screening and specialized molecular diagnostic procedures. The vast majority of heterozygous samples had normal RBC indices and a normal/borderline-raised HbA2 level. A few atypical cases were analyzed for the presence of α-thalassemia and δ-thalassemia determinants. Only the deletional -α3.7 variant was detected, which is the most common α-thalassemia allele in Cyprus with a relative frequency of 72.8% of all α-globin variations [15]. Overall, the heterozygous state of β +33 C>G exhibited a mild effect on the hematological phenotype. On the other hand, co-inheritance of β +33 C>G with a severe β allele produced hematological changes and a spectrum of clinical phenotypes among the affected individuals, while anemia was exacerbated under certain conditions, such as pregnancy. Co-inheritance with a β+ allele (IVS I-110 G>A) demonstrated a milder, variable impact on the observed phenotype compared to co-inheritance with a β0 allele (CD 39 C>T, IVS I-1 G>A). In general, co-inheritance with a severe β0 or β+ allele produced a clinical phenotype of thalassemia intermedia that worsened with age. The phenotypic manifestation observed in cases 6 and 7 (Table 4) may have been influenced by additional hereditary factors, indicating the need for further investigation. Taken together, the phenotype of compound heterozygotes varied from clinically silent to mild transfusion-dependent thalassemia, indicating that a larger number of patients will be needed to delineate the pathogenicity of the variant and facilitate accurate prediction of the phenotype.
The mild and silent β alleles are predominantly identified incidentally during routine diagnostic screening or while investigating parental inheritance patterns. They are mostly found in the non-coding sequences of HBB and can impact β-globin expression mainly by affecting HBB gene transcription (promoter, 5′ UTR) and mRNA processing (consensus and cryptic splice sites, poly A signal, 3′ UTR) [2,14]. In addition, the 5′ UTR sequences may contain structural motifs and upstream open reading frames that regulate mRNA translation [26]. In vitro cell-based assays investigating 5′ UTR β alleles previously associated with thalassemia intermedia showed that the β +33 C>G allele acts at the level of transcription by reducing HBB mRNA output without affecting its stability [27,28]. Its genomic location also suggests a role in the regulation of HBB transcription. The +33 position lies downstream of the HBB transcriptional start site within a region that contains the downstream core element (DCE). The C>G change at this position disrupts the DCE and, in turn, the binding of transcription factor II D (TFIID), which aids in the recruitment of RNA polymerase II (RNAPII) to the promoter [29]. These findings are in line with the RegulomeDB v2.2 functional annotation of β +33 C>G. The predicted binding of GATA1 (OMIM 305371, regulates RNAPII recruitment to the HBB promoter and β-LCR) and POLR2A (OMIM 180660, encodes the large subunit of RNAPII) and the active enhancer state of this region (rs34135787-HBB) in blood-derived cells suggest that β +33 C>G is likely to function by disrupting RNAPII-mediated HBB expression. Collectively, these data indicate that the major mechanism by which β +33 C>G reduces the production of HBB mRNA is at the level of transcription.
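The traditional name (+33, counted from the transcriptional start site) and the HGVS name (c.-18, counted back from the translation start codon) describe the same nucleotide. The sketch below shows how the two numbering schemes relate for positions inside the 5′ UTR. It assumes a 50-nucleotide HBB 5′ UTR, a value that is consistent with the pairing of +33 and c.-18 used throughout this paper but is not stated in it; the function name and the constant are hypothetical.

```python
# Assumption: the HBB 5' UTR spans 50 nucleotides, so +1..+50 map to c.-50..c.-1.
HBB_FIVE_PRIME_UTR_LENGTH = 50


def utr_position_to_hgvs_c(tss_position: int,
                           utr_length: int = HBB_FIVE_PRIME_UTR_LENGTH) -> int:
    """Convert a 1-based position counted from the transcription start site (+1)
    into an HGVS c. coordinate, for positions inside the 5' UTR only."""
    if not 1 <= tss_position <= utr_length:
        raise ValueError("position must lie inside the 5' UTR")
    # HGVS has no c.0: the last UTR base is c.-1 and the A of ATG is c.1.
    return tss_position - utr_length - 1


print(f"+33 corresponds to c.{utr_position_to_hgvs_c(33)}")
```

The conversion is restricted to the 5′ UTR on purpose, because further downstream the two conventions diverge (codon and intron numbering on one side, coding-sequence offsets on the other).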
In conclusion, β +33 C>G is a mild β allele associated with a thalassemia intermedia phenotype that cosegregates with disease across multiple affected families and is supported by pathogenic predictions from computational methods and functional analyses. It is expected that many more cases of β +33 C>G remain undiagnosed, since the disease usually manifests at an older age, with or without clinical intervention, or is exacerbated during particular conditions such as infectious diseases and pregnancy. Diagnostic screening for hemoglobinopathies traditionally uses molecular approaches to search for the most frequent genetic variations in the population from which the patient originates, with the result that mild and silent variations frequently remain unreported. A showcase of fetal diagnostics highlights the benefit of implementing NGS to enable comprehensive and accurate detection of genetic variation. Although there is currently no indication for prenatal diagnosis in mild or even silent β-thalassemia, genotype information can equip prospective parents with the knowledge that their offspring may require medical care, such as occasional transfusions, during the course of their lives.
The present study describes for the first time the phenotype of β +33 C>G in the Cypriot population and provides the largest pool of samples for comprehensive genotype-phenotype analysis to better predict the clinical effect of this variant. The sharing of case observations and phenotypic information from laboratories to public domains serves as an invaluable resource for clinical variant interpretation and for resolving variants with conflicting interpretations. It is also valuable for the study of rare diseases, such as hemoglobinopathies and other rare anemias, that often have limited sample sizes and fragmented data. The genotype-phenotype data featured in this paper are openly available on the IthaPhen database [11]. Although this study details the minimum standards in screening and diagnosis, larger and more detailed future studies are necessary to fully elucidate the genotype-phenotype correlations within this cohort. Furthermore, the study highlights the importance of early detection, since the co-inheritance of a mild β allele, as in the case of β +33 C>G, with a severe β allele may lead to a clinical phenotype with important implications for the life of the affected individual. Genotype information can play a crucial role in enabling affected individuals to proactively seek therapeutic management for their condition and receive proper care at an early stage, ultimately contributing to an improved quality of life.

Institutional Review Board Statement:
The study was conducted according to the ethical guidelines outlined in the Declaration of Helsinki. The study adheres to the guidelines and regulations of Cyprus legislation and the National Bioethics Committee. All genetic and personal information utilized in this study was collected as part of routine diagnostic services at the Cyprus Institute of Neurology and Genetics (CING), following participants' requests and in accordance with CING regulations. No additional data were collected or stored for this research investigation. All subjects were de-identified in compliance with the FDA Guidance Document "Informed Consent for In Vitro Diagnostic Device Studies Using Leftover Human Specimens that are Not Individually Identifiable," issued in April 2006. This study is exempt from IRB review, as confirmed by the Cyprus National Bioethics Committee. To safeguard the anonymity of these subjects, demographic data were limited to age and sex, without providing detailed descriptions of individual cases.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the analysis.

Figure 1. A pedigree showing HBB carriers and affected family members with IVS I-110 G>A and +33 C>G. proband (case 7, Table 4); Hb, hemoglobin; MCV, mean corpuscular volume; MCH, mean corpuscular hemoglobin; HbA2, A2 hemoglobin; ?, unknown β allele as a result of direct detection methods utilized for the identification of the most common β variants.


Biomedicines 2024 , 13 Figure 2 .
Figure 2. RegulomeDB v2.2 analysis for rs34135787 [Accessed: 30 May 2023]. RegulomeDB scores the functionality of rs34135787 based upon experimental data derived from large published datasets covering various tissues and cell lines, such as its presence in DNase hypersensitive regions (DNase-seq), promoters or enhancers (chromatin state), and sequences affecting the binding of transcription factors (TF-ChIP-seq) and DNA motifs. The top hits in each category are shown.


Cypriot families with β +33 C>G [HBB:c.-18C>G].
Author Contributions: C.S., M.P., M.K. and T.P. conceived and designed the study; S.C., C.M. and M.H. provided the samples and phenotypic data; M.P. performed the molecular analyses; P.K. guided in silico analyses; C.S. prepared tables and figures; C.S. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Table 2. Characteristics of the study sample.
therapy requirements documented at age 35. Case 6 is diagnosed with β-thalassemia major and has undergone lifelong transfusions.