Application of Targeted Next-Generation Sequencing for the Investigation of Thalassemia in a Developing Country: A Single Center Experience

Thalassemia is a prevalent disease in Malaysia, a developing country. Fourteen patients with confirmed thalassemia were recruited from the Hematology Laboratory. Their molecular genotypes were tested using the multiplex-ARMS and GAP-PCR methods. The samples were then re-investigated using the Devyser Thalassemia kit (Devyser, Sweden), a targeted NGS panel covering the coding regions of the hemoglobin genes HBA1, HBA2, and HBB. Many different genetic variants were found in the 14 unrelated cases. Across the fourteen cases, NGS detected an additional -50 G>A (HBB:c.-100G>A) variant that was not identified by the multiplex-ARMS method, as well as HBA2 mutations, namely CD 79 (HBA2:c.239C>G). In addition, CD 142 (HBA2:c.427T>C), another non-deletional alpha-thalassemia variant, and alpha triplication were not picked up by the GAP-PCR methods. We illustrate that a broad, targeted NGS-based test offers benefits over traditional screening or basic molecular methods. To our knowledge, this is the first report on the practicality of targeted NGS concerning the biological and phenotypic features of thalassemia, especially in a developing population. Discovering rare pathogenic thalassemia variants and additional secondary modifiers may facilitate precise diagnosis and better disease prevention.


Introduction
Hemoglobinopathies are classified either as thalassemia, caused by a reduced synthesis rate of one of the globin chains, or as structural hemoglobin (Hb) variants, caused by single amino acid substitutions in the α or β globin chains. Thalassemia is an autosomal recessive disorder. Most cases are found throughout the Middle East, the Mediterranean region, and the Indian subcontinent, as well as in Southeast Asia [1]. Thalassemia is a prevalent disease in Malaysia, a developing country. It is estimated that around 6.8% of Malaysians are thalassemia carriers, with various degrees of anemia [2,3].
There were 8023 thalassemia patients reported in the National Thalassemia Registry as of May 2019, and 5448 (~68%) of them were transfusion-dependent. With advances in patient diagnosis and management, the probability of surviving to the age of 60 increased from 60% in 1999 to 80% in 2013. The carrier rate was estimated at 6.8%, or roughly 1 in 15 Malaysians. Since 2017, the Malaysian Ministry of Health has implemented a screening program among 16-year-old schoolchildren as part of a prevention and control program. It was found that 9.8% of students were carriers of thalassemia and hemoglobinopathy [3,4].
Reduced hemoglobin formation produces fragile erythrocytes and leads to chronic hemolytic anemia; affected babies therefore become progressively and severely anemic, requiring life-long blood transfusions [5]. To improve survival and maintain normal growth in these patients, regular blood transfusion is usually accompanied by iron chelation therapy to reduce iron overload [6].
Hemoglobin is a tetramer formed by the balanced pairing of α-like and β-like globin chains. The α-globin gene cluster comprises three functional globin genes, the embryonic ζ gene (HBZ) and two fetal/adult α (α1 and α2) genes (HBA1 and HBA2), which are located on the short arm of chromosome 16. The β-like globin gene cluster on the short arm of chromosome 11 contains five functional genes: the embryonic ε gene (HBE), two fetal Gγ and Aγ genes (HBG2 and HBG1), and the adult δ and β (HBD and HBB) genes [7].
Thalassemia is caused by a broad spectrum of point mutations and/or gene deletions, resulting in reduced or absent formation of alpha or beta globin chain subunits [3]. The three most common β-globin mutations seen among Malays (73.1%) with the β+ thalassemia phenotype are HbE [CD 26 (GAG→AAG)], IVS 1-5 (G→C), and IVS 1-1 (G→T). On the other hand, the five common β-globin mutations among Chinese (90%) in Malaysia are CD 41/42 (-TCTT), IVS 2-654 (C→T) (β+ thalassemia phenotype), -28 (A→G) (β+ thalassemia phenotype), CD 17 (A→T), and CD 71/72 (+A) [8]. For α-thalassemia, the most common deletional mutations were -SEA, -α3.7, and -α4.2, and the most common non-deletional mutations were ααCd59, ααCS, and Hb Quong Sze (αα125) [9]. Table 1 summarizes the incidence rate of common molecular characteristics of alpha- and beta-thalassemia in several developing countries. The current investigation for thalassemia requires a full blood count, Hb analysis, and DNA analysis for a definitive diagnosis. However, the current diagnostic approach has many limitations, leading to misdiagnosis in cases with normal or borderline red blood cell indices, normal HbA2 levels, or disease complexity due to gene interactions.
The evolution of clinical molecular testing in the genomics era has allowed next-generation sequencing (NGS) analysis to play a significant role in making a definitive diagnosis. Hence, this study was performed to review the application of targeted next-generation sequencing as part of the investigation of thalassemia in a country with limited resources.

Materials and Methods
A total of fourteen cases diagnosed with thalassemia were recruited from the Hematology Laboratory, Universiti Sains Malaysia, Malaysia. These samples tested positive for thalassemia in the screening tests, which comprised red cell indices, a peripheral blood film, and hemoglobin analysis.
Full blood counts (hemoglobin (Hb), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and red blood cell (RBC) count) were measured using the Automated Hematology Analyzer Sysmex XN-1000™ (Sysmex Corporation, Kobe, Japan). Hb analysis was performed using high-performance liquid chromatography (HPLC) (Bio-Rad Variant II System, Beta-thalassemia Short Program, Bio-Rad Laboratories, Hercules, CA, USA) to quantify hemoglobin subtypes such as HbA2 and HbF. Positive thalassemia screening was defined as having an MCV of less than 80 fL or an MCH of less than 27 pg, and an HbA2 of more than 4%.
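The screening definition above can be written as a small rule. The following is a minimal sketch that encodes only the cut-offs stated in the text; the class and function names are hypothetical and are not part of any laboratory software used in this study.

```python
from dataclasses import dataclass

# Cut-offs as stated in the screening definition above.
MCV_CUTOFF_FL = 80.0
MCH_CUTOFF_PG = 27.0
HBA2_CUTOFF_PERCENT = 4.0


@dataclass
class ScreeningResult:
    """Hypothetical container for the three values used by the rule."""
    mcv_fl: float
    mch_pg: float
    hba2_percent: float


def is_screen_positive(result: ScreeningResult) -> bool:
    """Low MCV or low MCH, combined with HbA2 above the cut-off."""
    low_indices = result.mcv_fl < MCV_CUTOFF_FL or result.mch_pg < MCH_CUTOFF_PG
    raised_hba2 = result.hba2_percent > HBA2_CUTOFF_PERCENT
    return low_indices and raised_hba2
```

A rule of this form depends on a raised HbA2, so carriers with normal or borderline HbA2 would not be flagged by it alone.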
Positive screening was followed by genotyping using five different multiplex amplification refractory mutation system (MARMS)-PCRs and one single ARMS-PCR reaction for beta-thalassemia, and a gap-polymerase chain reaction (GAP-PCR) for alpha-thalassemia. The MARMS-PCRs were designed for 20 types of β-gene mutations. After that, zygosity testing was performed to classify them as homozygous or heterozygous. In contrast, GAP-PCR identified only the α3.7 and α4.2 types and the -SEA, -THAI, and -FIL deletions. In addition, clinical and laboratory data were retrieved and tabulated in Table 2. This study was approved by the Universiti Sains Malaysia Research Ethics Committee (JEPeM Code: USM/JEPeM/14120494) and the National Medical Research Register, Medical Research Ethics Committee (MREC), Amendment Approval number NMRR-12-980-13829 (IIIR), and was carried out following the Declaration of Helsinki.
Patients' DNA samples were retrieved, and a multiplex PCR method was used to amplify the entire HBA1, HBA2, and HBB genes. Targeted NGS libraries for thalassemia were constructed using the Devyser Thalassemia NGS assay (Devyser, Hägersten, Sweden) following the manufacturer's instructions. This targeted NGS assay identifies sequence variants in HBA1, HBA2, and HBB and common sequence variants, including exon-spanning copy number variations (CNVs), small or large insertions and deletions (indels), and single-nucleotide variants (SNVs). DNA samples were sequenced on the MiSeq NGS platform. Specialized bioinformatic software (Devyser Amplicon Suite v3.5) was used for data analysis, especially for identifying specific deleted parts of the HBA1, HBA2, and HBB genes and for accurate CNV identification. All of the data were saved in the cloud.
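The single-nucleotide variants in this report are written in HGVS coding-DNA notation, for example HBA2:c.427T>C or HBB:c.-100G>A. As an illustration of how such labels can be handled programmatically, the sketch below parses simple substitutions only. It is not part of the Devyser software, the names `Substitution` and `parse_substitution` are hypothetical, and indels and CNVs are deliberately out of scope.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    gene: str
    position: int  # coding-DNA position; negative values lie upstream of the ATG
    ref: str
    alt: str


# Matches labels such as "HBB:c.79G>A" or "HBB:c.-50A>C" (substitutions only).
_PATTERN = re.compile(
    r"^(?P<gene>[A-Z0-9]+):c\.(?P<pos>-?\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(label: str) -> Substitution:
    """Split a simple HGVS c. substitution into its components."""
    match = _PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple c. substitution: {label!r}")
    return Substitution(
        gene=match["gene"],
        position=int(match["pos"]),
        ref=match["ref"],
        alt=match["alt"],
    )
```

For example, `parse_substitution("HBA2:c.427T>C")` yields the gene `HBA2`, position 427, reference base `T`, and alternate base `C`.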

Results
Genetic variants in the globin genes were found in 14 unrelated cases, as shown in Table 2. Based on the screening tests, four cases were identified as heterozygous alpha-thalassemia, two as compound heterozygous HbE/beta-thalassemia, one as HbE trait with coinherited Hb Constant Spring, one as compound heterozygous beta-thalassemia, and six as beta-thalassemia. Because MARMS-PCR and GAP-PCR are separate methods that require separate requests, samples from a few of the cases were not sent for one or the other method.
Case 7 was a 27-year-old female diagnosed with compound heterozygous HbE with Hb Constant Spring (Figure 1). Her hemoglobin was 7.5 g/dL, with a normal RBC count of 4.98 and hypochromic microcytic indices. Based on the HPLC method, her HbA, HbA2/E, and HbF were 81.9%, 14.6%, and 3.5%, respectively. Subsequently, alpha-deletional and beta-molecular studies were performed on the patient, identifying the SEA deletion and CD 26 (HBB:c.79G>A). No non-deletional molecular analysis was available at this center to confirm Hb Constant Spring. Her sample was sent for NGS, which identified the SEA deletion, HBB:c.79G>A, and CD 142 (HBA2:c.427T>C). CD 142 (HBA2:c.427T>C), known as Hb Constant Spring, is a labile α-globin variant causing α-thalassemia, due to a common mutation of the termination codon of HBA2 (Term→Gln) [19].
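The traditional codon labels and the HGVS coding positions quoted in this report are related by simple arithmetic. The sketch below assumes the usual convention that traditional globin codon numbering omits the initiator methionine, so that it is one less than the HGVS codon number; the function names are hypothetical and the code is illustrative only.

```python
def hgvs_codon(c_position: int) -> int:
    """Codon number counting the initiator ATG as codon 1."""
    if c_position < 1:
        raise ValueError("only coding positions (c.1 and above) map to a codon")
    return (c_position - 1) // 3 + 1


def traditional_codon(c_position: int) -> int:
    """Traditional globin numbering, assumed here to skip the initiator Met."""
    return hgvs_codon(c_position) - 1


def base_within_codon(c_position: int) -> int:
    """Return 1, 2, or 3 for the position of the base inside its codon."""
    if c_position < 1:
        raise ValueError("only coding positions (c.1 and above) map to a codon")
    return (c_position - 1) % 3 + 1


# Variants named in this report: label -> coding position.
for label, pos in {"CD 26": 79, "CD 79": 239, "CD 142": 427}.items():
    print(label, traditional_codon(pos), base_within_codon(pos))
```

Under this assumption, c.427 falls on the first base of traditional codon 142, which agrees with the CD 142 label for Hb Constant Spring, and c.79 and c.239 map to codons 26 and 79, respectively.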
Similarly, case 11 was a 7-year-old girl with mild anemia. HPLC showed HbA2/E and HbF levels of 42.6% and 2.2%, respectively. Molecular genetic testing by MARMS-PCR showed the presence of CD 26 (G>A) and a CAP + 1 (A>C) allele mutation. NGS analysis identified CD 26 (HBB:c.79G>A) and CAP + 1 (HBB:c.-50A>C), with an additional CD 142 (HBA2:c.427T>C), which is Hb Constant Spring. CAP + 1 (A>C) is a rare, silent β-thalassemia mutation previously found in Asian Indians. Patients with these compound heterozygotes or CAP + 1 mutations normally exhibit borderline hemoglobin (Hb) levels, mean corpuscular volumes (MCV), and HbA2 levels [21]. In this case, Hb Constant Spring was missed by HPLC, and capillary electrophoresis (CE) was not available at that time.
Case 12 was a 35-year-old woman with severe hypochromic microcytic anemia who was on regular packed cell transfusion, despite having only heterozygous CD 8/9 (+G) based on her MARMS-PCR. Interestingly, additional triplicated α-globin genes were identified following targeted NGS, which explained the manifestation of her disease (Figure 3).

Discussion
Based on the Thalassemia International Federation Guidelines for the Management of Transfusion-dependent Thalassemia (2021), thalassemia diagnosis should begin at the time of screening with the complete blood count (CBC). The presumptive diagnosis must be made using at least two different methods of hemoglobin analysis. The hemoglobin analyses used in our center were capillary hemoglobin electrophoresis and high-performance liquid chromatography. Some mutations might be masked during the screening test, resulting in an incorrect diagnosis, for example, in Hb Malay (Codon 19 (A>G)), in CAP + 1 (A>C), or in complex genotypes that alter the hematological parameters, such as mild β-thalassemia/δ-thalassemia with normal HbA2.
There are also a few supplementary methods available to support the diagnosis. Examples include HbH inclusions, the osmotic fragility (OF) test, solubility or sickling tests for HbS, and the DCIP (2,6-dichlorophenolindophenol) test for HbE [22]. Confirmation is made at the DNA level by GAP-PCR for identifying DNA deletions or gene rearrangements, direct sequencing analysis, or multiplex ligation-dependent probe amplification (MLPA). This sequential workflow might be time-consuming and costly and might miss some rare causative variants, resulting in delayed genetic counseling and unresolved cases. Therefore, NGS may increase the speed of establishing a correct diagnosis and reduce costs for many genetic diseases [23,24,25].
Traditional methods using the RBC indices and Hb analysis, either by HPLC or CE, as primary screening for thalassemia are limited because some silent thalassemia carriers with normal or borderline red cell indices or HbA2 levels may be missed, as may combined carriers of α- and β-thalassemia. Besides reducing false-negative results, NGS also speeds up diagnosis by reducing the need for further referrals and repeated blood sampling. Due to financial limitations, the DNA analysis is chosen based on a differential diagnosis drawn from the patient's phenotypic and hematological characteristics. However, our conventional methods offer only limited mutational analysis. Because the phenotype varies and the traditional DNA analysis is limited, the diagnosis may sometimes be missed. Large deletions are typically found by gap-PCR or MLPA, while point mutations and indels are typically detected by MARMS-PCR or sequencing. A homozygous β-thalassemia detected using MARMS-PCR and sequencing may not be a true homozygote but rather a compound heterozygote with a deletional β-thalassemia or a δβ-thalassemia, which must be ruled out using gap-PCR, MLPA, or cascade screening [26]. Therefore, a few countries have recently introduced screening for thalassemia or hemoglobinopathies using next-generation sequencing (NGS). For example, NGS has been widely used among the Dai ethnic group in Yunnan, China, to screen for more than 300 α-globin and β-globin mutations, especially for identifying novel mutations and for non-invasive prenatal diagnosis. Of the screened population, 49.5% were identified as thalassemia mutation carriers [27].
NGS is widely used among thalassemia carriers, especially in China, as it helps identify unknown mutations and detect missed thalassemia simultaneously. According to the review by Suhaimi et al., a few studies have used NGS as part of thalassemia screening [28]. For example, Shang et al. and He et al. studied population and premarital screening programs; using NGS, these studies reported that 12.1% of variants were missed by Hb analysis, that an additional 35 couples were at risk, and that 27.5% of carriers were missed [27,29]. Similarly, a population screening study by Zhang et al. in 2019 identified five novel mutations through the NGS approach and found that around 2.8% of carriers were missed by the routine method [30].
To date, more than 350 different alleles have been discovered for beta-thalassemia, and over 100 different mutations have been discovered for alpha-thalassemia [31,32]. As such, molecular techniques have long been regarded as the gold standard in the diagnosis of thalassemia. However, with globalization and increasing migration between countries, the local genomic landscape of thalassemic mutations is rapidly evolving. Intermarriage between people of different genetic backgrounds introduces different thalassemic mutations, leading to different patterns of coinheritance. Conventional PCR methods using targeted primers do not account for the less common or unknown mutations of any given population. NGS, on the other hand, is able to identify unknown, and even novel, mutations that are not picked up by conventional PCR methods [28]. For that reason, NGS has become increasingly popular for reaching a definitive diagnosis and even for screening carriers of thalassemia.
NGS provides a more comprehensive analysis of a patient's genetic makeup, as it can detect multiple mutations in a gene in a single test, whereas PCR typically detects only the specific mutations targeted by the primer set used in the reaction. In Malaysia, at least five reference centers perform genetic testing for thalassemia, which includes GAP-PCR for alpha-thalassemia and MARMS-PCR for β-thalassemia. The MARMS-PCR method is limited to twenty common mutations among the Malaysian population, followed by zygosity PCR analysis for positive results. Furthermore, some centers in Malaysia, for example Hospital Kuala Lumpur, perform PCR for non-deletional alpha-thalassemia and Sanger sequencing. Thus, conventional testing requires repeated blood sampling and further referral tests, especially from regional hospitals. NGS also allows the simultaneous detection of variants or mutations in both alpha and beta genes.
We highlighted the importance of including globin genes in the NGS analysis in most of the presented cases. Across the fourteen cases, NGS picked up an additional -50 G>A (HBB:c.-100G>A) variant that was not identified by the MARMS method, as well as HBA2 mutations, namely CD 79 (HBA2:c.239C>G). In addition, CD 142 (HBA2:c.427T>C), other non-deletional alpha-thalassemia variants, and alpha triplications were not picked up by the GAP-PCR methods, which look exclusively for the deletional alpha mutations common in our population. CD 26 (G>A) was not tested by the MARMS method in our cases because we do not routinely proceed with molecular diagnosis for patients with an A2 level between 25% and 35%. All the additional mutations identified were clinically pathogenic and might cause further harm, especially for couples at risk. This demonstrates the advantage of using NGS in the comprehensive diagnosis of thalassemia.
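The coverage gap of a fixed-target assay can be illustrated with a short lookup. The sketch below uses the five GAP-PCR targets listed in the Materials and Methods, with labels written as in the text, and the alpha-globin findings reported by NGS for case 7. The function name is hypothetical, and the example shows only set membership, not assay performance.

```python
# Deletions targeted by the GAP-PCR described in the Materials and Methods.
GAP_PCR_TARGETS = frozenset({"α3.7", "α4.2", "SEA", "THAI", "FIL"})


def missed_by_gap_pcr(alpha_findings):
    """Return the findings that fall outside the GAP-PCR target list."""
    return [finding for finding in alpha_findings if finding not in GAP_PCR_TARGETS]


# Alpha-globin findings reported by NGS for case 7 in this study.
case_7 = ["SEA", "HBA2:c.427T>C"]
print(missed_by_gap_pcr(case_7))  # ['HBA2:c.427T>C']
```

Any variant that is not on the fixed list, such as a non-deletional mutation or a triplication, is returned as missed, which mirrors the limitation described above.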
In addition, Hb J-Singapore was overlooked in Case 10 by the HPLC and gel electrophoresis methods. CD 79 (HBA2:c.239C>G) was discovered when NGS was used. Hb J-Singapore is a rare α-globin chain variant that has been reported in Singapore, Malaysia, and Thailand [33]. No extensive study has been performed on the prevalence of this Hb variant. However, a few cases were reported among Malaysian family members living in Singapore, one of them a Malaysian and Thai woman [33,34,35]. In our case, alkaline gel electrophoresis showed an abnormal band (a fast band in the Hb Bart region with a prominent A2/E band), while HPLC showed no abnormal Hb peak at a retention time of 1.50-1.90. Furthermore, no capillary electrophoresis was performed on this patient. The variant eluted in the P3 window and might have been overlooked during routine Hb analysis [35]. This case is an example of how DNA sequencing data complement the identification of Hb variants, particularly those that undergo posttranslational modifications. Hb J-Singapore may be clinically relevant when it is co-inherited with alpha-thalassemia-1 or other α-globin gene variants [33]. In this case, the heterogeneity of CD 41/42 may produce a thalassemia major phenotype. However, its co-inheritance with Hb J-Singapore, which acts as a secondary modifier that ameliorates the imbalance of the globin chains, contributes to the intermedia phenotypic features.
NGS plays a particularly important role in making a conclusive diagnosis of thalassemia and evaluating unresolved cases. As seen in case 12, the patient had moderate anemia on her post-transfusion sample, which did not correspond with her MARMS results, which indicated a beta-thalassemia trait phenotype. Carriers of -50 (G>A) had normal hematological parameters, while compound heterozygotes for -50 and β-thalassemia had a hematological presentation similar to that of the β-thalassemia trait [36]. NGS identified additional triplicated α-globin genes in this patient, which played a vital modifier role in exacerbating her β-thalassemia phenotype by affecting erythroid maturation and causing an imbalance between the α- and β-globin chains, producing a thalassemia intermedia phenotype [37,38]. This was similar to the case reported by Steinberg-Shemer et al., where whole exome sequencing (WES) was described as a useful tool in unraveling patients' diagnoses, especially those with atypical presentation, and in determining additional secondary modifiers that might exacerbate the disease presentation [39]. Therefore, extensive genetic sequencing of her sample elucidated a precise genetic diagnosis. Standard multiplex gap-PCR and MARMS-PCR, however, were unable to detect this alpha triplication. Thus, where NGS is unavailable, further single-tube multiplex PCR for alpha triplication should be conducted. This is especially important to render a conclusive diagnosis for the patient and provide correct clinical management. The diagnostic NGS-based method made the detection of rare genetic variants of the α- and β-globin genes available to us.
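The modifier effect of an α-globin triplication can be illustrated with simple gene-count arithmetic. The sketch below is a deliberately crude proxy: it counts gene copies rather than measuring globin chain synthesis, it assumes that the mutated β allele contributes no functional output, and the function name is hypothetical.

```python
def alpha_to_beta_gene_ratio(alpha_copies: int, functional_beta_copies: int) -> float:
    """Ratio of alpha-globin gene copies to functional beta-globin gene copies."""
    if alpha_copies < 0:
        raise ValueError("alpha_copies cannot be negative")
    if functional_beta_copies <= 0:
        raise ValueError("at least one functional beta gene is needed for a ratio")
    return alpha_copies / functional_beta_copies


# Normal genotype: four alpha genes, two functional beta genes.
normal = alpha_to_beta_gene_ratio(4, 2)
# Beta-thalassemia trait, assuming the mutated allele is non-functional.
trait = alpha_to_beta_gene_ratio(4, 1)
# The same trait with one triplicated alpha locus (five alpha genes in total).
trait_with_triplication = alpha_to_beta_gene_ratio(5, 1)

print(normal, trait, trait_with_triplication)  # 2.0 4.0 5.0
```

Under these assumptions, the extra α gene raises the ratio from 4.0 to 5.0, which is consistent in direction with the more severe phenotype seen in case 12.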
Case 13 involved Hb Malay, which was initially described in 1989 as originating from Malaysia and was found in around 15% of the Malaysian population [40,41]. However, as Hb Malay co-elutes with HbA in CE and HPLC, only molecular analysis can make a definitive diagnosis. Patients with Hb Malay usually have a mild β-thalassemia phenotype, with mild microcytosis and elevated HbA2 levels [42,43].
In the genomic era, NGS may play a significant role in screening and diagnosing thalassemia, especially by filling the gaps and solving complex genomics riddles. However, there are still a few challenges in implementing NGS in thalassemia, especially in a developing country with limited funding resources. Nevertheless, NGS has improved precision and covers an extensive spectrum of mutations. The most common NGS technologies used worldwide consist of whole exome sequencing, which covers the entire exome, whole genome sequencing, which covers all genes and non-coding DNA, and targeted region gene sequencing, which covers around 10-500 genes [44,45].
Several findings in the cases mentioned above demonstrate the need for multiple molecular methods to confirm the diagnosis, especially when identifying alpha- and beta-thalassemia. Even for alpha-thalassemia alone, two methods are required, as the GAP-PCR method is unable to detect non-deletional mutations. Amplicon-based NGS has proven to be an efficient tool that can simultaneously detect α- and β-thalassemia variants and resolve complicated cases of thalassemia that would otherwise have stayed undiagnosed [46,47]. This study used Illumina NGS, which is based on sequencing by synthesis with fluorescence excitation. An optical instrument records the fluorescence signal, which is converted into bases and aligned to a reference genome; bioinformatic analysis is then used to identify indels, significant copy number variations, and single nucleotide polymorphisms [48].
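Copy number analysis from amplicon sequencing is commonly based on normalized read depth. The sketch below shows the general idea only and is not the algorithm used by the Devyser Amplicon Suite: the function name, the use of a single reference amplicon, and the two-copy control sample are all assumptions made for illustration.

```python
def estimate_copy_number(
    sample_target_depth: float,
    sample_reference_depth: float,
    control_target_depth: float,
    control_reference_depth: float,
    control_copies: int = 2,
) -> float:
    """Estimate target copy number from depth ratios against a control sample.

    Each target depth is first normalized by a reference amplicon from the
    same sample to correct for overall sequencing yield, then the sample
    ratio is compared with the ratio seen in a control of known copy number.
    """
    depths = (
        sample_target_depth,
        sample_reference_depth,
        control_target_depth,
        control_reference_depth,
    )
    if sample_target_depth < 0 or any(d <= 0 for d in depths[1:]):
        raise ValueError("depths must be positive (sample target may be zero)")
    sample_ratio = sample_target_depth / sample_reference_depth
    control_ratio = control_target_depth / control_reference_depth
    return control_copies * sample_ratio / control_ratio
```

With this scheme, a sample whose normalized depth is 1.5 times that of a two-copy control gives an estimate of three copies, as would be expected at a triplicated locus, while half the control depth suggests a single-copy deletion.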
A study by Shang et al. also described targeted NGS as the most cost-effective approach for thalassemia because of the modest cluster sizes. For uncommon copy number variation (CNV) identification and breakpoint estimation, a targeted NGS panel should ideally be able to detect point mutations in HBA, HBB, HBD, and HBG. It should also include uniform reads spanning the neighboring genes. Direct detection of common CNVs is accurate when CNV-spanning reads are added [29]. Thus, NGS is an efficient technology that facilitates thalassemia screening. The diagnosis of α-thalassemia is difficult, since the α-globin gene is highly variable. Hence, the choice of diagnostic approach depends on the laboratory's equipment, competence, associated costs, and facilities [49]. Figure 4 illustrates the proposed workflow involving targeted NGS, especially in a low-income country. Overall, while NGS was able to detect a wider range of mutations, several pitfalls were identified. One of the main limitations was that our target panel covered only three globin genes, HBA1, HBA2, and HBB; therefore, the panel design did not include all potentially significant genes that might influence the patients' phenotypes. It could detect only a few specific globin genes involving the primary and secondary modifiers, and none of the tertiary modifiers. Besides being more expensive and requiring more DNA than conventional methods, NGS methods had more difficulty detecting large HBA gene deletions, because their breakpoints are embedded within long, repetitive sequences [50]. There was still a potential for errors involving the quality of the DNA samples, the sequencing, and the bioinformatics pipeline.
Therefore, additional whole exome or whole genome sequencing might still be required, especially for patients with discordant phenotypes and genotypes. Nevertheless, the aim of this study was to illustrate a practical and precise method for detecting known variants relevant to clinical diagnosis, especially in a developing country.

Conclusions
In conclusion, to our knowledge, this is the first report on the usefulness of targeted NGS concerning the biological and phenotypic features of thalassemia, especially in a developing population, although it involves only a small cohort. A comprehensive targeted next-generation sequencing (NGS) test would greatly facilitate the diagnosis of thalassemia.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.