A Novel Algorithm Using Cell Population Data (VCS Parameters) as a Screening Discriminant between Alpha and Beta Thalassemia Traits

Thalassaemia is one of the major inherited haematological disorders in the Southeast Asia region. This study explored the potential utility of red blood cell (RBC) parameters and reticulocyte cell population data (CPD) parameters in the differential diagnosis of α- and β-thalassaemia traits as a rapid and cost-effective screening tool. A total of 1597 subjects (1394 apparently healthy subjects, 155 subjects with α-thalassaemia trait, and 48 subjects with β-thalassaemia trait) were accrued. The parameters studied were the RBC parameters and reticulocyte CPD parameters derived from the Unicel DxH 800. A novel algorithm, named the αβ-algorithm, was developed, ((MN-LMALS-RET × RDW) − MCH), to discriminate α- from β-thalassaemia trait with a cut-off value of 1742.5 [AUC = 0.966 (95% CI = 0.94–0.99), sensitivity = 92%, specificity = 90%]. Two prospective studies were carried out: an in-house cohort of 310 samples comprising various RBC disorders to assess the specificity of the algorithm, and an interlaboratory cohort of 65 α-thalassaemia trait and 30 β-thalassaemia trait subjects to assess the reproducibility of the findings. We propose the αβ-algorithm as a rapid, inexpensive surrogate evaluation tool for α- and β-thalassaemia in population screening of thalassaemia traits in geographic regions with a high burden of these inherited blood disorders.


Introduction
Thalassaemia and haemoglobin variant disorders occur widely worldwide, with an estimation of 7% of the world population being carriers, as reported by the World Health Organisation (WHO) [1]. About 300,000-500,000 children were diagnosed with significant haemoglobin disorders, and of these, about 80% were born in developing countries [1]. In Malaysia, thalassaemia is the commonest single gene disorder characterised by defects in synthesising one or more globin chains [2]. Individuals with homozygous and double heterozygous mutations are associated with thalassaemia major or intermedia phenotype, high morbidity and mortality, whereas individuals with heterozygous mutations are carriers without exhibiting adverse morbidity [2]. Alpha (α) and beta (β) thalassaemia are the commonest types of thalassaemia in Malaysia [3].
The Malaysian Thalassaemia Registry (2009) revealed that 3310 of 4541 registered patients were transfusion-dependent β-thalassaemia major or HbE/β-thalassaemia patients [4]. Clinical management of these patients includes transfusion, iron chelation therapy, and haematopoietic stem cell transplantation, which are expensive and a burden to the country's healthcare system [5]. Besides patient management, the National Thalassaemia Prevention and Control Programme was actively carried out nationwide to provide population screening and counselling at primary healthcare facilities, together with health education and promotion to disseminate knowledge and awareness about the consequences of these disorders [4].
α-thalassaemia is characterised by the deficiency or absence of α-globin chain synthesis due to deletions of one or more α-globin genes located on chromosome 16 [6]. The severity of α-thalassaemia depends on the number of gene deletions: single-gene deletions result in silent α-thalassaemia carriers, two-gene deletions result in α-thalassaemia trait (minor), three-gene deletions result in haemoglobin H (HbH) disease, and four-gene deletions result in haemoglobin Bart's, which usually leads to fatal hydrops fetalis [7]. On the other hand, β-thalassaemia is caused by reduced or absent β-globin chain synthesis due to mutations in the β-globin gene on chromosome 11 [8]. Generally, β-thalassaemia is categorised as β-thalassaemia trait (minor), β-thalassaemia intermedia, and β-thalassaemia major based on the gene defects and the severity of the decrease in β-globin chain production [9].
Preliminary identification of thalassaemia carriers is carried out through a screening programme that uses full blood count (FBC) analysis as the baseline test to scrutinise the red blood cell indices, followed by morphologic examination of peripheral blood smears and subsequent confirmation by high-performance liquid chromatography (HPLC), haemoglobin (Hb) electrophoresis, and molecular genetic testing [4]. Each of these confirmation tests has its limitations, necessitating a combination of several tests for the differential diagnosis of α- and β-thalassaemia. For instance, HPLC alone is not sufficient to detect variants discretely, as it is not sensitive and specific enough for α-thalassaemia detection, especially in the presence of haemoglobin Constant Spring (non-deletion α-thalassaemia), which appears as a very small Hb peak that is often missed [10,11]. In contrast, this peak is seen on the capillary electrophoresis (CE) electropherogram. Hb electrophoresis likewise cannot be used as the sole technique to distinguish α- and β-thalassaemia. In β-thalassaemia trait, haemoglobin electrophoresis usually shows reduced or absent adult haemoglobin (HbA), elevated levels of haemoglobin A2 (HbA2), and increased foetal haemoglobin (HbF) [10,12]. However, a normal concentration of HbA2 does not rule out the β-thalassaemia trait, especially if there is concurrent iron deficiency or δ-thalassaemia, which can lower HbA2 levels into the normal range. Therefore, a combination of other methods is required for confirmatory diagnosis [12]. Such combinations of multiple tests are rather costly and require expertise to make accurate and reliable diagnoses, as there are substantial overlaps among these disorders [10].
Numerous combinations of red cell indices have been suggested to distinguish different types of thalassaemia. Such indices would allow cost optimisation, because only patients requiring further investigation would be subjected to expensive haemoglobin electrophoresis and molecular genetic testing.
With the advent of new-generation full blood count (FBC) analysers, advanced parameters such as cell population data (CPD), which measure the characteristics of cells based on multiple light scatters, are being explored to better understand their diagnostic utility. The CPD parameters provide information on the volume (V), conductivity (C), and light scatter angles (ALL, LALS, LMALS, UMALS, and MALS) for reticulocytes, which are important in the assessment of bone marrow red blood cell production efficiency and the detection of haemoglobin disorders such as thalassaemia and anaemia [13]. In this study, we explored the potential utility of reticulocyte CPD and aimed to construct an algorithm that could serve as a rapid and inexpensive surrogate method for the discrimination of α- and β-thalassaemia traits in population-based thalassaemia screening programmes, not only in Malaysia but also in other countries with a high burden of these red cell disorders.

Materials and Methods
A total of 1597 subjects were included in the primary assessment and algorithm development cohort. The subjects were recruited between September 2011 and February 2019 in a study to establish reference intervals for haematological parameters in Malaysian adults reported in Ambayya et al. (2014) [14]. These cases were retrospectively analysed and categorised into 1394 apparently healthy subjects, 155 subjects with α-thalassemia trait, and 48 subjects with β-thalassemia trait.
Two ethical approvals were obtained for this study from the Medical Research Ethics Committee of the Ministry of Health Malaysia. The first covered the recruitment of subjects for establishing reference intervals for haematological parameters (Research ID 10-277-5480), and the second covered the exploration of haematological parameters available in FBC analysers in haematological disorders (NMRR 17-2708-38327). Written informed consent was obtained from all subjects before recruitment. Samples were processed according to the recommendations of the International Council for Standardisation in Haematology [15][16][17].
All samples were processed as described in Ambayya et al. (2014) [14]. Initial screening tests to rule out anaemia and thalassaemia included full blood count (FBC) analysis of red cell indices (MCH < 27 pg, MCV < 80 fl, RDW > 14%), performed on the Unicel DxH 800 (Beckman Coulter, Miami, FL, USA). Peripheral blood films (PBF) were then prepared using the SP1000i automated slide maker (Sysmex, Kobe, Japan) and reviewed by qualified laboratory personnel. Haemoglobin analysis was carried out on the Capillarys 2 (Sebia, France), followed by detection of H inclusions by supravital brilliant cresyl blue (BCB) staining (Merck Millipore, Darmstadt, Germany). Iron studies assayed in this study included serum ferritin (Modular E170, Roche, Switzerland) and soluble transferrin receptor (Cobas Integra 400, Roche, Switzerland). In subjects with increased serum ferritin [males > 400 ng/mL, females > 150 ng/mL], C-reactive protein was measured using the AU480 (Beckman Coulter, USA). DNA testing for confirmation of α-thalassaemia carriers was conducted in selected cases that required a definitive diagnosis. The analysis pipeline adopted in this study was based on Malaysia's Management of transfusion-dependent thalassaemia (Clinical Practice guideline) v2009 [4].
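The red cell index thresholds quoted above can be expressed as a small check. This is only an illustrative sketch: the function and constant names are hypothetical, and because the text does not state how the three criteria are combined (any versus all), each triggered criterion is reported separately rather than merged into a single decision.

```python
from typing import List

# Thresholds as quoted in the text; names are illustrative assumptions.
MCH_LOW_PG = 27.0      # MCH < 27 pg
MCV_LOW_FL = 80.0      # MCV < 80 fl
RDW_HIGH_PCT = 14.0    # RDW > 14%


def screening_flags(mch_pg: float, mcv_fl: float, rdw_pct: float) -> List[str]:
    """Return the list of screening criteria that a sample triggers.

    The combination rule is not specified in the source, so no overall
    positive/negative call is made here.
    """
    flags = []
    if mch_pg < MCH_LOW_PG:
        flags.append("MCH < 27 pg")
    if mcv_fl < MCV_LOW_FL:
        flags.append("MCV < 80 fl")
    if rdw_pct > RDW_HIGH_PCT:
        flags.append("RDW > 14%")
    return flags
```

A sample sitting exactly on a threshold (for example MCV = 80 fl) triggers no flag under the strict inequalities used here.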
Routinely reported red blood cell (RBC) and reticulocyte parameters, research parameters, and cell population data (CPD) were retrieved from the Unicel DxH 800 for comparison. The CPD parameters are described in Supplementary S1. These parameters were first compared between the control group (apparently healthy individuals) and subjects diagnosed with α- and β-thalassaemia traits. Significant indices, namely those displaying the largest area under the curve (AUC) as single parameters, were then selected for the development of a mathematical algorithm.
All statistical analyses were performed using IBM SPSS Statistics 22 software (SPSS, Chicago, IL, USA). The Kolmogorov-Smirnov test of normality was carried out to assess the distribution of each parameter (RBC parameters, research parameters, and CPD). One-way analysis of variance (ANOVA) with a homogeneity of variance test was conducted, followed by a post-hoc test: the Tukey test for parameters that fulfilled the homogeneity of variance, and the Games-Howell test for parameters that did not. The Welch test was then used to verify parameters that did not fulfil the homogeneity of variance. For the initial assessment of differences between the α- and β-thalassaemia trait subjects, an independent t-test was conducted at a significance level of 0.05. Receiver Operating Characteristic (ROC) curves were generated for parameters that yielded p-values of <0.05. ROC analyses with p-values of <0.05 and AUC > 0.8 were considered significant in this study, and for each shortlisted parameter, cut-off points were determined by considering the sensitivity and specificity.
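The ROC step can be illustrated outside SPSS with a minimal sketch. It computes the AUC as the probability that a randomly chosen case from one group scores higher than one from the other, and scans candidate cut-offs. The text only says that cut-offs were chosen "by considering the sensitivity and specificity"; maximising Youden's J (sensitivity + specificity − 1) is an assumption made here for illustration, as are all function names and the toy data in the usage note.

```python
from typing import Sequence, Tuple


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as P(positive value > negative value); ties count as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_cutoff(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A case is called positive when its value is strictly above the cut-off.
    Candidate cut-offs are midpoints between consecutive observed values.
    """
    values = sorted(set(pos) | set(neg))
    if len(values) < 2:
        raise ValueError("need at least two distinct values to place a cut-off")
    best = None
    for low, high in zip(values, values[1:]):
        cutoff = (low + high) / 2
        sensitivity = sum(p > cutoff for p in pos) / len(pos)
        specificity = sum(n <= cutoff for n in neg) / len(neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]
```

With the toy values pos = [5, 6, 7, 8] and neg = [1, 2, 3, 6], `roc_auc` gives 0.90625 and `best_cutoff` returns a cut-off of 4.0 with sensitivity 1.0 and specificity 0.75.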
A novel algorithm, termed the αβ-algorithm, was formed by utilising routinely reported RBC parameters and reticulocyte CPD with AUC > 0.9 to discriminate α- and β-thalassaemia traits. The algorithm was evaluated across the control group (apparently healthy individuals) and the α- and β-thalassaemia trait groups. Following that, two prospective validation cohorts were designed and carried out to ensure the reliability, reproducibility, and robustness of the αβ-algorithm when heterogeneous RBC disorders are present in a diagnostic setting: an in-house cohort and an interlaboratory cohort in collaboration with Gribbles Pathology, Malaysia. For the in-house validation cohort, 310 samples were recruited from subjects with various red cell disorders (119 α-thalassaemia trait, 48 β-thalassaemia trait, 15 haemoglobin E trait, 84 iron deficiency anaemia (IDA), and 44 iron deficiency (ID)). We compared the distribution of cases among these red cell disorders to assess the overlap in algorithm values with the other (non-α- or β-thalassaemia trait) cases. For the interlaboratory cohort, 95 samples (65 α-thalassaemia trait, 30 β-thalassaemia trait) recruited during general health screening programmes were included. We used the VassarStats clinical calculator (http://vassarstats.net/clin1.html, accessed on 16 September 2021) to obtain the sensitivity, specificity, and positive and negative predictive values for discriminating α- and β-thalassaemia traits [15].
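The four measures obtained from the clinical calculator follow directly from a 2 × 2 table of algorithm result against confirmed diagnosis. The sketch below shows the standard definitions; the function name is hypothetical, which trait is treated as the "positive" class is left to the user, and the counts in the usage note are invented for illustration rather than taken from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 diagnostic accuracy measures.

    tp/fn: confirmed-positive cases called positive/negative by the test.
    fp/tn: confirmed-negative cases called positive/negative by the test.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all confirmed positives
        "specificity": tn / (tn + fp),  # true negatives among all confirmed negatives
        "ppv": tp / (tp + fp),          # confirmed positives among positive calls
        "npv": tn / (tn + fn),          # confirmed negatives among negative calls
    }
```

For example, hypothetical counts of tp = 45, fn = 5, fp = 10, tn = 90 give a sensitivity and specificity of 0.90 each, a positive predictive value of about 0.82, and a negative predictive value of about 0.95.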

Development of (αβ Algorithm) Using a Primary Retrospective Cohort
A total of 1597 subjects (1394 control group/apparently healthy; 155 α-thalassaemia trait; 48 β-thalassaemia trait) were assessed in this study. The distribution of cases and summary statistics for RBC parameters, sTFR, serum ferritin, and haemoglobin analysis (HBA, HBA2, and HBF) are detailed in Table 1. After normality testing, we compared the groups (control, α-, and β-thalassaemia traits) by ANOVA and post hoc tests, and the diagnostic performance was determined by ROC analyses, as summarised in Supplementary S2. The ROC analysis involved 203 subjects (155 α-thalassaemia trait and 48 β-thalassaemia trait). Five parameters (MCV, MCH, MAF, MN-V-RET, MN-V-NRET) were notably higher in the α- than in the β-thalassaemia trait, whereas RDW, MN-MALS-RET, MN-LMALS-RET, and MN-LMALS-NRET were lower in the α- than in the β-thalassaemia trait, as shown in Table 2. Three parameters (MN-LMALS-RET, MCH, and RDW) possessed the largest AUC for distinguishing α- from β-thalassaemia. To improve on the sensitivity and specificity of these parameters, a mathematical algorithm (αβ-algorithm) was devised, ((MN-LMALS-RET × RDW) − MCH), to robustly distinguish α- from β-thalassaemia trait. With a cut-off of 1742.5, the AUC, sensitivity, and specificity were 0.966 (95% CI: 0.94, 0.99), 92%, and 90%, respectively. ROC curves of RBC parameters and reticulocyte CPDs are depicted in Figure 1a,b. Figure 2 displays the ROC curve for the αβ-algorithm. Box and whisker plots for MCH, RDW, MN-LMALS-RET, and the αβ-algorithm that delineate the α- and β-thalassaemia traits and the control group (apparently healthy subjects) are represented in Figure 3a.
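As a worked illustration, the αβ-algorithm and its cut-off can be written as two small functions. The formula and the 1742.5 cut-off are taken from the text. The direction of the call is an inference rather than something stated outright: because RDW and MN-LMALS-RET are reported as lower and MCH as higher in the α-thalassaemia trait, scores above the cut-off are assumed here to point towards the β-thalassaemia trait. How a score exactly equal to the cut-off should be handled is not stated, and the function names and example inputs are hypothetical.

```python
ALPHA_BETA_CUTOFF = 1742.5  # cut-off reported in the text


def alpha_beta_score(mn_lmals_ret: float, rdw: float, mch: float) -> float:
    """(MN-LMALS-RET x RDW) - MCH, with RDW in % and MCH in pg."""
    return (mn_lmals_ret * rdw) - mch


def suggested_trait(score: float, cutoff: float = ALPHA_BETA_CUTOFF) -> str:
    """Assumed direction: scores above the cut-off suggest beta-thalassaemia trait.

    This is a screening hint only; confirmation still relies on haemoglobin
    analysis and, where needed, DNA testing.
    """
    if score > cutoff:
        return "beta-thalassaemia trait"
    return "alpha-thalassaemia trait"
```

With made-up inputs, `alpha_beta_score(100.0, 16.0, 20.0)` evaluates to 1580.0, which falls below the cut-off, while `alpha_beta_score(120.0, 17.0, 19.0)` evaluates to 2021.0, which falls above it.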

In-House Validation
In-house validation was performed in a cohort of 310 subjects comprising various red cell disorders, as summarised in Table 3 and Supplementary S3; the corresponding box and whisker plot is depicted in Figure 5. As shown, there was no significant overlap between the quartile 1 (Q1) to quartile 3 (Q3) ranges of the α- and β-thalassaemia trait groups, which together comprise the majority of the cases included in this cohort. The overlapping cases require confirmation by HPLC and DNA testing. There is a significant overlap between the IDA and β-thalassaemia trait groups.

[Figure 5 caption, partially recovered: ... iron deficiency, α-thalassaemia trait (labelled as alpha thalassemia), and β-thalassaemia trait (labelled as beta-thalassemia). Outliers are marked as "*" and "0" in the HbE trait and β-thalassaemia trait groups.]
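The Q1 to Q3 comparison described for the in-house cohort can be sketched as an interval-overlap check on algorithm scores. This is an illustration only: the helper names and the toy data are hypothetical, and Python's `statistics.quantiles` (default "exclusive" method) may place quartiles slightly differently from the software used to draw the box plots.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def iqr_bounds(values: Sequence[float]) -> Tuple[float, float]:
    """Return (Q1, Q3) for a group of algorithm scores."""
    q1, _median, q3 = quantiles(values, n=4)
    return q1, q3


def iqr_overlap(group_a: Sequence[float], group_b: Sequence[float]) -> bool:
    """True when the Q1-Q3 ranges of the two groups intersect."""
    a_q1, a_q3 = iqr_bounds(group_a)
    b_q1, b_q3 = iqr_bounds(group_b)
    return a_q1 <= b_q3 and b_q1 <= a_q3
```

Groups whose interquartile ranges do not intersect can still share individual cases in the whiskers, which is why the overlapping cases are referred on for HPLC and DNA testing.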

Discussion
Distinguishing between various thalassaemia traits is essential in clinical decision-making as it will influence the treatment options and outcome of the patients [16]. However, the diagnosis of thalassaemia traits requires time and resources, as various screening and confirmatory tests, including FBC, morphological review of blood smears, haemoglobin electrophoresis, and molecular genetic analysis, are needed to establish a reliable and robust diagnosis [17]. Several algorithms based on RBC indices have been proposed to differentiate iron deficiency anaemia and thalassaemia traits [17] [25], including the RDW Index [26].
In more recent studies, researchers have been exploring machine learning algorithm-based approaches, aided by advancements in data science [27,28]. In a study in Thailand, a web-based prediction tool for discriminating thalassaemia trait from IDA was developed using a machine learning algorithm. However, the authors of that study did not delineate the subtypes of thalassaemia but created a support vector machine (SVM) model, named ThalPred, to distinguish IDA from thalassaemia trait [27]. One of the largest studies, performed in Israel, validated Shine's formula and an in-house developed SVM formula in a total of 64,586 subjects. The SVM formula displayed high sensitivity (>98%) and a negative predictive value of >99.77%, and was robust in distinguishing β-thalassaemia carriers from subjects with normal counts and iron-deficient women [28].
Previously, using the CPD parameters generated by the Beckman Coulter DxH 800, Ng and colleagues proposed an algorithm to differentiate IDA from thalassaemia traits among their subjects. With a cut-off value of 23 and an area under the curve (AUC) of 0.995 (95% CI of 0.99-1.00), the algorithm achieved a sensitivity of 97% and a specificity of 99.1%. They suggested that no biochemical marker of iron status, such as serum ferritin, is required with this formula; hence, a simplified diagnostic workup of IDA and thalassaemia was proposed [16]. However, these formulas were devised in various populations and show varying sensitivity and specificity; none is specific for differentiating α- from β-thalassaemia, and none incorporates reticulocyte CPD parameters, as was done in the present study [17,29,30].
In the present study, we developed an algorithm (αβ-algorithm) based on reticulocyte CPD parameters that will serve as a downstream tool after FBC is performed to distinguish α- versus β-thalassaemia trait, using the Beckman Coulter Unicel DxH 800. The ROC curve analysis revealed three parameters with the highest AUC, MN-LMALS-RET, MCH, and RDW, with good sensitivity and specificity (Figures 1a,b and 2). The αβ-algorithm was devised by combining these parameters ((MN-LMALS-RET × RDW) − MCH), which resulted in a higher AUC with better sensitivity and specificity for discriminating α- from β-thalassaemia traits. The rationale for each component of the αβ-algorithm is as follows: MN-LMALS-RET measures the red blood cell volume in the reticulocyte channel through lower median angle light scattering [32], MCH indicates the amount of haemoglobin in a red blood cell [33], and RDW measures the variation in red blood cell size [12].
Our prospective study revealed the usefulness of the αβ-algorithm as a downstream pipeline following FBC analysis, as depicted in Figure 4. Based on Figure 5, when this algorithm was evaluated in a cohort of various red cell disorders, there was an overlap between the subgroups, especially between the β-thalassaemia trait and IDA groups. Nevertheless, the diagnostic pipeline for cases suspected as IDA differs in the Malaysian clinical practice guideline, so this overlap will not impact the role of this algorithm in the differential diagnosis of α- and β-thalassaemia. Interlaboratory validation showed that the findings are reproducible in another laboratory setting using the same analyser model (Unicel DxH 800), reiterating the usefulness of the αβ-algorithm in the screening of thalassaemia cases, in particular α- and β-thalassaemia, in geographical locations with a high prevalence of thalassaemias such as Malaysia and other Asian countries.
There are several limitations of this study. First, it included a relatively low number of α- and β-thalassaemia cases. Secondly, in the β-thalassaemia group, the maximum MCV was 82 fl, which overlaps with the minimum MCV of 80 fl in the healthy control group. In such cases, other RBC indices (RBC, MCH) and HPLC would aid in the differential diagnosis of β-thalassaemia. Ideally, molecular analysis for both α and β mutations needs to be performed on all samples so as not to miss silent α- or β-thalassaemia during screening. However, we did not perform molecular analysis on all cases because of the high testing cost and the rarity of silent β-thalassaemia [34,35].
We propose that the robustness of the αβ-algorithm be validated in a larger cohort in other geographical locations that exhibit a high prevalence and incidence of α- and β-thalassaemia traits. Based on this study, we recommend that FBC analysis be performed within 6 h of sampling, following International Council for Standardisation in Haematology (ICSH) guidelines [36]. In addition, the conditions for transportation and storage of samples must meet the guidelines recommended by the Clinical and Laboratory Standards Institute (CLSI) [37]. Any deviation from these guidelines may affect the sensitivity and specificity of the algorithm, as CPD parameters are highly affected by structural changes of the cells.

Conclusions
Devising an algorithm that accurately distinguishes α- and β-thalassaemia traits using CPD parameters derived from an FBC analyser is essential in the pipeline of large population-based screening carried out in Malaysia, where a high prevalence of these inherited disorders is reported. Implementing the αβ-algorithm in the screening laboratory will promote cost optimisation, as only suspected subjects will be included in the downstream pipeline before performing haemoglobin electrophoresis and/or further specific genetic testing, as depicted in Figure 4. Applying the αβ-algorithm is relatively straightforward, adds no cost to the diagnosis, and requires no sophisticated analysis or expertise. Hence, we strongly recommend adopting the αβ-algorithm in all screening laboratories with access to similar CPD parameters, both nationwide and in other geographical regions with a high prevalence of thalassaemia-related disorders [38,39].

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data used in this study are available as Supplementary Materials (Supplementary S1-S4).