Diagnostic Accuracy of Various Immunochromatographic Tests for NS1 Antigen and IgM Antibodies Detection in Acute Dengue Virus Infection

Introduction: In this paper, rapid diagnostic tests (RDTs) were evaluated for their utility as reliable tests in resource-constrained settings. In most studies, NS1 antigen and immunoglobulin M (IgM)-based immunochromatographic tests (ICTs) were considered for acute phase detection. We aimed to evaluate the diagnostic accuracy of NS1, IgM, and NS1/IgM-based ICTs to detect acute dengue virus (DENV) infection in dengue-endemic regions. Methods: Studies were electronically identified using the following databases: MEDLINE, Embase, Cochrane Library, Web of Science, and CINAHL Plus. Keywords including dengue, rapid diagnostic test, immunochromatography, sensitivity, specificity, and diagnosis were applied across databases. In total, 15 studies were included. Quality assessment of the included studies was performed using the QUADAS-2 tool. All statistical analyses were conducted using RevMan, MedCalc, and SPSS software. Results: The included studies comprised a total of 4135 individuals, originating largely from the Americas and Asia. The prevalence of DENV cases was 53.8%. Pooled sensitivities vs. specificities for NS1 (only), IgM (only), and combined NS1/IgM were 70.97% vs. 94.73%, 40.32% vs. 93.01%, and 78.62% vs. 88.47%, respectively. The diagnostic odds ratio (DOR) of DENV for NS1 ICTs was 43.95 (95% CI: 36.61–52.78), for IgM only ICTs was 8.99 (95% CI: 7.25–11.16), and for NS1/IgM ICTs was 28.22 (95% CI: 24.18–32.95). ELISA ICTs yielded a DOR of 21.36 (95% CI: 17.08–26.741). RT-PCR had a DOR of 40.43 (95% CI: 23.3–71.2). Heterogeneity tests for subgroup analysis by ICT manufacturers yielded a χ² value of 158.818 (df = 8), p < 0.001, for NS1 ICTs, and a χ² value of 21.698 (df = 5), p < 0.001, for IgM ICTs. Conclusion: NS1-based ICTs had the highest diagnostic accuracy in acute phases of DENV infection. Certain factors influenced the pooled sensitivity, including ICT manufacturers, nature of the infection, reference method (RT-PCR), and serotypes. Prospective studies may examine the best strategy for incorporating ICTs for dengue diagnosis.


Introduction
Dengue is a flavivirus infection spread by Aedes aegypti and Aedes albopictus mosquitoes, with four antigenically distinct dengue viruses (DENVs, serotypes 1-4) causing infection, and is a significant public health problem [1]. It has rapidly spread to nearly half the world's population and has caused epidemics in these regions with continued geographical expansion [2]. It has caused 400 million annual infections, which have risen exponentially

Studies were excluded if they had any of the following characteristics: (1) use of inappropriate reference assays to assign true positive/true negative status to study samples, including 'in-house' assays for which the diagnostic accuracy had not been previously established; (2) inappropriate study population (such as convalescent samples only); (3) the study was limited to the detection of IgG rather than IgM and IgG; (4) the number of study samples was insufficient; (5) incomplete description of samples, such that it was impossible to determine the timing of sample collection; (6) errors or inconsistencies in the published study data; (7) the exclusion of indeterminate results; (8) partial verification of the study samples or the use of multiple reference assays; or (9) the assay took more than 60 min to perform, such as immunoblot (IBT)-style assays. The list of excluded full-text studies is given in Supplementary Table S1.

Statistical Analysis
The 'gold standard' (or reference) assay was compared with the index test to define true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values. The measures of diagnostic accuracy, namely sensitivity (SN), specificity (SP), positive likelihood ratio (LR+), negative likelihood ratio (LR−), positive predictive value (PPV), negative predictive value (NPV), Fisher exact p-values, and diagnostic odds ratio (DOR), were then computed (Habbema et al., 2002). Individual study results were pooled to generate an overall estimate of diagnostic accuracy. Chi-square and I² (Higgins et al., 2003) statistics were calculated before pooling to detect significant heterogeneity between subgroups. For the NS1 (only), IgM (only), and NS1/IgM subgroup analyses in the acute phase, the DOR, error odds ratio, Phi coefficient (to measure the strength of the relationship), and relative improvement over chance (RIOC, to measure predictive efficiency) were additionally computed. Using the 2-by-2 table data, the Fisher exact p-values and other relevant indices were computed, including an analysis of risk factors and of the effectiveness of a diagnostic criterion for dengue based on multiple parameters (SN, SP, PPV, NPV, DOR, and error odds ratios), and other measures of association (Phi coefficient). The confidence intervals for the estimated parameters were computed by a general method as listed by Fleiss and colleagues (1981) [23]. The confidence intervals for RIOC and Phi coefficient as measures of predictive efficiency and strength of association in 2-by-2 tables were based on Farrington and Loeber (1989) [24]. Definitions of the listed statistical terms are given in Table 1. A chi-square result of p < 0.1 was considered significant, given the low power of the test. I² values range continuously from 0 to 100%, with 0% defining no heterogeneity and 25, 50, and 75% tentatively assigned as limits of low, medium, and high heterogeneity (Higgins et al., 2003).
If heterogeneity was not significant, a Mantel-Haenszel fixed-effects model (Mantel and Haenszel, 1959) was used to calculate results; when it was significant, a random-effects model was used (DerSimonian and Laird, 1986). Summary receiver operating characteristics (SROC) (Littenberg and Moses, 1993) were also calculated to give a final area under the curve (AUC) value for pooled and subgroup analyses. A summary ROC curve is a plot of the combined SN on the y-axis against (1-SP) on the x-axis. The 45° diagonal line connecting (0, 0) to (1, 1) is the ROC curve corresponding to random chance. The ROC curve for the gold standard is the line connecting (0, 0) to (0, 1) and (0, 1) to (1, 1). Therefore, the summary lines that were closest to the upper-left corner of the plot were considered nearest to the gold standard dengue testing format. Analyses were performed using RevMan 5.4.1, MedCalc Software (v 20.104), and IBM® SPSS® software (v. 27).
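As a rough illustration of the heterogeneity measure described above, the following Python sketch derives I² from a heterogeneity chi-square and its degrees of freedom, using the Higgins et al. (2003) definition I² = 100% × (Q − df)/Q. The inputs are the manufacturer-subgroup χ² values reported in the abstract; treating them as Cochran's Q is an assumption, and the function name `i_squared` is illustrative rather than part of RevMan, MedCalc, or SPSS.

```python
def i_squared(q: float, df: int) -> float:
    """Return I² (in percent) from Cochran's Q and its degrees of freedom.

    I² = 100 * (Q - df) / Q, truncated at 0 when Q <= df.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Subgroup heterogeneity by ICT manufacturer, as reported in the abstract.
# Assumption: the reported chi-square values are Cochran's Q statistics.
reported = {
    "NS1 ICTs": (158.818, 8),
    "IgM ICTs": (21.698, 5),
}

for label, (q, df) in reported.items():
    print(f"{label}: Q = {q}, df = {df}, I² = {i_squared(q, df):.1f}%")
```

Under that assumption, both subgroups would sit above the 75% limit that is tentatively assigned to high heterogeneity.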

Statistical Term | Definition
True Positive (TP) | Individuals with the disease with the value of the parameter of interest above the cut-off.
False Positive (FP) | Individuals without the disease with the value of the parameter of interest above the cut-off.
True Negative (TN) | Individuals without the disease with the value of the parameter of interest below the cut-off.
False Negative (FN) | Individuals with the disease with the value of the parameter of interest below the cut-off.
Positive Likelihood Ratio (LR+) | Measures how likely it is that a positive test result will occur in individuals with the disease compared with those without the disease.
Negative Likelihood Ratio (LR−) | Measures how likely it is that a negative test result will occur in individuals with the disease compared with those without the disease.
Positive Predictive Value (PPV) | The proportion of positive diagnostic test results that are true positive results.
Negative Predictive Value (NPV) | The proportion of negative diagnostic test results that are true negative results.
Diagnostic Odds Ratio (DOR) | A general estimate of the discriminative power of diagnostic procedures. It is the ratio of the odds of a positive test in individuals with the disease to the odds of a positive test in individuals without the disease.
Error Odds Ratio (EOR) | Measures the likelihood of errors in diagnostic tests in individuals with the disease compared with those without.
Phi Coefficient | Also called the mean square contingency coefficient, this measures the association between two variables.
Relative Improvement Over Chance (RIOC) | Measures the predictive efficiency of the test.
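The definitions above can be made concrete with a short Python sketch that computes most of these measures from a single 2-by-2 table. The counts are hypothetical and are not taken from any included study. The confidence interval uses the common log-odds (Woolf) approximation as a stand-in for the general method of Fleiss and colleagues cited in the text, and the sketch assumes no cell is zero. EOR and RIOC are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # diseased, index test positive
    fp: int  # non-diseased, index test positive
    fn: int  # diseased, index test negative
    tn: int  # non-diseased, index test negative


def accuracy_measures(t: TwoByTwo) -> dict:
    """Compute standard diagnostic accuracy measures (assumes no zero cells)."""
    sn = t.tp / (t.tp + t.fn)
    sp = t.tn / (t.tn + t.fp)
    dor = (t.tp * t.tn) / (t.fp * t.fn)
    # Phi: association between test result and disease status.
    phi = (t.tp * t.tn - t.fp * t.fn) / math.sqrt(
        (t.tp + t.fp) * (t.fn + t.tn) * (t.tp + t.fn) * (t.fp + t.tn)
    )
    # Woolf standard error of ln(DOR); an assumption, not the paper's stated method.
    se_log_dor = math.sqrt(1 / t.tp + 1 / t.fp + 1 / t.fn + 1 / t.tn)
    return {
        "SN": sn,
        "SP": sp,
        "PPV": t.tp / (t.tp + t.fp),
        "NPV": t.tn / (t.tn + t.fn),
        "LR+": sn / (1 - sp),
        "LR-": (1 - sn) / sp,
        "DOR": dor,
        "DOR_95CI": (
            math.exp(math.log(dor) - 1.96 * se_log_dor),
            math.exp(math.log(dor) + 1.96 * se_log_dor),
        ),
        "Phi": phi,
    }


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=80, fp=10, fn=20, tn=90)
for name, value in accuracy_measures(example).items():
    print(name, value)
```

With real study data, zero cells are common for highly specific tests; a continuity correction (for example, adding 0.5 to every cell) would then be needed before computing LR+, DOR, or its interval.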

Results
In total, 1652 studies were identified by electronic searches. Abstracts were read, and 46 studies were retained for full-text quality assessment. Five studies were identified by reading reference lists and hand-searching journals (umbrella review search). In total, 51 studies were selected for full-text review against the inclusion and exclusion criteria. Only 15 studies were included according to the selection criteria, whereas 36 were excluded (Supplementary Table S1). Figure 1 shows a flow chart of the selection procedure.

Quality of Included Studies
The risk of bias and applicability concerns summary for every included study is depicted in Figure 2. Firstly, on assessing the risk of bias, nine studies had unclear risk and six had a low risk of bias for patient selection. The index test had an unclear risk of bias in six studies, whereas nine studies had a low risk of bias. On noting reference standards, all 15 studies had a low risk of bias. Flow and timing assessment revealed eight studies with unclear risk of bias, five studies with low risk of bias, and two with a high risk of bias. Secondly, the applicability concerns assessment revealed that 14 studies had a low risk of bias whereas only one had an unclear risk of bias. Index test assessment yielded low risks of bias for all 15 studies. Finally, the reference standard assessment determined that 14 studies had a low risk of bias and only one had an unclear risk.

Narrative Review of Included Studies
A summary of all included data is shown in

Individual and Pooled Study Diagnostic Accuracy Results
The individual studies using dengue ICT for dengue NS1 only (Pooled SN = 70.97%, SP = 94.73%), IgM only (Pooled SN = 40.32%, SP = 93.01%), and combined NS1/IgM (Pooled SN = 78.62%, SP = 88.47%) detection in acute studies were pooled and analyzed separately, as discussed in subsequent sections. The summary forest plot outcomes for all tests are depicted in Figure 3.

The SN, SP, PPV, NPV, LR+, LR− and Fisher Exact P-values for individual studies are presented in Table 4. The cumulative SN was 40.32% (SD = 26.02) and the value ranged from 11.7% to 89.9%. The cumulative SP was 93.01% (SD = 6.21) and it ranged from 83.9% to 100%. The cumulative PPV was 0.8074 (SD = 0.192), and it ranged from 0.39 to 1. The cumulative NPV was 0.6062 (SD = 0.205), and it ranged from 0.3 to 0.91. The cumulative LR+ was 6.487 (SD = 9.06), and it ranged from 0.85 to 28.56. The cumulative LR− was 0.6496 (SD = 0.293), and it ranged from 0.1 to 1.03.
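As a consistency check on the pooled estimates, the DOR can be written in terms of sensitivity and specificity as DOR = [SN/(1 − SN)] × [SP/(1 − SP)]. The minimal Python sketch below applies that identity to the pooled SN and SP values quoted in this section; the function name is illustrative, and the calculation is a back-of-envelope check rather than the pooling procedure used in the analysis.

```python
def dor_from_sn_sp(sn: float, sp: float) -> float:
    """Diagnostic odds ratio from sensitivity and specificity (as proportions)."""
    return (sn / (1 - sn)) * (sp / (1 - sp))


# Pooled sensitivity and specificity reported for each ICT type.
pooled = {
    "NS1 only": (0.7097, 0.9473),
    "IgM only": (0.4032, 0.9301),
    "NS1/IgM": (0.7862, 0.8847),
}

for label, (sn, sp) in pooled.items():
    print(f"{label}: DOR = {dor_from_sn_sp(sn, sp):.2f}")
```

The resulting values (about 43.9, 9.0, and 28.2) agree with the DORs reported in the abstract to within rounding of the inputs.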

Subgroup Analysis by ICT Manufacturer
Subgroup analysis was performed to determine the influence of multiple ICT assays on diagnostic accuracy and inter-study heterogeneity. Studies were grouped according to the ICT assay used to calculate heterogeneity and diagnostic accuracy results for sample verification. Heterogeneity trends (Cochran's Q and χ²) were calculated for studies that used the following nine ICTs for NS1 antigen detection, including: (

Diagnostic Accuracy by Reference Assay
A subgroup analysis was performed to determine the influence of multiple reference assays on study diagnostic accuracy. Studies were grouped according to the reference assay used for sample verification to calculate the diagnostic accuracy results. Studies used: (1) ELISA only, (2) RT-PCR only, (3) HAI only, or (4) multiple reference assays (two or more).

Diagnostic Accuracy by Primary/Secondary Disease
Primary and secondary infection ICT interpretation (as defined by the manufacturer using IgM and IgG results) was compared with a valid reference assay to detect primary and secondary dengue infection. The overall sensitivity for primary infection was 75.68% (SD = 25.45), whereas for secondary infection, the sensitivity was 71.9% (SD = 20.12). The dengue ICTs were therefore more sensitive for primary infection.

Identification of Different DENV Serotypes
The sensitivity differed across DENV 1-4 serotypes between subgroups. The dengue ICTs were most sensitive for DENV 3 (cumulative SN = 83.63%), followed by DENV 1 (cumulative SN = 81.3%), DENV 2 (cumulative SN = 75.22%), and DENV 4 (cumulative SN = 62.06%).

Summary Receiver Operating Characteristics (SROC) Findings
The SROC curve analysis of the NS1, NS1/IgM, and IgM study results is shown in Figure 5. The plot showed slightly better (optimum SN and SP) results with NS1 alone, followed by NS1/IgM, and lastly, IgM alone. Therefore, the summary line that is closest to the upper-left corner of the plot (NS1 in this case) is considered nearest to the gold standard dengue testing format; this is closely followed by combined NS1/IgM testing, and lastly IgM, which is the furthest from the gold-standard metric.
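To make the SROC construction more tangible, the following Python sketch fits the linear Littenberg and Moses model, D = a + b·S with D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), and integrates the back-transformed curve to obtain an AUC. The study-level (SN, SP) pairs are hypothetical, the regression is unweighted least squares, and the helper names are illustrative; the sketch does not reproduce the RevMan output behind Figure 5 and assumes |b| < 1 so that the curve runs from (0, 0) to (1, 1).

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit_moses(points):
    """Fit D = a + b*S by ordinary least squares.

    points: iterable of (sensitivity, specificity) pairs, each strictly in (0, 1).
    """
    d = [logit(sn) - logit(1 - sp) for sn, sp in points]
    s = [logit(sn) + logit(1 - sp) for sn, sp in points]
    n = len(d)
    s_mean, d_mean = sum(s) / n, sum(d) / n
    sxx = sum((x - s_mean) ** 2 for x in s)
    sxy = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Expected sensitivity on the SROC curve at a given false positive rate."""
    return expit(a / (1 - b) + (1 + b) / (1 - b) * logit(fpr))


def sroc_auc(a: float, b: float, steps: int = 2000) -> float:
    """Trapezoidal area under the SROC curve, anchored at (0, 0) and (1, 1)."""
    xs = [i / steps for i in range(steps + 1)]
    ys = [0.0] + [sroc_tpr(x, a, b) for x in xs[1:-1]] + [1.0]
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(steps))


# Hypothetical study-level (sensitivity, specificity) pairs, for illustration only.
studies = [(0.55, 0.97), (0.70, 0.95), (0.78, 0.93), (0.85, 0.88), (0.62, 0.96)]
a, b = fit_moses(studies)
print(f"a = {a:.3f}, b = {b:.3f}, AUC = {sroc_auc(a, b):.3f}")
```

A curve with a larger AUC lies closer to the upper-left corner, which is the comparison the text draws between NS1, NS1/IgM, and IgM.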

Discussion
We conducted a systematic review and meta-analysis of NS1, IgM, and combined NS1/IgM detection with different commercially available ICTs to detect acute DENV infection. In total, 5202 individuals were screened, with a prevalence rate of 55.9% across 19 studies in dengue-endemic countries. We found that NS1 ICTs had more diagnostic potential (DOR: 48 (19.3-78.5%) and comparable specificity (88.1-95.2%) was noted across six IgM ICT manufacturers. RT-PCR was the most accurate reference test, whereas ELISA was the least. The sensitivity of ICTs to detect primary infection was 77.9% compared with 66% for secondary infection in a subset of the studies that reported such data. The DENV-3 and DENV-1 serotypes were the most likely to be detected, compared with other serotypes. Overall, NS1 ICTs were the most predictive of acute DENV infection, with higher detection in primary infection and DENV-3 serotypes. Sensitivity varied across ICT manufacturers, whereas specificity was comparable. Our results emphasize the high diagnostic accuracy of NS1 ICTs in the acute phase, with certain commercial ICTs having higher sensitivities.
Our primary reason for conducting this study was to consider ICTs in detecting DENV in the acute phase to improve prompt diagnosis and early treatment. Many studies have demonstrated a high pooled sensitivity of NS1 ICTs and considerably low performance of IgM ICTs [14][15][16][17][18]. When used as a screening modality, ICTs may be part of the diagnostic algorithm that accounts for their high number of false negatives to optimize their performance [40]. While we did find a higher sensitivity of combined NS1/IgM ICTs, the overall diagnostic accuracy was prominently higher for NS1 ICTs. We only included acute-phase data, meaning only individuals presenting within seven days after the onset of symptoms were included. As we saw large inter-ICT manufacturer variability, we suggest considering ICTs with higher sensitivities/specificities for use in community settings, such as the NS1 Ag STRIP™ (Biorad Laboratories, Marnes-La-Coquette, France), similarly observed by Zhang et al. [16]. We considered these ICTs in the acute phase, since timely detection expands the diagnostic window of opportunity compared with gold standards (e.g., viral isolation, RT-PCR), which may take longer [41].
ICTs have the highest ability to distinguish serotypes DENV-3 and DENV-1, and the lowest for DENV-4, shown similarly by Zhang et al. and Shan et al. [16,17]. Among DENV-suspected individuals, NS1 ICTs and NS1/IgM ICTs are well-suited to detect acute infections of two serotypes, DENV-3 and DENV-1. Still, they are utilized at clinicians' discretion to guide acute clinical management, as observed by Lim et al. [42]. Understanding different DENV serological sensitivity helps apply ICTs as DENV-3 serotypes are more widespread in certain regions (e.g., Sri Lanka, Malaysia, Vietnam) [43]. The global distribution of different dengue serotypes has implications for diagnostic strategies, and phylogeographic relationships with serotypes in specific regions may help guide the adaptation of ICTs [43].
It was possible to separately analyze primary and secondary dengue infection in 15 studies. All serum samples were collected between 0-7 days after symptom onset; patients were categorized as having a primary infection if negative for IgG, and a secondary infection if positive for IgG. Overall, the 77.9% sensitivity of NS1/IgM ICTs for primary infections was higher than that observed for IgM ICTs only (71%) by Blacksell et al. [14]; secondary infections had a 66% sensitivity rate, similar to that observed by Blacksell et al. [14]. We were interested in identifying the performance of NS1 and IgM-based ICTs for the detection of primary and secondary infection in the acute phase. This was because secondary infections are more likely to lead to dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), which are more severe forms of dengue fever (DF) [44]. Two hallmark symptoms of DHF are bleeding and plasma leakage due to increased vascular permeability and abnormal hemostasis [45]. A major loss of vascular fluids results in DSS, a medical emergency that puts patients into hypovolemic shock and is associated with a mortality rate higher than 50% when untreated [46].
However, studies have demonstrated that DHF occurs in primary infections, which refutes the "original antigenic sin" hypothesis that individuals with secondary infections have a higher risk of DHF/DSS; the complex nature of DHF is also associated with age, sex, serotype, and genetic background [47]. Regardless, current data suggest that secondary dengue infections have a higher frequency of severe disease. A key feature of dengue ICTs as a satisfactory diagnostic test is discriminating between the two infection states [48]. The sensitivity ranged from 66% to 77.9%, which was less than optimal; however, we expect the specificity to be high in acute primary and secondary infections, as shown in a previous meta-analysis [14]. We could not estimate specificity due to the scarcity of data in our studies. Regardless, we consider NS1/IgM ICTs to be a good test for differentiation of infection type in the acute phase to screen patients at risk for severe DENV infection sequelae. While outside the scope of this paper, the combined use of IgM and IgG has been shown to increase sensitivity during the first 7 days, as an IgG:IgM ratio > 1 is an excellent marker of secondary infection.
Our study has certain limitations. First, there was a lack of data on the adequate characterization of the samples by mean time since the onset of symptoms. Second, there was high variability in the performance of different ICTs; we did not conduct any further analyses to account for low-performing ICTs, but it is likely that if outlier ICTs were removed, the overall diagnostic potential of ICTs would be much higher. Third, we found that the reference assays had a high rate of heterogeneity compared with one another. We expect the ICTs to have variable efficacies due to different diagnostic reference standards. However, all studies used at least one reference test with high specificity, e.g., RT-PCRs, and ELISAs. Fourth, there was high heterogeneity within studies due to different methodologies. We, however, removed any studies that failed to meet the selection criteria. There may be different prevalence rates that may act as confounders for primary and secondary dengue infection burden. However, we considered dengue-endemic countries to eliminate the geographical bias of disease burden. The high heterogeneity may also be due to different control arms across the studies. While most studies screened suspected DF patients, certain studies used negative samples from dengue-endemic and non-dengue-endemic countries and patients diagnosed with other infections similar to DENV infection. Additionally, we cannot rule out the cross-reactivity of available RDTs with other flaviviruses, especially the Zika virus (ZIKV). However, Tan et al. reported through their surveillance data, that the cross-reactivity found in the lowest reactive titers of flaviviruses was generally higher than virus titers reported in natural infections due to respective flaviviruses. The cross-reactivity challenges can ideally be addressed through detection thresholds of assays designed for specific flaviviruses, e.g., ZIKV. 
Nevertheless, the cross-reactivity thresholds of commercially available assays are less likely to identify false positive results among clinical specimens in flavivirus-endemic regions. Last, due to incomplete data, we did not analyze for confounders such as age and gender.
Our study has a few strengths. We documented the entire search strategy and procedures; as such, we reported reasons for the removal of studies. We analyzed data for NS1, IgM, and combined NS1/IgM diagnostic accuracy, which provides insight into their ability to rule in and rule out disease. We obtained and analyzed data for different types of infection and by serology. Another strength was the robust evaluation of many commercially available ICTs, which had not been conducted previously. Overall, we found many methodological concerns in existing studies that examined the roles of ICT for DENV infection detection. It is important to adopt clear methodological guidelines for the assessment of DENV infection diagnostic tests, such as the QUADAS-2 tool [22] and CASP checklist [49]. The dramatic increase in the dengue burden globally is alarming, and there is a need to regulate already available diagnostic tools for dengue across the healthcare sector [50].

Conclusions
In conclusion, we found that NS1 ICTs have good diagnostic accuracy and excellent pooled specificity for DENV detection in acute phases of infection within 7 days post-symptom onset. The NS1 ICTs can distinguish DENV-3 and DENV-1 more accurately. Type of infection, ICT manufacturers, reference methods, and DENV serotypes influence the diagnostic accuracy of these tests. There is a critical need to evaluate different ICTs for their diagnostic accuracy using standardized methodologies. Such data may be leveraged to incorporate ICTs as part of diagnostic algorithms in high-burden, dengue-endemic regions.