Performance of Antigen Detection Tests for SARS-CoV-2: A Systematic Review and Meta-Analysis

Coronavirus disease 2019 (COVID-19) initiated global health care challenges such as the necessity for new diagnostic tests. Diagnosis by real-time PCR remains the gold-standard method, yet economic and technical constraints limit its use at points of care (POC) or for repetitive tests in populations. Considerable effort has gone into developing, using, and validating antigen-based tests (ATs). Since individual studies focus on few methodological aspects of ATs, a comparison of different tests is needed. Herein, we perform a systematic review and meta-analysis of data from articles in PubMed, medRxiv and bioRxiv. The bivariate method for meta-analysis of diagnostic tests pooling sensitivities and specificities was used. Most of the AT types for SARS-CoV-2 were lateral flow immunoassays (LFIA), fluorescence immunoassays (FIA), and chemiluminescence enzyme immunoassays (CLEIA). We identified 235 articles containing data from 220,049 individuals. All ATs using nasopharyngeal samples show better performance than those using throat or saliva samples (72% compared to 40%). Moreover, the rapid methods LFIA and FIA show about 10 percentage points lower sensitivity compared to the laboratory-based CLEIA method (72% compared to 82%). In addition, rapid ATs show higher sensitivity in symptomatic patients compared to asymptomatic patients, suggesting that viral load is a crucial parameter for ATs performed at POCs. Finally, all methods perform with very high specificity, reaching around 99%. LFIA tests, though with moderate sensitivity, appear as the most attractive method for use at POCs and for performing seroprevalence studies.


Introduction
COVID-19, caused by SARS-CoV-2, remains a global public health threat that has already claimed more than six million lives (https://covid19.who.int, accessed on 15 May 2022), with modeling estimates suggesting that this figure is probably much higher [1,2]. Vaccines, however, seem to perform well, especially after the administration of booster doses, providing moderate but short-lived protection from SARS-CoV-2 infection but significantly reducing COVID-19-related morbidity and mortality [3][4][5][6][7][8][9]. Non-pharmaceutical interventions (test-trace-isolate, hand washing, physical distancing, travel restrictions, school closures, closures of businesses, and stay-at-home orders) have also proved their effectiveness in containing the spread of the pandemic virus before the advent of vaccines [10][11][12]. Some of these measures will still be needed in our gradual efforts to return to normalcy. Testing in particular is essential to diagnosis, but also to developing and sustaining a reliable surveillance system for the years to come [13,14].
Real-time reverse transcription polymerase-chain-reaction (rt-PCR) test is the benchmark method for the clinical diagnosis of COVID-19 [15][16][17]. As such, it is designed for use or measurement of nucleocapsid (N) or spike (S) proteins of SARS-CoV-2 (qualitatively or quantitatively depending on the method used); and (c) providing the necessary data that allow the calculation of sensitivity and specificity. We included studies that reported data on cases (positive samples) and healthy controls (negative samples) as well as studies with data available only for cases (see also Section 2.5).

Data Extraction
Data extraction was performed in a predetermined Microsoft Excel® sheet. From each study we extracted the following information: first author's last name, type of antigen used, type of sample, method of detection used, and the qPCR cycle threshold (Ct) values used for the detection of SARS-CoV-2 RNA. Additionally, the method of antigen testing used was recorded along with the brand name, the name of the manufacturer, and the existence of data from virus culture. Symptomatic and asymptomatic cases as well as male/female ratios were also recorded, if given. To obtain sensitivity and specificity measures, a 2 × 2 contingency table was constructed; thus, true positive (TP), false negative (FN), true negative (TN), and false positive (FP) results were recorded. In cases where no controls were used, we used TP and FN values only.
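The mapping from the 2 × 2 table to sensitivity and specificity can be sketched in Python as follows. This is a minimal illustration: the names `ContingencyTable`, `sensitivity`, and `specificity` and the example counts are hypothetical and are not taken from the included studies. Case-only studies are represented by leaving TN and FP unset, in which case no specificity is returned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContingencyTable:
    """Counts from one study; tn/fp stay None for case-only studies."""
    tp: int                    # antigen test positive, qPCR positive
    fn: int                    # antigen test negative, qPCR positive
    tn: Optional[int] = None   # antigen test negative, qPCR negative
    fp: Optional[int] = None   # antigen test positive, qPCR negative


def sensitivity(t: ContingencyTable) -> float:
    """True positive rate: TP / (TP + FN)."""
    return t.tp / (t.tp + t.fn)


def specificity(t: ContingencyTable) -> Optional[float]:
    """True negative rate: TN / (TN + FP); None when no controls exist."""
    if t.tn is None or t.fp is None:
        return None
    return t.tn / (t.tn + t.fp)


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    study = ContingencyTable(tp=72, fn=28, tn=990, fp=10)
    print(sensitivity(study), specificity(study))
```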

Study Outcomes
The primary outcome of this meta-analysis was the sensitivity and specificity of AT in relation to qPCR. Secondary outcomes included the performance of AT on different sample types (namely, nasopharyngeal, saliva, and throat samples) and by symptoms (asymptomatic versus symptomatic SARS-CoV-2 infected persons). We also explored the performance of AT across ranges of qPCR Ct values (a higher Ct indicates a lower viral load).

Data Analysis
The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2 tool) was used to assess the quality of the included studies in terms of diagnostic accuracy [28]. The four domains assessed were patient selection, index test, reference standard, and flow and timing. Each domain was evaluated following classifications according to judgment, i.e., low risk, high risk, and unclear risk.
The bivariate meta-analytic method modified for the meta-analysis of diagnostic tests was used [29]. This method has been reported to be equivalent to the so-called hsROC method [30]. It uses logit-transforms of true positive rate (TPR) and false positive rate (FPR) in order to model sensitivity and specificity; it can also be used for the evaluation of between-studies variability (heterogeneity). Studies that include information only for logit(TPR), that is, only for sensitivity, were included in the bivariate model under the missing at random (MAR) assumption in order to maximize statistical power and allow the modeling of between-studies variability and correlation [31]. Begg's rank correlation test [32] and Egger's regression test [33] were used on logit(TPR) to evaluate the presence of publication bias. Stata 13 [34] was used to perform the analysis and run the command "mvmeta" with the method of moments for multivariate meta-analyses and meta-regression [35]. Statistical significance was set at p < 0.05; meta-analysis was performed when two or more studies were available, whereas tests for publication bias and meta-regression were performed when five or more studies were available.
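To make the logit transform and the method-of-moments idea concrete, the sketch below pools logit(TPR) across studies with a univariate DerSimonian-Laird random-effects estimator. This is a deliberate simplification and not the bivariate "mvmeta" analysis used here: it ignores the correlation between logit(TPR) and logit(FPR). The function names, the 0.5 continuity correction for zero cells, and the example counts are assumptions made for illustration.

```python
import math


def logit_with_variance(events, total, cc=0.5):
    """Logit of a proportion and its approximate within-study variance.

    A continuity correction (assumed 0.5) is applied only when a cell is zero.
    """
    non_events = total - events
    if events == 0 or non_events == 0:
        events += cc
        non_events += cc
    y = math.log(events / non_events)
    v = 1.0 / events + 1.0 / non_events
    return y, v


def dersimonian_laird(ys, vs):
    """Univariate random-effects pooling by the method of moments.

    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    w = [1.0 / v for v in vs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(ys)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def inv_logit(x):
    """Back-transform from the logit scale to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical (TP, TP + FN) pairs from three studies.
    studies = [(80, 100), (45, 60), (150, 220)]
    ys, vs = zip(*(logit_with_variance(tp, n) for tp, n in studies))
    pooled, se, tau2 = dersimonian_laird(ys, vs)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(inv_logit(pooled), inv_logit(low), inv_logit(high), tau2)
```

Back-transforming the confidence limits from the logit scale keeps them inside the interval (0, 1), which is one reason the logit scale is commonly used for pooling proportions.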

Characteristics of Studies
Following the literature search in PubMed, medRxiv, and bioRxiv by 4 July 2021, we retrieved 4700 unique articles (Figure 1). After scrutinizing abstracts and full papers and testing for eligibility criteria, we ended up with 235 articles, which included 31,387 SARS-CoV-2 infected individuals and 188,636 individuals without SARS-CoV-2 infection (total: 220,049 individuals). Two hundred and sixteen studies provided data on both cases and controls, while 19 studies reported results only for people with SARS-CoV-2 infection (Figure 1). Table 1 shows the characteristics of the included studies. All studies reported that SARS-CoV-2 infection was confirmed with qPCR of envelope (E), S or N protein according to WHO, CDC and ECDC guidelines. Various methods were used to identify or measure an antigen of SARS-CoV-2. The N antigen was investigated in 225 studies, the S antigen was investigated in eight studies, and in two studies, cumulative estimates were given for N + S or S + E + M (membrane) antigens. Four articles evaluated both N- and S-based assays. Most studies focused on rapid POC tests such as LFIA (181 studies) or FIA (38 studies). Chemiluminescence was used in 21 studies. In total, 83 different kits from 74 manufacturers and 18 in-house tests (LFIA, FIA, CLEIA) from the respective laboratories were used. Thirty-six studies used the same samples to compare different tests from different companies.
Twelve studies used twelve unique techniques that are under development (LC-mass spectrometry [36,37], field-effect transistor (FET) based biosensing devices [38], organic electrochemical transistors (OECT) [39], voltammetric-based immunosensor [40], optical waveguide-based biosensor technology [41], deep learning-based surface-enhanced Raman spectroscopy [42], paper-based impedance sensor [43], high-field asymmetric waveform ion mobility spectrometry (FAIMS)-parallel reaction monitoring (PRM) [44], a colorimetric biosensor [45], an electrochemical glucose sensor [46], and a urine foaming test [47]). Finally, two studies were performed with urine samples [36,47]. Most studies used nasopharyngeal, nasal, pharyngeal, throat, oropharyngeal or saliva samples. We classified the samples into two groups, named "NSP", containing the first three sample types, and "TS", containing the last three types. The type of sample was clearly mentioned in 207 studies, while all types of samples were used without distinction in 31 studies. The results from different types of samples were compared with the same method in 11 studies. Finally, data from 60 studies on asymptomatic persons and 73 on symptomatic patients were also used to explore differences in diagnostic accuracy between these two patient groups. The results of the quality assessment of the research using the QUADAS tool are provided in Supplementary Table S1 and in Supplementary File S1.

Analysis of Diagnostic Performance
A large share of the available data, for all methods, concerned samples detected with qPCR Ct values of 20, and mostly of 30 and 40. As shown in Table 2, the sensitivity of LFIA tests (using the N antigen) based on NSP samples that were qPCR-positive for Ct < 20 was 0.945 (95% CI: 0.930, 0.961). It declined, however, considerably to 0.329 (95% CI: 0.265, 0.393) for 30 < Ct < 40. LFIA tests using TS samples performed worse in terms of sensitivity, with a highest estimate of 0.805 (95% CI: 0.599, 1.000) in samples positive for Ct < 20 and a lowest of 0.085 (95% CI: 0.000, 0.176) for Ct > 30 (Table 2). The specificity of LFIA on NSP and TS samples (using the N antigen) was very high across all Ct intervals, ranging from 0.959 (95% CI: 0.923, 0.995) to 0.996 (95% CI: 0.993, 0.998). The sensitivity of FIA (using the N antigen) on NSP samples also showed a declining pattern from 0.935 (95% CI: 0.880, 0.990) for Ct < 20 to 0.435 (95% CI: 0.190, 0.680) for 30 < Ct < 40. Specificity was also very high using NSP qPCR-positive samples for Ct < 30 (0.992, 95% CI: 0.979, 1.000). CLEIA (using the N antigen) had high sensitivity based on NSP samples that were PCR-positive for Ct < 30 (0.980, 95% CI: 0.960, 0.999); this estimate, however, was based on a smaller number of studies and dropped considerably at higher Ct values (30-40) (0.515; 95% CI: 0.220, 0.810). The specificity of CLEIA was very high in all comparisons. The evaluation of the performance of other methods (using the N antigen) on NSP and TS samples for the above studied Ct value intervals (0-20, 21-30, and 31-40) was based on a few studies but showed similar patterns. Data on methods using other antigens (i.e., based on S, E or M protein) were too scarce to allow reliable estimations (Table 2).
Combining all major methods (LFIA, FIA and CLEIA) on NSP and TS samples, measuring both N and S antigens and stratified according to two Ct values (<30 and <40), the maximum sensitivity was estimated at 0.858 (95% CI: 0.835, 0.881) for NSP samples positive for Ct < 30 (Table 3). The sensitivity using qPCR-positive NSP samples for Ct < 40 is lower at 0.726 (95% CI: 0.706, 0.746). Again, antigen testing of NSP samples outperformed that of TS samples for both Ct < 30 and Ct < 40 (0.637 (95% CI: 0.478, 0.795) and 0.438 (95% CI: 0.332, 0.547), respectively). Specificity was very high in all meta-analyses (Table 3).
Table 3. Results of the multivariate meta-analysis performed cumulatively for methods and/or antigen tested, in <30 and <40 Ct values. Listed information includes the pooled sensitivity and specificity along with the 95% confidence intervals (NSP: pharyngeal, nasopharyngeal, nasal specimens, TS: throat, saliva, oropharyngeal, N: nucleocapsid protein, S: spike protein, M: membrane, E: envelope, NS: nucleocapsid and spike proteins).
To attain a better insight into how each method performs, we compared the meta-analysis results for the sensitivity and specificity of each method (LFIA, FIA, CLEIA) on NSP and TS samples for all antigens cumulatively (N plus S). As shown in Table 3, in terms of sensitivity, the laboratory CLEIA method outperforms the point of care (POC) methods (LFIA and FIA), the NSP samples outperform the TS samples, and the best results are obtained for samples identified positive with PCR for Ct < 30 (0.977 (95% CI: 0.955, 0.998) versus 0.408 (95% CI: 0.292, 0.523) and 0.162 (95% CI: 0.083, 0.242)) (Table 3).
Since the ultimate goal of a diagnostic method for SARS-CoV-2 is to identify an infected person even when the viral load is low, we compared the overall sensitivity of rapid tests performed at points of care or at sites of virus surveillance (LFIA or FIA) with that of laboratory methods (CLEIA), which show the highest sensitivity. As shown in Figure 2 (and Table 3), the overall (for Ct < 40) sensitivity of POC methods is about 10 percentage points lower than that of the CLEIA method for NSP samples (0.718 (95% CI: 0.697, 0.739) compared to 0.816 (95% CI: 0.761, 0.870)). Specificity was again high in all cases, ranging from 0.957 (95% CI: 0.889, 1.000) to 0.995 (95% CI: 0.993, 0.997), although due to the small number of included studies in some subgroups, these results carry some uncertainty (Table 3).
To investigate the validity of our stratification analysis according to Ct values (<30 and <40), we tried to explore the association between a patient/sample's infectivity and positivity in POC antigen tests (LFIA and FIA) and PCR tests using data from the included studies. We found 51 studies (Table 1) that used a virus culture to address this issue; however, the results were presented in a plethora of different ways and could not be quantitatively synthesized and analyzed, due to different reported parameters. From them, ten studies used virus cultures to only test the viral load (RNA copies/mL) that a POC test could detect. The remaining 34 studies presented a combination of data such as the limit of detection (LoD) in terms of RNA copies/mL or per swab or in pfus/mL, tissue culture infection dose (TCID), TCID50, TCID95%, and sensitivity of POC tests in correlation with virus culture cytopathic effect (CPE) measured on different days and after zero, one or two passages. Nevertheless, sixteen studies [63,85,87,91,101,135,145,151,167,169,199,215-217,219,255] determined LoD Ct values ranging from 18.57 [219] to 34 [145], with most of them reporting Ct 30 as an average threshold for a POC test to be positive. Importantly, viral culture positivity (CPE), though measured under various protocols (directly [87,91,101,135,143,145,200,216,241] and indirectly [141,169,201,215,241,254]), has been extensively used as a marker for sample infectivity. Furthermore, twelve studies [54,76,85,143,170,199,213,217,233,235,237,241] presented data providing LoD values for POC tests ranging from 5 × 10^3 (Ct = 27.3 [63]) to 10^6 RNA copies/swab (Ct = 30) [54,76]. Notably, four studies on the CLEIA method [111,150,156,206] and four studies [41,44,46,47] on in-house tests also investigated virus infectivity in correlation with either Ct values or positivity of these tests, but these were not analyzed since they were not reporting on POC tests.
Taken together, the above observations suggest that if SARS-CoV-2-infected cell culture positivity is an indicator of a patient/sample that is likely to be infectious [202,258,259], this infectivity better correlates with POC test positivity than rt-PCR positivity. As we show herein, POC test positivity corresponds better to PCR positivity for Ct < 30; thus, POC tests are more likely to detect infectious individuals than positive PCR tests.
Additional meta-analysis showed that the sensitivity of LFIA (on NSP samples) in symptomatic patients was higher than that in asymptomatic individuals, both for Ct < 30 and Ct < 40 (symptomatic: 0.823 (95% CI: 0.765, 0.882) and 0.753 (95% CI: 0.713, 0.794); asymptomatic: 0.665 (95% CI: 0.558, 0.772) and 0.561 (95% CI: 0.499, 0.622), respectively) (Table 4 and Figure 3). FIA assays seem to perform worse, but the meta-analysis estimates were based on a smaller number of studies. Specificity was very high for both LFIA and FIA methods (~99%) (Table 4).
Table 4. Results of the meta-analysis for the different types of assays for symptomatic and asymptomatic patients. Listed information includes the pooled sensitivity and specificity along with the 95% confidence intervals. (NSP: pharyngeal, nasopharyngeal, nasal specimens, TS: throat, saliva, N: nucleocapsid protein, S: spike protein, NS: nucleocapsid and spike proteins).

Discussion
Test-trace-isolate remains a fundamental strategy to control SARS-CoV-2 transmission. Compared to PCR methods, antigen detection tests do not require specialized laboratory equipment and are less expensive, thus allowing repeated and point-of-care testing on a wide scale [18]. Our meta-analysis, summarizing evidence from thousands of people with and without SARS-CoV-2 infection diagnosed with rt-PCR, and performing various comparisons, shows that the overall performance of AT is comparable to rt-PCR, at least in terms of specificity, with meta-analytic estimates around 99%, irrespective of the method used. Sensitivity is lower and seems to depend on viral concentration, increasing when the virus is detected at lower PCR cycle thresholds (Ct values). AT are also more sensitive when used on NSP samples and in symptomatic individuals. These updated findings are in accordance with previous efforts to summarize the evidence in this field [260,261]. Current best practices in meta-analysis suggest that updates should be performed frequently, and there is active research on identifying the point in time at which an update is needed [262,263]. As a matter of fact, previous works include statistical methods and surveillance systems that will identify the need for an update of a published meta-analysis [264,265]. More recently, the concept of a "living" systematic review has emerged, in which the review is continuously updated, incorporating relevant new data as they become available. Such reviews may be particularly important in fields where research evidence is emerging rapidly [266,267], and clearly, the COVID-19 pandemic is a perfect example of a field where new research accumulates in an unprecedented way and an updated meta-analysis is needed.
The sensitivity of AT is good but not ideal, and thus rt-PCR remains the gold standard for diagnosis. Given the suboptimal sensitivity of antigen tests, there is a likelihood of false negative results, which should be handled depending on the clinical and epidemiological circumstances. In general, confirmation of an AT result with rt-PCR in a laboratory is necessary when the result is not consistent with clinical and epidemiological information. Given their higher sensitivity among symptomatic people and in those with higher viral load (Ct < 30), ATs are expected to perform better when used for the diagnosis of SARS-CoV-2 infection in people with symptoms, in high-risk contacts of confirmed cases or in high-risk groups such as health care workers with known exposure. Moreover, the sole detection of viral RNA with rt-PCR does not seem to overlap with patients' infectiousness. Rather, POC (rapid) antigen tests, which can only detect viral loads detectable with rt-PCR at Ct values <30, seem to discriminate more efficiently the infectious SARS-CoV-2 carriers who should stay in isolation [202,255,258,259]. These findings are further supported by CDC recommendations, already issued by the end of 2020, which propose a Ct value of 33 as illustrative of contagiousness [204,268].
Proper interpretation of AT results is important not only for diagnosis but also for screening and surveillance purposes. This meta-analysis did not evaluate screening strategies that used AT. Nevertheless, it seems that AT can be used for regular screening of asymptomatic people in high-risk congregate settings, such as nursing homes, homeless shelters, detention facilities, etc., where the turnaround time of results is critical [269]. The fast identification of highly infected people in these facilities using rapid POC antigen tests will immediately inform infection prevention and control strategies and interventions, and consequently will significantly reduce onward transmission. Due to the lower sensitivity, screening in congregate high-risk settings but also mass screening may suffer from false negative results. Given the presumed direct correlation of rapid ATs' positivity with patient's infectivity, and the evidence that the effectiveness of screening depends more on frequency of testing and speed of reporting rather than on very high sensitivity [91,270], it seems that antigen tests can be used for repeated population screening.
In terms of specificity, AT performs extremely well, similarly to rt-PCR, thus minimizing the likelihood of false-positive results. However, false-positive results do occur, especially when the prevalence of SARS-CoV-2 infection in communities is low. This should be considered both in terms of diagnosis and when designing public health interventions or prevalence studies in low-prevalence settings because false positives result in a waste of resources (unnecessary isolation of cases and follow-up actions) and inaccurate estimations.
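The dependence of false positives on prevalence can be illustrated with a short positive predictive value (PPV) calculation. The sketch below is a hedged example: the sensitivity (0.72) and specificity (0.99) are rounded from the pooled estimates reported above, while the prevalence values and the function name `positive_predictive_value` are assumptions chosen for illustration.

```python
def positive_predictive_value(sens, spec, prevalence):
    """Share of positive test results that are true positives (Bayes' rule)."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    sens, spec = 0.72, 0.99  # rounded pooled estimates; illustrative only
    for prevalence in (0.001, 0.01, 0.10):  # hypothetical prevalence levels
        ppv = positive_predictive_value(sens, spec, prevalence)
        print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}")
```

Under these assumed inputs, most positive results at a prevalence of 0.1% would be false positives, whereas at 10% prevalence roughly nine in ten positives would be true positives, even though specificity is the same in both cases.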
This meta-analysis is subject to the limitations of the individual studies. Bias and confounding at the study level cannot be easily addressed or corrected at the stage of meta-analysis. There are also issues that could affect the results and are usually not measured, reported, or addressed in studies that evaluate the accuracy of AT: storage and handling, reading of test results (time and interpretation), specimen collection and handling, time from specimen collection to testing, temperature of specimen, and potential cross-contamination, as was shown in the quality assessment of the research performed with the QUADAS tool.
We need to emphasize that the studies included in this meta-analysis were conducted before July 2021. Data collection was therefore completed prior to the emergence of the Omicron variant, and the conclusions drawn from this work involve mainly the initial Wuhan strain and the Alpha, Beta and (to some extent) Delta variants. A complete treatment of the question regarding the effectiveness of antigen tests against the newly emerged Omicron variant [271] would require a study of its own, but we can nevertheless highlight some of the available evidence. Initially, there were concerns regarding the effectiveness of the tests [272], but the first report with the Abbott BinaxNow SARS-CoV-2 Rapid Antigen Assay provided evidence that it can be used efficiently [273]. Similar results were reported with another approved test (E25Bio, Inc., Cambridge, MA, USA, and Perkin Elmer, Waltham, MA, USA) in a comparison study of the Alpha, Gamma, Delta and Omicron variants [274], and for the Panbio™ COVID-19 Ag Rapid Test [275]. Stanley and coworkers examined the analytical sensitivity of the Abbott BinaxNow, the AccessBio CareStart and LumiraDx antigen tests, and found that the level of detection was at least as good for Omicron as for the initial Wuhan strain [276]. Finally, Deerain and coworkers measured the sensitivity of ten different lateral flow devices against the Omicron variant and found that the analytical sensitivities of these ten kits were similar for both the Delta and Omicron variants [277]. All in all, even though more studies are needed, the available evidence suggests that the currently used ATs can be used efficiently for detecting the Omicron variant and large discrepancies in sensitivity due to its spread are not expected.
Finally, evaluation of different testing strategies in various settings is also urgently needed [278]. Moreover, the lack of an agreed, universal, standardized protocol starting from specimen collection and handling to performing and reading the test and to the way(s) that its performance is validated (rt-PCR (genes, Ct values) or cytopathic effects of virus cultures (reference virus strain) or RNA copies, etc. [140,279]) has also been revealed through our current systematic review and meta-analysis. Only in such uniform settings can accurate comparisons of methods and individual tests be performed in order to optimally track and manage SARS-CoV-2 infection in the global community.