Head-to-Head Accuracy Comparison of Three Commercial COVID-19 IgM/IgG Serology Rapid Tests

Background: Comparative data on SARS-CoV-2 IgM/IgG serology rapid diagnostic tests (RDTs) are scarce. We thus performed a head-to-head comparison of three RDTs. Methods: In this unmatched case-control study, blood samples from 41 RT-PCR-confirmed COVID-19 cases and 50 negative controls were studied. The diagnostic accuracy of three commercially available COVID-19 RDTs, NTBIO (RDT-A), Orient-Gene (RDT-B), and MEDsan (RDT-C), was assessed against both a recombinant spike-expressing immunofluorescence assay (rIFA) and Euroimmun IgG ELISA. Concordance of RDT results with the reference methods, and between whole blood and plasma, was assessed with the Kendall coefficient. Results: COVID-19 cases’ median time from RT-PCR to serology was 22 days (interquartile range (IQR) 13–31 days). Whole-blood IgG detection with RDT-A, -B, and -C showed 0.93, 0.83, and 0.98 concordance with rIFA. Against rIFA, RDT-A sensitivity (SN) was 92% (95% CI: 78–98) and specificity (SP) 100% (95% CI: 91–100), RDT-B showed 87% SN (95% CI: 72–95) and 98% SP (95% CI: 88–100), and RDT-C 100% SN (95% CI: 88–100) and 98% SP (95% CI: 88–100). Against ELISA, SN and SP were above 90% for all three RDTs. Conclusions: RDT-A and RDT-C displayed IgG detection SN and SP above 90% in whole blood. These RDTs could be considered in the absence of routine diagnostic serology facilities.


Introduction
In the context of COVID-19, numerous CE-marked anti-SARS-CoV-2 serology-based assays have been developed for diagnostics and serological survey purposes, although they are not meant to provide information regarding individual seroprotection statuses [1]. Among them, several rapid diagnostic tests (RDTs) have been released for general use, but so far, only a few validation studies have assessed their performance against reference methods [2][3][4][5][6][7]. These studies showed varying levels of performance, and cross-comparison is difficult due to: (i) the heterogeneity in the study designs and studied populations (confirmed cases versus seroprevalence studies), (ii) the analytical performance variability, and (iii) the small number of head-to-head comparisons between available RDTs. Here, we performed a head-to-head comparison of the performances of three commercially available SARS-CoV-2 IgM/IgG immunochromatographic RDTs, which were provided to the Geneva Reference Centre for Emerging Viral Diseases by the Swiss Red Cross and the Swiss National COVID-19 Science Task Force (https://ncs-tf.ch/en/). These RDTs were from: (i) NTBIO Diagnostics Inc. (Surrey, British Columbia, Canada), hereafter RDT-A, (ii) Zhejiang Orient-Gene Biotech Co. Ltd. (Huzhou, China), hereafter RDT-B, and (iii) MEDsan GmbH, Biological Health Solutions (Hamburg, Germany), hereafter RDT-C.

Study Population and Blood Sample Collection
Anonymized leftovers of whole-blood EDTA samples were used for this analysis, in accordance with our institution's ethical committee and national regulations. We included blood samples from 41 real-time (RT)-PCR-confirmed COVID-19 cases hospitalized at the University Hospitals of Geneva, Switzerland and 50 unmatched negative samples from asymptomatic blood donors obtained during the same period (April 2020). Whole-blood EDTA samples were used as a proxy of capillary blood and were centrifuged (3000× g for 10 min) in parallel to generate EDTA plasma. All analyses (see below) were performed within 72 h of blood sampling without any freeze-thaw cycle. The 41 COVID-19 samples were categorized according to the number of days following real-time (RT)-PCR positivity, i.e., days post-diagnosis (DPD).

Study Endpoints
In our primary endpoint, within this cohort of 91 blood samples (45% RT-PCR-confirmed COVID-19), we assessed the accuracy of SARS-CoV-2 IgG detection in whole blood (as a surrogate for capillary blood) by three commercially available IgM/IgG RDTs. This was done by comparing against the recombinant immunofluorescence assay (rIFA) reference method (which identifies IgG targeting the complete SARS-CoV-2 spike protein, i.e., both S1 and S2 subunits).
In our secondary endpoints, we sought to: (i) validate the performances for SARS-CoV-2 IgG detection using the same RDT panel against an ELISA-based IgG serological immunoassay (Euroimmun) as an alternative reference method, (ii) assess the performance of the RDTs against rIFA and IgG ELISA within each of the COVID-19 DPD subgroups (0-14 and >14 days), (iii) assess the result concordances of the three RDTs with both reference methods, (iv) assess the concordance of the rapid IgM/IgG test results in whole blood versus EDTA plasma, and (v) assess IgM detection by each RDT. Due to the absence of a reference comparison method for IgM testing in our institution (not provided by Euroimmun), we did not formally assess RDT IgM detection performances but merely report their positivity ratio within the cohort.

IgM/IgG Immunochromatographic Rapid Cassette Tests
The three commercially available lateral-flow immunochromatographic SARS-CoV-2 IgM/IgG RDT cassettes can be used with capillary blood, venous whole blood, plasma, or serum. For each tested IgM/IgG RDT, one cassette per sample was used according to the manufacturer's instructions. For the RDT from NTBIO Diagnostics Inc., Surrey, British Columbia, Canada (hereinafter, RDT-A), 10 µL of whole blood (as a proxy for one capillary blood drop) or 10 µL of plasma were applied in parallel for each sample, and IgG/IgM responses were read after 15 min but no later than 20 min. For the RDT from Zhejiang Orient-Gene Biotech Co. Ltd., Huzhou, China (hereinafter, RDT-B), 10 µL of whole blood or 5 µL of plasma were applied, and results read after 10 min but no later than 15 min. For the RDT from MEDsan GmbH, Biological Health Solutions, Hamburg, Germany (hereinafter, RDT-C), 5 µL of whole blood or 5 µL of plasma were applied, and results read after 10 min but no later than 15 min.

SARS-CoV-2 IgG ELISA
Euroimmun IgG ELISA uses the S1 domain of the spike protein of SARS-CoV-2 as the antigen. EDTA plasma was diluted at 1:101 and assessed with the IgG CE-marked ELISA (Euroimmun AG, Lübeck, Germany # EI 2606-9601 G) according to the manufacturer's instructions. The ELISAs were run on Dynex Agility (Ruwag, Bettlach, Switzerland) according to the manufacturer's protocols. After adding the conjugate, each sample's immunoreactivity was measured at an optical density of 450 nm (OD450) and then divided by the OD450 of the calibrator provided with each ELISA kit to minimize the inter-assay variation [11,12]. The quantitative results (ratios) obtained were then expressed in arbitrary units and interpreted following the recently published proposed cut-offs derived from our local validation process: OD ratio <0.5 = negative, ≥0.5 and <1.5 = indeterminate, and ≥1.5 = positive [11,14].
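The ratio calculation and cut-offs described above can be illustrated with a minimal sketch. The function names `od_ratio` and `interpret_ratio` are hypothetical and introduced only for illustration; only the thresholds (0.5 and 1.5) come from the text.

```python
def od_ratio(sample_od450: float, calibrator_od450: float) -> float:
    """Divide the sample OD450 by the kit calibrator OD450."""
    if calibrator_od450 <= 0:
        raise ValueError("calibrator OD450 must be positive")
    return sample_od450 / calibrator_od450


def interpret_ratio(ratio: float) -> str:
    """Apply the cut-offs stated in the text to an OD ratio.

    < 0.5            -> negative
    >= 0.5 and < 1.5 -> indeterminate
    >= 1.5           -> positive
    """
    if ratio < 0.5:
        return "negative"
    if ratio < 1.5:
        return "indeterminate"
    return "positive"


if __name__ == "__main__":
    # Hypothetical OD450 readings, not study data.
    for sample, calibrator in [(0.10, 0.40), (0.45, 0.40), (2.00, 0.40)]:
        r = od_ratio(sample, calibrator)
        print(f"ratio={r:.2f} -> {interpret_ratio(r)}")
```

Note that for the performance analyses, the text states that indeterminate results were counted as negative, so a downstream step would map "indeterminate" to "negative" before building the 2×2 tables.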
Statistics
The VassarStats online tool (www.vassarstats.net) was used to determine the sensitivity (SN), specificity (SP), and positive and negative predictive values (PPV and NPV), as well as the positive and negative likelihood ratios (LR+ and LR−). It was also used to calculate the proportions and 95% confidence intervals. We used PRISM (Graphpad, San Diego, CA, USA) for calculating the medians, interquartile range (IQR), and significance (p-values). Significance was calculated using Fisher's exact test for categorical variables and the Mann-Whitney U-test for continuous variables. Concordance between immunoassays was assessed with Kendall's coefficient, which was calculated using Statistica (version 13.5.0.17, TIBCO Software Inc., Palo Alto, CA, USA). As previously published, indeterminate rIFA IgG and ELISA IgG results were considered to be negative for the test performances and concordance analyses in order to maximize the specificity [11], as were invalid RDT results. Statistical significance was defined as p < 0.05.
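For readers who wish to recompute such metrics from a 2×2 table, the following is a minimal sketch. It is not the authors' workflow: the interval method applied by the online tool is not stated here, so the Wilson score interval (without continuity correction) is an assumption, and the counts in the usage example are hypothetical rather than taken from the study tables.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, total: int, z: float = Z_95):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when undefined (0/0), infinity when den is 0."""
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den


def diagnostic_accuracy(tp: int, fn: int, fp: int, tn: int) -> dict:
    """SN, SP, PPV, NPV, LR+ and LR- from the four cells of a 2x2 table."""
    sn = _ratio(tp, tp + fn)
    sp = _ratio(tn, tn + fp)
    return {
        "SN": sn,
        "SN_CI": wilson_interval(tp, tp + fn),
        "SP": sp,
        "SP_CI": wilson_interval(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "LR+": _ratio(sn, 1 - sp),   # infinite when SP is 100%
        "LR-": _ratio(1 - sn, sp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in diagnostic_accuracy(tp=45, fn=5, fp=2, tn=48).items():
        print(name, value)
```

An interval computed this way may differ slightly from published values if the original tool applies a continuity correction or an exact method.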
rIFA and ELISA IgG seropositivity in RT-PCR-confirmed COVID-19 cases and negative controls. Overall, 92.7% (n = 38) of RT-PCR-confirmed COVID-19 patients had IgG seroconversion detected by rIFA and 90.2% (n = 37) by ELISA, as shown in Table 1. In the DPD >14 subgroup, 96.3% (26/27) of COVID-19 patients were seropositive by rIFA. In 49 out of 50 negative controls (98%, 95% CI: 88–100), SARS-CoV-2 IgG could not be detected by rIFA. However, one control sample was IgG-positive by rIFA and ELISA in both whole blood and plasma (as well as with the three RDTs), suggesting a previously unidentified COVID-19 infection in one of the asymptomatic controls. Detailed results of each test for both COVID-19 and the control cases and samples are available in Supplementary Tables S1-S4.
SARS-CoV-2 IgG detection by RDTs in whole-blood EDTA versus plasma EDTA. Out of 91 paired samples, result concordances for IgG detection between whole-blood and plasma samples were 0.93, 0.87, and 1.00 for RDT-A, -B, and -C, respectively. The slightly lower concordance observed for RDT-B was influenced by four whole blood-plasma sample pairs, where IgG was detected only in the plasma (Supplementary Table S1). In keeping with these results, RDT-B in plasma showed higher SN (as well as higher NPV and lower negative LR) than in whole blood, as shown in Tables 3 and 4.
SARS-CoV-2 IgM seropositivity ratios in RT-PCR-confirmed COVID-19 cases and negative controls. IgM detection proportions within these cohorts varied significantly across RDTs, in both whole blood and plasma, ranging from 14.6% positivity within RT-PCR-confirmed COVID-19 samples with RDT-A to 73.2% with RDT-B and 92.7% with RDT-C, as shown in Table 5. In addition, tests performed in plasma rather than whole blood yielded higher positivity for RDT-A and RDT-B, while RDT-C seemed to yield similar results in both sample types. IgM detection result concordances between whole blood and plasma were 0.47, 0.90, and 0.96 for RDT-A, -B, and -C, respectively. Overall, RDT-A showed lower IgM detection capacities and a lower concordance between whole blood and plasma, in favor of the latter. We did not establish here how these differences might affect test performances due to the absence of a reference method for IgM detection (absence of IgM detection by Euroimmun ELISA).
Table 5. SARS-CoV-2 IgM detection characteristics by various immunoassays.

Discussion
The major finding of the present head-to-head comparison of these SARS-CoV-2 serology RDTs, performed on a study sample with a pretest probability of 45.1%, is that these RDTs, when carried out in whole blood and compared to rIFA as the primary endpoint, displayed acceptable performance overall. Indeed, for the three of them, the SPs were equal to or above 98%, and the PPVs ranged between 97% and 100%, keeping in mind that the lowest end of the 95% CI could be as low as 83%. SNs ranged between 87% and 100% and the NPVs between 91% and 100%. When compared to rIFA, RDT-C (MEDsan) appeared to be the most satisfactory of the three RDTs tested, with a concordance of 98%, an NPV of 100%, a PPV of 98%, an LR+ of 52, and an optimal LR− of 0.0, although this should be interpreted cautiously due to the rather wide 95% CI. These performances were similar in the two DPD subgroups below or above 14 days. Regarding the secondary endpoint (association with Euroimmun IgG ELISA) in whole blood, RDT-C (MEDsan) was also found to be close to optimal, with a concordance of 96%, an SN of 100%, an NPV of 100%, an SP of 96%, a PPV of 95%, and an optimal LR− of 0.0, but it also had the lowest LR+ (26.5) among the three RDTs. For this secondary endpoint, when RDTs were used on plasma, RDT-B (Orient-Gene) was found to provide the best concordance with ELISA (98%), the highest LR+ (53), and a close to optimal LR− (0.013). Nevertheless, as RDTs are meant to be used without centrifugation, the implication of this observation is merely technical.
Our results contrast with current RDT data available in the literature [2][3][4][5][15,16]. Indeed, Cassaniti and colleagues tested an RDT in the specific setting of acute diagnosis in the emergency department and found very low performance, thus suggesting that RDTs should be avoided in this particular acute clinical setting with potentially very short DPDs. In contrast, a study by Hoffman et al. testing the Zhejiang Orient-Gene RDT revealed a sensitivity of 69% and 93.1% for IgM and IgG, respectively, with very high specificities, although this study compared results against SARS-CoV-2 RT-PCR in the absence of a serology gold standard [2,4]. Thus, data in the literature cannot easily be compared to our results. One major difference between studies is the delay between RT-PCR and serology, which might vary and is generally shorter in other studies. To the best of our knowledge, only this study compared RDT performances against an immunoassay as the reference method.
Several factors can explain the inter-RDT differences. Firstly, it is important to note that the high performances for SARS-CoV-2 IgG detection of the present RDT panel were obtained using a cohort of cases/samples whose median time from RT-PCR to serology testing was approximately three weeks. This median time was lower in several previous studies. Such a delay allowed high median anti-SARS-CoV-2 IgG ratio levels (16.89), as measured with Euroimmun ELISA, to be reached, which, in turn, may explain the good performances reported presently, including in our subgroup analyses in patients belonging to DPD 0-14 versus DPD >14, when compared to previous studies performed in acute settings. Secondly, this difference can also be at least partly explained by inter-assay analytical differences, as some assays are directed against the full spike protein (such as RDT-A), against its subdomains S1 or S2, against the nucleocapsid, or even against a combination thereof (such as RDT-C) [17,18].
The second notable finding of this study is that whole-blood EDTA, as a surrogate for capillary blood, is an appropriate sample type for IgG detection with RDT-C (MEDsan) and RDT-A (NTBIO), whereas better performances were obtained with plasma for RDT-B (Orient-Gene). For IgM detection, only RDT-C (MEDsan) showed high positivity rates in whole blood, equal to those in plasma, while plasma seemed to perform better for RDT-A and RDT-B, although further evaluations with an independent reference method are probably needed. Importantly, provided that an adequate RDT is used, these results suggest that whole blood can be an adequate sample type for SARS-CoV-2 IgG serology testing. Of note, despite the large variability observed across RDTs for IgM detection, RDT-C (MEDsan) seemed to display the highest detection rate.
This study has several limitations. Firstly, we used whole blood as a proxy for capillary blood, and the tests were performed in a laboratory environment; although whole-blood EDTA is an established surrogate of capillary blood [19], we would possibly expect different results in real life with a non-laboratory-trained individual using capillary blood. Secondly, this is a method validation study and not a seroprevalence study. The case-control design, with a final proportion of RT-PCR-confirmed COVID-19 cases of 45.1%, warrants a cautious interpretation of the PPV and NPV. Nevertheless, given the very high LR+ and very low LR−, our results indicate that these RDTs may prove to be useful even in lower pretest probability settings, where the current findings may provide guidance on the choice of the RDT to be considered. Thirdly, within the asymptomatic control group, in one sample, all tested immunoassays (the three RDTs and both reference methods) yielded positive results. Although it did not influence the test performance analysis, it suggests a possible SARS-CoV-2 serological scar within the control population of the blood donors. When selecting healthy controls, we had no way to avoid the inclusion of a subject with a previous, possibly asymptomatic, infection during the epidemic period. Fourthly, the limited sample size of this validation study leads to broad 95% confidence intervals, precluding a robust ranking of the three RDTs under study. Finally, among all the RDTs available on the market, we only tested three of them based upon the prior selection made by the Swiss Red Cross and the Swiss COVID-19 Task Force. Therefore, our present conclusions only apply to these RDTs and must not be applied to any other RDTs. Taken together, these data indicate that RDTs should probably not be used if access to centralized tests is available [20].
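The point about likelihood ratios and pretest probability can be made concrete with a short sketch that converts a pretest probability and a likelihood ratio into a post-test probability via odds. The function name is hypothetical; the LR+ of 52 and the pretest probability of 45.1% are the values reported above for RDT-C against rIFA, whereas the 5% prevalence is an assumed, illustrative lower-prevalence setting and not a study result.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: post-test odds = pretest odds x LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


if __name__ == "__main__":
    lr_positive = 52.0  # LR+ reported for RDT-C against rIFA
    # 0.451 is the study's pretest probability; 0.05 is a hypothetical setting.
    for prevalence in (0.451, 0.05):
        p = post_test_probability(prevalence, lr_positive)
        print(f"pretest {prevalence:.1%} -> probability after a positive test {p:.1%}")
```

Under these assumptions, the probability of true seropositivity after a positive result drops from roughly 98% at the study's pretest probability to roughly 73% at a 5% prevalence, which illustrates why the predictive values reported here should not be transferred directly to low-prevalence settings.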
In conclusion, among the three RDTs tested, the one from MEDsan (RDT-C) in whole blood displayed the highest concordance with our reference method (rIFA) and with automated ELISA, with close to optimal positive and negative predictive values in this high pretest probability cohort. Even if these RDTs are currently not meant to replace centralized tests and should not be used whenever routine laboratory-based SARS-CoV-2 IgG serology is available, the apparently adequate performance of this RDT merits further testing in lower prevalence settings.

Conflicts of Interest:
The authors declare no conflicts of interest.