Multicenter Technical Validation of 30 Rapid Antigen Tests for the Detection of SARS-CoV-2 (VALIDATE)

During the COVID-19 pandemic, SARS-CoV-2 rapid antigen tests (RATs) were marketed with minimal or no performance data. We aimed to close this gap by determining the technical sensitivities and specificities of 30 RATs prior to market release. We developed a standardized technical validation protocol and assessed 30 RATs across four diagnostic laboratories. RATs were tested in parallel using the Standard Q® (SD Biosensor/Roche) assay as internal reference. We used left-over universal transport/optimum media from nasopharyngeal swabs of 200 SARS-CoV-2 PCR-negative and 100 PCR-positive patients. Transport media was mixed with assay buffer and applied to RATs according to manufacturer instructions. Sensitivities were determined according to viral loads. A specificity of at least 99% and sensitivities of 95%, 90%, and 80% had to be reached for 10⁷, 10⁶, and 10⁵ virus copies/mL, respectively. Sensitivities ranged from 43.5% to 98.6%, 62.3% to 100%, and 66.7% to 100% at 10⁵, 10⁶, and 10⁷ copies/mL, respectively. Assays with automated readers, such as ExDia or LumiraDx, showed higher performance. Specificities ranged from 88.8% to 100%. Only 15 of 30 (50%) RATs passed our technical validation. Given this high failure rate, mainly caused by lack of sensitivity, we recommend a thorough validation of RATs prior to market release.


Introduction
The SARS-CoV-2 pandemic has led to an unprecedented burden on individual and public health [1]. SARS-CoV-2 diagnostic assays became the cornerstone of patient care and epidemiological management. Diagnostic laboratories reacted promptly, adapting workflows according to demands [2]. However, in many countries high case numbers resulted in reagent shortages and a focus on severely ill patients. In this situation, alternative testing strategies using SARS-CoV-2 specific rapid antigen tests (RATs) presented a promising opportunity for individual diagnostics and population screenings. RATs mostly target abundant viral proteins such as the SARS-CoV-2 nucleocapsid (N)-protein, and rarely the spike (S)-protein [3]. In general, RATs are cheaper, easier to perform, and faster than most molecular methods. However, their analytical sensitivities are lower than those of RT-PCRs [4]. Moreover, RATs with a lower specificity than RT-PCR have been used increasingly in clinical settings with a low pre-test probability. This resulted in a low positive predictive value [5], demanding subsequent RT-PCR confirmation.
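The dependence of the positive predictive value on pre-test probability follows from Bayes' rule. The short sketch below illustrates this arithmetic; the function name and the sensitivity, specificity, and prevalence values are hypothetical and are not taken from this study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of infection given a positive test."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical assay (90% sensitivity, 98% specificity) used at
# decreasing pre-test probabilities; values are illustrative only.
for prevalence in (0.20, 0.05, 0.01):
    ppv = positive_predictive_value(0.90, 0.98, prevalence)
    print(f"pre-test probability {prevalence:.0%}: PPV = {ppv:.1%}")
```

With these illustrative inputs, fewer than one in three positive results would be a true positive at a pre-test probability of 1%, which is why confirmation by RT-PCR becomes necessary in such settings.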
For several reasons, a thorough evaluation of SARS-CoV-2 specific RATs is critical. A plethora of RATs were rapidly released on the market. The viral pathogen was still relatively new, and many production companies and their product quality were unknown. Furthermore, the poor performance of RATs for other respiratory viruses such as influenza has been known for decades [6–8]. Finally, diagnostic assay performance, including sensitivity and specificity, depends on many pre-analytical factors, such as the assay user, sample material, and clinical setting, e.g., symptomatic or asymptomatic patients and time since symptom onset [4,8–20].
The Federal Office of Public Health (FOPH) in Switzerland authorized the introduction of RATs in November 2020 based on clinical and technical validation criteria and on the recommendations of the Swiss Society of Microbiology [20]. Thus, the FOPH mandated the Swiss Society of Microbiology to determine the sensitivity and specificity of RATs prior to market release using a shared validation protocol. We present the results of the national multicenter validation of 30 SARS-CoV-2 RATs performed across four diagnostic laboratories.

Study Design
We developed a technical validation protocol as part of the Coordination Commission for Clinical Microbiology of the Swiss Society for Microbiology (File S1 in [20]). Every center performed the technical validation using the same standardized protocol, covering 30 RATs in total. Table S1 lists the assays used.

Samples
We used left-over material from nasopharyngeal samples collected with flocked swabs and sent to the laboratory for SARS-CoV-2 specific RT-PCR testing. Samples were suspended in 1-3 mL of transport media. For each RAT, we collected 100 SARS-CoV-2 RT-PCR positive samples and 200 SARS-CoV-2 RT-PCR negative samples for the validation. All samples were stored at 4 °C and used within 72 h from collection. The only exceptions were reference samples shared between all centers (Table S3). Five highly positive reference samples were included. We diluted these samples in transport media and shipped frozen aliquots to each center. Similarly, 50 negative reference samples were included. These originated from a respiratory virus biobank collected at the University Hospital Basel between January 2017 and October 2020 and were stored at −80 °C until use in this validation. The collection contained diverse respiratory viruses to assess cross-reactivity of RATs, including four human coronaviruses (HCoV-229E, HCoV-HKU1, HCoV-OC43, and HCoV-NL63), parainfluenza viruses 1 to 4, rhino/enterovirus, influenza A and B, respiratory syncytial virus (RSV), and human metapneumovirus. The samples were pooled, aliquoted, and shipped frozen to each center. All reference samples were evaluated with the Biofire® FilmArray® Respiratory panel (bioMérieux, Marcy-l'Etoile, France), a SARS-CoV-2 specific quantitative RT-PCR, and a virus specific quantitative RT-PCR.

Validation Protocol
Details of the protocol are described in File S1 in [20]. First, a technical validation of at least 300 samples was conducted. Then, we determined the limit of detection from two positive samples using a serial dilution (see "Limits of Detection"). Of note, the second step was not considered in the evaluation of the performance for the FOPH. The different assays were evaluated batch-wise, usually up to four different RATs at the same time. In each batch, the Standard Q® (SD Biosensor, Suwon-si, Republic of Korea/Hoffmann, Roche, Basel, Switzerland) was used as internal reference standard. The Standard Q® (as well as the Panbio™ COVID-19 Ag Rapid Test from Abbott, Lake Forest, IL, USA) had been validated in previous clinical studies [10,18] and was already commercialized when the present study began. In addition, results from a technical validation of the Panbio™ assay were presented elsewhere [4]. The Standard Q® was chosen as internal reference based on its more frequent use.

Technical Validation
Transport media and respective buffer solution from the RAT were mixed in a 1:1 ratio. From this mixture the assay was performed according to the manufacturer's instructions. Three to four drops were added to the flow device and reading was done after 10 to 20 min. Reading was performed visually or with specific reading devices, according to the instrument protocol.

Limits of Detection
To determine the limit of detection, two highly positive samples were diluted in a 1/2 dilution series over 7 steps and each step was tested with the RAT (File S1 in [20]). One of the positive samples was from an infected Vero cell line and the second was left-over material from a highly positive patient. These were aliquoted at the respective dilution steps, frozen, and shipped to each center. For both assessments, detection at a cycle threshold (Ct) between 21.1 (around 2.3 × 10⁷ copies/mL) and 23.3 (around 6.8 × 10⁶ copies/mL) was considered the lower limit of detection.
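The concentrations along such a two-fold series can be sketched as follows. This is an illustration only: the function name and the starting concentration are hypothetical, "7 steps" is read here as seven dilutions after the undiluted sample, and the Ct shift assumes an ideal PCR efficiency in which each two-fold dilution adds about one cycle.

```python
def twofold_dilution_series(start_copies_per_ml, steps=7):
    """Return (step, copies/mL, expected Ct shift) for a 1/2 dilution series.

    Step 0 is the undiluted sample. The Ct shift assumes ideal PCR
    efficiency, i.e. roughly one additional cycle per two-fold dilution.
    """
    series = []
    for step in range(steps + 1):
        copies_per_ml = start_copies_per_ml / (2 ** step)
        expected_ct_shift = step  # log2 of the cumulative dilution factor
        series.append((step, copies_per_ml, expected_ct_shift))
    return series


# Hypothetical starting concentration of 1e8 copies/mL (not from the study)
for step, copies, ct_shift in twofold_dilution_series(1.0e8):
    print(f"step {step}: {copies:.2e} copies/mL, Ct shift +{ct_shift}")
```

Under these assumptions, seven two-fold dilutions span a 128-fold range, or slightly more than two orders of magnitude of viral load.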

Statistical Analysis
Sensitivity and specificity of each RAT compared to results from SARS-CoV-2 specific RT-PCR were assessed, including overall accuracy and 95% confidence intervals (CI). Sensitivity was stratified by viral load. We used the Kruskal-Wallis H test and multiple pairwise tests to compare median Ct values among laboratories. Data were analyzed with "R statistical software" (version 3.6.1, 2019, Vienna, Austria).
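As an illustration of these measures, the sketch below computes sensitivity, specificity, and overall accuracy with 95% confidence intervals from a 2 × 2 table against RT-PCR. It is written in Python rather than R, and the Wilson score interval is an assumption, since the interval method used in the analysis is not specified here. The function names and the counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_accuracy(tp, fn, tn, fp):
    """Return {measure: (estimate, ci_low, ci_high)} with RT-PCR as reference."""
    counts = {
        "sensitivity": (tp, tp + fn),          # RAT-positive among PCR-positive
        "specificity": (tn, tn + fp),          # RAT-negative among PCR-negative
        "accuracy": (tp + tn, tp + fn + tn + fp),
    }
    return {
        name: (k / n, *wilson_interval(k, n)) for name, (k, n) in counts.items()
    }


# Hypothetical counts for 100 PCR-positive and 200 PCR-negative samples
for name, (est, low, high) in diagnostic_accuracy(tp=85, fn=15, tn=199, fp=1).items():
    print(f"{name}: {est:.1%} (95% CI {low:.1%}; {high:.1%})")
```

Stratifying sensitivity by viral load amounts to applying the same calculation to the subset of PCR-positive samples below a given Ct value.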

Ethical Declaration
This project was prepared according to the STARD guidelines for reporting diagnostic accuracy studies. The performance data of the different antigen assays were obtained during a quality enhancement project. According to the Swiss Human Research Act, publication of anonymized results of such a quality-related project does not require approval of an ethics committee.

Sample Characteristics
We used 4523 left-over samples to validate 30 RATs between November 2020 and February 2021 across four laboratories. With each sample, we evaluated up to four RATs in parallel and generated 14,544 RAT results. Excluding the internal reference RAT (Standard Q®) and invalid results, we obtained 10,021 RAT results. Of note, we further excluded the Clinitest (Siemens, Erlangen, Germany) because its known cross-reactivity with the transport media made a technical assessment impossible with our setup. Hence, out of 9721 valid   Figure S1).
Three RATs (AMP Diagnostics, LUNGENE and Becton Dickinson) were validated based on non-inferiority criteria (sensitivity difference ≤ 5%) to the internal reference. In the end, 15/30 RATs did not pass the validation criteria (Table 2), most of them due to insufficient sensitivity. For Ct values of 29 or higher (roughly 10⁵ copies/mL and lower), RATs generally showed sensitivities below 65% (Tables 1 and 2, Figure 3A,B).
In contrast, the "top performing" RATs showed a sensitivity of 90% or higher at this viral load (Table 1, Figure 3C). Notably, the tests validated at Lausanne University Hospital performed worse, likely because a high proportion of samples from patients who tested SARS-CoV-2 positive at the emergency room were taken more than four days after symptom onset (data not shown).
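The pass/fail decision described above can be sketched as a small rule set. The thresholds are those stated in the abstract (specificity of at least 99%; sensitivity of 95%, 90%, and 80% at 10⁷, 10⁶, and 10⁵ copies/mL). How exactly the 5% non-inferiority margin was applied is an assumption here: the sketch accepts an assay if, in every stratum where it misses the threshold, its sensitivity lies within five percentage points of the internal reference. All names and example values are hypothetical.

```python
# Minimum sensitivity per viral-load stratum (copies/mL), as in the abstract
SENSITIVITY_THRESHOLDS = {10**7: 0.95, 10**6: 0.90, 10**5: 0.80}
MIN_SPECIFICITY = 0.99
NON_INFERIORITY_MARGIN = 0.05  # assumed to mean percentage points


def evaluate_rat(specificity, sensitivity_by_load, reference_by_load=None):
    """Classify one RAT; reference_by_load holds internal-reference sensitivities."""
    if specificity < MIN_SPECIFICITY:
        return "failed (specificity)"
    # Strata in which the assay misses the absolute sensitivity threshold
    shortfalls = [
        load
        for load, threshold in SENSITIVITY_THRESHOLDS.items()
        if sensitivity_by_load[load] < threshold
    ]
    if not shortfalls:
        return "passed"
    # Assumed fallback: compare with the internal reference run in parallel
    if reference_by_load is not None and all(
        reference_by_load[load] - sensitivity_by_load[load] <= NON_INFERIORITY_MARGIN
        for load in shortfalls
    ):
        return "passed (non-inferiority)"
    return "failed (sensitivity)"


# Hypothetical assay that misses the 80% threshold but tracks the reference
print(evaluate_rat(
    0.995,
    {10**7: 0.97, 10**6: 0.92, 10**5: 0.78},
    reference_by_load={10**7: 0.99, 10**6: 0.95, 10**5: 0.81},
))
```

Keeping the thresholds in one table makes it easy to see which stratum caused a failure, which is the pattern reported in Table 2.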

Specificity of RATs
The overall median specificity was 98.6%. No difference in median specificity was noted between centers. The overall specificities were 99.9% (range 99.3% to 100%), 96.4% (range 88.8% to 100%), 99.4% (range 99% to 100%), and 99.0% (one test evaluated) at the University Hospital Basel, University Hospital Lausanne, ADMED Microbiology, and Dr. Risch Laboratories, respectively. Overall, 27/30 (90%) RATs passed the specificity criteria. We observed no cross-reactivity when evaluating the 50 negative samples containing multiple other respiratory viruses (Table S2).
Figures 2 and 3 show the sensitivity rates for selected RATs. Twelve assays passed the FOPH validation criteria.

Figure 2. RATs sensitivity rates for viral loads above 10⁵ copies/mL compared to the internal reference test. Green and red dots represent the assays passing and not passing FOPH validation criteria, respectively; orange dots represent the assays validated using the non-inferiority criteria to the internal reference. The solid line represents the internal reference (Standard Q®, SD Biosensor/Roche). Dashed horizontal lines represent the limits of difference in percentage within which variations in sensitivity rates were considered acceptable compared to the internal reference. The vertical dashed line coincides with the 80% cut-off, which was considered the minimal sensitivity threshold for FOPH validation above 10⁵ copies/mL of viral load. ^: this test did not pass the validation criteria because of insufficient sensitivity at viral loads above 10⁶-10⁷ copies/mL. °: this test did not pass the validation because of lack of specificity.
Internal reference reproducibility. The internal reference RAT was used 13 times on 4523 samples. We evaluated the intra-laboratory and inter-laboratory variability. As shown in Table 3 and Figure 3D, this antigen assay exhibited a specificity ranging from 99.0% to 100%, with a median specificity of 99.1%. The median sensitivity among patients with Ct < 23 was 98.9% (95% CI 98.7; 99.7), ranging from 95.7% to 100%. The median sensitivity among patients with Ct < 26 was 95.1% (95% CI 96.1; 98.0), ranging from 89.5% to 100%. Finally, the median sensitivity among patients with Ct < 29 was 84.3% (95% CI 88.5; 91.6), ranging from 69.4% to 93.7%.
Limits of detection. Analytical sensitivity was tested on two serial dilutions. The minimal lower limits of detection were reached by most assays. Results were not acceptable for seven RATs with the cell culture supernatant and for five RATs with the diluted clinical sample (Table S3); all of these RATs also failed the technical validation. All tests were also compared to the internal reference RAT, which was always positive above the predefined limit of detection.

Discussion
Using a shared validation protocol and pre-defined criteria [20,24] allowed us to evaluate a large number of RATs within three months. Our work highlights the heterogeneous performance of the RATs currently available. Half of the 30 investigated assays exhibited insufficient sensitivity and/or specificity and did not pass the technical validation. A recent publication evaluating 29 RATs made a similar observation, with strong heterogeneity of performance [25]. The Paul Ehrlich Institute evaluated 122 assays in parallel and also noted a high heterogeneity [26]. Among the 30 tests we evaluated in Switzerland, 9 RATs were considered as "passed" by the Paul Ehrlich Institute in Germany, but failed the validation based on the Swiss criteria reported here and approved by the FOPH [20,24]. Conversely, no test passed the Swiss validation and failed in the Paul Ehrlich validation. Notably, most RATs that clearly failed our technical validation in Switzerland were already used in other countries. Several other validations had a limited sample size and/or a selection bias for high viral loads, not reflecting real-world sample distributions. Therefore, many RATs match the WHO thresholds of 80% for sensitivity and 97% for specificity [2]. In addition, only very few studies included other respiratory viruses to evaluate cross-reactivity. Our data showed that the false positive rate of 0.5% was not related to cross-reactivity with other respiratory viruses.
The sensitivity of any SARS-CoV-2 assay depends on the viral load in the sample [20]. For this reason, we defined three sensitivity thresholds according to Ct values and viral loads. Nevertheless, we noted some differences between centers. First, the investigated populations differed between laboratories, especially in terms of the time since symptom onset. Second, different SARS-CoV-2 PCR systems were used, which could add to differences in Ct values. The internal reference control and the 5% non-inferiority criteria supported the multicenter validation approach by controlling inter-laboratory variability.
This technical validation was done using left-over materials from nasopharyngeal swabs, which are considered the gold standard sample type. Thus, the performance observed here may only be considered valid for such samples. A reduced sensitivity is likely when other sample types such as nasal or buccal swabs or saliva are used [27], and also among asymptomatic subjects [28]. In summary, the present validation provides a list of tests that may be used on nasopharyngeal samples taken from symptomatic subjects with 1 to 4 days of symptoms.
Our study has several limitations. First, we could not control the percentage of asymptomatic COVID-19 patients, as we did not access patient charts during this technical validation; second, not every patient included in the study had a COVID-19 symptom duration shorter than four days; third, one center used multiple RT-PCR methods as reference without transposing the Ct values to those obtained with Cobas; and fourth, we did not define a Ct distribution for the 100 positive samples.
The "freshness" and storage conditions of clinical samples may also influence the sensitivity. The viral load determined by RT-PCR may decrease significantly depending on storage temperature [29], and we also observed a decrease over time, while the antigen remained stable (data not shown). Thus, using non-fresh samples may make an antigen test appear more sensitive relative to PCR than it actually is. The problem of faster RNA degradation compared to DNA degradation may be one of the reasons why some manufacturers provide excellent validation reports for tests that clearly exhibit poor performance.
Finally, it needs to be taken into account that visual reading, compared to automated reading, can be subjective and depends on the reader's experience.

Conclusions
In summary, new RATs should be properly evaluated with (i) a standardized protocol, (ii) a sufficiently large sample size, and (iii) a declaration of the sample origin, describing the sample material type, symptomatic or asymptomatic patients, and the time since symptom onset. Overall, the poor sensitivity of multiple tests highlights the general problems with RATs; in critical situations, SARS-CoV-2 specific RT-PCR should be used.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/microorganisms9122589/s1, Figure S1: Distribution of Ct values overall (A) and across all four laboratories (B), Table S1: All antigen assays evaluated between 7 November 2020 and 23 February 2021 in alphabetic order according to the manufacturer's name, Table S2: List of samples used for cross-reactivity validation. Each PCR was performed in a technical triplicate and the quantification was calculated from the mean value. Table S3
Funding: The costs of the RATs' technical validation were covered by the FOPH; there was no direct contact between the manufacturers or vendors and the validating laboratories. Reporting on test performance to the companies was done via the FOPH. Assays that passed the validation were added to the FOPH white list.
Institutional Review Board Statement: This article was prepared according to the STARD guidelines for reporting diagnostic accuracy studies. The data on the viability of the different antigen assays were obtained during a quality enhancement project. According to national law (Swiss Federal Act on Human Research), the performance and publishing of the results of such a project can be done without asking the permission of the competent research ethics committee.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data supporting reported results will be available upon request for the peer-review process.