Performance of Seven SARS-CoV-2 Self-Tests Based on Saliva, Anterior Nasal and Nasopharyngeal Swabs Corrected for Infectiousness in Real-Life Conditions: A Cross-Sectional Test Accuracy Study

Many studies reported good performance of nasopharyngeal swab-based antigen tests for detecting SARS-CoV-2-positive individuals; however, studies independently evaluating the quality of antigen tests utilizing anterior nasal swabs or saliva swabs are still rare, although such tests are widely used for mass testing. In our study, sensitivities, specificities and predictive values of seven antigen tests for detection of SARS-CoV-2 (one using nasopharyngeal swabs, two using anterior nasal swabs and four using saliva) were evaluated. In a setting of a high-capacity testing center, nasopharyngeal swabs for quantitative PCR (qPCR) were taken and, at the same time, antigen testing was performed in accordance with manufacturers’ instructions for the respective tests. In samples where qPCR and antigen tests yielded different results, virus culture was performed to evaluate the presence of the viable virus. Sensitivities and specificities of individual tests were calculated using both qPCR and qPCR corrected for viability as the reference. In addition, calculations were also performed for data categorized according to the cycle threshold and symptomatic status. The test using nasopharyngeal swabs yielded the best results (sensitivity of 80.6% relative to PCR and 91.2% when corrected for viability) while none of the remaining tests (anterior nasal swab or saliva-based tests) came even close to the WHO criteria for overall sensitivity. Hence, we advise caution when using antigen tests with alternative sampling methods without independent validation.


Introduction
The identification of COVID-19 patients as early as possible after their infection is crucial for the successful management of the epidemic. In addition to qPCR, which serves in most countries as a gold standard, rapid antigen tests (RATs) are employed to facilitate early detection of infected patients and their isolation. Numerous studies evaluated various RATs for SARS-CoV-2, with some reporting excellent results meeting or even exceeding the European Center for Disease Control (ECDC)/World Health Organization (WHO) criterion of 80% sensitivity, while others reported poor results. Such studies are reviewed, e.g.,

Patient Group and Sampling
The study was approved by the local Ethics Committee, No. NsPKar/11593/2021. The tests were performed in a setting of a high-capacity COVID testing center during the outbreak in February and March 2021 in Karvina (Czech Republic). All patients coming for the PCR test for SARS-CoV-2 were offered participation in the study. The inclusion criteria were: (i) asymptomatic patients with known contact with a SARS-CoV-2-positive patient or (ii) mildly symptomatic patients with symptoms consistent with COVID-19, as well as (iii) agreement with participation and (iv) signing an informed consent form. In addition, there were exclusion criteria for saliva-based tests, namely eating, drinking, smoking or chewing in the last 10 min to 2 h prior to saliva sampling (see more in the section on antigen testing).
In patients participating in the study, a nasopharyngeal swab was taken by trained medical personnel and placed into 2 mL of the transport medium (D-MEM, 0.5% bovine serum albumin) for qPCR and, if needed, virus culture. The medium was immediately put into a refrigerator operating at 2-4 °C. Sampling for the RAT (always one RAT per patient) was performed in accordance with manufacturers' instructions; for RATs utilizing anterior nasal swabs (ANS) or nasopharyngeal swabs (NPS), these swabs were taken by trained personnel from the other nostril than the one used for the qPCR swab, and saliva tests were performed using self-sampling. The antigen test was performed immediately on site, and samples for qPCR were transported, still cooled, to the Public Health Institute Ostrava and analyzed within 24 h. The PCR sample was also used for viability testing on CV-1 cells (see below in the qPCR and virus culture section). If the cell culture could not be started within 24 h, the samples were frozen at −80 °C and thawed immediately before testing.

Antigen Testing
Seven RATs were compared. One of these used nasopharyngeal swabs (NPS) and was included so that the battery contained at least one of the more widely accepted NPS-based RATs. Two tests widely used in the Czech Republic that are, according to the manufacturers, suitable for use with anterior nasal swabs (ANS) were also included in this study. The samples for these three tests were collected by trained medical personnel from the respective part of the nose in accordance with manufacturers' instructions. The remaining four tests were self-tests based on saliva samples collected by the tested individuals using various methods. Three of the saliva-based tests (referred to as Saliva 1, 3 and 4) used a sponge on a stick inserted in the mouth for collection of saliva ("lollipop tests"), with subsequent extraction of the saliva from the sponge; the test referred to as Saliva 2 was a "spitting" test, i.e., the tested person spat into the provided cup and then pipetted part of the sample into the buffer. All these tests were also performed in accordance with manufacturers' instructions. For saliva tests, patients were asked whether they had eaten, drunk, chewed gum or smoked within the last 10 min to 2 h (depending on manufacturers' instructions). Patients who responded that they had were not tested using the respective saliva test. Nevertheless, we had no way to verify these statements, which remains a limitation of this study (see more in the Discussion).

qPCR and Virus Culture
Direct PCR used the DBdirect COVID-19 Multiplex qPCR Kit (Diana Biotechnologies, Czech Republic) with automated PCR setup on the Agilent Bravo Liquid Handling System. Detection was based on two SARS-CoV-2 genes, namely those encoding the spike protein and EndoRNAse. A synthetic internal standard was used for quality control. The overall cycle threshold (Ct) cutoff was 40, and cut-offs for classification into viral load groups were approximately 1.28 × 10⁸ (Ct = 20), 4 × 10⁶ (Ct = 25), 1.25 × 10⁵ (Ct = 30), 3.91 × 10³ (Ct = 35) and 1.22 × 10² (Ct = 40) RNA copies/mL.
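The listed cut-offs fall by a factor of 32 (that is, 2^5) for every 5 cycles, which is what one would expect if each additional cycle corresponds to a halving of the starting template. The following Python sketch illustrates that relationship, anchored at the Ct = 20 value given above. The perfect-doubling assumption is inferred from the listed numbers and is not stated in the text, and the function names are purely illustrative.

```python
import math

# Anchor point taken from the text: Ct = 20 corresponds to ~1.28e8 RNA copies/mL.
ANCHOR_CT = 20.0
ANCHOR_COPIES_PER_ML = 1.28e8


def ct_to_copies_per_ml(ct: float) -> float:
    """Approximate RNA copies/mL for a Ct value.

    Assumption (not from the source): ideal amplification, i.e. one
    doubling per cycle, so each extra cycle halves the estimated load.
    """
    return ANCHOR_COPIES_PER_ML * 2.0 ** (ANCHOR_CT - ct)


def copies_per_ml_to_ct(copies_per_ml: float) -> float:
    """Inverse of ct_to_copies_per_ml under the same assumption."""
    if copies_per_ml <= 0:
        raise ValueError("copies_per_ml must be positive")
    return ANCHOR_CT + math.log2(ANCHOR_COPIES_PER_ML / copies_per_ml)


if __name__ == "__main__":
    # Print the estimated load at each category boundary used in the text.
    for ct in (20, 25, 30, 35, 40):
        print(f"Ct {ct}: {ct_to_copies_per_ml(ct):.3g} RNA copies/mL")
```

Under this assumption the five category boundaries come out at the approximate values quoted above; a real assay would need a kit-specific standard curve rather than this idealized slope.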
Virus culture using monolayer CV-1 cells (African green monkey kidney fibroblasts) was only performed where the RAT and PCR tests were in disagreement. Cells cultured at 37 °C in Leighton tubes were inoculated with 300 µL of the sample used for the qPCR testing (or blanks) and were microscopically examined, daily, for cytopathic effects of the virus. After 7 days (or once the cytopathic effect was observed in approx. 75% of cells), they were passaged (1:6) and cultured for another 7 days. If no cytopathic effect (i.e., no virus action) was observed over that period, the sample was declared free of viable virus. Where a cytopathic effect was observed, SARS-CoV-2 presence was verified by qPCR. The sensitivity of the virus culture method was verified through serial dilution of the virus stock suspension (3 × 10¹¹ RNA copies/mL) prepared by culture, both directly and after freezing at −80 °C and thawing. The detection limit of the method in both cases was approx. 10⁴ RNA copies/mL. Within the frame of a previous study [8], we also performed an analysis before and after freezing on 10 real-world samples with cycle thresholds 25-30 (5 samples) and 30-40 (5 samples), with a 100% agreement between results before and after freezing.

Data Analysis
RAT parameters (sensitivity, specificity, positive and negative predictive values) were calculated in Stata v.14 (StataCorp LLC, College Station, TX, USA). As the reference standard, we used both (i) the qPCR result (considered positive up to Ct = 40) and (ii) the qPCR result corrected for the cell culture in samples where the RAT provided a different result from PCR (i.e., where the qPCR test was positive but no viable virus was detected, the samples were considered negative); see Table 1 for clarification. 95% confidence intervals were calculated for all parameters.
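The two reference standards can be illustrated with a short Python sketch (the study itself used Stata). It is a sketch under stated assumptions, not the authors' procedure: the text does not say which confidence interval method was used, so the Wilson score interval here is an assumption; only the correction rule spelled out above (qPCR-positive, RAT-negative, no viable virus counts as negative) is implemented, and RAT-positive/qPCR-negative samples simply keep their qPCR status. All names and the toy data are hypothetical.

```python
import math
from typing import Dict, Iterable, NamedTuple, Optional, Tuple

# One sample: (qPCR positive?, RAT positive?, culture viable? or None if not cultured)
Sample = Tuple[bool, bool, Optional[bool]]


class Estimate(NamedTuple):
    value: float
    ci_low: float
    ci_high: float


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Estimate:
    """Proportion with a Wilson score interval (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return Estimate(p, max(0.0, centre - half), min(1.0, centre + half))


def reference_positive(pcr: bool, rat: bool, viable: Optional[bool], corrected: bool) -> bool:
    """Reference status: qPCR alone, or qPCR corrected for viability."""
    if corrected and pcr and not rat and viable is False:
        return False  # qPCR-positive, missed by the RAT, no viable virus grown
    return pcr


def accuracy(samples: Iterable[Sample], corrected: bool = False) -> Dict[str, Estimate]:
    """Build the 2x2 table against the chosen reference and derive the parameters."""
    tp = fp = fn = tn = 0
    for pcr, rat, viable in samples:
        ref = reference_positive(pcr, rat, viable, corrected)
        if ref and rat:
            tp += 1
        elif ref:
            fn += 1
        elif rat:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": wilson_interval(tp, tp + fn),
        "specificity": wilson_interval(tn, tn + fp),
        "ppv": wilson_interval(tp, tp + fp),
        "npv": wilson_interval(tn, tn + fn),
    }


# Made-up toy data, NOT taken from the study.
TOY_SAMPLES = (
    [(True, True, None)] * 8        # concordant positives (not cultured)
    + [(True, False, True)] * 2     # RAT missed, viable virus grown
    + [(True, False, False)] * 2    # RAT missed, no viable virus
    + [(False, True, False)] * 1    # RAT positive, qPCR negative
    + [(False, False, None)] * 17   # concordant negatives (not cultured)
)

if __name__ == "__main__":
    for corrected in (False, True):
        print("corrected for viability" if corrected else "relative to qPCR")
        for name, est in accuracy(TOY_SAMPLES, corrected).items():
            print(f"  {name}: {est.value:.1%} ({est.ci_low:.1%}-{est.ci_high:.1%})")
```

In the toy data, the correction moves the two non-viable, RAT-negative samples from false negatives to true negatives, which raises sensitivity without touching the RAT results themselves.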

Results
In all, 2287 samples were taken and analyzed. The numbers of samples analyzed by individual RATs are detailed in Table 2, along with the test parameters calculated relative to qPCR (positivity threshold of Ct = 40) and relative to qPCR corrected for virus viability, as well as manufacturer-declared sensitivities and specificities (MDSe/Sp). With the exception of the RAT using nasopharyngeal swabs, none of the tests met the criteria set by WHO and ECDC [1,9]. Performance also appears to decrease in the order nasopharyngeal swabs > anterior nasal swabs > saliva swabs, which might be expected (if the seat of infection is in the nasopharynx, the virus is more likely to be detected there than in the anterior part of the nose or in the mouth). This holds both before and after correction for cell culture results.
After the experience with the first evaluated saliva-based test (Saliva 3), preliminary results were calculated after recruiting approximately 200 patients for Saliva 1 and Saliva 2 tests, and recruitment of patients was stopped at that time as the results were obviously unsatisfactory (sensitivities below 50%). Evaluation of the last test (Saliva 4) was stopped even sooner, after 98 individuals, as it was obvious even then that further continuation of the evaluation would not make much sense (only 1 out of 27 qPCR-positive samples was detected by this test, while two more returned false-positive results).
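The decision to stop the Saliva 4 evaluation early can be made concrete with a simple binomial calculation. The text does not report such a calculation; the sketch below only asks how likely it would be to detect at most 1 of 27 qPCR-positive samples if the test truly had a given sensitivity. The hypothesized sensitivities (80%, 50%, 20%) are illustrative choices, and the function name is hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer detections."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    detected, positives = 1, 27  # figures reported for Saliva 4
    print(f"observed sensitivity: {detected / positives:.1%}")
    for true_sensitivity in (0.8, 0.5, 0.2):  # hypothetical true values
        p_obs = prob_at_most(detected, positives, true_sensitivity)
        print(f"true sensitivity {true_sensitivity:.0%}: "
              f"P(at most {detected} of {positives} detected) = {p_obs:.3g}")
```

Even for a hypothetical true sensitivity far below the 80% criterion, an outcome this poor would be improbable, which is consistent with the authors' judgment that continuing the evaluation would add little.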
A closer look at the performance of the tests within individual Ct categories (see Figure 1) shows that in the Ct < 20 category, even tests using anterior nasal swabs performed relatively well. However, already in the Ct < 25 category (Ct 25 is considered the limit for high positivity in most studies evaluating RATs, owing to the original study by Bullard, in which viable virus was not identified in any sample with Ct > 24 [10]), their performance dropped. All saliva-based tests performed poorly (<60% sensitivity) even in the categories with the highest viral loads (Ct < 20 and 25, respectively; see Figure 1). Where points are missing in the graph, fewer than five samples were available in that Ct category for the particular test, and we decided to remove these points from the presentation.
Table 2. Overall real-world performances of 7 RATs; note that only samples where results of RAT and qPCR differed were cell-culture tested; Ct threshold for qPCR positivity is 40; values are presented as parameter estimate (95% confidence interval); MDSe/Sp = manufacturer-declared sensitivity/specificity; NPS = nasopharyngeal swab; ANS = anterior nasal swab; NPV/PPV = negative/positive predictive value; N = number of subjects.
Percentages of samples falling into individual Ct categories for individual tests are shown in Table 3. Figure 2 shows the same data as Figure 1 after correction for cell culture (i.e., where a qPCR-positive sample contained no viable virus, the sample was considered negative/non-infectious). We can see again that the NPS-based test outperformed all others and, with the exception of the Ct 25-29 category (which can probably be attributed to the relatively low number of samples in the category: only 28 individuals in this group, resulting in a wide confidence interval of 51-86%), performed well throughout the entire range. The performances of the remaining tests did not improve much after the correction (Figure 2, Table 2).

Diagnostics 2021, 11, 1567
Table 4 shows the test performance according to the symptomatic/asymptomatic status of the test subjects. The sensitivity of the NPS test, albeit better for symptomatic individuals, reached a passable 84.5% even in the group of asymptomatic patients after the correction for infectiousness. The ANS 1 test showed better sensitivity for symptomatic than asymptomatic individuals, while the results of the ANS 2 test did not differ statistically significantly between symptomatic and asymptomatic individuals after correction for infectiousness; still, the performance failed to meet the WHO/ECDC criteria. The tests using saliva swabs returned unusable results (i.e., far below the WHO/ECDC criteria) with sensitivities of ≤54% in all analyzed classes, regardless of the correction. Still, it should be noted that the number of asymptomatic but SARS-CoV-2-positive individuals was generally low, which is also reflected in the wide confidence intervals of sensitivity and positive predictive values.
Specificities of the NPS and ANS tests were relatively good, with ANS 2 just failing (97.4%) to meet the specificity criteria set by WHO and the remaining two (ANS 1 and NPS) meeting the criteria (Table 2).
Table 4. Comparison of the test parameters presented separately for symptomatic and asymptomatic individuals both before and after correction for virus culture. Test parameters are presented as estimates with confidence intervals in brackets and italics. NPS = nasopharyngeal swab; ANS = anterior nasal swab; NPV/PPV = negative/positive predictive value; N = number of subjects.

Discussion
In this study, we have evaluated the real-world performances of seven RATs in a setting of a high-capacity testing center using our previously proposed method [7,8]. Briefly, the reasoning is that the principal aim of RATs is to identify infectious patients, while PCR can detect dead viral particles that may have been excreted from the organism during recovery, killed by good mucosal immunity or even got to the nasal mucosa already dead (e.g., on dust particles). This can be partially offset by reducing the Ct threshold (typically to Ct 25); however, viable virus can be detected even at higher Ct values. Therefore, verifying the tests only against PCR can lead to overestimating (if the Ct threshold is reduced) or underestimating (if any PCR positivity even at high Ct values is considered) the test sensitivity for the identification of infectious individuals. Virus culture (i.e., virus viability testing) is a possible solution to this problem. Viability testing is, however, extremely laborious and time consuming and for this reason, the approach of viability testing serving only as a "referee" for samples where the two methods disagree is, in our opinion, the best practically achievable solution.
Tests using other sampling methods than NPS were highly inferior to the NPS-based test, the performance of which meets the WHO/ECDC criteria and is in line with the better tests evaluated in our previous study [7]. This does not necessarily mean that all self-tests using saliva or nasal swabs are so vastly inferior, but the fact that all these tests failed definitely makes one doubt the effectiveness of these tests in the high-capacity setting in general.
As expected, the RATs performed best at the lowest Ct values, which are associated with a higher viral load and, thus, with a higher probability of triggering the test reaction. Below Ct 20 (i.e., above 1.28 × 10⁸ RNA copies/mL sample), tests using NPS as well as ANS had over 80% sensitivity, thus meeting the ECDC/WHO criterion for sensitivity. However, already in the next category, i.e., Ct < 25, where the virus culture confirmed the presence of viable virus (i.e., infectiousness) in almost all samples, only the NPS test maintained a good sensitivity of well over 95%; the result dropped to 73% and 56% for the two ANS tests, respectively. Tests using saliva failed to produce meaningful results even in the categories with the strongest positivity. It is necessary to say that there were very few SARS-CoV-2-positive individuals in the Ct < 20 group when testing two of the saliva-based tests (four and two samples, respectively), so the results of the evaluation of these tests in this category are not very reliable; nevertheless, the fact alone that none of these six strongly positive patients were detected supports the conclusion that the performance of these tests is as poor in this category as it is in the others.
Above Ct 25, the performance of all tests continued to drop, which was, however, accompanied by a drop in the percentage of samples with viable virus. Here, we have to point out that we have detected viable virus in 20% of samples with Ct between 30 and 35, and surprisingly, even in 8% of tested samples with Ct between 35 and 40 (please note that only 488 samples with discrepant results between RAT and qPCR were analyzed using cell culture). This supports our opinion voiced in our previous papers [7,8] that simply reducing the Ct threshold for classifying patients as positive (i.e., the method obviously often employed by the manufacturers when performing their validation studies) is not the way to go for validation. This is also supported by the study on the relationship between virus viability and Ct threshold/number of RNA copies in the sample by La Scola [11] who found viable virus up to Ct 33 (interestingly, as much as 50% of samples at Ct 32 contained viable virus). It is true that they found no viable virus at Ct values over 33, but they tested only one to three samples at these thresholds, so their failure to detect any viable sample at these Ct values is not surprising (in our study, 90 samples with Ct 30-34 and 62 samples with Ct 30-35 were tested, respectively, thus giving us a chance to capture even lower percentages of samples with viable virus). The same can be said about the results by Bullard who, as mentioned above, did not detect any viable virus in samples with Ct > 24; nevertheless, it is not clear from their paper how many samples fell within the category of Ct > 25 (their study included 90 samples in total, with median Ct 23) [10].
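The sample-size argument above can be illustrated numerically. Assuming, purely for illustration, that a fixed fraction of samples in a Ct category contains viable virus and that samples are independent, the chance of finding at least one viable sample among n tested is 1 − (1 − p)^n. The sketch below uses the 8% figure reported above for Ct 35-40 and the sample counts quoted in the paragraph (one to three versus 62); treating the 8% as a true underlying fraction is an assumption, since it was measured only among culture-tested discordant samples. The function name is hypothetical.

```python
def prob_at_least_one_viable(n_samples: int, viable_fraction: float) -> float:
    """Chance that at least one of n independent samples contains viable virus."""
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if not 0.0 <= viable_fraction <= 1.0:
        raise ValueError("viable_fraction must lie in [0, 1]")
    return 1.0 - (1.0 - viable_fraction) ** n_samples


if __name__ == "__main__":
    viable_fraction = 0.08  # figure reported above for Ct 35-40, taken at face value
    for n in (1, 3, 62):    # sample counts quoted in the paragraph above
        chance = prob_at_least_one_viable(n, viable_fraction)
        print(f"n = {n:2d}: P(at least one viable sample) = {chance:.1%}")
```

With only a handful of samples, missing every viable one is the most likely outcome under this assumption, whereas with several dozen samples at least one detection becomes nearly certain.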
Comparison with the literature is difficult. We have not found any peer-reviewed paper evaluating saliva-based RATs and even studies using anterior nasal swabs are relatively rare. In one of the few such studies, Osmanodja et al. [12] described an excellent performance of their antigen test using anterior nasal swabs (Dräger Antigen Test SARS-CoV-2), with an overall sensitivity of 88.6% (and as much as 96.7% for patients with a high or medium viral load corresponding to Ct = 27 in our study). The nasal swab variant of one of the most popular NPS tests, Standard Q, was evaluated by Nikolai et al. [13]. In their study, professionally taken ANS were compared with PCR and so were self-sampled and professionally collected mid-turbinate swabs. Their results showed excellent performance (86% sensitivity) of this test for professional ANS up to the virus load equivalent to approx. Ct 24 in our study, dropping to 43% for higher values. Another ANS-based RAT by a recognized producer, Abbott BinaxNOW™, was evaluated by Pollock et al., reporting an overall sensitivity of 81.2% for patients with Ct values up to 35 in self-collected ANS [14] and by Pilarowski et al., reporting 93.3% sensitivity in professionally-collected ANS [15]. These results were, however, in contrast with those by James et al. [16] who reported sensitivity of only 51.6% in professionally-collected nasal swabs using this test.
The two tests using anterior nasal swabs in our study did not perform as well as those reported in the aforementioned studies, suggesting their inferior quality. It must be also noted that both these ANS tests can, according to the manufacturers' instructions, be used with NPS instead. It is likely that if professionally-collected NPS were used with these two tests instead, the results would be better than those presented in this study. We have, however, not performed such a direct comparison, as the principal reason for the wide use of these tests in the Czech Republic is their "user-friendliness", i.e., the fact that they do not need a (professional) NPS taken.
None of the saliva-based tests yielded results that could justify their use in practice. We have to acknowledge as a limitation of the study that we do not know whether the patients told the truth that they have not eaten or drunk for some time before the sampling. Nevertheless, from the perspective of the mass use of these self-tests at workplaces, at schools or at high-capacity testing points, a limitation such as not eating, drinking, chewing, smoking, brushing teeth or generally interfering with the oral cavity for 2 h prior to taking the test would render such a test unsuitable for large-scale use regardless of the test result (although 30 min required by some of the tests is perhaps achievable). Their use as self-tests in the morning upon waking, i.e., after a long period without interference with the oral cavity, could perhaps provide better results; nevertheless, in our high-throughput setting, the performance of saliva-based RATs was sadly lacking.
The evaluation of the results for asymptomatic patients is, in view of their sensitivity results, meaningless for the saliva-based tests. For ANS and NPS tests, the results were affected by a low number of positive results in this group, which leads to wide confidence intervals. In effect, we cannot make any strong statements regarding the performance of the tests in these groups and the results can only be perceived as indicative. Still, it appears that the sensitivity is somewhat lower in asymptomatic individuals than in symptomatic ones. This is logical since in symptomatic individuals who are sneezing, the virus is more likely to reach the lower levels of the nasal system. The good news from the perspective of usability of these tests is that in the NPS-based test, the sensitivity estimate remained over 80% even in asymptomatic patients.
Based on our results, we have to strongly disagree with the widespread policy that all CE-certified antigen tests have the same validity, and this is especially true of the tests that have not been independently evaluated. All tests evaluated in our study were CE-certified and their reported sensitivities were over 89%, most of them over 95%, thus allegedly meeting the criteria set by ECDC and WHO. Only the NPS test met the criteria with its 91.2% sensitivity, 98.5% specificity, 96.2% PPV and 96.6% NPV after correction for the presence of viable virus, which is (i) in accordance with the declared values and (ii) comparable to the better performing NPS RATs evaluated using the same method; see our previous work comparing five NPS tests [8]. In that study, some tests provided excellent results (up to >96% sensitivity when compared with PCR and corrected for viability), while others failed to meet the criteria. This variability within the same sampling design also means that well-performing tests based on saliva or ANS may exist; in our study, however, no such test met the WHO criteria when used in a high-capacity setting. It is also worth noting that one of the saliva-based tests that performed poorly in this study is produced by the same manufacturer as an NPS test that performed very well in the previous study [8]. This further supports the notion that the poor performance of saliva-based tests is associated with the type of sample rather than with a poor manufacturing process.
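Because predictive values depend on how common infection is in the tested population, the PPV and NPV quoted above apply to the high-prevalence setting of this study. As a hedged illustration, the following Python sketch applies Bayes' rule to the corrected sensitivity (91.2%) and specificity (98.5%) of the NPS test at a few prevalence values; those prevalence values are hypothetical and are not taken from the study, and the function name is illustrative.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    sensitivity, specificity = 0.912, 0.985   # NPS test, corrected for viability
    for prevalence in (0.01, 0.05, 0.30):     # hypothetical prevalence values
        ppv, npv = predictive_values(sensitivity, specificity, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The same test therefore yields a much lower PPV when used for screening a low-prevalence population than in a testing center serving symptomatic individuals and contacts.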
As with every study, ours comes with strengths and limitations. The strengths include a large number of tests, sufficient numbers of tested individuals and, in particular, the cell culture testing for excluding samples that were qPCR-positive but did not contain any viable virus. Besides the aforementioned limitation for saliva-based tests (no way of knowing whether the patients told the truth about not eating, etc., within a specified period before sampling), there is another important limitation to our study. We are, unfortunately, unable to disclose the manufacturers or the test names as the study was funded by distributors who gave their consent to publishing the results only on condition that the tests are not named. We acknowledge this as a limitation of this study; however, without this support, this study would not have come into existence. For this reason, we would like to call for additional fully independent studies evaluating the diagnostic performance of, in particular, ANS and saliva-based tests.

Conclusions
Of the evaluated antigen tests, only the test using nasopharyngeal swabs met the criterion of >80% sensitivity in a real-world high-throughput setting in a high-prevalence population (consisting of individuals with symptoms or with a history of contact with a SARS-CoV-2-positive person). The evaluated antigen tests using anterior nasal swabs performed much worse, meeting the criterion of 80% sensitivity only in patients with the highest viral load (Ct < 20) but dropping below the limit already in the Ct 20-25 category.
The tests utilizing saliva included in this study yielded the worst sensitivities, with the best of these tests returning a sensitivity of 54% in the group with the highest viral load (Ct < 20) despite high declared sensitivity values. For these reasons, we strongly caution against using RATs, especially anterior nasal swab-based or saliva-based RATs, solely on the basis of manufacturer-declared sensitivity and specificity values without independent validation. We would also like to use this opportunity to call for such independent studies evaluating the diagnostic performance of, in particular, anterior nasal swab- or saliva-based RATs.

Conflicts of Interest:
Tests were provided by the distributors of the tests, who also contributed equally to the costs of testing. The funders had no role in the design of the study; in the collection, analyses and interpretation of data; or in the writing of the article. However, under the terms of the agreements with the funders, without which this study could not have been performed at all, we are not allowed to publish the test names or manufacturers.