Diagnosis of SARS-CoV-2 Infection by RT-PCR Using Specimens Other Than Naso- and Oropharyngeal Swabs: A Systematic Review and Meta-Analysis

The rapid and accurate testing of SARS-CoV-2 infection is still crucial to mitigate, and eventually halt, the spread of this disease. Currently, nasopharyngeal swab (NPS) and oropharyngeal swab (OPS) are the recommended standard sampling techniques, yet these have some limitations, such as the complexity of collection. Hence, several other types of specimens that are easier to obtain are being tested as alternatives to nasal/throat swabs in nucleic acid assays for SARS-CoV-2 detection. This study aims to critically appraise and compare the clinical performance of RT-PCR tests using oral saliva, deep-throat saliva/posterior oropharyngeal saliva (DTS/POS), sputum, urine, feces, and tears/conjunctival swab (CS) against standard specimens (NPS, OPS, or a combination of both). In this systematic review and meta-analysis, five databases (PubMed, Scopus, Web of Science, ClinicalTrials.gov and NIPH Clinical Trial) were searched up to the 30th of December, 2020. Case-control and cohort studies on the detection of SARS-CoV-2 were included. The methodological quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. We identified 3022 entries (1560 after removing duplicates), 33 of which (1.1%) met all required criteria and were included in the quantitative data analysis. Saliva presented the highest accuracy, 92.1% (95% CI: 70.0–98.3), with an estimated sensitivity of 83.9% (95% CI: 77.4–88.8) and specificity of 96.4% (95% CI: 89.5–98.8). DTS/POS samples had an overall accuracy of 79.7% (95% CI: 43.3–95.3), with an estimated sensitivity of 90.1% (95% CI: 83.3–96.9) and specificity of 63.1% (95% CI: 36.8–89.3). The remaining index specimens could not be adequately assessed given the lack of studies available.
Our meta-analysis shows that saliva samples from the oral region provide high sensitivity and specificity; therefore, these appear to be the best candidates for alternative specimens to NPS/OPS in SARS-CoV-2 detection, with suitable protocols for swab-free sample collection to be determined and validated in the future. The distinction between oral and extra-oral salivary samples will be crucial, since DTS/POS samples may induce a higher rate of false positives. Urine, feces, tears/CS and sputum seem unreliable for diagnosis. Saliva testing may increase testing capacity, ultimately promoting the implementation of truly deployable COVID-19 tests, which could work either at the point-of-care (e.g., hospitals, clinics) or at outbreak control spots (e.g., schools, airports, and nursing homes).


• Observational studies (i.e., cross-sectional, case-control or cohort study types);
• Use of RT-PCR to detect the presence of SARS-CoV-2 in matched samples;
• Report SARS-CoV-2 positive and negative test results, and/or cycle threshold (CT) from index alternative specimens (saliva, DTS/POS, sputum, urine, feces, or tears/CS) evaluated against NPS and/or OPS;
• Studies with confirmed or suspected cases of SARS-CoV-2 infection.
Saliva samples refer to samples collected from the oral region (i.e., circumscribed to the oral cavity), while DTS/POS refers to salivary samples mixed with pharyngeal secretions. Sputum refers primarily to lower respiratory tract mucus mixed with pharyngeal and salivary secretions.

Search Strategy and Study Selection
Searches were carried out in five databases (PubMed, Scopus, Web of Science, ClinicalTrials.gov and NIPH Clinical Trial) up to the 30th of December, 2020.
We used the following search syntax: (COVID-19 OR COVID19 OR n-CoV19 OR SARS-CoV-2 OR SARS-CoV2) AND (Diagnosis OR Diagnostic OR Test OR Detection) AND (Saliva OR Salivary OR "Oral fluid" OR Sputum OR Expectoration OR Gob OR Tears OR Conjunctival OR Stool OR Feces OR Fecal OR Urine). No restrictions on year of publication or language were applied. We used Mendeley reference manager (Elsevier, Mendeley Ltd, London UK) to organize records and remove duplicates. Study selection was performed independently by two investigators (V.M.M. and P.M.), who first screened the titles and abstracts of the retrieved studies. Articles selected at this point were further appraised by full-text reading. Inter-examiner reliability after full-text assessment was computed through Cohen's kappa statistics, and any disagreements were resolved by discussion with a third author (M.G.A.).
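As an illustration of the inter-examiner reliability statistic mentioned above, the following minimal sketch computes Cohen's kappa for two reviewers making include/exclude decisions. The function name `cohen_kappa` and the counts are hypothetical and are not taken from the review; the actual per-reviewer decisions are not reported in this text.

```python
def cohen_kappa(both_include, only_a, only_b, both_exclude):
    """Cohen's kappa for two raters making binary include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    if n == 0:
        raise ValueError("at least one rated item is required")
    # Observed agreement: proportion of items on which both raters agree.
    p_observed = (both_include + both_exclude) / n
    # Marginal inclusion rates of each rater.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance, given the marginal rates.
    p_chance = a_include * b_include + (1 - a_include) * (1 - b_include)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts for 100 full-text assessments (illustration only).
kappa = cohen_kappa(both_include=40, only_a=4, only_b=3, both_exclude=53)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is preferred over a simple percentage of concordant decisions.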

Data Extraction Process and Data Items
Two authors (V.M.M. and P.M.) independently retrieved and reviewed the following data (if available) from all included studies: year of publication, first author, location, design, population size, mean age, gender ratio, mean days after symptoms onset, specimens and methods used; and the following test outcomes: number of total, positives, negatives, and CTs.
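The extracted counts of positive and negative results map onto a 2x2 table of index specimen versus reference standard (NPS/OPS). The sketch below shows how per-study sensitivity, specificity and accuracy could be derived from such a table; the class name `PairedCounts` and the counts are hypothetical, and accuracy is taken here as the simple proportion of concordant results, which is an assumption rather than the definition used by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedCounts:
    """2x2 table of an index specimen against the NPS/OPS reference."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Hypothetical study (illustration only, not data from the review).
study = PairedCounts(tp=84, fp=4, fn=16, tn=96)
print(study.sensitivity, study.specificity, study.accuracy)
```

Studies that tested only reference-positive patients leave the `fp` and `tn` cells empty, so specificity cannot be computed for them.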

Risk of Bias Assessment
The methodological quality of the included studies was evaluated independently by two authors (V.M.M. and P.M.), using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [46], with any discordant rating resolved by a third author (M.G.A.). This instrument judges the risk of bias (RoB) and the applicability of diagnostic accuracy studies. QUADAS-2 contains four key domains (patient selection, index test, reference standard, and flow and timing), and each domain is rated as low, high, or unclear RoB. If a study failed to provide enough information, the domain was classified as "no information". The robvis tool was used to generate all the RoB plots [47].

Quantitative Analyses
We used MetaDTA [48] to examine the overall SARS-CoV-2 detection test accuracy and perform subgroup sensitivity analysis for the selected index specimens. In MetaDTA, the bivariate random-effects model pools the estimates of sensitivity and specificity jointly. This approach accounts for potential threshold effects and for the covariance between sensitivity and specificity. However, because these two parameters depend on many other factors, accuracy heterogeneity is expected to be high and problematic to estimate [49]. Diagnostic odds ratios (dOR) were directly obtained from the sensitivity and specificity logit estimates. Furthermore, the summary receiver operating characteristic (sROC) plot was rendered using parameters estimated from the bivariate model through the equivalence equations of Harbord et al. [50]. The random-effects meta-analysis of CTs, and all meta-regressions to identify potential sources of heterogeneity or confounding within or between the meta-analyses of the evaluated index specimens, were performed with OpenMeta-Analyst [51]. The influence of the specific time of sampling and of the disease stage on the accuracy of the test was planned to be assessed through meta-regression.
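The relation between the dOR and the logit estimates of sensitivity and specificity can be sketched as follows. This is plain point-estimate arithmetic on the rounded pooled values reported in this review, not a reproduction of the MetaDTA computation; the function names are illustrative, and the resulting figures may differ from those produced by the bivariate model.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """dOR = exp(logit(sensitivity) + logit(specificity))."""
    return math.exp(logit(sensitivity) + logit(specificity))


# Pooled point estimates reported in this review (rounded values).
saliva_dor = diagnostic_odds_ratio(0.839, 0.964)
dts_pos_dor = diagnostic_odds_ratio(0.901, 0.631)
print(f"saliva dOR ~ {saliva_dor:.1f}, DTS/POS dOR ~ {dts_pos_dor:.1f}")
```

Because the dOR multiplies the odds of a true positive by the odds of a true negative, a specimen with high sensitivity but poor specificity can still end up with a low dOR.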

Results
Electronic searches revealed a total of 3022 entries (1406 articles from PubMed, 522 from Web of Science and 1094 from Scopus). The search on clinical trial databases yielded no results. After removing duplicates, 1560 articles were judged against the eligibility criteria, and 1415 were excluded after title and/or abstract review. Out of the 145 subjected to full-paper review, 112 articles were excluded (Supplementary S2, pp. 4-11). As a result, a final set of 33 studies met all the required criteria and was included in the quantitative data analysis (Figure 1). Inter-examiner agreement was considered almost perfect (κ = 0.907, 95% CI: 0.828-0.987).
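The study-selection counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this section; the variable names are illustrative.

```python
# Records retrieved per database, as reported in this section.
records_by_database = {"PubMed": 1406, "Web of Science": 522, "Scopus": 1094}

total_identified = sum(records_by_database.values())
screened = 1560                      # records left after duplicate removal
duplicates_removed = total_identified - screened
excluded_title_abstract = 1415
full_text_assessed = screened - excluded_title_abstract
excluded_full_text = 112
included = full_text_assessed - excluded_full_text

# Share of all identified entries that reached the quantitative analysis.
included_share_pct = round(100 * included / total_identified, 1)

print(total_identified, duplicates_removed, full_text_assessed, included)
print(f"{included_share_pct}% of identified entries were included")
```

The included share computed against all identified entries matches the 1.1% quoted in the abstract.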

Quantitative Analysis (Meta-Analysis)
The random-effects meta-analysis demonstrated saliva as the index specimen with higher sensitivity and lower false-positive test results (Table 2).
In the meta-analysis of salivary samples from the oral cavity, estimates show an overall diagnostic accuracy of 92.1% (Figure 3a; 0.921, 95% CI: 0.700; 0.983), with an estimated sensitivity of 83.9% (Figure 3b; 95% CI: 77.4–88.8) and specificity of 96.4% (95% CI: 89.5–98.8). For DTS/POS samples, the overall accuracy was 79.7% (95% CI: 43.3–95.3), with an estimated sensitivity of 90.1% (95% CI: 83.3–96.9) and specificity of 63.1% (0.631, 95% CI: 0.368; 0.893). The uncertainty of these test performance estimates is much higher than in saliva-based diagnostics, since fewer studies support the meta-analysis model fit.
Meta-regression screening for potential confounding variables demonstrates no influence of the M/F ratio (Supplementary S4, p. 13). Regarding differences in study sample size, the effect on sensitivity is not significant (p = 0.518) (Figure S7), whereas for specificity a larger sample size appears to have a positive impact on performance (p < 0.034) (Supplementary S4, p. 13). As for the target gene, sub-analysis was deemed unsuitable given the variety of methods (Table 1).
Regarding urine, we did not find enough studies to compute estimates. Finally, the CTs in RT-PCR tests were compared between the index samples under analysis. We obtained an overall mean difference between saliva and NPS/OPS of 2.792 (95% CI: −1.457; 7.041) (Supplementary S9, p. 14), i.e., on average, the CT value for saliva is higher than the one for NPS/OPS, although the confidence interval includes zero. For the mean difference between DTS/POS and NPS/OPS, a significantly different estimate was obtained: −1.808 (95% CI: −3.189; −0.427) (Supplementary S10, p. 14).
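To make the interpretation of the two pooled CT mean differences explicit, the following sketch recovers an approximate standard error, z statistic and two-sided p-value from each reported estimate and its 95% CI. It assumes a symmetric, normal-approximation interval, which may not match exactly how OpenMeta-Analyst derived the intervals; the function name `z_test_from_ci` is illustrative.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def z_test_from_ci(estimate: float, lower: float, upper: float):
    """Approximate SE, z and two-sided p-value from an estimate and its 95% CI."""
    se = (upper - lower) / (2 * Z_95)
    z = estimate / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Pooled CT mean differences versus NPS/OPS, as reported in this section.
saliva_se, saliva_z, saliva_p = z_test_from_ci(2.792, -1.457, 7.041)
dts_se, dts_z, dts_p = z_test_from_ci(-1.808, -3.189, -0.427)

print(f"saliva:  z = {saliva_z:.2f}, p = {saliva_p:.3f}")
print(f"DTS/POS: z = {dts_z:.2f}, p = {dts_p:.3f}")
```

Under this approximation, only the DTS/POS difference is distinguishable from zero at the 5% level, consistent with the saliva interval spanning zero.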

Discussion
We systematically reviewed 33 studies on the diagnostic accuracy of RT-PCR testing using minimally invasive human specimens that may replace the nasal and throat swabbing routinely used for the detection of SARS-CoV-2. Overall, the most promising index specimen is saliva, with a true positive rate (sensitivity-pooled estimate) of 83.9% and a true negative rate (specificity-pooled estimate) of 96.4%. Interestingly, a critical analysis of these results shows that the accuracy of such tests was affected by a high level of heterogeneity, mostly due to methodological variations. Therefore, as a diagnostic specimen, "saliva" deserves particular attention, and several considerations need to be taken into account. Firstly, most studies accounted for salivary samples circumscribed to the oral region (anterior to the throat) [25,27,29,32,41,43]. This distinction is very important, as the salivary characteristics and the collection method differ, and DTS/POS may contain material from regions other than the oropharynx (nasopharynx or laryngopharynx) [55]. Secondly, among the studies using saliva samples from the oral cavity, the methods described show high heterogeneity and are unclear; for instance, they do not mention whether saliva was stimulated or not. Nevertheless, despite the multiple approaches used for the collection of saliva from the oral cavity (stimulated, unstimulated or unclear), saliva provided a high diagnostic accuracy (above 90%), confirming the potential of this specimen for SARS-CoV-2 detection. An additional limitation is that some of these works failed to properly describe the percentage of patients with asymptomatic, presymptomatic or symptomatic status; because the viral load varies significantly among these patients, this may negatively affect the accuracy of saliva as an index specimen.
To further improve the saliva collection protocol and secure its clinical validation and utility, specifically designed studies should be performed to overcome the current methodological limitations.
Concerning the other evaluated index specimens, sputum presented an elevated risk of delivering false positive results when compared to NPS/OPS RT-PCR. Nonetheless, we must be cautious in interpreting these results due to the small number of studies. Similarly, tears/CS delivered the lowest sensitivity and yet the highest specificity, though, once again, these results were based on scarce data [56].
As for the CT analyses, due to the low number of available studies, these estimates are inconclusive at this stage.
From the sampling standpoint, both saliva and sputum can be easily obtained; however, 72% of COVID-19 patients may not produce enough sputum for analysis [57]. Therefore, saliva (from the oral region) seems to be the best specimen for both public health and epidemiologic measures [55]. Because saliva can be self-collected by patients at home or at the outbreak spot, it would decrease the exposure of health-care workers to infection and reduce the waiting times for sample collection [55]. In contrast, DTS/POS may cause the dispersion of aerosols as a result of the cough-up collection process. However, some papers have reported lower accuracy scores for salivary samples owing to critical factors such as the viral load [58], which greatly depends on the disease stage (time from onset of illness) and the time-point of specimen collection over the day. Consequently, in this systematic review we attempted to assess the influence of the specific time of sampling and the disease stage on the accuracy rate of the test through a meta-regression, though unsuccessfully. More research is needed on these factors in order to deliver more accurate results and, eventually, to define a detailed protocol for sampling prior to collection (e.g., time-point, oral hygiene, whether to avoid drinking or eating beforehand). Other issues that may lead to false negative RT-PCR results include insufficient viral material in the specimen, laboratory error during sampling, and restrictions on sample transportation [56].
We are unaware of any other similar systematic review pooling consistent estimates on alternative specimens for detecting SARS-CoV-2 in such a way that it could have a significant impact on the accepted sampling methodologies. Indeed, almost ten months have passed since the public announcement of the COVID-19 pandemic, and we now have access to a large number of scientific articles. The timing of this review is thus adequate and decisive to ensure the computation of pooled estimates, which, nonetheless, might become outdated in the months to come. Notwithstanding, these results pinpoint saliva samples circumscribed to the oral cavity as the index specimen with the greatest potential. This is a very important outcome owing to the particular circumstances we are currently experiencing (second or third waves of COVID-19), which demand extensive and rapid diagnosis of infection, for which a self-administered protocol for specimen collection would be extremely useful.
The recent understanding that some vaccines may provide little or no protection from infection with SARS-CoV-2 strains bearing certain mutations in the receptor binding domain (spike variants) should prompt the development and implementation of new assays that combine sensitive diagnosis with strain identification, such as those that make use of the CRISPR-Cas12 technology [59].

Strengths and Limitations
Despite the thorough and comprehensive approach undertaken in this review to appraise all the clinical evidence available, some shortcomings are noteworthy. The high level of heterogeneity observed limits the validation of quantitative analyses. This might be explained by the methodological variability across studies, namely the different number of samples considered in each one and the fact that not all studies used the same index test, sample treatment or target gene.
Although several studies addressed the topic of detecting the presence of SARS-CoV-2 in index samples, not all of them could be included in this meta-analysis, since some did not provide all the raw data required to calculate the main diagnostic performance parameters. Moreover, some of the works only tested positive patients. Other factors that might have led to some variance in results include the timing of specimen collection and testing and the sampling procedure. In fact, a number of publications did not provide such information at all. Given the urgency to develop effective solutions for the COVID-19 pandemic, this heterogeneity might be seen as a collateral limitation.
These results have been derived from a rigorous protocol with up-to-date standards using appropriate guidelines. In this way we were also able to estimate the accuracy (clinical sensitivity and specificity) of a considerable number of index specimens. Still, there is an urgent need for better designed trials that follow more homogeneous methodologies to further confirm our findings. Such trials may aid public health authorities in validating alternative samples for SARS-CoV-2 infection diagnosis that are as reliable as nasal and throat swabs, but are non-painful, non-stressful and much easier to collect.

Conclusions
Although several vaccines against SARS-CoV-2 have already been approved and are being implemented in most developed countries, coverage has progressed very slowly, and it will take months to significantly reduce the prevalence of COVID-19. Since the very beginning of the pandemic, massive testing has been a critical priority in the struggle against the spread of the virus. Effective tests make it possible to discriminate between infected and non-infected people, thereby supporting decision making for clinical management of patients, transmission control, and epidemiological studies. According to the WHO interim guidance regarding "laboratory testing guiding principles" [59], the availability of accurate laboratory or point-of-care tests is as important as the rapid collection of appropriate physiological samples. Respiratory specimens are the only ones that have been accepted up to now, but the complexity of their collection from the nasal cavity and the discomfort caused to patients are driving the search for simpler and less intrusive substitutes. To this end, several alternative specimens have been compared to nasal/throat swabs for diagnosis of SARS-CoV-2 infection using nucleic acid assays (RT-PCR), and the results were systematically reviewed herein. We found that saliva from the oral region is the best candidate as an alternative specimen for SARS-CoV-2 detection. In fact, despite some heterogeneity in methodologies, the proportions of infected and non-infected patients correctly identified through the index sample are 83.9% and 96.4%, respectively. The second-best specimen was DTS/POS, with a better true positive rate than saliva (sensitivity of 90.1%), but a much lower true negative rate (specificity of 63.1%). The specificity of sputum samples was even lower (25.4%), despite a reasonably high sensitivity (85.4%).
Globally, the clinical performance of the other specimens (urine, feces, and tears) was inferior, but one should mention that the number of studies with these index specimens done so far is still scarce.
To sum up, saliva samples simply taken from the oral cavity are promising alternatives to the currently used swab-based specimens, since they can be effective and allow self-collection. Besides mitigating the discomfort caused by sampling, saliva testing may considerably reduce the transmission risk while increasing testing capacity, ultimately promoting the implementation of truly deployable COVID-19 tests, which could work either at the point-of-care (e.g., hospitals, clinics) or at outbreak control spots (e.g., schools, airports, and nursing homes). Before saliva can be recommended as an index specimen by the main public health authorities, further assessment and validation are urgently required to define the best practices to adopt.