SARS-CoV-2 Mutations and Variants May Muddle the Sensitivity of COVID-19 Diagnostic Assays

The performance of diagnostic polymerase chain reaction (PCR) assays can be affected by SARS-CoV-2 variability because it depends on full complementarity between PCR primers/probes and viral target templates. Here, we investigate the genetic variability of the SARS-CoV-2 regions recognized by primers/probes used in PCR diagnostic assays, based on nucleotide mismatch analysis. We evaluated the genetic variation in the binding regions of 73 primers/probes targeting the Nucleocapsid (N, N = 36), Spike (S, N = 22), and RNA-dependent RNA-polymerase/Helicase (RdRp/Hel, N = 15) in publicly available PCR-based assays. Over 4.9 million high-quality SARS-CoV-2 genome sequences were retrieved from GISAID and divided into group-A (all except Omicron, >4.2 million) and group-B (only Omicron, >558 thousand). In group-A sequences, a large range of variability in primer/probe binding regions was observed in most PCR assays. In particular, 87.7% (64/73) of primers/probes displayed ≥1 mismatch with their viral targets, while 8.2% (6/73) contained ≥2 mismatches and 2.7% (2/73) contained ≥3 mismatches. In group-B sequences, 32.9% (24/73) of primers/probes were characterized by ≥1 mismatch, 13.7% (10/73) by ≥2 mismatches, and 5.5% (4/73) by ≥3 mismatches. The high rate of single and multiple mismatches found in the target regions of molecular assays used worldwide for SARS-CoV-2 diagnosis reinforces the need to optimize and constantly update these assays according to SARS-CoV-2 genetic evolution and the future emergence of novel variants.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has recorded over 554 million cases of infection (about 7% of the global population) and is responsible for more than 6.3 million deaths in the last 30 months. The causative agent of COVID-19, SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), is characterized by an RNA genome encoding more than 29 structural, non-structural, and regulatory proteins.
Although it is an RNA virus, SARS-CoV-2 has a relatively low mutation rate compared with other RNA viruses, including influenza, HIV, and HCV, and even with DNA viruses such as HBV [1], mainly due to the transcriptional fidelity and proofreading activity of its replication complex. Despite that, since the emergence of SARS-CoV-2, adaptive evolution and genetic diversification have led to the emergence of over 56,000 mutations, including deletions and insertions, across the viral genome. In particular, most mutations are observed in ORF1ab (71.3%), followed by the Spike (12.8%) and the Nucleocapsid (4.2%) [2] (accessed 3 July 2022).
To date, SARS-CoV-2 genetic diversification has led to the emergence of the variants of concern (VOC), including Alpha, Beta, Gamma, Delta, and more recently Omicron.
These variants have raised several concerns due to their potential impact on increasing transmissibility and severity [3,4]. Therefore, efficient surveillance and accurate detection of SARS-CoV-2 variants are crucial to optimize clinical management and ensure effective pandemic control. This also requires accurate and reliable diagnostic molecular assays based on the polymerase chain reaction (PCR) as well as antigen-based (Ag-RDT) assays, as highlighted by the WHO, CDC, and ECDC [5][6][7].
Notably, the performance of diagnostic assays can be affected by certain variants. Because the performance of PCR assays depends crucially on a set of primers and probes that bind complementary sequences in the targeted viral genome, mismatches between primers and templates are known to influence assay efficiency and sensitivity [8][9][10][11][12][13][14]. Most of the validated SARS-CoV-2 PCR assays, including those for real-time (RT)-PCR, Qualitative PCR, and Sequencing, had their primers and probes designed on the wild-type strain (NC_045512.2) published in early January 2020, and targeted regions in the Nucleocapsid, Spike, Envelope, and ORF1ab.
As stated above, these targeted regions have the highest mutation rates; therefore, some mutations may occur in the binding regions of the primers or probes, potentially leading to primer/probe-template mismatches and, consequently, false-negative results or even detection failure. Herein, the objective was to evaluate the genetic variability of the viral regions recognized by the primers and probes used in publicly available PCR-based diagnostic assays, based on nucleotide mismatch analysis.

Materials and Methods
In this study, SARS-CoV-2 genome sequences (N = 4,930,239, accessed 25 January 2022) were retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) database [15]; about half of the sequences were from Europe. Stringent quality filters were applied to include only complete, high-quality sequences (defined, for each analyzed region, as <1% ambiguous nucleotides, <0.05% unique amino acid mutations, and no insertion/deletion unless verified in the sequence by the submitter).
The sequences were divided into group A (all sequences except Omicron) and group B (only Omicron sequences). The frequency of mismatches was calculated on a total of 4,133,465 sequences for the Nucleocapsid, 4,196,498 for the Spike glycoprotein, and 4,132,890 for RdRp and Helicase for all SARS-CoV-2 sequences apart from Omicron (Group A), and on 558,914 sequences of the Nucleocapsid, Spike glycoprotein, RdRp, and Helicase for the predominantly circulating Omicron VOC (Group B).
We then calculated the frequency of at least one, two, and three mismatches in the binding region of each primer and probe targeting the Nucleocapsid, Spike, RNA-dependent RNA polymerase, and Helicase. Sequences with a mixture of wild-type and mutant residues at a single position were considered to have the mutation at that position. In particular, we counted the exact matches (zero errors) and the matches with one, two, or three errors in the two groups of sequences using tre-agrep, a tool that searches a file for a string while allowing approximate matches. To standardize the analysis, only primer/probe mismatches observed in >1% of viral sequences (corresponding to >41,000 group-A and >5500 group-B sequences) were considered.
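The counting step described above can be illustrated with a minimal Python sketch. It is not the tre-agrep workflow used in the study: it assumes substitution-only (Hamming) matching over same-length windows, whereas tre-agrep also allows insertions and deletions, and it expects the primer in the same orientation as the sequence (a reverse primer would first need to be reverse-complemented). The function names, the toy primer, and the toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def min_mismatches(primer: str, sequence: str) -> int:
    """Smallest number of substitutions between the primer and any
    same-length window of the sequence (substitution-only matching)."""
    k = len(primer)
    best = k  # worst case: every position differs (or sequence too short)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # Any non-identical character, including an ambiguity code,
        # is counted as a mismatch.
        dist = sum(a != b for a, b in zip(primer, window))
        if dist < best:
            best = dist
            if best == 0:
                break
    return best


def mismatch_frequencies(primer: str, sequences: Iterable[str],
                         thresholds: Sequence[int] = (1, 2, 3)) -> Dict[int, float]:
    """Fraction of sequences whose best primer site has >= t mismatches."""
    dists = [min_mismatches(primer, s) for s in sequences]
    return {t: sum(d >= t for d in dists) / len(dists) for t in thresholds}


def above_cutoff(freqs: Dict[int, float], cutoff: float = 0.01) -> Dict[int, float]:
    """Keep only mismatch levels seen in more than `cutoff` of sequences
    (the 1% reporting threshold stated in the text)."""
    return {t: f for t, f in freqs.items() if f > cutoff}


# Toy data (hypothetical): two exact sites, one with 1 and one with 2 mismatches.
toy_primer = "ACGTAC"
toy_sequences = ["TTACGTACTT", "TTACGTACTT", "TTACGAACTT", "TTTCGAACTT"]
toy_freqs = mismatch_frequencies(toy_primer, toy_sequences)
```

On real data the windowed scan would be slow for millions of genomes; a dedicated approximate-matching tool remains the practical choice, and this sketch only makes the definitions of "≥1, ≥2, ≥3 mismatches" and the 1% cutoff concrete.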

Results and Discussion
The overall analysis revealed that in group-A sequences, which represent the overall dataset of sequences except for Omicron, a large range of variability in primer/probe binding regions was observed in most PCR assays. In particular, 87.7% (64/73) of primers/probes displayed ≥1 mismatch with their viral targets, while 8.2% (6/73) displayed ≥2 mismatches and 2.7% (2/73) ≥3 mismatches (Table 1). In contrast, in group-B sequences (Omicron alone), 32.9% (24/73) of primers/probes were characterized by ≥1 mismatch, 13.7% (10/73) by ≥2 mismatches, and 5.5% (4/73) by ≥3 mismatches (Table 1). It is widely known that viruses tend to evolve rapidly during outbreaks, leading to the emergence of new mutations, and it is unsurprising that mutations, either synonymous or non-synonymous, may occur in the binding regions of primers and probes, compromising the sensitivity of PCR assays.
In the N forward primer of the RT-PCR assay designed by CDC in China, the mismatches were due to the three-nucleotide substitution encoding the mutation pair R203K/G204R, which is present in several SARS-CoV-2 variants, including the previous VOCs Alpha and Gamma. The specific mutations R203M for Delta and T205I for Beta also localize in this primer, thus explaining the high frequency of mismatching for this primer (Table 2).
Our finding is in line with an earlier study by Vogels et al. that showed three mismatches in the China CDC N forward primer caused by R203K/G204R [23]. Likewise, in the Spike, for the RT-PCR forward primer reported by Young et al. in Singapore, ≥1, ≥2, and ≥3 mismatches were observed in 38.6%, 23.2%, and 22.1% of the sequences, respectively, attributable to the six-nucleotide deletion leading to H69del-V70del. The latter was associated with diagnostic escape events termed S gene target failure or S gene dropout in the previous VOC Alpha [24,25] (Tables 1 and 2).
Moreover, some primers/probes showed various degrees of mismatching, reaching 73% of sequences with ≥1 mismatch, the highest value, in both S overlapping probes of the RT-PCR assay by Sigma-Aldrich. This can be attributed to the enrichment of mutations in this region (Q677H, N679K, Ins679GIAL, P681H, and P681R) that characterize variants including the Alpha, Gamma, and Delta VOCs.
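As a quick arithmetic check of the proportions reported above, the following Python sketch recomputes each percentage from the counts given in the text (73 primers/probes in total: 36 for N, 22 for S, and 15 for RdRp/Hel). The variable and function names are illustrative only.

```python
# Counts of primers/probes with >=1, >=2, >=3 mismatches, as stated in the text.
TARGET_COUNTS = {"N": 36, "S": 22, "RdRp/Hel": 15}
TOTAL = sum(TARGET_COUNTS.values())  # 73 primers/probes overall

MISMATCH_COUNTS = {
    "group A": {1: 64, 2: 6, 3: 2},
    "group B": {1: 24, 2: 10, 3: 4},
}


def percent(count: int, total: int = TOTAL) -> float:
    """Percentage of primers/probes, rounded to one decimal place."""
    return round(100 * count / total, 1)


PERCENTAGES = {
    group: {level: percent(n) for level, n in levels.items()}
    for group, levels in MISMATCH_COUNTS.items()
}

for group, levels in PERCENTAGES.items():
    for level, pct in levels.items():
        print(f"{group}: >={level} mismatch(es) in {pct}% of primers/probes")
```

The recomputed values agree with the reported 87.7%, 8.2%, and 2.7% for group A and 32.9%, 13.7%, and 5.5% for group B.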

[Table 2 fragment; caption ends "... Qualitative PCR, and Sequencing assays". Row: Sigma-Aldrich; N; GCCTCTTCTCGTTCCTCATCAC; F; Both ends; A182S, S183Y, S186Y, S187L, S188L; A182S, S183Y, S186Y, S187L, S188L]
This was followed by an S forward primer designed by Thermo Fisher, which showed ≥1 and ≥2 mismatches in 48.7% and 3.2% of sequences, respectively, resulting from the mutations L18F, T19R, and T20N that characterize the Beta, Gamma, and Delta VOCs. Similarly, the two RdRp forward primers designed by Charité Hospital and by Won et al. in Korea showed ≥1 mismatch in about 42% of sequences, due to the Delta VOC mutation G671S and the sporadic mutations M666I and M666T found in other variants.
Moreover, the N overlapping reverse primers for qualitative PCR, designed by Sigma-Aldrich, showed ≥1 mismatch in their binding sequences in over 37% of sequences. This could be due to the localization of the reverse complement in a region highly enriched in mutations, including G215C, specific for the Delta VOC, and other sporadic mutations (Tables 1 and 2).
Again, in the same assay from CDC in China, the N reverse primer of RT-PCR showed ≥1 mismatch in 32.7% of sequences, and this is due to the presence of S235F specific for Alpha VOC and M234I in different SARS-CoV-2 lineages. Finally, in the RT-PCR probe reported by Young et al. in Singapore, ≥1 mismatch was observed in 13% of the sequences, due to the presence of D80A in Beta VOC and K77T in some sublineages of Delta VOC (Tables 1 and 2).
In three RT-PCR assays (the N probe reported by Young et al. in Singapore, the Hel reverse primer reported by Chan et al., and the RdRp reverse primer designed by Charité Hospital), the primers/probe showed ≥1 mismatch for all the sequences, because they were originally designed with a nucleotide mismatch to the SARS-CoV-2 wild-type sequence, as highlighted in Table 1. Focusing on the RdRp reverse primer by Charité Hospital (the first assay, which has been broadly criticized for its molecular and methodological validity), it has been demonstrated that a single base mismatch in the reverse primer can increase the number of quantification cycles and reduce the sensitivity of the assay by affecting the RT step [26].
In group-B (Omicron) sequences, as in group A, the highest number of mismatches was observed in the N forward primer of the RT-PCR assay designed by CDC in China, where >99% of sequences showed ≥1, ≥2, and ≥3 mismatches, due to the presence of the mutation pair R203K/G204R in all Omicron sublineages (Tables 1 and 2). Likewise, for the N reverse primer of the RT-PCR assay designed by NIH in Thailand, >99% of sequences showed ≥3 mismatches, due to the nine-nucleotide deletion removing residues E31, R32, and S33 in all Omicron sublineages (Tables 1 and 2).
In the S forward primer of the RT-PCR assay reported by Young et al., ≥3 mismatches were observed in >97% of the sequences, again due to the H69del-V70del deletion present in the Omicron backbone, except for the BA.2 sublineage. Furthermore, the S forward primer designed by Thermo Fisher showed ≥1, ≥2, and ≥3 mismatches in 4.2%, 2.2%, and 2.2% of sequences, respectively, resulting from the mutations T19R, L24S, and P25del that characterize the BA.2 sublineage of the Omicron VOC (Tables 1 and 2).
Other primers/probes showed mismatches in all (100%) sequences: this was again the case for ≥1 and ≥2 mismatches in both S overlapping probes of the RT-PCR assay by Sigma-Aldrich, due to the presence of the mutations N679K and P681H in all Omicron VOC sublineages. A similar scenario was observed for the N probe of the RT-PCR assay designed by CDC in the USA, which showed ≥1 mismatch in >99.7% of Omicron sequences due to the presence of the P13L mutation. Again, the S probe reported by Chan et al. showed ≥1 mismatch in 97.2% of sequences, because of the mutation K417N in the Omicron backbone (Tables 1 and 2). For the S reverse primer of the Qualitative PCR reported by Won et al. in Korea, >97.4% of Omicron sequences showed ≥1 mismatch due to the presence of the T547K mutation (Tables 1 and 2).
Notably, the first study that addressed this issue was published in May 2020, early in the pandemic [27], and showed that 79% (26/33) of the primer binding sites used in RT-PCR assays (including the CDC China and CDC USA assays) were mutated in at least one of the 1825 SARS-CoV-2 genome sequences analyzed.
Soon afterward, a number of studies described impaired detection due to primer or probe mismatches. Artesi and colleagues showed that a single nucleotide mutation (C to T) at position 96 in the E gene is associated with failure of the Cobas SARS-CoV-2 E gene RT-PCR [28]. A study from our group also reported the failure of N target detection with the Allplex™ SARS-CoV-2 Assay due to a deletion of six nucleotides at positions 640-645 in the N gene [29].
A three-nucleotide substitution at the N-terminal end of the N gene, leading to the D3L mutation in the previous VOC Alpha, was associated with N gene dropout and Ct value shifting in the Allplex™ SARS-CoV-2/FluA/FluB/RSV™ PCR assay [30]. Furthermore, a PCR amplification curve abnormality (double or low amplification curve) was reported in the RdRp/S gene of the Allplex SARS-CoV-2 assay due to spike mutations at position P681 in the Alpha and Delta VOCs [31]. This could be helpful in rapidly predicting the presence of these variants prior to sequencing.
The single nucleotide polymorphism C to T at position 927 in the C terminus of the N gene has been reported to cause N gene target failure for the Xpert Xpress SARS-CoV-2 (GXP) assay [32,33]. Finally, a single point substitution G to T at position 922 was sufficient to impair N gene detection in the Cepheid Xpert Xpress SARS-CoV-2 assay, as recently reported in a Singaporean study [34]. This substitution localizes in a region targeted by the N2 probe from the CDC USA assay; we observed ≥1 mismatch in 2.1% of group-A sequences but in <0.4% of group-B (Omicron) sequences, owing to the lack of mutations in the targeted region in Omicron.
It is important to note that each mismatch, irrespective of its location within the primer sequence, leads to reduced thermal stability of the primer-template duplex, thus potentially affecting PCR performance. However, mismatches located in the 3′ end region of a primer have significantly larger effects on priming efficiency than mismatches located at the 5′ end, since 3′ end mismatches can disrupt the nearby polymerase active site [8]. As seen in Table 2, the mutations responsible for most primer/probe-template mismatches resided more frequently in the 3′ end or in both ends.
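The distinction between 5′ and 3′ end mismatches can be made concrete with a short Python sketch that labels where the mismatches of an aligned primer/target pair fall. This is an illustration only, not the procedure used to build Table 2: the `end_len` window of five nucleotides is an arbitrary assumption, the sequences are toy examples, and the target is assumed to be already written 5′→3′ in the primer's orientation.

```python
from typing import List


def mismatch_positions(primer: str, target: str) -> List[int]:
    """0-based positions (counted from the primer's 5' end) where the
    primer and its aligned target site differ."""
    if len(primer) != len(target):
        raise ValueError("primer and target site must have the same length")
    return [i for i, (a, b) in enumerate(zip(primer, target)) if a != b]


def classify_mismatch_location(primer: str, target: str, end_len: int = 5) -> str:
    """Label mismatches as "5' end", "3' end", "both ends", "internal" or "none".

    `end_len` (hypothetical default: 5 nt) defines how many terminal
    nucleotides count as an "end region".
    """
    n = len(primer)
    positions = mismatch_positions(primer, target)
    if not positions:
        return "none"
    near_5 = any(i < end_len for i in positions)
    near_3 = any(i >= n - end_len for i in positions)
    if near_5 and near_3:
        return "both ends"
    if near_3:
        return "3' end"
    if near_5:
        return "5' end"
    return "internal"


# Toy 20-nt primer and target sites (hypothetical sequences).
PRIMER = "ACGTACGTACGTACGTACGT"
print(classify_mismatch_location(PRIMER, "ACGTACGTACGTACGTACAT"))  # mismatch at index 18
print(classify_mismatch_location(PRIMER, "TTTTACGTACGTACGTACGT"))  # mismatches at 0, 1, 2
print(classify_mismatch_location(PRIMER, "TCGTACGTACGTACGTACGA"))  # mismatches at 0 and 19
```

A label of "3' end" or "both ends" flags the cases that, according to the text, are expected to affect priming efficiency the most.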
Overall, the sensitivity of PCR assays in detecting SARS-CoV-2 is highly dependent on the virus's genetic variability, which can be assessed by comparing each primer and probe with its binding region in the viral sequence. Currently, multi-gene target PCR assays are considered the gold standard for the detection of SARS-CoV-2; thus, a single gene detection failure does not jeopardize the proper interpretation of multiplex assays targeting different genes of the viral genome. Detecting mutations that have the potential to escape diagnostic assays is essential for resolving any resulting discrepancy and improving interpretation. It is important to note that the detection of SARS-CoV-2 with one gene detection failure or dropout may provide a rapid signal that a specific variant may be present; thus, sequencing can be considered to characterize the variant.
Unlike that of PCR assays, the sensitivity of Ag-RDT assays is only partially dependent on SARS-CoV-2 variants and the presence of mutations. A recently published study compared the sensitivity of seven Ag-RDT assays across all VOCs, including Omicron, and showed that the analytical sensitivity for the Omicron variant was lower than that for the other VOCs in most of the assays evaluated [35]. This is potentially due to the mutations and deletions that Omicron carries in the nucleocapsid, which is the target of nearly all Ag-RDT assays.
A multi-center study that compared another seven Ag-RDT assays, regardless of the SARS-CoV-2 variant, found that all Ag-RDTs reach high sensitivity early in the disease (<3 days of symptoms) and in individuals with high viral loads (>6 log10 SARS-CoV-2 RNA copies/mL), irrespective of whether cases were symptomatic or asymptomatic [36]. In line with this, another study demonstrated increased Ag-RDT sensitivity at high viral load (≥5.2 log10 SARS-CoV-2 E gene RNA copies/mL) and comparable performance between symptomatic and asymptomatic cases with similar viral loads [37].
A recent study showed that the nucleocapsid mutation T135I was associated with escape from detection by the Panbio COVID-19 rapid antigen test due to its localization around major epitopes of the N protein [38]. The latest study from the USA compared the performance of three Ag-RDT assays in the detection of the Delta and Omicron VOCs; the results showed that the Ag-RDT assays performed similarly for the two VOCs and performed better among patients with the highest viral loads [39].
The overall findings suggest that the sensitivity of Ag-RDT assays depends on several factors. It is not restricted to viral load or symptom status, but may also extend to mutations that can alter different epitopes of SARS-CoV-2 structural proteins, leading to escape from the coated antibodies in Ag-RDT assays. It is noteworthy that low viral load (high Ct) and mutations may pose a challenge to the early detection of SARS-CoV-2 by Ag-RDT assays, thus further fueling chains of transmission.
Importantly, our results are in keeping with the FDA recommendations for the use of assays with multiple genetic targets, which ensure higher sensitivity despite the different genetic profiles of SARS-CoV-2 variants. Multiple genetic targets means that a molecular assay is designed to detect more than one region of the SARS-CoV-2 genome or, for antigen tests, more than one region of the proteins that form SARS-CoV-2 [40]. Furthermore, assay optimization and validation are essential to confirm sensitivity and specificity; this can be ensured by designing primers homologous to the target sequence, verifying the reverse complement, avoiding ambiguous nucleotides unless necessary, and optimizing the primers' concentration and temperature.
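One of the design checks listed above, verifying the reverse complement, can be sketched in a few lines of Python. The example assumes unambiguous DNA letters only (A, C, G, T); the helper names, the toy template, and the toy primer are hypothetical.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_primer_site(primer: str, template: str) -> int:
    """Index on the plus-strand template where a reverse primer binds,
    or -1 if its reverse complement does not occur exactly.

    A reverse primer anneals to the plus strand, so its reverse complement
    must be found in the template as written.
    """
    return template.find(reverse_complement(primer))


# Toy template and reverse primer (hypothetical sequences).
TEMPLATE = "GGATCCAAGTTTGCA"
REVERSE_PRIMER = "AAACTT"  # reverse complement is AAGTTT
SITE = reverse_primer_site(REVERSE_PRIMER, TEMPLATE)
```

A result of -1 for a reverse primer that is expected to bind suggests either an orientation error in the design or a mismatch that an approximate search would then need to quantify.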

Conclusions
This study highlights the importance of characterizing mutations and variants of SARS-CoV-2, as they have the potential to affect diagnostic assays. The high rate of single and multiple mismatches found in the target regions of molecular assays used worldwide for SARS-CoV-2 diagnosis reinforces the need to use more than one target, to bypass the potential lack of recognition of one PCR target, and to monitor and, if necessary, constantly update these assays according to SARS-CoV-2 genetic evolution and the future emergence of novel variants. This will ensure the full efficacy of diagnostic assays, thus contributing to the goal of limiting viral transmission chains and countering viral spread.