Geographical Variability Affects CCHFV Detection by RT–PCR: A Tool for In-Silico Evaluation of Molecular Assays

The Crimean–Congo hemorrhagic fever virus (CCHFV) is considered a major emerging infectious threat, according to the WHO R&D Blueprint. A wide range of CCHFV molecular assays have been developed, employing varied primer/probe combinations. The high genetic variability of CCHFV often hampers the efficacy of available molecular tests and can affect their diagnostic potential. Recently, increasing numbers of complete CCHFV genomic sequences have become available, allowing a better appreciation of the genomic evolution of this virus. We summarized the current knowledge on molecular methods and developed a new bioinformatics tool to evaluate the existing assays for CCHFV detection, with a special focus on strains circulating in different geographical areas. Twenty-two molecular methods and 181 CCHFV sequences were collected from the PubMed and GenBank databases, respectively. In-silico PCR analysis detected up to 28 mismatches between the primers and probes of each assay and the CCHFV strains. Combinations of up to three molecular methods markedly decreased the number of mismatches within most geographic areas. These results support the good practice, in CCHFV detection, of performing more than one assay aimed at different sequence targets. The choice of the most appropriate tests must take into account the patient’s travel history and the geographic distribution of the different CCHFV strains.


Introduction
The Crimean-Congo hemorrhagic fever virus (CCHFV) is a tick-borne, negative-sense, single-stranded RNA orthonairovirus and the causative agent of Crimean-Congo hemorrhagic fever (CCHF). CCHFV is considered one of the major emerging infectious threats, as it can cause severe disease, with an increasing number of cases of infection in Africa, the Middle East, Asia, and parts of Europe [1,2].
Rapid diagnosis is crucial to ensure the implementation of appropriate infection control measures and to guide post-exposure prophylaxis [3]. Among the available diagnostic tests, reverse-transcriptase PCR (RT-PCR) is the method of choice for rapid laboratory diagnosis during the acute phase of infection [4]. However, the efficacy of the available molecular tests is hampered by the high genetic variability of the virus, which displays the greatest degree of sequence diversity of any arbovirus species [4,5].
CCHFV is an enveloped virus with a tripartite [small (S), medium (M), and large (L)] negative-sense, single-stranded RNA genome [6], characterized by sequence diversity of 20%, 22%, and 31% among the S, L, and M segments, respectively [5]. The mechanisms responsible for this marked sequence diversity include genetic drift and the separate evolution of lineages in geographically constrained reservoirs. Long-distance transport of infected ticks by migratory birds, or trade in viremic livestock, can result in multiple strain variants occurring in the same endemic area. Segment reassortment as a result of co-infection can also increase the diversity of CCHFV strains [4,7,8].
Most of the isolates causing outbreaks in Eastern Europe belong to Clade V, whereas Clade VI includes highly divergent strains isolated from ticks in Greece (including the strain AP92) [4,18] and Turkey [19]. Isolates belonging to the "African" Clade III were also collected from infected ticks in Spain in 2010 [20] and during 2011–2015 [21], and viruses of this clade were found to have caused human infections in 2016 [22]. A more recent human infection in Spain, in 2018, was caused by a CCHFV that has not yet been characterized [23].
Moreover, in some cases, different CCHFV geographic clades are characterized by different pathogenic potential, as in the case of Clade VI in Greece, where, despite high antibody prevalence, very few and mostly mild human cases have been reported [24]. Viral diversity can strongly affect the sensitivity of molecular tests when mutations occur at primer and probe binding sites [25]. In particular, mismatches occurring within the 3' end of a primer can dramatically affect fragment amplification [26,27].
Experimental validation of molecular methods can usually be performed only on a limited set of biological samples that might not be representative of the viral populations in the different outbreak-prone geographical areas. Nowadays, the increasing number of complete viral sequences that can be retrieved from public databases provides a valuable resource for evaluating the genetic variation among viral strains. However, many sequence entries have been assembled using classic RT-PCR approaches with general terminal sequencing primers; as a consequence, the termini of published sequence fragments often include primer sequences. Very few segments have been assembled using molecular techniques specific for terminal sequencing, such as rapid amplification of cDNA ends (RACE).
Viruses 2019, 11, 953 3 of 14
Many tools have been developed for performing in-silico PCR [28,29]. Here, we describe a bioinformatic tool (CCHFV Primer Checker) to evaluate the detection efficacy of the existing molecular assays and to propose specific sets of assays able to detect as many CCHFV strains as possible from all geographic regions.
All CCHFV genomes available by 31 December 2018 were retrieved from GenBank (https://www.ncbi.nlm.nih.gov/nucleotide/), using "txid1980519[Organism]" as the query term. All S segments including the complete coding region, with available data about host, collection country, and collection date, were selected. The S segment sequence of the CCHFV that caused the 2018 case of infection in Spain [23] was also added to the analysis, although the sequence was not complete [30].
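The selection criteria above can be sketched as a simple filter. This is a minimal, hypothetical illustration rather than the authors' code: it assumes each GenBank entry has already been parsed into a record holding the segment name, a flag for a complete coding region, and the host, collection country, and collection date values from the source annotation. The `SegmentRecord` class and `select_records` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentRecord:
    """Hypothetical, pre-parsed view of one GenBank nucleotide entry."""
    accession: str
    segment: Optional[str]          # "S", "M" or "L"; None if not annotated
    complete_cds: bool              # True if the whole coding region is present
    host: Optional[str]
    country: Optional[str]
    collection_date: Optional[str]


def select_records(records: List[SegmentRecord]) -> List[SegmentRecord]:
    """Keep complete S-segment coding sequences with host, country and date."""
    selected = []
    for rec in records:
        if rec.segment != "S" or not rec.complete_cds:
            continue
        # All three metadata fields must be present and non-empty.
        if rec.host and rec.country and rec.collection_date:
            selected.append(rec)
    return selected
```

Applied to the parsed records, the filter returns only entries that satisfy every criterion at once; records lacking any metadata field are dropped rather than imputed.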

Phylogenetic Analysis
The analysis focused on the S segment, as it is the most conserved segment across the CCHFV clades [5,31] and the main target for molecular assays. Complete coding sequences of the S segment from the different viral strains were clustered at 100% identity with CD-HIT v4.6, and all sequences were then aligned with MAFFT v7.123b in local pair mode. Phylogenetic analysis was performed after removing all alignment positions containing gaps; the maximum likelihood tree was then obtained with RAxML v8.2.10 using the GTRGAMMA model and 1000 bootstrap inferences. Moreover, pairwise distances between sequences were calculated using the Kimura two-parameter model. In agreement with previous works, branch positions, pairwise distances, and collection countries were used to identify separate CCHFV clades within the phylogenetic tree [5,9–22].
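Two of the steps above, removing gap-containing alignment columns and computing Kimura two-parameter (K2P) distances, can be illustrated as follows. This is a minimal sketch, not the code used in the study (which relied on dedicated tools): the alignment is assumed to be a list of equal-length strings using "-" for gaps, and positions with non-ACGT characters are simply skipped when computing distances. With P and Q the proportions of transitions and transversions, the K2P distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def drop_gap_columns(alignment):
    """Remove every alignment column that contains a gap in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned, gap-free sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip ambiguous bases in this sketch
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Dropping gap columns first mirrors the description above; a real pipeline would typically take the distances from an established package rather than this hand-written function.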

Software Description
The in-house Python script, named "CCHFV Primer Checker", took as input the primers/probes listed in a table in "csv" format, the multi-sequence alignment file in "fasta" format, and, for each clade, the list of GenBank accession numbers of all sequences to be analyzed, in "txt" format.
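Reading the three input files could look like the sketch below. The column names of the primer table (`assay`, `oligo`, `sequence`) and all function names are hypothetical, since the actual layout of the "CCHFV Primer Checker" input table is not described here; each function accepts any iterable of text lines, such as an open file.

```python
import csv


def read_primers(lines):
    """Read a hypothetical primer table with columns assay,oligo,sequence."""
    table = {}
    for row in csv.DictReader(lines):
        # Group oligos (primers and probe) by the assay they belong to.
        table.setdefault(row["assay"], []).append(
            (row["oligo"], row["sequence"].strip().upper()))
    return table


def read_fasta(lines):
    """Parse a multi-FASTA alignment into {identifier: aligned sequence}."""
    sequences, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def read_accessions(lines):
    """Read one GenBank accession number per line, ignoring blank lines."""
    return [line.strip() for line in lines if line.strip()]
```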
As most heuristic alignment algorithms have difficulty recovering the ends of the alignment region when these contain mismatches [32], the "CCHFV Primer Checker" searched for primer and probe binding sites using a perfect-match approach with IUPAC ambiguity codes [33]. If the assay contained degenerate primers, the ambiguous characters were replaced by regular expressions (for example, the ambiguous character "N" was replaced by a character class matching any of A, C, G, or T). The software found the start and end positions of the annealing sites of each primer and probe with respect to IbAr10200 (NCBI reference sequence NC_005302).
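A minimal sketch of a perfect-match search with IUPAC codes is shown below. It is an illustration, not the tool's actual implementation: it assumes the site is searched on the ungapped reference sequence (e.g., IbAr10200), tries the reverse complement so that reverse primers are also located, and provides a helper to map reference positions back to alignment columns. All function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_regex(oligo):
    """Translate a (possibly degenerate) oligo into a regular expression."""
    return "".join(IUPAC[base] for base in oligo.upper())


def reverse_complement(oligo):
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return oligo.upper().translate(COMPLEMENT)[::-1]


def find_site(oligo, reference):
    """0-based (start, end) of a perfect IUPAC match on the reference.

    Both orientations are tried so reverse primers are found too; end is
    exclusive. Returns None when there is no perfect match.
    """
    for candidate in (oligo, reverse_complement(oligo)):
        match = re.search(to_regex(candidate), reference.upper())
        if match:
            return match.start(), match.end()
    return None


def ungapped_to_column(aligned_reference):
    """List mapping each ungapped reference position to its alignment column."""
    return [col for col, base in enumerate(aligned_reference) if base != "-"]
```

Searching a gap-free reference with a perfect-match regex avoids relying on a heuristic aligner to place short oligos, which is the motivation given above.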
The program recorded the number and position of mismatches between every primer/probe and each CCHFV sequence. The software calculated the percentage of mismatches between each primer/probe and all targets belonging to the same viral clade. The percentage of viral strains in each clade fully matching the primers/probe set (i.e., with no mismatch in the annealing sites) was also calculated.
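Counting mismatches and the percentage of fully matching strains might be sketched as follows. The assumptions are ours, not stated in the source: each oligo is given as it reads on the target strand, together with its 0-based start and end on gap-free target sequences; a target base matches if it is one of the bases allowed by the oligo's IUPAC code; and any other character (including gaps and target ambiguity codes) counts as a mismatch. Function names are hypothetical.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(oligo, target_site):
    """Number of positions where the target base is not allowed by the oligo."""
    return sum(1 for o, t in zip(oligo.upper(), target_site.upper())
               if t not in IUPAC_SETS.get(o, ""))


def assay_mismatches(assay, target):
    """Total mismatches of all oligos of an assay against one target.

    `assay` is a list of (oligo, start, end) tuples; start/end are 0-based
    and end-exclusive positions on the target.
    """
    return sum(count_mismatches(oligo, target[start:end])
               for oligo, start, end in assay)


def percent_fully_matched(assay, targets):
    """Percentage of targets with zero mismatches across the whole assay."""
    perfect = sum(1 for t in targets if assay_mismatches(assay, t) == 0)
    return 100.0 * perfect / len(targets)
```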
The results were then exported to an Excel file reporting, for each clade, the frequency of mismatches between each primers/probe set and every genome sequence. As a single mismatch at the 3' end is known to affect primer performance [26,27,32], all "critical mismatches", i.e., those occurring in the last five positions at the 3' end of the primers, were also recorded and reported.
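The detection of critical mismatches in the last five 3' positions can be sketched with a small hypothetical helper. It assumes the primer and its binding site are both written 5' to 3' in the primer's orientation (for a reverse primer, the reverse complement of the sense-strand site), so that the 3' end is the right-hand end of both strings.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_positions(primer, site):
    """0-based positions, counted from the primer 5' end, of mismatched bases."""
    return [i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
            if s not in IUPAC_SETS.get(p, "")]


def critical_mismatches(primer, site, window=5):
    """Mismatch positions falling within the last `window` bases of the 3' end."""
    cutoff = len(primer) - window
    return [i for i in mismatch_positions(primer, site) if i >= cutoff]
```

For example, a mismatch at the very last base of a 10-mer is reported as critical, whereas one at the first (5') base is counted only as an ordinary mismatch.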
To investigate the detection efficacy of assay combinations, the software also evaluated the number of mismatches of all viral sequences in each clade against all possible combinations of up to five assays.
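The exhaustive search over combinations of up to five assays can be sketched with `itertools.combinations`. The source does not specify how mismatches are aggregated for a combination, so this sketch assumes that each sequence is scored by its best-matching assay (minimum) and each combination by its worst-covered sequence (maximum), preferring fewer assays on ties; `best_combinations` is a hypothetical name.

```python
from itertools import combinations


def best_combinations(mismatch_table, max_size=5):
    """Find the assay combination(s) with the lowest worst-case mismatch count.

    `mismatch_table` maps each assay name to a list of mismatch counts, one per
    sequence of a clade (same sequence order for every assay). Assumption: a
    sequence is scored by its best-matching assay in the combination (minimum)
    and a combination by its worst-covered sequence (maximum). Larger
    combinations replace smaller ones only if they score strictly better.
    """
    assays = sorted(mismatch_table)
    best_score, best = None, []
    for size in range(1, min(max_size, len(assays)) + 1):
        for combo in combinations(assays, size):
            columns = zip(*(mismatch_table[name] for name in combo))
            score = max(min(values) for values in columns)
            if best_score is None or score < best_score:
                best_score, best = score, [combo]
            elif score == best_score and size == len(best[0]):
                best.append(combo)
    return best_score, best
```

The number of combinations grows quickly with the number of assays, which is consistent with capping the search at five assays per combination.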

Assays Evaluation
In order to guide the interpretation of the results of the in-silico analysis, three threshold parameters were assumed for each viral clade:
1. The primers/probe set of the assay must not have more than 3 mismatches with respect to any genome in the clade;
2. The primers of the assay must not have more than 1 critical mismatch, i.e., a mismatch located in the last 5 positions of the 3' end;
3. Within a clade, more than 50% of viral strains must fully match the primers/probe set (so that new sequences belonging to the same clade could be expected not to have too many mismatches).
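The three thresholds above can be expressed as a single check per assay and clade. In this hypothetical sketch, threshold 2 is interpreted as the number of critical mismatches summed over the primers for each sequence; the source does not state whether the limit applies per primer or per set.

```python
def evaluate_assay(total_mismatches, critical_mismatches):
    """Summarise one assay over one clade and apply the three thresholds.

    total_mismatches[i]: mismatches of the whole primers/probe set with sequence i.
    critical_mismatches[i]: critical 3'-end mismatches of the primers with sequence i.
    """
    n = len(total_mismatches)
    summary = {
        "max_mismatches": max(total_mismatches),
        "max_critical": max(critical_mismatches),
        "percent_full_match": 100.0 * sum(m == 0 for m in total_mismatches) / n,
    }
    summary["passes"] = (
        summary["max_mismatches"] <= 3             # threshold 1
        and summary["max_critical"] <= 1           # threshold 2
        and summary["percent_full_match"] > 50     # threshold 3 (strictly above 50%)
    )
    return summary
```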
For each analyzed assay, the maximum number of mismatches of the respective primers/probe set, as well as that of each component of the set, was reported, together with the percentage of viral strains that fully matched each assay.
For each clade, the assay combination(s) with the minimum number of mismatches with all viral sequences were also reported.

Data Collection
The PubMed search retrieved 206 articles on CCHFV molecular detection (Figure 1); most of them were discarded because they did not report detailed descriptions of the detection methods employed, including the primer and probe sequences.
Starting from 2729 CCHFV sequences available in GenBank, 1438 were identified as S segments, of which 263 were complete coding sequences. Selecting the records with available data about host, collection country, and collection date, and clustering at 100% nucleotide identity, yielded a total of 181 strains.

Phylogenetic Analysis
The phylogenetic tree built with all 181 complete coding sequences of the S segment is shown in Figure 2. Our analysis suggests a separation into nine groups. In agreement with previous studies [5], we maintained the nomenclature of the six previously recognized clades, corresponding to different geographic regions: three clades prevalently found in Africa (Clades I–III), two in Europe (Clades V and VI), and one in Asia (Clade IV). Moreover, one recently described clade was identified as Clade VII [4] and, following previous work, Clades III and IV were split into sub-groups according to their genetic distance (see web-only Supplementary Table S2) [17]. The median distance among CCHFV S sequences was 0.12 (range: 0.00–0.22), in agreement with [5]. The detailed phylogenetic tree is reported in web-only Supplementary Figure S1.

Assays Evaluation
Using the CCHFV Primer Checker workflow, all sequence variants of the CCHFV S segment versus the primers/probe sets were identified for each molecular assay and are reported in Supplementary Figure S2. The percentage of viral strains in each clade that fully matched (100% nucleotide identity) the set is reported in Table 2. For each viral clade, molecular assays whose matching results met the threshold values described in Methods are reported in bold. As some molecular tests have an annealing site lying in the extreme 5' or 3' non-coding regions, which are not represented in all of the analyzed S segments, the number of sequences included in the analysis varied between 89 and 160, according to the location of the primer/probe alignments. Therefore, for the assays in this category, we restricted the analysis to a reduced subset of CCHFV sequences. These results are presented separately in Table 3, since the value of this analysis is reduced and it cannot be directly compared with the results reported for the other assays in Table 2.
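Restricting the analysis to sequences that actually cover an annealing site can be sketched as below. This hypothetical helper assumes that regions missing from a sequence (such as unsequenced 5' or 3' termini) appear as gaps in the alignment, so a sequence is retained for an assay only if the whole site is present and gap-free.

```python
def covering_sequences(alignment, start_col, end_col):
    """Identifiers of aligned sequences fully covering columns [start_col, end_col).

    `alignment` maps identifiers to equal-length aligned strings with "-" for
    gaps; a sequence is kept only if the site is present and gap-free.
    """
    width = end_col - start_col
    kept = []
    for name, seq in alignment.items():
        site = seq[start_col:end_col]
        if len(site) == width and "-" not in site:
            kept.append(name)
    return kept
```

Running this per oligo and intersecting the results would give the per-assay subset, which explains why the number of analyzed sequences varies with the primer/probe location.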

As reported in Table 4, using a combination of assays resulted in a better match with sequences belonging to all sub-groups, except Africa 1, Europe 2, and Europe 3. Assay combinations tested against all CCHFV sequences were also reported, using a maximum of five assays, since the use of more than five tests is excessively demanding from both a computational and a practical point of view. For all clades analyzed, combining more than three tests did not improve detection efficacy.
Table 4. Best assay combinations for CCHFV detection. For each clade, the combination(s) of assays with the best detection efficacy are reported on the basis of the three threshold parameters (see Methods).

Discussion
An early and accurate diagnosis of CCHF infections is essential for case management and infection control procedures. The application of molecular tests in different settings is hampered by the great degree of sequence diversity; therefore, serological methods are more broadly used. The gold standard for diagnosis, however, is a combination of serology and molecular tests [4].
All available assays can be affected by the high diversity of CCHFV genomes, which hinders the design of specific primers/probe sets and prompts the use of highly multiplexed procedures. Older molecular methods were designed on the basis of a limited number of CCHFV sequences and were often tested on a limited number of clades, while more recent assays were designed using larger, updated datasets and were tested for the detection of multiple strains. Moreover, methods optimized for the detection of strains circulating in a specific geographic area might achieve a lower limit of detection for the targeted strains than methods aimed at covering a broader spectrum of viral variants.
The in-silico PCR analysis performed in this study confirmed that the potential sensitivity of an assay is strongly correlated with the geographic area of virus origin and with the evolutionary history of CCHFV, although validation studies with clinical samples are needed to prove this assumption. For example, the so-called "Drosten 2002" method perfectly matches more than 60% of sequences belonging to sub-group Asia 2 (Clade IV), but does not perfectly match any sequence belonging to other clades. Therefore, this assay can be considered useful to detect CCHFV from Asia but could perform worse when used to detect CCHFV from other regions. Our analysis confirmed the suitability of the primer choices for some assays that were expressly designed to detect CCHFV from Europe, such as Duh 2006 (sub-group Europe 1) and Midilli 2009 (B) (sub-group Europe 2). A strong association between the predicted assay sensitivity and geographic origin emerged for other methods, like Wolfel […] Atkinson 2012, and Bonney 2017), it is not feasible to perform a robust variant analysis using this tool because their target region includes the CCHFV 5' end, which is rarely sequenced and reported. Consequently, in-silico evaluations of their diagnostic capacity are not supported by enough data. In particular, the assay from Deyde et al. (2006) covers both the 5' and 3' ends of the S segment and needs to be evaluated for potential sensitivity problems.
All parameters considered are intended as an easy guide for selecting the most appropriate assay for diagnostic purposes, although further wet-lab analyses on a wide panel of reference strains and real clinical samples are necessary to evaluate the sensitivity and specificity of molecular tests. However, few reference strains are available for testing, even through specialized repositories like EVAg (www.european-virus-archive.com), and very few clinical samples are available for tuning diagnostic capabilities. Interestingly, our analysis showed some agreement with the external quality assessment (EQA) on the molecular detection of CCHFV [54], in particular for Duh 2006, which was not able to detect strains from the Asia 1 and Africa 3 sub-groups. Therefore, in-silico evaluation, even if not free of drawbacks, still provides useful data for choosing the most appropriate molecular method(s) to detect CCHFV from different endemic regions of the world. This analysis could not include commercial assays (such as Altona), as their primer sequences are not disclosed. We also emphasize the need for public sharing of primer design and selection for commercially offered assays, as part of capacity-building for emerging infections.
In conclusion, this work reveals a strong regional dependence of the potential performance of all assays in detecting CCHFV. As more and more CCHFV sequences become available, our future efforts will be directed towards designing diagnostic tests capable of detecting all known circulating CCHFV clades. Future molecular methods could be based on multiple tests designed to detect multiple CCHFV targets with high sensitivity and specificity. Design approaches based on microarrays or other highly multiplexed fast PCR methods might be investigated. Currently, although a single method can be considered appropriate for "local" investigations or in outbreak conditions, when the performance of the adopted method for detecting the circulating strain is known, effective diagnosis of CCHF in patients from different geographical areas should rely on a panel of methods and a thorough epidemiological investigation.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/10/953/s1, Figure S1: Detailed phylogenetic tree. Figure S2: Sequence variants of the CCHFV S segment versus each molecular assay. Table S1: Published molecular assays for CCHFV detection at December 2018. Table S2: CCHFV genetic distances between all analyzed CCHFV genomes, calculated using the Kimura-2-parameters model.
Funding: This research was supported by the following funds: Italian Ministry of Health, grants Ricerca Corrente-Linea 1; European Union, Joint Action Consumers, Health, Agriculture, and Food Executive Agency for Efficient response to highly dangerous and emerging pathogens at EU level no. 677066 (EMERGE); European Centre for Disease Prevention and Control (ECDC), EVD-LabNet Framework contract ECDC/2016/00; European Union, Horizon 2020 research and innovation program "European Virus Archive goes Global" no. 653316 (EVAg).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.