Dynamic Molecular Epidemiology Reveals Lineage-Associated Single-Nucleotide Variants That Alter RNA Structure in Chikungunya Virus

Spicher, Thomas; Delitz, Markus; Schneider, Adriano de Bernardi; Wolfinger, Michael T.

doi:10.3390/genes12020239


by Thomas Spicher ¹,†, Markus Delitz ¹,†, Adriano de Bernardi Schneider ² and Michael T. Wolfinger ¹,³,*

¹ Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Vienna, Austria

² AntiViral Research Center, Department of Medicine, University of California San Diego, San Diego, CA 92103, USA

³ Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Währingerstraße 29, 1090 Vienna, Austria

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Genes 2021, 12(2), 239; https://doi.org/10.3390/genes12020239

Submission received: 14 January 2021 / Revised: 29 January 2021 / Accepted: 4 February 2021 / Published: 8 February 2021

(This article belongs to the Special Issue Quest for Conserved RNAs in Viral Genomes)


Abstract

Chikungunya virus (CHIKV) is an emerging Alphavirus which causes millions of human infections every year. Outbreaks have been reported in Africa and Asia since the early 1950s, from three CHIKV lineages: West African, East Central South African, and Asian Urban. As new outbreaks occurred in the Americas, individual strains from the known lineages have evolved, creating new monophyletic groups that generated novel geographic-based lineages. Building on a recently updated phylogeny of CHIKV, we report here the availability of an interactive CHIKV phylodynamics dataset, which is based on more than 900 publicly available CHIKV genomes. We provide an interactive view of CHIKV molecular epidemiology built on Nextstrain, a web-based visualization framework for real-time tracking of pathogen evolution. CHIKV molecular epidemiology reveals single nucleotide variants that change the stability and fold of locally stable RNA structures. We propose alternative RNA structure formation in different CHIKV lineages by predicting more than a dozen RNA elements that are subject to perturbation of the structure ensemble upon variation of a single nucleotide.

Keywords:

Chikungunya virus; molecular epidemiology; mutation; RNA structure

1. Introduction

Chikungunya virus (CHIKV) is an arthropod-borne Alphavirus of the family Togaviridae that causes millions of human infections every year, particularly in tropical and subtropical regions. CHIKV is the etiological agent of chikungunya fever, an acute febrile illness associated with joint pain, rash, and, rarely, neurological manifestations [1]. CHIKV infection can culminate in chronic arthralgia and arthritis lasting up to several years. CHIKV cycles between vertebrate hosts and hematophagous arthropod vectors, predominantly Aedes aegypti and Aedes albopictus [2]. In Africa, CHIKV occurs in an enzootic, sylvatic cycle involving nonhuman primates as hosts, while in Asia, CHIKV is mainly maintained in an urban cycle with direct human–mosquito–human transmission [3]. The absence of a sylvatic life cycle in Asia suggests that CHIKV originated in Africa and was later carried to Asia [4].

1.1. Geographical Spread of Chikungunya Virus

The first documented outbreak of CHIKV was in 1952 on the Makonde Plateau in the Southern Province of Tanganyika (present-day Tanzania) [5,6]. Since then, CHIKV has been emerging in Africa and Asia, with larger outbreaks in the 1960s and 1990s [7]. In 2004, CHIKV re-emerged in Kenya and in 2005, an outbreak hit the island of La Réunion, which was the first time that CHIKV occurrences were reported in the southwestern Indian Ocean region [8]. The virus, previously assumed to be non-fatal, caused several deaths and also affected neighboring islands including Mayotte, Madagascar, the Seychelles, Comoros, and Mauritius. Following the Indian Ocean islands outbreak, CHIKV spread independently into the Indian Subcontinent and Southeast Asia [9]. In the Indian Ocean basin, CHIKV dissemination has been mediated by several mutations, the most prominent being A226V, an amino acid substitution that changes the protein structure of the membrane fusion glycoprotein E1 [10], resulting in increased transmission by A. albopictus. As A. albopictus is present in temperate regions, this adaptation also has implications for the geographical range of transmission, with CHIKV no longer being bound to tropical and subtropical latitudes [4]. Reported cases of CHIKV in Italy, France, Mexico, and the USA, together with the expanding global distribution of A. albopictus, facilitated by climate change, raise public health concerns worldwide [11,12,13].

Early phylogenetic analyses suggested that CHIKV can be separated into three geographically disjoint lineages: West African (WA), East Central South African (ECSA) and Asian Urban Lineage (AUL) [14]. Following the 2005 La Réunion outbreak, and the subsequent emergence of CHIKV in India, the existence of a fourth separate lineage, termed Indian Ocean Lineage (IOL), has been proposed [15]. While the WA lineage is geographically isolated and shares deep ancestry with the other lineages, investigation of the relationships between taxa belonging to ECSA through phylogenetic inference revealed that this lineage in fact splits into three distinct geographically disjoint epidemic clades [16]: The Middle African Lineage (MAL) gave rise to outbreaks in South America [17,18] and Haiti [19] (South American Lineage, SAL), and the East African Lineage (EAL) gave rise to the IOL. The latter is predominantly found on the Asian continent, except for travel-related cases in which CHIKV has been imported to Europe and North America [3,20,21]. The third ECSA-derived lineage includes the 1953 Tanganyika strain and encompasses a handful of isolates from Africa and Asia in a monophyletic group (Africa/Asia Lineage, AAL). The AUL, conversely, is considered a sister clade to all ECSA-derived lineages and has been circulating in Southeast Asia before spreading into Central America and many South American countries from 2013 [3,22,23]. In 2013, the AUL lineage reached Brazil with the first autochthonous cases being observed in late 2014 [17]. Around the same time span—2013/2014—it emerged in the Caribbean, causing massive spread and nearly three million cases [24]. Since then, AUL has spread to multiple regions in America. 
Interestingly, CHIKV is believed to have been present in the Americas as early as the 1800s, having spread by ship from the Caribbean into North and South America, although at the time it was mistaken for another febrile disease, dengue [25].

1.2. RNA Structure Conservation in Chikungunya Virus Genomes

CHIKV is a small, spherical, enveloped virus with a single-stranded, (+)-sense RNA genome of approximately 11.8 kb [2,26] that contains a 5′ cap structure and a 3′ poly-A tail. The CHIKV genome contains two open reading frames (ORFs) that encode non-structural and structural proteins, respectively, as polyproteins that are post-translationally cleaved [27]. The non-structural proteins (nsP1, nsP2, nsP3 and nsP4) constitute the viral replication machinery and are translated from the full-length genome, while the structural proteins (C, E1, E2, E3) form the virus particles and are produced from a subgenomic messenger RNA. The coding sequence is flanked by structured untranslated regions (UTRs) on both ends, which represent the most variable regions of the CHIKV genome [28]. This divergence manifests in variable 3′ UTR (and thus genome) lengths of individual CHIKV lineages, with ECSA-derived lineages being shorter than WA, and AUL comprising the longest isolates [15]. It is plausible to propose patterns of coupled historic mutation and recombination events that eventually resulted in the plasticity observed in present day CHIKV isolates [29]. Specific patterns and copy numbers of sequence-level direct repeats in the 3′ UTR are characteristic of particular CHIKV lineages and likely represent adaptations of the virus to environmental constraints [30].

Like many other RNA viruses, CHIKV encodes not only viral proteins but also functional RNAs that mediate the viral life cycle. These structured RNAs are found in coding and non-coding regions of the viral genome. A specific fold is often a prerequisite for functional RNAs, and there are selective evolutionary pressures on maintaining these folds, both in coding and non-coding regions [31]. As RNA structure is typically conserved at the level of secondary structures, structural homology can be interpreted as a result of evolutionary forces that act on particular RNAs, requiring them to maintain a critical set of structure-determining base-pair interactions. In nature, this is achieved by compensatory mutations, i.e., those that maintain base-pair complementarity by a combination of two mutations, e.g., AU → GC, or consistent mutations that change only one pairing partner, e.g., AU → GU. While this kind of structural conservation of functional RNAs, which is also known as covariation, is a ubiquitous trait that is found in all domains of life, increased mutation rates render viruses particularly interesting in this context.
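The distinction between compensatory and consistent mutations can be made concrete with a small sketch. The function name `classify_pair_change` and the extra labels "unchanged" and "disruptive" are illustrative assumptions, not terminology from the cited literature; the sketch only considers the six canonical pairs (Watson-Crick plus G-U wobble).

```python
# Canonical RNA base pairs, including the G-U wobble pair.
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_change(old_pair, new_pair):
    """Classify how a base pair differs between two aligned sequences.

    Each pair is a 2-tuple of nucleotides (5' partner, 3' partner).
    """
    if old_pair not in CANONICAL:
        raise ValueError(f"{old_pair} is not a canonical base pair")
    if new_pair == old_pair:
        return "unchanged"
    if new_pair not in CANONICAL:
        return "disruptive"  # complementarity is lost
    changed = sum(a != b for a, b in zip(old_pair, new_pair))
    # One changed partner -> consistent; both changed -> compensatory.
    return "consistent" if changed == 1 else "compensatory"


print(classify_pair_change(("A", "U"), ("G", "C")))  # compensatory
print(classify_pair_change(("A", "U"), ("G", "U")))  # consistent
print(classify_pair_change(("A", "U"), ("A", "C")))  # disruptive
```

Both the compensatory and the consistent case keep the pair intact, which is why they count as evidence for a conserved structure.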

Although CHIKV is one of the best-studied viruses within the genus Alphavirus, knowledge of functional RNA elements and their specific association to viral pathogenesis and replication remains elusive. Unlike other RNA viruses that are characterized by structural conservation of a critical amount of functional RNAs in their UTRs, such as flaviviruses [32] or coronaviruses [33], genus-wide RNA structure conservation does not appear to be prevalent in alphaviruses. In this line, evidence for pervasive RNA structure conservation has not been observed among Sindbis virus (SINV), Venezuelan equine encephalitis virus (VEEV), and CHIKV [34], probably because recognized RNA structure motifs, such as packaging signals, are found at divergent genomic locations in different alphaviruses. However, the apparent absence of covariation patterns among alphavirus species does not exclude the ability of individual species to form highly stable, functional structures. This has been recently affirmed in a genome-wide RNA structure probing study that could confirm known functional RNAs in CHIKV by SHAPE-MaP and characterize several highly structured, potentially functional RNA elements [35].

Likewise, the association between primary sequence and secondary structure in the terminal regions of CHIKV genomes has raised considerable research interest over the last years, mainly motivated by the observation that different CHIKV lineages maintain variable-length 3′ UTRs that comprise specific patterns of sequence repeats. While earlier studies identified sequence repeat patterns in the 3′ UTR of several alphaviruses [36], recombination by copy-choice mechanisms has been proposed to accelerate CHIKV adaptability, resulting in novel 3′ UTR variants [30]. In a recent study, we proposed an unambiguous association of sequence repeats in the 3′ UTRs of different CHIKV lineages with evolutionarily conserved, structured and unstructured RNA elements [16]. Current knowledge about the lack of functional conservation in alphaviruses suggests that potentially functional RNA elements evolved independently in each viral species.

An aspect related to the formation and specificity of RNA structure is the effect of single nucleotide variants (SNVs). These are mutations that can alter the RNA structural ensemble by changing the base-pairing pattern, potentially resulting in an alternative fold and disrupted functionality. SNVs are sometimes associated with so-called riboSNitches, i.e., RNA elements that are subject to perturbation of the structural ensemble resulting in large conformational changes [37,38]. Examples where such events can lead to disease phenotypes in humans have been described in the literature [39,40,41,42].

1.3. Molecular Epidemiology Reveals RNA Structure-Affecting SNVs

With the availability of large amounts of next-generation sequencing data in public databases, multiple efforts to analyze and visualize the spread of infectious diseases have been made [43,44,45]. Nextstrain (https://www.nextstrain.org (accessed on 20 January 2021)) [46] is an open source project that makes pathogen phylogenetic data easily accessible to researchers and the interested public, thus facilitating research efforts in the field of pathogen evolution and epidemiology. Nextstrain allows users to set up so-called community builds that employ the Nextstrain software stack to construct custom real-time phylodynamics resources. The Nextstrain community builds have become increasingly popular, and were used, for example, to highlight the genomic epidemiology of the 2018–2020 Ebola virus outbreak in DRC [47], and showcase the mutational dynamics of SARS-CoV-2 superspreading events in Austria [48]. We report here the availability of a custom Nextstrain build for CHIKV that encompasses 924 publicly available genomes.

To better understand the evolutionary traits associated with functional RNA conservation among different lineages, we set out to use the CHIKV molecular epidemiology data to study the impact of lineage-associated sequence variability on viral RNA structure. We were particularly interested in the structural divergence induced by fixed SNVs that are specific to particular lineages, and predict the existence of more than a dozen locally stable RNA elements in the coding regions of the CHIKV genome, whose structural ensemble is substantially altered by lineage-associated SNVs.

2. Materials and Methods

2.1. Taxon Selection

We downloaded viral genome and annotation data from the public National Center for Biotechnology Information (NCBI) Genbank database [49] on 23 October 2020. We compiled all temporal and geographic metadata available in these genome records to create data sets for building the Nextstrain instance. Eight sequences of the data set were removed due to missing geographic location metadata or designation as a vaccine sequence. Metadata related to geographic location was associated with the United Nations geoscheme for consistency and labeled in the same way as in de Bernardi Schneider et al. [16]. For the temporal analysis, we identified two sequences with erroneous sampling dates in the Genbank record. Upon contacting the corresponding authors, we were able to correct the dates of the NCBI accessions KX262991.1 and KY435454.1 to 2013 and 2014, respectively.

2.2. Genetic Distance

To calculate the genetic distance within and between Chikungunya lineages, we estimated the evolutionary divergence over sequence pairs between groups as implemented in MEGA X under default settings [50,51]. The analyses were conducted using the Maximum Composite Likelihood model [52].

2.3. CHIKV Nextstrain

We employed the workflow management system Snakemake [53] to build a pipeline for rapid deployment and reproducibility of the CHIKV Nextstrain build. In the first step of the Snakemake workflow, we retrieved metadata including collection date, country, and place of isolation (if available) from the Genbank records of all available CHIKV isolates. Each country was then assigned to one of the following regions: South East Asia, East Asia, South Asia, West Asia, Caribbean, Northern America, Central America, Southern America, Europe, Oceania, Eastern Africa, Middle Africa, Southern Africa, or Western Africa. In the next step, a file with lineage association for each isolate was created. Isolates with unknown lineage association were assigned to lineages iteratively via the time-resolved phylogenetic tree produced by Nextstrain. The standard augur pipeline was then applied to construct all relevant data for Nextstrain visualization [46].
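The country-to-region step of the workflow can be sketched as a simple lookup. This is a minimal illustration, not the authors' pipeline code: the region names are those listed above, while the `COUNTRY_TO_REGION` table covers only a handful of countries mentioned in this article and `assign_region` is a hypothetical helper name.

```python
# Region labels as listed in the text.
REGIONS = {
    "South East Asia", "East Asia", "South Asia", "West Asia", "Caribbean",
    "Northern America", "Central America", "Southern America", "Europe",
    "Oceania", "Eastern Africa", "Middle Africa", "Southern Africa",
    "Western Africa",
}

# Illustrative excerpt only; a real build needs every country in the metadata.
COUNTRY_TO_REGION = {
    "Tanzania": "Eastern Africa",
    "Kenya": "Eastern Africa",
    "India": "South Asia",
    "Brazil": "Southern America",
    "Haiti": "Caribbean",
    "Mexico": "Central America",
    "Italy": "Europe",
    "France": "Europe",
    "USA": "Northern America",
}

# Guard against typos in the lookup table.
assert set(COUNTRY_TO_REGION.values()) <= REGIONS


def assign_region(country):
    """Return the region for a country, failing loudly on unknown names."""
    try:
        return COUNTRY_TO_REGION[country]
    except KeyError:
        raise KeyError(f"no region assigned for country {country!r}") from None


print(assign_region("Brazil"))  # Southern America
```

Failing on unknown countries, rather than silently assigning a default, makes gaps in the metadata visible before the tree is built.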

2.4. RNA Structure Modulation via Lineage-Associated SNVs

For characterizing SNVs that affect CHIKV RNA structure formation, we performed local RNA structure prediction with RNALfold [54] in the reference strain of our Nextstrain build (KT327163.2), limited to sequence lengths of 150 nt and filtered for thermodynamically stable structures. We required a free energy z-score of −2 or lower when comparing to 1000 dinucleotide shuffled sequences of the same nucleotide composition, resulting in 138 locally stable candidate structures spread throughout the CHIKV genome. These were then intersected with Nextstrain genome diversity data, yielding a set of 759 candidate RNAs that overlap one or more variable sites of the CHIKV genome. For each candidate RNA, we computed the minimum free energy (MFE) of the non-mutated wild-type (WT) sequence as well as MFEs of all SNV mutants, assessed the base-pair distance between WT and mutant MFE structures with RNAdistance from the ViennaRNA Package [55], employing a base-pair distance cutoff of 15, and filtered for variants that show (almost) complete fixation in one or more lineages. This yielded 14 locally stable RNA elements of the reference strain that overlap a total of 16 lineage-associated SNVs, as listed in Table 3. Each SNV was then evaluated for its potential to alter the thermodynamic ensemble of RNA structures with the MutaRNA web server [56] using default parameters.
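The z-score filter can be illustrated with a short sketch. It assumes that folding energies for the native window and for its shuffled controls have already been computed elsewhere (for instance with a folding program), and it reads the stability criterion as z ≤ −2. The function names and the toy energies are hypothetical, and the sample standard deviation is one of several reasonable choices.

```python
from statistics import mean, stdev


def free_energy_zscore(native_mfe, shuffled_mfes):
    """z-score of a native folding energy against shuffled-sequence controls.

    Energies are in kcal/mol; more negative z means the native sequence
    folds more stably than expected for its composition.
    """
    if len(shuffled_mfes) < 2:
        raise ValueError("need at least two shuffled controls")
    sigma = stdev(shuffled_mfes)  # sample standard deviation
    if sigma == 0:
        raise ValueError("shuffled energies have zero variance")
    return (native_mfe - mean(shuffled_mfes)) / sigma


def is_locally_stable(native_mfe, shuffled_mfes, cutoff=-2.0):
    """Apply the stability filter, read here as z <= cutoff (an assumption)."""
    return free_energy_zscore(native_mfe, shuffled_mfes) <= cutoff


# Toy numbers for illustration only, not values from the study.
controls = [-10.0, -12.0, -14.0]
print(free_energy_zscore(-18.0, controls))
print(is_locally_stable(-18.0, controls))
print(is_locally_stable(-13.0, controls))
```

In practice the control set would hold 1000 energies per window, as stated in the text, rather than three.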

2.5. Data Availability

The CHIKV Nextstrain instance is available at https://nextstrain.org/community/ViennaRNA/CHIKV (accessed on 20 January 2021). The data can be downloaded by scrolling down to the bottom of the page and clicking on the “Download Data” link.

3. Results

3.1. Genetic Distance between Chikungunya Virus Lineages

Nucleotide divergence analyses based on the average number of base substitutions per site can highlight the proximity among groups of taxa. To get an updated view of the distances within and between CHIKV lineages at the level of nucleotides, we computed the average evolutionary divergence over sequence pairs. Our results show that the nucleotide divergence within each individual lineage is relatively low (at most 0.013), with an average of 0.006 (Appendix A, Table A1). The evolutionary divergence between lineages has an average value of 0.073 (Table 1). The WA lineage has a nucleotide divergence to the other lineages that ranges from 0.17 to 0.19, showing the highest divergence to all other lineages. Additionally, the major clades encompassing AUL and AUL-Am on one side, and EAL, IOL, MAL, and SAL on the other side, present a nucleotide divergence ranging from 0.065 to 0.069 between each other. While the genetic divergence between these major clades confirms genotypes that have previously been described in the literature as well-defined lineages, i.e., WA, ECSA, and AUL, lower divergence can be observed between AUL-Am and AUL (0.011), as well as between SAL, MAL, EAL, IOL, AAL, sECSA, ranging from 0.006 to 0.031.
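To make "average divergence over sequence pairs" concrete, the following sketch averages pairwise distances within one group and between two groups of aligned sequences. It deliberately swaps in the uncorrected p-distance (fraction of differing sites) for the Maximum Composite Likelihood model used in the study, so its numbers would not reproduce the reported values. The function names and toy sequences are assumptions.

```python
from itertools import combinations, product


def p_distance(a, b):
    """Fraction of differing sites, ignoring gaps and ambiguous N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def mean_within(group):
    """Average distance over all sequence pairs inside one group."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group1, group2):
    """Average distance over all pairs drawn from two different groups."""
    pairs = list(product(group1, group2))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment for illustration only.
lineage_a = ["ACGUACGU", "ACGUACGA"]
lineage_b = ["UCGUACGU"]
print(mean_within(lineage_a))
print(mean_between(lineage_a, lineage_b))
```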

3.2. A Nextstrain Build for Chikungunya Virus

In an attempt to provide a publicly available epidemiological and phylogeographical interactive visualization of CHIKV spread, we created a custom Nextstrain build that encompasses all currently available CHIKV genome data. Our community build is available via https://nextstrain.org/community/ViennaRNA/CHIKV (accessed on 20 January 2021) and comprises 924 genomes, which represents a substantial increase compared to the 590 genomes considered in the previous most comprehensive study of CHIKV phylogeny [16]. The Nextstrain phylogeny is based on a maximum-likelihood tree, which is used to infer a timed tree with TreeTime [57] (Figure 1), making available the time of the most recent common ancestor (TMRCA) associated with each individual lineage/major clade of interest (Table 2).

The CHIKV Nextstrain build allows the user to filter data and change the visualization according to preferences, using controls such as date ranges, multiple tree layouts, feature filters and colors. Besides visualizing phylogeography-related characteristics, Nextstrain provides information about the diversity of the underlying sequence data as normalized Shannon entropy in a separate diversity panel. The analysis of nucleotide divergence within the full genome sequences available on the Nextstrain build enabled us to explore lineage-associated genomic variants that change the stability and fold of locally stable RNA structures in silico.
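As an illustration of the diversity measure, per-site Shannon entropy can be computed from alignment columns as sketched below. The normalization by the logarithm of the alphabet size (so that values fall between 0 and 1) is an assumption made here for illustration; the exact normalization used by Nextstrain is not stated in this article. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column, alphabet="ACGU"):
    """Shannon entropy of one alignment column, scaled to the range 0..1.

    Characters outside the alphabet (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(c for c in column if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    bits = sum(-(n / total) * log2(n / total) for n in counts.values())
    return bits / log2(len(alphabet))


def alignment_entropy(sequences):
    """Entropy for every column of equally long aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment: column 1 is invariant, column 2 splits evenly in two,
# column 3 uses all four nucleotides.
toy = ["AAA", "AAC", "ACG", "ACU"]
print(alignment_entropy(toy))
```

Invariant sites score 0 and maximally variable sites score 1, which is what makes the panel useful for spotting variable positions such as lineage-associated SNVs.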

3.3. Lineage-Specific RNA Structures

To better understand within-species RNA genotype–phenotype associations in viruses, we set out to assess the impact of lineage-associated, evolutionarily fixed SNVs on RNA structure formation in CHIKV. To this end, we performed local RNA secondary structure prediction in the reference strain of our Nextstrain build (KT327163.2, clustering with the AUL-Am lineage), aiming at characterizing structural elements that show increased thermodynamic stability, as expressed by z-score statistics. We intersected loci that fold into locally stable RNA structures with genome diversity data from Nextstrain to obtain regions of the CHIKV genome that both fold into stable RNA structures and contain one or more single-nucleotide mutations. Each SNV in this set was then characterized in terms of geographic spread and association to specific CHIKV lineages, as well as the base-pair distance between the non-mutated wild-type MFE structure and the mutant MFE structure. Using base-pair distance to pre-filter variants that induce a substantial change in the global fold of the locally stable RNAs, we identified 12 candidate RNAs within the CHIKV coding regions that overlap one SNV, and two candidates that overlap two SNVs each. In total, we have 14 candidate RNAs and 16 fixed SNVs, as shown in Figure 2. Details, such as the genomic location and thermodynamic stability of the candidate RNAs, as well as the lineage-association of SNVs are listed in Table 3.
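The base-pair distance used as a pre-filter can be sketched in a few lines. The code below uses the standard definition (the number of base pairs present in exactly one of two structures) on dot-bracket strings without pseudoknots. Whether this matches the exact program settings used in the study is an assumption, and the function names and toy structures are illustrative.

```python
def pair_set(dot_bracket):
    """Parse a pseudoknot-free dot-bracket string into a set of (i, j) pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(wild_type, mutant):
    """Number of base pairs found in exactly one of the two structures."""
    if len(wild_type) != len(mutant):
        raise ValueError("structures must have equal length")
    return len(pair_set(wild_type) ^ pair_set(mutant))


# Toy structures: shifting a two-pair helix by one position breaks two
# pairs and forms two new ones.
print(base_pair_distance("((...))...", ".((...)).."))  # 4
```

Because an SNV does not change sequence length, wild-type and mutant structures can be compared position by position without alignment.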

For each candidate RNA, we quantified the effect of mutation-induced changes on the RNA structure ensemble with the MutaRNA webserver [56]. In addition to comparing the characteristics of wild-type and mutant MFE structures, we assessed the impact of lineage-associated mutations on the entire thermodynamic ensemble of RNA structures by partition function folding. (Due to the selection of our reference strain, ‘wild-type’ refers to isolate KT327163.2, whereas ‘mutant’ refers to the respective SNV in all candidates listed in Table 3.)

We selected two examples that exhibit interesting structural traits: The first example is a variant at nucleotide position 1653 (Figure 3), which has an adenine (A, wild-type) in the majority of AUL and AUL-Am isolates, while a guanine (G, mutant) is found at this position in all other lineages. The A1653G variant overlaps a locally stable RNA of 81 nt, whose wild-type sequence folds into a bulged stem-loop structure with an MFE of −23.30 kcal/mol. The mutant folds into a three-way junction structure with an MFE of −28.10 kcal/mol and an equilibrium frequency of 0.3 in the thermodynamic ensemble, suggesting that the mutant is thermodynamically more stable than the wild-type variant. The base pairing potential of wild-type and mutant sequences is depicted as heat-map dot-plots as well as circular plots in Figure 3. These plots demonstrate the differences in base pairing patterns, highlighting that the SNV considerably modulates the fold of the RNA, thereby enabling the formation of alternatively stacked regions in the mutant that result in thermodynamic stabilization.
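The equilibrium frequency quoted alongside an MFE follows from Boltzmann statistics: a structure with free energy E occurs with probability exp(−(E − G)/RT), where G is the free energy of the whole ensemble. The sketch below implements this relation and its inverse. The temperature of 37 °C and the helper names are assumptions, since the folding temperature is not stated in the text.

```python
from math import exp, log

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)


def thermal_energy(temp_celsius=37.0):
    """RT in kcal/mol at the given temperature."""
    return GAS_CONSTANT * (temp_celsius + 273.15)


def equilibrium_frequency(structure_energy, ensemble_energy, temp_celsius=37.0):
    """Boltzmann probability of one structure within the ensemble."""
    rt = thermal_energy(temp_celsius)
    return exp(-(structure_energy - ensemble_energy) / rt)


def ensemble_energy_from_frequency(structure_energy, frequency, temp_celsius=37.0):
    """Invert the relation: ensemble free energy implied by a frequency."""
    if not 0.0 < frequency <= 1.0:
        raise ValueError("frequency must be in (0, 1]")
    return structure_energy + thermal_energy(temp_celsius) * log(frequency)


# Round trip with an MFE of -28.10 kcal/mol and a frequency of 0.3.
g = ensemble_energy_from_frequency(-28.10, 0.3)
print(g, equilibrium_frequency(-28.10, g))
```

A frequency well below 1 means the MFE structure shares the ensemble with many alternatives, which is why the comparison of wild-type and mutant considers the full ensemble and not only the MFE fold.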

Another example encompasses multiple lineage-specific mutations at position 11,246, as depicted in Figure 4. While the wild-type (comprising AUL, AUL-Am, AAL, MAL and SAL) has an A at this position, two other variants are clearly associated with different lineages: EAL and IOL have a cytosine (C) and WA has a uracil (U) at position 11,246. While these variants result in a synonymous mutation at the amino acid level, they induce substantial changes for RNA structure formation. Position 11,246 overlaps an RNA element of 80 nt with a wild-type MFE structure of −29.30 kcal/mol and equilibrium frequency of 0.2156. The A11246C and A11246U variants are less stable, with MFE structures of −26.90 kcal/mol (equilibrium frequency 0.0650) and −27.70 kcal/mol (equilibrium frequency 0.1133), respectively. Intriguingly, all variants show varied base pairing patterns in the thermodynamic ensemble, where only nine stacked base pairs of the closing stem are predicted with high probability.
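Lineage association of a variant, as described for position 11,246, amounts to checking which nucleotide is (nearly) fixed within each lineage. The following sketch shows one way to do this on aligned sequences. The 0.95 threshold for "almost complete fixation", the function names, and the three-nucleotide toy records are assumptions for illustration and not data from the study.

```python
from collections import Counter, defaultdict


def allele_frequencies_by_lineage(records, position):
    """Per-lineage nucleotide frequencies at a 1-based alignment position.

    `records` is an iterable of (lineage, aligned_sequence) tuples.
    """
    counts = defaultdict(Counter)
    for lineage, sequence in records:
        counts[lineage][sequence[position - 1]] += 1
    return {
        lineage: {nt: n / sum(c.values()) for nt, n in c.items()}
        for lineage, c in counts.items()
    }


def fixed_alleles(records, position, threshold=0.95):
    """Lineages whose most common nucleotide reaches the fixation threshold."""
    fixed = {}
    for lineage, freqs in allele_frequencies_by_lineage(records, position).items():
        nucleotide, frequency = max(freqs.items(), key=lambda item: item[1])
        if frequency >= threshold:
            fixed[lineage] = nucleotide
    return fixed


# Toy records: the lineage labels follow the text, the sequences are made up.
toy_records = [
    ("AUL", "GGA"), ("AUL", "GGA"),
    ("IOL", "GGC"), ("IOL", "GGC"),
    ("WA", "GGU"),
    ("MAL", "GGA"), ("MAL", "GGC"),  # mixed, so not fixed in this toy set
]
print(fixed_alleles(toy_records, position=3))
```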

An interesting observation relates to a C-U mutation at position 10,651, which is responsible for the E1-A226V mutation that has been associated with increased CHIKV transmissibility by A. albopictus. Although the C10651U mutation overlaps a locally stable region (positions 10,594–10,682 of the CHIKV genome, z = −3.73), the overall fold of this structure is not altered by the mutation. This suggests that the biological effect is mediated at the protein level rather than at the RNA level.
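Why a single C-to-U change yields an alanine-to-valine substitution can be checked against the standard genetic code: all alanine codons are GCN and all valine codons are GUN, so the change must sit at the second codon position. The sketch below classifies an SNV within a codon as synonymous or non-synonymous. The helper name `snv_effect` is hypothetical, and the third codon base used in the example is a placeholder, since the actual codon is not given in the text.

```python
# Standard genetic code in the conventional compact layout (U, C, A, G order).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snv_effect(codon, offset, new_base):
    """Return (old amino acid, new amino acid, kind) for an SNV in a codon.

    `offset` is the 0-based position within the codon; T is read as U.
    """
    codon = codon.upper().replace("T", "U")
    new_base = new_base.upper().replace("T", "U")
    if len(codon) != 3 or offset not in (0, 1, 2):
        raise ValueError("expected a 3-nt codon and an offset of 0, 1 or 2")
    mutant = codon[:offset] + new_base + codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    kind = "synonymous" if old_aa == new_aa else "non-synonymous"
    return old_aa, new_aa, kind


# Second-position C -> U in any alanine codon gives valine (A226V-like change).
print(snv_effect("GCA", 1, "U"))  # ('A', 'V', 'non-synonymous')
# A third-position change in the same codon leaves alanine unchanged.
print(snv_effect("GCA", 2, "C"))  # ('A', 'A', 'synonymous')
```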

4. Discussion

In this contribution, we address the question as to what can be learned about CHIKV genotype/phenotype associations by comparative approaches, bringing together different concepts of molecular epidemiology, phylogeny reconstruction, and computational RNA biology. To this end, we build on the Nextstrain [46] framework to provide an interactive phylodynamics resource of CHIKV that reveals spatiotemporal and epidemiological facets of global virus dissemination. Moreover, we employ established tools for RNA structure prediction based on the ViennaRNA [55] Package to infer lineage-specific structural traits.

While the nucleotide divergence within the observed CHIKV lineages is relatively low, this can be explained by the geographical constraint and the reduced collection period for novel lineages. Conversely, the divergence between lineages varies considerably. The highest divergence of the West African lineage when compared to the other CHIKV lineages can be explained by the observation that WA has been an isolated lineage that emerged decades before the more recent outbreaks [14].

However, the divergence between major clades and geographically delineated lineages does raise the question of whether the current nomenclature, as canonized by experts from the field when the lineages were first identified, is misleading. IOL is an example of a well-established lineage in the literature that presents low divergence from its African origin, but it does introduce the A226V mutation in E1, which mediates the capacity of the virus to replicate in A. albopictus [59].

Most viral genotypes/subgenotypes/subclades are based on whole-genome nucleotide divergence of a specific percentage, usually determined by phylogeny showing clades/lineages of a defined ‘high’ statistical support (>70% bootstrap) [60,61]. Hepatitis B viruses, for example, are classified into genotypes and subgenotypes based on their monophyly, amino acid signatures, and genetic distance [62]. In HIV, major distinct clades are classified as major genetic groups, with multiple subtypes within the genetic groups [63].

In the most recent comprehensive phylogeny of CHIKV to date, de Bernardi Schneider et al. [16] analyzed the three major lineages of CHIKV, AUL, IOL, and ECSA. These lineages were broken down based on their monophyly and geographic predominance. Here, we can see that although these strains can still be classified into distinct groups, there should be definite layers, such as lineages/genotypes and sublineages/subgenotypes.

Although we were not aiming to discuss a reclassification of CHIKV into a system independent of geography, we feel compelled to point out that a new coherent system should replace the current taxa classification. Such a new system could assist drug and vaccine development researchers to target specific genotypes or subgenotypes.

In an epidemiological context, the current lineage system, or, respectively, the way the strains are currently grouped, allows the inspection of major outbreak instances and the use of TMRCAs as a way to gauge when a lineage was introduced into a region, causing outbreaks. From this perspective, our Nextstrain instance provides a reasonable amount of data to conduct a more in-depth investigation of the outbreaks that have recently occurred. Importantly, the calculated TMRCA of the major clades is in accordance with previous studies [15]. While local CHIKV transmission had not been identified in the Americas before December 2013 [24], our results suggest that the times of the most recent common ancestors of sequences in SAL and AUL-Am were 2011 and 2008, respectively. This result emphasizes the importance of increased surveillance, to identify the virus at the time of introduction, rather than at the time of a pandemic [64]. The consistency of the dates between the Nextstrain TreeTime calculations and previously described TMRCAs in the literature also supports the reliability of this additional information on the CHIKV Nextstrain instance.

Molecular epidemiology provides a detailed picture of the geographical spread and fixation of RNA variants in viruses and can be used in combination with in silico RNA structure prediction to study the structural divergence of different lineages. Owing to the error-prone replication machinery inherent to many RNA viruses, SNVs are created continuously and represent the constitutive driving force behind viral quasispecies [65]. Although most of these mutations are considered neutral in an evolutionary context [66], a detailed understanding of the impact and functional associations of lineage-specific SNVs in viruses remains elusive. Importantly, both thermodynamic stabilization and destabilization of the RNA can have an effect, such as making the RNA accessible for protein binding. Single nucleotide mutations can lead to non-synonymous mutations at the amino acid level that result in potentially different protein function. Likewise, nucleotide mutations that culminate in synonymous mutations still have the capacity to alter RNA structure formation, leading to RNA phenotypes that can influence, e.g., co-translational protein folding efficiency and thereby mediate viral gene expression patterns [67].

We asked whether Nextstrain can be utilized to infer novel insight into RNA structure-association of individual clades. As Nextstrain is particularly convenient for discriminating characteristics of viral clades/lineages, we set out to expand the sequence-centric approach to computation of lineage-associated structural traits. By combining Nextstrain genome diversity data with RNA structure prediction methods, we could associate sequence variants observed in different CHIKV lineages with alternative RNA structure formation. Although we cannot associate lineage-specific SNVs with particular biological functionality at this point, the fixation of nucleotide mutations in certain CHIKV lineages suggests that they are either neutral or they confer an adaptive advantage in an evolutionary setting.

Building on a thermodynamic model for RNA structure formation, we present here 14 RNA candidates in the CHIKV coding region that exhibit an alternative fold upon mutation of individual nucleotides. While we were focusing on mutations that induce the most obvious changes to the RNA structural ensemble, there are many more lineage-specific mutations that may have only a subtle effect on local RNA folding or may be involved in long-range RNA structure formation. Importantly, phenotypic traits result from the complex interplay of all lineage-specific mutations, and in this line, our approach can help by defining target RNAs for experimental verification.

In summary, we show here that the combination of molecular epidemiology data with RNA structure prediction can help to gain insight into hitherto unresolved aspects of genotype–phenotype associations within viral species. On a broader scale, specifically when applied to different viruses, this can augment our understanding of RNA structure evolution.

Author Contributions

Conceptualization, A.d.B.S. and M.T.W.; methodology, A.d.B.S. and M.T.W.; software, T.S., M.D. and M.T.W.; validation, A.d.B.S. and M.T.W.; formal analysis, A.d.B.S. and M.T.W.; investigation, A.d.B.S., M.D., T.S. and M.T.W.; resources, M.T.W.; Data curation, A.d.B.S., M.D., T.S. and M.T.W.; writing—original draft preparation, M.D., T.S., A.d.B.S. and M.T.W.; writing—review and editing, A.d.B.S. and M.T.W.; visualization, M.D. and T.S.; supervision, A.d.B.S. and M.T.W.; Project administration, M.T.W.; funding acquisition, A.d.B.S. and M.T.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (grant number AI135992) to A.d.B.S. Open Access Funding by the University of Vienna.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The CHIKV Nextstrain build is available at https://nextstrain.org/community/ViennaRNA/CHIKV (accessed on 20 January 2021).

Acknowledgments

Open Access Funding by University of Vienna.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Estimates of average evolutionary divergence over sequence pairs within Chikungunya virus lineages. The number of base substitutions per site, averaged over all sequence pairs within each group, is shown.

Lineage	Divergence
AUL-Am	0.0012
AUL	0.0128
SAL	0.003
MAL	0.0107
IOL	0.0062
EAL	0.0011
AAL	0.0066
WA	0.0102
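The idea of averaging a distance over all sequence pairs within a group can be sketched in a few lines of Python. Note that this sketch deliberately swaps in the uncorrected p-distance (fraction of differing sites) as a simple stand-in for a model-based estimate, so it would not reproduce the values in Table A1. The function names and toy sequences are hypothetical, and the input is assumed to be a set of aligned sequences of equal length.

```python
from itertools import combinations
from typing import Sequence

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring gaps and ambiguous characters."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in NUCLEOTIDES and y in NUCLEOTIDES:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def mean_within_group(seqs: Sequence[str]) -> float:
    """Average p-distance over all unordered sequence pairs in one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (not CHIKV data): one of three sequences differs at one site.
group = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(round(mean_within_group(group), 4))
```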

Figure A1. Nextstrain time-resolved maximum likelihood phylogeny, highlighting a subtree that comprises isolates from the Asian Urban (orange) and Asian Urban-American (yellow) lineages. Inferred TMRCA of the shown subtree is July 2007 (confidence interval November 2006–March 2008).

References

1. Chandak, N.; Kashyap, R.; Kabra, D.; Karandikar, P.; Saha, S.; Morey, S.; Purohit, H.; Taori, G.; Daginawala, H. Neurological complications of Chikungunya virus infection. (Original Article) (Clinical report). Neurol. India 2009, 57, 177.
2. Forrester, N.; Palacios, G.; Tesh, R.; Savji, N.; Guzman, H.; Sherman, M.; Weaver, S.; Lipkin, W. Genome-scale phylogeny of the alphavirus genus suggests a marine origin. J. Virol. 2011, 86, 2729–2738.
3. Weaver, S.C.; Forrester, N.L. Chikungunya: Evolutionary history and recent epidemic spread. Antivir. Res. 2015, 120, 32–39.
4. Her, Z.; Kam, Y.W.; Lin, R.T.; Ng, L.F. Chikungunya: A bending reality. Microbes Infect. 2009, 11, 1165–1176.
5. Robinson, M.C. An Epidemic Of Virus Disease In Southern Province, Tanganyika Territory. Trans. R. Soc. Trop. Med. Hyg. 1955, 49, 28–32.
6. Ross, R. The Newala epidemic: III. The virus: Isolation, pathogenic properties and relationship to the epidemic. Epidemiol. Infect. 1956, 54, 177–191.
7. Pialoux, G.; Gaüzère, B.A.; Jauréguiberry, S.; Strobel, M. Chikungunya, an epidemic arbovirosis. Lancet Infect. Dis. 2007, 7, 319–327.
8. Renault, P.; Solet, J.L.; Sissoko, D.; Balleydier, E.; Larrieu, S.; Filleul, L.; Lassalle, C.; Thiria, J.; Rachou, E.; De Valk, H.; et al. A major epidemic of chikungunya virus infection on Réunion Island, France, 2005–2006. Am. J. Trop. Med. Hyg. 2007, 77, 727–731.
9. Tsetsarkin, K.A.; Chen, R.; Weaver, S.C. Interspecies transmission and chikungunya virus emergence. Curr. Opin. Virol. 2016, 16, 143–150.
10. Schuffenecker, I.; Iteman, I.; Michault, A.; Murri, S.; Frangeul, L.; Vaney, M.C.; Lavenir, R.; Pardigon, N.; Reynes, J.M.; Pettinelli, F.; et al. Genome Microevolution of Chikungunya Viruses Causing the Indian Ocean Outbreak (Chikungunya Virus Genome Microevolution). PLoS Med. 2006, 3, e263.
11. Centers for Disease Control and Prevention. Countries and Territories Where Chikungunya Cases Have Been Reported (as of 17 September 2019). 2019. Available online: https://www.cdc.gov/chikungunya/geo (accessed on 8 June 2020).
12. Delatte, H.; Desvars, A.; Bouetard, A.; Bord, S.; Gimonneau, G.; Vourc, G.; Fontenille, D. Blood-feeding behavior of Aedes albopictus, a Vector of Chikungunya on la Reunion. (Report). Vector-Borne Zoonotic Dis. 2010, 10, 249.
13. Rochlin, I.; Ninivaggi, D.V.; Hutchinson, M.L.; Farajollahi, A. Climate change and range expansion of the Asian tiger mosquito (Aedes albopictus) in Northeastern USA: Implications for public health practitioners. PLoS ONE 2013, 8, e60874.
14. Powers, A.M.; Brault, A.C.; Tesh, R.B.; Weaver, S.C. Re-emergence of Chikungunya and O’nyong-nyong viruses: Evidence for distinct geographical lineages and distant evolutionary relationships. J. Gen. Virol. 2000, 81, 471–479.
15. Volk, S.M.; Chen, R.; Tsetsarkin, K.A.; Adams, A.P.; Garcia, T.I.; Sall, A.A.; Nasar, F.; Schuh, A.J.; Holmes, E.C.; Higgs, S.; et al. Genome-Scale Phylogenetic Analyses of Chikungunya Virus Reveal Independent Emergences of Recent Epidemics and Various Evolutionary Rates. J. Virol. 2010, 84, 6497–6504.
16. De Bernardi Schneider, A.; Ochsenreiter, R.; Hostager, R.; Hofacker, I.L.; Janies, D.; Wolfinger, M.T. Updated Phylogeny of Chikungunya Virus Suggests Lineage-Specific RNA Architecture. Viruses 2019, 11, 798.
17. Nunes, M.R.T.; Faria, N.R.; de Vasconcelos, J.M.; Golding, N.; Kraemer, M.U.; de Oliveira, L.F.; da Silva Azevedo, R.d.S.; da Silva, D.E.A.; da Silva, E.V.P.; da Silva, S.P.; et al. Emergence and potential for spread of Chikungunya virus in Brazil. BMC Med. 2015, 13, 102.
18. Teixeira, M.G.; Andrade, A.M.; Maria da Conceição, N.C.; Castro, J.S.; Oliveira, F.L.; Goes, C.S.; Maia, M.; Santana, E.B.; Nunes, B.T.; Vasconcelos, P.F. East/Central/South African genotype Chikungunya virus, Brazil, 2014. Emerg. Infect. Dis. 2015, 21, 906.
19. White, S.K.; Mavian, C.; Salemi, M.; Morris, J.G., Jr.; Elbadry, M.A.; Okech, B.A.; Lednicky, J.A.; Dunford, J.C. A new “American” subgroup of African-lineage Chikungunya virus detected in and isolated from mosquitoes collected in Haiti, 2016. PLoS ONE 2018, 13, e0196857.
20. Rezza, G.; Nicoletti, L.; Angelini, R.; Romi, R.; Finarelli, A.; Panning, M.; Cordioli, P.; Fortuna, C.; Boros, S.; Magurano, F.; et al. Infection with Chikungunya virus in Italy: An outbreak in a temperate region. Lancet 2007, 370, 1840–1846.
21. Lanciotti, R.S.; Kosoy, O.L.; Laven, J.J.; Panella, A.J.; Velez, J.O.; Lambert, A.J.; Campbell, G.L. Chikungunya virus in US travelers returning from India, 2006. Emerg. Infect. Dis. 2007, 13, 764.
22. De Lamballerie, X.; Leroy, E.; Charrel, R.; Tsetsarkin, K.; Higgs, S.; Gould, E. Chikungunya virus adapts to tiger mosquito via evolutionary convergence: A sign of things to come? Virol. J. 2008, 5, 33.
23. Zeller, H.; Van Bortel, W.; Sudre, B. Chikungunya: Its history in Africa and Asia and its spread to new regions in 2013–2014. J. Infect. Dis. 2016, 214, S436–S440.
24. Yactayo, S.; Staples, J.E.; Millot, V.; Cibrelus, L.; Ramon-Pardo, P. Epidemiology of Chikungunya in the Americas. J. Infect. Dis. 2016, 214, S441–S445.
25. Halstead, S.B. Reappearance of chikungunya, formerly called dengue, in the Americas. Emerg. Infect. Dis. 2015, 21, 557.
26. Strauss, J.H.; Strauss, E.G. The Alphaviruses: Gene Expression, Replication, and Evolution. Microbiol. Mol. Biol. R. 1994, 58, 491–562.
27. Li, X.F.; Jiang, T.; Deng, Y.Q.; Zhao, H.; Yu, X.D.; Ye, Q.; Wang, H.J.; Zhu, S.Y.; Zhang, F.C.; Qin, E.D.; et al. Complete genome sequence of a Chikungunya virus isolated in Guangdong, China. J. Virol. 2012, 86, 8904–8905.
28. Hyde, J.L.; Chen, R.; Trobaugh, D.W.; Diamond, M.S.; Weaver, S.C.; Klimstra, W.B.; Wilusz, J. The 5’ and 3’ ends of alphavirus RNAs–non-coding is not non-functional. Virus Res. 2015, 206, 99–107.
29. Chen, R.; Wang, E.; Tsetsarkin, K.A.; Weaver, S.C. Chikungunya virus 3’ untranslated region: Adaptation to mosquitoes and a population bottleneck as major evolutionary forces. PLoS Pathog. 2013, 9, e1003591.
30. Filomatori, C.V.; Bardossy, E.S.; Merwaiss, F.; Suzuki, Y.; Henrion, A.; Saleh, M.C.; Alvarez, D.E. RNA recombination at Chikungunya virus 3’UTR as an evolutionary mechanism that provides adaptability. PLoS Pathog. 2019, 15, e1007706.
31. Kiening, M.; Ochsenreiter, R.; Hellinger, H.J.; Rattei, T.; Hofacker, I.; Frishman, D. Conserved secondary structures in viral mRNAs. Viruses 2019, 11, 401.
32. Ochsenreiter, R.; Hofacker, I.L.; Wolfinger, M.T. Functional RNA Structures in the 3’UTR of Tick-Borne, Insect-Specific and No-Known-Vector Flaviviruses. Viruses 2019, 11, 298.
33. Yang, D.; Leibowitz, J.L. The structure and functions of coronavirus genomic 3’ and 5’ ends. Virus Res. 2015, 206, 120–133.
34. Kutchko, K.M.; Madden, E.A.; Morrison, C.; Plante, K.S.; Sanders, W.; Vincent, H.A.; Cruz Cisneros, M.C.; Long, K.M.; Moorman, N.J.; Heise, M.T.; et al. Structural divergence creates new functional features in alphavirus genomes. Nucleic Acids Res. 2018, 46, 3657–3670.
35. Madden, E.A.; Plante, K.S.; Morrison, C.R.; Kutchko, K.M.; Sanders, W.; Long, K.M.; Taft-Benz, S.; Cisneros, M.C.C.; White, A.M.; Sarkar, S.; et al. Using SHAPE-MaP to model RNA secondary structure and identify 3’UTR variation in chikungunya virus. J. Virol. 2020, 94, e00701-20.
36. Pfeffer, M.; Kinney, R.M.; Kaaden, O.R. The Alphavirus 3’-Nontranslated Region: Size Heterogeneity and Arrangement of Repeated Sequence Elements. Virology 1998, 240, 100–108.
37. Halvorsen, M.; Martin, J.S.; Broadaway, S.; Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 2010, 6, e1001074.
38. Martin, J.S.; Halvorsen, M.; Davis-Neulander, L.; Ritz, J.; Gopinath, C.; Beauregard, A.; Laederach, A. Structural effects of linkage disequilibrium on the transcriptome. RNA 2012, 18, 77–87.
39. Wan, Y.; Qu, K.; Zhang, Q.C.; Flynn, R.A.; Manor, O.; Ouyang, Z.; Zhang, J.; Spitale, R.C.; Snyder, M.P.; Segal, E.; et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 2014, 505, 706–709.
40. Corley, M.; Solem, A.; Qu, K.; Chang, H.Y.; Laederach, A. Detecting riboSNitches with RNA folding algorithms: A genome-wide benchmark. Nucleic Acids Res. 2015, 43, 1859–1868.
41. He, F.; Wei, R.; Zhou, Z.; Huang, L.; Wang, Y.; Tang, J.; Zou, Y.; Shi, L.; Gu, X.; Davis, M.J.; et al. Integrative Analysis of Somatic Mutations in Non-coding Regions Altering RNA Secondary Structures in Cancer Genomes. Sci. Rep. 2019, 9, 1–12.
42. Lin, J.; Chen, Y.; Zhang, Y.; Ouyang, Z. Identification and analysis of RNA structural disruptions induced by single nucleotide variants using Riprap and RiboSNitchDB. NAR Genom. Bioinform. 2020, 2, lqaa057.
43. De Bernardi Schneider, A.; Ford, C.T.; Hostager, R.; Williams, J.; Cioce, M.; Çatalyürek, Ü.V.; Wertheim, J.O.; Janies, D. StrainHub: A phylogenetic tool to construct pathogen transmission networks. Bioinformatics 2020, 36, 945–947.
44. Campbell, F.; Didelot, X.; Fitzjohn, R.; Ferguson, N.; Cori, A.; Jombart, T. outbreaker2: A modular platform for outbreak reconstruction. BMC Bioinform. 2018, 19, 1–8.
45. De Maio, N.; Worby, C.J.; Wilson, D.J.; Stoesser, N. Bayesian reconstruction of transmission within outbreaks using genomic variants. PLoS Comput. Biol. 2018, 14, e1006117.
46. Hadfield, J.; Megill, C.; Bell, S.M.; Huddleston, J.; Potter, B.; Callender, C.; Sagulenko, P.; Bedford, T.; Neher, R.A. Nextstrain: Real-time tracking of pathogen evolution. Bioinformatics 2018, 34, 4121–4123.
47. Kinganda-Lusamaki, E.; Black, A.; Mukadi, D.; Hadfield, J.; Mbala-Kingebeni, P.; Pratt, C.; Aziza, A.; Diagne, M.; White, B.; Bisento, N.; et al. Operationalizing genomic epidemiology during the Nord-Kivu Ebola outbreak, Democratic Republic of the Congo. medRxiv 2020.
48. Popa, A.; Genger, J.W.; Nicholson, M.D.; Penz, T.; Schmid, D.; Aberle, S.W.; Agerer, B.; Lercher, A.; Endler, L.; Colaco, H.; et al. Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2. Sci. Transl. Med. 2020, 12.
49. Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank. Nucleic Acids Res. 2016, 44, D67–D72.
50. Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 2016, 33, 1870–1874.
51. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
52. Tamura, K.; Nei, M.; Kumar, S. Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc. Natl. Acad. Sci. USA 2004, 101, 11030–11035.
53. Köster, J.; Rahmann, S. Snakemake—A scalable bioinformatics workflow engine. Bioinformatics 2012, 28, 2520–2522.
54. Lorenz, R.; Stadler, P.F. RNA Secondary Structures with Limited Base Pair Span: Exact Backtracking and an Application. Genes 2021, 12, 14.
55. Lorenz, R.; Bernhart, S.H.; Zu Siederdissen, C.H.; Tafer, H.; Flamm, C.; Stadler, P.F.; Hofacker, I.L. ViennaRNA Package 2.0. Algorithm. Mol. Biol. 2011, 6, 26.
56. Miladi, M.; Raden, M.; Diederichs, S.; Backofen, R. MutaRNA: Analysis and visualization of mutation-induced changes in RNA structure. Nucleic Acids Res. 2020, 37, 1–5.
57. Sagulenko, P.; Puller, V.; Neher, R.A. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol. 2018, 4, vex042.
58. Lopez-Delisle, L.; Rabbani, L.; Wolff, J.; Bhardwaj, V.; Backofen, R.; Grüning, B.; Ramírez, F.; Manke, T. pyGenomeTracks: Reproducible plots for multivariate genomic data sets. Bioinformatics 2020, btaa692.
59. Tsetsarkin, K.A.; Vanlandingham, D.L.; McGee, C.E.; Higgs, S. A single mutation in Chikungunya virus affects vector specificity and epidemic potential. PLoS Pathog. 2007, 3, e201.
60. McNaughton, A.L.; Revill, P.A.; Littlejohn, M.; Matthews, P.C.; Ansari, M.A. Analysis of genomic-length HBV sequences to determine genotype and subgenotype reference sequences. J. Gen. Virol. 2020, 101, 1–13.
61. Bbosa, N.; Kaleebu, P.; Ssemwanga, D. HIV subtype diversity worldwide. Curr. Opin. HIV Aids 2019, 14, 153–160.
62. De Bernardi Schneider, A.; Osiowy, C.; Hostager, R.; Krarup, H.; Borresen, M.; Tanaka, Y.; Morriseau, T.; Wertheim, J.O. Analysis of Hepatitis B virus genotype D in Greenland suggests the presence of a novel quasi-subgenotype. Front. Microbiol. 2021.
63. Robertson, D.L.; Anderson, J.; Bradac, J.; Carr, J.; Foley, B.; Funkhouser, R.; Gao, F.; Hahn, B.; Kalish, M.; Kuiken, C.; et al. HIV-1 nomenclature proposal. Science 2000, 288, 55–56.
64. Souza, T.M.A.; Azeredo, E.L.; Badolato-Corrêa, J.; Damasco, P.V.; Santos, C.; Petitinga-Paiva, F.; Nunes, P.C.G.; Barbosa, L.S.; Cipitelli, M.C.; Chouin-Carneiro, T.; et al. First report of the East-Central South African genotype of Chikungunya virus in Rio de Janeiro, Brazil. PLoS Curr. 2017, 9.
65. Schuster, P. Quasispecies on Fitness Landscapes. In Quasispecies: From Theory to Experimental Systems; Domingo, E., Schuster, P., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 61–120.
66. Geoghegan, J.L.; Holmes, E.C. Virus Evolution. In Fields Virology; Howley, P.M., Knipe, D.M., Eds.; Wolters Kluwer Health: Philadelphia, PA, USA, 2021.
67. Faure, G.; Ogurtsov, A.Y.; Shabalina, S.A.; Koonin, E.V. Adaptation of mRNA structure to control protein folding. RNA Biol. 2017, 14, 1649–1654.

Figure 1. Nextstrain time-resolved maximum likelihood phylogeny of CHIKV. The overview tree in (a) comprises the major lineages except the West Africa lineage. The latter is located ancestral to all other clades, with an inferred time of the most recent common ancestor (TMRCA) in 1645 (confidence interval 1625–1667, data not shown). (b) shows a more detailed view of the clade encompassing the East African (dark blue) and the Indian Ocean Lineage (light blue). (c) shows the subtree containing the South American lineage. A detailed view of the Asian Urban-American lineage is shown in Figure A1.

Figure 2. (a) Schematic genome representation of CHIKV isolate KT327163.2, which is used as the reference strain here. Open reading frames (ORFs) encoding non-structural and structural polyproteins are depicted in dark blue and light blue, respectively, UTRs in grey. Protein products are shown in orange. Locally stable RNAs that overlap single-nucleotide variant (SNV) loci (indicated by dashed lines) are depicted in black. Genome tracks were created using pyGenomeTracks [58]. Genomic coordinates of candidate RNAs and SNVs are listed in Table 3. (b) Predicted minimum free energy (MFE) structures of the 14 locally stable RNAs in the reference strain.

Figure 3. Ensemble properties of the 81 nt locally stable RNA element that contains the A1653G variant, represented as A49G here. Different patterns of base-pair probabilities of the wild type (WT) and mutant are shown as circle plots in (a,b), with darker arcs corresponding to increased base-pairing probability. The SNV position in (b) is highlighted in red within the green circle that represents the RNA backbone. The dot plot in (c) shows the differences in base-pairing potential of both WT (upper triangle) and mutant (lower triangle), with dark dots indicating high base-pairing probability of particular sequence positions. Red dotted lines indicate the mutated position. Minimum free energy structures of the WT and mutant are shown in (d,e), respectively, with the mutated position highlighted in red. The A1653G mutation results in a thermodynamic stabilization by 4.8 kcal/mol. This mutation is specific to AUL, and in particular to AUL-Am isolates, highlighted in yellow in the time-resolved phylogenetic tree in (f).
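The relabeling of A1653G as A49G follows from shifting the genomic coordinate by the start of the RNA element (locus 1605–1685 in Table 3). A minimal Python sketch of this conversion is given below; the function name `to_local` is hypothetical, and 1-based, inclusive coordinates are assumed.

```python
import re


def to_local(variant: str, locus_start: int, locus_end: int) -> str:
    """Convert a genome-level SNV label (e.g. 'A1653G') to element-local form.

    Coordinates are 1-based and inclusive, so the first nucleotide of the
    element has local position 1.
    """
    match = re.fullmatch(r"([ACGU])(\d+)([ACGU])", variant)
    if match is None:
        raise ValueError(f"cannot parse variant {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not locus_start <= pos <= locus_end:
        raise ValueError("variant lies outside the given locus")
    return f"{ref}{pos - locus_start + 1}{alt}"


def element_length(locus_start: int, locus_end: int) -> int:
    """Length of an element given 1-based, inclusive coordinates."""
    return locus_end - locus_start + 1


print(to_local("A1653G", 1605, 1685), element_length(1605, 1685))
```

The same arithmetic maps position 11,246 in the element at 11,228–11,307 to local position 19.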

Figure 4. Aberration of the thermodynamic ensemble of a locally stable RNA element induced by mutations at position 11,246, represented here as A19C and A19U. Structural diversity of the thermodynamic ensembles is shown in circle plots (a–c), where shades of gray represent pairing probabilities. The closing stem of 9 stacked base pairs is formed in all three variants, while the central parts of the RNA from positions 10–70 show varied base-pairing patterns. The SNV at position 19 is highlighted in red on top of the green circles in (b,c) as well as in the minimum free energy structures (d–f).

Table 1. Estimates of evolutionary divergence over sequence pairs between Chikungunya virus (CHIKV) lineages. The number of base substitutions per site, averaged over all sequence pairs between groups, is shown. Analyses were conducted using the Maximum Composite Likelihood model [52] with 924 CHIKV nucleotide sequences encompassing the lineages/groups observed in this study. Lineages: Asian urban (AUL), AUL-America (AUL-Am), South America (SAL), Middle Africa (MAL), Indian Ocean (IOL), East Africa (EAL), Africa and Asia (AAL), Sister Taxa to ECSA (sECSA), West Africa (WA).

Lineage	AUL-Am	AUL	SAL	MAL	IOL	EAL	AAL	sECSA
AUL	0.01108
SAL	0.06902	0.06622
MAL	0.06840	0.06548	0.02432
IOL	0.06988	0.06734	0.02954	0.02631
EAL	0.06763	0.06533	0.02574	0.02259	0.00584
AAL	0.06390	0.06067	0.03145	0.02887	0.03141	0.02807
sECSA	0.06302	0.06014	0.02957	0.02758	0.03038	0.02690	0.01918
WA	0.19383	0.19228	0.17696	0.17829	0.17917	0.17750	0.17443	0.17505

Table 2. TMRCA estimates of CHIKV lineages, extracted from the Nextstrain instance. Dates are formatted as DD-MM-YYYY.

Lineage	TMRCA	Date Confidence Interval	Year of First Isolation
AAL	17-04-1948	(18-10-1946, 14-12-1949)	1953
AUL	05-02-1951	(13-09-1949, 04-01-1953)	1958
AUL-Am	12-03-2008	(24-12-2007, 10-11-2008)	2013
EAL	24-05-2002	(15-02-2001, 20-04-2003)	2005
IOL	03-08-2003	(20-10-2002, 14-01-2004)	2006
MAL	31-01-1953	(20-05-1951, 14-01-1955)	1962
SAL	22-03-2011	(24-06-2009, 15-09-2012)	2014
WA	16-01-1954	(13-05-1952, 26-09-1955)	1964

Table 3. Lineage-specific single-nucleotide mutations that modulate the fold of locally stable RNA elements in the CHIKV genome, computed for the reference strain of our Nextstrain build (accession KT327163.2). A total of 16 SNVs are overlapped by 14 locally stable RNAs. The Type column indicates whether a SNV induces a synonymous (S) or non-synonymous (N) mutation at the amino acid level. Amino acid changes are listed for Type N mutations in the AA mutation column. The Locus column lists global coordinates of locally stable RNAs in the reference strain. The minimum free energy (MFE) values for wild-type (WT) and mutant (Mut) are given in kcal/mol. The base-pair (BP) distances are computed between MFE structures of WT and Mut. * Subclade. § Isolates collected from 1983 onwards. † Isolates collected from 2006 onwards. ‡ Isolates collected from 2019 onwards.

Variant	Type	Protein	AA Mutation	RNA #	Locus	MFE WT (kcal/mol)	z-Score	MFE Mut (kcal/mol)	BP Distance	Lineage Association
G432A	N	nsP1	E > A	1	378–472	−30.00	−2.474	−27.00	29	G: AUL, AUL-Am
U810C	S	nsP1	–	2	783–847	−21.60	−2.103	−18.70	42	U: WA, AUL, AUL-Am, IOL*
A1653G	S	nsP1	–	3	1605–1685	−23.30	−2.640	−28.10	15	A: AUL§, AUL-Am
U2122C	S	nsP2	–	4	2105–2202	−31.90	−2.171	−28.30	29	U: AUL, AUL-Am
G2232A	S	nsP2	–	5	2210–2300	−31.50	−2.771	−27.40	36	G: AUL, AUL-Am
C3108U	S	nsP2	–	6	3093–3192	−28.20	−2.141	−25.60	42	C: WA, AUL, AUL-Am
C3682U	S	nsP2	–	7	3630–3731	−42.20	−4.325	−40.10	22	C: AUL, AUL-Am, IOL*
U5508A	N	nsP3	D > E	8	5467–5527	−18.10	−2.370	−16.20	15	U: AUL†, AUL-Am
G8336C	S	C	–	9	8312–8395	−37.10	−3.023	−34.40	27	G: AUL†, AUL-Am
C8358U	S	C	–	9	8312–8395	−37.10	−3.023	−36.40	27	C: AUL, AUL-Am, SAL, MAL‡, IOL*
G8969A	N	E2	R > K	10	8918–9019	−38.60	−2.447	−36.10	40	A: AUL†, AUL-Am
C9197U	S	E2	–	11	9130–9214	−22.40	−2.443	−20.40	34	C: WA, AUL, AUL-Am
A9414C	S	E2	–	12	9392–9456	−17.60	−2.568	−15.80	18	A: AUL†, AUL-Am
A10460C	N	E1	I > V	13	10369–10468	−31.60	−2.623	−31.70	18	A: AUL, AUL-Am
A11246C	S	E1	–	14	11228–11307	−29.30	−3.049	−26.90	25	A: AUL, AUL-Am, AAL, SAL, MAL; C: EAL, IOL
A11246U	S	E1	–	14	11228–11307	−29.30	−3.049	−27.20	36	A: AUL, AUL-Am, AAL, SAL, MAL; U: WA
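The MFE columns of Table 3 can be read as a free energy difference, ΔMFE = MFE(Mut) − MFE(WT), where a negative value indicates that the mutant MFE structure is thermodynamically more stable than the wild type. The Python sketch below computes this for three rows copied from the table; the helper names are hypothetical, and the comparison covers only the MFE structures, not the full structure ensemble.

```python
# (variant, MFE_WT, MFE_Mut) in kcal/mol, copied from three rows of Table 3.
ROWS = [
    ("G432A", -30.00, -27.00),
    ("A1653G", -23.30, -28.10),
    ("A10460C", -31.60, -31.70),
]


def delta_mfe(mfe_wt: float, mfe_mut: float) -> float:
    """Free energy difference MFE(Mut) - MFE(WT), rounded to 0.01 kcal/mol."""
    return round(mfe_mut - mfe_wt, 2)


def effect(delta: float) -> str:
    """Label the direction of the change in MFE."""
    if delta < 0:
        return "stabilizing"
    if delta > 0:
        return "destabilizing"
    return "unchanged"


for variant, wt, mut in ROWS:
    d = delta_mfe(wt, mut)
    print(f"{variant}: {d:+.2f} kcal/mol ({effect(d)})")
```

For A1653G this gives a difference of −4.80 kcal/mol, in line with the stabilization by 4.8 kcal/mol stated in the caption of Figure 3.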

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
