A Novel Sub-Lineage of Chikungunya Virus East/Central/South African Genotype Indian Ocean Lineage Caused Sequential Outbreaks in Bangladesh and Thailand

In recent decades, chikungunya virus (CHIKV) has become geographically widespread. In 2004, the CHIKV East/Central/South African (ECSA) genotype moved from Africa to Indian ocean islands and India followed by a large epidemic in Southeast Asia. In 2013, the CHIKV Asian genotype drove an outbreak in the Americas. Since 2016, CHIKV has re-emerged in the Indian subcontinent and Southeast Asia. In the present study, CHIKVs were obtained from Bangladesh in 2017 and Thailand in 2019, and their nearly full genomes were sequenced. Phylogenetic analysis revealed that the recent CHIKVs were of Indian Ocean Lineage (IOL) of genotype ECSA, similar to the previous outbreak. However, these CHIKVs were all clustered into a new distinct sub-lineage apart from the past IOL CHIKVs, and they lacked an alanine-to-valine substitution at position 226 of the E1 envelope glycoprotein, which enhances CHIKV replication in Aedes albopictus. Instead, all the re-emerged CHIKVs possessed mutations of lysine-to-glutamic acid at position 211 of E1 and valine-to-alanine at position 264 of E2. Molecular clock analysis suggested that the new sub-lineage CHIKV was introduced to Bangladesh around late 2015 and Thailand in early 2017. These results suggest that re-emerged CHIKVs have acquired different adaptations than the previous CHIKVs.


Introduction
Chikungunya virus (CHIKV) is a mosquito-borne virus transmitted via Aedes mosquitoes that causes a febrile illness accompanied by rash and arthralgia. CHIKV belongs to genus Alphavirus, family Togaviridae, and it contains a single-stranded RNA genome of approximately [11][12] kb with two open reading frames (ORFs) flanked by untranslated regions (UTRs) at the 5 and 3 ends. The CHIKV genome encodes two polyproteins that are subsequently processed into four nonstructural and four structural proteins. Although the first CHIKV was isolated in 1952 in Tanzania, both clinical records in the 18th and 19th centuries and molecular clock analyses of current CHIKV genomes suggest that this virus has existed for 300 years [1,2]. At present, CHIKV is phylogenetically classified into three genotypes reflecting the geographical locations where the respective strains were first isolated, East/Central/South African (ECSA), West African (WA), and Asian. CHIKV is transmitted by Aedes mosquitoes that are found in tropical and temperate regions. In particular, Ae. aegypti is predominant in tropical areas in East Africa, South Asia, and Southeast Asia where a re-emergence of CHIKV has occurred [3].
The emergence of new lineages has been common in recent decades. In 2004, the ECSA-genotype CHIKV circulated in Ae. aegypti in coastal Kenya and spread to the Comoros and other islands in the Indian Ocean, India, and Sri Lanka, and it has since diverged into a new lineage called the Indian Ocean Lineage (IOL) [2]. Shortly thereafter, IOL CHIKV reached Southeast Asia, particularly Singapore, Malaysia, and Thailand, and it caused large epidemics that resulted in over a million suspected CHIKV cases [4]. An alanine-to-valine mutation at position 226 within the E1 glycoprotein confers enhanced virus replication in Ae. albopictus and has been implicated in the widespread circulation of CHIKV in Asia and even in temperate climate countries such as Italy and France [4][5][6]. In December 2013, Asian-genotype CHIKV that originated in Micronesia in the Pacific was introduced into the Caribbean Sea, which is a region with naïve immunity to CHIKV and abundant A. aegypti. The outbreak started in St. Martin, was transmitted to other islands, and finally rapidly spread throughout Central and South America during late 2014-2016. These CHIKVs have been characterized as the Caribbean outbreak lineage (COL) [7]. Moreover, ECSA-genotype CHIKV also appeared in Brazil in 2014 [8].
In 2016, another CHIKV outbreak started in northern India [9,10] and Pakistan, and it spread to Bangladesh in 2017 [11] and even Kenya and Italy in 2016-2017 [12][13][14]. A re-emergence of CHIKV has also been reported in Thailand since 2018 until present [15]. In the present study, to investigate the evolutionary relationships and genetic diversity of CHIKVs that have re-emerged since 2016, we conducted a phylogenetic analysis using the complete coding regions of CHIKVs from South and Southeast Asia, namely Bangladesh and Thailand (Figure 1), collected in 2017 and 2019, respectively, together with public sequences from GenBank. The results revealed that both CHIKV sequences from Bangladesh and Thailand belonged to ECSA-IOL CHIKV. However, these sequences clustered in a distinct clade apart from the previous IOL CHIKV clade from 2004-2013, forming a new sub-lineage. Our molecular clock analysis suggested that the new sub-lineage IOL emerged in 2008 and was introduced into Bangladesh around 2015, and then into Thailand in early 2017.

CHIKV Samples
This study was conducted in accordance with the Declaration of Helsinki and was exempt from obtaining participants' consent since only leftover specimens were used after anonymization. All viruses analyzed in the present study are listed in Table 1. Twenty acute febrile cases that were serum-positive for CHIKV by real-time RT-PCR were collected at the Apollo Hospitals Dhaka in Dhaka city, Bangladesh, in 2017 [11,16]. This study proposal was approved by the Research and Ethics Committee of Apollo Hospitals Dhaka (ERC 16/2018-3). Fourteen CHIKV-positive sera samples were also obtained from the Hospital for Tropical Diseases in Bangkok, Thailand, in 2019. This study proposal was approved by the Research and Ethics Committee of the Faculty of Tropical Medicine, Mahidol University (FTM ECF-035-01). High viral load (C t value less than 20) sera were directly used for RT-PCR amplification of the CHIKV genome, while the other sera were subjected to virus isolation in the C6/36 mosquito cell line cultivated in L-15 medium (Hyclone, UT, USA) supplemented with 2% fetal bovine serum (Hyclone, UT, USA). Inoculated cells were incubated at 28 • C until the cytopathic effect appeared, and culture supernatants were subjected to real-time RT-PCR to confirm successful virus isolation. One laboratory CHIKV strain (CP10) was isolated in 2010 [17].

CHIKV Genome Sequencing
Either 70 µL of sera or cultured supernatant were used for RNA extraction by QIAamp viral RNA mini kits (Qiagen, Hilden, Germany). Two overlapping amplified DNA fragments covering the full genome of CHIKV were prepared, quantified, and normalized to 0.2 ng/µL using a Qubit fluorometer (Themo Fisher Scientific, MA, USA). Five µL of the amplified fragments were processed for library preparation by an Illumina Nextera XT kit (Illumina, San Diego, CA, USA) and the paired-end of the 2 × 250 bp sequencing reaction was conducted using the Miseq platform (Illumina, San Diego, CA, USA) according to a previously published protocol [18]. The forward and reverse short reads were imported to CLC Genomics Workbench software version 20 (Qiagen, Aarhus, Denmark) and then aligned to the MF773566-Bangladesh 2017 sequence [19] using map reads to the reference command. Finally, the consensus sequence was extracted.

Nucleotide Sequence Accession Numbers
All sequences generated in the present study were deposited to GenBank under accession numbers LC580236-LC580255 and LC580256-LC580270 for Bangladesh and Thailand sequences, respectively.
To further investigate the viral evolution with a Bayesian phylogenetic molecular clock approach, a second dataset of ECSA-CHIKV was prepared covering the geography and time spanning from the 1950s to the latest outbreak in the present study. This ECSA-CHIKV dataset was tested for the presence of recombination using GARD implemented in the Datamonkey server (https://www.datamonkey.org/) before proceeding to the construction of the initial ML tree, under GTR+F+I+G4 and 1000 replicates of bootstrap, to determine the evolutionary temporal signal using Tempest v1.5.3 [27]. To reconstruct the time-scale phylogeny, Bayesian Markov Chain Monte Carlo (MCMC) sampling was undertaken in the BEAST v1.10.4 package [28]. The best-fit model of a relaxed uncorrelated log normal clock and the Bayesian coalescent tree prior of Bayesian Skygrid under SRD06 nucleotide substitution was chosen by comparison with the other models by path sampling (PS) and stepping-stone sampling (SS) methods with 50 path steps of 1 million iterations and log every 1000 (Table S2) [29][30][31]. The grid point was set at 75 as predicted by the intercept of root-to-tip regression of the initial ECSA-CHIKV ML tree. Each sequence was tagged to a geographic region according to the collected location. The four independent runs of MCMC were done for 150 million generations each, with sampling every 15,000 generations. The running speed was enhanced using BEAGLE. All runs were combined to achieve good mixing of convergence in LogCombiner v1.10.4. The trace was accessed in Tracer v1.7.1, and then a 10% burn-in was removed. The maximum clade credibility (MCC) tree was generated using TreeAnnotator v1. 10 To determine lineage specific amino acid mutations, the corresponding amino acids of the ECSA-CHIKV nucleotide alignment were aligned in AliView v.1.26. Amino acid positions were annotated following HM045811-Tanzania-1953 CHIKV.

Selection Analyses
The two individual ORFs of nonstructural polyprotein and structural polyprotein corresponding to the ECSA CHIKV dataset, and the newly generated CHIKV dataset was applied for selection pressure analysis implemented in the DataMonkey server (http://datamonkey.org/) by using three methods: mixed-effects model of evolution (MEME), fast, unconstrained Bayesian approximation (FUBAR), and fixed-effect likelihood (FEL) for the site-specific selection [32][33][34][35].

Evolution of the ECSA-CHIKV IOL Sub-Lineage
To investigate the genetic relationships among the previous and recent ECSA-IOL CHIKVs, we collected additional sequences of ECSA-IOL from recent reports and subjected them to molecular clock analysis. To estimate the time of the most recent common ancestor (tMRCA) and the evolution rate of the new ECSA-IOL CHIKV sub-lineage circulating in 2016-2019, the expanded dataset of the CHIKV ECSA genotype (n = 233) was analyzed using Bayesian phylogenetic inference. The alignment of the sequence dataset showed no recombination among the collected CHIKV sequences. Then, we constructed an initial ML tree for root-to-tip analysis. The linear regression divergence against collection time exhibited a positive temporal signal of 0.88. By this method, the tMRCA was dated at 1945.57, and the substitution rate was estimated at 3.95 × 10 −4 substitutions/site/year (s/s/y) ( Figure 3A). The relaxed uncorrelated lognormal clock and Skygrid, the best-fit model, was used for estimation (Table S2). The timescale MCC tree showed that the root tMRCA of ECSA was estimated to be around 1949.17 (95% highest posterior density (HPD): 1944.14-1957.42) ( Figure 3B and Figure S1). The lineages, sub-lineages, and clades were arbitrary named and are described in Table 2 according to the defined nodes in Figure 3B.
As shown in the CHIKVs derived from node A in Figure 3B, ECSA CHIKVs that circulated around the 1950s-1970s in Africa first emerged in 1950. 48 (95%HPD: 194848 (95%HPD: .58-1952 42-2007.25). In addition, our strain CP10 collected in 2010 from Ratchaburi, Thailand also fell into this clade clustered with other Thailand CHIKVs with a similar collection date.
In contrast, not all but some Indian CHIKV sequences clustered into a separate clade (PP = 1) derived from node C2.

Signature Mutations Related to CHIKV ECSA-IOL Sub-Lineages among CHIKV Outbreak Strains
The tree topology of ECSA-IOL CHIKV showed that several distinct lineages diverged during 2005-2019. Therefore, we further examined the nonsynonymous mutations during the evolution of the lineages, especially those of the previous IOL outbreak in 2005 and the recent IOL outbreak. An alanine-to-valine substitution at residue 226 of the envelope E1 protein (E1-A226V) has been reported to facilitate CHIKV replication in Ae. albopictus. Herein, this mutation was observed in certain sequences of node C1-viruses (IOL-Indian Ocean islands), node C2.1-viruses (IOL-Indian subcontinent), and C2.2-viruses (IOL-Indian subcontinent/SEA). Remarkably, all the C2.2a viruses obtained during the previous Southeast Asian outbreak including in Thailand carried E1-226V ( Figure 3B and Table 3). However, E1-A226V was not detected, but E1-226A was found in all sequences in the current outbreak node C2.3 viruses (northern Indian subcontinent/SEA). Furthermore, the descendant sequences in node C2.3 viruses exhibited unique E1-K211E and E2-V264A substitutions, which are novel lineage-specific substitutions, in all the current outbreak strains in India and Pakistan in 2016, in Bangladesh in 2017, in Italy in 2017, and in Thailand in 2019 ( Figure 3 and Table 3). In addition, the viruses derived from node C2.3a, i.e., India 2016, Pakistan 2016, Bangladesh 2017, and Thailand 2019 ( Figure 3C and Table 3), shared five substitutions: nsP2-H130Y, nsP2-E145D, nsP4-S55N, nsP4-R85G, and E1-I317V. Moreover, the descendant clade defined by node C2.3b shared two more amino acid substitutions, nsP3-D372E and E2-G205S. Interestingly, nsP2-793V/A was found in Bangladeshi strains, while nsP2-V793A and nsP2-N495S were found in all sequences of the Thailand clade defined by node C2.3c, including Thailand, Myanmar, and China strains.

Selection Analyses
To identify amino acids under selection pressure within the ECSA-CHIKV, analyses using Datamonkey methods FEL, FUBAR, and MEME were conducted by aligning individual coding regions of the nonstructural polyproteins and structural polyproteins of all sequences used in Figure 3 (Table 4). A subset of the sequences obtained in the present study was also used to detect recent evolution of ECSA-CHIKV (Table 4). Residues nsP1-171, nsP3-217, and E1-211 in the total ECSA dataset and nsP3-58 and C-73 in the recent ECSA dataset were shown to be under positive selection pressure at statistically significant p-values of <0.05 in FEL and MEME and PP > 0.9 in FUBAR.
* Codon numbering from the first codon in each open reading frame, ** codon numbering from the first codon in each viral protein, *** corresponds to the defined nodes in Figure 3B and Table 2.

Discussion
The emergence of the ECSA CHIKV lineage IOL caused one million cases over a decade. Sequential outbreaks are still being reported in the Indian subcontinent. Bangladesh, located to the northeast of India, faced a second large outbreak in 2017, when the highest numbers of confirmed cases were reported during May-July [37]. In the present study, we examined CHIKV-positive serum from Dhaka during July-December 2017 [11]. Later in 2018-2019, CHIKV transmission was observed in nearby regions simultaneously, and the number of CHIKV cases noticeably increased in travelers and locals in Myanmar and especially Thailand [19,[38][39][40]. According to the national surveillance of Thailand, the numbers of confirmed cases were 10, 3,580, and 11,721 in 2017, 2018, and 2019, respectively [15]. We also examined CHIKV-positive serum collected at the Hospital for Tropical diseases, Bangkok, in October 2019.
Our results revealed a new distinct IOL sub-lineage separate from the previous IOL that previously caused outbreaks in the Indian Ocean, Indian subcontinent, and Southeast Asia in the last decade. According to our estimated tMRCA, this new IOL sub-lineage originated from India CHIKV, which circulated during 2008-2016. Then, the virus was introduced to Pakistan in early 2015, to Bangladesh in mid-2015, and reached Thailand in 2017. Furthermore, this virus was also detected in Italy around 2016. In contrast, the previous emergence of IOL originated from coastal Kenya, was split into two different sub-lineages in the Indian Ocean islands and Indian subcontinent in 2003, and the Indian subcontinent sub-lineage subsequently reached Southeast Asia in late 2006. The evolution rate of our new sub-lineage of IOL in the present study, 6.29 × 10 −4 (1.57 × 10 −4 -1.30 × 10 −3 s/s/y), was similar to that of the previous IOL outbreak in 2004-2013, 7.51 × 10 −4 (6.67 × 10 −4 -8.46 × 10 −4 s/s/y) [41]. The evolution rate in the Bangladesh clade was 6.79 × 10 −4 (1.75 × 10 −4 -13.76 × 10 −4 s/s/y), and that in the Thailand clade was 1.08 × 10 −3 (3.07 × 10 −4 -2.04 × 10 −3 s/s/y), although the broad HPD range might be due to a low number of available sequences in the present study.
Interestingly, among this new IOL sub-lineage, the adaptive mutation E1-A226V, which facilitates virus replication in Ae. albopictus [5,42], was not detected. Instead, another mutation, E1-K211E, was detected in all sequences in this sub-lineage. Ae. aegypti and Ae. albopictus coexist in India, where a mixture of IOL variants with E1-226A or 226V and E1-211K or 211E has been observed. The epidemics in India were initiated in 2005-2006 and were caused by either IOL E1-226A or 226V variants. However, later studies in 2007-2010 reported that IOL E1-226A, a wild-type or reverted strain, was the major strain in India [21,22,43,44]. The IOL E1-226V variant had been dominant in states such as Kerala where Ae. albopictus is abundant [22,45,46]. Later, apparently both E1-226V and E1-226A viruses had continued to spread in Kolkata of West Bengal. The E1-226V variant was spotted in rural areas, while the E1-226A virus was found in urban areas, where Ae. albopictus and Ae. aegypti are abundant, respectively [47]. Remarkably, in 2008, the IOL E1-226V variant successfully spread to Singapore and subsequently to other Southeast Asian countries where Ae. albopictus served as the primary vector [48,49]. In addition, an expansion of IOL E1-226A virus was detected in northern India and was supposedly related to the outbreaks in New Delhi in 2010 and 2016 [9,50].
In addition to E1-K211E, E2-V264A was first detected in 2010 in India in Kerala and New Delhi, and also from Ae. aegypti in Yemen [9,45,51]. They are also specific to the new IOL sub-lineage that emerged around mid-2008. An early appearance of these two substitutions was detected in Southern India around late 2009-2010 [44], followed by New Delhi in 2010 and 2016 [9,10]. Additionally, in 2010-2011, in Uttar Pradesh, a state located in northern India, a sequence determination of the E1 region in mosquitoes showed the co-existence of E1-211K and E1-211E variants in the background of E1-226A in Ae. aegypti and E1-211K in the background of E1-226V in Ae. albopictus [52]. These substitutions of E1-K211E and E2-V264A were commonly observed in the new IOL sub-lineage viruses. The present study also showed that the codon 211 of E1 was under positive selection. Accordingly, genetically engineered mutations of E1-K211E and E2-V264A in the background of E1-226A confer increased infection, dissemination, and transmission in Ae. aegypti [53].
In mid-2015, the new IOL CHIKV sub-lineage carrying E1-K211E and E2-V264A in the background of E1-226A was introduced to Bangladesh and replaced the old IOL CHIKV that was reported earlier in 2008-2011 in rural areas such as the Dohar sub-district in Dhaka and the Chapainababganj district located in northwestern Bangladesh [54,55]. This new sub-lineage caused the largest CHIKV outbreak in 2017. More than one thousand confirmed cases were reported in Dhaka city, and numbers of cases peaked in July during the monsoon season [56,57]. Although Ae. albopictus had been identified as the main vector for the previous outbreak in Dohar, Dhaka district [55], there has been no report on the mosquito species vector for the new IOL CHIKV sub-lineage. However, a previous study examining approximately 12,000 larvae and pupae sampled from households in Dhaka city during the monsoon in 2011-2013 showed that the numbers of Ae. aegypti were four times those of Ae. albopictus [58], suggesting that Ae. aegypti might be the transmission vector for the new sub-lineage of CHIKV IOL. Our Bangladesh CHIKVs were more closely related to those identified in CHIKV meningitis collected in Dhaka rather than those from other regions of Bangladesh [59]. A minor polymorphic mutation, V58I in nsP3, was observed among Bangladeshi CHIKVs only in BGD17-0303, -1024, -1147, -1299, -1332, -1629 and MK468610, MK468613, MK468616, MK468618, MK468622, and MK468625 [59]. In addition, Bangladeshi CHIKVs were also very similar to CHIKV strains reported in travelers and exported to China and Australia in 2017 [19,60], the Italian CHIKVs in 2017, and Thailand CHIKVs in 2018-2019. These closely related viruses shared the new mutations of D372E in nsP3 and G205S in E2, which were not detected in the Indian CHIKV 2016 (MK472624), the most related and earliest strain of recent clades circulated after 2017 of this new sub-lineage.
Concerning Italy, the first CHIKV outbreak in 2007 was caused by the IOL of E1-A226V that was transmitted by Ae. albopictus [61]. In 2017, the new CHIKV sub-lineage of IOL was reintroduced to Italy [13]. Our analysis revealed that Italian 2017 CHIKVs were closely related to those in Bangladesh, with E1-K211E and E2-V264A in the background of E1-226A. A recent study on the vector competence of these two IOL sub-lineages of CHIKV in Ae. albopictus showed that both had comparable infection and transmission rates [62]. It is tempting to speculate that the acquisition of efficient growth capability in both Ae. albopictus and Ae. aegypti allowed CHIKV to spread over a wide area.
As mentioned before, the national surveillance conducted by the Bureau of Epidemiology, Department of Disease Control, Ministry of Public Health has reported that CHIKV outbreaks have been ongoing in Thailand [15]. Sporadic suspected CHIKV cases have been routinely monitored through mid-2018. The rise of suspected CHIKV cases was noticed in the most southern Thailand during mid-2018 to the end of 2019, and they hit a peak in December 2018-January 2019 with nearly 3000 suspected cases. Then, CHIKV spread to the upper region in western Thailand during mid-2019 to the end of 2019. Similarly, in Bangkok from July 2019 to present, the numbers of CHIKV suspected cases peaked in October and November 2019 with 396 and 516 suspected cases, respectively. The total number of suspected CHIKV cases reached 11,484 overall in 2019 in Thailand. At present, CHIKVs have been spreading to northern and eastern Thailand. Our Thailand CHIKVs analyzed in the present study were collected in Bangkok during this ongoing outbreak in October 2019. These Bangkok CHIKVs were a new sub-lineage of IOL, which is closely related to and clustered together with southern Thailand strains in 2018, a Myanmar strain in 2019, and Australian, Slovenian, Finland, and Chinese strains imported from Thailand. In addition to new IOL-specific mutations mentioned before, these viruses contained additional polymorphic mutations of nsP3-N495S and Capsid-K73R. An investigation of Ae. aegypti collected from all over Thailand revealed that around 16% of field-caught Ae. aegypti in Bangkok are CHIKV PCR positive [63]. The rate of CHIKV PCR-positive Ae. aegypti was 34% in Prachuap Khiri Khan, a province in the southwestern part of Thailand, which is consistent with the numbers of suspected CHIKV cases from the surveillance as mentioned before. Furthermore, the CHIKV envelope regions in all these mosquitoes are characterized by E1-K211E, E2-V264A, and E1-226A [63]. In addition, new sub-lineage IOL CHIKVs were detected in small numbers of mosquitoes in Song Khla, Krabi, and even in northeastern regions such as Nong Khai province [63]. Unfortunately, all of the Thailand CHIKVs analyzed in the present study were collected in Bangkok, and thus, we could not see the old sub-lineage of IOL E1-A226V, which shows restricted transmission in mosquitoes in northern regions of Thailand such as Chiang Rai, Chiang Mai, and Ubon Ratchathani [63]. Thailand has abundant Ae. aegypti and Ae. albopictus. In a previous IOL outbreak in Thailand in 2009, the principal vector was Ae. albopictus [64]. Although there is no direct evidence of CHIKV transmission by this mosquito species in Thailand at present, it would be interesting to further investigate the vector competence of Ae. albopictus for the new sub-lineage of IOL CHIKV with E1-K211E, E2-V264A, and E1-226A.
A phylogenetic tree of nearly the whole genome shown in Figure 3 suggests that the adaptive E1-A226V substitution has independently occurred at least once in the ECSA-West African lineage, once in the ECSA-IOL sub-lineage of the Indian Ocean islands, and once in the ECSA-IOL sub-lineage of the Indian subcontinent/SEA. In contrast, other adaptive substitutions E1-K211E and E2-V264A have occurred only once in the new sub-lineage of the northern Indian subcontinent/SEA in India. Thus, it is important to monitor whether similar adaptive mutations occur in other geographical regions in other sub-lineages of the CHIKV ECSA genotype.
In conclusion, we found the new sub-lineages of ECSA IOL CHIKV carrying E1-K211E and E2-V264A, which have expanded from the northern Indian subcontinent to Thailand through Bangladesh. It is important to see whether this new sub-lineage of ECSA-IOL CHIKV could expand into other regions of the world.