Role of Diversity and Recombination in the Emergence of Chilli Leaf Curl Virus

Chilli leaf curl virus (ChiLCV), (Genus Begomovirus, family Geminiviridae) and associated satellites pose a serious threat to chilli production, worldwide. This study highlights the factors accountable for genetic diversity, recombination, and evolution of ChiLCV, and associated chilli leaf curl alphasatellite (ChiLCA) and chilli leaf curl betasatellite (ChiLCB). Phylogenetic analysis of complete genome (DNA-A) sequences of 132 ChiLCV isolates from five countries downloaded from NCBI database clustered into three major clades and showed high population diversity. The dN/dS ratio and Tajima D value of all viral DNA-A and associated betasatellite showed selective control on evolutionary relationships. Negative values of neutrality tests indicated purified selection and an excess of low-frequency polymorphism. Nucleotide diversity (π) for C4 and Rep genes was higher than other genes of ChiLCV with an average value of π = 18.37 × 10−2 and π = 17.52 × 10−2 respectively. A high number of mutations were detected in TrAP and Rep genes, while ChiLCB has a greater number of mutations than ChiLCA. In addition, significant recombination breakpoints were detected in all regions of ChiLCV genome, ChiLCB and, ChiLCA. Our findings indicate that ChiLCV has the potential for rapid evolution and adaptation to a range of geographic conditions and could be adopted to infect a wide range of crops, including diverse chilli cultivars.


Introduction
Viruses and diseases that have emerged in the past few years have impeded the production of important crops worldwide. Genetic diversity allows the adaptation of virus populations to a varying environment. Many virus species are the result of closely related genomic variants due to a high rate of mutations, rapid recombination, and a large population size [1]. Research on the evolution of plant viruses is mainly concentrated on RNA viruses, but very few studies exist on DNA viruses, although (ss) DNA viruses are the biggest emerging menace to agriculture globally. Several studies have revealed that ssDNA viruses may evolve as quickly as RNA viruses [2]. A number of studies anticipated that ssDNA viruses possibly replicate through low-fidelity DNA polymerases, and those spontaneous biochemical reactions which preferably act on ssDNA (methylation, deamination and oxidation of bases) might be contributing to the genetic variations of ssDNA viruses [3,4]. Though mutation dynamics acts as the most important factor for virus diversification [5], it does not describe all the standing genetic variation. Another crucial factor that significantly contributes to diversity of plant viruses is recombination. Recombination contributes to the evolution of a number of virus species and has been extensively identified in the genomes of many geminiviruses [1,[6][7][8].
The genus Begomovirus comprises more than 400 speciesand is the largest genus in the family Geminiviridae which are transmitted by whiteflies (Bemisia tabaci) [9,10]. Based on pair-wise sequence identity, host range, genome-organization and insect-vector relationship, the family Geminiviridae has been divided into nine genera: Becurtovirus, Begomovirus, Capulavirus, Curtovirus, Eragrovirus, Grablovirus, Mastrevirus, Topocuvirus and Turncurtovirus [9,11,12]. Recently, five more genera: Citlodavirus, Maldovirus, Mulcrilevirus, Opunvirus and Topilevirus have been added thus increasing the total number to 14 genera in the Geminiviridae family [13].Begomoviruses are ssDNA plant viruses having geminate quasi-icosahedral virions [9]. The genome of begomovirus can be monopartite (DNA-A) or bipartite (DNA-A and DNA-B) of~2.5-2.7 kb per genome component in size. DNA-A encodes six virus proteins. Four proteins are encoded by the complementary sense strand: replication associated protein (Rep/AC1), transcription activator protein (TrAP/AC2), replication enhancer protein (REn/AC3) and AC4 while virions sense strand encodes for coat protein (CP/AV1) and pre-coat protein (pre-CP/AV2), in some begomoviruses species AC5/C5 protein were also observed [6,14]. DNA-B is involved in the systemic (cell to cell) and local (nucleus to cytoplasm) movements of begomovirus and involved proteins aremovement protein (MP) and nuclear shuttle protein (NSP) respectively [10]. The alphasatellites and betasatellites are predominantly associated with monopartite begomoviruses [15,16] and a few cases have been reported for deltasatellites [17]. The alphasatellite belongs to family Alphasatellitidae, encoding a single protein alpha-rep and includes a hairpin structure at the origin of replication [15]. The betasatellite is about half the size of begomovirus DNA-A, encodes the βC1protein in the complementary sense strand, and plays important roles in transcriptional and post transcriptional gene silencing and symptom induction [18,19].
Chilli (Capsicum annuum) is a vital spice in Indian cuisine or food which is used both as a fresh vegetable and in powder form. The involvement of numerous begomoviruses (including ChiLCV) and associated DNA satellites in Chilli infection and the development of Chilli leaf curl disease were reported several years ago [14]. Chilli leaf curl virus (ChiLCV) is among the most predominant monopartite begomoviruses and seriously impacts solanaceous and non-solanaceous hosts in combination with various betasatellites [20]. Due to ChiLCV infection, 14-100% yield losses of chilli were recorded in Rajasthan (India) [21] and causes severe economic losses in both the tropical and sub-tropical regions of India [14,21]. The typical symptoms caused by the virus infection in chilli are leaf curling, crumpling, thickening and swelling of the veins, reduced leaf size, shortening of internodes and petioles, clustering of leaves and stunning of whole plant [12]. ChiLCV has a wide host range and could pose the possibility of an outbreak in the Indian sub-continent. However, numerous site-specific nucleases have been developed for direct interference with the begomovirus genome [22,23].
In this study, we analysed the genetic diversity of a large number of ChiLCV isolates from five countries and its satellite populations to get a better insight into the epidemics, and evolution of ChiLCV. We further analysed the phylogenetic relationship, and recombination breakpoints in the complete genome sequences of ChiLCV isolates, as well as their associated satellites viruses, and significant findings are outlined in this article.

Phylogenetics and Estimation of Nucleotide Substitution Rates
Using the complete genome of DNA-A of ChiLCV, maximum clade credibility phylogenetic analysis was performed to check the evolutionary relatedness among the virus populations. A total of 132 ChiLCV isolates reported from India, Pakistan, Oman, Sri Lanka, Bangladesh and Republic of Korea were grouped into three distinct clades (I, II and III). Majority of isolates in clade I were from Oman (28) whereas clade II indicates isolates from India (50), Pakistan (21), one from both Bangladesh and South Korea along with a single isolate from Sri Lanka also shared the same clade ( Figure 1A). Moreover, in clade III majority of isolates were from Pakistan (04 isolates) and India (03) ( Figure 1A). Additionally, this phylogenetic analysis was reinforced by nucleotide sequence identity test through Sequence Demarcation Tool Version 1.2 (SDTv1.2) (Supplementary Figure S1) that aids to interpret the phylogenetic tree analytically and efficiently.   [24] were used for maximum clade credibility (MCC) phylogenetic analyses based on full-length nucleotide sequences of ChiLCV and associated satellites with rooted tree mid-point and the tree was built in Interactive Tree Of Life (iTOL) an online tool (numbers indicates the height median for each isolates and associated satellites). The phylogenetic trees indicate to check the evolutionary relatedness among the virus populations of (A) Chilli leaf curl virus (ChilCV), (B) Chilli leaf curl alphasatellite (ChiLCA) and (C) Chilli leaf curl betasatellite (ChiLCB). The outermost ring shows the clades formation among the viruses whereas the middle ring indicates the host and the innermost ring shows country of origin.
The fifteen isolates of alphasatellites were grouped into three different clades ( Figure 1B). All clades consist of isolates belong to Indian origin associated with chilli crop except KF471047, which was associated with Amaranthus crop from India belonging to Clade III. The ChiLCA sequence KF584013 seems somewhat different among Clade II sequences, the key reason behind this out-grouping was its least nucleotide sequence identity (<78%) with all other ChiLCA sequences (Supplementary Figure S2) which was might be a consequence of frequent recombination and mutation events. On the other hand, the isolates of betasatellites were divided into four clades ( Figure 1C). Clade I had all isolates obtained from India (fourteen isolates), Bangladesh (two isolates) and Sri Lanka (one isolates), whereas majority of isolates groped in Clade II (five isolates) were from India except MT800762 which was from Saudi Arabia. While considering Clade III (twenty-four isolates) and Clade IV (twenty-eight isolates), among them we found that the majority of isolates were from Pakistan. In ChiLCB isolate, JN638446 (Sri Lanka) had the least sequence nucleotide identity (<70%) for all other ChiLCB sequences, perhaps because (Supplementary Figure S3) clades owe their phylogenetic relationships to other sequences. This phylogenetic finding facilitates the interpretation of evolutionary patterns existing among ChiLCV and its associated satellite populations across major different regions of the world, and although the isolates of different countries share the same clades pointing us in a direction that indicates the cross-border movement of these virus isolates.
The overall rate of nucleotide substitution was 3.34 × 10 −3 substitutions/site/year (with 95% HPD interval 1.72 × 10 −3 to 6.62 × 10 −3 ) for DNA-A of ChiLCV, which is very much similar to the nucleotide substitution rate of plant RNA viruses demonstrating a rapid rate of evolution. In addition, the average rate of nucleotide substitution for alphasatellites and betasatellites was 6.19 × 10 −3 substitutions/site/year and 9.17 × 10 −4 substitutions/site/year respectively (Table 1). The overall rate of evolution of all individual genes of DNA-A component exhibited a high rate of nucleotide substitution. The mutation rate (an important parameter for the calculation of the rate of evolution) of all the three codon positions among all the ORFs was observed in the CP gene at codon position 3 and for Pre-CP and C4 genes at codon position 2. Similarly, both alphasatellite and betasatellite have a high mutation rate at codon position 2. The other datasets (pre-CP and CP) had a higher mutation rate for the 2nd and 3rd codon positions (Table 1).

Recombination Analysis
Phylogenetic networks showing reticulation ( Figure 2) demonstrate clear evidence for recombination events. Putative recombination breakpoints are analysed by the RDP v. 4.2 package [25]. Analysis identified many unique recombination events in all the datasets. To avoid unreliable signals, only recombination events supported by at least three or more different methods with significant support were selected. Fifty-one recombination breakpoints were observed in ChiLCV DNA-A, mostly located in the C1 gene of the Csense strand and the V1 gene of the V-sense strand (Table 2a). These results were further confirmed by recombination breakpoint analysis for each ORF (C1, C2, C3, C4, V1 and V2) of DNA-A. The Rep gene (C1) was identified as the main contributing factor that was involved in intra-species recombination with fifteen breakpoints. Table 2b. CP (V1) regions showed five recombination breaks, and TrAP (C2) and Ren (C3) regions also showed three recombination breakpoints each. The putative recombination analysis for alphasatellite and betasatellite of ChiLCV showed that betasatellite has higher recombination events, i.e., seventeen recombination events more than alphasatellite (Table 2b).

Population Demography Analysis
The total number of mutations was η = 2104 in DNA-A, with the highest number of mutations in TrAP (426 nt) and Rep (399 nt) genes indicating that the diversification of the ChiLCV population is mainly driven by mutation in these two genes. ChiLCB has 1070 mutations, while ChiLCA has 727 mutations. ChiLCV DNA-A has a high degree of genetic variability (π > 0.08) i.e., π = 0.107, along with both the satellite molecules. Among all the ORFs of DNA-A, C4 (π = 0.183) and Rep (π = 0.175) gene had the highest nucleotide diversity (Table 3). Haplotype distribution analysis revealed different values among the 132 ChiLCV isolates, based on the six coding regions evaluated. Among the total sequences of ChiLCV (n = 132), the number of haplotypes ranged from 55 in pre-CP and C4 regions to 81 in CP region, with maximum haplotypes 113 in whole genome. Each isolate represented a maximum number of haplotypes at the CP region, showing high genetic variation within the coding gene.       (Table 4) suggesting a purifying selection and population expansion [26]. Similarly, for all the virus datasets statistical parameters like Fu and Li's D and Fu and Li's F tests, we have obtained negative and, in some case, positive values (Table 4)

Amino Acid Sites under Selections
The calculated dN/dS ratio was >1 for DNA-A, CP and pre-CP genes and 1.081 and 1.309 for ChiLCA and ChiLCB respectively (Table 4) (Table 4).

Discussion
ChiLCV is one of the most damaging begomoviruses and causes significant yield loss es in chilli production, worldwide. Due to the mixed cropping system and the polyphagous nature of the vector "whitefly", it leads to an overlapping host range of begomoviruses. ChiLCV has a wide host range and infects chilli, papaya, tomato, eggplant, hibiscus etc. [20]. The genetic structure of ChiLCV DNA-A with all six ORFs and associated satellite molecules was evaluated. This study elucidates the evolution and variability of the ChiLCV (whole genome and individual ORFs) and associated satellite molecules by using dated published genomic sequences. The two major factors contributing to the high genetic variability of begomoviruses are frequent recombination, which might considerably accelerate their evolution by increasing the permutations of pre-existing nucleotide polymorphism generated by mutations [8,27] and the high nucleotide substitution rate as rapidly as most RNA viruses [3,4,28]. Therefore, mutation and recombination are considered crucial factors involved in the genetic variability of ChiLCV populations [1,5,29]. In this study, we have analysed various parameters for ChiLCV isolates and shown which genes are more affected by the above factors. Satellite molecules (alphasatellite and betasatellite) are essential pathogenicity determinants for monopartite begomoviruses and have also been investigated [30,31].
The phylogenetic studies estimated from the whole-genome sequences of ChiLCV and associated satellite molecules were qualitatively congruent. The phylogenetic tree of DNA-A component represents a strong association of ChiLCV Indian isolates with isolates of Pakistan, Oman, and Sri Lanka ( Figure 1A). The phylogenetic relationship indicates that Indian isolates of ChiLCV played an important role in the evolution of the virus and are involved in intra-species recombination of ChiLCV isolates. The movement of whiteflies from India to the adjacent countries and similar cropping patterns may be the possible explanation for the intra-species recombination and phylogenetic relationship.
Analysis of homology and phylogeny further highlighted the evolutionary relatedness among ChiLCV populations arising from different countries. For instance, Indian ChiLCV isolates clustered with Pakistani isolates in clades II and III, whereas clade I contain mainly isolates from Oman. Furthermore, the emergence of new isolates of ChiLCV in the past era is now quite alarming for agriculture production as this virus is expanding the host range. Therefore, while the expanding host range of viruses is imperative to assess their evolutionary mechanisms; the diversity and genetic structure of viral populations in a single host are equally important to explain the evolutionary patterns [32]. The dominance of alphasatellites in India and betasatellites in Pakistan and India as compared to other regions might be attributed to the presence of suitable hosts and efficient transmission vectors [33]. To date, there is no evidence of chilli leaf alphasatellites from Pakistan, Oman, Sri Lanka, Bangladesh, or the Republic of Korea satellite. The majority of ChiLCA, found to be associated with chilli (excluding one host for Amaranthus, KF471047), explains the absence of suitable hosts in these regions. However, only one isolate of ChiLCB (JN638446; 2011) has been reported from Sri Lanka (Supplementary Table S1), the possible emergence of ChiLCV-associated satellites in the future cannot be neglected. Over the past few years, multiple infections of satellites were found to be associated with the ChiLCD complex, and additionally, alphasatellites were found in co-existence with betasatellites [34]. Because of the prevalent occurrence of mixed infection among begomoviruses, finding several and co-existing satellites may not be unusual [35]. Perhaps during whitefly-mediated transmission, satellite molecules might become associated with other viruses, forming new complexes and introducing them to disease-free regions. The two ChiLCA (KF584013 and KF471058) from India and the ChiLCB sequences JN638446 (SriLanka) showed a distinct outgroup with other clades, highlighting the importance of component recombination and reassortments.
Population genetics, along with recombination, are important factors influencing DNA virus evolution. Mutation plays a crucial role in genetic variation, on which recombination, natural selection, genetic drift, and gene flow act to shape the genetic structure of population [1], as shown in this work and previous studies with other plant viruses [12]. The isolates of ChiLCV exhibited a non-recombination structure in a more diversified form due to occurrence of maximum mutation then those to recombinant region. Hence, we found that the most viable recombinant gene, i.e., C1 has a low number of mutation sites shows high recombination breakpoints compared to C2 gene having a high number of mutation sites, shows low recombinant breakpoints. Therefore, the detected recombination patterns consequently seemed to have diverged from each other by point mutation, which highlights the genetic distribution likely involves the contribution of mutation in facilitating virus evolution.
Recombination among the sequence of all datasets might provide a high rate of evolution, rapid multiplication, and expansion of the host range. Various studies have revealed the high frequencies of recombination in begomovirus populations [36] and for ssDNA viruses which use a rolling circle replication mechanism, non-random sites of recombination events are a conserved trait [6,7,37]. In this study we observed at least one recombination breakpoint in analyzed datasets of DNA-A component and associated satellite molecules which strengthen the previous studies. The high recombination frequency in begomoviruses could leads to the emergence of new begomovirus species and helps to acquire satellite molecules [38]. In our dataset of ChiLCV we observed forty-seven recombination breakpoints with more than three algorithms implemented in RDP v. 4.2, suggests high genetic variation in ChiLCV genome (Table 1). Recombination in DNA-A and DNA-B components in bipartite begomoviruses and recombination with associated satellite molecules in monopartite begomoviruses were also reported along with intra-species recombination [39]. The Rep and CP gene in begomoviruses exhibits higher number of recombination as compare to other genes. This uneven presence of recombination events in begomoviruses genes supports that it is a major factor for genetic variation in begomoviruses. In our dataset of ChiLCV we observed the Rep and CP gene in begomoviruses exhibit higher number of recombination event as compared to other genes of DNA-A component. The associated betasatellite also showed higher number of recombination breakpoints than the alphasatellite ( Table 2). The Recombination analysis results obtained in this study also supports that the recombination is a driving force of the genetic variability in ChiLCV genome.
Numerous studies reported that the geminiviruses have a high nucleotide substitution rate, which is almost analogous to those of RNA viruses [14]. Here, the observed nucleotide substitution rate is higher than those considered for double-stranded DNA viruses [3]. The wide range of dN/dS values in a population implies that the populations may be under the influence of purifying selection or have experienced recent expansion [40]. In this study we observed the higher dN/dS ratio in CP and Pre-CP gene and lower dN/dS ratio in Rep and C4 genes of DNA-A component of ChiLCV (Table 4) revealing the occurrence of diversifying selection acting on virus genome. The wide range of dN/dS ratio in ChiLCV analyzed dataset (Table 4) demonstrates the presence of purifying selection and exhibits the strong negative selection in CP and Pre-CP gene of ChiLCV. As previous studies have already shown that in begomoviruses the most of sites are under purifying selection pressure, and few sporadic sites were identified as experiencing positive selection [41]. In our dataset we observed fewer sites in the ChiLCV are under positive selection. Positive selection sites were also detected in ChiLCB (34 sites) and ChiLCA (3 sites) ( Table 4).Our results support the fact that the positive selection is also acting as a major pressure responsible for the increased levels of genetic diversity in ChiLCV isolates.
In summary, this study further confirmed that ChiLCV populations are mostly influenced by mutation and recombination, which play a crucial role in the genetic diversification of the ChiLCV population. However, it needs to be determined how mutations are influenced by diverse hosts infected by ChiLCV. As we know that the chilli leaf curl disease caused by various begomoviruses involves enormous losses in chilli cultivation, worldwide and a matter of great concern for farmers as well as agricultural scientist. To combat this there is a great need of more studies related to evolving nature of ChiLCV. This study includes the analysis of major evolutionary driving forces such as mutation, recombination and natural selection in ChiLCV population, which helps us to understand the genetic variability in ChiLCV and to develop new strategies to control viral diseases in chilli and other susceptible crops.

Sequence Datasets and Multiple Sequence Alignments
Complete genome sequences (DNA-A) of 132 isolates of ChiLCV in which 54 isolates were reported from India, 49 from Oman, 25 from Pakistan, 2 from Bangladesh and one each from Korea and Sri Lanka, 75 complete sequences of ChiLCB and 15 of ChiLCA were retrieved from the NCBI GenBank database using the Taxonomy Browser (www.ncbi.nlm. nih.gov, accessed on 6 October 2021). Along with whole-genome sequences of ChiLCV and satellite molecules, multiple sequence alignments for all the six genes of DNA-A were also analysed using the Muscle algorithm implemented in MEGA X [42].

Phylogenetic and Coalescent Analysis
The Maximum clade credibility phylogenetic tree was constructed by using, the Bayesian method and Tree annotator tool available in BEAST v. 1.10 [43]. To select the best-suited nucleotide substitution model for each dataset MEGA X [42] had been employed. The resulting trees were visualized and edited in iTOL v6.5 (Interactive Tree Of Life) (https://itol.embl.de/#, accessed on 13 April 2022) [44]. To estimate the nucleotide substitution rates per site and mutations at various codon positions, the Bayesian Markov Chain Monte Carlo (MCMC) method obtainable in BEAST v. 1.10 [43] was used. Each data set was analysed by both relaxed and strict molecular clocks (uncorrelated exponential and uncorrelated lognormal). MCMC chains were run for sufficient length (10 7 ) and statistical uncertainty in the estimates was provided by the 95% highest probability density (HPD) value. Best-fit clock and coalescent constant demographic models were identified and achievement of suitable effective sample sizes for these parameters was estimated by using Tracer v1.5 [44].

Recombination Analysis
Phylogenetic network analysis was performed for evidence of recombination with the neighbor net method implemented in Splits Tree 4 [45]. To identify the parental isolates to substantiate the recombination events, breakpoints and origin of the virus spread were predicted by using RDP, GENECONV, MAXCHI, BOOTSCAN, CHIMAERA, SISCAN and 3SEQ methods implemented in RDP v. 4.2 [25] with default detection thresholds and 0.05 highest acceptable Bonferroni corrected p-value.

Population Demography Analysis
To investigate the nucleotide polymorphism various parameters were calculated by using DnaSP v. 6.0 [46]. The estimation of genetic diversity was determined by the number of polymorphic sites (S), total number of mutations (η), nucleotide diversity (π), number of haplotypes (h), haplotype diversity (Hd), Watterson's estimate of the population mutation rate based on the total number of segregation sites (θ − w) and the total number of mutations (θ − η). Neutrality tests were also performed using Tajima's D (nucleotide diversity with the proportion of polymorphic sites), Fu and Li's D* (difference between the number of singletons and the total number of mutations) and Fu and Li's F* (difference between the number of singletons and the average number of nucleotide differences between paired sequences) tests available in DnaSP v. 6.0.

Detection of Positive and Negative Selection at Amino Acid Sites
The ratio of non-synonymous to synonymous (dN/dS) substitutions was calculated by using standard parameters in MEGA X for every dataset. Ratio dN/dS > 1, dN/dS < 1, and dN/dS = 1 indicates positive (diversifying), negative (purifying), and neutral selection pressure, respectively. The detection of sites evolved under positive and negative selection was performed by three methods: single-likelihood ancestor counting (SLAC), fixed-effects likelihood (FEL) and fast unbiased Bayesian approximation (FUBAR) implemented in the DataMonkey (www.datamonkey.org, accessed on 19 March 2022) [47].
Author Contributions: M.M. and R.K.V. contributed equally to the experiments and computational work. V.P. and A.S. helped in computational analysis. R.G., A.A. and P.S. are the mentor and supervised the experiments, edited, proofread, and finalized the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: All sequencing data of virus isolates are available in the NCBI database. Further data analysis will be available from the corresponding authors upon request.