Biodiversity, Evolution and Ecological Specialization of Baculoviruses: A Treasure Trove for Future Applied Research

The Baculoviridae, a family of insect-specific large DNA viruses, is widely used in both biotechnology and biological control. Its applied value stems from millions of years of evolution influenced by interactions with their hosts and the environment. To understand how ecological interactions have shaped baculovirus diversification, we reconstructed a robust molecular phylogeny using 217 complete genomes and ~580 isolates for which at least one of four lepidopteran core genes was available. We then used a phylogenetic-concept-based approach (mPTP) to delimit 165 baculovirus species, including 38 species derived from new genetic data. Phylogenetic optimization of ecological characters revealed a general pattern of host conservatism punctuated by occasional shifts between closely related hosts and major shifts between lepidopteran superfamilies. Moreover, we found significant phylogenetic conservatism between baculoviruses and the type of plant growth (woody or herbaceous) associated with their insect hosts. In addition, we found that colonization of new ecological niches sometimes led to viral radiation. These macroevolutionary patterns show that besides selection during the infection process, baculovirus diversification was influenced by tritrophic interactions, explained by their persistence on plants and interactions in the midgut during horizontal transmission. This complete eco-evolutionary framework highlights the potential innovations that could still be harnessed from the diversity of baculoviruses.


Introduction
The use of baculoviruses (BVs) as expression vectors has mainly focused on the development of a single virus, namely Autographa californica multiple nucleopolyhedrovirus (AcMNPV) [1], and also, to a lesser extent, Bombyx mori nucleopolyhedrovirus (BmNPV) [2]. The commercial success of AcMNPV for biotechnological application is undeniable. Aside from historical reasons, the fact that it is a generalist virus that can infect cell lines from different hosts probably explains why AcMNPV was first chosen. The family Baculoviridae, however, encompasses hundreds of isolates, many of which have been studied in the context of biological control of insect pests, but some of which could in the future prove equally as useful as AcMNPV for biotechnological applications, by providing new molecular and biochemical products with contrasting antigenic properties or by infecting new, more productive cell lines. In this context, it is important to describe the taxonomical diversity of BVs to ensure that new interactions [25][26][27][28]. A better understanding of these diverse evolutionary interactions might point to new molecular targets that could lead to biotechnological innovations.
In this study, our main aims were to (1) provide a robust phylogeny for the whole family Baculoviridae using all the genetic data generated so far; (2) delineate BV species using a non-a-priori phylogenetic-concept-based approach; (3) study the evolutionary history of host use of BVs, assessing the level of host specialization and the role of host shifts in the speciation of BVs; and (4) elucidate the role of host plants in the evolution of BVs. Our results shed light on the biodiversity of the Baculoviridae and on the ecological and evolutionary factors that drive its diversification.

Virus Isolate Sequence Database
We created a DNA sequence database containing sequences of at least one of the four lepidopteran BV core genes, lef-8, lef-9, per os infectivity factor 2 (pif-2) and polh. We chose these genes because they have been shown to bear a strong phylogenetic signal [16] and because they have been abundantly sequenced. The database includes sequences collated from the public databases GenBank, EMBL and DDBJ (version April 2018). We also attempted to obtain sequences from one hundred historical BV samples originating from the reference collection at the Centre for Ecology and Hydrology (NERC-CEH), Wallingford, UK. Primers, PCR amplification protocols and Sanger sequencing have been described previously [7,15]. Following current taxonomic practice, each BV sample was associated with the host species from which it was isolated and with its occlusion body (OB) morphology (NPV or GV), as well as with sampling information about the virus isolate or strain (if provided) and the name of the first author of the study from which the BV sequences derived (Table S1).

Host Ecology Database
For each BV isolate studied, we associated host ecological data. We included the taxonomy of each insect host species (superfamily, family, subfamily), and their geographical distribution (ecozone), established from localities where insect hosts were observed in the field. We also included the host plant range of each insect host, from which we determined the associated insect host plant growth type, distinguishing woody perennial (including shrubby and suffrutescent plants) and annual/biennial herbaceous plants. Information was mainly extracted from the database of the world's lepidopteran host plants of the Natural History Museum of London [29], from the Barcode of Life Data System (BOLD) [30] and from the literature [31] (Table S2).

Baculovirus Core-Genome Phylogeny
A phylogenomic approach was used to reconstruct the BV core-genome phylogeny. BVs possess 37 core genes [32], identified in all completely sequenced genomes. Amino acid multiple alignments were performed on the 37 BV core gene products using the MAFFT program [33], and the alignments were concatenated prior to phylogenetic reconstruction. A maximum likelihood (ML) phylogenetic inference was performed on the concatenated multiple alignment under the Le and Gascuel amino acid substitution model, with gamma-distributed among-site rate variation and a proportion of invariant sites (LG + Γ + I), as determined by ProtTest 3 [34]. The ML analysis was performed with the RAxMLv.8.2 program [35], and statistical support for nodes in the ML tree was assessed using a bootstrap approach (with 100 replicates).

Baculovirus Isolate Phylogeny
For each of the four BV core genes (lef-8, lef-9, pif-2 and polh), codon-based multiple alignments were performed on all BV isolate sequences. Alignments were then concatenated and padded with gaps for isolates in which up to three of the four lepidopteran core genes were missing. An ML phylogenetic inference was performed on the concatenated codon-based multiple alignment with RAxMLv.8.2, using the BV core-genome phylogeny as a backbone constraint tree. The BV isolate phylogeny was reconstructed using the best-fitting substitution model and parameters (GTR + Γ + I), as determined by jModelTest 2 [36], and statistical support for nodes in the ML tree was assessed using a bootstrap approach (with 100 replicates).
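The gap-padding step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `concatenate_alignments`, the dictionary layout and the toy sequences are all hypothetical, and the sketch assumes each per-gene alignment has already been produced and that all sequences within it have the same length.

```python
from typing import Dict, List

# Gene order of the concatenated alignment (gene names from the text).
GENES = ["lef-8", "lef-9", "pif-2", "polh"]


def concatenate_alignments(
    alignments: Dict[str, Dict[str, str]], genes: List[str] = GENES
) -> Dict[str, str]:
    """Concatenate per-gene alignments, padding missing genes with gaps.

    `alignments` maps gene name -> {isolate name -> aligned sequence}.
    (Hypothetical data layout, chosen for illustration only.)
    """
    # Each gene block has a single width shared by all of its sequences.
    lengths = {}
    for gene in genes:
        sizes = {len(seq) for seq in alignments.get(gene, {}).values()}
        if len(sizes) > 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        lengths[gene] = sizes.pop() if sizes else 0

    # Every isolate that has at least one gene enters the supermatrix.
    isolates = sorted({name for gene in genes for name in alignments.get(gene, {})})

    supermatrix = {}
    for name in isolates:
        blocks = [
            alignments.get(gene, {}).get(name, "-" * lengths[gene])
            for gene in genes
        ]
        supermatrix[name] = "".join(blocks)
    return supermatrix


# Toy example: isolate B lacks polh, isolate C lacks lef-8.
toy = {
    "lef-8": {"A": "ATGAAA", "B": "ATGAAG"},
    "polh": {"A": "ATGCCC", "C": "ATGCCT"},
}
matrix = concatenate_alignments(toy)
```

In this toy case every row of `matrix` has the same length, and the missing gene blocks of isolates B and C are filled entirely with gap characters, which is the property a concatenated alignment needs before tree inference.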

Species Delimitation
A species delimitation analysis was performed on the BV isolate phylogeny using a phylogenetic-species-concept-based method, the multi-rate Poisson tree processes (mPTP) model [37]. The mPTP method is an improved version of the PTP model [9], which models speciation or branching events in terms of the number of substitutions and uses heuristic algorithms to identify the most likely classification of branches into population-level and species-level processes. Moreover, mPTP is fast and accommodates different levels of intraspecific genetic diversity deriving from differences in either the evolutionary history or the sampling of each species [37]. The mPTP model delimits species on phylogenies without a priori assumptions, rather than relying on the commonly used genetic-distance cut-off. The mPTP results were compared to those of the commonly used genetic-distances-based method. A previous study calculated an intra-species genetic distance of ≤0.015 (up to 0.05 with complementary information) as a marker to delimit BV species [7]. This distance is commonly used by BV taxonomists and was evaluated using the Geneious plugin SpDelim [38], which assesses the within-species and between-species genetic distances in a phylogenetic tree. For both the mPTP and SpDelim analyses, the BV isolate ML phylogeny was used.
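To make the distance cut-off idea concrete, the following Python sketch groups isolates whose pairwise distance is at most 0.015. It reproduces neither SpDelim nor mPTP. The uncorrected p-distance and the single-linkage grouping are assumptions made for illustration (the distance model of the cited study is not restated here), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites among positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return None  # no shared sites: the distance is undefined
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster_by_threshold(seqs, threshold=0.015):
    """Single-linkage clusters of isolates at the given distance cut-off."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        d = p_distance(seqs[a], seqs[b])
        if d is not None and d <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy aligned sequences of 100 sites (hypothetical isolates).
toy_seqs = {
    "iso1": "A" * 100,
    "iso2": "A" * 99 + "C",       # 1 difference from iso1 (distance 0.01)
    "iso3": "A" * 90 + "C" * 10,  # 10 differences from iso1 (distance 0.10)
}
putative_species = cluster_by_threshold(toy_seqs)
```

Here `iso1` and `iso2` fall into one cluster and `iso3` remains a singleton. A fixed cut-off of this kind applies the same threshold to every lineage, which is the limitation that tree-based methods such as mPTP are meant to address.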

Baculovirus Species Phylogeny
From the species delimitation analysis, one isolate per cluster was selected as a representative of a putative BV species (e.g., isolates with a complete genome or isolates with the four core genes sequenced) and singletons were considered as distinct putative BV species. The topology of these species-representative isolates was extracted from the BV isolate phylogeny by pruning the other taxa, using the 'ape' package [39] for R. A new amino acid multiple alignment was created with only the set of the species-representative isolates. A Bayesian phylogenetic inference was performed on this alignment using BEASTv.1.8.4 [40]. The substitution model and parameters LG + Γ + I were used as well as a fixed strict clock of 1.0. The topology of the above species-representative isolate tree was used as target tree for the calculation of the consensus tree and posterior probabilities.

Phylogeny-Trait Correlation and Ancestral State Estimations
Correlations between phylogenetic tree structure and discrete isolate trait values (see Table S2) were assessed using the methods implemented in BaTS/befi-BaTS [41]. The number of state randomizations used to generate the null distribution was set to 100. A correlation was considered unambiguously positive if the probabilities for the Association Index (AI), the Parsimony Score (PS), the Phylogenetic Diversity (PD), the Nearest Taxa (NT) and Net Relatedness (NR) indices, the Unique Fraction (UniFrac) index and the maximum monophyletic clade (MC) size were all ≤0.01. Phylogenetic uncertainty was taken into account by using the set of tree topologies estimated by the above Bayesian phylogenetic inference. For those traits that obtained a significant p value, a new Bayesian phylogenetic inference was performed on the species-representative isolate alignment with the same set of parameters as above, adding discrete trait partitions to reconstruct ancestral states.
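The logic of such a randomization test can be sketched in Python for one of the statistics listed above, the Parsimony Score. This is not the BaTS implementation. The nested-tuple tree format, the function names, the toy tree and the add-one correction in the p value are all assumptions made for illustration, and the sketch works on a single fixed tree rather than on a posterior set of trees.

```python
import random


def fitch_score(tree, states):
    """Return (state set, number of changes) for a binary nested-tuple tree.

    Leaves are tip names (str); internal nodes are 2-tuples of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], states)
    right_set, right_changes = fitch_score(tree[1], states)
    changes = left_changes + right_changes
    common = left_set & right_set
    if common:
        return common, changes
    # Disjoint child sets imply one state change at this node (Fitch parsimony).
    return left_set | right_set, changes + 1


def tip_names(tree):
    """List the tip names of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def parsimony_permutation_test(tree, states, n_random=100, seed=0):
    """Compare the observed parsimony score with tip-shuffled null scores."""
    rng = random.Random(seed)
    names = tip_names(tree)
    observed = fitch_score(tree, states)[1]
    values = [states[name] for name in names]
    hits = 0
    for _ in range(n_random):
        shuffled = values[:]
        rng.shuffle(shuffled)  # randomize trait states across tips
        null_score = fitch_score(tree, dict(zip(names, shuffled)))[1]
        if null_score <= observed:  # fewer changes = stronger association
            hits += 1
    # Add-one correction (an assumption of this sketch, not taken from BaTS).
    return observed, (hits + 1) / (n_random + 1)


# Toy tree: one clade on woody-plant hosts, one on herbaceous-plant hosts.
toy_tree = ((("v1", "v2"), ("v3", "v4")), (("v5", "v6"), ("v7", "v8")))
toy_states = {f"v{i}": ("woody" if i <= 4 else "herbaceous") for i in range(1, 9)}
observed_ps, p_value = parsimony_permutation_test(toy_tree, toy_states)
```

In the toy tree a single change explains the trait distribution, so the observed score is 1. A trait counts as phylogenetically conserved when few randomized trees reach a score that low.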

Host-Virus Cophylogeny
To test for the existence of statistically significant topological congruence between BV species and Lepidoptera hosts, we estimated the maximum number of cospeciation events at different taxonomic levels (superfamily, family, subfamily and genus), needed for reconciling BV species and host phylogenies in TreeMap 3b [42] and compared this estimate to the distribution of corresponding values obtained by randomizing different Lepidoptera trees 100 times while keeping BV-host associations unchanged [43]. Tests at superfamily, family and subfamily taxonomic levels compared the whole BV species phylogeny to published Lepidoptera phylogenies [44][45][46]. Tests at the genus level were performed by reconstructing lepidopteran host species phylogenies based on published phylogenies (Noctuidae [47], Erebidae [48,49], Lasiocampidae [50], Spodoptera genus [51]) or when not available on CO1 sequence data extracted from BOLD public database [30] and compared to the BV species tree simplified by pruning monophyletic clades of species sharing a given host genus down to a single species. Analyses were performed using 25 starts from random maps and heuristic searches for up to 25 generations.

Baculovirus Phylogeny
Here we present a comprehensive phylogeny of BVs, including most of the relevant BV genetic data available to date. A data-mining analysis was performed on public genetic databases and resulted in the collation of 749 BV isolates properly formatted, containing the nucleotide sequences of at least one of four lepidopteran BV core genes (lef-8, lef-9, pif-2 and polh genes). We were also able to determine the sequences (38 lef-8, 39 lef-9, 21 pif-2 and 45 polh) for 45 historical isolates from our own collection. The 143 sequences were deposited in the GenBank database under the accession numbers MH454109-MH454230 and MH458171-MH458191. Overall, our working sequence database contains 2053 nucleotide sequences (564 lef-8 genes, 498 lef-9 genes, 283 pif-2 genes and 708 polh genes) belonging to 794 BV isolates (Table S1). A total of 235 isolates have the sequences of the four genes.
Among the BV isolates, 217 have complete genomes and were used to reconstruct the highly supported BV core-genome phylogeny (Figure S1), inferred from the concatenated multiple amino acid alignment of the 37 BV core genes. The tree topology is in accordance with previous studies showing four BV genera infecting three distinct insect orders: the genera Alphabaculovirus and Betabaculovirus infect lepidopteran species, the genus Gammabaculovirus infects hymenopteran species and the genus Deltabaculovirus infects dipteran species [14,15]. The topology obtained was used as a backbone constraint tree to reconstruct the BV isolate phylogeny. The BV isolate phylogeny (Figure 1 and Figure S2) is quite well supported and is in accordance with previous studies showing similar topologies [7,14,17], notably with a separation of two major monophyletic subgroups of alphabaculoviruses (Group I and Group II). In addition, four minor monophyletic subgroups are outgroups to Groups I and II. Within Group I we observed a clear division into two distinct monophyletic clades (I.a and I.b), and Group II is divided into three distinct monophyletic clades (II.a, II.b and II.c) (Figure 1 and Figure S2).

Species Delimitation
To infer macroevolutionary processes, it is necessary to conduct analyses at the species level to avoid interference from intraspecific diversity. A species delimitation analysis was performed on the BV isolate tree (Figure 1; see also Figure S2) using the mPTP method and then compared to the commonly used genetic-distances-based method. The mPTP approach delimited 165 distinct putative BV species, of which 70 were derived from clusters of two or more isolates and 95 are singletons (based on a unique viral isolate) (Figure 1; see also Figures S2 and S3). The dataset includes 38 new BV species from our historical isolates. The genetic-distances-based method delimited 178 putative BV species (72 clusters and 106 singletons; Figure 1; see also Figures S2 and S4). The two approaches show 157 species in common; the differences in the genetic-distances-based method were mostly observed in alphabaculovirus species showing high levels of sampling and genetic variability (Figure 1; see also Figure S2).
Figure 1. Baculovirus isolate phylogeny. The tree was obtained from a maximum likelihood inference analysis of the concatenated codon-based alignment (794 taxa) of four lepidopteran baculovirus core genes, with the baculovirus core-genome phylogeny used as backbone tree (Figure S1). External clades colored in red correspond to clusters determined by both the mPTP (Figure S3) and SpDelim (Figure S4) species delimitation analyses, and clades in blue to clusters not determined by SpDelim. The two star symbols point out the same node in the tree. Baculovirus isolate sequences generated in this study are highlighted in green. Statistical support for nodes in the tree corresponds to bootstraps (with 100 replicates).
Within the 165 distinct species identified by mPTP, 116 putative species are alphabaculoviruses, 45 are betabaculoviruses, three are gammabaculoviruses and one is a deltabaculovirus (Table S3). The 10th report of the ICTV [52] currently only recognizes 68 species in the Baculoviridae: 40 in the genus Alphabaculovirus, 25 in the genus Betabaculovirus, two in the genus Gammabaculovirus and one in the genus Deltabaculovirus [14]. Our analysis leading to a reduced tree containing only putative BV species thus suggests 97 new putative BV species, including 76 alphabaculoviruses and 20 betabaculoviruses (Table S3).
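The species tallies above can be cross-checked with a few lines of Python. All counts are taken from the text. The per-genus subtraction assumes a simple one-to-one correspondence between ICTV species and mPTP species, which is a simplification made here for illustration, and the variable names are hypothetical.

```python
# Putative species per genus delimited by mPTP (counts from the text).
mptp = {
    "Alphabaculovirus": 116,
    "Betabaculovirus": 45,
    "Gammabaculovirus": 3,
    "Deltabaculovirus": 1,
}

# Species per genus recognized by the ICTV (counts from the text).
ictv = {
    "Alphabaculovirus": 40,
    "Betabaculovirus": 25,
    "Gammabaculovirus": 2,
    "Deltabaculovirus": 1,
}

# Putative new species per genus, by simple subtraction (assumption of this sketch).
new_species = {genus: mptp[genus] - ictv[genus] for genus in mptp}

total_mptp = sum(mptp.values())
total_ictv = sum(ictv.values())
total_new = sum(new_species.values())
```

Under this assumption the totals are 165, 68 and 97. The 97 putative new species split into 76 alphabaculoviruses, 20 betabaculoviruses and one gammabaculovirus, the last of which is implied by the totals rather than stated in the text.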

Phylogenetic Conservatism and Host Shifts
Out of the 165 species identified by mPTP, the 161 species of the genera Alphabaculovirus and Betabaculovirus are associated with 187 Lepidoptera species from 24 different families. For all those lepidopteran hosts, we compiled a dataset with the taxonomy (superfamily, family and subfamily), the biogeographical distribution and the insect host plant range, from which we determined the host plant growth type (woody versus herbaceous) (Table S2). We performed phylogeny-trait correlation and ancestral state estimation analyses on the BV species phylogeny to measure whether those traits have evolved randomly or show phylogenetic conservatism. The different phylogeny-trait correlation tests show unequivocal significant associations between the BV species phylogeny and both the taxonomy of insect hosts and the insect host plant growth type (AI, PS, PD, NT, NR, UniFrac and MC; p ≤ 0.01, Table 1). In contrast, no significant association was found between the insect host biogeographical distribution and the BV species phylogeny, as shown by the high p values of certain tests (NR, UniFrac and MC; p > 0.01, Table 1). Ancestral state estimations on the BV species tree were performed on the traits that showed significant associations, namely host taxonomy and insect host plant growth type. The estimations show highly significant levels of phylogenetic conservatism with the different host taxonomic levels and the insect host plant growth type (Table 1). For the host taxonomy trait, we illustrate only the insect host superfamily optimization, as it is the highest taxonomic level and has a reduced number of character states (Figure 2). The ancestral state estimation of the insect host superfamilies shows that closely related BV species tend to infect closely related lepidopterans (BVs cluster together according to the lepidopteran superfamilies) (Figure 2A). The host use optimization identifies owlet moths (Noctuoidea) as the most likely ancestral hosts of BVs (Figure 2A).
Despite significant levels of phylogenetic conservatism, several major host shifts across different lepidopteran superfamilies have occurred during BV evolution. Thus, in the Betabaculovirus genus we can see a host shift from Noctuoidea to the Tortricoidea superfamily, followed by the colonization of several superfamilies (Papilionoidea, Zygaenoidea and Bombycoidea) and a shift back to the Noctuoidea (Figure 2A). The Alphabaculovirus genus splits into two large lineages (Group I and II) and four small lineages (Figure 2A). Group I colonizes several Lepidoptera superfamilies: Clade I.a shows no host conservatism and harbors a very diverse host range, whereas Clade I.b shows a medium level of host conservatism, with shifts from Noctuoidea to Tortricoidea and Papilionoidea. Group II shows a clear split between Clades II.a, II.b and II.c, with Clades II.a and II.b showing strong conservatism towards Noctuoidea and Clade II.a being specific to Noctuoidea with no apparent host shift. Both Clades II.b and II.c show host shifts from Noctuoidea to Geometroidea and Lasiocampoidea and colonization of Bombycoidea, Tortricoidea, Tineoidea and Papilionoidea (Figure 2A). Insect host plant growth type (Figure 2B), which can be considered as the local habitat of the virus, shows strong phylogenetic conservatism (Table 1). Our analyses show woody plants as the most likely ancestral ecological niche of the first lepidopteran BVs. The Betabaculovirus genus is ancestrally associated with herbaceous plants before colonizing woody plants. The Alphabaculovirus genus is ancestrally associated with woody plants. Group I shows a strong association with woody plants with few shifts to herbaceous plants. The ancestors of Group II were likely associated with hosts feeding on woody plants, with a later colonization of herbaceous plants (Clade II.a). The three small alphabaculovirus clades that are outgroups to Groups I and II also show a shift to herbaceous plants.
The most notable result of these analyses is the long-term association of BV lineages with a particular insect host plant growth type. Strikingly, within Group II of the Alphabaculovirus genus we noticed a split among viruses infecting Noctuoidea, which seems to result from the colonization of the herbaceous plant habitat in Clade II.a (Figure 2).

Cophylogeny between Baculoviruses and Their Lepidopteran Hosts
The association between the topology of the BV species phylogeny and that of their lepidopteran hosts (at superfamily, family and subfamily levels) is not significantly different from random. In contrast, we found significant topological congruence (p < 0.05), meaning that the BV tree is the mirror image of the lepidopteran host tree, for six BV clades at different taxonomic levels (see the six clades denoted by rectangles in Figure 2). All the clades showing significant topological congruence with host topologies belong to the Alphabaculovirus genus. One clade infecting Tortricoidea hosts was detected in Group I, whereas four clades infecting Noctuoidea hosts were identified, notably three in Group II. One clade infecting Lasiocampoidea was also detected in Group II.

Reconstructing Phylogenies and Delineating Species
The Baculoviridae is by far the best described insect DNA virus family. For the last 50 years, BV use in biotechnology, either as expression vectors [1] or as microbial control agents for insect pests [53], has led to a vast production of molecular data, especially for the genera infecting the Lepidoptera, on which this study is focused. The first objective of this study was to reconstruct the most accurate and exhaustive BV isolate phylogeny in order to set a solid framework for species delimitation and macroevolutionary inference.
The backbone of the tree was built based on 217 whole BV genomes (Figure S1). A secondary ML analysis included an additional 577 isolates, from which we obtained our isolate tree (Figure 1 and Figure S2). We chose to cluster these viral isolates into species, which are the basis of biological classification, in order to study BV speciation at a macroevolutionary scale. BVs, like most large DNA viruses, are slow-evolving viruses with an approximate mutation rate of 10⁻⁶ to 10⁻⁷ (expressed as the number of substitutions per nucleotide per generation, a generation being defined as a cell infection in viruses) [54]. This mutation rate approaches those observed in bacteria and lower eukaryotes. Consequently, we decided to use a phylogenetic-species-concept-based clustering approach, the mPTP model [37], to delimit BV species and to compare the results to the commonly used genetic-distances-based method (intra-species distance ≤0.015; up to 0.05 with complementary information [7]). Our species delimitation results were mostly consistent with the current taxonomy proposed by the ICTV [14] (Table S3). The only difference is that three species were found to each include two species classified by the ICTV (Table S3). Out of 165 BV species, we were able to characterize 97 that are not yet included in the ICTV report. The BV species phylogeny reflects current knowledge of BV diversity and phylogenetic relationships (Figures 1 and 2). Furthermore, this study is the first to use a phylogenetic clustering approach for species delimitation in viruses, showing that its utility goes beyond vertebrate [55], invertebrate [56,57] and bacterial [12] taxa.
Moreover, this approach for species delimitation fully respects the phylogenetic species concept and is less arbitrary than commonly used genetic distance approaches, which do not take into account differences in molecular evolutionary rates or sampling proportions and may thus vary depending on the biology of the lineage studied and on the gene used for phylogenetic reconstruction. Nevertheless, both species delimitation approaches gave relatively consistent results with an overlap of 157 species determined by both methods. The few differences in the genetic distances-based method were mostly observed in heterogeneous species clusters, with isolates infecting one or several hosts and showing high genetic diversity typically resulting in clusters with genetic distances between 0.015 and 0.050 [7]. The phylogenetic-based approach does not contradict the results of the genetic-distances-based approach but gives additional information to resolve the uncertainties of the genetic-distances-based approach.

Evolution of Host Use and Taxa Sampling
Most of the BV species identified in our study have isolates that infect only one lepidopteran host species. Strikingly, we generally found that closely related hosts belonging to the same genus were infected by different viral species (for example in the genera Spodoptera, Lymantria, or Malacosoma). This leads us to question the ecological reality and biological meaning of some isolates, such as Trichoplusia ni NPV and Busseola fusca NPV within the Helicoverpa armigera NPV clade ( Figure 1). However, generalist viruses, capable of infecting different host species, belonging or not to the same genus, exist and the hosts they infect generally have overlapping ecological niches (same host plants). As an example, several isolates from different species of nymphalid butterflies that feed on nettles (Urtica dioica) form a single alphabaculovirus species Vanessa atalanta NPV.
As parasites replicating exclusively in host cells, BVs are involved in durable and intimate obligate interactions with their hosts, implying long-term coevolution. The phylogenetic conservatism results suggest an ancestral and frequent association with hosts of the Noctuoidea superfamily. This could reflect the actual evolution of lepidopteran BVs and their current host range since, with ~42,000 out of ~157,000 described lepidopteran species, the Noctuoidea is the most diversified superfamily (in comparison, the second most abundant superfamily is the Geometroidea with ~23,000 species) [31]. Yet, BV genetic data from public databases are clearly biased towards agro-economically important lepidopteran pests, which are characterized by large populations that are important for sustaining BV populations. The addition of new BV samples from our collection tends to reduce the bias towards pests, but BVs from pests still dominate our taxon sampling, as these viruses have been isolated and resequenced many times (e.g., the large clusters of Cydia pomonella GV or Helicoverpa armigera NPV; Figure 1) and as at least 62 out of 161 putative lepidopteran BV species are associated with pests (Figure 2A; Table S2). As numerous lepidopteran pests belong to the Noctuoidea superfamily, this probably increases the representation of Noctuoidea-infecting BVs in our dataset and could have biased our phylogenetic conservatism results (77 out of the 161 putative lepidopteran BV species analyzed attack Noctuoidea, Figure 2A). Only a more diverse BV sampling, more representative of lepidopteran diversity, could confirm whether Noctuoidea played a key role in BV evolution.

Cophylogeny and Host Shifts
Cophylogenetic analyses show no topological congruence between Lepidoptera and the BV species tree, but do show significant cophylogenetic signal between certain internal nodes of the BV species phylogeny and the associated insect host species nodes (Figure 2). This means that present host-use patterns in BVs result mainly from a pattern of host conservatism punctuated with occasional shifting among pre-existing insect lineages [58,59]. This coevolutionary association concurs with a general process of colonization by host tracking [60,61] as previously suggested at the macroevolutionary level of several insect DNA virus families [18]. However, this type of coevolutionary association is only observable for crown groups of viruses infecting hosts belonging to the same family and is lost in more basal nodes. Indeed, over a short timeframe BV speciation remains intimately linked to the speciation of their host, following the biogeography of their hosts and ultimately in certain lineages we observed BV phylogenies that are the mirror images of their host's. Yet on a larger evolutionary scale, the insect-host coevolutionary relationship signal is confused, strongly suggesting that other factors act on BV evolution.

Ecological Specialization
The distinctive feature of the BV life cycle compared to most viruses is that BVs produce a transmission stage, which persists outside of the host and has the ability to resist environmental degradation. This feature is also found in other insect viruses such as entomopoxviruses and cypoviruses, which have similar life cycles and OBs. This highlights the importance and the persistence of this dissemination process in insect viruses [62]. As BVs only infect larvae and need to be ingested to initiate infection, they have an intimate association with the plants that their hosts feed on. In addition, there is an increasing body of evidence showing that host plant chemistry can moderate the BV infection process [25,28]. Thus, host plant characteristics could define the local biotopes of BVs. We therefore searched for conservatism with the type of plant used by insect hosts, distinguishing woody perennial (including shrubby and suffrutescent plants) and herbaceous plants. The results show ancestral associations with particular plant groups over several tens of millions of years (Figure 2B), suggesting a pattern of plant-use conservatism punctuated by sporadic shifts between plant growth types. This underlines the predominant role of host plant association in BV evolution. As a consequence, BV diversification entangles patterns of host and local biotope conservatism.
Virus ecological niches are generally considered to be defined only by their hosts, which underestimates other factors and notably the influence of the environment. The BV niche combines a set of insect and insect host plant biotic conditions. Viruses consume the resources provided by their hosts in a host-tracking fashion and consequently are influenced by their hosts' geographical distributions; hosts are therefore the set of biotic conditions where primary speciation takes place through adaptation to insect immunity and competition with other parasites. Host spectrum and host shifts are contained within local environments represented by a group of host plants. Host plant use therefore defines a second ring of biotic environment that BVs experience. The association with particular types of plants persists over millions of years and drives BVs towards particular insect hosts. This is strikingly observable for Group II of the Alphabaculovirus genus, where BVs infecting Noctuoidea species are split into two distinct clades: in one clade (Clade II.a) Noctuoidea species are associated with herbaceous plants, and in the other clade (Clade II.b) they are associated with woody plants (Figure 2).
At the level of the evolutionary history of BVs, our study points out the specialization of BVs highlighted with topological congruence of certain BV-hosts associations. At the macroevolutionary timescale, BVs are specialized to a particular insect order and notably to three orders of holometabolous insects [15]. The BVs infecting Lepidoptera separate into two major groups with different types of OBs, the NPVs and the GVs. GVs (Betabaculovirus) seem to be ancestrally associated to herbaceous plants and to have colonized woody plants later. Strikingly, some GVs are associated with internal feeding hosts such as the codling moth, Cydia pomonella, the potato tuber moth, Phthorimaea operculella or the tea leaf-roller micromoth Caloptilia theivora and these types of hosts are not infected by NPVs. The morphology of GV OBs, which are much smaller than those of NPVs, may confer a more targeted dispersal strategy to increase the likelihood of reaching hosts concealed within the plant. Therefore, it is predicted that leaf-mining and/or stem-boring Lepidoptera should more likely be infected by GVs than by NPVs.
NPVs (Alphabaculovirus) appear to be ancestrally associated with woody plants and to have colonized herbaceous plants afterwards. NPVs belonging to group II are often very specialized to their hosts according to topological congruence with their associated hosts and the clear split of herbaceous (clade II.a) and woody (clade II.b) clades. By contrast, NPVs belonging to group I show a complex pattern of associations to several lepidopteran superfamilies on herbaceous and woody plants, suggesting more frequent host switches than those observed for group II or GVs. Remarkably, this group includes the well-known Autographa californica MNPV, which has a large host range, spanning different Lepidoptera species belonging to distantly related families [63] (Figure 1; Figure S2). This BV is a generalist virus, which is an uncommon trait in BVs. Generalism could favor host switching and explain loss of BV-host phylogenetic congruence in certain BV lineages, notably in the clade I.a of the Alphabaculovirus genus.
The host is by far the most important component of virus ecological niches, but not the only one. Most previous studies focused on microevolutionary patterns of diversification, resulting in host distribution outcomes to explain virus dynamics [64,65]. Here, we discuss the macroevolutionary patterns of an entire virus family based on a reconstruction, as exhaustive as possible, of its history. BVs have a peculiar life cycle where insect hosts and their associated plants are entangled. Viral transmission and fitness are increased with the typical production of OBs for environmental dissemination combined with the modification of host behavior, like the enhancement of climbing behavior [66,67]. Plants are therefore the vessel of viral transmission. Plants attacked by many lepidopteran species could support the evolution of generalist viruses. This in turn could promote host shifts with the subsequent specialization of particular viral lineages. Conversely plants attacked by few specialist Lepidoptera should foster the evolution of specialist viruses. Multitrophic interactions have thus shaped BV evolution as shown by the combined patterns of insect host and insect host plant conservatism, punctuated by occasional shifts among pre-existing insect lineages. However, the direct evolutionary interactions between plants and BVs remain undetermined.

Conclusions
Our current understanding of BV diversity and evolution has been fueled by decades of research on biological control and biotechnological applications. Here, we presented a complete evolutionary framework of the known BV diversity, highlighting the complex ecological interactions of these viruses with their hosts. Our study is the first to use a phylogenetic clustering approach for species delimitation in viruses, characterizing many new BV species. Our analyses show that host shifts played a major role in the diversification of BVs. They also show that the colonization of a new ecological niche (herbaceous plants) led to the radiation of some BV lineages. BV species richness results from millions of years of evolutionary interactions between the host plant ecology and chemistry and the physiology and ecology of both BVs and their associated insect hosts. This is conveyed at the genome level by an average of 150 genes, of which only 37 are core genes shared by all BVs, and a further 26 are shared by all lepidopteran BVs [32]. The remaining so-called accessory genes may encode specific adaptive proteins [68] that could prove useful to improve expression vectors or agricultural biocontrol applications. Indeed, within the catalogue of BV genes, there are still many genes of unknown function, including some that could enhance viral replication and protein production capacity, or that could modulate virulence or host specificity. It is thus quite probable that the diversity of the family Baculoviridae still hides a treasure trove of genes and molecules that could lead to innovative biotechnology.