Spatio-Temporal Evolutionary Patterns of the Pieridae Butterflies (Lepidoptera: Papilionoidea) Inferred from Mitogenomic Data

Pieridae is one of the largest and almost cosmopolitan groups of butterflies, which plays an important role in natural ecosystems; however, to date, its phylogeny and evolutionary history have not been fully resolved. In this study, we obtained the complete or nearly complete mitochondrial genomes of 100 pierid taxa (six newly sequenced, sixty extracted from the whole-genome data, and thirty-four directly available from GenBank). At the same time, for the first time, we conducted comparative mitogenomic and phylogenetic analyses based on these mitogenomic data, to further clarify their spatio-temporal evolutionary patterns. Comparative mitogenomic analysis showed that, except for cox2, the GC content of each of the 13 protein-coding genes (PCGs) in the rapidly diverging subfamily Pierinae was higher than in its sister group Coliadinae. Moreover, the dN/dS values of nine genes (atp6, atp8, cox1, cox3, cob, nad1, nad3, nad5, and nad6) in Pierinae were also relatively higher than those in its sister group, Coliadinae. Phylogenetic analysis showed that all the resultant phylogenetic trees were generally in agreement with those of previous studies. The Pierinae family contained six clades in total with the relationship of (Leptosiaini + (((Nepheroniini + Arthocharidini) + Teracolini) + (Pierini + Elodini))). The Pieridae originated in the Palearctic region approximately 72.3 million years ago in the late Cretaceous, and the subfamily Pierinae diverged from this family around 57.9 million years ago in the Oriental region, shortly after the K–Pg mass extinction event; in addition, the spatio-temporal evolutionary patterns of Pierinae were closely correlated with geological events and environmental changes, as well as the host plant coevolutionary scenario in Earth’s history. However, some incongruencies were observed between our results and those of previous studies in terms of shallow phylogenies for a few taxa, and should be further investigated.


Introduction
As herbivores and pollinators, butterflies play a fundamental role in terrestrial ecosystem and are also indicators of environmental change [1]. Additionally, butterflies serve as models for research on community ecology, speciation, biogeography, and plant-insect interactions. Their diversification are presumed to have coevolved with angiosperms insects [2][3][4]. Pieridae is one of the major butterfly groups, comprising approximately 1100 described species in 85 genera worldwide [5][6][7]. In previous decades, many efforts were made to clarify the phylogeny of Pieridae by using morphological, molecular (mitochondrial and nuclear genes), or combined data, and currently, the monophyly of its four subfamilies (Pseudopontiinae, Dismorphiinae, Coliadinae, and Pierinae) are well established [6][7][8]. However, the relationships among Pieridae crown lineages, especially some tribes or genera within the rapidly diverging subfamily Pierinae, remain relatively unclear. The Pierinae species is divided into two tribes (Anthocharidini and Pierini) primarily based genomes, covering 56 genera, were first used to reconstruct the phylogenetic trees of the main lineages of pierid butterfly species; then, their divergence times were estimated with relaxed molecular dating methods using multiple calibrations, and their ancestral geographic distributions were constructed based on current geographic distributions, to further clarify their spatio-temporal evolutionary patterns.

DNA Extraction, Mitogenome Sequencing, and Assembly
The adult individuals of six newly identified pierid butterfly species (Supplementary Table S1) were collected and preserved in 100% ethanol for fixation and stored at −20 • C (College of Life Sciences, Anhui Normal University) until subsequent experiments. The total genomic DNA was extracted from a single specimen with a QIAGEN DNeasy Blood and Tissue Kit (Hilden, Germany), following the manufacturer's protocols. Briefly, a total amount of 0.2 µg DNA per sample was used as input material for DNA library preparations, and the sequencing library was generated using NEB Next ® Ultra™ DNA Library Prep Kit for Illumina (NEB, San Diego, CA, USA) following the manufacturer's protocols. The genomic DNA sample was fragmented via sonication to an average size of 350 bp and was sequenced on an Illumina platform (Illumina novaseq6000; Novegene, Tianjing, China) with 150 bp generated paired-end reads. Each sample was generated about 10 Gb of raw data, together with the 60 whole-genome data directly downloaded from GenBank, which were filtered out low quality reads, adapter contamination and ambiguous bases using fastp v0.20.0 [33] with the parameters of "−q 15 −n 10 −u 40". Subsequently, a total of 66 mitogenomes were assembled using the GetOrganelle v1.7.0 [34] program with default settings (minimum and maximum k values of 21 bp and 155 bp).

Gene Annotation and Mitogenome Analysis
The secondary structures of tRNAs were predicted using MITOS2 [35] with the invertebrate mitochondrial genetic code. The PCGs, rRNAs, and control region boundaries were determined by the positions of tRNAs. In addition, to ensure more accurate gene boundaries, PCGs and rRNA genes were verified through sequence comparison with other Pieridae mitogenomes and then confirmed with manual calibration using SeqMan [36]. PCGs were also translated into amino acids based on the invertebrate mitochondrial genetic code. The tandem repeats of the control region were predicted using the Tandem Repeats Finder online server (http://tandem.bu.edu/trf/trf.html, accessed on 18 March 2022) [37]. The base composition and relative synonymous codon usage (RSCU) of the 100 mitogenomes and the GC content of each gene for each species were analyzed using DAMBE 7 [38]. The ratio of non-synonymous (dN) to synonymous substitutions (dS) for each PCG among species was calculated with DnaSP v6.0 [39].

Measures of Nucleotide Variation
The substitution saturation of the 13_PCGs (13 protein-coding genes), 15_genes (the combination of 13 PCGs and 2 rRNAs), and 13PCGs_codon3 (the third position of 13 PCGs) were also tested in DAMBE 7 [38], with the GTR model selected as a reference model. Moreover, mitogenomes have proven to be powerful in defining the major lineages at the subfamily level and below, while they are usually affected by shifts in both evolutionary rates and nucleotide composition. In addition, owing to the shifts that existed in both the evolutionary rate and nucleotide composition, the heterogeneity of sequence divergence within datasets relative to an external reference (outgroup) was analyzed with AliGROOVE v1.07 [40]. The default sliding window size was used, with indels in the nucleotide dataset treated as ambiguity, to investigate the heterogeneity in these mitogenomic data.

Phylogenetic Analysis
Phylogenetic analysis was conducted with two datasets: (1) 13_PCGs and (2) 15_genes. In our analysis, 100 species representing three subfamilies, nine tribes of Pieridae, and fifty-six genera were used to reconstruct the phylogenetic trees, with three Papilionidae and three Hesperiidae species used as the outgroups (Supplementary Table S1). Nucleotide sequences for each of the 13 PCGs were translated into amino acids, separately aligned with muscle implemented within MEGA 7 [41]; the rRNAs were aligned using MAFFT 7.490 [42], with the L-INS-i algorithm. All the unreliably aligned sequences were eliminated using Gblocks 0.91b [43], with the aligned data of PCGs and rRNAs from each locus concatenated with SequenceMatrix v1.9 [44].
The maximum likelihood (ML) and Bayesian inference (BI) methods were used to reconstruct the phylogenetic tree. ML analysis was performed using IQ-TREE with 10,000 bootstrap replicates [45], and the best-fit partitioning schemes and substitution models (Supplementary Table S2) were recommended using ModelFinder [46], implemented in the IQ-TREE program. The support for each node of the tree (BPs) was evaluated via the bootstrap test with 1000 iterations. BI analysis was conducted in MrBayes v3.2.7 [47], and the nucleotide substitution models and best-fit partitioning schemes (Supplementary Table S2) were simultaneously recommended by PartitionFinder v2.1.1 [48], using the "greed" algorithm. Using the Akaike information criterion (AICc), two parallel analyses with four independent Markov chains (three hot chains and one cold chain) were conducted. The samples included 2 million generations, with sampling performed for every 100 generations when the convergence of MCMC chains was achieved (i.e., the average standard deviation of split frequencies was less than 0.01); the first 25% of the sampled trees were discarded as "burn-in" samples. The confidence values of the tree nodes were estimated as the posterior probabilities (PPs). All the generated trees were displayed using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/, accessed on 24 June 2022) software.

Divergence Time Estimation
In the present study, two fossil calibrations (V. pluto and M. talboti) were used to estimate the times of divergence, which were recently critically reviewed by de Jong [16]. Apomorphic character or character combinations diagnostic of extant clades were observed, thereby allowing reliable fossil allocations on phylogenetic trees to provide minimum ages to the corresponding nodes. Their geological times dated back 20.4 to 16.0 million years ago, respectively, both with a log-normal prior. In addition, two secondary calibration points of relevant host plants (Fabaceae and Brassicaceae) [49] were used in this study as the maximum constraint, owing to their coevolutionary scenarios [2,50,51].
The divergence times were identified using BEAST v1.10.4 [52] with the GTR+I+G model. The analysis assumed a birth-death speciation process and used an uncorrelated lognormal relaxed clock. Markov chain Monte Carlo (MCMC) chains were run for 30 million generations, with the parameters sampled every 3000 generations. Tracer v1.7.1 [53] was used to assess the convergence of the chains and effective sample size for each parameter, which was ≥ 200 for all the samples. The first 25% of the trees were discarded as "burn-in", and the node ages with upper and lower 95% posterior density (HPD) were calculated using TreeAnnotator v1.10.4. In addition, a lineage-through-time (LTT) analysis [54] was performed to determine the tempo of the diversification of the species and to assess its possible relation to temperature changes and geological events in Earth's history. The LTT plot of the lineage numbers against the divergence time was constructed using the ape and ggplot2 packages in R v4.1.2 [55].

Ancestral Area Reconstruction
The biogeographic realms in this study generally contained six areas, as previously demonstrated [56]: Nearctic (A), Palearctic (B), Afrotropic (C), Oriental (D), Australasia (E), and Neotropical (F). We compiled the distributional data for the sampled taxa according to their modern distributional areas and constrained the common ancestral areas, which were previously studied [19,21,22]. We performed the Bayesian binary MCMC (BBM) method of biogeographic ancestral area reconstruction implemented in RASP v4.x [57], using a guiding tree inferred from BEAST and the outgroup taxa excluded. The BBM analysis was run for 5 million cycles, using 10 chains, and sampling every 100 cycles. The temperature was set at 0.1, with a fixed JC model. The maximum number of areas for all the nodes was set to three, and the time-event curve was calculated by the "Time-Calculate" option in RASP. The data for each node were plotted in pie charts.

Ancestral Area Reconstruction
The biogeographic realms in this study generally contained six areas, as previou demonstrated [56]: Nearctic (A), Palearctic (B), Afrotropic (C), Oriental (D), Australa (E), and Neotropical (F). We compiled the distributional data for the sampled ta according to their modern distributional areas and constrained the common ancest areas, which were previously studied [19,21,22]. We performed the Bayesian bina MCMC (BBM) method of biogeographic ancestral area reconstruction implemented RASP v4.x [57], using a guiding tree inferred from BEAST and the outgroup ta excluded. The BBM analysis was run for 5 million cycles, using 10 chains, and sampli every 100 cycles. The temperature was set at 0.1, with a fixed JC model. The maximu number of areas for all the nodes was set to three, and the time-event curve was calculat by the "Time-Calculate" option in RASP. The data for each node were plotted in pie char

Characterization of Mitogenomes
The six complete mitogenome sequences of I. pyrene (15,   of the 13 PCGs (except the cox2 gene) was higher in Pierinae than in Coliadinae (Figure 3), and the GC contents of the four PCGs (atp6, cob, nad2, and nad4) in the subfamily Pierinae were significantly higher than those in Coliadinae (p < 0.01, Wilcoxon rank-sum test). The ratio of non-synonymous (dN) to synonymous substitutions (dS) for the PCGs was assessed separately for Coliadinae and Pierinae. Furthermore, the average dN/dS value of nine genes (atp6, atp8, cox1, cox3, cob, nad1, nad3, nad5, and nad6) in Pierinae was higher than in Coliadinae ( Figure 4). We believe that the elevated GC content might be correlated with a high mutation rate, which triggered the rapid development of the subfamily Pierinae and differentiated it from the Pieridae. The RSCU analysis of the PCGs showed that the third codon position among the five most commonly used codons (TCT, GCT, CCT, CTT, and CGA) of the whole Pieridae group is A or T (Figure 2), indicating a strongly positive AT skew. Notably, the GC content of each of the 13 PCGs (except the cox2 gene) was higher in Pierinae than in Coliadinae (Figure 3), and the GC contents of the four PCGs (atp6, cob, nad2, and nad4) in the subfamily Pierinae were significantly higher than those in Coliadinae (p < 0.01, Wilcoxon rank-sum test). The ratio of non-synonymous (dN) to synonymous substitutions (dS) for the PCGs was assessed separately for Coliadinae and Pierinae. Furthermore, the average dN/dS value of nine genes (atp6, atp8, cox1, cox3, cob, nad1, nad3, nad5, and nad6) in Pierinae was higher than in Coliadinae (Figure 4). We believe that the elevated GC content might be correlated with a high mutation rate, which triggered the rapid development of the subfamily Pierinae and differentiated it from the Pieridae.

Assessment of Sequence Variation
The plots of uncorrected pairwise sequence divergence against the divergence calculated using a GTR model (Supplementary Figure S1) showed that, even though 13PCGs_codon3 diverged faster for the transversions than for transitions, because of the high level of mutational saturation, they were not saturated, indicating that 13_PCGs and 15_genes are suitable for the direct analysis of deep divergences in the Pieridae phylogeny. The heterogeneity of sequence variation was assessed with AliGROOVE v1.07 [40], separately for different datasets. All the datasets yielded extremely low scores ( Figure 5). In general, the mitogenomes had relatively low heterogeneity of sequence composition for most pairwise comparisons between the sequences for the 13_PCGs, 15_genes, and 13PCGs_codon3. This was observed through a comparison of the pairwise

Assessment of Sequence Variation
The plots of uncorrected pairwise sequence divergence against the divergence calculated using a GTR model (Supplementary Figure S1) showed that, even though 13PCGs_codon3 diverged faster for the transversions than for transitions, because of the high level of mutational saturation, they were not saturated, indicating that 13_PCGs and 15_genes are suitable for the direct analysis of deep divergences in the Pieridae phylogeny. The heterogeneity of sequence variation was assessed with AliGROOVE v1.07 [40], separately for different datasets. All the datasets yielded extremely low scores ( Figure 5). In general, the mitogenomes had relatively low heterogeneity of sequence composition for most pairwise comparisons between the sequences for the 13_PCGs, 15_genes, and 13PCGs_codon3. This was observed through a comparison of the pairwise

Assessment of Sequence Variation
The plots of uncorrected pairwise sequence divergence against the divergence calculated using a GTR model (Supplementary Figure S1) showed that, even though 13PCGs_codon3 diverged faster for the transversions than for transitions, because of the high level of mutational saturation, they were not saturated, indicating that 13_PCGs and 15_genes are suitable for the direct analysis of deep divergences in the Pieridae phylogeny. The heterogeneity of sequence variation was assessed with AliGROOVE v1.07 [40], separately for different datasets. All the datasets yielded extremely low scores ( Figure 5). In general, the mitogenomes had relatively low heterogeneity of sequence composition for most pairwise comparisons between the sequences for the 13_PCGs, 15_genes, and 13PCGs_codon3. This was observed through a comparison of the pairwise sequence divergence of individual terminals to terminals outside of the focal group against the same measure of divergence over the entire data matrix.
1 Figure 5. Heterogeneous sequence divergence of mitochondrial genomes for different datasets. The mean similarity score between sequences is represented by a colored square. The scores range from −1, indicating full random similarity, to +1, indicating non-random similarity, which is visualized by a color range from dark red (−1) to dark blue (+1).

Phylogenetic Relationships
In the present study, we determined the phylogenetic trees of Pieridae based on 100 mitochondrial genomes (covering 56 genera), with both datasets (13_PCGs and 15_genes) across different methods (ML and BI phylogenetic analyses). This analysis yielded identical tree topology, with strong evidence for most of the nodes (BS = 80-100, PP = 0.95-1.00). Our results corroborated the phylogenetic data of the Pieridae butterfly family found in previous studies [6,7,11,12,32]; furthermore, the phylogenetic relationships of more taxa groups were revealed. Specifically, our results indicated that three subfamilies, namely Dismorphiinae, Coliadinae, and Pierinae, were strongly supported as monophyletic groups (BS = 100, PP = 1.00), with Dismorphiinae recovered as a sister group with Coliadinae + Pierinae, and Coliadinae recovered as a sister group with Pierinae ( Figure 6). Genes 2023, 14, x FOR PEER REVIEW 9 of 18  In this study, within the subfamily Coliadinae, two monophyletic tribes (Euremdini and Coliadini) covering a total of 14 genera were well supported. The tribe Euremdini was revealed to have six genera, with the relationship of (Nathalis + ((Kricogonia + Prestonia) + ((Eurema + Pyrisitia) + Teriocolias))), while the tribe Coliadini comprised eight genera, with the relationship of ((Gonepteryx + (Gandaca + Dercas)) + (Phoebis + (Anteos + (Catopsilia + (Colias + Zerene))))), which was generally consistent with the results by Braby et al. [6] and Wahlberg et al. [7], except for the genus Dercas, which was a sister genus to Gandaca, and the clade Gandaca + Dercas, which was a sister clade to Gonepteryx [6,7]. In another study, the genus Kricogonia was considered more closely related to Phoebis in terms of its morphological features [9]. However, our study supports the findings of the previous studies based on molecular sequence data indicating that Kricogonia and Phoebis were nested within the tribes Euremdini and Coliadini, respectively [6,7].
Our study showed that the tribe Anthocharidini contained two monophyletic clades: one clade comprised the genera Euchloe, Anthocharis, and Zegris, and the other comprised Eroessa and Hesperocharis. However, the genus Anthocharis was identified as a non-monophyletic group in this study since strong evidence indicated that Zggris eupheme was nested within this genus, implying that the genus Zggris should be synonymized with Anthocharis. Additionally, our study showed that the genus Hebomoia was clustered with Pareronia in the tribe Nepheroniini rather than in Anthocharidini, thus supporting the results of Braby et al. [6] but refuting the suggestion of its Anthocharidini grouping [6,7,11,12,32]. The tribe Pierini was found to contain three subtribes with their relationship of (Appiadina + (Aporiina + Pierina)), in agreement with the results of previous studies [6,7]. For the internal phylogeny of the subtribe Aporiina, the grouping of (Prioneris + Cepora) was sister to the remaining taxa, in agreement with previous studies [6,7]. Moreover, both Aporia and Appias were identified as paraphyletic groups. Since the monotypic Mesapia is nested within Aporia, our results supported that Mesapia should be synonymized with Aporia [8,9,[11][12][13]32,59]. Likewise, Saletara liberia was also found to be nested within the genus Appias, which was consistent with previous studies [6,9,60], except for the study by Wahlberg et al. [7]. Thus, Saletara should be classified under Appias or be synonymized with Appias. However, more studies with taxon sampling and the inclusion of both molecular and morphological data are needed to test these hypotheses.

Divergence Time Estimates
The estimations of Pieridae divergence times significantly varied among the previous studies [6,13,17,18] due to the unsuitable fossil calibrations adopted and thus poorly inferred phylogenies [61,62]. Our study showed that the most recent common ancestor (MRCA) of Pieridae originated at about 72.3 Mya (95% CI: 67.7 to 77.4 Mya) during the Upper Cretaceous (Figure 7a), generally congruent with the latest estimate by Chazot et al. [18]. The split between Coliadinae and Pierinae occurred at about 65.5 Mya (95% CI: 63.7 to 67.5 Mya) during the Cretaceous-Paleogene (K-Pg) boundary, and these subfamilies began to diversify at about 51.8 Mya (95%CI: 43.9 to 59.3 Mya) and 57.9 Mya (95% CI: 53.4 to 62.5 Mya), respectively. Thus, the Pieridae ancestral lineages might have survived by small populations in geographically restricted areas at the K-Pg boundary, with a relatively long delay before they began to diversify [14,56]. This finding supports the view that the K-Pg event had a huge impact on phytophagous insects, including some groups of butterflies [63], but refutes the suggestion that the Pierinae diverged shortly after the appearance of the Brassicales (90-85 Mya) [6,64]. This case of asynchrony between the evolutionary history of animal taxa and plant migrations (at least 20 Ma earlier) was also revealed in a previous study [65]. The estimations of Pieridae divergence times significantly varied among the previous studies [6,13,17,18] due to the unsuitable fossil calibrations adopted and thus poorly inferred phylogenies [61,62]. Our study showed that the most recent common ancestor (MRCA) of Pieridae originated at about 72.3 Mya (95% CI: 67.7 to 77.4 Mya) during the Upper Cretaceous (Figure 7a), generally congruent with the latest estimate by Chazot et al. [18]. The split between Coliadinae and Pierinae occurred at about 65.5 Mya (95% CI: 63.7 to 67.5 Mya) during the Cretaceous-Paleogene (K-Pg) boundary, and these subfamilies began to diversify at about 51.8 Mya (95%CI: 43.9 to 59.3 Mya) and 57.9 Mya (95% CI: 53.4 to 62.5 Mya), respectively. Thus, the Pieridae ancestral lineages might have survived by small populations in geographically restricted areas at the K-Pg boundary, with a relatively long delay before they began to diversify [14,56]. This finding supports the view that the K-Pg event had a huge impact on phytophagous insects, including some groups of butterflies [63], but refutes the suggestion that the Pierinae diverged shortly after the appearance of the Brassicales (90-85 Mya) [6,64]. This case of asynchrony between the evolutionary history of animal taxa and plant migrations (at least 20 Ma earlier) was also revealed in a previous study [65]. posterior density intervals for the node age, and placement of two fossil calibrations and two secondary calibrations are indicated by red circles and red stars, respectively); (b) the LTT plot of Pierinae diversification, and the point of a sharp increase are marked with a red arrow; (c) climatic-dispersal curve for Pierinae. Deep ocean temperatures (black curve, as a proxy for global temperature) were derived from oxygen isotopes corrected for variation in global ice volume; the time-event curve (blue curve) shows the changes in the frequency of dispersal.
The tribe-level lineages within Pierinae and Coliadinae were estimated to have diverged mostly during the late Paleocene and early Eocene periods (Figure 7a). Within the subfamily Coliadinae, the tribes Euremdini and Coliadini began to diverge at about 46.8 Mya (95% CI: 38.5 to 53.9 Mya) and 46.1 Mya (95% CI: 43.9 to 59.3 Mya), respectively; in comparison, within the subfamily Pierinae, the divergence between Leptosiaini and the remaining lineages of Pierinae occurred at about 57.9 Mya, and the divergence between the grouping of (Nepheroniini + Anthocharidini) and Teracolini occurred at about 49.  [66], as well as the host plant (Brassicales) dispersals [67] during this period.

Ancestral Area Reconstruction
Pierid butterflies are distributed worldwide but are unevenly distributed throughout the major zoogeographical regions. Our study showed that under the improved phylogenetic framework using the BBM method, the MRCA of the family Pieridae was likely to have originated in the Palearctic region during the Campanian period in the late Cretaceous, and the early evolution of its lineages occurred in this continent until the end of the Cretaceous and then dispersed into other ecoregions (Figure 8).
In terms of the higher taxa recognized in the family, the four subfamilies Dismorphiinae, Pseudopontiinae, Coliadinae, and Pierinae might have markedly different evolutionary patterns among zoogeographical regions. As a previous study has shown, Dismorphiinae probably originated in the western Gondwana region around 70 million years ago, during which South America and Africa were still connected [6]. Later, they dispersed mainly into the Neotropical region, with a few lineages into the Palaearctic. As for the Pseudopontiinae, previous studies showed that they are closely related to Dismorphiinae or Pierinae [6,7,17]. Nevertheless, they are currently endemic to Africa [6]; thus, their originating area is still a mystery. However, the originating areas of the two other large subfamilies, Coliadinae and Pierinae, due to their strong migratory tendencies and insufficient taxon sampling in previous studies, are also far from clear. The MRCA of Coliadinae and Pierinae was found to have dispersed from the Palearctic into the Oriental region through a land bridge that connected to the Oriental region resulting from the India-Eurasia collision after the late Cretaceous [68,69]. The MRCA of Coliadinae then dispersed from the Oriental region into the Nearctic region between 65.5 and 51.8 Mya, most likely via the Bering Land Bridge (BLB) [70]. This dispersal into new areas may have prevented them from K-Pg global extinction by increasing the cumulative chance of lineage survival. In Coliadinae, the most likely ancestral region for the tribe Euremdini was found to be the Nearctic region, and among this tribe, one clade (Nathalis + (Kricogonia + Prestonia)) was endemic to the Nearctic region, while the other clade (Teriocolias + (Pyristia + Eurema)) dispersed into the Neotropical region through the Isthmus of Panama (IP) at about 39 Mya in late Eocene. This coincided with a floral shift in the Neotropical region due to the more temperate and suitable habitats of tropical biomes [71]. As for the other tribe Coliadini, their most likely ancestral region was also the Nearctic, and one of its clades (Phoebis + (Anteos + (Catopsilia + (Colias + Zerene)))) was endemic to the Nearctic region, whereas the other clade ((Gonepteryx + (Gandaca + Dercas)) dispersed backed to the Oriental region at about 42.3 Mya. In terms of the higher taxa recognized in the family, the four subfamilies Dismorphiinae, Pseudopontiinae, Coliadinae, and Pierinae might have markedly different evolutionary patterns among zoogeographical regions. As a previous study has shown, Dismorphiinae probably originated in the western Gondwana region around 70 million years ago, during which South America and Africa were still connected [6]. Later, they dispersed mainly into the Neotropical region, with a few lineages into the Palaearctic. As for the Pseudopontiinae, previous studies showed that they are closely related to Dismorphiinae or Pierinae [6,7,17]. Nevertheless, they are currently endemic to Africa [6]; thus, their originating area is still a mystery. However, the originating areas of the two other large subfamilies, Coliadinae and Pierinae, due to their strong migratory tendencies and insufficient taxon sampling in previous studies, are also far from clear. The MRCA of Coliadinae and Pierinae was found to have dispersed from the Palearctic into the Oriental region through a land bridge that connected to the Oriental region resulting from the India-Eurasia collision after the late Cretaceous [68,69]. The MRCA of Coliadinae then dispersed from the Oriental region into the Nearctic region between 65.5 and 51.8 Mya, The MRCA of Pierinae diverged in the Oriental region after the K-Pg mass extinction around 65.5 million years ago, and their ancestral lineages may have stayed and survived by small populations with a relatively long delay before they began to diversify. Our results showed that the most likely ancestral region for ((Nepheroniini + Anthocharidini) + Teracolini)) and (Pierini + Elodinini) was the Oriental region. Moreover, our analysis indicated that the distributional changes in the Pierinae lineages were correlated with their host plants, the Brassicales, which underwent rapid and dramatic diversification in the late Cretaceous, especially in the K-Pg boundary, as shown in a previous study [67]. For example, in the tribes Anthocharidini and Pierini, the MRCA of three clades (Eroessa + Hesperocharis), ((Melete + (Pereute + Leodonta) + ((Neophasia + Eucheira) + (Charonias + (Catasticta + Archo-nias))))), and (Ascia + (Ganyra + (Hypsochila + ((Theochila + Tatochila) + (Phulia + (Infraphulia + (Piercolias + Pierphulia))))))) dispersed from the Oriental region into the Neotropical region at about 34 Mya, shortly after the Brassicales extended their distribution to South America [67]. This probably occurred through the BLB or stepping stones between these two zoogeographical regions [70,72,73], in contrast to the use of the Thulean Land Bridge (TLB) [74][75][76]. However, the MRCA of some lineages, such as the Euchloe, Anthocharis, Zegris, Aporia, Pontia, and Reliquia, returned to the Palearctic region from the Oriental region between 27.5 and 16.9 Mya during the Oligocene to Miocene eras, which was also correlated with the dispersal of their host plants Brassicales to Europe by the closure of the Turgai Straits during the Oligocene era [63]. Soon after, the subfamily Pierinae began to rapidly diversify into their extant lineages of current distributions (Figure 7b).

Conclusions
In this study, for the first time, we obtained the complete or nearly complete mitochondrial genomes of 100 pierid butterfly species, covering three subfamilies and fifty-six genera. Based on this relatively larger-scale dataset, we conducted comparative mitogenomic, phylogenetic, and phylogeographic analyses to further clarify the spatio-temporal evolutionary patterns of the Pieridae. Our results showed that the GC content of PCGs in Pierinae was elevated compared with that of the Coliainae, which might be correlated with their relatively higher mutation rate. All the obtained Pieridae phylogenetic trees in our study were identical in topology across the different methods (ML and BI phylogenetic analyses) and datasets (13_PCGs and 15_genes), providing a phylogenetic backbone of Pieridae in terms of mitogenomics. However, some minor incongruencies with previous studies were observed. Our phylogenetic analysis showed that the spatio-temporal evolutionary patterns of Pierinae were closely associated with the evolution of their host plants, the Brassicales, as well as the corresponding geological events and environmental changes over time and in different locations throughout Earth's history.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/genes14010072/s1, Table S1: Details of species and mitogenomes of Pieridae and outgroups used in this study. Table S2: Partition schemes and best-fitting models for phylogenetic analyses. Figure S1: Saturation plots for the 13_PCGs, 15_genes, and 13 PCGs_codon3 performed in DAMBE 7 with the GTR model selected as a reference model. S: transition rate; V: transversion rate. Figure S2: Phylogenetic relationships of the 100 Pieridae species from Bayesian inference (BI) analysis based on 13_PCGs; posterior probabilities are indicated at each node. Figure