Is the Existence of Two Lineages for Hamadryas glauconome (Lepidoptera: Nymphalidae) True? Molecular and Ecological Evidence

: The genus Hamadryas has a neotropical distribution. In 1983, the subspecies H. glauconome grisea from Mexico was recognized with subtle and subjective differences in color, size and distribution and limited to the northwest. Since then, there has been a debate about whether it is a different lineage from H. glauconome because adult-stage morphology studies have not found signiﬁcant differences. This study aims to delimitate H. g. glauconome and H. g. grisea lineages with two sources of evidence: ecological and molecular—the former through ecological niche modeling using the accessible area for the species and estimating the minimum volume ellipsoid overlapping as a fundamental niche using occurrences databases. The molecular evidence is found through the methods of phylogenetic inference and the generalized mixed yule coalescent approach, using sequences of cytochrome oxidase I. Ecological and molecular evidence suggest that H. g. grisea is a different lineage from H. glauconome . Also, molecular evidence of a third lineage from the south of Texas needs further study. This study suggests that different evidence should be provided when morphology is not enough for delimiting species, especially in recently diverged species. Furthermore, the H. g. grisea cytochrome oxidase I sequence (658 bp) is published for the ﬁrst time.


Introduction
Hamadryas is a Lepidoptera genus with a neotropical distribution [1].This group was first studied by Fruhstorfer in 1916 [2], and later, Jenkins [1] made a comprehensive review of the genus.The genus comprises 20 species distributed in Latin America, the Caribbean, and Mexico [1].In Mexico, there are nine of the twenty known species and eight subspecies, primarily distributed in the neotropical region.In the Yucatan Peninsula as a biotic province, including Campeche, Yucatan, Quintana Roo, northern Guatemala, and Belize [3,4], 10 species are known [5][6][7][8], including Hamadryas julitta, an endemic species restricted from northern Belize [6] to the northern Yucatan state.
In a previous study [7], genetic barcodes (a 658-base pair region of the cytochrome oxidase subunit I, COI gene) were used for the species identification of Lepidoptera in the Yucatan Peninsula.This study placed H. julitta and H. glauconome as sister species with a genetic divergence of 2%.This divergence is supported by further phylogenetic studies of the genus using morphology and molecular markers, which also place H. julitta and H. glauconome as sister species [9,10].Morphologically, they are very similar, and in the past, H. julitta was considered a form of H. glauconome [2].However, Jenkins [1] provided a detailed description of both their external morphology and genitalia, along with a comparative table of their characteristics, and concluded that they are two separate species, changing the status of H. julitta to species.A recent study with Hamadryas eggs revealed clear differences in the exocorion between these two sister species, adding further distinguishing morphological characters [11].
The distribution of H. glauconome ranges from Mexico to Costa Rica [1,2].In Mexico, subspecies H. g. glauconome and H. g. grisea are recognized [1,5].The distribution of the former extends to the Pacific and Gulf of Mexico slopes, while the latter has a restricted distribution in northwestern Mexico [1,5].However, the proposal of the subspecies H. g. grisea by Jenkins [1] is primarily based on distribution, coloration patterns, and subtle differences in genitalia.The study by Garzón-Orduña [9] highlights a group of H. glauconome that seems to be paraphyletic.A recent study [12] evaluated the taxonomic status of H. julitta and H. glauconome by morphology, color, and some climatic variables.It concluded that the variation between species is significant enough to consider them as different species.On the other hand, there was not a significant enough difference between H. g. glauconome and H. g. grisea to consider them different species and establish that the hindwing ocelli form is enough to differentiate among these subspecies.This affirmation states the need to study H. glauconome lineages under different approaches to provide a finer resolution and thus determine their status.
Studying the lineages of H. glauconome and H. julitta using different approaches will help us to understand how they are structured and the processes that led to their differentiation.If we consider species as lineages within a metapopulation that evolve independently and gradually diverge in various aspects such as morphology, genetics, behavior, and ecology, this provides evidence for the lineage delimitation of each species [13], especially when morphological characteristics are not discrete enough to discriminate between them.
Niche analysis has been used in previous studies for delineating lineages in vertebrates [14][15][16] and invertebrates like Lepidoptera [17][18][19][20]; it relies on the ecological species concept [21,22] and both Hutchinson's [23] and Maguire's [24] concept of niche as a species attribute.Species distribution is determined by abiotic favorable conditions and is equivalent to fundamental niche (A), biological interactions (B, biotic components), and accessible area (M).It is defined as the reachable area of a species, and it depends on dispersal capacity-the accessible areas of the world that have been available since the origins of the species [25].Ecological niche modeling (ENM) uses species occurrences and environmental data to estimate the realized niche, defined as where A and B converge, which means where the species live [25].Considering M while modeling instead of just A gives biological sense to the model as it depends on the dispersal capacity and evolutive history of the species [26].Component B is very complex as it needs robust data in a broad temporal-space scale, and it is difficult to obtain because interactions change along the species distribution; thus, it is not included in ENM [26,27].
Another approach for species delimitation is the molecular one, and various criteria have been employed for this delimitation, ranging from divergence thresholds (such as genetic barcodes [28,29]) and phylogenetics through Bayesian inference [30] to methods that incorporate coalescence concepts [31,32].Several methods include coalescence in their models, such as the GMYC method (generalized mixed yule coalescent), which uses the maximum likelihood model (ML).This is based on the prediction that lineages evolving independently lead to the emergence of distinct genetic clusters separated by long internal branches [31,32].
Considering this, our study proposes a multi-character approach, including ecological and molecular techniques to provide evidence for delimitating the lineages of H. g. glauconome, H. g. grisea, and H. julitta.

Databases
Species and subspecies occurrence data in the distribution area were obtained from various Lepidoptera collections-Museo de Zoología de la Facultad de Ciencias from UNAM, El Colegio de la Frontera Sur in Chetumal, the McGuire Center for Lepidoptera and Biodiversity at the Florida University-and some were collected from recent fieldwork.Published data were included [1,[33][34][35][36], a query from GBIF [37] was made only with preserved specimens, and doubtful data were dismissed.Missing coordinates were recovered with GE-OLocate [38] and medium to low precision was validated with the QGIS version 3.20.3[39] intersection tool using an America's countries shapefile (https://www.diva-gis.org,accessed on 20 October 2021).The remaining missing data were recovered with Google Earth [40].Environmental variables were downloaded from WorldClim [41] with a 2.5 resolution.The occurrences database was processed with RStudio [42] basic functions to eliminate duplicates, and the ecospat package [43] was used to avoid redundancy and bias by suppressing occurrences within 69 m, which was considered the dispersal capacity from the mean of the movement of some previously studied Hamadryas species [44].

Accessible Area
The accessible area (M) was estimated altogether, as sister lineages share an evolutive history [25].The accessible area simulation was run using the R Grinnell package [45] with normal dispersal, spread = 1, maximum two dispersers, three replicates, and twenty dispersal events as simulation parameters.Environmental layers were cropped according to the simulated M.

Model Calibration
For the calibration of the model, environmental values for each point were extracted with the raster [46] and dismo [47] R packages, and the correlation between variables was analyzed with the corr_var function in the lares package [48], where values ≥ 0.8 indicated a strong correlation.An analysis of variable contribution was conducted with a preliminary model using MaxEnt version 3.3.3[49] with altogether lineages and the cropped environmental layers using the default basic parameters and running the jackknife test included in MaxEnt to measure variable importance.The permutation importance was used as a criterion to remove strongly correlated variables to avoid bias; if two variables had a strong correlation, the one with more permutation importance was kept.When two correlated variables are shuffled, one of them has little effect on the performance because the other one has very similar information, thus affecting the permutation importance in the final model.

Ecological Niche Modeling
Each lineage was modeled separately in MaxEnt with the 5 selected variables using 10 bootstrap replicates, random seed, a 20% random test, and a maximum number of background points and 500 iterations as a maximum, adding samples to the background and writing background predictions.A jackknife test was run from MaxEnt for variable importance.We used the area under the curve (AUC) generated in the program to assess the model performance.The plot of the mean of each species' replicates was used to visualize the suitability with Qgis version 3.26 [39].

Identity Test
An identity test was conducted with the identity.testfunction from ENMTools [50] R package using each species occurrences and selected bioclimatic layers (cropped as the accessible area).This calculates the empirical niche identity between lineages using Schoender's D estimate, which goes from 0 when there is no overlapping to 1 when there is complete overlapping.This function performs the test as in Warren [51] with a one-tailed test.The parameters were 100 replicates, a MaxEnt model, and 1000 background points to construct a null distribution.Also, a critical value was calculated (p = 0.05), above of which 95% of values were higher, as well as compared results; if empirical D is above the critical value, the null hypothesis is accepted and no statistically significant difference exists between lineages niches.

Fundamental Niche Overlap
Each lineage data occurrence was used to visualize the niche in a multidimensional environmental space using NicheA version 3.0 [52].The background was constructed using the cropped bioclimatic layers, and the minimum volume ellipsoid (MVE, fundamental niche) overlapping was calculated.

Molecular Analysis 2.2.1. Sequences
Sixteen Hamadryas julitta's cytochrome oxidase I (COI) sequences (658 bp) from Barcode of Life Data Systems (BOLD, www.boldsystems.org,accessed on 28 September 2023) [53] under the project Nymphalidae of the Yucatan Peninsula (LNYM) were used.Additionally, 23 samples of H. g. glauconome and 5 from H. g. grisea were obtained from museum specimens' legs of the Lepidoptera Collection of Zoology Museum and the National Insects Collection, both from the Universidad Nacional Autónoma de México and the McGuire Center for Lepidoptera and Biodiversity at the Florida University.DNA extraction, amplification, and sequencing were performed at the Canadian Centre for DNA Barcoding under standard protocols [54].A total of 44 (16 from H. julitta, 23 from H. g. glauconome and 5 from H. g. grisea) sequences were aligned with MUSCLE [55] and used to perform the molecular analyses.Interspecific and intraspecific distances were calculated with MEGA version 11.0.13[56] using Kimura 2 parameters (K2P) [57] for a general overview.

Phylogenetic Inferences
The mtDNA tree was inferred employing both maximum likelihood (ML) in the IQ-Tree web server [58] and Bayesian inference (BI) in BEAST 2.4.6.[59] using all sequences whose GenBank accession numbers are in Table 1 (16 from H. julitta, 5 from H. g. grisea and 23 from H. g. glauconome) and sequences of H. februa, H. feronia and H. amphicloe as outgroups downloaded from GenBank (accession numbers GU659529, GU659523 and JN263324 respectively).
Sequence alignment was performed using the MUSCLE algorithm [55] included in the software MEGA X version 10.2.0 [60].A maximum likelihood (ML) tree was run in the IQ-TREE web server [58] using the best fitting model, TIM2 + I, according to the Bayesian Information Criterion (BIC) calculated with jModelTest2 version 2.1.10[61].Analysis was run with standard bootstrap with 100 alignments to estimate the branch support analysis.The consensus tree was visualized using FigTree version 1.4.4 [62].
For BEAST analyses we specified the best fitting model TIM2 + I and the invariant proportion from jModelTest2 version 2.1.10[61], an optimized relaxed clock and a lognormal constant-size coalescent tree prior.A run was performed with a length of 30 million generations and sampling every 5000 generations.We verified for convergence and evaluated the run performance with Tracer [63].We used TreeAnnotator 2.4.6 [64] to generate a maximum credibility tree using mean heights for the nodes.The nodes with BI posterior probability (PP) values >0.95 and ML bootstrap (BS) values 70% or above were a priori regarded as strongly supported.In contrast, lower values were regarded as indicating no significant node support [65].

Generalized Mixed Yule Coalescent (GMYC) Analysis
To delimit H. glauconome lineages, 28 sequences (including 23 from H. g. glauconome and 5 from H. g. grisea) were analyzed with a GMYC approach.For these sequences, the HKY + I substitution model was estimated with jModelTest2 version 2.1.10[61] using Akaike's Information Criterion (AIC).An ultrametric tree was generated through Bayesian inference using BEAST version 2.7.5 [66] including a seed of 30,000,000 and substitution model of HKY + I with the invariant proportion from the jModelTest analysis.An optimized relaxed clock [67] was used to calculate the branch length under a coalescent constant population model with log-normal and a 30,000,000-length Monte Carlo Markov Chain.The tree was corroborated by Tracer version 1.7.2.[63] looking for an estimated sample size (ESS) of ≥200.The posterior maximum clade credibility tree was summarized by Treeannotator [64] with a 10% burn-in, and the final tree was visualized with FigTree version 1.4.4 [62].Once the tree was generated, a GMYC was made with the single GMYC function of the splits package [68] in RStudio [42].This function is an optimization of the method by Pons [32] and combines phylogenetic (Yule model) and phylogeographic (Coalescence model) approaches [68].A second GMYC analysis was performed with a new ultrametric tree using the same parameters and adding 16 H.julitta sequences.

Ecological Analyses
A database with 475 occurrences of the species H. g. glauconome (346), H. g. grisea (17) and H. julitta (112) was obtained through their distribution area.Variables correlation analysis resulted in 12 of them having a strong correlation with at least one other.Five variables were selected by applying the permutation importance criteria (Table 1).
The precipitation of the driest month (bio14) and the mean temperature of the wettest quarter (bio08) are the variables with the highest permutation importance.Butterflies are sensitive to precipitation and temperature, driving their phenology and distribution.Hence, they are excellent bioindicators [69,70].The precipitation during the driest season is a key factor for these Hamadryas species.The modelling of each lineage is shown in Figure 1; all of them perform well, as AUC values show.From the five selected variables for the final model, the precipitation of the coldest quarter (bio19) had the highest permutation importance for H. g. glauconome; meanwhile, the precipitation of the driest month (bio14) remained for the other two lineages (Table 2).
Figure 1; all of them perform well, as AUC values show.From the five selected variables for the final model, the precipitation of the coldest quarter (bio19) had the highest permutation importance for H. g. glauconome; meanwhile, the precipitation of the driest month (bio14) remained for the other two lineages (Table 2).Schoender's D empirical estimation of niche identity shows that the H. g. glauconome niche overlaps more with H. g. grisea than with H. julitta, and the latter two are different (Table 3).The identity test leads to rejecting the null hypothesis and accepting that there is a significative difference between niches in all lineages.Fundamental niche estimation with the MVE and its overlapping analysis shows more overlapping between H. g. glauconome and H. g. grisea than H. g. glauconome and H. julitta, and there is no overlapping between H. g. grisea and H. julitta (31.32 > 9.06 > 0, respectively (Figure 2))-a similar result to Shoender's D analysis.Schoender's D empirical estimation of niche identity shows that the H. g. glauconome niche overlaps more with H. g. grisea than with H. julitta, and the latter two are different (Table 3).The identity test leads to rejecting the null hypothesis and accepting that there is a significative difference between niches in all lineages.Fundamental niche estimation with the MVE and its overlapping analysis shows more overlapping between H. g. glauconome and H. g. grisea than H. g. glauconome and H. julitta, and there is no overlapping between H. g. grisea and H. julitta (31.32 > 9.06 > 0, respectively (Figure 2))-a similar result to Shoender's D analysis.

Molecular Analyses
A total of 44 sequences were used for the molecular analysis (Table 4).Distances calculated with K2P in Mega show a mean interspecific distance of 1.53% between H. g. grisea and H. g. glauconome, 2.51% between H. julitta and H. g. glauconome; and 2.60% between H. julitta and H. g. grisea (Table 5).The mean intraspecific distance of H. julitta was 0.89%, H. g. glauconome was 0.20% and H. g. grisea was 0.34%.

Molecular Analyses
A total of 44 sequences were used for the molecular analysis (Table 4).Distances calculated with K2P in Mega show a mean interspecific distance of 1.53% between H. g. grisea and H. g. glauconome, 2.51% between H. julitta and H. g. glauconome; and 2.60% between H. julitta and H. g. grisea (Table 5).The mean intraspecific distance of H. julitta was 0.89%, H. g. glauconome was 0.20% and H. g. grisea was 0.34%.

Phylogenetic Inferences
The BI and ML data set consisted of 47 sequences.Both analyses produced highly congruent topologies.In both trees, the three taxa are monophyletic.The trees show good resolution and branch support in the major clades; however, most basal nodes remained unresolved, but the nodes at the level of species groups were generally well-resolved and received strong support both in BI (Figure 3) and ML (Supplementary Figure S1) analyses.

Generalized Mixed Yule Coalescent (GMYC) Analysis
The single threshold GMYC (sGMYC) model using the ultrametric phylogenetic tree with H. glauconome lineages, created in BEAST, resulted in the identification of two putative species with high probabilities (confidence interval [CI] = 2-2, ML of null model = 197.35,ML of GMYC model = 207.03,p = 0.0000621).These are significative values that lead to rejecting the null hypothesis and accepting that there is more than one species in the analyzed data.A single H. glauconome sequence forms a separated entity ([CI] = 2-4) from a Texas specimen: the northern distribution limit for this species (Table 6).The GMYC model using the ultrametric tree including H. julitta, created in BEAST, resulted in the identification of three putative species with high probabilities (confidence interval [CI] = 3-3, ML of null model = 318.95,ML of GMYC model = 327.82,p = 0.00014).These are significative values that lead to rejecting the null hypothesis and accepting that there is more than one species in the analyzed data (Table 6).

Generalized Mixed Yule Coalescent (GMYC) Analysis
The single threshold GMYC (sGMYC) model using the ultrametric phylogenetic tree with H. glauconome lineages, created in BEAST, resulted in the identification of two putative species with high probabilities (confidence interval [CI] = 2-2, ML of null model = 197.35,ML of GMYC model = 207.03,p = 0.0000621).These are significative values that lead to rejecting the null hypothesis and accepting that there is more than one species in the analyzed data.A single H. glauconome sequence forms a separated entity ([CI] = 2-4) from a Texas specimen: the northern distribution limit for this species (Table 6).The GMYC model using the ultrametric tree including H. julitta, created in BEAST, resulted in the identification of three putative species with high probabilities (confidence interval [CI] = 3-3, ML of null model = 318.95,ML of GMYC model = 327.82,p = 0.00014).These are significative values that lead to rejecting the null hypothesis and accepting that there is more than one species in the analyzed data (Table 6).

Discussion
Different traits evolve at diverse rates, challenging the species delimitation if only relayed in one of them, especially in recently diverged species [13].In this study, ecological evidence suggests that H. g. grisea is a different lineage from H. g. glauconome, and even more, it reflects the relationship between them, as the fundamental niche overlap between MVE and Schoender's D identity is greater than with H. julitta.Nevertheless, identity tests among all niche lineages are significant enough to consider them different.It will be important to analyze if there is a hybridization zone in the distribution limits of H. g. glauconome and H. g, grisea where niches overlap, which could affect the species designation of specimens from this area, as is seen in Lepidoptera species complexes [72].The divergence niche is well known for other groups of sister species [16,73] and its importance in evolutive processes [74].However, for some Lepidoptera specialist species, it has been proved that diversification processes are not associated with niche shift, and geographical distance is a preponderant factor [17].Our study settles a baseline for further studies that should focus on biogeography and phylogeography to explain the processes that originated and shaped the distribution of these lineages.
The ENM for each species was very precise, as occurrences shown in the map and the AUC confirm.Suitability for each species is well differentiated along the distribution range, limiting H. g. grisea to the northwest of Mexico and H. julitta to the Yucatan Peninsula but also predicting its presence in some small parts of Veracruz, Guatemala, and Honduras.However, the model excludes biotic components and barriers that could restrict H. julitta distribution to the Yucatan Peninsula.H. g. glauconome is more suitable from Mexico to Costa Rica, but most on the Pacific slope.This distribution confirms the "Y" pattern described for Mexico, limited by the Sierra Madre Oriental, Sierra Madre Occidental, and Trans-Mexican volcanic belt reported for some Mexican butterfly species [5,75].
The distance between H. g. grisea and H. g. glauconome is 1.5%, which is below the threshold to consider a different species in most groups [7] but is above the mean intraspecific divergence of 0.20% in H. g. glauconome and 0.34% for H. g. grisea.Nevertheless, some Lepidoptera complex species groups have only 1.1% divergence, and different traits support their delimitation [76].The genetic divergence threshold approaches, like barcoding, have limitations that often contain a degree of subjectivity and do not allow for hypothesis testing [77].Bayesian inference and maximum likelihood are widely used to solve relationships among species; although our analysis recovers the three monophyletic groups, which means different lineages, it is not possible to establish relationships among the groups because of the low support.Additional markers, as more sequences, are needed to solve relationships among the lineages.
The GMYC approach, through the maximum likelihood optimization of the set of nodes defining the transition between intra and interspecific processes, is how clusters are delimited.This likelihood allows for statistical inferences and hypothesis testing throughout the tree [31].The GMYC results in this study suggest that H. g. grisea, H. g. glauconome and H. julitta are different lineages.When excluding H. julitta from the GMYC analysis, one sequence from H. g. glauconome forms another entity, a specimen from the south of Texas; this is because of how clusters are defined in the GMYC analysis, as explained above.Because support for this node is low (0.24), further analysis that includes more samples and incorporates other molecular markers is needed.In a previous Hamadryas phylogenetic study [9], the glauconome group is considered paraphyletic as H. julitta is clustered within.Furthermore, certain divergence levels are seen among H. glauconome individuals: two from Costa Rica forming a cluster closer to H. julitta and one specimen from Texas that diverges from the others.It should be noted that in Garzón-Orduña et al. [9], no H.glauconome lineage from northwest Mexico was included, and the Texas specimen is different from the one in our study.
It is important to point out that morphology studies from these lineages have been performed with adult specimens [1,12], and the long-standing tradition of looking for genitalic differences responds to the morphological species concept, as a probe of reproductive isolation.However, in Lepidoptera, reproductive isolation is not only performed through genitalic morphology; behavior, chemical, and visual signaling act as a reproductive barrier between species [78][79][80][81].In male Hamadryas, there is a structure called "rami" that is thought to be an organ that secretes pheromones to attract females [1]; in this way, it acts as a reproductive barrier.It proves the need to include different evidence aside from morphology when delimiting species.
In Lepidoptera complex species, sometimes the adults seem to be "cryptic"; nevertheless, when studying immature stages, morphological differences are evident [11,76].A morphological approach that includes all lineages' immature stages will be necessary.Even

Figure 2 .
Figure 2. Minimum volume ellipsoid analysis, H. g. glauconome in color blue, H. g. grisea in green and H. julitta in pink.(a) X-Y view, (b) X-Z view and (c) Y-Z view.

Figure 2 .
Figure 2. Minimum volume ellipsoid analysis, H. g. glauconome in color blue, H. g. grisea in green and H. julitta in pink.(a) X-Y view, (b) X-Z view and (c) Y-Z view.

Figure 3 .
Figure 3. Ultrametric tree constructed using Bayesian inference and based on 47 sequences (658 bp) of the cytochrome oxidase c subunit I gene of specimens currently identified as Hamadryas g. glauconome; H. g. grisea, and H. julitta.This phylogenetic tree includes H. februa, H. amphicloe and is rooted using H. februa as an outgroup.The posterior probabilities of the major strongly supported clades are given in the nodes on the left side, and the bootstrap values from the maximum likelihood analysis are on the right side.

Figure 3 .
Figure 3. Ultrametric tree constructed using Bayesian inference and based on 47 sequences (658 bp) of the cytochrome oxidase c subunit I gene of specimens currently identified as Hamadryas g. glauconome; H. g. grisea, and H. julitta.This phylogenetic tree includes H. februa, H. amphicloe and is rooted using H. februa as an outgroup.The posterior probabilities of the major strongly supported clades are given in the nodes on the left side, and the bootstrap values from the maximum likelihood analysis are on the right side.

Table 1 .
Permutation and contribution analysis from the 19 environmental variables.
* Variables selected to run the final model with each lineage.

Table 2 .
Selected variables for final models, their percentage of contribution and permutation importance.
* Variable with the highest permutation importance.

Table 3 .
Identity test results of empirical values and critical values (p = 0.05).All empirical values are below critical values.

Table 2 .
Selected variables for final models, their percentage of contribution and permutation importance.

Table 3 .
Identity test results of empirical values and critical values (p = 0.05).All empirical values are below critical values.

Table 4 .
Sequences process ID from BOLD and GenBank Accession numbers.

Table 5 .
Mean interspecific distance calculated in Mega with K2P.

Table 6 .
Results of the GMYC entity assignments.

Table 6 .
Results of the GMYC entity assignments.

GMYC Entity Assignment of H. glauconome Sequences GMYC Species Number of Sequences Species
* Sequence of a specimen from South of Texas.