Is Coloburiscidae (Ephemeroptera) Monophyletic? A Comparison of Datasets

Coloburiscidae consists of three living genera with a Gondwanan distribution: Coloburiscoides from Australia, Coloburiscus from New Zealand, and Murphyella from Chile. Molecular-based phylogenetic analyses of Ephemeroptera (mayflies) have been somewhat successful in resolving many higher-level relationships in the order. Most of these analyses, however, have been ambiguous with respect to the family Coloburiscidae. This study presents the first phylogenetic analysis specific to Coloburiscidae using data generated from 448 phylogenomic sequences and data generated from the Sanger sequencing of five genes: 12S, 16S, 18S, 28S, and H3. Bayesian and likelihood analyses were conducted on each dataset and, ultimately, on a combined dataset of the two. Coloburiscidae was shown to be supported as monophyletic in each instance where the phylogenomic data were included. Coloburiscoides was shown as sister to Murphyella + Coloburiscus.


Introduction
Mayflies represent an ancient order of winged insects that dates back 300 million years [1–3]. The extant lineages of the order are found in freshwater ecosystems worldwide, except for the continent of Antarctica and some remote islands [4,5]. Currently, mayfly diversity constitutes around 40 families, with approximately 460 genera, and almost 3700 described species [6]. While some families are believed to be of Gondwanan origin, today there are only four families that exhibit a strict Gondwanan distribution: Ameletopsidae, Coloburiscidae, Nesameletidae, and Oniscigastridae [4]. Of these, the monophyly of Coloburiscidae and Ameletopsidae remains elusive.

Review of Taxonomy
The family Coloburiscidae, sometimes described as the spinose-gilled mayfly family, is relatively small in comparison to other mayfly families and has only three genera: Coloburiscoides Lestage (1935), Coloburiscus Eaton (1888), and Murphyella Lestage (1930). Coloburiscoides and Coloburiscus are found only in Australia and New Zealand, respectively, while Murphyella is endemic to Chile, thus displaying a Gondwanan distribution [4,7–9]. The family is not currently believed to have a presence in the remaining Gondwanan land masses, but this could be due to a comparatively decreased effort to explore mayfly taxonomy in underdeveloped countries. In total, the family has seven described species (see Table 1).
Higher-level classifications within Ephemeroptera have been problematic for decades. Edmunds [10] considered Coloburiscidae as a subfamily of Siphlonuridae. Riek [11] instead proposed a subfamily status under Oligoneuridae, while Landa [12] proposed Coloburiscidae as its own family. Later, McCafferty [13,14] developed a classification system of Ephemeroptera, where Coloburiscidae was proposed as a sister to the families Isonychiidae, Oligoneuridae, and Heptageniidae (including Arthroplea and Pseudiron). Together, these four families constituted the suborder Setisura due to several putative apomorphies within the group [13]. Kluge [15] developed a nomenclature system separate from McCafferty's, in which Coloburiscidae had a family status; however, Kluge does not refer to any formal analysis of characteristics in his review [15,16].

Review of Relationships
Molecular data brought new insights into the relationships of mayflies. For example, the position of Ephemeroptera relative to all other extant pterygotes [17] was examined. The relationships of all the families of Ephemeroptera were investigated in a combined analysis of morphological and molecular data [5]. The morphological data were composed of 101 characters and the molecular data came from the Sanger sequencing of nuclear and mitochondrial genes: 12S, 16S, 18S, 28S, and H3. Significant findings from this analysis include strong support for the monophyly of Ephemeroptera, as well as for several other major lineages proposed by McCafferty [14] and Kluge [15], while others were not corroborated as monophyletic. Thus, it was recognized that future robust phylogenetic analyses were needed to resolve previously ambiguous relationships. With respect to Coloburiscidae, the findings failed to support either the suborder Setisura or the family Coloburiscidae as monophyletic groups (Figure 1). The monophyly of Coloburiscidae was never supported in any of the molecular data trees in the 2009 analysis [5]. However, it was supported as monophyletic in the morphology tree, with Coloburiscus + Murphyella as sister to Coloburiscoides. The most recent "initial evaluation" phylogenetic analysis on Ephemeroptera was conducted using targeted capture and next-generation sequencing of 448 protein-coding regions [18]. Thirty-five families of mayflies were represented in the 105-taxon dataset.
This analysis represented the largest ever phylogenetic analysis of the order and brought new insights into many higher-level relationships with strong support values. However, this work was a preliminary proposal of relationships, presented at the international meeting for Ephemeroptera, and emphasized the importance of large datasets for resolving relationships. It used only an amino acid dataset in a Bayesian framework and did not examine nucleotide datasets. Of note, Coloburiscidae was found to be monophyletic, with Coloburiscoides + Coloburiscus as sister to Murphyella. Hence, the 2019 study [18] contradicted the molecular data analyses of Ogden et al. (2009) [5], which did not recover the family as monophyletic. Furthermore, it did not recover the same arrangement of the three genera as the 2009 morphology tree. Hence, additional scrutiny of these data is necessary to elucidate the monophyly and relationships of the genera of this family.
Considering the contradictory results between the Ogden et al., 2009 analysis [5] and the Ogden et al., 2019 analysis [18], and that the 2019 analysis was a preliminary approach, this study aims to further test the monophyly of Coloburiscidae and the relationships between the three genera. To this end, this research will: (1) conduct Bayesian and maximum likelihood (ML) analyses of Coloburiscidae using the same five genes (five-gene Sanger dataset) from the 2009 [5] analysis, with some additional sequenced data; (2) conduct Bayesian and ML analyses using the phylogenomic dataset generated as part of the analysis in Ogden et al., 2019 [18]; and (3) conduct Bayesian and ML analyses of both datasets combined as a single dataset.

Taxonomic Sampling
The total dataset consists of 23 taxa. Ingroup taxa include one representative from each of the three genera of Coloburiscidae and 19 other closely related taxa used as representatives for key lineages and families. Siphluriscus has been supported as the most basal taxon within the order Ephemeroptera and was used as the outgroup [5,18].

5-Gene Sanger Dataset and Analysis
The Ogden et al. 2009 dataset had some missing data in the five Sanger genes. To complete the dataset further, DNA extraction was performed on the thorax or legs of each specimen using the Qiagen DNeasy Blood & Tissue kit (Valencia, CA, USA), following the animal tissue protocol. Sequences were targeted for amplification via the standard polymerase chain reaction using the BioRad® T100 Thermo Cycler (Hercules, CA, USA). The primers for each gene are the same as those used in the Ogden et al. [5] molecular analysis. Gel electrophoresis was used to confirm the successful amplification of genes. The purification of DNA was conducted using the QIAquick PCR Purification kit (Valencia, CA, USA), following the standard protocol. The purified DNA was sequenced at Psomagen Inc. (Rockville, MD, USA). The sequences were manually curated with Sequencher® 5.2.4 [19]. The newly generated data for genes 12S, 16S, 18S, 28S, and H3 (Supplementary Materials Table S1) were combined with the data from the 2009 dataset, and MUSCLE software [20,21] was used to align the DNA sequences with the default settings. Aligned gene regions were concatenated using Sequence Matrix 1.8 [22]. A Bayesian analysis was conducted using MrBayes [23,24] with the nst = 6 invgamma model for 10,000,000 generations. From the cold chain, the first 25% of the sample was discarded as the burn-in. The analysis resulted in a final split frequency of 0.0055. IQTREE software [25] was used to run an ML analysis with 1000 ultrafast bootstraps using the best-fit model selected by IQTREE: GTR + F + I + G4.
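The concatenation step can be illustrated with a minimal Python sketch. It is not the Sequence Matrix implementation; the function name `concatenate`, the `?` missing-data symbol, and the toy sequences are assumptions made for illustration only. The sketch joins per-gene alignments into one supermatrix, pads taxa that lack a gene, and records the partition boundaries.

```python
def concatenate(alignments, missing="?"):
    """Concatenate per-gene alignments into a single supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}.
    Taxa absent from a gene are padded with `missing` (an assumed symbol)
    so that every row of the supermatrix has the same length.
    Returns (supermatrix, partitions); partitions use 1-based inclusive ranges.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return supermatrix, partitions


# Toy example with made-up sequences (not data from this study).
genes = {
    "12S": {"Coloburiscus": "ACGT", "Murphyella": "ACGA"},
    "H3": {"Coloburiscus": "GGC", "Murphyella": "GGT", "Coloburiscoides": "GGC"},
}
supermatrix, partitions = concatenate(genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(partitions)
```

Keeping the partition ranges alongside the matrix makes it possible to assign a separate model to each gene later, should that be desired.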

Phylogenomic Dataset and Analysis
Molecular data for each taxon (see Table 2) were generated in the Ogden et al., 2019 analysis [18]. Detailed information on protocols and workflow is specified elsewhere [3,18]. In summary, probe kits were designed for orthologous protein-coding loci across the genomes of all taxa. Library preparation, hybrid enrichment, and sequencing were conducted at RAPiD Genomics (Gainesville, FL, USA) using Illumina HiSeq 3000. Assembly and data cleanup were conducted using the anchored phylogenomics pipeline of Breinholt et al. [26]. The alignment of each locus was performed using MAFFT with default parameters. The phylogenomic data were constructed in two ways and analyzed in Bayesian and ML frameworks. First, only the first and second positions of each codon were included due to their conserved nature (DNA12 dataset) [3], as there was evidence of third-position saturation. Second, the nucleotides were translated into amino acid sequences (AA dataset). The Bayesian analyses were conducted using MrBayes [23,24] for 10,000,000 generations with four chains. The first 25% of the sample was discarded as the burn-in for all runs. The model for the DNA12 dataset was nst = 6 and invgamma, and for the AA dataset, the Aamodel was used to provide a mixture of models with fixed rate matrices. The MrBayes analyses resulted in final split frequencies <0.003. IQTREE software [25] was used to run an ML analysis with 1000 ultrafast bootstraps. The best-fit model selected by IQTREE was GTR + F + I + G4 for the DNA12 dataset and mtZOA + F + I + G4 for the AA dataset. To further test branch support, an SH-like aLRT with 1000 replicates was also carried out in IQTREE.
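The construction of a DNA12-style dataset can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `drop_third_positions` is hypothetical, and the sketch assumes each locus alignment is in frame, starting at codon position 1, with gaps inserted in whole codons.

```python
def drop_third_positions(seq):
    """Keep codon positions 1 and 2 of an in-frame coding sequence.

    Assumes the sequence starts at codon position 1 and that any alignment
    gaps span whole codons, so that index % 3 identifies the codon position.
    """
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def to_dna12(alignment):
    """Apply the third-position filter to every taxon in one locus alignment."""
    return {taxon: drop_third_positions(seq) for taxon, seq in alignment.items()}


# Toy locus with made-up sequences (not data from this study).
locus = {"taxon_A": "ATGGCCTAA", "taxon_B": "ATGGCTTAG"}
dna12 = to_dna12(locus)
print(dna12)
```

In the toy locus the two sequences differ only at third positions, so they become identical after filtering. This shows how the filter removes the fastest-evolving sites while shortening each locus by one third.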

Combined Dataset and Analysis
The aligned sequences from the 5-gene Sanger dataset and the DNA12 dataset were concatenated using Sequence Matrix 1.8 [22]. Bayesian and ML analyses were conducted as described above. The nst = 6 invgamma model was used in MrBayes and the GTR + F + I + G4 model was used in IQTREE for the ML analysis.

Results
The alignment for the five-gene Sanger dataset was 5321 bp in length with 1045 parsimony-informative sites. The ML IQTREE phylogenetic reconstruction results are shown in Figure 2. Coloburiscidae was not recovered as monophyletic, but Coloburiscoides was shown to be sister to Murphyella. The alignment for the phylogenomic dataset was 61,116 bp in length with 4859 parsimony-informative sites. The ML IQTREE phylogenetic reconstruction of the DNA12 dataset is shown in Figure 3. Coloburiscidae was strongly supported (100% SH-aLRT and 100% bootstrap) as monophyletic, and Coloburiscoides was somewhat supported (88% SH-aLRT and 90% bootstrap) as sister to Coloburiscus + Murphyella. The combined dataset tree shown in Figure 4 also strongly supported (100% SH-aLRT and 100% bootstrap) the monophyly of Coloburiscidae, and somewhat supported (88% SH-aLRT and 93% bootstrap) Coloburiscoides as sister to Coloburiscus + Murphyella. The AA dataset analysis (not in the figures as it was similar to the 2019 analysis) in ML and Bayesian analyses resulted in a strongly supported monophyletic Coloburiscidae; however, it recovered, with weaker support, Murphyella as sister to Coloburiscoides + Coloburiscus, contradicting the relationship between the three genera found in the DNA12 dataset results.
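For readers unfamiliar with the term, a site is usually counted as parsimony-informative when at least two character states each occur in at least two taxa. The following is a minimal sketch of that count, assuming gaps and the symbols `?` and `N` are ignored; the function name and the toy alignment are hypothetical, and the counts reported above were not produced with this code.

```python
from collections import Counter


def count_parsimony_informative(alignment, ignore="-?N"):
    """Count sites where at least two states each occur in at least two taxa.

    alignment: {taxon: aligned sequence}, all sequences of equal length.
    Characters listed in `ignore` (an assumed set) are skipped at each site.
    """
    seqs = [seq.upper() for seq in alignment.values()]
    informative = 0
    for column in zip(*seqs):
        counts = Counter(state for state in column if state not in ignore)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative += 1
    return informative


# Toy alignment with made-up sequences (not data from this study).
toy = {
    "taxon_A": "AAGT",
    "taxon_B": "AAGT",
    "taxon_C": "ACTT",
    "taxon_D": "ACTC",
}
print(count_parsimony_informative(toy))
```

In the toy alignment the first site is constant and the last has a single-taxon variant, so only the two middle sites count. By the figures reported above, the proportion of informative sites is roughly 19.6% for the five-gene Sanger alignment (1045 of 5321) and roughly 8.0% for the phylogenomic alignment (4859 of 61,116).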

Discussion
The purpose of this research was to further investigate the relationships of Coloburiscidae through molecular-based phylogenetic analysis. Coloburiscidae was shown to be monophyletic each time the phylogenomic data were included in any methodological framework, dataset (morphological, DNA12, or AA), or analysis method (ML or Bayesian). Hence, it can be concluded with confidence that Coloburiscidae is a monophyletic lineage. Among the other taxa selected for this analysis, Coloburiscidae is also strongly supported as sister to the lineages Leptophlebiidae, Oligoneuridae, and Ephemerellidae, which aligns with the 2019 tree.
However, the relationships between the three genera are not as concordant across all the analyses. The DNA12 dataset supports Coloburiscoides as sister to Murphyella + Coloburiscus, with fairly high support values (100% posterior probability in the Bayesian analysis, >90% bootstrap, and >87% SH-aLRT in the ML analysis) and agrees with the morphological tree (Figure 4 of Ogden et al., 2009) [5]. The AA dataset results somewhat weakly support (92% posterior probability in the Bayesian analysis, 65% bootstrap, and 12% SH-aLRT) Murphyella as sister to Coloburiscus + Coloburiscoides. Not surprisingly, this was the same result from the Ogden et al., 2019 [18] analysis that also used amino acid sequences as the base dataset.
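The two competing arrangements can be made concrete with a small sketch that encodes each rooted topology as nested tuples and checks which groups form a clade. The helper names are hypothetical, "Outgroup" is a placeholder tip rather than a taxon from the dataset, and branch lengths and support values are deliberately left out.

```python
def leaves(tree):
    """Return the set of tip names under a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the tip sets of all internal nodes of a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def is_monophyletic(tree, taxa):
    """True if some internal node contains exactly the given taxa."""
    return frozenset(taxa) in clades(tree)


# Simplified topologies; "Outgroup" is a placeholder for all other taxa.
dna12_tree = ("Outgroup", ("Coloburiscoides", ("Murphyella", "Coloburiscus")))
aa_tree = ("Outgroup", ("Murphyella", ("Coloburiscoides", "Coloburiscus")))

family = {"Coloburiscoides", "Coloburiscus", "Murphyella"}
for name, tree in [("DNA12", dna12_tree), ("AA", aa_tree)]:
    print(name,
          "family:", is_monophyletic(tree, family),
          "Murphyella+Coloburiscus:",
          is_monophyletic(tree, {"Murphyella", "Coloburiscus"}),
          "Coloburiscoides+Coloburiscus:",
          is_monophyletic(tree, {"Coloburiscoides", "Coloburiscus"}))
```

Both encodings recover the family as a clade; they differ only in which pair of genera is grouped, which is the point of disagreement between the DNA12 and AA results.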
The question remains: which relationship of the three genera is correct? The morphology tree from 2009 and the DNA12 dataset of this study (with its relatively higher support values than the AA dataset results) support Coloburiscoides as sister to Murphyella + Coloburiscus as the most likely proposal. However, the AA dataset supports the sister relationship of Coloburiscus + Coloburiscoides, which aligns better with the Gondwana breakup consensus that Australia and New Zealand would have had land bridges in the more recent past. The fragmentation of Gondwana, which began approximately 150 Mya [27,28], continues to be examined within the field of biogeography as a growing number of studies point to organismal distribution patterns that can be explained by such a process [29–32]. Perhaps the most famous example of Gondwanan distribution is the southern beeches (Nothofagus) found throughout Australasia and South America [33], with a fossil record dating back 80 million years [34]. Gondwanan vicariance is widely accepted to have played a major role in distribution and speciation; however, several studies caution against the tendency to invoke these geologic events as the only possible explanation for such patterns [33,34]. An alternate hypothesis for observable patterns of distribution among Coloburiscidae includes dispersal events. While it has been generally hypothesized that mayflies are poor dispersers, some oceanic and volcanic islands have been colonized with subsequent in situ radiation [6].
Thus, the intrafamilial relationships remain mostly inconclusive; however, these results and the 2019 analysis firmly support the monophyly of the family Coloburiscidae and its placement relative to other mayfly families. The five-gene Sanger dataset failed to support Coloburiscidae as monophyletic (individual analyses of each gene likewise did not support monophyly) and showed low support values across many nodes. Therefore, while these genes have been touted as somewhat successful in estimating relationships in previous analyses, one can only infer that they are not informative for all depths in an evolutionary tree, especially for relationships as old as the ones being examined in these lineages of mayflies. This point further illustrates the importance and effectiveness of robust datasets (i.e., more loci and more taxa) and analyses in resolving higher-level relationships.