1. Introduction
Prokaryotic and eukaryotic cells utilize glycerol, a three-carbon polyhydric alcohol, as a metabolic intermediate for anaerobic fermentation or aerobic glycolysis, gluconeogenesis, and the biosynthesis of triacylglycerols and phospholipids [
1,
2]. In addition, due to its colligative properties, organisms as diverse as algae, fungi, insects, and fishes can accumulate glycerol to alleviate osmotic stress or as an antifreeze metabolite [
3,
4,
5,
6,
7,
8]. Though glycerol may passively diffuse across cell membranes [
9], its transport is greatly facilitated by a group of integral membrane proteins termed the aquaglyceroporins (Glps) [
10,
11,
12,
13,
14,
15]. Glps were first identified in bacteria as the
Escherichia coli glycerol facilitator (GlpF), and they have been phylogenetically and functionally shown to belong to a superfamily of transmembrane water-conducting channels, the aquaporins [
16,
17,
18,
19].
Subsequent research has revealed a complex evolutionary history of channels capable of facilitating the transmembrane conduction of glycerol, an uncharged polar solute. For example, other members of the aquaporin (AQP) superfamily, including Archaean AqpM, plant GIPs and NIPs, insect Eglps, and vertebrate AQP6, -8, -11, and -14, also evolved this biophysical property [
20,
21,
22,
23,
24,
25,
26,
31]. In the cases of land plants, which acquired GIPs and NIPs via the horizontal gene transfer of bacterial channels, and hemipteran and holometabolous insects, which evolved more efficient Eglps from mutated AQP4-related channels, the genomic expansions of these new forms of glycerol transporters have been found to correlate with the supplantation of their
glp genes [
13,
25,
26,
28,
29]. In nearly all other organisms studied to date, however, the phylogenetically conserved
glps remain the most ubiquitously selected form of glycerol transporter.
Following the discovery of the bacterial GlpF, multiple Glps were identified in placental mammals (AQP3, -7, -9, and -10), which were named according to the chronology of aquaporin gene discovery [
30,
31,
32,
33]. In line with this nomenclature, a fifth mammalian Glp, termed
AQP13, which is phylogenetically related to a frog Glp [
34], was identified in the genome of the platypus (
Ornithorhynchus anatinus), an extant member of the egg-laying prototherian mammals [
35]. Conversely, in non-mammalian model organisms, genome-wide studies revealed seven
glp genes in zebrafish (
Danio rerio) (aqp3a, -3b, -7, -9a, -9b, -10a, and -10b) [
36] and thirteen related orthologs in the paleotetraploid Atlantic salmon (
Salmo salar) [
37]. Such gene copy numbers represent ~1/3 of their genomic aquaporin complements, highlighting their importance for cellular homeostasis. In accordance with the zebrafish information network (ZFIN) nomenclature recommendations, the teleost
glps were named with respect to their phylogenetic relationships to the mammalian orthologs, although a unified nomenclature for piscine aquaporins has yet to be established.
As for the high aquaporin gene copy numbers in land plants, with up to 120 paralogs in species such as the rapeseed (
Brassica napus) [
38,
39], the multiplicity of the teleost and mammalian repertoires has been associated with serial rounds of whole genome duplication (WGD) [
40]. For example, in chordates, two rounds (R1 and R2) are considered to have occurred >500 million years ago (Ma), while a third round (R3) occurred ~300 Ma, prior to the diversification of Teleostei [
41,
42]. Subsequently, within the teleost lineage, a fourth round (R4) of autotetraploidization occurred ~80–100 Ma in the common ancestor of Salmoniformes, and allotetraploidization ~12.4 Ma in specific cyprinid lineages [
41,
42,
43,
44,
45,
46,
47]. It is nevertheless thought that genome reduction has been the dominant mode of evolution, with duplicated paralogs typically silenced within a few million years [
48,
49]. As a result, many genes are likely to have been lost, which can obscure interpretations of the origin and modes of diversification of a gene superfamily. Asymmetric gene loss can also have undesired consequences for estimations of divergence times and interpretations of the adaptive functions of the encoded proteins, since it is usually assumed that single gene families represent the same ortholog in each species included in the data set [
50,
51]. This might be particularly problematic for distantly related piscine lineages that experienced both common and independent WGD and/or gene duplication events, along with the subsequent asymmetric loss of paralogs long after the duplication events. Conversely, the asymmetric acquisition of new genes or the absence of gene differentiation due to concerted evolution may further confound interpretations of historical duplication events.
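The effect of asymmetric gene loss on divergence time estimates can be illustrated with a minimal sketch. It assumes a single duplication that predates a speciation event; the lineage names, copy labels, and the 300 and 100 Ma values are hypothetical and are not taken from the data set analyzed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    name: str
    retained: frozenset  # labels of the duplicates that survive in this lineage


def apparent_split_ma(x: Lineage, y: Lineage,
                      duplication_ma: float, speciation_ma: float) -> float:
    """Age (Ma) of the node joining the genes sampled from lineages x and y.

    Assumption of this sketch: one duplication that is older than the
    speciation event, followed by independent gene loss in each lineage.
    """
    if duplication_ma < speciation_ma:
        raise ValueError("the duplication must predate the speciation event")
    if not x.retained or not y.retained:
        raise ValueError("both lineages must retain at least one copy")
    if x.retained & y.retained:
        # A shared duplicate exists, so true orthologs can be compared
        # and the gene tree node dates the speciation event.
        return speciation_ma
    # Reciprocal loss: the remaining genes are paralogs, so the node
    # dates the older duplication event instead.
    return duplication_ma


# Hypothetical lineages with illustrative copy labels.
lineage_a = Lineage("lineage A", frozenset({"copy1"}))
lineage_b = Lineage("lineage B", frozenset({"copy2"}))
lineage_c = Lineage("lineage C", frozenset({"copy1", "copy2"}))
```

With these illustrative values, comparing lineage A with lineage B would date the older duplication rather than the speciation event, which is the kind of overestimate that an assumption of single-copy orthology can introduce.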
Amongst more basal lineages of deuterostomes, preliminary studies have indicated that echinoderms such as the purple sea urchin (
Strongylocentrotus purpuratus) can harbor similar aquaporin and
glp gene copy numbers to birds, reptiles, and mammals; however, its genome did not undergo WGD [
35,
52]. How such copy numbers arose and whether they typify the
glp repertoires of other non-chordate deuterostomes remain to be established.
With the increased availability of public data from large-scale genome sequencing initiatives, it is becoming possible to address such questions in broad groups of distantly related organisms. In the present study, we leveraged >550 genomes and >300 transcriptomes to re-evaluate the evolutionary history of the Glp grade of aquaporins in Deuterostomia using Bayesian and maximum likelihood inference coupled with syntenic analyses. We found that tandem (intrachromosomal) duplication played an unexpectedly important role for glp gene evolution and retention in basal lineages, including starfishes, sea urchins, and sea cucumbers (Echinodermata), acorn worms (Hemichordata), lampreys (Hyperoartia), chimaeras and elasmobranchs (Chondrichthyes), as well as spiny ray-finned fishes (Actinopterygii). We assembled and phylogenetically analyzed >100 pseudogenes to validate the proposed evolutionary history, and we experimentally tested the functionality of the channels in a paleotetraploid teleost, the Atlantic salmon, which experienced the most complicated history of gene evolution. Based upon the collated findings, we propose a new pandeuterostome gene nomenclature for the Glp grade of glycerol transporters.
4. Discussion
The present work provides a reasonably comprehensive overview of the evolution of glycerol transporters in deuterostome organisms. The findings showed that gene expansion in Ambulacraria was primarily driven by tandem duplication that generated three major glp subfamilies (glp1, -2, and -3), while non-vertebrate cephalochordate and tunicate lineages mostly evolved single copy glp genes. Tandem duplication was further responsible for lineage-specific expansions within each of the Ambulacrarian glp subfamilies, leading to increased repertoires of glp3-type channels in hemichordate acorn worms, glp2-type channels in ophiuroid brittle stars, and glp1-type channels in echinoderm sea urchins, sea cucumbers, and starfishes. A surprising feature of the evolution of vertebrate glps is that although serial rounds of WGD can explain much of the observed channel diversity, tandem duplication also played an important role in shaping the genomic repertoires.
Previous analyses of the origins of the vertebrate
glp subfamilies have indicated that they likely originated in a narrow window of time in association with WGD [
12,
35,
78]. However, a confounding aspect of the evolution of these channels in vertebrates has been the observation that certain lineages, including prototherian mammals and anuran frogs, harbor a fifth
glp gene—the
AQP13 channel [
34,
35]. The current analyses confirmed the existence of
AQP13 in Prototheria and Amphibia but further provided phylogenetic support for its existence in hagfishes (Hyperotreti). Though the latter support could be caused by long-branch attraction, the extensive codon analyses consistently placed one of the two hagfish sequences in an ancestral position of the
AQP13 cluster, irrespective of the taxonomic sampling, while the other sequence tended to cluster between the
AQP13 and
AQP3 channels. This indicated that the
AQP13 subfamily is ancient and potentially one of the founding members of the vertebrate
glps. The data for lamprey (Hyperoartia) showed that the two
aqp3-like sequences consistently cluster with gnathostome
aqp3, while the other
aqp9_13-like sequences, which evolved through tandem duplication, either cluster with gnathostome
aqp9 or between the
aqp3 and
aqp13 channels. Conversely, the gnathostome channels robustly resolve into the five
glp subfamilies (
aqp3,
-7,
-9,
-10, and
-13). If we apply the one-to-four rule for two rounds of WGD [
79], only four of these subfamilies can be explained by WGD. Since there are currently >1000 vertebrate genomes sequenced and none show evidence of a sixth
glp subfamily, it seems likely that one of the gnathostome subfamilies evolved through a different mechanism.
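The arithmetic behind the one-to-four rule can be written out as a short sketch. It assumes a single ancestral gene and no gene loss or additional duplication between rounds; the function and variable names are illustrative only.

```python
def max_paralogs_after_wgd(rounds: int, ancestral_copies: int = 1) -> int:
    """Upper bound on paralogs after a number of whole genome duplications.

    Each round at most doubles the copy number, so the bound is
    ancestral_copies * 2**rounds (assuming no other duplication mechanism).
    """
    if rounds < 0 or ancestral_copies < 0:
        raise ValueError("rounds and ancestral_copies must be non-negative")
    return ancestral_copies * 2 ** rounds


# The five gnathostome glp subfamilies named in the text.
GNATHOSTOME_SUBFAMILIES = ("aqp3", "aqp7", "aqp9", "aqp10", "aqp13")

# Two rounds (R1 and R2) acting on one ancestral gene.
explained_by_wgd = max_paralogs_after_wgd(rounds=2)

# Subfamilies that need a mechanism other than WGD under these assumptions.
unexplained = max(0, len(GNATHOSTOME_SUBFAMILIES) - explained_by_wgd)
```

Under these assumptions, two rounds account for at most four of the five subfamilies, leaving one that requires another origin.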
The best candidate would seem to be
AQP7, which is tandemly arrayed downstream of
AQP3 in all Sarcopterygian genomes. However, as revealed by the
glp1b genes in asteroids, which are juxtaposed to the
glp3 channels, a tandem locus may not be evidence of the genes' duplication origin. Furthermore, until now, it has not been possible to provide extensive evidence for the linkage between
aqp3 and
aqp7 genes in actinopterygian fishes, nor a similar linked gene in cartilaginous fishes (Chondrichthyes). The data for basal lineages of actinopterygian fishes now show the linkage of
aqp7 to the
aqp3 paralog in Cladistia and Chondrostei and to the
aqp3b paralog in Otomorphan teleosts. We further uncovered a tandem duplicate downstream of the
aqp3 paralog in all lineages of cartilaginous fishes. However, in this latter instance the phylogenetic data revealed that the downstream paralog is not an
aqp7-type channel but a second form of
aqp3 (
aqp3C2). We therefore found no evidence for the existence of
aqp7-type channels in any vertebrate lineage prior to the evolution of true bony fishes (Osteichthyes). It thus seems reasonable to propose a new scheme for the evolution of
glps in Deuterostomia (
Figure 7). The scheme combines the observations in this study with the divergence times of the different lineages [
77], and it shows that
glps indeed likely arose during a narrow window of time through tandem duplication in Ambulacraria, and WGD together with tandem duplication in Chordata.
The above interpretation requires the loss of the
aqp13 paralog in most of the vertebrate lineages, yet its survival in Hyperotreti, Hyperoartia, Amphibia, and Prototheria. While such haphazard gene survival may seem unlikely, due to the ~250 million years separating the last common ancestor of the four lineages [
80], it is nevertheless the pattern of
aqp10ba evolution observed in the Gadiformes order of teleost fishes. In this latter instance, we also uncovered complete orthologs of the
aqp10ba channel in the ostariophysan (electric eel), protacanthopterygian (selected salmonids), and aulopiform (Atlantic greeneye) lineages. Our models showed that the
aqp10ba channel arose as an R3 duplicate of
aqp10aa ~300 Ma, and although it was lost in the majority of teleost lineages, it still survives in Gadiformes, which diverged ~225 million years after the R3 event. The proposed scheme of
glp evolution in Chordata also implies that the lamprey
aqp9_13L tandem duplicates are in fact
aqp13 channels. While our phylogenetic data did not demonstrate this facet, it seems possible that over the ~475 million years of evolution since the separation of the Petromyzontiform order of lampreys and the Myxiniformes order of hagfishes [
76,
81], the
aqp9_13L channels could have convergently evolved molecular features of the gnathostome
aqp9 channels. As in our previous analysis of the aquaporin superfamily in vertebrates [
35], the present phylogenetic data for the four lamprey
glps did not support their R2 origin, which has been suggested to have occurred before the divergence of cyclostomes and gnathostomes [
82,
83]. On the contrary, it supported their lineage-specific interchromosomal and tandem duplications within petromyzontiform lampreys.
While the timing of the R2 WGD in the chordate lineage continues to be debated [
84,
85], its effect on gnathostome gene and genome evolution is less controversial [
41,
86,
87,
88]. Similarly, the timing and effects of WGD in the teleost lineage are well documented [
42,
89] and consistent with the evolution of the
glp channels examined here. It is nevertheless recognized that genome reduction (rediploidization) is a dominant mode of evolution [
49]. Consequently, it is surprising to note the asymmetric gene evolution of
glp genes in actinopterygian fishes compared to tetrapods. Both lineages evolved over similar time scales (~400 million years), but tetrapods have mostly retained the same four paralogs (
AQP3,
-7,
-9, and
-10) with comparatively little gene loss. The main exception is the
AQP10 gene, which appears to be lost in Testudines and Prototheria and is a pseudogene in Ruminantia [
70], several families within Muroidea [
69], and some species of Metatheria. By contrast, we found substantial evidence of multiple
glp gene losses in actinopterygian fishes, with a loss of
aqp7 in Holostei and
aqp7b in Teleostei,
aqp10ba and
-10ab in most lineages of Teleostei,
aqp3a and
-9b in Osteoglossiformes,
aqp3b and
-10aa in Elopomorpha, and
aqp3b in many orders of Percomorphaceae. Similarly, non-WGD
glp expansions in Tetrapoda are rare, with only a few instances of tandem duplication at the individual species level. One exception is at the terminal branches of primate evolution, where tandem duplication of canonical
AQP7 occurred at least as far back as the last common ancestor (LCA) of Homininae (~10–13 Ma) [
90], with further duplications in chimpanzees and humans. Subsequently, genomic rearrangements reduced the repertoire of functional
AQP7 channels in humans, leaving six pseudogenes spread across two chromosomes. Such differences may be related to the fruit and vegetable diet of gorillas and chimpanzees compared to the more omnivorous diet of humans [
91]. Amongst Actinopterygii, and indeed the more basal lineages of Chondrichthyes and Hyperoartia, we found ample evidence of lineage-specific gene retention of additional
aqp3 and
-10 channels in the aftermath of tandem duplication. In the absence of detailed phylogenetic and syntenic analyses, such homologs could confound BLAST searches aimed at identifying WGD duplicates or single-copy orthologous sequences for divergence time estimations. For example, our analyses revealed that even for relatively recent duplications such as the
aqp3a1 and
aqp3a2 channels in Cichliformes and the
aqp10aa1 and
aqp10aa2 channels in Cyprinodontoidei, divergence time estimations could differ by millions of years. These observations highlight the complex challenges facing evolutionary investigations of actinopterygian fishes.
The paleotetraploid Salmoniformes are an order of teleosts that clearly represent such a challenge. Our new estimate of the ancestral
glp gene copy number was 20 paralogs, of which approximately half are functional. While this is consistent with rediploidization and a differential retention of post-R4 paralogs compared to the R3 paralogs [
92], our data revealed unexpectedly complex genomic rearrangements, even in closely related paralogs. The strange case of the
aqp10aa1 and
aqp10aa2 channels indicated that they not only experienced concerted evolution post R4 but have been differentially translocated to become linked in several species, including the rainbow trout (
Oncorhynchus), Arctic charr (
Salvelinus), Atlantic salmon (
Salmo), and grayling (
Thymallus) but not in other congeners such as the coho salmon (
Oncorhynchus) or the brown trout (
Salmo). Assuming that the genome assemblies are correct, these observations are consistent with bursts of DNA transposon activity, including the DTSsa15-like Tc1/mariner type identified in the present study, coinciding with speciation events [
75]. However, since we predict that the
aqp10aa2 genes are products of R4 as indicated by the syntenic analyses, the absence of large-scale gene conversion events seems paradoxical. The sequence homogenization is not restricted to the
aqp10aa1 and
-10aa2 genes, as it is also evident in the downstream
aqp10ab1 and
-10ab2 pseudogenes, which evolved ~300 million years earlier by tandem duplication in the LCA of Actinopterygii. The observed concerted evolution may therefore be associated with the transposon-mediated interlocus transposition of the binary clusters rather than canonical gene conversion. It is nevertheless clear that the
aqp10aa1 gene displays the most restricted tissue expression profile of all of the salmonid
glps, even compared to the
aqp10bb1 channel [
93], indicating that despite the absence of differentiation in the coding regions, the cis-regulatory regions have functionally diverged.