Gene Structure Evolution of the Short-Chain Dehydrogenase/Reductase (SDR) Family

Short-chain dehydrogenases/reductases (SDRs) constitute one of the oldest and most heterogeneous protein superfamilies, whose classification is problematic because of the low percent identity among members, even within families. To gain clearer insights into SDR molecular evolution, we explored the splicing-site organization of the 75 human SDR genes across their vertebrate and invertebrate orthologs. We found anomalous gene structures in members of the human SDR7C and SDR42E families that suggest retrogene properties and independent evolutionary trajectories from a common invertebrate ancestor. The same analyses revealed that the identity value between human and invertebrate non-allelic variants is not necessarily associated with a homologous gene structure. Accordingly, we propose a revision of the SDR nomenclature that places the human SDR40C1 gene in the SDR7C family.


Introduction
Short-chain dehydrogenases/reductases (SDRs) form a superfamily of NAD(P)-dependent oxidoreductases and related enzymes present in all living organisms [1][2][3]. For almost all members of the superfamily, the gene structure, as well as the tertiary structure and catalytic activity of the protein, have been defined. Despite their low sequence identity (typically 10% to 30% between families), SDR monomers share very similar tertiary structures and two conserved sequence motifs: one responsible for binding the NAD(P) coenzyme and the other containing the amino acids directly involved in the catalysis of the various SDR substrates.
Among SDR enzymes, sequence divergence increases toward the C-terminus, where the amino acids forming the substrate-binding site are located [1][2][3]. Sequence variability at the substrate-binding site is associated with a wide spectrum of substrates, resulting in unique active sites and substrate specificities [2]. SDRs also have an emerging role in human diseases: evidence has accumulated on the contribution of SNP variants in specific human SDR genes to a wide variety of cancers [4][5][6][7] and metabolic diseases [8][9][10][11][12]. The combination of low sequence similarity and high structural similarity suggests a fusion of domains specific for a common coenzyme and for a wide spectrum of substrates [13]. It has also been argued that the alcohol dehydrogenase (ADH) of the model organism Drosophila melanogaster is the ancestral "prototype" of the mammalian SDR superfamily and was responsible for the worldwide radiation of fruit fly species some 65 Mya [14]. Since then, many new data have accumulated in genetic databases. Through a comprehensive analysis of protein sequences across living species, Persson et al. [3] identified 75 human SDR non-allelic variants, which a clustering approach based on Hidden Markov Models (HMMs) assigned to 48 families (out of a total of 200), each with one to eight members.
The low percent identity on which SDR gene classification is based, together with the uncertain relationship between structure and function across species, calls for a more comprehensive evolutionary analysis of SDR genes.
Splicing-site organization is generally conserved over very long evolutionary distances and has been shown to retain information about gene homology better than other molecular traits, especially in large gene families [15,16]. Splicing-site organization can therefore help identify cryptic homologous variants of SDR genes, either within the same species or across different species [17,18]. In the present paper, we annotated the splicing-site organization and the splicing-site phase of each member of the human SDR families, together with the splicing-site organization of their vertebrate and invertebrate orthologs, with the aim of achieving a more comprehensive nomenclature.

Methods
Computational tools and SDR gene and protein data were obtained from public primary databases: the European Bioinformatics Institute (EBI), UK; the UCSC Genome Browser, University of California, Santa Cruz, USA; the National Center for Biotechnology Information (NCBI), Maryland, USA; and ExPASy, Swiss Institute of Bioinformatics (SIB), Switzerland.
Vertebrate and invertebrate orthologs of human SDR variants were detected using BLASTp (release 2.13.0, NCBI). Protein sequence alignments and their identity values were obtained using Clustal Ω (release 1.2.2, EBI). Exon-intron organization, intron physical positions, phase numbers, exon-intron boundary sequences and compliance with the AG-GT/GC rule were checked manually on gene sequence/cDNA alignments. The intron phase denotes the position of the intron relative to the codon: phase 0, 1 or 2 indicates that the intron lies before the first base, after the first base, or after the second base of a codon, respectively. Every multiple polypeptide alignment was verified by pairwise comparisons across vertebrate species; splicing sites with an identical phase number and an identical sequence position were annotated as orthologous. We adopted the symbols used for human SDR variants by Persson et al. [3]: SDR followed by the family number and a single letter, A (Atypical), C (Classical), E (Extended) or U (Unknown). C and E denote the two major types of SDR family enzymes [3].
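The phase definition above can be made concrete with a small calculation. The Python sketch below assumes that a gene model is available as the list of coding nucleotides contributed by each exon, counted from the first base of the start codon; the phase of each intron is then the number of upstream coding bases modulo 3. The function names and the exon lengths in the example are hypothetical and only illustrate the arithmetic; in this study, phases were checked manually.

```python
from typing import List, Tuple


def intron_phases(cds_exon_lengths: List[int]) -> List[int]:
    """Phase of each intron that interrupts the coding sequence.

    cds_exon_lengths: coding nucleotides contributed by each exon, in
    transcript order, counted from the first base of the start codon.
    """
    phases = []
    coding_bases = 0
    # The last coding exon is not followed by an intron inside the CDS.
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        # 0: between codons; 1: after the 1st base; 2: after the 2nd base.
        phases.append(coding_bases % 3)
    return phases


def intron_sites(cds_exon_lengths: List[int]) -> List[Tuple[int, int]]:
    """(residue, phase) for each intron.

    residue is the 1-based index of the codon containing the last coding
    base before the intron (for phase 0, the codon just upstream of it).
    """
    sites = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:
        coding_bases += length
        sites.append(((coding_bases - 1) // 3 + 1, coding_bases % 3))
    return sites


if __name__ == "__main__":
    # Hypothetical three-exon gene model: 99 + 151 + 200 coding bases.
    lengths = [99, 151, 200]
    print(intron_phases(lengths))  # [0, 1]
    print(intron_sites(lengths))   # [(33, 0), (84, 1)]
```

Reporting the residue index together with the phase is what allows splicing sites of different genes to be compared on a protein alignment.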
We identified the invertebrate orthologs of human SDR variants as follows. First, we used BLAST to detect the invertebrate protein with the highest sequence similarity to the human SDR protein variant. Then, we verified that its amino acid sequence contained the SDR structure consensus (TGxxxGxG or TGxxGxxG) and the catalytic consensus YxxxK, which are diagnostic of SDR superfamily members [1][2][3]. The cofactor-binding site and catalytic active-site consensus sequences of the human SDR families are reported in the supplementary material (Online Resources 1: Tables S1 and S2). Finally, we verified that the invertebrate SDR variant carried splicing sites orthologous to those of the human variant used as the BLAST query.
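As an illustration of the consensus screen, the following minimal Python sketch searches a protein sequence for the TGxxxGxG or TGxxGxxG cofactor-binding motif and for the YxxxK catalytic motif with regular expressions, where x stands for any residue. It assumes an uppercase one-letter sequence; the example sequence is hypothetical. Because a short pattern such as YxxxK also occurs by chance, a motif hit is at most a necessary condition and does not replace the BLAST and splicing-site checks.

```python
import re

# Consensus motifs quoted in the Methods; "x" (any residue) becomes "." here.
# Zero-width lookaheads let finditer report overlapping matches.
COFACTOR_MOTIFS = {
    "TGxxxGxG": re.compile(r"(?=TG...G.G)"),
    "TGxxGxxG": re.compile(r"(?=TG..G..G)"),
}
CATALYTIC_MOTIF = re.compile(r"(?=Y...K)")


def find_motif(pattern: re.Pattern, sequence: str) -> list:
    """1-based start positions of every (possibly overlapping) match."""
    return [m.start() + 1 for m in pattern.finditer(sequence)]


def has_sdr_consensus(sequence: str) -> bool:
    """True if the sequence carries a cofactor motif AND a catalytic motif."""
    seq = sequence.upper()
    has_cofactor = any(find_motif(p, seq) for p in COFACTOR_MOTIFS.values())
    has_catalytic = bool(find_motif(CATALYTIC_MOTIF, seq))
    return has_cofactor and has_catalytic


# Hypothetical toy sequence (not a real SDR protein).
EXAMPLE = "MSGKVALVTGAGSGIGRAIAEELAKEGAKVVLNYSASK"

if __name__ == "__main__":
    for name, pattern in COFACTOR_MOTIFS.items():
        print(name, find_motif(pattern, EXAMPLE))
    print("YxxxK", find_motif(CATALYTIC_MOTIF, EXAMPLE))
    print(has_sdr_consensus(EXAMPLE))
```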
Phylogenetic trees were constructed from the binarized matrix of splicing-site phases using the Wagner parsimony method, as implemented in the PARS program of the PHYLIP package, version 3.6 [19]. Bootstrap analysis with 10,000 replicates was performed to estimate the support for each clade. The same tree topologies were obtained with a Bayesian approach using BEAST 2.7.0.0 [20].
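To show what a parsimony score on a binarized splicing-site matrix measures, the sketch below encodes hypothetical genes as presence (1) or absence (0) of site-phase characters, writes them in strict PHYLIP format (names padded to 10 characters), and counts the minimum number of changes on a fixed rooted tree with Fitch's algorithm, which for two-state characters gives the same count as Wagner parsimony. The bootstrap function is a simplification: it only compares a fixed set of candidate topologies across column-resampled replicates, whereas the usual PHYLIP workflow (SEQBOOT, PARS and CONSENSE) searches for the best tree in every replicate and summarizes the results as a consensus tree. All gene names and character values are invented for illustration.

```python
import random


def to_phylip(matrix):
    """Write a 0/1 character matrix in strict PHYLIP format (10-char names)."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    for name, chars in matrix.items():
        lines.append(f"{name[:10]:<10}{chars}")
    return "\n".join(lines)


def fitch(tree, matrix, col):
    """Fitch pass for one character; returns (state set, number of changes).

    tree is either a leaf name (str) or a (left, right) tuple of subtrees.
    """
    if isinstance(tree, str):
        return {matrix[tree][col]}, 0
    left_states, left_cost = fitch(tree[0], matrix, col)
    right_states, right_cost = fitch(tree[1], matrix, col)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, matrix, columns):
    """Total number of 0<->1 changes over the given character columns."""
    return sum(fitch(tree, matrix, c)[1] for c in columns)


def bootstrap_support(trees, matrix, replicates=1000, seed=0):
    """Fraction of column-resampled replicates in which each candidate tree
    is the most parsimonious (ties are shared equally)."""
    n_chars = len(next(iter(matrix.values())))
    rng = random.Random(seed)
    wins = [0.0] * len(trees)
    for _ in range(replicates):
        columns = [rng.randrange(n_chars) for _ in range(n_chars)]
        scores = [parsimony_score(t, matrix, columns) for t in trees]
        best = min(scores)
        winners = [i for i, s in enumerate(scores) if s == best]
        for i in winners:
            wins[i] += 1.0 / len(winners)
    return [w / replicates for w in wins]


# Hypothetical presence/absence of six splicing site-phase characters.
MATRIX = {
    "geneA": "111100",
    "geneB": "111000",
    "geneC": "000011",
    "geneD": "000111",
}
TREE_1 = (("geneA", "geneB"), ("geneC", "geneD"))
TREE_2 = (("geneA", "geneC"), ("geneB", "geneD"))

if __name__ == "__main__":
    print(to_phylip(MATRIX))
    print(parsimony_score(TREE_1, MATRIX, range(6)))  # 7
    print(parsimony_score(TREE_2, MATRIX, range(6)))  # 12
    print(bootstrap_support([TREE_1, TREE_2], MATRIX))
```

In this toy matrix the tree grouping geneA with geneB needs fewer changes for every character, so it is favored in nearly all replicates.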

Gene Structure of Vertebrate and Invertebrate Variants of SDR Families
The human SDR families, classified according to the relative identity values of their protein variants [3], may include members with identical (SDR7C1 and SDR7C2), similar (SDR7C1 and SDR7C3), or completely different (SDR7C1, SDR7C4 and SDR7C5) splicing patterns (Figure 1; Tables 1-3; Online Resources 2: Figures S1-S12, Tables S3-S14; Online Resources 3: Figures S13-S24, Tables S15-S26).
Figure 1. Alignment of the human SDR7C family and SDR40C1 protein variants. The × and + symbols mark the structure consensus and the catalysis consensus, respectively. Pairs of amino acid symbols in red mark the splicing-site positions. Splicing sites are numbered progressively, and the phase (p.) is indicated after the splicing-site number. The * symbol marks the positions of identical amino acid residues in the aligned SDR protein sequences.
Variants of seven human SDR families have an active site that includes two additional residues, asparagine (N) and serine (S), which together with the tyrosine (Y) and lysine (K) of the YxxxK consensus form the catalytic tetrad (N-S-Y-K) [3]. The human SDR families with the catalytic tetrad are SDR9C, SDR12C, SDR16C, SDR25C, SDR26C, SDR28C and SDR32C (Figures S2, S5, S6 and S8-S11, respectively, in Online Resources 2). The members of these families have invertebrate orthologs carrying identical catalytic tetrads (Online Resources 4: Figures S26-S36), suggesting that these sites were acquired before the origin of vertebrates. The splicing-site organization and splicing phases of human SDR7C (Online Resources 3: Figure S13, Table S15) and SDR21C (Online Resources 3: Figure S21, Table S23) variants are identical to those of their respective orthologs in each vertebrate class. Moreover, the other SDR families show a highly conserved splicing-site organization that is identical from fishes to humans.
However, a species may carry a gene of a given family that differs from its human ortholog by having shorter or longer exons, lacking one splicing site, and/or bearing one or two extra splicing sites (Online Resources 3: Figures S13-S18 and S20-S24, Tables S15-S20 and S22-S26). Additionally, invertebrate SDR variants may have gene structures that are identical, similar, or completely different with respect to their human SDR orthologs, which were identified by the sequence similarity of the encoded polypeptides (Online Resources 4: Figures S25-S36, Tables S27-S50). Among invertebrate proteins, we detected only one polypeptide variant (in the sea urchin S. purpuratus) with a gene structure identical to that of the human SDR11E1 and SDR11E2 variants (Online Resources 4: Table S42), whereas the invertebrate variants of the other human SDR families show splicing patterns that differ in the length of one or more exons and/or in the number of splicing sites (Online Resources 4: Figures S25-S36, Tables S27-S50).
The time at which the vertebrate gene structure was acquired can be inferred from the evolutionary position of C. intestinalis, a deuterostome considered a good approximation of the ancestral chordates that lived about 540 Mya. After the emergence of the chordate phylum, splicing patterns were conserved from fishes to humans, while gene and protein sequences diverged (Online Resources 3: Figures S13-S24, Tables S15-S26; Online Resources 5: Figures S37-S48).
Conversely, four human variants, SDR7C4, SDR42E1 (Online Resources 4: Tables S39 and S50), SDR21C1 and SDR21C2 (Online Resources 4: Table S45), show high identity values but completely different splicing patterns when compared with their respective invertebrate orthologs. These data indicate that, in SDR families, protein sequences and gene structures are phylogenetically uncoupled.
Interestingly, the Saccharomyces cerevisiae genome contains few introns, usually no more than one per gene [21]. We analyzed S. cerevisiae proteins carrying the structure and catalysis consensuses diagnostic of the SDR family, and none of their genes contained introns. Although little evidence supports increased intron loss in paralogous gene families during plant and animal evolution [22], specific surveys on yeast suggest that intron-loss events outnumber intron-gain events, with genes involved in metabolism, molecular transport and regulation of enzyme activity being more prone to intron loss [23]. Indeed, a recent study of 263 fungal species highlighted that intron loss is the major evolutionary trend for intron changes in this kingdom [24].
It is important to bear in mind, though, that analyses of the Zrt-, Irt-like protein (ZIP) gene family, regarded as ancestral genes in the evolution of the prion gene family, revealed that intron positions can be highly conserved when multiple sequences are compared [25]. In contrast, intron length appears to be less conserved over evolutionary time scales, resulting in heterogeneous intron lengths among ZIP genes [25].

Gene Structure of Human SDR7C and SDR42E Family Variants
The human SDR7C family has five non-allelic variants (Table 1; Online Resources 2: Figure S1). The SDR7C1 and SDR7C2 variants have six splicing sites and an identical gene structure (Table 3). The SDR7C3 variant has a very similar gene structure, but only five of the six splicing sites are homologous to those of the SDR7C1 and SDR7C2 variants (Table 3). The SDR7C4 variant has only one splicing site, which occupies the same protein-sequence position as the third splicing site of the SDR7C1, SDR7C2 and SDR7C3 variants but has a different phase number (Table 3). The SDR7C5 variant has four splicing sites and a gene structure completely different from those of all other human SDR7C family members (Table 3).
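The orthology criterion used in this work (identical aligned position and identical phase) can be sketched as follows. This hypothetical Python helper maps residue numbers of two gapped sequences to alignment columns, then separates splicing sites that match in both column and phase from those that share a column but differ in phase, the situation described above for SDR7C4. The toy alignment and site lists are invented and do not reproduce Table 3.

```python
def residue_to_column(aligned):
    """Map 1-based residue numbers of a gapped sequence to 0-based columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def compare_sites(aligned_a, sites_a, aligned_b, sites_b):
    """Classify the splicing sites of gene A against those of gene B.

    sites_* are (residue, phase) pairs. Returns two lists of (site_a, site_b):
    orthologous (same alignment column and same phase) and
    phase_shifted (same column but a different phase).
    """
    col_a = residue_to_column(aligned_a)
    col_b = residue_to_column(aligned_b)
    by_column_b = {}
    for residue, phase in sites_b:
        by_column_b.setdefault(col_b[residue], []).append((residue, phase))
    orthologous, phase_shifted = [], []
    for residue, phase in sites_a:
        for site_b in by_column_b.get(col_a[residue], []):
            if site_b[1] == phase:
                orthologous.append(((residue, phase), site_b))
            else:
                phase_shifted.append(((residue, phase), site_b))
    return orthologous, phase_shifted


# Hypothetical toy alignment and splicing sites (residue, phase).
ALIGNED_A = "MKV-LSTGA"
ALIGNED_B = "MKVALS-GA"
SITES_A = [(2, 0), (5, 1), (7, 2)]
SITES_B = [(2, 0), (6, 1), (7, 0)]

if __name__ == "__main__":
    ortho, shifted = compare_sites(ALIGNED_A, SITES_A, ALIGNED_B, SITES_B)
    print("orthologous:", ortho)
    print("same position, different phase:", shifted)
```

Note that residue 5 of gene A and residue 6 of gene B are treated as the same position because the alignment places them in the same column.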
We found that the human SDR40C1 variant, which is not included in the SDR7C family [3], has two splicing sites homologous to those of the SDR7C1, SDR7C2 and SDR7C3 variants (Table 3), despite relatively low identity with the human SDR7C variants (26.3-31%, Table 4). Presumably, the low sequence identity prevented the HMM-based clustering models [3] from correctly assigning SDR40C1 to the SDR7C family. Conversely, parsimony analyses based on the splicing-phase structure (Online Resources 5: Figure S37) place the SDR40C1 variant among the vertebrate and invertebrate orthologs of the SDR7C family, with bootstrap support of up to 92%. Indeed, even proteins belonging to different superfamilies, which may share as little as 4% identity and whose homology can only be inferred from a similar 3D structure and function, may share splicing patterns [26].
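Percent identity values such as those in Table 4 depend on the chosen denominator, and different tools use different conventions. The minimal Python sketch below computes identity for two sequences taken from the same alignment, either over gap-free columns or over the length of the shorter ungapped sequence. It is an illustration under these stated assumptions and is not guaranteed to reproduce the values reported by Clustal Ω; the sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, denominator="aligned"):
    """Percent identity of two sequences taken from the same alignment.

    denominator="aligned": columns where neither sequence has a gap.
    denominator="shorter": residues in the shorter ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Identical residues; columns where both sequences have a gap are ignored.
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    if denominator == "aligned":
        n = sum(1 for x, y in pairs if x != "-" and y != "-")
    elif denominator == "shorter":
        n = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * identical / n if n else 0.0


if __name__ == "__main__":
    a, b = "MKV-LSTGA", "MRVALS-GA"  # hypothetical aligned pair
    print(round(percent_identity(a, b), 2))            # 85.71
    print(percent_identity(a, b, denominator="shorter"))  # 75.0
```

The same pair of sequences yields about 85.7% or 75% depending on the convention, which is why identity values should be compared only when computed the same way.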

Invertebrate Orthologs of Human SDR Family Variants
The splicing patterns of the human SDR7C family are the same in all vertebrate orthologs. Human SDR7C1, SDR7C2 and SDR7C3 variants have orthologs in several invertebrate species. The S. purpuratus SDR7C-1C3 variant has splicing sites and phases identical to those of the human SDR7C1 and SDR7C2 variants (Table 3); it can therefore reasonably be regarded as the ancestral form. The human SDR7C4 variant has a single phase-0 splicing site located at the same protein-sequence position as one splicing site of the S. purpuratus SDR7C-1C3, B. malayi SDR7C-1C1 and A. californica SDR7C-1C3 variants; however, the invertebrate splicing sites have a different phase number (Table 3). The human SDR7C5 variant has diagnostic splicing sites and phases completely different from those of all other human SDR7C family members, but it shares orthologous splicing sites with invertebrate SDR7C variants. Some variants (SDR7C-1C1 and SDR7C-3C3 in C. intestinalis, SDR7C-2C2 in S. purpuratus, SDR7C-1C2 in A. californica) have orthologous splicing sites only with the human SDR7C5 variant (Table 3), while others (SDR7C-1C2 and SDR7C-3C4 in C. intestinalis, SDR7C-1C1 in S. purpuratus) have orthologous splicing sites with human SDR7C5 as well as with the human SDR7C1, SDR7C2 and SDR7C3 variants (Table 3; Online Resources 5: Figure S37). We speculate that the human SDR7C1, SDR7C2, SDR7C3 and SDR7C5 variants originated from a common ancestral gene that differentiated early in invertebrate evolutionary history (Online Resources 4: Figure S25).
Four splicing sites diagnostic of the human SDR40C1 variant have orthologs in the C. intestinalis SDR7C-3C2 variant, whose splicing sites are orthologous only to those of human SDR40C1 (Table 3). In contrast, the S. purpuratus SDR7C-1C3 and A. californica SDR7C-1C3 variants have splicing sites orthologous to those of human SDR40C1 as well as of SDR7C1, SDR7C2 and SDR7C3 (Table 3). Human SDR40C1 shares 54.57% sequence identity with C. intestinalis SDR7C-3C2 and less than 30% with the other invertebrate orthologous variants (Table 4). Parsimony analyses confirm that they are closely related (Online Resources 5: Figure S37). We therefore infer that C. intestinalis SDR7C-3C2 is the variant closest to the most recent common ancestor of human SDR40C1. This gene may have evolved from an invertebrate gene ancestral to the human SDR40C1-SDR7C1-SDR7C2-SDR7C3 gene clade. These data support our hypothesis that the SDR40C1 variant is a member of the SDR7C family.
The human SDR42E family has only two non-allelic variants, SDR42E1 and SDR42E2 (Table 5), which have one and ten splicing sites, respectively (Table 6). Nevertheless, SDR42E1 and SDR42E2 have relatively high sequence similarity (Figure 2, Table 7), with consensus regions distributed in distinct blocks along the polypeptide (Online Resources 2: Figure S12), and their gene structures are identical to those of all their respective vertebrate orthologs (Online Resources 3: Figure S24). Only human SDR42E2 has splicing sites orthologous to those of invertebrate SDR variants (Table 6). In particular, the ten human SDR42E2 splicing sites have orthologs in several invertebrate variants, and all ten have orthologs in the C. intestinalis SDR42E-1E1 variant (Table 6). This splicing pattern is also supported phylogenetically (Online Resources 5: Figure S48). Thus, the C. intestinalis SDR42E-1E1 gene may be the closest proxy of the most recent common ancestor of invertebrate and human SDR42E variants.
Table 6. Splicing-site organization of the human SDR42E family variants and their invertebrate orthologs. For further details see Table 1.

Human SDR7C4 and SDR42E1 Are Possibly Active Retrogenes
We speculate that the human SDR7C4 and SDR42E1 genes are active retrogenes. Retrogenes arise from processed mRNA; they lack splicing sites and are usually not transcribed. However, a certain number of retrogenes are functionally active [27,28]. They may have inherited the promoter bound to the coding sequence of the parental gene, or acquired a new promoter by chance when the retrogene sequence was inserted into the DNA of a given chromosome. To be heritable, retrotransposition must occur in the germline or during early embryonic stages [28]. Many active retrogenes have been discovered in mammals, and many of them acquired their functional role in the germline. After retrotransposition, active retrogenes may also acquire new introns [29].
The human SDR7C4 gene has the basic characteristics of an active retrogene. It is highly expressed in the testis (NCBI gene ID: 57665), its chromosomal location differs from that of every other human SDR7C variant (Online Resources 2: Table S3), and it has a single splicing site that, unlike those of the other SDR7C variants, has no orthologs with an identical phase number in the invertebrate variants. Moreover, human SDR7C4 has high identity values with several invertebrate variants that are orthologs of other human SDR7C variants (Table 4). We interpret these data as indicating that human SDR7C4 arose by retroposition of a metazoan ancestral gene before the chordate radiation.
Likewise, the human SDR42E1 gene has the basic characteristics of an active retrogene. SDR42E1 is highly expressed in the human testis (NCBI gene ID: 93517) and, in vertebrate species other than Catarrhines, is located on a different chromosome from SDR42E2; in Catarrhines, SDR42E1 and SDR42E2 lie on the same chromosome. SDR42E1 has a single splicing site that is shared across all vertebrate species (Online Resources 3: Table S26a) but has no orthologs in the analyzed invertebrate variants (Table 6). Moreover, human SDR42E1 shares slightly higher percent identity with the invertebrate orthologs of SDR42E2 than human SDR42E2 itself does (Table 7). These data suggest that human SDR42E1 was generated by retrotransposition of an invertebrate ortholog of human SDR42E2 before the formation of the chordate phylum.
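The lines of evidence listed for SDR7C4 and SDR42E1 can be summarized as a simple checklist. The Python sketch below is a hypothetical scoring aid, not a retrogene detection method: the field and criterion names are invented, and the values for the two genes are transcribed from the text (a single splicing site, no phase-identical invertebrate orthologous sites, high testis expression; SDR42E1 shares a chromosome with SDR42E2 in Catarrhines, including humans).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneRecord:
    name: str
    n_splice_sites: int
    # Sites with the same aligned position AND phase in invertebrate orthologs.
    invertebrate_orthologous_sites: int
    same_chromosome_as_paralogs: bool
    high_testis_expression: bool


def retrogene_evidence(gene: GeneRecord) -> List[str]:
    """Return the heuristic criteria from the text that the gene satisfies."""
    criteria = {
        "at_most_one_splice_site": gene.n_splice_sites <= 1,
        "no_invertebrate_orthologous_sites": gene.invertebrate_orthologous_sites == 0,
        "different_chromosome_from_paralogs": not gene.same_chromosome_as_paralogs,
        "high_testis_expression": gene.high_testis_expression,
    }
    return [name for name, met in criteria.items() if met]


# Values transcribed from the text for the two human candidates.
SDR7C4 = GeneRecord("SDR7C4", 1, 0, False, True)
SDR42E1 = GeneRecord("SDR42E1", 1, 0, True, True)

if __name__ == "__main__":
    for gene in (SDR7C4, SDR42E1):
        print(gene.name, retrogene_evidence(gene))
```

A criterion that is not met is not decisive on its own: as argued below, the shared chromosome of human SDR42E1 and SDR42E2 is interpreted as the result of a later genomic rearrangement, which only comparative data across species can reveal.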
We could not find published data on the chromosomal locations of SDR42E1 and SDR42E2 in fish, amphibian, and reptilian species. However, vertebrate SDR42E1 and SDR42E2 variants lie on different chromosomes from birds up to non-catarrhine primate species (Callithrix jacchus and Microcebus murinus) and on the same chromosome in Catarrhines (Online Resources 3: Table S26a; Online Resources 6: Figures S49 and S50, Table S51a). We interpret these data as indicating that, during evolution, a genomic rearrangement brought the SDR42E1 and SDR42E2 genes onto the same chromosome. This hypothesis is supported by the observation that, in Catarrhines, the SDR42E1 and SDR42E2 loci have identical relative positions and comparable distances, whereas the SDR42E1 and SDR42E2 proteins have the same molecular characteristics as those of non-catarrhine primates and other vertebrate species in which the two genes are located on different chromosomes (Online Resources 3: Table S26; Online Resources 6: Figures S49 and S50, Table S51a,b). These data suggest that human SDR42E1 is an active retrogene rather than a duplicated copy of the SDR42E2 gene, adding a new case study that highlights the importance of retroposition in gene evolution [30][31][32][33][34].

Conclusions
A deeper insight into the molecular evolution of SDR gene families allowed us to refine classification schemes and clarify evolutionary patterns, despite the low sequence homology. One member each of the human SDR7C and SDR42E gene families retains traces of a very deep divergence at the root of the chordate clade. The human SDR7C4 and SDR42E1 genes show the properties of active retrogenes, while the human SDR40C1 gene shows a conserved splicing pattern that supports its inclusion in the same protein family as the SDR7C variants.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14010110/s1, Figure S1: Alignment of the human SDR7C family and of SDR40C1 protein variants; Figure S2: Alignment of the human SDR9C family protein variants; Figure S3: Alignment of the human SDR10E protein variants; Figure S4: Alignment of the human SDR11E protein variants; Figure S5: Alignment of the human SDR12C1 protein variants; Figure S6: Alignment of the human SDR16C protein variants; Figure S7: Alignment of the human SDR21C protein variants; Figure S8: Alignment of the human SDR25C protein variants; Figure S9: Alignment of the human SDR26C1 protein variants; Figure S10: Alignment of the human SDR28C protein variants; Figure S11: Alignment of the human SDR32C protein variants; Figure S12: Alignment of the human SDR42E protein variants; Figure S13: Alignment of the vertebrate SDR7C family and of SDR40C1 variants; Figure S14: Alignment of the vertebrate SDR9C family variants; Figure S15: Alignment of the vertebrate SDR10E family variants; Figure S16: Alignment of the vertebrate SDR11E family variants; Figure S17: Alignment of the vertebrate SDR12C family variants; Figure S18: Alignment of the vertebrate SDR16C family variants; Figure S19: Alignment of the vertebrate SDR21C family variants; Figure S20: Alignment of the vertebrate SDR25C family variants; Figure S21: Alignment of the vertebrate SDR26C family variants; Figure S22: Alignment of the vertebrate SDR28C family variants; Figure S23: Alignment of the vertebrate SDR32C family variants; Figure S24: Alignment of the vertebrate SDR42E family variants; Figure S25: Alignment of the invertebrate orthologs of the human SDR7C family and of the human SDR40C1 variants; Figure S26: Alignment of the invertebrate orthologs of the human SDR9C family variants; Figure S27: Alignment of the invertebrate orthologs of the human SDR10E1 family variants; Figure S28: Alignment of the invertebrate orthologs of the human SDR11E family variants; Figure S29: Alignment of the invertebrate orthologs of the human SDR12C family variants; Figure S30: Alignment of the invertebrate orthologs of the human SDR16C family variants; Figure S31: Alignment of the invertebrate orthologs of the human SDR21C family variants; Figure S32: Alignment of the invertebrate orthologs of the human SDR25C family variants; Figure S33: Alignment of the invertebrate orthologs of the human SDR26C family variants; Figure S34: Alignment of the invertebrate orthologs of the human SDR28C family variants; Figure S35: Alignment of the invertebrate orthologs of the human SDR32C family variants; Figure S36: Alignment of the invertebrate orthologs of the human SDR42E family variants; Figure S37: Alignment of the primate SDR42E protein variants; Figure S38: Alignment of the primate SDR42E2 protein variants; Figure S39: Phylogram of SDR7C gene structures; Figure S40: Phylogram of SDR9C gene structures; Figure S41: Phylogram of SDR10E gene structures; Figure S42: Phylogram of SDR11E gene structures; Figure S43: Phylogram of SDR12C gene structures; Figure S44: Phylogram of SDR16C gene structures; Figure S45: Phylogram of SDR21C gene structures; Figure S46: Phylogram of SDR25C gene structures; Figure S47: Phylogram of SDR26C gene structures; Figure S48: Phylogram of SDR28C gene structures; Figure S49: Phylogram of SDR32C gene structures; Figure S50: Phylogram of SDR42E gene structures; Table S1: Structure and catalysis consensuses of the human SDR enzymes belonging to the Classical SDR families; Table S2: Structure and catalysis consensuses of the human SDR enzymes belonging to the Extended, Atypical and Unknown SDR families; Table S3a: Genetic and molecular data of the human SDR7C family and of the human SDR40C1 protein variants; Table S3b: The relative percent identity of the human SDR7C family and of the human SDR40C1 protein variants; Table S4a: Genetic and molecular data of the human SDR11E family protein variants; Table S4b: Relative percent identity of the SDR11E family protein variants; Table S5: Genetic and molecular data of the human SDR10E family protein variants; Table S6a: Genetic and molecular data of the human SDR9C family protein variants; Table S6b: Relative percent identity of the SDR9C family protein variants; Table S7a: Genetic and molecular data of the human SDR12C family protein variants; Table S7b: Relative percent identity of the SDR12C family protein variants; Table S8a: Genetic and molecular data of the human SDR17C family protein variants; Table S8b: Relative percent identity of the SDR16C family protein variants; Table S9: Genetic and molecular data of the human SDR21C family protein variants; Table S10: Genetic and molecular data of the human SDR25C family protein variants; Table S11: Genetic and molecular data of the human SDR26C family protein variants; Table S12: Genetic and molecular data of the human SDR28C family protein variants; Table S13: Genetic and molecular data of the human SDR32C family protein variants; Table S14: Genetic and molecular data of the human SDR42E family protein variants; Table S15: Genetic and molecular data of the vertebrate SDR7C family and of the SDR40C variants; Table S16: Genetic and molecular data of the vertebrate SDR9C family variants; Table S17: Genetic and molecular data of the vertebrate SDR10E family variants; Table S18: Genetic and molecular data of the vertebrate SDR11E family variants; Table S19: Genetic and molecular data of the vertebrate SDR12C family variants; Table S20: Genetic and molecular data of the vertebrate SDR16C family variants; Table S21: Genetic and molecular data of the vertebrate SDR21C family variants; Table S22: Genetic and molecular data of the vertebrate SDR25C family variants; Table S23: Genetic and molecular data of the vertebrate SDR21C family variants; Table S24: Genetic and molecular data of the vertebrate SDR28C family variants; Table S25: Genetic and molecular data of the vertebrate SDR32C family variants; Table S26a: Genetic and molecular data of the vertebrate SDR42E family variants; Table S26b: Percent identity of the vertebrate SDR42E1 and SDR42E2 variants; Table S27: Genetic and molecular data of the invertebrate orthologs of the human SDR7C family and of the SDR40C1 variants; Table S28: Genetic and molecular data of the invertebrate orthologs of the human SDR9C family variants; Table S29: Genetic and molecular data of the invertebrate orthologs of the human SDR10E family variants; Table S30: Genetic and molecular data of the invertebrate orthologs of the human SDR11E family variants; Table S31: Genetic and molecular data of the invertebrate orthologs of the human SDR12C family variants; Table S32: Genetic and molecular data of the invertebrate orthologs of the human SDR16C family variants; Table S33: Genetic and molecular data of the invertebrate orthologs of the human SDR21C family variants; Table S34: Genetic and molecular data of the invertebrate orthologs of the human SDR25C family variants; Table S35: Genetic and molecular data of the invertebrate orthologs of the human SDR26C family variants; Table S36: Genetic and molecular data of the invertebrate orthologs of the human SDR28C family variants; Table S37: Genetic and molecular data of the invertebrate orthologs of the human SDR32C family variants; Table S38: Genetic and molecular data of the invertebrate orthologs of the human SDR42E family variants; Table S39a: Splicing site organization of the human SDR7C family, of the SDR40C1 variants, and of their respective invertebrate orthologs; Table S39b: Percent identity values of the human SDR7C family, of the SDR40C1 variants and of their invertebrate orthologs; Table S40a: Splicing site organization of the human SDR9C family variants and of their invertebrate orthologs; Table S40b: Percent identity values of the human SDR9C family variants and of their invertebrate orthologs; Table S41a: Splicing site organization of the human SDR10E family variants and of their invertebrate orthologs; Table S41b: Percent identity values of the human SDR10E family variants and of their invertebrate orthologs; Table S42a: Splicing site organization of the human SDR11E family variants and of their invertebrate orthologs; Table S42b: Percent identity values of the human SDR11E family variants and of their invertebrate orthologs; Table S43a: Splicing site organization of the human SDR12C family variants and of their invertebrate orthologs; Table S43b: Percent identity values of the human SDR12C family variants and of their invertebrate orthologs; Table S44a: Splicing site organization of the human SDR16C family variants and of their invertebrate orthologs; Table S44b: Percent identity values of the human SDR16C family variants and of their invertebrate orthologs; Table S45a: Splicing site organization of the human SDR21C family variants and of their invertebrate orthologs; Table S45b: Percent identity values of the human SDR21C family variants and of their invertebrate orthologs; Table S46a: Splicing site organization of the human SDR25C family variants and of their invertebrate orthologs; Table S46b: Percent identity values of the human SDR25C family variants and of their invertebrate orthologs; Table S47a: Splicing site organization of the human SDR26C family variants and of their invertebrate orthologs; Table S47b: Percent identity values of the human SDR26C family variants and of their invertebrate orthologs; Table S48a: Splicing site organization of the human SDR28C family variants and of their invertebrate orthologs; Table S48b: Percent identity values of the human SDR28C family variants and of their invertebrate orthologs; Table S49a: Splicing site organization of the human SDR32C family variants and of their invertebrate orthologs; Table S49b: Percent identity values of the human SDR32C family variants and of their invertebrate orthologs; Table S50a: Splicing site organization of the human SDR42E family variants and of their invertebrate orthologs; Table S50b: Percent identity values of the human SDR42E family variants and of their invertebrate orthologs; Table S51a: Genetic and molecular data of primate SDR42E family variants; Table S51b: Relative percent identity of primate SDR42E family protein variants. References [35,36] are cited in Supplementary Material.
Author Contributions: F.G. contributed to the study conception and design, to data collection and analysis, and to the preparation of figures and tables. S.T. handled the phylogenetic analyses. The first draft of the manuscript was written by F.G.; S.T. and M.A. contributed to writing the final version. All authors have read and agreed to the published version of the manuscript.