Ceramide Synthase 6: Comparative Analysis, Phylogeny and Evolution

Ceramide synthase 6 (CerS6, also known as LASS6) is one of the six members of the ceramide synthase gene family in humans. Comparisons of CerS6 amino acid sequences and structures, as well as of CerS6 gene structures and locations, were conducted using data from several vertebrate genome projects. A specific role for the CerS6 gene and protein has been identified as the endoplasmic reticulum C14- and C16-ceramide synthase. Mammalian CerS6 proteins share 90–100% similarity among different species, but are only 22–63% similar to other CerS family members, suggesting that CerS6 is a distinct gene family. Sequence alignments, predicted transmembrane, lumenal and cytoplasmic segments, and N-glycosylation sites were also investigated, resulting in the identification of key conserved residues, including the active site as well as C-terminal acidic and serine residues. Mammalian CerS6 genes contain ten exons, are primarily located on the positive strand and are transcribed as two major isoforms. The human CERS6 gene promoter harbors a large CpG island (94 CpGs) and multiple transcription factor binding sites (TFBS), which support precise transcriptional regulation and signaling functions. Additional regulation is conferred by 15 microRNA (miRNA) target sites identified in the CERS6 3′-UTR. Phylogenetic analysis of the relationships among the vertebrate CerS1–6 gene families supports a major role for the CerS6 enzyme, which is strongly conserved throughout vertebrate evolution.


Introduction
Mammalian ceramide synthase (CerS) genes were initially discovered by a homology search of the human genome using the Lag1p motif from the yeast LAG1 gene [1]. In yeast, LAG1 was shown to regulate life span [2] and support ceramide biosynthesis [3]; accordingly, its homolog LAC1 and other homologous genes were dubbed LASS genes (longevity assurance genes). These genes were found in most species in varying numbers, with both mice and humans having six paralogs, LASS1-6 [4]. Further studies of mammalian LASS gene function revealed that the key feature of the encoded proteins is the ability to synthesize ceramide, and consequently the nomenclature was changed to CerS1-6 [4]. Importantly, each of the ceramide synthases was shown to utilize a subset of acyl-CoAs, thus producing ceramides with specific acyl chain lengths [5,6]. Ceramides play a central role in sphingolipid metabolism and function as important components defining the biophysical properties of cellular membranes [7]. In the past decades, distinct roles of ceramides as signaling molecules and as mediators of apoptosis and inflammatory responses have also been established [5,6,8]. Although the CerS enzymes generate different ceramide species, their genes show similarities in sequence and structure, and the corresponding protein products display similar domain organization, catalytic properties and subcellular localization. At the same time, different ceramide synthases show different biological properties and play vastly diverse roles in numerous biological processes [5].
CerS1 catalyzes the synthesis of C18 acyl chain ceramides and plays a major role in the development of brain and other neural tissue [1,9]. CerS2 preferentially catalyzes the synthesis of C20-C26 acyl chain ceramides, which are essential during development. Accordingly, CerS2 null mice were reported to have chronic alterations of both liver and brain physiology, with increased hepatocyte apoptosis and turnover, chronic and progressive myelin degeneration, and accumulation of membrane material in lysosomes [10,11]. CerS3 is predominantly expressed in testis and skin and exhibits a broad acyl chain preference, synthesizing C26-C32 acyl chain ceramides. In agreement with the very specific function of this enzyme, CerS3 null mice die soon after birth due to the loss of skin barrier function [12,13]. CerS4, which is specific for C18-C22 acyl-CoA, is found at high levels in the skin epidermis, heart, skeletal muscle, liver, lung and white adipose tissue and may play a role in regulating stem cell homeostasis, with Cers4 null mice exhibiting major reductions in C20 acyl chain ceramide levels [14,15]. CerS5 is the major ceramide synthase in lung epithelial cells and in the white and grey matter of the brain [5]. It is essential for maintaining C16 ceramide pools and may contribute to the development of diet-induced obesity [16,17].
The last member of the mammalian ceramide synthase family, CerS6, was first reported by Weinmann et al. [18] in 2005. This enzyme generates C14- and C16-ceramides and is phylogenetically most closely related to CerS5 [5]. Cers6 knockout mice demonstrated a significant decrease in C16-containing sphingolipids that was accompanied by behavioral abnormalities [19], protection from high-fat diet-induced obesity and glucose intolerance [20], as well as protection from colitis [21,22] and neutrophil activation [23]. Moreover, CerS6 expression was required for optimal T cell activation, proliferation and cytokine production in response to alloantigen and for subsequent induction of graft-versus-host disease [24]. Additionally, activation of CerS6 and subsequent elevation of C16-ceramide were demonstrated in response to folate stress [25], serum starvation [26], deficiency of cytochrome c oxidase [27] and methotrexate [28]. Despite the high homology and identical substrate preference of this enzyme to CerS5, which has been studied to some extent with regard to its membrane organization and the structural basis of its specificity [29,30], the structure and function of CerS6 are still not well investigated. In the present study we used bioinformatic and phylogenetic methods to examine the structures of vertebrate CerS6 genes and enzymes and their evolutionary relationship with the other CerS genes as well as with ancestral yeast and invertebrate genes.

Comparison of Vertebrate CerS6 Amino Acid Sequences and their Relation to Human CERS1-5 Sequences
Examination of the non-redundant protein sequence databases for several vertebrate genomes using basic local alignment search tool (BLAST) analyses for CerS6 amino acid sequences, as well as BLAST-like alignment tool (BLAT) analyses of the predicted CerS6 sequences using the University of California, Santa Cruz (UCSC) genome browser, allowed us to predict locations for these genes, including exon boundary locations and gene sizes. Comparison of predicted gene and protein characteristics to their human and mouse counterparts is presented in Table 1. The human and other mammalian CerS6 sequences examined were 90% or more identical, suggesting that these are the products of the same family of genes.
Table 1 notes: UNIPROT IDs (Swiss Institute of Bioinformatics, http://kr.expasy.org) and GenBank IDs (NIH, Bethesda, MD, USA, http://www.ncbi.nlm.nih.gov/genbank/) provide the sources for the gene and protein sequence data; gene sizes are given as base pairs of nucleotides; +strand and −strand refer to the transcribed strand; exons refer to coding exons; MW refers to predicted molecular weight; pI refers to theoretical isoelectric point; * predicted gene sequences; ^ chromosome location not available; > sequence is slightly different from another database sequence.
All primate CerS6 sequences included 392 amino acid residues and were more than 99% identical to the human protein. The other eutherian mammalian CerS6 sequences showed 95-99% identity to the human enzyme, and the polypeptide chains were 391-396 amino acid residues long. An alignment of the two human transcripts [18,19] and the baboon (Papio anubis), mouse [19], chicken (Gallus gallus), frog (Xenopus tropicalis) and zebrafish (Danio rerio) CerS6 protein sequences is shown in Figure 1. The baboon, chicken, frog and zebrafish sequences were deduced by our analyses. Enzyme topology with predicted major domains, determined using TMHMM Server v.2.0 [31,32], is shown in Figures 1 and 2.
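The identity percentages quoted above depend on how aligned columns are counted. The following is a minimal sketch of one common convention (identical residues divided by the number of columns in which neither sequence has a gap); the function name and the toy sequences are hypothetical, and the alignment tools used in this study may normalize differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise or multiple alignment,
    so they have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not real CerS6 sequence).
toy_a = "MAGILAWFWN-ERFW"
toy_b = "MAGILSWFWNEERFW"
identity = percent_identity(toy_a, toy_b)
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives lower values when gaps are present.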
Figure 1. See Table 1 for sources of sequences; * shows identical residues for polypeptides; : similar alternate residues; . dissimilar alternate residues; exon start sites are underlined and in bold; predicted lumenal sites are shown in blue; predicted transmembrane sequences are in pink and numbered TM1-5 from the N-terminal end; predicted cytoplasmic regions are in yellow; a predicted homeobox DNA binding domain (residues 66-127) is shown by black arrows; the Lag1p motif (residues 202-253) is denoted by violet arrows; two predicted N-glycosylation sites in lumenal regions are shown in red; the two adjacent histidine residues (His) in the active site are in magenta; the acidic C-terminal domain is indicated, with acidic amino acid residues marked in brown and C-terminal serine (Ser) residues subject to phosphorylation shown in grey; 21 conserved residues for all CerS1-6 and yeast LAC1 sequences are shown in cyan; zfish refers to the zebrafish sequence. Note the differences in sequence between the human CERS6.1 and CERS6.2 isoforms. Sequence alignment was completed using CLUSTAL Omega (EMBL-EBI, Hinxton, Cambridgeshire, UK, https://www.ebi.ac.uk/Tools/msa/clustalo/).
Our analysis suggests that CERS6 is an integral membrane protein with three domains located within the lumen of the endoplasmic reticulum (residues 1-37, 199-206 and 285-302), five transmembrane regions (designated TM1-5; residues 38-55, 178-198, 207-224, 262-284 and 303-323, respectively) and three cytoplasmic regions (residues 56-177, 225-261 and 324-392). This domain organization coincides with predictions made earlier using different bioinformatic tools, TopPredII and UniProt [30,33]. Currently, no experimental evidence for the precise topology of CerS6 has been presented, and no crystal structures exist for any of the CerS enzymes. Thus, it cannot be excluded that the precise topological arrangement of CERS6 differs from the one predicted here.
Indeed, alignment of all six human ceramide synthases using PSI/TM-Coffee software predicted six transmembrane domains for CERS6 and a quite different distribution of lumenal and cytoplasmic regions (shown in Figure 3). The TMHMM Server v.2 analysis of CERS6 (Figure 1) is included for comparison in the same alignment (rectangular frame) and shows a first cytoplasmic region that is 42 amino acids longer, as well as a lumenal orientation of the region homologous to the second cytoplasmic domain of CERS1-4. A similar controversy was observed in membrane topology studies of CERS5 [29]: four of the fifteen software algorithms used for the topology analyses predicted five transmembrane domains for CERS5, while the other eleven predicted six. It should also be noted that the PSI/TM-Coffee analysis (T-Coffee Server, Centre for Genomic Regulation, Barcelona, Spain) indicated direct connections between lumenal and cytoplasmic domains for CERS1, CERS2, CERS3 and CERS6, which are sterically improbable. Experimental interrogation of mammalian CerS topology, similar to that published for the yeast CerS enzymes [34], could help resolve the prediction inconsistencies.
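The TMHMM-based residue ranges reported in the text for CERS6 can be written down as a small lookup table, which makes it easy to confirm that the predicted segments cover all 392 residues without gaps or overlaps and to ask in which region a given residue falls. This is only an illustrative sketch: the segment labels and function names are hypothetical, and the ranges are the predicted (not experimentally verified) ones quoted in the text.

```python
# Predicted CERS6 topology (TMHMM-based ranges as quoted in the text).
# Labels such as "lumen 1" are illustrative names, not the paper's nomenclature.
SEGMENTS = [
    ("lumen 1", 1, 37),
    ("TM1", 38, 55),
    ("cytoplasm 1", 56, 177),
    ("TM2", 178, 198),
    ("lumen 2", 199, 206),
    ("TM3", 207, 224),
    ("cytoplasm 2", 225, 261),
    ("TM4", 262, 284),
    ("lumen 3", 285, 302),
    ("TM5", 303, 323),
    ("cytoplasm 3", 324, 392),
]
PROTEIN_LENGTH = 392  # human CERS6 length given in the text


def tiles_protein(segments, length):
    """Return True if segments cover residues 1..length exactly once."""
    expected_start = 1
    for _name, start, end in sorted(segments, key=lambda s: s[1]):
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


def region_of(residue, segments=SEGMENTS):
    """Return the label of the predicted segment containing a 1-based residue."""
    for name, start, end in segments:
        if start <= residue <= end:
            return name
    raise ValueError(f"residue {residue} is outside all segments")


coverage_ok = tiles_protein(SEGMENTS, PROTEIN_LENGTH)
his211_region = region_of(211)  # one of the two adjacent active-site histidines
asn18_region = region_of(18)    # first putative N-glycosylation site
```

An alternative prediction, such as the six-transmembrane model from PSI/TM-Coffee, could be compared by swapping in a different segment list.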
Figure 3. See Tables 1 and 3. The PSI/TM-Coffee software was used to align sequences and map membrane domains. Predicted lumenal sites are shown in blue; predicted transmembrane sequences are in pink and predicted cytoplasmic regions are in yellow; * shows identical residues for all polypeptides; : similar alternate residues; . dissimilar alternate residues. The rectangular frame shows the same CerS6 sequence with the alternative domain topology prediction (TMHMM Server 2).
CerS6 sequence alignment (Figure 1) and analysis identified a number of specific amino acid residues conserved among the vertebrates: the 52 amino acid conserved sequence (Lag1p motif) shown to be important for ceramide synthase activity [1,34,35]; the two adjacent histidine residues (His211 and His212) located within TM3 and also important for CerS activity (first reported for CerS1 [35]); Lys134, mutation of which resulted in a 50% loss of activity; and residues comprising a functional Hox domain region (66-127), of which the 12 terminal residues were shown to be important for enzyme activity in CerS5 and CerS6 only [36]. Recently, a 150-residue region in the TLC domain of CerS enzymes (amino acids 159-309 for CerS5) has been identified as a determinant of substrate specificity [29]. Additionally, two putative N-glycosylation sites located within predicted lumenal domains (18NVT, 285NTT) were identified, in line with previous work in which experimental evidence of glycosylation was obtained for N18 only [30]. Furthermore, C-terminal serine residues were identified as potential sites for CerS6 phosphorylation: S336, S341, S345, S346 and S347. Phosphorylation of these residues has been reported previously for CerS2-6 and was shown to contribute to the maintenance of catalytic activity [37], as well as to stimulate Lac1/Lag1 activity and the formation of complex (inositol-phosphate containing) sphingolipids in yeast [38].
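Putative N-glycosylation sites such as 18NVT and 285NTT match the widely used sequon pattern N-X-S/T, where X is any residue except proline. A minimal sketch of such a scan is given below; the function name and the toy sequence are hypothetical, the text does not state which tool was used for this prediction, and a sequon match alone does not show that a site is lumenal or actually glycosylated.

```python
import re

# N-X-S/T sequon with X != P; the lookahead allows overlapping candidates.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str):
    """Return (1-based position of Asn, tripeptide) for each N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


# Toy sequence (hypothetical): N3-V-T and N13-T-T match, N8-P-S is excluded.
toy = "MANVTPLNPSAANTTK"
sites = find_sequons(toy)
```

In practice the candidate positions would then be intersected with the predicted lumenal segments, since only lumenal sequons are accessible to the glycosylation machinery.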
Additionally, sequence comparison revealed several groups of distinct conserved residues whose function has not yet been defined. Among these are the three tyrosines (180-182) located within the second transmembrane domain and the strictly conserved stretches of acidic amino acids in the C-terminus, including Asp346, Asp347, Asp350, Glu352, Asp356, Glu357, Glu358 and Asp359 (Figure 1), which may be important for the function of the CerS6 carboxyl-terminus within the cytoplasm. Additional residues of unknown function that are conserved not only among the vertebrate CerS6 sequences but also in the nematode (LAGR1 and HYL1) and yeast LAC1 sequences are: Phe117 and Glu124 in the first cytoplasmic domain; Tyr182 and Phe188 in TM2; Arg202-Lys203-Asp204 in the second lumenal region; Leu218 and Ser222 within TM3; Gly231, Asp239, Asp242, Leu244, Lys249 and Tyr253 in the second cytoplasmic segment; Phe267, Arg275 and Leu284 in TM4; and Trp318 and Ile322 in TM5. Future structure-function studies may explain the reason for their conservation during CerS6 evolution.
In agreement with the high overall sequence similarity of CerS6 enzymes, the theoretical isoelectric point (pI) was consistently 8.0 for the eutherian mammalian proteins, with the exception of baboon CerS6 (pI = 7.1) (Table 1). This difference is apparently explained by the replacement of a basic amino acid in human CERS6 (Lys154, pI = 9.03) with an acidic one (Glu154, pI = 5.03) in the baboon enzyme.
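The effect of a single Lys to Glu substitution on the theoretical pI can be illustrated with a simple charge model: the net charge is computed from Henderson-Hasselbalch terms for the ionizable groups, and the pI is found by bisection as the pH at which the net charge is zero. The sketch below is an illustration only. The pKa values are approximate, assumed values of the kind found in standard tables, the function names and toy peptide are hypothetical, and the tool used to produce the pI values in Table 1 may use a different pKa set, so absolute numbers are not expected to match.

```python
# Approximate side-chain and terminal pKa values (assumed; tools differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a peptide at a given pH under a simple independent-site model."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # deprotonated C-terminus
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Find the pH of zero net charge by bisection on the interval 0 to 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged, so the pI is higher
        else:
            high = mid
    return (low + high) / 2.0


# Toy peptide (hypothetical) and a variant with one Lys replaced by Glu.
toy_peptide = "MKRHDEAGK"
toy_variant = "MKRHDEAGE"
pi_shift = isoelectric_point(toy_variant) - isoelectric_point(toy_peptide)
```

The substitution removes one positive charge and adds one negative charge, so the variant's pI is lower, in the same direction as the human versus baboon difference described above.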

CERS6 Gene Expression
Two major transcripts have been identified for CERS6 (CERS6.1 and CERS6.2). The second isoform encodes an enzyme of 384 residues that lacks eight amino acids (Ala335-His342) within the C-terminal cytoplasmic domain but retains the key active site and transmembrane sequences. Interestingly, only one transcript has been identified previously for CERS6 [33], and the properties and function of the short form identified here are not known at present. The shorter CerS6.2 was routinely observed in all mammalian and chicken BLAST analyses undertaken, suggesting that two CerS6 isoforms were retained throughout mammalian evolution.
Tissue expression analysis of human CERS6 transcripts demonstrated a wide distribution profile, with the highest levels observed in transformed lymphocytes and fibroblasts, as well as in skin, small intestine, cerebellar hemispheres and frontal cortex, heart ventricle, ovary, stomach and uterus (Figure 4). This is indicative of a broad role throughout the body for this enzyme in the biosynthesis of C14 and C16 ceramides, which are required for maintaining normal membrane function and fluidity and for a variety of signaling processes [39].
Table 1 summarizes comparisons of chromosomal locations, exonic structures and sizes for vertebrate CerS6 genes. The human CERS6 gene is located on the positive strand of human chromosome 2, contains ten exons, and spans more than 300 kbp of DNA (the CERS6 gene structure is depicted in Figure 5). It is the largest of the human CERS genes, which range in size from 9.8 kbp (CERS2) through 27.6 kbp (CERS1), 37.6 kbp (CERS5), 50 kbp (CERS4) and 143.9 kbp (CERS3) to 313.2 kbp (CERS6) (see Tables 1 and 2). Interestingly, the mammalian CerS6 genes are 3-5 times larger than the bird, lizard, frog or fish CerS6 genes, though the numbers of exons and amino acids are preserved (Table 1). The human CERS genes are located on separate chromosomes, with the exception of CERS1 and CERS4, which are distantly located on human chromosome 19 (Tables 1 and 2), consistent with a previous report [5]. While the human CERS1 gene has seven exons, the other human CERS genes contain nine (CERS2) or ten (CERS3, CERS4, CERS5, CERS6) exons. All of the mammalian CerS6 genes examined also contained ten coding exons, were larger (>243 kbp for rat Cers6) than the other human CERS genes and were encoded on the positive DNA strand (with the exception of marmoset CerS6) (Table 1).
Figure 5. Structure of the human CERS6 gene. Derived from the AceView website [40]; the major isoform variant is shown with capped 5′- and 3′-ends for the predicted mRNA sequences; introns (pink lines) and exons (pink boxes) are shown; the lengths of the messenger RNAs (mRNAs) are shown in kilobases (kb); a CpG island (CpG94) is shown for the CERS6 promoter; 15 miRNA binding sites were identified for the 3′-UTR of the human CERS6 gene; the direction of transcription is shown; TFBS refers to transcription factor binding sites located within the CERS6 promoter; CERS6-AS1 refers to an antisense gene located proximal to CERS6.

Comparative Analysis of Vertebrate CerS6 Genes and Human CERS1-5 Genes
Compared to the human CERS6 protein, the other human ceramide synthases had slightly varying lengths: CERS1 (350 amino acids, aa), CERS2 (380 aa), CERS3 (383 aa), CERS4 (394 aa) and CERS5 (392 aa) (Tables 1 and 2). The human and other mammalian CerS6 sequences examined were 90% or more identical, suggesting that these are the products of the same family of genes. However, pair-wise comparisons of the human CERS6 and human CERS1-5 proteins showed sequence identities within the 22-63% range (Table 3), indicating that these paralogs are members of distinct CerS gene families (Tables 1 and 2). The human CERS5 and CERS6 sequences were 62% identical, while the identity to human CERS1-4 enzymes was lower, indicating that CERS5 and CERS6 are more closely related to each other than to the other human CERS enzymes, as previously reported [18]. The human CERS1 sequence was more distantly related to the other human CERS homologs (22-27% identity), whereas the human CERS2 and CERS3 sequences showed more similarity to each other (51% identity) than to the other human CERS enzymes (Table 3). The sequence alignment (Figure 3) of all six human CERS enzymes demonstrates that the first cytoplasmic domain and the last 30 amino acids of the C-terminal cytoplasmic domain show the lowest identity between the family members, while the transmembrane and lumenal domains have higher identities.
Table 3. Sequence identities (%) for human CERS1-6 enzymes. Three distinct human CERS sub-families were identified based on amino acid sequence comparison: CERS1 (22-27%); CERS2-4 (37-51%); and CERS5-6 (62%).
         CerS1  CerS2  CerS3  CerS4  CerS5  CerS6
CerS1      100     27     25     31     27     29
CerS2       27    100     53     49     41     41
CerS3       25     53    100     45     40     39
CerS4       31     49     45    100     43     42
CerS5       27     41     40     43    100     63
CerS6       19     41     39     42     63    100
Interestingly, a CERS6-AS1 (antisense) gene was located proximal to human CERS6 on the minus strand of chromosome 2 (Figure 5). Non-coding transcription is not rare in biological systems: it takes place in organisms from all kingdoms of life, and 25-40% of human expressed sequences are connected to antisense transcription [41,42]. Natural antisense transcripts were shown to function in early embryonic development, healthy growth and disease. For example, the p53 tumor suppressor was shown to be regulated by the antisense transcript Wrap53 [43]. Thus, the presence of CERS6-AS1 suggests that it may play a role in regulating the expression of CERS6 in different tissues of the body or during development. Figure 5 also depicts the exonic and intronic structure of the human CERS6 gene, together with a CpG island containing 94 CpGs and several transcription factor binding sites within the 5′ untranslated region, as well as binding sites for 15 microRNAs within the 3′ untranslated region of this gene.

It should be noted that Figure 5 shows the regulatory elements (CpG island, transcription factor binding sites and miRNA sites) identified by on-line tools, which may not reflect all regulatory elements. For example, a recent study from our laboratory has experimentally demonstrated that human CERS6 is transcriptionally regulated by the p53 protein via direct binding to a non-canonical site in the CERS6 promoter region [44]. With one experimentally established and eight predicted transcription factor binding sites, a CpG island and 15 miRNA-binding sites, the CERS6 gene evidently has multiple mechanisms for regulating its transcription and translation. Unfortunately, most of these mechanisms, which may play significant roles during development or in stress responses, have not been studied.
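For readers unfamiliar with how a CpG island is recognized computationally, the sketch below counts CpG dinucleotides in a DNA window and applies the classic Gardiner-Garden and Frommer style thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6). This is an assumption about the general approach, not a description of the on-line tool behind Figure 5, which may use different criteria; the function names and test sequences are hypothetical.

```python
def cpg_stats(sequence: str):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA window."""
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    if c_count and g_count:
        obs_exp = cpg_count * length / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, gc_fraction, obs_exp


def looks_like_cpg_island(sequence: str, min_length: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply length, GC-content and observed/expected thresholds to one window."""
    _count, gc_fraction, obs_exp = cpg_stats(sequence)
    return len(sequence) >= min_length and gc_fraction >= min_gc and obs_exp >= min_ratio


# Synthetic windows (hypothetical): one CpG-rich, one AT-only.
cpg_rich = "CG" * 100
at_only = "AT" * 100
rich_stats = cpg_stats(cpg_rich)
```

A genome browser would typically slide such a window along the promoter and merge adjacent qualifying windows into one island.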

Evolution of Vertebrate CerS1-6 Genes from the Invertebrate Lagr1 and Hyl1 Genes
A phylogenetic tree (Figure 6) was generated by the progressive alignment of four vertebrate CerS6 sequences, several vertebrate CerS1-5 amino acid sequences and the nematode (Caenorhabditis elegans) Hyl1 and Lagr1 sequences, together with the baker's yeast (Saccharomyces cerevisiae) Lac1 sequence [34], which was also used to root the tree. Details of the vertebrate, nematode and yeast CerS-like genes and enzymes are presented in Tables 1 and 3. Interestingly, the rat CerS5 gene is located on the X chromosome, a feature that has not been reported previously. The phylogram (Figure 6) shows clustering of the CerS-like sequences into two groups, which is consistent with their evolutionary relatedness and separate groupings as distinct CerS1-like and CerS2-6-like genes and enzymes. It is apparent that the CerS1 gene evolved from a distinct invertebrate CerS-like gene family, LAGR1, which has been retained throughout vertebrate and invertebrate evolution. In contrast, the invertebrate HYL1-like gene family underwent several gene duplication events prior to the appearance of the ancestral vertebrate genome, forming the CerS2-6 gene families, which were also retained throughout vertebrate evolution.
Figure 6. Phylogenetic tree of the vertebrate CerS1-6, nematode Hyl1 and Lagr1, and yeast Lac1 amino acid sequences. See Tables 1 and 2 for details. Note the clusters corresponding to the CerS1-6 gene families. A genetic distance scale is shown. The number of times a clade (sequences common to a node or branch) occurred in the bootstrap replicates is shown. Replicate values of 0.9 or more, which are highly significant, are shown with 100 bootstrap replicates performed in each case. A proposed sequence of gene evolution events is shown arising from ancestral yeast LAC1 and invertebrate LAGR1 and HYL1 genes.
Among these gene families, the CerS5/CerS6 and CerS2/CerS3 gene families are more closely related to CerS4 than to each other. The mammalian CerS6 genes also appear to have evolved at a more conservative rate than the mammalian CerS5 genes, given the smaller genetic distances observed for the primate CerS6 encoded sequences in comparison with the primate CerS5 encoded sequences. Because CerS5 and CerS6 have very similar acyl-CoA specificities, the stronger conservation of the CerS6 gene family could reflect an important additional role of the enzyme. The phylogenetic tree also suggests that the invertebrate CerS5 gene may have served as a primordial ancestor for both the vertebrate CerS5 and CerS6 genes.
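The notions of genetic distance and bootstrap support used above can be illustrated with a minimal Python sketch. It is not the tree-building procedure used in this study: it assumes a simple p-distance (the fraction of differing aligned sites) and uses "closest pair" as a stand-in for a clade, and the function names and the toy sequences are hypothetical placeholders rather than real CerS sequences.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Fraction of replicates in which `pair` is the closest pair.

    Each replicate resamples alignment columns with replacement,
    which is the core idea of the phylogenetic bootstrap. Ties are
    resolved in favour of the alphabetically first pair.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(alignment[n][c] for c in cols) for n in names}
        closest = min(
            combinations(names, 2),
            key=lambda p: p_distance(resampled[p[0]], resampled[p[1]]),
        )
        hits += closest == target
    return hits / replicates


# Hypothetical toy alignment (not real CerS sequences).
toy = {"seqA": "MKLLAV", "seqB": "MKLLAV", "seqC": "GTPWYD"}
support = bootstrap_support(toy, ("seqA", "seqB"))  # 1.0 for this toy alignment
```

In this toy case seqA and seqB are identical, so they form the closest pair in every replicate. With real alignments, a smaller p-distance within one gene family than within another is the kind of observation that underlies the comparison of evolutionary rates between CerS6 and CerS5.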

Predicted Structures and Properties of Mammalian CerS6 Enzymes
Molecular weights, isoelectric points and predicted locations of transmembrane, lumenal and cytosolic domains for mammalian and chicken CerS6 proteins were obtained using Expasy web tools [31]. The identification of conserved domains for CerS6 proteins was conducted using NCBI web tools [32]. Analysis of CerS6 domain organization was performed using TMHMM Server v. 2.0 from DTU Bioinformatics (Kongens Lyngby, Denmark, http://www.cbs.dtu.dk/services/TMHMM/). Alignment of human CerS1-6 enzymes and prediction of the membrane, cytoplasmic and lumenal regions in all human CerS enzymes were performed using PSI/TM-Coffee V.11.00.d625267 software (T-Coffee Server, Centre for Genomic Regulation, Barcelona, Spain) with default settings [48].
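For readers who wish to reproduce a molecular weight estimate locally, the calculation can be sketched in a few lines of Python. This is a minimal sketch rather than the Expasy implementation: it assumes average residue masses, adds one water molecule for the free termini, and ignores post-translational modifications such as N-glycosylation. The function name is hypothetical.

```python
# Average residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain (free N- and C-termini)


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(RESIDUE_MASS))
    if unknown:
        raise ValueError("unsupported residue codes: " + ", ".join(unknown))
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```

Applied to a full-length protein sequence retrieved from a public database, the function returns the predicted mass of the unmodified polypeptide, which will be lower than the apparent mass of a glycosylated enzyme.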

Conclusions
The results of the present study indicate that vertebrate CerS6 and CerS5 genes and encoded proteins represent two closely related CerS gene families, which are distinct from other vertebrate CerS gene families (CerS1-4), consistent with previously reported analyses [5,6]. At least eleven domains were identified for the vertebrate CerS6 proteins examined: three lumenal domains, including an N-terminal lumenal domain containing a conserved N-glycosylation site (Asn18); five predicted transmembrane domains (TM1-5), which define the organization of this enzyme within the endoplasmic reticulum membrane; and three cytoplasmic domains, with the C-terminus containing conserved acidic residues that may define the function of this domain in the cytoplasm. The active site of CerS6 has not yet been precisely defined experimentally; however, conserved amino acids, including a double histidine site (His211-His212) located within TM3, Lys134, mutation of which resulted in a 50% loss in activity [36], and three consecutive tyrosine residues (Tyr180-Tyr181-Tyr182), may contribute to the biosynthetic or other key roles of this enzyme.
Mammalian CerS6 enzymes are encoded by a single large gene (>240 kbp) in all genomes studied, with ten exons located on the positive strand in all mammalian genes examined except marmoset CerS6, for which the ten exons were located on the negative strand (Table 1). Phylogenetic studies using vertebrate CerS6 sequences, vertebrate CerS1-5 sequences and nematode LAGR1 and HYL1 sequences indicated that these genes and enzymes represent two distinct but related gene families, which have apparently evolved from a single yeast LAC1 gene.

Funding: This research was funded by the NIH grant CA193782 to N.I.K. and the APC was funded by CA193782.