Full-Length Transcriptome Maps of Reef-Building Coral Illuminate the Molecular Basis of Calcification, Symbiosis, and Circadian Genes

Coral transcriptomic data largely rely on short-read sequencing, which severely limits the understanding of coral molecular mechanisms and leaves many important biological questions unresolved. Here, we sequence the full-length transcriptomes of four common and frequently dominant reef-building corals using the PacBio Sequel II platform. We obtain information on reported gene functions, structures, and expression profiles. Among them, a comparative analysis of biomineralization-related genes provides insights into the molecular basis of coral skeletal density. The gene expression profiles of the symbiont Symbiodiniaceae are also isolated and annotated from the holobiont sequence data. Finally, a phylogenetic analysis of key circadian clock genes among 40 evolutionarily representative species indicates that there are four key members in early metazoans, including cry genes; Clock or Npas2; cyc or Arntl; and tim, while per, as the fifth member, occurs in Bilateria. In summary, this work provides a foundation for further work on the manipulation of skeleton production or symbiosis to promote the survival of these important organisms.

The greatest contribution of scleractinian corals to marine biota and ecosystems is the deposition of reefs via biomineralization in calicoblasts [28], which form skeletons. The reef-building coral skeleton is built through the continuous deposition of aragonite [38], which is formed by a mineral fraction consisting of calcium carbonate and an organic matrix molecule fraction that includes carbohydrates, lipids, and proteins [14]. Current reports

SequelII HiSeq X Ten
Only datasets of ≥10 Gb coral sequences (i.e., excluding Symbiodiniaceae sequences) are included. The bold lines indicate data generated in this paper.
To address these questions, we apply a full-length transcriptomic isoform sequencing (Iso-Seq) strategy using the PacBio Sequel II sequencing platform and quantitative gene expression analysis using the Illumina HiSeq X Ten sequencing platform to produce transcriptome maps for the four aforementioned important reef-building corals and their endosymbiotic Symbiodiniaceae. Based on the gene expression profiles, we perform a more thorough analysis of several important physiological features of reef-building corals, including biomineralization-related gene groups, circadian gene families, and coral and Symbiodiniaceae factors related to the endosymbiotic interaction. These results enhance our understanding of the molecular biology and ecology of reef-building corals, which may aid in the recovery of marine ecosystems, and provide insights into the evolution of the circadian circuitry and holobiosis.

Full-Length-Enriched Transcriptome Sequencing and Data Processing
Based on the standard processes of the PacBio Sequel II sequencing platform, the full-length-enriched raw transcriptome sequencing data (polymerase reads) of four coral holobionts obtained included 25.34 Gb in P. damicornis, 27.8 Gb in P. verrucosa, 22.32 Gb in A. muricata, and 21.44 Gb in M. foliosa (Tables 2 and S2). These data were filtered to ensure quality and reliability, including removing reads of less than 50 bp in length and adapter sequences (for details and statistics, see Tables 2 and S2, Figures S1 and S2). The gene expression profiles of corals and their symbiotic Symbiodiniaceae (whose transcripts were analyzed separately in a later subsection) were separated based on an alignment to previously published sequences. In terms of the corals themselves, there were 20,609 transcripts and 14,167 unigenes (N50 = 2954 bp) in P. damicornis, 24,174 transcripts and 12,822 unigenes (N50 = 2313 bp) in P. verrucosa, 31,242 transcripts and 13,800 unigenes (N50 = 2126 bp) in A. muricata, and 25,460 transcripts and 10,905 unigenes (N50 = 1678 bp) in M. foliosa (Tables 2 and S2). Unigene lengths were concentrated in the range of 1-3 kbp, with few unigenes of less than 1 kbp, verifying that the low-quality sequencing data had been filtered out ( Figure S1b and Table S2. 8). A previous study reported that there were 297,221 assembled transcripts (N50 = 1831 bp) and 209,337 unigenes (N50 = 1435 bp) in P. damicornis using a short-read sequencing approach [88]. It is obvious that PacBio sequencing can obtain fewer redundant sequences and higher-quality sequences, although we sequenced only one developmental stage of coral and not all transcripts were obtained. Then, to evaluate the level of redundancy in the data, we examined the unigene-to-transcript ratio. The percentages of unigenes with a one-to-one unigene-to-transcript ratio were 73.71% (P. damicornis), 64.75% (P. verrucosa), 57.15% (A. muricata), and 57.26% (M. foliosa), and the percentages of unigenes with a one-to-two unigene-to-transcript ratio were 17.44% (P. damicornis), 17.52% (P. verrucosa), 18.41% (A. muricata), and 17.18% (M. foliosa) (detailed in Table S2.9 and Figure S1c). The sum of the two ratios was more than 74% in all samples, indicating that data redundancy was largely reduced. Thus, we obtained high-quality full-length-enriched transcriptome sequencing data for four coral holobionts that were suitable for the subsequent analysis.

Gene-Function Annotation and Structure Analysis
To obtain comprehensive gene-function information, seven authoritative databases (NR [129], NT, Pfam [130], KOG [131], Swiss-Prot [132], KEGG [133], and GO [134]) were utilized to annotate the unigenes, and the related statistics are shown in Figure 1 (Table  S3 and Figures S3-S11). In the four corals, 93.88% (P. damicornis), 89.32% (P. verrucosa), 95.10% (A. muricata), and 79.64% (M. foliosa) of the unigenes were annotated in at least one database (Figure 1a and Table S3.1). The percentage of annotated unigenes in A. muricata, P. damicornis, and P. verrucosa was approximately 90%, suggesting that most of the unigenes in these species were orthologues of genes with functional annotations available. In contrast, the percentage of unigenes annotated in M. foliosa was only 79.64% because sequencing data obtained from Montipora are rare, and the adaptive specialized evolution of these species means that their genomes show especially great differences from those of other reef-building corals with available omics data [100,135], limiting their in-depth annotation. It was also obvious that the percentage of annotated unigenes in A. muricata was higher than that in the other three corals (Figure 1b) because A. digitifera [136] and A. millepora [137], with reported genome data, both belong to the same genus as A. muricata, providing a good reference genome background. 1 CCS: circular consensus sequence. 2 FLNC: full-length non-chimera sequences. 3 N50 or 90: the length for which the collection of all subreads of that length or longer contains at least 50% or 90% of the total of the lengths of the subreads.

Gene-Function Annotation and Structure Analysis
To obtain comprehensive gene-function information, seven authoritative databases (NR [129]], NT, Pfam [130], KOG [131], Swiss-Prot [132], KEGG [133], and GO [134]) were utilized to annotate the unigenes, and the related statistics are shown in Figure 1 (Table  S3 and Figures S3-S11). In the four corals, 93.88% (P. damicornis), 89.32% (P. verrucosa), 95.10% (A. muricata), and 79.64% (M. foliosa) of the unigenes were annotated in at least one database (Figure 1a and Table S3.1). The percentage of annotated unigenes in A. muricata, P. damicornis, and P. verrucosa was approximately 90%, suggesting that most of the unigenes in these species were orthologues of genes with functional annotations available. In contrast, the percentage of unigenes annotated in M. foliosa was only 79.64% because sequencing data obtained from Montipora are rare, and the adaptive specialized evolution of these species means that their genomes show especially great differences from those of other reef-building corals with available omics data [100,135], limiting their in-depth annotation. It was also obvious that the percentage of annotated unigenes in A. muricata was higher than that in the other three corals (Figure 1b) because A. digitifera [136] and A. millepora [137], with reported genome data, both belong to the same genus as A. muricata, providing a good reference genome background. To explore whether the sequences we obtained were indeed derived from cnidarians rather than from some contaminants, we counted the number of sequences annotated based on species information and observed that the top five species with the most highly similar genes to those of the four investigated corals were A. digitifera, Exaiptasia pallida, Nematostella vectensis, Branchiostoma belcheri, and Stylophora pistillata (or A. millepora) (Figure 2, Tables 3 and S3.6). These species (other than the amphioxus B. belcheri) are all cnidarians, and the similarities of the investigated corals to the reference species were all greater than 93%, indicating the relatively high accuracy of the annotations. Interestingly, the NR results indicate that the gene expression profiles of all four investigated corals are close to that of B. belcheri (Figure 2, Tables 3 and S3.6), which may be due to the slow evolution of amphioxus protein sequences. The prediction of CDSs is helpful for potential unigene analysis and provides a basis for subsequent protein analysis. The CDS lengths of the four corals were concentrated in the range of 300-1700 nt (Materials and Methods Section and Figure S12a), and totals of 97.91% (P. damicornis), 95.67% (P. verrucosa), 95.51% (A. muricata), and 89.04% (M. foliosa) of the unigenes were predicted to contain CDS regions (Table S4.1). We then identified coral TFs among the predicted protein CDS results based on Pfam TF families (Materials and Methods Section) and found that the five largest TF families in the corals were ZBTB, zf-C2H2, homeobox, bHLH, and TF_bZIP ( Figure S12b and Table S4.2). The results of the simple sequence repeat (SSR) detection showed that most SSR sequences corresponded to poly(A) tails ( Figure S12c and Table S4.3), reconfirming the absence of genomic DNA contamination. Transcripts without coding potential were classified as lncRNAs. Generally, lncRNAs were shorter than mRNAs ( Figure S12e). The number of predicted lncRNAs in M. foliosa was much higher than those in the other corals ( Figure S12d). Following CDS, TF, SSR, and lncRNA analyses, we quantified the gene expression profiles in each coral by mapping the Illumina sequencing reads to the full-length transcriptome data. This mapping directly providing read-count values that could be converted to the expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) or transcripts per kilobase million (TPM) values for further ensuring diverse quantitative analysis under different conditions (Table S5). Based on these data, the Pearson's correlation coefficient (R) analysis can be used to reflect the gene expression profile similarity among the samples, and thus verify the experimental reliability and rationality of the sample selection.

Gene Expression Profile Analysis
We then analyzed the gene expression profiles of the four corals (Figure 3a-d). It was clear that the gene expression levels of each coral were considered to be consistent across all replicates (Figures 3a and S13). In addition, a Pearson's R 2 > 0.92 among the three biological replicates of each coral suggested that the sample selection was reasonable and the  The prediction of CDSs is helpful for potential unigene analysis and provides a basis for subsequent protein analysis. The CDS lengths of the four corals were concentrated in the range of 300-1700 nt (Materials and Methods Section and Figure S12a), and totals of 97.91% (P. damicornis), 95.67% (P. verrucosa), 95.51% (A. muricata), and 89.04% (M. foliosa) of the unigenes were predicted to contain CDS regions (Table S4.1). We then identified coral TFs among the predicted protein CDS results based on Pfam TF families (Materials and Methods Section) and found that the five largest TF families in the corals were ZBTB, zf-C2H2, homeobox, bHLH, and TF_bZIP ( Figure S12b and Table S4.2). The results of the simple sequence repeat (SSR) detection showed that most SSR sequences corresponded to poly(A) tails ( Figure S12c and Table S4.3), reconfirming the absence of genomic DNA contamination. Transcripts without coding potential were classified as lncRNAs. Generally, lncRNAs were shorter than mRNAs ( Figure S12e). The number of predicted lncRNAs in M. foliosa was much higher than those in the other corals ( Figure S12d). Following CDS, TF, SSR, and lncRNA analyses, we quantified the gene expression profiles in each coral by mapping the Illumina sequencing reads to the full-length transcriptome data. This mapping directly providing read-count values that could be converted to the expected number of fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) or transcripts per kilobase million (TPM) values for further ensuring diverse quantitative analysis under different conditions (Table S5). Based on these data, the Pearson's correlation coefficient (R) analysis can be used to reflect the gene expression profile similarity among the samples, and thus verify the experimental reliability and rationality of the sample selection.

Gene Expression Profile Analysis
We then analyzed the gene expression profiles of the four corals (Figure 3a-d). It was clear that the gene expression levels of each coral were considered to be consistent across all replicates (Figures 3a and S13). In addition, a Pearson's R 2 > 0.92 among the three biological replicates of each coral suggested that the sample selection was reasonable and the experimental data were reliable ( Figure 3d). Then, the differences in the coral gene expression profiles were explored. We used two methods (median [138] and scbn [139]) to analyze the pairwise differential expressions between the two corals and showed that all six groups produced significant differential expressions of transcript orthologs (|log2FoldChange| > 2 or 10; p-value < 10 −6 ; see Materials and Methods Section for complete statistical analysis, Figure S14 and Table S6). Among them, the highest number of orthologous transcripts was observed between P. damicornis and P. verrucosa, while their differentially expressed genes (DEGs) accounted for the smallest proportion of all orthologous transcripts ( Figure S14 and Table S6.8). Subsequently, according to the gene expression profile results, biomineralization, symbiosis, and circadian gene families, which provide crucial information on coral population structure and the evolution of coral gene repertoires, are highlighted hereafter. cal analysis, Figure S14 and Table S6). Among them, the highest number of orthologous transcripts was observed between P. damicornis and P. verrucosa, while their differentially expressed genes (DEGs) accounted for the smallest proportion of all orthologous transcripts ( Figure S14 and Table S6.8). Subsequently, according to the gene expression profile results, biomineralization, symbiosis, and circadian gene families, which provide crucial information on coral population structure and the evolution of coral gene repertoires, are highlighted hereafter.

Expression Analysis of the Biomineralization-Related Gene Group
Although the biomineralization mechanisms of reef-building corals are similar (Figure 4a), they are displayed with a wide range of morphological differences. To obtain an

Expression Analysis of the Biomineralization-Related Gene Group
Although the biomineralization mechanisms of reef-building corals are similar (Figure 4a), they are displayed with a wide range of morphological differences. To obtain an insight into the molecular-level causes of this phenomenon, we investigated the expression patterns and features of biomineralization-related gene families and observed that the expression levels of calcium ATPases, which are essential for Ca 2+ transport, were higher in the genera Pocillopora and Acropora, and the expression levels of bicarbonate transporters, which are essential for HCO 3 − transport, were higher in P. verrucosa (Figure 4b higher in the genera Pocillopora and Acropora, and the expression levels of bicarbonate transporters, which are essential for HCO3 − transport, were higher in P. verrucosa ( Figure  4b, c, Tables 4 and S7). Among the four investigated corals, M. foliosa had the lowest expression level of calcium ATPases, which may have a negative effect on the transportation of calcium ions to the calcifying fluid (Figure 4b, Tables 4 and S7). CA2 is universally expressed in all four corals at very high levels and accounts for a major proportion of the CA content, implying its key role in coral biomineralization (Figure 4d, Tables 4 and S7).   The expression levels of coral biomineralization-related extracellular matrix (ECM) proteins, including acid-rich proteins, USOMPs, galaxins, and collagen alpha-6 (VI) chain proteins, were all obviously higher in the Complexa clade (A. muricata and M. foliosa) than in the Robusta clade (P. damicornis and P. verrucosa) (Figure 4e-g, Tables 4 and S7). Regarding key acid-rich ECM proteins, A. muricata showed the highest expression levels, diversity, and balance, followed by M. foliosa, while skeletal aspartic acid-rich protein 1 (SAARP1) was the only protein in this group highly expressed in genus Pocillopora. Regarding the ECM accessory proteins USOMPs, A. muricata and M. foliosa presented extremely higher levels and diversity, and USOMP-6 was the dominant USOMP gene, whereas members of the genus Pocillopora mainly expressed USOMP-5 (Tables 4 and S7). Galaxins were the only ECM components with a higher expression in M. foliosa than in A. muricata (Tables 4 and  S7). This result indicates that there are significant differences in the types and expression levels of core calcification genes among different reef-building corals, which may affect their morphological formation and growth model.

Genes Expressed by Symbiodiniaceae (Intracellular Symbionts of Coral) Are Mainly Involved in Energy and Nutrient Production
The secretion of calcium carbonate skeletons to form marine reefs and the reliance on intracellular symbiotic algae (Symbiodiniaceae) for nutrient acquisition are two signature features of reef-building coral physiology [14][15][16][17][18][19][20][21]136]. Although skeletal formation has been extensively discussed, the contributions of the genes expressed by the Symbiodiniaceae are rarely mentioned. Depending on identifying coral or symbiont transcripts based on alignment to previously published sequences, we found that 235 (P. damicornis), 351 (P. verrucosa), 220 (A. muricata), and 561 (M. foliosa) Symbiodiniaceae unigenes were expressed in the investigated coral holobionts (Table S8). Similar to the analysis process used for corals, Symbiodiniaceae gene expression profiling was performed using seven functional databases, NR, NT, Pfam, KOG, Swiss-Prot, KEGG, and GO, and published Symbiodiniaceae genome data (Figures S15-S26 and Tables S8.). We found 832 transcripts that were expressed by the symbiotic Symbiodiniaceae of all four investigated corals ( Figure  S27), accounting for approximately 70.75% (P. damicornis), 70.87% (P. verrucosa), 88.79% (A. muricata), and 93.69% (M. foliosa) of the symbiont transcripts. Overall, gene expression profiles of the Symbiodiniaceae were more similar than those of coral hosts, suggesting considerable gene expression convergence. On the other hand, the gene expression patterns of the symbionts of P. damicornis and P. verrucosa were more similar to each other than to A. muricata and M. foliosa, and vice versa, coinciding with the evolutionary relationships of the corals ( Figure S28).
To explore the functions of these expressed genes, we classified them according to the annotation results using the GO database ( Figures 5 and S29, and Tables S8.11 and S8.12). In the biological process category, the identified gene functions were mainly involved in oxidation reduction and carbohydrate metabolism ( Figure 5). In the cellular component category, most of the functional genes were related to photosynthesis, such as light-harvesting complex or photosystem ( Figure S29). In the molecular function category, the largest group of genes was associated with oxidoreductase or carbonate dehydratase activity ( Figure S29). These results suggest that the symbionts of all four corals have similar expression profiles that feature genes related to energy and nutrient production.

Phylogenetic Analysis of the Key Circadian Clock Gene Regulation Network
The circadian clock system is one of the most universal and fundamental characteristics of life across almost all living species [74]. Corals also exhibit circadian behaviors, and there have been studies describing the molecular mechanisms underlying the regulation of these behaviors [35][36][37]. However, whether this mechanism is applicable to other corals, other cnidarians and even other animals remains to be clarified. To explore the evolutionary origins of the key members of the circadian clock gene regulation network of reef-building corals, including the four corals studied herein, we selected 40 evolutionarily representative species and used their protein models to investigate the key circadian clock gene regulation network, namely, the cry1, cry2, Clock, Npas2, cyc, Arntl (also known as Bmal1), Arntl2 (also known as Bmal2), per1, per2, per3, and tim genes (Tables 5 and S9), and performed phylogenetic analyses (Figures 6 and S30-S38). The results show that the key circadian clock gene regulatory network evolves in early metazoans, whereas protozoan clade does not have this kind of gene regulatory network, although the tim orthologous gene is found in the Ciliophora clade. Moreover, the Ascidiacea clade showed a complete loss of this network during evolutionary adaptation, except for the tim gene, suggesting that ascidians might have unknown clock mechanisms that differ from known systems of vertebrates and insects (Tables 5 and S9, Figures 6 and S30-S38). Among this group of genes, the tim gene was the only one present in all the searched species, suggesting that it is the most conserved and stable circadian clock gene (Tables 5 and S9, Figure 6) [100,140]. The phylogenetic analysis suggested that in the early metazoan stage (non-bilaterian metazoans), there were four key gene members of the circadian clock gene regulatory network, cry; Clock or Npas2; cyc, Arntl(Bmal1), or Arntl2(Bmal2); and tim. The cry, Clock or Npas2, cyc or Arntl (Bmal1), and tim gene homologues can be found in reef-building corals as shown in Table 5. Thus, the data suggested the existence of four key conserved members of the reef-building coral circadian clock gene regulation network. The phylogenetic analysis also indicated that the per gene is a novel member of the circadian clock gene regulation network in the Bilateria, as it is found in protostomes and deuterostomes but not in early metazoans (Table 5).

Phylogenetic Analysis of the Key Circadian Clock Gene Regulation Network
The circadian clock system is one of the most universal and fundamental characteristics of life across almost all living species [74]. Corals also exhibit circadian behaviors, and there have been studies describing the molecular mechanisms underlying the regulation of these behaviors [35][36][37]. However, whether this mechanism is applicable to other corals, other cnidarians and even other animals remains to be clarified. To explore the evolutionary origins of the key members of the circadian clock gene regulation network of reef-building corals, including the four corals studied herein, we selected 40 evolutionarily representative species and used their protein models to investigate the key circadian clock gene regulation network, namely, the cry1, cry2, Clock, Npas2, cyc, Arntl (also known as Bmal1), Arntl2 (also known as Bmal2), per1, per2, per3, and tim genes (Tables 5 and S9), and performed phylogenetic analyses (Figures 6 and S30-S38). The results show that the key circadian clock gene regulatory network evolves in early metazoans, whereas protozoan clade does not have this kind of gene regulatory network, although the tim orthologous gene is found in the Ciliophora clade. Moreover, the Ascidiacea clade showed a complete loss of this network during evolutionary adaptation, except for the tim gene, Figure 5. Bar graph of co-expressed Symbiodiniaceae sequences based on GO biological processes categories. The vertical axis represents the GO terms, and the horizontal axis represents the number of transcripts annotated to the terms (including sub-terms of the terms).
"Y" means yes, which means the species contains this gene homologue, while "-" means the opposite.  Figure S38.

Discussion
Despite much research, our understanding of reef-building coral holobiont transcriptomes remains incomplete, and the short-read lengths intrinsic to the prevailing technologies have limited access to complete genetic information. Here, we sequenced and analyzed the full-length transcriptomes of four common dominant reef-building coral holobionts with respect to gene functions, structures, and expression. The results reveal

Discussion
Despite much research, our understanding of reef-building coral holobiont transcriptomes remains incomplete, and the short-read lengths intrinsic to the prevailing technologies have limited access to complete genetic information. Here, we sequenced and analyzed the full-length transcriptomes of four common dominant reef-building coral holobionts with respect to gene functions, structures, and expression. The results reveal differences in the members actually involved in biomineralization processes in reef-building corals and provide new insights into further understanding the molecular mechanisms of coral density. In particular, we isolated and demarcated the gene expression profiles of the symbiont Symbiodiniaceae, which showed a higher convergence than the coral hosts, suggesting that the internal environment of symbiotic algae-bearing cells in all four species may be quite similar. Moreover, we confirmed that there were four key members of the circadian clock gene regulatory network in early metazoans and that the per gene first occurred in Bilateria, further enriching the molecular database for studying the evolutionary origins of the animal circadian clock system, although this needs to be verified by biological experiments or other methods.

Coral Biomineralization and Skeleton Density
Coral biomineralization has been intensively studied in previous studies, but descriptions of its formation mechanism vary; here, we summarized this process, which consists of four components (Figure 4a). (1) Calcium ion transport. In calicoblasts and paracells, calcium ions enter the cells through calcium channels and exit via calcium ATPases that exchange two calcium ions for four protons across the cell membrane [39,40]. Ca 2+ diffusion among cells with chemical gradients may also participate in calcium deposition [15]. (2) An HCO 3 − source. The source of HCO 3 − is metabolic and environmental CO 2 . Metabolic CO 2 can be converted into HCO 3 − both intracellularly and extracellularly, which is catalyzed by CA under a favorable pH [41][42][43], and intracellular HCO 3 − exits cells via bicarbonate transporters belonging to two membrane protein families (SLC4 and SLC26) [44]. In calcifying fluid, the HCO 3 − concentration is higher than the Ca 2+ concentration, and the amount of calcium carbonate deposition is determined by the Ca 2+ concentration [43]. (3) Acid-rich proteins. Coral acid-rich proteins are key ECM proteins involved in biomineralization and can interact with amorphous calcium carbonate (ACC) directly, promoting crystal nucleation, determining growth axes and controlling crystal growth [45,46]. (4) Other organic matrix proteins of the ECM. The bioprecipitation of aragonite crystals in corals also requires additional skeletal organic matrix proteins as binders [47], including USOMPs, galaxins, and alpha IV collagen [38,48,49]. In addition, magnesium plays a crucial role in regulating the formation of different calcium carbonate phases that cooperate with organic matrix molecules to stabilize ACCs [141].
Here, the above core biomineralization-related gene expression level analyses among four reef-building corals indicate that the genera Pocillopora and Acropora may have a greater capacity to produce calcium carbonate than the genus Montipora, but the opposite situation was observed for biomineralization-related ECM proteins from the genus Pocillopora, suggesting that the species of the Complexa clade may produce a greater volume of skeleton than those of the Robusta clade using the same amount of calcium carbonate. This is in accordance with the reported skeletal density data of these stony corals [34,142]; however, future studies on the relationship between the activity and stability of biomineralization-related enzymes among different corals at the protein level are required to confirm this conjecture.

Evolutionary Origins of the per Gene
The oscillations of both the transcript and protein levels of the per gene have a period of approximately 24 h and play a central role in the molecular mechanism of the biological clock driving circadian rhythms. In Drosophila, after PER is produced from per mRNA, it dimerizes with timeless (TIM), and the complex enters the nucleus and inhibits the TFs per and tim, which in turn lowers the levels of PER and TIM [53]. When TIM is not complexed with PER, another protein, double-time, or DBT, phosphorylates PER, targeting it for degradation [57]. In mammals, an analogous transcription-translation negative feedback loop is observed. One of the three PER proteins (PER1, PER2, and PER3) that is translated from the mammalian homologs of drosophila-per dimerizes via its PAS domain with one of two cryptochrome proteins (CRY1 and CRY2) to form a negative element of the clock. This dimer then interacts with the CLOCK (or NPAS2) and ARNTL (or ARNTL2) heterodimer, inhibiting its activity and thereby negatively regulating its own expression [143]. Despite an extensive search, we failed to detect orthologs of the per gene in our coral data. This also may be the case for sponges and ctenophores (Tables 5 and S9). On the other hand, amphioxus [144,145] and Crassostrea gigas [146] likely have per genes, and our results show that the biological clock is a common feature of a wide range of Metazoa, while per gene originates from Bilateria. It is unclear, however, whether the ancient circadian clock system initially operated without the per gene, whether the non-bilaterian metazoans system lost this locus via deletion or other mutation, or whether unknown proteins may have functions similar to the per gene. These hypotheses need to be explored further in future research.
Overall, the full-length transcriptome maps of reef-building coral may serve as references for expression analyses of coral under environmental stressors linked to global change, which alter the normal function of reef-building corals. As long-read sequencing continues to evolve in throughput, accuracy, accessibility, and cost efficiency, full-length transcriptomes will be adopted by researchers and provide us with unprecedented views of serious biological problems. However, genome, small RNA, proteome, and single-cell data are still lacking to further understand the biology of reef-building corals. Future efforts are required to construct a coral genome database as a reference for analyzing functional genes and their associated regulatory networks at the transcriptomic and proteomic levels and further refine them to specific cellular lineages.

Ethics
All coral samples were collected and processed in accordance with local laws for invertebrate protection.

Sample Collection
The corals in the study were collected from the Xisha Islands in the South China Sea (latitude 15 • 40 -17 • 10 N, longitude 111 • -113 • E).

Coral Culture System
The coral samples were cultured in our laboratory coral tank with conditions conforming to their habitat environment. All samples were raised in a RedSea ® tank (redsea575, Red Sea Aquatics Ltd., London, UK) at 26 • C and 1.025 salinity (Red Sea Aquatics Ltd., London, UK). The physical conditions of the coral culture system were as follows: three coral lamps (AI ® , Red Sea Aquatics Ltd., London, UK), a protein skimmer (Reef Octopus Regal 250-S, Honya Co. Ltd., Shenzhen, China), a water chiller (tk1000, TECO Ltd., Port Louis, Mauritius), two wave devices (VorTech™ MP40, EcoTech Marine Ltd., Bethlehem, PA, USA), and a calcium reactor (Reef Octopus OCTO CalReact 200, Honya Co. Ltd., Shenzhen, China).

Total RNA Extraction
The three biological replicates samples for each coral were isolated from three healthy branches in the same coral independent colony to ensure that enough high-quality RNA (>15 µg) could be obtained for a PacBio cDNA library and three Illumina cDNA libraries. All the RNA extraction procedures followed the manufacturer's instructions. The total RNA was isolated with TRIzol ® LS Reagent (Thermo Fisher Scientific, 10296028, Waltham, MA, USA) and treated with DNase I (Thermo Fisher Scientific, 18068015, Waltham, MA, USA). The high-quality mRNA was isolated with a FastTrack MAG Maxi mRNA Isolation Kit (Thermo Fisher Scientific, K1580-02, Waltham, MA, USA). The RNA extraction procedure was performed according to the following instructions: (1) grinded coral samples (kept the samples submerged in liquid nitrogen at all times); (2) when the samples were ground into small pieces, the TRIzol ® LS reagent was added, the ratio of sample to reagent was about 1:3; (3) we let samples stand and thaw naturally; (4) we continued adding TRIzol ® LS reagent until the samples were dissolved, and dispensed into 50 mL centrifuge tubes; (5) centrifuged at 4 • C and 3000 rpm for 5-15 min; (6) dispensed the supernatant into 50 mL centrifuge tubes; (7) added BCP (Molecular Research Center, BP 151, Cincinnati, OH, USA) to the above centrifuge tubes, the ratio of sample to reagent was about 5:1, shaken well and then stood for 10 min; (8) centrifuged at 4 • C and 10,500 rpm for 15 min; (9) we obtained the supernatant, added an equal volume of Isopropanol (Amresco, 0918-500ML, Radnor, PA, USA) and mixed well, stood them overnight at −20 • C; (10) centrifuged at 4 • C and 10,500 rpm for 30 min, discarded the supernatant; and (11) rinsed them 2 times with 75% Ice Ethyl alcohol, Pure (Sigma-Aldrich, E7023-500ML, Taufkirchen, München, Germany). Finally, three samples of each coral were extracted in equal amounts (total >10 µg) and mixed for PacBio full-length-enriched transcriptome sequencing; the remainders (>1.5 µg per sample) were used for Illumina sequencing.

Total RNA Quality Testing
Before establishing the library, the quality of total RNA must be tested. RNA degradation and contamination were monitored on 1% agarose gels electrophoresis; RNA purity (OD260/280 ratio) was checked using the NanoPhotometer ® spectrophotometer (IMPLEN, Westlake Village, CA, USA); RNA concentration was quantified using Qubit ® RNA Assay Kit in Qubit ® 2.0 Flurometer (Thermo Fisher Scientific, Waltham, MA, USA); and RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).

Illumina cDNA Library Construction and Sequencing
A total amount of 1.5 µg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext ® Ultra™ RNA Library Prep Kit (E7530L) for Illumina ® (NEB, Ipswich, MA, USA) following the manufacturer's recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was performed using divalent cations under an elevated temperature in NEBNext First-Strand Synthesis Reaction Buffer (5×). First-strand cDNA was synthesized using a random hexamer primer and M-MuLV Reverse Transcriptase (RNase H − ). Second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3 ends of DNA fragments, an NEBNext Adaptor with hairpin loop structure was ligated to prepare for hybridization. In order to select cDNA fragments preferentially 250-300 bp in length, the library fragments were purified with an AMPure XP system (Beckman Coulter, Beverly, Brea, CA, USA). Then, 3 µL USER Enzyme (NEB, Ipswich, MA, USA) was used with size-selected, adaptor-ligated cDNA at 37 • C for 15 min followed by 5 min at 95 • C before PCR. Then, PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. Finally, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia, San Diego, CA, USA), according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq X Ten platform and paired-end reads were generated.

PacBio cDNA Library Construction and Sequencing
The isoform sequencing (Iso-Seq) library was prepared according to the isoform sequencing protocol (Iso-Seq) using the Clontech SMARTer ® PCR cDNA Synthesis Kit (Clontech Laboratories (now Takara Laboratories), 634926, Mountain View, CA, USA) and the BluePippin Size Selection System protocol, as described by Pacific Biosciences (PN 100-092-800-03). Briefly, Oligo(dT)-enriched mRNA was reversely transcribed to cDNA by a SMARTer PCR cDNA Synthesis Kit; the synthesized cDNA was then amplified by polymerase chain reaction (PCR) using BluePippin Size-Selection System protocol; the Iso-Seq library was constructed by full-length cDNA damage repair, terminal repair, and attaching SMRT dumbbell adapters; the sequences of the unattached adapters at both ends of the cDNA were removed by exonuclease digestion; the cDNA obtained above was combined with primers and DNA polymerase to form a complete SMRT bell library. While the library was qualified, the PacBio Sequel II platform was used for sequencing based on the effective concentration and data output requirements of the library.

Data Filtering and Processing
The Illumina sequencing raw reads of fastq format were firstly processed through in-house perl scripts. In this step, clean data were obtained by removing reads containing adapters, reads containing ploy-N, and low-quality reads from raw data. At the same time, Q20, Q30, GC-content, and sequence duplication levels of the clean data were calculated. All the downstream analyses were based on clean data with a high quality.

Coral and Symbiodiniaceae Sequences Separation
We aligned consensus reads to coral or Symbiodiniaceae reference genomes, respectively, using GMAP v2017-06-20 (Thomas D. Wu, South San Francisco, CA, USA) software [147]. The sequences mapped to Symbiodiniaceae reference genomes belonged to Symbiodiniaceae sequences; the sequences mapped to coral reference genomes belonged to coral sequences.

Correction and De-Redundancy
The RNA-seq data sequenced by the Illumina HiSeq X Ten platform were used to correct additional nucleotide errors in polish consensus sequences obtained in the previous step with LoRDEC v0.7 (Leena Salmela and Eric Rivals, Helsinki, Finland) software [148]. Using CD-HIT v4.6.8 (Weizhong Li, La Jolla, CA, USA) software (parameters: −c 0.95 −T 6 −G 0 −aL 0.00 −aS 0.99), all redundancies were removed in corrected consensus reads to acquire final full-length transcripts and unigenes for subsequent bioinformatics analysis [149].

Gene Structure Analysis
ANGEL v2.4 (Kana Shimizu, Tokyo, Japan) software [153] was used to predict protein CDS (coding sequence). We used same species or closely related species confident protein sequences for ANGEL training and then ran the ANGEL prediction for the given sequences.
Usually, the TFs were identified based on the Pfam files of TF families in the AnimalTFDB 3.0 database [154]; however, corals were not included in this database, so we identified coral TFs based on the Pfam files of TF families using the hmmsearch program in HMMER 3.1 package. SSR of the transcriptomes was identified using MISA v1 (Thomas Thiel, Gatersleben, Germany) [155]. We used CNCI v2 (Liang Sun, Beijing, China) [156], CPC2 v0.1 (Yu-Jian Kang, Beijing, China) [157], PfamScan v1.6 (EMBL-EBI, Cambridgeshire, UK) [158], and PLEK v1.2 (Aimin Li and Junying Zhang, Xi'an, China) [159], four tools to predict the coding potential of transcripts. Transcripts predicted with coding potential by either/all of the three tools above were filtered out, and those lacking coding potential were our candidate set of lncRNAs.

Gene Expression Quantification
The full-length transcriptomes of each coral obtained above were used as the reference backgrounds, respectively, and then the clean reads for the corresponding three biological replicate samples obtained by Illumina sequencing were mapped to it and the read-count values for each transcript can be obtained by kallisto v0.44.0 (Pachter Lab, Berkeley, CA, USA) software [160]. The read counts were converted to FPKM to analyze the gene expression levels. Using Pearson's correlation coefficient, we analyzed the relationship among the samples.

Gene Differential Expression Analysis
Based on the above steps, we obtained gene expression levels for a total of 12 samples from 4 coral species (Table S5.1). Orthologous transcripts between each of the two corals were searched by orthofinder v2.5.4 (David M. Emms, Oxford, UK) [161]. To make the expression levels of orthologous genes comparable between different species, we used two methods (median [138] and scbn [139]) to normalize their expression levels. Both of them were based on a group of conserved orthologous genes among different species. Therefore, we selected orthologous transcripts that were uniquely present in all four coral species as conserved orthologous genes. Then, we used the median method by assessing their median expression levels in each species among the genes with expression values in the interquartile range for different species and deriving the scaling factor that adjusted those median values to a common value. Meanwhile, we used these conserved orthologous genes by minimizing the deviation between the empirical and nominal type-I errors to achieve the optimal scaling factor in the scbn method. Accordingly, the normalized expression matrix and the p-value of each pair of orthologous transcripts can be obtained. Orthologous transcripts with p-value < 10 −6 and |log2(FoldChange)| > 2 or 10 were regarded as DEGs.

Phylogenetic Analysis
To construct the phylogenetic tree, 4 of our own reef-building corals and 36 other species were selected for the study, for a total of 17 clades, with the exception of the Porifera clade for which no further protein model species could be found;2 two or more representative species per clade were selected to reduce the contingency of results (Table 5 and detailed in Table S9). The confirmed amino acid sequences of circadian clock genes of Homo sapiens, Mus musculus, Danio rerio, and Drosophila melanogaster were used as query sequences to search for orthologs in the protein models of the above species using the BLASTp algorithm at the local BLAST+ 2.13.0 (parameters: e value = 1 × 10 −5 ). The sequences alignment was performed using MAFFT v7.505 (Kazutaka Katoh, Tokyo, Japan) with default parameters [162]. The gaps were removed to obtain conserved domains by trimAl v1.4.rev15 (Salvador Capella-Gutiérrez, Barcelona, Spain) with default parameters [163]. The phylogenetic trees were constructed using the maximum likelihood method by IQ-TREE 2.0.7 (Bui Quang Minh, Canberra, ACT, Australia) with the parameters: −m MFP −B 1000 −alrt 1000 ("−m MFP" means that it can automatically determine the best-fit model for data, "−B 1000" specifies 1000 replicates for the ultrafast bootstrap, and "−alrt 1000" specifies the number of bootstrap replicates for SH-aLRT where 1000 is the minimum number recommended) [164][165][166]. iTOL v6.3.2 (Ivica Letunic, Heidelberg, Germany) [167] was used to visualize the tree and Jalview 2.11.2.4 (Andrew M. Waterhouse, Cambridge, MA, USA) was used to show the conserved domains [168].

Conclusions
The sequencing and analysis of the full-length transcriptome maps of dominant reefbuilding coral holobionts contributed to a more in-depth understanding of coral physiology. Related insights improve our understanding of the evolution of circadian rhythms and of holobiosis, and they provide a foundation for further work to protect or even manipulate coral skeleton production or symbiosis to promote the survival of these important organisms. These results provide support for remodeling reef-building corals through advanced gene toolkits under current global climate change and ecological deterioration.