Full-Length Transcriptomics Reveal the Gene Expression Profiles of Reef-Building Coral Pocillopora damicornis and Symbiont Zooxanthellae

Since the last century, episodes of coral reef bleaching and mortality have occurred almost annually in tropical or subtropical seas. When the temperature exceeds the tolerant limit of a coral–zooxanthellae holobiont, it induces physiological stress and disrupts the vulnerable finetuned balance between the two partners, leading to bleaching. The gene expression profiles of a scleractinian coral and its symbiotic zooxanthellae can offer important information with which to decipher this balanced relationship at the functional level of genes. Here, we sequence a full-length transcriptome of a well-known, common and frequently dominant reef-building coral, Pocillopora damicornis, to acquire gene expression information for the coral–zooxanthellae holobiont. To this end, we identify 21,926 and 465 unique genes in the coral and algal symbiont, respectively, and examine the functional enrichment among these genes based on GO (gene ontology) terms and KEGG (the Kyoto Encyclopedia of Genes and Genomes) pathways. The results show that the zooxanthellae provide for their coral host through energy and nutrition metabolism by photosynthesis, and that both the coral host and zooxanthellae have an anti-stress molecular mechanism, though the two parties have independent abilities to survive in the short term. This work sheds light on the valuable gene expression profile of a coral–zooxanthellae holobiont and provides grounds for further molecular biological research to support ecological protection work.


Introduction
Coral reef ecosystems, which are intricate and diverse collections of species that interact with each other and the physical environment to provide a habitat for many marine organisms, have been undergoing unprecedented mass coral bleaching events in recent decades, fueled by ocean warming, ocean acidification and the massive encroachment of the predatory crown-of-thorns starfish [1][2][3][4][5]. Global change continues to diminish the productivity and biodiversity of coral reefs, along with their capacity to deposit calcium carbonate [5,6].
Pocillopora damicornis, or scleractinian coral, a type of coral characterized by its hard skeleton, provides the bedrock of the reef and is, thus, one of the most widely distributed reef corals, which also makes it one of the most well-studied [7][8][9]. Scleractinian coral displays a population structure typical of organisms with an intimately intracellular coexistence, containing microscopic algae called zooxanthellae, which exist with the animal in a symbiotic relationship [10][11][12]. The symbiont is crucial to the maintenance and photosynthesis of the coral reef [13]. Changes in a suite of environmental conditions, however, can lead to the breakdown and dissociation of the coral-algal symbiosis, which results in coral bleaching [12]. Bleached corals die if not re-populated with Symbiodinium, and even recovered corals have reduced growth, regeneration and fitness, and a greater susceptibility to bleaching in the future [14][15][16][17]. The gene expression profile of the symbiont affects the host-symbiotic culture in corals and shows that the coral-zooxanthellae holobiont expresses messages for molecules involved with the regulation of the ecosystem to prevent bleaching.
Some preliminary studies of P. damicornis have been undertaken. In ocean warming and acidification conditions, oxidative stress is unlikely to have been the driver for symbiont expulsion and corals may have increased their resilience to the widespread climate change [18,19]. The microbiome in corals can be manipulated to lessen the effect of bleaching; thus, helping to alleviate pathogen and temperature stresses [20,21]. Next-generation transcriptome sequencing was firstly reported by Jeremie Vidal-Dupiol et al., who identified related genes that are upregulated in response to a CO 2 -driven pH decrease [22]. Niche partitioning of a closely related symbiotic in P. damicornis was studied for the first time using this approach [23]. Zhou Zhi et al. established transcriptome libraries of P. damicornis after ammonium stress and explored the dual-recognition activity of a rhamnose-binding lectin to pathogenic bacteria and zooxanthellae [24]. Although these papers studied the role of zooxanthellae in a symbiont and investigated the effects of zooxanthellae and corals on symbionts, the research was carried out under special conditions, such as acidification. As zooxanthellae live in symbiosis with corals in a normal marine environment, their influence on each other and complementary interaction must also be studied in such a setting.
Recent advances of PacBio Sequel II sequencing technology provide a way to obtain large amounts of full-length transcriptome data from many organisms and tissues [25][26][27].
In principle, such data can allow us to identify all expressed transcripts as a complete and contiguous mRNA sequence from the transcription start site to the transcription end site. This approach can be applied to multiple alternatively spliced isoforms and to allow a precise analysis of fusion genes, homologous genes, superfamily genes, or alleles. With these tools, researchers can accurately and more efficiently analyze all gene expression profile information, such as that on gene expression, variable splicing, gene fusion, expression regulation, the coding sequence (CDS), protein structure, etc. This overcomes the problems of short and error splicing triggered by next-generation sequencing technology and the problems caused by amplifying target genes one by one, which was an issue early in the days of polymerase chain reaction and Sanger sequencing [28][29][30]. However, to date, no studies on the full-length transcriptome analysis and expression profiles of zooxanthellae and coral have been reported in P. damicornis.
In this study, we first perform SMRT (single-molecule real-time) transcriptome sequencing of P. damicornis using the PacBio RSII sequence platform, which provides the technologies needed to directly obtain complete transcripts containing 5 UTR and 3 UTR with poly(A) tails. Additionally, zooxanthellae transcripts isolated from the algal symbiont transcriptome allow us to investigate and identify the highly expressed genes and related pathways at a molecular level, to provide further insight into the molecular mechanisms of a coral symbiont.

Ethics
All coral samples were collected and processed in accordance with local laws for invertebrate protection.

RNA Extraction
The RNA extraction procedure was performed according to the following instructions: (1) coral samples were ground (we kept the samples submerged in liquid nitrogen at all times); (2) when the samples were ground into small pieces, the TRIzol ® LS reagent (Thermo Fisher Scientific, Carlsbad, CA, USA, 10296028) was added, the ratio of sample to reagent was about 1:3; (3) samples were left to stand and thaw naturally; (4) TRIzol ® LS reagent was continuously added until the samples were dissolved, and dispensed into 50 mL centrifuge tubes; (5) centrifuged at 4 • C and 3000 rpm for 5-15 min; (6) the supernatant was dispensed into 50 mL centrifuge tubes; (7) BCP (Molecular Research Center, Cincinnati, OH, USA, BP151) was added to the above centrifuge tubes, the ratio of sample to reagent was about 5:1, samples were shaken well and then left to stand for 10 min; (8) centrifuged at 4 • C and 10,500 rpm for 15 min; (9) an equal volume of Isopropanol (AMRESCO, 0918-500ML) was added to the supernatant and mixed well, left to stand overnight at −20 • C; (10) it was centrifuged at 4 • C and 10,500 rpm for 30 min, and the supernatant was discarded; (11) rinsed 2 times with 75% Ice Ethanol (Sigma, Shanghai, China, E7023-500ML), and treated with DNase I (Thermo Fisher Scientific, 18068015). Finally, three samples of each coral were extracted in equal amounts (total > 10 µg) and mixed for PacBio full-length transcriptome sequencing, the remainders (>1.5 µg per sample) were used for Illumina sequencing.
The high-quality mRNAs were isolated with a FastTrack MAG Maxi mRNA Isolation Kit (Thermo Fisher Scientific, K1580-02). The samples were separated from healthy P. damicornis to ensure that enough high-quality RNA (>10 µg) could be obtained for a full-length cDNA (complementary DNA) transcriptome library.

Library Construction and Sequencing
To support the accuracy and credibility of the data, we used three sample repetitions for the library construction and sequencing. Before establishing the library, the quality of the total RNA was tested. Agarose gel electrophoresis was used to analyze the degree of degradation of RNA and determine whether it was contaminated. A Nanodrop nucleic acid quantifier was used to detect the purity of the RNA (OD260/280 ratio), a Qubit RNA assay was used to accurately quantify the RNA concentration and an Agilent 2200 TapeStation was used to accurately detect the integrity of the RNA. The Clontech SMARTer ® PCR cDNA Synthesis Kit (Clontech Laboratories, California, CA, USA, 634926) and the BluePippin Size Selection System protocol, as described by Pacific Biosciences (PN 100-092-800-03), were used to prepare the isoform sequencing (Iso-Seq) library according to the Iso-Seq protocol. We used the PacBio Sequel II platform with SMRT (single-molecular real-time) sequencing technology to obtain raw data.

Error Correction Using Illumina Reads
LoRDEC v0.7 constructed a DBG (de Bruijn graph) based on sequences obtained by the Illumina platform and read every consensus to identify whether the sequence was supported by Illumina data in DBG [32]. The sequences not supported by next-generation data were corrected and the corrected consensus sequences were outputted.

Gene Functional Annotation
The gene function was annotated based on the following databases: NR (NCBI nonredundant protein sequences) [34]; NT (NCBI non-redundant nucleotide sequences); Pfam (protein family); KOG/COG (clusters of orthologous groups of proteins) [35]; Swiss-Prot (manually annotated and reviewed protein sequence database) [36]; KEGG (Kyoto Encyclopedia of Genes and Genomes) [37]; GO (gene ontology) [38]. We used BLAST software and set the e-value to '1e-10 in the NT database analysis, used Diamond BLASTX and set the e-value to '1e-10 in the NR, KOG, Swiss-Prot and KEGG databases analyses and used the Hmmscan software in the Pfam database analysis [39].

CDS Prediction
The ANGEL V2.4 pipeline was used to identify protein coding sequences (CDSs) from cDNAs [40]. We used closely related species-confident protein sequences for ANGEL training and predicted the CDSs of P. damicornis.

SSR Analysis
SSRs (simple sequence repeats) of transcriptomes were identified using the MISA V1 website sever (http://pgrc.ipk-gatersleben.de/misa/misa.html, accessed on 21 October 2020), via which we located perfect microsatellites as well as compound microsatellites that were interrupted by a certain number of bases [43].

LncRNA Analysis
PLEK and CNCI (coding non-coding index) software were used to predict the transcriptome coding potential according to sequence characteristics of transcripts with default parameters [44,45]. The PLEK SVM (support vector machines) classifier uses an optimized K-mer approach to construct the best classifier with which to assess the coding potential of a species that lacks high-quality genomic sequences and annotations. The CNCI profiles adjoin nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. Then, after the PLEK and CNCI prediction, the transcript coding potential was assessed by CPC2 (coding potential calculator 2) software. The CPC mainly assesses the extent and quality of the ORF (open reading frame) in a transcript, and searches the sequences with a known protein sequence database to clarify the coding and non-coding transcripts [46]. We used the NCBI eukaryotes' protein database and set the e-value to '1e-10 in our analysis. Finally, the transcription sequences obtained in the prior steps were homology searched with the Pfam-A and Pfam-B databases for Hmmscan, and the lncRNAs (long noncoding RNAs) were finally obtained.

Quantification of Gene Expression Levels
We used bowtie2 software in RSEM (end-to-end and sensitive) to compare the clean reads of each sample obtained by Illumina to reference sequences (Ref), which were unigenes obtained by CD-HIT [47]. The read count for each transcript was obtained from the mapping results and then transformed into the FPKM (expected number of fragments per kilobase of transcript sequence per million base pairs sequenced) for analysis.

Correlation Analysis of Gene Expression
The correlation of the gene expression level between samples was an important index with which to test the reliability of the experiment and the rationality of the sample selection. The closer the correlation coefficient was to 1, the higher the similarity of the expression patterns between samples. The Encode program recommends a Pearson correlation coefficient square (R 2 ) greater than 0.92 (ideal for sampling and experimental conditions). In a specific project's operation, generally, an R 2 between biological repetition samples that is greater than 0.8 is considered to represent reasonable and good biological repetition.

GO and KEGG Enrichment Analysis
Gene ontology (GO) enrichment analysis of differentially expressed genes or lncRNA target genes was implemented by the GOseq R package, in which the gene length bias was corrected [38]. GO terms with corrected p-values of less than 0.05 were considered significantly enriched by differentially expressed genes. KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, organism and ecosystem, using molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/, accessed on 25 July 2021) [37]. We used KOBAS software to test the statistical enrichment of the genes in the KEGG pathways [48].

Raw Data Quality Control
The PacBio Sequel sequencing platform is a circular sequencing platform, and sequencing reads produced by a single molecule during the sequencing process are called polymerase reads. Subreads are obtained by filtering and removing the joint and polymerase reads with a length of less than 50 bp. The statistical results for the polymerase reads and subreads in this study are shown in Table S1 and Figures S1 and S2.

Transcript Correction
The third-generation sequencing technology represented by PacBio has the advantage of an extremely long read length, but also has a high single-base error rate. Yet, its sequencing errors are random and can improve the data accuracy by increasing the sequencing depth up to 99.99%. Furthermore, the full-length transcriptome can be corrected multiple times to improve the data accuracy. High-quality CCSs are consistent sequences obtained from subreads in each ZMW (Zero-Mode Waveguides), where the same template is sequenced multiple times to perform in-hole corrections for zero-mode waveguide holes without a reference sequence alignment. CCS statistics are shown in Table S2 and Figure S3. The lengths of CCSs range from 2000 to 3000 at most. In SMRT Link software, a sequence containing both 3 and 5 UTRs and a poly(A) tail before the 3 primer, is defined as either a full-length (FL) read or non-full-length read. A non-chimeric sequence in an FL read is called a full-length non-chimeric (FLNC) read. The statistical results of the full-length transcript classification are shown in Table 1 and Figure S4. Multi-copy transcript sequencing data were clustered to eliminate redundancies, and an interhole correction of zero-mode waveguide holes was performed to obtain cluster consensus sequences. Arrow software was used to polish the cluster consensus sequences, to obtain consensus sequences. Consensus statistical results are shown in Table S2 and Figure S5. The lengths of the consensus sequences ranged from 2000 to 3000 at most and they had a similar length distribution to the FLNC reads. To further improve the sequencing accuracy, LoRDEC software was used to adjust the consensus sequences with next-generation data, and corrected consensus sequences were obtained. The lengths of the sequence before and after transcription correction were statistically analyzed ( Table 2).

Redundant Removal
CD-HIT uses heuristic algorithms to quickly find highly similar fragments between sequences. To do so in this study, first, all sequences were sorted according to their length. Then, the first cluster was formed from the longest sequence. Finally, the sequences were processed in turn. If the similarity between a new sequence and a representative sequence of the existing sequence cluster was above a set identity threshold, the new sequence would be added to the existing cluster; otherwise, a new cluster would be formed. According to a 95% similarity between sequences, we conducted clustering to eliminate redundancies from the corrected consensus sequences, and the results are shown in Figure 1, Table 3 and Tables S3 and S4. After redundancy removal, the number of unigenes was reduced by nearly 50% compared with the number of transcripts, of which 14,840 transcripts had no redundancies and 4170 unigenes contained two transcripts.

Gene Function Annotation
To obtain comprehensive gene function information, a gene function annotation was performed on symbiont unigenes with the NR, NT, Pfam, KOG/COG, Swiss-Prot, GO and KEGG databases. There were 21,204 genes annotated in at least one database, and 4256 genes were annotated in all seven databases ( Figures S6 and S7, Table S5 and Supplementary Data S1). To distinguish the unigenes of corals and zooxanthellae, the full-length transcriptome of the symbiont was divided into a full-length coral transcriptome and full-length zooxanthellae transcriptome, based on consensus, according to NR and NT annotation. This annotation is characterized by its comprehensive content, with the annotated results containing species information that is used for classification. A total of 21,926 unigenes was found in the coral and the zooxanthellae had 465 unigenes. The two transcriptomes were corrected and redundancies removed, according to the aforementioned methods, and reannotated based on the seven databases listed. All statistical results are shown in Table S5 and Figure 2.

NR Database Annotation
By comparing the similarities of the gene sequences of this species and those of the related species, and annotating with the NR database, functional information on the genes of this species could be obtained. There were 20,309 and 423 unigenes annotated in the NR database and the statistical results are shown in Figures S8 and S9. The top three species with the highest numbers of annotated unigenes in the coral were A. digitifera, Exaiptasia pallida and Nematostella vectensis, together comprising 92.12% of the unigenes. The top three species with the highest number of annotated unigenes in zooxanthellae, meanwhile, were A. digitifera, Symbiodinium microadriaticum and E. pallida, together comprising 81.80% of the unigenes.

GO Classification
GO is divided into three categories: (1) cellular component, used to describe subcellular structures, locations and macromolecular complexes, such as nucleoli, telomere and the recognition of the initiation complex; (2) molecular function, used to describe the functions of individual genes or gene products, such as binding to carbohydrates or ATP hydrolase activity; (3) biological process, used to describe biological processes in which the products encoded by genes participate, such as mitosis or purine metabolism. The successfully annotated unigenes were classified according to the second level of the three GO categories, and the results are shown in Figure S10. In total, 15,147 coral unigenes and 36 zooxanthellae unigenes were successfully annotated.

KOG Classification
Through a comparison, a protein sequence could be annotated into a KOG for eukaryotes, and each cluster of KOG is composed of lineal homologous sequences, meaning that the function of the sequence can be inferred. According to the function, a KOG can be divided into 26 clusters (http://www.ncbi.nlm.nih.gov/COG/, accessed on 17 March 2020). In this study, 13,660 coral genes and 191 zooxanthellae genes were successfully annotated in the KOG database (Table S6 and Figure S11).

KEGG Classification
KEGG is a database used to analyze the metabolic pathways of gene products, compounds in cells and the functions of these gene products. KEGG combines data from genomes, chemical molecules and biochemical systems, including the metabolic KEGG pathway, KEGG drug, KEGG disease, KEGG module, KEGG genes and KEGG genome annotation systems. The KO (KEGG ORTHOLOGY) system links each KEGG annotation systems together, completing the annotation system for the functional annotation of a genome or transcriptome for newly sequenced species (http://www.genome.jp/kegg, accessed on 9 March 2020). After KO annotation, unigenes can be classified according to the KEGG metabolic pathway they participate in, as shown in Figure S12.

Pfam Database Annotation
The Pfam database contains many protein domain families, which are composed of two parts: Pfam-A and Pfam-B. Pfam-A originates from the Pfamseq database based on the latest high-quality UniProtKB. As a supplement to Pfam-A, Pfam-B is extremely useful for identifying functional, conserved areas that Pfam-A cannot cover (http://pfam.xfam.org/, accessed on 20 April 2020). The annotation results of the Pfam database are shown in Supplementary Data S2.

Swiss-Prot Database Comment
Swiss-Prot is an annotated protein sequence database, including protein function, post-translational modification, variation and other descriptive information, which can be used to further identify protein variation and reduce redundancy (Supplementary Data S3).

CDS Prediction
The prediction of a protein-coding region is helpful for a preliminary unigene analysis and also sets the basis for a subsequent protein structure analysis. ANGEL software using a machine-learning algorithm was applied for the predictive analysis of the CDS. This maximized the limited information from the input sequence to predict the CDS. The frequency of codon usage, and protein structure information that is difficult to apply to the random model algorithm, were used to optimize the limited information. Therefore, the accuracy of the prediction results was independent of the length of the input sequence. The CDS length distribution results are shown in Figure 3.

TF Analysis
For the species in the animalTFDB 2.0 database, if they were Ensembl Gene ID, the transcription factors would be screened directly, and for unigenes not Ensembl Gene ID, BLASTX screening would be performed by the known transcription factor protein sequences of species in the database. For species not included in the database, hmmsearch was used to identify them according to pfam files of transcription factor families. The top 30 transcription factor families annotated to the largest number of transcripts are displayed in Figure 3.

SSR Analysis
The high variability of SSR length is caused by different nucleotides of repeat units and different repeats, among which the most common is the dinucleotide repeat type, such as (CA)n. The minimum repetition times of each of the corresponding unit sizes are: 1-10, 2-6, 3-5, 4-5, 5-5, 6-5 (for example, with 1-10, when a single nucleotide is used as the repetition unit, its repetition number can be detected only when it is at least 10; with 2-6, when a dual-core nucleotide is used for the repeat unit, the minimum number of replications is six for SSR detection in the MISA software (http://pgrc.ipk-gatersleben.de/misa/misa.html, accessed on 12 July 2020). Figure S13 displays an SSR distribution diagram.

LncRNA Prediction
LncRNA (long non-coding RNA) is a class of RNA molecules with transcripts over 200 nt long that do not encode proteins. Due to the limitation of the library construction principle, we could only obtain lncRNA containing a poly(A) tail. We used CNCI, CPC2, Pfam and PLEK to predict the coding potential of the unigenes. The number of noncoding unigenes predicted by each software was plotted into a Venn diagram displaying the numbers of common and unique LncRNAs predicted by each method (Figure 4). The length distribution density of mRNA and predicted lncRNA was then calculated (Figure 4).   Table S6.

Gene Expression Statistics
We used RSEM software to produce statistics of the bowtie2 comparison results, and further obtained a read count value of each sample in comparison to each unigene. Next, we performed FPKM conversion to analyze the gene expression level. The numbers of unigenes at different expression levels and the expression level of a single unigene were counted (Tables S7 and S8).

GO Entries
To screen unigenes with high expression levels, an FPKM of greater than 100 was taken as the screening threshold. A directed acyclic graph (DAG) is a graphical result of the GO enrichment analysis. Branches represent the inclusion relationship, and the range of functions defined from top to bottom became smaller and smaller. Generally, the top 10 results of the GO enrichment analysis are selected as the main nodes of a DAG. Each node of a DAG represents a GO term, and the box represents a GO with a top 10 enrichment degree. The depth of color represents the enrichment degree, and the darker the color, the higher the enrichment degree. According to the DAG results, the organonitrogen compound biosynthetic process enriched the most unigenes in the biological process ( Figure S14). The ribosomes, cytoplasmic parts, ribonucleoprotein complexes and nonmembrane-bound organelles, in the cellular component and structural molecular activity of the molecular function, enriched the most unigenes (Figures S15 and S16, Supplementary Data S4 and Figure 5).

KEGG Entries
The two pathways with the most KEGG enrichment were ribosome and oxidative phosphorylation. The top 20 pathways enriched in KEGG also included mineral absorption, indicating that coral reefs are formed mainly by the expression of genes related to mineral absorption ( Figure 6 and Supplementary Data S5). Figure 6. Highly expressed gene KEGG scatter plot of coral. The horizontal axis represents the rich factor and the vertical axis the KEGG term. The circle size represents the number of genes, and the color from purple to red indicates a q-value from 1 to 0.

Functional Gene Analysis of Zooxanthellae
As the number of unigenes in zooxanthellae was low, to obtain more comprehensive gene function information, GO, KEGG and KOG were used for the functional classification of the unigenes in zooxanthellae. In addition to forming basic cell structures, zooxanthellae also participated in the metabolism, regulation of biological processes and locomotion biological processes ( Figure S10B). In the KEGG classification results, besides a human diseases cluster, signal transduction and energy metabolism enriched the most unigenes ( Figure S12B). Similar to the KEGG classification results, signal transduction mechanisms, general function prediction alone, posttranslational modification and transcription matched the most unigenes ( Figure S11B).

Coral Biological Process
In addition to 43 ribosomal proteins, which mainly require organonitrogen compound biosynthesis, there are other proteins with important functions among the 71 enriched genes (Figure 4). ATP synthase produces ATP from ADP (adenosine diphosphate) in the presence of a proton gradient across the membrane, which is generated by electron transport complexes of the respiratory chain [49,50]. Nucleoside diphosphate kinase B plays a major role in the synthesis of nucleoside triphosphates other than ATP [51]. The activating enzyme activates a given amino acid by attaching it to the corresponding transfer ribonucleic acid [52]. Translational elongation factors are proteins that play important roles during the elongation cycle of protein biosynthesis on the ribosome [53]. The barrier to the autointegration factor is involved in multiple pathways, including mitosis, nuclear assembly, viral infection, chromatin and gene regulation and the DNA damage response [54]. The synthesis of organic nitrogen in coral is mainly used to produce protein and ATP.

Coral Cellular Component
There are 51 enriched genes in ribosomes, of which 42 are ribosomal proteins and the remaining are eukaryotic translation initiation factors and elongation factors involved in translation (Figure 4). These assemble amino acids to form proteins that are essential to carry out cellular functions. The cytoplasmic part and ribonucleoprotein complex also have more than 80% ribosomal proteins. It is noteworthy that aspartic acid genes related to coral skeleton growth are enriched in non-membrane-bounded organelles [55]. Aspartic acid is the most abundant of all amino acids in the coral skeleton and shows a clear seasonal fluctuation.

Coral Molecular Function
Heat shock factor-binding protein 1 (HSBP1) is enriched in the molecular function. HSBP1 is critical for early embryonic development and potentially affects the Wnt signaling pathway related to coral growth [56]. In molecular function, collagen I alpha 1 (which produces type-I collagen) is the most abundant structural protein of the connective tissues, which is responsible for connecting coral tissues (Figure 4).

Coral Mineral Absorption
Sodium/potassium-transporting ATPase is the catalytic component of the active enzyme, which catalyzes the hydrolysis of ATP coupled with the exchange of sodium and potassium ions across the plasma membrane; thus, providing the energy for the active transport of various nutrients [57]. Ferritins were identified as highly expressed in each sample. Ferritins are primarily used in organisms to store iron [58]. Iron availability is known to indirectly stimulate heterotrophic microbial production through the release of phytoplankton-derived dissolved organic matter. The expression of ferritin in corals can stimulate zooxanthellae to provide necessary nutrients; thus, promoting the symbiotic relationship between coral and the zooxanthellae.

Gene Function of Zooxanthellae
Zooxanthellae express proteins to construct their own structures, synthesize nutrients, regulate transcriptional DNA replication, etc., to maintain their own life activities. Proteins that are used to make up the contents of the cell membrane, cell wall and cytoplasm, as well as to sustain cell activity, are expressed in zooxanthellae. Ankyrins are a family of proteins that link the integral membrane proteins to the underlying spectrin-actin cytoskeleton and play key roles in activities such as cell motility, activation, proliferation, contact and the maintenance of specialized membrane domains [59]. Inositol oxygenase 2 is involved in the biosynthesis of UDP-glucuronic acid (Uridine diphosphate-GlcA); thus, providing nucleotide sugars for cell-wall polymers [60]. The ammonium transporter (AMT) plays a key role in NH4+ absorption and transport and is involved in maintaining the Golgi structure [61]. Vinexin is an adaptor protein supposed to play pivotal roles in various cellular events such as cell adhesion, cytoskeletal organization, signaling and gene expression [62]. Endoplasmin is a molecular chaperone that functions in the processing and transport of secreted proteins [63]. Alpha adducin, ADD1, is a ubiquitously expressed protein that is part of the cytoskeleton and may modulate ion transport [64]. These proteins are used to make up the contents of the zooxanthellae's cell membrane, cell wall and cytoplasm. E3 ubiquitin-protein ligase (BRE1B) plays a central role in histone coding and gene regulation [65]. The COMM domain is a scaffold protein motif domain that is implicated in diverse physiological processes, the function of which may be in part linked to its ability to regulate the ubiquitination of specific cellular proteins [66]. The enhancer of a rudimentary homolog (ERH) has several binding proteins and has been associated with various cellular processes, such as pyrimidine metabolism, cell-cycle progression and transcription control [67]. Soluble starch synthase I (SSI) is a key enzyme in the biosynthesis of plant amylopectin [68]. Under the co-regulation of these proteins, zooxanthellae can complete the regulation of cell growth gene expression, DNA replication, cell polarity development and other related protein expressions, while maintaining the balance of the zooxanthellae population in coral.
Zooxanthellae may also have signaling molecules that regulate the communication between cells and the relationships with corals. The allene oxide synthases may be converted to allene oxides and, subsequently, give rise to plant signaling molecules [69]. The allene oxide synthases may act as pheromones in zooxanthellae to transmit information. The members of the Syndecan family of heparan sulfate proteoglycans play diverse roles in cell adhesion and cell communication by serving as co-receptors for both cell-signaling and extracellular matrix molecules [70].
The sulfite reductase hemoprotein beta-component is a component of the sulfite reductase complex that catalyzes the six-electron reduction in sulfite to sulfide [71]. This is one of several activities required for the biosynthesis of the NLRC3 protein, which is a cytosolic regulator of innate immunity. L-cysteine from sulfate stimulates the inhibitory effect of vitamin D on oxidative stress, IL-8 and McP-1 secretion in monocytes treated with high glucose [72]. Tumor necrosis factor receptor-associated factor (TRAF) proteins play crucial roles in plant development and the response to abiotic stress [73]. Heat shock factor proteins are highly expressed in zooxanthellae, and also expressed in coral transcriptome. Nudix hydrolase 8 may be involved in plant immunity and act as a positive regulator of the defense response through salicylic acid (SA) signaling [74]. Zooxanthellae can also express the heat stress protein, NLRC3 protein, tumor necrosis factor receptor-associated protein and sulfite reductase hemoprotein. Beta-component, L-cysteine, Nudix hydrolase 8, etc., provide the self-regulating ability to cope with environmental change.
Notably, in the zooxanthellae transcriptome, chloroplast-related expression proteins are used for photosynthesis and ATP formation. The peridinin-chlorophyll a-binding protein is a water-soluble antenna for the capture of solar energy in the blue-green range [75]. Chloroplast functioning requires the import of nuclear-encoded proteins from the cytoplasm across the chloroplast double membrane [76]. This is accomplished by two protein complexes, the Toc complex located at the outer membrane and the Tic complex located at the inner membrane. The Toc complex recognizes specific proteins by a cleavable N-terminal sequence and is primarily responsible for translocation through the outer membrane, while the Tic complex translocates the protein through the inner membrane. The chloroplast stem-loop binding protein, meanwhile, binds and cleaves RNA, particularly in stem-loops, and associates with pre-ribosomal particles in chloroplasts. The protein participates in the chloroplast ribosomal RNA metabolism, probably during the final steps of 23S rRNA maturation [77]. Further to this, light-harvesting complex I LH38 proteins are synthesized as a 100 kDa polyprotein that is entirely imported into the chloroplast, where it is, subsequently, cleaved into five mature 20 kDa LH38 proteins [78]. Pyruvate kinase (PK) is the enzyme responsible for the final step of glycolysis, in which phosphoenolpyruvate is converted to pyruvate with the production of ATP [79]. Pyruvate phosphate dikinase (PPDK) is an essential enzyme of C4 photosynthesis in plants, catalyzing the ATP-driven conversion of pyruvate to phosphoenolpyruvate (PEP) [80]. PEPC (phosphoenolpyruvate carboxylase) plays a key role in photosynthesis by C4 and crassulacean acid metabolism in plants, in addition to its many anaplerotic functions [81]. Zooxanthellae capture solar energy in the blue-green range by the peridinin-binding protein, and transport nuclearencoded proteins into chloroplasts via Tic and Toc to regulate gene expression [82]. ATP is produced through pyruvate kinase and pyruvate phosphate dikinase and carbon is fixed in organic matter for coral growth to generate oxygen. In the dark, PEPC can be used for the photosynthesis.

Biochemical Connection in the Coral and Algal Symbiont
The generation of ATP and reductant (NADP(H)) in Symbiodinium cell chloroplasts and mitochondria sustains the active uptake and assimilation of nutrients into organic compounds forming the key carbon 'skeletons': carbohydrates, proteins, lipids and trace elements, which are important nutrients for coral subsistence [83][84][85][86][87]. Peridinin-binding protein, PEPC and other photosynthesis associated proteins expressed in zooxanthellae transcriptomes can fix carbon and form nutrients for symbiont. Nuclear encoded proteins, Soluble starch synthase I, Vinexin, COMM Domain and the Syndecan family of Heparan sulfate proteoglycans are related to the transport of formed compounds to corals. During the endosymbiotic interaction, a host regulates glycan profiles of the symbiont populations by post-translational modification to produce the required nutrients [88]. Symbiotic Symbiodinium (i.e., in hospite) contained less nitrogen compounds, more lipids and soluble carbohydrates than free-living Symbiodinium, indicating that the excess produced nutrients are to be transported to the coral [89,90]. Muscatine and Hand suggested that labeled metabolites moved from the algae to the tissue of the anemone host after 48 h of exposure but not after 18 h [91] and C 14 incorporates into lipids, amino acids, acidic and neutral compounds [92]. Trench has identified the compounds produced by Symbiodinium, and it was shown that glycerol was the major extracellular product and other labeled compounds included alanine, glucose, fumaric acid, succinic acid, glycolic acid and two other unidentified organic acids [93]. However, the hosts controlling the release of metabolites from the symbiont are "host factors" (HF) or "host release factors" (HRF) [93,94]. Gates et al. proposed that the metabolites of host, free amino acids, served as HRFs [86]. The above findings are consistent with our results, and our study confirmed the role of symbiotic zooxanthellae in symbionts at the gene expression level.

Conclusions
The gene expression profile of a coral-zooxanthellae holobiont forms the molecular foundation of coral reef biology. In this study, the split gene expression profiles of the reef-building coral P. damicornis and its symbiotic zooxanthellae were acquired through fulllength transcriptome sequencing using PacBio Sequel II sequencing technology. GO and KEGG analyses determined that coral has the capacity to regulate life activities, respond to stress and absorb minerals. Besides this, it had an anti-stress ability. Zooxanthellae, meanwhile, could produce signaling molecules for communication. Zooxanthellae also provided for the coral host through energy and nutrition metabolism by photosynthesis.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3 390/d13110543/s1, Figure S1: Polymerase reads distribution statistical results, Figure S2: Subreads distribution statistical results, Figure S3: CCS reads distribution statistical results, Figure S4: FLNC distribution statistical results, Figure S5: Consensus reads distribution statistical results, Figure S6: Seven database annotation statistical results, Figure S7: Gene functional annotation Venn diagram, Figure S8: Annotation statistical results of coral full-length transcriptome according to the NR database, Figure S9: Annotation statistical results of zooxanthellae full-length transcriptome according to the NR database, Figure S10: Annotation statistical results of coral and zooxanthellae full-length transcriptome according to the GO database, Figure S11: Annotation statistical results of coral and zooxanthellae full-length transcriptome according to the KOG database, Figure S12: Annotation statistical results of coral and zooxanthellae full-length transcriptome according to the KEGG database, Figure S13: SSR distribution results of coral and zooxanthellae, Figure S14: Gene GO enrichment DAG of biological process, Figure S15: Gene GO enrichment DAG of cellular component, Figure S16: Gene GO enrichment DAG of molecular function, Table S1: Statistical results of polymerase read and subread, Table S2: Statistical results of CCSs and FLNCs, Table S3: Statistical results of unigene number corresponding transcripts, Table S4: Statistical results of sequence length distribution after de-redundancy,

Data Availability Statement:
The dataset generated for this study can be found in the National Center for Biotechnology Information (NCBI) (accession number: SAMN16237127) in BioProject: PRJNA544778.

Conflicts of Interest:
The authors declare no competing interests.