Identification and Phylogenetic Analysis of Chitin Synthase Genes from the Deep-Sea Polychaete Branchipolynoe onnuriensis Genome

Chitin, one of the most abundant biopolymers in nature, provides rigidity to the exoskeleton and is a valuable material in both the medical and industrial fields. The synthesis of chitin is catalyzed by chitin synthase (CHS) enzymes. Although the chitin synthesis pathway is highly conserved from fungi to invertebrates, CHSs have been investigated mostly in insects and crustaceans. In particular, little is known about CHSs in annelids from hydrothermal vents. To understand chitin synthesis from an evolutionary perspective in a deep-sea environment, we generated whole-genome sequencing data for the parasitic polychaete Branchipolynoe onnuriensis. We identified seven putative CHS genes (BonCHS1-BonCHS7) by domain searches and phylogenetic analyses. This study showed that, whereas most crustaceans have only one or two CHS gene copies, at least two independent gene duplication events occurred in B. onnuriensis. This is the first study of CHSs from a parasitic species inhabiting a hydrothermal vent, and it will provide insight into the adaptation of various organisms to deep-sea hosts.


Introduction
Chitin, a linear polymer of β-(1,4)-N-acetyl-D-glucosamine (GlcNAc), is the second most abundant biopolymer in nature after cellulose, with more than 100 billion tons synthesized annually [1][2][3][4][5]. It is found in organisms ranging from fungi to various invertebrates and provides them with sufficient rigidity to support their shape and structure [1]. In arthropods, chitin plays a crucial role in forming new cuticles during molting and is a component of the intestinal peritrophic matrix in insects, which supports digestion [1,6,7]. In nematodes, chitin is found in the eggshell and pharynx [8]. Furthermore, in Lophotrochozoa, chitin forms the radula and shell in mollusks [9][10][11], the beak in cephalopods [12], and the chaetae in annelids [13]. Owing to its diverse functions, chitin is attracting attention as a raw material in various fields, such as the pharmaceutical and biotechnological industries [14].
Chitin is polymerized by an enzyme called chitin synthase (CHS, chitin 4-β-N-acetylglucosaminyltransferase; EC 2.4.1.16), which is generally characterized by three functional domains: A, B, and C [15]. Domain A, composed of several transmembrane helices, is located at the N-terminus, and its sequence may vary between species. Domain B (chitin_synth_2), the catalytic core that contains two highly conserved motifs ("EDR" and "QRRRW"), lies in the middle of the protein. Domain C is located at the C-terminus, contains approximately seven transmembrane helices, and has the conserved motif "WGTRE" [1].
Generally, insects have two CHS genes (CHS1 and CHS2). CHS1 is responsible for cuticle formation in the epidermis, while CHS2 is involved in chitin synthesis in the peritrophic membrane of the intestine [1,6]. A CHS gene knockdown study in the crustacean Lepeophtheirus salmonis showed the formation of an abnormal appendage, which eventually led to death, suggesting a multifunctional role of CHS [16]. Interestingly, in contrast to the ecdysozoans, which have only one or two gene copies located on the same chromosome, numerous CHS genes have been identified in lophotrochozoans [1]. For example, 31 CHS genes were identified in the brachiopod Lingula anatina [17]. In addition, four and five CHS genes were identified in the shallow-sea polychaetes Capitella teleta and Dimorphilus gyrociliatus, respectively, whereas the CHS family was markedly expanded in the deep-sea polychaetes Paraescarpia echinospica and Lamellibrachia luymesi, with 19 and 12 genes, respectively [18]. These findings suggest that CHS gene duplication events occur in a lineage-specific manner. However, CHSs have been explored mostly in arthropods, and little is known so far about CHSs in annelids. Studies are gradually investigating the poorly explored realm of lophotrochozoan CHSs, but data on the evolutionary process of CHS gene expansion are still lacking. In addition, no phylogenetic analysis that includes deep-sea parasitic polychaetes has been reported.
In this study, we performed whole-genome sequencing (WGS) of the parasitic polychaete Branchipolynoe onnuriensis, collected from bivalves living in a hydrothermal vent [19], and identified seven CHS genes (BonCHS1-BonCHS7) belonging to the glycosyltransferase 2 (GT2) family. This is the first study of CHSs from deep-sea parasitic polychaetes. We also analyzed the relationship of B. onnuriensis CHS genes with those from other lophotrochozoans. In addition, to obtain information about CHS gene family expansion, we expanded the classification of lophotrochozoan CHS gene groups and categorized them into five subgroups. Our results will provide important information for future studies of the chitin synthesis mechanism in deep-sea parasitic polychaetes.

Sample Collection and Next-Generation Sequencing
An individual of the parasitic polychaete Branchipolynoe onnuriensis was separated from its host Gigantidas vrijenhoeki (class Bivalvia) using a video-guided hydraulic grab (Oktopus, Germany) around the Onnuri Vent Field (OVF, 11°14′55.92″ S, 66°15′15.10″ E; depth of 2014.5 m) during a Korea Institute of Ocean Science and Technology (KIOST) expedition along the Central Indian Ridge (CIR) in 2019 [20]. Immediately after collection, the sample was stored in 95% ethanol at −20 °C until DNA extraction in the laboratory. Genomic DNA was extracted using the QIAGEN Blood & Cell Culture DNA Mini Kit (QIAGEN, Hilden, Germany), according to the manufacturer's instructions. A paired-end library was constructed using the TruSeq DNA Nano 550 bp kit (Illumina, Inc., San Diego, CA, USA), with an insert size of 550 bp, and 150 bp sequencing was performed using the NovaSeq 6000 platform (Illumina).

Data Filtering and De Novo Genome Assembly
Adaptor sequences and low-quality reads with a mean quality score below 20 were removed. In addition, reads shorter than 120 bp or containing unknown bases (N) were filtered out using Trim Galore (ver. 0.6.6) [21]. The cleaned reads were obtained with the following parameters: --quality 20 --length 120 --max_n 0.
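The three thresholds described above can be illustrated with a minimal Python sketch. It is not the Trim Galore implementation: Trim Galore additionally trims adapters and low-quality 3′ ends, which this sketch does not attempt. The function names and the Phred+33 encoding are assumptions made for illustration only.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_filter(sequence, quality_line, min_quality=20, min_length=120, max_n=0):
    """Return True if a read meets the quality, length, and N-count thresholds."""
    if not sequence or len(sequence) < min_length:
        return False  # too short (or empty)
    if sequence.upper().count("N") > max_n:
        return False  # contains more unknown bases than allowed
    # Mean-quality criterion as worded in the text (hypothetical helper logic).
    return mean(phred_scores(quality_line)) >= min_quality
```

For instance, a 120 bp read whose quality characters are all "I" (score 40 under Phred+33) passes, whereas the same read with a single "N" or with only 119 bases does not.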
After quality control, de novo assembly was performed with SPAdes (ver. 3.14.0), using 21-, 31-, and 51-mers to build the initial de Bruijn graph [22]. Finally, the genome assembly quality assessment software QUAST (ver. 5.0.2) was used to obtain assembly metrics, such as the number of contigs, the largest contig, the total length, N50, and L50, without a reference genome [23].
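As a reminder of what two of these metrics mean, the following sketch computes N50 and L50 from a list of contig lengths, using the standard definitions: N50 is the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of contigs needed to reach that point. The function name and the toy lengths are illustrative and are not taken from the assembly reported here.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for an iterable of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            # `length` is N50; `count` contigs were needed, which is L50.
            return length, count
    raise ValueError("contig_lengths must not be empty")


# Toy example: total length 290, half is 145; 80 + 70 = 150 reaches it.
print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```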

Gene Prediction and Identification of the Chitin Synthase Gene
Gene structures in the B. onnuriensis genome were predicted ab initio with Augustus (ver. 3.4.0), which uses a generalized hidden Markov model [24].
To extract the putative CHS sequences, we combined basic local alignment search tool (BLAST) searches at the National Center for Biotechnology Information (NCBI) with domain predictions. According to Zakrzewski et al. (2014) [7], lophotrochozoans have four subgroups of type 2 CHS genes; we therefore assumed that there would be at least one gene in each group (A, B, C, and D). First, we retrieved five CHS genes from NCBI, one for type 1 and one for each type 2 group, from two polychaete species: three CHS genes from Owenia fusiformis (group A, accession no. AHX26704.1; group D, accession no. AHX26707.1; type 1, accession no. AHX26703.1) and two CHS genes from Sabellaria alveolata (group B, accession no. AHX26717.1; group C, accession no. AHX26711.1). These were used as queries to search for homologous genes in our sample. In addition, we performed BLAST searches against the customized database with an E-value cut-off of < 1 × 10⁻⁵⁰ and a length of > 300 amino acids (aa) [25]. Next, domain searches of each obtained gene were carried out using the simple modular architecture research tool (SMART) [26]. We identified seven putative genes and named them BonCHS1-BonCHS7. To confirm the putative BonCHSs, we performed BLAST searches against the Carbohydrate-Active enZymes Database (CAZy; https://bcb.unl.edu/dbCAN2/download/CAZyDB.09242021.fa, accessed on 1 June 2022), which contains enzymes that synthesize or break down carbohydrates and glycoconjugates, with an E-value cut-off of < 1 × 10⁻¹⁰⁰ [27]. To obtain comparable E-values, the database size was set to 1.58 × 10¹¹ (using the "-dbsize" option), equivalent to the size of the non-redundant (NR) protein database at NCBI.
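The hit-filtering step described above can be sketched as follows, assuming that the BLAST results were saved in the standard 12-column tabular format (-outfmt 6) and that the length criterion refers to the alignment length column. Both points are assumptions, since the text does not state the output format or which length was measured. The function name and the example identifiers are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=1e-50, min_length=300):
    """Keep hits with E-value < max_evalue and alignment length > min_length."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=BLAST6_COLUMNS, delimiter="\t"
    )
    kept = []
    for row in reader:
        if float(row["evalue"]) < max_evalue and int(row["length"]) > min_length:
            kept.append(row)
    return kept


# Hypothetical rows: only the first satisfies both thresholds.
example = (
    "query1\tgeneA\t76.46\t850\t200\t3\t1\t850\t1\t850\t0.0\t1300\n"
    "query1\tgeneB\t40.00\t250\t150\t2\t1\t250\t1\t250\t1e-60\t200\n"
    "query1\tgeneC\t35.00\t400\t260\t5\t1\t400\t1\t400\t1e-30\t120\n"
)
print([hit["sseqid"] for hit in filter_hits(example)])  # ['geneA']
```

Because a BLAST E-value scales with the size of the search space, fixing the effective database size with "-dbsize" is what allows the same numerical cut-off to be compared across databases of different sizes.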

Phylogenetic Analysis of Chitin Synthase
We conducted two phylogenetic analyses, yielding a lophotrochozoan tree and a metazoan tree. For the lophotrochozoan phylogenetic tree, 52 protein sequences, including the seven putative BonCHS1-BonCHS7, were retrieved from 20 species (Table 1). For the metazoan CHS gene tree, 74 protein sequences were retrieved from 33 species (Table 1). Multiple sequence alignments were performed with MAFFT (ver. 7.475) [28]. We used IQ-TREE (ver. 2.2.0) to select the best substitution model according to the Bayesian information criterion (BIC) [29]. The best-fit models, LG + I + G4 and LG + F + I + G4, were selected for the metazoan and lophotrochozoan data sets, respectively, and maximum likelihood (ML) trees were constructed using RAxML-NG (ver. 0.9.0) [30]. Branch support in the ML trees was assessed with 1000 bootstrap replicates. The Bayesian trees were constructed using MrBayes (ver. 3.2.4), with the LG + I + G4 and LG + F + I + G4 models for the metazoan and lophotrochozoan data sets, respectively. Two independent analyses were run, each with four chains (three heated and one cold), sampling every 5 × 10² generations. The MCMC analysis was run for 1 × 10⁶ generations, the first 25% of trees were discarded as burn-in, and the resulting support values were incorporated into the ML tree [31]. Finally, the metazoan and lophotrochozoan phylogenetic trees were visualized using FigTree (ver. 1.4.4).
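The MCMC settings above imply a fixed number of sampled trees per run, which can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and the count deliberately excludes the starting state of the chain, which some programs also write to the tree file.

```python
def mcmc_sample_counts(generations=1_000_000, sample_every=500, burnin_fraction=0.25):
    """Return (sampled, discarded, retained) tree counts for one run.

    The starting state (generation 0) is not counted here (an assumption).
    """
    sampled = generations // sample_every
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


# Settings from the text: 1e6 generations, sampling every 500, 25% burn-in.
print(mcmc_sample_counts())  # (2000, 500, 1500)
```

Under these assumptions, each of the two independent analyses contributes 1500 post-burn-in trees to the Bayesian consensus.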

Data Filtering and Genome Assembly
Using Illumina paired-end sequencing, we generated 39.82 Gb of raw reads from the parasitic polychaete Branchipolynoe onnuriensis. After a stringent quality filter (Phred quality scores of 20 or more; see Materials and Methods), a total of 37.58 Gb (94.36%) of filtered reads remained. The filtered reads were then assembled de novo using the software package SPAdes (ver. 3.14.0). Our initial genome assembly comprised 14,816 contigs, with a total length of 196,561,892 bp. The largest contig was 210,881 bp long, and the N50 length was 12,818 bp. Although the data obtained were insufficient for downstream analysis, no genomic data are available for Branchipolynoe spp. or other parasitic polychaetes; we therefore performed gene prediction to identify CHS protein-coding regions (see Table 2 for general information).
Table 2. General information from next-generation sequencing to gene annotation in Branchipolynoe onnuriensis.

Gene Prediction and Chitin Synthase Search
Gene structure prediction was conducted using the ab initio method, which yielded 353,344 protein-coding genes. To extract CHSs from B. onnuriensis, we performed sequence similarity searches, extracted the five best hits for each group (type 1; groups A, B, C, and D in type 2), and investigated the sequences thoroughly (Table 3). Two genes were identified for type 1; four each for groups A, B, and D; and five for group C. The top-hit gene in each group was assumed to be the BonCHS gene belonging to the corresponding group. In group B, however, the third top-hit gene, g91735.t1, was considered the candidate CHS gene, because the first and second genes, g38534.t and g45117.t1, belonged to groups D and A, respectively. In addition, a phylogenetic analysis performed for more sensitive identification showed no outliers (Figure 1). Therefore, we designated these five genes as putative BonCHS genes.
In addition, the number of genes in each group was determined to be one, except for group C. For example, among the hits for group A, the first top-hit gene belonged to group A, and the other three genes belonged to groups B, C, and D. In group C, however, the fourth and fifth top-hit genes belonged to groups D and A, respectively, whereas the group membership of the second and third top-hit genes was unknown. We therefore added these two genes to the phylogenetic tree, and both were placed in the group C clade. Thus, the number of genes in group C was determined to be three. Consequently, from the similarity search and phylogenetic analysis, we extracted seven different CHS genes from B. onnuriensis: BonCHS1-BonCHS7.

Protein Domain Search, Identification of the GT2 Family, and Multiple Sequence Alignments
The domain structures of the seven BonCHS genes (BonCHS1-BonCHS7) were predicted using the SMART web server (http://smart.embl-heidelberg.de, accessed on 23 March 2022). All BonCHS sequences except BonCHS3 and BonCHS5 contained the chitin_synth_2 domain (Pfam domain: PF03142). We suppose that the three genes in group C (BonCHS3, BonCHS4, and BonCHS5) were only partially assembled, owing to the limitations of Illumina short-read sequencing and the lower coverage depth. However, BLAST searches against the NCBI and UniProt web servers showed that the top hits for all BonCHSs except BonCHS6 were CHS genes of lophotrochozoan species (Table 4). Furthermore, multiple sequence alignment was performed using 45 amino acid sequences obtained from lophotrochozoans. The two unique motifs, "EDR" (associated with catalytic function) and "QRRRW" (conferring processivity to CHS), were found to be highly conserved in annelids and mollusks, suggesting their significance in chitin synthesis (Figure 2) [7,32].
Table 3. BLAST results with an E-value cut-off of < 1 × 10⁻⁵⁰ and a length of > 300 aa.
The similarity searches against the CAZy database showed that the BonCHSs belong to the GT2 family (Table 5). For all genes, the E-value was < 1 × 10⁻¹³⁰, and identities ranged from 37.02% to 88.46%. Although BonCHS3 and BonCHS5 were not confirmed by the domain searches, their E-values were 0 and 4.67 × 10⁻¹³², with identities of 76.46% and 53.22%, respectively. Note that our analysis failed to find the "EDR" and "QRRRW" motifs in BonCHS7. Thus, BonCHS3, BonCHS5, and BonCHS7 were excluded from the analysis of evolutionary patterns.
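A simple way to screen a protein sequence for the "EDR" and "QRRRW" motifs discussed above is an exact substring scan, sketched below. This illustrates the idea only and is not the procedure used in this study, in which the motifs were examined in a multiple sequence alignment. Because a short tripeptide such as "EDR" can occur by chance, positions reported by such a scan would still need to be checked against the aligned catalytic domain. The function name and the toy sequence are hypothetical.

```python
CHS_MOTIFS = ("EDR", "QRRRW")


def find_motifs(protein_sequence, motifs=CHS_MOTIFS):
    """Map each motif to the 1-based start positions of its exact matches."""
    seq = protein_sequence.upper()
    return {
        motif: [i + 1 for i in range(len(seq)) if seq.startswith(motif, i)]
        for motif in motifs
    }


# Toy sequence, not a real CHS: EDR starts at position 3, QRRRW at position 9.
print(find_motifs("MKEDRAAAQRRRWLL"))  # {'EDR': [3], 'QRRRW': [9]}
```

Other motifs, such as the "WGTRE" motif of domain C mentioned in the Introduction, can be screened by passing them through the `motifs` argument.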

Phylogenetic Analysis of Chitin Synthase
To confirm orthologous relationships and understand the molecular evolutionary history of CHSs, we conducted a phylogenetic analysis that included all type 1 and type 2 CHS genes from NCBI (Figure 1). As suggested by Zakrzewski et al. (2014) [7], type 1 CHS genes generally exist in all metazoans, and BonCHS7 was found to be closely related to OfuCHS6 from O. fusiformis and EmaCHS5 from Elysia marginata. However, since E. marginata EmaCHS6 and S. alveolata SalCHS3 form another clade, type 1 lophotrochozoan CHS is considered a paraphyletic group.
To understand the evolutionary relationship of lophotrochozoan CHS genes, we reconstructed an ML phylogenetic tree with CHS protein sequences from seven annelid species, two gastropods, one polyplacophoran, and five bivalves (Figure 3). Five deuterostome sequences were used as an outgroup. Type 2 CHS genes mainly consist of four groups (groups A, B, C, and D). In each group, the annelid and mollusk clades are clearly separated, with support values of ≥ 87% and 1 from the ML and Bayesian inference, respectively, which suggests that the lophotrochozoan CHS gene duplication event occurred before the divergence of annelids and mollusks [7]. Except for group C, annelid and mollusk CHS genes formed a monophyletic clade in all groups. This means that the O. fusiformis group C CHS gene has undergone a more complex evolutionary process. In the same context, we found three BonCHS genes in group C. Since these genes (BonCHS3-BonCHS5) originated from different contigs, they are more likely to have resulted from gene duplication events than to be isoforms. However, not all polychaetes have increased gene copies in group C. For example, two genes were identified from Platynereis dumerilii in group B, two and three genes from O. fusiformis and B. onnuriensis, respectively, in group C, and two genes each from O. fusiformis and C. teleta in group D, but no duplicated copies were found in group A. Even within the same taxon, Polychaeta, gene duplication did not occur in the same group, which indicates that it is a species-specific event. Moreover, several CHS copies were also found in mollusks (L. gigantidas and E. marginata). Considering that two types of CHS genes with different functions exist in ecdysozoans (forming components of the exoskeleton and the peritrophic matrix), the four groups of lophotrochozoan CHSs may each have a distinct function. Additionally, since B. onnuriensis was collected by chance from its host, Gigantidas vrijenhoeki, its ecological characteristics could not be described beyond its habitat and parasitism. Nevertheless, we obtained evidence of a gene duplication event in group C, which is most plausibly related to these two factors. To determine the underlying mechanisms and functions of lophotrochozoan enzymes, gene and protein characterization studies are required.
Figure 2. Multiple sequence alignment of CHSs from lophotrochozoan species; 23 CHS genes from annelids and 22 from mollusks were used. Gene types are marked next to the name (A2, B2, C2, and D2 for groups A, B, C, and D in type 2 and T1 for type 1). Two highly conserved motifs (EDR and QRRRW) are indicated in bold red. The color code follows physicochemical properties.
[...] Table 1. Genes belonging to the polychaetes are colored in each group (red in group A, orange in group B, yellow in group C, and green in group D). Deuterostome sequences are used as an outgroup. At each node, support values for ML and Bayesian inference are shown in this order. Nodes with support values of <60 are indicated with "-". The arrows indicate annelids with several copies (orange for Platynereis dumerilii, yellow for Owenia fusiformis, green for Capitella teleta, and red for Branchipolynoe onnuriensis). The clades with gradient boxes represent the polychaete species in each group. The scale bar represents amino acid substitutions per site.

Conclusions
Chitin, a natural polysaccharide, is the second most abundant biopolymer on earth and is valuable for many industries. However, compared to the ecdysozoan CHSs, which are relatively well-researched, little is known about lophotrochozoan CHS genes. Therefore, in this study, we collected the parasitic polychaete B. onnuriensis living in the deep sea and conducted WGS to investigate the evolutionary aspects of CHSs. As a first step toward understanding the role of lophotrochozoan enzymes, we successfully identified seven CHS genes (BonCHS1-BonCHS7) and classified them into five groups. Because of the lower coverage depth and