Quantitative Insights into the Contribution of Nematocysts to the Adaptive Success of Cnidarians Based on Proteomic Analysis

Simple Summary Cnidarians (such as corals, anemones, jellyfish) are ancient and successful animals. Previous studies have offered a variety of explanations for the adaptive success of certain cnidarian taxa. However, common strategies for the long-term persistence of cnidarians have not been identified. One factor that may contribute to their evolutionary success of the lineage is the nematocyst, a sting organelle able to deliver venom into prey or enemies. Using bioinformatics analyses, we aimed to quantitatively investigate the role of nematocysts in cnidarian adaptation. We identified the extensive species-specific adaptation in nematocyst proteins (NEMs) and demonstrate that both a unique evolutionary pattern of NEMs and the long evolutionary lag between nematocysts and cnidarians support their key adaptive role. Further, we find NEMs experience approximately 50% more adaptive changes on average compared to non-NEMs, and positively selected cnidarian-conserved proteins are enriched in NEMs. These results support a key role of nematocysts in successful cnidarian adaptation and provide a general quantitative framework for assessing the role of a phenotypic novelty in adaptation. Moreover, the findings will be critical for reassessing the evolutionary history of many established models, enhancing our understanding of both the mechanisms and evolutionary preference of adaptive evolution. Abstract Nematocysts are secretory organelles in cnidarians that play important roles in predation, defense, locomotion, and host invasion. However, the extent to which nematocysts contribute to adaptation and the mechanisms underlying nematocyst evolution are unclear. Here, we investigated the role of the nematocyst in cnidarian evolution based on eight nematocyst proteomes and 110 cnidarian transcriptomes/genomes. We detected extensive species-specific adaptive mutations in nematocyst proteins (NEMs) and evidence for decentralized evolution, in which most evolutionary events involved non-core NEMs, reflecting the rapid diversification of NEMs in cnidarians. Moreover, there was a 33–55 million year macroevolutionary lag between nematocyst evolution and the main phases of cnidarian diversification, suggesting that the nematocyst can act as a driving force in evolution. Quantitative analysis revealed an excess of adaptive changes in NEMs and enrichment for positively selected conserved NEMs. Together, these findings suggest that nematocysts may be key to the adaptive success of cnidarians and provide a reference for quantitative analyses of the roles of phenotypic novelties in adaptation.


Introduction
Cnidarians, a group of invertebrates, have survived five mass extinctions and have persisted for more than 700 million years [1,2]. Cnidarians have a simple body plan but diverse morphologies and complex life cycles [3]. They include more than 12,400 species, both free-living and parasitic, and are distributed in nearly all aquatic [3] and even terrestrial habitats [4]. Previous studies have proposed gene-or environment-based mechanisms to explain the adaptive success of certain cnidarian taxa [2,5]. However, common strategies for the long-term persistence of cnidarians have not been identified. One factor that may contribute to the evolutionary success of the lineage is the nematocyst, a secretory organelle able to deliver chemical arsenals into prey or predators [6,7].
The nematocyst is a phenotypic novelty in cnidarians, observed in fossils from the Middle Cambrian period [1,8]. Each nematocyst consists of a capsule and a coiled tubule through which venom is injected with an ultrafast acceleration of 5 million g [9,10]. The venom induces paralysis for prey capture and may cause pain, possibly for predator deterrence [11]. Nematocysts are also used for locomotion in Hydra and host attachment in parasitic myxozoans [7,12]. Strong selective pressures are expected to result in an arms race. However, it is difficult to quantitatively evaluate the extent to which the nematocyst contributes to adaptation and to explore the molecular mechanisms underlying this trait, in part because the trait is polygenic, making it challenging to quantitatively detect selection acting on multiple genes simultaneously [13]. Previous studies of novelties have focused on a few genes [14] or qualitatively identified a trait as a key innovation by linking it to increased diversification rates [15,16]. This approach may ignore environmental factors and may result in the identification of a complex evolutionary process as a single event [17]. Until now, the role of the nematocyst in cnidarian adaptive evolution and associated evolutionary processes are unclear. Resolving these issues requires knowledge of the specific set of genes that encode the nematocyst proteome. These genes might reveal a unique signature associated with nematocyst evolution and provide a basis for identifying candidate components proximal to nematocyst composition and function [18].
Myxozoa are microscopic, oligocellular endoparasites with simple body organization, but multiple morphologies in their complex life cycles [34]. They have only recently been classified as cnidarians and, together with Polypodium hydriforme, form a sister clade to Medusozoa [3,35]. Myxozoans are the only parasitic group that entailed substantial radiation in cnidarians and represent about 20% of cnidarian species diversity [36]. Myxospores consist of 1-13 polar capsules (referred to as nematocysts below) that have a functionality different from that of free-living species-attaching the host [37,38]. Despite lacking barbs or spines, the discharged tubules of some myxozoans have retained the ability to inject chemicals and other structural elasticity not found in tubules of free-living cnidarians [12,[39][40][41]. Thus, proteomic data of myxozoans are an important addition to understanding the adaptive evolution of cnidarian nematocysts. However, there is only one study to our knowledge on the myxozoan nematocyst proteome, which performed a comprehensive functional and proteomic analysis of Ceratonova shasta [32].
Here, we used three newly generated proteomes for myxozoan nematocysts in conjunction with five previously published proteomes and 110 cnidarian transcriptomes/genomes to determine the extent of positive selection in nematocyst proteins (NEMs) and to characterize their evolution in detail. We detected extensive species-specific adaptations in NEMs as well as evidence for decentralized evolution, in which most evolutionary events involved a non-core NEM set, possibly due to the rapid diversification of NEMs in cnidarians. Additionally, we detected a macroevolutionary lag between nematocyst evolution and the main phases of cnidarian diversification, suggesting that the nematocyst was an evolutionary driving force. This was supported by quantitative analyses of relative adaptation and enrichment of NEMs against reference proteins. We conclude that the nematocyst may be a key determinant of the adaptive success of cnidarians by conferring flexibility (evolvability) and therefore by promoting adaptation to changing selection regimes. Our results provide new insights into the success of cnidarians and build a foundation for quantifying the contribution of a phenotypic novelty to evolutionary success.

Collecting Details
Myxobolus honghuensis was collected from infected allogynogenetic gibel carp Carassius auratus gibelio in Zoumaling Farm, Hubei Province, China on 29 July 2015. Myxobolus wulii was sampled from infected allogynogenetic gibel carp in Yellow Sand Port, Jiangsu Province, China on 8 August 2016. Thelohanellus kitauei was collected from infected common carp Cyprinus carpio in Liuerbao Town, Shenyang Province, China on 11 August 2015. Fish were held on ice before being killed with an overdose of MS-222 (Sigma-Aldrich, Co., Ltd., St. Louis, MO, USA). From each fish, tissue containing one large cyst was homogenized by a manual glass tissue grinder and suspended in 0.1 M phosphate-buffered saline (PBS), pH 7.2, and then filtered through cotton gauze. Myxospores were separated from the filtrate by sucrose gradient centrifugation and Percoll gradient centrifugation in turn, washed several times with distilled water, and then examined microscopically to verify purity and identify. Purified myxospores were either placed into RNAlater (Sigma), frozen in liquid nitrogen and finally stored at −80 • C, or immediately sent to nematocyst isolation as described below. Myxozoan identification was performed based on morphology and 18S sequencing [42,43]. The maintenance and care of experimental animals complied with the National Institutes of Health Guide for the care and use of laboratory animals [44], approved by the animal care and use committee of Huazhong Agricultural University, China (HZAUFI-2015-011).

Nematocyst Isolation and Purity/Integrity Verification
Myxozoan nematocysts were isolated using our previously described methods [45]. Purified myxospores were disrupted by sonication for 3 min (on for 3 s and rest for 1 s) at 85 W electric power. Thereafter, by successively using 50%, 70%, 90% Percoll gradient centrifugation, and 100% Percoll centrifugation, pure nematocysts (50%/70% interface and the middle of the 70% layer) were separated from the sonicated suspension. The nematocysts were intact and pure from contaminating cellular debris based on light microscopy. To provide further evidence of the purity of the nematocysts, we fixed the T. kitauei nematocysts in 3% glutaraldehyde at 4 • C, dehydrated with CO 2 using the critical point method and sputter-coated with gold. The microstructure was characterized with a Japanese SU8010 field emission scanning electron microscope (FESEM) at 3 kV.
For genome sequencing, DNA was extracted from the frozen spores of M. honghuensis and M. wulii using a TIANamp Genomic DNA kit (TIANGEN Inc., Beijing, China). Libraries were built using the TruSeq DNA library kit (Illumina). The~500 bp (M. honghuensis) and 450 bp (M. wulii) insert libraries were sequenced on an Illumina HiSeq 2500 (2 × 125 PE) and Illumina HiSeq 4000 (2 × 150 PE), respectively. After filtering with Trimmomatic v0.33, 19.6 Gb and 18.5 Gb clean data were generated for M. honghuensis and M. wulii and were assembled with SOAPdenovo v2.04 using default parameters [50]. As the genomic and transcriptomic data were to be used specifically for subsequent MS/MS and phylogenomic analyses, we did not perform further systemic post-assembly analyses in this study.

Data Decontamination and Construction Databases for MS/MS Analysis
We used our recently developed method to construct a clean, efficient, and comprehensive protein reference that we denote as the customized comprehensive proteomic reference database (CCPRD) (for details, see [51]). Briefly, nonredundant host databases containing proteins or nucleotide sequences from the host, as well as nonredundant closely related databases containing proteins or nucleotide sequences from species closely related to myxozoans, were constructed. These four databases were then blasted against the transcriptome assemblies using TBLASTN or TBLASTX in BLAST + v2.4.0 [52] (−e value 1 × 10 −5 ). Only transcriptome sequences that exclusively matched the host databases were removed. The retained transcriptome sequences were further blasted against a bacterial database using BLASTX (−e value 1 × 10 −10 ). The positive hits were blasted to the NCBI nonredundant protein database (NR) using BLASTX (−e value 1 × 10 −5 ). Only sequences that were annotated as "bacteria" were removed. For assessment and visualization of contamination in genomes, taxonomic assignment of each contig was carried out using Blobtools v1.0 [53]. Sequences that strongly matched Chordata and Proteobacteria were excluded. The decontamination process was operated conservatively to prevent possible over-decontamination that could result in the loss of a large portion of actually expressed genes and proteins.
The CCPRDs of M. honghuensis, M. wulii, and T. kitauei were built by integrating information from the genomic and RNA-seq data. For genomes, we used GeneMark-ET [54], Augustus [55], and Semi-HMM-based Nucleic Acid Parser (SNAP) [56] for ab initio prediction, Genewise [57] for homology-based prediction, and PASA [58] for transcriptome-assisted prediction. For RNA-seq data, coding sequences were first predicted by TransDecoder [59] and GeneMarkS-T [60]. Next, transcriptome homology-based predictions were performed by a customized Perl script Hercules (https://github.com/qingxiangguo/hercules) (accessed date: 29 March 2017). Furthermore, those transcripts that were translated neither by de novo nor homology-based method were translated into amino acid sequences using the Transeq script from the European Molecular Biology Open Software Suite (EMBOSS) [61]. Finally, all proteins (>30 amino acids) resulting from genomic and RNA-seq data were processed by CD-HIT with a threshold of 100% to collapse the group into a nonredundant data set, leading to the final CCPRD.

Tandem Mass Spectrometry and Proteomic Analysis
Proteins were extracted from the cleaned nematocysts by dissolving them in lysis buffer (4% SDS, 100 mM Tris-HCl pH 7.6, 1 mM DTT). Protein samples (20 µg) were separated on 10% SDS-polyacrylamide gel electrophoresis and visualized with Coomassie Blue R-250 staining ( Figure S1). A 60 µL protein solution was collected, and DTT was added to a 10 mM final concentration, heated at 60 • C for 30 min, and then cooled to room temperature. Then, 200 µL UA buffer (8 M urea, 150 mM Tris-HCl, pH 8.0) was added and centrifuged at 14,000× g for 15 min two times; 100 µL IAA buffer (100 mM iodoacetamide in UA) was added and vortexed at 600 rpm for 1 min. The samples were incubated for 30 min in darkness and were centrifuged at 14,000× g for 15 min. Then, 100 µL UA buffer was added and centrifuged at 14,000× g for 15 min two times, after which 100 µL NH 4 HCO 3 buffer (25 mM) was added and centrifuged at 14,000× g for 15 min two times. Finally, the protein suspensions were digested with 40 µL Trypsin buffer (2 µg Trypsin in 40 µL 100 mM NH 4 HCO 3 ) and vortexed (600 rpm, 1 min) before incubation at 37 • C overnight, and the resulting peptides were collected as a filtrate. The filtrated peptides of each sample were desalted on C18 cartridges (Sigma, St. Louis, MO, USA) and were concentrated by vacuum centrifugation and reconstituted in 40 µL of 0.1% (v/v) formic acid. The peptides were eluted with a linear 110 min gradient of 8-45% and 10 min to 95% acetonitrile with 0.1% formic acid in water at a flow rate of 0.25 µL/min. The samples were analyzed using an Easy-nLC nanoflow HPLC system connected to a Q-Exactive mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA). MS/MS scans were acquired using Q-Exactive with a mass resolution of 70,000 and 17,500, respectively. The MS scan range was 300-1800 m/z with automatic gain control (AGC) target 3.0 × 10 −6 , and the maximum injection time was 50 ms. The AGC target and the maximum injection time for MS/MS were 3.0 × 10 −6 and 60 ms, respectively. The higher-energy collisional dissociation (HCD) mode was used as the fragmentation mode with 27 eV collision energy. The precursor isolation windows were set to 2 m/z. The raw files were searched against the CCPRD using MaxQuant v1.5.3.8 [62]. The search was set at a precursor mass window of 6 ppm and a fragment tolerance of 20 ppm. The enzyme was set to trypsin. Missed cleavage sites were set to 2. For all searches, carbamidomethylated cysteine was set as a fixed modification and oxidation of methionine and N-terminal protein acetylation as variable modifications. The false-discovery rate for peptides and proteins was set to 0.01. To estimate protein abundance, we used the iBAQ (intensity-based absolute quantification) [63] option of MaxQuant.
As a single peptide can be linked to multiple proteins (e.g., in the case of razor proteins), multiple protein IDs can be assigned to the same protein groups. To prevent the interference of redundant proteins in downstream analysis while still retaining maximum information, a leading protein-based protein identification strategy was introduced here. Briefly, an initial table containing the protein group information from three replicates was constructed. After excluding the proteins marked with "only identified by site", "reverse", and "contaminant" from the initial table, leading proteins and majority proteins from all three replicates were merged. The proteins (leading and majority) were annotated with the NR database, and leading protein sequences that needed correction were detected and marked with red color (Dataset S1). Comprehensive contaminants were manually determined with a holistic consideration of blast results, iBAQ, and unique peptide counts. Gene Ontology (GO) analysis and domain predictions were performed for the leading proteins with the BLAST2GO tool v4.19 [64] and InterPro [65], respectively. The final table was improved by correcting the leading protein sequences, removing the comprehensive contaminants and recalculating the sequence length and iBAQ%. To evaluate the repeatability within replicates and the consistency between leading and majority proteins, polar capsule proteins identified in different selection criteria in three replicates were compared. The Venn diagram in Figure S2 indicated good repeatability within replicates and that downstream analysis based on leading proteins had good representativeness for that of majority proteins. The identified proteins (accession numbers, sequences, Gene Ontology (GO) annotations, and InterPro results) are summarized in Dataset S1.

Comparative Proteomic and Venomic Analyses
We identified orthologs using OrthoMCL v2.0.9, which applies the Markov clustering algorithm [66]. Default options were used in all steps for nematocyst proteomes from eight species, namely, myxozoans M. honghuensis, M. wulii, T. kitauei, and C. shasta [32], and the free-living cnidarians Hydra vulgaris, Anemonia viridis, Aurelia aurita [21], and C. fleckeri [24]. To compare functionalities, BLAST2GO v4.19 [64] was used to obtain the InterPro domains found in the various NEMs, with comparisons performed using Microsoft Excel. For both orthologs and domains, the intra-group comparisons of the myxozoans were visualized in R package VennDiagram [67], while the results from the eight nematocyst proteomes were visualized in UpSetR [68]. A comparison between the myxozoan nematocyst proteomes and the four free-living cnidarian nematocyst proteomes was carried out using BLASTp (−e value 1 × 10 −20 ). Circoletto [69] and Circos [70] were used to represent the BLASTp results.
To identify the putative toxins, a group of known cnidarian toxin sequences (e.g., metalloprotease, ShK, MAC-PF domains) was used as seed for reciprocal best BLAST hit (RBBH) analysis with nematocyst proteomes (−e value 1 × 10 −10 ). Then, a machine learning-based classifier ToxClassifier [71] was used to exclude the proteins with non-toxic physiological functions. All candidate toxin proteins identified by RBBH or ToxClassifier were validated by blasting against Tox-Prot UniProtKB/Swiss-Prot [72], and only proteins with the best match to toxins were retained (−e value 1 × 10 −5 ). To generate an overview of the distribution pattern of putative toxin protein families across cnidarians, a matrix of the presence and absence of toxins in the present study and 23 well-annotated nematocyst proteomes [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32] was constructed (Dataset S2) and visualized. To avoid dramatically increasing the number of potential toxins that could be explored, proteomes from whole tissues (e.g., tenacles, mucus) were not included. The toxin information was extracted directly from those original studies.

Ortholog Assessment, Gene Selection, and Data Filtering
For the cnidarian phylogeny dataset, a new supermatrix was assembled from 110 cnidarian transcriptomes and/or genomes, including our newly generated myxozoan data (Dataset S3). HaMStR v13.2.3 [73] was used to determine orthologous genes following the protocol in Reference [74]. Translated genes for all taxa were searched against the basal model organism core ortholog set of HaMStR using the strict option and Nematostella vectensis as the reference taxon. The orthologous groups were then aligned using MAFFT v7.205, and ambiguously aligned regions were trimmed with trimAL v1.4.3 [75]. Alignments that covered fewer than 50 taxa were deleted. Filtering of paralogs was performed in PhyloTreePruner v1.0 [76] based on trees generated with FastTree v2.1.11 [77] using default settings. The protein alignments were concatenated into an initial cnidarian matrix of 51,598 amino acid positions across 146 genes with 103 taxa. TreSpEx v1.1 [78] was used to detect possible effects of sequence biases (LBA and saturation) in our phylogenomic data. LB scores and pairwise distances [78] were calculated for each taxon and OG. Following Struck [78], these values were plotted in R, and outliers were identified as taxa or genes that could cause LBA artifacts or saturation. In addition, taxon-specific LB scores for each gene were calculated, and a heatmap of these scores was generated to identify LBA in a specific taxon (LB score higher than 200). To identify exogenous contamination missed by our initial orthology inference approach, we blasted the alignments against a database that included a set of representative cnidarian sequences and a set of representative bilaterian sequences. Then, we removed all sequences that had better BLAST hits to bilaterians than to cnidarians. Finally, a series of pruning experiments were conducted by removing those taxa and genes identified to be LBA artifacts, saturated, and contaminations. The outline for phylogenomic analysis is summarized in Figure S3.
For the nematocyst phylogeny, nematocyst core proteins were identified by OrthoMCL v2.0.9 from eight nematocyst proteomes and were used as queries to search annotated proteins from cnidarian transcriptomes or genomes with BLASTp. Reciprocal best hits recovered at a cutoff of 1 × 10 −5 among the cnidarian proteins and the matched sequences were considered to be orthologs. As a secondary check, HMMER v3.0 [79] was also used to find potential homologs. Retrieved amino acid sequences were aligned with MAFFT v7.205 [80] by the L-INS-i method. Multiple alignments were trimmed using Gblocks v0.91b [81]. In cases where there were multiple copies of homologs of a particular gene, single-gene trees were built using IQ-TREE v1.6.4 [82] and were then manually inspected to remove suspicious sequences and paralogs using FigTree v1.4.3 (http://tree.bio.ed.ac. uk/software/figtree/) (accessed date: 5 November 2017). Single-gene alignments were concatenated using SCaFoS v1.2.5 [83].

Phylogenetic Analysis
The phylogenomic and nematocyst datasets were analyzed with both ML and Bayesian inference (BI) methods. The ML analyses were conducted using RAxML v.8.2.12 [84] and IQ-TREE v1.6.4 [82]. Nonparametric ML bootstrap support was obtained from 100 ML replicates using the PROTGAMMAAUTO model implemented in RAxML. The IQ-TREE-inferred ML phylogeny used the LG + C20 + F + Γ4 model plus 10,000 ultrafast (UF) bootstrap approximations. Bayesian Inference was performed with PhyloBayes MPI v1.4 [85] using the CAT + GTR + Γ4 model. We ran four Markov chain Monte Carlo (MCMC chains for 20,000 generations and saved every second tree. We tested for convergence (maxdiff < 0.1) using bpcomp, implemented in PhyloBayes, with a burn-in of 20%.
Approximately unbiased [86] tests of various competing phylogenetic hypotheses were conducted using loose constraints with the hypothetical groupings and optimizing these constraints under LG +Γ4 + F + C20 + PSMF in IQ-TREE using the 146-gene (51,598 amino acid positions) dataset. The optimized trees were compared using the approximately unbiased test with 10,000 RELL bootstrap replicates. Maximum log-likelihoods of each constraint and their differences from the optimal ML tree were calculated and compared. The hypotheses within the 95% confidence interval that could not be rejected were where p > = 0.05.
To assess if fast-evolving genes were biasing inferences [87], we estimated the rates of evolution per site with Dist_Est [88] under the LG + Γ4 model and removed the fastest evolving sites in a stepwise fashion (3500 sites per step). Each step was analyzed using 10,000 MLBS pseudoreplicates in IQ-TREE under LG + Γ4 + F or LG +Γ4 + F + C20 + PSMF.

Divergence Time Estimation
The divergence times of nematocyst core proteins and the major cnidarian lineages were calculated across the best ML tree, which was generated by RAxML, using Penalized likelihood (PL) implemented in r8s v1.7.1 [89]. Cross-validation was tested to determine the best smoothing value, and the smoothing parameters (cvstart = −8; cvinc = 1; cvnum = 15) were set to 1 × 10 −6 . Four calibration points were used: the node of crown cnidarians was constrained with a maximum age of 741 Mya (million years ago) [90], the minimum age of Medusozoa was set to 570 Mya [91], the minimum age of Hexacorallia was set to 540 Mya [90], and the rise of Hydrozoa was fixed at 500 Mya [8].

Selection Analyses and Adaptation Quantification
Selection analysis was performed using Hyphy v2.3.14 [92]. For individual toxins (astacin, C-type lectin, disintegrin, Kunitz, ShK), site-specific selection analysis of nucleotide sequences was performed using MEME and FEL. The branch-site test was conducted by aBSREL. The gene-wide episodic diversifying selection was performed using BUSTED.
To quantitatively evaluate the role of nematocyst proteins in cnidarian adaptation, orthologs in the genomes of eight cnidarians (Acropora digitifera, H. vulgaris, Orbicella faveolata, Stylophora pistillata, Kudoa iwatai, T. kitauei, Exaiptasia diaphana, N. vectensis) [35,[93][94][95][96][97][98][99] were identified by OrthoMCL. We then classified these proteins into nematocyst proteins (NEMs) and proteins with no nematocyst-match (non-NEMs) by blasting against a protein repertory containing the same eight nematocyst proteomes as in comparative proteomics. We used the multiple alignments of the coding sequences from the eight cnidarians listed above to quantify adaptation across cnidarians using a method modified from Enard et al. [100]. Briefly, BUSTED was used as a threshold to determine whether genes were included in analyses. If the threshold for BUSTED p-value was set to 10 −x , we only included in the quantification the adaptation of coding sequences with BUSTED p-value ≤ 10 −x . For a certain BUSTED p-value, we assessed the amount of adaptation experienced by a particular protein (NEMs or non-NEMs) by estimating the average proportion of selected codons from the aBSREL test along all cnidarian branches. Then, the adaptation for whole NEMs (or non-NEMs) was quantified as the average proportions of selected codons in branches across NEMs (or the same number of randomly matched non-NEMs; see the permutation scheme described below). We then compared NEMs and non-NEMs for weak (BUSTED p-values ≤ 0.9) to increasingly strong (BUSTED p-values ≤ 10 −x ) evidence of adaptation. We calculated the excess of adaptation across cnidarians in NEMs, which is measured as the extra percentage of adaptation in NEMs compared to non-NEMs.
Non-NEMs were sampled by the permutation scheme. We used PAML to calculate dN and dS under M8 model [101] (Dataset S4). The average dN in NEMs was first calculated and noted as dN (NEM) . Then, we formed a range in which the average dN of sampled non-NEMs should fall. The infimum and supremum were defined as dN (inf) = 0.95 (αdN (NEM) ) and dN (sup) = 1.05(αdN (NEM) ). We then started to sample five non-NEMs randomly and continued until the average dN fell into [dN (inf) , dN (sup) ]. The average dN of non-NEMs was kept within that interval. However, for every X non-NEM, the sampling was completely random. If the average dN of non-NEMs was not within [dN (inf) , dN (sup) ], we continued to sample until it went back to our expected interval. In this way, the X in fact controls the variance of the sampling results, and the combination of α and X can be tested through a series of trial and error. To define α and X, we conducted 10 4 random samplings of non-NEMs, and it was found that α = 0.95 and X = 3 yielded the closest dN/dS variance (0.001043401 in NEMs vs. 0.001075036 in non-NEMs, p = 0.7631) and slightly different dN/dS means (NEMs mean = 0.04989917 vs. non-NEMs mean dN/dS = 0.0551787, p = 0.0206). Further, Fisher's exact test was used to determine whether there was adaptation enrichment in NEMs considering the whole cnidarian orthologs as a background.
Only 120, 109, and 104 domains were shared between the myxobolids and C. shasta ( Figure 2B). Within the myxobolids, 435 and 325 domains were shared between T. kitauei and M. honghuensis and between T. kitauei and M. wulii. A total of 557 domains were shared between M. honghuensis and M. wulii, constituting 38.4% and 86.1% of the total domains. Eighty-three domains were conserved among myxozoan NEMs. Contrary to the limited number of shared domains, all myxozoan NEMs had highly similar functional distributions. The proteomes were characterized by enzymes, structural proteins, and novel sequences. Many enzymes were part of the venom proteome, and structural proteins were composed mainly of sequences with ECM motifs (Figure 2C-E). A global comparison including freeliving and parasitic cnidarians showed that the species-specific components ranked the highest for both OG (orthologous groups) and domain intersections ( Figure 2F,G).     We next focused on the inter-group comparison of NEMs between myxozoans and free-living cnidarians. Estimates of sequence similarity were inconsistent with phylogenetic relationships (Figure 3). For example, all myxozoans had more common proteins and higher amino acid sequence similarity to those of the sea anemone A. viridis than to those of the moon jellyfish A. aurita. Likewise, there were as many as 201 NEM matches between M. honghuensis and A. viridis, nearly twice as many matches obtained for M. honghuensis vs. H. vulgaris.
domains. Eighty-three domains were conserved among myxozoan NEMs. Contrary to the limited number of shared domains, all myxozoan NEMs had highly similar functional distributions. The proteomes were characterized by enzymes, structural proteins, and novel sequences. Many enzymes were part of the venom proteome, and structural proteins were composed mainly of sequences with ECM motifs (Figure 2C-E). A global comparison including free-living and parasitic cnidarians showed that the species-specific components ranked the highest for both OG (orthologous groups) and domain intersections ( Figure 2F,G).
We next focused on the inter-group comparison of NEMs between myxozoans and free-living cnidarians. Estimates of sequence similarity were inconsistent with phylogenetic relationships (Figure 3). For example, all myxozoans had more common proteins and higher amino acid sequence similarity to those of the sea anemone A. viridis than to those of the moon jellyfish A. aurita. Likewise, there were as many as 201 NEM matches between M. honghuensis and A. viridis, nearly twice as many matches obtained for M. honghuensis vs. H. vulgaris.

Myxozoan Venom Analysis
In total, 67, 74, and 67 putative toxins were identified in the M. honghuensis, M. wulii, and T. kitauei proteomes, respectively (Dataset S2). According to their predicted biological functions, these toxins were classified into five categories (Figure 4). The proportional distributions of these toxin families were similar among the three species (Dataset S2). Peptidases were the most diverse putative toxins, including metalloproteinases, cysteine proteases, serine proteases, and aspartic peptidases. The metalloproteinases were alanine aminopeptidases, astacins, ADAMs, carboxypeptidases, and thimet oligopeptidases. The relative distributions of the other enzymes in the three species were also similar and, in particular, were dominated by transferases, phosphatases, and peroxidases. In addition, other toxins, such as neurotoxins (ShK-like and Kunitz-type), protease inhibitors, C-type lectins, and CRiSPs, were identified in the NEMs of the three myxobolids. teases, serine proteases, and aspartic peptidases. The metalloproteinases were alanine aminopeptidases, astacins, ADAMs, carboxypeptidases, and thimet oligopeptidases. The relative distributions of the other enzymes in the three species were also similar and, in particular, were dominated by transferases, phosphatases, and peroxidases. In addition, other toxins, such as neurotoxins (ShK-like and Kunitz-type), protease inhibitors, C-type lectins, and CRiSPs, were identified in the NEMs of the three myxobolids. Overall, the myxozoan toxin family was under negative selection (ω < 1) (Dataset S6). However, using the branch-site unrestricted statistical test for episodic diversification (BUSTED), we found evidence for episodic gene-wide positive selection across the myxozoan astacin-like toxin family (p = 0.007). This indicates that at least one site on at least one branch of the myxozoan-dominated astacin-like toxin subclade underwent positive selection based on a comparison with the free-living cnidarian branches and genes. Using adaptive branch-site random effects likelihood (aBSREL), we also found evidence for episodic positive selection on the M. honghuensis branch in the C-type lectin phylogeny.

NEM Core Set
The nematocyst core set (NEMs shared by all eight cnidarians evaluated in the present study) included five proteins, most of which have housekeeping functions (Table 1). Overall, the myxozoan toxin family was under negative selection (ω < 1) (Dataset S6). However, using the branch-site unrestricted statistical test for episodic diversification (BUSTED), we found evidence for episodic gene-wide positive selection across the myxozoan astacin-like toxin family (p = 0.007). This indicates that at least one site on at least one branch of the myxozoan-dominated astacin-like toxin subclade underwent positive selection based on a comparison with the free-living cnidarian branches and genes. Using adaptive branch-site random effects likelihood (aBSREL), we also found evidence for episodic positive selection on the M. honghuensis branch in the C-type lectin phylogeny.

NEM Core Set
The nematocyst core set (NEMs shared by all eight cnidarians evaluated in the present study) included five proteins, most of which have housekeeping functions (Table 1). Elongation factor 1 alpha (EF1α) is a key component of protein biosynthesis [102]. Histones (H2A and H2B) are the chief protein components of chromatin [103]. Heat shock proteins are important molecular chaperones [104]. Nematogalectin is a major structural component of the nematocyst tubule [105]. We then performed database searches and phylogenetic analyses of these proteins to determine the variant types. The EF1α and HSP (heat shock protein) sequences were identified as EF1α-1 and heat shock cognate 71 kDa (Hsc70) based on sequence comparisons. The histones were assigned to canonical H2A and canonical H2B type using HistoneDB2.0 [106]. The nematogalectin was identified as the ancient nematogalectin-related type by a gene family analysis of cnidarian nematogalectins ( Figure S4).

Cnidarian Organismal and Nematocyst Phylogenetic Analyses
Our initial cnidarian organismal set consisted of 105 taxa, 146 orthologous groups, 51,598 amino acid positions, and 24.8% missing data ( Figure S5). Both Bayesian analyses using the CAT model and a maximum likelihood (ML) analysis using the PMSF model strongly recovered the following relationship: (((Myxozoa + Polypodium) + Medusozoa) + Anthozoa). However, the myxozoan H. yanchengensis was grouped within Bilateria ( Figure S6) even after sequence bias correction ( Figures S7-S9) and rogue species deletion (Figures S10-S12). We then applied decontamination methods to Henneguya according to Kayal et al. [107], and the resulting matrix robustly supported the assignment of H. yanchengensis to the myxozoans ( Figure S13). We ultimately obtained a well-supported topology with good resolution of relationships among all major lineages of cnidarians ( Figure 5A), consistent with recently proposed relationships within Cnidaria [107].
Approximately unbiased (AU) tests were performed to compare alternative topologies proposed in previous studies or obtained in our analysis ( Figure S14). Most alternative topologies for Myxozoa + Polypodium were significantly rejected ( Table 2). To further examine these conflicts, we estimated the rate of evolution at sites in the supermatrix of 146 proteins from 105 taxa and removed sites in a stepwise manner according to the rate of evolution, plotting bootstrap values for nodes of interest in the tree per site deletion step under the models LG + F + Γ4 and LG + C20 + F + Γ4 + PMSF estimated using IQ-Tree ( Figure 5B). With the progressive removal of the fastest evolving sites, the topologies estimated by both the site-homogeneous and site-heterogeneous model and their support values were unaffected until well over half of the data set was removed. Overall, the conventional topology for the main lineages of cnidarians and class-level relationships within Medusozoa were consistently and robustly supported. Table 2. Approximately unbiased test of the cnidarian phylogeny. Each tree was loosely constrained with the hypothetical groupings and optimized under LG + Γ4 + FMIX (empirical, C20) in IQ-tree using the 146-gene (51,598 sites) data set. The optimized trees were compared using the approximately unbiased test with 10,000 RELL bootstrap replicates. Maximum log-likelihoods of each constraint and their differences from the optimal ML tree are listed. The hypotheses within the 95% confidence interval that could not be rejected are where p ≤ 0.05. POLYPODIOZOA, Polypodium hydriforme.  Approximately unbiased (AU) tests were performed to compare alternative topologies proposed in previous studies or obtained in our analysis ( Figure S14). Most alternative topologies for Myxozoa + Polypodium were significantly rejected ( Table 2). To further examine these conflicts, we estimated the rate of evolution at sites in the supermatrix of 146 proteins from 105 taxa and removed sites in a stepwise manner according to the rate of evolution, plotting bootstrap values for nodes of interest in the tree per site deletion step under the models LG + F + Γ4 and LG + C20 + F + Γ4 + PMSF estimated using IQ-Tree ( Figure 5B). With the progressive removal of the fastest evolving sites, the topologies The nematocyst tree successfully recovered the higher-level phylogeny of Cnidaria: (((Myxozoa + Polypodium) + Medusozoa) + Anthozoa) ( Figure 6A). We also recovered the two sister clades, Octocorallia and Hexacorallia, in Anthozoa and a monophyletic Medusozoa with the well-accepted topology of (((Cubozoa + Scyphozoa) + Staurozoa) + Hydrozoa). However, the nematocyst phylogenetic reconstruction was inconsistent with the lower-level taxonomy of Cnidaria (i.e., at the genus or species levels). Biology 2021, 10, x FOR PEER REVIEW 15 of 27

Major Events in Nematocyst Evolution
In the NEM core set, evolutionary events before the diversification of the main lineage only involved the gain of genes encoding nematogalectin/minicollagen and co-option of the housekeeping histones H2A, H2B, EF1α, and Hsc70. By comparing the phylogeny

Major Events in Nematocyst Evolution
In the NEM core set, evolutionary events before the diversification of the main lineage only involved the gain of genes encoding nematogalectin/minicollagen and co-option of the housekeeping histones H2A, H2B, EF1α, and Hsc70. By comparing the phylogeny of cnidarians ( Figure 5, Table 2) with that of nematocyst core proteins ( Figure 6A), we examined the evolution of the core set after diversification of the main lineage and found several inconsistencies in the placement of Cerianthidae, Antipatharia, and Palythoa variabilis. This most likely resulted from the transfer of the entire nematocyst gene complexes between Anthozoa lineages after their separation from other major cnidarian groups. In addition to gene transfer, post-expansion evolutionary events in the putative core set included the expansion of nematogalectin and minicollagen (if counted) by duplication and divergence.
By contrast, the non-core NEM set showed massive gene recruitment, gene duplication, and gene decay events. Some toxins, such as the Kunitz-type family, were recruited independently into Anthozoa/Scyphozoa/Myxobolidae; the Kunitz-type family was duplicated in Myxobolidae ( Figure S15). Actinoporins were gained at the base of cnidarians and subsequently lost in myxozoans. All venom proteins were lost in Ceratonova (Figure 4). We also observed the extensive domain shuffling of the ShK toxin ( Figure S16) and domain recruitment in C-type lectin ( Figure S17) and metalloproteinases ( Figure S18).

Divergence Time Estimates
All divergence time estimates are summarized in Table 3. Nematocyst evolution was always one step ahead of species evolution, with a gap of 33-55 million years ( Figure 6B). The last common ancestor of cnidarians occurred during the Tonian period, approximately 736 Mya. Medusozoa originated in 626.6 Mya. We estimated that the most recent common ancestor (MRCA) of all myxozoans dates back to the late Cambrian (492.6 Mya). Table 3. Comparison of divergence times for major nodes shared across eight studies. Numbers in the parentheses represent the median times and 95% HPD interval (Mya). "-" = Not available. "M + P" means Myxozoa + Polypodiozoa; "Coral" means Corallimorpharia.

NEM and Non-NEM Selection and Adaptation Analyses
All NEMs showed approximately 50% more adaptive amino acid changes, on average, compared to estimates in non-NEMs. NEMs with the strongest evidence of adaptation (BUSTED p-values < 10 −5 ) had up to 60% excess of adaptive substitutions ( Figure 7A, permutation test p < 2.2 × 10 −16 , 10 7 iterations). These NEMs were enriched for 20 GO (Gene Ontology) functions, including eight GO categories associated with more than 50 NEMs. Among the top 10 GO categories, seven were related to NEMs with a 40% or greater excess of adaptation (permutation test p < 0.05 in all cases). GO terms associated with a strong excess of adaptation included cellular processes, such as metabolism, cellular component organization or biogenesis, and biological regulation, as well as supracellular processes related to development ( Figure 7B). Moreover, Fisher's exact test of NEMs against background proteins showed that selective pressure was significantly enriched in NEMs (p = 0.004658, 0.01117, and 0.007638 at the 0.05, 0.01, and 0.001 levels; Figure 7C). Ontology) functions, including eight GO categories associated with more than 50 NEMs. Among the top 10 GO categories, seven were related to NEMs with a 40% or greater excess of adaptation (permutation test p < 0.05 in all cases). GO terms associated with a strong excess of adaptation included cellular processes, such as metabolism, cellular component organization or biogenesis, and biological regulation, as well as supracellular processes related to development ( Figure 7B). Moreover, Fisher's exact test of NEMs against background proteins showed that selective pressure was significantly enriched in NEMs (p = 0.004658, 0.01117, and 0.007638 at the 0.05, 0.01, and 0.001 levels; Figure 7C).

Nematocyst Heterogeneity and Organelle-Organism Discordance Support Extensive Species-Specific Adaptation
Our interspecific comparisons strongly support extensive species-specific adaptations in nematocyst evolution. Only 34 orthologs in nematocyst proteomes were shared among myxozoans. This heterogeneity was consistent with previous results for free-living cnidarians [21]. Comparing myxozoan data to data for free-living cnidarians, only four of the 34 orthologs were myxozoan-specific, and this number might be reduced in the future as sampling is expanded. Our results suggest that a myxozoan nematocyst "molecular

Nematocyst Heterogeneity and Organelle-Organism Discordance Support Extensive Species-Specific Adaptation
Our interspecific comparisons strongly support extensive species-specific adaptations in nematocyst evolution. Only 34 orthologs in nematocyst proteomes were shared among myxozoans. This heterogeneity was consistent with previous results for free-living cnidarians [21]. Comparing myxozoan data to data for free-living cnidarians, only four of the 34 orthologs were myxozoan-specific, and this number might be reduced in the future as sampling is expanded. Our results suggest that a myxozoan nematocyst "molecular toolbox" was lacking in the common ancestor and was potentially conserved among myxozoans for parasitism.
Despite the considerable differences in NEMs among cnidarians, there were a number of shared domains. Among myxozoans, the functional modules that make up the proteomes were highly similar. These results suggest that the composition and functional heterogeneity of the nematocyst resulted from extensive species-specific adaptation, a process focused more on functions than on specific protein types [21].
The discordance between proteomic similarity and phylogenetic relationships may also be explained by species-specific adaptations. Under changing conditions, the whole nematocyst proteome might have undergone rapid evolutionary changes. Thus, proteomic alterations might be inconsistent with the species phylogeny, especially for fast-evolving lineages. This prediction was tested by mapping proteomic results to the species phylogeny. Myxozoans had NEMs that were similar to those of sea anemones, inconsistent with the expectation that myxozoans are phylogenetically more closely related to Medusozoa than to Anthozoa [21], suggesting that rapidly evolving myxozoans are armed with nematocysts for parasitism. To investigate whether quantitative information from proteomics data would provide insights into species phylogenetic relationships, we compared the relative iBAQ values of 34 shared NEMs among three myxozoans. The abundance patterns of the 34 NEMs were found not to be significantly associated with phylogenetic relationships. However, our results may not reflect the true composition of nematocysts, as different proteins may perform differentially during digestion [21]. Thus, the relationship between the nematocyst protein abundance pattern and the phylogeny is still wide open for discussion, and more quantitative proteomics, including stable isotope labels and optimization of the dissolution method, are needed to quantify the nematocyst composition. Collectively, our results demonstrate that the nematocyst is highly diverse at the proteomic level. The striking interspecific heterogeneity in nematocyst proteome composition may be related to rapid diversification, highlighting its potentially significant role in adaptive evolution.

Dynamic Evolution of Myxozoan Venom and Episodic Positive Selection
Venom stored in nematocysts may be under selection, independent of other nematocyst core/structural proteins. Therefore, the evolution and selection of toxins warrant an independent investigation. We present evidence for the expression and translation of venom toxin homologs in myxobolid nematocysts, consistent with previous proteomic studies of myxospores [21]. Putative toxins are absent from C. shasta [32], suggesting that the composition of venom varies greatly among myxozoan families. The apparent loss of venom toxins in nematocysts of C. shasta could be attributed to a structural adaptation that facilitates host attachment rather than toxin delivery [40]. Despite the significant sequence differences, the functional modules of toxins were similar among the three myxobolids, exhibiting a similar evolutionary trend to that of other non-toxic NEMs. As observed in the comparative analysis of all NEMs, the Myxobolidae NEMs showed a unique toxin distribution pattern that was more similar to that anthozoans than to medusozoans, including abundant putative neurotoxins and a lack of cytolysins [108][109][110]. Our results are inconsistent with the hypothesis that the toxin gene distribution is correlated with species relatedness [108], suggesting that toxin families are not necessarily restricted to certain taxonomic groups but are common throughout cnidarian venom evolution [27].
Animal venoms are theorized to evolve under positive Darwinian selection in an arms race scenario [21]. Previous work has shown that genes encoding toxins in cnidarians experience stronger purifying selection than those of other venomous animals, possibly due to the functional role of these proteins in nematocyst development [21,111,112]. In myxozoans, despite substantial purifying selection across toxin sequences, a subset of genes (i.e., sites) along a subset of lineages showed evidence for positive selection, referred to as episodic positive selection [112]. We speculate that due to the simple morphology, myxozoan venoms are less constrained by nematocyst development. Thus, genomic constraints imposed on venom evolution are partially relaxed, leading to the intensification of selection on myxozoan toxins. Furthermore, this discrete positive selection may correspond with increased potency and/or specificity of a particular toxin or species. Overall, these patterns suggest that myxozoan venoms were subjected to various selective pressures, including wide-spread purifying selection and a few instances of gene-specific or lineage-specific episodic positive selection.

Defining the NEM Core Set
The identification of the core protein set is important for analyses of nematocyst evolution. In accordance with Rachamin et al. [21], we validated the relatively small set of core NEMs, most of which are housekeeping proteins. They include two histones, HSP, EF1α, and nematogalectin. Only nematogalectin is a nematocyst-specific component. Thus, it is possible that these non-nematocyst-specific components are cellular contaminants that can be explained by density gradient centrifugation, leading to the co-elution of nematocyst components with a small fraction of nuclei. However, we think that this explanation is unlikely for several reasons. (1) The purity of nematocysts was verified by light and electron microscopy. (2) The house-keeping proteins have also been reported in previous studies of free-living cnidarians [21,24] and are thought to have non-canonical functions, such as the maintenance of venom homeostasis (HSP), shaping the microtubule scaffold (EF1α), and even acting as venom (histones). (3) Histones have also been recovered from the clean nematocysts of C. shasta extracted by a dielectrophoresis-based microfluidic device. (4) The fraction of these house-keeping proteins in the complete proteome was high, as estimated by iBAQ values. Thus, their presence cannot be explained by chance alone. (5) The housekeeping proteins may have been co-opted by the ancestral nematocyst, suggesting that the core proteins are not necessarily nematocyst-specific. This is supported by the observation that all the core NEMs were the canonical type, in accordance with their ancient origin.
Next, conspicuously missing from our results were the minicollagens detected by Balasubramanian et al. [18], one of the most comprehensive datasets generated to date. This could reflect our isolation process or an insufficient protein solubilization protocol for the resolution of most major structural components, such as minicollagens. However, we do not think the absence of minicollagens is the result of insufficient nematocyst solubilization for three reasons. (1) Minicollagens formed insoluble polymers in mature wall structures and were difficult to dissolve in SDS before analysis, as observed by Rachamin et al. [21], and may reflect differences in extraction methods. (2) We successfully resolved major structural components, such as nematogalectins and ECM-related proteins, constituting about one-third of the Hydra nematocyst proteome [18]. (3) The proportion of structural proteins in our NEMs was lower than that in Hydra but comparable to that reported for the myxozoan C. shasta [32]. Thus, the reduction in structural components in our NEMs might be explained by lineage-specific characteristics, rather than insufficient solubilization. We performed a further proteomic database search using the oxidation of proline and still did not detect minicollagens (Dataset S7). Hence, the lack of minicollagen detection is not due to failure of matching collagenous peptides with proline hydroxylation. Despite the important role of minicollagen in nematocysts, it was not included in the current core set; it shows a complex evolutionary history and the ancestral sequence is unclear [113,114]. Therefore, the task of elucidating the evolution of the nematocyst may rest on establishing how the above core set evolved.

Decentralized Evolutionary Pattern of Nematocysts
With the aid of a robust cnidarian phylogeny and a nematocyst tree based on the core set, the evolution of the core/non-core set before and after the major lineage expansion was examined. We found a pattern of "decentralized modification", in which the putative core set underwent few modifications before and after appearing in the shared ancestor of cnidarians, and most evolutionary events were found in the non-core set. For example, the evolutionary processes involving the core set only included a few innovations, co-option events, expansion of nematogalectin/minicollagen, and frequent gene transfer events in anthozoans ( Figure 6A), the latter of which might be related to the frequent symbiosis between anthozoans and dinoflagellates algae [115]. The non-core set, however, showed massive gene recruitment, gene duplication, and gene decay events. The above evolu-tionary pattern is distinct from that of other organelles or complex systems. e.g., bacterial flagella [116], bivalve shell matrix [117] or bacterial non-flagellar T3SS (NF-T3SS) [118].
The most plausible explanation for the decentralized modification is that the rise of a nematocyst prototype was followed by the rapid diversification of cnidarians and, therefore, there was a relatively short period for evolutionary events to occur [119] before the rapid expansion of nematocysts into various lineages. Subsequently, strong selective pressure led to active lineage-specific modifications and finally contributed to the high heterogeneity of nematocysts. These results provide reasonable support for the key role of the nematocyst in cnidarian evolution.

Macroevolutionary Lag between the Nematocyst Core Set and Cnidarian Diversification
A prerequisite for the nematocyst to be an adaptive driver is that the ancestral nematocyst arose before the main cnidarian lineages diverged. This was tested by comparing the timescale of the nematocyst ancient core complex and organismal evolution under the same chronogram framework. The nematocyst consistently predicated steps in cnidarian evolution with a stride length of 33-55 My. The last common ancestor of cnidarians occurred during the Tonian period, approximately 736 Mya, within the range of 389-1035 Mya [120], earlier than 561-626 Mya [121], but later than the estimates reported by Holzer et al. [5] (723-848 Mya), Park et al. [90] (686-819 Mya), and Waggoner and Collins [122] (700-1200 Mya). Medusozoa was estimated to have originated 626.6 Mya, which is earlier than estimate of Park et al. [90] (571-670 Mya) but later than that of Waggoner and Collins [122] (640-940 Mya). In shallower nodes, such as the origin of cnidarian classes or families, our divergence time estimates are considerably younger than those of Holzer et al. [5]. We estimated that the most recent common ancestor (MRCA) of all myxozoans arose in the late Cambrian (492.6 Mya), later than the Ediacaran, 588 Ma (540-642 Mya), reported by Holzer et al. [5]. These results suggest that nematocysts achieved substantial complexity early in the evolution of these groups and have the potential to be a driving evolutionary force.

Frequent Adaptation in NEMs across Cnidarians
Despite evidence for the essential role of the nematocyst in cnidarian adaptive evolution, the extent to which the trait contributed to the evolution of the lineage remains unclear. For quantitative analysis, we detected episodes of adaptive evolution along each of the eight branches on the cnidarian tree using the adaptive branch-site REL test (aBSREL test) [123] and the BUSTED test [124]. A permutation test showed that adaptation has been much more common in NEMs than in non-NEMs across cnidarians. The strongest excess of adaptation in NEMs reached 60%, and these were related to 20 GO functions. The excess of adaptation in NEMs therefore involves wide functions. Interestingly, some of these functions are consistent with the functions of recently diverged nematocysts, such as response to stimuli and organelle localization. Moreover, a selective pressure enrichment analysis of the NEMs against conserved background proteins indicated that selective pressure on background proteins was significantly enriched in the NEMs, consistent with the idea that selection in cnidarians disproportionately affects the nematocyst. These results suggest that the nematocyst plays a predominant role in adaptive evolution in cnidarians via pleiotropic effects on diverse biological functions.

Future Perspectives
Our analysis benefits from the integration of decades of omics data for cnidarians. However, there are still potential sources of bias that should be addressed in future research. Although our NEM core set is in accordance with previous studies, it may change over time as a result of increased taxon sampling and the optimization of nematocyst solubilization methods. Furthermore, additional genome-and transcriptome-wide searches of cnidarian minicollagens are needed to better clarify the evolutionary trajectories of these proteins [114], which can be integrated into the core set and further improve our understanding of nematocyst evolution.

Conclusions
Starting with a top-down analysis of a phenotype of interest, we observed a strong increase in the rate of adaptive evolution in the nematocyst by a holistic consideration of the proteomic composition, patterns of evolution, chronograms, and quantification of adaptive evolution. The nematocyst likely increases evolvability, enabling adaptation to changing selection regimes and thereby contributing to the striking diversity of cnidarians. The unique role of nematocysts provides an important addition to the catalog of mechanisms underlying adaptive success in the lineage and establishes a promising system for investigating the ecological and evolutionary implications of phenotypic novelties in non-model organisms.
Our results emphasize the importance of quantitative analyses in the role of novelties in evolution and diversification, providing a basis for reassessing the evolutionary history of many established models, with the potential to substantially improve our understanding of both the mechanistic basis of novel traits and evolutionary processes. The present study and other research on cnidarian structures also play an important role as simple models for basic theories of structural development-beginning with the pioneering work of Alan Turing [125][126][127][128][129].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/biology11010091/s1, Figure S1: Protein quantitation of the isolated myxozoan nematocysts, Figure S3: Flowchart of phylogenomic data analysis, Figure S4: Cnidarian nematogalectin ML tree, Figure S5: Visualization of the supermatrix for final phylogenomic constructions, Figure S6: Cnidarian tree inferred by IQ-TREE (105 taxa and 51,598 amino acid sites), Figure S7: Heat map of the taxon-specific long-branch scores for the cnidarian tree (105 taxa and 51,598 amino acid sites), Figure  S8: Density plots of different gene-specific long-branch indices for the 146 genes of the cnidarian tree, Figure S9: Density plots of different gene specific saturation indices for the 146 genes of the cnidarian tree, Figure S10 Figure S14: Alternative topologies among major lineages of cnidarians, Figure S15: Bayesian analysis of the cnidarian type-2 K+ channel toxins, Figure S16: Bayesian analysis of the cnidarian ShK-like toxin domains, Figure S17. Bayesian analysis of the cnidarian C-type-lectin domains, Figure S18: Bayesian analysis of the cnidarian astacin (M12A metalloproteinases), Table S1: Sequencing and assembly statistics for genomes and transcriptomes in this study, Dataset S1: List of nematocyst proteins identified from three myxozoans (each with three replicates), Dataset S2: Overview of the toxin diversity of representative cnidarian species, Dataset S3: Taxa included in the phylogenomic study, Dataset S4: dN and dS of all cnidarian conserved orthologs calculated by PAML under M8 model, Dataset S5: List of the 34 orthologs that are conserved among four myxozoan nematocyst proteins, Dataset S6: Selection analyses of the myxozoan toxin families, Dataset S7: List of nematocyst proteins identified from three myxozoans (each with three replicates and oxidation of proline parameter added).