Highly Conserved Elements and Chromosome Structure Evolution in Mitochondrial Genomes in Ciliates

Recent phylogenetic analyses are incorporating ultraconserved elements (UCEs) and highly conserved elements (HCEs). Models of evolution of the genome structure and HCEs initially faced considerable algorithmic challenges, which gave rise to (often unnatural) constraints on these models, even for conceptually simple tasks such as the calculation of distance between two structures or the identification of UCEs. In our recent works, these constraints have been addressed with fast and efficient solutions with no constraints on the underlying models. These approaches have led us to an unexpected result: for some organelles and taxa, the genome structure and HCE set, despite themselves containing relatively little information, still adequately resolve the evolution of species. We also used the HCE identification to search for promoters and regulatory elements that characterize the functional evolution of the genome.


Introduction
ATP and other compounds are synthesized in mitochondria [1]. Generally, many eukaryotes living under anaerobic conditions either lack mitochondria [2], or contain mitochondrial remnants such as hydrogenosomes or mitosomes. For example, Nyctotherus ovalis, anaerobic, lives in the hindgut of the cockroaches Periplaneta americana and Blaberus sp. [3]; its mitochondria generate hydrogen [4]. The role of mitochondria varies between different organisms, and is reflected in the size of the mitochondrial genomes. Parasitic apicomplexans have extremely small mitochondrial genomes coding for only three proteins and short rRNA fragments [5].
The ciliates (Ciliophora) include parasitic Ichthyophthirius multifiliis which causes death in many freshwater fish species reared in aquaria, fish farms, and aquacultures [6]. Mitochondria of ciliates can serve as targets for therapeutic intervention in parasitic diseases, and analysis of the structure and evolution of their genomes as well as the regulation of their gene expression can be of practical importance, in particular in veterinary medicine, e.g., for organization and veterinary care in fish hatcheries.
We analyzed the mitochondrial genomes in Ciliophora. The phylum Apicomplexa as well as the phylum Ciliophora belong to the superphylum Alveolata. We considered genera that belong to three classes: Armophorea (Nyctotherus), Oligohymenophorea (Ichthyophthirius, Paramecium, likelihood method (ML). This tree is in good but naturally imprecise agreement with the species tree based on GenBank taxonomy. In particular, Moneuplotes crassus more commonly shared HCEs with Oxytricha trifallax than with Moneuplotes minuta, while Paramecium aurelia notably differed from P. caudatum by the HCE pattern.
Life 2017, 7, 9 3 of 11 maximum likelihood method (ML). This tree is in good but naturally imprecise agreement with the species tree based on GenBank taxonomy. In particular, Moneuplotes crassus more commonly shared HCEs with Oxytricha trifallax than with Moneuplotes minuta, while Paramecium aurelia notably differed from P. caudatum by the HCE pattern. Figure 1. The tree of mitochondrial evolution generated using 393 HCEs identified by our algorithm. The tree was generated by the RAxML program based on a matrix with 12 rows and 393 columns, with the matrix cells containing 1 or 0 to indicate the presence or absence of a given HCE in the mitochondrial genome of a given species, respectively.
Five HCEs have been found in Oligohymenophorea (assigned numbers 138, 234, 287, 290, and 315), neither overlapping with gene coding regions nor corresponding to RNA species described in Rfam. Four out of five of these HCEs have been found only within the Tetrahymena genus. The identified HCEs are described in Table S1.  Figure 1. The tree of mitochondrial evolution generated using 393 HCEs identified by our algorithm. The tree was generated by the RAxML program based on a matrix with 12 rows and 393 columns, with the matrix cells containing 1 or 0 to indicate the presence or absence of a given HCE in the mitochondrial genome of a given species, respectively.
Five HCEs have been found in Oligohymenophorea (assigned numbers 138, 234, 287, 290, and 315), neither overlapping with gene coding regions nor corresponding to RNA species described in Rfam. Four out of five of these HCEs have been found only within the Tetrahymena genus. The identified HCEs are described in Table S1. Table 1 exemplifies six HCEs found in Oligohymenophorea. HCE 287 has been found only in four Tetrahymena species. It is located upstream of the rRNA large subunit (on the complementary strand). It can be involved in the regulation of transcription or in post-transcriptional modifications of rRNA.
HCE 299. The mitochondrial nad2 and nad7 genes have opposite orientations and close positions in Oligohymenophorea; each of them starts a long operon. The alignment of Nad2 amino acid sequences annotated in GenBank demonstrates that there are nearly no conserved positions at the N terminus. Conversely, the nad7 genes are highly conserved, and their 5 ends overlap with HCE 151 in Ichthyophthirius multifiliis, Tetrahymena malaccensis, T. paravorax, T. pigmentosa, T. pyriformis, and T. thermophile.
This suggests that the nad2 gene overlaps the promoter upstream of nad7. HCE 299, containing a potential promoter, has been found within the nad2 coding regions in Tetrahymena malaccensis, T. pigmentosa, T. pyriformis, and T. thermophile. The CATA sequence (boldfaced in Table 1) corresponds to the YRTA consensus of promoters in plant mitochondria [21].
HCE 234 has been found in all Tetrahymena species between the ymf76 and ymf66 genes (both on the complementary strand). In four species, T. malaccensis, T. paravorax, T. pigmentosa, and T. pyriformis, HCE 234 is neighbored by HCE 290. The TGTA sequence (boldfaced in Table 1) corresponds to the YRTA consensus of promoters in plant mitochondria [21]. Analysis of potential promoters within HCE 290 and HCE 299 exposes a conserved motif, YRTAnnAATTY. However, the genes around HCE 290 are on the complementary strand.
HCEs 138 and 315. The Tetrahymena spp. cob gene coding for apocytochrome b is in an opposite orientation to the ymf77 gene. The Tetrahymena pyriformis genome annotation indicates a PAL2 element between these genes close to ymf77, which is similar to the parasitic PAL2-1 element from the mitochondria of Neurospora and Podospora, a senescence factor in these fungi [22]. A conserved motif has been found in this intergenic region closer to the cob gene. It corresponds to HCE 138 found in a wide range of species including Ichthyophthirius multifiliis (two regions, both between pairs of the gene encoding the large subunit ribosomal RNA), Tetrahymena malaccensis, T. paravorax, T. pigmentosa, T. pyriformis, and T. thermophila. Different localization of HCE 138 in Ichthyophthirius multifiliis and Tetrahymena spp. confirms that this element is associated with the mobile element rather than with the gene.
The same genome region harbors HCE 315, which was found only in four Tetrahymena species. Three out of four of these elements contain the CGTA sequence corresponding to the YRTA consensus of promoters in plant mitochondria [21]. This can be a promoter of the operon starting with the cob gene. However, a nucleotide was substituted in this site in T. pyriformis.
HCE 315 has not been found in other Oligohymenophorea, which suggests the presence of another promoter upstream of the cob gene in them. Indeed, a potential promoter with a different sequence has been identified in Ichthyophthirius multifiliis and Paramecium spp. Figure 2 shows the alignment of the 5 -leader sequences upstream of the cob gene in Ichthyophthirius multifiliis and Paramecium spp. The conserved region with low similarity to plant mitochondrial promoters is marked in grey; however, this region contains no YRTA site typical for such promoters [21]. The cob gene in these species is surrounded with other genes in the same DNA strand; however, the 5 -intergenic region of cob is relatively long.

Clustering of Proteins Encoded in Mitochondria in Ciliates
We used our algorithm [23] to divide the proteins encoded in mitochondria into clusters, presumable protein families. The obtained protein families are available at [24] as a database, which can be searched by protein phylogenetic profile. It should be noted that different clustering methods are also discussed in [25].

Clustering of Proteins Encoded in Mitochondria in Ciliates
We used our algorithm [23] to divide the proteins encoded in mitochondria into clusters, presumable protein families. The obtained protein families are available at [24] as a database, which can be searched by protein phylogenetic profile. It should be noted that different clustering methods are also discussed in [25].
Only one cluster including NADH dehydrogenase subunit 9 (Nad9) proteins contains paralogs. Specifically, two Tetrahymena species, T. malaccensis and T. thermophila, include very similar pairs of proteins YP_740744.1 (Nad9_1) and YP_740745.1 as well as (Nad9_2) NP_149392.1 (Nad9_1) and NP_149393.1 (Nad9_2), emerging from a recent duplication, presumably in their nearest common ancestor. Indeed, these species form a clade in two evolutionary trees discussed below, while they essentially form a polytomous group in the HCE-based tree (Figure 1). However, this conclusion can be refined. The proteins in each of the two pairs differ by a single position (specific for each pair), while the four proteins composing these pairs differ by 18 positions. Hence, it is more reasonable to propose independent duplications in these two species. The evolution of these paralogs was reconstructed by generating the tree of the Nad9 cluster using the PhyloBayes program (Figure 3), in particular demonstrating that each paralog is nearly equidistant from other proteins of the family. PhyloBayes implements commonly used Bayesian inference.

Clustering of Proteins Encoded in Mitochondria in Ciliates
We used our algorithm [23] to divide the proteins encoded in mitochondria into clusters, presumable protein families. The obtained protein families are available at [24] as a database, which can be searched by protein phylogenetic profile. It should be noted that different clustering methods are also discussed in [25].
Only one cluster including NADH dehydrogenase subunit 9 (Nad9) proteins contains paralogs. Specifically, two Tetrahymena species, T. malaccensis and T. thermophila, include very similar pairs of proteins YP_740744.1 (Nad9_1) and YP_740745.1 as well as (Nad9_2) NP_149392.1 (Nad9_1) and NP_149393.1 (Nad9_2), emerging from a recent duplication, presumably in their nearest common ancestor. Indeed, these species form a clade in two evolutionary trees discussed below, while they essentially form a polytomous group in the HCE-based tree (Figure 1). However, this conclusion can be refined. The proteins in each of the two pairs differ by a single position (specific for each pair), while the four proteins composing these pairs differ by 18 positions. Hence, it is more reasonable to propose independent duplications in these two species. The evolution of these paralogs was reconstructed by generating the tree of the Nad9 cluster using the PhyloBayes program (Figure 3), in particular demonstrating that each paralog is nearly equidistant from other proteins of the family. PhyloBayes implements commonly used Bayesian inference. The size distribution of the clusters is shown in Figure 4; the number of proteins in each species in clusters and singletons is given in Table 2. The size distribution of the clusters is shown in Figure 4; the number of proteins in each species in clusters and singletons is given in Table 2.  Finally, all clusters (39 in total) representing at least six species were selected. An alignment was generated for each of them using MUSCLE as described below in Materials and Methods. The trimAl program was then used to remove low-informative alignment columns. The alignments were concatenated into a single one with a total length of 8701 amino acids and the missing data ratio of 26%. RAxML was used to generate an evolutionary tree for the mitochondria of the species considered from this concatenated alignment; the tree was in a good agreement with the generally accepted taxonomy. Exactly the same tree has been generated by the PhyloBayes program ( Figure 5). The tree has maximum support at all nodes (100% bootstrap values for RAxML and posterior probability of 1 for PhyloBayes). This is a common practice in tree building from protein data.  Finally, all clusters (39 in total) representing at least six species were selected. An alignment was generated for each of them using MUSCLE as described below in Materials and Methods. The trimAl program was then used to remove low-informative alignment columns. The alignments were concatenated into a single one with a total length of 8701 amino acids and the missing data ratio of 26%. RAxML was used to generate an evolutionary tree for the mitochondria of the species considered from this concatenated alignment; the tree was in a good agreement with the generally accepted taxonomy. Exactly the same tree has been generated by the PhyloBayes program ( Figure 5). The tree has maximum support at all nodes (100% bootstrap values for RAxML and posterior probability of 1 for PhyloBayes). This is a common practice in tree building from protein data.

Evolution of Mitochondrial Chromosome Structure in Ciliates
The evolutionary tree of mitochondrial chromosome structures was generated from the distances between them, calculated using the chromosome structure model proposed in [19] and the program available at [26].
The resulting tree is shown in Figure 6. Each genus forms a clade in it. The Armophorea, Oligohymenophorea, and Spirotrichea classes also form clades. The close position of Armophorea and Spirotrichea on the tree is consistent with published data [4]. Thus, there is a largely good agreement between the HCE-based tree (Figure 1), the tree of proteins ( Figure 5), the tree of chromosome structures (Figure 6), and the generally accepted taxonomy. Minor differences between the trees shown in Figures 1, 4, and 5 can be attributed to the small size of the mitochondrial genomes and the corresponding relative scarcity of data. Figure 6. Evolutionary tree of mitochondrial chromosome structures. The tree was generated by the neighbor-joining method using distances between chromosome structures calculated as described in [19].

Reconstruction of Mitochondrial Chromosome Structure in Ciliates
The results of the reconstruction of the mitochondrial chromosome structures in ciliates at the internal nodes of the tree generated by the method described in [19] are presented in Table S2. The

Evolution of Mitochondrial Chromosome Structure in Ciliates
The evolutionary tree of mitochondrial chromosome structures was generated from the distances between them, calculated using the chromosome structure model proposed in [19] and the program available at [26].
The resulting tree is shown in Figure 6. Each genus forms a clade in it. The Armophorea, Oligohymenophorea, and Spirotrichea classes also form clades. The close position of Armophorea and Spirotrichea on the tree is consistent with published data [4]. Thus, there is a largely good agreement between the HCE-based tree (Figure 1), the tree of proteins ( Figure 5), the tree of chromosome structures (Figure 6), and the generally accepted taxonomy. Minor differences between the trees shown in Figures 1, 4 and 5 can be attributed to the small size of the mitochondrial genomes and the corresponding relative scarcity of data.

Evolution of Mitochondrial Chromosome Structure in Ciliates
The evolutionary tree of mitochondrial chromosome structures was generated from the distances between them, calculated using the chromosome structure model proposed in [19] and the program available at [26].
The resulting tree is shown in Figure 6. Each genus forms a clade in it. The Armophorea, Oligohymenophorea, and Spirotrichea classes also form clades. The close position of Armophorea and Spirotrichea on the tree is consistent with published data [4]. Thus, there is a largely good agreement between the HCE-based tree (Figure 1), the tree of proteins ( Figure 5), the tree of chromosome structures (Figure 6), and the generally accepted taxonomy. Minor differences between the trees shown in Figures 1, 4, and 5 can be attributed to the small size of the mitochondrial genomes and the corresponding relative scarcity of data. Figure 6. Evolutionary tree of mitochondrial chromosome structures. The tree was generated by the neighbor-joining method using distances between chromosome structures calculated as described in [19].

Reconstruction of Mitochondrial Chromosome Structure in Ciliates
The results of the reconstruction of the mitochondrial chromosome structures in ciliates at the internal nodes of the tree generated by the method described in [19] are presented in Table S2. The Figure 6. Evolutionary tree of mitochondrial chromosome structures. The tree was generated by the neighbor-joining method using distances between chromosome structures calculated as described in [19].

Reconstruction of Mitochondrial Chromosome Structure in Ciliates
The results of the reconstruction of the mitochondrial chromosome structures in ciliates at the internal nodes of the tree generated by the method described in [19] are presented in Table S2.
The left column of the table lists the tree leaf designated as (l) or terminal (according to Figure 6) leaves descending from the considered internal node. The middle column contains the order of genes on the chromosome at the corresponding node; L and C indicate linear and circular chromosomes, respectively; the asterisk preceding the gene name indicates that it is encoded in the complementary strand. The second chromosome (if any) in the species corresponding to the node begins a new line. The chromosomes at the leaves served as initial data for our algorithm. The right column lists evolutionary events that occurred at the edge incident to the considered node.
The search for HCEs was performed using the algorithm based on the dense subgraph identification described elsewhere [18]. A similar method relying on pseudo-boolean programming is discussed in [27]. The following parameters were used: key length, 8; minimum word length, 24; maximum cost of difference between words, 3.1; minimum overlap length of merged words, 20; number of consecutive deletions, 0; deletion cost, 2.1; maximum key repeat count, 1000; maximum word compaction ratio, 2.2; minimum number of different words in a word and a key, 4 and 3, respectively.
The HCE-based tree in Figure 1 was generated using the RAxML program [20] from a matrix with 12 rows and 393 columns with the cells containing 1 or 0 to indicate the presence or absence of a given HCE in the mitochondrial genome of a given species, respectively. Maximum likelihood search followed by rapid bootstrapping was performed in RAxML v. 8.2.4 with the binary substitution model and maximum likelihood estimate for the base frequencies; number of bootstrap replicates was limited to 300 using the frequency-based criterion.
The distances between chromosome structures as well as the reconstruction of chromosome rearrangements were obtained by the methods described elsewhere [19,28,29]. The default operation costs were used, specifically: the linear variant and double-cut-and-paste, 1.2; sesqui-cut-and-paste, 1.1; a-edge insertion and b-edge deletion, 1; b-edge insertion or a-edge deletion, 0.9; deletion of special a-edges, 2.0; deletion of special b-edges, 2.5. The unrooted tree shown in Figure 6 was generated from the distances between chromosome structures using the neighbor-joining method [30].
Proteins were clustered using the method described and tested in [23,[31][32][33][34]. BLAST hit threshold E = 0.001 and the most relaxed values for additional parameters: L = 0, H = 1, and very high p were used for clustering.
Protein alignment was performed by the MUSCLE program v. 3.8.31 [35] with default settings. Then, the trimAl program v. 1.2 [36] was used to remove positions with more than 50% gaps or with the similarity below 0.001. RAxML [20] and PhyloBayes v. 4.1 [37][38][39] with the MtZoa mitochondrial model [40] were used for tree generation. In the case of PhyloBayes, four chains ran in parallel for more than a thousand cycles. Upon convergence of likelihood values, alpha parameter, and tree length of the four chains, the discrepancy of bipartition frequencies between all chains was equal to zero (as shown by bpcomp utility in the PhyloBayes package). The first hundred cycles of each chain were discarded as burn-in and the majority rule consensus tree containing the posterior probabilities was calculated from the remaining trees of all chains. Both algorithms yielded the same tree. Trees with the same topology were also generated using more general CAT + GTR + Γ model in PhyloBayes and GTR + Γ model in RAxML.
Potential binding sites of transcription factors and promoters were identified using the method described elsewhere [41,42]. This method was used previously to identify binding sites of transcription factors in algal plastids [23,33,34].
GenBank gene annotations overlapping with HCEs were additionally checked against the Rfam database v. 12.1 [43].

Conclusions
At least for some organelles and taxa, the genome structure and HCE set, despite themselves containing relatively little information, still adequately describe the evolution of species. Indeed, the trees of HCEs, proteins, chromosome structures, and species are in agreement for the considered material. HCEs were found in mitochondrial genomes of ciliates (Ciliophora). Families of proteins encoded in mitochondria as well as the evolution of the chromosome structure were described in ciliate species. The data obtained were used to propose a method of combined application of our original methods to describe HCEs, protein families, and chromosome structures and eventually their evolution.