2.1. Highly Conserved Elements in the Mitochondrial Genomes of Ciliates (Ciliophora)
Our computer program based on the original algorithm [18] was used to identify highly conserved DNA elements, referred to as HCEs. As a result, 393 HCEs were identified and assigned unique numbers (see Table S1). Figure 1 shows the tree generated by RAxML [20] from a matrix with 12 rows and 393 columns recording the presence or absence of each HCE in each mitochondrial genome. Note that this popular program accepts a matrix of ones and zeros, which distinguishes it from, e.g., PhyloBayes and the neighbor-joining method. RAxML implements the maximum likelihood (ML) method. The tree is in good, although naturally imprecise, agreement with the species tree based on GenBank taxonomy. In particular, Moneuplotes crassus shares more HCEs with Oxytricha trifallax than with Moneuplotes minuta, while Paramecium aurelia differs notably from P. caudatum in its HCE pattern.
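The presence/absence encoding described above can be sketched as follows. This is a minimal illustration and not the authors' program: the species labels and HCE sets in the toy input are placeholders, and the helper names `build_presence_matrix` and `to_phylip` are hypothetical. The output is a relaxed PHYLIP text of 0/1 characters, the kind of binary matrix that a maximum likelihood program such as RAxML can read under a binary-data model.

```python
def build_presence_matrix(occurrences):
    """Turn {species: set of HCE ids} into a 0/1 row per species.

    Columns follow the sorted list of all HCE ids seen in any species.
    """
    hce_ids = sorted(set().union(*occurrences.values()))
    matrix = {
        species: [1 if hce in found else 0 for hce in hce_ids]
        for species, found in occurrences.items()
    }
    return hce_ids, matrix


def to_phylip(matrix):
    """Serialize the matrix as relaxed PHYLIP: header, then 'name characters'."""
    n_rows = len(matrix)
    n_cols = len(next(iter(matrix.values())))
    lines = [f"{n_rows} {n_cols}"]
    for species, row in matrix.items():
        label = species.replace(" ", "_")  # PHYLIP names must not contain spaces
        lines.append(f"{label} {''.join(map(str, row))}")
    return "\n".join(lines)


# Toy input (placeholder species and HCE assignments, not data from Table S1).
toy = {
    "species A": {138, 287},
    "species B": {138, 315},
    "species C": {287, 299, 315},
}
hce_ids, matrix = build_presence_matrix(toy)
phylip_text = to_phylip(matrix)
print(phylip_text)
```

For the data set described in the text, the same layout would have 12 rows (genomes) and 393 columns (HCEs).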
Five HCEs found in Oligohymenophorea (assigned numbers 138, 234, 287, 290, and 315) neither overlap with gene coding regions nor correspond to RNA species described in Rfam. Four of these five HCEs have been found only within the genus Tetrahymena. The identified HCEs are described in Table S1. Table 1 presents six HCEs found in Oligohymenophorea.
HCE 287 has been found only in four Tetrahymena species. It is located upstream of the large subunit rRNA gene (on the complementary strand). It may be involved in the regulation of transcription or in post-transcriptional modifications of rRNA.
HCE 299. The mitochondrial nad2 and nad7 genes have opposite orientations and close positions in Oligohymenophorea; each of them starts a long operon. The alignment of Nad2 amino acid sequences annotated in GenBank demonstrates that there are almost no conserved positions at the N terminus. Conversely, the nad7 genes are highly conserved, and their 5′ ends overlap with HCE 151 in Ichthyophthirius multifiliis, Tetrahymena malaccensis, T. paravorax, T. pigmentosa, T. pyriformis, and T. thermophila.
This suggests that the nad2 gene overlaps the promoter upstream of nad7. HCE 299, containing a potential promoter, has been found within the nad2 coding regions in Tetrahymena malaccensis, T. pigmentosa, T. pyriformis, and T. thermophila. The CATA sequence (boldfaced in Table 1) corresponds to the YRTA consensus of promoters in plant mitochondria [21].
HCE 234 has been found in all Tetrahymena species between the ymf76 genes (both on the complementary strand). In four species, T. malaccensis, T. paravorax, T. pigmentosa, and T. pyriformis, HCE 234 is neighbored by HCE 290. The TGTA sequence (boldfaced in Table 1) corresponds to the YRTA consensus of promoters in plant mitochondria [21]. Analysis of the potential promoters within HCE 290 and HCE 299 reveals a conserved motif, YRTAnnAATTY. However, the genes around HCE 290 are on the complementary strand.
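The degenerate consensus notation used above (Y = C or T, R = A or G, n = any nucleotide) can be checked mechanically. The sketch below is an illustration only: it translates an IUPAC-style motif into a regular expression and scans a sequence for it. The function names and the example sequence are made up for the demonstration and are not taken from Table 1.

```python
import re

# Subset of the IUPAC nucleotide codes needed for YRTA and YRTAnnAATTY.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}


def iupac_to_regex(motif):
    """Translate a degenerate motif such as 'YRTAnnAATTY' into a regex string."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def matches_consensus(motif, site):
    """True if the whole site fits the degenerate motif."""
    return re.fullmatch(iupac_to_regex(motif), site.upper()) is not None


def find_motif(motif, sequence):
    """Return (start, matched_text) for every occurrence, overlaps included."""
    lookahead = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]


# The three tetranucleotides named in the text all fit YRTA.
for site in ("CATA", "TGTA", "CGTA"):
    print(site, matches_consensus("YRTA", site))

# Invented sequence used only to exercise the longer motif.
hits = find_motif("YRTAnnAATTY", "GGCATAGCAATTCGG")
print(hits)
```

Searching only one strand is a simplification; a fuller scan would also test the reverse complement, which matters for elements flanked by genes on the complementary strand.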
HCEs 138 and 315. The Tetrahymena gene coding for apocytochrome b (cob) is in the opposite orientation to the ymf77 gene. The Tetrahymena pyriformis genome annotation indicates a PAL2 element between these genes, close to ymf77, which is similar to the parasitic PAL2-1 element from the mitochondria of Neurospora, a senescence factor in these fungi [22]. A conserved motif has been found in this intergenic region closer to the cob gene. It corresponds to HCE 138, found in a wide range of species including Ichthyophthirius multifiliis (two regions, both between pairs of the gene encoding the large subunit ribosomal RNA), Tetrahymena malaccensis, T. paravorax, T. pigmentosa, T. pyriformis, and T. thermophila. The different localization of HCE 138 in Ichthyophthirius multifiliis and Tetrahymena spp. confirms that this element is associated with the mobile element rather than with the gene.
The same genome region harbors HCE 315, which was found only in four Tetrahymena species. Three of these four elements contain the CGTA sequence corresponding to the YRTA consensus of promoters in plant mitochondria [21]. This may be a promoter of the operon starting with the cob gene. However, a nucleotide is substituted at this site in T. pyriformis. HCE 315 has not been found in other Oligohymenophorea, which suggests the presence of another promoter upstream of the cob gene in them. Indeed, a potential promoter with a different sequence has been identified in Ichthyophthirius multifiliis and Paramecium spp. Figure 2 shows the alignment of the 5′-leader sequences upstream of the cob gene in Ichthyophthirius multifiliis and Paramecium spp. The conserved region with low similarity to plant mitochondrial promoters is marked in grey; however, this region contains no YRTA site typical of such promoters [21]. The cob gene in these species is surrounded by other genes on the same DNA strand; however, the 5′-intergenic region of cob is relatively long.
2.2. Clustering of Proteins Encoded in Mitochondria in Ciliates
We used our algorithm [23] to divide the proteins encoded in mitochondria into clusters, which are presumed protein families. The obtained protein families are available at [24] as a database, which can be searched by protein phylogenetic profile. It should be noted that different clustering methods are also discussed in [25].
In total, 550 proteins from 12 mitochondria were divided into 63 non-single-element (nontrivial) clusters and 109 single-element clusters (singletons). Most singletons are proteins from Oxytricha trifallax and Nyctotherus ovalis.
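The bookkeeping behind these counts can be illustrated with a short sketch. It does not reproduce the clustering algorithm of [23]; it only assumes that some clustering has already assigned each protein to a cluster and shows how nontrivial clusters, singletons, and the size distribution are tallied. The function name `summarize_clusters` and the toy identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_clusters(assignments):
    """Summarize an existing clustering.

    assignments: iterable of (protein_id, species, cluster_id) tuples.
    """
    members = defaultdict(list)
    for protein_id, species, cluster_id in assignments:
        members[cluster_id].append((protein_id, species))

    sizes = [len(m) for m in members.values()]
    return {
        "n_proteins": sum(sizes),
        "n_nontrivial": sum(1 for s in sizes if s > 1),
        "n_singletons": sum(1 for s in sizes if s == 1),
        # Maps cluster size -> number of clusters of that size.
        "size_distribution": dict(Counter(sizes)),
    }


# Toy assignments (placeholder identifiers, not proteins from the study).
toy = [
    ("p1", "sp1", "c1"), ("p2", "sp2", "c1"), ("p3", "sp3", "c1"),
    ("p4", "sp1", "c2"), ("p5", "sp2", "c2"),
    ("p6", "sp3", "c3"),
]
summary = summarize_clusters(toy)
print(summary)
```

Applied to the figures reported above, the same arithmetic implies that the 63 nontrivial clusters together hold 550 − 109 = 441 proteins, i.e., seven proteins per nontrivial cluster on average.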
Only one cluster, which includes the NADH dehydrogenase subunit 9 (Nad9) proteins, contains paralogs. Specifically, two Tetrahymena species, T. malaccensis and T. thermophila, include very similar pairs of proteins: YP_740744.1 (Nad9_1) and YP_740745.1 (Nad9_2) as well as NP_149392.1 (Nad9_1) and NP_149393.1 (Nad9_2). These pairs appear to have emerged from a recent duplication, presumably in the nearest common ancestor of the two species. Indeed, these species form a clade in two evolutionary trees discussed below, while they essentially form a polytomous group in the HCE-based tree (Figure 1). However, this conclusion can be refined. The proteins in each of the two pairs differ by a single position (specific for each pair), while the four proteins composing these pairs differ by 18 positions. Hence, it is more reasonable to propose independent duplications in these two species. The evolution of these paralogs was reconstructed by generating the tree of the Nad9 cluster using the PhyloBayes program (Figure 3), which in particular demonstrates that each paralog is nearly equidistant from the other proteins of the family. PhyloBayes implements the commonly used Bayesian inference approach.
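The comparison underlying this argument, few differences within each species pair versus many differences between the pairs, amounts to counting mismatched columns in an alignment. The following sketch shows that count on invented sequences; the labels `spA_1`, `spA_2`, `spB_1`, and `spB_2` and the sequences themselves are placeholders and are not the Nad9 proteins discussed above.

```python
from itertools import combinations


def n_differences(seq_a, seq_b):
    """Number of alignment columns in which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_differences(aligned):
    """Differences for every unordered pair, keyed by the sorted name pair."""
    return {
        (x, y): n_differences(aligned[x], aligned[y])
        for x, y in combinations(sorted(aligned), 2)
    }


# Invented aligned sequences: two paralogs in each of two species.
aligned = {
    "spA_1": "MKLAVTGHWQ",
    "spA_2": "MKLAVTGHWE",
    "spB_1": "MRLSVTAHWQ",
    "spB_2": "MRLSVTAHFQ",
}
diffs = pairwise_differences(aligned)
for pair, count in diffs.items():
    print(pair, count)
```

In this toy case each within-species pair differs at one position while every between-species comparison differs at three or more, which is the pattern the text interprets as favoring independent duplications.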
The size distribution of the clusters is shown in Figure 4; the number of proteins in each species in clusters and singletons is given in Table 2.
Finally, all clusters (39 in total) representing at least six species were selected. An alignment was generated for each of them using MUSCLE as described below in Materials and Methods. The trimAl program was then used to remove poorly informative alignment columns. The alignments were concatenated into a single alignment with a total length of 8701 amino acid positions and a missing data ratio of 26%. RAxML was used to generate an evolutionary tree for the mitochondria of the species considered from this concatenated alignment; the tree was in good agreement with the generally accepted taxonomy. Exactly the same tree was generated by the PhyloBayes program (Figure 5). The tree has maximum support at all nodes (100% bootstrap values for RAxML and a posterior probability of 1 for PhyloBayes). Building a tree from a concatenation of protein alignments is common practice in phylogenetic analysis.
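The selection and concatenation steps can be sketched as below. This is an illustration under stated assumptions, not the pipeline used in the study: each cluster alignment is assumed to hold at most one sequence per species, species absent from a cluster are padded with gap characters, and the missing data ratio is taken here as the fraction of padded cells, which may differ from the definition behind the 26% figure. The function name `concatenate_alignments` and the toy alignments are hypothetical.

```python
def concatenate_alignments(alignments, min_species=6, missing_char="-"):
    """Concatenate per-cluster alignments into one supermatrix.

    alignments: list of dicts {species: aligned_sequence}, one dict per cluster.
    Clusters covering fewer than min_species species are dropped.
    Returns ({species: concatenated_sequence}, missing_ratio).
    """
    kept = [aln for aln in alignments if len(aln) >= min_species]
    species = sorted({sp for aln in kept for sp in aln})
    parts = {sp: [] for sp in species}
    missing = 0
    total = 0
    for aln in kept:
        width = len(next(iter(aln.values())))  # all rows share one width
        for sp in species:
            if sp in aln:
                parts[sp].append(aln[sp])
            else:
                # Species not represented in this cluster: pad with gaps.
                parts[sp].append(missing_char * width)
                missing += width
            total += width
    joined = {sp: "".join(chunks) for sp, chunks in parts.items()}
    ratio = missing / total if total else 0.0
    return joined, ratio


# Toy alignments; a threshold of 2 species stands in for the 6 used in the text.
toy = [
    {"A": "MKL", "B": "MRL", "C": "MKV"},
    {"A": "GH", "B": "GY"},
    {"C": "WWWW"},  # covers one species only, so it is dropped
]
joined, ratio = concatenate_alignments(toy, min_species=2)
print(joined, round(ratio, 3))
```

The resulting dictionary can be written out in PHYLIP or FASTA format for tree inference.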