Gene expression patterns vary among species, even among closely related species whose genomic sequences are highly similar. These differences in gene expression and regulation are thought to be major sources of phenotypic variation among species and important factors in evolution [1].
For many years, mutations in transcription factors (TFs) have been considered the least likely source of variation, mainly because they can cause negative pleiotropic effects [2]. When a mutation arises in the protein-coding region of a transcriptional regulator, multiple target genes of that regulator are affected simultaneously, potentially causing large-scale detrimental effects [5]. Genetic perturbations of 304 human/mouse TF orthologs in mouse are associated with phenotypes, and many individual TF loci have strong GWAS signals for multiple diseases [6]. HOX TF genes play a key role in proper body pattern formation [7], while SRY, a TF gene, is important for sex determination [8]. In particular, C2H2 zinc finger proteins were found to diversify rapidly and to account for most of the rapidly evolving human TFs [9].
During the past decade, the ever-increasing number of hidden Markov models of DNA-binding domains (DBDs) and the growing sensitivity of TF detection procedures based on these models have contributed to the expansion of TF databases [11]. Several animal TF databases have been established, such as AnimalTFDB3 [13], Riken mouse TFdb [14], FlyTF [15], TFCat [16], TFCONES [17], ITFP [18], and humanTFs [6]. Collectively, these databases contain variable numbers of TFs from different species, and scanning them suggests that the number of non-orthologous TFs is substantial. Recent research on C2H2 TF families has also revealed the variability of TFs, but the relative frequency and consequences of global variation remain largely unexplored.
Although the systematic mapping of protein–protein interactions (PPIs) is far from complete, it allows developmental and disease mechanisms to be understood at the system level by associating the global topology and dynamic characteristics of the interactome network with known biological characteristics [19]. Orthologous human and mouse TFs show preserved TF–TF interactions in a TF-to-TF network [5]. In contrast, information on the effects of non-orthologous TFs on gene regulatory networks is still limited. TFs with only non-TF interactions are usually ignored in TF-to-TF networks because they lack TF–TF interactions and are considered non-conserved. Because orthologous TFs are shared by both species, they are expected to be the core elements of the regulatory networks. Species specificity may arise either from microevolution of these orthologous TFs or of their downstream target genes in each lineage, or from rewiring of transcriptional networks through the acquisition of new TFs and the loss of existing TFs in each lineage. Because the second scenario has been largely neglected, we attempted to characterize it in this paper. TFs whose interactions with other proteins are poorly documented tend to show low conservation, and mutations in them are less likely to be lethal; they are therefore more likely to contribute to lineage-specific adaptation (reviewed in [21]). What, then, are the TFs with rare TF interactions in TF-to-TF networks? How do these TFs account for the differing numbers of TFs between species?
Building on these observations, we identified such TFs and found that they share the characteristics predicted in previous studies. We further investigated the origins, consequences, and underlying regulatory logic of TF evolution for this set of isolated TFs.
Our TF-to-TF network is based on the STRING database, which compiles protein–protein interactions from several types of evidence (see Materials and Methods). Interactions involving poorly characterized genes may be under-represented. However, because so much human RNA-seq data is available, co-expression coverage is comprehensive. Because TFs regulate gene expression, any such regulatory relationship is likely to be detected as “conserved co-expression” in STRING. Co-expression evidence and evidence from high-throughput laboratory experiments may also provide relatively unbiased information on TF–protein interactions. We retained an interaction whenever any type of evidence supported it; therefore, the isolation of TFs is likely to be genuine rather than an artifact of missing data.
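The inclusion rule above (keep an interaction if any evidence type supports it, then call a TF isolated if it has no TF partner) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the evidence-channel names, the tuple input format, and the protein names are all hypothetical and do not reproduce STRING's download schema.

```python
# Hypothetical evidence-channel names (illustrative only; not STRING's file schema).
EVIDENCE_CHANNELS = ("coexpression", "experiments", "database", "textmining")


def build_tf_network(interactions, tfs):
    """Return an adjacency dict restricted to TFs.

    interactions: iterable of (protein_a, protein_b, {channel: score}).
    An edge is kept if ANY evidence channel has a positive score and both
    endpoints are TFs; TF-to-non-TF edges are ignored for this network.
    """
    tf_set = set(tfs)
    adj = {tf: set() for tf in tf_set}
    for a, b, scores in interactions:
        if a == b or a not in tf_set or b not in tf_set:
            continue
        if any(scores.get(ch, 0) > 0 for ch in EVIDENCE_CHANNELS):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def isolated_tfs(adj):
    """TFs with no TF-TF edge at all."""
    return sorted(tf for tf, partners in adj.items() if not partners)


# Toy input with hypothetical TF and gene names.
interactions = [
    ("TF_A", "TF_B", {"coexpression": 120}),
    ("TF_B", "TF_C", {"experiments": 300, "textmining": 0}),
    ("TF_D", "GENE_X", {"coexpression": 500}),  # only a non-TF partner
    ("TF_C", "TF_E", {"textmining": 0}),         # no positive evidence: dropped
]
tfs = ["TF_A", "TF_B", "TF_C", "TF_D", "TF_E"]

adj = build_tf_network(interactions, tfs)
print(isolated_tfs(adj))  # ['TF_D', 'TF_E']
```

TF_D illustrates the case discussed in the text: it has supported interactions, but only with a non-TF protein, so it is isolated in the TF-to-TF network.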
Our TF database was constructed by collecting sequences that contain DBDs. However, some proteins that contain a DBD have no regulatory function, and some proteins with regulatory function lack sequences similar to any known DBD. The numbers of functional annotations and known DBDs are growing but remain incomplete, and the quality of regulatory-function annotation varies among species. Therefore, our analysis of TF acquisition and loss may be affected by variation in the quality of functional annotation. The analysis will become more robust as well-annotated genomes become available across all mammalian species.
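The DBD-based collection step can be pictured as a simple rule over domain hits. The sketch below assumes the hits have already been produced by some HMM search and are given as (protein, profile, E-value) tuples; the profile list, the E-value cutoff, and the protein IDs are illustrative assumptions, not the study's actual settings. It also makes the caveat above concrete: the rule calls TFs purely from domain presence, so it cannot see regulatory function.

```python
# Illustrative DBD profile names (Pfam-style); the real list used in the study is not given here.
DBD_PROFILES = {"zf-C2H2", "Homeodomain", "HLH", "bZIP_1", "HMG_box"}


def call_tfs(domain_hits, evalue_cutoff=1e-5):
    """Call a protein a TF if any hit to a DBD profile passes the cutoff.

    domain_hits: iterable of (protein_id, profile_name, e_value).
    The cutoff is an assumed default, not a value taken from the study.
    """
    tfs = set()
    for protein, profile, evalue in domain_hits:
        if profile in DBD_PROFILES and evalue <= evalue_cutoff:
            tfs.add(protein)
    return tfs


# Hypothetical hits.
hits = [
    ("P1", "zf-C2H2", 1e-12),
    ("P2", "Homeodomain", 0.01),  # weak hit: not called at the default cutoff
    ("P3", "Pkinase", 1e-30),     # strong hit, but not a DBD
    ("P4", "HMG_box", 3e-8),
]
print(sorted(call_tfs(hits)))  # ['P1', 'P4']
```

Lowering the stringency (for example, `evalue_cutoff=0.1`) would also call P2, which shows how detection sensitivity alone can change the size of a TF catalog.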
In recent years, studies of the C2H2 TF family and several other TF genes have revealed the evolution of TFs [9]. A relationship between TF sequence evolution and changes in DNA-binding properties has also been found [47]. Reports showing that TFs are evolutionarily conserved were based primarily on TFs with known DNA-binding sequence specificities, whereas reports showing that TFs are evolutionarily variable always considered entire TF families. We therefore hypothesized that another type of TF, alongside the well-studied TFs, contributes to overall TF evolution. Three factors have been proposed to explain how TF evolution has circumvented the problem of negative pleiotropy: (1) alternative splicing, (2) short linear motifs, and (3) simple sequence repeats [49]. Until now, however, the regulatory logic behind overall TF evolution has remained unknown.
We found that one-third of TFs constitute a distinct TF type that is isolated in the human TF-to-TF network and tends to be peripheral in the PPI network. These TFs have rarely been reported in previous human TF-to-TF network studies. The characteristics of isolated TFs are consistent with those of proteins associated with lineage-specific phenotypes. Mutations in these isolated TFs are far less likely to be lethal than mutations in other TFs, indicating that the regulatory network is highly tolerant of the evolution of these genes. Because the genes encoding isolated TFs have fewer and weaker interactions, they mediate less pleiotropic regulation. The remaining two-thirds of TFs form a large connected component of the human TF-to-TF network that contains nearly all TFs with known DNA-binding specificities.
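The split between isolated TFs and the large connected component can be computed with a breadth-first search over the TF-to-TF adjacency. This is a minimal sketch on a hypothetical toy graph. The text does not say how TFs in small, separate components are classified, so this sketch reports them in neither group rather than guessing.

```python
from collections import deque


def connected_components(adj):
    """Breadth-first search over an undirected adjacency dict."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def partition_tfs(adj):
    """Return (largest connected component, isolated TFs) as sets."""
    largest = set(max(connected_components(adj), key=len))
    isolated = {tf for tf, partners in adj.items() if not partners}
    return largest, isolated


# Hypothetical TF-to-TF graph: A-B-C is the main component, D-E a small one, F isolated.
adj = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}
largest, isolated = partition_tfs(adj)
print(sorted(largest), sorted(isolated))  # ['A', 'B', 'C'] ['F']
print(f"isolated fraction: {len(isolated) / len(adj):.2f}")  # isolated fraction: 0.17
```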
Our comparative study of mammalian TFs provides an overview of variation in TF membership and demonstrates that TF evolution in mammals is ubiquitous, with changes observed among closely related species and not just between humans and mice. Starting from the same TFs in their common ancestor, TF turnover during mammalian evolution, through species-specific formation and loss events, has gradually produced a unique set of TFs in each species. In our human–mouse comparison, the overall effect of TF formation and loss tends to be one-directional, so that the overall expression level of interacting genes in a species is either relatively higher or relatively lower. Changing the expression levels of functional genes will in turn change phenotypes and pathway efficiency, an idea supported by the evidence in this study.
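A pairwise comparison of TF repertoires reduces to set operations once an ortholog mapping is available. The sketch below assumes a one-to-one human-to-mouse ortholog dictionary and uses hypothetical TF names. Note that a pairwise comparison alone cannot tell a gain on one lineage from a loss on the other; polarizing these events would require an outgroup species.

```python
def tf_turnover(human_tfs, mouse_tfs, human_to_mouse):
    """Split two TF repertoires into shared orthologs and species-specific TFs.

    human_to_mouse: dict mapping a human TF to its (assumed one-to-one) mouse ortholog.
    Returns (shared human TFs, human-only TFs, mouse-only TFs).
    """
    human_tfs, mouse_tfs = set(human_tfs), set(mouse_tfs)
    # A human TF counts as shared only if its mapped ortholog is also a mouse TF.
    shared = {h for h in human_tfs if human_to_mouse.get(h) in mouse_tfs}
    matched_mouse = {human_to_mouse[h] for h in shared}
    return shared, human_tfs - shared, mouse_tfs - matched_mouse


# Hypothetical repertoires and ortholog mapping.
human = {"tfA", "tfB", "tfH1"}
mouse = {"TfA", "TfB", "TfM1", "TfM2"}
orthologs = {"tfA": "TfA", "tfB": "TfB"}

shared, human_only, mouse_only = tf_turnover(human, mouse, orthologs)
print(sorted(shared), sorted(human_only), sorted(mouse_only))
# ['tfA', 'tfB'] ['tfH1'] ['TfM1', 'TfM2']
```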
Isolated TFs have GO term coverage similar to that of connected TFs, which means that isolated TFs can also modulate the wide range of functions mainly regulated by connected TFs. We found that each GO term is regulated by a similar number of TFs in humans and mice, although these TFs are largely non-orthologous.
We propose that the gain and loss of TFs, mainly isolated ones, is functionally meaningful, even though these changes are prevalent and well tolerated by organisms. These changes can strongly affect the properties of interacting genes, such as their interactions and expression. When interacting TFs are lost or newly emerge, the same interacting genes show different expression levels. Because TF evolution has been frequent and widespread throughout mammalian history, it has shaped phenotypes and pathway efficiencies among species on a large scale. These observations improve our understanding of the consequences of TF evolution.
We therefore hypothesized that connected TFs follow the common TF regulatory pattern, with their conserved members possibly forming the backbone of the regulatory network. In contrast, the variable isolated TFs tune the flow of the regulatory network and give rise to species uniqueness by acting as on/off switches. This scenario explains how TFs can evolve while circumventing negative pleiotropic effects, identifies a major source of TF evolution, and accounts for why TF numbers vary among species.
This situation may be best visualized by regarding the members of TF families as regulatory switches. During evolution, species may have modified the flow of the regulatory network by selecting different on/off states. Isolated TFs are ideally suited to this task: because their mutant phenotypes are less often lethal, they are more tolerant of changes during speciation. In addition, TFs emerging in different species can diversify the expression profiles of their target genes, resulting in an adaptive phenotype for each species. Consequently, phenotypes have evolved by turning multiple switches on and off, that is, through the formation and loss of isolated TFs.