3.1. Consistent Pseudogene Gene Pairs Identified
Our technique isolated only 50 pseudogenes in humans: 42 consistent with mouse genes, 42 (a slightly different set) consistent with rat genes and 38 consistent with rabbit genes (
Table 1). An expanded version of the table is provided in
Table S1, where the human pseudogenes are described in Columns A–H, their consistent coding genes in the mouse in Columns I–R, in the rat in Columns S–AB and in the rabbit in Columns AC–AK. Columns Q and R contain the numbers of nonhuman hominoid and Old World monkey species, respectively, found to have an ortholog of the mouse gene in the current row under the imposed conditions (including the number of witnesses). The corresponding numbers for the rat and rabbit are provided in Columns AA–AB and AJ–AK, respectively.
The reference species were found to possess 43 (mouse), 42 (rat) and 37 (rabbit) protein-coding genes pseudogenized in humans and some other hominoids. Among those, 10 genes in the mouse (Zp3r, Prss40, Prss46, Tmem30c, Dpy19l2, Adam5, Gm4787, Hils1, 4931406B18Rik and Cst13) and 13 in the rat (Clca4l, Zp3r, Prss40, Prss46, Tmem30c, Taar4, Dpy19l2, Adam5, Adam4, Hils1, Cyp2g1, LOC690483 and Cst13) are expressed almost exclusively in the testes. Another 9 murine genes (H2bu2, Prorsd1, Ahsa2, Htr5b, Pcdhb14, Ccdc162, Tdh, Slc22a20 and Zfp35) and 15 rat genes (Clca5, Cym, Hist3h2ba, Prorsd1, Ahsa2, Tmed11, Fbxl21, Pcdhb14, Tdh, Olr836, Slc22a20, Tmem198b, Mettl21cl1, Zfp35 and RGD1560171) are highly expressed in the testes but are also active in other organs.
Among the human pseudogenes in
Table S1, only one is annotated as processed, whilst the others represent transcribed unitary pseudogenes (21), transcribed unprocessed pseudogenes (17), unprocessed pseudogenes (8) and unitary pseudogenes (3). Accordingly, a total of 50 pseudogenes are predicted, with one of them listed three times in
Table S1.
Many of the identified genes are associated with the immune system, such as the
Zfp35 gene, whose knockout in mice results in abnormal T-helper 2 cell differentiation, increased airway responsiveness, increased circulating levels of interleukin-13, interleukin-4 and interleukin-5, and an increased eosinophil cell number [
16]. This gene is expressed mainly in the testes, but is also highly expressed in the thymus, spleen and brain. In humans, it is represented by the transcribed unitary pseudogene
ZNF271P. Some detected protein-coding genes are involved in digestion, e.g., the murine chymosin-encoding gene
Cym.
Available evidence shows that knockout of many identified human pseudogene homologs in model species leads to severe disorders. For example, mice with a human-like
Cmah deficiency have hyperactive macrophages, T cells, B cells, etc. [
17]. Knockout of the heat shock protein ATPase 2 activator gene
Ahsa2 leads to abnormal cornea morphology and decreased total retina thickness [
18], whilst urate oxidase-encoding
Uox knockout causes abnormal kidney morphology and uremia [
19]. Knockout of the pituitary gland transmembrane protein 198b gene
Tmem198b results in reduced sensorimotor gating [
20]. In contrast, ablation of two mouse genes,
4931406B18Rik and
Htr5b, produces no abnormal phenotype [
21].
Of interest is the identified F-box and leucine-rich repeat protein 21-encoding gene
Fbxl21. In mice and rats it is highly expressed within the suprachiasmatic nuclei, the site of the master clock, where it displays marked circadian oscillations apparently driven by members of the PAR-bZIP family [
22]. Its knockout is associated with a shortened circadian behavioral period through ubiquitination and stabilization of cryptochromes [
23], as well as with limb grasping and decreased grip strength [
18]. In humans, it is consistent with the pseudogene
FBXL21P, which is transcribed in the brain, kidneys and prostate; in the gorilla, it is consistent with the pseudogene ENSGGOG00000014124. In the gibbon, chimpanzee, bonobo and orangutan, its coding homolog is preserved in the same syntenic context. Pseudogenization of this circadian rhythm regulator may be associated with increased longevity, as well as with emerging neoteny in humans [
24]. Gui et al. [
25] studied the impact of SNPs in
FBXL21 on the success of kidney transplantation.
The identified
Ctf2, Cyp2g1, Fmo6, Olfr155, Olfr159, Olfr433 and
Taar4 genes are highly expressed in the vomeronasal organ or olfactory epithelium, which is in accordance with the reduction of the vomeronasal organ and the degraded olfaction in humans [
24].
Homologs of the rat gene
RGD1560171 survived in all five of the nonhuman hominoids studied and in many short-lived mammals except mice (pseudogene
Gm715). This protein-coding gene is pseudogenized or lost in species with a relatively high longevity, including humans (
AL589987.1), naked mole rats and elephants. The genes
Ofcc1 in the mouse,
AABR07027339.1 in the rat and ENSOCUG00000016635 in the rabbit are consistent with the human pseudogene
OFCC1 and belong to the same family as a gene potentially causal for orofacial cleft in humans [
26]. However, ablation of this gene had no effect on head development in mice.
Each human pseudogene is consistent with exactly one gene in the mouse, except for CLCA3P, which is consistent with three syntenically linked paralogs in the mouse and two in the rat. Each human pseudogene is consistent with exactly one protein-coding gene in the rabbit.
Homologs of 16 murine protein-coding genes pseudogenized in humans are presumably functional in all five nonhuman hominoids, whilst the other murine genes survived in only four species. The same ratio is observed with the other reference species. Homologs of the human pseudogenes presumably retain functionality in many Old World monkeys and other placental mammals (
Tables S2–S4).
3.2. Consistent Genes of the Reference Species and Their Counterparts in Other Species
Predictions in nonhuman primates and other placental mammals against the mouse, rat and rabbit are given in
Tables S2–S4, respectively. The species column contains the gene ID if the gene is present in that species. Mouse genes were found on average in 4.4 nonhuman hominoids (17 genes in all 5 species) and 9.6 Old World monkeys; rat genes in 4.4 nonhuman hominoids (17 genes in all 5 species) and 9.4 Old World monkeys; and rabbit genes in 4.4 nonhuman hominoids (15 genes in all 5 species) and 9.5 Old World monkeys.
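The reported hominoid averages can be cross-checked with a short calculation. The sketch below is illustrative only and is not part of the original analysis. It takes the parenthesized counts as the numbers of genes found in all five nonhuman hominoids and assumes that every remaining gene is found in exactly four of them; the function and variable names are hypothetical.

```python
# Minimal arithmetic check of the reported means (illustrative, not the authors' code).
# Assumption: every gene not found in all five nonhuman hominoids is found in exactly four.

def mean_hominoid_count(total_genes: int, genes_in_all_five: int, n_species: int = 5) -> float:
    """Average number of nonhuman hominoids carrying a reference-species gene."""
    genes_in_four = total_genes - genes_in_all_five
    covered = n_species * genes_in_all_five + (n_species - 1) * genes_in_four
    return covered / total_genes

# (total consistent genes, genes found in all five hominoids) per reference species,
# using the counts stated in the text.
REPORTED = {"mouse": (43, 17), "rat": (42, 17), "rabbit": (37, 15)}

MEANS = {species: mean_hominoid_count(*counts) for species, counts in REPORTED.items()}

for species, mean in MEANS.items():
    print(f"{species}: {mean:.2f}")
```

Under these assumptions, all three means round to 4.4, in agreement with the values given above.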
The mouse was found to have 43 genes consistent with 42 human pseudogenes. Consistency is “one-to-one”, with the exception of three mouse genes consistent with the same pseudogene
CLCA3P and the mouse gene
Cyp2g1, consistent with two neighboring human pseudogenes,
CYP2G1P and
CYP2G2P. Consistency is supported by local alignment of the corresponding genes, accounting for the exon–intron structure, and by genomic alignment with at least 4 pairs of orthologous witnesses in a neighborhood of a specified size (3 pairs in a single case), with the number of pairs normally being much greater.
Table S5 describes the witnesses; pseudogenes are shaded green and syntenic blocks are separated by horizontal lines. Where possible, the nearest protein-coding genes flanking the candidate genes were chosen as witnesses within the specified neighborhoods.
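The witness criterion can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the implementation used in this study: the `Gene` record, the midpoint-based distance, the `radius` parameter and all gene identifiers and coordinates are hypothetical, and the actual neighborhood size is not reproduced here. Only the threshold of 4 witness pairs is taken from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int
    end: int

    @property
    def midpoint(self) -> int:
        return (self.start + self.end) // 2

def is_near(gene: Gene, candidate: Gene, radius: int) -> bool:
    """True if gene lies on the candidate's chromosome within radius bp (midpoint distance)."""
    return (gene.chrom == candidate.chrom
            and gene.gene_id != candidate.gene_id
            and abs(gene.midpoint - candidate.midpoint) <= radius)

def count_witness_pairs(pseudogene: Gene, ref_gene: Gene, ortholog_pairs, radius: int) -> int:
    """Count orthologous pairs flanking both the pseudogene and the reference gene."""
    return sum(1 for human_w, ref_w in ortholog_pairs
               if is_near(human_w, pseudogene, radius) and is_near(ref_w, ref_gene, radius))

def is_supported(pseudogene, ref_gene, ortholog_pairs, radius, min_witnesses: int = 4) -> bool:
    return count_witness_pairs(pseudogene, ref_gene, ortholog_pairs, radius) >= min_witnesses

# Toy data: all identifiers and coordinates are invented for illustration.
PSEUDOGENE = Gene("human_pseudogene", "chr1", 1_000_000, 1_010_000)
REF_GENE = Gene("mouse_gene", "chr3", 5_000_000, 5_012_000)
ORTHOLOG_PAIRS = [
    (Gene("hW1", "chr1", 800_000, 820_000), Gene("mW1", "chr3", 4_800_000, 4_820_000)),
    (Gene("hW2", "chr1", 900_000, 910_000), Gene("mW2", "chr3", 4_900_000, 4_915_000)),
    (Gene("hW3", "chr1", 1_100_000, 1_120_000), Gene("mW3", "chr3", 5_100_000, 5_130_000)),
    (Gene("hW4", "chr1", 1_300_000, 1_320_000), Gene("mW4", "chr3", 5_250_000, 5_270_000)),
    (Gene("hW5", "chr1", 3_000_000, 3_020_000), Gene("mW5", "chr3", 5_400_000, 5_420_000)),
]

WIDE = count_witness_pairs(PSEUDOGENE, REF_GENE, ORTHOLOG_PAIRS, radius=500_000)
NARROW = count_witness_pairs(PSEUDOGENE, REF_GENE, ORTHOLOG_PAIRS, radius=150_000)
print(WIDE, is_supported(PSEUDOGENE, REF_GENE, ORTHOLOG_PAIRS, radius=500_000))
print(NARROW, is_supported(PSEUDOGENE, REF_GENE, ORTHOLOG_PAIRS, radius=150_000))
```

In this toy setting, the wider neighborhood yields four witness pairs and meets the threshold, whereas the narrower one yields two and does not, which shows how the choice of neighborhood size affects the call.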
The rat was predicted to have 42 genes consistent with 42 human pseudogenes. Consistency is “one-to-one”, with the exception of two genes in the rat consistent with the same human pseudogene
CLCA3P and the rat gene
Cyp2g1, consistent with two syntenic human pseudogenes,
CYP2G1P and
CYP2G2P. The exceptions thus mirror those in the mouse. Consistency is supported by local alignment of the corresponding genes, accounting for the exon–intron structure, and by genomic alignment with at least 4 pairs of orthologous witnesses in a neighborhood of a specified size (3 pairs in two cases), with the number of pairs normally being much greater.
Table S6 describes the witnesses; pseudogenes are shaded green and syntenic blocks are separated by horizontal lines.
The rabbit was found to have 37 genes consistent with 38 human pseudogenes. Consistency is “one-to-one”, with the exception of the rabbit gene ENSOCUG00000005745, which is consistent with the same two syntenic human pseudogenes,
CYP2G1P and
CYP2G2P. Consistency is supported by local alignment of the corresponding genes, accounting for the exon–intron structure, and by genomic alignment with at least 4 pairs of orthologous witnesses in a neighborhood of a specified size (3 pairs in a single case), with the number of pairs normally being much greater.
Table S7 describes the witnesses; pseudogenes are shaded green and syntenic blocks are separated by horizontal lines.
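The relation between the numbers of pseudogenes and consistent genes in the three reference species follows from the stated exceptions to “one-to-one” consistency. The sketch below reconstructs this bookkeeping; it is illustrative only, and the placeholder identifiers (`pseudogene_0`, `gene_0`, `Clca_paralog_0`, `Cyp2g_ortholog`) are hypothetical stand-ins for entries of Table S1.

```python
def build_mapping(n_pseudogenes: int, clca3p_paralogs: int, cyp2g_gene: str) -> dict:
    """Map each human pseudogene to its consistent reference-species genes.

    Only the exceptions named in the text are modeled explicitly; all other
    pseudogenes receive one placeholder gene each (one-to-one consistency).
    """
    mapping = {}
    if clca3p_paralogs > 1:
        # CLCA3P is consistent with several syntenically linked paralogs.
        mapping["CLCA3P"] = [f"Clca_paralog_{i}" for i in range(clca3p_paralogs)]
    # Two neighboring human pseudogenes share a single reference gene.
    mapping["CYP2G1P"] = [cyp2g_gene]
    mapping["CYP2G2P"] = [cyp2g_gene]
    k = 0
    while len(mapping) < n_pseudogenes:
        mapping[f"pseudogene_{k}"] = [f"gene_{k}"]
        k += 1
    return mapping

def distinct_genes(mapping: dict) -> int:
    """Number of distinct reference genes across all pseudogenes."""
    return len({gene for genes in mapping.values() for gene in genes})

# (human pseudogenes, CLCA3P-consistent paralogs) as stated in the text.
# For the rabbit, every pseudogene has exactly one consistent gene, so no
# multi-paralog entry is created.
SPECIES = {"mouse": (42, 3), "rat": (42, 2), "rabbit": (38, 1)}

GENE_COUNTS = {
    species: distinct_genes(build_mapping(n, paralogs, "Cyp2g_ortholog"))
    for species, (n, paralogs) in SPECIES.items()
}
print(GENE_COUNTS)
```

With these inputs, the sketch yields 43, 42 and 37 distinct genes for the mouse, rat and rabbit, respectively, matching the counts reported above.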
In most cases, the detected pseudogenes are confined strictly within a syntenic region, with the following 10 outliers found at the boundary:
A2MP1, AL589987.1, CST13P, CYP2G1P, CYP2G2P,
H2BU2P, KLRA1P, METTL21EP, OFCC1 and
TMED11P; see
Tables S5–S7.
3.3. Human Pseudogenes Independently Pseudogenized or Lost in Exactly One Nonhuman Hominoid
In contrast to humans, few pseudogenes are known in nonhuman primates: annotated pseudogenes amount to 71% (human), 2% (gorilla), 3% (bonobo), 2% (chimpanzee), 5% (orangutan) and 62% (mouse) of the total number of protein-coding genes. The lack of some inferred genes in exactly one nonhuman hominoid may therefore indicate that they evolved into pseudogenes that escaped detection.
It follows from
Tables S1–S4 that the homologs of the pseudogenes found in humans presumably all retain functionality in the common chimpanzee and bonobo, with the exception of
CCDC92B, H1-9 and
SKINT1L; in the gibbon, with the exception of
ADAM20P1, AL160191.3, CST13P and
TDH; in the gorilla, with the exception of
A2MP1, ADAM5, CYP2G1P, CYP2G2P, GUCY1B2, FBXL21P and
LINC00643; and in the orangutan, with the exception of 17 genes. These 31 cases of human pseudogenes whose counterparts lost function in another hominoid are provided in
Table S8 (shaded green in the first column) along with the relevant witnesses. Such gene groups are separated by horizontal lines. The pseudogenes are consistent with the reference species genes specified in
Table S1 that are lacking in exactly one nonhuman hominoid (bonobo, gorilla, gibbon or orangutan). Each group, starting from Column I, contains the consistent nonhuman hominoid genes. For every pseudogene (except two), the genomic alignment contains a region without a consistent coding gene (marked “no gene”, except for three cases in the gorilla and three in the orangutan). In two cases in the gorilla, the pseudogenes are consistent with known pseudogenes, and in one case with a coding gene that has no ortholog in the reference species, which introduces a conflict in the consistency with the reference species and human. In the orangutan, one instance lacks a genomic alignment (shaded blue), while in the other two cases a pseudogene is consistent with three coding genes, and a coding gene is consistent with a pseudogene (shaded green), respectively. In the remaining cases (3 in the bonobo, 4 each in the gorilla and gibbon, and 14 in the orangutan), no pseudogene-consistent coding gene is inferred in the nonhuman hominoid, although sequence comparison sometimes suggests its possible pseudogenization. The description fields contain the gene attributes.
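The case counts in this section can be tallied as follows. This is a simple consistency check written for illustration, with hypothetical variable names; the gene lists and counts are those given in the text, and only the number of orangutan cases (not the gene names) is available here.

```python
# Human pseudogenes named in the text as exceptions, per nonhuman hominoid.
EXCEPTIONS = {
    "bonobo": ["CCDC92B", "H1-9", "SKINT1L"],
    "gibbon": ["ADAM20P1", "AL160191.3", "CST13P", "TDH"],
    "gorilla": ["A2MP1", "ADAM5", "CYP2G1P", "CYP2G2P", "GUCY1B2", "FBXL21P", "LINC00643"],
}
ORANGUTAN_EXCEPTIONS = 17  # only the count is stated in the text

CASES_PER_SPECIES = {species: len(genes) for species, genes in EXCEPTIONS.items()}
CASES_PER_SPECIES["orangutan"] = ORANGUTAN_EXCEPTIONS
TOTAL_CASES = sum(CASES_PER_SPECIES.values())

# Breakdown stated in the text: cases with no inferred coding gene versus
# the specially annotated cases (three in the gorilla, three in the orangutan).
NOT_INFERRED = {"bonobo": 3, "gorilla": 4, "gibbon": 4, "orangutan": 14}
SPECIAL = {"gorilla": 3, "orangutan": 3}

for species, n_cases in CASES_PER_SPECIES.items():
    breakdown = NOT_INFERRED.get(species, 0) + SPECIAL.get(species, 0)
    assert breakdown == n_cases, species

print(TOTAL_CASES)
```

The per-species breakdown adds up in every case, and the total comes to the 31 cases listed in Table S8.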