Article

A Catalog of Coding Sequence Variations in Salivary Proteins’ Genes Occurring during Recent Human Evolution

1
Dipartimento Scienze della Vita e Sanità Pubblica, Università Cattolica del Sacro Cuore, 00168 Rome, Italy
2
Fondazione Policlinico Universitario A. Gemelli IRCCS, 00168 Rome, Italy
3
Laboratorio di Proteomica, Centro Europeo di Ricerca sul Cervello, IRCCS Fondazione Santa Lucia, 00179 Rome, Italy
4
Dipartimento di Scienze della Vita e Dell’ambiente, Università di Cagliari, 09042 Monserrato, Italy
5
Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, 20854 Vedano al Lambro, Italy
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Current address: National Institute on Aging, NIH, Baltimore, MD 21224, USA.
Int. J. Mol. Sci. 2023, 24(19), 15010; https://doi.org/10.3390/ijms241915010
Submission received: 17 July 2023 / Revised: 4 October 2023 / Accepted: 6 October 2023 / Published: 9 October 2023
(This article belongs to the Special Issue Recent Advances in Salivary Gland and Their Function 2.0)

Abstract

Saliva houses over 2000 proteins and peptides with poorly clarified functions, including proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases. Their genes are poorly conserved across related species, reflecting an evolutionary adaptation. We searched for the nucleotide substitutions fixed in these salivary proteins’ gene loci in modern humans compared with ancient hominins. We mapped 3472 sequence variants/nucleotide substitutions in coding, noncoding, and 5′-3′ untranslated regions. Although most of the detected variations were within noncoding regions, the frequency of coding variations was far higher than the general rate found throughout the genome. Among the various missense substitutions, specific substitutions detected in the PRB1 and PRB2 genes were responsible for the introduction/abrogation of consensus sequences recognized by convertase enzymes that cleave the protein precursors. Overall, these changes that occurred during recent human evolution might have generated novel functional features and/or different expression ratios among the various components of the salivary proteome. This may have influenced the homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. However, fixed nucleotide changes in modern humans represented only 7.3% of all the substitutions reported in this study, and no signs of evolutionary pressure or adaptive introgression from archaic hominins were found on the tested genes.

1. Introduction

Saliva is a multifaceted bodily fluid that contains enzymes (amylases, lysozymes, and lipases), proteins, peptides and glycoproteins, lipids (hormones such as testosterone and progesterone), and proteases, along with a high concentration of inorganic ions [1]. To date, more than 2000 proteins and peptides have been identified in saliva [2]. They are mainly involved in the homeostasis of the oral cavity, the digestion process, and the innate immune response [3]. Ninety percent of the salivary proteins and peptides derive from the secretion of the three major salivary glands (parotid, submandibular, and sublingual glands), while the remaining 10% are secreted by minor salivary glands or derive from exfoliated cells and leucocytes present in the gingival–crevicular fluid [4] from plasma exudate, plus some contributions from the oral microbial flora. During their transit in the secretory pathway, salivary proteins undergo a series of post-translational modifications (PTMs), including phosphorylation, N-terminal acetylation, glycosylation, sulfation, and proteolytic cleavages. Further changes in proteins and peptides also occur after secretion in the oral cavity, through the action of exogenous (microflora) and endogenous enzymes [1].
The main contribution to the composition of the human salivary proteome derives from a few protein families. In particular, proline-rich proteins (PRPs), statherin (STATH), P-B peptide, histatins (HTN), cystatins (CST), and amylases (AMY) altogether represent more than 95% (w/w) of all proteins found in saliva to date [5]. PRPs represent the major fraction of the salivary proteome in Homo sapiens (nearly 70% of the total protein content; >50% in weight) and include basic (bPRPs), acidic (aPRPs), and basic glycosylated (gPRPs) PRPs. They share a high abundance of proline, glycine, and glutamine residues, which represent 70–80% of the entire amino acid sequence [6,7]. bPRPs include eleven parent peptides/proteins and more than six parent glycosylated proteins (gPRPs), plus several proteoforms derived from gene polymorphisms and PTMs [8,9,10] (Figure 1). PRPs are encoded by genes belonging to the PRP multigene family, located within the PRB locus mapping on 12p13.2. The locus includes six tandemly linked genes, PRB2–PRB1–PRB4–PRH2–PRB3–PRH1 in the 5′-to-3′ direction, and is highly polymorphic because it contains internally repetitive DNA sequences, leading to frequent recombinational events [11,12]. At least four alleles (S, small; M, medium; L, large; and VL, very large) are present in the Western population of Homo sapiens at the PRB1 and PRB3 loci and three (S, M, L) at the PRB2 and PRB4 loci [8] (Figure 1). Except for the protein encoded by the PRB3 locus, which gives rise to gPRPs, all the bPRP pro-proteins are cleaved completely by pro-protein convertases, generating smaller peptides/proteins, before granule maturation [9] (Figure 1). aPRPs are encoded by two loci, PRH1 and PRH2, mapping on chromosome 12p13.
Single amino acid substitution and repeat insertion generate three PRH1 alleles, encoding the parotid isoelectric-focusing slow isoform (PIF-s) and the parotid acidic protein (Pa), both 150 residues long, and the double band isoform slow (Db-s), which is 171 amino acid residues long [10] (Figure 2A). A single nucleotide substitution generates two PRH2 alleles, encoding the PRP-1 and PRP2 isoforms [11] (Figure 2A). A pro-protein convertase partially cleaves PRP-1, PRP2, and PIF-s into three N-terminal fragments of 106 residues, called PRP3, PRP4, and PIF-f (PRP3 type), and a common C-terminal fragment of 44 amino acids, called P-C peptide. Db-s is cleaved at position 127, generating two peptides: Db-f (f stands for fast) and the P-C peptide (same as above) [12] (Figure 2A). The Pa isoform, which does not carry the convertase sequence, generates a dimeric form through a disulfide bond [13] (Figure 2A). STATH is encoded by the STATH gene located in chromosome 4q13-19 [13,14]. Several STATH proteoforms are detectable in saliva due to phosphorylation, cyclization by transglutaminase 2, and proteolysis by amino-/carboxy-peptidases and convertase action [13,15,16]. P-B is a proline-rich small peptide encoded by the SMR3B gene, mapping on chromosome 4q13.3 [17], near the STATH gene, possibly sharing epigenetic control and/or the DNA replication timeframe [13,15,16]. HTN are small cationic histidine-rich peptides encoded by the HTN1 and HTN3 genes on chromosome 4q13. Despite their high sequence homology, HTN1 and HTN3 have different maturation pathways and biological activities [17,18,19].
CST are inhibitors of cysteine proteases involved in the innate immune response [20]. CSTA and CSTB are encoded by the CSTA and CSTB genes, respectively, whereas CST-SN, CST-SA, CST-C, CST-S, and CST-D are encoded by the CST1-CST5 genes (Figure 2B). Several PTMs occur in CST proteins, including N-acetylation, proteolytic cleavages, phosphorylation, and M-, W-, and C-oxidation, producing the different final protein structures detectable in human saliva [21]. In addition, two isoforms of cystatin D and cystatin SN generated by single amino acid substitutions are present in saliva [21] (Figure 2B).
The amylase alpha 1A (AMY1A) gene, on chromosome 1p21.1, is responsible for the expression of AMY, which accounts for about 20% of the weight of salivary proteins and is the most abundant protein of the whole saliva of Homo sapiens.
Several comparative studies have shown that the human salivary proteome differs from that of other species because of genetic divergences that are possibly due to environmental factors, including diet and pathogens [22,23,24,25]. A recent study reported the results obtained from the comparison of the salivary proteome of Homo sapiens sapiens (modern humans) with those of our closest extant evolutionary relatives, chimpanzees and gorillas [26]. The authors demonstrated that the salivary protein composition is unique to each species despite their close sequence homology, which likely reflects an evolutionary adaptation [26]. Despite this initial observation, the evolution of the human loci encoding salivary proteins has not been studied to date. Nowadays, the increasing amount of genomic data obtained through sequencing of preserved skeletal remains of extinct hominins, such as Homo neanderthalensis (Neanderthals) and Homo Denisova (Denisovans), can reveal the extent of diversity that has emerged at the genomic level during more recent human evolution.
In this study, we aimed to identify the sequence changes that have been fixed during recent human evolution in the gene loci encoding the most abundant salivary proteins (namely, PRPs, statherin, P-B peptide, histatins, cystatins, and amylases) to gather possible functional indications regarding their evolutionary path and their contribution to oral homeostasis and salivary functions. Eating habits may indeed be mutually linked with the biology of salivary proteins, since these proteins are implicated in the modulation of the microbiome of the oral cavity and the entire gastrointestinal tract [26]. To achieve this, we interrogated the publicly available sequence databases of Neanderthals and Denisovans and compared them with modern human genome sequence data. This allowed us to identify several nucleotide substitutions in the loci coding for the most relevant human salivary protein families.

2. Results

By comparing the genomic sequences of salivary gene loci in modern humans with those of Altai Neanderthals, Chagyrskaya Neanderthals, Vindija Neanderthals, and Denisovans, we identified an overall number of 3472 sequence variants/nucleotide substitutions across the 17 tested salivary genes in coding, noncoding, 5′-3′ untranslated (UTRs), and regulatory regions. The nucleotide substitutions observed in the 17 tested salivary genes are summarized in Figure 3. Of the 3472 changed nucleotides, only 428 were in coding regions, and 121 of these were annotated as synonymous (Figure 3). The remaining 307 coding variations were nonsynonymous (Figure 3); such variations are known to be subjected to a higher evolutionary pressure and are frequently exposed to natural selection [27,28]. We have, therefore, attempted a functional interpretation of nonsynonymous variations, which is inherently speculative and deserves future functional studies. The potential impact of nonsynonymous variants on the function of the salivary proteins of Neanderthals and Denisovans was predicted by a SIFT (sorting intolerant from tolerant) analysis (see Table 1, Table 2 and Table 3), which predicts amino acid substitutions that may exert a deleterious effect. The reference single nucleotide polymorphism (SNP) number (rs) and the corresponding frequencies of the 107 missense changes in coding regions that are annotated in modern humans are also reported in Table 1, Table 2 and Table 3. Of note, even though the nucleotide changes located in noncoding regions should not affect the primary structure of the encoded protein, they could affect regulatory elements that may modify the splicing and/or the binding of epigenetic modulators and/or chromatin folding/looping. The variants fixed at 100% in modern humans compared to ancient hominins are highlighted in light orange in Table 1, Table 2 and Table 3 and Tables S1–S17.
In the following subparagraphs, the results were detailed considering one locus at a time. Note that given the extreme structure heterogeneity of the tested genes with multiple alleles and different lengths, the nucleotide variations were indicated according to their genomic coordinates (see Section 4 for details).

2.1. Nucleotide Variations in the Gene Loci Encoding Basic Proline-Rich Proteins

2.1.1. PRB1 Gene

The genomic alignment allowed us to identify 130 nucleotide changes in the PRB1 gene in ancient hominins compared with modern humans (Table 1 and Table S1). Fifty-five of these were detected within coding exons and included ten synonymous and forty-five nonsynonymous nucleotide substitutions. Among the nonsynonymous nucleotide substitutions, 20 corresponded to SNPs annotated in modern humans (Table 1). SIFT prediction indicated that 46% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). The T-C transition, which occurred in modern humans at position 11,506,774, causing the substitution of R72 with a Q in the II-2 isoform (Table 1 and Figure 4a), may have an impact on post-translational protein processing. Indeed, the modern human R72 residue is part of the R72SPR75 consensus sequence recognized by the pro-protein convertase responsible for the cleavage between the II-2 and P-E peptides. Therefore, we may hypothesize that in archaic species, the PRB-1-encoded protein was a fused peptide spanning 136 amino acids, which integrates the modern II-2 and P-E (Table 1 and Figure 4a). The sequences of the peptides and the resulting putative archaic protein primary structures (named PRB-1 salivary archaic fusion 1 peptide, PRB-1 SAF-1) are reported in Figure 4a. The remaining seventy-five nucleotide changes identified in the PRB1 locus fell within noncoding regions, namely fifty-four in introns, six in upstream regions, one in the 5′UTR, one in the 3′UTR, and thirteen in downstream regions (Table S1).

2.1.2. PRB2 Gene

One hundred and thirty-six nucleotide substitutions were detected in the PRB2 locus in ancient hominins compared with modern humans (Table 1 and Table S2). Thirty-seven of these were identified in introns, ten in upstream regions, one in the 3′UTR, and eight in downstream regions. The remaining eighty variations were found in coding regions, namely two in exon 1 (corresponding to the signal peptide), one in exon 2, and the remaining seventy-seven in exon 3 (Table 1 and Table S2). Of note, the modern human sequence reported in the UniProtKB database corresponded to the L allele coding for the common isoforms IB-8a Con1- and P-H S1, the first one with a P residue instead of an S at position 100, the second one with an S residue instead of an A at position 1 [8]. Of the 80 sequence variants found in coding exons, 64 were nonsynonymous, causing amino acid substitutions. SIFT prediction indicated that 19% of these missense variants have a significant effect on protein function based on sequence homology and the physical properties of the involved amino acids (Table 1). Twenty-six out of the sixty-four nonsynonymous substitutions were annotated as common variants (SNPs) in modern humans (Table 1). In particular, two changes occurring at 11,546,686 bp and 11,546,677 bp caused the substitution of R93 and R96 with Q within the ancient IB-1 isoform. The two archaic residues were found in all four species (Table 1). This implied that the archaic hominins’ R93SPR96 consensus sequence, recognized by the pro-protein convertase, apparently lacked two key arginine residues, thus disabling the post-translational cleavage. Therefore, the ancient saliva composition should feature a protein deriving from the fusion of the IB-1 and P-J peptides, spanning 157 amino acids (named the PRB-2 salivary archaic fusion 2 peptide, PRB-2 SAF-2 peptide, in Figure 4b).
Conversely, the presence of a C nucleotide at 11,546,314 bp in Neanderthals and Denisovans, instead of the T found in modern humans, led to the introduction of an R instead of the Q59 (Q217 in the pro-protein) of the IB-8a Con1- isoform. This archaic primary structure would then include an additional pro-protein convertase consensus sequence, R59SAR62, causing the cleavage of the IB-8a Con1- protein into two smaller peptides. Given the usual removal of the C-terminal arginine residue observed for almost all the bPRPs, both peptides should be 61 amino acid residues long (Figure 4c). We named these putative archaic hominins’ PRB-2 variants the PRB-2 salivary archaic cleavage 1 peptide (PRB-2 SAC-1 peptide) and the PRB-2 salivary archaic cleavage 2 peptide (PRB-2 SAC-2 peptide); they are shown in Figure 4c. Of note, the sequence of the PRB-2 SAC-1 peptide exactly corresponds to the sequence of the modern human P-J peptide with an alanine (A61) instead of a serine as the last amino acid residue. The sequence of the PRB-2 SAC-2 peptide exactly corresponds to the modern human P-F peptide with a serine (S61) instead of an alanine as the last amino acid residue (Figure 4d and [9]). The variation at 11,546,395 bp indicated that in archaic hominins, the P31 (P189 of the pro-protein) residue was replaced by a Q in the IB-8a Con1-; this change probably results in a deleterious effect on protein function, as predicted by SIFT analysis.
The protein name, the modifications with respect to modern humans, and the corresponding frequencies found in Neanderthals, Chagyrskayas, Vindijas and/or Denisovans are reported for each archaic protein. The positions of each substitution are also reported in the primary sequences (residues in bold characters). q: pyroglutamic acid; S: phosphorylated serine.
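The gain or loss of a convertase site described for PRB1 and PRB2 can be illustrated with a minimal Python sketch. It assumes that the site can be approximated by the two motif instances named in the text (RSPR and RSAR); the toy sequences and the helper names `convertase_sites` and `substitute` are hypothetical and are not the real pro-protein sequences or the authors' pipeline.

```python
import re

# Assumption: the convertase consensus is approximated by the two motif
# instances quoted in the text (R-S-P-R and R-S-A-R).
CONVERTASE_MOTIF = re.compile(r"(?=(RS[AP]R))")  # lookahead keeps overlapping hits


def convertase_sites(seq: str) -> list[int]:
    """Return 1-based start positions of every candidate convertase motif."""
    return [m.start() + 1 for m in CONVERTASE_MOTIF.finditer(seq)]


def substitute(seq: str, pos: int, ref: str, alt: str) -> str:
    """Apply a single amino acid substitution at a 1-based position."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


# Hypothetical toy peptides, NOT real PRB1/PRB2 sequences.
toy_with_site = "PPGKPQRSPRGGNQ"     # carries one RSPR motif
toy_without_site = "PPGKPQQSARGGNQ"  # QSAR: no motif until Q becomes R

print(convertase_sites(toy_with_site))                               # motif present
print(convertase_sites(substitute(toy_with_site, 7, "R", "Q")))      # motif lost
print(convertase_sites(substitute(toy_without_site, 7, "Q", "R")))   # motif gained
```

In this toy setting, replacing the first arginine of the motif removes the candidate cleavage site (the situation proposed for the fused archaic peptides), whereas a Q-to-R change creates one (the situation proposed for the additional archaic cleavage of IB-8a Con1-).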

2.1.3. PRB3 Gene

We have identified 163 nucleotide variations in the PRB3 locus in ancient hominins compared with modern humans (Table 1 and Table S3). Of these, 53 were detected in coding regions and 110 in noncoding regions (71 within introns, 14 in upstream regions, 2 in the 3′UTR, and 23 in downstream regions; Table S3). The archaic sequences were compared with the allele Gl-2 (or PRP-3M) of modern humans. Fourteen variations identified in coding exons were synonymous, whereas thirty-nine changes were missense variants. Twelve out of the thirty-nine nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 1). PRP3 protein contains eight N-glycosylated Asn residues falling into the NXS/pS sequon; among the substitutions found in the PRB3 gene, only the one at position 11,420,728 (S136F) falls within the consensus sequence, and SIFT predicted it to be deleterious for protein function (Table 1). Overall, 37.5% of the substitutions were predicted to be deleterious for protein function (Table 1). The noncoding variant found at position 11,420,458 could probably affect the splicing process of PRB3 transcripts in ancient hominins since it fell within the GU consensus site (splice donor site) at the 5′ end of intron 3 (Table S3).

2.1.4. PRB4 Gene

For the PRB4 locus, we detected 129 nucleotide substitutions in ancient hominins compared with modern humans (Table 1 and Table S4). Of these, 27 were found in coding exons, including 4 synonymous and 23 nonsynonymous (Table 1), and 102 in noncoding regions (Table S4). The archaic sequence was compared with the small allele of the modern human locus coding for P-D peptides and glycosylated protein A (PGA). The 23 missense variants were all found within coding regions for the glycosylated protein A, while none of the identified variations would affect the P-D variant (see Table 1 for details). These variations had no consequence on the consensus sequence of the pro-protein convertase or on the sequence of the glycosylation sites. It is interesting to observe that all the reported archaic sequences code for the P-D P32A variant. Overall, seven out of the twenty-three nonsynonymous substitutions in the PRB4 locus corresponded to annotated common variants in modern humans, and only 13% were predicted to be deleterious for protein function (Table 1).

2.2. Nucleotide Variations in the Gene Locus Encoding the a-PRP

One hundred and sixty-three nucleotide substitutions have been annotated in the PRH2 gene locus in ancient hominins compared with modern humans (Table 2 and Table S5), of which thirty fell within coding exons, including seven synonymous and twenty-three nonsynonymous. Four of the latter corresponded to annotated common variants in modern humans (Table 2). Sixty-six nucleotide substitutions were identified in introns, seven in upstream regions, three in the 5′UTR, forty-nine in the 3′UTR, and eight in downstream regions (Table S5). The archaic DNA sequences reported in the sequence database used in this study (see Section 4 for details) corresponded to the PRP-1 protein of the PRH2 alleles, thus having an N50 residue. The nucleotide variations reported in Table 1 generated two synonymous substitutions at D6 and P135.

2.3. Nucleotide Variations in the HTN Gene Loci

A total of 188 and 175 nucleotide substitutions were identified in the HTN1 and HTN3 genes, respectively (Table 2, Tables S6 and S7). The nucleotide substitutions reported in HTN1 are distributed as follows: 4 fell within coding exons, including 1 synonymous and 3 nonsynonymous, and 184 fell in noncoding regions, including 146 within introns, 6 in upstream regions, 3 in the 5′UTR, 9 in the 3′UTR, and 20 in downstream regions (Table 2 and Table S6). Regarding HTN3, 3 nucleotide changes were reported in coding exons (1 synonymous and 2 nonsynonymous), whereas 172 fell in noncoding regions (145 within introns, 9 in upstream regions, 3 in the 5′UTR, 5 in the 3′UTR, and 10 in downstream regions) (Table 2 and Table S7). One missense variant for HTN1 and one for HTN3 found in ancient hominins were also reported as SNPs in modern humans (Table 2).

2.4. Nucleotide Variations in the AMY1A Gene Locus

Two hundred and twelve nucleotide substitutions have been annotated in the AMY1A gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S8). Forty changes fell within coding exons, of which eleven were synonymous and twenty-nine were nonsynonymous. Only one of the nonsynonymous substitutions corresponded to an annotated common variant in modern humans (Table 2). One hundred forty-four nucleotide substitutions were identified in introns, four in upstream regions, nine in the 5′UTR, and fifteen in downstream regions (Table S8).

2.5. Nucleotide Variations in the STATH and P-B Gene Loci

One hundred fifty-nine nucleotide substitutions have been annotated in the STATH gene locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S9). Six changes fell within coding exons, of which two were synonymous and four were nonsynonymous (Table 2). One hundred fifty-three nucleotide substitutions were detected in introns and regulatory regions (Table S9).
One hundred eighty-seven nucleotide substitutions were detected in the SMR3B locus in Neanderthals and Denisovans compared with modern humans (Table 2 and Table S10). Of these, 5 were found in coding exons (2 synonymous and 3 nonsynonymous), 155 were in introns, 3 in upstream regions, 3 in 5′UTRs, 10 in 3′UTR, and 11 in downstream regions (Table 2 and Table S10). One missense variant was reported as an SNP in modern humans (Table S10).

2.6. Nucleotide Variations in the CST Gene Loci

2.6.1. CST1 Gene

We have annotated 227 nucleotide substitutions in the CST1 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S11). Of these, 128 were found in introns, 19 in upstream regions, 7 in the 5′UTR, 12 in the 3′UTR, 32 in downstream regions (Table S11), and 29 in coding regions, including 11 synonymous and 18 missense variations (Table 3). The nucleotide variation at 23,731,494 bp caused the substitution of the Y3(sp) with an H, affecting the third amino acid residue of the signal peptide. This should not impact the function of the protein, although it may have affected the speed of protein translation and/or the correct processing and trafficking. Four substitutions out of eighteen could have a negative impact on protein function, as predicted by SIFT. Overall, nine nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table S11).

2.6.2. CST2 Gene

We detected 167 nucleotide changes in the CST2 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S12). Of these, 103 were in introns, 15 in upstream regions, 8 in the 3′UTR, 17 in downstream noncoding regions (Table S12), and 24 in coding regions (Table 2). The latter included six synonymous and eighteen nonsynonymous variations, eight of which were predicted to have a deleterious effect on protein function (SIFT score < 0.05). Ten out of the eighteen nonsynonymous substitutions corresponded to annotated common variants in modern humans (Table 2). Interestingly, the nucleotide change at 23,804,691 bp fell into the canonical DNA-binding motif for the NR3C1 (nuclear receptor subfamily 3 group C member 1) transcription factor, as reported in the UCSC Genome Browser. This variation could affect the affinity of this factor for the regulatory region and thus the expression of the CST2 gene.

2.6.3. CST3 Gene

In the CST3 locus, we have identified 452 nucleotide variations in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S13). Of these, 329 were in introns, 18 in upstream regions, 9 in 5′UTR, 50 in 3′UTR, 29 in downstream noncoding regions (Table S13), and 17 in coding regions, including 9 synonymous and 8 nonsynonymous variations (Table 2). One nucleotide substitution corresponded to an annotated common variant in modern humans (Table 2).

2.6.4. CST4 Gene

Two hundred and sixty-three nucleotide substitutions were detected in the CST4 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S14). These included 130 changes in introns, 42 in upstream regions, 4 in the 5′UTR, 20 in the 3′UTR, 43 in downstream noncoding regions (Table S14), and 24 in coding exons (11 synonymous and 13 missense variations; Table 3). Seven variations in this locus corresponded to annotated common variants in modern humans (Table 3). The change at 23,666,565 bp caused the substitution of the M111 with an R in the corresponding Neanderthal peptide structure. Even if it causes the substitution of an uncharged amino acid with a charged one, the SIFT analysis did not predict a deleterious effect of this variant on the function of the archaic protein compared to modern humans.

2.6.5. CST5 Gene

One hundred ninety-three nucleotide substitutions were annotated in the CST5 locus in Neanderthals and Denisovans compared with modern humans (Table 3 and Table S15). Sixteen changes were mapped in the coding region, including eight synonymous and eight nonsynonymous (Table 3). Of the 177 nucleotide substitutions located in noncoding regions, 118 were in introns, 24 in upstream regions, 18 in the 3′UTR, and 17 in downstream regions (Table S15). One exonic nucleotide variation generated the codon for an R in both archaic hominins instead of C26. This represented a common variant also found in modern humans (rs1799841). The cystatin D variant with R26 is frequently detected in the soluble fraction of human saliva, probably because it is more soluble than the C26-containing isoform [19]. Moreover, the opposite substitution (R26C) was detectable with high frequency at the same amino acid residue in the cystatin SA gene of Neanderthals. Five out of the eight nonsynonymous nucleotide substitutions corresponded to annotated common variants in modern humans (Table 3).

2.6.6. CSTA and CSTB Genes

Finally, 394 and 134 nucleotide substitutions were identified in CSTA and CSTB loci, respectively, in Neanderthals and Denisovans compared with modern humans (Table 3, Tables S16 and S17). The nucleotide substitutions reported in CSTA were distributed as follows: 6 fell in coding exons, including 2 synonymous and 4 nonsynonymous, and 388 fell in noncoding regions, including 346 in introns, 10 in upstream regions, 5 in the 5′UTR, 10 in the 3′UTR, and 17 in downstream regions (Table 3 and Table S16). Among these changes, the variation at 122,044,848-122,044,850 positions of CSTA was a CTT deletion, observed exclusively in Denisovans (Table S16). This fell within the canonical DNA-binding motif for the Spi-1 proto-oncogene transcription factor (source: UCSC Genome Browser); therefore, it could probably affect the expression of the CSTA gene in the ancient hominin. Regarding CSTB, 9 nucleotide changes were reported in coding exons (6 synonymous and 3 nonsynonymous), whereas 125 fell in noncoding regions (55 within introns, 27 in upstream regions, 5 in the 5′UTR, 15 in the 3′UTR, and 23 in downstream regions) (Table 3 and Table S17). One missense variant for CSTA and 1 for CSTB found in ancient hominins were also reported as an SNP in modern humans (Table 3).

2.7. Geographic Distribution of Genetic Variants in Modern Humans

Of note, the tested salivary protein genes proved to be polymorphic in humans. The frequency of specific coding nonsynonymous genetic variants also differed between populations, as reported in the Geography of Genetic Variants Browser (https://popgen.uchicago.edu/ggv; accessed on 22 July 2022) (File S1) [29]. In particular, 20 genetic variants (three in the PRB1 gene, six in PRB2, one in PRB3, two in CST1, four in CST2, three in CST5, and one in CSTB; highlighted in red in Table 1, Table 2 and Table 3) displayed a different geographic distribution. Specifically, rs554211998, rs201994479, rs34305575, rs6076122, rs111349461, rs55860552, rs568411970, rs145031249, and rs1799841 showed a peculiar allele frequency in African populations (File S1).

2.8. Evolutionary Pressure of Salivary Protein Genes

To investigate whether some of the salivary protein genes studied showed evidence of positive selection in anatomically modern humans, we performed a population branch statistics (PBS) analysis [30]. Our results showed no signal of recent selective pressure for the genes analysed, indicating that variants in these genes did not affect individual fitness (File S2). We also applied the Tajima test as an additional evolutionary analysis to evaluate the selective effects of each observed substitution. Tajima’s D values showed comparable variance among the genes analysed. The D values were mostly slightly negative or positive (ranging from −0.698 to 3.359) (File S3), confirming the absence of a selective sweep [31], which was already suggested by the PBS test.
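For readers unfamiliar with the two statistics, the following Python sketch shows their textbook definitions. It is only an illustration under stated assumptions: the function names, the 0/1 haplotype encoding, and the toy inputs are hypothetical, and the sketch does not reproduce the software or the data used for Files S2 and S3.

```python
import math
from itertools import combinations


def pbs(fst_ab: float, fst_ac: float, fst_bc: float) -> float:
    """Population branch statistic for population A, from pairwise Fst values.

    Each Fst is converted to a branch length T = -ln(1 - Fst); then
    PBS_A = (T_AB + T_AC - T_BC) / 2.
    """
    t_ab, t_ac, t_bc = (-math.log(1.0 - f) for f in (fst_ab, fst_ac, fst_bc))
    return (t_ab + t_ac - t_bc) / 2.0


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for equal-length haplotype strings (for example '0'/'1' alleles)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four sequences are needed for a non-zero variance")
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy samples: only rare alleles versus intermediate-frequency alleles.
rare_only = ["1000", "0100", "0010", "0001"]
intermediate = ["1111", "1111", "0000", "0000"]
print(tajimas_d(rare_only), tajimas_d(intermediate))
```

In this toy setting, a sample dominated by rare alleles yields a negative D, the pattern usually associated with a selective sweep or population expansion, whereas alleles at intermediate frequency yield a positive D.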
Compared to modern humans, Neanderthal and Denisovan genomes showed evidence of ancient interbreeding [32], leading to an uneven distribution of introgressed chromosomal regions because of natural selection [33]. To investigate whether some of the salivary protein gene variants studied might be due to interbreeding, we used two databases of archaic introgression based on a comparison with modern genomes from the 1000 Genomes Project [34] and the Estonian Biocentre collection [35], which also reported data from previous studies [33,36]. However, the considered genes were not encompassed within the chromosomal regions highlighted in the databases and, therefore, did not show an apparent sign of adaptive introgression from archaic hominins.
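The check described above amounts to testing whether a gene interval overlaps any introgressed segment listed in a database. A minimal sketch, assuming half-open genomic intervals, is given below; the `Interval` type, the function names, and all coordinates are hypothetical placeholders, not values taken from the introgression databases.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def introgressed_hits(gene: Interval, segments: list[Interval]) -> list[Interval]:
    """Return the introgressed segments that overlap the gene interval."""
    return [seg for seg in segments if overlaps(gene, seg)]


# Hypothetical coordinates for illustration only.
gene = Interval("chr12", 1_000, 2_000)
segments = [Interval("chr12", 2_000, 3_000), Interval("chr4", 1_500, 1_600)]
print(introgressed_hits(gene, segments))  # empty list: no overlap
```

An empty result for every tested gene corresponds to the outcome reported here, in which none of the loci falls within a catalogued introgressed region.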

3. Discussion

The different dietary habits of archaic hominins and modern humans have been mostly attributed to the changes in the availability of natural food resources, the oral bacterial community (microbiota), and climatic conditions [37,38]. A role for salivary proteins can also be inferred, as they are known to be implicated in the modulation of the microbiome of the oral cavity, the entire gastrointestinal tract, and taste perception [39]. aPRPs can promote the attachment of several important bacteria, such as Actinomyces viscosus, Bacteroides gingivalis, and some strains of Streptococcus mutans. Moreover, both aPRPs and statherin promote the colonization of oral surfaces by Porphyromonas gingivalis [40]. It was reported that the salivary proteins may modulate oral health and homeostasis, maintain a stable ecosystem, and inhibit the growth of cariogenic bacteria [41,42]. Recently, 258 salivary proteins were found to be differentially expressed between caries-free and caries-active children [43]. Salivary proteins are also involved in taste perception. In particular, the salivary bPRPs II-2 and Ps-1 contribute to bitter taste sensitivity [44]. Also, some salivary peptides belonging to the bPRPs and the histatin families can bind polyphenols in tannin-rich foods, thus evoking the typical astringent sensation [44]. Salivary proteins play an important role in affecting sweet [45], salt [46], and umami [47] tastes, along with fat, salt, and bitter acceptance [48,49]. Cystatins are also thought to affect taste perception, as lower salivary levels of these peptides may enhance proteolysis, which would affect the mucosal pellicle lining of the oral cavity, thereby increasing the accessibility of tastants to taste receptors [49]. Interestingly, most of these proteins have been shown to be modulated in pathological conditions, including tumors and inflammation, suggesting that they may serve as clinically relevant biomarkers [5].
Therefore, a hypothesis has been raising that the evolutionary changes occurred in the structure of these proteins could be associated with the different dietary habits of archaic hominins. In this regard, mutations in different bitter taste receptor genes (namely TAS2R62, TAS2R64, and TAS2R38) and the masticatory myosin gene MYH16, along with the duplication of the salivary amylase gene AMY1 that has occurred in recent human evolution, have been associated with variations in taste sensitivity and the shift toward the food cooking habits of modern humans [50].
Based on this emerging background, in this study, we identified and inferred the functional consequences of the nucleotide substitutions fixed in the gene loci coding for the main salivary proteins in modern humans compared to ancient hominins species (Neanderthals and Denisovans).
By mapping over 3400 nucleotide substitutions, we have shown that the majority (87.7%) of changes are detectable in the genes expressing the most important salivary proteins (proline-rich proteins, statherin, P-B peptides, histatins, cystatins, and amylases) of modern humans, compared with Neanderthals and Denisovans, mapped within noncoding regions.
Quite unexpectedly, our data also showed the presence of nucleotide variations affecting the coding sequence of all 17 gene loci analysed. Overall, the frequency of coding variations in these genomic loci is far higher than the general rate found throughout the genome, since previous studies highlighted that relatively few amino acid changes have become fixed in recent human evolution to date [51,52]. To the best of our knowledge, this study provides the first description of coding nucleotide changes that occurred in salivary protein genes during the recent evolutionary divergence of modern humans from Neanderthals and Denisovans. Focusing on these missense variations, we hypothesized the possible effects they could have had on protein structure, processing, and function. Of the 307 missense changes found in the coding regions of the tested genes, 92 were predicted to have a potentially deleterious effect on protein function.
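The proportions quoted in this part of the discussion can be re-derived from the counts given in the text and in the Figure 3 caption (3472 substitutions in total, 428 of them in coding regions, 307 nonsynonymous, 92 predicted deleterious). The short Python sketch below is only an arithmetic check; it assumes that every substitution outside the 428 coding ones is counted as noncoding, which is consistent with the 87.7% figure but is our reading, not a statement from the authors.

```python
# Counts taken from the text and the Figure 3 caption.
TOTAL_SUBSTITUTIONS = 3472
CODING = 428
NONSYNONYMOUS = 307
PREDICTED_DELETERIOUS = 92


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: everything that is not coding is grouped as noncoding.
noncoding = TOTAL_SUBSTITUTIONS - CODING

summary = {
    "noncoding_of_total": percent(noncoding, TOTAL_SUBSTITUTIONS),
    "coding_of_total": percent(CODING, TOTAL_SUBSTITUTIONS),
    "nonsynonymous_of_coding": percent(NONSYNONYMOUS, CODING),
    "deleterious_of_nonsynonymous": percent(PREDICTED_DELETERIOUS, NONSYNONYMOUS),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

Under this assumption, roughly one in eight substitutions falls in a coding region, and about 30% of the missense changes carry a deleterious SIFT prediction.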
The changes identified in the PRB1 and PRB2 genes are worth particular attention and could be interpreted in light of the extant knowledge of the biology of the encoded proteins. As already mentioned, the PRB protein family is highly polymorphic and, despite being common to all mammals, the proteins belonging to this family feature significant structural differences among species. For instance, the peptides generated by the convertase cleavage span 50 to 90 amino acids in length in humans and 10 to 40 in pigs, with considerable variations in the peptide sequences [53]. Therefore, bPRPs appear to be non-conserved across species, probably because they are mostly implicated in taste perception and underwent a deep transformation during evolution due to the changing habits and habitats of the species [44]. Interestingly, our results showed that three nucleotide substitutions annotated in the archaic hominins’ PRB1 and PRB2 genes affect specific arginine residues within the consensus sequences of the polypeptide, which are recognized by the pro-protein convertases responsible for their cleavage. These changes could have determined the presence of fused proteins in the archaic hominins’ proteome. The putative “PRB1 salivary archaic fusion 1 peptide” and “PRB2 salivary archaic fusion 2 peptide” could possibly have been associated with additional and/or alternative functions able to influence the eating habits of extinct hominins. In addition, we have also identified a sequence change in the PRB2 gene that instead generates a new pro-protein convertase consensus sequence in the encoded peptide. As a result, ancient hominins could have expressed two smaller peptides, the “PRB2 salivary archaic cleavage 1 peptide” and the “PRB2 salivary archaic cleavage 2 peptide”, possibly exerting alternative functions, which deserve further functional studies.
The missense nucleotide substitutions annotated in the remaining salivary protein genes described in this study (aPRPs, histatins, amylases, statherin, P-B peptide, and cystatins) could be interpreted, at least in part, considering the putative changes that they can cause in post-translational protein processing, sorting, localization, and trafficking toward secretion. In addition, all the missense variations that introduce or remove a cysteine residue on the archaic cystatins, most likely affecting the conserved sequences involved in the protein-protein binding [53], could also influence protein function.
We also annotated the nucleotide variations fixed within the noncoding regions of the tested genes in modern humans, given that these could reasonably affect the expression levels of salivary proteins by changing the affinity of transcriptional regulators for promoter, enhancer, and/or silencer elements, or affect splicing by changing splice site consensus sequences, leading to the formation of alternative coding transcripts. Also, they could affect post-transcriptional regulation mechanisms, such as the binding of noncoding regulatory RNAs, leading to the varying protein types and amounts that emerged during recent evolution. Specifically, two nucleotide substitutions found in the CST2 and CSTA gene loci appear to fall within the canonical DNA-binding motifs for specific transcription factors, which could most likely intervene in the modulation of their expression. We also annotated 216 changes in the 3′ untranslated regions in 16 of the 17 genes analysed (in all but AMY1A). These substitutions might instead condition the binding of specific microRNAs targeting salivary protein transcripts, modulating their stability and the translation process.
Lastly, 34.9% of the nonsynonymous nucleotide substitutions identified in this study appear to be frequent in the modern human genome, where they are annotated as single nucleotide polymorphisms (SNPs). In addition, some of these coding genetic variants display a different geographic distribution in humans. This observation reduces the evolutionary significance of such changes, which are to be considered in light of the polymorphic nature of these genomic loci. However, taken together, variants showing alternative nucleotide fixation in modern vs. archaic humans represent 7.3% of all the nucleotide substitutions reported in the study.
Also, our results do not suggest any significant evolutionary pressure or sign of adaptive introgression from archaic hominins on the tested genes.

4. Materials and Methods

4.1. Nucleotide Variants Annotation

To annotate all the nucleotide variants within the gene loci of the salivary proteins of interest, we compared modern human sequences with Altai Neanderthals (downloaded from http://cdna.eva.mpg.de/Neanderthal/altai/AltaiNeanderthal/bam/, accessed on 2 May 2020), Chagyrskaya Neanderthals (Index of/neandertal/Chagyrskaya/BAM (mpg.de), accessed on 9 December 2022), Vindija Neanderthals (Index of/neandertal/Vindija/bam/Pruefer_etal_2017/Vindija33.19 (mpg.de), accessed on 9 December 2022), and Denisova sequences (http://cdna.eva.mpg.de/denisova/alignments/, accessed on 2 May 2020) [54,55]. The fossil remains, aged between 50,000 and 30,000 years, come from two distinct geographical areas. The female Neanderthal sample from Vindija (Croatia), in the Western Balkans, yielded a 30× genome coverage [56]. The other samples came from two different sites in the Altai Mountains in Siberia (Russia): the genomic data of a female Neanderthal (at 52× coverage) [57] and a juvenile female Denisovan individual (at 30× coverage) [55] came from the Denisova cave, and another female sample came from the Chagyrskaya cave, located about 100 km westward, and yielded a genome of 27× coverage [58]. In particular, we aligned the sequences of modern humans and ancient hominins by means of the Integrative Genomics Viewer (IGV) tool (version 2.3.72) [59,60,61]. Note that the reference genomes annotated in this database are set on the hg19 genome assembly coordinates. To account for the possible damage and fragmentation to which the ancient hominin DNA was subjected, we annotated, for each gene of interest, all the nucleotide substitutions with a frequency greater than 10% and a minimum coverage of 10 counts in coding, noncoding, and regulatory sequences (i.e., 5′ and 3′ untranslated regions and flanking upstream and downstream regulatory regions).
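The two thresholds described above (substitution frequency greater than 10% and a minimum coverage of 10 counts) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, where the values were read from IGV; the `SiteCounts` container, the example positions, and the interpretation of "coverage" as the total read depth at the site are assumptions made here for illustration.

```python
from dataclasses import dataclass

MIN_FREQUENCY = 0.10  # a substitution must be carried by more than 10% of reads
MIN_COVERAGE = 10     # assumption: at least 10 reads in total must cover the site


@dataclass
class SiteCounts:
    """Per-base read counts at one position (hypothetical container)."""
    position: int
    ref_base: str
    counts: dict  # base -> number of reads supporting that base


def call_substitutions(site: SiteCounts) -> list:
    """Return (position, ref, alt, frequency, coverage) for each passing substitution."""
    coverage = sum(site.counts.values())
    if coverage < MIN_COVERAGE:
        return []  # too few reads: likely damage or fragmentation noise
    calls = []
    for base, n_reads in sorted(site.counts.items()):
        if base == site.ref_base:
            continue
        frequency = n_reads / coverage
        if frequency > MIN_FREQUENCY:
            calls.append((site.position, site.ref_base, base, frequency, coverage))
    return calls


if __name__ == "__main__":
    # Made-up read counts, for illustration only.
    example = SiteCounts(position=100, ref_base="C", counts={"C": 18, "T": 12})
    print(call_substitutions(example))
```

A site covered by only eight reads, or one where the alternative base is seen in 5 of 100 reads, would be discarded by this filter.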
Of note, the variant frequency indicates the percentage of reads carrying that substitution in ancient hominins, as reported by the IGV tool, considering the depth (coverage) of the reads displayed at each locus. For each tested gene, a region of approximately 500 bp upstream of the first exon and downstream of the last exon was also screened to annotate nucleotide substitutions within regulatory regions able to affect the gene expression rate. The precise hg19 genomic coordinates for each tested gene locus were as follows: PRB1 locus 11,509,000–11,504,200 on chromosome 12; PRB2 locus 11,549,000–11,544,000 on chromosome 12; PRB3 locus 11,423,140–11,418,300 on chromosome 12; PRB4 locus 11,463,900–11,459,500 on chromosome 12; PRH2 locus 11,081,500–11,087,950 on chromosome 12; HTN1 locus 70,915,750–70,925,000 on chromosome 4; HTN3 locus 70,893,670–70,902,700 on chromosome 4; AMY1A locus 104,239,500–104,229,500 on chromosome 1; STATH locus 70,861,200–70,868,790 on chromosome 4; SMR3B locus 71,248,550–71,256,400 on chromosome 4; CST1 locus 23,732,000–23,727,600 on chromosome 20; CST2 locus 23,807,800–23,803,900 on chromosome 20; CST3 locus 23,619,100–23,606,800 on chromosome 20; CST4 locus 23,670,200–23,665,700 on chromosome 20; CST5 locus 23,860,900–23,856,000 on chromosome 20; CSTA locus 122,043,600–122,061,300 on chromosome 3; and CSTB locus 45,196,800–45,193,000 on chromosome 21.
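Several of the loci above are listed with the larger coordinate first. As a convenience for readers who wish to revisit the same windows, the following Python sketch normalizes a few of the listed coordinate pairs into ordered region strings and reports the window length. The function name `igv_region` and the `chr` contig prefix are assumptions (some alignment files name contigs without the prefix), and only a subset of the 17 loci is shown.

```python
# hg19 coordinates copied from the text for a subset of loci:
# gene -> (chromosome, first coordinate, second coordinate)
LOCI = {
    "PRB1": ("12", 11_509_000, 11_504_200),
    "PRH2": ("12", 11_081_500, 11_087_950),
    "STATH": ("4", 70_861_200, 70_868_790),
    "AMY1A": ("1", 104_239_500, 104_229_500),
    "CSTB": ("21", 45_196_800, 45_193_000),
}


def igv_region(chrom: str, a: int, b: int, prefix: str = "chr"):
    """Return an ordered 'chrN:start-end' string and the window length in bp."""
    start, end = min(a, b), max(a, b)
    # Both endpoints are treated as included in the window (an assumption).
    length = end - start + 1
    return f"{prefix}{chrom}:{start:,}-{end:,}", length


if __name__ == "__main__":
    for gene, (chrom, a, b) in LOCI.items():
        region, length = igv_region(chrom, a, b)
        print(f"{gene}\t{region}\t{length} bp")
```

Passing `prefix=""` yields contig names without the `chr` prefix for alignment files that use that convention.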
The frequency of each variation in present-day human populations was annotated by integrating information from both the dbSNP (Single Nucleotide Polymorphism Database; https://www.ncbi.nlm.nih.gov/snp, accessed on 15 July 2020) and the Ensembl (http://www.ensembl.org/index.html, accessed on 15 July 2020) databases. In particular, the frequency was reported as the Allele Frequency Aggregator (ALFA New). Regulatory regions in the analysed gene loci were assessed using the information available on the UCSC Genome Browser database (https://genome.ucsc.edu, accessed on 15 July 2020).
The coding sequences of salivary proteins were extracted from the publicly available UniProtKB database (https://www.uniprot.org/, accessed on 15 July 2020): PRB1, primary accession number: P04280; PRB2: P02812; PRB3: Q04118; PRB4: P10163; PRH2: P02810; HTN1: P15515; HTN3: P15516; STATH: P02808; AMY1A: P0DUB6; P-B: P02814; CST1: P01037; CST2: P09228; CST3: P01034; CST4: P01036; CST5: P28325; CSTA: P01040; CSTB: P04080.

4.2. Protein Data Analysis

The potential impact of each amino acid substitution on salivary protein function was predicted by SIFT (sorting intolerant from tolerant) version 5.1.1 using the Genome tool (SIFT nonsynonymous single nucleotide variants, genome-scale), available at the SIFT website (http://sift.jcvi.org/, accessed on 20 June 2022). The SIFT algorithm is based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences, collected through PSI-BLAST [62]. SIFT results with a score < 0.05 indicate amino acid substitutions predicted to be deleterious to protein function.
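The decision rule stated above can be written as a one-line classifier. The Python sketch below follows the strict inequality given in the text (score < 0.05) and assumes scores lie in the interval from 0 to 1; the function name and the "tolerated" label for the remaining scores are illustrative choices, not output produced by the SIFT tool itself.

```python
SIFT_CUTOFF = 0.05  # threshold stated in the text


def classify_sift(score: float) -> str:
    """Label a substitution from its SIFT score using the cutoff in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores are expected in [0, 1], got {score}")
    # Strict inequality, as written in the Methods: score < 0.05 is deleterious.
    return "deleterious" if score < SIFT_CUTOFF else "tolerated"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    for s in (0.00, 0.03, 0.05, 0.40):
        print(s, classify_sift(s))
```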

4.3. Selective Pressure Analysis

To detect any possible trace of selective pressure, the population branch statistic (PBS) was applied. PBS is a statistical three-population test based on the FST fixation index, and it has proven to be one of the best methods of detecting signs of recent natural selection on genomes [31]. Regarding the choice of populations, we used three geographically distant populations (CEU for Europe, CHB for Asia, and YRI for Africa), which are the most commonly used [63,64] and were among the first populations released by the 1000 Genomes Project, Phase 1 [64].
FST among the three possible population pairs (CEU, CHB, and YRI) was calculated by VCFtools v0.1.16 [65] using VCF files of each gene under scrutiny. The variants were previously filtered with PLINK 1.9 [66] to keep only those with a minor allele frequency (MAF) ≥ 0.05. Then, PBS values and the related plots were generated with RStudio software (R Core Team 2021, https://www.R-project.org, accessed on 2 December 2022).
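For readers unfamiliar with PBS, the statistic is commonly defined from the three pairwise FST values through the transform T = −ln(1 − FST), with PBS for a target population A given by (T_AB + T_AC − T_BC)/2. The Python sketch below illustrates this standard definition only; the study itself computed PBS in R, and the function names, the example FST values, and the clipping of negative FST estimates to zero are assumptions introduced here.

```python
import math


def branch_length(fst: float) -> float:
    """Transform FST into a divergence-like quantity T = -ln(1 - FST)."""
    # Assumption: negative estimates are clipped to 0, and values of 1 are
    # capped slightly below 1 to keep the logarithm finite.
    fst = min(max(fst, 0.0), 0.999999)
    return -math.log(1.0 - fst)


def pbs(fst_ab: float, fst_ac: float, fst_bc: float) -> float:
    """Population branch statistic for population A given B and C."""
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


if __name__ == "__main__":
    # Made-up pairwise FST values for one variant, for illustration only.
    fst = {("CEU", "CHB"): 0.10, ("CEU", "YRI"): 0.15, ("CHB", "YRI"): 0.18}
    pbs_ceu = pbs(fst[("CEU", "CHB")], fst[("CEU", "YRI")], fst[("CHB", "YRI")])
    print(f"PBS (CEU branch): {pbs_ceu:.4f}")
```

A large PBS value for one population indicates that its allele frequencies have diverged from both of the other populations more than those two have diverged from each other, which is the pattern expected under recent population-specific selection.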

5. Conclusions

In conclusion, the nucleotide substitutions described in this study, which have putatively affected the amino acid composition, the post-translational modification, and/or the gene expression levels of salivary proteins, might have generated novel functional features and a different expression ratio among the several components of the salivary proteome. Given the largely unknown functional roles of most salivary proteins, we may only speculate that these changes could have ultimately modified the entire homeostasis of the oral cavity environment, possibly conditioning the eating habits of modern humans. Our data may pave the way to unravelling the evolutionary processes through which changes in salivary composition have affected oral cavity homeostasis. This knowledge could provide additional novel cues toward a better understanding of the ability of different species to adapt to different and changing environments.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijms241915010/s1.

Author Contributions

Conceptualization: M.C. and O.P.; data elaboration and collection, L.D.P., M.C., M.B., B.M. and A.O.; manuscript editing, L.D.P., W.L., M.C., B.M., T.C., O.P. and S.S. All authors contributed to the discussion and revision of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the FIR 2021 funds (Cagliari, Italy) to T.C. and the “Linea D.1–D.3.1” funds from the Università Cattolica del Sacro Cuore (Rome, Italy) to L.D.P., W.L., and O.P.

Data Availability Statement

All data reported in this manuscript are shown in the results section and further supported by the extended datasets provided in the supplementary files. No new primary datasets to be deposited have been generated.

Acknowledgments

We thank Luca Pagani (Università di Padova) for their useful advice on adaptive introgression.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cabras, T.; Iavarone, F.; Manconi, B.; Olianas, A.; Sanna, M.T.; Castagnola, M.; Messana, I. Top-down analytical platforms for the characterization of the human salivary proteome. Bioanalysis 2014, 6, 563–581.
  2. Bandhakavi, S.; Stone, M.D.; Onsongo, G.; Van Riper, S.K.; Griffin, T.J. A Dynamic Range Compression and Three-Dimensional Peptide Fractionation Analysis Platform Expands Proteome Coverage and the Diagnostic Potential of Whole Saliva. J. Proteome Res. 2009, 8, 5590–5600.
  3. Vila, T.; Rizk, A.M.; Sultan, A.S.; Jabra-Rizk, M.A. The power of saliva: Antimicrobial and beyond. PLoS Pathog. 2019, 15, e1008058.
  4. Ngo, L.H.; Veith, P.D.; Chen, Y.Y.; Chen, D.; Darby, I.B.; Reynolds, E.C. Mass Spectrometric Analyses of Peptides and Proteins in Human Gingival Crevicular Fluid. J. Proteome Res. 2010, 9, 1683–1693.
  5. Boroumand, M.; Olianas, A.; Cabras, T.; Manconi, B.; Fanni, D.; Faa, G.; Desiderio, C.; Messana, I.; Castagnola, M. Saliva, a bodily fluid with recognized and potential diagnostic applications. J. Sep. Sci. 2021, 44, 3677–3690.
  6. Beeley, J.A. Basic proline-rich proteins: Multifunctional defence molecules? Oral Dis. 2001, 7, 69–70.
  7. Hajishengallis, G.; Russell, M.W. Innate Humoral Defense Factors. Mucosal Immunol. 2015, 1, 251–270.
  8. Lyons, K.M.; Azen, E.A.; Goodman, P.A.; Smithies, O. Many protein products from a few loci: Assignment of human salivary proline-rich proteins to specific loci. Genetics 1988, 120, 255–265.
  9. Padiglia, A.; Orrù, R.; Boroumand, M.; Olianas, A.; Manconi, B.; Sanna, M.T.; Desiderio, C.; Iavarone, F.; Liori, B.; Messana, I.; et al. Extensive Characterization of the Human Salivary Basic Proline-Rich Protein Family by Top-Down Mass Spectrometry. J. Proteome Res. 2018, 17, 3292–3307.
  10. Manconi, B.; Castagnola, M.; Cabras, T.; Olianas, A.; Vitali, A.; Desiderio, C.; Sanna, M.T.; Messana, I. The intriguing heterogeneity of human salivary proline-rich proteins. J. Proteom. 2016, 134, 47–56.
  11. Lyons, K.M.; Stein, J.H.; Smithies, O. Length polymorphisms in human proline-rich protein genes generated by intragenic unequal crossing over. Genetics 1988, 120, 267–278.
  12. Azen, E.A.; Amberger, E.; Fisher, S.; Prakobphol, A.; Niece, R.L. PRB1, PRB2, and PRB4 coded polymorphisms among human salivary concanavalin-A binding, II-1, and Po proline-rich proteins. Am. J. Hum. Genet. 1996, 58, 143–153.
  13. Messana, I.; Cabras, T.; Pisano, E.; Sanna, M.T.; Olianas, A.; Manconi, B.; Pellegrini, M.; Paludetti, G.; Scarano, E.; Fiorita, A.; et al. Trafficking and Postsecretory Events Responsible for the Formation of Secreted Human Salivary Peptides: A Proteomics Approach. Mol. Cell. Proteom. 2008, 7, 911–926.
  14. Jensen, J.L.; Lamkin, M.S.; Troxler, R.F.; Oppenheim, F.G. Multiple forms of statherin in human salivary secretions. Arch. Oral Biol. 1991, 36, 529–534.
  15. Inzitari, R.; Cabras, T.; Rossetti, D.V.; Fanali, C.; Vitali, A.; Pellegrini, M.; Paludetti, G.; Manni, A.; Giardina, B.; Messana, I.; et al. Detection in human saliva of different statherin and P-B fragments and derivatives. Proteomics 2006, 6, 6370–6379.
  16. Cabras, T.; Inzitari, R.; Fanali, C.; Scarano, E.; Patamia, M.; Sanna, M.T.; Pisano, E.; Giardina, B.; Castagnola, M.; Messana, I. HPLC–MS characterization of cyclo-statherin Q-37, a specific cyclization product of human salivary statherin generated by transglutaminase 2. J. Sep. Sci. 2006, 29, 2600–2608.
  17. Torres, P.; Castro, M.; Reyes, M.; Torres, V. Histatins, wound healing, and cell migration. Oral Dis. 2018, 24, 1150–1160.
  18. Castagnola, M.; Inzitari, R.; Rossetti, D.V.; Olmi, C.; Cabras, T.; Piras, V.; Nicolussi, P.; Sanna, M.T.; Pellegrini, M.; Giardina, B.; et al. A Cascade of 24 Histatins (Histatin 3 Fragments) in Human Saliva: Suggestion for a Pre-Secretory Sequential Cleavage Pathway. J. Biol. Chem. 2004, 279, 41436–41443.
  19. Wang, G. Human Antimicrobial Peptides and Proteins. Pharmaceuticals 2014, 7, 545–594.
  20. Dickinson, D.P. Cysteine peptidases of mammals: Their biological roles and potential effects in the oral cavity and other tissues in health and disease. Crit. Rev. Oral Biol. Med. 2002, 13, 238–275.
  21. Manconi, B.; Liori, B.; Cabras, T.; Vincenzoni, F.; Iavarone, F.; Castagnola, M.; Messana, I.; Olianas, A. Salivary Cystatins: Exploring New Post-Translational Modifications and Polymorphisms by Top-Down High-Resolution Mass Spectrometry. J. Proteome Res. 2017, 16, 4196–4207.
  22. Perry, G.H.; Dominy, N.J.; Claw, K.G.; Lee, A.S.; Fiegler, H.; Redon, R.; Werner, J.; Villanea, F.A.; Mountain, J.L.; Misra, R.; et al. Diet and the evolution of human amylase gene copy number variation. Nat. Genet. 2007, 39, 1256–1260.
  23. Polley, S.; Louzada, S.; Forni, D.; Sironi, M.; Balaskas, T.; Hains, D.S.; Yang, F.; Hollox, E.J. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. USA 2015, 112, 5105–5110.
  24. Xu, D.; Pavlidis, P.; Taskent, R.O.; Alachiotis, N.; Flanagan, C.; DeGiorgio, M.; Blekhman, R.; Ruhl, S.; Gokcumen, O. Archaic Hominin Introgression in Africa Contributes to Functional Salivary MUC7 Genetic Variation. Mol. Biol. Evol. 2017, 34, 2704–2715.
  25. Xu, D.; Pavlidis, P.; Thamadilok, S.; Redwood, E.; Fox, S.; Blekhman, R.; Ruhl, S.; Gokcumen, O. Recent evolution of the salivary mucin MUC7. Sci. Rep. 2016, 6, 31791.
  26. Thamadilok, S.; Choi, K.S.; Ruhl, L.; Schulte, F.; Kazim, A.L.; Hardt, M.; Gokcumen, O.; Ruhl, S. Human and Nonhuman Primate Lineage-Specific Footprints in the Salivary Proteome. Mol. Biol. Evol. 2020, 37, 395–405.
  27. Edwards, A.W.F. The Genetical Theory of Natural Selection. Genetics 2000, 154, 1419–1426.
  28. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. USA 2010, 107, 961–968.
  29. Marcus, J.H.; Novembre, J. Visualizing the geography of genetic variants. Bioinformatics 2017, 33, 594–595.
  30. Yi, X.; Liang, Y.; Huerta-Sanchez, E.; Jin, X.; Cuo, Z.X.; Pool, J.E.; Xu, X.; Jiang, H.; Vinckenbosch, N.; Korneliussen, T.S.; et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 2010, 329, 75–78.
  31. Tajima, F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989, 123, 585–595.
  32. Skoglund, P.; Jakobsson, M. Archaic human ancestry in East Asia. Proc. Natl. Acad. Sci. USA 2011, 108, 18301–18306.
  33. Sankararaman, S.; Mallick, S.; Patterson, N.; Reich, D. The Combined Landscape of Denisovan and Neanderthal Ancestry in Present-Day Humans. Curr. Biol. 2016, 26, 1241–1247.
  34. Racimo, F.; Marnetto, D.; Huerta-Sánchez, E. Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Mol. Biol. Evol. 2017, 34, 296–317.
  35. Jagoda, E.; Lawson, D.J.; Wall, J.D.; Lambert, D.; Muller, C.; Westaway, M.; Leavesley, M.; Capellini, T.D.; Mirazón Lahr, M.; Gerbault, P.; et al. Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in Humans. Mol. Biol. Evol. 2018, 35, 623–630.
  36. Vernot, B.; Akey, J.M. Resurrecting surviving Neandertal lineages from modern human genomes. Science 2014, 343, 1017–1021.
  37. Weyrich, L.S.; Duchene, S.; Soubrier, J.; Arriola, L.; Llamas, B.; Breen, J.; Morris, A.G.; Alt, K.W.; Caramelli, D.; Dresely, V.; et al. Neanderthal behaviour, diet, and disease inferred from ancient DNA in dental calculus. Nature 2017, 544, 357–361.
  38. El Zaatari, S.; Grine, F.E.; Ungar, P.S.; Hublin, J.J. Neandertal versus Modern Human Dietary Responses to Climatic Fluctuations. PLoS ONE 2016, 11, e0153277.
  39. Cornejo Ulloa, P.; van der Veen, M.H.; Krom, B.P. Review: Modulation of the oral microbiome by the host to promote ecological balance. Odontology 2019, 107, 437–448.
  40. Lamont, R.J.; Jenkinson, H.F. Subgingival colonization by Porphyromonas gingivalis. Oral Microbiol. Immunol. 2000, 15, 341–349.
  41. Laputková, G.; Schwartzová, V.; Bánovčin, J.; Alexovič, M.; Sabo, J. Salivary Protein Roles in Oral Health and as Predictors of Caries Risk. Open Life Sci. 2018, 13, 174–200.
  42. Lynge Pedersen, A.M.; Belstrøm, D. The role of natural salivary defences in maintaining a healthy oral microbiota. J. Dent. 2019, 80, S3–S12.
  43. Chen, W.; Jiang, Q.; Yan, G.; Yang, D. The oral microbiome and salivary proteins influence caries in children aged 6 to 8 years. BMC Oral Health 2020, 20, 295.
  44. Cabras, T.; Melis, M.; Castagnola, M.; Padiglia, A.; Tepper, B.J.; Messana, I.; Tomassini Barbarossa, I. Responsiveness to 6-n-Propylthiouracil (PROP) Is Associated with Salivary Levels of Two Specific Basic Proline-Rich Proteins in Humans. PLoS ONE 2012, 7, e30962.
  45. Rodrigues, L.; Costa, G.; Cordeiro, C.; Pinheiro, C.; Amado, F.; Lamy, E. Salivary proteome and glucose levels are related with sweet taste sensitivity in young adults. Food Nutr. Res. 2017, 61, 1389208.
  46. Stolle, T.; Grondinger, F.; Dunkel, A.; Meng, C.; Médard, G.; Kuster, B.; Hofmann, T. Salivary Proteome Patterns Affecting Human Salt Taste Sensitivity. J. Agric. Food Chem. 2017, 65, 9275–9286.
  47. Scinska-Bienkowska, A.; Wrobel, E.; Turzynska, D.; Bidzinski, A.; Jezewska, E.; Sienkiewicz-Jarosz, H.; Golembiowska, K.; Kostowski, W.; Kukwa, A.; Plaznik, A.; et al. Glutamate concentration in whole saliva and taste responses to monosodium glutamate in humans. Nutr. Neurosci. 2006, 9, 25–31.
  48. Méjean, C.; Morzel, M.; Neyraud, E.; Issanchou, S.; Martin, C.; Bozonnet, S.; Urbano, C.; Schlich, P.; Hercberg, S.; Péneau, S.; et al. Salivary Composition Is Associated with Liking and Usual Nutrient Intake. PLoS ONE 2015, 10, e0137473.
  49. Morzel, M.; Chabanet, C.; Schwartz, C.; Lucchi, G.; Ducoroy, P.; Nicklaus, S. Salivary protein profiles are linked to bitter taste acceptance in infants. Eur. J. Pediatr. 2014, 173, 575–582.
  50. Perry, G.H.; Kistler, L.; Kelaita, M.A.; Sams, A.J. Insights into hominin phenotypic and dietary evolution from ancient DNA sequence data. J. Hum. Evol. 2015, 79, 55–63.
  51. Green, R.E.; Krause, J.; Briggs, A.W.; Maricic, T.; Stenzel, U.; Kircher, M.; Patterson, N.; Li, H.; Zhai, W.; Fritz, M.H.; et al. A Draft Sequence of the Neandertal Genome. Science 2010, 328, 710–722.
  52. Burbano, H.A.; Hodges, E.; Green, R.E.; Briggs, A.W.; Krause, J.; Meyer, M.; Good, J.M.; Maricic, T.; Johnson, P.L.; Xuan, Z.; et al. Targeted Investigation of the Neandertal Genome by Array-Based Sequence Capture. Science 2010, 328, 723–725.
  53. Bode, W.; Engh, R.; Musil, D.; Thiele, U.; Huber, R.; Karshikov, A.; Brzin, J.; Kos, J.; Turk, V. The 2.0 A X-ray crystal structure of chicken egg white cystatin and its possible mode of interaction with cysteine proteinases. EMBO J. 1988, 7, 2593–2599.
  54. Mednikova, B.B. A Proximal Pedal Phalanx of a Paleolithic Hominin from Denisova Cave, Altai. Archaeol. Ethnol. Anthropol. Eurasia 2011, 39, 129–138.
  55. Meyer, M.; Kircher, M.; Gansauge, M.T.; Li, H.; Racimo, F.; Mallick, S.; Schraiber, J.G.; Jay, F.; Prüfer, K.; de Filippo, C.; et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 2012, 338, 222–226.
  56. Prüfer, K.; de Filippo, C.; Grote, S.; Mafessoni, F.; Korlević, P.; Hajdinjak, M.; Vernot, B.; Skov, L.; Hsieh, P.; Peyrégne, S.; et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 2017, 358, 655–658.
  57. Prüfer, K.; Racimo, F.; Patterson, N.; Jay, F.; Sankararaman, S.; Sawyer, S.; Heinze, A.; Renaud, G.; Sudmant, P.H.; de Filippo, C.; et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 2014, 505, 43–49.
  58. Mafessoni, F.; Grote, S.; de Filippo, C.; Slon, V.; Kolobova, K.A.; Viola, B.; Markin, S.V.; Chintalapati, M.; Peyrégne, S.; Skov, L.; et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl. Acad. Sci. USA 2020, 117, 15132–15136.
  59. Robinson, J.T.; Thorvaldsdóttir, H.; Winckler, W.; Guttman, M.; Lander, E.S.; Getz, G.; Mesirov, J.P. Integrative genomics viewer. Nat. Biotechnol. 2011, 29, 24–26.
  60. Thorvaldsdottir, H.; Robinson, J.T.; Mesirov, J.P. Integrative Genomics Viewer (IGV): High-performance genomics data visualization and exploration. Brief. Bioinform. 2013, 14, 178–192.
  61. Robinson, J.T.; Thorvaldsdóttir, H.; Wenger, A.M.; Zehir, A.; Mesirov, J.P. Variant Review with the Integrative Genomics Viewer. Cancer Res. 2017, 77, e31–e34.
  62. Ng, P.C. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003, 31, 3812–3814.
  63. Pfeifer, B.; Alachiotis, N.; Pavlidis, P.; Schimek, M.G. Genome scans for selection and introgression based on k-nearest neighbour techniques. Mol. Ecol. Resour. 2020, 20, 1597–1609.
  64. Bhatia, G.; Patterson, N.; Pasaniuc, B.; Zaitlen, N.; Genovese, G.; Pollack, S.; Mallick, S.; Myers, S.; Tandon, A.; Spencer, C.; et al. Genome-wide comparison of African-ancestry populations from CARe and other cohorts reveals signals of natural selection. Am. J. Hum. Genet. 2011, 89, 368–381.
  65. Danecek, P.; Auton, A.; Abecasis, G.; Albers, C.A.; Banks, E.; DePristo, M.A.; Handsaker, R.E.; Lunter, G.; Marth, G.T.; Sherry, S.T.; et al. The variant call format and VCFtools. Bioinformatics 2011, 27, 2156–2158.
  66. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P.I.; Daly, M.J.; et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575.
Figure 1. Schematic representation of basic proline-rich genes and encoded proteins: PRB1 (A), PRB2 (B), PRB3 (C), PRB4 (D). For each protein, the genetic allelic variants (S, small; M, medium; L, large; and VL, very large) are shown on the left-sided column; the resulting alternative proteoforms are shown on the right-sided column as blocks, with the corresponding symbol on top. Vertical dashed lines indicate the pro-protein convertase cleavage sites with corresponding Arg (R) residues’ positions. The P enclosed in a circle denotes phosphorylation sites; aminoacidic substitutions are shown for selected isoforms. See text for additional details.
Figure 2. Schematic representation of acidic proline-rich proteins (A) and cystatins (B). For each protein, the genetic allelic variants (S, small; M, medium; L, large; and VL, very large) are shown on the left-sided column; the resulting alternative proteoforms are shown on the right-sided column as blocks with corresponding symbols on top. All cystatin alternative proteoforms feature two disulfide bridges (indicated by brackets between Cys), oxidation (ox), and phosphorylation (P) sites. Vertical dashed lines indicate the pro-protein convertase cleavage sites with corresponding Arg (R) residues’ positions. The P enclosed in a circle denotes phosphorylation sites; ox: oxidation sites; p-E: N-terminal pyroglutamic acid; aminoacidic substitutions are shown for selected isoforms. See text for additional details.
Figure 3. Nucleotide substitutions in salivary protein genes. The pie chart shows the types and numbers of the 3472 nucleotide substitutions detected across the 17 tested salivary genes. The 428 substitutions found in coding regions included 307 nonsynonymous changes. See text for additional details.
Figure 4. Predicted archaic hominins’ PRB-1 (panel (a)) and PRB-2 (panels (b–d)) protein variants.
Table 1. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRB1, PRB2, PRB3, and PRB4 gene loci.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score)
PRB1 (reverse reading, chromosome 12)
11,507,477Exon 2
(II-2)
CTTCTT (100%)TTT (13%)TTT (7%) *CTT (100%)GAA→E10
AAA→K10
n.a.n.a.Damaging (0.02)
11,507,464Exon 2
(II-2)
AGGAGG (100%)AGG (100%)AAG (12%)AGG (100%)UCC→S14
UUC→F14
rs1173856027A = 0%Tolerated (0.72)
11,506,888Exon 3
(II-2)
GGGGGG (100%)GGG (100%)GAG (12%)GGG (100%)CCC→P35
CUC→L35
n.a.n.a.Tolerated (0.06)
11,506,856Exon 3
(II-2)
GGGGGG (100%)AGG (11%)GGG (100%)GGG (100%)CCC→P45
UCC→S45
rs762910991A = 0.003%Tolerated (0.17)
11,506,853Exon 3
(II-2)
GGTTGT (3%) *GGT (100%)AGT (15%)GGT (100%)CCA→P46
UCA→S46
rs745726339A = 0%Damaging
(0)
11,506,852Exon 3
(II-2)
GGTGGT (100%)GGT (100%)GAT (11%)GGT (100%)CCA→P46
CUA→L46
n.a.n.a.Damaging
(0)
11,506,804Exon 3
(II-2)
GTTGAT (61%)GAT (63%)GAT (60%)GTT (100%)CAA→Q62
CUA→L62
n.a.n.a.Tolerated (0.29)
11,506,801Exon 3
(II-2)
CCTCCT (100%)CTT (11%)CTT (5%) *CCT (100%)GGA→G63
GAA→E63
n.a.n.a.Damaging (0.01)
11,506,790Exon 3
(II-2)
GTTGTT (100%)ATT (11%)ATT (6%) *GTT (100%)CAA→Q67
UAA→stop
rs1409612167A = 0%Damaging due to stop
11,506,784Exon 3
(II-2)
CTGCTG (100%)CTG (100%)TTG (13%)CTG (100%)GAC→D69
AAC→N69
rs554211998T = 0%Tolerated (0.95)
11,506,774Exon 3
(II-2)
GCTGTT (13%)GTT (8%) *GTT (6%) *GTT (9%) *CGA→R72
CAA→Q72
rs202083397T = 10.6%Tolerated (0.08)
11,506,766Exon 3
(II-2)
GCTGCT (100%)GCT (100%)ACT (12%)GCT (100%)CGA→R75
UGA→stop
rs766131639A = 0%Damaging due to stop
11,506,730Exon 3
(Ps-2)
GTTGTT (100%)ATT (16%)GTT (100%)GTT (100%)CAA→Q12
UAA→stop
n.a.n.a.Damaging due to stop
11,506,723Exon 3
(Ps-2)
CCACCA (100%)CTA (12%)CTA (3%) *CCA (100%)GGU→G14
GAU→D14
rs534597111T = 0%NS
11,506,669Exon 3
(Ps-2)
GGTGTT (39%)GTT (36%)GTT (55%)GTT (26%)CCA→P32
CAA→Q32
rs772365043C = 0%NS
11,506,618Exon 3
(Ps-2)
CCTCCT (100%)CTT (17%)CTT (3%) *CCT (100%)GGA→G49
GAA→E49
n.a.n.a.NS
11,506,612Exon 3
(Ps-2)
GGGGGG (100%)GAG (11%)GGG (100%)GGG (100%)CCC→P51
CUC→L51
n.a.n.a.NS
11,506,577Exon 3
(IB-6)
GGAGGA (100%)AGA (13%)GGA (100%)GGA (100%)CCU→P2
UCU→S2
n.a.n.a.NS
11,506,514Exon 3
(IB-6)
GGAGGA (100%)AGA (6%) *AGA (11%)GGA (100%)CCU→P23
UCU→S23
n.a.n.a.NS
11,506,492Exon 3
(IB-6)
GGTGGT (100%)GGT (100%)GAT (13%)GGT (100%)CCA→P30
CUA→L30
n.a.n.a.NS
11,506,490Exon 3
(IB-6)
GGGAGG (5%) *AGG (18%)AGG (8%) *GGG (100%)CCC→P31
UCC→S31
n.a.n.a.NS
11,506,486Exon 3
(IB-6)
GGTGGT (100%)GGT (100%)GTT (18%)GGT (100%)CCA→P32
CAA→Q32
rs755622101T = 1.3%NS
11,506,473Exon 3
(Ps-2)
TTCTTG(100%)TTG(83%) **TTG(100%) **TTG(75%) **AAG→K37
AAC→N37
rs61930109G = 72.1%NS
11,506,403Exon 3
(Ps-2)
AGGGGG (50%) **GGG (50%) **AGG (100%) **GGG (100%)UCC→S59
CCC→P59
n.a.n.a.NS
11,506,370Exon 3
(Ps-2)
GGGGGG (100%)GGG (100%)AGG (21%)GGG (100%)CCC→P70
UCC→S70
rs774158904A = 0%NS
11,506,369Exon 3
(Ps-2)
GGGGGG (93%)GGG (100%)GAG (16%)GGG (100%)CCC→P71
CUC→L71
rs369001998A = 0.007%NS
11,506,339Exon 3
(Ps-2)
GGGGGG (97%)GAG (5%) *GAG (23%)GGG (100%)CCC→P81
CUC→L81
n.a.n.a.NS
11,506,333Exon 3
(Ps-2)
GGAGGA (100%)GAA (5%) *GAA (11%)GGA (100%)CCU→P83
CUU→L83
n.a.n.a.NS
11,506,309Exon 3
(Ps-2)
GGTGAT (4%) *GAT (6%) *GAT (17%)GGT (100%)CCA→P91
CUA→L91
n.a.n.a.Damaging (0.01)
11,506,303Exon 3
(Ps-2)
GGTGTT (3%) *GTT (13%)GGT (100%)GGT (100%)CCA→P93
CAA→Q93
rs201682460T = 2.8%Damaging
(0)
11,506,301Exon 3
(Ps-2)
GTTATT (4%) *GTT (100%)ATT (15%)GTT (100%)CAA→Q94
UAA→stop
n.a.n.a.Damaging due to stop
11,506,285Exon 3
(Ps-2)
GGAGGA (100%)GGA (100%)GAA (14%)GGA (100%)CCU→P99
CUU→L99
n.a.n.a.Damaging
(0.01)
11,506,283Exon 3
(Ps-2)
GTTGTT (100%)ATT (14%)ATT (13%)GTT (100%)CAA→Q100
UAA→stop
n.a.n.a.Damaging due to stop
11,506,250Exon 3
(Ps-2)
GGTGGT (100%) **GGT (100%)AGT (14%)GGT (100%)CCA→P111
UCA→S111
n.a.n.a.Tolerated (0.08)
11,506,249Exon 3
(Ps-2)
GGTGGT (100%) **GGT (100%)GAT (13%)GGT (100%)CCA→P111
CUA→L111
rs1208300501A = 0%Tolerated (0.09)
11,506,246Exon 3
(Ps-2)
GGGGGG (100%) **GAG (18%)GGG (100%)GGG (100%)CCC→P112
CUC→L112
rs1303924609A = 0%Damaging
(0.02)
11,506,241Exon 3
(Ps-2)
GTTGTT (100%) **GTT (100%)ATT (14%)GTT (100%)CAA→Q114
UAA→stop
rs751826141A = 0%Damaging due to stop
11,506,217Exon 3
(IB-6)
CGGGGG (67%) **GGG (17%) **GGG (25%)CGG (100%)GCC→A61
CCC→P61
rs771648794G = 0.04%Tolerated
(1)
11,506,154Exon 3
(IB-6)
GGGGGG (100%)AGG (17%)AGG (4%) *GGG (100%)CCC→P82
UCC→S82
n.a.n.a.Tolerated (0.15)
11,506,150Exon 3
(IB-6)
GGTGGT (100%)GAT (14%)GGT (100%)GAT (6%) *CCA→P83
CUA→L83
rs747444571A = 0%Damaging
(0.03)
11,506,079Exon 3
(IB-6)
GGAGGA (100%)GGA (100%)AGA (13%)GGA (100%)CCU→P107
UCU→S107
n.a.n.a.Tolerated (0.06)
11,506,075Exon 3
(IB-6)
GGAGGA (100%)GGA (100%)GAA (13%)GGA (100%)CCU→P108
CUU→L108
n.a.n.a.Damaging
(0.01)
11,506,070Exon 3
(IB-6)
CCCCCC (100%)CCC (100%)TCC (12%)CCC (100%)GGG→G110
AGG→R110
n.a.n.a.Tolerated
(0.3)
11,506,057Exon 3
(IB-6)
AGGAGG (100%)AAG (11%)AAG (5%) *AGG (100%)UCC→S114
UUC→F114
n.a.n.a.Damaging
(0.03)
11,506,052Exon 3
(IB-6)
GGAGGA (100%)AGA (10%) *AGA (18%)GGA (100%)CCU→P116
UCU→S116
rs1372423355A = 0%Tolerated
(0.06)
PRB2 (reverse reading, chromosome 12)
11,548,429Exon 1
(Signal)
CGGCGG (100%)CAG (3%) *CAG (13%)CGG (100%)GCC→A11(sp)
GUC→V11(sp)
rs1415819382A = 0%Damaging
(0)
11,547,429Exon 2
(IB-1)
CCTTCT (4%) *CCT (100%)TCT (12%)CCT (100%)GGA→G18
AGA→R18
n.a.n.a.Damaging
(0.2)
11,546,899Exon 3
(IB-1)
CCTCCT (100%)CTT (11%)CCT (100%)CCT (100%)GGA→G22
GAA→E22
rs188924826T = 0.007%Tolerated
(0.1)
11,546,894Exon 3
(IB-1)
GGGGGG (100%)AGG (14%)GGG (100%)GGG (100%)CCC→P24
UCC→S24
n.a.n.a.Tolerated
(0.73)
11,546,872Exon 3
(IB-1)
GGAGGA (100%)GGA (100%)GAA (11%)GGA (100%)CCU→P31
CUU→L31
rs748769813A = 0%Tolerated
(0.46)
11,546,830Exon 3
(IB-1)
GGGGGG (100%)GAG (9%) *GAG (17%)GGG (100%)CCC→P45
CUC→L45
n.a.n.a.Tolerated
(0.1)
11,546,828Exon 3
(IB-1)
GGTAGT (3%) *GGT (100%)AGT (17%)GGT (100%)CCA→P46
UCA→S46
rs755161117A = 0.007%Tolerated
(0.36)
11,546,825Exon 3
(IB-1)
GTTGTT (97%)GTT (100%)ATT (17%)GTT (100%)CAA→Q47
UAA→stop
n.a.n.a.Damaging due to stop
11,546,810Exon 3
(IB-1)
GGAGGA (100%)GGA (100%)AGA (13%)GGA (100%)CCU→P52
UCU→S52
rs1347881375A = 0%Tolerated
(0.97)
11,546,809Exon 3
(IB-1)
GGAGGA (100%)GAA (6%) *GAA (12%)GGA (100%)CCU→P52
CUU→L52
n.a.n.a.Tolerated
(0.3)
11,546,807Exon 3
(IB-1)
GTTGTT (97%)ATT (11%)ATT (11%)GTT (100%)CAA→Q53
UAA→stop
n.a.n.a.Damaging due to stop
11,546,792Exon 3
(IB-1)
GGAGGA (100%)AGA (18%)GGA (100%)GGA (100%)CCU→P58
UCU→S58
n.a.n.a.Tolerated
(0.76)
11,546,780Exon 3
(IB-1)
GGTGGT (100%)GGT (100%)AGT (12%)GGT (100%)CCA→P62
UCA→S62
n.a.n.a.Tolerated
(0.64)
11,546,770Exon 3
(IB-1)
GGTGGT (100%)GGT (100%)GAT (13%)GGT (100%)CCA→P65
CUA→L65
n.a.n.a.Tolerated
(1)
11,546,764Exon 3
(IB-1)
GGTGGT (100%)GGT (96%)GAT (12%)GGT (100%)CCA→P67
CAA→Q67
rs201994479T = 0.008%Tolerated
(0.43)
11,546,732Exon 3
(IB-1)
GGAGGA (100%)GGA (100%)AGA (13%)GGA (100%)CCU→P78
UCU→S78
n.a.n.a.Tolerated
(0.38)
11,546,716Exon 3
(IB-1)
GTTGAT (4%) *GAT (14%)GTT (97%)GTT (100%)CAA→Q83
CUA→L83
n.a.n.a.Tolerated
(0.32)
11,546,686Exon 3
(IB-1)
GCTGTT (42%)GTT (39%)GTT (51%)GTT (29%)CGA→R93
CAA→Q93
rs76832300n.a.Tolerated
(0.5)
11,546,677Exon 3
(IB-1)
GCTGCT (100%)GCT (100%)GCT (100%)GTT (24%)CGA→R96
CAA→Q96
rs201144571T = 0.08%Tolerated
(0.47)
11,546,647Exon 3
(P-J)
GGGGGG (100%)GGG (100%)GAG (15%)GGG (100%)CCC→P10
CUC→L10
n.a.n.a.Tolerated
(0.18)
11,546,642Exon 3
(P-J)
GTTGTT (100%)GTT (100%)ATT (17%)GTT (100%)CAA→Q12
UAA→stop
n.a.n.a.Damaging due to stop
11,546,627Exon 3
(P-J)
GGAAGA (3%) *AGA (11%)AGA (5%) *GGA (100%)CCU→P17
UCU→S17
n.a.n.a.Tolerated
(0.45)
11,546,618Exon 3
(P-J)
GGAGGA (100%)GGA (93%)AGA (17%)GGA (100%)CCU→P20
UCU→S20
n.a.n.a.Tolerated
(0.81)
11,546,617Exon 3
(P-J)
GGAGGA (100%)GGA (100%)GAA (17%)GGA (100%)CCU→P20
CUU→L20
rs780517289A = 0%Tolerated
(0.82)
11,546,615Exon 3
(P-J)
GGTGGT (100%)AGT (12%)AGT (8%) *GGT (100%)CCA→P21
UCA→S21
n.a.n.a.Tolerated
(0.39)
11,546,614Exon 3
(P-J)
GGTGGT (100%)GAT (11%)GGT (100%)GGT (100%)CCA→P21
CUA→L21
n.a.n.a.Tolerated
(0.29)
11,546,585Exon 3
(P-J)
GGGGGG (100%)GGG (100%)AGG (13%)GGG (100%)CCC→P31
UCC→S31
n.a.n.a.Tolerated
(0.53)
11,546,581Exon 3
(P-J)
GGTGTT (6%) *GTT (13%)GGT (100%)GGT (100%)CCA→P32
CAA→Q32
n.a.n.a.Damaging
(0.05)
11,546,566Exon 3
(P-J)
TTTTCT (8%) *TCT (12%)TTT (100%)TTT (100%)AAA→K37
AGA→R37
rs746515947C = 0%Tolerated
(1)
11,546,462Exon 3
(IB-8a)
GGGGGG (100%)AGG (13%)GGG (100%)GGG (100%)CCC→P9
UCC→S9
rs201392419A = 0%Tolerated
(0.58)
11,546,395Exon 3
(IB-8a)
GGTGTT (16%)GTT (10%) *GTT (13%)GTT (4%) *CCA→P31
CAA→Q31
rs11054277T = 0.01%Damaging
(0)
11,546,380Exon 3
(IB-8a)
TTTTCT (17%)TCT (14%)TCT (6%) *TTT (100%)AAA→K37
AGA→R37
rs11054276C = 0.01%Tolerated
(1)
11,546,381Exon 3
(IB-8a)
TTTTTT (100%)CTT (100%)TTT (100%)GTT (13%)AAA→K37
CAA→Q37
rs201455726G = 0.2%Tolerated
(0.42)
11,546,369Exon 3
(IB-8a)
GGGGGG (100%)AGG (12%)GGG (100%)GGG (100%)CCC→P41
UCC→S41
rs1238238576A = 0%Tolerated
(0.42)
11,546,347Exon 3
(IB-8a)
GTTGAT (6%) *GAT (4%) *GAT (15%)GTT (100%)CAA→Q48
CUA→L48
n.a.n.a.Tolerated
(0.32)
11,546,342Exon 3
(IB-8a)
GGTGGT (100%)GGT (100%)AGT (18%)GGT (100%)CCA→P50
UCA→S50
n.a.n.a.Tolerated
(0.41)
11,546,327Exon 3
(IB-8a)
CTGCTG (100%)TTG (11%)TTG (18%)CTG (100%)GAC→D55
AAC→N55
n.a.n.a.Tolerated
(0.28)
11,546,314Exon 3
(IB-8a)
GTTGCT (87%)GCT (77%)GCT (67%)GCT (94%)CAA→Q59
CGA→R59
rs34305575C = 7.6%Tolerated
(0.35)
11,546,309Exon 3
(IB-8a)
CGGGGG (12%)GGG (13%)GGG (18%)GGG (5%) *GCC→A61
CCC→P61
rs201308939G = 3.8%Tolerated
(0.25)
11,546,305Exon 3
(IB-8a)
GCTGTT (3%) *GCT (100%)GTT (11%)GCT (100%)CGA→R62
CAA→Q62
rs199748368T = 0.07%Tolerated
(0.46)
11,546,300Exon 3
(IB-8a)
GGAGGA (100%)AGA (13%)GGA (100%)GGA (100%)CCU→P64
UCU→S64
rs755713521n.a.Tolerated
(0.66)
11,546,294Exon 3
(IB-8a)
CCTCCT (100%)TCT (13%)CCT (100%)CCT (100%)GGA→G66
AGA→R66
n.a.n.a.Damaging (0.03)
11,546,279Exon 3
(IB-8a)
GGTAGT (2%) *GGT (100%)AGT (13%)GGT (100%)CCA→P71
UCA→S71
n.a.n.a.Tolerated
(0.67)
11,546,278Exon 3
(IB-8a)
GGTGAT (2%) *GGT (100%)GAT (13%)GGT (100%)CCA→P71
CUA→L71
rs766408532n.a.Tolerated
(0.26)
11,546,246Exon 3
(IB-8a)
GGGGGG (100%)GGG (100%)AGG (14%)GGG (100%)CCC→P82
UCC→S82
rs1440556057A = 0.0004%Tolerated
(0.42)
11,546,245Exon 3
(IB-8a)
GGGGGG (97%)GAG (7%) *GAG (26%)GAG (7%) *CCC→P82
CUC→L82
rs1262267049A = 0.0004%Tolerated
(0.15)
11,546,213Exon 3
(IB-8a)
GGGGGG (100%)AGG (8%) *AGG (25%)GGG (100%)CCC→P93
UCC→S93
rs1408969762n.a.Tolerated
(0.26)
11,546,187Exon 3
(IB-8a)
GTTGTT (96%)GTC (10%) *GTC (12%)GTC (4%) *CAA→Q101
CAC→H101
n.a.n.a.Tolerated
(0.23)
11,546,161Exon 3
(IB-8a)
GTTGAT (21%)GTT (100%)GAT (30%)GTT (100%)CAA→Q110
CUA→L110
n.a.n.a.Tolerated
(0.61)
11,546,089Exon 3
(P-F)
GGGGGG (100%)GAG (17%) **GAG (17%)GGG (100%)CCC→P10
CUC→L10
n.a.n.a.Tolerated
(0.61)
11,546,084Exon 3
(P-F)
GTTGTT (100%)GTT (100%)ATT (15%)GTT (100%)CAA→Q12
UAA→stop
n.a.n.a.Damaging due to stop
11,546,059Exon 3
(P-F)
GGGGGG (100%)GAG (7%) *GAG (21%)GGG (100%)CCC→P20
CUC→L20
n.a.n.a.Tolerated
(0.19)
11,546,050Exon 3
(P-F)
GGAGTA (4%) *GTA (13%)GGA (100%)GTA (7%) *CCU→P23
CAU→H23
n.a.n.a.Tolerated
(0.56)
11,546,027Exon 3
(P-F)
GGGGGG (100%)AGG (11%)AGG (7%) *GGG (100%)CCC→P31
UCC→S31
rs1201001162n.a.Tolerated
(0.61)
11,546,023Exon 3
(P-F)
GGTGGT (100%)GTT (5%) *GTT (13%)GTT (4%) *CCA→P32
CAA→Q32
rs201391404T = 0.059%Damaging (0.03)
11,546,009Exon 3
(P-F)
TTTTTT (100%)TTT (100%)TTT (95%)GTT (12%)AAA→K37
CAA→Q37
n.a.n.a.Tolerated
(0.26)
11,545,975Exon 3
(P-F)
GTTGAT (2%) *GAT (16%)GAT (33%)GTT (100%)CAA→Q48
CUA→L48
n.a.n.a.Tolerated
(0.31)
11,545,964Exon 3
(P-F)
GGTGGT (100%)CGT (20%)CGT (22%)CGT (19%)CCA→P51
GCA→A51
n.a.n.a.Tolerated
(0.74)
11,545,904Exon 3
(P-H)
GGGGGG (100%)AGG (3%) *AGG (11%)GGG (100%)CCC→P10
UCC→S10
n.a.n.a.Tolerated
(0.8)
11,545,868Exon 3
(P-H)
GGAGGA (100%)GGA (100%)AGA (13%)GGA (100%)CCU→P22
UCU→S22
n.a.n.a.Tolerated
(0.69)
11,545,814Exon 3
(P-H)
GTCGTC (100%)ATC (4%) *ATC (12%)GTC (100%)CAG→Q40
UAG→stop
n.a.n.a.Damaging due to stop
11,545,802Exon 3
(P-H)
GCGGCG (100%)GCG (100%)ACG (11%)GCG (100%)CGC→R44
UGC→C44
rs748815572A = 0%Tolerated
(0.07)
11,545,793Exon 3
(P-H)
GTTGTT (100%)ATT (12%)GTT (100%)GTT (100%)CAA→Q47
UAA→stop
n.a.n.a.Damaging due to stop
11,545,790Exon 3
(P-H)
CCCCCC (100%)CCC (100%)TCC (13%)CCC (100%)GGG→G48
AGG→R48
n.a.n.a.Tolerated
(0.7)
PRB3 (reverse reading, chromosome 12)
11,422,578Exon 1
(Signal)
CGGCGG (100%)CAG (14%)CAG (3%) *CGG (100%)GCC→A8(sp)
GUC→V8(sp)
rs1337927316n.a.Tolerated
(0.06)
11,421,578Exon 2
(Gl-5)
AGGAGG (100%)AAG (11%)AAG (11%)AGG (100%)UCC→S14
UUC→F14
n.a.n.a.Tolerated
(0.32)
11,421,002Exon 3
(Gl-5)
GGGGGG (100%)AGG (11%)AGG (4%) *GGG (100%)CCC→P45
UCC→S45
rs533382585n.a.Damaging (0.04)
11,420,989Exon 3
(Gl-5)
CCGCCG (100%)CTG (14%)CTG (5%) *CCG (96%)GGC→G49
GAC→D49
n.a.n.a.Damaging
(0)
11,420,975Exon 3
(Gl-5)
CCATCA (2%) *TCA (17%)CCA (100%)CCA (100%)GGU→G54
AGU→S54
rs1197023343n.a.Tolerated
(0.12)
11,420,974Exon 3
(Gl-5)
CCACCA (100%)CTA (8%) *CTA (21%)CCA (100%)GGU→G54
GAU→D54
n.a.n.a.Tolerated
(0.19)
11,420,971Exon 3
(Gl-5)
GGGGGG (100%)GGG (100%)GAG (11%)GGG (100%)CCC→P55
CUC→L55
n.a.n.a.Damaging (0.02)
11,420,956Exon 3
(Gl-5)
CCTCCT (98%)CCT (100%)CTT (14%)CCT (100%)GGA→G60
GAA→E60
rs745804122T = 0%Tolerated
(0.06)
11,420,945Exon 3
(Gl-5)
CCTCCT (100%)CCT (100%)TCT (14%)TCT (4%) *GGA→G64
AGA→R64
rs781151188T = 0%Damaging (0.02)
11,420,939Exon 3
(Gl-5)
GGGGGG (100%)AGG (11%) **AGG (11%)GGG (100%)CCC→P66
UCC→S66
n.a.n.a.Damaging (0.04)
11,420,927Exon 3
(Gl-5)
CCTCCT (100%)CCT (100%)TCT (11%)CCT (100%)GGA→G70
AGA→R70
n.a.n.a.Damaging
(0)
11,420,926Exon 3
(Gl-5)
CCTCCT (100%)CCT (100%)CTT (16%)CCT (100%)GGA→G70
GAA→E70
n.a.n.a.Damaging
(0)
11,420,906Exon 3
(Gl-5)
GGTGGT (100%)GGT (100%)AGT (12%)GGT (100%)CCA→P77
UCA→S77
n.a.n.a.Damaging
(0.04)
11,420,899Exon 3
(Gl-5)
GCAGTA (73%)GCA (100%)GTA (65%)GTA (80%)CGU→R79
CAU→H79
rs769836435T = 0.02%Tolerated
(0.59)
11,420,896Exon 3
(Gl-5)
GGCGGC (100%)GGC (100%)GAC (13%)GGC (100%)CCG→P80
CUG→L80
n.a.n.a.Tolerated
(0.09)
11,420,836Exon 3
(Gl-5)
GCAGTA (7%) *GTA (5%) *GTA (9%) *GTA (22%)CGU→R100
CAU→H100
n.a.n.a.Tolerated
(0.24)
11,420,815Exon 3
(Gl-5)
GGTGTT (18%)GGT (100%)GGT (96%)GGT (100%)CCA→P107
CAA→Q107
rs201963893T = 0%Tolerated
(0.45)
11,420,803Exon 3
(Gl-5)
CCTCCT (100%)CCT (100%)CTT (15%)CCT (100%)GGA→G111
GAA→E111
n.a.n.a.Tolerated
(0.41)
11,420,800Exon 3
(Gl-5)
CCTCCT (97%)CCT (100%)CTT (11%)CCT (100%)GGA→G112
GAA→E112
n.a.n.a.Damaging (0.01)
11,420,780Exon 3
(Gl-5)
GGCGGC (100%)AGC (11%)GGC (100%)GGC (100%)CCG→P119
UCG→S119
n.a.n.a.Damaging (0.04)
11,420,779Exon 3
(Gl-5)
GGCGAC (4%) *GAC (6%) *GAC (35%)GGC (100%)CCG→P119
CUG→L119
n.a.n.a.Damaging (0.03)
11,420,728Exon 3
(Gl-5)
AGGAAG (4%) *AGG (100%)AAG (11%)AGG (100%)UCC→S136
UUC→F136
n.a.n.a.Damaging (0.04)
11,420,716Exon 3
(Gl-5)
GGCGAC (4%) *GGC (100%)GAC (17%)GGC (100%)CCG→P140
CUG→L140
n.a.n.a.Tolerated
(0.12)
11,420,687Exon 3
(Gl-5)
GGGGGG (98%)AGG (15%)GGG (100%)GGG (100%)CCC→P150
UCC→S150
n.a.n.a.Tolerated
(0.15)
11,420,686Exon 3
(Gl-5)
GGGGGG (98%)GAG (8%) *GAG (18%)GGG (100%)CCC→P150
CUC→L150
n.a.n.a.Tolerated
(0.15)
11,420,614Exon 3
(Gl-2)
CCTCCT (100%)CCT (100%)CTT (11%)CCT (100%)GGA→G132
GAA→E132
rs768625455n.a.NS
11,420,597Exon 3
(Gl-2)
CCACCA (100%)CCA (100%)TCA (13%)CCA (100%)GGU→G138
AGU→S138
rs780713977n.a.Tolerated
(0.09)
11,420,588Exon 3
(Gl-2)
GGAAGA (4%) *AGA (10%) *AGA (16%)GGA (100%)CCU→P141
UCU→S141
n.a.n.a.Tolerated
(0.78)
11,420,495Exon 3
(Gl-2)
GGTAGT (12%)AGT (3%) *AGT (6%) *AGT (14%)CCA→P172
UCA→S172
n.a.n.a.Tolerated
(0.14)
11,420,308Exon 4
(Gl-2)
GGGGGG (100%)AGG (17%)GGG (100%)GGG (100%)CCC→P234
UCC→S234
rs760324380A = 0.0008%Tolerated
(0.09)
11,420,307Exon 4
(Gl-2)
GGGGGG (100%)GAG (12%)GGG (100%)GGG (100%)CCC→P234
CUC→L234
n.a.n.a.Damaging (0.03)
11,420,304Exon 4
(Gl-2)
GGTGGT (100%)GAT (12%)GGT (100%)GGT (100%)CCA→P235
CUA→L235
n.a.n.a.Damaging (0.01)
11,420,281Exon 4
(Gl-2)
GCAGCA (100%)ACA (13%)ACA (10%) *GCA (100%)CGU→R243
UGU→C243
rs758570507A = 0%Damaging (0.05)
11,420,278Exon 4
(Gl-2)
GGGGGG (100%)GGG (100%)AGG (11%)GGG (100%)CCC→P244
UCC→S244
n.a.n.a.Tolerated
(0.27)
11,420,182Exon 4
(Gl-2)
GGTGGT (100%)GGT (100%)AGT (11%)GGT (100%)CCA→P277
UCA→S277
rs755939114A = 0%Tolerated
(0.06)
11,420,170Exon 4
(Gl-2)
CCCCCC (100%)CCC (100%)TCC (11%)CCC (100%)GGG→G280
AGG→R280
n.a.n.a.Tolerated
(0.07)
11,420,161Exon 4
(Gl-2)
GGTGGT (100%)GGT (100%)AGT (13%)GGT (100%)CCA→P283
UCA→S283
n.a.n.a.Tolerated
(0.21)
11,420,160Exon 4
(Gl-2)
GGTGGT (100%)GGT (100%)GAT (19%)GGT (100%)CCA→P283
CUA→L283
n.a.n.a.Tolerated
(0.09)
11,420,154Exon 4
(Gl-2)
TCTTTT (3%) *TCT (100%)TTT (11%)TCT (100%)AGA→R285
AAA→K285
n.a.n.a.Tolerated
(0.63)
PRB4 (reverse reading, chromosome 12)
11,463,280Exon 1
(PGA)
TCATGA (100%)TGA (100%)TGA (97%)TGA (100%)AGU→S2
ACU→T2
n.a.n.a.Tolerated (0.83)
11,461,801Exon 3
(PGA)
GCTGCT (98%)GCT (97%)GTT (13%)GCT (100%)CGA→R23
CAA→Q23
n.a.n.a.Tolerated (0.57)
11,461,772Exon 3
(PGA)
GCAGCA (100%)GCA (96%)ACA (12%)GCA (100%)CGU→R33
UGU→C33
rs77775235A = 0%Tolerated (0.06)
11,461,769Exon 3
(PGA)
GGGTGG (5%) *TGG (9%) *TGG (5%) *TGG (13%)CCC→P34
ACC→T34
rs144658455T = 0%Tolerated (0.53)
11,461,745Exon 3
(PGA)
GTTCTT (8%) *CTT (8%) *CTT (5%) *CTT (12%)CAA→Q42
GAA→E42
rs76859544C = 6.8%Tolerated
(1)
11,461,742Exon 3
(PGA)
CCTTCT (10%) *TCT (27%)TCT (11%)TCT (7%) *GGA→G43
AGA→R43
rs776943151T = 0.05%Tolerated (0.45)
11,461,706Exon 3
(PGA)
GGGTGG (14%)TGG (23%)TGG (13%)TGG (20%)CCC→P55
ACC→T55
rs12308381T = 21.6%Tolerated (0.12)
11,461,675Exon 3
(PGA)
GCTGGT (1%) *GGT (2%) *GGT (2%) *GGT (28%)CGA→R65
CCA→P65
rs75743553G = 0%Tolerated (0.32)
11,461,673Exon 3
(PGA)
GGGGGG (99%)AGG (13%)AGG (2%) *GGG (100%)CCC→P66
UCC→S66
rs1332850459A = 0%Tolerated (0.25)
11,461,580Exon 3
(PGA)
TGGGGG (65%)GGG (52%)GGG (24%)GGG (54%)ACC→T97
CCC→P97
n.a.n.a.Tolerated (0.81)
11,461,570Exon 3
(PGA)
GGAGTA (51%)GTA (54%)GTA (8%) *GTA (47%)CCU→P100
CAU→H100
n.a.n.a.Tolerated (0.59)
11,461,553Exon 3
(PGA)
TCTCCT (13%)CCT (15%)TCT (100%)CCT (24%)AGA→R106
GGA→G106
n.a.n.a.Tolerated (0.84)
11,461,550Exon 3
(PGA)
GGTGGT (100%)AGT (17%)GGT (100%)GGT (100%)CCA→P107
UCA→S107
n.a.n.a.Tolerated (0.50)
11,461,549Exon 3
(PGA)
GGTGCT (13%)GCT (6%) *GGT (100%)GCT (13%)CCA→P107
CGA→R107
n.a.n.a.Tolerated
(0.9)
11,461,525Exon 3
(PGA)
AGGAGG (100%)AAG (100%)AAG (100%)AGG (100%)UCC→S115
UUC→F115
n.a.n.a.Damaging (0.04)
11,461,513Exon 3
(PGA)
GGTGGT (100%)GAT (10%) *GAT (11%)GGT (100%)CCA→P119
CUA→L119
n.a.n.a.Damaging (0.04)
11,461,471Exon 3
(PGA)
CCACCA (100%)CTA (4%) *CTA (14%)CCA (100%)GGU→G133
GAU→D133
n.a.n.a.Tolerated
(0.46)
11,461,421Exon 3
(PGA)
GGGGGG (100%)AGG (5%) *AGG (6%) *AGG (100%)CCC→P150
UCC→S150
n.a.n.a.Tolerated (0.18)
11,461,420Exon 3
(PGA)
GGGGGG (100%)GAG (11%)GGG (100%)GGG (100%)CCC→P150
CUC→L150
n.a.n.a.Tolerated
(0.1)
11,461,412Exon 3
(PGA)
CTTCTT (100%)TTT (14%)CTT (100%)CTT (100%)GAA→E153
AAA→K153
n.a.n.a.Tolerated (0.85)
11,461,319Exon 4
(P-D P32A)
GGAGGA (97%)AGA (9%) *AGA (11%)GGA (100%)CCU→P23
UCU→S23
n.a.n.a.Tolerated (0.55)
11,461,309Exon 4
(P-D P32A)
GGTGGT (100%)GGT (100%)GAT (11%)GGT (100%)CCA→P26
CUA→L26
n.a.n.a.Damaging (0.01)
11,461,229Exon 4
(P-D P32A)
GGAGGA (100%)AGA (13%)AGA (4%) *GGA (100%)CCU→P54
UCU→S54
n.a.n.a.Tolerated (0.13)
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored. The variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. The genomic variants whose frequencies show a different geographic distribution among humans are in red text.
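The reading convention used in Table 1 can be illustrated with a short sketch. For genes marked "reverse reading", the table lists each triplet as it appears on the reference (hg19) strand, and the codon column appears to be its base-by-base complement written as RNA (for example, CTT is listed with GAA→E and TTT with AAA→K). The Python snippet below is only an illustration under that assumption; it is not the authors' pipeline, and the function names `displayed_triplet_to_codon` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA base on the displayed strand -> complementary RNA base.
DNA_TO_COMPLEMENT_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def displayed_triplet_to_codon(triplet: str, reverse_reading: bool) -> str:
    """Convert a triplet as printed in the table into an mRNA codon.

    Assumption: for 'reverse reading' genes the codon is the base-by-base
    complement of the printed triplet; for 'direct reading' genes only
    T is replaced by U.
    """
    if reverse_reading:
        return "".join(DNA_TO_COMPLEMENT_RNA[base] for base in triplet)
    return triplet.replace("T", "U")


def translate(codon: str) -> str:
    """Return the one-letter amino acid code ('*' for a stop codon)."""
    return CODON_TABLE[codon]


# First PRB1 row of Table 1: modern human CTT versus archaic TTT.
reference = displayed_triplet_to_codon("CTT", reverse_reading=True)
variant = displayed_triplet_to_codon("TTT", reverse_reading=True)
print(reference, translate(reference), variant, translate(variant))
```

Under the same assumption, a printed GTT→ATT change in a reverse-reading gene corresponds to CAA→UAA, which is how the table's "Damaging due to stop" entries arise.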
Table 2. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on PRH2, HTN1, HTN3, AMY1A, STATH, and SMR3B gene loci.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score)
PRH2 (direct reading, chromosome 12)
11,082,885Exon 2
(PRP-1)
GTTATT (2%) *ATT (12%)ATT (4%) *GTT (100%)GUU→V12
AUU→I12
rs776898585A = 0%N.S
11,082,894Exon 2
(PRP-1)
GTAGTA (100%)ATA (12%)ATA (10%) *GTA (100%)GUA→V15
AUA→I15
n.a.n.a.Tolerated (0.26)
11,083,305Exon 3
(PRP-1)
CCACCA (98%)TCA (14%)TCA (14%)CCA (100%)CCA→P33
UCA→S33
n.a.n.a.Tolerated (0.07)
11,083,318Exon 3
(PRP-1)
GGAGGA (100%)GAA (14%)GGA (100%)GGA (100%)GGA→G37
GAA→E37
n.a.n.a.Tolerated (0.07)
11,083,323Exon 3
(PRP-1)
CAACAA (100%)TAA (8%) *TAA (12%)CAA (100%)CAA→Q39
UAA→stop
n.a.n.a.Damaging due to stop
11,083,426Exon 3
(PRP-1)
GGAGGA (100%)GGA (100%)GAA (11%)GGA (100%)GGA→G73
GAA→E73
n.a.n.a.Damaging (0.02)
11,083,431Exon 3
(PRP-1)
CCACCA (100%)TCA (13%)TCA (8%) *TCA (6%) *CCA→P75
UCA→S75
n.a.n.a.Tolerated (0.23)
11,083,452Exon 3
(PRP-1)
GGAGGA (100%)AGA (6%) *AGA (14%)GGA (100%)GGA→G82
AGA→R82
n.a.n.a.Damaging (0.01)
11,083,455Exon 3
(PRP-1)
GGCGGC (100%)AGC (17%)GGC (100%)GGC (100%)GGC→G83
AGC→S83
n.a.n.a.N.S.
11,083,488Exon 3
(PRP-1)
GGAGGA (100%)GGA (100%)AGA (11%)GGA (100%)GGA→G94
AGA→R94
n.a.n.a.Damaging (0.04)
11,083,531Exon 3
(PRP-1)
AGGAGG (100%)AGG (100%)AAG (18%)AGG (100%)AGG→R108
AAG→K108
n.a.n.a.N.S.
11,083,536Exon 3
(PRP-1)
CAACAA (100%)TAA (11%)CAA (100%)CAA (100%)CAA→Q110
UAA→stop
n.a.n.a.N.S.
11,083,545Exon 3
(PRP-1)
CCCCCC (100%)TCC (12%)TCC (6%) *CCC (100%)CCC→P113
UCC→S113
rs1289206423T = 0%N.S.
11,083,551Exon 3
(PRP-1)
CAGCAG (97%)CAG (100%)TAG (13%)CAG (100%)CAG→Q115
UAG→stop
n.a.n.a.N.S.
11,083,570Exon 3
(PRP-1)
GGTGGT (100%)GAT (18%)GGT (100%)GGT (100%)GGU→G121
GAU→D121
n.a.n.a.N.S.
11,083,575Exon 3
(PRP-1)
CCCCCC (96%)TCC (8%) *TCC (15%)CCC (100%)CCC→P123
UCC→S123
n.a.n.a.N.S.
11,083,581Exon 3
(PRP-1)
CCTCCT (100%)TCT (20%)TCT (8%) *CCT (100%)CCU→P125
UCU→S125
n.a.n.a.N.S.
11,083,582Exon 3
(PRP-1)
CCTCCT (100%)CTT (13%)CTT (8%) *CCT (100%)CCU→P125
CUU→L125
n.a.n.a.N.S.
11,083,605Exon 3
(PRP-1)
CCACCA (100%)TCA (11%)CCA (100%)CCA (100%)CCA→P133
UCA→S133
rs1343870622T = 0%N.S.
11,083,618Exon 3
(PRP-1)
GGGGGG (100%)GAG (11%)GGG (100%)GGG (100%)GGG→G137
GAG→E137
n.a.n.a.N.S.
11,083,635Exon 3
(PRP-1)
CCTCCT (100%)CCT (100%)TCT (16%)CCT (100%)CCU→P143
UCU→S143
n.a.n.a.N.S.
11,083,636Exon 3
(PRP-1)
CCTCCT (100%)CCT (100%)CTT (11%)CCT (100%)CCU→P143
CUU→L143
n.a.n.a.N.S.
11,083,663Exon 3
(C-term removal)
TCTTCT (100%)TCT (100%)TTT (17%)TCT (100%)UCU→S152(rem)
UUU→F152(rem)
rs746351335n.a.N.S.
HTN1 (direct reading, chromosome 4)
70,920,165Exon 4CATCAT (100%)TAT (2%) *TAT (13%)CAT (100%)CAU→H15
UAU→Y15
n.a.n.a.Tolerated (0.37)
70,921,215Exon 5GAAGAA (100%)AAA (3%) *AAA (11%)GAA (100%)GAA→E16
AAA→K16
n.a.n.a.N.S
70,921,234Exon 5CGACAA (2%) *CAA (58%)CAA (3%) *CGA (100%)CGA→R32
CAA→Q32
rs375127098A = 0.014%N.S
HTN3 (direct reading, chromosome 4)
70,896,460Exon 2
(Signal)
ATGATG (100%)ATA (11%)ATG (100%)ATG (100%)AUG→M0(sp)
AUA→I0(sp)
n.a.n.a.N.S
70,897,696Exon 3
(Signal)
GGAGGA (100%)AGA (12%)AGA (4%) *GGA (100%)GGA→G17(sp)
AGA→R17(sp)
rs1254624179n.a.N.S
AMY1A (reverse reading, chromosome 1)
104,238,248Exon 2
(Signal)
ACCACC (100%)ACC (100%)ATC (15%)ACC (100%)UGG→W4(sp)
UAG→stop
n.a.n.a.Damaging due to stop
104,238,189Exon 2GCTGCT (100%)ACT (13%)ACT (20%) **GCT (100%)CGA→R10
UGA→stop
n.a.n.a.Damaging due to stop
104,237,696Exon 3ACCACC (100%)ACC (100%)ATC (17%)ACC (100%)UGG→W59
UAG→stop
n.a.n.a.Damaging due to stop
104,237,685Exon 3GTTGTT (100%)GTT (100%)ATT (14%)GTT (100%)CAA→Q63
UAA→stop
n.a.n.a.Damaging due to stop
104,237,626Exon 3TACTAC (100%)TAC (100%)TAT (15%)TAC (100%)AUG→M82
AUA→I82
n.a.n.a.Damaging (0.01)
104,236,795Exon 4GCAGCA (100%)GCA (100%)ACA (13%)GCA (100%)CGU→R92
UGU→C92
n.a.n.a.Damaging (0)
104,236,666Exon 4CTACTA (100%)CTA (100%)TTA (11%)CTA (100%)GAU→D135
AAU→N135
n.a.n.a.Tolerated (0.08)
104,236,654Exon 4CCACCA (100%)TCA (5%) *TCA (11%)CCA (100%)GGU→G139
AGU→S139
n.a.n.a.Tolerated (0.6)
104,236,152Exon 5CAGCAG (100%)TAG (15%)TAG (20%)CAG (100%)GUC→V157
AUC→I157
n.a.n.a.Tolerated (0.17)
104,236,146Exon 5CTACTA (100%)TTA (8%) *TTA (12%)CTA (100%)GAU→D159
AAU→N159
n.a.n.a.Tolerated (1)
104,236,139Exon 5GCAGTA (4%) *GTA (7%) *GTA (12%)GCA (100%)CGU→R161
CAU→H161
n.a.n.a.Damaging (0.01)
104,236,080Exon 5CTTCTT (100%)CTT (100%)TTT (13%)CTT (100%)GAA→E181
AAA→K181
n.a.n.a.Tolerated (0.11)
104,235,996Exon 5CGTCGT (96%)CGT (100%)TGT (13%)CGT (100%)GCA→A209
ACA→T209
n.a.n.a.Tolerated (0.27)
104,235,164Exon 6CTCCTC (100%)CTC (100%)TTC (11%)CTC (100%)GAG→E240
AAG→K240
n.a.n.a.Damaging (0.01)
104,235,148Exon 6TCATCA (100%)TCA (100%)TTA (18%)TCA (100%)AGU→S245
AAU→N245
n.a.n.a.Tolerated (0.52)
104,235,083Exon 6GCGACG (3%) *ACG (6%) *ACG (12%)GCG (100%)CGC→R267
UGC→C267
n.a.n.a.Damaging (0)
104,234,224Exon 7CCTCCT (100%)CCT (100%)CTT (13%)CCT (100%)GGA→G281
GAA→E281
n.a.n.a.Damaging (0)
104,234,218Exon 7CCACCA (100%)CTA (13%)CTA (15%)CCA (100%)GGU→G283
GAU→D283
n.a.n.a.Tolerated (0.25)
104,234,129Exon 7GAAGAA (100%)AAA (13%)GAA (100%)GAA (100%)CUU→L313
UUU→F313
n.a.n.a.Damaging (0)
104,234,125Exon 7TGGTGG (100%)TAG (17%)TGG (100%)TGG (100%)ACC→T314
AUC→I314
n.a.n.a.Damaging (0)
104,233,978Exon 8GGAGGA (100%)AGA (13%)AGA (11%)GGA (100%)CCU→P332
UCU→S332
n.a.n.a.Damaging (0.05)
104,233,977Exon 8GGAGGA (100%)GAA (6%) *GAA (11%)GGA (100%)CCU→P332
CUU→L332
n.a.n.a.Damaging (0)
104,233,963Exon 8GCTGCT (100%)GCT (100%)ACT (14%)GCT (100%)CGA→R337
UGA→stop
rs19955486A = 0.08%Damaging due to stop
104,231,858Exon 9ACAACA (100%)ACA (100%)ATA (11%)ACA (100%)UGU→C378
UAU→Y378
n.a.n.a.Damaging (0)
104,231,680Exon 10CACCAC (100%)TAC (4%) *TAC (20%)CAC (100%)GUG→V401
AUG→M401
n.a.n.a.Damaging (0)
104,231,643Exon 10CCCCCC (100%)CTC (5%) *CTC (11%)CCC (100%)GGG→G413
GAG→E413
n.a.n.a.Damaging (0.02)
104,231,622Exon 10CCCCCC (100%)CCC (100%)CTC (13%)CCC (100%)GGG→G420
GAG→E420
n.a.n.a.Tolerated (0.08)
104,230,237Exon 11TGATGA (100%)TGA (100%)TAA (13%)TGA (100%)ACU→T442
AUU→I442
n.a.n.a.Damaging (0)
104,230,129Exon 11AGAAGA (100%)AGA (100%)AAA (13%)AGA (100%)UCU→S478
UUU→F478
n.a.n.a.Tolerated (0.62)
STATH (direct reading, chromosome 4)
70,866,583Exon 5GGGGGG (100%)AGG (13%)AGG (3%) *GGG (100%)GGG→G17
AGG→R17
n.a.n.a.N.A.
70,866,616Exon 5CCACCA (98%)CCA (100%)TCA (11%)TCA (3%) *CCA→P28
UCA→S28
n.a.n.a.N.A.
70,866,626Exon 5CCACCA (100%)CTA (15%)CCA (100%)CCA (96%)CCA→P31
CUA→L31
n.a.n.a.N.A.
70,866,628Exon 5CAACAA (100%)TAA (15%)CAA (100%)CAA (100%)CAA→Q32
UAA→stop
n.a.n.a.Damaging due to stop
SMR3B (direct reading, chromosome 4)
71,255,405Exon 3AGGAGG (100%)AGG (100%)AAG (12%)AGG (100%)AGG→R5
AAG→K5
rs777831757A = 0%NS
71,255,444Exon 3CCTCCT (100%)CTT (12%)CTT (3%) *CCT (100%)CCU→P18
CUU→L18
n.a.n.a.NS
71,255,495Exon 3GGGGGG (100%)GGG (94%)GAG (17%)GGG (100%)GGG→G35
GAG→E35
n.a.n.a.NS
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV considering the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10% and ** counts < 10; n.a.: not available; NS: not scored.
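The flags defined in this footnote can be read programmatically from a single variant-frequency cell. The sketch below is a hypothetical helper, not part of the study: it assumes a cell has the printed form `TRIPLET (NN%)`, optionally followed by `*` (frequency ≤ 10%) or `**` (counts < 10), and the names `VariantCell` and `parse_variant_cell` are illustrative.

```python
import re
from typing import NamedTuple


class VariantCell(NamedTuple):
    triplet: str          # base triplet as printed, e.g. "TTT"
    frequency_pct: float  # variant frequency reported from IGV, in percent
    low_frequency: bool   # single asterisk: frequency <= 10%
    low_count: bool       # double asterisk: counts < 10


# Triplet, percentage in parentheses, then zero, one, or two asterisks.
CELL_PATTERN = re.compile(r"^([ACGT]{3})\s*\((\d+(?:\.\d+)?)%\)\s*(\*{0,2})$")


def parse_variant_cell(cell: str) -> VariantCell:
    """Parse one variant-frequency cell such as 'TTT (7%) *'."""
    match = CELL_PATTERN.match(cell.strip())
    if match is None:
        raise ValueError(f"Unrecognized cell format: {cell!r}")
    triplet, pct, stars = match.groups()
    return VariantCell(triplet, float(pct), stars == "*", stars == "**")


print(parse_variant_cell("TTT (7%) *"))
print(parse_variant_cell("ACT (20%) **"))
print(parse_variant_cell("CTT (100%)"))
```

Keeping the two flags as separate booleans lets a downstream filter drop low-support calls (for example, all cells with `low_count` set) without re-parsing the text.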
Table 3. Neanderthal and Denisovan nucleotide substitutions and the corresponding SIFT results on CST1, CST2, CST3, CST4, CST5, CSTA, and CSTB gene loci.
Chromosome Position (hg19) | Gene Region | Modern Human | Altai Neanderthal (Variant Frequency a) | Chagyrskaya Neanderthal (Variant Frequency a) | Vindija Neanderthal (Variant Frequency a) | Denisovan (Variant Frequency a) | Codon→Amino Acid | SNP id | SNP Total Frequency (ALFA) | SIFT Results (Score)
CST1 (reverse reading, chromosome 20)
23,731,494Exon 1 (Signal)ATAGTA (100%)GTA (95%)GTA (100%)GTA (100%)UAU→Y3(sp)
CAU→H3(sp)
rs6076122G = 71.1%Tolerated
(0.11)
23,731,463Exon 1
(Signal)
TGGTAG (2%) *TAG (13%)TAG (5%) *TGG (100%)ACC→T13(sp)
AUC→I13(sp)
n.a.n.a.Tolerated
(0.39)
23,731,455Exon 1
(Signal)
CACCAC (100%)CAC (100%)TAC (16%)CAC (100%)GUG→V16(sp)
AUG→M16(sp)
n.a.n.a.Tolerated
(0.23)
23,731,446Exon 1
(Signal)
CGGCGG (100%)CGG (100%)TGG (11%)CGG (100%)GCC→A19(sp)
ACC→T19(sp)
rs1425228752T = 0.001%Damaging
(0.01)
23,731,439Exon 1TCGTCG (100%)TTG (6%) *TTG (14%)TCG (100%)AGC→S2
AAC→N2
n.a.n.a.Tolerated
(0.15)
23,731,428Exon 1CTCCTC (100%)CTC (100%)TTC (21%)CTC (100%)GAG→E6
AAG→K6
rs1292698911T = 0.0004%Tolerated
(0.66)
23,731,394Exon 1CGTCGT (100%)CAT (13%)CGT (100%)CGT (100%)GCA→A17
GUA→V17
n.a.n.a.Tolerated
(0.25)
23,731,344Exon 1CTCTTC (3%) *CTC (100%)TTC (11%)TTC (3%) *GAG→E34
AAG→K34
rs368203290T = 0.008%Tolerated
(0.07)
23,731,307Exon 1GCAGCA (100%)GTA (14%)GCA (100%)GTA (6%) *CGU→R46
CAU→H46
rs758187154T = 0%Damaging
(0.01)
23,731,281Exon 1GTTGTT (100%)GTT (100%)ATT (13%)GTT (100%)CAA→Q55
UAA→stop
n.a.n.a.Damaging due to stop
23,729,759Exon 2CCCCCC (100%)CCC (100%)CGC (26%)CCC (100%)GGG→G59
GCG→A59
n.a.n.a.Tolerated
(1)
23,729,700Exon 2GGGGGG (100%)GGG (100%)AGG (11%)GGG (100%)CCC→P79
UCC→S79
n.a.n.a.Tolerated
(0.38)
23,729,699Exon 2GGGGGG (100%)GAG (3%) *GAG (11%)GGG (100%)CCC→P79
CUC→L79
rs756782667A = 0%Tolerated
(0.06)
23,729,687Exon 2TGGTGG (100%)TAG (16%)TAG (4%) *TGG (100%)ACC→T83
AUC→I83
n.a.n.a.Damaging
(0.02)
23,728,503Exon 3GGGGGG (100%)AGG (11%)AGG (3%) *GGG (100%)CCC→P106
UCC→S106
rs754531104A = 0.004%Tolerated
(0.09)
23,728,494Exon 3
(Cys-SN)
TTGCTG (10%) *CTG (11%)CTG (14%)CTG (4%) *AAC→N109
GAC→D109
rs3188319C = 0.004%Tolerated
(1)
23,728,490Exon 3TCTTTT (2%) *TTT (14%)TCT (100%)TCT (100%)AGA→R110
AAA→K110
n.a.n.a.Tolerated
(1)
23,728,487Exon 3TCCTCC (100%)TTC (13%)TTC (7%) *TCC (100%)AGG→R111
AAG→K111
rs3188320T = 0%Tolerated
(0.85)
CST2 (reverse reading, chromosome 20)
23,807,260 | Exon 1 (Signal) | CGG | CGG (100%) | CGG (100%) | CAG (14%) | CGG (100%) | GCC→A12(sp) / GUC→V12(sp) | rs1411653443 | A = 0.007% | Damaging (0.02)
23,807,257 | Exon 1 (Signal) | TGG | TGG (100%) | TAG (14%) | TGG (100%) | TGG (100%) | ACC→T13(sp) / AUC→I13(sp) | n.a. | n.a. | Tolerated (0.43)
23,807,245 | Exon 1 (Signal) | CGG | CGG (100%) | CAG (14%) | CGG (100%) | CGG (100%) | GCC→A17(sp) / GUC→V17(sp) | n.a. | n.a. | Tolerated (0.1)
23,807,231 | Exon 1 | GGG | GGG (100%) | AGG (14%) | AGG (8%) * | GGG (100%) | CCC→P3 / UCC→S3 | n.a. | n.a. | Tolerated (1)
23,807,162 | Exon 1 | GCA | ACA (95%) | ACA (100%) | ACA (100%) | ACA (8%) * | CGU→R26 / UGU→C26 | rs111349461 | A = 0.06% | Damaging (0.05)
23,807,138 | Exon 1 | CTC | TTC (3%) * | TTC (12%) | TTC (6%) * | CTC (100%) | GAG→E34 / AAG→K34 | rs541427772 | T = 0.017% | Tolerated (0.07)
23,807,102 | Exon 1 | GCG | ACG (3%) * | GCG (100%) | ACG (11%) | GCG (100%) | CGC→R46 / UGC→C46 | rs112783512 | A = 0.019% | Tolerated (0.07)
23,807,093 | Exon 1 | GCC | GCC (100%) | ACC (4%) | ACC (20%) | GCC (100%) | CGG→R49 / UGG→W49 | rs55860552 | A = 0.12% | Damaging (0)
23,807,084 | Exon 1 | GCT | GCT (100%) | ACT (5%) * | ACT (15%) | GCT (100%) | CGA→R52 / UGA→stop | rs568411970 | A = 0% | Damaging due to stop
23,807,077 | Exon 1 | TCC | TCC (100%) | TCC (100%) | TTC (13%) | TCC (100%) | AGG→R54 / AAG→K54 | n.a. | n.a. | Tolerated (0.34)
23,807,075 | Exon 1 | CTC | CTC (100%) | TTC (12%) | TTC (12%) | CTC (100%) | GAG→E55 / AAG→K55 | n.a. | n.a. | Tolerated (1)
23,805,930 | Exon 2 | TAT | CAT (7%) * | CAT (5%) * | CAT (14%) | CAT (4%) * | AUA→I67 / GUA→V67 | rs199856966 | C = 0.004% | Tolerated (1)
23,805,917 | Exon 2 | GCT | GTT (2%) * | GTT (13%) | GTT (5%) * | GTT (2%) * | CGA→R71 / CAA→Q71 | rs150428155 | T = 0.008% | Damaging (0.01)
23,805,878 | Exon 2 | ACA | ACA (100%) | ACA (97%) | ATA (14%) | ACA (100%) | UGU→C84 / UAU→Y84 | n.a. | n.a. | Damaging (0)
23,805,875 | Exon 2 | CGG | CGG (100%) | CAG (15%) | CAG (2%) * | CGG (100%) | GCC→A85 / GUC→V85 | n.a. | n.a. | Tolerated (0.06)
23,804,730 | Exon 3 | ACG | ACG (100%) | ATG (7%) * | ATG (11%) | ACG (100%) | UGC→C98 / UAC→Y98 | n.a. | n.a. | Damaging (0)
23,804,702 | Exon 3 | ACC | ACC (100%) | ACT (12%) | ACC (100%) | ACC (100%) | UGG→W107 / UGA→stop | rs1380420803 | n.a. | Damaging due to stop
23,804,691 | Exon 3 | TAC | TCC (13%) | TCC (10%) * | TCC (9%) * | TAC (100%) | AUG→M111 / AGG→R111 | rs202150666 | C = 0.01% | Tolerated (0.31)
CST3 (reverse reading, chromosome 20)
23,618,472 | Exon 1 (Signal) | GAG | GAG (100%) | AAG (8%) * | AAG (15%) | GAG (100%) | CUC→L8(sp) / UUC→F8(sp) | rs1285248919 | n.a. | Damaging (0)
23,618,433 | Exon 1 | GGG | GGG (100%) | GGG (100%) | AGG (13%) | GGG (100%) ** | CCC→P22(sp) / UCC→S22(sp) | n.a. | n.a. | Tolerated (0.5)
23,618,370 | Exon 1 | CAC | CAC (100%) | CAC (100%) | TAC (13%) | CAC (100%) | GUG→V18 / AUG→M18 | n.a. | n.a. | Tolerated (0.11)
23,618,358 | Exon 1 | CCA | CCA (100%) | TCA (22%) | TCA (4%) * | CCA (100%) | GGU→G22 / AGU→S22 | n.a. | n.a. | Tolerated (0.48)
23,618,357 | Exon 1 | CCA | CCA (100%) | CTA (11%) | CCA (100%) | CCA (100%) | GGU→G22 / GAU→D22 | n.a. | n.a. | Tolerated (0.56)
23,618,295 | Exon 1 | GTG | GTG (100%) | GTG (100%) | ATG (13%) | GTG (100%) | CAC→H43 / UAC→Y43 | n.a. | n.a. | Tolerated (1)
23,615,994 | Exon 2 | CCC | CTC (3%) * | CCC (100%) | CTC (13%) | CCC (100%) | GGG→G59 / GAG→E59 | n.a. | n.a. | Damaging (0.01)
23,614,564 | Exon 3 | GTC | GTC (100%) | GTC (100%) | ATC (13%) | GTC (100%) | CAG→Q118 / UAG→stop | n.a. | n.a. | Damaging due to stop
CST4 (reverse reading, chromosome 20)
23,669,566 | Exon 1 (Signal) | TGG | TGG (100%) | TAG (7%) * | TAG (11%) | TGG (100%) | ACC→T13(sp) / AUC→I13(sp) | rs770415022 | n.a. | Tolerated (0.37)
23,669,561 | Exon 1 (Signal) | CGA | CGA (100%) | CGA (100%) | CGA (100%) | AGA (100%) | GCU→A15(sp) / UCU→S15(sp) | n.a. | n.a. | Tolerated (0.39)
23,669,539 | Exon 1 | AGG | AGG (100%) | AAG (5%) * | AAG (13%) | AGG (100%) | UCC→S3 / UUC→F3 | n.a. | n.a. | Tolerated (0.08)
23,669,470 | Exon 1 | GCA | GCA (100%) | GTA (15%) | GCA (100%) | GTA (17%) | CGU→R26 / CAU→H26 | rs201273557 | T = 0.01% | Tolerated (0.08)
23,669,462 | Exon 1 | GTG | GTG (100%) | GTG (100%) | ATG (18%) | GTG (100%) | CAC→H29 / UAC→Y29 | n.a. | n.a. | Tolerated (0.06)
23,669,408 | Exon 1 | GGC | GGC (100%) | AGC (12%) | GGC (100%) | GGC (100%) | CCG→P47 / UCG→S47 | n.a. | n.a. | Tolerated (0.06)
23,667,835 | Exon 2 | AAA | CAA (97%) | CAA (100%) | CAA (90%) | AAA (100%) | UUU→F58 / GUU→V58 | rs145608577 | C = 0.2% | Tolerated (1)
23,667,828 | Exon 2 | CCC | CCC (100%) | CTC (18%) | CCC (100%) | CCC (100%) | GGG→G60 / GAG→E60 | rs144556333 | T = 0.007% | Damaging (0)
23,667,826 | Exon 2 | CAC | CAC (100%) | TAC (10%) * | TAC (27%) | CAC (100%) | GUG→V61 / AUG→M61 | n.a. | n.a. | Tolerated (0.24)
23,667,808 | Exon 2 | CAT | CAT (100%) | TAT (13%) | CAT (100%) | TAT (4%) * | GUA→V67 / AUA→I67 | rs774067751 | T = 0.007% | Tolerated (0.23)
23,667,792 | Exon 2 | TGG | TGG (100%) | TAG (13%) | TGG (100%) | TGG (100%) | ACC→T72 / AUC→I72 | n.a. | n.a. | Damaging (0)
23,667,783 | Exon 2 | TGG | TGG (100%) | TGG (95%) | TAG (15%) | TGG (100%) | ACC→T75 / AUC→I75 | rs760057501 | A = 0% | Damaging (0.01)
23,666,565 | Exon 3 | TAC | TCC (88%) | TCC (14%) | TCC (80%) | TAC (100%) | AUG→M111 / AGG→R111 | rs779547810 | C = 0% | Tolerated (0.87)
CST5 (reverse reading, chromosome 20)
23,860,243 | Exon 1 | AGC | AAC (3%) * | AGC (100%) | AAC (11%) | AAC (5%) * | UCG→S4 / UUG→L4 | rs145031249 | A = 0.011% | Tolerated (0.27)
23,860,211 | Exon 1 | GTA | GTA (100%) | GTA (100%) | ATA (12%) | GTA (100%) | CAU→H15 / UAU→Y15 | n.a. | n.a. | Tolerated (1)
23,860,199 | Exon 1 | GAG | GAG (100%) | AAG (11%) | GAG (100%) | GAG (100%) | CUC→L19 / UUC→F19 | rs370924959 | A = 0% | Tolerated (0.66)
23,860,178 | Exon 1 | ACA | GCA (93%) | GCA (100%) | GCA (95%) | GCA (100%) | UGU→C26 / CGU→R26 | rs1799841 | G = 43.2% | Tolerated (1)
23,860,174 | Exon 1 | CGG | CGG (100%) | CGG (100%) | CAG (11%) | CGG (100%) | GCC→A27 / GUC→V27 | n.a. | n.a. | Tolerated (0.18)
23,860,130 | Exon 1 | CTA | CTA (100%) | CTA (100%) | TTA (14%) | CTA (100%) | GAU→D42 / AAU→N42 | rs1257216384 | n.a. | Tolerated (0.11)
23,860,093 | Exon 1 | CGG | CGG (100%) | CGG (100%) | CAG (11%) | CGG (100%) | GCC→A54 / GUC→V54 | n.a. | n.a. | Tolerated (0.11)
23,858,200 | Exon 2 | TGG | TGG (100%) | TAG (22%) | TGG (100%) | TGG (100%) | ACC→T76 / AUC→I76 | rs41282292 | A = 0.061% | Damaging (0)
CSTA (direct reading, chromosome 3)
122,044,197 | Exon 1 | GTT | GTT (100%) | ATT (11%) | GTT (100%) | GTT (100%) | GUU→V20 / AUU→I20 | rs778366890 | A = 0% | Tolerated (0.23)
122,056,400 | Exon 2 | CCA | CCA (100%) | CCA (100%) | TCA (12%) | CCA (100%) | CCA→P25 / UCA→S25 | n.a. | n.a. | Tolerated (0.74)
122,060,361 | Exon 3 | CTT | CTT (100%) | CTT (100%) | TTT (16%) | CTT (100%) | CUU→L82 / UUU→F82 | n.a. | n.a. | Damaging (0)
122,060,373 | Exon 3 | CAG | CAG (100%) | CAG (100%) | TAG (12%) | CAG (100%) | CAG→Q86 / UAG→stop | n.a. | n.a. | Damaging due to stop
CSTB (reverse reading, chromosome 21)
45,194,562 | Exon 2 | CGC | TGC (2%) * | TGC (11%) | CGC (100%) | CGC (100%) | GCG→A49 / ACG→T49 | rs559906825 | T = 0.007% | Damaging (0)
45,194,138 | Exon 3 | TGG | TGG (98%) | TCG (13%) | TGG (95%) | TGG (100%) | ACC→T81 / AGC→S81 | n.a. | n.a. | Tolerated (0.65)
45,194,132 | Exon 3 | AGA | AGA (100%) | AGA (100%) | AAA (15%) | AGA (100%) | UCU→S83 / UUU→F83 | n.a. | n.a. | Tolerated (0.1)
a: Frequency of the substitution (highlighted bases) in the ancient hominin species, as reported in IGV, taking into account the depth (coverage) of the reads displayed at the corresponding locus; * frequency ≤ 10%; ** counts < 10; n.a.: not available. Variants fixed at 100% in modern humans compared with ancient hominins are highlighted in light orange. Genomic variants whose frequencies show a different geographic distribution among humans are in red text.
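As a reading aid for the codon columns, the sketch below reproduces how a genomic triplet in the table maps to its mRNA codon and amino acid. It rests on an assumption inferred from the rows themselves rather than stated by the authors: for genes marked "reverse reading", the mRNA codon is the position-by-position complement of the triplet as printed (for example CAC pairs with GUG→V), whereas for "direct reading" genes only T is replaced by U (for example GTT gives GUU→V). The function names `mrna_codon` and `translate` are hypothetical and serve only this illustration.

```python
from itertools import product

# Standard genetic code in the RNA alphabet, bases ordered U, C, A, G.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Position-by-position DNA complement (no reversal).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mrna_codon(genomic_triplet: str, reading: str) -> str:
    """Return the mRNA codon for a genomic triplet as printed in the table.

    Assumption (inferred from the table rows): "reverse" genes need a
    position-by-position complement, "direct" genes need none.
    """
    triplet = genomic_triplet.upper()
    if reading == "reverse":
        triplet = triplet.translate(COMPLEMENT)
    elif reading != "direct":
        raise ValueError("reading must be 'direct' or 'reverse'")
    return triplet.replace("T", "U")  # transcribe DNA letters to RNA


def translate(genomic_triplet: str, reading: str) -> tuple[str, str]:
    """Return (mRNA codon, one-letter amino acid or '*' for stop)."""
    codon = mrna_codon(genomic_triplet, reading)
    return codon, CODON_TABLE[codon]


if __name__ == "__main__":
    # Reference and variant triplets from a reverse-reading row (CAC vs TAC).
    print(translate("CAC", "reverse"))
    print(translate("TAC", "reverse"))
    # Reference and variant triplets from a direct-reading CSTA row (CAG vs TAG).
    print(translate("CAG", "direct"))
    print(translate("TAG", "direct"))
```

Applied to any row, the two calls (reference triplet, then variant triplet) should return the pair of codons listed in the codon-change column, which gives a quick consistency check when transcribing the table.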

Share and Cite

MDPI and ACS Style

Di Pietro, L.; Boroumand, M.; Lattanzi, W.; Manconi, B.; Salvati, M.; Cabras, T.; Olianas, A.; Flore, L.; Serrao, S.; Calò, C.M.; et al. A Catalog of Coding Sequence Variations in Salivary Proteins’ Genes Occurring during Recent Human Evolution. Int. J. Mol. Sci. 2023, 24, 15010. https://doi.org/10.3390/ijms241915010
