Multi-domain proteins are frequently characterized by domain repeats in proteomes across the three domains of life: Bacteria, Archaea, and Eukarya [1]. Proteins with repeats participate in nearly every cellular process, from transcriptional regulation in the nucleus to cell adhesion at the plasma membrane [3]. In addition, owing to their flexibility, domain repeats are found in cytoskeletal proteins and in proteins responsible for transport and cell cycle control [4]. Proteins with structural repeats are believed to represent ancient folds.
One such unique family is the bacterial ribosomal proteins S1, in which the S1 structural domain (one variant of the oligonucleotide/oligosaccharide-binding fold, OB-fold) is repeated, with the number of copies varying within a strictly limited range of one to six [5]. As demonstrated in our recent paper [5], the family of polyfunctional ribosomal proteins S1 accounts for about 20% of all bacterial proteins that contain the S1 domain. This fold can also be found, in varying copy numbers, in different eukaryotic protein families and protein complexes. Such multiple copies of the structure increase the affinity and/or specificity of protein binding to nucleic acids.
We have recently shown that alignments between the separate domains of S1 proteins within each group reveal a rather low percentage of sequence identity. In addition, a comparison of the domain characteristics showed that in long S1 proteins (those containing five or six domains) the central part of the protein (the third domain) is more conserved than the terminal domains and is apparently vital for the activity and functionality of S1. These data indicate that, for the general functioning of these proteins, the structural scaffold (OB-fold) is more important than the amino acid sequence [6]. This is in good agreement with the highly conserved topological position of the binding site on the OB-fold surface in other proteins, as well as with the “fold resistance” to mutations and the ability to adapt to a wide range of ligands, which allows this fold to be considered one of the ancient protein folds. For example, the author of [7] proposed considering this core structure of inorganic pyrophosphatase as the evolutionary precursor of all other superfamilies.
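To make the notion of "percentage of identity" between aligned domains concrete, the following minimal sketch computes it for two sequences that are already aligned. It is an illustration only, not the procedure used in the cited work: the function name `percent_identity`, the choice to count only columns where both sequences have a residue, and the toy sequences are all assumptions introduced here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: the denominator is the number of gap-free columns; other
    conventions (e.g. dividing by the shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy, made-up aligned fragments (NOT real S1 domain sequences).
toy_a = "MKV-GDLVEG"
toy_b = "MRVEGD-IEG"
print(percent_identity(toy_a, toy_b))  # 6 identical of 8 gap-free columns -> 75.0
```

In practice the alignment itself would come from a dedicated alignment tool; only the counting step is shown here.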
At present, the structure of S1 from Escherichia coli has been obtained only at a very low resolution of 11.5 Å using cryo-electron microscopy [8]. The Protein Data Bank contains only 3D structures of separate domains of ribosomal S1 from E. coli, obtained by NMR [9]. Recently, protein S1 on the 70S ribosome was visualized by ensemble cryo-electron microscopy [11]. It was shown that S1 cooperates with other ribosomal proteins (S2, S3, S6, and S18) to form a dynamic mesh near the mRNA exit and entrance channels that modulates the binding, folding, and movement of mRNA. Cryo-electron microscopy was also used to obtain the structure of the inactive conformation of the S1 protein as part of a hibernating 100S ribosome [12].
A separate S1 domain from the ribosomal proteins S1 [9] and from other bacterial proteins containing an S1 domain [13] is a β-barrel with an additional α-helix between the third and fourth β-strands. As shown in [13], the S1 domain within different bacterial (as well as eukaryotic) proteins is itself quite compact; it therefore crystallizes and is visualized very well.
At the same time, there are currently no determined structures of full-length, intact ribosomal S1 proteins containing different numbers of structural domains (six in E. coli, five in Thermus thermophilus, etc.). This may be due to the increased flexibility of multi-domain proteins, as noted in [17]. In addition, some biochemical studies suggest that, in solution and on the ribosome, S1 can adopt an elongated shape more than 200 Å long [17].
Moreover, it was recently shown that predictions of intrinsic disorder in proteins with tandem repeats support the conclusion that the level of repetition correlates with the tendency to be unstructured, and that the chance of finding natural structured proteins in the Protein Data Bank (PDB) increases as the level of repeat perfection decreases. The authors also suggested that, in general, repeat perfection is a sign of recent evolutionary events rather than of exceptional structural and/or functional importance of the repeat residues [21].
Despite all these observations, the flexibility of S1 proteins, their tendency for intrinsic disorder, and the structural characteristics of this family have not yet been studied. To fill this gap, we have analyzed the flexibility of bacterial S1 proteins within and between structural domains, as well as the tendency for intrinsic disorder in the S1 protein family.
In this work, we show that S1 proteins belong to a unique family that differs from proteins with tandem repeats in the classical sense. We found that S1 proteins containing one or two domains apparently have a more stable and rigid structure. An increase in the number of structural domains contributes to a possible transition of a portion of the proteins from the folded state to the molten globule (MG) state. For example, for proteins containing three or four domains, the proportion predicted to be in the MG state is about 70%. The relatively small percentage of internal flexibility/disorder within individual structural domains can be seen as an indicator of the stability of the S1 domain as one of the OB-folds in this family. At the same time, the proportion of flexibility in the separate domains is apparently related to their roles in the activity and functionality of S1. A more stable, compact, and conserved central part in the multi-domain proteins is vital for RNA interaction, while the terminal domains serve other functions. In addition, the equal proportions of regions connecting secondary structure elements within separate domains and of regions between structural domains indicate a similar organization of multi-domain S1 proteins, as well as a similar position and proportion of secondary structures within separate domains. The reasons for the lack of an intact 3D structure of the full-length ribosomal protein S1 are not well understood. Perhaps this is due to the high mobility of the domains relative to each other in the multi-domain proteins. Further investigation of the flexibility of the available 3D structures of separate S1 domains and of the full-length S1 protein from E. coli in complex with the 70S ribosome should help to find an accurate explanation.
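As an illustration of how a "percentage of flexibility/disorder" within domains versus the regions between them might be tallied, the sketch below takes per-residue disorder scores and domain boundaries and reports the fraction of residues at or above a cutoff in each class. It is a hedged sketch, not the analysis pipeline of this work: the function name `disorder_fractions`, the 0.5 cutoff, the 0-based half-open domain coordinates, and the toy scores are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def disorder_fractions(
    scores: List[float],
    domains: List[Tuple[int, int]],
    threshold: float = 0.5,  # assumed cutoff; predictors differ in convention
) -> Dict[str, float]:
    """Fraction of residues scoring >= threshold inside and outside domains.

    `domains` holds hypothetical (start, end) pairs, 0-based, end exclusive.
    """
    in_domain = [False] * len(scores)
    for start, end in domains:
        for i in range(max(start, 0), min(end, len(scores))):
            in_domain[i] = True

    def fraction(inside: bool) -> float:
        picked = [s for s, d in zip(scores, in_domain) if d == inside]
        if not picked:
            return 0.0
        return sum(s >= threshold for s in picked) / len(picked)

    return {"domains": fraction(True), "outside_domains": fraction(False)}


# Toy, made-up per-residue scores for a 10-residue chain with two "domains".
toy_scores = [0.9, 0.8, 0.2, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.9]
toy_domains = [(2, 5), (7, 9)]
print(disorder_fractions(toy_scores, toy_domains))
# -> {'domains': 0.2, 'outside_domains': 0.8}
```

Residues outside the listed domains include both inter-domain linkers and the chain termini; separating the two would require an additional set of boundaries.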