Article Structural Entropy to Characterize Small Proteins (70 aa) and Their Interactions

Proteins composed of short polypeptide chains (about 70 amino acid residues) that participate in ligand-protein and protein-protein (small-size) complex creation were analyzed and classified according to the hydrophobicity deficiency/excess distribution as a measure of structural and functional specificity and similarity. The characterization of this group of proteins is the introductory part of the analysis of the so-called "Never Born Proteins" (NBPs), carried out in search of protein compounds with biological activity in a pharmacological context. The entropy scale (classification between random and deterministic limits), estimated according to the hydrophobicity irregularity organized in a ranking list, allows the comparative analysis of the proteins under consideration. The comparison of the hydrophobicity deficiency/excess appeared to be useful for similarity recognition, examples of which are shown in the paper. The influence of mutations on structure and hydrophobicity distribution is discussed in detail.


Introduction
The interaction of a protein with other macromolecules or with specific ligands is related to its biological activity. Ligand binding, as well as complexation with other protein molecules, is characterized by high selectivity. The "fuzzy oil drop" (FOD) model is applied to characterize the structural and functional specificity of proteins built of 70 amino acid residues in the polypeptide chain. The limitation to this length is related to the hypothesis that proteins of this size generated from amino acid sequences never observed in Nature (random sequences) may be a large repository of proteins with many biological activities not generated by evolution. The generation of novel proteins aimed at producing new biological activities of pharmacological use is widely discussed in the literature [1–4]. Isolation of proteins with, for example, improved stability, new or altered catalytic properties, or enhanced affinity for target molecules is the focus of researchers involved in pharmacology. Consequently, a wide variety of methods have been developed for the isolation of such functional proteins [5–7]. Significant progress has been made in understanding the kinetic and thermodynamic requirements for the folding of heteropolymer sequences [8–15]. The synthesis of homopolymers under alleged prebiotic conditions has been described [16–20], but it is important to recall that it has not yet been clarified how long copolymeric sequences containing several different amino acid residues in the same chain could have been produced under prebiotic conditions. It is commonly accepted that the proteins existing on Earth are only an infinitesimal fraction of the possible sequences, and simple calculations show that the ratio between the 'existing' protein sequences and the 'possible' ones corresponds roughly to the ratio between the size of a hydrogen atom and the size of the universe [21]. Random protein space has been explored several times to search for optimized enzymatic functions, as reported in the literature [22–27], but generally with the aim of finding novel active compounds for pharmaceutical and biotechnological applications. Usually, this kind of work, defined as directed molecular evolution [28], has been carried out starting from selected extant protein scaffolds, randomizing either restricted regions or the entire gene [29–33]. Alternatively, by recombination techniques, DNA fragments are mixed to obtain novel combinations [34–39] with pharmacological activity.
The project on "Never Born Proteins" (NBPs) is oriented toward the search for such pharmacologically active protein molecules [40,41]. Before the biological function of NBPs can be analyzed, the proteins of the same length present in the Protein Data Bank (PDB) [42] must be characterized. Analysis of large-scale protein complexes (like the ribosome) and of proteins crystallized as individual molecules generates the database for comparative analysis of de novo designed proteins. The group of proteins built of 70 amino acid residues is quite differentiated. This is why the complex analysis summarizing all specific groups of proteins will also be presented in the next paper of this series. The analysis of proteins containing 70 amino acid residues in the polypeptide chain is intended as the basis for identifying the potential biological activity of new proteins characterized by sequences not observed in Nature. The comparative analysis, taking into consideration the known proteins and the NBPs, will be shown elsewhere.
The paper presents the application of the "fuzzy oil drop" model to recognize the ligand binding area and/or the regions engaged in protein-protein complexation. The work aims to estimate the limitations of the model with respect to biological function recognition. The analysis of structures of proteins present in the PDB is the introductory part that makes possible the comparison with structures generated in silico according to the ROSETTA and "fuzzy oil drop" models for randomly generated sequences ("never born proteins").
The entropy scale is calculated for hydrophobicity deficiency/excess to measure the aim-orientation versus the randomness of the hydrophobicity distribution in the protein body. The entropy scale is assumed to be a variable applicable to the assessment of structural and functional similarity.

Hem binding in c-type cytochromes
Cytochromes are characterized by hem binding. This binding in c-type cytochromes is of covalent character, in contrast to the hem binding in b-type cytochromes or hemoglobin. The color scale visualizes the magnitude of $\Delta\tilde{H}_i$. This scale is applied to all figures in this paper. The pink squares identify the residues engaged in hem binding in 1B7V, 1C75 and 1N9C (Figure 1). The distribution of these residues appeared different from that of the residues engaged in binding any one hem molecule in 1OS6, suggesting a different mechanism leading to structure formation in the proteins under consideration. It is impossible to recognize the hem binding site on the basis of a comparative analysis of the localization of $\Delta\tilde{H}_i$ maxima. This suggests that there is no common strategy for ligand (hem) binding site creation (black thick line in Figure 1). The comparison of SE parameters additionally supports this observation (Table 2). The blue fragments (Figure 1) suggest that the bend fragments are common to all proteins belonging to this group.
The analyzed proteins are a very good example for observing the influence of a larger number of ligands (1OS6-A). The degree of influence of additional ligand binding can be seen in the profile change and measured quantitatively using the SE scale.

Metal binding proteins
Mostly Zn(II) binding proteins are present in the group under consideration. Only calmodulin (1FW4) is complexed to the Ca(II) ion. The ... (Figure 3C). The 1DX8 protein seems to be peculiar in that it engages in ion binding residues with highly negative $\Delta\tilde{H}_i$ values. The averaged values for residues engaged in ion binding appeared to be negative (hydrophobicity excess), with the sole exception of the Ca²⁺ ion. The Zn²⁺ ions are covalently bound by Cys residues, which are characterized by the highest hydrophobicity parameter (in comparison with other amino acids). Their appearance on the protein surface causes a high hydrophobicity excess there. The Ca²⁺ binding cavity represents hydrophobicity deficiency, as observed in many ligand-binding cavities [44,45]. The 3-D representation of ion binding proteins visualizes the relation between ion position and the characteristics of the residues responsible for ion complexation. Ion binding to proteins is driven by electrostatic interaction. This is why the localization of ions is rather difficult to recognize from the distribution of hydrophobicity irregularity. The group of ion binding proteins under consideration is quite differentiated according to the SE and I parameters (Table 2, section: Metal binding). This observation is also in agreement with the $\Delta\tilde{H}_i$ profiles. The 3-D presentation with the surface characteristics reveals no specificity of the residues responsible for ion binding (Figure 4) when hydrophobicity characteristics are taken as the criterion.

Antibiotics
The group of antibiotics is represented by Peptide Antibiotic Bacteriotoxin AS-48 from Enterococcus faecalis. The proteins deposited in the PDB as 1O82, 1O83 and 1O84 differ in the crystallization pH conditions and in complexation to different molecules. The $\Delta\tilde{H}_i$ profiles of all three proteins appeared identical, or the differences negligibly small (Figure 5A), which suggests a negligible influence of pH and complexation on the structure of the protein in this case. The 1O82-C chain differs slightly from the other polypeptide chains. No differences between the SE parameters have been found for these proteins (polypeptide chains). The fragments engaged in complex creation (yellow (P-P)1 and green (P-P)2 fragments in the profile) seem to represent the complementary character of the interacting surfaces (Figure 5B; seen also in Figure 6C).
The 3-D representation of the $\Delta\tilde{H}_i$ distribution in proteins engaged in complex creation is shown in Figures 6A and 6B. The red-colored residues (high hydrophobicity deficiency) are almost entirely buried in the interior of the molecule, which suggests a low tendency to interact with ligands, as was observed elsewhere [44,45]. The maltose bound to the protein complex interacts with residues characterized by $\Delta\tilde{H}_i$ values close to zero (hydrophilic residues) and with one residue of negative $\Delta\tilde{H}_i$ (hydrophobicity excess) (Figure 5B). This may be expected given that maltose is a hydrophilic molecule and that an interaction type other than hydrophobic is responsible for this complexation. Figure 6 shows the surface characteristics, and the contact areas in particular (Figure 6C), revealing the region of hydrophobicity excess on the surface engaged in protein-protein complexation. The SE and I parameters are also identical for all the structures in this group (see Table 2, section: Antibiotics). The averaged $\Delta\tilde{H}_i$ values for 1O82 (1O83 and 1O84) (Table 3) calculated for residues engaged in protein-protein complexation suggest a mechanism of protein-protein complexation based on the hydrophobicity excess on the protein surface. The approach of residues of negative $\Delta\tilde{H}_i$ ... given in Table 3. ... as seen on the surface (the complex); (c) the contact surfaces between units, showing the hydrophobicity excess areas in contact.

Toxins
The group of toxins in this analysis is represented by vero-toxin, cobra-toxin, snake-toxin, scorpion-toxin and Shiga-toxin. The verotoxin is represented by the structures 1BOV and 4ULL (5 units). All units appeared to be very similar (using the $\Delta\tilde{H}_i$ profiles (Figure 7) and the SE and I parameters (Table 2) as criteria for comparison) ... almost identical hydrophobicity distribution. The notations "i-1" and "i+1" denote the preceding and following neighboring molecules. The notation "A+i" denotes the interaction of chain "i" with chain "A", which is present in the complete complex but was not included in the analysis (longer than 70 aa). The Shiga toxin (1DM0), comprising 10 polypeptides of 70 amino acid residues each, represents a highly symmetrical system (C2 symmetry of two pentamers with C5 symmetry). The profiles for all units appeared to be identical (Figure 10). Each unit (i) interacts with two neighbors (i-1 and i+1). The fragments in interaction contact are represented by local maxima of $\Delta\tilde{H}_i$, which are interpreted as hydrophobicity deficiency. The green fragment (Figure 10B), which is responsible for the chain A complexation, seems to be stabilized mostly by hydrophobic interaction (the green fragment represents a local $\Delta\tilde{H}_i$ minimum, i.e., hydrophobicity excess).

The structural changes generated by mutation
The Shiga toxin is a good example for the analysis of the influence of mutation on the $\Delta\tilde{H}_i$ profile (and possibly of structural change due to mutation), since several mutants of this protein are present in the PDB. The mutation influence can be localized using the $\Delta\tilde{H}_i$ profiles. The differences between units in the 10-subunit complex of wild type (WT) Shiga toxin are negligibly small (Figure 10). The complexation in 1D1I does not reveal any structural changes versus 1CZW, which is crystallized as an individual molecule. Both molecules are W34A mutants. The consequences of the F30A/W34A mutation observed in 1C4Q versus the WT molecule (1DM0) are visualized in Figure 11. The profiles make possible the analysis of the structural consequences of particular mutations (Figure 12). The relation between WT and F30A/W34A, as well as between F30A/W34A and W34A, can be traced (see Figures 12A and 12B).
The three-dimensional Gauss function applied in the FOD model was assumed to represent the distribution of hydrophobicity density in a protein molecule, according to the well-known model assuming that a hydrophobic center in proteins is responsible for their stability [46]. The FOD model can also be applied to simulate an external force field of hydrophobic character that directs the folding process, orienting hydrophobic residues toward the central part of the molecule with simultaneous exposure of hydrophilic residues on the protein surface. Application of an idealized three-dimensional Gauss function hydrophobicity distribution reveals that proteins deviate from this theoretical distribution in a specific form. The $\Delta\tilde{H}_i$ profile, expressing the discrepancy between the idealized and empirical hydrophobicity distributions, appeared to be specific for particular proteins, characterizing their structure- and function-related irregularities. The $\Delta\tilde{H}_i$ profile was used in this paper to characterize proteins 70 amino acid residues long involved in protein-ligand complexes and small protein-protein complexes. Special attention was paid to the visualization of mutant-dependent structural changes. It was shown that the structural changes may have a local as well as a long-range character, which may be easily identified and represented by changes in the $\Delta\tilde{H}_i$ profiles. The large set of mutants (including also proteins of larger size) will be presented soon. The $\Delta\tilde{H}_i$ profile will be used for the analysis of the structural and functional characteristics of NBPs, the structures of which will be generated according to ROSETTA [47] and the FOD model. The comparison between the $\Delta\tilde{H}_i$ profiles of real proteins and NBPs will be given. The $\Delta\tilde{H}_i$ profile maxima representing hydrophobicity deficiency cannot be treated as indicators of ion binding localization, because the interaction in this case is electrostatic. The hem binding cavity in c-type cytochromes seems not to be hydrophobicity-based, probably due to covalent binding, which determines the ligand localization.
The best applicability of the FOD model seems to be the comparison of structural changes that are a consequence of mutation. This subject will be developed on a larger group of proteins of different biological function, different size and different identified forms of biological activity failure as the consequence of mutation. The SE parameters will be used for similarity search between different groups of proteins classified according to their biological activity, and in comparison with structures generated in silico according to the FOD model and the ROSETTA program, in search of possible biological activity of NBPs. The complete set of proteins of 70 amino acids in the polypeptide chain is quite differentiated.
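The comparison of a wild-type profile with a mutant profile described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `classify_changes`, the `threshold` on the profile difference and the sequence `window` separating "local" from "long-range" changes are hypothetical choices, not values taken from the paper, and residue positions are 0-based.

```python
def classify_changes(wild_type, mutant, mutation_sites, threshold, window=3):
    """Split residues whose profile value changed into local and long-range sets.

    wild_type, mutant: per-residue profile values for chains of equal length.
    mutation_sites:    0-based positions of the mutated residues.
    threshold:         minimal absolute difference treated as a change (assumed).
    window:            sequence distance regarded as "local" (assumed).
    """
    if len(wild_type) != len(mutant):
        raise ValueError("profiles must have the same length")
    local, long_range = [], []
    for i, (w, m) in enumerate(zip(wild_type, mutant)):
        if abs(w - m) <= threshold:
            continue  # no meaningful change at this residue
        near = any(abs(i - site) <= window for site in mutation_sites)
        (local if near else long_range).append(i)
    return local, long_range
```

A call such as `classify_changes(wt_profile, mutant_profile, [33], threshold=0.002)` would return two lists of residue indices; both the position and the threshold in this call are placeholders.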

Correlation between SE and RSA (relative solvent accessibility)
Commonly used scales aimed at characterizing the specificity of the protein surface, like RSA (relative solvent accessibility) or ASA (accessible surface area), are correlated neither with the $\Delta\tilde{H}_i$ values nor with the SE scale. A correlation between them is therefore not excluded, although it is not expected for a large number of proteins. Thus no general mutual dependency between SE and ASA or RSA is expected.

Conclusion
The analysis presented in this paper aims to estimate the limits of applicability of the "fuzzy oil drop" model for characterizing the structural and/or functional specificity of proteins of 70 amino acids in the polypeptide chain complexed to other proteins, ligands and ions. According to the model, a region of hydrophobicity deficiency (positive $\Delta\tilde{H}_i$ values) is expected to bind a hydrophobic ligand (or at least its hydrophobic part). Hydrophobicity excess (negative $\Delta\tilde{H}_i$), when appearing on the surface, points to an area potentially engaged in protein-protein complexation. As shown in this paper, this was found for selected proteins (antibiotics and, partially, toxins). Protein-protein complexation also appeared possible as a result of the interaction of fragments of positive $\Delta\tilde{H}_i$, generating the complex according to the mechanism predicted for protein-ligand complexation. This also seems to accord with the model, in which the two partners of complexation complement each other's irregularity.
The SE scale was introduced to measure the degree of structural and/or functional similarity. The comparison of a large number of SE values calculated for different proteins allows the search for similarity. This is of special importance in the case of proteins of unknown biological function, the number of which is constantly growing in the PDB [48].
The applicability of the "fuzzy oil drop" model to tracing changes in the $\Delta\tilde{H}_i$ profiles that result from structural changes upon mutation seems to be quite useful. Both the $\Delta\tilde{H}_i$ profiles and the SE scale seem to express the influence of mutation in qualitative and quantitative form. This applicability of the "fuzzy oil drop" model will be analyzed for a larger group of proteins to verify this observation.

Data
The search tool available on the PDB web page was used to extract proteins according to a defined polypeptide chain length. The proteins containing 70 amino acid residues were selected. However, some examples of proteins with a chain length between 68 and 72 amino acids were also taken into consideration. The proteins presented in this paper belong to different groups (according to biological function): electron transfer, metal binding proteins, antibiotics, and toxins (see Table 2). The toxins present in the database are of particular interest due to the availability of a few mutant forms, which are characterized with respect to structural changes classified according to the FOD model.

The protein-ligand contacts analysis
The search for residues within a close (interaction) distance of the ligand molecule or of another protein molecule (in protein-protein complexes) was made using the PDBsum database (www.ebi.ac.uk/pdbsum/) [49].
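The kind of distance-based contact search described here can be illustrated with a minimal sketch. It does not reproduce the criteria used by PDBsum: the function name `contact_residues` and the 4.0 Å default cutoff are assumptions chosen only for illustration, and the input is taken to be plain atom coordinates already parsed from a structure file.

```python
import numpy as np

def contact_residues(residue_atoms, ligand_atoms, cutoff=4.0):
    """Return 0-based indices of residues with any atom within `cutoff` of the ligand.

    residue_atoms: list with one (n_atoms, 3) coordinate array per residue.
    ligand_atoms:  (m, 3) coordinate array of the ligand (or partner chain) atoms.
    cutoff:        contact distance in angstroms (an assumed value).
    """
    ligand = np.asarray(ligand_atoms, dtype=float)
    hits = []
    for index, atoms in enumerate(residue_atoms):
        atoms = np.asarray(atoms, dtype=float)
        # All pairwise residue-atom / ligand-atom distances for this residue.
        dist = np.linalg.norm(atoms[:, None, :] - ligand[None, :, :], axis=-1)
        if dist.min() <= cutoff:
            hits.append(index)
    return hits
```

The same function can be applied to protein-protein contacts by passing the atoms of the partner chain as `ligand_atoms`.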

The structural/functional characteristics
The basis of the FOD model, applied to identify hydrophobicity deficiency/excess in protein molecules, which appeared to be strongly structure- (and function-) dependent, is very simple. The difference between the theoretically assumed hydrophobicity distribution in a protein ($\tilde{H}^{t}$), which is assumed to be represented by a three-dimensional Gauss function, and the empirically observed one ($\tilde{H}^{e}$), described according to the function of Levitt [50], defines the hydrophobicity irregularity: $\Delta\tilde{H}_i = \tilde{H}^{t}_{i} - \tilde{H}^{e}_{i}$ (eq. 1). The values with index "sum" express the total value of the appropriate hydrophobicity and normalize the theoretical and empirical distributions, making comparison of these values possible. The symbols $H^{r}_{j}$ denote the hydrophobicity parameter describing each amino acid ([51]; any other hydrophobicity scale can be applied). The values with indexes i, j describe the points representing individual residues (the geometric center of the amino acid residue).
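Written out under the verbal definitions of this section, eq. 2 and eq. 3 would take the following form. This is a reconstruction, assuming the formulation commonly used for the fuzzy oil drop model (a three-dimensional Gauss function for the theoretical distribution and the Levitt function with cutoff $c$ for the empirical one). The symbols $\bar{x}, \bar{y}, \bar{z}$ (center of the molecule), $\sigma_x, \sigma_y, \sigma_z$ (standard deviations along the axes), $r_{ij}$ (distance between residues i and j), $c$ (cutoff distance) and $N$ (number of residues) are introduced here for illustration.

```latex
\begin{align}
\tilde{H}^{t}_{j} &= \frac{1}{\tilde{H}^{t}_{\mathrm{sum}}}
  \exp\!\left(-\frac{(x_j-\bar{x})^{2}}{2\sigma_x^{2}}\right)
  \exp\!\left(-\frac{(y_j-\bar{y})^{2}}{2\sigma_y^{2}}\right)
  \exp\!\left(-\frac{(z_j-\bar{z})^{2}}{2\sigma_z^{2}}\right) \tag{2}\\[4pt]
\tilde{H}^{e}_{j} &= \frac{1}{\tilde{H}^{e}_{\mathrm{sum}}}
  \sum_{i=1}^{N}\left(H^{r}_{i}+H^{r}_{j}\right)
  \begin{cases}
    1-\dfrac{1}{2}\left[7\left(\dfrac{r_{ij}}{c}\right)^{2}
      -9\left(\dfrac{r_{ij}}{c}\right)^{4}
      +5\left(\dfrac{r_{ij}}{c}\right)^{6}
      -\left(\dfrac{r_{ij}}{c}\right)^{8}\right], & r_{ij}\le c\\[6pt]
    0, & r_{ij}>c
  \end{cases} \tag{3}
\end{align}
```

With both distributions normalized by their "sum" terms, the per-residue differences between them sum to zero over the whole chain, so positive and negative values necessarily coexist in one profile.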
Eq. 2 expresses the three-dimensional Gauss function. Eq. 3 expresses the method used to calculate the empirical hydrophobicity density distribution resulting from the specific localization of residues with specific hydrophobicity parameters. The $\tilde{H}^{e}_{i}$ values represent the hydrophobicity collecting all hydrophobic interactions within the cutoff distance (15 Å). The normalization of both functions (division by the sum of all i-th values) makes it possible to calculate the differences between the theoretical idealized hydrophobicity density distribution and the one observed in a particular protein (eq. 1). This is why $\Delta\tilde{H}_i$ can take positive values (hydrophobicity deficiency) as well as negative ones (hydrophobicity excess). This is a way to compare proteins and, in consequence, to measure their mutual similarity. Similar (or identical) $\Delta\tilde{H}_i$ profiles and/or similar or identical SE parameters may suggest a structural and/or functional similarity. The FOD model as well as the SE calculation was applied to evaluate the structural and/or functional differentiation in the group of proteins listed in Table 1.
A further quantity used in the study of binding sites is the information (I) necessary to localize the residues creating the binding site (the presence of any form of disorder may be interpreted as a potential localization of some form of interaction). The participation of particular residues in the creation of a potential active site is understood as a probability expressing a conjunction of events (close mutual localization) and can be calculated from an expression in which K and $p_j$ have the same meaning as in equation (4).
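The calculation described above (normalized Gauss distribution, normalized Levitt-type empirical distribution with a 15 Å cutoff, and their difference) can be sketched as follows. This is a minimal illustration, not the authors' implementation. In particular, the way the standard deviations are chosen here (one third of the largest extent along each axis, without reorienting the molecule) is an assumption, the self-term i = j is included in the sum, and all function names are hypothetical.

```python
import numpy as np

def theoretical_hydrophobicity(coords, sigma=None):
    """Idealized 3-D Gauss hydrophobicity per residue, normalized to sum to 1."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)  # put the drop center at the origin
    if sigma is None:
        # Assumption: sigma per axis is one third of the largest extent.
        sigma = np.abs(centered).max(axis=0) / 3.0
    sigma = np.asarray(sigma, dtype=float)
    ht = np.exp(-0.5 * ((centered / sigma) ** 2).sum(axis=1))
    return ht / ht.sum()

def empirical_hydrophobicity(coords, hydrophobicity, cutoff=15.0):
    """Levitt-type empirical hydrophobicity per residue, normalized to sum to 1."""
    coords = np.asarray(coords, dtype=float)
    h = np.asarray(hydrophobicity, dtype=float)
    # Pairwise distances between residue centers.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    x = dist / cutoff
    weight = 1.0 - 0.5 * (7 * x**2 - 9 * x**4 + 5 * x**6 - x**8)
    weight = np.where(dist <= cutoff, weight, 0.0)  # no interaction beyond cutoff
    he = ((h[:, None] + h[None, :]) * weight).sum(axis=1)
    return he / he.sum()

def delta_profile(coords, hydrophobicity, cutoff=15.0, sigma=None):
    """Per-residue difference: theoretical minus empirical hydrophobicity."""
    return (theoretical_hydrophobicity(coords, sigma)
            - empirical_hydrophobicity(coords, hydrophobicity, cutoff))
```

Here `coords` holds one point per residue (for example the geometric center of each residue) and `hydrophobicity` holds the per-residue value from whichever hydrophobicity scale is chosen. Positive entries of the result correspond to hydrophobicity deficiency and negative entries to hydrophobicity excess.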
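A minimal sketch of how the entropy and information quantities could be computed from a profile is given below. Several points are assumptions rather than statements from the paper: fragments are taken as runs of consecutive residues of the same sign, each p_j is the fragment sum divided by the total over all fragments of that sign, logarithms are base 2, and the information I is taken as the negative logarithm of the product of the p_j (a conjunction of independent events). The function names are hypothetical.

```python
import math

def fragment_sums(profile, positive=True):
    """Sum of absolute profile values over each run of same-sign residues."""
    sums, current = [], 0.0
    for value in profile:
        hit = value > 0 if positive else value < 0
        if hit:
            current += abs(value)
        elif current > 0:
            sums.append(current)  # a run has just ended
            current = 0.0
    if current > 0:
        sums.append(current)
    return sums

def structural_entropy(profile, positive=True):
    """Return (SE, SEmax, deltaSE, I) for fragments of the chosen sign."""
    sums = fragment_sums(profile, positive)
    if not sums:
        raise ValueError("profile has no fragment of the requested sign")
    total = sum(sums)
    p = [s / total for s in sums]                  # assumed normalization
    se = -sum(pj * math.log2(pj) for pj in p)      # Shannon entropy
    se_max = math.log2(len(p))                     # all p_j equal
    info = -sum(math.log2(pj) for pj in p)         # assumed form of I
    return se, se_max, se_max - se, info
```

When all fragments carry equal weight, SE equals SEmax and the difference is zero, which corresponds to the random limit described in the text; a dominant fragment lowers SE and increases the difference.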

Relative solvent accessibility
The relation between solvent and protein molecule was measured using standard models implemented in the program ASA-View (http://www.netasa.org/asaview/). This application calculates the ASA (accessible surface area) and converts it to the RSA scale. Additionally, the DSSP program was applied in the calculation procedure [53]. The calculation was performed to search for a correlation of the SE scale with other methods oriented toward the protein-solvent relation.
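The conversion from ASA to RSA and the subsequent correlation check can be illustrated with a short sketch. It does not reproduce ASA-View or DSSP: the per-residue reference (maximum) accessibilities must be supplied by the user, and the function names are hypothetical.

```python
import numpy as np

def relative_accessibility(asa, max_asa):
    """RSA per residue: observed ASA divided by the reference maximum ASA."""
    asa = np.asarray(asa, dtype=float)
    max_asa = np.asarray(max_asa, dtype=float)
    return asa / max_asa

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(np.asarray(x, dtype=float),
                             np.asarray(y, dtype=float))[0, 1])
```

A per-protein summary of RSA (for example its mean) could then be compared with the SE values of the same proteins through `pearson`; which summary to use is left open here.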

The $\Delta\tilde{H}_i$ profiles of the c-type cytochromes under consideration are shown in Figure 1.

Figure 1. The profiles for proteins belonging to the group of cytochromes.

Figure 2. The spatial representation of 1C75 with hem bound. (a) The ribbon representation of 1C75, colored according to the value of $\Delta\tilde{H}_i$. (b) The space-filling model of 1C75, showing the localization of hem with respect to the $\Delta\tilde{H}_i$ characteristics of the binding cavity. (c) The ribbon presentation of 1N9C, colored according to the scale introduced in Figure 1; ligand molecule in white. (d) The structure of 1OS6 with ligands complexed: SO4 in yellow, DXC in white and hem in red. The color scale for the ribbon is as in Figure 1.
Figure 3A. The 3-D representation of hydrophobicity deficiency/excess is presented in Figure 4. The residues at a short distance from the complexed ion present $\Delta\tilde{H}_i$ values close to zero. This means that the spatial location of these residues is in agreement with the idealized hydrophobicity distribution. The $\Delta\tilde{H}_i$ profiles of 1U5S-A and 1U5S-B reveal a complexation mechanism in agreement with the FOD model (Figure 3B). The fragment of positive $\Delta\tilde{H}_i$ (local maximum), representing hydrophobicity deficiency, is complexed to the fragment of hydrophobicity excess (negative values of $\Delta\tilde{H}_i$), which provides hydrophobicity compatibility in the contact area, although neither the maximum nor the minimum is of global character. The residues distinguished in pink in chain A interact with the green fragment of chain B. The hydrophobicity compatibility of these two fragments makes the protein-protein interaction possible and stable. The brown stars near the X-axis mark the residues of chain A responsible for ion binding, with $\Delta\tilde{H}_i$ values close to zero. The ion binding residues in 1D8Q seem to be exceptional, presenting rather contradictory $\Delta\tilde{H}_i$ values. The ion binding residues in the 2CUR-A protein present $\Delta\tilde{H}_i$ values close to zero.

Figure 4. The 3-D representation of the structures (the more red the color, the higher the hydrophobicity deficiency, assumed to reveal a potential ligand binding site; the more blue the color, the higher the hydrophobicity excess, revealing a potential protein-protein complexation area). The green and blue dots visualize the positions of the ion(s) complexed to the protein: (a) 1FW4 with two Ca(II) ions complexed; (b) 2D8Q with two Zn(II) ions complexed; (c) 1DX8 with one Zn(II) ion complexed; (d) 2CUR with two Zn(II) ions complexed; (e) 1U5S-B with two Zn(II) ions complexed.

Figure 5. The ... of hydrophobic area in contact with water. The predictability of the protein-protein complexation seems to be possible in this case. The nonsymmetrical structure of four chains results in quite differentiated values of averaged $\Delta\tilde{H}_i$ ...

Figure 6. The 3-D representation of the protein representing the antibiotic, which is common to 1O82, 1O83 and 1O84. (a) The $\Delta\tilde{H}_i$ distribution in the whole complex (the ...

Figure 8. The 3-D representation of vero-toxin. The symmetry of the system is shown in the right picture. The two left pictures show the distribution of $\Delta\tilde{H}_i$ values. The helices of hydrophobicity excess character (blue color, according to the scale shown in Figure 1) seem to participate in the complex generation.

Figure 11. Influence of mutation on the local and long-range structural changes. The yellow line visualizes the magnitude of local change. The blue dots localize the mutation positions. The comparison of $\Delta\tilde{H}_i$ ...

Figure 12. 3-D representation of Shiga-toxin (wild type). The color scale denotes the magnitude of $\Delta\tilde{H}_i$. Space-filling representation: the mutated residue; side chains in stick representation: the residues influenced by the mutation, recognized according to the $\Delta\tilde{H}_i$ profile.
Although the SE scale is aimed at characterizing the participation of hydrophobicity irregularity (hydrophobicity excess and/or deficiency) versus the ideal hydrophobicity density distribution, it expresses a quite different issue. The SE value depends on the number of fragments representing particular characteristics and the averaged value of $\Delta\tilde{H}_i$ for each fragment. The RSA and ASA values express global characteristics, not taking into consideration the characteristics of the surface dispersion. SE values give information about the distribution and dispersion of the positive and/or negative $\Delta\tilde{H}_i$ areas in a protein molecule, which seem to be of a specific character describing the structure and related to biological function.
Eq. 2 expresses the three-dimensional Gauss function. The value of this function is interpreted as the hydrophobicity density in the idealized case. The maximum of the Gauss function is localized in the center of the ellipsoid, the point of highest hydrophobicity density. The Gauss function values decrease exponentially, reaching values close to zero at a certain distance from the center. This distance changes depending on the values of the standard deviations, which can be different for different axes.
The $\Delta\tilde{H}_i$ profile shows the fragments of hydrophobicity deficiency ($\Delta\tilde{H}_i > 0$) and hydrophobicity excess ($\Delta\tilde{H}_i < 0$). The distribution and the length of fragments of positive and negative $\Delta\tilde{H}_i$ can be interpreted on the basis of information theory, assuming that the generation of structures with residues characterized by positive $\Delta\tilde{H}_i$ localized in close mutual vicinity is a non-random event. The information entropy (according to the Shannon definition [52]) is expressed as $SE = -\sum_{j=1}^{K} p_j \log_2 p_j$ (eq. 4), where $p_j$ expresses the sum of $\Delta\tilde{H}_i$ values for the j-th fragment (all consecutive ...). SE depends on the number of fragments K and on the mutual relation of their $p_j$; therefore a maximum ($SE_{max}$) exists for a particular K elements (fragments) when all probabilities ($p_j$) are equal, representing the random solution. The larger the difference between $SE_{max}$ and SE, denoted by ΔSE, the less random is the localization of residues with positive $\Delta\tilde{H}_i$. The closer the calculated value of SE is to $SE_{max}$, the more random is the process producing a particular $\Delta\tilde{H}_i$ profile. SE can be calculated for fragments of positive $\Delta\tilde{H}_i$ (hydrophobicity deficiency) as well as for fragments of negative $\Delta\tilde{H}_i$, and SE calculations will be used for the description of protein structures. The color scale shown in the figures representing the $\Delta\tilde{H}_i$ profiles is also applied to the 3-D representation of proteins, showing the distribution of hydrophobicity deficiency/excess. Similar (or identical) $\Delta\tilde{H}_i$ profiles and/or similar or identical SE parameters may suggest a structural and/or functional similarity.

Table 1.
The averaged values of the $\Delta\tilde{H}_i$ profile are in this case not a good criterion for ligand binding site identification with the "fuzzy oil drop" model. The averaged values of $\Delta\tilde{H}_i$ for residues in contact with hem are most frequently positive, which suggests that the hem binding site has the character of a hydrophobicity-deficiency cavity. The localization (as shown in Figure 1D) seems to be similar, although different residues are engaged in hem ...

Column headers: average $\Delta\tilde{H}_i$ of residues engaged in interaction with ligand *; average $\Delta\tilde{H}_i$ of residues not engaged in interaction with ligand *.

Table 2. The averaged values of $\Delta\tilde{H}_i$ for residues engaged in ion binding. The values given in the table are multiplied by $10^{3}$ for simplicity.

Table 3. The averaged $\Delta\tilde{H}_i$ values calculated for residues engaged and not engaged in protein-protein complexation. The values given in the table are multiplied by $10^{3}$ for simplicity.

Table 4. The averaged values of $\Delta\tilde{H}_i$. Chains B and E were selected as the closest neighbors of chain A in the seven-chain symmetrical complex. * The values given in the table are multiplied by $10^{3}$ for simplicity.

Table 5. The SE characteristics of proteins under consideration. d: N-decyl-_-D-maltose; e: 65% sequence identity to wild type.

Table 6. Proteins and their short characteristics taken for analysis. PDB entries containing protein-protein complexes are denoted by *.
... presented in this paper belong to different groups (according to the biological function): electron transfer, metal binding proteins, antibiotic, and toxins (see Table ...