Chaperonin Structure – The Large Multi-Subunit Protein Complex

The multi-subunit protein structure representing the chaperonin group is analyzed with respect to its hydrophobicity distribution. The proteins of this group assist protein folding in an ATP-dependent manner. The axially symmetric GroEL structure (two rings of seven units stacked back to back, 524 aa each) and the GroES polypeptide chains (a single ring of seven units, 97 aa each) are analyzed using the hydrophobicity distribution, expressed as excess or deficiency throughout the molecule, to search for structure-to-function relationships. The empirically observed distribution of hydrophobic residues is compared with the theoretical one, which represents an idealized hydrophobic core with hydrophilic residues exposed on the surface. The observed discrepancy between these two distributions seems to be aim-oriented, determining the structure-to-function relation. The structure of the hydrophobic force field generated by the chaperonin capsule is presented, and its possible influence on substrate folding is suggested.


Introduction
It has been discovered that the protein folding process is guided by additional molecules that direct the structural changes toward the correct native form. Molecular chaperones are proteins which bind and stabilize unfolded or partially folded proteins, thereby preventing them from being degraded [1][2][3][4]. Only a certain subset of cellular proteins fold with the assistance of chaperonins, large protein constructs which directly facilitate the folding process with the participation of ATP molecules. The chaperonin exists as a back-to-back linked double-ring complex. The 7-fold symmetric rings of GroEL interact with the co-chaperonin GroES. The mechanism of ATP binding and its coupling with internal structural changes in the cis-ring (chains A-G in this paper) and the trans-ring (chains H-N in this paper) reveals the functioning algorithm of the folding machine. Each part is responsible for a specific element of this algorithm [1][2][3][4].
The object of the analysis was the chaperonin used as an example to search for possible mechanisms for the generation of such large constructions with a nano-machine character. The question may be asked, how do these proteins become folded? How do they influence the substrate folding?
An attempt to answer these questions on the basis of the "fuzzy oil drop" model [5][6][7][8][9][10][11][12][13] is presented in this paper. The model helps clarify to what extent the hydrophobicity distribution may help, support or even direct protein folding. The object of the work presented is the specific role of chaperonins as nano-machines with two functions: first as a "holder", complexing the folding molecule to prevent misfolding, and second as a "folder", directing the folding process toward the proper native structure [14].
The structure of the GroEL-GroES-(ADP)7 complex is a very good example for testing the applicability of the "fuzzy oil drop" model to recognizing the structural and functional specificity of a protein, because it is a multi-subunit assembly comprising rings of subunits stacked back to back. The presence of ADP ligands makes it possible to analyze ligand docking to this molecule. The complete complex is presented in parts, distinguishing the structural and functional fragments of this multi-subunit construction.

Data
The structure of the object under consideration was taken from the PDB (deposit 1AON) [15].

"Fuzzy-oil-drop" Model
It is assumed that the presence of an external force field of hydrophobic character expressed by the three-dimensional Gauss function is able to direct the protein folding toward hydrophobic core generation, with the simultaneous exposure of hydrophilic residues toward the surface of the protein molecule.
The external force field is represented by the three-dimensional Gauss function. The value of the Gauss function (traditionally interpreted as a probability density value) is assumed to represent the hydrophobic density in the protein body. The hydrophobicity density can be calculated for any point in the space covering the protein molecule.
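The three-dimensional Gauss function is given as follows:

$$Ht_j = \frac{1}{Ht_{sum}} \exp\left(\frac{-(x_j - \bar{x})^2}{2\sigma_x^2}\right) \exp\left(\frac{-(y_j - \bar{y})^2}{2\sigma_y^2}\right) \exp\left(\frac{-(z_j - \bar{z})^2}{2\sigma_z^2}\right)$$

The value of $Ht_j$ is assumed to represent the hydrophobicity density at a particular point $(x_j, y_j, z_j)$ belonging to the protein body, and $Ht_{sum}$ is the sum over all points under consideration, which standardizes the distribution. The hydrophobicity maximum is localized in the center of the ellipsoid $(\bar{x}, \bar{y}, \bar{z})$ and decreases in a distance-dependent manner according to the Gauss function. The mean value at which the Gauss function reaches its maximum is localized at the (0,0,0) point of the coordinate system. The standard deviation values $\sigma_x$, $\sigma_y$, $\sigma_z$, calculated for each dimension (axis) separately, represent the size of the drop, which depends on the length of the polypeptide under consideration [7].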
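As an illustration, the idealized distribution can be evaluated directly at the effective-atom positions. The following is a minimal sketch, assuming that the drop has already been centred at the origin and oriented, and that the values are standardized by dividing by their sum. The function name `theoretical_hydrophobicity` and its arguments are hypothetical and are not taken from the authors' program.

```python
import numpy as np

def theoretical_hydrophobicity(xyz, sigma):
    """Standardized 3-D Gauss value at each effective-atom position.

    xyz   : (N, 3) coordinates of a drop centred at (0, 0, 0)
    sigma : (sigma_x, sigma_y, sigma_z), one value per axis
    """
    xyz = np.asarray(xyz, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # The product of three 1-D Gaussians equals exp of the summed exponents.
    gauss = np.exp(-np.sum(xyz**2 / (2.0 * sigma**2), axis=1))
    # Assumption: standardization is a division by the sum over all points.
    return gauss / gauss.sum()
```

A point at the origin receives the largest value, and points lying one standard deviation away along different axes receive equal values.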
The j-th grid point, for which the hydrophobicity is calculated, represents the effective atom position (averaged position of the side chain including Cα atoms) making possible attachment of a particular hydrophobicity density to a particular amino acid.
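The effective atom of a residue can be sketched as a plain average of atom positions. This is a minimal sketch under stated assumptions: the residue is given as a dictionary from PDB atom names to coordinates (a hypothetical layout), the backbone atoms N, C, O and OXT are excluded, and Cα is kept as the text states. The name `effective_atom` is illustrative only.

```python
# Assumption: these backbone atoms are left out; CA is kept, as in the text.
BACKBONE_EXCLUDED = {"N", "C", "O", "OXT"}

def effective_atom(residue_atoms):
    """Average position of the side-chain atoms together with CA.

    residue_atoms: dict mapping PDB atom name -> (x, y, z)
    """
    selected = [xyz for name, xyz in residue_atoms.items()
                if name not in BACKBONE_EXCLUDED]
    if not selected:
        raise ValueError("residue has no CA or side-chain atoms")
    n = len(selected)
    # Component-wise mean of the selected atom positions.
    return tuple(sum(p[k] for p in selected) / n for k in range(3))
```

For glycine, which has no side chain, this reduces to the Cα position.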
Before the external hydrophobic force field can be defined for any protein molecule, the molecule shall be oriented in space according to the following procedure:
1. The geometric centre of the molecule shall be localized at the origin of the coordinate system.
2. The longest distance between two residues (each represented by its effective atom, the geometric centre of the side chain of the amino acid) shall overlap one of the axes (say the X-axis).
3. The molecule shall be rotated around the X-axis to orient the longest distance between projections (on the YZ plane) along the Y-axis.
4. The linear size (the maximum inter-atomic distance along the X, Y, and Z axes), increased by 9 Å in each direction (the cutoff distance for hydrophobic interaction), makes possible the calculation of $\sigma_x$, $\sigma_y$, $\sigma_z$.
This is how the geometric parameters of the protein molecule can be interpreted according to the Gauss function.
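The generic orientation procedure above can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' routine: it builds an orthonormal basis from the two longest distances instead of applying two explicit rotations, uses a brute-force O(N²) search for the most distant pair, and adds the 9 Å margin once per axis, which is only one reading of "in each direction". The text does not state how the padded size is converted to σ, so the sketch stops at the padded size. The names `orient_drop` and `farthest_pair_vector` are hypothetical.

```python
import numpy as np

CUTOFF = 9.0  # Å, margin added to the extent along each axis

def farthest_pair_vector(points):
    """Vector joining the two most distant points (brute force)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    i, j = np.unravel_index(np.argmax(dist2), dist2.shape)
    return points[j] - points[i]

def orient_drop(effective_atoms, margin=CUTOFF):
    """Return (oriented coordinates, padded size along X, Y, Z)."""
    xyz = np.asarray(effective_atoms, dtype=float)
    xyz = xyz - xyz.mean(axis=0)              # step 1: centre at the origin
    e1 = farthest_pair_vector(xyz)            # step 2: longest distance -> X
    e1 /= np.linalg.norm(e1)
    proj = xyz - np.outer(xyz @ e1, e1)       # projections onto the YZ plane
    e2 = farthest_pair_vector(proj)           # step 3: longest projected distance -> Y
    e2 /= np.linalg.norm(e2)                  # fails for perfectly collinear input
    e3 = np.cross(e1, e2)                     # right-handed third axis
    oriented = xyz @ np.vstack([e1, e2, e3]).T
    size = oriented.max(axis=0) - oriented.min(axis=0) + margin  # step 4
    return oriented, size
```

For the highly symmetric chaperonin complex the paper uses a user-defined X-axis instead of step 2, so this sketch covers only the general case.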
Taking into account the high symmetry of the system under consideration, a user defined orientation of the coordinate system is necessary. Thus, the initial orientation determining the X-axis is defined by the position of the symmetrical units, so the averaged position of the top elements and averaged position of bottom elements (user-defined) determine the initial orientation of the molecule. The user-defined orientation of the molecule is available and necessary for any protein molecules or complexes before the "fuzzy oil drop" model can be applied. The X-axis defined this way is simultaneously the 7-fold symmetry axis.
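The empirical (observed) distribution of hydrophobicity can differ from the idealized one. The empirical hydrophobicity distribution can be calculated according to the Levitt function [16]:

$$Ho_j = \frac{1}{Ho_{sum}} \sum_{i=1}^{N} H_i^r \begin{cases} 1 - \frac{1}{2}\left[7\left(\frac{r_{ij}}{c}\right)^2 - 9\left(\frac{r_{ij}}{c}\right)^4 + 5\left(\frac{r_{ij}}{c}\right)^6 - \left(\frac{r_{ij}}{c}\right)^8\right] & \text{for } r_{ij} \le c \\ 0 & \text{for } r_{ij} > c \end{cases}$$

where $Ho_j$ represents the empirical hydrophobicity value characteristic for the j-th grid point, $H_i^r$ represents the hydrophobicity characteristic of the i-th amino acid, $N$ is the number of amino acids, $r_{ij}$ is the distance between the j-th grid point and the effective atom of the i-th amino acid, and $c$ expresses the cutoff distance, which has a fixed value of 9.0 Å following the original paper [16].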
The continuity of the Gauss function allows calculation of the hydrophobicity density in any point in space (in the protein body). So any point can be treated as a grid point. It can also be the position of an effective atom in particular. This is why the index j may represent the position of j-th residue, as it is taken in this work. Ho sum represents the sum of all the grid points hydrophobicity. Any hydrophobicity scale may be applied to calculate the observed density of hydrophobicity [16][17][18][19][20][21].
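The observed distribution can be sketched in the same way. The code below is a minimal sketch that assumes the commonly quoted polynomial form of the Levitt function, takes the grid points to be the effective atoms themselves (as in this work), includes the residue's own contribution (r = 0) in the sum, and standardizes by the sum over all points. The name `observed_hydrophobicity` and the argument `h_r` (one hydrophobicity value per residue, from any scale) are hypothetical.

```python
import numpy as np

def observed_hydrophobicity(xyz, h_r, cutoff=9.0):
    """Standardized empirical hydrophobicity at each effective atom.

    xyz    : (N, 3) effective-atom coordinates
    h_r    : (N,) hydrophobicity of each residue
    cutoff : c in the Levitt function, in Å
    """
    xyz = np.asarray(xyz, dtype=float)
    h_r = np.asarray(h_r, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    r = np.sqrt(np.sum(diff**2, axis=-1))     # pairwise distances r_ij
    q = (r / cutoff) ** 2
    # Distance-dependent weight: 1 at r = 0, falling to 0 at r = cutoff.
    weight = np.where(r <= cutoff,
                      1.0 - 0.5 * (7*q - 9*q**2 + 5*q**3 - q**4),
                      0.0)
    ho = weight @ h_r                         # sum over i of H_i * weight_ij
    return ho / ho.sum()                      # division by Ho_sum
```

Two residues further apart than the cutoff do not influence each other, so each keeps only its own standardized value.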
Since both distributions are standardized (by the coefficients $1/Ht_{sum}$ and $1/Ho_{sum}$), they can be compared directly. The difference between the theoretical and the observed value, $\Delta H_i = Ht_i - Ho_i$, expresses the irregularity of the hydrophobicity distribution. Negative values of $\Delta H_i$ represent an area of hydrophobicity higher than expected. An area of such characteristics, when localized on the surface of the protein, seems to represent the potential area responsible for protein-protein complex creation.
The profile of $\Delta H_i$ values shows the discrepancy between the idealized and empirical hydrophobicity density distributions, revealing the fragments (or individual residues) representing hydrophobicity excess ($\Delta H_i$ negative) and hydrophobicity deficiency ($\Delta H_i$ positive). It is expected that the irregularity of hydrophobicity versus the idealized distribution may indicate the localization of a biological function-related area in the protein body.
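The sign convention of the profile can be made concrete with a few lines of code. This sketch assumes that the discrepancy is the theoretical value minus the observed one, which is the only choice consistent with excess being negative. The function name and the label strings are hypothetical.

```python
def discrepancy_profile(ht, ho):
    """Return a list of (delta_h, label) pairs, one per residue.

    ht, ho: standardized theoretical and observed hydrophobicity values.
    """
    profile = []
    for t, o in zip(ht, ho):
        delta = t - o  # assumption: theoretical minus observed
        if delta < 0:
            label = "excess"       # more hydrophobic than the ideal drop
        elif delta > 0:
            label = "deficiency"   # less hydrophobic than the ideal drop
        else:
            label = "accordant"
        profile.append((delta, label))
    return profile
```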

Protein Partitioning
1AON is a large and complex protein molecule. This is why several approaches have been applied:
1. The complete molecule was treated as one uniform "drop"; the molecule was oriented according to its 7-fold symmetry axis (the X-axis).
2. The chaperonin molecule represents three levels of organization: two stacked rings (Gro-EL) with the third one (Gro-ES) as a "cap".
3. Each ring (Gro-EL) and the "cap" (Gro-ES) is composed of seven identical polypeptide chains, which are also treated as structural units (chains A-G, H-N and O-U).
4. The polypeptide chains belonging to Gro-EL evidently represent a two-domain construction. This is why each such domain is treated as an independent part and as a "drop".
The partitioning shown above is aimed at defining the folding unit and the possible path leading to complex generation (Figure 1). 1AON represents the chaperonin additionally complexed with ADP and some Mg2+ ions. The localization of these ligands will be analyzed with respect to the construction of the "fuzzy oil drop".

Identification of the Non-Bonding Interactions
A cut-off of 3.9 Å was used to identify the residues interacting with ions, the ligand (ADP) and protein (other chains). The cut-off value follows the criteria applied in the PDBsum [22] database, to make comparison of the results possible.

Implementation
All results have been obtained using our own program, written in the Python [23] programming language. The program has been divided into multiple subroutines to ensure flexibility in dealing with PDB [24] files, which was required to work on the protein partitions described above.
The first routine reads the input PDB file, removes all water residues and classifies non-empty chains into three groups: protein, nucleic and "hetero". By "hetero" we mean neither protein nor nucleic acid chains, presumably containing only heteroatoms. All operations regarding PDB files are conducted using methods implemented in the Biopython [25] library.
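The classification step can be illustrated without any external library. The sketch below is a stand-in for the Biopython-based routine, not a reproduction of it: it reads fixed-column ATOM/HETATM records directly, and the residue-name sets and the rule "a chain with at least one amino acid is a protein chain" are assumptions made for the example. The name `classify_chains` is hypothetical.

```python
AMINO_ACIDS = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS",
               "ILE", "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP",
               "TYR", "VAL"}
NUCLEOTIDES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
WATER = {"HOH", "WAT"}

def classify_chains(pdb_lines):
    """Group chain IDs into protein / nucleic / hetero, skipping water."""
    residues_by_chain = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        res_name = line[17:20].strip()   # PDB columns 18-20
        if res_name in WATER:
            continue                     # water residues are removed
        chain_id = line[21]              # PDB column 22
        residues_by_chain.setdefault(chain_id, set()).add(res_name)
    groups = {"protein": [], "nucleic": [], "hetero": []}
    for chain_id, names in sorted(residues_by_chain.items()):
        if names & AMINO_ACIDS:
            groups["protein"].append(chain_id)
        elif names & NUCLEOTIDES:
            groups["nucleic"].append(chain_id)
        else:
            groups["hetero"].append(chain_id)
    return groups
```

A chain that contains only water never appears in the result, which mirrors the restriction to non-empty chains.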
The second routine is the core of the "fuzzy oil drop" evaluation: it performs an extraction of the residue subsets from the protein chains that the user is interested in and computes their effective atoms, which form the "drop". The drop is placed in the origin and rotated to achieve a desired spatial orientation, as stated in the model description. After the drop size becomes known, theoretical and observed hydrophobicity are evaluated using optimized array operators from Numpy [26] (Numerical Python) library.
Identification of interactions is conducted using a fast k-d tree algorithm implemented in the Biopython library, by checking the position of each atom from the selected residues against all atoms in the file. Each neighbor atom placed within the given radius of 3.9 Å is mapped to its parent residue and then classified as a protein, nucleic, ligand or ion contact. Contacts with protein residues from the same chain are ignored.
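The contact rule itself is independent of the search structure. The following sketch replaces the k-d tree with a brute-force loop, which gives the same contacts at a higher cost, and assumes a hypothetical atom record with the keys `xyz`, `chain`, `resid` and `kind`. It is meant only to show the cutoff test and the same-chain exclusion.

```python
import math

CONTACT_CUTOFF = 3.9  # Å, as stated in the text

def find_contacts(selected_atoms, all_atoms, cutoff=CONTACT_CUTOFF):
    """Map each selected residue to the set of residues it contacts.

    Every atom is a dict with keys 'xyz', 'chain', 'resid' and 'kind',
    where 'kind' is 'protein', 'nucleic', 'ligand' or 'ion'.
    """
    contacts = {}
    for a in selected_atoms:
        for b in all_atoms:
            # Contacts with protein residues of the same chain are ignored.
            if b["kind"] == "protein" and b["chain"] == a["chain"]:
                continue
            if math.dist(a["xyz"], b["xyz"]) <= cutoff:
                key = (a["chain"], a["resid"])
                partner = (b["kind"], b["chain"], b["resid"])
                contacts.setdefault(key, set()).add(partner)
    return contacts
```

Collecting partners in a set means that several atom pairs between the same two residues count as a single residue-level contact.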
The last routine sets the output values for every residue (the same for each of its atoms) in the PDB file: the normalized hydrophobicity discrepancy overwrites the beta-factor column and the contact type overwrites the occupancy column. Graphical representations of the results are plotted using the Matplotlib library [27].
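Overwriting the two columns amounts to replacing fixed-width fields of each coordinate record. The sketch below assumes the standard PDB layout (occupancy in columns 55-60, temperature factor in columns 61-66, both printed as `%6.2f`), a numeric code for the contact type, and a discrepancy value that has already been scaled to fit two decimals. The function name and the numeric coding are hypothetical.

```python
def annotate_atom_line(line, delta_h, contact_code):
    """Return the record with occupancy and B-factor fields replaced.

    delta_h      : normalized hydrophobicity discrepancy -> B-factor field
    contact_code : numeric code of the contact type      -> occupancy field
    """
    if not line.startswith(("ATOM", "HETATM")):
        return line                      # other records pass through unchanged
    padded = line.rstrip("\n").ljust(80)
    # Columns 1-54 are kept, 55-60 and 61-66 are rewritten, the rest is kept.
    new_line = f"{padded[:54]}{contact_code:6.2f}{delta_h:6.2f}{padded[66:]}"
    return new_line.rstrip()
```

Files written this way can be coloured by B-factor in common molecular viewers to display the profile on the structure.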

Clustering Analysis
The agglomerative hierarchical clustering ("ahc") algorithm has been applied to analyze the agreement between the subjective interpretation of the "fuzzy oil drop" model applicability and an objective discrimination of elements sharing common characteristics [28]. Clustering is the assignment of objects (feature vectors) into groups called clusters so that objects from the same cluster are more similar to each other than objects from different clusters. A short description of the "ahc" algorithm is as follows. Suppose we have a data set of N objects to be clustered and a user-defined distance measure $d(x_i, x_j)$ stating how similar two objects $x_i$ and $x_j$ are, for any i and j. The "ahc" algorithm starts by putting each data object $x_i$ in a singleton cluster $\{x_i\}$ and then keeps joining the closest pair of clusters, where the distance between two clusters $C_a$ and $C_b$ is given by one of the following linkage rules:

$$d_{min}(C_a, C_b) = \min_{x_i \in C_a,\, x_j \in C_b} d(x_i, x_j)$$

$$d_{max}(C_a, C_b) = \max_{x_i \in C_a,\, x_j \in C_b} d(x_i, x_j)$$

$$d_{avg}(C_a, C_b) = \frac{1}{|C_a|\,|C_b|} \sum_{x_i \in C_a} \sum_{x_j \in C_b} d(x_i, x_j)$$

where $|C|$ denotes the number of objects in a cluster $C$. Thus, the distance between clusters is the minimum (or maximum or average) of the distances between one object from the first and another from the second cluster. Once the dendrogram is generated, for an assumed number of k clusters the procedure cuts the k-1 longest links in the dendrogram. The described "ahc" algorithm was implemented as a function in the software package Matlab v.7. First, the implemented "ahc" algorithm was used to cluster the objects in each of the files separately, according to the partition (Section 2.3, Protein Partitioning), into k=2 groups. Each object $x_i$ in these files is represented as a four-dimensional feature vector $x_i = (x_i^1, x_i^2, x_i^3, x_i^4)$, where $x_i^1, x_i^2, x_i^3$ are the spatial coordinates x, y, z of the object $x_i$, while $x_i^4$ is its estimated ΔH value. For each object $x_i$ its correct classification, a label '0' or '1', is known according to the PDBSum criteria ('0' denotes a residue engaged in protein-protein interaction, '1' a residue not engaged in protein-protein interaction).
The clustering result for each form of partitioning (as described in Section 2.3, Protein Partitioning), compared with the correct classification, allowed the calculation of the clustering performance, defined as the number of objects correctly assigned to their groups divided by the total number of objects in a file.
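The clustering and its scoring can be sketched in plain Python. This is a naive illustration, not the Matlab function used in the paper: it merges the closest pair of clusters until k clusters remain, which for the minimum and maximum linkage should give the same partition as cutting the k-1 longest links, and it uses the Euclidean distance on the four-dimensional vectors. Since cluster numbers are arbitrary, the performance function tries both label assignments for k=2; that matching rule is an assumption, as are all names.

```python
import math

def ahc(points, k=2, linkage=min):
    """Naive agglomerative clustering; returns one cluster label per point.

    linkage: min (single) or max (complete), applied to pairwise distances.
    """
    clusters = [[i] for i in range(len(points))]   # start from singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)             # join the closest pair
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def clustering_performance(predicted, expected):
    """Fraction of objects placed in the correct group (two groups only)."""
    n = len(expected)
    same = sum(p == e for p, e in zip(predicted, expected))
    return max(same, n - same) / n                 # allow swapped labels
```

In practice the spatial coordinates and ΔH lie on very different scales, so the choice of distance measure strongly affects which feature drives the grouping.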

Results
The results may be summarized as follows. A molecule of high complexity like 1AON is difficult to describe in a simple way, so the partitioning described in the Methods part was introduced. Taking the ΔH profiles as the criterion, the chains A-G appeared identical (results not shown), as did the chains H-N. The chains present in the Gro-ES part also appeared to have identical ΔH profiles. This is why the chains A, H and O were taken to represent the particular rings.

Hydrophobicity Density Irregularity in Chaperonin Molecule
The ΔH profile of chain A, taken to represent the chains A-G, is shown in Figure 2. The residues engaged in ligand binding are distinguished, as are the residues involved in protein-protein complexation. According to the "fuzzy oil drop" model, the ΔH minima, representing hydrophobicity excess on the surface of the protein, are assumed to be potential protein-protein contact areas. The ΔH maxima (hydrophobicity deficiency) are correspondingly expected to indicate residues engaged in ligand binding. The characteristics of residues engaged in P-P or ligand binding change depending on the relative localization of these residues within the "oil drop" construction. The same analysis performed for chain H, representing the 7-fold system of the chains H-N, is presented in Figure 3. There is no difference between the amino acid sequences of chains A-G and H-N, although some small differences between the ΔH profiles are observed. This is possibly the result of different complexation conditions: the chains H-N have no contact with Gro-ES, and no ligand is bound to this ring. The accordance of the protein-protein interaction area with expectations based on the "fuzzy oil drop" model is lower in chain H than in chain A.
The ΔH profile of chain O, calculated for the Gro-ES fragment, reveals rather large fragments of hydrophobicity deficiency in this part of the chaperonin (Figure 6). The residues engaged in the interaction with chain A are of special importance. Having very low values of ΔH (hydrophobicity excess), these residues of chain O fit well with the residues of chain A engaged in the interaction with chain O (residues 233-269), which also have low values of ΔH. A high accordance can be seen in this case. The residues of low ΔH values in the N- and C-terminal fragments are engaged in the interaction with chains of the cis-ring. This interaction seems to have the form of a hydrophobic interaction, although fragments of hydrophobicity-deficient character are also engaged in protein-protein interaction.
In conclusion, one may say that the domains of the H and A chains (especially the domain containing the residues 192-371) are highly accordant with the model. The domains generated by the N-terminal and C-terminal polypeptide fragments also seem to be accordant with the "fuzzy oil drop" model.

Clustering
To make the interpretation of the ΔH profiles more objective, the clustering analysis (as described in Section 2.6, Clustering Analysis) was performed. The results are shown in Table 1.
Table 1. Accordance between the expected clustering and the observed one for the classification of amino acids engaged in protein-protein complexation versus all others (not engaged in protein-protein interaction; third column) and of amino acids engaged in protein-protein complexation versus those which are not in protein-protein contact (the residues engaged in ligand or ion binding excluded). P-P denotes the residues engaged in protein-protein interaction.
The highest accordance between the expected classification (according to the "fuzzy oil drop" model) and the observed one (according to the PDBSum criteria) was obtained for the domain 192-371 in chain A. This observation supports the assumption that this domain could be the first one to fold spontaneously according to the "fuzzy oil drop" model, exposing on its surface the hydrophobic residues in contact with other polypeptide chains. The chain H, although it has the identical sequence, displays some differences versus the A chain, suggesting that it folded under other conditions than the chain A. The lower accordance between the expected and observed classification for all other fragments of the complex (partitioning) suggests that the proper unit to be applied for protein folding in the environment simulated by the "fuzzy oil drop" is the domain 192-371 of the chain A. The relatively high accordance observed for the entire complex (chaperonin molecule) may be interpreted as an indication of the reliability of the "fuzzy oil drop" model. It suggests that the protein-protein interaction in the complete molecule may be recognized by the minima of ΔH in the profile. The results given in Table 1 seem to support the observations presented in Figures 4 and 5.

Structure-to-function Characteristics
The "fuzzy oil drop" model may also be applied to recognize the biological function of the protein under consideration [10][11][12]. The role of the chaperonin molecule (complex) is to create the environment for folding proteins. Thus, the characteristics of the internal surface of the capsule seem to be of special importance with respect to the biological function of this molecule.
The ΔH values of the residues localized on the internal surface of the complex are presented in Figure 7, and their 3-D representation in Figure 8. These two figures show the residues localized on the internal surface of the capsule, ordered along the X-axis. The high excess of hydrophobicity in Gro-ES suggests a high participation of this type of interaction in substrate binding. The cis-ring (chains A-G), in contrast to the GroES part, presents highly differentiated characteristics of hydrophobicity excess and deficiency, although biased significantly toward hydrophobicity deficiency. This may be interpreted as a specific distribution of stronger and weaker interactions with a substrate molecule. The trans-ring (chains H-N) presents rather low differentiation of hydrophobicity excess/deficiency, with ΔH values close to zero (particularly in the end area). This suggests high accordance of the hydrophobicity distribution with the expected one (accordant with the "fuzzy oil drop" model). Figures 7 and 8 present the hydrophobicity irregularity of the internal channel where the folding reaction takes place. These two pictures show the characteristics of the external force field of hydrophobic character. Its specific deformation (in the sense of irregularity versus the idealized Gauss function distribution) seems to represent the localization of possible anchorage for the folding protein in the chaperonin capsule.
The discussion of hydrophobicity-based structure-to-function relationships also concerns other parts of the complex presenting highly irregular hydrophobicity distributions. In particular, the excess hydrophobicity areas not engaged in protein-protein interactions (responsible for complex generation) seem to represent regions potentially ready to interact under changed circumstances during the action of the chaperonin. The structural changes are reported to be a large deviation from the 7-fold symmetry of the Gro-EL rings [14]. The possible protein-protein interaction can be simulated by linking two hydrophobicity-excess areas of interacting chains. This can be seen in Figure 9.
Figure 9. Relation between the distance of residues from the 7-fold symmetry axis and their ΔH values, which express the degree of hydrophobicity density irregularity. The dark blue symbols represent all residues present in the Gro-EL/Gro-ES complex. The pink symbols distinguish the residues of the Gro-ES fragment. The yellow symbols show the residues engaged in ligand binding. The light green symbols show the residues engaged in ion binding. The light blue symbols in the lower picture represent the residues engaged in protein-protein interaction. The residues in the area distinguished by the red circle are residues on the chaperonin surface potentially ready for hydrophobic protein-protein interaction. The residues of high ΔH localized on the internal cylinder surface (belonging to the set distinguished by the green circle) are potentially ready (not being engaged in any other interaction in the complex) to interact according to the non-bonding interaction category. P-P denotes protein-protein contact, ES-EL the entire complex, Gro-ES the ES fragment, LIG interaction with ADP, and ION residues engaged in ion binding.
The large-scale structural changes observed to accompany the folding process engage residues potentially ready to interact. The residues localized on the surface (at a large distance from the 7-fold symmetry axis) carrying highly negative ΔH values seem to be ready for hydrophobic interaction (distinguished by the red circle in Figure 9). The residues localized close to the 7-fold symmetry axis and having large positive ΔH values are ready for non-bonding interaction (distinguished by the green circle in Figure 9). The first possibility seems to be related to structural changes in the chaperonin molecule, while the second one seems to be related to the interaction with the folding protein molecule (internal surface of the capsule) [29].

Conclusions
The "fuzzy oil drop" model was assumed to identify areas of excess hydrophobicity on the protein surface as potential areas for protein-protein interaction(s). The analysis of 1AON was treated as an example allowing the limits of applicability of this model to the prediction of protein-protein interaction areas and ligand binding to be estimated. The CAPRI [30] initiative is oriented toward blind prediction of protein-protein complexation. The presented model was assumed to apply to the recognition of protein-protein interactions. According to the results shown in this paper, some fragments of the polypeptide representing local ΔH minima in the profile can be treated as potential regions engaged in protein-protein complexation, particularly when calculated for domains present in the protein structure. The specificity of individual domains is able to determine the protein-protein complexation. They may be treated as the original source of this process. The ligand localization also appeared to be accordant with expectations: in the fragments of high ΔH values. Additionally, the analysis of the histograms, particularly those calculated for domains (which are assumed to be folded independently), supports this interpretation very well (see Figures 4 and 5). The fragments of low ΔH values not engaged in protein-protein interaction in the complex under consideration seem to represent fragments potentially ready for this type of interaction. The large structural deformations experimentally observed in the chaperonin molecule during protein (substrate) folding seem to be possible in a molecule with hydrophobic areas on the protein surface, which is what is observed in 1AON [14].
This observation seems to be additionally supported by the well-defined correlation between the hydrophobicity of a side chain and the logarithm of the folding rate reported in [31], where an almost perfect linear correlation was found in plots of ΔΔG versus the change in hydrophobicity for a few proteins. This is why the analysis of hydrophobicity distribution in protein bodies seems to be of high importance.
The "fuzzy oil drop" model was designed to represent the external force field that generates the environment for the folding process, assumed to direct the hydrophobic residues toward the center of the molecule and to expose the hydrophilic residues on the surface. The specific irregularity (ΔH profile) appeared to be related to biological function [12]. This observation is assumed to support the postulated hypothesis that specific ligand participation in the folding process is necessary to ensure the generation of a highly specific (ligand binding) cavity [6][7][8][9][10][11][12]. The influence of the external force field seems to be obvious in the case of the folding process assisted by a chaperonin molecule. This molecule is assumed to create the proper environment for folding polypeptide chains [31]. The hydrophobicity-based characteristics of the interior of the chaperonin capsule seem to be able to direct the folding process in the form of a controlled distribution of hydrophobicity excess/deficiency in the folding molecule. The fragments of high positive ΔH values (hydrophobicity deficiency) fix the non-bonding interactions, and the fragments of low ΔH (hydrophobicity excess) constrain the hydrophobicity-based interactions, keeping the hydrophobic residues on the surface of the folding polypeptide if necessary. Assuming that the interior of the Gro-EL chamber really introduces restraints of this character, the folded molecule shall represent the structure of ΔH distribution on the protein surface complementary to the internal