Statistical Properties of Protein-protein Interfaces

The properties of 1172 protein complexes, downloaded from the Protein Data Bank (PDB), have been studied based on the concept of circular variance as a buriedness indicator and the concept of mutual proximity as a parameter-free definition of contact. The propensities of residues to be in the protein, on the surface, or in contact, as well as the propensities of residue pairs to form contacts, were calculated. In addition, the concept of circular variance has been used to compare the ruggedness and shape of the contact surface with those of the overall surface.


Introduction
Formation of protein complexes is a ubiquitous biological process. However, knowledge of the structure of individual proteins is rarely sufficient to unequivocally predict the interaction surface of two proteins. This problem manifests itself in the performance of various computational tools designed to predict such complex formation: a large number of predicted structures usually look "reasonable". Given this conundrum, tools that help improve the ranking of such a set of models for a complex can be of significant help.
The increase in the number of complexes in the Protein Data Bank has led to statistical analyses. Jones and Thornton [1] compared the solvation potential, residue propensity, hydrophobicity, planarity, protrusion and accessible surface area of the interfaces and the rest of the surface of 48 complexes. Progress in the area has been reviewed by Moreira et al. [2]. Energetic and evolutionary properties of the interfaces were examined by Ma et al. [3]. Newer analyses of protein interaction surfaces were reported in [4][5][6]. A tool kit [7] and web servers [8,9] are also available that calculate a series of physical and chemical properties of the interaction sites. The role of conserved residues in interaction surfaces has also been studied [10][11][12]. Databases have been developed with characteristics of the crystallographically determined interfaces [13][14][15]. A recent study examined the relationship between binding affinity and interfacial buried surface area [16]. The geometry of interaction surfaces has also been studied [17] using Voronoi tessellation [18]. A method for measuring surface roughness, different from the one described in Section 3.2, has been introduced in [19]. A method for differentiating between flat and protruding/interwinding interaction surfaces, different from the one described in Section 3.3, has been described in [20].
The present paper applies the concept of mutual proximity to delineate contact pairs on the interaction surface and uses the concept of the circular variance to examine some geometric properties of the interaction surface.

Experimental Section
The calculations presented in this paper are based on a number of geometric concepts introduced previously: accessible surface, circular variance and mutual proximity. Here, they are introduced briefly.
Accessible surface [21] is an extension of the van der Waals (VdW) surface. It is defined by the center of a solvent-size sphere (for water, of radius 1.4 Å) as it is rolled around the VdW surfaces of the atoms in the target. Such a surface is generally smoother than the VdW surface, as the solvent sphere does not fit into the crevices formed by the VdW surfaces of the atoms.
Circular variance (CV), a concept introduced for the characterization of angular spread [22], has been found to be useful for characterizing the extent to which a point is buried within a set of points [23]. The degree-of-buriedness indicator is calculated as the circular variance of the vectors drawn from a test point to the points in the set. When the test point is far outside the set, these vectors are essentially parallel, so CV is near zero; when the test point is in the middle of the set, CV is close to one:

$$ CV = 1 - \frac{1}{n} \left| \sum_{i=1}^{n} \frac{\mathbf{r}_i - \mathbf{r}}{\left| \mathbf{r}_i - \mathbf{r} \right|} \right| $$

where $\{\mathbf{r}_i\}$ is the set of vectors pointing to the $n$ atoms of the protein and $\mathbf{r}$ is the vector pointing to the test point.
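The buriedness indicator can be illustrated with a minimal sketch, assuming the standard form of circular variance (one minus the length of the mean unit vector). The function name `circular_variance` and the toy octahedral point set are hypothetical and serve only as an illustration; they are not taken from the software used in this study.

```python
import math


def circular_variance(test_point, points):
    """Return 1 - |sum of unit vectors from test_point to each point| / n."""
    sx = sy = sz = 0.0
    n = 0
    for p in points:
        dx = p[0] - test_point[0]
        dy = p[1] - test_point[1]
        dz = p[2] - test_point[2]
        d = math.sqrt(dx * dx + dy * dy + dz * dz)
        if d == 0.0:
            continue  # a point coinciding with the test point has no direction
        sx += dx / d
        sy += dy / d
        sz += dz / d
        n += 1
    if n == 0:
        raise ValueError("no points distinct from the test point")
    return 1.0 - math.sqrt(sx * sx + sy * sy + sz * sz) / n


# Toy point set (assumption): six points at the vertices of an octahedron.
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

cv_center = circular_variance((0, 0, 0), octahedron)    # unit vectors cancel
cv_far = circular_variance((1000, 0, 0), octahedron)    # vectors nearly parallel
```

In this toy case the centered test point gives a CV of one because the six unit vectors cancel, while the distant test point gives a CV very close to zero.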
The concept of mutual proximity was used earlier to establish the contacts between a target and a docked ligand pose [24]. In this application, for each atom in a pair of proteins the one nearest to it from the other protein is established; whenever atom i1 is nearest to i2 and at the same time atom i2 is nearest to i1, the pair of atoms [i1, i2] is considered a contact pair. Figure 1 shows a schematic demonstrating the relation between contact and mutual proximity.
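The mutual-proximity rule can be sketched as follows. This is a brute-force illustration under the definition given above, not the implementation used in the study; the function names and the toy coordinates are hypothetical, and a spatial index would be preferable for real structures.

```python
import math


def nearest_index(point, others):
    """Index of the atom in `others` that is closest to `point`."""
    return min(range(len(others)), key=lambda j: math.dist(point, others[j]))


def mutual_proximity_pairs(atoms_a, atoms_b):
    """Pairs (i, j) where b[j] is nearest to a[i] and a[i] is nearest to b[j]."""
    nearest_in_b = [nearest_index(a, atoms_b) for a in atoms_a]
    nearest_in_a = [nearest_index(b, atoms_a) for b in atoms_b]
    return [(i, j) for i, j in enumerate(nearest_in_b) if nearest_in_a[j] == i]


# Toy coordinates in Å (assumption, for illustration only).
atoms_a = [(0, 0, 0), (3, 0, 0), (10, 0, 0)]
atoms_b = [(0, 1, 0), (4, 1, 0)]

pairs = mutual_proximity_pairs(atoms_a, atoms_b)
```

Here the third atom of the first set has a nearest neighbor in the second set, but that neighbor is closer to another atom, so the pair is not mutual and is not counted as a contact. No distance threshold enters the definition.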
It was also found that a robust definition of surface atoms is obtained by selecting atoms whose exposed accessible surface fraction exceeds 3% and whose CV, calculated with respect to the rest of the protein atoms, is less than 0.8. As the average distance between closest-pair contact atoms is 5.0 Å, surface atoms within 4 Å of a contact atom were considered part of the interface. This choice ensured that most, if not all, non-contact surface atoms were included in the interface. All statistics reported in this paper were calculated using these definitions. As the data involved crystal structures, only heavy atoms were considered.
The test data set contained 1172 protein complexes, downloaded from the Protein Data Bank (PDB) [25]. The complexes selected were of high resolution and had no missing residues. For each complex, the PDB annotation for biological oligomers was used. When the complex contained more than two members, the pair with the largest number of contacts was retained. The PDB IDs of the 1172 complexes, together with the chain IDs used, are provided as supporting information.
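The two selection rules can be written as simple filters. The sketch below assumes that the exposed surface fractions and CV values have already been computed per atom and that the contact atoms are known; the function names, argument names and toy values are hypothetical.

```python
import math


def surface_atoms(exposed_fraction, cv, min_exposed=0.03, max_cv=0.8):
    """Indices of atoms with exposed fraction above 3% and CV below 0.8."""
    return [
        i
        for i, (frac, c) in enumerate(zip(exposed_fraction, cv))
        if frac > min_exposed and c < max_cv
    ]


def interface_atoms(coords, surface_idx, contact_idx, cutoff=4.0):
    """Surface atoms lying within `cutoff` Å of any contact atom."""
    contacts = [coords[k] for k in contact_idx]
    return [
        i
        for i in surface_idx
        if any(math.dist(coords[i], c) <= cutoff for c in contacts)
    ]


# Toy per-atom data (assumption, for illustration only).
coords = [(0, 0, 0), (3, 0, 0), (8, 0, 0), (1, 1, 0)]
exposed = [0.50, 0.20, 0.40, 0.01]
cv_values = [0.40, 0.60, 0.50, 0.90]

surf = surface_atoms(exposed, cv_values)
iface = interface_atoms(coords, surf, contact_idx=[0])
```

In the toy data the last atom fails both surface criteria, and the third atom is on the surface but too far from the contact atom to belong to the interface.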

Residue Propensities
Based on our definition of surface and contact atoms, the number of occurrences of each of the 20 amino acids (AA) was calculated in the whole set, in the set of surface atoms and in the set of contact atoms. The results are in Table 1, with AAs arranged in the order hydrophobic, polar, charged. Columns 1, 2, and 4 of Table 1 show the percentage of times each residue occurred in the dataset (all), on the surface (surf) or among the contact atoms (cont), respectively. Note that uniformly distributed residues would each account for 5%. Column 3 shows the ratio of surface and overall propensities; not surprisingly, the excess likelihood of being on the surface increases moving from the hydrophobic to the charged residues. Residues participating in multiple contacts were only counted once. Columns 5 and 6 of Table 1 show the ratios %(cont)/%(all) and %(cont)/%(surf), respectively. It is the last column that gives the most striking result: there are significant differences between the contact-forming propensities of surface residues. All three aromatic AAs are strong contact formers, but so are MET, CYS and HIS. Interestingly, of the charged residues only ARG shows significant contact-forming propensity. Recent work showed that "hot spots" of protein-protein interactions frequently contain tryptophan, arginine, and tyrosine; this is in accordance with their larger contact-forming propensities found in the present study [26].
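The columns of Table 1 follow from simple counting. The sketch below shows how the percentages and the ratio columns relate to residue counts; the two-residue counts are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
def percentages(counts):
    """Convert residue counts into percentages of the total."""
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def ratio(numerator, denominator):
    """Per-residue ratio of two percentage tables."""
    return {aa: numerator[aa] / denominator[aa] for aa in numerator}


# Invented counts (assumption): a toy alphabet of two residue types.
count_all = {"LEU": 60, "ARG": 40}
count_surf = {"LEU": 20, "ARG": 30}
count_cont = {"LEU": 5, "ARG": 15}

pct_all = percentages(count_all)
pct_surf = percentages(count_surf)
pct_cont = percentages(count_cont)

surf_over_all = ratio(pct_surf, pct_all)     # analogue of column 3
cont_over_all = ratio(pct_cont, pct_all)     # analogue of column 5
cont_over_surf = ratio(pct_cont, pct_surf)   # analogue of column 6
```

A ratio above one in the last table means that the residue is found among contact atoms more often than its surface abundance alone would suggest.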
Table 2 shows the contact-pair forming propensities of AA pairs. The table values are normalized by the surface propensities of the contributing residues:

$$ PR_{i,j} = \frac{N_{i,j}}{\left(2 - \delta_{i,j}\right) P_i P_j \sum_{k \le l} N_{k,l}} $$

where $PR_{i,j}$ is the propensity of the residue pair {i, j} to form contact, $P_i$ is the probability of residue i being on the surface, $N_{i,j}$ is the number of {i, j} pairs found in the data set, $\delta_{i,j}$ is the Kronecker delta (one when i = j, zero otherwise), and the sum runs over all unordered residue pairs. This means that if a residue does not have any preference for contacting residues, then the entries involving that residue should all be one. $PR_{i,j}$ values over 3.0 are highlighted in bold type.
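The normalization can be checked numerically. The sketch below assumes that the pair propensity is the observed fraction of {i, j} pairs divided by the fraction expected if partners were chosen independently according to their surface probabilities, with a factor of two for unordered pairs of different residues (the role of the Kronecker delta). The function name and the toy counts are hypothetical.

```python
def pair_propensities(pair_counts, surface_prob):
    """Observed pair fraction over the fraction expected from surface probabilities."""
    total = sum(pair_counts.values())
    result = {}
    for (i, j), n in pair_counts.items():
        multiplicity = 1 if i == j else 2  # equals 2 - delta(i, j)
        expected = multiplicity * surface_prob[i] * surface_prob[j]
        result[(i, j)] = (n / total) / expected
    return result


# Toy case (assumption): two residue types, equally likely on the surface,
# with pair counts that match random pairing exactly.
surface_prob = {"A": 0.5, "B": 0.5}
pair_counts = {("A", "A"): 25, ("A", "B"): 50, ("B", "B"): 25}

pr = pair_propensities(pair_counts, surface_prob)
```

With counts that match random pairing, every entry equals one, which is the no-preference baseline described above; entries above one indicate pairs that occur more often than expected.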

Ruggedness of the Interaction Surface
The calculated CV value can also be used to characterize the smoothness or ruggedness of a region of the molecular surface. In a smooth surface the difference between the CV values of neighboring surface atoms is small; the more rugged the surface, the larger the CV differences are. This suggests a ruggedness indicator at a surface atom i, RG(i):

$$ RG(i) = \frac{1}{N_n(i)} \sum_{j} \left| CV(i) - CV(j) \right| $$

where j is a surface neighbor atom of i and $N_n(i)$ is the number of surface neighbors of atom i. To differentiate the interaction surface from the overall surface, atoms that are within 4 Å of a contact atom are considered to be part of the interaction surface. The average numbers of surface atoms, contact atoms and interaction surface atoms were found to be 307.0, 50.8 and 72.8, respectively. The average RG values of the interaction and full surface atoms were found to be 0.069 (s.d. = 0.007) and 0.066 (s.d. = 0.005), respectively. The difference is significant at the level p = 0.001. Thus it can be concluded that interaction surfaces are more rugged than the rest.
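A ruggedness indicator of this kind can be sketched as the mean absolute CV difference between a surface atom and its surface neighbors. The text does not state how surface neighbors are identified, so the distance cutoff used below (`neighbor_cutoff`) is purely an assumption, as are the function name and the toy data.

```python
import math


def ruggedness(coords, cv, surface_idx, neighbor_cutoff=4.0):
    """Mean |CV(i) - CV(j)| over surface neighbors j of each surface atom i.

    Neighbors are taken to be surface atoms within `neighbor_cutoff` Å
    (an assumed criterion). Atoms without neighbors are omitted.
    """
    rg = {}
    for i in surface_idx:
        neighbors = [
            j
            for j in surface_idx
            if j != i and math.dist(coords[i], coords[j]) <= neighbor_cutoff
        ]
        if neighbors:
            rg[i] = sum(abs(cv[i] - cv[j]) for j in neighbors) / len(neighbors)
    return rg


# Toy data (assumption): three surface atoms on a line, 3 Å apart.
coords = [(0, 0, 0), (3, 0, 0), (6, 0, 0)]
cv_values = [0.4, 0.5, 0.7]

rg = ruggedness(coords, cv_values, surface_idx=[0, 1, 2])
```

Averaging these per-atom values over the interaction surface and over the full surface gives the two quantities being compared.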

Shape of the Interaction Surface
Comparison of the average CV values of different surface regions can also tell whether a particular surface area is protruding or recessed: the average CV value of a protruding surface area is smaller than the overall average CV value of the surface atoms, while the average CV of a recessed region is larger.
Using the definition of interaction surface as above, the CV values of the interaction and full surface were found to be 0.454 (s.d. = 0.044) and 0.464 (s.d. = 0.029), respectively. While this difference is quite small, it is significant at the level p = 0.001 according to Student's t-test. Thus, we can conclude that the interaction surface is likely to be protruding, even if only slightly.
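The order of magnitude of the test statistic can be checked from the summary values alone. The sketch below assumes an unpaired two-sample comparison with one average CV value per complex (n = 1172 in each group); the text does not specify whether the test was paired or what the sample sizes were, so these are assumptions, and the function name is hypothetical.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples from their summary statistics.

    Uses the unpooled standard error; for equal group sizes this gives the
    same statistic as the pooled (Student) form.
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Reported means and standard deviations; n = 1172 per group is an assumption.
t_cv = two_sample_t(0.454, 0.044, 1172, 0.464, 0.029, 1172)
```

Under these assumptions the statistic is roughly -6.5, well beyond the two-sided critical value of about 3.3 for p = 0.001 at large degrees of freedom, which is consistent with the reported significance.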

Conclusions
Using the parameter-free definition of contact and the concept of circular variance as a buriedness indicator, we found several properties of protein-protein interfaces to be different from those of the overall surface. It is expected that these properties can be exploited as an additional filter, aiding in selecting the best model from the results of in silico predictions.
Identification of residues with high propensity for contact forming can also aid drug discovery. Significant numbers of drugs are designed to disrupt protein-protein associations, and thus targeting residues that are more likely to form contacts increases the probability of success when searching for lead compounds.
The data set used for these comparisons has not been filtered for size, strength of interaction, or size of the interface. It is quite possible that partitioning the data set would refine the differences found: for some subsets the differences could be larger, for others they may become insignificant. Furthermore, some of the results presented are based on a specific definition of contact surface; other definitions may change the results.

Figure 1. Schematic showing the relation between mutual proximity and contact. Arrows from one component (filled circles, full line) to the other component (open circles) show the nearest atom in the other set; arrows with a broken line show the nearest atom in the first set. The two-headed thick arrow shows the mutually proximal pair.

Table 1. Residue propensities to appear (all), be on the surface (surf) or form contact (cont).

Table 2. Contact pair propensities normalized by surface propensities.
GLY ALA