Fractal Hybrid Orbitals Analysis of the Tertiary Structure of Protein Molecules†

The structure and shape of the polypeptide chains of proteins are determined by the hybridized states of the atomic orbitals in the molecular chain. The s-orbital fraction (s ratio) of the $sp^{n}$ hybrid orbitals is computed from the fractal dimension D of the tertiary structures of 43 proteins selected to cover the five structural classes of protein molecules. A brief introduction to fractal theory is given. It is demonstrated that the principles governing the folding of the local and global backbone structures are well characterized by the representation provided by fractal theory. Comparison of the fractal character of protein molecules with that of the ideal Gaussian chain reveals several features of these principles, which are interpreted in terms of steric repulsion. It is also shown that proteins in the β structural class are distinguished quantitatively from those in the other classes by this representation: they show a higher s ratio (0.32), corresponding to $sp^{2.20}$ hybrids, rather similar to planar $sp^{2}$.


Introduction
Several kinds of regular backbone conformations of proteins have been discovered by careful observation of their tertiary structures as determined by X-ray and nuclear magnetic resonance (NMR) analyses. These regular arrangements are the α-helix, the β-pleated sheet and the reverse turns, collectively known as the protein secondary structures [1,2]. Careful observation further revealed that aggregates of secondary structures, which are common to many proteins, make up another class of backbone arrangements known as super-secondary structures. The coiled-coil α-helix [3], the βξβ-unit [4] and the β-meanders [5] are well-known members of this class. In addition to the secondary and super-secondary structures, structural domains, i.e. those parts of a protein that form well-separated globular regions, were also identified as constitutional units of protein conformation [6]. These local arrangements of the backbone chain appear to be not only structural units of the fully folded conformation but also (although this has not yet been completely proven) folding units, which fold almost independently of each other when the protein molecule folds from the denatured to the native conformation.
These local arrangements of backbone chains were discovered mainly by careful, but subjective and qualitative, observation of protein tertiary structures, relying on human pattern-recognition abilities. An objective and quantitative analysis of protein conformation is desirable, but it is hindered by the irregularity of protein structures. Until recently, classical Euclidean geometry and differential geometry were the most common tools for dealing with shapes, but their applicability is limited to regular shapes such as circles, ellipses, parabolas and spheres, or to other differentiable curves and surfaces. They cannot be used for extremely irregular shapes such as protein backbones or the trajectory of a particle undergoing Brownian motion. Applying differential geometry to the analysis of protein structure, Rackovsky and Scheraga demonstrated that various types of ordered backbone structures are well characterized by their differential-geometric representation [7], in which bends are classified in a very natural way. The applicability of their method, however, is strictly restricted to ordered structures or to very short local structures such as bends. A new mathematical tool able to deal with irregular forms is thus desirable for the analysis of protein conformation.
Fractal theory, a comparatively new mathematical theory, has developed rapidly over the last few decades. Its object is to deal with irregular forms that lie beyond the scope of both Euclidean and differential geometry, and it provides a means to extract a rule or regularity hidden within such forms. Fractal theory therefore seems potentially useful in the analysis of the tertiary structures of proteins, which have extremely complex and irregular forms. This paper presents the results of an attempt to apply fractal theory to the analysis of protein tertiary structure. The second section gives a brief introduction to fractal theory. The third section relates the fractal dimension to the bonded state of atomic orbitals and introduces fractal hybrid orbitals. The fourth section presents the results and discussion, and the last section is devoted to the conclusions.

Fractal dimension of proteins
Structures with fractional dimensions are known. Mandelbrot [8][9][10] pioneered the theoretical concepts and physical applications of this relatively new field of geometry and popularized the term fractal for a structure characterized by a fractional dimension [11,12]. By definition, any structure possessing a self-similar motif that is invariant under a change of scale may be characterized by a fractal dimension. Self-similarity is geometric in regular structures; in random or irregular objects, it is primarily statistical in nature. The average end-to-end length L of an unbranched polymer chain is a statistically self-similar property. The fundamental relationship of fractals relates the number N of monomer segments of length ε to L through the fractal dimension D:
$$N = \left(\frac{L}{\varepsilon}\right)^{D} \qquad (1)$$
where the exponent D is also equal to the inverse of the Flory exponent $\nu_{F}$ of polymer theory, $D = 1/\nu_{F}$ [13][14][15][16].
Theoretical considerations bound the fractal dimension in Equation (1) by $1 \le D \le 2$. The lower limit D = 1 corresponds to a linear structure, $L = \varepsilon N$, and the upper limit D = 2 to a structure generated by an unrestricted random walk, $L = \varepsilon N^{1/2}$ [17].
Following the precepts of fractal theory, the length L of a protein molecule is defined as a function of the fineness or coarseness of the scale ε as follows. First, a zigzag line is drawn by connecting the Cα atoms of the protein at intervals of ε residues, starting from the Cα atom of the N-terminal residue. If fewer than ε residues remain to complete a span in the final step, the remaining residues are left unconnected and the drawing stops there. The parameter ε, the interval at which the zigzag line is drawn, represents the coarseness of the scale: a small ε corresponds to a closer observation of the protein molecule.
The length of the molecule at scale ε is defined as the sum of the length of the zigzag line and a correction term that accounts for the residues left unconnected at the C-terminal side:
$$L(\varepsilon) = \sum_{i=1}^{k} l_i(\varepsilon) + \Delta L(\varepsilon)$$
where $l_i(\varepsilon)$ is the length of the i-th segment of the zigzag line and k is the number of segments. The correction term is evaluated as
$$\Delta L(\varepsilon) = \frac{n}{\varepsilon}\,\bar{l}(\varepsilon)$$
where n is the number of residues left unconnected, ε is the scale and $\bar{l}(\varepsilon)$ is the mean length of the segments of the zigzag line.
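The length measure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the C-terminal correction is taken to be (n/ε) times the mean segment length as written above, and D is obtained from the least-squares slope of log L against log ε under the assumption (not stated explicitly in the text) that the fractal diagram follows the coastline-type relation $L(\varepsilon) \propto \varepsilon^{1-D}$. The scales ε = 1 to 10 follow the 10-residue range mentioned in the discussion, and the 3.8 Å spacing of the test chains is the usual Cα–Cα distance.

```python
import math
import random


def zigzag_length(ca, eps):
    """Length L(eps) of a C-alpha trace observed at a scale of eps residues.

    Residues 0, eps, 2*eps, ... are joined by straight segments; the n residues
    that cannot complete a final span add the correction (n / eps) * mean segment.
    """
    last = (len(ca) - 1) // eps * eps          # index of the last joined residue
    segments = [math.dist(ca[i], ca[i + eps]) for i in range(0, last, eps)]
    if not segments:
        raise ValueError("chain is too short for this scale")
    n_left = len(ca) - 1 - last                # residues left unconnected at the C-terminus
    mean_segment = sum(segments) / len(segments)
    return sum(segments) + (n_left / eps) * mean_segment


def fractal_dimension(ca, scales=range(1, 11)):
    """Least-squares slope of log10 L versus log10 eps, assuming L ~ eps**(1 - D)."""
    xs = [math.log10(e) for e in scales]
    ys = [math.log10(zigzag_length(ca, e)) for e in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 - slope


def straight_chain(n_res, bond=3.8):
    """Fully extended test chain with consecutive C-alpha atoms 3.8 Å apart."""
    return [(bond * i, 0.0, 0.0) for i in range(n_res)]


def freely_jointed_chain(n_res, bond=3.8, seed=0):
    """Random chain whose bond directions are uniform on the sphere (Gaussian-like)."""
    rng = random.Random(seed)
    points = [(0.0, 0.0, 0.0)]
    for _ in range(n_res - 1):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        x, y, z = points[-1]
        points.append((x + bond * v[0] / norm,
                       y + bond * v[1] / norm,
                       z + bond * v[2] / norm))
    return points


print(fractal_dimension(straight_chain(200)))         # 1.0 up to rounding error
print(fractal_dimension(freely_jointed_chain(20001)))  # close to 1.5
```

As sanity checks, a straight chain gives D = 1 because the correction term restores the full contour length at every scale, and a simulated freely jointed chain gives a value close to the 1.5 quoted for the Gaussian chain (small deviations arise because the square-root scaling of span lengths holds only asymptotically).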
Fractal diagrams were drawn for 43 proteins by plotting the common logarithm of the scale ε on the abscissa and that of the molecular length L(ε) on the ordinate. The 43 proteins were selected from the Protein Data Bank (PDB) [18] to cover the five protein structural classes [19,20]; their names and classes are listed in Table 1. The five classes, i.e. proteins containing only α-helices, proteins containing almost exclusively β-sheet, proteins in which α-helix and β-sheet tend to be segregated along the chain, proteins in which α-helix and β-sheet tend to alternate along the chain, and proteins with neither α-helix nor β-sheet, are abbreviated hereafter as the α, β, α + β, α / β and "miscellaneous" classes, respectively.
In this work we used the BABEL program, which implements a general framework for converting between 37 file formats used in molecular modelling, including PDB [21]. A version of BABEL called BABELPDB has been written for the search, retrieval, analysis and display of information from the PDB database. It offers several options: (1) conversion from PDB to other formats; (2) addition of hydrogen atoms; (3) removal of water molecules; and (4) separation of the α-carbons. The α-carbon skeleton extracted with BABELPDB allows the ribbon image to be drawn, from which the secondary structure of the protein can be identified. The coordinates obtained with BABELPDB also allowed the presence of hydrogen bonds to be characterized. An algorithm for detecting hydrogen bonds has been implemented in the program TOPO for the theoretical simulation of molecular shapes [22].
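BABELPDB itself is not reproduced here. As a minimal stand-in for its α-carbon separation option, the sketch below (hypothetical function `ca_coordinates`) reads Cα coordinates directly from the ATOM records of a PDB-format file using the fixed column layout of that format. It keeps only the first alternate location and the first model of multi-model (e.g. NMR) entries; these are simplifying assumptions, not part of the procedure described above.

```python
def ca_coordinates(pdb_text, chain_id=None):
    """Return the (x, y, z) coordinates of C-alpha atoms in a PDB-format text.

    Fixed PDB columns (1-based): atom name 13-16, alternate location 17,
    chain identifier 22, x 31-38, y 39-46, z 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                                  # keep only the first model
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue                               # HETATM records (e.g. calcium ions) are skipped too
        if line[16] not in (" ", "A"):
            continue                               # keep the first alternate location only
        if chain_id is not None and line[21] != chain_id:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


# Two ATOM records in PDB column layout, built with explicit field widths.
record = "ATOM  {:5d} {:<4} ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}  1.00  0.00"
example = "\n".join([
    record.format(1, " N", 1, 11.104, 6.134, -6.504),
    record.format(2, " CA", 1, 11.639, 6.071, -5.147),
])
print(ca_coordinates(example))  # [(11.639, 6.071, -5.147)]
```

The resulting list of coordinate tuples is the kind of Cα trace on which the zigzag length L(ε) is measured.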

Fractals for hybrid orbitals in protein models
The fractal concept has been applied to a number of protein properties [23]. A protein consists of a polypeptide chain made up of amino acid residues linked together by peptide bonds. An enzyme is, in general, a protein with catalytic activity; like other proteins it is a long chain whose structure and shape are determined by the hybridized states of the atomic orbitals in the molecular chain.
The polypeptide chains of proteins and enzymes resemble the Koch curve, whose shape and conformation are related to the bond angles of the atomic orbitals [24][25][26][27][28][29]. The bond angle may be regarded as the generator of the curve. Assuming AO = OB and ∠AOB = θ, the generator has N = 2 intervals and the similarity ratio $\gamma = 1/\sqrt{2(1-\cos\theta)}$, so that the fractal dimension is given by
$$D = \frac{\ln N}{\ln(1/\gamma)} = \frac{\ln 2}{\ln\sqrt{2(1-\cos\theta)}} \qquad (2)$$
For a given molecular chain, the orthogonality of the hybrid orbitals gives the bond angle $\theta_{ij}$ between the orbitals $\psi_i$ and $\psi_j$ as
$$\cos\theta_{ij} = -\sqrt{\frac{s_i s_j}{(1-s_i)(1-s_j)}}$$
where $s_i$ and $s_j$ denote the fractions of s orbital contained in the hybrid orbitals $\psi_i$ and $\psi_j$, respectively. Equation (2) can be simplified to
$$D = \frac{2\ln 2}{\ln[2(1-\cos\theta)]}$$
For equivalent hybrid orbitals ($s_i = s_j = s$), the bond-angle relation reduces to $\cos\theta = -s/(1-s)$, so that $1-\cos\theta = 1/(1-s)$ and
$$D = \frac{2\ln 2}{\ln[2/(1-s)]}$$
Clearly, D depends on the bonded state of the atomic orbitals.
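The two relations used in this section can be checked directly. The similarity ratio follows from the law of cosines in the generator triangle AOB, and the bond angle from requiring two hybrids on the same atom to be orthogonal, assuming the standard textbook form $\psi_i = \sqrt{s_i}\,\phi_s + \sqrt{1-s_i}\,\phi_{p,i}$ with normalized s and p components, where $\langle \phi_{p,i} \mid \phi_{p,j} \rangle = \cos\theta_{ij}$ for p orbitals pointing along the two bond directions:

```latex
\begin{aligned}
|AB|^{2} &= |AO|^{2} + |OB|^{2} - 2\,|AO|\,|OB|\cos\theta = 2\,|AO|^{2}(1-\cos\theta)
  &&\Rightarrow\quad \gamma = \frac{|AO|}{|AB|} = \frac{1}{\sqrt{2(1-\cos\theta)}},\\
\langle \psi_i \mid \psi_j \rangle &= \sqrt{s_i s_j} + \sqrt{(1-s_i)(1-s_j)}\,\cos\theta_{ij} = 0
  &&\Rightarrow\quad \cos\theta_{ij} = -\sqrt{\frac{s_i s_j}{(1-s_i)(1-s_j)}}.
\end{aligned}
```

As a worked check, $sp^{3}$ hybrids (s = 1/4) give cos θ = −1/3, θ ≈ 109.47° and D = 2 ln 2/ln(8/3) ≈ 1.413, while $sp^{2}$ hybrids (s = 1/3) give θ = 120° and D = 2 ln 2/ln 3 ≈ 1.262.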
The relation between hybridization and structural properties such as the carbon–proton coupling constant measured by $^{13}$C NMR has been reported elsewhere [30][31][32][33]. The method has been applied successfully to iron proteins and other biopolymers [34,35]. A version of the hybrid-orbital fractal model has been implemented in the TOPO program for the theoretical simulation of molecular shape, and another version of the algorithm in the GEPOL program for the calculation of molecular volume and surface [36].

Calculation results and discussion
Isogai et al. analyzed the tertiary structures of 43 proteins selected to cover the five structural classes of protein molecules: proteins with only α-helices (α), almost exclusively β-sheet (β), α-helix and β-sheet tending to be segregated along the chain (α + β), α-helix and β-sheet tending to alternate along the chain (α / β), and neither α-helix nor β-sheet ("miscellaneous") [37]. They applied fractal theory with the intention of devising a new tool for the quantitative description of protein tertiary structure. Their results for the fractal dimension are summarized in Table 1.
The fractal dimension of the Gaussian chain is 1.5. The Gaussian chain is composed of serially aligned identical elements connected so that the orientation of each element is completely random; in other words, neither repulsive nor attractive forces act between the elements. The mean fractal dimension of real proteins (D = 1.34), which reflects the local conformation of the protein molecule, is smaller than that of the Gaussian chain. This indicates that the local conformation of protein molecules is more extended, or swollen, than that of the Gaussian chain, possibly because interatomic steric hindrance keeps atoms away from each other.
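A short scaling argument shows where the value 1.5 comes from, assuming (as is usual for such length-versus-scale plots, though the relation is not written out above) that the fractal diagram follows $L(\varepsilon) \propto \varepsilon^{1-D}$. For a Gaussian chain of M elements of length b (symbols introduced here for illustration), a span of ε elements has a mean end-to-end distance proportional to $b\,\varepsilon^{1/2}$, and the zigzag line contains about M/ε such spans:

```latex
L(\varepsilon) \propto \frac{M}{\varepsilon}\, b\,\varepsilon^{1/2} = M b\,\varepsilon^{-1/2},
\qquad L(\varepsilon) \propto \varepsilon^{1-D}
\;\Longrightarrow\; 1 - D = -\tfrac{1}{2}, \quad D = 1.5 .
```

A value D < 1.5 therefore means that L(ε) falls off more slowly with ε than for the Gaussian chain, i.e. spans of ε residues are on average longer than in a random chain, which is the sense in which the protein backbone is more extended.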
The fractal dimensions D of the various structural classes are next examined separately (Table 1). The mean fractal dimension decreases in the order miscellaneous, α, α + β, α / β, β, and the β class differs significantly from all the other classes except the miscellaneous class. The miscellaneous class shows the largest fractal dimension, but this result should not be overemphasized because the class contains only two proteins. Excluding the miscellaneous class, it is a natural consequence that the α and β classes show the largest and smallest fractal dimensions, respectively, because the local structure of the β class is more extended than that of the α class, while the α + β and α / β classes fall in between. There are several possible explanations for the observation that only the β class differs significantly from the other classes, with no statistically significant difference between any other pair of classes. One is that the number of sample proteins in each class is too small to distinguish differences in local structure between the classes, and that the fractal analysis could distinguish them if more proteins were included.
Another is that fractal analysis has too limited a resolution to distinguish the minute differences between the local structures of the α + β and α / β classes. The fractal dimension D is determined from local structures within a range of 10 amino acid residues, which is too short to distinguish these two classes, whose characteristic local structures extend beyond this range. The latter explanation therefore seems more pertinent.
The fractal dimensions D of the individual proteins range from 1.25 to 1.48, and Isogai et al. showed that D is independent of chain length.
The s-orbital fraction s of the $sp^{n}$ hybrid orbitals is computed from the fractal dimension by inverting $D = 2\ln 2/\ln[2/(1-s)]$, and the hybridization index follows from $s = 1/(1+n)$. The mean value s = 0.29 corresponds to $sp^{2.5}$ hybrid orbitals, roughly halfway between planar $sp^{2}$ and tetrahedral $sp^{3}$ hybrids. The α-helix proteins show the lowest s ratio (0.26), corresponding to $sp^{2.8}$ hybrids, closer to tetrahedral $sp^{3}$. Remarkably, the β-sheet proteins show the greatest s ratio (0.32), corresponding to $sp^{2.2}$ hybrids, closer to planar $sp^{2}$.
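The conversions used in this section can be sketched as follows. This is an illustration under the equivalent-hybrid assumption ($s_i = s_j = s$) of the previous section, with hypothetical function names: it maps s to the interorbital angle and to D, inverts $D = 2\ln 2/\ln[2/(1-s)]$ to recover s, and converts s into the index n of $sp^{n}$ via $s = 1/(1+n)$.

```python
import math


def dimension_from_s(s):
    """Fractal dimension D = 2 ln 2 / ln[2 / (1 - s)] for equivalent hybrids."""
    return 2.0 * math.log(2.0) / math.log(2.0 / (1.0 - s))


def s_from_dimension(d):
    """Inverse relation: s = 1 - 2**(1 - 2 / D)."""
    return 1.0 - 2.0 ** (1.0 - 2.0 / d)


def n_from_s(s):
    """Hybridization index n of an sp^n orbital, from s = 1 / (1 + n)."""
    return (1.0 - s) / s


def bond_angle_deg(s):
    """Angle between equivalent hybrids, from cos(theta) = -s / (1 - s)."""
    return math.degrees(math.acos(-s / (1.0 - s)))


# Reference hybrids: interorbital angle and the corresponding fractal dimension.
for label, s in [("sp2", 1 / 3), ("sp3", 1 / 4)]:
    print(label, round(bond_angle_deg(s), 2), round(dimension_from_s(s), 3))

s_mean = s_from_dimension(1.34)  # mean D of the 43 proteins quoted above
print(round(s_mean, 3), round(n_from_s(s_mean), 2))
```

The loop reproduces the textbook angles of 120° and about 109.47°, with D ≈ 1.262 and 1.413, and D = 1.34 gives s ≈ 0.289 and n ≈ 2.46, consistent with the rounded values 0.29 and $sp^{2.5}$ quoted above.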
The fractal dimensions of all proteins in Table 1 are summarized in Figure 1, where the dotted line corresponds to the Gaussian chain model. The local conformation of the protein molecules is more extended than that of the Gaussian chain. In particular, the proteins in the β structural class are distinguished quantitatively from the other classes, with smaller fractal dimensions and hence greater steric hindrance. The s ratios of the $sp^{n}$ hybrid orbitals of the proteins are summarized in Figure 2; the β-class proteins correspond to greater s ratios. The n indices of the $sp^{n}$ hybrid orbitals are summarized in Figure 3. Note that the order of the hybrid orbitals in Figure 3 is reversed with respect to Figure 2, since n decreases as s increases. The β-class proteins correspond to smaller n values, which predict $sp^{2.2}$ hybrid orbitals, rather similar to planar $sp^{2}$.

Conclusions
From the preceding results the following conclusions can be drawn:
1. Defining a measure of the length of a protein molecule as a function of the coarseness of the scale at which the molecule is observed, a fractal analysis of protein tertiary structure has been performed. The fractal dimension D has been calculated for 43 proteins; it represents the complexity of the local conformation.
2. Comparison of D with the value for the Gaussian chain revealed that steric hindrance between atoms is the major factor determining the local conformation. This conclusion is not new in itself, but no quantitative description of the major factors that govern the formation of local conformation had previously been available. The more steric hindrance dominates the conformation of the protein molecule, the smaller the fractal dimension; conversely, the more attractive forces dominate, the greater the fractal dimension. The fractal dimension can thus be used as an indicator of the average strength of the forces acting between the atoms in the protein molecule. The stability of protein structure, which is closely connected with the forces maintaining the molecular structure, might be a good subject for fractal theory.
3. The fractal dimension D was examined separately for the five structural classes of proteins. Excluding the miscellaneous class, the α and β classes showed the largest and smallest fractal dimensions, respectively, reflecting the closely packed character of the α-helix and the extended character of the β-structure. The β class was shown to differ significantly from the other classes in terms of D, but Isogai et al. could not discriminate between the other pairs of classes with D, which shows both the usefulness and the limitations of the fractal dimension.
4. This article has reported the results of an attempt to apply fractal theory to the analysis of protein tertiary structure. The analysis focused on the relationship between the fractal dimension and the widely accepted structural classes, and it has clarified the usefulness and limitations of the theory. Detecting minute differences between protein conformations is beyond its scope, but the general character of the global conformation can be grasped. Phenomena closely connected with global conformation might be good subjects of study for fractal theory.
5. The structure and shape of the polypeptide chains of proteins are determined by the hybridized states of the atomic orbitals in the molecular chain. The s ratio of the $sp^{n}$ hybrid orbitals is computed from the fractal dimension, and the proteins in the β structural class are distinguished quantitatively from the other classes by this representation.
Further work in progress uses the program TOPO to assign a fractal dimension to each individual atom of the proteins. Of particular interest is the atom-by-atom characterization of the s ratio, because linear relationships exist between s and the carbon–proton coupling constant $^{1}J(^{13}\mathrm{C}\text{–H})$, the C–H bond energy, the C–H bond length, the bond angle and the $pK_a$.