Characterizing Cavities in Model Inclusion Fullerenes : A Comparative Study

The fullerene-82 cavity is selected as a model system in order to test several methods for characterizing inclusion molecules. The methods are based on different technical foundations such as a square and triangular tessellation of the molecular surface, spherical tessellation of the molecular surface, numerical integration of the atomic volumes and surfaces, triangular tessellation of the molecular surface, and cubic lattice approach to the molecular volume. Accurate measures of the molecular volume and surface area have been performed with the pseudorandom Monte Carlo (MCVS) and uniform Monte Carlo (UMCVS) methods. These calculations serve as a reference for the rest of the methods. The SURMO2 method does not recognize the cavity and may not be convenient for intercalation compounds. The programs that detect the cavities never exceed 1% deviation relative to the reference value for molecular volume and 5% for surface area. The GEPOL algorithm, alone or combined with TOPO, shows results in good agreement with those of the UMCVS reference. The uniform random number generator provides the fastest convergence for UMCVS and a correct estimate of the standard deviations. The effect of the internal cavity on the solvent-accessible surfaces has been calculated. Fullerene-82 is compared with fullerene-60 and -70.


Introduction
Analyses of molecular packing in liquids and crystals [1] have shown that cavities, defined as empty spaces, may occur as a result of local packing defects.In proteins, internal cavities large enough to accommodate even xenon atoms have been identified [2][3][4][5].Although the existence of these internal cavities has been largely associated with conformational flexibility [6][7][8], their role remains poorly understood.
The characterization of cavities can be applied in studies of enzyme-substrate compatibility; in the chemistry of solvation, cage compounds, and transition-metal complexes; in catalysis by inclusion in zeolites; in studies of solidgas reactions in which channels in the crystal structure allow selective diffusion of the gas in the solid; and in solidstate reactions which require mobility in their initial stages, when the crystal structure is not yet distorted by product formation.Internal cavities affect protein stability [9], and play an important role in the close packing of amino acid side chains, particularly in the core of the protein [10,11].The knowledge of the channels and cavity structures within crystals of globular proteins may be useful [12][13][14][15][16][17][18][19][20] in detecting the binding sites of specific ligands and ions on accessible surfaces of protein molecules within channels; in studying the structure and physical properties of the water near protein surfaces within the channels; for predicting the narrowest sites of the channels for diffusion of molecules of arbitrary sizes, etc.
In physical models of molecules, spheres whose size is determined by the atomic Van der Waals radii represent atoms.
Molecular models are thus commonly represented as systems of fused hard spheres with unequal radii, and there has been an important body of work on the computation of surface area and volume of such systems [21] and on the topological description of their surfaces [22,23].
For small molecules, the van der Waals surface gives a good description of the molecular shape and the outer surface.In large molecules such as proteins, where most of the van der Waals surface is buried in the interior, other, more suitable definitions such as the solvent-accessible surface [24,25] and solvent-excluding surface [10] have been used.The corresponding surface areas can be computed from the atomic coordinates by approximate procedures [24,25] or by using analytic algorithms such as those of Gibson and Scheraga [21], Connolly [26,27] or Richmond [28].The analytic algorithms enable the calculation of first-order derivatives of the surface area, which is an important requirement for the use of surface area functions in optimization procedures [29].
Methods to compute atomic and molecular volumes of proteins from the atomic coordinates have also been derived, originally by Richards [4] and Finney [30], and later implemented in automatic algorithms.On the basis of calculations of surface, protein cavities have been detected and analyzed by computer.However, all the reported cavity detection algorithms suffer from a number of shortcomings that influence the results.The most common are (1) difficulties in detecting cavities smaller than a certain size, or cavities that are separated by just one layer of spheres from the surface, and (2) the tendency to detect spurious cavities when parts of the outer surface are connected through narrow channels.Alard and Wodak [31] have described a procedure that is able to detect cavities in a system of interpenetrating spheres, no matter what their size or shape.
In a previous article the (I h )fullerene-60 (C 60 , buckminsterfullerene) and (D 5h )fullerene-70 (C 70 ) molecules were selected as model cavities in order to test various methods for characterizing inclusion molecules [32].In the present article (C 2 )fullerene-82 (C 82 ) has been selected as a model cavity for characterizing inclusion molecules.These inclusion molecules are more suitable than proteins for testing precise algorithms, because of their moderate sizes.The following programs and the methods implemented in them have been used for characterizing the molecular cavities of fullerenes: ( 1 In the next section, the different methods used in the preceding programs are briefly described, as well as the new features that we have included in them.Following that, results are presented and discussed.The last section summarizes our conclusions.

Square and triangular tessellation of the molecular surface (program SURMO2)
SURMO2-I represents the global shape of a molecule as an envelope of the van der Waals spheres of the external atoms [33].This allows a numerical treatment of integrals defined on the molecular surface, as where g is the position vector of a point in the surface and represents the elementary solid angle.The evaluation of the Equation (1)-type integrals is obtained by calculating the finite sum: To achieve the summation one can take a mesh of points by cutting the surfaces on a single unitary sphere with a uniform distribution of 2N 1 parallels and N 2 meridians.If the molecule is cut by an axis passing by the origin, two intersecting points, which correspond to two values of the g function, are obtained: g+ and g-.Denoting by M 0 the upper pole, M 1j the points placed on the first parallel, M ij those of the i-th parallel and those placed on the equator, the integral in Equation ( 1) can be measured as (2) where indicates a sum over all of the mesh points placed on the i-th parallel.
The geometric descriptor most directly connected to the shape is the molecular volume: where r is the radial spherical coordinate, and Equation ( 2) has been used.
The calculation of the bare molecular surface S and accessible surface areas have been implemented in SURMO-I.In a first approximation, S can be calculated as The deviation from the spherical shape was corrected in SURMO-II by dividing each g 2 value by the cosine of the angle formed by the semiaxis and the normal vector to the surface at this point [32]:

Spherical tessellation of the molecular surface (program MS)
The solvent-accessible surface area was originally defined and computed by Lee and Richards [24] as the area traced out by the centre of a probe sphere representing a solvent molecule as it is rolled over the surface of the molecule of interest.The solvent-excluding surface has also been defined by Richards [10].It consists of the part of that van der Waals surface of the atoms that is accessible to the probe sphere (contact surface), connected by a network of concave and saddle-shaped surfaces (reentrant surface) that smoothes over the crevices and pits between the atoms.Notice that, unlike the solvent-accessible surface, the solvent-excluding surface is defined as a part of the van der Waals surface.This surface is the boundary of the volume from which a probe sphere is excluded if it is not to experience van der Waals overlap with the atoms.Improving previous algorithms for calculating contact and reentrant surfaces by Greer and Bush [34], and for calculating the solvent-accessible surface area by Shrake and Rupley [25], Connolly [26,27] has developed the MS program.This program places dots over the solvent-accessible surface of a molecule in order to calculate the solvent-excluding surface of a molecule.Connolly pointed out that there are two kinds of reentrant surface: (1) the concave reentrant surface, which is generated when the probe sphere simultaneously touches three atoms, and (2) the saddle-shaped reentrant surface, which is generated as the probe sphere rolls along the crevice between two atoms.Connolly also found that the reentrant surface belonging to one probe may be contained in the interior volume of an overlapping probe, and so must be removed.The analysis of a molecular surface by the MS program is carried out by rolling a probe sphere (e.g., water, radius = 1.4A) over the whole molecular surface.For each contact point with the van der Waals surface, there is a dot.The result is a smooth van der Waals surface, which represents the accessible zones for the solvent molecule, including the internal cavities.However, those interstices that are too small to accept the probe are eliminated.Thus, the solvent-accessible surface has been calculated.These surfaces can be graphically visualized by means of a series of points (dot surface).

Numerical integration of the molecular surface (program SCAP)
The hypothesis of Hopfinger is that one can centre a solvation sphere on each group of the molecule [35,36].The intersecting volume V°, between the solvation sphere and the van der Waals spheres of the remaining atoms, is then calculated.This volume allows the calculation of the effective volume of solvation of the group for a given conformation.The model manages up to four parameters for a given solvent: (1) n: maximum number of solvent molecules allowed for filling the solvation sphere; (2) Dg o : variation of Gibbs free energy associated with the extraction of one solvent molecule out of the solvation sphere [37,38]; (3) R v : radius of the solvation sphere; and (4) V f : free volume available for a solvent molecule in the solvation sphere.In the solvation sphere, part of the volume excludes the solvent molecules.This volume consists of the van der Waals volume of the group at which the sphere is centred and of a volume representing the groups bonded to the central group.The latter volume is represented by a number of cylinders.The difference between the total volume of the solvation sphere and the volume excluded to the solvent molecules represents the volume V' that is actually available for the n solvent molecules.Therefore, V f can be calculated as The variation of Gibbs free energy associated with the extraction of all the solvent molecules out of the solvation sphere of a group, R, is and for the extraction of all the solvent molecules out of the solvent spheres of a molecule, the result is Finally, the solvation free energy of a molecule is minus .However, Pascal [39] found important difficulties in recalculating the volume V'.Hence, the values of V f [40][41][42][43][44][45] have been recalculated by means of a computer program [39,46].Once the values have been estimated, one can use them for calculating other properties.The partition coefficient P of a solute between a pair of immiscible solvents [1octanol (o) and water (w)] is critical for many phenomena [47][48][49][50][51][52][53][54].With the equation one can calculate the decimal logarithm at a given T , which is taken as 298K.

Accurate triangular tessellation of the molecular surface (program GEPOL)
GEPOL computes the molecular surface as a point distribution [55].Each point represents a piece of the surface, called a tessel.Three kinds of envelope surfaces can be computed: (1) the van der Waals molecular surface, taken as the external surface resulting from a set of intersecting spheres centred on the atoms or groups forming the molecule; (2) the solvent-accessible surface, taken as the surface generated by the centre of the solvent, considered as a rigid sphere (probe sphere), when it rolls around the van der Waals surface [24]; and (3) the solvent-excluding surface composed of its two parts: the contact surface and the reentrant surface [10].
The program computes the chosen molecular surface as a point distribution and calculates the corresponding area and volume.The model starts by setting a sphere on each atom.If the solvent-excluding surface is desired, the program will fill the spaces inaccessible to the probe sphere by creating a new set of spheres among the original set.Once the set of spheres that form the surface has been defined the program calculates the molecular area and volume.
The algorithm creates new spheres in the spaces inaccessible to the solvent.Once the set of spheres has been defined, their spherical surfaces are divided into a set of triangular tessels by using a data-coded generic pentakis dodecahedron.A simple geometric procedure is used to eliminate all triangles found at the intersection volume of the spheres.The division can be increased by means of the NDIV tessellation parameter, which specifies the division level for the triangles on the surface.If NDIV is 1, 2, 3, 4, or 5, the number of spherical triangles per sphere will be 60, 240, 960, 3 840, and 15 360, respectively.
The molecular surface area is easily obtained by summing up the areas of all triangles: where S i is the area of the i-th triangle.To calculate the molecular volume, the model first defines a molecular origin.Let be the position vector of the i-th triangle centre and be the corresponding normal vector to the surface at this point.The volume is obtained by summing up all solid volumes made by the triangles vector surfaces and the origin of the molecular system: . It is easy to verify that internal volumes are cancelled out.The coordinates at the centre of each triangle, the elementary surface value and the components of the normal vector to the tessellated surface provide the data concerning the surface in order to facilitate graphical visualization.
The calculation of the molecular globularity and rugosity indices as well as the fractal dimension of the solventaccessible surface has been implemented in GEPOL [32].The algorithm is fast and efficient.

Cubic lattice approach to the molecular surface (program CSAM)
The volume of a molecule can readily be computed by decomposing the space it occupies, using a cubic lattice [56][57][58][59].The problem of computing surfaces can be reduced to the computation of volumes by converting the surface into a sheet of finite uniform thickness dh; by computing its volume the surface area A can be obtained as [60].It can be shown that if a plane cuts a randomly oriented cube, then on average the area of the cut will be: This means that an estimate of the surface area of an object can readily be obtained: only the cubes cut by the surface contribute to A and the increment is .Of course, this method is not exact.The cut between a plane and a cube is a polygon with from three to six sides.If, for example, one corner is located on one side of the plane and the remaining seven corners on the opposite side, then one has a triangular cut, and on average the areas of triangular cuts tend to be somewhat smaller than the average of which is being used.This observation suggests a slightly more elaborate scheme, one in which the polygonal cuts are classified according to their number of sides and using different increments DA k for the five different types of cuts.The area is now where N k represents the number of cubes cut by the surface with the k-th type of cut for which the increment DA k is added to the surface area A. Senn [60] has reported the classification scheme for the preceding five cases together with the computed values for DA k and has written a computer program for the computation of surface areas of molecules (CSAM) with spherical atoms.The limiting parameter a is the length of the edges of the cubes used in the cubical tessellation of the molecular space.
The calculation of the molecular volume V has been implemented in CSAM.This V can be evaluated, in first approximation, by adding the volumes of interior cubes and half of the volumes of surface cubes, i.e., where I is the number of cubes in the inside of the molecular surface and P is the number of cubes partially included in this surface.This approximation becomes asymptotically correct as the unit length a becomes smaller.Of course, this method is not exact.When the cube is a surface one, it is expected to contribute to the molecular volume by about half of the volume of the cube.However, exact mean contribution is slightly less than half of the volume of the cube because of the convex curvature of spheres.For a representative sphere of radius 3.0A, the proper mean contribution in this case is found to be 0.482 (instead of 0.5) of the volume of the cube.This value has been used as the proper contribution from a surface cube: [60].

Cubic lattice approach to the molecular space (program TOPO)
TOPO [61] represents the surface of molecules by the external surface of a set of overlapping spheres with appropriate radii [62], centred on the nuclei of the atoms [63][64][65].The molecule is treated as a solid in space, defined by tracing spheres about the atomic nuclei.It is computationally enclosed in a graduated rectangular box and the geometric descriptors evaluated by counting points within the solid or close to chosen surfaces.The molecular volume V is concurrently approximated as , where P is the number of points within the molecular volume and GRID is the size of the mesh grid.
As a first approximation, the molecular bare surface area could be calculated as , where Q is the number of points close to the bare surface area (within a distance between R X and of any atomic nucleus X).However, the estimate has been improved: if the point falls exactly on the surface of one of the atomic spheres, it accounts indeed for GRID 2 units of area on the molecular bare surface.This is because the total surface of atom X can accommodate points.When a point falls beyond the surface, it represents GRID 2 units of area on the surface of a sphere of radius , not on the surface of atom X.On the surface of X it accounts only for a fraction of this quantity, namely, .The total bare surface area is, therefore, calculated as , where F is the sum of elements AF defined as for those points close enough to the surface of any atom X. is the squared radius of atom X and is the squared distance of point I from the atomic nucleus X.
Two topological indices of molecular shape can be also calculated: G and G'.Consider S e as the surface area of a sphere whose volume is equal to the molecular volume V.The ratio is interpreted as a descriptor of molecular globularity.The ratio is interpreted as a descriptor of molecular rugosity.
Another molecular geometric descriptor was proposed by Lee and Richards [24]: the solvent-accessible surface, or, simply, accessible surface, AS.This AS is defined by means of a probe sphere, which is allowed to roll on the outside while maintaining contact with the molecular bare surface.The AS can be calculated in the same way as the molecular bare surface by mean of pseudoatoms, whose van der Waals radii have been increased by the radius R of the probe.
Fractal surfaces [32,66] provide a means for characterizing the irregularity of molecular surfaces.The area of the AS depends on the value of the probe radius, R. The fractal dimension, D, of the molecules may be obtained according to Lewis and Rees [67] as A version of the TOPO program has been implemented in the AMYR program for the theoretical simulation of molecular associations and chemical reactions [68][69][70][71] and in the GEPOL program [55] for the calculation of the molecular volume and surface.

The Monte Carlo measure of the molecular volume and surface (program MCVS)
The Monte Carlo (MC) measure of the volume of some region in d-dimensional space (i.e., the union or intersection of some spheres) may be performed as follows [72]: 1. Build a routine able to detect whether a point is inside the region.
2. Find a rectangular box including the region.The MC measure of the surface area of the union of n spheres may be performed as follows [73]: 1. Let S be the surface of the i-th sphere and T be the sum of all these surfaces.Generate N points distributed uniformly on the total surface of all the spheres.This is performed with the following substeps: a. Select one of the n spheres, such that each i-th sphere has a probability equal to of being selected.
b. Generate a d-dimensional point distributed uniformly in the i-th sphere and norm it to obtain .
c.If falls inside one of the other spheres, does not pertain to the surface of the union.If falls outside, pertains to the surface of this union.
3. Count the number N u of points lying on the surface of the union.

Let
and .Thus, the MC estimate of the surface area is and the corresponding standard deviation is .

The uniform Monte Carlo method (program UMCVS)
To ensure an unbiased sampling of the whole space of parameters, the random number generator (RNG) of the MCVS numeric algorithm has been substituted in the uniform Monte Carlo (MC) program UMCVS, by a uniform random number generator (URNG) that yields uniformly distributed real numbers over a given finite range [74,75].We have used the LOIUAB URNG [74].This subprogram returns a real random number following the uniform law .The LOIUAB algorithm has three steps: 1. Update S i as .Here MOD has the usual meaning in programming languages.
2. Perform floating-point normalization of S i between 0 and 1 (real division of S i by 2 M ).

Transformation from to .
The values adopted in the database of the program are , and .The initiation step is , which shows a period of .
The LOIUAB method guarantees the portability of both source code and results.The consequence in terms of the representation of big integers and of their conversion into double precision is that they are not machine dependent.

Geometric descriptors
A summary of the basis of various programs and algorithms used in the present work to characterize the molecular cavity of fullerenes is presented in Table 1, along with usual values for the limiting parameters and the value used in this work [32].Notice that some methods were designed for proteins and so the usual limiting parameters have been correspondingly increased whenever possible for the smaller molecules discussed in this work.For SURMO2 the limiting parameters cut the unitary sphere in surface elements.For the MS algorithm the density of surface points, which has been taken as 100 points•A -2 , gives 26 251 total surface points for C 82 .No limiting parameter is given as input for the SCAP model.For the GEPOL program, variable NDIV controls the tessellation level and has been taken as its maximum value of 5; this gives 2 074 points for C 82 .The CSAM and TOPO methods have as limiting parameter the size of the mesh grid, which has been taken as 0.1A.The molecular volume and surface were measured accurately with the MCVS and UMCVS methods, using observations for each measure.This represents 14 730 points•A -3 for C 82 .The van der Waals radius for the carbon atom has been taken as 1.76A for all the methods [32].b No limiting parameter is given as an input for this model.
c Size of the mesh grid (A).
The geometry of C 82 has been optimized with the MM2-87 program [76].The molecule is brought into its principal inertial coordinate system.The length x of the molecule is defined as the maximum length, the height z as its minimum thickness, and its width y is measured at right angles to the axes indicated by the length and height.The origin is the centre of mass for the molecule.The C 82 structure is given in Figure 1.The molecular volumes of C 82 are shown in Figure 2a.The total volume V t is the sum of the molecular (V m ) and cavity (V c ) volumes: .The molecular surfaces of C 82 are shown in Figure 2b.The molecular surface S m , is the sum of the external (S e ) and cavity (S c ) surfaces: . The geometric descriptors and topological indices calculated for C 82 are given in Table 2.The SURMO2 method is unable to recognize the fullerene cavity.Hence, the calculated volume V is a measure of the total volume V t , and is 730.6A 3 .The rest of procedures recognize the cavity, and the value of the molecular volume V m is available and lies in the range 679-686A 3 .These results compare well with the accurate UMCVS reference calculation, which gives a result of (678.9±0.5)A 3 .The external surface area, A 2 , is estimated by SURMO2-I, which has been modified for the calculation of molecular spherical surface areas [32].This result is improved when the correction of the deviation from the spherical shape is taken into account with SURMO2-II ( A 2 ).f This method does not allow the calculation of this property.
g Molecular surface area (A 2 ).
h Water-accessible surface area (A 2 ).
l Fractal dimension of the solvent-accessible surface.
However, the actual (external plus internal) molecular surface area, S m , lies in the range 478-508A 2 (programs from SCAP to UMCVS), which compares well with the UMCVS reference (505.5±0.7A 2 ).Not surprisingly, the globularity index G is rather greater as calculated by SURMO2 (close to unity, as expected for a sphere) compared with the rest of methods, and the rugosity index G' is rather smaller.Notice that the internal cavity effect is difficult to appreciate in the context of the water-accessible surface area AS and of the side-chain accessible-surface area AS' because of the small cavity contribution ( A 2 and A 2 ).Similarly, the cavity effect in the fractal dimension of the solvent-accessible surface is calculated as negligible in C 82 .
The comparison between the GEPOL and TOPO programs has a special interest because the former is an efficient algorithm but does not perform an atom-to-atom partition analysis of the geometric descriptors and topological indices of molecules [61].It is interesting to propose a combined method that calculates the molecular volume with program TOPO and the molecular surface areas with program GEPOL [32].In particular, the resulting G and G' values show the best agreement with the UMCVS reference.
Both Monte Carlo (MC) methods provide statistical estimations of the standard deviation of the C 82 molecular volume .This quantity basically depends on the number of observations and is exact for UMCVS and approximate for MCVS.From the standard deviation s we have estimated the statistical absolute error E as , where t 95%,N is an integral of the Student distribution, which is easy to calculate by looking up the corresponding value of t in Fisher tables for a 95% confidence band and N degrees of freedom (number of observations) [77].This statistical parameter is a function of N. It approaches an asymptotic limit of 1.96 when N approximates infinity.It is not very different from 1.96 for .Thus, with the number of observations given in Table 1 the statistical absolute error has been estimated as A 3 and the relative error as .The same procedure has been repeated for the molecular surface area [ A 2 ] and a statistical absolute error of A 2 has been obtained.The relative error of the molecular surface area (0.14%) is doubled when compared with the molecular volume.This reflects the fact that molecular surface area is a more difficult magnitude to calculate than molecular volume [63,64].It should be remarked that this statistical error is not the only source of error in MCVS.When comparing the MCVS results with the UMCVS references a bias of 2.5A 3 can be seen for the molecular volume [5 times greater than E(V)] and 2.9A 2 for the molecular surface area [4 times greater than E(S)].This feature is a consequence of the nonuniform random number generator (RNG) used in MCVS, which results in an underestimation of standard deviations and errors.This bias corroborates those observed for C 60 and C 70 [32].
The mean relative deviations (percent) of the geometric descriptors and topological indices calculated for C 60 , C 70 and C 82 with different cavity-sensitive methods (from SCAP to TOPO) relative to the UMCVS reference are reported in Table 3.Notice that only three sets of data have been averaged, so that when a calculated relative deviation is rather low [e.g., for TOPO] it could be underestimated.It can be seen from the data that differences in molecular volume V, relative to those obtained with the UMCVS reference, never exceed 1% in spite of the sometimes-relevant differences in method of calculation.The same observation can be made for the molecular surface ].This is encouraging, since it means that all the algorithms, even starting from different points of view, are producing consistent results.It should be noted that TOPO gives the best molecular volume but the greatest errors for the surfacerelated properties.However, GEPOL shows, in general, the best results and the outcome is remarkably good for all the surface-related properties.In comparing CSAM and TOPO, the former, designed for dealing with the surface area, reveals a lower error for this property.However, TOPO, conceived for handling molecular volume, shows the least error of all methods for this property.It is thus appealing to combine GEPOL and TOPO.The combined (S from GEPOL/V from TOPO) method shows relative errors of less than 0.1% for all the descriptors in the three fullerenes and the smallest errors for indices G and G'. e Water-accessible surface area (A 2 ).
i Fractal dimension of the solvent-accessible surface.
From the calculation results in Table 2 and Reference 33, referring to the total (SURMO2) and cavity-sensitive (SCAP to UMCVS) molecular shapes, the geometric descriptors and topological indices for the cavities of the three fullerenes have been calculated.The data are given in Table 4 and show that for C 60 , the molecular volume V, surface area S and water-accessible surface area AS are smaller than for C 70 and for C 82 while the side-chain accessible-surface area AS' remains almost constant (1A) for all the fullerenes.It should be noted that, for the three fullerenes, .This is because a water molecule with an effective radius of A and a volume of about A 3 has little space to move inside the cavity while a probe sphere representing a protein side chain with a radius of A and a volume of about A 3 cannot be contained inside the C 60 , C 70 or C 82 cavity.In particular, for C 82 , the fractal dimension of the cavity solvent-accessible surface is strongly greater than for C 70 and for C 60 , indicating that, for the former, this cavity surface is rather more irregular.f Molecular rugosity (A -1 ).
g Fractal dimension of the solvent-accessible surface.

Solvation descriptors
The solvation descriptors calculated with program SCAP for the three fullerenes are listed in Table 5.The negative Gibbs free energy of solvation is slightly increased on going from C 60 (15.60kJ•mol -1 ) to C 82 (20.86kJ•mol -1 ).
However, the negative Gibbs free energy of solvation in 1-octanol is rather increased from C 60 (128.7kJ•mol - ) to C 82 (172.4kJ•mol - ), so that the 1-octanol/water partition coefficient P o is smaller for C 60 ( ) than for C 70 and for C 82 (26.6).The results can be compared with data obtained with a method developed by Kantola et al. [78].This method allows one to obtain conformationally dependent hydrophobic indices based on atomic contributions.Kantola et al. fitted the following expression for the 1-octanol/water partition coefficient: where S i is the contribution of atom i to the molecular surface area; a, b and g are fitting parameters dependent only on the atomic number of atom i; and Dq represents the atomic net charges [79], which we have computed with the POLAR algorithm [80].The comparison between SCAP and the method of Kantola et al. has special interest.The latter assigns a set of parameters for each atom, depending only on its atomic number and not on the surrounding atoms in the molecule.Instead, SCAP also takes into account the functional group to which each atom belongs in the molecule.A computer program, called CDHI, which uses the method of Kantola et al., has been written [81].The contribution of each atom to the molecular surface area is calculated with the TOPO algorithm [61].The calculated values do not agree exactly with the CDHI values.However, both the trend (C 82 > C 70 > C 60 ) and the basic information that one obtains from the results is the same.In fact, all the three fullerenes will remain in the organic solvent, as one would expect.SCAP has allowed calculating other organic solvent/water partition coefficients [40][41][42][43][44][45] for cyclohehexane (ch) and chloroform (cf).and show the same trend and .In particular, the results, especially for cyclohexane/water, compare better than with calculations carried out with a method proposed by Leo et al. [82].

Conclusions
In this article we have selected C 82 as a model cavity in order to test eight different methods for characterizing inclusion molecules.From these results and previous work on C 60 and C 70 , the following conclusions can be drawn.
1.The use of SURMO2, which do not recognize the cavities, may not be convenient for intercalation compounds, since this method gives an estimate of the global fraction of occupied space within the volume, not allowing distinction among different niches [32,83].
2. The programs that do recognize the cavities (SCAP, MS, GEPOL, CSAM and TOPO) never exceed 1% relative deviation in molecular volume and 5% in surface area, in spite of the sometimes relevant differences in method of calculation.This means that the choice should rely mainly on other possibilities offered by the different methods such as computational performance, possibility of fragment analysis, etc. GEPOL shows, in general, the best results and the outcome is remarkably good for the entire surface areas and surface-related properties.The combined (GEPOL/TOPO) method shows relative errors below 0.1% for all of the descriptors in C 60 , C 70 and C 82 .
3. The molecular volume and surface area were measured accurately with MCVS and UMCVS.The UMCVS measures the molecular volume and surface areas with high precision, so that the standard deviation is divided by 10 each time the number of points (and the CPU time) is multiplied by 100.This may be compared with the CSAM and TOPO methods, using a three-dimensional grid, for which a multiplication of the CPU time by about 1 000 is required to divide the error by 10.
4. The URNG in UMCVS provides the fastest convergence for the algorithm and a better estimate of the standard deviations.The use of a pseudorandom generator in MCVS produces a bias in the calculated properties and underestimates their standard deviations and errors, so that it cannot be recommended for high-precision predictions or as a benchmark maker.
5. In C 60 , C 70 and C 82 the effect of including the internal cavity surface in the calculation of the solvent-accessible surfaces is small but, evidently, if the cavity were much bigger (e.g.fullerene-240, -540, and -960) this effect would be much greater because the solvent molecule would then have room inside; at present, this effect can be corrected with an additional calculation carried out by use of a method that does not recognize the cavity.

3 .
Generate N points uniformly distributed in the box, and count the number N i of points falling inside the region.Let and .Thus, the MC estimate of the volume is , W being the volume of the box, and the corresponding standard deviation is .

Table 1 .
Foundations of different programs used to characterize molecular cavities.
a Density of surface points (points•A -2 ).

Table 2 .
Geometric descriptors and topological indices for fullerene-82 calculated with different methods.a a Statistical errors for some methods are reported in parentheses.b Program SURMO2 modified for the calculation of molecular spherical surface areas.c Program SURMO2 after correcting the deviation from the spherical shape.d The combined method calculates the molecular volume with program TOPO and the molecular surface areas with program GEPOL.e Molecular volume (A 3 ).

Table 3 .
Mean relative deviations (percent) of the geometric descriptors and topological indices for fullerene-60, -70 and -82, calculated with different cavity-free molecule methods relative to the UMCVS reference calculations.Mean of the relative deviations (percent) in absolute value for the first four methods.bThe combined method calculates the molecular volume with program TOPO and the molecular surface areas with program GEPOL.
b Gibbs free energy of solvation in 1-octanol (kJ•mol -1 ).c Gibbs free energy of solvation in cyclohexane (kJ•mol -1 ).d Gibbs free energy of solvation in chloroform (kJ•mol -1 ).e P o is the 1-octanol/water partition coefficient.f CDHI: calculations carried out with a method developed by Kantola et al .g P ch is the cyclohexane/water partition coefficient.h Calculations carried out with a method developed by Leo et al .i P cf is the chloroform/water partition coefficient.