Extending the Characteristic Polynomial for Characterization of C20 Fullerene Congeners

The characteristic polynomial (ChP) has been used in the characterization of chemical compounds since Hückel's method of molecular orbitals. In order to discriminate the atoms of different elements and different bonds, an extension of the classical definition is required. The extending characteristic polynomial (EChP) family of structural descriptors is introduced in this article. Distinguishable atoms and bonds in the context of chemical structures are considered in the creation of the family of descriptors. The extension is useful in problems requiring discrimination among same-patterned graph representations of molecules, as well as in problems involving relations between the structure and the properties of chemical compounds. The ability of the EChP to explain two properties, namely, area and volume, is analyzed on a sample of C20 fullerene congeners. The results show that the EChP-selected descriptors explain the properties well.


Introduction
The term 'secular function' has been used for what is now called a characteristic polynomial (ChP; in some of the literature, the term secular function is still used). The ChP was used to calculate secular perturbations (on a time scale of a century, i.e., slow compared with annual motion) of planetary orbits [1]. The first use of the ChP (|λ•Id−Ad|, where Id is the identity matrix and Ad is the adjacency matrix) in relation to chemical structure appeared after the discovery of wave-based treatment at the microscopic level [2]. Hückel's method of molecular orbitals is actually the first extension of the ChP definition. It uses the 'secular determinant', the determinant of a matrix decomposed as |E•Id−Ad| and written with the energy of the system (E instead of λ), for the approximate treatment of π electron systems in organic molecules [2].
The second extension of the ChP was found by Hartree [3,4] and Fock [5,6] by going in a different direction with the approximation of the wavefunction treatment. They actually found the same older eigenvector-eigenvalue problem (§20 in [7]; T1 in [8]) in Slater's treatment [9,10] of molecular orbitals. More generally (and older), the eigen-problem (finding of eigenvalues and eigenvectors) is involved in any Hessian [11]. The Laplacian polynomial is a polynomial connected with the ChP (Table 1). It uses a modified form (the Laplacian matrix, [La]) of the adjacency matrix ([Ad]), [La] = [Dg] − [Ad], where [Dg] simply counts on the main diagonal the number of the atom's bonds (the rest of its elements are null; for consistency with the related graph-theory concept, it is denoted [Dg], from vertex degree). The ChP is also related to the matching polynomial [12], degenerating to the same expression for forests (disjoint unions of trees). Adapting [13] for molecules, a k-matching in a molecule is a matching with exactly k bonds between different atoms; see §3.1 & §3.3 in [14] for details. Each set containing a single edge is also an independent edge set; the empty set should be treated as an independent edge set with zero edges, and this set is unique. Due to the constraint of connecting different atoms, a matching may involve no more than ⌊n/2⌋ bonds, where n is the number of atoms. It is possible to count the k-matchings [15]; nevertheless, this is a hard problem [16], and so is expressing the derived Z-counting polynomial [17] and the matching polynomial. Both are defined using m(k), the k-matching number of the selected molecule, as shown in Table 1 (where n is the number of atoms).
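The Laplacian matrix and the k-matching numbers described above can be illustrated with a short sketch. It is a brute-force illustration only, using the standard definitions of the Z-counting polynomial (sum of m(k)·x^k) and of the matching polynomial (sum of (−1)^k·m(k)·x^(n−2k)); since the formulas of Table 1 are not reproduced in this text, these conventions are an assumption, and the function names are hypothetical.

```python
from itertools import combinations

def laplacian(adj):
    """[La] = [Dg] - [Ad]: vertex degrees on the diagonal minus the adjacency."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matching_numbers(n, edges):
    """m(k) for k = 0..n//2: number of sets of k bonds sharing no atom.

    Brute force over all edge subsets, so only usable for small molecules.
    """
    counts = []
    for k in range(n // 2 + 1):
        m_k = 0
        for subset in combinations(edges, k):
            atoms = [atom for edge in subset for atom in edge]
            if len(atoms) == len(set(atoms)):  # no atom used twice
                m_k += 1
        counts.append(m_k)
    return counts

def matching_polynomial(n, m):
    """Coefficients (highest power first) of sum_k (-1)^k m(k) x^(n-2k)."""
    coeffs = [0] * (n + 1)
    for k, m_k in enumerate(m):
        coeffs[2 * k] = (-1) ** k * m_k
    return coeffs

# Example: a chain of four atoms (a tree), numbered 0..3.
chain_edges = [(0, 1), (1, 2), (2, 3)]
chain_adj = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
m = matching_numbers(4, chain_edges)   # Z-counting coefficients, lowest power first
mp = matching_polynomial(4, m)         # x^4 - 3x^2 + 1 for this chain
```

For this chain the matching polynomial is x^4 − 3x^2 + 1, which is also its ChP, in line with the statement that the two coincide for forests.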

A topological description of a molecule requires storing the bonds (as adjacencies) between the atoms and the atoms themselves (as identities). If this problem is simplified to the maximum, by disregarding the atom and bond types, then the molecule is seen as an undirected and unweighted graph. The graph structure can be translated into the informational space by numbering the atoms. Unfortunately, this procedure also induces an isomorphism, the isomorphism of numbering, which may require nondeterministic polynomial time to be resolved (see [18]). This is one reason why graph invariants, i.e., quantities that do not depend on the numbering made on the graph, are desirable.
Once the atoms (or the vertices) are numbered, the information can be simply stored as lists of vertices (V) and edges (E), and the graph structure of the molecule is associated with the pair G = (V, E). An equivalent representation is obtained using matrices. The adjacencies ([Ad]) are simply stored with 0 when no bond connects the atoms and 1 when a bond connecting the atoms exists. The identity matrix ([Id]) identifies the atoms by placing 1 on the main diagonal and 0 otherwise.
The ChP is the natural construction of a polynomial (in λ) in which the eigenvalues of [Ad] are the roots of the ChP, as follows: λ is an eigenvalue of [Ad] → there exists an eigenvector [v] ≠ 0 such that [Ad]•[v] = λ•[v] → (λ•[Id] − [Ad])•[v] = 0 → |λ•[Id] − [Ad]| = 0. As a consequence, the ChP is a polynomial (in λ) of degree n, where n is the number of atoms. The ChP is used in the topological theory of aromaticity [19,20], structure-resonance theory [21], quantum chemistry [22], and counts of random walks [23], as well as in eigenvector-eigenvalue problems [24].
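As an illustration of this definition, the coefficients of |λ•[Id] − [Ad]| can be obtained without symbolic algebra. The sketch below uses the Faddeev-LeVerrier recurrence, which is one of several possible algorithms and is not claimed to be the one used by the authors; the function names and the three-atom example are hypothetical.

```python
from fractions import Fraction

def char_poly(matrix):
    """Coefficients of |lambda*Id - M|, highest power of lambda first.

    Faddeev-LeVerrier recurrence: M_1 = Id, c_k = -trace(A*M_k)/k,
    M_(k+1) = A*M_k + c_k*Id. Exact arithmetic via Fraction.
    """
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        am = [[sum(a[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(am[i][i] for i in range(n)) / k
        coeffs.append(c)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of the polynomial at x."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Example: three atoms in a chain; the ChP is lambda^3 - 2*lambda.
chain3 = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
chp = char_poly(chain3)
```

The roots of λ^3 − 2λ are 0 and ±√2, which are the eigenvalues of this adjacency matrix; evaluating the polynomial at these values gives zero up to rounding.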
This definition allows extensions. A natural extension is to store in the identity matrix ([Id]) non-unity instead of unity values ([Id]i,i = 1 → [Id]i,i ≠ 1), accounting for the atom types, as well as to store in the adjacency matrix ([Ad]) non-unity instead of unity values ([Ad]i,j = 1 → [Ad]i,j ≠ 1), accounting for the bond types. This extension was studied in the context of deriving structural descriptors useful for structure-property relationships.

Graphs, Matrices, and the Characteristic Polynomial
The topology of a graph structure can be expressed as matrices; three kinds are most frequently used: identity, adjacency (vertex-vertex, edge-edge, and vertex-edge), and distance matrices (Table 2).
The matrices reflect the graph in a 1:1 fashion if the full graph is stored (each vertex pair stored twice, in both directions). The matrices of vertex adjacency ([Ad]) and of edge adjacency are square, and the double enumeration of the edges is reflected in symmetry relative to the main diagonal (see Figure 1).
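A minimal sketch of how the vertex adjacency and the (topological) distance matrices can be built from a list of edges is given here. It assumes a 0-based numbering of the atoms and counts distance as the number of bonds on the shortest path; the function names are hypothetical.

```python
from collections import deque

def adjacency_from_edges(n, edges):
    """Vertex-vertex adjacency: each bond is stored twice, in both directions."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = 1
        adj[j][i] = 1
    return adj

def distance_matrix(adj):
    """Topological distances by breadth-first search from every vertex.

    Unreachable pairs (disconnected graphs) are left as None.
    """
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist

# Example: three atoms in a chain, bonds 0-1 and 1-2.
example_adj = adjacency_from_edges(3, [(0, 1), (1, 2)])
example_dist = distance_matrix(example_adj)
```

Both matrices are symmetric relative to the main diagonal, as expected from the double enumeration of the edges.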


Figure 1 panels: Graph, Identity, Adjacency, Distance.
The ChP is the natural construction of a polynomial in which the eigenvalues of [Ad] are the roots of the ChP. The ChP is a polynomial in λ of degree n, where n is the number of atoms. A natural extension is to store in [Id] (instead of unity) non-unity values accounting for the atom types, as well as to store in [Ad] (instead of unity) non-unity values accounting for the bond types.
An extremely important problem in chemistry is to uniquely identify a chemical compound. Although visual identification (looking at the structure) seems simple, it is no longer viable for compounds of large size. The data related to the structure of the compounds stored in the informational space may provide the answer to this problem. Nevertheless, storing the structure of the compound raises another issue, namely, the arbitrary numbering of the atoms (Figure 2). For a chemical structure with N atoms stored as a (classical molecular) graph, there exist exactly N! possibilities for numbering the atoms. Unfortunately, storing the graphs as lists of edges and (optionally) vertices does not provide a direct tool to detect representations that differ only by the numbering. The same situation applies to the adjacency matrices. Therefore, seeking graph invariants is perfectly justified: an invariant (graph invariant) does not depend on numbering. The adjacency matrix is not a graph invariant and, therefore, it is necessary to go further than the adjacencies.
Graph polynomials are an important class of graph invariants. The ChP belongs to this category; it is a graph invariant encoding important properties of the graph. On the other hand, unfortunately, the ChP does not represent a bijective image of the graph, as there exist different graphs with the same ChP (i.e., cospectral graphs; the smallest cospectral graphs occur for 5 vertices [25]). In order to count the cospectral graphs, one should compare the sequences A000088 and A082104 [26,27]. Ideally, the invariant should be uniquely assigned to each structure, but this kind of invariant is difficult to find. A procedure proposed by IUPAC to generate a non-degenerate invariant is the international chemical identifier (InChI), which converts the chemical structure to a table of connectivity expressed as a unique and predictable series of characters [28].
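Both statements, invariance under renumbering and the existence of cospectral graphs, can be checked on small examples. The sketch below is an illustration under stated assumptions: it recomputes the ChP coefficients with the Faddeev-LeVerrier recurrence in integer arithmetic (valid for 0/1 adjacency matrices) and uses a well-known cospectral pair on 5 vertices, the star with four leaves and the 4-cycle plus an isolated vertex; the function names are hypothetical.

```python
def char_poly_int(adj):
    """Integer coefficients of |lambda*Id - Ad|, highest power first."""
    n = len(adj)
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    coeffs = [1]
    for k in range(1, n + 1):
        am = [[sum(adj[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        # The division is exact because the coefficients are integers.
        c = -sum(am[i][i] for i in range(n)) // k
        coeffs.append(c)
        m = [[am[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs

def renumber(adj, perm):
    """Adjacency matrix after renumbering: new vertex i is old vertex perm[i]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def degrees(adj):
    return sorted(sum(row) for row in adj)

# Star: vertex 0 bonded to vertices 1..4.
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
# 4-cycle 0-1-2-3-0 plus the isolated vertex 4.
cycle_plus_point = [[0, 1, 0, 1, 0],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [1, 0, 1, 0, 0],
                    [0, 0, 0, 0, 0]]

renumbered_star = renumber(star, [4, 0, 1, 2, 3])
```

The renumbered star has a different adjacency matrix but the same ChP, while the star and the cycle-plus-point graph share the ChP λ^5 − 4λ^3 even though their vertex degrees differ, so the ChP alone cannot tell them apart.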
Despite this inconvenience (not representing a bijective image of the graph), due to its link with the partition of the energy [2], the ChP seems to be one of the best alternatives for quantifying the information from the chemical structure.
Previously, researchers have shown the performance of estimation and/or prediction of the ChP on nonane isomers [29][30][31] as well as in the case of carbon nanostructures [32,33]. Furthermore, an online environment has been developed to assist researchers in the calculation of polynomials based on different approaches; this includes the ChP [34].

Characteristic Polynomial Extension
When doing calculations on molecular graphs, it is important to consider that, as the graph representation is simplified further (such as by neglecting the type of the atom, bond orders, or geometry in favor of topology), the degeneration of the whole pool of possible calculations increases and there are more molecules with the same representation. This is favorable for problems seeking similarities but is clearly unfavorable for problems seeking dissimilarities.
A necessary step to accomplish better coverage of the similarity vs. dissimilarity dualism is to build and use a family of molecular descriptors, large enough to be able to provide answers for all. Naturally, such a family should possess a 'genetic code', namely, a series of variables from which to (re)produce, one by one, each molecular descriptor, all descriptors being therefore obtained in the same way. It is expected that all individuals of the family are independent of the numbering of the atoms in the molecule (i.e., they should be molecular invariants).
The construction of such a family needs to consider the following:
• Molecules carry both topological and geometrical features (see Figure 3);
• Atom and bond types are essential factors in the expression of the measurable properties;
• Atom and/or bond numbering induces an undesired isomorphism;
• Geometry and bond types induce other kinds of isomorphism.
The representation of a molecule could be done using identity and adjacency (Figure 4). The distinct identities from Figure 4 are given using a, b, and c as variables in the case of adjacency and using d, e, and f as variables in the case of identity. This formalism allows the introduction of a natural extension of the ChP from graphs to molecules. There is no determinism in selecting the values of a-f.
The full extension could also include the distance matrix (Figure 5).
Figure 6 shows the ChP extension, differently accounting for the identities from atomic properties ([I] ← AP ∈ {A, B, C, D, E, F, G, H, I, J, K, L}) and connectivity properties ([C] ← CP ∈ {t, g, c, b, T, G, C, B}).
The extending characteristic polynomial (EChP) is designed for the estimation/prediction of molecular properties, so a software implementation was made. EChP(λ, IP, CP) diverges quickly (to ∞), as ChP(λ) does, with the increase of λ > 1. Thus, the [−1, 1] range, sampled on a grid of 2001 points, is useful for evaluation. A linearization (LL) is required and was implemented, since biological properties are expressed on a log scale. The evaluation is performed at every point (out of 2001), requiring O(n³) operations (where n is the number of atoms).
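The grid evaluation described above can be sketched as follows. This is not the authors' FreePascal program; it is a minimal illustration that assumes [I] is diagonal, evaluates |λ•[I] − [C]| by Gaussian elimination (the O(n³) step), and samples 2001 equally spaced points on [−1, 1]. The atom and bond weights in the example are hypothetical values chosen only for illustration.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting, O(n^3)."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def echp_value(lam, identity_diag, connectivity):
    """Value of |lam*[I] - [C]| for a diagonal [I] (assumption of this sketch)."""
    n = len(connectivity)
    matrix = [[(lam * identity_diag[i] if i == j else 0.0) - connectivity[i][j]
               for j in range(n)] for i in range(n)]
    return determinant(matrix)

def echp_on_grid(identity_diag, connectivity, points=2001):
    """(lambda, value) pairs on an equally spaced grid over [-1, 1]."""
    grid = []
    for i in range(points):
        lam = -1.0 + 2.0 * i / (points - 1)
        grid.append((lam, echp_value(lam, identity_diag, connectivity)))
    return grid

# Classical case: unit identities and unit bonds for a three-atom chain.
classical = echp_on_grid([1, 1, 1], [[0, 1, 0], [1, 0, 1], [0, 1, 0]])

# Hypothetical weights: middle atom weighted 1.2, both bonds weighted 1.5.
weighted_at_one = echp_value(1.0, [1.0, 1.2, 1.0],
                             [[0, 1.5, 0], [1.5, 0, 1.5], [0, 1.5, 0]])
```

With unit weights the values coincide with the classical ChP of the chain, λ³ − 2λ; replacing the weights changes the polynomial, which is what allows atoms and bonds of different types to be distinguished.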
EChP is a family with 96 (nI × nC) polynomial formulas and 288 (× nL) linearized ones, leading to a total of 576,288 individuals (288 × 2001 evaluation points). The FreePascal software was used for the implementation, since it is very fast and allows a parallelized version to be used with multi-CPUs (chp17chp.pas) [35]. The program requires input files in the 'chp' format (such as chfp_17_q.asc, see Figure 7), and uses a filtering (PHP) program (→ chfp_17_t.asc) as well as a molecular property file (such as chfp_17 [prop].txt). The filtering program was designed to look for degenerations and to reduce the pool of descriptors by eliminating the degenerated ones.
The family of EChP descriptors was then used with a series of chemical compounds to obtain associations between the structure and properties as regression equations.

Numerical Case Study
The case study was conducted on C20 fullerene congeners with boron, carbon, or nitrogen atoms on each layer (Figure 8). A sample of 45 distinct compounds was obtained. The files were named dd_R1R2R3R4, where dd is the number of the compound in the set and R1-R4 are the atoms on layers 1-4 (e.g., 02_bbbn.chp is the second compound in the sample and has boron on the first three layers and nitrogen on the last layer).
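The text does not state how the 45 distinct compounds were enumerated. One reading that reproduces this count, offered here only as an assumption, is that an assignment of atoms to layers 1-4 and the same assignment read from layer 4 to layer 1 describe the same compound. The sketch below enumerates the layer patterns under that assumption; it does not attempt to reproduce the numbering (dd) used in the file names, and the function name is hypothetical.

```python
from itertools import product

def layer_patterns(symbols="bcn", layers=4):
    """Distinct layer assignments, treating a pattern and its reverse as equal.

    Assumption of this sketch: reversing the order of the layers gives the
    same compound, so only one representative of each pair is kept.
    """
    seen = set()
    for combo in product(symbols, repeat=layers):
        word = "".join(combo)
        seen.add(min(word, word[::-1]))  # canonical representative
    return sorted(seen)

patterns = layer_patterns()
```

Of the 3^4 = 81 raw assignments, 9 read the same in both directions, so (81 + 9) / 2 = 45 patterns remain, which matches the reported sample size.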
Mathematics 2017, 5, 84

The geometries were built at the Hartree-Fock (HF) [3][4][5][6] 6-31G [36] level of theory, and the calculated properties (namely, area and volume) were extracted from these calculations. Two different structures proved stable for bbbb (00_bbbb and 01_bbbb, see Figure 9), and both were included in the analysis, resulting in a sample of 46 compounds. The values of the calculated properties are given in Table 3.
Normal distribution of the data is one assumption that needs to be assessed before any linear regression analysis. Six different tests were used (AD = Anderson-Darling, KS = Kolmogorov-Smirnov, CM = Cramér-von Mises, KV = Kuiper V, WU = Watson U², H1 = Shannon's entropy [37]) [38], and the decision was made based on the combined test proposed by Fisher [39]. The distribution of the investigated properties proved to be not significantly different from the expected normal distribution (see Table 4, all p-values > 0.05).
The absence of outliers was also investigated using Grubbs' test [40] for the association between volume (vol) and area on the sample of investigated C20 congeners. The analysis identified three compounds as outliers; their exclusion led to a well-performing linear association (Figure 10).
The values of the EChP descriptors were generated for all molecules in the dataset and were used as input data in the search for linear regression models able to explain the investigated properties (area and volume). Three different approaches were used, searching for additive, multiplicative, or full linear dependence (see Table 5). Graphical representations of calculated and estimated area and, respectively, volume by the investigated effects are given in Figure 11 (eq1-eq3) and Figure 12 (eq4-eq6). The model comparison strongly suggests that the best performing models are the additive or the full model for both investigated properties. However, since 03_bbcn is an outlier for the area in the additive model, the full model is the one expected to give a correct estimation.
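The combined test proposed by Fisher, used above to merge the six normality tests into one decision, can be sketched in a few lines. The statistic is −2 times the sum of the logarithms of the individual p-values, referred to a chi-square distribution with 2k degrees of freedom (k being the number of tests); for an even number of degrees of freedom the upper-tail probability has a closed form, so no statistics library is needed. The p-values in the example are hypothetical and are not taken from Table 4.

```python
import math

def fisher_combined(p_values):
    """Fisher's combined probability test.

    Returns (statistic, combined p-value), where the statistic
    -2*sum(ln p_i) is compared with chi-square on 2k degrees of freedom.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Upper tail of chi-square with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total

# Hypothetical p-values for six tests (AD, KS, CM, KV, WU, H1).
illustrative_p = [0.41, 0.63, 0.38, 0.57, 0.45, 0.22]
statistic, combined_p = fisher_combined(illustrative_p)
normality_not_rejected = combined_p > 0.05
```

With a single test the combined p-value equals the input p-value, which is a quick sanity check of the closed-form tail.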
It is important that the performing models identified using the EChP descriptors (the full models) select the same polynomial for both descriptors when both area and volume are investigated ("CG" in LCG+0.236, LCG+0.276, and LCG−0.908). It should be noted that one descriptor (LCG−0.908) is common to the estimation of the area and of the volume for the C20 fullerene congeners. This fact, in conjunction with the high correlation between volume and area (r²adj ≈ 0.97), the presence of outliers in one additive model, and the significantly higher performance of the full models in estimation, supported by goodness-of-fit and the graphical representation of calculated versus estimated values, suggests that the best models are those with full effects.

Conclusions and Further Work
EChP proved useful for the estimation of the investigated molecular properties. Both properties of the C20 congeners, volume and area, are explained by a common descriptor (LCG−0.908 (or vice versa)).
EChP is a natural extension of the ChP. The scales of the atomic properties were more or less arbitrarily selected and will be further investigated to find the optimal solution. Furthermore, the reversed distance seemed to be the best alternative, but further analysis must be conducted to demonstrate this observation.

Figure 4. Molecular geometry translated into adjacency and identity: an example.

Figure 5. Molecular geometry translated into adjacency, identity, and distance: an example.
The extended ChP has the following formula: ChP ← |λ × [I] − [C]|, where [C] is either [A] or [D], with the identities (a, b, and c from [I]) and the connectivity (d, e, f, g, h, i, j, k, and l from [C]). The single-value entries (0 and 1 ≠ 0 for the classical definition of the ChP) can be upgraded to multi-value (any value), accounting for different atoms and bonds. Obviously, the classical ChP is found when a = b = c = d = e = f = 1 and g = h = i = j = k = l = 0.
Input file annotations (figure labels): bond parameters; molecule data (first two lines) and atom parameters; data from post-HF (MP2) geometry optimization.

Figure 8. C20 fullerene congeners (R is the symbol of the atom on the layer).

Figure 10. Volume as a linear function of area.

• Atom and bond types are essential factors in the expression of the measurable properties;
• Atom and/or bond numbering induces an undesired isomorphism;
• Geometry and bond types induce other kinds of isomorphism.

Table 3. C20 congeners: values of investigated properties.

Table 8. Fisher's Z model comparisons: results.
Graphical representations of calculated and estimated area and, respectively, volume by the investigated effects are given in Figure 11 (eq1-eq3) and Figure 12 (eq4-eq6).