Energy Decomposition Scheme for Rectangular Graphene Flakes

We show—to our own surprise—that total electronic energies for a family of m × n rectangular graphene flakes can be very accurately represented by a simple function of the structural parameters m and n with errors not exceeding 1 kcal/mol. The energies of these flakes, usually referred to as multiple zigzag chains Z(m,n), are computed for m, n < 21 at their optimized geometries using the DFTB3 methodology. We have discovered that the structural parameters m and n (and their simple algebraic functions) provide a much better basis for the energy decomposition scheme than the various topological invariants usually used in this context. Most terms appearing in our energy decomposition scheme seem to have simple chemical interpretations. Our observation goes against the well-established knowledge stating that many-body energies are complicated functions of molecular parameters. Our observations might have far-reaching consequences for building accurate machine learning models.


Introduction
Quantum mechanical studies of large graphene flakes are obstructed to a large degree by the prohibitive cost of their calculations [1,2].In principle, the total electronic energy of a rectangular graphene flake at its equilibrium geometry is a unique function of two parameters, m and n, corresponding, respectively, to the width and the height of the flake.Taking into account the complicated nature of this quantity and quite involved process associated with its determination-the diagonalization of the molecular Hamiltonian built from various integrals computed on an atomic basis followed by subsequent geometry optimization using gradients associated with such a non-additive many-body-like Hamiltonian-it has always been assumed that the energy function must be fairly complicated and cannot be determined in a close form as a simple function of the parameters of the model.On the other hand, in the limit m, n → ∞, a rectangular graphene flake converges toward an infinite graphene sheet, the physics of which is relatively simple, being sufficiently well described with a unit cell containing just two carbon atoms.Recent progress in the exploration of the chemical space using machine learning (ML) approaches [3][4][5][6][7][8][9][10] suggests that finding an energy function E = E(m, n) can be, in fact, possible; after all, the intrinsic success of machine learning algorithms is based on the implicit existence of such a function encoded in the structure of the ML model.The explicit extraction of the quantum mechanical energy of such a system from the ML model and expressing it in a closed form as a function of the parameters m and n could be advantageous for understanding the system better and possibly for finding new important descriptors pertinent to such a formulation.
In the current study, we attempt to construct such an energy function, E(m, n), restricting our attention to a single family of rectangular, hydrogen-terminated graphene flakes: multiple zigzag chains Z(m, n).(See Chapter 9 of [11,12]; for a graphical definition of these structures, see Figure 1).It is straightforward to establish that for a hydrogen-terminated multiple zigzag chain Z(m, n) of width m and height n, the number of carbon atoms is equal to 2mn + 2m + 2n, the number of hydrogen atoms is equal to 2m + 2n + 2, and the total number N of electrons is equal to 12mn + 14m + 14n.Taking into account that most quantum chemical methods require a diagonalization step [13], which scales according to N 3 with the number of electrons (see, for example, p. 191 of [1]), we can estimate that for large m and n, the cost of quantum chemical calculations for Z(m, n) is proportional to m 3 n 3 .It is therefore natural to ask whether the quantum chemical energy of a graphene flake and the corresponding equilibrium molecular geometry can be obtained by some simpler means.In the current paper, we demonstrate that indeed, for multiple zigzag chains Z(m, n), such a shortcut can be designed, and the energy function E(m, n) of fully optimized multiple zigzag chains Z(m, n) can be determined without quantum chemical calculations to a surprisingly high level of accuracy.We believe that a similar approach could also be designed for other carbon nanostructures, including graphene flakes [14,15] of other shapes, carbon nanotubes [16], and fullerenes [17,18].

Previous Studies
The past two decades witnessed several computational attempts to correlate wide ranges of electronic, physical, and chemical properties of molecular systems with their sizes.Schwerdtfeger and co-workers, for instance, discovered that the cohesive energies of gold [19], tin [20], and cesium clusters [21] depend on the inverse cube of the cluster size.The extrapolation of these theoretical models yielded cohesive energies with accuracy ranging from 0.05 to 0.67 eV, depending on the metal species.The total electronic and atomization energies of polyacenes (consisting of 1-8 benzene rings) were shown to change linearly with the number of benzene units and with the number of electrons [22].The harmonic frequencies (and consequently the positions of the associated Raman lines) for acoustic-like vibrational modes in octahedral and tetrahedral nanodiamonds were shown to have inverse linear scaling with respect to the number of carbon atoms [23], allowing the determination of the size of the nanodiamonds directly from the Raman spectra.Another typical example concerns the correlation between the maximum emission wavelength of quantum dots and their sizes; it is interesting that the color of a quantum dot is determined primarily by its diameter [24][25][26], while its chemical identity plays only a lesser role.Yet another-somewhat extreme-example is constituted by the recent discovery of a simple Rydberg-like formula based on the classical theory of atoms, which is capable of predicting qualitatively the distribution and energetics of the first 10-15 atomic excited states for a wide collection of atoms [27].These diversified examples suggest that, in many cases, simple models can provide an accuracy comparable with that of an exact quantum treatment.It is important to understand and develop such models and to analyze their strengths and limitations as in many cases their somewhat surprising and seemingly accidental numerical robustness can be explained by elementary scientific principles, some of which are yet to be discovered [28][29][30][31][32].
To the best of our knowledge, the only work on graphene flakes relevant to our investigations was performed by Gutman and his collaborators, who found that the topological resonance energies, total π-electron energies, and Dewar resonance energies of 118 catacondensed isomers of heptacene with the formula C 30 H 18 can be represented as simple functions of either the corresponding total number of Clar covers C or the Kekulé number K [33].In each case, a simple, few-parameter formula was found, which was capable of predicting resonance energies of all the isomers with errors no larger than 0.5%.This work was further extended to the topological resonance energies of 132 isomers of C 22 H 14 , C 24 H 14 , C 26 H 14 , C 26 H 16 , and C 28 H 16 , for which the maximal error was found to be smaller than 1.2% [34].It is important to stress here that the fitting parameters in the energy formulas were distinct for different numbers of carbon and hydrogen atoms.
The original reason to start working on the current project was to correlate various graph-theoretical and topological descriptors of graphene flakes Z(m, n), including the Clar number Cl, the Clar count C, the Kekulé count K, and the Zhang-Zhang polynomial [35][36][37][38][39][40][41][42][43], with the total electronic energies E(m, n) of these structures.Over the last decade, we have been involved in constructing a coherent mathematical theory of such invariants [44][45][46][47], so it was important for us to establish whether these topological invariants could be useful for the practical determination of the physicochemical properties of graphene flakes.In Appendix A, we show that no positive correlation exists between these quantities, suggesting that the Kekulé count K, the Clar count C, and the Clar number Cl do not constitute useful descriptors for reproducing the total electronic energies of rectangular graphene flakes Z(m, n) of various sizes.However, in the process of verifying this hypothesis, we have accidentally discovered that the total energies E(m, n) of the graphene flakes Z(m, n) can be expressed as a surprisingly simple function of the structural parameters m and n, with most of the terms appearing in this formulation having simple chemical interpretations.In the first steps of the verification process, these simple functions of the structural parameters m and n were added to the topological fitness function in order to improve the correlation.We immediately noticed that they constitute a much better fitness function than the topological descriptors; moreover, it soon turned out that removing the topological descriptors from the fitness function does not really lower the quality of the fit.A detailed explanation of the process of constructing an optimal set of structural descriptors is discussed in the next two Sections.

Computational Methodology and Data Analysis
The molecular structures of the here-analyzed multiple zigzag chains Z(m, n) with m, n = 1, . . ., 20 were optimized using the third-order density-functional tight-binding (DFTB3) method [48] in conjunction with the 3OB parameter sets [49] using the DFTB+ 1.2 software [50].The DFTB charge convergence criterion was selected as 10 −12 a.u.Most of the studied systems displayed small or negligible HOMO-LUMO gaps.Consequently, their electronic temperature was set to either 0 K (for 153 structures corresponding to smaller values of m and n) or to 0.0001 K (for 247 quasi-metallic structures corresponding to higher values of m and n).Each optimization was performed by alternating between the steepest descent and conjugate gradient procedures in order to accelerate the calculations and to ensure convergence to a global minimum.The force convergence criterion (10 −7 a.u.) corresponded to a very tight optimization.The resulting equilibrium geometries of the optimized flakes were planar.
It is possible that some graphene flakes may be subject to Jahn-Teller distortions, resulting in their non-planar geometry.This effect is probably strongest for flakes without hydrogen termination, where the deficiencies in chemical saturation of carbon atoms give rise to quite complicated electronic structures and, consequently, to open-shell characters and considerable degeneracies in the one-particle energy spectrum.Non-planarity deformations, allowing us to lift this degeneracy and to increase the percentage of doubly occupied levels, might be an important factor in lowering the total energy during the energy optimization process.We suspect that for the class of the hydrogen-teminated graphene flakes studied in our paper, this effect might be somewhat less pronounced owing to the well-defined, chemically saturated electronic nature of each of the flakes.An obvious way to verify whether a given flake is planar or non-planar is an inspection of the harmonic vibrational modes.Performing such a task for all the here-studied flakes is formidable due to its computational complexity.However, to answer one of the referee's comments, we have performed DFTB+ harmonic frequency calculations for two medium-size flakes, Z(10, 10) and Z (15,15).All the harmonic frequencies of Z(10, 10) and Z(15, 15) turned out to be positive, showing that both these flakes are in fact planar, despite the fact that their electronic structure is metallic with 4 electrons distributed among 3 quasi-degenerate MOs with the occupation pattern 1.6:1.3:1.1 for Z(10, 10) and with 2 electrons distributed among 4 quasi-degenerate MOs with the occupation pattern 1.1:0.4:0.3:0.2 for Z (15,15).These two sets of calculations do not provide a proof that all the flakes studied by us are planar, but we feel that they provide quite a strong argument supporting this idea.
The set of total DFTB3 energies {E(m, n) : m, n = 1, . . ., 20} of the fully optimized struc- tures of Z(m, n) was analyzed as follows.First, we selected a family of J bivariate functions where {c k } was the set of fitting coefficients to be determined via the least-square fitting procedure.In principle, the coefficients {c k } in Equation ( 1) could be determined using all the available energies E(m, n) in the fitting process.However, such a solution is to be avoided if one also aims at an expression applicable to larger values of m and n than those included in the fitting set.As described later in Section 4, the structures Z(m, n) with low values of m and n do not closely follow the trends observed for the larger structures.Therefore, to avoid outliers in the fitting process, we decided to discard the smallest structures from our analysis.Our detailed reasoning showing how to decide which structures should be avoided in the fitting process is presented in the next section; here, we merely mention that the smallest value of m retained in our analysis is denoted by M, and the smallest value of n, by N.
The coefficients c k are found by minimizing the following 2-norm of the residual vector The minimization leads to the least-square problem, which can be expressed as where c = c 1 , c 2 , . . ., c J T is the vector of the fitting coefficients, is the vector of energies used in the fitting process, and A is a The least-square problem in Equation ( 3) was solved via the singular value decomposition (SVD) [51,52] of the matrix which allowed us to write c as where Σ is the diagonal matrix containing the singular values of σ i and Σ + is a diagonal matrix comprising inverses of non-vanishing singular values, 1/σ i , and zeros otherwise.It was found that all the singular values σ i in Σ were always non-zero, so the inverse Σ + in Equation ( 7) could be computed with the full rank.
The analysis of the fitting residues is based on the vector ∆ϵ = E − E = E − Ac.The structure of the residual energy vector is as follows: where ∆ϵ(i, j) = E(i, j) − E(i, j).At times, to highlight the distinct behavior of E(m, n) for small values of m and n, we extend this definition to ∆ϵ = [∆ϵ(1, 1), . . . ,∆ϵ(20, 20)] T (9) even if the coefficients {c k } are optimized only for m ∈ M, . . ., 20 and n ∈ N, . . ., 20.

Construction of the Fit
As the main purpose of this research paper is to study the behavior of the energies E(m, n) of the graphene flakes Z(m, n) as a function of the structural parameters m and n, we start our investigation by presenting in Figure 2 a plot of E(m, n) as a function of m and n.The computed energies, depicted with solid black circles, are arranged in equidistant vertical columns, each corresponding to the same value of m.Interestingly and surprisingly, the circles corresponding to the same value of n arrange themselves on straight (diagonal) lines, which for the convenience of the reader are depicted using thin black lines in Figure 2. The highly structured form of the computed energies shown in Figure 2, resembling a trapezoidal lattice of diagonal and vertical lines, strongly suggests that the computed energies can be represented to a high level of accuracy by some analytical formula depending linearly on the parameters m and n.Therefore, in the first attempt, we chose three functions, The motivation behind this choice came from structural considerations.As mentioned previously, the number of carbon atoms is equal to 2mn + 2(m + n), and the number of hydrogen atoms is equal to 2(m + n) + 2. The number of the C-H bonds is equal to 2(m + n) + 2, and the number of the C-C bonds is equal to 3mn + 2(m + n) − 1, while the number of benzene rings in the graphene flake Z(m, n) is equal to mn.All these contributions can be expressed using the three functions A 1 (m, n), A 2 (m, n), and A 3 (m, n) specified above.This signifies that the selected functions should be capable of capturing all the energy effects associated with atomic contributions and with contributions corresponding to the localized C-H and C-C bonds and delocalized aromatic bonds.One might alternatively treat the graphene flake Z(m, n) as a macroscopic system.In this interpretation, A 2 (m, n) = m + n corresponds to the perimeter of the flake, A 3 (m, n) = mn corresponds to the area of the flake, and the choice of A 1 (m, n) = 1 is motivated by the finiteness aspect of each flake containing 4 corners and 4 edges, regardless of its size.
As anticipated, the energy expression E(m, n) constructed using the set {1, m + n, mn} allows us to capture most of the energy contributions, leaving out only relatively small residuals ∆ϵ(m, n), whose distribution is presented graphically in Figure 3.The fit coefficients are c 1 = 0.776231, c 2 = 4.156603, and c 3 = 3.436241; all values are given in hartrees (E h ).Even though the obtained residuals ∆ϵ(m, n) constitute only a tiny portion of the total energies E(m, n), for all practical purposes, the energy expression E(m, n) constructed using the set {1, m + n, mn} is useless because the residuals are too large and they tend to increase with increasing values of m and n.For m, n ≤ 20, the maximal absolute residue is about 90 kcal/mol, a value considerably larger than the 1 kcal/mol anticipated for accurate methods of quantum chemistry.On the other hand, a highly structured form of the residuals distribution shown in Figure 3 suggests that an inclusion of further fit functions in E(m, n) can considerably improve the quality of the fit.The shape of the distribution shown in Figure 3 suggests that an appropriate fit function to be included in E(m, n) can be expressed as A 4 (m, n) = m − n.Such a fitting function describes well the kite-shaped pattern of the distribution shown in Figure 3 and carries the information about the m-n asymmetry associated with the energy penalty corresponding to the departure of the flake from the ideal square shape.The energy expression E(m, n) constructed using the set {1, m + n, mn, m − n} reproduces the set of energies with much better fidelity.The resulting residuals ∆ϵ(m, n) are presented graphically in Figure 4.The maximal positive residue is reduced from 90 kcal/mol to about 45 kcal/mol, while the maximal negative residue is reduced from −75 kcal/mol to about −15 kcal/mol.Despite the fact that the residues have been substantially reduced by the additional basis function, their magnitudes are still too large to construct a practically useful expression for E(m, n).Before proceeding to finding an improved expression, let us briefly discuss the important information inferred from the analysis of data in Figure 4.

(i)
The fit coefficients are c 1 = 0.776231, c 2 = 4.156603, c 3 = 3.436241, and c 4 = −0.006627E h .The first three coefficients are identical to the coefficients obtained with the set {1, m + n, mn}.Such a pronounced fit stability suggests that all the employed basis functions represent some physical contributions to the energy.

(ii)
The energy residuals grow with increasing values of m and n.This signifies that the fitting formula cannot be extrapolated to large values of m and n without substantial loss of accuracy.

(iii)
The residues corresponding to m = 1 -4 and n = 1 -3 show distinct behavior compared with the rest of residues corresponding to higher values of m and/or n. (iv) The residues corresponding to 5 ≤ m ≤ 9 and to 4 ≤ n ≤ 6 show somewhat distinct behavior compared with the residues corresponding to higher values of m and n.The most important observation from the analysis given above concerns the difficulty of constructing an energy expression E(m, n) valid at the same time for small and large values of m and n: the finite small width (height) of graphene flakes Z(m, n) with m ≤ 4 (n ≤ 3) makes these structures dissimilar to a general graphene flake Z(m, n) with large values of m and n. (This effect has been partially discussed in the literature; a discovery that the ground state of linear polyacenes, i.e., the flakes Z(m, 1), is an open shell singlet caused considerable stir in the chemical community at the beginning of the 2000s [53][54][55][56]).Consequently, these structures need to be studied separately and cannot be used to construct a general expression for E(m, n).The next several small values of m and n belong to the transitional realm, in which the finite-size effects are still partially discernible.Therefore, it is advantageous from our point of view to use in the fitting process only the energies E(m, n) of the graphene flakes Z(m, n) with n ≥ N and m ≥ M, where the limiting values of M and N remain yet to be determined.Motivated by this observation, we present in Figure 5 the residuals ∆ϵ(m, n) obtained by using the fitting set {1, m + n, mn, m − n} with N = M = 7 (left panel) and N = M = 10 (right panel).As expected, removing the lower portion of data with small values of m and n makes the ∆ϵ(m, n) magnitudes almost 10 times smaller and opens further vistas for constructing a chemically useful expression for E(m, n).Clearly, the results obtained with N = M = 10 are more accurate with all the residuals ∆ϵ(m, n) with m ≥ 6 and n ≥ 9 falling within the window of ±2 kcal/mol of chemical accuracy.The usefulness of the resulting energy expression obtained with M = N = 10 was tested for its ability to predict the DFTB energies E(m, n) of larger flakes Z(m, n).For this purpose, we chose 9 flakes with 25 ≤ m, n ≤ 35; for detailed definitions, see Table 1.A graphical representation of all the residuals ∆ϵ(m, n) computed using the energy expression E(m, n) given by Equation ( 10) is displayed in Figure 6.The residuals ∆ϵ(m, n) of the 9 new flakes (marked with red triangles in the left panel of Figure 6) are consistently too large to fall within the chemical accuracy limit.
Fortunately, the computed departure from the desired behavior is regular and suggests how to improve the optimal energy expression E(m, n).In principle, there are two obvious ways to reduce the energy residuals further.The first one is based on an expansion of the fitting set by adding to it new functions of the structural parameters m and n.Since the structure of the flakes Z(m, n) does not offer any further obvious insights here, we have considered-somewhat ad hoc -the following four additional functions: m 2 , n 2 , m/n, and n/m.Various expansions (labeled as C, D, E, F, G, and H) of the fitting set have been performed; their definitions and the resulting list of the fitting coefficients is given in Table 2.
Table 1.Energy residuals |∆ϵ(m, n)| for larger graphene flakes calculated using the energy expression E(m, n) and constructed for various fitting function sets with M = N = 10.Definitions of the fitting sets (B -H) are given in Table 2.
As expected, the inclusion of more variables in the fitting function E(m, n) leads to a considerable reduction in the magnitudes of the energy residuals.(See Table 1 for the detailed values.)Again, not surprisingly, the smallest magnitudes of the energy residuals for the new 9 larger flakes were obtained with the largest here-tested fitting set H (for a graphical representation, see the right panel of Figure 6); slightly larger values of the mean absolute error (MAE) were obtained with the fitting sets D, F, and G.The similarity of these magnitudes and small values of fitting coefficients for some of the fit variables (particularly for m 2 and n 2 ) allows the identification of m/n as the most important new component of the fit.The variable m/n, similarly to m − n, quantifies the departure of the flake from squareness.Its intrinsic meaning in the energy decomposition of rectangular graphene flakes and its distinction from the m − n variable remains to be understood.
The second way to reduce the energy residuals further is to include the energies of some of these larger flakes in the fitting procedure.For the needs of the current analysis, we decided to include all 9 DFTB energies E(m, n) of the larger flakes listed in Table 1 in the fitting set together with the previously used 121 DFTB energies E(m, n) with 10 ≤ m, n ≤ 20.This new fitting set was used together with the B = {1, m + n, mn, m − n} and H = {1, m + n, mn, m − n, m 2 , n 2 , m/n, n/m} fitting functions sets; the resulting fitting protocols were abbreviated as B * and H * , respectively.The list of the fitting coefficients for B * and H * is given in Table 2.The performance of these protocols is summarized in Figure 7; the left panel of this figure shows the residuals obtained with the B * protocol, and the right panel shows the residuals obtained with the H * protocol.For B * , including the 9 energies of the larger flakes allowed us to reduce the energy residuals to less than ±2 kcal/mol.The performance of H * is particularly stunning: all the energies E(m, n) with 10 ≤ m, n are reproduced with practically vanishing residuals.Since all the previously used test points were included in the fitting set, to test the quality of these two new protocols, we decided to include two new large flakes, Z (25,35) and Z (40,35), in our analysis.(Note that for the 11 large rectangular flakes employed here for tests, the DFTB optimizations were performed less rigorously, with the force convergence criterion being, at most, 3 × 10 −5 a.u.) Figure 7 shows that both protocols perform well for these two new flakes, but the performance of H * is outstanding.We feel that it is worth explicitly stating the accurate energy expression E(m, n) constructed using the protocol H * ,

Results and Discussion
Several interesting aspects of our work are worth highlighting: (i) Figure 6 and the left panel of Figure 7 clearly show that the accuracy of the constructed energy expression E(m, n) deteriorates when one extrapolates it outside of the fitting set.A similar effect is also expected for the best-constructed energy expression E(m, n) in Equation (11) when used for very large flakes with m, n ≥ 60.
However, as the current study shows, such a problem can be easily circumvented by adding a few new very large flakes Z(m, n) with relevant values of m and n to the fitting set.For researchers interested in such extensions, we have included all the DFTB input files in the Supplementary Materials Section of this manuscript.The DFTB geometry optimization for the largest flake considered here, Z (40,35) with the molecular formula C 2950 H 152 , took approximately several weeks on a 100-core computer.The reader should be aware that the optimization of a larger flake might take a considerably longer time.An interesting alternative here could be a theoretical analysis of the contributions to the total energy from the finite edge effects and possibly quantifying such an influence using non-obvious, new (m, n)-dependent basis functions.Such a development is expected to improve the description of small flakes and to permit the extrapolation of the energy formula to really large values of m and n that presently escape the possibility of direct quantum chemical calculations.

(ii)
Small rectangular graphene flakes are known for their various interesting chemical deviations from the typical behavior of polycyclic hydrocarbons, including their open-shell ground state [53][54][55][56] and pronounced radical character [57][58][59][60][61][62][63].Our work shows that larger flakes are more uniform, suggesting that the transition from finite, pyrene-like, small flakes to infinite, graphene-like, large flakes happens in the regime of m, n ≈ 10.To investigate this issue in more detail, we have analyzed the distribution of the CC bond lengths and CCC bond angles in the transition from small to large flakes.Before discussing the results, let us only briefly comment that in the transition from small to large flakes, one would expect that the bond lengths and bond angles would become more uniform, approaching the values corresponding to an infinite graphene sheet, i.e., CC bond lengths of 1.42-1.43Å obtained from the DFT calculations [64][65][66][67] and of 1.42 Å obtained from experiment and an angle of 120 • corresponding to a hexagonal geometry.The results for square flakes defined by the formula Z(k, k) for k = 2-20 are presented in Figure 8.These results show that the convergence to the graphene-like values is fast.In principle, the flakes Z(6, 6) already show distributions similar to Z (20,20).In addition to square-shaped flakes, it is interesting to check for similar convergence properties for rectangular flakes.In Figure 9, we show analogous distributions of CC bond lengths and CCC bond angles for two families of rectangular flakes, Z(20, k) and Z(k, 20), with values of k = 2-20.In both cases, the convergence toward the graphene-like regime is obtained faster than for the square flakes; this effect is particularly fast for the polyacene-like flakes Z(20, k).Despite the fast convergence to the graphene-like regime, the finite edge effects are clearly visible in all the distributions, suggesting that the hydrogen termination and edge effects constitute important local perturbations.

(iii)
The parameters of the fit tabulated in Table 2 show surprisingly large inertias with respect to the extension of the function set with new variables.This behavior suggests that the energy decomposition has physical meaning, and its coefficients can be interpreted as sums of various energy contributions.For a flake Z(m, n), the contributions can be easily identified: The energy ϵ CH of a C-H bond with multiplicity 2(m + n) + 2. Unfortunately, all these contributions involve only 3 fitting functions ({1, m + n, and mn}), showing that a unique determination of the five parameters ϵ H , ϵ C , ϵ O , ϵ CC , and ϵ CH from the three coefficients c 1 , c 2 , and c 3 is not possible from our analysis.In the future studies, we plan to extend our analysis to other structured graphene flakes, including prolate rectangles Pr(k, m, n) [11,68], oblate rectangles Ob(m, n) [11,69], and O(k, m, n) [11,46].We expect that the distinct dependence of the total energy on the structural parameters for these structures will help to uniquely determine the five parameters ϵ H , ϵ C , ϵ O , ϵ CC , and ϵ CH defined above.

(iv)
The energies used to construct the energy expression given by Equation ( 11) in addition to the usual size and shape dependence encoded by the parameters m and n also include contributions related to the geometry relaxation effects.In our study, all these components are treated collectively.It would be very interesting to consider the relaxation effects individually, for example, by starting the geometry optimization from a rectangular patch of an idealized infinite graphene sheet with uniform hydrogen terminations.The relaxation effects can be divided into three types of contributions: (1) those corresponding to the relaxation of the carbon sublattice, (2) those corresponding to the relaxation of the hydrogen sublattice, and (3) those corresponding to the synergic relaxation of both lattices simultaneously.Such an analysis is beyond the scope of the current study.

(v)
Numerous studies tried to correlate various topological invariants with the energetic stability of polycyclic aromatic hydrocarbons.The vast efforts of the graphtheoretically oriented chemists over the last few decades have created substantial literature on this topic.(Probably the best existing account describing the body of results on the Kekulé count K is the monography of Cyvin and Gutman [11]; the results on other are scattered throughout the literature.)Our study may show that such an effort might be somewhat superfluous as similar effects can be achieved more readily by correlating the energetics with appropriate structural parameters (and their simple algebraic functions) of the whole family of analyzed structures.

Conclusions
We have computed the DFTB energies E(m, n) of rectangular graphene flakes Z(m, n) with 1 ≤ m, n ≤ 20 at their optimized geometries.The resulting set of energies has been used to construct an approximate energy expression E(m, n) for E(m, n) in a form of a linear combination of various simple functions of the structural parameters m and n.In order to construct an accurate energy expression E(m, n), several important factors have been explicitly considered in our work: It has been recognized that small graphene flakes Z(m, n) with 1 ≤ m, n ≤ 9 are structurally dissimilar to larger flakes due to their finite-size effects.Those structures have been excluded from the fitting set, which finally comprised 121 energies of flakes Z(m, n) with 10 ≤ m, n ≤ 20.

(ii)
It has been noted that in order to be able to extrapolate the energy expression E(m, n) outside of the fitting region, we need to include in the fitting set several larger structures Z(m, n) with 20 ≤ m, n. (iii) The set of fitting functions resulting from structural considerations and comprising physically meaningful variables {1, m + n, mn, m − n} needs to be expanded to {1, m n, mn, m − n, m 2 , n 2 , n/m, m/n}.The physical interpretation of these new variables remains to be understood.(iv) Performance tests of the energy expression E(m, n) have been performed using two additional graphene flakes, Z (25,35) and Z (40,35).
The best approximation to E(m, n) was produced using a fitting protocol abbreviated as H * (for details, see Table 2); the explicit form of the resulting E(m, n) is given by Equation ( 11).This expression is very accurate and falls withing the requirement of chemical accuracy (i.e., the energy residuals ∆ϵ(m, n) are smaller than ±2 kcal/mol) for all graphene flakes Z(m, n) with 10 ≤ m, n ≤ 20 used in the current study; in fact, all the relevant residuals are almost negligible (for details, see the right panel of Figure 7).

Figure 1 .
Figure 1.Two examples of multiple zigzag chains Z(m, n).The index m represents the length of the zigzag edge of these rectangular flakes, while the index n corresponds to the length of the armchair edge.The shape and the symmetry of the flake are slightly different for even and odd values of n.

Figure 2 .
Figure 2. The energies E(m, n) of the optimized graphene flakes Z(m, n) can be arranged in the shape of a trapezoid, in which the vertical lines correspond to constant values of m and the diagonal lines correspond to a constant values of n.

Figure 3 .
Figure 3. Energy residual ∆ϵ(m, n) corresponding to the fit obtained with a set {1, m + n, mn}.Points depicted in identical colors have identical values of m, and points depicted with identical symbols have identical values of n.For the convenience of the reader, the energy residuum scale is given in hartree (left axis) and in kcal/mol (right axis).

Figure 4 .
Figure 4. Energy residuals ∆ϵ(m, n) corresponding to the fit obtained with a set {1, m + n, mn, m − n}.Points depicted in identical colors have identical values of m, and points depicted with identical symbols have identical values of n.For the convenience of the reader, the energy residuum scale is given in hartree (left axis) and in kcal/mol (right axis).

Figure 5 .
Figure 5.Further improvement of energy residuals obtained using the fitting set {1, m + n, mn, m − n} and M = 7, N = 7 (left) and M = 10, N = 10 (right).For the meaning of M and N, see text.The yellow background depicts region of ±2 kcal/mol corresponding to chemical accuracy.

Figure 6 .
Figure 6.(Left): The residuals for the 9 new larger flakes with 25 ≤ m, n ≤ 35, marked here with red triangles, show a rather large systematic departure from the chemical accuracy limit and suggest that the energy expression E(m, n) constructed using the fitting function set B = {1, m + n, mn, m − n} and 121 DFTB energy values E(m, n) of smaller flakes with 10 ≤ m, n ≤ 20 cannot be easily extrapolated to larger structures.(Right): Various means to solve this problem are discussed in text.Here, we show that extending the fitting function set to H = {1, m + n, mn, m − n, m 2 , n 2 , n/m, m/n} can substantially reduce the residuals for these 9 new larger flakes, making all of them smaller than ±2 kcal/mol.The yellow background depicts region of ±2 kcal/mol corresponding to chemical accuracy.

Figure 7 .
Figure 7.Further improvement of energy residuals obtained using the set B * = {1, m + n, mn, m − n} (left panel) and H * = {1, m + n, mn, m − n, m 2 , n 2 , n/m, m/n} (right panel).The star signifies that 121 DFTB energies E(m, n) with 10 ≤ m, n ≤ 20 are used for the fitting process along with the 9 additional DFTB energies of the flakes listed in Table1.The residuals for the additional points are marked with red triangles.Two new flakes, Z(25, 35) (with a residue marked with a blue triangle) and Z(40, 35) (with a residue marked with a green triangle), are used to assess the quality of the constructed energy expressions E(m, n).The yellow background depicts region of ±2 kcal/mol corresponding to chemical accuracy.

Figure 8 .
Figure 8. Distributions of the CC bond lengths (left) and CCC bond angles (right) in square flakes Z(k, k) suggest that the graphene regime is already achieved for quite small flakes.

Figure 9 .
Figure 9. Distributions of the CC bond lengths (left panels) and CCC bond angles (right panels) in rectangular flakes Z(20, k) (upper panels) and Z(k, 20) (lower panels) suggest that the transition to graphene-like regime is achieved faster than for square flakes.

Figure A3 .
Figure A3.Fitting the DFTB energies E(m, n) to the natural logarithm of the Kekulé count K and the Clar count C produces a much better fit than the direct fit to K and C shown in FigureA2.Nevertheless, the resulting energy residuals are still huge (approximately 1000 times larger) in comparison to a fit constructed using the structural parameters m and n and their simple algebraic functions.The fitting sets are {1, ln K} for the (left) panel, {1, ln C} for the (middle) panel, and {1, ln K, ln C} for the (right) panel.

Table 2 .
The values of the coefficients c k (in E h ) calculated using various fit protocols.The last row gives a decimal multiplier for each type of coefficient.