Figures of Graph Partitioning by Counting, Sequence and Layer Matrices

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Mathematics-Informatics, University of Petrosani, 332006 Petrosani, Romania; MihaelaTomescu@upet.ro
2 Department of Physics and Chemistry, Technical University of Cluj-Napoca, 400641 Cluj-Napoca, Romania
3 Chemical Doctoral School, Babes-Bolyai University, 400028 Cluj-Napoca, Romania
4 Department of Conservative Dentistry, “Iuliu Hatieganu” Medicine and Pharmacy University, 400349 Cluj-Napoca, Romania; doina.rotaru@umfcluj.ro
* Correspondence: lorentz.jantschi@gmail.com or lorentz.jantschi@chem.utcluj.ro or lorentz.jantschi@ubbcluj.ro; Tel.: +40-264-401-775

In some cases, operating with connected undirected unweighted graphs [9] provides all necessary information, whereas in other cases more specificity is needed [10]. On the other hand, coloring graphs (of the vertices [11], edges [12] and planes [13]) can provide useful visual information.
In this study, the case of unweighted, undirected connected graphs is considered. Some basic concepts about graphs are given here. Let G = (V, E) be an unweighted, undirected connected graph (with V the set of vertices and E the set of edges). Then E is a subset (⊆) of V × V. Usually the vertices are indexed (numerically, starting from 1), so that if there are n vertices (|V| = n), their numbering gives {1, 2, ..., n} as their representation in the informational space. It should be noted that for a given graph G of n vertices there are exactly n! possibilities of numbering the vertices and (unfortunately) the same number of isomorphisms induced by numbering. This is why the search for graph invariants (a graph invariant is a property calculated from a graph that remains unchanged when the numbering changes) is one of the most important issues addressed when comparing graphs. From the edges (or the vertices), the next construction is chains of connected edges (or vertices). If such a chain allows revisitation of edges and vertices, it is called a walk. If only revisiting of vertices is allowed, it is called a trail; and finally, if edges and vertices appear only once in the chain, it is called a path. A path can be defined by edges ($e_i \in E$): $P = e_1 \ldots e_k$ with $e_i \cap e_{i+1} \neq \emptyset$ for $1 \le i < k$; or by vertices ($v_i \in V$): $P = v_1, \ldots, v_{k+1}$ with $(v_i, v_{i+1}) \in E$ for $1 \le i \le k$; a path with k edges has k + 1 vertices. Paths are an important concept in graph theory, because the topological distance metric ($d_{i,j}$, the length of the shortest path between two vertices i and j) and the graph diameter (the longest distance in the graph; Algorithm A3 provided in Appendix B) are built on them.

Related Research
A literature survey showed vertex partitioning and graph coloring to be of growing interest. Cutting a graph into smaller pieces is one of the fundamental algorithmic operations; partitioning large graphs is often an important subproblem for complexity reduction or parallelization [14]. The balanced connected k-partition problem, with (or without [15]) nonnegative weights on the vertices, was addressed in [16,17] and is known to be NP-hard. In the same context, the minimum gap graph partitioning problem was formulated and addressed in [18]. Partition strategies on resource description framework graphs have been studied in [19] and in [20]. Graph contraction (the creation of a graph minor [21]) is used in some specific graph-related problems [22]. When parallel motif discovery is employed on complex networks [23], graph partitioning divides the network for efficient parallelization (into parts with an approximately equal number of vertices). In this context, the graph partitioning problem is NP-complete [24], and there are strategies available based on spectral [25] (eigenproblem in [26]), combinatorial [27], geometric [28] and multi-level [29] heuristics. Partitioning of the graph vertices leads to the recognition of 2-subcolorable [30], bipartite [31], cluster [32], dominable [33], monopolar [34], r-partite [35], split [36], unipolar [37] and trapezoid [38] graphs (etc.), and algorithms working efficiently with special classes of graphs have been devised (for monopolar and 2-subcolorable graphs in [30]; for unipolar and generalized split graphs in [39]; for partitioning a big graph into k sub-graphs in [40,41]; for graphs that do not contain a claw as an induced subgraph in [42]). For an extended survey on graph clustering (finding sets of related vertices in graphs), the reader is referred to [43]. As other recent studies have shown, vertex coloring in graphs may solve a series of real problems.
To tackle these problems, different coloring schemes have been proposed: the scheme based on distances in [44], the scheme based on templates in [45], the scheme based on adjacencies in [46], the scheme based on heuristics in [47] and the scheme based on pseudo-randomness (with constraints, Grundy and color-dominating) in [48]. The properties of the colorings have been studied in [49], and the counting of distinguishing (symmetry-breaking) colorings with k colors in [50]. One should notice that all Zagreb indices and their relatives [51] are useless for discriminating the topological isomers of fullerenes, in which every vertex has a degree of 3 (in the related notation, $d_v = d_w = 3$). Sequence matrices appeared first in a study by Frank Harary (an American mathematician specialized in graph theory, widely recognized as one of the "fathers" of modern graph theory) regarding the distribution of phonemes [52], and have since proven useful in solving scheduling problems [53], in connection with pipelines [54], and for route discovery [55]. The layer matrix, a term initially used for tables expressing stratification by age in biological populations [56], was introduced into graph theory by Andrey A. Dobrynin (see [57-59]). Most of the studies involving the use of the layer matrix to differentiate between topological isomers were led by Dobrynin [60,61] and Mircea V. Diudea [62], but other researchers also found uses for layer matrices in their studies (see [63,64]). Haruo Hosoya was the first to introduce a counting polynomial (the Z-counting polynomial in [65]; see a general review on counting polynomials in chemistry in [66]) to characterize a graph, and George Pólya was the first to introduce the counting polynomial into graph theory to count topological isomers [67].
Counting matrices are the expanded forms of counting polynomials [68], since some distance-related properties can be expressed in the polynomial form, with coefficients calculable from the matrices (see [69,70]; for isomer-counting matrices, see [71]).
One set of works is especially related to the current study, since layer matrices were involved in the analysis of fullerenes. In [72], vertices were partitioned into classes of equivalence and ordered according to their centrality indexes, computed on layer matrices of vertex properties. In [73], the prediction of the stability of C40 fullerenes was derived from two indices (of complexity and of centrocomplexity) calculated on the layer matrix of valences.
The use of the counting, sequence and layer matrices, together with some proposed modifications and extensions, to generate different partitions on graphs is given and illustrated. Finally, these partitions were used to obtain visual representations of the graphs. As an application of molecular topology, two isomers of C28 fullerene were subjected to atom partitioning, with the aim of obtaining alternative groups of atoms.
To the best of the authors' knowledge, this communication is the first systematic approach to graph coloring based on a pool of partitions. To some extent, the sequence and layer matrices involved here were previously reviewed in [9], and the coloring of vertices based on counting matrices was previously reported in [12].

Graphs and Their Representation
A graph with numbered (indexed) vertices can be kept in the informational space as a list of edges (pairs of integers), optionally accompanied, for convenience, by the number of vertices. This type of representation is a powerful one (convenient both for the small amount of memory it requires and for its fast processing; see Appendix A). However, in some cases a better equipped algebraic structure is preferred: a matrix representation. From this point of view, a graph can be represented by an adjacency matrix. A square matrix is a more convenient representation than a rectangular one: it can be raised to a power, two such matrices can be multiplied, etc. The number of walks between two vertices is found in the powers of the adjacency matrix (of the vertices). More importantly, the adjacency matrix is the most commonly used matrix-based representation of a graph.
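As an illustration of the statement about powers of the adjacency matrix, the following minimal Python sketch builds the matrix from an edge list and squares it. The five-vertex edge list and the helper names `adjacency` and `matmul` are assumptions made for this example; they are not taken from the algorithms in Appendix B.

```python
def adjacency(n, edges):
    """Build the n x n 0/1 adjacency matrix from 1-based vertex pairs."""
    ad = [[0] * n for _ in range(n)]
    for i, j in edges:
        ad[i - 1][j - 1] = ad[j - 1][i - 1] = 1  # undirected: symmetric
    return ad


def matmul(x, y):
    """Plain product of two square integer matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical edge list (assumed for illustration only).
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]
AD = adjacency(5, EDGES)
AD2 = matmul(AD, AD)

# Entry (i, j) of the k-th power counts the walks of length k from i to j.
print(AD2[0][2])                      # walks of length 2 between 1 and 3 -> 2
print([AD2[i][i] for i in range(5)])  # closed walks of length 2 -> [2, 2, 2, 3, 1]
```

The diagonal of the squared matrix reproduces the number of neighbors of each vertex, since every closed walk of length 2 goes to a neighbor and back.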
Let us take an example of a graph (the one in Figure 1) to be used to introduce the following concepts. The adjacency matrix ([Ad]) contains information about adjacencies (Figure 2). If two vertices i and j are connected by an edge ($(i, j) \in E$), then the corresponding element in the matrix ($Ad_{i,j}$) is set to 1; otherwise, it is set to 0 (Algorithm A1 in Appendix B). The matrix given in Figure 2 contains an additional column Σ that collects the valence (the number of connections, i.e., of adjacent vertices) of each vertex. It can be used to discriminate the vertices (which are colored accordingly in Figure 2), and is thus a first example of a criterion that can be used to create a partition of the vertices (for $\Sigma_j Ad_{i,j}$ the partition has three groups: {1, 2, 3}, {4}, {5}).
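The valence-based partition can be sketched in a few lines of Python. Figure 1 is not reproduced in this text, so the edge list below is an assumption: it was chosen because it yields the three groups quoted above. The helper name `partition_by` is likewise hypothetical.

```python
from collections import defaultdict

# Edge list ASSUMED to match the example graph (Figure 1 is not reproduced here).
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]
N = 5

ad = [[0] * N for _ in range(N)]
for i, j in EDGES:
    ad[i - 1][j - 1] = ad[j - 1][i - 1] = 1

valence = [sum(row) for row in ad]  # the Σ column of [Ad]


def partition_by(values):
    """Group 1-based vertex indices that share the same classifier value."""
    groups = defaultdict(list)
    for vertex, value in enumerate(values, start=1):
        groups[value].append(vertex)
    return dict(groups)


print(valence)                # [2, 2, 2, 3, 1]
print(partition_by(valence))  # {2: [1, 2, 3], 3: [4], 1: [5]}
```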
The distance matrix ([Di]) contains information about distances (Figure 3). For any two vertices, the corresponding element in the matrix ($Di_{i,j}$) is set to the value of the distance between them (Algorithm A2 in Appendix B).
The additional column (Σ) collects, for each vertex, the sum of the distances to all vertices. It can also be used to discriminate the vertices (which are colored accordingly in Figure 3) and as a criterion to create a partition of the vertices (for $\Sigma_j Di_{i,j}$ the partition has four groups: {4}, {1, 3}, {2}, {5}; here, the groups are ordered by ascending value of $\Sigma_j Di_{i,j}$).
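A minimal sketch of the distance matrix and of the partition it induces is given below. Breadth-first search is used here as a stand-in (Algorithm A2 in Appendix B may proceed differently), and the edge list is again an assumption chosen to be consistent with the groups quoted above.

```python
from collections import deque

EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]  # assumed example graph
N = 5


def distance_matrix(n, edges):
    """All-pairs topological distances of a connected graph, by BFS."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    di = []
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:  # first visit gives the shortest distance
                    dist[w] = dist[u] + 1
                    queue.append(w)
        di.append([dist[t] for t in range(1, n + 1)])
    return di


DI = distance_matrix(N, EDGES)
row_sums = [sum(row) for row in DI]     # the Σ column of [Di]
diameter = max(max(row) for row in DI)  # longest distance in the graph

# Groups of vertices ordered by ascending distance sum.
groups = [[v for v in range(1, N + 1) if row_sums[v - 1] == s]
          for s in sorted(set(row_sums))]
print(row_sums, diameter, groups)
# [6, 7, 6, 5, 8] 3 [[4], [1, 3], [2], [5]]
```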
[Szd] (Figure 4) is an important example since, unlike [Ad] and [Di], it is asymmetric.
The counting matrices collect, for each vertex i, the number of entries equal to k in the corresponding row:
$Adc_{i,k} \leftarrow |\{j \mid Ad_{i,j} = k\}|$
$Dic_{i,k} \leftarrow |\{j \mid Di_{i,j} = k\}|$
Counting matrices (by their definition) are always asymmetric. Another important property of the counting matrices is that the sum of the elements is always the same for any vertex (see columns Σ in Figures 2-4). Another classifier is introduced and is useful here. We will call it the dot classifier ("." column in Figures 5-7). The numerical ordering (of the values given in the Σ columns in Figures 2-4) can be replaced with lexicographic ordering (as for the values given in the "." column in Figures 5-7).
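The counting step and the dot classifier can be sketched as follows. The distance matrix is hard-coded for an assumed five-vertex graph (edges 1-2, 1-4, 2-3, 3-4, 4-5), and reading the "." column as the row of counts joined by dots is an interpretation of the text, not a confirmed definition.

```python
# Distance matrix of the ASSUMED example graph, hard-coded to keep the sketch short.
DI = [[0, 1, 2, 1, 2],
      [1, 0, 1, 2, 3],
      [2, 1, 0, 1, 2],
      [1, 2, 1, 0, 1],
      [2, 3, 2, 1, 0]]


def counting_matrix(m):
    """Row i, column k holds the number of entries equal to k in row i of m."""
    kmax = max(max(row) for row in m)
    return [[row.count(k) for k in range(kmax + 1)] for row in m]


def dot_classifier(row):
    """Join the counts with '.', giving a key that can be sorted as text."""
    return ".".join(str(x) for x in row)


DIC = counting_matrix(DI)
for vertex, row in enumerate(DIC, start=1):
    print(vertex, row, sum(row), dot_classifier(row))
# first line printed: 1 [1, 2, 2, 0] 5 1.2.2.0
```

Every row sums to the number of vertices, which is the constant-sum property mentioned above. For counts with more than one digit, comparing the rows as lists of integers avoids the pitfalls of comparing them as text.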

Collecting Sets of Vertices
An important step forward to generalize (on the one hand) and simplify (on the other hand) sequencing and layering (to be defined) is to collect the sets of vertices that meet certain criteria instead of their count (Algorithm A10 in Appendix B). This procedure changes the previous one only slightly, and collecting (instead of counting) defines layers natively (see Tables 1-3). Once obtained, the layers can easily be exploited to build other layer matrices.
[Table: [LD0], with layer columns 0, 1, 2, 3]
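Collecting instead of counting amounts to keeping the set where the counting matrix keeps only its size. The sketch below shows this for the distance layers; the distance matrix belongs to an assumed five-vertex graph (edges 1-2, 1-4, 2-3, 3-4, 4-5), and the function name `distance_layers` is hypothetical.

```python
# Distance matrix of the ASSUMED example graph.
DI = [[0, 1, 2, 1, 2],
      [1, 0, 1, 2, 3],
      [2, 1, 0, 1, 2],
      [1, 2, 1, 0, 1],
      [2, 3, 2, 1, 0]]


def distance_layers(di):
    """Entry [i][k] is the set of (1-based) vertices at distance k from vertex i+1."""
    diameter = max(max(row) for row in di)
    return [[{j + 1 for j, d in enumerate(row) if d == k}
             for k in range(diameter + 1)] for row in di]


LD0 = distance_layers(DI)
print([sorted(s) for s in LD0[0]])  # layers of vertex 1: [[1], [2, 4], [3, 5], []]
print([len(s) for s in LD0[0]])     # sizes give back the counts: [1, 2, 2, 0]
```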

Layer Matrices
The first reported layer matrix was for distance [59] and is the same as distance counting ([LD1] ← [Dic] from Figure 6; LD1 ← Dic in Algorithm A18 in Appendix B). In general, a layer matrix collects (as Σ) a property over all vertices belonging to the layer (for example, for the entry for vertex 1 and layer 2 of the Szeged layers of Figure 1 given in Table 3, a layer matrix will apply a sum of a property over {4, 5}, the set of all vertices that belong to the layer). As the power of discrimination of any topological descriptor is limited (and that of layer matrices as well), other layer matrices have been proposed to better account for branching, edges, and their sum (matrices B, E and S in [74] [...] Table 2) and $Adc_{2,1} = 2$ and $Adc_{4,1} = 3$ (see [Adc] in Figure 5).
[LD3] counts the distinct edges incident with the vertices in [LD0], without counting any edge that has already been counted in a previous layer. The counting for edges and the counting for adjacent vertices are the same (Figure 9). For example, in Figure 9, since $LD0_{1,1} = \{2, 4\}$, from all the edges (5, all with endpoints in 2 or 4) there remain only 3 not counted previously ((1, 2) and (1, 4) were counted for $LD3_{1,0}$) to be counted for $LD3_{1,1}$. [...] 7), all layer matrices have the same number of layers, because the layers are counted from 0 to the diameter of the graph. Another layer matrix introduced is one of distance sums (R matrix in [75]; [LD5] in Figure 11; see Algorithm A7 in Appendix B), which, once again, results naturally from [LD0]. As an example, $LD5_{1,0} \leftarrow 6 \leftarrow \Sigma_j Di_{1,j}$ ($LD0_{1,0} = \{1\}$).
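The two worked values above ($LD3_{1,1} = 3$ and $LD5_{1,0} = 6$) can be reproduced with a short sketch. The edge list is an assumption consistent with those values, the helper names are hypothetical, and only the entries quoted in the text are confirmed by the source; the remaining entries are simply what the sketch computes.

```python
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]  # ASSUMED example graph
DI = [[0, 1, 2, 1, 2],                            # its distance matrix
      [1, 0, 1, 2, 3],
      [2, 1, 0, 1, 2],
      [1, 2, 1, 0, 1],
      [2, 3, 2, 1, 0]]
DIAMETER = 3


def layers_of(i):
    """Sets of vertices at distance 0..diameter from vertex i (1-based)."""
    return [{j + 1 for j, d in enumerate(DI[i - 1]) if d == k}
            for k in range(DIAMETER + 1)]


def ld5_row(i):
    """Per layer, the sum of the distance sums of the vertices in that layer."""
    return [sum(sum(DI[v - 1]) for v in layer) for layer in layers_of(i)]


def ld3_row(i):
    """Per layer, edges incident with the layer and not counted in an earlier layer."""
    seen, row = set(), []
    for layer in layers_of(i):
        new = {e for e in EDGES if e[0] in layer or e[1] in layer} - seen
        row.append(len(new))
        seen |= new
    return row


print(ld5_row(1))  # [6, 12, 14, 0]
print(ld3_row(1))  # [2, 3, 0, 0]
```

Each row of the edge-counting matrix sums to the number of edges, because every edge is counted exactly once, in the first layer that touches it.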

Sequence Matrices
Vertex similarity analysis can be taken beyond layers. One strategy is to build edge sequences. According to [76], a sequence matrix is a collection of walks (of increasing length) starting from each of the vertices to all the others, in contrast to a layer matrix, which collects the properties of the vertices located in concentric shells (layers). The walk degrees (and their layers) are derived by raising the adjacency matrix to powers (up to the diameter of the graph) and subsequently collecting the traces. Alternatively, the calculation of the layers of walk degrees of increasing length can be shortened using, along with the adjacency matrix ([Ad] from Figure 2), the layers of distance ([LD0] from [...] Figure 1).
[Matrix: [LW1], with layer columns 0, 1, 2, 3 and the "." classifier column]
[...] Table 2 is equivalent to the walk degrees obtained by iterative summation over all neighbors, as Morgan proposed through the extended connectivities (ECs) to provide a unique representation for chemical structures [78].
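The iterative summation over neighbors can be sketched as below. It computes only the walk degrees themselves (the number of walks of a given length starting at each vertex), not the [LW1] layer matrix; the edge list and the function name `walk_degrees` are assumptions made for the example.

```python
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]  # ASSUMED example graph
N = 5

nbrs = {v: [] for v in range(1, N + 1)}
for i, j in EDGES:
    nbrs[i].append(j)
    nbrs[j].append(i)


def walk_degrees(max_len):
    """Row e holds, per vertex, the number of walks of length e starting there."""
    w = [1] * N  # length 0: the vertex itself
    rows = [w]
    for _ in range(max_len):
        # A walk of length e+1 from v is an edge to a neighbor u
        # followed by a walk of length e from u.
        w = [sum(w[u - 1] for u in nbrs[v]) for v in range(1, N + 1)]
        rows.append(w)
    return rows


for length, row in enumerate(walk_degrees(3)):
    print(length, row)
# 0 [1, 1, 1, 1, 1]
# 1 [2, 2, 2, 3, 1]
# 2 [5, 4, 5, 5, 3]
# 3 [9, 10, 9, 13, 5]
```

The same numbers are the row sums of the successive powers of the adjacency matrix, so no matrix multiplication is needed when only the walk degrees are of interest.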

Paths and Cycles
As mentioned in the beginning, when distances (in graphs) are involved, one cannot avoid introducing paths. Unfortunately, computing all paths in a graph is an NP-hard problem [79] (its complexity grows non-polynomially with the size of the graph), and it quickly runs out of memory for any medium-sized graph. For instance, when listing all paths for an isomer of C28 fullerene (see below), the output alone contains over 1.5 million lines. Therefore, there may be a real interest in a shortened version, for example listing only the paths of length less than or equal to the diameter of the graph (among which are the distance paths; Algorithm A12 in Appendix B). Such a procedure is significantly faster, and its complexity is limited by the diameter of the graph. The result (the list of paths) can be further processed and collected in matrix form for each pair of vertices (Algorithm A14 in Appendix B); the result is labeled [SP0] and is listed in Table 4 for Figure 1. $SP0_{i,j}$ is the set of paths between i (the row index in Table 4) and j (the column index in Table 4) that connect the two vertices with the smallest number of edges. In general, [SP0] is a complex multi-path structure that contains all distance paths between pairs of vertices. As can be seen, for instance, between 1 and 3 (either of $SP0_{1,3}$ and $SP0_{3,1}$) in Table 4, the two shortest paths connecting those two vertices are 1 4 3 and 1 2 3. The simplest operation on the set of paths is counting the paths, and the result is given as [SP1] (Figure 15), while another operation is counting the vertices (both in Algorithm A16 in Appendix B), and the result is given as [SP2] (Figure 16). Layers on/from the sequences of paths can be generated too. The procedure is immediate and operates on the sequences already collected in [SP0] (see Table 4; both in Algorithm A14 in Appendix B), and the result is given as [LP0] (see Table 5).
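A structure like [SP0] can be sketched by collecting all shortest paths during a breadth-first search. This is a swapped-in technique: Algorithm A12 lists the diameter-limited paths first and the distance paths are selected from them, whereas the sketch below produces the distance paths directly. The edge list and the function name `shortest_paths` are assumptions.

```python
from collections import deque

EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]  # ASSUMED example graph


def shortest_paths(n, edges):
    """Map (i, j) to the sorted list of all shortest paths from i to j."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    sp0 = {}
    for s in nbrs:
        dist = {s: 0}
        paths = {s: [(s,)]}
        queue = deque([s])
        while queue:
            u = queue.popleft()  # all shortest paths to u are complete here
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    paths[w] = []
                    queue.append(w)
                if dist[w] == dist[u] + 1:  # u is a predecessor of w
                    paths[w].extend(p + (w,) for p in paths[u])
        for t, found in paths.items():
            sp0[s, t] = sorted(found)
    return sp0


SP0 = shortest_paths(5, EDGES)
SP1 = {pair: len(found) for pair, found in SP0.items()}  # counting the paths
print(SP0[1, 3])                         # [(1, 2, 3), (1, 4, 3)]
print(SP1[1, 3], SP1[2, 5], SP1[1, 5])   # 2 2 1
```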
$LP0_{i,k}$ is the set of paths starting from i (the row index in Table 5) and having k (the column index in Table 5) edges, connecting the vertices with the smallest number of edges. In general, [LP0] is a complex multi-path structure containing all distance paths between the pairs of vertices ($LP0_{i,k} \leftarrow \{p \mid p \in SP0_{i,\cdot},\ |p| = \max(k - 1, 0)\}$, with k from $p \leftarrow v_1 \ldots v_k \Rightarrow |p| = k$). As can be seen in Table 5 [...] Once generated, another immediate result of generating the (diameter-limited) paths (Table 4) is the generation of (small, diameter-limited) cycles (Algorithm A13 in Appendix B). A small cycle can be seen as being built from two distance paths, to which an edge may need to be added to close the cycle. Please note that this assertion (that a cycle is built up from two distance paths) is not true in general; we may regard a cycle built up in this way as being no bigger than double the diameter. The result of applying this procedure to Figure 1 is shown in Table 6 and the corresponding layers in Table 7 (Figure 1 has only one cycle, and it is built up from two distance paths). A distinction between paths and cycles is that one cannot have cycles listed with fewer than three vertices, and as an effect, their layers start from 3 (Figures 21 and 22).
As a natural extension of generating sequences and sets of sequences for paths (Table 4) and cycles (Table 6), but also as a natural extension of generating the adjacency, distance and Szeged layers (Tables 1-3), another upgraded structure (upgraded from Table 3 and Figure 4) results: Szeged sets (or actually, sets of connected vertices or, in other words, fragments; Algorithm A17 in Appendix B), obtained by collecting vertices instead of counting them (as [Szd] does, Figure 4). The result is given in Table 8. As mentioned above for [Szd] (Figure 4), an important characteristic of [Szs] too (Table 8) is its asymmetry.
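For a small graph, the length-limited cycles can also be listed by a plain depth-first enumeration, shown below as a hedged sketch. It is not the two-distance-paths construction of Algorithm A13; the edge list, the function name `simple_cycles` and the choice of twice the diameter as the length limit are assumptions.

```python
EDGES = [(1, 2), (1, 4), (2, 3), (3, 4), (4, 5)]  # ASSUMED example graph


def simple_cycles(n, edges, max_len):
    """Simple cycles with at most max_len vertices, each listed once,
    starting at its smallest vertex and in one fixed orientation."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    found = set()

    def extend(path):
        start = path[0]
        for w in nbrs[path[-1]]:
            if w == start and len(path) >= 3:
                if path[1] < path[-1]:  # keep one of the two orientations
                    found.add(tuple(path))
            elif w > start and w not in path and len(path) < max_len:
                extend(path + [w])

    for s in range(1, n + 1):
        extend([s])
    return sorted(found)


# The diameter of the assumed graph is 3, so cycles up to 6 vertices are kept.
print(simple_cycles(5, EDGES, 6))  # [(1, 2, 3, 4)]
```

The single cycle found agrees with the remark above that the example graph has only one cycle, and no cycle with fewer than three vertices can ever be reported.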

Distinct Partitions Coloring of Vertices
As can be seen in the illustrations given above, both the Σ operator (for square symmetrical matrices cumulating properties for pairs of vertices) and the "." operator (for layer matrices) are able to produce different partitions of the vertices of a graph.
When shifting from numbers (here all integers, thus on an ordinal scale) or integer sequences (separated with ".", sortable, thus on an induced order scale) to colors, it hardly makes sense to keep the order relationship alive (one may argue that the wavelength is an ordering operator, but that is outside the scope of its use here).
Besides, when dealing with categories (multinomial scales), in most cases it is more important to keep the indistinguishability alive. Let us take an example: Table 9 lists side by side the values of the Σ operator on [Szd] and the values of the "." operator on [Szc]. The coloring of the vertices has been made (for all of Figures 2-22) using the colors from the ordered set {Violet, Red, Light orange, Lime, Sea green, Aqua, Light blue}, based on the partition sets induced by the operator.

Table 9. Two orders for the same partition of Figure 1 vertices.
[Table columns: Vertices; $\Sigma_j Szd_{i,j}$; "."$_k$ $Szc_{i,k}$]

In some cases, it is of interest to discriminate between the two cases (see the two side-by-side colorings in Figure 23), when the order of the sets in the partition is relevant; but it is also important to list only the distinct partitions when the order of the sets in the partition is not relevant. The example given in Table 9 falls into this latter case.
A supplementary treatment of the information is required to alleviate this distinctiveness. Either way, it is a matter of deciding whether the order of the groups is relevant or not (distinctiveness vs. indistinctiveness). Accounting for both, two different groups of classifiers result. Table 10 gives the results of the classifiers for Figure 1 side by side. Different classifications (Table 10) are always of interest in chemistry, for instance in the identification of new reaction pathways [80].
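The difference between the two readings of a partition can be sketched as follows. The example uses the Σ classifier on [Ad] and the "." classifier on [Adc] of an assumed five-vertex graph (edges 1-2, 1-4, 2-3, 3-4, 4-5) rather than the [Szd]/[Szc] pair of Table 9, whose values are not reproduced here; the function names are hypothetical, and the color list is the ordered set quoted in the text.

```python
COLORS = ["Violet", "Red", "Light orange", "Lime", "Sea green", "Aqua", "Light blue"]


def ordered_partition(keys):
    """Groups of 1-based vertices, listed by ascending classifier value."""
    return tuple(tuple(v for v, k in enumerate(keys, start=1) if k == key)
                 for key in sorted(set(keys)))


def unordered_partition(keys):
    """The same groups with their order forgotten."""
    return frozenset(frozenset(g) for g in ordered_partition(keys))


def coloring(keys):
    """Give the i-th color to the i-th group (at most len(COLORS) groups)."""
    return {v: COLORS[i] for i, g in enumerate(ordered_partition(keys)) for v in g}


# Classifier values for the ASSUMED example graph.
sigma_ad = [2, 2, 2, 3, 1]                     # row sums of [Ad]
dot_adc = ["3.2", "3.2", "3.2", "2.3", "4.1"]  # counts of 0s and 1s per row of [Ad]

print(ordered_partition(sigma_ad))  # ((5,), (1, 2, 3), (4,))
print(ordered_partition(dot_adc))   # ((4,), (1, 2, 3), (5,))
print(unordered_partition(sigma_ad) == unordered_partition(dot_adc))  # True
```

The two classifiers group the vertices identically but list the groups in a different order, so they give different colorings when the order is relevant and count as a single partition when it is not.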

Case Study for Isomers of C28 Fullerene
A fullerene is defined as having only cycles (rings) of 5 and 6 atoms, with each atom (vertex) always having three neighbors (see Figure 24). Functionalization of fullerenes is of great interest for green energy [81], drug design [82], and even dentistry [83]. Table 11 gives the result of the grouping partition analysis on C28-D2, while Table 12 gives the result of the grouping partition analysis on C28-Td (images in Figures 25 and 26 for C28-D2 and in Figure 27 for C28-Td). The classifiers discussed above seem perfectly fit for this task (Table 10 contains a summary for Figure 1 as the exemplified case).
Figure 25. U1-U6 partitions depicted in Figure 26.
Regarding the pairing (U2, D2) appearing for C28-Td (Figure 27), this pairing does not appear for C28-D2 (Figure 25 vs. Figure 26), suggesting that its occurrence is again due to the higher symmetry of C28-Td compared with C28-D2.
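The definition of a fullerene given above (only pentagonal and hexagonal rings, three neighbors per atom) fixes the number of rings of each kind. As a brief worked check, assuming the fullerene is drawn as a polyhedron whose faces are exactly those rings, Euler's polyhedron formula can be applied with $p$ pentagons and $h$ hexagons on $n$ atoms (the symbols $p$ and $h$ are introduced here for the illustration only):

```latex
\begin{align*}
|E| &= \tfrac{3n}{2}, \qquad 5p + 6h = 2|E| = 3n, \\
n - |E| + (p + h) &= 2 \;\Rightarrow\; p + h = \tfrac{n}{2} + 2, \\
p &= 12, \qquad h = \tfrac{n}{2} - 10 .
\end{align*}
```

For $n = 28$ this gives 42 edges, 12 pentagons and 4 hexagons, whichever isomer is considered.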
Fullerenes are highly symmetrical structures [84], stabilized by resonance, in which the differences between different positions (atoms) are very small and are of interest for their reactivity and functionalization [85]. Following this idea, it is of interest to identify, visually if possible, the different equivalent positions in the structures. The counting, sequence and layer matrices do just this.
As expected, with increasing symmetry, the possibilities of distinguishing between the vertices (here atoms) diminish. Thus, while the selected classifiers make 12 distinct classifications for C28-D2 (of which 6 are ones in which the order of the vertex sets is not relevant), only 5 were created for the more symmetrical C28-Td congener (actually 4, considering that Ad and Adc do not distinguish between the vertices), of which only 3 (actually 2, considering that Ad and Adc do not distinguish between the vertices) pattern the vertices in sets in which the order of the vertex sets is not relevant.

Conclusions
Sequence and layer matrices were introduced, accompanied by an example. Some of their extensions were given as well. These matrices were introduced to discriminate among a graph's vertices on the one hand, and to create different degrees of distinctiveness for the graph's vertices on the other hand. Following this foundational idea, graphs were colored according to the partitions of their vertices. Two alternative cases have been identified: one in which the order of the sets in the partition of the vertices is relevant (the sets are distinguishable by their position), and the other in which the order of the sets in the partition of the vertices is not relevant (the sets are indistinguishable by their position). The analysis of the C28 fullerene isomers shows that the classifiers are useful for generating a good number of different partitions and may be very helpful for scientists working in applied sciences, for the functionalization of different highly symmetrical chemical structures.
Acknowledgments: Dedicated to the memory of Mircea V. Diudea (b. 11 Nov. 1950; d. 25 Jun. 2019).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Sample Availability: Drawings of the graphs as molecules are available from the corresponding author.

Appendix A. Molecular Graphs and Their Representation
Many software packages and file formats are available and used in chemistry. For instance, PubChem exports ASNT, JSON, SDF, and XML files, of which SDF is probably the most compact one (the best size:information ratio), all storing human-readable information.
The format of an SDF file is as follows:
• First three lines: reserved for compound identification
• On the fourth line: the number of atoms, the number of bonds, followed by a series of (8) reserved values (numeric and string)
• A block of lines describing one atom on each line: the Cartesian coordinates (x, y, and z), the symbol of the atom and a series of (12) reserved fields (numeric)
• A block of lines describing one bond on each line: two numbers acting as indices for the atoms and a third number indicating the bond order, followed by a series of (4) reserved fields (numeric)
Another common file format, PDB, is laid out as follows:
• First two lines: reserved for compound identification
• A block of lines describing one atom on each line: type of the fragment, index (numeric), symbol of the atom (1-3 characters), two other columns, followed by the Cartesian coordinates (x, y, and z)
• A block of lines describing on each line the topology for one atom: the atom index followed by the indices of the atoms connected with it
A more compact format, HIN, gives on the same line both the topology and the geometry for each atom:
• Between the mol number and endmol number lines, one atom on each line, having in the second column the atom index, in the fourth the atom symbol, from column 8 to 11 the Cartesian coordinates, in column 12 the number of bonds, followed (starting with column 13) by each bond on two columns (atom index, bond order)
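The SDF layout described above can be turned into an edge list with a short reader, sketched here under stated assumptions: a single record in the fixed-width V2000 molfile layout (3-character count and bond fields, element symbol in columns 32-34 of an atom line), no error handling, and a hypothetical function name `read_sdf_graph`. Bonds to hydrogen are dropped, in line with the heavy-atom graphs used for molecular topology; atom indices are not renumbered afterwards.

```python
def read_sdf_graph(text):
    """Return (atom symbols, heavy-atom edge list) from one V2000 SDF record."""
    lines = text.splitlines()
    counts = lines[3]  # fourth line: counts
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    # Atom block: x, y, z in three 10-character fields, then the symbol.
    atoms = [ln[31:34].strip() for ln in lines[4:4 + n_atoms]]

    # Bond block: first atom, second atom, bond order (3 characters each).
    bonds = [(int(ln[0:3]), int(ln[3:6]), int(ln[6:9]))
             for ln in lines[4 + n_atoms:4 + n_atoms + n_bonds]]

    heavy = {k for k, symbol in enumerate(atoms, start=1) if symbol != "H"}
    edges = [(i, j) for i, j, _order in bonds if i in heavy and j in heavy]
    return atoms, edges
```

The returned edge list, together with the number of atoms, is the compact representation discussed in the section on graphs and their representation.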

Appendix B. Algorithms
For the sole purpose of molecular topology, the structure of interest is the reduced one containing only the heavy atoms (heavier than hydrogen), and for those atoms the list of bonds (connectivities) is typically collected. The description of the algorithms providing molecular topology tools starts from a list of entries describing, for each atom (now vertex), its bonds (now connections, edges). Let us consider that the molecule is already kept in memory as a graph in tabular form, as in Table A1. The first step is to obtain the vertex adjacency matrix (Algorithm A1), and then the rest of the matrices (see Algorithm A18).