Fast Spectral Approximation of Structured Graphs with Applications to Graph Filtering

: To analyze and synthesize signals on networks or graphs, Fourier theory has been extended to irregular domains, leading to a so-called graph Fourier transform. Unfortunately, different from the traditional Fourier transform, each graph exhibits a different graph Fourier transform. Therefore to analyze the graph-frequency domain properties of a graph signal, the graph Fourier modes and graph frequencies must be computed for the graph under study. Although to ﬁnd these graph frequencies and modes, a computationally expensive, or even prohibitive, eigendecomposition of the graph is required, there exist families of graphs that have properties that could be exploited for an approximate fast graph spectrum computation. In this work, we aim to identify these families and to provide a divide-and-conquer approach for computing an approximate spectral decomposition of the graph. Using the same decomposition, results on reducing the complexity of graph ﬁltering are derived. These results provide an attempt to leverage the underlying topological properties of graphs in order to devise general computational models for graph signal processing.


Introduction
Graphs, as combinatorial objects, are important in many disciplines. They allow us to capture complex interactions between different entities (elements). Due to this ability, they have found applications in a plethora of fields, spanning from biology to psychology, and from logistics to medical sciences. Additionally, interest has emerged in adapting classical signal processing methods to deal with signals in irregular domains, e.g., signals defined on graphs [1,2]. As for traditional signal processing, the fundamental tool of graph signal processing is its analogous Fourier transform: the so-called graph Fourier transform (GFT). This transform interprets a signal on a graph through its graph modes. The concept of graph modes can be understood by considering the Laplacian matrix of a graph as a discrete representation of a manifold. The so-called graph modes are then the discrete counterpart of eigenfunctions of the Laplacian operator over the continuous manifold. Therefore, the graph modes can can be identified as the eigenvectors of the Laplacian matrix and can be obtained through its eigendecomposition. Here, we remark that although the analogy connecting with manifolds employs the Laplacian, the concept of GFT holds for different matrix representations of the graph such as the adjacency matrix or normalized Laplacian [3]. For instance, let us consider an undirected (possibly weighted) graph G = (V, E ) defined on n vertices with vertex set V = {v 1 , v 2 , . . .} and edge set E and let A and D = diag(A1) be its (weighted) adjacency and degree matrices, respectively. Here, [A] i,j > 0 only if (v i , v j ) ∈ E . Then, selecting a matrix representation of the graph, S, e.g., Laplacian, normalized Laplacian, adjacency matrix, we can define the graph Fourier transform of an n-dimensional signal x over G as [3] x f = U T x, (1) where U is the orthogonal eigenvector matrix or the so-called graph modes obtained from the eigendecomposition of S, i.e., S = UΛU T , and Λ is the diagonal matrix of eigenvalues, λ = {λ 1 , . . . , λ n } also known as graph frequencies.
Although there are several alternatives to select the matrix S, we follow the formalism of [4] and use the eigenvalues of the normalized Laplacian, i.e., L n := I − D −1/2 AD −1/2 , to define the spectrum of G throughout this paper. Thus, the graph modes are then defined as the eigenvectors of L n . We make this consideration here as the normalized graph Laplacian's eigenvectors define generalized translation and modulation operators, see, e.g., [3], and its spectrum relates well with graph invariants because its definition is consistent with those of eigenvalues in spectral geometry and stochastic processes [4], which is not the case for the spectrum of other choices of S.
As noted in [5], one of the main issues of performing the GFT is that a full eigendecomposition is required. This naturally incurs a O(n 3 ) cost, which for many applications is undesirable or even infeasible. Therefore, before aiming to achieve a complexity comparable to the fast Fourier transform (FFT), we require to obtain a better understanding of the eigendecomposition of L n . In this context, a series of research works propose approximate fast GFT methods [5][6][7]. These methods employ Givens rotations and truncated Jacobi iterations to compute an orthogonal basis that approximately diagonalizes the matrix representation of the graph [8]. However, their construction relies on the assumption that Givens rotations, despite not being dense in the space of general orthogonal matrices, approximate well the matrix U. In addition, the main aim of these works is to obtain a fast GFT and not a fast graph spectral decomposition. That is, their focus is not on finding a basis for the transform efficiently, but to perform the action of the basis as fast as possible. Furthermore, no structure, as for example an excluded minor [9], is leveraged in those works. So, despite that these approaches address the problem, they fail to make use of particular structures (combinatorial in nature) that might improve the analysis and performance for restricted (but practical) families of graphs. Therefore, in this work, we aim to provide a topological approach to devise a fast graph spectral decomposition, i.e., a fast estimation of both Λ and U, as a first step towards a fast GFT. Here, we argue that for some families of graphs there is a structure that can be leveraged to obtain divide-and-conquer algorithms for fast computation. In addition, we provide asymptotic theoretical bounds for the approximation of the spectrum of a graph up to any recursion level. Although the main aim of this work is to provide an approximate method for computing the spectrum and the modes of a graph, we provide theoretical bounds on the quality of this decomposition for other tasks such as graph filtering. These results open a venue, and describe a possible landscape, for further research on graph partitioning for graph filtering and hierarchical graph filtering.

Preliminaries
In this section, we introduce the notation that will be used throughout the rest of the paper as well as related works within the field of graph theory and graph signal processing (GSP). In addition, the main contributions of this work and the outline of the paper are presented.

Notation
We use calligraphic letters to denote sets with the exception of G and H that are reserved for representing graphs. We use | · | to denote the cardinality of a given set. deg(v) denotes the degree of vertex v. Upper (lower) case boldface letters such as A (a) are used to denote matrices (vectors). (·) T represents transposition. · represents a generic norm. The use of a particular norm will be clarified in the subscript. Throughout the paper, asymptotic notation is used. Therefore, when we use O(·) and o(·) we always refer to big-O and little-o notation, respectively. They must be read as "is bounded above by" and "is dominated by", respectively. The notationÕ(·) is used to hide a factor of at most (log log n) 4 . In addition, Ω(·) denote big-Omega and it must be read as "is bounded below by". All these symbols describe the asymptotic behavior and hence represent the behavior in the asymptotic regime.

Prior Art
Besides the previously cited works that aim at finding a fast GFT, there are other works that closely relate to the existence of a fast scheme for computing a good graph spectrum approximation.
A close research area is the one of graph sparsification. Results in this area are widely used for solving large-scale linear systems involving Laplacian matrices [10]. In particular, graph sparsification and support theory [11] provide ways of finding optimal subgraph preconditioners for linear systems [12]. Another related area is the one devoted to graph coarsening. Here, instead of reducing the number of edges, reduced-size graphs are employed to approximate the action of a given graph [4]. In this context, a different interpretation of the well-known heavy-edge matching heuristic [13] for graph coarsening has been recently introduced in [14]. Similarly to our objective, the idea in [14] is to reduce the computational load of computing certain operations, e.g., quadratic forms, by introducing a trade-off between complexity and accuracy. So, while [14] aims to reduce the number of nodes involved in the computation of certain operations through stochastic arguments, here we aim to break down the graph in subgraphs to provide a trade-off in terms of accuracy and complexity but preserving the whole spectrum and not a subset of it. Furthermore, we give guarantees that are mostly deterministic, in contrast to the ones based on stochastic arguments in [14] leveraging the restricted isometric property [15].
Another related area is the one of graph signal downsampling [16]. Despite that this area is not directly connected to the goal of estimating the spectrum of a graph, specific combinatorial properties of the graph are exploited as in our work. For instance the symmetric property of the spectrum of bipartite graphs has recently been exploited to provide a theoretical framework for graph signal downsampling. Furthermore, using graph coloring ideas [9], this theory has been extended to arbitrary graphs with the aim to define wavelet filter banks [17]. Similar to these works, here we aim to take advantage of the combinatorial, algebraic and geometrical properties of certain families of graphs to provide useful simplifications of different graph signal processing tasks. One main difference between the bipartite-based approaches and our work is the fact that we focus on graphs which can be represented by (approximate) block diagonal matrices, whose off-diagonal blocks have few entries, e.g., o(n), without requiring bipartite graph approximations. Our approach follows the spirit of those employed in domain decomposition methods for parallelization of linear system of equations ( [18] Ch. 13.6), where graph partitions methods are used to distribute different part of the problem to the available processing units and later the solutions of the subproblems are aggregated to build the solution of the original problem.
Finally, the ideas presented in this work share similarities with decomposition methods used for different applications within signal processing and numerical linear algebra. For example, within GSP, multiscale (hierarchical) analysis modes have been proposed using recursive graph partitions based on the eigenvectors of the graph under study [19][20][21]. Similarly, clustering methods have been used for fast multiresolution matrix decomposition, see, e.g., [22][23][24]. In these works, the columns of the matrix to be decomposed are collected in disjoint clusters, forming a block-diagonal matrix approximation, to speed up the computation of each decomposition level. However, while these methods provide powerful analysis tools, they do not address our problem of interest: graph spectral approximation.

Contributions
In this paper, we introduce results for the approximation of the spectrum of structured graphs. We focus on families of graphs that not only have good combinatorial properties allowing for a good and efficient approximation of their spectrum but are also relevant for practical applications such as spectral sums, e.g., log determinant, trace of inverse, or spectral histogram approximation. The major contributions of this work are the following: • We provide an overview of families of graphs that accept good graph separators, i.e., graphs that can be separated into roughly equal subgraphs. Also, we discuss the characteristics of such families and how they relate to practical applications.

•
Restricting ourselves to such graph families with good recursive graph separators, we propose a hierarchical decomposition for such families of graphs for approximate graph spectrum estimation.

•
We propose a conquer mechanism to stitch back together the pieces of the hierarchical decomposition for approximate graph mode estimation.

•
We provide a theoretical analysis, leveraging current results on graph spectrum similarity, for the proposed hierarchical decomposition. We derive asymptotic bounds for both the accuracy of the graph spectrum approximation and the computational complexity of the proposed method using the hierarchical decomposition. Through numerical simulations, we show that in practice the approximation of the graph spectrum, using such a decomposition, obeys the derived bounds.

•
We employ our results to applications commonly found in the field of graph signal processing.
In particular, we derive bounds for the accuracy of approximate graph filtering using the proposed decomposition for the considered families of graphs. Despite that this result shows that a straightforward application of decomposition does not provide an efficient approximation, it sheds light on the reach of such an approach. In addition, we show that the cumulative spectral density of a given graph, commonly employed for the design of graph filter banks, can be properly approximated using the proposed decomposition. This differs from other approaches used in practice which do not provide any approximation guarantees.

Paper Outline
The remainder of the paper is organized as follows. In Section 3 we introduce some necessary background of topological graph theory and graph separators. In Section 4, an approximation guarantee for the spectrum of a graph given its bisection is provided, while in Section 5 the divide-and-conquer algorithm and theoretical guarantees on its performance are described. Section 6 corroborates the performance of the proposed method through numerical experiments. Section 7 discusses the impact of the graph partitioning scheme on graph filtering and provides theoretical guarantees. Finally, Section 8 concludes the paper and indicates future research directions.

Topological Graph Theory
Topological graph theory is a field within mathematics that studies the spatial embeddings of graphs, as it considers graphs as topological spaces [25]. From a computational perspective, results in algorithmic topological graph theory combine methods from computational geometry and data structures with classical methods from combinatorics, algebraic topology, and geometry. The recent advances in this area have led to algorithms applicable in many different areas of computer science. As results within this area are extensive, and several of them are outside the scope of this work, we refer the interested reader to [26] and references therein for an overview of this field.
In this work, we mostly focus on results from graph separators for particular families of graphs. Such graph separators provide partitions of graphs with desirable properties in terms of size and graph topology. Before presenting how to use these partitions to devise algorithms for estimating the spectra of a graph, we present a brief introduction to graph separators.

Graph Separators
Let us consider an undirected graph G = (V, E ), where V and E denotes its vertex set and edge set, respectively. In many instances, we desire to find a partition {P i } Q i=1 of the vertices in G, i.e., V = ∪ Q i=1 P i , such that the members of such groups are balanced, i.e., |P i | ≈ |P j |, ∀ i, j ∈ {1, 2, . . . , Q}, and either (i) the number of edges connecting the different partitions is minimized or (ii) the number of vertices that are needed to be removed to disconnect the graph is minimized. A common partition strategy is based on the field of graph separators [27].
In general, there are two kinds of graph separators: vertex separators and edge separators. In the case of vertex separators (see Figure 1), the vertices V are partitioned in three sets A, B and C. The goal of this partition is to disconnect A and B while minimizing the number of vertices in C and maintaining a balance between A and B, i.e., |A| ≈ |B|. Since there are no edges between A and B, vertex separators are also known as bisectors or bifurcators.
For the case of edge separators (see Figure 2), the vertices are partitioned in two balanced sets, A, and B such that the number of cut edges, i.e., edges between the two sets, is minimized. As there are (possibly) many ways to separate a graph, we first make the following remarks: (i) Finding optimal graph separators is hard.
Graphs whose graph separators are sublinear in size are considered as good graph separators. That is, the number of vertices (edges) that needs to be removed to partition the graph in balanced sets is o(n).
(iii) Graphs with good recursive graph separators are graphs whose resulting partitions have good separators themselves. For instance, if the class S of graphs has good graph separators, and S is closed under vertex (edge) deletion, then the graphs in S have good recursive graph separators. Now, we clarify further these remarks. We make the distinction between graph separators and good graph separators due to the fact that we are interested in graphs that can be partitioned in balanced sets by removing just a few vertices (edges). Since algorithmic reasons dictate properties that can be propagated to the subsequent partitions, we stress the importance of graphs with good recursive separators. This is important for the design of divide-and-conquer algorithms or for parallel processing because it implies the possibility of generating increasingly small graphs by recursively splitting the problem into many subproblems. Therefore, identifying structured graphs accepting good recursive separators are of high importance from an algorithmic perspective. Finally, despite the discouraging result (i), fortunately, there are types of graphs that have good separators [cf. (ii)] and there exist efficient algorithms for finding them.
As a last comment, let us discuss briefly the relationship between vertex and edge separators. An important result, linking vertex and edge separators, is that a good edge separator for a graph G automatically provides a good vertex separator for G given that the restriction on the balance of the sets is not very strict. Although the converse of this statement is not true in general, for bounded degree graphs (as is the case for most graphs encountered in real applications) it can be shown that the definition of an edge separator and vertex separator is equivalent up to a constant. Therefore, good vertex separators can be obtained from good edge separators (and vice versa) for such types of graphs. As in this work we are interested in partitioning a graph into disjoint sets, we mostly focus on edge separators. However, unless it is explicitly stated, throughout this work when we refer to graph separators we do not further make the distinction between edge or vertex separators.

Structured Graphs with Good Graph Separators
Let us introduce the following definition by Lipton and Tarjan [28].

Definition 1. [(α, β)-separator]
A class S of graphs satisfies the f (n)-separator theorem for constants α < 1 and β > 0 if every n-vertex graph in S has a vertex partition A ∪ B ∪ C such that and no edge has one endpoint in A and the other in B. Here, f (n) denotes a function, e.g., polynomial, of n.
From the definition of (α, β)−separator, families of graphs with α ≈ 1/2, f (n) ∈ o(n) and that are closed under bisection, i.e., the induced graphs by the subsets are contained in the family itself, are the main candidates for families of graphs with good recursive separators.
Although many sparse graphs do not have a nontrivial separator theorem in terms of Definition 1, there are many families that do. Examples of families that have good separators and are closed under bisection are planar graphs [28], e.g., graphs related to meshes and computer graphics, finite element graphs [28], e.g., graphs arising from tesselation of the space, and geometric graphs [29], e.g., k-nearest-neighbor graphs, sensor networks. Fortunately, these graph families enclose graphs that we encounter in typical (graph) signal processing applications such as field estimation, distributed sensor networks, smart grids, to name a few. For example, in the network of connections of a person, the sets depicted in Figure 1 could represent their family and close friends (A), their coworkers or professional network (B) and their acquaintances which interface with both networks (C). In this setting, it is sensible to assume that the number of people that interface with both subnetworks is small as in most cases these two domains barely intersect. Therefore, we can be encouraged to think that, similar to the fast Fourier transform (FFT), for graphs within these families we can devise algorithms for problems in GSP leveraging graph separators. For more details on these and other graph families, the reader is referred to Appendix A where more examples are the different families are explained in detail.

Approximate Graph Spectrum
Now that we know there exist graphs with good (recursive) separators, what is left to answer is the following: how well can we approximate the spectrum of a graph given a graph separator?
To address this question, we introduce our first result.
Theorem 1. Let G = (V, E ) be a graph for which a (α, β)-separator exists. Further, consider H = (V, E ) as the graph induced by removing the edges that connect the partition elements of G obtained by the (α, β)-separator.
where λ and λ are the spectra of G and H, respectively, and W 1 (λ, λ ) is the Wasserstein distance between the discrete spectra λ and λ .

Proof. See Appendix B.
This result motivates the use of graph separators for graph spectrum approximation. It shows that for families of graphs with good separators, i.e., f (n) ∈ o(n) and a maximum degree independent of the size of the graph, e.g., in a road map the number of highways connecting cities does not increase with the number of cities, we can obtain a good approximation of the spectrum of G through the spectrum of H. The main advantage of computing the spectrum of H instead of the spectrum of G is that the spectrum of H is disjoint. That is, its spectrum is the union of the spectra of its components, i.e., the matrix representation of H is a block diagonal matrix (after node permutation). This leads to a divide-and-conquer approach for approximating the spectrum of G through the one of H. By computing the spectrum of the reduced-size graphs induced by the partition, and stitching the result afterwards (conquer), an approximation of the spectrum of G can be obtained. In the following, we exploit this idea to provide an algorithm, as well as theoretical guarantees, for approximate fast graph spectrum computation of families of graphs that accept good graph separators. Furthermore, we also extend this method for families of graphs with good recursive separators which accept a hierarchical decomposition.

Divide-And-Conquer for Fast Graph Spectrum Estimation
Conceptually, the proposed divide-and-conquer approach is straightforward. Starting from a large graph, we perform, recursively, bisections, i.e., partition the graph in two subgraphs, until the subgraphs are of a manageable size, i.e., computations are affordable. Then, the spectra of the reduced-size graphs are obtained and later stitched bottom-up until the top level is reached. This approach is analogous to the divide-and-conquer approach used for computing the spectrum of tridiagonal matrices [30]. In this particular case, it is easy to see that the "graph" can be split directly in the "middle", i.e., in the middle of the rows and columns. The two parts are approximately disjoint, hence after computing the spectrum of each part the two parts can be stitched to obtain an exact spectrum. Notice, that although we use the term "exact", a more appropriate term is " -close" spectrum. The divide-and-conquer (DC) method is summarized in Algorithm 1 and an illustration of the approach is shown in Figure 3. In this figure, it is shown how a binary tree is generated by recursive bisections until the desired depth is reached (divide). Afterwards, the spectrum is constructed at each parent node, in a bottom-up fashion, by the union of the spectra of its children (conquer).
Input: Graph G, function for computing eigenvalues h : S → λ for graph family S, and depth d.
Result:λ : Approximate spectrum of G Notice that in Algorithm 1 there are several things left undefined: (i) the function h which maps a graph to its spectrum, e.g., standard eigenvalue decomposition, (ii) the depth of the binary tree and the (iii) merge routine, e.g., union of elements. Although in principle, the method can be applied for any given functions h, merge, and depth d, we decided to keep the algorithm description as general as possible due to the fact that these three free parameters impact the method in the following ways.
(a) The function h directly affects the computational complexity as it is the core operation of the algorithm.
In addition, if h is not an exact algorithm but a randomized or approximate method, then it will impact as well the quality of the final approximation. (b) The merge function is the conquer step that stitches back the solution of the leaves. As this function is called at every non-leaf node, dealing with increasing-size arguments (traversing the tree upward), it must be a low-complexity routine to not blow up the computational complexity of the method. (c) The depth of the tree, i.e., the number of recursive bisections, affects both the computational complexity and approximation quality. As each bisection worsens the approximation, deeper binary trees (theoretically) worsen the quality of the approximated spectrum. In addition, as the depth parameter controls the base case of the recursion for applying h, its value also impacts the final complexity of the algorithm.
· · · · · · · · · · · · · · · · · · λAλB λ λA 1λA2 An interesting property of this method is that it can be extended to other kinds of tasks. That is, this divide-and-conquer method can be used in all applications involving spectral properties of large graphs. For instance, we could consider applications where we desire to test if there exists an eigenvalue within a certain interval in a very large graph. By following this approach, if the graph under test falls within one of the previous families, we can test this property on the reduced-size graphs. However, there is always a trade-off between approximation quality and complexity, i.e., higher bisection orders worsen the performance derived from Theorem 1.

Accuracy-Complexity Trade-Off
Until this point, we only have provided theoretical guarantees for a graph decomposition with a single level, i.e., depth d = 1. In the following, we introduce the main theorems of this work with respect to both the quality and complexity of the DC approach for graph spectrum approximation. For this analysis, we consider the case where the merge routine solely sorts its input, incurring a complexity of O(n log n) for inputs of size n/2 each.
Before proceeding, we first introduce the following definition.

Definition 2. [Linearly separable graph family]
The set S c of graphs for which a O(n c )-separator theorem exists, with c < 1, and whose subgraphs are also contained in the set, is called a c-separable graph family. Furthermore, if the separator is efficiently computable in (approximately) linear time then the set is a c-linearly separable graph family and it is denoted as S l c .
For the sake of simplicity, we restrict our results to the family of graphs S l 1/2 . This family includes planar graphs, geometric graphs in 2-D, and graphs with a restricted minor [9]. However, the results hold for any c-separable graph family as long as the exponents are modified appropriately. In the following, the main results with respect to approximate graph spectrum estimation are presented. These results provide the following two insights. First, for large-size graphs the approximation is tight, i.e., it vanishes as n grows. Hence, the proposed method is asymptotically efficient in the size of the graph. Second, the time complexity of the method is bounded below by the so-desired O(n log n) complexity, and above by the (naive) cubic complexity, O(n 3 ). Therefore, as pointed out in Corollary 2, a good compromise between the decay rate of the error and complexity can be achieved when we fix δ to 1/4.
Although a discussion about parallelization is not the main aim of this work, we want to stress that, as most of the divide-and-conquer approaches, the proposed method is parallel friendly. That is, as each of the subproblems (at the leaves) are disjoint, they can easily be parallelized providing a significant reduction in the time complexity. However, this analysis is outside the scope of this work and it does not provide any further insights into the proposed method.

Eigenvectors Computation
Although we have mainly focused on the computation of the spectrum of the graph, for the sake of completeness, in the following, we present some theoretical results that provide insights on the approximation quality of the graph eigenvectors. These results motivate the later discussion of the landscape for the approximation of graph filters in Section 7.
To bound the error of the eigenvectors obtained through the proposed decomposition, we make use of the Bauer-Fike theorem [31] and provide the following result.
Theorem 3. Let S be the matrix representation of graph G. Further considerŜ as the matrix representation of the graph H obtained from G at depth d. Ifû is the eigenvector ofŜ with eigenvalueμ, then there exists an eigenvector u of S, with eigenvalue µ, such that where γ := 1 n S −Ŝ 2 F .

Proof. See Appendix D.
Despite that the above result shows a vanishing behavior in the normalized Euclidean distance between the approximated and true eigenvectors if γ ∈ o(n), it also shows that for highly clustered eigenvalues, the approximation error can be large. This result is not surprising as all eigendecomposition methods present difficulties for defining the eigenspace of highly clustered eigenvalues.
A possible way to improve the approximation error for the eigenvectors is through iterative refinement. Here, we will use the inverse iteration method to show that it is possible to perform improvements in near-linear time. The result is presented in the following theorem.
where the normalization constant c k is appropriately chosen, produces an eigenvector estimateû k+1 such that where µ is the closest eigenvalue of S toμ and µ close is the closest eigenvalue of S to λ. Moreover, such anû k+1 can be computed in expected timeÕ(m log 2 n log 1/η) at η-accuracy when S is the combinatorial Laplacian. For cases where S is the graph adjacency matrix, the complexity result holds for eigenpairs withμ ≥ max v∈V deg(v).

Proof. See Appendix E.
Although the proposed improvement of the approximation quality of the eigenvectors is based on an iterative method, the inverse iteration method tends to convergence in few iterations (typically one) when the approximate eigenvalue is close to the true one. Finally, we stress that the previous result is not the tightest in terms of computational complexity. If further structure of the Laplacian is known, e.g., planarity, faster solvers [32] can be employed to speed up the computation of the inverse iteration method. Notice that although the power method could have been used to compute the modes of the graph, having available O(1/n δ )-approximations for all eigenvalues leads to an overall reduction of the computational effort.

Experimental Results
To illustrate the performance of the proposed decomposition, in the following we perform a set of numerical experiments. In these numerical examples, we mostly focus on the family S l 1/2 as it includes most of the commonly encountered graphs within the field of graph signal processing, e.g., meshes, sensor networks, etc. For these experiments, we make use of the graph separator toolbox [33].

Depth of Hierarchical Decomposition
To validate the theory, we first perform numerical simulations for graphs in S l 1/2 of fixed dimension for different decomposition depths. Here, we generate a set of 100 random S l 1/2 graphs with n = 1000 nodes. The model to generate such graphs follows a unit disk model (see Appendix A). The model to generate such graphs follows a unit disk model (see Appendix A). Here, we generate n 2D points in [0, 1] × [0, 1] and establish edges between points pairs whose distance is below a fixed threshold. For this simulation, we have set the threshold to 0.1.
From Figure 4a we can observe how the average of the normalized approximation error, i.e., e n = λ −λ 2 / λ 2 follows the expected behavior from the developed theory. That is, there is a rapid increase of the approximation error for higher decomposition orders, i.e., increasing the depth parameter. From this plot, it can be observed that when the depth parameter is fixed to d * = 0.5 log 2 (n) = 4, the theoretical normalized approximation error is below 20% while the empirical error is below 10%. This discrepancy is due to the worst-case analysis for the derived error and the omitted additive and multiplicative constant hidden by the asymptotic notation. This is the depth at which the method provides a good trade-off between computational complexity and spectrum approximation accuracy as described in the theoretical analysis.

Asymptotics for Graph Size
Similarly, we perform an analysis to evaluate the behavior of the proposed method in terms of the graph size. Here, we set a fixed depth of d = {4, 5, 6} and perform a series of experiments for graphs of different sizes within the family S l 1/2 . For each graph size, we generate 100 random graphs and the average normalized approximation error, e n , is reported in Figure 4b. As before, the trend of the normalized error follows the expected behavior provided by the theoretical analysis. That is, for graphs with increasing number of nodes, the quality of the approximation improves. This result illustrates the asymptotically efficient behavior of the proposed method for spectrum estimation using the hierarchical decomposition for these families of graphs.

Time Complexity
At this point, we have shown that the theory holds for the approximation quality of the graph spectrum. However, one of the appealing reasons to use this approach is the possibility of making a trade-off between spectrum accuracy and computational complexity. To show the behavior of the proposed method, in Figure 5 we plot the time required by the built-in MATLAB function eig(·) for different graph sizes and compare it with the time required to obtain the approximate spectrum with the method introduced in this work. Clearly, the time required by the MATLAB function follows the expected complexity for eigenvalue decomposition. However, the proposed approximation method does not exhibit such a rapid increase for the different tested graph sizes. For this simulation, the depth decomposition is set to d = 0.5 log 2 (n) as this is the depth value that provides a good trade-off between computational complexity and approximation accuracy. The time reported for the proposed method includes (i) the time for the decomposition, (ii) the time for computing the eigenvalue decomposition at the leaves and (iii) the time for sorting the eigenvalues.
The combination of these results shows that both in theory and practice the proposed method delivers good and feasible approximations for the spectrum of structured graphs. This opens a venue for approximate signal processing techniques that leverage the underlying combinatorial structure of the graph.

Real Example: Minnesota, a Close-To-Planar Graph
To further illustrate the applicability of the proposed method for another choices of S, we use a well-known graph, the Minnesota graph [34], and the combinatorial Laplacian to show that this method can be applied to real graphs that approximately meet the conditions discussed in previous sections. It is known that the Minnesota graph is not a planar graph as it has a subgraph isomorphic to K 3,3 , the complete balanced bipartite graph on six vertices [35]. However, through this example, we show that it is possible to obtain an approximate spectrum, and its cumulative spectral density. As the Minnesota graph has n = 2642 nodes, we set a decomposition depth of d = 5. Using the geographic coordinates (as each node is located in a 2D space) we perform the hierarchical decomposition by recursively divide the graph using geometric partition [29]. This partition provides us with a good separator at every depth and its computation requires linear effort, i.e., O(2n). From the approximate spectrum, we compute the cumulative spectral density [36] using the expression where 1 {λ l ≤z} denotes the indicator function which takes a value of one whenever λ l ≤ 0 or zero otherwise.
The comparison between the true spectrum and cumulative spectral density with their approximations is shown in Figure 6. In addition, in Figure 6a we have color-coded the different subsets of nodes that belong to each of the elements of the partition. For this particular graph, the built-in MATLAB function eig(·) takes approximately ≈ 9 s for computing the matrix eigenvalues, while the method based on the hierarchical decomposition only takes ≈ 0.3 s. Finally, we show in Figure 7 some measures related to the quality of the approximated eigenvectors,Û, of the graph. Particularly, in Figure 7a, we show the absolute value of the inner product between the approximate and true graph eigenvectors. Here, we observe a banded structure whose bandwidth decreases for eigenvectors related to larger eigenvalues. Simiarly, in Figure 7b, we observe that the approximate eigenbasis almost diagonalizes the original graph Laplacian. For completeness, we show in Figure 7c a comparison in terms of the structure, i.e., Laplacian matrix support, of the original graph (red and blue dots) after permutation based on the partitioning and the block-diagonal approximation (blue dots).

Approximation of Graph Filtering
The main workhorse of graph signal processing is graph filtering. In many instances of the GSP field, we appreciate the distributable capabilities of a graph filter. However, this term is only applicable when the communication graph matches the data graph. When all the data resides on a single location, e.g., centralized data management, a distributed graph filter does not necessarily lead to an efficient distributed implementation, i.e., a parallel-friendly computation. As noticed in [5], the main bottleneck of the polynomial graph filter implementation is the queuing effect when performing local computations, i.e., performing a dense matrix-vector multiplication might end up more efficient than sequential sparse matrix-vector multiplications (queuing effect). In [5], the authors propose to compute support-disjoint sparse matrices to perform an approximate GFT. This somehow effectively alleviates the queuing problem as the different matrix-vector multiplications can be carried out in parallel. However, despite that this approach provides a parallel-friendly model for data processing, it still requires a shared memory location, i.e., the matrices operate over the full data vector regardless of the matrix support.
Data most of the time is geographically restricted. That is, the transmission and storage of data are usually constrained by the distance to the source. For example, although it is possible to mirror all possible information from different networks in different geographical regions, their storage is often local, i.e., the most accessed data is usually related to the local region (the last use/generated-first fetch principle for caching). Therefore, it makes sense to keep processing local as well and establish a hierarchical processing chain to minimize both the data management and communication protocol complexity. Luckily, the previously discussed recursive graph partition provides an alternative to deal with data processing in such domains. This is due to the fact that geographically restricted data, usually spawn from social relations or sensor networks, can be (approximately) modeled using planar or geometrical graphs which are structured graphs admitting good recursive graph separators. Examples for such networks can be found in traffic networks or range-limited transmission wireless networks, as discussed in Section 3.
To leverage the recursive decomposition presented in Section 5 for fast approximate graph filtering, we propose to use this hierarchical graph decomposition to compute disjoint graph filters that can be solved locally within the localized node clusters given by the partition. More specifically, we can state this in the following way. Considering a graph G, and a hierarchical decomposition T [cf. Figure 3]. We approximate the action of a desired shift-invariant graph filter [37], H, defined by a set of K polynomial coefficients {α k } K k=0 and a graph matrix representation S, e.g., graph shift-operator in GSP [1] using the structure in T . That is, we approximate y = Hx with where H t and x t are the shift-invariant graph filter and the graph signal partition associated with the leaf t of the partition T , respectively. Here, the simplest merge operation could be just the union of the result of the disjoint node partitions, however, more complex merging routines are possible.
In the following theorem, we provide the first result in approximate graph filters by means of graph bisection for the union merging routine and for a generic matrix representation of the graph, S, typically known within GSP as the graph shift operator, see, e.g., [2]. Theorem 5. Let G be a graph, S its matrix representation satisfying S 2 < 1 and H = ∑ K k=0 α k S k a graph filter defined on G such that Hx 2 = x 2 , where x is a signal defined over the nodes of G. Then, the block diagonal matrixS, obtained by the bisection of G and removal of the connections between the connected components, can implement a graph filterH = ∑ K k=0 α kS k , over the induced graphG such that where K is the order of the graph filter and S −S 2 .

Proof. See Appendix F.
This result shows that even for graphs with good partitions, there is no straightforward asymptotic efficient approximation for graph filtering. However, it encourages its use for the families presented in this work as the growth rate of the error is not fast. That is, by inspection it can be seen that is directly proportional to the graph edit distance, i.e., if S is considered a scaled adjacency matrix then can be directly upper bounded with the Frobenius norm which translates to the edit distance. Hence, for the family S l 1/2 the error grows at most following a square-root law in the size of the graph. Numerical experiments in the next section illustrate this behavior.
From the result of Theorem 5, where a single bisection is considered (d = 1), we can expect that for higher order recursions, i.e., decomposition depth, the graph filter approximation quality degrades further. The reason for this issue is that the difference between the approximation error of the matrix representation of the graph only worsens with increasing decomposition depth. Hence, a stitching method is required in order to improve the approximation at the cost of increasing the communication and implementation cost of the method. This is left for immediate future research, where the low-rank structure of the off-diagonal blocks, guaranteed by the separator theorem, can be leveraged to obtain a better merging strategy than only working on the leaves of the decomposition.

Results for Approximate Graph Filtering
In the previous discussion, it was shown that the approximation quality for graph filters is O( ), where is related to the difference between the matrix representation of the graphs. For the graph family S l 1/2 , this parameter has an order given by O( √ n). Hence, it is expected that the error, for different filter orders, increases following a square-root law. In the following numerical experiments, we illustrate this behavior for approximating a heat kernel filter, i.e., h(λ) = e −τλ/λ max , where τ = 10 and λ max is the maximum eigenvalue of the matrix representation of the graph. All the filters evaluated are implemented and designed using the GSP Toolbox [38]. For these experiments, we evaluate the performance of the approximation in terms of the normalized error e n = Hx −Hx / Hx , whereH is the approximate graph filter as described before. Here, we have considered four types of inputs: (i) uniform distributed signals with entries in the range [0, 1] (positive/uniform), (ii) zero-mean normalized Gaussian distributed signals (Gaussian), (iii) binary signals (Binary) and (iv) bipolar signals with entries in {−1, 1} (Bipolar). In Figure 8, the results for these comparisons are shown. From Figure 8a, we can observe the square-root law for the increase on the approximation error as described in the theory. Here, we performed the comparison using a filter order K = 30 for both the true and approximate filter. The increasing trend of the error is only reflected in the Gaussian and Bipolar signals as those kinds of signals are zero mean asymptotically. For the Positive/Uniform and Binary signals, the error decays sublinearly as the size of the graph increases. This is due to the fact that the mean term of the signal dominates the expression of the error. In Figure 8b,c, a comparison for different filter orders is shown. Here, it can be seen that the error remains constant across different filter orders. This result is due to the fact that the tested filter is sufficiently smooth for obtaining good approximations with low-order Chebyshev polynomials. The trend observed in Figure 8a is again observed for the different filter orders.

A Note on Graph Filter Bank Design
Motivated by the current developments in graph filtering [39][40][41][42][43], there has been a large effort in defining (and designing) filter banks for the task of filtering signals in graphs [17,44,45]. Although there are many alternatives for designing filter banks in general, as remarked in [46], a guiding principle when designing filter banks is to identify gaps in the spectrum of the matrix representation of the graph, e.g., normalized Laplacian or adjacency matrix. For identifying such gaps one requires to have access either to the full spectrum, which for large-size graphs might not be feasible. Therefore, an approximation method for identifying gaps in the spectrum of such matrices is required.
Here, is where our method provides an approximate solution for designing graph filter banks. In this sense, there are two possible ways to use the proposed method. The first is merely a direct application of our results and estimates the spectrum of the graph by means of the introduced hierarchical decomposition. The second consists of estimating the cumulative spectral density function of the graph [c.f Equation (7)].
For approximating such a function, a widely used method is the kernel polynomial method (KPM) [47]. This method has been adapted in [46] using a stochastic estimator of the trace of a matrix which incurs an overall complexity of O(KJ|E |), where K and J are the order of the filters used for approximating the Heaviside function and the number of random samples required by the method, respectively. For non-sparse graphs, i.e., |E | = Ω(n), this approach might not be appealing as its complexity is not linear anymore in the size of the graph and the number of random samples required by Hutchinson's method for estimating tr(A), for any given matrix A, is known to require J = 6 −2 ln( 2 δ rank{A}) samples for an error of at most with probability δ when Rademacher distributed vectors are employed [48]. Hence, in such cases, the hierarchical decomposition, with deterministic bounds, could be used as an alternative to compute a fast approximation for the cumulative spectral density of the matrix representation of the graph. In this case, using the result of Theorem 2, we can guarantee that our approximation to the cumulative spectral density is asymptotically efficient (for increasing graph sizes) and is related to the edit graph distance. Further, differently from the KPM, our method provides not only an approximation for the CDF but to both eigenvalues and eigenvectors. And if desired, the KPM can be instead employed in the leaves of the decomposition to approximate the CDF at them and then later on construct the global CDF of the graph. This last modification could lead to a significant reduction as the full eigendecomposition of the submatrices at the leaves is no longer required.

Conclusions
In this paper, we have introduced a set of families of graphs which have good recursive graph separators. Based on this property, we proposed a divide-and-conquer approach for approximating the spectrum of large-sized graphs within these families. The proposed algorithm was stated in general form allowing to generalize the approach to different applications involving the spectrum of a graph. In addition, we derived a theoretical bound on the error for the approximation of the spectrum of such graphs for any depth of the graph decomposition. Furthermore, we showed that the proposed decomposition based on recursive bisections might be beneficial for other graph signal processing tasks where operating with the full graph is infeasible or undesirable. These results form a step forward towards a fast graph Fourier transform analogous to the one existing fast Fourier transform in the time-domain.

Conflicts of Interest:
The authors declare no conflict of interest. In addition, the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A. Graph Families with Good Separators
In the following, we present a brief summary of graph families that allow for good separator and that are closed under partitioning, i.e., each element of the partition is within the class.

Planar Graphs.
A graph that can be drawn without edge crossings is called planar [49]. This type of graphs arises typically in applications related to 2-D meshes and computer graphics. For such graphs the following separator theorem is known: Theorem A1 ( [28]). Planar graphs satisfy a O(n 1/2 )-separator theorem with constant α = 2/3 and β = 2 √ 2.
In addition, Lipton and Tarjan also provided a linear-time algorithm to find such a separator in planar graphs. This result is of great importance as for large-scale graphs the computation of such partition remains feasible. Besides planar graphs, degenerate instances of them such as edgeless graphs, linked lists, and trees have good vertex separators [50][51][52].
Almost Planar Graphs. Similar to planar graphs, almost planar graphs can be made planar by solely removing a small number of edges. Instances of this kind of graphs are road networks or power grids. In such (physical) networks, planarity is lost by bridges or tunnels, therefore by removing such edges, the graph can be approximated through a planar one. As a result, the O(n 1/2 )-separator theorem for planar graphs holds (approximately) for this family of graphs.
Finite Element Graphs. This family of graphs arises from finite element methods, for example through a tessellation of the space. Following the formal description of Gilbert, et al. a finite element graph can be obtained from a planar graph as follows. First, the graph is embedded in a plane. Next, we identify certain points as nodes, e.g., vertices, points on edges, points in faces. Then, edges between all nodes that share a face are drawn. From this construction, if the number of nodes per face is bounded by d, finite elements graphs satisfy a O(dn 1/2 )-separator theorem [28].
Graphs Embedded in a Low-Dimensional Space. Interestingly enough, nearly all graphs that are used to represent connections in low dimensional spaces have small separators. For example, 3-D meshes, under certain nicety conditions, accept a O(n 2/3 )-separator theorem [53]. In general, unstructured meshes of dimension d allow a O(n (d−1)/d )-separator theorem [53].
Circuits. These are one of the families of graphs that motivated the early application of graph separators. This class and its separator are still used for VLSI design when designers want to minimize the area employed [54]. If the circuit, represented through vertices (components) and edges (connections) is drawn with few crossings, then it may be considered as an almost planar graph. Hence, the O(n 1/2 )-separator theorem approximately holds. Otherwise, if the circuit can be embedded on a surface of genus [9] g, then it has an exact O((gn) 1/2 )-separator theorem [51].
Geometric Graphs. This family of graphs arises by construction through geometrical objects. As this family of graphs encloses a large variety of objects, e.g., k-nearest-neighbor graphs, meshes, etc., here we only consider the unit disk graphs subfamily. Unit disk graphs arise naturally in applications involving sensors networks. These graphs are combinatorial objects generated by the intersection of disks on the plane. They are also known as Euclidean graphs. These graphs are constructed from points in the space (vertices) when edges between points are drawn if the distance between them is smaller than some threshold, e.g., range-limit communication graph. For this family of graphs, a O(n 1/2 )-separator theorem holds similarly to the case of planar graphs as they are embeddable in a 2-D surface. For a more in-depth discussion of the results related to this kind of graphs, the reader is referred to [29].
Social Graphs. Networks such as friend, bibliographic or citation graphs have good separators in practice as they are based on communities, thus exhibiting local structure, see, e.g, [55]. Within social graphs, most links can be found within some other form of community or local domain, as the link graphs used for the web. Unfortunately, differently from the previously discussed types of graphs, social graphs cannot be guaranteed to accept good recursive separators in general. This situation can be observed in a social network represented by a power-law graph. Differently from planar or geometric graphs, despite that a power-law graph can be easily separated, there is no guarantee that its partitions accept good separators themselves as it is not (necessarily) closed under (vertex) edge deletion. However, in practice, this seldom is the case as networks of friends are, again, networks of friends [56].

Appendix B. Proof of Theorem 1
Using the results from [57], we can obtain the following inequality for any two graphs G and H where G∆H denotes the graph edit distance [58]. Further, using the fact that max v∈V deg(v) ∈ O(1), we have for some f . Therefore, as |V | = n, the result follows.

Appendix C. Proof of Theorem 2
First, we prove the the existence of the approximation guarantee. Let us consider the result [57] for the earth mover distance between the spectrum of G andG. This implies that the approximation error is bounded by the edit distance between the original graph, G, and the graph used to compute the approximate spectrum,G. The decomposition employed in Algorithm 1, with depth d, applied to graphs in S 1/2 , obtains a surrogate graphG where G∆G = αn 1/2 + 2α √ n/2 + · · · + 2 (d−1) α n/2 (d−1) edges have been removed from the original graph G. Here, we have kept the second term in the big-O notation to provide an expression for zero error when d = 0, i.e., if the original graph is considered, the true spectrum must be computed. A vanishing error for sufficiently large graphs, i.e., implies that the number of removed edges meets G∆G = o(n). This ensures that (A3) becomes zero asymptotically. Here, o(·) is used to denote little-o notation, i.e., f (x) = o(g(x)) implies lim x→∞ f (x)/g(x) = 0 as g(x) is nonzero, or becomes nonzero after certain point. Therefore, to guarantee this condition we should show that G∆G ∈ O(n (1−δ) ) for some δ ∈ R ++ .
If we set the depth of the decomposition to d * = O(η δ log n) as stated in the theorem, the number of removed edges given by (A2) becomes G∆G = O(n 1/2 √ 2 d * − n 1/2 ) = O(n 1/2 · n η δ − n 1/2 ) = O(n 1−δ − n 1/2 ) ∈ O(n 1−δ ), as we wanted. Here, we have used the definition η δ 1 − 2δ. Using this result, we can show that the approximation error when employing the proposed decomposition with depth d * is where δ it is seen to define the rate of decay of the error. From the definition of d * it is observed that for 0 ≤ δ < 1/2 the depth parameter is positive, which implies a feasible decomposition attaining an O(1/n δ )-approximation as desired. For δ = 1/2 we have d = 0 which implies that no decomposition is performed and the original graph, G, is used to compute the spectrum. In this case, it is seen from expression (A5) that the error is zero.
The time complexity can be obtained by considering the dominant (competing) operations: (i) computation of eigendecomposition of the leaves, and (ii) computation of the graph decomposition. The complexity for (i), T e (n, d), is obtained by considering that there are 2 d leaves in the decomposition and that each leaf contains a n/2 d × n/2 d matrix, i.e., T e (n, d) = O(2 d T f (n/2 d )), where T f (m) is the time required by f (·) to compute the eigendecomposition of an m × m matrix. The complexity for (ii), T d (n, d), is obtained considering that there are d levels in the tree, and that at the ith level 2 i decompositions on graphs with n/2 i nodes are performed, i.e., O(d · 2 i · T g (n/2 i )). As for graphs in S 1/2 the decomposition requires linear time, i.e., T g (m) = O(m), we obtain linear complexity in n and d, i.e., T d (n, d) = O(nd).
As a result, the total complexity can be expressed as Now, substituting the optimal depth parameter, d * , in (A10) we obtain a total complexity T(n, δ) = O(n 1+4δ + (1 − 2δ)n log n), depending on the size of the graph and the accuracy of the approximation. Furthermore, considering that log n = o(n ω ), ∀ ω > 0, we obtain T(n, δ) = O(n 1+4δ ), as we wanted to show.

Appendix D. Proof Theorem 3
Direct use of the Bauer-Fike theorem leads to the inequality where we have defined E := S −Ŝ.
Here, we used the fact that E 2 ≤ E F and the definition of the spectral norm of (S − µI) † . Thus, the result follows.

Appendix E. Proof of Theorem 4
Using results from the convergence analysis of the power method, we directly obtain where γ is the second closest eigenvalue of S toμ. Considering that |μ − µ| ∈ O(1/n δ ) and making the approximation for small O(1/n δ ), we obtain Further, noting that |µ − µ close | ≤ |µ − γ|, the convergence rate result follows.
To show the result for the computationally complexity, first we recall the following result Theorem A2 ( [12]). On input an n × n symmetric diagonally dominant matrix X with m non-zero entries and a vector b, a vectorx satisfying x − X † b X < η X † b X , can be computed in expected timeÕ(m log 2 n log 1/η).
For the case that S is taken as the graph adjacency matrix, if the eigenpairs meet the condition stated in the theorem, we have that S is diagonally dominant. Thus, the complexity result follows.
For the case that S is the combinatorial Laplacian, L, we cannot directly use the result of [12] because S −μI is not diagonally dominant. Hence, we first need to show how to implement the solution of the involved linear system through Laplacian systems. Without loss of generality, we assume that 1 Tû = 0. So, an equivalent linear system can be built as where y k+1 := Lû k+1 . The above linear system can be solved by the Jacobi method, i.e., where Lz (i) = y (i) k+1 . Notice that each solve, for both z (i) andû k+1 is based on a Laplacian system. Hence, the result of [12] can be used and the complexity result follows.
Finally, as it is assumed that the spectral norm is strictly upper bounded by one, i.e., S ∈ O(1), we obtain the desired result.