Near-Optimal Graph Signal Sampling by Pareto Optimization

In this paper, we focus on the bandlimited graph signal sampling problem. To sample graph signals, we need to find small-sized subset of nodes with the minimal optimal reconstruction error. We formulate this problem as a subset selection problem, and propose an efficient Pareto Optimization for Graph Signal Sampling (POGSS) algorithm. Since the evaluation of the objective function is very time-consuming, a novel acceleration algorithm is proposed in this paper as well, which accelerates the evaluation of any solution. Theoretical analysis shows that POGSS finds the desired solution in quadratic time while guaranteeing nearly the best known approximation bound. Empirical studies on both Erdos-Renyi graphs and Gaussian graphs demonstrate that our method outperforms the state-of-the-art greedy algorithms.


Introduction
Graph Signal Processing (GSP) is a powerful tool for irregular data modeling and processing, and has attracted extensive attention from the signal processing [1,2], machine learning [3], and computer vision communities [4,5]. Traditional signal processing usually treat the data as a sequence of vectors, then analyze the features in the time domain or frequency domain, and try to extract information from the features. This implies that the data rely on an Euclidean space R n . However, in many real-world applications such as computer graphics [6] and machine learning [7], this assumption is not accurate enough. For example, the data captured by sensors in a sensor network are always interconnected to each other, and they may rely on a manifold embedded in an Euclidean space. Moreover, a RGB image lies within the pixel space R H×W×3 , but the dimension of the natural image space is usually much lower. GSP brings the signal from the time (spatial) domain to the node domain, and allows us to use a similar tool in traditional signal processing, such as Fourier transform and wavelet analysis, to analyze the graph signals, and then extract the latent information. As an illustration, Figure 1 shows the graph signals supported on the Stanford bunny graph, the Swiss roll graph and the Minnesota road graph respectively.
However, in some applications, the underlying graph is very large, which makes it very expensive to collect and process the data from all nodes. Graph signal sampling is a basic problem in GSP that aims to select a subset with the minimal reconstruction error from the graph node set. Hence, we just need to collect the signal from a small-sized sampling set, and the complete graph signal can be obtained via optimal graph signal reconstruction. Graph signal sampling has been widely used in many fields, e.g., sensor placement [8] and 3D point-cloud sampling [9]. Previous works [10,11] first considered the graph signal reconstruction problem, which aims to recover the whole graph signal from partial observations, then formulated the graph signal sampling problem as a subset of nodes with the minimal optimal reconstruction error. This sampling problem can be regarded as a subset selection problem, which plays a fundamental role in many fields, such as sparse reciprocal graphical model estimation [12] and sparse plus low rank graphical model estimation [13]. This problem is proved to be generally NP-hard [14], so we may not find a global optimal solution of this problem in polynomial time. A sub-optimal method is the greedy algorithm, which selects an element with the greatest improvement to the objective function at each iteration. The greedy algorithm is very simple, but also very effective. In [15], it is proved that the greedy algorithm achieves 1 − e −γ approximation ratio for normalized (that means f (∅) = 0, f (V ) = 1) monotonic weak submodular objective functions in general subset selection problems, where γ is the submodularity ratio of the objective function. For submodular objective functions we have γ ≥ 1, then the greedy algorithm guarantees 1 − 1/e performance, which is the best-so-far theoretical guarantee.
Recently, many Multi-Objective Evolutionary Algorithms (MOEAs) such as the Pareto Optimization for Subset Selection (POSS) algorithm have exhibited better performance than greedy algorithms do [16][17][18]. POSS treats the cardinality constraint as an additional objective function of the problem, then the original problem is transformed to a Bi-objective Optimization Problem (BOP). POSS adopts the evolutionary mutation operation to search for different Pareto optimal solutions of the BOP, and finally select a desired Pareto optimal solution.
Though POSS has been proved to be a powerful tool to solve the subset selection problem, it is infeasible to directly apply it to the graph signal sampling problem, since the evaluation of the optimal reconstruction error is very expensive. In this work, we propose the Pareto Optimization for Graph Signal Sampling (POGSS) based on POSS, and an acceleration algorithm is proposed. By introducing the supermodularity ratio of the objective function, we prove that this method theoretically achieves comparable performance as greedy algorithms do in polynomial time, and show that the supermodularity ratio is bounded as well. Our major contributions can be summarized as follows:

1.
Inspired by POSS, we formulate the graph signal sampling problem as a subset selection problem, and solve this problem by employing Pareto optimization, which has been proved to be superior to greedy methods.

2.
To accelerate the objective estimation procedure, we propose a novel acceleration algorithm based on matrix inversion lemma. Our acceleration algorithm is scalable since it can be used to accelerate the evaluation of any solution, thus that can be directly applied to other evolutionary-based methods (e.g., POSS with recombination [17]).

3.
We provide comprehensive theoretical analyses of the proposed algorithm, including complexity analysis, Pareto optimality analysis and error bound analysis.
The remainder of this paper is organized as follows. We present the preliminaries and formulate the problem in Section 2.1. We give a brief introduction to the POSS method and introduce the Pareto optimization for graph signal sampling (POGSS) algorithm in Section 3. In Section 4, we provide comprehensive theoretical analyses of POGSS. The Numerical results and the conclusions are presented in Sections 5 and 6, respectively.

Notations
In this paper, we use the bold uppercase letters (e.g., A) to denote matrices and use bold lowercase letters (e.g., a) to denote vectors. (i, j)-th entry of A is denoted as A ij or [A] ij . (·) T denote the transpose, (·) −1 denote the inverse, (·) † denote the Moore-Penrose pseudo-inverse. Tr(·) denote the trace. E[·] denote the expectation. diag(a) stands for a diagonal matrix with the diagonal entries given by a vector a. Sets are denoted by calligraphic letters, e.g., A (except the set of real number, which is denoted as R).

Graph Signal Processing
We consider a weighted, undirected graph G = {V, E }, where V is the set of nodes with |V | = n, E is the set of edges. Let A ∈ R n×n be the adjacency matrix of G, where A ij is the weight of the edge between nodes i and j. The graph signal x is a function that maps each node in V to a real number, and can be written as a vector with its' i-th entry denotes the signal value at nodes i.
Defined the unnormalized graph Laplacian matrix of G as where D is a diagonal degree matrix with D ii = ∑ n j=1 A ij . Then we write the eigenvalue decomposition of the graph Laplacian matrix as . . , λ n } is a diagonal matrix of graph frequencies, and V is the matrix of graph Fourier bases. Then the graph Fourier transform of x is defined asx = V T x, and the inverse Fourier transform is given by x = Vx. We say a graph signal x is Kbandlimited, if its' graph Fourier coefficients indexed by V \ K are zero, i.e.,x V \K = 0. In [19], it was proved that if the sampling set size is not smaller than |K|, then the Kbandlimited graph signal can be perfectly recovered. However, in many applications we can only observe noise corrupted signals, we cannot find a perfect estimator under noise. Consequently, Bayesian estimation is adopted in the graph signal reconstruction problem.

Problem Formulation
Let x be the original noise-free graph signal, and n be the noise vector. In the graph signal reconstruction problem, only the noise corrupted samples y S in a sampling set S ⊆ V are observable, thus we take the observation model as where C S ∈ {0, 1} |S|×n is the sampling matrix. We assume that n is a zero-mean vector with covariance matrix σ 2 I, x is K-bandlimited andx K is a zero-mean vector with covariance matrixΣ, hence the covariance matrix of Graph signal reconstruction is to approximately recover the original graph signal x from partial noisy observation y S via a linear transform Φ S ∈ R n×|S | , i.e.,x The error covariance matrix with respect to Φ S is defined as To find the optimal reconstruction matrix Φ S , one needs to minimize the scalar loss function J(Φ S ) = z T Φ S z for all z ∈ R n . The derived optimal recover operator is [10] and the correspongding error covariance matrix is given by where V T K = [v 1 , . . . , v n ], and the optimal mean squared error (MSE) is given by The goal of graph signal sampling is to select a subset S ⊆ V with at most k nodes, which minimizes the optimal reconstruction error MSE S , i.e., finding By using the circular commutation property of trace, (6) is simplified to

Method
In this study, we propose to solve (7) by Pareto optimization. For representable convenience, we use the binary vector s ∈ {0, 1} n to represent the subset S ⊆ V, the entry s i = 1 means that the node v i is included in S and s i = 0 otherwise. We do not distinguish between the two notations. Please note that the original problem is a constrained problem with single objective function, we now reformulate this problem as an unconstrained BOP: where To compare two solutions, the Pareto dominance relationship and the Pareto optimality are defined as follows: Definition 1 (Pareto dominance). Let s 1 and s 2 be two solutions, then Definition 2 (Pareto optimality). We say s is a Pareto optimal solution with respect to M, if there is no solution s ∈ M such that s ≺ s .
To summarize, a solution s 1 is said to weakly dominates s 2 if s 1 performs not worse than s 2 for all metrics, and for two solutions such that s 1 s 2 , we say s 1 dominates s 2 if there is at least one metric for which s 1 strictlly outperforms s 2 . s is said to be a Pareto optimal solution with respect to M if there is no solution in M that dominates s .

Pareto Optimization for Subset Selection (Poss)
POSS is an evolutionary algorithm for subset selection, and its overall procedure is shown in Algorithm 1. Denote T = 2 V as the set of all solutions, M ⊆ T as the set of all solutions found so far. POSS maintains a set P ⊆ M of all Pareto optimal solutions with respect to M. At each step, POSS first randomly selects a solution from the set of Pareto optimal solutions P, and then adopts the mutation operator to search for a potential Pareto optimal solution. If there is no solution that dominates the generated solution s , then POSS includes the generated solution to P, and remove the solutions which are weakly dominated by s from P; Otherwise POSS discards the solution s. Such update strategy in POSS guarantees that if P is the Pareto optimal solution set at current iteration, then the updating operation does not change the optimality of P. After T iterations, POSS outputs a solution s ∈ P with the minimal reconstructed error and a cardinality smaller than k, i.e., s = arg min s∈P, s 0 ≤k f (s).

Algorithm 1 POSS
Output: a subset S ⊆ V with at most k nodes.

4:
Generate s by bit-wise flipping with probability 1 n .

Proposed Pareto Optimiazation for Graph Signal Sampling (Pogss)
However, due to the general O(n 3 ) complexity of the matrix inversion operation, the evaluation step of f (s) is very expensive in general, which means we cannot efficiently compare two solutions. Please note that at each iteration the new solution is generated from a solution evaluated before, we can accelerate the evaluation of the objective function by leveraging the information of the previous solution. More explicitly, by using the matrix inversion lemma [20] we deduced the following lemma: Let t I = s \ s = {t I 1 , . . . , t I l } be the elements to be included at the current iteration, t E = s \ s = {t E 1 , . . . , t E m } denote the elements to be excluded, and t R = s ∩ s denote the elements that will be retained. Instead of directly computing theK s , we propose to computeK s∪t I firstly, and then compute the desiredK s =K (s∪t I )\t E . By using Lemma 1, we can easily computeK s∪t I by including the elements in t I to s one by one, and then computeK s by excluding the elements in t E from s ∪ t I separately. The full procedure of POGSS is described in Algorithm 2, and the solution flipping algorithm for acceleration is presented in Algorithm 3.
Denote N t = |t I | + |t E | as the number of bits that will be flipped at iteration t, then we have E(N t ) = 1, which reveals the fact that we are expected to flip only 1 bit at each iteration. Please note that the evaluation of (9) and (10) can be done at a cost of O(n 2 ), so the cost of evaluating the objective function in Algorithm 2 is expected to be O(n 2 ), while the naive matrix inversion update requires a cost of O(n 3 ). Therefore, this proposed procedure significantly accelerates the algorithm. Generate s by bit-wise flipping with probability 1 n .

Theoretical Analysis
In this section, we theoretically analyze the performance of POGSS, including the Pareto optimality of P and the overall reconstruction error of POGSS.
We first show in Proposition 1 that the subset P is the set of all Pareto optimal solutions with respect to M. Proposition 1 (Pareto optimality of P). At any iteration of the algorithm 2, let M be the set of all solutions found so far, then • for all s ∈ P, there is no s ∈ M such that s ≺ s, • for all s ∈ (M \ P ), there is a solution s ∈ M such that s ≺ s.
We then analyze the performance of POGSS by introducing the supermodularity ratio. A set function f : For a general criterion f , Definition 3 defines how close it is to a supermodular function, i.e., the supermodularity ratio γ S,k ( f ) [21]. It is easy to prove that f is supermodular if and only if γ S,k ( f ) ≥ 1 holds for any S and k.

Definition 3 (Supermodularity ratio).
Let f be a non-negative set function. The submodularity ratio of f with respect to a set S and a parameter k ≥ 1 is The most time-consuming operation in POGSS is the evaluation of the objective function after bit flipping. So here comes a question: how many bit flipping operations we need to obtain a desired solution? We provide an approximation upper bound of the optimal reconstruction MSE with O(k 2 n) bits flipped in Theorem 1. Please note that the greedy algorithm guarantees ( is the solution given by the greedy algorithm, POGSS achieves nearly the best known approximation guarantee as greedy algorithms do in quadratic time.

Theorem 1.
Define γ min = min s: s 0 =k−1 γ s,k , let N be the total number of flipped bits, OPT be the optimal value of (7). Then POGSS finds a solution s such that The proof of Theorem 1 relies on Lemma 2 and 3 stated bellow. The complete proof is similar to the proof of theorem 14.1 in [22], which is omitted due to the limited space.

Lemma 3 ([16]
). ∀s ∈ {0, 1} n , there exsits one nodes v ∈ s such that Please note that the approximation upper bound in Theorem 1 involves γ min , we show in Theorem 2 that γ min is also bounded.

Proposition 2 (Lower bound for γ min ). The supermodularity ratio satisfies
This conclusion is natural and obvious, since the equality holds when f (L) − f (L ∪ U ) = max u∈U f (L) − f (L ∪ {u}), and is also relatively loose. However, it is still meaningful because it is independent of the graph structure and the Signal-to-Noise Ratio (SNR), so it is tighter than the bound in [10] in high SNR scenarios.

Experiments
In this section, we conduct several experiments to evaluate the performance of POGSS. We first state the experimental settings, and then report the experimental results.

Experimental Settings
Two types of undirected graphs are adopted for evaluation: • The Erdos-Renyi (ER) graph: For ER graphs, any two nodes are connected with probability p, i.e., P( The Gaussian graph: For Gaussian graphs, we uniformly sample n points {p i } n i=1 in a unit square [0, 1] 2 . The edges are first connected with weights and we then set the weights which are smaller to some threshold W T to 0.
In our experiments, the number of nodes |V | = n is set to 100 for all graphs, and we set p = 0.2 for ER graphs, set s = 0.3, W T = 0.6 for Gaussian graphs.
The graph signal x is sampled from a Gaussian distribution: where V K is the matrix of graph Fourier bases,Σ is the covariance matrix ofx K . In our experiments,Σ is generated byΣ = PP T , where P ∈ R |K|×|K| is a random matrix with P i,j ∼ N (0, 1). The additive Gaussian noise n is adopted in our experiments, and the noise power σ 2 is set to 0.01. We compare our methods with several algorithms, including the state-of-the-art greedy algorithm, the randomized greedy algorithm, and the convex relaxation algorithm. In all experiments, the number of iterations of POGSS is set to 2ek 2 n according to Theorem 1. The naive greedy graph siganl sampling algorithm [10] simply selects a node with the largest marginal gain at each iteartion. Assume subset of nodes obtained by the greedy algorithm at the current iteration is S, then the node v = arg max v∈V \S f (S ∪ {v}) is included to S. The randomized greedy algorithm [11] uses the similar strategy to the greedy algorithm, but accelerates the algorithm by selecting the best node from a random subset. More explicitly, it first samples a subset R ⊂ V \ S with size −n/k log( ) uniformly, where 0 < < 1 is a hyper-parameter, then includes the node v = arg max v∈R f (S ∪ {v}) to S. Recall the original problem of graph signal sampling (7) since the original problem is neither differentiable nor convex, we relax this problem to a convex optimization problem [23] d = arg min then let the largest k entries in d to be included to the sampling set.

Experimental Results
We first evaluate the algorithms under the K-bandlimited signal assumption. Under this assumption, the entries inx indexed by K are zero, thus x can be perfectly represented by only |K| graph Fourier bases. In the following experiments we set |K| = 40 and use the first 40 graph Fourier bases to form V K .
We set the subset size k from 10 to 100 with stepsize 10, then independently run the simulation 10 times and report the average optimal reconstruction MSE of the obtained subset of each algorithm. Figures 2 and 3 illustrate the results obtained on ER graphs and Gaussian graphs respectively. From Figures 2a and 3a we observe that POGSS and greedybased algorithms achieve near-optimal performance, the performance convex relaxation method is worse than other method, and POGSS outperforms others for any subset size. We note that when the subset size k is approaching |K| = 40 the MSE decreases dramatically, and when k > |K| = 40 (shown in the right-hand side of the vertical dotted line) all of the methods provide the solutions with very small reconstruction MSE. Figures 2b and 3b plot the convergence curve of POGSS when k = 20. We can see that the MSE decreases very fast at first, and then becomes flat when k > 500. However, it keeps decreasing throughout the whole process. To be precise, POGSS reaches its' final solution at the 2144-th iteration, while T = 2ek 2 n = 2174. We next evaluate the algorithms for non-bandlimited graph signals, i.e., V K = V. Similar to the bandlimited scenario, we set the subset size from 10 to 100 with step size 10 and present the average optimal reconstruction MSE of the obtained subset of each algorithm. Figures 4 and 5 show the results obtained on ER graphs and Gaussian graphs respectively. From Figures 4a and 5a we observe the similar results to the ones in Figures 2a and 3a, where POGSS still achieves the best performance. It is worth noting that there is no "turning point" in Figures 4 and 5, because the signals are non-bandlimited, and cannot be accurately reconstructed from partial observations.    In Figure 6 we compare the running time of different sampling set sizes. We observe that POGSS takes the most time to converge, which is as expected since Theorem 1 reveals that POGSS takse 2ek 2 n iterations to converge, while greedy algorithms require only k iterations to do that (the time consumption of each method to finish one iteration is almost the same). Although POGSS takes the most time to find a desired solution, it is still practicable for two reasons: 1.
if we fit the running time of POGSS with a quadratic function, then we have T ≈ 0.26 · k 2 (sec), thus POGSS only takes only ∆T ≈ 5.15 · k + 25.77 (sec) when we add 10 nodes to the current sampling set, we think it is worthwhile to find a better solution with such a time consumption. 2.
In real world applications of graph signal sampling, e.g., [8], the sampling procedure is offline and the reconstruction procedure is online. We only need to sample the graph once then the entire graph signal can be always reconstructed with a small error, while the reconstruction time depends only on the size of the sampling set, rather than the sampling method. We also provide a visual demonstration for our method. We use the stanford bunny data [6] to obtain the graph. The nodes are scanned from a 3D model, and the edges are obtained by connecting each node with its' four nearest neighbors. The smooth graph signal is first sampled from a Gaussian distribution, and then filtered by a heat diffusion low-pass filter [24]. The resulting graph Fourier coefficients after filtering arê where t is the scaling parameter and is set to 10 in our experiment. As displayed in Figure 7a is the original smooth graph signal with Gaussian noise, the noise power is σ 2 = 0.01. There are n = 453 nodes in the original graph, and we sample a subset with k = 300 nodes by using POGSS to obtain Figure 7b, note that all of the edges are from the original graph. Figure 7c illustrates the optimally recovered graph signal by using (3), and the corresponding reconstruction error is MSE S = 137.2.

Conclusions
In this paper, we consider the graph signal sampling problem, which can be formulated as a subset selection problem. The evolutionary Pareto optimization for graph signal sampling (POGSS) is proposed for solving this problem. An efficient procedure is also proposed to accelerate the evaluation of the objective function. Theoretical analysis shows that POGSS, in polynomial time, achieves the best approximation upper bound of (1 − e −γ min ) · OPT + e −γ min f (0) as the state-of-the-art greedy algorithm does, and an lower bound of γ min is built as well. Simulation results demonstrate that POGSS outperforms the previous greedy algorithms.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.