Sequential Sampling and Estimation of Approximately Bandlimited Graph Signals

Graph signal sampling has been widely studied in recent years, but the accurate signal models required by most existing sampling methods are usually unavailable before any observations are made in a practical environment. In this paper, a sequential sampling and estimation algorithm is proposed for approximately bandlimited graph signals in the absence of prior knowledge about the signal properties. We approach the problem from a Bayesian perspective, formulating the signal prior as a multivariate Gaussian distribution with unknown hyperparameters. To overcome the interconnected problems of parameter estimation and sample selection, the proposed algorithm performs hyperparameter estimation and sample selection in an alternating manner. At each step, the unknown hyperparameters are updated by an expectation maximization procedure based on historical observations, and the next node to sample is then chosen by uncertainty sampling with the latest hyperparameters. We prove that, under specific conditions, signal estimation in the proposed algorithm is consistent. Simulations validate the approach, showing that the proposed procedure yields estimation performance significantly better than existing state-of-the-art approaches while remaining robust over a broad range of signal properties.

Sampling for (lossless) reconstruction or (minimum-error) estimation is a fundamental problem in graph signal processing. For example, a sample survey in a social network needs to carefully select interviewees so as to better predict the attitudes of all users. In a sensor network, it is important to optimize sensor placement so that more information can be collected by a number of sensors restricted by economic constraints [9]. In addition, any graph-based active semisupervised learning task may also be interpreted as a graph signal sampling and estimation problem [14,15].
There have been multiple works on graph signal sampling. The authors of [6,7] extend Nyquist sampling theory to bandlimited graph signals. Other works seek optimal sampling sets for graph signal estimation, where the graph signals are usually assumed to be bandlimited [17][18][19] or approximately bandlimited [17,20] in the frequency domain, or smooth in the vertex domain [17,21]. Sampling strategies mainly include topology-based methods that compute a score for each vertex [18,21], and design-of-experiment (DOE) approaches that optimize a scalarization of some target matrix [19,22]. Most current sampling methods rely on an accurate bandwidth [6,18,19] or prior distribution [22,23] of the graph signal.
It is natural in many applications to assume that the graph signal is somehow bandlimited or smooth. For instance, people with strong social connections are likely to hold similar opinions, and temperature sensors located close to each other usually obtain similar measurements. In practice, however, it is hard to know the exact bandwidth or prior distribution of the graph signal before any observation. In such cases, sampling strategies that rely on an accurate signal model are no longer applicable.
A possible solution is to sample and estimate in a sequential way. Unknown signal properties can be estimated from previous observations, and the subsequent sampling node can be selected under the latest signal model. Such a sequential framework can overcome the difficulty of an incomplete system model while still taking advantage of model-based sample selection and signal estimation [20].
In this work, we investigate the sequential sampling and estimation of approximately bandlimited (ABL) graph signals whose bandwidth and energy level are both unknown in advance. With the aid of a Butterworth low-pass filter, the ABL graph signal is assumed to follow a multivariate Gaussian distribution with unknown hyperparameters. The variance of the observation noise is also considered unknown.
In the proposed algorithm, hyperparameter estimation and sample selection are performed in an alternating way. At each step, expectation maximization (EM) [24] is first used to update the maximum marginal likelihood (MML) estimation of unknown hyperparameters based on historical observations. Substituting these latest estimated values into the signal prior and noise distribution, a Bayesian estimation and prediction procedure can then be performed. According to the uncertainty sampling (US) criterion [25], the node with the largest predictive variance is selected as the next node to sample.
In fact, estimating the prior distribution from data, namely the empirical Bayes (EB) method, has long been studied [26]. It has been proved that, under certain assumptions, the EB posterior distribution of the parameter is consistent at its true value [27]. Meanwhile, an alternating approach analogous to the proposed one has been used in sequential DOE for nonlinear least squares (LS) regression, where parameter estimation is proved to be consistent and the design is asymptotically optimal [28].
In this paper, we prove that under specific conditions, signal estimation in the proposed algorithm is consistent at the true value. That is, as the number of samples goes to infinity, the estimated graph signal can get arbitrarily close to the true value. The finite-time performance of the proposed algorithm is validated by simulation results. Our algorithm is able to adjust to ABL graph signals with different properties, and it performs better in estimation error than existing methods.
The rest of the paper is organized as follows. We detail the signal prior and observation model in Section 2, and develop a sequential sampling and estimation algorithm in Section 3. The consistency of the proposed algorithm is proved under specific conditions in Section 4, and its finite-time performance is validated by experiments in Section 5. We conclude the paper in Section 6.
Throughout the paper, we use normal-weight italic lowercase letters (e.g., x) to denote scalar variables, bold italic lowercase letters (e.g., x) to denote vectors, bold italic uppercase letters (e.g., X) to denote matrices, and calligraphic uppercase letters (e.g., X ) to denote sets.

Preliminaries
We consider a simple, connected, weighted, undirected graph G = (V, E, W), where V is the set of vertices indexed by {1, ..., N}; E is the set of edges between vertex pairs, e.g., (i, j) ∈ E denotes an edge between vertex i and vertex j; and W is the weighted adjacency matrix, with w_{ij} the weight of the edge between vertex i and vertex j if they are connected, and 0 otherwise. A graph signal that takes a real value on each vertex of the graph can be represented as f ∈ R^N, where f_i is the signal value on vertex i.
The Laplacian matrix is defined as L = D − W, where D = diag(d_1, ..., d_N), with d_i = Σ_{j:(i,j)∈E} w_{ij}, is the degree matrix. Being symmetric and positive semi-definite, L has the spectral decomposition L = VΛV^T, where Λ is a diagonal matrix of the eigenvalues 0 = λ_1 < λ_2 ≤ · · · ≤ λ_N, and V contains the corresponding orthonormal eigenvectors {v_k}_{k=1}^N as columns. These eigenvectors act as the graph Fourier basis, and the associated eigenvalues are regarded as graph frequencies [1]. The frequency-domain coefficients f̂ of a graph signal f are calculated via the graph Fourier transform (GFT) as f̂ = V^T f, and the graph signal can be recovered as f = V f̂, which is known as the inverse GFT (IGFT).
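As a concrete illustration of these definitions, the following sketch (using NumPy; the function and variable names are ours, not from the paper) computes the Laplacian, its eigendecomposition, and the GFT/IGFT of a graph signal:

```python
import numpy as np

def graph_fourier_basis(W):
    """Combinatorial Laplacian L = D - W and its eigendecomposition.
    Returns the graph frequencies (eigenvalues, ascending) and the GFT basis V."""
    d = W.sum(axis=1)                # vertex degrees
    L = np.diag(d) - W               # Laplacian (symmetric, positive semi-definite)
    lam, V = np.linalg.eigh(L)       # L = V diag(lam) V^T
    return lam, V

# GFT and IGFT of a graph signal f (a length-N vector):
#   f_hat = V.T @ f   (analysis)
#   f     = V @ f_hat (synthesis)
```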

Signal Prior
In this work, the graph signal f is assumed to be drawn from the following distribution:

p(f; α, γ) = N(f | 0, (α I_N + γ L^{2n})^{-1}),    (1)

where I_N denotes the N×N identity matrix, and n, α, γ > 0 are hyperparameters, among which α and γ are assumed unknown a priori. The meanings of the signal model and hyperparameters are explained in the frequency domain as follows. By the GFT, the spectral coefficients {f̂_k}_{k=1}^N follow independent zero-mean Gaussian distributions with variances conforming to a Butterworth low-pass filter ([29], Section 7.3):

Var(f̂_k) = 1/(α + γ λ_k^{2n}) = g² / (1 + (λ_k/λ_c)^{2n}),

where n is the order, λ_c = (α/γ)^{1/(2n)} is the cut-off frequency, and g = 1/√α is the amplitude gain of the filter. Under such a prior, we say the graph signal is approximately bandlimited in a probabilistic sense, where λ_c plays the role of bandwidth, n controls the strictness of bandlimitedness, and g determines the energy level of the graph signal.
In particular, when n = 1/2 and δ = α/γ is small, the signal prior in Equation (1) becomes

p(f) = N(f | 0, (γ(L + δ I_N))^{-1}),    (3)

which is widely used in graph signal sampling and estimation [9,23]. The restriction on the total variation f^T L f = Σ_{(i,j)∈E} w_{ij}(f_i − f_j)² leads to signals that are smooth over the graph. If n → ∞, the out-of-band coefficients become deterministic zeros, and the graph signal becomes strictly bandlimited, represented by

f = V_K f̂_K,

where K = max{k | λ_k ≤ λ_c}, f̂_K collects the first K GFT coefficients, which may be non-zero, and V_K contains the first K GFT basis vectors.
In conclusion, our signal prior in Equation (1) is flexible enough to describe approximately bandlimited graph signals with different bandwidths, strictness of bandlimitedness, and energy levels. It also covers the most commonly used smoothness prior [9,23] and bandlimitedness prior [6,7,18,19]. Additionally, the model is simple, with three hyperparameters, and can be represented in the vertex domain without eigendecomposition.
Here, the hyperparameters α, γ, or equivalently λ_c, g, in the signal prior are assumed unknown; that is, we do not need prior knowledge of the bandwidth or energy level of the graph signal. Only a value for n is required, to control the strictness of bandlimitedness of the estimated signal. Compared to methods restricted to strictly bandlimited graph signals with known bandwidth or smooth graph signals with a known prior distribution, our signal prior is more general.
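To make the prior concrete, the following sketch draws one ABL signal from Equation (1), converting the interpretable pair (λ_c, g) into (α, γ) via the relations above (a minimal illustration with names of our choosing; L^{2n} is formed through the eigendecomposition so that non-integer 2n is also handled):

```python
import numpy as np

def sample_abl_signal(L, n, lam_c, g, rng=np.random.default_rng(0)):
    """Draw f ~ N(0, (alpha*I + gamma*L^{2n})^{-1}), with alpha = 1/g^2 and
    gamma = alpha / lam_c^{2n} as implied by the cut-off/gain relations above."""
    N = L.shape[0]
    alpha = 1.0 / g**2
    gamma = alpha / lam_c**(2 * n)
    lam, V = np.linalg.eigh(L)
    L2n = V @ np.diag(lam ** (2 * n)) @ V.T     # L^{2n} via the spectral decomposition
    cov = np.linalg.inv(alpha * np.eye(N) + gamma * L2n)
    return rng.multivariate_normal(np.zeros(N), cov)
```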

Observation Model
To overcome the difficulty caused by unknown hyperparameters, a sequential sampling process is considered, where one node is sampled at each step, so that unknown hyperparameters can be estimated from previous observations, and the next node to sample can be selected using the latest hyperparameters.
Suppose that vertex s_t is sampled at step t, and the observed signal value is y_t. The observation model can be expressed as

y_t = ψ_t^T f + w_t,    (5)

where ψ_t^T = e_{s_t}^T is the sampling vector, with its s_t-th element equal to 1 and the others equal to 0, and w_t is additive zero-mean Gaussian observation noise with unknown precision β,

w_t ~ N(0, β^{-1}).    (6)

In hyperparameter and signal estimation and in sample selection at step t, the historical observations from step 1 to step t are considered together. Let Ψ_{1:t} = [ψ_1, ..., ψ_t]^T denote the sampling matrix and y_{1:t} = [y_1, ..., y_t]^T the corresponding observation vector. The noise values w_1, ..., w_t involved in different samples (on either different vertices or the same vertex) are assumed to be independent and identically distributed (i.i.d.). According to Equations (5) and (6), p(w_{1:t}; β) = N(w_{1:t} | 0, β^{-1} I_t).
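For reference, the historical sampling matrix and observation vector can be assembled as follows (an illustrative sketch; the helper name is ours):

```python
import numpy as np

def stack_observations(sample_nodes, observations, N):
    """Assemble the sampling matrix Psi_{1:t} (rows e_{s_t}^T) and the
    observation vector y_{1:t} from the sampled nodes and observed values."""
    Psi = np.eye(N)[list(sample_nodes)]     # each row is a one-hot sampling vector
    y = np.asarray(observations, dtype=float)
    return Psi, y
```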

Problem Formulation
Under the signal and observation model described in Sections 2.2 and 2.3, at each step t in the sequential sampling and estimation process, the two core problems are (1) how to estimate the unknown hyperparameters α, γ, β as well as the graph signal f based on the historical observations,

Ψ_{1:t}, y_{1:t} → α_t, γ_t, β_t, f_t,

where α_t, γ_t, β_t are the estimated values of the hyperparameters at step t, and f_t denotes the estimated graph signal at step t; and (2) how to select the next node s_{t+1} to be sampled at step t+1 with the latest estimated values of the hyperparameters,

α_t, γ_t, β_t, Ψ_{1:t}, y_{1:t} → s_{t+1}.

Our ultimate goal is to minimize the signal estimation error within a given budget or, from another angle, to reach a certain estimation accuracy with the fewest samples.

Hyperparameter and Signal Estimation
We first focus on estimating the unknown hyperparameters from the observation data. Here, maximum marginal likelihood (MML) estimation of the unknown hyperparameters is adopted:

(α_t, γ_t, β_t) = arg max_{α,γ,β} p(y_{1:t} | Ψ_{1:t}; α, γ, β).    (12)

Then, the signal posterior given the estimated hyperparameters is Gaussian,

p(f | Ψ_{1:t}, y_{1:t}; α_t, γ_t, β_t) = N(f | μ_t, C_t),

with covariance matrix and mean vector ([30], Section 10.6)

C_t = (α_t I_N + γ_t L^{2n} + β_t Ψ_{1:t}^T Ψ_{1:t})^{-1},
μ_t = β_t C_t Ψ_{1:t}^T y_{1:t}.

The minimum mean square error (MMSE) or maximum a posteriori (MAP) estimate of the graph signal is f_t = μ_t.
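This posterior is the standard Gaussian linear-model result; a minimal sketch of the computation, assuming L^{2n} has already been formed and using names of our choosing, is:

```python
import numpy as np

def signal_posterior(L2n, Psi, y, alpha, gamma, beta):
    """Posterior N(mu, C) of f under the prior N(0, (alpha*I + gamma*L^{2n})^{-1})
    and the observation model y = Psi @ f + w, w ~ N(0, (1/beta) * I)."""
    N = L2n.shape[0]
    C = np.linalg.inv(alpha * np.eye(N) + gamma * L2n + beta * Psi.T @ Psi)
    mu = beta * C @ (Psi.T @ y)
    return mu, C
```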
However, direct optimization of Equation (12) is intractable. We view f as a hidden variable, and introduce expectation maximization (EM) ( [24], Section 9.4) in our algorithm to estimate α, γ, β, and thereby obtain the posterior distribution of f . To be concise, the subscripts t and 1 : t are omitted in the introduction of our EM algorithm.
In the E step, the signal posterior is computed under the current hyperparameters; in the M step, α, γ, β are updated by maximizing the resulting expected complete-data log-likelihood. For brevity, denote the expectation operator in Equation (16) as E_{l−1}[·], and the expectation value as E_{l−1}, which is also the objective value of the optimization problem. By direct calculation, E_{l−1} can be written in closed form, where M denotes the sample size and "const" collects terms that are independent of α, γ, β. Note that E_{l−1} is concave with respect to α, γ, β (see Appendix A), so the maximization problem in Equation (16) can be solved by any standard tool for convex optimization. Here, the first-order conditions of the optimization problem are analyzed to give some insight into the hyperparameter estimation, and an efficient search method based on these conditions is provided in Appendix B.
Taking the partial derivatives of E_{l−1} with respect to α, γ, β and setting them to zero gives the first-order conditions in Equations (21)-(23). Note that tr((α I_N + γ L^{2n})^{-1}), tr((α I_N + γ L^{2n})^{-1} L^{2n}), and M/β can be seen as the prior expectations of f^T f, f^T L^{2n} f, and w^T w with respect to p(f; α, γ) and p(w; β). The M step thus looks for a set of hyperparameters that makes these prior expectations equal to their posterior counterparts; in this sense, the estimated hyperparameters agree with the observations.
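Written out, the matching of prior and posterior expectations described above amounts to the following stationarity conditions (our compact restatement; E_{l−1}[·] denotes the posterior expectation from the E step):

```latex
\operatorname{tr}\!\big((\alpha I_N + \gamma L^{2n})^{-1}\big) = \mathbb{E}_{l-1}\!\big[f^{\top} f\big], \qquad
\operatorname{tr}\!\big((\alpha I_N + \gamma L^{2n})^{-1} L^{2n}\big) = \mathbb{E}_{l-1}\!\big[f^{\top} L^{2n} f\big], \qquad
\frac{M}{\beta} = \mathbb{E}_{l-1}\!\big[(y - \Psi f)^{\top}(y - \Psi f)\big].
```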
Repeating the E and M steps until convergence yields an MML estimate of the unknown hyperparameters α, γ, β, together with a posterior of the graph signal f. This process is summarized in Algorithm 1. Although the estimated hyperparameters are not guaranteed to be a global MML optimum, we find the performance good enough in experiments.
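For concreteness, a compact numerical sketch of this EM loop follows (function and variable names are ours; the (α, γ) update uses a generic bounded optimizer in place of the closed-form and binary-search updates of Appendix B, while β uses its closed form):

```python
import numpy as np
from scipy.optimize import minimize

def em_hyperparameters(L2n, Psi, y, max_iter=50, tol=1e-6):
    """EM for the MML estimate of (alpha, gamma, beta).
    E step: Gaussian signal posterior under the current hyperparameters.
    M step: maximize the expected complete-data log-likelihood."""
    N, M = L2n.shape[0], len(y)
    I = np.eye(N)
    alpha, gamma, beta = 1.0, 1.0, 1.0            # arbitrary positive initialization
    for _ in range(max_iter):
        # E step: posterior N(mu, C)
        C = np.linalg.inv(alpha * I + gamma * L2n + beta * Psi.T @ Psi)
        mu = beta * C @ Psi.T @ y

        # expected sufficient statistics under the posterior
        e_ff = mu @ mu + np.trace(C)                     # E[f^T f]
        e_fLf = mu @ (L2n @ mu) + np.trace(C @ L2n)      # E[f^T L^{2n} f]
        r = y - Psi @ mu
        e_ww = r @ r + np.trace(Psi @ C @ Psi.T)         # E[||y - Psi f||^2]

        # M step
        beta_new = M / e_ww                              # closed-form noise-precision update
        def neg_obj(x):                                  # negative expected log-prior in (alpha, gamma)
            a, c = np.exp(x)                             # log-parameterization keeps them positive
            _, logdet = np.linalg.slogdet(a * I + c * L2n)
            return -0.5 * (logdet - a * e_ff - c * e_fLf)
        res = minimize(neg_obj, np.log([alpha, gamma]), method="L-BFGS-B")
        alpha_new, gamma_new = np.exp(res.x)

        done = max(abs(alpha_new - alpha), abs(gamma_new - gamma),
                   abs(beta_new - beta)) < tol
        alpha, gamma, beta = alpha_new, gamma_new, beta_new
        if done:
            break
    C = np.linalg.inv(alpha * I + gamma * L2n + beta * Psi.T @ Psi)
    mu = beta * C @ Psi.T @ y
    return alpha, gamma, beta, mu, C
```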

Sample Selection
With the unknown hyperparameters estimated from previous observations by EM, the subsequent sampling node at each decision step can be selected using the latest estimated hyperparameter values.
According to the uncertainty sampling (US) criterion in active learning ([25], Chapter 2), we should scan through all the nodes and pick the one whose observed signal value we are most uncertain about as the next node to sample. Here, the predictive variance is regarded as the measure of uncertainty.
The predictive distribution of an observation y on vertex s, with sampling vector ψ^T = e_s^T, given the historical observations Ψ_{1:t}, y_{1:t}, is

p(y | Ψ_{1:t}, y_{1:t}; α_t, γ_t, β_t) = N(y | e_s^T μ_t, e_s^T C_t e_s + (β_t)^{-1}),

where μ_t and C_t are the posterior mean and covariance of the graph signal f, respectively, given Ψ_{1:t} and y_{1:t}. The predictive variance consists of two parts: the estimation variance e_s^T C_t e_s and the noise variance (β_t)^{-1}, of which the latter is the same for all vertices. The next sampling node s_{t+1} can therefore be decided by

s_{t+1} = arg max_{s ∈ V} e_s^T C_t e_s.    (25)
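Under this criterion, node selection reduces to scanning the diagonal of the posterior covariance (a one-line sketch with our naming):

```python
import numpy as np

def next_node_by_us(C_t):
    """Uncertainty sampling: the noise variance 1/beta_t is identical for every
    vertex, so maximizing the predictive variance reduces to maximizing the
    posterior (estimation) variance e_s^T C_t e_s, i.e., the diagonal of C_t."""
    return int(np.argmax(np.diag(C_t)))
```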

Sequential Sampling and Estimation
Finally, the complete procedure of the proposed sequential sampling and estimation algorithm for approximately bandlimited graph signals is given in Algorithm 2, where the total number of sampling steps T equals the sampling budget. At each step t, Algorithm 2 samples the vertex s_t and obtains an observation y_t, updates the hyperparameters α_t, γ_t, β_t and the signal posterior μ_t, C_t by EM as in Algorithm 1 (Section 3.1), and then selects the next node to sample s_{t+1} by US as in Equation (25) (Section 3.2).

Our sequential sampling strategy takes into account not only the graph topology but also previous observations, by both estimating hyperparameters from historical data and selecting sampling nodes based on the signal posterior. Making full use of historical observations and deciding which nodes to sample online, the proposed algorithm can cope with situations where the signal and noise distributions are not completely available in advance, and can efficiently select samples to estimate the underlying graph signal.
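Putting the pieces together, a sketch of the overall loop (building on the illustrative helpers sketched in Sections 3.1 and 3.2; the first node is chosen at random, and beta_true is the true noise precision, used here only to simulate observations):

```python
import numpy as np

def sequential_sampling(L2n, f_true, beta_true, T, rng=np.random.default_rng(0)):
    """Alternate EM hyperparameter updates with US node selection for T steps,
    simulating noisy observations of the true signal f_true."""
    N = L2n.shape[0]
    s = int(rng.integers(N))                               # first node at random
    Psi, y = np.empty((0, N)), np.empty(0)
    for _ in range(T):
        y_t = f_true[s] + rng.normal(0.0, 1.0 / np.sqrt(beta_true))  # observe
        Psi = np.vstack([Psi, np.eye(N)[s]])               # append e_{s_t}^T
        y = np.append(y, y_t)
        alpha, gamma, beta, mu, C = em_hyperparameters(L2n, Psi, y)   # EM update
        s = next_node_by_us(C)                             # next node by US
    return mu, C
```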
The computational complexity of a decision step in the proposed algorithm is governed by the number of EM iterations, the average number of binary-search iterations in the M step, and the graph size N. The computational cost is acceptable in view of the performance improvement, especially in time-insensitive scenarios with a large observation cost. Thanks to the sequential framework with EM hyperparameter updates, the proposed method efficiently selects samples to give accurate signal estimates, and the signal estimation is consistent.

Asymptotic Analysis
In the previous section, we developed a sequential sampling and estimation algorithm for approximately bandlimited graph signals under an incomplete system model, where hyperparameter estimation and sample selection are performed in an alternating way. The finite-budget performance of the proposed algorithm will be validated via simulation in the next section, where we will see its efficient sample selection and accurate signal estimation with a limited sampling budget and little prior knowledge. In this section, we emphasize that the proposed algorithm improves limited-budget performance without sacrificing consistency, which is not trivial for a sequential decision process.
Intuitively, as the number of observations grows, hyperparameters better fitting the true model will be picked out, really important nodes will be selected to sample, and finally an excellent estimation of the graph signal can be obtained. Here, we try to provide some theoretical support for this intuition. The asymptotic performance of the proposed algorithm is analyzed, and we state and prove the consistency of signal estimation in our method.
Before concentrating on the asymptotic performance of signal estimation, we first investigate the asymptotic behavior of sample selection in the proposed algorithm. Lemma 1. Let m_{i,t} denote the number of times vertex i is sampled up to step t. If the limit condition on the estimated hyperparameters holds, then the sampling ratio m_{i,t}/t of every vertex i is asymptotically bounded below by some δ > 0. That is, as t → ∞, all the nodes of the graph will be sampled again and again, each maintaining a sampling ratio greater than δ, including the least-sampled one. A proof of Lemma 1 is given in Appendix C. Two direct corollaries then follow.
Based on these corollaries, we now state our main theorem on the consistency of signal estimation in the proposed algorithm. Theorem 2. The signal posterior p(f | Ψ_{1:t}, y_{1:t}; α_t, γ_t, β_t) is consistent at the true value f*, provided that EM converges to MML hyperparameters satisfying the condition in Lemma 1.
The consistency of the signal posterior here means that, as t → ∞, the posterior distribution becomes a point mass at the true value of the graph signal. A proof of Theorem 2 is provided in Appendix D. A corollary then follows, which ensures that, under the specific conditions, as the sample size grows infinitely large, the MMSE/MAP estimate can get arbitrarily close to the true signal value, i.e., the graph signal can be estimated with arbitrary accuracy. This guarantees the asymptotic performance of the proposed algorithm in signal estimation.
The condition in Lemma 1 concerns the limiting behavior of the estimated hyperparameters.

Experiments
In this section, the finite-time performance of the proposed algorithm is validated via a series of simulation experiments, on both synthetic and real-world data.
The performance of the proposed algorithm is compared to the following methods:
• RndEM: Select sampling nodes uniformly at random, and estimate the hyperparameters and the signal using the proposed EM algorithm.
• OrcUS: Suppose that the hyperparameters are known in advance (the "oracle"). Select sampling nodes by US, and estimate the signal by MMSE/MAP.
• Anis2016: A heuristic sampling algorithm proposed by Anis et al. [7] that maximizes the cut-off frequency such that the sampling set is a uniqueness set for the bandlimited subspace. The graph signal is recovered in the bandlimited subspace by least squares. The estimation order of the cut-off frequency is set to k = 4.
• Perraudin2018: A nonuniform random sampling method proposed by Perraudin et al. [21] with probabilities related to local uncertainty. Inpainting is done by minimizing total variation. The kernel in the local uncertainty measure is ĝ(λ) = exp(−τλ) with τ = 8.
• Sakiyama2019: In this method by Sakiyama et al. [31], the graph signal is recovered by a linear combination of localization operators at the sampled nodes. The sampling set is designed such that the energy of the operator at each node is large while the overlapping areas are small. The kernel in the localization operator is ĝ(λ) = exp(−τλ) with τ = 0.5. The Chebyshev polynomial approximation order is P = 12, and the signal recovery order is k = 12.
RndEM and OrcUS serve as references to verify the effectiveness of the proposed algorithm. Anis2016, Perraudin2018, and Sakiyama2019 are three state-of-the-art methods that do not directly rely on a signal model. For Anis2016, the benefit of further increasing k is not noticeable in our experiments, but the computations become less numerically stable. For Perraudin2018 and Sakiyama2019, we use the same kernel type as in the original papers, and the parameters are chosen by experiment to suit our settings.
The normalized l_2-norm estimation error

‖f̂ − f*‖_2 / ‖f*‖_2

is taken as the performance index, where f̂ is the estimated signal and f* is the true value.

Simulations on Synthetic Data
Three graphs of representative types with widespread real-world applications are used for testing in our experiments:
• G_1: A random geometric graph with N = 100 vertices randomly placed in a 1-by-1 square, and edge weights assigned via a thresholded Gaussian kernel, w_{ij} = exp(−d_{ij}²/(2σ²)) if d_{ij} ≤ r and w_{ij} = 0 otherwise, where d_{ij} is the Euclidean distance between vertex i and vertex j, r = 0.2, and σ = 0.1 (a construction sketch is given after this list).
• G_2: A small-world graph with N = 100 nodes, generated by the Watts-Strogatz model [32] with mean node degree 6 and rewiring probability 0.2.
• G_3: A scale-free graph with N = 100 nodes, generated by the Barabási-Albert model [33,34] with 1 initial node, each newly added node connected to 1 existing node, and connection probabilities proportional to the degrees of existing nodes.
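A construction sketch for G_1 (the thresholded-Gaussian-kernel form written above is the standard one and is assumed here; names are ours):

```python
import numpy as np

def random_geometric_graph(N=100, r=0.2, sigma=0.1, rng=np.random.default_rng(0)):
    """N points uniform in the unit square; Gaussian-kernel weights, thresholded
    so that only pairs within radius r are connected."""
    xy = rng.random((N, 2))
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    W = np.exp(-d**2 / (2 * sigma**2))
    W[d > r] = 0.0                    # threshold at radius r
    np.fill_diagonal(W, 0.0)          # no self-loops
    return W
```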
The graph topologies as well as their eigenvalue distributions are visualized in Figure 1. Due to limited space, we mainly display and analyze our experimental results on G_1 in detail, whereas example results on G_2 and G_3 are shown at the end to demonstrate topology adaptability. Graph signals are generated from the ABL prior distribution in Equation (1), and observation noise follows the i.i.d. zero-mean Gaussian distribution in Equation (6). A variety of hyperparameters are considered in order to cover ABL graph signals with different bandwidth, strictness of bandlimitedness, and energy level, as well as scenarios with different signal-to-noise ratios (SNR). The SNR for a random ABL graph signal, denoted SNR_r, is defined in terms of the expected signal energy under the prior. For each setting, 100 graph signals are generated to evaluate the average performance of each method, and the average estimation errors under different sample sizes are computed and plotted.
First of all, we compare the performance of the proposed algorithm with RndEM and OrcUS on G_1. The hyperparameters are set to n = 2, λ_c = 2, g = 1, and SNR_r = 15 dB. As shown in Figure 2, although in the initial stage the estimation error of the proposed algorithm is similar to that of RndEM, as the sample size grows our performance gradually approaches that of OrcUS. This result is as expected: hyperparameter estimation improves with growing sample size, which leads to more efficient sample selection and more accurate signal estimation. Our superiority over RndEM (EM estimation without US sampling) demonstrates the effectiveness of choosing sampling nodes by US. The performance gap between our method and OrcUS (US sampling with known hyperparameters) tends to disappear when samples are abundant, also supporting the validity of hyperparameter estimation by EM.

Subsequently, the proposed algorithm is compared to three existing sampling and estimation methods, Anis2016, Perraudin2018, and Sakiyama2019, in scenarios without much prior knowledge about the signal and noise model. A series of experiments with different hyperparameters is conducted on G_1 to compare their performance across settings. In the first experiment, we fix g = 1 and SNR_r = 15 dB, and repeat the simulation with n = 1, 2 and λ_c = 2, 3, corresponding to ABL graph signals with different bandwidth and strictness of bandlimitedness. Simulation results are depicted in Figure 3. From the figures, the proposed algorithm obtains better or competitive estimation performance compared to the state-of-the-art methods on ABL graph signals with various properties. The proposed method significantly outperforms Perraudin2018 and Sakiyama2019 after a small number of initial samples in all four settings. As for Anis2016, when n is small or λ_c is large, which means the signal is far from strictly bandlimited or occupies a large bandwidth, we need far fewer samples than Anis2016 to recover the signal with tolerable error. This is because our method estimates the graph signal at all frequencies from the beginning thanks to the Bayesian framework, and our signal prior, with hyperparameters estimated from data, can better fit ABL graph signals with different properties. Therefore, the proposed algorithm is more robust to different signal properties and obtains better estimation results in most cases.

We continue to investigate the performance for ABL graph signals with different energy levels, as well as in scenarios with different signal-to-noise ratios. We fix n = 2 and λ_c = 2. When changing g from 0.01 to 100, SNR_r is fixed at 15 dB; when changing SNR_r from 5 dB to 25 dB, g is fixed at 1. The sample size is set to M = 60. According to Figure 4a, the proposed algorithm performs better for graph signals with lower energy. For one thing, even though the estimated variance in the signal prior also becomes larger when the signal energy is larger, the zero-mean Gaussian prior still prefers signals with lower energy. For another, as the variance gets larger, the signal prior becomes flatter, and thus weaker, in the sense that it has less preference among different signals; therefore, more observations are needed to reach the same estimation accuracy. Fortunately, this problem can be overcome by adaptive rescaling, i.e., dividing the observed signal values by a sufficiently large number, as long as the rough order of magnitude of the signal energy is known.
As for SNR, Figure 4b shows that although lower SNR may degrade the performance of the proposed algorithm, our method keeps outperforming the others when samples are adequate.

Example simulation results on G_2 and G_3 are given in Figure 5, with n = 2, g = 1, SNR_r = 15 dB, λ_c = 3 for G_2, and λ_c = 1 for G_3. The proposed algorithm also reaches higher estimation accuracy than the other methods given enough sampling budget, which demonstrates its robustness to different graph topologies.

In summary, without much prior knowledge, the proposed algorithm can adjust to ABL graph signals with different bandwidth, strictness of bandlimitedness, energy level, and SNR. In most cases, beyond a small number of initial samples, the proposed algorithm estimates the graph signals with higher accuracy than the existing methods.
Besides the validation of the limited-budget performance, we also verify the consistency of the proposed algorithm through a simulation experiment. We generate an ABL graph signal on G_1 with n = 2, λ_c = 2, and g = 1, and the SNR is set to 15 dB. To observe the asymptotic performance, the maximum sample size is set to 10^5, which is 10^3 times the graph size. We run the proposed algorithm on the graph signal 50 times to examine the mean and variance of the estimation errors.
As shown in Figure 6, the mean normalized l_2-norm error tends to decrease continuously as the sample size grows, and has dropped to about 5 × 10^{-3} when the sample size reaches 10^5. The standard deviation also tends to decline with growing sample size, and is around 3 × 10^{-4} with 10^5 samples. Other ABL graph signals yield similar results. This agrees with the consistency of the proposed algorithm, which was stated and theoretically proved in Section 4.

Figure 6. Result of the consistency verification experiment. The graph signal is generated on G_1 with n = 2, λ_c = 2, g = 1, and the observation SNR is 15 dB. The curve denotes the mean of the normalized l_2-norm error, and the shaded region corresponds to ±1 standard deviation around the mean. In the overview figure, the y-axis is logarithmic to better display the decrease of the mean estimation error, whereas in the partial enlargement it is linear to present the decline of the standard deviation.

A Real-World Example of Temperature Sensor Network
As an application example, we apply the proposed algorithm to the air temperatures measured by weather stations in China [35]. This example shows a potential application of the proposed algorithm in sensor selection.
We construct a K-nearest-neighbor (K-NN) graph from the N = 381 weather stations with K = 5, where each node corresponds to a weather station and great-circle distances between stations are used in the K-NN search. Edge weights are assigned via a Gaussian kernel, w_{ij} = exp(−d_{ij}²/(2σ²)), where d_{ij} is the great-circle distance between station i and station j in degrees, and σ = 2. The air temperatures measured by these weather stations can then be seen as a graph signal. We consider two such signals in our experiment: f_1 at 0:00 on 1 January 2020, a winter night, and f_2 at 12:00 on 1 July 2020, a summer noon. The two graph signals and their spectra are visualized in Figure 7. As expected, the signals are smooth in the vertex domain and approximately bandlimited in the frequency domain.

We apply the proposed sequential sampling and estimation algorithm to these graph signals. Additive i.i.d. zero-mean Gaussian observation noise following Equation (6) is added, with SNR_d = 10 lg( f^T f / (N/β) ) equal to 15 dB. The known hyperparameter n in the signal prior of Equation (1) is set to 1/2, which means the smoothness prior of Equation (3) is used. The observed signal values are divided by 100 as preprocessing, and accordingly the estimated graph signal is multiplied by 100 at the output.

The performance of the proposed algorithm is compared to Anis2016, Perraudin2018, and Sakiyama2019. As our first sampling node is chosen at random, Perraudin2018 is a random sampling method, and the observation noise also introduces randomness, we test each method on the same signal 20 times and compare the average estimation errors. As shown in Figure 8, although the proposed algorithm needs some initial samples to "warm up", after that the estimation error decreases rapidly and surpasses Anis2016, Perraudin2018, and Sakiyama2019 within a short time, and we keep leading as the sample size grows further. This is because our sequential method with unknown hyperparameters requires a certain number of samples to obtain a reasonably good estimate of the signal and noise distributions. Then, based on the latest signal bandwidth, energy level, and noise precision estimated from the data, we are able to choose the subsequent sampling nodes more efficiently and estimate the graph signal more accurately than the other methods, which take no account of such signal and noise properties.
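For reference, a possible construction of the K-NN weather-station graph described above (the great-circle formula and the symmetrization are our choices for illustration; latitudes and longitudes are assumed to be given in degrees):

```python
import numpy as np

def knn_station_graph(lat_deg, lon_deg, K=5, sigma=2.0):
    """K-NN graph over stations with Gaussian-kernel weights on great-circle
    distances (in degrees); the K-NN edge set is symmetrized by taking the union."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    cosang = (np.sin(lat)[:, None] * np.sin(lat)[None, :]
              + np.cos(lat)[:, None] * np.cos(lat)[None, :]
              * np.cos(lon[:, None] - lon[None, :]))
    d = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))   # central angle in degrees
    N = len(lat)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(d[i])[1:K + 1]                     # K nearest, excluding self
        W[i, nbrs] = np.exp(-d[i, nbrs]**2 / (2 * sigma**2))
    return np.maximum(W, W.T)                                # symmetrize
```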

Conclusions
In this paper, a sequential sampling and estimation algorithm is proposed for approximately bandlimited graph signals without complete prior knowledge of the signal and noise properties. In the proposed algorithm, hyperparameter estimation by EM based on historical observations and sample selection by US with the latest estimated hyperparameters are performed in an alternating way. We theoretically prove the consistency of signal estimation in the proposed algorithm under specific conditions, and validate its finite-time performance through a series of simulations. The proposed method provides a novel and practical sequential framework for graph signal sampling with unknown hyperparameters in the system model, and the experiment on temperature data shows its potential application to sensor selection in sensor networks with measurements that are smooth over the topology. Other potential applications include opinion prediction in social networks, assuming that people with strong social connections are likely to hold similar opinions, and active semisupervised learning tasks on similarity graphs.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A. Concavity of Objective Function in M Step
The objective function in Equation (16) is concave with respect to α, γ, β, as the Hessian matrix is negative semi-definite for all α, γ, β.

Appendix B. Search Method for Optimization Problem in M Step
To solve Equation (16), we maximize E_{l−1} by taking the partial derivatives with respect to α, γ, β and setting them to zero, as in Equations (21)-(23).
From Equation (23), the M-step update of β has the closed form β_l = M / E_{l−1}[(y − Ψf)^T(y − Ψf)], which is independent of the update of α and γ.
As for α and γ, since the approximately bandlimited prior in Equation (1) requires α, γ > 0, if Equations (21) and (22) hold for some α, γ > 0, then this (α, γ) is the solution for (α_l, γ_l); otherwise, (α_l, γ_l) is determined using gradient information together with the concavity of E_{l−1}. Combining Equations (21) and (22) constrains (α, γ) to a straight line in the α-γ plane, of which only the segment PQ in the first quadrant is considered, as shown in Figure A1, to ensure α, γ > 0. The endpoint P has α-coordinate 0, and the gradient at P is always rightward, as illustrated in Figure A1; the gradient at Q is either upward or downward, depending on the sign of ∆.
If ∆ > 0, the gradient at Q is upward, as in Figure A1a. Let d be the unit vector in the direction PQ. The directional derivative ∂E_{l−1}/∂d is monotonic along PQ, so its zero-crossing Z can be found using, for example, binary search. Notice that the gradient g|_Z satisfies both g|_Z ⊥ PQ and g|_Z ⊥ OZ, so g|_Z = 0, i.e., Equations (21) and (22) hold at Z. Therefore, Z is the solution point for (α_l, γ_l).
Otherwise, if ∆ ≤ 0, the gradient at Q is zero or downward, as in Figure A1b. By the concavity of E_{l−1}, the endpoint Q is the optimal point in the first quadrant. In this case, γ_l is set to 0, and α_l is obtained by solving Equation (21) with γ = 0; the signal values on different nodes then become i.i.d. under the prior, and only the total energy of the graph signal is restricted.

Appendix C. Proof of Lemma 1
Proof. Let {m_{(r),t}}_{r=1}^N be the order statistics of {m_{i,t}}_{i∈V} satisfying m_{(1),t} ≤ · · · ≤ m_{(N),t}, and denote the sorted vertex indices by {i_{(r),t}}_{r=1}^N. Consider η_{i,t} = m_{i,t}/t and η_{(r),t} = m_{(r),t}/t. Note that lim inf_{t→∞} η_{(r),t} always exists, as η_{i,t} is bounded in [0, 1]; moreover, the set defining R is not empty. We are to show that lim inf_{t→∞} ρ_t ≥ ρ for some ρ > 0, which contradicts the definition of R. We then arrive at the assertion that R = 0, and Lemma 1 is proved.
Then, we get exp(tε_2) D_t(α, β*) → ∞ as t → ∞. Therefore, the denominator can be bounded accordingly for all large t with high probability, where ε_2 can be any positive number. Pick 0 < ε_2 < ε_1. For ε_3 = ε_1 − ε_2 > 0, the desired bound holds for all sufficiently large t with high probability. We finally conclude that P(U_ε^c | Ψ_{1:t}, y_{1:t}; α_t, β_t) converges to 0 with high probability as t → ∞ for any positive ε.