Edge-Preserving Denoising of Image Sequences

To monitor the Earth's surface, the satellites of the NASA Landsat program have been providing image sequences of any given region on the Earth continuously over time. These image sequences are a unique resource for studying the Earth's surface, the changes of the Earth's resources over time, and their implications in agriculture, geology, forestry, and more. Beyond the natural sciences, image sequences are also commonly used in functional magnetic resonance imaging (fMRI) in medical studies for understanding the functioning of the brain and other organs. In practice, observed images almost always contain noise and other contaminations. For a reliable subsequent image analysis, it is important to remove such contaminations in advance. This paper focuses on image sequence denoising, which has not yet been well discussed in the literature. To this end, an edge-preserving image denoising procedure is suggested. The suggested method is based on a jump-preserving local smoothing procedure, in which the bandwidths are chosen such that the possible spatio-temporal correlations in the observed image intensities are accommodated properly. Both theoretical arguments and numerical studies show that this method works well in the various cases considered.


Introduction
The Landsat project, led by the US Geological Survey (USGS) and NASA, has launched eight satellites since 1972 to continuously provide scientifically valuable images of the Earth's surface. These images can be freely accessed by researchers around the world (cf., Zanter [1]). This rich archive of Landsat images has become a major resource for scientific research about the Earth's surface and its resources in different scientific disciplines, including forest science, climate science, agriculture, ecology, fire science, and many more. As an example, Figure 1 shows two images of the Las Vegas area in Nevada taken in 1984 and 2007, respectively. These two images clearly show the increasing urban sprawl of Las Vegas during the 23-year period and, consequently, how dramatically the environment in that region has changed. The current satellite (i.e., Landsat 8) can deliver an image of a given region roughly every 16 days. Thus, for each region there is a sequence of images collected sequentially over time and stored in the Landsat database, which keeps growing. Image sequences are commonly used in many other applications as well, including functional magnetic resonance imaging (fMRI) in neuroscience and quality control in manufacturing industries (Qiu [2]). In practice, observed images usually contain noise and other contaminations (Gonzalez and Woods [3]). For reliable subsequent image analyses, such contaminations should be removed in advance. In the image processing literature, the removal of noise from an observed image is referred to as image denoising. This paper focuses on image denoising for observed image sequences.
In the literature, there has been extensive discussion on image denoising (Qiu [4]). Many early methods in the computer science literature are based on the Markov random field (MRF) framework, in which the observed image intensities are assumed to have the Markov property that the observed intensity at a given pixel depends only on the observed intensities in a neighborhood of that pixel (Geman and Geman [5]).
Then, if the true image is assumed to have a prior distribution which is also an MRF, its posterior distribution would be an MRF too, and consequently, the true image can be estimated by the maximum a posteriori (MAP) estimator (e.g., Geman and Geman [5], Besag [6], Fessler et al. [7]). Other popular image denoising methods include those based on diffusion equations (e.g., Perona and Malik [8], Weickert [9]), total variation (Beck and Teboulle [10], Rudin et al. [11], Yuan et al. [12]), wavelet transformations (e.g., Chang et al. [13], Mrázek [14]), jump regression analysis (e.g., Gijbels et al. [15], Qiu [16], Qiu [17], Qiu and Mukherjee [18]), adaptive weights smoothing (e.g., Polzehl and Spokoiny [19]), spatial adaptation (e.g., Kervrann and Boulanger [20]), and more. Besides noise removal, edge preservation is important for image denoising because edges are important structures of images. Some of the methods mentioned above can preserve edges well, such as the ones based on jump regression analysis, total variation, and wavelet transformations. Thorough surveys of popular edge-preserving image denoising methods can be found in Jain and Tyagi [21] and Qiu [4]. Although there are already some existing methods for edge-preserving image denoising, almost all of them handle observed images taken at a single time point. So far, we have not found much discussion about denoising image sequences, which is the focus of the current paper. A given image sequence often describes a gradual change in appearance over time, driven by the underlying process. For instance, the sequence of images of the Las Vegas area acquired by the Landsat satellite (cf., Figure 1) describes the gradual change of the Earth's surface in that area over time. As mentioned above, two consecutive images in the sequence acquired by the current Landsat satellite are only about 16 days apart, so their difference should be very small. However, the images could be substantially different after a long period of time, as shown in Figure 1. In such applications, it should be reasonable to assume that edge locations in different images either do not change or change gradually over time. To handle such image sequences, the neighboring images should be useful when denoising the image at a given time point; in other words, information in neighboring images should be shared during image denoising. Motivated by these features of image sequences, we propose an edge-preserving image denoising procedure for analyzing image sequences in this paper. Our proposed method is based on jump regression analysis (JRA), which is used for regression modeling when the underlying regression function has jumps or other singularities (Qiu [22]). It is a local smoothing procedure, and the possible spatio-temporal correlation in the observed image data is accommodated properly in its construction. Both theoretical arguments and numerical studies show that this method works well in various different cases.
The remaining parts of the article are organized as follows. The proposed method is described in detail in Section 2. Its statistical properties and the numerical studies about its performance in different finite-sample cases are presented in Section 3. Several concluding remarks are provided in Section 4. Some technical details are given in Appendix A.

Materials and Methods
This section describes our proposed method in two parts. A JRA model for describing an image sequence and the model estimation are discussed in Section 2.1. Selection of several parameters used in model estimation is discussed in Section 2.2.

JRA Model and Its Estimation
To describe an image sequence, let us consider the following JRA model:

Z_ijk = f(x_i, y_j; t_k) + ε_ijk, for i = 1, 2, ..., n_x, j = 1, 2, ..., n_y, k = 1, 2, ..., n_t, (1)

where Z_ijk is the observed image intensity level at the (i, j)-th pixel (x_i, y_j) and at the k-th time point t_k, f(x_i, y_j; t_k) is the true image intensity level, and ε_ijk is the pointwise random noise with mean 0 and variance σ^2. In model (1), spatio-temporal data correlation is allowed; namely, {ε_ijk} could be correlated over i, j and k. For image data, the pixel locations are usually regularly spaced. Without loss of generality, it is assumed that they are equally spaced in the design space Ω = [0, 1] × [0, 1]; namely, (x_i, y_j) = (i/n_x, j/n_y), for all i and j, where n_x and n_y are the numbers of rows and columns, respectively. The observation times {t_k, k = 1, 2, ..., n_t} are also assumed to be equally spaced in the time interval [0, 1]. The true image intensity function f(x, y; t), for (x, y) ∈ Ω, is continuous in the design space Ω at each t ∈ [0, 1], except on the edges where it has jumps.
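As a concrete illustration of model (1), the Python sketch below generates noisy observations Z_ijk on the equally spaced grid described above. The piecewise intensity function with a slowly moving circular edge is a hypothetical example of ours (not the paper's simulation function), and the noise is taken i.i.d. for brevity, although the model allows spatio-temporal correlation.

```python
import numpy as np

# Sketch of model (1): Z_ijk = f(x_i, y_j; t_k) + eps_ijk on an equally
# spaced grid in [0,1]^2 x [0,1]. The intensity function below is a
# hypothetical example (not the paper's): it jumps across a circle whose
# radius drifts slowly in t, mimicking edges that move gradually over time.
def true_intensity(x, y, t):
    radius = 0.4 + 0.1 * t              # edge location changes slowly in t
    return np.where(x**2 + y**2 <= radius**2, 1.0, 0.2)

def simulate_sequence(nx=64, ny=64, nt=10, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.arange(1, nx + 1) / nx       # (x_i, y_j) = (i/n_x, j/n_y)
    y = np.arange(1, ny + 1) / ny
    t = np.arange(1, nt + 1) / nt       # equally spaced times in [0, 1]
    X, Y, T = np.meshgrid(x, y, t, indexing="ij")
    f = true_intensity(X, Y, T)
    eps = sigma * rng.standard_normal(f.shape)  # i.i.d. here; the model
                                                # allows correlated noise
    return X, Y, T, f, f + eps

X, Y, T, f, Z = simulate_sequence()
print(Z.shape)  # (64, 64, 10)
```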
To estimate the unknown image intensity function f(x, y; t) in model (1), we consider using a local smoothing method, instead of a global smoothing method (e.g., the smoothing spline method), because of the large amount of data involved in the current problem. Moreover, it has been well discussed in the JRA literature that conventional smoothing methods (e.g., conventional local kernel smoothing methods) would not work well for estimating models like (1), where the true image intensity function f(x, y; t) has jumps at the edges, because the jumps would be blurred by such methods (cf., Qiu [22]). In this paper, we suggest a jump-preserving local smoothing method for estimating (1), described in detail below. For a given point (x, y; t) ∈ Ω × [0, 1], define the local neighborhood

O(x, y; t) = {(x_i, y_j; t_k) : |x_i − x| ≤ h_x, |y_j − y| ≤ h_y, |t_k − t| ≤ h_t},

where h_x, h_y and h_t are the bandwidths in the x-, y-, and t-axis, respectively. In O(x, y; t), we first consider the following local linear kernel (LLK) smoothing procedure (Fan and Gijbels [23]):

min_{a,b,c,d} Σ_{(x_i,y_j;t_k) ∈ O(x,y;t)} [Z_ijk − a − b(x_i − x) − c(y_j − y) − d(t_k − t)]^2 K_ijk, (2)

where K_ijk denotes the kernel weight at (x_i, y_j; t_k), constructed from a kernel function and the bandwidths (h_x, h_y, h_t). The solution of (2), denoted as (a(x, y; t), b(x, y; t), c(x, y; t), d(x, y; t)) in (3), has a closed form that can be expressed in terms of the quantities m_rsl = Σ_ijk (x_i − x)^r (y_j − y)^s (t_k − t)^l K_ijk, for r, s, l = 0, 1, 2. The LLK estimator of f(x, y; t) is defined to be a(x, y; t). The estimated gradient direction of f(x, y; t) at (x, y; t) is G(x, y; t) = (b(x, y; t), c(x, y; t), d(x, y; t)), which indicates the direction in which the plane fitted in O(x, y; t) by the LLK procedure (2) increases the fastest. If there is an edge surface in O(x, y; t), then G(x, y; t) would be (approximately) orthogonal to that surface.
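The LLK fit in (2) is an ordinary weighted least squares problem, so it can be sketched in a few lines of Python. The helper below fits the local plane a + b(x_i − x) + c(y_j − y) + d(t_k − t) with product kernel weights and returns the intercept (the LLK estimate) together with the gradient direction G(x, y; t). The product Epanechnikov kernel and all function names are our assumptions; the paper leaves the kernel generic.

```python
import numpy as np

# Minimal sketch of the LLK fit (2)-(3) at one point (x0, y0; t0):
# weighted least squares of Z on (1, x_i-x0, y_j-y0, t_k-t0) with
# product kernel weights. Names are ours, not the paper's.
def epanechnikov(v):
    return np.where(np.abs(v) <= 1, 0.75 * (1 - v**2), 0.0)

def llk_fit(X, Y, T, Z, x0, y0, t0, hx, hy, ht, kernel=epanechnikov):
    w = (kernel((X - x0) / hx) * kernel((Y - y0) / hy)
         * kernel((T - t0) / ht)).ravel()
    mask = w > 0
    # design matrix for the local plane a + b*dx + c*dy + d*dt
    D = np.column_stack([np.ones(mask.sum()),
                         (X.ravel() - x0)[mask],
                         (Y.ravel() - y0)[mask],
                         (T.ravel() - t0)[mask]])
    z = Z.ravel()[mask]
    sw = np.sqrt(w[mask])
    coef, *_ = np.linalg.lstsq(D * sw[:, None], z * sw, rcond=None)
    a, b, c, d = coef
    return a, np.array([b, c, d])   # LLK estimate and gradient direction G

# toy check on a noiseless plane: the fit recovers the plane exactly
x = np.linspace(0, 1, 21)
X, Y, T = np.meshgrid(x, x, x, indexing="ij")
Z = 1.0 + 2.0 * X - 1.0 * Y + 0.5 * T
a, G = llk_fit(X, Y, T, Z, 0.5, 0.5, 0.5, 0.2, 0.2, 0.2)
print(round(float(a), 3), np.round(G, 3))
```

On a noiseless plane the fit recovers the plane's coefficients exactly, which is a convenient sanity check for any implementation of (2).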
In cases when there are no edges in the neighborhood O(x, y; t), a(x, y; t) would be a good estimate of f(x, y; t). Otherwise, it cannot be a good estimate: because a(x, y; t) is a weighted average of all observed image intensities in O(x, y; t), the jumps in the image intensity surface would be smoothed out in the weighted average, and the estimate a(x, y; t) would be biased for estimating f(x, y; t). To overcome that limitation, we consider the following one-sided smoothing idea. Let O(x, y; t) be divided into two parts, O^(1)(x, y; t) and O^(2)(x, y; t), by a plane that passes through (x, y; t) and is perpendicular to G(x, y; t). See Figure 2 for an example. Then, in cases when there is an edge surface in O(x, y; t), that plane would be (approximately) parallel to the edge surface. Consequently, at least one of O^(1)(x, y; t) and O^(2)(x, y; t) would be (mostly) located on a single side of the edge surface in such cases. Now, let us consider the following one-sided LLK smoothing procedure: for l = 1, 2,

min_{a,b,c,d} Σ_{(x_i,y_j;t_k) ∈ O^(l)(x,y;t)} [Z_ijk − a − b(x_i − x) − c(y_j − y) − d(t_k − t)]^2 K_ijk. (4)

The solution of (4) with respect to (a, b, c, d) is denoted as (a^(l)(x, y; t), b^(l)(x, y; t), c^(l)(x, y; t), d^(l)(x, y; t)), for l = 1, 2.
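The split of O(x, y; t) into O^(1) and O^(2) reduces to checking the sign of the inner product between (x_i − x, y_j − y, t_k − t) and G(x, y; t), as the following sketch shows (names are ours; ties on the dividing plane are assigned to the first half here by convention):

```python
import numpy as np

# Sketch of the one-sided neighborhoods O^(1), O^(2): the local window
# around (x0, y0; t0) is cut by the plane through the point that is
# perpendicular to the estimated gradient G, so each half tends to fall
# on one side of an edge surface.
def split_neighborhood(X, Y, T, x0, y0, t0, G):
    dx = np.stack([X - x0, Y - y0, T - t0], axis=-1)
    side = np.tensordot(dx, G, axes=([-1], [0]))  # signed distance along G
    return side >= 0, side < 0                    # masks for O^(1), O^(2)

x = np.linspace(0, 1, 5)
X, Y, T = np.meshgrid(x, x, x, indexing="ij")
up, down = split_neighborhood(X, Y, T, 0.5, 0.5, 0.5,
                              np.array([1.0, 0.0, 0.0]))
print(up.sum() + down.sum(), X.size)  # the two halves partition the window
```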
Intuitively, when there are no edges in O(x, y; t), a(x, y; t), a^(1)(x, y; t) and a^(2)(x, y; t) are all consistent estimates of f(x, y; t) under some regularity conditions. In such cases, a(x, y; t) would be preferred, since it averages more observations and consequently would have a smaller variance. When there are edges in O(x, y; t), a(x, y; t) would not be a good estimate of f(x, y; t), as explained above, but one of a^(1)(x, y; t) and a^(2)(x, y; t) should estimate f(x, y; t) well. Therefore, in all cases, at least one of the three estimators a(x, y; t), a^(1)(x, y; t) and a^(2)(x, y; t) should estimate f(x, y; t) well. Next, we need to choose a good estimator from a(x, y; t), a^(1)(x, y; t) and a^(2)(x, y; t) based on the observed data, which is not straightforward, partly because we do not know in advance whether there are edges in the neighborhood O(x, y; t) and, if so, whether the edges are mostly contained in O^(1)(x, y; t) or O^(2)(x, y; t). To overcome this difficulty, let us consider the following weighted residual mean squares (WRMS) of the local plane fitted by the LLK procedure (2):

e(x, y; t) = Σ_{O(x,y;t)} [Z_ijk − a(x, y; t) − b(x, y; t)(x_i − x) − c(x, y; t)(y_j − y) − d(x, y; t)(t_k − t)]^2 K_ijk / Σ_{O(x,y;t)} K_ijk. (5)

The above WRMS measures how well the fitted local plane describes the observed data in O(x, y; t). If there are edges in O(x, y; t), this quantity would be relatively large, due mainly to the jumps in the image intensity surface. Otherwise, it would be relatively small. So, the quantity e(x, y; t) contains useful information about the existence of edges in O(x, y; t). Similarly, we can define WRMS values for the two one-sided local planes fitted in O^(1)(x, y; t) and O^(2)(x, y; t), denoted as e^(1)(x, y; t) and e^(2)(x, y; t). Based on these WRMS values, we define our edge-preserving estimator of f(x, y; t) to be

f̂(x, y; t) = a(x, y; t) I(D(x, y; t) ≤ u) + a^(1)(x, y; t) I(D(x, y; t) > u) I(e^(1)(x, y; t) < e^(2)(x, y; t)) + a^(2)(x, y; t) I(D(x, y; t) > u) I(e^(2)(x, y; t) < e^(1)(x, y; t)) + [a^(1)(x, y; t) + a^(2)(x, y; t)]/2 · I(D(x, y; t) > u) I(e^(1)(x, y; t) = e^(2)(x, y; t)), (6)

where D(x, y; t) = max(e(x, y; t) − e^(1)(x, y; t), e(x, y; t) − e^(2)(x, y; t)), I(·) is the indicator function, and u > 0 is a threshold parameter. By (6), f̂(x, y; t) is defined to be (essentially) one of a(x, y; t), a^(1)(x, y; t) and a^(2)(x, y; t). The quantity a(x, y; t), which is obtained from the entire neighborhood O(x, y; t), is chosen if the observed data indicate no edges in O(x, y; t), as reflected by the event D(x, y; t) ≤ u. Otherwise, the one of the two one-sided quantities a^(1)(x, y; t) and a^(2)(x, y; t) with the smaller WRMS value is chosen. Although, theoretically, the event (e^(1)(x, y; t) = e^(2)(x, y; t)) would have probability zero, the last term on the right-hand side of (6) is still included for completeness of the definition of f̂(x, y; t), and because e^(1)(x, y; t) and e^(2)(x, y; t) could be treated as equal in certain algorithms when their values are close.
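The decision rule in (6) can be summarized in a few lines once the three estimates and their WRMS values are available. The sketch below mirrors that logic with scalar inputs; the function names and the averaging convention for ties are ours.

```python
import numpy as np

# Sketch of the selection rule (6): keep the full LLK estimate when
# D = max(e - e1, e - e2) <= u (no evidence of an edge); otherwise keep
# the one-sided estimate with the smaller WRMS (5).
def wrms(residuals, weights):
    return np.sum(weights * residuals**2) / np.sum(weights)

def select_estimate(a, a1, a2, e, e1, e2, u):
    D = max(e - e1, e - e2)
    if D <= u:
        return a              # no edge detected in the neighborhood
    if e1 < e2:
        return a1             # side 1 fits its data better
    if e2 < e1:
        return a2
    return 0.5 * (a1 + a2)    # tie: average, as in the last term of (6)

# near an edge the full-window WRMS is inflated, so a one-sided fit wins
print(select_estimate(a=0.6, a1=1.0, a2=0.2,
                      e=0.30, e1=0.02, e2=0.25, u=0.05))  # 1.0
```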

Parameter Selection
In our proposed method described in Section 2.1, there are four parameters, h_x, h_y, h_t and u, that need to be chosen properly in advance. For that purpose, it is natural to consider the cross-validation (CV) procedure, especially in the current research problem where the observed data are quite large in size. However, it has been well demonstrated in the literature that the conventional CV procedure would not work well in cases when the observed data are autocorrelated, because it cannot effectively distinguish the data correlation structure from the mean structure (cf., Altman [24], Opsomer et al. [25]). In the current problem, spatio-temporal data correlation is possible in almost all applications. Thus, the conventional CV procedure is not appropriate in such cases. In the univariate regression setup, Brabanter et al. [26] suggested a modified CV procedure for choosing smoothing parameters in cases with correlated data. This procedure is generalized here for choosing the parameters h_x, h_y, h_t and u used in the proposed method, as described below. Let the modified CV score for choosing h_x, h_y, h_t and u be defined as

CV(h_x, h_y, h_t, u) = (1/N) Σ_ijk [Z_ijk − f̂_{−(ijk)}(x_i, y_j; t_k)]^2, (7)

where N = n_x n_y n_t and f̂_{−(ijk)}(x_i, y_j; t_k) is the leave-one-out estimate of f(x_i, y_j; t_k) by (2)-(6) after the observation Z_ijk is removed from the estimation process and after the kernel function is replaced by the so-called ε-optimal bimodal kernel function K_ε(v) defined in (8), where 0 < ε < 1 is a parameter. Based on a large simulation study, Brabanter et al. [26] suggested choosing ε to be 0.1, which is adopted in this paper. Then, by the above modified CV procedure, (7) and (8), the parameters h_x, h_y, h_t and u can be chosen by minimizing the modified CV score CV(h_x, h_y, h_t, u).
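A minimal one-dimensional sketch of the modified CV idea is given below: the smoother uses a bimodal kernel that vanishes at 0, so each observation is excluded from its own fit and its immediate (likely correlated) neighbors are down-weighted. The particular bimodal kernel v^2 e^{−v^2}, the local-constant fit, and the 1-D setting are all our simplifications; the paper's ε-optimal kernel and the full four-parameter search are not reproduced here.

```python
import numpy as np

# Sketch of the modified CV score (7) in 1-D: leave-one-out residuals
# where the smoother uses a bimodal kernel with K(0) = 0, which is a
# common device in the correlated-errors literature (NOT the paper's
# epsilon-optimal kernel, whose exact form is not reproduced here).
def bimodal_kernel(v):
    # K(v) ∝ v^2 exp(-v^2); K(0) = 0, two symmetric modes
    return v**2 * np.exp(-(v**2)) * 2.0 / np.sqrt(np.pi)

def modified_cv_score(tgrid, z, h):
    n = len(z)
    score = 0.0
    for i in range(n):
        w = bimodal_kernel((tgrid - tgrid[i]) / h)
        w[i] = 0.0                       # leave-one-out (K(0)=0 already)
        if w.sum() == 0:
            continue
        fit = np.sum(w * z) / np.sum(w)  # local-constant fit for brevity
        score += (z[i] - fit) ** 2
    return score / n

tgrid = np.linspace(0, 1, 50)
z = np.sin(2 * np.pi * tgrid)            # smooth noiseless test curve
scores = {h: modified_cv_score(tgrid, z, h) for h in (0.02, 0.05, 0.2)}
print(scores)
```

In this noiseless example the score grows with the bandwidth, since a wide window oversmooths the sine curve; with noisy, correlated data the minimizer of (7) trades off that bias against the correlation-inflated variance.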

Statistical Properties
In this part, we discuss some statistical properties of the proposed edge-preserving image sequence denoising method (2)-(6). First, we have the following proposition.

Proposition 1. Assume that (i) the kernel function K(v) used in (2) is a Lipschitz-1 continuous density function; (ii) the noise terms {ε_ijk, i = 1, 2, ..., n_x, j = 1, 2, ..., n_y, k = 1, 2, ..., n_t} in model (1) form a strong mixing stochastic process whose strong mixing coefficients α(d) have the property that α(d) ≤ c_1 σ^2 ρ^{c_2 d}, where c_1, c_2 > 0 and 0 < ρ < 1 are constants; and (iii) E(ε_111^6) < ∞. Let N = n_x n_y n_t, H = h_x h_y h_t, n_min = min(n_x, n_y, n_t), and h_min = min(h_x, h_y, h_t). Then, the stated convergence results hold for any (x, y; t).

Based on the results in Proposition 1, we can derive the following properties of the LLK estimates defined in (3).

Theorem 1. Besides the conditions in Proposition 1, we further assume that the true image intensity function f(x, y; t) has continuous first-order partial derivatives with respect to x, y and t in the design space Ω, except at the edge curves. Then, the stated asymptotic expressions hold for any (x, y; t) ∈ Ω_h \ J_h, where J_h denotes the set of points in Ω_h whose distance to the edge set J is within the order of the bandwidths, S is the set of singular points in J (including the crossing points of two or more edges, points on an edge surface at which the edge surface does not have a unique tangent surface, and points in J at which the jump sizes in f(x, y; t) are zero), S_h is defined from S analogously to J_h, (x_τ, y_τ; t_τ) is the projection of (x, y; t) onto J with the Euclidean distance between the two points being c(h_x^2 + h_y^2 + h_t^2)^{1/2}, for a constant 0 < c < 1, and f_−(x_τ, y_τ; t_τ) is the smaller of the two one-sided limits of f(x, y; t) at (x_τ, y_τ; t_τ). In cases when O(x, y; t) contains jumps, it is assumed without loss of generality that O(x, y; t) is divided by the edge surface into two parts, I_1 and I_2, with a positive jump size d_τ from I_1 to I_2 at (x_τ, y_τ; t_τ), and Q^(1) and Q^(2) are the two corresponding parts in the support of K(u, v)K(w).
The next two theorems establish the consistency of the proposed edge-preserving image denoising procedure (2)-(6). First, we have the following theorem about the WRMS values defined in (5).
Theorem 3. Under the conditions in Theorem 2 and the extra assumption that the threshold parameter u = u_N → 0 as N → ∞, the estimator f̂(x, y; t) is consistent for any (x, y; t).
The proofs of these theoretical results are given in Appendix A.

Numerical Studies
In this part, we study the numerical performance of our proposed method for denoising an image sequence. First, we consider a simulation example with a specified true image intensity function in model (1). In model (1), the random errors {ε_ijk, i = 1, 2, ..., n_x, j = 1, 2, ..., n_y, k = 1, 2, ..., n_t} are generated by the function spatialnoise() in the R package neuRosim (cf., Welvaert et al. [27]). In that R function, there are two parameters, ρ and σ, to specify in advance, where ρ controls the data autocorrelation in all three dimensions and σ is the common standard deviation of the random errors. In all our examples, σ is fixed at 0.1, 0.2 or 0.3, and ρ is fixed at 0.1, 0.3 or 0.5, to study the possible impact of the data noise level and data correlation on the performance of the proposed method. Without loss of generality, we set n_x = n_y in all examples. In the model estimation procedure (2)-(6), we set h_x = h_y, and the kernel function K(v) is chosen to be a truncated Gaussian density function. In cases when σ = 0.1, 0.2 or 0.3, n_x = 64 or 128, n_t = 50 or 100, and ρ = 0.1, 0.3 or 0.5, the MSE values of the estimator f̂(x, y; t) defined in (6) are presented in Table 1, along with the corresponding parameters h_x, h_t and u selected by the modified CV procedure (7) and (8). In each case considered, the MSE value is computed based on 10 replicated simulations. For comparison purposes, the optimal MSE value of the estimator f̂(x, y; t), obtained when its parameters (h_x, h_t and u) are chosen such that the MSE reaches its minimum in each case considered, is also presented in the table, along with the corresponding parameter values. From the table, we can draw the following conclusions. (i) The MSE values are smaller when either n_x or n_t is larger, which confirms the consistency results discussed in Section 3.1. (ii) When ρ is larger (i.e., the spatio-temporal data correlation is stronger), the MSE values are larger. So, data correlation does have an impact on the performance of the proposed method, which is intuitively reasonable. (iii) By comparing the MSE and the optimal MSE values, we can see that the MSE values are usually larger than their optimal counterparts, but their differences are small in almost all cases considered. This indicates that the modified CV procedure (7) and (8) for determining the values of the parameters (h_x, h_t, u) is quite effective. (iv) The parameter values chosen by the modified CV procedure (7) and (8) are quite close to the optimal parameter values in most cases considered.
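Since neuRosim's spatialnoise() is an R function, the following Python stand-in may help readers reproduce the flavor of the noise used here: it imposes an AR(1)-type correlation with coefficient ρ along each of the three axes and scales the result to standard deviation σ. This is an assumption-laden sketch, not a port of the R function's exact algorithm.

```python
import numpy as np

# Hypothetical Python stand-in for neuRosim::spatialnoise(): a separable
# AR(1)-type correlation with coefficient rho along each dimension,
# scaled to standard deviation sigma. The recursion preserves unit
# marginal variance before the final scaling.
def ar1_filter(e, rho, axis):
    out = np.moveaxis(e.copy(), axis, 0)
    for k in range(1, out.shape[0]):
        out[k] = rho * out[k - 1] + np.sqrt(1 - rho**2) * out[k]
    return np.moveaxis(out, 0, axis)

def correlated_noise(shape, rho, sigma, seed=0):
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(shape)
    for axis in range(len(shape)):   # correlate along x, y, and t in turn
        e = ar1_filter(e, rho, axis)
    return sigma * e

eps = correlated_noise((64, 64, 50), rho=0.5, sigma=0.2)
# lag-1 correlation along the time axis should be near rho
r = np.corrcoef(eps[:, :, :-1].ravel(), eps[:, :, 1:].ravel())[0, 1]
print(eps.shape, round(float(r), 2))
```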

Table 1. In each entry, the MSE of f̂(x, y; t) in (6) is presented in the first line with its standard error (in parentheses); the corresponding values of (h_x, h_t, u) chosen by the modified CV procedure (7) and (8) are presented in the second line; the optimal MSE is presented in the third line with its standard error (in parentheses); and the optimal values of (h_x, h_t, u) are presented in the fourth line. MSE values in the table have been multiplied by 10^3 and standard errors by 10^5.

Next, we compare our proposed method, denoted as NEW, with some alternative methods described below. The first alternative method is the conventional LLK procedure (2), by which f(x, y; t) is estimated by a(x, y; t) defined in (3). Its bandwidths are chosen by the conventional CV procedure, without considering any possible spatio-temporal data correlation. As explained in Section 2.1, this estimator would blur edges while removing noise. The second alternative method also uses a(x, y; t) for estimating f(x, y; t), but its bandwidths are chosen by the modified CV procedure (7) and (8). These two alternative methods are denoted as LLK-C and LLK, respectively, where LLK-C denotes the first, conventional LLK procedure that does not accommodate data correlation. The third alternative method is the one by Gijbels et al. [15], which is used for edge-preserving denoising of a single image. To apply this method to the current problem, the individual images collected at different time points can be denoised by it separately. This method assumes that the observed image intensities at different pixels are independent of each other, and thus its bandwidths can be chosen by the conventional CV procedure. This method is denoted as GLQ. The fourth alternative method uses f̂(x, y; t) in (6) to estimate f(x, y; t), but its parameters (h_x, h_t, u) are chosen by the conventional CV procedure. This method is denoted as NEW-C. By considering these four alternative methods (i.e., LLK-C, LLK, GLQ and NEW-C), we can check whether the current problem of denoising an image sequence can be handled properly by the conventional LLK procedure with or without the modified CV procedure, by an existing edge-preserving image denoising method designed for a single image, or by the proposed method without considering the possible spatio-temporal data correlation. To evaluate their performance, in addition to the regular MSE criterion, we also consider the edge-preservation (EP) criterion originally discussed in Hall and Qiu [28], in which JS(f) is a measure of the cumulative jump magnitude of f at the edge locations, and JS(f̂) is defined similarly for an estimator f̂. So, EP(f̂) provides a measure of the percentage of the cumulative jump magnitude of f that has been lost during data smoothing by the estimator f̂. By this explanation, the smaller its value, the better. In cases when σ = 0.1, 0.2 or 0.3, n_x = 128, n_t = 100, and ρ = 0.1, 0.3 or 0.5, the MSE and EP values of the related methods are presented in Table 2.
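The EP idea can be illustrated with a toy computation. In the sketch below, JS accumulates absolute jump magnitudes over a known set of edge-straddling pixel pairs, and EP compares JS of a denoised image to JS of the truth; this relative-loss form is our illustrative stand-in, since the exact formula from Hall and Qiu [28] is not reproduced in this excerpt.

```python
import numpy as np

# Hedged sketch of the edge-preservation (EP) idea: JS accumulates jump
# magnitudes of an image across known edge-straddling pixel pairs, and
# EP measures the relative loss of jump magnitude after smoothing.
# This is our illustrative stand-in, not the paper's exact formula.
def js(img, edge_pairs):
    # edge_pairs: list of ((i1, j1), (i2, j2)) pixel pairs across an edge
    return sum(abs(img[p] - img[q]) for p, q in edge_pairs)

def ep(f_hat, f_true, edge_pairs):
    return abs(1.0 - js(f_hat, edge_pairs) / js(f_true, edge_pairs))

f_true = np.zeros((8, 8)); f_true[:, 4:] = 1.0       # vertical step edge
pairs = [((i, 3), (i, 4)) for i in range(8)]
blurred = (f_true + np.roll(f_true, 1, axis=1)) / 2  # crude smoothing
print(round(float(ep(f_true, f_true, pairs)), 2),
      round(float(ep(blurred, f_true, pairs)), 2))
```

A perfect estimator gives EP = 0, while the crude blur above halves the jump across the edge and so loses half of the cumulative jump magnitude.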
From the table, it can be seen that the proposed method NEW has the smallest MSE values, with quite large margins, among all five methods in all cases considered, except the case when σ = 0.1 and ρ = 0.1, where NEW-C has a slightly smaller MSE value than NEW due to the weak data correlation in that case. Likewise, NEW has much smaller EP values in all cases considered, compared to the four competing methods. This example confirms that it is necessary to use edge-preserving procedures when denoising image sequences, and that the possible spatio-temporal data correlation should be taken into account during the denoising process. It also confirms the benefit of sharing useful information among neighboring images when denoising an image sequence. In the cases when σ = 0.2 and ρ = 0.1, 0.3 or 0.5, Figure 4 shows the observed images at t = 0.5 in the first column, and the denoised images by the methods LLK-C, LLK, GLQ, NEW-C and NEW in columns 2-6. From the figure, it can be seen that the denoised images by NEW are the best in removing noise and preserving edges. In comparison, the denoised images by LLK-C and NEW-C are quite noisy, because their bandwidths selected by the conventional CV procedure are relatively small, due to the fact that the conventional CV procedure cannot distinguish the data correlation from the mean structure, as discussed in Section 2.2. The denoised images by LLK are quite blurry because the method does not take the edges into account when denoising the images. The denoised images by GLQ are quite blurry as well, since GLQ denoises the individual images at different time points separately and the serial data correlation is ignored in this method. Next, we apply the proposed method NEW and the four alternative methods LLK-C, LLK, GLQ and NEW-C to a sequence of cell images that records a vasculogenesis process. The sequence has 100 images, and each image has 128 × 128 pixels. A detailed description of the data can be found in Svoboda et al. [29]. The 1st, 50th and 100th images of the sequence are shown in Figure 5. In the image denoising literature, to test the noise removal ability of an image denoising method, it is common practice to add random noise at a certain level to the test images and then apply the method to the noisy test images (cf., Gijbels et al. [15]). Following this convention, spatio-temporally correlated noise is first generated using the R package neuRosim and then added to the sequence of 100 cell images described above. When generating the noise, σ is chosen to be 0.1, 0.2 or 0.3 and ρ is chosen to be 0.1, 0.3 or 0.5, as in the simulation examples presented above. The MSE and EP values of the five image denoising methods based on 10 replicated simulations are presented in Table 3.
From the table, it can be seen that NEW still has smaller MSE and EP values in this example, compared to the four competing methods, except in a small number of cases when σ and ρ are relatively small.

Table 3. Results for denoising a sequence of 100 cell images. In each entry, the first line is the MSE value with its standard error (in parentheses), and the second line is the EP value. MSE values in the table have been multiplied by 10^3 and the standard errors by 10^5.

The 50th observed test image, after the spatio-temporally correlated noise with ρ = 0.1, 0.3 or 0.5 is added, is shown in the first column of Figure 6. Finally, we apply the five methods considered in the above examples to a sequence of Landsat images of the Salton Sea region. The Salton Sea is the largest inland lake in California, located near the state's southern border, and has a great impact on the local ecosystem (Shuford et al. [30]). The Landsat images used here were taken between 27 May 2000 and 24 December 2001. There are a total of 20 images collected at roughly equally spaced time points, and each image has 100 × 100 pixels. In this example, we consider the case when σ = 0.3 and ρ = 0.3. The MSE values of the five methods LLK-C, LLK, GLQ, NEW-C, and NEW, calculated in the same way as before, are 9.70, 4.78, 12.03, 9.77, and 4.82, respectively. Their EP values are 85.54%, 20.18%, 109.91%, 86.15%, and 19.14%, respectively. So, the NEW method has the best edge-preserving performance among the five methods in this example, and NEW and LLK have the best overall noise removal performance. The 10th noisy observed test image, taken on 28 April 2001, and its denoised versions by the five methods are shown in Figure 7. It can be seen from the figure that the denoised images by the methods LLK-C, GLQ, and NEW-C are still quite noisy, while the noise in the images generated by NEW and LLK is mostly removed and the edges are preserved reasonably well.

Conclusions
In this paper, we have described our proposed edge-preserving image denoising method for handling image sequences. Some major features of the proposed method include: (i) helpful information in neighboring images is shared during image denoising; (ii) edge structures in the observed images can be preserved while removing noise; and (iii) possible spatio-temporal data correlation can be accommodated in the related local smoothing procedure. Theoretical arguments given in Section 3.1 and numerical studies presented in Section 3.2 show that the proposed method works well in the various cases considered. There are still some issues about the proposed method for future research. For instance, in the proposed local smoothing procedure (2)-(6), each of the bandwidths (h_x, h_y, h_t) is chosen by the modified CV procedure (7) and (8) to be the same across the entire design space Ω × [0, 1]. Intuitively, relatively small bandwidths are preferred at places where the image intensity surface f(x, y; t) has large curvature, and relatively large bandwidths are preferred at places where the curvature of f(x, y; t) is small. Thus, in some applications where the curvature of f(x, y; t) could change quite dramatically in the design space, variable bandwidths might be helpful. Such issues will be studied carefully in our future research.
where C ≥ 0 is the Lipschitz constant that satisfies the condition |K(u_1) − K(u_2)| ≤ C|u_1 − u_2|, for all u_1 and u_2. So, the first result in Proposition 1 is valid.
To prove the second result, it can be checked that the stated bound holds. Similarly, it can be checked that the analogous bound holds for the third result.
The first inequality in the above expression is based on the result in Davydov [31]. So, the third result is valid.

Appendix A.2. Proof of Theorem 1
We first consider the case when (x, y; t) ∈ Ω_h \ J_h. By Taylor expansion and some simple algebraic manipulations, the stated result can be checked. Now, we consider the case when (x, y; t) ∈ J_h \ S_h. If (x_i, y_j; t_k) ∈ I_1, then a one-sided Taylor expansion around (x_τ, y_τ; t_τ) applies on that side, and similarly if (x_i, y_j; t_k) ∈ I_2. By some similar arguments to those in the case considered above, the stated result follows. We prove the second equations in (10) and (11) here; the first equations can be proved similarly. For simplicity, we write a^(l)(x, y; t), b^(l)(x, y; t), c^(l)(x, y; t), d^(l)(x, y; t) and O^(l)(x, y; t) as a^(l), b^(l), c^(l), d^(l) and O^(l), respectively, from now on. First, by Proposition 1, it is easy to derive the order of Σ_ijk (ε_ijk^2 − σ^2) K_ijk.
The main difference between this case and the previous case in the proof lies in the derivation of the result in (A8). For e(x, y; t), the corresponding result is as follows.

Figure 1. Two Landsat images of the Las Vegas area taken in 1984 (left panel) and 2007 (right panel).

Figure 2. The neighborhood O(x, y; t) is divided into two parts by a plane that passes through (x, y; t) and is perpendicular to the estimated gradient direction G(x, y; t).

Figure 5. The 1st, 50th and 100th cell images of the image sequence describing a vasculogenesis process.
The denoised images by the five methods LLK-C, LLK, GLQ, NEW-C and NEW are shown in columns 2-6 of Figure 6. Similar conclusions to those from Figure 4 can be drawn here, and the denoised images by NEW look reasonably good, as the algorithm works well in removing noise and preserving edges.

Figure 7. The first image is the observed Landsat image of the Salton Sea region taken on 28 April 2001, after the spatio-temporally correlated noise with σ = 0.3 and ρ = 0.3 is added. The second to sixth images are its denoised versions by LLK-C, LLK, GLQ, NEW-C, and NEW, respectively.

Table 2. In each entry, the first line is the MSE value with its standard error (in parentheses), and the second line is the EP value. MSE values in the table have been multiplied by 10^3 and the standard errors by 10^5.