Unsupervised Change Detection for Multispectral Remote Sensing Images Using Random Walks

In this paper, the change detection of Multi-Spectral (MS) remote sensing images is treated as an image segmentation issue. An unsupervised method integrating histogram-based thresholding and image segmentation techniques is proposed. In order to overcome the poor performance of thresholding techniques for strongly overlapped change/non-change signals, a Gaussian Mixture Model (GMM) with three components, including non-change, non-labeling and change, is adopted to model the statistical characteristics of the different images between two multi-temporal MS images. The non-labeling represents the pixels that are difficult to be classified. A random walk based segmentation method is applied to solve this problem, in which the different images are modeled as graphs and the classification results of GMM are imported as the labeling seeds. The experimental results of three remote sensing image pairs acquired by different sensors suggest a superiority of the proposed approach comparing with the existing unsupervised change detection methods.


Introduction
In remote sensing, change detection is to identify the difference between two images of the same scene acquired at different times, which is a critical technique in many remote sensing applications, such as land use change monitoring [1], disaster damage assessment [2,3], urban expansion study [4] and map updating [5], etc.Many factors can affect accuracy of the detection results.Identifying task dependent changes between two images is still a challenging task.Numerous techniques have been proposed.Most of them can be grouped into two categories: supervised and unsupervised approaches.
In supervised approaches, change detection is considered as a binary classification problem using bi-temporal images.In which spatial or/and spectral features are extracted from two images (or difference images).Then, classifiers such as Support Vector Machines (SVM) [6], decision trees [7] and Neural Networks [8,9] are employed to perform the classification.To guarantee a better result, a large number of training samples with human labeled ground truth need to be fed into the classifiers after feature extraction.These methods are known as pre-classification change detection methods.Another widely used strategy in supervised approaches is called post-classification change detection or map-to-map change detection.The first step of these methods involves generation of land cover maps from images of different times.Then, the changes can be easily identified between them.The maps can be produced independently.Thus, pre-processing such as radiometric normalization may not be necessary, and cross-sensors change detection also may not be a difficult issue.However, the major limitation is obvious in which post-classification depends on the quality of land cover maps, which is very difficult to be produced.Therefore, a variety of studies [10,11] have been proposed to improve detection accuracy.The advantage of supervised approaches is that a complete description of transitions and the "from-to" information between two acquisitions can be obtained, and the accuracy of detection results are usually higher than other methods.In spite of success, the demand of large volume training data is very difficult to be satisfied, which limits its application.
Compared with supervised methods, unsupervised methods do not aim to provide the "from-to" information of the changes, but the change/non-change map.Furthermore, they do not need any human labeling process to produce ground truth, which is favored in many practical applications.In this paper, we focus attention on unsupervised change detection in multi-spectral images.
The unsupervised change detection methods are typically applied on the difference images of multi-temporal remote sensing images that can be derived by different ways, including subtraction, ratio, and image transformations (e.g., principle component analysis (PCA), change vector analysis (CVA)) [12][13][14][15][16][17].The purpose of detection is to label all pixels in the difference images as change or non-change.The thresholding techniques are widely adopted to determine the critical boundary between change and non-change [18,19], such as the minimum error thresholding proposed by Kittler and Illingrowth [20] and the EM (Expectation-Maximization) based method proposed by Bruzzone and Prieto [21].These methods provide a fast and simple way to generate the change map, and work well in general.However, when change and non-change classes are strongly overlapped in feature space, or when their statistical distribution cannot be modeled accurately, the thresholding techniques can not provide satisfactory results.To tackle this problem, some researchers proposed improved approaches.In [22], Markov Random Field (MRF) is applied to construct a fusion framework of assembling different thresholding algorithms to obtain more robust results.Bovolo et al. [23] proposed a two-phase detection method, in which the point set derived by a selective Bayesian thresholding is treated as a pseudo-training set of a binary semi-supervised SVM classifier to generate change map.Bazi et al. [24] applied an active contour algorithm to the pyramid of difference images to segment them into two parts, which are labeled with change or non-change.In Bazi's method, Kittler's thresholding method [20] is used to generate the initial contour.In these methods, thresholding techniques still play an important role, which suggests that the intensity of difference signal is a necessary feature for identification of change and non-change.To solve the overlapped distribution of change and non-change signals, other features such as spatial structure information should be considered.
In this paper, an unsupervised change detection method using random walks is proposed.Random walk is a mathematical formalization of a trajectory that consists of taking successive random steps, which is applied here to determine the boundaries of change and non-change.An image is usually modeled as a connected graph consisting of vertices and edges.The probabilities that random walkers starting at each vertex will first arrive at predefined vertices are the critical part that need to be computed.In [25], Grady proposed an image segmentation method using random walks and presented an analytical and quick algorithm to determine the probabilities.The experimental results demonstrate the superiority of Grady's method in terms of locating weak boundaries and noise tolerance.However, to perform an unsupervised change detection, the predefined vertices (i.e., pre-labeled pixels) should be identified automatically.The thresholding technique is a reasonable choice for this task.In this paper, we extend Bruzzone's method [21] and propose an adaptive PCA strategy to enhance the change signals in the difference images.PCA transform is applied to the difference images obtained by pixel-wise subtracting band by band.Each PCA component is considered as a mixture of three kinds of signals: change, non-labeling and non-change, and modeled by a Gaussian Mixture Model (GMM).The EM algorithm is adopted to estimate parameters of GMM for each selected component.Then, the pre-labeled map of each selected component is generated using Bayes rule.Random walks are applied to determined the classes of non-labeling signals.Finally, the change map is obtained by fusing the component change maps.
This paper is organized as follows.A brief review of the random walks algorithm on graphs is given in Section 2. Section 3 describes the proposed unsupervised change detection method.Section 4 provides experimental results of the proposed approach and comparisons with four commonly used methods, including the EM based thresholding method [21], the MRF based method [21], Celik's method [26] and Bazi's method [24].Finally, conclusions are given in Section 4.

Random Walks on Graph
Random walk is one of the first stochastic processes studied in probability and continues playing an important role in many applications.When it is applied to the labeling task of an image, the image is usually modeled as a finite connected weighted graph.Therefore, we focus on the random walks on graph.
A graph [27] is usually modeled as a pair of sets G = (V, E) such that E ⊆ [V] 2 .The elements v ⊆ V are the vertices of the graph G.The elements e ⊆ E are the edges of the graph G.An edge e ij denote the edge or connection between two vertices (v i , v j ).For a two-dimensional image, each pixel can be treated as a vertex, and the edges can be established with the four-neighbor or eight-neighbor relationship of pixels.The weight w ij of edge e ij is usually used to denote the intensity or color changes in the graph-based images segmentation method [25,28].In these works, w ij is defined as the typical Gaussian weighting function given by where g i and g j indicate the intensity or color vector of ith and jth pixel, respectively.The degree d i of vertex v i is sum of the weights of all edges incident on v i , Suppose there are K labels L k , k ∈ 1, 2, ..., K, and the problem of random walk labeling is to identify the probability p(v x |L k ) that the walker starting at the vertex v x reaches a vertex pre-labeled by L k first before the vertex pre-labeled by other labels.For a general graph, this is not a facile task.Fortunately, Doyle and Snell [29] established the connection between the concepts of resistor and electric potential in electric networks and corresponding descriptive quantities of random walks on the graph.If the electric potential at the vertices pre-labeled by L k are fixed to unity and other pre-labeled are set to zero, the electric potential u(v x ) on the vertex v x equals probability p(v x |L k ).The probability problem is converted to finding a harmonic function given its boundary values, i.e., the Dirichlet problem.
A harmonic function is a function that satisfies the Laplace equation The solution u(x) of Dirichlet problem is a harmonic function defined in the region Ω, which minimizes the Dirichlet integral Considering the application on graph, Grady and Schwartz [30] presents a combinatorial formulation of Dirichlet integral where L is the combinatorial Laplacian matrix defined as If all the vertices in the graph G are ordered such that the pre-labeled vertices are first and the non-labeling vertices are second.Equation ( 4) can be decomposed as Equation ( 6) is a system of linear equations with u n unknown.Because L is a positive semi-defined matrix, the only critical points of D[u] will be minima.The critical point yields u l represents the potentials or the probabilities of pre-labeled vertices.Considering the probabilities p(v l |L k ) of pre-labeled vertices, it is obvious that if a vertex is pre-labeled by L k , the probability is one; otherwise, it is zero.Let u k l and u k n indicate the potentials (probabilities) of pre-labeled and non-labeling for L k .u k n can be obtained by solving Equation ( 8): The label assigned to a vertex v i is determined as It should be noted that the potential of each pixel for all possible labels should satisfy the constraint as follows: Figure 1 illustrates the labeling procedure using the aforementioned algorithm.The pre-labeled vertices are marked first, and then the potentials for all labels are computed.The final results of random walks are shown in Figure 1c.

Proposed Unsupervised Change Detection
The proposed technique attempts to extract a reliable subset of change and non-change pixels as the pre-labeled sets, and then obtains the change map using the labeling algorithm based on random walks.It consists of three main parts: (i) enhancement of the change information using an adaptive PCA approach; (ii) extraction of pre-labeled sets based on Bayes rule; (iii) generation of change map using the random walk algorithm.

Enhancement of the Difference Image Using Adaptive PCA
PCA has been used in change detection for many years and has became one of the most popular techniques for its simplicity and capability of enhancing change information [31].There are two ways to apply PCA for change detection: (1) put two or more images acquired on the same area at different times into a single file and then perform PCA and analyse the minor component images for change information; and (2) perform PCA separately and then analyse the difference images between the multi-temporal PC images [32].Identifying the best component to represent the change and selecting appropriate thresholds are challenging tasks, especially when different kinds of changes exist.In this work, an adaptive PCA approach is proposed.
Let X t1 and X t2 denote two multi-spectral images of the same scene acquired at different times.Each pixel in a multi-spectral image is represented as a spectral vector, where i is index of a pixel.b represents the bth band of image and B is the number of spectral bands.It is often assumed that X t1 and X t2 have been co-registered and calibrated to reduce the differences in spatial, lighting, and atmospheric conditions between them.Following the CVA method [33], change vector d of each pixel is obtained by subtracting operation . PCA transform is applied to all change vectors.The transformed component vectors can be represented as c i = {c bi , b = 1, . . ., B, i ∈ N}, in which the components are sorted in the order of decreasing eigenvalues.The components with absolute values C i = {|c bi |, i ∈ N} are used in the change detection procedure.In the following sections, c bi denotes the absolute value of the ith pixel in the bth component.
We assume that all pixels in each component can be classified into three groups: change, non-change and non-labeling.Pixels having lower or higher change amplitudes are grouped to non-change or change.Pixels that are difficult to be labeled as change or non-change belong to the non-labeling class.The statistical characters of each category are modeled by a Gaussian distribution N(µ, σ).Then, the histogram of a PCA component is represented by a GMM, as shown in Equation (11) and Figure 2: w n,i , w nl,i and w c,i are the weights of the distribution models of change, non-labeling and non-change, and should satisfy the following two constraints: Because the absolute values of difference images are used here, the means of these three signals should satisfy the following InEquation (14): All parameters w, µ and σ in Equation ( 11) can be estimated using the EM algorithm.The enhancing effect of PCA lies in making the change signal easy to be separated from non-change and non-labeling.The separability between two classes can be measured by the Mahalanobis distance of two class centers: Considering the relationship among these three signals, the distances between change and non-labeling, or non-change is preferred to be larger.On the contrary, the distance between non-labeling and non-change should be smaller.Therefore, an index F is defined as Equation ( 16) to measure the distinctness of change signals Based on this measurement, the component selection algorithm is described as follows: Step 1: Initialize the selection threshold t, 0 < t ≤ 1.
Step 2: Normalize F b as Equation ( 17) and sort f b in decreasing order: Step 3: If max{ f b } > t, then the corresponding component is the unique selected component.
Step 4: Otherwise, repeatedly abandon the component with min{ f b }, until the sum of the rest is less than or equals to t.Then, the remaining components {C s , s ∈ S} are selected for change detection.
The threshold t is recommended to be not less than (1 − 1 B ).When t is set to 1, it means all PCA components are selected.

Identification of Pre-Labeled Set Based on Bayes Rule
In Section 3.1, the histogram of a component is modeled with GMM.Equation ( 11) can be rewritten in its probability form: P(c bi ) = P(ω n )P(c bi |ω n ) + P(ω nl )P(c bi |ω nl ) + P(ω c )P(c bi |ω c ), (18) where ω n , ω nl , and ω c indicate the labels of the three signal classes.It is reasonable to use the Bayes rule to label each pixel in the selected component, as shown in Equation (19).The pre-labeled set is generated by extracting the pixels whose labels are ω n or ω c : However, the noise in the selected components still has an impact on labeling accuracy.To alleviate this problem, a method is proposed based on Gaussian stack.In the Gaussian stack, the original component C i is treated as the bottom level L 0 .The nth level L n of the stack is represented as Equation ( 20): where G(•) indicates a rotationally symmetric Gaussian filter, w is the filter size, and σ is the standard deviation.The final pre-labeled set is obtained by merging the labeling results of L 0 and L l as follows: It should be noted that the arguments of Gaussian filter w, σ, and the number l of stack level are all used to control the Gaussian stack.To simplify the parameter setting, only l was treated as a variant, w and σ are fixed at 5 and 3, respectively.

Generation of Change Map Based on Random Walks
For each selected component, the pre-labeled sets of non-change and change are treated as the seeds of the random walks algorithm introduced in Section 2. Each non-labeling pixel is assigned a certain label ω n or ω c .The labeling algorithm [25] can be summarized as follows: Step 1: Initialize the selected component C s and the pre-labeled map M ns and M cs .
Step 3: Solve Equation ( 8) to obtain the potentials U sn for non-change label.The potential U sc for change label can be derived by U sc = 1 − U sn for computational efficiency.
Step 4: Assign a label to each non-labeling pixel according to Equation (9).
Considering the information of different change types is enhanced and represented in different PCA components, the change map is obtained by summing the labeled results of all selected components as Equation ( 23) The whole procedure of the proposed unsupervised change detection approach is illustrated in Figure 3. Figure 4 shows an example of the overall process of our method from the difference images to the generation of change map.The proposed adaptive PCA-based method is applied to the difference image (Figure 4a) to extract the representative principle components (PCs) (Figure 4b) containing the change information.Then, the selected components are labeled with change (red), non-change (green) and non-labeling (black) based on the Bayes rule, as shown in Figure 4c.The pre-labeled map and the selected components are fed to the random walk algorithm.Each non-labeling pixel is assigned to a label that has the maximum potential (probability).Finally, the change map as shown in Figure 4d can be obtained.(c) (d)

Experiments
In order to assess effectiveness of the proposed unsupervised change detection method, three multi-temporal image pairs are selected as test data sets as shown in Figures 5-7, which were acquired by different sensors and contain various land cover changes.The proposed method is compared with four widely used change detection methods on the test data sets.

Data Sets
The first data set as shown in Figure 5a To obtain accurate change detection results, all of these image pairs have been co-registered using ENVI 4.8 (Exelis Visual Information Solutions, Boulder, CO, USA) and calibrated using the method proposed in [37].We manually labeled the ground truth as shown in Figure 8a-c

Parameter Setting
In the proposed method, there are three parameters that need to be specified, including the component selection threshold t, the number l of Gaussian stack and the Gaussian weighting parameter β in Equation (1).t determines the number of selected PCA components.As mentioned in Section 3.1, it is recommended in the range of ( B − 1 B , 1).Generally, the level number of Gaussian stack l is a trade-off between the minimum isolate change region and the strength of noise.It will result in no pre-labeled pixels existing in the small change region if l is set to a higher value.Considering that data at a high spatial resolution contains more noise factors than that at low spatial resolution, such as the view angle and the sun angle, and the value of l assigned for the pan-sharpened QuickBird data is higher than that for TM and ASTER data.In this work, we set l = 2 for ASTER and TM images and l = 6 for Quickbird images.For the Gaussian weighting parameter β, we set β = 90 as recommended in [26].

The Effect of the Adaptive PCA Approach
The adaptive PCA approach presented in Section 3.1 is applied on the difference images to extract the components containing enhanced change information.This is the fundamental phase of this method.The threshold t is empirically set to 0.8 for all test data.Figure 9a-d show four PCs of QuickBird difference images and their normalized indices.Intuitively, the components containing more enhanced change signals have a higher values of f i .Based on this assumption, Figure 9b,c should be selected as a representative component for QuickBird images.From Table 1, it can be seen that the proposed index successfully identify true components.From Figure 9e,f and Table 1, it can also be concluded that the component selected by visual inspection and the proposed index are consistent both in TM and ASTER images.

Comparisons with Other Change Detection Methods
In order to illustrate the effectiveness of the proposed change detection method (PCA-RW for short), it is compared with four commonly used unsupervised change detection approaches: (1) EM based thresholding method [21]; (2) MRF-based method [21], in which Besag's iterated conditional modes (ICM) algorithm [38] is used to minimize the MRF energy function initialized by EM algorithm; (3) Block-PCA method (BPCA) [26], in which PCA is applied to h × h non-overlapping block set to detect the change area; (4) and multi-resolution level set method (MLSK) [24], in which the Chan-Vese algorithm [39] is applied on the pyramid of difference images to obtain the change map.These existing methods all depend on some parameters except for the EM based method.In the MRF-based method, β tunes the influence of the spatial contextual information during the change detection process.The size h of the image block needs to be set in the BPCA method.In the MLSK method, the regulation parameter µ controls the tradeoff between the goodness of fit and the length of the profile curve.In this paper, all of these parameters are tuned to achieve the best results for the test data, which are β = 2, h = 4, µ = 0.2.For the proposed method, the level number of Gaussian stack l = 4 is used.
The performance of these methods are evaluated using the following three standard error measures: 1. False-alarm rate P f , where N ce is the number of unchanged pixels classified as changed pixels, and N c is the total number of unchanged pixels.2. Missed-alarm rate P m , where N ne is the number of changed pixels classified as unchanged pixels, and N e is the total number of changed pixels.3. Error rate P e is the total number of wrongly classified pixels over the total number of pixels, Figure 10 shows change maps obtained by the proposed and four compared methods on the three test image pairs.From Figure 10a, it can be seen that the EM based method produces worst results on all three data sets.There is a lot of noise in TM and ASTER images.Note that a large part of the building is missing in results of Quickbird images.In Figure 10b, the MRF based method also exhibits significant noise in results of TM images.Noise still exists in ASTER images but is reduced to a lower level.A similar part is missing in the change map of Quickbird images.BPCA produces results similar to that of MLSK in the three data sets.However, careful inspection may find that there exist some missing parts on both results of QuickBird images (see the last row of Figure 10c,d).It can be observed from Figure 10e that the proposed method provides results very close to ground truth maps.
Table 2 gives quantitative results on the three data sets.It can be easily seen that the EM method produces results with worst measure indexes.The MRF based method has the best P m on the TM images.The noise in TM results actually classify more pixels into changed ones, leading to a lower P m and higher P f .This is consistent with visual inspection.The BPCA method obtains relatively good results with neither lowest nor highest measure indexes.MLSK shows its efficiency in terms of false-alarm rate with best P f on all three data sets, which means that almost all pixels classified as changed are correct.To this end, algorithms should have a rigorous criterion to decide which pixels are changed ones, which will have a severe impact on P m and P e .This can be observed in Table 2.The proposed method outperforms other methods for overall error rate P e , which also demonstrates its advantage in reducing the missing alarm.It can be inferred that PCA-RW can make a trade-off between missed-alarm and false alarm rates.Furthermore, it is suitable for detecting multiple types of changes.

Conclusions
This paper presents an unsupervised change detection method based on random walks for multi-spectral remote sensing images.Normally, supervised methods obtain better results than unsupervised ones because knowledge about changed and unchanged information are used during training steps.This work can be considered as a special case of a "semi-supervised" method.Firstly, an adaptive PCA approach is proposed to enhance and extract the change information between multi-temporal images.Subsequently, a GMM is applied to obtain three category labels, including changed, unchanged labels and undermined labels.Then, the pre-labeled changed and unchanged mask is fed to a random walk algorithm to obtain the labels of undermined pixels.In this process, the pre-labeled changed and unchanged pixels can be considered as "training samples".They implicitly contain change and unchanged information.Because no external data is used, the proposed method can still be called "unsupervised" change detection.The method is tested on three image pairs acquired by ASTER, TM and QuickBird and compared with EM, MRF, BPCA and MLSK methods.Results demonstrate that the proposed approach can generate promising change maps.Although there is a smaller sacrifice on P f , our method can obtain better quantitative indexes in terms of P e and P m .

Figure 1 .
Figure 1.The labeling procedure using random walks.The vertices pre-labeled with two labels are separately marked with gray and black color.The conditional probability (potential) of each vertex is represented within the circle.(a) probabilities that a random walker starting from each vertices.First reaches the pre-labeled gray vertex; (b) probabilities that a random walker starting from each vertices.First reaches the pre-labeled black vertex; (c) labeled results with the random walk algorithm.

Figure 2 .
Figure 2. Histograms of a principal component modeled with three Gaussian distribution: change (c), non-labeling (nl), and non-change (n).

Figure 3 .
Figure 3. Flowchart of the proposed unsupervised change detection approach.

Figure 4 .
Figure 4. Example of proposed unsupervised change detection.The original multi-temporal images are retrieved from [34].The first principle component (b) are selected by the algorithm described in Section 3.1, in which t = 0.8.In the pre-labeled map of selected component (c), pixels of change, non-change and non-labeled are represented with red, green and black.The change pixels in (d) are marked with white.(a) false color image obtained by CVA; (b) the component selected by proposed algorithm; (c) pre-labeled map of the selected component; and (d) change map obtained by random walks.

Figure 8 .
Figure 8. Manually labeled ground truth for three test data sets.(a) ground truth map of Hobet (TM); (b) ground truth map of Mountain Hiller (ASTER); (c) ground truth map of Shijingshan (QuickBird).

Figure 9 .
Figure 9. Four PCA components of the QuickBird difference image and the selected PCs of TM and ASTER data.(a) the first PC for QuickBird images, f 1 = 0.0888; (b) the second PC, f 2 = 0.2375; (c) the third PC, f 3 = 0.5449; (d) the fourth PC, f 4 = 0.1288; (e) the selected component for TM images, f selected = 0.7291 ; and (f) the selected component for ASTER images, f selected = 0.9991.

Figure 10 .
Figure 10.(a-f) Change detection results for different approaches on the test data sets.

Table 1 .
Normalized component selection index for three datasets.

Table 2 .
Error rate (P e ), false-alarm rate (P f ) and missed-alarm rate (P m ) in percentage achieved by different change detection methods on the three data sets (Hobet, Mountain, Shijingshan).