Similarity Measures of Remotely Sensed Multi-Sensor Images for Change Detection Applications

Change detection in remotely sensed images is particularly challenging when the time series data come from different sensors. Indeed, many change indicators are based on radiometric measurements, used to calculate differences or ratios, that are no longer meaningful when the data have been acquired by different instruments. For this reason, it is of interest to study indicators that do not rely entirely on radiometric values. In this work a new approach based on similarity measures is proposed. A series of such measures is employed for automatic change detection between optical and SAR images, and their performance is compared to establish the limits of their applicability and their sensitivity to the changes that occurred. Initial results are promising and suggest similarity measures as possible change detectors in multi-sensor configurations.


Introduction
Change detection analyzes a pair of images of the same subject acquired at different times to detect possible changes that occurred between the two data collections. In remote sensing applications, one deals with the same geographical area in order to perform, for example, environmental monitoring [1][2][3], studies of land use/land cover dynamics [4,5], agricultural surveys [6], analysis of forest or vegetation changes [7][8][9], damage assessment [10,11] or analysis of urban changes [12][13][14].
In most of the published literature, the comparison between two or more images is performed on data acquired by exactly the same sensor [2,9,14] or by sensors of the same type [13,15]. At present, however, multi-sensor image change detection, using for instance optical and synthetic aperture radar (SAR) images [5], has also become a concrete possibility, at least in terms of data availability, due to the increasing number of operational airborne and spaceborne sensors. Hence, there is a need to develop technical "tools" that exploit such data in a combined way, taking advantage of their characteristics and complementarity. A relevant example of the importance of multi-sensor methodologies is the following: after a natural disaster for which rapid mapping of the damage is needed, optical data from an archive may be available for the "before" scenario, while only SAR data are available for the "after" scenario because of adverse atmospheric conditions.
We present here a novel approach for change detection based on the use of similarity measures. Until now, such measures have been applied only to image coregistration [16][17][18][19] (mainly in the field of medical imagery). One of their principal properties is their capability to operate in the multi-sensor case. Our idea is to profit from this property and to use the correspondence between the same points in the two images not to correct the relative displacement but, given their precise coregistration, to detect possible changes that occurred between the data acquisitions.
Another reason of interest for similarity measures is that essentially no example of their use for change detection has yet been reported in the literature. The works published in this field describe methods that can be divided into two main groups:
• those that perform a preliminary feature extraction or classification of the images and then search for transitions of the pixels from one feature to another (hence permitting a boolean comparison and a direct yes/no response for the change/no-change definition);
• those that estimate the difference of the radiometric values of the image pixels (via a straightforward subtraction or using ratios, also in logarithmic form, as is common practice for SAR images) and then establish whether a change occurred based on thresholding criteria.
Similarity measures belong to this second group.
However, even the very extensive review of change detection techniques by Lu et al. [15] does not mention similarity measures as a possible way to perform this task, nor does it cite any reference to them. Thus, a summary of the theory of similarity measures, their systematic investigation and a comparative analysis of their performance represent, in our opinion, a useful scientific contribution. Given these motivations, optical and SAR images acquired at different times were coregistered, and a series of similarity images was derived and used as indicators of the changes. The experiments were carried out on two data sets relative to the test sites of Toulouse, France, and Oberpfaffenhofen, Germany, and both qualitative and quantitative analyses were conducted. Open questions and suggestions for further investigations are mentioned along with our observations.
In Sections 2 and 3 we begin by reviewing the theoretical background of the similarity measures that are the object of this study. After describing the experimental procedure in Section 4, we report the results obtained for the two test cases: those based on the Toulouse data are discussed in Section 5 and those relative to the Oberpfaffenhofen data in Section 6. Final considerations and comments are provided in Section 7.

Similarity Measures
The leading principle of similarity measures is stated in [19] and may be summarized as follows: although two images of the same scene acquired by different sensors can be characterized by completely different radiometric properties, a common "basis" is shared by the two data sets, since they are different representations of the same reality. The key question is then how to correctly retrieve this correspondence.
In registration procedures [20], where the goal is to find a spatial transformation relating two images, a common way to proceed is to extract features from each image with a segmentation step and to minimize a distance criterion between these features. More recent approaches are based directly on the intensity values of the image pixels and do not need the preliminary feature extraction. Since image coregistration and change detection have many aspects in common, the following considerations are valid for both topics.
By definition, given two images I and J, a similarity measure is any scalar, strictly positive function f(I, J, c) that quantifies how similar the images are according to the criterion c; f has an absolute maximum when I and J are identical according to c [19]. The selection of the similarity criterion, and hence the definition of the function f, can vary according to the type of images under analysis, the application (e.g., image registration or change detection) and the parameters used to define it (radiometric values, feature characteristics, etc.).
In general, I and J are mappings from a discrete d-dimensional spatial grid to a set of intensities [18], where d is the dimension of the image (for instance, in many medical imaging applications one deals with the 3D case) and $x_k$ is its size along the k-th axis. We will denote a general spatial coordinate as x. The set $D_I$ contains the intensities of I and, typically, for the two images $D_I = D_J = [0, 255] \subset \mathbb{N}$, referring to the grey levels. I and J may be regarded as two random variables, taking their values in $D_I$ and $D_J$, which have marginal probability distributions $p_I(I(x) = i)$ and $p_J(J(x) = j)$, respectively, and joint probability distribution $p_{IJ}(I(x) = i, J(x) = j)$.
Based on the probability distributions of the images, several similarity criteria c can be defined. The necessary a priori assumption is the following: the two images are linked, and the link is maximal when there are no differences (due to changes or registration errors) between the two. The link may be evaluated through the notion of dependence. Intensity distributions are dependent if a variation in one distribution leads to a variation in the other. The case of maximal dependence is given when I and J are related by a one-to-one mapping T such that:

$$ p_I(i) = p_J(T(i)) = p_{IJ}(i, T(i)) \qquad (2) $$

and the case of statistical independence when:

$$ p_{IJ}(i, j) = p_I(i)\, p_J(j) \qquad (3) $$

Inglada [21] already discussed change detection between two images collected by the same optical sensor. The solution proposed there, common to most of the measures discussed in [19], was to use the statistics of each pixel's neighbours. Indeed, $p_I(i)$, $p_J(j)$ and $p_{IJ}(i, j)$ can be estimated for subsets of the whole images; a pixel with a given intensity i is then defined as changed if the probability $p_I(i)$, calculated with respect to its neighbours, is not the same as that of the corresponding pixel in the image J (i.e., if it differs by more than a given threshold). To evaluate this, an estimation window has to be fixed for each pair of corresponding pixels in the two images I and J (see Figure 1). Then, the marginal and joint probabilities are calculated and used in the definition of several functions f. In more detail, similarity measures may be distinguished according to whether they are based only on the probability estimates or on the combined use of the probabilities and the radiometric information (its mean value in the estimation window or its variance). In the following, we report the results obtained using five measures from these two main groups, namely:

1. Measures using only the probabilities:
• distance to independence
• mutual information
• cluster reward algorithm (CRA).

2. Measures combining probabilities and radiometric values:
• normalized standard deviation or Woods criterion (for this measure two different formulations will be discussed)
• correlation ratio.
The statistical analysis needed to calculate the similarity measures follows the approach of [21] and [19]: histograms of the radiometric values for corresponding estimation windows are used to define the probabilities by simple normalization. Alternatively, in [22] the probabilities were derived from probability distribution functions obtained by a parameterized formula (made explicit by calculating its parameters from the pixels of the estimation windows).
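The histogram-based estimation described above can be sketched as follows. This is a minimal illustration and not the implementation used for the experiments: the function name `joint_probabilities`, the use of NumPy and the choice of one histogram bin per grey level are assumptions made for the example.

```python
import numpy as np

def joint_probabilities(win_i, win_j, levels=256):
    """Estimate p_IJ, p_I and p_J from two co-located estimation windows.

    win_i, win_j: integer arrays of identical shape, values in [0, levels).
    The one-bin-per-grey-level layout is an assumption of this sketch.
    """
    win_i = np.asarray(win_i).ravel()
    win_j = np.asarray(win_j).ravel()
    # Joint histogram: entry (i, j) counts pixels with I(x) = i and J(x) = j.
    hist, _, _ = np.histogram2d(win_i, win_j, bins=levels,
                                range=[[0, levels], [0, levels]])
    p_ij = hist / hist.sum()   # simple normalization of the counts
    p_i = p_ij.sum(axis=1)     # marginal distribution of I
    p_j = p_ij.sum(axis=0)     # marginal distribution of J
    return p_ij, p_i, p_j

# Example with two random 21 x 21 windows (synthetic data).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(21, 21))
b = rng.integers(0, 256, size=(21, 21))
p_ij, p_i, p_j = joint_probabilities(a, b)
```

With small windows the joint histogram is very sparse (a 7 × 7 window gives 49 samples for 256 × 256 bins), which is one reason why the window size discussed later matters.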

Distance to Independence
As anticipated, the condition of statistical independence is formalized by Equation (3). Hence, the difference between its two terms, the joint probability and the product of the marginals, directly measures how far the two images are from independence.
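For illustration, one formulation consistent with this description is a χ²-type distance between the joint probability and the product of the marginals. The normalization by the product of the marginals is an assumption made here; the description above only fixes the difference between the two terms of Equation (3). The symbol DTI is likewise introduced for the example.

```latex
\mathrm{DTI}(I, J) \;=\; \sum_{i,j}
  \frac{\left[\, p_{IJ}(i,j) - p_I(i)\, p_J(j) \,\right]^{2}}
       {p_I(i)\, p_J(j)}
```

Under this assumed form the sum runs over the pairs (i, j) with non-zero marginals; the measure vanishes when Equation (3) holds and grows as the two images become more dependent.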

Mutual Information
By expressing the same difference in logarithmic form, it is possible to refer to the concept of entropy introduced by Shannon [23]:

$$ H(I) = -\sum_{i} p_I(i) \log p_I(i) $$

which is a measure of the amount of uncertainty about the random variable I. The entropy is null when an event (in our case, a given pixel value) is certain, i.e., $p_I(i) = 1$, and has a maximum when all events have the same probability: max(H(I)) = log n, where n is the number of possible values of I.
The relationship between the difference of the probabilities and the entropy is given when defining the quantity [16]:

$$ MI(I, J) = \sum_{i,j} p_{IJ}(i, j) \log \frac{p_{IJ}(i, j)}{p_I(i)\, p_J(j)} $$

for which the following equations hold:

$$ MI(I, J) = H(I) + H(J) - H(I, J) = H(I) - H(I \mid J) = H(J) - H(J \mid I) $$

H(I, J) is the joint entropy of I and J, whereas H(I | J) and H(J | I) are the conditional entropy of I given J and of J given I, respectively. These are defined as:

$$ H(I, J) = -\sum_{i,j} p_{IJ}(i, j) \log p_{IJ}(i, j) $$

$$ H(I \mid J) = -\sum_{i,j} p_{IJ}(i, j) \log p_{I|J}(i \mid j) $$

and analogously for H(J | I), where $p_{I|J}(i \mid j)$ is the conditional probability of I given J.
Since H(I | J) measures the uncertainty left in I when knowing J, the mutual information MI(I, J) estimates the reduction of the uncertainty of one random variable by the knowledge of the other, or the amount of information that one contains about the other.
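The two limiting cases, maximal dependence and statistical independence, can be checked numerically on a toy joint distribution. The sketch below is only illustrative: the function names and the 2 × 2 distributions are assumptions made for the example, and natural logarithms are used.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a (possibly multi-dimensional) distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

def mutual_information(p_ij):
    """MI(I, J) computed directly from the joint distribution."""
    p_ij = np.asarray(p_ij, dtype=float)
    p_i = p_ij.sum(axis=1)
    p_j = p_ij.sum(axis=0)
    product = np.outer(p_i, p_j)      # product of the marginals
    mask = p_ij > 0
    return np.sum(p_ij[mask] * np.log(p_ij[mask] / product[mask]))

# Two binary "images": a one-to-one mapping and an independent pair.
p_dep = np.array([[0.5, 0.0], [0.0, 0.5]])
p_ind = np.array([[0.25, 0.25], [0.25, 0.25]])

mi_dep = mutual_information(p_dep)    # equals log 2
mi_ind = mutual_information(p_ind)    # equals 0

# Check of the identity MI = H(I) + H(J) - H(I, J) on the dependent case.
identity = (entropy(p_dep.sum(axis=1)) + entropy(p_dep.sum(axis=0))
            - entropy(p_dep))
```

In the dependent case the mutual information equals the entropy of either image, i.e., knowing one image removes all uncertainty about the other.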

Cluster Reward Algorithm
A further measure based only on the joint and the marginal probabilities is the cluster reward algorithm (CRA). As stated in [19], the CRA(I, J) index has a large value when the joint histogram has little dispersion. This can be the result of a good correlation (histogram distributed along a line) or of the clustering of the image intensities within the histogram. In both cases, it is possible to predict the values of one image from those of the other.
An observed advantage of this measure, with respect to the previous ones, is that the noise in the joint histogram resulting from the estimation has a weaker influence, so smaller windows may be used to derive the histogram [19].
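A histogram-based form of the CRA that is often quoted in the registration literature is given below as an illustration. It is stated here as an assumption, not as the exact expression used in this work; the symbols $h_{IJ}$, $h_I$, $h_J$ (joint and marginal histograms of the estimation window), N (number of pixels in the window), Φ, F, $S_I$ and $S_J$ are introduced for the example.

```latex
\mathrm{CRA}(I, J) \;=\;
  \frac{\dfrac{\Phi}{F} - \dfrac{F}{N^{2}}}{1 - \dfrac{F}{N^{2}}},
\qquad
\Phi = \sum_{i}\sum_{j} h_{IJ}^{2}(i, j),
\qquad
F = \sqrt{S_I\, S_J},
\qquad
S_I = \sum_{i} h_I^{2}(i),
\quad
S_J = \sum_{j} h_J^{2}(j)
```

With this form, a joint histogram spread uniformly along a one-to-one line over K grey levels gives Φ = F = N²/K and hence CRA = 1, whereas a joint histogram spread uniformly over all K × K cells gives Φ/F = F/N² = 1/K and hence CRA = 0, in agreement with the behaviour described above.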

Woods Criterion
Along with the probabilities, the radiometric values of the image pixels can also be used directly to estimate the correlation between two images. For this purpose, it is practical to introduce the conditional mean:

$$ m_{I|j} = \frac{1}{p_J(j)} \sum_{i} i\, p_{IJ}(i, j) $$

and the conditional variance:

$$ \sigma^2_{I|j} = \frac{1}{p_J(j)} \sum_{i} \left( i - m_{I|j} \right)^2 p_{IJ}(i, j) $$

By means of these quantities it is then possible to define a measure of the variability of the pixel intensity in one image, given a certain value of the homologous pixels in the other [24]. The assumption is that this variability is larger in the presence of differences (again, changes or misregistration) between the images. The Woods criterion is introduced on this basis.
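As an illustration of a "normalized standard deviation" built from these quantities, one plausible form is sketched below. It is an assumption made for the example: the conditional standard deviation $\sigma_{I|j}$ is divided by the conditional mean $m_{I|j}$ of I given J = j, the ratio is averaged over the grey levels j of the second image, and the result is subtracted from 1 so that larger values indicate higher similarity. The symbol W is introduced here.

```latex
W(I \mid J) \;=\; 1 \;-\; \sum_{j} \frac{\sigma_{I|j}}{m_{I|j}}\; p_J(j)
```

Under this assumed form, W reaches its maximum of 1 when every grey level j of J corresponds to a single grey level of I, so that all conditional standard deviations vanish.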

Correlation Ratio
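Finally, the correlation ratio is defined in a similar way, using the conditional mean and variance but also the variance $\sigma_I^2$ of the radiometric values of one of the images:

$$ \eta^2(I \mid J) = 1 - \frac{1}{\sigma_I^2} \sum_{j} \sigma^2_{I|j}\, p_J(j) $$

where $\sigma^2_{I|j}$ is the conditional variance of I given J = j.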
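The behaviour of the correlation ratio in the two limiting cases can be verified with a short sketch. The code below assumes the standard form of the correlation ratio, one minus the average conditional variance divided by the total variance of I; the function name and the 3 × 3 toy distributions are illustrative assumptions.

```python
import numpy as np

def correlation_ratio(p_ij):
    """Correlation ratio of I given J from a joint distribution p_ij.

    Rows index the grey levels of I, columns those of J.
    Assumes the variance of I is non-zero.
    """
    p_ij = np.asarray(p_ij, dtype=float)
    levels = np.arange(p_ij.shape[0], dtype=float)
    p_i = p_ij.sum(axis=1)
    p_j = p_ij.sum(axis=0)
    mean_i = np.sum(levels * p_i)
    var_i = np.sum((levels - mean_i) ** 2 * p_i)
    residual = 0.0
    for j in np.flatnonzero(p_j):
        cond = p_ij[:, j] / p_j[j]                   # p(i | j)
        m = np.sum(levels * cond)                    # conditional mean
        v = np.sum((levels - m) ** 2 * cond)         # conditional variance
        residual += v * p_j[j]
    return 1.0 - residual / var_i

# One-to-one mapping between grey levels: I is fully predictable from J.
p_dep = np.eye(3) / 3.0
# Independent uniform images: J carries no information about I.
p_ind = np.full((3, 3), 1.0 / 9.0)

eta_dep = correlation_ratio(p_dep)   # 1
eta_ind = correlation_ratio(p_ind)   # 0
```

A constant window in I has zero variance and makes the ratio undefined; a practical implementation would need to handle that case explicitly.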

Robust Measures
An assumption necessary for the application of the Woods criterion is the inter-image uniformity, which means that pixels of corresponding areas in the two images should have proportionally similar radiometric values. For example, in the specific case of [24], this implies that voxels of 3D medical images representing the same human tissue must have similar intensities within each image. This is a relatively strong requirement that is rarely verified in other cases of medical imagery coregistration. In fact, it is more often observed that the joint histograms of the images to be coregistered represent a mix of several populations, even when considering a homogeneous region in one of the images [17].
Consequently, the calculation of the averages and variances used in Equation (15) is influenced by the presence of several populations with different distributions.
In order to improve the robustness of the similarity estimation, especially in the case of multi-sensor data, averages and variances have to be calculated in such a way as to minimize the effects of outliers altering the distribution of the "main" population. This is possible by using the Geman-McClure estimator ρ(x, C), Equation (17), where x is the grey level residual, given by the difference between corresponding pixels in the case of mono-sensor images and by the difference with the conditional mean for multi-sensor data (in this case, the uniformity of the grey levels in the estimation window is implicitly assumed), and C is a scale parameter.
Given its shape, the function ρ(x, C) reduces the relevance of the largest deviations of the distribution. Indeed, it is easy to verify that, as the amplitude of the residual errors increases, ρ(x, C) tends to a constant value [17].
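The saturation property can be made explicit with the usual form of the Geman-McClure function. The expression below is given as an assumption: the overall constant factor may differ from the one adopted in [17], which does not affect the qualitative behaviour.

```latex
\rho(x, C) \;=\; \frac{x^{2}}{1 + x^{2}/C^{2}},
\qquad
\rho(x, C) \approx x^{2} \ \ \text{for } |x| \ll C,
\qquad
\lim_{|x| \to \infty} \rho(x, C) = C^{2}
```

Small residuals are therefore penalized quadratically, as in a least-squares estimate, while residuals much larger than the scale C all contribute approximately the same bounded amount, which limits the influence of outliers.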

Robust Woods Criterion
Based on the properties of robust estimators, an alternative formulation of the Woods criterion is possible, as described in [17]. The conditional mean and the conditional variance are calculated in robust form, with scale parameters $C_j$ defined from the median of the absolute values of the residual errors, scaled by the factor 1.4826. This factor derives from the fact that the median of the absolute values of a large number of normally distributed samples with unit standard deviation is equal to 1/1.4826 = 0.6745 [17,25].
The robust definition of the Woods criterion is then obtained by using these robust estimates of the conditional mean and variance in place of the standard ones.
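The value 0.6745 and the role of the factor 1.4826 can be checked numerically. The sketch below is only illustrative: the sample size and the random seed are arbitrary choices made for the example.

```python
import numpy as np

# Draw a large number of samples from a zero-mean, unit-variance normal law.
rng = np.random.default_rng(42)
samples = rng.standard_normal(200_000)

# Median of the absolute values of the samples (close to 0.6745).
mad = np.median(np.abs(samples))

# Rescaling by 1.4826 gives a robust estimate of the standard deviation
# (close to 1 for these samples).
scale = 1.4826 * mad
```

Unlike the sample standard deviation, this estimate changes very little if a small fraction of the samples is replaced by large outliers, which is the reason for using it as the scale parameter of a robust estimator.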

Experimental Approach
The performance analysis of the similarity measures presented in the previous sections was based on two data sets relative to the test sites of Toulouse, France, and Oberpfaffenhofen, Germany. Optical and SAR data were considered. The study was split into two parts, one for each data set: a general analysis of the similarity images was first conducted, aimed at defining their main characteristics and retrieving the features that define a low or high similarity; then a more detailed study was carried out to provide accuracy estimates of the derived change/no-change maps (with reference to man-made structures). Some processing steps were common to both data sets:
• Ground control points were selected by visual inspection for a preliminary "manual" coregistration.
• Fine coregistration was then performed using an automatic selection of the homologous points in the image pairs based on the CRA. Indeed, this was the method providing the best accuracy according to the tests reported in [19] (a precise coregistration is highly relevant in every change detection process, as shown in [26]).
• Each similarity measure S(I, J) was expressed in normalized form (i.e., rescaled to range from 0 to 1) and then, also to facilitate visual interpretation, its complementary value 1 − S(I, J) was used to derive the images. In this way, the most significant changes (i.e., the smallest similarity estimates) are represented by the brightest pixels.
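The last step, normalization followed by the complement, can be sketched as follows. The min-max rescaling and the handling of a flat map are assumptions made for the example, since the exact normalization used for the experiments is not specified; the function name `change_indicator` is hypothetical.

```python
import numpy as np

def change_indicator(similarity):
    """Rescale a similarity map to [0, 1] and return 1 - S.

    Min-max rescaling is an assumption of this sketch.
    """
    s = np.asarray(similarity, dtype=float)
    s_min, s_max = s.min(), s.max()
    if s_max == s_min:
        # Flat map: no contrast, so no pixel stands out as changed
        # (a convention chosen for this example).
        return np.zeros_like(s)
    normalized = (s - s_min) / (s_max - s_min)
    return 1.0 - normalized

# Small synthetic similarity map.
sim = np.array([[0.2, 0.8],
                [0.5, 1.4]])
change = change_indicator(sim)
```

The pixel with the lowest similarity (0.2) becomes the brightest one in the output, and the pixel with the highest similarity (1.4) becomes the darkest.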

Scene and Data Processing
The experimental data set of the Toulouse site consisted of two optical images (the blue band of multispectral, XS, data) and one SAR (X-band) image, taken from the PELICAN and the RAMSES systems, respectively. The scene is mainly urban, with limited vegetation patches represented by a park and some avenues with trees (see Figure 2). The first optical image simulates the push-broom mode of the SPOT satellite; details on the sensors and on the data may be found in Table 1. A significant time span of almost six years separates one of the optical images from the remaining two. After reducing the SAR and push-broom images to the same resolution as the other optical data, the three images were coregistered. Then, the similarity measures presented in the previous sections were applied to the push-broom/SAR and push-broom/XS pairs. For all the measures, two series of tests were conducted [27]:
• Case a used a 7 × 7-pixel estimation window to determine the histograms and then calculate the measures from the pixels of that window;
• Case b used a larger window (21 × 21 pixels) to determine the histograms and the pixel value probabilities but then evaluated the similarity using only the pixels of a smaller area (again of 7 × 7 pixels). Referring again to Figure 1, in this case one considers the outer and inner square windows for the two calculations.
The selection of the window dimensions is a delicate point related to the resolution of the sensors and to the nature and size of the targets in the scene. Further investigations on this topic are presently underway.

Result Analysis
Figures 3 and 4 show the images obtained using the same estimation window (7 × 7 pixels) for the statistical analysis and the calculation of the measures (case a), and Figures 5 and 6 those obtained with different window dimensions (case b). Based on these similarity images, some qualitative considerations can be made.
The first general observation is that the results vary considerably depending on the measure, indicating that each of them has a different sensitivity and may perform more or less efficiently. It is also evident that differences exist between cases a and b, which therefore have to be evaluated separately.
The similarity images show a strong dependence on the change of the sensor as well as on the change of the date (the time span between two acquisitions). This happens with all the similarity measures, so that none of the methods provides an unbiased (system independent) estimate of the changes that occurred. This is an important point to keep in mind, since a possible fusion of the results, i.e., the use of several similarity images derived from various combinations of different sensors, should take into account the sensor type and weight each contribution differently. In order to do this, a dedicated analysis of each method would also be necessary to assess the specific thresholds that define the occurrence of a change.

Case a
Referring to the targets in the scene, one can note that the areas of the park and of the avenues are often characterized by the lowest similarity indexes. This is as expected. In fact, vegetated areas usually provide an unstable and changing scenario (in terms of the signals received by the sensors), leading to low probabilities of the specific values measured at each pixel. With respect to this point, it is interesting to observe the analogy between the considered similarity measures and the interferometric coherence of two SAR images. In both cases, the calculated quantities provide an estimate of the correlation of the pixel values (for the SAR, taking into account the phase of the coherent travelling signal) within an estimation (averaging) window. The characteristics of a similarity image are then close to those of the coherence image obtained from a pair of SAR images. Hence, similarity images may be suggested for classification applications to detect vegetated areas, as is commonly done with interferometric coherence data [28,29]. The similarity images derived from the optical data are generally clear and permit a good recognition of details such as borders and edges, whereas those obtained from the push-broom/SAR pair are blurred, and many features recognizable using the other pair are confused and not clearly detectable (e.g., the asphalted parts of the avenues flanked by trees and some open spaces in the park).

Case b
The results obtained using a larger window (21 × 21 pixels) for determining the pixel value probabilities and then a smaller one (7 × 7 pixels) for the similarity measures are quite different from those of the previous case.
As expected, a larger number of samples yields a more stable and reliable assessment of their probabilities, but this also implies that pixels from heterogeneous areas may be mixed together, leading to a broadening of the histogram. The relative variation of the probabilities in the 7 × 7-pixel window is reduced, and the "intensity" of the estimated changes is smaller than in the examples of case a. A practical effect on the similarity images is the blurring of the borders between the two main homogeneous features present in the scene, buildings and trees, which were generally well separated in case a (see Figures 5 and 6).
Also interesting is the fact that the measures based only on probabilities provide results partly contrasting with those of the previous case. Indeed, the reduced relative variation of the probability in a pixel neighbourhood yields a more uniform behaviour of estimation windows corresponding to different features. In particular, the relevant changes of the vegetated areas observed with the small estimation window of case a no longer appear as such on a larger scale. Consequently, the similarity measure values for these areas are also no longer so small. A closer correspondence between the results of the two cases is given by the measures that also use radiometric values.

Three different airborne images were used for the second series of tests: panchromatic, X-band SAR, and hyper-spectral (HS), taking for the last one the blue band (see Table 2).
The acquisition campaigns were carried out over the test site of Oberpfaffenhofen, Germany, at different times, with a maximum span of about fourteen years between the first and the last data take. A limited scene was selected, containing mainly agricultural fields and forests but also a technology campus where some new buildings and a parking lot appear in the more recent images and are not yet present in the first one (see Figure 7). The accurate identification of the new buildings permitted us to concentrate on changes of man-made structures and to define a precise ground truth for both changes (new buildings or the parking lot) and false alarms ("old" buildings always present in the scene).
The panchromatic and SAR images were reduced to the same resolution as the HS one and the three images were coregistered. Then, the similarity images were derived from the panchromatic/SAR and panchromatic/HS pairs. For all the measures, the tests were conducted using a larger window (21 × 21 pixels) for determining the histograms and the pixel value probabilities and then a smaller area of 7 × 7 pixels for evaluating the similarity of the images [30].

Table 2. Oberpfaffenhofen data set: images and sensors details.

Result Analysis
Figure 8 shows the similarity images obtained using the panchromatic/SAR pair, and Figure 10 those obtained with the panchromatic/HS pair. Starting from these images, a quantitative assessment of the performance of the various similarity measures was obtained by selecting some man-made structures as the ground truth for both actual changes and false alarms. The respective percentages were then measured by fixing a threshold for the values of the similarity images (hence dividing them into two change/no-change regions) and counting the number of pixels of the ground truth areas included in the correct binary partition. In Tables 3 and 5, the reported estimates are obtained by automatically choosing the mean value of each similarity image as the threshold, i.e., a pixel is considered as changed if its measured value is larger than the average of the image to which it belongs. In contrast, the results collected in Tables 4 and 6 refer to thresholds fixed "manually" by trying to optimize the changes/false alarms ratio. For this second series of examples, the practical criterion adopted was that thresholds leading to a percentage of detected changes lower than 50% were never used. Figures 9 and 11 show the corresponding change/no-change maps for the panchromatic/SAR and panchromatic/HS pairs, respectively.
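The evaluation procedure with the automatic (mean value) threshold can be sketched as follows. This is an illustrative example on synthetic data: the function name `evaluate`, the boolean-mask representation of the ground truth and the 4 × 4 image are assumptions, and the input is taken to be the complementary image 1 − S(I, J), in which larger values indicate change.

```python
import numpy as np

def evaluate(change_img, changed_mask, unchanged_mask, threshold=None):
    """Return the threshold, the detection rate and the false alarm rate.

    changed_mask / unchanged_mask: boolean ground truth masks for actual
    changes and for structures that did not change (false alarm areas).
    Rates are percentages of ground truth pixels above the threshold.
    """
    change_img = np.asarray(change_img, dtype=float)
    if threshold is None:
        threshold = change_img.mean()        # automatic choice
    detected = change_img > threshold
    detection_rate = 100.0 * detected[changed_mask].mean()
    false_alarm_rate = 100.0 * detected[unchanged_mask].mean()
    return threshold, detection_rate, false_alarm_rate

# Synthetic 4 x 4 change image (larger value = lower similarity).
img = np.array([[0.9, 0.8, 0.1, 0.2],
                [0.7, 0.3, 0.1, 0.6],
                [0.2, 0.1, 0.1, 0.1],
                [0.1, 0.1, 0.1, 0.1]])
changed = np.zeros((4, 4), dtype=bool)
changed[:2, :2] = True      # hypothetical "new building"
unchanged = np.zeros((4, 4), dtype=bool)
unchanged[:2, 2:] = True    # hypothetical "old building"

thr, det, fa = evaluate(img, changed, unchanged)
```

Passing an explicit `threshold` value corresponds to the "manual" selection; lowering it increases both the detection rate and the false alarm rate, which is the trade-off explored in the manually tuned results.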
From the estimated performance for the panchromatic/SAR pair (see Tables 3 and 4), one may note the poor reliability of the Woods criterion and of the correlation ratio. In both cases, the false alarm rate is larger than the percentage of detected changes (although, again, this could be improved by adequately selecting the discriminant threshold). For the three measures using only the probabilities, the accuracy of the change detection is much higher, but the false alarm rate remains too large for these measures to be reliable. In other words, the theoretical capability of the considered measures to deal with passive/active multi-sensor images presents some limits in practical applications. Referring to the results obtained using images from passive sensors (panchromatic and hyper-spectral), Tables 5 and 6 indicate that the automatic selection of the change/no-change thresholds from the average value of the similarity images provides reliable results for the ground truth samples. The worst accuracy (the smallest number of changed pixels detected) and the largest false alarm percentage are given by the Woods criterion in both its standard and robust formulations. However, in the first case a weakening of the change/no-change discriminant (i.e., a reduction of the threshold) leads to an improvement of more than 23% in the change detection accuracy with only a 13% increase in the false alarms. In general, the CRA and the mutual information measures perform best. For the latter, it was also possible to improve the change detection accuracy by manually setting the threshold without affecting the false alarm rate.

Summary
In this work, a new approach for the change detection of remotely sensed data has been presented, and the feasibility of multi-sensor image change detection at pixel level was verified; alternative procedures are also possible [15], for instance operating at feature level after a preliminary segmentation or classification process [31]. Similarity measures, methods normally used for image coregistration, have been applied, investigating their performance when the data are provided by different sensors. The general interest is motivated by the need, in the near future, for techniques that permit the exploitation of complementary optical and SAR data from satellites planned to work in a cooperative way, such as Pléiades and COSMO-SkyMed. A further reason of interest is that these measures express, by their very definition, the degree of similarity between two images and hence are natural candidates to estimate their difference (their "dissimilarity"). Since essentially no example of their application has been reported in the literature, we decided to fill this gap and to study them, also providing a basic review of their theory.
Our observations do not allow definitive conclusions to be drawn at this stage, but they suggest a methodology able to cope with a major remote sensing issue, which opens the way to several interesting research perspectives. The presented results are indeed promising and indicate similarity measures as possible tools to detect changes of the Earth's surface [27,30].
We could see that the considered algorithms perform differently and that they do not offer an "absolute" measure of the changes. In fact, they depend more on the type of the sensor than on the time difference between the data takes. The selection of the dimensions of the estimation windows (for the pixel statistics and the similarity measure calculation) also affects the results, in particular when using measures based only on the probabilities. The definition of the optimal dimensions for the estimation window is an open question to be further investigated. It is also worth noting that we used a straightforward definition of the neighbourhood of a given pixel (based on square estimation windows) for the statistical analysis and the similarity estimation, and that the accuracy of these steps may be improved using any of the techniques that actively redefine region boundaries. These algorithms have the advantage of effectively identifying homogeneous areas and reducing the smearing of the radiometric values caused by a fixed window straddling two or more different regions. Our observations would remain valid, and the use of each similarity measure could be refined in this way.
Although similarity images were not originally intended for this task, the finding that they can distinguish vegetated areas from man-made structures suggests their application for classification purposes, in a similar way as for interferometric coherence SAR data (hence permitting, e.g., forest classification in areas with limited availability of cloud-free optical images). However, since the characteristics of natural targets lead to confusion with the response from changing man-made structures, step-wise procedures are suggested to first establish the nature of the targets and then their possible changes. Dedicated studies are presently in progress on both these topics.

Figure 1. Calculation of the marginal and joint probabilities.

Figure 3. Similarity images of the push-broom and SAR data, case a: (a) distance to independence; (b) mutual information; (c) cluster reward algorithm; (d) Woods criterion; (e) robust Woods criterion; (f) correlation ratio.

Figure 4. Similarity images of the push-broom and XS data, case a: (a) distance to independence; (b) mutual information; (c) cluster reward algorithm; (d) Woods criterion; (e) robust Woods criterion; (f) correlation ratio.

Figure 5. Similarity images of the push-broom and SAR data, case b: (a) distance to independence; (b) mutual information; (c) cluster reward algorithm; (d) Woods criterion; (e) robust Woods criterion; (f) correlation ratio.

Figure 11. Panchromatic/HS pair, change/no-change maps based on manually selected similarity measure thresholds: (a) distance to independence; (b) mutual information; (c) CRA; (d) Woods criterion; (e) robust Woods criterion; (f) correlation ratio. In yellow and red, respectively, the ground truth areas corresponding to new and always present man-made artifacts.

Table 1. Toulouse data set: images and sensors details.

Table 3. Panchromatic/SAR pair: change and false alarm estimates based on mean similarity measure thresholds.

Table 4. Panchromatic/SAR pair: change and false alarm estimates based on manually selected similarity measure thresholds.

Table 5. Panchromatic/HS pair: change and false alarm estimates based on mean change measure thresholds.

Table 6. Panchromatic/HS pair: change and false alarm estimates based on manually selected similarity measure thresholds.