Object-Based Change Detection in Urban Areas from High Spatial Resolution Images Based on Multiple Features and Ensemble Learning

To improve the accuracy of change detection in urban areas using bi-temporal high-resolution remote sensing images, a novel object-based change detection scheme combining multiple features and ensemble learning is proposed in this paper. Image segmentation is conducted to determine the objects in bi-temporal images separately. Subsequently, three kinds of object features, i.e., spectral, shape and texture, are extracted. Using the image differencing process, a difference image is generated and used as the input for nonlinear supervised classifiers, including k-nearest neighbor, support vector machine, extreme learning machine and random forest. Finally, the results of multiple classifiers are integrated using an ensemble rule called weighted voting to generate the final change detection result. Experimental results of two pairs of real high-resolution remote sensing datasets demonstrate that the proposed approach outperforms the traditional methods in terms of overall accuracy and generates change detection maps with a higher number of homogeneous regions in urban areas. Moreover, the influences of segmentation scale and the feature selection strategy on the change detection performance are also analyzed and discussed.


Introduction
Changes of land use/cover in urban areas are complex owing to the frequent interaction between humans and the natural system. Information on the transformation of land use/cover in urban areas is crucial for decision-making related to sustainable development and plays an important role in research on urban expansion and in practical applications such as urban planning and management. Remote sensing images with large coverage and short revisit times are extensively used in various aspects of urban change detection, such as monitoring the resources and the environment in urban areas [1,2]. The availability of new satellite sensors capable of providing high spatial resolution images, with more detailed landscape characterization than medium- and low-resolution remote sensing images, allows us to analyze urban areas at a local level [3].
Change detection is the process of identifying differences by observing images at different times [4][5][6][7]. Change detection methods can be broadly categorized into unsupervised and supervised methods [8,9]. The former perform a direct comparison of the two multispectral images under consideration [10]. These techniques are suitable for some applications, such as the detection of deforestation or burned areas. However, the performance of unsupervised techniques may be degraded by several external factors, such as illumination variations, changes in atmospheric conditions and poor sensor calibration, which normally occur at different acquisition times. By contrast, supervised methods, which are adopted in this study, use a training set generated from the ground truth for the learning process of classifiers. Compared with unsupervised methods, supervised methods exhibit the advantage of robustness in handling various atmospheric and illumination conditions at different acquisition times [11]. The use of high-resolution (HR) imagery incurs the cost of increased within-class variances, which prevent the successful application of traditional supervised classification methods. Therefore, the use of a robust and nonlinear classifier is more effective for HR imagery because noise and the generally higher spread in class distributions make the classification problem very complex. Rather than traditional classifiers such as the maximum likelihood classifier, discriminative classifiers based on k-nearest neighbor (KNN), support vector machine (SVM), extreme learning machine (ELM) and random forest (RF) play important roles in supervised classification. Although their effectiveness in several remote sensing applications has been demonstrated, some drawbacks have been observed with each classifier. The change detection result obtained using a single classifier is limited by the advantages and disadvantages of the selected classifier [12]. To solve this problem, the use of ensemble learning (EL) is proposed in change detection using HR imagery. An ensemble is defined as a set of individually-trained classifiers whose predictions are combined when classifying new data [13]. With regard to the application of EL in remote sensing, some studies have shown that the combination of different classifiers achieves better performance than the use of an individual classifier [14][15][16]. Owing to these principles and advantages of EL, the shortcomings of the aforementioned advanced classifiers might be overcome by using the ensemble strategy.
The traditional change detection method is pixel-based. It detects the occurrence of changes based on a comparison of pixels, and thus, it cannot overcome the limitations of radiometric differences and misregistration between different dates or sensors [17]. The object-based method overcomes these problems and considerably improves the accuracy of change detection [18]. The object-based method detects changes based on objects, and it can employ spectral and spatial information effectively and match the properties of ground objects. The object-based method subdivides an image into meaningful homogeneous regions and organizes them hierarchically into image objects [19,20]. An object is a set of pixels adjacent in space and spectrally similar to each other [21]. Object-based change detection (OBCD) extracts meaningful image objects by segmenting two or more input remote sensing images, and thus, it is consistent with the original notion of using change detection to identify differences in the state of an observed "object or phenomenon" [7]. Each image object is considered as a single study unit in OBCD. OBCD combines the segmentation operation and spatial, spectral and geographic information along with the experience of the analyst with image objects to model geographic entities [22,23]. It is evident that the interference of ground truth may decrease when using OBCD, which contains abundant relationships of attributes between pixels in the same objects [24][25][26]. Small spurious changes caused by high spectral variability, multi-temporal image acquisition and misregistration are therefore reduced by smoothing out small changes within the extent of each geographic object. The object-based method has demonstrated significant advantages for change detection using HR imagery, and various studies have been conducted. For example, Miller et al. (2005) [27] presented an OBCD algorithm to detect the change objects between a pair of gray-level images. Durieux et al. (2008) [28] applied an object-based classification approach and compared the independently-classified objects obtained from multi-temporal images for change detection. Instead of separately segmenting multi-temporal images, Desclee et al. (2006) [29] presented an algorithm in which temporally-sequential images are combined and segmented together to produce spatially-corresponding change objects.
Moreover, in contrast to the use of spectral information only, change detection using the object-based method allows for the extraction of sophisticated geospatial information with object-based features, such as shape and texture, which can exploit spatial information to complement and enhance the spectral-based method for change detection [30]. Spatial context features are considered to ease the process of classification and change detection using HR images. Murray et al. (2010) [31] showed that the combination of spectral and textural features significantly improves the performance of classification using HR images, whereas classification using spectral features alone yields lower performance. In [32], shape features were integrated for better classification using HR imagery. In summary, these studies confirm that the lack of spectral information is successfully balanced by the supplementation of contextual information.
Given that HR remote sensing images are widely used in urban change detection, the weaknesses of the aforementioned traditional methods can be observed often. Although the combination of an object-based method and multiple object contextual information has been introduced in some studies to enhance the change detection ability in comparison with traditional methods, the results are unstable for different training samples and urban scenes and are limited by the advantages and disadvantages of the selected classifier. Therefore, in this study, the EL method is applied to optimize the advantages of the multiple supervised classifiers in OBCD with multiple object contextual features and to obtain stable and highly accurate results of change detection in urban areas from HR remote sensing images.
The remainder of this paper is organized as follows. Section 2 presents the proposed scheme of change detection. Experimental results and analysis are presented in Section 3. Finally, the conclusion and some perspectives are presented in Section 4.

Method
The proposed approach, which is illustrated in Figure 1, broadly includes the following steps.
(1) Data pre-processing is conducted to reduce discrepancies between bi-temporal images.
(2) Image segmentation is performed to generate segmentation maps of bi-temporal images.
(3) The features in each region of the images are represented by the spectral, shape and textural information of the homogeneous pixels, and a difference image is generated.
(4) Multiple supervised classifiers are employed to detect the changes by using the difference image, and multiple change detection results are integrated to generate the final result.

Data Preprocessing
Fundamental image pre-processing operations are performed to reduce discrepancies between bi-temporal images acquired using the same or different sensors. Radiometric correction, which comprises radiometric calibration and the FLAASH atmospheric correction algorithm, is carried out to eliminate radiance or reflectance differences caused by the digitalization process of the remote sensing systems and atmospheric attenuation distortion caused by absorption and scattering in the atmosphere [9]. Orthorectification is usually required to remove relief displacement in multi-temporal images [33]. Pan-sharpening is also used in this step to improve the spatial resolution of HR imagery [34]. Co-registration is essential for ensuring that multi-temporal image pixels or objects in the same location are compared [35]. To ensure high accuracy of change detection, bi-temporal images are co-registered to a root-mean-square error of less than 0.5 pixels.
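The 0.5-pixel co-registration criterion can be checked with a short calculation. The following is a minimal sketch, assuming that matched tie points are available as (x, y) pixel coordinates in both images; the function name and the sample coordinates are hypothetical and are not taken from the study.

```python
import math

def registration_rmse(reference_points, registered_points):
    """Root-mean-square distance (in pixels) between matched tie points."""
    squared = [
        (xr - xw) ** 2 + (yr - yw) ** 2
        for (xr, yr), (xw, yw) in zip(reference_points, registered_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical tie points: positions in the Time 1 image and the
# corresponding positions in the co-registered Time 2 image.
reference_points = [(10.0, 10.0), (50.0, 20.0), (30.0, 70.0), (80.0, 90.0)]
registered_points = [(10.3, 10.0), (50.0, 20.4), (29.7, 70.0), (80.0, 89.6)]

rmse = registration_rmse(reference_points, registered_points)
meets_criterion = rmse < 0.5  # threshold stated in the text
```

With these illustrative points the residuals are a few tenths of a pixel, so the pair would pass the stated threshold.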

Image Segmentation
Image segmentation is the subdivision of a digital image into small separate regions represented by basic unclassified image objects according to certain criteria [36]. Multi-scale image segmentation is employed to obtain image objects at different scales [37]. Compared with several existing image segmentation methods, the fractal net evolution approach is an effective and widely-used image segmentation method in remote sensing [38]. In this method, spectral and spatial information are used to define a series of relatively homogeneous polygons [39], and several pixels or existing objects are merged into a larger one based on the following parameters: scale, color against shape weight and smoothness against compactness weight. Changing these criteria will change the shape and size of objects produced by segmentation, allowing an image to be segmented at different scales. In this research, the image segmentation of bi-temporal HR datasets is conducted using the object-based software Definiens eCognition Developer Version 8.0 (formerly eCognition), and segmentation schemes with a range of scales are used. All the parameter settings during the segmentation process are based on visual inspection of segmentation [40,41].

Multiple Features Extraction and Difference Image Generation
Using the feature extraction procedure, all the pixels within a segment receive the value of the feature computed for the whole segment, thus generating a raster feature map. Multiple features of objects are subsequently calculated to structure data layers in each temporal image. Spectral features are obtained based on the reflectance of the incident electromagnetic wave of different objects in each band, including the spectral signal of objects in each band (i.e., average of spectral signals from all the pixels within the objects), brightness of objects (i.e., average of spectral signals from all the bands) and maximum difference (i.e., maximum variation between spectral signals of all bands). The shape features consist of geometric features of objects, including length-width ratio, compactness, density and shape index. The textural features, including mean, variance, homogeneity, contrast, dissimilarity, entropy, angular second moment and correlation, comprise the measures of gray level co-occurrence matrices (GLCM) proposed by Haralick [42,43]. Table 1 lists the extracted features in this study. Two feature selection strategies are applied to evaluate the performance of multiple features in change detection, i.e., the selection of six spectral features and all eighteen features. Feature fusion through layer stacking is conducted to generate new bi-temporal images with multiple feature layers. In addition, normalization via min-max scaling is applied before feature fusion, which is aimed at reducing the effects of different data expressions owing to various acquisitions and generation conditions.
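The rasterization of an object feature described above (every pixel in a segment receives the value computed for the whole segment) can be illustrated for the mean spectral signal of one band. This is a minimal sketch in plain Python, assuming that the segmentation is available as a label image with one integer segment ID per pixel; the function name and the toy arrays are hypothetical.

```python
from collections import defaultdict

def object_mean_map(labels, band):
    """Give every pixel the mean band value of the segment it belongs to."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label_row, band_row in zip(labels, band):
        for segment_id, value in zip(label_row, band_row):
            sums[segment_id] += value
            counts[segment_id] += 1
    means = {segment_id: sums[segment_id] / counts[segment_id] for segment_id in sums}
    # Write the per-segment mean back to every pixel of that segment.
    return [[means[segment_id] for segment_id in label_row] for label_row in labels]

# Toy 2 x 3 image with two segments (IDs 1 and 2).
labels = [[1, 1, 2],
          [1, 2, 2]]
band = [[10.0, 20.0, 40.0],
        [30.0, 50.0, 60.0]]

feature_map = object_mean_map(labels, band)
```

The same pattern applies to any per-object statistic: compute it once per segment, then broadcast it to the pixels of that segment to obtain one feature layer.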
The pixel-based image differencing process is conducted to calculate the difference vectors of the bi-temporal segmentation images with multiple features between Time 1 and Time 2. The difference vectors of the multi-band difference image, which are used as the input of classifiers for change detection, are expressed as ∆C = G − H, where G = (g_1, g_2, ..., g_k) and H = (h_1, h_2, ..., h_k) are the two single temporal images, and k is the number of layers in each image. As all unchanged pixels result in similar differences (with ∆C ≈ 0), the land cover class of such pixels cannot be modeled. However, those showing difference vectors far from 0 in at least one feature have a high probability of being associated with a transition in ground cover [44,45]. When working with a low number of original spectral variables, this approach may present an ambiguity problem, in which ∆C ≈ 0 may also correspond to a real transition of the pixels. However, this issue does not affect the change detection process in the proposed scheme because it relies on multi-spectral bands and additional shape and textural variables.
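The differencing step ∆C = G − H, preceded by min-max scaling, can be sketched as follows. This is an illustration only: it assumes that each image is a list of feature layers, that each layer is a list of pixel rows, and that every layer of every date is scaled independently, which is one possible reading of the normalization described in the text. All function names are hypothetical.

```python
def min_max_scale(layer):
    """Rescale one feature layer (a list of rows) to the range [0, 1]."""
    flat = [value for row in layer for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # A constant layer carries no contrast; map it to zeros.
        return [[0.0 for _ in row] for row in layer]
    return [[(value - low) / span for value in row] for row in layer]

def difference_image(g_layers, h_layers):
    """Return delta C = G - H, computed layer by layer and pixel by pixel."""
    if len(g_layers) != len(h_layers):
        raise ValueError("both images need the same number of layers k")
    return [
        [[g - h for g, h in zip(g_row, h_row)] for g_row, h_row in zip(g_layer, h_layer)]
        for g_layer, h_layer in zip(g_layers, h_layers)
    ]

# Toy example with k = 1 layer of 2 x 2 pixels per date.
G = [min_max_scale([[0.0, 5.0], [10.0, 10.0]])]
H = [min_max_scale([[0.0, 20.0], [20.0, 20.0]])]

delta_c = difference_image(G, H)
```

In this toy case only the top-right pixel has a difference far from 0, so it is the only candidate for change; the other three pixels give ∆C = 0.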

Change Detection with EL
Change vectors of bi-temporal difference images are used as the input of the supervised classifiers to classify the change and non-change in urban areas. The robust and powerful classifiers, which are proposed for use in change detection using HR imagery, are described as follows.
(1) The KNN classification approach is an instance-based learning algorithm that uses the nearest distance in determining the category of a new vector in the training data. During the training stage, the feature space is divided into multiple regions, and the training data points are mapped into these regions according to the similarity of their content. The unlabeled input data points are assigned to a particular category by finding the distance between the input data point and that category. The KNN approach requires only a small number of training data points, which contributes to its simplicity, and it has been reported to outperform other classification approaches [46].
(2) SVM, which is a nonparametric supervised classifier relying on Vapnik's statistical learning theory [17], is chosen owing to its intrinsic robustness to high-dimensional datasets and to ill-posed problems. It possesses the advantages of superior generalization ability and insensitive value and is suitable for solving high-dimensional, small-sample, non-linear classification and regression problems. In this study, the Gaussian radial basis function is used as the kernel function, and particle swarm optimization is used to optimize the parameters [47] in supervised change detection.
(3) ELM is a new type of single hidden layer feed-forward neural network [48], whose characteristics include random initialization of the input weights and biases and direct computation of the corresponding output. Successful applications of ELM in different fields have been frequently reported in the literature [49][50][51]. Compared with the traditional single hidden layer feed-forward neural network, ELM has higher computational efficiency and generalization ability and is thus well suited to serve as an effective and efficient classifier in change detection.
(4) RF is a non-parametric machine learning method that can handle a large number of input features [52]. It can also be used as an embedded method for the fusion of multiple features, where the feature selection and learning phases can interact with each other [53]. RF is a combination of a series of tree-structured classifiers. Each tree is grown on a bootstrapped sample randomly drawn from the original training samples. The Gini index [54], which is a standard impurity measure, is used to avoid variable selection bias. The final results are subsequently obtained by selecting the output of the ensemble of tree classifiers, which yields higher accuracy than using a single classifier alone [55]. This method demonstrates robust and accurate performance with complex datasets without requiring fine-tuning in the presence of many noisy variables, and it can simultaneously predict whether changes have occurred in the pixels of the objects.
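To make the supervised step concrete, the simplest of the four classifiers, KNN, can be sketched directly on difference vectors. This is a minimal illustration and not the configuration used in the study: the Euclidean distance, the value k = 3, the majority vote and the toy training set are all assumptions, and the function name is hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a difference vector by majority vote among its k nearest training vectors."""
    neighbours = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Toy training set of 2-layer difference vectors: 0 = non-change, 1 = change.
train_vectors = [(0.0, 0.0), (0.1, -0.1), (0.05, 0.0),
                 (0.8, 0.9), (0.7, 0.8), (0.9, 0.7), (-0.7, 0.6)]
train_labels = [0, 0, 0, 1, 1, 1, 1]

near_zero = knn_predict(train_vectors, train_labels, (0.02, 0.03))
far_from_zero = knn_predict(train_vectors, train_labels, (0.75, 0.8))
```

A difference vector close to 0 is assigned to non-change, while one far from 0 is assigned to change, which mirrors the interpretation of ∆C given earlier.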
In the context of pattern recognition, it is not guaranteed that one or more specific classifiers can always achieve the best performance in every circumstance. However, better predictive performance than that achieved using any single classifier might be achieved using EL, which can combine the results of multiple classifiers to achieve a better result and generalization capability. Owing to its validity in statistics, expression and computation, EL has been widely applied in machine learning and pattern recognition [13][14][15]. In this research, after the aforementioned multiple change detection results are generated using different supervised classifiers, an ensemble method called weighted voting [14,56] is applied, which combines the outputs of all the classifiers to achieve the final change detection result. In this method, the supervised classifiers are trained with training samples randomly selected from the labeled samples, and the remaining samples are used as test samples. The weight of each classifier is defined by the overall accuracy (OA) of its change detection results on the test samples. Finally, for each pixel, the weights of the classifiers voting for each label are summed, and the label with the largest total weight is taken as the final change detection result of the pixel.
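The weighted voting rule described above can be sketched in a few lines. This is a minimal illustration, assuming that each classifier has already produced one label per pixel and that its weight equals its OA on the test samples; the OA values below are hypothetical placeholders, not results from the experiments.

```python
def weighted_vote(predictions, weights):
    """Fuse per-pixel labels: the label with the largest summed classifier weight wins."""
    n_pixels = len(next(iter(predictions.values())))
    fused = []
    for i in range(n_pixels):
        scores = {}
        for name, labels in predictions.items():
            scores[labels[i]] = scores.get(labels[i], 0.0) + weights[name]
        # On an exact tie, max() keeps the label that was scored first.
        fused.append(max(scores, key=scores.get))
    return fused

# Labels for four pixels from each classifier: 1 = change, 0 = non-change.
predictions = {
    "KNN": [1, 0, 1, 0],
    "SVM": [1, 0, 0, 0],
    "ELM": [0, 0, 1, 1],
    "RF":  [1, 1, 0, 1],
}
# Hypothetical overall accuracies used as weights.
weights = {"KNN": 0.80, "SVM": 0.90, "ELM": 0.85, "RF": 0.88}

final_labels = weighted_vote(predictions, weights)
```

For the third pixel the vote is split two against two, and the outcome is decided by the weights: SVM and RF together (1.78) outweigh KNN and ELM (1.65), so the pixel is labeled non-change.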

Experiment A
Experimental Area A is located in Xuzhou, China. The images were acquired by the QuickBird satellite on 15 September 2004 and 2 May 2005. After data preprocessing, the images have 1340 × 1860 pixels with a spatial resolution of 0.61 m. The ground reference change map, showing the changed and unchanged regions in the study area, is produced in two steps. (1) A pixel-based change vector analysis method is applied to implement unsupervised change detection, and the Otsu method is used to define the threshold between the change and non-change classes. The Otsu method is a non-parametric method of automatic threshold selection. The optimal threshold is selected based on a discriminant criterion, so as to maximize the separability of the resultant classes [57]. (2) Using the preliminary result of the previous step, manual image interpretation is carefully conducted based on available prior knowledge (Figure 2).
Image segmentation, in which the color weight is set as 0.8, the shape weight is set as 0.2 and the smoothness and compactness weights are both set as 0.5 in all scales according to [58], is conducted with scales from 100 to 300 in steps of 50 (Figure 3) to validate the applicability of multiple features and EL in change detection using HR imagery. Eighteen features are extracted, including spectral, shape and texture features. Two feature selection strategies, one that considers spectral features and another that considers all the features extracted, are applied in order to evaluate the performance of multiple object features in OBCD using HR imagery. Four classifiers, i.e., KNN, SVM, ELM and RF, are used with 2500 training samples per class (a total of 5000 samples) randomly selected from the pixels in the reference map for each classifier, and the results of multiple classifiers are integrated to generate the final results. To obtain stable change detection accuracy, the experimental results are reported as the mean of 10 Monte Carlo runs.
Table 2 lists the OBCD accuracies obtained using four different single supervised classifiers and the EL method with different scales and feature selection strategies, in which pixel-based change detection results are also listed for comparison. The highest accuracies for each scale and feature selection strategy are highlighted in bold. It can be observed that OBCD outperforms the traditional pixel-based change detection method. In OBCD, the accuracies of the EL method are always the highest for all scales. Comparing the different feature selection strategies used for OBCD with EL, the multiple-feature-based method produces better results than the spectral-feature-based method. Figure 4 shows the change detection maps obtained by using EL in one Monte Carlo run. Significant improvements obtained by using multiple features are indicated with pairs of yellow rectangles. For example, the unchanged woods, which are marked with yellow rectangles in the center right of most maps, are correctly left undetected by the multiple-feature strategy, but they are falsely detected as change by the spectral-feature strategy. Spectral-feature-based change detection also causes some other errors, such as the false detection of unchanged grassland, which is marked with the smallest rectangles in the lower right of (c) and the lower center of (d), contrasted with the same locations in (h) and (i). Other pairs of comparisons are also marked in the figure. Overall, change detection maps obtained using multiple features are more accurate than those obtained using the spectral-feature-based method. Figure 5a,b shows the trends of change detection accuracies for EL and the four single supervised classifiers. From the figures, it can be observed that the image segmentation scale significantly affects the accuracy of the change detection results. The lines representing different classifier accuracies show various trends with the increase in image segmentation scale, which indicates that the impact of image segmentation scale varies across classifiers. Nevertheless, the EL method can take advantage of the results with higher accuracies and avoid the disadvantages of the results with lower accuracies to generate the final results with the highest accuracies for all scales. The EL accuracies increase with the scale until they peak at the scale of 200 and then drop off. In other words, both over-segmentation and under-segmentation can negatively affect change detection results. Figure 5c illustrates that change detection results obtained using multiple features are more accurate than those obtained using the spectral-based method for all scales.
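The first step in building the reference map for Experiment A, change vector analysis followed by Otsu thresholding, can be sketched as below. This is an illustrative plain-Python version and not the implementation used in the study: the number of histogram bins, the function names and the toy difference vectors are assumptions.

```python
import math

def otsu_threshold(values, bins=64):
    """Return the threshold that maximizes the between-class variance of a histogram."""
    low, high = min(values), max(values)
    if high == low:
        return low
    width = (high - low) / bins
    hist = [0] * bins
    for value in values:
        hist[min(int((value - low) / width), bins - 1)] += 1
    centers = [low + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_variance, best_threshold = -1.0, low
    weight0, sum0 = 0, 0.0
    for i in range(bins - 1):
        weight0 += hist[i]
        sum0 += centers[i] * hist[i]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        variance = weight0 * weight1 * (mean0 - mean1) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = low + (i + 1) * width  # upper edge of bin i
    return best_threshold

# Toy difference vectors (two bands): five near zero, four far from zero.
diff_vectors = [(0.06, 0.08), (0.0, 0.2), (0.09, 0.12), (0.03, 0.04), (0.1, 0.0),
                (1.2, 1.6), (2.2, 0.0), (0.0, 1.9), (1.5, 1.5)]
magnitudes = [math.hypot(dx, dy) for dx, dy in diff_vectors]  # CVA magnitude
threshold = otsu_threshold(magnitudes)
change_mask = [m > threshold for m in magnitudes]
```

The resulting mask is only the preliminary, unsupervised result; in the procedure described above it is subsequently corrected by manual interpretation.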

Experiment B
The Experimental Area B dataset is composed of Ziyuan-3 and Gaofen-1 HR images acquired at the same location in Jiangyin County, China, on 10 February 2015 and 27 March 2016. Although acquired by different sensors, the images share the same multi-spectral and panchromatic bands, each of which has the same spectral response range. Therefore, after undergoing a series of pre-processing operations including radiometric correction, orthorectification, pan-sharpening, co-registration and resampling, the images are consistent and can be used for change detection. They have 800 × 880 pixels with a spatial resolution of 2 m. As in Experiment A, the ground reference map is produced by combining the pixel-based change vector analysis method and manual photointerpretation (Figure 6). Image segmentation is performed using the eCognition software with scales from 40 to 120 in steps of 20 for each temporal image (see Figure 7). In the operation, the color weight is set as 0.8, the shape weight is set as 0.2 and the smoothness and compactness weights are both set as 0.5.
Table 3 lists the overall accuracies and Kappa coefficients of all the results for different feature selection strategies and image segmentation scales. The highest accuracies for each scale and feature selection strategy are highlighted in bold. Compared with the pixel-based method, OBCD achieves higher accuracies at each selected segmentation scale. The proposed EL method integrates the multiple results obtained with different classifiers and achieves the best results. Comparing different feature selection strategies, the use of multiple features in change detection achieves better performance than the use of only spectral features at each scale. Figure 8 shows the EL results in one Monte Carlo run for different scales and feature selection strategies. Significant improvements are marked with yellow rectangles in each map of Figure 8, and pairs of comparisons are marked to show the difference of change maps at each scale. It can be observed that the spectral-feature-based method causes more false alarms than the multiple-feature-based method with reference to the standard reference change map. For example, in the lower center of Experimental Area B, farmlands have not changed, except that canals have dried up. False detections of farmlands by the spectral-feature-based method are marked with rectangles in (c-e), whereas the multiple-feature-based method detects the changed canals accurately in the same location of (h-j). Figure 9a,b shows the change detection results with different supervised classifiers for different scales. Although the single classifiers show various trends with the increase of segmentation scale, the EL method integrates their results and achieves better results than any single classifier for each scale. Figure 9c shows the EL results with different feature selection strategies for various segmentation scales. The results obtained by using multiple features are significantly better than those obtained by using spectral features only. With the increase of segmentation scale, the accuracies of both lines increase until they peak at the scale of 80 and then drop off. In other words, both over-segmentation and under-segmentation decrease the accuracies of the results.
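The two measures reported in the tables, overall accuracy and the Kappa coefficient, can be computed from a reference map and a detected map as sketched below. The sketch uses the standard definition of Cohen's Kappa; the function name and the ten-pixel example are hypothetical and do not reproduce any value from the experiments.

```python
def overall_accuracy_and_kappa(reference, predicted):
    """Return (OA, Kappa) for two equally long label sequences."""
    n = len(reference)
    classes = set(reference) | set(predicted)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Agreement expected by chance from the marginal class proportions.
    expected = sum(
        (reference.count(c) / n) * (predicted.count(c) / n) for c in classes
    )
    if expected == 1.0:
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)

# Ten pixels: 1 = change, 0 = non-change.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

oa, kappa = overall_accuracy_and_kappa(reference, predicted)
```

Here eight of ten pixels agree, so the OA is 0.8, while the Kappa coefficient is lower (about 0.58) because part of that agreement would be expected by chance.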

Conclusions and Perspective
In this work, a novel change detection scheme that combines multiple features and the EL method is proposed for OBCD of urban areas using HR images. Based on the experimental results using two datasets, several conclusions can be drawn as follows.
(1) The object-based method shows better performance than the pixel-based method as it considers not only spectral information, but also object geometry information in change detection.
(2) The use of multiple object features in change detection can yield higher accuracies in the object-based method with different segmentation scales and classifiers. Additional shape and textural features can enhance the ability to distinguish changed and unchanged regions in comparison with the spectral-based method.
(3) Single supervised classifiers show unstable performance when dealing with different images or segmentation scales. As change detection with the EL method integrates multiple results, it achieved higher accuracy than any single supervised classifier in all the experiments. In addition, change detection results with the EL method are more stable for various segmentation scales and benefit OBCD without requiring much prior knowledge of the research area.
(4) Image segmentation scale can affect the accuracy of OBCD results. Suitable scales can yield better performance, and the outcomes show high similarity with the transformation of ground truth objects, whereas over-segmentation and under-segmentation are not conducive to obtaining better results, and missing or false detections are easily generated owing to the inappropriate sizes of objects in bi-temporal segmentation maps.
However, the new approach still suffers from two major limitations. First, segmentation scales can affect the results of change detection, and determining the most suitable scale requires repeated experiments. Second, since the proposed approach is based on supervised methods, it is transferable and reproducible only if information on training samples in the research area is learned in advance. Specifically, a small quantity of training samples containing changed and unchanged regions should be labeled to generate the change detection map of the whole scene through a full training process. Owing to these limitations and subjective intervention conditions, future studies will be directed toward, but not limited to, extending the method from a supervised approach to an unsupervised or semi-supervised approach by performing segmentation and selecting the training samples of changed and unchanged regions automatically, in HR images with urban scenes and in other types of data with different spatial resolutions and scenes.

Figure 1. Framework of the proposed change detection approach using HR imagery.
Figure 2 illustrates the false color composite images and the standard reference change map of Experimental Area A, in which 305,298 pixels are labeled as non-change and 219,715 pixels are labeled as change.

Figure 2. Experimental Area A. (a) Image acquired on 15 September 2004; (b) image acquired on 2 May 2005; (c) standard reference change map.

Figure 3. Subsets of segmentation maps with different scales in Experimental Area A. (a-e) are the scales from 100-300 in steps of 50 for the image acquired in 2004; (f-j) are the scales from 100-300 in steps of 50 for the image acquired in 2005.

Figure 4. Change detection results of the EL method with one Monte Carlo run, where red denotes change and blue denotes non-change. (a-e) are the results with spectral features with the scale from 100-300 in steps of 50; (f-j) are the results with multiple features with the scale from 100-300 in steps of 50.

Figure 5. (a) Accuracies of OBCD with spectral features and different classifiers for various segmentation scales; (b) accuracies of OBCD with multiple features and different classifiers for various segmentation scales; (c) accuracies of OBCD with EL and different feature selection strategies for various segmentation scales.
Figure 6 illustrates the false color composite images and the standard reference change map of Experimental Area B, in which 42,704 pixels are labeled as non-change and 24,586 pixels are labeled as change.

Eighteen features are extracted from the objects in each segmentation image, and two feature selection strategies are applied for comparison as mentioned above. Four classifiers, i.e., KNN, SVM, ELM and RF, are employed to classify the change and non-change with 1000 samples per class (a total of 2000 samples) randomly selected from the objects in the standard change map. Finally, multiple results are integrated to obtain the final results. The experimental results are obtained as the mean of 10 Monte Carlo runs.

Figure 7. Subsets of segmentation maps with different scales in Experimental Area B. (a-e) are the scales from 40-120 in steps of 20 for the image acquired in 2015; (f-j) are the scales from 40-120 in steps of 20 for the image acquired in 2016.

Figure 8. Change detection results of the EL method with one Monte Carlo run, where red denotes change and blue denotes non-change. (a-e) are the results with spectral features with the scale from 40-120 in steps of 20; (f-j) are the results with multiple features with the scale from 40-120 in steps of 20.

Figure 9. (a) Accuracies of OBCD with six spectral features and different classifiers for various segmentation scales; (b) accuracies of OBCD with multiple features and different classifiers for various segmentation scales; (c) accuracies of OBCD with EL and different feature selection strategies for various segmentation scales.

Table 1. Overview of extracted features for OBCD using HR imagery.

Table 2. Overall accuracies of change detection and Kappa coefficients of Experiment A for different segmentation scales and feature selection strategies.

Table 3. Overall accuracies of change detection and Kappa coefficients of Experiment B for different segmentation scales and feature selection strategies.