A Novel Change Detection Approach for Multi-Temporal High-Resolution Remote Sensing Images Based on Rotation Forest and Coarse-to-Fine Uncertainty Analyses

Abstract: In the process of object-based change detection (OBCD), scale is a significant factor related to extraction and analyses of subsequent change data. To address this problem, this paper describes an object-based approach to urban area change detection (CD) using rotation forest (RoF) and coarse-to-fine uncertainty analyses of multi-temporal high-resolution remote sensing images. First, highly homogeneous objects with consistent spatial positions are identified through vector-raster integration and multi-scale fine segmentation. The multi-temporal images are stacked and segmented under the constraints of a historical land use vector map using a series of optimal segmentation scales, ranging from coarse to fine. Second, neighborhood correlation image analyses are performed to highlight pixels with high probabilities of being changed or unchanged, which can be used as a prerequisite for object-based analyses. Third, based on the coarse-to-fine segmentation and pixel-based pre-classification results, change possibilities are calculated for various objects. Furthermore, changed and unchanged objects identified at different scales are automatically selected to serve as training samples. The spectral and texture features of each object are extracted. Finally, uncertain objects are classified using the RoF classifier. Multi-scale classification results are combined using a majority voting rule to generate the final CD results. In experiments using two pairs of real high-resolution remote sensing datasets, our proposed approach outperformed existing methods in terms of CD accuracy, verifying its feasibility and effectiveness.


Introduction
Change detection (CD) is an important research topic involving quantitative analyses of multi-temporal remotely sensed images to investigate changes in land cover, particularly in the contexts of urban infrastructure monitoring, urban development, and disaster assessment [1]. Along with the rapid development of remotely sensed image acquisition systems and gradual shortening of the [...] utilized existing geographic information system (GIS) databases and prior knowledge to accomplish high-resolution remote sensing image CD [26–30]. Post-classification with multi-temporal remote sensing images is one of the most popular change detection methods, providing detailed "from-to" change information in real applications [31,32]. However, because it neglects the temporal correlation between corresponding pixels in multi-temporal images, the post-classification approach usually suffers from an accumulation of misclassification errors [32]. To solve this problem, some researchers have combined historical land use vector maps (HVMs) with remote sensing images to create a classification system, set decision rules, and use existing GIS knowledge to aid the CD process [26,28–30]. These studies use GIS and remote sensing tools to monitor land use and land cover changes at different spatiotemporal scales. Such tools can provide scientific procedures to analyze the pattern, rate, and trend of environmental change at all scales.
Generally speaking, two types of change can affect polygons in HVMs: global change and local change. Global changes are changes to the overall category of the target polygon (e.g., region B from impervious surface to building, and region C from farmland to impervious surface in Figure 1a,b), while local changes are those within a target polygon's coverage (e.g., part of region A from bare land to impervious surface, and part of region D from settlement to impervious surface in Figure 1a,b). To effectively detect areas where local changes have occurred in an HVM, a remote sensing image needs to be segmented under the constraints of the HVM to separate the area exhibiting a local change. Vector-raster integration methods avoid the manual setting of segmentation parameters, and each object is the smallest unit of the land use CD. Nonetheless, these methods have several drawbacks. First, HVMs and images must be strictly co-registered, because registration error can affect the CD result. Second, an HVM is produced according to the national census geography standard of the country being analyzed. It often reflects land use based on classification criteria, while remote sensing images reflect the actual land coverage, so a transformation is necessary before integrating these two data sources. Third, objects acquired through an HVM are usually heterogeneous, and the classification of these objects is generally complex. These flaws are likely to cause poor CD results.
Remote Sens. 2018, 10, x FOR PEER REVIEW
To alleviate the aforementioned drawbacks, this paper incorporates the guidance of historical interpretation into the OBCD process from multi-temporal high-resolution remote sensing images. Accordingly, we propose an OBCD approach that combines pixel-based pre-classification of high-resolution remote sensing images with the rotation forest (RoF) classifier and an HVM. The purpose of combining two approaches is to obtain better results [16,33–35]. In general, there are two ways to undertake a combined approach: the two methods may be executed in parallel and then integrated, or they may be performed consecutively, allowing the results of one method to be used as the premise for the other.
In this paper, we perform pixel-based and OB methods consecutively. For pixel-level analyses, neighborhood correlation image analysis (NCIA) is adopted to obtain pixel-level pre-classification results, which are then used as a prerequisite for OB analyses. The HVM is used as a priori knowledge to guide selection of the optimal segmentation scales for various ground objects. RoF, a new type of machine-learning algorithm, performs better than many single predictors and integrated forecasting methods due to its high diversity of training samples and features. RoF is widely used to combine numerous weak classifiers to generate a relatively strong classifier by reducing either the bias or the variance of the individual classifiers [23,36–38]. In this paper, we combine pixel-based and OB methods with the RoF model to analyze the influences of segmentation scale, sample selection, and feature extraction on final results. The main contributions of this paper are as follows. First, we propose a pre-classification scheme based on NCIA to obtain high-accuracy labeled samples. Second, we present a coarse-to-fine uncertainty analysis model based on the combination of RoF and HVM. In this analysis, the optimal image segmentation result is obtained under the guidance of historical knowledge interpretation. RoF, combining feature extraction and classifier ensembles, is applied to binary classification. Third, an effective majority voting (MV) rule is proposed to produce the final results.
The rest of this paper is organized as follows. Section 2 details the proposed method. In Section 3, the experimental results on multiple datasets are presented to show the performance of the proposed method. Section 4 provides the discussion. Finally, we conclude the paper in Section 5.

Methodology
Among CD techniques, OBCD methods have become the most popular. However, due to the decline of spectral separability and complications such as identifying the same spectrum in different objects and/or different spectra in the same object, OBCD remains a challenging issue for several reasons. First, multi-temporal image segmentation (MTIS) is the foundation and basis of OBCD, and it seriously restricts the accuracy of OBCD. Not only do many segmentation algorithms exist that affect the resulting object geometries, but many CD techniques can affect the final results [24]. Second, change information is usually extracted at a single scale, neglecting scale set constraints. Third, objects in MTIS results are treated independently, ignoring the effects of neighboring object interactions on the change property.
To address these key problems, a multi-scale and multi-level CD strategy is adopted in this paper. The overall workflow of the proposed approach is shown in Figure 2. Four aspects are investigated to improve OBCD accuracy in terms of the OBCD process, namely MTIS, training sample selection, multi-feature extraction, and coarse-to-fine uncertainty analyses. (1) MTIS is conducted using vector-raster integration and coarse-to-fine segmentation. The superimposed image is divided into homogeneous objects with eCognition software (version 8.7) [39], facilitating subsequent CD experiments. (2) Changed and unchanged objects are selected based on coarse-to-fine segmentation and pixel-level pre-classification. Uncertain objects are further classified using the trained RoF model. (3) The multi-features of each object in the images are represented by spectral and texture information of homogeneous pixels. (4) Scale sets can be considered as a collection of image sequences of different scales. CD results from different scales are usually treated independently, and thus multi-scale fusion is implemented to combine different CD results using coarse-to-fine uncertainty analyses.

MTIS and Estimation of Scale Parameters
Multi-resolution segmentation (MRS) [39], which is embedded in eCognition software, is employed for image segmentation. Acquisition of coarse-to-fine sequential images is the basis of scale-driven OBCD. The quality of segmentation seriously affects the accuracy of object feature extraction and OBCD. Based on different image segmentation strategies, there are currently three primary modes of multi-temporal image segmentation [40,41], namely single-temporal segmentation (STS), multi-temporal separate segmentation (MTSS), and multi-temporal combined segmentation (MTCS), as shown in Figure 3. Further details of these three modes are described in Niemeyer et al. (2008) and Zhang et al. (2017). In this paper, corresponding objects are extracted via vector-raster integration and coarse-to-fine segmentation under the boundary constraints of the HVM. The HVM, produced from the National Census Geographic of China, is used as auxiliary data. The superimposed image is divided into homogeneous objects with consistent spatial positions. Estimation of scale parameter (ESP) [42,43] and the modified average segmentation evaluation index (ASEI) are used to guide the selection of the optimal segmentation scales for different ground objects. ESP, developed by Drăguţ et al. (2010), is a scale parameter estimation plug-in for eCognition software that calculates the local variance (LV) of homogeneous image objects using different scale parameters. The average standard deviation of the object layer is used to determine whether the segmentation scale is optimal. The rate of change in LV (ROC-LV) is used to indicate the approximate optimal scale parameter. When the ROC-LV is at its maximum value within an interval, the corresponding scale is the optimal scale.
However, this method can only be applied to image data in a single band.
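The ROC-LV criterion can be illustrated with a minimal sketch. It assumes the commonly used definition ROC-LV = 100 × (LV_s − LV_{s−1}) / LV_{s−1} between consecutive scale levels and treats a local peak of ROC-LV as a candidate optimal scale. The function names and the LV values are hypothetical and are not taken from the ESP tool or from the experiments in this paper.

```python
import numpy as np


def roc_lv(local_variance):
    """Rate of change of local variance between consecutive scale levels (percent)."""
    lv = np.asarray(local_variance, dtype=float)
    return 100.0 * (lv[1:] - lv[:-1]) / lv[:-1]


def candidate_scales(scales, local_variance):
    """Return the scales at which ROC-LV is a local peak (larger than both neighbours)."""
    roc = roc_lv(local_variance)
    roc_scales = np.asarray(scales)[1:]  # roc[i] belongs to scales[i + 1]
    peaks = []
    for i in range(1, len(roc) - 1):
        if roc[i] > roc[i - 1] and roc[i] > roc[i + 1]:
            peaks.append(int(roc_scales[i]))
    return peaks


# Hypothetical LV values measured at six segmentation scales.
scales = [10, 20, 30, 40, 50, 60]
lv = [5.0, 6.0, 6.6, 8.58, 9.0, 9.2]
print(candidate_scales(scales, lv))
```

In this toy series, the only scale whose ROC-LV exceeds both of its neighbours is 40, so it would be the candidate passed on to the multi-band check.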

To take full advantage of multi-spectral information, we utilize the modified ASEI to confirm the best scale parameter. The segmentation evaluation index (SEI) [44,45] of an object in band L is defined by Equation (1) in terms of two quantities. The first, σ_L, is the spectral standard deviation of the object, which is used as a homogeneity index (Equation (2)). The second, ∆C_L, is the heterogeneity index, derived from the absolute value of the mean difference between the object and its neighbouring objects (Equation (3)). In these equations, num is the total number of pixels within the object, C_Li is the grey value of pixel i in band L, C_L is the mean value of the object, l is the boundary length of the object, l_k is the length of the common edge with the kth adjacent object, and C_Lk is the mean value of the kth adjacent object. All of the formulas above are for a single band L only.
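For reference, one form of Equations (1) to (3) that is consistent with the symbol definitions above is sketched here. It is an assumption rather than the authors' exact notation: the ratio form of Equation (1), the 1/(num − 1) normalisation in Equation (2), and the symbol n for the number of adjacent objects are not stated in the text, and the object means are written with a bar.

```latex
\begin{align}
SEI_L &= \frac{\Delta C_L}{\sigma_L} \tag{1} \\
\sigma_L &= \sqrt{\frac{1}{num - 1} \sum_{i=1}^{num} \left( C_{Li} - \bar{C}_L \right)^2} \tag{2} \\
\Delta C_L &= \frac{1}{l} \sum_{k=1}^{n} l_k \left| \bar{C}_L - \bar{C}_{Lk} \right| \tag{3}
\end{align}
```

Under this reading, an object scores highly when it is internally homogeneous (small σ_L) and differs strongly from its neighbours along their shared boundaries (large ∆C_L).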
For superimposed images (the number of bands is 2N, where N is the number of bands in the single-phase image), eCognition applies different weights (w_L) to different bands to highlight the impacts of various bands on the segmentation results. Therefore, to account for multiple bands in the SEI of any object, this paper modifies the SEI calculation to Equation (4) and uses a weight of w_L = 1 for each band, which ensures that each band contributes equally in the segmentation process [42,43]. To compare objects at different segmentation scales, the SEI values of all objects in the study area can be averaged. Because the area of an object (i.e., the number of pixels forming the image object) may affect the segmentation results, this paper introduces an area factor that gives larger weights to objects with larger areas, reducing the instability caused by object area. The modified ASEI is calculated using Equation (5), where A represents the total area of all objects, A_j is the area of the j-th object, Num is the total number of objects, and SEI_j is the SEI value of the j-th object.
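The area weighting described for Equation (5) can be sketched as follows, assuming the modified ASEI is the area-weighted mean of the per-object SEI values, that is, the sum of A_j × SEI_j over all Num objects divided by the total area A. The function name and the sample numbers are hypothetical.

```python
import numpy as np


def modified_asei(sei_values, areas):
    """Area-weighted average of per-object SEI values (assumed form of Equation (5))."""
    sei = np.asarray(sei_values, dtype=float)
    area = np.asarray(areas, dtype=float)
    if sei.shape != area.shape:
        raise ValueError("sei_values and areas must have the same length")
    total_area = area.sum()  # A: total area of all objects
    return float(np.sum(area * sei) / total_area)


# Three hypothetical objects: SEI_j values and areas A_j in pixels.
print(modified_asei([2.0, 1.0, 4.0], [100, 300, 100]))
```

Computing this value once per candidate segmentation scale gives a single number per scale that can be compared across the coarse-to-fine sequence.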

Selection of Training Samples
Neighbourhood correlation image analysis (NCIA) is a pixel-based CD method that relies on spectral contextual information created using correlation analyses between bi-temporal images within a specified neighbourhood [46,47]. Further details of the NCIA method are described in Im et al. (2005). A rectangular neighbourhood (e.g., a moving window of 5 × 5 pixels) is used in this paper. After obtaining the difference image, the Otsu thresholding algorithm [48] is applied to classify the pixels into two types, changed and unchanged, and the corresponding binary change images are generated. Sample selection is conducted based on the results of multi-scale segmentation and pixel-level pre-classification. For the ith object R_i, the uncertainty index T is calculated from the pixel-level pre-classification results, as shown in Equation (6), where n_c, n_u, and n are the numbers of changed, unchanged, and total pixels in object R_i, respectively. After setting the threshold T_m, the property of the object is determined by comparing T with T_m, where l_i = 1, 2, 3 indicates that object R_i is unchanged, uncertain, or changed, respectively, and the threshold range is T_m ∈ (0.5, 1). Changed and unchanged objects are selected as training samples for RoF, and uncertain objects are further classified.
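The sample selection rule can be sketched as below. Since the displayed equations are not reproduced above, the sketch assumes T = n_c / n when n_c ≥ n_u and T = −n_u / n otherwise, with objects labelled unchanged (1) when T < −T_m, changed (3) when T > T_m, and uncertain (2) in between. This form and the default T_m = 0.8 are assumptions chosen to be consistent with the stated range T_m ∈ (0.5, 1), and all function names are hypothetical.

```python
import numpy as np


def uncertainty_index(n_changed, n_unchanged):
    """Signed share of the dominant class among the pixels of one object (assumed form)."""
    n = n_changed + n_unchanged
    if n == 0:
        raise ValueError("object contains no pixels")
    if n_changed >= n_unchanged:
        return n_changed / n
    return -n_unchanged / n


def label_object(t, t_m=0.8):
    """Map T to 1 (unchanged), 2 (uncertain) or 3 (changed)."""
    if not 0.5 < t_m < 1.0:
        raise ValueError("t_m must lie in (0.5, 1)")
    if t < -t_m:
        return 1
    if t > t_m:
        return 3
    return 2


def label_objects(object_ids, change_mask, t_m=0.8):
    """Label every segment from a pixel-level binary change mask.

    object_ids: 2D integer array of segment ids.
    change_mask: 2D boolean array, True where the pixel was pre-classified as changed.
    """
    labels = {}
    for obj in np.unique(object_ids):
        inside = object_ids == obj
        n_c = int(np.count_nonzero(change_mask[inside]))
        n_u = int(np.count_nonzero(inside)) - n_c
        labels[int(obj)] = label_object(uncertainty_index(n_c, n_u), t_m)
    return labels


# Three hypothetical objects of 10 pixels each (one object per row).
object_ids = np.repeat([1, 2, 3], 10).reshape(3, 10)
change_mask = np.zeros((3, 10), dtype=bool)
change_mask[0, :9] = True  # 9 of 10 pixels changed
change_mask[1, :5] = True  # 5 of 10 pixels changed
change_mask[2, :1] = True  # 1 of 10 pixels changed
print(label_objects(object_ids, change_mask))
```

Objects labelled 1 or 3 would serve as training samples, while objects labelled 2 would be passed to the trained classifier.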

Multi-Feature Information Extraction
After sample selection, the key steps are feature extraction and analysis. Due to the characteristics of high-resolution images, comprehensive consideration of ground objects is necessary, which is achieved by combining multiple features. It is necessary to preliminarily specify the features to be extracted according to the requirements of CD [49]. Prior to feature analyses, we filter or combine the extracted features. Finally, based on these features, change information is extracted. After fine multi-scale segmentation, eCognition can determine the features of objects by evaluating the image objects as well as their embedding in the image object hierarchy. In this paper, spectral features and texture features of objects at the same position are extracted from the T1 and T2 phase images, respectively.
Spectral features include the mean value, standard deviation, ratio (i.e., the amount that a given image layer contributes to the total brightness), maximum value (i.e., the value of the pixel with the maximum layer intensity value in the image object), and minimum value (i.e., the value of the pixel with the minimum layer intensity value in the image object). Haralick et al. [50] used the Grey-level Co-occurrence Matrix (GLCM) method to define 14 types of texture features. In this paper, the primary texture features are the mean value, standard deviation, contrast, entropy, homogeneity, correlation, angular second moment, and dissimilarity. The spectral and texture features are stacked, as shown in Figure 4. Table 1 lists the features extracted in this study. Each type of feature reflects object information from a different angle, and the two types often complement each other. After extracting the two types of features listed above, all features are normalised to the range of 0-1 [49] and combined as input data for RoF model training.
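The stacking and normalisation step can be sketched as follows. The text states only that features are normalised to the range 0-1, so per-feature min-max scaling across objects is assumed here, and the function names and sample values are hypothetical.

```python
import numpy as np


def min_max_normalise(features):
    """Scale each feature column to [0, 1]; constant columns are mapped to 0."""
    x = np.asarray(features, dtype=float)
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (x - lo) / span


def stack_bitemporal(features_t1, features_t2):
    """Concatenate per-object T1 and T2 features column-wise, then normalise."""
    return min_max_normalise(np.hstack([features_t1, features_t2]))


# Three hypothetical objects with two features per date.
t1 = np.array([[10.0, 0.2], [20.0, 0.4], [30.0, 0.6]])
t2 = np.array([[5.0, 1.0], [5.0, 2.0], [5.0, 3.0]])
x = stack_bitemporal(t1, t2)
print(x.shape)
```

Each row of the result is the feature vector of one object, with the T1 columns followed by the T2 columns.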

OBCD Based on RoF and Coarse-to-Fine Uncertainty Analyses
For multi-scale image sequences, the coarse-scale object size is large, which is suitable for CD of objects with large areas. The fine-scale object size is small, and it is advantageous for CD of small objects. Therefore, detecting large objects at a coarse scale, detecting small objects at a fine scale, and then synthesising the object detection results at different scales is useful for improving the accuracy and reliability of the CD algorithm. Inspired by the concept of "from coarse to fine, refine layer by layer" [25], this paper proposes a coarse-to-fine uncertainty analysis method. The overall workflow is shown in Figure 5, which includes the following general steps:
Figure 5. OBCD process incorporating rotation forest and coarse-to-fine uncertainty analyses.

Step 1: Uncertain object classification by RoF
The RoF classifier method can successfully generate classifier ensembles based on feature extraction. This paper utilises RoF to randomly divide the original feature dataset into several subsets, and carries out feature transformation (e.g., principal component analysis, PCA) for each subset [23,36–38,51]. The correlations between transformed subsets are minimized. Because all principal components are retained, this step lays the foundation for improving the accuracy of classification without changing the information in the original dataset. The classification and regression tree (CART) [23] is employed as the base classifier because it is sensitive to rotation of the feature axes and can produce a strongly differentiating classifier. The decision tree is also intuitive and easy to understand. Each base classifier is trained on a different set of subsets, which significantly increases the diversity among the base classifiers and helps to improve the accuracy of prediction.
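The rotation step for a single ensemble member can be sketched as follows. This is a simplified illustration rather than the implementation used in the experiments: features are split into disjoint random subsets, PCA is run on a bootstrap sample of each subset with all components kept, and the loadings are assembled into one rotation matrix. The function name, the 75% sample fraction, and the random test data are assumptions, and training the CART base classifier on the rotated features is omitted.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_fraction=0.75):
    """Build one rotation matrix for a feature matrix X (objects x features)."""
    n_samples, n_features = X.shape
    if n_subsets > n_features:
        raise ValueError("n_subsets cannot exceed the number of features")
    # Randomly split the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # Draw a bootstrap sample of objects so each subset sees different data.
        size = max(2, int(sample_fraction * n_samples))
        rows = rng.choice(n_samples, size=size, replace=True)
        block = X[np.ix_(rows, idx)]
        centred = block - block.mean(axis=0)
        # PCA via SVD; all components are kept, so no information is discarded.
        _, _, vt = np.linalg.svd(centred, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))  # 40 hypothetical objects with 6 features
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rotated = X @ R  # input for one base tree of the ensemble
print(X_rotated.shape)
```

Because every block of loadings is orthogonal, the assembled matrix only rotates the feature axes. Repeating the procedure with a different random split for each tree is what gives the ensemble its diversity.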
Step 2: Coarse-to-fine image fusion
The proposed approach makes full use of the advantages of multi-scale image analyses and expression, and simultaneously uses multi-scale image layer object relationships to fuse coarse-scale CD results with fine-scale results. Finally, the MV rule [23] is used to obtain the final CD results.
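The final fusion can be sketched as a per-pixel majority vote over the binary CD maps produced at the different scales. The rule below marks a pixel as changed only when more than half of the scales agree; resolving ties as unchanged is an assumption, and the function name and the toy maps are hypothetical.

```python
import numpy as np


def majority_vote(change_maps):
    """Fuse equally shaped binary change maps (1 = changed), one per scale."""
    stack = np.stack([np.asarray(m, dtype=int) for m in change_maps])
    votes = stack.sum(axis=0)  # number of scales voting "changed" per pixel
    # Changed only if strictly more than half of the scales agree.
    return (2 * votes > stack.shape[0]).astype(np.uint8)


# Hypothetical 2 x 2 results at coarse, medium and fine scales.
coarse = [[1, 0], [1, 1]]
medium = [[1, 0], [0, 1]]
fine = [[0, 0], [1, 1]]
fused = majority_vote([coarse, medium, fine])
print(fused)
```

Using an odd number of scales avoids ties altogether, so the tie rule only matters when an even number of maps is fused.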

Dataset Description
To verify the feasibility and effectiveness of the proposed method, we apply it to two pairs of real high-resolution remote sensing datasets. The first experimental dataset (DS1) contains GF2 multi-spectral images (Figure 6a,b) captured in 2015 and 2016 covering the city of Liuzhou, China. GF2 is a Chinese satellite launched in August 2014, which provides high-resolution imagery of the Earth. The orbit height of the GF2 satellite is 631 km, and its inclination is 97.9080°. The GF2 image comprises four spectral bands: red (R), green (G), blue (B), and near infrared, as well as one panchromatic band. The panchromatic and multi-spectral images are fused using the Pan-sharp algorithm [52], and the spatial resolution of the fused image is 0.8 m. The bi-temporal images used for experiments are orthorectified and mainly include the three bands R, G, and B. The image contains 3749 × 3008 pixels. The vector data used in this study are an HVM of Liuzhou compiled in 2015 (Figure 6c). These data were produced through the National Census Geographic of China. The HVM and images of the same area are obtained after preprocessing. The vector data are projected using the Transverse Mercator projection, and the projection's central meridian is 111°E. The vector data contain 554 objects.
Among them, artificial structure mainly includes playground, impervious surface and construction; building is categorized into medium-story apartments, commercial building, industrial building, settlement and residential area; grassland is categorized into artificial grassland and natural grassland; farmland is categorized into paddy field and dry land; garden land is categorized into orchard, tea plantation and mulberry field. The reference images shown in Figures 6d and 7d were produced using the polygon to raster (PTR) tool built into ESRI ArcGIS 10.1 software. Reference polygon features were produced through the National Census Geography of China. The black areas represent unchanged regions, while white regions have changed.

Evaluation Metrics
Evaluation of accuracy is essential to interpreting CD results and the final decision-making process. Four indices, namely FA, MA, OE, and the Kappa coefficient, are used to evaluate the accuracy of the final results. The Kappa coefficient is a statistical measure of agreement that reflects the consistency between the experimental results and the ground truth data, and is expressed as $\mathrm{Kappa} = (p_0 - p_c)/(1 - p_c)$, where $p_0$ indicates the true consistency and $p_c$ indicates the theoretical consistency.
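As an illustration, the Kappa coefficient can be computed from a predicted change map and a reference map as sketched below. The sketch assumes the standard Cohen definitions of the observed agreement (true consistency) and the chance agreement (theoretical consistency) for a binary confusion matrix; the function name, the variable names, and the flat 0/1 encoding of the maps are illustrative choices rather than part of the original method description.

```python
def kappa_from_maps(predicted, reference):
    """Cohen's kappa for two binary change maps (1 = changed, 0 = unchanged).

    Both maps are flat sequences of equal length, e.g. a raster read row by row.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("maps must be non-empty and of equal length")

    n = len(predicted)
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # changed, detected
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # unchanged, kept
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false detection
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # missed detection

    # Observed agreement (true consistency).
    p0 = (tp + tn) / n
    # Chance agreement (theoretical consistency), from the row and column totals.
    pc = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if pc == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p0 - pc) / (1.0 - pc)


if __name__ == "__main__":
    detected = [1, 1, 0, 0, 1, 0, 0, 0]
    truth = [1, 0, 0, 0, 1, 0, 1, 0]
    print(round(kappa_from_maps(detected, truth), 4))
```

A value of 1 indicates perfect agreement with the reference map, while values near 0 indicate agreement no better than chance.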

Experimental Results and Analysis
To verify the feasibility and effectiveness of the proposed approach, two unsupervised pixel-based CD methods, i.e., PCA-k-means [10] and NCIA, as well as three OB methods are used for comparison. The PCA-k-means method has two parameters, i.e., the size H of the non-overlapping blocks (H = 5 in our experiments) and the dimension S of the eigenvector space (S = 5 in our experiments). OCVA [18] is employed as the unsupervised OB method, and extreme learning machine (ELM) [54] and random forest (RF) [55] are used as the supervised classifiers. The parameters involved in these methods are set as in the original papers, and the implementation of the ELM method is available from [54]. These methods are applied to demonstrate the advantages of the proposed approach. The CART decision tree is adopted as the base classifier for RF and RoF. The number of trees for RF is set to the default value of 100, while the number of decision trees for RoF is set to L = 50. For RoF, Xia et al. [23,38,51] suggest that a small number of features per subset increases the classification performance; we therefore set M = 10. The threshold of the uncertainty index is set to $T_m$ = 0.55 for both experiments after comparing several candidate values.
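To make the role of M concrete, the following sketch shows only the feature-partitioning step that rotation forest performs for each tree: the feature indices are shuffled and divided into disjoint subsets of at most M features. The per-subset principal component transform and the CART training that follow are omitted. The function name and the feature count of 34 are hypothetical and are not taken from the experiments.

```python
import math
import random


def split_features(n_features, m, seed=None):
    """Randomly partition feature indices into disjoint subsets of at most m features."""
    if n_features <= 0 or m <= 0:
        raise ValueError("n_features and m must be positive")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    n_subsets = math.ceil(n_features / m)
    # Consecutive slices of the shuffled list form the disjoint subsets;
    # the last subset is smaller when m does not divide n_features.
    return [indices[i * m:(i + 1) * m] for i in range(n_subsets)]


if __name__ == "__main__":
    L = 50   # number of trees, as set for RoF in the text
    M = 10   # features per subset, as set in the text
    N_FEATURES = 34  # hypothetical number of spectral and texture features
    # One independent partition per tree (the seed is only for reproducibility).
    partitions = [split_features(N_FEATURES, M, seed=t) for t in range(L)]
    print(len(partitions), [len(s) for s in partitions[0]])
```

Because each tree draws its own partition, the trees are trained on differently rotated feature spaces, which is the source of diversity in the ensemble.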

Test of Scale Parameters
Multi-scale image segmentation is a fundamental step in OBCD. When the ESP tool is applied to DS1, the scale ranges from 55 to 245 in steps of 5. The resulting LV and ROC-LV values vary with scale. As illustrated in Figure 8a, as the segmentation scale increases, LV values show a growing trend, while ROC-LV values show the opposite trend. The change in LV from one level to another indicates how important that scale level is for structuring information about object variability relative to the whole image. Theoretically, the peaks in an ROC-LV curve indicate the levels at which LV increases because the segments match their counterparts in the real world [42]. In Figure 8a, the peaks in the ROC-LV curve at scale levels of 105, 115, 125, 150, 175, 210 and potentially 240 indicate the meaningful scale parameters for segmentation of DS1. We selected the most obvious peaks, which dominate their neighborhood, as indicators of optimal scale parameters. Such scales are generally considered candidate optimal scales. Based on the segmentation results, industrial buildings, settlements and residential areas are better segmented at a scale of 105 than at other scales. Features such as playgrounds, impervious surfaces, construction, and bare land are better segmented at a scale of 175, while woodlands and road areas are better segmented at 210. The object levels delineated with these scale parameters match the real-world structures in DS1, so these three scales are considered suitable for rough image segmentation. The three scales, with a tolerance of ±5, are then used to set up the corresponding ranges of 100-110, 170-180, and 205-215. This paper uses the modified ASEI to further determine the specific optimal scale within each range (Figure 9). The maxima shown in Figure 9 occur at scales of 102, 179, and 213 in panels a-c, respectively, which can be considered the optimal segmentation scales of DS1.
The mean object value maps at these three optimal scales are shown in Figure 10.
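The peak selection described for the ROC-LV curve can be sketched as follows. The sketch assumes that ROC-LV is the percentage change in LV between consecutive scale levels, as commonly defined for the ESP tool, and that a peak is a value strictly larger than both of its neighbours. The LV values in the usage example are invented for illustration and are not the values plotted in Figure 8.

```python
def roc_lv(lv):
    """Percentage rate of change of local variance between consecutive scale levels.

    The i-th result belongs to the (i+1)-th scale level, since the first level
    has no lower level to compare against.
    """
    return [(cur - prev) / prev * 100.0 for prev, cur in zip(lv, lv[1:])]


def local_peaks(scales, values):
    """Scales whose value is strictly greater than both neighbouring values."""
    return [
        scales[i]
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]


if __name__ == "__main__":
    scales = [100, 105, 110, 115, 120]       # scale levels in steps of 5
    lv = [10.0, 10.5, 11.55, 11.8, 12.5]     # hypothetical LV values
    roc = roc_lv(lv)
    # ROC-LV values are aligned with scales[1:].
    print(local_peaks(scales[1:], roc))
```

In practice only the peaks that clearly dominate their neighbourhood would be retained, which corresponds to the visual selection step described in the text.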
For DS2, the scale ranges from 105 to 295 in steps of 5. The ROC-LV peaks are obtained at scales of 110, 140, 155, 165, 205, 245, and 275, as shown in Figure 8b. These values represent the meaningful scale parameters and are generally considered candidate optimal scales for the segmentation of DS2. Based on the segmentation results, commercial buildings, industrial buildings, settlements, and residential areas are segmented better at a scale of 140 than at other scales. On the other hand, playgrounds, impervious surfaces, construction areas, and bare land are better segmented at a scale of 205, while grasslands and roads are better segmented at 245. These three scales, with a tolerance of ±5, are used to set the three corresponding ranges of 135-145, 200-210, and 240-250. The modified ASEI is then used to determine the specific optimal scale within each range (Figure 11). The maxima in Figure 11 occur at scales of 136, 206, and 244 in panels a-c, respectively, which can be considered the optimal segmentation scales of DS2. The mean object value maps at these three optimal scales are shown in Figure 12.
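The refinement from a rough scale to a specific optimal scale can be sketched as a search over the ±5 range. The sketch does not compute the modified ASEI itself; it assumes that a score has already been evaluated for every scale in the range, and the scores in the usage example are placeholders chosen only so that the maximum falls at 102.

```python
def candidate_range(scale, tolerance=5):
    """All integer scales within +/- tolerance of a rough optimal scale."""
    return list(range(scale - tolerance, scale + tolerance + 1))


def best_scale(scores):
    """Return the scale with the highest score.

    `scores` maps each candidate scale to a segmentation quality score,
    for example a modified ASEI value computed elsewhere.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rough_scale = 105
    candidates = candidate_range(rough_scale)   # 100, 101, ..., 110
    # Placeholder scores standing in for the modified ASEI.
    scores = {s: -abs(s - 102) for s in candidates}
    print(candidates[0], candidates[-1], best_scale(scores))
```

The same two steps would be repeated for each of the three rough scales of a dataset to obtain its three optimal segmentation scales.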

Results for DS1
Our results are presented in two ways: the CD results are displayed in figures, and the evaluation criteria are listed in tables. Figures 13 and 14 show the experimental results based on DS1. Figure 13 shows CD results at the three optimal scales using RoF. False detection (the red regions) is most serious at scale 102, whereas missed detection (the yellow regions) is less severe at this scale than at the other two scales tested (179 and 213). As the scale increases, false detection is further reduced at scale 213, but missed detection is slightly more frequent than at the other two scales. This paper uses the MV approach to fuse the CD results at the three optimal scales, which compensates for the false or missed detections that occur at a single scale and improves the accuracy of the final result. Figure 14 shows the MV results using ELM, RF and RoF. Changed regions are concentrated mainly within strongly changed regions (change occurs at three scales) and obviously changed regions (change occurs at any two scales), while subtly changed regions (change occurs only at one scale) are mostly false changes.

The proposed fusion strategy converts the traditional "hard detection" method into "soft detection". The detection results at the various scales are combined to attain a stepwise reclassification of the change intensity, which offers a more meaningful and flexible reference in practice than simply separating all pixels into the two classes of changed and unchanged. Strong changes, which occur at all three scales, can be defined as the areas with the maximum probability of change, while obvious changes, which occur at two scales, are viewed as having the second greatest probability. Finally, the strongly and obviously changed regions are considered changed areas, whereas the subtly changed and unchanged regions are treated as unchanged areas. Table 2 lists the values of the evaluation metrics.
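The fusion rule described above can be sketched as a vote count over the three single-scale results. The sketch assumes that each result is a flat binary sequence (1 = changed, 0 = unchanged) with one entry per pixel; the function and label names are illustrative.

```python
INTENSITY_LABELS = {0: "unchanged", 1: "subtle", 2: "obvious", 3: "strong"}


def fuse_scales(maps):
    """Fuse three binary change maps by majority voting.

    Returns (intensity, final): the change intensity label per pixel and the
    final binary decision, where a pixel is changed if at least two of the
    three scales mark it as changed.
    """
    if len(maps) != 3:
        raise ValueError("exactly three single-scale maps are expected")
    if len({len(m) for m in maps}) != 1:
        raise ValueError("all maps must have the same length")

    votes = [sum(pixel) for pixel in zip(*maps)]          # 0..3 votes per pixel
    intensity = [INTENSITY_LABELS[v] for v in votes]
    final = [1 if v >= 2 else 0 for v in votes]           # strong or obvious
    return intensity, final


if __name__ == "__main__":
    fine = [1, 1, 1, 0]     # hypothetical result at the finest scale
    medium = [1, 1, 0, 0]   # hypothetical result at the medium scale
    coarse = [1, 0, 0, 0]   # hypothetical result at the coarsest scale
    intensity, final = fuse_scales([fine, medium, coarse])
    print(intensity)
    print(final)
```

Keeping the intensity labels alongside the binary decision preserves the "soft detection" information, so that subtle changes can still be inspected even though they are counted as unchanged in the final map.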
As shown in Figure 15, the results of the two pixel-based methods contain many white spots caused by noise. The CD accuracy of pixel-based methods is comparatively low, mainly because only the spectral features of the bi-temporal images are used in these methods. Furthermore, due to the high resolution of the original images, pixel-based CD methods based on spectral statistics are not able to meet the requirements for change information extraction, and the results obtained are poor. The OCVA method achieves better object-level CD results than the two pixel-based methods. Because the unsupervised CD method is strongly affected by the threshold, OCVA results are worse than those of the supervised RoF at the three optimal scales. The dominant change combinations include those between settlements and impervious surfaces, and between bare land and buildings. The main driver of these changes is urban growth and construction in Liuzhou in recent years. Due to the limitations of OCVA, some changed objects are falsely detected as unchanged, while some unchanged objects are falsely identified as changed. In the proposed approach, the scale constraints are fully considered through the coarse-to-fine uncertainty analyses, and the amount of uncertain data is reduced at each scale. CD results at the three optimal scales are combined to generate the final results using the MV approach. Among the three supervised classifiers, RoF-MV achieves a higher kappa coefficient than RF-MV and ELM-MV. Furthermore, although image segmentation may impair results near borders, the proposed approach not only improves CD performance but also enhances the image boundary compared to the results at a single scale. In short, the proposed approach obtains the best results and the greatest accuracy among the methods tested.

Results for DS2
As shown in Figure 16, among the three optimal scales, false detection (the red regions) is more serious at scale 136 than at the other two scales (206 and 244); on the other hand, missed detection (the yellow regions) is less severe at that scale. As the scale increases, false detection is further reduced at scale 244, but missed detection is slightly higher than at the other two scales. Therefore, these three scales of DS2 can compensate for each other in the CD process. Figure 17 shows the results of ELM, RF, and RoF using the MV approach. Strong and obvious changes can be set as the areas with a maximum probability of change, while subtle changes are viewed as having secondary importance.
Four indicators, namely, FA, MA, OE, and the kappa coefficient, are adopted for quantitative comparisons (Table 3). As shown in Figure 18, the OBCD methods (OCVA, ELM, RF, and RoF) perform better than the PBCD methods (NCIA and PCA-k-means). The NCIA method utilizes contextual information to make a final decision, and thus performs better than the PCA-k-means method. The dominant land cover change combinations are those between roads and impervious surfaces, bare land and farmland, settlements and impervious surfaces, and bare land and buildings. The unsupervised OCVA method achieves better CD results than the pixel-based methods because it uses segmented objects as the basic unit of analysis. With this method, changed objects are more regular. The MV scheme obtains better CD results than a single scale, and leads to much more homogeneous regions on the CD maps. Furthermore, the proposed RoF-MV method is superior to all other methods in terms of the kappa coefficient for DS2. Setting computation time aside, RoF can be expected to surpass RF to some extent. The time requirement of ELM is lower than that of the RF and RoF classifiers; however, the final kappa coefficient of ELM is smaller than those of these two ensemble learning methods. The proposed approach clearly outperforms the other comparison methods in both qualitative and quantitative analyses.

Discussion
Scale is an important factor in the OBCD process. The accuracy of OBCD results depends greatly on the quality of the MTIS and information extraction methods. Contextual constraints link parent and child objects at various scales. In this paper, highly homogeneous objects with consistent spatial positions are obtained using the MRS algorithm under the boundary constraints of the HVM. The proposed approach, which is capable of delineating and analysing image ground objects at different scales based on coarse-to-fine uncertainty analyses, was successfully implemented for high-resolution remotely sensed images. The approach adopts RoF to classify uncertain regions of the generated coarse-to-fine segmentation maps. Multiple CD results are combined to generate the final result using the MV approach. Experimental results demonstrated the effectiveness of multi-feature and coarse-to-fine image fusion in improving CD results. The proposed approach achieved acceptable levels of efficiency and accuracy. Furthermore, comparative analyses showed that its accuracy is better than that of the other CD methods tested. The results support the proposed approach as an efficient method to improve monitoring of urban areas using remotely sensed data.
Our approach is based on a combination of pixel-based and OB analyses. The hybrid algorithms that combine pixel- and object-based schemes successfully reduced noisy changes, as well as the small and spurious changes introduced by the inconsistent delineation of objects. However, the quality of the training samples is affected by the initial pixel-level pre-classification results. The threshold of the uncertainty index, $T_m$, is an important parameter that can affect the final CD results, and the performance varies with the threshold value. In this paper, $T_m$ is varied within the interval (0.5, 1) at a step size of 0.1 for both experimental datasets. As shown in Figure 19, among the three optimal scales for DS1, the false and overall alarm rates at scale 213 are lower than those at scales 102 and 179 as $T_m$ increases, whereas the missed alarm rate shows the opposite trend. Similarly, when the threshold is within the interval (0.5, 0.85), the false and overall alarm rates at scale 102 are higher than those at scales 179 and 213, whereas the missed alarm rate shows the opposite pattern. When $T_m$ = 0.55, the missed alarm rates at the three optimal scales are lowest, which means that these three scales can compensate for each other in the CD process. Therefore, the false detection or missed detection phenomena at a single scale can be reduced using multi-scale fusion. Figure 20 shows the influence of this index on DS2. The false alarm rate and overall alarm rate at scale 136 are higher than those at scales 206 and 244 as $T_m$ increases, whereas the missed alarm rate has the opposite trend. When the threshold is in the interval (0.55, 0.85), the false alarm rate and overall alarm rate at scale 244 are smaller than those at scales 136 and 206. When $T_m$ = 0.55, the missed alarm rates are lowest at all three scales. Therefore, the optimal threshold value is $T_m$ = 0.55 for DS2.
With the multi-scale propagation method, incorporating the spatial information obtained by MRS at different scales into the RoF classifier significantly improves the classifier's performance, which indicates the importance of spatial information. The strong performance of the proposed approach in our two experiments is mainly attributable to the proposed CD scheme successfully taking advantage of spectral, texture, and spatial information at different scale levels.
In CD applications, the pixel-based pre-classification and object-based image analysis approaches can be combined using RoF and the HVM to obtain final object-level CD results. The changed objects are more regular, and the object geometries correspond to actual geographical features. Therefore, the combination not only exhibits the advantages of both pixel-based and object-based approaches, but also achieves the greatest accuracy among the methods tested.
Remote Sens. 2018, 10, x FOR PEER REVIEW 18 of 23 the generated coarse-to-fine segmentation maps. Multiple CD results are combined to generate the final result using the MV approach. Experimental results demonstrated the effectiveness of multi-feature and coarse-to-fine image fusion in improving CD results. The proposed approach led to acceptable levels of efficiency and accuracy. Furthermore, comparative analyses showed that the accuracy is better than other CD methods. The results support the proposed approach as a highly efficient method to improve monitoring of urban areas using remotely sensed data. Our approach is based on a combination of pixel-based and OB analyses. The hybrid algorithms that combined pixel-and object-based schemes successfully reduced noisy changes, as well as the small and spurious changes introduced by the inconsistent delineation of objects. However, quality of the training samples is affected by the initial pixel-level pre-classification results. The threshold of uncertainty index Tm is an important parameter that can affect the final CD results. The performance varies with different threshold values. In this paper, the two experimental datasets dynamically adjust Tm within the interval (0.5, 1) at a step size of 0.1. As shown in Figure  19, among the three optimal scales for DS1, the false and overall alarm rates at scale 213 are lower than those of scale 102 and 179 as the threshold Tm value increases, whereas the missed alarm rate shows the opposite trend. Similarly, when the threshold is within the interval (0.5, 0.85), the false and overall alarm rates at scale 102 are higher than those at scales of 179 and 213, whereas the missed alarm rate shows the opposite pattern. When threshold Tm = 0.55, the missed alarm rates at the three optimal scales are lowest, which means that these three scales can compensate for each other in the CD process. 
Therefore, false detections and missed detections at a single scale can be reduced using multi-scale fusion. Figure 20 shows the influence of this index on DS2. The false alarm rate and overall alarm rate at scale 136 are higher than those at scales 206 and 244 as Tm increases, whereas the missed alarm rate shows the opposite trend. When the threshold is in the interval (0.55, 0.85), the false alarm rate and overall alarm rate at scale 244 are lower than those at scales 136 and 206. When Tm = 0.55, the missed alarm rates are lowest at all three scales. Therefore, the optimal threshold value for DS2 is Tm = 0.55. Using the multi-scale propagation method, incorporating the spatial information obtained by MRS at different scales into the RoF classifier can significantly improve the classifier's performance, which indicates the importance of spatial information. The excellent performance of the proposed approach in our two experiments is mainly attributable to the proposed CD scheme successfully taking advantage of spectral, texture, and spatial information at different scale levels.
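The compensating effect of multi-scale fusion described above can be illustrated with a minimal sketch. It assumes binary change maps (1 = changed) and uses definitions of the false, missed, and overall alarm rates that are common in the CD literature rather than quoted from this paper; the function names `alarm_rates` and `majority_vote` and the toy maps are hypothetical.

```python
import numpy as np

def alarm_rates(pred, ref):
    """Return (false, missed, overall) alarm rates for binary change maps.

    Assumed definitions (not taken from the paper):
      false   = FP / number of truly unchanged pixels
      missed  = FN / number of truly changed pixels
      overall = (FP + FN) / total number of pixels
    """
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    fp = np.count_nonzero(pred & ~ref)   # unchanged pixels flagged as changed
    fn = np.count_nonzero(~pred & ref)   # changed pixels that were missed
    n_changed = np.count_nonzero(ref)
    n_unchanged = ref.size - n_changed
    false_rate = fp / n_unchanged if n_unchanged else 0.0
    missed_rate = fn / n_changed if n_changed else 0.0
    return false_rate, missed_rate, (fp + fn) / ref.size

def majority_vote(maps):
    """Fuse an odd number of binary change maps pixel by pixel."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in maps])
    # A pixel is changed when more than half of the maps say so.
    return stack.sum(axis=0) * 2 > stack.shape[0]

# Toy data: each single-scale map makes one error at a different pixel.
reference = [1, 1, 1, 0, 0, 0, 0, 0]
scale_maps = [
    [1, 1, 0, 0, 0, 0, 0, 0],  # coarse scale: one missed detection
    [1, 1, 1, 1, 0, 0, 0, 0],  # medium scale: one false detection
    [1, 1, 1, 0, 0, 0, 0, 1],  # fine scale: one false detection
]
fused = majority_vote(scale_maps)
for name, m in zip(["coarse", "medium", "fine", "fused"], scale_maps + [fused]):
    print(name, alarm_rates(m, reference))
```

In this toy case the errors of the three scales do not coincide, so the vote removes all of them; when several scales share the same error, the fused map inherits it.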
In CD applications, the pixel-based pre-classification and object-based image analysis approaches can be combined, using RoF and the HVM, to obtain final object-level CD results. The changed objects are more regular in shape, and the object geometries correspond to actual geographical features. Therefore, the combination not only retains the advantages of both pixel-based and object-based approaches but also achieved the highest accuracy among the methods compared in our experiments.
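One way to picture how pixel-level pre-classification feeds the object-level analysis is the sketch below. It is a simplification under stated assumptions, not the paper's exact uncertainty index: each object's change ratio is taken as the fraction of its pixels pre-classified as changed, objects with a ratio of at least Tm become changed training samples, objects with a ratio of at most 1 - Tm become unchanged training samples, and the rest are left as uncertain for the classifier. The function name `label_objects` and the toy arrays are hypothetical.

```python
import numpy as np

CHANGED, UNCHANGED, UNCERTAIN = 1, 0, -1

def label_objects(segments, pixel_changed, t_m=0.55):
    """Assign a label to every object from pixel-level pre-classification.

    segments      : integer object id for each pixel (any shape)
    pixel_changed : 1 where the pixel was pre-classified as changed, else 0
    t_m           : threshold in (0.5, 1); the rule below is an assumption
    """
    segments = np.asarray(segments).ravel()
    pixel_changed = np.asarray(pixel_changed, dtype=float).ravel()
    ids, inverse = np.unique(segments, return_inverse=True)
    n_pixels = np.bincount(inverse)                          # pixels per object
    n_changed = np.bincount(inverse, weights=pixel_changed)  # changed pixels per object
    ratio = n_changed / n_pixels
    labels = np.full(ids.shape, UNCERTAIN)
    labels[ratio >= t_m] = CHANGED
    labels[ratio <= 1.0 - t_m] = UNCHANGED
    return dict(zip(ids.tolist(), labels.tolist()))

# Toy 3 x 4 image: one row per object.
segments = [[1, 1, 1, 1],
            [2, 2, 2, 2],
            [3, 3, 3, 3]]
pixel_changed = [[1, 1, 1, 0],   # object 1: ratio 0.75
                 [0, 0, 0, 0],   # object 2: ratio 0.00
                 [1, 1, 0, 0]]   # object 3: ratio 0.50
print(label_objects(segments, pixel_changed, t_m=0.55))
print(label_objects(segments, pixel_changed, t_m=0.8))
```

Under this rule, raising Tm moves more objects into the uncertain set, which leaves fewer but more reliable automatically selected training samples.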

Conclusions and Perspective
Optimal scale selection is a key issue in the multi-scale segmentation of high-resolution remote sensing images, because the choice of segmentation scale directly affects the subsequent extraction and analysis of change information. To address this problem, this paper proposes a novel CD approach for high-resolution remote sensing images based on RoF and coarse-to-fine uncertainty analyses. The approach takes advantage of the spatial and contextual information in an existing HVM to automatically detect land-use changes. A coarse-to-fine CD strategy is adopted to effectively select local and global changed areas within the HVM.
During the pixel-level pre-classification stage, the NCIA algorithm is used to consider neighborhood information and select pixels with high probabilities of being either changed or unchanged. In the object-level classification stage, a series of optimal segmentation scales ranging from coarse to fine is chosen under the boundary constraints of the HVM. These two methods can effectively avoid subjectivity in scale selection. To obtain more evenly distributed training samples, the proposed sample selection strategy introduces an uncertainty index for each object. The uncertain objects are further classified at a later stage. Multiple classifiers, each of which combines RoF and a single-scale segmentation map, are used to obtain single-scale classification results. Then, the multiple results are integrated using the MV approach to generate the final results. Thus, spatial information from different segmentation scales is incorporated into the CD process for high-resolution remote sensing images, rather than relying upon a single segmentation scale. Two pairs of real high-resolution remote sensing datasets are used to verify the effectiveness and superior performance of the proposed method. In future research, we will extend this method from a supervised approach to an unsupervised or semi-supervised approach. In addition, we will focus on automated segmentation and parameter selection processes to further improve CD performance.

and C.X. reviewed the manuscript and provided theoretical and technical guidance; H.S. acted as the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
