Spatial–Spectral Feature Fusion Coupled with Multi-Scale Segmentation Voting Decision for Detecting Land Cover Change with VHR Remote Sensing Images

Abstract: In this article, a novel approach for land cover change detection (LCCD) with very high resolution (VHR) remote sensing images, based on spatial-spectral feature fusion and a multi-scale segmentation voting decision, is proposed. Unlike traditional methods that use a single feature without post-processing of the raw detection map, the proposed approach employs spatial-spectral features and a post-processing strategy to improve detection accuracy and performance. The proposed approach involves two stages. First, we explored the spatial features of the VHR remote sensing images to complement the insufficiency of the spectral feature, and fused the spatial-spectral features with different strategies; the Manhattan distance between the corresponding spatial-spectral feature vectors of the bi-temporal images was then employed to measure the change magnitude between the bi-temporal images and generate a change magnitude image (CMI). Second, the Otsu binary threshold algorithm was used to divide each CMI into a binary change detection map (BCDM), and a multi-scale segmentation voting decision algorithm was proposed to fuse the initial BCDMs into the final change detection map. Experiments were carried out on three pairs of bi-temporal VHR remote sensing images. The results were compared with those of state-of-the-art methods, including four popular contextual-based LCCD methods and three post-processing LCCD methods. The experimental comparisons demonstrate that the proposed approach has an advantage over the other state-of-the-art techniques in terms of detection accuracy and performance.


Introduction
Land cover change detection (LCCD) with bi-temporal remote sensing images is a popular technique in remote sensing applications [1][2][3]. This technique concentrates on finding and capturing land cover changes using two or more remote sensing images that cover the same geographic area but were acquired on different dates [1,[4][5][6]. LCCD plays an important role in large-scale land use analysis [7][8][9], environmental monitoring and evaluation [10,11], natural hazard assessment [12][13][14], and natural resource inventory [15]. However, issues such as "salt-and-pepper" noise in the detection results, especially for VHR remote sensing images [16][17][18], pose a challenge in the practical application of LCCD.
LCCD with bi-temporal remote sensing images can be viewed as a pattern recognition problem in image processing where two groups of pixels are labelled, one class for the changed pixels and the other for the unchanged pixels [19]. Existing methods for LCCD can be classified into two types: binary change detection and "from-to" change detection. A binary change detection method acquires land cover change information by measuring the change magnitude through a comparison of bi-temporal images such as image rotation [20], image difference [21], and change vector analysis methods [22][23][24]. In these methods, a binary threshold is adopted to separate the pixels of the change magnitude image (CMI) into "changed" and "unchanged". Some advantages of binary change detection are that it is straightforward and operational; however, the limitation of this method is that it can only provide the size and distribution of the change target without providing more details on the change information [25,26]. In contrast, the "from-to" change method can directly recognize the kinds of changes "from one to another". However, most "from-to" change detection methods depend on the performance of the corresponding land cover classification [22,[27][28][29].
In recent decades, a considerable number of studies have focused on LCCD with VHR remote sensing images [30][31][32]. A VHR remote sensing image can depict ground targets in more detail than medium-low resolution remote sensing images. However, VHR images usually provide limited spectral information, and their higher spatial resolution typically implies a larger intra-class variance [33][34][35]. Although satellite sensors such as WorldView-3 have in recent years collected VHR images with eight spectral bands (red, red edge, coastal, blue, green, yellow, near-IR1, and near-IR2), such images still exhibit a large intra-class variance [36,37]. Furthermore, when applying LCCD to VHR bi-temporal images, the two images acquired on different dates are usually inconsistent in terms of atmospheric conditions, sun height, or seasonal phenology; these differences introduce "pseudo-changes" into the detection map [38,39]. To address this problem, the contextual spatial feature is usually adopted to smoothen the noise and improve detection accuracies. For example, Celik et al. proposed a method based on principal component analysis and k-means clustering (PCA_Kmeans), which divides the CMI into h × h overlapping blocks [40]. The fuzzy clustering method was integrated into change vector analysis for LCCD (CVA_FCM) [41]. The semi-supervised fuzzy c-means clustering algorithm (Semi_FCM) was developed to address the problem of separating the difference image into changed and unchanged pixels [42]. Zhang et al. presented a novel method for unsupervised change detection (CD) from remote sensing images using level set evolution with local uncertainty constraints (LSELUC) [43]. The level set method was also developed for acquiring landslide inventory maps with VHR remote sensing images [13]. The Markov random field is another effective way of employing contextual information to improve the performance of LCCD with VHR remote sensing images [44][45][46].
Although these methods can reduce the noise in the detection map, they are sensitive to the contextual spatial scale, and the process of determining this scale depends on the mathematical model used and the experience of the practitioner.
Apart from the aforementioned spatial context-based LCCD methods, which are referred to as pre-processing LCCD techniques in this study, a number of studies have also reported that the post-processing procedure can further improve the performance and accuracy of LCCD [22,47]. The post-processing LCCD method focuses on processing the initial detection map and enhancing the performance of LCCD. For example, post-processing with majority voting (MV) has played an important role in improving the raw classification of remote sensing images [48,49]. A general post-processing classification framework (GPCF) was proposed to smoothen the noise of the initial classification map in [50]. Inspired by the post-processing work in image classification, in our previous study [51], an object-based expectation maximization (OBEM) post-processing approach was developed to refine raw LCCD results, which confirmed that using post-processing could effectively improve the performance of LCCD.
In reviewing LCCD techniques with remote sensing images over the past decades [1,2,20,21,52], most LCCD methods were found to concentrate on the extraction and utilization of a single feature to measure the change magnitude between bi-temporal images. In addition, if these methods are defined as "pre-processing LCCD techniques", then corresponding post-processing LCCD techniques are still missing. With the challenge of LCCD with VHR images becoming increasingly prominent in recent years, a considerable number of initial detection results cannot satisfy the requirements of practical application due to the large amount of "salt-and-pepper" noise in a raw change detection map. Pre- and post-processing LCCD techniques should complement each other to improve change detection performance and accuracy. This complementarity serves as the basic motivation and viewpoint of our work.
This study, inspired by the effectiveness of spatial-spectral feature fusion for image classification [34,[53][54][55] and by the post-processing LCCD method [51], developed a novel LCCD approach to improve the performance of LCCD with VHR bi-temporal remote sensing images. The contribution of the proposed framework lies in constructing a new workflow based on existing techniques, including spatial-spectral feature fusion and multi-scale segmentation majority voting. Compared with our previous works, MV [49] and OBEM [51], the improvements and differences of the proposed framework are twofold: (1) While binary change detection was previously viewed as a "pre-processing technique", MV [49] and GPCF [50] can be applied to smoothen the noise and improve the initial detection performance. However, the existing regular sliding-window technique cannot cover various ground targets with different shapes and sizes. Hence, in the proposed framework, in addition to the spatial-spectral feature fusion strategy, the initial detection map sets were fused and smoothened using multi-scale segments that conform to the size and shape of the ground targets. (2) In the previous post-processing method, OBEM [51], the best raw initial change detection result is first chosen from the set of initial detection results, and multi-scale segmentation based on the post-event image is then adopted to smoothen the noise of the selected initial map. In contrast, in the proposed framework, multi-scale segmentation is used directly to fuse the initial detection results and generate the final change detection map by a majority voting decision.
Three pairs of VHR remote sensing images depicting real land cover change events were employed to assess the effectiveness and performance of the proposed framework. Four state-of-the-art context-based LCCD methods and three post-processing LCCD methods were adopted for comparison with the proposed framework. Experiments based on bi-temporal remote sensing images covering real landslide and land use change events were conducted. The present study concluded that the proposed LCCD framework, based on the integration of spatial-spectral features and a multi-scale segmentation voting decision, was better suited to the task of change detection than the other state-of-the-art techniques.
The rest of this article is divided into four sections. Section 2 describes the proposed methodology. A description of the experimental dataset is given in Section 3. Section 4 presents the details of the experiments and discussion on the results. Conclusions are drawn in Section 5.

Methodology
In the present work, a novel framework based on spatial-spectral feature fusion and a multi-scale segmentation voting decision for change detection with VHR remote sensing images is proposed. The present work has two contributions: (1) an algorithm integrating spatial-spectral feature fusion, the Manhattan distance, and the Otsu threshold was designed to obtain the initial BCDMs; and (2) a multi-scale segmentation voting decision method was developed to fuse the initial BCDMs into the final change detection map. Unlike other state-of-the-art techniques that consider only pre- or post-processing techniques alone, the proposed framework was designed to integrate pre- and post-processing techniques into a single platform to detect land cover change. As shown in Figure 1, the proposed framework has two major stages, which are discussed in detail in the following sections.

Generation of the Initial BCDMs
The aim of the first stage is to generate the initial BCDMs based on spatial-spectral feature fusion. The motivation for fusing different spatial-spectral features lies in the fact that different spatial-spectral features highlight different ground targets in an image. One advantage of fusing different spatial-spectral features is that it can enhance the ability to detect a variety of targets. In other words, different spatial-spectral feature extraction methods have different strengths, and fusing their features together provides a potential means of combining these strengths. Figure 1a shows the proposed framework for generating the initial BCDMs. First, the spatial features of the bi-temporal images are extracted using a given spatial feature extraction method, and the spatial and spectral features are stacked with different fusion methods to improve the homogeneity of the target. Second, the Manhattan distance is employed to measure the change magnitude between the corresponding fused feature vectors, as presented in Equation (2). Finally, the Otsu method [56] is applied to the CMI to obtain the binary change detection map.
Three classical spatial feature extraction methods, namely, extended morphological profiles (EMPs) [57], morphological attribute profiles (APs) [58], and the rolling guidance filter (RGF) [59], which have been applied successfully in image classification, were introduced in the first stage to explore the spatial features of the bi-temporal VHR remote sensing images and verify our viewpoint. The spatial and spectral features are denoted as F_spa and F_spe, respectively. In this stage, feature fusion is proposed to complement the insufficiency of the spectral information of the VHR remote sensing image. Three feature fusion strategies (layer stacking [35], mean-weight [60], and adaptive weight [61]), which have been applied successfully in road extraction, land cover classification, and image segmentation with VHR remote sensing images, respectively, were adopted. Different strategies weight the spatial and spectral features differently, as shown in Equation (1). For example, the layer stacking method [35] is the most widely used multi-feature fusion approach.

It concatenates the multiple features into one vector (W1 = W2 = 1.0). The mean-weight fusion method [60] sets the weights of F^t1_spa and F^t1_spe to 0.5 (W1 = W2 = 0.5). In the adaptive weight method [61], the weight of a pixel is determined by the correlation between the center pixel and its surrounding neighbors; a closer correlation implies a heavier weight, and consequently more details can be retained [61].
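As an illustration, the layer-stacking and mean-weight strategies can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the spatial feature is approximated here by plain grey openings and closings at increasing structuring-element sizes (a rough stand-in for the EMP/AP/RGF profiles), and the names `morphological_profile` and `fuse` are ours.

```python
import numpy as np
from scipy import ndimage

def morphological_profile(band, sizes=(3, 5)):
    """Simplified spatial features: grey openings and closings at
    increasing structuring-element sizes (a crude EMP-style profile)."""
    feats = []
    for s in sizes:
        feats.append(ndimage.grey_opening(band, size=(s, s)))
        feats.append(ndimage.grey_closing(band, size=(s, s)))
    return np.stack(feats, axis=-1)          # (H, W, 2 * len(sizes))

def fuse(spectral, spatial, strategy="stack", w1=1.0, w2=1.0):
    """Fuse per-pixel spectral and spatial feature vectors.
    'stack' -> layer stacking, W1 = W2 = 1.0
    'mean'  -> mean-weight,    W1 = W2 = 0.5"""
    if strategy == "mean":
        w1 = w2 = 0.5
    return np.concatenate([w1 * spectral, w2 * spatial], axis=-1)

# toy single-band image standing in for one spectral band
band = np.random.rand(8, 8)
spectral = band[..., None]                   # (8, 8, 1)
spatial = morphological_profile(band)        # (8, 8, 4)
fused = fuse(spectral, spatial, "stack")
print(fused.shape)                           # (8, 8, 5)
```

The adaptive weight strategy would replace the scalar weights with per-pixel weights derived from local correlation, as described in [61].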
The change magnitude between the bi-temporal images is measured using the Manhattan distance, considering its demonstrated suitability for detecting land cover change [62]. The change magnitude between the corresponding pixels (P^t1_ij and P^t2_ij) is calculated using the Manhattan distance [63], as presented in Equation (2). The entire bi-temporal image is processed pixel by pixel in this manner, and a change magnitude image (CMI) is generated. The spatial-spectral features used for calculating the Manhattan distance (MD) must correspond to the same feature fusion strategy. Therefore, the three feature fusion approaches applied to one composition of spatial-spectral features produce three CMIs.
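The per-pixel Manhattan distance of Equation (2) reduces to an absolute difference summed over the fused feature channels. A minimal sketch with NumPy, assuming the fused features of the two dates are stored as (H, W, K) arrays (the function name `change_magnitude_image` is ours):

```python
import numpy as np

def change_magnitude_image(f_t1, f_t2):
    """Per-pixel Manhattan (L1) distance between the fused
    spatial-spectral feature vectors of the two dates:
    MD(i, j) = sum_k |F_t1[i, j, k] - F_t2[i, j, k]|."""
    return np.abs(f_t1 - f_t2).sum(axis=-1)

# toy fused feature cubes for the two acquisition dates, shape (H, W, K)
f_t1 = np.zeros((4, 4, 3))
f_t2 = np.ones((4, 4, 3))
cmi = change_magnitude_image(f_t1, f_t2)
print(cmi[0, 0])   # 3.0 — each of the 3 feature channels differs by 1
```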
To divide each CMI into a binary change detection map, the Otsu binary threshold method [56] was employed. Otsu assumes that the CMI contains two classes, changed and unchanged, and calculates the optimum threshold separating them so that their combined intra-class variance is minimal (or, equivalently, their inter-class variance is maximal). Otsu has been applied successfully in the prediction of the binary threshold for detecting land cover change (more details can be found in [64,65]).

Multi-scale Segmentation Voting Decision
In the second stage, the BCDMs are fused into the final change detection map through our proposed multi-scale segmentation voting decision method to further improve the performance of LCCD.
For the second stage, inspired by previous studies [14,49,50], a multi-scale segmentation voting decision method was developed as a post-processing fusion strategy. Multi-scale segmentation based on the post-event image was acquired using eCognition to ensure that the image was represented in an object manner (an "object" is a group of pixels that are homogeneous in the spectral domain and continuously connected in the spatial domain [66]). Here, the post-event image refers to the image that depicts the occurrence of the detection target, such as the landslide or built-up area. Then, the initial BCDMs and the multi-scale segmentation were overlaid, and the final change detection map was generated in an object-by-object manner. In the final change detection map, the label of each pixel within an object was assigned according to the majority voting decision rule. It is worth noting that the multi-scale segmentation utilized in the proposed approach, the fractal net evolution approach (FNEA) [67], has three parameters (scale, shape, and compactness). FNEA has been embedded in the eCognition 8.7 software as the "multi-scale segmentation" tool for processing images [68]. The shape and compactness were fixed at 0.8 and 0.9, respectively, because high compactness and homogeneity of the segmented objects were expected in our proposed approach.
Combining image objects with MV [49] for fusing and smoothing the initial BCDMs has three advantages. (1) The multi-scale segmentation is based on the post-event image, and the pixels within an object usually have high-level homogeneity and can be deemed to belong to the same material class; therefore, some noise pixels can be removed effectively in the final change detection map. (2) The spatial information of the changed or unchanged area, such as shape, size, and distribution, is obtained from the post-event image through multi-scale segmentation. According to multi-scale segmentation theory, the shape and size of an object conform to the shape and size of a target or a part of a target. Therefore, smoothing the changed or unchanged area can be done in an adaptive, object-based manner instead of a pixel-based manner. An adaptive smoothing filter is more rational and practical than a regular window for detecting land cover changes, which are uncertain in shape and size. (3) According to the characteristics of the proposed multi-scale segmentation voting decision strategy, the final change detection map is generated by fusing the initial BCDMs. Therefore, the proposed approach has the potential to integrate the different advantages of the initial BCDMs and improve detection accuracy. A schematic example is presented in Figure 2 to demonstrate the effectiveness of the proposed multi-scale segmentation voting decision.
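The object-wise majority voting described above can be sketched as follows. This is a minimal sketch, assuming the multi-scale segmentation is available as an integer label map (in the paper it comes from FNEA in eCognition) and the initial BCDMs as binary arrays; the function name `segment_voting` is ours.

```python
import numpy as np

def segment_voting(bcdms, segments):
    """Fuse several initial BCDMs into one map: within each segment
    (object) of the post-event image, count 'changed' votes over all
    BCDMs and all pixels; the majority label is assigned to the
    whole object."""
    stack = np.stack(bcdms, axis=0).astype(int)   # (n_maps, H, W)
    out = np.zeros(segments.shape, dtype=int)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        votes = stack[:, mask]                    # (n_maps, n_pixels)
        out[mask] = int(votes.mean() > 0.5)       # majority decision
    return out

# toy example: two objects, three slightly noisy initial BCDMs
segments = np.array([[1, 1, 2, 2],
                     [1, 1, 2, 2]])
b1 = np.array([[1, 1, 0, 0], [1, 0, 0, 0]])
b2 = np.array([[1, 1, 0, 1], [1, 1, 0, 0]])
b3 = np.array([[0, 1, 0, 0], [1, 1, 0, 0]])
fused = segment_voting([b1, b2, b3], segments)
print(fused)   # object 1 voted changed, object 2 voted unchanged
```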

Experiments and Analysis of Results
In this section, three pairs of bi-temporal remote sensing images with very high spatial resolution were used to test the effectiveness of the proposed framework. First, the image data for the three land cover change events are described in detail. Then, the experiments are designed and presented. Finally, the results are compared and the parameter sensitivity of the proposed framework is analyzed.

Dataset Description
The three image datasets used in the experiments are illustrated in Figure 3. These images were acquired through the aerial platform and QuickBird satellite. The images depict landslide change and land use change events. More details on the images are given below and in the caption of Figure 3.
Site A: The bi-temporal images shown in Figure 3a,b were acquired in April 2007 and July 2014, respectively. The bi-temporal image scene depicts the pre- and post-event states of a landslide on Lantau Island, Hong Kong, China. The size of this site is 750 × 950 pixels, with a spatial resolution of 0.5 m/pixel. The ground reference of the landslide inventory map was interpreted manually, as presented in Figure 3c.
Site B: The bi-temporal images of Site B were acquired in the same way as the data for Site A. As shown in Figure 3d,e, the size of the scene is 1252 × 2199 pixels, with a spatial resolution of 0.5 m/pixel. The ground reference of the landslide inventory map of Site B is given in Figure 3f.
Ji'Nan QuickBird data: As shown in Figure 3g,h, the bi-temporal images were acquired by the QuickBird satellite in April 2007 and February 2009, respectively. The size of the image scene is 950 × 1250 pixels, with a spatial resolution of 0.61 m/pixel. This area is covered by different land-use types, including crops, bare soil, roads, and railways, and the bi-temporal images were acquired in different seasons. These factors pose challenges in detecting land cover changes.
The ground reference of each dataset was interpreted manually. During interpretation, the bi-temporal images were overlaid. Then, mapping tools such as "Swipe", "Adjust Transparency", and the editing toolbars in ArcMap 10.2 were employed to map the ground reference. In addition, to avoid missing changed areas, the changed and unchanged areas were outlined grid by grid. Details of the ground reference for each dataset are presented in Table 1.


Experimental Design
Three experiments were designed to demonstrate the effectiveness and superiority of the proposed framework in detecting land cover change with VHR remote sensing images.
In the first experiment, the bi-temporal images of Site A were used to demonstrate the effectiveness of the proposed framework. The raw spectral features (false color bands) of the bi-temporal images were adopted to detect the landslide area based on the Manhattan distance and the Otsu binary threshold. Then, the three classical spatial feature extraction methods (EMPs [57], APs [58], and RGF [59]) and the three multi-feature fusion methods (layer stacking [35], mean weight [60], and adaptive weight [61]) were validated in our proposed framework. The parameters of the spatial feature extraction methods are detailed in Table 2.
The superiority of the proposed framework was further investigated in the second experiment. In this experiment, four LCCD methods, namely PCA_Kmeans [40], CVA_FCM [41], Semi_FCM [42], and LSELUC [43], which also consider contextual information to improve detection accuracy and have been applied successfully in practice, were used. The Site B landslide aerial images and the Ji'Nan QuickBird satellite images were employed for comparison in each experiment. The optimized parameter settings of these approaches for each dataset are given in Table 3.

Table 2. Parameter settings for the spatial feature extraction methods.

Spatial Extraction Method | Parameter Settings
EMPs | SE = "disk", size of AE is 5 × 5

Three post-processing LCCD methods (MV [49], GPCF [50], and OBEM [51]) were employed and compared with the proposed framework on the Site B and Ji'Nan datasets to further demonstrate its advantages. The optimal parameters of the post-processing LCCD methods are presented as follows.
First, as the proposed approach concentrates on detecting land cover change, and to guarantee a fair comparison, the parameters of the spatial feature extraction methods were fixed for each dataset, as shown in Table 2. Moreover, the generation of the CMI and BCDM for all post-processing approaches was also based on the Manhattan distance and the Otsu binary threshold method. Second, in addition to the parameter settings for spatial feature extraction, the parameters of each post-processing approach were optimized by trial and error. The optimized parameters for each post-processing approach and dataset are detailed in Table 4.

Table 3. Parameter settings for the comparison between the proposed framework and the state-of-the-art pre-processing LCCD approaches for the different datasets.

Table 4. Optimized parameter settings for the comparison between the proposed framework and the post-processing methods for the different datasets. Note: the parameter requirements of each approach are presented in parentheses.

Results and Analysis
Various measuring indices were considered in the quantitative assessment of the proposed framework: the ratio of false alarms (FA), the ratio of missed alarms (MA), the ratio of total errors (TE), overall accuracy (OA), and Kappa coefficient (Ka). All performance measuring indices were considered for a comparative analysis of the experiments.
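These indices can be computed from the binary confusion matrix. The following is a sketch under one common set of definitions (FA normalized by the unchanged reference pixels, MA by the changed reference pixels, TE by all pixels); the exact normalizations used in the paper may differ, and the function name `accuracy_indices` is ours.

```python
import numpy as np

def accuracy_indices(detected, reference):
    """Compute FA, MA, TE, OA, and the Kappa coefficient from a binary
    detection map and a binary ground reference (1 = changed)."""
    d = detected.ravel().astype(bool)
    r = reference.ravel().astype(bool)
    n = d.size
    fp = np.sum(d & ~r)           # false alarms
    fn = np.sum(~d & r)           # missed alarms
    tp = np.sum(d & r)
    tn = np.sum(~d & ~r)
    fa = fp / np.sum(~r)          # ratio of false alarms (FA)
    ma = fn / np.sum(r)           # ratio of missed alarms (MA)
    te = (fp + fn) / n            # ratio of total errors (TE)
    oa = (tp + tn) / n            # overall accuracy (OA)
    # chance agreement for Kappa (Ka)
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    ka = (oa - pe) / (1 - pe)
    return fa, ma, te, oa, ka

# toy 6-pixel example: one false alarm and one missed alarm
det = np.array([1, 0, 1, 0, 0, 0])
ref = np.array([1, 1, 0, 0, 0, 0])
fa, ma, te, oa, ka = accuracy_indices(det, ref)
print(round(oa, 3), round(ka, 3))   # 0.667 0.25
```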
As mentioned in the sections above, to evaluate the effectiveness of the proposed method, the raw spectral feature, the three spatial-spectral feature fusion approaches (layer stacking [35], mean-weight [60], and adaptive weight [61]), and the proposed framework were applied to the Site A VHR bi-temporal images for comparison. Table 5 shows that, compared with the method using the raw spectral feature alone, coupling the spatial feature with the spectral feature clearly improved the change detection accuracies. For instance, FA improved by about 5.98% for the layer stacking fusion approach [35] with the EMPs spatial feature extraction approach [57]. Furthermore, the proposed approach achieved the best detection accuracies when compared with the spatial-spectral feature fusion-based approaches and with the raw spectral feature alone. Figure 4 demonstrates that the proposed framework clearly smoothened the salt-and-pepper noise present in the results of the raw spectral feature and of each spatial-spectral feature fusion-based approach.

The proposed approach was compared with state-of-the-art LCCD methods, including PCA_Kmeans [40], CVA_FCM [41], Semi_FCM [42], and LSELUC [43], to further outline its advantages. Two pairs of VHR remote sensing images were considered for this comparison. Table 6 shows the results of the comparison on the Site B landslide aerial remote sensing images. The advantages of the proposed approach can be seen in three ways: (1) Among the state-of-the-art methods, the relatively new LSELUC approach achieved better accuracies because reliable local spatial information is considered through local uncertainties; however, the proposed framework achieved the best accuracies in terms of FA, MA, TE, OA, and Ka. (2) Although different spatial-spectral feature fusion methods may have different effects on the performance of the proposed framework, the best accuracy was achieved by the proposed framework regardless of which fusion method was adopted. (3) For the Site B aerial images, EMPs [57] coupled with the spectral feature in the proposed approach acquired the best accuracies.
Comparisons among the approaches are summarized in the bar charts of Figure 5, which clearly present the advantages of the proposed approach. The visual performance of the comparisons further verified this conclusion for the Site B landslide aerial images, as shown in Figure 6. Comparisons on the Ji'Nan QuickBird satellite images for detecting land cover and land use change led to similar conclusions; the details can be found in Table 7 and Figures 7 and 8. From the quantitative comparisons and the visual performance, it can be seen that the proposed approach achieved the best accuracies in terms of FA, MA, and TE, regardless of which spatial feature extraction approach was employed.
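For reference, the accuracy measures used throughout these comparisons (FA, MA, TE, OA, and Ka) can all be derived from the 2x2 change/no-change confusion matrix. The definitions below follow common change-detection usage and are an assumption on our part; the paper's exact formulas may differ in detail.

```python
import numpy as np

def lccd_accuracies(pred, ref):
    """Change-detection accuracy measures from a binary confusion matrix.
    pred, ref: arrays with 1 = changed, 0 = unchanged.
    (Degenerate cases, e.g. no reference changes, are not guarded here.)"""
    pred = np.asarray(pred).astype(bool)
    ref = np.asarray(ref).astype(bool)
    tp = np.sum(pred & ref)      # changed pixels correctly detected
    fp = np.sum(pred & ~ref)     # false alarms
    fn = np.sum(~pred & ref)     # missed changes
    tn = np.sum(~pred & ~ref)    # unchanged pixels correctly rejected
    n = tp + fp + fn + tn
    fa = fp / (fp + tn)          # false-alarm rate
    ma = fn / (fn + tp)          # missed-alarm rate
    te = (fp + fn) / n           # total error
    oa = (tp + tn) / n           # overall accuracy
    # chance agreement for Cohen's kappa coefficient (Ka)
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    ka = (oa - pe) / (1 - pe)
    return dict(FA=fa, MA=ma, TE=te, OA=oa, Ka=ka)

# Tiny worked example: one hit, one false alarm, two correct rejections.
acc = lccd_accuracies(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]))
```

For this four-pixel example the measures come out as FA = 1/3, MA = 0, TE = 0.25, OA = 0.75, and Ka = 0.5.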
The proposed approach was compared with MV [49], GPCF [50], and OBEM [51] to further investigate its advantages, as designed in the third experiment. The comparative results for the Site B landslide aerial images are shown in Table 8. From these comparisons, the proposed approach achieved a competitive detection accuracy compared with MV [49], GPCF [50], and OBEM [51]. In addition, in terms of visual performance, the proposed approach (the fourth column in Figure 9) presented less noise than the others. This finding was further verified by the quantitative comparison in Table 9.
A similar conclusion can be reached from the comparisons conducted on the Ji'Nan QB satellite remote sensing images, as shown in Table 9 and Figure 10. Table 8. Quantitative comparisons (%) between the proposed approach and three post-processing methods with different spatial-spectral feature fusion strategies on the Site B landslide aerial images. Figure 9. Visual comparison of the detection maps between the post-processing approaches and the proposed approach with different spatial-spectral feature fusion methods on the Site B landslide aerial images. The caption on the left shows which spatial-spectral feature was adopted in each row, and the caption at the top shows which post-processing approach each column adopted. Table 9. Quantitative comparisons (%) between the proposed approach and three post-processing methods with different spatial-spectral feature fusion strategies on the Ji'Nan QB satellite images. The caption on the left shows which spatial-spectral feature was adopted in each row, and the caption at the top shows which post-processing approach each column adopted.

The sensitivity of the detection accuracy to the parameters of the proposed approach is discussed in this section with the aim of extending its potential application. We only examined the relationship between scale and detection accuracy. The scale parameter controls the size of the segmented objects: a larger scale generates larger segments, and the ground details of a target may be smoothed away; in contrast, a smaller scale yields smaller segments and preserves more ground detail, but introduces more noise into the detection results. Therefore, an appropriate scale should be chosen according to the given images. Figure 11 shows that MA, TE, and FA decreased as the scale increased. Furthermore, MA and TE decreased gradually as the scale ranged from 10 to 30; when the scale was larger than 30, MA and TE leveled off. This can be attributed to the segments being large enough to obtain optimum accuracy, because the scale has less effect on segment size when the other parameters (compactness and shape) are fixed. FA also decreased as the scale increased and then fluctuated in the range of 2.99 to 3.91; this fluctuation may be caused by the uncertain distribution of spatial heterogeneity.
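The scale trade-off described above can be illustrated with a toy experiment. Here a square block grid is a hypothetical stand-in for a real segmentation algorithm, with the block side playing the role of the scale parameter; relabeling each segment by its majority class then shows how larger segments suppress isolated salt-and-pepper noise.

```python
import numpy as np

def grid_segments(shape, scale):
    """Toy segmentation: square blocks of side `scale` stand in for segments;
    a real segmenter would produce irregular objects of comparable size."""
    rows = np.arange(shape[0]) // scale
    cols = np.arange(shape[1]) // scale
    return rows[:, None] * (shape[1] // scale + 1) + cols[None, :]

def majority_relabel(bcdm, segments):
    """Assign every segment its majority class, suppressing isolated noise."""
    out = np.zeros_like(bcdm)
    for lab in np.unique(segments):
        m = segments == lab
        out[m] = 1 if bcdm[m].mean() >= 0.5 else 0
    return out

# Ground truth: one changed square; corrupt it with ~5% salt-and-pepper noise.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=np.uint8)
truth[8:24, 8:24] = 1
noisy = truth ^ (rng.random((32, 32)) < 0.05).astype(np.uint8)

cleaned2 = majority_relabel(noisy, grid_segments(noisy.shape, scale=2))
cleaned8 = majority_relabel(noisy, grid_segments(noisy.shape, scale=8))
# The larger scale removes more isolated noise; a scale chosen too large,
# however, would start to smooth away genuine small changes as well.
```

Counting the disagreements with `truth` before and after relabeling shows the error shrinking as the scale grows, mirroring the MA/TE behavior reported for Figure 11.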
The relationship between the segmentation scale and the detection accuracy on the Ji'Nan QB satellite images was also investigated. Figure 12 shows that for the Ji'Nan dataset, FA and TE decreased as the scale increased, while MA first increased and then decreased. These findings are helpful in determining the parameters of the proposed approach.

Conclusions
In the present work, a novel framework for detecting land cover change using spatial-spectral feature fusion and multi-scale segmentation voting decision strategies was proposed. Instead of using a single feature to obtain the binary change detection map directly, spatial features were extracted and coupled with the raw spectral feature through different fusion strategies. Different spatial-spectral features provided different initial BCDMs. Finally, a multi-scale segmentation voting decision strategy was proposed to fuse the initial BCDMs into the final change detection map. The main contribution of the proposed approach is that it provides a comprehensive framework for more accurate land cover change detection using bi-temporal VHR remote sensing images. Multiple spatial features and different feature fusion strategies were introduced to generate the initial BCDMs. In addition, multi-scale segmentation voting decision was first proposed to fuse the initial BCDMs into the final change detection map. The advantages of multi-scale segmentation voting decision are two-fold: (1) the different performance of the initial BCDMs, which are obtained from different spatial-spectral features, can be utilized together to avoid biased detection; and (2) majority voting under the constraint of a multi-scale object can account for the uncertainty of a ground target, such as its shape and size, which helps improve the voting accuracy.
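The voting scheme described above can be sketched as follows: a majority vote over the initial BCDMs, constrained to segment objects, repeated at several segmentation scales and fused by a second vote. This is a minimal sketch of the idea; the tie-breaking rule and the exact way the paper combines scales are assumptions.

```python
import numpy as np

def object_vote(bcdm_stack, segments):
    """Object-constrained majority vote: each segment takes the majority label
    of all pixels of all initial BCDMs falling inside it."""
    votes = bcdm_stack.mean(axis=0)   # per-pixel fraction of 'change' votes
    out = np.zeros(segments.shape, dtype=np.uint8)
    for lab in np.unique(segments):
        m = segments == lab
        out[m] = 1 if votes[m].mean() >= 0.5 else 0
    return out

def multiscale_vote(bcdm_stack, segment_maps):
    """Fuse object-level decisions from several segmentation scales by a
    second pixel-wise majority vote."""
    per_scale = np.stack([object_vote(bcdm_stack, s) for s in segment_maps])
    return (per_scale.mean(axis=0) >= 0.5).astype(np.uint8)

# Three initial BCDMs (e.g. from different spatial-spectral fusions) on a
# 2x2 map, plus two toy segment label maps at different scales.
bcdms = np.array([[[1, 1], [0, 0]],
                  [[1, 0], [0, 1]],
                  [[1, 1], [0, 0]]], dtype=np.uint8)
seg_fine = np.array([[0, 1], [2, 3]])      # every pixel its own segment
seg_coarse = np.array([[0, 0], [1, 1]])    # two row-shaped segments
final = multiscale_vote(bcdms, [seg_fine, seg_coarse])
# final == [[1, 1], [0, 0]]: both scales agree, outvoting the noisy second BCDM
```

Note how the isolated disagreement in the second BCDM is outvoted at both scales, which is exactly the bias-avoiding behavior claimed for aspect (1) above.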
Experiments were carried out on three pairs of datasets to confirm the effectiveness of the proposed approach. The results showed that the proposed approach achieved better performance than using the raw spectral feature alone and than other state-of-the-art LCCD techniques. However, one limitation of the proposed framework is that it requires many parameters in practical application, and optimizing the parameter settings for a specific dataset is time-consuming. In the future, the proposed approach will be investigated extensively on additional types of images and land cover change events, such as unmanned aerial vehicle images and forest disasters. Theoretically, further investigation with various source images and land cover change events will improve the robustness of the method, and a comprehensive investigation will also broaden its applicability.
