1. Introduction
Land cover change detection based on remotely sensed data [1] is a technique whose main objective is to identify differences on the ground surface in multitemporal or bitemporal images. It is widely used in remote sensing for a broad range of applications on vegetation images, including monitoring changes in vegetation cover, identifying areas affected by natural disasters such as wildfires, assessing the effects of climate change on vegetation patterns, and detecting invasive vegetation species [2,3,4].
Change detection in remote sensing images is highly important in many applications related to vegetation, natural spaces, agriculture, and different ecosystems. For example, the mapping of vegetation cover in terrestrial and aquatic environments is a key indicator of environmental health in marine and freshwater ecosystems [5]. Change detection enables the identification of temporal variations in land cover or land use, providing critical insights for diverse applications spanning agriculture, geology, forestry, regional mapping and planning, oil pollution monitoring, and surveillance [6,7]. These techniques can also be used to monitor environmental conditions such as the impact of natural disasters, evaluate desertification, or detect specific urban or natural variations in the environment [3].
In the domain of change detection in vegetation images, a critical challenge is the detection of changes between different types of vegetation [8]. This complexity is primarily due to the similar spectral signatures that different vegetation types exhibit [1]. Pixel-level change detection methods based on comparing spectral signatures are therefore limited in detecting these changes, a task that becomes particularly vital when distinguishing native from foreign vegetation [9,10]. This capability is crucial for maintaining ecological balance and managing biodiversity, which demonstrates the practical implications of our proposed methodology. Different vegetation types have been correctly classified in the literature using object-based algorithms that exploit spatial information and textures [11].
With the continuous improvement of the sensors used to capture images of the Earth's surface, the resolution per pixel has also increased, particularly in the case of very-high-spatial-resolution (VHR) images. These images usually have a resolution from sub-metre to several metres per pixel [12,13] and contain a large amount of spatial information, but they present several problems for change detection. Since VHR images contain limited spectral information of only four or five bands [14], it is more difficult to separate classes with similar spectral signatures because of the low variance between classes [15]. Moreover, the higher spatial resolution of each pixel leads to a higher intra-class variance of the classes represented in the image [16,17].
Consequently, the pixel-level change detection methods used thus far generate rather poor results in terms of accuracy for VHR images because such algorithms rely only on spectral information. These algorithms include CVA-based techniques [1,4,18,19,20,21,22,23,24] and do not exploit the extensive amount of spatial information provided by the structures and textures of VHR images [25,26,27]. Therefore, different object-based change detection (OBCD) algorithms have been introduced to take advantage of the spatial information [28,29,30,31]; they can also be used to detect homogeneous structures, which reduces the computational cost of the techniques.
Depending on the minimum unit or structure used for the detection, algorithms operate at the pixel level or at the object level [32]. Traditionally, pixel-level methods have been used on low- to medium-resolution images; for example, change vector analysis (CVA) is widely used [33] to analyse the change vector between pairs of pixels of bi-temporal images and generate a difference (magnitude) image [21]. Each pixel of this image stores the intensity of change, and a binary thresholding algorithm is used to determine the intensity above which a pixel is labelled as a change. Statistical techniques such as expectation–maximisation (EM) [34] or, most frequently, the Otsu algorithm [35,36,37,38] have been used to obtain an optimal threshold.
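As a concrete illustration of this pixel-level pipeline, the following sketch computes a CVA magnitude image and binarises it with Otsu's method. It is a minimal NumPy implementation on synthetic data, not the code used in this paper; all function names are illustrative.

```python
import numpy as np

def cva_magnitude(img_t1, img_t2):
    """Change vector analysis: per-pixel Euclidean norm of the spectral
    difference between two co-registered images of shape (H, W, B)."""
    diff = img_t2.astype(float) - img_t1.astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1))

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximising the between-class
    variance of the change / no-change split."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # cumulative weight of the low class
    m = np.cumsum(p * centers)   # cumulative mean
    mt = m[-1]                   # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mt * w0 - m) ** 2 / (w0 * (1 - w0))
    sigma_b = np.nan_to_num(sigma_b)
    return centers[np.argmax(sigma_b)]

# Toy bi-temporal pair: a 4-band image where one corner changes.
rng = np.random.default_rng(0)
t1 = rng.normal(0.3, 0.01, (32, 32, 4))
t2 = t1 + rng.normal(0.0, 0.01, t1.shape)  # registration/sensor noise
t2[:8, :8] += 0.5                          # simulated change region
mag = cva_magnitude(t1, t2)
thr = otsu_threshold(mag.ravel())
change_map = mag > thr
```

On this well-separated example, the Otsu threshold falls between the noise magnitudes and the changed-pixel magnitudes, so the binary map recovers the changed corner.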
Several detection techniques based on the fusion of results from different methods, such as different object scales (multi-scale approach) [39] or different algorithms (multi-algorithm approach) [40], have been developed to take full advantage of the properties of spatial information extraction algorithms. Multi-scale and multi-algorithm techniques share one step: the merging of results. The methods most commonly used in the literature for this purpose are majority vote (MV) [41], Dempster–Shafer (DS) [42], and fuzzy integral (FI) [43]. Currently, most detection methods that use fusion techniques are based on deep learning and supervised detection. The main approach is to run several classifiers in parallel with the same objective but with different parameters of the supervised learning [44,45,46], for example using several convolutional neural networks (CNNs) that differ in the convolutional filter size (e.g., 3 × 3, 5 × 5, and 7 × 7) in parallel and then fusing their results [47,48]. Another reported approach is to perform feature-based change detection using textures and morphological profiles at different scales [49]. There are also object-based multi-scale techniques in which variability is achieved by varying the scale of each object using circular structures [50] or segmentation algorithms [51]. Ensemble learning has also been used to perform fusion for change detection [52,53,54]; it is based on the combination of multiple classifiers to make a final decision, whereby the main objective is to compensate the errors of a single classifier with the other classifiers, resulting in a higher overall accuracy [55].
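Of the fusion rules above, majority vote is the simplest to sketch; DS and FI require belief functions or fuzzy measures and are omitted here. A minimal NumPy version on hypothetical detector outputs:

```python
import numpy as np

def majority_vote(maps):
    """Fuse binary change maps from several detectors: a pixel is
    labelled 'change' when more than half of the detectors agree."""
    stack = np.stack(maps).astype(int)
    return stack.sum(axis=0) > (len(maps) / 2)

# Three hypothetical detectors disagreeing on two pixels.
a = np.array([[1, 1, 0, 0]], dtype=bool)
b = np.array([[1, 0, 0, 1]], dtype=bool)
c = np.array([[1, 1, 0, 0]], dtype=bool)
fused = majority_vote([a, b, c])  # → [[True, True, False, False]]
```

With an odd number of detectors the strict `> len/2` comparison never ties; with an even number, a tie-breaking policy (e.g., defaulting to change) would have to be added.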
The algorithms most commonly used for object extraction in a non-regular context are segmentation algorithms, which divide the image into segments, i.e., regions that are homogeneous according to some criterion. Segmentation is not a well-defined problem, so different algorithms based on different approximations have been proposed. The watershed transform [56] is a region-based technique that can find the catchment basins and watershed lines of any grayscale image. It is not directly applicable to multidimensional images, so it is common to use an algorithm such as the robust colour morphological gradient (RCMG) [57], which reduces the dimensionality to a single band before segmentation. Superpixel methods are another type of segmentation algorithm, which can modify the granularity of the segmentation depending on parameters such as segment size or regularity. Superpixel techniques are separated into different categories [58,59]: watershed-based, such as waterpixels [60] and morphological superpixel segmentation (MSS) [61], built on the watershed algorithm; density-based, such as quick-shift (QS) [62]; graph-based, which interpret the image as a non-directional graph, such as entropy rate superpixels (ERSs) [63]; and clustering-based, which use clustering algorithms to generate the segmentation. The latter group includes simple linear iterative clustering (SLIC) [64] and extended topology preserving segmentation (ETPS) [65]. Based on these segmentation algorithms, different multi-resolution approaches have been proposed by applying a segmentation algorithm at different spatial resolutions. These methods are especially adequate for VHR images or for datasets that present change areas of different sizes [66]. The multi-resolution segmentation (MRS) algorithm [67] and the MSEG algorithm [68] are commonly used multi-resolution segmentation algorithms. In general, all segmentation algorithms can be designed to exploit not only the spatial information of the images, but also the spectral information, by considering the different bands of the image for each pixel. In this study, three segmentation algorithms that exploit all the spectral information available in the images were selected: the watershed algorithm, as it is very efficient in remote sensing [69,70], and SLIC and waterpixels, because they preserve the edges and boundaries in the images and provide flexibility, as superpixel algorithms can be tuned by adjusting the size and shape of the superpixels.
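To illustrate how a multiband image can be reduced to a single gradient band before applying the watershed, the sketch below implements a simplified (non-robust) colour morphological gradient: the maximum pairwise spectral distance inside each pixel's neighbourhood. The actual RCMG [57] additionally discards the most extreme pixel pairs to gain robustness, which is omitted here for brevity.

```python
import numpy as np
from itertools import combinations

def colour_morphological_gradient(img, win=1):
    """Simplified colour morphological gradient: for each pixel, the
    maximum pairwise Euclidean distance between the spectral vectors in
    its (2*win+1)^2 neighbourhood. High values mark object boundaries."""
    h, w, bands = img.shape
    pad = np.pad(img, ((win, win), (win, win), (0, 0)), mode="edge")
    grad = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * win + 1, j:j + 2 * win + 1]
            vectors = patch.reshape(-1, bands)
            grad[i, j] = max(np.linalg.norm(p - q)
                             for p, q in combinations(vectors, 2))
    return grad

# A 4-band toy image with a vertical edge: the gradient peaks on the edge.
img = np.zeros((8, 8, 4))
img[:, 4:] = 1.0
grad = colour_morphological_gradient(img)
```

The resulting single-band `grad` image is what a watershed transform would then flood to obtain the segmentation.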
Different techniques can be used to analyse multi-temporal images, and they are classified into two main groups: those based on multi-temporal information at the feature level and those based on multi-temporal information at the decision level [71,72]. The feature-based group is directly related to unsupervised change detection using techniques such as the calculation of the ratio or the pixel-to-pixel difference between two images. The decision-based change detection approach is usually based on the post-classification of the processed images and on the calculation of the differences between the generated classification maps [73,74,75], or on performing a joint classification of multiple images [49,76] with domain adaptation [77,78].
The two most common approaches for detecting changes are binary detection and multiclass detection. The first generates a change map where each pixel is classified into a binary set as change or no change, whereas multiclass detection classifies the pixels associated with changes into a set of classes corresponding to different types of changes [79,80,81].
Regarding the field of application, some multi-scale methods focus on detecting changes at the level of structures or buildings [82] using the LEVIR-CD building dataset, consisting of VHR images taken from Google Earth. Some researchers have developed methods for detecting changes in vegetation [44,49,83,84,85], specifically for detecting the rapid growth of invasive species. In particular, one research group [49] used texture extraction and a multi-scale extended morphological profile, which was then used as the input of a set of support vector machine (SVM) classifiers. The objective was to detect changes related to the replacement of vegetation by buildings using the multispectral Steinacker dataset, composed of two pan-sharpened QuickBird images. The method proposed in [44] uses a similar approach but, instead of multi-scale object-based techniques, it uses the multi-resolution segmentation (MRS) algorithm with multiple SVM classifiers for detecting land use changes between VHR multispectral images, mainly from cultivated land to bare soil. Using algebraic techniques not based on machine learning, Reference [86] extracted textures with a single object-level descriptor (not multi-scale) and differentiated between them; this method has been used on VHR images of vegetation, in particular for changes from vegetation to bare ground or from vegetation to buildings.
The main contribution of this paper is the proposed multi-scale binary change detection based on consensus techniques for multi-algorithm fusion. Consensus exploits the same idea as ensemble learning, but from a classical algorithmic approach rather than a machine learning approach. Ensemble learning uses the combination of multiple classifiers with the same objective and is therefore more robust than individual algorithms in detecting changes [54,55]. By using multiple algorithms, ensemble learning is also less sensitive to noise and outliers in the data, which can lead to more accurate results [53]. Overall, ensemble learning is a powerful approach to improving the robustness and flexibility of change detection in remote sensing. The proposed method was designed for multispectral and hyperspectral VHR multi-temporal vegetation datasets, using the CVA-SAM at the segment level to compute the difference maps. The technique combines several multi-scale segmentation algorithms by performing a consensus fusion of their results. The output is a binary change detection map that indicates, for each pixel, whether there is a change or not.
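A minimal sketch of the segment-level CVA-SAM idea follows. It assumes that each segment is represented by its mean spectrum (the text only states that a representative pixel vector is used per segment), so this is one plausible reading rather than the exact implementation.

```python
import numpy as np

def sam_angle(u, v, eps=1e-12):
    """Spectral angle mapper: angle (radians) between two spectra."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def segment_cva_sam(img_t1, img_t2, labels):
    """Segment-level CVA-SAM: represent each segment by its mean
    spectrum in each image and use the spectral angle between the two
    representatives as the change intensity of the whole segment."""
    diff = np.zeros(labels.max() + 1)
    for s in np.unique(labels):
        mask = labels == s
        diff[s] = sam_angle(img_t1[mask].mean(axis=0),
                            img_t2[mask].mean(axis=0))
    return diff[labels]  # broadcast back to a per-pixel difference map

# Toy example: two segments; only segment 1 changes spectral shape.
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1
t1 = np.ones((4, 4, 3))
t2 = t1.copy()
t2[:, 2:] = [2.0, 1.0, 0.5]  # changed spectral direction
dmap = segment_cva_sam(t1, t2, labels)
```

Because SAM compares spectral directions rather than magnitudes, a uniform brightness change (e.g., illumination) leaves the angle near zero, which is the robustness property exploited by the method.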
3. Results
This section analyses the quality, in terms of accuracy, of the proposed multi-scale approach based on consensus techniques and compares it using different configurations. The different metrics used to assess the quality of change detection are described in Section 2.3.
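Although Section 2.3 is not reproduced in this excerpt, the metrics used below can be computed from a binary confusion matrix; the sketch assumes the usual definitions inferred from how the text uses them (CP = recall on the change class, NCA = accuracy on the no-change class, OA = overall accuracy).

```python
import numpy as np

def cd_metrics(pred, truth):
    """Binary change detection metrics from predicted and reference
    change maps (boolean arrays of the same shape)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)     # detected real changes
    tn = np.sum(~pred & ~truth)   # correctly kept no-change
    fn = np.sum(~pred & truth)    # missed changes
    fp = np.sum(pred & ~truth)    # false alarms
    cp = tp / (tp + fn)           # completeness / recall
    nca = tn / (tn + fp)          # non-change accuracy
    oa = (tp + tn) / truth.size   # overall accuracy
    return cp, nca, oa

truth = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
pred = np.array([1, 1, 1, 0, 0, 0, 0, 1], dtype=bool)
cp, nca, oa = cd_metrics(pred, truth)  # → 0.75, 0.75, 0.75
```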
Table 4 shows the results obtained by the proposed change detection (CD) method in terms of CP or recall using three levels of segmentation for each of the three segmentation algorithms. The results for all of the intermediate phases are also shown. The table reports the accuracy of every single level of segmentation (single scale), after merging several levels with the five different fusion algorithms described in Section 2 (Columns 5–9), and, finally, after merging the results of each multi-scale segmentation using two consensus techniques (last two columns of the table). The colour code from dark red to dark green makes it easy to observe that the worst results were obtained using a single-scale technique. Multi-scale fusion yielded better quality results, particularly the fusion based on the Euclidean distance (ED), for all of the datasets. The ED detected more changes at the cost of some erroneous detections, as mentioned in the previous section. Finally, the best results were obtained by performing consensus fusion, i.e., merging the results of each ED multi-scale fusion, in particular with the OR fusion technique. These results confirmed the effectiveness of the proposed method with ED multi-scale fusion and OR consensus fusion, obtaining accuracies exceeding 94% in both VHR and low- or medium-resolution datasets.
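The exact ED fusion rule is defined in Section 2, which is not reproduced in this excerpt; one plausible reading, sketched below on hypothetical per-scale difference maps, is to take the Euclidean norm across scales, so that a strong change response at any single scale survives into the fused map.

```python
import numpy as np

def euclidean_fusion(diff_maps):
    """Fuse per-scale difference maps by taking, for each pixel, the
    Euclidean norm of its responses across all scales (an assumption
    about how ED fusion operates, not the paper's exact definition)."""
    stack = np.stack(diff_maps).astype(float)
    return np.sqrt((stack ** 2).sum(axis=0))

d1 = np.array([[0.0, 0.9], [0.1, 0.0]])  # fine-scale difference map
d2 = np.array([[0.0, 0.1], [0.8, 0.0]])  # coarse-scale difference map
fused = euclidean_fusion([d1, d2])
```

Note how a change visible only at the fine scale (top right) and one visible only at the coarse scale (bottom left) both remain strong in the fused map, which matches the text's observation that ED fusion detects more changes at the cost of some false alarms.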
Figure 11 presents crops of the Level-1 segmentation maps generated for each dataset using the three segmentation algorithms, showing that, for the same image (one row of the figure), the three algorithms provided different spatial information even when the same average segment size was used for all of them.
The problem of detecting changes in vegetation, particularly in VHR images, requires first considering that vegetation changes usually occur at the level of objects of different non-regular sizes and shapes; thus, segmentation at several granularity scales was considered. Therefore, our first hypothesis was that there is an improvement in CP (recall) when using multi-scale segmentation instead of a single scale.
Figure 12a presents the difference between the results obtained by the multi-scale approach using the best-proposed fusion method (ED fusion in Table 4) and the average of the accuracies obtained using the single-scale detector with the scale L0, showing improved CP in all of the datasets and with all of the segmentation algorithms upon merging the scales.
In addition, as segmentation is not a well-defined problem and VHR images present richer spatial information, different segmentation algorithms were proposed and their results were combined using consensus techniques. Therefore, it was necessary to test the hypothesis that a consensus-based fusion among segmentation algorithms improves the accuracy over the multi-scale version with only one segmentation algorithm. To answer this question, Figure 12b shows the improvements in CP obtained by the consensus technique approach over the use of a single multi-scale segmentation algorithm, with improvements for all of the datasets and the highest results in the case of the OR fusion technique. This improvement was, as expected, particularly high for the VHR datasets: an average improvement of +5.33 percentage points (pp) overall and, specifically, +8.33 pp for the two VHR datasets (Oitavén and Ermidas).
Table 5 presents the accuracy results for each step of the technique evaluated in terms of different metrics (CP, NCA, and OA), showing the multi-scale detection results using only the best option selected above, the ED fusion, and the results using consensus fusion by MV and OR. Since the MV technique reduces the number of FPs (higher NCA) at the cost of detecting fewer changes (a higher number of FNs and a lower CP), it obtains results where each detected change has a higher probability of being a real change. In contrast, the OR fusion technique has the opposite behaviour, as it detects a larger number of changes (reducing the number of FNs and increasing CP) at the cost of an increase in the number of FPs (lower NCA). Therefore, the latter technique allows the generation of change maps for which the number of undetected changes is minimal. The best consensus method for the main objective of this paper is the OR fusion because it obtained the best CP results (marked in bold) with only small OA decreases.
Figure 13 shows a detailed comparison of the OR and MV fusion techniques for the reclassification of the controversial pixels, which allowed us to analyse whether there was an increase (positive value) or decrease (negative value) in the quality of OR with respect to MV. It can be seen that, across all datasets, there was an average improvement of +4.30 pp in CP when using the OR technique with respect to MV, while the OA was reduced by no more than 1.48 pp on average. This indicates that the OR fusion technique maximises the number of detected changes at the cost of failing on some pixels, which can be seen in the increase in CP with respect to MV for all datasets. In this paper, the OR technique was selected as the best consensus technique because it detected more changes than MV without committing many errors in comparison.
Focusing more on the consensus techniques, Table 6 shows the uncontested and controversial change pixels for each dataset and the subsequent reclassification of the controversial pixels. As explained in Section 2, the uncontested pixels were classified by all three detectors into the same class; for the controversial pixels, there was no consensus among the detectors, so they had to be reclassified. As shown in Table 6, the OR technique reclassified all of the controversial pixels as change, while the MV technique, on average, reclassified more pixels as no change than as change. Figure 14a shows the percentage of pixels of each type for each dataset, with a higher percentage of uncontested pixels than controversial pixels, which implies that there was a consensus among the three detectors for 70.90% of the pixels on average without the application of any additional technique. The remaining 29.10% were pixels with no consensus among the detectors that had to be reclassified. Two techniques were applied to perform the reclassification of the controversial pixels: OR fusion, which reclassified all of the controversial pixels as change, and MV fusion, shown in Figure 14b, which reclassified on average 56.93% of the controversial pixels as no change and the rest as change.
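The uncontested/controversial split and the two reclassification rules can be sketched as follows; this is a minimal NumPy illustration with three hypothetical detector outputs, not the paper's implementation.

```python
import numpy as np

def consensus_fusion(maps, rule="OR"):
    """Consensus over several binary change maps. Pixels where all
    detectors agree are uncontested and keep their label; controversial
    pixels are reclassified as change (OR rule) or given the majority
    label (MV rule). Returns the fused map and a controversial mask."""
    stack = np.stack(maps).astype(bool)
    votes = stack.sum(axis=0)
    uncontested = (votes == 0) | (votes == len(maps))
    if rule == "OR":
        fused = votes > 0               # any detector says change
    else:                               # majority vote
        fused = votes > len(maps) / 2
    return fused, ~uncontested

# Three hypothetical detectors: pixels 0-1 uncontested, 2-3 controversial.
a = np.array([1, 0, 1, 0], dtype=bool)
b = np.array([1, 0, 0, 1], dtype=bool)
c = np.array([1, 0, 1, 0], dtype=bool)
or_map, controversial = consensus_fusion([a, b, c], rule="OR")
mv_map, _ = consensus_fusion([a, b, c], rule="MV")
```

The two rules only differ on the controversial pixels: OR sends them all to the change class (maximising CP), while MV follows the 2-of-3 majority (favouring NCA), exactly the trade-off discussed above.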
Figure 15 shows the binary change maps for each dataset obtained at each stage of the proposed technique: the multi-scale data fusion of each segmentation algorithm and, finally, the consensus decision fusion of all of the segmentation algorithms used. The multi-scale data fusion columns are the result of merging the difference maps (data-level fusion) of each single-scale segmentation with the ED fusion technique for the indicated segmentation algorithm. The consensus decision fusion maps are the result of merging the multi-scale maps shown in the previous columns using the indicated fusion technique (MV or OR). Finally, white pixels denote the changes correctly detected; magenta pixels correspond to pixels wrongly classified as a change (FP); yellow pixels indicate the changes not detected by the algorithm (FN). Note that the number of changes not correctly detected (yellow pixels) by the individual segmentation-based detectors (first four images in each row) for the Oitavén and Ermidas examples decreased when all of the segmentation algorithms were considered (last two columns of the figure), while the number of pixels wrongly classified as change (in magenta) did not increase after fusing the results of the different detectors.
Figure 16 shows that the two initial hypotheses were fulfilled: first, the multi-scale approach detected more changes than a single-scale approach; second, the consensus between different detectors increased the CP with respect to a single detector. The results shown in Figure 16 are, for the single-scale approach, the average of the CP values obtained with the three different segmentations; for the multi-scale approach, the average of the three multi-scale detectors with ED fusion; and, for consensus, the OR fusion result.
The proposed technique was compared to related works on change detection applied to vegetation or building CD datasets using the F-score measure, which was the most interesting for this study since it assigns a higher weight to CP than to CR.
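The subscript of the F-score is not preserved in this excerpt; since the text states that it weights CP (recall) more heavily than CR, it is presumably a weighted F-beta score with beta > 1. A sketch with an assumed beta = 2:

```python
def f_beta(precision, recall, beta=2.0):
    """Weighted F-score: beta > 1 weights recall (CP) more heavily than
    precision (CR). beta = 2 is an assumption; the exact weight used in
    the paper is not stated in this excerpt."""
    return ((1 + beta ** 2) * precision * recall
            / (beta ** 2 * precision + recall))

# Swapping precision and recall shows the asymmetry: the recall-heavy
# configuration scores higher for the same pair of values.
high_recall = f_beta(precision=0.8, recall=0.9)
high_precision = f_beta(precision=0.9, recall=0.8)
```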
Table 7 shows the accuracies obtained in terms of CP, NCA, and F-score for each of the techniques in the literature and the proposed method, with the best results in each category (CP, NCA, and F-score) for each dataset shown in bold. The first technique in Table 7, DS [40], performed multi-algorithm change detection to calculate a binary change map using three pixel-level detection algorithms: CVA, IRMAD, and PCA. It then fused these algorithms using a segmentation map and the Dempster–Shafer algorithm. The second method, MCVA [39], performed change detection based on feature extraction using morphological profiles, varying the scale of the structural element and merging the results at the data level using CVA. Finally, the last method, KPVD [86], performed object-based change detection using a single segmentation scale with a proposed key point vector distance (KPVD). Previously, two of these methods (MCVA and KPVD) were applied to the VHR multispectral vegetation dataset [39,86], while DS was applied to detect changes in the VHR multispectral buildings dataset. These methods were chosen because they use techniques related to the one proposed in this paper: DS [40] proposes a multi-algorithm approach with the use of consensus techniques and different segmentation algorithms; MCVA [39] proposes extracting spatial information at different scales using morphological profiles; and KPVD [86] proposes a spatial information extraction technique based on segments. From Table 7, it can be observed that the best results in terms of CP and F-score were obtained by the technique proposed in this paper. As shown in Table 7, DS, MCVA, and KPVD achieved high values of non-change accuracy (NCA) and relatively low values of completeness (CP) and F-score, possibly because these methods focus on increasing the accuracy in the detection of non-changed pixels (NCA), while our proposal prioritises the accuracy in the detection of changed pixels (CP).
Finally, Figure 17 illustrates the comparative results by displaying the binary change maps obtained by the selected methods, showing that the proposed method obtained lower values of false negatives (FNs) than the related methods for all the datasets. This observation is consistent with the CP values shown in Table 7.
4. Discussion
The main contribution of this paper is the proposed method for binary change detection over medium-resolution and VHR multispectral and hyperspectral images for land cover vegetation applications, based on the extraction of object-based features through multi-scale detection. The method uses several detectors, each one built over a segmentation algorithm applied at different scales. As changes in vegetation present high variability depending on the capture conditions, such as illumination, CVA using the SAM distance (called CVA-SAM in this paper) was applied at the segment level. The quality of the proposed method was evaluated using different configurations and metrics and compared to other similar reported methods, showing that the proposed method achieved high accuracy in terms of recall (CP) for every single level of segmentation and after merging the results of each multi-scale segmentation using consensus techniques. CP is the preferred metric for evaluation, as the accuracy in the detection of changed pixels is prioritised over the accuracy in the detection of non-changed pixels.
Section 3 highlights different features of the proposed method that significantly contribute to the change detection results obtained. Firstly, the use of object-based features and multi-scale segmentation allowed adequate exploitation of the spatial information contained in the images and the identification of changes at different granularity levels. Secondly, the use of several detectors based on different segmentation algorithms improved the accuracy of the change detection process over the use of a single segmentation algorithm. Finally, the use of CVA-SAM applied at the segment level instead of at the pixel level improved the robustness of the approach to variations in the images, such as those produced by illumination conditions.
Moreover, the proposed method can detect changes between different vegetation types with better results than those reported in the literature. The proposed method was particularly effective for detecting changes in vegetation due to several characteristics. The segmentation-based approach averages the spectral characteristics of the vegetation, taking into account the growth stage, variability in canopy structure, shading, lighting, and so on. In addition, the CVA-SAM, by using angle instead of magnitude, was more robust to environmental or atmospheric changes, e.g., in lighting, while the multi-scale segmentation made the method more adaptable to changes of different granularities, such as those occurring in vegetation. Finally, the use of consensus techniques allowed the application of different segmentation methods to extract different information from the segments, as discussed above.
Regarding the limitations of the method, the computational cost of the technique in terms of execution time is high, as several segmentation algorithms at different scales are computed. However, the computation of changes on a segment basis instead of a pixel basis reduces the computational cost, with the CVA-SAM computed using a representative pixel vector for each segment of the image. The need for user intervention in the selection of the parameters could be considered a strong limitation: several parameters that affect the accuracy of the results must be selected, such as the number of segmentation levels and the parameters of the segmentation algorithms (e.g., segment size). Nevertheless, the impact of the parameter selection on the detection results was small; for example, if the scaling parameter was changed by ±10%, the differences in CP accuracy varied by less than 2 percentage points. Regarding the number of segmentation levels, varying the number of levels between 3 and 9 produced CP values that only varied by ±1.5 pp on average compared to the use of 3 levels. To alleviate the burden of parameter tuning for users, the parameters could be determined based on the characteristics of the sensor used to capture the images.
Future work will involve an optimisation algorithm to configure the parameters of the method, such as the number of levels, initial scales, and steps between scales, as a function of the input images. Furthermore, the incorporation of additional context information available in the VHR images, for example through textures, will be considered. In addition, some improvements could be applied to the computation of the difference images by adapting the CVA-SAM to the spectral patterns of variability observed in vegetation images. Finally, as computational efficiency is also relevant, an exhaustive analysis of the performance in terms of execution time and required computational resources is necessary, as well as the use of hardware accelerators to reduce the execution time while maintaining correctness and accuracy.
5. Conclusions
This paper proposed an unsupervised binary change detection technique using multi-scale segmentation and merging by consensus. The technique was adapted to detect changes in multispectral and hyperspectral vegetation images, in particular VHR images, with changes detected at the level of objects and priority given to minimising the undetected changes. The use of multi-scale and consensus techniques allowed the detection of changes at different granularity levels, taking advantage of the high spatial information provided by VHR images. The CVA-SAM algorithm was applied at the level of the uniform regions produced by the segmentation algorithms. The different stages of the method were analysed in terms of accuracy, and the final results were compared to those of other reported techniques.
The proposed method was effective for identifying vegetation changes, as the use of multi-scale segments instead of pixels allowed adapting to the granularity of the changes, which is especially important for irregular changes, for example in forests. The extraction of spatial information is essential for discerning various vegetation types. Secondly, given that vegetation changes can exhibit considerable variability depending on the image capture conditions, such as illumination, this variability was addressed by applying the CVA-SAM algorithm at the segment level. Lastly, the incorporation of a consensus approach among different multi-scale detectors maximises the number of changes detected when using different spatial information extraction techniques, resulting in an efficient and robust technique.
Five change detection datasets consisting of multispectral and hyperspectral images with vegetation changes were used, two of them being multispectral VHR images of rivers in Galicia, concluding that the use of multi-scale segmentation improved the CP results compared to a single-scale version. Additionally, the incorporation of consensus techniques between different multi-scale detectors obtained more accurate results. For example, in the case of the Oitavén dataset, a CP of 97.60% was obtained compared to an average of 89.15% in the single-scale approach. The best method for scale fusion was ED fusion, which maximised the number of changes detected by the technique, and OR fusion improved the accuracy with respect to MV.
The proposed technique also improved, in terms of accuracy and F-score, on the results obtained by the other solutions proposed in the literature, obtaining improvements of +18.90 pp in CP (recall) and of +19.61 pp on average in the case of the VHR datasets. The F-score metric obtained improvements of +12.91 pp on average.