Variational-Scale Segmentation for Multispectral Remote-Sensing Images Using Spectral Indices

: Many studies have focused on performing variational-scale segmentation to represent various geographical objects in high-resolution remote-sensing images. However, it remains a signiﬁcant challenge to select the most appropriate scales based on the geographical-distribution characteristics of ground objects. In this study, we propose a variational-scale multispectral remote-sensing image segmentation method using spectral indices. Real scenes in remote-sensing images contain different types of land cover with different scales. Therefore, it is difﬁcult to segment images optimally based on the scales of different ground objects. To guarantee image segmentation of ground objects with their own scale information, spectral indices that can be used to enhance some types of land cover, such as green cover and water bodies, were introduced into marker generation for the watershed transformation. First, a vector ﬁeld model was used to determine the gradient of a multispectral remote-sensing image, and a marker was generated from the gradient. Second, appropriate spectral indices were selected, and the kernel density estimation was used to generate spectral-index marker images based on the analysis of spectral indices. Third, a series of mathematical morphology operations were used to obtain a combined marker image from the gradient and the spectral index markers. Finally, the watershed transformation was used for image segmentation. In a segmentation experiment, an optimal threshold for the spectral-index-marker generation method was identiﬁed. Additionally, the inﬂuence of the scale parameter was analyzed in a segmentation experiment based on a ﬁve-subset dataset. The comparative results for the proposed method, the commonly used watershed segmentation method, and the multiresolution segmentation method demonstrate that the proposed method yielded multispectral remote-sensing images with much better performance than the other methods.


Introduction
Based on the development of remote-sensing technology, high-spatial-resolution remotely sensed images, such as IKONOS, Quickbird, GeoEye-I, and WorldView, are available for use in environmental monitoring, management, and protection works. High spatial resolution facilitates the retrieval of the structural details of geographical objects for land cover/use mapping and monitoring. However, high spatial resolution can generate saltand-pepper noise effects in pixel-based image-analysis methods. Therefore, the object-based image analysis (OBIA) technique, which provides information on images based on meaningful objects, was proposed for high-spatial-resolution remote-sensing image analysis. The main prerequisite for OBIA is image segmentation, which has already been recognized as a valuable approach for identifying regions instead of pixels as feature carriers, which are then used for classification [1][2][3][4][5][6][7]. In addition to the use of remote-sensing data with high spatial resolution, the OBIA technique has also been used for gathering remote-sensing images with lower spatial resolutions, such as ASTER and TM data [8][9][10][11][12].
The goal of an image segmentation algorithm is to divide an image into meaningful separate regions that are homogeneous with respect to one or more properties, such as texture, color, or brightness [13]. These properties fall into four categories: characteristic feature thresholds or clusters, edge detection, region growing or extraction, and iterative pixel classification [14]. An image-segmentation algorithm may belong to two or more of these categories. Many image-segmentation algorithms have been proposed in the field of computer vision over the past several decades. There are also many applications of image segmentation in remote sensing, such as watershed transformation [15,16], region growing [17][18][19], Markov random field models [20,21], and fuzzy image regions [22].
However, a major difficulty in processing natural images is that changes can and do occur over a wide range of scales [23,24]. Remote-sensing images are used to represent the natural geographical world. Therefore, it is difficult to obtain optimal image-segmentation results according to the different scales of ground objects [25]. The multi-scale segmentation strategy is widely used to handle the difficulty of wide scale ranges [26], but the automatic selection of optimal segmentation scales for successive analysis remains a significant challenge [27] for which unsupervised segmentation evaluation methods are widely adopted [28][29][30].
Originally, many efforts were made to select a single optimal segmentation scale by combining intra-segment homogeneity and inter-segment heterogeneity [31][32][33][34][35] or by considering abrupt changes in homogeneity in terms of all segments [36][37][38]. However, global optimal segmentation scales still contain segments that are either too coarse or too fine because segmentation based on a single global scale parameter makes it difficult to separate various geographical objects [33].
To overcome this problem, the concept of deriving locally adaptive scale parameters has received significant attention in recent years with the goal of selecting optimal remote-sensing image segmentation scale parameters for different regions or objects [39]. There are two main types of automatic methods for determining local scale parameters. One is to tune a global optimal scale parameter locally according to the heterogeneity of local structures [40][41][42][43]. The other is to first partition an image into different regions or landscapes and then determine the optimal segmentation scale for each region using a global evaluation measure [44,45]. Overall, the segmentation performance of such methods depends on the effectiveness of the global evaluation measure. Additionally, locally optimized scale parameters are mainly designed for different objects without considering the spectral and spatial characteristics of different land-cover categories. We propose the utilization of spectral indexes for determining the optimal segmentation scales of different land-cover categories in the watershed transformation framework.
Because spectral indices from multispectral remote-sensing data such as the normalized difference vegetation index (NDVI) and the normalized difference water index (NDWI), which are used to enhance the information of some specific ground objects, may not be influenced by the interior textures of geographical objects and can maintain consistency between segmentation results for agminated land cover (e.g., vegetation, water, and snow), as well as the outlines of real ground objects in a natural scene, they can be used to generate markers for variational-scale segmentation. The watershed transformation [46,47] is a widely used method in which the quality of the markers directly determines the segmentation scales. Therefore, final segmentation quality depends on marker generation. If markers can be generated according to the self-scales of geographical objects, then the watershed transformation can be used to obtain variational-scale segmentation results for geographic objects. Spectral-index-based markers can also help to overcome the phenomenon of over-segmentation because they are sensitive to irrelevant local minima generated by gradients. Overall, remote-sensing images can provide more-complex scenes for watershed-transformation-based segmentation compared to other images, such as medical, finger, eye, and facial images.
In this study, the main goal was to use the spectral indices of multispectral remotesensing images to develop a scale-variable watershed method for multispectral remote-sensing image segmentation based on watershed transformation. First, multispectral gradient-based markers were obtained using a vector field model (VFM) for multispectral remote-sensing images. Second, a novel marker generation method was applied to determine corresponding spectral indices based on kernel density estimation (KDE). Third, mathematical morphology was used to integrate gradient information and spectral indices into a single combined marker. Next, the final combined marker was used with the watershed transformation method for multispectral remote-sensing image segmentation. Finally, high-spatial-resolution multispectral remote-sensing datasets were used as experimental data to validate the proposed segmentation method.
The remainder of this article is organized as follows. Section 2 discusses marker generation for a multispectral gradient using a vector field model, marker generation using spectral indices, and image segmentation based on a combination of markers from gradient and spectral indices. In Section 3, we introduce experimental data, evaluation measures, and our experimental setup. Section 4 describes experiments on two high-resolution multispectral remote-sensing images. The segmentation results and gradient-marker-controlled watershed transformation for each image are compared. Section 5 summarizes our results and presents conclusions.

Study Area and Data
The study area is located in Nanjing City, Jiangsu Province and Wanning City, Hainan Province, China, as shown in Figure 1. As one of the four garden cities in China, Nanjing City has an abundance of urban green spaces and water bodies. Wanning City has a tropical monsoon marine climate with a large amount of vegetative cover and water bodies. Additionally, because China has experienced rapid economic development over the past 40 y, there are many man-made structures mixed with natural landscapes in the study area, providing good study samples for our experiments. An IKONOS image covering an area of 108 km 2 over Nanjing City and a GeoEye-I image covering an area 49 km 2 in Wanning city were used as our experimental dataset. These two images consist of four spectral bands: blue, green, red, and near-infrared. The spatial resolutions of the multispectral bands and panchromatic band of the IKONOS image are 4.0 m and 1.0 m, respectively. A subset image called D1 with a 1.08 km × 1.37 km area was used to represent the segmentation results. There is significant vegetative cover, watery areas, and built-up land cover in the IKONOS image. The spatial resolution of the multispectral bands and panchromatic band of the GeoEye-I image are 1.64 m and 0.41 m, respectively. A subset image called D2 with a 0.9 km × 0.66 km area was used to represent the segmentation results. The GeoEye-1 image mainly contains vegetative cover with smaller areas of residential neighborhoods and watery cover. Additionally, the vegetative cover can be divided into farmland, shrubbery, and forestry, which are suitable categories for our experiments.

Overview
We used the gradient and spectral index markers for scale-variable watershed segmentation. The marker-controlled watershed transformation proceeds by using an automatic thresholding technique and a mathematical morphology method based on spectral indices and gradient images. Two marker images were generated from the gradient and spectral indices. The two marker images were then combined into one marker image for watershed transformation segmentation. Therefore, the key step in our method was to generate a marker image that integrates the gradient marker and spectral indices appropriately. The procedure for the proposed method is shown in Figure 2. (1) Obtain multispectral edge strength from the fusion of panchromatic and multispectral bands using the Canny method and vector field model.
(2) Generate a marker image for the gradient by using the method proposed by Hill et al. [48], which is referred to as the Hill-marker method for convenience.
(3) Choose spectral indices based on land cover in the target remote-sensing image.
(4) Generate a marker image based on a histogram of spectral indices.
(5) Derive a final marker from the gradient and spectral index markers using intersection, erosion, thinning, and union operations from mathematical morphology.

Marker Generation from Multispectral Gradient using the Vector Field Model
The panchromatic and multispectral bands are fused to obtain a multispectral remotesensing image with a higher spatial resolution. The first fundamental form based on the vector field model [49] is used to derive a multispectral gradient using the Canny method. Let I(x, y) be a multispectral image in the form of a vector field with bands I i (x, y) and i = 1, · · · , n. The value of I at any given point (x 0 , y 0 ) is an n-dimensional vector. The gradient of I i (x, y) in the ith band can be written as and its squared norm (i.e., the first fundamental form) is where the eigenvectors of the 2 × 2 matrix G = G xx G xy G xy G yy can be used to obtain the directions of the maximal and minimal changes. Therefore, the eigenvalues of the matrix G can be represented by the gradient of the image, and the eigenvectors of G can be used to determine the edge direction. By using elementary algebra calculations, the maximum and minimum eigenvalues can be defined as follows: The eigenvectors are (cos θ±, sin θ±), where the angles θ± are given by and Sapiro and Ringach [49] suggested that the gradient of a multispectral remote-sensing image should not be represented by the maximal value among the eigenvalues λ + but by how λ + compares to λ − . Therefore, the multispectral gradient can be defined as After obtaining the gradient of a multi-spectral remote-sensing image, we applied the moving threshold method proposed by Hill et al. [48] to generate a marker image M g .
Here, we employed these spectral indices to generate a marker for the target objects in a remote-sensing image. To ensure that ground objects with complex internal textures can be segmented into a single region, it is important to apply a thresholding technique to obtain spectral index markers. Many unimodal thresholding techniques have been proposed in previous studies [63,64]. However, the goal of our study was not to retrieve accurate information associated with an object using spectral indices (e.g., water bodies can be segmented using the NDWI) but to obtain a marker image from spectral indices for watershed segmentation.
Here, we used kernel density estimation (KDE) to obtain a marker image. Most thresholding techniques are based on histograms. However, there are many local maxima and minima that can have a negative effect on determining a threshold. Unlike a histogram, KDE can represent the overall trend of the grayscale distribution in an image.
Given that x 1 , x 2 , x 3 , . . . , x n ∈ R d is a random sample from a distribution F with density f (x), the kernel density estimate of f (x) is given bŷ where k(x, x i ) is the kernel function. Because the probability density function of the spectral index commonly contains one or more peaks, we used the following Gaussian function as a kernel: In Figure 3, we present an example to analyze the differences between the KDE and the histogram of an image. One can see that there are many local extrema in the histogram in Figure 3a. Therefore, we cannot easily identify useful local extrema, even though the trend of the curve can be visibly identified. In the KDE results in Figure 3b, the overall trend is clear and free of noise, so local extrema can be located correctly.  Because spectral indices are mainly used to enhance the information of the corresponding ground objects, a higher value indicates a higher probability of being the corresponding object. Additionally, the grayscale distribution of ground objects is typically normally distributed. Therefore, information on ground objects is relative to the last peak. To identify the last peak, we must find the accurate location of the peak (point A in Figure 3b) and the trough (point B in Figure 3b) on the left of the last peak. The peak A (x = T peak ) and trough B (x = T trough ) points can be obtained as It is necessary to combine the markers from the gradient and spectral indices into a single marker image to perform watershed transformation segmentation efficiently. Two main factors should be considered when combining these markers. First, we must ensure that the spectral indices can be used to maintain the original scales of ground objects in the segmentation results. Second, the gradient information must be unaffected so that land cover can be segmented without the representation of specific spectral indices. Therefore, we must find a way to satisfy these two conditions. Here, mathematical morphology was used to reconstruct a marker image by combining the gradient and spectral indices.
Assume that M g and M s(i) are marker images from the gradient and ith spectral index, respectively. The inflow for the combined marker-image-based spectral indices and gradient markers is presented in Figure 4. and where sign() is the signum function.
Then, the optimal threshold should be chosen from the interval [T trough , T peak ].

Gradient marker image
Spectral index marker image

Segmentation via Combination of Markers from Spectral Indices and Gradient Markers
First, we obtained the intersection of the gradient and spectral index marker images as follows: Second, the gradient marker minus spectral index marker was defined as where B is the structuring element. We then used the intersection g as a marker and M s(i) as a mask to reconstruct a new marker. Here, the reconstruction of M s(i) from g, denoted as RM gs(i) , was defined as This reconstruction step was iterated to maintain the connectivity of the marker image while transforming the spectral index markers into one-pixel-wide lines.
Finally, the final Marker M was defined as where n is the number of spectral indices based on the land cover distribution. The combined marker from the gradient and spectral index markers can be obtained using Algorithm 1.
Obtain the intersection g using Equation (11) 8: Define the structuring element B 9: Obtain M g−rest (i) using Equation (12) 10: while Area(h k+1 ) < Area(h k ) do 11: Obtain the reconstruction h k+1 (i) of M s(i) from g using Equation (13) 12: end while 13: Obtain the reconstruction RM = RM + h k+1 (i) 14: end for 15: Get the combined marker image M using Equation (14) 16: return CM Finally, the minima imposition technique [65] was employed for the watershed transformation based on the final marker image.

Performance Evaluation
Precision, recall, and F-measure [66] were used to evaluate the performance of edge detection using different algorithms. S 1 and S 2 denote the source and target edge pixels, respectively. Matching was defined as true when the neighborhood of an edge pixel b i in S 1 includes the edge pixel b j in S 2 and there is no pixel b x between b i and b j . The neighborhood distance t d was set to three pixels in this study. Precision and recall were defined as follows: where Match(S 1 , S 2 ) is the number of edge pixels in S 1 matched with S 2 , Match(S 2 , S 1 ) is the number of edge pixels in S 2 matched with S 1 , and |S 1 | and |S 2 | are the numbers of edge pixels in S 1 and S 2 , respectively. According to Equations (23) and (24), the F-measure, which is the weighted harmonic mean of precision and recall, is defined as where a = 0.5 is a given parameter.

Influence of the Threshold for Spectral Indices on Segmentation
The five subsets from images D1 and D2 shown in Figure 5 were used to analyze the influence of the threshold for spectral indices on image segmentation. Excluding the parameter of minsize in the Hill-marker method [48], when obtaining a marker image of a gradient, the range [T trough , T peak ] is the only parameter for the proposed method. The influence of this parameter on the segmentation results of the subsets was evaluated with the goal of determining an optimal setting. The ranges [T trough , T peak ] of the spectral indices of the images calculated using Equations (9) and (10) are listed in Table 1.  To illustrate the influence of the threshold parameter of the spectral index for marker generation, given the parameter value of minsize = 10, segmentation results for four different threshold values applied to generate spectral index markers based on NDVI and NDWI are presented in Figures 6 and 7, respectively. Additionally, Figure 8 presents watershed segmentation results for the Hill-marker method, where the parameter of minsize was also set to 10, to demonstrate the superiority of the proposed method.
First, as shown in Figure 6 for NDVI, the segmentation results for vegetable cover tend to contain more segments as the threshold value increases for all five subset images. Regardless, compared to the Hill-marker watershed segmentation results in Figure 8, one can see that green objects identified by the proposed method with the NDVI marker were segmented better than those identified by the Hill-marker method, particularly when the threshold value of the spectral index was close to T trough . Consider the segmentation results of S1, S2, and S4 as examples for further analysis. One can see that the segmentation results for green cover in Figure 8a,b,d contain large numbers of fragments that are adjacent and that all belong to the green cover. This phenomenon was less apparent when using the proposed method.
Second, the shadows of ground objects generally have negative effects on segmentation and were segmented into a different region in Figure 8c. In contrast, in Figure 6, the green cover and its shadows were segmented into the same region. The shadows of the green objects are highlighted with white circles. Additionally, various adjacent vegetative covers can be segmented into the same region. For example, the green cover in image S3 consists of both shrubs and forest trees. In the segmentation results of the Hill-marker method, there are boundaries between these different types of plants. In contrast, they are segmented into the same region in Figure 6i,j when using the proposed method.    Regarding the segmentation based on the NDWI marker image in Figure 7, one can see similar results compared to the NDVI marker image. When the threshold value increases, the number of segmentation regions tends to increase. However, unlike the comparison between the segmentation results in Figures 7 and 8, the proposed method based on NDWI markers alone is not superior to the Hill-marker method. The result for image S5 is inferior to that generated by the Hill-marker method because the water pool on the right cannot be segmented accurately, and its boundary is visibly separated from the real boundary. This is because the internal spectral response difference for the water cover region is high. Therefore, if there is more than one region of water in an experimental image with multiple peaks corresponding to multiple water cover types, the threshold of NDWI for marker generation is difficult to determine.

Automatic Selection of Optimal Segmentation Regions
As shown in Figures 6 and 7, the selection of a threshold for the spectral indices [T trough , T peak ] has a distinct influence on the final segmentation results. It is necessary to select parameters to obtain optimal segmentation results automatically, where segmentation can accurately reflect the scales of ground objects in remote-sensing images. For watershed transformation segmentation, more markers generate more segments. Therefore, the threshold for the spectral index [T trough , T peak ], which determines the number of markers generated from the spectral index, is key to the performance of segmentation.
When the threshold is T peak , the number of segments may increase when uncorrelated ground objects are segmented. This is because correctly joined ground objects may be divided into several disjointed regions with different internal textures. When the threshold is T trough , the number of segments may also increase. As shown in Figure 9, we simulated a onedimensional spectral response to explain this phenomenon. One can see that the segment number may decrease and then increase as the threshold increases from T trough to T peak . Therefore, we must identify the most suitable threshold for generating marker images.  Figure 10 presents the numbers of segmentation regions when the threshold was selected from the range of [T trough , T peak ] for the subset images. Figure 10a-e are based on using the NDVI for S1 to S5, respectively. Figure 10f-h are based on using the NDWI marker for S2, S3, and S5, respectively. One can see that most of the curves generally tend to increase. However, once can also see trends of small decreases (Figure 10b,d,h), fluctuations (Figure 10a,c,f), or unchanged values (Figure 10e,g). When the curve visibly increases, it indicates that the green or water covers tend to be segmented into more fragments. Therefore, we must identify the minimum number of segmentations to guarantee that there are no (or few) adjacent segmented regions of green or water covers in the results. When [T trough , T peak ] is obtained based on spectral indices from remote-sensing data, spectral markers can be obtained in Algorithm 2: Algorithm 2 Obtaining the optimal thresholds of spectral indices for generating marker images.

Input:
Multispectral band image T Output: Spectral index marker image M s 1: Obtain the spectral index image I s 2: Obtain the kernel density estimate KDE s of I s 3: Find the T trough and T peak using Equations (9) and (10) 4: for i = T trough to T peak do 5: Thresholding spectral index image T i , where the gray scale of T i is larger than i Get the number of connected regions Num i 8: end for 9: Obtain the minimum Min num of Num i 10: Get the best threshold T best = argmax(Num i == Min num ) 11: return Spectral index marker image M s = T > T best Figure 11 presents the optimal segmentations for the five subsets of data, as well as expressive segmentation results. First, the green covers in all subsets were segmented into one region. Land covers with different scales, such as buildings and green covers, can be segmented into different regions. For example, there is a significant difference in the scale of buildup in S1, S4, and S5. However, these buildups can be segmented with little influence from the parameter of the spectral index threshold. Additionally, the water bodies in the images were also segmented accurately. Therefore, the proposed method for automatically selecting the threshold for spectral indices can obtain expressive segmentation results.
(a) S1 (e) S5 Figure 11. The optimal image segmentation by automatically selecting the threshold of spectral indices' KDE.

Influence of the Scale Parameter
Based on the automatic selection of the threshold parameter, the segmentation results for different scale parameters are presented in Figure 12 to analyze the influence of the this parameter comparatively. Here, the scale parameter (scale) of the watershed transformation was defined within the interval of [5,1000] in increments of five. We can see that all the curves of F-measure using the SI-marker are superior to those using the Hill-marker. In the following section, we present our analysis in detail.
In Figure 12a, the values of F-measure first increase and then slightly decrease as the scale parameter increases. The highest F-measure reaches its maximum value when the scale parameters are 525, 530, and 535. The reason for the values of F-measure being low in the beginning is that there were still some fragments of green cover in the segmentation results.
For dataset S2, the best performance (F-measure = 0.7339) was achieved when the scale parameter was relatively small at 90. The SI-marker method is distinctly superior to the Hill-marker method.
In Figure 12c, the curve of F-measure using the SI-marker method was significantly higher than that using the Hill-marker method. The main reason for this result is that the precision of the SI-marker method is much higher than that of the Hill-marker method. This means that the boundaries of the segments have high congruency with the real boundaries between ground objects. The highest F-measure appeared in the range of [180,240], which is relatively low.
As shown in Figure 12d, the curve of F-measure using the SI-marker method was not higher than that using the Hill-marker method for all scale parameters. The SI-marker method can achieve the best performance (F-measure = 0.6623) when the scale parameter is five. However, as long as the scale parameter is less than 250, the SI-marker method is superior to the Hill-marker method. Figure 12e reveals that the SI-marker method is generally superior to the Hill-marker method. The best performance when using the SI-marker method occurs when the scale parameter is five. The main advantage of the SI-marker method is that the accuracy of the boundaries detected by the SI-marker method is higher than that detected by the Hill-marker method. By comparing the five curves, one can conclude that the performance of the SI-marker method is visibly and consistently superior to that of the Hill-marker method when the scale is small. Additionally, the precision of the SI-marker method was always higher than that of the Hill-marker method, which indicates that there were more accurate segmentation boundaries. Furthermore, there were many fragments in the segmentation results, so the recall for some datasets when using the Hill-marker method was higher than that when using the SI-marker method, particularly when the scale parameter was small for S2 and S4.
To analyze the optimal scale parameter for our proposed method further, local variance was used as a tool to explore the optimal scale. Given an image I, the local variance can be defined as where VarI i,j is the local mean of the image. Given that η i,j is a weighted neighborhood centered on a pixel, the local variance can be rewritten as The mean of the local variance µv I was estimated as where M and N represent the size of the image I. We used the µv I to analyze the relationship between the local variance and the optimal scale parameter. Figure 13 plots the relationship between the scale and the corresponding local variance of regions marked by NDVI for each subset image to reflect the influence of local variance. Green objects, which represent a high proportion of land cover in the datasets, are important for evaluating the performance of segmentation. The weighted neighborhood size was drawn from the interval of [3,19] with a step size of two. Additionally, the curves of the fitting function represent the fitting results for the point sets of {S1, S2, S3, S4, and S5}.
First, one can see that the relationship between the optimal scale and the local variance was stable. The relative positions of the five datasets in Figure 13 remained almost unchanged, even though the size of the neighborhood window η was different for each subfigure. Second, one can see that the optimal scale for these datasets increases with increasing local variance, except for S3. For S3, the segmentation accuracy remained high when the scale parameter was low, even though the optimal scale and corresponding local variance do not match the first-order fitting curve. Figure 13. The relationship between the scale and corresponding local variance of regions marked by NDVI for subset images of S1, S2, S3, S4, and S5 under different weighted neighborhood sizes η from the interval of [3,19].

Comparision
To highlight the effectiveness of the proposed method, datasets D1 and D2, where there was large-scale green cover and some water bodies, were used as experimental datasets for comparative segmentation analysis. In this segmentation experiment, the proposed method was compared to the Hill-marker watershed segmentation and the multiresolution segmentation method (MRS) [17] embedded in the eCognition Developer software. The scale parameters for the SI-marker and the Hill-marker watershed segmentation methods were selected from the interval [10, 500] with a step size of 10. For the MRS segmentation method, the scale parameter was selected from 20, 40, 60, 80 and the range of [100, 1000] with a step size of 10. The shape parameter was selected from 0.2, 0.5, 0.8, and the compact parameter was set to the default value of 0.5. Figure 14 presents the curves of F-measure, precision, and recall with different numbers of segmentation regions using different methods.
First, precision decreased as the number of segmentation regions increased for all methods. However, the precision of the proposed method based on the SI-marker was superior to that of the other methods. The recall curve in Figure 14b reveals that the Hillmarker watershed segmentation method performed better than the SI-marker watershed segmentation method when the number of segmentation regions was small. When the number of segmentation regions was greater than 500, the recall of the proposed method was visibly higher than that of the other methods, particularly when the recall of the Hillmarker method decreased sharply. Third, regarding the F-measure values of all methods, the proposed method exhibited the best performance for different numbers of segmentation regions. Additionally, the F-measure of SI-marker watershed segmentation increased and then decreased with the number of segmentation regions with a maximum value near 0.8.   Figure 15 presents the best segmentation results for the SI-marker, and Hill-marker, and MRS methods on datasets D1 and D2. Segmentation by the SI-marker watershed transformation clearly provided better results than the other methods. In particular, the green objects identified by the SI-marker watershed transformation were mostly segmented into one region, even though the spectral information of the green objects was not consistent. In the segmentation results of the Hill-marker watershed method, there was significant over-segmentation. The MRS method also yielded some over-segmentation. Regarding the segmentation results for water body cover, the SI-marker watershed method was superior to the MRS method. To illustrate the effectiveness of the SI-marker watershed method, we selected nine regions for further analysis. To illustrate the details of segmentation clearly, Figures 16 and 17 present the segmentation results for regions A, B, C, D, E, F, G, and I using the three methods. First, in regions A, B, D, E, and F, where the spectral information of green objects contained visible inconsistency, the SI-marker watershed transformation method obtained a single region, whereas the Hill-marker watershed transformation and MRS methods separated each region into several segments. Second, because the scales of the best segmentation results for the Hill-marker watershed transformation and MRS methods were large, many small-scale objects were ignored and incorrectly segmented into other ground objects. However, an object with a small scale can be identified using the proposed method. For example, the bare area in region F, small-scale green objects in regions G and H, two buildings in region I, and many buildings in regions A and B were maintained as one segment. Third, in addition to green objects, water bodies can also be incorrectly segmented. For example, the water bodies were missed in the segmentation results for regions A, B, C, E, and G for the Hill-marker watershed transformation and MRS methods. In contrast, the SI-marker watershed method guaranteed that green and water objects were segmented properly and that objects of different scales can maintain good segmentation performance.

Conclusions
In this study, we proposed a novel variational-scale multispectral image-segmentation method using spectral indices. Because geographical objects on the Earth's surface have different scales, remote-sensing images should be segmented according to the different scales of ground objects. We found that the spectral indices that are primarily used to enhance the information of ground objects can ignore the scale problem of geographical objects. Therefore, we introduced spectral indices into marker generation for marker-controlled watershed transformation segmentation. Spectral indices were used to generate markers for watershed segmentation based on KDE. Next, a gradient based on the Canny method and VFM was used to generate markers using Hill's method [48]. Reconstruction based on mathematical morphology was then used to combine the gradient and spectral index markers. Additionally, the automatic selection of a spectral index threshold was proposed for optimal variational-scale watershed segmentation based on mathematical morphology.
Two multispectral high-resolution remote-sensing images were used to validate the proposed remote-sensing image-segmentation method. In an image segmentation experiment, we compared our method to the multi-scale watershed transformation method proposed in [48] and the MRS method embedded in the eCognition Developer software. The F-measure, precision, and recall measures proposed in [66] were used to evaluate the segmentation results. Additionally, we analyzed the influence of the scale parameter on image segmentation based on seven subsets of remote-sensing images from datasets D1 and D2. The results revealed that the proposed method provides improved performance compared to the Hill-marker watershed transformation and MRS methods. We can draw the following conclusions: (1) The F-measure revealed that the proposed method produces more accurate segmentation results compared to the watershed transformation method from [48] and the MRS [17] method for varying numbers of segments.
(2) The proposed segmentation method can visually and consistently maintain the original scales of ground objects in image-segmentation results at different segmentation scales, particularly for green land cover.
(3) Our method provides more robust image segmentation under different σ and scale values when the geographical objects in remote-sensing images can be enhanced by spectral indices.
Author Contributions: K.W. had the original idea for the study and drafted the manuscript. K.W. and H.C. contributed to perform the experimentation. L.C. and J.X. contributed data curation. All authors contributed to writing and polishing of the manuscript. All authors have read and agreed to the published version of the manuscript.