Region Merging Considering Within- and Between-Segment Heterogeneity: An Improved Hybrid Remote-Sensing Image Segmentation Method

Abstract: Image segmentation is an important process and a prerequisite for object-based image analysis, but segmenting an image into meaningful geo-objects is a challenging problem. Recently, some scholars have focused on hybrid methods that combine an initial segmentation with subsequent region merging, since hybrid methods consider both boundary and spatial information. However, the existing merging criteria (MC) consider only the heterogeneity between adjacent segments when calculating the merging cost, which limits the goodness-of-fit between segments and geo-objects, because the homogeneity within segments and the heterogeneity between segments should be treated equally. To overcome this limitation, this paper employs a hybrid remote-sensing image segmentation method whose MC considers objective heterogeneity and relative homogeneity (OHRH) during region merging. The OHRH method is implemented in five different study areas and compared with our region merging method using only the objective heterogeneity (OH), as well as with the full lambda-schedule algorithm (FLSA). The unsupervised evaluation indicated that the OHRH method was more accurate than the OH and FLSA methods, and the visual results showed that the OHRH method could distinguish both small and large geo-objects. The segments showed greater size changes than those of the other methods, demonstrating the superiority of considering within- and between-segment heterogeneity in the OHRH method.


Introduction
With the thriving development of satellite sensors with different spatial resolutions, geographic object-based image analysis (GEOBIA) has emerged as a new and evolving paradigm in remote-sensing image interpretation and analysis [1,2], which uses spectral, textural, and contextual information and geo-object features to improve image classification [3-5]. Because GEOBIA is less sensitive to the spectral variance within geo-objects [6], it can reduce the "salt and pepper" noise produced by traditional pixel-based methods [3,7]; its results are therefore much more visually consistent and more easily converted into ready-to-use vector data. Applying GEOBIA to a remote-sensing image involves geo-object recognition, information extraction and image classification [8,9], and the core of GEOBIA is the image object [1,10], so image segmentation is an important prerequisite for GEOBIA. Image segmentation partitions a remote-sensing image into non-overlapping regions that are spatially contiguous and spectrally homogeneous [11], and these segmented regions are considered to be image objects.
Image segmentation is generally acknowledged to be a complicated task when classifying remote-sensing images, and many methods have been proposed for it [12-18]. In general, approaches to image segmentation can be grouped into two categories: edge-based segmentation and region-based segmentation. The edge-based segmentation method assumes that gray values are discontinuous at region boundaries and searches for places where the gray values in the image are discontinuous to determine the edges [15,19-22]. In contrast, the region-based segmentation method considers the similarity and adjacency relations between pixels, and ensures that each segmented object satisfies a homogeneity criterion [23-26]. However, neither category can segment all land cover properly, because over-segmentation or under-segmentation problems often occur.
Combining splitting with merging is a new trend for solving the aforementioned problems: an edge-based segmentation method is first used to obtain the initial segmentation results, and a region-based method is then applied to merge similar segmented objects [5,27-30]. Recently, such hybrid segmentation methods have caught the attention of many scholars, because they consider both the boundary information used to obtain the initial segmentation and the spatial information between adjacent geo-objects used to merge similar segmented geo-objects [29].
Region merging is an effective method in remote-sensing image segmentation that merges segments under constraints given by homogeneity or heterogeneity metrics [27]. Merging order (MO) and merging criteria (MC) are the key issues in region merging. Recently, some MO-based methods have been developed for image segmentation [29,31-33]. For example, Canovas-Garcia and Alonso-Sarria [31] divided a large and heterogeneous agricultural area into plots with different land-cover types for optimal scale parameter selection. Yang et al. [29] proposed a novel hybrid segmentation method, which employs local spectral angle thresholds for region merging, and the method demonstrated great potential to benefit ecological applications in object detection, object-based classification, and change detection. Although the MO-based methods perform well in image segmentation, they all rely on the MC to determine the merging results. Thus, constructing a proper MC is the key to region merging. Benz et al. [10] proposed the multi-resolution segmentation method, which integrates spectral and geometric heterogeneity to form the MC and was applied in eCognition. Robinson et al. [34] and Jin [35] integrated the Euclidean distance between adjacent geo-objects and their common borders to form the MC, which was implemented in the feature extraction module of the ENVI software. Chen et al. [36] incorporated size-constrained rules into the MC, weakening the error-prone nature of the traditional metric. While these MC-based methods have been widely adopted in diverse remote-sensing applications, they are limited by the fact that they consider only the heterogeneity between adjacent geo-objects, regardless of the level of homogeneity within the geo-objects. To further improve the segmentation quality, the MC should give homogeneous segments a higher priority for merging, because a homogeneous segment is more likely than a heterogeneous segment to be part of the adjacent object when the two segments differ from the adjacent object by the same amount.
Segmentation methods considering within- and between-segment heterogeneity have been developed recently. Johnson et al. [37] integrated the area-weighted variance and Global Moran's I into the F-measure to obtain the optimal image segmentation parameter. Yang et al. [38] proposed an energy function based on the spectral angle between adjacent segments and the mean spectral angle within a segment, which could parameterize a proper segmentation scale to obtain a good segmentation result. Although the aforementioned studies consider within- and between-segment heterogeneity, they all used it to optimize image segmentation parameters. To the best of our knowledge, few of the existing methods consider both intrasegment homogeneity and intersegment heterogeneity in the MC for region merging.
In this study, we describe a new hybrid segmentation method whose MC considers within- and between-segment heterogeneity in region merging. To build an objective MC for region merging, the spectral angle was used to quantify the spectral distance [29,38,39] between adjacent segments, and the standard deviation (STDV) was used to quantify the homogeneity within segments. In addition, the area and common border of adjacent segments are incorporated into the MC to objectively quantify the heterogeneity between adjacent segments. The remainder of the paper is organized as follows. In Section 2, the details of the proposed hybrid segmentation method are described, with particular focus on the MC that considers within- and between-segment heterogeneity in region merging. A description of the study areas and images, as well as the result analyses, follows in Section 3. In Sections 4 and 5, the discussion and conclusions are presented, respectively.

Overview
The process of the improved hybrid remote-sensing image segmentation method can be described as follows. The watershed transformation is first executed to produce an over-segmented result, and then the region adjacency graph (RAG) [40,41] and nearest neighbor graph (NNG) [42] are built from the initial segments to specify the adjacency relationships between segments and the merging costs between adjacent segments, respectively. The merging cost is calculated with the improved objective MC considering within- and between-segment heterogeneity. Finally, adjacent segments are iteratively merged if the merging cost is less than a defined threshold. The framework of the method is shown in Figure 1.

Figure 1. Hybrid segmentation method for region merging considering within- and between-segment heterogeneity.

Watershed Transformation
To detect edges, Vincent and Soille [43] proposed the watershed transformation method, which is the most commonly used segmentation technique in gray-scale mathematical morphology. The watershed transform treats the image as a topographic surface in which each pixel value represents an elevation. The darker a pixel, the lower its elevation; the lowest pixels are called minima. Different minima are considered to belong to different basins. Water rises from the minima and gradually fills the basins until it reaches the so-called watersheds [44], where different basins meet. Thus, the watersheds divide the image into regions with similar pixel intensities [45].
However, the growing availability of multispectral remote-sensing images makes it possible to take full advantage of the edge information contained in multiple bands. In this study, the gradient of the remote-sensing image is calculated with the Sobel operator and averaged across all bands as follows:

$$G(p) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{sobel}\left(f_i(p)\right) \tag{1}$$

where $p$ is one of the pixels within an image, $f_i(p)$ denotes the digital number (DN) value of pixel $p$ in band $i$, and $n$ is the number of bands. The Sobel operator is a 3 × 3 operator, whose horizontal and vertical kernels are

$$\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$

Then, the watershed transformation proposed by Vincent and Soille is carried out on the gradient image to obtain the segmentation result. In this paper, the hybrid segmentation method is a two-stage technique, and the aim of the watershed transformation is to produce primitive segments for subsequent merging, for which over-segmentation is considered to be a good starting point. Thus, no noise-removal filter is applied prior to segmentation, although such filtering is often used to avoid over-segmentation [46,47].
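The band-averaged Sobel gradient described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the authors' implementation: the function names `sobel_magnitude` and `mean_gradient` are hypothetical, the per-band gradient is assumed to be the magnitude of the horizontal and vertical Sobel responses, and border pixels are assumed to be handled by edge replication.

```python
import math

# Standard 3 x 3 Sobel kernels (horizontal and vertical derivatives).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(band):
    """Sobel gradient magnitude of one band (a list of rows).

    Assumption: image borders are handled by replicating the edge pixels.
    """
    rows, cols = len(band), len(band[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp indices so border pixels reuse their nearest neighbour.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    value = band[rr][cc]
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            out[r][c] = math.hypot(gx, gy)
    return out


def mean_gradient(bands):
    """Average the per-band Sobel magnitude over all bands."""
    per_band = [sobel_magnitude(band) for band in bands]
    rows, cols = len(bands[0]), len(bands[0][0])
    return [
        [sum(g[r][c] for g in per_band) / len(per_band) for c in range(cols)]
        for r in range(rows)
    ]
```

The resulting gradient image would then be passed to a watershed implementation; a flat band contributes zero everywhere, so edges that appear in only some bands are attenuated by the averaging rather than lost.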

Region Merging Based on Merging Criteria (MC) Considering Within- and Between-Segment Heterogeneity

MC Considering Within- and Between-Segment Heterogeneity
Two adjacent segments that have a higher homogeneity within segments and a lower heterogeneity between segments are more likely to be part of the same geo-object. To obtain a better segmentation result, it is therefore key to establish an objective MC that considers within- and between-segment heterogeneity.

First, the spectral angle (SA) [29] is calculated as the spectral distance to measure the spectral heterogeneity between segments. The SA is a more intuitive and physically defined metric than alternatives such as the Euclidean distance. The SA is defined as follows:

$$SA(S_1, S_2) = \arccos\left(\frac{\sum_{i=1}^{n} (s_1)_i (s_2)_i}{\sqrt{\sum_{i=1}^{n} (s_1)_i^2}\,\sqrt{\sum_{i=1}^{n} (s_2)_i^2}}\right) \tag{2}$$

where $(s_1)_i$ and $(s_2)_i$ are the average DN values of the two segments $S_1$ and $S_2$ in band $i$, respectively, and $n$ is the number of bands. The $SA(S_1, S_2)$ value varies between 0 and 90 degrees, and lower SA values represent lower heterogeneity between segments $S_1$ and $S_2$, i.e., a smaller merging cost.

Second, the objective heterogeneity (OH) between adjacent segments is calculated in Equation (3), which weights the contributions of the adjacent segments by their areas and common border. Here, $A_1$ and $A_2$ are the areas of segments $S_1$ and $S_2$, respectively, and $L$ is the common border of segments $S_1$ and $S_2$. Adjacent segments that have a smaller area or a larger common border have a higher priority to be merged, because such segments are more likely to be part of an adjacent object.

Third, the spectral homogeneity of each segment is calculated in Equation (4) as the STDV of the DN values, averaged across all bands, where $p$ is one of the pixels within segment $S$, and $DN_i(p)$ represents the DN value of pixel $p$ in band $i$. A homogeneous segment should have a low STDV.

Fourth, the average homogeneity is calculated across all segments in Equation (5), which weights the contribution of each segment by its area.

Fifth, the homogeneity of each segment is divided by the average to obtain the relative homogeneity (RH), defined in Equation (6). A large RH value represents a high homogeneity within a segment.

Finally, the objective heterogeneity and relative homogeneity are integrated into the OHRH MC in Equation (7), which can obtain a better segmentation result. According to the OHRH MC, adjacent segments that have a high within-segment homogeneity and a low between-segment heterogeneity have a high priority to be merged, since their OHRH values are small.

Remote Sens. 2018, 10, 781
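The ingredients of the MC described above can be illustrated with a short Python sketch. Only the spectral angle follows a formula that is fully determined by the text. The other functions are assumptions made for illustration: `objective_heterogeneity` assumes an FLSA-style area weight multiplied by the SA and divided by the common border, `segment_stdv` assumes the population form of the standard deviation, and `relative_homogeneity` takes the sentence "divided by the average" literally. The final combination into the OHRH value is deliberately left out, because its exact form cannot be recovered from the description alone.

```python
import math


def spectral_angle(mean_1, mean_2):
    """Spectral angle in degrees between two per-band mean vectors."""
    dot = sum(a * b for a, b in zip(mean_1, mean_2))
    norm_1 = math.sqrt(sum(a * a for a in mean_1))
    norm_2 = math.sqrt(sum(b * b for b in mean_2))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cosine = max(-1.0, min(1.0, dot / (norm_1 * norm_2)))
    return math.degrees(math.acos(cosine))


def objective_heterogeneity(sa, area_1, area_2, border):
    """ASSUMED form: smaller areas and longer common borders lower the cost."""
    return (area_1 * area_2) / (area_1 + area_2) * sa / border


def segment_stdv(pixels):
    """Per-band standard deviation of a segment, averaged across bands.

    `pixels` is a list of per-pixel DN tuples. Dividing by the pixel count
    (population form) is an assumption.
    """
    n_pixels, n_bands = len(pixels), len(pixels[0])
    total = 0.0
    for i in range(n_bands):
        mean_i = sum(p[i] for p in pixels) / n_pixels
        total += math.sqrt(sum((p[i] - mean_i) ** 2 for p in pixels) / n_pixels)
    return total / n_bands


def relative_homogeneity(stdvs, areas):
    """Each segment's STDV divided by the area-weighted average STDV.

    Assumes at least one segment has a non-zero STDV.
    """
    average = sum(h * a for h, a in zip(stdvs, areas)) / sum(areas)
    return [h / average for h in stdvs]
```

Because the spectral angle depends only on the direction of the mean vectors, two segments whose means differ by a constant brightness factor have an angle close to zero, which is one reason it is often preferred over the Euclidean distance for illumination-affected imagery.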

Region Merging Using the Objective Heterogeneity and Relative Homogeneity (OHRH) MC
The graph model, comprising the RAG and NNG, is built on the initial segments to simplify the region merging process. The RAG is an undirected graph that expresses the spatial relationship between adjacent segments, with nodes representing segments and arcs representing their adjacency [40,42]. The NNG is a directed graph derived from the RAG that enables a fast search for the minimum weights in the RAG [40,41]. To simplify the operation of the RAG and NNG in this study, the RAG stores only the adjacency relations of the segments, and the NNG stores only the merging costs calculated with Equation (7) for the adjacent pairs in the RAG. Then, the merging cost between each segment and each of its neighbors is compared, and the pair of adjacent segments with the smallest merging cost is merged if that cost is smaller than the threshold. The threshold (T) can be calculated through cumulative probability analysis as follows:

$$P(x \le T) = \alpha \tag{8}$$

where $x$ is the OHRH value of a segment, $P$ is the cumulative probability when $x \le T$, and $\alpha$ is the scale parameter. Each α corresponds to a unique T, and T increases as α increases. Finally, the RAG and NNG are updated, and the steps above are repeated until the smallest merging cost exceeds the threshold. A segment becomes increasingly heterogeneous, both within and between segments, as it is iteratively merged, and the smallest merging cost progressively increases as a result. When two pairs of adjacent segments have the same homogeneity, the pair with the lower heterogeneity has the smaller merging cost and therefore the higher merging priority, and vice versa. However, as with all the existing segmentation methods, the user must systematically vary the threshold to obtain the best-fitting segmentation. Thus, in this paper, the scale α is varied from 0.1 to 1 in increments of 0.1, and the goodness-of-fit is evaluated for each threshold using an unsupervised image segmentation evaluation method (further details are given below, Table 1).

Table 1.
Step 2: Calculate the merging value T by Equation (8), then select the adjacent segments for which the merging cost ($t_{(s_1,s_2)}$) is the lowest, and merge them if $t_{(s_1,s_2)} \le T$.
Step 3: Repeat step 1 and step 2; the iterative process stops when $t_{(s_1,s_2)} > T$ (H must only be calculated once).
Step 4: Output the final segmentation result.
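The merging loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the merging cost is passed in as a callable (the `euclidean` cost below is only a stand-in for Equation (7)), the threshold T is assumed to be fixed once from the empirical cumulative distribution of the initial costs, and all pairwise costs are recomputed in every iteration instead of being maintained in an NNG. All function and key names are hypothetical.

```python
def cumulative_threshold(costs, alpha):
    """Smallest cost T whose empirical cumulative probability reaches alpha."""
    ordered = sorted(costs)
    for rank, value in enumerate(ordered, start=1):
        if rank / len(ordered) >= alpha:
            return value
    return ordered[-1]


def euclidean(seg_a, seg_b):
    """Stand-in merging cost for the demo; NOT the OHRH criterion."""
    return sum((x - y) ** 2 for x, y in zip(seg_a["mean"], seg_b["mean"])) ** 0.5


def merge_regions(segments, adjacency, cost, alpha):
    """Greedy region merging on a region adjacency graph.

    segments:  {id: {"area": float, "mean": [per-band means]}}
    adjacency: {id: set of neighbouring ids}, assumed symmetric
    cost:      callable(seg_a, seg_b) -> merging cost
    """
    # Work on copies so the caller's data is left untouched.
    segments = {k: dict(v) for k, v in segments.items()}
    adjacency = {k: set(v) for k, v in adjacency.items()}

    def all_costs():
        return {(a, b): cost(segments[a], segments[b])
                for a in adjacency for b in adjacency[a] if a < b}

    initial = all_costs()
    if not initial:
        return segments
    # Assumption: T is computed once from the initial merging costs.
    threshold = cumulative_threshold(initial.values(), alpha)

    while True:
        costs = all_costs()
        if not costs:
            break
        (a, b), best = min(costs.items(), key=lambda item: item[1])
        if best > threshold:
            break
        # Merge b into a: area-weighted mean, summed area, union of neighbours.
        seg_a, seg_b = segments[a], segments.pop(b)
        total = seg_a["area"] + seg_b["area"]
        seg_a["mean"] = [(x * seg_a["area"] + y * seg_b["area"]) / total
                         for x, y in zip(seg_a["mean"], seg_b["mean"])]
        seg_a["area"] = total
        for n in adjacency.pop(b):
            adjacency[n].discard(b)
            if n != a:
                adjacency[n].add(a)
                adjacency[a].add(n)
    return segments
```

Recomputing every cost per iteration keeps the sketch short but scales poorly; the NNG mentioned in the text exists precisely to avoid that full rescan by tracking each segment's cheapest neighbor.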

Region Merging Based on MC Considering Between-Segment Heterogeneity
In this paper, due to the absence of existing methods considering within- and between-segment heterogeneity, the proposed OHRH merging method is compared with two methods that consider only between-segment heterogeneity. To assess the effectiveness of our OHRH method, we first implemented a modified version of our merging method that considers only between-segment heterogeneity (i.e., the OH method). Then, to compare both of our methods with an existing method considering between-segment heterogeneity, the full lambda-schedule algorithm (FLSA) [34,48] was also implemented, in which the Euclidean distance is used to quantify the spectral distance between adjacent segments. The same initial segments are used for all three methods. To obtain the best segmentation result, the FLSA threshold was varied systematically, in the same way as the other two thresholds.
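For comparison with the criteria above, the FLSA merging cost can be sketched as a single function. The form used here is the commonly cited full lambda-schedule cost (area weight times squared Euclidean distance between the segment means, divided by the length of the common border); treating it as exactly what was implemented in this study is an assumption, and the function name `flsa_cost` is hypothetical.

```python
def flsa_cost(area_1, area_2, mean_1, mean_2, border):
    """Full lambda-schedule merging cost with Euclidean spectral distance.

    Assumed form: (A1 * A2 / (A1 + A2)) * ||mean_1 - mean_2||^2 / border.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(mean_1, mean_2))
    return (area_1 * area_2) / (area_1 + area_2) * squared_distance / border
```

For example, `flsa_cost(2, 2, [0, 0], [3, 4], 5)` evaluates to 5.0: the area weight is 1, the squared distance is 25, and the common border has length 5.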

Segmentation Evaluation
In this paper, the segmentation results were both visually and quantitatively evaluated. For quantitative evaluation, an unsupervised segmentation evaluation (USE) method was selected. Most USE methods in remote sensing assume that a segment should be internally homogeneous and distinguishable from its neighborhood [49]. The existing methods consist of two components, a measure of within-segment homogeneity and a measure of between-segment heterogeneity, whose results are then aggregated. In this paper, homogeneity is measured by the area-weighted variance (WV) [49] of all the segments and heterogeneity by Global Moran's I (MI) [50], which measures the degree of spatial association as reflected overall in the data set. The WV and MI are defined as follows:

$$WV = \frac{\sum_{i=1}^{n} a_i v_i}{\sum_{i=1}^{n} a_i} \tag{9}$$

$$MI = \frac{n \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\left(\sum_{i=1}^{n} (y_i - \bar{y})^2\right)\left(\sum_{i \ne j} w_{ij}\right)} \tag{10}$$

where $a_i$ and $v_i$ are the area and variance of object $O_i$, respectively; $n$ is the total number of objects; $y_i$ is the mean gray value of object $O_i$; $\bar{y}$ is the mean gray value of the image; and $w_{ij}$ is a measure of the spatial adjacency of objects $O_i$ and $O_j$, which is one if $O_i$ and $O_j$ are adjacent and zero otherwise. Lower WV values indicate higher within-segment homogeneity, whereas lower MI values indicate lower spectral similarity between neighboring segments.
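The two evaluation measures can be computed directly from per-segment statistics, as the following sketch shows. It assumes binary adjacency weights and a single band; when `image_mean` is not supplied, the unweighted mean of the segment means is used as a simplification of the image mean. The function names are hypothetical.

```python
def weighted_variance(areas, variances):
    """Area-weighted variance (WV) over all segments."""
    return sum(a * v for a, v in zip(areas, variances)) / sum(areas)


def global_morans_i(means, adjacency, image_mean=None):
    """Global Moran's I (MI) with binary adjacency weights.

    means:     {id: mean gray value of the segment}
    adjacency: {id: set of neighbouring ids}, assumed symmetric
    """
    n = len(means)
    if image_mean is None:
        # Simplification: use the mean of the segment means.
        image_mean = sum(means.values()) / n
    deviation = {k: v - image_mean for k, v in means.items()}
    # Each adjacent pair is visited in both directions, matching the double sum.
    cross = sum(deviation[i] * deviation[j]
                for i in adjacency for j in adjacency[i])
    weight_sum = sum(len(neighbours) for neighbours in adjacency.values())
    squared = sum(d * d for d in deviation.values())
    return n * cross / (squared * weight_sum)
```

On a chain of four segments with alternating means (1, 2, 1, 2), neighbors always lie on opposite sides of the mean and the function returns -1.0, the kind of low MI value that indicates well-separated neighboring segments.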
Various methods can be used to combine WV and MI to calculate the "overall goodness" (OG). In this paper, they are combined using the F-measure, because recent studies by Zhang et al. [4] and Johnson et al. [37] found that the F-measure is more sensitive to excessive under-segmentation or over-segmentation than other commonly used combination methods. First, the WV and MI values need to be normalized to a 0-1 range by the following:

$$X_{norm} = \frac{X_{max} - X}{X_{max} - X_{min}} \tag{11}$$

where $X$ is the WV or MI value, and $X_{max}$ and $X_{min}$ are the maximum and minimum WV or MI values of all generated segmentations, respectively. A higher $WV_{norm}$ or $MI_{norm}$ value indicates a higher within-segment homogeneity or between-segment heterogeneity. In addition, the $WV_{norm}$ and $MI_{norm}$ values are calculated for each spectral band and then averaged. Second, the OG is calculated by the F-measure, which is given by the following:

$$OG_f = (1 + b^2)\,\frac{MI_{norm} \cdot WV_{norm}}{b^2 \cdot MI_{norm} + WV_{norm}} \tag{12}$$

where $b$ is a weight that controls the relative weights of $WV_{norm}$ and $MI_{norm}$. In this study, $WV_{norm}$ is considered to have the same weight as $MI_{norm}$, i.e., $b = 1$.
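The normalization and F-measure steps, together with the selection of the best scale from a parameter sweep, can be sketched as follows. The sketch works on one WV and one MI value per segmentation, whereas the text averages the normalized values over the spectral bands. The function names and the zero fallbacks for degenerate inputs are assumptions added for illustration.

```python
def normalize(value, minimum, maximum):
    """Rescale so that the lowest raw value maps to 1 and the highest to 0."""
    if maximum == minimum:
        return 0.0  # degenerate sweep; this fallback is an assumption
    return (maximum - value) / (maximum - minimum)


def overall_goodness(wv_norm, mi_norm, b=1.0):
    """F-measure combining normalized WV and MI; b = 1 weights them equally."""
    denominator = b * b * mi_norm + wv_norm
    if denominator == 0:
        return 0.0  # both components are zero; this fallback is an assumption
    return (1 + b * b) * mi_norm * wv_norm / denominator


def best_scale(sweep, b=1.0):
    """Pick the scale with the highest OG.

    sweep: {alpha: (WV, MI)} for every segmentation in the sweep.
    Returns (best alpha, its OG value).
    """
    wvs = [wv for wv, _ in sweep.values()]
    mis = [mi for _, mi in sweep.values()]
    scores = {
        alpha: overall_goodness(normalize(wv, min(wvs), max(wvs)),
                                normalize(mi, min(mis), max(mis)), b)
        for alpha, (wv, mi) in sweep.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because the normalization is relative to the extremes of the sweep, the segmentations with the best WV and the best MI each receive a zero for the other component in a monotone sweep, so the F-measure favors intermediate scales that balance the two.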

Study Area and Image
We evaluated the merging methods in three remote-sensing application scenarios: urban, rural and forest areas. All the images were taken in north-eastern or south-eastern Beijing, China, and acquired by the Gaofen-1 (GF-1) satellite, which is the first satellite of the Chinese High-resolution Earth Observation System (CHEOS), but the images differed in sensor type and spatial resolution.

Urban Area
Image segmentation is widely applied to urban geo-object information extraction, such as road detection [51] and impervious surface extraction [52]. Urban geo-object information extraction usually requires very high spatial resolution images for small urban geo-objects; thus, in this paper, GF-1 images with a very high resolution of 2 m were used, collected on 8 May 2016 in Beijing, China. In this experiment, we selected two test images, a factory area and a residential area, in an urban area (Figure 2). The images are panchromatic and were fused with 8 m multispectral GF-1 images using the NNDiffuse Pan Sharpening function of ENVI 5.2 to enhance the spectral information. The fused images, in which the factory area and residential area measure 0.9 × 0.75 km and 1 × 0.7 km, respectively, contain four spectral bands: blue, green, red and near infrared.

Rural Area
Unlike urban geo-objects, rural geo-objects, such as farmland and rivers, are usually large, so image segmentation should be implemented at a relatively coarse scale for this application. In this study, a typical farmland area and a river area located in Beijing were chosen. The GF-1 images (~4.8 × 4.8 km) were acquired on 8 May 2016 with a spatial resolution of 8 m (Figure 3). The images have the same bands as the aforementioned test images with 2 m resolution.

Forest Area
In forest remote sensing, forest is usually identified at a large regional scale, such as the country or global scale. For this application, we used a GF-1 multispectral image of a forest in northern Beijing with 16 m resolution, collected on 24 March 2016, to evaluate the performance of the proposed merging method. We selected a 9.6 × 9.6 km subset of the GF-1 image, with blue, green, red and near infrared bands, which is covered by several forests and a lake (Figure 4).

Sensitivity Analysis of the OHRH Method
In the proposed merging method, the MC determines the merging sequence. However, in region merging, errors often accumulate as merging takes place iteratively: if two objects are wrongly merged, the subsequent merges will be misguided by the error. To assess the effects of the parameter α on the merging results, three small sub-regions were chosen from an urban image to visually compare the merging results obtained by varying α from 0.1 to 1 in increments of 0.3 (Figure 5). The over-segmented factory segments were gradually merged as α increased, but the tree clusters were wrongly merged into a factory when α was set to 1, which resulted in under-segmentation (Figure 5(c1-c4)). The OHRH method performed well for small buildings and tree clusters when α was set to 0.4, but when α was changed to 0.7, the tree clusters could not be distinguished from the grassland (Figure 5(d1-d4)). Similarly, the small building and shadow were wrongly merged with a road when α was equal to 0.7 (Figure 5(e1-e4)). These results demonstrated that if we want to segment as many different sized geo-objects as possible, the parameter α should be small, whereas α should be set to a large value to segment large geo-objects, such as factories and forests.

The hybrid segmentation method is a two-stage technique, and the watershed transformation is conducted to obtain the initial segments prior to the OHRH merging process. Different initial segments lead to different merging results. To assess the effects of different initial segments on the merging results, watersheds with small, medium and large scales were first applied to obtain three sets of initial segments in two rural subsets (Figures 6a-c and 7a-c), and then the OHRH method was implemented to obtain the final segmentation by setting α to 0.5 (Figures 6d-f and 7d-f). Over-segmentation is an obvious problem for watersheds (Figure 6a-c), and it was reduced remarkably by the OHRH merging process (Figure 6d-f). The merging result based on the initial segments of the small-scale watershed showed remarkable over-segmentation, because the merging excessively used all the boundary information (Figure 6d). Conversely, under-segmentation arose in the merging result based on the initial segments of the large-scale watershed, which disregarded boundary information (Figure 6f). Thus, selecting proper initial segments is very important for obtaining satisfying merging results. When different sized geo-objects need to be segmented, the watershed scale should be neither too coarse nor too fine. However, if we want to segment geo-objects formed by grouping, the watershed scale should be coarser. Figure 7 supports a similar conclusion.

To assess the effects of speckle noise on the merging results, a small subset selected from the forest area (T5) was corrupted by speckle noise with different variances, and the merging results were compared. The variances were set to 0.0000001, 0.0000005 and 0.000001, respectively. The experimental results are shown in Figure 8. As the variance of the speckle noise increased, natural objects, such as forest and lake, were increasingly over-segmented, whereas the over-segmentation of a building was less affected. Under-segmentation did not occur in any of the four merging results, which demonstrates that the OHRH method has good noise immunity.
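The noise experiment above can be reproduced in outline with a small helper that corrupts one band with speckle noise of a given variance. The text does not state which noise model or tool was used, so the multiplicative model below (each value is perturbed by itself times zero-mean Gaussian noise) and the name `add_speckle` are assumptions for illustration only.

```python
import random


def add_speckle(band, variance, seed=None):
    """Return a copy of `band` (a list of rows) corrupted by speckle noise.

    Assumed model: out = value + value * n, with n drawn from a zero-mean
    Gaussian whose variance is `variance`.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[value + value * rng.gauss(0.0, sigma) for value in row]
            for row in band]
```

Passing a fixed `seed` makes the corrupted subset reproducible, so the same noisy input can be fed to each merging method for a like-for-like comparison.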

Urban Area
The best segmentation using the OHRH, OH and FLSA methods was obtained by setting the input parameter α to 0.6 (Figure 9a,b). For the factory area (Figure 9a), the OG of the OHRH method ($OG_f$: 0.6133) was better than that of the OH and FLSA methods ($OG_f$: 0.6073 and 0.5803), with increases of 0.006 and 0.033, respectively. For the residential area (Figure 9b), the OG of the OHRH method ($OG_f$: 0.5034) was also better than that of the OH and FLSA methods ($OG_f$: 0.4728 and 0.4734), with increases of 0.0306 and 0.03, respectively. However, the FLSA method was more accurate than the OH method ($OG_f$ of 0.4734 vs. 0.4728).

The best segmentation results of the T1 and T2 images produced by the three methods using the optimal scale are presented in Figure 10. To further assess the segmentation quality, six subsets were selected from the T1 and T2 images to visually compare the best segmentation results obtained using the three methods (Figure 11). Overall, the OHRH method was more accurate because it could segment various sized geo-objects well (Figure 11a). By contrast, the other two methods over- or under-segmented some geo-objects (Figure 11b,c). In the first and third subsets, the OH and FLSA methods over-segmented the large tree field and large building, respectively. This result was probably due to neglecting the fact that a homogeneous segment is more likely than a heterogeneous segment to be part of the adjacent object when the two segments differ from the adjacent object by the same amount. The OH and FLSA methods only consider the heterogeneity between adjacent geo-objects, resulting in over-segmentation. In the fourth and fifth subsets, the OH and FLSA methods under-segmented the residential building and road, respectively. In the process of merging, the newly merged segments became less homogeneous, so that in the OHRH method a higher merging cost was obtained and further merging stopped. However, the OH and FLSA methods failed to consider the decreasing homogeneity, which resulted in further merging and under-segmentation. In the second and sixth subsets, the OHRH and OH methods produced nearly the same segmentations, and both of them outperformed the FLSA method, which over-segmented the industrial building and under-segmented the water.

Rural Area
As indicated by the highest values of $OG_f$ (Figure 12), the best segmentation using the OHRH, OH and FLSA methods was obtained by setting the input parameter α to 0.6, 0.6 and 0.5, respectively, for the river area (Figure 12a), and to 0.6 for the farmland area (Figure 12b). For the river area, compared to the OH and FLSA methods ($OG_f$: 0.5479 and 0.5575), the OHRH method ($OG_f$: 0.5716) increased the OG by 0.0237 and 0.0141, respectively. The FLSA method was more accurate than the OH method ($OG_f$ of 0.5575 vs. 0.5479). For the farmland area, the OHRH method ($OG_f$: 0.6655) increased the OG by 0.0542 and 0.0146 in comparison with the OH and FLSA methods ($OG_f$: 0.6113 and 0.6509), respectively. Again, the FLSA method was more accurate than the OH method ($OG_f$ of 0.6509 vs. 0.6113).

Rural Area
As indicated by the highest values of OGf (Figure 12), the best segmentation using the OHRH, OH and FLSA methods was obtained by setting the input parameter α to 0.6, 0.6 and 0.5, respectively, for the river area (Figure 12a), and to 0.6 for the farmland area (Figure 12b). For the river area, compared to the OH and FLSA methods (OGf: 0.5479 and 0.5575), the OHRH method (OGf: 0.5716) increased the OG by 0.0237 and 0.0141, respectively. The FLSA method was more accurate than the OH method (OGf of 0.5575 vs. 0.5479). For the farmland area, the OHRH method (OGf: 0.6655) increased the OG by 0.0542 and 0.0146 in comparison with the OH and FLSA methods (OGf: 0.6113 and 0.6509), respectively. Again, the FLSA method was more accurate than the OH method (OGf of 0.6509 vs. 0.6113).
The best segmentation results of the T3 and T4 images produced using the three methods with the optimal scale are shown in Figure 13. Close-ups of five subsets selected from the T3 and T4 segmentation results are used to show the differences (Figure 14). The first and fourth subsets showed the OHRH method's advantage: the road in these two subsets was segmented out with the OHRH method, whereas the other two methods could not distinguish the road very well. In the second subset, the trees were separated from other objects in their entirety by the OHRH and OH methods, whereas the trees were wrongly merged with other objects in the results obtained with the FLSA method. For the third and fifth subsets, the farmlands were segmented well using the OHRH method, but these areas were over-segmented with the OH and FLSA methods. Moreover, in the fifth subset, the small geo-objects were segmented well with the OHRH and FLSA methods, whereas the OH method did not distinguish small geo-objects very well.

Forest Area
For all three methods, the unsupervised evaluation results of the T5 image are shown in Figure 15, and the best segmentation was obtained by setting the input parameter α to 0.7, 0.7 and 0.6 for the OHRH, OH and FLSA methods, respectively. The OG of the OHRH method (OGf: 0.7518) was better than that of the OH and FLSA methods (OGf: 0.7313 and 0.6907), exceeding them by 0.0205 and 0.0611, respectively.
The best segmentation results of the T5 image produced by the three methods using the optimal scale are shown in Figure 16. For the forest area, three subsets were selected for a visual comparison of the segmentation results obtained using all three methods (Figure 17). In the first subset, the OHRH and OH methods performed better because they were able to segment geo-objects of various sizes well, while the FLSA method over-segmented large geo-objects and tended to produce segments of similar size, which was due to ignoring the contribution of homogeneity within segments to the merging cost. In the second and third subsets, the OHRH method could also segment geo-objects of different sizes well; however, the OH and FLSA methods under-segmented some geo-objects. For the second subset, the OH method could not distinguish forest from mountain ranges well, and for the third subset, the FLSA method wrongly merged some forest into the lake. This under-segmentation occurred because the OH and FLSA methods did not consider the contribution of decreasing homogeneity within the segments to the merging cost.

The Performance of the OHRH Method in Another Dataset
According to the aforementioned experiments, the OHRH method performed well on GF-1 images with different spatial resolutions. To further evaluate the effectiveness of the OHRH method, the SZTAKI-INRIA building detection dataset [53] was used, which was obtained from the website (http://web.eee.sztaki.hu/remotesensing/building_benchmark.html). This dataset contains nine aerial or satellite images taken from Budapest, Szada (both in Hungary), Manchester (UK), Bodensee (Germany), Normandy and Côte d'Azur (both in France). In this paper, images from three regions (Szada, Manchester and Côte d'Azur) were selected from the dataset. We selected one merging scale for each region image, and the scale selection criteria are the same as those for the GF-1 images. The segmentation results are shown in Figure 18 for visual comparison, where only the OHRH and FLSA results are presented. The unsupervised evaluation results for the OHRH and FLSA methods are shown in Table 2. Figure 18 does not show an obvious difference in the merging results between the OHRH and FLSA methods, which is consistent with their similar OGf values. To clearly demonstrate the difference, two subsets were selected from each image (Figure 19). First, the OHRH method performed better than the FLSA method in segmenting buildings and tree clusters. Second, the FLSA method tended to produce segments of similar size, whereas the OHRH method tended to produce segments of different sizes, which demonstrates the advantage of considering within- and between-segment heterogeneity in the OHRH method.

Discussion
Image segmentation is the first step and a necessary prerequisite for GEOBIA, and accurate segmentation can improve the performance of many subsequent ecological applications, such as oil spill detection [13], cloud extraction [16], land-fog detection [48] and road detection [51]. Hybrid segmentation, which combines splitting with merging, is a new trend for obtaining good segments [5], because it considers both the boundary information and the spatial information between adjacent geo-objects [29]. The region merging method merges segments according to some homogeneity or heterogeneity metrics and provides an effective approach for remote-sensing image segmentation [27]. However, most MC in the region merging methods only consider the heterogeneity between adjacent geo-objects, regardless of the level of homogeneity within the image geo-objects. The proposed OHRH method combines the area-weighted SA and the relative STDV into the MC, in which the area-weighted SA is used to calculate the heterogeneity between adjacent geo-objects and the relative STDV is used to calculate the homogeneity within each segment. This method provides a possible solution for determining the proper segments to be merged by taking full advantage of the spectral homogeneity and spatial independence.
With the rapid development of high spatial resolution satellite sensors, it is possible to segment geo-objects that vary widely in size, from lakes and forests to small buildings and cars. However, when a geo-object of a given size needs to be segmented, the correspondence between the image spatial resolution and the size of that geo-object should be established. When the spatial resolution is so low that the geo-object comprises only a few pixels, the geo-object will be wrongly merged into the adjacent geo-objects, which results in under-segmentation. Conversely, if the spatial resolution is so high that the geo-object contains many pixels with high spectral variability, the geo-object will be partitioned into many segments, which results in over-segmentation. Therefore, the average size of the geo-objects determines the optimal spatial resolution when the segmentation method is implemented.
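As a rough illustration of this relationship, the sketch below counts how many pixels a geo-object covers at the three GF-1 resolutions used in this paper (2, 8 and 16 m). The 20 m × 20 m building footprint and the function name `pixels_per_object` are hypothetical values chosen only for the example.

```python
def pixels_per_object(area_m2, resolution_m):
    """Approximate number of pixels covering an object of the given ground area."""
    return area_m2 / (resolution_m ** 2)


# Hypothetical 20 m x 20 m building footprint (400 square metres).
building_area_m2 = 20 * 20

for resolution_m in (2, 8, 16):
    n_pixels = pixels_per_object(building_area_m2, resolution_m)
    print(f"{resolution_m} m resolution: about {n_pixels:g} pixels")
```

Under these assumptions, the building spans 100 pixels at 2 m but fewer than two pixels at 16 m, which is the situation in which a small geo-object is likely to be absorbed into its neighbours.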
However, an image always contains geo-objects of many different sizes, such as factories, roads, residential buildings, farmland, tree clusters and rivers (Figure 3a). To segment geo-objects of different sizes well, some recent studies have attempted to segment geo-objects at more than one scale. For example, Johnson et al. [37] evaluated the performance of the single-scale/single-level (SS) and multi-scale/multi-level (MS) GEOBIA approaches for mapping a single land-use/land-cover (LULC) type. The stopping rule of the OHRH method is the threshold α, so the OHRH method can produce multi-scale results by setting different thresholds of the cumulative probability α. A larger threshold means that more merging iterations are executed, resulting in coarser segmentation results. The consequent problem is how to automatically select a proper scale for a given application. One possible solution may be to build the correspondence between the scale parameter and the semantic meaning of geo-objects.
The hybrid segmentation method is a two-stage technique, and the merging processes are based on the initial segments obtained from the watershed transformation. Figures 6 and 7 show that selecting proper initial segments is very important for obtaining satisfactory merging results. Initial segments that are too coarse or too fine cannot yield a good merging result. To obtain satisfactory merging results, the boundary and spatial information should be fully considered by building the correspondence between the initial segments and the semantic meaning of geo-objects.
The USE results (Figures 9, 12 and 15) show that the OHRH method performs better than both the OH and FLSA methods. Because the merging procedures are applied to the same initial segments, this result supports the advantage of considering within- and between-segment heterogeneity in the OHRH method. On the other hand, the fact that the OH method was not always more accurate than the FLSA method suggests that the key to the OHRH method's good performance is not the area-weighted SA but the strategy of considering within- and between-segment heterogeneity. Thus, further improving region merging methods using other metrics is possible. The full-image visual results (Figures 10, 13 and 16) of the OHRH, OH and FLSA methods did not show obvious differences, which is consistent with the similar USE results for the three methods. However, the local area results (Figures 11, 14 and 17) indicate that the OHRH method could distinguish both small and large geo-objects, and its segments varied more in size than those of the other methods. To further support this observation, we calculated the standard deviations of the segment sizes with the OHRH, OH and FLSA methods for the factory area, urban residential area, river area, farmland area, and forest and lake areas (Figure 20). The OHRH method yielded the highest standard deviation in every study area except the river area, which indicates that the OHRH method could segment the image into geo-objects of more varied sizes than the other methods. The experimental results using the SZTAKI-INRIA building detection dataset supported the same conclusion.
The main contributions of this study are as follows: (1) The existing MC for region merging in a hybrid method only considers the heterogeneity between adjacent segments to calculate the merging cost of adjacent segments, thus limiting the goodness-of-fit between segments and geo-objects, because the homogeneity within segments and the heterogeneity between segments should be treated equally. To overcome this limitation, this paper proposed a hybrid remote-sensing image segmentation method that considers the objective heterogeneity and the relative homogeneity (OHRH) for MC in the region merging. (2) The OHRH method could distinguish both small and large geo-objects, and its segments showed a greater change in size than those of the other methods, which demonstrates the advantage of considering within- and between-segment heterogeneity in the OHRH method.
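The segment-size statistic used for Figure 20 can be sketched as follows. This is a minimal illustration that assumes a segmentation is stored as a 2D grid of integer segment labels and uses the population standard deviation; the paper does not state which estimator was used, and the toy label grids and the function name `segment_size_std` are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def segment_size_std(labels):
    """Population standard deviation of segment sizes (pixel counts per label)."""
    sizes = Counter(label for row in labels for label in row)
    return pstdev(list(sizes.values()))


# Toy 4 x 4 label grids (hypothetical): four equal segments vs. mixed sizes.
even_sizes = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
mixed_sizes = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 3, 3],
]

print(segment_size_std(even_sizes))   # all segments have 4 pixels
print(segment_size_std(mixed_sizes))  # sizes are 11, 1, 2 and 2 pixels
```

A segmentation that keeps one large object intact next to several small ones gives a larger value than one that cuts the image into equally sized pieces, which is the behaviour the comparison above relies on.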

Conclusions
In this paper, we proposed a hybrid remote-sensing image segmentation method whose MC for region merging considers both within- and between-segment heterogeneity. First, the initial over-segmented segments are produced by watershed transformation, since it is considered to be a good starting point for subsequent region merging. Second, the RAG and NNG models are built based on the initial segments to simplify the region merging processing. Finally, the OHRH merging method is conducted on the graph models to produce the segmentation results. To show the effectiveness of the OHRH method, a GF-1 image was used as an example, and a set of remote-sensing images with different spatial resolutions was used to perform the experiments. Then, we used the same initial segments produced by the watershed transformation for subsequent region merging to compare the final results with those of the OH and FLSA methods. The unsupervised evaluation indicator (F-measure) was chosen to assess the segmentation quality, because the studies of Zhang [4] and Johnson [37] have shown that the F-measure is more effective for combining the within- and between-segment heterogeneity metrics and can penalize excessive under- or over-segmentation. The F-measure indicated that the OHRH method was more accurate than the OH and FLSA methods, and the visual results showed that the OHRH method was able to distinguish both small and large geo-objects. The segments showed a greater change in size than those of the other methods, demonstrating the advantage of considering within- and between-segment heterogeneity in the OHRH method. Moreover, the experimental results using the SZTAKI-INRIA building detection dataset supported a similar conclusion. In addition, the experimental data and all figures in this paper are available online (see details in the Supplementary Materials). In the future, we will focus on automating the selection of a proper scale for a given application and on combining within- and between-segment heterogeneity using other metrics to further improve region merging performance.
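To illustrate why an F-measure penalizes excessive under- or over-segmentation, the sketch below combines two scores as a weighted harmonic mean. It is a generic form under stated assumptions, not necessarily the exact indicator used in this paper: both inputs are assumed to be already normalized to [0, 1] with higher values being better, and the names `f_measure`, `within_score`, `between_score` and `weight` are hypothetical.

```python
def f_measure(within_score, between_score, weight=1.0):
    """Weighted harmonic mean of two quality scores, each assumed to lie in [0, 1].

    within_score:  normalized within-segment homogeneity (assumed input).
    between_score: normalized between-segment heterogeneity (assumed input).
    weight:        relative importance of the two terms; 1.0 treats them equally.
    """
    denominator = weight ** 2 * between_score + within_score
    if denominator == 0:
        return 0.0
    return (1 + weight ** 2) * within_score * between_score / denominator


# A balanced segmentation scores higher than one that excels on a single term.
print(f_measure(0.5, 0.5))  # balanced scores
print(f_measure(0.9, 0.1))  # very homogeneous segments, poorly separated neighbours
```

Because the harmonic mean is dominated by the smaller of the two inputs, a result that maximizes only one term receives a low combined score, whereas an arithmetic mean would rate both toy cases equally.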

Figure 2. The urban area located in Beijing and its fused GF-1 multispectral images with a spatial resolution of 2 m: (a) T1, factory area and (b) T2, urban residential area.

Figure 3. The rural area located in Beijing and its GF-1 multispectral images with a spatial resolution of 8 m: (a) T3, river area; and (b) T4, farmland area.

Figure 4. The forest area located in Beijing and its GF-1 multispectral images with a spatial resolution of 16 m: T5, forest and lake areas.

Figure 5. Segmentation results of urban subsets, where the parameter α is set at 0.1, 0.4, 0.7 and 1 from left to right.

Figure 6. The merging results of the river subset based on initial segments obtained for watersheds with different scales: panels (a-c) are the initial segments of watersheds with small, medium and large scales, respectively. Panels (d-f) are the subsequent OHRH merging results, where α is set at 0.5.

Figure 7. The merging results of the farmland subset based on initial segments obtained for watersheds with different scales: panels (a-c) are the initial segments for watersheds with small, medium and large scales, respectively. Panels (d-f) are the subsequent OHRH merging results, where α is set at 0.5.

Figure 8. The merging results of the forest subset corrupted by speckle noise with different variances: (a) no speckle noise; (b) speckle noise with a variance of 0.0000001; (c) speckle noise with a variance of 0.0000005; (d) speckle noise with a variance of 0.000001.

Figure 9. The unsupervised segmentation evaluation (USE) for test images T1 and T2 produced by three methods, where the scale α is varied from 0.1 to 1.

Figure 10. The segmentation results of test images T1 and T2 produced by three methods, using the optimal scale.

Figure 11. Subsets of segmentations in Figure 10, showing the OHRH, objective heterogeneity (OH) and full lambda-schedule algorithm (FLSA) results from left to right.

Figure 12. The USE for test images T3 and T4 produced by three methods, in which the scale α is varied from 0.1 to 1.

Figure 13. The segmentation results of test images T3 and T4 produced by three methods, using the optimal scale.

Figure 14. Subsets of segmentations in Figure 13, showing the OHRH, OH and FLSA results from left to right.

Figure 15. The USE for test image T5 produced by three methods, in which the scale α is varied from 0.1 to 1.

Figure 16. The segmentation results of test image T5 produced by three methods, using the optimal scale.

Figure 17. Subsets of segmentations in Figure 16, showing the OHRH, OH and FLSA results from left to right.

Figure 18. Segmentation results produced with the OHRH and FLSA methods, using the SZTAKI-INRIA building detection dataset.

Figure 19. Subsets of segmentations in Figure 18, where the first and third columns show the OHRH results, and the other columns show the FLSA results.

Figure 20. Standard deviations of segment sizes using the best OHRH, OH and FLSA segmentations for urban area, rural area and forest area, respectively.

Table 1. Algorithm for region merging using the objective heterogeneity and relative homogeneity (OHRH) merging criteria (MC).

Table 2. The unsupervised evaluation results for the SZTAKI-INRIA building detection dataset produced with the OHRH and FLSA methods.