Assessment of Segmentation Parameters for Object-Based Land Cover Classification Using Color-Infrared Imagery

Using object-based image analysis (OBIA) techniques for land use-land cover (LULC) classification has become an area of interest due to the availability of high-resolution data and segmentation methods. Multi-resolution segmentation in particular, reported to be the most frequently used algorithm, can produce different segmentations depending on the chosen parameters. The total effect of segmentation parameters on the classification accuracy of high-resolution imagery is still an open question, though some studies have sought to define the optimum segmentation parameters. However, recent studies have not properly considered the parameters and their consequences for LULC accuracy. The main objective of this study is to assess OBIA segmentation and classification accuracy according to the segmentation parameters, using different overlap ratios during image object sampling for a predetermined scale. With this aim, we analyzed and compared (a) high-resolution color-infrared aerial images of a newly developed urban area including different land use types; (b) combinations of multi-resolution segmentation with different shape, color, compactness, bands, and band weights; and (c) accuracies of classifications based on varied segmentations. The results showed an explicit correlation between segmentation accuracies and classification accuracies. The effect of changes in segmentation parameters using different sample selection methods for five main LULC types was studied. Specifically, moderate shape and compactness values provided more consistency than lower and higher values; also, band weighting had a substantial effect that depended on the chosen bands. Differences in the variable importance of the classifications and changes in LULC maps are also explained.


Introduction
One of the most widely used processes in remote sensing is land use-land cover (hereafter, LULC) classification, and various approaches have been implemented to obtain thematic information on the earth's surface characteristics through diversely scaled remote sensing data [1]. Spatial information extraction using high-resolution remote sensing imagery, such as airborne, unmanned aerial systems (UAS), and satellite images, utilizes the advantages of object-based image analysis (OBIA). A great deal of research that relates to earth observation, such as land cover mapping, biodiversity, and disaster management, uses OBIA techniques to obtain valuable temporal geospatial knowledge [2][3][4][5][6].
Image segmentation, as the step preceding object-based classification, is crucial to the success of OBIA. In image processing, Reference [7] categorized segmentation methods from an algorithmic perspective into four classes: point-based, edge-based, region-based, and combinations thereof. In OBIA, region-based and combined segmentation algorithms stand out since they create homogeneous subsets of the image with respect to criteria such as spectral values, geometry, and texture. In 2017, Reference [8] reported that studies using the multi-resolution segmentation [9,10] of eCognition software account for 80.9% of the reviewed studies. The other preferred segmentation methods besides the multi-resolution algorithm are mean-shift segmentation [11,12] and the combined segmentation process of ENVI (Environment for Visualizing Images) software [13,14].
Multi-resolution segmentation is a technique for converting adjacent one-pixel objects into multi-pixel objects by step-by-step merging according to the features of their united form; each merge increases heterogeneity, and this increase is controlled by the defined scale parameter. Besides the importance of the scale parameter, shape-color heterogeneities within a determined scale have an impact on segment features and thus on segmentation. In multi-resolution segmentation, the increase of heterogeneity f is a function of the weighted spectral and shape heterogeneities. Spectral heterogeneity is a function of standard deviation depending on the band value and number of pixels of the objects to merge. Furthermore, shape heterogeneity is a function of both object smoothness and compactness. Smoothness is defined as the ratio between the border length of the object and bounding box of the object, whereas compactness is described as the ratio between the border length of the object and number of object pixels [9,10]. Equations (1)-(3) compute the increase of heterogeneity (f), where w is weight, ∆h is heterogeneity, n is number of pixels, and σ_band is the standard deviation of a band.

$$f = w_{color}\,\Delta h_{color} + w_{shape}\,\Delta h_{shape}, \qquad w_{color} \in [0, 1],\ w_{shape} \in [0, 1],\ w_{color} + w_{shape} = 1 \tag{1}$$

$$\Delta h_{color} = \sum_{band} w_{band}\left(n_{merge}\,\sigma_{band,merge} - \left(n_{obj1}\,\sigma_{band,obj1} + n_{obj2}\,\sigma_{band,obj2}\right)\right) \tag{2}$$

$$\Delta h_{shape} = w_{comp}\,\Delta h_{comp} + w_{smooth}\,\Delta h_{smooth}, \qquad w_{comp} \in [0, 1],\ w_{smooth} \in [0, 1],\ w_{comp} + w_{smooth} = 1 \tag{3}$$

In OBIA, some pre- or post-processing interventions are needed to overcome the weaknesses of existing segmentation methods so as to obtain an ideal representation of image objects [15][16][17][18]. For example, due to the identical spectral/spatial properties of road-like objects and contextual structures like parking lots and railways, Reference [15] applied image filtering techniques to remove irregularities in the extracted road segments. On the other hand, the detection of optimized parameters of multi-resolution segmentation, such as scale, shape, and compactness, was discussed in order to delineate more appropriate image object boundaries. To enhance the quality of image segmentation, Reference [16] proposed an optimization procedure that improved segmentation accuracy by between 20% and 40%. Nevertheless, the processing time of the proposed method increased substantially due to the multiple segmentations. In 2014, Reference [17] proposed an unsupervised multi-band approach for scale parameter selection in the multi-scale image segmentation process, in which the index of spectral homogeneity was used to determine multiple appropriate scale parameters. However, the method was evaluated with only two object classes and was not compared with widely used scale selection methods. Another unsupervised scale selection method, based on the computation of local variances to select optimal scale parameters, was explained in Reference [18].
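As an illustration of Equations (1)-(3), the following is a minimal sketch of how the increase of heterogeneity for one candidate merge could be evaluated. It is not the eCognition implementation: the function names, the dictionary-based band layout, and the numeric values are hypothetical, the merged object is assumed to contain n_obj1 + n_obj2 pixels, and ∆h_comp and ∆h_smooth are taken as given inputs rather than derived from object geometry.

```python
from typing import Mapping


def delta_h_color(
    band_weights: Mapping[str, float],
    n_obj1: int,
    sd_obj1: Mapping[str, float],
    n_obj2: int,
    sd_obj2: Mapping[str, float],
    sd_merge: Mapping[str, float],
) -> float:
    """Equation (2): band-weighted increase in spectral heterogeneity."""
    # Assumption: the merged object holds the pixels of both input objects.
    n_merge = n_obj1 + n_obj2
    return sum(
        w * (n_merge * sd_merge[band]
             - (n_obj1 * sd_obj1[band] + n_obj2 * sd_obj2[band]))
        for band, w in band_weights.items()
    )


def delta_h_shape(w_comp: float, dh_comp: float, dh_smooth: float) -> float:
    """Equation (3): w_smooth is implied by w_comp + w_smooth = 1."""
    if not 0.0 <= w_comp <= 1.0:
        raise ValueError("w_comp must lie in [0, 1]")
    return w_comp * dh_comp + (1.0 - w_comp) * dh_smooth


def heterogeneity_increase(w_shape: float, dh_color: float, dh_shape: float) -> float:
    """Equation (1): w_color is implied by w_color + w_shape = 1."""
    if not 0.0 <= w_shape <= 1.0:
        raise ValueError("w_shape must lie in [0, 1]")
    return (1.0 - w_shape) * dh_color + w_shape * dh_shape


# Hypothetical two-band example (values chosen only for illustration).
weights = {"red": 1.0, "nir": 1.0}
dh_c = delta_h_color(
    weights,
    n_obj1=10, sd_obj1={"red": 2.0, "nir": 3.0},
    n_obj2=30, sd_obj2={"red": 1.0, "nir": 2.0},
    sd_merge={"red": 2.5, "nir": 3.5},
)
dh_s = delta_h_shape(w_comp=0.5, dh_comp=4.0, dh_smooth=2.0)
f = heterogeneity_increase(w_shape=0.1, dh_color=dh_c, dh_shape=dh_s)
```

Expressing w_color and w_smooth as complements keeps the constraints of Equations (1) and (3) satisfied by construction, which mirrors how the shape and compactness settings are described in the text as single values between 0 and 1.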
As a supervised optimization method, Reference [19] described a stratified OBIA for semi-automated mapping of geomorphological image objects. The results, which were obtained by comparing 2-D frequency distribution matrices of training samples and image objects, provided a more effective digital landscape analysis for automated geomorphological mapping, although the phase of delineating training samples was considered a major drawback. Multi-resolution segmentation was examined and compared by using both supervised and unsupervised approaches to produce the desired object geometry [20]. Since supervised segmentation requires an indefinite amount of work, unsupervised methods offer an important alternative for detecting the optimal scale parameter of multi-resolution segmentation. Different multi-resolution segmentation parameters have also been combined, and their segmentation results compared, to analyze the effect of parameter selection on the proper segmentation of specific objects. Various segmentation results were evaluated by comparing the influence of spatial resolution, spectral band sets, and the classification approach for mapping urban land cover [21]. The results showed that spatial resolution is clearly the most influential factor for urban land-use mapping accuracy. Moreover, the classifier was stated to be the second priority, and third, the spectral band set could lead to significant gains.
In OBIA, accuracy assessment of image segmentation comprises both a qualitative assessment based on visual interpretation and a quantitative assessment using reference data. Approaches for quantitative assessment are grouped into two main categories: geometric methods and non-geometric methods. Geometric methods focus on the geometry of the reference objects and segment polygons to determine the similarity among them, whereas non-geometric methods are related to properties of the objects such as their spectral content [22]. Reference [23] demonstrated measures that facilitate the identification of optimal segmentation results and have utility in reporting the overall accuracy of segmentation relative to a training set. Reference [24] proposed region-based precision and recall measures for evaluating segmentation quality. Reference [24] also defined the F-measure, which combines precision and recall, and proposed Euclidean distances to compare the quality of different image partitions. On the other hand, References [25,26] described a new discrepancy measure called the segmentation evaluation index, which redefines the corresponding segment using a two-sided 50% overlap instead of a one-sided 50% overlap.
Using OBIA techniques for LULC classification has become an area of interest due to the availability of high-resolution data and segmentation methods. OBIA classifications were reported to perform better than classic pixel-based methods because they deal with more characteristics, such as the shape of sample features [27,28]. In fact, as mentioned in Reference [29], the spectral properties of a certain area depend on changes in vegetative cover phenology, whereas spatial properties of the same area, such as shape and size, are more likely to remain permanent. On the other hand, the spatial characteristics of the data used are considered to be one of the most essential parameters affecting the success of classification, since the discrimination of objects from each other is associated with pixel size. Even though a considerable number of studies have been conducted using moderate-resolution satellite imagery such as the Landsat and SPOT (Satellite Probatoire d'Observation de la Terre, Satellite for Observation of Earth) series, the coarse spatial properties of the imagery may present issues when detailed-level mapping is required. Also, it was reported that there is a relation between study area and pixel size, and higher-resolution satellite data such as WorldView-2, QuickBird (QB), GeoEye-1, and Ikonos were mainly used in small areas [8]. Moreover, as cited in Reference [30], high-resolution aerial imagery is suggested for precise results in OBIA LULC mapping where finer products are necessary [31][32][33][34][35][36]. According to Reference [37], spatial resolution is a more significant parameter for urban land cover than spectral resolution, which gives aerial photography an important role in urban studies. Furthermore, another essential consideration is the selection of an appropriate classification technique among a range of algorithms such as the decision tree, support vector machines, and random forest. The study in Reference [8] documented that the most widely employed supervised classifier is nearest neighbor, whereas the use of the random forest classifier resulted in higher overall accuracy.
The recent studies mentioned above focused on discovering optimal segmentation parameters and, in particular, considered the scale parameter exclusively. As cited in Reference [38], earlier studies have mainly focused on the scale parameter due to its key role in determining segment size [22,23,39]. However, shape, compactness, and band weight were not considered together in terms of segmentation and classification accuracy. In particular, it has not yet been explained how an elaborate change in segmentation parameters might have a significant effect on LULC classifications. The main objective of this study is to assess OBIA segmentation and classification accuracy depending on shape-color, compactness-smoothness, and band weight parameters, using different overlap ratios during image object sampling for a predetermined scale. In order to achieve this objective, three aspects were analyzed and compared: (a) high-resolution color-infrared aerial images of a recently developed urban area including different land-use types; (b) combinations of multi-resolution segmentation with different shape-color, compactness-smoothness, and band weights; and (c) accuracies of classifications based on varied segmentations.
The paper is organized as follows. Section 2 introduces the study area, image properties, and data pre-processing. Section 3 presents the methodology as a series of applied steps, including the definition of segmentations and object-based classifications. The outcomes, which comprise accuracy assessments and their comparisons, are discussed in Section 4. Specifically, Section 4.1 shows segmentation accuracy, while Section 4.2 presents classification accuracies. Variable importance based on mean decrease accuracy is analyzed in Section 4.3. Then, classification maps are presented and reviewed visually in Section 4.4. Finally, conclusions and future prospects are considered in Section 5.

Study Area and Data Pre-Processing
Çanakkale province is located between 25°40′ and 27°30′ E and between 39°27′ and 40°45′ N on both sides of the Dardanelles Straits in the North-Aegean part of Turkey, combining Asia and Europe. The city of Çanakkale has expanded dramatically to the north and south due to urban development in recent years. The study area (≈160 ha) was selected on the northern edge of the city depending on the variations in LULC types. A relatively low residential density in the newly developed area, enabling discrimination of buildings, roads, green areas, concrete, and bare soils, was the main criterion for selection of the study area (Figure 1).
Aerial images at 30 cm ground sample distance acquired by a Microsoft UltraCam Eagle photogrammetric digital aerial camera (Table 1) over the city of Çanakkale, Turkey were used to produce orthophotos. For this purpose, each aerial image overlaps neighboring images by 30% vertically and 60% horizontally. Initial exterior orientation parameters were estimated by onboard global positioning systems (GPS) and an inertial measurement unit (IMU) system during image acquisition. These initial parameters were then adjusted using ground control points to calculate accurately the exterior orientation parameters of each image. The oriented images were used to produce ortho-images with four spectral bands (red, green, blue (RGB), and near infrared (NIR)). The mosaic image was generated from these orthophotos with a horizontal accuracy within ±2 m based on a 90% confidence interval (Table 2). Then, the study area was clipped from the mosaic image.
As cited in Reference [40], selection of the training set and the sufficiency of its size and completeness are challenging points in OBIA classification as well as in pixel-based approaches [41][42][43][44]. Samples were collected manually depending on the magnitude and homogeneity of patches for each class. In this context, patches are larger and spectrally more homogeneous for the bare soil class, whereas buildings have varied types of roofing material and shape. Therefore, fewer samples were able to discriminate the bare soil class from the others, while buildings required more samples. On the other hand, the distribution of the patch size and spectral properties of the green spaces (agriculture and recreation areas), roads (asphalt, with/without shadow), and concrete areas (sidewalks and interlocking pavers) were homogeneous compared to the building class. This explains the approximate number of samples for the mentioned classes in Figure 2.
In OBIA classification, segments as image objects are interpreted based on their object features, and thus segments that overlap delineated samples are considered as training and validation samples in OBIA. For the sake of clarity, segments that represent defined samples will be called sample-segments in the remainder of this paper.

Methodology
The complete methodology of this study consisted of separately designed subsections (Figure 3). The four-band orthophoto explained in the previous section was used as the input. The segmentations and classification samples were determined in the first step. Then, three segment selection methods (SSM) were applied using Python scripts to define samples for object-oriented classification. Afterward, the newly generated data, including object features and the geometries of segments, were transferred to the PostgreSQL 9.6.0/PostGIS 2.5 database. The geospatial datasets organized in the PostGIS database were handled in two separate analyses. The first analysis, implemented in Matlab R2016a, concerned segments and segmentation accuracy, whereas the second analysis, produced in R 3.5.1 software, was for random forest classification [45]. Subsequently, LULC maps based on the assignment of image objects to classes were illustrated in Matlab. Finally, segmentation and classification accuracies were examined numerically and compared graphically.
In general, the outcome of the multi-resolution segmentation algorithm is managed by three main factors: scale, shape, and compactness. For the purpose of this study, weighted bands were also considered in addition to shape and compactness. Both shape and compactness were initially assessed at 0.1, 0.3, 0.5, 0.7, and 0.9 while the scale value was held constant; because the values 0.3 and 0.7 led to insignificant changes and a large amount of data, they were eliminated. The optimized scale level of 65 was determined using the estimation of scale parameter (ESP) tool, which was introduced in Reference [46] and enhanced in Reference [18]. ESP is an iterative tool that calculates the local variance of image objects for each scale step, starting from the user-defined starting scale parameter. In order to analyze spectral effects in the segmentation process, a single band or multiple bands were weighted. When all these combinations are considered, a total of 108 segmentation attempts were implemented. These segmentation attempts were enumerated from 1 to 108 as their segmentation number (SN). Table 3 presents the shape and compactness criteria and weighted bands for each attempt, where S, C, and WB indicate shape, compactness, and weighted bands, respectively. The number of segments for each SN is given in Figure 4.
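The enumeration of segmentation attempts can be sketched as follows. This is only an illustration under stated assumptions: the count of 12 band-weighting schemes is inferred from 108 / (3 × 3) and is not given explicitly in the text, the band schemes are referred to by a hypothetical 0-based index rather than by the actual entries of Table 3, and the ordering (shape outermost, band scheme innermost) is inferred from the SN values quoted for Figure 5 (SN 1, SN 73, SN 97). Table 3 remains the authoritative mapping.

```python
from itertools import product
from typing import List, Tuple

SHAPES = (0.1, 0.5, 0.9)          # retained shape values
COMPACTNESSES = (0.1, 0.5, 0.9)   # retained compactness values
N_BAND_SCHEMES = 12               # assumption: 108 / (3 * 3)


def segmentation_number(shape: float, compactness: float, band_scheme: int) -> int:
    """Return the 1-based SN for a parameter combination.

    band_scheme is a hypothetical 0-based index; 0 is assumed to mean
    equally weighted bands.
    """
    if not 0 <= band_scheme < N_BAND_SCHEMES:
        raise ValueError("band_scheme out of range")
    s = SHAPES.index(shape)
    c = COMPACTNESSES.index(compactness)
    return (s * len(COMPACTNESSES) + c) * N_BAND_SCHEMES + band_scheme + 1


def all_attempts() -> List[Tuple[int, float, float, int]]:
    """List every (SN, shape, compactness, band_scheme) in SN order."""
    combos = product(SHAPES, COMPACTNESSES, range(N_BAND_SCHEMES))
    return [(sn, s, c, wb) for sn, (s, c, wb) in enumerate(combos, start=1)]


attempts = all_attempts()
```

Under these assumptions, the combinations with equal band weights at shape 0.9 land on SN 73 (compactness 0.1) and SN 97 (compactness 0.9), matching the attempts discussed for Figure 5.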
Since shape is known to be the most influential criterion, the number of segments decreased as the shape criterion increased. When the shape was held constant, an increase in compactness led to a higher number of segments. In addition, a decrease in the number of segments was observed when the NIR band was weighted. In contrast, the weighted red band increased the number of segments. In Figure 5, the effects of shape, compactness, and band weighting on segmentation are demonstrated. Figure 5b presents SN 1, with a shape of 0.1, a compactness of 0.1, and the same weight for each band. This segmentation attempt produced 16514 segments. With the same compactness and band weights, SN 73 (Figure 5c) produced only 4892 segments with a shape criterion of 0.9. When Figure 5c and Figure 5d are compared, the enhancing effect of compactness on the number of segments can be seen. With the same shape and band weights as SN 73 (Figure 5c), SN 97 (Figure 5d) had 6904 segments at a compactness of 0.9. SN 98 (Figure 5e) and SN 101 (Figure 5f) indicate the band weighting effects for red and NIR; their numbers of segments are 7178 and 6463, respectively.
Although there is no standard approach for image segmentation accuracy assessment, Reference [47] summarized image segmentation accuracy in two main categories, namely, the empirical discrepancy (supervised) and empirical goodness (unsupervised) methods. Then, Reference [22] tabularized the most common related studies with their metrics and references. In this study, oversegmentation, area fit index, and quality rate assessments were considered; then, the root mean square error (RMSE) was calculated using these assessment values (Equation (4)). Noting that $l_i$ indicates the $i$th sample of the total $m$ samples and $s_j$ indicates the $j$th segment of the total $n$ segments intersecting the $i$th sample, these assessments are explained as follows:

• Oversegmentation assessment was proposed by Reference [23] and applied in References [48,49]. Oversegmentation of a single sample ($OS_{l_i}$) can be defined as subtracting the division of the total intersecting area of the sample and segments from one (Equation (5)). The overall oversegmentation of the segmentation (OS) can be calculated using the mean of all $OS_{l_i}$ (Equation (6)). OS and $OS_{l_i}$ have a range between 0 and 1, with 0 as a perfect match.
• The area fit index of a single sample ($AFI_{l_i}$) can be defined as dividing the sum of the subtracted segments from the sample area by the sample area (Equation (7)). The overall area fit index of the segmentation (AFI) can be calculated using the mean of all $AFI_{l_i}$ (Equation (8)). AFI and $AFI_{l_i}$ have a range between 0 and 1, with 0 as a perfect match.
• The quality rate of a single sample ($QR_{l_i}$) can be defined as dividing the total intersecting area of the sample and the segments by the union area of the sample and the segments (Equation (9)). The overall quality rate of the segmentation (QR) can be calculated using the mean of all $QR_{l_i}$ (Equation (10)). QR and $QR_{l_i}$ have a range between 0 and 1, with 1 as a perfect match.
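To make the verbal definitions of oversegmentation and quality rate more concrete, the following is a minimal sketch working on precomputed areas. Several points are assumptions rather than statements from the text: the denominator of the oversegmentation ratio is taken to be the sample area, the segments paired with a sample are assumed not to overlap one another (so the union area can be obtained from areas alone), and the `Sample` container and function names are hypothetical. The area fit index and the RMSE combination are left out because their exact forms depend on Equations (4) and (7).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """One delineated sample l_i and the segments s_j paired with it."""
    area: float
    # Each entry: (segment area, area of its intersection with the sample)
    segments: List[Tuple[float, float]]


def oversegmentation(sample: Sample) -> float:
    """OS of one sample: 1 - (total intersecting area / sample area)."""
    total_intersection = sum(inter for _, inter in sample.segments)
    return 1.0 - total_intersection / sample.area


def quality_rate(sample: Sample) -> float:
    """QR of one sample: total intersecting area / union area."""
    total_intersection = sum(inter for _, inter in sample.segments)
    total_segment_area = sum(area for area, _ in sample.segments)
    # Union of the sample and its (mutually non-overlapping) segments.
    union = sample.area + total_segment_area - total_intersection
    return total_intersection / union


def overall(metric: Callable[[Sample], float], samples: List[Sample]) -> float:
    """Overall value of a metric: mean over all samples."""
    return mean(metric(s) for s in samples)


# Hypothetical areas, in arbitrary units.
partial = Sample(area=100.0, segments=[(60.0, 55.0), (30.0, 30.0)])
perfect = Sample(area=50.0, segments=[(50.0, 50.0)])
overall_os = overall(oversegmentation, [partial, perfect])
overall_qr = overall(quality_rate, [partial, perfect])
```

In practice the areas would come from polygon overlay operations on the sample and segment geometries (for example in the PostGIS database mentioned earlier); the sketch only shows how the ratios behave once those areas are known.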
In this study, the machine learning algorithm called random forest, which yields relatively more accurate results than other classifiers [56], was preferred to label undefined urban image objects of the different segmentations. Another reason for choosing random forest is its ability to handle a large data set with higher dimensionality. Random forest, as an ensemble classifier, works with a large collection of de-correlated decision trees. One third of the samples, also known as out-of-bag (OOB) samples, are used in an internal cross-validation technique for OOB error estimation. With M and N denoting, respectively, the total number of input variables and the number of trees, the six essential steps to implement the random forest classification algorithm can be explained briefly as follows [57]:

1. Randomly select a subset of m variables from M, where m < M.
2. Calculate the best split point among the m features for node d.
3. Divide the node into two nodes using the best split.
4. Repeat the first three steps until a certain number of nodes has been reached.
5. Repeat the first four steps N times to build the forest.
6. Predict new observations with a majority vote.
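Two ingredients of the procedure above, the out-of-bag samples and the majority vote of step 6, can be sketched in a few lines. This is not the R implementation used in the study: it assumes the standard random forest setup in which each tree is trained on a bootstrap sample of the same size as the training set, drawn with replacement, and all function names are hypothetical. Under that assumption the expected out-of-bag share is (1 − 1/n)^n, which approaches e⁻¹ ≈ 0.37, close to the "one third" mentioned in the text.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def expected_oob_fraction(n_samples: int) -> float:
    """Probability that a given sample is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices with replacement (the in-bag set of one tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def oob_indices(n_samples: int, in_bag: Sequence[int]) -> List[int]:
    """Indices never drawn for this tree, usable for its OOB error estimate."""
    drawn = set(in_bag)
    return [i for i in range(n_samples) if i not in drawn]


def majority_vote(tree_predictions: Sequence[Hashable]) -> Hashable:
    """Step 6: the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag = bootstrap_indices(100, rng)
out_of_bag = oob_indices(100, in_bag)
label = majority_vote(["building", "road", "building"])
```

Each tree has its own in-bag and out-of-bag sets, so every training sample-segment is typically out-of-bag for roughly a third of the trees and can be scored by those trees alone.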
Labeled segments, which represent delineated samples as training data, were defined in three categories by distinct segment selection methods (Table 4). Segment selection methods (SSM) differ depending on the ratio calculated using the overlap area between segments and delineated samples. If a training segment is totally covered by a sample, in other words, if no points of the segment lie geometrically in the exterior of the sample, the method is called SSM 1. If a segment overflows a sample by at most 10% of the overlap area, the method is called SSM 2. In addition, if a segment overflows a sample by at most 20% of the overlap area, the method is called SSM 3. Figure 6 shows labeled training segments from a building sample using the three distinct criteria.

Table 4 columns: SSM; total area (A_T) of training segments for any class; condition based on sample area (A_l) and segment area (A_s).
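The three selection rules can be sketched as a simple area test. This is an illustrative reading of the text rather than the Python scripts used in the study: the overflow is assumed to be the segment area minus the overlap area, the three methods are assumed to be nested (a segment accepted by SSM 1 is also accepted by SSM 2 and SSM 3), and the function names and tolerance are hypothetical. The exact conditions are those given in Table 4.

```python
from typing import Optional

# Maximum overflow allowed by each method, as a fraction of the overlap area.
SSM_OVERFLOW_LIMITS = {1: 0.0, 2: 0.10, 3: 0.20}


def strictest_ssm(segment_area: float, overlap_area: float,
                  tol: float = 1e-9) -> Optional[int]:
    """Return the strictest SSM (1, 2, or 3) that accepts the segment, else None."""
    if overlap_area <= 0.0:
        return None  # the segment does not overlap the sample at all
    # Assumption: overflow is the part of the segment outside the sample.
    overflow = segment_area - overlap_area
    for ssm, limit in sorted(SSM_OVERFLOW_LIMITS.items()):
        if overflow <= limit * overlap_area + tol:
            return ssm
    return None


def is_sample_segment(segment_area: float, overlap_area: float, ssm: int) -> bool:
    """True if the segment is accepted as a sample-segment under the given SSM."""
    strictest = strictest_ssm(segment_area, overlap_area)
    return strictest is not None and strictest <= ssm
```

For example, a segment of area 10.5 that shares 10.0 area units with a sample overflows by 5% of the overlap, so under these assumptions it would be rejected by SSM 1 but accepted by SSM 2 and SSM 3.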
Many and various object features, which are based on spectral information, shape, texture, geometry, and contextual semantic knowledge, can be selected to classify image objects after the segmentation step is implemented in OBIA. In total, fourteen variables (mean values of band layers, maximum difference, brightness, GLCM (gray-level co-occurrence matrix) derivatives, and shape index) were used in this study as significant object features in LULC classifications [21,58,59] (Table 5). Each GLCM measure was calculated using the mean value of the red, green, blue, and NIR band values.

Segmentation Accuracy
As the first step of segmentation accuracy analysis, the OS, AFI, and QR assessment criteria were evaluated (Figure 8a-c). It can be clearly seen that their distributions showed almost the same trends. To reduce the random error, the RMSE criterion proposed in this study was used as a combination of these three assessments. Figure 8a-d illustrates that RMSE resulted in low accuracies for SSM 1 when compared to SSM 2 and SSM 3. As indicated before, this emerged from the inadequate representation of samples by SSM 1. Also, the effects of the shape and compactness criteria and of the weighted bands on segment accuracy could not be sufficiently distinguished in SSM 1.

When SSM 2 and SSM 3 were compared, it could be stated that SSM 3 produced higher accuracies with an average of 6% OS, 8% AFI, 5% QR, and 6% RMSE. On the other hand, when the value changes in shape, compactness, and the weighted bands were considered, similar behaviors in accuracy were observed in SSM 2 and SSM 3. For both methods, lower RMSE values were obtained with shape values of 0.5. When the compactness changes were examined individually, higher shape values needed higher compactness values in order to produce low RMSE values. Although band weighting affected RMSE only slightly, the weighted NIR band caused a moderate reduction in accuracy. In particular, the equally weighted and the red-NIR-weighted bands achieved high accuracy with segmentation parameter S01, which produces small segment sizes, whereas blue-weighted band segmentation was found to be the most accurate in S05 and S09.

Classification Accuracy
As shown in Table 6, 257 out of 324 classifications were successfully implemented in the study. Due to the considerable increase in segment size at high shape values, SSM 1 provided low sample-segment numbers, which strongly influence the classifications. Moreover, it was observed that SSM 2 also caused failure in 17 classifications. Sample-segments obtained by SSM 3 successfully accomplished the remaining 108 classifications. When considering the number of sample-segments in the OBIA classification, sample-segments for each class were determined properly according to the size of the collected samples and segmentation parameters (Figure 9). Additionally, the number of sample-segments was proportional to the total number of segments of each segmentation attempt. It was also noted that the ratio between the number of sample-segments and the total number of segments of each segmentation remained nearly unchanged. Thus, the ratio, which trivialized the differences in the number of sample-segments, also protected the number of assignments to the training number for each classification.

Classification Accuracy
As shown in Table 6, 257 out of 324 classifications were successfully completed in the study. Because segment size increases considerably at high shape values, SSM 1 yielded few sample-segments, which strongly influenced the classifications. SSM 2 also caused 17 classifications to fail, whereas the sample-segments obtained by SSM 3 successfully completed the remaining 108 classifications. Regarding the number of sample-segments in the OBIA classification, the sample-segments for each class were determined properly according to the size of the collected samples and the segmentation parameters (Figure 9). Additionally, the number of sample-segments was proportional to the total number of segments of each segmentation attempt, and the ratio between the number of sample-segments and the total number of segments of each segmentation remained nearly unchanged. Thus, this ratio, which offset the differences in the number of sample-segments, also kept the proportion of segments assigned to training stable for each classification.
The error matrix, also known as the confusion matrix, is one of the most widely used accuracy assessment techniques in supervised LULC classification [60]. Both the user accuracy and the producer accuracy calculated from the error matrix indicate the degree of consistency between class predictions and validation data. The user accuracies of the SSM 3 classifications shown in Figure 10 indicate that the bare soil class, which had the largest number of sample-segments, achieved almost the highest user accuracy in each error matrix. However, bare soil accuracies decreased in S09-C01, while user accuracies for the concrete class increased. This result indicates that classification becomes more sensitive to compactness and band weights at high shape values. Moreover, heterogeneous land-use objects, such as buildings and roads, were significantly affected by band weighting, particularly at high shape values. This effect was also partly observed in the producer accuracies of the error matrices (Figure 11).
The kappa index derived from the error matrix was used as an accuracy criterion measuring the agreement between predicted and actual values. The kappa values obtained from the random forest classifications are shown in Figure 12, colored by SSM. According to the figure, the higher the shape value of the segmentation, the greater the kappa accuracy of the resulting classification. When the linear regressions in Figure 12 were examined, the slope coefficient for SSM 1 was lower than those for SSM 2 and SSM 3. Most likely, the missing classifications at higher shape values in SSM 1 lowered its slope coefficient. On the other hand, the intercept values showed that SSM 2 and SSM 3 yielded higher kappa values than SSM 1. In addition, the slope coefficients for SSM 2 and SSM 3 were quite close to each other. In brief, the kappa results indicated that SSM 2 and SSM 3 had a similar effect on classification accuracy and led to greater accuracy than SSM 1.
The imbalance in the number of sample-segments among classes was also considered. For example, the buildings, green areas, roads, concrete, and bare soil classes depicted in Figure 9 had different numbers of sample-segments in the implemented classifications; these are referred to as unequally-sampled classifications. An equal number of sample-segments was then randomly selected for each class, and the classifications were repeated. The kappa indices computed from the equally-sampled classifications were compared with those of the previous unequally-sampled classifications produced using SSM 3. Although the equally-sampled classifications achieved lower accuracies than the unequally-sampled ones (Figure 13), both types showed a generally similar trend in kappa accuracy as the shape parameter value increased. The accuracies also converged at higher shape values, except for some band weightings. Moreover, the segmentation parameters caused major accuracy changes among the equally-sampled classifications.
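As a minimal sketch of how the user accuracy, producer accuracy, and kappa index discussed above can be derived from an error matrix, the following Python function may be helpful. The function name, the convention that rows hold predicted classes and columns hold reference classes, and the example counts are illustrative assumptions and are not taken from the study.

```python
def accuracy_measures(matrix, classes):
    """Derive user accuracy, producer accuracy, and kappa from an error matrix.

    Assumed convention: matrix[i][j] counts validation units predicted as
    classes[i] whose reference label is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("error matrix is empty")

    row_totals = [sum(matrix[i]) for i in range(n)]                       # predicted totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference totals
    diagonal = [matrix[i][i] for i in range(n)]                           # correct predictions
    nan = float("nan")

    # User accuracy: correct predictions divided by everything predicted as that class.
    user = {c: (diagonal[i] / row_totals[i] if row_totals[i] else nan)
            for i, c in enumerate(classes)}
    # Producer accuracy: correct predictions divided by all reference units of that class.
    producer = {c: (diagonal[i] / col_totals[i] if col_totals[i] else nan)
                for i, c in enumerate(classes)}

    # Kappa: observed agreement corrected for the agreement expected by chance.
    observed = sum(diagonal) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else nan
    return user, producer, kappa


# Hypothetical two-class example (counts are invented for illustration only).
example = [[40, 10],
           [5, 45]]
user_acc, producer_acc, kappa_value = accuracy_measures(example, ["bare soil", "concrete"])
```

With the hypothetical counts above, the observed agreement is 0.85 and the chance agreement is 0.50, so kappa is about 0.70.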
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 19 of 28

Variable Importance
In random forest classification, mean decrease accuracy (MDA) and the Gini index are the two most widely used measures of variable importance [57,61,62]. In this study, MDA, which evaluates and ranks the effect of each variable on classification accuracy, was used to interpret the relation between the segmentation criteria and the classification variables. For each classification, the three most important of the fourteen variables (Table 5) were determined. Furthermore, 254 classifications were categorized by their segmentation criteria, and the percentage of appearances of each variable was computed (Figure 14).
As shown in Figure 14a-c, RMean and BMean were the most influential variables in these classifications. The Brightness, NIRMean, GLCMHom, GLCMCon, and GLCMStd variables had a moderate impact. In particular, increasing the shape value raised the importance of the Brightness and GLCMHom variables, whereas the importance of the NIRMean, GLCMCon, and GLCMStd variables dropped sharply (Figure 14a). Because compactness and band weighting did not influence segmentation as much as shape did, these criteria did not have a consistent effect on variable importance (Figure 14b,c). Furthermore, the Brightness variable had a significant impact in OBIA LULC classification based on NIR-weighted segmentation.
Possible bias in MDA due to correlated predictor variables, such as textural measures, was also considered. Thus, a conditional permutation importance measure, which can evaluate the importance of correlated predictor variables, was calculated using the R package party [63,64]. The results showed that the random forest implementation based on conditional inference trees generally produced results consistent with those of the R package randomForest [45,57]. Only minor changes occurred in the three most important variables when conditional variable importance was used. It is also recommended that conditional variable importance be taken into account in any more detailed discussion of uncorrelated variable importance.
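The tally of the three most important variables described above can be sketched as follows. This is only an illustration: the function name, the input layout (one dictionary of MDA scores per classification), and the example scores are hypothetical, and the percentage is assumed to be the share of classifications in which a variable appears among the top three.

```python
from collections import Counter


def top_k_appearance(importances_per_run, k=3):
    """Percentage of classifications in which each variable ranks in the top k.

    importances_per_run: list of dicts mapping variable name -> MDA score,
    one dict per classification (hypothetical input layout).
    """
    if not importances_per_run:
        raise ValueError("at least one classification is required")
    counts = Counter()
    for scores in importances_per_run:
        # Sort variable names by descending importance and keep the first k.
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top)
    n_runs = len(importances_per_run)
    return {variable: 100.0 * c / n_runs for variable, c in counts.items()}


# Invented scores for two classifications, for illustration only.
runs = [
    {"RMean": 0.30, "BMean": 0.20, "Brightness": 0.10, "NIRMean": 0.05},
    {"RMean": 0.25, "NIRMean": 0.20, "BMean": 0.15, "Brightness": 0.01},
]
appearance = top_k_appearance(runs)
```

Grouping the input by shape, compactness, or band weighting before calling the function would give one set of percentages per segmentation criterion, in the spirit of Figure 14a-c.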

Classification Results
Figure 15 presents classification maps for various SSMs and segmentation criteria. In particular, buildings and the shadows they cast on the ground could not be separated into different classes using SSM 1 (Figure 15a) because the segments did not adequately represent the samples. Maps produced from SSM 2 (Figure 15b) and SSM 3 (Figure 15c) were quite similar, whereas the first row of Figure 15 shows differences in classification among SSMs. The effect of increasing the shape value on the classification maps can be seen by comparing Figure 15c and Figure 15d. Objects with larger areas were appropriately classified, while random errors occurred for small objects because of the high resolution. On the other hand, increasing compactness caused a salt-and-pepper effect, as depicted in Figure 15e. Furthermore, changes in the band weighting made some object types more dominant in the classification; for example, NIR weighting favored the building class.
The influence of the absolute sample size on class assignment in the classification predictions was examined using Figure 16. Substantial differences in class assignment occurred between unequally-sampled and equally-sampled classifications at low shape values. Positive and negative changes were observed especially between the concrete and road classes. A positive change means more assignments to a class under equal sampling, whereas a negative change means more assignments under unequal sampling. In this context, bare soil assignments were not particularly affected, although that class had more sample-segments than the others. On the other hand, segments were mostly predicted as roads under unequal sampling, even though that class had a moderate number of sample-segments compared to other classes. Moreover, concrete was the most affected class because of its lower number of sample-segments. The results showed that the number of sample-segments was not the only factor determining OBIA classification and class assignments.
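The comparison between unequally-sampled and equally-sampled classifications can be illustrated with a short sketch. Both helper functions below are hypothetical: the study does not state how many sample-segments were drawn per class or how the percentage change was normalized, so the sketch assumes that each class is reduced to the size of the smallest class and that the change is expressed in percentage points of all classified segments, with the sign convention described above (positive means more assignments under equal sampling).

```python
import random
from collections import Counter


def balanced_subsample(samples_by_class, seed=0):
    """Randomly draw the same number of sample-segments for every class.

    Assumption: the common size is that of the smallest class.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    size = min(len(items) for items in samples_by_class.values())
    return {label: rng.sample(items, size) for label, items in samples_by_class.items()}


def assignment_change(pred_unequal, pred_equal):
    """Change in class assignment, in percentage points of all segments.

    Positive values: more segments assigned to the class under equal sampling.
    Negative values: more segments assigned under unequal sampling.
    """
    if not pred_equal or len(pred_unequal) != len(pred_equal):
        raise ValueError("both prediction lists must be non-empty and of equal length")
    n = len(pred_equal)
    unequal_counts, equal_counts = Counter(pred_unequal), Counter(pred_equal)
    labels = sorted(set(unequal_counts) | set(equal_counts))
    return {label: 100.0 * (equal_counts[label] - unequal_counts[label]) / n
            for label in labels}


# Invented sample-segment identifiers and predictions, for illustration only.
samples = {"road": list(range(10)), "concrete": list(range(100, 103))}
balanced = balanced_subsample(samples)
change = assignment_change(["road", "road", "road", "concrete"],
                           ["road", "road", "concrete", "concrete"])
```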

Comparison between Segmentation and Classification Accuracies
In Figure 17, segmentation RMSE values and classification kappa values were compared to examine the relationship between segmentation accuracy and classification accuracy. Since lower RMSE and higher kappa values indicate better accuracy for segmentation and classification, respectively, 1-RMSE is plotted in Figure 17a-c, which illustrate SSM 1, 2, and 3, respectively. To make the comparison between RMSE and kappa easier to read, each plot is annotated with boxes marking shape and compactness combinations such as S01-C01, S01-C05, and S01-C09. In Figure 17a, which represents the classifications obtained using SSM 1, S05-C01 and S05-C05 stood out for the relative similarity between their RMSE and kappa accuracy values. In S01-C05, a partial consistency between RMSE and kappa accuracies was also found, although it was not as clear as in S05-C01 and S05-C05. On the other hand, graphical conformity between segmentation and classification accuracy values was seen for S01-C05 and S05-C05 in Figure 17b. Figure 17c mostly shows coherence between RMSE and kappa values, except for all combinations of S09, along with some of the resemblances detected in Figure 17a,b. In particular, closer conformity between RMSE and kappa values stands out for both S05-C01 and S05-C05 in Figure 17c. Furthermore, Euclidean distances (ED) for each SSM, shape and compactness value, and weighted band are given at the bottom-right of the related figure, and the EDs for each highlighted shape and compactness group are given in the boxes. The ED was computed by Equation (11), where $x$ represents 1-RMSE, $y$ represents kappa, and $\bar{x}$ and $\bar{y}$ are the means of 1-RMSE and kappa, respectively. The ED values also supported the conformities mentioned above.
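Equation (11) is not reproduced in this section, so the following Python sketch should be read only as one possible interpretation of the description above and not as the formula used in the study. It assumes that both series are centered on their own means and that the ED is the Euclidean norm of the difference between the centered series; the function name and example values are hypothetical.

```python
import math


def centered_euclidean_distance(one_minus_rmse, kappa):
    """Euclidean distance between mean-centered 1-RMSE and kappa series.

    Assumed form (not confirmed by the source):
        ED = sqrt(sum(((x_i - mean(x)) - (y_i - mean(y))) ** 2))
    A small ED means the two series rise and fall together.
    """
    if not one_minus_rmse or len(one_minus_rmse) != len(kappa):
        raise ValueError("both series must be non-empty and of equal length")
    x_mean = sum(one_minus_rmse) / len(one_minus_rmse)
    y_mean = sum(kappa) / len(kappa)
    squared = sum(((x - x_mean) - (y - y_mean)) ** 2
                  for x, y in zip(one_minus_rmse, kappa))
    return math.sqrt(squared)


# Invented values: two series with the same pattern but a constant offset.
ed_parallel = centered_euclidean_distance([0.6, 0.7, 0.8], [0.5, 0.6, 0.7])
# Invented values: two series with opposite patterns.
ed_opposite = centered_euclidean_distance([0.0, 1.0], [1.0, 0.0])
```

Under this assumed form, a constant offset between the two series does not contribute to the distance, which matches the idea of comparing the shapes of the 1-RMSE and kappa curves and not their absolute levels.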

Conclusions
In this study, the accuracies of multi-resolution segmentation and classification based on different shape, color, compactness, and band-weight settings were analyzed. The proposed RMSE, defined in this study as a combination of three substantially different accuracy measures, provided a more objective and comprehensive assessment of segmentation accuracy. Weighting only the NIR band reduced the accuracy of the segmentation. However, owing to the high importance of the NIR band in the classification, color-infrared images should be used for newly-developed urban areas.
Regarding sample selection for OBIA, sample-segments that overflowed the samples by 20% represented the sample image objects more appropriately. In addition, the ratio of the total number of sample-segments to the total number of segments should not be less than 0.5% for an accurate classification.
In general, a clear correlation between segmentation and classification accuracies can be stated. However, the largest deviations are expected in classifications derived using high shape values. The compactness value had a greater effect when it was used with higher shape values. The kappa indices also indicate that high compactness should be selected for low shape values and low compactness for high shape values.
In the classification attempts, some segmentation parameters highlighted the objects of certain classes while partly neglecting objects in other classes. For example, classifications using red band-weighted segmentation led to some object loss in the building class. When all LULC categories are classified using high-resolution infrared aerial images, S05-C05 is recommended as the segmentation parameter set and SSM 3 as the sample selection method.
In future work, LULC sub-classes can be considered to highlight the effect of different band combinations on various objects, as the present study covered only five classes. Classifications using high-resolution satellite images with varied spectral bands and ranges should be compared to understand the broader relationship between segmentation and object-based classification. Segmentation accuracy assessment methods should also be examined so that they better reflect the needs of object-based classification. The impact of different overflow values in sample selection should be considered in terms of image-object representation.

Figure 4. Number of segments according to segmentation parameters.

Figure 6. Defining training segments using different criteria.

LULC classification $C$ can be expressed as $C = \{c_1, c_2, \ldots, c_m\}$, where $c_i$ indicates the $i$th class of $C$ ($1 \le i \le m$) and $m$ indicates the total number of classes in $C$. Training samples $L$, defined as representative of any class $c_i$, can be written as $L_i = \{l_{i1}, l_{i2}, \ldots, l_{in}\}$, where $l_{ij}$ indicates the $j$th sample of class $c_i$. Moreover, the labeled training segments $S$ are given by $S_j = \{s_{j1}, s_{j2}, \ldots, s_{jk}\}$, where $k$ indicates the total number of overlapping segments satisfying the criterion with sample $l_{ij}$ (Figure 7). With three SSMs applied to each of the 108 segmentations, 324 random forest classifications (3 × 108) were expected in this study.
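The selection of training segments that satisfy an overlap criterion with a sample can be sketched as below. The three criteria actually used are listed in Table 4, which is not reproduced here, so this example assumes a single hypothetical criterion: a segment is kept when at least a given fraction of its pixels lies inside the sample. The function name, the set-of-pixel-indices representation, and the default threshold of 0.8 are illustrative assumptions.

```python
def select_training_segments(sample_pixels, segments, min_inside=0.8):
    """Return the ids of segments that sufficiently overlap one sample.

    sample_pixels: set of pixel indices covered by the sample l_ij.
    segments: dict mapping segment id -> set of pixel indices.
    min_inside: hypothetical criterion, the minimum fraction of a segment's
        pixels that must fall inside the sample.
    """
    selected = []
    for segment_id, segment_pixels in segments.items():
        if not segment_pixels:
            continue  # skip empty segments to avoid division by zero
        inside = len(segment_pixels & sample_pixels) / len(segment_pixels)
        if inside >= min_inside:
            selected.append(segment_id)
    return sorted(selected)


# Invented pixel indices, for illustration only.
sample = set(range(10))
candidate_segments = {
    "a": {0, 1, 2},          # fully inside the sample
    "b": {8, 9, 10, 11},     # half inside
    "c": {5, 6, 7, 8, 10},   # four of five pixels inside
}
training_ids = select_training_segments(sample, candidate_segments)
```

In a real workflow the overlap would more likely be computed from polygon areas in a GIS library; pixel sets are used here only to keep the sketch self-contained.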

Figure 9. Number of sample-segments for each class in classifications based on SSM 3.

Figure 10. User accuracies for each class in classifications based on SSM 3.

Figure 11. Producer accuracies for each class in classifications based on SSM 3.

Figure 12. Kappa values according to segmentation numbers.

Figure 13. Comparison of classification accuracies of equal and unequal number of sample-segment classifications based on SSM 3.

Figure 14. Mean decrease accuracy for variables.

Figure 16. Percentage of change in class assignment in classifications based on SSM 3.

Table 1. Microsoft UltraCam Eagle photogrammetric large-format digital aerial camera specification.

Table 2. Mosaic image features obtained using photogrammetric process.

Table 3. Produced segmentations using different parameters.

Table 4. Three criteria for training segment definition.

Table 6. Number of random forest classifications.