#### 3.1. Sampling Analysis

Figure 5 and Figure 6 show how the p-value between segments with similar land covers (two black roofs, Figure 5) and segments with different land covers (a black roof and a grass patch, Figure 6) is influenced by the number of sampled pixels when the Kolmogorov-Smirnov statistics and the Student’s t-test statistics are used. Pixel sets of different sizes were sampled 100 times in each segment. As shown later, with this number of samplings (100), the p-value becomes stable and does not vary significantly.

The comparison of Figure 5 and Figure 6 shows that the p-value converged to zero as the number of sampled pixels increased. The drop was very fast when the Kolmogorov-Smirnov statistics were applied. As defined in Section 2.4, both proposed classifiers are based on proximity or similarity analysis. The starting premise is that no two segments are identical. Instead, segments should be considered as continuous variables for which the perspective plays a significant role: the closer we look (i.e., the larger the sample size), the smaller the difference needed to claim that the segments are not derived from the same population, even though the two empirical cumulative distribution functions would appear almost identical when plotted. Therefore, a sufficiently large sample would lead to the rejection of the hypothesis if hypothesis testing were applied. These conclusions could also reflect the equations used for the p-value computation, which are better suited to small and medium-sized samples. However, we also tested the asymptotic functions suitable for larger samples with the two-sample Kolmogorov-Smirnov statistic, and these yielded p-values of a similar magnitude.
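
The dependence of the p-value on the sample size can be illustrated with a minimal sketch. It assumes the textbook asymptotic series for the two-sample Kolmogorov-Smirnov p-value, which is not necessarily the set of equations used in this study; the function names `ks_statistic` and `ks_pvalue_asymptotic` are hypothetical. For a fixed distance between two empirical distribution functions, the p-value shrinks towards zero as more pixels are sampled.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every observation equal to x so that ties are handled together.
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic p-value 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2),
    with lambda = d * sqrt(n * m / (n + m)). Assumption: plain textbook form,
    without small-sample corrections."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here and the p-value is 1 to many digits.
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < 1e-12:
            break
    return min(1.0, max(0.0, 2.0 * total))


# Same distance D = 0.1 between the distribution functions, growing sample size.
for size in (10, 20, 50, 100, 200, 500, 1000):
    print(size, ks_pvalue_asymptotic(0.1, size, size))
```

The loop keeps D constant on purpose: only the number of sampled pixels changes, which isolates the effect described in the paragraph above.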

The results shown in Figure 5 and Figure 6 are summarized in Figure 7, which illustrates the difference in the p-value in individual bands for the two-sample Kolmogorov-Smirnov statistics (Figure 7a) and for the Student’s t-test statistics (Figure 7b) for the example of 10 sampled pixels. Both classifiers clearly indicated that the largest differentiation between different land covers (a black roof and a grass patch) was seen in the Red-Edge, Near Infrared 1 and Near Infrared 2 bands. In these bands, the calculated p-value was negligible (almost equal to zero), which suggests that the two observed segments were actually different and that infrared bands could contribute useful separability information in the classification process. The behavior of the p-values in the Red band (the p-value was higher for segments with different land covers) indicated a possible misclassification if only this band was used for classification purposes. The results were also summarized for cases in which higher numbers of pixels (20, 50, 100, 200, 500, and 1000) were sampled. As the number of sampled pixels increased, the Coastal and Green bands became increasingly important in the differentiation with the Kolmogorov-Smirnov test statistics, as the p-values in the Red-Edge, Near Infrared 1 and Near Infrared 2 bands converged towards zero very fast (with 50 sampled pixels). The Red band became unreliable for classification purposes. With the Student’s t-test statistics, the Yellow, Red and Red-Edge bands were the highest contributors to the separation of different segments as the number of sampled pixels increased.

**Figure 5.**
The per-band p-value computation between segments with similar land covers (two black roofs) when different sizes of sampled pixel sets were applied and when: (**a**) the two-sample Kolmogorov-Smirnov statistics; (**b**) the Student’s t-test statistics was used.

**Figure 6.**
The per-band p-value computation between segments with different land covers (a black roof and a grass patch) when different sizes of sampled pixel sets were applied and when: (**a**) the two-sample Kolmogorov-Smirnov statistics; (**b**) the Student’s t-test statistics was used.

Figure 7 represents the results for only the three analyzed segments (two black roofs and a grass patch). In order to provide more generalized and objective conclusions on the contribution of individual bands to the classification results, a larger number of segments with different land covers should be selected and compared.

Finally, in order to study the effect of the number of samplings on the p-value, a sample size of 10 pixels was chosen. We selected this small sample size because the p-value was the highest when the minimum number of pixels was sampled (see Figure 5 and Figure 6). However, it was assumed that with a small sampled pixel set, the p-value could easily be affected by outliers. The question arose as to how many times a set of pixels needs to be sampled in order for the p-value to become stable. The results are presented in Figure 8 and Figure 9.

With both classifiers, the p-value stabilized when approximately 100 samplings were performed. The sampling analysis also included the comparison of two small segments (i.e., segments with fewer than 150 pixels) and of mixed-size segments (i.e., one small and one large segment). The results were consistent with those for the large segments.
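
The repeated-sampling experiment can be sketched as follows. This is a minimal illustration and not the study's IDL implementation: it assumes a pooled-variance Student’s t-test, obtains the p-value by numerically integrating the t density, and assumes that the per-sampling p-values are aggregated by a running mean (the aggregation rule is not stated in this section). All function names and the synthetic pixel values are hypothetical.

```python
import math
import random
import statistics


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_norm) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_pvalue(t, df):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    intervals = 2 * max(1000, math.ceil(upper * 50))  # even count, step <= 0.01
    h = upper / intervals
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


def student_t_pvalue(sample_a, sample_b):
    """Pooled-variance two-sample t-test (assumption: equal variances)."""
    n, m = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    pooled = ((n - 1) * statistics.variance(sample_a)
              + (m - 1) * statistics.variance(sample_b)) / (n + m - 2)
    if pooled == 0:
        return 1.0 if mean_a == mean_b else 0.0
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / n + 1 / m))
    return two_sided_t_pvalue(t, n + m - 2)


def running_mean_pvalues(segment_a, segment_b, sample_size=10, n_samplings=100, seed=0):
    """Mean p-value after 1, 2, ..., n_samplings random draws of sample_size pixels."""
    rng = random.Random(seed)
    running, total = [], 0.0
    for k in range(1, n_samplings + 1):
        a = rng.sample(segment_a, sample_size)
        b = rng.sample(segment_b, sample_size)
        total += student_t_pvalue(a, b)
        running.append(total / k)
    return running


# Synthetic single-band pixel values standing in for two similar segments.
data_rng = random.Random(1)
roof_a = [data_rng.gauss(50, 5) for _ in range(300)]
roof_b = [data_rng.gauss(50, 5) for _ in range(300)]
curve = running_mean_pvalues(roof_a, roof_b)
print(curve[0], curve[9], curve[99])
```

Plotting `curve` against the number of samplings gives the kind of stabilization curve discussed above; with real data, one such curve would be produced per band.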

**Figure 7.**
The difference in the p-value per individual band for 10 sampled pixels when: (**a**) the Kolmogorov-Smirnov statistics (Figure 5a and Figure 6a combined); (**b**) the Student’s t-test statistics was applied (Figure 5b and Figure 6b combined).

A substantial difference between the sizes of two compared samples has been a subject of research in the field of hypothesis testing. Gordon and Klebanov [22] proved that a paradoxical situation takes place when the two-sample Kolmogorov-Smirnov test is used: one cannot use the additional information contained in a very large sample if the second sample is relatively small, meaning that the two-sample Kolmogorov-Smirnov test can lose power when the sizes of the two samples differ substantially.
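
One way to see why the smaller sample dominates, offered here only as an illustration and not as the argument of [22], is the scaling factor of the asymptotic two-sample Kolmogorov-Smirnov p-value. With sample sizes $n$ and $m$ and test statistic $D_{n,m}$ (symbols introduced for this illustration), the asymptotic p-value depends on the data only through $\lambda$:

```latex
\lambda = D_{n,m}\sqrt{\frac{nm}{n+m}}, \qquad
\frac{nm}{n+m} < \min(n, m), \qquad
\lim_{m \to \infty} \frac{nm}{n+m} = n .
```

For $n = 10$, the effective size $nm/(n+m)$ is 5 for $m = 10$, about 9.1 for $m = 100$ and about 9.9 for $m = 1000$, so enlarging only one of the two samples adds very little.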

To summarize, 10 pixels were sampled 100 times for each segment with both the Kolmogorov-Smirnov and the Student’s t-test classifiers. This ensured that the two samples were always of the same size and enabled the inclusion of very small segments. Segments with fewer than 10 pixels, i.e., 23 segments in total, were left unclassified. Due to their small size, they had no visual effect on the final classification image.

**Figure 8.**
The per-band p-value computation between segments with similar land covers (two black roofs) when different numbers of 10 pixel samples were applied and when: (**a**) the two-sample Kolmogorov-Smirnov statistics; (**b**) the Student’s t-test statistics was used.

**Figure 9.**
The per-band p-value computation between segments with different land covers (a black roof and a grass patch) when different numbers of 10 pixel samples were applied and when: (**a**) the two-sample Kolmogorov-Smirnov statistics; (**b**) the Student’s t-test statistics was used.

#### 3.2. Classification Results and Accuracy Assessment

The final classification results and the basic accuracy assessment of the four used classifiers are summarized in Table 3 and Table 4. Several accuracy measures, which give direct and basic indications of the classification quality, can be applied in the evaluation process [23]. In our study, user and producer accuracies per class and mean user and producer accuracies (in bold) were adopted and provided.

Table 3 presents a confusion matrix for all four classifiers for the case in which the k-NN and SVM classification methods were conducted with four spectral attributes, while Table 4 provides additional performance results for the case in which 22 attributes (4 spectral, 4 texture and 14 spatial) were used with the k-NN and SVM classifiers. In the latter case, the results served to study the effect that additional attributes have on the classification performance. The basic classification capability was estimated by examining the number of segments classified as the addressed class with regard to the reference data and the number of misclassified segments for both the 4-band and the 8-band image. None of the four methods produced mean user and producer accuracies that would exceed 90%.
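
As a worked example of how the accuracy values follow from the counts, the sketch below recomputes the figures for the k-NN classifier and the 4-band image in Table 3. It follows the labeling used in the tables, where the "User accuracy (%)" row is the diagonal count divided by the row total and the "Producer Accuracy (%)" column is the diagonal count divided by the column total given in the "Total" row. The function name `table_accuracies` is hypothetical.

```python
def table_accuracies(matrix):
    """Return (row_wise, column_wise) accuracies in percent for a square
    confusion matrix: the diagonal count over the row total and the diagonal
    count over the column total, respectively."""
    size = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    row_wise = [100.0 * matrix[i][i] / row_totals[i] for i in range(size)]
    col_wise = [100.0 * matrix[i][i] / col_totals[i] for i in range(size)]
    return row_wise, col_wise


# k-NN, 4-band image (Table 3); class order: roads, building, trees + grass.
knn_4band = [
    [43, 20, 4],
    [29, 190, 38],
    [0, 7, 169],
]

user_row, producer_col = table_accuracies(knn_4band)
print([round(v, 1) for v in user_row], round(sum(user_row) / 3, 1))
print([round(v, 1) for v in producer_col], round(sum(producer_col) / 3, 1))
```

The bold means in the tables are the unweighted averages of the three per-class values, which is what the last two lines compute.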

Moderate classification accuracies were the result of a single-level segmentation, which can never ensure a clear delineation of objects and boundaries as seen in reality. Therefore, the segments appeared over- and under-segmented, which led to a misinterpretation of their true land cover. This can be observed with all four classifiers by reading the values in the columns for the trees and grass classes, where mainly buildings have been misassigned to this class. This is a result of over-segmentation, in which a building and a tree appeared within the same segment on numerous occasions.

The classification results for the 4-band image in Table 3 reveal that, overall, higher mean user and producer accuracies were achieved with the Kolmogorov-Smirnov and Student’s t-test classifiers (when the sampling approach was adopted) than with the k-NN and SVM methods. These results show the potential of the in-segment analysis for the classification process. Amongst the four methods, the parametric Student’s t-test classifier resulted in the best overall performance, which suggests that the normality violation is not an issue when small pixel sets are sampled. The sampling of small pixel sets had a greater effect on the computation of the empirical cumulative distribution functions. With a sample consisting of a mere 10 pixels, the in-segment distribution and its unique characteristics were inefficiently captured due to the large generalization, which resulted in the reduced classification capability of the two-sample Kolmogorov-Smirnov test statistics. Thus, empirical distribution functions should be approached in a different way.

Both proposed classifiers performed best for the classification of roads, grass and trees. Using merely spectral information resulted in a poor classification of objects from different classes built from the same material (i.e., roads and buildings with grey roofs). Thus, all four classifiers most commonly misclassified buildings and roads. This suggests that the two proposed classifiers would also be ineffective for classifying 3-band RGB aerial images. The SVM classifier outperformed the remaining three methods when detecting buildings; however, its performance was very poor for classifying roads. Six training samples were included in the roads class. The poor performance of the SVM classifier might have been caused by the large, yet spectrally relatively homogeneous training samples (no margin training samples were provided). The SVM method is otherwise known to achieve high classification results with small training sets [19].

It is insufficient to rely merely on the spectral information in supervised classification when training samples are not representative or individual classes do not show a great variety of different objects. This issue was most evident in both proposed classifiers. Their characteristic is to find the best match for the analyzed segment among the available training samples (the all-against-one approach), making them highly dependent on the number and variety of training samples. If an insufficient number of samples is provided, both classifiers will still provide a match and assign a class to a segment; however, this class might not be the real one.
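
The best-match behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: similarity is scored by the mean two-sample Kolmogorov-Smirnov statistic over bands and samplings (smaller means more similar), whereas the study works with p-values; the function names, band names and synthetic pixel values are hypothetical. The sketch also shows why a class is always assigned, however poor the best match is, and why segments with fewer than 10 pixels remain unclassified.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def classify_segment(segment, training, sample_size=10, n_samplings=100, seed=0):
    """segment: dict mapping band name -> list of pixel values.
    training: list of (label, segment-like dict), each with at least
    sample_size pixels per band. Returns the label of the closest training
    sample, or None when the segment is too small to be sampled."""
    rng = random.Random(seed)
    bands = list(segment)
    if len(segment[bands[0]]) < sample_size:
        return None  # left unclassified
    best_label, best_score = None, float("inf")
    for label, reference in training:
        total = 0.0
        for _ in range(n_samplings):
            for band in bands:
                a = rng.sample(segment[band], sample_size)
                b = rng.sample(reference[band], sample_size)
                total += ks_statistic(a, b)
        score = total / (n_samplings * len(bands))
        if score < best_score:  # assumption: smallest mean D is the best match
            best_label, best_score = label, score
    return best_label


training = [
    ("black roof", {"red": list(range(10, 40)), "nir1": list(range(10, 40))}),
    ("grass", {"red": list(range(20, 50)), "nir1": list(range(100, 130))}),
]
unknown = {"red": list(range(25, 45)), "nir1": list(range(105, 125))}
print(classify_segment(unknown, training))
```

Because the function always returns the closest training sample, a segment of a land cover that is absent from the training set would still receive one of the available labels, which is the dependence on training-set variety noted above.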

**Table 3.**
A confusion matrix for all four classifiers for the 4- and 8-band input image in the event when k-NN and SVM classification methods were conducted with four spectral attributes. The user and producer accuracy values in bold represent the mean overall user and producer accuracy.


**k-Nearest Neighbor**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **43** | 20 | 4 | 67 | 59.7 | **48** | 18 | 1 | 67 | 57.8 |
| Building | 29 | **190** | 38 | 257 | 87.6 | 32 | **195** | 30 | 257 | 82.2 |
| Trees + grass | 0 | 7 | **169** | 176 | 80.1 | 3 | 24 | **149** | 176 | 82.8 |
| Total | 72 | 217 | 211 | 500 | **75.8** | 83 | 237 | 180 | 500 | **74.3** |
| User accuracy (%) | 64.2 | 73.9 | 96.0 | **78.0** | | 71.6 | 75.9 | 84.6 | **77.4** | |

**Support Vector Machine**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **22** | 41 | 4 | 67 | 59.5 | **23** | 18 | 1 | 67 | 62.2 |
| Building | 15 | **210** | 32 | 257 | 83.3 | 13 | **210** | 34 | 257 | 90.5 |
| Trees + grass | 0 | 1 | **175** | 176 | 82.9 | 1 | 4 | **171** | 176 | 83.0 |
| Total | 37 | 252 | 211 | 500 | **75.2** | 37 | 232 | 206 | 500 | **78.6** |
| User accuracy (%) | 32.8 | 81.7 | 99.4 | **71.3** | | 34.3 | 81.7 | 97.1 | **71.0** | |

**Two-Sample Kolmogorov-Smirnov Test Statistics Classifier**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **52** | 14 | 1 | 67 | 61.1 | **52** | 14 | 1 | 67 | 59.0 |
| Building | 33 | **179** | 45 | 257 | 92.7 | 36 | **181** | 40 | 257 | 92.8 |
| Trees + grass | 0 | 0 | **176** | 176 | 79.3 | 0 | 0 | **176** | 176 | 81.1 |
| Total | 85 | 193 | 222 | 500 | **77.7** | 88 | 195 | 217 | 500 | **77.6** |
| User accuracy (%) | 77.6 | 69.6 | 100 | **82.4** | | 77.6 | 70.4 | 100 | **82.7** | |

**Student’s t-Test Statistics Classifier**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **52** | 14 | 1 | 67 | 56.5 | **52** | 14 | 1 | 67 | 59.1 |
| Building | 40 | **194** | 23 | 257 | 93.3 | 36 | **196** | 25 | 257 | 93.3 |
| Trees + grass | 0 | 0 | **176** | 176 | 88.0 | 0 | 0 | **176** | 176 | 87.1 |
| Total | 92 | 208 | 200 | 500 | **79.3** | 88 | 210 | 202 | 500 | **79.8** |
| User accuracy (%) | 77.6 | 75.5 | 100 | **84.4** | | 77.6 | 75.5 | 100 | **84.4** | |

**Table 4.**
Additional confusion matrix for k-NN and SVM classifiers; conducted with 22 attributes (4 spectral, 4 texture and 14 spatial attributes).


**k-Nearest Neighbor**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **40** | 20 | 7 | 67 | 68.9 | **44** | 20 | 3 | 67 | 74.6 |
| Building | 16 | **202** | 39 | 257 | 83.1 | 14 | **213** | 30 | 257 | 81.9 |
| Trees/grass | 2 | 21 | **153** | 176 | 76.9 | 1 | 27 | **148** | 176 | 81.8 |
| Total | 58 | 243 | 199 | 500 | **76.3** | 59 | 260 | 181 | 500 | **79.4** |
| User accuracy (%) | 59.7 | 78.6 | 86.9 | **75.1** | | 65.7 | 82.9 | 84.1 | **77.6** | |

**Support Vector Machine**

| | **4-Band Image** | | | | | **8-Band Image** | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| **Reference** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** | **Roads** | **Building** | **Trees + Grass** | **Total** | **Producer Accuracy (%)** |
| Roads | **32** | 25 | 10 | 67 | 88.9 | **35** | 30 | 2 | 67 | 87.5 |
| Building | 4 | **226** | 27 | 257 | 87.9 | 5 | **231** | 21 | 257 | 87.1 |
| Trees/grass | 0 | 6 | **170** | 176 | 82.1 | 0 | 4 | **172** | 176 | 88.2 |
| Total | 36 | 257 | 207 | 500 | **86.3** | 40 | 265 | 195 | 500 | **87.6** |
| User accuracy (%) | 47.8 | 87.9 | 96.6 | **77.4** | | 52.2 | 89.9 | 97.7 | **79.9** | |

There were no major differences between the classification results when the 4- or the 8-band images were used. The overall accuracies were only slightly higher (up to 3.5%) with the 8-band image for the k-NN and SVM methods and slightly lower (up to 1%) for the two-sample Kolmogorov-Smirnov and Student’s t-test classifiers. In terms of user accuracy, the classes that benefited the most from the additional bands were roads in the k-NN and SVM classification methods, and buildings in the two-sample Kolmogorov-Smirnov classifier. With the Student’s t-test classifier, user accuracies remained unchanged. While the use of the 8-band WorldView-2 image has been recognized to improve classification results [24], this did not prove significant in our study. As seen in Figure 7, a WorldView-2 image with its additional bands (Coastal, Yellow and Red-Edge) provides high separability between different land covers. However, as suggested in [24], the low positive contribution of the extra bands to classification accuracies could be due to the inability of classifiers to handle the pairwise collinearity of bands, meaning that each of the new bands is correlated with one or more of the standard bands.
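
Pairwise collinearity of bands can be checked with a simple correlation screen. The sketch below is illustrative only: the per-segment band means are invented for the example, the threshold of 0.9 is an arbitrary assumption, and the function names are hypothetical. No such analysis was part of this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(bands, threshold=0.9):
    """Return (band_a, band_b, r) for every band pair with |r| >= threshold."""
    names = list(bands)
    pairs = []
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            r = pearson(bands[first], bands[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Hypothetical per-segment mean values for three bands.
band_means = {
    "green": [10, 20, 30, 40, 50],
    "yellow": [12, 22, 32, 42, 52],
    "nir1": [50, 10, 40, 20, 30],
}
for first, second, r in collinear_pairs(band_means):
    print(first, second, round(r, 3))
```

A new band that appears in such a list together with a standard band carries little independent information for a classifier that cannot exploit the small residual difference.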

After additional attributes were selected for the k-NN and SVM methods, the latter resulted in increased user and producer accuracy (up to 10%), while the performance of the k-NN method did not change significantly (Table 4). An SVM classifier has been reported to perform well with an increased feature dimensionality (number of attributes) [25], but it has also proven to be less sensitive to increases in feature dimensionality compared to other statistical classifiers [26].

Figure 10 represents the classification output images created with the use of the 8-band image. The classification results were not field-verified; thus, the WorldView-2 satellite image served as the ground truth image. A visual inspection and comparison of the classification images and the satellite data indicated that more realistic and accurate maps were obtained with the two proposed classifiers. It can be seen that with the k-NN and SVM classifiers numerous segments were classified as buildings (approximately 60% of all 1544 segments), and this resulted in a generalized output classification image. A detailed comparison of the classification images is provided in Figure 11.

In addition to the presented case study, the four algorithms were used to classify 3338 segments in a second, highly residential study area measuring 0.25 km² and located 1.5 km west of the primary study area. This classification was conducted using a 4-band WorldView-2 image (Red, Green, Blue and NIR1 bands) and a new set of training samples for four land cover classes: roads (6), buildings (9), trees (7) and grass (9). With the k-NN and SVM classification methods, only four spectral attributes were used.

The performance of the classification conducted for the new study area was assessed only visually, i.e., the classification results were compared to the WorldView-2 satellite image, which was used as the reference image. The determination of the reference information for the 3338 segments was omitted since it would be highly subjective. In terms of output classification images, the classification results were consistent with the previously described results. Overall, the Student’s t-test statistics based classifier yielded the most realistic classification image, despite misclassifying several buildings with grey roofs as roads. The k-NN classifier misclassified several roads and other built-up and impervious areas as buildings, and some of the buildings with black roofs as trees or grass. The SVM classifier performed poorly when classifying roads, as it classified (generalized) areas of buildings and roads merely as buildings, similarly to what was described for the primary study area. Compared to the Student’s t-test classifier, the two-sample Kolmogorov-Smirnov test statistics based classifier classified a higher number of buildings with grey roofs as roads. Trees and grass areas were most accurately determined with the two proposed classifiers. With the two-sample Kolmogorov-Smirnov statistics based classifier and the Student’s t-test statistics based classifier, 120 segments (3% of all segments) remained unclassified.

**Figure 10.**
The final classification image for all four classifiers related to the 8-band classification results presented in Table 4: (**a**) k-NN; (**b**) SVM; (**c**) the two-sample Kolmogorov-Smirnov statistics based classifier; (**d**) the Student’s t-test statistics based classifier.

The high computational complexity and the low speed are considered to be the major drawbacks of the proposed classification approach. On average, the Student’s t-test based algorithm needs 1.5 s to classify one segment with a 4-band image and 3 s to classify one segment with an 8-band image. Using the two-sample Kolmogorov-Smirnov test based algorithm, it takes an average of 5 s to classify one segment with a 4-band image and 13 s to classify one segment with an 8-band image. Both drawbacks are a result of the data structure organization of the computed variables in the IDL programming language and could be reduced significantly if the sampling approach and the computation of the empirical distribution functions were altered.
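
To put the per-segment timings in perspective, the following back-of-the-envelope sketch scales them to the 1544 segments of the primary study area. It assumes that the reported averages apply uniformly to every segment, which is a simplification; total run times were not reported in the study, and the names used are hypothetical.

```python
SEGMENTS = 1544  # number of segments in the primary study area, as stated in the text

# Average classification time per segment in seconds, as reported above.
SECONDS_PER_SEGMENT = {
    ("Student's t-test", "4-band"): 1.5,
    ("Student's t-test", "8-band"): 3.0,
    ("Kolmogorov-Smirnov", "4-band"): 5.0,
    ("Kolmogorov-Smirnov", "8-band"): 13.0,
}


def total_hours(seconds_per_segment, n_segments):
    """Estimated total run time in hours, assuming a constant per-segment cost."""
    return seconds_per_segment * n_segments / 3600.0


for (classifier, image), seconds in SECONDS_PER_SEGMENT.items():
    print(f"{classifier:20s} {image}: {total_hours(seconds, SEGMENTS):.2f} h")
```

Under this assumption, the estimated totals range from roughly 0.6 h (Student’s t-test, 4-band image) to roughly 5.6 h (Kolmogorov-Smirnov, 8-band image).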

Only a few approaches in the literature have been found to deal with the impact that the in-segment investigation has on object-based classification results. Toure et al. [11] reported that curve matching classifiers yielded higher classification accuracies than the nearest neighbor classifier; however, the overall accuracies of all classifiers were still moderate (between 27% and 73%). Sridharan and Qiu [8] compared the effectiveness of the Fuzzy Kolmogorov-Smirnov classifier to six other classifiers (k-NN, SVM, Spectral Angle Mapper, Maximum Likelihood classifier, Minimum Distance to Mean classifier and Parallelepiped classifier). The Fuzzy Kolmogorov-Smirnov classifier performed substantially better than the remaining classifiers. Its overall accuracy was higher by at least 10%.

**Figure 11.**
The detailed subset of classification images represented in Figure 10 in relation to the satellite image that was used as an input for the classification process and was also considered as the ground truth image: (**a**) pan-sharpened WorldView-2 image (shown as a false color composite); (**b**) k-NN classifier; (**c**) SVM classifier; (**d**) the two-sample Kolmogorov-Smirnov statistics based classifier; (**e**) the Student’s t-test statistics based classifier. Images reveal higher classification accuracies of the two proposed classifiers compared to the ground truth image.

The performance of the proposed classification approach was not compared to machine learning and tree-based algorithms, which are considered advanced classification techniques and are becoming increasingly popular in the GEOBIA community. Extensive analyses conducted in [10] and [24] indicate that high classification accuracies can be achieved with the Random Forest classifier when applied to the supervised object-based approach; however, the high accuracy was to a great extent dependent on the training set representativeness and the quality of segmentation [24]. In spite of the high classification accuracies, these two studies cannot serve for a straightforward comparison, since in [10] Landsat TM data with a coarse spatial resolution were used and in [24] segments were manually delineated.

Although the k-NN and SVM methods provided less accurate results with the use of the 4-band image, this does not imply that these are inefficient classification methods. It has been suggested [27] that selecting the “best” classifier is not necessarily the wisest decision, since valuable information may be produced by classifiers that are considered to be less successful. Taking into account that a specific classification metric is better adapted to a specific situation, all classifiers can have the potential of being the “best” for a certain situation. Therefore, image understanding and active learning techniques tend to be implemented and used [1]. Image understanding is a complex cognitive process for which we may currently lack the key concepts. Therefore, in order to ensure a strong foundation, research in this field needs to focus on the deeper understanding of the relationship between image objects and landscape objects, rather than exclusively on the development of new techniques that are tailored for specific applications [28] and rarely transferable [3].