Automatic Labelling and Selection of Training Samples for High-Resolution Remote Sensing Image Classification over Urban Areas

Supervised classification is a commonly used method for extracting ground information from images. However, the selection and labelling of training samples for supervised classification is expensive and time-consuming. Recently, automatic information indexes have achieved satisfactory results for indicating different land-cover classes, which makes it possible to label training samples automatically instead of by manual interpretation. In this paper, we propose a method for the automatic selection and labelling of training samples for high-resolution image classification. The initial candidate training samples are provided by the information indexes and open-source geographical information system (GIS) data, referring to the representative land-cover classes: buildings, roads, soil, water, shadow, and vegetation. Several operations are then applied to refine the initial samples, including removing overlaps, removing borders, and applying semantic constraints. The proposed sampling method is evaluated on a series of high-resolution remote sensing images over urban areas, and is compared to classification with manually labeled training samples. It is found that the proposed method is able to provide and label a large number of reliable samples, and can achieve satisfactory results for different classifiers. In addition, our experiments show that active learning can further enhance the classification performance, as it chooses the most informative samples from the automatically labeled samples.


Introduction
Classification is one of the most vital phases for remote sensing image interpretation, and the classification model learned from the training samples should be extended and transferred in the whole image. To date, many different pattern recognition methods have been successfully applied to remote sensing classification. Maximum likelihood classification (MLC) has proved to be robust for remote sensing images, as long as the data meet the distribution assumption (e.g., a Gaussian distribution) [1]. However, MLC does not achieve satisfactory results when the estimated distribution does not represent the actual distribution of the data [2]. In such cases, a single class may contain more than one component in the feature space, the distribution of which cannot be classes of building, shadow, water, and vegetation, respectively [29]. The HSV color space is used to describe the distribution of the soil and road lines are provided by OSM. In [29], these multiple information indexes were integrated and interpreted by a multi-kernel learning approach, aiming to classify high-resolution images. In addition, the MBI has proven effective for building change detection, where the change in the MBI index is considered as the condition for building change in urban areas [30].
Morphological Building Index (MBI): Considering the fact that the relatively high reflectance of roofs and the spatially adjacent shadows lead to the high local contrast of buildings, the MBI aims to build the relationship between the spectral-structural characteristics of buildings and the morphological operators [25]. It is defined as the sum of the differential morphological profiles (DMP) of the white top-hat (W-TH):
$$\mathrm{MBI} = \sum_{d,s} \mathrm{DMP}_{\text{W-TH}}(d, s), \qquad \text{W-TH}(d, s) = I - \gamma_{I}^{re}(d, s)$$
where $\gamma_{I}^{re}$ represents the opening-by-reconstruction of the brightness image ($I$), and $d$ and $s$ denote the parameters of direction and scale, respectively. The white top-hat DMP is used to represent the local contrast of bright structures, corresponding to the candidate building structures [25].
Morphological Shadow Index (MSI): Considering the low reflectance and the high local contrast of shadow, the MSI can be conveniently extended from the MBI by replacing the white top-hat (W-TH) with the black top-hat (B-TH):
$$\mathrm{MSI} = \sum_{d,s} \mathrm{DMP}_{\text{B-TH}}(d, s), \qquad \text{B-TH}(d, s) = \phi_{I}^{re}(d, s) - I$$
where $\phi_{I}^{re}$ represents the closing-by-reconstruction of the brightness image, and is used to represent the local contrast of shadows [25]. The MBI and the MSI have achieved satisfactory results in terms of accuracies and visual interpretation in experiments [24,25]. In this study, they are used to generate the initial training samples for buildings and shadows, respectively.
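As a rough illustration of the two morphological indexes, the following Python sketch computes top-hat-by-reconstruction profiles with linear structuring elements and sums their differences over directions and scales. It is a simplified sketch under stated assumptions, not the implementation of [25]: the DMP is assumed to be the absolute difference between top-hats at consecutive scales, four directions are assumed, and the function names (line_footprint, reconstruct_by_dilation, morphological_index) are hypothetical. The MSI is obtained by applying the same routine to the negated brightness image, since the black top-hat of I equals the white top-hat of -I for a symmetric structuring element.

```python
import numpy as np
from scipy import ndimage as ndi


def line_footprint(length, angle_deg):
    """Boolean footprint holding a centred line of roughly `length` pixels."""
    size = length if length % 2 == 1 else length + 1
    fp = np.zeros((size, size), dtype=bool)
    c = size // 2
    theta = np.deg2rad(angle_deg)
    half = (length - 1) / 2.0
    for r in np.linspace(-half, half, 2 * length):
        row = int(round(c - r * np.sin(theta)))
        col = int(round(c + r * np.cos(theta)))
        fp[row, col] = True
    return fp


def reconstruct_by_dilation(marker, mask):
    """Geodesic reconstruction: dilate `marker` under `mask` until stable."""
    prev = marker
    while True:
        cur = np.minimum(ndi.grey_dilation(prev, size=(3, 3)), mask)
        if np.array_equal(cur, prev):
            return cur
        prev = cur


def white_tophat_rec(img, length, angle_deg):
    """W-TH(d, s) = I - opening-by-reconstruction of I with a linear SE."""
    eroded = ndi.grey_erosion(img, footprint=line_footprint(length, angle_deg))
    return img - reconstruct_by_dilation(eroded, img)


def morphological_index(img, lengths, angles=(0, 45, 90, 135)):
    """Sum of DMPs of the white top-hat over directions and scales.

    Assumption: DMP = |TH(d, s + ds) - TH(d, s)| for consecutive scales.
    Pass the brightness image for an MBI-like map, or its negation for an
    MSI-like map.
    """
    img = np.asarray(img, dtype=float)
    total = np.zeros_like(img)
    for angle in angles:
        profile = [white_tophat_rec(img, s, angle) for s in lengths]
        for th_small, th_large in zip(profile[:-1], profile[1:]):
            total += np.abs(th_large - th_small)
    return total
```

In this sketch the scales are given in pixels; scales expressed in metres would first have to be divided by the spatial resolution of the image.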
Normalized Difference Water Index (NDWI): Water has a low reflectance in the near-infrared (NIR) channel and a high reflectance in the green channel [26]. Therefore, the NDWI makes use of this difference to enhance the description of water, and is defined as:
$$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}}$$
Normalized Difference Vegetation Index (NDVI): According to the different reflectance of vegetation canopies in the NIR and red channels [27], the NDVI is defined as:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$
HSV Color System: HSV is a common color system, standing for hue (0~1), saturation (0~1), and value (0~1). The HSV color system is able to quantitatively describe the color space for an image [31]. In this research, the HSV transform is used to detect the soil components, which appear as yellow or yellowish-red in the color space.
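The two spectral indexes are simple band ratios and can be sketched in a few lines of Python. This is a minimal illustration that assumes the bands are available as NumPy arrays of reflectance; the function names and the small constant eps (added only to avoid division by zero) are hypothetical and not part of the original method.

```python
import numpy as np


def normalized_difference(a, b, eps=1e-12):
    """Generic normalized difference (a - b) / (a + b), computed per pixel."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)


def ndwi(green, nir):
    """NDWI: high where green reflectance exceeds NIR reflectance (water)."""
    return normalized_difference(green, nir)


def ndvi(nir, red):
    """NDVI: high where NIR reflectance exceeds red reflectance (vegetation)."""
    return normalized_difference(nir, red)
```

Candidate water and vegetation samples would then be obtained by binarizing these maps with the thresholds discussed in the experimental section.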
Open Street Map (OSM): OSM is a free, editable map of the whole world, which contains a large amount of location information, especially abundant and detailed road lines [28]. In this research, the road networks are registered with the corresponding remote sensing images, and the training samples for roads can then be obtained.
A graphic example of a WorldView-2 image is used to show the effectiveness of the information indexes, the HSV-based soil detection, and the OSM road lines for the automatic sample collection (Figures 1 and 2). From the illustrations, it can be clearly seen that these information sources provide effective descriptions of buildings, shadow, water, vegetation, soil, and roads, and that it is possible to automatically select candidate training samples. In particular, the soil components are highlighted as dark green in the HSV space, and can be detected as soil. However, it should be noted that there are overlaps between some similar classes, e.g., water and shadows. This suggests that the samples generated from the information sources cannot be directly used for classification, and refinement processing is needed.

Automatic Sample Selection
This section introduces the proposed method for the automatic selection of training samples for buildings, shadow, water, vegetation, roads and soil, as illustrated in Figure 3, including the following steps.
(1) Select initial training samples of buildings, shadow, water, vegetation, soil, and roads, respectively, from the multiple information sources.
(2) Samples located at the border areas are likely to be mixed pixels, and thus it is difficult to automatically assign these pixels to a certain label. In order to avoid introducing incorrect training samples, these border samples are removed with an erosion operation.
(3) Manual sampling always prefers homogeneous areas and disregards outliers. Therefore, in this study, area thresholding is applied to the candidate samples, and the objects whose areas are smaller than a predefined value are removed.
(4) The obtained samples should be further refined, in order to guarantee the accuracy of the samples. Considering the fact that buildings and shadows are always spatially adjacent, the distance between the buildings and their neighboring shadows should be smaller than a threshold, which is used to remove unreliable buildings and shadows from the sample sets. Meanwhile, the road lines obtained from OSM are widened by several pixels, forming a series of buffer areas, where road samples can be picked out.
(5) Considering the difficulty and uncertainty in labelling samples in overlapping regions, the samples that are labeled as more than one class are removed.
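Steps (2) to (5) above can be sketched with standard binary morphology. The following Python code is a minimal illustration that assumes each class is given as a boolean candidate mask; the function names and the default values for the erosion size, area threshold, and distance threshold are hypothetical and are not the values used in this study.

```python
import numpy as np
from scipy import ndimage as ndi


def erode_and_filter(mask, erosion_iters=1, min_area=4):
    """Steps (2)-(3): drop border pixels, then drop small connected objects."""
    core = ndi.binary_erosion(mask, iterations=erosion_iters)
    labels, n = ndi.label(core)
    if n == 0:
        return core
    areas = np.bincount(labels.ravel())  # areas[0] counts the background
    keep = areas >= min_area
    keep[0] = False
    return keep[labels]


def keep_near(mask, other, max_dist):
    """Step (4): keep objects of `mask` within `max_dist` pixels of `other`.

    Both inputs are expected to be boolean arrays of the same shape.
    """
    labels, n = ndi.label(mask)
    if n == 0 or not other.any():
        return np.zeros_like(mask, dtype=bool)
    dist = ndi.distance_transform_edt(~other)  # distance to nearest `other` pixel
    nearest = np.asarray(
        ndi.minimum(dist, labels=labels, index=np.arange(1, n + 1))
    )
    keep = np.concatenate(([False], nearest <= max_dist))
    return keep[labels]


def remove_overlaps(masks):
    """Step (5): discard pixels that are claimed by more than one class."""
    votes = np.sum([m.astype(int) for m in masks.values()], axis=0)
    return {name: m & (votes == 1) for name, m in masks.items()}
```

Under the same assumptions, the road buffer of step (4) could be produced by dilating rasterized OSM road lines, for example with `ndi.binary_dilation(road_lines, iterations=w)` for a hypothetical buffer half-width `w` in pixels.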

The whole processing chain for the automatic sample selection is summarized in Algorithm 1. Please note that the parameter values, mainly the binarization thresholds for the multiple information indexes and the area thresholds used to remove small and heterogeneous objects, can be conveniently determined and unified across all the test images. The threshold values suggested in this study are not fixed; they represent a first empirical approach and can be tuned for different image scenes.
In Figure 3, the initial samples extracted by the multiple information sources and the samples refined by the proposed algorithm are compared, from which it can be seen that the pure samples for the land-cover classes are correctly labeled in an automatic manner.

Classifiers Considered in This Study
In this paper, four classifiers (MLC, SVM, RF, and MLP) are used to evaluate the proposed automatic training sample selection method. The chosen classifiers have proven effective in many remote sensing applications [32].
(1) Maximum Likelihood Classification: MLC is a statistical approach for pattern recognition. For a given pixel, the probability of it belonging to each class is calculated, and the pixel is assigned to the class with the highest probability [14]. Normally, the distribution of each class in the multi-dimensional space is assumed to be Gaussian. The mean and covariance matrix of each class are estimated from the training samples and used to model the classes. If the training set departs from the assumed normal distribution, the estimated parameters will not be very accurate.
(2) Support Vector Machine: SVM is a binary classification method based on structural risk minimization, and it can be extended to multi-class classification with multi-class strategies. For linearly separable datasets, the optimal decision hyperplane is obtained by maximizing the margin between two parallel hyperplanes. This type of SVM, which requires all the training samples to be classified correctly, is called hard-margin SVM. Soft-margin SVM, on the other hand, tolerates misclassified training samples by introducing a slack variable for each sample. SVM generates nonlinear decision boundaries by mapping the samples from a low-dimensional space to a high-dimensional one, and the kernel trick is used to avoid an explicit definition of the mapping function [33].
An SVM model is constructed from support vectors, which are usually located in the decision boundary region between the class pairs. The most representative and informative samples are those close to the boundary of the class pair [15,34]. A good sample set for training an SVM model therefore does not need to describe the classes accurately, but should provide information about the decision boundary between the class pairs in the feature space [35]. Meanwhile, in the case of a small sample set, outliers have an obvious influence on the decision boundary.
(3) Neural Networks: NNs can be viewed as a parallel computing system consisting of an extremely large number of simple processors with interconnections [32]. NNs are able to learn complex nonlinear input-output relationships, use sequential training procedures, and adapt themselves to the data [32]. The MLP is one of the most commonly used NNs. It consists of input, hidden, and output layers, where all the neurons in each layer are fully connected to the neurons in the adjacent layers. These interconnections are associated with numerical weights, which are adjusted iteratively during the training process [36]. Each hidden neuron performs a mapping of the input feature space by a transform function. After an appropriate mapping by the previous layer, the next layer can learn the classification model as a linearly separable problem in the mapped feature space, and thus NNs are able to deal with nonlinearly separable datasets [8].
In this study, a conjugate gradient method (scaled conjugate gradient, SCG) is used to train the MLP, since it avoids the line search at each learning iteration by using a Levenberg-Marquardt approach to scale the step size [36].
(4) Decision Tree: A hierarchical DT classifier is an algorithm for labelling an unknown pattern by the use of a decision sequence. The tree is constructed from the root node to the terminal leaves, and the feature for each interior node is selected by information gain or the Gini impurity index. A pruning operation is employed to simplify the DT without increasing errors [14]. Because a single DT often performs unsatisfactorily, an ensemble of DTs, such as RF, is more commonly used [37]. RF combines the predictions of many trees, and the final label of a sample is the most popular class among the trees. Each tree is constructed from a randomly selected subset of the samples and a randomly selected subset of the features [38]. Each tree in RF is grown to the maximum depth, and is not pruned. RF is relatively robust to outliers and noise, and it is insensitive to over-fitting [14].
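The decision rule of MLC described in item (1) can be made concrete with a short Python sketch. It assumes Gaussian class-conditional densities with equal class priors and adds a small ridge term to the covariance for numerical stability; both choices, as well as the class name GaussianMLC, are assumptions made for illustration rather than details taken from this study.

```python
import numpy as np


class GaussianMLC:
    """Maximum likelihood classifier with one Gaussian per class."""

    def fit(self, X, y, ridge=1e-6):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Ridge term (assumption) keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + ridge * np.eye(X.shape[1])
            inv_cov = np.linalg.inv(cov)
            log_det = np.linalg.slogdet(cov)[1]
            self.params_.append((mean, inv_cov, log_det))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, inv_cov, log_det in self.params_:
            d = X - mean
            maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
            # Log-likelihood up to a constant shared by all classes.
            scores.append(-0.5 * (maha + log_det))
        best = np.argmax(np.stack(scores, axis=1), axis=1)
        return self.classes_[best]
```

Because only a mean vector and a covariance matrix are estimated per class, the quality of the result depends on how well the training samples describe the class distribution rather than on how close they lie to the class boundaries.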

Experiments and Results
A series of test images were used to validate the proposed method for the automatic selection of training samples. In the experiments, the proposed method was compared with the traditional method (i.e., manually collected samples), in order to verify the feasibility of the automatically selected samples.
Remote Sens. 2015, 7, 16024-16044
Figure 4 shows the four test datasets, as well as the manually selected samples (the ground truth). The study areas are located in Hangzhou, Shenzhen, Hong Kong, and Hainan, respectively, with sizes of 640 × 594, 818 × 770, 646 × 640, and 600 × 520 pixels, and spatial resolutions of 2 m, 2.4 m, 2 m, and 2 m. These datasets were acquired by WorldView-2, GeoEye-1, WorldView-2, and WorldView-2, with eight, four, four, and eight spectral bands, respectively. The study areas exhibit the characteristics of a set of typical urban landscapes in China, and mainly consist of six classes: buildings, shadow, water, vegetation, roads, and bare soil. The six classes can be automatically extracted by the proposed method, the effectiveness of which was tested in the experiments. For the manually collected samples, 40% of the labeled samples were randomly selected from the ground truth as the training sample set (named ROI in the following text), while the rest were used for testing (Table 1).

Datasets and Parameters
The parameters of the linear structuring element (SE) for the MBI and the MSI, including the minimal value, maximal value, and the interval, {Smin, Smax, ΔS}, can be determined according to the spatial size of the buildings and the spatial resolution of the images used. These parameters were unified in this study as: Smin = 8 m, Smax = 70 m, and ΔS = 2 m, respectively. In addition, the binarization threshold values for the information indexes were set according to the suggestions of our previous study [25]. Please note that these thresholds can be simply and conveniently determined, since we merely aim to choose pure and reliable samples for further consideration.
For the SVM classifier, the radial basis function (RBF) kernel was selected, and the regularization parameter and kernel bandwidth were optimized by five-fold cross-validation. For the RF classifier, 500 trees were constructed. The MLP classifier used two hidden layers, and the number of neurons in each layer was also optimized by five-fold cross-validation.
In the experiments, training with ROI or Auto means that the classification model was trained with manually labeled samples or automatically labeled samples, respectively. Each classification was conducted 10 times with different initial training samples that were randomly chosen from the candidate training sample set, and the average accuracies were recorded as the classification accuracy. For each classification experiment, 100 training samples per class were used for the training.
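The repeated-sampling protocol described above (10 runs, 100 training samples per class, averaged accuracy) can be sketched as follows. This is a minimal illustration: the function names are hypothetical, overall accuracy is assumed as the reported score, and NearestMean is only a placeholder standing in for MLC, SVM, RF, or the MLP.

```python
import numpy as np


def sample_per_class(y, n_per_class, rng):
    """Draw up to n_per_class indices from every class, without replacement."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        size = min(n_per_class, members.size)
        idx.append(rng.choice(members, size=size, replace=False))
    return np.concatenate(idx)


def repeated_accuracy(make_clf, X_pool, y_pool, X_test, y_test,
                      n_runs=10, n_per_class=100, seed=0):
    """Average overall accuracy over runs with different random training sets."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        idx = sample_per_class(y_pool, n_per_class, rng)
        clf = make_clf().fit(X_pool[idx], y_pool[idx])
        accs.append(np.mean(clf.predict(X_test) == y_test))
    return float(np.mean(accs))


class NearestMean:
    """Placeholder classifier (hypothetical) with a fit/predict interface."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

Here the candidate pool would be either the ROI or the Auto sample set, while the test set stays the same for both, so that the two sampling strategies are compared on equal terms.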

Results
The automatically selected training samples of the four datasets are displayed in Figure 5, and their numbers are provided in Table 2. It can be clearly observed that the automatically labeled samples are correct, pure, and representative, and are uniformly distributed over the whole image.
In general, from Table 3, the Auto samples achieve satisfactory accuracies, which are close to the accuracies achieved by the ROI samples. In particular, for the Shenzhen and Hong Kong datasets, the classification results obtained by the Auto samples are comparable to those obtained by the manually selected samples, for all the classifiers. With respect to the Hangzhou and Hainan datasets, the accuracy achieved by the proposed automatic sampling is also acceptable (80%~90%), although the accuracy scores are lower than those of the ROI samples by an average of 4%~7%. Considering the fact that the proposed method is able to automatically select samples from the images, it can be stated that the method is effective, making it possible to avoid time-consuming manual sample selection.
Table 3. The overall classification accuracies for the four datasets.
When comparing the performances of the different classifiers with the Auto sampling, MLC achieves the highest accuracies in two test datasets (Hong Kong and Hainan). However, generally speaking, all the classifiers perform comparably in the four test images, showing the robustness of the proposed automatic sampling method in different scenes.

Large-Size Image Testing
The previous experiments verified that the proposed automatic sampling strategy is able to achieve satisfactory classification results over the four urban images. We also tested the practicability of the automatic method by the use of a large-size image from the Shenzhen city center, which is China's first and most successful Special Economic Zone. The dataset was acquired on 25 March 2012, by the WorldView-2 satellite, covering 92 km² with a 2-m spatial resolution, and consists of eight spectral bands. As shown in Figure 6, the dataset (named WV-2 in the following text) covers the city center of Shenzhen, and three sub-regions are manually labeled as the source of the training samples (ROI). Please note that the whole image was manually labeled as the ground truth for testing, in order to guarantee the reliability of the experimental conclusions. The numbers of available samples for ROI, Auto, and test are provided in Table 4. The parameters used in this experiment were the same as the previous ones.
The classification results, including the quantitative accuracy scores and the classification maps, are shown in Table 5 and Figure 7, respectively. The experimental results convey the following observations:
• In general, the classification accuracies obtained by the automatic sampling are very promising (80%~85%), which shows that it is feasible to automatically classify large-size remote sensing images over urban areas.
• By comparing the performances of the different classifiers, it can be seen that MLC achieves the highest accuracy for the Auto samples, while SVM and the MLP give the best results for the ROI samples.
• It is interesting to see that in the case of MLC, the automatic sampling strategy outperforms the manual sampling by 8% in overall accuracy.
MLC performs better than the other classifiers with the automatic sampling. This phenomenon can be explained by: (1) the difference in the properties between the Auto and ROI samples; and (2) the difference in the decision rules between the classifiers.
1. It should be noted that the Auto samples are purer than ROI, since the automatic selection prefers homogeneous and reliable samples in order to avoid errors and uncertainties. Specifically, as described in Algorithm 1, boundary pixels which are uncertain and mixed have been removed, and the area thresholding further reduces the isolated and heterogeneous pixels.
2. The four classifiers considered in this study can be separated into parametric classifiers (MLC), and non-parametric classifiers (SVM, RF, and MLP). The principle of MLC is to construct the distributions for different classes, whereas the non-parametric methods tend to define the classification decision boundaries between different land-cover classes. Consequently, pure samples are more appropriate for MLC, but an effective sampling for the non-parametric classifiers is highly reliant on the samples near the decision boundaries, so that they can be used to separate the different classes.

Discussions
In this section, several important issues regarding the proposed automatic sampling method are discussed, including the influence of the number of training samples, the effectiveness of the classifiers for the automatic training samples, and the limitations of the proposed approach.

Number of Training Samples
In this experiment, the classification was conducted with different numbers of training samples extracted by ROI and Auto, respectively. The large-size WV-2 image of Shenzhen city was taken as an example. The results are shown in Figure 8, where the general conclusion is that, for both ROI and Auto sampling, increasing the number of training samples does not significantly increase the classification accuracy once 500~1000 samples per class are chosen. In addition, it can be seen that Auto-MLC provides much higher accuracies than ROI-MLC, which shows that the proposed automatic sampling method is a satisfactory sampling strategy for the MLC classifier. The CPU time for the various classifiers with different numbers of training samples is recorded in Figure 8b. Here, it can be seen that MLP and SVM are more sensitive to the number of training samples, and a large number of samples leads to more computational time. RF is less sensitive to the number of training samples, as the CPU time tends to be invariant when the number of samples is larger than 1500 pixels per class. It should be noted that the CPU time of MLC, which aims to describe the probability distribution of each class, is insensitive to the number of training samples.

Further Comparison Between Auto and ROI Sampling
In order to further analyze and understand the automatic training samples, in this subsection, the class distributions derived from the ROI, Auto, and test samples are demonstrated and compared. In this analysis, MLC (Figure 9) and SVM (Figure 10) are taken as representative examples of the parametric and non-parametric classifiers, respectively. In the figures, the distributions are illustrated in a two-dimensional feature space (first and second principal components, namely, PCA1 and PCA2). From Figure 9, in the case of MLC, it can be clearly observed that the shapes of the decision boundaries for the Auto and reference samples are quite similar. This reveals that the automatically selected samples are effective for modelling the probabilistic distributions of land-cover classes.
In addition, it can be seen that the decision boundaries derived from the Auto samples are more similar to those derived from the reference samples than those from the ROI samples, which can be used to explain and support the conclusion in Table 5.
On the other hand, in the case of the SVM classifier, as demonstrated in Figure 10, the boundaries derived from the ROI samples are closer to the boundaries of the reference samples than those derived from the Auto samples. This is reflected in the classification accuracy, i.e., 81.8% for Auto-SVM and 84.0% for ROI-SVM. As explained previously, the accuracy of the non-parametric classifiers is highly reliant on the samples near the decision boundaries (for instance, the so-called support vectors for the SVM classification), whereas automatic sampling tends to identify homogeneous and pure samples, which are far from the decision boundaries. Therefore, in the case of SVM, the manually labeled samples seem more suitable than the automatically selected ones. This analysis is also consistent with the accuracies reported in Table 5.
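The comparison of decision boundaries described above can be illustrated with a small sketch. The following Python code fits a Gaussian maximum likelihood classifier in a two-dimensional feature space (standing in for PCA1 and PCA2) on two different training sets, and measures how often the two resulting decision maps agree on a regular grid. The synthetic sample values, the two class names, the grid extent, and the function names (`fit_mlc`, `predict_mlc`, `boundary_agreement`) are assumptions made for illustration only and are not taken from the experiments.

```python
import math
import random


def fit_mlc(samples):
    """Estimate a 2-D Gaussian per class.

    samples: dict mapping class label -> list of (x, y) points.
    Returns: dict mapping class label -> (mx, my, sxx, syy, sxy).
    """
    model = {}
    for label, pts in samples.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        model[label] = (mx, my, sxx, syy, sxy)
    return model


def log_likelihood(params, x, y):
    """Log-density of a 2-D Gaussian at (x, y)."""
    mx, my, sxx, syy, sxy = params
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    # Mahalanobis distance via the closed-form inverse of a 2x2 covariance
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2.0 * math.pi)


def predict_mlc(model, x, y):
    """Assign (x, y) to the class with the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(model[label], x, y))


def boundary_agreement(model_a, model_b, lo=-4.0, hi=4.0, steps=41):
    """Fraction of grid points on which two models predict the same class."""
    same = 0
    for i in range(steps):
        for j in range(steps):
            x = lo + (hi - lo) * i / (steps - 1)
            y = lo + (hi - lo) * j / (steps - 1)
            if predict_mlc(model_a, x, y) == predict_mlc(model_b, x, y):
                same += 1
    return same / (steps * steps)


def draw(rng, center, spread, n):
    """Draw n synthetic points around a class center (illustrative data)."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]


rng = random.Random(0)
centers = {"building": (-2.0, 0.0), "vegetation": (2.0, 0.0)}
# Stand-in for reference samples: full class spread
reference = {c: draw(rng, m, 1.0, 500) for c, m in centers.items()}
# Stand-in for "pure" automatic samples: same centers, smaller spread
pure = {c: draw(rng, m, 0.5, 500) for c, m in centers.items()}

agreement = boundary_agreement(fit_mlc(reference), fit_mlc(pure))
print(f"Decision-map agreement: {agreement:.3f}")
```

In this toy setting, the pure samples still recover the class centers, so the parametric decision map changes little. This mirrors the qualitative argument made for MLC, although it does not reproduce Figure 9.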

Active Learning for the Automatic Sampling
As previously shown, the proposed approach is able to automatically and effectively label samples for different land-cover classes. On the other hand, active learning can select the most informative samples from the available sample set by considering the contribution of each sample to the classifier. In this regard, it is interesting to integrate the proposed automatic sample labelling and active learning for sample selection and optimization.
The processing chain is straightforward: the candidate sample sets for the various land-cover classes are selected and labeled by the proposed automatic strategy, and these samples are then ranked by active learning in terms of their contribution to the classifier. In this study, the SVM classifier is considered, and the commonly used breaking ties (BT) method [39] is adopted for the sample selection. Note that the SVM parameters (kernel parameter and penalty coefficient) are retuned during the active learning iterations. The experimental results for active learning with the automatically labeled samples are presented in Figure 11 for the four test datasets. It can be clearly seen that, in the Shenzhen, Hong Kong, and Hainan experiments, active learning (Auto-AL) provides additional accuracy increments compared to the original automatic sampling approach (Auto-SVM). In the case of Hangzhou, active learning outperforms the original algorithm in the early stage, but does not perform as well after about 300 samples are chosen. However, in general, the difference is not significant. Therefore, it can be stated that active learning can further optimize the proposed sample labelling method, since it selects the most informative samples by considering their importance in the classification.
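As a rough illustration of the breaking ties criterion mentioned above, the sketch below ranks candidate samples by the difference between their two largest class posterior probabilities and selects those with the smallest difference, i.e., the samples on which the classifier is least decided. This follows the usual formulation of BT. The function name `breaking_ties_select`, the batch size, and the probability values are assumptions for illustration. In practice, the posteriors would come from the SVM, which is retrained after each batch is added.

```python
def breaking_ties_select(posteriors, batch_size):
    """Return the indices of the `batch_size` most ambiguous samples.

    posteriors: list of per-sample class probability lists
                (each with at least two classes).
    """
    margins = []
    for idx, probs in enumerate(posteriors):
        top_two = sorted(probs, reverse=True)[:2]
        # Small margin between the two best classes = informative sample
        margins.append((top_two[0] - top_two[1], idx))
    margins.sort()
    return [idx for _, idx in margins[:batch_size]]


# Hypothetical posteriors for four automatically labeled candidates
candidate_posteriors = [
    [0.90, 0.05, 0.05],  # confidently one class: low priority
    [0.40, 0.38, 0.22],  # nearly tied between two classes: high priority
    [0.55, 0.30, 0.15],
    [0.34, 0.33, 0.33],  # almost uniform: highest priority
]
selected = breaking_ties_select(candidate_posteriors, batch_size=2)
print(selected)  # [3, 1]
```

Samples that are pure and far from the decision boundaries receive large margins and are ranked last, which is why this criterion complements the automatic labelling step.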

Conclusions and Future Scope
In this paper, a novel method for automatic sample selection and labelling for image classification in urban areas is proposed. The training sample sets are obtained from multiple information sources, such as soil from the HSV color space, roads from OSM, and automatic information indexes referring to buildings, shadow, vegetation, and water. A series of processing steps are further used to refine the samples that are initially chosen, e.g., removing overlaps, removing borders, and semantic filtering.
The experiments with four test datasets showed that the proposed automatic training sample labelling method (Auto) is able to achieve satisfactory classification accuracies, which are very close to the results obtained with the manually selected samples (ROI), for four commonly used classifiers. Furthermore, the experiments with a large image (WorldView-2 image of Shenzhen city, 92 km²) showed that the proposed method is able to achieve automatic image classification with a promising accuracy (84%). It was also found that the automatic sampling strategy is more suitable for maximum likelihood classification (MLC), which aims to describe the probabilistic distribution of each land-cover class. In addition, active learning [40] was used jointly with the proposed Auto sampling method in the experiments, in order to select the most informative samples from the automatically labeled samples. The results were promising, as active learning further improved the classification accuracies by about 2% to 4% in most of the test sets.
The significance of this study lies in the fact that it has shown that automatic sample selection and labelling from remote sensing images is feasible and can achieve promising results. Our future research will address the mixing of manually and automatically selected samples. In this way, the decision boundaries generated by the Auto method could be further enhanced by adding new samples and removing wrong ones. It will also be possible to evaluate and compare the importance of manual and automatic samples for classification. In addition, the samples used for the accuracy assessment will be generated randomly, in order to avoid any bias in the results [41].