Identification of Village Building via Google Earth Images and Supervised Machine Learning Methods

Abstract: In this study, a method based on supervised machine learning is proposed to identify village buildings from open high-resolution remote sensing images. We select Google Earth (GE) RGB images to perform the classification in order to examine their suitability for village mapping, and investigate the feasibility of using machine learning methods to provide automatic classification in such fields. By analyzing the characteristics of GE images, we design different features on the basis of two kinds of supervised machine learning methods for classification: adaptive boosting (AdaBoost) and convolutional neural networks (CNN). To recognize village buildings via their color and texture information, the RGB color features and a large number of Haar-like features in a local window are utilized in the AdaBoost method; with multilayer networks trained by gradient descent and back propagation, the CNN method performs the identification by mining deeper information from buildings and their neighborhood. Experimental results from the testing area in Savannakhet province, Laos, show that our proposed AdaBoost method achieves an overall accuracy of 96.22% and the CNN method is also competitive with an overall accuracy of 96.30%.


Introduction
With the rapid development of urbanization processes, maps used to illustrate buildings and their distribution are significant and are required in a wide range of fields. Important applications include environmental monitoring, resource management, disaster response, and homeland security [1]. In urban areas, accurate maps are often available; however, this is not always the case for villages, where often no digital version is available, which has several negative consequences. For instance, during catastrophic events such as the Wenchuan earthquake in 2008, disaster relief could not be conveniently provided due to inadequate rural information and uncertainty about the location of residential buildings, which resulted in serious loss of life and property [2]. A swift update of building information is essential during catastrophes [3] because secondary disasters such as tsunamis, avalanches, and landslides may follow [4], causing swift changes to land conditions. In rural planning, which aims to benefit the inhabitants, public facilities need to be developed on the basis of the distribution of residential buildings [5]. To solve such complicated problems, the tools used to identify buildings must provide rapid, accurate, efficient, and time-sequenced results.
With the help of remote sensing satellite images [6][7][8], earth-observation activities on regional to global scales can be implemented owing to advantages such as wide spatial coverage and high temporal resolution [9,10]. Most previous studies that focus on village mapping commonly use low- and medium-spatial-resolution images such as Landsat Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and National Oceanic and Atmospheric Administration (NOAA)/Advanced Very High Resolution Radiometer (AVHRR) [11][12][13], whereas in recent years, high-resolution images such as QuickBird, Ikonos, and RapidEye have facilitated high-accuracy identification. Unfortunately, considering the high cost of data acquisition, these high-resolution images are generally utilized in specific small regions and are rarely applied to large ones [14].
A promising alternative is offered by Google Earth (GE), which provides open, highly spatially resolved images suitable for village mapping [15][16][17][18]. However, GE images have rarely been used as the main data source in village mapping. GE images are limited to a three-band color code (R, G, and B), which is expected to lower the classification performance due to the poor spectral signature [19]. Indeed, the potential of Google Maps for the classification of spatial characteristics has been underestimated [20]. By analyzing the tone, texture, and geometric features in a GE image [18,21], experts can recognize village buildings with high confidence. Consequently, we believe that GE images can provide a good data source for village mapping.
In terms of classification technique, many methods have been studied in the published literature. With the help of image processing and feature extraction techniques, various machine learning algorithms [22] in remote sensing have been implemented. The AdaBoost method [23] is widely used in remote sensing pattern recognition. For instance, Zhang et al. [24] combine the K-means method with AdaBoost to classify buildings, with an overall accuracy of about 90%. Zongur et al. [25] utilize satellite images to detect an airport runway using AdaBoost with a circular-Mellin feature. Using an improved Normalized Difference Built-up Index (NDBI) and remote sensing images, Li et al. [26] dynamically extract urban land. Cetin et al. [27] use textural features such as the mean and standard deviation of image intensity and gradient for building detection. In the field of remote sensing detection, using the convolutional neural networks (CNN) method [28], Chen et al. [29] address vehicle detection, Li et al. [30] focus on building pattern classifiers, and Yue et al. [31] use both spectral and spatial features for hyperspectral image classification. To predict geoinformative attributes from large-scale images, Lee et al. [32] also choose CNN, and Sermanet et al. [33] utilize the CNN method to identify house numbers. In high-resolution image processing, Hu et al. [34] solve scene classification tasks using CNN and achieve an overall accuracy of approximately 98%. Other machine learning methods such as support vector machines (SVM) [35], which maximize the margin in high-dimensional feature spaces using kernel methods, have also been introduced for classification. For the identification of forested landslides, Dou et al. [36] utilize a case-based reasoning approach and Li et al. [37] adopt two machine learning algorithms: random forest (RF) and SVM. When classifying complex mountainous forests via remote sensing images, Attarchi et al. [38] verify the performances of three machine learning methods: SVM, neural networks (NN), and RF. For mapping urban areas from DMSP/OLS nighttime light and MODIS data, Jing et al. [39] also utilize SVM.
To investigate the accuracy and efficiency of identification considering the characteristics of GE images, we herein explore the feasibility of supervised machine learning approaches for building identification using AdaBoost and CNN, respectively. Both methods adopt different feature extraction schemes, enabling full exploitation of the texture, spectral, geometric, and other characteristics in the images. The AdaBoost algorithm focuses on the color and textural information of the buildings and their surrounding areas; hence, it utilizes both color information and a large number of Haar-like features.
The performance of the AdaBoost method largely depends on the quality of the feature selection, which is itself quite challenging. In contrast, the CNN method achieves more robust and stronger performance than AdaBoost because it mines deeper representative information from low-level inputs [28]. With multilayer networks trained by a gradient descent algorithm, CNN can learn a complex and nonlinear mapping from a high- to a low-dimensional feature space. Here, we constructed a four-layer CNN network to describe the characteristics inside an 18 × 18-pixel window and applied it to the classification.
The remainder of our paper is organized as follows. In Section 2, we describe the study area and the experimental data. In Section 3, we briefly introduce the principles of the AdaBoost and CNN methods. We compare and analyze the experimental results of each algorithm in Section 4. The performance of our proposed method and suggestions for future work are presented in Section 5.

Study Area
In contrast to densely packed urban buildings, village buildings tend to be sparsely scattered. In this study, we define "village buildings" as any settlement with size less than 2 km. The study area, Kaysone (Figure 1a), is located in Savannakhet province (Figure 1b), Laos. The remote top-view RGB image of Kaysone has a size of 3600 × 4500 pixels with a resolution of 1 m, and was captured from Google's satellite map in February 2015. Its longitude and latitude range from E104°47′22″ to E104°49′54″ and from N16°34′28″ to N16°36′26″, respectively, covering an area of approximately 19.44 km². The projection is the UTM Zone 48N system with Datum WGS 84. The study area is a complex rural region with many different types of landscape, including natural components such as mountains, rivers, and vegetation cover as well as artificial areas such as villages, roads, and cultivated land, which are typical of rural areas.
Remote Sens. 2016, 8, 271


Data
As the training dataset, we selected four typical village/non-village areas (see Figure 1) from the original image. Each image was sized 600 × 900 pixels. Figure 1c,d are rich in village information, showing features such as buildings, roads, rivers, and cultivated lands. Figure 1f contains forests and mountains, whereas Figure 1e features crops and vegetation cover. The test data comprise the entire area of Figure 1a. Within this area, the ground truth map of the village buildings was manually drawn beforehand using a polygon-based interaction tool. This ground truth map contains accurate information on the land categories and is chiefly used in sampling and result evaluation.

Machine Learning Approaches
Machine learning approaches can be divided into two broad categories: supervised and unsupervised. In general, unsupervised learning methods cannot learn the characteristics of GE images to a satisfactory level because of their limited distinguishability and lack of prior knowledge. Supported by training data, supervised learning methods usually deliver better classification results. In this study, AdaBoost and CNN were employed as classifiers, and the feature extraction step was designed accordingly. Another popular alternative is SVM, which effectively solves nonlinear and high-dimensional problems [40]. However, in preliminary tests, SVM delivered similar performance to AdaBoost and required much longer computational time. Therefore, we exclude the SVM method from the present analysis.

AdaBoost Algorithm
The AdaBoost classification method [23,41] has gained great popularity due to its high accuracy, low complexity, and ability to recognize salient features. Based on the sample data, this algorithm iteratively adjusts the weights of many weak classifiers and combines them into a strong classifier. The samples are defined as

(x_1, y_1), ..., (x_m, y_m)

where m refers to the number of samples, x_i ∈ X, and y_i ∈ Y = {−1, +1}. Note the inclusion of both positive and negative input samples. Initially, all samples are equally weighted:

W_1(i) = 1/m, i = 1, ..., m

The weights are adjusted after checking the performance of each weak classifier: if a sample cannot be correctly classified, its weight is increased. The details of the AdaBoost algorithm are shown below. For t = 1, ..., T:

- Train a weak learner using distribution W_t.
- Get a weak hypothesis h_t: X → {−1, +1}, with error ε_t = Pr_{i∼W_t}[h_t(x_i) ≠ y_i].
- Choose α_t = (1/2) ln((1 − ε_t)/ε_t).
- Update W_{t+1}(i) = W_t(i) exp(−α_t y_i h_t(x_i)) / Z_t,

where T is the total number of weak classifiers and Z_t is a normalization factor, chosen such that W_{t+1} is a distribution.

Output the final hypothesis H(x) = sign(Σ_{t=1}^{T} α_t h_t(x)), where α_t indicates the weight of the weak classifier h_t. If α_t is large, the corresponding weak classifier plays an important role in the final combination, indicating a relatively important feature. Manually extracting the relevant features of village buildings is a challenging task. To improve the classification accuracy, the AdaBoost algorithm instead constructs the weak classifiers from a large number of simple features. Meanwhile, although the input dimension is quite high, the AdaBoost algorithm is robust to over-fitting problems [42]. In our experiment, we automatically generated 500 weak classifiers by applying the sample data to a three-layer decision tree.
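The boosting loop above can be sketched in a few lines of Python. This is not the implementation used in our experiment (which trains 500 weak classifiers built from three-layer decision trees on color and Haar-like features); for brevity the sketch uses one-level decision stumps as weak learners, but the sample weighting, the choice of α_t, and the final weighted vote follow the equations above.

```python
import numpy as np

def train_adaboost(X, y, T=50):
    """Train AdaBoost with decision-stump weak learners.
    X: (m, d) feature matrix; y: (m,) labels in {-1, +1}."""
    m, d = X.shape
    W = np.full(m, 1.0 / m)          # W_1(i) = 1/m: uniform initial weights
    stumps, alphas = [], []
    for _ in range(T):
        best = None
        # exhaustively pick the stump (feature, threshold, polarity)
        # with the lowest weighted error under the current distribution W_t
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(X[:, j] >= thr, pol, -pol)
                    err = W[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-classifier weight alpha_t
        W *= np.exp(-alpha * y * pred)          # re-weight misclassified samples
        W /= W.sum()                            # normalize (the Z_t factor)
        stumps.append((j, thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    """Final hypothesis: sign of the alpha-weighted vote of weak classifiers."""
    score = np.zeros(len(X))
    for (j, thr, pol), alpha in zip(stumps, alphas):
        score += alpha * np.where(X[:, j] >= thr, pol, -pol)
    return np.where(score >= 0, 1, -1)
```

Because a misclassified sample keeps a large weight, later rounds concentrate on it; the stronger a weak classifier (smaller ε_t), the larger its vote α_t in the final combination.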

Color Feature
To optimize the classifier for detecting the characteristics of color-coded GE images, we utilized four kinds of input samples from the images in Figure 1, varying the window size from 1 × 1 to 7 × 7 pixels to capture the color information of specific blocks. Given an m × n color image patch, AdaBoost organizes the color feature by reshaping the color values of the pixels in the patch into an m × n × 3 vector.
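As a concrete illustration (not the authors' code; the function name is our own), the reshaping step can be written as a one-line numpy operation:

```python
import numpy as np

def color_feature(patch):
    """Flatten an (m, n, 3) RGB patch into a length m*n*3 feature vector,
    so each pixel contributes its R, G, and B values to the AdaBoost input."""
    m, n, c = patch.shape
    assert c == 3, "expects an RGB patch"
    return patch.reshape(m * n * 3)

# e.g. the largest window trialed, 7 x 7 pixels, yields a 147-dimensional vector
```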

Haar-Like Feature
Haar-like features are proven to be efficient tools for detecting the textural and structural information of buildings [23]. In this study, a large number of Haar-like features are generated for classification. By considering not only the target pixel but also the pixels inside its neighborhood window, the texture and structural characteristics of buildings can be well recognized.
The Haar-like features were generated by adopting several basic Haar filters and shifting and scaling them inside the neighborhood window. Figure 2 demonstrates the application of a 2 × 1 Haar filter. Here, w and h denote the width and height of the filter, respectively. The original filter is presented in Figure 2a. Shifting this filter inside the window, we obtain the various features shown in Figure 2b. Similarly, some results of the shifting and scaling operations are presented in Figure 2c,d. In these panels, the original filter was enlarged to 4 × 2 and 6 × 3, respectively.
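Feature values of this kind are conventionally evaluated in constant time from an integral image (summed-area table), the same trick used later in our AdaBoost experiment. The following Python sketch, with hypothetical function names, computes a 2 × 1 two-rectangle feature as the difference between its two halves:

```python
import numpy as np

def integral_image(img):
    """Summed-area table, zero-padded so that ii[y, x] = sum of img[:y, :x]."""
    return np.pad(img.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))

def rect_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in O(1):
    four lookups into the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_2x1(ii, y, x, h, w):
    """Two-rectangle (left/right) Haar-like feature of total size h x 2w:
    sum of the left half minus sum of the right half."""
    return rect_sum(ii, y, x, h, w) - rect_sum(ii, y, x + w, h, w)
```

Shifting and scaling a filter then amounts to varying (y, x) and (h, w), which is why thousands of feature values per window remain cheap to compute.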


Convolutional Neural Networks (CNN)
CNN are inspired by biological systems. When multilayer networks are trained with gradient descent algorithms, they can learn complex nonlinear mappings from a high- to a low-dimensional feature space, in which classification becomes straightforward [43]. Importantly, the simplified low-dimensional space largely retains the information of the high-dimensional features while mining deeper information from the input. CNN vertically concatenate an m × n RGB image patch into a 3m × n matrix. Here, we briefly introduce the framework of CNN; the details are provided in [28,43]. In implementing CNN, we utilized the DeepLearnToolbox library developed by [44].

To simplify the high-dimensional feature, the approach utilizes a multilayer system including convolution and subsampling layers, as shown in Figure 3. Convolution enhances the raw signal while suppressing noise; the subsampling layer exploits the correlation of contiguous pixels, pooling the feature into lower dimensions without losing meaningful information. As the number of layers increases, the dimension decreases but the number of features increases.

To retain the information of the high-dimensional features to the greatest extent, the key point is back propagation. As in standard neural networks, the connections between layers are parameterized by the propagation weights W_ij and the biases b_i, which are randomly initialized near zero. We then perform forward propagation

z^(l+1) = W^(l) a^(l) + b^(l),  a^(l+1) = f(z^(l+1))  (8)

where z^(l) denotes the total weighted sum of inputs to all the units in layer l, f(x) is the activation function (such as a sigmoid), and a^(l) denotes the activations (output values) of all the units in layer l. After the iteration, the output h_{W,b}(x) is produced. Because W_ij and b_i are initialized crudely, there will be some error between the result and the true value. Two kinds of error term measure the accuracy of the network: δ_i^(l), the error of unit i in a hidden layer l, which measures how far that node is responsible for any error in the output, and δ_i^(n_l), the error of output unit i in the output layer n_l. To obtain the optimum solution of W_ij and b_i from these errors, we utilize an efficient iterative algorithm, gradient descent:

W_ij ← W_ij − α ∂J(W, b)/∂W_ij,  b_i ← b_i − α ∂J(W, b)/∂b_i

where α is the learning rate and J(W, b) is an average sum-of-squares error term. The partial derivatives can be computed efficiently using back propagation.

Based on the multilayer networks, the parameters of the convolution and subsampling cores are significant. In our experiment, the sequence of middle layers contains four layers: C1, S1, C2, and S2. The settings of each layer are shown in Table 1, and the concrete process is shown in Figure 3. Both the kernel size and the pooling scale transform the sample from a high- to a low-dimensional feature space. Initially, the size of the sample window is 18 × 18 in 3D; after the first convolution, the size changes to 14 × 14, which becomes 7 × 7 after the first pooling; the second convolution and subsampling are similar, whereby the size becomes 4 × 4 and 2 × 2, respectively. As observed, the size is scaled to lower dimensions after each operation. The number of output maps determines the number of convolution kernels, influencing the number and category of the training data.
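To make the size bookkeeping concrete, the following numpy sketch (not the DeepLearnToolbox code used in the study) reproduces the C1–S1–C2–S2 shape reduction for a single channel and a single feature map. The 5 × 5 and 4 × 4 kernel sizes are our inference from the stated output sizes, and the use of mean pooling is likewise an assumption:

```python
import numpy as np

def conv_valid(x, k):
    """'Valid' 2-D convolution: the output shrinks by kernel_size - 1."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def mean_pool(x, s=2):
    """Subsample by averaging non-overlapping s x s blocks."""
    H, W = x.shape
    return x[:H // s * s, :W // s * s].reshape(H // s, s, W // s, s).mean(axis=(1, 3))

x = np.random.rand(18, 18)                  # one 18 x 18 input window
c1 = conv_valid(x, np.random.rand(5, 5))    # C1: 18 -> 14
s1 = mean_pool(c1)                          # S1: 14 -> 7
c2 = conv_valid(s1, np.random.rand(4, 4))   # C2: 7 -> 4
s2 = mean_pool(c2)                          # S2: 4 -> 2
```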

Result and Discussion
During the experiment, we first trained the classification model using the sample data and then applied the model to the test dataset. The classification performance was evaluated by three widely used measures. (1) Confusion Matrix (see Table 2): This measure comprehensively describes the performance of a binary classification result. True Positives (TP): actual buildings correctly classified as buildings; False Positives (FP): non-buildings incorrectly labeled as buildings; False Negatives (FN): buildings incorrectly marked as non-buildings; True Negatives (TN): non-buildings correctly classified as non-buildings.
(2) Overall Accuracy: Measures the overall performance of the classifier:

Overall Accuracy = (TP + TN) / n

where n is the total number of pixels.
(3) Kappa: This measure comprehensively describes the classification accuracy by measuring the inter-rater agreement among qualitative items [45]. Kappa is defined as follows:

Kappa = 2 × (TP × TN − FP × FN) / [TP(2TN + FP + FN) + TN(FP + FN) + FP² + FN²]  (16)

The performances of AdaBoost and CNN are evaluated in Sections 4.1 and 4.2, respectively. All input training data are contained in the 900 × 600 pixel RGB image shown in Figure 1d. The same image is used for the test data. The test results of the two algorithms are compared and discussed in Section 4.3. In Section 4.4, we test the entire image (Figure 1a, with a size of 3600 × 4500 pixels) by CNN. To enhance the performance, we also alter the input training data. The accuracy of the experiment is evaluated by comparing the result with the ground truth, which contains three labels (building areas, non-building areas, and unknown areas such as cloud cover).
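Both scalar measures follow directly from the confusion-matrix entries. The sketch below (our own helper functions, not code from the study) applies Equation (16) and the overall-accuracy formula to the CNN_C confusion matrix reported in Section 4.2 (TP = 12,540, FP = 17,366, FN = 799, TN = 459,034), reproducing an overall accuracy of about 96.3% and a Kappa of about 0.564:

```python
def overall_accuracy(tp, fp, fn, tn):
    """(TP + TN) / n, where n is the total number of classified pixels."""
    return (tp + tn) / (tp + fp + fn + tn)

def kappa(tp, fp, fn, tn):
    """Equation (16): chance-corrected agreement for a binary confusion matrix."""
    num = 2 * (tp * tn - fp * fn)
    den = tp * (2 * tn + fp + fn) + tn * (fp + fn) + fp ** 2 + fn ** 2
    return num / den

# CNN_C confusion matrix from Section 4.2:
oa = overall_accuracy(12540, 17366, 799, 459034)   # ~0.963
k = kappa(12540, 17366, 799, 459034)               # ~0.564
```

For an imbalanced scene such as this one (buildings are a small fraction of pixels), Kappa is far more sensitive than overall accuracy to false positives, which is why the two classifiers differ sharply in Kappa while their overall accuracies are nearly identical.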

Result of AdaBoost
In the training stage, the positive samples contain information on the recognition target, which herein refers to buildings; conversely, negative samples contain no target information, covering classes such as trees, roads, and unknown areas.
The confidence obtained by AdaBoost is thresholded by a parameter called Terra. However, when Terra was set to 0 (as in traditional AdaBoost methods), the performance was greatly degraded by the large number of false positives in the classification results. After observing the results for different values of Terra, we selected the optimal Terra based on the Kappa criterion. The following results show the progression from using color features alone to adding Haar-like features.

Color Feature
As previously mentioned, the classifier was optimized by trialing four kinds of input samples from Figure 1d. All the testing samples were also prepared from Figure 1d. There must be a one-to-one size correspondence between the testing and training samples, as shown in Table 3. Since the edges are ignored, the number of testing samples differs among the tests. After the training procedure, four strong classifiers emerged: AdaBoost_A to AdaBoost_D.
To assess the accuracy of the classifiers generated in the training stage, we applied each classifier to its respective testing samples; the results are shown in Table 4.
As we can infer from the results in Table 4, classifier B gives the optimum solution, with a Kappa of 0.306 and an overall accuracy of 96.04%. To enhance the accuracy of the classifier, we combine Haar-like features with the color features to mine deeper texture information of the target. The Haar-like feature values over the whole image are computed efficiently with an integral-image algorithm, on which the AdaBoost method then operates.

The training samples are captured from Figure 1d, with 111 positive samples and 283 negative samples, all sized 25 × 25 pixels in gray-scale. The positive and negative samples are gray-scale images of buildings and non-buildings, respectively. The input training data is a matrix of Haar-like feature values. The Haar-like features can be stretched or shrunk in size and can move within the entire image. The dimension of a two-rectangle feature alone reaches 3328; considering that the dimension of the input training data is thus extremely large, only four kinds of common Haar-like feature have been used. Based on the principle of Figure 2, the total dimension of the input features is 3328 × 2 + 2600 + 2276 = 11,532. The testing data is also taken from Figure 1d. After combining with the color features, the Kappa of the identification is enhanced to 0.312 and the overall accuracy to 96.22%. Compared with the result of using color features only, this result is somewhat improved.

Result of CNN
In the CNN training process, the training samples must be paired with their corresponding labels. To test the influence of the input parameters on the final classifier, we use a control-variate method (Table 5). After training, three different classifiers, CNN_A to CNN_C, are generated. Applying these classifiers to the test image produces the results and corresponding figures in Table 6. Comparing the respective results, classifier CNN_C performs best at detecting the buildings in the given figure: Kappa is enhanced to 0.564 and the overall accuracy reaches 96.30%. The corresponding confusion matrix and outcome figures are given in Table 7. As defined for the confusion matrix, in the CNN-based method, the number of actual buildings correctly classified as buildings is 12,540; non-buildings incorrectly labeled as buildings, 17,366; buildings incorrectly marked as non-buildings, 799; and correctly classified non-buildings, 459,034. Although the number of non-buildings incorrectly labeled as buildings is somewhat high, which impacts the performance, the other parts of the matrix perform well.
In the outcome figure of CNN_C, gray refers to the unknown part, green marks actual buildings correctly classified as buildings, blue indicates non-buildings incorrectly labeled as buildings, red shows buildings incorrectly marked as non-buildings, and black denotes correctly classified non-buildings. As shown in the result, the CNN_C classifier performs well in detecting buildings. Figure 4b shows that the classification error decreases steadily as the iterations increase.


Comparison and Discussion
We utilized two machine learning methods, AdaBoost and CNN, to identify buildings from remote sensing images and generated the corresponding test results, as described in Sections 4.1 and 4.2, respectively. Comparing the accuracies, the best Kappa and overall accuracy are 0.312 and 96.22% for the AdaBoost algorithm, whereas for CNN the results are enhanced to 0.564 and 96.30%, respectively. In the corresponding testing area, the Kappa of our CNN method is thus approximately 0.25 higher than that of AdaBoost. For overall accuracy, CNN also outperforms AdaBoost.
Note that the traditional visual interpretation of remote sensing images is a complex and time-consuming process. Although it achieves very high accuracy, it is not suited to large-scale automated projects. The performance of AdaBoost methods depends strongly on the training features: the color and Haar-like features chosen in our experiment cannot express all the useful characteristics of the buildings, which limits the accuracy of the result. In contrast, CNN can mine and extract deeper information from the input features of buildings, which helps the identification.

Practical Application
In this section, we demonstrate how the CNN method works on the entire image in Figure 1a, which is 30 times bigger than that in Figure 1d. We compare two kinds of input training data and obtain the identification results separately (Table 8). The training data of type Train_B contain more diverse negative-sample information than those of Train_A, with 150,000 samples including mountains and other land types. The identification results are shown in Table 9. As observed from the testing results, Kappa and overall accuracy increase when the negative training samples are expanded. In Figure 4, most of the village buildings can be identified, but the inaccurate points are mainly distributed along the village boundaries and in some building-like areas. In Figure 5b, areas such as mountains are identified as non-building areas, whereas they could not be detected in Figure 5a. To examine the performance of Train_B in detail, we split the image into 30 small images of 900 × 600 pixels (Figure 6).
The performance of each part is different and the accuracy comparison is as follows.
For different study areas, the overall accuracy of the CNN method reaches up to 99.00%. We show only part of the results in Table 10, especially those containing building information (positive pixels), where Kappa reaches up to 0.888, indicating high performance. We can also infer that the overall accuracy of the proposed method is considerably stable across different areas. Where Kappa is relatively low, identification accuracy can be improved by expanding the training data, as shown in the previous processes. In general, the experimental results indicate that our proposed method based on machine learning, especially on CNN, performs well in remote sensing identification of buildings.
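The tiling step used for the per-area assessment can be sketched as follows; the 5 × 6 grid layout and the map dimensions are assumptions for illustration, since the text only states 30 tiles of 900 × 600 pixels:

```python
import numpy as np

TILE_H, TILE_W = 600, 900   # rows x cols per tile (900 x 600 pixels)

def split_into_tiles(image, tile_h=TILE_H, tile_w=TILE_W):
    """Yield (row_idx, col_idx, tile) for every full tile in the image."""
    rows, cols = image.shape[:2]
    for i in range(rows // tile_h):
        for j in range(cols // tile_w):
            yield i, j, image[i * tile_h:(i + 1) * tile_h,
                              j * tile_w:(j + 1) * tile_w]

# Hypothetical full classification map of 3000 x 5400 pixels -> 5 x 6 = 30 tiles
full_map = np.zeros((3000, 5400), dtype=np.uint8)
tiles = list(split_into_tiles(full_map))
print(len(tiles))          # 30
print(tiles[0][2].shape)   # (600, 900)
```

Each tile can then be scored against its reference labels independently, which is how the per-area Kappa values in Table 10 are obtained.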

Conclusions and Future Work
In this paper, we propose two kinds of supervised machine learning methods for building identification based on GE images. The corresponding machine learning methods, AdaBoost and CNN, allow us to obtain building identification classifiers by training on the designed samples. Applying the obtained AdaBoost and CNN classifiers to automatically extract building information from the remote sensing images generates the building identification maps.
The proposed method demonstrates the ability of supervised machine learning in village mapping, which is experimentally verified on several kinds of areas in Kaysone. The obtained classifier can be used for all cases and no manual interaction is needed. Our CNN method achieves a Kappa of 0.564 and an overall accuracy of 96.30%, which compares favorably with AdaBoost and other methods.
Furthermore, the proposed method can be efficiently utilized in remote sensing recognition beyond buildings alone. By training on the prior knowledge of the corresponding identification samples, the proposed method can generate a classifier able to classify the relevant targets and produce promising classification results. Based on the results of this experiment, our proposed method is therefore a promising approach for many potential applications.
Although this study indicates that the proposed method can be efficiently used in building identification, further and more detailed exploration of the method is required. First, to test the method's stability, more extended and sophisticated areas need to be tested. Second, to alleviate the labor-intensive task of preparing training data, we will apply the model learned on one area to other areas with similar landscapes. However, the spectral characteristics of remote sensing images respond to the varying conditions of image capture; to improve performance in such cases, we will consider transfer learning techniques. Third, we will apply the proposed method to the identification of other features in high-resolution remote sensing images, such as roads and agricultural land. We are also interested in extending this method to multi-class landscape classification. We believe that the proposed method has great practical value for solving diverse classification problems.

Figure 1.
Figure 1. Kaysone area (a); located in Savannakhet Province, Laos (b). Panels (c) and (d) are enlarged views of typical village areas; (e) and (f) show mountain and vegetation areas with similar tones to buildings. The resolution of all images (except (b)) is 1 m.

For each output unit i in the output layer l, the weights W_ij and biases b_i are updated toward the optimum solution; then we can calculate the optimal classification h_{W,b}(x).
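The update behind h_{W,b}(x) can be illustrated with a minimal, self-contained sketch: gradient descent on a single sigmoid output unit. The learning rate, data, and shapes here are illustrative, not the paper's network configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)       # one input sample
y = 1.0                      # its label
W = np.zeros(3)              # weights W_ij feeding output unit i
b = 0.0                      # bias b_i
alpha = 0.5                  # learning rate

for _ in range(200):
    h = sigmoid(W @ x + b)   # forward pass: h_{W,b}(x)
    delta = h - y            # output-layer error (cross-entropy loss)
    W -= alpha * delta * x   # W_ij <- W_ij - alpha * dJ/dW_ij
    b -= alpha * delta       # b_i  <- b_i  - alpha * dJ/db_i

print(round(float(sigmoid(W @ x + b))))  # prediction converges toward the label: 1
```

Back propagation extends the same rule through the hidden layers by propagating delta backward, which is how the multilayer CNN weights are trained.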

Figure 6 .
Figure 6. Classification results (a-d) of the test areas using CNN_C.

Table 1 .
Input data of middle layers.

Table 3 .
Training data of color features.

Table 4 .
Comparison of accuracy based on color features.

Table 5 .
Training data of CNN.

Table 6 .
Comparison of accuracy based on CNN.

Table 7 .
Confusion matrix based on CNN, in pixels.

Table 8 .
Different training data using CNN.

Table 9 .
Comparison of accuracy based on CNN.

Table 10 .
Accuracy assessment of all the testing data in Figure 4 based on CNN.