Unsupervised Global Urban Area Mapping via Automatic Labeling from ASTER and PALSAR Satellite Images

In this study, a novel unsupervised method for global urban area mapping is proposed. Different from traditional clustering-based unsupervised methods, in our approach a labeler is designed, which is able to automatically select training samples from satellite images by propagating common urban/non-urban knowledge through the unlabeled data. Two kinds of satellite images, captured by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and the Phased Array L-band Synthetic Aperture Radar (PALSAR), are exploited here. In this method, spectral features are first extracted from the original dataset, followed by coarse prediction of urban/non-urban areas via weak classifiers. By developing an improved belief-propagation based clustering algorithm, a confidence map is obtained and training data are selected via weighted sampling. Finally, the urban area map is obtained by employing the Support Vector Machine (SVM) classifier. The proposed method can generate urban area maps at a resolution of 15 m, while the same settings are used for all test cases. Experimental results involving 75 scenes from different climate zones show that our proposed method achieves an overall accuracy of 84.4% and a kappa coefficient of 0.628, which is competitive relative to the supervised SVM method. Remote Sens. 2015, 7 2172


Introduction
Urbanization has long been an important issue, with great impact on applications ranging from regional and global environmental change [1,2] and socio-economic problems [3] to urban planning and disaster management [4,5]. The global level of urbanization has been increasing over the past decades, and more than half of the world's population now lives in urban settlements [6]; this further raises the importance and impact of urbanization and has made it a topic of increasing interest worldwide [7,8]. Global urban area maps are exploited in various studies to evaluate the influence of urbanization on the natural and human environments and to estimate important aspects of urbanization such as the size, scale and shape of cities [9]. Compared with traditional methods, satellite-based remote sensing offers advantages in monitoring such properties of urbanization owing to its timeliness, efficiency and global coverage. Therefore, the derivation of global urban area maps and corresponding attributes from different kinds of satellite images is attracting increasing attention worldwide [10-14].
Different classification and clustering methods have been employed to recognize urban areas from satellite images; the reader may refer to [15] for an overview. However, this task remains very challenging because of the large diversity of spectral characteristics of urban areas. An example is the AVHRR 1 km global land cover product [16], which applies an unsupervised clustering algorithm to most land cover classes with the assistance of manual image interpretation. Urban areas, however, cannot be consistently classified by this method because of their heterogeneous features and complex land use patterns. Therefore, additional maps from the Defense Mapping Agency are integrated into the AVHRR product to identify urban areas.
In general, urban area classification methods can be divided into two categories: supervised and unsupervised. Among supervised methods, support vector machine (SVM) based classifiers are very popular because of their good performance and robustness [17]. In [18], an urban area mapping method is proposed that combines multiple SVM classifiers via fuzzy integral and attractor dynamics. In [19], an SVM-based region growing method is presented for extracting urban areas from data captured by the Defense Meteorological Satellite Program's Operational Line-scan System (DMSP-OLS) and Satellites Pour l'Observation de la Terre (SPOT) Vegetation (VGT). Artificial neural network (ANN) based methods are also widely used [20,21], especially in early studies. Other supervised classification methods, such as decision trees, random forests and logistic regression, can also be found in urban area related studies [22-24] and achieve plausible results. In [25], several supervised classification methods are exploited together: logistic regression models are first created to represent the prior probability of urban areas, and supervised classification is then performed by combining decision tree and boosting techniques.

For unsupervised methods, traditional clustering methods such as K-Means and the iterative self-organizing data analysis technique (ISODATA) are often exploited. In the GLC2000 land cover products, unsupervised classification methods are applied to multi-spectral and multi-temporal datasets to generate land cover maps, but the regional products are produced and tuned independently by different groups [13]. In the IGBP-DIS global 1 km land cover products, a K-Means algorithm optimized for handling large datasets is utilized [26].

In some studies, both supervised and unsupervised methods are employed for recognizing urban areas. In the GlobCover product, a supervised spectral classification is first conducted to identify some specific land cover classes. An unsupervised clustering algorithm is then applied to the spectro-temporal characteristics, followed by an automated reference-based labeling step [27,28].
Usually, supervised methods produce higher accuracy than unsupervised ones, but they require more processing steps in order to build reliable training data [19,29]. These studies provide very valuable information about urban settlements, especially for regions in developing countries, which are less documented.
However, supervised and unsupervised urban area mapping methods share a common issue. Because local landscapes vary widely between areas, most classifiers need to be tuned for the local study area, and the accuracy of urban maps may drop sharply if the same settings are applied directly to other areas [16]. This means that a large amount of interaction by experienced researchers is needed to parse the results, which can be very time-consuming and expensive. The cost is even higher for supervised methods, since training samples of good quality need to be collected for each scene. In addition, most global urban mapping products have limited resolution, ranging from about 300 m to 9000 m [11].
In this paper, a robust unsupervised global urban area mapping method is proposed, which performs urban classification fully automatically for all 75 test scenes and is able to generate urban area maps at a resolution of 15 m with an average overall accuracy of 84.4%. The rest of this paper is organized as follows. In Section 2, we briefly introduce the problem and the dataset used in this study. Section 3 describes the details of the proposed method. Experimental results are presented in Section 4 and the paper is summarized in Section 5.

Defining Urban Area
In social and economic studies, an urban area is characterized by high population density and is usually defined by its demographic attributes according to the available information on administrative units. However, as pointed out in [30], this definition of urban area suffers from heterogeneity: national definitions can vary considerably across countries and over time. In addition, it depends on the available information on administrative units, which is less well documented in developing countries and therefore results in low-resolution urban area maps.
Independent of the demographic attributes of regions, urban areas in this paper are defined according to their spectral features, i.e., the values of pixels in multi-spectral satellite images. In the remote sensing literature [3,31], urban areas are usually defined as places recognized as "built-up" objects, such as buildings, roads and dams. Correspondingly, non-urban areas are defined as places without any artificial objects, such as grassland, forests, rivers and agricultural fields. This definition of urban area is homogeneous and can be applied to analyze urban areas across different countries and over time.
For global urban area mapping, the issue of sub-pixel mixing plays an important role. It is considered one of the main reasons why urban area map products derived from low-resolution satellite images can differ significantly [32]. In this research, two kinds of high-resolution satellite images are exploited, and the spatial resolution of our urban area map is 15 m, much finer than that of most existing maps. According to the analysis in [33], we believe that a 15 m urban area map is sufficient for representing most features of urban land cover. Therefore, the problem of sub-pixel mixing is not discussed here and is left as future work. In addition, owing to the limitations of satellite remote sensing, urban areas that are covered by non-urban objects, such as houses hidden by a dense tree canopy, will be classified as non-urban by our method.

ASTER and PALSAR Satellite Images
In the proposed method, satellite images captured by ASTER and PALSAR are exploited for generating urban area maps. The ASTER instrument is provided by the Japanese Ministry of Economy, Trade and Industry (METI) and has been operating for global coverage since December 1999 [34]. ASTER includes three separate optical subsystems with different ground resolutions: the visible and near-infrared (VNIR) radiometer, the shortwave-infrared (SWIR) radiometer, and the thermal infrared (TIR) radiometer. It supplies VNIR satellite images at 15 m spatial resolution, which is finer than that of most existing global urban maps. In addition, VNIR is especially useful since it provides stereo coverage in Band 3 through its nadir (Band 3N) and backward (Band 3B) views. Therefore, ASTER/VNIR images attract increasing attention and have been exploited in a number of urban area related studies such as [35-38].
In this work, four ASTER/VNIR images from three spectral bands (Band 1, Band 2, and the nadir and backward views of Band 3, i.e., Band 3N and Band 3B) are utilized, denoted as $Aster_{b1}$, $Aster_{b2}$, $Aster_{b3}$ and $Aster_{b4}$, respectively. In addition, the research in [39] shows that terrain information is very helpful for recognizing urban areas. Thus the degree of slope, which is calculated from the digital elevation model (DEM) generated by stereoscopic analysis of ASTER/VNIR data, is also exploited.
PALSAR was developed by METI as a joint project with the Japan Aerospace Exploration Agency (JAXA), and was launched in 2006 on board the Advanced Land Observing Satellite (ALOS) [40]. Features of PALSAR, such as multi-polarization and off-nadir pointing, improve the accuracy of recognizing geological structure [41]. PALSAR satellite images have been applied to urban area mapping in recent studies [42,43], and the study in [44] shows that ALOS/PALSAR data distinguish bare land and deserts from urban areas better than ASTER images do. Therefore, PALSAR HH (horizontal transmitting, horizontal receiving) and HV (horizontal transmitting, vertical receiving) polarization images obtained in the Fine Beam Dual polarization (FBD) mode are exploited here (denoted as $hh$ and $hv$, respectively). In addition, to reduce the distortion caused by large local incident angles in mountainous areas, a correction step was performed on the HH images based on the method in [45], resulting in local-incident-angle corrected HH images (denoted as $hh_{cor}$).

Overview
Recognizing urban areas in a fully automatic way is very challenging. Traditional supervised classification methods need sample data to be built for each scene, while unsupervised ones require manual parameter tuning when handling different cases. These steps are very labor-intensive and can be quite expensive. Inspired by recent advances in semi-supervised learning methods, which combine a small amount of labeled data with unlabeled data [46-48], a novel unsupervised urban area mapping method is proposed here.

The general idea of our method can be explained as follows. In Figure 1a, case 1 and case 2 stand for two examples of distributions of urban and non-urban areas, where x stands for the value of their spectral features. To distinguish urban from non-urban areas, the optimal threshold for case 1 is $u_1$. However, the distribution of case 2 is somewhat different and its optimal threshold is $u_2$. Applying $u_1$ to case 2 will lead to many misclassifications, and vice versa. Therefore, using exactly the same classifier for both cases will suffer considerably from the difference between the two distributions. For this reason, our proposed method adapts the prior knowledge to the unlabeled input data. As shown in Figure 1b, based on some general prior knowledge of the spectral distributions of landscapes, some pixels in the satellite images are recognized as urban/non-urban areas (denoted by blue/red points). Meanwhile, a large number of pixels remain unlabeled (denoted by black points). First, the similarity among all pixels is evaluated and the confidence of belonging to urban/non-urban areas is propagated based on this similarity. Training samples are then selected based on the confidence and used to train a traditional supervised classifier. The final result is shown in Figure 1c, where the optimal thresholds $v_1$ and $v_2$ are obtained for case 1 and case 2, respectively. In this way, the proposed method can build the urban area classifiers for different scenes based on the distributions of the input data.
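The threshold argument above can be illustrated with a small one-dimensional sketch. The feature values below are hypothetical and are not taken from the paper; the helper names `accuracy` and `best_threshold` are introduced only for this illustration. The sketch finds the best threshold for each case and then applies the threshold of case 1 to case 2.

```python
def accuracy(threshold, urban, non_urban):
    """Fraction of samples classified correctly when 'x > threshold' means urban."""
    correct = sum(x > threshold for x in urban)
    correct += sum(x <= threshold for x in non_urban)
    return correct / (len(urban) + len(non_urban))


def best_threshold(urban, non_urban):
    """Scan the observed values and return the most accurate threshold."""
    candidates = sorted(set(urban) | set(non_urban))
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = accuracy(t, urban, non_urban)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical spectral feature values for two scenes (illustrative only).
case1_non_urban = [1, 2, 2, 3, 3, 4]
case1_urban = [5, 6, 6, 7, 8, 9]
case2_non_urban = [4, 5, 5, 6, 6, 7]
case2_urban = [8, 9, 9, 10, 11, 12]

u1, acc1 = best_threshold(case1_urban, case1_non_urban)
u2, acc2 = best_threshold(case2_urban, case2_non_urban)

# Re-using the threshold tuned on case 1 for case 2 loses accuracy.
acc_u1_on_case2 = accuracy(u1, case2_urban, case2_non_urban)
print(u1, u2, acc2, acc_u1_on_case2)
```

In this toy setting each case is perfectly separable by its own threshold, yet the threshold of case 1 labels most non-urban samples of case 2 as urban, which is the situation sketched in Figure 1a.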
The key part of our proposed method for global urban area mapping is building training samples in a fully automatic way. A labeler is designed for this task by analyzing ASTER VNIR images, ASTER slope data and PALSAR HH/HV images. Figure 2 shows the detailed processing flow of the proposed method. First, various spectral features are extracted for further analysis. Then, in the labeler, a coarse prediction of urban/non-urban areas is performed by applying prior knowledge to weak classifiers based on these features, resulting in a small number of labeled urban/non-urban pixels. By improving a clustering algorithm known as Learning with Local and Global Consistency (LLGC) [49], an urban area confidence map is obtained and training samples are selected accordingly. Finally, the urban area map is obtained by applying the Support Vector Machine (SVM) classifier to the training samples and extracted features. The main advantages of our proposed method are threefold:
(1) The designed labeler employs only some common knowledge about urban areas for coarse prediction and is able to refine the result adaptively according to the distribution of the current unlabeled data. Therefore, our method shows a strong ability for unsupervised learning from input data, which is demonstrated in our experiment involving 75 scenes over different climate zones.
(2) The proposed method provides competitive accuracy, even when compared with the traditional supervised SVM method.
(3) The proposed method is fully automatic and its performance is quite robust. No manual interaction is needed and the same parameter settings are applied to all test scenes.

Feature Extraction
In addition to the original satellite images ($Aster_{b1}$ ∼ $Aster_{b4}$, $hh$, $hv$) and the preprocessing results (slope, $hh_{cor}$), some other features are also employed. The Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI) have been widely used since their introduction in [50,51] and are considered very effective descriptors of vegetation features [52] and surface water features [53], respectively. For ASTER data, NDVI and NDWI are computed from the VNIR bands.

According to [44], in mountainous areas the difference between the PALSAR $hh$ and $hh_{cor}$ images can be quite large, which is useful for recognizing non-urban areas. Therefore, we define $hh_{sub}$ as the difference between the $hh$ and $hh_{cor}$ images.

Moreover, entropy filtering is a common and effective technique for describing the richness of texture by calculating the local entropy of pixels within a given window. Therefore, to describe the rich texture in urban areas, entropy filtering [54,55] is performed on the PALSAR $hh$ image with a neighboring window size of 15 × 15 pixels, and the result is denoted as $hh_{ent}$.
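The features described above can be sketched with NumPy as follows. Several details are assumptions rather than the definitions used in this study: NDVI is taken in its common form (near-infrared minus red, i.e., Band 3N and Band 2), NDWI is taken in the green/near-infrared form (Band 1 and Band 3N), $hh_{sub}$ is taken as $hh - hh_{cor}$ (the sign convention is a guess), and the entropy filter quantizes the image to a hypothetical number of gray levels before computing the local Shannon entropy. All function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator vanishes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a + b
    return np.divide(a - b, total, out=np.zeros_like(total), where=total != 0)


def local_entropy(image, window=15, levels=256):
    """Shannon entropy (bits) of quantized values in an odd-sized square window."""
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    scale = (hi - lo) or 1.0  # avoid dividing by zero on a flat image
    quantized = np.minimum((levels * (image - lo) / scale).astype(int), levels - 1)
    pad = window // 2
    padded = np.pad(quantized, pad, mode="reflect")
    windows = sliding_window_view(padded, (window, window))
    entropy = np.zeros(image.shape)
    for level in np.unique(quantized):
        p = (windows == level).mean(axis=(2, 3))  # local frequency of this level
        present = p > 0
        entropy[present] -= p[present] * np.log2(p[present])
    return entropy


def extract_features(b1, b2, b3n, hh, hh_cor, window=15):
    """Assumed forms of NDVI, NDWI, hh_sub and hh_ent (see the lead-in)."""
    return {
        "ndvi": normalized_difference(b3n, b2),   # (NIR - red) / (NIR + red)
        "ndwi": normalized_difference(b1, b3n),   # (green - NIR) / (green + NIR)
        "hh_sub": np.asarray(hh, float) - np.asarray(hh_cor, float),
        "hh_ent": local_entropy(hh, window=window),
    }
```

The loop in `local_entropy` favors readability over speed; for full scenes a dedicated entropy filter from an image processing library would be the more practical choice.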

Predict Non-Urban and Urban Area
As mentioned above, in this step urban/non-urban areas are predicted based on some common prior knowledge. Several independent weak classifiers are utilized, generating a number of urban/non-urban pixels that will later be used as seeds for the LLGC clustering algorithm. Please note that we do not expect a single weak classifier to recognize urban/non-urban areas with high accuracy. The purpose of this step is to make a coarse prediction of salient urban/non-urban areas by combining these weak classifiers. In our design, some misclassified pixels are acceptable because the subsequent LLGC algorithm is robust to noise.
When applying this step, we assume that a sufficient number of urban and non-urban pixels exist in the scene. Otherwise, the urban/non-urban area prediction will be inaccurate, leading to poor urban area classification. We suggest that at least $10^5$ points of each of the urban and non-urban land cover classes should appear in the scene to ensure the diversity of spectral features of urban/non-urban areas. Usually, this requirement is easily satisfied as long as a city is included in the selected image.
The predictor for non-urban areas is defined in Equation (5). Here mean(·) stands for the average value of the input image and std(·) for its standard deviation. This predictor consists of 6 independent classifiers, which generate 6 corresponding masks. For each mask, the value is 1 if the condition is satisfied and 0 otherwise. $mask_1$ and $mask_2$ are defined according to NDVI and NDWI: pixels whose values are much higher than the mean value are marked, indicating obvious vegetation and water areas. $mask_3$ is defined in a similar way and is intended to recognize non-urban areas, such as mountainous areas, that show a large difference between $hh$ and $hh_{cor}$ owing to the influence of the incident angle. Based on our observations, the values of the PALSAR HH image in urban areas are usually much higher than those of non-urban objects, and $mask_4$ is designed based on this rule. By analyzing the slope data and the richness of the texture, $mask_5$ and $mask_6$ are defined based on given thresholds. Based on our experience, the values of $thresh_5$ and $thresh_6$ are set to 15 and 4.5, respectively, for all cases in the following experiments.
For all 6 masks, binary morphological operations [56], denoted here as $MorphFilt(\cdot)$, are utilized for refinement. First a morphological close operation is performed, followed by a morphological open operation, with a fixed structuring element of 10 × 10 pixels. The purpose is to remove isolated non-urban areas that include few pixels. $mask_{nonurban}$ is then obtained by taking the union of these refined masks.
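A close-then-open filter of this kind can be sketched in plain NumPy as below. This is an illustrative implementation with a square structuring element, not the routine used in this study; the border handling (background padding for dilation, foreground padding for erosion) and the function names are assumptions. For even sizes such as 10 × 10 the window is not symmetric about the pixel, so dilation uses the mirrored window to keep closing and opening well behaved.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(mask, size, pad_value, flip):
    """View every size x size neighbourhood of a padded boolean mask."""
    before, after = size // 2, size - 1 - size // 2
    if flip:  # mirrored footprint, which matters for even sizes
        before, after = after, before
    padded = np.pad(mask, ((before, after), (before, after)),
                    constant_values=pad_value)
    return sliding_window_view(padded, (size, size))


def erode(mask, size):
    """A pixel survives only if its whole neighbourhood is set."""
    return _windows(mask, size, True, flip=False).all(axis=(2, 3))


def dilate(mask, size):
    """A pixel is set if any pixel of its (mirrored) neighbourhood is set."""
    return _windows(mask, size, False, flip=True).any(axis=(2, 3))


def morph_filt(mask, size=10):
    """Morphological close (dilate, erode) followed by open (erode, dilate)."""
    mask = np.asarray(mask, dtype=bool)
    closed = erode(dilate(mask, size), size)
    return dilate(erode(closed, size), size)


# Small demonstration with a 3 x 3 element: the hole is filled by the close
# step and the isolated pixel is removed by the open step.
demo = np.zeros((12, 12), dtype=bool)
demo[2:9, 2:9] = True
demo[5, 5] = False   # one-pixel hole inside the block
demo[0, 11] = True   # isolated pixel
cleaned = morph_filt(demo, size=3)
```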
The predictor for urban areas is designed in a similar way, and $mask_{nonurban}$ is also integrated. The definition is as follows:

$$mask_{urban} = MorphFilt\big(mask_7 \cap Not(mask_{nonurban})\big)$$

Here $Not(\cdot)$ denotes the inverse of a binary mask. Similar to $mask_4$, $mask_7$ also exploits the rule about the high reflectance of urban areas. Finally, $mask_{urban}$ is obtained by using morphological operations to refine the intersection of $mask_7$ and the inverse of $mask_{nonurban}$.
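The way the weak classifiers combine into seed masks can be sketched as follows. The exact conditions of Equation (5) and of $mask_7$ are not reproduced here, so the sketch rests on assumptions: "much higher than the mean" is modeled as exceeding mean + k·std with a hypothetical multiplier `k`, $mask_4$ is modeled as HH values well below the mean, and $mask_5$/$mask_6$ are modeled as slope above $thresh_5$ and entropy below $thresh_6$. The morphological filter is passed in as a parameter and defaults to the identity.

```python
import numpy as np


def predict_seeds(ndvi, ndwi, hh_sub, hh, slope, hh_ent,
                  k=1.0, thresh5=15.0, thresh6=4.5,
                  morph_filt=lambda mask: mask):
    """Return (mask_urban, mask_nonurban) from assumed weak classifiers.

    k is a hypothetical multiplier; thresh5 and thresh6 follow the text.
    """
    def high(x):
        return x > x.mean() + k * x.std()

    def low(x):
        return x < x.mean() - k * x.std()

    weak_masks = [
        high(ndvi),        # mask 1: obvious vegetation
        high(ndwi),        # mask 2: obvious water
        high(hh_sub),      # mask 3: strong incident-angle effect (mountains)
        low(hh),           # mask 4: weak HH backscatter (assumed direction)
        slope > thresh5,   # mask 5: steep terrain (assumed direction)
        hh_ent < thresh6,  # mask 6: poor texture (assumed direction)
    ]

    mask_nonurban = np.zeros(hh.shape, dtype=bool)
    for mask in weak_masks:
        mask_nonurban |= morph_filt(mask)  # union of the refined masks

    mask7 = high(hh)       # strong HH backscatter (assumed condition)
    mask_urban = morph_filt(mask7 & ~mask_nonurban)
    return mask_urban, mask_nonurban


# Tiny illustrative scene: one vegetated pixel and one bright HH pixel.
zeros = np.zeros((1, 6))
ndvi = np.array([[0.9, 0.0, 0.0, 0.0, 0.0, 0.0]])
hh = np.array([[1.0, 1.0, 1.0, 1.0, 1.0, 10.0]])
urban, nonurban = predict_seeds(ndvi, zeros, zeros, hh, zeros,
                                np.full((1, 6), 5.0))
```

Pixels covered by neither mask remain unlabeled, which matches the role of the masks as seeds rather than as a complete classification.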

Confidence Estimate by LLGC
It is noteworthy that $mask_{urban}$ and $mask_{nonurban}$ do not represent the full sets of urban and non-urban pixels, respectively. In principle, they represent only the subsets of urban/non-urban pixels that are salient enough to be recognized by the prior knowledge. In practice, this prediction is not error-free and the labels of some pixels may be incorrect. Here the prediction result is regarded as the set of initial seeds for the LLGC clustering algorithm, and the urban area confidence map is built by propagating the belief from the seeds through the whole feature space in which the input data reside.
The LLGC algorithm [49] was first proposed in 2004 and has been widely used due to its good performance and stability against noisy initialization. Here we provide a brief introduction to the LLGC algorithm and how it is implemented in our method.
Given a set of pixels $X = \{x_1, x_2, ..., x_N\}$, the initial value of the $N \times 2$ non-negative label matrix $F$ is defined from the seed masks, where the first and second elements of the row $F_i$ indicate the confidence of $x_i$ belonging to urban and non-urban areas, respectively. The affinity matrix $W_{N \times N} = [W_{i,j}]$ defines how the confidence is propagated according to the similarity between each pair of pixels; it depends on the constant $\sigma$, which stands for the kernel size, and on $dist(\cdot)$, a scalar function indicating the difference between the feature vectors of $x_i$ and $x_j$. The normalized propagation matrix $S$ is then constructed from $W$ and $D$, where $D = diag\{d_1, d_2, ..., d_N\}$ is a diagonal matrix with $d_i$ equal to the sum of the i-th row of $W$. The label matrix $F$ is updated iteratively by propagating the confidence from labeled points to unknown ones, with a propagation parameter $\alpha$ in (0, 1), and the final solution $F^*$ can be expressed explicitly. In our method, the ASTER/VNIR Band 1, Band 2 and Band 3 images are exploited as clustering features and are merged to generate a color image $Aster_{rgb}$, on which $dist(\cdot)$ is evaluated.

However, two problems arise if the original LLGC algorithm is applied directly. First, the numbers of urban and non-urban pixels produced by the coarse predictor are imbalanced. The number of urban pixels is much smaller than that of non-urban pixels, and the LLGC algorithm will mark almost all pixels as non-urban since labeled non-urban pixels have a much stronger influence in the propagation step. Second, in our test cases the variable $N$, which stands for the number of valid pixels in a satellite image, can be as large as 2,000,000, and it is infeasible to construct the dense matrices $W$ and $S$ at this size.
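For reference, the standard LLGC formulation of [49] can be written in the notation of this section as sketched below. Two points are assumptions rather than statements about this study: the initial matrix is written as $F^{(0)}$, and the affinity is shown with the usual Gaussian kernel on the squared distance, whereas the exact kernel and the exact form of $dist(\cdot)$ used here may differ.

```latex
% Initial labels taken from the seed masks
F^{(0)}_i =
\begin{cases}
(1,\,0) & \text{if } x_i \in mask_{urban},\\
(0,\,1) & \text{if } x_i \in mask_{nonurban},\\
(0,\,0) & \text{otherwise.}
\end{cases}

% Affinity between pixels (Gaussian kernel, zero diagonal)
W_{i,j} =
\begin{cases}
\exp\!\left(-\dfrac{dist(x_i, x_j)^2}{2\sigma^2}\right) & \text{if } i \neq j,\\
0 & \text{if } i = j.
\end{cases}

% Symmetric normalization with D = diag(d_1, ..., d_N), d_i = \sum_j W_{i,j}
S = D^{-1/2}\, W\, D^{-1/2}

% Iterative propagation with \alpha \in (0, 1)
F^{(t+1)} = \alpha\, S\, F^{(t)} + (1 - \alpha)\, F^{(0)}

% Limit of the iteration
F^{*} = \lim_{t \to \infty} F^{(t)} = (1 - \alpha)\,(I - \alpha S)^{-1} F^{(0)}
```

The factor $(1 - \alpha)$ in the last line does not affect which of the two columns of $F^*$ is larger for a given pixel, so it is often dropped when only the relative confidence matters.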
To solve these problems, we improve the LLGC algorithm in two respects:
(1) Quantize the $Aster_{rgb}$ image by converting it into an indexed image and then apply LLGC to the indexed colors. Pixels with the same indexed color are treated as one entry in $F$, with the number of pixels integrated into the matrices $W$ and $S$ correspondingly. In this way, the problem size is no larger than the maximum number of indexed colors, which is set to 300 in all test cases. The pixels are now represented by $X = \{(x_i, n_i), i = 1, 2, ..., M\}$, where $M$ stands for the total number of indexed colors and $n_i$ for the number of pixels that belong to the i-th indexed color. The affinity matrix $W$ is defined slightly differently from Equation (21), and its size becomes $M \times M$; note that now $W_{i,i} = 1$. It can be proved that the propagation matrix $S_{M \times M} = [S_{i,j}]$ can likewise be expressed explicitly. By this means, the improved LLGC algorithm achieves promising results while the computational cost is greatly reduced.
(2) Based on $mask_{urban}$, find the largest connected urban area and choose the sub-image given by its bounding rectangle. The numbers of urban and non-urban pixels in this sub-image are balanced, and the LLGC algorithm is performed on this region. The urban area confidence map of the sub-image can be mapped back to the whole image according to the rule that pixels with the same indexed color share the same confidence.
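One way to fold the pixel counts into LLGC on indexed colors is sketched below. This is a derivation under stated assumptions, not necessarily the expression used in this study: pixels sharing an indexed color are treated as exact duplicates, the affinity is a Gaussian kernel on the squared Euclidean distance between color vectors (so that $W_{i,i} = 1$), and the count-weighted propagation matrix is taken as $S_{i,j} = n_j W_{i,j} / \sqrt{d_i d_j}$ with $d_i = \sum_j n_j W_{i,j}$. The function name and the default values of `sigma` and `alpha` are hypothetical.

```python
import numpy as np


def weighted_llgc(colors, counts, seeds, sigma=0.2, alpha=0.9):
    """Closed-form LLGC over M indexed colors with pixel counts.

    colors: (M, 3) color vectors, counts: (M,) pixels per color,
    seeds:  (M, 2) initial [urban, non-urban] confidence per color.
    Returns the (M, 2) propagated confidence matrix.
    """
    colors = np.asarray(colors, dtype=float)
    counts = np.asarray(counts, dtype=float)
    seeds = np.asarray(seeds, dtype=float)
    m = len(counts)

    # Gaussian affinity between colors; the diagonal equals 1.
    diff = colors[:, None, :] - colors[None, :, :]
    kernel = np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * sigma ** 2))

    # Each color j acts like counts[j] identical pixels.
    weighted = kernel * counts[None, :]
    degree = weighted.sum(axis=1)
    s = weighted / np.sqrt(np.outer(degree, degree))

    # Solve (I - alpha * S) F = seeds instead of iterating to convergence.
    return np.linalg.solve(np.eye(m) - alpha * s, seeds)


# Illustrative input: two dark colors and two bright colors, one seed each.
colors = [[0.00, 0.0, 0.0], [0.05, 0.0, 0.0], [1.00, 1.0, 1.0], [0.95, 1.0, 1.0]]
counts = [10, 20, 500, 30]
seeds = [[1, 0], [0, 0], [0, 1], [0, 0]]  # color 0 urban, color 2 non-urban
confidence = weighted_llgc(colors, counts, seeds)
```

With at most a few hundred indexed colors the linear system is tiny, which is why the dense $M \times M$ solve is affordable even though the image itself has millions of pixels.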
In this way, the improved LLGC algorithm is able to generate the urban area confidence map efficiently and effectively. Training data for further classification, i.e., samples of urban/non-urban pixels, are obtained through weighted sampling, where the confidence of each pixel is used as the weight.
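Confidence-weighted sampling can be sketched with the standard library alone. The details are assumptions made for illustration: the confidence is taken as a value in [0, 1] per pixel, non-urban samples are drawn with weight one minus the urban confidence, and sampling is done with replacement. The function name is hypothetical.

```python
import random


def sample_training_pixels(confidence, n_urban, n_non_urban, seed=0):
    """Draw pixel indices with probability proportional to their confidence.

    confidence: urban confidence per pixel, assumed to lie in [0, 1].
    Returns (urban_indices, non_urban_indices), sampled with replacement.
    """
    rng = random.Random(seed)
    indices = list(range(len(confidence)))
    urban = rng.choices(indices, weights=confidence, k=n_urban)
    non_urban = rng.choices(indices,
                            weights=[1.0 - c for c in confidence],
                            k=n_non_urban)
    return urban, non_urban


# Illustrative confidence values for four pixels.
urban_idx, non_urban_idx = sample_training_pixels([0.0, 1.0, 0.0, 1.0], 5, 3)
```

Pixels whose weight is zero are never drawn, so confidently non-urban pixels cannot enter the urban training set and vice versa.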

Urban Area Classification
Based on our proposed labeler, training data are obtained automatically, and traditional supervised methods for urban area classification can now be applied. Here the widely used Support Vector Machine (SVM) classifier [57] is exploited.
In our method, a total of 10 features ($Aster_{b1}$, $Aster_{b2}$, $Aster_{b3}$, $Aster_{b4}$, slope, NDVI, NDWI, $hh$, $hv$, $hh_{ent}$) are used for classification via the classical SVM classifier with a linear kernel function. The SVM classifier was implemented using the LIBSVM library [58].

Accuracy Assessment
To evaluate the accuracy of the extracted urban area maps, a widely used assessment method based on the confusion matrix is employed here. The confusion matrix is generated by cross-tabulation of the class labels from the classification results against the ground truth data. The diagonal elements of the confusion matrix represent the cases where the classification results agree with the ground truth data, while the off-diagonal ones show disagreements in the labels. For urban area mapping, there are two classes, urban and non-urban (abbreviated as U and NU, respectively), and the size of the confusion matrix is 2 × 2. The structure of the confusion matrix is shown in Figure 3.

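Figure 3. Structure of the confusion matrix (rows: classification results; columns: ground truth data).

                          Ground truth data
Classification results    NU          U           Total
NU                        $n_{nn}$    $n_{nu}$    $n_{n+}$
U                         $n_{un}$    $n_{uu}$    $n_{u+}$
Total                     $n_{+n}$    $n_{+u}$    $n$

Following [59], the performance of urban area mapping is evaluated via 4 parameters, defined as follows:

$$\text{User's accuracy} = \frac{n_{uu}}{n_{u+}}$$

$$\text{Producer's accuracy} = \frac{n_{uu}}{n_{+u}} \tag{30}$$

$$\text{Overall accuracy} = \frac{n_{uu} + n_{nn}}{n} \tag{31}$$

$$\text{Kappa} = \frac{n\,(n_{nn} + n_{uu}) - (n_{n+}\,n_{+n} + n_{u+}\,n_{+u})}{n^2 - (n_{n+}\,n_{+n} + n_{u+}\,n_{+u})}$$

In general, overall accuracy indicates the rate of correct classification, while user's accuracy and producer's accuracy show whether the urban areas have been overestimated or underestimated. The kappa coefficient measures the agreement between the classification and the ground truth beyond what is expected by chance, and is sometimes regarded as a more robust measure than overall accuracy. For a more detailed interpretation of these parameters, please refer to [59,60].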
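The four parameters can be computed from the 2 × 2 confusion matrix as in the following sketch. The counts in the example are made up for illustration and are not results from this study; the function name is hypothetical, and the kappa coefficient is written in the equivalent form (observed agreement minus chance agreement) divided by (1 minus chance agreement).

```python
def accuracy_metrics(n_nn, n_nu, n_un, n_uu):
    """Accuracy parameters from a 2 x 2 confusion matrix.

    The first index is the classification result and the second index is the
    ground truth, e.g. n_un = classified urban but non-urban on the ground.
    """
    n = n_nn + n_nu + n_un + n_uu
    row_n, row_u = n_nn + n_nu, n_un + n_uu   # totals per classified label
    col_n, col_u = n_nn + n_un, n_nu + n_uu   # totals per ground-truth label

    overall = (n_nn + n_uu) / n
    chance = (row_n * col_n + row_u * col_u) / n ** 2
    return {
        "users_accuracy": n_uu / row_u,       # share of mapped urban that is urban
        "producers_accuracy": n_uu / col_u,   # share of true urban that is mapped
        "overall_accuracy": overall,
        "kappa": (overall - chance) / (1.0 - chance),
    }


# Illustrative counts only.
metrics = accuracy_metrics(n_nn=40, n_nu=10, n_un=5, n_uu=45)
print(metrics)
```

In this example the user's accuracy (0.9) exceeds the producer's accuracy (about 0.82), which indicates that urban area is underestimated rather than overestimated.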

Study Area
In this experiment, 75 urban areas are investigated; their locations are shown in Figure 4. Considering that the performance of urban area classification may vary with the landscapes in the scene, these areas are selected from different climate zones, in proportions similar to the number of cities per climate zone in the GRUMP settlement points [22]. In total, 10 scenes are from cities in the tropical zone, 16 from the arid zone, 33 from the temperate zone, and 16 from the cold zone. All ASTER images were obtained within the period from January 2000 to March 2008 and are aligned based on the Global Earth Observation Grid (GEO Grid) as described in [61]. As for PALSAR images, the Level 4.1 product (see the user's guide in [40] for more details) was utilized, and the pixel spacing of the HH/HV polarization images is 12.5 m. The PALSAR HH/HV images were captured from January 2006 to March 2011, and spatial resampling was performed to align them with the ASTER data of 15 m resolution, based on the GEO Grid service [62]. We assume that there are no significant changes of urban area in these scenes between the capture dates of the ASTER and PALSAR images; this assumption is reasonable in most cases.

Ground Truth Data
To provide a quantitative evaluation of the accuracy of the extracted urban area maps, ground truth data were collected manually, based on false color images composed of ASTER/VNIR satellite images (see Figure 5a for an example). One author and two trained assistants manually selected urban/non-urban pixels from the false color image based on their visual appearance in color tone and texture. Each operator separately selected a set of possible urban/non-urban points at random and then submitted the data to the other two operators for verification. For a point to be accepted as urban (or non-urban), at least two of the three operators had to interpret it as such. For each scene, about 80∼90 pixels in total for urban/non-urban areas were sampled at random. It is noteworthy that the ground truth data are not involved in our method and are used only for evaluating the performance of different methods.

Criterion of Performance Evaluation
To verify the performance of our proposed method, the accuracy of its urban area maps is compared with that of two baseline maps. First, we employed the global urban area map of 2001 from MCD12Q1 [63], which is derived from Terra- and Aqua-MODIS data. It has a resolution of about 500 m and covers all investigated cities in our experiment. In addition, it was found to be the most accurate of 8 urban area maps evaluated over 140 cities [32]. Here the MCD maps were resampled to 15 m resolution using the resample function in the GRASS GIS software [64]. Second, we designed a supervised urban area extraction method based on SVM. Half of the ground truth points are used as training data, and the same procedures as in Section 3.5 are performed to classify pixels and thereby generate the urban area map.
To evaluate the quality of the extracted urban area maps, the accuracy parameters (user's accuracy, producer's accuracy, overall accuracy and kappa) are calculated and the corresponding confusion matrices are listed. In addition, the visual appearance of some cases is presented.
Unsupervised classification methods were not selected for comparison, for two reasons. First, as shown in [13,16,28], a large amount of human interaction is needed for post-refinement. Second, the performance of such methods depends heavily on the characteristics of local landscapes, and parameters usually need to be carefully tuned for different scenes. Our proposed method is fully automatic, and the experimental results were obtained on 75 different scenes with fixed parameter settings. Our method is therefore preferable to such unsupervised methods in these two respects. As for the accuracy of urban area mapping, we believe that the comparison with the SVM method, which achieves promising results in urban area mapping studies [17,19,65], is sufficient to demonstrate the performance of our method.

Processing Results of Proposed Method
In this subsection, we demonstrate how each part of the proposed method works via an example. Figure 5 shows an example taken at Mexicali, Mexico. Figure 5a is the ASTER/VNIR false color image, where the red channel stands for VNIR Band 3N (0.76-0.86 µm), green for VNIR Band 1 (0.52-0.60 µm), and blue for VNIR Band 2 (0.63-0.69 µm). Figure 5b is the PALSAR false color image, where the red channel stands for $hh$, green for $hh_{cor}$, and blue for $hv$. Following the method described in Section 3.3, the predicted urban/non-urban areas are obtained (see Figure 5c), where blue points stand for urban and green for non-urban. A number of locations remain unknown and are marked by white points. The urban area confidence map derived by the improved LLGC method and the automatically selected samples are displayed in Figure 5d,e, respectively. In the confidence map, the intensity of a pixel stands for the likelihood of belonging to an urban area, where a higher value indicates a higher likelihood. In total, 500 urban points and 300 non-urban points (marked as blue and green crosses, respectively) are selected by our labeler and used to train the SVM-based urban area model. The final urban area map according to the SVM classification result is given in Figure 5f.
It can be seen that the prediction map generated from the common prior knowledge gives only a rough estimate of the urban/non-urban areas. Some points are marked with incorrect labels and some are still unknown. The result is refined by using the improved LLGC method to propagate the confidence of points, selecting corresponding training samples, and utilizing the SVM method to build the urban area classifier. It is clear that the final urban area map matches the ASTER and PALSAR images much better than the prediction map does.

Comparison Results and Discussions
As mentioned in Section 4.3, the urban area maps extracted by our method are compared with the maps from MCD12Q1 and the maps generated by the supervised SVM method (abbreviated as MCD and SVM, respectively). The accuracy parameters by climate zone are listed in Table 1 and the corresponding confusion matrices are displayed in Table 2. Across the climate zones, the overall accuracy of our method is about 10% to 14% higher than that of MCD and about 3% to 5% lower than that of SVM. In terms of the kappa coefficient, our method also outperforms MCD and performs close to SVM. In addition, the performance of the proposed method is quite stable across climate zones. SVM is also stable for all zones, while the performance of MCD differs slightly between the cold and temperate zones.
It is noteworthy that MCD has the best producer's accuracy and the worst user's accuracy. The reason can be found in the confusion matrix: in the MCD maps most of the ground truth urban points have been successfully included, but a large percentage of non-urban points have also been incorrectly classified as urban. In contrast, although the SVM maps and our maps miss more ground truth urban points, the number of misclassified non-urban points is much smaller than in the MCD maps. Figure 6 shows the extracted urban area maps of five cities (Mexicali, Addis Ababa, Niamey, Khulna and Fes), in comparison with the MCD and SVM maps. The spectral characteristics of these scenes vary considerably because of the differences in landscape, which is very challenging for traditional unsupervised methods. Our method automatically adapts to the differences across scenes and extracts high-resolution urban area maps with promising accuracy. The urban area maps produced by our method describe urban areas better than the MCD maps, and their quality is quite similar to that of the SVM maps.
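To make the relationship between these accuracy measures concrete, the following minimal sketch computes the overall accuracy, kappa coefficient, producer's accuracy and user's accuracy of the urban class from a 2 × 2 confusion matrix. The function name, the argument layout and the example counts are hypothetical; they are not the values reported in Table 2.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Accuracy measures for the urban class from a 2 x 2 confusion matrix.

    tp: ground truth urban, mapped urban
    fn: ground truth urban, mapped non-urban (missed urban points)
    fp: ground truth non-urban, mapped urban (false urban points)
    tn: ground truth non-urban, mapped non-urban
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    producers = tp / (tp + fn)   # share of true urban points that were found
    users = tp / (tp + fp)       # share of mapped urban points that are correct
    # Chance agreement expected from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (overall - expected) / (1.0 - expected)
    return {"overall": overall, "kappa": kappa,
            "producers": producers, "users": users}

# Hypothetical counts: a map that over-predicts urban area versus a more
# conservative one, evaluated on the same 100 reference points.
over_mapped = accuracy_metrics(tp=48, fn=2, fp=30, tn=20)
conservative = accuracy_metrics(tp=40, fn=10, fp=5, tn=45)
```

With these illustrative counts, the over-predicting map has the higher producer's accuracy but the lower user's accuracy, overall accuracy and kappa, which is the same pattern as the one described for the MCD maps.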
In general, the experimental results indicate that our proposed unsupervised method performs better than the low-resolution MCD maps and is comparable to the supervised SVM method. However, the method has two limitations. First, as mentioned in Section 3.3, it is assumed that the images to be classified include a sufficient number of urban and non-urban pixels. To satisfy this assumption, a small amount of manual work on urban area selection or confirmation is usually required. Second, the key contribution of our proposed method is building training samples in a fully automatic way. In the most ideal case, its performance should be close to that of the supervised SVM method. Therefore, it is not realistic to expect the proposed method to outperform the supervised SVM method with manually selected samples. Nevertheless, since our proposed method is automatic and uses fixed parameter settings, we believe its performance is very promising for many potential urban area mapping applications.

Conclusions and Future Work
In this paper, we present an unsupervised method for global urban area mapping based on ASTER and PALSAR satellite images. With our carefully designed labeler, the common prior knowledge about urban/non-urban areas is propagated through the unlabeled dataset via the improved LLGC clustering algorithm, and training samples are selected automatically. The urban area map is generated by applying the SVM classifier to the extracted samples and spectral features.
The proposed method shows a strong ability to learn from input datasets without supervision, which is demonstrated in an experiment involving 75 scenes from different climate zones. The same parameter settings are used for all cases and no manual interaction is needed. Our method achieves an overall accuracy of 84.4% and a kappa coefficient of 0.628, which is comparable to the supervised SVM method.
More importantly, the proposed method suggests a novel framework for unsupervised learning problems in the field of remote sensing. Given some common prior knowledge about the objects of interest and a sufficiently large unlabeled data set, the proposed framework can transfer the prior knowledge to the new data set in a reasonable way, leading to promising classification results. Therefore, we believe that the proposed framework has great practical value for various classification problems in remote sensing and might be applied in many potential applications in the near future.
Future work consists of three aspects. First, we plan to extend this method by using additional high-resolution global land cover data sets such as Corine Land Cover data. Second, the method needs a sufficient number of urban/non-urban pixels in the coarse prediction step. Therefore, we will try to improve the performance of this step by employing more prior knowledge. Finally, we are also interested in utilizing other semi-supervised learning methods, so that prior knowledge can be further integrated with the unlabeled dataset.

Figure 1. Methodology of unsupervised urban area mapping. (a) Two examples of traditional unsupervised classification under different distributions; (b) Step 1/2 of our method: find some salient candidates based on common prior knowledge; (c) Step 2/2 of our method: propagate the confidence of candidates based on the current distribution, select training samples automatically and perform classification.

Figure 2. Processing flow diagram of unsupervised global urban area mapping.

Figure 4. Distribution of investigated urban areas, which are marked by red crosses.

Figure 5. Processing results of our proposed method at Mexicali, Mexico (32.65° N, 115.52° W). (a) ASTER/VNIR false color image; (b) PALSAR false color image (image contrast was enhanced for better visual effect); (c) Prediction of urban/non-urban area; (d) Urban area confidence map derived by improved LLGC; (e) Generated training data; (f) Final urban area map.

Figure 6. Comparison results of urban area maps. (a) ASTER/VNIR false color image; (b) PALSAR false color image (image contrast was enhanced for better visual effect); (c) Urban area map derived by our method; (d) MCD urban area map; (e) SVM urban area map.

Figure 3. The structure of the confusion matrix for urban area mapping.

Table 1. Accuracy assessment of urban area maps.

Table 2. Confusion matrix of urban area maps.