Mapping Urban Land Use by Using Landsat Images and Open Social Data

High-resolution urban land use maps have important applications in urban planning and management, but the availability of these maps is low in countries such as China. To address this issue, we have developed a protocol to identify urban land use functions over large areas using satellite images and open social data. We first derived parcels from road networks contained in Open Street Map (OSM) and used the parcels as the basic mapping unit. We then used 10 features derived from Points of Interest (POI) data and two indices obtained from Landsat 8 Operational Land Imager (OLI) images to classify parcels into eight Level I classes and sixteen Level II classes of land use. Similarity measures and threshold methods were used to identify land use types in the classification process. This protocol was tested in Beijing, China. The results showed that the generated land use map had an overall accuracy of 81.04% and 69.89% for Level I and Level II classes, respectively. The map revealed significantly more details of the spatial pattern of land uses in Beijing than the land use map released by the government.


Introduction
Urbanization in China is taking place at a fast rate [1]. The proportion of urban population increased from 20% in 1982 to more than 50% in 2014 [2]. Large-scale urbanization has had a dramatic impact on the environment and on the wellbeing of seven hundred million urban residents. Studies that assess this process and its impacts are important for taking remedial actions and designing better urbanization strategies for the future. To achieve these goals, detailed urban land cover/use maps are required.
Currently, land cover information at resolutions ranging from low to high is the primary data source in studies such as urban growth simulation [3][4][5], evaluation of urban public health [6], and assessment of urban ecosystem services [7][8][9]. However, studies of issues such as housing provision [10,11], urban transportation, job accessibility and residential relocation [12,13], and land use patterns [14] require detailed information on urban land use, because land use and land cover are different concepts: land use is a cultural concept that describes human activities and their use of land, whereas land cover is a physical description of the land surface [15]. Land cover can be used to infer land use, but the two concepts are not entirely interchangeable.
Nevertheless, high-resolution urban land use maps covering large spatial extents are relatively rare, because the local knowledge and techniques necessary for developing such maps are often unavailable, particularly in developing regions [16,17]. Moreover, urban land use maps are normally produced by interpreting aerial photographs, field survey results, and auxiliary materials such as appraisal records or statistical data [18,19]. Urban development often outpaces the intermittent efforts to update existing land use databases, resulting in outdated maps. To make the situation worse, high-resolution land use maps are frequently kept out of reach of the general public. As a result, obtaining timely and accurate land use maps that keep pace with urban development at relatively large spatial scales is a critical challenge in urban studies, both in China and in other countries facing similar situations.
Satellite-based remote sensing offers certain advantages for monitoring the dynamics of urban land use because of its large spatial coverage, high temporal resolution, and wide availability. Pixel-based image classification methods using spectral [20] and/or textural properties [21,22] are frequently applied to extract urban land use information. Recently, per-field and object-based classification methods have gained popularity for deriving land use from satellite images [19,23], because per-field classification can better describe the functions of urban areas and serve the needs of urban planning [24]. Although significant progress has been achieved, deriving high-resolution urban land use maps from satellite images is still difficult. Medium-resolution satellite images (e.g., Landsat images) allow urban areas to be mapped at large spatial scales, but it is still difficult to extract socioeconomic features of urban areas from these images [25]. Land cover information derived from medium-resolution satellite images cannot adequately separate urban functional zones. Satellite images with high spatial and spectral resolution provide more detailed information on urban structures and thus facilitate the assignment of socioeconomic functions to different zones. However, such images are generally prohibitively expensive.
The emergence of open social data creates new opportunities for mapping urban land use at high resolution. Open social data containing spatiotemporal patterns of human activities can be used to uncover land uses and intra-city functions [25]. Efforts have been made to use social data sets, such as mobile phone records [26][27][28], taxi data [29,30], and smart card data [31], to reveal characteristics of urban land use. The results have shown that social data can accurately capture the spatiotemporal rhythm of human activities, providing a way to explore how cities function at fine spatiotemporal resolution [32]. However, existing studies were often limited to relatively small areas or specific land use types and relied on data sets that were subjective or proprietary. In addition, the physical attributes of urban functional parcels, which can help determine land use at the parcel level, were seldom included in these studies [25].
There is strong potential to combine the strengths of these two data sources, i.e., integrating social knowledge with remotely sensed data, to gain better insight into urban land use patterns. Physical features extracted from satellite data and socioeconomic features retrieved from open social data can be combined to characterize urban land use more accurately. One type of open social data that is particularly promising for this purpose is Points of Interest (POI) data. POI data are geographical data provided voluntarily by individuals and are mainly used for monitoring users' positions in spatial tracking or geo-caching systems. POI data can link geographic locations to particular places, descriptive features, and other place-based information. In other words, POI data can link the informal world of everyday human discourse and the formal world of GIS (Geographic Information System) [33][34][35][36]. To our knowledge, there are no reports on using POI data and satellite data jointly to produce detailed land use maps. To fill this gap, we developed a protocol that uses medium-resolution satellite images and POI data to map urban land use. In the rest of the paper, we describe the protocol and its application in Beijing, China. We also discuss the strengths and limitations of the proposed protocol and make suggestions for future work in this field.

Study Area
Beijing, the capital of China, is located in the North China Plain. The south and east parts of the city are plains, whereas the remaining areas are surrounded by mountains. Beijing has an administrative area of 16,808 km², which includes a mix of urban and rural land uses. In China, urban and rural land use types are also referred to as built-up (constructed or developed) regions and non-built-up regions, respectively. The area of the built-up regions is approximately equivalent to the total impervious surface area of the entire region (Figure 1).

Data Collection
The administrative area of Beijing is covered by two Landsat scenes (path/row: 123/32 and 123/33). In total, 14 Landsat 8 Operational Land Imager (OLI) images acquired in 2013 were obtained from the U.S. Geological Survey (http://earthexplorer.usgs.gov/) as our primary data source. These images were selected because of their low cloud cover (<10%). Using multiple good-quality Landsat images from 2013 reduced the impact of cloud contamination, vegetation phenology and cropland rotation. In addition, a seasonal land cover dynamics series generated by Wang et al. [37] was used to help identify urban parcels.
Data on the road networks of Beijing were collected from Open Street Map (OSM) (https://www.openstreetmap.org), a provider of free open geographical data. The data are in vector format and contain different classes of streets organized by street level and size. Street levels, in descending order, correspond to primary highways, primary roads, secondary roads, and small roads (i.e., local, neighborhood and rural streets).
In addition, more than 30 million POIs in Beijing were gathered from http://www.datatang.com/data/44484. These points were acquired through a location-based service of a popular social network in China, SINA WEIBO (a social network similar to Twitter). Each point contains the functional and locational properties of a site, which were automatically recognized by mainstream map suppliers (i.e., Google or Baidu maps) [38]. The initial twenty POI types were aggregated into 10 general categories: residential, marketing and recreation, service building, hotel and restaurant, industrial, medical, educational, institutional infrastructure, government and social organization, and transportation land (see Table S1). POIs that did not belong to any of these groups were removed. The quality of the POI data was verified by manually checking 100 randomly sampled sites, which yielded an accuracy of 97%. Although spurious social data may occur, the overall spatial pattern (or distribution) can be reflected accurately given the very large number of points.
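The aggregation and filtering step can be illustrated as a simple lookup. This is a minimal Python sketch, not the authors' code: the ten general category names come from the text, but the raw POI type strings and the `RAW_TO_GENERAL` mapping are hypothetical placeholders for the actual twenty-type mapping given in Table S1.

```python
# Ten general POI categories, as listed in the text.
GENERAL_CATEGORIES = (
    "residential", "marketing and recreation", "service building",
    "hotel and restaurant", "industrial", "medical", "educational",
    "institutional infrastructure", "government and social organization",
    "transportation land",
)

# HYPOTHETICAL raw-type -> general-category mapping; the real mapping of the
# twenty original POI types is given in Table S1 and is not reproduced here.
RAW_TO_GENERAL = {
    "apartment": "residential",
    "shopping mall": "marketing and recreation",
    "office building": "service building",
    "restaurant": "hotel and restaurant",
    "factory": "industrial",
    "hospital": "medical",
    "university": "educational",
    "bus station": "transportation land",
}


def aggregate_pois(pois):
    """Relabel (x, y, raw_type) POIs with their general category.

    POIs whose raw type has no mapping are dropped, mirroring the removal of
    POIs that do not belong to any of the ten groups.
    """
    kept = []
    for x, y, raw_type in pois:
        general = RAW_TO_GENERAL.get(raw_type)
        if general is not None:
            kept.append((x, y, general))
    return kept


# Sanity check: every mapped category must be one of the ten general ones.
assert set(RAW_TO_GENERAL.values()) <= set(GENERAL_CATEGORIES)

if __name__ == "__main__":
    sample = [(0.0, 0.0, "hospital"), (10.0, 5.0, "unknown kiosk")]
    print(aggregate_pois(sample))  # [(0.0, 0.0, 'medical')]
```

Keeping the mapping as data rather than code makes it easy to swap in the real Table S1 correspondence.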

Method
The overall structure of the protocol is shown in Figure 2. First, the entire study area was segmented into parcels based on road networks, following the methods developed by Long and Liu [38] (Figure 2-1). Parcels are the basic units of this classification scheme, under the assumption that each parcel is homogeneous in terms of urban function [39]. The parcels were then separated into built-up and non-built-up areas based on classified impervious surface areas [37], and our classification system was defined for these two regions (Figure 2-2). The function of each built-up parcel was inferred from its normalized feature distance (or similarity) to pre-collected training sample units. The similarity was based on 10 socioeconomic features (i.e., residential, marketing and recreation, service building, hotel and restaurant, industrial, medical, educational, institutional infrastructure, government and social organization, and transportation land), derived from the normalized kernel densities of the corresponding POI functions, and on two physical indices derived from multi-temporal Landsat images (Figure 2-3). The land use types of the non-built-up parcels were determined by their dominant land cover types. Finally, the classified built-up and non-built-up regions were merged into a final land use map of the city, and its accuracy was assessed (Figure 2-4).

Parcel Generation
A parcel is the basic unit carrying socioeconomic functions in urban management and urban planning, and it is relatively homogeneous in terms of land use function [40]. Long and Liu [38] suggested that parcels are polygons bounded by road networks, which serve as natural segmentation boundaries of urban areas. In this study, we adopted this assumption and defined parcels using OSM road network data. To remove unnecessary details (e.g., overpasses), roads shorter than 500 m were first trimmed, and both ends of each road were extended by 100 m to connect originally disconnected lines into road segments. As a result, small holes were filled and some incomplete roads were removed. Then, road spaces were generated by dilating the road networks, with reference to MoHURD (Ministry of Housing and Urban-Rural Development) [41] and the actual road conditions in Beijing. Different road widths were used for different road levels, i.e., 55 m, 50 m, 40 m, and 35 m for the first, second, third and fourth levels, respectively. Finally, land parcels bounded by roads were generated by removing the road spaces (see Figure 3). Parcel generation yielded 17,692 parcels in the built-up area and 11,072 parcels in the non-built-up area.
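The dilate-and-subtract idea can be illustrated on a raster grid. The Python sketch below is a simplification under stated assumptions, not the authors' vector workflow: roads are given as grid cells tagged with a road level, each road cell is dilated by half of the road width listed above (assuming those widths are total widths), and the remaining 4-connected groups of non-road cells are labeled as parcels. The trimming and 100 m extension steps are not modeled, and the grid layout and cell size are hypothetical.

```python
from collections import deque

# Road widths (m) per road level, taken from the text; used here as total widths.
ROAD_WIDTH_M = {1: 55.0, 2: 50.0, 3: 40.0, 4: 35.0}


def road_space(road_cells, shape, cell_size_m):
    """Dilate road cells into road space.

    road_cells: dict mapping (row, col) -> road level (1-4).
    Each road cell is expanded to all grid cells whose centres lie within
    half the road width (assumption: width is split evenly on both sides).
    """
    rows, cols = shape
    covered = set()
    for (r, c), level in road_cells.items():
        radius = ROAD_WIDTH_M[level] / 2.0
        k = int(radius // cell_size_m)  # search window half-size in cells
        for dr in range(-k, k + 1):
            for dc in range(-k, k + 1):
                rr, cc = r + dr, c + dc
                within = (dr * dr + dc * dc) * cell_size_m ** 2 <= radius ** 2
                if within and 0 <= rr < rows and 0 <= cc < cols:
                    covered.add((rr, cc))
    return covered


def label_parcels(shape, blocked):
    """Label 4-connected groups of non-road cells; returns {cell: parcel_id}."""
    rows, cols = shape
    labels = {}
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in blocked or (r, c) in labels:
                continue
            # Breadth-first flood fill from an unlabeled, non-road seed cell.
            labels[(r, c)] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in blocked and (nr, nc) not in labels):
                        labels[(nr, nc)] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return labels


if __name__ == "__main__":
    # Hypothetical 21 x 21 grid of 10 m cells with one first-level road along row 10.
    shape = (21, 21)
    roads = {(10, c): 1 for c in range(21)}
    parcels = label_parcels(shape, road_space(roads, shape, 10.0))
    print(len(set(parcels.values())))  # 2 parcels, one on each side of the road
```

A raster version trades geometric precision for simplicity; with vector OSM data, the same logic corresponds to buffering road lines and taking the polygons left after subtracting the buffers.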

Classification System
MoHURD defined built-up areas as places dominated by artificial buildings and structures, and non-built-up areas as places mainly occupied by cultivated land, forests, grassland, water, and water conservancy facilities [42][43][44]. First, the proportion of impervious area in each parcel was computed from the classified impervious surface map of the land cover product [37]. As suggested by [45,46], a threshold of 0.3 was used to differentiate built-up from non-built-up areas: if the proportion of impervious surface in a parcel was higher than the threshold, the parcel was designated as built-up; otherwise, it was labeled as non-built-up.
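The threshold rule can be written as a short Python sketch. It assumes, hypothetically, that the impervious-surface pixels falling inside a parcel are available as a list of booleans; extracting them from the land cover product [37] is not shown.

```python
IMPERVIOUS_THRESHOLD = 0.3  # threshold from the text, following [45,46]


def impervious_fraction(pixels):
    """Share of impervious pixels (True) among the pixels inside a parcel."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("parcel contains no pixels")
    return sum(pixels) / len(pixels)


def is_built_up(pixels, threshold=IMPERVIOUS_THRESHOLD):
    """Built-up if the impervious proportion is strictly higher than the threshold."""
    return impervious_fraction(pixels) > threshold


if __name__ == "__main__":
    parcel_a = [True] * 4 + [False] * 6   # 40% impervious
    parcel_b = [True] * 2 + [False] * 8   # 20% impervious
    print(is_built_up(parcel_a), is_built_up(parcel_b))  # True False
```

Note that a parcel exactly at 0.3 is labeled non-built-up, matching the "higher than the threshold" wording.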
For the non-built-up area, the land cover classification system developed by Gong et al. [47] was adopted; its description is shown in Table 1. Following the Chinese land use classification criteria (GB/T21010-2007), the built-up areas were divided into four Level I classes and nine Level II classes (see Table 1).
For land use mapping in built-up areas, training parcels for the Level II classes were selected by referring to high-resolution images on Google Earth and Baidu Map (http://map.baidu.com/) [47]. The selected training samples were the most typical representatives of the Level II classes. For instance, the cottage class is usually distributed in suburban areas and has a higher housing intensity than other classes. The community type features a protective gate and some green vegetation. The industrial class contains areas for manufacturing, assembly, and fabrication operations, often at a relatively large scale but with lower intensity [23]. For the commercial class, the floor area ratio and building density are higher than for industrial and residential uses. A typical characteristic of institutional use is that its parcels usually have lower housing intensity than residential or commercial parcels [19]. Institutional use is designated for specific public services, and such sites are adequately sized for their use and highly accessible by road. Based on these basic features, as well as a field survey, 120 training sample units were collected at the parcel level, all of which we are confident are representative. Table 2 lists the number of training parcels adopted in the built-up regions. Figure 4 shows some examples of the collected training sample units.

Processing POIs
Within the spatial extent of a parcel, there may be a variety of POIs of different types, so a parcel can be regarded as having compound functions instead of a single function [40]. In addition, the quantities of POIs vary among categories; e.g., the number of POIs of the commercial type is greater than that of the other types. This results in an unbalanced distribution of point counts among POI types. To cope with these issues, we normalized the functional intensity of the different POI types using kernel density estimation. Kernel density analysis was implemented using the quadratic kernel function [48] with a search radius of 500 m. The output is a smooth density surface, in which regions with higher values contain more POI points. This processing can mitigate possible errors caused by the unbalanced quantities of the different POI types. Figure 5 shows the density images for the different functional POIs (Table S1) adopted in this study. To facilitate comparison among the functional POI types, a normalization procedure was first applied to remove the differences in POI quantity among the function types (Equation (1)):
$$V_{norm} = \frac{V - V_{min}}{V_{max} - V_{min}} \qquad (1)$$
where $V$ is the original value of the POI density map, $V_{norm}$ is the normalized value, and $V_{min}$ and $V_{max}$ are the minimum and maximum values in the POI density maps, respectively.
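The kernel density and normalization steps can be sketched in plain Python. The kernel form is an assumption: the sketch uses the Silverman-style kernel (3/pi)(1 - (d/r)^2)^2 / r^2 that GIS kernel-density tools commonly describe as quadratic (or quartic), with r = 500 m as in the text. It also assumes that Equation (1) is applied to each POI density map separately. The grid coordinates are hypothetical projected metres.

```python
import math


def kernel_density(points, x, y, radius=500.0):
    """Density at (x, y) from POIs using a quartic-shaped ("quadratic") kernel.

    Each point within `radius` contributes (3/pi) * (1 - (d/r)^2)^2; the sum is
    divided by r^2 so that each point's contribution integrates to 1.
    """
    r2 = radius * radius
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 < r2:
            total += (3.0 / math.pi) * (1.0 - d2 / r2) ** 2
    return total / r2


def density_map(points, xs, ys, radius=500.0):
    """Evaluate the density at cell-centre coordinates (one row per y value)."""
    return [[kernel_density(points, x, y, radius) for x in xs] for y in ys]


def min_max_normalize(grid):
    """Equation (1): V_norm = (V - V_min) / (V_max - V_min) for one density map."""
    flat = [v for row in grid for v in row]
    v_min, v_max = min(flat), max(flat)
    if v_max == v_min:  # constant map: avoid division by zero
        return [[0.0 for _ in row] for row in grid]
    return [[(v - v_min) / (v_max - v_min) for v in row] for row in grid]


if __name__ == "__main__":
    pois = [(0.0, 0.0), (100.0, 0.0)]  # hypothetical POI coordinates in metres
    grid = density_map(pois, xs=[0.0, 250.0, 600.0], ys=[0.0])
    print(min_max_normalize(grid))
```

Because every normalized map spans 0 to 1, a category with millions of points (such as commercial POIs) no longer dominates the feature distance simply by its count.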

Retrieving Physical Features
To measure the quantity of green vegetation and buildings in each parcel, the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Built-up Index (NDBI) were derived from the Landsat images. NDVI is calculated using Equation (2):
$$NDVI = \frac{NIR - Red}{NIR + Red} \qquad (2)$$
where $NIR$ and $Red$ are the near-infrared (OLI band 5) and red (OLI band 4) reflectances. Several Landsat OLI images of acceptable quality from 2013 were used to remove the impact of cloud contamination, vegetation phenology, and cropland rotation [49]. The multi-temporal NDVI values were combined into a maximum NDVI composite ($NDVI_{max}$) for 2013 (Equation (3)):
$$NDVI_{max} = \max\{NDVI_1, NDVI_2, \ldots, NDVI_n\} \qquad (3)$$
where $NDVI_1, NDVI_2, \ldots, NDVI_n$ are the multi-temporal NDVI values in 2013. Similarly, NDBI is calculated using Equation (4):
$$NDBI = \frac{MIR - NIR}{MIR + NIR} \qquad (4)$$
where $MIR$ is OLI band 6. The index is useful for detecting impervious surfaces [51]. Because NDBI is relatively stable within a year, it was calculated from a single image acquired on 3 October 2013, which had low cloud coverage.
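The NDVI, maximum-composite and NDBI computations, plus the parcel-level averaging used later, can be sketched as follows. This Python sketch assumes the OLI bands have already been read as reflectance grids (nested lists of equal shape); reading the scenes from file, cloud masking and the choice of dates for the composite are not shown. Returning 0 where the denominator is zero is an assumption of the sketch.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    denom = a + b
    return 0.0 if denom == 0 else (a - b) / denom


def ndvi_grid(nir, red):
    """NDVI from NIR (OLI band 5) and red (OLI band 4) reflectance grids."""
    return [[normalized_difference(n, r) for n, r in zip(n_row, r_row)]
            for n_row, r_row in zip(nir, red)]


def ndbi_grid(mir, nir):
    """NDBI from MIR (OLI band 6) and NIR (OLI band 5) reflectance grids."""
    return [[normalized_difference(m, n) for m, n in zip(m_row, n_row)]
            for m_row, n_row in zip(mir, nir)]


def max_composite(ndvi_stack):
    """Per-pixel maximum over a list of same-shaped NDVI grids (one per date)."""
    return [[max(values) for values in zip(*rows)] for rows in zip(*ndvi_stack)]


def parcel_mean(grid, cells):
    """Parcel value: mean of all pixels (row, col) inside the parcel."""
    values = [grid[r][c] for r, c in cells]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 1 x 2 reflectance grids for two acquisition dates.
    red_dates = [[[0.10, 0.08]], [[0.06, 0.09]]]
    nir_dates = [[[0.30, 0.20]], [[0.40, 0.21]]]
    ndvi_max = max_composite([ndvi_grid(n, r) for n, r in zip(nir_dates, red_dates)])
    print(parcel_mean(ndvi_max, [(0, 0), (0, 1)]))
```

Taking the per-pixel maximum keeps each pixel's greenest observation, which is why cloudy or leaf-off dates have little influence on the composite.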

Determination of Parcel-Based Land Use
Using the training parcels collected according to the land use classification system given in Table 1, a normalized feature distance (similarity) was computed on a parcel-by-parcel basis in built-up regions (Equation (5)). The features used for calculating the similarity index include 10 POI density images, one NDVI band and one NDBI band. Two statistical parameters (i.e., the mean value and standard deviation) of each feature were first estimated from the collected training samples. Then, the similarity index of each parcel was computed with respect to each land use type [52,53].
where $\bar{x}_i$ and $\sigma_i$ are the mean and standard deviation of the pre-defined land use type $i$, obtained from the training parcels; $m$ is the total number of land use classes; and $x_j$ is the parcel value (i.e., the mean of all pixels within the parcel) for feature $j$ in the normalized POI density images, NDVI or NDBI. A smaller feature distance indicates higher similarity to the corresponding land use type. The urban land use type of a parcel was determined by calculating its similarity $S_i$ to the training samples of each type, and the pre-defined land use type with the minimum $S_i$ was assigned to the parcel. For parcels in non-built-up lands, the land cover map was used to determine land use. Because a land parcel commonly contains multiple land cover types, the dominant land cover type was identified and its land use function was assigned to the parcel. Finally, the classified built-up and non-built-up areas were combined to form the detailed land use map for the entire city.
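Because the exact form of Equation (5) is not reproduced in this text, the Python sketch below uses an assumed normalized feature distance: the mean, over all features, of the absolute difference between the parcel value and the class mean, divided by the class standard deviation. It illustrates the minimum-distance assignment and the dominant-land-cover rule, not the authors' exact formula. Feature vectors have two entries here for brevity (the protocol uses 12: ten POI densities, NDVI and NDBI), and all class names and values are hypothetical.

```python
import statistics
from collections import Counter


def train_class_stats(training):
    """Per-class, per-feature mean and population standard deviation.

    training: {class_name: [feature_vector, ...]} from the training parcels.
    """
    stats = {}
    for name, vectors in training.items():
        columns = list(zip(*vectors))
        means = [statistics.fmean(col) for col in columns]
        stds = [statistics.pstdev(col) for col in columns]
        stats[name] = (means, stds)
    return stats


def feature_distance(x, means, stds, eps=1e-9):
    """ASSUMED distance: mean absolute standardized difference over features.

    eps guards against division by zero when a class has no spread in a feature.
    """
    terms = [abs(xj - mj) / (sj + eps) for xj, mj, sj in zip(x, means, stds)]
    return sum(terms) / len(terms)


def classify_built_up(x, stats):
    """Assign the class with the minimum distance; also return all distances."""
    distances = {name: feature_distance(x, m, s) for name, (m, s) in stats.items()}
    return min(distances, key=distances.get), distances


def classify_non_built_up(land_cover_pixels):
    """Land use of a non-built-up parcel = its most frequent land cover type."""
    return Counter(land_cover_pixels).most_common(1)[0][0]


if __name__ == "__main__":
    training = {
        "residential": [[0.8, 0.2], [0.6, 0.4]],
        "industrial": [[0.1, 0.9], [0.3, 0.7]],
    }
    label, _ = classify_built_up([0.75, 0.25], train_class_stats(training))
    print(label)  # residential
    print(classify_non_built_up(["cropland", "cropland", "forest"]))  # cropland
```

Dividing by each class's standard deviation lets tightly clustered classes penalize deviations more strongly than classes whose training parcels vary widely.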

Accuracy Assessment and Uncertainty
To assess the performance of the land use classification, a random sampling scheme was adopted to collect a test sample set over the study area [54]. All test parcels were surveyed by a field crew, giving a relatively high level of confidence in their labels. In total, 269 test parcels were collected, 180 in built-up regions and 89 in non-built-up areas. Confusion matrices were then built for Level I and Level II.
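The overall accuracy, kappa coefficient and producer's accuracies reported in the results can be computed from a confusion matrix as in the Python sketch below. It assumes that rows hold the reference (field-surveyed) classes and columns the mapped classes; the matrix in the usage example is hypothetical.

```python
def accuracy_metrics(matrix):
    """Overall accuracy, Cohen's kappa and producer's accuracies.

    matrix[i][j] = number of test parcels of reference class i mapped as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # mapped totals
    observed = sum(matrix[i][i] for i in range(k)) / n  # overall accuracy
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    # Producer's accuracy: correctly mapped parcels / reference parcels of that class.
    producers = [matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
                 for i in range(k)]
    return observed, kappa, producers


if __name__ == "__main__":
    oa, kappa, pa = accuracy_metrics([[45, 5], [5, 45]])
    print(round(oa, 2), round(kappa, 2), pa)  # 0.9 0.8 [0.9, 0.9]
```

Kappa discounts the agreement expected by chance from the marginal totals, which is why it is lower than the overall accuracy for the same matrix.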
To assess the uncertainty of the resulting urban land use map, the standard deviation of the similarities among all land use types was computed for each parcel and used as an approximate indicator. Low standard deviation values suggested relatively little difference in similarity among the land use types, which meant that the identified land use was more uncertain. High standard deviation values indicated considerable variation in similarity among land use types, which tended to make the assigned land use type more credible, because the minimum distance differed considerably from the others. This approach qualitatively captured the overall pattern of the uncertainty distribution.
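A minimal Python sketch of this indicator, assuming each parcel's similarities to all land use types are available as a dictionary (for example, the distances produced by a minimum-distance classifier); the parcel values are hypothetical. The population standard deviation is used here; the text does not state whether the population or sample form was used.

```python
import statistics


def similarity_spread(similarities):
    """Standard deviation of a parcel's similarities across all land use types.

    A larger spread means the best-matching type stands out more clearly,
    i.e. the assigned land use is more credible.
    """
    return statistics.pstdev(list(similarities.values()))


def rank_by_confidence(parcels):
    """Return parcel ids ordered from most to least confident (largest spread first)."""
    return sorted(parcels, key=lambda pid: similarity_spread(parcels[pid]), reverse=True)


if __name__ == "__main__":
    parcels = {
        "p1": {"residential": 0.4, "commercial": 3.1, "industrial": 5.2},  # clear winner
        "p2": {"residential": 1.9, "commercial": 2.0, "industrial": 2.1},  # ambiguous
    }
    print(rank_by_confidence(parcels))  # ['p1', 'p2']
```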

Performance of the Land Use Map in Beijing
The generated land use map of Beijing is shown in Figure 6 (A and C show the overall and local views, respectively). The map contains eight Level I classes and 16 Level II classes, and the corresponding confusion matrices are given in Tables 3 and 4, respectively. We obtained an overall accuracy (OA) of 81.04% and a kappa coefficient of 0.78 for the eight Level I classes (Table 3). For the Level II classes (Table 4), the OA and kappa coefficient were 69.89% and 0.68, respectively. At Level I, the residential class had the highest producer's accuracy (85%) in built-up areas, whereas the institutional class had a relatively low value of 66%. Among the Level II classes, community (74%) and industrial areas (74%) had the highest producer's accuracies, and administrative departments (33%) had the lowest.
We also compared our result with the only land use map available to the general public, released by the Beijing Municipal Bureau of Land and Resources (http://www.bjgtj.gov.cn/art/2010/1/8/art_2340_90537.html) (Figure 6B,C). The official map represents the year 2005 at a scale of 1:100,000 and only shows the distribution of built-up and non-built-up areas (including agricultural areas, water bodies and green space). The land use map generated in this study revealed much more information on urban land use patterns in central Beijing.

The Land Use Pattern of Beijing
Given that the spatial structure of Beijing follows an approximately radial concentric pattern defined by ring roads [55], we calculated the percentages of the Level I land use classes within the zones delimited by the ring roads (Table 5). The built-up areas were mainly distributed inside the 5th ring road, and the largest class was residential land use, which accounted for more than 40% of the built-up areas. More than 80% of the commercial land use was located inside the 4th ring road. In contrast, the percentage of industrial land use was highest between the 5th and 6th ring roads. In the non-built-up areas, agricultural land use was mainly distributed outside the 5th ring road and accounted for 67.61% of the total area. Green space was the largest land use class outside the 6th ring road, at 67.94%. According to Figure 6 and Table 5, the urban core area mainly provided commercial and political functions. Groups of commercial and institutional parcels were distributed inside the 4th ring road and helped to form a multi-nuclei development pattern outside the urban core area. Residential areas were the most widely distributed of all land use types, as a result of the large-scale development of residential blocks to meet high housing demand since the early 1980s [49]. Industrial areas were mainly concentrated at the urban fringe, probably because of less stringent environmental standards and the availability of cheap land there.

Uncertainty of the Mapped Result
The standard deviation of similarity for each parcel was used to assess the confidence level of the identified land use type (Figure 7). The standard deviations were relatively high inside the 4th ring road and gradually decreased from the center outward. We were therefore more confident in the land use types identified in the central part of the city, owing to the relatively small parcel areas and the higher abundance of POIs in this region.

Discussion
In this study, we used both medium-resolution satellite remote sensing data and open social data to identify detailed land use classes in urban areas. The overall pattern of urban land use distribution is well reflected. Physical features (i.e., NDVI and NDBI) derived from remote sensing data describe the biophysical elements of urban areas, whereas open social data indicate the human activities of a place, especially inside the built-up area. Using both data sources is beneficial for identifying land use types in urban areas. We also tested the sensitivity of the features adopted in the similarity assessment. Using only the POI features or only the two biophysical features derived from the Landsat images (i.e., NDVI and NDBI), the accuracies for the urban land parcels (Level I) in the built-up region were 41% and 31%, respectively, whereas combining them raised the accuracy to 75%. Moreover, some function types that were not identified in previous studies [28,39], such as service buildings, medical and public places, have been classified in our result. Hence, using both types of features at the parcel level produced more detailed and accurate land use maps than studies using single-source data [23,28].
Remote sensing has advantages in mapping land cover dynamics over large areas. Combined with open social data, it is promising for mapping detailed urban land use over a large metropolitan area, which is an advantage over methods that focus only on discerning a specific land use type in a relatively small area [19,27]. Our approach is a prototype for this purpose. The impervious surface layer of the land cover map separated the urban area into built-up and non-built-up regions, so different classification strategies were used according to the distinctive characteristics of the two parts. Our experimental results indicate that both biophysical attributes and socioeconomic features are crucial for identifying urban land use types. For instance, community parcels are usually built with some vegetation cover for eco-friendly human habitation. In the non-built-up region, land use function can be identified from the dominant land cover types, because social data play only a minor role in this area.
Although our approach can generate a detailed land use map over a large area, some limitations should be considered. First, parcels with mixed functional types are easily confused; the accuracy of the institutional type in Table 3 is an example. Furthermore, the accuracies for most Level II urban land use categories are relatively low; for example, the producer's accuracies for service buildings, medical lands and administrative departments are 57%, 44% and 33%, respectively. Possible reasons are that (1) parcels generated from road networks performed well in the urban core but poorly in rural areas, where road networks become sparse [40], and (2) the POIs and their densities do not contain adequate detail to differentiate urban land uses, particularly for Level II classes. Therefore, based on the uncertainty analysis, the limited representativeness of parcels and the lower POI densities [16] in rural areas may lead to a decline in accuracy.
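Producer's accuracies such as those quoted above are read from a confusion matrix. The following sketch shows how overall, producer's and user's accuracies are derived from such a matrix, assuming rows hold the classified labels and columns the reference labels. The three-class matrix is made up for illustration and does not reproduce Table 3 or Table 4.

```python
def accuracy_report(matrix, labels):
    """Overall, producer's and user's accuracy from a square confusion matrix.

    Assumed layout: matrix[i][j] counts validation parcels classified as
    labels[i] whose reference (ground-truth) class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(n))
    row_totals = [sum(matrix[i]) for i in range(n)]                       # per classified class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # per reference class
    nan = float("nan")
    # Producer's accuracy: share of reference parcels of a class that were labelled correctly
    producers = {labels[j]: matrix[j][j] / col_totals[j] if col_totals[j] else nan for j in range(n)}
    # User's accuracy: share of parcels labelled as a class that truly belong to it
    users = {labels[i]: matrix[i][i] / row_totals[i] if row_totals[i] else nan for i in range(n)}
    return correct / total, producers, users


# Toy 3-class matrix (made-up counts, not the study's results)
labels = ["residential", "commercial", "institutional"]
matrix = [
    [40, 5, 8],
    [4, 30, 6],
    [6, 5, 6],
]
overall, producers, users = accuracy_report(matrix, labels)
print(overall, producers, users)
```

For this toy matrix the overall accuracy is 76/110 (about 0.69), while the institutional class has a producer's accuracy of 6/20 = 0.30 and a user's accuracy of 6/17 (about 0.35).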

Conclusions
High-resolution land use maps are needed for academic research and urban management. However, complex and heterogeneous urban landscapes pose challenges to land use mapping. Combining physical features (i.e., spectral information) derived from remotely sensed data with social attributes (e.g., socioeconomic functions) derived from open social data can help delineate detailed land use patterns in urban areas. We developed an approach that combines the strengths of these two types of data to identify land use types quickly over a large area, and we tested its effectiveness in Beijing. Using both biophysical and socioeconomic features resulted in a land use map with higher accuracy and more detail. The overall accuracy of the land use map for this highly heterogeneous urban area reached 81.04% and 69.89% for the Level I and Level II categories, respectively. This approach can be applied to derive land use maps of large areas in a relatively short time wherever satellite data and open social data are available, especially for fast-growing urban areas in developing countries.
Further efforts could improve the approach. First, combining road networks derived from open social data with segmentation derived from biophysical features (e.g., Landsat bands) may help generate more detailed parcels, especially in suburban areas where road networks are relatively sparse. Second, the shapes and sizes of parcels could be considered when assigning possible land use types (e.g., parcels in industrial areas are relatively large, whereas parcels in residential areas are small). Third, non-linear approaches, such as neural networks, could be used to build the relationships between physical and socioeconomic features and training samples [56,57] if enough training parcels have been collected in advance.
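As a sketch of the second suggestion, the code below derives parcel size and shape descriptors that could be appended to a parcel's feature vector. The shoelace formula gives the area, and the Polsby-Popper ratio 4πA/P² is used as one possible compactness measure; this metric is an assumption, since the study does not specify one. Vertices are assumed to be in a projected coordinate system in metres.

```python
import math


def polygon_area_perimeter(coords):
    """Area (shoelace formula) and perimeter of a simple polygon.

    coords: list of (x, y) vertices in a projected CRS (metres), in order,
    without repeating the first vertex at the end.
    """
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shape_metrics(coords):
    """Size and compactness descriptors for one parcel (hypothetical feature names)."""
    area, perimeter = polygon_area_perimeter(coords)
    # Polsby-Popper compactness: 1 for a circle, smaller for elongated parcels
    compactness = 4 * math.pi * area / perimeter ** 2 if perimeter else 0.0
    return {"area_m2": area, "perimeter_m": perimeter, "compactness": compactness}


square = [(0, 0), (100, 0), (100, 100), (0, 100)]
strip = [(0, 0), (400, 0), (400, 25), (0, 25)]
print(shape_metrics(square))
print(shape_metrics(strip))
```

Both example parcels cover 10,000 m², but the square has a compactness of π/4 (about 0.79) while the 400 m by 25 m strip scores about 0.17, so shape carries information that area alone does not.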

Figure 1. Map of the study area. The white line shows the boundary of the administrative area of Beijing, China. The built-up areas are at the center of the satellite image and are indicated by a red boundary.

Figure 2. The proposed flowchart for mapping detailed land use.

Figure 3. Distribution of urban land use parcels. (A) Overall pattern; (B) and (C) are zoomed-in views (red frame) of the original road networks and the segmented land parcels.

Figure 4. Training samples for nine subclasses of land use in the built-up region: (A) cottage; (B) community; (C) retail place; (D) service building; (E) industrial land; (F) medical place; (G) educational/research place; (H) administrative office; (I) public service.

Figure 5. Normalized kernel density maps of Points of Interest (POI) data: (A) residential; (B) marketing and recreation; (C) service building; (D) hotel and restaurant; (E) industrial; (F) medical; (G) educational; (H) institutional infrastructure; (I) government and social organization; and (J) transportation land.

Figure 6. Detailed land use map of the Beijing area in 2013. (A) Overview map of Level I land use; (B) zoomed-in view of the official land use map of Beijing; (C) detailed map of Level II land use types for the same extent as (B).

Table 3. Confusion matrix of the classification of Level I urban land use types.

Figure 7. Map of the standard deviation of the similarities of parcels. The 2nd to 5th ring roads are shown as orange lines.

Table 1. Land use classification system.

Table 2. Number of collected training parcels.

Table 4. Confusion matrix of the classification of Level II urban land use types.

Table 5. Percentages of different land use types among the ring roads.