1. Introduction
Urban land use mapping is of great importance for urban structure optimization, resource allocation, and development planning [
1]. The rapid economic and urban developments in China have generated diverse and sophisticated urban functional zones, which are reflected in urban land use patterns [
2]. Therefore, the effective detection and mapping of urban land use patterns are significant for formulating effective urban planning policies, and need to be resolved immediately.
Numerous approaches have been developed to monitor urban land use over long time periods. Traditional field investigations and interview questionnaires can produce land-use maps, but they are costly and time consuming [
3]. Moreover, the rapid development of cities makes the field investigation out of date of the actual land use types. In recent years, many studies have proved the advantage of high spatial resolution (HSR) remote sensing images in land use/cover classification and analysis [
4,
5,
6], and spectral, textural, geometric and spatial features are frequently extracted from remote sensing HSR images to improve classification accuracies. Li et al. [
7] computed geometrical, morphological, and contextual attributes of different buildings using HSR images, and used these features to distinguish different building functions. Moreover, urban land use types were classified around the coastal zone by extracting landscape pattern indicators from HSR images [
8].
Remote sensing techniques perform excellently in extracting physical characteristics, such as the reflectance of land surface and the texture of urban space. However, it is difficult to identify functional interaction patterns or to understand socioeconomic environments. Rich social sensing techniques can help bridge the semantic gap between land cover type and urban functions [
9]. Hence, multisource social media data have been introduced to monitor residential activities and urban land use dynamics. Many studies suggest that point of interests (POIs) have great potential to reveal urban land use patterns [
2,
10]. Bus smart card data were introduced for mapping functional areas for the first time [
11]. Yu et al. [
12] attempted to examine the reversed linkage by revealing urban land use variations from taxi trajectory data. Mobile phone positioning data was used to understand urban functions and diurnal patterns with remarkable results [
13]. Weibo records can represent the daily activities of residents, due to their spatiotemporal characters, and have been demonstrated to be effective in describing urban function types [
14]. As the largest social media platform in China, real-time Tencent user density (RTUD) contains the hourly numbers of smartphone users who use Tencent applications and provides location-based services. RTUD holds great potential in the classification of urban land use types, especially in the regions with typical spatial-temporal features such as commercial and residential areas. RTUD data have been widely applied in functional zoning [
2].
Many studies have demonstrated the effectiveness of fusing remote sensing data with multisource data in functional mapping [
14,
15,
16]. The threshold method was used to combine Landsat Images and open social data to generate land-use types [
16]. Hierarchical clustering was utilized to analyze mobile data and HSR images [
17]. The random forest (RF) model was applied to fuse POI and Gaofen-2 (Chinese satellite, GF2) image features in order to obtain functional results [
14]. The Bayesian model was introduced to link hierarchical semantic cognition (HSC) images with road networks to obtain functional results [
18]. Although the above-mentioned classification methods are broadly available for data fusion and classification, it is still a challenge to accurately represent the characteristics of different data sources because of the gap between visions and cognitions [
19]. To solve this problem, Zhang et al. [
18,
20] built a hierarchical semantic cognition (HSC) model to bridge this semantic gap and used this for functional area mapping. Then, the latent Dirichlet allocation (LDA) model was introduced in land use mapping [
2,
21]. This model is skilled at classifying land-use with HSC images and multi-source data and has been proven as the best semantic model at present [
22].
Previous studies mainly focus on the selection of model and the introduction of new data, but ignore the choice of basic unit. The basic unit is an essential precursor to the data fusion and is fundamental to the entire classification. How to partition a city into small units concurrently using both social media data and remote sensing images is still a challenge, and the most appropriate segmentation still needs to be discussed in the case of combining remote sensing and social sensing data.
The meaningful spatial unit in the overall urban analysis is a building community [
23], which indicates that all the analyzed urban characteristics must be reflected in the building community scale. Here, a building community refers to a housing community surrounded by roads which is represented a traffic analysis zone (TAZ). Because of its clear functional significance in urban analysis [
24], the building community is currently commonly used in many studies [
10,
22,
25]. However, this fine-grained unit is not sufficient for Chinese cities because most scenes are mixed and composed of many categories [
26] and scene classification methods cannot satisfy the demands of practical applications. As can be seen in
Figure 1, many TAZs of Shenzhen have more than one land use type. For convenience and detailed analysis of mixed functional areas, an accurate, easier to divide scale is required. Therefore, grid scale units are used in some studies [
12,
17], because they can obtain higher classification accuracy and are easy to be generated. Building level units are also used for accurate urban functional area mapping [
27,
28]. However, it is difficult to obtain an accurate and up to date building footprint. In some studies, multi-scale segmentation methods are used to obtain basic units [
18,
29], in particular, remote sensing images are used as primary data. These above-mentioned patches coincide with the patterns of reality, which make objects meaningful [
30]. When only social media data are used, the road network is used to partition urban regions into basic units, while the multi-scale segmentation method is only used for remote sensing images [
29]. At present, some studies have explored the effect of different scale partitions on urban land use mapping. Tu et al. evaluated the scale effect regarding land-use inference by weighting remote sensing and human sensing [
31], Zhang et al. illustrated that classical segmentation could not divide functional scenes well [
29] and Yuan et al. discussed the influence of mixed functional parcels when using TAZs [
14]. However, there is not enough discussion on the comparison of different partitions and the impact on different scenes.
In summary, from the perspective of data sources, there is a general trend of using multisource data, including remote sensing images, POI and other social media data, and from the perspective of machine learning algorithms, urban functional mapping has moved from traditional statistical models to high-level semantically models in order to bridge the gap between low-level features and high-level semantic meanings. Thus, the effectiveness and advantages of using multisource data and high-level models have been widely studied and validated. However, how to properly partition an urban region into small basic functional units is rarely explored. Generally, there are four methods to partition an urban region into small functional units, including the traffic analysis zone, square grid, building footprint and segmented object methods. In previous studies, the researchers have usually chosen only one method; however, the pros and cons of using different methods have still not been fully discussed.
To obtain an accurate urban functional map and analyze the effect of different scales on the results, this study aims to combine remote sensing images and POI data to extract features, extract a final map using the LDA model and determine the comparative analysis of different partitions. Three different methods were compared to partition the study area into small units, and their impacts and advantages on different methods are observed for the first time. Additionally, the different combinations and the importance of different features are fully discussed, respectively, in
Section 4.2 and
Section 4.3.
4. Results
4.1. Impact on the Selection of Different Feature Combinations
To examine the influence of different features, we designed six different cases to classify land use. The details of these six examples are shown in
Table 2. The selection of parameters refers to
Section 4.3.
In the first three cases (A, B and C), the LDA semantic features from the HSR images were utilized to carry out functional land use classification tasks. In the next case (D), LM, POI and PGs were combined and used to test the performances. Then, features without landscape metrics were used in case E. All the features were integrated in case F.
Significant improvements of OA and Kappa could be found in case C, where POI was used. The OAs of the three different partition methods were all significantly improved, which indicates that the POI feature contributed to effective land use mapping. Besides, the results of C and E showed that the accuracies of using all three different spatial units were further improved by using PGs object information, especially for the grid method. Landscape and special object data can distinguish urban functional areas better than using image features according to the result of A and D.
According to the result of case F, the block-based method obtained the highest OA and Kappa coefficient values (OA = 86.59%, Kappa = 83.87%) among these three methods. The results of the grid method showed a good testing accuracy of 80.62% and an overall classification accuracy of 83.85%. The object-based method obtained the lowest performance (OA = 81.62%, Kappa = 77.93%).
The accuracies of different land use types using different spatial units are shown in
Table 3. All urban zones were classified into six categories using a combination of all features. The common point of these three methods is that they all demonstrated good performances in the green areas and urban villages because of the huge difference in green land and urban village areas against other ground components. Traditional block-based and object-based methods had poor performances in the public service areas, but the grid methods performed well, because the physical image features are not enough to distinguish public service areas from other buildings and public service is mixed in with residential and commercial types in the POI dataset. The object-based method performed slightly less well in all classes of urban land-use classification compared with the other methods, but performed well in terms of accuracy in distinguishing the residential type.
Generally, by comparing the different feature combinations, we can observe that no matter what kind of spatial unit is used, the best choice is always to fuse all the features in order for the best classification accuracy to be obtained.
Figure 5 shows the confusion matrixes for the classification results (
Table 3) of each method that are closest to the average accuracy. The spatial unit had a great impact on classification accuracy. According to
Figure 5a, the public service area obtained an inadequate accuracy with block level urban parcels, because OSM is a volunteer based upload platform in which many roads are not accurate, or the networks are incomplete. Therefore, many blocks could not be properly partitioned, and lots of manual operations were needed. Moreover, due to the limited number of blocks in the Futian district, the number of public service samples is inadequate, which results in a large number of deficiencies when training the model.
Figure 5b illustrates that the educational type had a poor performance in the grid method and
Figure 5c shows that the public service parcels were often confused with other parcels.
4.2. Impact on Feature Importance
A total number of 114 features were used for urban land use classification in this study (
Figure 3), including 22 landscape metrics, 73 spectral/textural/SIFT features, 18 Gaode POI features, and 1 special object feature.
Figure 6 shows the rankings of feature importance (Top 30) using three different partitions. The importances of features are quite different among the different partition methods.
For block level parcels, POI features occupied 3/10 of the top ten features and the first three were all POI features. This means that POIs are far more important than other features when using the block partition method. The textural features also enter the top 10 and occupied two positions. Pland_4, Pland_2 and Pland_0 are the ninth, twelfth and thirteenth most important features, which represent the percentage of shadow, natural land and built-up areas in a block, respectively. The SIFT feature has four positions in the top 30. Only two feature vectors of the spectral features ranked in the top 30 most important features, which were ranked as four and 30, respectively.
Using grid level parcels, the gap of importances between different features is not particularly large. The landscape metrics has 13 positions in the list (
Figure 6b). The special object PGs ranked first. The texture takes only one position, which ranked second. None of the SIFT features ranked in the top 30.
Using object level parcels, POI is the most important feature which occupied 7/10 of the top ten and almost all of the POI features are on the list. Landscape metrics occupied seven of the top 30. Spectral features are the least important compared with the other features because they only occupy two of the top 30 positions.
Comparing the ranks of feature importances using the three different methods, it can be observed that the spectral features are less important for identifying parcels than the other features in the block and object methods. SIFT features are insignificant features in the grid method. As expected, the semantic feature of the playground was identified as an important feature no matter what kind of partition was chosen. This demonstrates the importances of the use of specific features in functional area mapping.
As suggested by
Figure 6, POI is the most important feature. Specifically, from the ranking we can deduce that the new extracted feature is a critical attribute for all of the conditions. Landscape metrics, especially at the class-level PLAND, became an important attribute in each partition, because they represented the proportion of land cover types in a single parcel. Spectral and textural attributes are more important for block and object level parcels, but were seldom found to rank in the top 30 features for grid level mapping.
4.3. Impact on Qualitative Performance
Figure 7 presents the detailed results using three partitions in the study area. As shown in the ground-truth map (
Figure 7a) and the land-use classification maps (
Figure 7b–d), the geographical distribution of land use in Futian is complicated. Many residential and commercial parcels were interweaved in the grid level (
Figure 7c) and object level maps (
Figure 7d). The mixed-use parcels, which were located in the southernmost areas, have been separated in the grid and object level maps when they were compared to the ground-truth map. Because of the complex land use types of the central area, mixed functional areas cannot be well represented by mapping. Grid level and object level maps show the complexity present within a block.
By observing these maps, it is clear that urban villages were intensively distributed in the mid-south areas. Natural land was seldom misclassified in all three methods. On the contrary, the number of misclassified public service parcels was largest for all three partitions. This conclusion indicates that natural land can be distinguished well by the extracted features, but public-service areas still need more appropriate features to improve the accuracy.
Figure 7e–h shows some details regarding the CBD area, which is one of the most prosperous areas of Shenzhen. The actual land use property is complicated when compared with the remote sensing images ((
Figure 7e). The misclassified parcels were generally few in number compared with the ground-truth (
Figure 7a) and the predicted results (
Figure 7b), which demonstrated the effectiveness of the results obtained using the block-level parcel method. However, the disadvantage of this partition method is also obvious. Comparison of the remote sensing images (
Figure 7e) and the block-level results (
Figure 7f) demonstrated that the scale of road network is too coarse for function mapping, which caused the appearance of the mixed areas. Moreover, the educational type cannot be easily separated from the other categories, which leads to the poor accuracy of educational land-use types. One of the improvement measures is to use a higher level of road network. This requires a large amount of manual work to correct the original data, which makes reliable mapping impossible if the study area is large. It can be concluded that this method is only available for the test of classification methods and small-scale mapping.
The shortcoming is obvious when comparing
Figure 7c,g with the ground-truth map (
Figure 7e). Although the phenomenon of the mixed functional area has been partially solved, compared with the ground-truth, the land use types at the edge of the grid cannot be distinguished well. Moreover, the accuracy of the gird-method deeply relies on the selection of grid size and the unsightly rough edges cannot be ignored. It is convenient and time-saving to analyze the data using this method if only complete some statistical analyses with the point data, for instance, mobile data [
13] and GPS tracking data [
12]. It is effective in improving model accuracy according to the testing result, but it is still not suitable for the accurate mapping of functional areas.
We can find that the parcels were classified perfectly in
Figure 7d,h, which used object-level parcels as the basic unit. It can be seen from
Figure 7d that mapping quality increased substantially and the phenomenon of the mixed functional area was solved well. Additionally, the disadvantages are obvious because of the complex segmentation process before classification: time consuming. Secondly, it also has a strong dependence on the selection of scale parameters for image segmentation. Comparing the results with the ground-truth map, there is still a level of confusion between public service areas and the business district. This means more representative features or data need to be identified and incorporated to distinguish between these two types in the future. In summary, this method is more suitable for subsequent statistics and fine-grained mapping of functional districts.
4.4. Impact on Quantitative Performance
In this part, 487 sample points were randomly selected to assess the mapping accuracies. The ground-truth labels of these points were determined based on remote sensing images, street views, building profile data and official land use data. Then, these points were used to evaluate block-level, grid-level and object-level maps, and three confusion matrices (
Table 4,
Table 5 and
Table 6) were obtained.
As we can see from
Table 4,
Table 5 and
Table 6, the overall accuracies differ a lot compared with model accuracy. As shown in
Table 4, the producer’s accuracy (PA) of the residential area is 35.83%. This result shows that residential areas are always misclassified with commercial areas, public service areas and educational areas. In particular, residential and educational areas are often close to each other, and the lack of road network separating them resulted in residential and educational areas having to be classed as the same block, which caused unavoidable misclassification errors. The grid-level method showed poor performance in distinguishing commercial and public service areas, because they have similar image patterns and POI features. The object-level map achieved the highest overall accuracy because of the fine-grained scale. Generally, natural areas and urban villages still have higher accuracies than other areas, because they have distinct characteristics from other areas in remote sensing images. It is difficult to distinguish residential and educational areas for all these three methods.
4.5. Parameters Sensitivity Analysis
Several parameters of this model may influence the discrimination accuracy, such as the numbers of spectral, textural and SIFT visual words and the numbers of LDA topics for the four used features. Previous studies have confirmed that the number of visual words and the size of the topics have an impact on the accuracies of land-use classification [
48]. We utilized the K-means method with the Euclidean distance measurement to cluster the spectral, GLCM and SIFT features. V denotes the number of visual words, and K is the number of topics for the LDA model. The visual word number (V) and topic number (K) are two free parameters in our approach. The suitable topic numbers were chosen by comparing the accuracies of classification results, and the training and validation datasets are the same as those discussed in
Section 3.3.
We used Dirichlet priors in the LDA estimation with α = 50/K and β = 0.01, which are common settings in the literature [
49]. We have attempted to get the best value with different combinations of K and V parameters. The range of the parameters was set according to the article [
50].
Figure 8 shows that the retrieval results are very sensitive to the values of these parameters.
Figure 8a–c shows the relationship between V with the corresponding accuracy, and
Figure 8d–f shows the variation of accuracy when the topic number K is changed. As
Figure 8a and b show, the number of words was set as 200 and 400 making the accuracy reach the peak for spectral and textural features, and 200 is the best set for the SIFT feature. After selecting the number of visual words, we found that the numbers of topics to ensure the highest accuracy for the spectral, textural and SIFT features are 23, 23 and 15, respectively.
Table 7 shows the best parameters finally selected for the LDA model.
5. Discussion
The selection of a basic unit to fuse remote sensing and social sensing data is an important issue in functional urban land-use mapping. Previous studies have discussed the pros and cons to depict the urban function using road networks [
14], but have seldom mentioned other partitions. Based on scene classification, this study proposed a semantic method to classify the urban land-use and discussed the influence of land-use mapping using different partition methods. However, some issues still need to be discussed.
Firstly, the selection of geospatial data which were chosen to classify the functional area, needs to be discussed. This study utilized POIs and remote sensing images to obtain a favorable accuracy for urban land-use classification. High spatial resolution remote sensing images provide rich spectral, textural and spatial information for accurate mapping, which materially aids classification. However, remote sensing data also have many shortcomings in this study. The actual land types under the shadow in the remote sensing image are difficult to distinguish by the features extracted in
Section 3.2. Moreover, there is no obvious distinction between commercial and residential buildings in remote sensing images. Due to the crowded land and excessive population density, a building is usually mixed with commercial and residential activities, which also highlights a weakness of POIs in distinguishing those two kinds of buildings. POI data can represent the different space characteristics of human activity to a good degree, but it still cannot separate some categories with distinct temporal characteristics, such as residential and commercial areas. In future studies, some diurnal data should be added to the research—such as Tencent user density, mobile data or luminous remote sensing data. By adding the semantic objects of playgrounds, the education area has been well distinguished from other types. However, playgrounds represent not only the semantic ground objects of educational areas but also public service areas, such as stadiums, which can lead to inaccurate classification. In future studies, more attention should be paid to find representative ground objects to distinguish different functional areas.
Secondly, the machine learning model also has some issues. The LDA model elevates the low-level features to the high-level semantics, which solves the problem of low-level features not being able to distinguish between urban functions. However, the LDA model is still sensitive to parameters. In
Section 4.3, the selection of word number V and semantic topic number K has affected the classification performance. It is time-consuming work to find the optimal parameter combination. A more automated way to select the best parameters of the LDA needs to be proposed in following studies.
Thirdly, the issue of mixed parcels also needs to be discussed. Previous studies have designed a metric to quantify the mixture degree [
14]. However, the problem of mapping mixed-function zones is still unresolved. The best way to distinguish between mixed functional areas is to discuss them from the standpoint of segmentation scale, however, no appropriate scale and partitioning method existed to solve this problem at present. In this study, a rough judgment has been made on the mixing degree of each block by three different subdividing blocks. The point is that each segmentation type has its suitable occasion, and correctly choosing it would greatly benefit a certain recognition task. Moreover, the scale chosen by different segmentation methods on functional zoning has not discussed in this paper. Therefore, those questions should be considered in future research.
Finally, we draw some conclusions using the table (
Table 8) based on the above analysis. Critical data, advantages and disadvantages, application Scenarios of different partitions are summarized as follows.