Fusing High-Spatial-Resolution Remotely Sensed Imagery and OpenStreetMap Data for Land Cover Classification Over Urban Areas

Land cover classification of urban areas is critical for understanding the urban environment. High-resolution remotely sensed imagery provides abundant, detailed spatial information for urban classification. In the meantime, OpenStreetMap (OSM) data, as typical crowd-sourced geographical information, have been an emerging data source for obtaining urban information. In this context, a land cover classification method that fuses high-resolution remotely sensed imagery and OSM data is proposed. Training samples were generated by integrating the OSM data and multiple information indexes. OSM data, which contain class attributes and location information of urban objects, served as the labels of initial training samples. Multiple information indexes that reflect spectral and spatial characteristics of different classes were utilized to improve the training set. Morphological attribute profiles were used because the structural and contextual information of images was effective in distinguishing the classes with similar spectral characteristics. Moreover, a road superimposition strategy that considers road hierarchy was developed because OSM data provide road information with high completeness in the urban area. Experiments were conducted on the data captured over Wuhan city, and three state-of-the-art approaches were adopted for comparison. Results show that the proposed approach obtains satisfactory results and outperforms the other comparative approaches.


Introduction
The rapid process of urbanization has dramatically changed the distribution of urban land cover in recent years. Land cover information of urban areas is important because it helps humans understand how their living environment is changing. It can also help government agencies and other policy makers make decisions on urban planning and management [1]. However, owing to its high cost and low efficiency, manual surveying is not a satisfactory way to collect land cover information in most cases. Therefore, the geographic information provided by remote sensing technology or social sensors must be utilized for land cover classification over urban areas [2].
High-resolution remotely sensed imagery provides detailed spatial and structural information and thus offers new avenues for precise land cover classification over urban areas [3,4]. However, high spatial resolution does not guarantee high precision in computer interpretation. In high-resolution remotely sensed imagery, urban areas show high intra-class and low inter-class variability. The increase in intra-class variation and the decrease in inter-class variation reduce the separability of classes in the spectral domain, which makes it difficult to distinguish classes using only the spectral characteristics of the image [5,6]. Hence, considerable research has exploited the spatial information of high-resolution remotely sensed imagery and treated textural and structural features as important information sources that complement spectral properties for accurate classification [7-9].
The classification performance relies on the quality and quantity of training samples [10]. The conventional ways to collect training samples, such as field surveys and visual inspections, are time consuming, laborious, costly, and prone to human error [11]. In this context, active learning and semi-supervised learning methods were adopted to reduce the manual work of sample collection [12]. By selecting the most informative data, active learning minimizes the number of samples that must be labeled by experts [13,14]. Semi-supervised learning uses the information extracted from unlabeled data to improve classification performance [15,16].
In the meantime, the wide acceptance of open source geographical data has drawn increasing attention to OpenStreetMap (OSM) for understanding the urban environment [17]. OSM is a crowd-sourced project that aims to create a set of map data that is free to use, edit, upload, and download [18]. Volunteers can delineate an object on the basis of satellite image basemaps and label the object with predefined tags (e.g., name, land cover/land use, and address) or custom tags (e.g., opening time of a hospital and website of a university). Name and land cover/land use are the most commonly used tags. Thus, OSM data contain a large amount of land cover information that can assist urban land cover mapping [19]. Since 2007, the number of registered users and the track points of OSM have increased considerably. The OSM data are equal to proprietary data in terms of accuracy and coverage in certain countries and regions [20].
In recent years, the integration of high-resolution remotely sensed imagery and OSM for land cover classification has drawn increasing attention. OSM contains geographic data with class attributes and location information, which benefit the collection and labeling of training samples for remote sensing classification. In [21], a land use/land cover mapping approach using time-series imagery and training information extracted from OSM data was introduced. Three relatively noise-tolerant algorithms, namely naïve Bayes, decision tree (C4.5 algorithm), and random forest (RF), were used to reduce the influence of OSM noise on the classification performance. In [22], remote sensing images and OSM data were combined for land use/land cover classification. The contribution index (CI), which represents the activeness of user behavior, was utilized to assess the quality of the OSM data. The OSM data with high CI were preferentially selected as training samples. In [23], a high-resolution remote sensing image classification method using OSM data was proposed. Morphological erosion, super-pixel segmentation, and cluster analysis were used to refine training samples derived from the OSM data. The OSM road data were directly superimposed on the classification map owing to their high accuracy and completeness.
High-spatial-resolution remotely sensed imagery offers large amounts of structural and detailed information. However, these existing OSM-based classification methods utilize only the spectral characteristics of the image and ignore the spatial information inherent in the object distribution. In addition, shadows become clearly visible as spatial resolution increases [24]. Given that shadows usually result in a loss of information and distortion of the affected regions, precise recognition of shadows is important for the analysis of high-spatial-resolution remotely sensed imagery [25]. However, shadow information is not included in the OSM data. Consequently, the derived training set lacks shadow samples.
In this study, a spectral-spatial classification framework that fuses high-resolution remotely sensed imagery and OSM data was developed. We derived training samples from the OSM data because they contain category and location information. However, OSM data may contain errors, such as position errors and attribute errors, owing to the non-professional production process and the absence of data quality control. Multiple information indexes were introduced to refine the samples derived from the OSM data, both to reduce these errors and to supplement class information. Information indexes reflect the spectral and spatial characteristics of specific classes and can therefore be used to label samples for these classes. In particular, the normalized difference vegetation index (NDVI), normalized difference water index (NDWI), morphological building index (MBI), and bare soil index (BSI) were utilized to purify the samples of the corresponding classes extracted from the OSM data, whereas the morphological shadow index (MSI) was adopted to derive shadow samples. Considering the complex land cover distribution in urban areas, we used extended morphological attribute profiles (APs) to model the structural and spatial information of high-resolution images. In addition, principal component analysis (PCA) was applied to the original image and the derived APs to reduce data redundancy and select informative features. On the basis of the generated training samples and extracted features, the initial classification result was obtained using RF. Considering that the OSM road data contain road location and hierarchy information with high completeness, the OSM road information was superimposed on the classification map to reduce the misclassification between roads and other artificial structures. An approach that generates road buffers with an adaptive radius in accordance with road hierarchy was developed. Experiments were conducted on data covering the area within the third ring road of Wuhan. Comparison with three state-of-the-art methods illustrated that the proposed framework achieved satisfactory classification results.
Remote Sens. 2019, 11, 88
The rest of the paper is organized as follows: Section 2 introduces the methodology of the proposed framework. The datasets and experimental results are provided in Section 3, followed by a detailed discussion and a comparison with other methods in Section 4. Section 5 presents the conclusions.

Methodology
The land cover information over urban areas was obtained using the proposed framework, which consists of three steps: sample generation, feature extraction, and road superposition. Specifically, the OSM data without roads were utilized to obtain the initial samples. The samples were successively refined by multiple information indexes to generate candidate samples. Equal numbers of training samples for each class were randomly selected from the candidate samples. Then, APs were computed on the PCA result of the images. The dimensionality of the AP features was reduced by PCA before they were fed into an RF classifier. Lastly, the OSM roads were buffered with adaptive radii and superimposed on the classification map. The pixels overlapped by the buffered roads were relabeled as the road class. The flowchart of the proposed framework is presented in Figure 1.

Sample Generation
In this section, a novel sample generation method that integrates OSM data and multiple information indexes is proposed. The OSM data contain abundant information on ground object categories, which provides training sample labels for image classification. Notably, some errors exist owing to the user-generated nature of OSM. In the meantime, information indexes such as NDVI, NDWI, and MBI can be used to extract training samples on the basis of the distinct spectral or structural characteristics of specific classes. In the proposed method, the samples generated from the OSM data were refined by the information indexes. In other words, the information indexes were calculated on pixels or objects of the corresponding class in OSM instead of on the entire image. For example, NDVI was calculated only in areas labeled as vegetation in OSM. Specifically, NDVI, NDWI, MBI, and BSI were introduced to purify samples of vegetation, water, buildings, and soils, respectively. MSI was used to derive the shadow samples.

Multiple Information Indexes
NDVI [26]: Given that vegetation has high near-infrared reflectance and low red-light reflectance, NDVI is defined as follows:
$$\mathrm{NDVI} = \frac{NIR - RED}{NIR + RED}$$
where NIR and RED denote the digital numbers (DNs) of the near-infrared and red-light bands of the image, respectively.
NDWI [27]: Water has high reflectance in the green band and low reflectance in the near-infrared band. On the basis of this spectral characteristic, NDWI is computed as:
$$\mathrm{NDWI} = \frac{GREEN - NIR}{GREEN + NIR}$$
where GREEN and NIR denote the DNs of the green-light and near-infrared bands of the image, respectively.
MBI [28]: Buildings are brighter than their surrounding shadows. Thus, the basic idea of MBI is to build the relationship between the spectral-structural characteristics and the morphological operators. Considering the characteristics of brightness, local contrast, size, and directionality, MBI can be represented as follows:
$$\mathrm{MBI} = \frac{\sum_{d,s} \mathrm{MP}_{W\text{-}TH}(d, s)}{D \times S}$$
where D and S denote the numbers of directions and scales, respectively, and $\mathrm{MP}_{W\text{-}TH}(d, s)$ denotes the morphological profiles (MPs) of the white top-hat performed on the original image b with directionality d and scale s.
MSI [29]: Given that shadows are darker than their surrounding objects, the calculation of MSI can be extended from MBI by replacing the white top-hat with the black top-hat transformation. MSI can be formulated as:
$$\mathrm{MSI} = \frac{\sum_{d,s} \mathrm{MP}_{B\text{-}TH}(d, s)}{D \times S}$$
BSI: Bare soil can be extracted in the HSV color space, a common color space that describes an image by hue, saturation, and value.
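As a small illustration of the two band-ratio indexes, the following sketch evaluates NDVI and NDWI on a per-pixel basis. The function names and the digital numbers are hypothetical and chosen only for illustration; the zero-denominator guard is an added assumption, not part of the index definitions.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 is returned when a + b is zero (assumed guard)."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """NDVI from the near-infrared and red digital numbers."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """NDWI from the green and near-infrared digital numbers."""
    return normalized_difference(green, nir)


# Hypothetical digital numbers (green, red, nir) for two illustrative pixels.
pixels = {"vegetation-like": (60, 40, 160), "water-like": (90, 50, 30)}
for name, (green, red, nir) in pixels.items():
    print(name, round(ndvi(nir, red), 2), round(ndwi(green, nir), 2))
```

In this toy input the vegetation-like pixel has a positive NDVI and a negative NDWI, and the water-like pixel shows the opposite signs, which is the contrast the indexes are meant to capture.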

Sample Generation Method
The sample generation method includes the following steps:
1. Sample labeling based on OSM data: the category information of the OSM data is used to label the samples in the high-resolution remotely sensed imagery according to their spatial coordinates.
2. Calculation of multiple information indexes: MBI, MSI, NDWI, and BSI are computed to indicate the areas of buildings, shadows, water, and soils, respectively. Moreover, NDVI is utilized to extract the forest and grass information.
3. Sample collection based on multiple information indexes: for NDVI and NDWI, the Otsu method is adopted to select the optimal threshold based on the histogram of the index values of the OSM-labeled vegetation and water samples. For MBI, MSI, and BSI, the threshold is selected by experts. By applying the threshold to the obtained information indexes, we obtain the samples belonging to the corresponding classes.
4. Training sample generation: the intersection of the sample sets provided by the OSM data and the multiple information indexes is selected to construct the training sample set. The OSM data do not contain shadow information. Thus, only MSI is used to generate the shadow samples.
5. Training sample refinement: considering that some samples may be labeled as different classes by different volunteers, the regions that are assigned to more than one category are removed to refine the training set.
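Steps 3 and 4 can be sketched as follows for a single index. This is a minimal illustration under stated assumptions, not the authors' implementation: the histogram-based Otsu routine, the number of bins, the helper name `refine_samples`, and the NDVI values are all hypothetical.

```python
def otsu_threshold(values, bins=64):
    """Return the histogram threshold that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    best_var, best_t = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += hist[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var = var_between
            best_t = lo + (i + 1) * width  # upper edge of bin i
    return best_t


def refine_samples(osm_pixels, index_values):
    """Keep the OSM-labeled pixels whose index value exceeds the Otsu threshold."""
    threshold = otsu_threshold([index_values[p] for p in osm_pixels])
    kept = {p for p in osm_pixels if index_values[p] > threshold}
    return kept, threshold


# Hypothetical NDVI values of pixels that OSM labels as vegetation.
ndvi_of_osm_vegetation = {
    (0, 0): 0.62, (0, 1): 0.58, (1, 0): 0.05,
    (1, 1): 0.66, (2, 0): 0.10, (2, 1): 0.60,
}
kept, threshold = refine_samples(list(ndvi_of_osm_vegetation), ndvi_of_osm_vegetation)
print(sorted(kept), round(threshold, 3))
```

Because the threshold is computed only from pixels that OSM already labels as the class of interest, the kept set is by construction the intersection of the OSM-derived samples and the index-derived samples.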

Morphological Attribute Profiles
For high-resolution remotely sensed imagery, the intra-class variation of spectral features increases but the inter-class variation decreases. The classification of high-resolution imagery cannot benefit considerably from the use of spectral characteristics alone. In the meantime, high-resolution remotely sensed imagery can delineate the spatial features of surface objects clearly. By introducing spatial features, the accuracy of classification can be largely improved. APs provide a multilevel spatial characterization of an image by the sequential application of morphological attribute filters. Morphological attribute filters are powerful tools for modeling different specifications of structural information [30]. These filters are connected operators, and thus images are processed by considering only their connected components. In other words, with the operations of morphological attribute filters, connected components of the processed image will merge, enlarge, shrink, split, appear, or disappear. A connected component is composed of a group of iso-intensity pixels that are considered to be connected based on a connectivity rule. Four-connected and eight-connected rules are two widely used connectivity rules in which a pixel is regarded as connected to its four or eight neighboring pixels, respectively.
Two fundamental morphological attribute filters are attribute thinning and attribute thickening. The attribute filters process an image in accordance with a criterion that is a logical predicate on a generic attribute. The criterion compares the attribute value calculated on a connected component with a predefined threshold [31]. Specifically, a criterion R that compares the attribute A of a connected component C with an area threshold λ can be expressed as:
$$R(C) = A(C) > \lambda$$
To derive APs on an image, the criterion is evaluated on all connected components of the image, which determines whether a connected component will be kept or merged. If the criterion is fulfilled (the value of the criterion is true), then the connected component will be preserved; otherwise, it will be merged into one of its adjacent connected components. The adjacent connected component chosen for merging is the one with the closest lower or higher value, depending on whether the filter is thinning or thickening [32]. That is, if the attribute filter is thinning, then the component is merged into the adjacent connected component with the closest lower value. Otherwise, it is merged into the adjacent connected component with the closest higher value.
An important property of the criteria is increasingness. Increasing criteria satisfy the following condition: if the criterion is verified for a connected component, then it will also be verified for all its supersets [33]. Increasing attributes (e.g., area) and inequality relations (e.g., >) can form increasing criteria. Furthermore, increasing criteria lead to increasing filters, which transform the thinning and thickening filters into opening and closing filters, respectively.
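The effect of an increasing area criterion can be illustrated on a binary image, where attribute thinning reduces to an area opening: foreground components that fail the criterion are merged into the background. The sketch below is a simplified illustration that assumes a binary input, four-connectivity, and the criterion "area greater than the threshold"; greyscale attribute filters are typically computed on a tree representation of the image instead. The function name and the toy image are hypothetical.

```python
from collections import deque


def area_opening(binary, area_threshold):
    """Keep 4-connected foreground components whose area exceeds area_threshold."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Collect one connected component with a breadth-first search.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Evaluate the criterion on the component; preserve it only if true.
            if len(component) > area_threshold:
                for y, x in component:
                    out[y][x] = 1
    return out


image = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
]
# A small profile: the same image filtered with increasing area thresholds.
profile = [area_opening(image, t) for t in (0, 2, 3)]
for level in profile:
    print(sum(map(sum, level)))
```

Each larger threshold removes a superset of the components removed by the smaller one, which is the nesting behavior that makes a sequence of such filters usable as a multilevel profile.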
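APs are obtained by applying a sequence of attribute thinning and thickening filters on the image. For a greyscale image, the APs can be defined as:
$$\mathrm{AP}(g) = \left\{ \phi^{C_K}(g), \ldots, \phi^{C_1}(g), g, \lambda^{C_1}(g), \ldots, \lambda^{C_K}(g) \right\}$$
where $\phi^{C_k}(g)$ and $\lambda^{C_k}(g)$ denote the attribute thinning and thickening outputs of the original greyscale image g with the k-th criterion, respectively, and K is the number of criteria. Analogous to extended MPs, the extended APs (EAPs) can be defined as the APs extracted from the principal components of an image [8]. Thus, EAP can be formulated as:
$$\mathrm{EAP}(g) = \left\{ \mathrm{AP}(g_1), \mathrm{AP}(g_2), \ldots, \mathrm{AP}(g_N) \right\}$$
where $g_n$ denotes the n-th band of the image and N is the number of bands.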

Road Superposition
Roads and buildings usually show similar spectral characteristics because of their similar construction materials. Severe misclassification between roads and buildings is difficult to avoid regardless of the accuracy of the selected training samples. OSM, which was initially designed to collect street data from volunteers, has a higher completeness of road data than of the other classes. OSM roads have reached a completeness of more than 80% worldwide [34], and the value is higher in countries such as the US and China, where navigation companies contribute to OSM. To avoid this misclassification, OSM roads were not selected as training samples. Instead, we superimposed the OSM road data on the classification map to fully utilize their excellent completeness [35,36].
Road buffering is conducted before superimposition, given that OSM road data are in line format. In traditional methods, the OSM road buffer is generated with a fixed-length radius [23]. However, road buffers with a fixed-length radius cannot represent roads of all hierarchies because roads belonging to different hierarchies have different widths. To address this issue, an approach that derives the OSM road buffer with an adaptive radius was developed.
In our method, the radius of the road buffer was determined in accordance with the road hierarchy. The spatial resolution of the remote sensing image was also considered. In general, the radius is constrained by conditions expressed in terms of the following quantities: R denotes the radius of the road buffer, d denotes the spatial resolution of the image, k denotes the multiple, and W min and W max are the minimum and maximum road widths recommended by the related standards, respectively. To our knowledge, the Technical Standard of Highway Engineering, which is the current standard associated with roads and traffic in China, recommends the width range of different hierarchies of roads. Thus, Table 1 shows the estimated radii of the road buffer at different hierarchies for an image with a spatial resolution of 4 m.
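The sketch below shows one plausible way to choose an adaptive radius from the quantities R, d, k, W min, and W max. It rests on assumptions that are not confirmed by the text: that the radius is an integer multiple of the pixel size (R = k × d) and that the buffered width 2R should lie between W min and W max. The width ranges in the dictionary are hypothetical placeholders keyed by OSM highway tags and are not the values of Table 1.

```python
import math

# Hypothetical width ranges in metres per OSM highway tag (NOT the values of Table 1).
WIDTH_RANGES_M = {
    "motorway": (22.0, 45.0),
    "primary": (15.0, 30.0),
    "residential": (6.0, 12.0),
}


def buffer_radius(hierarchy, resolution_m):
    """Return (k, R) with R = k * d and W_min <= 2R <= W_max, or None if impossible.

    The rule itself is an assumed reading of the constraint, used for illustration.
    """
    w_min, w_max = WIDTH_RANGES_M[hierarchy]
    # Smallest whole number of pixels whose doubled length reaches the minimum width.
    k = max(1, math.ceil(w_min / (2 * resolution_m)))
    radius = k * resolution_m
    if 2 * radius > w_max:
        return None
    return k, radius


for tag in WIDTH_RANGES_M:
    print(tag, buffer_radius(tag, 4))
```

Choosing the smallest admissible k keeps the buffer conservative, so fewer non-road pixels are relabeled; a different selection rule within the admissible range would be equally consistent with the assumed constraint.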

Study Area and Datasets
Wuhan is one of the largest cities of central China. The study area is located within the third ring road of Wuhan, which is the urban area of this city. It covers a region of approximately 500 km², and occupies parts of seven districts: Jianghan, Jiangan, Qiaokou, Hanyang, Wuchang, Qingshan, and Hongshan. Figure 2 shows a GaoFen-2 multispectral image acquired on 1 September 2016. The image contains 5544 × 4720 pixels and has a spatial resolution of 4 m. Four channels, namely blue, green, red, and near-infrared, are incorporated in the image.
A dataset of OSM covering the study area, which was downloaded from https://download.geofabrik.de/asia/china.html, was used. The dataset was composed of eight shapefile layers called points, places, waterways, railways, roads, natural, land use, and buildings, respectively.
The GaoFen-2 image was preprocessed with a series of steps including radiometric calibration, atmospheric correction, and georeferencing [37]. The radiometric calibration was conducted on the GaoFen-2 image to convert DN to top-of-atmosphere (TOA) reflectance with parameters provided by the China Centre for Resources Satellite Data and Application. Then, the TOA reflectance was converted to ground surface reflectance by atmospheric correction with the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) module of the Environment for Visualizing Images (ENVI) software. Lastly, the GaoFen-2 image was georeferenced to the OSM data to remove the spatial offset by a first-order polynomial transformation of pairwise control points.

Experimental Setting
Four equal-sized subregions with a size of 702 × 690 pixels were selected as test regions. The sub-images and the corresponding ground truth annotations of the test regions are shown in Figure 3. Seven typical classes were considered: buildings, water, forests, grasses, roads, soils, and shadows. Table 2 presents the number of testing samples for each class in the test regions. RF [38] was employed as the classifier in the experiments. The minimal size, maximal size, and interval of the structuring element used for generating MBI and MSI were set as 24, 48, and 4 pixels, respectively. The number of training samples of each class was 300 pixels. As for APs, area was chosen as the attribute and the corresponding thresholds were selected as 25, 100, 400, and 1600 pixels. The number of trees for constructing the RF classifier was 400. The overall accuracy (OA), Kappa coefficient, and F1-score [39] for each class were used to evaluate the classification performance.
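For reference, the three evaluation measures can be computed from a confusion matrix as in the following sketch. The function name, the row/column convention, and the two-class matrix are hypothetical and serve only to illustrate the standard definitions of OA, the Kappa coefficient, and the per-class F1-score.

```python
def accuracy_metrics(confusion):
    """Return (OA, Kappa, per-class F1) for confusion[i][j] = true class i, predicted j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diagonal = sum(confusion[i][i] for i in range(n))
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    oa = diagonal / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected) if expected != 1 else 0.0
    f1 = []
    for i in range(n):
        # F1 = 2TP / (2TP + FP + FN), and 2TP + FP + FN equals row sum + column sum.
        denominator = row_sums[i] + col_sums[i]
        f1.append(2 * confusion[i][i] / denominator if denominator else 0.0)
    return oa, kappa, f1


# Hypothetical two-class confusion matrix.
oa, kappa, f1 = accuracy_metrics([[45, 5], [10, 40]])
print(round(oa, 3), round(kappa, 3), [round(v, 3) for v in f1])
```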

Experiment Results
The classification maps and accuracies of the four test regions are shown in Figure 4 and Table 3, respectively. Figure 4 shows that the proposed method gave satisfactory classification results. The objects in the classified image were close to the real ground features in terms of size and shape. In particular, the well-shaped water, forests, roads, and shadows showed explicit boundaries and were separated from their surrounding objects. As shown in Table 3, the proposed framework achieved an OA of 89.4%. Water obtained the highest accuracies among all classes, and the accuracies of water in the four test regions were 97.8%, 95.6%, 98.6%, and 95.7%, respectively. Moreover, roads were also well identified with an accuracy of 93.2%, which indicated that the OSM data had excellent completeness in the Wuhan urban area. By employing the road superimposition strategy, the structure and continuity of the roads were preserved. Buildings obtained a high accuracy of 84.7% because road classification did not interfere with them. Although forests and grasses showed similar spectral and spatial characteristics, they were correctly recognized with accuracies of 79.2% and 87.9%, respectively. For shadows and soils, the classification accuracies were also acceptable and reached 82.1% and 77.9%, respectively.

Method of Sample Generation
The class distribution of samples in feature space was derived to analyze the effectiveness of the proposed sample generation method. A comparison between the class distribution of the original OSM samples and that of the samples generated by the proposed method is presented in Figure 5, where the horizontal and vertical axes denote the first two principal components obtained by PCA, respectively.
As shown in Figure 5, the original OSM samples were dispersed. By contrast, the derived samples were clustered in the feature space. The distribution of the original OSM samples indicated that several classes were seriously confused with one another, especially buildings and soils. Some OSM samples, such as those of water, were far from the center of the corresponding class in the feature space. The derived samples had more explicit boundaries and better separability among different classes than the original OSM samples. A few building samples were mixed with soil samples owing to the similar spectral characteristics of the two classes. Nevertheless, the general quality of the derived samples was considerably improved.

Utilization of Spatial Features
A comparative experiment that classified the imagery utilizing only the spectral features was conducted to verify the effect of the spatial features on the classification performance. The experiment was performed under the same conditions as the proposed framework except for the utilization of spatial features. The classification maps and accuracies of the experiment are presented in Figure 6 and Table 4, respectively. Comparison of the results presented in Figures 3 and 6 indicated that most pixels were correctly classified. However, in regions II and IV, many small shadow objects were misclassified as water owing to their similar spectral characteristics. Spatial features provided additional characteristics that enhanced the separability between different classes. As a result, the misidentification between shadows and water decreased considerably after the spatial features were integrated.
Table 4 shows that the OA of the spectral-based classification was 83.4%, lower than the 89.4% achieved by the spectral-spatial classification. The spatial features chiefly benefited the recognition of shadows and soils: compared with the spectral-based method, the proposed method increased the accuracies of shadows and soils by 23.9% and 16.5%, respectively. The classification accuracies of water, forests, and grasses also improved by 2-6%.
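The accuracy measures reported in the tables (OA, Kappa coefficient, and per-class F1-score) can all be derived from a confusion matrix. The following is a minimal sketch, assuming the standard definitions of these measures and a matrix whose rows are reference classes and whose columns are predicted classes. The function name `accuracy_metrics` and the two-class toy matrix are illustrative assumptions, not data or code from this study.

```python
from typing import Dict, List


def accuracy_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute OA, Cohen's kappa, and per-class F1 from a confusion matrix.

    cm[i][j] is the number of test samples of reference class i
    that were predicted as class j (assumed layout).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    row_sums = [sum(cm[i]) for i in range(n)]                        # reference totals
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predicted totals

    # Overall accuracy: share of correctly classified test samples.
    oa = correct / total

    # Kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (oa - expected) / (1 - expected)

    # Per-class F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (row_sum + col_sum).
    f1 = []
    for k in range(n):
        denom = row_sums[k] + col_sums[k]
        f1.append(2 * cm[k][k] / denom if denom else 0.0)

    return {"oa": oa, "kappa": kappa, "f1": f1}


# Hypothetical two-class example (not values from the paper).
toy_matrix = [[8, 2],
              [1, 9]]
metrics = accuracy_metrics(toy_matrix)
print(metrics)
```

Because the F1-score combines omission and commission errors of a single class, it can change strongly for classes such as shadows and soils even when the OA moves by only a few percent.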

Strategy of Road Superimposition
A comparative experiment that extracted training samples from the OSM road data, instead of directly overlaying the road data on the classification map, was carried out to demonstrate the effectiveness of the road superimposition strategy. The classification maps and accuracies are presented in Figure 7 and Table 5, respectively. As shown in Figure 7, a large number of building and soil pixels were misclassified as roads. Some pixels of water, forests, and grasses were also confused with roads. This phenomenon can be attributed to the similar spectral characteristics of buildings, soils, and roads.
Comparisons between Tables 3 and 5 indicated that the method using the road superimposition strategy provided better classification performance and increased the OA by 7.6%. Specifically, the classification accuracy of buildings increased by 39.1%, and the accuracy of roads increased from 77.4% to 93.2%. The road superimposition strategy considerably reduced the severe misclassification between roads and other artificial structures. Notably, the success of the OSM road superimposition strategy is attributable to the high completeness of the OSM roads. For the other classes of information in OSM, the superimposition strategy was unsuitable because of their low completeness.
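The superimposition step itself, in which pixels covered by a hierarchy-dependent road buffer are relabeled as roads, can be sketched as follows. This is only an illustration under stated assumptions: the OSM road centerlines are assumed to be already rasterized to pixel coordinates, the buffer is approximated by a disc, and the class code `ROAD`, the hierarchy names, and the radii in pixels are hypothetical placeholders rather than the values determined in Table 1.

```python
from typing import Dict, List, Tuple

ROAD = 4  # hypothetical integer code for the road class


def superimpose_roads(
    class_map: List[List[int]],
    road_pixels: List[Tuple[int, int, str]],
    radius_by_level: Dict[str, int],
    road_label: int = ROAD,
) -> List[List[int]]:
    """Relabel every pixel inside the road buffer as road.

    road_pixels holds (row, col, hierarchy level) for each rasterized
    centerline pixel; radius_by_level gives the buffer radius in pixels
    for each hierarchy level (assumed values).
    """
    rows, cols = len(class_map), len(class_map[0])
    result = [row[:] for row in class_map]  # leave the input map untouched
    for r, c, level in road_pixels:
        radius = radius_by_level[level]
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                rr, cc = r + dr, c + dc
                inside_map = 0 <= rr < rows and 0 <= cc < cols
                inside_disc = dr * dr + dc * dc <= radius * radius
                if inside_map and inside_disc:
                    result[rr][cc] = road_label
    return result


# Hypothetical 5 x 5 initial classification map with a single class (0).
initial_map = [[0] * 5 for _ in range(5)]
centerlines = [(2, c, "primary") for c in range(5)] + [(0, 4, "residential")]
radii = {"primary": 1, "residential": 0}  # illustrative radii, not Table 1
final_map = superimpose_roads(initial_map, centerlines, radii)
```

Overlaying the buffer after classification, rather than training on road samples, means that the road class never competes spectrally with buildings and soils, which is the effect discussed above.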

The Object-Based Strategy
The pixel-based and object-based approaches are widely accepted strategies for high-spatial-resolution image classification. An object-based image analysis (OBIA) experiment was conducted for comparison. In the object-based case, a multi-resolution segmentation algorithm was used to divide the image into regions, within which pixels were spatially adjacent and similar in the feature domain. The feature of the representative sample of each region was taken as the mean feature value of the pixels within that region, and the corresponding label was determined by the dominant class. The classification maps and accuracies using OBIA are presented in Figure 8 and Table 6, respectively. Comparisons between Figures 4 and 8 showed that the object-based classification maps were cleaner and less noisy than the pixel-based ones. The object-based method was advantageous because it reduced the salt-and-pepper noise in the classification results. As shown in Table 6, the OA of the four test regions was 89.3%, close to the 89.4% obtained by the pixel-based method. The accuracies of buildings, water, forests, grasses, and roads were nearly equal to those of the pixel-based method. From these results, we can conclude that the proposed framework is appropriate for both pixel- and object-based classification. The accuracies of the two cases can be close to each other, whereas the classification maps derived by OBIA may show more local homogeneity.
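The construction of object-level samples described above (mean feature per region, label from the dominant class) can be illustrated with a short sketch. It assumes that the segmentation has already been computed and is given as one region identifier per pixel, and that "dominant class" means a majority vote over the labeled pixels of a region. The function name `aggregate_regions` and the toy inputs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def aggregate_regions(
    segment_ids: List[int],
    features: List[List[float]],
    labels: List[Optional[str]],
) -> Dict[int, Tuple[List[float], Optional[str]]]:
    """Return, per region, the mean feature vector and the dominant label.

    The three inputs are aligned per pixel; a label of None marks a pixel
    without a training label (assumption for this sketch).
    """
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    votes: Dict[int, Counter] = defaultdict(Counter)

    for seg, feat, lab in zip(segment_ids, features, labels):
        if seg not in sums:
            sums[seg] = [0.0] * len(feat)
        for i, value in enumerate(feat):
            sums[seg][i] += value
        counts[seg] += 1
        if lab is not None:
            votes[seg][lab] += 1

    result = {}
    for seg, total in sums.items():
        mean = [s / counts[seg] for s in total]
        # Majority vote; regions without any labeled pixel get None.
        label = votes[seg].most_common(1)[0][0] if votes[seg] else None
        result[seg] = (mean, label)
    return result


# Hypothetical example: two regions, two features per pixel.
regions = aggregate_regions(
    segment_ids=[0, 0, 0, 1, 1],
    features=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 0.0], [20.0, 0.0]],
    labels=["building", "building", "soil", "water", None],
)
```

Averaging features over a region is also what suppresses isolated misclassified pixels, which is consistent with the reduced salt-and-pepper noise noted for the OBIA maps.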

Sensitivity Analysis of Sample Numbers
In this experiment, classifications with different numbers of training samples per class were conducted using the pixel- and object-based methods. The OA obtained with different numbers of training samples is presented in Figure 9. Notably, the accuracies did not fluctuate considerably and remained within a range of 85% to 90%. The accuracy peaked when the number of training samples was 300. Moreover, the accuracy of the OBIA method declined more steeply than that of the pixel-based method when the number of samples exceeded 300. Therefore, the proposed framework is not very sensitive to the number of samples.

Comparison with the State-of-the-Art Methods
Three state-of-the-art methods that use OSM data for remote sensing image classification were considered for comparison. The first method (CI method) introduced CI to assess the importance of the OSM data and selected samples from OSM data with high CI [22]. The second method (SR method) refined the training samples derived from the OSM data using a set of techniques and superimposed the OSM road data on the classification map [23]. The third method (AS method) automatically extracted samples on the basis of multiple information indexes for remote sensing image classification [40]. The classification maps and confusion matrices of these methods and the proposed method are presented in Figure 10 and Table 7, respectively.
Figure 10 shows that the proposed method exhibited promising performance. Specifically, although the training samples of the CI method were selected from the datasets with high CI values, the image was still seriously misclassified. The classification maps of the SR method indicated evident confusion between buildings and soils and between forests and grasses. Furthermore, shadows were all recognized as water because of the lack of shadow samples. The classification maps of the AS method showed that most pixels were correctly classified, especially the pixels of shadows, water, and vegetation. However, numerous pixels were misclassified as roads. In addition, grasses and forests could not be separated because they were jointly represented by the vegetation index.
A comparison of the confusion matrices of the four methods shows that the quality of training samples is crucial to classification performance. In our experiments, the methods using refined samples (SR, AS, and the proposed method) achieved an OA of at least 64.9%, whereas the method using raw OSM data as samples (CI method) achieved an OA of only 48.6%.

Discussion on the OSM Data Quality
OSM data have gained increasing attention in land cover/land use mapping. Different strategies have been developed in accordance with the accuracy and completeness of OSM data. In regions where volunteers are active, e.g., some European countries, the quality of OSM data is as high as that of proprietary data, and land cover/land use maps can be directly extracted and generated from OSM data [19,36,41]. However, for most places in the world, OSM data do not have high accuracy or completeness. In this context, OSM data are combined with remote sensing images for land cover/land use mapping. Training samples can be extracted from OSM data for remote sensing image classification [21-23]. The inaccurate labels contributed by non-professional volunteers hinder image classification. Thus, it is important to collect reliable and representative samples from OSM data. Moreover, the OSM road network can be adopted to segment the study area for parcel-based land use mapping [42,43]. The performance of this strategy relies on the completeness of the OSM road data. Although OSM data quality is unsatisfactory in certain regions, it is promising that OSM information becomes more accurate and complete as more volunteers contribute their knowledge.

Conclusions
In this study, high-resolution remotely sensed imagery and OSM data were fused to obtain a land cover classification map over urban areas. The class attributes from the OSM data and multiple information indexes from the imagery were integrated to extract training samples. APs were computed to model the spatial features of the imagery, and PCA was performed to reduce the information redundancy. On the basis of the generated training samples and extracted features, an initial classification map was obtained. An OSM road buffer with an adaptive radius was derived in consideration of road hierarchy. After the road buffer was superimposed on the classification map and the overlapped pixels were relabeled as the road category, the final classification result was obtained.
A high-resolution multispectral image acquired by the GaoFen-2 satellite and the OSM data covering Wuhan City, China, were used to test the effectiveness of the proposed framework. The experimental results illustrated that the proposed framework produced a satisfactory classification result with high accuracy. Firstly, the samples derived by the proposed method were more reliable than the raw OSM samples, given that they showed better discrimination than the original OSM data in the feature space. Secondly, the integration of APs improved the classification accuracy compared with the classification approach that utilized only spectral features. Thirdly, the OSM road superimposition strategy effectively reduced the misclassification among buildings, soils, and roads. The proposed framework was compared with three state-of-the-art methods. Experimental results demonstrated that the proposed framework outperformed the other methods in terms of classification accuracy and visual interpretation.
In the future, we plan to conduct multi-temporal analysis on urban areas using multi-sensor images and OSM data [44,45]. Considerable attention will also be paid to the fusion of open social and remote sensing data for the analysis of economic and social issues in urban areas.

Figure 1. Flowchart of the proposed framework.

Figure 4. Classification maps of the four test regions provided by the proposed method (yellow = buildings, blue = water, dark green = forests, light green = grasses, pink = roads, orange = soils, and black = shadows).

Figure 5. Comparison between the class distribution of the original OpenStreetMap (OSM) samples (left) and the samples generated by the proposed method (right) (buildings = red, water = blue, forests = cyan, grasses = green, soils = black).

Figure 6. Classification maps of the four test regions provided by the spectral-based approach (yellow = buildings, blue = water, dark green = forests, light green = grasses, pink = roads, orange = soils, and black = shadows).

Figure 7. Classification maps of the four test regions provided by the approach that used roads as training samples (yellow = buildings, blue = water, dark green = forests, light green = grasses, pink = roads, orange = soils, and black = shadows).

Figure 8. Classification maps of the four test regions provided by the object-based approach (yellow = buildings, blue = water, dark green = forests, light green = grasses, pink = roads, orange = soils, and black = shadows).

Figure 9. Accuracies with different numbers of training samples.

Table 1. Determined radii of road buffer with different hierarchies.

Table 2. Number of testing samples for each class in the test regions.

Table 3. Classification accuracies of the proposed method in the four test regions in terms of overall accuracy (OA), Kappa coefficient, and F1-score for each class.

Table 4. Classification accuracies of the spectral-based classification approach in the four test regions in terms of overall accuracy, Kappa coefficient, and F1-score for each class.

Table 5. Classification accuracies of the approach that used roads as training samples in the four test regions in terms of overall accuracy, Kappa coefficient, and F1-score for each class.

Table 6. Classification accuracies of the object-based approach in the four test regions in terms of overall accuracy, Kappa coefficient, and F1-score for each class.