Classification of Ultra-High Resolution Orthophotos Combined with DSM Using a Dual Morphological Top Hat Profile

New aerial sensors and platforms (e.g., unmanned aerial vehicles (UAVs)) are capable of providing ultra-high resolution remote sensing data (less than a 30-cm ground sampling distance (GSD)). This type of data is an important source for interpreting sub-building level objects; however, it has not yet been well explored. The large differences in scale among urban objects, the high spectral variability and the large perspective effect make it difficult to design descriptive features. Therefore, features representing the spatial information of the objects are essential for dealing with the spectral ambiguity. In this paper, we propose a dual morphological top-hat profile (DMTHP) using both morphological reconstruction and erosion with different granularities. Because the resulting feature space is high dimensional, we propose an adaptive scale selection procedure that reduces the feature dimension according to the training samples. The DMTHP is extracted from both images and Digital Surface Models (DSM) to obtain complementary information. The random forest classifier is used to classify the features hierarchically. Quantitative experiments are performed on aerial images with a 9-cm GSD and UAV images with a 5-cm GSD. In our experiments, improvements of 10% and 2% in overall accuracy are obtained in comparison with the well-known differential morphological profile (DMP) feature, and superior performance is observed over the other tested features. Large format data with 20,000 × 20,000 pixels are used to perform a qualitative experiment using the proposed method, which shows its promising potential. The experiments also demonstrate that the DSM information greatly enhances the classification accuracy. In the best case in our experiments, it raises the classification accuracy from 63.93% (spectral information only) to 94.48% (the proposed method).


Background
Land cover classification is a well-studied, but challenging problem, which plays an important role in interpreting remote sensing data. Most of the previous investigations focus on low-to-medium resolution images for classification at the landscape level [1,2], which have been widely used for studying urban sprawl and forest mapping [3,4]. The recent development of very high resolution (VHR) (0.5-2 m) space-borne and aerial sensors has raised interest in exploring spatial features for land cover classification, since the spectral information alone does not constitute distinct features to separate different urban objects [5]. Some proposed methods have achieved acceptable results using VHR images for classification at the building level using 2D spatial features [6,7].
However, new aerial sensors and platforms (e.g., unmanned aerial vehicles (UAVs)) now provide remote sensing data with even higher spatial resolution (ultra-high resolution (UHR), 0.05-0.3 m). Interpreting this kind of data is very important for urban object management and modeling [8,9], since objects that are smaller than buildings (sub-building level), such as bus stations and cars, become more visible and significant in UHR images. Due to the large perspective effects and high spectral variations, image-based spatial features may not suffice for the classification of UHR images. Moreover, the disparate scales of different classes are more significant.

Related Works in Classification Using Spatial Information
Numerous works consider improving the classifier and the feature representation for classification. Mylonas et al. [10] proposed to enhance the spatial representation using a fuzzy segmentation method and then adopted a voting strategy within the segments based on a pixel-based classification. Tuia et al. [11] proposed a multi-kernel approach that combined different kernel functions of the support vector machine (SVM) to address the different kinds of features.
Hester et al. [12] applied the ISODATA (iterative self-organizing data analysis technique) to the spectral bands of the VHR images and reported 89.0% overall accuracy (OA). However, in their work, ground, buildings, roofs and roads were all specified as impervious objects, which is insufficient for urban mapping applications. Moreover, adopting the spectral information alone to distinguish these objects for finer classification leads to poor results.
Researchers have demonstrated in a few studies that introducing spatial features to describe the 2D spatial patterns of the objects brings significant improvement to the classification results [13]. Notable developments include the grey level co-occurrence matrix (GLCM) [14], the length-width extraction algorithm (LWEA) [15], 3D wavelet analysis [16], the differential morphological profile (DMP) [17], etc. GLCM is a set of spatial features that extract textural statistics over a given window. It was first proposed by Haralick et al. [14] for texture analysis and later revisited by Zhang [18] for building detection. Recently, it was adopted by Pacifici et al. [6] for classification of VHR images through a neural network, resulting in satisfactory accuracy, even with panchromatic images. LWEA was proposed by Shackelford and Davis [15]; it extracts the spectrally similar area surrounding a central pixel by radiating searching lines from this pixel and describes the spatial feature of the pixel using the longest and shortest diameters of this area. Zhang et al. [13] extended this representation by counting the number of pixels on the searching lines, resulting in a simplified one-dimensional spatial feature named the pixel shape index (PSI). 3D wavelet analysis was adopted as an indicator of urban complexity, as it can describe the spatial variation in the wavelet domain. A further extension from Huang and Zhang [19] considered a multi-scale approach, which varied the window size and decomposition level for more reliable spatial representation. Considering the scale problem of the VHR images and the special characteristics of the mean-shift vector [20], Qin [21] proposed a mean-shift vector-based shape feature (MSVSF), which aimed to differentiate the spatial patterns between concrete roofs and roads. Higher classification accuracy than LWEA and PSI was reported using MSVSF on the tested dataset. Pesaresi and Benediktsson [22] computed a set of grey-scale morphological reconstruction operations on the remote sensing images by varying the size of the structuring elements to build the DMP, which demonstrated effective improvements for classifying VHR remote sensing data [23].

Related Works in Classification and Data Interpretation Using Height Information
The aforementioned spatial features in Section 1.1 were proposed in a 2D context, where the 2D spatial pattern of the images is the major concern. A few researchers reported that integrating a third dimension (height information) could significantly improve the classification results [24,25]. This provided an interesting study path; however, accurate and usable height data were usually expensive to obtain. The idea of using a Digital Surface Model (DSM) (height information) for remote sensing interpretation has recently been popularized by advances in dense matching algorithms, which produce relatively reliable DSMs from photogrammetric images. Several research works were devoted to using the image-derived 3D information for change detection in 3D [26][27][28], which provided more reliable results. However, integrating the DSM for remote sensing data classification has not been fully investigated. Only a few works have directly or indirectly reported preliminary studies. Huang et al. [25] computed the GLCM feature, max-min values and variations of the DSM under a pixel-based classification framework and reported a 12.3% accuracy improvement. Qin et al. [29] combined the orthophoto and DSM for supervised classification under a change detection framework and reported over 90% overall accuracy. Qin and co-workers [30] adopted the DSM for supervised building detection, and the DSMs of a time series were used to perform spatiotemporal inference for enhancing the building detection accuracy. A recent work from Gu et al. [31] proposed a multi-kernel learning model for fusing the spectral, spatial and height information, with each kernel function designed according to different features. These kernels were then linearly combined and optimized with a conventional SVM for optimal performance. Most of these methods mainly considered the height information derived from LiDAR (light detection and ranging), and the performance of using image-derived DSMs for UHR data was not fully evaluated.

The Proposed Spatial Feature and Object-Based Classification
In this work, we consider the problem of land cover classification for UHR remote sensing orthophotos with DSMs. As compared to the classification task in VHR data, the objects of interest in UHR data have even larger variation in scale. The increased level of detail leads to a higher computational load for pixel-based classification, as well as more ambiguities in the spectral signature of the urban objects. To address these issues, we propose a dual morphological top-hat profile (DMTHP), which makes use of top-hat by reconstruction and by erosion, to extract spatial features both from the orthophoto and DSM. Considering the high dimensional feature space of the conventional morphological profiles, the sizes of the structuring elements are estimated adaptively using the training data. This avoids exhaustive morphology computation over a set of sizes with regular intervals, which consequently reduces the dimensions of the feature space. A modified synergic mean-shift segmentation method is applied for object-based classification, making full use of the DSM and radiometric values of the orthophoto. In this paper, we aim to address the following issues and gaps: (1) Few studies address the classification problem at the ultra-high resolution level, mainly due to the high spectral ambiguity and large perspective distortion. We incorporate 3D information to improve the traditional land cover classification problem and investigate its accuracy potential in ultra-high resolution data. (2) 2D spatial features are used to enhance the classification results. We aim to develop an effective and computationally-efficient spatial feature that can be applied to the 3D information, for achieving higher accuracy than traditional spatial features. (3) The existing research works lack quantitative evaluation of the major spatial features and their performance on the 3D information. We aim to provide such comparative studies in the course of the presentation of our novel 3D spatial feature.
The remainder of the paper is organized as follows: Section 2 introduces the proposed DMTHP features, the segmentation procedure and the object-based classification using the random forest (RF) classifier. In Section 3, two experiments using UHR aerial data with a 9-cm GSD (ground sampling distance) and UAV data with a 5-cm GSD are performed and quantitative results are presented. A qualitative experiment is also performed over the whole Vaihingen dataset [32], which demonstrates the scalability of the proposed classification procedure. In Section 4, we validate our method by comparing it to other existing spatial features that contain height information and discuss the uncertainties, errors, accuracies and performance of our method. Section 5 concludes the paper by highlighting our contribution and the pros and cons of the proposed method.

Dual Morphological Top-Hat Profiles with Adaptive Scale Estimation
Mathematical morphology is regarded as a powerful tool for image processing. It was first used on binary images for shape analysis and later extended to grey-scale image analysis [22]. The opening and closing operations describe the spatial relations and provide the shape information of the image content in a local area. One of the most useful operations is morphological reconstruction, where an image $I$ can be reconstructed as $B_{I,J}$ from a marker image $J$ by finding the maxima of $I$ that are marked by $J$. $J$ is no greater than $I$ pixel-wise and is usually derived from $I$ by an erosion operation with a structuring element (SE) $e$ [17]. Intuitively, the reconstruction process inherently represents the structural characteristics of the image subject to a structuring element, thus being $B_{I,\varepsilon(I,e)}$, where $\varepsilon(I,e)$ is the grey-level morphological erosion, with $e(x)$ denoting the set of pixels covered by $e$ centred at pixel $x$:
$$\varepsilon(I,e)(x) = \min_{y \in e(x)} I(y).$$
Its dual form, where the marker image is the morphological dilation using an SE $e$, is denoted as $B^{*}_{I,\delta(I,e)}$, where $\delta(I,e)$ is the grey-scale morphological dilation:
$$\delta(I,e)(x) = \max_{y \in e(x)} I(y).$$
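The erosion, dilation and reconstruction operations described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, a square SE of half-width `radius` stands in for whatever SE shape is actually used, and the reconstruction is the simple iterate-until-stable variant rather than an optimized algorithm.

```python
import numpy as np


def _shifted_views(img, radius, pad_value):
    """Return every shifted copy of img seen through a (2r+1) x (2r+1) square SE."""
    img = np.asarray(img, dtype=float)
    padded = np.pad(img, radius, mode="constant", constant_values=pad_value)
    h, w = img.shape
    size = 2 * radius + 1
    return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]


def erode(img, radius):
    """Grey-scale erosion: minimum of the image under the SE."""
    return np.minimum.reduce(_shifted_views(img, radius, np.inf))


def dilate(img, radius):
    """Grey-scale dilation: maximum of the image under the SE."""
    return np.maximum.reduce(_shifted_views(img, radius, -np.inf))


def reconstruct(marker, mask):
    """Reconstruction by dilation: grow the marker under the mask until stable."""
    mask = np.asarray(mask, dtype=float)
    current = np.minimum(marker, mask)
    while True:
        grown = np.minimum(dilate(current, 1), mask)
        if np.array_equal(grown, current):
            return current
        current = grown


# Toy image: a 3 x 3 plateau of height 4 on flat ground.
image = np.zeros((7, 7))
image[2:5, 2:5] = 4
kept = reconstruct(erode(image, 1), image)     # the SE fits inside the plateau
removed = reconstruct(erode(image, 2), image)  # the SE is larger than the plateau
```

In this toy case the plateau is fully restored when the SE fits inside it and vanishes entirely when the SE is larger, which is the scale-selective behaviour the profiles in the following sections rely on.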

Morphological Profiles
Taking these two types of morphological reconstruction, Pesaresi and Benediktsson [17] proposed to use a set of SEs with different sizes to construct morphological profiles (MP) with a multi-scale property for remote sensing image segmentation, denoted as:
$$\mathrm{MP}(I) = \{ B_{I,\varepsilon(I,e_1)}, B_{I,\varepsilon(I,e_2)}, \ldots, B_{I,\varepsilon(I,e_n)}, B^{*}_{I,\delta(I,e_1)}, B^{*}_{I,\delta(I,e_2)}, \ldots, B^{*}_{I,\delta(I,e_n)} \},$$
where $B_{I,\varepsilon(I,e_i)}$ and $B^{*}_{I,\delta(I,e_i)}$ are the reconstructions of the image $I$ from its erosion and dilation by $e_i$, and $\{e_i\}$ is a sequence of SEs with different sizes (scales), in the form of regularly-spaced granularity. A derived form, named the differential morphological profile (DMP), is defined as the differential of the profiles:
$$\mathrm{DMP}(I) = \{\, |B_{I,\varepsilon(I,e_{i+1})} - B_{I,\varepsilon(I,e_i)}|,\ |B^{*}_{I,\delta(I,e_{i+1})} - B^{*}_{I,\delta(I,e_i)}| \,\}_{i=1}^{n-1}.$$
This is effective for sensing the structural differences of the image content at different scales. MPs and DMPs were tested by Benediktsson et al. [23] in the classification of high resolution remote sensing images of urban areas. Tuia et al. [33] examined a set of morphological operators for classifying panchromatic images and demonstrated the effectiveness of the MPs. Each set of operators may delineate particular classes, but the operators cannot fully exploit the class information or compensate for each other. Moreover, because morphological reconstruction avoids discontinuities, redundancies are retained across different scales in the MPs.
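To make the multi-scale behaviour concrete, the following sketch builds the opening half of a morphological profile and its differential on a 1D signal. It is a simplified illustration, not the cited authors' implementation: the helper names are hypothetical, the SE is a flat 1D window, the closing (dual) half is omitted, and treating the original signal as scale 0 of the profile is an assumption made here so that the first differential is defined.

```python
import numpy as np


def erode1d(x, r):
    """Flat 1D erosion with a window of half-width r."""
    p = np.pad(x, r, constant_values=np.inf)
    return np.min([p[k:k + x.size] for k in range(2 * r + 1)], axis=0)


def dilate1d(x, r):
    """Flat 1D dilation with a window of half-width r."""
    p = np.pad(x, r, constant_values=-np.inf)
    return np.max([p[k:k + x.size] for k in range(2 * r + 1)], axis=0)


def open_by_reconstruction(x, r):
    """Erode by r, then grow the result back under x until it is stable."""
    current = erode1d(x, r)
    while True:
        grown = np.minimum(dilate1d(current, 1), x)
        if np.array_equal(grown, current):
            return current
        current = grown


def morphological_profile(x, radii):
    """Opening-by-reconstruction responses, one row per scale."""
    return np.stack([open_by_reconstruction(x, r) for r in radii])


def differential_profile(x, radii):
    """Absolute differences between consecutive scales (scale 0 = x itself)."""
    levels = np.vstack([x[None, :], morphological_profile(x, radii)])
    return np.abs(np.diff(levels, axis=0))


# A plateau of width 3 (height 3) and a spike of width 1 (height 7).
signal = np.array([0, 0, 3, 3, 3, 0, 0, 0, 7, 0], dtype=float)
dmp = differential_profile(signal, radii=[1, 2])
```

Each structure responds only at the scale where it disappears: the spike shows up in the first differential and the plateau in the second, which is why the DMP is sensitive to object size.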

Morphological Top-Hat Profiles
Morphological top-hat (MTH) is defined as the peaks of an image grid, computed by morphological operations. Intuitively, the top-hats can detect the spatial blobs of the image grid. Such spatial blobs can be used effectively to represent urban structures, such as buildings, cars and shadows cast by buildings. Huang and Zhang [34,35] applied the multi-directional morphological top-hat transform to detect buildings and shadows in panchromatic images. They assumed that buildings dominate the bright regions and that the shadows cast by the buildings appear as dark blobs. Qin and Fang [36] indicated that the height of a blob is a better index for representing buildings. They applied the top-hats to the DSMs and combined them with NDVI (normalized difference vegetation index) filtering to perform accurate building detection.
The success of top-hats in detecting urban objects inspired the idea of building morphological top-hat profiles to deal with multi-scale urban objects in UHR remote sensing data. We consider two types of morphological top-hats, (1) top-hat by reconstruction (THR) and (2) top-hat by erosion (THE), which can be simply defined as follows:
A. Top-hat by reconstruction: $\mathrm{THR}(I,e) = I - B_{I,\varepsilon(I,e)}$
B. Top-hat by erosion: $\mathrm{THE}(I,e) = I - \varepsilon(I,e)$
where $I$ is the image grid, $\varepsilon(I,e)$ is its grey-scale erosion by the SE $e$ and $B_{I,\varepsilon(I,e)}$ is the reconstruction of $I$ from that erosion. THR is effective at detecting the peaks of an image grid. However, one drawback of the THR is that it cannot highlight off-terrain objects on a slanted surface connecting the top of the blobs, since the morphological reconstruction process finds the peaks globally over the image grid. Qin and Fang [36] partly addressed this problem by blocking the connections made by vegetation using NDVI before THR. In contrast, THE simply detects the local height extreme subject to the given SE. It can detect the off-terrain objects on a slanted surface, whereas it produces errors for terrain objects. Figure 1 shows an example, where THR and THE are computed using a disk-shaped SE. The THR (Figure 1c) effectively detects the off-terrain objects and, at the same time, maintains good separability of the terrain classes (road, ground). THE is able to highlight the local maximum under the area marked by the SE, but results in errors in the terrain objects (e.g., roads). The area in Figure 1 marked with a red circle shows a building that connects to the adjacent road, which has a similar height. The THR (shown in Figure 1c) misses a part of the building, while THE detects this part. These two top-hats can be used to compensate for each other. As the sizes of the urban objects vary a lot, we consider a multi-scale approach. A series of SEs $\{e_i\}$ with different sizes is used to construct the DMTHP:
$$\mathrm{DMTHP}(I) = \{\mathrm{THR}(I,e_1), \ldots, \mathrm{THR}(I,e_n), \mathrm{THE}(I,e_1), \ldots, \mathrm{THE}(I,e_n)\}.$$
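The complementary behaviour of the two top-hats can be illustrated on a 1D height profile in which a building is connected to the ground by a ramp. The sketch below is a toy example under stated assumptions: the function names are hypothetical, a flat 1D window replaces the disk-shaped SE, and THE is taken to be the signal minus its erosion, which matches the description of THE as a local height extreme but is an assumption about the exact definition.

```python
import numpy as np


def erode1d(x, r):
    """Flat 1D erosion with a window of half-width r."""
    p = np.pad(x, r, constant_values=np.inf)
    return np.min([p[k:k + x.size] for k in range(2 * r + 1)], axis=0)


def dilate1d(x, r):
    """Flat 1D dilation with a window of half-width r."""
    p = np.pad(x, r, constant_values=-np.inf)
    return np.max([p[k:k + x.size] for k in range(2 * r + 1)], axis=0)


def reconstruct_from_erosion(x, r):
    """Erode by r, then grow the result back under x until it is stable."""
    current = erode1d(x, r)
    while True:
        grown = np.minimum(dilate1d(current, 1), x)
        if np.array_equal(grown, current):
            return current
        current = grown


def thr(x, r):
    """Top-hat by reconstruction."""
    return x - reconstruct_from_erosion(x, r)


def the(x, r):
    """Top-hat by erosion (assumed here to be the signal minus its erosion)."""
    return x - erode1d(x, r)


def dmthp(x, radii):
    """Stack THR and THE responses over all scales, one row per response."""
    return np.stack([thr(x, r) for r in radii] + [the(x, r) for r in radii])


# Ramp (indices 0-3) leading up to a roof of height 4 (indices 4-7), then ground.
profile = np.array([0, 1, 2, 3, 4, 4, 4, 4, 0, 0], dtype=float)
thr_2 = thr(profile, 2)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
the_2 = the(profile, 2)  # [0, 1, 2, 2, 2, 1, 4, 4, 0, 0]
features = dmthp(profile, radii=[1, 2])
```

In this toy profile THR leaves the ramp at zero but understates the roof, because the reconstruction climbs the ramp; THE recovers the full roof height next to the drop but also responds on the ramp itself. Stacking both keeps the evidence from each.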

Adaptive Scale Estimation
The multi-scale SEs are effective at describing the spatial differences of objects with different sizes. However, as indicated by Benediktsson et al. [23], a major drawback of such a strategy is the high computational cost of classification with high dimensional features. To reduce the dimension, they only use the two elements with the maximal responses for training and classification, but the morphological reconstruction still needs to be computed at every scale, which is considerably time consuming. Moreover, the morphological top-hats may contain redundancies, since the results are closely related to the scale of the objects. Figure 2 shows an example of the THR and THE profiles of different classes. It can be seen that the profiles of each sample urban object show different patterns. However, large redundancies can be observed, both in the profile bars and in the THR and THE maps in the last two rows. The THR and THE maps in the last three columns remain similar. In the bar figure of the car, some values at different scales stay the same. This is due to the fact that the scales of different urban classes lie within a certain range, so a fixed scale interval may not contribute effectively to distinct features for classification. Only the values near the discontinuities, which correspond to the scales of the different urban objects, are useful for constructing distinct features. Since for most classification tasks the training samples are selected as representatives of the urban objects, it is feasible to estimate the scale bounds of different urban classes based on the training samples.
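The scale $r(S)$ of each training segment $S$ is derived from $R_x(S)$ and $R_y(S)$, the ranges of the segment in the x and y directions, respectively. We compute the upper bound of the scales in each class $C_i$ as:
$$R(C_i) = \max_{S \in C_i} r(S).$$
The $R(C_i)$ are sorted, and adjacent scales whose distance to each other is smaller than a threshold $T = 80$ pixels are clustered. The final estimated scales are denoted as $\{\hat{e}_k\}_{k=1}^{m}$. Thereby, the DMTHP of an image $I$ can be reformulated as:
$$\mathrm{DMTHP}(I) = \{\mathrm{THR}(I,\hat{e}_1), \ldots, \mathrm{THR}(I,\hat{e}_m), \mathrm{THE}(I,\hat{e}_1), \ldots, \mathrm{THE}(I,\hat{e}_m)\}.$$
In our experiment, to ensure a numerically-equivalent contribution for each element of the feature, we normalize each dimension of the feature to [0,1] across the whole dataset.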
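The scale estimation steps can be sketched as follows. This is a minimal sketch with several labeled assumptions: the segment scale is taken as the larger of the two bounding-box ranges, each cluster of nearby scales is represented by its mean, and all function names and the sample extents are hypothetical. Only the sort-then-cluster structure and the 80-pixel threshold come from the text.

```python
def segment_scale(range_x, range_y):
    """Scale of one training segment.

    Assumption: the larger of the x and y ranges; the exact definition
    used for the experiments may differ.
    """
    return max(range_x, range_y)


def class_upper_bounds(samples):
    """Largest segment scale observed among the training samples of each class."""
    return {
        label: max(segment_scale(rx, ry) for rx, ry in extents)
        for label, extents in samples.items()
    }


def cluster_scales(bounds, threshold=80):
    """Sort the class bounds and merge neighbours closer than the threshold."""
    ordered = sorted(bounds)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if threshold > value - clusters[-1][-1]:
            clusters[-1].append(value)   # close to the previous scale: same cluster
        else:
            clusters.append([value])     # far enough: start a new cluster
    # Assumption: each cluster is represented by the mean of its members.
    return [sum(c) / len(c) for c in clusters]


# Hypothetical training segments, given as (range_x, range_y) in pixels.
training = {
    "car": [(40, 20), (50, 25)],
    "tree": [(90, 100)],
    "building": [(250, 200), (220, 260)],
    "road": [(300, 60)],
}
bounds = class_upper_bounds(training)          # car 50, tree 100, building 260, road 300
scales = cluster_scales(bounds.values(), 80)   # [75.0, 280.0]
```

Four class bounds collapse to two scales here, so the profile would need two SE sizes instead of a full regular sequence.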

Height-Assisted Synergic Mean-Shift Segmentation
The UHR data reveal a high level of detail of the ground objects. Therefore, it is necessary to adopt object-based analysis to reduce the computational complexity. To make full use of the spectral and height information, we adopt a height-assisted segmentation that employs both the DSM and color images, as proposed in Qin et al. [29]. The segmentation method is essentially a synergic mean-shift (MS) segmentation [20,37], which applies MS segmentation and, at the same time, constrains the segment boundaries using a weight map that gives the probability of each pixel being an object boundary. In this method, the weight map is defined as the Canny magnitude [38] of the DSM. In each iteration of the segmentation, the synergic segmentation prevents the MS procedure from going beyond pixels with a high boundary probability. In the classic MS segmentation, there are two major parameters, the spatial bandwidth $h_s$ and the spectral bandwidth $h_r$, which control the spatial proximity and spectral similarity for the segmentation procedure [20]. In addition, the synergic MS segmentation has another parameter $a$ that controls the weight of the additional edge constraint, which in our context is derived from the DSM. Figure 3 shows an example of the synergic MS segmentation. It can be seen in the area outlined by the red circle that the height-assisted synergic MS segmentation is able to break the segments at height jumps. This provides more accurate segments for further training and classification. In our experiments, $h_s = 7$, $h_r = 4$ and $a = 0.1$ are set as constants, as suggested in [29]. Due to the DSM constraint, $h_r = 4$ is set as a relatively large value to reduce the effect of over-segmentation.
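As a rough illustration of the boundary weight map, the sketch below derives a per-pixel boundary strength from the DSM. It is a stand-in under stated assumptions rather than the method used in the paper: a plain finite-difference gradient magnitude, scaled to [0, 1], replaces the Canny magnitude, and the function name is hypothetical. The mean-shift procedure itself is not reproduced.

```python
import numpy as np


def boundary_weight_map(dsm):
    """Gradient magnitude of the DSM, scaled to [0, 1].

    Assumption: a simple finite-difference gradient stands in for the
    Canny magnitude; high values mark likely object boundaries.
    """
    grad_rows, grad_cols = np.gradient(np.asarray(dsm, dtype=float))
    magnitude = np.hypot(grad_cols, grad_rows)
    peak = magnitude.max()
    return magnitude / peak if peak > 0 else magnitude


# Toy DSM: flat ground on the left, a 10 m roof on the right.
dsm = np.zeros((5, 6))
dsm[:, 3:] = 10.0
weights = boundary_weight_map(dsm)
```

The weight is 1 along the height jump and 0 on the flat ground and roof, so a segmentation that refuses to cross high-weight pixels would split the roof from the ground even if their colors were similar.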

Classification Combining the Spectral and DMTHP Features
The random forest (RF) classifier [39] is widely used for classifying features with hierarchical characteristics. Qin et al. [29] have demonstrated that RF performs better than the support vector machine (SVM) when combining the height and spectral information for classification, since the two sources do not contribute linearly to the final feature vector. In our experiment, RF is applied to classify the feature vectors constituted by the spectral and DMTHP features. RF is essentially an ensemble learning method built on decision tree classifiers. The advantages of this method are the improved accuracy due to the voting strategy of multiple decision trees and the hierarchical examination of the feature elements, which are particularly useful for features constructed from different sources.
Since the spectral feature is the major driving force for the classification, we adopt the principal component analysis (PCA) transformation of the image color bands as the spectral information, since it has been proven to give better classification accuracy [13]. It maximizes the variance of the spectral direction to increase the independence of each band. The DMTHP feature is applied to both the orthophoto and DSM, as the morphological top-hats on both data sources return useful information, as reported by Huang and Zhang [35] and Qin and Fang [36]. For the orthophoto, we apply the DMTHP to the brightness and darkness images, which are defined as the first component of the PCA transform and its inverse image, being effective for describing bright blobs and dark blobs at different scales. For the height information, the DMTHP is directly applied to the DSM. We finally concatenate these features in a vector-stack fashion to perform the random forest classification. This feature extraction procedure is described in Figure 4.
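The feature assembly described above can be sketched as follows. This is a per-pixel simplification under stated assumptions, not the paper's implementation: the function names are hypothetical, `profile_fn` is a placeholder for any multi-scale profile such as the DMTHP, the darkness image is taken as the maximum brightness minus the brightness, the sign of the first principal component is fixed by a convention chosen here, and the per-segment aggregation used for object-based classification is omitted.

```python
import numpy as np


def pca_components(bands):
    """Project an (H, W, B) band stack onto its principal axes (B >= 2)."""
    h, w, b = bands.shape
    flat = bands.reshape(-1, b).astype(float)
    centered = flat - flat.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    axes = eigvecs[:, np.argsort(eigvals)[::-1]]  # largest variance first
    # Assumed sign convention: the first axis points towards brighter pixels.
    if 0 > axes[:, 0].sum():
        axes[:, 0] = -axes[:, 0]
    return (centered @ axes).reshape(h, w, b)


def normalize01(features):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (features - lo) / span


def stack_features(bands, dsm, profile_fn):
    """Concatenate PCA bands with profiles of brightness, darkness and DSM."""
    comps = pca_components(bands)
    bright = comps[..., 0]
    dark = bright.max() - bright  # assumed form of the inverse image
    layers = [comps[..., k] for k in range(comps.shape[-1])]
    for source in (bright, dark, np.asarray(dsm, dtype=float)):
        layers.extend(profile_fn(source))
    return normalize01(np.stack([layer.ravel() for layer in layers], axis=1))


# Toy data and a placeholder two-layer profile in place of the real DMTHP.
rng = np.random.default_rng(0)
toy_bands = rng.random((4, 5, 3))
toy_dsm = rng.random((4, 5))
toy_profile = lambda img: [img, img ** 2]
features = stack_features(toy_bands, toy_dsm, toy_profile)
```

With three bands and a two-layer placeholder profile, each pixel receives 3 + 3 x 2 = 9 features; the real dimensionality depends on the number of estimated scales.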

Experimental Setup
Our experiments mainly target UHR remote sensing data, and the purpose of the experiments is to (1) test our proposed feature on spatially-varying urban objects and (2) evaluate the accuracy potential of UHR data using the classic land cover classification paradigm. This section contains three experiments on aerial and UAV images, with associated DSMs derived from dense image matching techniques. The first experiment is performed on a small test area from the Vaihingen dataset [32], with a 9-cm GSD. The second experiment is performed on a dataset generated using UAV images, with a GSD of 5 cm. Both Experiments 1 and 2 are quantitatively evaluated against the ground truth. The third experiment applies the proposed classification flow to the whole Vaihingen dataset to qualitatively demonstrate the scalability of the classification procedure.
The RF classifier is adopted for the classification, and 500 decision trees are used for the training procedure; the number of variables for classification is computed as the square root of the feature dimension. In our experiment, the feature dimension is 12 and 15 for Experiments 1 and 2, respectively. A large part of the test datasets is manually labeled as reference data. Around 1 percent of the marked labels are randomly selected (as shown in Table 1), and a 5-fold cross-validation (CV) process is applied to ensure the statistical robustness of the proposed features, by eliminating the possibility of sample-dependent results. The numbers of training samples and test samples are listed in Table 1.
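The two protocol details above, the number of variables tried per split and the 5-fold split, can be sketched with the standard library alone. The function names are hypothetical, rounding the square root down is an assumption (the text does not state the rounding), and the shuffling seed is arbitrary.

```python
import math
import random


def variables_per_split(feature_dim):
    """Square root of the feature dimension.

    Assumption: rounded down, with a minimum of one variable.
    """
    return max(1, math.isqrt(feature_dim))


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


mtry_exp1 = variables_per_split(12)  # 3 under the round-down assumption
mtry_exp2 = variables_per_split(15)  # 3 under the round-down assumption
splits = list(k_fold_indices(23, k=5))
```

Every sample is held out exactly once across the five folds, so the averaged accuracy does not depend on one particular choice of test samples.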

Experiment with Test Dataset 1
The first test dataset and the resulting classification map are shown in Figure 5. It can be seen that the scene is composed of rather complex objects, with buildings varying in shapes and sizes and cars distributed in different areas with different densities. The orthophoto and DSM are derived from the aerial images by the INPHO 5.3 software, using multiple feature matching. The zoomed-in image (Figure 5c) shows the ground objects in a very high level of detail, and the color-coded DSM shows that the relief differences of the small parked cars are well captured. The false-color orthophoto (shown in Figure 5a) contains a near-infrared band, which is effective for distinguishing vegetation and concrete. Therefore, both the spectrum and DSM constitute rather informative sources for classifying the complicated and detailed scene. Table 2 shows the statistics of the classification against the ground truth. The proposed method has achieved 93% overall accuracy. The buildings and roads are well distinguished from each other and have over 90% classification accuracy. More than 80% of cars are correctly classified in the classification map.

Experiment with Test Dataset 2
We include a second dataset from a different data source, as shown in Figure 6. The orthophoto and DSM are generated from a UAV mission, as described in [9,40], with a 5-cm GSD. The DSM is generated using a hierarchical semi-global matching algorithm [41,42]. The slightly increased resolution of the dataset brings some challenges. A major challenge of this dataset is that the colors of the roof and the ground are very similar to each other, with the roads directly connected to the parking lots. Therefore, in this experiment, we consider such ground as roads. Moreover, the roof tops occupy a large part of the image content, with an area similar to or even larger than that of the ground, and this might make it difficult for the sequence of the MTH to identify the correct object class.
Figure 6d and Table 3 show the resulting classification map and the associated statistics. We have obtained more than 94% overall accuracy. Over 96% of buildings and roads are correctly identified, and 83% of the cars in the parking lots are correctly detected, which could facilitate the application of urban infrastructure management. The classification accuracy of the vegetation is lower than in Experiment 1, and this is expected, since we do not have the near-infrared information in this experiment.
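For readers less familiar with the accuracy figures quoted in this section, the sketch below shows how overall accuracy and per-class accuracy are commonly computed from a confusion matrix. The counts are entirely hypothetical and are not taken from Tables 2 or 3; the per-class figure shown is the fraction of reference samples of each class that were labeled correctly, which is an assumption about how the per-class percentages were derived.

```python
def overall_accuracy(confusion):
    """Fraction of all samples on the diagonal (rows = reference, cols = predicted)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def per_class_accuracy(confusion):
    """Fraction of each reference class that was labeled correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical counts for three classes (e.g., building, road, car).
confusion = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 80],
]
oa = overall_accuracy(confusion)          # 250 correct out of 300 samples
class_acc = per_class_accuracy(confusion)  # [0.9, 0.8, 0.8]
```

A class with few samples, such as cars, contributes little to the overall accuracy, which is why the per-class figures are reported alongside it.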

Test Dataset 3
The quantitative experiments have shown the advantages of our proposed features against the tested methods. However, it is crucial to test the proposed method over a large dataset. In this experiment, we applied our method to the whole Vaihingen dataset, which is 20,000 × 20,000 pixels in size. However, due to the lack of reference data for such a highly-detailed classification task, we only evaluate the whole dataset qualitatively by visual comparison, and the quantitative study of this whole dataset is still in progress. The training samples are around 0.1 percent of the total number of segments. We have selected two representative areas, as shown in Figure 7. The visual comparison of these two areas to the orthophoto shows that the classification maps describe the scene well, including small plants and vehicles. A small number of misclassifications occur on the road in the upper left part of the scene, where the road is classified as cars. This might be caused by the sample selection and the image matching errors.

Comparative Studies and Validations
To illustrate the effectiveness of the proposed DMTHP feature, we compare it to other spatial features with height information, and these include DMP, mean height information and nDSM (normalized DSM). The DMP is applied in the same manner as the DMTHP feature described in Figure 4, which computes the differential morphological profiles of the brightness image, the darkness image and the DSM. The size of the structuring elements ranges from 10 pixels to 300 pixels, with a regular interval (this interval is fixed as 30 pixels in our experiment), which results in a 30-element spatial feature. The nDSM is very effective at representing the off-terrain objects and is usually computed by subtracting the DTM (digital terrain model) from the DSM. Since we do not have a separate DTM for this area, we adopt the morphological top-hat by reconstruction as an estimation, which has been widely used as an approximation of the nDSM [36]. The comparative studies of Experiments 1 and 2 are quantitatively performed, with the resulting classification maps and associated accuracies presented in Figures 8 and 9 and Tables 4 and 5.
Figure 8 shows the classification results using four different features associated with the DSM, and Table 4 lists their classification accuracies, where the cross-validation (CV) accuracy is positively correlated with the OA. It can be seen that the classification result using spectral information alone (Figure 8a) contains many classification errors, which mainly occur in the building and road/ground classes, as these classes have very similar spectral information. The incorporation of the DSM renders better results. However, there are still misclassifications between the roof tops and the ground (e.g., shown in the black circle in Figure 8b-d). nDSM and DMP have obtained better accuracy, as they reveal the spatial structure of the image. Among the tested features, the proposed DMTHP with adaptive radius selection achieved the best OA, while the OA of DMTHP with a regular interval radius is slightly lower, as the adaptive radius selection procedure can avoid redundant information when constructing the feature vector.
Figure 9 shows the resulting classification maps, and Table 5 lists the associated classification accuracies. As expected, we observe low classification accuracy when using the spectral information alone (Figure 9a), as a large roof top is classified as concrete ground. The DMP in this experiment produces a better OA than the nDSM. The DMTHP with adaptive radius selection has obtained the best OA, 2.78% higher than the regular interval DMTHP. The large improvement is due to the fact that the regular radius sequence is fixed between 10 and 300 pixels, while the adaptive radius selection procedure correctly estimated the granularity of the building roofs and the other objects.
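The stated feature length can be checked with a short calculation. The sketch below shows one reading that is consistent with the 30-element figure, assuming one profile element per scale and per data source; the function name is hypothetical and the exact bookkeeping of the DMP elements in the experiments may differ.

```python
def regular_scales(start, stop, step):
    """SE sizes from start to stop (inclusive) at a fixed interval, in pixels."""
    return list(range(start, stop + 1, step))


scales = regular_scales(10, 300, 30)         # 10, 40, ..., 280
sources = ("brightness", "darkness", "DSM")  # the three inputs of Figure 4
# Assumption: one profile element per scale and per source.
feature_length = len(scales) * len(sources)
```

Under this reading, 10 scales on 3 sources give the 30 elements quoted for the regular-interval DMP.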

Uncertainties, Errors, Accuracies and Performance
Both of the comparative studies have demonstrated that the consideration of DSM information significantly increases the classification accuracy, and that our proposed DMTHP feature outperforms the tested methods. In the best case, as shown in Table 5, our proposed method has obtained an overall accuracy of 94%, in comparison with 64% for the traditional method that only adopts spectral information. In this case, the resulting accuracy can satisfy the general need for accurate land cover mapping and object identification.
The incorporation of the DSM in the classification procedure using our proposed DMTHP feature has obtained satisfactory results. However, as with any classification task, the major uncertainties that affect the resulting accuracy are the selection of samples, as well as the data quality. In our case, the accuracy of the DSM is a critical factor. For small objects, such as cars and public facilities (benches on the street), the matching algorithm may fail to capture their elevation, which consequently causes a wrongly-described DMTHP feature on these objects. Therefore, the selection of the samples should be random, but should also take such matching failures into account.
The proposed DMTHP feature is based on morphological top-hat reconstruction, which may require a high computational load for feature extraction. The adaptive scale selection strategy proposed in Section 2.1.3 can effectively reduce the computational time. In particular, we have recorded the running time of the DMTHP with and without adaptive scale selection, as shown in Table 6. The implementation runs on a normal PC, with the mixed use of MATLAB and the C++ programming language. The running time comparison in Table 6 is performed under the same conditions. In Experiment 1, the adaptive DMTHP has a significantly shorter running time in the feature extraction, training and classification stages, which is mainly attributed to the reduced feature dimensions. The overall running time of the adaptive DMTHP is shorter than the others, while the feature extraction time is longer, which may be caused by a larger radius estimated by our adaptive radius selection procedure.

Conclusions
In this work, we have proposed a dual morphological top-hat profile (DMTHP), which extracts spatial features from the orthophoto and the DSM. A simple and adaptive scale selection strategy that determines the granularity sequences of the profiles is suggested to effectively reduce the computational cost, as well as the dimensionality of the feature. We have further applied the proposed feature to the problem of land cover classification on UHR remote sensing images combined with the associated DSM, aiming to interpret urban objects at a sub-building level (such as cars). The random forest classifier was adopted under an object-based scenario, in which the segmentation was performed using the advanced synergic mean-shift algorithm, combining both images and DSM.
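The idea behind a dual top-hat profile can be illustrated on a one-dimensional height profile. The following is only a minimal sketch and not the implementation used in this work: it assumes that top-hat by reconstruction (THR) is the signal minus its opening by reconstruction, and that top-hat by erosion (THE) is the signal minus its erosion. The function names, the toy height values and the radii are hypothetical.

```python
def erode(signal, r):
    """Flat erosion: minimum over a window of half-width r (clipped at borders)."""
    n = len(signal)
    return [min(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(signal, r):
    """Flat dilation: maximum over a window of half-width r (clipped at borders)."""
    n = len(signal)
    return [max(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def reconstruct(marker, mask):
    """Reconstruction by dilation: grow the marker under the mask until stable."""
    current = list(marker)
    while True:
        grown = [min(d, m) for d, m in zip(dilate(current, 1), mask)]
        if grown == current:
            return current
        current = grown


def top_hat_by_reconstruction(signal, r):
    """Assumed THR: signal minus its opening by reconstruction at radius r."""
    opened = reconstruct(erode(signal, r), signal)
    return [s - o for s, o in zip(signal, opened)]


def top_hat_by_erosion(signal, r):
    """Assumed THE: signal minus its erosion at radius r."""
    return [s - e for s, e in zip(signal, erode(signal, r))]


def dual_profile(signal, radii):
    """Stack THR and THE for every radius; returns one feature vector per sample."""
    layers = []
    for r in radii:
        layers.append(top_hat_by_reconstruction(signal, r))
        layers.append(top_hat_by_erosion(signal, r))
    return [list(column) for column in zip(*layers)]


# Toy height profile in meters: a 2-sample "car" and a 7-sample "building".
heights = [0, 0, 0, 1.5, 1.5, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0]
profile = dual_profile(heights, radii=[1, 4])

print(profile[3])   # car sample:      [1.5, 1.5, 1.5, 1.5]
print(profile[11])  # building centre: [0, 0, 8, 8]
```

In this toy case the small object responds already at the smallest radius, whereas the centre of the large object responds only once the structuring element exceeds its width, which is the kind of scale-dependent signature a profile of this type is meant to capture.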
The proposed method has been compared with several spatial features, and the comparison demonstrates that the proposed DMTHP together with the adaptive scale strategy obtained the best OA. The DMTHP with adaptive radius selection obtained 10% and 2% improvements in the OA compared with the well-known DMP feature in our two quantitative experiments. Compared with the classification using merely the orthophoto, the incorporation of the DSM has demonstrated a notable improvement. In the best case in our experiments, it increases the OA from 63.93% (spectral information only) to 94.48% (adaptive DMTHP).
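For readers less familiar with the metric, the OA is the fraction of correctly classified samples, i.e., the trace of the confusion matrix divided by its total. The sketch below uses two hypothetical three-class confusion matrices; the numbers are invented for illustration and are not taken from our experiments.

```python
def overall_accuracy(confusion):
    """OA = correctly classified samples (diagonal) / all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical confusion matrices (rows: reference class, columns: predicted class).
spectral_only = [[60, 25, 15],
                 [20, 70, 10],
                 [18, 12, 70]]
with_height = [[95, 3, 2],
               [4, 94, 2],
               [3, 2, 95]]

oa_spectral = overall_accuracy(spectral_only)
oa_height = overall_accuracy(with_height)
gain_points = 100 * (oa_height - oa_spectral)

print(f"OA (spectral only): {100 * oa_spectral:.2f}%")
print(f"OA (with height):   {100 * oa_height:.2f}%")
print(f"Gain: {gain_points:.2f} percentage points")
```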
A qualitative experiment using the whole Vaihingen dataset (20,000 × 20,000 pixels) has shown that the proposed method can be applied to large-scale datasets, and the visual comparison shows its promising potential.
In general, our contribution in this paper lies in the following aspects:
(1) We have presented a novel feature, the DMTHP with adaptive scale selection, to address the large-scale variation of urban objects in the UHR data, as well as to reduce the computational load and feature dimensionality. It obtained the best classification accuracy in comparison with existing features (2%-10% improvement over the well-known DMP feature and other height features).
(2) We have demonstrated that, in the best case, the proposed method improved the classification accuracy to 94%, as compared to 64% using only spectral information. This should draw the attention of land cover mappers to the use of height information in land cover classification tasks.
(3) A complete quantitative analysis of different UHR data with a 9-cm and a 5-cm GSD has been performed, with comparative studies on some of the existing height features. This provides valid insights for researchers working on 3D spatial features.
(4) We have performed a qualitative experiment with 20,000 × 20,000 pixels, which has shown that the proposed method can be used on a large-scale dataset to obtain very detailed land cover information.
Our experiments show that it is feasible to combine the orthophoto and DSM for urban object classification to obtain satisfactory accuracy. However, there are still some misclassifications in our experiments. These misclassifications mainly occur on large roofs with a similar color to the ground, as well as where the selected samples underrepresent the scene (as in the classification of a large scene, such as our qualitative experiment). Some errors are due to image matching errors, which fail to capture the elevation differences between the cars and the ground. Therefore, our future work will focus on a better fusion of the spectral and height information. Since our current version of the morphological top-hat is grey level based, morphological top-hat profiles based on color blobs will be considered to address these problems.

Figure 1. Results of top-hat by reconstruction (THR) and top-hat by erosion (THE) on a DSM of a test area (meters). (a) The orthophoto; (b) DSM; (c) THR; (d) THE. (The marked area is explained in the text.)

Figure 2. The THR and THE values at different scales for different urban classes in the ultra-high resolution (UHR) image. "r" is the size of the structuring element (SE) in pixels.

Given a set of training samples for classes $\{C_1\}, \{C_2\}, \ldots, \{C_n\}$, where each element of $\{C_i\}$ is a segment in the image space specifying a sample for class $C_i$, we compute its scale as:

Figure 3. An example of synergic mean-shift (MS) segmentation. (Left) Classic mean-shift segmentation; (middle) height-assisted synergic MS segmentation; (right) boundary probability map derived from the DSM. The interpretation of the red circle is in the text.

Figure 4. The proposed workflow for UHR image land cover classification. DMTHP, dual morphology top-hat profile.

Figure 5. (a) The orthophoto; (b) DSM; (c) zoom-images of a parking area; (d) reference data; (e) classification map of the proposed method.

Figure 6. (a) The orthophoto; (b) DSM generated from the UAV images; (c) reference data; (d) classification with the proposed method.

Figure 7. Qualitative experimental results on the whole Vaihingen dataset. Two zoom-in areas are shown for a visual comparison.

Figure 8. The classification maps of Test Dataset 1 using PCA combined with different spatial features. (a) Using spectral features only (PCA); (b) PCA + DMP; (c) PCA + height; (d) PCA + normalized DSM (nDSM); (e) PCA + DMTHP with a regular interval radius sequence; (f) PCA + DMTHP with adaptive radius selection (the black circles are explained in the text).

Table 1. Statistics of the training and test samples.

Table 6. Running time of the different steps of the classification. DMP, differential morphological profile.