Terrain Extraction in Built-Up Areas from Satellite Stereo-Imagery-Derived Surface Models : A Stratified Object-Based Approach

Very high spatial resolution (VHSR) stereo-imagery-derived digital surface models (DSM) can be used to generate digital elevation models (DEM). Filtering algorithms and triangular irregular network (TIN) densification are the most common approaches. Most filter-based techniques focus on image-smoothing. We propose a new approach which makes use of integrated object-based image analysis (OBIA) techniques. An initial land cover classification is followed by stratified land cover ground point sample detection, using object-specific features to enhance the sampling quality. The detected ground point samples serve as the basis for the interpolation of the DEM. A regional uncertainty index (RUI) is calculated to express the quality of the generated DEM in regard to the DSM, based on the number of samples per land cover object. The results of our approach are compared to a high resolution Light Detection and Ranging (LiDAR)-DEM, and a high level of agreement is observed—especially for non-vegetated and scarcely-vegetated areas. Results show that the accuracy of the DEM is highly dependent on the quality of the initial DSM and—in accordance with the RUI—differs between the different land cover classes.


Introduction
Very high spatial resolution (VHSR) stereo-imagery from airborne (resolution of a few centimeters) and space-borne (resolution of a few meters to sub-meter) sensor systems enable the generation of digital surface models (DSM) using digital photogrammetric approaches [1][2][3].For some applications and for further analysis steps, an additional DSM-derived digital elevation model (DEM) and a subsequently generated normalized DSM (nDSM; related to ground level) are desirable.Both the DSM and DEM are discrete digital models of the land surface with the following main distinction: a DSM represents the land surface with surface objects (e.g., buildings, vegetation), whereas a DEM represents the bare terrain surface.nDSMs are useful for the identification of building or vegetation heights, for example.Especially for the analysis of natural features and processes such as landslides [4,5], landforms [6], or man-made structures (e.g., open-pit mines), DEMs are preferred over DSMs, because physically-based models need to work with bare ground data [7].nDSMs are especially important in urban areas to serve purposes such as the derivation of the 3D properties of urban buildings, which represent the three-dimensional nature of living spaces and are needed in population estimation or urban planning [8][9][10].To meet the increasing demand for such detailed DEMs, robust and automated DEM production algorithms are required [11].
For DEM production from DSM data sets, methods are available for both the Light Detection and Ranging (LiDAR) and the stereo-imagery domain [12,13].In the LiDAR domain, techniques range from morphological and contour-based filtering to densification techniques, designed to remove the above-ground signal in point clouds.A comprehensive review is provided by Meng et al. [14].Only a few standardized approaches can be identified for the filtering of DSMs produced from satellite stereo-imagery; these are slope-based or morphological filtering and TIN densification [14,15].Continuous optimization and improvement of the aforementioned approaches are achieved through statistical learning methods such as artificial neural networks or genetic algorithms [16], adaptive filtering methods [17], locally fitted surfaces within the data [18], volume-based filtering methods [19], slope-dependent filtering [20], and multi-directional slope-dependent filtering [21] or slope-based region-building and subsequent interpolation [22,23].All of these techniques require the adjustment of a maximum filter size, a certain threshold value, or at least a high degree of awareness of the DEM application purpose [24].All in all, these approaches need a high degree of user-algorithm interaction, and the user should be experienced in producing DEMs.
Compared with LiDAR DSMs, satellite stereo-imagery DSMs have some disadvantages with respect to the generation of a DEM: (1) the spatial resolution of LiDAR-derived DSMs ranges from several centimeters up to a few meters.For commercial satellite stereo-imagery, a coarser spatial resolution of around 1 to 10 m is reached for the derived DSMs [11]; (2) due to different acquisition strategies, only surface and terrain models can be directly extracted from satellite stereo pairs, whereas with LiDAR multiple returns are obtained from different height levels (e.g., top of tree, tree branches, ground level) of the scanned surface, enabling the computation of not only surface and ground level, but also intermediate levels.Intensity values of the LiDAR beam can provide additional information about the surface characteristics [25,26].
On the other hand, stereo-imagery DSMs have the advantage that additional spectral information is available, usually the red-green-blue (visible) and near infrared (NIR) in combination ( =VNIR), typical for most commercial VHSR satellites.Such spectral information can be used to classify the study area so as to stratify and potentially improve ground point sampling.An additional advantage of stereo-imagery DSMs is that they can be obtained for most regions of the world, including remote and inaccessible areas, due to near-global satellite coverage [27].
We propose a semi-automated land cover strata sampling to derive DEMs from stereo-imagery DSMs using specific object-based image analysis (OBIA) techniques.OBIA (for a detailed review, see Blaschke [28] and Blaschke et al. [29]) offers methods for the analysis of very high spatial resolution imagery by describing the imaged reality using spectral, textural, spatial, and topological characteristics.Making use of a hybrid raster-vector data model and object-oriented analysis methods (cf.[30]), we replace traditional DEM filtering methods by focusing on image objects as stratifying units for the DEM extraction approach.The advantage is to avoid "salt and pepper" effects in the initial land cover classification by working with segmented image objects and to more explicitly address the contextual information and the arrangement of objects for an improved stratified sampling.Due to the mentioned limitations of stereo-imagery DSMs compared to LiDAR methods for vegetated areas (especially forests), we are focusing mainly on urban areas in flat to undulating landscapes.To ensure a high level of transferability and a low degree of required parameterization, the proposed approach initially classifies the input data into generic land cover classes (cf.[31]).The separation of above-ground from ground land cover objects and the detection of land cover specific ground point samples is carried out using hierarchical and geometrical features.To detect the ground point samples, we propose an approach that is not restricted by user-defined filter size and reduces user-driven adjustments.The regional uncertainty index (RUI) is proposed to provide a reliability indication of the generated DEM.

Study Area and Data
The study area for this research was the northern part of the city of Salzburg, Austria, as shown on the left side of Figure 1.The extent was about 2.5 km by 2.5 km.The area is mainly characterized by built-up area features, but also includes some urban vegetation, agricultural land, and forests.The terrain of the area is flat with a hillslope in the north (cf.right part of Figure 1).A stereo image was taken on the 28 May 2013 by the Pléiades satellite.The satellite provides high-quality image data, proven suitable for DSM generation [32].The DSM (cf.right side of Figure 1) was computed at the German Aerospace Center (DLR) using the multi-stereo processing chain of the fully automated pre-processing "Catena" system [33].The pansharpened satellite images have a spatial resolution of 0.7 m, and the DSM of 1 m, respectively.An airborne LiDAR-derived DSM and DEM data set from 2006 served as ground reference for validation of the result.Both LiDAR-derived data sets have a spatial resolution of 1 m, equal to the Pléiades stereo DSM.In Central-European urbanized landscapes, the terrain surface is estimated not to change significantly.Hence, the gap in time between the two data sources does not hamper the validation.

Methodology
An initial comparison of the LiDAR-based DSM and the stereo-imagery-based DSM was conducted, and focused on two aspects: (1) to corroborate that the LiDAR-based datasets are qualified for the validation of the new approach for DEM extraction; and (2) to evaluate the quality of the stereo-imagery DSM itself.
The proposed workflow for object-based DEM generation is outlined in Figure 2. The processing steps-including raster layer analysis, vector-object generation, and classifications based on defined rules-were programmed in CNL (cognition network language) within the eCognition software development environment (Trimble Geospatial) [34].The workflow comprised two major steps: (1) a basic land cover classification based on spectral and DSM values; and (2) a sampling of ground points stratified by the land cover classification [35].Together with the VNIR images, the DSM served as input for the multi-resolution segmentation [36] to define regions of similar spectral information and similar height.A coarse delineation of homogeneous objects was sufficient due to the generic characteristics of the applied land cover classes.
A slope layer was derived to divide the area into flat and steep regions based on a threshold of five degrees.From a construction perspective, areas with equal to or less than five degrees of slope were considered flat; all remaining areas were regarded as steep.This terrain classification served as input for the following basic land cover classification to improve surface feature delineation, especially in steep slope areas where (for example) buildings could have blurred instead of sharp transitions to the adjacent sloping surface.Another layer was a customized geomorphometric index called "surface roughness" [37], which uses local kernels to derive custom roughness indices that outline different aspects of surface roughness [38,39].We focus on the separation of flat and elevated vegetation.For this purpose, a median filter with a 15 × 15 pixel window applied to the original DSM has proven to produce good results; hence, the window size was fixed for all application cases of the approach.The smoothed DSM was subtracted from the original DSM, resulting in a residual topographic layer or surface roughness layer.Since a smoothed DSM is subtracted from the original DSM, the generated surface roughness is mainly influenced by the sloping properties of the terrain.By implementing a small kernel size, we were able to preserve the roughness properties of the tree canopies.The following basic land cover classification relied on spectral and elevation values and the two aforementioned terrain layers.
Similar to the approach used by Lohmann [31], the land cover classes used in this approach (cf.Table 1) were kept generic in order to improve the sampling in most DSM data sets, and to ensure the transferability of the approach to a broad range of environments.The following assumptions were made: for non-elevated land cover classes, the DSM represents the bare ground, and is therefore equivalent to the DEM; for elevated land cover classes, only selected DSM locations represent the bare ground, and have to be detected by advanced sampling strategies.Six land cover classes were differentiated: • "Field/Pasture": derived by a low roughness value and a normalized difference vegetation index (NDVI) above zero; • "Vegetation elevated": NDVI above zero and a roughness layer standard deviation greater than 0.5; • "Water bodies": the spectrally darkest and mostly slopeless objects.Mix-up with shadow avoided by contextual features (no neighboring elevated objects); • "Bare soil": classified with a browning reflectance index (BRI) value close to zero [40]; • "Built-up elevated" (buildings and large elevated infrastructural elements): Objects with high slope, low NDVI, and high spectral values were assigned to this class.A comparison of average DSM height values with spatially adjacent objects was performed to ensure that detected built-up objects were above a certain relative (compared to the surrounding) height threshold; • "Built-up non-elevated": Objects that exhibited no difference in regard to DSM height to adjacent objects (contextual feature) while having similar spectral properties to the class "Built-up elevated".
Based on the classified objects, a land cover domain-specific ground point sampling per land cover object was performed.Three different types of ground point sampling strategies were implemented in this study (cf.Table 1): (1) If the class was assumed to represent true terrain values, the object (group of pixels) as a whole was taken into account for the DEM generation.(2) For the class "Built-up elevated", ground point sampling was based on local minima detection using flexible search radii/areas defined by the exact boundaries of the respective land cover object size instead of a fixed or user-defined window size.The minimum height pixel in the respective object was taken as the preliminary ground point.Several iterations were implemented to automatically validate the detected ground point by comparing its height with regard to deviations from the surrounding ground point height average.Hereby, only the lowest values in between the elevated features (e.g., gaps between buildings or courtyards in buildings) were considered for the DEM generation.(3) The class "Vegetation elevated" is the most difficult class to derive ground point samples from, since a stereo DSM does not provide any information about ground level height in between dense trees or shrubs (in contrast to LiDAR point cloud data).A ground point sampling procedure was implemented to detect objects with a negative height difference ≥ 10 m (rough, adaptable threshold, based on visual interpretation of average vegetation height in this area).These objects were regarded as clearings with at least low vegetation.Then, a ground point sampling procedure similar to the sampling in "Built-up elevated" was applied.However, a clearing can also represent a shrub within the forest or a fairly small tree; thus, ground point sampling within this class is prone to errors.Nonetheless, it is regarded as helpful to have these ground points, especially for large forested areas.The sampled ground points are not regularly spaced, and the lower the values of the sampling density, the higher the influence of the interpolation technique [41].A natural neighbours (NaN) interpolation was used, which is able to handle irregularly spaced ground points and to interpolate smooth continuous surfaces with good performance rates.
A root mean square error (RMSE) calculation per class and the standard deviation (STD) of height in meters per class were performed in order to receive explicit statements about the validity of the generated DEM areas in relation to the reference LiDAR-DEM data set.
Additionally, we propose a regional uncertainty index to provide information about the number of ground points available for each land cover object, and thus indicate the potential quality (compared to the DSM) and the spatial variation in the quality of the DEM.The RUI is the ratio of detected ground points ( = pixels) per land cover object versus the total number of pixels forming the respective land cover object.It is a dimensionless quality metric with values from 0 to 100.The RUI should serve as a reliability measure for the user of the DEM, so that the heterogeneity of the ground point sampling (stratified by the land cover classes) is made spatially explicit.

RU I land cover object = (
∑(ground point samples per land cover object) ∑(Pixels per land cover object) ) * 100

Results
As described in the methodology, an initial comparison of the LiDAR DSM and the stereo DSM was performed.The left side of Figure 3 shows the height variation of the stereo-imagery DSM in comparison to the LiDAR DSM.Since the stereo DSM is based on photogrammetric methods, the edges of elevated objects are smoother and remain visible when subtracting the LiDAR DSM from the stereo DSM [27].Symmetric elevational errors may have two different reasons: first, the inclination of the LiDAR beam, and second, the photogrammetric method of two non-nadir satellite images.Both may result in different angles made visible when subtracted (cf. Figure 3 close-up a).There is also a time difference of seven years between the data acquisitions.Positive differences are therefore also linked to vegetation growth (northern and western part of the study area (cf. Figure 3 close-up d) and construction work (cf. Figure 3 close-up b).Large negative differences occur due to removed buildings (cf. Figure 3 close-up c) and vegetation (e.g., small clear cuts).With regard to the comparison of surface object removed DEMs, the absence of large errors for bare terrain surfaces proves that the LiDAR-DEM is a valid source to examine the quality of the approach.Additionally, an RMSE calculation was performed at the end.
The generated DEM is presented in Figure 4, together with the reference LiDAR-DEM; Figure 5 depicts a differencing layer (LiDAR-DEM subtracted from the generated DEM) and the respective RUI-values for all land cover classes, while Figure 6 shows the nDSM (stereo DSM subtracted from the generated DEM) and the land cover classification used for the stratification.Figure 7 presents the quantitative assessment with RMSE values and standard deviation, both computed per class.A visual investigation of the results revealed the following:

•
The DEM generation works well for flat areas.The difference between the reference DEM and the generated DEM shown in the top of Figure 5 is mainly in the range of ±1 m, which is within the intrinsic precision error of the satellite sensor and stereo-matching procedure [42]; The water level of the river "Salzach" was higher due to precipitation during the days before data acquisition, and may also be influenced by a newly-constructed hydroelectric power station (built after the LiDAR data acquisition); • Small objects (e.g., a small river crossing from east to west in the bottom of Figure 4) are hardly detected in the satellite-derived DSM, and are thus visible in the difference layer; • In the LiDAR-DEM, man-made bridges have been removed (cf.top vs. bottom of Figure 4).Since the approach in this paper is based on the concept of landforms as discussed by [43], bridges or embankments were not removed from the stereo DEM, and therefore cause additional positive errors in the difference layer, visible in the top of Figure 5; The RUI classification in the bottom of Figure 5 indicates land cover regions where only a few ground points have been detected; these regions are presumably less reliable and less accurate; • Densely vegetated areas are prone to errors due to photogrammetric matching issues in the stereo DSM and very sparse or missing detected ground point samples, and thus result in large errors in the stereo DEM.This is especially true for dense vegetation on steep slope surfaces [44].If the vegetated area covers domes or similar relief structures (hilly areas), not enough ground point samples can be found to accurately estimate the underlying surface.The last fact is of particular interest, since the presented approach deals with irregularly-spaced data for the interpolation.The quality of the interpolated data set largely depends on the height structure and the density of the data: (1) there is low variation in the interpolated data if the sampling data density is high; (2) a large variation may occur if the sampling data density is low or the height differences in surface structure of the dataset are strong [41].
The results reveal that the approach is able to generate a DEM from a stereo DSM dataset for urban areas and flat areas with low vegetation.The quantitative assessment of the generated DEM corroborates the findings of the visual investigation.Figure 7 depicts the height deviation as a histogram for each class (same color ramp as the top of Figure 5).The majority of values are in the range of ±1 m.Deviations in "Built-up elevated" can be partly explained by the time difference of seven years (construction work; building removal).The variation in "Vegetation elevated" is caused by only very few ground points-indicated by low values of the RUI measure-which resulted in relatively poor DEM quality, especially in hilly terrain (cf.top of Figure 4, northern area).The positive shift in "Water" is due to a different water level, and is not considered as an error.The RMSE histogram was calculated for each class separately (cf. Figure 7) [45].The lowest RMSE values were observed for "Fields/Pasture", "Built-up non-elevated", and "Bare soil".They represent the quality of the stereo DSM compared to the LiDAR DSM, since the original stereo DSM was taken for these classes.This underpins the basic land cover classification quality for these areas.

Discussion and Conclusions
The issues in elevated vegetation and steep sloping areas can be seen as a data-inherent problem of DEMs derived from stereo imagery.In sloping terrain, the applicability of the proposed approach is limited, since elevated objects often show similar heights to the adjacent land surface (e.g., slope-facing sides of buildings).We tried to tackle this issue by exploiting the available spectral information; for example, by removing elevated objects and separating non-vegetated sloping terrain from vegetated sloping terrain.Hereby, we retrieved-to some degree-meaningful ground points in sloping areas.
As for other filtering techniques, many breaklines or discontinuities in the landscape were smoothed during the interpolation of the final DEM [46].This is the case for the aforementioned small river crossing from east to west.
Compared to other applications for DEM extraction, the approach presented in this paper does not depend on any definition of a filter window size or similar user-driven threshold.The size of the land cover objects is not pre-defined, but is flexible, and only depends on the characteristics and structure of the landscape represented in the imagery.This can be regarded as a substantial advantage compared to filter-based techniques.In this paper, we introduced a novel approach that uses the concept of object-based image analysis for the generation of a DEM from a stereo-imagery-derived DSM data set.Stratified object-specific sampling of ground points is conducted based on initial land cover classes.
Results show a high potential of the approach for areas that are characterized by relatively flat terrain.For sloping terrain-especially if covered by dense vegetation-the accuracy of the generated DEM is limited.Dealing with these limitations should be one of the future objectives to improve the approach.The generated DEM is highly dependent on the quality of the DSM produced from the stereo-imagery pair and the basic land cover classes.This can be observed in the forest stand in the north of the study area.Here, the comparison of the two DSM data sets already revealed differences; hence, the DEM generated by our approach is also prone to errors.However, with a suitable land cover classification, the respective interpolation area can be reduced to a minimum.
To visualize these dependencies, we proposed the RUI.The RUI is a numerical but dimensionless value in the range of 0 to 100, and reflects the sampling points per land cover class domain for the DEM generation as a reliability measure.
Further research and improvement of the approach will be conducted with regard to two aspects: (1) in the course of the analysis, it became apparent that a more sophisticated investigation of the spatial distribution of ground points with point pattern analysis (e.g., average nearest neighbours (ANN), Ripley's K) could complement the proposed RUI.Hence, an additional uncertainty indication could be provided.(2) The approach already tries to minimize the parametrization (user-algorithm interaction) of the initial land cover classification, since only a rough land cover classification is needed.This could be further investigated.Fully automated parameter-free pre-classification methods should be taken into account to increase the potential transferability of the approach (e.g., Baraldi et al. [47]).

Figure 2 .
Figure 2. Graphical overview of the workflow based on land cover (LC) stratification and ground point (GP) sampling conducted in the proposed approach.DSM: digital surface model; DEM: digital elevation model; VNIR: visible red-green-blue and near infrared spectral bands.

Figure 3 .
Figure 3. Left: Difference map showing height variation of the stereo-imagery DSM in comparison to the Light Detection and Ranging (LiDAR) DSM: blue colors represent areas where the stereo-imagery DSM is lower than the LiDAR DSM, brown colors denote areas where the stereo-imagery DSM is higher than the LiDAR DSM, and white areas are in the range of [−1/+1 m] difference; Right: Close-ups of the error map depicting (a) vertical residuals of the stereo DSM around buildings, (b) new buildings, (c) removed buildings, and (d) height differences in a small forest stand.

Figure 4 .
Figure 4. Top: Resulting DEM generated from the stereo-imagery DSM using the presented approach; Bottom: Reference DEM derived from LiDAR data.

Figure 5 .
Figure 5. Top: Error map showing the height variation of the generated DEM and the reference LiDAR-DEM.Bottom: RUI classification for the study area; green values indicate a very high sampling rate, red values indicate low sampling rates per object.

Figure 6 .
Figure 6.Top: normalized DSM (nDSM) generated from stereo DSM and the generated DEM; Bottom: land cover classification of the study area.

Figure 7 .
Figure 7. Height deviation of the computed stereo DEM in relation to the reference LiDAR-DEM, calculated for each class separately.RMSE: root mean square error; STD: standard deviation.

Table 1 .
[31]s names adapted from Lohmann[31], extraction features applied in the approach, respective methods for DEM sampling and regional uncertainty index value calculation.
a Regional Uncertainty Index; b Not applicable in the study site.BRI: browning reflectance index; NIR: near infrared.