Using ALS Data to Improve Co-Registration of Photogrammetry-Based Point Cloud Data in Urban Areas

: Globally, urban areas are rapidly expanding and high-quality remote sensing products are essential to help guide such development towards efficient and sustainable pathways. Here, we present an algorithm to address a common problem in digital aerial photogrammetry (DAP)-based image point clouds: vertical mis-registration. The algorithm uses the ground as inferred from airborne laser scanning (ALS) data as a reference surface and re-aligns individual point clouds to this surface. We demonstrate the effectiveness of the proposed method for the city of Kuopio, in central Finland. Here, we use the standard deviation of the vertical coordinate values as a measure of the mis-registration. We show that such standard deviation decreased substantially (more than 1.0 m) for a large proportion (23.2%) of the study area. Moreover, it was shown that the method performed better in urban and suburban areas, compared to vegetated areas (parks, forested areas, and so on). Hence, we demonstrate that the proposed algorithm is a simple and effective method to improve the quality and usability of DAP-based point clouds in urban areas. substantially


Introduction
Urbanization has been a prevalent global trend for the past several decades, with drastic consequences [1]. Urban areas are expanding considerably on a daily basis; some areas are estimated to expand as much as 110 km 2 every single day [2]. Urbanization has drastic consequences: patterns of land-cover, hydrological cycle components, climate, biogeochemistry, and biodiversity are all altered on a global scale as a result of it [3]. Another consequential aspect with respect to land use in urban centers is that over the past several decades, the land expansion rate has been higher than or equal to population growth rates; this suggests that urban growth is becoming more expansive than compact [1]. Urbanization has deep implications for global sustainability. It has been opined that the key to global ecological sustainability may lie in urban areas as they account for ~75% of global GDP [4]. Some positive aspects can also be associated with such urbanization trends. For instance, this represents a unique opportunity to build more sustainable spaces. These are cities that have 'green' buildings, sustainable transport options over reduced distances, parks and such green spaces, and ample domestic and industrial recycling opportunities.
Remote sensing data is crucial to city planners for monitoring urban areas, devising effective interventions and for planning urban infrastructure [2]. The importance of remote sensing in the mapping and monitoring of diverse sustainability factors (e.g., land use and cover, biodiversity, climate, hydrological systems, biogeochemistry) has been previously noted [2,3]. Remote sensing scientists are being tasked by policy and decision makers to generate more and better-quality urban maps and related products. Digital aerial photogrammetry-based 3D point cloud data is becoming increasingly popular in the urban context [5,6]. Point clouds have a unique advantage over other remote sensing data types in the context of urban environments because they are able to capture the 3D aspect of urban infrastructure such as buildings, elevated bridges, walkways, and open spaces. This is advantageous, especially because spatial visualizations are important for urban planners for creating and communicating various strategies and approaches [7]. Point cloud data is useful for a varied list of tasks such as urban planning, vehicular traffic management, and creation of digital surface models, 3D modeling, virtual reality simulations, and overall environmental monitoring [8]. Dense point clouds can be created with processing routines of Structure from Motion (SfM) [9], which can utilize a range of popular stereo matching algorithms such as semi-global matching (SGM) [10], the 'next-generation automatic terrain extraction' algorithm (NGATE) [11], or more recently, deep learning [12].
The vertical positional accuracy of digital aerial photogrammetry (DAP)-based point clouds is highly important for the creation of relatively error free digital surface models. However, it can be limited in urban environments because of several factors. First, there could be slight errors associated with the global navigation satellite system (GNSS) positioning equipment or with the inertial measurements units (IMU). This is more probable when low-cost equipment is used, such as in the context of 'citizen science' related data acquisition projects. Second, urban areas pose unique challenges to the point cloud generation algorithms. This is because of low texture in some areas, occlusion effects, heterogeneous surfaces and surface types, and shadow effects (e.g., [13]). Indeed, the noise and outlier-laden nature of DAP data when compared to airborne laser scanning (ALS) has been noted by several authors [14][15][16]. As part of the many DAP-based point cloud generation procedures, multiple point clouds are generated independently, one for each image-pair used. We have noticed in some of our urban DAP-based point cloud datasets that there is ~1m vertical misalignment between such point clouds (from different image pairs). This is the primary motivation for the development of a simple and intuitive algorithm to reduce such misalignment. There have been several attempts for image point cloud quality improvement. For example, a smoothing operation based on the similarity between various patches and their color has been proposed [17]. A technique of iteratively smoothing the point cloud by estimating quadric surfaces for various point cloud regions was suggested by Fua et al. [18]. A simpler version of this strategy of using best-fitting tangent planes (for all points) so as to detect and remove noise and outlier points have also been suggested [19]. Yet, to our knowledge, this is the first one that leverages on an ALS-based ground surface to correct DAP-based point clouds. A similar method has been validated for forested areas [20]. Here, we describe a version of the algorithm that is suitable for urban areas, and we also provide some quantitative estimates of the efficacy of our proposed algorithm.
The primary motivation behind our proposed scheme of correcting DAP-based point clouds using ALS data is because of their differing acquisition frequencies in the real world. On average, DAP data is roughly one-third the cost of ALS [21]. Hence, in many instances over urban centers, ALS acquisitions happen once in 10 or 15 years, while DAP-related acquisitions (i.e., collection of aerial images) happen once every two or three years. In fact, annual DAP acquisitions may also be justified for some fast-expanding or changing cities.
The main objective of this article is to study the efficacy of an image point cloud correction technique in the urban context. Specifically, we try to answer the following questions in an urban context: 1. Can an ALS-based digital terrain model (DTM) be used to realign DAP-based point clouds consistently? 2. If so, what are the quantified improvements seen in such point clouds?
By answering these questions, we hope to establish the efficacy of a simple yet powerful method to increase the technical quality and usability of image point clouds over metropolitan and similar city areas.

Study Area
Our study area is the city of Kuopio (Figure 1), an urban area with a population of around 120,000.

Remote Sensing Datasets
The ALS data were collected during May 2010 using the Optech ALTM GEMINI (05SEN180) system, a widely used sensor [23]. The mean flying height was 2000 m above ground level, the field of view used was 30 degrees, resulting in a side overlap of approximately 20%. This flying configuration resulted in a nominal sampling density of about 0.8 emitted pulses per m 2 . A DTM was constructed by first classifying points as ground and non-ground echoes according to the approach described by Axelsson [24]. A raster DTM of 2 m spatial resolution was then obtained by computing the mean of the ground echoes within each raster cell. Values for the cells with no ground echoes were interpolated using Delaunay triangulation and triangular interpolation.
Aerial photographs were also acquired over Kuopio on July 2014 using a Canon EOS-1Ds Mark III. The sensor was a nadir-facing RGB camera in a multiple-camera setup. Position and orientation came from the integrated Applanix POS/AV 310 Inertial Navigation System with GNSS and IMU units. The average flying altitude was 915 m above ground level. A total of 958 aerial photos were acquired, with an average end-lap of 60% and an average side-lap of 45%. Exterior camera orientations from the GNSS/IMU system were used as such without block adjustment (i.e., direct sensor orientation was used).
Block bundle adjustment is quite common in photogrammetry pipelines, even while using automatically detected tie points in images in the absence of ground control points. However, such an adjustment was skipped in this case for two reasons: First, our objective here was to recreate scenarios where such (potentially costly and time consuming) processing techniques may not be available or feasible. Second, we intend to showcase our algorithm as being able to correct point clouds with systematic spatial misalignments, no matter what the error source. Other possible sources for misalignment could be cheap equipment or less favorable acquisition conditions.

Creation of Image Point Clouds
We used an image matching algorithm called SGM [10] to create the initial DAP-based point clouds from overlapping aerial images. The basic idea is to pixelwise match images, but with global and local search techniques used to establish corresponding points. We used an SGM implementation done in ERDAS IMAGINE photogrammetry suite (version 15.0; Geosystems, 2014) for creating the point clouds. This implementation always results in image-pair based point clouds (multi-view stereo is not possible).

Height Adjustment Algorithm
The proposed algorithm was used to adjust height of DAP-based point clouds, by adjusting the height (z) values associated with each included point. A similar version of this algorithm was previously presented [20]. Herein, we presented a modified adaptation. The algorithm is parametrized by a set of user-specified input parameters, and is executed for each point cloud imagepair at a time. The following high-level description summarizes the essential elements of the procedure in eight steps: Parameters relevant to the procedure: 1. The pixel size (pixel_size). 2. The standard deviation threshold (SDmax). 3. The minimum number of points threshold (npmin). 4. Ground pixel outlier detection threshold (zcorrmax).
Output layer: 1. The adjusted DAP-based point cloud data.
Procedural steps: 1. The elevations of the unadjusted DAP-based points (ZDAP) are scaled so that they are now relative to the ALS ground level (∆ZDAP). This is done by subtracting the (ALS-based) DTM from the DAP point clouds z values. 2. A user-defined pixel size (pixel_size) was used to tessellate the spatial extend of the point cloud into a regular square-grid. 3. Grid-elements (pixels) that belong to buildings and similar structures are labelled as 'built-up area' (BUA) pixels. This is done using the 'demarcation of built-up area (BUA)' layer. This is elaborated further in Section 2.4. 4. Each grid-element (pixel) of this tessellated grid was examined as to whether it would qualify as a ground pixel (i.e., a pixel that represents a patch of ground) or not. This was done by the following three criteria: a. It should not be labelled as a BUA pixel (see step 3 above).
b. The 'flat surface' criterion: The standard deviation of vertical heights of the set of points must be below a user-specified threshold (SDmax). c. The number of (point cloud) points in the pixel should be greater than a user-defined value npmin.
In this step, criteria (a) and (b) helps us identify flat surfaces that are not the tops of buildings and such structures, while (c) is for avoiding spurious ground points. Hence, there is high likelihood that these would be ground surface patches. This step is implemented by using a for-loop and looping through and examining all pixels in the area under consideration. 5. Then, we compute a 'vertical correction estimate' (∆ZCGP, correction to ground pixel) for each such identified ground pixel: where ∆ZDAP is the set of Z values of (point cloud) points in that (ground) pixel. 6. We then iteratively trim the list of candidates of DAP-based point cloud ground pixels identified in the above steps using the following criteria: a. Drop candidate ground pixels which specify too high absolute correction values (i.e., abs(∆ZCGP) > zcorrmax). b. Compute the vertical displacement difference between the highest and lowest candidate ground pixels. If this is greater than a user-defined threshold (hdiffmax), then both those candidate ground pixels are dropped. This is mainly to exclude points which are either too high (such as tree top points) or too low (noise-related belowground artifacts) from being selected as spurious ground points.
7. We interpolate a raster (∆ZCOR) representing a correction surface for the full spatial extend of the original DAP-based point cloud from these set of ∆ZCGP points. This interpolation surface is computed using Delaunay triangulation (linear interpolation inside triangles) followed by Gaussian filtering using a user-specified sigma value. 8. The Z value of each point in the point cloud is corrected (adjusted) as follows: where ZADJ is the adjusted height of the point, ZDAP is the original height of the point and ∆ZCOR is the value of the correction surface raster at that point.
The gist of these steps are to identify patches of ground in the DAP-based point cloud and use this information to generate a correction surface with respect to the ALS DTM. This correction surface is then used to correct original heights of all points in the point cloud ( Figure 2). This procedure is repeated for each point cloud generated independently from image pairs.

Built-Up Area (BUA) Exclusion
As mentioned before, we exclude built-up areas while searching for flat areas (ground patches). Our definition of built-up areas for this purpose is limited to the footprint of multi-story buildings present in the center of most cities, suburban houses, large factories, warehouses, shops and similar structures. The reason for this exclusion is that the rooftop of such structures are relatively flat, and would generate false ground patches, if they were included. These false patches would then introduce substantial errors to the correction raster.

Evaluation of Efficacy of Algorithm
The efficacy of our algorithm for urban sites was evaluated using DAP-based point clouds generated from images acquired over the city of Kuopio, Finland. Kuopio is a rapidly urbanizing area and is representative of many such medium-scale cities around the world.
• After the examination of several areas in and around the city, the pixel size for flat area detection and correction raster generation (pixel_size) was fixed at 3.0 m. • The standard deviation threshold (SDmax) was set to 0.2 m. This was done after examining several vertical profiles of ground patches. Most of the points for these samples fell within ± 0.5m from the mean. • The value of npmin was set to 10. This was done after realizing that pixels with less than ten points were mostly erroneously classified. • Meanwhile, the value of zcorrmax was set to 100.0 m. This was done after some initial runs, and realizing that some outlier points (displaced more than 100m) need to be filtered off. • The building exclusion step was done using a built-up area shapefile obtained from the National Land Survey of Finland (NLS), based on data collected in 2016.
We first performed a 'close-up' evaluation of our algorithm by inspecting the vertical profile of the DAP-based point clouds before and after the correction, for several representative areas. We then performed a quantitative evaluation over the entire study area in the following ways. For any given pixel, the standard deviation (SD) of the vertical elevation of the (point cloud) points that fall in the pixel boundary is a good indication of the adequacy of the vertical co-registration at that pixel. For example, for a flat building rooftop, the point clouds generated from different image pairs for this surface may be offset by several meters (e.g., poor co-registration, high standard deviation) or may be almost vertically coincident (e.g., good co-registration, low standard deviation). Our algorithm should affect a substantial decrease in vertical standard deviation in most areas if it was effective. Another similar metric of vertical spread (that should decrease) is the 95% interquantile range (IQR95), which we define in the following way: 5 -h2.5 (3) where h97.5 and h2.5 are the 97.5% and 2.5% percentile heights above ground of the point cloud. For both of these metrics, we calculated the per-pixel change after our algorithm was used for correcting the DAP-based point cloud. For example: ∆SD = SDafter adjustment -SDoriginal DAP point cloud (4) where ∆SD is the change in standard deviation affected by our algorithm. The overall mean changes over the entire area of Kuopio was also calculated. Hence, we quantified the effectiveness of our approach by studying changes in the vertical dispersion of points in the DAP point cloud.

Results
We examined the vertical profile of the DAP-based point clouds at several areas before and after our adjustment technique was applied. We noticed that the vertical co-registration was improved in many instances (Figure 3). The ground patches represented in the DAP point clouds overlapped much better. Additionally, the various DAP-based point clouds generated from different image pairs co-registered much better in the vertical direction. The change in standard deviation (∆SD, see (4)) between the original and adjusted point cloud elevations was quantified over the entire study area using 3 × 3 m pixels. Most areas (76.7%) registered relatively small change (less than 1.0 m). A substantial decrease in standard deviation (greater than 1.0 m), signaling better vertical co-registration, was recorded in 23.2% of the area. Meanwhile, the standard deviation increased substantially (more than 1.0 m) in only 0.05% of the study area. The spatial patterns of these changes over the whole study area are shown in Figure 4. From the figure (part (a)), it can be seen that our calibration method provides a decrease in standard deviation of height over almost all parts of the study area (blue pixels), with slightly more effect in the forested fringes. Meanwhile, in part (b) of the figure, we see that most of the pixels register a decrease, many by more than 5.0 m. We also observe that these decreases are more prevalent at the edges and corners of buildings than at flat surfaces. We examined the change in the 95% interquantile range (IQR95, see (3)) between the original and adjusted point cloud elevations in a similar fashion. Again, a majority of the area (56.1%) registered relatively negligible changes in this measure of dispersion (less than 1.0 m), while calibration affected a decrease for this metric in 43.8% of the area. The 95% interquantile range increased in a very small proportion (0.09%) of the area.
We also analyzed the change in standard deviation with respect to three typical urban land-use divisions in our study area (Table 1). This analysis indicates that the algorithm is most effective in urban areas, followed by suburban, and then forested areas. The reason for this seems to be that the magnitude of standard deviations is on the higher side in urban regions, especially for areas with tall buildings. The performance of image matching algorithms also degrade for flat urban surfaces (roads, parking lots, top of buildings) with minimal texture information. Hence, for land-use divisions with more proportion of such flat areas (such as urban downtown divisions), there will be more proportion of areas with substantial change in standard deviation. Table 1. Proportion of area with substantial decrease in dispersion metric (decrease of more than 1.0 m) for three land-use categories in our urban study region. The proportion of area with increases were very small (less than 0.1%) and hence, we do not report them below.  The underlying aerial image shows several building blocks from above. Transparent areas denote negligible decease in SD (less than 1.0m).

Discussion
Remote sensing provides an effective path forward for monitoring urban areas. 3D remote sensing data are useful in urban analysis but acquisition costs may be high, particularly because repeated measurements are often indispensable. DAP-based point cloud data has its advantages given that the typical acquisition cost is 30-50% that of ALS data [20]. This would enable a high frequency of observations (say, every 2-5 years), useful in understanding many constantly changing urban areas [25].
We have demonstrated that the proposed height adjustment technique clearly improved vertical co-registration and vertical accuracy of image point clouds. In the verification urban site, standard deviation of the vertical displacement of (DAP point cloud) points decreased in 99.9% of the study area and the decrease was substantial in 23.2% of the area. The height adjustment was most effective in urban and suburban areas, and was less so in vegetated areas. However, since the comparison was based on the vertical standard deviation, the differences most likely arise from the fact that in urban areas height variations are higher than in vegetated areas, which enables higher improvement in vertical accuracy.
There are several drawbacks related to DAP-based image point cloud data. Some image processing pipelines use only measurements from the GNSS and IMU instruments, rather than those from ground control points (GCPs), also known as 'direct georeferencing'. Thus, they avoid the collection of (costly) GCP data. The positions of the exposure centers can be obtained either in real time (e.g., real-time kinematic, RTK) or during post-processing (e.g., post-processed kinematic, PPK). PPK may improve particularly the accuracy of low-cost GNSS receivers. However, angular errors caused by the IMU are likely greater than positioning errors, particularly with low-cost micro-electromechanical system (MEMS) based IMU. The increased availability of lightweight and survey-grade GNSS and IMU systems has led to more georeferencing options, but their associated spatial accuracy is still lesser than is achievable using GCPs [16,17]. All GNSS/IMU instruments have non-negligible errors associated with them. The resulting errors in external orientation are ultimately propagated as errors of positioning (both horizontal and vertical) of individual points in the generated point cloud. Another source of error is the displacement vector between the antennae and the center of the camera projection (lever-arm). Omission or the use of approximations of this element in calculations can be problematic. One should also consider even the differences between the orientation of the IMU and downward vector of gravity [26,27] . As the altitude of imaging gets higher, geometrical distortions such as refraction phenomena and the curvature of the earth also result in positional inaccuracies. Sources of errors associated with vegetated land-covers are canopy tops swaying in the breeze and occlusion of some patches. Hence, canopy gaps that could be distinguished in ALS data are usually not identified in DAP point cloud data.
A crucial element of the algorithm proposed in this paper involves the identification of several ground patches in the urban scene. Hence, it would be less effective for image pairs where very little ground patches are represented. This is possible for example, in the case of city center sections with large buildings. Nevertheless, such conditions do not dominate the landscape of the average city, and are usually restricted to a limited area. Another aspect of the method outlined in this paper is the need to generate built-up area estimates for the urban area in consideration (Section 2.4). In our case, the built-up area shapefile was from the National Land Survey of Finland. Similar products are possible, and may be available in other areas, too. It is possible to extract building footprints from remote sensing datasets such as high-resolution images [28,29]. However, sufficiently dense point clouds generated from ALS data used alone or in fusion with other datasets remain the primary approach used for contemporary urban footprint extraction [30]. Wang et al. describes how a combination of relief-corrected aerial imagery and ALS data has been used to identify and map buildings in urban areas [31]. A simple heuristic is to use an ALS-based digital surface model depicting height above ground (nDSM) to identify buildings (for e.g., nDSM height > 10m) and exclude such areas from the ground patch search. If the nDSM predates the DAP-based point cloud, one need to assume that new buildings were not built in the area of interest in the interim period.

Conclusions
In this article, we have proposed a simple and effective method to leverage on existing ALSbased digital elevation data to better co-register and align DAP-based point clouds. We also demonstrated the effectiveness of our method using data from a large, heterogeneous urban area, i.e., Kuopio city. DAP data is expected to become cheaper and more prevalent, given the advancement of technology related to acquisition systems (i.e., acquisition from high-altitude aircraft; from nearsurface unmanned aerial systems). Meanwhile ALS data will continue to the popular as a higher quality but less frequently available and updatable option. This paper presents a method for leveraging on one (the ALS data) to enhance the quality of the other (DAP point clouds). The proposed algorithm is computationally tractable and can be easily parallelized. Given that the amount of urban remote sensing data is rapidly increasing, we hope that efforts such as ours will help to create high-quality, analysis-ready, and policy-relevant spatial datasets.
Author Contributions: R.G. and P.P. conceived and designed the research experiments, with inputs from D.A. and M.K.; the algorithm was developed by D.A. and associated software was written by D.A. and R.G.; auxiliary datasets were generated by M.K.; analysis was done by R.G. with contributions from P.P.; writing-original draft preparation, R.G.; All authors have read and agreed to the published version of the manuscript.