On the Reliable Generation of 3D City Models from Open Data

Abstract: A 3D model communicates more effectively than a 2D model, hence the applications of 3D city models are rapidly gaining significance in urban studies. However, presently, there is a dearth of free-of-cost, high-resolution 3D city models available for use. This paper offers potential solutions to this problem by providing a globally replicable methodology to generate low-cost 3D city models from open source 2D building data in conjunction with open satellite-based elevation datasets. Two geographically and morphologically different case studies were used to develop and test this methodology: the Chinese city of Shanghai and the city of Nottingham in the UK. The method is based principally on OpenStreetMap (OSM) and Advanced Land Observing Satellite World 3D digital surface model (AW3D DSM) data, and uses GMTED2010 DTM data for undulating terrain. Further enhancement of the resultant 3D model, though not compulsory, uses higher resolution elevation models that are not always open source but can be used if available (e.g., an airborne LiDAR-generated DTM). We also develop and test methods to improve the accuracy of the generated 3D models by employing a small subset of high resolution data that are not open source but can be purchased with a minimal budget. Given that these scenarios of data availability are globally applicable and time-efficient for 3D building generation (where 2D building footprints are available), our proposed methodology has the potential to accelerate the production of 3D city models, and thus to facilitate their dependent applications (e.g., disaster management) wherever commercial 3D city models are unavailable.


Introduction
Three-dimensional city models have become an important resource for planning, development, and policymaking in urban areas [1][2][3][4][5]. A 3D city model is a digital model of an urban environment with a three-dimensional geometry of urban structures, as well as related objects belonging to urban areas [6]. Applications using 3D city models have increased in their scope and complexity [7], spanning from the analysis of electromagnetic propagation for telecommunications through environmental simulations analysing irradiation distribution [8,9] and noise propagation [10] to virtual or augmented reality applications [11,12]. This proliferation of applications is, in turn, driving an increasing demand for the creation and maintenance of reliable 3D city models. A standard approach to creating city models at a large scale automatically or semi-automatically is to apply stereo vision on aerial or satellite remote sensing imagery [3]. This, however, can be an expensive and/or time/labour-consuming process, in availability. Required, therefore, is a methodology that considers the terrain underlying the urban area of interest and uses datasets that are available worldwide.
In this paper, we used open DSM data as the foundation dataset of a globally replicable methodology to generate 3D city models. Recently available elevation datasets such as the AW3D DSM (with a horizontal spatial resolution of approximately 30 m) by the Japan Aerospace Exploration Agency (JAXA) have an open license (a higher resolution (approx. 5 m) DSM is also produced, but only as a commercial product [25]). Other common elevation-rich datasets include the ASTER DEM and that from the SRTM, although these provide mainly terrain elevation values; they are freely available under permissive data licenses [27]. (A digital surface model includes all the natural and built features on the earth's surface, whereas a digital terrain model is simply an elevation surface representing the bare earth referenced to a common vertical datum [26].) We present a methodology that uses open data of 2D building footprints, along with DSM and DTM datasets, to generate 3D buildings in two geographically and morphologically diverse cities, namely the Huangpu district in Shanghai, China, which has a relatively flat topography, and Nottingham, United Kingdom, which has a more undulating terrain. Shanghai and Nottingham are inherently different from each other, not only in terms of physiography but also in terms of level of urbanization. While Shanghai is a rapidly urbanizing city, Nottingham is stabilized and saturated. Hence, these two cities provide end members for testing how well the method transfers globally.
A secondary objective was to consider scenarios of data availability that could improve the overall accuracy of the open source 3D building model generated (which we call a foundation model).
Here, we exploited the fact that higher resolution elevation data are often available, though these are not always (or are never) open source and/or are of limited spatial coverage. For instance, there are a number of examples where previously proprietary LiDAR datasets are now being opened, though often these are for cities in the global North [28], or it may be the case that projects to produce 3D city models have a limited budget. Further, here we used the ALOS DSM to generate building heights. The AW3D-30 DSM is produced by resampling the 5 m ALOS DSM, which reduces its accuracy. Thus, this low resolution DSM cannot be used directly in the same way as a high resolution commercial dataset: from high resolution DSMs, roof heights or building heights can be measured easily, whereas with the low resolution ALOS DSM this is not possible. This study thus also explored the optimal approach to using the 30 m ALOS DSM.

Study Area
The focus was on two cities of very different scale and character: Nottingham in the UK and Shanghai in China. These two cities also differ considerably with respect to data availability. The diverse topographical and urban morphologies of the two cities afforded a robust assessment of the methodology presented in this paper to produce 3D city models openly.
The city of Nottingham is located 206 km to the north of London, in the East Midlands region of the UK. The city has a total area of 75 km² and accommodates a total population of 325,000 [29]. Nottingham is situated on an area of low hills along the lower valley of the River Trent and has an undulating topography. The average elevation of Nottingham is about 61 m [30]. Although the population of Nottingham City has recently grown (by 13% between 2000 and 2010 according to the Nottingham City Economic Review, 2011), compared to Shanghai the city is less agglomerated, with greater proportions of small and medium sized buildings and far fewer high-rise buildings. Shanghai is also almost two orders of magnitude larger than Nottingham. Four wards were selected from Nottingham that represent the spatial characteristics of the city.
Shanghai, located on the east tip of the Yangtze River Delta and on the east coast of China, is one of the most urbanized areas in China. Being one of the most dynamic cities in the world, it is a difficult city to understand, plan, and manage [31]. With a total area of 6340 km², it is one of the fastest economically growing and most densely populated cities in East Asia. In 2014, it had a population of more than 24 million. The average elevation of the city varies between 3 and 5 m above mean sea level. At present, Shanghai has 16 districts and one county (Chongming) under its jurisdiction. In the first instance, our focus was on the Huangpu District, due to the complexity of the morphology and environs across this area. Huangpu covers an area of 20 km² and is located in the city centre. It comprises a mixture of very tall buildings (more than 100 m), as well as very old and clustered buildings.
Urban Sci. 2020, 4, 47
Unlike Nottingham, Shanghai is characterized by flat topography and the average elevation of the city's terrain is four meters above mean sea level (msl). While Nottingham is less agglomerated, with greater numbers of medium and small sized buildings and far fewer high-rise buildings, Shanghai is occupied by a very dense and complex morphology with large numbers of medium and tall buildings.
The availability of open data, including OSM, is very limited and non-uniform in coverage for Shanghai, particularly in comparison with Nottingham. Thus, Shanghai is an ideal case to be compared with Nottingham to gain insights on how our methodology may work across the spectrum of cities in their geographies and morphologies.

Data
A DSM affords the extraction of a variety of features, including terrain, buildings, vegetation, and any other surface features [3]. Hence, the basic principle in obtaining the building heights from the AW3D DSM data was to remove the ground elevation from the DSM. For cities that have a flat terrain, the building heights can be generated by simply subtracting a mean ground elevation from the DSM values, whereas for topographically varying city terrains, digital terrain models (DTMs) can be used to obtain the ground elevation. DTMs are similar to DSMs, but exclude surface features. Thus, the datasets to be used with the OpenStreetMap data for Nottingham and Shanghai to produce globally replicable 3D city models were: (1) the open source ALOS DSM, which has a spatial resolution of 30 m, and (2) the minimum value layer of the open source Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) dataset. Although the latter has a resolution of 225 m, it is used since it is a globally applicable dataset. In addition to the globally available ALOS DSM and GMTED2010 DTM datasets, we explored how additional datasets could enhance the quality of the 3D city models produced for both Nottingham and Shanghai under different scenarios of data availability. For the city of Nottingham, an airborne LiDAR-generated DSM and DTM (2 m spatial resolution) were used, and for Shanghai a commercial high-resolution DSM (AW3D Enhanced at 2 m spatial resolution) was procured and used. For validation of the 3D city models produced, the OS MasterMap BHA dataset and the AW3D Enhanced product were used for Nottingham and Shanghai, respectively. The composition and provenance of all datasets are described below, and further details about their purpose are given in Table 1.

OpenStreetMap (OSM)
All the required 2D building footprints were gathered from the OSM database [32]. Open GIS data available for Shanghai, China were downloaded from the website mapzen.com, which relies on OSM for many of its products. OSM is a collaborative project to create free editable geographic data and a prominent example of volunteered geographic information [22]. The OSM building footprints (with relevant attribute information) were extracted for the Huangpu district, where the coverage is relatively dense (see Figure 1).
OSM data are available for Nottingham from a number of sources, and include similar data layers as for Shanghai. As with Shanghai, the OSM building layer data for Nottingham are of a higher density in the city centre, with sparser coverage for the residential suburbs. Building footprints vary in their complexity and accuracy compared to the detailed mapping available from the Ordnance Survey's MasterMap dataset [33] (the highest resolution digital mapping available for the UK). For some buildings, the OSM data are visually comparable to their MasterMap counterparts, although we note that in some instances, the OSM footprints have a simplified geometry and often do not include building subdivisions (e.g., between properties of terraced houses).
The DSM produced by the Japan Aerospace Exploration Agency (JAXA) is of relatively fine resolution, at about 0.15 arcsec or approx. 5 m [34][35][36]. JAXA used the archived data of the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM) onboard the ALOS to generate a DSM for the whole globe, known as "ALOS World 3D (AW3D)" [37]. The AW3D-30 global dataset, which has a 30 metre spatial resolution (1 arcsec), is a resampled version of the 5 m mesh version of the AW3D [25]. For this work, we used the latest AW3D-30 product, released in May 2017.
For both Shanghai and Nottingham, the 30 m ALOS DSM is currently the most precise global scale open source elevation dataset [36] (free to the public since 2015). The AW3D Enhanced product (at 2 m resolution) was also procured, giving a high resolution DSM sample covering 16 km² of our study area in Shanghai.

GMTED2010
GMTED2010 is the digital elevation model (DEM) product of the United States Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA), developed to replace the existing model, the Global 30 Arc-Second Elevation (GTOPO30), and has been available to the public since 2010 [38,39]. It is available in three resolutions, i.e., with horizontal spacing of 7.5 arc-seconds (about 250 m), 15 arc-seconds (about 500 m), and 30 arc-seconds (about 1 km), and its main data source is an SRTM version with 1 arc-second resolution that is restricted to the NGA and not available to the general public [40]. Other data sources include the Canadian Digital Elevation Data (CDED), SPOT 5 Reference 3D, NED for the continental USA and Alaska, GEODATA 9 Second Digital Elevation Model for Australia, and DEMs for Antarctica and Greenland from laser altimetry (ICESat and GLAS data) and satellite radar (ERS-1 data) [38,40]. This study used the minimum band of GMTED2010 at 250 m resolution due to its global coverage.
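The metric figures quoted for angular grid spacings are nominal. As a rough check, the sketch below converts an arc-second spacing into an approximate east-west ground distance. The function name, the spherical approximation, and the circumference constant are illustrative assumptions, not part of the GMTED2010 specification.

```python
import math

# Approximate equatorial circumference of the Earth in metres (assumed constant).
EARTH_EQUATORIAL_CIRCUMFERENCE_M = 40_075_017.0


def arcsec_to_metres(arcsec, latitude_deg=0.0):
    """Approximate east-west ground distance of an angular spacing.

    Uses a simple spherical approximation: one arc-second at the equator
    spans circumference / (360 * 3600) metres, shrinking with cos(latitude).
    """
    metres_per_arcsec = EARTH_EQUATORIAL_CIRCUMFERENCE_M / (360 * 3600)
    return arcsec * metres_per_arcsec * math.cos(math.radians(latitude_deg))


for spacing in (1, 7.5, 15, 30):
    print(spacing, "arcsec ~", round(arcsec_to_metres(spacing), 1), "m at the equator")
```

Under these assumptions, 7.5 arc-seconds corresponds to roughly 232 m at the equator and less at higher latitudes, so the quoted 250 m is best read as a rounded nominal value.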

Digital Terrain and Surface Models Derived from Airborne LiDAR Data for the UK
The UK Environment Agency's LiDAR data archive contains accurate digital elevation data for over 70% of England [41]. For the city of Nottingham, LiDAR-derived DSM and DTM at 2 m resolution are openly available. For the present study, we used this dataset to extract the ground elevation value for the Nottingham study area in order to enhance the 3D city model produced.

OS Mastermap BHA
The building height attribute (BHA) dataset, published in 2014, is an enhancement to the Ordnance Survey (OS) MasterMap Topography Layer. BHA data are not available for the whole country, but they cover the major cities and towns of Great Britain. BHA provides a set of height attributes (ground level, base of roof, and the highest part of the roof) for topographic area features with a buildings theme within the OS MasterMap Topography Layer. OS publishes the data as a single CSV file containing over 20 million records [42,43]. For the present study, we used the BHA data for Nottingham for validation.

Methodology
The overall methodology adopted is illustrated in Figure 2. The workflow describes the different steps to be taken, which depend, first, on the terrain on which an urban area resides and, second, on whether any relevant additional datasets are available. The foundation workflow yields a 3D model output that is possible for all urban areas globally, with the possibility of enhancing that 3D model should other higher resolution data be available (although these are not a necessity). Further details are given below.

Generating 3D Buildings from Open Data (Foundation Workflow)
The first stage in applying this methodology is to establish whether the urban area of interest (AOI) has a terrain that is flat or undulating (workflow chart step 1), since this determines whether additional data and processing steps are required, on a building-by-building basis, to identify the building heights. The 2D building polygon data and the AW3D-30 data (i.e., the DSM) subsequently need to be co-registered, ensuring that there is no shift between the datasets.

The methodology is designed to extract the best possible elevation results from the low-resolution AW3D-30 DSM. As stated above, the AW3D-30 open dataset has a 30 m spatial resolution (1 arcsec) and is a resampled version of the 5 m mesh version of the AW3D [25], so each elevation value is already an average of many adjacent pixel values. In the case of an urban AOI with a flat terrain (i.e., Shanghai in our example case), the AW3D-30 DSM is joined to the 2D shapefile (workflow chart steps 2A to 6A). The AW3D-30 is in raster format, and linear interpolation is used to assign elevation values from the raster surface to the vertices of each polygon. This operation assigns a Z value to each vertex of the 2D building polygon. Of these values, the maximum Z of the geometry is taken as the elevation value, since this reduces the effects of shifts caused by different projection systems and helps to overcome the low resolution of the AW3D-30 data. If we calculated an average Z value, it might also include ground elevations (i.e., due to height data relating to surfaces beyond the building footprint, as AW3D-30 is a resampled version of many adjacent pixels), thereby reducing the overall height value; similarly, if we took the minimum Z, there is a chance that this would give the ground elevation directly. It is worth noting that if the DSM were of higher resolution (e.g., 2 m resolution), we would have taken the average Z value within a polygon as the building height.
After this process, the mean ground elevation of 4 m (this is the mean elevation of Shanghai) is removed from the AW3D-30 DSM data in order to obtain the building heights (workflow chart steps 7 to 10).
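The flat-terrain branch of the workflow can be sketched as follows. This is a minimal illustration, not the GIS tooling used in the study: it assumes the DSM is held as a plain list of rows, that footprint vertices are already expressed in fractional pixel coordinates (column, row), and that bilinear interpolation stands in for the linear interpolation step. The names `bilinear_sample` and `flat_terrain_height` are hypothetical.

```python
import math


def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a raster (list of rows) at column x, row y."""
    rows, cols = len(grid), len(grid[0])
    # Clamp to the raster extent so vertices on the edge still get a value.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def flat_terrain_height(dsm, footprint, mean_ground=4.0):
    """Maximum interpolated DSM value over the footprint vertices,
    minus a single mean ground elevation (4 m is the value quoted for Shanghai)."""
    z_values = [bilinear_sample(dsm, x, y) for x, y in footprint]
    return max(z_values) - mean_ground


# Toy 3 x 3 DSM (metres) with a building in the lower-right corner.
dsm = [
    [4.0, 4.0, 4.0],
    [4.0, 20.0, 22.0],
    [4.0, 21.0, 24.0],
]
footprint = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
print(flat_terrain_height(dsm, footprint))
```

Taking the maximum rather than the mean of the vertex values mirrors the reasoning in the text: vertices that fall on ground-contaminated pixels pull the mean down but leave the maximum unaffected.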
In the case of an undulating terrain (i.e., Nottingham in our example case), building roof heights were computed following the same steps as for Shanghai. However, to accommodate the change in elevation of the terrain across the urban AOI, an alternative workflow is necessary for the ground elevation. In this case, to obtain the buildings' ground elevation, the GMTED2010 (i.e., a DTM) is joined with the 2D building polygon using the same interpolate shape function, and the minimum Z of the geometry is calculated and assigned to the attribute table of the 2D building polygon (workflow chart steps 2B to 6B). Here, the minimum Z is used to reduce the effect of shift in the process. If we used an average or maximum of Z, there is a chance that it might reflect the building height values (the converse of the previous case). Once these steps are complete, the height of each individual building is calculated by subtracting the minimum elevation value obtained from the GMTED2010 DTM from the maximum elevation value obtained from the AW3D-30 DSM. The output generated is the estimated heights of individual buildings (workflow chart steps 7 to 10).
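For undulating terrain, the calculation reduces to a maximum-minus-minimum difference per building. The sketch below assumes the DSM and DTM have already been sampled at the footprint vertices; the function name and the sample values are hypothetical.

```python
def undulating_terrain_height(dsm_vertex_z, dtm_vertex_z):
    """Estimate a building height on undulating terrain.

    dsm_vertex_z: DSM elevations sampled at the footprint vertices.
    dtm_vertex_z: DTM elevations sampled at the same vertices.
    The roof is taken as the maximum DSM value and the ground as the
    minimum DTM value, as described in the text.
    """
    if not dsm_vertex_z or not dtm_vertex_z:
        raise ValueError("both vertex elevation lists must be non-empty")
    roof_elevation = max(dsm_vertex_z)
    ground_elevation = min(dtm_vertex_z)
    return roof_elevation - ground_elevation


# Hypothetical vertex samples for one building (metres above datum).
dsm_z = [71.0, 73.5, 72.5, 70.5]
dtm_z = [62.0, 61.5, 62.0, 62.5]
print(undulating_terrain_height(dsm_z, dtm_z))
```

Note that with a coarse DTM the minimum may sit below the true ground level at the building, which would inflate the estimated height; this is one motivation for the higher resolution DTM scenario considered later.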

Technical Validation of Building Height (Foundation Workflow)
For Nottingham, our building heights were compared with the building height values provided by the OSGB MasterMap [33]. The computed heights of 15,000 buildings in Nottingham were compared with the corresponding building height attributes (BHA) [33] of the OSGB MasterMap for the city, using arithmetic differencing. Structured Query Language (SQL) queries were then performed to count the instances of buildings for which height differences h were <1 m, 1 m < h ≤ 2 m, 2 m < h ≤ 5 m, and >5 m, together with the corresponding percentages. For Shanghai, a similar validation exercise was performed. However, for Shanghai, there are no openly available high resolution building height data. Therefore, to validate our results, we used the AW3D Enhanced product at 2 m spatial resolution. This product is stated to be derived from the Digital Globe WorldView satellites [44]. Building heights derived from the AW3D-30 DSM could then be cross-checked with the heights derived from this 2 m DSM, and the resultant height values refined (workflow chart steps 11 and 12). In total, 2027 buildings were used in this validation.
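The differencing and counting step can be illustrated as below. This is a Python stand-in for the SQL queries mentioned in the text, not a reproduction of them. The function name and band labels are hypothetical, and a difference of exactly 1 m, which the stated ranges leave unassigned, is placed in the lowest band as an assumption.

```python
def error_band_percentages(modelled, reference):
    """Percentage of buildings whose absolute height difference falls in each band."""
    if len(modelled) != len(reference) or not modelled:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = {"<=1 m": 0, "1-2 m": 0, "2-5 m": 0, ">5 m": 0}
    for modelled_h, reference_h in zip(modelled, reference):
        diff = abs(modelled_h - reference_h)
        if diff <= 1.0:  # assumption: exactly 1 m counts in the lowest band
            counts["<=1 m"] += 1
        elif diff <= 2.0:
            counts["1-2 m"] += 1
        elif diff <= 5.0:
            counts["2-5 m"] += 1
        else:
            counts[">5 m"] += 1
    total = len(modelled)
    return {band: 100.0 * n / total for band, n in counts.items()}


# Hypothetical heights in metres: modelled values versus reference values.
modelled = [5.0, 7.5, 10.0, 20.0]
reference = [5.5, 6.0, 13.0, 12.0]
print(error_band_percentages(modelled, reference))
```

The cumulative accuracy levels reported in the results (e.g., within +/−2 m) follow by summing the bands up to the chosen threshold.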

3D Foundation Model Enhancement
The foundation workflow (Section 2.2.1) produces a 3D city model that is globally replicable; however, it may be the case that higher resolution elevation data are openly available or could be procured within limited budgetary resources. These data could enhance the accuracy of 3D buildings in the model by computing the error factor for building heights. The error factor is the deviation of the height values generated in the foundation workflow from the height of the corresponding building obtained from high resolution data for each of the cities. Once computed, these values can be used to correct the building heights in other similar areas. For enhancement, a high resolution dataset needs to be available for a representative sample area of the AOI (Figure 3).
We used consistent 1 m interval categories of maximum building height for the polygon concerned (e.g., an approximation of a ridge height for pitched roof houses). This interval selection helps in generating a good correlation and is easy to apply to other similar areas. For the Nottingham case, most building heights were found to fall within the range of 2 m to 8 m, as calculated using the AW3D-30 dataset (flowchart steps 13 to 15). Regression equations with 1 m intervals were therefore created for this range (e.g., seven unique categories of building height: 2 ≤ h ≤ 3 m, 3 m < h ≤ 4 m, …, and 7 m < h ≤ 8 m). These 1 m ranges were chosen because they provide improved correlation over other ranges. To obtain the regression equations, both the ALOS-derived heights and the high resolution derived heights were exported to Excel, scatter plots were created, and a linear regression equation was derived from each (flowchart step 14). The regression equations derived from the different ranges were then employed to correct building heights for all instances of that category found within the AW3D-30 dataset, both within and outside the high resolution sample area (flowchart steps 16 to 18). The technical validation of the enhanced model was carried out in the same way as for the foundation model, over exactly the same buildings and using the same data.
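The per-category correction can be sketched as follows. This is a minimal illustration under stated assumptions: ordinary least squares replaces the spreadsheet trend line, the regression predicts the high resolution height from the AW3D-30 derived height, and heights outside the fitted range or in categories without a fitted model are left unchanged. All function names and sample values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bin_index(height, low=2, high=8):
    """Lower edge of the 1 m category: [2, 3], (3, 4], ..., (7, 8]; None if outside."""
    if height < low or height > high:
        return None
    if height == low:
        return low
    return math.ceil(height) - 1


def fit_bin_models(sample_pairs, low=2, high=8):
    """Fit one line per category from (aw3d_height, reference_height) pairs
    taken from the high resolution sample area."""
    by_bin = {}
    for aw3d_h, ref_h in sample_pairs:
        b = bin_index(aw3d_h, low, high)
        if b is not None:
            by_bin.setdefault(b, []).append((aw3d_h, ref_h))
    models = {}
    for b, pairs in by_bin.items():
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        if len(set(xs)) >= 2:  # a line needs at least two distinct x values
            models[b] = fit_line(xs, ys)
    return models


def correct_height(height, models, low=2, high=8):
    """Apply the category's regression; leave the height unchanged otherwise."""
    b = bin_index(height, low, high)
    if b is None or b not in models:
        return height
    slope, intercept = models[b]
    return slope * height + intercept


# Hypothetical sample pairs, all in the 2 to 3 m category.
sample = [(2.0, 3.0), (2.5, 3.5), (3.0, 4.0)]
models = fit_bin_models(sample)
print(correct_height(2.8, models), correct_height(9.0, models))
```

Fitting separate lines per 1 m category lets the correction vary with building height, at the cost of needing enough sample buildings in every category.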

Nottingham
After obtaining the foundation 3D model (Figure 4 shows a sample area) for Nottingham (i.e., AW3D-30 derived building heights), we compared these building heights with MasterMap BHA to assess the accuracy of this preliminary result. This revealed that 27.7% of all buildings fall within the accuracy level of +/−1 m elevation, and 51.45% and 84.47% within +/−2 m and +/−5 m, respectively. About 15.53% of buildings had errors exceeding +/−5 m. When we compared both sets of height values, it was observed that larger height differences occurred in the case of taller buildings. The percentages of buildings falling within each error range are shown in Figure A1 (Appendix A). The low- and medium-rise buildings showed relatively good correlation with the MasterMap BHA values.
In the application of the accuracy enhancement method by way of a sample of high-resolution elevation data, it was determined that the majority of building heights fall within the range of 2 m to 8 m (established using the AW3D-30 dataset). Hence, regression equations with a 1 m interval were created for this range of 2 m to 8 m in order to enhance the accuracy of the foundation 3D model (Figure A2). This 1 m interval was chosen to obtain good correlation between the two datasets, namely the generated AW3D-30 height values and the high resolution LiDAR data. The regression equations derived from these categories are given in Table 2, and these were applied to obtain an enhanced 3D city model. Validation of the enhanced 3D model demonstrated that applying the regression equations to the foundation model improved its accuracy across the board. The proportion of buildings in the model having an accuracy level of +/−1 m increased from 27.7% to 32.81% (Table 3), the proportion within +/−2 m increased from 51.45% to 57.43%, the proportion within +/−5 m increased from 84.47% to 88.46%, and the proportion with an error above +/−5 m was reduced from 15.53% to 11.54%. It was noticed that even after enhancement, there was no significant increase in height value correlation in the case of taller buildings.
It is worth noting that, as stated in the methodology, we took the maximum elevation value within a polygon as the AW3D-30 DSM height. Heights generated from the minimum or average elevation value within the polygon were not as accurate.

Replacing GMTED 2010 Ground Elevation Data with High Resolution Ground Elevation Data
To understand how the GMTED 2010 DTM data impact on the quality of the foundation 3D model, the model was again constructed using high resolution LiDAR DTM as the ground elevation input along with the AW3D 30 m DSM. Validation using the MasterMap BHA values demonstrated that about 31.43% of total buildings achieved an accuracy within +/−1 m elevation and 60.14% were within +/−2 m (Table 4). Deviations for only 5.27% of all the buildings exceeded +/−5 m, but a significant proportion of the cases having this largest deviation were due to errors in the MasterMap BHA dataset or within AW3D-30 dataset (these errors were identified by cross-checking these individual sites with other datasets like Google Earth™, where open street views are available).

Shanghai
We considered only a sample of 2027 OSM buildings of the Huangpu District of Shanghai to generate the 3D model, as well as to calculate a correlation coefficient. The modelled building heights from the AW3D-30 DSM for Huangpu District were compared with the commercial 2 m resolution DSM that was procured for the study area. Unlike Nottingham, Huangpu has very tall buildings (Figure 5), hence the range of differences between the real heights and the generated 3D building heights was higher than for Nottingham. It was observed that about 33% of buildings fall within the error range of +/−2 m and about 30% of buildings within an error range of +/−2 m to +/−5 m (see Table A1). The regression equations used to enhance the accuracy of the foundation model are given in Table 5.

Technical Validation of Enhanced 3D Model
It was observed from the validation results that the overall accuracy of the foundation 3D model improved with the accuracy enhancement method. The differences in the percentage of buildings within each accuracy level range before and after applying the accuracy enhancement method are given in Table 6. Higher rates of accuracy enhancement were observed for the lower ranges (i.e., +/−1 m and +/−2 m). As the difference between the actual height and the generated height increased, the level of accuracy enhancement decreased. For example, after accuracy enhancement, the total percentage within the +/−5 m range increased only from 62.26% to 64.54%, and there was no accuracy increase for the +/−10 m range (Table 6). For lower height deviations (1 or 2 m), we obtained a good accuracy increase through the regression-based correction, but for higher deviations (5 or 10 m), the accuracy improvement was relatively lower or null.
Figure 5. Sample of foundation 3D model generated from AW3D-30 data and classified according to the elevations, Shanghai (green colour represents low-rise buildings, brown colour represents medium-rise buildings, dark brown represents high-rise buildings).
Shanghai is characterized by high-rise buildings, hence the ranges considered for accuracy assessment were from +/−1 m to more than +/−10 m. For Nottingham, by contrast, the maximum range was +/−5 m, since the city is occupied by low-rise buildings.
The proportion of buildings having an accuracy of +/−1 m was low (17.66%) in the case of Shanghai, which increased to 28.3% after accuracy enhancement. This contrasts with an accuracy of 27.7% for Nottingham, or 32.81% after accuracy enhancement. While 64.54% of buildings were found to be within the accuracy range of +/−5 m for Shanghai, this was much higher for Nottingham at 88.46% (after enhancement in both cases). Further, even after accuracy enhancement, 20% of all the buildings in Shanghai's Huangpu District were found to have an error of +/−10 m in their modelled height.
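The accuracy ranges reported above can be illustrated with a short sketch. It assumes that the ranges are cumulative (a building within +/−1 m is also counted within +/−2 m and so on), which is consistent with the increasing percentages reported but is our reading rather than a stated definition. The function name and the sample heights are hypothetical and are not taken from the study data.

```python
from typing import Dict, Sequence


def share_within_tolerance(
    reference: Sequence[float],
    modelled: Sequence[float],
    tolerances: Sequence[float] = (1.0, 2.0, 5.0, 10.0),
) -> Dict[float, float]:
    """Return the percentage of buildings whose absolute height error
    is less than or equal to each tolerance (in metres)."""
    if len(reference) != len(modelled) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(m - r) for r, m in zip(reference, modelled)]
    n = len(abs_errors)
    return {
        tol: 100.0 * sum(1 for e in abs_errors if e <= tol) / n
        for tol in tolerances
    }


# Hypothetical heights in metres, for illustration only.
reference_heights = [10.0, 12.0, 30.0, 45.0, 80.0]
modelled_heights = [10.5, 13.5, 26.0, 52.0, 95.0]

shares = share_within_tolerance(reference_heights, modelled_heights)
for tol, pct in shares.items():
    print(f"within +/-{tol:g} m: {pct:.1f}% of buildings")
```

Running the same function on the heights before and after enhancement would give the kind of before/after comparison summarised in Table 6.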

Discussion
Three-dimensional building models form useful data inputs for many analytical tasks, but their creation typically relies upon time-consuming editing, expensive proprietary datasets, or both. Here, we present a simple method of generating 3D buildings from open data that can be applied globally. The results presented in this paper show that AW3D-30 DSM data provide more accurate results in the case of low- and medium-rise buildings, and that errors can be reduced through a calibrated enhancement process. Using OSM in combination with the medium-resolution AW3D-30 DSM, a set of building footprints with height information was created and its quality ascertained. We then evaluated enhancements to height accuracy through statistical analysis of a small sample area of high-resolution data (thus limiting expense where these data are not freely available).
The approach presented can be applied by any user who has 2D building footprint data, AW3D data, and terrain information (e.g., from GMTED2010). AW3D-30 is the most suitable open DSM for building height generation, in comparison with ASTER, SRTM, and TanDEM-X [21]. However, AW3D-30 DSM presents the challenge of mixed pixels: buildings in the AW3D-5 digital building height range with a ground footprint of approximately 30 m or less are split into adjacent 30 m resolution pixels, each with a lower height than the original [21]. Thus, one of the important advantages of using OSM together with AW3D-30 DSM is that it helps to avoid the issues of mixed pixels and provides more accurate individual building heights and shapes. To the authors' knowledge, this is the first attempt at combining OSM data with AW3D data to generate 3D models. We built upon previous studies that fused OSM with satellite-derived elevation data [24]; however, our study provides a method to generate 3D models for both flat and undulating terrain using open data, which makes it feasible to replicate globally on any kind of terrain. Our study also demonstrated ways to increase the accuracy of the generated 3D city models using a sample of high-resolution DSM and DTM data, and showed that using a high-resolution DTM for ground elevation extraction can result in more accurate building height values. We recommend the use of high-resolution digital terrain models (DTMs) wherever possible; in their absence, GMTED2010 data should be used as ground elevation for undulating terrain, and the mean elevation value can be used as ground elevation for flat terrain such as Shanghai. The study also highlighted the need for the geospatial community to generate a global, open-access, high-resolution DTM. The need for an open-access, global, high-resolution DEM was also highlighted by Schumann and Bates (2018) [45].
There are also initiatives like 'Open Topography', which facilitates community access to high-resolution topographic data [46]. These high-resolution data (metre to sub-metre scale) are derived from LiDAR and other technologies. Such free access to high-accuracy terrain data further highlights the extensive potential for generating highly accurate 3D city models using open data.
The accuracy assessment of the two distinctive cities shows that the 3D model developed using this methodology will have higher accuracy in cities like Nottingham, where the majority of the buildings are low-rise and where growth is relatively saturated, whereas in cities like Shanghai, where the percentage of very tall buildings is high, the accuracy will be reduced. In our study, for Nottingham, we could generate 27.7% of buildings with +/−1 m accuracy, 51.45% with +/−2 m accuracy, and 84.27% with +/−5 m accuracy. In Shanghai, the accuracy was much lower: the percentages of buildings within the accuracy levels of +/−1 m, +/−2 m, and +/−5 m were 17.66%, 32.96%, and 62.26%, respectively. The accuracy reduction in Shanghai is explained by the larger number of tall buildings compared to the city of Nottingham. It is significant to observe that the AW3D-30 DSM provides more accurate results for low- and medium-rise buildings, but exhibits relatively large errors in height for very tall buildings. This result echoes the findings of Alganci et al. [36], but contradicts the finding of Misra et al. [21]. Accuracy assessments of different DSMs by Alganci et al. [36] revealed that the AW3D-30 DSM performed worse for high-rise buildings compared to SPOT DSM and PHR DSM, and that AW3D-30 DSM has a high accuracy level in residential areas. In contrast, Misra et al. [21] reported that AW3D-30 is most suitable for observing buildings taller than 9 m in height. However, this is in comparison with ASTER- and SRTM-based building heights, which are less suitable for extracting building height variation [21]. In our study we considered all buildings with a height above 2 m and the results showed good accuracy. Hence, the presented method, even without any accuracy enhancement, will provide better accuracy in cities with low- and medium-rise buildings than in cities with high-rise buildings.
Applying our accuracy enhancement method (by way of a sample of high-resolution elevation data) improved the reliability of 3D models from open data: in Nottingham, the percentage of buildings within an accuracy level of +/−1 m rose from 27.7% to 32.81%, and within +/−2 m from 51.45% to 57.43%. However, this method corrects only systematic errors; random errors are not accounted for.
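The calibration idea can be sketched as follows. This is a minimal illustration that assumes a simple linear regression between modelled heights and reference heights from the high-resolution sample; the regression equations actually used are those in Table 5, and the linear form, function names, and sample values here are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def fit_linear_calibration(
    modelled_sample: Sequence[float],
    reference_sample: Sequence[float],
) -> Tuple[float, float]:
    """Ordinary least-squares fit of reference = slope * modelled + intercept,
    estimated on the small area where high-resolution data are available."""
    n = len(modelled_sample)
    if n < 2 or n != len(reference_sample):
        raise ValueError("need at least two paired samples of equal length")
    mean_x = sum(modelled_sample) / n
    mean_y = sum(reference_sample) / n
    sxx = sum((x - mean_x) ** 2 for x in modelled_sample)
    if sxx == 0:
        raise ValueError("modelled heights must not all be identical")
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(modelled_sample, reference_sample)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(
    heights: Sequence[float], slope: float, intercept: float
) -> List[float]:
    """Apply the fitted correction to heights outside the sample area."""
    return [slope * h + intercept for h in heights]


# Hypothetical sample in which the open-data heights are 2 m too low.
sample_modelled = [8.0, 18.0, 28.0, 38.0]
sample_reference = [10.0, 20.0, 30.0, 40.0]

slope, intercept = fit_linear_calibration(sample_modelled, sample_reference)
enhanced = apply_calibration([13.0, 23.0], slope, intercept)
print(slope, intercept, enhanced)
```

A fit of this kind removes a consistent bias such as the 2 m offset in the example, but it cannot reduce building-to-building scatter, which is why only systematic errors are addressed.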
Using OSM in combination with AW3D-30 DSM data has substantial potential for future scientific research due to the former's ever-growing size and the latter's global coverage [24,34–36,47]. Studies have reported a considerable increase in OSM building data in recent years. For example, from 2012 to 2017 alone there was a 20-fold increase in OSM building data in China [48]. Effective derivation of elevation values for OSM data will likely extend its utility [22]. However, the absence of a global completeness assessment may hamper the use of OSM for urban planning and development unless it is resolved [49]. One of the major concerns in using OSM data is quality. Most OSM data are provided by nonprofessionals, and hence both the coverage and the quality of the data are questionable [50][51][52]. Despite this disadvantage, OSM is a good source of 2D building data, especially where free 2D building data are unavailable, as in China, where authorized building data are not freely available [48]. Studies have also revealed that the rate at which OSM receives contributions from users continues to grow, complemented by collaborative mapping efforts amongst the OSM community to check and improve the quality of contributions [53].
AW3D-30 DSM also has considerable future potential, particularly for low- and middle-income countries, given its global coverage and open license. JAXA released the first version of the AW3D-30 DSM, with a horizontal resolution of approximately 30 m mesh, free of charge in May 2015. This dataset was generated from the DSM dataset (5 m mesh version) of the precise global digital 3D map "ALOS World 3D" (AW3D), which was the world's first and most precise 3D map covering all global land scales with a 5 m mesh [37]. Although the AW3D-30 DSM had a 30 m grid spacing, it could be deduced that this was due to the acquisition of strong signals from the original 5 m DSM, which was produced from the 2.5 m images [36]. In March 2017, version 1.1 was released, filling the void height values with existing DEMs in cloud and snow pixels between 60° north and 60° south. In April 2018, AW3D was upgraded to version two [54]. Continuous enhancements of AW3D-30 DSM are expected, improving its future utility.
Thus, one of the great advantages of our methodology is that 3D models can be generated from any 2D building data in combination with any DSM, not only OSM and AW3D-30 DSM data. Wherever any 2D building data are available, the user will be able to generate the building elevation in combination with DSM data. Currently, only AW3D provides free DSM data. Even though ASTER DEM and SRTM provide elevation data, they are not surface models, hence those datasets are not usable to generate 3D building elevation. However, in the future there will likely be higher resolution DSMs. LiDAR DSM and ICESat-2 data are examples. Many countries are already providing accurate LiDAR DSM data. For example, LiDAR DSM data are already available for about 70% of England from the UK Environment Agency [41]. ICESat-2 (Ice, Cloud, and land Elevation Satellite) is an ambitious mission from NASA, which will provide a global distribution of geodetic measurements of both the terrain surface and relative canopy heights, and it will also survey urban areas [55]. Further, Global Ecosystem Dynamics Investigation (GEDI) LiDAR from NASA, with its dense track sampling and precise geolocation, forms the basis of an important dataset of ground control points to validate and calibrate global and regional DEMs and serves as a reference for surface elevation change [56]. Thus, we hope that when more accurate DSMs become available, they will enable the user to produce more accurate 3D models with better shape descriptions of buildings, especially roof modelling, thereby generating higher LODs using the defined methodology. Knowing the nature of the terrain in the modelling area is a factor in our method.
For cases with flat terrain (e.g., Shanghai), the mean ground elevation is deducted from the DSM data to obtain the building height, whereas in cases of undulating terrain (e.g., Nottingham), terrain elevation can be obtained from multiple sources, such as contour topographic data or satellite-based sources like GMTED2010 and LiDAR DTM.
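The two terrain cases can be sketched as below. The sketch assumes that one DSM elevation per building footprint has already been extracted (the extraction step is not shown), and that negative differences are clamped to zero; the clamping, the function names, and the sample values are illustrative assumptions rather than part of the published workflow.

```python
from statistics import mean
from typing import List, Sequence


def heights_flat_terrain(
    dsm_values: Sequence[float], ground_samples: Sequence[float]
) -> List[float]:
    """Flat terrain: subtract a single mean ground elevation from the
    per-building DSM elevation."""
    ground = mean(ground_samples)
    # Clamp at zero so that noise below ground level gives no negative height
    # (an assumption of this sketch).
    return [max(0.0, z - ground) for z in dsm_values]


def heights_undulating_terrain(
    dsm_values: Sequence[float], dtm_values: Sequence[float]
) -> List[float]:
    """Undulating terrain: subtract the terrain elevation sampled at each
    building (e.g., from GMTED2010 or a LiDAR DTM) from its DSM elevation."""
    if len(dsm_values) != len(dtm_values):
        raise ValueError("need one DTM value per DSM value")
    return [max(0.0, s - t) for s, t in zip(dsm_values, dtm_values)]


# Hypothetical elevations in metres above the vertical datum.
flat = heights_flat_terrain([14.0, 34.0, 4.0], ground_samples=[4.0, 5.0, 6.0])
hilly = heights_undulating_terrain([60.0, 75.0], [52.0, 61.5])
print(flat)   # building heights over a 5 m mean ground level
print(hilly)  # building heights over per-building terrain elevations
```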
The method presented in this paper affords the development of 3D models with LOD1 for any urban setting globally. High-resolution 3D datasets with higher LODs are, of course, possible but are very expensive to produce, and many applications do not require very precise height datasets. Often, a model with LOD1 data is enough. Studies show that LOD1 models provide relatively high information content and usability compared to their geometric detail [57,58]. The LOD1 model is the simplest volumetric 3D city model and is fundamentally considered coarse and inferior to an LOD2 model. However, it may be more valuable than an LOD2 model in certain scenarios, especially when a finer footprint is more useful than the acquired roof shape [59]. Examples of such cases include: climate change and urban climate modelling, property registration, energy modelling [60], energy demand estimation [61,62], shadowing simulations [63,64], navigation, estimation of noise pollution [65], design of urban green spaces, crisis management, vulnerability assessments for disaster mitigation and management, flood simulation [66], wind comfort analysis [67], global change assessments [68,69], and visualisation [70]. Computation of the net internal area of a building is another application area of LOD1 data, useful for energy estimations, real estate valuation, and population counts [71][72][73][74].
LOD requirements are task-specific and data volume-dependent [75][76][77][78]. Should LOD1 be appropriate, the method presented here will also allow users to generate data in a cost-effective manner. Indeed, studies have attempted to assess the possibilities of 3D model generation from the OSM data used here and have already identified the huge potential of OSM for fulfilling the requirements for CityGML LOD1 [79][80][81]. Further, free and open earth observation data (e.g., Landsat and Sentinel) offer great potential for large-area mapping of human settlements [68]. As our method relies on open datasets, we hope that it will be of great use in developing and low-income countries to generate 3D data at no cost and with minimal effort. Furthermore, as this method uses freely downloadable open source datasets, it helps to save time and effort. Generating 3D data is usually tedious and time-consuming and also requires a lengthy data procurement process. Many applications, like hazard and risk management or crisis management, require faster results, and our data generation technique will be very useful in these circumstances.
As this study intended to develop LOD1 models, we did not consider topological errors. If the 2D topological relationships between the footprints are not taken into account, the resulting 3D city models will not necessarily be topologically consistent (i.e., primitives shared by 3D buildings may be duplicated and/or intersect, building parts may overlap, etc.) [82][83][84]. Models with topological inaccuracies often cannot be accepted by downstream analytical applications that demand 2-manifold exterior shells [82,83]. However, our objective was to develop LOD1 3D city models, which do not require higher levels of accuracy. We removed all incomplete and irregular buildings after creating the 3D city model. However, we did not check for any minor errors, nor for topological accuracy. We used 2D polygons from OSM; hence, if there are any topological errors in this dataset, these errors will be reflected in our results. We recommend consideration of topology should higher accuracy in resultant models be required (i.e., LOD2+).
One of the main disadvantages observed in using AW3D-30 data was the accuracy limitation with high-rise buildings. As the accuracy for very tall buildings (more than 100 m) was found to be lower in AW3D-30 DSM, building height data from websites like Skyscraper (which publishes the tall building information data from the Council on Tall Buildings and Urban Habitat) can be used to replace the height values of these buildings, thereby increasing overall accuracy. Further, as the accuracy level reduces with the increasing percentage of tall buildings, it would be advantageous to know the characteristics of a particular city before applying this methodology. Further, though we used AW3D-30 DSM data that were published in 2017, this dataset utilized 2011 satellite data as base data. Hence, there could be accuracy differences for buildings that were constructed after 2011. While using this method, it is also recommended to cross-check building elevations with low height values for larger 2D footprints, as tall buildings may have a large, low podium. This can be done by visual interpretation of Google Earth satellite images. The original AW3D-30 DSM has some data-void regions, and these values are filled with the values from adjacent pixels [44]. Some accuracy difference could therefore arise from this procedure. Large digitization errors or shifts in the 2D building footprints can also result in the misrepresentation of height information.
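Two of the checks suggested above can be sketched in a few lines: replacing modelled heights with published heights for very tall buildings, and flagging large footprints with low modelled heights for visual inspection. The 100 m threshold follows the text; the record layout, the area and height thresholds used for flagging, and all sample values are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Sequence


@dataclass
class Building:
    building_id: str
    footprint_area_m2: float
    modelled_height_m: float


def apply_reference_heights(
    buildings: Sequence[Building],
    reference_heights: Mapping[str, float],
    tall_threshold_m: float = 100.0,
) -> Dict[str, float]:
    """Use a published height where one exists and exceeds the threshold;
    otherwise keep the modelled height."""
    corrected: Dict[str, float] = {}
    for b in buildings:
        published = reference_heights.get(b.building_id)
        if published is not None and published > tall_threshold_m:
            corrected[b.building_id] = published
        else:
            corrected[b.building_id] = b.modelled_height_m
    return corrected


def flag_podium_candidates(
    buildings: Sequence[Building],
    min_area_m2: float = 5000.0,  # hypothetical threshold
    max_height_m: float = 15.0,   # hypothetical threshold
) -> List[str]:
    """List buildings with a large footprint but a low modelled height,
    which may be towers standing on a low podium."""
    return [
        b.building_id
        for b in buildings
        if b.footprint_area_m2 >= min_area_m2
        and b.modelled_height_m <= max_height_m
    ]


# Hypothetical records, for illustration only.
buildings = [
    Building("a", 800.0, 12.0),
    Building("b", 6500.0, 9.0),
    Building("c", 2400.0, 74.0),
]
corrected = apply_reference_heights(buildings, {"c": 180.0})
to_inspect = flag_podium_candidates(buildings)
print(corrected, to_inspect)
```

The flagged identifiers are only candidates; the visual check against satellite imagery described above remains the deciding step.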

Conclusions
This paper demonstrated a globally replicable methodology to generate 3D buildings from open data. Generation of 3D buildings exclusively using open data was the highlight of this paper. This method is cost-effective, making it particularly attractive to users in low- and middle-income countries, where free 3D building data are not available. Further, this largely automated method requires minimal time to generate 3D city models, and also has flexibility for improvement in accuracy should higher resolution data be available. Given the use of relatively low-resolution open data, this methodology will be of particular relevance to studies that do not require high-resolution 3D models, such as global environmental change studies, global climate change and urban climate modelling, disaster vulnerability models, and energy models. Real-world simulations for 3D games may be another potential area of interest.
Finally, the methodology presented in this paper can, in the future, be employed in conjunction with alternative 2D input data, for example as quality checked OSM data become more abundant, and with more accurate height data, as upgrades to AW3D-30 are published, or other sources become available, such as those derived from LiDAR measurements.

Acknowledgments: The authors acknowledge the Leverhulme Trust and the UK Engineering and Physical Sciences Research Council (EPSRC) for providing funding to allow this research to take place. Authors also acknowledge School of Geography, University of Nottingham for providing resources and facilities.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Figure A1. Percentage of buildings under each range for foundation 3D model in Nottingham.