2.1. CityGML
CityGML is a standard that facilitates data representation, storage, and exchange for the 3D modelling of cities [11]. The current version contains modules for datasets including vegetation, transportation, relief, water bodies, and city furniture.
In the CityGML standard, the Vegetation module contains two types of vegetation representation. The first class is called SolitaryVegetationObject (SVO) and represents individual vegetation objects, such as boulevard trees. The second class is called PlantCover and represents areas with dense and continuous coverage, such as forests [12]. An SVO may be geometrically implicit and represented by LODs of 1 to 4 with increasing detail. A PlantCover is represented by either a MultiSolid or MultiSurface object [12].
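The two vegetation classes described above can be sketched as a small data model. This is only an illustration under stated assumptions, not the CityGML schema: the field names `object_id` and `implicit_geometry` are hypothetical, and the checks encode only what the text states (SVO LODs of 1 to 4, and PlantCover geometry as either MultiSolid or MultiSurface).

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class SolitaryVegetationObject:
    """An individual vegetation object, such as a boulevard tree."""
    object_id: str                    # hypothetical identifier field
    lod: int                          # 1 to 4, with increasing detail
    implicit_geometry: bool = False   # hypothetical flag for implicit geometry

    def __post_init__(self) -> None:
        if not 1 <= self.lod <= 4:
            raise ValueError(f"SVO LOD must be between 1 and 4, got {self.lod}")


@dataclass
class PlantCover:
    """An area with dense, continuous coverage, such as a forest."""
    object_id: str                    # hypothetical identifier field
    geometry_type: Literal["MultiSolid", "MultiSurface"]

    def __post_init__(self) -> None:
        if self.geometry_type not in ("MultiSolid", "MultiSurface"):
            raise ValueError(f"Unsupported geometry type: {self.geometry_type}")


# Illustrative instances; the identifiers are made up.
tree = SolitaryVegetationObject("tree-001", lod=2, implicit_geometry=True)
forest = PlantCover("forest-001", geometry_type="MultiSurface")
```

Note that neither class carries attribute data beyond geometry and LOD in this sketch.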
The Transportation module in CityGML represents transportation corridors such as roads, railways, and public squares. The main TransportationComplex class differentiates TrafficAreas, such as roads or sidewalks, from AuxiliaryTrafficAreas, such as boulevards. The higher the LOD, the greater the distinction between these spaces.
Terrain is modelled in the CityGML format as a ReliefFeature consisting of one or more entities called ReliefComponents. These entities may include a TIN (Triangulated Irregular Network), which is a network of non-overlapping triangles formed by the interconnection of irregularly spaced points [13]. Other representation methods include mass points (MasspointRelief), break lines (BreaklineRelief), and grids (RasterRelief). A ReliefFeature and its ReliefComponents may have different LODs.
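The relationship between a ReliefFeature and its ReliefComponents can be sketched as follows. This is a minimal illustration, not the CityGML encoding: the field names (`lod`, `points`, `triangles`, `components`) and the helper `component_lods` are assumptions introduced here, and only two of the four component types are shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class ReliefComponent:
    """Base class for one terrain representation; each component has its own LOD."""
    lod: int


@dataclass
class TINRelief(ReliefComponent):
    """Triangles over irregularly spaced points, stored as index triples."""
    points: List[Point3D] = field(default_factory=list)
    triangles: List[Tuple[int, int, int]] = field(default_factory=list)


@dataclass
class MasspointRelief(ReliefComponent):
    """A plain set of elevation points."""
    points: List[Point3D] = field(default_factory=list)


@dataclass
class ReliefFeature:
    """Terrain made of one or more components, possibly at different LODs."""
    lod: int
    components: List[ReliefComponent] = field(default_factory=list)

    def component_lods(self) -> List[int]:
        # Distinct LODs present among the components, in ascending order.
        return sorted({c.lod for c in self.components})


# Illustrative terrain: one TIN triangle at LOD 2 plus mass points at LOD 1.
tin = TINRelief(
    lod=2,
    points=[(0.0, 0.0, 10.0), (5.0, 0.0, 11.0), (0.0, 5.0, 12.5)],
    triangles=[(0, 1, 2)],
)
masspoints = MasspointRelief(lod=1, points=[(2.0, 2.0, 11.0)])
terrain = ReliefFeature(lod=2, components=[tin, masspoints])
```

Here `terrain.component_lods()` lists the LODs present among the components, which makes the point that a feature and its components need not share a single LOD.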
A series of surface boundaries define WaterBody geometry. The WaterGroundSurface represents the bottom of the water body, that is, the submerged surface such as a riverbed, whereas the WaterSurface represents the upper surface, which is the boundary between the water body and the atmosphere [14]. LOD 0 and LOD 1 represent the lowest level of illustration and have a high level of generalization [12]. A combination of geometries may be used to represent the WaterBody at each LOD.
Previously, CityFurniture objects could be represented by explicit or implicit geometry. Explicit geometry refers to using a specific, instanced object, whereas implicit geometry refers to using a prototypical object repeatedly within the model [12]. No additional information regarding the LOD is provided for the CityFurniture module.
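The idea of implicit geometry, where one prototypical object is reused many times, can be illustrated with a short sketch. It is a simplification under stated assumptions: the class names `Prototype` and `ImplicitInstance` are hypothetical, and placement is reduced to an anchor point and a uniform scale rather than a full transformation.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Prototype:
    """Geometry stored once, in local coordinates."""
    name: str
    vertices: Tuple[Point3D, ...]


@dataclass
class ImplicitInstance:
    """A placement of a shared prototype (simplified: translation and uniform scale)."""
    prototype: Prototype
    anchor: Point3D
    scale: float = 1.0

    def world_vertices(self) -> List[Point3D]:
        # Scale the local vertices, then shift them to the anchor point.
        ax, ay, az = self.anchor
        return [
            (ax + self.scale * x, ay + self.scale * y, az + self.scale * z)
            for x, y, z in self.prototype.vertices
        ]


# One bench footprint, defined once and placed twice; the coordinates are made up.
bench = Prototype(
    name="bench",
    vertices=((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.5, 0.0), (0.0, 0.5, 0.0)),
)
bench_a = ImplicitInstance(bench, anchor=(10.0, 5.0, 0.0))
bench_b = ImplicitInstance(bench, anchor=(20.0, 5.0, 0.0))
```

Both instances refer to the same `Prototype` object, so the geometry is stored once. An explicit representation would instead store the full vertex list for each bench.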
Although CityGML is just one standard for modelling UDTs, its framework and modules provide a starting point for working with the organization of datasets. CityGML utilizes a LOD for each of its modules and corresponding datasets. The LOD of building datasets is the only one that is explicitly defined. The LOD is referenced in every other module, but its definition in each remains vague. This vagueness has drawn attention and prompted proposals for refinement. In the following section, we review peer-reviewed publications on this topic and summarize research gaps in LOD refinement for buildings, vegetation, roads, terrain/relief, water bodies, and city furniture.
2.2. Related Work
This study reviewed the existing literature on LOD refinement across various urban datasets. Searches for academic papers concerning the LOD refinement of other datasets included the following keywords/phrases: LOD, Level of Detail, Definition, Refined, 3D Model, Urban Digital Twin, Digital Twin City, Trees, Vegetation, Solitary Vegetation Object, Plant Cover, Road, Intersection, Transportation, Terrain, Relief, Waterbody, Water, Surface, and CityFurniture. We reviewed around seventy-seven sources, ten of which were explicitly related to refining the LODs of datasets. These related works contributed to the refinement and formalization of the LODs of other 3D model datasets, such as buildings [15], trees and vegetation [16], roads and transportation [17,18,19,20], terrain [13], water [21,22], and city furniture [23].
Ambiguities in CityGML’s building LOD definitions have prompted efforts for supplementary refinement. The authors of [15] refined building LOD definitions in the context of the CityGML format. They also identified the lack of precise LOD definitions in the CityGML format, which allowed for ambiguity. Ambiguity arose when models with differing architectural features were considered part of the same LOD. The intent behind refining the building LOD definition was not to extend CityGML but to provide a supplementary specification that reflects current practices and concepts while also resolving the ambiguities identified.
LOD definitions for Solitary Vegetation Objects (SVOs) are less common than those for buildings. Reference [16] is the only article that provided use cases and a definition framework for SVOs in the CityGML Vegetation module. Ortega-Cordova identified five categories of applications for urban vegetation models and data: managing, maintaining, sustaining urban vegetation, urban and landscape planning, and policy making. The use cases within these categories were used to inform the LOD definition refinement. Ortega-Cordova provided a refined framework for the geometric LODs of SVO modules, but did not accommodate the inclusion of potential attribute data.
Refinements in road and transportation LOD definitions have focused on geometric representation and use-case applicability. The author of [17] focused on roads and the associated vehicular transportation portion of the CityGML Transportation module. Boersma’s LOD refinement was built on existing research by [18,19]. Additionally, use cases for road data were provided as a foundation for forming the refined definitions. Boersma identified three primary categories of use cases for road and transportation data: transport and traffic models, navigation, and road maintenance. Based on street design, the authors of [20] emphasized the importance of the traffic area, driving lanes, and traffic logic in modelling roads. The author of [17] refined the geometric representation of traffic areas and driving lanes but did not include the attribute data associated with these components or with traffic logic.
Efforts to refine terrain LOD definitions focus primarily on geometric representation. The authors of [13] provided a method for refining terrain LOD definitions. Based on the existing state of terrain representation in the CityGML standard, they emphasized the relationship between the Construction module, which includes Bridges and Tunnels, and the Relief module. They also explained in detail the method behind their proposal. Although the authors of [13] redefined the LOD associated with the geometric representation of terrain, other attribute data, such as geological composition or land use, are not included in their work. The geometry-focused approach of [13] neglects other important features and attributes that may be associated with terrain but are only sometimes geometrically represented, such as materials and geologic features.
No published peer-reviewed articles were found on refining WaterBody LOD definitions. A WaterBody is defined as a significant and permanent or semi-permanent accumulation of surface water. Examples may include rivers, canals, lakes, and basins [14]. Existing CityGML documentation was reviewed to understand the components needed for a WaterBody object. However, the literature on hydrology [21] identified difficulties in the modelling and simulation process that resulted from a lack of clearly defined LODs for water bodies. This is largely due to computing needs, which grow as the LOD increases in complex 2D and 3D hydrological simulations in a UDT (or a city information model, CIM). In addition, while there is no direct literature on WaterBody LOD refinement, the authors of [22] highlighted that accurate flood simulation and assessment in a UDT requires clearly defined LODs of not only water but also other interdependent urban structures such as buildings and road networks.
CityFurniture is one of the most open-ended datasets, as different cities will have different urban infrastructure objects. CityFurniture objects are immovable objects that include decorations, explanations, or controls [14]. Examples may include street signs, traffic signals, streetlamps, benches, and fountains [14]. No published peer-reviewed articles were found on refining CityFurniture LOD definitions. The literature on street design and mobility analysis [23], however, highlighted the importance of CityFurniture in mapping and managing public streets, as the geometric condition and materials of CityFurniture can significantly impact the quality and experience of streetscapes. Such data, and an LOD framework to organize these datasets, are nonetheless lacking and not well aligned with other urban elements. With advancements in sensing technologies such as mobile laser scanning (MLS), increasingly rich and accurate datasets are expected to become more readily available. This study anticipates the influx of such high-dimensional datasets and proposes an LOD framework that integrates both geometric and semantic information for city furniture, addressing the growing need for comprehensive data representation in UDT models.
In summary, LOD refinements from the existing literature try to retrofit their corresponding dataset within the same definition framework as [15], and therefore similar shortcomings are experienced in each urban element reviewed above. Although significant attention is given to sorting geometric differences, the authors of [15] acknowledged the limited effort in associating embedded attributes with geometries in their refinement in favour of delineating the geometry divides between their LOD families. Embedded attributes refer to data that are typically associated with and may influence other geometric data but are not necessarily geometrically represented in 3D models. The existing literature addresses the ambiguity in graphic depictions mainly associated with geometry, but all lack attribute data (see Table 1).
Excluding attributes in the refinement of LOD definitions hinders research that requires comparable and consistent datasets. Most datasets from municipal and federal governments feature non-geometric data that would not fit these existing definition refinements. While some are geometrically accurate, the existing definition refinements do not allow for evaluations and comparisons of datasets among cities because few attributes are defined using a standardized framework.
Our work proposes to formulate a family of definitions considering both geometric and attribute data. These definitions are intended to be applicable to all datasets by determining divisions in LODs typically experienced across all UDT elements and their associated datasets. Universally applicable definitions for datasets would also enable individuals to recreate studies or representations of UDTs to ensure all relevant data are present. These definitions would also assist data providers in evaluating their data infrastructure, and therefore further strategizing and prioritizing updates or upgrades. This level of evaluation is not possible with the current ambiguously defined LODs of datasets.