A Systematic Literature Review of Physics-Based Urban Building Energy Modeling (UBEM) Tools, Data Sources, and Challenges for Energy Conservation

Abstract: Urban building energy modeling (UBEM) is a practical approach in large-scale building energy modeling for stakeholders in the energy industry to predict energy use in the building sector under different design and retrofit scenarios. UBEM is a relatively new large-scale building energy modeling (BEM) approach which raises different challenges and requires more in-depth study to facilitate its application. This paper performs a systematic literature review on physics-based modeling techniques, focusing on assessing energy conservation measures. Different UBEM case studies are examined based on the number and type of buildings, building systems, occupancy schedule modeling, archetype development, weather data type, and model calibration methods. Outcomes show that the existing tools and techniques can successfully simulate and assess different energy conservation measures for a large number of buildings. It is also concluded that standard UBEM data acquisition and model development, high-resolution energy use data for calibration, and open-access data, especially in heating and cooling systems and occupancy schedules, are among the biggest challenges in UBEM adoption. UBEM research studies focused on developing auto-calibration routines, adding feedback loops for real-time updates, future climate data, and sensitivity analysis on the most impactful modeling inputs should be prioritized for future research.


Introduction
Urban building energy modeling (UBEM) is a term used in the literature to refer to different types of simulations that are not necessarily related to the buildings in the "urban" area. Urban-scale, large-scale, and district-scale building energy modeling have been used in the literature interchangeably, but they all refer to the modeling and simulation of a group of buildings to study their energy use and behavior. Large-scale building energy modeling (herein UBEM) has been trending for the past decade (Figure 1) due to its effectiveness in providing a significant amount of data on a large group of buildings that could be used by different stakeholders in the energy sector such as energy policymakers, energy companies, building portfolio managers, and researchers. Making the modeling and simulation process more efficient in speed, flexibility, cost, and accuracy is critical, and this requires a thorough understanding of the current state of and future trends in UBEM tools and techniques.
UBEM models are developed for different purposes, such as identifying the buildings with high energy use intensity (EUI), regions with high energy consumption, microclimate impact on building energy use, and studying the energy saving of different energy conservation measures (ECMs) on a large scale. Finding the most effective ECMs is one of the essential end goals in energy modeling projects; however, not all the UBEM approaches are suitable for this purpose. This highlights the importance of such systematic literature reviews. The criteria for full-text review selection include research studies using physics-based simulation engines, suggesting a large-scale data collection method for energy simulation inputs, or proposing an archetype development method applicable to large-scale physics-based simulation techniques. Although UBEM could be used in life cycle assessment [2,3] to include the embodied energy of buildings, this paper is focused on operational energy. Other aspects of UBEM, such as data visualization, although very important and studied by other researchers using tools such as Quantum Geographic Information System (QGIS) [4], are not reviewed in this paper since they do not directly contribute to physics-based simulation outcomes for ECM analysis.
From 173 full-text reviews, 88 papers are selected for in-depth analysis, plus seven papers from the snowballing process to extract the following information:

• Technique development or usage of archetype-building models;
• Proposition of challenges and suggestions for future UBEM-related studies.

Literature Review
Physics-based UBEM could include multiple aspects and steps proposed by researchers such as data preprocessing (e.g., geometric data, non-geometric data, weather data, and energy use), model generation, simulation, calibration, and application (e.g., urban planning, stock-level carbon reduction, building-level recommendations, and building-to-grid integrations) [5]. Figure 3 proposes a five-step data extraction/presentation approach for UBEM studies. This data and metadata adoption structure helps to establish a systematic and standard practice for providing and collecting data in UBEM projects.
The scale of the UBEM study could determine the necessary tools and scope of the project. Oraiopoulos and Howard (2022) [6] performed a systematic review and adopted a statistical approach in UBEM to use micro, meso, and macro scales to categorize the UBEM projects based on the number of buildings (Figure 4). The literature review shows that most physics-based UBEM case studies cover under 100,000 buildings (Figure 5). More specifically, 46 out of 61 eligible studies model fewer than 10,000 buildings. This includes 86% of the eligible case studies reviewed in this paper. About 50% of the case studies model fewer than 1000 buildings (i.e., microscale), and only about 3% of case studies model more than 500,000 buildings. The methods and tools used in the reviewed papers could be scaled up to model and simulate a large number of buildings, but lack of data could hinder that. Section 4 reviews such challenges in physics-based UBEM.
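As an illustration of how case studies could be binned by scale, the following minimal Python sketch assigns a scale label from the number of buildings and computes the share of studies per label. Only the microscale threshold (fewer than 1000 buildings) is taken from the text above; the meso/macro boundary `MESO_UPPER`, the function names, and the example building counts are hypothetical placeholders and do not reproduce the definitions in [6].

```python
from collections import Counter

MICRO_UPPER = 1_000    # from the text: fewer than 1000 buildings is microscale
MESO_UPPER = 100_000   # ASSUMPTION: placeholder boundary, not taken from [6]


def classify_scale(n_buildings, micro_upper=MICRO_UPPER, meso_upper=MESO_UPPER):
    """Return 'micro', 'meso', or 'macro' for a case study of n_buildings."""
    if n_buildings < 0:
        raise ValueError("number of buildings must be non-negative")
    if n_buildings < micro_upper:
        return "micro"
    if n_buildings < meso_upper:
        return "meso"
    return "macro"


def scale_shares(building_counts):
    """Return the fraction of case studies that fall in each scale."""
    if not building_counts:
        raise ValueError("at least one case study is required")
    counts = Counter(classify_scale(n) for n in building_counts)
    total = len(building_counts)
    return {scale: counts[scale] / total for scale in ("micro", "meso", "macro")}


# Hypothetical building counts for five case studies (not data from this review).
example_counts = [120, 850, 4_500, 60_000, 750_000]
shares = scale_shares(example_counts)
```

Keeping the thresholds as parameters makes it straightforward to substitute the actual boundaries used in a given study.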
Researchers have studied the modeling and simulation techniques in UBEM and compared their advantages and disadvantages [7,8]. Boghetti et al. (2020) [7] compared two UBEM development approaches (i.e., physics-based vs. data-driven) and noted that data-driven models rely on many data points, whereas physics-based methods need longer simulation and preparation time. Because ECM evaluation is one of the areas of study in this paper, the selected papers are focused on physics-based modeling.
Studies on bottom-up physics-based urban building energy models might simulate multiple buildings independently, focus on microclimate effects (e.g., urban heat island (UHI)), or combine these two and make the necessary adjustments in UBEM to consider the microclimate effects (Figure 6). These effects could include the shading from surrounding objects, increased temperature of the urban area due to the UHI effect, the long-wave radiation from other buildings, or the Heating, Ventilation, and Air Conditioning (HVAC) system's heat release that could affect the urban climate and other buildings [9], increasing the outdoor air temperature by 2.8 °C in commercial neighborhoods. Luo et al. (2019) [10] showed that the thermal interaction between buildings in UBEM for a dense urban area with high-rise buildings based on the long-wave radiations could affect the heating and cooling loads by up to about 3.6%. Therefore, the microclimate could directly impact individual buildings' energy performance, especially in dense areas. However, this is not necessarily considered in all the UBEM studies; hence, the "individual building simulation" category needs to be identified and studied separately (Figure 6).
The interface between these two approaches includes UBEM studies where at least one component of microclimates such as the UHI effect or long-wave radiation between buildings is considered in the physics-based simulation. This could be done by tuning the weather file using an urban weather generator (UWG) [11] or using external and complementary tools to include additional interactions between buildings [12]. The majority of papers selected for full-text review in this paper could be categorized under the 'group of individual building simulation'.
Review papers study different aspects of physics-based UBEM. Table 1 shows the selected review papers, which could identify the main UBEM field of research. The identified areas include general review papers on tools, methods, and challenges in UBEM, occupant-centric studies, UBEM tools, data acquisition methods, classification approaches, energy-saving potentials, and the accuracy of UBEM. Other than the first category, there are limited review studies on different aspects of UBEM, especially on critical areas such as archetype development, data sources, acquisition techniques, calibration, and energy conservation evaluation.
Table 1. Review studies on urban building energy modeling.

| Category | Reference | UBEM Research Area |
|---|---|---|
| General review | [13] | Advancing urban building energy modeling through new model components and applications |
| General review | [14] | Bottom-up physics-based approaches in UBEM |
| General review | [15] | Information modelling for urban building energy simulation |
| General review | [16] | AUBEM modeling approaches and procedures |
| General review | [17] | State-of-the-art and prospects in urban building energy modeling |
| General review | [18] | Ten questions on urban building energy modeling |
| General review | [19] | The nascent field of urban building energy modeling |
| General review | [20] | UBEM methods and tools using qualitative and quantitative analysis |
| General review | [5] | Use cases in urban building energy modeling |
| UBEM tools | [21] | A comparison of available tools in urban building energy modeling |
| UBEM tools | [22] | UBEM tools |
| UBEM tools | [23] | UBEM tools for district-scale energy systems |
| Occupant-centric | [24] | Approaches, inputs, and data sources in occupant-centric urban building energy modeling |
| Occupant-centric | [25] | Occupant behavior in urban building energy models |
| Data acquisition | [26] | Data acquisition for urban building energy modeling |
| Data acquisition | [27] | GIS data extraction and visualization to support urban building energy modeling |
| Data acquisition | [28] | Infrared thermography in the built environment |
| Classification and archetype development | [29] | Archetype development strategies for energy assessment at the urban scale |
| Classification and archetype development | [30] | Developing a common approach for classifying building stock energy models |
| Energy conservation potentials | [31] | Energy saving potential for large-scale building |
| Energy conservation potentials | [32] | Estimating the energy-saving potential in national building stocks |
| Accuracy and calibration | [6] | Accuracy of urban building energy modeling |

Figure 7 shows the occurrence frequency of keywords and their link strength in UBEM research. Specific dependent keywords such as 'buildings,' 'building energy modeling,' and 'urban building energy modeling' have the highest use frequency and link strength. The review of the literature shows that (1) energy utilization, (2) energy efficiency, (3) urban planning, (4) energy management, and (5) urban planning are among the top independent keywords in UBEM research. It could be observed that critical aspects of UBEM such as 'calibration' are not studied as often as other aspects. Figure 8 shows that certain research areas of UBEM, such as climate change, greenhouse gas emissions, offices, housing, and retrofitting, are among the topics studied more recently (i.e., after 2020).

General Research Data in Physics-Based UBEM
UBEM's capability in providing energy-related data for a large number of buildings has made it an appealing tool and research topic globally. Depending on a given country's energy policies and goals, some countries have invested more in providing resources and conducting research. The papers reviewed in this article showed that departments and agencies in the U.S. such as the Department of Energy (DOE), the National Science Foundation, and the Office of Energy Efficiency and Renewable Energy have been among the active research sponsors in this field, followed by the National Natural Science Foundation of China and the National Research Foundation Singapore. Figure 9 shows the number of publications selected for this systematic literature review sponsored by the most active agencies. The affiliated universities with the highest number of publications and research activities in physics-based UBEM are shown in Figure 10, primarily located in North America, Europe, and East Asia.
Figures 11 and 12 show the publishers and journals/conferences with the highest number of publications related to physics-based UBEM. This shows that journal articles are the primary source, followed by conference proceedings, and that a limited number of journals house the majority of research studies.

Building Systems and Energy Modeling Inputs in UBEM
UBEM requires a large amount of data to model many buildings properly. One of the most common and convenient approaches to sourcing input data in UBEM development is to use prototype building data instead of measured data. BEM computer model inputs could be categorized into five types: (1) geometry and location, (2) HVAC and hot water systems, (3) building envelope systems, (4) schedules, and (5) weather data. Figure 13 shows that the geometrical data are the most accessible measured data in UBEM studies. This is followed by envelope, schedules, window-to-wall ratio (WWR), and HVAC systems. This indicates the lack of tools, methods, and open data on specific building systems and energy modeling inputs, including the HVAC and hot water system, WWR, schedules, and building envelope thermal properties for large-scale energy modeling (i.e., UBEM). The following five sub-chapters cover these categories in more detail.
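The five input categories, and the common practice of falling back on prototype data where measurements are missing, can be sketched as a small Python data structure. This is only an illustrative sketch: the class name, field names, and the prototype values are hypothetical and are not taken from any of the reviewed tools.

```python
from dataclasses import dataclass, fields, replace
from typing import List, Optional


@dataclass(frozen=True)
class BuildingInputs:
    """Per-building inputs grouped by the five categories named in the text."""
    building_id: str
    footprint_area_m2: float                           # (1) geometry and location
    n_stories: int                                     # (1) geometry and location
    hvac_type: Optional[str] = None                    # (2) HVAC and hot water systems
    wall_u_value: Optional[float] = None               # (3) envelope, W/(m2.K)
    wwr: Optional[float] = None                        # (3) window-to-wall ratio
    occupancy_schedule: Optional[List[float]] = None   # (4) schedules
    weather_file: Optional[str] = None                 # (5) weather data


def missing_inputs(building):
    """List the names of inputs that have no measured value."""
    return sorted(f.name for f in fields(building) if getattr(building, f.name) is None)


def fill_from_prototype(building, prototype):
    """Fill only the missing inputs from a prototype dict; measured values win."""
    updates = {
        f.name: prototype[f.name]
        for f in fields(building)
        if getattr(building, f.name) is None and f.name in prototype
    }
    return replace(building, **updates)


# Hypothetical building with measured geometry and WWR only.
measured = BuildingInputs("b1", footprint_area_m2=200.0, n_stories=3, wwr=0.35)
# Hypothetical prototype values (not from any standard or dataset).
prototype = {"hvac_type": "VRF", "wall_u_value": 0.5, "wwr": 0.40}
filled = fill_from_prototype(measured, prototype)
```

Tracking which fields came from measurement and which from a prototype makes the data gaps discussed in this section explicit for each building.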

Geometry and Location
Geometrical and location data are the most common measured inputs in UBEMs. Researchers use several methods or sources to obtain these data, typically open-access maps or computer vision methods. Remote sensing techniques such as light detection and ranging (LiDAR) [33], aerial drone-based images [34], and open-access maps [35] are among the common techniques. The Geographic Information System (GIS) and point-of-interest (POI) data are also commonly used by researchers to develop the geometry and determine the location of buildings [36]. Wherever the open-access map data are unavailable, newer methodologies such as deep-learning-based segmentation and digital surface modeling could be deployed to reconstruct the buildings' 3D models [37].

HVAC Systems
HVAC system data are among the least frequently measured inputs in UBEM studies and are mainly modeled based on the pre-populated archetype building data. There are examples of HVAC archetype development on a large scale that review and propose different archetype characteristics and categories for HVAC systems. For example, Kim et al. (2019) [38] categorized office buildings in Japan into 3960 groups and 44 segments based on the HVAC systems. The systems examined in this study include both centralized and decentralized variable refrigerant flow (VRF), air-source heat pump, absorption chiller, and boiler using either electricity or gas. Such comprehensive archetype development for HVAC systems could provide more accurate data for UBEM. HVAC systems are critical in BEM output accuracy and directly influence the ECM analysis results.
Another HVAC-related topic is the level of detail of UBEM in terms of the number of thermal zones. UBEM is a large-scale model that typically follows prototype building properties and could use either multi-zone or single-zone shoebox models. Researchers showed that for heating loads, the difference is insignificant, but for the annual energy use the difference between a shoebox and a detailed model could be up to about 9% [39]. Such findings depend on factors such as HVAC type and climate zone, but they still raise awareness of the impact of such simplifications in UBEM. Future research studies are necessary to evaluate and quantify this in more detail.

Building Envelope
Building envelope system data are among the most challenging inputs to measure in BEM, especially on a large scale and if the construction drawings are unavailable. This is more challenging for UBEM as a large-scale BEM tool, primarily due to the lack of public sources of data. Using infrared cameras and survey data are among the most common approaches in UBEM to obtain the building envelope system data [28], typically performed via remote sensing techniques such as drone-based methods [34,40].
These methods might provide enough data for U-Value/R-Value calculations; however, they do not necessarily provide measured specific heat properties, which are a critical input in buildings' thermal performance and delayed heat transfer calculations. Moreover, review papers on infrared thermography show that relevant studies mainly focus on the UHI effect, land surface temperature, remote sensing, and U-Value [28]. This shows that the main application of such techniques could be limited to urban design where the UHI effect or land surface temperature could be a factor. These do not necessarily and directly contribute to the existing physics-based UBEM modeling and simulation engines.
The counter-approach suggested by researchers is using probabilistic-based characterization or using open-access maps for WWR and selecting the U-Value of building envelope components [41] as opposed to remote sensing approaches such as LiDAR, drones [35], computer vision [42], or image processing on geotagged street view imagery data [43]. Researchers also propose other non-archetype approaches using machine learning with specific inputs such as building type, floor area, number of stories, volume, and shape factors to assign the thermal properties of the building envelope [44].
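To make the distinction between U-Value and specific heat concrete, the following sketch computes both the steady-state U-Value and the areal heat capacity of a layered wall using the textbook relations U = 1 / (R_si + sum(d_i / k_i) + R_so) and C = sum(rho_i * c_i * d_i). The layer properties are hypothetical illustration values, and the surface resistances of 0.13 and 0.04 m2.K/W are commonly used conventional values for walls that should be treated as assumptions here.

```python
from typing import NamedTuple


class Layer(NamedTuple):
    thickness_m: float            # d, m
    conductivity_w_mk: float      # k, W/(m.K)
    density_kg_m3: float          # rho, kg/m3
    specific_heat_j_kgk: float    # c, J/(kg.K)


def u_value(layers, r_si=0.13, r_so=0.04):
    """Steady-state U-Value in W/(m2.K); r_si and r_so are assumed surface resistances."""
    r_layers = sum(layer.thickness_m / layer.conductivity_w_mk for layer in layers)
    return 1.0 / (r_si + r_layers + r_so)


def areal_heat_capacity(layers):
    """Heat stored per unit wall area and per kelvin, in J/(m2.K)."""
    return sum(
        layer.density_kg_m3 * layer.specific_heat_j_kgk * layer.thickness_m
        for layer in layers
    )


# Hypothetical two-layer wall: a heavy structural layer plus insulation.
wall = [
    Layer(0.20, 1.00, 2000.0, 1000.0),
    Layer(0.10, 0.04, 30.0, 1400.0),
]
wall_u = u_value(wall)                   # depends only on thickness and conductivity
wall_capacity = areal_heat_capacity(wall)  # needs density and specific heat as well
```

The U-Value uses only thickness and conductivity, whereas the heat capacity needs density and specific heat, which is why a thermography-derived U-Value alone does not describe the delayed heat transfer behavior of the envelope.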

Schedules
Schedules could help model the impact of occupancy, equipment, lighting, or air infiltration rates. This seems to be another challenging input to acquire accurately due to data privacy. Therefore, the prototype buildings' schedule data are one of the most common sources in UBEM development. However, researchers showed that empirical approaches in creating occupancy schedules could lead to up to a 70% difference in occupancy rates compared to the Department of Energy (DOE) prototype buildings [45].
Researchers suggest categorizing the approaches in developing schedules under four groups: deterministic (e.g., standards), data-driven (e.g., ML techniques), stochastic (e.g., probability), or agent-based (e.g., indirect observation) [24]. Lim and Zhai (2022) [46] showed the feasibility of using a stochastic-deterministic approach in estimating unknown inputs in UBEM such as equipment power density, lighting power density, heating and cooling setpoint temperatures, occupancy, and infiltration rates. These are also among the dominant and most influential input parameters in BEM.
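The difference between a deterministic and a stochastic schedule can be illustrated with a short sketch. Here a fixed hourly profile is treated as the probability that each occupant is present, and one day is sampled from it. The profile values, the function name, and the independent-occupant assumption are all hypothetical; this is not the method of [24] or [46].

```python
import random

# Hypothetical deterministic weekday profile: fraction of occupants present per hour.
OFFICE_PROFILE = (
    [0.0] * 7 + [0.2, 0.6] + [0.9] * 3 + [0.5] + [0.9] * 4 + [0.5, 0.2] + [0.0] * 5
)


def sample_occupancy(hourly_probability, n_occupants, rng):
    """Sample one stochastic day: each occupant is present with probability p each hour."""
    if n_occupants <= 0:
        raise ValueError("n_occupants must be positive")
    sampled = []
    for p in hourly_probability:
        present = sum(1 for _ in range(n_occupants) if rng.random() < p)
        sampled.append(present / n_occupants)
    return sampled


rng = random.Random(0)  # fixed seed so the sampled day is reproducible
stochastic_day = sample_occupancy(OFFICE_PROFILE, n_occupants=50, rng=rng)
```

Repeating the sampling for many days or buildings gives a spread of schedules around the deterministic profile, which is one simple way to represent occupancy diversity across a building stock.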

Weather Data
Physics-based UBEM simulation tools need weather data to simulate the building performance based on environmental conditions, mainly reflecting the temperature, insolation, wind characteristics, and air pressure. Weather files could include typical weather data for a region, also known as the Typical Meteorological Year (TMY), or measured data, also known as the Actual Meteorological Year (AMY). Figure 14 shows that about 30% of the physics-based UBEM studies use actual weather data. Some of these studies incorporate the actual weather data and adjust it based on the microclimate effects before implementing it in the energy simulation engine. Using AMY requires a weather station(s) installed for at least one year to perform a whole-year energy simulation, which is not necessarily available or accessible in all the projects.
[Figure 14: AMY 29%; TMY 71%]
Researchers proposed innovative approaches to obtaining weather data, such as hourly air temperature via satellite-based remote sensing methods [47]. However, one of the most efficient and feasible methods in considering the microclimate in UBEM is tuning either TMY or AMY to show the microclimate impacts, such as the temperature rise in a dense area. This approach is compatible with existing energy simulation engines such as EnergyPlus and could be implemented in UBEM studies [11]. The microclimate impacts, such as the UHI effect, are seasonal phenomena [48], and it is necessary to adjust the weather files for both heating and cooling seasons. Using complementary tools on top of the energy simulation engine such as CityFFD is also examined in UBEM studies [49] to include the microclimate impacts. Using complementary tools to include microclimate effects needs further studies to identify the existing approaches, tools, and shortcomings.
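The idea of tuning a weather file for seasonal microclimate effects can be sketched as follows. This is a deliberately simplified stand-in: it adds a fixed, month-dependent offset to hourly dry-bulb temperatures, whereas a tool such as UWG derives the adjustment from a physical model. The offset values and the function name are hypothetical, and reading or writing an actual weather file is left out.

```python
def tune_dry_bulb(temps_c, months, monthly_offset_c):
    """Add a month-dependent offset (deg C) to each hourly dry-bulb temperature."""
    if len(temps_c) != len(months):
        raise ValueError("temps_c and months must have the same length")
    return [t + monthly_offset_c[m] for t, m in zip(temps_c, months)]


# Hypothetical seasonal offsets: larger in summer months, smaller otherwise.
MONTHLY_OFFSET_C = {month: 0.5 for month in range(1, 13)}
MONTHLY_OFFSET_C.update({6: 1.5, 7: 1.5, 8: 1.5})

# Two example hours: one in January, one in July (values are illustrative only).
tuned = tune_dry_bulb([0.0, 25.0], [1, 7], MONTHLY_OFFSET_C)
```

Because the adjustment is applied to the weather input rather than inside the simulation engine, the tuned series can be passed to an existing engine without changing the building models.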

Tools and File Schemas
This paper reviews physics-based UBEM studies where either a physics-based simulation engine is used for energy simulation or input necessary for physics-based modeling is used or measured. Figure 15 shows the tools or file schemas with the highest use frequency, showing EnergyPlus as one of the common tools in the field. Being open-source is a critical feature in UBEM as most robust BEM tools, regardless of their capability in energy simulation, have limited flexibility in large-scale modeling. As other researchers indicated [3], open-source tools allow researchers to integrate the simulation engine into the backend of any platform they develop. UBEM development does not follow a standard workflow and file format; researchers would instead work with open-source tools in the backend and occasionally combine them with customized code in Python [50], which provides flexible libraries to work with different types of data. Another reason for the popularity of such tools identified by researchers is their compatibility with high-usage schemas such as City Geography Markup Language (CityGML) [15]. Some of these tools are integrated with each other and do not necessarily work independently, such as Rhinoceros and Urban Modeling Interface (UMI).
Another observation is about the data exchange and file schemas. In BEM, file schemas such as green building Extensible Markup Language (gbXML) and Industry Foundation Classes (IFC) are popular as they allow users to transfer building data beyond the geometry. There are limited attempts to combine common Building Information Modeling (BIM) file schemas with GIS in UBEM case studies [51]. Instead, most of the file schemas in UBEM are limited to schemas such as CityGML or GeoJSON. This is mainly because the large building datasets are limited to geometrical data, as other inputs such as schedules, construction materials, and HVAC systems are not or cannot be measured via mass data collection approaches, such as satellite images, for mass building file generation.
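As an illustration of why schemas such as GeoJSON mostly carry geometry, the sketch below parses a single GeoJSON building footprint and derives the footprint area and an extruded volume. The property names `id` and `height_m`, and the assumption that coordinates are already in a projected system in meters, are hypothetical; real GeoJSON coordinates are normally longitude and latitude and would need to be projected first.

```python
import json

# Hypothetical feature; coordinates are assumed to be projected meters.
GEOJSON_TEXT = """
{
  "type": "Feature",
  "properties": {"id": "b1", "height_m": 9.0},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[0, 0], [10, 0], [10, 20], [0, 20], [0, 0]]]
  }
}
"""


def ring_area(ring):
    """Planar area of a closed ring using the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


def footprint_summary(feature):
    """Return footprint area and extruded volume for a Polygon feature."""
    geometry = feature["geometry"]
    if geometry["type"] != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")
    exterior, *holes = geometry["coordinates"]
    area = ring_area(exterior) - sum(ring_area(hole) for hole in holes)
    height = feature["properties"]["height_m"]
    return {"id": feature["properties"]["id"], "area_m2": area, "volume_m3": area * height}


summary = footprint_summary(json.loads(GEOJSON_TEXT))
```

Non-geometric inputs such as schedules or HVAC type would have to be attached as additional properties, which is where the data gaps described above become visible.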
Other tools and methods shown in Figure 15, such as Bayesian and K-means, refer to the building archetype development research studies. This is a critical topic in UBEM that needs further studies and improvement as one of the cornerstones of UBEM.
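How K-means could support archetype development can be sketched with a minimal, dependency-free implementation that groups buildings by two normalized features and treats each cluster centroid as a candidate archetype. The feature values and the fixed initial centroids are hypothetical, and the reviewed studies may use different features, preprocessing, or library implementations.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, n_iter=20):
    """Plain K-means: assign points to the nearest centroid, then update centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop early
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical buildings described by two features scaled to the range 0 to 1,
# for example normalized construction year and normalized floor area.
buildings = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.10),
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.95),
]
archetypes, groups = kmeans(buildings, centroids=[buildings[0], buildings[3]])
```

Each centroid can then be mapped to a representative building description whose simulated results are scaled to all buildings in its group.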

Sources of Data
The inputs in physics-based UBEMs are the integral components that could determine the accuracy of outputs and the model scope. As defined by researchers, accessibility of open data is one of the significant challenges in UBEM development, especially since the measured data on a large scale are scarce or challenging to access due to privacy issues. Due to the variety of data needed for physics-based UBEM, researchers end up using and combining multiple sources of inputs [52].
Figure 16 shows the common data sources used in physics-based UBEM studies. Some sources have overlaps; for example, DOE prototype buildings could be based on ASHRAE 90.1 standards, but depending on how the authors reported their data sources, only one of them is identified as the primary data source. It could be observed that most of these data sources provide common prototypical and geometrical data. Very few sources, such as TABULA and the U.S. Energy Information Administration, provide measured or non-geometrical data. Additionally, newer large-scale sources of measured data are available that could be used in future studies upon public availability and detailed access and use instructions, such as energy end-use load profile (EULP) [53], ResStock [54], and ComStock [55] outputs.


Building Types and Locations
The two major building types are residential and commercial, with subcategories such as single-family, multi-family, office, retail, and hospital. Most UBEM studies (52%), even micro-scale studies, include both residential and commercial buildings. As illustrated in Figure 17, the remaining studies focus only on residential or only on commercial buildings. A higher frequency of UBEM development for a particular building type does not necessarily imply higher interest or importance, since it could be due to the greater availability of open data for the case studies. Further studies are therefore required to understand the relationship between data availability for residential and commercial buildings and the interest in creating UBEMs for each category.
The reviewed research studies are performed in the various countries shown in Figure 18. Publication databases such as Scopus determine the location based on where the paper is submitted from. However, the UBEM case studies are not necessarily located in these countries. Therefore, the cities and countries where the UBEM case studies are located are extracted, where reported, from the papers selected for full-text review and are illustrated in Figure 19. The latitude and longitude of the recorded cities are obtained via the Google application programming interface (API), and Python code is developed to mark these locations on OpenStreetMap. As previously observed for the institutes active in physics-based UBEM studies, these data confirm that the case study buildings are primarily located in North America, Europe, and East Asia. This could reflect the availability of data and technology in these regions, as open data and simulation tools and techniques could be the two main drivers of UBEM projects. Moreover, incentives for developing such models play an essential role in investment in UBEM development, as energy policies and goals are not equally prioritized globally.
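The mapping step described above can be sketched as follows. This is only an illustration and not the code used in this review: the city list and its coordinates are approximate placeholders (in the review they come from a geocoding API), and `cities_to_geojson` is a hypothetical helper. The sketch writes the locations as a GeoJSON FeatureCollection of points, a format that OpenStreetMap-based viewers can typically display. Note that GeoJSON stores coordinates in longitude, latitude order.

```python
import json

# Approximate, illustrative coordinates as (latitude, longitude).
# These are placeholders, not the case-study locations reported in the review.
CASE_STUDY_CITIES = {
    "Boston": (42.36, -71.06),
    "Zurich": (47.38, 8.54),
    "Seoul": (37.57, 126.98),
}


def cities_to_geojson(cities):
    """Convert {name: (lat, lon)} into a GeoJSON FeatureCollection of points."""
    features = []
    for name, (lat, lon) in sorted(cities.items()):
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], not [lat, lon].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return {"type": "FeatureCollection", "features": features}


collection = cities_to_geojson(CASE_STUDY_CITIES)
geojson_text = json.dumps(collection, indent=2)
```

The resulting `geojson_text` can be saved to a file and loaded as an overlay in a map viewer; swapping the placeholder dictionary for geocoded results would not change the rest of the sketch.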

Figure 17 chart values: Residential 29%, Commercial 19%, Both 52%.

Archetype Development in UBEM
Archetype or prototype building development is one of the most effective approaches to reducing the complexity and preparation time of UBEM development. Although deviation from individual building characteristics might reduce the accuracy of outcomes by up to 17% [56], it can still be a practical approach with acceptable accuracy. Different classification and clustering methods are used to create these archetype buildings [57] based on various inputs such as building vintage, type, and square footage. Review studies on building archetype development show that the most common variables used in developing construction archetypes include construction typology (e.g., commercial or residential), construction year (i.e., vintage), end-use, building size, and heating systems [29].
The identified archetypes are affected by the methodology deployed. Goy et al. (2021) [58] compared unsupervised, semi-supervised, and supervised methods for developing building archetypes. They observed that both the algorithms and the selected features affect the choice of archetype buildings. Usman et al. (2018) [59] also compared different clustering algorithms (K-means, K-medoids, and hierarchical) for archetype development in UBEM and found the K-means method the most effective. Although there are studies on archetype development for specific types of buildings, such as religious worship buildings [60], almost all studies apply their methodology to a diverse group of buildings with different use cases. Evaluating the performance of different methods by building type could be a future research topic.
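To make the clustering idea concrete, the following is a minimal sketch of K-means applied to two of the features mentioned above (construction year and floor area). It is not taken from any of the reviewed studies: the building records are invented, the function names are hypothetical, and seeding the centroids with the first k points is a simplification chosen only to keep the example deterministic.

```python
import math

# Hypothetical building records: (construction year, floor area in m2).
BUILDINGS = [
    (1950, 120.0), (1955, 140.0), (1960, 110.0),
    (1995, 900.0), (2000, 1100.0), (2005, 1000.0),
]


def normalize(rows):
    """Min-max scale each feature to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / sp for v, lo, sp in zip(row, lows, spans))
            for row in rows]


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def representatives(points, labels, centroids):
    """Index of the real building closest to each centroid (a candidate archetype)."""
    reps = []
    for j, centre in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        reps.append(min(idx, key=lambda i: math.dist(points[i], centre)))
    return reps


scaled = normalize(BUILDINGS)
labels, centroids = kmeans(scaled, k=2)
archetype_indices = representatives(scaled, labels, centroids)
```

On this toy data the three older, smaller buildings and the three newer, larger buildings end up in separate clusters, and `archetype_indices` points at one real building per cluster. In practice the number of clusters, the feature set, and the seeding strategy all influence which archetypes emerge, which is consistent with the observations reported above.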

Validation and Calibration in UBEM
Validation or calibration of energy models against measured data provides metrics for the accuracy of UBEM models. Validation shows how close the outputs are to the measured data, whereas calibration adjusts the inputs so that the outputs converge toward the measured data. Figure 20 shows the percentage of case studies reviewed in this paper in which simulation results are calibrated against measured data. A frequency of less than 10% confirms a lack of measured data, or challenges in accessing it, for large-scale energy models. UBEM outputs can differ significantly from measured data [61] if the most important inputs are not modeled accurately; such inputs need to be recognized and corrected at the validation or calibration stage. The most influential inputs in UBEM are similar to those in BEM, which researchers have identified by performing sensitivity analysis [62,63] on physics-based models.
The high level of uncertainty in UBEM inputs is one of the main reasons calibration is an integral step towards making the outcomes more reliable. This becomes more critical when studying ECMs within the scope of UBEM development. Prataviera et al. (2022) [64] reviewed the sources of uncertainty in UBEM and identified multiple areas, such as building geometry, end-use, envelope materials, age class, heating and cooling system performance, energy certificates, and utility bills.
Another critical factor in UBEM validation and calibration is the availability of measured energy consumption data. Most research studies skip the calibration step because such data are missing. The importance of regional surveys and utility bills is noted in other research studies [65]. The Residential Energy Consumption Survey (RECS) and the Commercial Buildings Energy Consumption Survey (CBECS) in the U.S. are among the sources of measured data used by researchers for calibration [66], especially if the base buildings are developed from DOE prototype buildings. Another common source of measured data for calibrating large-scale models such as UBEM is Advanced Metering Infrastructure (AMI) data [67], which is not publicly available.

Figure 20 chart values: No calibration 91%, Calibration 9%.
Bayesian calibration is a common technique used by researchers in UBEM [68]. Their results showed that adding more detailed information on building orientation, structural data, and heating times could improve the results significantly. However, not all the buildings could meet the criteria of ASHRAE Guideline 14. The calibration methodology can be comprehensive and fine-tune the majority of inputs, or it can focus on the most influential ones, such as occupancy profiles, an approach that other researchers have suggested and successfully tested [69]. Simulation-based calibration techniques have also been adopted in UBEM; in one study, 17 parameters were selected and sampled to reach the minimum goodness-of-fit (GOF) value [70]. Further research should compare data-driven, simulation-based, and hybrid calibration methods to evaluate their efficiency in UBEM.
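ASHRAE Guideline 14 is usually applied through two statistics: the normalized mean bias error (NMBE) and the coefficient of variation of the root mean square error, CV(RMSE). The sketch below computes both for monthly data. It is an illustration under stated assumptions rather than the procedure of any reviewed study: the monthly energy values are invented, the function names are hypothetical, the limits of ±5% NMBE and 15% CV(RMSE) are the commonly cited monthly criteria, and the degrees-of-freedom adjustment p = 1 is a common convention that should be checked against the guideline edition in use.

```python
import math


def nmbe(measured, simulated, p=1):
    """Normalized mean bias error in percent (positive means under-prediction)."""
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_measured)


def cv_rmse(measured, simulated, p=1):
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    mean_measured = sum(measured) / n
    squared_error = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(squared_error / (n - p)) / mean_measured


def meets_monthly_criteria(measured, simulated, nmbe_limit=5.0, cvrmse_limit=15.0):
    """Check both statistics against assumed monthly calibration limits."""
    return (abs(nmbe(measured, simulated)) <= nmbe_limit
            and cv_rmse(measured, simulated) <= cvrmse_limit)


# Invented monthly electricity use for one building, in kWh.
measured_kwh = [1200, 1100, 950, 800, 700, 650, 700, 720, 760, 880, 1000, 1150]
simulated_kwh = [1260, 1080, 990, 760, 720, 640, 730, 700, 790, 850, 1040, 1120]

building_nmbe = nmbe(measured_kwh, simulated_kwh)
building_cvrmse = cv_rmse(measured_kwh, simulated_kwh)
is_calibrated = meets_monthly_criteria(measured_kwh, simulated_kwh)
```

In a UBEM setting the same check would be repeated per building or per archetype, which makes it easy to report what share of the stock meets the criteria instead of a single aggregate value.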
Computational capacity is also a critical concern for calibration in large-scale, high-dimensional projects such as UBEM. Researchers suggest using machine-learning approaches to develop surrogate models with lower training and prediction times. One study [71] found non-linear methods such as support vector methods and neural networks more effective when working with a typical physics-based simulation engine such as EnergyPlus.

ECM Analysis
Physics-based UBEM creates a robust platform for ECM analysis compared to other modeling techniques [32]. Although analyzing the impact of ECMs should be one of the critical outcomes of BEM and UBEM, few UBEM research studies focus on this aspect. This could be due to the high computational capacity needed to simulate several buildings under various ECM scenarios. In this systematic review, studies focused on any aspect of ECM analysis are screened, reviewed, and listed in Table 2. The variety of ECMs, tools, and numbers of buildings shows that existing UBEM technology is capable of performing ECM analysis. In terms of the framework suggested in Figure 3, the scope of these case studies is the evaluation of large-scale energy use, carbon reduction, or building-grid interaction. Building-grid interactions are studied through either demand response opportunities [67] or the application of smart thermostats to reduce grid peak loads [52,72,73].
Besides ECMs, some studies evaluated energy generation potential on a large scale [65,74,75]. Although most models do not use actual HVAC data, HVAC is one of the main systems of focus among ECMs in UBEM case studies. Either efficiency improvements to HVAC components or the addition of economizers are studied based on archetype data, which could cause significant errors compared to the actual savings.
It is also observed that most of these case studies are based on actual geometrical data, while archetype data is used for other systems and inputs such as the envelope or HVAC systems. Comparing the accuracy of ECM analysis against measured data is not feasible unless post-commissioning is performed, and such data is not available or addressed in any of the research studies reviewed in this paper.

Challenges in UBEM
Researchers have identified several challenges and shortcomings that slow the application of UBEM and prevent it from achieving its full potential in the built environment. Figure 21 summarizes and categorizes these challenges into ten distinct groups, each describing the lack of a particular item. Public databases are more common in data-driven modeling [94] since those techniques, unlike physics-based UBEM, do not necessarily require a particular format or file schema. This could be a focus of future research activities.
The research studies do not directly address or explain the computational approach used in physics-based UBEM. However, as in other computer-based simulations, parallel computation has effectively reduced the simulation time by up to one-fifth of the original time [95].
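Since the reviewed studies rarely describe their computation approach, the following is only a generic sketch of how independent building simulations could be dispatched in parallel. The function `simulate_building` is a hypothetical stand-in that returns floor area multiplied by an assumed energy use intensity; in a real workflow it would launch the simulation engine for one building model. A thread pool is used on the assumption that each run is an external process; for simulations written in pure Python, a process pool would be the more natural choice.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_building(building):
    """Hypothetical stand-in for one simulation run.

    Returns (building id, annual energy in kWh) from invented inputs.
    """
    annual_kwh = building["floor_area_m2"] * building["eui_kwh_per_m2"]
    return building["id"], annual_kwh


def simulate_district(buildings, max_workers=4):
    """Run all building simulations concurrently and collect results by id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and runs the calls concurrently.
        return dict(pool.map(simulate_building, buildings))


# Invented district of eight buildings with a uniform assumed EUI.
district = [
    {"id": f"B{i}", "floor_area_m2": 100.0 * (i + 1), "eui_kwh_per_m2": 150.0}
    for i in range(8)
]
annual_kwh_by_building = simulate_district(district)
```

Because the buildings are simulated independently, the achievable speed-up depends mainly on the number of available cores and on any shared pre- or post-processing, so any reduction seen in practice will vary with the hardware and the workflow.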

Future Research Directions and Conclusions
The literature review indicates that it is more efficient to adjust the level of detail in UBEM based on its scope and intended performance. As shown in this study and by other researchers [25], particular objectives such as ECM evaluation do not necessarily require a complicated occupancy modeling approach. Instead, the choice between a deterministic and a stochastic process for occupancy schedules can be made based on the objectives of the UBEM [25]. Another example is infiltration data and schedules, which significantly affect BEM and UBEM outputs. Researchers suggest that if the scope of UBEM is limited to the early-stage design of district energy systems, using fixed values should suffice [96].
A similar approach is adopted in other engineering fields such as seismic engineering, where codes have moved from non-performance-based (e.g., prescriptive) to performance-based designs. The accuracy and level of detail in UBEM inputs could also be tailored to the intended purpose and scope, such as urban design, single-building or large-scale energy or carbon reduction, and building-grid interactions. There are studies on scaling the level of detail in CityGML models up or down based on user inputs [4], but further research is still required to develop a standard approach.
The review of physics-based UBEM studies shows trends in future research directions influenced by the shortcomings identified previously. The suggested future studies could be categorized as follows:
1. Stochastic occupancy models, socio-economic factors, and the impact of future climate data [13];
2. Sensitivity analysis to find the most influential parameters so that they could be collected with more accuracy [22,41];
3. Development of a generalized solution that works with different data and scenarios [20].
Physics-based UBEM has been shown to be a capable tool for simulating a large number of buildings. The existing tools and methods have been successfully adopted by researchers worldwide. More in-depth studies are required to review the current data sources. All the components of the input-process-output workflow need to be standardized based on the intended application of the physics-based UBEM, such as ECM analysis and energy retrofits.

Figure 2. Literature review diagram based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) systematic literature review methodology.

• Technique development or usage of archetype-building models;
• Sources of large-scale data inputs for UBEM development;
• Study or proposition of a method to extract the building data regarding the geometry and location, envelope, heating, ventilation, and air-conditioning (HVAC), window-to-wall ratio (WWR), and schedules;
• Tools or file schemas;
• Type (residential or non-residential) and number of buildings;
• Location of the study;
• Usage of typical meteorological year (TMY) or actual meteorological year (AMY) weather data;
• Usage of the UBEM to study ECMs;
• Data validation or calibration performance against the measured data;
• Proposition of challenges and suggestions for the future UBEM-related studies.

Figure 3. Physics-based urban building energy modeling project definition, data, and metadata extraction approach and categories.

Figure 4. Suggested scaling categories for urban building energy modeling.

Figure 5. The frequency of the number of buildings simulated in physics-based urban building energy modeling case studies.

Researchers have studied the modeling and simulation techniques in UBEM and compared their advantages and disadvantages [7,8]. Boghetti et al. (2020) [7] compared two UBEM development approaches (i.e., physics-based vs. data-driven) and noted that data-driven models rely on many data points, whereas physics-based methods need longer simulation and preparation times. Because ECM evaluation is one of the areas of study in this paper, the selected papers focus on physics-based modeling.

Figure 6. Two distinct areas of study in urban building energy modeling.

Figure 7. Keywords with the highest use in urban building energy modeling research studies, developed by VOSviewer.

Figure 8. The frequency of different keywords in urban building energy modeling studies before 2018 and after 2020, developed by VOSviewer.

Figure 9. Top five research sponsors with publications in physics-based urban building energy modeling.

Figure 10. Physics-based urban building energy modeling publications' affiliated universities.

Figure 11. Publishers with a minimum of three publications in physics-based urban building energy modeling selected for this review paper.

Figure 12. The journals and conference proceedings with the highest number of publications on physics-based urban building energy modeling.

Figure 13. Number of studies using specific measured or surveyed building systems and energy modeling inputs.

Figure 14. Frequency of Actual Meteorological Year (AMY) and Typical Meteorological Year (TMY) in urban building energy modeling and simulation.

Figure 15. Use frequency of tools and file schemas in physics-based urban building energy modeling projects with at least two use cases.

Figure 16. Frequency of data sources for urban building energy modeling development, including the measured data and synthetic prototype data.

Figure 17. The frequency of building types in physics-based urban building energy modeling case studies.

Figure 18. The number of urban building energy modeling studies in different countries.

Figure 19. Location of physics-based urban building energy modeling case studies.

Figure 20. Frequency of validation or calibration of urban building energy modeling models against the measured data.

Figure 21. Areas in physics-based urban building energy modeling with shortcomings identified by researchers.

Table 2. Urban building energy modeling studies focused on energy conservation measure evaluation.