Big Data Value Chain: Multiple Perspectives for the Built Environment

Current climate change threats and increasing CO2 emissions, especially from the building stock, represent a context where action is required. It is necessary to provide efficient ways to manage energy demand in buildings and contribute to a decarbonised future. By combining new technologies, such as artificial intelligence, Internet of things, blockchain, and the exploitation of big data towards solving real-life problems, the way could be paved towards smart and energy-aware buildings. In this context, the aim of this paper is to present a critical review and a detailed definition of the big data value chain for the built environment in Europe, covering multiple needs and perspectives: “policy”, “technology” and “business”, in order to explore the main challenges and opportunities in this area.


Introduction
Current climate change threats and increasing CO2 emissions, in particular from the building stock, call for action: energy consumption and generation in buildings must be managed efficiently in order to contribute to a decarbonised economy. Within buildings today, more and more data are being generated as a result of the increasing adoption of information and communication technologies (ICT), such as artificial intelligence (AI), Internet of things (IoT), and distributed ledger technology (DLT)/blockchain [1].
By combining these new technologies and the exploitation of big data towards solving real-life problems surrounding buildings, the way could be paved towards a decarbonised future [2]. To this end, it is necessary to understand what technologies and solutions already exist, as well as how existing methods, processes, data models, and platforms could be exploited in the building sector. The aforementioned series of steps and information are crucial to generate value and useful insights from building data. They constitute the big data value chain (BDVC), which is considered to play a key role in the future energy economy, bringing opportunities of digitalisation and accelerating the energy transition via the enhancement of building sector energy performance. In terms of an increase in energy efficiency and building performance, as well as intelligent energy management of the built environment, big data can play an important role. As an example, although a discrepancy between predicted and actual performance is unavoidable, solutions are necessary to explain and narrow this gap to make more accurate forecasts and reduce energy demand as much as possible.

• Policy perspective: the main objective followed was to present the main directives and initiatives in the field of energy, big data, and digitalisation that could play a role and have an impact on the BDVC in the European context.
• Technology perspective: this perspective focuses on frameworks, architectures, and datasets that are currently present and could support the BDVA. In particular, the focus has been placed on exploring the role of digital twins as a key element to help connect energy challenges among different scales (buildings to cities). To complement this vision, an in-depth but non-exhaustive analysis of building-related datasets and repositories is presented. Both EU and national-level repositories have been selected, and all of them are public and are related to the building stock, energy fields, or other contextual data (statistical, geographic, geometric, or meteorological data).
• Business perspective: given the current context and trends, as well as the technological possibilities, this perspective explores how this can be combined with powerful analytics to improve decision making. In this context, four main challenges are addressed, which correspond to the categories of pilots deployed in the MATRYCS H2020 project: (1) energy performance, (2) building and related infrastructure design and refurbishment, (3) policymaking, and (4) energy efficiency financing. In addition, cross-cutting support is explored through geo-clustering methods.
Thus, the paper is organised following this classification. In Section 2, a detailed review of the big data concept in the built environment, as well as directives and initiatives of the European Union (EU), is presented (policy perspective). In Section 3, an extensive review of existing frameworks, architectures, and existing datasets across EU and at the national level is provided (technology perspective). In Section 4, particular emphasis is placed on analytic services and business models for the built environment (business perspective). The contributions and conclusions of the current study are discussed in Section 5.

"Policy" Perspective in Europe: Big Data Concept in Buildings
A number of European Directives and initiatives set strict objectives for Member States in order to meet the targets set for 2030. To this end, significant changes and investments are required, such as an increase in renewable energy sources (RES) capacity and an improvement in the energy efficiency of the building stock [3].

European Directives and Regulations
In addition to the common framework established by the European Green Deal [4], of particular relevance are the Energy Performance of Buildings Directive (EU) 2018/844 and the Energy Efficiency Directive (EU) 2018/2002, both part of the 'clean energy for all Europeans' package. It is also worth highlighting the initiative of the Renovation Wave [5], which focuses on boosting building renovation for climate neutrality and recovery. Moreover, existing directives and regulations of the European Commission (EC) related to thematic areas, such as big data, AI, and IoT, are the "European Strategy for Data" [6], "Proposal for a Regulation on European data governance" [7], "White Paper on Artificial Intelligence" [8], "Report on the safety and liability implications of Artificial Intelligence, the Internet of Things, and robotics" [9], "Ethics Guidelines for Trustworthy Artificial Intelligence" [10], "The Open Data Directive" [11], "General Data Protection Regulation" [12], and "Open-source software strategy 2020-2023" [13].

ECTP Vision for Buildings towards 2030
The European Construction Technology Platform (ECTP), as a leading organisation shaping the future of the built environment and construction sector in Europe, recently issued its Strategic Research and Innovation Agenda (SRIA) [14]. Long-term and intermediate goals (for 2050 and 2030) have been established. Climate change and the associated policies towards a CO2-neutral society are among the main drivers for the transition in the building sector, which call for rethinking the design, maintenance, and management of buildings, as well as for accelerating the renovation of the housing stock and the integration of renewable energy sources.

BDVA Vision for Data towards 2030
The Big Data Value Association (BDVA) has been operative since 2014 and is now becoming DAIRO (standing for Data, AI, and Robotics), with the mission to develop the innovation ecosystem that will enable the data and AI-driven digital transformation in Europe [15]. In the BDVA position paper of November 2020 [16], they envisaged the main bottleneck for the exploitation of AI technologies, as a primary driver of the data economy, to be widespread, secure, and effective data sharing.

Common European Digital Platform and Collaborative Networks
A number of digital platforms and collaborative digital networks in Europe (Table 1) aim to gather information on good practices and achievements by each centre involved, as well as work in line with other European initiatives. Some have the ultimate goal of defining the future of Europe on the basis of data, as well as creating more transparency and visibility, while others focus on energy in buildings with the objective of gathering knowledge and fostering exchange, as well as taking up innovative and effective measures.

| Platform | Short Description | Key Stakeholders |
|---|---|---|
| BUILD UP [17] | The BUILD UP initiative supports EU Member States in implementing the Energy Performance of Buildings Directive (EPBD). | Professionals working in the building sector with an interest in energy efficiency |
| European Energy Efficiency Platform (E3P) [18] | The E3P among other tasks facilitates the practical implementation of the Energy Efficiency Directive at national, regional, and local levels, with data collection and analysis. | Energy efficiency experts in a wide range of thematic areas |
| Coalition for Energy Savings [19] | The Coalition for Energy Savings is a common advocacy platform to promote and mainstream energy efficiency at the European level, a centre of expertise on energy efficiency, and a forum to exchange intelligence on energy efficiency. | Businesses, professionals, local authorities, trade unions, consumers, and civil society organisations |
| Housing Evolutions Hub [20] | The Housing Evolutions Hub highlights the latest innovations in the field of social, public, affordable, and responsible housing. | Practitioners and policymakers in the area of housing/energy efficiency/sustainability of buildings and neighbourhoods |
| AI4EU [21] | AI4EU was established to build the first European Artificial Intelligence On-Demand Platform and Ecosystem. | Wide range of actors including scientists, entrepreneurs, SMEs, industries, funding organisations, and citizens |

"Technology" Perspective: Digitalisation of the Built Environment
The digitalisation of the built environment is gaining ground rapidly, driven by technological advancements and by data exploitation capabilities that are becoming increasingly available. On this basis, the generation of digital twins as digital data repositories to support specific processes is becoming more frequent. However, relevant challenges have yet to be overcome: building data are only partially available, rarely up to date, and almost never integrated into a single platform so that informed decisions can be made at the micro- and the macroscale. A wealth of opportunities exists in combining static and dynamic data through analytics, leading to more accurate and cost-effective solutions, as well as easier analysis processes.

Digital Building Twin
The innovative services, applications, and technologies surrounding the digital building twin are analysed in the paragraphs below (i.e., existing data models, BIM and digital infrastructures, end-to-end process digitalisation, and how other scales can be tackled).

Data Models
Existing data models, relevant in the context of the building sector, are FIWARE Smart Data Models [22], the Industry Foundation Classes (IFC) [23], the international standard City Geography Markup Language (CityGML) [24], the INSPIRE Directive (2007/2/EC) [25], the Smart Applications REFerence (SAREF) ontology [26], and BRICK ontology-based metadata schema [27]. These data models have been harmonised in such a way as to enable data portability for different applications in different domains, when observed from different perspectives: as individual buildings, as a group of buildings (districts), as a group of districts (cities), etc.

BIM and Digital Infrastructures
Building information modelling (BIM) can be defined as a digital toolset used to digitally map buildings or infrastructure facilities for various purposes: visualisation, scheduling, communication, and collaboration between stakeholders through a building/facilities life cycle [28]. BIM allows various users to extract and analyse the data to make decisions and improve the process of delivering the building [29]. However, implementing BIM within a company requires a considerable build-up of expertise, especially appropriate employee training and substantial ICT upgrading. This can be especially challenging for small and medium enterprises (SMEs), creating the need for extra investment in new technologies and training, which may be difficult and not always possible. According to Poljansek [30], in order to successfully implement BIM in the construction sector, interoperability must be ensured; in this way, various stakeholders can share information and cooperate in the planning. Building owners and investors can slowly adopt the BIM technology until they manage to understand all the benefits of using BIM.

IoT and Integration with BIM-Enabled Platform
IoT devices make use of several protocols to connect to the internet and send data to the data repository (usually, data repositories are proprietary software from companies). Examples of these protocols are LoRaWAN, Sigfox, MQTT, and NB-IoT. What they have in common is their light use of bandwidth; usually, devices transfer little information, such as temperature, at a low frequency.
These data are very useful in the development of a digital twin of a building (or of a district or city, when available) because the behaviour of the building can be understood by studying how the data evolve over time. However, there is a significant lack of information regarding the location of the IoT devices. Usually, this information is not used by the IoT companies/platforms, and it is not incorporated into the datasets available to download. Commonly, only the device name (chosen freely by the particular user) is available in the dataset, and it is usually not representative.
To make a proper match among devices, their measurements, and the digital twin, it is necessary to know where these devices are installed and what they are measuring. In the particular case of a building, considering the IFC4 standard and its definitions of spatial structures and devices, devices need to be matched to spaces.
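The matching step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Space` and `Reading` records, the `device_to_space` lookup, and all identifiers and values are hypothetical, and no IFC library is used. In practice the space identifiers would come from the building model and the lookup would have to be compiled by hand or supplied by the IoT platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Space:
    space_id: str  # hypothetical identifier, e.g. the GlobalId of an IfcSpace
    name: str


@dataclass(frozen=True)
class Reading:
    device_name: str  # free-text name chosen by the user on the IoT platform
    quantity: str     # what is measured, e.g. "temperature"
    value: float


def readings_by_space(readings, device_to_space, spaces):
    """Group readings under the space where each device is installed.

    Readings from devices without a known, valid location are returned
    separately, since they cannot be placed in the digital twin.
    """
    located = {space_id: [] for space_id in spaces}
    unlocated = []
    for reading in readings:
        space_id = device_to_space.get(reading.device_name)
        if space_id in located:
            located[space_id].append(reading)
        else:
            unlocated.append(reading)
    return located, unlocated


# Illustrative data only (assumed, not taken from any real dataset).
spaces = {
    "space-001": Space("space-001", "Meeting room"),
    "space-002": Space("space-002", "Office"),
}
device_to_space = {"sensor_A": "space-001", "sensor_B": "space-002"}
readings = [
    Reading("sensor_A", "temperature", 21.5),
    Reading("sensor_B", "temperature", 23.0),
    Reading("my sensor", "temperature", 19.0),  # name carries no location
]

located, unlocated = readings_by_space(readings, device_to_space, spaces)
```

The third reading ends up in `unlocated`, which mirrors the problem noted above: a user-chosen device name alone is not enough to place a measurement in the model.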

City Digital Twin
Digital twin technology has also been applied at city and regional scales. It is important to highlight that, at these scales, the level of detail expected is not the same as at the building scale; it is unlikely that all the information needed for a perfect BIM model will be available for all the buildings in a city or region. Using a digital twin as a city model is an exceedingly difficult undertaking. In product and plant design, digital twins have been used for decades to create, test, and build everything first in a virtual environment. The use of digital twins in a city context is a larger vision compared to industrial design or a building digital twin. A city digital twin can help with simulations and analysis of present and planned city environments, maintenance and administration systems, emergency planning and management, and the construction production chain. Additionally, the impact of the COVID-19 pandemic is forcing cities to accelerate their digital transformation.
City digital twins are based fundamentally on topographic and geometric models of the city infrastructure. CityGML has been used as a data model for the representation of urban objects in 3D. It defines classes and relationships for the most relevant topographic objects in city models with respect to their geometric, topologic, and semantic properties. An example of this is Helsinki's 3D data model [31] (offered also as Virtual Helsinki [32]), a digital city twin that was created in part using CityGML 2.0 (especially for the Kalasatama digital twin [33]).

Building-Related Datasets and Repositories
Available open datasets in Europe were identified and classified on the basis of the level that the reported information refers to: public EU repositories (Table 2) and public national repositories (Table 3). The identified datasets are not only datasets containing building information, but also datasets related to the energy performance of the buildings, districts, and regions. The majority of the datasets that report information at an EU level also contain information about each individual Member State.

| Repository | Description |
|---|---|
| | It monitors the energy performance of buildings across Europe. Among the features under supervision are the energy efficiency levels in buildings (EU countries/EU as a whole), certification schemes, financing aspects, and levels of energy poverty throughout the EU. |
| EU Energy Poverty Observatory [35] | It is an initiative by the EC to help Member States combat energy poverty. The approach is to use a set of indicators that each capture a slightly different aspect of the phenomenon. |
| EUROSTAT [36] | It is responsible for publishing high-quality statistics and indicators at the European level that allow comparisons between countries and regions. |
| Statistical Review of World Energy [37] | This report analyses data on world energy markets from the prior year. The review has provided timely, comprehensive, and objective data to the energy community since 1952. |
| TABULA EPISCOPE [38] | A concerted set of energy performance indicators is given. It is focused on residential building typologies and contains data about buildings' energy needs, costs, demand, emissions, etc. as a function of climate zone, construction year classes, and buildings' characteristics. |
| ENTRANZE [39] | It aims to support the policymaking process by providing the required data, analysis, and guidelines to achieve a fast and strong penetration of nZEB and RES-H/C within the existing national building stock. |
| ODYSSEE-Full database [40] | It contains energy and macroeconomic data at an economy-wide level and environmental indicators at an economy-wide and sectoral level (industry, transport, residential, services, and agriculture) over 2000-2018. |
| ODYSSEE-Key indicator tool [41] | It contains saving rates and consumption data at an economy-wide and sectoral level and offers the results in a geolocated form. |
| ODYSSEE-Decomposition tool [42] | This online tool decomposes the energy use into various explanatory effects. |
| ODYSSEE-Market diffusion tool [43] | This tool reports indicators reflecting the market diffusion of various energy-efficient technologies. |
| ODYSSEE-Comparison tool [44] | This tool enables the comparison of two countries in terms of their energy efficiency performance at an economy-wide and sectoral scale. |
| ODYSSEE-Energy saving tool [45] | This tool displays the trends and targets for the primary and final energy consumption, as well as the energy savings at a national level. |
| ODYSSEE-EU energy efficiency scoreboard [46] | This tool scores EU countries on (a) the energy efficiency level, (b) the energy efficiency progress, (c) the energy efficiency policies, and a combination of all these criteria. |
| MURE database [47] | It provides information on energy efficiency policies and measures that have been carried out in the Member States of the EU (as well as Norway, Switzerland, and Serbia). |
| EnergyPlus Weather Data [48] | EnergyPlus is an open-source whole-building energy-modelling engine. Weather data for more than 2100 locations are available in its weather format, arranged by World Meteorological Organisation region and country. |
| Climatic Research Unit (CRU) [49] | The objective of the CRU is to improve the scientific understanding of the climate system and its interactions with society. It contains monthly and annual weather data. |
| European Environment Agency (EEA) [50] | EEA focuses on providing data related to environmental policies and other topics related to the environment, taking advantage of its extensive network. |
| Our World in Data [51] | It contains information about macroeconomic and energy-related variables. |
| European Open Data Portal [52] | It represents the access point to data from the institutions, agencies, and other bodies of the EU. |
| European Data Portal [53] | It collects the metadata of the public sector information available on the public data portals of EU countries. |
| World Bank [54] | It contains information about the majority of macroeconomic and energy-related variables. |
| Organisation for Economic Co-operation and Development (OECD) [55] | Its website contains information about the majority of macroeconomic and energy-related variables. |
| Covenant of Mayors [56] | Its website contains information about the climate mitigation measures and targets set per municipality, as well as the estimated impacts in terms of greenhouse gas emissions reduction per sector. |
| The shift data portal [57] | Its website contains information about national macroeconomic and energy statistics. |
| United Nations Statistics Division website (UNSD) [58] | Its website contains information about national macroeconomic and energy statistics. |
| Ember [59] | Its website offers interactive tools that report statistics about energy systems. |
| Climate Fund Inventory Database [60] | It supports recipient countries, least developed ones in particular, by providing consolidated information on the number and types of climate funds that are available. |
| The Carbon Centre [61] | It aims to support cities, towns, and regions tackling climate change (CDP and ICLEI are partnering to present one unified process for subnational climate action reporting). This site contains information about cities' climate mitigation measures, climate targets, and performance in terms of carbon emissions. |
| Open Street Map (OSM) [62] | Maps are created using geographic information captured with mobile GPS devices, orthophotos, and other free sources. |
| Copernicus data [63] | It offers information services based on Earth observation and "in situ" data covering six thematic areas: atmosphere monitoring, marine environment monitoring, land monitoring, climate change, emergency management, and security. |
| HotMaps [64] | Values related to final energy consumption and useful energy demand for space heating, space cooling, and domestic hot water, construction materials and methodologies, technologies used, and building stock data/information can be found for both the residential and the non-residential sectors per building type and construction vintage. |
| ZEBRA [65] | It contains information related to energy performance certificates, materials employed for the buildings, energy performance, and final energy consumption, among others. |

| Repository | Description |
|---|---|
| CommONEnergy [66] | It includes building sector data and final energy demand data for non-residential buildings, especially focusing on the trade sector. |
| Integrated Database of the European Energy System (JRC IDEES) 2015 [67] | JRC IDEES offers a set of disaggregated energy-environment-economy data, compliant with the EUROSTAT energy balances, as well as widely acknowledged data on existing technologies. It also contains a plausible decomposition of final energy consumption. |
| ExcEED [68] | A European database for measured and qualitative data on beyond the state-of-the-art buildings and districts. |
| iNSPiRe [69] | Building stock analysis and data-gathering exercise focused on published literature and other sources, aiming to extrapolate information about the current residential and office building stock. |
| ZENSUS 2011 [70] | This dataset contains disaggregated data concerning a building stock analysis for Germany, information about the occupancy of the buildings, and socioeconomic-related data. Information concerning the type of heating systems used is also reported. |
| Towards a sustainable Northern European housing stock-Sustainable Urban Areas 22 [71] | It contains complete data for a building stock analysis, with data varying from state to state between 2000 and 2006. Data concerning the materials used and the heating, ventilation, and cooling systems installed are also reported. Construction/demolition rates have been added to the report. |
| DEEP [72] | DEEP is an open-source database for energy efficiency investment performance monitoring and benchmarking. It provides an exhaustive analysis of the performance of energy efficiency investments in order to support the assessment of the related benefits and financial risks. |
| D'Agostino et al. [73] | It provides an overview of the data collected by the Green Building Programme (GBP) and its main results from the launch in 2006 up to its completion in 2014. It focuses on building characteristics, energy performance, efficiency measures, and energy savings. |
| National Housing Census: European statistical System [74] | This dataset contains a variety of data collected in relation to the national census performed in 2011 by EU27 + UK Member States. It is possible to find data concerning households, such as the number of members of single households, at a granularity down to NUTS3 level. |
| EDGAR [75] | Carbon dioxide (CO2) emissions by country and sector (buildings, transport, other industrial combustion, power industry, and other sectors) have been collected for the years between 1970 and 2018 and are reported in MtCO2/year. |
| CORDEX [76] | Climatic data for Europe expressed as daily, monthly, and seasonal mean values, as well as at 3 or 6 h resolution. Data for air temperature at 2 m, wind speed, atmospheric pressure, and humidity can be found. |
| PVGIS-Photovoltaic Geographical Information System | This GIS dataset contains data related to solar radiation. It considers both day- and night-time periods, expressing the solar radiation raster map in W/m². |

| Country | Repository | Description |
|---|---|---|
| | | This database provides information related to energy verification, primary energy demand, transmittance (U-value) of façade elements, thermal production systems and emission systems, and photovoltaic and solar panels. |
| | GreenDataset [79] | It includes detailed power usage information, obtained through a measurement campaign in households in Austria and Italy. |

Table 3. Cont.

| Country | Repository | Description |
|---|---|---|
| Slovenia | Portal energetika [80] | National portal where data on energy efficiency, RES production, energy certificates of buildings, energy management, etc. are collected. |
| Slovenia | OPSI [81] | It is a single national website for the publication of open data for the entire public sector. |
| Poland | Geoportal [82] | Data (point cloud) from Airborne Laser Scanning for Poland (ALS), land development, land development plans, and cadastral data. |
| Poland | EPC register [83] | This database covers only public office buildings. |
| Spain | Spanish Cadastre data [84] | It makes the cadastral data of the territory under its jurisdiction available to citizens (almost the entire national territory). Properties' cadastral information is organised by municipality, and it is INSPIRE-compliant. |
| Spain | AEMET OpenData [85] | AEMET OpenData is a system for the dissemination and reuse of AEMET information. The State Meteorological Agency of Spain is a state agency whose objective is the provision of meteorological services, which are the responsibility of the State. |
| Spain | INE Open data [86] | The National Statistics Institute has created the Open data space in order to include the public information resources generated in it. |
| Spain | CNIG Download Centre [87] | This website provides digital geographic information produced by the National Centre of Geographic Information. |

Other relevant public repositories can be found and exploited at the regional level, such as the Energy DataHub [98] and EPC Register [99] from the region of Castilla y León. Existing private data sources that could provide useful information are "Enerdata", related to national energy statistics (e.g., energy demand, CO2 emissions), and the International Energy Agency (IEA, Paris, France), which contains information about most energy-related variables (e.g., energy consumption, energy intensity, energy prices) at a national and EU level.

"Business" Perspective: Data Analytics for the Built Environment
Data analytics in the activities and businesses related to the building sector and its lifecycle can identify useful information patterns within large datasets that can be transformed into actionable outcomes/knowledge to support improved decision making [100,101]. Data analytics applications can support energy management and optimisation, building and related infrastructure refurbishment and design, policymaking, investment de-risking, and a big data vision through geo-clustering.

Data Analytics for "Energy Performance"
A number of analytic services have been developed, focused on the operational stage of the buildings (Table 4). These services are aimed at monitoring, analysing, and improving the energy performance of buildings, as well as incorporating predictive capabilities related to comfort evaluation, energy demand, consumption, generation, and other building capabilities. The scope is to improve building energy performance and infrastructure energy performance, as well as optimise building energy efficiency considering thermal comfort perspectives, among others.

Data Analytics for "Building and Related Infrastructure Design and Refurbishment"
Data analysis supports the design, refurbishment, and development of buildings and their related infrastructure, processes that are traditionally considered subjective, complex, or unpredictable. Thus, standardised catalogues and Energy Conservation Measure (ECM)-based scenario evaluation have been used, offering retrofitting solutions appropriate to the specific typology of the analysed buildings and related infrastructure according to their current state (Table 5). Evaluating the correct application of an ECM in the energy efficiency of buildings is an increasingly popular topic in construction. Some other tools are available at the national level, such as URSA from Slovenia, and EnCert-HR and KI Expert Plus from Croatia. All of them require building information regarding building refurbishment, refurbishment costs, and usability. At the international level, the National Renewable Energy Laboratory (NREL) of the US Department of Energy developed a tool to prioritise energy efficiency investments. The tool uses established methodologies to evaluate the energy savings of these investment opportunities and the cost of those savings [119].
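To make the idea of ECM-based scenario evaluation concrete, the sketch below ranks a few measures by simple payback period (investment divided by yearly cost savings). It is a minimal illustration, not the method of any tool named above: the `ECM` record, the measure names, the cost and savings figures, and the energy price are all assumed values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECM:
    name: str
    investment_eur: float      # up-front cost (illustrative value)
    annual_savings_kwh: float  # expected yearly energy savings (illustrative)


def simple_payback_years(ecm, energy_price_eur_per_kwh):
    """Years needed for the cost savings to repay the investment."""
    annual_savings_eur = ecm.annual_savings_kwh * energy_price_eur_per_kwh
    if annual_savings_eur <= 0:
        return float("inf")  # a measure that saves nothing never pays back
    return ecm.investment_eur / annual_savings_eur


def rank_by_payback(ecms, energy_price_eur_per_kwh):
    """Return the measures ordered from shortest to longest payback."""
    return sorted(
        ecms, key=lambda e: simple_payback_years(e, energy_price_eur_per_kwh)
    )


# Hypothetical catalogue of measures and an assumed flat energy price.
ecms = [
    ECM("Facade insulation", 40000.0, 20000.0),
    ECM("LED lighting", 5000.0, 10000.0),
    ECM("Heat pump", 30000.0, 25000.0),
]
PRICE_EUR_PER_KWH = 0.25

ranked = rank_by_payback(ecms, PRICE_EUR_PER_KWH)
for ecm in ranked:
    years = simple_payback_years(ecm, PRICE_EUR_PER_KWH)
    print(f"{ecm.name}: {years:.1f} years")
```

Simple payback ignores discounting, maintenance costs, and interactions between measures, so a real catalogue-based evaluation would need a richer cost model; the ranking logic, however, stays the same.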

EPC Harmonisation and Analytics
Although the Energy Performance Certificates (EPCs) of the Member States come from the implementation of the same European directive (EPBD), EPC datasets are very heterogeneous across the countries and regions of the EU, and their comparison is very challenging. The availability of harmonised EPC datasets across Europe would be beneficial firstly at the EU level, allowing the comparison of EPC datasets from different regions and countries, and secondly for regional energy agencies, as a valuable support to energy efficiency policies. As expected, there is limited material referring to applications that use machine learning (ML) and deep learning (DL) algorithms for EPCs (Table 6).
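A first step towards such harmonisation is mapping each regional schema onto a common one. The sketch below assumes two invented regional formats with different field names and energy units; the region keys, field names, and sample records are hypothetical and do not correspond to any real EPC register. It only normalises format and units: rating labels are defined differently in each country, so equal letters after this step are not necessarily comparable.

```python
MJ_PER_KWH = 3.6  # 1 kWh = 3.6 MJ

# Hypothetical per-region mapping: source field names and unit conversion.
FIELD_MAPS = {
    "region_a": {
        "label": "clase",
        "energy": "consumo_kwh_m2",
        "energy_to_kwh": 1.0,
    },
    "region_b": {
        "label": "rating",
        "energy": "primary_energy_mj_m2",
        "energy_to_kwh": 1.0 / MJ_PER_KWH,
    },
}


def harmonise(record, source):
    """Convert one regional EPC record into the common schema."""
    mapping = FIELD_MAPS[source]
    return {
        "source": source,
        "epc_label": str(record[mapping["label"]]).strip().upper(),
        "primary_energy_kwh_m2": float(record[mapping["energy"]])
        * mapping["energy_to_kwh"],
    }


# Illustrative records only.
raw = [
    ({"clase": "B", "consumo_kwh_m2": 85.0}, "region_a"),
    ({"rating": " b ", "primary_energy_mj_m2": 360.0}, "region_b"),
]
harmonised = [harmonise(record, source) for record, source in raw]
```

Keeping the mapping in a data structure rather than in code means a new region can be added by extending `FIELD_MAPS`, without touching `harmonise`.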

Supporting Policymaking Impact Assessment and SECAPs Implementation
State-of-the-art methods and applications related to Sustainable Energy and Climate Action Plans (SECAP) decision making and policy evaluation were gathered for the purposes of this section. The findings are separated into two tables; the first one includes studies that analysed mainly textual information (Table 7), and the second one includes studies that took a more vertical approach to climate change-related issues that affect policymakers (Table 8).

Table 7. Overview of policymaking AI-relevant studies.

| Reference | Objective | Approach | Algorithm |
|---|---|---|---|
| | | Topic modelling | Latent Dirichlet allocation |
| Lesnikowski et al. [124] | Identifying environmental issues in Canada's local governments. | Topic modelling | Robust latent Dirichlet allocation |
| Rana and Miller [125] | Proving that machine learning approaches can help in understanding natural resource policy and predicting its socio-economic effects. | Socioeconomic systems and econometric methods | Causal tree (CT) and causal forest (CF) decision-tree algorithms |
| Biesbroek et al. [126] | Mapping actions regarding climate change adaptation from policy texts and identifying high-confidence blocks of adaptation. | Sorting and topic modelling | ANN |
| Debnath et al. [127] | Deep-narrative analysis in energy politics. | Topic modelling, grounded theory | Latent Dirichlet allocation |
| Hanchen et al. [128] | Identifying patterns and trends regarding hydro energy research and contributing towards strategy planning for hydro production growth. | Topic modelling | Latent Dirichlet allocation |
| Tavana et al. [129] | Identifying key issues in the energy sector and the techniques that policymakers use for risk assessment. | Text clustering, topic modelling | K-means clustering |
| Boussanis and Coan [130] | Introducing a methodology to identify and record key issues regarding policy and climate change topics. | Text analysis | Latent Dirichlet allocation |

| Reference | Objective | Data | Method |
|---|---|---|---|
| Kreif and Ordaz [131] | To provide an overview and an illustration of machine learning methods for causal inference, with a view to answering typical causal questions in policy evaluation. | Overview | Several ML methods |
| [137] | To analyse climate policy effects by providing an ex post evaluation of a real-world policy experiment of carbon pricing: the UK carbon tax, also known as the Carbon Price Support. | Fossil-fuel power plant output, fuel prices, carbon prices, emissions factors and plant-specific heat efficiencies, plant capacity, demand, temperature | Causal inference |

Data Analytics for "Energy Efficiency Financing"
When planning investments to increase the energy efficiency of a building, it is extremely important to obtain substantive support [138]. In the context of Triple-A [139], the Deep platform's data are analysed to construct predefined performance classes for energy efficiency projects. Similarly, in the context of SPEEDIER [140], the patterns and trends in the building data (e.g., users' behaviour, energy-consuming equipment on site, building fabric characteristics, etc.) are analysed to find potential areas for implementing energy-saving retrofits.
In the context of CityInvest [141], the data of the energy efficiency projects implemented through the project are analysed, in conjunction with the data of the local authorities in the pilot regions, to recognise the key success factors of energy efficiency retrofits. This feedback is then integrated into future cases. EEnvest [142] and Industrial Energy Accelerator (IEA) [143] analyse the data of the inspected energy efficiency projects (e.g., experience and capability of the technical staff, model used for baseline estimation) to quantify their risk of failing to achieve their predicted performance. Launch [144] followed the same approach, while also analysing the macroeconomic data of the country in which the investment takes place, such as the energy price trajectory, to calculate the risk of an energy efficiency project.
Fowlie [145] conducted an experimental evaluation of the Weatherisation Assistance Program using a sample of approximately 30,000 households in Michigan. R2A [146] created a database that contains big data, both real records and modelled data, about energy efficiency measures in buildings and households and their performance across eight countries. In the context of the Energy and Environmental Policy Analysis (EEPA) [147], a technology-based methodology for controlling the performance of energy efficiency projects was implemented. In the same context, Fan and Fu [148] performed a case study on building energy efficiency retrofits, mining the operational data of the building.
Two data sources were used by Garay-Martinez et al. [149] for monitoring the building's energy consumption and comfort conditions. One was the radiator energy use and the other was a weather station on the roof of the building, which captured the climate conditions. Heo and Zavala [150] presented a Gaussian process modelling framework for measurement and verification (M&V) practices. Gallagher et al. [151] employed powerful ML regression algorithms to maximise the effectiveness of available data.
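The M&V idea referred to above can be illustrated with a minimal sketch: a baseline model is fitted on pre-retrofit data and then used to estimate what consumption would have been after the retrofit under the observed weather. The sketch deliberately swaps in a plain one-variable linear regression on heating degree days; it is not the Gaussian process framework of [150] nor the ML regressors of [151]. The function names (`fit_baseline`, `avoided_energy`) and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def fit_baseline(hdd: List[float], energy: List[float]) -> Tuple[float, float]:
    """Least-squares fit of energy = intercept + slope * hdd on pre-retrofit data."""
    n = len(hdd)
    mean_x = sum(hdd) / n
    mean_y = sum(energy) / n
    sxx = sum((x - mean_x) ** 2 for x in hdd)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hdd, energy))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def avoided_energy(baseline: Tuple[float, float],
                   hdd_post: List[float],
                   energy_post: List[float]) -> float:
    """Baseline-predicted consumption minus metered consumption after the retrofit."""
    intercept, slope = baseline
    expected = sum(intercept + slope * x for x in hdd_post)
    return expected - sum(energy_post)


# Hypothetical monthly values: heating degree days and metered kWh (assumed data).
pre_hdd = [400.0, 350.0, 300.0, 150.0, 50.0, 10.0]
pre_kwh = [9000.0, 8000.0, 7000.0, 4000.0, 2000.0, 1200.0]
post_hdd = [380.0, 320.0, 100.0]
post_kwh = [7000.0, 6200.0, 2500.0]

baseline = fit_baseline(pre_hdd, pre_kwh)
savings = avoided_energy(baseline, post_hdd, post_kwh)
print(f"Estimated avoided energy use: {savings:.0f} kWh")
```

In practice, the baseline would include more explanatory variables and an uncertainty estimate, which is precisely what probabilistic approaches such as the one in [150] aim to provide.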

Geo-Clustering Service as Support to the BD Vision
Clustering technologies take advantage of different data sources and process them in order to group elements into sets with similar characteristics, which can provide very valuable information. Geo-clustering applications also use the available geographical information to include the geographical component in the algorithms, providing results that consider the location of the elements (Table 9).

Table 9. Overview of ML/DL applications in geo-clustering.

| Reference | Service | Features | Learning Algorithm |
|---|---|---|---|
| Kuster et al. [152] | Geo-mapping methodology | Definition of 116 clusters using 16 parameters on building domain. | Not defined: use of Matlab + Excel and "some criteria to select cluster", not specified |
| Sesana et al. [153] | Geo-cluster mapping tool | Geo-cluster concept is based on the possibility to locate similarities across enlarged EU by correlating single or multiple parameters and indicators organised in homogeneous layers and sublayers. | Correlation and cross-correlation between variables |
| Exceed Project [154] | Geo-cluster tool | Geo-clustering of building performances according to both energy and comfort aspects through the identification of specific KPIs. Classification of buildings by filtering them with building metadata. Benchmarking of buildings using specific indicators. | K-means algorithm |
| Fatiguso et al. [155] | Building geo-cluster | Collection of geographic and climatic data, simulation of solar radiation and wind exposure, mapping of typologies, materials, construction techniques, and historic architectural values of all the buildings. | ArcGIS mapping cluster toolset (tool not specified) |
| Gangolells et al. [156] | Building geo-cluster | Novel approach for identifying and defining a set of reference buildings by applying the k-means clustering method to energy performance certificate database. | K-means algorithm |
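As an illustration of the k-means entries in Table 9, the following minimal sketch groups buildings by location and energy intensity. It is not the implementation of any of the cited works: the building records are made up, treating latitude and longitude as ordinary Euclidean features is a rough assumption, and seeding the centroids with the first k rows is a simplification chosen to keep the example deterministic.

```python
import math
from typing import List, Sequence


def standardise(rows: List[Sequence[float]]) -> List[List[float]]:
    """Scale every column to zero mean and unit variance so no feature dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows: List[Sequence[float]], k: int, iterations: int = 50) -> List[int]:
    """Plain Lloyd's algorithm; the first k rows seed the centroids (simplification)."""
    centroids = [list(r) for r in rows[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical buildings: (latitude, longitude, kWh per m2 per year).
buildings = [
    (60.2, 24.9, 210.0),
    (37.9, 23.7, 95.0),
    (59.3, 18.1, 195.0),
    (40.4, -3.7, 110.0),
    (55.7, 12.6, 180.0),
    (41.9, 12.5, 100.0),
]

labels = kmeans(standardise(buildings), k=2)
for building, label in zip(buildings, labels):
    print(f"cluster {label}: {building}")
```

With these made-up records, the northern, high-consumption buildings and the southern, low-consumption buildings end up in separate clusters. A real geo-clustering service would typically use many more indicators and a projected coordinate system or an explicit geographic distance.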

Conclusions
Buildings are producing increasing amounts of data on energy production and consumption from various sources (e.g., smart meters, building management systems). Collecting, processing, analysing, and provisioning reliable building data are key challenges for the built environment. A number of barriers currently hamper the exploitation of the full potential of these data for improving building energy performance and management.

•
The digitalisation of the built environment is a critical and rapidly growing issue in the architecture, engineering, and construction (AEC) industry. However, a great amount of effort is still required. In particular, for the existing building stock, finding digital data to characterise buildings, their materials, or their energy consumption has been an unsuccessful endeavour.

•
The lack of quality and accuracy of data is also a challenge. Robust data quality strategies and methodologies are needed for data imputation, covering data uncertainty, data quality, reliability and data consistency, and data cleansing. Monitoring and digitalisation are essential for improving or enlarging the data stock, but they are not the only requirement. In some cases, there is no possibility of integrating hardware for digitalisation. In these cases, deep learning strategies for extracting data with more granularity are necessary to enable a digital strategy.

•
At the same time, a standardised data-driven architecture for buildings is missing. In this respect, sector-wide asset schemas should be defined, stemming from extended domain-specific ontologies, such as FIWARE, SAREF (including extensions of SAREF4Buildings), BRICK, and IFC (BIM), to allow the curation, normalisation, and homogenisation of diverse content types and artefacts.

•
A systematic approach to organising and managing data is largely missing, given that most information is not available in one place. The lack of interoperability across repositories leads to additional costs.

•
One of the main challenges concerns trusted mechanisms of data sharing and reusing, in order to maximise the value of AI-based analytics.
A solution to pave the way for an improved BDVC can be found in the deployment of digital twins. However, the notion of compiling multidimensional digital models that can support the entire lifecycle of the asset, and of mining such models for data and using them for ML/DL, requires a leap of imagination. New initiatives are necessary to address the four main pillars that will unlock the deployment of digital building twins: (1) modelling and integration of information; (2) data enrichment; (3) assuring their interoperability with different data hubs; and (4) linking them with real business cases.
The data-driven architecture should lay the foundation for seamless multilateral B2B interoperability and support cross-stakeholder, cross-domain, near-real-time, edge-based analytics services tailored to the stakeholders of the building value chain. In this way, new data-driven paradigms can be developed to analyse and manage vast volumes of heterogeneous data, whose management becomes essential for addressing increasingly complex energy systems. Data on the operational efficiency and reliability of buildings as active nodes of bigger ecosystems (i.e., districts, power systems) can be employed to inform real-time decisions, provide early warning of abnormal conditions, and avoid potential failures. Moreover, available data can be leveraged to better understand energy demand and align it with energy generation and distribution to maximise operational efficiency in buildings or groups of buildings, thus optimising the management of assets and grids connected to the building.

Acknowledgments:
The work presented is based on research conducted within the framework of the H2020 European Commission project MATRYCS under contract No. 101000158. All information related to the MATRYCS project is available on the website https://matrycs.eu/ (accessed on 28 July 2021). The authors wish to thank all the consortium partners, especially Marco Pau (RWTH), Pasquale Andriani (ENG), César Valmaseda (CARTIF), Julien Dijol, and Alice Pittini (Housing Europe), whose contribution, helpful remarks, and fruitful observations were invaluable for the development of this work. The content of the paper is the sole responsibility of the authors and does not necessarily reflect the views of the EC.

Conflicts of Interest:
The authors declare no conflict of interest.