Big Data Reference Architecture for the Energy Sector

Wehrmeister, Katharina; Pastor, Alexander; Carreras Rodriguez, Leonardo; Monti, Antonello

doi:10.3390/su17146488


Big Data Reference Architecture for the Energy Sector †

by Katharina Wehrmeister ¹,², Alexander Pastor ¹, Leonardo Carreras Rodriguez ¹ and Antonello Monti ¹,²,*

¹ Institute for Automation of Complex Power Systems, RWTH Aachen, 52074 Aachen, Germany
² Digital Energy, Fraunhofer FIT, 52068 Aachen, Germany
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the IISA Conference, Corfu, Greece, 18–20 July 2022 (DOI: 10.1109/IISA56318.2022.9904424).

Sustainability 2025, 17(14), 6488; https://doi.org/10.3390/su17146488

Submission received: 18 June 2025 / Revised: 12 July 2025 / Accepted: 14 July 2025 / Published: 16 July 2025

(This article belongs to the Special Issue Sustainability of Smart Energy Networks: Pathway for Achieving a Green Smart Energy Network—2nd Edition)


Abstract

Data sharing within and across large, complex systems is one of the most topical challenges in the current IT landscape, and the energy domain is no exception. As the sector becomes more and more digitized, decentralized, and complex, new Big Data and AI tools are constantly emerging to empower stakeholders to exploit opportunities and tackle challenges. They enable advancements such as the efficient operation and maintenance of assets, forecasting of demand and production, and improved decision-making. However, in turn, innovative systems are necessary for using and operating such tools, as they often require large amounts of disparate data and intelligent preprocessing. The integration of and communication between numerous up-and-coming technologies is necessary to ensure the maximum exploitation of renewable energy. Building on existing developments and initiatives, this paper introduces a multi-layer Reference Architecture for the reliable, secure, and trusted exchange of data and facilitation of services within the energy domain.

Keywords: energy efficiency; SGAM; smart grids; renewable energy; big data; reference architecture

1. Introduction

1.1. Motivation

In the effort to reach climate and carbon neutrality, the rise of renewable energy sources has led to an increasingly decentralized electricity system. Generation is shifting from a relatively small number of large power plants to a growing number of small-scale sites, such as residential and public buildings. In addition to being climate-friendly, this shift can reduce energy shortages and promote local consumption within communities; however, decentralization and the resulting complexity present challenges to efficient and secure grid management. The stochastic nature of distributed energy resources and new control variables demand near-real-time big data processing to maintain grid resilience and reliability. To address this, far more Information and Communication Technology (ICT) is needed in the network than was previously present, enabling decision-making and control technologies that ensure the reliability and security of supply [1,2].

These technologies require large amounts of data from disparate sources; optimized grid operation planning, for example, needs up-to-date information on a diverse set of energy resources [3]. In the emerging decentralized system, many stakeholders, data sources, and existing technologies are involved. Vital legacy grid systems use proprietary protocols that are incompatible with modern cloud-IoT platforms, leading to isolated data silos. Forming coherent conclusions and analyses in such a diverse setting is one of the challenges of the energy transition [4].

Another example is the use of transfer learning to train advanced AI forecasting tools on multiple households’ consumption history [5]. While immensely useful to system operators, such data can be seen as sensitive and must be treated with appropriate care. Therefore, in addition to interoperability, privacy and trust must be considered. Although IDS-based Data Spaces (see Section 2.3.2), for example, offer a blueprint for sovereign exchange, they must be tailored to energy-sector-specific consent, licensing, and compliance needs.

There are different strategies and technologies under development and in use to address this complex landscape of requirements, for example, measures for semantic interoperability, Data Spaces for trusted exchange, or Artificial Intelligence (AI) tools for data analysis, processing, and forecasting. However, these are often individual solutions for parts of a larger issue. In order to reach the full potential of these technologies and truly transform the energy domain, they need to come together to form coherent systems that cover not just some but all of these matters, overcoming barriers between the different requirements and components [6].

Finally, one more specific challenge is the exponentially growing volume of data present in the energy grid and collected by system operators. With existing computational platforms mostly relying on cloud computing, a holistic Reference Architecture is needed that exploits their full potential while remaining adaptive and scalable as data volumes grow [4].

Instead of providing a fixed software architecture addressing one specific implementation of a use case, Reference Architectures (RAs) aim to provide a structured development approach for a set of use cases by putting all the required building blocks into perspective and in relation with each other [7]. While there are plenty of Big Data Reference Architectures [8,9] and Smart Grid Reference Architectures [10,11], none have emerged that successfully connect both concepts in a generalized manner, covering all the above issues.

In this paper, we present the BD4NRG Reference Architecture, which forms a coherent framework for the construction of systems across different use cases (UCs) and business models in the changing energy domain while maintaining a high level of interoperability and ensuring trusted data exchange.

1.2. Methodology

The foundation for creating the BD4NRG Reference Architecture has two aspects. First, an analysis of the existing literature was conducted, the findings of which were classified into four sectors: Architecture Models, Project Coalitions, Data Space Reference Architectures, and Concrete Microservice Architectures [12]. Secondly, a broad set of real-life energy domain use cases was set up, which were ordered into three groups: optimized energy network management, the efficient management of Distributed Energy Resources (DERs), and de-risking investments into building decarbonization, respectively (see Section 3.1).

Bringing these two foundational aspects together, the use cases were first individually mapped to the version of the BRIDGE Data Exchange Reference Architecture (DERA) (see Section 2.3.1) available at the time. It was chosen as a base due to its superior versatility and prevalence compared to its peers [12]. The layer structure was then iteratively altered to accommodate all of the needs of the UCs accurately in a single architecture by identifying and incorporating aspects of the remaining literature (such as IDS RAM [13]), as well as adding BD4NRG-specific tweaks. An early version of the BD4NRG Reference Architecture emerged, which was designed to meet the initial requirements derived from the set of UCs during their preparation phase. That early version is available at [14].

This paper presents the final iteration of the BD4NRG RA, which has been significantly improved. Requirements (see Section 3.2) were extended from the initial version as example systems and use cases continued development, moving out of conceptualization through implementation and into real-life operation. Modifications were developed and included according to lessons learned in the implementation and testing phase, and the validated Reference Architecture presented in this paper (Section 4) is the final product of that iterative refinement [15].

1.3. Document Structure

The document is structured as follows. In Section 2, we define our understanding of a Reference Architecture and introduce the state of the art of big data handling within and outside of the energy domain. The three clusters of UCs are presented and used to derive specific requirements in Section 3. In Section 4, we introduce the BD4NRG RA, describe how it meets those requirements, and follow up with an analysis of indicators of its benefit. Section 5 contains a set of exemplary implementations. Section 6 reiterates conclusions on the development, assessment, and instantiation of the RA.

2. Related Architectures and Initiatives

As mentioned in Section 1.2, a literature review was conducted early in the creation process for the BD4NRG RA [12]. While that analysis was used to develop the first version of the RA [14], it is not presented here due to the time that has passed since, limiting its relevance to the final product as well as the current state of the art. Instead, the purpose of the following section is to position the final iteration presented in Section 4 in the current landscape of technology and research.

2.1. Purpose of a Reference Architecture

A single comprehensive definition of a Reference Architecture does not exist [7,16]. The general purpose of RAs is to streamline and standardize the development of software in a given domain, improving the development process by giving a structured basis to a set of systems. They are therefore not software architectures in themselves; Nakagawa et al. assess the differences between the two concepts thoroughly in [16]. Instead, RAs serve as a template and guide for the development of multiple software systems, serving different contexts with a common structure while prioritizing interoperability [17,18].

In [7], Cloutier et al. perform a comprehensive review of RAs in different contexts and solidify a set of characteristics and tasks:

Capturing knowledge from existing architectures;
Guiding the creation and evolution of new architectures;
Addressing technical architectures, business architectures, and customer context;
Being presented alongside sufficiently concrete information and guidelines.

Given a large spectrum of available services and technologies, the selection of the elements in the RA streamlines the instantiation of concrete software architectures with a standardized and interoperable implementation.

This is the interpretation on which we base our approach, including capturing knowledge from the existing architectures described in Section 2 and basing our requirements on the use cases described in Section 3.1. In Section 4.2, we use the characteristics given by Cloutier et al. to assess the resulting RA. This paper also specifically aims at an energy-domain-specific Reference Architecture in order to properly address all the needs of the UCs.

2.2. Architectural Models and Implementations in Big Data Systems

2.2.1. Relevance and Distinction of Data Spaces and Big Data

Big data refers to large, complex datasets, often streamed in real time, that require advanced methods for ingestion, storage, processing, and analysis. It is typically characterized by the five Vs: volume (scale of data), velocity (speed of data generation and processing), variety (heterogeneity of data types and sources), veracity (uncertainty and data quality), and value (the potential to derive insights and benefits). Big data systems aim to extract actionable knowledge and support decision-making through scalable architectures and analytical tools [19].

Data Spaces, in contrast, are decentralized frameworks for enabling sovereign, secure, and policy-compliant data sharing among independent stakeholders. They define how data assets can be discovered, accessed, and governed in distributed environments. A Data Space emphasizes semantic interoperability, usage control, identity management, and trust mechanisms, often underpinned by European regulations such as the Data Governance Act (DGA) and the Data Act [20].

While big data architectures focus on the technical processing and analysis of data, Data Spaces address the organizational and legal frameworks for sharing and governance. These concepts are not opposed but complementary: big data platforms provide the computational foundation for deriving insights, and Data Spaces ensure that data can be exchanged and reused in a controlled and trusted manner across domains. In the energy sector, this combination enables advanced data-driven services—such as forecasting, optimization, and predictive maintenance—while ensuring compliance with privacy, security, and sovereignty requirements.

2.2.2. Relation of BD4NRG to Big Data Concepts

Big data systems rely on well-established architectures to ensure scalability, efficiency, and governance. These architectures can be categorized into theoretical concepts, RAs, and implementations. Theoretical concepts define new paradigms, RAs provide structured models for system design, and implementations use concrete technologies to realize them. However, existing Big Data Reference Architectures face notable challenges. A systematic review by Ataei and Litchfield [9] highlights that while these architectures offer essential design models to orchestrate complex big data systems, they often lack proper metadata management, overlook security and privacy requirements, and provide insufficient evaluation frameworks. These shortcomings limit their adaptability and applicability in real-world deployments.

The ISO/IEC 20547-3 standard defines Big Data Reference Architectures (BDRAs), focusing on two key perspectives: the user view and the functional view. The user view describes the big data ecosystem in terms of stakeholder roles, sub-roles, and activities, recognizing that parties may assume multiple roles and engage in different big data tasks. The functional view, on the other hand, outlines the technology-neutral functions necessary to support these activities, structured across four architectural layers: application, processing, platform, and infrastructure. These are complemented by cross-layer functions such as integration, security and privacy, and system management. Together, these views establish a logical, vendor-neutral model of big data systems [21]. The BD4NRG Reference Architecture adopts a similar separation of concerns, mapping its functional components to layers that can support energy-specific data flows, analytics, and trust frameworks.

The NIST Big Data Reference Architecture (NBDRA) is a Reference Architecture that standardizes the components of a big data ecosystem. It defines five key roles: System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, and Data Consumer. Its goal is to facilitate interoperability across different big data solutions by providing a vendor-neutral framework [22]. BD4NRG reflects a compatible role-based decomposition but adapts it to the energy domain by incorporating stakeholders such as DSOs, TSOs, and energy service providers, along with dedicated analytics and data exchange roles.

The Lambda Architecture is another RA, designed for handling large-scale data processing by combining batch processing (historical accuracy) with real-time processing (low-latency insights). It consists of a Batch Layer, Speed Layer, and Serving Layer. While not tied to specific technologies, typical implementations include Apache Hadoop (batch) or Flink (real-time), as well as NoSQL databases like Cassandra [23]. This dual-path approach is conceptually reflected in BD4NRG through its support for both streaming and batch ingestion, processing, and analytics services, enabling time-sensitive as well as historical evaluations.
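The three-layer split described above can be illustrated with a minimal Python sketch, assuming a toy task of summing metered energy per meter. The names `Reading`, `batch_view`, `SpeedLayer`, and `serve` are hypothetical and are not taken from BD4NRG or from any Lambda implementation; plain in-memory structures stand in for the batch engine, the stream processor, and the serving database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    meter_id: str
    timestamp: int   # seconds since epoch (illustrative values only)
    kwh: float


def batch_view(master_dataset, up_to):
    """Batch Layer: recompute totals from the immutable master dataset."""
    totals = defaultdict(float)
    for r in master_dataset:
        if r.timestamp <= up_to:
            totals[r.meter_id] += r.kwh
    return dict(totals)


class SpeedLayer:
    """Speed Layer: incrementally aggregate readings newer than the last batch run."""

    def __init__(self, batch_horizon):
        self.batch_horizon = batch_horizon
        self.totals = defaultdict(float)

    def ingest(self, reading):
        # Readings at or before the horizon are already part of the batch view.
        if reading.timestamp > self.batch_horizon:
            self.totals[reading.meter_id] += reading.kwh


def serve(meter_id, batch, speed):
    """Serving Layer: merge the batch view and the real-time view for a query."""
    return batch.get(meter_id, 0.0) + speed.totals.get(meter_id, 0.0)


history = [Reading("m1", 100, 1.5), Reading("m1", 200, 2.0), Reading("m2", 150, 4.0)]
horizon = 200
view = batch_view(history, up_to=horizon)

speed = SpeedLayer(batch_horizon=horizon)
speed.ingest(Reading("m1", 250, 0.5))   # arrives after the last batch run
speed.ingest(Reading("m2", 120, 9.9))   # covered by the batch horizon, so ignored

total_m1 = serve("m1", view, speed)
total_m2 = serve("m2", view, speed)
```

In this sketch the batch view is authoritative up to the horizon, and the speed layer only covers the gap since the last batch run, which is the usual rationale for combining the two paths.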

Finally, DIN SPEC 91345 is a standard first established by the German initiative “Plattform Industrie 4.0” [24]. It defines a Reference Architectural Model for Industry 4.0 (RAMI 4.0), which establishes a set of guidelines and specifications for the design and implementation of Industry 4.0 solutions. Aiming to converge the separate worlds of Information Technology and Operational Technology (IT/OT), its fundamental structure is defined by a three-dimensional coordinate system, based on the well-established Smart Grid Architecture Model (SGAM) [10,25]. The RAMI 4.0 model is accordingly structured into six layers (Asset, Integration, Communication, Information, Functional, and Business) and seven Hierarchy Levels (Product, Field Device, Control Device, Station, Work Centers, Enterprise, Connected World) [25]. This structure was thoroughly examined and employed as an important basis for the development of the initial version of the BD4NRG RA, published in 2022 [12,14,18].

One theoretical concept relevant to the BD4NRG goals is the Data Mesh. It aims to shift from centralized data lakes toward decentralized, domain-oriented data ownership. To eliminate bottlenecks in data access, it promotes data-as-a-product, self-serve infrastructure, and federated governance. Though a theoretical concept, Data Mesh structures are being implemented in enterprises like Netflix and Zalando using technologies such as Kafka and Snowflake [26]. BD4NRG acknowledges these principles by enabling federated data sharing and semantic interoperability through its vertical Data Space pillar, thereby addressing decentralization, data ownership, and governance.

On the implementation level, the Cloud Native Computing Foundation (CNCF) landscape categorizes cloud-native tools for big data infrastructure. It provides a map of open-source and proprietary solutions across storage, data processing, and orchestration. Unlike RAs, the CNCF landscape is not a model but a collection of real-world tools used to build scalable big data platforms [27]. The BD4NRG RA remains technology-agnostic, but its layered design is compatible with cloud-native deployments and toolchains, enabling implementation via CNCF-aligned platforms.

2.3. European Landscape

There are several joint European initiatives addressing the topic of today’s ICT systems’ increased complexity (including, but not limited to, the energy domain) and the necessity for trusted data exchange and its coordination.

These initiatives share common goals with the models and implementations introduced in Section 2.2 (interoperability, data governance, and decentralized data sharing) but differ in scope. Initiatives and projects funded by the European Commission tend to focus on cross-organizational and cross-border data exchange under regulatory frameworks like the DGA and Data Act, while architectures like NBDRA, Lambda, and Data Meshes primarily address data processing within individual organizations. Both emphasize federated data access: Data Meshes in enterprises, and the European approach in initiatives like GAIA-X [28]. However, the development of systems on the EU level requires additional layers of legal, trust, and sovereignty mechanisms, which go beyond traditional big data architectures.

In the following, we introduce the European initiatives that were the most fundamental to the development of the BD4NRG Reference Architecture, along with its own previous version, first introduced in 2022.

2.3.1. BRIDGE DERA

BRIDGE is a European initiative that was founded under the Horizon 2020 program and now continues under Horizon Europe. It aims to tackle wide-ranging challenges in projects across the domains of Smart Grids, Energy Storage, Islands, and Digitization [1].

The initiative is split into four working groups: Data Management, Regulation, Consumer and Citizen Engagement, and Business Models. BRIDGE DERA is being developed by the working group on Data Management. It aims to structure and standardize data exchange practices not just between Distribution System Operators (DSOs) and Transmission System Operators (TSOs), but also between any of the energy sector’s multitude of stakeholders. It is based on the SGAM framework, which visualizes the concept of a Smart Grid in a three-dimensional way through Layers, Domains, and Zones [10,29]. While SGAM provides an intuitive 3D mapping of smart-grid components, it remains purely conceptual and does not natively capture modern data sharing paradigms or emerging digital actors (e.g., AI services or data marketplaces).

DERA 2.0 was published in 2022 and consists of the five vertical layers given by SGAM: Component, Communication, Information, Function, and Business. DERA 2.0 adds a horizontal divide within most of them (in contrast to the Zones and Domains in SGAM) to separate data-exchange-specific aspects from each other. Its graphical representation also includes specific common instances of each area, but it is not restricted to those alone [30].

The Component Layer is split into three sublayers: data exchange solutions, applications, and hardware. It is sparsely populated, as the individual instances of all three can vary significantly by application. Above it, in contrast, the Communication Layer consists of Data Formats and Protocols, with each naming several specific popular standards. The Information Layer is separated into two areas (Information Models/Ontologies and Profiles/Data Models), while the Functional Layer contains a larger number of exemplary Functional Processes necessary for data exchange, security, and quality. Lastly, on the Business Layer, we find four categories of stakeholders: Regulation, Associations, Role Models, and Business Processes [30].

DERA 3.0 is its follow-up version, published in 2023. It leans more into the emerging field of Data Spaces, keeping the same layer structure as DERA 2.0 but removing the individual sublayers in favor of splitting the entire architecture into “Local” and “Federated” areas. The Local part refers to (multiple) data platforms implemented by individual actors or groups thereof, collecting and maintaining their own data locally. The Federated part represents a Data Space, functioning as a connecting hub and providing indexing, discovery, and trading for both data and services.

The two blocks are connected by a Data Space Connector, which is to be implemented by each individual Local platform in order to communicate with the Federated Data Space [1].
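The Local/Federated split and the connector between them can be sketched as follows. This is a minimal illustration under stated assumptions, not the DERA 3.0 specification: the classes `FederatedCatalog`, `LocalPlatform`, and `DataSpaceConnector`, their methods, and the example dataset are all hypothetical. The sketch assumes that the Federated part indexes metadata only, while the data itself stays on the Local platform and is served through its connector.

```python
from dataclasses import dataclass, field


@dataclass
class FederatedCatalog:
    """Federated part: indexes metadata only; the data stays with its owner."""
    index: dict = field(default_factory=dict)   # asset_id -> (connector, description)

    def register(self, asset_id, connector, description):
        self.index[asset_id] = (connector, description)

    def discover(self, keyword):
        """Return the IDs of all assets whose description mentions the keyword."""
        return sorted(a for a, (_, d) in self.index.items() if keyword in d)

    def fetch(self, asset_id):
        """Route a request to the connector of the platform that owns the asset."""
        connector, _ = self.index[asset_id]
        return connector.provide(asset_id)


@dataclass
class LocalPlatform:
    """Local part: an actor's own platform holding the actual data."""
    name: str
    datasets: dict = field(default_factory=dict)


class DataSpaceConnector:
    """Hypothetical connector implemented by each Local platform."""

    def __init__(self, platform, catalog):
        self.platform = platform
        self.catalog = catalog

    def publish(self, dataset_name, description):
        asset_id = f"{self.platform.name}/{dataset_name}"
        self.catalog.register(asset_id, self, description)
        return asset_id

    def provide(self, asset_id):
        dataset_name = asset_id.split("/", 1)[1]
        return self.platform.datasets[dataset_name]


catalog = FederatedCatalog()
dso = LocalPlatform("dso-a", {"feeder-load": [10.2, 11.0, 9.7]})
connector = DataSpaceConnector(dso, catalog)
asset = connector.publish("feeder-load", "hourly feeder load in MW")

found = catalog.discover("feeder load")
data = catalog.fetch(found[0])
```

A real connector would additionally negotiate usage policies and authenticate both sides before any data is transferred; those steps are omitted here for brevity.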

Although valuable for cross-project alignment, the high-level structure of DERA lacks a dedicated Governance Layer and leaves data ownership, lineage, and compliance requirements underspecified. BD4NRG refines DERA by introducing a concrete, system-wide governance pillar with modules for consent management, audit logging, and policy enforcement, turning the conceptual blocks of DERA into an implementable blueprint.
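To illustrate how consent management, policy enforcement, and audit logging can work together in one cross-cutting module, consider the following minimal sketch. It is not the BD4NRG implementation; the class `GovernancePillar`, its methods, and the participant and purpose names are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GovernancePillar:
    """Hypothetical cross-layer module: consent, policy enforcement, audit log."""
    consents: set = field(default_factory=set)      # (owner, consumer, purpose)
    audit_log: list = field(default_factory=list)   # append-only decision records

    def grant(self, owner, consumer, purpose):
        """Consent management: the data owner allows one consumer one purpose."""
        self.consents.add((owner, consumer, purpose))

    def revoke(self, owner, consumer, purpose):
        self.consents.discard((owner, consumer, purpose))

    def authorize(self, owner, consumer, purpose):
        """Policy enforcement: allow only with matching consent, log every decision."""
        allowed = (owner, consumer, purpose) in self.consents
        decision = "ALLOW" if allowed else "DENY"
        self.audit_log.append((owner, consumer, purpose, decision))
        return allowed


gov = GovernancePillar()
gov.grant("household-17", "dso-a", "load-forecasting")

ok = gov.authorize("household-17", "dso-a", "load-forecasting")
denied = gov.authorize("household-17", "dso-a", "marketing")
gov.revoke("household-17", "dso-a", "load-forecasting")
after_revoke = gov.authorize("household-17", "dso-a", "load-forecasting")
```

Binding consent to a purpose rather than only to a consumer is a design choice of this sketch; it reflects the idea that the same data may be acceptable for grid operation but not for other uses.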

The development of the BRIDGE DERA is based on dedicated survey input from a multitude of Horizon projects in this field, including BD4NRG [30]. It is, therefore, a very broad approach with limited specificity, but it is a very useful and reputable base to start from.

2.3.2. IDS-RAM

The International Data Spaces Association (IDSA) aims to leverage existing standards and technologies to facilitate sovereign and self-determined data exchange between otherwise independent stakeholders. It works on research, standardization, and development activities for tangible products towards this aim [31].

Part of these activities has included the development of the IDS Reference Architecture Model (IDS-RAM). It is a high-level representation of stakeholders’ needs across a data sharing system, structuring the required aspects into five layers across the perspectives of security, certification, and governance. As IDS-RAM is sector-agnostic, these layers are similar but not identical to those of SGAM. From bottom to top, they are the System Layer, Information Layer, Process Layer, Functional Layer, and Business Layer [13].

In line with this work (and specifically existing on the Business Layer), IDSA has defined the roles of different participants within a Data Space (DS) and their interaction. They are split into four categories:

Core Participants include members directly involved in any data exchange, namely the Data Owner, Data Provider, Data Consumer, and Application Provider.
Intermediary Participants are responsible for the connection between participants and the discovery and registration of the assets of a DS. They are the Metadata Broker Service Provider, Clearing House, Identity Provider, App Store, and Vocabulary Provider.
The Software and Services category, as its name suggests, includes Software Providers and Service Providers; Software Providers are responsible for the actual implementation of functionalities required by the DS, and Service Providers offer additional services to participants (such as analytics).
Governance Bodies comprise the IDSA itself, as well as a Certification Body and Evaluation Facility, which are responsible for governing the participants and components within a DS, ensuring certification for collective quality assurance and standardization.

These roles serve as a way to structure the members involved in a DS and emphasize the responsibilities that need to be covered within it [31].
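The role categories listed above can be captured in a small lookup structure, for example to check which responsibilities are still unassigned when a Data Space is set up. The sketch below is illustrative only: the role names follow the listing in this section, whereas `Category`, `ROLES`, `category_of`, `uncovered_roles`, and the example participants are hypothetical names that are not defined by IDSA.

```python
from enum import Enum


class Category(Enum):
    CORE = "Core Participants"
    INTERMEDIARY = "Intermediary Participants"
    SOFTWARE_AND_SERVICES = "Software and Services"
    GOVERNANCE = "Governance Bodies"


# Role names as listed in this section, grouped by category.
ROLES = {
    Category.CORE: [
        "Data Owner", "Data Provider", "Data Consumer", "Application Provider",
    ],
    Category.INTERMEDIARY: [
        "Metadata Broker Service Provider", "Clearing House",
        "Identity Provider", "App Store", "Vocabulary Provider",
    ],
    Category.SOFTWARE_AND_SERVICES: ["Software Provider", "Service Provider"],
    Category.GOVERNANCE: ["IDSA", "Certification Body", "Evaluation Facility"],
}


def category_of(role):
    """Look up the category a role belongs to."""
    for category, roles in ROLES.items():
        if role in roles:
            return category
    raise KeyError(f"unknown role: {role}")


def uncovered_roles(assignments):
    """Return the listed roles that no participant has taken on yet."""
    covered = {role for roles in assignments.values() for role in roles}
    return [r for roles in ROLES.values() for r in roles if r not in covered]


# Hypothetical pilot: one participant may hold several roles.
assignments = {
    "dso-a": ["Data Owner", "Data Provider"],
    "forecast-startup": ["Data Consumer", "Service Provider"],
}
missing = uncovered_roles(assignments)
```

Such a check makes explicit that a participant can hold several roles at once and that every listed responsibility needs an owner before the Data Space is operational.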

IDS-RAM excels at sovereign data exchange but is sector-agnostic and focuses solely on the data sharing aspect without guidance on integrating, e.g., energy-domain services or legacy grid systems. BD4NRG embeds IDS connectors and roles within an energy-specific stack, linking sovereign data exchange directly to grid operation platforms, DER management, and analytics tools, bridging the gap between generic Data Spaces and practical energy use cases.

2.3.3. Common European Energy Data Space

The Common European Energy Data Space (CEEDS) Blueprint v2 [32] provides a broad conceptual architecture for federating diverse energy sector data platforms across Europe, emphasizing governance, technical, and semantic interoperability. CEEDS specifically builds upon existing initiatives such as the BRIDGE DERA and Smart Grid Architecture Model (SGAM), extending them into a federated, pan-European ecosystem.

It introduces standardized connectors that allow local data platforms—ranging from metadata hubs and flexibility registers to e-mobility service providers and DER management systems—to participate seamlessly within a unified Data Space. By integrating federated identity management, semantic vocabularies (e.g., IEC CIM and SAREF), and enforceable usage policies, CEEDS aims to enable secure and trusted cross-border data exchange. However, CEEDS remains at the conceptual level, providing detailed approaches and recommendations to guide the future real-world realization of energy Data Spaces, with a particular focus on enhancing existing infrastructures toward full Data Space integration.

In addition to defining the general architecture, the blueprint identifies a set of representative, high-level energy business use cases—such as collective self-consumption in energy communities, residential DER aggregation, TSO-DSO flexibility coordination, electromobility roaming, and renewables operation and maintenance optimization—which illustrate the economic and operational benefits expected from the adoption of Data Space principles in the energy sector.

CEEDS has been developed collaboratively within the Energy Data Space Cluster Projects (EDSCP) initiative, with the intention that it will align with broader European Data Space standards, such as those defined by the Data Spaces Support Centre (DSSC) and ISO/IEC/IEEE 42042 [33].

The authors of this paper contributed to the development of CEEDS. CEEDS outlines a visionary pan-European Data Space but remains a conceptual blueprint with no concrete reference implementation and presupposes a fully federated ecosystem. BD4NRG is fully aligned with CEEDS principles, yet it is immediately actionable; it can be instantiated for current use cases without waiting for or requiring a continental-scale Data Space while remaining “Data Space ready” for future federation.

2.3.4. BD4NRG: First Version

The first version of the BD4NRG Reference Architecture was presented in 2022 at the 13th International Conference on Information, Intelligence, Systems & Applications (IISA) [14]. It represents the state of the BD4NRG Project at the time of publishing. Since then, a set of updated requirements based on more mature UCs (see Section 3) has made it possible to expand and specify the architecture into a full version.

The architecture uses the 2021 version of the BRIDGE DERA (see Section 2.3.1) as a basis, adopting the SGAM layers as well as some of its BRIDGE-specific sublayers. Additionally, a vertical pillar is included to represent aspects of data sharing (and governance in particular) that cannot be confined to a single layer and, instead, need to span across the entire system. This concept is expanded upon and refined in the final version of the BD4NRG RA; see Section 4.1.

As this paper presents the full and finished version of the BD4NRG RA, a more detailed account of its predecessor is omitted here. For the full description, see [14].

3. Derivation of Requirements

In the following, we describe the different types of BD4NRG use cases and present the final set of requirements for the Reference Architecture derived from their properties and needs.

3.1. Basic Use Cases

This subsection introduces the three clusters of UCs that the BD4NRG RA aims to cover: the maintenance and reliability of the energy grid, the optimization of DER management, and de-risking investments in the energy efficiency of buildings. Each cluster involves three to four real-world UCs located in different countries. Together, they represent the wide range of goals that the proposed Reference Architecture shall address while staying specific to the actual benefits such systems will bring to the energy domain. The following paragraphs describe the individual goals and needs of the clusters and how these inform the requirements defined in Section 3.2.

Firstly, BD-4-NET is a cluster of UCs relating to the management of the electricity network. Focusing on the needs of TSOs and DSOs (see Section 3.2: Actors within the system), these UCs aim to use big data to increase the grid’s reliability and efficiency. To this end, they prioritize forecasting and predictive analytics, requiring large amounts of disparate data to be collected, organized, and processed (see Section 3.2: data access; Data Preprocessing). Some prescriptive tools that utilize existing flexibility in a grid-serving way also fall into this category [2,3].

Predictive asset maintenance and management tools can have very diverse needs depending on the type of asset involved. In addition to data from the assets themselves, they may also require access to external APIs providing, e.g., weather data [2]. Some predictive tools, in particular, may be based on other descriptive or declarative functionalities, making it necessary to access these and forward results in a secure way (Section 3.2: Data Processing and Analytics; access management) [15].

Secondly, BD-4-DER describes a group of UCs aiming to improve the efficiency and management of DERs connected to the electricity grid through the use of big data analytics tools. This includes, e.g., improving the use of flexibility potentials in residential assets, such as heat pumps or electric boilers, the prediction of energy generation, and the predictive maintenance of Battery Energy Storage Systems (BESSs) [34].

Such analysis requires access to other kinds of information than the UCs in BD-4-NET, including different types of residential data, resulting in an increased need for trusted exchange and clear data provenance and traceability (see Section 3.2: data access; access management; governance and trust) [35]. However, similar to the BD-4-NET cases, the main benefit is provided through predictive and/or prescriptive tools, which, in turn, require the secure use of separate declarative or descriptive tools and services (see Section 3.2: Data Processing and Analytics) [15].

Lastly, BD-4-ENEF UCs aim to use big data to de-risk investments as well as improve the efficiency and comfort of buildings. Rather than focusing solely on electrical assets themselves, UCs in this cluster consider Energy Performance Certificates and contracts, business models, and funding models, specifically regarding public buildings such as schools (see Section 3.2: data access; Actors within the system) [36].

Analyzing their expected and actual impact to de-risk financial decisions in the future requires very different kinds of information than both BD-4-NET and BD-4-DER UCs would access. While there is still energy consumption data required to assess the actual efficiency of existing projects and contracts, production data is not involved here. Instead, administrative data and information from public services are brought together with consumption data and analyzed (see Section 3.2: Data Preprocessing; Data Processing and Analytics) [35].

3.2. Requirements

To meet the diverse demands posed by the use cases outlined in Section 3.1, the BD4NRG RA must fulfill a layered set of requirements, ensuring that the system can adapt to the necessary breadth and variability of real-world energy applications. The initiatives and architectural approaches analyzed in Section 2 share a common approach of grouping their functionalities into layers and arranging them in a hierarchical order that allows progressive abstraction for the subsequent services and stakeholders. By adhering to a layered structure in the following way, BD4NRG will ensure compliance with both the SGAM standard from the Smart Grid domain and DIN SPEC 91345 from the Industry 4.0 domain [12].

The specific requirements presented in the following paragraphs explicitly do not represent design or implementation choices but rather the foundational needs that arise from the considered set of use cases. In addition to the parameters of individual UCs, the aim of the RA is to cover a larger number of them, which results in additional requirements to enable versatility.

The following requirements were referenced to derive a sublayer structure that specializes the functionalities along key technical and organizational dimensions. The resulting RA is described in Section 4.

Data access to heterogeneous sources that comprise the energy domain requires interoperability with many types of hardware devices, like sensors, meters, and gateways, but also existing databases and external APIs (e.g., for accessing weather data necessary for production forecasting) [5]. Interoperability with popular, established open data sources is an additional necessity. The specific kinds of data being used and shared in each pilot can be found in [35], along with corresponding data types, the hardware that is their source, the control and data acquisition applications that use or make them available, and the employed exchange platforms, some of which are in the form of APIs.

Data Preprocessing is strictly necessary when dealing with bulk and stream data from disparate sources, especially in a big data context. There needs to be support for a variety of information and data models, as well as data formats and protocols, making it possible to semantically bring together that data as well. The corresponding report describes specific protocols and data formats required by each UC, as well as their resulting need for and choice of data interoperability concepts [35].

Data Processing and Analytics tools form the main value proposition for most of the regarded UCs. They vary vastly among the different application areas, sometimes being stand-alone and sometimes requiring cooperation among multiple tools. In any case, they require management as well as coordinated and secure data access. The specific tools can vary from “simple” services (Data as a Service; Simulation as a Service), to descriptive and declarative tools concerning the current state of a system, up to predictive and prescriptive tools enabling the user to interact with future outcomes. This classification of functionalities was decided on after the creation and assessment of the first version of the RA. During the development and implementation of many UC-specific functionalities, it proved to be a useful categorization for the systematic management of a wide range of tools [15]. It enables an understanding of their scope and of the ways in which individual tools may be able to support and build on each other’s results. Therefore, these categories need to be explicitly represented in order for the RA to achieve its goal of bringing structure to complicated ideas.

Access management and the distribution of tools and data are necessary for a working system. In each system developed based on the BD4NRG Reference Architecture, it must be clear who can access which assets, who owns them, and how compensation for accessing another participant’s data or service is handled. Participants need to be able to discover assets that are available to them in a structured way, and purchases and payments need to be facilitated fairly and securely, with stakeholders maintaining ownership of their data.

Governance, sovereignty, and trust are big issues across the entire proposed system structure. The RA should enable providers to share their data and their tools securely and on their terms while still enabling access in as open a manner as possible. The tracing of data and data provenance is required for this aim, as well as clear identity management and usage policy brokering. Cooperation, as well as service-level agreements, between stakeholders is required to manage fair and secure trade throughout the entire process. This is especially vital in the context of increasingly relevant AI services and tools, which require large amounts of training data to be acquired ethically and securely. In the energy domain, such tools can be immensely useful, while, at the same time, the necessary training data (such as private consumption data) can be seen as sensitive and needs to be handled accordingly [5]. Owners staying in control of their data’s use in current and future applications and tools is becoming increasingly universal as a challenge and a requirement.

The actors within the system are very varied in the electrical sector, and in emerging new markets, they may adopt more than one role. In the implementation cases where a Data Space is created, taking on general DS roles such as Data Consumer or Provider needs to be possible for basically any participant, regardless of their legal form or business role. The system needs to be accessible to energy business roles, such as DSOs and TSOs, and novel actors related to renewable technologies and markets, as well as their financial stakeholders. The ecosystem will not be complete without considering regulation instances and legal and normative aspects such as the Green Deal or the European Data Strategy (which, along with the EU Data Act, will be applicable to the actors), as well as compatibility with prevalent associations such as GAIA-X and IDSA (see Section 2.3.2).

4. The BD4NRG Reference Architecture

This section describes the final version of the BD4NRG Reference Architecture, before assessing its merit according to non-functional indicators. The RA has been defined in compliance with the 4 + 1 View model by Kruchten [37]; in order to highlight how the defined requirements are met, we present it here specifically through the lens of the logical view. The appropriate scenarios and development view can be found in the corresponding documentation [15].

4.1. Description of the Reference Architecture

The BD4NRG Big Data Reference Architecture consists of four layers and a vertical pillar stretching across all of them. Building on the structure of BRIDGE DERA version 2.0 (see Section 2.3.1), it extends the Interoperability Layers introduced in SGAM [10] by adding data analytics tools and services, to be shared within a system, to the Functional Layer. An additional Marketplace Layer connects participants to these tools and services, and a system-wide pillar contains DS and data sovereignty functionalities that stretch across the full architecture. In this way, all relevant technologies and requirements are joined into an overarching concept, connecting their functionalities into one harmonious structure (Figure 1).

The original SGAM concept explicitly defines all its layers as “Interoperability Layers”. This idea is carried over to the BD4NRG Reference Architecture, with the Data Interoperability Layer specifically focusing on communication and information concepts. The added vertical pillar enables the use of this technical interoperability across the entire system in a safe and trusted way.

The following paragraphs describe each layer and the vertical pillar in relation to the previously defined requirements.

At the lower end, the Data Sources Layer consists of several sections with different types of sources from which data can be obtained. In accordance with the data access requirements defined in Section 3.2, this includes (but is not limited to) existing databases as well as open and closed APIs and existing data exchange platforms. This is broadly comparable to BRIDGE and the Component Layer in SGAM; however, the focus lies on data acquisition. BRIDGE names applications such as privacy provision and big data tools in this layer, whereas the BD4NRG RA handles these aspects in separate layers and the vertical pillar (see below).

Data acquired from the data sources may come in different data types and file formats, or it could arrive via several different protocols and under various timings. The Data Interoperability Layer contains the tools necessary to handle this heterogeneity. It contains two sublayers, focusing on communication and information, respectively. They mirror the Communication and Information Layers in the BRIDGE DERA.

The Communication Sublayer adheres to its BRIDGE counterpart, being separated into the sections “Protocols” and “Data Formats”. It contains individual instances of both that correspond to those most relevant to the UCs described in Section 3.1. The Information Sublayer similarly specifies two sections, “Information Models” and “Data Models”, also corresponding to their BRIDGE counterparts but with each specifying more individual instances, which arise, again, from the requirements of the UCs.

The Data Interoperability Layer is crucial for fulfilling the preprocessing requirement, making it possible to truly be interoperable with the necessary variety of sources.

The Functional Layer contains some of the most significant departures from the BRIDGE format. It consists of two sublayers: a Data Analytics Toolbox and a Marketplace Sublayer.

The Data Analytics Toolbox Sublayer contains crucial functionalities not represented in BRIDGE DERA, and this layer is focused on system services necessary for exchange among users. Following the “Data Processing and Analytics” requirement defined in Section 3.2, this sublayer contains five different categories of big data analytics functionalities: services, descriptive tools, declarative tools, predictive tools, and prescriptive tools. The specific set of included tools reflects those implemented by the full set of use cases on which the RA is based. As with the instances of, e.g., Data Sources in the respective layer, this toolset is not exhaustive. Future implementations serving new use cases may include other tools, which can then be organized within the BD4NRG structure according to the tool classification it provides.
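As an illustration only, the five categories named above could be represented as a small catalogue in which each tool declares its category and the tools whose results it builds on. The class names, tool names, and the `builds_on` field are hypothetical and are not taken from the BD4NRG software; the sketch merely shows how the classification can organize a growing toolset.

```python
from dataclasses import dataclass, field
from enum import Enum


class ToolCategory(Enum):
    """The five Toolbox categories named in the text."""
    SERVICE = "service"
    DESCRIPTIVE = "descriptive"
    DECLARATIVE = "declarative"
    PREDICTIVE = "predictive"
    PRESCRIPTIVE = "prescriptive"


@dataclass
class Tool:
    name: str                      # hypothetical tool identifier
    category: ToolCategory
    # Names of already registered tools whose results this tool consumes.
    builds_on: list[str] = field(default_factory=list)


class Toolbox:
    """Hypothetical in-memory catalogue, not part of the BD4NRG software."""

    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        # Reject tools that depend on something the catalogue does not know.
        missing = [dep for dep in tool.builds_on if dep not in self._tools]
        if missing:
            raise ValueError(f"unknown dependencies for {tool.name}: {missing}")
        self._tools[tool.name] = tool

    def by_category(self, category: ToolCategory) -> list[str]:
        return sorted(t.name for t in self._tools.values() if t.category is category)


toolbox = Toolbox()
toolbox.register(Tool("grid_dashboard", ToolCategory.DESCRIPTIVE))
toolbox.register(Tool("day_ahead_forecast", ToolCategory.PREDICTIVE))
toolbox.register(
    Tool("charging_scheduler", ToolCategory.PRESCRIPTIVE, builds_on=["day_ahead_forecast"])
)
print(toolbox.by_category(ToolCategory.PREDICTIVE))
```

A tool from a future use case would be added with a further `register` call under one of the existing categories, which mirrors the non-exhaustive nature of the toolset.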

Some functionalities contained in the Functional Layer of BRIDGE DERA are found here within the Marketplace Sublayer. It contains governance and User Functionalities necessary to facilitate the exchange, discovery, and accounting of the tools and data from the lower layers. However, it does not contain the full width of functionalities specified in DERA 2.0, as the aspects relevant to data sovereignty and trust, as well as DS governance, are represented in a layer-overarching pillar instead.

The layer representing Business Actors and Ecosystems is fairly consistent with the Business Layer defined by SGAM and used in BRIDGE. In fact, the sections within it are largely similar to those in BRIDGE DERA 2.0, with one exception: we define DS Stakeholders as a base role that any stakeholder could fill within a compliant system. This moves in the same direction as the updated DERA 3.0 version, acknowledging the importance of Data Spaces as a part of big data handling and sharing, without limiting their facilitation to only one specific kind of user. These roles are consistent with the needs of the BD4NRG UCs (Section 3.1) and the derived requirement regarding actors within the system (Section 3.2), as well as being compliant with the IDSA system (Section 2.3.2).

In addition to the layers, the BD4NRG RA contains a vertical pillar to stretch across all of them. It contains components that are crucial for IDSA-compliant sovereignty and trust, such as a Vocabulary Hub and Metadata Broker, as well as tools for identity management. Data Space governance is achieved according to the governance and trust requirement (Section 3.2) through cooperation- and service-level agreements between stakeholders. While having been developed concurrently, this reinforces similar priorities to BRIDGE DERA 3.0, specifically including Data Space functionalities and components in the representation of the overall system. However, in contrast to DERA 3.0, the BD4NRG RA does not require a DS to be set up (or a Connector to be used) in order for a system to be compliant and, instead, sees it as one of the multiple options to instantiate the Reference Architecture, depending on each individual system’s needs and priorities.

4.2. Indicators

Given the description of the BD4NRG Reference Architecture, this section applies a set of non-functional indicators to assess its benefits and limitations.

As described in Section 2.1, a Reference Architecture is not a software architecture. Instead, it spans a range of UCs, providing a common structure and support for the development of multiple systems. The following section applies criteria and quality attributes for RAs given by Galster et al. [17] and Cloutier et al. [7] to the BD4NRG Reference Architecture. Additionally, Galster et al. mention some general “criteria for good RAs”, which we subsequently address [17].

4.2.1. Alignment with Existing Initiatives

The use of knowledge from existing and proven architecture concepts is named as a quality criterion for RAs in both sources [7,17].

Section 1.2 describes how a thorough review of the state of the art at the time was employed as a fundamental base for the first version of the BD4NRG RA, on which this one is based in turn. The current most relevant related initiatives and existing architectures are put in relation to the final product in Section 2. In addition to utilizing and evolving the proven structures of BRIDGE DERA, RAMI 4.0, and SGAM, as described in the description of each layer, the BD4NRG RA adheres to the IDSA principles of data sovereignty and trust, enabling systems developed according to the RA to be compatible with IDS-RAM. While not mandating the use of a Data Space in particular, the RA is compatible with the concept and allows for a system created within its bounds to include or be governed by one.

4.2.2. Support for Instantiation

The usefulness of an RA is characterized according to both sources by its support of the instantiation of software for specific UCs [7,17].

The BD4NRG RA gives a comprehensive logical view, structuring the full range of components that a system addressing one of its UCs will need to cover. Corresponding documents from BD4NRG describe concrete implementations [34,36,38,39]. Examples of this are discussed in Section 5. Galster et al. name the annotation of the RA with attributes and rules as an indicator of this criterion. A comprehensive description of relevant data exchange specifications can be found, along with the RA, in the corresponding report [15].

Additionally, as mentioned in Section 2.1, Cloutier et al. expect an RA to address technical and business architectures as well as customer context [7]. The BD4NRG RA approaches technical architectures through the adapted BRIDGE and IDS-RAM structures. Customer context is addressed through the inclusion of the Business Actors and Ecosystems Layer, as well as the Marketplace Sublayer, which serves to include user interests as well as monetization in the derived systems.

4.2.3. Documentation

Galster et al. recommend the documentation of an RA in the form of the 4 + 1 View model, with the choice of the exact views depending on the individual context [17]. Similarly, Cloutier et al. require a presentation of an RA alongside “sufficient concrete information and guidelines” [7]. In this paper, the BD4NRG RA is presented from the logical view (compliant under “viewpoints” with ISO 42010 [40]), with the additional development view to be found in the corresponding report [15].

4.2.4. Adaptability

The BD4NRG RA is applicable to a wide range of use cases due to its development base. That foundation also makes it possible to adapt to new UCs, as well as supporting changes to existing ones. The RA does not prescribe specific technologies to be used. Instead, it provides a structure that UCs can adapt to their needs. This technology neutrality is present in the conceptual layers and building blocks, which enables freedom of choice regarding specific functionalities and the implementation of components, as well as the interactions and interfaces between them. This promotes a smooth inclusion of new, emerging technologies and methodologies, in addition to allowing compliance with multiple requirements simultaneously.

4.2.5. Understandability

The RA provides an overview of a whole system while making an effort to be visually understandable. As mentioned previously, it is presented alongside several reports, enabling a deeper understanding and clearer definition of possible applications [15,38].

4.2.6. Accessibility Within Organization

This paper, as well as all the aforementioned reports, is either publicly available or will be [2].

4.2.7. Inclusion of Key Issues of Specific Domains

The specific needs of the energy domain are included, especially in the lower three layers, considering specific commonly required data sources, interoperability concepts, and analytics tools. They were included based on UCs spanning diverse aspects of this domain. The following section will emphasize this by mapping examples of them to the given structure.

5. Exemplary Implementations

As mentioned in Section 2.1, a Reference Architecture is not a single software architecture but rather a template to guide the implementation of multiple systems across different projects and UCs, which do not always need to use all components represented in the RA.

This section introduces three exemplary implementations, each catering to one of the UC clusters described in Section 3.1. They show how the same RA can result in a variety of implementations. Presenting one individual UC per cluster results in not all aspects of the RA being relevant to this specific selection, as visualized in Figure 2, showing a mapping of these specific UCs to the BD4NRG RA.

The use cases presented in the following subsections are not exhaustive. In total, there were over a dozen UCs identified and implemented [3,41,42]. We introduce the following three as examples in order to have a representation of all three clusters of BD-4-NET, BD-4-DER, and BD-4-ENEF.

5.1. BD-4-NET

The UC with the title “Efficient management of flexibility resources for supporting grid operation” is part of BD-4-NET and concerns a district connected to the grid by two secondary substations located in Terni, Italy, and managed by the local DSO, ASM Terni. The district contains Photovoltaic (PV) production and electrical consumption, including Electric Vehicle (EV) charging stations. The goal of this UC is to assist the DSO in efficiently managing these charging stations in a grid-friendly way by leveraging datasets from the charging stations themselves, as well as the state of the grid and the available PV production [3].

Advanced big data and AI-supported techniques for forecasting [43], as well as demand-side and asset management, were developed and employed to optimize distribution grid management to this end. The results show how the applied techniques can improve efficiency and ensure power reliability, significantly supporting the work of the DSO, who led this pilot [44].

The BD4NRG RA served in structuring the full software stack necessary for these activities. Full documentation can be found in the corresponding report [39], and the following paragraphs describe its alignment with the RA structure.

5.1.1. Data Sources

The Data Sources accessed by the system of this UC include IoT smart meters installed across the district at various end users’ locations (schools, businesses, households, etc.), Phasor Measurement Units (PMUs), providing power quality metrics, EVs, and EV charging stations. To enable the forecasting of PV production, real-time data from the PV system is collected, along with historical weather data and geographical imaging from its precise location [39].

5.1.2. Data Interoperability

Regarding protocols, Message Queuing Telemetry Transport (MQTT) is by far the most widely used in this implementation, including for communicating real-time data from sensors and PMUs towards the low-voltage SCADA system of ASM. This data is shared in JavaScript Object Notation (JSON) or SQL format. The IoT smart meters communicate with the SCADA system (and alert the DSO in case of anomalies) via the 2G network. Information interoperability is handled by a “Homogenization Layer” based on FIWARE ORION Context Broker and smart data models, enabling the incorporation of standardization and quality checks for incoming data from disparate sources.
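To make the idea of homogenization with quality checks more tangible, the following sketch converts a raw JSON payload, as it might arrive over MQTT, into a uniform entity. The payload fields (`meter_id`, `timestamp`, `active_power_kw`), the plausibility bounds, and the output layout (loosely modelled on an entity with `id`, `type`, and typed attributes) are assumptions made for illustration; they do not reproduce the pilot's actual data model or the Context Broker API.

```python
import json
from datetime import datetime

# Hypothetical required fields of an incoming meter message.
REQUIRED_FIELDS = {"meter_id", "timestamp", "active_power_kw"}
# Hypothetical plausibility bounds for a low-voltage metering point, in kW.
POWER_BOUNDS_KW = (-1000.0, 1000.0)


def homogenize(raw: bytes, source: str) -> dict:
    """Validate a raw JSON message and map it to a uniform entity layout."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - set(msg)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    power = float(msg["active_power_kw"])
    low, high = POWER_BOUNDS_KW
    if not low <= power <= high:
        raise ValueError(f"implausible power value: {power}")

    # Raises ValueError if the timestamp is not valid ISO 8601.
    observed = datetime.fromisoformat(msg["timestamp"])

    return {
        "id": f"meter:{msg['meter_id']}",
        "type": "Device",
        "activePower": {"type": "Number", "value": power},
        "dateObserved": {"type": "DateTime", "value": observed.isoformat()},
        "source": {"type": "Text", "value": source},
    }


example = json.dumps(
    {"meter_id": "m-17", "timestamp": "2024-05-01T12:00:00+00:00", "active_power_kw": 3.2}
).encode()
print(homogenize(example, source="smart-meter"))
```

Rejecting malformed or implausible messages at this point keeps the downstream analytics tools independent of the quirks of individual sources.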

5.1.3. Functional Layer

A visual analysis tool based on Apache Superset is implemented for this UC, aligned with the “descriptive” type of data analytics within the Toolbox Sublayer. Predictive tools are implemented, specifically day-ahead forecasts of distributed generation and consumption within the district. They form the base for the prescriptive goal of determining optimal charging scheduling, which supports local grid balancing and storage management. Regarding the Marketplace Sublayer, a blockchain component is implemented to handle smart contracts. Metadata and Discovery are implemented through a Query Engine, enabling (authorized) access to UC data through a PostgreSQL database. Payment Facilitation and Transaction Tracking are handled by a Marketplace component developed especially for the BD4NRG ecosystem [39,45].
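The way a prescriptive tool can build on predictive results may be sketched as follows. The greedy heuristic below is not the method used in the pilot; it is a deliberately simple stand-in that places the required charging energy into the hours with the largest forecast surplus of generation over consumption. The function name, the hourly resolution, and the per-hour charging limit are assumptions.

```python
def schedule_charging(
    pv_forecast_kwh: list[float],
    load_forecast_kwh: list[float],
    energy_needed_kwh: float,
    max_per_hour_kwh: float,
) -> list[float]:
    """Greedy sketch: charge first in the hours with the highest forecast surplus."""
    if len(pv_forecast_kwh) != len(load_forecast_kwh):
        raise ValueError("forecasts must cover the same hours")

    # Forecast surplus of local generation over consumption for each hour.
    surplus = [pv - load for pv, load in zip(pv_forecast_kwh, load_forecast_kwh)]
    # Hour indices ordered from largest to smallest surplus.
    hours_by_surplus = sorted(range(len(surplus)), key=lambda h: surplus[h], reverse=True)

    plan = [0.0] * len(surplus)
    remaining = energy_needed_kwh
    for hour in hours_by_surplus:
        if remaining <= 0:
            break
        plan[hour] = min(max_per_hour_kwh, remaining)
        remaining -= plan[hour]

    if remaining > 1e-9:
        raise ValueError("charging demand cannot be met within the forecast horizon")
    return plan


# Hypothetical four-hour forecasts, in kWh per hour.
plan = schedule_charging(
    pv_forecast_kwh=[0.0, 2.0, 5.0, 4.0],
    load_forecast_kwh=[1.0, 1.0, 1.0, 1.0],
    energy_needed_kwh=10.0,
    max_per_hour_kwh=4.0,
)
print(plan)
```

A real scheduler would additionally have to respect grid constraints, vehicle availability, and storage state, which this sketch omits.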

5.1.4. Business Actors & Ecosystems

The main Business Roles in this UC are ASM Terni, as the DSO (using data collection and analysis to manage the grid), and the DER operators (responsible for EV charging stations). In the context of DS roles, the DER operators serve as Data Providers. In providing and using the analytics services, the DSO acts as a Data Consumer. Regulations apply according to Italian and European law.

5.1.5. Data Space Pillar

“Keyrock” is implemented as an identity management component. For Access and Usage Policy Brokerage, a Data Access Policy Broker is implemented and integrated with the Query Engine, providing Metadata and Discovery on the Marketplace. This UC did not implement a full DS, resulting in the other components not being relevant.
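The interplay between a policy broker and a discovery function can be illustrated with a minimal sketch. It assumes that identities have already been authenticated by an identity management component and that policies simply list the users permitted to read a dataset; the class, method, dataset, and user names are hypothetical and do not reflect the interfaces of the components used in the pilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    owner: str                 # identity of the data owner
    allowed_users: frozenset   # identities the owner has granted read access


class PolicyBroker:
    """Hypothetical broker: stores owner-defined policies, denies by default."""

    def __init__(self) -> None:
        self._policies: dict[str, Policy] = {}

    def set_policy(self, dataset: str, owner: str, allowed_users: set) -> None:
        self._policies[dataset] = Policy(owner, frozenset(allowed_users))

    def may_read(self, user: str, dataset: str) -> bool:
        policy = self._policies.get(dataset)
        if policy is None:
            return False  # no policy registered means no access
        return user == policy.owner or user in policy.allowed_users


def discover(broker: PolicyBroker, catalogue: set, user: str) -> list[str]:
    """Return only the catalogue entries the user is allowed to see."""
    return [name for name in sorted(catalogue) if broker.may_read(user, name)]


broker = PolicyBroker()
broker.set_policy("pv_production", owner="der_operator", allowed_users={"dso"})
broker.set_policy("ev_sessions", owner="der_operator", allowed_users=set())
catalogue = {"pv_production", "ev_sessions", "unregistered_dataset"}

print(discover(broker, catalogue, user="dso"))
```

Filtering at discovery time means that a participant never learns of assets outside their policy scope, while owners retain control over who is granted access.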

5.2. BD-4-DER

The objective of the UC “Predictive Analytics for DER/Prosumer Flexibility Potential Forecasting” within the BD-4-DER cluster is to create a BESS service. It leverages Condition-based Monitoring (CbM) to facilitate predictive maintenance in front-of-the-meter and behind-the-meter applications. To this end, data from various sensors is collected, including from the leading company’s (Enel X) own Catania X Lab Energy Storage and Lab Data Acquisition Systems [46]. The purpose of the resulting system is to enable DER Operators to monitor assets in real-time (RT), predict failures, extend asset life, and improve safety [41]. RT data analysis using AI algorithms detects anomalies, significant patterns, and trends. In total, 18 such patterns are recognized by the service. Furthermore, alerts, notifications, and reports are generated automatically upon detecting potential problems. Beyond monitoring and maintenance, the service includes BESS performance optimization involving adjusting charging/discharging parameters dynamically based on operating conditions and energy demand forecasts. While there is less emphasis on data and service exchange in this example, implementing this composition of data collection and analysis requires a complex software setup. Its structure is based on the BD4NRG RA. Further documentation can be found in the corresponding reports [34,41].

5.2.1. Data Sources

The data sources used comprise RT and historical data from batteries (voltage, current, temperature, and state of charge), as well as model parameters and settings for the Digital Twin (a controllable digital representation [41]) of the physical battery system, including profile data from the BESS itself, as well as the corresponding PV system.

5.2.2. Data Interoperability

The service is designed to manage various data streams, including batch data in Comma-separated Values (CSV) format, near real-time data via an Application Programming Interface (API) (in JSON format), and real-time data input to the MQTT broker from the BESS system. The storage systems’ profile data and PV power production profile data are collected as CSV/JSON via the Modbus RTU protocol. Furthermore, the service uses the gRPC Remote Procedure Call protocol with protobuf for efficient data deserialization to interface with external modules. A custom data model (described in [34]) is used to manage the collected information.

In terms of data processing, there are two ingestion/analysis modes: a batch analysis mode and a streaming analysis mode. In the batch analysis mode, users can send a set of input data for analysis. The service processes this data as a batch job and returns a single output upon completion. This mode is suitable for scenarios where there is a predefined dataset for comprehensive analysis. In the streaming analysis mode, users can continuously stream battery-related data to the service, which then provides a continuous stream of output results as it analyzes the incoming data. This mode is ideal for applications requiring timely insights, such as the real-time monitoring of battery performance.
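The difference between the two modes can be sketched in a few lines. The example below is not the service's actual interface: the function names, the monitored quantity (cell voltage), and the alert threshold are assumptions chosen for illustration. The batch function consumes a complete dataset and returns one result, whereas the streaming function yields one result per incoming measurement.

```python
from statistics import mean
from typing import Iterable, Iterator

# Hypothetical alert threshold for a cell voltage, in volts.
VOLTAGE_LIMIT = 4.2


def analyse_batch(voltages: list[float], limit: float = VOLTAGE_LIMIT) -> dict:
    """Batch mode: process the whole dataset, return a single summary."""
    return {
        "mean_voltage": mean(voltages),
        "samples_over_limit": sum(v > limit for v in voltages),
    }


def analyse_stream(voltages: Iterable[float], limit: float = VOLTAGE_LIMIT) -> Iterator[dict]:
    """Streaming mode: emit one result for every measurement as it arrives."""
    count = 0
    total = 0.0
    for voltage in voltages:
        count += 1
        total += voltage
        yield {"running_mean": total / count, "alert": voltage > limit}


print(analyse_batch([4.0, 4.1, 4.3]))
for result in analyse_stream(iter([4.0, 4.4])):
    print(result)
```

Because the streaming variant is a generator, it can be fed from any iterable source, such as a message queue consumer, without holding the full history in memory.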

5.2.3. Functional Layer

The primary analysis tool of this UC is of the “predictive” type. It leverages predictive analytics to foresee potential failures and critical events in BESSs by analyzing real-time sensor data, which is central to its predictive maintenance approach. The tool, therefore, relies on the previously mentioned service for batch and stream data processing. Regarding the Marketplace Sublayer, the tool is published as a Docker image and made available only to individual users by specific contracts.

5.2.4. Business Actors & Ecosystems

The owner and operator of the involved BESS, in this implementation Enel X, is the main actor in this system (Business Role “DER Operator”). They occupy the Service Provider DS role, making their tool available to individual users given special agreements. Any user signing such an agreement and gaining access then becomes a Service Consumer. As the system does not anticipate the use of data owned by separate entities, the Data Provider and Data Consumer roles do not apply here. As this UC is located in Italy as well, it is subject to regulations according to Italian and European law.

5.2.5. Data Space Pillar

Cooperation agreements with Enel X are necessary to access any of the aforementioned tools and services. Having an individual agreement with every user ensures Trusted Exchange (at the cost of openness) and, accordingly, requires an identity management concept to allow certain users to access the relevant Docker Images.

5.3. BD-4-ENEF

The UC with the title “Predictive Analytics for Energy Efficiency Investments de-risking” is part of the cluster case BD-4-ENEF, and it aims to use big data to predict the value of energy efficiency investments in Latvia. It bases its analysis on past projects and evaluation methodologies, using data from smart meters connected to energy systems in buildings (heating, cooling, lighting, and ventilation). The results enable investors to make informed decisions and, in turn, improve the flow of funding towards projects that will have the most impact on building efficiency [42]. The Latvian Environmental Investment Fund (LEIF), owned by the Latvian government, supervised this activity and conducted several studies on its effectiveness, including in combination with and comparison to other Latvian initiatives [47,48,49,50].

Its full documentation can be found in the corresponding reports [36,42], and the description of its relation to the RA follows:

5.3.1. Data Sources

The UC participants use custom scripts to preprocess, integrate, and store information related to energy consumption and CO₂ emissions, as well as data about the involved buildings (type, construction year, heating area, etc.). The modular development gives room for future IoT integration to gather real-time data and improve the decision-making process.

5.3.2. Data Interoperability

Historical data is imported in .xlsx format and represented in a custom data model. Custom scripts were used to adapt, enrich, and store data in a relational database that was connected to the Integrated Query Engine.

5.3.3. Functional Layer

Two main analytics tools are implemented. The first one, classified as declarative and predictive, categorizes energy efficiency investments using a pool of historical data to train machine learning models. The second tool focuses on the calculation of the actual energy savings of a renovation action in buildings. Descriptive visual analytics can be incorporated to make their results more accessible. Regarding the Marketplace Sublayer, Metadata and Discovery are handled by the same Query Engine used in the example described in Section 5.1. It enables users and services to search and find the aforementioned data in .xlsx format.
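The kind of calculation performed by the second tool can be indicated with a simplified sketch. It is not the methodology used in the UC: it only compares metered consumption before and after a renovation and relates the result to the savings that were expected, without any normalization (e.g., for weather or occupancy). All names and figures are hypothetical.

```python
def realised_savings(
    baseline_kwh: float,
    post_renovation_kwh: float,
    expected_savings_kwh: float,
) -> dict:
    """Compare actual and expected savings of a renovation (simplified sketch)."""
    if baseline_kwh <= 0:
        raise ValueError("baseline consumption must be positive")
    if expected_savings_kwh <= 0:
        raise ValueError("expected savings must be positive")

    saved = baseline_kwh - post_renovation_kwh
    return {
        "saved_kwh": saved,
        # Share of the original consumption that was saved.
        "relative_saving": saved / baseline_kwh,
        # Values below 1 mean the project saved less than planned.
        "realisation_ratio": saved / expected_savings_kwh,
    }


# Hypothetical annual figures for one building.
print(realised_savings(baseline_kwh=100_000.0, post_renovation_kwh=75_000.0,
                       expected_savings_kwh=30_000.0))
```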

5.3.4. Business Actors & Ecosystems

Data Providers, such as building owners and real estate market participants, can mitigate their risks, while Energy Service Companies (ESCOs) can become Data and Service Consumers, using the tool to structure performance contracts with the building owners. Similarly, investors can de-risk their decisions based on the evidence-based predictions obtainable from the services. Consultants and Advisors can provide expertise in using the tool to all the actors.

5.3.5. Data Space Pillar

The access control and policies are defined by the owners of the data sources, using a dedicated Identity and Access Management component. Given its integration with the Query Engine, this ensures the required governance, trust, and sovereignty.

5.4. Applicability and Implications

The three given examples show how the BD4NRG RA can support vastly different use cases, providing a robust structure for a variety of challenges. The presented cases include supporting a DSO in managing its grid effectively, enabling DER Operators’ improved asset management for increased safety and reliability, and employing historical data to encourage well-founded decisions for investors supporting building efficiency.
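How differently the three examples instantiate the same structure can be summarized in a short sketch based on the DS roles reported in Sections 5.1.4, 5.2.4, and 5.3.4. The encoding as Python sets and the function name are illustrative choices, and the role set contains only the roles named in those subsections rather than a complete list.

```python
# DS roles named in Sections 5.1.4, 5.2.4, and 5.3.4 (not claimed to be complete).
DS_ROLES = frozenset(
    {"Data Provider", "Data Consumer", "Service Provider", "Service Consumer"}
)

# Roles reported as occupied in each exemplary implementation.
ROLES_USED = {
    "BD-4-NET": {"Data Provider", "Data Consumer"},
    "BD-4-DER": {"Service Provider", "Service Consumer"},
    "BD-4-ENEF": {"Data Provider", "Data Consumer", "Service Consumer"},
}


def unused_roles(cluster: str) -> list[str]:
    """List the DS roles that an implementation does not occupy."""
    used = ROLES_USED[cluster]
    unknown = used - DS_ROLES
    if unknown:
        raise ValueError(f"roles not defined in the RA sketch: {sorted(unknown)}")
    return sorted(DS_ROLES - used)


for cluster in ROLES_USED:
    print(cluster, "does not use:", unused_roles(cluster))
```

Each implementation remains a valid instance even though it leaves some roles unoccupied, which reflects the point that a system does not need to use every component represented in the RA.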

As mentioned at the beginning of Section 5, these examples are not exhaustive: more than a dozen use cases in total were developed and tested based on the presented architecture [3,41,42]. Other applications include sharing data to support rule-based fault detection for overhead power lines in Portugal [51] and enabling the Slovenian TSO ELES to modernize outage planning [52]. These large-scale instances of BD4NRG show how energy stakeholders can be directly supported by a structured approach matched to their goals.

In addition, an RA provides opportunities from a policy perspective. As outlined in Section 2.3, the standardized creation and establishment of Data Spaces in the European energy domain is an ongoing topic, with CEEDS in active development. BD4NRG is aligned with CEEDS principles, allowing for the implementation of a Data Space without mandating it.

BD4NRG further connects these DS principles with the concept of a common marketplace, which could in theory be used, e.g., to create flexibility markets. It provides a single energy-specific structure for employing cooperation- and service-level agreements and smart contracts. By combining DS components for secure exchange and governance with user functionalities such as usage accounting and payment facilitation in one RA, BD4NRG opens up significant opportunities for public stakeholders.

6. Conclusions

In this paper, we presented a new Big Data Reference Architecture for the secure exchange of data and services in the energy domain. It was developed through a thorough analysis of existing concepts in the field and the consideration of a wide variety of use cases.

Requirements were derived from real-life use cases in the energy domain, and these were shown to be met by the final architecture. Furthermore, the analysis of non-functional indicators showed how the RA balances adherence to existing, domain-agnostic architectures with the needs of these energy-specific use cases.

By supporting the development of software architectures through accessible documentation and clear Supplementary Information, the BD4NRG RA is adaptable to a wide range of situations. This was demonstrated by the inclusion of three implementation examples of different types that approached the structure through differing individual needs, resulting in three distinct software concepts based on and enabled by the same architecture.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/su17146488/s1: BD4NRG Deliverable 2.3; BD4NRG Deliverable 2.5; BD4NRG Deliverable 2.6; BD4NRG Deliverable 3.2; BD4NRG Deliverable 5.4; BD4NRG Deliverable 7.1; BD4NRG Deliverable 7.4; BD4NRG Deliverable 8.1; BD4NRG Deliverable 8.4; BD4NRG Deliverable 9.1; BD4NRG Deliverable 9.4.

Author Contributions

Conceptualization, K.W. and A.P.; methodology, K.W. and A.P.; validation, K.W., A.P., and L.C.R.; formal analysis, K.W., A.P., and L.C.R.; investigation, K.W., A.P., and L.C.R.; resources, A.M.; data curation, K.W. and L.C.R.; writing—original draft preparation, K.W.; writing—review and editing, K.W., A.P., and L.C.R.; visualization, K.W. and L.C.R.; supervision, A.M.; project administration, K.W.; funding acquisition, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

The work of the authors was supported financially by the BD4NRG Project. The BD4NRG Project has received funding from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under Grant Agreement No. 872613.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

Data supporting the results can be found in the publicly available reports at https://bd4nrg.eu (accessed on 15 June 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Couto, J.; Monti, A.; Kotsalos, K.; Valino, J.; Kukk, K. BRIDGE, Cooperation Between Horizon 2020 and Horizon Europe Projects in the Fields of Smart Grid, Energy Storage, Islands, and Digitalisation–2023 Brochure; Publications Office of the European Union: Luxembourg, 2023.
2. BD4NRG. BD4NRG Project Site. 2022. Available online: https://www.bd4nrg.eu/pilots-applications (accessed on 26 October 2023).
3. Bucarelli, M.A.; Santori, F.; Bragatto, T.B.; Kerin, U.; Bečan, M.; Kozjek, D.K.; Francesco, B.; Mancinelli, E.; Gubina, A.; Medved, T.; et al. BD4NRG Deliverable 7.1; Technical Report; BD4NRG: Rome, Italy, 2021.
4. Bhattarai, B.P.; Paudyal, S.; Luo, Y.; Mohanpurkar, M.; Cheung, K.; Tonkoski, R.; Hovsapian, R.; Myers, K.S.; Zhang, R.; Zhao, P.; et al. Big data analytics in smart grids: State-of-the-art, challenges, opportunities, and future directions. IET Smart Grid 2019, 2, 141–154.
5. Gokhale, G.; Van Gompel, J.; Claessens, B.; Develder, C. Transfer Learning in Transformer-Based Demand Forecasting For Home Energy Management System. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, New York, NY, USA, 15–16 November 2023; BuildSys ’23. pp. 458–462.
6. Hu, J.; Vasilakos, A.V. Energy Big Data Analytics and Security: Challenges and Opportunities. IEEE Trans. Smart Grid 2016, 7, 2423–2436.
7. Cloutier, R.; Muller, G.; Verma, D.; Nilchiani, R.; Hole, E.; Bone, M. The concept of reference architectures. Syst. Eng. 2010, 13, 14–27.
8. Siriweera, A.; Paik, I. AutoBDA: Model-Driven Reference Architecture for Automated Big Data Analysis Framework. IEEE Trans. Serv. Comput. 2025, 18, 1293–1307.
9. Ataei, P.; Litchfield, A.T. Big data reference architectures, a systematic literature review. In Proceedings of the ACIS 2020, Wellington, New Zealand, 1–4 December 2020.
10. Gottschalk, M.; Uslar, M.; Delfs, C. The smart grid architecture model–SGAM. In The Use Case and Smart Grid Architecture Model Approach: The IEC 62559-2 Use Case Template and the SGAM Applied in Various Domains; Springer Briefs in Energy: Berlin, Germany, 2017; pp. 41–61.
11. Wilker, S.; Meisel, M.; Piatkowska, E.; Sauter, T.; Jung, O. Smart Grid Reference Architecture, an Approach on a Secure and Model-Driven Implementation. In Proceedings of the 2018 IEEE 27th International Symposium on Industrial Electronics (ISIE), Cairns, Australia, 13–15 June 2018; pp. 74–79.
12. Wehrmeister, K.; Pastor, A.; Carreras, L.; Dähling, S.; Mammina, M.; Rossi, A.; Profeta, D.; Bothos, E.; Magoutas, B.; Karakolis, V.; et al. BD4NRG Deliverable 2.5; Technical Report; BD4NRG: Rome, Italy, 2021.
13. Otto, B.; Steinbuß, S.; Teuscher, A.; Lohmann, S. IDSA Reference Architecture Model Version 3.0; Technical Report; International Data Spaces Association: Dortmund, Germany, 2019.
14. Wehrmeister, K.A.; Bothos, E.; Marinakis, V.; Magoutas, B.; Pastor, A.; Carreras, L.; Monti, A. The BD4NRG Reference Architecture for Big Data Driven Energy Applications. In Proceedings of the 2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA), Corfu, Greece, 18–20 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–8.
15. Wehrmeister, K.; Pastor, A.; Carreras, L.; Mammina, M.; Herrmann, E.; Medela, A.; Abella, A.; vd Berg, W.; Dimitropoulos, N.; Karakolis, V.; et al. BD4NRG Deliverable 2.6; Technical Report; BD4NRG: Rome, Italy, 2022.
16. Nakagawa, E.Y.; Oliveira Antonino, P.; Becker, M. Reference Architecture and Product Line Architecture: A Subtle But Critical Difference. In Proceedings of the Software Architecture; Crnkovic, I., Gruhn, V., Book, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 207–211.
17. Galster, M.; Avgeriou, P. Empirically-Grounded Reference Architectures: A Proposal. In Proceedings of the Joint ACM SIGSOFT Conference–QoSA and ACM SIGSOFT Symposium–ISARCS on Quality of Software Architectures–QoSA and Architecting Critical Systems–ISARCS, New York, NY, USA, 20 June 2011; QoSA-ISARCS ’11. pp. 153–158.
18. Alexopoulos, K.; Bakopoulos, E.; Larrinaga Barrenechea, F.; Castellvi, S.; Firouzi, F.; Luca, G.d.; Maló, P.; Marguglio, A.; Meléndez, F.; Meyer, T.; et al. Bridging the Gap Between IDS and Industry 4.0-Lessons Learned and Recommendations for the Future; Technical Report; IDSA: Dortmund, Germany, 2024.
19. De Mauro, A.; Greco, M.; Grimaldi, M. What is big data? A consensual definition and a review of key research topics. In Proceedings of the AIP Conference Proceedings; American Institute of Physics: College Park, MD, USA, 2015; Volume 1644, pp. 97–104.
20. Definitions-JRC Data Spaces Knowledge Base-EC Public Wiki. Available online: https://wikis.ec.europa.eu/spaces/jrcdataspaceswiki/pages/57443811/1.3%2BDefinitions?utm_source=chatgpt.com (accessed on 10 March 2025).
21. ISO/IEC 20547-3; Information Technology—Big Data Reference Architecture—Part 3: Reference Architecture. International Organization for Standardization: Geneva, Switzerland, 2020. Available online: https://www.iso.org/standard/71277.html (accessed on 20 June 2025).
22. Chang, W.; Boyd, D.; Levin, O. NIST Big Data Interoperability Framework: Volume 6, Reference Architecture; Special Publication (NIST SP) 1500-6r2; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2019.
23. Marz, N.; Warren, J. Big Data: Principles and Best Practices of Scalable Real-Time Data Systems; Manning Publications: Shelter Island, NY, USA, 2015.
24. DIN. DIN SPEC 91345. 2016. Available online: https://www.dinmedia.de/de/technische-regel/din-spec-91345/250940128 (accessed on 11 July 2025).
25. Lin, S.W.; Murphy, B.; Clauer, E.; Loewen, U.; Neubert, R.; Bachmann, G.; Pai, M.; Henkel, M. Architecture Alignment and Interoperability; Technical Report; Plattform Industrie 4.0: Dortmund, Germany, 2017.
26. Dehghani, Z. Data Mesh: Delivering Data-Driven Value at Scale; O’Reilly Media: Sebastopol, CA, USA, 2022.
27. Cloud Native Computing Foundation. CNCF Cloud Native Landscape, 2025. Available online: https://landscape.cncf.io/ (accessed on 21 March 2025).
28. GAIA-X European Association for Data and Cloud. Gaia-X Architecture Document. 2023. Available online: https://docs.gaia-x.eu/technical-committee/architecture-document/23.10/ (accessed on 28 December 2023).
29. Lambert, E.; Boultadakis, G.; Kukk, K.; Kotsalos, K.; Bilidis, N. European Energy Data Exchange Reference Architecture. 2021. Available online: https://energy.ec.europa.eu/publications/bridge-reports_en (accessed on 13 July 2025).
30. Kukk, K.; Kotsalos, K. European (energy) Data Exchange Reference Architecture 2.0–Data Management Working Group–June 2022; Publications Office of the European Union: Luxembourg, 2023.
31. Otto, B. The evolution of data spaces. In Designing Data Spaces: The Ecosystem Approach to Competitive Advantage; Springer International Publishing: Cham, Switzerland, 2022; pp. 3–15.
32. Dognini, A.; Monti, A.; Kung, A.; Medela, A.; Joglekar, C.; Schaffer, C.; Stampatori, D.; Jimenez, D.; Maqueda, E.; Coelho, F.; et al. Blueprint of the Common European Energy Data Space; Technical Report; Entec: Dortmund, Germany, 2024.
33. International Organization for Standardization (ISO); International Electrotechnical Commission (IEC); Institute of Electrical and Electronics Engineers (IEEE). Enterprise, Systems and Software—Reference Architectures, 2025. Committee Draft P42042/CD1, Unapproved Draft. Available online: https://www.iso.org/standard/87310.html (accessed on 13 July 2025).
34. Scavo, F.B.; Castro, G.; De Benedetti, M.; Lanuzza, L. BD4NRG Deliverable 8.4; Technical Report; BD4NRG: Rome, Italy, 2023.
35. Bothos, E.; Magoutas, B.; Abella, A.; Wehrmeister, K.; Dähling, S.; Carreras, L.; Pastor, A.; Buyuk, A.; Gazioglu, I.; vd Berg, W. BD4NRG Deliverable 2.3; Technical Report; BD4NRG: Rome, Italy, 2021.
36. Zucika, A.; Rodionovs, R.; Sarmas, E. BD4NRG Deliverable 9.4-LSP 12 Pilot Documentation; Technical Report; BD4NRG: Rome, Italy, 2023.
37. Kruchten, P. The 4 + 1 View Model of architecture. IEEE Softw. 1995, 12, 42–50.
38. Kapetanios, A.; Bilidis, N.; Rossi, A.; Ropolo, A.; Marinakis, V.; Karakolis, V.; Dimitropoulos, N.; Medela, A.; Malo, P.; Di’Orio, G.; et al. BD4NRG Deliverable 3.2; Technical Report; BD4NRG: Rome, Italy, 2022.
39. Bucarelli, M.A.; Ghoreishi, M.; Santori, F.; Natalini, A.; Arnone, D.; Mammina, M.M.; Sarmas, E.; Bellesini, F.; Smolnikar, M.; Craciunescu, V.; et al. BD4NRG Deliverable 7.4; Technical Report; BD4NRG: Rome, Italy, 2023.
40. ISO/IEC/IEEE. Systems and Software Engineering–Architecture Description. ISO/IEC/IEEE 42010:2011(E) (Revision of ISO/IEC 42010:2007 and IEEE Std 1471-2000) 2011, pp. 1–46. Available online: https://ieeexplore.ieee.org/document/6129467 (accessed on 13 July 2025).
41. Georgiadou, V.; Hofbauer, E.; Sarmas, E.; Marinakis, V.; Castro, G.; Campos, J.; Heylen, E.; Meier, D.; Krisper, U.; Kordes, A. BD4NRG Deliverable 8.1; Technical Report; BD4NRG: Rome, Italy, 2021.
42. González, V.; Palencia, S.; Hernández Moral, G.; Lorenzo, M.; Tribino, J.; Sanz, J.; Zucika, A.; Karklins, G.; Rodionovs, R.; Oliver, M.; et al. BD4NRG Deliverable 9.1; Technical Report; BD4NRG: Rome, Italy, 2021.
43. Sarmas, E.; Strompolas, S.; Marinakis, V.; Santori, F.; Bucarelli, M.A.; Doukas, H. An Incremental Learning Framework for Photovoltaic Production and Load Forecasting in Energy Microgrids. Electronics 2022, 11, 3962.
44. Bucarelli, M.A.; Santori, F.; Sarmas, E.; Cipolla, S.; Marinakis, V.; Natalini, A.; Mammina, M. Application of Big Data Analytics in the Electrical Sector: A Real Case Study. In Proceedings of the 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023; pp. 1–6.
45. Ruiz, B.; Medela, A.; Iuhasz, G.; Teleaga, D.; Sarno, C.; Rossi, A.; Mammina, M.; D’Auria, A.; de Graaf, E.; Pastor, A. BD4NRG Deliverable 5.4; Technical Report; BD4NRG: Rome, Italy, 2023.
46. Noce, C.; Lanuzza, L.; De Benedetti, M.M. Electrification technologies and grid services testing inside Enel X labs. In Proceedings of the 27th International Conference on Electricity Distribution (CIRED 2023), Rome, Italy, 12–15 June 2023; IET: Rome, Italy, 2023; Volume 2023, pp. 266–270.
47. Sarmas, E.; Marinakis, V.; Doukas, H. A data-driven multicriteria decision making tool for assessing investments in energy efficiency. Oper. Res. 2022, 22, 5597–5616.
48. Sarmas, E.; Spiliotis, E.; Marinakis, V.; Koutselis, T.; Doukas, H. A meta-learning classification model for supporting decisions on energy efficiency investments. Energy Build. 2022, 258, 111836.
49. Sarmas, E.; Forouli, A.; Marinakis, V.; Doukas, H. Baseline energy modeling for improved measurement and verification through the use of ensemble artificial intelligence models. Inf. Sci. 2024, 654, 119879.
50. Sarmas, E.; Kleideri, M.; Zučika, A.; Marinakis, V.; Doukas, H. Improving energy performance of buildings: Dataset of implemented energy efficiency renovation projects in Latvia. Data Brief 2023, 48, 109225.
51. Brito Palma, L. Hybrid Approach for Detection and Diagnosis of Short-Circuit Faults in Power Transmission Lines. Energies 2024, 17, 2169.
52. Zupančič, J.; Medved, T.; Gubina, A.F.; Antončič, M.; Bečan, M.; Kerin, U. Cross-functional Integration of Grid Operation with Predictive Asset Management. In Proceedings of the 2022 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Novi Sad, Serbia, 10–12 October 2022; pp. 1–5.

Figure 1. The BD4NRG big data Reference Architecture [15].

Figure 2. Mapping of three selected use cases to the BD4NRG Reference Architecture.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
