ICT Scalability and Replicability Analysis for Smart Grids: Methodology and Application

Abstract: The essential role of Information and Communication Technologies (ICT) in modern electricity grids makes it necessary to consider them when evaluating the scalability and replicability capabilities of smart grid systems. This paper proposes a novel step-by-step methodology to quantitatively perform an ICT scalability and replicability analysis (SRA) in a smart grid context. The methodology is validated and exemplified by applying it to two real case studies that are demonstrated in the EU-funded RESPONSE project and comprise solutions relying on different communication technologies. The results of the proposed methodology are summarised through ICT scalability and replicability maps, which are introduced in this paper as a quick way of obtaining an overview of the scalability and replicability capabilities of an ICT system and as an efficient way of estimating the feasibility of scenarios not covered in the SRA.


Introduction
A smart grid is a digitalised electricity grid that uses Information and Communications Technologies (ICT) to monitor and control devices in order to improve the Quality of Service (QoS) and performance of the grid. This includes the ability to remotely and efficiently manage real-time events, measurements, and failures.
The importance of ICT for smart grids requires consideration of both the power system requirements and the interoperability requirements when creating a smart grid use case. To identify any standardisation gaps, the Smart Grid Architecture Model (SGAM) was created [1]. This tool enables the display of the various stakeholders of the smart grid (domains), their hierarchical management system levels (zones), and the interoperability layers.
The rapid digitisation of electricity grids to meet the challenge of decarbonising the electricity sector necessitates not only the recognition of standardisation gaps while implementing smart grid systems but also the evaluation of their scalability and replicability. The purpose of a Scalability and Replicability Analysis (SRA) is to detect any potential impediments and limitations that could stop the solution from being just a one-off local demonstration [2].
The scalability and replicability of smart grid projects are influenced by technical, economic, regulatory, and stakeholder acceptance factors [2,3]. Technical SRAs of smart grid use cases in various European projects have focused on the impact on the power system, using a consistent methodology that calculates indicators such as the decrease in network losses or the hosting capacity [4]. However, the ICT infrastructure employed is also a major factor in the scalability of smart grids [5], as it may impose more stringent constraints on the scalability and replicability of the smart grid use case than the power grid. For instance, Ref. [4] showed that the reliability of Medium Voltage (MV) grids, regardless of the topology, does not significantly increase for automation degrees higher than 20-30%; yet, the ICT used may not match such scalability levels or may depend on factors such as the topology or area to cover. Therefore, to gain a complete understanding of the technical scalability and replicability of a smart grid use case, analysing the ICT part is essential, as it reduces the risk of having to upgrade the infrastructure in the near future. Combining the results from both types of SRAs would provide very useful information for the technical scalability and replicability of the use case.
Despite the fact that scalability and replicability concepts have already been applied to ICT in other fields, mainly to computer applications and operating systems [6], there are no clear guidelines for their application in a smart grid context. In some cases, the relationship between the performance indicators and the unique requirements or constraints of the use case is not clear. This lack of clarity leads to non-homogeneous analyses, which, in turn, affects the conclusions drawn, as shown in [7], where the maximum Distributed Energy Resources (DER) penetration in a system varied depending on which ICT performance metric was analysed (latency or packet loss rate).
Additionally, the ICT SRA approach proposed in [8,9] was based on the qualitative evaluation of the different ICT attributes (reliability, computational resources, and manageability) of components and communication links by stakeholders. The main issue with this method is its reliance on stakeholders' assessment; the level of cooperation must be very high and stakeholders must decide the importance and effect of each attribute for the analysis, which can lead to biased input, particularly from ICT providers, when the number of stakeholders is small. Based on these critical links provided by the qualitative analysis, a quantitative analysis was performed in [9]. However, the analysis did not consider the specific requirements or constraints of the use case to be used as a basis for the evaluation of the performance indicators considered.
The BRIDGE initiative at the European level provides high-level instructions, based on the SGAM, to perform an SRA regardless of the layer/dimension considered [10]. Nevertheless, Ref. [10] pointed out that more precise instructions and techniques can be created for each layer or kind of technology.
Also within the BRIDGE initiative, Ref. [11] proposed an SRA methodology for smart grid projects. This methodology involves the identification of Key Exploitable Results (KERs) for each SGAM layer, with the aim of evaluating scalability and replicability as two overall indexes for each KER. In terms of ICT, Ref. [11] suggested assessing the use of open technology, standards, and communication protocols, as well as the interoperability of the systems, to determine if they can be replicated. To evaluate scalability, it was proposed to determine whether additional resources based on open standards would be necessary to expand the system. However, it is not clear how to carry out this mainly qualitative assessment and how to calculate and interpret the proposed scalability and replicability indices for ICT systems in smart grids. In addition, such an analysis would lack technical insights that mostly need a quantitative analysis, such as how many components the deployed ICT infrastructure could support or what conditions could affect the performance of the use case.
When scalability was not the objective of the analysis but was considered just another metric that characterises an ICT system [12-16], the assessment method was not specified or was qualitatively defined in general terms.
Conducting an ICT SRA can be challenging due to the absence of a well-defined approach and the wide range of factors to consider. This makes it difficult to ensure that the analysis yields the most useful insights about the scalability and replicability of the ICT involved in a solution.
This article aims to provide a common methodological basis for quantitative ICT SRAs so that the outcomes of such studies can be as beneficial as possible. For this, the concept of an ICT SRA map is introduced as a way of summarising SRA results, constituting a tool to determine the potential scalability and replicability of smart grid ICT systems so that each future implementation does not have to reinvent the wheel. This methodology is also validated and exemplified by applying it to two real case studies (one using wired technology and another one using wireless technology) from the EU-funded RESPONSE project.
Thus, the contributions of this paper are as follows:

• A standalone step-by-step methodology to quantitatively analyse the scalability and replicability of ICT systems involved in smart grid solutions. The methodology makes use of the SGAM to identify the most critical part of the system to be analysed. It allows for the establishment of a clear relationship between the requirements and the performance indicators used and does not depend on the type of technology (wired or wireless) or the quantitative approach followed (simulations or experiments). This methodology covers the existing gap in guidelines on how to carry out a quantitative SRA focused on the ICT of smart grid solutions.
• The novel introduction of ICT scalability and replicability maps as the outcome of ICT SRAs. To the best of the authors' knowledge, these maps have not been introduced before. The use of these maps allows for a quick overview of the scalability and replicability of a system in different scenarios and could be used to estimate the feasibility of non-analysed scenarios.
• The validation and application of the proposed methodology to two real case studies involving different technologies and smart grid use cases. In both cases, the methodology is applied step by step and the usefulness of ICT SRA maps is exemplified.
This paper is structured as follows. First, the different scalability concepts are introduced in Section 2. Then, Section 3 describes the methodology developed to analyse the scalability and replicability of ICT systems in smart grids. This is followed by the application of the methodology to two case studies in Section 4. Finally, in Section 5, the main conclusions are drawn and ideas for future work are discussed.

Scalability
The discussion about whether scalability and performance evaluations of ICT systems are two different things remains open. Hennessy et al. [17] considered them to be two different types of research, considering scalability analysis to be more relevant. In some cases, scalability is considered just a characteristic of the system [12,13] or a qualitative requirement [14-16].
We agree with Bondi et al. [6] and Jogalekar et al. [18] that scalability and performance are closely related. Ultimately, a scalability analysis is a performance assessment of a larger version of the system. However, the reverse is not always the case. It is possible to conduct a performance evaluation without gaining any understanding of scalability, for example, when the only goal is to tune the system to certain operational conditions.
Two general dimensions for the scalability of smart grid solutions are differentiated [4]: scalability in size, when the system covers a larger area, and scalability in density, when the number of elements involved varies.
For ICT systems, different types of scalability can be distinguished, based on the categories provided by Bondi et al. [6] for operating systems and local area networks:

• Load scalability: Whether the system works well with both light and heavy workloads. A high workload can be due to an increase in the number of elements interacting (size) or an increase in the number of interactions or information exchanged between the elements (density).
• Space scalability: Whether memory limits are exceeded when increasing the number of elements in the system (scalability in size).
• Space-time scalability: Whether the system works well while significantly increasing the number of elements (scalability in size).
• Distance scalability: Whether the system works well with short and long distances. This is related to the scalability in density since the number of elements does not vary.
• Speed/distance scalability: Whether the system works well with short and long distances, regardless of the speed required (scalability in density).
• Structural scalability: Whether the standards implemented constrain the system. This is a type of qualitative analysis based on the technical specifications of communication protocols and technologies.
Figure 1 shows the different types of scalability, providing a complete picture of the scalability of ICT.

[Figure 1: diagram of the types of ICT scalability, with the labels Density, Structural, Space/memory, Space-time, Load, Distance, and Speed/distance.]

Description of Quantitative ICT SRA Methodology
The methodology developed to perform a quantitative ICT SRA is summarised in Figure 2 and described below. It consists of up to seven steps that range from the characterisation of the ICT system and the definition of the scope of the analysis to the visualisation of the ICT SRA results through scalability and replicability maps.

Map the ICT System into the SGAM
The first step in the SRA is to obtain information about the implemented ICT, the topology, and the functioning of the system. This information can be mapped into the SGAM, which presents zones (process, field, station, operation, enterprise, and market), domains (generation, transmission, distribution, DER, and customer premises), and interoperability layers (component, communication, information, function, and business). A preliminary map that includes the component and communication layers would be sufficient for this step of the methodology to determine the scope and characteristics of the SRA. Obtaining the whole map is a complex task that requires time and is only useful if the scope of the SRA has already been set.

Scalability Questions and System Characteristics
Based on the SGAM, some initial scalability questions can be asked in order to determine how a scaled-up version of the system would look. This can be done by observing the domains and zones involved. To characterise the system from an ICT perspective, the focus would be on the interoperability layers.

Scalability Questions
Scalability questions are a set of simple initial questions to answer during the characterisation of the ICT system under study to determine the scope of the ICT SRA. If all the scalability questions can be answered without testing or performing simulations, a quantitative SRA is not necessary.
To formulate these questions, the value chain of the electricity system should be considered. In general, domains grow larger as they get closer to electricity customers. That is to say, electricity consumers are in the order of millions, DER may be in the order of thousands/millions, distribution grids must provide services to both consumers and DER (i.e., distribution customers), and transmission grids connect bulk generation with distribution grids.
From a scalability point of view, scaling up in one domain may affect the domain immediately above it. An example is the deployment of smart meters by Distribution System Operators (DSOs), which, in many countries, are in charge of this process. Millions of smart meters have been deployed at the customer level in many countries, but the DSOs are the ones providing the means to monitor them. Another example would be the implementation of a Transmission System Operator (TSO)-DSO coordination scheme managed by the TSO in countries where there is only one TSO and hundreds or thousands of DSOs: the TSO would have to provide the necessary scalability to replicate such a coordination scheme with each DSO.
Zones within a domain also have this characteristic. From process to market, the number of components is expected to decrease. In the smart metering example, data collectors had to be deployed at the secondary substation level (field zone), which used the router deployed at primary substations (station level) to send the data to the central system (operation level). Therefore, in each zone, the ICT scalability is supported by the component that provides the connection in an upper zone.
This potential influence of scaling up components in SGAM domains and zones is illustrated in Figure 3.
Scalability questions can then be formulated, taking into account these zone and domain aspects. Two general examples would be:

• Will the communications between the station and field zones work properly if the number of field devices increases?
• Will the TSO operation system be able to cope with an increment in the amount of data exchanged with the DSO operation system?

Characterise the ICT System
As indicated in the first step, to formulate the scalability questions, it is necessary to have a description of the component layer of the system in the SGAM. As the ICT system is characterised for each SGAM layer, some of the scalability questions can be re-evaluated and even discarded.
The component layer provides two pieces of information. The first is the topology of the ICT system implemented, which is essential to know the communication links, potential information flows, and in which zones they are placed. This is relevant for scalability in size analysis.
The second is the technical characteristics of the devices, which, even with missing information, can provide an idea of the type of ICT implemented (i.e., wired or wireless) and the capacity of the devices. Depending on the amount of information available, this can be relevant for all types of scalability analysis.
The communication layer is built on top of the component layer, providing essential information for conducting a quantitative ICT SRA, regardless of the type of scalability considered. This layer indicates the communication technology (physical layer) used by each link in the component layer and the communication protocol implemented. ICT systems can be wired, wireless or hybrid, which determines the different key performance indicators used during the analysis.
The communication protocol determines how the components exchange information and may be key to the scalability and replicability of the system. If the protocol is proprietary, the replicability of the system can be affected, and if the specification is not freely accessible, it can be a huge obstacle to performing a quantitative SRA, which can potentially exclude these links from the scope of the analysis.
The information layer refers to the data models and information exchanged between components through the communication links. This determines the amount of information that needs to be taken into account in the analysis, as well as any overhead that may be added by the communication protocol used. This information is essential for performing a quantitative scalability analysis (in density and size) since it affects potential requirements such as latency and can be related to the existence of bottlenecks.
Finally, the function and business layers are related to the services provided by the system. These provide the frequency of data exchange, which is essential for determining whether the quantitative SRA is necessary, as it relates to the scalability in size and density. For example, some functions may require exchanging information once per day (e.g., daily market results), whereas others may need further resolution (e.g., monitoring of resources). The higher the frequency of exchange, the higher the probability of communication bottlenecks when scaling up the system to reasonable levels. In all likelihood, a once-per-day, non-essential exchange would need to scale up to disproportionate levels before experiencing information bottlenecks.
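As a rough illustration of why the exchange frequency matters, the following sketch estimates the average load that periodic messages place on a shared link as the number of devices grows. The function name and all figures (payload size, protocol overhead, number of devices) are illustrative assumptions and are not taken from any use case in this paper.

```python
def offered_load_bps(n_devices: int, payload_bytes: int, overhead_bytes: int,
                     exchanges_per_day: float) -> float:
    """Average offered load in bit/s for n_devices sending periodic messages."""
    bits_per_message = 8 * (payload_bytes + overhead_bytes)
    messages_per_second = exchanges_per_day / 86_400  # seconds in one day
    return n_devices * bits_per_message * messages_per_second


# Hypothetical figures: 10,000 devices, 200-byte payload, 60-byte protocol overhead.
daily = offered_load_bps(10_000, 200, 60, exchanges_per_day=1)
per_second = offered_load_bps(10_000, 200, 60, exchanges_per_day=86_400)

print(f"once per day:    {daily:.1f} bit/s")
print(f"once per second: {per_second / 1e6:.1f} Mbit/s")
```

Under these assumed values, the same population of devices generates a negligible average load when reporting once per day and a load of tens of Mbit/s when reporting every second, which is consistent with the observation that high-frequency exchanges are the ones most likely to require a quantitative SRA. Average load ignores bursts, so it is only a first screening figure.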

Minimum Requirements and Technical Constraints
Once the characteristics of the system have been obtained for all the layers of the SGAM, the functional requirements and technical constraints must be examined. These are typically provided by the function/business layers, which specify the frequency of data exchange (i.e., the first requirement for the system); the component layer, which determines whether the technical specifications of the devices and systems implemented are available; or the communication technology employed. Each smart grid solution would have different requirements [19] and, in all likelihood, the ICT selected would fulfil all the requirements of the use case [20]. However, this compliance should be checked when scaling up and replicating the system.
For the analysis of ICT systems, these requirements can be related, but not limited, to the following:

• Latency. When an application requires real-time communication, latency is typically the most important factor to take into account, making it the primary performance measure for the system, as it can affect the reliability of the smart grid [21,22] and is an essential requirement when designing control schemes for DER [23]. Scalability requires that latency remains below the limit set by the application as the system grows. Replicability involves making sure that the system can maintain the same latency level under different conditions.
• Aggregated communication time. The aggregated communication time is the total time taken for all communications within the system over a given period. For example, a smart metering data collector may need to collect all smart meters' data in less than 15 min. Scalability and replicability involve maintaining aggregated communication times below the limit under different conditions.
• Bandwidth. The bandwidth indicates how much data can be transmitted through the communication channel in a given time. This can constitute a very important requirement when the communication channel is shared with other applications. As the system scales, it should maintain the bandwidth used at acceptable values.
• Reliability. This concept is related to the system's ability to correctly deliver the information being transmitted. This is an important requirement in all ICT, but especially in those that rely on wireless communications, as the signal may not reach its destination under certain conditions (e.g., weather conditions). Data loss can reduce the stability of the grid [23] and have an economic impact on the grid [24]. A scalable system must be able to maintain high reliability, regardless of size and conditions.
• Coverage. This refers to the geographical or network extent to which the communication system can serve effectively. It is a very important requirement in wireless communications to guarantee scalability and replicability and is deeply related to the reliability of the system.
• Memory. Memory usage refers to the Random Access Memory (RAM) and storage consumption of the components that make up the system. Scalability requires efficient memory management of the different components to face increasing loads and avoid information bottlenecks that end up affecting the final application of the system.
In large ICT systems, the data collection and analysis of these requirements may be an extremely complex task. However, the scalability of a system is usually determined by the components that could potentially generate communication bottlenecks, so by restricting the scope of the ICT SRA to these critical components and their direct connections, the scalability of the entire system can be analysed. To identify potential information bottlenecks, a fast and simple approach is to analyse the system topology. As Figure 4 shows, information bottlenecks may appear in components that receive information from many components (many-to-one communications), send information to many components (one-to-many communications), or communicate bidirectionally with other components. In addition to this, when identifying potential information bottlenecks, the frequency of information exchange must be considered. As mentioned previously, the higher this frequency, the higher the likelihood of an information bottleneck when scaling up the system.

[Figure 4: communication patterns that may produce information bottlenecks: bidirectional, one-to-many, and many-to-one.]
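The topology-based screening described above can be sketched as a simple ranking of components by the number of peers they talk to and the message rate they handle. This is only one possible way of shortlisting critical components; the function name, the flow format, and the component names and rates in the example are hypothetical.

```python
from collections import defaultdict


def rank_bottlenecks(flows):
    """Rank components by the message rate they handle, busiest first.

    flows: list of (sender, receiver, messages_per_hour) tuples.
    Returns a list of (component, number_of_peers, messages_per_hour).
    """
    peers = defaultdict(set)
    rate = defaultdict(float)
    for src, dst, per_hour in flows:
        # Each flow loads both ends of the link.
        peers[src].add(dst)
        peers[dst].add(src)
        rate[src] += per_hour
        rate[dst] += per_hour
    return sorted(((node, len(peers[node]), rate[node]) for node in peers),
                  key=lambda item: item[2], reverse=True)


# Hypothetical smart metering topology (many-to-one towards the collector).
flows = [
    ("meter1", "collector", 4),
    ("meter2", "collector", 4),
    ("meter3", "collector", 4),
    ("collector", "router", 1),
    ("router", "head_end", 1),
]
ranking = rank_bottlenecks(flows)
for component, n_peers, per_hour in ranking:
    print(f"{component:10s} peers={n_peers} messages/h={per_hour:g}")
```

In this toy topology the data collector ranks first, since it combines the largest number of peers with the highest message rate, so it would be the natural focus of the SRA.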

Development of Scenarios
The scenarios analysed during the ICT SRA should cover a wide range of possible conditions for the replication of the system. For each scenario, its scalability in size (i.e., increasing the number of users, devices, or systems) should be evaluated. The conditions or characteristics that define each scenario must be identified for each SRA and may be related to the ICT used, the place where the system is implemented (environment), the devices deployed, and the functional characteristics of the system for the use case under study. At least one condition should be different from one scenario to another so that the impact on performance can be better assessed.
The type of ICT used (i.e., wired or wireless) may set the conditions, such as the topology of the system (wired technologies may allow for a bus or star topology), distance or area to be covered, or Bit-Error Ratio (BER). In addition, some communication protocols can be configured in different ways, which may fit larger-scale versions of the system more effectively.
The environment in which wireless communications are deployed can have a major effect on their performance. Different scenarios should be taken into account, including various types and sizes of obstacles, interference, and ambient noise.
The deployed devices could also provide some interesting scenarios for analysis. If the solution involves multiple types of devices, scenarios with different proportions of each type could be assessed. An interesting scenario could be defined to analyse the effect on performance when a different communication protocol is used on devices that are compatible with multiple protocols and standards, as long as the functionality of the use case is not affected.
Finally, functional characteristics could also be the basis for some scenarios. For example, for the analysis of scalability in density, different information sizes could be considered. However, it is important that the functional characteristics that are modified as part of a scenario do not alter the minimum requirements of the use case. That is, in a comprehensive SRA, a scenario should not involve changing any of the requirements by which the performance of the ICT system is to be evaluated.
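One simple way of building such a set of scenarios is to start from a baseline and change a single condition at a time, so that any change in performance can be attributed to that condition. The sketch below follows this one-at-a-time approach, which is an assumption made here for illustration; the condition names and values are hypothetical.

```python
def one_at_a_time(baseline: dict, alternatives: dict) -> list:
    """Build scenarios that each differ from the baseline in exactly one condition."""
    scenarios = [dict(baseline)]  # the baseline itself is the first scenario
    for condition, values in alternatives.items():
        for value in values:
            if value != baseline[condition]:
                scenarios.append({**baseline, condition: value})
    return scenarios


# Hypothetical conditions and values.
baseline = {"topology": "star", "distance_m": 50, "ber": 0.0}
alternatives = {
    "topology": ["bus"],
    "distance_m": [10, 100],
    "ber": [1e-6, 1e-5],
}
scenarios = one_at_a_time(baseline, alternatives)
for index, scenario in enumerate(scenarios, start=1):
    print(index, scenario)
```

Scenarios that change several conditions at once can be added afterwards when the interaction between conditions is of interest.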

Definition of Key Performance Indicators
The Key Performance Indicators (KPIs) defined must have the following main characteristics:

• They must allow for evaluating whether the ICT system meets the minimum requirements identified in step 3. Therefore, the KPIs should be related to these requirements and technical constraints.
• It must be possible to measure or calculate them in all the scenarios analysed.
• For each KPI defined, an acceptance threshold must be stated. This, again, is determined by the requirements of the use case.
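The characteristics above can be illustrated with a minimal sketch in which each KPI carries its acceptance threshold and the results of one scenario are checked for increasing numbers of components. The class, the function, the KPI names, and the measured values are hypothetical and only serve to show the mechanics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    threshold: float
    higher_is_better: bool = False

    def accepts(self, value: float) -> bool:
        """True when the measured value complies with the acceptance threshold."""
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def scalability_limit(kpis, results):
    """Largest component count up to which every KPI meets its threshold.

    results: {n_components: {kpi_name: measured_value}}.
    Returns None when even the smallest count fails (scenario not feasible).
    """
    limit = None
    for n in sorted(results):
        if all(kpi.accepts(results[n][kpi.name]) for kpi in kpis):
            limit = n
        else:
            break
    return limit


# Hypothetical KPIs and results for one scenario.
kpis = [Kpi("poll_time_s", 60.0), Kpi("delivery_ratio", 0.99, higher_is_better=True)]
results = {
    2: {"poll_time_s": 1.2, "delivery_ratio": 1.0},
    52: {"poll_time_s": 30.5, "delivery_ratio": 0.999},
    102: {"poll_time_s": 61.0, "delivery_ratio": 0.998},
}
limit = scalability_limit(kpis, results)
print("scalability limit:", limit)
```

With these assumed values, the scenario complies with both thresholds up to 52 components and fails at 102 because the polling time exceeds its limit. The true limit lies somewhere between the two tested counts, so the resolution of the sweep bounds the precision of the result.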

Development of a Simulation Model or Experiment
There are two main approaches to performing quantitative SRAs of ICT: performance tests with actual or emulated hardware and software, or simulations.
Conducting an SRA through laboratory tests or emulated hardware/software can be very precise, but it often requires a large financial investment to acquire the necessary equipment. In certain cases, the lack of resources for the analysis requires the simulation of some components [25]. In other cases, equipment is used to replicate the performance of a particular system involved (e.g., internet delays in [26]). This approach can be cost-effective when researching platforms or software [27-29] since the wide range of cloud providers allows for creating production-like environments and collecting statistical data.
The most cost-effective and efficient way to conduct an ICT SRA is through simulations. This method is usually much faster to set up compared to a laboratory setting and provides a great deal of flexibility for exploring various scalability and replicability scenarios. When the technology being studied is wireless, simulations are practically the only way to carry out a comprehensive SRA, as it would require a large amount of resources to do so in an experiment.
NS-3 is a widely used, open-source, discrete-event network simulator, primarily employed in academia, that is centred on internet systems (wired and wireless). Despite its popularity, it is more challenging to use compared to other simulation frameworks due to the lack of graphical user interface tools [37].
Riverbed Modeler (formerly OPNET) is a commercial, discrete-event network simulator that offers a variety of validated models for different types of networks and technologies. This simulator provides a user-friendly graphical interface to configure and run simulations [37].
OMNeT++ is an open-source, discrete-event simulation platform designed for the simulation of wired and wireless communication networks. There are a variety of open-source extensions that can increase its capabilities.
The simulation software chosen for the analysis should be based on the knowledge and preferences of the user, the characteristics of the analysis, and the availability of free models [37].

Run Scenarios and Analysis of Results
Regardless of the approach selected for the analysis (simulation or experiment), the results of the ICT SRA can be represented in a scalability and replicability map so that the main conclusions of the analysis can be drawn quickly and efficiently. This map constitutes a valuable tool when considering scaling up or replicating the system in the future.
Figure 5 shows an example of the structure and visual representation of an ICT scalability and replicability map. In the example, the SRA has identified five key conditions to be considered in the scenarios and a total of 12 values for these conditions. Therefore, the minimum number of scenarios for the SRA is 12. For simplicity, only two scenarios (S1 and S2) are exemplified. Each scenario is represented by a vertical line placed precisely where the system is at its limit to comply with all the requirements (in terms of number of components connected). If the system does not comply with the requirements in a scenario for any number of components considered, the scenario is placed in the "Not feasible" zone of the map (S1 in Figure 5). For each scenario, its conditions are represented graphically by a circle. The functional limit for each condition value is represented by green and red bars. For example, Figure 5 shows that in a scenario considering value 1.3 for condition 1 instead of value 1.2, the ICT system would support the connection of fewer components.
Placing the ICT SRA results in a scalability and replicability map facilitates not only the task of summarising the results of the analysis and its conclusions but also the analysis of the impact of each scenario's condition on the scalability and replicability of the system.

Application of the
The Battery Management System (BMS) device is responsible for the management of the batteries deployed to provide electricity when needed, whereas the PV data logger is responsible for the management of the solar PV panels installed.
The EMPAIR is a device that implements a set of hardware and software methods for cybersecurity. It can be installed in either electrical substations (station/field zone of the DSO) or renewable power plants (field zone of the customer domain). To communicate with the BMS and PV data logger, Modbus TCP is used. The EMPAIR is compatible with different communication protocols (IEC 61850 Manufacturing Message Specification (MMS), Message Queuing Telemetry Transport (MQTT), IEC 60870-5-104, Modbus TCP/IP) and Application Program Interfaces (APIs) thanks to GeneSys, a control software for embedded applications.
The cloud hosts an EMS named Clevery, developed by EDF, for the optimisation of energy production. It communicates with the EMPAIR using IEC 61850 MMS and a Virtual Private Network (VPN) tunnel.

Scalability and Replicability Questions
Some initial scalability and replicability questions arise when observing Figure 6:

1. What would be the effect of placing the EMPAIR in the distribution domain? This would mean increasing the size of the Local Area Network (LAN) or, in other words, increasing the distance (i.e., the length of the Ethernet cables) between the connected devices. There may be a maximum distance beyond which the operational requirements cannot be satisfied.
2. What would be the effect of increasing the number of devices connected to the EMPAIR? This question could also be studied in combination with the previous one. When placed at a Positive Energy Block (PEB) level, the results would show the maximum number of devices that can be controlled within a building; when placed at a Positive Energy District (PED) level, the operational contour defined by the distance and number of devices could be obtained.
Taking into account these questions, the Modbus TCP communications over Ethernet are the key part of the SRA, as the connection between the cloud and the EMPAIR does not raise any significant questions since it provides scalability by design. Therefore, the focus of the SRA would be the communications between the BMS, PV data logger, and the EMPAIR device.

Characterise the ICT System
The simplified SGAM layers of the DER control and monitoring system are depicted in Figure 7. An EMPAIR device is responsible for controlling and monitoring the solar PV and EMS (component layer). This is done through Modbus TCP, which uses Ethernet connections between devices (communication layer). Measurements (battery and generation), control commands, and alarms are transmitted using the Modbus Protocol Data Unit (Modbus functions). The server for each type of information, its frequency of exchange, its size, and the Modbus function used to transmit the data are outlined in Table 1. The ultimate goal is to optimise the self-consumption of the PEB where the solution is implemented (business layer).
Table 1. Functional characteristics of the control and monitoring system studied in case study A [38].
The EMPAIR client can only establish a Modbus TCP connection with one server at a time. According to the exchange frequency shown in Table 1, the control and monitoring system must take, on average, one minute to poll all connected servers (i.e., to complete a polling round). This constitutes the main functional requirement for the system when scaling up. The use of Ethernet cables (in this case, Cat-5e UTP cables) sets a distance constraint, as each cable run is limited to a maximum of 100 m.

Development of Scenarios
To assess the scalability of the system under different conditions, all the scenarios are analysed for a range of 2-192 servers in steps of 10, with a simulation time of 24 h. Table 2 shows the scenarios developed for the SRA of the ICT system in case study A, where scenario A1 is the baseline scenario for the analysis. The parameters or conditions that determine the scenarios are the topology of the ICT system (star vs. bus), the distance between the client and the server, the device type (% of BMS devices to % of PV data logger devices), the BER, and the processing delay (time for the client to process the server's response).
The main purpose of scenarios A2 and A3 is to evaluate the replicability of the system if only one type of server is considered (only the BMS for A2, and only the PV data logger for A3) with respect to the baseline. Scenarios A4, A5, and A6 study the performance of the system if the client processes messages faster (A4), slower (A5), or if its processing time is negligible (A6). To study the impact of the BER on performance, the first six scenarios (A1-A6) consider BERs of $10^{-12}$, $10^{-7}$, $10^{-6}$, and $10^{-5}$. Although Ethernet transmission generally provides a BER of $10^{-12}$, higher values represent the worst-case scenarios, which must be considered in the replicability analysis. Scenario A7 studies the performance of the system if the distance between the client and the servers is pushed to the limit of Ethernet (≈100 m).
Finally, scenarios A8-A10 analyse what happens if the topology of the system is "bus" instead of "star" while keeping the distance to less than 100 m.
Table 3 summarises the scenarios that should be compared to assess the impact of each aspect on the performance of the ICT system.
The main requirement is that the EMPAIR must be able to request all the necessary information from all the servers in one minute. Therefore, the main KPI is the time taken to complete the polling process, i.e., the polling time. As expressed in (1), the polling time in round j ($T_j$) is calculated as the sum of the time it takes for the client to request, receive, and process all the necessary information from each server i at round j for a total of N connected servers.
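The description of the polling time can be written out explicitly. The following is a sketch consistent with the text; the per-server terms $t^{\mathrm{req}}_{i,j}$, $t^{\mathrm{resp}}_{i,j}$, and $t^{\mathrm{proc}}_{i,j}$ are assumed names for the request, response, and processing times of server i in round j and may not match the notation of the original equation.

```latex
T_j = \sum_{i=1}^{N} \left( t^{\mathrm{req}}_{i,j} + t^{\mathrm{resp}}_{i,j} + t^{\mathrm{proc}}_{i,j} \right)
```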
To truly assess the performance of the system, thousands of rounds must be studied.Therefore, the average polling time for all rounds and its Standard Deviation (SD) have to be calculated as KPIs.If the system manages to keep the average polling time to 60 s but its Coefficient of Variation (COV) is higher than 0.5% (SD of 300 ms), the client may be missing information from some of the servers in some rounds.
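As an illustration of how these KPIs could be evaluated from simulation output, the following minimal Python sketch computes the average polling time, its SD, and the COV, and checks them against the 60 s target and the 0.5% COV limit. The function names, the tolerance `tol_s` on the average, and the use of the population SD are assumptions of this sketch, not part of the study.

```python
from statistics import mean, pstdev

TARGET_POLL_S = 60.0  # required polling time (from the text)
MAX_COV = 0.005       # 0.5% coefficient of variation, i.e. an SD of 300 ms at 60 s


def polling_kpis(polling_times_s):
    """Return (average, standard deviation, coefficient of variation)."""
    avg = mean(polling_times_s)
    sd = pstdev(polling_times_s)  # population SD; an assumption of this sketch
    return avg, sd, sd / avg


def meets_requirement(polling_times_s, tol_s=0.05):
    """True if the average stays near 60 s and the COV does not exceed 0.5%.

    tol_s is a hypothetical tolerance on the average polling time.
    """
    avg, _, cov = polling_kpis(polling_times_s)
    return abs(avg - TARGET_POLL_S) <= tol_s and cov <= MAX_COV
```

A set of rounds whose polling times alternate between 59 s and 61 s would keep the 60 s average but fail the check, since its SD of 1 s exceeds the 300 ms limit.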

Simulation Model
The OMNeT++ simulator was used to model the Modbus TCP network connecting the EMPAIR to the BMS and the solar PV data logger. Modbus TCP is an application layer communication protocol for client-server communications between devices. The EMPAIR acts as the client, and the BMS and PV data logger act as the servers.
The client is assumed to be connected to the servers via a 100 Mbps Ethernet Cat-5e UTP cable, which has an estimated signal propagation speed of $2 \times 10^{8}$ m/s [39]. The client, depending on the type of server, sends up to three types of requests with different characteristics (see Table 1): read measurements, read alarms, and write control commands.
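To give a sense of the magnitudes involved on this link, the sketch below computes the propagation delay over the cable and the time needed to place a frame on a 100 Mbps link. The link rate and propagation speed come from the text; the 100-byte frame size is a hypothetical value chosen only for illustration.

```python
PROPAGATION_SPEED_M_S = 2e8  # signal speed in the Cat-5e cable (from the text)
LINK_RATE_BPS = 100e6        # 100 Mbps Ethernet (from the text)


def propagation_delay_s(distance_m):
    """Time for the signal to travel the given cable length."""
    return distance_m / PROPAGATION_SPEED_M_S


def transmission_delay_s(frame_bytes):
    """Time to place a frame of the given size on the link."""
    return frame_bytes * 8 / LINK_RATE_BPS


# One-way delay over a 100 m cable for a hypothetical 100-byte frame.
one_way_delay_s = propagation_delay_s(100) + transmission_delay_s(100)
```

Under these assumptions, both terms are in the microsecond range, several orders of magnitude below the 9 ms processing delay of the baseline scenario.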
The processing delay in the baseline scenario (A1) was set to 9 ms (∼111 requests/s), which is an intermediate value between an ESP8266 chip and a Raspberry Pi [40].
As mentioned above, the client can only establish a connection with one server at a time. After connecting, it requests the alarm values (which have the same frequency of exchange for both types of servers) and, after receiving the response, assesses whether it should send any other requests. The polling time should be one minute. To compensate for any deviation of the polling time from 60 s, the client is programmed to correct each new round using the polling-time error of the previous round. The priority of requests is as follows: alarms, measurements, and then control commands. However, as in an actual implementation, the client does not request more than two information objects in the same connection.
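The client behaviour described above can be sketched as follows. This is one possible interpretation, not the model's actual code: the compensation is assumed to subtract the previous round's error from the 60 s target, and the function and constant names are hypothetical.

```python
TARGET_POLL_S = 60.0
PRIORITY = ("alarms", "measurements", "control commands")  # order from the text
MAX_OBJECTS_PER_CONNECTION = 2                              # limit from the text


def next_round_target_s(last_polling_time_s):
    """Target duration of the next round, corrected by the last round's error.

    Assumption: a round that ran long by e seconds is followed by a round
    whose target is shortened by e seconds (and vice versa).
    """
    error_s = last_polling_time_s - TARGET_POLL_S
    return TARGET_POLL_S - error_s


def select_requests(due):
    """Pick at most two of the requests that are due, in priority order."""
    return [r for r in PRIORITY if r in due][:MAX_OBJECTS_PER_CONNECTION]
```

For example, if alarms, measurements, and control commands are all due for a server, only the alarms and measurements would be requested in that connection.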

Results
The results of the analysis of the scenarios in case study A are presented in Figure 8, which provides the ICT scalability and replicability map of the Modbus TCP system analysed. The scenarios are placed graphically on the map depending on the maximum number of servers they would support, with the blue circles indicating their characteristics. The map shows the impact that the device type, topology, BER, and processing delay have on the scalability and replicability of the system.
Starting with the device type, Figure 8 shows that increasing the share of BMS devices with respect to PV data logger devices significantly reduces the number of servers that can be connected to the EMPAIR. In the baseline scenario, A1, in which 50% of the servers are BMS devices and 50% are PV data logger devices, the maximum number of servers is 152. This increases to 202 servers when they are 100% PV data logger devices (scenario A3) and decreases to 72 servers when they are 100% BMS devices (scenario A2). This is noteworthy because it means that although scenario A2 does not have the margin to add 10 BMS devices to the operation of the system, it could add 4 BMS devices and 76 PV data logger devices (converting scenario A2 to A1). In other words, 76 PV data logger devices fit where 6 further BMS devices do not, so, from a functional point of view, one BMS device would be equivalent to approximately 12.66 PV data logger devices (76/6) in this case. This can be explained by the functional characteristics presented in Table 1: once an hour, a BMS device has to send more information (48 bytes of measurements, which require more time to be transmitted) than a PV data logger device. When this happens, the requirement of keeping a polling time of 60 s must still be fulfilled, which limits the scalability of the system.

Although the limit of 152 servers in A1 (50-50% devices) can be increased to 162 by changing the topology of the system from star to bus, this change in topology would not have any effect when all the servers are of the same type (scenario A9 with respect to A2, and A10 with respect to A3).Therefore, the topology of the Modbus network has almost no impact.
Although the device type has a significant impact, the BER of the Ethernet transmission is the determining factor. Figure 9 shows the standard deviation of the polling time for scenario A2 (100% of BMS devices) for different BERs and numbers of servers. It can be observed that only a cable with a BER of $10^{-12}$ can provide some scalability to the system (72 servers in scenario A2; maximum SD of 300 ms). However, this should not be a problem, as most Ethernet transmissions guarantee a maximum BER of $10^{-12}$.
The processing delay also has an impact on the scalability and replicability of the system. Figure 10 shows the standard deviation of the polling time for scenarios A1 (baseline) and A5 (13.5 ms processing delay). A 42% increase in the processing delay decreases the maximum number of servers by 33% (from 152 to 102 servers). This increase in the processing delay translates into the same percentage increase in the SD of the polling time for up to 182 servers, as shown in Figure 10. Since the processing delay affects all the requests made by the client (EMPAIR) regardless of the type of server, it can be expected to always have an impact on the scalability of the system. This means that, for example, scenario A2 would have a maximum number of servers lower than 72 when increasing the processing delay. For this reason, the scalability and replicability map depicted in Figure 8 shows an orange bar for a processing delay of 13.5 ms. If the impact on the SD maintains its proportionality, the maximum number of servers in scenario A2 is estimated to be 52 for a processing delay of 13.5 ms.
Therefore, the ICT SRA results show that the scalability and replicability of the Modbus TCP control and monitoring system for DER are mainly determined by the type of connected devices and the processing delay of the client. The system was found to be very scalable, as long as the maximum distance of 100 m for the Ethernet cable was not exceeded. Although the bus topology increased the scalability of the system in one scenario, it had no impact on others, so it cannot be firmly stated which topology would be better for scaled-up deployments.

Case Study B: Smart Metering and Sensing System
This case study examines the indoor conditions monitoring system implemented in a PEB consisting of 96 dwellings in Turku, Finland, as part of the EU-funded RESPONSE project. Analysing the scalability of this system allows it to be optimised for its future implementation at the city district level. The edge sense [41] is a wireless sensor that is placed in apartments to measure temperature and humidity. Therefore, it is in the customer domain and the process zone of the SGAM, as shown in Figure 11. It transmits these data multiple times each hour to the edge hub via a wireless M-Bus. The wireless M-Bus is a communication protocol mainly defined at the application, data link, and physical layers of the Open Systems Interconnection (OSI) model.
The edge hub [42] is a building access point device that offers both Global System for Mobile Communications (GSM) and wireless M-Bus connectivity. It is placed in the station/field zone of the SGAM. This allows for the collection of sensor data and makes it available to the energy management service in the cloud. Although it constitutes a potential application for the future, this specific use case did not involve the provision of services to the DSO, so the edge cloud is considered to be in the operation zone of the customer domain.

Scalability and Replicability Questions
By observing Figure 11, some initial scalability and replicability questions arise:

1. What would be the effect of increasing the area to be covered by the edge hub? This would mean increasing the distance between the edge sense devices and the edge hub, as well as increasing the number of sensors.

2. What would be the effect of increasing the number of sensors connected to the same edge hub? Since modifying the distance would be limited by the wireless communication, increasing the number of sensors connected to a single edge hub could pose a significant challenge: the wireless medium is shared by all the sensors, and all of them need to send their measurements at a minimum time interval.
Based on these questions, the wireless M-Bus communications of the system are the key part of the SRA, as the connection between the edge hub and the cloud does not raise any significant questions about scalability and replicability.Therefore, the focus of the SRA is the communications between the sensors and the edge hub.

Characterise the ICT System
The simplified SGAM layers of the wireless M-Bus system analysed in case study B are depicted in Figure 12. Table 4 outlines the technical characteristics of the multiple sensors that communicate with a single edge hub. The purpose of the system is to monitor the indoor conditions in order to optimise energy consumption and achieve the desired indoor climate. The messages transmitted by the wireless M-Bus are expected to be a few bytes in size, containing information such as indoor temperature and humidity.

Minimum Requirements and Technical Constraints
The optimisation algorithm requires data frequently. Sensors must provide new measurements at least every 15 min, which is a common time interval for smart meters. Therefore, the edge hub must be able to receive measurements from all the sensors deployed in ≤15 min (aggregated communication time); if it takes more time, some sensors' measurements could be missed. This means that the edge hub constitutes a potential information bottleneck of the ICT system. Since wireless communications share the transmission medium (i.e., the air), some factors should be considered for the SRA:

• The presence of obstacles to the wireless transmission, such as walls, objects, etc.
• The presence of background noise due to other devices.
• The probability of message collision. If sensors send information to the edge hub at the same time, messages could collide and be missed.

To avoid collisions, the wireless M-Bus defines a first-transmission and a retransmission scheme. To achieve a probability of reception of 95%, each message must be sent at least twice within the update period (15 min). Based on the EN 13757-4:2019 specification, the first transmission time for the baseline system is defined by a uniform distribution between 0 and 300 s (5 min). The retransmission time interval, $t_{acc}$, of each message is determined using (2). The nominal transmission time ($t_{nom}$) is set to 300 s, and $n_{acc}$ is the access number, which must be between 0 and 255. Each sensor randomly generates a new $n_{acc}$ when installed and increases it by one every 15 min, restarting when it reaches 255.
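The retransmission scheme can be illustrated with a short sketch. It assumes the synchronous-transmission relation commonly quoted from EN 13757-4, $t_{acc} = \left(1 + (|n_{acc} - 128| - 64)/2048\right) t_{nom}$; this form is an assumption of the sketch and should be checked against the standard and against (2) before being relied upon. The function name is hypothetical.

```python
def retransmission_interval_s(n_acc, t_nom_s=300.0):
    """Retransmission interval for a given access number.

    Assumed relation (to be verified against EN 13757-4):
        t_acc = (1 + (|n_acc - 128| - 64) / 2048) * t_nom
    """
    if not 0 <= n_acc <= 255:
        raise ValueError("n_acc must be between 0 and 255")
    return (1 + (abs(n_acc - 128) - 64) / 2048) * t_nom_s


# Interval for every possible access number with t_nom = 300 s.
intervals_s = [retransmission_interval_s(n) for n in range(256)]
```

Under this assumption, the interval stays within about ±3% of the nominal 300 s, so consecutive transmissions of different sensors drift relative to each other instead of colliding repeatedly.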

Development of Scenarios
Figure 13 shows the baseline building block (96 dwellings; 2500 m²) of the system.
To assess the scalability of the system under different conditions, all the scenarios developed are analysed for a range of 96-192 sensors in steps of 12. Table 5 shows the scenarios developed for the SRA of the ICT system in case study B, where scenario B1 is the baseline scenario in the analysis. The parameters that determine the scenarios are the area to be covered by the system, the thickness of the walls of the buildings, the size of the information transmitted, the background noise, and the statistical distribution considered to determine the first transmission time of the messages. Scenarios B2 and B3 are load scenarios (scalability in density), as the information size is modified to 50% (B2) and 150% (B3). Scenarios B4.1 and B4.2 constitute replicability scenarios, as the background noise is changed to −70 and −60 dBm, respectively.
As mentioned previously, the first transmission time for the messages in the baseline system is defined by a uniform distribution between 0 and 300 s (5 min). An interesting replicability scenario is how the system would perform if a Gaussian distribution were implemented instead of a uniform one. Scenarios B5, B6, and B7 are equivalent to B1, B2, and B3 but with a Gaussian distribution. The means considered for the distribution are (in minutes) 2.5, 5, 7.5, and 10, whereas the standard deviations considered are 2.5, 5, and 7.5. Therefore, twelve distributions are analysed for scenarios B5, B6, and B7.
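The twelve Gaussian configurations and the two ways of drawing the first transmission time can be sketched as follows. The means and standard deviations are those listed above; how out-of-range draws are handled is not stated in the text, so redrawing until the value falls within the 15 min update period is an assumption of this sketch, as are the function names.

```python
import itertools
import random

MEANS_MIN = (2.5, 5.0, 7.5, 10.0)  # means considered (minutes)
SDS_MIN = (2.5, 5.0, 7.5)          # standard deviations considered (minutes)
UPDATE_PERIOD_MIN = 15.0

# All (mean, SD) pairs analysed for scenarios B5, B6, and B7.
DISTRIBUTIONS = list(itertools.product(MEANS_MIN, SDS_MIN))


def gaussian_first_transmission_min(mean_min, sd_min, rng):
    """Draw a first-transmission time, redrawing until it lies in [0, 15] min.

    Truncation by redrawing is an assumption of this sketch.
    """
    while True:
        t = rng.gauss(mean_min, sd_min)
        if 0.0 <= t <= UPDATE_PERIOD_MIN:
            return t


def uniform_first_transmission_min(rng):
    """Baseline: uniform distribution between 0 and 5 min (300 s)."""
    return rng.uniform(0.0, 5.0)
```

A Gaussian with a large standard deviation spreads the first transmissions over most of the update period, whereas the baseline concentrates them in the first 5 min.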
To study the performance when increasing the thickness of the walls of the building, scenario B8 considers an increase of 10 cm in the wall thickness. While retaining the conditions of scenario B8, scenario B9 doubles the area to be covered by the solution (scalability in density and size). This would mean considering two building blocks, similar to the one shown in Figure 13. With the exception of the larger area, scenarios B10, B11.1, B11.2, and B12 are homologous to scenarios B1, B4.1, B4.2, and B5, respectively.
Table 6 summarises the scenarios that should be compared to assess the impact on the performance of the ICT system. Since the most restrictive requirement is that the edge hub must retrieve data from all the sensors every 15 min, the reliability of the wireless M-Bus communications must be assessed.
For this, the three main KPIs taken into account are the delivery ratio of the network, the message-error ratio, and the gross delivery ratio. The delivery ratio (3) measures the proportion of messages with new data received by the edge hub, whereas the message-error ratio (4) measures the proportion of messages received with errors due to interference.

$$\text{Delivery ratio} = \frac{\#\text{Messages processed}}{\#\text{New data messages}} \tag{3}$$

$$\text{Message-error ratio} = \frac{\#\text{Erroneous messages}}{\#\text{Messages received}} \tag{4}$$

The gross delivery ratio (5), on the other hand, measures the proportion of messages that reach the edge hub, including those with errors.

$$\text{Gross delivery ratio} = \frac{\#\text{Messages received}}{\#\text{Messages sent}} \tag{5}$$
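The three KPIs can be computed directly from message counters collected at the edge hub and the sensors. The sketch below is illustrative only: the `Counters` class, its field names, and the numbers in the example are hypothetical, and the reading of "messages processed" as error-free messages carrying new data is an interpretation of the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    sent: int       # messages sent by all sensors (including retransmissions)
    received: int   # messages that reach the edge hub, with or without errors
    erroneous: int  # received messages that contain errors
    processed: int  # messages with new data correctly processed by the hub
    new_data: int   # new-data messages generated by the sensors


def delivery_ratio(c):
    return c.processed / c.new_data


def message_error_ratio(c):
    return c.erroneous / c.received


def gross_delivery_ratio(c):
    return c.received / c.sent


# Hypothetical counters, for illustration only.
example = Counters(sent=200, received=180, erroneous=18, processed=90, new_data=100)
```

Keeping the gross delivery ratio separate from the delivery ratio helps distinguish messages that never arrive (collisions, obstacles) from messages that arrive but cannot be processed.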

Simulation Model
The wireless M-Bus network was simulated using the OMNeT++ simulator. The sensors and the edge hub were modelled according to their technical specifications [41,42] (Table 4).
The wireless M-Bus communications were modelled considering the following characteristics:

• Transfer S-mode of the wireless M-Bus is used.
• Messages have a total size of 38 B in the baseline scenario.
• Communications are unidirectional (i.e., S1 mode) from the sensors to the edge hub.
• Sensors take new measurements every 15 min.
• The only impediments to the wireless signals taken into account are the walls and floors of the buildings, assuming they are constructed of concrete. To this end, the 3D model of the PEB, depicted in Figure 14 (top view), was created in OMNeT++.
• The transmission medium model implements three models included in the INET library [43]: the free-space path loss model (FSPL), the isotropic dimensional background noise model (background noise model), and the dielectric obstacle loss model. These are implemented following the formulation described in [44]. The FSPL model + obstacles was chosen for the simulation because it provides an appropriate performance level (not too optimistic, not too pessimistic) when an empirical model is not possible [45].
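For reference, the free-space path loss term of such a medium model can be computed with the standard formula $\mathrm{FSPL} = 20 \log_{10}(4 \pi d f / c)$. The sketch below is not the INET implementation; the 868.3 MHz carrier (the frequency usually associated with S-mode) and the lumped `obstacle_loss_db` parameter, which stands in for the dielectric obstacle loss, are assumptions made for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
S_MODE_FREQ_HZ = 868.3e6  # assumed S-mode carrier frequency


def fspl_db(distance_m, freq_hz=S_MODE_FREQ_HZ):
    """Free-space path loss in dB for a given distance and frequency."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT_M_S)


def received_power_dbm(tx_power_dbm, distance_m, obstacle_loss_db=0.0):
    """Received power after free-space loss and a lumped obstacle loss.

    tx_power_dbm and obstacle_loss_db are hypothetical inputs of this sketch.
    """
    return tx_power_dbm - fspl_db(distance_m) - obstacle_loss_db
```

Comparing the received power with the background noise level gives a first indication of whether a sensor at a given distance, behind a given number of walls, can still be decoded by the edge hub.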

Results
The results of the analysis of the scenarios in case study B are presented in Figure 15, which provides the ICT scalability and replicability map of the wireless M-Bus system analysed. The scenarios are placed graphically on the map depending on the maximum number of sensors they would support, with the blue circles indicating their characteristics.
Starting with the scenarios that allow for scalability of the system, the effect of the size of the information is remarkable. The baseline size (38 B) and the smaller one (19 B) do not impact the scalability of the system, allowing it to scale up to 192 sensors, whereas the larger one (57 B) limits the scalability to 108 sensors (scenario B3). This is explained by the low data rate of the S-mode in the wireless M-Bus (16.384 kbps) and the use of a uniform distribution of 5 min for the first transmission. Larger messages require longer transmission times, increasing the probability of message collision as the number of sensors increases.
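The effect of message size on transmission time can be quantified with a short calculation. The sketch below uses the S-mode data rate and the three message sizes from the text; it ignores preamble and synchronisation overhead, which is a simplifying assumption, and the function name is hypothetical.

```python
S_MODE_DATA_RATE_BPS = 16_384  # 16.384 kbps (from the text)


def airtime_s(message_bytes):
    """Time on air of a message, ignoring preamble and sync overhead."""
    return message_bytes * 8 / S_MODE_DATA_RATE_BPS


# Airtime for the three message sizes studied (scenarios B2, B1, and B3).
airtimes_ms = {size: airtime_s(size) * 1000 for size in (19, 38, 57)}
```

Under this simplification, a 57 B message occupies the channel for roughly 28 ms, three times as long as a 19 B message, so each transmission leaves a proportionally longer window in which another sensor's message can collide with it.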
However, scenario B7 manages to overcome the limitation imposed by the size of the information. This scenario allows for the deployment of up to 192 sensors by using a Gaussian distribution instead of a uniform distribution for the first transmission time of messages. Figure 16 shows that this is true for all the Gaussian distributions considered and that outstanding performance can be expected when the standard deviation is 7.5 min. This means that when replicating the solution, if a larger amount of information needs to be transmitted per sensor (for example, because the messages include additional measurements or other data), a better approach would be to configure the sensors to follow a Gaussian distribution instead of a uniform one for the first transmission.
It should be noted that 57% of the scenarios studied would not allow for the scalability and replicability of the ICT system. This means that the system would have to reduce the number of sensors from the demonstration's 96 sensors to enable replication in scenarios B4.1, B4.2, and B8-B12. By considering these scenarios, it is possible to gain useful knowledge about the scalability and replicability of the wireless M-Bus system. For this, Figure 17 plots the delivery ratios, message-error ratios, and gross delivery ratios of the baseline scenario (B1) and scenarios B4.1, B4.2, B8, and B10.
Scenarios B4.1 and B4.2 in Figure 17 show that the impact of background noise is significant. In urban settings, a background noise level of −90 dBm is considered acceptable, and it has no effect on the system analysed. However, if the noise is higher, such as −70 or −60 dBm, the system's capabilities would be significantly reduced. Although the delivery ratio is close to the acceptable threshold (0.9) in scenario B4.1, the message-error ratio is excessive (≈0.2) for the use case. Regarding scenario B4.2, less than half of the new measurements are received, resulting in extremely poor performance.
The impact of wall thickness is shown by scenario B8 in Figure 17. Increasing the wall thickness from 10 cm (B1) to 20 cm (B8) implies a decrease in the delivery ratio of ≈0.15. Nevertheless, this could be considered a moderate decrease (the delivery ratio in scenario B8 is close to the acceptable threshold of 0.9), and in this case, the impact on the gross delivery ratio should also be considered. This ratio is ≈0.9 for scenario B8, which means that approximately 10% of the messages do not reach the data collector. Since this ratio remains quite stable regardless of the number of sensors, the main cause for non-received messages is not message collision but the thickness of the obstacles, which prevents messages from reaching their destination.
Scenario B10 in Figure 17 shows the impact on performance of increasing the deployment area from 2500 m² to 5000 m². Despite the gross delivery ratio remaining invariant with respect to B1 (low impact of message collision or lost messages), the delivery and message-error ratios are much worse (both are ≈0.7). These ratios reveal an interesting fact: although 70% of the messages are not properly processed due to the presence of errors, the remaining 30% that do not contain errors account for 70% of the measurements that need to be processed. This could mean that the data collector cannot process the messages from the sensors that are further away, since the obstacles and background noise that the signal encounters on its way increase its BER, and that a second data collector is necessary, which would require further analysis. These results, together with those presented in a preliminary analysis of this system [44], indicate that the system presents high-density scalability: as long as the area of deployment does not increase from 2500 m², the system would be able to support at least 384 sensors [44]. However, the performance of the system is deeply affected when scaling in area size.
Therefore, the boundaries for the scalability and replicability of the wireless M-Bus system for smart metering and sensing using just one data collector are determined by the size of the information to be transmitted (a limitation that can be addressed by implementing a Gaussian distribution for the first transmission), the background noise of the environment, the size of the area to be covered, and the thickness of the walls. These aspects should be taken into account when considering a new implementation, changing the characteristics of the system, or scaling up in density.

Conclusions
The inclusion of ICT in the scope of a technical SRA would allow for a complete understanding of the scalability and replicability of smart grid solutions, which are increasingly dependent on ICT.
This paper has proposed a novel methodology for quantitatively performing an ICT SRA in a smart grid context. This methodology uses the SGAM as a basis to characterise the system and define the scope of analysis, as a quantitative analysis may not be necessary in all cases. The proposed approach does not depend on the use case, communication technology, or the quantitative approach (simulations or experiments) selected.
To validate the proposed methodology, it was applied to two case studies comprising solutions that use different communication technologies and are demonstrated in the EU-funded RESPONSE project. Case study A analyses the scalability and replicability of a Modbus TCP control and monitoring system for DER, whereas case study B analyses a wireless M-Bus system for smart metering and sensing.
The ICT SRA results of both case studies are summarised through their corresponding ICT scalability and replicability maps, a concept introduced by this paper for this type of analysis. These maps allow for a quick overview of the scalability and replicability of an ICT system without involving complex plots of results that may be difficult to interpret. In addition, they offer an efficient way of estimating the feasibility of potential scenarios that were not explicitly considered during the SRA.
The application of the methodology shows its effectiveness in analysing, in a structured way, the scalability and replicability of an ICT system by focusing on the most critical links, which are identified through a prior characterisation of the system. The clear identification of requirements and constraints enables drawing clear conclusions about the scalability and replicability of the system, as well as the main factors impacting these aspects, regardless of the type of ICT (wired or wireless).
Future research may apply this methodology to other smart grid solutions to further validate it. It could be applied to the analysis of dynamic scalability (e.g., cloud-based solutions) to further prove the versatility of the methodology. However, the case should be selected carefully since there are likely other components in the solution that could limit scalability, as the two studies presented in this paper show. One challenging line of work would be to expand this methodology by defining simple numeric scalability and replicability indicators, calculated from the results obtained, so that different ICT alternatives for the same use case could be further compared. In addition, it would be interesting to complement this methodology with a qualitative approach to evaluate aspects such as interoperability and standardisation.

Figure 3. Potential influence of scaling up components in SGAM domains and zones. Note: Customers include DER and consumers.

Figure 4. Types of communications between devices and/or systems to consider for the identification of potential bottlenecks.

Figure 5. Structure and visual representation of an ICT scalability and replicability map.
Figure 6 shows the component layer of the SGAM for the system. It consists of four main elements: the cloud, Equipement Modulaire de Protection des Accès Industriels Répartis (EMPAIR), Energy Management System (EMS), and PV data logger. The Battery Management System (BMS) device is responsible for the management of the batteries deployed to provide electricity when needed, whereas the PV data logger is responsible for the management of the solar PV panels installed.

Figure 6. ICT system of case study A mapped into the SGAM: component and communication layers.

Figure 7. Simplified SGAM characterisation of the ICT system of case study A.

Table 3. Scenarios to be compared depending on the objective of analysis for case study A.

Figure 9. Standard deviation of the total polling time for different BERs and numbers of servers in scenario A2.

Figure 10. Standard deviation of the total polling time for different numbers of servers in scenarios A1 and A5.

Figure 11 illustrates the system mapped into the component layer of the SGAM. It comprises three main components: edge cloud, edge hub, and edge sense.

Figure 11. ICT system of case study B mapped into the SGAM: component and communication layers.

Figure 12. Simplified SGAM characterisation of the ICT system of case study B.

Figure 13. Baseline building block in Turku, Finland, for case study B.

Table 6. Scenarios to be compared depending on the objective of analysis for case study B.

Figure 14. Top view of the 3D model in OMNeT++ for the PEB.

Figure 15. ICT scalability and replicability map of case study B illustrating the analysed scenarios.

Figure 16. Delivery ratio of scenario B7, which depends on the standard deviation and mean (in minutes) of the Gaussian distribution used to determine the first transmission time of messages.

Table 2. Scenarios simulated for the ICT SRA of case study A. Note: Scenarios A1-A6 include the analysis of three different BERs for each one ($10^{-12}$, $10^{-6}$, and $10^{-5}$).

Table 5. Scenarios simulated for the ICT SRA of case study B.