Impact of an ML-Based Demand Response Mechanism on the Electrical Distribution Network: A Case Study in Terni

The development of smart grids requires the active participation of end users through demand response mechanisms to provide technical benefits to the distribution network and receive economic savings. Integrating advanced machine learning tools makes it possible to optimise the network and manage the mechanism to maximise the benefits. This paper proceeds by forecasting consumption for the next 24 h using a recurrent neural network and by processing these data using a reinforcement learning-based optimisation model to identify the best demand response policy. The model is tested in a real environment: a portion of the Terni electrical distribution network. Several scenarios were identified, considering users’ participation at different levels and limiting the potential with various constraints.


Introduction
The distribution network (DN) faces a challenging transition period due to the spread of renewable energy sources, which, at the European level, are expected to reach 40% of the energy mix in 2030 to reduce greenhouse gas emissions by 55% compared with 1990 [1]. The grid must move from a static infrastructure, with large generation centres and unidirectional power transmission, to a more agile infrastructure, with many distributed energy resources (DERs) and millions of IoT devices connected and continuously exchanging data with the grid. In smart grids, end-users can actively participate in the transition with their choices (energy communities, electric vehicles, and demand shifting). Several solutions must be adopted and implemented in the grid. In this paper, the authors focus on the impact of optimised demand response (DR) systems on the DN, assessing the benefits of mass deployment.
Electrical utilities consider using DR programs to improve their networks' stability, efficiency, and reliability in response to increased power demand [2] or when the grid is affected by unplanned events [3]. The goal of DR is to allow consumers to contribute to the operation of the electric grid by reducing their electricity usage or shifting it to off-peak hours, allowing the grid to relax. Advancements in automated infrastructure and DN technology have enabled residential consumers to participate in demand curtailment plans [4]. Distribution system operators (DSOs) require an advanced management system to proactively manage access to large-scale flexible resources using smart grid technologies to achieve a reliable, robust, cost-effective, and optimised network. A DSO's management system is a centralised platform capable of hosting various optimisation schemes and integrating and managing loads, generation, and other field devices at all distribution voltage levels in real time. It can determine the DR to keep the active DN within safe operating limits. As a result, in addition to financial benefits for DR participants, DSOs can address technical network issues such as voltage or thermal constraints, transformer phasor measurement units (PMUs). Zhu et al. [31] present a distributed algorithm to flatten load profiles while minimising individual customer costs.
As described previously, the literature concerning the application of DR mechanisms presents numerous models and mechanisms mainly simulated on electrical datasets, often aimed at cost-effectiveness analyses and dynamic pricing programme scheduling. A lack is noted in the availability of real data and the use of innovative models based on machine learning. By addressing these challenges and shortages, this paper aims to prepare an optimal management mechanism of network flexibility resources with DR, using RL analytics applied to a real Italian DN. Within the tool, a forecast model, i.e., a recurrent neural network (RNN) based on gated recurrent unit (GRU) layers, allows for identifying the trend in the load and generation in the next 24 h; this model has been used for online forecasting based on streaming datasets collected from real grids on a continuous basis. The optimisation and forecasting ML models have been chosen as an example of a real and effective application and do not claim to be the most effective algorithms.
In addressing the gap in the literature, the contributions of this paper can be summarised as follows:

• Modelling and testing an original ML-based forecasting model, with continuous model training, applied to the PV production trend and industrial and domestic consumption trends;
• Modelling and testing an original ML model for optimising an electricity grid, leveraging local flexibility resources;
• Integration of the models and deployment on a real power grid, analysing the benefits for the DN.
This paper is organised as follows: the introductory section presents the problem, the literature analysis, and the contributions of this article to the existing literature. Section 2 presents the ML-based forecasting models used for load and generation prediction and grid optimisation, as well as the methodological outline of this study. Section 3 presents the real case study of the Terni DN. Section 4 shows the results of applying the models in the real case, using the previously defined indicators, and finally, Section 5 reports the conclusions.
The work performed in this paper was funded by the IoT-NGIN European Project [32] and leverages the experience and contribution of some of the Consortium partners.

Methodology and Models
In order to evaluate the effectiveness of the ML tools developed, some energy indicators highlighting the benefits for the DN are used. These indicators are network losses, the self-consumption rate (SCR), the self-sufficiency rate (SSR), and the reverse power flow (RPF), i.e., the energy flowing from the DN to the primary substation (PS) in the opposite direction to the one commonly followed. The equations to calculate SSR and SCR are as follows [33].
where E_i is the energy fed into the grid, E_p is the energy produced, and E_a is the energy absorbed from the grid.
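The two indicators can be illustrated with a short sketch. It assumes the commonly used formulation in which the self-consumed energy is E_p − E_i, so that SCR = (E_p − E_i)/E_p and SSR = (E_p − E_i)/(E_p − E_i + E_a). This form is an assumption inferred from the symbol definitions, not a reproduction of the equations in [33], and the function names are hypothetical.

```python
def self_consumption_rate(e_produced, e_fed_in):
    """Share of locally produced energy that is consumed locally (assumed form)."""
    if e_produced <= 0:
        raise ValueError("produced energy must be positive")
    return (e_produced - e_fed_in) / e_produced


def self_sufficiency_rate(e_produced, e_fed_in, e_absorbed):
    """Share of total consumption covered by local production (assumed form)."""
    self_consumed = e_produced - e_fed_in
    total_consumption = self_consumed + e_absorbed
    if total_consumption <= 0:
        raise ValueError("total consumption must be positive")
    return self_consumed / total_consumption


# Illustrative values in MWh (not taken from the case study).
e_p, e_i, e_a = 10.0, 6.0, 4.0
scr = self_consumption_rate(e_p, e_i)       # 4 of 10 MWh produced are used locally
ssr = self_sufficiency_rate(e_p, e_i, e_a)  # 4 of 8 MWh consumed are produced locally
```

Under this assumed form, the SCR is bounded by the ratio of consumption to production whenever production exceeds consumption, since at most all of the consumption can be covered locally.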

Forecasting Model
The forecasting model is based on online learning, i.e., the training of ML models is performed continuously when new data are available. This paradigm acquires great importance in the context of IoT due to the large number of sensors or devices that can be present in common scenarios and the large amount of information captured by them dynamically. The online learning service supports both application programming interface (API) REST requests and streaming data since most IoT devices generate communication flows in real time.
Forecasting the power generated by the grid over a future time frame, based on historical data collected from grid sensors, consists of predicting future power values from the most recently measured ones. For example, given the last measured 36 h of generated power, the model can predict the next 24 h. Forecasting is computed using a trained ML model that infers the prediction from a given input tensor of measured values. This is an example of a univariate time series prediction problem addressed using ML modelling techniques. Before the ML model is ready for computing predictions, it must be trained with historical generated power data. As the data are continuously pushed by the sensor into an MQTT topic, learning is conducted using an IoT-NGIN MLaaS online learning service [21] that collects online data from the topic and trains the model after applying the following data pre-processing procedure: (i) power is computed from sensor data, (ii) the data are resampled by averaging over each hour, (iii) the data are scaled in the range [0, 1] as required by the ML algorithms applied during training, and (iv) the data are windowed across collected power data to create a dataset with the shape required by the ML training algorithms.
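Steps (iii) and (iv) of the pre-processing procedure can be sketched as follows. This is a minimal illustration in plain Python, assuming the series has already been averaged to hourly values; the function names and the synthetic series are hypothetical and do not come from the IoT-NGIN service.

```python
def min_max_scale(values):
    """Scale a series linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, n_in=36, n_out=24):
    """Slide over the series and pair n_in past values with the n_out next ones."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        samples.append((inputs, targets))
    return samples


# Synthetic hourly power series, used only to show the resulting shapes.
hourly_power = [float(h % 24) for h in range(100)]
scaled = min_max_scale(hourly_power)
windows = make_windows(scaled)  # each sample: 36 inputs and 24 targets
```

Each sample pairs an input window of 36 hourly values with the 24 values that follow it, which matches the 36 h to 24 h setting described in the text.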
The ML model designed for forecasting the generated power of the grid is a DL model with layers based on RNNs [22]. RNNs have been demonstrated to work well in predicting the future behaviour of time series, although they present some disadvantages, such as the vanishing gradient problem [34]. To overcome this problem, alternative architectures for ML models, evolving from RNNs, have been proposed, including LSTM and GRUs [35]. For the power forecasting problem, the GRU is used since it solves the vanishing gradient problem suffered by the original RNN and converges faster than other types of RNNs (e.g., LSTM). After the recurrent layers, fully connected layers are added for applying linear transformations to the outputs of the GRU layer. Figure 1 depicts the layers of the implemented architecture. A module is implemented for explainable artificial intelligence (XAI), which attempts to explain the predictions made to answer the question: Why has the model made this prediction? XAI is a set of methods and processes that help to comprehend and trust the prediction of the ML model. Moreover, it helps to characterise model performance by providing the impact of the input data for a given prediction, adding transparency to the prediction and capacity for model bias detection.
The OL service is deployed into MLaaS using KServe [36] with Kubeflow [37]. KServe allows for deploying three types of components: predictor, transformer, and explainer. Each of these components exposes a REST API as an HTTP service.
The monitoring service consists of an HTTP endpoint deployed using the FastAPI framework [38], a Prometheus engine [39], and a Grafana web tool [40].

The proposed forecasting model is trained to minimise the mean squared error (MSE) between the real values and the predictions; the lower the MSE, the better the forecasting performance. The MSE is calculated using Equation (3):
$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(x_t - \hat{x}_t\right)^2 \quad (3)$$
where $x_t$ is the real value of the PV production or the building consumption time series at hourly interval $t$ of the evaluation period, $\hat{x}_t$ is the forecast produced by the respective model, and $N$ is the number of hourly intervals in the evaluation period.

Grid Optimisation Model
The grid optimisation model, aiming to maximise its SCR and SSR ratios, is designed as an RL-based optimisation model [41]. RL is an ML discipline centred on the learning of optimal behavioural policies for the decision-making of a group of agents interacting in a common environment, leading to a maximisation of a cumulative reward (i.e., expert-defined performance metric). In the context of control systems, the learned policy allows for deploying deterministic or stochastic control logic/instructions for agents interacting with the end system, such as maximising the grid SCR and SSR ratios. Some of the main concepts of RL involving the environment, agents, states, actions, rewards, observations, and policies are shown in Figure 2:

• The environment refers to the physical or simulated space that the agents interact with.
• Agents are entities that are affected by and are in a position to interact with the environment by taking action.
• Agents take a state (e.g., a vector) that represents their status at every point in time. States are defined as a discrete or a continuous closed set.
• A set of actions is defined for the agents to take. This set is defined as a discrete or a continuous closed set.
• Rewards are given by the environment after the undertaking of actions by the autonomous agents.
• Observations are pre-processed snapshots (e.g., in the form of vectors) collected after each transition, which gather relevant variables from the environment as well as the previous state and the actions taken, resulting in the state and the observed reward.
• Policies, in broad terms, are the learned (deterministic or stochastic) mapping between the set of states and the set of actions.
Depending on the problem to be tackled, experience is gathered with interactions in either a simulated or a real (digital or physical) environment. Based on the knowledge of the environment, it is possible to distinguish between model-free and model-based approaches [41]. For the reinforcement algorithm to learn/approximate an optimal policy, it needs to buffer enough experience from the environment. The experience collected in the model comprises transitions (steps within episodes), which are made up of observations including relevant variables from the environment, the previous agent status, the action taken, the following states, and the observed reward. In the following, the RL elements for the grid optimisation model are described. The RL service is deployed using KServe with the Tensorforce framework [42].

Environment
As interactions with the real grid are not possible, a grid simulator is implemented based on the pandapower framework [43]. This framework uses the PYPOWER power flow solver [44] to create a network calculation program to automate and optimise power systems. As a result, the simulator describes the same environment state as the real grid when connected with the same sources of power generation and consumption. The operation of the simulator is as follows: once the electrical network is specified in pandapower, the simulator reads the power loads demanded by all the consumer groups connected to the grid and the power generated over a day. The data have a resolution of 15 min, so there are 96 values per day of domestic and industrial consumers' power demand. Next, the simulator introduces the loads on the grid and performs a simulation. Then, the simulator outputs the grid state, consisting of the parameters to be optimised: network losses, SCR, SSR, and RPF.

States
The optimiser acts on the customers' energy demand: it modulates the distribution of energy demand throughout the day, so it is necessary to know the state of energy demand or, equivalently, the distribution of current energy demand. Two types of customers can be distinguished: domestic and industrial. There are 13 loads (client groups) of each. To understand better what the data look like, a pre-analysis of a dataset collected over a year was carried out. Figures 3 and 4 show the average distribution of, respectively, each domestic and industrial load for one day. The available data, provided by the grid owner, do not show the annual curve for each user but provide an averaged curve relating to the industrial cluster and one relating to the domestic cluster, as well as the photovoltaic production curve per unit of installed power. Furthermore, the capacities of each load and each photovoltaic system are known. In fact, the user curves are proportional to each other according to the installed power. To simplify the optimisation model, it only acts on higher loads, i.e., those having a stronger influence on the grid (the top eight loads, combining four domestic and four industrial loads, are selected since they cover, respectively, 70% and 77% of the total cluster load). It is considered that the flexibility of users, i.e., the ability to vary their consumption curve, is proportionally the same for all users, with the difference being that larger users are able to provide higher-performance services for the entire DN. Reducing the number of actors simplifies the computational aspects of the problem and speeds up the solution.
It can be seen that the trend in domestic users peaks in the morning hours, and the two main peaks are around lunchtime and dinnertime. This is an averaged curve, which therefore considers full- and part-time workers, students, and pensioners. The curve for industrial and commercial users presents a peak in the early hours of the day, corresponding to the start-up of engines and other consumer devices, and a slow decrease in consumption, interrupted only by a recovery around noon.

Discretisation of States
As the power demand changes continuously, it must be discretised, since a discrete state space is needed to train the optimiser in a reasonable time. The state space should be as small as possible so that it can be completely explored by the optimiser and the complete policy learned. However, an excessive reduction of the time interval would increase the state space, imposing a longer exploration time on the optimiser training process. After initial tests with quarter-hourly discretisation, it was noted that an extremely large number of episodes was required by the optimiser to explore the state space, despite running on a very high-performance HPC cluster. Hence, it was decided to carry out hourly load discretisation. An approach with much shorter discretisation times, such as 15 s or less, would be desirable for electrical aspects (overcurrents, voltage issues, etc.); for energy aspects such as SCR or SSR, which are the objective of the grid optimiser, there is no need to use short analysis times. In this respect, the Italian legislation also uses an hourly resolution to calculate the SCR when determining incentives for renewable energy communities [45].
Therefore, the states consist of the sum of all the loads, discretised by the hour. This approach resulted in 48 states, consisting of two vectors of 24 elements each (one for domestic consumers and another for industrial consumers). Each state could take a continuous value in the range [0, 1], that is, the fraction of the total daily demand for each hour of the day. Therefore, the range [0, 1] was also discretised into 10 bins. Furthermore, it is possible to apply a further normalisation by dividing the state value by a "load threshold" so that the possible states of demand in the range [0, 1] are constrained (to reduce the state space dimension) to those values with a higher likelihood of being explored and applied by the optimiser. In this way, the simulator can explore more states in a reasonable time.
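The discretisation can be sketched as follows. This is a minimal illustration, assuming that values above the load threshold are clipped to the top bin; the clipping rule, the function name, and the example profiles are assumptions rather than details given in the paper.

```python
def discretise_load(hourly_load, n_bins=10, load_threshold=1.0):
    """Map each hourly load to a bin index in 0..n_bins-1.

    Each value is first expressed as a fraction of the total daily demand,
    then divided by the load threshold and clipped to 1 (assumed rule).
    """
    total = sum(hourly_load)
    if total <= 0:
        raise ValueError("total daily demand must be positive")
    states = []
    for value in hourly_load:
        share = value / total                      # fraction of daily demand
        scaled = min(share / load_threshold, 1.0)  # normalised and clipped
        states.append(min(int(scaled * n_bins), n_bins - 1))
    return states


flat_profile = [1.0] * 24
peaked_profile = [1.0] * 23 + [7.0]

plain = discretise_load(flat_profile)                            # all hours fall in bin 0
normalised = discretise_load(peaked_profile, load_threshold=0.3)  # hours spread over more bins
```

Without the threshold, an hourly share rarely exceeds a few percent of the daily total, so almost every hour falls into the lowest bin; dividing by a threshold spreads the realistic values across more of the 10 bins.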

Actions
The RL-based optimiser seeks to optimise the energy demand of the grid. To do so, the distribution of energy demand of each load is shifted throughout the day. This optimiser uses four sets of discrete actions (see Table 1). The first action set is the load selection. This action set includes eight possible actions (select one load from the eight available). The second action set determines the start and end time of the energy displacement. The third action set determines the percentage of energy that is shifted at different times. Three different energy shifts were considered: 1%, 5%, and 10%. For example, if from 8:00 a.m. to 9:00 a.m. there is a power demand of 10 kW and a 10% shift to the time slot between 9:00 a.m. and 10:00 a.m. is chosen, the power demand in the second time slot will increase by 1 kW, and the demand from 8:00 a.m. to 9:00 a.m. will decrease by the same amount. This resulted in a set of 147,456 possible actions.
The energy demand shift presents several restrictions: (i) the total energy demanded by a load must remain constant throughout the day, (ii) the shift can only occur within two contiguous time slots, i.e., the displacement cannot be applied to any arbitrary distant time slots, and (iii) the demand cannot be negative.
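The three restrictions can be made concrete with a small sketch of a single shift. The encoding of an action as an hour, a direction, and a fraction is hypothetical and chosen only for illustration; it is not the action encoding of Table 1.

```python
def shift_demand(load, hour, direction, fraction):
    """Move a fraction of the demand at `hour` to the contiguous slot.

    direction is +1 (next slot) or -1 (previous slot). The daily total is
    preserved, and demand cannot become negative because fraction <= 1.
    """
    if direction not in (-1, 1):
        raise ValueError("only contiguous time slots are allowed")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    target = hour + direction
    if not (0 <= hour < len(load) and 0 <= target < len(load)):
        raise ValueError("both slots must lie within the same day")
    shifted = list(load)           # leave the input profile untouched
    amount = load[hour] * fraction
    shifted[hour] -= amount
    shifted[target] += amount
    return shifted


# Example from the text: 10 kW at 8:00-9:00, 10% shifted to 9:00-10:00.
profile = [0.0] * 24
profile[8], profile[9] = 10.0, 5.0
result = shift_demand(profile, hour=8, direction=1, fraction=0.10)
```

In this example the 8:00 a.m. slot drops to 9 kW and the 9:00 a.m. slot rises by 1 kW, while the daily total is unchanged.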
The power demand load describes the daily demand, with a 15 min interval, resulting in a 96-value vector for a full day.In order to be aligned with the state discretisation of 1 h (as described in Section 2.2.2), the optimiser resamples the loads by averaging the samples within each hour.
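The resampling step can be sketched in a few lines. The function name and the synthetic profile are illustrative assumptions.

```python
def resample_hourly(quarter_hourly):
    """Average each group of four 15-min samples into one hourly value."""
    if len(quarter_hourly) % 4 != 0:
        raise ValueError("expected a whole number of hours")
    return [
        sum(quarter_hourly[i:i + 4]) / 4
        for i in range(0, len(quarter_hourly), 4)
    ]


# Synthetic 96-value daily profile in which every hour has a constant demand.
daily_profile = [float(i // 4) for i in range(96)]
hourly_profile = resample_hourly(daily_profile)  # 24 hourly averages
```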

Rewards
The objective of the system is to optimise the operation of the grid. This version tries to maximise the SSR and SCR. The reward function is the mean of both parameters. In order to obtain greater flexibility during the training process, the reward is computed as a weighted average:
$$R = \alpha \cdot \mathrm{SCR} + (1 - \alpha) \cdot \mathrm{SSR}$$
By tuning the α parameter, it is possible to address the optimisation needs and promote SSR over SCR (or vice versa). In this way, we can force SCR and SSR to contribute evenly to the reward by setting α = 1/2, or make the reward equal to the SCR by setting α = 1.
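The effect of α can be checked with a minimal sketch, assuming the weighted average takes the form α·SCR + (1 − α)·SSR, which is consistent with the two special cases described in the text. The function name and the numeric values are illustrative.

```python
def reward(scr, ssr, alpha=0.5):
    """Weighted average of SCR and SSR (assumed form of the reward)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * scr + (1.0 - alpha) * ssr


balanced = reward(0.5, 0.25, alpha=0.5)  # both indicators weigh equally
scr_only = reward(0.5, 0.25, alpha=1.0)  # reward equals the SCR
```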

Agents
The optimisation model includes agents for the training of PPO, DQN, and DDQN [41].The optimisation model was implemented using the Tensorforce RL framework.For training, some improvements were conceived and implemented, including:

• Variable number of steps per episode: some experiments showed that the reward fell after some steps and did not recover for the rest of the episode. To avoid wasting time in the training process, the episode is concluded when the obtained rewards fall below a customisable threshold of 20%.
• Variable exploration time: the training process supports the configuration of a variable exploration rate, which can be diminished as the learning process progresses over more learning episodes.
• State density matrix: the optimisation training process registers all the states visited by the agent, intended to give a clear view of the agent's preferred combination of states and to help understand why the agent chooses such a state combination as optimal. This state matrix consists of 48 columns, corresponding to the 24 h of both domestic and industrial consumers, and 10 rows, corresponding to the 10 state bins available for each state.
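The three training aids can be sketched independently of any RL framework. The code below is a hypothetical illustration: the exponential decay schedule, the interpretation of the threshold as an absolute reward value, and all names are assumptions, and none of it reproduces the Tensorforce configuration used by the authors.

```python
def exploration_rate(episode, start=1.0, end=0.05, decay=0.99):
    """Exploration rate that decays with the episode index, floored at `end`."""
    return max(end, start * decay ** episode)


def run_episode(step_fn, max_steps, reward_threshold):
    """Run up to max_steps; stop early once a reward falls below the threshold."""
    rewards = []
    for step in range(max_steps):
        current = step_fn(step)
        rewards.append(current)
        if current < reward_threshold:
            break  # assumed rule: the episode is unlikely to recover
    return rewards


def new_density_matrix(n_bins=10, n_states=48):
    """One row per state bin, one column per hourly state (24 + 24)."""
    return [[0] * n_states for _ in range(n_bins)]


def update_density(density, state_bins):
    """Count one visit for the bin occupied by each of the hourly states."""
    for column, bin_index in enumerate(state_bins):
        density[bin_index][column] += 1


# Toy usage with a fixed reward sequence instead of a real environment.
toy_rewards = [0.5, 0.4, 0.1, 0.6]
episode = run_episode(lambda s: toy_rewards[s], max_steps=4, reward_threshold=0.2)

density = new_density_matrix()
update_density(density, [3] * 48)
```

The density matrix can then be plotted as a heat map to see which load levels the agent visits most often in each hour.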

Case Study
ASM Terni S.p.A. is a multi-utility in Terni municipality, an Italian city in the Umbria area, which manages operations for the administration of electricity, gas, and water networks and is responsible for waste collection and disposal. ASM, through its business unit Terni Distribuzione Elettrica, owns and operates the DN serving 65,000 users. ASM manages 65,000 smart meters, 700 secondary substations, and three PS. In total, 1276 LV solar plants and 53 MV PV plants make up Terni's distributed generation, with an installed power of 63.4 MVA; 40% of the DN's yearly energy consumption is met by these DERs. Additional information about the ASM DN can be found in [46][47][48], which, respectively, exploited the digital twin concept in power DNs, investigated false data impact, and analysed the integration of a hydrogen supply chain for feeding vehicles based on fuel cell technology.
In order to evaluate the effectiveness of the ML tools developed, a portion of the Terni MV network was used as a case study, as depicted in Figure 5. It consists of 14 nodes, one of which represents a PS while the others represent secondary substations, and 30 power lines. Each of the 13 secondary substations feeds a different capacity of industrial loads (650 kW of installed power in total) and domestic loads (3374 kW), and some of them have photovoltaic systems (3663 kW). Regarding the historical data from the last 5 years, a yearly average curve of industrial and domestic loads and PV production is available, with a granularity of 15 min. These curves are used as a baseline in the optimisation tool. A PMU is located in the PS, which is capable of monitoring voltage, active and reactive power, and other electrical parameters in real time; this sensor is used for forecasting analysis. Experiments were conducted using an Atos HPC cluster of 12 nodes with 720 cores each and a total memory of 2968 GB. The experiments ran on 16 cores and 128 GB of RAM and took days to complete.

Forecasting Results
To verify that the proposed ML architectures are valid, they were used to predict the voltage trend and the active and reactive power of the PMU located in a PS, a smart meter, and two power quality analysers. Forecasting was carried out by training the proposed models on a dataset collected over 18 months through the MQTT topic broadcast by the sensors. The predicted data have a time horizon of 24 h and a resolution of one hour; each forecast was based on the last 36 h of data averaged every hour, resulting in an input tensor of 36 values. The proposed models were trained with the hyper-parameters listed in Table 2. The optimiser, called Adam, is an algorithm for first-order gradient-based optimisation of stochastic objective functions, based on adaptive estimates of lower-order moments [49]. After training the model, inferences for forecasting the generated power were computed, and the predictions were compared with the actual measured power. Figure 6 shows the actual power data (orange line), inferences performed using the ML model (blue points), and the forecasting intervals with 90% confidence (blue area).
The forecasting intervals can be computed when the errors between the actual data and the model predictions present a distribution that can be considered Gaussian. To confirm that the errors came from a Gaussian distribution, they were subjected to normality tests: Shapiro-Wilk [50], Anderson-Darling [51], and D'Agostino-Pearson [52]. These are statistical hypothesis tests that check whether the data have specific properties. Two hypotheses are defined: the null and alternative hypotheses. The null hypothesis states that the data come from a normal distribution, while the alternative hypothesis states that the data follow a different distribution. The statistical test returns a probability known as the p-value. If the p-value is lower than the defined significance level (0.05 in this case), the null hypothesis must be rejected; otherwise, it cannot be rejected, and the data distribution can be assumed to be normal. Table 3 lists the p-values obtained.
Within the confidence intervals, the model learns the seasonal variations in the generated power quite well, as depicted in Figure 6, obtaining an MSE of 0.009 during training. As shown in Figure 6, there is good matching between the observed and predicted values within the confidence interval, which captures both the trend and seasonality in the future series behaviour.
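The way a 90% forecasting interval follows from Gaussian errors can be sketched as below. This is a simplified illustration, assuming zero-mean residuals with a constant standard deviation over the forecast horizon; the function name and the sample values are hypothetical, and the authors' exact procedure is not specified in the text.

```python
from statistics import NormalDist, stdev


def prediction_interval(forecast, residuals, confidence=0.90):
    """Symmetric interval around each forecast, assuming Gaussian errors."""
    sigma = stdev(residuals)                        # spread of past errors
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided quantile
    return [(f - z * sigma, f + z * sigma) for f in forecast]


# Illustrative residuals (actual minus predicted) on scaled data.
past_residuals = [-0.1, 0.1, -0.1, 0.1]
intervals = prediction_interval([0.5, 0.6], past_residuals)
```

For a 90% interval the two-sided quantile is about 1.645, so each bound lies roughly 1.645 standard deviations from the point forecast.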

Results of Grid Optimisation using the Demand Response
The grid optimisation model was trained with the agent's hyperparameters listed in Table 4. In our experiment, α = 1 was chosen because, from the DSO's point of view, it is more convenient to implement a DR mechanism that favours the self-consumption of local DER to reduce RPF and reduce demand from the grid. A high SCR means low grid losses. SSR, on the other hand, is more related to consumers and the possibility of becoming grid-independent. Initial experiments were conducted to determine each episode's optimal duration (i.e., #steps), which resulted in 2000 steps. Also, as described before, better optimisation results were obtained by applying a normalised discretisation of the state space. The results show the agent was exploring the state space much better than if normalised discretisation had not been applied. Figure 7 shows the matrix space of visited loads after completing the experiment.

Load threshold 30%
Initial experiments were conducted to determine the optimal duration of each episode (i.e., the number of steps), which resulted in 2000 steps. Also, as described before, better optimisation results were obtained by applying a normalised discretisation of the state space. The results show that the agent explored the state space much better than it did without normalised discretisation. Figure 7 shows the matrix space of visited loads after completing the experiment.
As a result of training, the reward improved (Figure 8) after applying the learnt actions to the users' demand load, resulting in a better SCR. The achieved SCR of around 52% is very high, considering that the portion of the grid in question has an average daily production of 23.6 MWh and a consumption of 12.9 MWh. Thus, the maximum SCR that can theoretically be achieved is 55% (12.9 MWh / 23.6 MWh), which occurs when no energy at all is absorbed from the grid. In Figure 8, five different episodes of the training are depicted.
In Figure 9, domestic loads are shown before (left) and after (right) optimisation, with the hour of the day on the x-axis. A similar trend is observed for industrial loads. The optimised load has a shape closer to the theoretical optimal load, which is bell-shaped with the load concentrated around noon. For the optimiser to reach the optimal shape, a far longer episode duration, beyond our computing capability, would be required to exhaust the exploration of the state space and compute the optimal policy. However, the initial training results, obtained within our computational constraints, show progress in the right direction.
Further experiments are required to bring the optimised SSR and SCR values closer to the theoretical ones. This can be accomplished by reducing the dimension of the state space so that, within the affordable computation time, the agent can exhaust the state space when searching for optimal user demand loads.
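The theoretical SCR bound quoted above can be checked with a short calculation. The sketch below is illustrative only and is not the code used in the study: it assumes the usual definitions (SCR as locally consumed energy divided by total generation, SSR as locally consumed energy divided by total load), and the function names `self_consumption_metrics` and `max_scr` are hypothetical.

```python
def self_consumption_metrics(generation_mwh, load_mwh):
    """Return (SCR, SSR) for two equally long per-interval energy profiles.

    Assumed definitions (not taken from the study's code):
      SCR = self-consumed energy / total generation
      SSR = self-consumed energy / total load
    """
    if len(generation_mwh) != len(load_mwh):
        raise ValueError("profiles must have the same length")
    # In each interval, local consumption is limited by both curves.
    self_consumed = sum(min(g, l) for g, l in zip(generation_mwh, load_mwh))
    total_generation = sum(generation_mwh)
    total_load = sum(load_mwh)
    scr = self_consumed / total_generation if total_generation else 0.0
    ssr = self_consumed / total_load if total_load else 0.0
    return scr, ssr


def max_scr(daily_generation_mwh, daily_load_mwh):
    """Upper bound on SCR: the whole load is covered by local generation."""
    return min(daily_generation_mwh, daily_load_mwh) / daily_generation_mwh


# Daily figures quoted in the text for the studied portion of the grid.
bound = max_scr(23.6, 12.9)
print(f"Theoretical maximum SCR: {bound:.1%}")

# Toy four-interval profile (made-up numbers, for illustration only).
scr, ssr = self_consumption_metrics([0, 2, 4, 2], [1, 1, 1, 1])
print(f"Toy profile: SCR = {scr:.3f}, SSR = {ssr:.3f}")
```

Under these assumptions, the bound evaluates to roughly 0.55, in line with the 55% stated above. Shifting load towards the hours with surplus generation raises the per-interval minimum and therefore moves the SCR towards this bound.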

Conclusions
This paper presents some of the services developed within the European IoT-NGIN project and their integration within the electricity infrastructure of ASM Terni. The aim was to show the application of ML tools within the DN and highlight the added value that can be obtained. Analytics for forecasting and grid optimisation were presented. The GRU-based forecasting, applied to a PMU, a smart meter, and two power quality analysers, predicted the active and reactive power and voltage values for the next 24 h. The grid optimisation tool, which consists of an electrical simulation part using pandapower and an RL-based optimisation part for the consumer curves, managed the decisions of the DR mechanisms on a portion of the medium voltage network (14 nodes) of ASM Terni. Technical details of the services were provided, accompanied by some results.
A limitation of this paper is its simplified approach to user participation in DR mechanisms: it does not take into account the real availability of flexibility resources (storage, loads that can be shifted over time, etc.), which was instead assumed to be a portion of the total load.
In future research, we plan to test additional ML models for application in these topics, compare the results, and apply the models to a larger portion of the electrical network. Furthermore, we plan to evaluate how the integration between the forecasting service and the optimisation service can be strengthened using single analytics that exploit the results provided by both services in a cascade.

Electronics 2023, 12, x FOR PEER REVIEW
entire DN. Reducing the number of actors simplifies the computational aspects of the problem and speeds up the solution.

Figure 3. Energy demand for domestic clusters.

Figure 4. Energy demand for industrial clusters.

Figure 5. Schematic diagram showing the portion of the 20 kV grid used as a case study. Some nodes are connected via different line sections, for example overhead and cable types, or with different typologies. For graphical clarity, these are not distinguished here.

Figure 6. Training results for generated power forecasting for 20 days of analysis in August 2022.

Figure 9. Domestic demand load before (left) and after (right) the training.

Table 2. Hyper-parameters for power forecasting models.

Table 3 lists the p-values obtained.

Table 3. Normality test results (p-values) for power forecasting models.

Table 4. Hyper-parameters for grid optimisation models.