Massive Generation of Customer Load Profiles for Large Scale State Estimation Deployment: An Approach to Exploit AMI Limited Data

The management of the distribution network is becoming increasingly important as the penetration of distributed energy resources increases. Reliable knowledge of the real-time status of the network is essential if algorithms are to be used to help distribution system operators define network configurations. State Estimation (SE) algorithms can produce such an accurate snapshot of the network state but, in turn, require a wide range of information, e.g., network topology, real-time measurements and power profiles of customers/producers. These profiles, which may in principle be provided by smart meters, are not always available due to technical limitations of the existing Advanced Metering Infrastructure (AMI) in terms of communication, storage and computing power. As a result, power profiles are only available for a subset of customers. The paper proposes an approach that overcomes these limitations: the remaining profiles required by SE algorithms are generated on the basis of customer-related information, by identifying clusters of customers with similar features, such as the same contract and pattern of energy consumption. For each cluster, a power profile estimator is generated using the long-term power profiles of a limited subset of customers, randomly selected from the cluster itself. The synthesized full power profile of each customer of the distribution network is then obtained by scaling the power profile estimator of the cluster to which the customer belongs by the monthly energy exchanged by that customer, a quantity that is easily available. The feasibility of the proposed approach was validated on the distribution grid of Unareti SpA, an Italian Distribution System Operator (DSO) operating in northern Italy and serving approximately one million customers.
The application of the proposed approach to the actual infrastructure shows some limitations in terms of the accuracy of the estimated customer power profiles. In particular, the proposed methodology cannot fully represent clusters composed of customers with a large variability in terms of power exchange with the distribution network. In any case, the root mean square error of the synthesized full power profile with respect to validation power profiles belonging to the same cluster is, in the worst case, on the order of 6.3%, while in the remaining cases it is well below 5%. Thus, the proposed approach represents a good compromise between accuracy in representing the behavior of customers on the network and the resources (in terms of computational power, data storage and communication) required to achieve those results.


Introduction
During the last decade, the power distribution network has been undergoing a deep transformation due to a change in the power generation paradigm (from a centralized to a distributed one, largely based on renewable sources) and due to the increasing conversion

• Analysis of the existing DSO information system, including AMI and DMS
• Definition of a procedure for power profile generators based on limited AMI data
• Formalization of the procedure for the synthesis of customer power profiles
• Validation of the approach on a large-scale distribution network (one million customers) with real data from an existing AMI system

The rest of the paper is structured as follows. In Section 2, the architecture and technical limits of existing AMI systems are described; then, the proposed procedure for the massive generation of customers' power profiles from the limited amount of data available from the AMI is introduced. Section 3 formally describes the three-phase process used to generate power profiles. In the first phase, customers are clustered into homogeneous groups based on their contractual features. In the second phase, a limited number of customers is randomly selected for each cluster; their long-term power profiles are used to synthesize a normalized profile, which represents the average behavior of the customers of that cluster. Finally, the normalized profile is scaled by the actual monthly energy exchanged by each customer of the cluster to synthesize a power profile for each customer. The validation of the proposed approach on a real distribution network is presented in Section 4, where the customer data made available by Unareti S.p.A. are processed to generate one million customer power profiles. The effectiveness and accuracy of the proposed solution are discussed in Section 5. Finally, the results are summarized in Section 6.

Review of the 1st Generation AMI in Italy
AMIs are now deployed in several European countries [24,25] and in the rest of the world [26][27][28][29]. In Italy, SM installation started in 2003 and became mandatory with resolution 292/06 of the Italian Regulatory Authority for Energy, Networks and Environment (ARERA). SMs are mainly used for customer relationship management.
Concerning the measurement and collection of real-time measurements and power profiles, AMIs generally present some constraints, inherited from the most common implementation architecture (Figure 1), which is composed of:
• Electronic meters for each customer
• A Meter Data Concentrator (MDC), installed in secondary substations, collecting customers' data
• An Automatic Metering Management (AMM) system, collecting data from MDCs
Contractual information about customers is stored in the Customer Information System (CIS), which is connected to the AMM.
When, for example, the contractual power of a customer changes, a new task concerning the contract is scheduled in the CIS; this creates a reconfiguration job for the AMM, which in turn reconfigures the specific SM with the new value.
The first constraint of this architecture stems from Narrow Band Power Line Communication (NB-PLC), the most common technology used by the MDC to read data from SMs. Despite providing the best trade-off between cost and performance in urban and semi-urban areas, it has relatively low throughput and high latency [30]. For this reason, today's SMs calculate 15-min power profiles and typically transfer them to the MDC only once a month, rather than in real time.
The technology used for the AMM represents the second constraint. Storing the profiles of all customers would require a huge amount of storage, which would be extremely expensive with a non-cloud solution such as those commonly used for the first generation of AMIs. Regulations require only the power profiles of customers with a contractual power above 55 kW to be collected and stored, and these represent a limited number of customers compared with the overall distribution network. Generally, the DSO does not download the remaining power profiles, which include residential and small industrial customers with smaller contractual powers (e.g., 16.0, 6.6 or 3.3 kW). To summarize, state-of-the-art AMIs provide power profiles for about 1% of the customers and, in any case, the infrastructure is not designed and sized to manage a large quantity of power profiles. Thus, applications such as SE algorithms, which require many power profiles, must define approaches to generate them while limiting the amount of additional data to be recovered from the AMI.
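To make the storage and communication constraint more concrete, the following Python sketch counts the raw 15-min samples implied by collecting one year of profiles. The 8-byte sample size is a hypothetical assumption (timestamps, indices and protocol overhead are not counted), so the figures are illustrative lower bounds, not measurements of any real AMM.

```python
SAMPLES_PER_DAY = 24 * 60 // 15   # 96 samples per day at 15-min resolution
DAYS_PER_YEAR = 365
BYTES_PER_SAMPLE = 8              # hypothetical: one 64-bit value per sample, no overhead


def yearly_samples(customers: int) -> int:
    """Number of 15-min samples in one year of profiles for `customers` smart meters."""
    return customers * SAMPLES_PER_DAY * DAYS_PER_YEAR


def yearly_gigabytes(customers: int) -> float:
    """Raw payload size in GB (10**9 bytes) under the BYTES_PER_SAMPLE assumption."""
    return yearly_samples(customers) * BYTES_PER_SAMPLE / 1e9


if __name__ == "__main__":
    # About 1% of one million customers (profiles available today) versus all of them.
    for n in (10_000, 1_000_000):
        print(f"{n:>9,} customers: {yearly_samples(n):,} samples/year, "
              f"~{yearly_gigabytes(n):.1f} GB/year")
```

Each meter contributes 35,040 samples per year, so the volume, and the traffic over NB-PLC, grows linearly with the number of profiles collected; this is why the profile generator limits how many profiles are downloaded instead of collecting all of them.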

Profiles Generator
Within the DSO supervision infrastructure, the SE function is part of the Distribution Management System (DMS), the information infrastructure used to monitor, manage and configure the distribution network. SE algorithms are generally included as optional add-ons in commercial DMSs, but they can be improved by the DSO if necessary [31]. In any case, the analysis and study of SE algorithms are out of the scope of the current research, which focuses on the source of data, as detailed in the Introduction. The application of SE algorithms at the LV level requires as much data as possible about the distribution grid, which can be obtained by processing the AMI data stored in the AMM. Nevertheless, the existing AMI infrastructure does not provide power profiles for every connected customer, and only a limited number of additional profiles can be recovered from it, due to the technical limits highlighted in the previous section. Additional functions are thus required in the DSO information system to recover the required data. The proposed solution is based on the deployment of a profiles generator block, which uses the limited data collected by the AMI system to generate the information required to feed the SE in the DMS of the DSO, as shown in Figure 2. This block can be considered an add-on functionality of the DMS.
More in detail, customers' data are clustered to identify different classes of customers on the distribution network under analysis. Each cluster is formed by customers with similar characteristics from the DSO point of view. The profile generation function uses the cluster information to download from the AMM only a limited number of long-term power profiles for each class of customer. This information is then used to estimate the power profiles of all customers associated with the same cluster, and the resulting profiles are provided to the SE algorithm of the DMS.
The diagram in Figure 3 shows the details of the process followed to calculate the power profiles of all customers. As a first step, customers' data are analyzed to identify clusters, i.e., classes of customers sharing the same set of properties (Customer Clustering). Then, the power profiles of a sample of customers from each class are downloaded from the AMI system to generate a normalized power profile for each cluster (Normalized Profile Generation). Finally, the power profiles of all customers of each cluster are calculated from the normalized power profiles estimated in the previous step (Full Profile Generation). The process is formally described in the following section.

Customer Clustering
The first problem in customer clustering is that the number and features of clusters are not known a priori. In fact, the choice of the number of clusters has a strong impact both on the time required to process a dataset and on the association of customers to the correct cluster. This is why several techniques (among them, those referenced in the Introduction) have the definition of clusters as their primary task. However, in real cases, where the set of power profiles is incomplete, approximations must be defined and accepted.
The approximation proposed in this paper is to define clusters based on the customers' contract information (contained in the CIS) and then to perform further refinement to reach a target accuracy in the model. Several contract features can be taken into consideration for the clustering process. These parameters are normally stored in CIS systems; therefore, clustering based on them is a straightforward and efficient procedure for real-life applications. Besides these parameters, the total energy consumption of each customer in a given month is another piece of information available in the CIS. Based on it, the total annual consumption, E, of each customer can also be easily obtained.
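A minimal Python sketch of contract-based grouping, assuming each CIS record is reduced to a tuple of contract features plus twelve monthly energies. The `Customer` type, its field names and the feature values in the example are hypothetical and are not taken from the DSO's actual CIS schema. Customers sharing the same feature tuple fall into the same cluster, and the annual energy E is the sum of the monthly values.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Customer:
    customer_id: str
    contract: Tuple[str, ...]               # hypothetical contract-feature tuple from the CIS
    monthly_energy_kwh: Tuple[float, ...]   # 12 monthly energies from the CIS

    @property
    def annual_energy_kwh(self) -> float:
        # Total annual consumption E, obtained from the monthly CIS data.
        return sum(self.monthly_energy_kwh)


def cluster_by_contract(customers: List[Customer]) -> Dict[Tuple[str, ...], List[Customer]]:
    """Group customers whose contract-feature tuples are identical."""
    clusters: Dict[Tuple[str, ...], List[Customer]] = defaultdict(list)
    for customer in customers:
        clusters[customer.contract].append(customer)
    return dict(clusters)


if __name__ == "__main__":
    # Hypothetical feature tuples, for illustration only.
    domestic = ("Connected", "LV", "Domestic", "Consumer", "(3.3, 4.4]")
    business = ("Connected", "LV", "Non-domestic", "Consumer", "(6.6, 16.5]")
    data = [
        Customer("c1", domestic, (200.0,) * 12),
        Customer("c2", domestic, (150.0,) * 12),
        Customer("c3", business, (900.0,) * 12),
    ]
    for key, members in cluster_by_contract(data).items():
        print(key, [c.customer_id for c in members], sum(c.annual_energy_kwh for c in members))
```

Because the key is an exact match on the feature tuple, the grouping is a single linear pass over the CIS records, which is what makes contract-based clustering cheap compared with clustering on the profiles themselves.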

More formally, let us consider a set of N customers; each customer, $X_i$, can be described as a couple formed by its contract features and its energy consumption. The computational effort of the clustering operation depends on the number of clusters, k. In this specific case, the maximum number of clusters, $k_{\max}$, is given by the H properties of each customer used to define the clusters:
$$k_{\max} = \prod_{h=1}^{H} |F_h| \qquad (2)$$
where $F_h$ is the set of values that the hth property can assume and $|\cdot|$ represents the cardinality operator of each set. It is evident from (2) that the total number of clusters can increase quickly as the number of features used to cluster the profiles increases. Fortunately, the cardinality of some sets of features is limited, and some combinations of parameters are not of practical interest. In this paper, an iterative approach is proposed to limit the total number of clusters and, consequently, the resources required to estimate the power profiles. This approach is based on recursive clustering. During the first round, only the parameters S, V, T, R and P are used to classify the customers and to generate the first subset of clusters. In Section 4, it is shown that limiting the first level of clustering to this subset of parameters is valid in most cases.
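Equation (2) can be illustrated with a few lines of Python. The feature names reuse the letters S, V, T, R and P from the text, but the value sets below are hypothetical placeholders chosen only to show how quickly the bound grows; `math.prod` requires Python 3.8 or later.

```python
from math import prod
from typing import Dict, Iterable


def max_clusters(feature_values: Dict[str, Iterable[str]]) -> int:
    """Upper bound on the number of clusters: product of the cardinalities |F_h|."""
    return prod(len(set(values)) for values in feature_values.values())


if __name__ == "__main__":
    # Hypothetical value sets for the five first-level features.
    first_level = {
        "S": {"Connected", "Disconnected"},
        "V": {"LV", "MV"},
        "T": {"Domestic", "Non-domestic"},
        "R": {"Consumer", "Producer", "Prosumer"},
        "P": {"(0, 6.6]", "(6.6, 16.5]", "(16.5, 55]", ">55"},
    }
    print(max_clusters(first_level))   # 2 * 2 * 2 * 3 * 4 = 96
    # Two more hypothetical features with 5 values each multiply the bound by 25.
    extended = dict(first_level, A=set("abcde"), B=set("vwxyz"))
    print(max_clusters(extended))      # 2400
```

The product is only an upper bound: feature combinations that never occur in the CIS produce no cluster, which is why the first level of clustering in Section 4 yields far fewer clusters (29).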
The criteria to decide which clusters need a second level of clustering are related to the overall energy associated with each cluster. In detail, considering the cluster k of cardinality M (i.e., composed of M customers), the total energy, $E_k$, associated with the cluster k is defined as:
$$E_k = \sum_{m=1}^{M} E_m \qquad (3)$$
where $E_m$ represents the total annual energy of the mth customer of the cluster k, whose value is stored in the CIS. Considering the distribution of the $E_k$ population, whose standard deviation is $\sigma_E$, the clusters fulfilling the condition
$$E_k > 3\sigma_E \qquad (4)$$
are those that need the second level of clustering. Clusters characterized by a large energy exchange with the distribution network provide a larger contribution to the SE algorithms; therefore, it is reasonable to balance the total energy associated with each cluster using sub-clustering. A more detailed clustering makes the customers more homogeneous, improving the accuracy of the estimation of the customers' power profiles. At the same time, the sub-clustering approach guarantees a proper trade-off between the accuracy of the power profile estimator and the required computational effort.
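The selection rule can be sketched as follows. Two points are assumptions: Condition (4) is read as $E_k > 3\sigma_E$, which matches the numbers reported in Section 4 ($\sigma_E$ = 0.48 GWh, threshold 1.44 GWh), and $\sigma_E$ is computed as the population standard deviation of the cluster energies, since the text does not state whether the population or the sample estimator is used.

```python
from statistics import pstdev
from typing import Dict, List


def clusters_to_refine(cluster_energy: Dict[str, float], n_sigma: float = 3.0) -> List[str]:
    """Return the clusters whose total annual energy E_k exceeds n_sigma * sigma_E."""
    sigma_e = pstdev(list(cluster_energy.values()))   # population std deviation (assumption)
    threshold = n_sigma * sigma_e
    return sorted(k for k, e_k in cluster_energy.items() if e_k > threshold)


if __name__ == "__main__":
    # Toy cluster energies in GWh: one dominant cluster and nine small ones.
    energies = {"A": 10.0, **{f"C{i}": 1.0 for i in range(9)}}
    print(clusters_to_refine(energies))
```

In the toy example the standard deviation is 2.7 GWh, so only cluster A (10 GWh) exceeds the 8.1 GWh threshold and is passed to the second level of clustering.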

Normalized Profile Generation
Considering a cluster, k, of cardinality M (where M < N), the estimator, $\hat{Y}_k$, of the power profile (in the following, power profile estimator) is defined as:
$$\hat{Y}_k(t) = \frac{1}{Q} \sum_{q=1}^{Q} y_{k,q}(t) \qquad (5)$$
where $y_{k,q}$ is the qth randomly selected power profile of the cluster k; i.e., the estimator is obtained by randomly selecting a subset of Q power profiles of the cluster k and computing their statistical average. Q is set as $\min(Q_S, M)$ and corresponds to the number of profiles processed to obtain the estimated profile for each cluster. $Q_S$ is a design parameter, and its value represents a trade-off between accuracy and the limited number of available profiles: the higher $Q_S$, the more accurate the estimation of the power profile, but if $Q_S$ is too high, too few profiles remain to calculate the validation error. In the present paper, $Q_S = 100$.
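A minimal sketch of Equation (5) in plain Python, assuming each power profile is a list of samples on a common 15-min time grid. The function name and the choice to return the indices of the selected profiles (so that the remaining ones can be used for validation) are illustrative, not part of the original method description.

```python
import random
from typing import List, Optional, Sequence, Tuple


def estimate_profile(
    profiles: Sequence[Sequence[float]],
    q_s: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[int]]:
    """Average Q = min(q_s, M) randomly selected profiles of one cluster (Equation (5)).

    Returns the estimator and the indices of the profiles used to build it.
    """
    rng = rng or random.Random()
    m = len(profiles)
    q = min(q_s, m)
    chosen = rng.sample(range(m), q)            # random subset without replacement
    n_samples = len(profiles[0])
    estimator = [sum(profiles[i][t] for i in chosen) / q for t in range(n_samples)]
    return estimator, chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy cluster of 250 day-long profiles (96 samples each), in kW.
    cluster = [[rng.uniform(0.0, 3.0) for _ in range(96)] for _ in range(250)]
    y_hat, used = estimate_profile(cluster, q_s=100, rng=rng)
    print(len(used), round(max(y_hat), 2))
```

When `q_s` is at least the cluster size, every profile is used and none is left for validation, which is the situation the text warns about when $Q_S$ is set too high.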
The index generally used in statistics to evaluate the quality of an estimator is the Root Mean Square Error (RMSE), i.e., the square root of the second sample moment of the difference between the expected values and the observed values. This quantity is referred to as "residuals" when calculations are performed over the data sample used for estimation, and as "errors" (or prediction errors) when computed out-of-sample.
Considering the power profile estimator of the cluster k, $\hat{Y}_k$, the total residual, $r_k(\hat{Y}_k)$, is calculated from the partial residuals $r_{k,q}$, where $r_{k,q}$ represents the residual of the qth power profile of the cluster k. The normalization factor $P_k = \sup\{P_m\}_{m=1}^{M}$ is the upper bound of the contractual power among the M customers belonging to the cluster k; it is introduced to compare residuals among different clusters, which in general have different contractual powers.
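The RMSE-based residual can be sketched as below. The per-profile value (RMSE over all samples, divided by $P_k$) follows the description above; how the partial values are combined into the total residual is not spelled out in the text, so the quadratic mean used here is an assumption. Fed with the validation profiles instead of the training ones, the same functions give the partial and total errors.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def partial_rmse(profile: Sequence[float], estimator: Sequence[float], p_k: float) -> float:
    """RMSE between one profile and the estimator, normalized by the cluster power P_k."""
    n = len(profile)
    return sqrt(sum((y - e) ** 2 for y, e in zip(profile, estimator)) / n) / p_k


def total_rmse(
    profiles: Sequence[Sequence[float]], estimator: Sequence[float], p_k: float
) -> Tuple[float, List[float]]:
    """Total value (quadratic mean of the partial values, an assumption) and the partials."""
    partials = [partial_rmse(p, estimator, p_k) for p in profiles]
    total = sqrt(sum(r * r for r in partials) / len(partials))
    return total, partials


if __name__ == "__main__":
    contract_powers_kw = [3.0, 4.0, 4.0]
    p_k = max(contract_powers_kw)          # the sup of a finite set is its maximum
    training = [[1.0, 1.0], [3.0, 3.0]]
    estimator = [2.0, 2.0]
    r_k, r_kq = total_rmse(training, estimator, p_k)
    print(r_k, r_kq)                        # 0.25 [0.25, 0.25]
```

Dividing by $P_k$ turns the RMSE into a fraction of the cluster's contractual power, which is what makes values such as 5.4% and 4.97% in Section 4 comparable across clusters.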
Using a limited number, Q, of power profiles during the estimation process could polarize (i.e., bias) the power profile estimator, $\hat{Y}_k$. Thus, a subset, $W_k$, of power profiles is randomly selected from each cluster k to validate the estimator. In this way, the polarization of the estimator with respect to the behavior of specific power profiles is largely avoided. The number of power profiles required for the validation is obtained as $W_k = \min(W, M_k - Q_k)$, where $M_k$ is the cardinality of the cluster k and $Q_k$ is the number of power profiles used to estimate $\hat{Y}_k$. Considering the trade-off between the statistical validity required by the validation phase and the need to reduce the total number of power profiles to be recovered from the AMI, the design parameter W is set to 30. The total error, $e_k(\hat{Y}_k)$, of the power profile estimator of the cluster k with respect to the $W_k$ validation profiles is calculated from the partial errors $e_{k,w}$, where $e_{k,w}$ is the error of the wth validation power profile of the cluster k. The power profile estimator of the cluster k is considered valid if relationship (8), which compares its total error with its total residual, is verified, with ε being the tolerated deviation of the validation power profiles with respect to the estimator. The threshold ε is used during the validation phase to assess whether the estimated power profile represents the cluster to which it belongs with the proper accuracy, avoiding possible polarization or over-estimation effects due to the limited number of power profiles used. In our analysis, ε was set to 0.05. If the deviation of the validation power profiles is greater than this value, the power profile selection mechanism polarizes the estimator $\hat{Y}_k$. Thus, the power profile selection process has to be executed again to define a new training set (see Figure 3), increasing the number of power profiles to be recovered from the AMI. After the validation, each estimated profile, $\hat{Y}_k$, has to be normalized by its own energy.
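The validation step can be sketched as follows. The out-of-sample subset follows $W_k = \min(W, M_k - Q_k)$. The exact form of relationship (8) is not reproduced in the text; the check used here, $e_k - r_k \le \varepsilon$, is an assumption inferred from the two examples in Section 4, where the condition is reported as satisfied for $e_k$ = 6.4%, $r_k$ = 5.4% (ABAB) and for $e_k$ = 4.9%, $r_k$ = 4.97% (AAAA-BA).

```python
import random
from typing import Iterable, List, Optional


def pick_validation(m_k: int, used: Iterable[int], w: int = 30,
                    rng: Optional[random.Random] = None) -> List[int]:
    """Randomly pick W_k = min(W, M_k - Q_k) profiles not used for the estimation."""
    rng = rng or random.Random()
    used_set = set(used)
    remaining = [i for i in range(m_k) if i not in used_set]
    w_k = min(w, len(remaining))
    return rng.sample(remaining, w_k)


def estimator_is_valid(total_error: float, total_residual: float, eps: float = 0.05) -> bool:
    """Assumed form of relationship (8): the error may exceed the residual by at most eps."""
    return total_error - total_residual <= eps


if __name__ == "__main__":
    print(estimator_is_valid(0.064, 0.054))        # cluster ABAB
    print(estimator_is_valid(0.049, 0.0497))       # sub-cluster AAAA-BA
    print(len(pick_validation(130, range(100))))   # only 30 unused profiles remain
```

When the check fails, the procedure described above draws a new training set, which increases the number of profiles to be downloaded from the AMI.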
Each estimated profile, $\hat{Y}_k$, can be partitioned monthly as:
$$\hat{Y}_k = \{\hat{Y}_k^{s_1}, \ldots, \hat{Y}_k^{s_{12}}\}$$
where the set $s = \{s_1, \ldots, s_{12}\}$ represents the months of the considered year (note that $s_x$ is not an integer index but the set of days of the xth month). The energy of the sth month, $E_k^s$, of the estimated power profile, $\hat{Y}_k$, is defined as:
$$E_k^s = \sum_{t \in s} \hat{Y}_k^s(t)\,\Delta t$$
where $\Delta t$ is the time interval between two consecutive power profile samples (i.e., 15 min in the considered case). The normalized estimator is thus defined as:
$$\hat{Y}_{k,N}^s(t) = \frac{\hat{Y}_k^s(t)}{E_k^s}, \quad t \in s$$
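A plain-Python sketch of the monthly normalization, assuming the estimator is given as a list of 15-min samples in kW together with a parallel list mapping each sample to its month; energies are then in kWh and the normalized estimator in 1/h. The zero-energy guard is an addition for robustness, not part of the described method.

```python
from typing import Dict, List, Sequence, Tuple

DT_HOURS = 0.25  # Δt = 15 min


def normalize_monthly(
    estimator_kw: Sequence[float], months: Sequence[int]
) -> Tuple[List[float], Dict[int, float]]:
    """Divide each sample by the energy E_k^s of its month; return (normalized, energies)."""
    energy: Dict[int, float] = {}
    for p, s in zip(estimator_kw, months):
        energy[s] = energy.get(s, 0.0) + p * DT_HOURS   # E_k^s = sum over t in s of Y(t) * Δt
    for s, e in energy.items():
        if e == 0.0:
            raise ValueError(f"month {s} has zero energy; cannot normalize")
    normalized = [p / energy[s] for p, s in zip(estimator_kw, months)]
    return normalized, energy


if __name__ == "__main__":
    y_hat = [4.0, 4.0, 2.0, 2.0]      # toy estimator: two samples in month 1, two in month 2
    month_of = [1, 1, 2, 2]
    y_norm, e_month = normalize_monthly(y_hat, month_of)
    print(e_month)                     # {1: 2.0, 2: 1.0}
    print(y_norm)                      # [2.0, 2.0, 2.0, 2.0]
```

By construction, the normalized samples of each month satisfy $\sum_{t \in s} \hat{Y}_{k,N}^s(t)\,\Delta t = 1$, which is what allows the estimator to be rescaled by any customer's monthly energy.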

Full Profiles Generation
The third step generates an estimated power profile for each customer, using the reference normalized estimator, $\hat{Y}_{k,N}$, obtained in the previous step. The power profile of the mth customer of a cluster k, $\hat{Y}_{k,m}$, is obtained by scaling the normalized estimator by the energy consumed by that customer in each month of the year, data that are directly available from the CIS.
Like $\hat{Y}_k$, each normalized estimator $\hat{Y}_{k,N}$ can be partitioned monthly as:
$$\hat{Y}_{k,N} = \{\hat{Y}_{k,N}^{s_1}, \ldots, \hat{Y}_{k,N}^{s_{12}}\}$$
where $s = \{s_1, \ldots, s_{12}\}$ again represents the months of the considered year ($s_x$ being the set of days of the xth month). The full profile associated with the mth customer of the cluster k is thus defined as:
$$\hat{Y}_{k,m}^s(t) = E_{k,m}^s \, \hat{Y}_{k,N}^s(t), \quad t \in s$$
where $E_{k,m}^s$ represents the total energy consumed by the mth customer of the cluster k during the sth month. This procedure is applied to each customer of each cluster k obtained in Step 1.
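The scaling step can be sketched in the same plain-Python style, assuming the normalized estimator is available as a list of samples (in 1/h) with a parallel month index, and that the customer's CIS monthly energies are given as a dictionary keyed by month; these data layouts are illustrative. A quick consistency check is that the synthesized profile reproduces the customer's monthly energies.

```python
from typing import Dict, List, Sequence

DT_HOURS = 0.25  # Δt = 15 min


def full_profile(normalized: Sequence[float], months: Sequence[int],
                 monthly_energy_kwh: Dict[int, float]) -> List[float]:
    """Y_{k,m}^s(t) = E_{k,m}^s * Y_{k,N}^s(t): scale by the customer's monthly energy."""
    return [n * monthly_energy_kwh[s] for n, s in zip(normalized, months)]


def monthly_energy(profile_kw: Sequence[float], months: Sequence[int]) -> Dict[int, float]:
    """Energy per month of a profile sampled every DT_HOURS."""
    totals: Dict[int, float] = {}
    for p, s in zip(profile_kw, months):
        totals[s] = totals.get(s, 0.0) + p * DT_HOURS
    return totals


if __name__ == "__main__":
    y_norm = [2.0, 2.0, 2.0, 2.0]      # toy normalized estimator (1/h), two samples per month
    month_of = [1, 1, 2, 2]
    cis_energy = {1: 300.0, 2: 150.0}   # customer's monthly energies from the CIS (kWh)
    y_customer = full_profile(y_norm, month_of, cis_energy)
    print(y_customer)                              # [600.0, 600.0, 300.0, 300.0]
    print(monthly_energy(y_customer, month_of))    # {1: 300.0, 2: 150.0}
```

Since only twelve numbers per customer are needed at this stage, the full profiles of all customers can be synthesized without downloading any further data from the AMI.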

Results
The aim of this section is to validate the approach described in Section 3 under realistic conditions. The AMI system of Unareti S.p.A. was used as the source of the data from which the power profile estimators were computed. The following analysis was performed on the distribution network of the Lombardy region, in northern Italy, which serves approximately one million customers. The clustering phase was based on real customer data downloaded from the CIS database of the DSO, part of the Distribution Management System (DMS). Unless otherwise specified, the results were obtained from data collected during 2019-2020 (12 months).

First Level of Clustering
This first level of clustering is applied to all customers and classifies them based on a subset of the features available in the CIS: contract features S ("Connected" case only), V, T, R and P. The first level of clustering results in 29 clusters, as shown in Table 1. As the data show, customers are unevenly distributed among the clusters. In particular, the cluster AAAA, i.e., the one associated with "LV, Domestic, Consumer, Contract power less than 6.6 kW", groups approximately 80% of the total number of customers. As shown in Figure 4, the other clusters share the remaining 20%, with the cluster ABAA alone grouping approximately another 10% of customers. Nevertheless, given the specific application (i.e., grid SE), the cardinality of a cluster is not the main parameter to be considered during the clustering phase. From an SE point of view, it is more relevant to consider the total annual energy exchanged by each cluster, as it is an index of the impact of that cluster on the grid behavior. This means that clusters with a significant total annual energy are the candidates for a further clustering phase. More formally, the standard deviation of the total energy exchanged per cluster is 0.48 GWh, which means that the clusters with a total energy per year above 1.44 GWh (i.e., three times the standard deviation, as defined by Condition (4)) should be further processed.
As can be seen from the total annual energy distribution shown in Figure 5, this is the case for clusters AAAA and BBAD, which together represent approximately half of the total energy exchanged with the power grid. The rest of the energy is distributed almost equally among seven clusters. However, the BBAD cluster is composed of customers with a contractual power above 55 kW; for this class, as recalled in Section 2.1, power profiles are available for each customer, and the power profile generation process can only be used to compensate for missing data in the records. Thus, only the AAAA cluster requires sub-clustering.

Second Level of Clustering
The second level of clustering is performed only on cluster AAAA, which includes 80.1% of all customers (S = Connected; V = LV; T = Domestic; R = Consumer; P = (0.0, 6.6] kW) and 21% of the total energy consumed per year by the customers of the DSO.
Two further features are considered for the sub-clustering. The sub-clustering splits the cluster AAAA into 20 sub-clusters. The percentage of customers and the percentage of total annual energy consumed by each sub-cluster are summarized in Table 2. As highlighted by the data in the table, the sub-cluster AAAA-BA groups 67.91% of the customers of the AAAA cluster and represents approximately 65% of the total energy consumption of the original cluster. This result is expected, since this sub-cluster groups the most common contract (contractual power P in (3.3, 4.4] kW) in the area of the city of Milan, the most populous city served by the DSO.
Thus, the clusters considered for generating the normalized estimators are those listed in Tables 1 and 2.

Normalized Profile Generation
The identification of the customer clusters is a prerequisite for the power profile estimation phase. The power profiles of each cluster k are processed to produce the power profile estimator, $\hat{Y}_k$, as output. As mentioned in Section 3, two metrics are used to validate the quality of this estimator:

• The residual describes the variability of the power profile estimator with respect to the power profiles used to calculate it.
• The error describes the variability of the power profile estimator with respect to the validation power profiles, which differ from those used to calculate it.
In the following, a Level 1 cluster (ABAB) and a Level 2 sub-cluster (AAAA-BA) are presented and analyzed as examples of this process. These results are then compared with an algorithm that does not take the customers' clusters into account. Let us consider the cluster ABAB. $Q_S = 100$ power profiles are used to compute the power profile estimator of the cluster, following Equation (5). The partial residual is shown in Figure 6, where it is normalized by the contractual power of the cluster ($P_{ABAB}$ = 16.5 kW). As shown in the figure, it is generally within 5%, although some of the power profiles used during the estimation exhibit larger values (between 10% and 20% of the contractual power). The total residual, $r_k(\hat{Y}_k)$, of the cluster ABAB is 5.4%.
After the elaboration phase, the power profile estimator is validated against $W_{ABAB}$ = 30 power profiles, which differ from those used in the previous phase. Under this condition, it is possible to identify any polarization effects. The partial error of the power profile estimator of cluster ABAB with respect to the validation power profiles is shown in Figure 7, where it is normalized by the contractual power of the cluster, $P_{ABAB}$. The partial error is generally below 5%; it is above 10% in a few cases (in one case around 25%). The total error, $e_k(\hat{Y}_k)$, is 6.4%. Condition (8) is satisfied, since the difference between the total error and the total residual (6.4% − 5.4% = 1.0 percentage point) is lower than ε = 5%. Thus, the power profile estimator of the cluster ABAB is not polarized. Figure 8 shows the power profile estimator of class ABAB during 11-25 February 2020.

Example of Cluster AAAA-BA
The same approach is used for the sub-cluster AAAA-BA, with $Q_S = 100$, $W_{AAAA\text{-}BA} = 30$ and the contractual power $P_{AAAA\text{-}BA}$ = 4 kW used for normalization. The partial residual of the power profile estimator and the related partial error are shown in Figures 9 and 10, respectively. The total residual, $r_k(\hat{Y}_k)$, is 4.97%, while the total error, $e_k(\hat{Y}_k)$, is 4.9%. Condition (8) is satisfied, since the error is lower than the residual. This can be explained by the fact that the validation was performed on fewer power profiles (30) than the estimation (100). Figure 11 shows the power profile estimator of class AAAA-BA during 11-25 February 2020.

Reference Algorithm
Let us consider a subset of 150 power profiles randomly selected among the customers with contract power below 16.5 kW, who represent a large part of the distribution network. In this case, the algorithm skips the clustering phase: the 150 randomly selected power profiles are used to compute a single power profile estimator for the whole considered network, following Equation (5). The partial residual is shown in Figure 12, normalized by the upper bound of the contract power of the considered subset (i.e., P = 16.5 kW). As shown in the figure, it is generally below 20%, although some of the power profiles used during the estimation exhibit much larger values (up to 70% of the contract power). The total residual, $r(\hat{Y})$, is 18.2%, approximately four times the values obtained with the proposed approach.

After the estimation phase, the power profile estimator is validated against 150 further power profiles that differ from those used in the previous phase, so that any polarization effect can be detected. The partial error of the power profile estimator with respect to the validation power profiles is shown in Figure 13, normalized by the upper bound of the contract power of the considered subset (i.e., P = 16.5 kW). The partial error is generally below 20%, and the total error is 16.88%.
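Equation (5) is not restated here. Since the estimator is described as representing the average behavior of the customers in a cluster, the sketch below assumes it is the sample-by-sample mean of the estimation profiles, and computes each partial residual as the RMS distance normalized by the upper bound of the contract power (16.5 kW for the reference algorithm). The helper names and the two toy profiles are hypothetical.

```python
import math

def estimate_profile(profiles):
    """Sample-by-sample mean of the estimation profiles (assumed form of Eq. (5))."""
    n = len(profiles[0])
    if any(len(p) != n for p in profiles):
        raise ValueError("all profiles must have the same number of samples")
    return [sum(p[t] for p in profiles) / len(profiles) for t in range(n)]

def normalized_rms(a, b, p_upper):
    """RMS distance between two profiles, in % of the power bound p_upper."""
    return 100.0 * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) / p_upper

def residuals(profiles, p_upper):
    """Return the estimator, the partial residuals and their quadratic mean."""
    y_hat = estimate_profile(profiles)
    partial = [normalized_rms(y_hat, y, p_upper) for y in profiles]
    total = math.sqrt(sum(r * r for r in partial) / len(partial))
    return y_hat, partial, total

y_hat, partial, total = residuals([[0.0, 2.0], [2.0, 0.0]], p_upper=16.5)
print(y_hat)  # [1.0, 1.0]
```

Without clustering, profiles with very different consumption levels are averaged together, which is why the residual of the reference algorithm is so much larger than the per-cluster residuals.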

General Results
The total residual and error of the power profile estimator, $\hat{Y}_k$, for a subset of the clusters (obtained at both Level 1 and Level 2 of sub-clustering) are summarized in Table 3. In all cases, Condition (8) is satisfied; that is, the total error obtained during the validation phase of each cluster's power profile estimator always lies within the 5% tolerance of the total residual obtained during the estimation phase. This means that the power profile estimators of the clusters shown in the table are not polarized by the power profiles used during the estimation phase. Compared with an algorithm that skips the clustering phase, the clustering approach significantly reduces the error of the power profile estimator at the cost of a limited number of additional power profiles.

Table 3. The residual, $r_{k,P}(\hat{Y}_k)$, and the error, $e_{k,P}(\hat{Y}_{k,m})$, of the power profile estimator $\hat{Y}_k$.

Full Profile Generation
The power profile estimator $\hat{Y}_k$ of each cluster k is normalized by its monthly energy. The resulting normalized estimator is then scaled by the monthly energy exchanged by the mth customer ($m \in [1, M]$, where M is the cardinality of cluster k) to synthesize the M full power profiles, $\hat{Y}_{k,m}$.
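A minimal sketch of this normalization-and-scaling step, assuming that profiles are lists of average power values in kW sampled at a fixed interval (15 minutes here, a hypothetical choice), so that energy is the sum of the samples times the interval length. The function names, customer identifiers and energies are made up for illustration.

```python
def normalize_estimator(estimator_kw, dt_hours):
    """Divide the estimator by its own energy (kWh) over the horizon -> unit-energy shape."""
    energy_kwh = sum(estimator_kw) * dt_hours
    if energy_kwh <= 0:
        raise ValueError("the estimator must have positive energy")
    return [p / energy_kwh for p in estimator_kw]

def full_profiles(estimator_kw, monthly_energy_kwh, dt_hours=0.25):
    """Scale the normalized estimator by each customer's monthly energy (kWh)."""
    shape = normalize_estimator(estimator_kw, dt_hours)
    return {customer: [energy * s for s in shape]
            for customer, energy in monthly_energy_kwh.items()}

# Toy two-sample horizon; a real estimator would cover a whole month.
profiles = full_profiles([1.0, 3.0], {"customer_1": 2.0, "customer_2": 0.5})
print(profiles["customer_1"])  # [2.0, 6.0]
```

By construction, each synthesized profile has the same shape as the cluster estimator and exactly the monthly energy billed to its customer, which is the only customer-specific datum required.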
At this stage, no further corrections can be made to the generated full profiles. Nonetheless, it is worth evaluating the overall accuracy of this final result. Two indexes are introduced for this purpose: the partial error, $e_{k,m}(\hat{Y}_{k,m})$, which measures the distance between the full power profile $\hat{Y}_{k,m}$ and the measured power profile $Y_{k,m}$, and the overall cluster metric, the total error $e_k(\hat{Y}_{k,m})$. The total error $e_k(\hat{Y}_{k,m})$ can only be estimated, because a limited number of measured power profiles $Y_{k,m}$ is available for each cluster k. To obtain consistent results, it is estimated using the $W_k$ power profiles already used during the validation phase.
The full power profiles of cluster k, $\hat{Y}_{k,m}$, are considered valid if the following relationship is verified:

$$e_k(\hat{Y}_{k,m}) < e_k(\hat{Y}_k) \quad (15)$$

meaning that the full power profiles reduce the error introduced by the power profile estimator $\hat{Y}_k$. The limitation of this metric is that it can be computed only for the $W_k$ customers whose real profiles are available for checking the performance of the system. For the remaining customers, whose power profiles are not available, a second, less accurate metric can be introduced: considering the full power profiles, $\hat{Y}_{k,m}$, and the power profile estimator, $\hat{Y}_k$, of cluster k, the partial deviation, $d_{k,m}(\hat{Y}_{k,m})$, measures the distance of each full power profile from the estimator, and the total deviation, $d_k(\hat{Y}_{k,m})$, aggregates it over the cluster.

Example of Cluster ABAB
Figure 14 shows the power profile estimator $\hat{Y}_{ABAB}$ (solid black line) of cluster ABAB and a set of five full power profiles $\hat{Y}_{ABAB,m}$ (dotted lines) during 11-25 February 2020. Each full profile is a scaled version of the power profile estimator, such that the monthly energy of the mth full power profile matches the monthly energy exchanged by the mth customer.
The partial deviation of the full profiles of cluster ABAB is shown in Figure 15, normalized by the contractual power of the cluster ($P_{ABAB}$ = 16.5 kW). As shown in the figure, it is generally within 5%, although some of the full power profiles exhibit a larger deviation (around 15%). The total deviation, $d_k(\hat{Y}_{k,m})$, of cluster ABAB is 5.1%.
Similarly, Figure 16 shows the partial error of the full power profiles of cluster ABAB with respect to the corresponding measured power profiles, estimated over the $W_{ABAB} = 30$ power profiles used during the validation phase. The partial error is around 15% in just one case and above 5% in three cases. The total error, $e_k(\hat{Y}_{k,m})$, of cluster ABAB is 4%. Condition (15) is satisfied, since the total error of the full profiles is below the total error of the power profile estimator of cluster ABAB (6.4%).
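The validity check (15) and the deviation index can be sketched as follows. The RMS-based distance normalized by the contractual power, and the aggregation of partial values by quadratic mean, are assumptions; the function names are hypothetical. `full` holds one synthesized profile per customer, while `measured` covers only the $W_k$ validation customers, which mirrors why the error can be estimated only on a subset while the deviation can be computed for every customer.

```python
import math

def nrms(a, b, p_contract):
    """RMS distance between two profiles, in % of the contractual power (assumed)."""
    return 100.0 * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) / p_contract

def quad_mean(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def full_profile_metrics(estimator, full, measured, p_contract):
    """full: customer id -> synthesized profile (all M customers).
    measured: customer id -> measured profile (only the W_k validation customers)."""
    d_k = quad_mean([nrms(full[m], estimator, p_contract) for m in full])
    e_k_full = quad_mean([nrms(full[m], measured[m], p_contract) for m in measured])
    e_k_estimator = quad_mean([nrms(estimator, measured[m], p_contract) for m in measured])
    return {"d_k": d_k, "e_k_full": e_k_full, "e_k_estimator": e_k_estimator,
            "condition_15": e_k_full < e_k_estimator}

metrics = full_profile_metrics(
    estimator=[1.0, 1.0],
    full={"a": [2.0, 2.0], "b": [0.5, 0.5]},
    measured={"a": [2.0, 2.0]},
    p_contract=4.0,
)
print(metrics["condition_15"])  # True
```

Both errors are computed on the same validation customers, so the comparison in (15) is made on a consistent basis.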

Example of Cluster AAAA-BA
The same metrics were calculated on the power profiles obtained from the sub-cluster AAAA-BA.
As shown in Figure 17, the partial deviation (normalized by $P_{AAAA\text{-}BA}$ = 4 kW) is generally within 5%, although some of the full power profiles exhibit a larger deviation (in any case below 10%). The total deviation, $d_k(\hat{Y}_{k,m})$, of sub-cluster AAAA-BA is 2.9%. Similarly, the partial error of the sub-cluster, calculated with $W_{AAAA\text{-}BA} = 30$, is shown in Figure 18; it is always below 10% and more evenly distributed than that of cluster ABAB. The total error, $e_k(\hat{Y}_{k,m})$, of sub-cluster AAAA-BA is 4.2%. Condition (15) is also satisfied in this case, since the total error of the full profiles is below the total error of the power profile estimator of sub-cluster AAAA-BA (4.9%).

Reference Algorithm
The same metrics were calculated on a subset of 50 power profiles randomly selected among the customers with contract power below 16.5 kW. In this case, the algorithm skips the clustering phase. As shown in Figure 19, the partial deviation is generally below 15%, although some of the full power profiles exhibit a much larger deviation (around 80%). The total deviation, $d(\hat{Y})$, is 19.8%.
Energies 2021, 14, x FOR PEER REVIEW
Figure 19. The partial deviation of the full power profiles with respect to the power profile estimator $\hat{Y}$, obtained using the reference algorithm without the clustering phase. The deviation is normalized by the upper bound of the contractual power of the considered subset of power profiles (i.e., P = 16.5 kW).
Similarly, the error, calculated over 50 reference power profiles, is shown in Figure 20; it is always below 20%. The total error, $e(\hat{Y})$, is 12.8%, approximately three times the error obtained with the proposed approach.

General Results
The results of the analysis, in terms of total deviation, $d_k(\hat{Y}_{k,m})$, and total error, $e_k(\hat{Y}_{k,m})$, for a subset of the clusters are summarized in Table 4. The table shows that condition (15) is always satisfied, i.e., the total error of the full profiles is always less than the total error of the power profile estimator. Notably, the reduction in error of the full power profiles with respect to the power profile estimator is generally larger in the clusters characterized by a larger deviation. This result is easy to explain: clusters with a large deviation have a wide spread of monthly energy among their customers, and the full power profiles compensate for this spread because they are obtained by scaling the power profile estimator by each customer's monthly energy, which lowers the error. Compared with an algorithm that skips the clustering phase, the clustering approach significantly reduces the error at the cost of a limited number of additional power profiles.
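The explanation above can be illustrated with a deliberately idealized toy example, in which all values are hypothetical: three customers share exactly the same consumption shape but differ widely in consumption level. The cluster estimator (again assumed to be the sample-wise mean) misses every customer by a wide margin, whereas scaling it by each customer's energy recovers the measured profile up to floating-point rounding.

```python
import math

def nrms(a, b, p_contract):
    """RMS distance between two profiles, in % of the contractual power (assumed)."""
    return 100.0 * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) / p_contract

DT = 0.25                       # hypothetical 15-min sampling, in hours
shape = [0.5, 1.0, 1.5, 1.0]    # hypothetical common consumption shape (kW)
levels = [0.5, 1.0, 3.0]        # hypothetical, widely spread consumption levels
measured = [[level * x for x in shape] for level in levels]

# Cluster estimator: sample-wise mean of the measured profiles (assumption).
estimator = [sum(col) / len(col) for col in zip(*measured)]
estimator_energy = sum(estimator) * DT

errors_estimator, errors_full = [], []
for y in measured:
    customer_energy = sum(y) * DT
    full = [p * customer_energy / estimator_energy for p in estimator]  # energy scaling
    errors_estimator.append(nrms(estimator, y, 4.0))
    errors_full.append(nrms(full, y, 4.0))

print([round(e, 1) for e in errors_estimator])  # [26.5, 13.3, 39.8]
print(max(errors_full) < 1e-9)  # True
```

Real customers also differ in the shape of their consumption, so in practice the energy scaling reduces the error rather than eliminating it.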

Discussion
The clustering process defined 49 clusters, but not all of them are needed for the power profile generation process: for the 12 clusters of customers with contract power greater than 55 kW, load profiles are always available and can be used directly by SE algorithms. For these customers, the power profile estimator is still useful, since it can be used to fill in missing data, if any.
Customers with a contractual power greater than 55 kW, although limited in number, account for a large share of the total consumption of the network, as also shown by the analysis performed in this paper. Thus, the accuracy of SE results strongly relies on the availability of these profiles; for this reason, regulations require that they be logged and recorded by the DSO.
The remaining 37 clusters, which would be used to generate power profiles, include the customers with contract power less than or equal to 55 kW. For these customers, load profiles are not always available; therefore, the load profiles of sample customers have to be requested from the AMI.
The proposed approach takes into account the limits of existing AMI systems. Fewer than 5000 annual power profiles (100 power profiles for each of the 49 clusters) are needed, a quantity of information that can be easily managed by any information system. These profiles are used to generate a profile estimator for each cluster, which in turn is used to generate a full power profile for each customer of the DSO network. The approach is scalable, since its computational cost grows linearly with the number of customers: generating a full power profile consists of a simple multiplication of the power profile estimator by the monthly energy of the specific customer. This approach has the benefit of limiting the number of power profiles to be recorded by the AMI while offering a good approximation of the behavior of the network, since the maximum error is about 6.3%.
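The data-volume figure and the linear cost of generation can be checked with a short sketch. The 15-minute resolution, the 30-day month and the one-million-customer count used for the multiplication estimate are assumptions for illustration, and `generate_all` is a hypothetical helper.

```python
N_CLUSTERS = 49
PROFILES_PER_CLUSTER = 100
requested_profiles = N_CLUSTERS * PROFILES_PER_CLUSTER  # 4900, i.e. fewer than 5000

# Hypothetical monthly workload: one million customers, 15-min samples, 30-day month.
SAMPLES_PER_MONTH = 96 * 30
multiplications = 1_000_000 * SAMPLES_PER_MONTH  # grows linearly with the customer count

def generate_all(unit_shapes, customers):
    """unit_shapes: cluster id -> normalized estimator of that cluster.
    customers: iterable of (cluster id, monthly energy in kWh).
    Cost is one multiplication per sample per customer, i.e. O(M * N)."""
    return [[energy * s for s in unit_shapes[cluster]] for cluster, energy in customers]

print(requested_profiles)  # 4900
print(generate_all({"A": [0.5, 1.5]}, [("A", 2.0), ("A", 4.0)]))  # [[1.0, 3.0], [2.0, 6.0]]
```

Because each customer's profile depends only on its own cluster estimator and monthly energy, the generation step can also be split across customers or clusters without any coordination.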

Conclusions
The management of modern distribution grids requires in-depth knowledge of customer behavior in order to respond properly to the growth of emerging loads, such as EV charging, and of distributed generators whose production depends on the unpredictable availability of renewable resources such as wind or sunlight. The AMI system is a valuable source of information that the DSO could exploit to gather such knowledge, but the technical limits of real implementations prevent downloading power profiles for all customers. The proposed solution is therefore to synthesize a power profile for each customer of the distribution grid managed by the DSO. The process that generates these power profiles uses the information available in the CIS and requires downloading only a limited number of reference long-term power profiles from the AMI system.
The proposed approach for generating a massive number of customer power profiles is a three-phase process. In the first phase, customers are clustered into homogeneous groups based on the customer data contained in the CIS database. The clustering phase is iterative: clusters whose customers still exhibit a large variation in exchanged energy are further divided into sub-clusters. In the second phase, a limited number of customers is randomly selected to represent each cluster. The power profiles of these customers are monitored and logged over a predefined time interval and then processed to synthesize the normalized power profile estimator, which represents the average behavior of the customers in each cluster. Finally, the normalized estimator is scaled by the monthly energy exchanged by each customer of the cluster to generate a synthesized power profile for every customer. These full power profiles can then be used by the DSO for distribution grid management, e.g., as input to SE algorithms.
The validation of this approach was performed using customer data from the distribution grid managed by Unareti S.p.A., an Italian DSO. The clustering was performed on more than one million customers located in northern Italy and resulted in 49 clusters, each representing a specific class of customers with similar contractual features. The proposed approach reduces the total number of long-term power profiles to be retrieved from the AMI system: fewer than 5000 annual power profiles (100 power profiles for each of the 49 clusters) are needed to generate the power profile estimators.
As a result of this process, the total root mean square error of the synthesized full power profiles with respect to the real power profiles is on the order of 6.3% for the cluster with the largest variability in customers' energy behavior. These results demonstrate the validity of the proposed approach, which limits the total number of real power profiles to be logged from the AMI system while synthesizing customers' power profiles with adequate accuracy.
Data Availability Statement: Data available on request due to privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
Total energy associated with a cluster k over the time horizon T
$E_m$: Total energy of the mth power profile over the time horizon T
$\hat{Y}_k$: Power profile estimator
$Q$: Subset of power profiles used to estimate $\hat{Y}_k$
$Q_S$: Design parameter used to define the actual value of Q
$P_k$: Upper bound of the contractual power of cluster k
$r_{k,q}(\hat{Y}_k)$: Residual of the qth power profile of cluster k
$r_k(\hat{Y}_k)$: Normalized total residual of cluster k
$W_k$: Cardinality of the set of power profiles used for validation
$e_k(\hat{Y}_k)$: Normalized total error of cluster k
$e_{k,w}(\hat{Y}_k)$: Error of the wth power profile of cluster k
$\varepsilon$: Tolerated deviation of the validation power profiles with respect to the estimator
$E_k$: Energy of the power profile estimator of cluster k over the time horizon T
$\hat{Y}_{k,N}$: Normalized power profile estimator
$\hat{Y}_{k,m}$: Full power profile of the mth customer of cluster k
$e_{k,m}(\hat{Y}_{k,m})$: Error of the mth full power profile with respect to the validation power profile
$e_k(\hat{Y}_{k,m})$: Total error of cluster k in estimating the full power profiles
$d_{k,m}(\hat{Y}_{k,m})$: Distance of the mth full power profile with respect to the power profile estimator
$d_k(\hat{Y}_{k,m})$: Total distance of the full power profiles of cluster k