Analysis of Electric Energy Consumption Proﬁles Using a Machine Learning Approach: A Paraguayan Case Study

: Correctly deﬁning and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and ﬁve linkage criteria a total of 36 models with the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores was assessed. The K-means algorithms with the seasonal feature data sets showed the best performance considering the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores with a conﬁguration of six clusters.


Introduction
In electric distribution networks, the identification of electric load profiles is of great interest for electric energy distribution network planners and operators [1] (DNOs). The grouping distribution of feeders can be useful for tasks such as the simulation of the impact of new grid technologies, new tariffs, or network re-configurations [2]. Furthermore, the identification of a set of representative feeders allows the load distribution to be modeled avoiding an exhaustive simulation process on every feeder of the network.
To identify representative feeders, operators often use deterministic and aggregated load models [3]. This approach is straightforward to apply and clear to assess. However, it fails in the presence of uncertainties leading to suboptimal solutions. In order to integrate the uncertainties, probabilistic and optimization load modeling approaches have been applied [4]. Despite the improvement with respect to the aggregated model, they require detailed knowledge or assumptions at an appliance level [5]. To overcome this problem, the clustering approach finds the best model according to the data. In this approach, different 1.
Weekly time series data, where the consumption of each feeder was aggregated on a weekly basis; 2.
Monthly time series data, where the consumption of each feeder was aggregated on a monthly basis; 3.
Statistical data set-a set of statistical features was calculated from the raw data; 4.
Seasonal and daily load curve feature data set-a set of features based on the daily load curve and seasonal consumption variations was computed.
We can summarize the contributions of this work as follows: • Analysis and comparison of the performance of different clustering algorithms using real electricity consumption data collected from a Paraguayan electricity provider. • Study of the suitability of four different data processing strategies. • Evaluation of the influence of distance metrics and linkage criteria for this particular case study.
The rest of the paper is organized as follows: In Section 2, related works are presented. Then, the raw data, data processing, clustering algorithms and related techniques are described in Section 3. Section 4 shows the algorithms results and, finally, in Section 5, the conclusions and future work are proposed.

Related Works
There is a growing concern to address energy-related problems such as electricity consumption, load and demand. Understanding different energy consumption patterns or measuring the environmental impact of energy production can help the adoption of new policies according to demand-response scenarios [13], as well as more sustainable energy policies [14]. In the literature, much attention has been given to electricity consumption prediction [15]. In [16], for example, Walket et al. applied several learning algorithmsboosted tree, random forest, support vector machine (SVM) and artificial neural networksto predict commercial building electricity demands. Liu et al. [17] applied SVM to public buildings' energy consumption from Wuhan (China). In this case, the energy consumption data were combined with climatic and time-cycle factors. Many other works using the supervised approach can be found [18][19][20].
Clustering, although to a lesser extent than said predictive methods, has also been studied in the literature. There are several relevant works in the field. For example, Diao et al. [21] used a clustering approach to identify and classify the behaviour of occupants analysing energy consumption outcomes and energy time use data. Pérez-Chacón et al. [22] applied this approach to extract the energy consumption pattern of smart cities in a big data context. The method proposed was tested using electricity consumption during the years 2011-2017 for eight buildings in a public university. Divina et al. [23] applied the biclustering approach to find anomalies in the energy consumption pattern of smart buildings from a Spanish university campus. In [24], Pinto-Roa et al. proposed to extend an evolutionary algorithm to the time-series approach to identify consumption user profiles.
Feature extraction is another interesting approach. It entails proposing new features from the original ones to enhance relevant information. In this context, disregarding temporal information results in the loss of time-related information and redundancy of features. In this context, Meng et al. [25] applied a discrete wavelet transform (DWT) to decompose the raw data. The DWT is not only capable of extracting the rising trend and periodic waves, but it can also distinguish stochastic behavior. Neural networks (NN) were used to predict periodic waves, which can simulate their increasing amplitude. For this work, in which electric energy consumption data from China were under analysis, the results suggest the competitiveness of the proposal for a forecasting purpose. In [26], Luo et al. developed an integrated artificial intelligence-based approach that was combined with an evolutionary algorithm to enhance an adaptive deep neural network model. The proposal was tested on hourly energy consumption data. Liang et al. [27] presented a hybrid model. Such model combined empirical mode decomposition, minimal redundancy, maximal relevance and general regression neural network with fruit fly optimization algorithm. This approach, called EMD-mRMR-FOA-GRNN, was validated using load data from the Chinese city of Langfang. Finally, a systematic time series feature extraction method called hierarchical time series feature extraction was proposed by Ouyanf et al. [28]. This model was used for supervised binary classification tasks and only used user registration information and daily energy consumption data to detect anomaly consumption users with an output of stealing probability. The performance of this proposal was tested using data from over 100,000 customers.

Materials and Methods
This section introduces the nature of the electric energy consumption data and provides the basic concepts of time series and feature-based clustering. The data used in this work, the characteristics calculated for the feature-based clustering approach and the basic notions of the clustering algorithms used are all described here.
Electric energy consumption data are usually represented as a time series through a discrete sequence of data points measured at equal time intervals. Let is one of them and is characterized by T real values. Thus, the sample X can be represented through a matrix H N×T .
In the context of these particular data, each time series represents a sequence of sensor data collected over time. Therefore, the data can be viewed as an N × T energy consumption data matrix EM. EM is a real matrix, where each element e it represents the electric energy consumption of a feeder (expressed in kWh) as measured by sensor i at the hour t.
Another approach to represent the electric energy consumption is to calculate a set of features representing each electric consumption sequence instead of considering it as a time series [29]. The main advantages of this feature-based clustering method are: the ability to reduce the dimensionality of the original time series; the fact that it is less sensitive to missing values; and the fact that it can handle different lengths of time series [29]. The two feature data set representations implemented in this paper are described in Section 3.3.

Data
The data set used in this work contained 2,967,224 records of electric consumption measured in amperage from January 2017 to December 2020 (4 years) of 115 feeders distributed in 17 substations of the eastern region in Paraguay.
The data set, named "Electricity consumption and meteorological data of Alto Paraná, Paraguay", is freely and publicly available at [12].

Data Preprocessing
A couple of transformations were applied to the data set to reduce the error in the results. The first step was normalizing the time stamp value. For example, 23:59:59 on a given day was converted to 00:00:00 of the next day. The elimination of negative and zero electric consumption records was applied. Since the collected data did not have a standard timing interval (records were saved every thirty minutes in some periods; in others every hour), the next step was the hourly frequency normalization. All records that did not match an o'clock time were removed from the set. Before the outlier detection and data imputation phase, feeders with less than 90% of records were discarded. After all the preprocessing, 24 records per day from each feeder were expected over four years, i.e., feeders with less than 31.536 records were removed.
The result of these steps is a reduced data set made of 55 feeders distributed in 14 substations with at least 90% of recorded hourly data during said four-year period.
With the reduced data set, outlier detection was performed using the algorithm proposed by Vallis et al. [30]. This algorithm requires a full data set. Thus, a linear interpolation to fill the gaps was needed before running it.
The Box-Cox transformation [31] was also used to stabilize the variance in the data, so that they remained stationary and obtained an additive time series as described by Chatfiel [32] and Hyndman et al. [33]. This resulted in X * as the Box-Cox transformation of X. Given the time series X * , this algorithm implements the Seasonal and Trend decomposition using LOESS (STL) [34] to obtain the components of seasonality S x * , trend T x * and remainder R x * , such that X * = S x * + T x * + R x * . This decomposition method allows the seasonal component to be varied according to the nature of the series; simultaneously, it is robust to the presence of outliers.
After this, the remainder component was recalculated as R x * = X * − S x * −X * , wherẽ X * is the median of the data considering a non-overlapping moving window of two-week length as described in [30]. Then, the generalized extreme studentized deviate (ESD) test [35] was applied over the resulting remainder component using both median and median absolute deviation to detect outliers as described by Vallis et al. [30].
Finally, the inverse Box-Cox transformation was run. The outliers, as well as the interpolated values that were added at the beginning of this phase, were removed. The outliers quantity per feeder is shown in Figure 1.  After all outliers and unwanted records were discarded, the historical average data imputation technique [36] was applied to estimate each missing record y i as an average of N H representative historical records y j , j ∈ H, where | H |= N H . The set H included all historical records where the day of the week (DOW) is the same as the one on the missing record and within selected spans of it. The DOW guaranteed that historical means were calculated over records of the same days of the week and similar seasonal characteristics. The selected DOW span for this analysis was ±6 weeks. The resulting data set contained 1,848,947 records of 55 feeders distributed over 14 substations. Figure 2 shows the percentage and number of records per feeder.

Data Sets and Features
In this section, the making of the four data sets used is explained. The first data set provided the weekly demand registered by the feeders. Calculations considered a Sunday-to-Saturday span, resulting in a time series of 207 records. Sunday was chosen because the time series of the feeders began on that day, on 1 January 2017. Thus, an equitable distribution of the days for each week was obtained from the start. However, some days were dropped, even in the middle of the time series, due to missing data and some weeks yielded data with less than seven days.
The second data set contained the monthly demand, with a time series of 48 records. As on the first data set, some months had fewer data than others due to discarded data. December 2020 was the month with the fewest observations, only 16 days.
The third data set was considered from the work performed by Rasanen et al. [29]. Seven statistical features were extracted from each of the feeders in a window of size equal to one calendar week N i throughout the entire time series, where i = 1, 2, ..., 207 weeks. It should be noted that, although N i corresponds to one week, it presents different lengths due to missing values in certain weeks.
Therefore, the features used were: mean (µ), standard deviation (σ), skewness (S), kurtosis (K), maximum Lyapunov exponent (λ), energy (E) and periodicity (P). The mean, calculated by Equation (1), indicates the central value of the analyzed data. In contrast, the standard deviation (Equation (2)) indicates a measure of the dispersion of the data.
Skewness (Equation (3)) is a measure that indicates the degree of asymmetry in the distribution of the demand data [37]. Kurtosis (Equation (4)) is related to the tails in the distribution. High Kurtosis indicates greater extremity of deviations [37].
Likewise, chaotic dynamical systems are common natural and artificial phenomena, including energy demand. The measured time series comes from the attractor of an unknown system with a certain ergodicity. In other words, it refers to a set of numerical values towards which the system evolves. This ergodicity contains the attractor information [38]. The maximum Lyapunov exponent (MLE) is the most used quantity measured on chaotic systems, as it describes the exponential divergence of nearby trajectories. For the case of a time series x i = (x i,1 , x i,2 , ..., x i,N i ), a ν-dimensional phase attractor with delay coordinates is considered, i.e., a point on the attractor is represented by where τ describes the almost arbitrarily considered delay and ν the embedding dimension. Then, a initial point {x i,t 0 , x i,t 0 +τ , x i,t 0 +2τ , ..., x i,t 0 +(ν−1)τ } is chosen and the nearest neighbor to it is determined [39]. The initial separation between these two selected points is represented by the vector δZ 0 . Therefore, the system diverges approximately at a rate given by δZ t = e λ(t×∆t) δZ 0 , where λ is the maximum Lyapunov exponent and ∆t the sampling period. Hereof, λ became more accurate when t → N i . Therefore, it was estimated as the mean rate of separation of the nearest neighbors across the samples. Thus, the MLE was expressed according to Equation (5).
The energy present was also considered and was obtained using the fast Fourier transform (FFT) [40]. For this purpose, the resulting Fourier transform sequence was comprised by Given this, the energy calculation was performed by adding the squares of the magnitudes of the resultant components; then, it was divided by the length of the sequence (N i ) to normalize the calculated measurement (Equation (6)).
Finally, another highly relevant measure to assimilate the behavior of the time series is periodicity. To obtain it, a periodogram was determined to estimate the power spectral density, which also uses the FFT as the basis of the calculation. This function indicates the distribution of the frequencies present in the signal given by the time series. Hereof, the most powerful frequency was selected and converted into an hourly period value via Equation (7).
where P xx,i (ω) represents the power spectral density in the frequency domain ω and T the period converted to hours, in which the power is higher. The fourth data set was built in order to capture seasonal and daily effects on the energy demand, as in Haben et al. [41]. Consequently, each day was divided into five relevant periods that characterized the behavior of daily demand as shown in Figure 3. It is important to note that these periods were defined considering the Paraguayan electricity demand curve. Therefore, they are different from the proposal presented in [41]. The intervals of the chosen time periods are detailed in Table 1.  The features to be used were defined, taking into consideration such periods. For a specific feeder and each period i = 1, 2, 3, 4, 5 over the entire time series, P i was represented as the mean electricity demand with σ p corresponding to its standard deviation. Meanwhile, P was considered as the mean daily demand over the complete time series. In each period, the mean demands corresponding to the summer and winter seasons, P S i and P W i , respectively, were also computed. Similarly, the mean demands on weekdays and weekends were considered in each period of the entire time series. They were noted as P WD i and P WE i , respectively. As a result, the following eight features were extracted: • Features from 1 to 5: The relative average power in each time period over the entire time series given by • Feature 6: Mean relative standard deviation over the entire time series given bŷ • Feature 7: A seasonal score given by • Feature 8: A weekend vs. weekday difference score given by It is important to mention that, for each data set obtained, the values of the preprocessed time series were scaled within a [0, 1] range for each feeder, through the transformation x scaled = x−x min x max −x min , since, otherwise, the clustering process would have been carried out as a function of the mean daily demand [42]. Finally, the conformed data sets are represented in Figure 4.

Distance Measurements
The work aims to find similarities in feeder consumption. Thus, it was essential to determine appropriate distance measures. Since one of the strategies was based on feature extraction, the use of Euclidean distance was reasonable. However, when considering the strategy based on patterns present in the consumption time series, the distance measure based on dynamic time warping (DTW) proved to be a better choice [43], although the Euclidean distance showed some promising results that should be considered for experimentation [7].
Therefore, for the time series approach, the definition of the Euclidean distance is such that, given two time series x = (x 1 , x 2 , ..., x N ) and y = (y 1 , y 2 , ..., y N ) of lengths N, is represented as In the case of feature extraction, x and y correspond to the arrangement of the considered features.
On the other hand, the DTW algorithm presents an efficient method that minimizes shifting and distortion effects. It includes a transformation that allows similar shapes with different phases between time series to be detected [44]. Given the time series x = (x 1 , x 2 , ..., x N ) and y = (y 1 , y 2 , ..., y N ) of lengths N, a cost matrix is created with objects that correspond to the all pairwise distance between the x and y components, such that M: m i,j = x i − y j for i, j ∈ [1, N]. From here, the optimal warping path wp = (p 1 , p 2 , ..., p L ) is determined, where p = (i , j ) represents the pair of indices of the selected components in the matrix M. The value of L corresponding to the length of wp is such that N ≤ L < 2 × N. For the determination of wp, there are three conditions to be followed. The first one corresponds to the boundary condition, in which p 1 = (1, 1) and p L = (N, N); thus, it is ensured that such a path starts at the beginning of both series and closes at the end. The second refers to the monotonicity condition, where it is fulfilled that i −1 ≤ i and j −1 ≤ j , in order to preserve the time-ordering of points. The third condition is known as the step size condition, whose criterion limits the warping path of the long jumps while aligning the series. This last condition is formulated as p − p −1 ∈ {(1, 1), (1, 0), (0, 1)}. Then, wp is composed in such a way that the cost function m wp (x, y) = ∑ L =1 m i ,j is minimized. Finally, the DTW distance is expressed as Figure 5 shows the difference between the components considered for the calculation of the distance between the D3 and E2 feeders, both Euclidean and DTW. The latter shows that the pairs of components considered were not necessarily located in the same temporal location.

Clustering Techniques
In machine learning, clustering refers to the process of grouping a sample of objects according to a similarity measure. Classically, clustering is defined as follows: Let O be a set of n o objects described by d features ffl j , j = 1, 2, ..., d, so that o ij denotes the value of the feature ffl j for the object o i . Clustering aims to group the n o objects into K clusters C 1 , C 2 , ..., C K so that objects in the same cluster are more similar than those in other clusters.
The clustering algorithms used in this work are K -means [45] and hierarchical clustering [46]. The following section describes both strategies.

K-Means
The K-means algorithm is one of the simplest and most widely used clustering techniques. It determines cluster centroids belonging to a data set, according to a K value representing the number of clusters in which they are to be partitioned. In particular, the algorithm repeatedly performs two steps for this purpose. First, it assigns the closest centroid (c k ) to each data in order to minimize the sum of squared distance as expressed by Equation (14); then, it recalculates the centroids based on the mean of the data that were assigned to it, until it finds no variation or reaches a predefined number of iterations [47].
It is worth noticing that the initialization of the centroids can be carried out randomly. Nevertheless, for this work, the Kmeans++ optimization method was used, thus selecting the starting points with a probability weighted by the distance from the previously chosen initial centroids [48]. In addition, it should be noted that, when K-means was applied on the time series data with the DTW distance measurement, the centroids were calculated using the DTW barycenter averaging (DBA) algorithm [49].

Hierarchical Clustering
Hierarchical clustering allows the construction of a hierarchy structure or linkage between the clusters formed, which can be either agglomerative or divisive. In the agglomerative method, each object is initially considered as a group. Then, the groups are iteratively combined to form an ascending hierarchy of groups until a single root group is reached. In contrast, the divisive method considers the complete set of objects as a single cluster. Then, it iteratively splits the clusters to achieve a top-down hierarchy where each object represents a single cluster [47].
The final structure of the clusters obtained is called a tree or dendrogram. The process carried out to obtain the dendrogram requires determining the similarity between the objects with the use of a linkage criterion [50]. In this work, a focus on the agglomerative method was given, using the linkage criteria summarized in Table 2. The application is described given two clusters, C i and C j . Table 2. Proposed linkage criteria for use in the hierarchical algorithm.

Criterion Formula Description
Single Determined by the distance of the nearest objects between clusters C i and C j . Complete Determined by the distance of the farthest objects between clusters C i and C j . Average Determined by the average distance between the objects of clusters C i and C j . Centroid Determined by the distance between the centroids c i and c j corresponding to clusters C i and C j , respectively. Ward Determined by sum of the squares of the distance between all objects in cluster C i and C j , and c i,j , centroid of the new cluster merged from C i and C j .
This approach was been applied to time series data. For example, in [51], the hierarchical algorithm was applied using the DTW distance.

K-Spectral Centroid
K-spectral centroid [52] (K-SC) allows clusters to be found in the time series based on the distinctive temporal pattern of the time series. It is an iterative algorithm similar to the classical K-means clustering algorithm, but performs an efficient centroid calculation under a scale-invariant and shift-invariant distance metric. Similar to K-means, K-SC alternates between two steps to minimize the sum of squared distances; however, the distance metric is not Euclidean, but is given by where x and y correspond to time series, y q corresponds to the time series shifted by q time units and γ is a scaling coefficient to time series. This measure finds the optimal alignment and the scaling coefficient for matching the shapes of the two time series. As a result, it allows one to compute the cluster centroids more appropriately by better acquiring the temporal patterns of the data. Thus, this algorithm was applied to the weekly and monthly time series of electricity demand.

Cluster Validity Indices
Since the task of grouping objects that share similar characteristics belongs to the area of unsupervised methods, it is challenging, at first, to select the number of sets to be considered. For this purpose, several clustering validation indices provide a quantitative criterion about the number of clusters formed. In this work, the Silhouette, Davies-Bouldin and Calinski-Harabasz validation indices were considered. They have shown promising results in comparative studies [53] and also provide enough information to select the most optimal configuration.
The Silhouette index describes a measure of quality based on how similar an object is to those belonging to the same cluster (cohesion) in contrast to how dissimilar it is from those belonging to the nearest cluster (separation) [54]. This index is normalized within a [−1, +1] range, where high values indicate a good conformation of the objects based on their similarities concerning the distinctions of the other clusters. In this case, the average of the Silhouette index scores for each component of a given cluster was considered. Since α i represents the average distance of an i-th sample for the others in the same cluster and β i represents the average distance of the same sample with respect to those in the nearest cluster, the Silhouette index for a sample is represented by Therefore, the average score of the Silhouette index is given by (17) where N corresponds to the total amount of samples. The Davies-Bouldin validation index represents the average similarity between clusters [55]. In this case, the cohesion estimation is based on the average distance δ i between the centroid of a considered cluster i and the objects that conform it. The separation is represented by the distance D ij between the centroids of the cluster i and another cluster j. Thus, is maximized, where δ j represents the cohesion estimation for cluster j. Therefore, the Davies-Bouldin index is represented by the expression where K indicates the number of clusters. The lowest score that can be obtained for this index is 0; values close to it indicate better clustering.
Finally, the Calinski-Harabasz validation index measures the ratio of the sum of the between-cluster dispersion and within-cluster dispersion for all clusters [56]. In this sense, dispersion is defined as the sum of the squared distances. Therefore, when considering a set of objects O of size n o , which have been clustered in one of the K clusters, it is necessary to determine both the between-cluster dispersion matrix B and the within-cluster dispersion matrix W, expressed as where C i indicates the set of objects belonging to cluster i, c i the center of cluster i, c o the center of O and n i the number of objects in cluster i. Once this is carried out, the traces tr(B) and tr(W) corresponding to the matrices B and W, respectively, are considered. With them, the Calinski-Harabasz index is defined as High scores indicate well separated and dense clusters, which is expected when the clustering algorithm is correctly applied.

Workflow
As shown in Figure 6, this work followed a rigorous process to determine the necessary tools for experimentation. The starting point was collecting available data from the studied feeders, followed by the corresponding preprocessing to correct the anomalies present. Once this stage was completed, four sets were generated based on the above description. The description of the different models induced on each data set are presented in Table  3. It gives a better appreciation of the configurations to be taken into account. For each data set, both the K-means and hierarchical clustering algorithms were applied, considering the corresponding variation depending on the nature of the data. Thus, for data belonging to time series, the analyses were performed for Euclidean distance measurement and DTW. On feature-based data, the only distance applied was the Euclidean distance. Likewise, for the models where the hierarchical algorithm was applied, the linkage criteria set out in Table 2 were taken into account. Therefore, each model was assigned an identifier for further analyses based on the results.
After the learning process was completed for different cluster sets, the validation index scores were considered to determine the best performing model and the optimal number of these clusters. Finally, the results were plotted to visualize the characteristics possessed by the conformed clusters, as shown in the next section.

Results
This section presents the results obtained from the numerical experimentation carried out using the previously defined models. The objectives defined in this work are the following: • Comparison of the different clustering techniques studied to identify the best models according to the cluster validity index measures. • Analysis of the consumption data of the best model found.
For the comparison of the different models, the number of clusters was varied from two to ten. For each model, the Silhouette, Davies-Bouldin and Calinski-Harabasz index scores were calculated. However, only the Silhouette index was taken into account because of its data independence [54].
However, since the preliminary results based on the Silhouette score yielded the best configuration of only two clusters, which did not imply a good solution to the problem, considering it did not give the DNOs the opportunity to assess different options, the scores based on the local maximum were also considered. This opened a broader range of clustering possibilities. The same consideration was also given to the Calinski-Harabasz index. In contrast, the local minimum was considered for the Davies-Bouldin index.

Model Comparison
Once the defined models had been subjected to the variation of the different numbers of clusters, the best 15 models were selected as indicated in Table 4.
Models using the data set based on seasonal demand characteristics showed better results than on other data sets. The above indicates that the differences in energy consumption in different seasons of the year and the variation in consumption during the week provided more relevant information to characterize the similarities among feeders. In addition, there was repeatability in terms of the number of clusters present for different models, i.e., four, six, or seven clusters generally showed good results. It is important to highlight that the models that made use of the data set based on time series also showed good results, since they appeared in the ranking, starting from the 12th position. Under this aspect, the K-SC algorithm had a higher relevance with respect to the others used in this strategy. However, its scores were well below those of the models based on seasonal features mentioned above. On the other hand, those models based on statistical characteristics are not presented in the table due to their poor performance.
Additionally, with respect to the distance metrics applied to the time series and used in the described algorithms, both the DTW methods used in K-means and the K-SC metric showed better results in contrast to the Euclidean distance, as shown in Table 4. On the other hand, there was no relevant difference in the results obtained by the types of linkage criteria applied to the hierarchical algorithm, since the Silhouette indices were very similar. These results were further analyzed considering the other validation indices mentioned. Since the models to be compared now shared the same data set, there was a concordance between the scores obtained by the Calinski-Harabasz and the Davies-Bouldin indices.
Therefore, according to Figure 7, which shows the variation in the validation indices based on the change in the number of groupings considered, it was possible to make a more concrete determination of the best configuration. The points where a local maximum appeared in the curve produced by the Silhouette scores were marked with a vertical dashed line. It intercepted with the other curves formed for a more evident appreciation of the comparable values.
Firstly, the points where both the Silhouette and Davies-Bouldin scores produced better results simultaneously were determined. These corresponded to the points for the Davies-Bouldin curve where there was a local minimum and where it intersected the vertical dashed line. Therefore, there were only two cases where this condition was fulfilled. One corresponded to the K-means algorithm and the other to the hierarchical algorithm with the ward criterion, both under the consideration of K=6 clusters.
Similarly, the points where the Silhouette and Calinski-Harabasz scores presented the best results together were also determined. In this case, it is necessary to point out those values that belonged to a local maximum in the Calinski-Harabasz curve and, likewise, intersected with the vertical dashed line. Thus, five points were detected where these considerations were satisfied. For the K-means algorithm, it was found at K = 6. Regarding the hierarchical algorithms, with the complete criterion, one was found at K = 7. Finally, with the centroid criterion, both for K = 6 and K = 8 were found. Thus, the conditions were verified. When considering the average criterion, there was a point at K = 5 that also satisfied the requirements. As a result, the model based on the K-means algorithm for K = 6 clusters showed the best configuration concerning the scores of the validation indices as a whole, as the preferable results coincided with this one. Therefore, it is important to note that, in the clusters formed, the objects presented a good similarity between those belonging to the same cluster and dissimilarity between the objects of nearby clusters. Likewise, the conformations presented a low dispersion, thus yielding dense clusters.
However, while the model based on the hierarchical algorithm with the ward criterion for K = 6 did not perform well for the Calinski-Harabasz index, it did well with the remaining validation indices. Therefore, it was relevant to compare to determine the differences between the resulting clusters in contrast to the K-means cluster for the same number of clusters. Given the comparison illustrated in Figure 8, corresponding to the clusters formed by the K-means model in contrast to those obtained from the hierarchical model with Ward's criterion for K = 6, two of them shared the same objects, that is, the same feeders, which indicates an important relationship between those that made up these clusters. In contrast, the remaining clusters differed in several ways between the two models. Cluster 4 of the hierarchical model included all the objects of its homonym belonging to the K-means model. However, it also included some objects of Clusters 3 and 5 from the latter. Another essential aspect was observed in Cluster 2 of K-means, formed by Clusters 2 and 3 of the hierarchical model.

Analysis of Selected Model
Given the previous analysis of the validation indices, the K-means model with K=6 clusters was selected for use. Therefore, we proceeded to analyze the consumption curves for the clusters determined. Figure 9a shows a box plot of the average daily consumption of all clusters. Cluster 4 had a very distinct behavior on electricity consumption throughout the day, when compared to the other clusters. In this case, the feeders that made up this cluster showed a prominent peak at midday, with no other peak at night as usual. The other clusters considered, in turn, presented a similar behavior with the consumption curve. There were more pronounced peaks both at midday and at night. However, there were slight differences in the level of consumption.
The fact that there was not a very marked distinction in these graphs is because the clustering was performed based on the consumption characteristics of the seasons, that is, the difference between certain times of the day, weekdays or weekends and the seasons of the year. For this purpose, a better analysis is presented in Figure 9b. Here, the centroid of each cluster is presented as daily consumption, where the summer and winter seasons were considered, as well as the weekdays and weekends for each of them. Daily consumption was similar for both weekdays and weekends in summer for all clusters, except for Cluster 4. In winter, Cluster 5 showed a considerable drop in its consumption that differed from the other clusters. In summer, although there were differences, they were not so significant. The changes in the consumption levels of Clusters 1 and 3 were also notable. In summer, Cluster 1 had a higher consumption than Cluster 3; however, in winter, this was reversed.
In a nutshell, the feeders present in each defined cluster were exposed.

Discussion
In this paper, a cluster analysis of real data from the Paraguayan eastern region's electric power system is presented for the first time. The data contain four years of hourly electric consumption of 115 feeders distributed in 17 substations.
The data were pre-processed to generate four data sets useful for the clustering algorithms according to the following: i) weekly demand, ii) monthly demand, iii) a statistical feature set and iv) a seasonal and daily consumption feature set.
The K-means and the hierarchical agglomerative clustering algorithms were used with the Euclidean and the dynamic time warping (DTW) measures as distance metrics. For the hierarchical algorithm, five linkage criteria were tested. In this context, a total of 36 different models were tested on the four data sets. The results were evaluated with three index scores, the Silhouette, Davies-Bouldin and the Calinski-Harabasz.
The seasonal feature set obtained the best results; this was expected, considering that this feature set was designed thinking in terms of the electric consumption curve with a particular daily period of the Paraguayan load curve. The K-means showed slightly better performance than hierarchical agglomerative clustering, although the difference was not significant, even among the linkage criteria used in the latter. The K-means with the seasonal features data set obtained the best Silhouette score of 0.432 with four clusters. However, when all three metrics were considered, the K-means with six clusters presented the best performance. All tested models, K-means, hierarchical and K-SC, exhibited the worse performance on both time series and statistical based data sets when compared to models using the seasonal feature data set. However, metrics applied to time series for handling time shifting, such as DTW and K-SC's own metric, yielded better results than the Euclidean distance.
The three metrics considered in this paper did not score the same cluster configuration as the best. Therefore, different options and optimal local results were assessed. Showing more than one result gave the DNOs the opportunity to analyze different quality options before deciding whether they may be studying new tariff incentives, the impact of distributed generation, or new distribution network structures.
In future works, other clustering algorithms, such as kernel DBScan, modified fuzzy c-means, or k-medoids-based genetic clustering [57], may be implemented on the data set. In addition, a biclustering approach [23] is proposed as an interesting alternative for future works of this research. We also plan to apply the methods studied in this work to other real world data.

List of Symbols
The following symbols are used in this manuscript: Cost function for DTW α Average distance of a sample with respect to the others in the same cluster β Average distance of the same sample with respect to those in the nearest cluster Silhouette index for a sample SIL Average score of the Silhouette index (Silhouette index) δ Average distance between the centroid of a considered cluster and the objects that conform it D Distance between centroids of two clusters R Similarity score between clusters DB Davies-Bouldin index B Between-cluster dispersion matrix W Within-cluster dispersion matrix CH Calinski-Harabasz index