Clustering Electrical Customers with Source Power and Aggregation Constraints: A Reliability-Based Approach in Power Distribution Systems

Abstract: Reliability is an important issue in electricity distribution systems, with strict regulatory policies and investments needed to improve it. This paper presents a mixed integer linear programming (MILP) model for clustering electrical customers, maximizing system reliability and minimizing outage costs. However, evaluating reliability, whose corresponding function is nonlinear, is a significant challenge that makes the use of mathematical programming models difficult. The proposed heuristic procedure overcomes this challenge by using a linear formulation of the reliability indicators and incorporating them into the MILP model for clustering electrical customers. The model is mainly based on a density-based heuristic that restricts the set of possible medians, thus dealing with the combinatorial complexity associated with the capacitated p-median problem. The proposed model proved effective in improving the reliability of real electrical distribution systems and reducing compensation costs. Three substation cluster scenarios were explored, in which the total utility compensations were reduced by approximately USD 86,000 (1.80%), USD 67,400 (1.41%), and USD 64,000 (1.3%). The solutions suggest a direct relationship between the reduction in compensation costs and system reliability. In addition, the alternative modeling approach to the problem served to match the performance between the distribution system reliability indicators.


Introduction
Continuity in the supply of electrical energy is one of the success criteria for an energy system. However, interruptions in the supply of energy to end users, sometimes caused by failures, significantly compromise the operational and economic conditions of electrical utilities [1,2].
In the energy market, the performance of distribution systems is evaluated according to the level of the continuity of delivery and is regulated by entities that standardize the technical activities relevant to their operation, in order to guarantee that each customer receives energy with an acceptable quality level [3,4].
For the electrical distribution utilities, the quality of service is a central problem, and its impacts are evaluated through the reliability aspects of the service provided during the energy supply [5,6]. As for the quality of service, utilities monitor metrics of frequency and duration of outages, designed to regulate the performance of these companies in relation to the desired goals and requirements [1,7,8].
In Brazil, the dimensions of the quality of service provided by the distributors are regulated by the National Electric Energy Agency (ANEEL), which oversees the distributors [4,6]. Among the parameters used to evaluate the service are the collective …

The introduction of smart meters benefits the planning and operation of distribution networks and demand management, in addition to reducing costs [18]. Different models and approaches have been proposed to minimize costs related to reliability [19] and switch installation [14,20].
There are several reasons for researching substation clustering models in electrical power distribution systems. First, reliability is a crucial issue in power systems, and clustering substations can significantly improve system reliability [8]. Second, clustering can reduce compensation costs, which benefits both the utility and consumers [17].
In practice, power utilities can apply the economic considerations of clustering substations in a variety of ways. However, the substation clustering context lacks models that include both reliability and cost reduction aspects.
In addition, economic considerations must be balanced with other technical and operational factors that may influence the utility's decision-making policies in relation to government regulations.
Despite the importance of clustering substations, there are gaps in the literature, such as the lack of models that simultaneously consider the compensation costs and the system reliability. More research is needed to fill these gaps and contribute to optimizing substation clustering in electrical power distribution systems.
The first hypothesis of the model for optimizing the clustering of consumers belonging to physically adjacent substations is that Mixed Integer Linear Programming (MILP) will produce compact and contiguous clusters around the center defined by the weighted median. This should shorten the time needed to determine configurations and to analyze their contributions to the quality of service.
The second hypothesis is that clustering substations will minimize the financial compensation associated with violations of the reliability indicator limits, allowing concessionaires to reallocate financial resources in the area.
Given these research gaps, this article proposes an approach for clustering a set of geographically connected customers in the electrical distribution network according to economic criteria and regulatory standards [11].
The combinatorial nature of the reclustering problem [21] suggests that the associated complexity can be adequately addressed by defining the problem based on the number of customers and their related clusters [22][23][24].
In the context of the problem of clustering electricity customers, it is possible to form groups of customers in different ways; however, restrictions, such as geographical restrictions on the distribution network or control of the combinatorial nature of the problem, limit the possibilities.
This investigation addresses the challenges associated with clustering electricity customers, with a view to analyzing the impacts on the costs of supply interruptions when customers are assigned to new clustering configurations [25][26][27].
This article proposes a model to solve the customer clustering problem, formulated as a MILP for substation clustering. The adopted decision criterion is the maximization of distribution system reliability, while seeking to minimize the costs associated with compensation.
This approach improved the applicability of the solution in the electrical distribution sector, resulting in a MILP model that can be used in distribution networks of realistic size and complexity. The main contributions of the work are as follows:
1. The definition of a heuristic approach to clustering customers around a reference point (center) using p-medians;
2. A MILP model based on the linear formulation of reliability indicators;
3. Control of the MILP complexity for application to real, large distribution systems;
4. The possibility of identifying the impact of different consumer set configurations on reliability performance;
5. The analysis of the reduction in the compensation credited to consumers for violations of the indicator limits;
6. The reallocation of investments and the optimized use of materials, teams, and network, enabling an increase in the quality of the electricity supply service;
7. Addressing a gap in research on the consumer clustering problem based on MILP models for grouping substations, which lacks studies on the influence of clustering on financial compensation costs.
The remainder of this paper is structured as follows: Section 2 presents the literature review, while the definition of the problem is presented in Section 3; the clustering approach is presented in Section 4; and finally, the results and the discussion and conclusions are presented in Sections 5 and 6, respectively.

Clustering Electrical Customers
The distribution system, the last stage of energy supply, transports electricity from the transmission system and delivers it to customers [28,29]. From the standpoint of computational complexity, an approach that considers the set of customers of an electrical distribution network is therefore very appealing [28].
A key point is the ability of distributors to group customers according to their electrical characteristics, which makes it possible to avoid unnecessary costs. The clustering of customers addressed in this paper is defined by the division of the service area into several substations [30], and the customer sets are formed by solving the equivalent clustering problem.
Cluster analysis has practical applications in many areas [21]. In data clustering in the electrical area, it is applied to the behavior of different types of electrical customers, using parameters such as size, economic activity, and energy consumption [31,32], so that similar electrical customers are assigned to the same class [32].
Although clustering methods are often used to model energy consumption [33][34][35][36][37], customer clustering can be approached from different angles. In terms of power quality, the researchers in [38] focused on improving power quality assessments, and another study aimed to detect abnormal energy use that could compromise the reliability of the electrical network [34].
In Benitez et al. [39], the static K-means algorithm was applied to identify energy consumption patterns in Spanish homes and also to assess their general consumption trends more quickly depending on the applied technique. Rhodes et al. [40] employed K-means together with Probit regression to determine so-called seasonal groups, based on variations in energy use to determine the daily profiles of residential customers. Another use of the clustering algorithm K-means, with the addition of Fuzzy C-means, was proposed by Sharma and Singh [34] with the aim of defining customer load profiles, similar to Rhodes et al. [40].
Biscarri et al. [31] investigated the applicability and performance of different clustering techniques under a framework that tested different numbers of clusters and different validation measures. The clustering problem was addressed with the development of a set of rules based on the identification of the load profiles of similar customers, selected based on their consumption and other economic criteria, to then classify new customers automatically, in such a way as to provide different tariffs established by the electrical utility.
Also exploring load profiles, Panapakidis and Moschakis [41] successfully applied the K-means algorithm to cluster-based analysis of the daily load profile. Unlike the other clustering algorithms available in the literature, the work of Falabretti and Sabbatini [30] proposed a MILP-based clustering algorithm that accounted for electrical losses along the lines, the layout of the existing distribution network, and system reliability.
Jiang, Wu, and Zhan [42] presented the clustering problem according to an analysis of the characteristics of composite substations through a multi-objective model and a clustering algorithm for the adequate choice of substations, aiming to improve the efficiency and safety of the electric power system.
Corigliano et al. [43] proposed the use of k-means data clustering techniques to identify patterns and group locations that had similar characteristics for the problem of locating electrical power distribution substations, considering restrictions such as the need for safe distances between the stations and installation costs.
Huang et al. [44] modeled the clustering of electrical power substation load data, using the spectral clustering technique to group load data into homogeneous groups and improve the accuracy of substation load forecasting, with the aim of optimizing power system operation and planning.
Regardless of the approach, clustering algorithms have been applied successfully to extract and form datasets from similarities between elements. In the energy sector in particular, reports point to valuable strategies precisely because of the adequacy and speed with which responses are provided [31,41,45,46].

The Problem Definition
The problem considered in this work refers to the substation clustering in an electrical distribution network. Ultimately, the problem refers to the clustering of customers belonging to the substations, with some similarities to the problem addressed by Moreno et al. [47] and Assis et al. [48] in territorial planning.
The difference lies in the customers' connection to predetermined physical sets: the electrical circuits of the substations, which later constitute functional sets for calculating the performance indicators of the quality of the electricity distribution service [49]. Even when each customer set is taken as electrically linked to a substation, the clustering of these substations needs a reference to identify the proximity of one substation to another and thereby evaluate the similarity and dissimilarity between the sets considered. For this reason, the capacitated p-median problem [50][51][52] is the formulation that best approximates the intended approach.
Initially, the proposed formulation used the data, parameters, and variables defined in Table 1.

Table 1. Sets, input data, and variables of the p-median model.

Set: Description
T: The set of customers
DIC_i: The outage duration for customer i
FIC_i: The outage frequency for customer i
S: The set of substations
T_s: The set of customers in substation s, T_s ⊆ T
TS: The set of the customer sets of all substations
The set of new substations
D_max: The maximum distance between each customer and the reference point of the substation to which the customer is linked
α_k: The normalization factor for the magnitude associated with the objective function: k = 1 for distance, k = 2 for dec_j, and k = 3 for fec_j
|G|: The cardinality of a set G

Input data: Description
M: A large number, typically 10^9
D_ij: The distance from customer i to the substation centered at point j

Variable: Domain / Description
x_ij ∈ {0, 1}: 1 if customer i is assigned to a substation whose reference point is located at point j; 0 otherwise
y_j ∈ {0, 1}: 1 if point j is used as the center of a substation; 0 otherwise
The objective function of Equation (1) minimizes the sum of the distances D_ij between each customer i and the reference point j of the substation to which it is assigned. In this problem, every customer is assigned to a substation with its corresponding location. The constraints of Equation (2) guarantee that each customer is allocated to one and only one substation. The constraints of Equation (3) establish that customer i can be assigned to a location j only if a substation is centered there, i.e., y_j = 1. According to Equation (4), the number of reference points used as substation centers is limited to the cardinality of the set S. The constraints of Equation (5) ensure that all points originally linked to the same existing substation are jointly assigned to a new substation located at a reference point j ∈ T. Finally, Equation (6) defines the domain of the decision variables x and y.
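The typeset Equations (1)-(6) did not survive extraction. One reading of the model described above, in the notation of Table 1 and numbered to match the text, is sketched below. It is a reconstruction from the prose, not a transcription of the original; in particular, Equation (4) is read here as a bound on the total number of open reference points, which is an assumption.

```latex
\begin{align}
\min \quad & \sum_{i \in T} \sum_{j \in T} D_{ij}\, x_{ij} \tag{1}\\
\text{s.t.} \quad & \sum_{j \in T} x_{ij} = 1, && \forall i \in T \tag{2}\\
& x_{ij} \le y_j, && \forall i, j \in T \tag{3}\\
& \sum_{j \in T} y_j \le |S| \tag{4}\\
& x_{ij} = x_{kj}, && \forall s \in S,\ \forall i, k \in T_s,\ \forall j \in T \tag{5}\\
& x_{ij},\ y_j \in \{0, 1\}, && \forall i, j \in T \tag{6}
\end{align}
```

Constraint (5) is what turns a customer-level p-median into a substation-level grouping: once any customer of T_s is assigned to j, all of them are.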
The problem defined in Equations (1)-(6) determines the allocation of customers to the new substations by selecting the points j for which y_j = 1. Note that this model does not include reliability, either as a criterion or as a constraint. Therefore, the reliability criterion adopted in this work includes the collective indicators corresponding to SAIDI and SAIFI [9], represented here by dec_j and fec_j, respectively, as well as other variables derived from these indicators. The customer indicators CAIDI and CAIFI are represented by DIC_i and FIC_i, respectively.
The calculations of dec_j and fec_j are given in Equations (7) and (8), respectively.
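Equations (7) and (8) are not reproduced in the text. Assuming they follow the usual ANEEL definitions, in which a set's collective indicator is the average of its customers' individual indicators (dec_j = Σ_i DIC_i x_ij / Σ_i x_ij, and likewise fec_j with FIC_i), the Python sketch below computes both for a fixed assignment. The function name and the dictionary inputs are hypothetical. Because the denominator depends on the assignment itself, each indicator is a ratio of decision-dependent sums, which is why these terms make the objective nonlinear.

```python
from collections import defaultdict


def cluster_indicators(assignment, dic, fic):
    """Compute dec_j and fec_j for every reference point j in use.

    Assumes the ANEEL-style form: each collective indicator is the mean of
    the individual indicators of the customers assigned to j.
    assignment: customer i -> reference point j (i.e., x_ij = 1)
    dic, fic:   customer i -> DIC_i (hours) and FIC_i (interruptions)
    """
    sum_dic = defaultdict(float)
    sum_fic = defaultdict(float)
    count = defaultdict(int)
    for i, j in assignment.items():
        sum_dic[j] += dic[i]
        sum_fic[j] += fic[i]
        count[j] += 1
    # Dividing by the number of customers in j is the nonlinear step
    dec = {j: sum_dic[j] / count[j] for j in count}
    fec = {j: sum_fic[j] / count[j] for j in count}
    return dec, fec
```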
Assuming a possible regulatory restriction on the limits for aggregating customers, Equation (9) states that any connection of a customer to the reference point of a new substation is limited to a maximum distance D_max.
Finally, the objective function originally defined in Equation (1) now also includes the minimization of the continuity indicators, according to Equation (10).
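Equation (10) is not reproduced in the text either. Assuming it is a weighted sum of the three magnitudes normalized by α_k in Table 1 (distance, dec_j, and fec_j), the sketch below evaluates that assumed objective for a fixed assignment and rejects any assignment that violates the D_max limit of Equation (9). All names are hypothetical; a MILP solver would optimize this expression rather than evaluate it.

```python
def objective(assignment, dist, dec, fec, alpha, d_max):
    """Evaluate an assumed form of Equation (10) for a fixed assignment.

    Assumed form: alpha[0] * sum of D_ij + alpha[1] * sum_j dec_j
                  + alpha[2] * sum_j fec_j.
    dist: nested dict, dist[i][j] = D_ij.
    Returns None when some customer is farther than d_max (Equation (9)).
    """
    total_dist = 0.0
    for i, j in assignment.items():
        d = dist[i][j]
        if d > d_max:
            return None  # infeasible under the D_max limit
        total_dist += d
    return (alpha[0] * total_dist
            + alpha[1] * sum(dec.values())
            + alpha[2] * sum(fec.values()))
```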
Hence, the problem considered in this work assumes the objective function of Equation (10) and the constraints of Equations (2)-(9).

The Clustering Approach for Electrical Customers
The customer clustering problem is defined as the search for similarities between elements using techniques based on cluster analysis [53,54]. The main purpose is to find a smaller number of entities that best represent all the original elements, thus better supporting analysis or planning and operation actions in the electrical network [28].
Customer reclustering takes into account the geographic location and the possible aggregation of contiguous substations into new areas, thus forming new sets of customers [48]. However, applying this approach raises challenges that must be addressed, such as (i) the combinatorial complexity as the number of power distribution substations grows, which demands a high computational load; and (ii) the nonlinear nature of the power flow, which involves both integer and continuous variables.
In substation clustering, the number of sets (T) is first calculated based on the problem defined in Section 4.2. From a practical perspective, the reclustering task is restricted by an area contiguity matrix (Figure 1). This matrix is associated with the parameter D_ij, which relates the median of each substation to the others and indicates whether two areas are contiguous (D_ij ≪ M) or non-contiguous (D_ij = M), where M is defined in Table 1. When the new sets defined in the customer reclustering problem are considered, customers are assigned based on a reference point of each substation. This reference point is generally taken as the centroid, which is used to define the partitioning of customers [21]. In this work, however, the reference point of each substation is the median of the set (substation), which must be one of the customers assigned to it [51].
Although distance is the criterion commonly used to determine the median of each set [55], this work uses a density measure that weights the distance of each customer to the respective median by that customer's load. As a result, cluster formation is also influenced by load density.
The proposed substation clustering method is based on a reclustering strategy affecting the existing customer sets T_n, assuming that T is an arbitrary integer that can be greater than, equal to, or less than T_n. Inspired mainly by the substation reliability indices, the steps of the proposed method are illustrated in Figure 2. From the input data, a MILP-based mathematical model is proposed to define a restricted set of possible medians for the reclustering problem, which involves constructing new customer sets in an attempt to obtain better reliability indices and reduce the amount of financial compensation, according to Equations (11)-(21).
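To make the role of M concrete, the sketch below builds the distance parameter between substation medians from a list of contiguous pairs, giving every non-contiguous pair the value M = 10^9 from Table 1, so that grouping them is heavily penalized and exceeds any realistic D_max. The adjacency input and the distance function are hypothetical stand-ins for the data behind Figure 1.

```python
M = 10**9  # the "large number" of Table 1


def contiguity_matrix(medians, dist, adjacent):
    """Build D_ij between substation medians from a contiguity relation.

    medians:  list of substation identifiers (hypothetical)
    dist:     function (a, b) -> distance between their medians
    adjacent: set of frozenset pairs of contiguous substations
    """
    D = {}
    for a in medians:
        for b in medians:
            if a == b:
                D[a, b] = 0.0
            elif frozenset((a, b)) in adjacent:
                D[a, b] = dist(a, b)  # contiguous: real distance, D_ij << M
            else:
                D[a, b] = M           # non-contiguous: D_ij = M
    return D
```

Storing only the contiguous pairs, rather than a full matrix of real distances, also keeps the input small.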
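A minimal sketch of a load-weighted median follows. It assumes that the density measure scores each candidate median by the sum of customer loads multiplied by distances and that planar coordinates are available; the paper itself works with its distance matrix D. The function name, coordinates, and load units are illustrative assumptions.

```python
import math


def weighted_median(customers, coords, load):
    """Return the customer that minimizes the load-weighted sum of distances.

    coords: customer -> (x, y) position (assumed Euclidean plane)
    load:   customer -> load, e.g. in kW (assumed units)
    """
    def score(j):
        return sum(load[i] * math.dist(coords[i], coords[j]) for i in customers)

    return min(customers, key=score)
```

With three customers on a line at positions 0, 1, and 10, equal loads select the middle customer, but giving the distant customer ten times the load pulls the median onto it. This is the sense in which cluster formation is influenced by load density.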

The Proposed Algorithm
In order to reduce the computational complexity, a heuristic based on Ahmadi and Osman's constructive procedure [56] was developed to define a set of candidate medians as a parameter in the MILP model.
Here, the median is the reference point of each customer set created by the mathematical model described in Section 4.2, through the variables x_ij and y_j, from the definition of the set T_n.
Algorithm 1 describes the procedure that systematically creates the set of candidate medians T_n and then solves the MILP model defined in Section 4.2. The input data comprised the customer set (T), the indicators (DIC and FIC), the substations (S) and the sets of customers linked to each of them (TS), the normalization factors α, the value M, the distance matrix D, the iteration limit LIM_it, and the value that controls the acceptable margin of error for algorithm convergence. As output, Algorithm 1 returned the solution sol, which defined the new sets and the membership of each existing substation in them. Steps 1-3 initialized the variables sol, dif, and it. The outermost loop (steps 4-18) comprised the systematic determination of the set T_n (steps 5-11) and the solution of the mathematical model of Section 4.2 (step 13). In step 15, the solution sol was compared with the previous solution sol_p. The construction of the candidate medians [56] in steps 5-11 had two stages: (i) in step 8, the density of the points of substation s was computed, considering the distances in matrix D only for points not already selected in T_n; and (ii) in step 9, one of these points was selected by weighted roulette-wheel selection [57,58].
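The structure of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `density_fn` and `solve` are hypothetical callbacks (the latter standing in for the MILP of Section 4.2 and returning an objective value), `per_sub` is an assumed number of candidates drawn per substation, and convergence is checked as the absolute change in objective value between iterations.

```python
import random


def roulette_pick(points, density, rng):
    """Step 9: weighted roulette-wheel selection; denser points are likelier.

    Weights must be positive, otherwise random.choices raises ValueError.
    """
    weights = [density[p] for p in points]
    return rng.choices(points, weights=weights, k=1)[0]


def build_candidates(substations, density_fn, per_sub, rng):
    """Steps 5-11: draw up to per_sub candidate medians from each substation.

    substations: substation -> list of its customers (points)
    density_fn(p, remaining): hypothetical density of p among remaining points
    """
    candidates = set()
    for points in substations.values():
        for _ in range(min(per_sub, len(points))):
            # Step 8: density only over points not already selected into T_n
            remaining = [p for p in points if p not in candidates]
            density = {p: density_fn(p, remaining) for p in remaining}
            candidates.add(roulette_pick(remaining, density, rng))
    return candidates


def algorithm1(substations, density_fn, solve, per_sub=2, lim_it=20,
               eps=1e-6, seed=0):
    """Outer loop (steps 4-18); solve(t_n) stands in for the MILP."""
    rng = random.Random(seed)
    sol, dif, it = None, float("inf"), 0       # steps 1-3: initialization
    while it < lim_it and dif > eps:
        t_n = build_candidates(substations, density_fn, per_sub, rng)
        sol_p, sol = sol, solve(t_n)           # step 13: MILP on candidates only
        if sol_p is not None:
            dif = abs(sol - sol_p)             # step 15: compare with previous
        it += 1
    return sol
```

Restricting `solve` to the candidate set is what keeps the MILP tractable: the model chooses medians among a few sampled points per substation instead of among thousands of customers.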

The Proposed Mathematical Model
The problem defined by Equations (2)-(10) presented some characteristics that made its resolution difficult, namely: (a) by considering each customer individually, the corresponding problem presented a high level of complexity, even in scenarios with fewer than a dozen substations; (b) the median of each substation could be any one of the thousands of customers linked to it; and (c) the objective function (Equation (10)) was nonlinear due to the reliability indicators involved.
To reduce the complexity of the model defined by Equations (2)-(10), the following approaches were adopted: (d) the set of candidate medians was defined by a heuristic procedure, thus redefining the objective function originally given in Equation (10) in the new form of Equation (11); and (e) the reliability indicators were approximated to make Equations (7) and (8) linear, replacing them with Equations (18) and (19), together with the definition of Equations (16) and (17).
The modified model based on approaches (d) and (e) is defined in Equations (11)-(21).
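Equations (16)-(19) are not reproduced here, and the text describes them as approximations. For comparison, one standard exact way to linearize a ratio such as dec_j = Σ_i DIC_i x_ij / Σ_i x_ij is to multiply through by the denominator and replace each product dec_j x_ij with an auxiliary variable u_ij bounded by big-M constraints. This is a generic technique, not necessarily the authors' formulation; M must be at least an upper bound on dec_j, and T_n denotes the candidate medians.

```latex
\begin{align}
& \sum_{i \in T} u_{ij} = \sum_{i \in T} DIC_i\, x_{ij}, && \forall j \in T_n\\
& u_{ij} \le M\, x_{ij}, && \forall i \in T,\ j \in T_n\\
& u_{ij} \le dec_j, && \forall i \in T,\ j \in T_n\\
& u_{ij} \ge dec_j - M\,(1 - x_{ij}), && \forall i \in T,\ j \in T_n\\
& u_{ij} \ge 0,\quad dec_j \ge 0, && \forall i \in T,\ j \in T_n
\end{align}
```

When x_ij = 1 the bounds force u_ij = dec_j, and when x_ij = 0 they force u_ij = 0, so the first constraint states dec_j · Σ_i x_ij = Σ_i DIC_i x_ij exactly for every cluster with at least one customer. The indicator fec_j is handled the same way with FIC_i.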

Results and Discussion
Once the mathematical formulation of the MILP problem for clustering customers of the energy distribution system by reclustering substations was developed, its effectiveness was evaluated on a real network from a utility in the southern region of Brazil. The utility has more than 1.7 million customers distributed across 72 cities in Southern Brazil, covering more than 73 thousand km². Figure 3 presents an overview of the regions served by the utility. The utility service area comprises 61 customer sets (red area in Figure 3), four of which were selected as a case study (yellow area in Figure 3), corresponding to almost 7% of all the sets: Sets 1 to 4, denoted T_s1, T_s2, T_s3, and T_s4. These sets were served by five substations, named s_1, s_2, s_3, s_4, and s_5.
To validate the model, the precision of the calculations was checked by computing partial penalty values for each existing substation; these served as references for comparison with the values provided by the utility for the sets in the study region.
The compensation values were considered on a monthly basis over the years 2017-2019 for customers of the four sets (T_s1 to T_s4), and the amounts are presented in Table 2. The difference between the real values and those obtained by the proposed model is explained by data preprocessing, in which some customer-unit occurrences were removed due to data inconsistencies.

Results Analysis
Once the calculations were validated, the clustering model was run and compared with the initial scenario. For each new configuration, new sets were tested in which substations were grouped into a set T_n = { j | y_j = 1 }, while the remaining substations stayed ungrouped, iterating through the combinations allowed by the set of substations S.
The best configuration was chosen based on the difference between the total values obtained for each new configuration and those of the original configuration (configuration zero). The figures present the values for each analyzed year, with the total financial compensation shown by the blue line. A negative change indicates a decrease in financial compensation relative to the initial configuration, and a positive percentage change indicates an increase.
Multiple combination scenarios were tested. The first scenario consisted of two clusters of substations and had 13 valid configurations, whose results are shown in Figure 4. Of these, Configuration 11, composed of T_n = {{s_1, s_2, s_4, s_5}, {s_3}}, performed best, providing the largest reduction in financial compensation, approximately USD 86,000 (1.80%), followed by Configuration 13, with a reduction of approximately USD 81,600.
The annual values of the highlighted configuration ranged from USD 9000 in 2020 to USD 34,000 in 2019.
In the second scenario, three clusters of substations were formed. The values in Figure 5 highlight Configuration 2, which grouped the substations as T_n = {{s_1, s_2, s_4}, {s_3}, {s_5}} and reduced the financial compensation by approximately USD 67,400 (1.41%), while Configuration 16 obtained the second-highest reduction in total costs over the same period, USD 64,010.47. In the third scenario, four clusters of substations were formed. Figure 6 shows the results obtained after solving the clustering problem, which yielded seven possible configurations.
In the best configuration of this scenario, two substations were grouped together to form a new set of consumers, while the remaining three substations stayed ungrouped, so that T_n = {{s_1, s_4}, {s_2}, {s_3}, {s_5}}. As shown in Figure 6, the compensation values of Configurations 1 and 3 remained unchanged. Configuration 2 achieved the largest total reduction, USD 64,010.46, approximately 1.3% of the financial compensation paid by the utility under analysis. In contrast, the two worst configurations were Configuration 6, with an increase of USD 63,274.59, and Configuration 4, with an increase of approximately USD 40,500.00. Figure 7 illustrates the annual financial compensation of the best solutions (bars) compared with the initial configuration values (blue line).

It is important to point out that this reduction in financial compensation did not compromise system reliability, as shown by the analysis of the performance indicators that follows. Therefore, the application of the proposed substation clustering model had a positive impact on the electrical system, with improvements in both cost and reliability.
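As a consistency check on the reported figures, each scenario's reduction and percentage imply a baseline total compensation. The short Python computation below derives it; the baseline is inferred from the reported numbers, not stated in the text.

```python
# Reduction (USD) and percentage reported for the best configuration of each scenario
reported = {
    "scenario 1": (86_000, 1.80),
    "scenario 2": (67_400, 1.41),
    "scenario 3": (64_010.46, 1.3),
}


def implied_baseline(reduction, pct):
    """Total compensation implied by a reduction and its percentage."""
    return reduction / (pct / 100)


for name, (reduction, pct) in reported.items():
    print(f"{name}: implied baseline ≈ USD {implied_baseline(reduction, pct):,.0f}")
```

The first two scenarios both imply a baseline of roughly USD 4.78 million. Against that baseline, the USD 64,010.46 of the third scenario corresponds to about 1.34%, which is consistent with the rounded 1.3% reported; the larger baseline printed for scenario 3 only reflects that rounding.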
In summary, this article presented an approach to optimize substation clustering in order to minimize compensation costs and maximize distribution system reliability. Applying the substation clustering model yielded three best configurations, one per scenario, all of which produced satisfactory results relative to the initial configuration. The scenario composed of two substation clusters was chosen as the ideal configuration, with a 1.8% reduction in compensation.
Although the proposed approach initially sought to reduce compensation costs, it was essential to evaluate the impact of this strategy on system reliability. The results were therefore analyzed to verify the evolution of the reliability conditions in each configuration. Figure 8 illustrates the variations of the SAIDI (blue line) and SAIFI (red line) indicators relative to the initial configuration for the period from 2017 to 2020, together with the average variation of these indicators over this period. These results were obtained from the best configurations of the main scenarios explored. The figure initially indicates that scenario 3, composed of four clusters of substations, presented the best results for the SAIDI and SAIFI indicators, with values of 33.2% and 8.5%, respectively. However, the lowest values do not necessarily indicate the best option.
Regarding the SAIDI indicator, there was an increase of 2.9% in 2017 relative to the initial value, followed by reductions of between 8% and 31% in the following years, resulting in an average of 14% below the initial configuration over the four years analyzed. A similar pattern applied to the SAIFI indicator, with approximate increases of 5%, 2%, and 6% in 2017, 2018, and 2019, respectively, relative to the baseline. However, this indicator was influenced by a sharp drop of 8% in 2020, resulting in an average variation of 1.1% over the four years.
Additionally, Table 3 summarizes the indicators evaluated in the model, comparing the results obtained with the initial (original) configuration. In scenario 1, the SAIDI decreased from 24.72 to 21.18, a reduction of 14%, whereas the SAIFI increased by 0.27 relative to the initial value. These results were modest compared with those of the best-performing scenario (scenario 3).
Nevertheless, the configurations modeled for the indices were appropriate for substation clustering when financial compensation is used as the decision criterion for the ideal configuration. Figure 9 presents an overview of the evolution of the reliability conditions over the analyzed period, comparing the configurations with each other. Across the three main scenarios explored, the indicators evolved similarly from 2017 to 2020. This means that, regardless of the configuration adopted, an average reduction in the SAIDI and SAIFI indicators was obtained, which suggests that the indicators were associated, although they did not necessarily cause changes in each other.
In all three scenarios, the best configurations showed similar reliability trends, which supports the choice of scenario 1.
Two other scenarios were tested. The fourth scenario grouped all five substations into a single set, resulting in a reduction of USD 4183.90, approximately 0.1%. The fifth scenario treated each of the five substations as a separate set, but it did not improve the compensation values relative to the original configuration.

Discussion
As reported in Section 2, the analyzed literature addressed different substation clustering techniques. Jiang, Wu, and Zhan [42] reported that there were two main approaches to clustering substations, depending on the differences in their load patterns or the types of electricity customers associated with them. However, clustering substations with unusual load characteristics can decrease the clustering accuracy.
Jiang, Wu, and Zhan [42], Corigliano et al. [43], and Huang et al. [44] highlighted the importance of clustering substations to obtain relevant information for decision-making. They also presented different approaches for choosing substation installation sites, analyzing load characteristics, and forecasting substation loads.
In this study, a heuristic approach based on a reference point was proposed for substation clustering, using the p-median problem. This approach was incorporated into a multiobjective MILP model built on a linear formulation of the reliability indicators, with the aim of keeping the complexity of the problem under control. This strategy is particularly relevant given the capacitated p-median problem discussed by Gnägi and Baumann [59], which requires efficient solution methods for large instances.
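The idea of restricting the candidate medians before solving a capacitated p-median problem can be illustrated on a toy instance. This is a minimal sketch under several assumptions. It uses a simple density score (the number of neighbours within a radius) as a stand-in for the paper's density criterion, and a brute-force search replaces the MILP solver used in the paper. The functions, the parameters `radius` and `m`, and the instance are all hypothetical:

```python
from itertools import combinations, product
from math import dist


def density(points, radius):
    """Hypothetical density proxy: number of other points within `radius` of each point."""
    n = len(points)
    return [
        sum(1 for j in range(n) if j != i and dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]


def candidate_medians(points, radius, m):
    """Keep only the m densest points as admissible medians (ties broken by index)."""
    dens = density(points, radius)
    return sorted(range(len(points)), key=lambda i: (-dens[i], i))[:m]


def capacitated_p_median(points, demand, capacity, p, radius, m):
    """Brute-force capacitated p-median over a density-restricted set of medians.

    The exhaustive assignment is exponential in len(points), so this only suits tiny
    illustrative instances; a realistic model would hand the reduced problem to an MILP solver.
    """
    n = len(points)
    best_cost, best = float("inf"), None
    for medians in combinations(candidate_medians(points, radius, m), p):
        for assign in product(medians, repeat=n):
            # Every selected median must serve itself.
            if any(assign[k] != k for k in medians):
                continue
            load = {k: 0.0 for k in medians}
            for i, k in enumerate(assign):
                load[k] += demand[i]
            # Discard assignments that exceed any median's capacity.
            if any(load[k] > capacity for k in medians):
                continue
            cost = sum(dist(points[i], points[k]) for i, k in enumerate(assign))
            if cost < best_cost:
                best_cost, best = cost, (medians, assign)
    return best_cost, best


# Hypothetical toy instance: two "plus"-shaped groups of five points each.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
          (10, 10), (11, 10), (9, 10), (10, 11), (10, 9)]
cost, (medians, assign) = capacitated_p_median(
    points, [1] * 10, capacity=5, p=2, radius=1.0, m=3
)
print(medians, round(cost, 3))  # (0, 5) 8.0
```

Without the restriction, all 45 pairs of medians (10 choose 2) would have to be examined. Keeping only the m = 3 densest points leaves 3 pairs. The price is that a good median with a low density score can be excluded, so the size of the shortlist is a tuning decision.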
According to Gnägi and Baumann [59], cluster sizes are usually limited in practical clustering applications. This calls for an extension of the p-median problem known as the capacitated p-median problem. The authors also noted that there are several classical approaches to formulating and solving this problem, including binary linear programming, but that these formulations are better suited to small instances.
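For reference, the capacitated p-median problem is commonly written as the binary linear program below. The textbook form is shown here, not necessarily the exact model used in this paper. In it, $d_{ij}$ is the distance between point $i$ and candidate median $j$, $q_i$ is the demand of point $i$, and $Q_j$ is the capacity of median $j$. The variable $x_{ij} = 1$ if $i$ is assigned to $j$, and $y_j = 1$ if $j$ is selected as a median:

```latex
\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{j \in J} d_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{j \in J} x_{ij} = 1, && \forall i \in I, \\
& \sum_{j \in J} y_j = p, \\
& \sum_{i \in I} q_i\, x_{ij} \le Q_j\, y_j, && \forall j \in J, \\
& x_{ij} \in \{0,1\},\; y_j \in \{0,1\}, && \forall i \in I,\ j \in J.
\end{aligned}
```

The number of assignment variables grows as $|I|\,|J|$. Restricting the candidate set $J$ therefore shrinks the model directly, which is the motivation for limiting the set of admissible medians.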
As in Yan et al. [60], Xu, Yu, and Yang [61], Li et al. [62], and Liu [63], the spatial complexity of clustering was addressed with an adjacency matrix (D). This matrix records only the adjacency relationships between substations that may be grouped, which keeps the required storage space small.
In practice, reclustering was guided by the contiguity matrix of the areas (Figure 1). Based on this matrix, new set configurations were proposed by exhaustively enumerating every combination that joins two substations while keeping a total of four sets, so that the results could be compared with the original configuration.
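The exhaustive enumeration of contiguous configurations can be sketched as follows. The code generates every partition of the substations into k sets and keeps those whose sets are connected in the contiguity graph. With k = 4 and five substations, this amounts to merging exactly one pair, of which there are 10 (one per pair of substations) before the contiguity filter. The adjacency below is hypothetical and does not reproduce the contiguity matrix of Figure 1:

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a block of its own.
        yield [[first]] + partition


def is_connected(block, adj):
    """True if the substations in `block` form a connected subgraph of `adj`."""
    block = set(block)
    start = next(iter(block))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in block and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == block


def contiguous_partitions(substations, adj, k):
    """All partitions into exactly k sets, each set contiguous in the adjacency graph."""
    return [
        p for p in set_partitions(substations)
        if len(p) == k and all(is_connected(b, adj) for b in p)
    ]


# Hypothetical symmetric contiguity: s1-s2, s2-s3, s2-s4, s4-s5.
substations = ["s1", "s2", "s3", "s4", "s5"]
adjacency = {
    "s1": {"s2"},
    "s2": {"s1", "s3", "s4"},
    "s3": {"s2"},
    "s4": {"s2", "s5"},
    "s5": {"s4"},
}
for partition in contiguous_partitions(substations, adjacency, 2):
    print(partition)
```

For this hypothetical graph, the contiguity filter cuts the 10 four-set configurations down to 4, one per adjacent pair. Enumerating all 52 partitions is only practical for a handful of substations, which is where a restricted model becomes necessary.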
Although the model formed only a small number of clusters, it still achieved its objective of forming clusters. It provided a viable path and reduced the number of possible combinations by 13.3%, 20%, and 30% for scenarios 1, 2, and 3, respectively.
By identifying the combinations least relevant to the problem, the model reduced the number of combinations to be analyzed, which makes the analysis more feasible. Efforts can then be directed toward the most relevant options, increasing the chances of solving the problem and supporting more informed and assertive decisions.
In Figure 6, configurations 1 and 3 presented values equal to the original ones. This is because the original configuration and configurations 1 and 3 all fall within the same range of the annual SAIDI and SAIFI limits.
Rodrigues, Araújo, and Penido [14] and Anteneh et al. [16] discussed approaches that optimize the distribution network through reconfiguration, improving network reliability by reducing the SAIDI and SAIFI indices.
Rodrigues, Araújo, and Penido [14] proposed a multiobjective approach for allocating disconnect switches on feeders using genetic algorithms, which reduced SAIDI by 43.77%. Anteneh et al. [16] also used genetic algorithms, in their case to determine the best network configuration for optimal switch placement. They obtained a 77.33% reduction in SAIFI, relative to the average value of the system reliability index in the base years, and an 80% reduction in SAIDI.
Although these reductions are larger than those obtained here, they do not diminish the importance of this research. The values presented in this study are averages over a four-year period. Each year must be analyzed individually to better understand the context and the factors that affected SAIFI.
The research produced significant results, with most configurations yielding values below their limits (Figure 8). Moreover, although some results exceeded the initial value in certain years, SAIDI and SAIFI were reduced on average over the analysis period.
The goal of reducing financial compensation costs and maximizing network reliability was achieved. However, finding the best configuration for the entire electrical system remains a challenge, given the many possible combinations of substation configurations.
The implementation of the best configuration depends on the decision makers, who must identify the areas that need attention and direct their efforts towards specific improvements, according to their objectives. For example, this may involve reducing the costs of financial compensation or improving the quality of the electricity distribution service, such as reducing the frequency of interruptions in the electricity supply.
Utilities must consider the financial impact of any decision that affects the performance and efficiency of the electricity distribution system. When deciding whether to implement substation clustering, their decision-making policies need to account for investment in infrastructure and technology, maintenance costs, and the allocation of financial resources, among other factors.
Decisions are usually also shaped by broader policies, such as government regulations and standards on the safety, quality, and efficiency of the energy supply. Utilities must therefore weigh the costs and benefits of their actions to balance economic considerations against the need to improve system reliability.
Figure 9 compares different scenarios for improving the quality of electrical service. The results indicate that the evaluated configurations are viable options when making decisions to improve the performance indicators. Moreover, because the three scenarios produced similar results, the choice of a specific configuration will depend on the particular characteristics and needs of each region or situation.
Utilities should have models that allow them to perform cost-benefit analyses quickly, weighing the cost of implementing a given solution against its potential benefits, such as reductions in compensation costs.
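One simple form such a cost-benefit check could take is a net present value (NPV) comparison of an upfront implementation cost against the annual savings it produces. This is a minimal sketch. The cost, savings, horizon, and discount rate are hypothetical placeholders, not figures from this study:

```python
def npv(cash_flows, rate):
    """Net present value of cash_flows[t], received at the end of year t (t = 0 is today)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical: USD 100,000 implementation cost now, USD 30,000 saved per year for 5 years, 8% rate.
flows = [-100_000.0] + [30_000.0] * 5
value = npv(flows, 0.08)
print(f"NPV: USD {value:,.2f}")
```

Under these assumptions, a positive NPV means the discounted savings exceed the implementation cost. A realistic analysis would also include the recurring maintenance and other costs discussed above.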
In this way, the results obtained from the analysis of the scenarios can be used to aid decision-making regarding the implementation of improvements in the electrical service, considering the particularities and needs of each case. It is important to emphasize that the detailed analysis of the results and the consideration of other relevant factors must be carried out before making the final decision.

Conclusions
In this paper, a heuristic solution based on mixed integer linear programming (MILP) was proposed to solve the customer clustering problem in electrical distribution systems, with the aims of maximizing system reliability and minimizing compensation costs.
The approach was applied to a utility in southern Brazil using four years of historical customer data (2017-2020). Starting from the four original customer sets served by five substations, an exhaustive search confirmed the solution found by the proposed heuristic for the customer clustering problem.
This work makes two main contributions. First, it introduces linear reliability indicators, which allow the problem to be formulated as an MILP model. Second, it develops a heuristic that reduces the computational complexity of the MILP by restricting the set of candidate medians.
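The need for linear indicators can be seen from the standard (IEEE Std 1366) definitions. For a set of customers $C$, let $f_c$ be the number of sustained interruptions experienced by customer $c$ and $u_c$ its total interruption duration. If a binary variable $x_{cj}$ assigns customer $c$ to cluster $j$ (our notation), the cluster-level SAIDI becomes a ratio of two linear expressions in $x$. It is therefore nonlinear and cannot be used directly in an MILP. The specific linearization adopted in this work is not reproduced here:

```latex
\mathrm{SAIFI} = \frac{\sum_{c \in C} f_c}{|C|}, \qquad
\mathrm{SAIDI} = \frac{\sum_{c \in C} u_c}{|C|}, \qquad
\mathrm{SAIDI}_j(x) = \frac{\sum_{c \in C} u_c\, x_{cj}}{\sum_{c \in C} x_{cj}} .
```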
The present study emphasized the importance of maximizing the reliability of the distribution system in addition to minimizing compensation costs. The results show that the three optimal configurations exhibited favorable reliability trends, reinforcing the importance of considering reliability when making decisions about substation clustering.
Clustering customers by substation reduced the total utility compensation costs in all three scenarios:
- Scenario 1 (two clusters of substations): USD 86,382.41 (1.80%).
- Scenario 2 (three clusters): about USD 67,400 (1.41%).
- Scenario 3 (four clusters): approximately USD 64,000 (1.3%).

The two-cluster solution, $T_n = \{\{s_1, s_2, s_4, s_5\}, \{s_3\}\}$, was the most viable of all the cluster configurations tested.
The solutions suggest that compensation costs can be reduced by maximizing the reliability of the distribution system, without new investment. In addition, the trends in the indicators show that the evaluated configurations are viable options for improving the quality of electrical service. The choice of the best configuration will depend on the specific characteristics and needs of each region or situation.
The different ways of measuring performance are not mutually exclusive; rather, they are compatible and complementary. They offer different perspectives with the same purpose and converge on the same result.
Future work could extend the clustering approach in several directions:
- using the number of customers as a criterion for forming sets, together with area contiguity;
- applying the dynamic method to calculate the limits of the collective indicators for newly formed sets;
- expanding the number of sets, substations, and customers analyzed;
- considering other indicators, as well as the financial impact of investment in new equipment, maintenance, and other costs.

Data Availability Statement:
The data underlying the results presented in this article are available from the corresponding author upon request. The data are not publicly available because they contain information about the customers of the utility under study.

Acknowledgments:
The authors express sincere thanks to CEEE Grupo Equatorial Energia and the Federal University of Santa Maria for the technical and financial support provided during the research.

Conflicts of Interest:
The authors declare no competing financial interests or personal relationships that could have appeared to influence this paper.

Abbreviations
The following abbreviations are used in this manuscript: