Load Profile Segmentation for Effective Residential Demand Response Program: Method and Evidence from Korean Pilot Study

Due to the heterogeneity of demand response behaviors among customers, selecting a suitable segment is one of the key factors for the efficient and stable operation of the demand response (DR) program. Most utilities recognize the importance of targeted enrollment. Customer targeting in DR programs is normally implemented based on customer segmentation. Residential customers are characterized by low electricity consumption and large variability across times of consumption. These factors are considered to be the primary challenges in household load profile segmentation. Existing customer segmentation methods have limitations in reflecting daily consumption of electricity, peak demand timings, and load patterns. In this study, we propose a new clustering method to segment customers more effectively in residential demand response programs and thereby, identify suitable customer targets in DR. The approach can be described as a two-stage k-means procedure including consumption features and load patterns. We provide evidence of the outstanding performance of the proposed method compared to existing k-means, Self-Organizing Map (SOM) and Fuzzy C-Means (FCM) models. Segmentation results are also analyzed to identify appropriate groups participating in DR, and the DR effect of targeted groups was estimated in comparison with customers without load profile segmentation. We applied the proposed method to residential customers who participated in a peak-time rebate pilot DR program in Korea. The result proves that the proposed method shows outstanding performance: demand reduction increased by 33.44% compared with the opt-in case and the utility saving cost in DR operation was 437,256 KRW. Furthermore, our study shows that organizations applying DR programs, such as retail utilities or independent system operators, can more economically manage incentive-based DR programs by selecting targeted customers.


Introduction
Recently, distributed energy resources (DERs) such as photovoltaics (PV), wind turbines (WT), energy storage systems (ESS), and demand response (DR) have expanded rapidly on the distribution system. Because of this trend, power demand characteristics have become more complicated. In addition, various business models and policies have been created as the application of these resources has increased. However, DER expansion leads to local load fluctuation on the distribution system, and the DR program has been regarded as one of the solutions for mitigating the resulting imbalance. For this reason, DR programs have recently received significant attention. Under a DR program, electricity consumers change their consumption patterns in response to time-based rates or to incentive payments during periods when a reduction is needed [1]. Utilities and/or independent system operators (ISOs) manage DR programs to mitigate peak demand, high prices, and the variable generation of renewables.
DR programs can be divided into two types: price-based and incentive-based. Price-based DR programs vary the electricity price when certain time conditions are met [1]. Time of use (TOU), critical peak pricing (CPP), and real time pricing (RTP) are examples of this type of DR. Incentive-based DR programs, meanwhile, encourage customers to shed their load or sell back to the electricity market. For incentive-based DR programs, targeting suitable customers takes priority before DR implementation [2]. According to the peak time rebate program implemented by San Diego Gas & Electric (SDG&E), targeted enrollment, which selects suitable customers to participate in incentive-based DR programs, is essential for efficient DR operation [3]. Because customers are heterogeneous, analyzing their demand characteristics before introducing a DR program is important. In particular, the costs of recruiting DR customers may be considerable, as the process involves several activities such as marketing, education, and DR system support and operation. If utility companies or ISOs do not select suitable customers for enrollment in the DR program, the losses caused by enrolling inappropriate customers could be substantial. Therefore, to minimize losses, it is essential to secure a large DR capacity with a relatively small number of customers.
Before choosing suitable customers, that is, customers with high consumption and with peak times similar to event times, it is essential to analyze customer load profiles. We approach customer targeting by analyzing the typical load profiles that result from load profile segmentation; load profile segmentation analysis should therefore be conducted to select adequate customers. Various clustering methods are normally employed to perform electricity consumer segmentation. Residential electricity consumption is uncertain and variable because of the many factors that affect demand, such as home appliance usage patterns, the number of family members, lifestyle patterns, customer occupations, and income levels. These factors cause residential demand to have far more variability than commercial and industrial demand [4], which makes the residential load profile segmentation problem relatively more difficult. When load profiles are clustered, either their load patterns or their load characteristics are commonly used as variables. However, in residential load profile clustering, considering only load patterns poses a number of problems, such as an excessively broad spectrum of hourly consumption and different peak occurrence times within the same group, whereas considering only load characteristics has the drawback that consumption patterns are not reflected accurately. To determine suitable DR participant groups, residential customers should therefore be segmented by both pattern and consumption scale.
This paper proposes a two-stage k-means model to address both pattern and consumption scale. In the first stage, k-means clustering is conducted on load characteristics, such as daily consumption and peak occurrence time. In the second stage, k-means clustering is performed on the hourly load profiles of residential customers. This methodology is applied to over 800 Korean residential DR participants for whom hourly electricity use data is available. The results reveal an appropriate segmentation methodology for DR participants. This paper contributes to the literature on load profile segmentation for targeting customers by:
• Extending the k-means clustering method to reflect both load patterns and load characteristics, thus resulting in outstanding performance;
• Deriving home appliance and usage pattern data using only electricity consumption data, without additional data such as customer information, thus making the analysis more efficient;
• Presenting load profile segmentation of Korean household electricity demand data; and
• Conducting data analysis to select suitable groups for DR.
The remainder of this paper is organized as follows. In Section 2, we review the current state-of-the-art clustering methodology. In Section 3, we present the proposed two-stage k-means model, which ensures effective household load profile segmentation for targeting residential customers. In Section 4, we show the effect of targeting residential customers in the DR program and compare this effect to the effect of opt-in enrollment in Korea. Section 5 concludes and outlines ideas for further research in this area.
Energies 2020, 13, 1348

Literature Review
This section reviews the current state-of-the-art methodology for load profile segmentation. Many studies have sought to segment load profiles accurately by applying various clustering methods. K-means, self-organizing maps (SOM), mixture models, expectation maximization (EM), and spectral clustering have been widely used. Among the methods available for load pattern segmentation, the most commonly employed are standard k-means [5][6][7][8][9][10], adaptive k-means [11,12], fuzzy k-means [13,14], and g-means [15], which is an alternative clustering model to k-means. SOM [16,17] is commonly employed by itself but has also been combined with other clustering methods, such as k-means and hierarchical clustering, as a hybrid model [18]. Mixture models [19,20] and EM [21] are also popular as statistical clustering methods. For DR program operation, DR customer segmentation is commonly conducted for many reasons. Spectral clustering that applies information-entropy-based piecewise aggregate approximation has been proposed for commercial demand response applications and is able to reflect multiscale similarities [22]. Recently, deep-learning-based clustering, such as deep embedded clustering, has come into use for residential baseline estimation [23]. The characteristics of the clustering methods normally used for electricity consumer segmentation are summarized in Table 1. As Table 1 shows, each clustering method has its advantages in terms of data type or separation process. Although many clustering methods exist, k-means has the great strength of being simpler than other existing models while showing good performance in a wide variety of problems.

Table 1.
k-means: 2. Less computation compared with other clustering methods; 3. Fast and applicable to a wide range of problems; 4. Necessary to specify the number of initial clusters.
SOM: 1. Excellent clustering result; 2. Easy evaluation of groups by visual inspection; 3. Necessary to specify the number of initial clusters.
Mixture models: 1. Ability to model a mixture of both continuous and categorical data; 2. Provides the probability that a given point belongs to each of the possible clusters.
Spectral clustering: 1. Allows more flexible distance metrics and performs well; 2. Requires specialized machines with large memory to compute the full graph Laplacian matrix (quadratic/super-quadratic complexity in the number of data points).
Embedded clustering [24]: 1. Able to simultaneously learn feature representations and clustering assignments using deep neural networks; 2. Less sensitive to the choice of hyperparameters.

Customer segmentation can also be used to increase the accuracy of the customer baseline and to select appropriate customers for DR. Zhang et al. [7] proposed clustering by k-means before baseline estimation and demonstrated improved results. To address clustering structure issues, some studies have employed two-stage clustering methods [14,15] similar to the methodology proposed in this study, showing that this structure can reflect all the load factors (i.e., voltage, residential type, consumption, and pattern) better than the structures prevalent in the literature. However, these studies did not perform load profile segmentation for targeted DR enrollment; they focused only on grouping similar patterns, which has the limitation of large variation in customer daily consumption within a group. It is therefore difficult to use the existing models as they are in this study, and we adopt a two-stage methodology that reflects the load characteristics affecting DR in the first stage.
Commonly, optimization methods are utilized for customer targeting in DR program, and there are some studies on this without load profile segmentation [25,26]. Kwac et al. [25] proposed solving the stochastic knapsack problem (SKP) as a means to recruit optimal customers for DR programs. Zhou et al. [26] designed an adaptive targeting method to estimate DR effects.
This paper describes a customer targeting and DR analysis model through a two-stage clustering analysis. The proposed methodology will enable the effective selection of customers for DR programs and illustrate a better DR effect than in opt-in enrollment.

Targeting Customers for Incentive DR Using a Two-stage Load Profile Clustering Method
Selecting and recruiting appropriate customers for DR programs is essential for the successful operation of incentive-based DR. DR potential can be estimated by analyzing customer load characteristics. In this study, we derived adequate customer groups for residential DR from demand data through load profile segmentation. There are many clustering methods, such as k-means, SOM, fuzzy clustering, Gaussian mixture models (GMMs), and hierarchical clustering. We adopted k-means in view of its simplicity and accuracy, and designed the load profile segmentation framework as a two-stage methodology that considers load characteristics in the first stage and load profile values in the second stage.

k-means
k-means is a popular method for cluster analysis in data mining and is commonly employed for electricity demand clustering. It is a simple and robust algorithm that aims to separate N observations into K clusters [15,27]. Given a dataset $X = \{x_1, x_2, \ldots, x_N\}$ (with $x_i \in \mathbb{R}^n$) and K clusters $C = \{C_1, C_2, \ldots, C_K\}$, each $x_i \in X$ is assigned to exactly one cluster $C_k \in C$, which is characterized by a cluster centroid $\mu_k$. The classical k-means clustering method proceeds as follows. First, the integer K corresponding to the number of clusters is determined. Then, the initial cluster centroid set $\mu_1, \mu_2, \ldots, \mu_K$ is selected randomly. Each data point $x_i \in X$ is assigned to the closest $\mu_k$ by comparing its Euclidean distance to each of $\mu_1, \mu_2, \ldots, \mu_K$. The assignment of data points to clusters is given by Equation (1):

$$C_k = \left\{ x_i \in X : \lVert x_i - \mu_k \rVert \le \lVert x_i - \mu_j \rVert \ \ \forall j \in \{1, \ldots, K\} \right\} \tag{1}$$

The clustering algorithm aims to minimize the sum of squares within the groups and maximize it between the groups. The cost function J to be minimized in k-means is therefore expressed by Equation (2):

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \tag{2}$$

The cluster centroids are updated by calculating the mean of the data points belonging to cluster $C_k$, as given by Equation (3):

$$\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i \tag{3}$$

This process is repeated until the assignment of data points to clusters no longer changes; in other words, until the cluster centroids do not change.
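The iteration described above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the study: the function names `kmeans` and `cost`, the fixed `seed`, the iteration cap, and the toy data are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initial centroids: k data points drawn at random (seed is an assumption).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # memberships unchanged, so centroids are too
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its cluster members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def cost(points, labels, centroids):
    """Within-cluster sum of squared distances (the cost function J)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(data, k=2)
total_cost = cost(data, labels, centroids)
```

Because the initial centroids are random, the numeric cluster labels are arbitrary; only the grouping of points is meaningful.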

Methodology for Customer Targeting Based on Two-Stage Clustering Method in Efficient DR Operation
The framework used to segment customers into groups based on load profile and to determine appropriate groups for incentive-based DR program participation is depicted in Figure 1. First, load data is collected for load profile clustering. Subsequently, we perform data preprocessing comprising data selection (i.e., exclude weekends, holidays, and event days from the data) and cleansing (i.e., replace missing data and delete incomplete customer data). After data preprocessing, a two-stage load profile clustering is performed to segment residential DR customers in accordance with electricity consumption characteristics and their load profile.
A load profile carries information such as peak time, peak duration, and electricity consumption, from which one can estimate approximately how much demand a customer is able to reduce. This information is therefore an important factor in determining which customers can reduce the most demand during the implementation of the DR program. These characteristics should be extracted from the load profile and treated as variables in the clustering method. Therefore, the characteristics (i.e., daily consumption and peak time) are considered in the first stage of clustering. In the second stage, the classification variable is the normalized load profile. Suitable DR participation groups are then derived by analyzing the segmentation results, from which the distributions of peak time, average consumption, and peak demand scale can be obtained. After the target groups are selected, a DR effect analysis is conducted to verify the effect of targeted enrollment. This analysis yields the demand reduction capacity per customer under targeted enrollment, and the results are compared with those obtained under opt-in enrollment into the DR program. When a clustering method is applied, considering many variables does not always produce reliable results, so the essential variables must be included strategically. When many variables are nonetheless needed to segment customers well, a method to deal with this problem should be devised. In this study, we improve load profile clustering performance by applying our proposed methodology. Figure 2 explains the proposed two-stage load profile clustering algorithm.
Before load profile segmentation, load characteristics are derived from the load profile by feature selection (i.e., the selection of a subset of relevant features). The features used as clustering input variables are selected through correlation analysis. When deriving relevant features from the load profile, we consider factors that affect effective DR operation (i.e., daily consumption, peak time, and the difference between peak demand and minimum demand).
The next step is to normalize the load characteristics for the first-stage segmentation and the load profiles for the second-stage segmentation, rather than using raw data. Normalization transforms each value to a number from 0 to 1 and can improve performance by placing the input values on a common scale. The load characteristics were normalized per variable, whereas the load profiles were normalized per customer. Min-max normalization was used, as illustrated by Equation (4) for the case of load profile normalization:

$$d'_{i,t} = \frac{d_{i,t} - \min_{t} d_{i,t}}{\max_{t} d_{i,t} - \min_{t} d_{i,t}} \tag{4}$$

where $i$, $t$, $d_{i,t}$, and $d'_{i,t}$ are the customer, the time, the demand of customer $i$ at time $t$, and the min-max normalization result, respectively. After normalization, segmentation based on load characteristics is performed by the k-means method before the load profile segmentation, as explained in Figure 2. This process separates customers based on their consumption scale and peak times; in other words, it segments customers over a large range. These components are chosen because consumption scale is an indicator of how much a customer can reduce demand, and whether a customer's peak time falls within an event indicates whether the customer is at home during the event. The next step is customer segmentation based on load profiles, which is conducted for all members of each group obtained from the first-stage clustering analysis. The benefit of separating the clustering into two k-means stages is that the features are reflected better than in basic k-means.
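The normalization and the two clustering stages can be sketched as follows, using only the Python standard library. This is an illustrative outline under stated assumptions, not the study's code: the helper names (`min_max`, `kmeans_labels`, `two_stage`, `profile`), the random seed, the cluster counts `k1` and `k2`, and the four synthetic profiles are all hypothetical.

```python
import math
import random


def min_max(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def kmeans_labels(points, k, seed=0, max_iter=100):
    """Compact k-means that returns one cluster label per point."""
    centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def two_stage(profiles, k1, k2):
    """Return one (stage-1 label, stage-2 label) pair per customer."""
    # Stage 1: load characteristics, each normalized across customers.
    daily = min_max([sum(p) for p in profiles])
    peak = min_max([max(range(len(p)), key=lambda h: p[h]) for p in profiles])
    stage1 = kmeans_labels(list(zip(daily, peak)), k1)
    # Stage 2: load profiles, each normalized within its own customer.
    shapes = [min_max(p) for p in profiles]
    result = [None] * len(profiles)
    for g in set(stage1):
        idx = [i for i, lab in enumerate(stage1) if lab == g]
        k = min(k2, len(idx))  # a small group cannot hold k2 clusters
        sub = kmeans_labels([shapes[i] for i in idx], k)
        for i, lab in zip(idx, sub):
            result[i] = (g, lab)
    return result


def profile(base, bumps):
    """24 hourly values: a flat base with extra load at the given hours."""
    return [base + bumps.get(h, 0.0) for h in range(24)]


# Synthetic customers: two small evening-peak users, two large morning-peak users.
profiles = [
    profile(0.1, {19: 0.9}),
    profile(0.1, {19: 0.9, 7: 0.7}),
    profile(1.0, {8: 4.0}),
    profile(1.0, {8: 4.0, 20: 3.0}),
]
groups = two_stage(profiles, k1=2, k2=2)
```

In this toy case the first stage separates the small evening-peak customers from the large morning-peak customers, and the second stage then distinguishes the two profile shapes inside each group.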
The main goal of this analysis is to determine how to produce the most significant effect with suitable customers enrolled in the DR program. To achieve this goal, we propose two standards for selecting customer groups with high DR potential. If a customer's peak demand occurs during an event, the likelihood that the customer is at home is relatively high, and it may be argued that such customers tend to be able to reduce their demand effectively. However, this is not an absolute indicator. In some cases, for instance, the demand of some customers could be high although their peak demand time does not remain constant, or some customers may register an insignificant demand reduction although their peak demand times remain constant. Therefore, we stipulate the following criteria to determine the target groups:
1. Customer groups with high electricity consumption;
2. Customer groups whose peak demand times are similar to the event times.
After the load profile segmentation, the results are analyzed using boxplots to exclude customer groups that are inappropriate for DR program participation.
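As a rough illustration of how the two criteria might be applied programmatically, the sketch below keeps a cluster when its median daily consumption is at least the overall median and at least half of its members peak inside the event window. The thresholds, the function name `select_target_groups`, the 17:00 to 20:00 window, and the sample records are assumptions for the example; the study itself selects groups by inspecting boxplots.

```python
from collections import defaultdict
from statistics import median

# Assumed event window of 17:00 to 20:00, i.e., hours 17, 18, and 19.
EVENT_HOURS = range(17, 20)


def select_target_groups(customers, min_share=0.5):
    """customers: list of (cluster_id, daily_kwh, peak_hour) records.
    Returns the sorted ids of clusters that meet both criteria."""
    overall = median(c[1] for c in customers)
    groups = defaultdict(list)
    for cluster_id, daily_kwh, peak_hour in customers:
        groups[cluster_id].append((daily_kwh, peak_hour))
    targets = []
    for cluster_id, members in groups.items():
        # Criterion 1: high consumption relative to all customers.
        high_use = median(d for d, _ in members) >= overall
        # Criterion 2: share of members whose peak hour falls in the event window.
        share = sum(h in EVENT_HOURS for _, h in members) / len(members)
        if high_use and share >= min_share:
            targets.append(cluster_id)
    return sorted(targets)


# Hypothetical records, not data from the pilot.
customers = [
    ("A", 12.0, 18), ("A", 14.0, 19), ("A", 11.0, 8),
    ("B", 13.0, 9), ("B", 15.0, 22),
    ("C", 4.0, 18), ("C", 5.0, 19),
]
targets = select_target_groups(customers)
```

Here cluster A passes both tests, B fails the peak-time test, and C fails the consumption test.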

Internal Evaluation of the Clustering Method
After customer segmentation is performed via clustering, the accuracy of the clustering result should be assessed. Evaluation methods are commonly divided into external and internal processes [28]. In external evaluation, the result is assessed by comparison with the actual value. Internal evaluation is normally used when the data does not contain actual values; thus, the assessment is based on the idea that good results have minimum distance within clusters and maximum distance between clusters (i.e., high intracluster similarity and low intercluster similarity). Although many evaluation methods are available, we consider only internal evaluations, since they measure the goodness of the clustering structure without reference to external information (i.e., labels or actual results). Among these, we use the Davies-Bouldin index (DBI) and the Dunn index (DI). The DBI is an internal evaluation method that quantifies clustering quality. It evaluates customer segmentation based on the similarity between clusters and is calculated as follows:

$$DBI = \frac{1}{n} \sum_{i=1}^{n} \max_{j \ne i} \left( \frac{\sigma_i + \sigma_j}{d(\mu_i, \mu_j)} \right)$$

where $n$, $\mu_i$, $\sigma_i$, and $d(\mu_i, \mu_j)$ are the number of clusters, the centroid of cluster $i$, the average distance between $\mu_i$ and all objects in cluster $i$, and the Euclidean distance between $\mu_i$ and $\mu_j$, respectively. Its output is a single number, and lower values indicate better clustering performance. The DI is another internal evaluation method that quantifies clustering quality. The indicator measures how well clusters are separated and how dense they are. It can be formulated as follows:

$$DI = \frac{\min_{1 \le i < j \le n} d(i, j)}{\max_{1 \le k \le n} d'(k)}$$

where $d(i, j)$ and $d'(k)$ are the distance between the centroids of clusters $i$ and $j$ and the distance between objects within cluster $k$, respectively. Its output is a single number, and larger values indicate better clustering performance.
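Both indices can be computed directly from their definitions. The sketch below is a standard-library illustration with hypothetical function names; for the Dunn index it takes the within-cluster distance to be the largest pairwise distance in a cluster (its diameter), which is one common reading and an assumption here.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Component-wise mean of the points in one cluster."""
    return [sum(c) / len(cluster) for c in zip(*cluster)]


def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points. Lower is better."""
    mus = [centroid(c) for c in clusters]
    # sigma_i: average distance from the centroid to the members of cluster i.
    sigmas = [sum(math.dist(p, mu) for p in c) / len(c) for c, mu in zip(clusters, mus)]
    n = len(clusters)
    total = 0.0
    for i in range(n):
        total += max(
            (sigmas[i] + sigmas[j]) / math.dist(mus[i], mus[j])
            for j in range(n) if j != i
        )
    return total / n


def dunn(clusters):
    """Smallest centroid separation over largest cluster diameter. Higher is better.
    Assumes at least one cluster contains two or more points."""
    mus = [centroid(c) for c in clusters]
    separation = min(math.dist(a, b) for a, b in combinations(mus, 2))
    diameter = max(math.dist(p, q) for c in clusters for p, q in combinations(c, 2))
    return separation / diameter


# Two toy clusters with centroids (0, 1) and (4, 1).
clusters = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
dbi = davies_bouldin(clusters)  # (1 + 1) / 4 for each cluster, averaged: 0.5
di = dunn(clusters)             # separation 4 over diameter 2: 2.0
```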

Cost-effective Analysis
From the perspective of operations research (OR), cost-effectiveness analysis is an important component. The benefit of customer targeting through load profile segmentation in DR operation is a reduction in operation cost. Thus, we need to identify the difference in cost between opt-in and targeted recruitment. We evaluated it using the cost-effectiveness test, an economic analysis method usually performed before public project investment [29]. The test is divided into the Total Resource Cost (TRC) test, the Program Administrator Cost (PAC) test, the Ratepayer Impact Measure (RIM) test, and the Participant Cost Test (PCT). We considered the PAC test to assess the cost effect of DR customer targeting from the perspective of the DR operator (i.e., utility or ISO). The lists of costs and benefits should be defined before economic effectiveness is estimated. The list for analysis from the perspective of DR operators is specified in Table 2.
To determine whether a utility project is appropriate for investment, each cost and benefit item should be calculated. If the cost-effectiveness test result is positive, the project is profitable; otherwise, it is not.
Avoided energy cost is the benefit of decreasing the amount of power purchased as electricity consumption is reduced. It can be formulated as follows:

$$\text{Avoided energy cost} = ER \times ARU$$

where ER and ARU are the amount of power reduction and the unit cost of energy avoidance (i.e., the average system marginal price (SMP) during the DR event), respectively. Avoided transmission and distribution cost is the benefit of reducing the need for transmission and distribution construction as a result of decreasing annual peak demand. It can be formulated as follows:

$$\text{Avoided transmission and distribution cost} = PR \times (ATU + ADU)$$

where PR, ATU, and ADU are the peak reduction capacity in the power system, the unit cost of transmission construction avoidance, and the unit cost of distribution construction avoidance, respectively. On the cost side, the list contains revenue loss from changes in sales, incentives, and the costs of DR system operation, measurement, evaluation, marketing, and education. Revenue loss from changes in sales is a cost because the utility company sells less power to customers by the amount of the DR reduction. Incentive cost is the cost to the utility company of paying incentives to DR participants for their demand reduction. Measurement, evaluation, marketing, and education costs are included in the DR operation cost, and we assume that these costs are proportional to the number of DR customers.
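The benefit and cost items above can be combined into a single net-benefit figure, as sketched below. This is a hypothetical illustration: the function name `pac_net_benefit`, the assumption that avoided transmission and distribution cost is PR times the sum of ATU and ADU, the assumption that revenue loss and incentives are both proportional to the energy reduction, and every number in the example are invented for demonstration and are not taken from the pilot.

```python
def pac_net_benefit(er_kwh, aru, pr_kw, atu, adu,
                    retail_rate, incentive_rate, cost_per_customer, n_customers):
    """Benefits minus costs from the DR operator's perspective (all in KRW)."""
    # Benefits.
    avoided_energy = er_kwh * aru          # ER x ARU
    avoided_td = pr_kw * (atu + adu)       # assumed form: PR x (ATU + ADU)
    # Costs (proportionality to er_kwh is an assumption of this sketch).
    revenue_loss = er_kwh * retail_rate
    incentives = er_kwh * incentive_rate
    operation = cost_per_customer * n_customers
    return (avoided_energy + avoided_td) - (revenue_loss + incentives + operation)


# Same total reduction achieved with 10 targeted customers versus 30 opt-in
# customers; all values are made up for illustration.
targeted = pac_net_benefit(100, 90, 50, 200, 100, 110, 50, 20, 10)
opt_in = pac_net_benefit(100, 90, 50, 200, 100, 110, 50, 20, 30)
```

With an identical reduction, the only difference between the two cases is the per-customer operation cost, so recruiting fewer, better-suited customers yields the larger net benefit in this toy setting.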

Load Profile Segmentation for Effective DR Program Operation in Korea
DR options in Korea have mostly been unavailable to residential customers and have been implemented only for commercial and industrial customers. However, utility companies have recently attempted to attract residential customers by changing their policies and opening DR programs to them. The Korea Electric Power Corporation (KEPCO), a utility in Korea, conducted a peak-time rebate (PTR) pilot program with 10 events from November 2017 to February 2018 to develop an appropriate residential DR program for Korea [30]. The pilot was performed with about 800 residential customers living in Seoul, Korea. The PTR program is an incentive-based DR program designed to mitigate peak demand by reducing participant demand in accordance with the utility's notification. After a DR event, the PTR program pays incentives based on the amount of demand reduction that participants achieved after receiving the notification. The PTR program imposes no penalty, and it can help customers who pay a flat electricity price recognize that the price of electricity varies over time.
Although this PTR program is designed for opt-in customers, targeted enrollment to select residential customers with high DR potential is necessary to improve the benefits of the DR program. Therefore, we analyze residential customer demand data from the PTR pilot program and apply the two-stage clustering methodology discussed in the previous section. From this study, we obtain customer clusters according to load pattern and consumption and select suitable groups for efficient DR operation through an analysis of group characteristics. Finally, we identify the actual demand reduction effect in the case of opt-in operation and targeted enrollment operation by applying residential customer data during an actual PTR event.

Input Data
This study was conducted using residential demand data. We obtained hourly demand data for residential customers in the Korea Electric Power Corporation (KEPCO) service area. This data covered 847 residential customers, all of whom participated in the PTR pilot program. The data covered the period from November 2017 through February 2018, during which the PTR pilot program operated from mid-January to the end of February. The PTR events occurred on nine days from 17:00 to 20:00. The average hourly demand of residential PTR program customers in Seoul, Korea is illustrated in Figure 3.

Data Preprocessing: Feature Selection
In data preprocessing, missing-data imputation, deletion of invalid data, selection of eligible days, and dimension reduction are performed to obtain reasonable results. We conducted feature selection as part of the dimension reduction for DR potential. When selecting features, we considered which factors affect DR reduction, as follows:
1. Is the customer at home and able to contribute to peak reduction during the DR event?
2. Does the customer have an incentive to reduce demand because of large usage?
3. Is the reducible capacity large compared to the base load?
Therefore, we selected three features (i.e., daily consumption, peak hour, and the difference between maximum and minimum demand) from the load profile as principal factors. Redundant features should then be removed through correlation analysis among these features. As a result of the correlation analysis, we selected two features (daily consumption and peak hour) for the first-stage clustering based on demand characteristics. The correlation analysis between the demand characteristic features is illustrated in Figure 4.
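The feature derivation and correlation check can be sketched as follows. This is a minimal standard-library illustration, assuming 24 hourly values per customer and the Pearson coefficient as the correlation measure (the text does not state which coefficient was used); the helper names and the synthetic profiles are hypothetical.

```python
from statistics import mean


def load_features(profile):
    """Return (daily consumption, peak hour, max-min difference) for one
    24-value hourly load profile."""
    daily = sum(profile)
    peak_hour = max(range(len(profile)), key=lambda h: profile[h])
    spread = max(profile) - min(profile)
    return daily, peak_hour, spread


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def make_profile(base, peak_hour, peak):
    """Synthetic profile: flat base load with a single peak hour."""
    return [peak if h == peak_hour else base for h in range(24)]


# Three made-up customers whose peak size grows with their base load.
profiles = [make_profile(0.2, 19, 1.0), make_profile(0.4, 8, 2.0), make_profile(0.6, 21, 3.0)]
features = [load_features(p) for p in profiles]
daily, peak_hours, spreads = zip(*features)
r = pearson(daily, spreads)
```

In this synthetic data the max-min difference rises in step with daily consumption, so the two features are almost perfectly correlated; a feature that duplicates another in this way is a natural candidate for removal before clustering.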


Load Profile Segmentation of Residential DR Customers
When load profile clustering is conducted for customer segmentation, it is essential to determine the optimal number of clusters in the data. We used the NbClust package in the R statistical software to estimate the number of clusters, following Charrad et al. [31]. This package provides 30 indices for determining the number of clusters in a data set and proposes the best clustering scheme [32]. NbClust also provides Hubert statistic values and D index values, which offer a graphical method for determining the number of clusters. The optimal number of clusters is located at the peak in the plots of the second differences of these indices; this indicated six clusters for the first-stage clustering based on demand characteristics (i.e., daily consumption, peak demand time, and difference between peak and minimum demand). Figure 5 shows the Hubert index and D index results.
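The idea of reading the number of clusters off the peak of a second-differences plot can be illustrated with a small Python sketch. It is not the NbClust implementation, whose exact definition of second differences may differ; the helper names and the index values below are hypothetical and chosen only to show a knee at six clusters.

```python
def second_differences(values):
    """Discrete second differences v[i+1] - 2*v[i] + v[i-1] for interior points."""
    return [values[i + 1] - 2 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]

def k_at_peak(ks, index_values):
    """Return the candidate k whose second difference is largest."""
    d2 = second_differences(index_values)
    best = max(range(len(d2)), key=lambda i: d2[i])
    return ks[best + 1]  # d2[i] belongs to the interior point ks[i + 1]

# Hypothetical index curve for k = 2..10 (illustrative values only).
ks = list(range(2, 11))
index_values = [10.0, 8.5, 7.2, 6.0, 3.0, 2.8, 2.6, 2.5, 2.4]
best_k = k_at_peak(ks, index_values)
```

In this made-up curve the index drops sharply up to k = 6 and flattens afterwards, so the second difference peaks there. The first and last candidate values of k have no second difference and therefore cannot be selected by this rule.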
After the first-stage clustering, second-stage clustering based on demand patterns was conducted. The optimal numbers of clusters for the six groups separated by demand characteristics were 3, 2, 2, 2, 2, and 2, respectively. We therefore separated the residential customers who participated in the PTR pilot program into 13 groups according to load patterns and consumption.
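The two-stage procedure can be sketched in plain Python as below. This is an illustrative reading of the description, not the authors' implementation: the `kmeans` helper uses a deterministic farthest-point initialization chosen here for reproducibility, the inputs are assumed to be already scaled, and the toy data are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    """Lloyd's k-means with farthest-point initialization; returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def two_stage_kmeans(features, patterns, k1, k2_per_group):
    """Stage 1: cluster on demand characteristics. Stage 2: re-cluster each group on load patterns."""
    stage1 = kmeans(features, k1)
    final = [None] * len(features)
    next_id = 0
    for g in range(k1):
        idx = [i for i, l in enumerate(stage1) if l == g]
        if not idx:
            continue
        k2 = min(k2_per_group[g], len(idx))
        sub = kmeans([patterns[i] for i in idx], k2)
        for i, s in zip(idx, sub):
            final[i] = next_id + s
        next_id += k2
    return final

# Hypothetical toy data: two consumption levels, each with two load shapes.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
patterns = [(1, 0), (1, 0.1), (0, 1), (0.1, 1), (1, 0), (0.9, 0), (0, 1), (0, 0.9)]
labels = two_stage_kmeans(features, patterns, k1=2, k2_per_group=[2, 2])
```

With six first-stage groups and second-stage cluster counts of 3, 2, 2, 2, 2, and 2, the same structure would yield the 13 groups reported in the text. In practice the per-group cluster counts would come from a separate model-selection step for each first-stage group.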
We performed load profile segmentation with our proposed method and then compared the resulting customer segmentation against other clustering models. We examined 12 methods, including our proposed method. The remaining clustering methods are based on the fundamental k-means, SOM [16,17,33], and FCM [34][35][36] methodologies, with the classification variables being (1) demand characteristics, (2) load patterns, or (3) both characteristics and load patterns.
To compare the results, we used two internal evaluation measures, the Davies-Bouldin index (DBI) and the Dunn index (DI). The DBI and DI results of the clustering methods are presented in Table 3. The proposed methodology showed the best result according to Table 3, so we conclude that it is appropriate. We attribute the better result to the following. The two-stage framework allows the impact of each feature to be reflected separately: the first stage performs a rough segmentation on the factors affecting DR reduction, and the second stage then separates customers within each group by load pattern. Clustering methods generally separate data based on distances between input variables. An undesirable result can therefore arise when many input variables are used unnecessarily, because the effect of each variable becomes difficult to verify. The proposed methodology, by contrast, considers all variables while splitting the clustering into two stages, which explains the strong result of two-stage k-means clustering. The proposed method separates residential customers into 13 groups, whose load profiles are illustrated in Figure 6. Groups 1 through 13 contain 14, 15, 25, 76, 56, 38, 85, 120, 88, 98, 68, 85, and 79 customers, respectively. The load profiles of the 13 groups showed morning peaks, evening peaks, nighttime peaks, and dual morning and night peaks. Residential customers do not usually consume electricity during the daytime, so these peak characteristics are consistent with residential load profiles.
Table 3. DR operation clustering result evaluation according to variable selection and clustering structure.

Index                                           DBI      DI       Ranking
One-stage k-means (Characteristics)             0.5007   1.0084   2/10
One-stage k-means (Characteristics, Pattern)    1.2108   1.2065   3/9
One-stage k-means (Pattern)
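For readers unfamiliar with the two evaluation measures, the following Python sketch shows one common formulation of each. It is an assumption that these match the variants used for Table 3; the Dunn index in particular has several definitions, and the one below uses the minimum point-to-point separation over the maximum cluster diameter. The two toy clusters are hypothetical.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def davies_bouldin(clusters):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j); lower is better."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    n = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(n) if j != i)
        for i in range(n)
    ) / n

def dunn(clusters):
    """Smallest between-cluster distance over largest within-cluster diameter; higher is better."""
    n = len(clusters)
    min_sep = min(dist(p, q)
                  for i in range(n) for j in range(i + 1, n)
                  for p in clusters[i] for q in clusters[j])
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam

# Hypothetical, well-separated toy clusters.
clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
dbi = davies_bouldin(clusters)
di = dunn(clusters)
```

Because a lower DBI and a higher DI both indicate compact, well-separated clusters, the two measures can rank the same set of methods differently, which is one reason to report both.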

Customer Targeting for DR Operation
Using the load profile segmentation result from the previous section, we select appropriate customers for DR participation in order to study efficient DR operation. The 13 load patterns are shown in Figure 7. DR potential can be estimated from the 13 load patterns using both the load profile and the amount of consumption. If customers consume little electricity, their DR participation is inefficient for the DR operator even if they have a pattern suitable for DR (i.e., nighttime peak or dual morning and night peaks). The DR operator therefore considers both factors. To reflect them, we conducted a boxplot analysis: boxplots of peak time (i.e., the hour of the day at which maximum demand occurs) and of average consumption for the 13 groups are illustrated in Figure 8.
First, we eliminated groups whose peak demand occurrence times were inconsistent with the event times; these were groups 6, 7, 8, and 9. Next, we removed groups with low electricity consumption, as they are economically unsuitable; these were groups 4, 5, 12, and 13. We therefore recommend that utility companies operate the PTR program with the remaining groups, namely groups 1, 2, 3, 10, and 11. The total number of customers in this targeted enrollment scenario was 220.
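The two elimination steps and the resulting customer count can be checked with a short Python sketch. The group sizes and excluded groups are taken from the text; the function name `select_targets` is a hypothetical helper introduced only for illustration.

```python
def select_targets(group_sizes, *excluded):
    """Drop every group listed in any exclusion set; return remaining groups and their customer count."""
    dropped = set().union(*excluded)
    targets = sorted(g for g in group_sizes if g not in dropped)
    return targets, sum(group_sizes[g] for g in targets)

# Customers per group, as reported for the 13 segments.
group_sizes = {1: 14, 2: 15, 3: 25, 4: 76, 5: 56, 6: 38, 7: 85,
               8: 120, 9: 88, 10: 98, 11: 68, 12: 85, 13: 79}
peak_mismatch = {6, 7, 8, 9}        # peak time inconsistent with event hours
low_consumption = {4, 5, 12, 13}    # too little usage to be economical

targets, n_customers = select_targets(group_sizes, peak_mismatch, low_consumption)
```

The remaining groups are 1, 2, 3, 10, and 11, whose sizes add up to 220 customers, and the 13 group sizes together add up to the 847 pilot participants.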

After identifying the target groups, we calculated the amount of demand reduction using the actual PTR event data to quantify the effect of customer targeting. There were 847 DR participants and nine event days on which the utility company notified residential PTR customers to reduce demand. In the opt-in enrollment case, we considered all 847 residential customers to be participating in the PTR pilot program; in the targeted enrollment case, participation is limited to the groups of customers able to reduce their demand more than other groups during events. The targeted enrollment group contained 220 customers, which differs from the number in the opt-in enrollment group. To compare the two types of enrollment, we therefore calculated the average demand reduction on event days per customer in both cases. Because the customer baseline load (CBL) must be estimated to determine the demand reduction attributable to a DR event, we applied the Max 4 of 5 method, which has been used for the PTR program in Korea [30]. The Max 4 of 5 method estimates the CBL by averaging the four highest-demand days among five eligible days, where eligible days exclude weekends, event days, and holidays. The average demand reduction on event days per customer for opt-in enrollment, targeted enrollment, and the 13 groups is presented in Table 4. Average demand reductions for the opt-in and targeted enrollment programs were 0.2620 kWh and 0.3496 kWh, respectively, a difference of 0.0876 kWh. The electricity consumption for 6~8 PM was 1.3569 kWh. The demand reduction ratios relative to typical demand during events were 19.31% and 25.76%, respectively. This is an improvement of 6.45 percentage points, with targeted enrollment increasing demand reduction by 33.44% relative to opt-in enrollment. Thus, it is considerably more efficient to operate the DR program with customers who have larger DR potential, as defined in this study.
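A minimal Python sketch of the Max 4 of 5 baseline and of the reported ratios is given below. It assumes that "highest-demand days" are ranked by total demand over the hours supplied and that the baseline is the hour-by-hour average of the four selected days; the source does not spell out these details, and the function names are hypothetical.

```python
def cbl_max4of5(eligible_days):
    """Average, hour by hour, the four highest-demand days among five eligible days."""
    if len(eligible_days) != 5:
        raise ValueError("Max 4 of 5 needs exactly five eligible days")
    top4 = sorted(eligible_days, key=sum, reverse=True)[:4]
    return [sum(hour) / 4 for hour in zip(*top4)]

def demand_reduction(cbl, actual):
    """Total reduction over the event window: baseline minus metered demand."""
    return sum(b - a for b, a in zip(cbl, actual))

# Reported averages per customer (kWh).
opt_in, targeted, window_demand = 0.2620, 0.3496, 1.3569

opt_in_ratio = round(100 * opt_in / window_demand, 2)          # share of window demand
targeted_ratio = round(100 * targeted / window_demand, 2)
relative_gain = round(100 * (targeted - opt_in) / opt_in, 2)   # targeted vs. opt-in
```

Evaluating the three expressions reproduces the reported 19.31%, 25.76%, and 33.44%. The 6.45 figure is the difference between the two rounded ratios, so it is a difference in percentage points rather than a relative change.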
Additionally, we conducted a cost-effectiveness analysis of managing the DR program in two cases: residential customers who opt in to the DR program, and targeted residential customers who have large DR potential. To identify the cost-effectiveness of DR customer targeting, we assume that the demand reduction of the targeted customers is the same as that of the actual DR participants. The economic analysis follows the California Standard Practice Manual and is performed from the perspective of the DR operator [29]. A total of 847 households participated in the PTR pilot program, with a total average reduction of 221.914 kWh, and we determined 635 households (75% of the total participants) as DR targeting participants.
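The household figures above are consistent with the following arithmetic, shown here as a hedged Python sketch. The source does not state how the 635 households were derived; the reading below, in which targeted customers must deliver the same total reduction as all opt-in participants, is an assumption that happens to reproduce the reported numbers. The function name is hypothetical.

```python
import math

def households_for_same_reduction(participants, opt_in_kwh, targeted_kwh):
    """Return (total opt-in reduction, targeted households needed to match it)."""
    total = participants * opt_in_kwh
    return total, math.ceil(total / targeted_kwh)

total_kwh, households = households_for_same_reduction(847, 0.2620, 0.3496)
share = households / 847  # fraction of the original participants
```

Under this assumption, 847 customers at 0.2620 kWh each give 221.914 kWh in total, and matching that total at 0.3496 kWh per customer requires 635 households, about 75% of the original enrollment.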
Customer operation cost decreased because of the reduced number of customers, and the benefit increased by 437,256 KRW (at an exchange rate of 1100 KRW), a 108.58% increase in benefit over the existing economic analysis result. The changes in the economic analysis due to customer targeting are presented in Table 5.

Conclusions
We presented a DR customer selection methodology for a Korean residential DR program that maximizes the DR effect with lower customer enrollment. The proposed method performed better than the other methods examined. The method has two parts: customer segmentation according to load profile and consumption, and targeted group selection based on two criteria for DR participation. For customer segmentation, we introduced a two-stage clustering method in which customers are first clustered using demand characteristics as variables and then segmented by load pattern. This reflects more features of residential demand data than existing clustering methods and thereby yields better customer segmentation. Customer groups were then classified by peak time and consumption to select the groups with large potential in the PTR program. As a result, the targeted groups in our sample of residential customers in Korea were groups 1, 2, 3, 10, and 11. Their average demand reduction was 0.3496 kWh, an improvement of 0.0876 kWh, or 33.44%, over the demand reduction under opt-in enrollment. The proposed method thus identified enhanced DR effects. After estimating the demand reduction under DR targeting, we also conducted a cost-effectiveness analysis of the PTR program from the perspective of the DR operator.
As a result, we observed that a target DR capacity can be achieved with a smaller number of customers if targeted enrollment is implemented, which allows infrastructure and operation costs to be used effectively. These results provide insights into the efficient use of DR in Korea. The number of customers and the total DR capacity under targeted enrollment decreased compared with opt-in enrollment. However, if enough customers wish to participate when the official full-scale program starts, selecting the optimal customers among them will become even more important. The proposed method would therefore be of great help in ensuring an efficient and economically sensible DR program in Korea.
In this paper, we considered residential customer targeting based on customer segmentation in demand response. Our customer segmentation focuses on a model structure that reflects the features affecting demand response well. Some studies in other areas combine clustering models with heuristic algorithms, and we will apply this concept in further study.
Author Contributions: Conceptualization, E.L. and J.K.; data analysis, simulation, and methodology framework development, E.L.; writing, review, and editing, J.K.; supporting data collection and comments for improving the article, D.J. All authors have read and agreed to the published version of the manuscript.