An Incremental Clustering Algorithm with Pattern Drift Detection for IoT-Enabled Smart Grid System

The IoT-enabled smart grid system provides smart meter data that record the energy consumption behaviors of electricity consumers, whose typical features can be represented by the load patterns extracted through load data clustering. Because consumption behaviors change, load patterns must be updated to achieve accurate consumer segmentation and effective demand response. To save training time and reduce the computation scale, we propose a novel incremental clustering algorithm with probability strategy, ICluster-PS, which updates load patterns without clustering the overall load data again. ICluster-PS first conducts new load pattern extraction based on the existing load patterns and new data. Then, it intergrades the new load patterns with the existing ones. Finally, it optimizes the intergraded load pattern set by a further modification. Moreover, ICluster-PS can be performed continuously on newly arriving data owing to parameter updating and generalization. Extensive experiments are conducted on a real-world dataset containing diverse consumer types in various districts. The experimental results are evaluated by both clustering validity indices and accuracy measures, which indicate that ICluster-PS outperforms other related incremental clustering algorithms. Additionally, according to further case studies on pattern evolution analysis, ICluster-PS is able to present pattern drifts through its incremental clustering results.


Introduction
The smart grid system has been developing with the integration of many new technologies, such as the Internet of Things (IoT), Blockchain, and Artificial Intelligence (AI) [1][2][3]. Diverse IoT devices and frameworks are applied to the smart grid to support data collection, transmission [4], real-time monitoring [5], etc. Blockchain technologies can provide decentralization, trust, and an incentive mechanism for improving the cybersecurity of the smart grid system [6][7][8]. AI methods, including machine learning and deep learning, are usually used to process and analyze data for decision-making, such as electric load forecasting [9,10], electric consumer categorization [11], and anomaly detection [12]. In such a smart grid system, the smart meter is an essential IoT device that records energy consumption data for further understanding, managing, planning, and optimizing the power demands of electric consumers [13,14].
Smart meter data, also called electricity load data, are data streams that record the electricity consumption behaviors of consumers at regular intervals. They can be used for various studies and applications in the smart grid, such as load forecasting, load profiling [15], anomaly detection [16,17], consumer categorization [18], and energy disaggregation [19]. In studies of load profiling, one significant purpose is to extract the typical electricity consumption patterns of every consumer, which are usually called load patterns, based on load data clustering [20]. Most works on load data clustering focus on the clustering problem of static load data. However, updating load patterns based on new load data is essential because electricity consumption behaviors may change and inaccurate load patterns can cause wrong decisions. Although load patterns can be updated by repeatedly clustering the overall load data including the new ones, this leads to extra computation and storage, especially in batch-oriented data processing. In that case, incremental learning, which refers to learning from streaming data that arrive over time [21], can be a better solution, as it can make full use of the historical information, reduce the training scale, and save training time [22]. Moreover, there are also some special clustering algorithms designed for data stream mining [23]. However, few of them are designed for high-dimensional smart meter data streams, so it is necessary to find an effective incremental clustering algorithm to update load patterns, especially for end consumers with limited resources.
In real-world industry and in our daily lives, the electricity consumption behaviors of consumers may change over time. Some consumers keep their patterns for a long period, while others may change frequently. An example of load pattern drift is shown in Figure 1. Each curve denotes a typical load pattern of the same electricity consumer, and curves of the same color in different subfigures indicate the same load pattern. This consumer has two typical load patterns from January to July, which means that the consumer has a stable electricity consumption behavior. Then, it can be observed that the load patterns drift twice. The first drift happened in August, shown by the red curve in Figure 1b, which indicates that this consumer has a new electricity consumption behavior. The second drift happened in October, shown by the cyan curve in Figure 1d. As a result, this consumer has four electricity consumption behaviors since October. Once we extract the load patterns from the static electricity load data in a certain period, these load patterns are fixed unless they are updated. New load patterns that denote consumer behavior drift may appear in the following periods, so we should update the previously obtained load patterns by adding the new ones. However, consumer behaviors are complex. It is uncertain whether all incoming load patterns are new: some of them may already exist in the previously obtained load patterns and others may not. In that case, we can neither simply add each load pattern extracted from the newly arriving data nor simply assign them to existing load patterns. How to update load patterns accurately is the main challenge of our incremental clustering problem. Therefore, this work proposes an incremental clustering algorithm with probability strategy, which is named ICluster-PS.
We assume that this algorithm can deal with smart meter data streams to update load patterns efficiently for every end consumer through facilities with limited time and space. The incremental clustering algorithm of ICluster-PS includes three phases: load pattern extraction, load pattern intergradation, and load pattern modification. Load pattern extraction is a preparation step that extracts load patterns from new electricity load data, which are preprocessed as daily load curves. Load pattern intergradation and modification form a novel approach for determining whether or not we should create a new load pattern and for optimizing K, the number of updated load patterns. A short paper of this work is published in [24], and we revise and extend it in this paper by adding more details of the algorithm, experiments, and pattern evolution analysis. The main contributions of this work are summarized as follows:
• We consider the problem of load pattern update based on smart meter data streams, and propose an incremental clustering algorithm for continuously updating load patterns. It is significantly helpful for learning electricity consumption behaviors in the smart grid field.
• In the incremental clustering algorithm, we propose a probability strategy on the distance measure for optimizing the performance of incremental clustering, and also consider parameter updating so that incremental clustering can be conducted continuously with newly arriving data.
• We evaluate both the accuracy and the clustering validity of our algorithm on a real-world dataset, which contains 17,776 commercial and residential electricity consumers in various districts. The results indicate that ICluster-PS is close to the performance of the non-incremental clustering based on overall daily load curves and outperforms other related incremental clustering algorithms in terms of both clustering validity and accuracy.
• The load pattern evolution can be clearly presented by the incremental clustering results, in which we are able to detect pattern drifts or anomalies of electricity consumers.
The rest of this paper is organized as follows. Section 2 briefly reviews the related works. Section 3 provides the preliminary for Section 4, which introduces the details of the proposed incremental clustering algorithm. Experimental settings are presented in Section 5, and results with evaluation are discussed in Section 6. Finally, we conclude this work in Section 7.

Related Work
This section briefly reviews the most relevant related works in terms of load pattern extraction, incremental learning algorithms, and data stream clustering. Electricity consumer load pattern extraction is one of the most important research areas in smart grid, while incremental learning and data stream clustering are two related research areas in machine learning and data mining. However, few works consider how to conduct incremental learning for electricity consumer load pattern extraction. Some relevant research works are compared in Table 1.
Load Pattern Extraction. Load pattern extraction is an unsupervised clustering problem. There are two types of clustering methods for load data clustering: direct clustering and indirect clustering [15]. In direct clustering, load data are directly used in clustering without any additional dimension reduction or data preprocessing methods. There are many classical clustering algorithms for load data clustering, such as K-means, fuzzy K-means, self-organizing map (SOM), and support vector clustering (SVC) [36][37][38]. As for indirect clustering, researchers usually pay more attention to dimension reduction, feature extraction, and feature construction methods for load data preprocessing. In [25], the authors constructed three new types of features. Their work indicates that clustering on the constructed features outperforms clustering on the default features. In [26], two variations of the K-means algorithm with four proposed dimension reduction methods are applied to the clustering process in load profiling. A fused load curve clustering algorithm based on wavelet transform (FCCWT) is proposed in our previous work [39]. This algorithm first applies a multi-level wavelet transform to daily load curves for dimension reduction, and then fuses the K-means clustering results of both the normalized approximation signals and the detail signals, which are the two outputs of the wavelet transform, to gain an optimized clustering result.
Incremental Learning Algorithms. In recent years, incremental and online learning have gained more attention, especially in the big data and data stream areas [40,41]. There are many incremental learning algorithms based on ν-support vector regression, support vector machines (SVM), random forest (RF), neural networks, etc. [27,42,43,44]. An incremental support vector machine (ISVM) with Markov resampling (MR-ISVM) is introduced in [22] to study how dependent sampling methods influence the learning ability of ISVM. However, most incremental learning algorithms study supervised classification without adding new classes. Although an incremental learning method based on RF is studied to incrementally learn new classes for large-scale images [27], this method adds new classes into the trees without judging whether or not the incoming classes are new. In [28], the authors proposed an incremental algorithm based on fast finding and searching of density peaks (CFS), named ICFKM, for clustering large data in industrial IoT. Two challenges, namely how to integrate new clusters into the previous ones and how to update the clustering centers, are solved in ICFKM, which seems to be useful for our incremental clustering problem. However, CFS is relatively subjective in selecting cluster centers based on the decision graph [45], so it cannot be applied in batch-oriented data processing. Moreover, CFS does not work well on relatively high-dimensional data. Many clusters may be missed by CFS because it only considers the global structure of data [46]. As time-series electricity load data have relatively high dimensions, ICFKM cannot be directly adopted for updating load patterns.
Data Stream Clustering. Clustering data streams requires the capability of partitioning observations continuously within limited memory and time [47]. Most data stream clustering algorithms consist of an online step that incrementally processes the data stream and produces summary statistics, and an offline step that summarizes data to generate clusters by traditional batch clustering algorithms [48]. There are various classic data stream clustering algorithms, such as Stream, CluStream, StreamKM++, DenStream, and HPstream. Both HPstream [29] and incPreDeCon [30] can deal with high-dimensional data streams. The former algorithm is based on K-means, while the latter is based on PreDeCon, which is a density-based clustering algorithm and requires too many parameters to be run efficiently. In [31], the authors introduced a data stream clustering algorithm based on the Fuzzy C-means algorithm and entropy theory. In [32], the authors developed algorithms for clustering high-dimensional dynamic data streams, but these algorithms assume that no inserted data item is already in the dataset, which may not be consistent with our load data. Meanwhile, the efficiency of these proposed algorithms is only evaluated by a 2D implementation. In [33], a fully online clustering algorithm is proposed for clustering evolving data streams into arbitrarily shaped clusters (CEDAS), which is also a density-based clustering algorithm. In [34], a density-based clustering algorithm called DStream-GC is designed for discovering gradually moving object cluster patterns from trajectory streams. In [35], a self-organizing incremental neural network (SOINN+) is developed for unsupervised learning of clusters with arbitrary shapes from noisy data. Although some algorithms are incremental methods or declare that they can process high-dimensional data streams, their validity and efficiency on load pattern extraction and update require further evaluation. For example, density-based clustering algorithms may not achieve an excellent performance in the experiments of high-dimensional load curve clustering.
In summary, the works on load pattern extraction do not consider the incremental learning problem in their clustering algorithm, while the existing incremental learning or data stream clustering algorithms are not designed for load clustering. Therefore, it is essential to provide an incremental clustering algorithm for our load clustering problem.

Preliminary
Before introducing our method, we should first give the problem formulation and several important mathematical notations, shown in Table 2. We also briefly present the method used for load pattern extraction, which is the base of electricity consumer behavior learning.

Table 2. Notations.
Notation: Description
$N_s$: No. of daily load curves in $X_s$
$C_s$: the set of clusters obtained from a load curve clustering on $[X_0, X_1, \cdots, X_s]$
$C_{si}$: the $i$th cluster in $C_s$, $1 \le i \le K_s$. The cluster center of $C_{si}$ is $\mu_{si}$
$n_{si}$: No. of daily load curves in $C_{si}$
$A_s$: the set of cluster centers, also called load patterns, of $C_s$
$\mu_{si}$: the $i$th load pattern in $A_s$, referring to the cluster center of $C_{si}$, $1 \le i \le K_s$
$K_s$: No. of load patterns in $A_s$ / clusters in $C_s$
$P_s$: the set of probabilities of load patterns $A_s$
$p_{si}$: the probability of $\mu_{si}$, $p_{si} \in P_s$, $\sum_{i=1}^{K_s} p_{si} = 1$
$X_0$: the set of initial daily load curves, $X_0 \in X$
$X_1$: the first set of new daily load curves, $X_1 \in X$
$A_0$: the set of load patterns obtained from a load curve clustering on $X_0$
$a_1$: the set of load patterns obtained from a load curve clustering on $X_1$
$iA_1$: the set of load patterns obtained from load pattern intergradation on $[A_0, a_1]$
$A_1$: the set of updated load patterns obtained from the incremental clustering on $[X_0, X_1]$

Problem Formulation
For an electricity consumer, let $X_0 = \{x_{01}, x_{02}, \ldots, x_{0N_0}\} \in \mathbb{R}^{d \times N_0}$, where each $x_{0i}$ is a $d$-dimensional vector, be the electricity load data, and let $N_0$ be the number of days contained in the dataset $X_0$. We can extract the load patterns from these data by conducting daily load curve clustering.
Definition 1 (Daily Load Curve). A daily load curve is a $d$-dimensional vector that presents the electricity power consumption of one consumer in one day. It is recorded by a smart meter at a regular interval, which usually is 1 h, 30 min, or 15 min.
Definition 2 (Load Pattern). Given a set of daily load curves $X_0 = \{x_{01}, x_{02}, \ldots, x_{0N_0}\} \in \mathbb{R}^{d \times N_0}$, we apply a load curve clustering to $X_0$ and obtain a set of clusters $C_0 = \{C_{01}, C_{02}, \ldots, C_{0K_0}\}$. Let $A_0 = \{\mu_{01}, \mu_{02}, \ldots, \mu_{0K_0}\} \in \mathbb{R}^{d \times K_0}$ be the set of cluster centers of $C_0$. Each $\mu_{0i}$ is called a load pattern and denotes one typical electricity power consumption behavior feature of the consumer.
Every electricity consumer may have one or several load patterns. As $X_0$ contains $N_0$ daily load curves which are divided into $K_0$ clusters, let $n_{0i}$, $1 \le i \le K_0$, be the number of daily load curves contained in the cluster $C_{0i} \in C_0$, so that $\sum_{i=1}^{K_0} n_{0i} = N_0$. Then, we can give the definition of the probabilities of load patterns.
Definition 3 (Probability of Load Pattern). The probability of a load pattern $\mu_{0i}$ denotes the percentage of the daily load curves represented by $\mu_{0i}$ in the whole daily load curve dataset $X_0$. Let $P_0 = \{p_{01}, p_{02}, \ldots, p_{0K_0}\}$ be the set of probabilities of load patterns $A_0$, then
$$p_{0i} = \frac{n_{0i}}{N_0}, \tag{1}$$
where $\sum_{i=1}^{K_0} p_{0i} = 1$.
After obtaining a set of load patterns $A_0$ based on $X_0$, a new set of daily load curves $X_1 = \{x_{11}, x_{12}, \ldots, x_{1N_1}\} \in \mathbb{R}^{d \times N_1}$ comes due to the continuous electricity power consumption. We aim to obtain a set of updated load patterns $A_1 = \{\mu_{11}, \mu_{12}, \ldots, \mu_{1K_1}\} \in \mathbb{R}^{d \times K_1}$ based on the existing load patterns $A_0$ and the new daily load curves $X_1$. This means that we conduct an incremental clustering with $X_1$ and $A_0$ rather than an overall clustering with $[X_0, X_1]$.
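The definitions of load pattern and load pattern probability can be illustrated with a minimal Python sketch. It assumes the cluster assignment of each daily load curve is already available (the clustering itself is not reproduced here); the function name `load_patterns_and_probabilities` and the toy two-value curves are hypothetical and chosen only for illustration.

```python
from collections import Counter


def load_patterns_and_probabilities(curves, labels):
    """Return (centers, counts, probs), one entry per cluster label in sorted order.

    centers[i] is the mean curve of cluster i (the load pattern mu_0i),
    counts[i] is n_0i, and probs[i] is p_0i = n_0i / N_0.
    """
    counts_by_label = Counter(labels)
    dim = len(curves[0])
    sums = {k: [0.0] * dim for k in counts_by_label}
    for curve, k in zip(curves, labels):
        for j, value in enumerate(curve):
            sums[k][j] += value
    keys = sorted(counts_by_label)
    centers = [[s / counts_by_label[k] for s in sums[k]] for k in keys]
    counts = [counts_by_label[k] for k in keys]
    total = len(curves)  # N_0
    probs = [n / total for n in counts]
    return centers, counts, probs


# Toy data: four "daily load curves" with d = 2, already assigned to two clusters.
curves = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [1.0, 2.0]]
labels = [0, 0, 1, 0]
centers, counts, probs = load_patterns_and_probabilities(curves, labels)
```

Here the first load pattern represents three of the four curves and the second represents one, so the probabilities sum to one as required by Definition 3.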
As new sets of daily load curves continuously come, we can give a generalization of our incremental clustering problem.
Let $A_{t-1} = \{\mu_{t-1,1}, \mu_{t-1,2}, \ldots, \mu_{t-1,K_{t-1}}\} \in \mathbb{R}^{d \times K_{t-1}}$ be the existing load patterns and $X_t = \{x_{t1}, x_{t2}, \ldots, x_{tN_t}\} \in \mathbb{R}^{d \times N_t}$ be the new set of daily load curves. We aim at proposing an incremental clustering algorithm that can obtain a set of updated load patterns $A_t = \{\mu_{t1}, \mu_{t2}, \ldots, \mu_{tK_t}\} \in \mathbb{R}^{d \times K_t}$, which equals or approximates the load patterns extracted directly from the overall daily load curves $X = \{X_0, X_1, \ldots, X_t\}$.

Load Pattern Extraction
Load pattern extraction is based on the clustering of daily load curves in this work. We adopt a fused load curve clustering algorithm called FCCWT [39] to extract the load patterns. The diagram of FCCWT is illustrated in Figure 2. This algorithm was designed specially for load clustering based on time-series electricity load data in our previous work. It conducts an indirect clustering, in which daily load curves are transformed into approximation signals and detail signals by a multi-level Haar wavelet transform before the load curve clustering for dimension reduction. Moreover, the approximation signals $X_{\alpha L}$ and detail signals $X_{\alpha H}$ are clustered separately and then fused to avoid information loss caused by the dimension reduction and to improve the clustering performance. Although this algorithm is non-incremental, it provides a higher clustering validity compared with other related methods.
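The wavelet step can be sketched as follows. This is only an illustration of a multi-level Haar decomposition of one daily load curve, not the FCCWT implementation of [39]: the normalization, the choice of level, and the fusion of clustering results are not reproduced, and the function names `haar_step` and `haar_multilevel` are hypothetical. The orthonormal scaling by the square root of 2 and the sign of the detail coefficients are conventions assumed here.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_multilevel(signal, levels):
    """Apply haar_step repeatedly to the approximation signal."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# A toy 24-value daily load curve (hourly readings).
curve = [float(h) for h in range(24)]
approx, details = haar_multilevel(curve, levels=2)
```

With two levels, the 24-value curve is reduced to a 6-value approximation signal, while the detail signals keep the information removed at each level, which is what allows the two kinds of signals to be clustered separately.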

Incremental Consumer Behavior Learning
In this section, we introduce the incremental clustering algorithm used for electricity consumption pattern learning. First, we present an overview of the incremental clustering algorithm. Second, we optimize this algorithm by a novel probability strategy in order to improve the incremental clustering performance. Third, several parameters are updated for the following continuous incremental clustering. Finally, we give the generalization of our optimized incremental clustering with the analysis of its asymptotic time complexity.

Incremental Clustering Algorithm
As presented in Section 3, the inputs are the existing load patterns $A_0$ and new daily load curves $X_1$, while the output is the updated load patterns $A_1$. The main challenge of our problem is how to determine whether to create a new load pattern. As consumer behaviors are complex, it is uncertain whether $X_1$ contains any load patterns that differ from those in $A_0$. We cannot conduct a simple clustering by regarding all $\mu_{0i} \in A_0$ as the cluster centers. As a result, a novel incremental clustering algorithm is proposed to intergrade the load patterns of $X_1$ into $A_0$. This algorithm is able to determine whether to integrate a load pattern into an existing $\mu_{0i}$ or to keep it as a new load pattern. An illustration of the incremental clustering algorithm is presented in Figure 3, which contains three phases: load pattern extraction, load pattern intergradation, and load pattern modification. In the example shown in Figure 3, the set of existing load patterns $A_0$, which is extracted from $X_0$, contains two load patterns. Then, we extract five new load patterns from the new daily load curves $X_1$ and intergrade them with the two existing load patterns one by one. For the intergradation of $\mu_{a_1,1}$, we obtain an existing load pattern and an intergraded load pattern. After five rounds of load pattern intergradation and one extra load pattern modification, we finally obtain four updated load patterns.

Figure 3. An illustration of the incremental clustering algorithm, including (1) load pattern extraction, (2) load pattern intergradation, and (3) load pattern modification. The inputs are the existing load patterns $A_0$ and new daily load curves $X_1$, and the output is the updated load patterns $A_1$.
Load pattern extraction. We need to process the new daily load curves $X_1 = \{x_{11}, x_{12}, \ldots, x_{1N_1}\}$ before the load pattern intergradation. The fused load curve clustering algorithm FCCWT [39], which is our previous work designed specially for daily load curve clustering, is applied to $X_1$. Then, we obtain the set of its load patterns $a_1 = \{\mu_{a_1 1}, \mu_{a_1 2}, \ldots, \mu_{a_1 K_{a_1}}\}$. The corresponding set of probabilities of $a_1$ is $P_{a_1} = \{p_{a_1 1}, p_{a_1 2}, \ldots, p_{a_1 K_{a_1}}\}$, where $p_{a_1 i} = n_{a_1 i}/N_1$ and $1 \le i \le K_{a_1}$.
Load pattern intergradation. Let $iA_1$ denote the result of load pattern intergradation, and $iA_1$ is initialized as $iA_1 = A_0$. We combine the $i$th load pattern $\mu_{a_1 i} \in a_1$ with all load patterns in $iA_1$, which is denoted as $[iA_1, \mu_{a_1 i}]$. Then, two K-means clusterings are performed on $[iA_1, \mu_{a_1 i}]$ with $K = K_0$ and $K = K_0 + 1$, respectively. We evaluate their clustering results by the Simplified Silhouette Width Criterion (SSWC), which is one variant of the Silhouette Width Criterion (SWC) index [39,49]. The SSWC values of the clustering results when $K = K_0$ and $K = K_0 + 1$ are denoted as $SSWC_{K=K_0}$ and $SSWC_{K=K_0+1}$, respectively. For a clustering $C$ of $M$ load patterns, where each $\mu_j$ belongs to a cluster $c_r \in C$, the SSWC is
$$SSWC = \frac{1}{M} \sum_{j=1}^{M} \frac{b_{c_r,\mu_j} - a_{c_r,\mu_j}}{\max\{a_{c_r,\mu_j},\, b_{c_r,\mu_j}\}}, \tag{2}$$
where $a_{c_r,\mu_j}$ is the distance between $\mu_j$ and the center of cluster $c_r \in C$, while $b_{c_r,\mu_j}$ is the closest distance between $\mu_j$ and the centers of other clusters in $C$ except for $c_r$. With $\bar{c}_r$ denoting the center of cluster $c_r$ and $d(\cdot,\cdot)$ the distance measure, they are calculated as follows:
$$a_{c_r,\mu_j} = d(\mu_j, \bar{c}_r), \tag{3}$$
$$b_{c_r,\mu_j} = \min_{w \ne r} d(\mu_j, \bar{c}_w), \tag{4}$$
where $1 \le r, w \le K$ and $r \ne w$. $K$ refers to the parameter of the clustering conducted on $[iA_1, \mu_{a_1 i}]$. Then, we obtain $SSWC_{K=K_0}$ and $SSWC_{K=K_0+1}$ according to Equations (2)-(4). There are two situations when comparing $SSWC_{K=K_0}$ and $SSWC_{K=K_0+1}$.
(1) $SSWC_{K=K_0} \ge SSWC_{K=K_0+1}$ implies that the clustering performance of $K = K_0$ is equal or superior to the performance of $K = K_0 + 1$. As a result, we do not keep the $i$th load pattern $\mu_{a_1 i}$ as a new load pattern, and adopt the set of cluster centers when $K = K_0$ as the integrating result of $[iA_1, \mu_{a_1 i}]$.
(2) $SSWC_{K=K_0} < SSWC_{K=K_0+1}$ implies that $K = K_0 + 1$ results in a better clustering performance than $K = K_0$ does. In that case, we keep the $i$th load pattern $\mu_{a_1 i}$ as a new load pattern, and adopt the set of cluster centers when $K = K_0 + 1$ as the integrating result of $[iA_1, \mu_{a_1 i}]$.
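The SSWC comparison in the two situations above can be sketched in Python. The sketch assumes that a clustering (labels and cluster centers) has already been computed, uses plain Euclidean distance rather than the probability strategy introduced later, and follows the standard SSWC definition; the names `sswc` and `keep_as_new_pattern` are hypothetical.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sswc(points, labels, centers, dist=euclidean):
    """Simplified silhouette: only distances to cluster centers are used.

    Requires at least two centers. A point that coincides with every
    center (a = b = 0) contributes 0.
    """
    total = 0.0
    for point, r in zip(points, labels):
        a = dist(point, centers[r])
        b = min(dist(point, c) for w, c in enumerate(centers) if w != r)
        denom = max(a, b)
        total += 0.0 if denom == 0.0 else (b - a) / denom
    return total / len(points)


def keep_as_new_pattern(sswc_k, sswc_k_plus_1):
    """Situation (2): keep the pattern as new only if K + 1 scores strictly higher."""
    return sswc_k < sswc_k_plus_1


# Two well-separated toy clusters with d = 2.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels = [0, 0, 1, 1]
centers = [[0.0, 0.5], [10.0, 10.5]]
score = sswc(points, labels, centers)
```

Note that under this formula a cluster whose single member coincides with its center scores 1 for that member; how such singleton clusters are treated is not specified in the text, so the sketch leaves that behavior unchanged.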
After the above comparison and judgment, the set $iA_1$ is reset with the integrating result of $[iA_1, \mu_{a_1 i}]$. Each $\mu_{a_1 i}$ over $i = 1, 2, \ldots, K_{a_1}$ is integrated with $iA_1$ gradually according to this procedure. Finally, we obtain the intergraded set $iA_1 = \{i\mu_{11}, i\mu_{12}, \ldots, i\mu_{1K_{iA_1}}\}$.
Load pattern modification. We perform a further modification on the intergraded set $iA_1$ to obtain an optimal incremental clustering result. As the number of load patterns generally is within the range $K \in [2, 10]$ [25,39], multiple K-means clusterings are applied to $iA_1$ with $K$ in the range of 2 to $\min\{K_{iA_1}, 10\}$, where $K_{iA_1}$ denotes the number of load patterns in $iA_1$. The SSWCs of these $\min\{K_{iA_1}, 10\} - 1$ clusterings are calculated and compared with each other. Then we select the $K$ with the largest SSWC as the optimal parameter, and regard the set of cluster centers with the selected optimal $K$ as our target set of updated load patterns $A_1$.
We outline the incremental clustering of $A_0$ and $X_1$ in Algorithm 1, including the three phases mentioned above. In Algorithm 1, Line 1 is for load pattern extraction, Lines 2-11 conduct load pattern intergradation, and Lines 12-16 are for load pattern modification.

Algorithm 1: The incremental clustering algorithm
Input: a set of existing load patterns $A_0$, a set of new daily load curves $X_1$;
Output: the set of updated load patterns $A_1$.
1 Apply FCCWT algorithm to $X_1$ to obtain the set of its load patterns $a_1 = \{\mu_{a_1 1}, \mu_{a_1 2}, \ldots, \mu_{a_1 K_{a_1}}\}$;
2 Initialize $iA_1 = A_0$;
3 for each $\mu_{a_1 i}$ do
4     Combine $iA_1$ and $\mu_{a_1 i}$ as a set $[iA_1, \mu_{a_1 i}]$;
5     for $K = K_0, K_0 + 1$ do
6         Perform K-means clustering on $[iA_1, \mu_{a_1 i}]$;
7         Calculate the SSWC of the clustering result;
8     if $SSWC_{K=K_0} \ge SSWC_{K=K_0+1}$ then
9         Integrate $\mu_{a_1 i}$ into an existing load pattern by resetting $iA_1$ with the cluster centers of $K = K_0$;
10    else
11        Keep $\mu_{a_1 i}$ as a new load pattern by resetting $iA_1$ with the cluster centers of $K = K_0 + 1$;
12 for $K = 2, 3, \ldots, \min\{K_{iA_1}, 10\}$ do
13     Perform K-means clustering on $iA_1$;
14     Calculate the SSWC of the clustering result;
15 Select the $K$ with the largest SSWC as $K_{opt}$;
16 Assign the cluster centers of K-means($iA_1$, $K_{opt}$) to $A_1$;
17 return $A_1$.
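The load pattern modification phase (Lines 12-16 of Algorithm 1) can be sketched as a small selection loop. The sketch deliberately takes the clustering routine and the scoring routine as arguments, since the K-means variant and the SSWC distance are defined elsewhere in the paper; `select_k`, `stub_cluster`, `stub_score`, and the fixed scores are hypothetical stand-ins used only to make the example runnable.

```python
def select_k(patterns, cluster_fn, score_fn, k_cap=10):
    """Try K = 2 .. min(len(patterns), k_cap) and keep the best-scoring K.

    cluster_fn(patterns, k) must return (centers, labels);
    score_fn(patterns, labels, centers) must return a validity score
    where larger is better (SSWC in the paper).
    """
    k_max = min(len(patterns), k_cap)
    scored = {}
    for k in range(2, k_max + 1):
        centers, labels = cluster_fn(patterns, k)
        scored[k] = (score_fn(patterns, labels, centers), centers)
    if not scored:  # fewer than two patterns: nothing to select
        return len(patterns), list(patterns), []
    best_k = max(scored, key=lambda k: scored[k][0])
    return best_k, scored[best_k][1], sorted(scored)


# Stand-ins for K-means and SSWC, used only to exercise the loop.
fake_scores = {2: 0.3, 3: 0.8, 4: 0.5}


def stub_cluster(patterns, k):
    return patterns[:k], [min(i, k - 1) for i in range(len(patterns))]


def stub_score(patterns, labels, centers):
    return fake_scores[len(centers)]


patterns = [[0.0], [1.0], [5.0], [9.0]]
best_k, centers, tried = select_k(patterns, stub_cluster, stub_score)
```

With four intergraded patterns, the loop runs for K = 2, 3, 4, that is, min{4, 10} - 1 = 3 clusterings, and returns the centers of the K with the largest score.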

Optimization via Probability Strategy
Assume $\hat{A}_1 = \{\hat{\mu}_{11}, \hat{\mu}_{12}, \ldots, \hat{\mu}_{1K_1}\} \in \mathbb{R}^{d \times K_1}$ is the set of load patterns extracted directly from the combined set $[X_0, X_1]$; then $\hat{A}_1$ is based on the non-incremental clustering of $N_0 + N_1$ daily load curves. On the other hand, the incremental clustering algorithm shown in Algorithm 1 is based on the fusion of load patterns from both $A_0$ and $a_1$, which refer to only $K_0 + K_{a_1}$ load patterns. Our purpose is to obtain an $A_1$ that equals or approximates $\hat{A}_1$. However, the simple K-means clustering algorithm with Euclidean distance is not appropriate to achieve this purpose.
It should be considered that the load patterns usually have different probabilities, so we should not treat them equally in the incremental clustering. Thus, an optimized distance measure with probability strategy is proposed for Algorithm 1, in which the Euclidean distance measure is replaced with the proposed measure when performing both K-means clustering and the SSWC calculation shown in Equation (3). It is assumed that this probability strategy can optimize Algorithm 1 to achieve an ideal $A_1$. Given a set of load patterns $A = \{\mu_1, \mu_2, \ldots, \mu_K\}$ with the set of corresponding probabilities $P = \{p_1, p_2, \ldots, p_K\}$, where $p_i = n_i/N$ and $N$ is the number of daily load curves that $A$ refers to, the optimized distance with probability strategy between $\mu_i$ and $\mu_j$ is defined in Equation (5) in terms of $n_i$ and $n_j$, which denote the numbers of daily load curves that $\mu_i$ and $\mu_j$ represent, respectively. The cluster center $\mu_r$ in K-means clustering with Euclidean distance is calculated as the mean of the objects contained in the cluster:
$$\mu_r = \frac{1}{m_r} \sum_{\mu_i \in C_r} \mu_i, \tag{6}$$
where $m_r$ is the number of load patterns $\mu_i$ contained in the cluster $C_r$. We set the probability of $\mu_r$ as $p_{\mu_r} = 1/N$ when performing K-means clustering with the optimized distance. As a result, the optimized distance with probability strategy between $\mu_i$ and $\mu_r$ is obtained from Equation (5) with this probability, which gives Equation (7). Similarly, the calculation of the cluster center shown in Equation (6) should be rewritten as
$$\mu_r = \frac{\sum_{\mu_i \in C_r} n_i\, \mu_i}{\sum_{\mu_i \in C_r} n_i}, \tag{8}$$
where $n_i$ denotes the number of daily load curves that the load pattern $\mu_i$ refers to, and $\sum_{\mu_i \in C_r} n_i$ denotes the total number of daily load curves that all $\mu_i \in C_r$ refer to.
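The difference between the plain cluster center of Equation (6) and the count-weighted cluster center of Equation (8) can be shown with a short Python sketch. It covers only the center computation, not the optimized distance itself; the function names and the toy numbers are hypothetical.

```python
def plain_center(patterns):
    """Unweighted mean of the load patterns in a cluster (Equation (6))."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]


def weighted_center(patterns, counts):
    """Mean of the load patterns weighted by the number of daily load
    curves each pattern represents (Equation (8))."""
    total = sum(counts)
    dim = len(patterns[0])
    return [
        sum(n * p[j] for p, n in zip(patterns, counts)) / total
        for j in range(dim)
    ]


# One cluster holding two load patterns: the first represents 90 daily
# load curves, the second only 10.
patterns = [[1.0, 1.0], [3.0, 5.0]]
counts = [90, 10]
unweighted = plain_center(patterns)
weighted = weighted_center(patterns, counts)
```

The weighted center stays close to the pattern that represents most of the underlying daily load curves, which is the behavior a clustering of the raw curves would also show, whereas the unweighted center sits halfway between the two patterns.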

Updating Parameters
As new daily load data continuously grow with the electricity power consumption of consumers, we should update several essential parameters after one incremental clustering in preparation for the next one. The sets $X_0$ and $X_1$ contain $N_0$ and $N_1$ daily load curves, respectively. Their combined set $[X_0, X_1]$ contains $N_0 + N_1$ daily load curves in total. The incremental clustering on $[X_0, X_1]$ gives the set of updated load patterns $A_1$ and the corresponding clustering result $C_1 = \{C_{11}, C_{12}, \ldots, C_{1K_1}\}$. Let $P_1 = \{p_{11}, p_{12}, \ldots, p_{1K_1}\}$ be the set of corresponding probabilities of $A_1$. The probability of $\mu_{1r} \in A_1$ for the $r$th cluster $C_{1r}$ is updated as
$$p_{1r} = \frac{n_{1r}}{N_0 + N_1}, \tag{9}$$
where $n_{1r}$ is the number of daily load curves that the load pattern $\mu_{1r}$ represents. We update $n_{1r}$ as follows:
$$n_{1r} = \sum_{\mu_0 \in C_{1r}} n_{0r} + \sum_{\mu_{a_1} \in C_{1r}} n_{a_1 r}, \tag{10}$$
where $\sum_{\mu_0 \in C_{1r}} n_{0r}$ denotes the total number of daily load curves that all $\mu_{0i} \in A_0$ belonging to $C_{1r}$ represent, and $\sum_{\mu_{a_1} \in C_{1r}} n_{a_1 r}$ denotes the total number that all $\mu_{a_1 i} \in a_1$ belonging to $C_{1r}$ represent. After the updating of $P_1$, $A_1$ is ready to be used in another incremental clustering with the next data set $X_2$.
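A worked example of this parameter update may help. The Python sketch below assumes that, for each updated cluster, we already know the counts of the member load patterns coming from $A_0$ and $a_1$; the function name `update_counts_and_probs` and all numbers are hypothetical.

```python
def update_counts_and_probs(member_counts_per_cluster):
    """member_counts_per_cluster[r] lists the counts n of every load pattern
    (from A_0 or a_1) that ended up in the updated cluster C_1r.

    Returns (n_1, p_1): the updated counts and probabilities.
    """
    n_new = [sum(member_counts) for member_counts in member_counts_per_cluster]
    total = sum(n_new)  # equals N_0 + N_1 when every pattern is assigned once
    p_new = [n / total for n in n_new]
    return n_new, p_new


# A_0 has two patterns representing 120 and 60 curves (N_0 = 180);
# a_1 has two patterns representing 20 and 10 curves (N_1 = 30).
# Suppose the first new pattern merges with the first existing one,
# and the second new pattern is kept as a new load pattern.
members = [[120, 20], [60], [10]]
n_1, p_1 = update_counts_and_probs(members)
```

In this example the updated counts are 140, 60, and 10, which sum to $N_0 + N_1 = 210$, so the probabilities are 2/3, 2/7, and 1/21.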

Generalization of Incremental Clustering
In practice, new daily load data sets $X_1, X_2, \ldots, X_t$ arrive continuously. The generalization of the incremental clustering algorithm, which is based on the existing load patterns $A_0$ and new daily load curves $X = \{X_1, X_2, \ldots, X_t\}$, is outlined in Algorithm 2. For $n_{sr}$ and $p_{sr}$ in the generalized algorithm, the updating equations shown in Equation (9) and Equation (10) become
$$p_{sr} = \frac{n_{sr}}{\sum_{i=0}^{s} N_i}, \tag{11}$$
$$n_{sr} = \sum_{\mu_{s-1} \in C_{sr}} n_{s-1,r} + \sum_{\mu_{a_s} \in C_{sr}} n_{a_s r}, \tag{12}$$
where $N_i$ is the number of daily load curves that $X_i$ contains, and $\mu_{s-1}$ and $\mu_{a_s}$ are the load patterns that belong to $A_{s-1}$ and $a_s$, respectively. In Algorithm 2, the incremental clustering is continuously performed with the coming of $X_s$. This means that it is performed immediately once $X_s$ comes, without waiting for $X_{s+1}, X_{s+2}, \ldots, X_t$ to come. Therefore, we can obtain the updated load patterns $A_s$ in time, and then the algorithm is paused until $X_{s+1}$ comes.

Algorithm 2: The generalization of Algorithm 1
Input: a set of existing load patterns $A_0$ referring to $N_0$ daily load curves; the set of probabilities $P_0$; $t$ sets of new daily load curves $X_1, X_2, \ldots, X_t$, where each $X_s$ contains $N_s$ daily load curves;
Output: the updated load patterns $A_1, A_2, \ldots, A_t$.
for $X_s$ over $s = 1, 2, \ldots, t$ do
    Perform Algorithm 1 with the probability strategy based on $A_{s-1}$ and $X_s$ to obtain $A_s$;
    Update $n_{sr}$ for each $\mu_{sr}$ in $A_s$ by Equation (12);
    Update $p_{sr}$ for each $\mu_{sr}$ in $A_s$ by Equation (11);
    return $A_s$.
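The control flow of Algorithm 2 can be sketched with a Python generator, which naturally models "update as soon as $X_s$ arrives, then pause until the next batch". The argument `step` is a hypothetical callable standing in for Algorithm 1 with the probability strategy, and `toy_step` is only a placeholder that makes the sketch runnable; neither is the method of the paper.

```python
def run_incremental(patterns, counts, batches, step):
    """Yield (A_s, n_s, P_s) after each new batch X_s.

    step(patterns, counts, batch) is assumed to return the updated load
    patterns together with the number of daily load curves each represents.
    """
    for batch in batches:
        patterns, counts = step(patterns, counts, batch)
        total = sum(counts)  # N_0 + N_1 + ... + N_s
        probs = [n / total for n in counts]
        yield patterns, counts, probs


def toy_step(patterns, counts, batch):
    """Placeholder only: keeps the mean of the batch as one new pattern."""
    mean = [sum(col) / len(batch) for col in zip(*batch)]
    return patterns + [mean], counts + [len(batch)]


a0 = [[1.0, 1.0]]          # one existing load pattern ...
n0 = [4]                   # ... representing N_0 = 4 daily load curves
batches = [[[3.0, 3.0], [5.0, 5.0]], [[0.0, 2.0]]]   # X_1, X_2
stream = run_incremental(a0, n0, batches, toy_step)
first = next(stream)       # available as soon as X_1 has been processed
second = next(stream)      # computed only when X_2 is requested
```

Because the generator is lazy, nothing is computed for $X_2$ until the caller asks for the next result, which mirrors the pause described for Algorithm 2.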

Complexity Analysis
The time complexity of FCCWT is O(NKT), where N is the number of daily load curves, K is the number of clusters, and T is the number of iterations needed until convergence [39]. The time complexity of K-means is O(NdKT), while that of the SSWC calculation is O(NdK), where d is the dimensionality of the daily load curves. As the default maximum T is usually set to 100, 200, or 300, we assume that all values of T in Algorithm 1 are the same so that the time complexity can be analyzed more easily. Moreover, we also assume that every K takes the maximum value 10, since K ∈ [2, 10].
Based on the above assumptions, the asymptotic time complexities of load pattern extraction, load pattern intergradation, and load pattern modification in Algorithm 1 are $O(KN_1T)$, $O((2K^3 + 3K^2 + K)d(T + 1))$, and $O(Kd(T + 1)\sum_{k=2}^{K} k)$, respectively. Therefore, the asymptotic time complexity of Algorithm 1 is
$$O\Big(KN_1T + (2K^3 + 3K^2 + K)\,d(T + 1) + Kd(T + 1)\sum_{k=2}^{K} k\Big).$$
The time complexity of updating parameters is $O(K)$, so the asymptotic time complexity of Algorithm 2 is
$$O\Big(KT\sum_{s=1}^{t} N_s + \Big(2K^3 + 3K^2 + K\sum_{k=1}^{K} k\Big)\,td(T + 1) + Kt\Big),$$
where $O(KT\sum_{s=1}^{t} N_s)$ is the time complexity of $t$ runs of FCCWT performed on $X_s$, $O((2K^3 + 3K^2 + K\sum_{k=1}^{K} k)\,td(T+1))$ is the time complexity of $t$ rounds of load pattern intergradation and modification, and $O(Kt)$ is the time complexity of $t$ rounds of parameter updating.
As for non-incremental clustering, the time complexity of $t$ runs of FCCWT on $[X_0, X_1, \ldots, X_s]$ over $s = 1, 2, \ldots, t$ is
$$O\big(((t+1)N_0 + tN_1 + (t-1)N_2 + \cdots + N_t)KT\big),$$
which is sensitive to the size of $t$. Similarly, the time complexity of $t$ runs of the non-incremental clustering algorithm K-means on the same data is $O(((t+1)N_0 + tN_1 + (t-1)N_2 + \cdots + N_t)dKT)$. Comparing Equation (15) with the time complexities of the two non-incremental clustering algorithms suggests that incremental clustering saves time and reduces the clustering scale when $t$ is relatively large.
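To make the comparison concrete, the two bounds can be evaluated numerically. The sketch below is only indicative: the function names are hypothetical, the constants hidden by the $O(\cdot)$ notation are dropped, and the defaults ($K = 10$, $T = 100$, $d = 24$) follow the assumptions stated above and the 24-value daily load curves.

```python
def incremental_cost(batch_sizes, K=10, T=100, d=24):
    """Operation count suggested by the Algorithm 2 bound (constants dropped)."""
    t = len(batch_sizes)
    extraction = K * T * sum(batch_sizes)
    merge_and_modify = (2 * K**3 + 3 * K**2 + K * sum(range(1, K + 1))) * t * d * (T + 1)
    updating = K * t
    return extraction + merge_and_modify + updating


def non_incremental_cost(n0, batch_sizes, K=10, T=100, d=24):
    """Operation count for re-running K-means on [X_0, ..., X_s] for s = 0..t."""
    total = 0
    seen = 0
    for n in [n0] + list(batch_sizes):
        seen += n                      # size of [X_0, ..., X_s]
        total += seen * d * K * T      # one full clustering on all data so far
    return total


if __name__ == "__main__":
    # Hypothetical setting: 90 initial days, then 30-day batches.
    for t in (3, 9, 36):
        batches = [30] * t
        print(t, incremental_cost(batches), non_incremental_cost(90, batches))
```

Under these worst-case assumptions the incremental bound grows linearly in $t$, while the non-incremental bound grows quadratically, so the incremental variant only pulls ahead once $t$ is large enough.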

Experimental Settings
This section presents the experimental settings, including the datasets, evaluation criteria, and comparison methods, in detail. For the evaluation criteria, a weighted mean error measure is proposed to evaluate the accuracy of the load patterns extracted by incremental clustering.

Datasets
The dataset used in the experiment covers 14,976 commercial and 2800 residential electricity consumers in 936 counties of the United States (Available online: https://openei.org/datasets/files/961/pub/ (accessed on 24 June 2019). Eight of the 2808 residential consumers have missing data, so only the data of 2800 residential electricity consumers are used in the experiment). It contains 24-value daily load data over one year and records the electricity power consumption hourly from 1:00 to 24:00 each day. As the proposed algorithm is designed for learning the electricity consumption patterns of a single consumer, the data of one electricity consumer can be regarded as a sub-dataset that leads to a sub-experiment. As a result, we conduct 17,776 sub-experiments in total. Moreover, three situations are considered for every sub-experiment. We select the daily load data of the first 3, 6, and 9 months as the initial set $X_0$, respectively. The remaining data are divided by month and then regarded as $X_1, X_2, \ldots, X_t$. For example, in the case of $t = 3$, the daily load data from January to September are selected as $X_0$, and the data of October, November, and December are regarded as $X_1$, $X_2$, and $X_3$, respectively.
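The split into an initial set and monthly batches can be sketched as follows. The function name `split_by_month` and the constant `DAYS_PER_MONTH` are hypothetical, and the sketch assumes a non-leap year of 365 daily curves stored as a `(365, 24)` NumPy array in calendar order.

```python
import numpy as np

# Days per month in a non-leap year (assumption: the data cover 365 days).
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def split_by_month(daily_curves, initial_months):
    """Return (X_0, [X_1, ..., X_t]) from a (365, 24) array of daily load curves."""
    bounds = np.cumsum([0] + DAYS_PER_MONTH)
    months = [daily_curves[bounds[m]:bounds[m + 1]] for m in range(12)]
    x0 = np.concatenate(months[:initial_months])
    return x0, months[initial_months:]


if __name__ == "__main__":
    curves = np.random.default_rng(0).random((365, 24))  # synthetic stand-in data
    x0, batches = split_by_month(curves, initial_months=9)
    print(x0.shape, [b.shape[0] for b in batches])
```

Calling the function with `initial_months` set to 3, 6, or 9 yields the three situations with $t = 9$, $t = 6$, and $t = 3$, respectively.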

Evaluation Criterion
We employ two types of measures, clustering validity indices and accuracy measures, as the evaluation criteria in the experiment. Moreover, we propose a weighted mean minimum error measure for the accuracy measures.
Clustering validity indices. The clustering performance of the proposed method is also evaluated by diverse clustering validity indices including Davies-Bouldin index (DB), Dunn validity index (DVI), and SWC.
Let $C_s = \{C_{s1}, C_{s2}, \ldots, C_{sK_s}\}$ be the clustering results corresponding to $A_s$. The clustering validity indices of $C_s$ are computed by Equations (16)-(18), where $\bar{C}_{sr}$ and $\bar{C}_{sw}$ are the average within-group distances for $C_{sr}$ and $C_{sw}$, respectively; $x_i$ and $x_j$ denote two daily load curves contained in $[X_0, X_1, \ldots, X_s]$; $N = \sum_{i=0}^{s} N_i$; $a_{C_{sr},x_j}$ denotes the mean distance of $x_j$ to all other daily load curves in $C_{sr}$; and $b_{C_{sr},x_j}$ denotes the minimum mean distance of $x_j$ to all daily load curves in $C_{sw}$, $w \neq r$.
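For orientation, the commonly used textbook forms of these three indices, written with the symbols described above, are sketched below. The Euclidean norm, the use of cluster centers $\mu_{sr}$ in DB, and the exact normalizations are assumptions and may differ in detail from the definitions used in the experiments.

```latex
\begin{align}
\mathrm{DB}(C_s) &= \frac{1}{K_s} \sum_{r=1}^{K_s} \max_{w \neq r}
  \frac{\bar{C}_{sr} + \bar{C}_{sw}}{\lVert \mu_{sr} - \mu_{sw} \rVert}, \\
\mathrm{DVI}(C_s) &= \frac{\min\limits_{r \neq w} \; \min\limits_{x_i \in C_{sr},\, x_j \in C_{sw}} \lVert x_i - x_j \rVert}
  {\max\limits_{r} \; \max\limits_{x_i, x_j \in C_{sr}} \lVert x_i - x_j \rVert}, \\
\mathrm{SWC}(C_s) &= \frac{1}{N} \sum_{j=1}^{N}
  \frac{b_{C_{sr},x_j} - a_{C_{sr},x_j}}{\max\{a_{C_{sr},x_j},\, b_{C_{sr},x_j}\}}.
\end{align}
```

In these forms, DB compares within-cluster spread with between-center separation, DVI is the ratio of the smallest between-cluster distance to the largest cluster diameter, and SWC averages the silhouette of every daily load curve.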
Accuracy measures. We aim to obtain $A_s = \{\mu_{s1}, \mu_{s2}, \ldots, \mu_{sK_s}\}$ that equals or approximates $\hat{A}_s = \{\hat{\mu}_{s1}, \hat{\mu}_{s2}, \ldots, \hat{\mu}_{s\hat{K}_s}\}$, the load patterns extracted directly from $[X_0, X_1, \ldots, X_s]$. We therefore employ accuracy measures for time-series forecasting to evaluate the load patterns in $A_s$ against those in $\hat{A}_s$. Both scale-dependent and percentage-based measures are employed, including Normalized Root Mean Square Error (NRMSE), Mean Absolute Error (MAE), and Symmetric Mean Absolute Percentage Error (sMAPE) [50,51]. However, both $A_s$ and $\hat{A}_s$ contain several load patterns, so we propose a weighted mean minimum error based on the numbers of load patterns in $A_s$ and $\hat{A}_s$.
(1) $\hat{K}_s \le K_s$ indicates that the incremental clustering may cause extra load patterns. We calculate the minimum error for each $\hat{\mu}_{sj} \in \hat{A}_s$, which is the error between $\hat{\mu}_{sj}$ and its most similar load pattern $\mu_{si} \in A_s$. Moreover, we weight the mean error by $K_s/\hat{K}_s$ due to the extra load patterns.
(2) $\hat{K}_s > K_s$ indicates that the incremental clustering misses some load patterns. We calculate the minimum error for each $\mu_{si} \in A_s$, which is the error between $\mu_{si}$ and its most similar load pattern $\hat{\mu}_{sj} \in \hat{A}_s$. Similarly, we weight the mean error by $\hat{K}_s/K_s$ due to the missing load patterns.
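One possible reading of the weighted mean minimum error is sketched below. It is an interpretation rather than the authors' exact formula: the sketch assumes that the minimum error is taken for each pattern in the smaller of the two sets (incremental result and directly extracted baseline) against its most similar pattern in the larger set, and that the mean is scaled by the ratio of the larger to the smaller pattern count. The names `mae` and `weighted_mean_min_error` are hypothetical, and MAE is used only as an example error function.

```python
import numpy as np


def mae(a, b):
    """Mean absolute error between two load patterns of equal length."""
    return float(np.mean(np.abs(a - b)))


def weighted_mean_min_error(incremental, baseline, error=mae):
    """Weighted mean minimum error between two sets of load patterns.

    Assumed reading: iterate over the smaller set, match each pattern to its
    most similar pattern in the larger set, then scale the mean of these
    minimum errors by (larger count / smaller count).
    """
    inc = np.asarray(incremental, dtype=float)
    base = np.asarray(baseline, dtype=float)
    small, large = (base, inc) if len(base) <= len(inc) else (inc, base)
    min_errors = [min(error(p, q) for q in large) for p in small]
    return len(large) / len(small) * float(np.mean(min_errors))
```

With equal pattern counts the weight is 1 and the measure reduces to the plain mean of the minimum errors; any extra or missing pattern inflates the error proportionally.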
According to the definitions of these indices and measures, smaller errors indicate higher accuracy and a smaller DB indicates better clustering performance. In contrast, the larger the DVI and SWC are, the better the clustering performance is.

Comparison Methods
We adopt two algorithms, FCCWT and K-means, to conduct non-incremental clustering on $[X_0, X_1, \ldots, X_s]$ over $s = 1, 2, \ldots, t$, and then regard the load patterns with the optimal clustering performance as the baseline for evaluating the accuracy of the incremental clustering methods. Moreover, we also compare the clustering performance of the non-incremental clustering algorithms with that of our proposed method and other related incremental clustering methods. The methods compared in the experiments are summarized in Table 3.
Table 3. Summary of comparison methods.

| Method | Description | Incremental | Probability Strategy (PS) |
|---|---|---|---|
| FCCWT | The method designed for daily load curve clustering [39] | no | no |
| K-means | The common K-means algorithm | no | no |
| ICluster-PS | The proposed method designed for daily load curve clustering | yes | yes |
| ICluster | The proposed method without PS | yes | no |
| IK-means-PS | The incremental method that adopts K-means with PS | yes | yes |
| IK-means | The incremental method that adopts K-means without PS | yes | no |
| HPStream | The algorithm for high-dimensional data streams [29] | yes | no |

Results and Evaluation
In this section, we first present and discuss the general incremental clustering performance and accuracy of comparison methods on data of all consumers. Then, a commercial consumer is randomly selected as a case for electricity consumption behavior patterns analysis. We also compare the mean runtime of incremental and non-incremental clustering algorithms to support the time complexity analysis in the former section. Furthermore, we conduct pattern evolution analysis based on the incremental clustering results of another randomly selected residential consumer.

Incremental Clustering Performance
We conduct the experiments of Algorithm 2 with $t = 3$, $t = 6$, and $t = 9$, which means that three, six, and nine incremental clustering processes of Algorithm 1 are performed in one experiment, respectively. Both the clustering performance and the accuracy of the methods are compared. Although there are various types of consumers in the dataset, we still use the mean performance over all consumers to evaluate the comparison methods because most of the evaluation criteria are percentage-based. Table 4 shows the mean clustering performance of the methods on the data of 17,776 electricity consumers. The former two methods are non-incremental clustering methods, while the latter five are incremental. We first compare the mean clustering performance of the incremental methods. According to the definitions of the three clustering validity indices in Equations (16)-(18), larger DVI and SWC values indicate better clustering performance, while a smaller DB indicates better clustering performance. The optimal results of the incremental methods are displayed in bold. The proposed method ICluster-PS shows the smallest DB values and the largest SWC values in Table 4. Although the DVI values of ICluster-PS are slightly lower than those of HPStream when $t = 3$ and $t = 6$, the average clustering performance of ICluster-PS is the best among all compared incremental clustering methods. Therefore, these results indicate that ICluster-PS outperforms the other incremental methods. The largest improvement in clustering performance compared with the other incremental clustering methods is 44.2%. On the other hand, ICluster-PS still requires improvement, as its clustering performance is lower than that of the non-incremental FCCWT and K-means, which cluster the overall daily load curves directly.
As FCCWT presents the optimal clustering performance in Table 4, we adopt the load patterns obtained from FCCWT as the baseline for the accuracy measures. Then, we calculate the mean errors of the five incremental methods based on Equation (22) and Equation (23) using three different accuracy measures. The results, shown in Table 5, indicate that ICluster-PS has the best performance, as the minimum error denotes the highest accuracy. The improvement in accuracy is between 29.8% and 66.0% compared with the other incremental clustering methods. According to the results in Tables 4 and 5, the methods with the probability strategy achieve better clustering performance and smaller errors than those without it, which supports the benefit of the proposed probability strategy. Moreover, the comparisons between the ICluster-based and K-means-based incremental methods, with or without the probability strategy, show that the incremental clustering algorithm of ICluster-PS, especially its load pattern intergradation and modification, improves both the clustering performance and the accuracy. As for the three groups of mean errors with different $t$, the mean errors increase as $t$ rises, which means that the errors may accumulate over continuous incremental clustering. However, the three groups of mean clustering performance present the opposite tendency. Therefore, it can only be suggested that the load patterns updated by incremental clustering may tend to deviate from the load patterns obtained by FCCWT over time.
In summary, the proposed incremental clustering algorithm, ICluster-PS, achieves an acceptable accuracy with a mean error of less than 10% and an improved clustering validity via its designed model and probability strategy. This result indicates that an efficient response can be provided when consumers require consumption analysis via smart meters or other facilities with limited resources. Although our experiments set the data of one month as $X_s$, the size of $X_s$ can be chosen freely by consumers in practical applications.

Case Analysis
A random electricity consumer is selected and analyzed in detail to further discuss the proposed method and electricity consumer behaviors. The selected consumer is a full-service restaurant, which has three typical load patterns based on the overall daily load curves. Figure 4 illustrates the load patterns obtained by ICluster-PS and FCCWT in the experiment when $t = 6$. Each subfigure presents both the incremental and non-incremental cluster centers of the data $[X_0, X_1, \ldots, X_s]$, where $1 \le s \le t$. The load patterns in solid lines denote the incremental cluster centers of ICluster-PS, while those in dashed lines denote the non-incremental cluster centers of FCCWT.
According to the clustering performance shown in Table 4, the load patterns of FCCWT are regarded as the accurate results. Note that these accurate load patterns are relatively stable, and no distinct drift in electricity consumption behavior happens to this consumer from July to December. The three typical load patterns of this consumer differ in terms of power degrees, the starting time of the increase in the morning, and the ending time at night. The possible reasons for these distinctions are daylight saving time and seasonal influence. As for the incremental clustering results, their load patterns drift once in August, as shown in Figure 4b. Therefore, we find three typical load patterns in Figure 4a and four typical load patterns in the other subfigures. These updated load patterns are similar to the accurate ones if their power degrees are not taken into account. However, the distinct starting times of the morning increase shown by the accurate ones are not revealed by those of ICluster-PS until December, as shown in Figure 4f.
Figure 4. (a-f) Curves presenting the load patterns based on incremental or non-incremental clustering of $[X_0, X_1, \ldots, X_s]$ over $s = 1, 2, \ldots, t$.
In addition, we evaluate the load patterns with the same accuracy measures and clustering validity indices used in the former evaluation; the results are illustrated in Figure 5. Each curve contains six values, which refer to the evaluation of the load patterns in Figure 4a-f, respectively. Figure 5a-c presents the clustering performance of both ICluster-PS and FCCWT. FCCWT shows a relatively stable clustering performance, while ICluster-PS shows slight fluctuation. The optimal clustering performance of ICluster-PS, especially for DVI and SWC, is presented in July. On the other hand, Figure 5d-f shows the accuracy measures of ICluster-PS compared with FCCWT, so there is only one curve in each subfigure. All three curves increase at first and then decrease after August. Unlike the clustering performance, the optimal accuracies appear in December, which accords with the results shown in Figure 4. Based on the observation of this case, ICluster-PS can achieve incremental clustering for load pattern updating, although it may provide a slightly unstable performance in terms of accuracy and clustering validity. This result is acceptable for providing efficient and effective updated electricity consumption patterns under time and space constraints.

Runtime Comparison
Apart from the time complexity analysis of both incremental and non-incremental clustering algorithms in Section 4, we also compare their runtime in the experiment to support this analysis. The algorithms, which are written in Python and run on a 64-bit Windows 10 operating system with an Intel Core i5-5300U CPU and 8 GB RAM, are performed on the data of 16 commercial consumers in the same randomly selected county. Figure 6 shows the mean runtime comparison of the methods when $t = 9$. The comparison methods include the proposed incremental clustering algorithm ICluster-PS and two non-incremental algorithms, FCCWT and K-means. Each algorithm is run 100 times in every clustering, which means that we run the incremental or non-incremental clustering algorithms 16 × 9 × 100 × 3 times in total. According to Figure 6, the runtime of ICluster-PS is stable at around 0.3 s, while the runtime of the two non-incremental algorithms increases as $t$ rises. This result supports the time complexity analysis in Section 4, namely that incremental clustering saves time when $t$ is relatively large because it reduces the clustering scale. The runtime curve of ICluster-PS shows slight fluctuations, which are caused by the small differences in the data of each month.

Pattern Evolution Analysis
We assume that our incremental clustering algorithm can be used to investigate the evolution of electricity consumption patterns over time while conducting load pattern updates. Therefore, we randomly select a residential consumer, who may have less stable consumption patterns than a commercial consumer, as a case for pattern evolution analysis. Figure 7 shows the updated load patterns of the selected residential consumer from April to December, which means that $t = 9$ is set in the experiment of this case analysis. Each subfigure shows the load patterns of one incremental clustering, in which one month of new data is added to the load patterns of the previous months. For example, Figure 7a shows the load patterns updated by the first incremental clustering based on the existing load patterns of January to March and the new daily load data of April, Figure 7b shows the load patterns updated by the second incremental clustering based on the load patterns shown in Figure 7a and the new daily load data of May, and so on. In Figure 7, we use curves with different colors, line styles, and markers to distinguish the types of updated load patterns. The blue solid curves denote the load patterns that already exist in the last month, which means that these load patterns are not affected by the newly added data and do not drift in the current month. The green dashed curves denote the load patterns that are updated by the newly added data in the current month and drift compared with those in the last month. The red dotted curves denote the load patterns that are completely new and only refer to the days in the current month. Markers on the curves are only used to label different load patterns.
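The three curve types can be expressed as a simple labeling rule on the number of days a load pattern referred to before an incremental clustering step and the number of days added by that step. The function `label_pattern` is a hypothetical helper, the day counts in the usage lines are illustrative, and the sketch deliberately ignores merges between existing load patterns.

```python
def label_pattern(days_before, days_added):
    """Label one load pattern after an incremental clustering step."""
    if days_before == 0:
        return "new"        # only refers to days in the current month
    if days_added > 0:
        return "updated"    # existing pattern that absorbed new days
    return "existing"       # untouched by the new month's data


if __name__ == "__main__":
    # Illustrative (days_before, days_added) pairs.
    for before, added in [(36, 0), (36, 18), (0, 9)]:
        print(before, added, label_pattern(before, added))
```

Tracking these labels month by month gives a compact, text-only view of when a pattern appears, stays fixed, or drifts.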
Moreover, we draw another figure, Figure 8, to illustrate the pattern evolution of the case shown in Figure 7. In Figure 8, each circle with a number denotes a cluster or load pattern, and the number inside the circle denotes the number of days that the load pattern refers to. There are three types of circles, which represent existing load patterns, updated load patterns, and new load patterns, respectively. The plus sign and number shown on an arrow denote the number of new days added to the load pattern after one incremental clustering. Figure 8 is in accordance with Figure 7. The first column on the left of Figure 8 indicates the two load patterns extracted by non-incremental clustering with the load data from January to March. The other nine columns, each of which denotes four load patterns updated by an incremental clustering with the new load data of the current month and the load patterns of the last month (shown in the column to its left), correspond to Figure 7a-i, respectively.
The electricity consumption pattern evolution of the residential consumer is presented clearly in Figures 7 and 8. The residential consumer has two load patterns, which refer to 36 and 54 days, in the first three months of the year based on a non-incremental clustering of the data from January to March. Note that the load pattern with 36 days is unchanged until December. There are 18 new days in December that have a similar shape to this load pattern, so they are added to this pattern and the number of days included in it becomes 54. This load pattern drifts slightly, as can be seen by comparing the curve with 36 days shown in Figure 7h and the curve with 54 days shown in Figure 7i. The other load pattern, with 54 days at first, is unchanged until August. Then, it is continually updated and merged with new days or other existing load patterns, and finally becomes a load pattern with 301 days (Jan-Dec), which is presented by the green dashed curve with star markers shown in Figure 7i. In total, nine new load patterns emerge in April, June, August, September, October, November, and December. Most of them are merged with other load patterns in the next month. For instance, a new load pattern with nine days emerges in August; it is then merged with 18 new days in September and the other load pattern with 74 days (Jan-Aug), and finally becomes an updated load pattern that refers to 101 days (Jan-Sept). We note that some load patterns are merged after one or several incremental clusterings. Why do different load patterns become one after one or a few months? The reason is that the increase in the number of data samples changes the optimal clustering results.
Based on the pattern evolution analysis and a further analysis of the dates of all days included in every load pattern, we can find out when and how this residential consumer changes electricity consumption behaviors. In that case, this consumer can have a clear understanding of her/his electricity demand and make an effective response to it. On the other hand, electricity suppliers or other agencies can also detect anomalies once electricity consumers, especially commercial or industrial consumers, drift significantly in their consumption patterns.

Conclusions and Future Work
This paper aims to achieve efficient demand response and consumer segmentation for both electricity end consumers and suppliers by incremental consumer behavior learning. It supposes that an effective incremental clustering algorithm would constantly update load pattern data for electricity consumers with limited resources. Moreover, the incremental clustering algorithm should reduce the training scale and save time compared with non-incremental clustering algorithms. Therefore, we propose an incremental clustering algorithm with a probability strategy, ICluster-PS, for updating load patterns based on smart meter data. We also provide parameter updating and algorithm generalization for ICluster-PS so that the algorithm can be performed continuously as new data arrive. The proposed algorithm is evaluated on real-world data. The experimental results support the accuracy and validity of our incremental clustering algorithm, especially its load pattern intergradation, modification, and probability strategy. It has lower time complexity and runtime than the non-incremental clustering algorithms. On the other hand, although ICluster-PS cannot provide load patterns that are the same as those extracted directly from the overall electricity load data, it achieves acceptable updated results while saving time, reducing the clustering scale, and making full use of the historical information. It also outperforms other related incremental algorithms and data stream clustering algorithms.
Moreover, we conduct an additional case study of pattern evolution analysis using our proposed algorithm. The analysis results indicate that our algorithm is able to detect load pattern drifts through its updated load patterns. In future work, we plan to improve the performance of the incremental clustering algorithm and employ incremental consumer behavior learning for automatic and real-time load pattern evolution analysis and detection.