Whole Time Series Data Streams Clustering: Dynamic Profiling of the Electricity Consumption

Data from smart grids are challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and seasonal fluctuations, including daily and weekly effects. Because the data arrive sequentially, the underlying distribution is subject to change over time. Time series data streams also have their own specifics in terms of processing and analysis: it is usually impossible to hold the whole data set in memory, since large volumes are generated quickly, so processing and analysis must be carried out incrementally using sliding windows. Although many clustering techniques have been proposed for grouping the observations of a single data stream, only a few focus on splitting whole data streams into clusters. In this article we aim to explore individual characteristics of electricity usage and recommend the most suitable tariff to each customer so that they can benefit from lower prices. This work investigates various algorithms (and their improvements) which allow us to form the clusters, in real time, based on smart meter data.


Introduction
Advances in smart metering have made it feasible to gather information about customer power consumption in real time and to use it for data exploration that yields actionable recommendations. The data (in the form of time series) coming from the smart grid remain challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and seasonal fluctuations, including daily and weekly effects. Although discovering a segmentation of entities based on their electricity consumption data requires considerable effort, the resulting insights are very appealing to electricity providers [1]. By supplying providers with demand response predictions at an aggregated level, obtained through segmentation (other terms such as clustering and grouping are used interchangeably), and by revealing the real economic structure of the entities (e.g., individual users, households, small businesses), the goal is to feed an integrated planning system in which appropriate real-time actions can be proposed to meet system demands effectively [2]. Well-recognized consumption patterns are themselves a source of valuable insight for determining optimal tariff rates and for dealing with spikes in electricity demand.
The analysis of data streams (in this article we deal with time series, and therefore we also use the term time series data streams) coming from the grid over consecutive time windows allows for a better understanding of usage characteristics. With the data arriving in a sequential form, the underlying distribution is subject to change over time, which is referred to as concept drift [3,4]. For example, changes in smart meter streaming data may result from many factors, including weather conditions, the day of the week, or price incentives [5].
It is often observed that smart meter readings received at frequent, regular intervals may have a dynamic distribution or may contain a large number of sparse and missing values. Therefore, traditional algorithms are neither directly applicable nor suitable for this type of data, as they extract patterns by assuming global properties (which requires the complete training data set) rather than capturing local ones.
Time series data streams have their own specifics in terms of data exploration and processing because it is usually not possible to hold the whole history in memory. The data arrive very fast, so processing and analysis should be done incrementally using sliding windows (overlapping or non-overlapping) or other approaches such as stochastic learning weak estimators [6]. Classical clustering algorithms aim to divide a set of objects (observations) into groups so that objects in the same group are more similar to each other than to objects in other groups. The literature on time series data stream clustering makes a distinction in terms of what is the subject of grouping [3]. The first approach clusters observations from a single univariate or multivariate time series data stream, for which many promising tools and methods exist [7]. The second approach analyzes multiple time series data streams generated by several sources (e.g., smart meters) in order to find a division of the sources. In the literature the latter problem is also known as attribute clustering [8]. Although many clustering techniques have been proposed for the first approach, only a few are dedicated to the second. For that reason, in this article we focus on clustering multiple time series data streams, as this is one of the most important challenges in data stream mining.
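The sliding-window processing described above can be sketched as follows (a minimal Python illustration; the parameter names `w` and `step` are our own, and the paper's actual implementation is in R):

```python
import numpy as np

def stream_windows(stream, w, step):
    """Yield successive windows of length w from a 1-D stream.

    step == w gives non-overlapping windows; step < w gives
    overlapping windows (hypothetical parameter names).
    """
    for start in range(0, len(stream) - w + 1, step):
        yield stream[start:start + w]

readings = np.arange(12)                               # toy half-hourly readings
non_overlap = list(stream_windows(readings, w=4, step=4))  # 3 disjoint blocks
overlap     = list(stream_windows(readings, w=4, step=2))  # 5 overlapping blocks
```

Each yielded block corresponds to one B^l matrix slice in the notation of Section 3, processed before the next window arrives.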
In many countries all over the world, the retail demand side of the electricity market consists of several groups of end users. In Poland, for instance, the vast majority of consumers belong to the so-called tariff group G (mostly households). Other end users belong to the so-called tariff groups A (top, strategic customers) and B (large, key customers), which are supplied from the high- and medium-voltage grid, while group C consists of customers connected to the low-voltage grid who consume electricity for business purposes and are called commercial customers [2]. For low-voltage households, operators have set up several different tariff groups, which differ in the time zone (single or two time-zone meters) and in whether or not electricity is used for heating. The most general tariff group for households is G11, i.e., customers with single time-zone meters and a flat price per kWh. The other tariff groups, G12, G12r, and G12w, are time- and weekday-dependent. G12 is effective between 10 p.m. and 6 a.m. and between 1 p.m. and 3 p.m., while G12w is additionally effective during the weekends (between 10 p.m. on Friday and 7 a.m. on Monday). G12r is effective seven days a week between 10 p.m. and 7 a.m. and between 1 p.m. and 4 p.m.
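The tariff zones described above can be expressed as a simple rule lookup, which is useful when comparing a customer's cost under each tariff. This is an illustrative Python sketch; the handling of zone boundaries (inclusive vs. exclusive) is a simplifying assumption:

```python
def is_off_peak(tariff, weekday, hour):
    """Return True if the hour falls in the cheaper zone.

    weekday: 0 = Monday ... 6 = Sunday; hour: 0-23.
    Zones follow the textual description of the Polish G tariffs;
    boundary handling is an assumption of this sketch.
    """
    if tariff == "G11":
        return False                      # flat rate, no cheap zone
    if tariff == "G12":
        return hour >= 22 or hour < 6 or 13 <= hour < 15
    if tariff == "G12w":
        # weekday zones as in G12, plus 10 p.m. Friday - 7 a.m. Monday
        weekend = ((weekday == 4 and hour >= 22) or weekday in (5, 6)
                   or (weekday == 0 and hour < 7))
        return weekend or hour >= 22 or hour < 6 or 13 <= hour < 15
    if tariff == "G12r":
        return hour >= 22 or hour < 7 or 13 <= hour < 16
    raise ValueError(f"unknown tariff: {tariff}")
```

Summing consumption over off-peak vs. peak hours with such a predicate gives the per-tariff cost needed for the recommendation step.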
The main goal of this article is to investigate technical aspects of the existing clustering algorithms for time series data streams. The secondary goal is to explore individual characteristics of electricity usage and to recommend the most suitable tariff to customers so they can benefit from lower prices and thus optimize their expenses. The research is conducted on the basis of a dataset provided by the Irish Commission for Energy Regulation (CER; detailed analysis) [5] and two other datasets, which are described later. We investigate various algorithms (and their improvements) which allow us to form the clusters in real time based on smart meter data. In essence, we develop a clustering approach applicable to data streams, with the primary motivation of creating well-defined user profiles that may further allow more predictable groups of customers to be formed. The contribution of this article can be summarized as follows:
• We created a framework and measures to compare and evaluate time series data stream clustering algorithms;
• New Fast Fourier Transform based features were created (computed in linear time) to compress and represent time series using the business context;
• A comparative study between the state-of-the-art time series data stream clustering algorithms was prepared;
• A comparative study between overlapping and non-overlapping windows and their impact on the choice of an optimal tariff was prepared; and
• Finally, an approach for dynamic consumer segmentation and prediction of an optimal tariff was proposed.
We believe that our contribution addresses the gap related to those aspects of dynamic profiling for which there has been no clear conclusion regarding the benefit of overlapping vs. non-overlapping windows and their impact on the results of clustering algorithms.
The remainder of this paper is organized as follows: Section 2 provides an overview of the similar research problems for data stream time series clustering and electricity consumption segmentation. In Section 3, the theoretical framework of the proposed algorithm is presented. In Section 4, the research framework is outlined, including the details of numerical implementation, evaluation measure description, and algorithm parameter settings. Section 5 outlines the experiments and presents the discussion of the results. The paper ends with concluding remarks in Section 6.

Literature Review
Whilst the vast majority of customers belong to a single tariff, the high volatility within this group creates a number of challenges, including short-term and long-term forecasting to meet the demand side response (DSR) requirements of electricity operators, not to mention the stability of the whole network [9]. Obviously, daily energy consumption depends not only on the composition of the customers' tariffs, but also on many external factors related to specific days, atmospheric phenomena, and weather conditions [10]. There is therefore a need for an objective approach that increases the effectiveness and efficiency of network management and operations by dividing mass markets into consumer groups with clearly similar patterns of behavior. This can be supported by statistical clustering methods, which help to formulate valid and meaningful clusters based on the available measurement data, e.g., hourly readings.
Given the huge number of low-voltage customers, especially households, the scarcity of hourly measuring and recording equipment is a serious limitation. Both the future demand and the initial settlement of customers are determined based on the load shape associated with a specific tariff group. In that case, a similar energy demand structure determines the number of groups. Statistical and engineering techniques [11][12][13][14], time series [15][16][17], and neural networks [16,18,19] are used for load profiling. Based on the literature review, there is a clear and increasingly recognizable research trend addressing the challenges of segmenting electricity end users. For example, the application of the k-means algorithm for clustering the daily load profiles of individual users was described in [17,[20][21][22]. A comparison of clustering algorithms for classifying household electricity consumers, including Kohonen's self-organizing map (SOM) and hierarchical clustering, was presented in [2,23].
The literature on data stream clustering is quite extensive and includes (1) methods aiming at grouping the observations of a single data stream; and (2) proposals that monitor the proximity between multiple data streams in order to find a division of the streams into clusters. A state-of-the-art survey of univariate and multivariate data stream clustering methods is available in [3]. The authors present a comprehensive survey that discusses various types of data stream clustering techniques and the corresponding challenges. So far, most of the attention has been devoted to observation-based data stream clustering, which focuses on clustering the observations from a single data stream. Several categories of methods are referenced, including grid-based, partitioning, density-based, hierarchical, and growing neural gas-based stream methods. The flagship methods in those categories are Str-FSFDP [24], MuDi [25], D-Stream [26], CluStream [27], DBSTREAM [28], BIRCH [29], E-Stream [30], and StreamKM++ [31].
A more detailed analysis of the literature on grouping multiple data streams (or time series streams), which is the subject of this article, is needed. For example, recent methods are constructed to ensure a division of the streams over time [32][33][34][35][36][37][38][39]. All of them monitor the proximity of data streams as records flow in and introduce strategies to obtain a partitioning of the streams into a set of clusters. Other interesting methods, such as [40][41][42][43], focus on monitoring the proximity between streams but do not include a grouping stage.
In the broader context of techniques for electricity consumption data, driven by the explosive growth of time series data, there are interesting attempts that propose a cohort of dominant data set selection algorithms for electricity consumption time series. These focus on discriminating the dominant data set, i.e., a small data set that is nevertheless capable of representing the key information carried by the time series with an arbitrarily small error rate [44].
The authors in [34] discussed the clustering-on-demand framework (COD), which involves a single data scan to derive online statistics. The COD consists of two stages, namely online maintenance (providing an effective mechanism for maintaining hierarchical summaries of data streams) and offline clustering (finding approximations of desired sub-streams from the summary hierarchy according to cluster queries). Based on this algorithm, Chen [39] introduced the CORREL-cluster algorithm, offering a time-horizon segmentation scheme and statistical information storage for each time segment.
A tree-like hierarchy of clusters that evolves with the data using a top-down strategy was introduced in [38]. The Online Divisive-Agglomerative Clustering (ODAC) algorithm incorporates a correlation-based measure of similarity between time series, dividing each node by the furthest pair of streams. Thanks to its splitting and merging operators, the algorithm is able to detect and adapt to the data in the presence of concept drift. The performance of the ODAC algorithm was subsequently improved by the TS-Stream algorithm, which calculates several descriptive time series measures and builds a decision tree [37]. Adequate measures are selected on the basis of a variance minimization criterion. As before, the algorithm can gradually expand or reduce the tree according to changes in the stream that alter the node variance. Finally, in [45] the authors presented an extended version of the TS-Stream algorithm that overcomes some drawbacks of the base algorithm. After those modifications, the final tree structure reaches its full size immediately and can have leaves with a number of time series above a certain threshold (otherwise the tree would be very complex and deep).
An algorithm called IDEStream was introduced by [39]. In this approach, autoregressive (AR) modelling is used to measure the correlation between data streams, and the estimated frequency spectrum is used to extract relevant data stream characteristics such as attenuation rate, phase, and amplitude. The authors in [36] presented a multi-phase algorithm which uses a gamma mixture model to identify dense units of incoming data in the first phase. The aim of the second phase is to cluster the time series from one time window, while the third phase performs incremental clustering between the groups obtained from two consecutive time windows.
In [32] the authors developed a powerful online version of the fuzzy C-means algorithm (FCM-DS), which quickly calculates approximate distances between the streams thanks to a scalable online transformation of the original data. In [35] the authors presented an algorithm called ClipStream, in which time series are compressed and represented by interpretable features derived from a clipped representation. Based on this data transformation, the K-medoids method with the Partitioning Around Medoids (PAM) algorithm clusters the data streams.
Finally, paper [8] presents a strategy based on the independent processing of incoming data batches, through a preliminary summarization using histograms, followed by local clustering carried out on the histograms, which provides a further summarization of the data. To track the proximity between data streams over time, the local clustering outputs are used to update a proximity matrix.

Notations and Data Representation
A time series is an ordered sequence of values of a variable at equally spaced time intervals (e.g., 30-min electricity load readings). Let us assume that s_j = (s_{j,1}, ..., s_{j,t}, ..., s_{j,n})^T is a partial realization of the j-th (j = 1, ..., m) real-valued process S_j = {S_{j,t}, t ∈ Z}. Formally, the problem of grouping time series data streams can be defined as follows. Let S = (s_1^T, ..., s_j^T, ..., s_m^T)^T be the data stream composed of m time series, each of length n (S is a matrix with m rows and n columns). For the l-th (l = 1, ..., k) overlapping or non-overlapping time window (block) with w time slots (intervals), B^l is a subset of the columns of S, i.e., a matrix of dimension m × w (each block consists of the subsequences of all time series from the same time interval). For a given block, L^l = {L^l_1, ..., L^l_o, ..., L^l_p} represents a partition of the rows of B^l such that L^l_o is the o-th cluster of L^l, with L^l_o ∩ L^l_{o'} = ∅ for all o ≠ o' and ∪_{o=1}^{p} L^l_o = B^l.

Histogram-Based Clustering Algorithm
The algorithm presented in [8] is composed of four main phases, where phases 1-3 are performed online, while phase 4 is performed offline. The goal of phase 1 is to represent each time series data stream as a series of histograms by dividing the incoming data into (by default) non-overlapping time windows (this assumption will be extended later) and calculating the histogram of each l-th window: H^l_j = {(I_1, π_1), ..., (I_p, π_p), ..., (I_P, π_P)}, where I_p denotes the P successive bins/intervals associated with the relative frequencies π_p (weights), which sum to 1. In this way one obtains, for each time window, a set of histograms which become the input for the local clustering procedure. The purpose of phase 2 is to obtain a local data partition (using the BIRCH algorithm [29]) on the set of histograms that summarize the data behavior in each window. To this end the L2 Wasserstein metric (distance) is introduced, which calculates the distance between any two histograms H^l_k and H^l_j. As shown in [46], this metric requires an initial homogenization step to ensure consistency of the distance calculations, which depends on the histogram configurations. Since all histograms are uniformly dense in each interval I_p, their quantile functions Q^l_j are piecewise linear. The aforementioned homogenization step consists in subdividing the Q^l_j functions in such a way that the piecewise linear functions are defined on the same set of h cumulative probability values q_v = Σ_{p=1}^{v} π_p (v = 1, ..., h) [8]. To make the computation faster, according to the authors of [46], each bin I_v = [l_v; u_v] of the histogram can be represented as a function of a center and a radius, i.e., c_v = (l_v + u_v)/2 is the center of the interval and r_v = (u_v − l_v)/2 is the radius. Using this representation, the squared L2 Wasserstein distance between two homogenized histograms is d^2_W(H^l_k, H^l_j) = Σ_{v=1}^{h} π_v [(c_{k,v} − c_{j,v})^2 + (1/3)(r_{k,v} − r_{j,v})^2]. The formula takes into account the features of the two compared histograms in terms of shape, range, and location.
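The center/radius form of the squared L2 Wasserstein distance between two homogenized histograms can be sketched in a few lines (illustrative Python; the paper's implementation relies on the HistDAWass R package, and the argument names here are our own):

```python
import numpy as np

def wasserstein2_sq(centers_a, radii_a, centers_b, radii_b, weights):
    """Squared L2 Wasserstein distance between two homogenized
    histograms given by per-bin centers/radii and shared weights pi_v."""
    dc = np.asarray(centers_a) - np.asarray(centers_b)
    dr = np.asarray(radii_a) - np.asarray(radii_b)
    return float(np.sum(np.asarray(weights) * (dc**2 + dr**2 / 3.0)))
```

Identical histograms give distance zero; shifting every bin center by one unit (with weights summing to 1) gives distance one, reflecting the location sensitivity mentioned above.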
To perform a local clustering on the l-th batch, the aforementioned BIRCH algorithm requires two pieces of information about each o-th group (o = 1, ..., p): the histogram centroid (average) H̄^l_o and the L2 Wasserstein-based variance σ²^l_o. According to [47], and based on Formula (2), the mean of a set of equal-frequency histograms is obtained by averaging the centers and the radii of the corresponding h intervals, i.e., c̄_v = (1/m) Σ_j c_{j,v} and r̄_v = (1/m) Σ_j r_{j,v}. On the other hand, a volatility measure for a set of histograms is the average of the squared L2 Wasserstein distances between each j-th histogram and the average histogram defined in Formula (3): σ² = (1/m) Σ_j d²_W(H_j, H̄). The rationale of this phase is to perform a single scan of the input data in order to obtain a division into a large number of clusters with low variability. To do so, the authors in [8] adapted the basic BIRCH algorithm to histogram-based data structures. Whenever a new time window arrives, the algorithm allocates each histogram H^l_j to an existing micro-cluster or generates a new micro-cluster, according to a fixed threshold u that controls the growth of heterogeneity in the micro-clusters. In other words, if the L2 Wasserstein distance to the nearest micro-cluster centroid is smaller than the predefined threshold, d²_W(H^l_j, H̄^l_o) < u, then histogram H^l_j (the representation of the time series data stream) is assigned to this cluster; otherwise an entirely new cluster is created, with its variance σ²^l_o initialized to the L2 Wasserstein distance to the nearest cluster. In phase 3 an update of the proximity matrix A^l = [a^l(k, j)] is performed, which registers the dissimilarities between the streams. The proximity matrix is updated incrementally (each cell a^l(k, j)) each time a new data window is processed in phase 2; it therefore tracks the proximities over time using information only from the local partitions.
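The histogram mean and Wasserstein-based variance used by the micro-clustering step can be sketched as follows (illustrative Python; `centers`, `radii`, and `weights` are assumed to describe already-homogenized histograms, one row per histogram):

```python
import numpy as np

def mean_histogram(centers, radii):
    """Wasserstein barycenter of homogenized histograms: average the
    bin centers and radii bin-wise (centers, radii: m x h arrays)."""
    return np.mean(centers, axis=0), np.mean(radii, axis=0)

def cluster_variance(centers, radii, weights):
    """Average squared L2 Wasserstein distance to the mean histogram."""
    cbar, rbar = mean_histogram(centers, radii)
    d2 = np.sum(weights * ((centers - cbar)**2 + (radii - rbar)**2 / 3.0), axis=1)
    return float(np.mean(d2))
```

These two quantities are exactly what a BIRCH-style micro-cluster needs to maintain in order to apply the threshold test d²_W < u.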
If two histograms H^l_k and H^l_j fall into the same micro-cluster, the proximity matrix is updated by adding the value of the variance σ²^l_o of this cluster to the cell a^l(k, j). On the other hand, if the two histograms fall into different micro-clusters, the cell is updated by adding the mean of two distances, i.e., the L2 Wasserstein distances of the two histograms to their respective nearest micro-cluster centroids. This update strategy uses only information from the micro-clusters and thus requires only m²/2 operations. Finally, phase 4 provides the ultimate global clustering of the time series data streams from block B^l by grouping the updated proximity matrix into L^l. To obtain such a partition the DCLUST algorithm [48] is employed, which minimizes the intra-cluster variability expressed by the sum of the dissimilarities between all pairs of elements within a cluster. According to the authors [8], histograms are fast to compute, with time complexity O(wP). The generation and update of histogram micro-clusters, through a single scan of the histograms in a window, makes the time complexity of the algorithm linear in m and p.
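The phase-3 update rule can be sketched as below (illustrative Python; the names `labels`, `variances`, and `dists` are hypothetical stand-ins for the micro-cluster outputs of phase 2):

```python
import numpy as np

def update_proximity(A, labels, variances, dists):
    """One incremental update of the proximity matrix A (m x m).

    labels[j]   : micro-cluster index of histogram j in this window
    variances[o]: Wasserstein variance of micro-cluster o
    dists[j]    : distance of histogram j to its nearest centroid
    """
    m = len(labels)
    for k in range(m):
        for j in range(k + 1, m):
            if labels[k] == labels[j]:
                inc = variances[labels[k]]        # same micro-cluster
            else:
                inc = (dists[k] + dists[j]) / 2.0  # mean of the two distances
            A[k, j] += inc
            A[j, k] += inc                         # keep A symmetric
    return A
```

Accumulating these increments over windows is what lets phase 4 cluster the streams from local information only.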

ClipStream Algorithm
The ClipStream algorithm is composed of two main phases [35]: online data abstraction (representation) and offline clustering. The first, data representation, phase includes a fast and incremental method of calculating a feature vector from each block B^l, named FeaClip, together with automatic detection of outliers. The second, offline, phase groups the new data abstractions, aggregates the time series data streams in each cluster, and runs the change detection process.
The feature extraction approach of the first phase is based on a so-called clipped representation. Let us first define a short window b_short as a subsequence of an original time series data stream s of length z (z is shorter than the window length w, and it could represent, e.g., one day with 24 or 48 recordings; see also Section 3.1) and a long window b_long consisting of the last d consecutive short windows (it is therefore of length d * z). A new representation (with reduced dimensionality p < z) of b_short, denoted repr_short, is created in two steps. First, the clipped (bit-level) abstraction of the original block is computed as b̂_t = 1 if b_t > µ and b̂_t = 0 otherwise, where µ denotes the mean value of b_short.
Then the compression method called Run Length Encoding (RLE) [49] is applied to this abstraction to create the final representation repr_short (of length 8), defined as: max_1 = max. of run lengths of ones, sum_1 = sum of run lengths of ones, max_0 = max. of run lengths of zeros, crossings = length of the RLE encoding − 1, f_0 = number of first zeros, l_0 = number of last zeros, f_1 = number of first ones, and l_1 = number of last ones. Finally, the ultimate abstraction repr_long is the union of the d short representations repr_short and therefore has length d * 8. Whenever a new window b_short^{d+1} arrives, the first 8 features of repr_long are removed and the new repr_short^{d+1} is appended to the end of repr_long. Based on the calculated FeaClip abstractions of all available time series data streams, outlying values can be easily and automatically detected using domain knowledge. To automate this, the mean values of crossings and sum_1 are calculated for each stream and its corresponding repr_long. Based on these statistics, the lower and upper quartiles and the IQR (interquartile range) are computed to create box-and-whisker diagrams, with the threshold value λ set at 1.5; time series whose characteristics fall within λ * IQR of the quartiles are considered non-outliers. Outlying series are not removed from the clustering altogether; they are simply stored in memory and, after the clusters are determined, assigned to the nearest ones.
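The FeaClip pipeline for one short window (clipping around the mean, run-length encoding, then the eight features) can be sketched as follows (illustrative Python; the reference implementation is the authors' R software):

```python
import numpy as np

def feaclip(window):
    """FeaClip representation of one short window: clip around the
    mean, run-length encode the bits, and extract the eight features."""
    x = np.asarray(window, dtype=float)
    bits = (x > x.mean()).astype(int)          # clipped (bit-level) abstraction
    # run-length encoding: list of (value, run length) pairs
    runs, prev, count = [], bits[0], 0
    for b in bits:
        if b == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = b, 1
    runs.append((prev, count))
    ones = [length for v, length in runs if v == 1]
    zeros = [length for v, length in runs if v == 0]
    return {
        "max_1": max(ones, default=0),
        "sum_1": sum(ones),
        "max_0": max(zeros, default=0),
        "crossings": len(runs) - 1,
        "f_0": runs[0][1] if runs[0][0] == 0 else 0,   # leading zeros
        "l_0": runs[-1][1] if runs[-1][0] == 0 else 0,  # trailing zeros
        "f_1": runs[0][1] if runs[0][0] == 1 else 0,   # leading ones
        "l_1": runs[-1][1] if runs[-1][0] == 1 else 0,  # trailing ones
    }
```

Concatenating these eight numbers over d consecutive short windows yields the repr_long vector of length d * 8.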
Once the data representation phase is completed, the second, offline stage follows to create the final grouping. Only the filtered (outlier-free) repr_long representations are subject to clustering using the K-medoids method with the Partitioning Around Medoids (PAM) algorithm [50] and Euclidean distance. To capture the dynamic and evolving nature of time series data streams, the number of clusters is also determined dynamically. The optimal number of clusters is determined on the basis of the internal Davies-Bouldin index [51]. During the first iteration of clustering, the number of possible clusters is searched in the range [p_min, p_max], and the p that minimizes the Davies-Bouldin index is chosen. To speed up further iterations, the optimal number of clusters is selected from [p − 2, p + 2], where p is the number of clusters from the previous iteration.
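The Davies-Bouldin-driven choice of the number of clusters can be sketched as below. Since scikit-learn ships no PAM implementation, KMeans stands in for the K-medoids step, so this is only an approximation of the ClipStream procedure, not the authors' code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def best_k(X, k_min, k_max, seed=0):
    """Return the cluster count in [k_min, k_max] minimizing the
    Davies-Bouldin index (lower is better)."""
    scores = {}
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = davies_bouldin_score(X, labels)
    return min(scores, key=scores.get)
```

In subsequent windows the search range would be narrowed to [p − 2, p + 2] around the previous optimum, as described above.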
To carry out the grouping of the time series data streams only when necessary, i.e., only when the data streams evolve and their distributions change, a concept drift detection stage is conducted. It detects changes of the empirical distribution function (EDF) of the normalized aggregated data stream within each cluster using the k-sample Anderson-Darling test. Following the standard k-sample formulation, the statistic can be written as A² = (1/N) Σ_k (1/n_k) Σ_{t=1}^{N−1} (N N_{kt} − t n_k)² / (t(N − t)), where (following Section 3.1 and the notation introduced at the beginning of this section) n_k is the size of the k-th sample, N is the pooled sample size, and N_{kt} denotes the number of observations in the k-th sample that are not greater than x_t, with x_1 < · · · < x_N being the pooled ordered sample (long window). Concept drift is detected if the p-value is less than the significance level α, set at 0.05; however, the clustering is updated only if one of the following conditions is met: (1) the number of detected changes is more than half of the p grouped time series (number of clusters); or (2) the number of detected changes is higher than in the previous step of the sliding window.
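A drift check of this kind is available off the shelf: SciPy implements the k-sample Anderson-Darling test, so a per-cluster detector can be sketched as below (illustrative; the 0.05 threshold follows the text, and SciPy's reported significance level is an approximate p-value clipped to [0.001, 0.25]):

```python
import numpy as np
from scipy.stats import anderson_ksamp

def drift_detected(old_window, new_window, alpha=0.05):
    """Flag concept drift when the k-sample Anderson-Darling test
    rejects equality of the two empirical distributions."""
    result = anderson_ksamp([np.asarray(old_window), np.asarray(new_window)])
    return result.significance_level < alpha
```

Running this on the normalized aggregated stream of each cluster gives the per-cluster change flags that the two update conditions above are counted over.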
According to the authors [35], the representation phase has linear time complexity O(w) with respect to the length of the time window. The outlier detection phase is linear, O(m). The offline phase consists of the PAM clustering algorithm, each iteration of which has quadratic complexity O((m − m_o)²), where m_o denotes the number of outliers.

Extended TS-Stream Algorithm
The algorithm presented in [45] is an extended (improved) version of the algorithm presented in [37]. In general, it builds a model with a structure similar to a decision tree, but constructed in an unsupervised manner. A top-down strategy is employed to build the tree, starting from all time series data streams in the same main cluster (root) and gradually creating partitions or aggregations. Each internal node executes a binary test of the type feature_value ≤ x for a specific descriptive time series measure. Once a leaf is reached, the time series is stored together with the other time series belonging to the same leaf.
During the first step the algorithm calculates descriptive measures (also called coefficients, characteristics, or indices) for each time series data stream. This gives a matrix of characteristics of dimension m × f, where f is the number of characteristics. To make all features comparable (which is required when the variance minimization criterion is used), the z-score normalization x' = (x − µ)/σ is applied to each column of the matrix. A simple and natural way to model each time series data stream is to use generating functions to depict its behavior in the time domain. Unfortunately, many existing grouping techniques do not take into account specific characteristics of the generating function, e.g., stochasticity, linearity, and stationarity. The algorithm therefore employs several descriptive measures in order to capture the appropriate characteristics of the generating function and better describe the resemblance between series.
The authors in [37] claim that, after investigating several descriptive measures such as Discrete and Continuous Wavelet Transforms, Recurrence Quantification Analysis measures, Empirical Mode Decomposition, the Lyapunov exponent, the Discrete Cosine Transform, Detrended Fluctuation Analysis, the autocorrelation function, and Box-Jenkins model parameters, the best ones were the Hurst exponent, Auto Mutual Information (AMI), and the Discrete Fourier Transform (DFT). Those indices were chosen because they are efficient to compute and provide high information gain (see Formulas (12)-(14) below).
The Hurst exponent is a measure of the long-term memory of a time series. It refers to the autocorrelation of the series and the rate at which it decreases as the delay between value pairs increases. There are different approaches to estimating the exponent; the rescaled range (R/S) approach is the one most often used. The Hurst exponent H is defined in terms of the asymptotic behavior of the rescaled range as a function of the time span of the series, as follows [37]: E[R_t / S_t] = c t^H as t → ∞, where t stands for the time span of the observation, c is a constant, R_t is the range of the first t cumulative deviations from the mean, and S_t is their standard deviation. The second measure, Auto Mutual Information (AMI), provides insight into how much one random variable explains the other. To calculate this characteristic, a histogram (with intervals) has to be created. Let p_i be the probability that the signal has a value inside the i-th interval, and let p_ij(τ) be the probability that s_t is in interval i and s_{t+τ} is in interval j. Then the AMI for a time delay τ is defined as [37]: AMI(τ) = Σ_{i,j} p_ij(τ) log(p_ij(τ) / (p_i p_j)). The last measure is the Discrete Fourier Transform (DFT) [52], which describes a time series in the frequency domain. Given a time series s_t as input, this transform provides a new series X_m of n complex numbers, each describing a sine function at a given frequency [37]: X_m = Σ_{t=0}^{n−1} s_t e^{−2π j m t / n}, m = 0, ..., n − 1, where j = √−1. The Fourier transform helps to characterize the generating function of the time series by indicating the most relevant frequencies; the 20 DFT coefficients with the highest energy of every time series in each window have been retained.
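Of the three measures, the DFT-based features are the simplest to reproduce; a sketch of retaining the highest-energy coefficient magnitudes follows (illustrative Python; excluding the DC term is an assumption of this sketch):

```python
import numpy as np

def dft_features(series, k=20):
    """Return the k DFT coefficient magnitudes with the highest
    energy (DC term excluded), sorted in descending order."""
    spectrum = np.fft.rfft(np.asarray(series, dtype=float))
    mags = np.abs(spectrum[1:])        # drop the DC (mean) component
    return np.sort(mags)[::-1][:k]
```

For a pure sinusoid of length n, a single coefficient of magnitude n/2 dominates, so the feature vector concentrates the signal's energy in its first entry.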
To split the time series into different clusters/nodes, a dedicated function is called each time, which is responsible for finding the best coefficient for the binary test of the current node. This function takes the normalized matrix of characteristics as input and aims to minimize a weighted variance criterion of the form: WVC = (n_left / n) σ²(V_left) + (n_right / n) σ²(V_right), where V is the current node consisting of n time series data streams, σ²(·) is the variance function, and V_right and V_left are the nodes established after the split, with n_right and n_left series, respectively.
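The weighted variance criterion can be sketched as a threshold scan over one feature column (illustrative Python; the exhaustive scan over sorted values is an assumption about how candidate thresholds are enumerated):

```python
import numpy as np

def best_split(feature_values):
    """Scan candidate thresholds on one feature column and return
    (threshold, WVC) minimizing the weighted variance criterion."""
    x = np.sort(np.asarray(feature_values, dtype=float))
    n = len(x)
    best = (None, np.inf)
    for i in range(1, n):
        left, right = x[:i], x[i:]
        wvc = (len(left) * left.var() + len(right) * right.var()) / n
        if wvc < best[1]:
            best = (x[i - 1], wvc)     # binary test: feature <= threshold
    return best
```

Repeating this scan over every column of the m × f characteristics matrix and keeping the overall minimum gives the binary test installed at the node.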
In each consecutive iteration, after obtaining a new time window, the algorithm keeps the current tree model (the structure from the previous iteration) and clusters the time series based on the new batch of data. After this, the update stage begins, in which splits and/or aggregations are checked and executed if necessary and/or possible [37]; this is controlled by a set of parameters, i.e., α ∈ [0, 1], λ ∈ [0, 1], and minSeries. Two sibling leaves (denoted LeftChild and RightChild) must be aggregated if their weighted variance (denoted WVC) is greater than or equal to λ times the variance of the parent node (VP) computed from its test feature. This makes the structure of the tree simpler and more resistant to noise/outliers. If no aggregation occurred, the algorithm checks for possible leaf splits, which are performed if the weighted variance of the potential children decreases by at least α times the leaf's variance. Finally, to prevent a split when one of the two potential children would have less than a certain percentage of all observations, the minSeries parameter controlling the complexity/depth of the tree is set by default at 5%.
The overall time complexity is O(m²w). It is important to note that the quadratic term refers to the number of time series, which is typically low (on the order of tens) [45].

Numerical Implementation
As presented below, the numerical experiments were carried out using the R programming language on the Ubuntu 18.04 operating system, on a personal computer equipped with an Intel Core i7-9750H 2.6 GHz processor (12 threads) and 32 GB of RAM.
The first algorithm, histogram-based clustering, was implemented using several libraries. To represent each time series as a histogram and to compute the L2 Wasserstein distance, the HistDAWass package was used [47], which implements the framework of Symbolic Data Analysis, a relatively new approach to the statistical analysis of multi-valued data. Next, to obtain a local data partition based on a set of histograms, a modification of the BR_BIRCH package was used [53]. Finally, the symbolicDA package [54] was utilized to obtain a global clustering using the DCLUST algorithm. The second algorithm, ClipStream, was implemented entirely using the ClipStream library, which is the software accompanying the article [55]. Finally, the extended TS-Stream algorithm was implemented in line with [45].

Algorithms Parameters Setting
In order to have robust and consistent results, all algorithm parameter settings are in line with the source articles and libraries. Since the parameters α and λ of the extended TS-Stream algorithm have a similar influence, it is not recommended to set one value as a function of the other. During the research preparation stage, it was observed that setting these two parameters to values smaller than 0.6 resulted in almost no splits, whereas values greater than 0.6 could result in a tree that is too wide and too deep. Next, the minSeries parameter, which controls the size of the tree, is set at 5% (50 time series). Since there are 1000 time series in the investigated data set (see Section 5), the final tree structure might have up to 20 leaves, i.e., clusters.
For the ClipStream algorithm, the long (b_long) and short (b_short) window lengths were set to 1008 and 48 for overlapping windows and to 1440 and 48 for non-overlapping windows (see Section 4.4), while the threshold value λ determining outliers was set at 1.5. The optimal number of clusters, derived using the Davies-Bouldin measure, was determined in the range between 5 and 11. The latter number was determined as the average number of clusters obtained for each batch (for both overlapping and non-overlapping windows) by the extended TS-Stream algorithm. Finally, concept drift is detected if the p-value is less than the significance level α, set at 0.05.
The histogram-based clustering algorithm has the following adjustable parameters: P, which determines the number of bins for each histogram, was set at 10 (the average number of clusters obtained for both aforementioned algorithms); u, which is a threshold on the micro-cluster size, was set at 0.01; and, because the two remaining algorithms usually provided the maximal number of clusters, the o parameter, which defines the number of clusters, was set at 11.

Tested Changeable Components
One of the main goals of the article is to find the best clustering algorithm and, if possible, to propose some improvements with regard to different components adopted from other algorithms. To do so, firstly, a comparative study between overlapping and non-overlapping windows was conducted, i.e., the research was conducted in two variants (see also Figure 1):

• Using non-overlapping windows: This approach is in line with our previous study, where the window length w of each block B_l has been set to 30 days. As the electricity consumption data were recorded at 30-min intervals, each window has a length of 1440 (2 × 24 h × 30 days);
• Using overlapping windows: This approach is in line with the article [35] implementing the ClipStream algorithm, where the window is of length 21 days (3 weeks). In this case, each window contains two overlapping weeks led by the newly arriving week (2 × 24 h × 21 days = 1008).
Secondly, new Fast Fourier Transform based features (calculated in linear time) are proposed, allowing time series to be compressed and represented using the business context. In our previous paper, a set of 20 dominating Fourier coefficients was taken as descriptive measures (see also Section 3.4). To make the usage of Fourier coefficients more intuitive, in this paper the frequency domain has been divided into four intervals/ranges. Each of them represents electricity consumption behavior changes: monthly, weekly, daily, and all more frequent, respectively (see Table 1). The frequency is calculated according to the following equation:

$$f_c(m) = \frac{m \cdot f_s}{w},$$

where f_c(m) is the frequency of the m-th coefficient, f_s is the sampling frequency, and w is the number of samples (i.e., the window length) used in the Fourier transform. A period is calculated as 1/f_c(m). As can be noted, the end of an interval is not the beginning of the next one; one should remember the discrete nature of the DFT coefficient values. Moreover, f_c(0) represents the mean value. The aforementioned features were used in the extended TS-Stream algorithm (in this case a node partition is performed based on only one feature) and in the ClipStream algorithm. In the latter case, instead of the FeaClip representation, each time series is represented based on those four features.
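The coefficient-to-period mapping and the resulting band features can be sketched as below. The interval boundaries here are illustrative (the paper's Table 1 defines the ones actually used), and f_s is expressed in samples per day (48 half-hour readings), so f_c(m) comes out in cycles per day:

```python
def coeff_frequency(m, w=1440, f_s=48.0):
    """f_c(m) = m * f_s / w, with f_s in samples per day (48 half-hour
    readings), so the result is in cycles per day."""
    return m * f_s / w

def band_energies(dft_magnitudes, w=1440, f_s=48.0):
    """Aggregate spectral energy into the four behavioural bands
    (monthly / weekly / daily / all more frequent)."""
    bands = {"monthly": 0.0, "weekly": 0.0, "daily": 0.0, "faster": 0.0}
    for m in range(1, len(dft_magnitudes)):       # m = 0 is the mean value
        period_days = 1.0 / coeff_frequency(m, w, f_s)
        energy = dft_magnitudes[m] ** 2
        if period_days >= 28:
            bands["monthly"] += energy
        elif 6 <= period_days <= 8:
            bands["weekly"] += energy
        elif 0.9 <= period_days <= 1.1:
            bands["daily"] += energy
        elif period_days < 0.9:
            bands["faster"] += energy
    return bands
```

For a 30-day window (w = 1440), m = 1 corresponds to a 30-day period and m = 30 to a 1-day period, which is why the gaps between intervals do no harm: only discrete periods 30/m days are attainable.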
Thirdly, to perform the clustering of time series data streams only when it is necessary, a stage detecting concept drift using the k-sample Anderson-Darling test (an idea taken from the ClipStream algorithm) was also implemented in the extended TS-Stream algorithm.
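A minimal sketch of this drift check using SciPy's k-sample Anderson-Darling test (our wrapper; note that SciPy caps the reported approximate significance level to the [0.001, 0.25] range, so very small p-values are floored):

```python
import numpy as np
from scipy.stats import anderson_ksamp

def drift_detected(previous_window, new_window, alpha=0.05):
    """Run the k-sample Anderson-Darling test on two batches and flag
    concept drift when the reported significance level falls below alpha."""
    result = anderson_ksamp([np.asarray(previous_window),
                             np.asarray(new_window)])
    return bool(result.significance_level < alpha)
```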
Finally, it is necessary to mention that none of the above improvements/components were implemented in the histogram-based clustering algorithm, because they would entirely change the logic and the behavior of this algorithm.

Framework and Measures for Clustering Comparison
The main problem in the investigated area is the fact that there are no explicit frameworks, measures, or criteria allowing one to assess performance and effectiveness and to compare algorithms with each other. To overcome this issue, we propose the following framework.
To compare the results of the grouping against external criteria, a measure of consensus is needed. Since it is assumed that each time series is assigned to only one cluster, a natural way is to utilize the Adjusted Rand Index (ARI), which is a measure of the similarity between two data clusterings. However, the practical aim of this article is to propose an optimal tariff for each time series. In this context, we would like to know which clustering algorithm provides stable results, i.e., clusterings that are similar to each other. To do so, we reformulated the standard ARI measure as follows:

$$ARI = \frac{\sum_{o,u} \binom{n_{ou}}{2} - \left[\sum_{o} \binom{n_{o*}}{2} \sum_{u} \binom{n_{*u}}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_{o} \binom{n_{o*}}{2} + \sum_{u} \binom{n_{*u}}{2}\right] - \left[\sum_{o} \binom{n_{o*}}{2} \sum_{u} \binom{n_{*u}}{2}\right] \Big/ \binom{n}{2}},$$

where n_ou denotes the number of objects that are in both cluster l_o from the l-th time window and cluster l_u from the (l + 1)-th time window (l_u is simply the same cluster as l_o but from the consecutive window), with the marginal counts denoted as n_o* and n_*u. After comparing each batch to each other, an upper-triangular matrix is created [45] (for an example, please see Table 4). The second measure is closely related to the selection of an optimal tariff for each customer. Let us assume that a particular customer has the base tariff G11 (a single time zone with a flat price rate per kWh) over an entire year. From the customer perspective, it might be better to change the tariff to G12 for the entire year. Furthermore, one may analyze more frequent changes of the tariff, e.g., after each month or even after each week.
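Computing the ARI from the contingency counts, and filling the upper-triangular matrix over the window-wise clusterings, can be sketched as (function names are ours):

```python
from math import comb

def ari(labels_a, labels_b):
    """Adjusted Rand Index computed from the contingency counts n_ou."""
    n = len(labels_a)
    table = {}
    for a, b in zip(labels_a, labels_b):
        table[(a, b)] = table.get((a, b), 0) + 1
    rows, cols = {}, {}
    for (a, b), v in table.items():
        rows[a] = rows.get(a, 0) + v              # marginals n_o*
        cols[b] = cols.get(b, 0) + v              # marginals n_*u
    sum_ou = sum(comb(v, 2) for v in table.values())
    sum_o = sum(comb(v, 2) for v in rows.values())
    sum_u = sum(comb(v, 2) for v in cols.values())
    expected = sum_o * sum_u / comb(n, 2)
    max_index = (sum_o + sum_u) / 2
    return (sum_ou - expected) / (max_index - expected)

def ari_upper_triangle(labelings):
    """Compare the clustering of every window with every later window."""
    L = len(labelings)
    return {(i, j): ari(labelings[i], labelings[j])
            for i in range(L) for j in range(i + 1, L)}
```

Note that the ARI is label-invariant: two clusterings that split the objects the same way but with renamed cluster ids still score 1.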
To answer that question we propose the following approach: (1) For a particular time window l, apply a given clustering algorithm; (2) Assign a particular customer to its cluster; (3) Determine an optimal tariff for the entire cluster, i.e., the lowest price for the aggregate consumption of all customers in the cluster, by calculating the total electricity cost if they belonged to the G11, G12, G12r, or G12w tariff plan; (4) Select the optimal tariff from the previous step as the optimal tariff for a given customer; (5) Deploy the optimal tariff for each customer as the tariff for the next time window l + 1; (6) Return to the first step.
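Steps (2)-(5) of the procedure can be sketched as follows, assuming a pricing function per tariff that maps a time slot and a kWh reading to a cost (all names and data shapes are illustrative):

```python
def cluster_cost(profiles, price_fn):
    """Cost of the aggregate consumption of one cluster under a tariff,
    where price_fn maps (slot index, kWh) to a cost."""
    aggregate = [sum(slot) for slot in zip(*profiles)]
    return sum(price_fn(t, kwh) for t, kwh in enumerate(aggregate))

def deploy_optimal_tariffs(clusters, loads, tariffs):
    """Choose, per cluster, the cheapest tariff for the aggregate load,
    and assign it to every member for the next window.
    `clusters`: cluster id -> customer ids; `loads`: customer id -> series;
    `tariffs`: tariff name -> price function."""
    assignment = {}
    for members in clusters.values():
        profiles = [loads[c] for c in members]
        best = min(tariffs,
                   key=lambda name: cluster_cost(profiles, tariffs[name]))
        for c in members:
            assignment[c] = best
    return assignment
```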
According to the above procedure, it might happen that for a given customer the optimal tariff for an entire year is G12; on the other hand, the optimal tariff might change each time a new batch of data arrives. Next, to assess whether the application of a particular clustering algorithm and the aforementioned procedure makes sense, we propose to derive, as previously, a similar upper-triangular matrix with the following values:

$$\text{Tariff improvement} = \frac{\text{dynamic optimal tariff}}{\text{static optimal tariff}}.$$

To clarify this, let us consider the first data batch l in a given year (for non-overlapping windows there would be 12 batches). This case is represented as the top row of the upper-triangular table (Table 4). Based on that particular window, it was decided that the optimal tariff for the entire year is G12w (the optimal tariff for the cluster to which a particular customer belongs); therefore, for this row, the denominator in the above equation always takes the same value, i.e., the price of this fixed tariff for a particular customer calculated for each month separately. On the other hand, the numerator is determined dynamically. For the first column it takes the same value as the denominator. For the remaining eleven columns (batches l + 1, . . . , l + 11) it takes the dynamically changing price of the tariff determined in step (5) of the procedure mentioned earlier. Such a table is prepared for each customer; therefore, to obtain one global table, as in the case of the ARI, each field in the final table was calculated as the mean value of the 1000 customer-wise matrices.
The last measure is the weighted volatility of the time series for a given block B_l. After the division, the time series are spread over several groups. It is assumed that the variation (standard deviation) of the electricity consumption in each group should be less than the variation of the time series kept in a single group (root) [45]. Furthermore, because of the differences in group sizes, the measure accounts for this by assigning smaller weights to smaller leaves, as on the right-hand side of Equation (19):

$$WV(B_l) = \sum_{o} \frac{\#L_l^o}{m}\, \sigma\!\left(L_l^o\right),$$

where #L_l^o denotes the number of time series in a given cluster, m denotes the number of time series in the block B_l, and σ(·) is the standard deviation of all time series assigned to a given cluster L_l^o.
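The weighted volatility above can be sketched as follows, assuming σ is taken over all readings of the series assigned to a cluster (a pooled standard deviation; this pooling choice is our assumption):

```python
import statistics

def weighted_volatility(clusters):
    """clusters: list of clusters, each a list of time series (lists of
    readings). The weight of a cluster is its share of all series."""
    m = sum(len(cluster) for cluster in clusters)
    wv = 0.0
    for cluster in clusters:
        pooled = [x for series in cluster for x in series]
        wv += (len(cluster) / m) * statistics.pstdev(pooled)
    return wv
```

A sanity check consistent with the assumption stated in the text: splitting homogeneous groups apart should give a weighted volatility below the standard deviation of the single root group.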

Data and Tariffs Characteristics
The dataset used in this research originates from the Irish Commission for Energy Regulation (CER) project, in which the electricity load of 4182 households was recorded between July 2009 and December 2010. In total, the time span covers 75 weeks, with readings recorded at 30-min granularity [5]. Due to missing recordings in the time series and the computational complexity of the investigated algorithms, the research was conducted using data from 1000 randomly selected households.
Unfortunately, the CER dataset does not provide any information regarding the tariff plan of each customer. After investigating several tariff plans offered by electricity suppliers in European countries, it can be stated that there are many similarities. Therefore, to simulate the optimal tariff, all the information and the tariff prices were taken from one of the biggest energy holding companies in Poland.
Depending on the tariff plan, customers can benefit from lower prices per kWh if their usage falls within certain time zones. Figure 2 presents the prices for the G11, G12, G12w, and G12r tariffs. The G11 tariff (blue straight line) has a fixed price of 0.35 PLN/kWh. The G12r tariff (purple dotted-dashed line) has a lower rate of 0.21 PLN/kWh between 10 p.m. and 7 a.m. and between 1 p.m. and 4 p.m., while the higher rate of 0.48 PLN/kWh applies outside these windows. The G12w tariff (green double dotted-dashed line) has a lower rate of 0.28 PLN/kWh during the weekends and Monday-Friday between 10 p.m. and 6 a.m. and between 1 p.m. and 3 p.m., while the higher price of 0.43 PLN/kWh applies outside these windows.
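The time-zone rates above translate directly into per-slot pricing functions for half-hourly readings; a minimal sketch follows (the boundary handling at the exact switch hours is our assumption, and the G12 rates, not listed here, are omitted):

```python
def g11_rate(hour, weekday):
    return 0.35                                   # flat rate, PLN/kWh

def g12r_rate(hour, weekday):
    # lower rate between 10 p.m.-7 a.m. and 1 p.m.-4 p.m.
    return 0.21 if hour >= 22 or hour < 7 or 13 <= hour < 16 else 0.48

def g12w_rate(hour, weekday):
    # lower rate on weekends and on weekdays 10 p.m.-6 a.m. and 1 p.m.-3 p.m.
    if weekday >= 5 or hour >= 22 or hour < 6 or 13 <= hour < 15:
        return 0.28
    return 0.43

def bill(readings, rate_fn, start_weekday=0):
    """Total cost of half-hourly readings; slot s falls in hour (s // 2) % 24,
    and start_weekday=0 means the series starts on a Monday."""
    total = 0.0
    for s, kwh in enumerate(readings):
        hour = (s // 2) % 24
        weekday = (start_weekday + s // 48) % 7
        total += kwh * rate_fn(hour, weekday)
    return total
```

For a constant 1 kWh per half hour, a weekday costs 16.80 PLN under G11 and 16.56 PLN under G12r, so which tariff wins clearly depends on how much of the load sits in the low-rate windows.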
Let us now simulate the relation between the best and the worst tariff for each customer. Table 2 shows various statistics of the simulation (aggregated over 1000 customers) for the non-overlapping windows case. When dynamically changing the optimal tariff for each customer, the minimal improvement between the best and the worst individual tariff is 2.39%, while the biggest improvement reaches 19.27%. The second row of the table shows the improvement between a dynamically changing optimal tariff and one fixed optimal tariff derived for the entire period. It was observed that the dynamic change resulted in an average improvement of 0.28%. Finally, it can be concluded that, on average, an optimal tariff would change almost 5 times out of the 17 data batches, each 30 days long, in the analyzed timeframe. In the overlapping windows case (Table 3), the results are slightly higher. The average improvement between the best and the worst individual tariff for each batch increases to 8.47%, while the best individual tariff for each batch vs. the best individual tariff for the entire period increases to 0.51%, on average.
Since there are 73 batches in this scenario, each 21 days long, the median number of dynamic individual tariff changes is 25. These results present the best- and worst-case scenarios, where an optimal tariff is derived for each customer separately without any clustering algorithm. Therefore, they provide benchmarking ranges within which the clustering results presented in the following subsections will fall.

Clustering Results
Let us now investigate which algorithm provides relatively robust results, i.e., overall groupings that are similar to each other (in other words, keeping time series within the same clusters). For the non-overlapping case, the extended TS-Stream algorithm provides, on average, 11 clusters, all having more than 5% of all time series. For the 17 investigated batches, on average, each time series should change its optimal tariff 7.98 times (the median is 8; this is determined as the optimal tariff for the cluster to be monitored). The ClipStream algorithm changes the tariff 5.38 times on average (median 6), while not using concept drift increases these values to 6.52 and 7. On average, the histogram-based algorithm changes the tariff 7.04 times (median 7). All the aforementioned numbers are higher than those reported in Table 2, where the best tariff is chosen separately for each customer without any clustering algorithm, which means that a time series changes its tariff more frequently than it should. For a better understanding of the idea, in this article we present only a sample matrix of the ARI index obtained for the ClipStream algorithm (Table 4). Tables 5 and A1 (in Appendix A) provide various statistics of the ARI and the tariff improvement derived from the upper-triangular matrices (described in Section 4.4) for both non-overlapping and overlapping windows. In this example, the similarity (measured using the ARI) between the first batch B_1 and the second batch B_2 is 0.100. The clustering from the first batch is the least similar to batches seven to nine (0.040). Because the algorithm detected no concept drift between batches B_7-B_9, no change of cluster membership occurred, which results in an ARI equal to 1.
According to the results presented in Table 5 (the best results for each statistic are in bold), it can be seen that, on average, the highest ARI is provided by the histogram-based algorithm. This is driven by two factors: first, it always generates the same number of clusters; secondly, it divides customers into clusters based on the iteratively updated (after each batch) global proximity matrix, which uses the partition from the BIRCH algorithm (the second step of this algorithm). This step provides only a minor modification of the global matrix, and once the DCLUST algorithm is applied in the last step, it produces very similar groupings (customers rarely change their cluster). It should be noted that whenever the ClipStream algorithm decides not to make any changes, the ARI equals 1. The worst results are obtained by the extended TS-Stream algorithm (median 0.033).
For the overlapping windows case (Table A1 in Appendix A), the dependencies are similar. Once again the histogram-based algorithm produces the most stable partitions. In the previous case, the concept drift module was not used for the extended TS-Stream algorithm; this time, for a couple of batches, the tree preserved the same structure, which increased the highest value to 0.326. Interestingly, for the ClipStream algorithm, the new data representation (Fourier coefficients) increases the lower statistics (up to the median).
In a similar manner as for the ARI index, the upper-triangular matrix has been derived for the tariff improvement (Equation (18)).
From a practical point of view, it is better for the electricity provider to have customer groups of relatively similar size [2]. The extended TS-Stream algorithm guarantees that each cluster has no less than 5% of all customers, and after investigating the group sizes it can be stated that this algorithm produces clusters of similar size. On the other hand, both the ClipStream and histogram-based algorithms have no such restriction. On average, the ClipStream algorithm generates one (rarely two) clusters containing only a couple of customers (1-5 time series). The histogram-based algorithm usually produces three to four clusters that are very small. This observation has a high influence on the values of the investigated metrics (they are rewarded), since in small groups memberships change rarely and the volatility is small (see Tables 6 and A3). According to the results presented in Table 6, the least volatile partitions are provided by the extended TS-Stream algorithm, with a median of 21.06 and a mean of 23.22 (since there were no batches where the concept drift module was triggered, both versions produce the same results). Second place in this ranking is taken by the histogram-based algorithm, whose maximal volatility is even smaller than for the extended TS-Stream. For the overlapping windows case, the least volatile groups are produced by the histogram-based algorithm. Slightly worse results are obtained by the extended TS-Stream (with the concept drift module), whose minimal statistic is even smaller than for the histogram-based algorithm. Finally, for both window types (overlapping and non-overlapping), the new data representation and not using the concept drift procedure worsen the ClipStream results.

Tariff Evaluation
In this section, the tariff improvements are discussed. When it comes to the various statistics for non-overlapping windows, it is observed that all the investigated algorithms provide, on average, an improvement of 0.3%-0.4% (please refer to Table 7). The highest improvement is observed for the extended TS-Stream and the histogram-based algorithms, and for the ClipStream algorithm with the newly proposed data representation (up to 1.8%). Moreover, the first two algorithms do not produce worse results (please refer to the first column with the Min values). For the overlapping windows case (please refer to Table A2), once again, all algorithms usually provide an improvement, with the mean value between 0.1% and 0.2%. Unfortunately, in the worst-case scenario each algorithm chose a worse tariff; the smallest worsening (−0.1%) is for the extended TS-Stream algorithm without the concept drift module.
The last results presented below answer the question of whether it is possible and justified to use the clustering (and the associated optimal tariffs for each group) obtained for a particular batch B_l and then deploy those optimal tariffs as the applicable tariffs in the following period B_{l+1}. Tables 8 and A4 provide statistics of the tariff improvement compared to the basic (flat) G11 tariff in the case when the future optimal tariff for each customer (for the next data batch) is derived as the current optimal tariff for the cluster to which that customer belongs. The advantage of this approach is that it requires neither training nor the use of any predictive models. As shown in Table 8, for the non-overlapping windows case it is, on average, possible to achieve some improvement. The ClipStream algorithm provides better results, of 0.31% compared to the base tariff (removing the concept drift module gives improvements as well). Both versions of the extended TS-Stream produce no mean improvement, with a median value of −0.09%. Unfortunately, the histogram-based algorithm usually provides a tariff worse than the costs related to G11. It should be noted that when comparing the optimal predicted tariff to a random tariff (rather than to G11), the results are, on average, always better (see Table A5). For the extended TS-Stream algorithm the improvement is 1.66%, for the ClipStream algorithm (base version) 2.17%, and for the histogram-based algorithm 1.50%.
For the overlapping windows case (a batch size of 3 weeks, with each new batch covering one new week), please refer to Table A4; the improvements are more common and clearer for all algorithms, i.e., both the median and the mean improvement are positive. Only for the statistics from the 3rd quartile upwards can a worsening be noted. The biggest improvement is noted for the base version of the ClipStream algorithm (7.9%). Second place in terms of the mean value belongs to both versions of the extended TS-Stream algorithm (0.21%; 0.20%).
Finally, when it comes to the comparison to the random assignment of tariff (as an optimal for the future), the extended TS-Stream algorithm (base version) achieves improvement of 2.69%, for the ClipStream algorithm (base version) it is 2.91%, and for the histogram-based algorithm it equals 2.65% (see Table A6).
Based on the results, we can summarize the comparative study between overlapping and non-overlapping windows and their impact on the choice of an optimal tariff, as outlined in Table 9. For the purpose of the results discussion, the average improvements were considered. It was observed that the implementation of the current best tariff is feasible and could deliver benefits for both overlapping and non-overlapping windows. Specifically, for non-overlapping windows the general tariff improvement was up to 0.40%, on average, depending on the algorithm. In the case of the tariff improvement compared to the G11 tariff plan, the highest improvement was for overlapping windows, where the two ClipStream variants (with and without concept drift) were able to deliver up to 0.43% of improvement, on average. Importantly, the results, in terms of the tariff improvement, only highlight the possible knowledge utilization based on the algorithms used for profiling the customers. Nevertheless, the results are promising, although the improvements might appear negligible. Please note that the improvement rates of 0.40-0.43%, as provided in Table 9, directly influence the elasticity of electricity demand. In the case of Poland, the whole installed capacity of the system is approx. 45,000 MW, so an improvement of 0.43% represents 193.5 MW, which is the equivalent of one power block in a power plant. Therefore, if some of the usage can be shifted outside peak hours, the benefit accrues not only to the customers but also to the electricity operators, who can purchase the electricity cheaper.

Other Applications-Australian Case Study
To prove the applicability of the dynamic profiling approach, a further analysis was conducted based on data from the customer trial conducted as part of the Smart Grid Smart City (SGSC) project [56]. It provides sets of customer time-of-use (half-hour increments) and demographic data for Australia between 2010 and 2014. For the purpose of the case study, 998 households were randomly extracted, covering the 1 September 2012-28 February 2014 time frame. The reason for selecting that time frame was the availability of complete data, i.e., without missing values. In total, 25,399 data points were analyzed, each representing a half-hour reading.
For the purpose of the results discussion, the average improvements were considered, as presented in Table 10. It was observed that the implementation of the current best tariff is feasible and could deliver benefits for both overlapping and non-overlapping windows. Specifically, for non-overlapping windows the general tariff improvement was up to 0.96%, on average, depending on the algorithm. In the case of the tariff improvement compared to the G11 tariff plan, the highest improvement was for overlapping windows, where the two ClipStream variants, with and without concept drift, were able to deliver up to 1.08% and 1.06% of improvement, on average, respectively. The results are consistent with those on the Irish data set. However, this time the improvement is considerably higher, which can directly influence the elasticity of electricity demand. More detailed analyses are presented in Appendix B; please refer to Tables A7-A16.

Other Applications-London Case Study
Another verification of dynamic profiling approach was conducted based on the data from UK Power Networks led Low Carbon London project [57]. The dataset contains energy consumption in kWh (per half hour) for the sample of 5567 London households observed between November 2011 and February 2014. The customers in the trial were recruited as a balanced sample representative of the Greater London population.
For the purpose of the case study 1000 households were randomly extracted covering 1 September 2012-28 February 2014 time frame. The reason to select that time frame was availability of complete data, i.e., without missing values. In total, 25,440 data points were analyzed, each representing half hour readings.
To enable a comparison of the results with the previous applications (case studies), the average improvements were considered, as presented in Table 11. It was observed that the implementation of the current best tariff is feasible and could deliver benefits for both overlapping and non-overlapping windows. Specifically, for non-overlapping windows the general tariff improvement was equal, on average, to 0.93% for the extended TS-Stream without concept drift. Lower improvements, between 0.15% and 0.39%, were observed for the other algorithms. In the case of the tariff improvement compared to the G11 tariff plan, the highest improvement was for overlapping windows, where the histogram-based approach resulted in an improvement of 0.68%, on average. The other methods were able to deliver improvements between 0.49% and 0.65%, which could be considered significant, too. The improvement for non-overlapping windows was slightly lower, i.e., 0.55%, and was similarly observed for the histogram-based clustering approach. The results are consistent with the results on the Irish and Australian data. Table 11. Summary results in terms of the average improvements on non-overlapping and overlapping windows for London data.

More detailed results are presented in Appendix C; please refer to Tables A17-A26.

Conclusions
Data streams clustering is one of the most common ways of analyzing data that is potentially infinite and evolves over time. Although the literature provides some methods for data streams clustering, unfortunately, the majority of them are not appropriate for whole time series data streams clustering. Even though electricity consumers' objectives are usually based on monetary benefits, electricity providers benefit from the knowledge of consumer profiles, allowing them to create individualized measures aimed at consumers with compatible usage profiles and socio-economic behavior. The analysis has shown that there are prominent distinctions between consumers' behaviors, which allows us to distinguish homogeneous groups.
Through the CER Irish data analysis and two other case studies, i.e., the Australian and London data sets, an attempt was made to evaluate different ways of time series data streams clustering through a comparative study of state-of-the-art algorithms, as well as new combinations employing elements from different algorithms. From the technical point of view, the results provide general guidance on when and where to apply a particular clustering algorithm (along with its improvements).
It was revealed that the extension to the way the ARI index (and its statistics) is calculated, based on the upper-triangular matrix comparing blocks with each other, provides a good evaluation framework and also allows the dependencies to be visualized. This part of the research has shown that the best results, in terms of the similarity of the clusters, are provided by the histogram-based clustering algorithm. That is due to the fact that the algorithm always performs a partitioning using the same number of clusters and the underlying procedure is less sensitive to distribution changes than the other two algorithms. Therefore, if electricity providers need stable partitions, this algorithm would be their first choice. Furthermore, to obtain a partition which provides clusters with the least weighted volatility, the extended TS-Stream algorithm should be applied. This is mainly because this algorithm is able to expand or shrink the tree structure very quickly according to the distribution changes of the particular phenomenon. At the other end of the spectrum is the ClipStream algorithm.
As presented in our previous work [45], the standard TS-Stream algorithm outperforms benchmarking clustering methods and, in addition, this research indicates that those results can be further improved. The new Fast Fourier Transform based features improve the operation of the base version of this algorithm. The new data representation slightly deteriorates the performance of the ClipStream algorithm; however, it should be noted that in this case the business interpretation prevails. Moreover, a much smaller dimension is needed to represent a given time series, i.e., only 5 features instead of 8 multiplied by the number of weeks (3 weeks for overlapping and 4 weeks for non-overlapping windows).
In terms of implementation/software requirements, all of the algorithms work in linear time; however, the histogram-based algorithm requires O(m^2) memory space and produces a fixed number of clusters. The ClipStream algorithm requires the minimal and maximal number of clusters to be set in advance (which may sometimes be impracticable or unfounded). The extended TS-Stream algorithm is the most flexible by nature, allowing new descriptive measures, data representations, and a concept drift detection module to be incorporated.
When it comes to the comparison between overlapping and non-overlapping windows, as might be expected, the ARI statistics and weighted volatility for overlapping windows are usually better (for the base version of each algorithm). This is because each time we analyze almost the same time series, differing only by one newly added week.
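The two windowing schemes compared above differ only in their step size, which a short sketch makes concrete (an illustration under the assumption of 4-week windows, as used in this study):

```python
def windows(weeks, width=4, overlapping=True):
    """Yield analysis windows over a sequence of weekly blocks.

    Overlapping windows slide by one week, so consecutive 4-week
    windows share 3 weeks of data; non-overlapping windows jump
    by the full window width.
    """
    step = 1 if overlapping else width
    for start in range(0, len(weeks) - width + 1, step):
        yield weeks[start:start + width]
```

With 8 weeks of data this yields five overlapping windows (each sharing 3 of its 4 weeks with the next) but only two disjoint non-overlapping windows, which explains why partitions of consecutive overlapping windows tend to be more similar.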
Based on the comparative study of state-of-the-art time series data streams clustering algorithms and their modifications, we were able to perform dynamic consumer segmentation and predict an optimal tariff. Finally, a comparative study of overlapping and non-overlapping windows and their impact on the choice of an optimal tariff was undertaken, which revealed that significant improvements can be achieved through tariff changes. Specifically, the average percentage improvements were as follows: Irish data, 0.40-0.43%; Australian data, 0.96-1.08%; and London data, 0.68-0.93%. Assuming that the overall capacity of the system in Poland is approximately 45,000 MW, these improvements may deliver an elasticity of electricity demand between 193.5 MW (0.43%) and 486 MW (1.08%). Such values are considered significant from a market balancing perspective.
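The demand-elasticity figures quoted above follow directly from the assumed system capacity, as a quick check shows:

```python
# Sanity check of the quoted elasticity range, assuming the
# 45,000 MW overall system capacity stated for Poland.
capacity_mw = 45_000
low_mw = capacity_mw * 0.0043    # 0.43% improvement -> 193.5 MW
high_mw = capacity_mw * 0.0108   # 1.08% improvement -> 486.0 MW
```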
The direction for future work will be to develop a fully scalable system (with interpretable results) for a large number of time series in the data stream, in the presence of:
• concept drift of different kinds, such as incremental, recurring, sudden, or gradual;
• an unstable number of sources (some sensors are newly created while others are removed);
• heterogeneous and missing recordings;
• irregularly spaced data; and
• other approaches for classifying incoming continuous data in dynamic systems, e.g., stochastic learning weak estimators.
To this end, we will investigate different incrementally computable time series similarity measures, as well as the influence of the input parameters (i.e., the sensitivity of the algorithms) on the final results.
Author Contributions: K.G. prepared the simulation and analysis and wrote Sections 1-6 of the manuscript; M.B. wrote Section 1, Section 3, and Section 4; T.Z. wrote Section 1, Section 2, and Section 6 of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.