Analysis of the Behavior Pattern of Energy Consumption

: Analyzing energy consumption is currently of great interest to define efficient energy


Introduction
Currently, there is an immense global demand for energy, which is necessary for most tasks of daily life, such as lighting and the use of computer equipment, household appliances, and other electronic devices. These devices are vital in our society. In addition, different types of buildings (residential, commercial, and industrial, among others) are being equipped with intelligent devices, such as cameras, sensors, and different actuators [1]. These devices, together with the communication infrastructure, characterize the Internet of Things (IoT) paradigm [2].

Due to the increase in energy required by this paradigm, consumption in homes has increased between 1,232 and 1,460 kWh per year [3]. It is estimated that the energy consumption derived from this increase in devices will grow much more in the coming years. For example, in Europe, it will go from 4 TWh in 2015 to 104 TWh in 2025 [3]. Due to this increase in energy demand, there is great concern about achieving greater efficiency and optimization of consumption [1], [4], [5]. To do this, among other things, it is necessary to identify the consumption pattern of users and, based on that information, propose strategies and mechanisms to save energy resources as much as possible. In the next two sections, we compare other related works and describe the contributions of this paper.

In particular, knowing how a customer's energy consumption pattern evolves can be useful in different energy management tasks [6]. For example, providers can determine when demand rises or falls in order to adjust their offer, and clients can identify their consumption peaks and, from there, look for an energy optimization mechanism.
Given the aim of this article, which is to develop methods that reveal how the energy consumption pattern of a client evolves, this section describes recent works close to this topic. Since we have not found any work on this specific aspect, we present recent works on related topics, specifically the prediction and optimization of energy consumption.

In general, the current applications of artificial intelligence, and specifically machine learning algorithms, in the field of energy are numerous [6], [7], [8]. However, there are no recent works on the study of behavioral patterns of energy consumption. Most research on energy consumption and its characteristics focuses on reducing consumption and optimizing the use of energy resources using, for example, optimization and predictive models. For example, in [8], Yoon et al. focus on the efficient use of energy and its infrastructure in smart cities using machine learning techniques. The authors build a deep learning network in a smart city to analyze and predict the energy consumption of IoT sensor devices. Wu et al. [10] use ML models to predict the consumption of an intelligent building, with the aim of energy conservation and environmental protection.

Xiao et al. [9] compared different configuration parameters for energy models. They propose two scenarios, one with only data to predict energy efficiency, and another that also considers information on spindle motor aging and tool wear. For both cases, they use Support Vector Regression models [14], Artificial Neural Networks [15], and Gaussian Process Regression [16]. In [17], the authors proposed a predictive model based on the energy consumption of the users, which allows monitoring and estimating the energy consumption.
In this case, they make use of the K-means algorithm and Support Vector Machines.

Other articles that analyze energy consumption, study strategies, and make predictions about energy consumption, among other things, are presented in [11] (based on Support Vector Regression), [12] (based on Artificial Neural Networks), and [13] (based on Random Forests). Also, for the analysis of user trends, there are currently various algorithms and methods, as well as techniques to reduce the complexity of the problem [9], [10], [18], [19], [20]. As this section shows, and to our knowledge, there are no previous works in the literature dedicated to studying the evolution of energy consumption patterns. Most works are dedicated to predicting energy behavior or diagnosing what may happen in an energy infrastructure, but none has focused on defining energy consumption patterns and monitoring their evolution and, from there, proposing strategies for the analysis and explainability of these patterns.

The objective of this work is to identify and analyze the behavior pattern of customers according to their energy consumption profiles. In particular, it is necessary to identify how the pattern of customer behavior changes over time. We propose to use online unsupervised machine learning algorithms to follow and analyze the evolution of energy consumption patterns. For this, we assume that the centroids of the groups obtained by the clustering techniques represent the energy consumption patterns of the group. The main contributions of the work are:
• We propose a framework to analyze the evolution of energy consumption patterns.

• We adjust two clustering techniques to carry out an online clustering process of energy consumption data.

The work is organized as follows. Section 2 presents the unsupervised machine learning techniques used in this work. Section 3 describes the experiments and carries out an analysis of the clusters obtained. Section 4 presents an analysis of the evolution of the patterns and a general comparison on different datasets. Finally, the last section presents the conclusions and future work.

Unsupervised learning algorithms assume that the data is not labeled, and they analyze datasets to identify similarities between the data (similar data make up a cluster) [21]. This paradigm is useful when the categories of the data are not defined, and one of the techniques used in this area is clustering algorithms [22]. The main purpose of a clustering algorithm is to separate the data into smaller subsets, called groups (clusters), such that the data are similar within each cluster but different from the data in the other groups. The centroid of a cluster can be understood as its pattern. In particular, we will use unsupervised online learning to adapt the clusters to changes in user consumption patterns, enabling real-time updates [23]. In this work, we will use the X-means and LAMDA algorithms.

The X-means algorithm is based on K-means. K-means is one of the simplest and most common clustering algorithms; it divides the dataset into K clusters. K-means tries to find the center of each cluster, which is representative of a data region [21]. This point is called the centroid. Thus, K-means is a centroid-based clustering technique [22]. K-means alternates two steps:
• Assignment of points/individuals to the nearest centroid
• Calculation of centroids
These steps are repeated in a loop until the centroids stabilize.

In this work, we will use X-means, an extension of K-means that allows the value of K to vary (it does not have to be predefined at the beginning, as happens with K-means) [24]. Thus, X-means is an incremental sequential K-means that determines the value of K (number of clusters) based on a function f(K), which is defined by the following Equation [25]:

$$
f(K) =
\begin{cases}
1 & \text{if } K = 1 \\
\dfrac{S_K}{\alpha_K S_{K-1}} & \text{if } S_{K-1} \neq 0,\ K > 1 \\
1 & \text{if } S_{K-1} = 0,\ K > 1
\end{cases}
\qquad
\alpha_K =
\begin{cases}
1 - \dfrac{3}{4 N_d} & \text{if } K = 2,\ N_d > 1 \\
\alpha_{K-1} + \dfrac{1 - \alpha_{K-1}}{6} & \text{if } K > 2,\ N_d > 1
\end{cases}
$$

where $S_K$ is the sum of the cluster distortions when the number of clusters is K (see below), and $N_d$ is the number of attributes in the dataset. The term $\alpha_K S_{K-1}$ in the Equation above is an estimate of $S_K$ based on $S_{K-1}$, made under the assumption that the data have a uniform distribution. The value of f(K) is the ratio of the actual distortion to the estimated distortion, and it is close to 1 when the data distribution is uniform. When there are areas of concentration in the data distribution, $S_K$ will be less than the estimated value, so f(K) decreases. The smaller f(K), the more concentrated the data distribution. Therefore, values of K that produce a small value of f(K) can be considered to provide well-defined groups.
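The selection function f(K) can be illustrated with a minimal sketch. It assumes the distortion sums S_K have already been computed for K = 1, 2, ... (for instance, from successive K-means runs) and that the dataset has more than one attribute. The function names and the sample values are hypothetical and are not taken from the implementation used in this work.

```python
def alpha_weights(max_k, n_attributes):
    """Return {K: alpha_K} for K = 2..max_k (assumes n_attributes > 1)."""
    alphas = {}
    for k in range(2, max_k + 1):
        if k == 2:
            alphas[k] = 1.0 - 3.0 / (4.0 * n_attributes)
        else:
            alphas[k] = alphas[k - 1] + (1.0 - alphas[k - 1]) / 6.0
    return alphas


def f_values(distortion_sums, n_attributes):
    """Compute f(K) for K = 1..len(distortion_sums).

    distortion_sums[K - 1] holds S_K, the summed distortion for K clusters.
    """
    alphas = alpha_weights(len(distortion_sums), n_attributes)
    result = {1: 1.0}
    for k in range(2, len(distortion_sums) + 1):
        s_prev = distortion_sums[k - 2]  # S_{K-1}
        s_k = distortion_sums[k - 1]     # S_K
        result[k] = 1.0 if s_prev == 0 else s_k / (alphas[k] * s_prev)
    return result


def best_k(distortion_sums, n_attributes):
    """Pick the K with the smallest f(K)."""
    f = f_values(distortion_sums, n_attributes)
    return min(f, key=f.get)


# Hypothetical distortion sums for K = 1..5 on a two-attribute dataset.
example_sums = [100.0, 40.0, 10.0, 9.0, 8.5]
chosen_k = best_k(example_sums, n_attributes=2)
```

In this illustrative run the distortion drops sharply up to K = 3 and then flattens, so f(K) reaches its minimum at K = 3.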
On the other hand, the distortion of a cluster measures the distance between the objects/individuals of a cluster and its centroid, according to the following Equation [25]:

$$ I_j = \sum_{t=1}^{N_j} \left[ d(x_{jt}, w_j) \right]^2 $$

where $I_j$ is the distortion of cluster j, $w_j$ is the centroid of cluster j, $N_j$ is the number of objects belonging to cluster j, $x_{jt}$ is the object t belonging to cluster j, and $d(x_{jt}, w_j)$ is the distance between the object $x_{jt}$ and the centroid $w_j$ of cluster j. Each cluster is represented by its distortion, and the overall impact of all clusters on the entire data set is evaluated by the sum of all distortions, $S_K$, given by the following Equation [25]:

$$ S_K = \sum_{j=1}^{K} I_j $$

where K is the number of clusters. The number of clusters K is assumed to be much smaller than the number of objects N. In particular, if f(K) shows special behavior for some K, such as a minimum point, that value of K should be taken as the desired number of clusters. Thus, X-means converges when it obtains a minimum value of f(K).

In this way, X-means determines whether new centroids should appear within a current model ($M_j$). New centroids appear by splitting in two those clusters that have been classified as optimizable according to the Schwarz criterion (a criterion for selecting a model among a finite set of models), based on the BIC value, defined by the following Equation [24]:

$$ BIC(M_j) = \hat{l}_j(D) - \frac{p_j}{2} \log R $$

where $\hat{l}_j(D)$ is the log-likelihood of the data in the model $M_j$; $p_j$ is the number of free parameters in the model $M_j$; and R represents the number of samples in D (R = |D|).

In essence, X-means starts with a given K, goes on to add centroids (changing K) according to the value of f(K), and calculates the BIC score for each cluster to determine which cluster, if any, to split. When X-means converges (determines the ideal value of K for that data set), the final clustering is obtained.
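As a small illustration of the quantities above, the following sketch computes the distortion of each cluster, the total distortion over all clusters, and a BIC score. It assumes the squared Euclidean distance and that the log-likelihood of a model is supplied by the caller; the function names and the sample data are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_distortion(points, centroid):
    """Distortion of one cluster: summed squared distances to its centroid."""
    return sum(squared_distance(p, centroid) for p in points)


def total_distortion(clusters):
    """Sum of distortions; clusters is a list of (points, centroid) pairs."""
    return sum(cluster_distortion(points, c) for points, c in clusters)


def bic(log_likelihood, n_free_params, n_samples):
    """Schwarz criterion: log-likelihood penalized by model complexity."""
    return log_likelihood - (n_free_params / 2.0) * math.log(n_samples)


# Hypothetical two-cluster partition of three 2-D points.
example_clusters = [
    ([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0)),
    ([(5.0, 5.0)], (5.0, 5.0)),
]
example_total = total_distortion(example_clusters)
```

With this scoring, a split is worth keeping only when the gain in log-likelihood outweighs the penalty for the extra free parameters.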
LAMDA (Learning Algorithm for Multivariate Data Analysis)
LAMDA is a non-iterative fuzzy algorithm based on the degree of adequacy of an individual (data) to a group. It provides great versatility, since the number of clusters does not have to be specified for the execution and, furthermore, it can work online [26], [27]. LAMDA evaluates the similarity between the descriptors of an element X of the form $X = \{x_1, x_2, \ldots, x_j, \ldots, x_m\}$, which is its vector with m descriptors, and the descriptors of the centroids of the existing clusters, to define in which cluster X should be placed. In addition, once X has been assigned to a cluster, it becomes $X = \{x_1, x_2, \ldots, x_j, \ldots, x_m, l_i\}$, i = 1, 2, …, k, where $l_i$ is the label associated with X [26]. The base definitions of LAMDA are summarized below [26], [28].

Normalization. Each descriptor of X must be normalized, based on its maximum and minimum values:
$$ \bar{x}_j = \frac{x_j - x_{\min,j}}{x_{\max,j} - x_{\min,j}} $$

where $\bar{x}_j$ is the normalized value of descriptor j, $x_{\min,j}$ is the minimum value of descriptor j, and $x_{\max,j}$ is the maximum value of descriptor j. The element resulting from normalization, $\bar{X}$, will be used to compute the degree of adequacy of the element to each existing cluster.

Degree of Marginal Adequacy (MAD). Determines the degree of similarity of a descriptor with respect to the corresponding descriptor of a given class. For the calculation of the MAD, density functions are used; the most common is the fuzzy binomial function:

$$ MAD_{k,j}(\bar{x}_j \mid \rho_{k,j}) = \rho_{k,j}^{\,\bar{x}_j} \, (1 - \rho_{k,j})^{\,1 - \bar{x}_j} $$

where $\rho_{k,j}$ is the mean value of descriptor j in the cluster k, calculated by:

$$ \rho_{k,j} = \frac{1}{n_k} \sum_{t=1}^{n_k} \bar{x}_j(t) $$

with $n_k$ the number of elements in cluster k. $\rho_{k,j}$ is progressively updated each time a new element is added to the cluster. The function $MAD_{k,j}(\bar{x}_j \mid \rho_{k,j})$ is the density function of the binomial distribution, which can be interpreted as the probability that the analyzed normalized descriptor $\bar{x}_j$ belongs to cluster k, given its mean $\rho_{k,j}$.

Degree of Global Adequacy (GAD). Determines the degree of adequacy of a sample to each existing cluster. It is calculated by combining the MADs with aggregation functions. These functions are interpolations between the t-norm (T) and the t-conorm (S), like the Dombi operator [29]:

$$ T(MAD_{k,1}, \ldots, MAD_{k,m}) = \frac{1}{1 + \left[ \sum_{j=1}^{m} \left( \frac{1 - MAD_{k,j}}{MAD_{k,j}} \right)^{p} \right]^{1/p}}, \qquad S(MAD_{k,1}, \ldots, MAD_{k,m}) = 1 - \frac{1}{1 + \left[ \sum_{j=1}^{m} \left( \frac{MAD_{k,j}}{1 - MAD_{k,j}} \right)^{p} \right]^{1/p}} $$

In most cases, p = 1 is used to obtain an approximation close to a linear behavior of the t-norm and the t-conorm [29].

There is also a requirement parameter 0 ≤ α ≤ 1, used to calibrate the fuzzy partitioning of the data [30]. If α = 1, the GAD is calculated as the t-norm, obtaining a stricter clustering. If α = 0, the GAD is computed as the t-conorm, leading to a more permissive grouping. Thus, α produces a linear interpolation between the t-norm and the t-conorm to calculate the GAD [31]:

$$ GAD_k(\bar{X}) = \alpha \, T(MAD_{k,1}, \ldots, MAD_{k,m}) + (1 - \alpha) \, S(MAD_{k,1}, \ldots, MAD_{k,m}) $$

On the other hand, when an individual (data) does not belong to any class, a non-informative class (NIC) is created, which will be a new cluster.
The GAD of the data entering the NIC is computed considering that $\rho_{NIC,j} = 0.5$, independent of the value of $\bar{x}_j$:

$$ MAD_{NIC,j}(\bar{x}_j \mid \rho_{NIC,j}) = 0.5^{\,\bar{x}_j} \, (1 - 0.5)^{\,1 - \bar{x}_j} = 0.5 $$

The element that enters the NIC becomes the first element of the new cluster.

Finally, the assignment of elements to a cluster is done by calculating the maximum GAD over all classes. The index corresponds to the number of the class to which the element will be assigned:

$$ index = \arg\max \left( GAD_1, GAD_2, \ldots, GAD_k, GAD_{NIC} \right) $$
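The LAMDA definitions above can be put together in a compact sketch: normalization, the fuzzy binomial MAD, the Dombi-based GAD, the comparison against the NIC, and the incremental update of the cluster means. This is only an illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, adequacy values are clamped by a small `EPS` to avoid divisions by zero, and ties are resolved in favor of the NIC.

```python
EPS = 1e-9  # assumption: keeps values strictly inside (0, 1)


def clamp(v):
    return min(max(v, EPS), 1.0 - EPS)


def normalize(x, mins, maxs):
    """Min-max normalization of every descriptor of x."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(x, mins, maxs)]


def mad(x_norm, rho):
    """Fuzzy binomial marginal adequacy of one normalized descriptor."""
    rho = clamp(rho)
    return (rho ** x_norm) * ((1.0 - rho) ** (1.0 - x_norm))


def dombi_t_norm(values, p=1.0):
    total = sum(((1.0 - clamp(v)) / clamp(v)) ** p for v in values)
    return 1.0 / (1.0 + total ** (1.0 / p))


def dombi_t_conorm(values, p=1.0):
    total = sum((clamp(v) / (1.0 - clamp(v))) ** p for v in values)
    return 1.0 - 1.0 / (1.0 + total ** (1.0 / p))


def gad(mads, alpha=0.5, p=1.0):
    """Global adequacy: linear mix of the t-norm and the t-conorm."""
    return alpha * dombi_t_norm(mads, p) + (1.0 - alpha) * dombi_t_conorm(mads, p)


def classify(x_norm, cluster_means, alpha=0.5, p=1.0):
    """Return the index of the best cluster, or None when the NIC wins."""
    best_index = None
    best_gad = gad([0.5] * len(x_norm), alpha, p)  # GAD of the NIC
    for k, rhos in enumerate(cluster_means):
        g = gad([mad(x, r) for x, r in zip(x_norm, rhos)], alpha, p)
        if g > best_gad:
            best_index, best_gad = k, g
    return best_index


def update_mean(rho, count, x_norm):
    """Running mean of each descriptor after one element joins the cluster."""
    new_count = count + 1
    new_rho = [r + (x - r) / new_count for r, x in zip(rho, x_norm)]
    return new_rho, new_count
```

When `classify` returns `None`, the element would seed a new cluster whose means are its own normalized descriptors; otherwise `update_mean` adjusts the winning cluster, which is what allows the centroids to drift over time.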

In this section, we explain how we instantiate and execute the two techniques presented in the previous section.

For this experiment, we have used a real dataset from [32]. The first task is to divide the dataset into several files by time period. In our case, the data were divided by months or quarters. From the original data, more data has been generated using the distribution of each variable in the dataset, in order to increase the amount of data for our execution.

This first dataset corresponds to data taken from a commercial building in 2018. The building had a maximum hourly consumption of 48 W/m², and the annual consumption was 183.2 kWh/m² [32]. Each variable in the dataset was recorded every half hour throughout the year, breaking down the consumption in kW as follows: total consumption, light, heat pump, air treatment units, circulation pumps, heating and hot water, cooling, air coolers, and elevators.
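The division of the half-hourly readings into monthly or quarterly blocks can be sketched as follows. The record layout, a `(timestamp, values)` pair per reading, and the function name are assumptions made for illustration; the actual file format of the dataset in [32] is not described here.

```python
from collections import defaultdict
from datetime import datetime


def split_by_period(records, period="month"):
    """Group (timestamp, values) records by calendar month or quarter."""
    groups = defaultdict(list)
    for timestamp, values in records:
        if period == "month":
            key = (timestamp.year, timestamp.month)
        elif period == "quarter":
            key = (timestamp.year, (timestamp.month - 1) // 3 + 1)
        else:
            raise ValueError("period must be 'month' or 'quarter'")
        groups[key].append(values)
    # Sort the keys so the blocks can be fed to the clustering in time order.
    return dict(sorted(groups.items()))


# Hypothetical readings: (timestamp, [total kW, light kW]).
example_records = [
    (datetime(2018, 1, 1, 0, 0), [310.0, 3.1]),
    (datetime(2018, 1, 1, 0, 30), [305.0, 3.0]),
    (datetime(2018, 4, 2, 12, 0), [420.0, 4.2]),
]
monthly_blocks = split_by_period(example_records, "month")
quarterly_blocks = split_by_period(example_records, "quarter")
```

Each block would then be passed, in chronological order, to one iteration of the online clustering algorithm.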

To evaluate the quality of the online clustering algorithms, we have used two metrics: one suited to distance-based algorithms like X-means (the Silhouette coefficient [33]), and another suited to density-based algorithms (the Davies-Bouldin index [34]).

Silhouette
The Silhouette coefficient is a measure of the cohesion of the clusters. It determines the degree of similarity between the objects of the same cluster [33]. To obtain this measurement, the average of the proximities between its elements is calculated. This metric is therefore effective in situations where the clusters have a circular shape [23], [33] or are grouped around a point. The silhouette coefficient of a dataset is the mean of the silhouette coefficients of its samples, each calculated as [33]:

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \qquad a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\, j \neq i} d(i, j), \qquad b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j) $$

where a(i) and b(i) are computed for each sample i of the cluster $C_I$, and d(i, j) is the distance between the points i and j.

The coefficient takes values between -1 and 1. Values close to 1 are the best, values close to 0 indicate overlapping clusters, and negative values generally indicate that samples have been erroneously assigned to clusters. As a general rule, the higher the silhouette coefficient, the better defined the clusters will be [23].
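The Silhouette computation can be illustrated with a minimal sketch written directly from the definitions of a(i) and b(i). It assumes the Euclidean distance and assigns a score of 0 to samples in singleton clusters, which is a common convention rather than something stated in this work; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a(i): mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, so it adds nothing).
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of another cluster.
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical example: two compact, well-separated groups.
example_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(example_points, [0, 0, 1, 1])
bad_score = silhouette_score(example_points, [0, 1, 0, 1])
```

The first labeling groups nearby points together and scores close to 1, while the second mixes the two groups and yields a negative value.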

Davies-Bouldin
The Davies-Bouldin index is defined as the mean similarity of each cluster with its most similar cluster. This measure compares the distance between clusters with the size of the clusters themselves [34]. The measure can be used to infer the adequacy of a data partition. The Davies-Bouldin index is calculated as [34]:

$$ DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij} $$

where $R_{ij}$ is the similarity between the clusters i and j, and k is the number of clusters. There are different ways to calculate $R_{ij}$; one of them is $R_{ij} = (s_i + s_j) / d(i, j)$, where $s_i$ is the average distance between each point of cluster i and the center of cluster i, and d(i, j) is the distance between the centroids of the clusters i and j.

The minimum value that can be obtained using this index is 0, which is the case when there are as many clusters as there are individuals. Therefore, the best values of this metric are those closest to 0, since they indicate a better partition and a model with better separation between clusters [23], [34].
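A minimal sketch of the Davies-Bouldin computation follows, using the similarity R_ij = (s_i + s_j) / d(i, j) described above. It assumes the Euclidean distance and distinct cluster centroids (otherwise d(i, j) would be zero); the function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(points, labels):
    """Mean, over clusters, of the similarity to the most similar cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # s_i: average distance from the members of cluster i to its center.
    scatter = {
        lab: sum(euclidean(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical example: two compact, well-separated groups.
example_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_index = davies_bouldin(example_points, [0, 0, 1, 1])
bad_index = davies_bouldin(example_points, [0, 1, 0, 1])
```

Unlike the Silhouette coefficient, lower values are better here, so the well-separated labeling obtains the smaller index.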

Next, we describe how the clustering models are obtained with each algorithm, using a time period (iteration) of one month.

X-means
In the first iteration, K is initialized to 3 (number of initial clusters), a value that X-means then optimizes in that first iteration (month). In the following iterations (months), the algorithm readjusts the value of K. In the specific case of the dataset used, X-means determines that 20 clusters are necessary in its second iteration. This number of clusters is maintained throughout the 12 months, since X-means determines that it is the ideal value of K in each iteration (month).

We start by evaluating the centroids of the 20 clusters from January to December in Figures 1 and 2. The centroids are normalized between 0 and 1 in order to plot them (it is the X axis of Figures 1-5), and we then analyze the energy consumption they represent. Looking at both Figures, it can be seen that there are 10 clusters with centroids in a range of approximately 0.06 to 0.14 (equivalent to a range between 200 kW and 400 kW of the total kW). We also see 2 clusters in the upper range of Figure 2, which stand out for being separated from those in the middle zone. Clusters 19 and 20 are isolated from the rest throughout the run, slightly converging and stabilizing at the end of the run. In particular, Figure 2 shows that clusters 19 and 20 behave erratically in the summer months; perhaps with a greater number of clusters this behavior would be softened.

In this particular case, clusters 19 and 20 represent a high consumption: in one case the consumption is higher due to the circulation pumps, and in the other case due to the heat pump and heating and hot water.
Finally, cluster 5 represents the pattern with the lowest consumption (around 250 kW), which is generated by several variables (lights, refrigeration, and elevators).

Limiting the upper range of clusters to 15 (the value at which X-means has the best performance), we obtain a more detailed view of the clusters in Figure 3. We find 10 clusters that never exceed 0.15 (500 kW of the total), regardless of the time of the year. In addition, Figure 3 shows that the variations in the last quarter are minimal. It can be deduced from this that a suitable and stable grouping has been reached, with well-defined groups. Some clusters represent a consumption of more than 600 kW, such as clusters 14 and 15. According to their centroids, in one case it is due to heat and circulation pumps, and in the other to air treatment units, cooling, and air coolers.

For the execution of LAMDA, we have used an implementation of this algorithm that follows what is indicated in [28]. In the same way as in the execution of X-means, the data is evaluated month by month. In the first iteration, the algorithm starts with a single empty cluster, and new clusters are created each time an element enters the NIC. Recall that the values that enter the NIC are those that could not be placed in existing clusters. All values are normalized before starting their evaluation. In particular, the centroids are normalized between 0 and 1. LAMDA eliminates, merges and creates clusters depending on the GAD and the defined neighborhood threshold.

Figure 4 shows that at the beginning of the execution, in January, 16 clusters are created, although 3 of them (10, 11 and 13) are merged after the first month. The remaining 13 clusters are maintained throughout the rest of the run.
All the clusters arrive at a different point, except for the sets {7, 8} and {4, 5}, whose centroids end up being quite similar, although their trajectories over the months are very different. In this case, the values of the centroids of clusters 10, 11 and 13 differ mainly in the variables light, air coolers, and elevators. Similarly, the difference between clusters 7 and 8 is mainly in the values of heat and circulation pumps, and in the case of clusters 4 and 5, in air treatment units, cooling, and air coolers.

In Figure 5 we see the rest of the clusters created throughout the execution. Starting in February, new groups are created, and it can be seen that there are stable clusters and others that vary over time. For example, cluster 33 completely changes its trend, going from a range of 0.2-0.25 in July to 0.09 in October, establishing itself as the only cluster at that low value. Here we can also see the last cluster created, number 40, which appears in August. In particular, cluster 33 represents a decrease in energy consumption to less than 400 kW, derived mainly from the values of heat pump, heating and hot water, and cooling.

Based on the results, LAMDA consistently performs better on both metrics. The Silhouette coefficient is an excellent metric for data with circular spatial behavior, while Davies-Bouldin is better in other cases. According to the results obtained, the spatial distribution of the data appears to be circular, so Silhouette would be the best metric to compare the algorithms. Furthermore, X-means does not change the clusters, while LAMDA adjusts the number of clusters over time. Thus, the advantage of LAMDA is that it automatically checks the need to merge and create new clusters. We study this evolution in the next section.
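One simple way to support this kind of reading of the centroid trajectories is to flag the months in which a centroid moves by more than a chosen amount. The sketch below is only an illustration: the function name, the threshold, and the intermediate values of the sample trajectory are hypothetical, and this is not a step of the procedure used in this work.

```python
def centroid_shifts(trajectory, threshold):
    """Return (period_from, period_to, delta) for changes above threshold.

    trajectory is a time-ordered list of (period, normalized centroid value).
    """
    shifts = []
    for (p0, v0), (p1, v1) in zip(trajectory, trajectory[1:]):
        delta = v1 - v0
        if abs(delta) > threshold:
            shifts.append((p0, p1, delta))
    return shifts


# Illustrative trajectory of one cluster centroid (values are hypothetical).
example_trajectory = [("Jul", 0.22), ("Aug", 0.21), ("Sep", 0.15), ("Oct", 0.09)]
example_shifts = centroid_shifts(example_trajectory, threshold=0.05)
```

A small threshold highlights gradual drifts, while a larger one keeps only abrupt changes of trend.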
In this section, we analyze the evolution of the LAMDA clusters by month and by quarter. The first subsection studies in detail how LAMDA creates and merges clusters over time, and the second extends the periods to quarters, to evaluate the capacity of LAMDA over larger periods. At the end, we also discuss the size of the clusters.

Comparing the evolution of both algorithms, at the end of the execution there are 20 and 26 clusters for X-means and LAMDA, respectively. In this section, we analyze the evolution of the LAMDA clusters, since LAMDA presents the best results and has a more dynamic behavior, creating and merging clusters throughout the execution.

We start by analyzing the creation and merging of clusters shown in Table 2. Recall that the online clustering process is cumulative, that is, the behavior of the previous month is taken into account. Table 2 shows the reference month, the identifier of the clusters formed, the total number of clusters formed so far, and comments that mention whether some clusters are merged, as well as the number of clusters added in the month.

Initially, 16 clusters are formed, of which clusters 10, 11 and 13 merge with other clusters, leaving a total of 13 in the first month. For the second month, it is observed that apart from the 13 clusters created the previous month, initially

We then analyze the evolution of the clusters by quarter. Figure 6 shows that the general tendency is to remain stable and follow a predictable trend. The especially erratic behavior that appeared in clusters 19 and 20 in Figure 2 is no longer visible. These clusters have few individuals compared to the rest of the groups, which makes them more sensitive to small changes or new inclusions in the cluster.
These are clusters that represent patterns with high consumption (more than 700 kW). Cluster 39 has a behavior pattern similar to that of cluster 16, so, if they maintain this trend over time, they may eventually merge, since they differ in the values of light and elevators. In the same way, we can study the behavior of three clusters that approach each other in December: clusters 26, 27 and 40, which are grouped below the 0.22 value. However, this case differs from the previous one, since they only converge at the end of the analysis, as we can see in Figure 10 (the difference is due to the values of air treatment units and air coolers). In this case, we should keep track of their evolution, since the three arrive with different trajectories.

can be seen that their trajectory (once they have a high number of elements) is more stable, and only small corrections are made to their centroids as individuals are added to the groups. We see then that the majority of individuals are in these 3 groups. In particular, in December their centroids are 0.170 for cluster 15, 0.092 for cluster 33, and 0.119 for cluster 34 (see Figure 6). Cluster 15 represents a pattern with medium consumption (more than 500 kW), due mainly to heat and circulation pumps, and heating and hot water. Similarly, cluster 34 represents a pattern with medium consumption (less than 400 kW), due mainly to cooling, air coolers and air treatment units. Finally, cluster 33 is a pattern of low consumption (less than 300 kW).

In Figure 7, we see how most of the elements/individuals have been assigned to these 3 clusters. Together, they account for almost 90% of the data, and the rest of the data is found in the remaining 23 clusters.
With this, we can see that most of the elements are in the medium and low ranges of consumption; the centroid with the highest value of this trio of clusters is that of cluster 15, with 0.17 (less than 600 kW).

Let us analyze cluster 33, which has the most individuals. It is a cluster whose trend (evolution) reflects a fall in the last quarter of the year, which may be due to the fact that the use of these office/laboratory spaces is reduced at that time of the year. The centroid variables have the following values: an average consumption of 3.5 kW for light, 24.2 kW for the heat pump, 2.8 kW for air treatment units, and 0.42 kW for circulation pumps. They reflect a space with a moderate consumption of energy, mainly derived from the consumption of the heat pump. As this is the device with the highest consumption, its use in heating and cooling tasks could be analyzed to optimize it.

As can be seen, the detailed analysis, both at the temporal level and at the level of the values of a pattern, allows two things: i) determining the energy behavior over time to establish temporary improvement measures (for example, searching for less expensive energy sources in the months of greatest consumption); and ii) determining the devices that consume the most, and the reason, in order to establish strategies that optimize them.

To show the feasibility of the energy consumption evolution analysis process based on our online clustering algorithms, several energy consumption datasets are used in this section. Table 3 shows in the first column where the datasets were drawn from, and in the following columns the quality of the techniques for each of the performance measures analyzed in the work. This allows determining whether the clusters obtained in each case are of high quality.

According to the results, we see that LAMDA is a very robust method.
In particular, it obtains the best result on the different datasets. It is a very robust algorithm regardless of the energy consumption dataset (time series type). In addition, the previous results (see section 5) show the ability of LAMDA to create or merge clusters over time to adapt to the context.

In this work, we have applied online clustering algorithms to analyze the evolutionary behavior of energy consumption patterns, understood as the centroids of the groups they propose. By using X-means and LAMDA, we are able to delegate the decision about the number of clusters to the algorithms. This was particularly evident in LAMDA, since it was able to increase and/or decrease the number of groups. In X-means, we could not see this behavior, since from the first iteration it created the maximum number of clusters. On the other hand, an analysis without a cluster limit is more appropriate in a real scenario (for example, in the case of X-means, the values of K were bounded in one case), regardless of the time it takes. In addition, with X-means the abnormal behavior of some clusters was observed (affected by outliers).

The analysis of the centroids with LAMDA has made clear the great difference in consumption between users. In addition, consumption trends can be studied according to the evolution of the centroids. In short, the analysis of the evolution of the centroids of the groups allows making more precise decisions in the energy world (months of higher consumption, abnormal behavior, etc.). A relevant result is that the variables that generate more energy consumption can be analyzed, particularly the evolution of this consumption through the months (for example, cluster 33 in Figure 6). In general, in the patterns that represent high energy consumption, the variables responsible for this high consumption are clearly identified. In some cases, these variables were heat and circulation pumps, and heating and hot water, and in others they were air treatment units, cooling, and air coolers. These combinations of variables are closely linked to high consumption.
Likewise, there is a relationship between these variables and the time of year, due to the environmental impact of the time of year on these variables. On the other hand, the centroids also show that the variables with very little impact on energy consumption are light and elevators.

Thus, we have shown in this work the feasibility of using online unsupervised learning approaches to monitor energy consumption patterns. In addition, with our approach, it is possible to analyze and explain in detail the evolution of energy consumption using the cluster centroids, with which it is possible to study their behavior over time and determine the specific energy behavior of the devices. With both, optimization strategies can be defined, both at a global level (according to the customer's consumption trend) and at a specific level (in the devices).

In general, the pattern of energy consumption behavior of a customer/user can be used by both suppliers and consumers. Consumers can know their energy consumption and, based on this, optimize and manage it, among other things, while suppliers can adapt their offer to the needs of users. One of the limitations of this work is that it has been carried out with datasets, whereas in a real context a robust platform will be required that captures in real time the different energy consumption values of the different devices to be monitored. Another limitation is the dependence of the clustering process on the quality of the data, which may affect the quality of the results when there are many atypical values or missing data, among other aspects.
Some aspects to take into account for future work are: i) having data on energy consumption (applied in our case) together with the user profile, to favor a more complete and specific analysis (for example, profiling the energy behavior of an individual); and ii) building the analysis of the evolution of the clusters automatically (giving more explainability to the centroids that are obtained), to help decision-makers. In addition, future work should define hybrid models that combine online clustering algorithms with techniques that allow predicting some of the energy variables. Also, this work will be extended to analyze these patterns using explainability techniques, to establish an interpretability of the patterns from the behavior of the attributes that make up the centroids. Finally, future works will use these results in an intelligent energy management system, in order to personalize its behavior as a function of the consumer's energy pattern.