Unsupervised Mixture Models on the Edge for Smart Energy Consumption Segmentation with Feature Saliency

Smart meter datasets have recently transitioned from monthly intervals to one-second granularity, yielding invaluable insights for diverse metering functions. Clustering analysis, a fundamental data mining technique, is extensively applied to discern unique energy consumption patterns. However, the advent of high-resolution smart meter data brings forth formidable challenges, including non-Gaussian data distributions, unknown cluster counts, and varying feature importance within high-dimensional spaces. This article introduces an innovative learning framework integrating the expectation-maximization algorithm with the minimum message length criterion. This unified approach enables concurrent feature and model selection, finely tuned for the proposed bounded asymmetric generalized Gaussian mixture model with feature saliency. Our experiments aim to replicate an efficient smart meter data analysis scenario by incorporating three distinct feature extraction methods. We rigorously validate the clustering efficacy of our proposed algorithm against several state-of-the-art approaches, employing diverse performance metrics across synthetic and real smart meter datasets. The clusters that we identify effectively highlight variations in residential energy consumption, furnishing utility companies with actionable insights for targeted demand reduction efforts. Moreover, we demonstrate our method’s robustness and real-world applicability by harnessing Concordia’s High-Performance Computing infrastructure. This facilitates efficient energy pattern characterization, particularly within smart meter environments involving edge cloud computing. Finally, we emphasize that our proposed mixture model outperforms three other models in this paper’s comparative study. We achieve superior performance compared to the non-bounded variant of the proposed mixture model by an average percentage improvement of 7.828%.


Introduction
The predictive power of machine learning holds the key to deciphering intricate patterns and driving efficient solutions for a sustainable future, particularly in the realm of smart meter data modelling and utility program improvement. This predictive capability has already played a crucial role in ensuring global food supplies, a once seemingly insurmountable challenge. Scientists have been instrumental in harnessing this power for the benefit of society. As we embark on this new era, machine learning is poised to further our understanding of the world, optimize resource utilization, and reduce our environmental impact, ultimately promoting prosperity and sustainability. In this pursuit, our focus is on the intricacies of smart meter data modelling and its application in enhancing utility programs, such as energy efficiency and demand response. The implementation of Advanced Metering Infrastructure (AMI) across Europe stands as a notable catalyst behind the surpassing of energy efficiency targets outlined in the EU's 20-20-20 energy policy. Building on the triumphs in Europe, smart meter deployments have transcended borders, becoming a global phenomenon in nations striving to modernize their electricity grids. Consequently, these groundbreaking advancements in energy metering technologies have birthed a trove of high-quality, consistently sampled electrical power consumption datasets. This surge in data dimensions underscores the compelling necessity for meticulous feature selection within the domain of machine learning models. This ensures the prioritization of the most enlightening attributes while simultaneously mitigating noise and curtailing computational expenditures. Within the machine learning context, "features" denote the distinct measurable properties or intrinsic characteristics of data that serve as the essential input for predictive models. In the scope of this paper, when we allude to "features", we specifically refer to the statistical metrics derived from time-series data or the readings gleaned from smart meters pertaining to a particular energy consumer. Our work delves into the challenges and potential of this domain, introducing methodologies that not only improve the predictive accuracy but also enhance transparency and interpretability. Our goal is to ensure that every stakeholder, from scientists to policymakers, can fully utilize the potential of these advancements for a brighter and more sustainable future.
The challenge of smart meter data modelling using clustering techniques is pivotal in advancing utility programs geared towards achieving energy sustainability and fostering a better future. In this context, the integrated IoT architecture for smart metering proposed by the research in [1] provides valuable insights into the technological foundations of modern smart metering systems. Effectively harnessing high-frequency smart meter data to understand consumer energy consumption behaviour presents a significant opportunity. Research papers, exemplified by [2], have delved into the segmentation of household energy consumption using hourly data, enabling the identification of intricate consumption patterns. Likewise, the work in [3] revolves around the analysis and clustering of residential customers' energy behavioural demand using smart meter data, facilitating the recognition of distinct consumption behaviours [3-7]. These modelling solutions offer substantial benefits by providing utility programs with tailored insights. They empower utilities to develop strategies for energy efficiency and demand response that are intricately aligned with consumer behaviour. Ultimately, this not only enhances energy sustainability but also contributes to the creation of a more environmentally responsible and prosperous future.
Moreover, richer and more granular data may lead to more complex and diverse consumption patterns, necessitating the use of flexible distributions in statistical models to capture the nuances in class data distributions effectively [8-10]. Demand response (DR) is an incentive program that allows utility companies to save money on unnecessary investments and lower emissions of greenhouse gases (GHG) [8-10]. DR induces households to reduce their energy consumption when wholesale market prices are high or when system reliability is jeopardized. Energy efficiency (EE) programs aim to reduce the power demand of households while maintaining their consumption habits [8,11-13]. Exploratory machine learning tools, such as unsupervised learning techniques, turn smart meter readings into valuable information that supports customer clustering [8]. Clustering is a statistical data analysis technique that can uncover or infer intrinsic properties and cluster the data into several components according to the observations' similarities [8]. As a soft clustering approach, the Gaussian mixture model is a good candidate for modelling smart meter data owing to its reliability and low computational cost [8,14-16]. However, the Gaussian distribution does not fit data well within a mixture model if the data have an asymmetric distribution, as demonstrated in Figure 1. The estimation of data-bounded support regions using Gaussian mixture models has been a notable avenue of research, with advancements in vector quantization techniques [17-21]. The deployment of AMI has introduced high dimensionality in modern energy consumption datasets [4]. Patterns are more easily distinguished when observations are represented with high-entropy features. Feature selection has several advantages: it is well established to improve the performance of model-based classification [22], and it helps to develop interpretable models of reduced complexity in applications across several disciplines [23]. The search for the optimal number of clusters and the search for the optimal set of features are interrelated optimization problems [23].
However, searching for the optimal set of features is challenging in an unsupervised setting because there is no clear criterion for the optimization process, since the number of clusters is unknown [23]. Historically, the optimal feature subset was found through an exhaustive search over the space of all feature subsets [24-26]. Additionally, non-exhaustive search techniques do not guarantee finding the optimal feature subset. Therefore, an efficient solution was proposed within an unsupervised setting [23]; the optimal feature subset search is converted into an estimation problem parallel to the learning of mixture models, where a vector of feature weights is estimated using the expectation-maximization (EM) algorithm [23]. In our experimental analysis, our proposed method outperforms the asymmetric generalized Gaussian mixture model-based feature selection (FSAGGMM), the bounded asymmetric generalized Gaussian mixture model (BAGGMM), and the asymmetric generalized Gaussian mixture model (AGGMM) according to several performance evaluation metrics. Additionally, our proposed mixture model has been implemented using Concordia University's High-Performance Computing (HPC) Facility: Speed [27].
The current energy consumer segmentation approach distinguishes itself from previous works by effectively modelling different representations of smart meter data, taking into account the class data bounds, inferring the true number of consumer clusters, and finding the optimal set of features in a single optimization process. The rest of the paper is organized as follows. Section 2 reviews the prior works within the context of this paper. In Section 3, we describe the proposed feature selection model based on the bounded asymmetric generalized Gaussian mixture model (FSBAGGMM). Section 4 explains how the mixture model's parameters are estimated and how the MML objective function is derived for our specific case. Section 5 presents the experimental results in the context of household energy consumption segmentation by comparing the performance of our proposed algorithm against several state-of-the-art clustering algorithms. Finally, we discuss and conclude our research in Sections 6 and 7, respectively.

Prior Works
Numerous applications leverage energy consumption data, benefiting from the increased feasibility and reliability facilitated by smart meters. Non-intrusive load monitoring (NILM) has enhanced heating, ventilation, and air conditioning (HVAC) fault detection through smart meter readings, eliminating the need for additional sensors [28]. Smart meter data serve as valuable input for load forecasting and energy efficiency recommendations [29]. Customer-oriented solutions, such as user-friendly web portals for bill understanding, have also been proposed [30]. Additionally, energy consumption data inform predictive models and offer consumption insights, further contributing to energy efficiency [29,31,32]. Previous research has addressed key aspects of smart meter data analytics. The research in [33] focused on smart-meter-driven segmentation, while the research in [34] introduced layer-wise relevance propagation for smart grid stability prediction. The research in [35] optimized deep models for improved smart grid stability prediction. Additionally, the research in [36] explored customer segmentation based on smart meter data analytics. These studies form the foundation for our research, covering various aspects of smart meter data analysis and its applications.
Clustering has proven helpful in finding energy consumption patterns in low- and high-voltage customers [37,38]. Additionally, demand management programs have successfully utilized clustering in order to select suitable candidate energy consumers [39-41]. Thus, several approaches have been employed for the segmentation of energy users, such as Euclidean distance-based clustering [31,38] and multi-resolution clustering in the spectral domain [42]. Similarly, several clustering methods, such as hierarchical clustering, K-means, fuzzy K-means, and self-organizing maps (SOM), have been used to cluster consumers with similar energy consumption patterns in [37]. SOM was tested for its capability to classify consumption profiles in [43]. Clustering has also proven useful to enhance energy consumption prediction using a two-layer feed-forward artificial neural network [10]. The Gaussian mixture model, optimized by the EM algorithm, was utilized in [32,44] as a non-distance-based consumer segmentation tool. Other finite mixture models have also been used within the context of the same application [45].
In order to model smart meter data in different representations, several limitations imposed by the Gaussian mixture model must be overcome. Several distributions have been used as a base distribution of mixture models to overcome the shape rigidity of the Gaussian distribution, such as the Student's-t distribution [46-48] and the generalized Gaussian distribution (GGD) [49-51]. Compared to the Gaussian distribution, the Student's-t distribution has an additional parameter (ν), called the degrees of freedom, that allows the distribution to generalize to different probability distributions. The Student's-t distribution is identical to the Cauchy distribution when ν = 1 and approaches the Gaussian distribution as ν approaches infinity. As for the GGD, the additional parameter per component (λ) is called the shape parameter; it controls the tails of the distribution, making it far more flexible to different types of data and less vulnerable to outliers [52-54]. In more recent studies, the asymmetric generalized Gaussian distribution (AGGD) was used as a base distribution for mixture models [55,56]. The AGGD can generalize to a large class of distributions, such as the impulsive, the Laplacian, the Gaussian, and the uniform distributions, in addition to the ability to fit asymmetric data [57]. Additionally, in order for mixture components to fit real-life data better, the bounded support concept was adopted in several finite mixture models [17,20,58].
Several feature extraction methods have been utilized to process high-dimensional data in electrical load observations and convert them into a new set of reduced feature spaces. In [59], a scalable algorithm for data processing has been proposed for a dataset collected from 10,000 Australian homes over a year. Dimensionality reduction is accomplished by employing a sparse representation technique in [60]. An encoding system has given representations for energy consumers with a pre-processed dictionary in [2]. The discovery of prominent energy consumption time windows is crucial for feature extraction and, therefore, in modelling the typical consumer's behaviour. Through a thorough analysis of several smart meter trials, researchers have been able to identify four time periods where the most extensive distribution of peak demand occurs within smart meter datasets [32]. The energy consumption data within the specified time periods were used to calculate seven weakly correlated features. Projection methods such as principal component analysis (PCA) were also used to concisely represent a consumer's load curve [37].
In the context of the energy consumption segmentation application, a feature selection approach based on genetic algorithms has been utilized effectively in [31] to reduce the high dimensionality of smart meter data and improve the clustering performance of K-means. In general, several search methods are used to perform feature selection, such as sequential forward search, backward search, floating search, beam search, bidirectional search, and genetic search [24-26,61]. However, more recently, several studies have approached the problem of finding the optimal set of features as an optimization problem within the context of mixture-based clustering in several real-life applications [56,62], thus achieving feature selection at minimal computational expense.
Various methods have been employed to determine the optimal number of energy consumer clusters. Diverse clustering evaluation metrics and scenarios have been utilized, with the best scenario dictating the optimal number of consumption profiles [63,64]. Additionally, an entropy-based evaluation index was applied to time series data for cluster optimization [31]. Probabilistic model selection methods, such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC), were used in different studies to select the optimal cluster count [32,65]. It is worth noting that the AIC tends to favour more complex models, particularly with smaller training datasets, while the BIC leans toward simpler models. Another approach is the Minimum Message Length (MML) criterion, which has been reported to outperform both BIC and AIC [66-68]. MML, combined with the feature-weighting mixture model [23], simultaneously performs model and feature selection, avoiding exhaustive searches. This paper builds on prior research that has evolved mixture models to become increasingly flexible and assumption-light, aiming to better capture real-world data complexities. Our proposed model leverages this accumulated knowledge to introduce a more flexible approach.

The Unsupervised BAGGMM-Based Feature Selection Model
Mixture models are a powerful approach to model incomplete data. The observations in this paper are represented as a set of vectors $\mathcal{X} = \{X_1, X_2, X_3, \ldots, X_N\}$, $X_i \in \mathbb{R}^D$, $i \in \{1, 2, 3, \ldots, N\}$. We aim to model the data in $\mathcal{X}$ using a mixture model with $M$ components, where $M \geq 1$. The $D$-dimensional random variable $X_i = (X_{i1}, X_{i2}, \ldots, X_{iD})$ is sampled from an $M$-component mixture model if its probability density function can be written as follows: where $\Theta$ represents the set of parameters of the $M$-component mixture model. The term $p_k$ represents the mixing proportion of the component $k$; by definition, $p_k$ is positive and $\sum_{k=1}^{M} p_k = 1$. The likelihood function gives the joint distribution for all the observations: In order to define the complete data likelihood, an $M$-dimensional vector of unobserved variables is defined, and it is denoted by $Z_i$. For each observation $i$, the unobserved binary vector $Z_i$ contains 0s everywhere except at the $k$th position, which corresponds to the component responsible for generating that observation. The complete data likelihood is defined as follows: where $Z = \{Z_1, \ldots, Z_N\}$. The features in Equation (2) are considered to be of equal importance. However, in the context of a real application, the estimation of the feature weights is an effective approach to better model data [37,38]. The integration of the feature selection approach within the mixture model involves considering that the irrelevant features are modelled with a background Gaussian distribution as in [23]. In this paper, feature weights are estimated for all the mixture components. Therefore, the background Gaussian distribution has a single set of parameters $\beta = \{\eta, \delta\}$, where $\eta$ represents the vector of means for all the data dimensions and $\delta$ represents the standard deviation vector. Thus, we are proposing to rewrite Equation (2) to adopt feature relevancy as follows: where $\beta = \{(\eta_1, \delta_1), \ldots, (\eta_D, \delta_D)\}$. The unobserved binary vector $\phi = (\phi_1, \ldots, \phi_D)$ indicates the relevancy of each feature. By assuming that the elements within vector $\phi$ are mutually independent and independent of the component label $Z$, we have After the marginalization over $\phi$, the obtained mixture model is formalized as follows: where $\Theta_M = [\Theta, \omega, \beta]$ is the complete set of parameters that define the proposed mixture model. The vector $\omega = (\omega_1, \ldots, \omega_D)$ quantifies the feature importance with a set of weights where $\omega_d = p(\phi_d = 1)$. Thus, Equation (6) represents the probability density function that is assumed to generate the data. The foreground distribution or the mixture base distribution $p(X_{id} \mid \theta_{kd})$ models the relevant attributes of each latent class in the data. Several distributions have been proposed for feature selection in the context of mixture models, such as the asymmetric Gaussian distribution (AGD) [62] and the asymmetric generalized Gaussian distribution (AGGD) [56]. However, these distributions are unbounded with a support region that extends across the set of real numbers. Real-life datasets are mostly digitized and have bounded support [18]. Therefore, we propose the bounded asymmetric generalized Gaussian distribution (BAGGD) to model the relevant features of each component in the mixture. The BAGGD generalizes several different distribution classes, such as the impulsive, the Laplacian, the Gaussian, and the uniform distributions, to fit bounded-support, asymmetric, and non-Gaussian data of different shapes.
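As a point of reference, the expressions referred to in this section take the following standard forms in the feature-saliency mixture formulation of [23]. This is only a sketch written in the notation defined above; the exact forms, the conditioning arguments, and the equation numbering used in this paper are assumptions.

```latex
\begin{align*}
% finite mixture density and likelihood of the N observations
p(X_i \mid \Theta) &= \sum_{k=1}^{M} p_k \, p(X_i \mid \theta_k), &
p(\mathcal{X} \mid \Theta) &= \prod_{i=1}^{N} \sum_{k=1}^{M} p_k \, p(X_i \mid \theta_k) \\
% complete-data likelihood with the binary membership vectors Z_i
p(\mathcal{X}, Z \mid \Theta) &= \prod_{i=1}^{N} \prod_{k=1}^{M}
  \bigl[ p_k \, p(X_i \mid \theta_k) \bigr]^{Z_{ik}} \\
% relevant features follow the foreground density, irrelevant ones the background
p(X_i \mid \phi, \Theta, \beta) &= \sum_{k=1}^{M} p_k \prod_{d=1}^{D}
  p(X_{id} \mid \theta_{kd})^{\phi_d} \, p(X_{id} \mid \beta_d)^{1-\phi_d} \\
% independent Bernoulli relevance indicators
p(\phi \mid \omega) &= \prod_{d=1}^{D} \omega_d^{\phi_d} (1-\omega_d)^{1-\phi_d} \\
% marginalizing over phi gives the feature-saliency mixture
p(X_i \mid \Theta_M) &= \sum_{k=1}^{M} p_k \prod_{d=1}^{D}
  \bigl[ \omega_d \, p(X_{id} \mid \theta_{kd}) + (1-\omega_d) \, p(X_{id} \mid \beta_d) \bigr]
\end{align*}
```

With $\omega_d = 1$ for every $d$, the last expression reduces to an ordinary mixture density with conditionally independent features, which is consistent with the statement that the unweighted model treats all features as equally important.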
In order to define the bounded distribution proposed in this paper, the bounded support region $\tau_{kd}$ in $\mathbb{R}$ for each component is first defined through the following indicator function: The bounded asymmetric generalized Gaussian probability density function for each $D$-dimensional data point is defined as follows: The unbounded distribution $p(X_{id} \mid \theta_{kd})$ is the asymmetric generalized Gaussian distribution (AGGD). The symmetric and asymmetric generalized Gaussian distributions are defined in Equations (9) and (10), respectively. where $\theta_{kd} = [\mu_{kd}, \sigma_{l_{kd}}, \sigma_{r_{kd}}, \lambda_{kd}]$ represents the set of parameters that defines the AGGD for each mixture component. $\mu_{kd}$, $\sigma_{l_{kd}}$, $\sigma_{r_{kd}}$, and $\lambda_{kd}$ denote the mean, the left standard deviation, the right standard deviation, and the shape parameter of the AGGD, respectively. The shape parameter controls the distribution's tails. The larger its value, the flatter the distribution at the mean; the smaller it is, the more peaked the distribution at the mean. The combination of the left and right standard deviations allows the probability density function to be asymmetric or symmetric. Thus, the proposed mixture model would consider the different shapes, asymmetry, and bounded support region of the smart meter data. A bounded distribution generalizes to all its special cases, including the bounded variants [18]. Thus, our proposed FSBAGGMM generalizes to a wide range of mixture models, including the bounded variants, as shown in Table 1. Additionally, we will demonstrate in Section 5 how the proposed FSBAGGMM can generalize feature selection models based on the asymmetric generalized Gaussian mixture, in addition to several specific mixture models in terms of modelling smart meter data.
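The bounded density described above can be illustrated with a short sketch. It assumes the AGGD parameterization commonly used in the mixture-model literature, with normalizer $\lambda \sqrt{\Gamma(3/\lambda)/\Gamma(1/\lambda)} / ((\sigma_l + \sigma_r)\Gamma(1/\lambda))$ and exponent scale $A(\lambda) = [\Gamma(3/\lambda)/\Gamma(1/\lambda)]^{\lambda/2}$, and it assumes that the support region is a single interval `[lo, hi]`. The function names and the Simpson-rule normalization are illustrative choices, not the implementation used in this paper.

```python
import math


def aggd_pdf(x, mu, sigma_l, sigma_r, lam):
    """Unbounded asymmetric generalized Gaussian density (assumed parameterization)."""
    ratio = math.gamma(3.0 / lam) / math.gamma(1.0 / lam)
    a = ratio ** (lam / 2.0)  # A(lambda), scales the exponent
    norm = lam * math.sqrt(ratio) / ((sigma_l + sigma_r) * math.gamma(1.0 / lam))
    sigma = sigma_l if x < mu else sigma_r  # left or right standard deviation
    return norm * math.exp(-a * (abs(x - mu) / sigma) ** lam)


def integrate(f, lo, hi, n=2000):
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0


def baggd_pdf(x, mu, sigma_l, sigma_r, lam, lo, hi):
    """AGGD truncated to [lo, hi]: indicator times density, renormalized."""
    if not (lo <= x <= hi):  # indicator function H is 0 outside the support
        return 0.0
    mass = integrate(lambda u: aggd_pdf(u, mu, sigma_l, sigma_r, lam), lo, hi)
    return aggd_pdf(x, mu, sigma_l, sigma_r, lam) / mass
```

With `lam = 2` and equal left and right standard deviations, `aggd_pdf` coincides with the Gaussian density, which matches the special cases listed in the text. Setting the indicator to 1 everywhere (an unbounded support) recovers the AGGD itself.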

Special Case | Required Change in FSBAGGMM Parameters
Feature selection model based on the Asymmetric Generalized Gaussian Mixture (FSAGGMM) [56] | $H(X_{id} \mid k) = 1$
Feature selection model based on the Bounded Asymmetric Gaussian Mixture (FSBAGMM) |

Model Parameter Estimation and Selection
In this section, we explain how the feature weights and the mixture model parameters are estimated from the training data, and we present the model selection criterion. We propose an approach that infers the true number of intrinsic clusters within a dataset using MML and estimates the proposed model's parameters using EM.

Parameter Estimation Using the EM Algorithm
The mixture model's parameters are optimized in parallel with the features' weights in each iteration using the EM algorithm. The iterations of the EM algorithm produce a sequence of models with a non-decreasing log-likelihood. The parameters are optimized to achieve the maximum log-likelihood, and the log-likelihood function is expressed as follows: The EM algorithm has made the optimization process for mixture models feasible through an iterative process using Equation (11) instead of Equation (2). The conditional expected values $\gamma(Z_{jh})$ and $\hat{\omega}_d$ are given by Equations (12) and (13). The EM algorithm consists of a loop over two steps: the E-step and the M-step. They are performed repeatedly until convergence. In the E-step, Equation (12) is evaluated using either the initial parameters or the parameters estimated in the M-step. In the M-step, the parameters of the next model in the sequence are estimated. Each estimated model in the sequence represents a better approximation of the distribution of the smart meter data. Due to the complicated nature of the BAGGD function, the gradient of the log-likelihood function (Equation (11)) with respect to each one of the parameters is non-linear, and a closed-form solution could not be obtained; therefore, for these parameters, we used the Newton-Raphson method to approximate the update values, as demonstrated in the equations below. The partial derivatives obtained with respect to each of the parameters can be found in Appendix A. Thus, the M-step is implemented using the following equations:
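To make the E-step described in this section concrete, the sketch below computes the posterior component responsibilities and the posterior feature relevances for a feature-saliency mixture, followed by plain maximum-likelihood updates of the mixing proportions and feature weights. It is an illustration under stated assumptions: a univariate Gaussian stands in for the BAGGD foreground density, the updates omit the MML penalty terms, the Newton-Raphson updates of the component parameters are not shown, and all function and argument names are hypothetical.

```python
import math


def gauss_pdf(x, mean, std):
    """Univariate Gaussian density (stand-in for foreground and background)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def e_step(X, p, omega, theta, beta, fg_pdf=gauss_pdf, bg_pdf=gauss_pdf):
    """Return (gamma, rel) for data X (N rows of D floats).

    gamma[i][k]  : posterior probability that observation i belongs to component k
    rel[i][k][d] : posterior probability that feature d is relevant, given i and k
    theta[k][d]  : foreground parameters; beta[d]: background parameters
    """
    gamma, rel = [], []
    for x in X:
        joint, rel_i = [], []
        for k, p_k in enumerate(p):
            prod, rel_ik = p_k, []
            for d, x_d in enumerate(x):
                a = omega[d] * fg_pdf(x_d, *theta[k][d])      # feature relevant
                b = (1.0 - omega[d]) * bg_pdf(x_d, *beta[d])  # feature irrelevant
                prod *= a + b
                rel_ik.append(a / (a + b))
            joint.append(prod)
            rel_i.append(rel_ik)
        total = sum(joint)
        gamma.append([j / total for j in joint])
        rel.append(rel_i)
    return gamma, rel


def update_weights(gamma, rel):
    """Maximum-likelihood updates of p_k and omega_d (no MML penalty applied)."""
    n, m, dims = len(gamma), len(gamma[0]), len(rel[0][0])
    p = [sum(g[k] for g in gamma) / n for k in range(m)]
    omega = [
        sum(gamma[i][k] * rel[i][k][d] for i in range(n) for k in range(m)) / n
        for d in range(dims)
    ]
    return p, omega
```

For high-dimensional data the product over features can underflow, so a practical implementation would typically work with log-densities and a log-sum-exp normalization.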

Model Selection
Model selection involves selecting the best set of parameters that model the smart meter data. Among several candidate models, the model with the maximum log-likelihood may achieve the best fit to the data; however, it is not guaranteed to perform well on unseen data. In other words, model evaluation based on the log-likelihood exclusively could be misleading. In this section, we develop a model selection criterion to infer the true number of consumption profiles within a dataset in an unsupervised manner. The Minimum Message Length criterion [72,73] is an information-theory-based model selection method; it selects the best model among a list of candidate statistical models based on its capability of compressing a message containing the data. According to the MML criterion, the best model minimizes the length of a message that consists of two parts: the first part encodes the model using prior knowledge about the model exclusively, and the second part encodes the data using the model. Given a list of candidate models, the following function is minimized to obtain the true number of intrinsic clusters within the data: In Equation (21), the prior distribution is represented by $p(\Theta_M)$, the determinant of the Fisher information matrix is represented by $|I(\Theta_M)|$, and the model's likelihood is represented by $p(\mathcal{X} \mid \Theta_M)$. The constant $c$ is the total number of parameters; in this case, it is calculated as $c = M + D + 4DM + 2D$, $c \geq 1$. The term $\rho_c$ represents the optimal quantization lattice constant for $\mathbb{R}^c$ [74]; the value of the constant is approximated with $\rho_c = 1/12$ as the value of $c$ changes across the list of candidate models [75]. The independence of the different groups of parameters has been assumed in this paper, which allows the factorization of the prior distribution and the Fisher information matrix in Equation (21). Additionally, we approximate the determinant of the Fisher information matrix using the complete likelihood, and we consider the uninformative Jeffreys prior for the distribution of each group of parameters. Hence, in our case, the MML optimization objective function is calculated as follows: Equation (22) is minimized with respect to several constraints [23], which are listed as follows: $0 < p_k \leq 1$, $0 \leq \omega_d \leq 1$, and $\sum_{j=1}^{M} p_j = 1$. In the context of this model selection criterion, since we are estimating feature weights using the EM algorithm, Equations (23) and (24) are utilized alternatively to approximate the parameters $\hat{p}_k$ and $\hat{\omega}_d$, respectively, as follows:

The Algorithm of Model Selection and Model Parameter Estimation
Algorithm 1 describes how to perform model selection and feature selection using the MML criterion and model parameter estimation using the EM algorithm.
Initialize $\Theta_M$:
A. K-means clustering results are used to initialize the parameters $(p_1, \ldots, p_M, \mu_1, \ldots, \mu_M, \sigma_{l_1}, \ldots, \sigma_{l_M}, \sigma_{r_1}, \ldots, \sigma_{r_M}, \lambda_1, \ldots, \lambda_M)$.
B. For each cluster $k$, each element of the parameter vector $\lambda_k$ is set to the value 2.
C. Initialize the background Gaussian distribution parameter set $\beta$ using the following equations for all the dimensions, where $d \in \{1, \ldots, D\}$:
Implement the E-step.
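The model selection criterion described in this section can be sketched in a few lines. The sketch assumes the generic two-part MML message length, $-\log p(\Theta_M) + \frac{1}{2}\log|I(\Theta_M)| + \frac{c}{2}(1 + \log \rho_c) - \log p(\mathcal{X} \mid \Theta_M)$, rather than the model-specific objective derived in this paper; the function names and the candidate dictionary keys are hypothetical.

```python
import math


def num_parameters(M, D):
    """c = M + D + 4DM + 2D: mixing proportions, feature weights,
    four AGGD parameters per component and dimension, and the
    background mean and standard deviation per dimension."""
    return M + D + 4 * D * M + 2 * D


def message_length(log_prior, log_det_fisher, log_lik, c, rho=1.0 / 12.0):
    """Generic two-part MML message length (assumed form, to be minimized)."""
    return (-log_prior + 0.5 * log_det_fisher
            + 0.5 * c * (1.0 + math.log(rho)) - log_lik)


def select_model(candidates):
    """Pick the candidate with the shortest message.

    Each candidate is a dict with the hypothetical keys
    'log_prior', 'log_det_fisher', 'log_lik', 'M', and 'D'.
    """
    def length(m):
        c = num_parameters(m["M"], m["D"])
        return message_length(m["log_prior"], m["log_det_fisher"], m["log_lik"], c)
    return min(candidates, key=length)
```

In practice the candidates would be the models fitted by EM for different numbers of components, each scored once after convergence.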

Implementation with HPC
The advancements in computational methodologies have played a pivotal role in addressing the challenges of data processing, especially in the realm of smart meters. Given the magnitude and intricacy of the data generated by these meters, traditional computing methods often fall short. This necessitated the exploration and implementation of our algorithm via HPC.
Our choice of HPC was rooted in its inherent capability to expediently process large volumes of data. For the clustering task at hand, HPC provided the computational agility required to analyze vast datasets from smart meters swiftly. By leveraging the parallel processing capabilities of HPC, we could achieve a significant reduction in computation time, while ensuring the consistency and accuracy of our clustering results.
Edge cloud computing stands at the forefront of modern computational paradigms, emphasizing on-the-spot processing to facilitate real-time decision-making. With the integration of HPC in edge settings, we foresee several advantages.

• Enhanced Speed and Efficiency: By employing HPC at the edge, data from smart meters can be processed locally, resulting in quicker analytics and response times. This is especially crucial for utility programs that require timely information, such as demand response and energy efficiency initiatives.
• Scalability: As the deployment of smart meters expands, the amount of data to be processed will proportionally increase. HPC can readily handle this surge, ensuring that the system can scale without compromising on performance.
• Real-Time Analytics for Utility Programs: HPC, coupled with edge cloud computing, can power real-time analytics. For instance, utility providers can swiftly analyze consumption patterns and roll out demand response strategies almost instantaneously. This not only enhances grid reliability but also aids in optimizing energy consumption and costs for consumers.

Experimental Results
In this section, we validate the performance of the MML model selection criterion and the proposed FSBAGGMM using two synthetic and real-life smart meter datasets within the application of household energy consumption segmentation. The first real-life dataset was recorded by the Commission for Energy Regulation (CER) and made accessible to researchers by the Irish Social Science Data Archive (ISSDA) [4]. The dataset consists of smart meter data gathered from more than 6000 Irish energy consumers from 14 July 2009 to 31 December 2010. The energy consumption is recorded in kWh at half-hour intervals. This dataset has two types of energy consumers: residential and small to medium enterprises. As stated earlier, we are interested in analyzing the energy consumption of residential energy consumers only. Therefore, 3639 Irish residential energy consumers remain for analysis after data cleaning. Each residential consumer is assigned one of six tariffs (E, A, D, C, B, and W). The second real-life smart meter dataset consists of smart meter data collected from 5567 residential homes in London. The data were collected by UK Power Networks as part of the Low Carbon London project between November 2011 and February 2014 [6]. The energy consumption is recorded in kWh at half-hour intervals. After data cleaning, observations of 3891 household energy consumers within the year 2013 are used in this experiment. The residential energy consumers in this dataset are subject to two types of tariffs. The first type is the dynamic time of use (ToU), where the energy consumption prices vary as follows: high (67.20 pence/kWh), low (3.99 pence/kWh), or normal (11.76 pence/kWh). The second type is the standard (std), where the consumers pay a flat rate of 14.228 pence/kWh. Additionally, the energy consumers in this dataset belong to five different geo-demographic groups.
The application considered in this paper aims to segment energy consumers given their load curve. We use characteristic load profiles to find the optimal number of energy consumption clusters with similar consumption patterns and determine the cluster membership of every load curve given in the training dataset. Utility companies can use accurate energy-consumer-type identification to make correct decisions regarding investments in load-shifting campaigns and to prevent over- or under-dimensioning linked to the peak energy demand. Several performance evaluation metrics [64] are used in this paper. They are defined as follows.
DI [76]: Dunn's index is a clustering performance metric calculated as the ratio between the smallest distance separating two observations of different clusters and the largest distance separating two observations in the same cluster. This index is maximized for the best clustering, and it is defined as follows:

$$\mathrm{DI} = \frac{\min_{A, B \in M,\, A \neq B} \varphi(A, B)}{\max_{C \in M} \max_{x, y \in C} d(x, y)}, \qquad \varphi(A, B) = \min_{x \in A,\, y \in B} d(x, y),$$

where $d$ denotes the distance or the similarity function, $\varphi(A, B)$ denotes the minimum distance between two observations, one belonging to cluster $A$ and the other to cluster $B$, and $M$ denotes the set of clusters.
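The verbal definition of Dunn's index can be illustrated with a minimal pure-Python sketch. It assumes Euclidean distance for $d$ and brute-force pairwise comparison; the function name `dunn_index` and the toy clusters are illustrative choices, not part of the paper's implementation.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's index for a list of clusters, each a list of points (tuples).

    Assumes Euclidean distance, at least two clusters, and at least one
    cluster with two or more points (otherwise the diameter is undefined).
    """
    # Smallest distance between two observations in different clusters.
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two observations in the same cluster.
    max_within = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_within


# Toy data: two compact, well-separated clusters versus two elongated ones.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 3.0)], [(4.0, 0.0), (4.0, 3.0)]]
print(dunn_index(tight), dunn_index(loose))
```

The compact, well-separated partition receives the larger index, which is why the metric is maximized. The brute-force loops are quadratic in the number of observations, so a practical implementation on thousands of load curves would likely rely on a precomputed distance matrix.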
EoE [31]: The entropy of eigenvalues is an entropy-based clustering performance measure; it is obtained from the eigenvalue analysis of the correlation matrix calculated using raw smart meter data. The index is calculated using the correlation between representative time series of different clusters and the correlation between different time series within each cluster. The EoE index is calculated using the following equation: The SM similarity is a normalized average information measure; the larger it is, the greater the similarity. The term $SM_{b}$ represents the normalized entropy of eigenvalues obtained from the correlation matrix between different clusters, and $SM_{wk}$ represents the normalized entropy of eigenvalues obtained from the correlation matrix between time series in each cluster $k$. In an ideal clustering, EoE is small, reflecting high similarity between time series within each cluster and low similarity between representative time series of different clusters.

S [77]: The silhouette score is a model evaluation measure that assigns a score to each observation in the training dataset. The overall evaluation is the average score over all the dataset observations. The metric is maximized for better clustering and is defined in the following equation:

$$S = \frac{1}{N} \sum_{i=1}^{N} \frac{b(x_i) - a(x_i)}{\max\{a(x_i),\, b(x_i)\}},$$

where $N$ is the number of observations, $a(x_i)$ represents the average dissimilarity of the data point $x_i$ to all the other data points within the same cluster, and $b(x_i)$ represents the minimum average dissimilarity of data point $x_i$ to the data points of a cluster different from the data point's cluster.

CH [78]: The Calinski-Harabasz index is a model performance evaluation index; the measure calculates the ratio between the inter-cluster variance and the intra-cluster variance. This measure is maximized for better clustering and is defined as follows:

$$\mathrm{CH} = \frac{\sum_{k=1}^{|M|} N_k\, d(c_k, c)^2 \,/\, (|M| - 1)}{\sum_{k=1}^{|M|} \sum_{x_i \in \text{cluster } k} d(x_i, c_k)^2 \,/\, (N - |M|)},$$

where $N_k$ is the number of observations predicted to belong to cluster $k$, $c_k$ denotes the centroid of cluster $k$, $c$ denotes the global centroid of all the clusters, $|M|$ is the number of clusters, and $d$ denotes the distance or the similarity function.

DB [79]: The Davies-Bouldin index is a model performance evaluation measure; it calculates the ratio of intra-cluster distances to inter-cluster distances for each possible pair of clusters. For each cluster, the maximum ratio over all the pairs it forms is added to a summation. The summation result is divided by the total number of clusters to obtain the metric's value. This measure is minimized for better clustering, and it is defined as follows:

$$\mathrm{DB} = \frac{1}{k} \sum_{A \in M} \max_{B \in M,\, B \neq A} \frac{\frac{1}{|A|} \sum_{x \in A} d(x, c_A) + \frac{1}{|B|} \sum_{x \in B} d(x, c_B)}{d(c_A, c_B)},$$

where $|A|$ denotes the cardinality of cluster $A$, $k$ denotes the number of components enforced by the mixture model, $M$ denotes the set of clusters, $c_A$ denotes the centroid of cluster $A$, and $d$ denotes the distance or the similarity function. $M$ has $k$ elements.

GOF [80]: The goodness of fit statistic value measures the model's fitting accuracy and it is calculated as follows: where $\Upsilon(X_i)$ and $\Omega(X_i)$ represent the empirical and the expected frequencies of the observation $X_i$, respectively.

The indices ACC, TPR, PPV, TNR, NPV, FPR, FNR, and FDR represent the average accuracy, average true positive rate, positive predictive value, true negative rate, negative predictive value, false positive rate, false negative rate, and false discovery rate, respectively. They are defined as follows: where $TP_k$, $FP_k$, $TN_k$, and $FN_k$ denote the number of true positives, false positives, true negatives, and false negatives, respectively, for the cluster $k$. In order to compute the metrics explained in Equations (37)-(44), the labels of cluster $k$ are considered the positive class and all the remaining cluster labels are considered the negative class. MCC represents the Matthews correlation coefficient evaluation metric [81].
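The one-vs-rest convention described for ACC through FDR can be sketched as follows. The ratios are the textbook confusion-matrix definitions, and averaging them uniformly over clusters is an assumption about how the averages are formed. The sketch also assumes that predicted cluster labels have already been matched to the ground-truth labels; the function names are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, k):
    """Return (TP, FP, TN, FN) with label k as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == k and t == k:
            tp += 1
        elif p == k:
            fp += 1
        elif t == k:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Textbook confusion-matrix ratios for a single positive class."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "TPR": tp / (tp + fn),
        "PPV": tp / (tp + fp),
        "TNR": tn / (tn + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "FDR": fp / (fp + tp),
    }


def macro_average(y_true, y_pred):
    """Average every ratio over the clusters present in y_true (assumed scheme)."""
    labels = sorted(set(y_true))
    per_cluster = [rates(*one_vs_rest_counts(y_true, y_pred, k)) for k in labels]
    return {
        name: sum(r[name] for r in per_cluster) / len(labels)
        for name in per_cluster[0]
    }


# Hypothetical labels for illustration only.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
avg = macro_average(y_true, y_pred)
print(avg["ACC"], avg["TPR"], avg["PPV"])
```

A cluster that receives no predicted members would make PPV and FDR undefined (division by zero); a fuller implementation would need an explicit rule for that case.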
The AIC and BIC are probabilistic model selection methods [82] that attempt to select the model with the best performance while taking into consideration its complexity (by adding a complexity-related penalty). Unlike probabilistic model selection criteria, performance metrics select models with no regard to their complexity. The distinct probabilistic model selection criteria used in this paper originate from different fields of study. The AIC is derived from the frequentist framework, while the BIC is derived from Bayesian probability and inference. Compared to the BIC, the AIC emphasizes the model performance and penalizes complex models less, making it prone to selecting overfitted models. The BIC, in turn, penalizes candidate models more heavily for their complexity. The AIC and BIC statistics for each candidate model are computed as follows:

$$\mathrm{AIC} = -2 \log L(\Theta) + 2\kappa, \qquad \mathrm{BIC} = -2 \log L(\Theta) + \kappa \log N,$$

where $L(\Theta)$ is the likelihood function estimate given a set of parameters $\Theta$, $\kappa$ represents the number of free parameters, and $N$ represents the number of observations. As $N$ approaches infinity, the BIC criterion is more likely to select the candidate model with the true number of intrinsic clusters. For both criteria, the candidate model with the lowest value is selected.
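To make the difference in penalties concrete, the following sketch evaluates both criteria in their standard forms, AIC = 2κ − 2 log L and BIC = κ log N − 2 log L, over a set of candidate models. The log-likelihoods and parameter counts are hypothetical values chosen for illustration and are not results from this paper.

```python
from math import log


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + n_params * log(n_obs)


def select(candidates, n_obs):
    """candidates maps component count -> (max log-likelihood, free parameters)."""
    best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
    best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))
    return best_aic, best_bic


# Hypothetical fits: adding components raises the likelihood with diminishing returns.
candidates = {2: (-1500.0, 20), 3: (-1400.0, 30), 4: (-1385.0, 40)}
print(select(candidates, n_obs=1000))
```

With these made-up numbers the AIC prefers four components while the BIC prefers three, reflecting the heavier complexity penalty of the BIC once log N exceeds 2.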
In the upcoming sections, the performance of the proposed model is compared to specific mixture models such as the BAGGMM, the AGGMM, and the FSAGGMM. Model selection using the proposed model is performed using the MML model selection criterion and compared against specific model selection methods such as the BIC and AIC, and model selection methods using performance measures, such as Dunn's index (DI) and the entropy of eigenvalues (EoE).

Synthetic Data
As a first stage, synthetic datasets are used to validate the proposed mixture model and its model selection method. We propose using a 49-dimensional dataset, which imitates a real-life smart meter dataset by representing each energy consumer with a load curve. In order to generate the synthetic datasets used in this paper, the following steps were followed.

1. For each energy consumer in the real-life dataset, only the first 49 smart meter observations are considered.
2. The Gaussian mixture model is used to cluster the data into a specific number of clusters. The mean of each cluster is considered a consumption profile.
3. Each consumption profile inferred from the previous step is summed with instances generated by Gaussian white noise using five different sets of parameters to form the observations of the synthetic dataset.
In other words, the origin of each cluster of observations within the synthetic datasets used in this paper is an actual energy consumption profile concluded from a real dataset.
The data-generating process delineated above provides a systematic approach to crafting synthetic datasets with asymmetric class distributions and varied shapes. By grounding the data in real consumption profiles and subsequently introducing variations via Gaussian white noise, the process ensures a rich diversity of data shapes. This diversity serves as a rigorous testing ground to evaluate the flexibility and robustness of the proposed mixture model, effectively challenging its capability to adapt and accurately represent varied data structures.
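The third step of the generation procedure can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than the authors' code: the flat profiles, the second noise parameter set, and all counts except 378 are hypothetical placeholders, and the noise is drawn independently for each of the 49 dimensions.

```python
import random


def make_synthetic(profiles, noise_params, counts, seed=0):
    """Add Gaussian white noise to consumption profiles.

    profiles:     list of load curves (lists of floats), one per cluster
    noise_params: list of (mu, sigma) pairs for the white noise
    counts:       counts[i][j] = observations for profile i and noise set j
    Returns (observations, labels), where labels index the source profile.
    """
    rng = random.Random(seed)
    observations, labels = [], []
    for i, profile in enumerate(profiles):
        for (mu, sigma), n in zip(noise_params, counts[i]):
            for _ in range(n):
                # One noisy copy of the profile, noise drawn per dimension.
                observations.append([v + rng.gauss(mu, sigma) for v in profile])
                labels.append(i)
    return observations, labels


# Placeholder profiles; in the paper they are cluster means of real load curves.
profiles = [[0.5] * 49, [1.5] * 49]
noise_params = [(0.001, 0.2), (0.0, 0.1)]  # second pair is hypothetical
counts = [[378, 10], [5, 5]]               # only 378 comes from the text
X, y = make_synthetic(profiles, noise_params, counts)
print(len(X), len(X[0]), y.count(0))
```

Because several noise parameter sets contribute to the same profile, each cluster becomes a mixture of Gaussians with different spreads, which is one way the resulting clusters can depart from a single Gaussian shape.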
The first dataset consists of five clusters. The five real-life consumption profiles used to generate the first dataset are shown in Figure 2a. The count of the observations generated for each energy consumption profile using the distinct Gaussian white noise parameters is shown in Table 2. As an illustrative example of the data generation process, 378 observations of the first dataset are generated by summing the white noise vector generated using the parameter set (µ = 0.001; σ = 0.2) of the multivariate Gaussian white noise with the vector of "Consumption Profile 1". The clustering results of our proposed model are evaluated using several performance measures and compared against the clustering performance of specific mixture models, as shown in Tables 3 and 4. Our model selection approach successfully infers the correct number of components within this dataset, as demonstrated in Table 5. MML outperforms specific model selection methods using the clustering results obtained from each instance of our proposed model. Figure 3a shows the maximum log-likelihood achieved by clustering the data using the proposed model in comparison with specific mixture models. The proposed model achieves the best fit of the training data, with the best performance according to all the performance metrics used in this experiment and the highest log-likelihood.

The second dataset consists of eight clusters. The eight real-life consumption profiles used to generate this dataset are shown in Figure 2b. Our model selection approach successfully infers the correct number of components within this dataset, as demonstrated in Table 6. The count of the observations generated for each energy consumption profile using the distinct Gaussian white noise parameters is shown in Table 7.
MML chooses the proposed model's instance with a component count equal to the ground truth, outperforming specific model selection methods used in this comparison. The proposed model fits the data better than all the mixture models used in the comparison by achieving the highest maximum log-likelihood, as demonstrated in Figure 3b. According to all the performance metrics used in this experiment, the proposed model also outperforms the mixture models selected for the comparison, as shown in Tables 8 and 9.

In this section, we investigate the performance of our proposed model using the first real-life smart meter dataset. As mentioned earlier, the dataset that we consider has smart meter observations from 3639 Irish energy consumers. Each consumer has 25,728 electricity usage readings that are recorded in kilowatt-hours. In order to summarize and preserve the information within the numerous features representing each energy consumer, PCA is used for feature extraction in this experiment. Several datasets with a different number of features in the range between 50 and 250 are considered. Owing to its low reconstruction error, the dataset with 250 features is favoured for this experiment. We used the dataset as an input to three different instances of our proposed model. Each instance had a different number of mixture components within the range M = [2, 4]. The model selection algorithm concluded that the minimum value of its objective function was obtained with the model instance with three components, as shown in Figure 4a. Table 10 reports the optimal number of clusters inferred by each model selection criterion used in comparison with MML. Our derived model selection criterion inferred the correct number of clusters in the experiments using synthetic data, and in this experiment the AIC and BIC also agree with it that the number of clusters is three.
Figure 4b demonstrates the log-likelihood trace of each mixture model used in the comparison within this experiment. As observed, the proposed model converged to the highest log-likelihood, indicating a better fit to the training dataset. The clustering evaluation of the proposed model for the inferred optimal number of clusters is shown in Table 11 in comparison with specific mixture models. As demonstrated, our proposed model achieves the best clustering performance according to all the evaluation measures used in the comparison.

As mentioned earlier, we determined the optimal number of clusters using MML and achieved the best clustering result using our proposed mixture model. Since this is an implementation of a real-life application, it is necessary to analyze the resulting clusters to further understand the energy consumption pattern of each consumption trend discovered. Figure 5a shows the average power demand of all the energy consumers without clustering. For comparison, Figure 5b shows the average power demand of each energy consumer cluster. For all the time intervals available in the dataset, the contribution of each energy consumption pattern to the overall average power demand can be determined. The proposed model can determine the consumer's contribution to each consumption profile and which profile the consumer mostly follows. Table 12 reports the ratio of the count of energy consumers in each cluster to the total count of energy consumers in the dataset; the table also reports the share of each consumer cluster in the total average energy consumption in the year 2010. Additionally, the real-life dataset that we use in this experiment provides the tariff assigned to each energy consumer. We have discovered that the tariff types are distributed almost identically across the resulting clusters, as shown in Figure 6, which indicates that the tariff type does not influence the consumer's electrical usage pattern.

In this section, we validate the performance of our proposed model using the second real-life smart meter dataset. As mentioned earlier, the dataset that we consider in this experiment has smart meter observations from 3891 household energy consumers located in London. Each consumer has 17,520 electricity usage readings that are recorded in kilowatt-hours. In order to summarize the information included in the load curve of each energy consumer, we have extracted nine features. Following [32], seven of these features are based on four key time periods, denoted by t ∈ {1, 2, 3, 4}. The overnight time period (t = 1) is defined between 10:30 p.m. and 6:30 a.m., the breakfast time period (t = 2) is defined between 6:30 a.m. and 9:00 a.m., the daytime period (t = 3) is defined between 9:00 a.m. and 3:30 p.m., and the evening time period (t = 4) is defined between 3:30 p.m. and 10:30 p.m. Based on these four time periods, the seven features are extracted from the smart meter data to summarize the representation of energy consumers, and they are calculated as follows.
• $RAP_t$ denotes the relative average power for time period $t$ over the entire year; it is defined as follows:
• The mean STD denotes the mean relative standard deviation of the average power used over the entire year; it is defined as follows:
• The seasonal score is defined as follows:
• The weekend vs. weekday difference score (WD-WE diff. score) is calculated as follows:

where $AP_t$ and $\sigma_t$ represent the average power used by the specific consumer and its corresponding standard deviation in the time period $t$, respectively, over all the available smart meter data. $DAP$ represents the average daily power used by the specific consumer throughout the available smart meter data. $AP^{W}_t$ and $AP^{S}_t$ represent the average power used by the specific consumer in the time period $t$ throughout winter and summer, respectively. $AP^{WD}_t$ and $AP^{WE}_t$ represent the average power used by the specific consumer in the time period $t$ throughout the weekdays and weekends, respectively, for the available data. Finally, the eighth and the ninth features represent the consumer's tariff and geo-demographic group, respectively.
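As a rough illustration of these seven features, the sketch below maps half-hour slots to the four time periods given in the text and then combines the per-period averages. The period boundaries come from the text. The feature formulas are assumptions inferred from the symbol definitions, namely $RAP_t = AP_t / DAP$, mean STD $= \frac{1}{4}\sum_t \sigma_t / AP_t$, seasonal score $= \sum_t |AP^{W}_t - AP^{S}_t| / AP_t$, and WD-WE score $= \sum_t |AP^{WD}_t - AP^{WE}_t| / AP_t$; they may differ from the exact forms used in the paper, and all function names and input values are hypothetical.

```python
def period_of_slot(slot):
    """Map a half-hour slot (0 = 00:00-00:30, ..., 47 = 23:30-24:00) to t."""
    if 13 <= slot < 18:
        return 2  # breakfast, 06:30-09:00
    if 18 <= slot < 31:
        return 3  # daytime, 09:00-15:30
    if 31 <= slot < 45:
        return 4  # evening, 15:30-22:30
    return 1      # overnight, 22:30-06:30


def segmentation_features(ap, sigma, dap, ap_winter, ap_summer,
                          ap_weekday, ap_weekend):
    """Return [RAP_1..RAP_4, mean STD, seasonal score, WD-WE score].

    Every dict is keyed by the period t in {1, 2, 3, 4}. The formulas are
    assumed forms, labelled as such in the accompanying text.
    """
    periods = (1, 2, 3, 4)
    rap = [ap[t] / dap for t in periods]
    mean_std = sum(sigma[t] / ap[t] for t in periods) / len(periods)
    seasonal = sum(abs(ap_winter[t] - ap_summer[t]) / ap[t] for t in periods)
    wd_we = sum(abs(ap_weekday[t] - ap_weekend[t]) / ap[t] for t in periods)
    return rap + [mean_std, seasonal, wd_we]


# Hypothetical per-period averages for one consumer (arbitrary units).
ap = {1: 1.0, 2: 2.0, 3: 2.0, 4: 4.0}
sigma = {1: 0.5, 2: 1.0, 3: 1.0, 4: 2.0}
winter = {1: 1.5, 2: 2.0, 3: 2.0, 4: 5.0}
summer = {1: 0.5, 2: 2.0, 3: 2.0, 4: 3.0}
features = segmentation_features(ap, sigma, 2.0, winter, summer, ap, ap)
print(features)
```

Dividing by $AP_t$ or $DAP$ makes each feature relative to the consumer's own demand level, so households are compared by the shape and variability of their usage rather than by its magnitude.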
We have determined the optimal number of clusters for our proposed model using the MML model selection criterion, similarly to our previous experiments. Among five candidate FSBAGGMM instances with a number of mixture components within the range [2, 6], the instance with four components achieved the minimum message length.
Most of the model selection methods used in the comparison reported in Table 13 agree on the optimal number of mixture components. Therefore, the data were clustered into four clusters using our proposed model, and the clustering performance evaluation was compared against specific mixture models. Table 14 shows that our proposed mixture model outperforms the other mixture models used in the comparison on six different performance metrics. As shown in Figure 7b, the categorical feature representing the tariff of each energy consumer has an almost identical distribution across the clusters obtained using our proposed mixture model, having little to no influence on the energy consumption behaviour. Nevertheless, as demonstrated by the CH score in Table 14, our proposed model has achieved clusters with relatively small intra-cluster (within clusters) variance and relatively large inter-cluster (between clusters) variance. The smallest cluster obtained using the FSBAGGMM has 225 energy consumers, as demonstrated in Figure 7a. Additionally, Table 15 reports the average values of several features for the inferred household energy consumer clusters.

Since the smart meter data have been modelled successfully, the proposed model is capable of identifying energy consumer clusters that are suitable for demand reduction initiatives within several utility programs [2]. As an example, Table 15 shows that the first cluster has a relatively high evening RAP with a relatively low mean STD, seasonal score, and WD-WE difference score. The power demand of energy consumers exhibiting energy consumption patterns similar to the first cluster could be lowered by implementing storage devices. The third and fourth clusters' energy consumption patterns exhibit relatively low variability in demand, as represented by the mean STD and WD-WE difference score, while exhibiting a relatively high seasonal difference in power demand, as represented by the seasonal score. Such households could be offered non-electric or more efficient heating systems to reduce the winter demand.

Discussion
In this paper, we have presented an expectation-maximization algorithm within the MML criterion to optimize the parameters of the bounded asymmetric generalized Gaussian mixture model and to find the optimal number of consumption profiles and the optimal subset of features simultaneously. Our approach assumes that the data arise from a mixture of bounded asymmetric generalized Gaussian distributions. The final results demonstrate that the load curve of an individual energy consumer shows a probabilistic association with each class, indicating which pattern of electricity use is more or less likely to be followed within a household. Therefore, it is possible to categorize households and how they consume energy using our proposed model.
Prior works in household energy consumption segmentation unrealistically approach model selection and feature selection as independent problems. Our approach discovers the number of energy consumption profiles and determines the optimal set of data attributes to be used for modelling in our proposed mixture model in a single optimization process, and it avoids running the EM algorithm many times.
In clustering synthetically generated smart meter data with a known ground-truth number of clusters, our proposed algorithm has outperformed most of the existing model selection approaches. In the same experiment, the proposed model models the first and the second synthetic smart meter datasets with high accuracy of 95.569% and 91.856%, respectively. Similarly, our algorithm has also determined the optimal number of clusters in both datasets in experiments involving real-life data, and the proposed model outperforms all the mixture models used in the comparison, as demonstrated by all the utilized performance metrics. These results support the superiority of the proposed algorithm in modelling smart meter data with different feature extraction methods over all the state-of-the-art clustering algorithms used in the comparison.
Privacy and security concerns loom large in the realm of smart meter data analytics. Fortunately, the datasets employed in our research have been thoughtfully curated, with a paramount emphasis on safeguarding the privacy of individuals whose households are equipped with smart meters. These datasets meticulously exclude any information that might compromise the privacy of the participants while providing valuable insights for research. We have underscored in our research paper, particularly in the Results section, that the conventional categorization, carried out prior to any consumption data observation, is fundamentally ineffective. Respecting individuals' privacy is not only an ethical imperative but also a fundamental human right. Remarkably, our proposed mixture model navigates this privacy-centric landscape adeptly. It uncovers the underlying data distribution and identifies energy consumption patterns without the need for additional, potentially intrusive information. This privacy-preserving approach aligns with the broader scientific quest for generalization and effectiveness in solutions that refrain from privacy invasion. Furthermore, our experiments with real-life datasets, which encompassed features such as tariff and geo-demographic groups, yielded intriguing results. These attributes, often considered vital, were deemed unimportant by our meticulous feature selection approach. This underscores our commitment to privacy and our ability to derive meaningful insights without resorting to invasive practices.
Finally, our implementation underscores a promising synergy between HPC and edge cloud computing, especially in the realm of smart meter data processing. As we progress towards a more interconnected and data-centric world, the amalgamation of these technologies will prove indispensable in sculpting the future of energy management and utility programs.

Conclusions
Our approach to analyzing real-life smart meter data is effective in determining households that are suitable for demand reduction initiatives such as DR and EE, thus providing the opportunity for utility companies to adopt environmentally friendly and cost-effective technologies.
The application addressed in this paper is well suited for an unsupervised approach, especially given the absence of ground truth labels. However, many applications would benefit from supervised or semi-supervised machine learning solutions. A limitation of the current learning framework presented in this paper is its inability to leverage ground truth labels. Recognizing this as a crucial area of improvement, future work could involve proposing a learning method for the mixture model that incorporates these labels to optimize the model parameters.

Figure 2. Consumption profiles used to generate the synthetic datasets. (a) First synthetic dataset. (b) Second synthetic dataset.

Figure 3. Mixture model's log-likelihood function demonstration during the clustering of the synthetic datasets. (a) First synthetic dataset. (b) Second synthetic dataset.

Figure 4. The mixture models' performance information during the clustering of the first real-life smart meter data. (a) Selection of the optimal number of mixture components using MML and the proposed model. (b) The log-likelihood functions of the mixture models used in the comparison.

Figure 5. Household energy consumption segmentation demonstration of the first real-life smart meter dataset. (a) The average demand of all the energy consumers from 14 July 2009 to 31 December 2010. (b) The average demand of the optimal energy consumption clusters from 14 July 2009 to 31 December 2010.

Figure 6. Number of energy consumers in each cluster.

Figure 7. The UK Power Networks smart meter data clusters information. (a) Percentage of energy consumers in each cluster. (b) The distribution of tariffs across the resulting clusters.

Table 2. Count of observations generated for the first synthetic dataset.

Table 3. Mixture models' clustering performance evaluation using the first synthetic dataset.

Table 4. Mixture models' clustering performance evaluation using the first synthetic dataset.

Table 5. Clusters using first synthetic dataset.

Table 6. Clusters using second synthetic dataset.

Table 7. Count of observations generated for the second synthetic dataset.

Table 8. Mixture models' clustering performance evaluation using the second synthetic dataset.

Table 9. Mixture models' clustering performance evaluation using the second synthetic dataset.

Table 10. Identified optimal number of clusters for the real-life smart meter dataset.

Table 11. Mixture models' clustering performance using the real-life smart meter dataset.

Table 12. Consumption profile statistics for the year 2010.

Table 13. Identified optimal number of clusters for the second real-life smart meter dataset.

Table 14. Mixture models' clustering performance using the second real-life smart meter dataset.

Table 15. The mean values of the first seven smart meter data features.