Handling Data Heterogeneity in Electricity Load Disaggregation via Optimized Complete Ensemble Empirical Mode Decomposition and Wavelet Packet Transform

Global warming is a leading world issue driving the common social objective of reducing carbon emissions. People have witnessed the melting of ice and abrupt changes in climate. Reducing electricity usage is one possible method of slowing these changes. In recent decades, there have been massive worldwide rollouts of smart meters that automatically capture the total electricity usage of houses and buildings. Electricity load disaggregation (ELD) helps to break down total electricity usage into that of individual appliances. Studies have implemented ELD models based on various artificial intelligence techniques using a single ELD dataset. In this paper, a powerline noise transformation approach based on optimized complete ensemble empirical mode decomposition and wavelet packet transform (OCEEMD-WPT) is proposed to merge heterogeneous ELD datasets. The practical implications are that the method increases the size of training datasets and provides mutual benefits when utilizing datasets collected from other sources (especially from different countries). To reveal the effectiveness of the proposed method, it was compared with CEEMD-WPT (fixed controlled coefficients), standalone CEEMD, standalone WPT, and other existing works. The results show that the proposed approach improves the signal-to-noise ratio (SNR) significantly.


Introduction
Reducing carbon emissions stemming from electricity consumption has been a leading global vision for tackling global warming, which can wreak havoc on human lives. Environmental experts have emphasized that global warming leads to severe melting of ice and permafrost, which releases large amounts of methane, a gas with a greenhouse effect about 30 times more powerful than that of carbon dioxide [1]. This situation may lead to irreversible positive feedback for glacial melting if increased sea-level temperatures reach a certain threshold. This drives the vision for smart, green buildings to reduce the global carbon footprint [2].
In many countries, traditional electric grids have been changed to smart grids to address this challenge. Smart grids contribute to modernization by improving the transmission of electricity, the distribution system, and the electricity infrastructure. Various research topics have emerged, including battery energy storage [3], electrical-gas-hydrogen interconnected networks [4], and advanced metering infrastructure (AMI) [5]. Particularly in AMI, the deployment of smart meters, which support the continuous collection of electricity data in apartments and buildings, has played a crucial role in developing smart grids. Recent works estimated that the number of smart meters has reached 200 million in Europe, 96 million in China, 70 million in the USA, and 2.9 million in the UK, with a market penetration of over 50% [6,7]. This has built a solid foundation for further analysis of massive amounts of electricity data. With the introduction of the electricity load disaggregation (ELD) algorithm (also called nonintrusive load monitoring (NILM)), electricity data are disaggregated into the electricity consumption of individual appliances, which brings valuable insight to the public, electric companies, and governments [8]. Users may benefit from three insights in particular. The first is determining which appliance is the most power-hungry so that follow-up actions can be taken to reduce the electricity consumption of that appliance and, as a result, lower the electricity bill. The second is verifying whether appliances are being left on during the night or outside of office hours, resulting in electricity wastage. The third requires a comparison between the past and current energy profiles of an appliance to evaluate whether the appliance has degraded significantly. If it has, replacing the appliance may be more worthwhile than continuing to use it.
For electricity users without smart meters, the demonstrated reduction in electricity usage among smart meter users provides a strong reason for electric companies to migrate from traditional electric meters to smart meters.
Various techniques, including signal processing, data mining, shallow learning, and deep learning, have been proposed for ELD in the literature. Readers who are interested in the details may refer to the latest state-of-the-art articles [9][10][11][12]. Researchers have devoted efforts to enhancing the ELD model from an algorithmic perspective, particularly toward deep learning approaches [13][14][15]. The advantages of deep learning compared to shallow learning have been demonstrated in large-scale datasets.
A critical review summarized 42 ELD datasets developed by the scientific community [16]. These datasets are heterogeneous in nature with varying factors such as location, type of space (e.g., residential, commercial, and industrial), electric appliance, powerline cable, AC power source, and smart meter.
The research focus of this paper is to merge heterogeneous datasets, which can provide two major advantages. It increases the amount of training data, especially when data collection is sometimes challenging for some appliances (suffering from small sample size). In addition, countries that have had more experience in the deployment of smart meters could support the quick rollout for those that have newly joined the smart meter initiative. Section 1.1 presents a literature review of the techniques used to merge heterogeneous datasets. This is followed by the limitations of related works and the rationales of our work in Section 1.2. The research contributions of this paper are summarized in Section 1.3.

Literature Review
One review article [12] addressed the unsolved issue of data heterogeneity. Heterogeneity makes fair performance evaluation and comparison between datasets difficult, particularly because about 40 performance metrics have been utilized in ELD research. Additionally, other heterogeneous features of public datasets include folder structure and file format [17]. Various approaches, such as those of Brick [18] and Blond [19], were employed to structure electricity data as a metadata schema in order to produce a summary of the characteristics of the ELD database. The discussions and investigations of data heterogeneity algorithms for ELD are limited. Algorithms in the literature were evaluated based on individual benchmark datasets instead of groups of benchmark datasets. Furthermore, discussions of folder structure, file format, and metadata schema [17][18][19] addressed how the attributes of datasets can be made consistent, which is not related to how heterogeneous datasets can be merged.
To the best of our knowledge, our research idea of merging heterogeneous ELD datasets is the first of its kind. We made the following query using the advanced search function in Web of Science: TS = ((nonintrusive load monitoring OR NILM OR load monitoring OR energy disaggregation OR electricity disaggregation OR electricity load disaggregation OR load disaggregation) AND (heterogeneity OR heterogeneous data OR heterogeneous OR heterogeneous datasets)). The same query was made using Scopus with the function TITLE-ABS-KEY. We read titles, abstracts, keywords, and introductions to confirm that there was no relevant work on the research topic.
It is worth noting that extra data generation from the source dataset [20,21] and data simulation [22] are not related to the topic of this research.

Limitations of Existing Works
The limitations of the existing works are as follows:

• No previous work has conducted research on merging heterogeneous ELD datasets.
• It is difficult to ensure fair performance evaluation and comparison between heterogeneous ELD datasets given that about 40 performance metrics were used.
• There is limited investigation of the powerline noise transformation between heterogeneous ELD datasets.

Major Research Contributions
The major research contributions of this research work are summarized as follows:
• It is the first of its kind to merge heterogeneous ELD datasets.
• It unifies the performance comparison of ELD models with merged heterogeneous datasets.
• An optimized complete ensemble empirical mode decomposition and wavelet packet transform (OCEEMD-WPT) is proposed, which provides in-depth decomposition of electricity data and enhances the performance of powerline noise transformation.
• A feasibility study is carried out to confirm the enhancement of the deep learning model given the increased size of training data (after combining heterogeneous datasets).

Datasets and Methodology
In this section, 5 benchmark ELD datasets were selected to analyze the merger of heterogeneous datasets. This is followed by an illustration of the powerline noise transformation approach.

Benchmark Electricity Load Disaggregation Datasets
As mentioned above, one review article summarized 42 benchmark ELD datasets [16]. Five of these datasets were selected to exemplify the performance of the proposed powerline noise transformation approach. The selection criteria were country (ELD datasets collected from different countries are highly heterogeneous) and sampling rate (high-frequency data, i.e., more than 10 kHz, were chosen because they retain complete information about the electricity data). In contrast, low-frequency electricity data (e.g., 1 Hz) are aggregated; therefore, some essential characteristics may be lost, thus lowering the performance of the ELD model.
The selected benchmark datasets were as follows: (i) reference energy disaggregation dataset (REDD) [23], (ii) United Kingdom domestic appliance-level electricity dataset (UK-DALE) [24], (iii) worldwide household and industry transient energy dataset (WHITED) [25], (iv) controlled on/off loads library dataset (COOLL) [26], and (v) laboratory for innovation and technology in embedded systems dataset (LIT) [27]. Table 1 summarizes the characteristics of the datasets, including country, number of classes, data duration, and sampling rate. WHITED [25] can be further categorized into 3 groups: Germany, Austria, and Indonesia. There were 7 datasets (one for each country) in total.

Overview of the Proposed Powerline Noise Transformation Approach
The conceptual flow of the proposed powerline noise transformation approach is shown in Figure 1. We assume that there are $M + 1$ datasets: $M = 6$ originating datasets and one destination dataset. Each originating dataset $X_i$, $i = 1, \dots, M$, undergoes powerline noise transformation using OCEEMD-WPT (Section 2.3), which comprises removal of the powerline noise of the source and inclusion of the powerline noise of the destination dataset $X_d$. The originating datasets thereby mimic the powerline noise of the destination dataset. The amplitude and sampling rate of $X_i$ are normalized to match those of $X_d$ for data homogeneity. In other words, 6 originating datasets are merged with 1 destination dataset.
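The normalization step described above can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function names, the use of linear interpolation for resampling, and the peak-based amplitude matching are all assumptions made for the example, and a practical resampler would also apply an anti-aliasing filter.

```python
from typing import List


def resample_linear(signal: List[float], src_rate: float, dst_rate: float) -> List[float]:
    """Resample by linear interpolation (illustrative; no anti-aliasing filter)."""
    if len(signal) < 2:
        return list(signal)
    duration = (len(signal) - 1) / src_rate           # seconds covered by the samples
    n_out = int(round(duration * dst_rate)) + 1       # samples needed at the new rate
    out = []
    for n in range(n_out):
        pos = min(n * src_rate / dst_rate, len(signal) - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1.0 - frac) + signal[hi] * frac)
    return out


def match_amplitude(signal: List[float], target_peak: float) -> List[float]:
    """Scale the signal so that its peak magnitude equals target_peak (assumed criterion)."""
    peak = max(abs(v) for v in signal)
    if peak == 0.0:
        return list(signal)
    scale = target_peak / peak
    return [v * scale for v in signal]


def normalize_to_destination(x_i: List[float], rate_i: float,
                             rate_d: float, peak_d: float) -> List[float]:
    """Bring an originating signal to the destination sampling rate and amplitude."""
    return match_amplitude(resample_linear(x_i, rate_i, rate_d), peak_d)
```

For instance, `normalize_to_destination([0.0, 1.0, 2.0, 3.0], 4.0, 8.0, 6.0)` doubles the sampling rate and rescales the peak from 3 to 6.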

Optimized Complete Ensemble Empirical Mode Decomposition and Wavelet Packet Transform
Empirical mode decomposition (EMD) and its variants have demonstrated effectiveness in handling nonstationary and nonlinear time-series signals. They have received increasing attention based on the number of publications since 2007. Ensemble empirical mode decomposition (EEMD) was proposed in [28], which introduced Gaussian white noise (GWN) to address 2 major issues of EMD: mode mixing, which affects the further decomposition of other modes, and amplitude variation in a mode. However, EEMD has inadequacies in terms of computational cost, spectral separation of modes, and reconstruction errors. This inspired the proposal of complete ensemble empirical mode decomposition (CEEMD) [29] and improved CEEMD (ICEEMD) [30]. The controlled coefficients of the signal-to-noise ratio (SNR) in CEEMD and ICEEMD were fixed [29,30] and can be further improved by customization (via optimization).
In our work, the requirement of powerline noise transformation is to minimize the powerline noise of the originating datasets so that new powerline noise (based on destination datasets) can be added. In addition, it is desirable to maximize the noise generated by the electric appliance because it is a useful characteristic for feature extraction in the ELD model. Hence, the research problem of powerline noise transformation can be formulated as a multi-objective optimization problem called optimized complete ensemble empirical mode decomposition (OCEEMD). It helps to capture the temporal resolutions and frequency components of the signal. The signal is expressed as various intrinsic mode functions (IMFs) and a residual. Furthermore, the output of OCEEMD performs second-phase decomposition by WPT. The rationale of WPT is to emphasize time components and to characterize orthogonality, smoothness, and localization properties [31,32]. To summarize, we combined OCEEMD and WPT as OCEEMD-WPT, which captures both the time and frequency components of signals.
The mathematical formulations of OCEEMD and WPT are explained in Sections 2.3.1 and 2.3.2, respectively.

Optimized Complete Ensemble Empirical Mode Decomposition
We consider the originating datasets $X_i$, $i \in [1, M]$, where $M$ is the total number of originating datasets. We define $X_i = [x_i(1), \dots, x_i(L_i)] \in \mathbb{R}^{N}$, where $L_i$ is the length of $X_i$. Using OCEEMD, $X_i$ is decomposed into $K$ IMFs, $IMF_{ik}(t)$ for $k = 1, \dots, K$, and a residual, where $K$ is the total number of IMFs.
For each realization $j$, a GWN-masked signal is formed from $x_i(t)$, the GWN $w_i^j(t)$, and the controlled coefficient of the SNR $\alpha_{ik}$, which is to be optimized. First, the first IMF $IMF_{i1}(t)$ and the residual $r_{i1}$ are computed using $EMD(\cdot)$, the basic EMD decomposition function. The decomposition is then repeated for the subsequent modes and stops when the residual $r^j_{ik}(t)$ has one extremum. The original signal $x_i(t)$ can be reconstructed from all IMFs and the last residue $r_{i,final}$.
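For orientation, the recursion of the standard complete ensemble EMD with adaptive noise can be written in the notation of this section as below. This is a sketch under the assumption that OCEEMD keeps the standard recursion and changes only how the coefficients $\alpha_{ik}$ are chosen; the ensemble size $J$, the operator $E_k(\cdot)$ (which returns the $k$-th mode obtained by EMD), and the index $\alpha_{i0}$ for the first stage are introduced here for illustration, and the exact formulas of the paper may differ (for example, in how per-realization residuals are handled).

```latex
\begin{align*}
x_i^{j}(t) &= x_i(t) + \alpha_{i0}\, w_i^{j}(t), \qquad j = 1, \dots, J \\
IMF_{i1}(t) &= \frac{1}{J} \sum_{j=1}^{J} E_1\!\left( x_i^{j}(t) \right), \qquad
r_{i1}(t) = x_i(t) - IMF_{i1}(t) \\
IMF_{i,k+1}(t) &= \frac{1}{J} \sum_{j=1}^{J}
  E_1\!\left( r_{ik}(t) + \alpha_{ik}\, E_k\!\left( w_i^{j}(t) \right) \right) \\
r_{i,k+1}(t) &= r_{ik}(t) - IMF_{i,k+1}(t) \\
x_i(t) &= \sum_{k=1}^{K} IMF_{ik}(t) + r_{i,final}(t)
\end{align*}
```

The last line is the reconstruction property stated in the text: summing all IMFs and the final residue recovers $x_i(t)$ exactly, because each residual is defined by subtracting the mode just extracted.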

Wavelet Packet Transform
The results of $IMF_{ik}(t)$ are further decomposed and extended. The extended version of $IMF_{ik}(t)$ is $IMF_{ik}(t)^e$ (of length $L_e$) and is given by the following:
$IMF_{ik}(t)^e = [IMF_{ik,0}, \dots, IMF_{ik,L_e}]$ (9)
The low-pass filter $h_{low} = [h_{low,0}, \dots, h_{low,L_{low}-1}]$ of length $L_{low}$ is applied to $IMF_{ik}(t)^e$ to obtain the general form of the approximated WPT coefficients with $h_{low}$. Likewise, the high-pass filter $h_{high} = [h_{high,0}, \dots, h_{high,L_{high}-1}]$ of length $L_{high}$ is defined and applied to obtain the general form of the approximated WPT coefficients with $h_{high}$. For the selection of wavelets, typical Daubechies wavelets (D2-20) were selected for analysis.
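The filter-and-downsample structure described above can be illustrated with a small wavelet packet decomposition. The sketch below is not the implementation used in the paper: it assumes the Haar filter pair (the shortest Daubechies wavelet, D2), a periodic extension of the signal, and an even signal length at every level. The function names and the sign convention of the high-pass filter are illustrative choices.

```python
import math
from typing import List, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)
H_LOW = [INV_SQRT2, INV_SQRT2]     # Haar (D2) low-pass analysis filter
H_HIGH = [INV_SQRT2, -INV_SQRT2]   # Haar (D2) high-pass analysis filter


def filter_and_downsample(signal: List[float], taps: List[float]) -> List[float]:
    """Filter with the given taps and keep every second output sample.

    Indices wrap around, which corresponds to a periodic extension of the signal.
    """
    n = len(signal)
    return [
        sum(taps[m] * signal[(2 * k + m) % n] for m in range(len(taps)))
        for k in range(n // 2)
    ]


def wpt_split(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Split one node into its low-pass and high-pass children."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    return filter_and_downsample(signal, H_LOW), filter_and_downsample(signal, H_HIGH)


def wavelet_packet(signal: List[float], levels: int) -> List[List[float]]:
    """Return the 2**levels leaf nodes; unlike the plain DWT, both children are split."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = wpt_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes
```

Because the Haar pair is orthonormal, the total energy of the leaf nodes equals the energy of the input, which is a convenient sanity check when swapping in longer Daubechies filters.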
As mentioned above, the controlled coefficients of the SNR $\alpha_{ik}$ must be optimized. We formulated the optimization problem as a multi-objective optimization problem with two objective functions, in which $F_1$ is the kurtosis and $F_2$ is the residual difference. In these objectives, $E\{\cdot\}$ is the expected value, and $\bar{\gamma}$ and $\sigma_{\gamma}$ are the average and standard deviation, respectively, of the wavelet coefficients $\gamma$. A reference-point-based multi-objective evolutionary algorithm following the NSGA-II framework (NSGA-III) [33,34] was adopted to solve the multi-objective optimization problem. NSGA-III has advantages in solving the optimization problem with smaller population sizes, thus lowering the computation time, enhancing the diversity of the new population based on the reference points, and using adaptive allocation of reference points depending on the Pareto-optimal front. The flow of the NSGA-III-based OCEEMD-WPT is shown in Figure 2.
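The kurtosis objective can be illustrated with the usual fourth standardized moment, $E\{(\gamma - \bar{\gamma})^4\} / \sigma_{\gamma}^4$. The sketch below assumes this population (biased) definition without subtracting 3; the exact normalization used for $F_1$ in the paper is not recoverable from the text, and the function name is illustrative.

```python
from typing import Sequence


def kurtosis(coeffs: Sequence[float]) -> float:
    """Fourth standardized moment of the coefficients (population form, no -3 offset)."""
    n = len(coeffs)
    if n == 0:
        raise ValueError("coeffs must not be empty")
    mean = sum(coeffs) / n
    variance = sum((c - mean) ** 2 for c in coeffs) / n
    if variance == 0.0:
        raise ValueError("kurtosis is undefined for constant coefficients")
    fourth_moment = sum((c - mean) ** 4 for c in coeffs) / n
    return fourth_moment / variance ** 2
```

Coefficients dominated by a few large values (impulsive content) give a higher kurtosis than evenly spread coefficients, which is why kurtosis is a common indicator of how peaked a set of wavelet coefficients is.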
The pseudo-code of NSGA-III is summarized in Algorithm 1. The reference points are predefined and uniformly distributed on a hyperplane to ensure the convergence of solutions. The algorithm adopts a set of reference directions (rays starting from the origin and pointing towards the reference points) to maintain diversity among solutions. The goal of a multi-objective evolutionary algorithm is to seek a Pareto solution set that is evenly distributed, well extended, and converged. Regarding the association of the population with reference points, there are two possibilities: (i) if only one member of the population is associated with the reference point, the reference point is ignored in the current generation, and (ii) if more than one member of the population is associated with the reference point, the member with the shortest perpendicular distance is included.
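The association step can be sketched for the two-objective case as follows. This is a simplified illustration rather than the paper's Algorithm 1: it assumes the objective vectors have already been normalized, generates the reference points uniformly on the line $f_1 + f_2 = 1$, and covers only the perpendicular-distance association, not the subsequent niching. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def reference_points_2d(divisions: int) -> List[Tuple[float, float]]:
    """Uniformly spaced reference points on the line f1 + f2 = 1."""
    return [(i / divisions, 1.0 - i / divisions) for i in range(divisions + 1)]


def perpendicular_distance(point: Sequence[float], direction: Sequence[float]) -> float:
    """Distance from a point to the ray through the origin along `direction`."""
    norm_sq = sum(d * d for d in direction)
    scale = sum(p * d for p, d in zip(point, direction)) / norm_sq  # projection length
    return math.sqrt(sum((p - scale * d) ** 2 for p, d in zip(point, direction)))


def associate(points: Sequence[Sequence[float]],
              refs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """For each point, return (index of the closest reference direction, distance)."""
    result = []
    for p in points:
        dists = [perpendicular_distance(p, r) for r in refs]
        best = min(range(len(refs)), key=dists.__getitem__)
        result.append((best, dists[best]))
    return result
```

With `refs = reference_points_2d(2)`, the point `(0.4, 0.4)` lies on the middle reference direction, while `(0.9, 0.1)` is associated with the direction towards `(1, 0)`.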

Analysis and Comparison
To evaluate the performance of the proposed NSGA-III-optimized OCEEMD-WPT approach for merging heterogeneous ELD datasets, four studies were conducted: (i) on the performance of NSGA-III-optimized OCEEMD-WPT, (ii) on the contribution of NSGA-III to solving the controlled coefficients, (iii) on the contribution of merging CEEMD and WPT, and (iv) on the performance of the proposed approach in comparison to existing works merging time-series heterogeneous data.
The performance indicator of the powerline noise transformation is based on the average improvement of signal-to-noise ratio (SNR) in dB.
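As a point of reference, an SNR improvement in dB can be computed as in the sketch below. It assumes that a clean reference signal is available, that noise is the difference between a signal and that reference, and that the improvement is the SNR after processing minus the SNR before; the paper's exact evaluation procedure is not specified in the text, and the function names are illustrative.

```python
import math
from typing import Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Average power of a discrete signal."""
    return sum(v * v for v in signal) / len(signal)


def snr_db(clean: Sequence[float], observed: Sequence[float]) -> float:
    """SNR in dB, treating (observed - clean) as the noise."""
    noise = [o - c for c, o in zip(clean, observed)]
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(mean_power(clean) / noise_power)


def snr_improvement_db(clean: Sequence[float],
                       before: Sequence[float],
                       after: Sequence[float]) -> float:
    """Improvement in SNR (dB) achieved by processing `before` into `after`."""
    return snr_db(clean, after) - snr_db(clean, before)
```

Averaging `snr_improvement_db` over all signals of an originating dataset would then give one average-improvement figure per originating and destination pair.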

Performance Evaluation of Proposed Work
Recall that seven heterogeneous ELD datasets are considered, as shown in Table 1. The evaluation of the proposed NSGA-III-optimized OCEEMD-WPT can be formulated as seven destinations (each ELD dataset corresponds to one destination). The experiments were run on a workstation (i7-10850H 2.7-5.1 GHz CPU, NVIDIA Quadro RTX 3000 6 GB GDDR6 GPU, and 64 GB memory). The average computational times for WPT (one execution), CEEMD (one execution), and NSGA-III are 0.001 s, 0.0568 s, and 8.5 min to 1.4 h, respectively. The average improvement in SNR is summarized in Table 2.
Table 2. Average improvement in signal-to-noise ratio (SNR) using the proposed NSGA-III-optimized OCEEMD-WPT.
• The larger the number of classes in the originating ELD dataset, the larger the average improvement in SNR.
• The larger the number of classes in the destination ELD dataset, the larger the average improvement in SNR.

Study on the Contribution of NSGA-III to Solving Controlled Coefficients
The optimal design of the controlled coefficients of the SNR was solved by NSGA-III and compared with the performance based on fixed controlled coefficients (without optimization) [29,30]. Table 3 summarizes the average improvement in SNR of CEEMD-WPT. The results show that CEEMD-WPT yields a smaller average improvement in SNR than the proposed NSGA-III-optimized OCEEMD-WPT. This is attributed to the fixed controlled coefficients in CEEMD-WPT, with which less powerline noise can be eliminated. Comparing the columns between Tables 2 and 3, the proposed approach improves the average SNR by 37.3-47.4%, 43.2-53.6%, 34.0-52.9%, 30.7-50%, 34.0-54.5%, 43.1-52.5%, and 43.9-50.8% for REDD, UK-DALE, WHITED (Germany), WHITED (Austria), WHITED (Indonesia), COOLL, and LIT, respectively. This reflects the need for an optimal design of the controlled coefficients.

Study on the Contribution of Merging Complete Ensemble Empirical Mode Decomposition and Wavelet Packet Transform
To examine the advantages of merging CEEMD and WPT as two stages of decomposition of electricity data, the average improvements in SNR using standalone CEEMD and standalone WPT are summarized in Tables 4 and 5, respectively. Figure 3 presents the range of percentage improvements by the proposed work compared with standalone CEEMD (Figure 3a) and standalone WPT (Figure 3b) as well as those between standalone CEEMD and standalone WPT (Figure 3c). The proposed work achieved the greatest improvements in SNR, followed by standalone CEEMD and standalone WPT.

Conclusions and Future Work
Merging ELD datasets (heterogeneous in nature) provides a larger pool of data for training ELD models. More data availability is advantageous for deep learning-based methods. In this paper, we propose an NSGA-III-based OCEEMD-WPT approach for powerline noise transformation so that heterogeneous ELD datasets can be merged, with the unification of powerline noise. Studies on the necessity of NSGA-III, on combining CEEMD and WPT, and on comparisons with existing works were conducted to confirm the effectiveness of the proposed approach, which enhances the SNR significantly. The results of this research could be beneficial in shifting from total electricity consumption to consumption of individual appliances, which could possibly reduce the number of power-hungry appliances. Current work could be realized by enabling optimal tracking [38] and price control [39] strategies for heterogeneous loads. Secured control can be guaranteed using blockchain-based authentication and authorization [40] and convolutional neural networks [41]. Consequently, climate change as a critical governing factor in the global hydrological cycle could be relieved [42].
Since the current work is the first to consider merging ELD datasets, there are research limitations; thus, we suggest conducting further investigations in the following areas: (i) consideration of more ELD datasets based on a summarized list of datasets from a review article [16]; (ii) evaluation of the performance of the proposed approach and existing works on low-frequency (i.e., aggregated electricity data) ELD datasets; (iii) evaluation of the performance enhancement of deep learning-based models for ELD; and (iv) exploration of alternative approaches to addressing the challenges that arise when the number of classes in the ELD datasets is small.