Improving Residential Load Disaggregation for Sustainable Development of Energy via Principal Component Analysis
Sustainability 2020, 12, 3158

Effective planning and operation of the energy system require a sustainability assessment of the system, in which the adopted load model is the most important factor. Information about the energy consumption patterns of appliances allows consumers to manage their energy consumption efficiently. Non-intrusive load monitoring (NILM) is an effective tool for recognizing power consumption patterns from data measured at the meter. In this paper, an unsupervised approach based on dimensionality reduction is applied to identify the power consumption patterns of home electrical appliances. This approach can be used to classify household activities of daily life using data measured by home smart electricity meters. In the proposed method, the power consumption curves of the electrical appliances, treated as high-dimensional data, are mapped to a low-dimensional space while preserving the highest data variance via principal component analysis (PCA). The reference energy disaggregation dataset (REDD), which consists of real-world measurements recorded at low frequency, is used to verify the proposed method. The presented results demonstrate the accuracy and efficiency of the proposed method in comparison with conventional NILM procedures.


Introduction
Energy is one of the most important aspects of industrial and economic development in all countries. Future energy systems must be equipped to provide sustainable, affordable, and reliable energy, and to give consumers the ability to support sustainable development. Effective use of energy and energy efficiency are essential for sustainable development [1]. Therefore, monitoring energy consumption and planning energy conservation in buildings are considered part of energy management for sustainable development. Due to the rising costs and environmental impacts of energy consumption, the importance of energy conservation and planning is growing significantly [2]. Today, policies to reduce CO2 emissions from energy sources are among the major concerns of environmental experts. On the other hand, the energy demand of consumers is increasing exponentially and is projected to double by 2030. Therefore, several studies have been conducted on the effective management of energy supply and demand [3,4]. Nowadays, smart electricity meters are widely installed and used in homes and other places worldwide. According to research, it is estimated that by the end of 2020, approximately 72% of European homes will have smart electricity meters installed [5,6]. With the advances in smart electricity metering technologies, consumers are aware of their energy consumption patterns over days, weeks, or months. Adding features such as power/energy consumption information to home appliances will make consumers "smart" users of energy. In addition to monitoring the energy consumption of the entire home, they can monitor the energy consumption of each device [7].
Load monitoring, or energy disaggregation, is a very effective and useful step in energy management, because information about the active loads of a grid is valuable to the energy management system. The load of home appliances can be monitored in two ways: intrusive load monitoring (ILM) and non-intrusive load monitoring (NILM). In the ILM method, a sub-meter is attached to each appliance. This method is expensive and inconvenient, because an ILM-based system that reads and records data for several appliances needs a large number of sensors, which incur a prohibitive cost, and it does not respect consumer privacy. In contrast, NILM, or load disaggregation, analyzes the aggregated power consumption data to extract each appliance's individual power consumption from its power consumption pattern, with no need for data-recording sensors on the appliances (Figure 1) [8,9]. In NILM, all measured power consumption is processed in the smart meter. This process continues until the required information about when and how much each home electrical appliance consumes has been calculated. Smart electricity meters have the capability to record the total customer energy consumption of the building. The information that smart meters record about home power consumption offers several benefits in the areas of energy, trade, and economics, such as improving short- and long-term forecasts of demand profiles, load forecasting, providing consumers with detailed feedback on their energy consumption, designing demand management plans, and measuring and validating the energy efficiency plans of buildings. Therefore, the development of load disaggregation techniques that can recognize an individual appliance's signal signature by reading the total power consumption has emerged as an interesting research topic in academic and industrial fields [10,11].
Data collection, event discovery, pattern recognition, and appliance identification are the four principal steps of an NILM system. Identifying electrical appliances operating at the same time in a home is the core of the NILM system [12].
There are many methods for addressing the NILM problem. Some of these methods are based on numerical indices, classical methods, and optimization. In some studies, different approaches to load disaggregation based on hidden Markov models (HMM) are used to model each appliance. In [13], segmented integer quadratic constraint programming is used to solve the load disaggregation problem. In [14], the load disaggregation problem is solved via increasable factorial approximate maximum a posteriori. In [15], an event-based load disaggregation method is suggested, in which multiple signatures including distortion, active, and reactive powers are used. An information coding perspective on load disaggregation is proposed in [16], in which appliances with similar power draws are recognized. In [17], the segmented integer quadratic programming problem is suggested to improve the NILM problem. Recognition of the simultaneous on and off states of multiple devices is handled by a cepstrum-smoothing-based load disaggregation method in [18]. Optimization-based methods for solving the NILM problem are proposed in [9,19–21].
On the other hand, data mining methods are widely used to solve energy management problems, and some works have applied them to the NILM problem. These methods are usually divided into two types: supervised and unsupervised. The main difference between the two lies in how the features inherent in the data are learned: unsupervised methods do not need labeled training data to learn these features.
Supervised applications such as artificial neural networks, support vector machines, deep learning, feature learning, etc., use the training dataset of each appliance to identify and extract the features and build a feature dictionary [22–28]. In [23], a deep long short-term memory (LSTM) recurrent network is used to classify the types of electrical appliances into a set. A convolutional neural network (CNN) for recognizing multi-state appliances is suggested in [24], in which low-frequency power measurements are used. In [24], a support vector machine (SVM) is used to improve NILM problems, with K-means used to reduce the SVM training set size. A deep convolutional neural network is used in [26] to implement a practical data reinforcement technique with the need of sub-metering for new unseen houses, which makes a post-processing technique to solve the NILM problem. In [27], load disaggregation based on deep learning methods is proposed, in which deep dictionary learning and deep transform learning techniques are used. The transform learning method is also proposed in [28] for solving the NILM problem.
Unsupervised methods [29–31] extract features directly from power consumption datasets. In [32], graph-based signal processing (GSP) load disaggregation is developed without the need for training. NILM based on unsupervised learning is proposed in [33], in which the fuzzy clustering algorithm called entropy index constraints competitive agglomeration (EICCA) is improved and utilized for solving the load disaggregation problem.
In this paper, a transparent unsupervised approach based on dimensionality reduction is used to improve residential load disaggregation via a visual process. Here, the power consumption curve of each home electrical appliance is treated as a vector in a high-dimensional space. These high-dimensional curves are reduced to low-dimensional ones via principal component analysis (PCA) in order to disaggregate them. The proposed method does not require training specific networks on training data to identify appliance characteristics, so it avoids the loss of accuracy that inaccurate learning can cause. The proposed method uses a feature space to transfer data from a high-dimensional space to a low-dimensional one. Because every household electrical appliance has its own consumption pattern, the inherent characteristics and patterns of each appliance can be obtained by extracting the eigenvalues and eigenvectors of its consumption curves.
To apply the proposed method, low-frequency power consumption readings at the meter from the REDD dataset [34] are used. This data shows the real-world power consumption of several home electrical appliances. To obtain the best results, transient-state information for each appliance is considered, because the selected operating state of each appliance has a great impact on the disaggregation.
The rest of this paper is structured as follows: PCA is described in Section 2. Section 3 describes the case study in detail. The application of PCA to the data, the experimental results, and the load disaggregation results obtained with the proposed method are presented in Section 4. Finally, Section 5 concludes the paper.

Principal Component Analysis
In statistical analysis, principal component analysis (PCA) was introduced by Hotelling in 1933 as a tool for reducing the dimension of data. PCA is a convenient and useful method for compressing images and reducing the dimensions of high-dimensional data, and it is commonly applied to pattern recognition and feature extraction in big data [35,36]. The fundamental idea of PCA is to find an orthogonal linear model that projects the high-dimensional data onto a low-dimensional space spanned by the principal components (PCs), while maximizing the variance of the data and minimizing the mean squared reconstruction error [37,38]. Achieving this first requires calculating the covariance matrix (CM) and then obtaining its eigenvalues and eigenvectors. In this paper, PCA is used to identify the eigenvalues and eigenvectors of the power consumption curves of home electrical appliances and to re-display the curves in a low-dimensional space. Suppose that each power consumption curve in the database is represented by a column vector F_i of length n, whose n entries are the features of the curve in the original space. For m such vectors F_i, a data matrix F of size n × m can be defined by taking the vectors as its columns [39]. The data matrix F can then be transformed into a low-dimensional space using PCA by multiplying it by H^T, where the columns of the scheme matrix H are formed by the eigenvectors of the data matrix F, and H^T is the transpose of H.
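In symbols, the data matrix and its projection described above can be sketched as follows. This is a reading in standard PCA notation rather than a reproduction of the original typesetting; the symbol P for the projected data is taken from the later reference to matrix P.

```latex
F = [\, F_1, F_2, \ldots, F_m \,] \in \mathbb{R}^{n \times m},
\qquad
P = H^{T} F
```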
The steps of PCA are as follows [39]:
• estimation of the CM,
• eigen-decomposition of the CM and selection of the k largest eigenvalues,
• construction of the feature matrix I from the corresponding eigenvectors, and
• mapping of the original power consumption curves to the k-dimensional vector space by applying I.
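The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration on random toy data, not the authors' implementation; the function name pca_project, the explicit mean-centering, and the 1/m normalization of the covariance are assumptions made for the example.

```python
import numpy as np

def pca_project(F, k):
    """Project the columns of F (n features x m curves) onto the top-k PCs."""
    # Center each feature (row); assumed here so the covariance is about the mean.
    mean = F.mean(axis=1, keepdims=True)
    Fc = F - mean
    # Step 1: estimate the covariance matrix (n x n), with an assumed 1/m factor.
    C = Fc @ Fc.T / F.shape[1]
    # Step 2: eigen-decomposition; eigh returns eigenvalues in ascending order,
    # so sort descending and keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:k]
    # Step 3: feature matrix whose columns are the k leading eigenvectors.
    feature = eigvecs[:, order]
    # Step 4: map the curves into the k-dimensional space (k x m).
    P = feature.T @ Fc
    return P, eigvals[order], feature, mean

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 8))  # toy data: 8 curves with 6 points each
P, top_vals, feature, mean = pca_project(F, k=2)
print(P.shape)
```

For curves with thousands of points the n × n covariance matrix becomes large; a singular value decomposition of the centered data is a common alternative in that case.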
Considering m power consumption vectors, the CM is obtained from the products of the vectors with their transposes [40], where the superscript T denotes the transpose of a vector. A specific analysis of the CM requires solving its eigenvalue equation [39,40], in which I and λ denote the identity matrix and the eigenvalues, respectively (this I is distinct from the feature matrix I of the steps above). For zero-mean data, the total variance of the elements of the original matrix (dataset) is equal to the sum of the eigenvalues [38]. After the transformation, the variance of the i-th element equals λ_i. To determine an adequate number of PCs for discriminating home electrical appliances, the accumulative contributory ratio (ACR) is a useful parameter [41]. If the eigenvalues are sorted in descending order, the ACR of the first k PCs, γ_k, is the ratio of the sum of the k largest eigenvalues to the sum of all eigenvalues. Having obtained the CM and computed its eigenvalues, we arrange the eigenvectors in descending order of their eigenvalues. To create the feature space, it is necessary to find the first k PCs for which γ_k exceeds 0.85 [41]. Placing the eigenvectors of these k principal components in a matrix forms the feature matrix I. As the final result of PCA, matrix P is obtained from Equation (6).
Every electrical appliance has its own unique consumption pattern, but most techniques for solving the NILM problem lose some of the features of the power consumption curves. By detecting the intrinsic structure and nature of the data, PCA can disaggregate each electrical appliance's share of the total home power consumption. Figure 2 illustrates the flowchart of the proposed method for load disaggregation. Figure 3 shows, step by step, the basic schematic diagram of the work done in this paper.
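Written out, the quantities in this paragraph take the following standard forms. This is a sketch under stated assumptions: the vectors F_i are taken to be zero-mean, the 1/m normalization of the CM is assumed, the eigenvalue equation is given in its characteristic form, and the feature matrix is written as I_f (a symbol introduced here only to keep it apart from the identity matrix I). The last line is the projection that the text appears to refer to as Equation (6).

```latex
\begin{align*}
C &= \frac{1}{m} \sum_{i=1}^{m} F_i F_i^{T} \\
\det\left( C - \lambda I \right) &= 0 \\
\gamma_k &= \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i} \\
P &= I_f^{T} F
\end{align*}
```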

Case Study
In this paper, experiments were performed on the REDD dataset. REDD contains low-frequency data for six homes in Massachusetts, USA, including the total power consumption of each home and the power consumption of each individual electrical device in the home [34]. Given that the mains are sampled at 1 Hz, we used a 3 s interval (1/3 Hz) at both the mains and plug levels so that the plug readings match the mains readings for on-line disaggregation. Because this data reflects real-world use, it has been used in most NILM studies. To apply the proposed method, data from three houses (REDD houses 1, 2, and 3) were used. Table 1 presents the types of household electrical appliances that the proposed method was able to identify.
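Bringing 1 Hz mains readings onto a 3 s grid can be illustrated as follows. The text does not say whether readings are averaged or simply decimated, so averaging over non-overlapping 3 s groups is an assumption here, and the function name and the sample values are hypothetical.

```python
def downsample_mean(readings, factor=3):
    """Average non-overlapping groups of `factor` samples.

    A trailing group with fewer than `factor` samples is dropped.
    """
    usable = len(readings) - len(readings) % factor
    return [sum(readings[i:i + factor]) / factor
            for i in range(0, usable, factor)]

# Seven made-up 1 Hz mains readings in watts.
mains_1hz = [100.0, 102.0, 98.0, 250.0, 251.0, 249.0, 90.0]
mains_3s = downsample_mean(mains_1hz)
print(mains_3s)  # [100.0, 250.0]
```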

Experimental Results
To disaggregate the load and assess the consumption of each electrical appliance from the total power consumption of the whole house, it is necessary to identify and extract the features and consumption patterns of the appliances. In this paper, this extraction is done using PCA.
The proposed method requires a database as input. We used the power consumption curves of the electrical appliances presented in Table 1 as inputs. In this database, the power consumption curve of a two-day (2880 min) operation of each appliance was considered as one sample of that appliance. Four samples from each appliance (power consumption for the first eight days of each house) were considered as inputs. Figure 4 illustrates the power consumption data of the appliances in REDD House 1 used as input.
After collecting the database, the steps were performed as presented in the flowchart of Figure 2. For more accuracy, load disaggregation was conducted based on dimension reduction. This method retains, in the principal components, the directions with the highest variance (largest eigenvalues) of the power consumption curves. The five highest calculated eigenvalues, and the ACRs computed for them, are given in Table 2. It can be seen that γ_k exceeded 0.90 using two PCs. Thus, the power consumption curves of home electrical appliances in high-dimensional space could be reduced to vectors in two-dimensional space while preserving the highest variance. Figure 5 shows the results of the separation of the power consumption pattern of each electrical appliance in the studied homes of the REDD dataset via PCA. It can be seen that the proposed method was able to disaggregate the power consumption patterns of electrical appliances in two-dimensional space by extracting the power consumption curve features.
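The construction of the input samples can be sketched as below: an eight-day series is cut into four consecutive two-day windows of 2880 points each. One reading per minute is assumed so that a window has 2880 points; the function name and the synthetic series are hypothetical.

```python
def split_into_samples(series, sample_len=2880, n_samples=4):
    """Cut the start of `series` into consecutive, non-overlapping windows."""
    needed = sample_len * n_samples
    if len(series) < needed:
        raise ValueError("series is shorter than the requested windows")
    return [series[i * sample_len:(i + 1) * sample_len]
            for i in range(n_samples)]

# Synthetic stand-in for one appliance: eight days at one reading per minute.
minutes_in_8_days = 8 * 24 * 60
series = [float(t % 60) for t in range(minutes_in_8_days)]
samples = split_into_samples(series)
print(len(samples), len(samples[0]))  # 4 2880
```

Each window then becomes one column of the data matrix passed to PCA.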
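The choice of the number of PCs from the ACR can be illustrated with a short calculation. The eigenvalues below are made up for the example and are not the values of Table 2; the function names are hypothetical.

```python
def accumulative_contribution_ratios(eigenvalues):
    """Return gamma_1..gamma_n for eigenvalues sorted in descending order."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    ratios, running = [], 0.0
    for v in vals:
        running += v
        ratios.append(running / total)
    return ratios

def smallest_k(eigenvalues, threshold):
    """Smallest k whose ACR strictly exceeds `threshold`."""
    ratios = accumulative_contribution_ratios(eigenvalues)
    for k, gamma in enumerate(ratios, start=1):
        if gamma > threshold:
            return k
    return len(ratios)

toy_eigenvalues = [6.0, 3.5, 0.3, 0.15, 0.05]  # illustrative only
ratios = accumulative_contribution_ratios(toy_eigenvalues)
k = smallest_k(toy_eigenvalues, threshold=0.90)
print(k)  # 2
```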
To test the accuracy and efficiency of the proposed method, new samples of the power consumption of each electrical appliance were needed. For this purpose, new samples of the power consumption of each electrical appliance over a two-day period were considered. PCA was used to extract the features of this data, and the test results for the electrical appliances of each house with the new samples are shown in Figure 6. In this figure, black is used to represent each new sample of each electrical appliance.
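One way to picture this test is to project a new curve onto the PCs obtained from the earlier samples and see which appliance's cluster it falls closest to. The sketch below uses synthetic curves; the nearest-centroid rule is an assumption made for illustration (the text reports the result graphically and does not state a decision rule), and all names are hypothetical.

```python
import numpy as np

def fit_pca(F, k):
    """Return the feature mean and the k leading eigenvectors of the covariance."""
    mean = F.mean(axis=1, keepdims=True)
    Fc = F - mean
    eigvals, eigvecs = np.linalg.eigh(Fc @ Fc.T / F.shape[1])
    basis = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return mean, basis

def project(F, mean, basis):
    """Map the columns of F into the low-dimensional PC space."""
    return basis.T @ (F - mean)

def nearest_centroid(point, centroids):
    """Assumed decision rule: pick the appliance with the closest centroid."""
    names = list(centroids)
    dists = [np.linalg.norm(point - centroids[name]) for name in names]
    return names[int(np.argmin(dists))]

rng = np.random.default_rng(1)
t = np.arange(48)
shapes = {  # synthetic consumption shapes in watts
    "fridge": 100.0 + 50.0 * (t % 8 < 4),
    "heater": 1500.0 * (t % 24 < 6),
}
labels, columns = [], []
for name, shape in shapes.items():
    for _ in range(4):  # four noisy reference samples per appliance
        labels.append(name)
        columns.append(shape + rng.normal(scale=5.0, size=t.size))
F = np.column_stack(columns)

mean, basis = fit_pca(F, k=2)
P = project(F, mean, basis)
centroids = {
    name: P[:, [i for i, lab in enumerate(labels) if lab == name]].mean(axis=1)
    for name in shapes
}

# A new, unseen heater curve is projected with the same mean and basis.
new_curve = (shapes["heater"] + rng.normal(scale=5.0, size=t.size)).reshape(-1, 1)
new_point = project(new_curve, mean, basis)[:, 0]
predicted = nearest_centroid(new_point, centroids)
print(predicted)
```

Reusing the stored mean and basis for the new sample keeps old and new points in the same two-dimensional space, which is what allows them to be plotted together.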
From the above figures, it is clearly visible that the proposed method fully recognized the power consumption patterns of the new samples of each home appliance and identified them on the basis of the previous samples. However, the basic principle of NILM is the disaggregation and detection of the power consumption of each electrical appliance from the total power consumption of the home. In this regard, samples of whole-house power consumption taken from the home's smart electricity meter were considered, each of which represented two hours. This data was used as input for PCA. Figure 7 shows plotted samples of the total home power consumption of each dataset. Dimension reduction via PCA was applied to the new samples obtained from the total home consumption. Figure 8 shows the test results of the proposed method using the new data.
The results of load disaggregation for the intended data revealed the performance and accuracy of the proposed method. It was found that PCA was able to efficiently display and distribute the nature and pattern of power consumption of household electrical appliances in a two-dimensional space. Identifying the intrinsic behavior of each electrical appliance with regard to its power consumption made it possible to accurately disaggregate the total power consumption of an entire house at different hours.
To show the accuracy and efficiency of the proposed method, its results should be compared with those of other methods [12]. This is done by calculating the F-score, the accuracy evaluation metric for the predicted results, from TI, the number of samples that were truly identified, and FI, the number of samples that were falsely identified.
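Since only TI and FI are defined, one reading of the metric is the share of truly identified samples, TI / (TI + FI). That form is an assumption, not a confirmed reproduction of the original equation, and the counts below are made up.

```python
def f_score(truly_identified, falsely_identified):
    """Assumed form of the metric: TI / (TI + FI)."""
    total = truly_identified + falsely_identified
    if total == 0:
        raise ValueError("no samples were evaluated")
    return truly_identified / total

score = f_score(18, 2)  # hypothetical counts
print(f"{100 * score:.1f}%")  # 90.0%
```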
Direct comparisons of results should be made with extreme caution; here, care was taken to ensure that the databases used in all cases were similar. Because this paper used REDD data, comparisons were made with works that previously used this data, and the results of these comparisons are given in Table 3.
Table 3. Performance comparison of the proposed method with other unsupervised solutions for REDD data.

Appliance Identification Method | Remarks | F-Score
Proposed Method | Using all appliances from REDD houses 1, 2, and 3 | 94.68%
Basic NILM [42] | Using all appliances from REDD | 79.7%
Supervised GSP [43] | Using 5 appliances selected from the REDD | 64%
Unsupervised GSP [32] | Using 5 appliances selected from the REDD | 72.2%
Unsupervised HMM [44] | Using 7 appliances selected from the REDD | 62.2%
Unsupervised dynamic time warping (DTW) [45] | Using 9 appliances selected from the REDD | 68.6%
Supervised decision-tree (DT) [45] | Using 9 appliances selected from the REDD | 76.4%
Viterbi algorithm [46] | Using 9 appliances selected from the REDD | 88.1%

Since unsupervised solutions have no information (target) about the features of the input data (power consumption curves), they must be able to extract high-level features of the data in order to perform detection well. The results presented in Table 3 show the accuracy and efficiency of the proposed method compared with other unsupervised methods in identifying the power consumption patterns of household electrical appliances.

Conclusions
To address non-intrusive load monitoring in an efficient and transparent manner, pattern recognition of power consumption time series is very helpful. In this paper, principal component analysis, as an unsupervised approach, was used to extract useful features from power consumption data in order to detect the type of consumer. This approach displays high-dimensional data in a low-dimensional space while preserving the maximum information of the initial data. Extracting the features and recognizing the consumption patterns of each electrical appliance in load disaggregation make it possible for consumers to be aware of their own power consumption pattern in any time period. Low-frequency sampled data from REDD was used to test the proposed method. Power consumption signatures received from each home electrical appliance at different times were considered as input data for PCA. By applying the proposed method to the input data, the power consumption patterns of each electrical appliance were transparently observed in a two-dimensional space. Subsequently, PCA was applied to samples of the total home power consumption for load disaggregation, and the power consumption of the household electrical appliances was estimated from the total power consumption of the home at different times. The clarity and transparency with which the power consumption patterns of different electrical appliances are displayed in a low-dimensional space make the proposed method desirable.