A Novel Nonintrusive Load Monitoring Approach based on Linear-Chain Conditional Random Fields

In a real interactive service system, a smart meter reads only the total energy consumption; it cannot analyze the internal load components for users. Nonintrusive load monitoring (NILM), a vital part of smart power utilization techniques, provides load disaggregation information that can be further used for optimal energy use. In this paper, we introduce linear-chain conditional random fields (CRFs) to NILM and combine two promising features: current signals and real power measurements. The proposed method relaxes the independence assumption and avoids the label bias problem. Case studies on two open datasets showed that the proposed method can efficiently identify multistate appliances and detect appliances that are not easily identified by other models.


Background
As the core of an interactive service system, smart power utilization is one of the essential components of a smart grid. Its key technologies cover three aspects: advanced metering infrastructure (AMI) standards, systems, and terminal technologies; intelligent two-way interactive operation modes and supporting techniques; and the interaction between the user's electrical environment and energy consumption patterns. In practice, the bottleneck to overcome is that a meter can only read the total energy consumption rather than analyze the internal load components for users. Load monitoring can not only improve the power information collection system and the intelligent power system but also support two-way interactive service and smart power utilization. Nonintrusive load monitoring (NILM), a vital part of smart power utilization techniques, achieves fine-grained tracking of energy consumption and provides load disaggregation information without any intrusive device installation. These data can be further applied to optimize energy conservation strategies.

Literature Review and Motivation
NILM was first proposed by Hart [1], who devised a method for appliance load monitoring that identifies electrical appliances using only the aggregate power consumption data. This method decomposes the aggregated data into the actual power components of each load and avoids cumbersome device installation. Since then, many new methods have been introduced for load disaggregation, such as Bayes [2] and support vector machines (SVMs) [3,4]. Bayes has shown good performance in [...] (Energies 2019, 12, 1797) [...] require the stable power measurements needed for Bayes as well as the exhausting parameter training needed for SVM. Moreover, by quantizing the power probability density function for each load, we can easily identify multistate appliances. We also employ two promising features, current signals and real power measurements, to develop our model. Experimental results on two open datasets demonstrate that the proposed model is feasible for a NILM task.

Contributions
Our main contributions are as follows:
• We proposed a linear-chain CRF model for load disaggregation and achieved an accuracy of 96.04-99.94%, which demonstrates that this method is effective for the NILM task.
• Because we relaxed the independence assumption required by HMM-based models and avoided the label bias problem, the performance is enhanced by 2.21% compared to existing models.
• We combined two promising features, current signals and real power measurements, to build our model, which improved the accuracy of the model significantly.

Methodology
Figure 1 shows the goal of our model: breaking down the aggregate data into the actual power consumption of each appliance. Figure 2 illustrates the main framework of our linear-chain CRF model for NILM. First, the submeter data of each load were used to create the probability density function for each appliance and thereby acquire its working states. Then, the states of the appliances were grouped to tag and segment the smart meter data. Next, our model extracted features over the training set according to the feature templates. Subsequently, the improved iterative scaling (IIS) algorithm was used to train the linear-chain CRF model. Finally, we adopted the Viterbi algorithm to disaggregate the states for each appliance given the aggregate power data.

Probability Mass Functions
Various appliances, such as washing machines, have multiple operating states. A simple on state cannot reflect the real state changes while the appliance is working. To identify the different working states of multistate appliances at a given time, we followed the approach of Stephen [13] and quantized the power probability mass function (PMF) of each appliance. We took the PMF as the probability density function (PDF) of the working states. Figures 3 and 4 show the power PDFs of some appliances in AMPds2 [14] and REDD house 2 [15]. Compared with low power measurements, most probabilities of high power measurements were very low, so we enlarged the y-axis scale appropriately to make them visible.

When the power measurements of an appliance are concentrated in a certain power range, the device is in a specific working state. By examining the power distribution and finding the ranges in which it is concentrated, we can analyze the working states of an appliance. Let $P(n)$ represent the probability of $n$, where $n$ ranges over the possible observed power measurements. Stephen [13] found the power range by capturing the peak, which is defined as the point where the slope on the left in the PDF is positive and the slope on the right is negative, which is to say:

$$P(n-1) < P(n) > P(n+1), \qquad P(n) > \varepsilon,$$

where $\varepsilon$ is used to make sure that probabilities under this value will not be quantized as a state.

However, on the one hand, we considered that the value of $\varepsilon$ was hard to generalize, since it varied across datasets and appliances. Furthermore, this method pays more attention to the peaks with higher probability. These peaks are mainly distributed in the low power measurements, and most of them are noise rather than states. In fact, the high power measurements include some major working states, to which importance should be attached. Therefore, we merged some states with low power measurements and concentrated on the states with high power measurements according to the PDF of each appliance. On the other hand, this approach was designed to identify a load with a finite number of operating states and worked worse when the appliance was a continuously variable device. It was apparent from the PDFs that some appliances, such as dining room plugs and instant hot water units, were not multistate appliances. Thus, quantizing their PDFs into working states was not applicable, and we simply determined the on/off states for this type of appliance. More details are discussed in Section 3.
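The peak rule described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `power_pmf` and `find_peaks`, the 50-unit bin width, the toy readings, and the two threshold values are all assumptions chosen for illustration.

```python
from collections import Counter

def power_pmf(readings, bin_width=50):
    """Empirical PMF over power bins; keys are bin starts (hypothetical binning)."""
    counts = Counter((int(r) // bin_width) * bin_width for r in readings)
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())}

def find_peaks(pmf, bin_width=50, eps=0.05):
    """Return bins n with P(n-1) < P(n) > P(n+1) and P(n) > eps."""
    peaks = []
    for b, p in pmf.items():
        left = pmf.get(b - bin_width, 0.0)    # empty neighbouring bins count as 0
        right = pmf.get(b + bin_width, 0.0)
        if left < p and p > right and p > eps:
            peaks.append(b)
    return peaks

# Toy submeter readings: mostly off, sometimes a low state, rarely a high state.
readings = [1, 2, 3] * 20 + [105, 106, 107] * 8 + [1918, 1919, 1920, 1921] * 4
pmf = power_pmf(readings)
print(find_peaks(pmf, eps=0.05))  # [0, 100, 1900]
print(find_peaks(pmf, eps=0.2))   # [0, 100]
```

In this toy case, raising the threshold from 0.05 to 0.2 drops the rare high-power state, which is the kind of sensitivity to the threshold that the text describes.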

Segmenting Data
CRFs are a framework for segmenting and labeling sequential data. Let $S = \{s_1, s_2, \ldots, s_n\}$ be the label sequence and $P = \{p_1, p_2, \ldots, p_n\}$ be the observation sequence. A graphical structure of linear-chain CRFs is shown in Figure 5, which demonstrates that the input of our model is a series of sequences. Our templates then extract features throughout each chain. Therefore, segmenting the smart meter data is crucial for feature extraction and model performance. CRFs are adept at dealing with a sentence of no more than 20 tokens. Considering that the working state of an appliance an hour or 30 min ago has little effect on its current working state, we segmented the smart meter data into one sequence every 10 min for the AMPds2 dataset and one sequence every minute for the REDD dataset, in line with their different sampling rates (one reading per minute in AMPds2 and one per 3 s in REDD). Each sequence therefore contained 10 tokens for AMPds2 and 20 tokens for REDD, which made our model perform more efficiently compared with other segmentation methods.
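As a rough illustration of this windowing, the sketch below derives the number of tokens per sequence from the window length and sampling period, then cuts a reading stream into consecutive chunks. The helper names are hypothetical, and keeping a shorter trailing chunk is an assumption; the text does not say how leftover readings were handled.

```python
def tokens_per_sequence(window_seconds, sample_period_seconds):
    """Number of readings that fall into one sequence."""
    return window_seconds // sample_period_seconds

def segment(readings, n_tokens):
    """Split readings into consecutive, non-overlapping sequences of n_tokens.

    The final sequence may be shorter (an assumption made for this sketch).
    """
    return [readings[i:i + n_tokens] for i in range(0, len(readings), n_tokens)]

ampds2_tokens = tokens_per_sequence(10 * 60, 60)  # 10 min window, 1 reading/min
redd_tokens = tokens_per_sequence(60, 3)          # 1 min window, 1 reading/3 s
print(ampds2_tokens, redd_tokens)                 # 10 20

chunks = segment(list(range(25)), ampds2_tokens)
print([len(c) for c in chunks])                   # [10, 10, 5]
```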

Extracting Features
Let $Y = \{y_1, y_2, \ldots, y_n\}$ be the label sequence, $X = \{x_1, x_2, \ldots, x_n\}$ be the observation sequence, and $P(y|x)$ represent the linear-chain CRF. The probability of assigning a tag sequence $Y$ to a given observation sequence $X$ is then defined as follows [16]:

$$P(y|x) = \frac{1}{Z(x)} \exp\Big(\sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i)\Big), \tag{4}$$

$$Z(x) = \sum_{y} \exp\Big(\sum_{i,k} \lambda_k t_k(y_{i-1}, y_i, x, i) + \sum_{i,l} \mu_l s_l(y_i, x, i)\Big), \tag{5}$$

where $t_k$ is a transition feature function depending on the current state $i$ and previous state $i-1$ in the label sequence given the observation sequence; $s_l$ is a state feature function depending on the current state $i$ in the label sequence, which is also viewed as a local feature function; $Z(x)$ is the normalization factor, summed over all label sequences; and $\lambda = \{\lambda_k\}$, $\mu = \{\mu_l\}$ are real-valued parameter vectors, which hold the weights of the corresponding $t_k$ and $s_l$ functions and can be learned by our model.

We defined the feature functions $t_k$ and $s_l$ using feature templates. A feature template has the form of a single state $S_n$ or some combination of the current state and previous states $S_{n-k} \ldots S_n$. For example, assume that we have the power measurement sequence 1919, 1918, 1921, 106, 107, 105, 106, 2, 3, 1. The corresponding state sequence is 2, 2, 2, 1, 1, 1, 1, 0, 0, 0. A single-state template $S_n$ refers to a series of state functions $s_{nj}$, where $n$ is the position of the current token and $j$ indexes the appliance states. Let the current token be the fifth one; then, we define:
$s_{51}$: if (state = 1 and power measurement = 107) return 1 else return 0;
$s_{52}$: if (state = 2 and power measurement = 107) return 1 else return 0;
$s_{53}$: if (state = 0 and power measurement = 107) return 1 else return 0.
Similarly, templates of the form $S_{n-k} \ldots S_n$ represent several transition functions $t_{nj}$, where $n$ is the position of the current token, $n-k$ is the position of the previous token, and $j$ indexes the appliance states. Let $k = 1$; then, we construct the following functions:
$t_{51}$: if (state = 1 and power measurements = 106, 107) return 1 else return 0;
$t_{52}$: if (state = 2 and power measurements = 106, 107) return 1 else return 0;
$t_{53}$: if (state = 0 and power measurements = 106, 107) return 1 else return 0.
The whole process is shown in Figure 6.

Our model constructs $L \times N$ feature functions according to the feature templates designed, where $L$ represents the number of output types and $N$ represents the number of expanded features. In practice, many feature functions are constructed. For example, in our experiments, 4,704,668 feature functions were produced for five loads (CWE, DWE, FRE, HPE, and WOE) in AMPds2. This excess of feature functions increased the complexity of our model and made subsequent training and testing difficult. In fact, some measurements in the dataset were inaccurate or completely noisy, which made the feature functions built on these measurements unnecessary. We found that such feature functions occurred much less frequently than normal ones. Therefore, we ignored the functions with fewer than three occurrences, which reduced the complexity greatly.
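To make the template expansion and frequency pruning concrete, the following sketch instantiates the two template types on the ten-token example sequence from the text. The tuple encoding of a feature, and the names `extract_features` and `prune`, are assumptions for illustration only; the actual expansion in the experiments is performed by CRF++.

```python
from collections import Counter

def extract_features(powers, states):
    """Count instantiated feature functions for templates S_n and S_{n-1} S_n."""
    feats = Counter()
    for i, (p, s) in enumerate(zip(powers, states)):
        feats[("S", s, p)] += 1                   # state feature: (state, power)
        if i > 0:
            feats[("T", s, powers[i - 1], p)] += 1  # k = 1: (state, prev power, power)
    return feats

def prune(feats, min_count=3):
    """Drop features that occur fewer than min_count times."""
    return {f: c for f, c in feats.items() if c >= min_count}

powers = [1919, 1918, 1921, 106, 107, 105, 106, 2, 3, 1]
states = [2, 2, 2, 1, 1, 1, 1, 0, 0, 0]
feats = extract_features(powers, states)

print(feats[("S", 1, 107)])         # 1, the active s_51 at the fifth token
print(feats[("T", 1, 106, 107)])    # 1, the active t_51 at the fifth token
print(prune(feats, min_count=2))    # {('S', 1, 106): 2}
```

On a sequence this short almost every feature occurs once, so the threshold of three used in the text would remove all of them; the pruning only becomes useful on a full training set, where rare features tend to come from noisy readings.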

Improved Iterative Scaling (IIS) Algorithm
Formulas (4) and (5) define the primary form of linear-chain CRFs. The parameters λ k and µ l are the corresponding weights to be estimated from the training set. From Formula (4), we can easily discover that the definition of P(y|x) is similar to a maximum entropy model. Actually, the CRF model is motivated by the principle of maximum entropy. Thus, we could apply the IIS algorithm of the maximum entropy model for parameter learning.
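To simplify, let there be $M_1$ transition feature functions and $M_2$ state feature functions, $M = M_1 + M_2$, defined as

$$f_k(y_{i-1}, y_i, x, i) = \begin{cases} t_k(y_{i-1}, y_i, x, i), & k = 1, 2, \ldots, M_1 \\ s_l(y_i, x, i), & k = M_1 + l;\ l = 1, 2, \ldots, M_2 \end{cases}$$

$$f_k(y, x) = \sum_{i} f_k(y_{i-1}, y_i, x, i), \qquad \omega_k = \begin{cases} \lambda_k, & k = 1, 2, \ldots, M_1 \\ \mu_l, & k = M_1 + l;\ l = 1, 2, \ldots, M_2 \end{cases}$$

With $\omega = (\omega_1, \omega_2, \ldots, \omega_M)^T$ and $F(y, x) = (f_1(y, x), f_2(y, x), \ldots, f_M(y, x))^T$, the CRF can be written in normalized form using the inner product of the vectors $\omega$ and $F(y, x)$:

$$P_w(y|x) = \frac{\exp(\omega \cdot F(y, x))}{Z_w(x)}, \qquad Z_w(x) = \sum_{y} \exp(\omega \cdot F(y, x)). \tag{10}$$

Given the empirical distribution $\tilde{P}(x, y)$, the log-likelihood function $L_{\tilde{P}}(P_w)$ of the conditional probability distribution $P(y|x)$ is defined as:

$$L_{\tilde{P}}(P_w) = \log \prod_{x,y} P_w(y|x)^{\tilde{P}(x,y)} = \sum_{x,y} \tilde{P}(x, y) \log P_w(y|x).$$

When $P(y|x)$ is defined as (10), the log-likelihood function can be derived as follows:

$$L(\omega) = \sum_{x,y} \tilde{P}(x, y)\, \omega \cdot F(y, x) - \sum_{x} \tilde{P}(x) \log Z_w(x).$$

Assuming the current vector $\omega = (\omega_1, \omega_2, \ldots, \omega_M)^T$, the IIS algorithm tries to find a better vector $\omega + \delta = (\omega_1 + \delta_1, \omega_2 + \delta_2, \ldots, \omega_M + \delta_M)^T$, which increases the value of the log-likelihood function. According to Adam [16], the IIS algorithm finds the increment vector $\delta = (\delta_1, \delta_2, \ldots, \delta_M)^T$ by solving the renewal equations for the transition feature functions (14) and the state feature functions (15):

$$\sum_{x,y} \tilde{P}(x, y) \sum_{i} t_k(y_{i-1}, y_i, x, i) = \sum_{x,y} \tilde{P}(x) P(y|x) \sum_{i} t_k(y_{i-1}, y_i, x, i) \exp(\delta_k T(y, x)), \tag{14}$$

where $k = 1, 2, \ldots, M_1$; $y_i$ and $y_{i-1}$ refer to the current and previous states; and $y$ ranges over all label sequences.

$$\sum_{x,y} \tilde{P}(x, y) \sum_{i} s_l(y_i, x, i) = \sum_{x,y} \tilde{P}(x) P(y|x) \sum_{i} s_l(y_i, x, i) \exp(\delta_k T(y, x)), \tag{15}$$

where $k = M_1 + l$; $l = 1, 2, \ldots, M_2$; and $T(y, x)$ is the summation of all feature functions:

$$T(y, x) = \sum_{k=1}^{M} f_k(y, x) = \sum_{k=1}^{M} \sum_{i} f_k(y_{i-1}, y_i, x, i).$$

The complete IIS algorithm is shown in Algorithm 1 below.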
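The vector form of the model can be checked on a toy problem by brute force. The sketch below is an illustration under stated assumptions, not the training code: it represents $F(y, x)$ as a dictionary of feature counts, uses standard label-pair transition features and label-observation state features, and enumerates every label sequence to compute the normalizer, which is only feasible for very short sequences. All names and weights are hypothetical.

```python
import itertools
import math

def global_features(y, x):
    """F(y, x): each local feature summed over all positions of the sequence."""
    F = {}
    for i, (yi, xi) in enumerate(zip(y, x)):
        F[("s", yi, xi)] = F.get(("s", yi, xi), 0) + 1   # state feature
        if i > 0:
            key = ("t", y[i - 1], yi)                    # transition feature
            F[key] = F.get(key, 0) + 1
    return F

def score(w, y, x):
    """Inner product w . F(y, x); missing weights count as 0."""
    return sum(w.get(k, 0.0) * v for k, v in global_features(y, x).items())

def prob(w, y, x, labels):
    """P_w(y|x) = exp(w . F(y, x)) / Z_w(x), with Z_w(x) computed by enumeration."""
    Z = sum(math.exp(score(w, cand, x))
            for cand in itertools.product(labels, repeat=len(x)))
    return math.exp(score(w, y, x)) / Z

labels = [0, 1]
x = ["low", "high", "high"]          # discretized observations (illustrative)
w = {("s", 0, "low"): 1.0, ("s", 1, "high"): 2.0, ("t", 1, 1): 0.5}

total = sum(prob(w, y, x, labels) for y in itertools.product(labels, repeat=len(x)))
print(round(total, 6))               # 1.0
print(prob(w, (0, 1, 1), x, labels) > prob(w, (0, 0, 0), x, labels))  # True
```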

Viterbi Algorithm
The Viterbi algorithm for CRF prediction is similar to the one for HMM. Given an observation sequence $x$, the prediction task is to find the label sequence $y^*$ with the maximum probability:

$$y^* = \arg\max_y P_w(y|x) = \arg\max_y \frac{\exp(\omega \cdot F(y, x))}{Z_w(x)} = \arg\max_y \exp(\omega \cdot F(y, x)) = \arg\max_y\, (\omega \cdot F(y, x)). \tag{17}$$

Therefore, the prediction problem for the CRF is converted to $\max_y (\omega \cdot F(y, x))$. The Viterbi algorithm is shown in Algorithm 2 below.
Algorithm 2. Viterbi algorithm for CRF prediction.
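Because the score $\omega \cdot F(y, x)$ decomposes over positions, the maximization can be done by dynamic programming. The following is a minimal sketch of Viterbi decoding under assumed conventions: weights are stored in a dictionary keyed by `("s", label, observation)` for state features and `("t", previous_label, label)` for transition features, and missing weights are treated as 0. It is an illustration, not a reproduction of the paper's Algorithm 2.

```python
def viterbi(x, labels, w):
    """Return the label sequence maximizing the summed feature weights."""
    # delta[l]: best score of any label prefix that ends in label l
    delta = {l: w.get(("s", l, x[0]), 0.0) for l in labels}
    back = []                                   # back-pointers, one dict per step
    for xi in x[1:]:
        new_delta, ptr = {}, {}
        for l in labels:
            best_prev = max(labels, key=lambda p: delta[p] + w.get(("t", p, l), 0.0))
            new_delta[l] = (delta[best_prev]
                            + w.get(("t", best_prev, l), 0.0)
                            + w.get(("s", l, xi), 0.0))
            ptr[l] = best_prev
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda l: delta[l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

w = {("s", 0, "low"): 2.0, ("s", 1, "high"): 2.0, ("t", 0, 0): 0.5, ("t", 1, 1): 0.5}
print(viterbi(["low", "low", "high", "high"], [0, 1], w))  # [0, 0, 1, 1]
```

The cost is proportional to the sequence length times the square of the number of labels, instead of the exponential cost of enumerating every label sequence.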

Data
The tests were conducted using real monitoring data from AMPds2 [14] and REDD house 2 [15]. The AMPds2 dataset collected the electricity usage of a Canadian family for two years, with a sampling frequency of one reading every minute. It monitored 24 appliances, but only 21 were kept, because no data were detected for the removed appliances during the entire measurement time. There were just a few missing or erroneous readings in the dataset, and an algorithm was used to populate the missing data so that the whole dataset was contiguous. This facilitated the division of sequences in subsequent model training. In terms of electricity data, AMPds2 provided 11 measurements: voltage, current, frequency, displacement power factor, apparent power factor, real power, real energy, reactive power, reactive energy, apparent power, and apparent energy, which made it easy to select different features for improving model performance. Developed specifically for load disaggregation, the REDD dataset gathered real power consumption in several homes over several months, with a sampling frequency of approximately one reading every 3 s. In our experiments, we only used the data of house 2 in the REDD dataset, which included 10 types of equipment: lighting, refrigerator, dishwasher, washer-dryer, bathroom GFI, kitchen outlets, oven, microwave, electric heat, and stove.

Experimental Setup
First, we segmented the smart meter data into one sequence every 10 min for the AMPds2 dataset and one sequence every 1 min for the REDD dataset, as discussed in Section 2.2. We chose the power measurements and current signals in AMPds2 and the power measurements alone in REDD for disaggregation. Then, we designed several templates for feature extraction. Table 1 shows the list of the feature templates used in our experiments. Among them, Templates 1 and 2 refer to the single power signature in REDD house 2 and AMPds2, respectively, while Template 3 represents the double signatures, power and current, in AMPds2. Extracting features is meaningful only over a continuous period of time, because such features directly reflect the influence of previous states. If the time interval between two measurements is too large, for example 30 min, then it is not necessary to construct a transition function for these two measurements, because the state of an appliance half an hour ago has little effect on the current state. However, the timestamps in REDD house 2 were not continuous, so we located the gaps and segmented the data only within the continuous stretches.

Next, CRF++ [17] was used to build our model. CRF++ is an open-source CRF tool for annotating and segmenting sequential data, which is easy to use and customizable. We removed the feature functions that occurred fewer than three times to further reduce the complexity of our model, as described in Section 2.3. Additionally, a hyper-parameter C needs to be selected in CRF++ to balance overfitting against underfitting. We found through cross-validation that the optimal value was 1.5. All our work was carried out in Python 3 and C++. We also used 10-fold cross-validation to acquire the best error estimation.
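For readers unfamiliar with CRF++, the sketch below shows how segmented sequences could be written in its column-based training format (one token per line, the label in the last column, a blank line between sequences) and how the two settings mentioned above map onto the `crf_learn` options `-f` (feature frequency cut-off) and `-c` (the hyper-parameter C). The helper names, file names, and sample values are hypothetical, and the column layout is an assumption; the actual templates are those listed in Table 1, which are not reproduced here.

```python
def to_crfpp(sequences):
    """Format sequences of (power, current, state) tokens as CRF++ training text."""
    blocks = []
    for seq in sequences:
        # One token per line; columns are tab-separated, label last.
        blocks.append("\n".join(f"{p}\t{c}\t{s}" for p, c, s in seq))
    return "\n\n".join(blocks) + "\n"   # blank line separates sequences

def crf_learn_command(template, train, model, min_freq=3, cost=1.5):
    """Build the crf_learn argument list (run it with subprocess if CRF++ is installed)."""
    return ["crf_learn", "-f", str(min_freq), "-c", str(cost), template, train, model]

# Illustrative values only: (power, current, state) per token.
sequences = [[(1919, 16.0, 2), (1918, 16.0, 2)], [(2, 0.1, 0)]]
print(to_crfpp(sequences))
print(" ".join(crf_learn_command("template.txt", "train.txt", "model.bin")))
# crf_learn -f 3 -c 1.5 template.txt train.txt model.bin
```

CRF++ treats each column value as a discrete symbol, so raw readings written this way become distinct feature values; any binning or rounding before export would be a separate design choice.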

Evaluation Metrics
In our paper, let Acc be the accuracy, T be the number of correct predictions, and F be the number of incorrect predictions. Then, Acc is defined as

$$Acc = \frac{T}{T + F}.$$

This metric has been adopted by many researchers, such as Stephen [13] and Kolter [15]. However, we do not think that this indicator alone can properly reflect the performance of the model. Therefore, we adopted a new evaluation indicator: total load accuracy. Let $x_i$, $i = 1, 2, \ldots, N$ be the appliances monitored in the house, $l_j$, $j = 1, 2, \ldots, M$ be the observation sequences, and TAcc be the total loads' accuracy. We employed the following notation:

$$\delta(x_i, l_j) = \begin{cases} 1 & \text{if the predicted state of appliance } i \text{ at } j \text{ is the same as the real state} \\ 0 & \text{otherwise} \end{cases}$$

Then, the total loads' accuracy TAcc is defined as

$$TAcc = \frac{1}{M} \sum_{j=1}^{M} \prod_{i=1}^{N} \delta(x_i, l_j).$$

Each load has one state at any given time; the total loads' accuracy is the fraction of time points at which the states of all appliances are assigned correctly. We included this index in the evaluation because we believed that it could reflect the overall prediction ability of our model for the whole house. However, the resulting accuracy values were generally lower than the results in other papers, which only considered a single appliance's on/off accuracy.
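The difference between the two metrics can be seen in a small sketch. It assumes predictions and ground truth are stored as dictionaries mapping each appliance to a time-aligned list of states; the function names and the sample data are hypothetical.

```python
def accuracy(pred, true):
    """Acc = T / (T + F) for a single appliance."""
    correct = sum(p == t for p, t in zip(pred, true))
    return correct / len(true)

def total_load_accuracy(pred, true):
    """Fraction of time points at which every appliance state is predicted correctly."""
    appliances = list(true)
    n_points = len(true[appliances[0]])
    all_correct = sum(
        all(pred[a][j] == true[a][j] for a in appliances)
        for j in range(n_points)
    )
    return all_correct / n_points

true = {"refrigerator": [1, 1, 0, 0], "microwave": [0, 1, 1, 0]}
pred = {"refrigerator": [1, 1, 0, 1], "microwave": [0, 0, 1, 0]}

print(accuracy(pred["refrigerator"], true["refrigerator"]))  # 0.75
print(accuracy(pred["microwave"], true["microwave"]))        # 0.75
print(total_load_accuracy(pred, true))                       # 0.5
```

Each appliance is right three times out of four, but the errors fall at different time points, so only half of the time points have every state correct. This is why the total loads' accuracy tends to be lower than single-appliance accuracy.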

Experiment Results and Analysis
To better test the accuracy of our linear-chain CRF model, we chose seven appliances in REDD house 2: lighting, stove, microwave, washer-dryer, refrigerator, dishwasher, and disposal. Figure 7 illustrates the seven loads' on-duration accuracy in REDD house 2. Obviously, the refrigerator showed the best score, while the disposal scores were very low. The low accuracy results were due to there being less disposal data working in the training sets. We also found that the power measures of the washer-dryer were mainly distributed from 0 to 10 all the time, which was purely low for a normal washer-dryer and similar to other appliances' off state. Thus, our model mostly tagged the washer-dryer working when the measurements varied from 1 to 10, which made the accuracy results higher. We inferred that the washer-dryer in REDD house 2 did not work and the measurements were completely noisy. We extracted some test sequences, as shown in Figure 8. It illustrates the real state changes of the electrical appliances which were working within a period of 150 s, as well as the inference results of our linear-chain CRF model according to the same data. It is clear that our model worked when different electrical appliances were used at the same time. Nevertheless, errors may have occurred when the power's values of different working states of electrical appliances were similar. For example, during the period from 100 to 150 s, the total power decreased because the refrigerator stopped We extracted some test sequences, as shown in Figure 8. It illustrates the real state changes of the electrical appliances which were working within a period of 150 s, as well as the inference results of our linear-chain CRF model according to the same data. It is clear that our model worked when different electrical appliances were used at the same time. Nevertheless, errors may have occurred when the power's values of different working states of electrical appliances were similar. 
For example, during working. However, our model identified that the light and microwave stopped working while the refrigerator started working. Figure 8. Comparison between appliances' real and estimate states. Figure 9 shows each test's total loads' accuracy in REDD house 2. Among them, 1 load refers to the refrigerator only; 2 loads mean the refrigerator and microwave; 3 loads stand for the kitchen outlets, microwave, and dishwasher; 4 loads indicate the lighting, microwave, washer-dryer, and refrigerator; 5 loads represent the refrigerator, lighting, dishwasher, microwave, and stove; 6 loads denote the lighting, stove, microwave, refrigerator, dishwasher, and disposal; 7 loads show the lighting, stove, microwave, washer-dryer, refrigerator, dishwasher, and disposal. We can see that the correct rate of accurate prediction of all electrical appliances at each moment was over 88% throughout the test time. This indicates that our model could correctly reflect the working state of all electrical appliances in the house tested at any time, not just for a single device. As our linear-chain CRF model can combine more than one feature, we chose the current signals together with power measurements for parameter learning to test whether it promotes the performance of our model or not. Further, we hoped to estimate how well our model disaggregates loads in other datasets. We used five loads (including CDE, DWE, FRE, HPE, and WOE) in AMPds2 and verified the accuracy of a single power feature and double features. Figure 10 shows that using dual features can improve the efficiency to some extent. When it is challenging for the classifier to judge the state of the appliance only by the power value, the multiple features can provide an   Figure 9 shows each test's total loads' accuracy in REDD house 2. 
Among them, 1 load refers to the refrigerator only; 2 loads mean the refrigerator and microwave; 3 loads stand for the kitchen outlets, microwave, and dishwasher; 4 loads indicate the lighting, microwave, washer-dryer, and refrigerator; 5 loads represent the refrigerator, lighting, dishwasher, microwave, and stove; 6 loads denote the lighting, stove, microwave, refrigerator, dishwasher, and disposal; 7 loads show the lighting, stove, microwave, washer-dryer, refrigerator, dishwasher, and disposal. We can see that the correct rate of accurate prediction of all electrical appliances at each moment was over 88% throughout the test time. This indicates that our model could correctly reflect the working state of all electrical appliances in the house tested at any time, not just for a single device. working. However, our model identified that the light and microwave stopped working while the refrigerator started working. Figure 8. Comparison between appliances' real and estimate states. Figure 9 shows each test's total loads' accuracy in REDD house 2. Among them, 1 load refers to the refrigerator only; 2 loads mean the refrigerator and microwave; 3 loads stand for the kitchen outlets, microwave, and dishwasher; 4 loads indicate the lighting, microwave, washer-dryer, and refrigerator; 5 loads represent the refrigerator, lighting, dishwasher, microwave, and stove; 6 loads denote the lighting, stove, microwave, refrigerator, dishwasher, and disposal; 7 loads show the lighting, stove, microwave, washer-dryer, refrigerator, dishwasher, and disposal. We can see that the correct rate of accurate prediction of all electrical appliances at each moment was over 88% throughout the test time. This indicates that our model could correctly reflect the working state of all electrical appliances in the house tested at any time, not just for a single device. 
As our linear-chain CRF model can combine more than one feature, we used the current signals together with the power measurements for parameter learning to test whether this improves the performance of our model. Further, we wanted to estimate how well our model disaggregates loads in other datasets. We used five loads (CDE, DWE, FRE, HPE, and WOE) in AMPds2 and compared the accuracy obtained with a single power feature against that obtained with double features. Figure 10 shows that using dual features can improve the accuracy to some extent. When it is challenging for the classifier to judge the state of an appliance from the power value alone, the additional feature provides an inferential basis by supplying other state parameters. For example, our model performed much better in identifying the wall oven using double features, improving on the single-feature result by 32.49%.
Figure 10. (b) Histogram charts of the accuracy for five loads. "Single" means that only power measurements were used, while "Double" means that power measurements and current signals were both included.
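The difference between the "Single" and "Double" settings can be sketched as a difference in the per-time-step observation features passed to a linear-chain CRF. The following is only an illustration under assumptions: the feature names, the first-difference features, and the toy readings are hypothetical and are not the feature functions used in the paper.

```python
from typing import Dict, List, Sequence


def step_features(power: Sequence[float], current: Sequence[float], t: int,
                  use_current: bool = True) -> Dict[str, float]:
    """Observation features for time step t (hypothetical feature set)."""
    feats = {"bias": 1.0, "power": power[t]}
    if t > 0:
        # Change in power since the previous step (illustrative assumption).
        feats["power_delta"] = power[t] - power[t - 1]
    if use_current:
        feats["current"] = current[t]
        if t > 0:
            feats["current_delta"] = current[t] - current[t - 1]
    return feats


def sequence_features(power: Sequence[float], current: Sequence[float],
                      use_current: bool = True) -> List[Dict[str, float]]:
    """One feature dictionary per time step of the observation sequence."""
    if len(power) != len(current):
        raise ValueError("power and current must have the same length")
    return [step_features(power, current, t, use_current)
            for t in range(len(power))]


# Toy readings (made up): real power in W and current in A.
power = [120.0, 125.0, 1500.0]
current = [1.0, 1.1, 12.5]
single = sequence_features(power, current, use_current=False)  # power only
double = sequence_features(power, current, use_current=True)   # power + current
print(sorted(single[1]))
print(sorted(double[1]))
```

With the current features included, two states that draw similar power but different current become separable by the observation features, which is the intuition behind the dual-feature setting.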
Stephen [13] used a sparse HMM and reported excellent results. Thus, an experiment was conducted to assess the performance of the proposed linear-chain CRFs. We used REDD house 2 to test the performance of each model. Stephen divided their tests into three categories: Denoised, Noisy, and Modeled. Our tests belonged to the Noisy configuration, which neither removes the noise in the aggregate observation sequences nor tries to model the noise as a load [13]. Therefore, the Noisy configuration is the most realistic configuration for testing. We found that the use of different datasets and measurement metrics made it nearly impossible to compare different algorithms. Thus, we used the same datasets and measurement metrics as recommended in Stephen's paper.
Firstly, we identified the working states of each load by quantizing its PMF. Stephen [13] quantized both power and current observations. We quantized only the power measurements, because they were sufficient to describe the working states of each appliance. Tables 2 and 3 show Stephen's and our results for the state quantization of some appliances in the AMPds2 dataset. '\' indicates that the appliance does not have the corresponding working state. Stephen's results classify the low-power operating states of an appliance in detail while grouping all high-power operating states into one state. Hence, the quantization results generated by Stephen are not reasonable. In contrast, we roughly clustered the low-power readings of each appliance into one operating state while dividing the high-power operating states in detail, which is more in line with the actual working states of the appliances.
Next, we compared different appliance combinations in REDD house 2; the results are shown in Table 4. The combinations are the same as in Figure 9.
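As an illustration of quantizing a load's PMF into working states, the sketch below builds an empirical PMF over power bins, takes its peaks as states, and assigns each reading to the nearest peak. The bin width, the minimum-probability threshold, the peak rule, and the toy readings are assumptions made for this example; they are not the quantization procedure of the paper or of [13].

```python
from collections import Counter
from typing import Dict, List, Sequence


def empirical_pmf(readings: Sequence[float],
                  bin_width: float = 10.0) -> Dict[int, float]:
    """Empirical PMF over power bins: bin index -> probability."""
    if not readings:
        raise ValueError("readings must be non-empty")
    counts = Counter(int(r // bin_width) for r in readings)
    return {b: c / len(readings) for b, c in counts.items()}


def pmf_peaks(pmf: Dict[int, float], min_prob: float = 0.05) -> List[int]:
    """Bins that are local maxima of the PMF and carry at least min_prob mass."""
    peaks = [b for b, p in pmf.items()
             if p >= min_prob
             and p >= pmf.get(b - 1, 0.0)
             and p > pmf.get(b + 1, 0.0)]
    return sorted(peaks)


def quantize(readings: Sequence[float], peaks: Sequence[int],
             bin_width: float = 10.0) -> List[int]:
    """Map each reading to the index of the nearest peak (state 0 = lowest power)."""
    if not peaks:
        raise ValueError("at least one peak is required")
    centers = [(b + 0.5) * bin_width for b in peaks]
    return [min(range(len(centers)), key=lambda k: abs(r - centers[k]))
            for r in readings]


# Toy power readings in W (made up): off, a low-power state, a high-power state.
readings = [0.0, 2.0, 1.0, 3.0, 120.0, 125.0, 121.0, 122.0,
            1500.0, 1505.0, 1502.0, 0.0, 1.0]
pmf = empirical_pmf(readings)
peaks = pmf_peaks(pmf)
states = quantize(readings, peaks)
print(peaks)
print(states)
```

How finely the low-power and high-power regions are split depends on the bin width and threshold chosen, which is the design choice contrasted in Tables 2 and 3.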
Our disaggregation results were slightly better than Stephen's, demonstrating that a basic linear-chain CRF model performs better, especially in the case that includes the kitchen outlets (three loads). Most common algorithms cannot deal with kitchen outlets because their power values change irregularly according to the appliances plugged into them. By extracting previous states, our model could improve the accuracy to some extent. However, our model scored lower than the sparse HMM in the four-load case, which involves the washer-dryer. We found that the power value of the washer-dryer in REDD house 2 was excessively low. Thus, compared with the HMM, which only uses the information of the last state, our model was more prone to errors.
Table 4. Accuracy comparison between the linear-chain CRF model and other algorithms.

Load\Acc (%) | Linear-Chain CRFs | Sparse HMM | SVM-rbf | SVM-Linear | SVM-Sigmoid
In addition, the proposed method was compared with algorithms that are not based on probabilistic graphical models. We chose the SVM with three different kernels: the radial basis function (rbf) kernel, the linear kernel, and the sigmoid kernel. Several parameters had to be chosen carefully, because values that are too high or too low can affect the results considerably and may lead to local maxima or overfitting. "C" is the penalty parameter for all three kernels, and "gamma" is the kernel parameter of the rbf and sigmoid kernels. We employed a grid search to find the best parameters on a small subset of the data. We then used the best parameters to train the model on the whole training set and tested its performance on the test data. The best accuracy was obtained when C = 1.0 and gamma = 1.0. The accuracy results are shown in Table 4. The rbf kernel was clearly more suitable for identifying appliances in REDD house 2 than the linear and sigmoid kernels. Moreover, the accuracy of the SVMs tends to decrease as the number of appliances grows, while our model remained reliable. In fact, the total-load accuracy also declines as the number of appliances increases, as shown in Figure 9. However, by extracting a large number of state-change characteristics of the appliances, the recognition accuracy for most appliances can still be very high.
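The grid search over "C" and "gamma" described above can be sketched generically as follows. This is a minimal illustration, not the setup used in the experiments: `placeholder_score` is a hypothetical stand-in for training an SVM on the small subset and returning its validation accuracy, and the candidate grids are made up.

```python
import math
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(score_fn: Callable[[float, float], float],
                c_values: Iterable[float],
                gamma_values: Iterable[float]) -> Tuple[Dict[str, float], float]:
    """Evaluate every (C, gamma) pair and return the best pair with its score."""
    best_params, best_score = None, float("-inf")
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if score > best_score:
            best_params, best_score = {"C": c, "gamma": gamma}, score
    if best_params is None:
        raise ValueError("the parameter grid is empty")
    return best_params, best_score


def placeholder_score(c: float, gamma: float) -> float:
    """Hypothetical stand-in for a validation accuracy; not a real SVM result.

    In practice this function would train the SVM with the given parameters
    on the small subset and return its accuracy on held-out data.
    """
    return -((math.log10(c) - 1.0) ** 2 + (math.log10(gamma) + 1.0) ** 2)


# Made-up logarithmic candidate grids.
best_params, best_score = grid_search(placeholder_score,
                                      c_values=[0.1, 1.0, 10.0, 100.0],
                                      gamma_values=[0.01, 0.1, 1.0, 10.0])
print(best_params)
```

The selected pair would then be used to retrain the model on the whole training set before evaluating on the test data, as described in the text.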