Event Matching Classification Method for Non-Intrusive Load Monitoring

Abstract: Nowadays, energy management aims to propose different strategies to utilize available energy resources, resulting in sustainability of energy systems and development of smart sustainable cities. As an effective approach toward energy management, non-intrusive load monitoring (NILM) aims to infer the power profiles of appliances from the aggregated power signal via purely analytical methods. Existing NILM methods are susceptible to various issues such as noise and transient spikes in the power signal, overshoots at mode transition times, close consumption values of different appliances, and the unavailability of a large training dataset. This paper proposes a novel event-based NILM classification algorithm that mitigates these issues. The proposed algorithm (i) filters power signals and accurately detects all events; (ii) extracts specific features of appliances, such as operation modes and their respective power intervals, from their power signals in the training dataset; and (iii) labels each detected event of the aggregated signal with an appliance mode transition with high accuracy. The algorithm is validated using REDD, with the results showing its effectiveness in accurately disaggregating low-frequency data measured by existing smart meters.


Introduction
Due to the unpredictable nature of both generation, caused by renewable energy resources, and consumer demand, maintaining the balance between generation and demand is one of the main challenges in smart grids [1,2]. Residential demand-side management programs have thus emerged as a promising set of methods to strike such a balance [3]. In view of these programs, non-intrusive load monitoring (NILM), that is, the process of extracting the power profile or operating pattern of each appliance from the aggregated power signal of a house using purely analytical methods, has gained a great deal of attention in recent years [4]. Practical and efficient, NILM provides consumers with an opportunity to track the energy consumption of each appliance and voluntarily change their usage patterns to save energy and reduce cost while maintaining their comfort, which also results in higher stability and efficiency of the power grid [5,6].
The concept of NILM was first introduced in 1992 by Hart [7]. Since then, a variety of analytical algorithms have been proposed to address the NILM problem. These algorithms employ various features and parameters such as voltage, current, and active and reactive power signals of a house. As measuring the active power is cost-efficient, a majority of studies have focused on this feature alone [8]. NILM research based on the active power signal diverged into two main lines of study: (i) state-based algorithms that consider each appliance as a finite-state machine and disaggregate the total power signal based on the learned model of state transitions of appliances [9] and (ii) event-based algorithms, which are based on the edges or considerable variations of the signal caused by turning ON/OFF of appliances or their other mode transitions [3]. Due to the low computational complexity of event-based techniques, they have proved more popular than the state-based ones [10].
Designing an event-based algorithm involves multiple challenges. The first lies in event detection and is caused by the presence of noise, spikes, uncertainties in the voltage of the grid, and overshoots in appliances' power signals. The second challenge is the closeness of different appliances' consumption values, which makes them difficult to distinguish. The third and last challenge is that high-volume training datasets and ground-truth information about each appliance are scarce in practice, although a small amount of data can perhaps be collected for each residential building. To overcome these challenges, this paper proposes an event-based NILM algorithm with competitive accuracy, which first detects events in the power signals via a novel method, then extracts specific features and information about appliances from their consumption profiles in a small training dataset, and finally utilizes them to disaggregate the aggregated power signal.

Related Work
The most well-known state-based NILM algorithms are the Hidden Markov Model (HMM) [11] and its variants such as Factorial HMM methods [12]. The main drawback of these methods is the requirement for a large training dataset to construct and learn the model. The computational complexity of these methods also increases exponentially with each added appliance [13]. In contrast, event-based NILM techniques, which deal with the detected events of the aggregated signal and classify them, have lower computational complexity than state-based ones [14]. Recent research on event-based NILM falls into two main categories: unsupervised and supervised methods [15]. Unsupervised NILM algorithms, tackling the so-called blind source problem, deal with the case where no prior information about appliances is available. In these methods, events are detected and different clustering algorithms such as subtractive clustering [16] and k-means [17] are applied to them. They detect different clusters of appliances without assigning a label to each cluster. Despite some success in the case where all appliances have only two (ON and OFF) modes, these algorithms have been ineffective in dealing with multi-mode appliances [17,18].
In contrast with unsupervised NILM methods, supervised algorithms such as NILM classification algorithms require prior metadata information about the number of appliances and their operation modes as well as a training dataset containing appliances' consumption profiles for a period of time. Considering modes of appliances as class labels, various classification methods such as KNN [19], multi-label classification [20,21], and deep learning [22] have been utilized in this field. These methods have proved to be significantly more accurate than their unsupervised counterparts, particularly in the presence of multi-state appliances [22]. However, their main drawback is the need for an enormous training dataset that is not in general feasible to collect [23]. Therefore, extracting useful information from a small training dataset for the NILM classification problem has become a topic of great interest in the past few years [24].

Contributions
This paper proposes a novel event-based NILM algorithm that minimizes the ground-truth data required, performs well in analyzing real data measured by existing meters, and remains efficient and accurate even for large numbers of appliances. Major contributions of this work are detailed below.
(1) Event-based algorithms are highly dependent on the detection of events. Therefore, the event detection algorithm used for the NILM purpose should be accurate in the sense that it should neither miss any actual event nor mistake fluctuations of the signal for an event. We propose in Section 3 a novel statistics-based method that filters the signal and does not require any predefined threshold.
(2) For NILM as a classification problem, the number of operation modes of appliances and their respective consumption values are key to assigning labels. In most of the existing literature, these modes are obtained by visually analyzing the appliances' power signals in the training dataset. We introduce a clustering approach in Section 4.1, using in part the linkage-Ward algorithm, which automatically extracts appliances' modes and their respective consumption values. Then, in Section 5, a novel classification technique with competitive accuracy is established for the NILM problem, employing the previously extracted features.
(3) Existing classification algorithms consider appliances' consumption values at each mode as their main characteristics. However, an appliance's consumption pattern, its transitions between different modes, its ON duration, and the probability of occurrence of its transitions are also key information that can be used to distinguish two appliances with close consumption values. Analyzing the training dataset in Section 4, we extract these features of appliances and utilize them for label refinement.

Paper Organization
The remainder of this paper is organized as follows. The terminology and problem statement are presented in Section 2. In Section 3, the proposed signal filtering and event detection techniques are described. The feature extraction methods are then detailed in Section 4, followed by the proposed classification method in Section 5. The effectiveness and accuracy of the proposed algorithms are evaluated and compared with other algorithms using the REDD [25] in Section 6. Finally, Section 7 concludes the paper.

Terminology and Problem Statement
In this section, we present the terminology and the event-based NILM classification problem considered in this paper.

Notions and Terminology
In this research, power refers to active power. Operation modes of an appliance refer to a fixed set of modes, including the OFF mode, in which the appliance can operate. Appliances are assumed to have two or more operation modes. The operating mode of an appliance is the mode in which the appliance is operating at a specific point in time. When no ambiguity results, the term mode is used to refer to an operation mode or operating mode. A state of an appliance is defined as its power amount in one of its operation modes. As this amount is assumed to vary at least slightly over time, a state is represented by a fixed closed interval within the set R of real numbers. One notices that there exists a state corresponding to each operation mode of an appliance.
The aggregated power signal, or simply the aggregated signal, refers to the sum of power signals of all appliances of a house or specific appliances of interest. The term non-intrusive load monitoring, or NILM, in this work is then defined as extracting the sequence of operating modes of each appliance from the aggregated signal. This NILM problem is sometimes referred to as the NILM classification problem. Deducing individual power signals of appliances from the aggregated signal is also a NILM problem, which is viewed as a NILM regression problem, but it is not considered in this work.
Given the power signal of an appliance, an event is a change in the signal value caused by a mode transition of the appliance. Similarly, an event of the aggregated signal is a change in the signal value caused by a mode transition of any of the appliances contributing to the aggregated signal.

Event-Based NILM Problem
The event-based NILM (classification) problem can be described as the process of assigning proper labels to events of the aggregated power signal, where the set of labels consists of all appliance mode transitions. It should be noted that a training set in the form of a set or sequence of events and their corresponding labels is in general not immediately available. Instead, it has to be derived from the given individual appliances' power signals over a period of time. No additional information, such as the number of modes of each appliance or their nominal consumption values, is available. It is assumed throughout the paper that the given signals are measured in discrete time.

Signal Filtering and Event Detection
A fundamental part of event-based NILM is detecting events accurately. Event detection should be executed on individual appliances' power signals in the training set as well as on the aggregated signal in the test set. In the vast majority of the literature, an event is detected based on the difference between two consecutive sampled values [6]. More precisely, if the absolute value of this difference is greater than a certain threshold, an event is considered to have occurred between the two sampling times [26]. Existing threshold-based event detection techniques rely heavily on a threshold that is selected manually for the dataset at hand. Thus, they are not expected to perform as well on the meter data of a different residential house. Besides this extensibility issue, there appears to exist a fundamental limit on the accuracy of threshold-based event detection techniques, which is caused by fluctuations of voltage in the power grid and by noise, spikes, and the various ranges of overshoots in the signal, as shown in Figure 1 [27]. In this section, we propose a novel statistics-based algorithm that overcomes these challenges and accurately detects events. While the mainstream view of an event is a significant value change in the signal, our view of an event is an uncommon value change. Thus, considering a set formed based on value changes in the signal, we search for "outliers" of the set. As will be explained later in this section, this set consists of the min/max ratios between consecutive sampled values of the signal, subtracted from 1. Of course, careful consideration should be given to transient spikes and lengthy overshoots during mode transitions, as they would also be outliers of the formed set. Detecting and filtering these spikes and overshoots will also prove significant for obtaining more accurate results in the event-based NILM problem.
Our proposed event detection algorithm consists of three main steps: (1) outlier detection, (2) filtered signal construction, and (3) event detection, which will be discussed in the following subsections.

Outlier Detection
Different fields of research have dealt with the outlier detection problem for a given dataset, and different methods have been proposed to address it [28,29]. We herein extend a statistics-based outlier detection method to suit the NILM problem. The algorithm starts by calculating the min/max ratio between any two consecutive sampled values of the signal P(t) as (1), subtracting each ratio from 1, and saving the results in a vector M as (2).
Finally, for any t, if M(t) is greater than the standard deviation of M, it is considered an outlier, and t is saved as an outlier occurrence instance in the vector M_o of outlier instances. Algorithm 1 illustrates the procedure of the proposed outlier detection method.

Algorithm 1: Proposed algorithm for outlier detection.
Step 0: Initialize the parameters, i = 1, M(:
Step 3: Compute sd as the standard deviation of M
Step 4: Extract outliers' instances based on sd
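As a minimal sketch of the outlier detection described above, the following Python code computes the change ratios and flags the samples whose ratio exceeds the standard deviation. Several details are assumptions rather than part of the original description: the names `change_ratios` and `detect_outliers`, the use of the population standard deviation, the convention that an outlier is reported at the later of the two compared samples, the treatment of two consecutive zero samples as "no change", and non-negative power readings.

```python
from statistics import pstdev


def change_ratios(power):
    """Return 1 - min/max for every pair of consecutive samples.

    Entry k compares power[k] with power[k + 1].
    """
    ratios = []
    for prev, curr in zip(power, power[1:]):
        hi = max(prev, curr)
        lo = min(prev, curr)
        # Assumption: two zero samples mean "no change" (avoids 0/0).
        ratios.append(0.0 if hi == 0 else 1.0 - lo / hi)
    return ratios


def detect_outliers(power):
    """Return sample indices t whose change from t - 1 to t is an outlier."""
    m = change_ratios(power)
    if not m:
        return []
    sd = pstdev(m)  # population standard deviation (an assumption)
    # Entry k of m describes the change into sample k + 1.
    return [k + 1 for k, value in enumerate(m) if value > sd]


# Hypothetical readings (W): a one-sample spike, then an ON/OFF pulse.
power = [0, 0, 0, 500, 0, 0, 100, 100, 100, 100, 0, 0]
outliers = detect_outliers(power)
```

No threshold is supplied by the caller here: the cut-off is derived from the signal itself, which is the property the method relies on.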

Filtered Signal Construction
Having performed outlier detection, outliers' instances are obtained, as well as instances that are not outliers, referred to as inlier instances. In this so-called filtering step, the aim is to flatten spikes and overshoots of the signal. To achieve this aim, the signal value at each outlier instance is substituted with the mean of the signal values at the following consecutive inlier instances, as shown in Figure 2. Note that, unlike for spikes and overshoots, the signal values at actual event times will not experience significant change in the filtering step.

Event Detection
In the final step, to detect events, Algorithm 1 is applied to the filtered signal. Now, all detected outlier instances are considered as event instances. We note that transient spikes and overshoots of the original signal have already been flattened in constructing the filtered signal, meaning that they can no longer be mistaken for events.
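The three steps of this section can be sketched together as follows. This is an illustration under stated assumptions, not a definitive implementation: function names are hypothetical, the population standard deviation is used, an outlier is reported at the later of the two compared samples, and an outlier with no inlier after it is left unchanged.

```python
from statistics import mean, pstdev


def detect_outliers(power):
    """Indices t where 1 - min/max of samples t - 1 and t exceeds the std."""
    m = [0.0 if max(a, b) == 0 else 1.0 - min(a, b) / max(a, b)
         for a, b in zip(power, power[1:])]
    if not m:
        return []
    sd = pstdev(m)
    return [k + 1 for k, value in enumerate(m) if value > sd]


def filter_signal(power):
    """Replace each outlier sample by the mean of the next run of inliers."""
    outliers = set(detect_outliers(power))
    filtered = list(power)
    n = len(power)
    for t in sorted(outliers):
        start = t + 1
        while start < n and start in outliers:   # skip adjacent outliers
            start += 1
        end = start
        while end < n and end not in outliers:   # extend over the inlier run
            end += 1
        if start < n:                            # otherwise keep the sample
            filtered[t] = mean(power[start:end])
    return filtered


def detect_events(power):
    """Events are the outliers of the filtered signal."""
    filtered = filter_signal(power)
    return filtered, detect_outliers(filtered)


# Hypothetical readings (W): a one-sample spike, then a genuine ON/OFF pulse.
power = [0, 0, 0, 500, 0, 0, 100, 100, 100, 100, 0, 0]
filtered, events = detect_events(power)
```

In this toy signal the spike at sample 3 is flattened by the filtering step, so only the two edges of the pulse remain as events.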

Feature Extraction
The most common specific feature of appliances utilized in NILM algorithms is their consumption values at their operation modes. However, as different appliances may have operation modes with close consumption values, an effective load disaggregation algorithm should use additional appliance features, broadly called fingerprints in [30]. In this section, we propose different methods to extract operation modes of appliances and additional features from a small training dataset. The framework of this stage is illustrated in Figure 3.

Modes and States of Appliances
The states of an appliance, defined as the power intervals corresponding to its operation modes, are its most useful features and are widely used for the NILM purpose. Therefore, detecting the number of modes of an appliance and their corresponding states plays a crucial role in the accuracy of NILM. As opposed to most of the existing literature, which extracts an appliance's modes/states visually using its consumption values in the training set or by using the datasheet of the appliance, we propose a novel clustering-based approach that extracts appliance modes/states from the power signal in a systematic fashion.
Our approach is based on the linkage-Ward (LW) clustering algorithm [31]. The objective function of the LW algorithm is the sum of squared distances between data points and their cluster centroids. The LW clustering algorithm first treats each data point as a cluster of its own, which means that the initial value of the objective function is 0. Then, clusters are merged together, one pair at every stage, based on the following merging policy: clusters A and B are merged if ∆(A, B), defined in (4), is a minimum among all pairs of clusters, where m_A, m_B, and m_{A∪B} are the centroids of clusters A, B, and A ∪ B, respectively, and n_A and n_B are the sizes of clusters A and B, respectively. One notices that ∆(A, B), which is non-negative, is in essence the cost of merging clusters A and B. Thus, the value of the objective function at any given stage is the total cost of all the merging up to that stage. Analyzing the increasing objective function, the elbow method [14] is often used to determine the optimal number of clusters, that is, to determine when to stop merging. Roughly speaking, according to the elbow method, merging stops when it becomes too costly compared to the merging at the previous stage.
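For reference, the merging cost can be written out as follows. This is the standard form of Ward's criterion, stated here as an assumption about the definition of ∆(A, B); the symbol x for a data point is introduced only for illustration.

```latex
\Delta(A,B)
  = \sum_{x \in A \cup B} \lVert x - m_{A \cup B} \rVert^{2}
  - \sum_{x \in A} \lVert x - m_{A} \rVert^{2}
  - \sum_{x \in B} \lVert x - m_{B} \rVert^{2}
  = \frac{n_A \, n_B}{n_A + n_B} \, \lVert m_A - m_B \rVert^{2}
```

The factor n_A n_B / (n_A + n_B) is small whenever either cluster is small, so in this form the cost of merging two small clusters stays low even if their centroids are far apart.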
It can be seen from the definition of ∆(A, B) in (4) that the combination of the LW algorithm and the elbow method is susceptible to unbalanced data. More precisely, when the LW algorithm is close to reaching the optimal number of clusters, the elbow method discourages stopping where two or more small clusters exist since merging them is now not relatively "too costly" even though their centroids may be far apart. For the mode/state extraction purpose, this could be a significant issue, as infrequently occurring modes of an appliance may be lumped into one mode with a wide-ranging power, which would undermine any attempt of NILM. Therefore, one should take advantage of the LW algorithm in such a way to suit the NILM purpose by discouraging merging of clusters that are far apart. Thus, the following algorithm is suggested for mode extraction.
After filtering the power signal of each appliance in the training dataset, the LW algorithm is applied to the signal's data considering K clusters, where K ≥ 10. The cluster centroids are then computed and sorted in descending order. From this stage forward, a different merging policy, called the distance-based policy, is adopted that depends only on the distance between cluster centroids. It starts by considering the cluster with the highest centroid as the root cluster. If the distance from its centroid to the next highest centroid is less than 15% of the root cluster's centroid, the two clusters merge and the step is repeated considering the merged cluster as the new root cluster. Otherwise, the cluster with the next highest centroid is considered as the new root cluster and the step is repeated. The algorithm terminates when no further merging can take place. The flowchart of the proposed mode extraction method is illustrated in Figure 4, where the centroids of the root cluster and the cluster with the next highest centroid are denoted as C_r and C_j, respectively.
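The distance-based policy can be sketched as a single pass over the sorted centroids. The sketch below starts from clusters that are assumed to have already been produced by the LW stage; the function name `merge_by_distance`, the `(centroid, size)` representation, and the size-weighted centroid of a merged cluster are assumptions made for illustration, and the input values are hypothetical.

```python
def merge_by_distance(clusters, ratio=0.15):
    """Merge clusters whose centroids lie within `ratio` of the root centroid.

    clusters: iterable of (centroid, size) pairs, in any order.
    Returns the merged (centroid, size) pairs in descending centroid order.
    """
    ordered = sorted(clusters, key=lambda c: c[0], reverse=True)
    if not ordered:
        return []
    merged = []
    root_c, root_n = ordered[0]            # root cluster: highest centroid
    for cent, size in ordered[1:]:
        if root_c - cent < ratio * root_c:
            # Close enough: merge and keep the merged cluster as the root.
            total = root_n + size
            root_c = (root_c * root_n + cent * size) / total
            root_n = total
        else:
            # Too far apart: finalize the root and move down one cluster.
            merged.append((root_c, root_n))
            root_c, root_n = cent, size
    merged.append((root_c, root_n))
    return merged


# Hypothetical LW output: (centroid in W, number of samples).
lw_clusters = [(0, 40), (100, 20), (480, 15), (500, 5), (950, 10), (1000, 10)]
modes = merge_by_distance(lw_clusters)
```

One pass is sufficient in this sketch because a finalized root is never revisited, and any later merging can only lower the centroid of the cluster below it.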

Transition Intervals of Appliances
Consider an arbitrary mode transition of an appliance from a (relatively) high state H = [H_min, H_max] to a (relatively) low state L = [L_min, L_max]. Then, the interval of this transition is defined as (5), where T_L and T_H denote the lowest and highest values of the specific transition T, respectively.
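A small sketch of how such an interval could be computed is given below. It assumes that the bounds follow from interval arithmetic on the two states, that is, T_L = H_min − L_max and T_H = H_max − L_min; this form and the function name `transition_interval` are assumptions made for illustration, and the interval is expressed as a magnitude regardless of the direction of the transition.

```python
def transition_interval(high_state, low_state):
    """Return (T_L, T_H) for a transition between two states.

    Each state is a (min, max) power interval in watts.
    """
    h_min, h_max = high_state
    l_min, l_max = low_state
    # Smallest jump: lowest high value to highest low value, and vice versa.
    return (h_min - l_max, h_max - l_min)


# An ON state of [890, 1000] W against an OFF state of exactly 0 W.
on_off = transition_interval((890, 1000), (0, 0))
# A hypothetical transition between two non-OFF modes.
mode_change = transition_interval((1000, 1100), (400, 450))
```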

Transition Participation Indices
Two appliances with some overlapping transition intervals can be difficult to separate in the aggregated signal. The participation index of their transitions, obtained from each appliance's usage pattern in the training set, may help separate these appliances. Given a transition T of an appliance, its participation index P_par is defined as (6), where N_days is the total number of days of the training dataset in which the transition T happened. In other words, this parameter measures the daily average contribution of a specific transition of an appliance to the events of the total signal.
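The sketch below shows only one possible reading of this verbal description, and it should not be taken as the definition itself: for every training day on which T occurs, it takes the share of that day's events that are labelled T, and then averages these shares over those days. The function name and the input layout are hypothetical.

```python
def participation_index(daily_counts):
    """Average daily share of events attributed to one transition.

    daily_counts: list of (n_transition, n_events) pairs, one per training
    day. Days on which the transition does not occur are excluded, so the
    divisor plays the role of N_days (an assumed reading).
    """
    shares = [n_t / n_all for n_t, n_all in daily_counts
              if n_t > 0 and n_all > 0]
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical counts for three training days.
p_par = participation_index([(2, 10), (0, 8), (1, 4)])
```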

Additional Features of Appliances
Analyzing the power signals of appliances in the training dataset, one observes that some of the appliances exhibit very specific behavioral patterns. Some of these features, which can further help improve the accuracy of load disaggregation, are listed below.
• As shown in Figure 5, the dishwasher exhibits the same pattern whenever it is ON. As this pattern is complex, using it may reduce the efficiency of NILM. However, one notices that on any given day, the dishwasher operates in either all or none of its modes. This simple characteristic will prove significant for NILM.
• One observes that not all conceivable mode transitions of multi-mode appliances actually occur.
• Some appliances appear to have unique mode transition overshoots in their signals.
• The time periods between two consecutive ON samples (with OFF samples in between) differ significantly across appliances.

Classification
In this section, a novel classification algorithm is proposed to address the NILM problem, that is, to determine which appliance mode transition each event in the aggregated power signal corresponds to. The steps of the proposed classification method are displayed in Figure 6. Based on this method, to label a given event, the classifier first obtains all transition intervals containing the value of that event. As a simple example, consider two ON/OFF appliances 1 and 2, whose ON modes correspond to states [T_L^1, T_H^1] = [890, 1000] and [T_L^2, T_H^2] = [970, 1050], respectively. Given an event with value 980 in the aggregated test signal, the classifier first associates this event with the OFF-to-ON transition of appliance 1 as well as that of appliance 2. Afterward, taking other specific features of each appliance into consideration along with the test signal's behavior near the event, a single label is assigned to the event.
Defining N_e and N_T as the total number of events in the test signal and the number of all possible appliance mode transitions, respectively, let L_e be an N_T × N_e binary-valued matrix, with columns corresponding to events and rows corresponding to mode transitions, which represents predicted labels for events of the test signal. An element 1 of L_e indicates that its row's corresponding transition is the predicted label of its column's event. The proposed classification algorithm is detailed in four steps below.
Step 1: Given an event of the test signal and a mode transition of an appliance, if the event value is within the transition interval, the respective element of L_e is set to 1. Otherwise, it is set to 0. As transition intervals may overlap, some events may be assigned multiple labels in this step. Some events may also remain unlabeled, which means that matrix L_e may have some all-zero columns. Each of these unlabeled events is then labeled with the transition whose interval is closest to the event value. The distance between a value and an interval is calculated as the minimum distance between the value and any point within the interval.
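Step 1 can be illustrated with a short sketch that builds the label matrix from a dictionary of transition intervals. The function names, the dictionary layout, and the label strings are hypothetical; the two intervals are those of the example above, and the third event value is chosen so that it falls outside both.

```python
def distance_to_interval(value, interval):
    """Minimum distance between a value and any point of [lo, hi]."""
    lo, hi = interval
    return max(lo - value, 0, value - hi)


def match_events(event_values, transitions):
    """Build the binary label matrix (rows: transitions, columns: events).

    transitions: non-empty dict mapping a label to its (T_L, T_H) interval.
    """
    labels = list(transitions)
    matrix = [[0] * len(event_values) for _ in labels]
    for col, value in enumerate(event_values):
        dists = [distance_to_interval(value, transitions[lab])
                 for lab in labels]
        hits = [row for row, d in enumerate(dists) if d == 0]
        if not hits:
            # Unlabeled event: fall back to the closest interval.
            hits = [dists.index(min(dists))]
        for row in hits:
            matrix[row][col] = 1
    return labels, matrix


transitions = {
    "appliance 1 OFF->ON": (890, 1000),
    "appliance 2 OFF->ON": (970, 1050),
}
labels, l_e = match_events([980, 900, 1200], transitions)
```

In this example the event of value 980 receives both labels, 900 matches only appliance 1, and 1200 matches neither interval and is therefore assigned to the closer one, that of appliance 2.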
Step 2: Analyzing the daily aggregated signal, it can be observed that in the majority of time samples, appliances are in their OFF modes. With that in mind, one should focus on parts of the aggregated signal where at least one appliance is ON. In particular, given the aggregated signal, one obtains its cycles, where each cycle starts with an event succeeding an all-OFF sample and ends with the nearest event preceding an all-OFF sample. Figure 7 illustrates how cycles of an aggregated signal are derived. Over each cycle, the labels assigned to the events should be compatible.
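The derivation of cycles can be sketched as a grouping of event indices. The sketch assumes that an event index points at the first sample taking the new value, that an all-OFF sample is one whose aggregated power does not exceed a hypothetical `off_level`, and that events belonging to a cycle already in progress at the start of the recording are discarded.

```python
def extract_cycles(signal, events, off_level=0.0):
    """Group event indices into cycles delimited by all-OFF samples."""
    cycles, current = [], []
    for t in sorted(events):
        if not current:
            # A cycle may only start right after an all-OFF sample.
            if t > 0 and signal[t - 1] > off_level:
                continue
            current = [t]
        else:
            current.append(t)
            if signal[t] <= off_level:     # back to all-OFF: close the cycle
                cycles.append(current)
                current = []
    return cycles                          # an unfinished cycle is dropped


# Hypothetical filtered aggregated signal (W) and its detected events.
signal = [0, 0, 100, 100, 300, 300, 100, 100, 0, 0, 50, 50, 0]
cycles = extract_cycles(signal, [2, 4, 6, 8, 10, 12])
```

With real meter data the all-OFF level is unlikely to be exactly zero, so `off_level` would have to be chosen from the baseline consumption of the house.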

[Figure: active power (W) versus sample]
To clarify what is meant by labels' compatibility over a cycle, one thinks of an undirected graph in which nodes represent all mode vectors θ, where each element θ_a of θ is a mode of appliance a. Two nodes are then connected by an edge if their corresponding vectors differ in exactly one element. An edge thus represents a single appliance mode transition, while two different edges may correspond to the same transition. Now, a sequence of labels over a cycle is said to be compatible if, starting from the all-OFF node of the graph, one can walk according to the labels in the sequence and terminate at the all-OFF node. In other words, labels/transitions over a cycle are compatible if they form a cycle in the graph constructed above.
Using the compatibility condition described, the multiple labels assigned to some events can be narrowed down, as some labels are deemed inadmissible. More precisely, a label within a cycle is removed if it is not part of any compatible sequence of labels over the cycle. As an example, consider three type I (ON-OFF) appliances with the transition intervals shown in Table 1. Figure 8 shows a specific cycle of this dataset. Based on the event matching classification in Step 1, the first event is assigned two labels: a mode transition of appliance A and a mode transition of appliance C, while the other events are assigned single labels. Figure 9 displays the predicted graph based on assigning different labels to the first event.
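For two-mode appliances, the compatibility check and the resulting label refinement can be sketched by enumerating every combination of candidate labels over a cycle. This is an illustrative simplification: labels are written as hypothetical `(appliance, direction)` pairs with direction +1 for OFF-to-ON and −1 for ON-to-OFF, multi-mode appliances are not handled, and the exhaustive enumeration is only reasonable for short cycles with few multi-labeled events.

```python
from itertools import product


def is_compatible(sequence):
    """True if the walk starts and ends at all-OFF and every step is valid."""
    on = set()
    for appliance, direction in sequence:
        if direction == 1:
            if appliance in on:            # cannot switch ON twice
                return False
            on.add(appliance)
        else:
            if appliance not in on:        # cannot switch OFF what is OFF
                return False
            on.remove(appliance)
    return not on                          # must terminate at all-OFF


def refine_labels(candidates):
    """Keep only labels that occur in at least one compatible sequence."""
    kept = [set() for _ in candidates]
    for sequence in product(*candidates):
        if is_compatible(sequence):
            for slot, label in zip(kept, sequence):
                slot.add(label)
    return kept


# A cycle of four events; the first one has two candidate labels.
candidates = [
    [("A", 1), ("C", 1)],
    [("B", 1)],
    [("A", -1)],
    [("B", -1)],
]
refined = refine_labels(candidates)
```

Here the label of appliance C is dropped from the first event because no later event in the cycle switches C off. If no sequence at all is compatible, every returned set is empty, and a caller would need a fallback such as keeping the Step 1 labels.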
As shown, the mode transition of appliance C is ruled out as a label of the first event since it is not part of any compatible sequence of labels from Step 1 over the cycle. In other words, appliance C cannot possibly be ON when the first event occurs. In this case, to detect the incompatible predicted labels, a label matrix is computed as in Figure 10. In the matrix, the first row and the first column show the values of the transitions and the appliances, respectively. If a positive transition belongs to an appliance, label = 1 is assigned to the corresponding element of the matrix, and for a negative transition, label = −1 is assigned. As each transition is caused by a single appliance, the sum of each column should be 1 or −1. Moreover, if an appliance is turned ON in a cycle, it should be turned OFF in the same cycle, so the sum of each row should equal zero. Otherwise, incompatible labels have been assigned to transitions. As an example, the first column and third row of Figure 10 show that appliance C should not be assigned to the first event. For multi-mode appliances, the sum of rows is obtained based on probable transitions between different operation modes.
Step 3: After Step 2's cycle-based label refinement, each event may still be assigned multiple labels. Considering the extracted specific features of some appliances, as described in Section 4.4, some labels are removed from the multi-labeled events.

Step 4: Finally, based on participation indices of appliance transitions explained in Section 4.3, the most probable label is chosen for the multi-labeled events.

Simulation Study
In this section, the accuracy and effectiveness of our algorithms proposed in Sections 3-5 for the NILM purpose are evaluated by applying them to two low-frequency datasets, the gathering of which is practical as it can be done using existing smart meters [14,32]. To measure the accuracy of the proposed classification, the following evaluation metrics are used [33], where TRP and RC denote the precision and recall, respectively; TP is true positive; FP is false positive; TN is true negative; and FN is false negative. It should be noted that for evaluation purposes, in line with the NILM literature, appliances with more than two operation modes are treated as ON/OFF appliances. In other words, all non-OFF modes of an appliance are lumped into an ON mode.
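As a hedged illustration, the sketch below computes precision, recall, and the F-measure from event counts using their standard definitions. These standard forms are assumed here rather than taken from the text, the function name is hypothetical, and any metric involving TN is left out because its exact form cannot be inferred from the description.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from event counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for one appliance.
trp, rc, f = precision_recall_f(tp=8, fp=2, fn=2)
```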

Evaluation on Residential Energy Disaggregation Dataset (REDD)
The dataset considered consists of 28 days of power data for seven appliances of house 1 in the REDD [25]. These appliances are the oven (OV), microwave (MW), kitchen outlets (KO), bathroom GFI (BGFI), washer/drier (W/D) with a high consumption state, refrigerator (RFG), and dishwasher (DW). The first five listed appliances have only ON and OFF modes, while the last two have more than two operation modes. Figure 11 shows the power signals of all seven appliances in one day of the dataset, along with the total consumption obtained by summing the individual appliances' consumption. Three weeks of data from this dataset are used as the training dataset, and the rest is used as the test dataset. In the following subsections, the proposed filtering and event detection methods are first applied to the training and test datasets. Then, based on the filtered signal of each appliance, its specific features are extracted. Finally, considering these features, the proposed classification technique is utilized to disaggregate the test signal.

Signal Filtering and Event Detection
The filtering and event detection method of Section 3 is applied to individual appliances' power signals in the training dataset as well as the aggregated signal in the test dataset. As an example, Figure 12 illustrates, for the dishwasher over a period of time, the outliers of the signal, the overshoots and spikes in the signal, and the constructed filtered signal.

Feature Extraction
Having individual appliances' signals filtered, their features are extracted via methods of Section 4 as discussed below.

States of Each Appliance:
Applying the LW-based clustering method of Section 4.1 to the filtered signal of each appliance, its modes and their corresponding states are obtained. One recalls that the merging policy is that of the LW algorithm until 10 clusters are obtained, and then changes to the distance-based policy. As an example, Table 2 shows the 10 clusters obtained for modes/states of the dishwasher before the merging policy changes from the LW method, and Table 3 shows the final obtained states of all appliances.

Table 2. Minimum, maximum, and centroid of the 10 LW-obtained clusters for dishwasher's modes/states (active power in W).
Min     Max     Centroid
154     183     168
198     208     205
210     231     219
236     261     236
398     420     449
416     496     489
500     598     589
643     737     680
1078    1110    1099
1115    1247    1173

Table 4 shows the participation indices for different groups of overlapping appliances.

Classification
To apply the proposed classification method, events of the filtered test signal are detected. Then, the classification algorithm is applied to the detected events based on the transition intervals of appliances. Table 5 presents the evaluation metrics after the first step of the proposed classification method (the event-matching step). As OV and W/D do not have overlapping consumption values, they are detected accurately. However, as shown in Figure 13, since BGFI and MW have overlapping consumption values, some events of MW are incorrectly assigned to BGFI, decreasing its F-measure. Finally, after cycle-based label refinement, to refine the remaining multi-labeled events and choose the most probable label for each event, specific features of appliances, as described in Section 4.4, are considered. After applying these features, for each remaining event with multiple labels, the participation index is calculated for the overlapping appliances separately. The appliance whose P_par from the training dataset is closest to the calculated participation index is assigned to the event. Table 6 shows the high accuracy of our classification method for each appliance in comparison with the results of [24,34]. Keeping in mind that a higher number of appliances should diminish the accuracy of NILM, note that the numbers of appliances considered in this work and in [24] are seven and six, respectively. Moreover, considering multi-mode appliances such as the dishwasher increases the complexity of disaggregation. Nevertheless, the accuracy of our proposed method, which includes the dishwasher, is higher than that of [34], which did not consider the dishwasher in its set of appliances.
In this case study, we consider six appliances from AMPds: fridge (FGE), basement plugs (BME), clothes dryer (CDE), dishwasher (DWE), heat pump (HPE), and wall oven (WOE) [32]. A total of 550 days of data are used as the training dataset and 180 days as the test dataset. Figure 14 illustrates the consumption pattern of these appliances over one day of this dataset, along with the aggregated consumption, which is the sum of the individual appliances' consumption.

Signal Filtering and Event Detection
In the first stage of the proposed classification method, the outlier detection and event detection methods are applied to the power signal of each appliance in the training dataset as well as to the aggregated test signal. Figure 15 displays one day of the test signal versus its filtered signal. The state extraction method proposed in Section 4.1 is applied to the filtered signal of each appliance in the training dataset. Then, the intervals of the plausible transitions of the appliances are computed based on (5), as reported in Table 7. Finally, participation indices are obtained via (6) for three overlapping transition groups, as provided in Table 8.
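To make the filtering and event detection stage concrete, the following Python sketch shows one simple way to suppress an isolated spike and then detect step changes in a power signal. It is only a stand-in under stated assumptions: a sliding median filter and a fixed threshold replace the statistics-based methods used in this work, and the function names, the window length of 5 samples, and the 30 W threshold are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def median_filter(signal: List[float], window: int = 5) -> List[float]:
    """Replace each sample with the median of its neighbourhood to suppress isolated spikes."""
    half = window // 2
    return [median(signal[max(0, i - half): i + half + 1]) for i in range(len(signal))]


def detect_events(signal: List[float], threshold: float = 30.0) -> List[Tuple[int, float]]:
    """Return (index, step) pairs where consecutive samples differ by at least `threshold` watts."""
    events = []
    for i in range(1, len(signal)):
        step = signal[i] - signal[i - 1]
        if abs(step) >= threshold:
            events.append((i, step))
    return events


# Synthetic signal: a one-sample 900 W spike, then an appliance that turns on and off.
raw = [100, 100, 100, 900, 100, 100, 100, 300, 300, 300, 300, 100, 100, 100]
filtered = median_filter(raw)

print(detect_events(raw))       # the spike produces two spurious events
print(detect_events(filtered))  # only the ON and OFF steps remain
```

On this synthetic signal, the unfiltered trace yields four events, two of which come from the spike, whereas the filtered trace yields only the ON and OFF transitions. This is the motivation for filtering before event detection.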

Classification
The first step of the proposed classification method (the event-matching step) is applied to the events of the filtered test signal. As shown in Table 9, in this step, appliances whose consumption values do not overlap with those of others, such as HP, are disaggregated more accurately. However, appliances such as BM, whose transition intervals overlap, have a low F-measure. After conducting cycle-based label refinement, the following pattern-based features are applied to multi-labeled events to rule out the misclassified ones.

1. As shown in Figure 16, WO and DW have specific operation patterns, in which they operate in either all or none of their operation modes.
2. FG shows specific overshoots, which make it distinguishable from BM and WO.
3. The ON durations of CD and FG differ considerably, which makes them distinguishable from each other.

In the last step, the P_par of appliances with overlapping consumption values is considered to assign the most probable label to the remaining multi-label transitions. Table 10 displays the F-measure of the proposed classification method in comparison with [35]. It should be noted that 14 operation modes, including multiple overlapping ones, are considered in this paper, whereas the authors of [35] considered a total of nine operation modes that do not have overlapping consumption values. This shows the effectiveness of our proposed method in disaggregating a high number of multi-mode appliances with overlapping consumption values.
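For readers who wish to reproduce a per-appliance score of this kind, the following Python sketch computes an F-measure from event labels. It assumes the standard definition, namely the harmonic mean of precision and recall, which may differ in detail from the metric defined earlier in the paper. The function names and the example labels are hypothetical.

```python
from typing import List, Tuple


def event_counts(appliance: str, true_labels: List[str], predicted: List[str]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for one appliance."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == appliance and p == appliance for t, p in pairs)
    fp = sum(t != appliance and p == appliance for t, p in pairs)
    fn = sum(t == appliance and p != appliance for t, p in pairs)
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: one BM event is wrongly assigned to FG.
true_labels = ["BM", "BM", "FG", "FG", "BM"]
predicted = ["BM", "FG", "FG", "FG", "BM"]

for name in ("BM", "FG"):
    print(name, round(f_measure(*event_counts(name, true_labels, predicted)), 3))
```

In this toy example a single mislabeled event lowers the score of both appliances: it is a missed event for BM and a false detection for FG.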

Concluding Remarks and Future Work
In this paper, we have proposed a novel classification method to address the NILM problem given a small dataset. The proposed algorithm has three main phases: (1) filtering training and test signals and accurately detecting their events using a statistics-based method; (2) extracting features of appliances, most notably their modes and states, via a clustering approach that in part uses the LW clustering method; and (3) labeling events of the aggregated test signal with mode transitions of appliances through a classification algorithm in which various features and techniques are utilized to enhance accuracy. The proposed event detection and modes/states extraction methods have been designed in a systematic fashion so as to perform well on any set of power data. The proposed filtering method, feature extraction techniques, and event-based NILM classification algorithm have been validated using REDD. Juxtaposing the results of our classification algorithm with those of two recently introduced event-based NILM methods indicates a relatively high accuracy of our algorithm.
Reconstructing the power signals of appliances, which can be cast as a regression problem as opposed to the classification problem considered in this work, is one of the main challenges of event-based NILM. We aim to modify the algorithm presented in Section 5 to address the reconstruction problem. Moreover, since a training dataset is not available for every residential building, we wish to go a step further and use transfer learning to bypass the training phase of the proposed method, under the practical assumption that nominal values for the appliances' power are given.