A Scalable Real-Time Non-Intrusive Load Monitoring System for the Estimation of Household Appliance Power Consumption

Smart-meter technology advancements have resulted in the generation of massive volumes of information introducing new opportunities for energy services and data-driven business models. One such service is non-intrusive load monitoring (NILM). NILM is a process to break down the electricity consumption on an appliance level by analyzing the total aggregated data measurements monitored from a single point. Most prominent existing solutions use deep learning techniques resulting in models with millions of parameters and a high computational burden. Some of these solutions use the turn-on transient response of the target appliance to calculate its energy consumption, while others require the total operation cycle. In the latter case, disaggregation is performed either with delay (in the order of minutes) or only for past events. In this paper, a real-time NILM system is proposed. The scope of the proposed NILM algorithm is to detect the turning-on of a target appliance by processing the measured active power transient response and estimate its consumption in real-time. The proposed system consists of three main blocks, i.e., an event detection algorithm, a convolutional neural network classifier and a power estimation algorithm. Experimental results reveal that the proposed system can achieve promising results in real-time, presenting high computational and memory efficiency.


Introduction
Nowadays, the amount of data that is generated almost continuously is enormous. Once analyzed, they can reveal useful information in many different disciplines; economy, healthcare, and e-commerce, to name a few. In this context, the energy sector could not have been an exception. Traditionally, energy data was acquired at a few critical points of the power grid, usually at the transmission level, but the landscape has changed due to the advance in smart-metering technologies. Thousands of internet-of-things (IoT) endpoints are placed within the smart grid, providing energy utilities access to valuable data; thus, new opportunities have been created for energy services and data-driven business models [1][2][3][4][5]. Energy disaggregation is an example of such a service.
Energy disaggregation is the process of breaking down consumption at the appliance or activity level for residential or commercial-industrial (C&I) users; in other words, it estimates the individual power consumption of all appliances contributing to the total mains power. This process can help energy utilities reveal useful information to support load forecasting and demand-side management programs. For residential consumers, it can be used to provide accurate billing and meaningful feedback regarding their energy consumption, as well as to improve appliance efficiency (e.g., by detecting old devices and replacing them with more efficient ones) [6].
There are two main energy disaggregation solutions: (a) Intrusive Load Monitoring (ILM) and (b) Non-Intrusive Load Monitoring (NILM). In ILM, i.e., a hardware-based approach, power meters are attached behind each target appliance. The large number of hardware devices required for ILM makes the installation process difficult and cost-inefficient but results in very accurate power estimates. On the other hand, NILM is a software-based approach. It requires a single meter for the total aggregated power; thus, the installation process is simplified and the corresponding cost is reduced. However, since no information is available about the individual appliances that make up the aggregated power, appropriate algorithms should be created to perform energy decomposition.
Utilities should perform a large-scale deployment that supports thousands of consumers in order to benefit as much as possible from energy disaggregation services; only then is it possible to extract useful information for business models. This large-scale deployment makes NILM far more favorable than ILM due to its low cost, installation simplicity and minimum hardware requirements. However, in many cases, NILM algorithms present high computational complexity and significant memory requirements. In this sense, utilities should either use high-end smart meters (or extra hardware attached to them) with powerful central processing units (CPUs) and sufficient memory, or perform energy disaggregation in cloud services. In the latter case, the cost of cloud services increases with the number of consumers. To this end, utilities must adopt scalable solutions. Scalability can be even more critical than disaggregation accuracy. Consequently, low computational and memory requirements are necessary to run the service on the edge with conventional microprocessors or to minimize the cost of the required cloud services. Furthermore, to improve user experience, minimum feedback must be required from the user; thus, pre-trained generic appliance models are of utmost importance.
Several approaches have been proposed to cope with the NILM problem [7,8]. It was first introduced by Hart [9]. Hart's approach was based on monitoring power changes (corresponding to appliance turn-on/off events) in both the active and reactive power signals. These power changes are grouped into clusters, with each cluster representing a state change of a target appliance. Since then, several works have investigated the NILM problem utilizing different sampling rates and techniques. Earlier approaches employed sampling rates lower than 1 Hz, where event detection (appliances turning on/off) is impractical, and examined probabilistic models, such as variants of hidden Markov models (HMMs) [10][11][12][13][14][15][16][17]. HMMs yield promising results but present disadvantages, e.g., high computational complexity when the number of appliances increases and difficulty in classifying appliances that present similar power consumption [18]. Due to these disadvantages, researchers have turned to alternative methods, including machine learning and deep learning techniques. NILM approaches can be generally categorized as event-based and state-based.
Event-based solutions [33][34][35][36][37][38][39] leverage the information-rich transient response of an appliance turning on. Specifically, they consist of two modules: (a) an event detection algorithm for discovering power changes corresponding to an appliance turning on and (b) a classifier for identifying the appliance that caused the power change. This approach is based on the fact that turn-on transient responses contain more information regarding the operating device than steady states. However, in order to obtain this transient state information, high-resolution data is vital [33][34][35][36][37][38]. One widespread event-based method is the V-I trajectory, utilizing high-resolution voltage and current measurements. In [37,38], useful features are extracted from the V-I trajectories and neural networks are trained for classification. Other researchers depict the trajectories as binary images [34][35][36]. This visual representation solves the appliance recognition problem by exploiting computer vision techniques. Furthermore, transfer learning techniques have been investigated, as in [36], where an image classifier has been implemented based on AlexNet [40]. An important advantage of event-based approaches is their low complexity; only a few time instants corresponding to on/off events are processed. Furthermore, such approaches can detect an appliance turn-on event in real-time since only information from the transient state is required. However, high-resolution data of several kHz is essential, and these methods are mainly applied to detect appliance turn-on/off events without calculating power consumption.
On the other hand, state-based approaches [19][20][21][22][23][24][25][26][27] mainly require lower frequency data. These approaches do not detect state transitions. On the contrary, they parse all available data of a time-series, even if no events occur. In such approaches, the appliance must operate for at least some minutes to determine if it is on [20,22]. There are even cases where the appliance end-use has to be fully completed to estimate the power consumption [19,24]. In [19], three different neural network architectures were presented, i.e., (a) long short-term memory (LSTM) networks, (b) stacked denoising autoencoders, and (c) a regression algorithm to forecast the start time, stop time and average power demand of devices. In [21], a bidirectional LSTM cell was used; in [26], a deep convolutional neural network (CNN) was proposed that takes a time window of active power consumption as input and predicts the active power at the center of the window. In [23], the authors feed their network with active, reactive, and apparent power and current data. Furthermore, they use mainly CNN blocks in order to create a recurrent property similar to LSTMs. Finally, in [24], an attention-based deep neural network is introduced, inspired by deep learning techniques used in Natural Language Processing (NLP). State-based approaches use low-resolution data and predict the power consumption per appliance. However, they present higher computational complexity since all available data are used, and thus cannot detect an appliance being turned on/off in real-time.
The scope of this paper is to present a real-time event-based NILM methodology to detect an appliance turn-on event and calculate its power consumption in real-time. The proposed NILM design is built on top of three main blocks, i.e., an event detector, a CNN classifier and a power estimation algorithm.
The main strengths of the proposed NILM system are the following:

• The proposed system can identify when an appliance is turned on in real-time, based on its active power transient response sampled at 100 Hz; unlike [19,24], processing data of the total appliance operational duration is not required.
• The proposed system is delay-free; once the appliance has been turned on, the system can calculate its power in real-time.
• The combination of a machine learning model to detect the appliance turning on and a heuristic algorithm to estimate the power in real-time constitutes a lightweight system, presenting lower memory and CPU requirements than end-to-end deep learning models [19,23,24].
• The proposed NILM algorithm is automatic; thus, no feedback is required from the user.
• A data sampling rate of 100 Hz for active power measurements is used, contrary to several kHz in relevant works [33,34,35,41,42].
Overall, as the above analysis suggests, the proposed system constitutes a real-time scalable solution with minimum hardware requirements; thus, it can be integrated into low-cost chip-sets and, consequently, run on the edge.
The paper is structured as follows: In Section 2, the proposed methodology is presented. In Section 3, the dataset and the metrics used for evaluation are described. In Section 4, experimental validation results from real-life installations are analyzed and the performance of the system is compared to other state-of-the-art approaches. In Section 5, an industrial perspective regarding scalable real-time NILM services is discussed. Finally, Section 6 concludes the paper.

Proposed System
The proposed methodology comprises three main parts: (a) an event-detection system to find active power changes corresponding to turn-on events, (b) a CNN binary classifier to determine if the turn-on event was caused by a specific target appliance or not, and (c) a power estimation algorithm to calculate in real-time the appliance power per second and consequently the energy consumption. An overview of the system in flowchart form is illustrated in Figure 1.

Event Detection
The event detection algorithm is used to identify the time instant (trigger time) when a sudden increase of active power occurs, indicating a possible turn-on event. The advantages of the proposed event detection algorithm are its simplicity and the fact that no pre-training is required.
Let us assume that the aggregated active power time-series at 100 Hz is $P$. The original signal $P$ is down-sampled to 1 Hz by means of averaging, resulting in the signal $P_d$. Down-sampling is applied for two main reasons: (a) the event detection algorithm becomes simpler, presenting less computational burden, and (b) most power changes are still easily identifiable at a 1 Hz sampling frequency. However, if two or more events occur almost simultaneously, e.g., within less than a second, the algorithm detects these events as a single one. Considering that the probability of this scenario is very low, the frequency of 1 Hz has been selected. Next, the maximum power difference (MPD) for each second $n$ is calculated as:

$$MPD(n) = \max_{k \in \{1,2,3\}} P_d(n+k) - \min_{k \in \{1,2,3\}} P_d(n-k) \tag{1}$$

MPD shows the maximum difference in active power in a region around $n$, i.e., the maximum power during the first three seconds after $n$, minus the minimum power of the last three seconds before $n$. In this sense, the transient onset can be accurately determined since the real power increase may not appear immediately, but some seconds after $n$.
To determine the trigger time candidates, MPD is compared with a threshold, $P_{th}$, which is determined in terms of the appliance rated power. This means that, at time instant $n$, an event occurs if

$$MPD(n) \geq P_{th} \tag{2}$$

At this point, it should be mentioned that trigger time candidates close in time are merged. For each trigger time, a 6 s window of the captured transient response, $P_{tr}$, is generated from $P$ (100 × 6 = 600 samples). The pseudo-code for the process described is presented in Algorithm 1.

Algorithm 1: Event detection.
Input: P, P_th
Output: list of captured transient responses

P_d = P down-sampled at 1 Hz;
Initialize an empty list L;
Initialize an empty list transients;
for each second n do
    max_after = max(P_d(n + 1), P_d(n + 2), P_d(n + 3));
    min_before = min(P_d(n − 3), P_d(n − 2), P_d(n − 1));
    MPD = max_after − min_before;
    if MPD ≥ P_th then
        Append n to L;
    end
end
Merge consecutive seconds in L;
for t in L do
    P_tr = P(t − 300 : t + 299);
    Append P_tr to transients;
end
return transients;
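As an illustration only, Algorithm 1 can be sketched in plain Python as follows. Two details are assumptions that the text does not specify: each run of consecutive candidate seconds is represented by its middle second, and a trigger second t is mapped to the 100 Hz sample index 100·t before the 600-sample window is cut. The function names are hypothetical.

```python
from typing import List, Sequence

FS = 100            # sampling rate of P in Hz (from the text)
HALF_WINDOW = 300   # 3 s of 100 Hz samples on each side of the trigger


def downsample_1hz(p: Sequence[float], fs: int = FS) -> List[float]:
    """Average every full block of `fs` samples into one 1 Hz value."""
    return [sum(p[k * fs:(k + 1) * fs]) / fs for k in range(len(p) // fs)]


def detect_triggers(p_d: Sequence[float], p_th: float) -> List[int]:
    """Return one trigger second per run of seconds with MPD(n) >= p_th."""
    hits = []
    for n in range(3, len(p_d) - 3):
        max_after = max(p_d[n + 1:n + 4])   # P_d(n+1), P_d(n+2), P_d(n+3)
        min_before = min(p_d[n - 3:n])      # P_d(n-3), P_d(n-2), P_d(n-1)
        if max_after - min_before >= p_th:
            hits.append(n)
    # Group consecutive candidate seconds into runs.
    runs: List[List[int]] = []
    for n in hits:
        if runs and n - runs[-1][-1] == 1:
            runs[-1].append(n)
        else:
            runs.append([n])
    # Assumption: each run is represented by its middle second.
    return [(run[0] + run[-1]) // 2 for run in runs]


def capture_transients(p: Sequence[float], triggers: Sequence[int],
                       fs: int = FS) -> List[List[float]]:
    """Cut a 600-sample (6 s) window of P around every trigger second."""
    transients = []
    for t in triggers:
        center = t * fs  # assumption: second t maps to sample index t * fs
        start, stop = center - HALF_WINDOW, center + HALF_WINDOW
        if start >= 0 and stop <= len(p):  # skip windows that leave the signal
            transients.append(list(p[start:stop]))
    return transients
```

For a synthetic 20 s signal that steps from 50 W to 1050 W after 10 s, `detect_triggers(downsample_1hz(p), 500.0)` yields a single trigger, and the captured window contains the step.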

CNN Classifier
In the proposed methodology, the transient response generated by an appliance's turning-on is used as the load signature for appliance classification [7]. Whenever a target appliance is turned on, a transient response can be detected in the aggregated active power waveform. Besides appliance classification, this load signature presents two additional advantages. Firstly, for a given appliance, the turn-on transient response pattern is unique and relates only to the operational characteristics of the appliance [43]. Consequently, the identification algorithm's performance is independent of the simultaneous operation of other types of appliances, even when a large number of devices is considered [7,44]. Secondly, the proposed algorithm can successfully treat various types of appliances, even though presenting similar consumption levels at steady-state, since classification is performed based on the unique appliance transient characteristics instead of calculating steady-state features.
For multi-state appliances, the same principle can be used to detect specific operational states by identifying the transient responses caused by state transitions. For example, for a washing machine or a dishwasher, the transient response of the water heating process can be used to identify this specific state, which is of primary interest as the most energy-intensive process during an operation cycle.
In order to associate a given transient response, P tr , with a specific target appliance behavior, a CNN classifier is utilized. In this sense, for each target appliance, a dedicated CNN classifier is used, identifying P tr as positive when related to the target appliance or negative otherwise.
Different types of appliances generate transient responses with distinct characteristics, particularly when a high sampling frequency, e.g., 100 Hz, is used. If a user were initially given an example of such a response corresponding to a specific appliance, they could later recognize a new response of the same appliance by simple visual inspection. However, the implementation of such a recognition algorithm is not an easy task.
Inspired by the area of computer vision, where CNN models are used for image recognition and classification [45], a similar approach has been adopted in this paper.
Convolutional layers can automatically extract useful features from the input data without user supervision [45]. Thus, there is no need to implement specific algorithms; instead, by training a CNN model, the classification problem can be successfully solved. A block diagram of the proposed CNN architecture is depicted in Figure 2.
Initially, min-max normalization is applied to $P_{tr}$ by means of (3):

$$P_{norm} = \frac{P_{tr} - \min(P_{tr})}{\max(P_{tr}) - \min(P_{tr})} \tag{3}$$

The resulting normalized vector, $P_{norm}$, is forwarded as input to the CNN model.
Since the input of the CNN model is a one-dimensional signal, one-dimensional convolutional layers are used to extract the useful features from $P_{norm}$. In particular, three consecutive 1-d convolutional layers are used, each in combination with a 1-d max-pooling layer. All convolutional layers use 32 filters, a kernel size of 3, a stride of 1, 'same' padding, and the rectified linear unit (ReLU) activation function. The ReLU function is defined as

$$\mathrm{ReLU}(x) = \max(0, x) \tag{4}$$

for $x \in \mathbb{R}$. For max-pooling layers, the pool size was set to 2. Generally, at each 1-d convolutional layer, a number of filters is applied to the corresponding input, $x_{conv}$. Assuming that the size of $x_{conv}$ is $M_{conv} \times N_{conv}$ and that a single filter, $f$, is of size $3 \times N_{conv}$, the output of the convolution between $x_{conv}$ and $f$ is an $M_{conv} \times 1$ matrix. The resulting $y_{conv}$ is calculated as

$$y_{conv}(m) = \sum_{i=1}^{3} \sum_{n=1}^{N_{conv}} f(i, n)\, x_{conv}(m + i - 2, n) \tag{5}$$

for $m \in [1, ..., M_{conv}]$, where $x_{conv}(0, n)$ and $x_{conv}(M_{conv}+1, n)$ are considered zero for any $n \in [1, ..., N_{conv}]$ as a result of zero-padding. In our case, where 32 filters are used in a convolutional layer, the resulting $y_{conv}$ vectors are stacked as columns, forming an $M_{conv} \times 32$ matrix. Each convolutional layer is followed by a max-pooling layer to down-sample the extracted features of the input signal. In this sense, a summarized version of the extracted features (half the size) is created, maintaining the most important features, and is further used as input to the next layer. Assuming that $x_{pool}$, with size $M_{pool} \times N_{pool}$, is the max-pooling layer input matrix, the output, $y_{pool}$, has a size of $(M_{pool}/2) \times N_{pool}$ and is calculated as

$$y_{pool}(m, n) = \max\big(x_{pool}(2m - 1, n),\; x_{pool}(2m, n)\big) \tag{6}$$

for $m \in [1, ..., M_{pool}/2]$ and $n \in [1, ..., N_{pool}]$. Following the three convolutional/pooling pairs, a flattening layer is applied, transforming its input to a single vector by column-wise stacking. Finally, two dense layers with 20 and 1 output nodes, respectively, are used.
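To make the layer arithmetic concrete, the following plain-Python sketch applies one kernel-size-3 filter with zero ('same') padding and ReLU, followed by max-pooling with pool size 2. It is a didactic illustration, not the implementation used in the paper; the function names and the optional `bias` argument are assumptions.

```python
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = channels


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def conv1d_same(x: Matrix, f: Matrix, bias: float = 0.0) -> List[float]:
    """Apply one 3 x N filter to an M x N input with zero padding, then ReLU.

    The bias term is an assumption; it defaults to zero.
    """
    m_len, n_ch = len(x), len(x[0])
    zero_row = [0.0] * n_ch
    padded = [zero_row] + x + [zero_row]  # one zero row before and after
    out = []
    for m in range(m_len):
        acc = bias
        for i in range(3):
            for n in range(n_ch):
                acc += f[i][n] * padded[m + i][n]
        out.append(relu(acc))
    return out


def max_pool2(x: Matrix) -> Matrix:
    """Halve the number of rows by taking the max of each pair of rows."""
    return [[max(a, b) for a, b in zip(x[2 * m], x[2 * m + 1])]
            for m in range(len(x) // 2)]


# Single-channel input and a simple difference filter.
y = conv1d_same([[1.0], [2.0], [4.0], [3.0]], [[-1.0], [0.0], [1.0]])
pooled = max_pool2([[v] for v in y])
print(y, pooled)  # [2.0, 3.0, 1.0, 0.0] [[3.0], [1.0]]
```

The output keeps the input length (4 rows) after the convolution and is halved (2 rows) after pooling, matching the sizes stated in the text.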
For the first dense layer, the ReLU activation function is applied; for the last layer, the sigmoid activation function defined in (7) for $x \in \mathbb{R}$ is used to compute the probability that the transient response corresponds to the positive class.

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{7}$$
Generally speaking, a dense layer with $M_{dense}$ input nodes and $K$ output nodes includes two sets of trainable parameters, i.e., a weight matrix, $w$, with size $M_{dense} \times K$ and a bias vector, $b$, with size $K$. Given an input vector, $x_{dense}$, with $M_{dense}$ elements, the output $y_{dense}$ of size $K$ is calculated as

$$y_{dense} = F\left(w^{T} x_{dense} + b\right) \tag{8}$$

where $F$ is the corresponding activation function. Before each dense layer, a dropout layer [46] is used. The dropout rate is set to 0.2 to prevent model over-fitting. A standard backpropagation algorithm is used during training to optimize the binary cross-entropy loss between the predicted probabilities and the actual labels. Assuming that the predicted probabilities are $p_1, p_2, ..., p_B$ for $B$ samples and the actual labels are $q_1, q_2, ..., q_B$, the binary cross-entropy loss is

$$L = -\frac{1}{B} \sum_{i=1}^{B} \left[ q_i \log(p_i) + (1 - q_i) \log(1 - p_i) \right] \tag{9}$$

The CNN classifier is trained for a maximum of 50 iterations. The Adamax optimizer [47] was selected with an initial learning rate of 0.01 and a batch size of 32. In order to avoid over-fitting, early stopping with patience is used: the training process stops once the validation accuracy has not improved for five consecutive iterations.
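A rough estimate of the model size follows from the layer dimensions given above. The sketch below counts trainable parameters, assuming a 600-sample single-channel input, pooling after each of the three convolutional layers, and one bias per convolutional filter (the text specifies a bias only for the dense layers, so that part is an assumption). The helper names are hypothetical.

```python
def conv1d_params(kernel: int, in_channels: int, filters: int) -> int:
    """Weights of all filters plus one bias per filter (bias is assumed)."""
    return kernel * in_channels * filters + filters


def dense_params(n_in: int, n_out: int) -> int:
    """Weight matrix of size n_in x n_out plus a bias vector of size n_out."""
    return n_in * n_out + n_out


def count_parameters(input_len: int = 600, filters: int = 32, kernel: int = 3,
                     pool: int = 2, n_conv: int = 3, hidden: int = 20):
    """Return (flattened feature size, total trainable parameters)."""
    length, channels, total = input_len, 1, 0
    for _ in range(n_conv):
        total += conv1d_params(kernel, channels, filters)
        channels = filters
        length //= pool  # each max-pooling layer halves the time axis
    flat = length * channels
    total += dense_params(flat, hidden) + dense_params(hidden, 1)
    return flat, total


print(count_parameters())  # (2400, 54377)
```

Under these assumptions the classifier has on the order of 54 thousand parameters, most of them in the first dense layer.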

Consumed Energy Estimation Algorithm
The last module is related to the real-power estimation of the target appliance. The implemented algorithm considers the appliance end-uses as pulses of constant power; this approximation is well-suited for single-state appliances such as a microwave oven, kettle or toaster. In the case of appliances with operating cycles comprising multiple pulses, the algorithm considers each pulse as a new appliance end-use and not as a single end-use event of several pulses. Examples are the oven, which is switched on and off by a thermostat, and the dishwasher, where several water heating pulses may occur depending on the selected program. In this sense, the performance of the proposed algorithm may degrade for multi-state appliances, which are characterized by varying power consumption and cannot be approximated with a constant power pulse. However, such appliances present a predominant energy-intensive process during a full operating cycle, while the remaining operating states are less critical regarding the total energy consumption. For example, washing machine or dishwasher cycles include energy-intensive water heating processes and low energy-consuming processes, e.g., water pumping. Therefore, regarding multi-state appliances, the proposed power estimation algorithm focuses on the estimation of the energy-intensive processes, neglecting the effect of the minor consuming ones.
When the CNN classifies a transient response as positive, it is implied that the appliance has been turned on. The calculated power increase, $P_{init}$, is considered equal to the appliance power consumption and assumed constant during the total time of operation of the appliance. When a power decrease between two consecutive seconds in $P_d$ falls within the interval $[0.8 P_{init}, 1.2 P_{init}]$, the appliance is considered to be turned off. The pseudo-code of the energy consumption estimation algorithm is shown in Algorithm 2, having as input the time (in seconds), $t$, when the target appliance is turned on and $P_d$.

Algorithm 2: Energy consumption estimation.
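Input: t, P_d
Output: turn-off time n, estimated power P_init

P_init = calculated power increase at t;
for each second n > t do
    if P_d(n − 1) − P_d(n) ∈ [0.8 P_init, 1.2 P_init] then
        Stop; the appliance is turned off at n;
    else
        Assume power consumption equal to P_init;
    end
end
return n, P_init;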
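The turn-off rule described in the text can be sketched as follows. This is a minimal illustration under stated assumptions: `p_init` is passed in by the caller because the text does not detail how the power increase is computed, the appliance is counted as ON for the seconds from t up to (but excluding) the turn-off second, and the energy is reported in Wh. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def estimate_end_use(p_d: Sequence[float], t: int, p_init: float,
                     tol: float = 0.2) -> Tuple[Optional[int], float]:
    """Return (turn-off second or None, estimated energy in Wh).

    The appliance is modelled as a constant-power pulse of p_init watts that
    ends at the first 1 s power drop within [0.8 * p_init, 1.2 * p_init].
    """
    low, high = (1.0 - tol) * p_init, (1.0 + tol) * p_init
    for n in range(t + 1, len(p_d)):
        drop = p_d[n - 1] - p_d[n]
        if low <= drop <= high:
            # ON for (n - t) seconds at constant power (assumption).
            return n, p_init * (n - t) / 3600.0
    # No matching drop found: the appliance is still ON at the end of the data.
    return None, p_init * (len(p_d) - t) / 3600.0


# 1 Hz aggregate: 100 W base load, a 1000 W appliance running for 60 s.
p_d = [100.0] * 5 + [1100.0] * 60 + [100.0] * 5
print(estimate_end_use(p_d, t=5, p_init=1000.0))
```

In this example the turn-off is found at second 65 and the estimated energy is 1000 W × 60 s, i.e., about 16.7 Wh.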

Dataset
The proposed NILM system is based on the fact that each household appliance presents a transient response pattern with distinct characteristics, which become more noticeable as the sampling frequency increases. In this paper, the selected sampling frequency is 100 Hz; at this frequency, the transient characteristics are captured, in contrast to lower sampling rates where such information may be lost. In Figure 3, turn-on transient responses at 100 Hz and 1 Hz for five appliances are depicted. It is evident that the frequency of 100 Hz reveals unique details that are lost when sampling at 1 Hz. More specifically, Figure 3a presents the turn-on response of a high-power consumption (~1.2 kW) fridge compressor with a duration of less than two seconds. Figure 3b visualizes the water heating process of a washing machine, which corresponds to a steep power step-up. Next, Figure 3c illustrates the transient response of a microwave oven as a high-power spike followed by a smooth power increase. In Figure 3d, a stove turn-on presenting a smooth and convex power increase is shown, and finally, Figure 3e visualizes the transient response of a heat pump dryer, including a high-power spike at motor starting time.
An extensive set of transient responses for each target appliance is required to train the CNN classifier. For this purpose, a private dataset that includes transient responses of different household appliances sampled at 100 Hz from different installations is used. The type of appliance and the number of samples for each case are summarized in Table 1.

Appliance                          Number of Transient Responses
Fridge                             132
Dishwasher                         171
Heat pump                          202
Washing machine                    135
Oven                               82
Stove                              148
Heat pump dryer (drum spinning)    54
Heat pump dryer (heating)          42
Microwave                          290

In this study, three appliances are selected to test the performance of the proposed methodology, i.e., fridge, washing machine, and microwave oven. Pulses can approximate the end-use of these appliances without significant error in power estimation. Furthermore, such appliances are considered typical for most households, corresponding to substantial total energy consumption. The selected appliances represent a larger group of appliances since both single-state and multi-state appliances are considered. Additionally, detailed results regarding the analysis of such appliances can be found in several relevant works [19,20,22,25,26,27]; thus, a comprehensive comparative analysis can be performed. Finally, low energy-consuming appliances such as game consoles and phone chargers have not been investigated, being of trivial importance and hard to identify with NILM algorithms [19].
For each target appliance, a binary classifier is implemented and trained. During training, the transient responses of the appliance under consideration are labeled positive; the responses corresponding to a different appliance are labeled negative. The positive and negative classes are balanced in order to prevent bias towards the class with the most samples; the number of negative responses is the same as the number of positive ones. To avoid over-fitting, a training/validation/testing split with a ratio of 60%/20%/20% is applied to each class separately.
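The balancing and splitting procedure can be illustrated with the following sketch. It assumes that negatives are drawn uniformly at random from the responses of all other appliances and that each class is shuffled before splitting; neither detail is stated in the text, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_60_20_20(items: Sequence, rng: random.Random) -> Tuple[list, list, list]:
    """Shuffle one class and split it into train/validation/test parts."""
    items = list(items)
    rng.shuffle(items)
    n_train = len(items) * 60 // 100
    n_val = len(items) * 20 // 100
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def balance_and_split(positives: Sequence, negatives: Sequence,
                      seed: int = 0) -> Dict[str, List[tuple]]:
    """Balance the classes, then split each class separately (60/20/20)."""
    rng = random.Random(seed)
    # Assumption: negatives are subsampled at random to match the positives.
    n_keep = min(len(positives), len(negatives))
    negatives = rng.sample(list(negatives), n_keep)
    sets: Dict[str, List[tuple]] = {"train": [], "val": [], "test": []}
    for label, group in ((1, positives), (0, negatives)):
        for name, part in zip(("train", "val", "test"), split_60_20_20(group, rng)):
            sets[name].extend((sample, label) for sample in part)
    return sets
```

With 100 positive and 500 candidate negative responses, this yields 120 training, 40 validation and 40 testing samples, each set containing equal numbers of both labels.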
However, because the number of samples per appliance is small, augmentation techniques are used. These techniques aim to increase the number as well as the diversity of the training samples by artificially introducing variations in existing transient responses. Specifically, for each transient response, 15 samples with the required length of 6 s are created. Assuming that the time-series that contains a response is z, each one of the 15 samples is generated by means of the following steps:

1. Considering that the transient response starts at index s of z, a random number u in the interval [s − 500, s − 100] is selected, following a uniform distribution. The selected sample is equal to z from index u to index u + 599.

2. White Gaussian noise with mean value (µ) equal to 0 and standard deviation (σ) equal to 1 is added to the sample; 10 W maximum power is considered.
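The two augmentation steps can be sketched as follows. The phrase "10 W maximum power" is interpreted here as clipping the noise to ±10 W, which is an assumption; the function name and the `seed` parameter are likewise hypothetical.

```python
import random
from typing import List, Sequence


def augment(z: Sequence[float], s: int, n_aug: int = 15, window: int = 600,
            sigma: float = 1.0, max_noise: float = 10.0,
            seed: int = 0) -> List[List[float]]:
    """Create n_aug noisy 6 s windows around a transient starting at index s."""
    if s - 500 < 0 or (s - 100) + window > len(z):
        raise ValueError("z is too short around the transient onset")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_aug):
        # Step 1: random start index u, uniform over [s - 500, s - 100].
        u = rng.randint(s - 500, s - 100)
        noisy = []
        for value in z[u:u + window]:
            # Step 2: zero-mean white Gaussian noise with standard deviation sigma.
            noise = rng.gauss(0.0, sigma)
            # Assumption: the noise amplitude is limited to +/- max_noise watts.
            noise = max(-max_noise, min(max_noise, noise))
            noisy.append(value + noise)
        samples.append(noisy)
    return samples
```

Because u is drawn between 5 s and 1 s before the onset, the transient always starts between 1 s and 5 s into the 6 s window, so every augmented sample contains the turn-on event at a different position.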
The number of samples in the training, validation and testing sets per appliance is shown in Table 2.

Performance Metrics
The proposed methodology is evaluated in terms of the event detection algorithm, the CNN classifiers as well as the overall system performance. For each case, different metrics are used.

Metrics for Classifier Evaluation
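To evaluate the classifier, the most common metrics used in classification and NILM problems are adopted [18,29,32,34,35]. Specifically, the accuracy, precision, recall and $F_1$-score, defined in (10) to (13), respectively, are used:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{12}$$

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{13}$$

where TP, TN, FP and FN denote the number of true positives, true negatives, false positives and false negatives, respectively. In this context, for a transient response classifier, a sample (i.e., a transient response of 6 s) is positive if the transient response corresponds to the target appliance. Otherwise, it is assumed negative.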
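For reference, the four classification metrics can be computed from binary labels as in the sketch below. Returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text; the function name is hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # convention: 0.0 on empty denominator

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                             [1, 1, 0, 0, 0, 1, 1, 0]))
```

In this toy example there are three true positives, three true negatives, one false positive and one false negative, so all four metrics equal 0.75.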

Metrics for Overall NILM System Evaluation
The overall proposed NILM system is tested by using the same metrics as previously, i.e., accuracy, precision, recall, and $F_1$-score, to evaluate the predicted status of the appliance (ON or OFF). Thus, a sample (i.e., a time instant) is considered positive if the appliance is ON and negative if not. It should be mentioned that an appliance is considered turned on if the measured active power is higher than 5 W. Additionally, for energy estimation, the mean absolute error (MAE) and the root mean square error (RMSE) in (14) and (15), respectively, are computed

$$\text{MAE} = \frac{1}{N} \sum_{n=1}^{N} \left| y[n] - \hat{y}[n] \right| \tag{14}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( y[n] - \hat{y}[n] \right)^2} \tag{15}$$

where $y[n]$ and $\hat{y}[n]$ are the original and the estimated power responses with $N$ samples. Moreover, the relative error in total energy (RE), defined in (16), is calculated, where $E$ and $\hat{E}$ are the original and the estimated total energy consumption of the appliance.
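The ON/OFF labelling rule and the two power-error metrics can be sketched as follows. The RE metric is left out because its exact definition is not reproduced in this section; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def on_off_status(power: Sequence[float], threshold: float = 5.0) -> List[bool]:
    """An appliance is considered ON when its active power exceeds 5 W."""
    return [p > threshold for p in power]


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error between measured and estimated power."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error between measured and estimated power."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


measured = [0.0, 1000.0, 1000.0, 0.0]
estimated = [0.0, 900.0, 1000.0, 100.0]
print(on_off_status(measured))   # [False, True, True, False]
print(mae(measured, estimated))  # 50.0
print(rmse(measured, estimated))
```

RMSE penalizes the two 100 W deviations more strongly than MAE (about 70.7 W against 50 W), which is why both are reported side by side.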

Results
In this section, experimental validation results are analyzed considering data from real-world installations. The Building-Level fully-labeled dataset for Electricity Disaggregation (BLUED) [48] is used to test the applicability of the proposed event-detection algorithm. Energy consumption data from three household installations are also used to evaluate the performance of the proposed methodology; common metrics are employed and results are compared with those obtained from other state-of-the-art methods proposed in the literature. Finally, the computational and memory efficiency of the proposed system is discussed.

Event Detection Evaluation
The BLUED dataset contains aggregate voltage/current and active power data, sampled at 12 kHz and 60 Hz, respectively, from a 2-phase household in Pittsburgh, USA. The recording duration is eight days. The time instants when a turn-on or turn-off event occurred are also reported in the dataset. In particular, for testing the proposed event-detection algorithm, the active power measurements of phase A, at 1 Hz, from 11:58:32 on 20 October 2011 to 09:29:55 on 21 October 2011 are used. During this period, 125 events occurred, including six pairs of simultaneous events. The proposed algorithm detects each pair of simultaneous events, as well as one pair of near-simultaneous turn-off events, as a single event. Finally, one false event is detected: an appliance power drop was incorrectly identified as an appliance turning off while the appliance was still in operation. In summary, 118 out of the 125 events have been correctly detected by the proposed event-detection algorithm. In Figure 4, the active power and the detected events for the period from 18:30:00 to 20:30:00 are shown. The true positive rate (TPR), false positive rate (FPR) and false negative rate (FNR) are calculated and compared to other, more complex solutions [32,49,50] in Table 3. It can be seen that the proposed algorithm can achieve good results while being simple and computationally efficient.

Classification Evaluation
To evaluate the classifiers' performance regarding the three target appliances, the private testing sets mentioned in Section 3.1 are used. The calculated accuracy, precision, recall and $F_1$-score results are summarized in Table 4. It is evident that the proposed classification algorithm presents high performance regarding the microwave and the fridge. These appliances are related to transient response patterns presenting specific characteristics and thus can be identified with high confidence. However, this is not the case for the washing machine, since its turn-on transient response is a simple steep step-up waveform. Similar patterns are also related to the heating processes of most household appliances, e.g., dishwasher, oven and, generally, appliances that use resistive elements for heating, as shown in Figure 5. This explains the relatively lower scores obtained for the washing machine metrics compared to the other appliances.

Application on Residential Households
The overall performance of the proposed methodology is tested on a private dataset. This dataset includes three households with 3-phase power supply located in the Netherlands. For each household, the aggregated active power per phase was measured at 100 Hz, along with the power consumption of selected appliances, for 15 days. For evaluation purposes, the proposed NILM system is applied only when the target appliance is connected. Figure 6 presents the results for each target appliance, assuming an operational duration of four hours. Specifically, the aggregated power is colored in blue. The actual target appliance power measured with Plugwise meters is colored in red. The target appliance power, as estimated by the proposed methodology, is colored in green. The accuracy, precision, recall, $F_1$-score, MAE, RMSE and RE are calculated, as well as their average considering the three households for 15 days. Results for the fridge, washing machine and microwave oven are shown in Tables 5-7, respectively. It can be generally observed that the proposed algorithm presents high accuracy regarding the power and energy estimates of the fridge and the microwave. In contrast, the microwave oven recall metric is low. This can be attributed to the fact that the proposed methodology considers the standby mode of this appliance as OFF. In fact, the power consumption during this period is low and thus of trivial importance regarding energy consumption calculations. Regarding the washing machine results, the NILM system is designed to detect only the most energy-intensive process during the washing machine operation cycle, i.e., the water heating mode of operation. For the remaining (non-detected) parts of the operation cycle, i.e., water pumping, drum spinning and rinsing, the appliance status is assumed OFF. The partial detection of the washing machine is evident in Figure 6, resulting in low recall scores.
Moreover, in the third household, the calculated low precision is due to the operation of appliances presenting similar transient response patterns, being misclassified as washing machine end-uses.
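The interplay between the status metrics and the power and energy metrics can be illustrated with a minimal Python sketch. The MAE and RMSE follow their standard definitions; the RE is assumed here to be the relative error of the total energy, which may differ from the exact definition used for the tables. The function names and the synthetic traces are hypothetical and serve only to show how partial detection of a washing machine cycle can yield high precision, low recall and a small energy error at the same time.

```python
import math


def status_metrics(true_on, pred_on):
    """Accuracy, precision, recall and F1-score from per-sample ON/OFF labels."""
    pairs = list(zip(true_on, pred_on))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def power_metrics(true_w, est_w):
    """MAE and RMSE of the power estimate, plus an assumed energy-based RE."""
    n = len(true_w)
    mae = sum(abs(t - e) for t, e in zip(true_w, est_w)) / n
    rmse = math.sqrt(sum((t - e) ** 2 for t, e in zip(true_w, est_w)) / n)
    # Assumption: RE is the relative error of the total energy over the trace.
    re = abs(sum(est_w) - sum(true_w)) / sum(true_w)
    return mae, rmse, re


# Synthetic cycle: 2 heating samples at 2000 W followed by 4 low-power samples.
true_w = [0, 2000, 2000, 100, 100, 100, 100, 0]
est_w = [0, 2000, 2000, 0, 0, 0, 0, 0]  # only the heating phase is detected
true_on = [w > 0 for w in true_w]
pred_on = [w > 0 for w in est_w]

accuracy, precision, recall, f1 = status_metrics(true_on, pred_on)
mae, rmse, re = power_metrics(true_w, est_w)
```

In this synthetic case the precision is 1.0 while the recall is only about 0.33, yet the assumed energy-based RE stays below 0.1, because the undetected phases carry little energy.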

Comparison with Other Methods
The performance of the proposed methodology is compared to other NILM-based energy consumption estimation systems. The average MAE, RE, precision, recall, F1-score and accuracy obtained by the proposed method are summarized in Tables 8-10 for the fridge, washing machine and microwave, respectively. The corresponding results (where available) reported in the relevant literature are also presented, together with the associated NILM technique, sampling frequency and testing dataset. Note that most state-of-the-art methods in the literature have been tested using the well-known UK Domestic Appliance-Level Electricity (UK-DALE) [51] dataset. This dataset includes aggregated active power and appliance measurements at 0.167 Hz for several months, recorded for a small number of household installations. Moreover, the Reference Energy Disaggregation Data Set (REDD) [52] has been used in [21] to evaluate the LSTM algorithm performance; the sampling frequency is 1 Hz for mains and 0.333 Hz for the appliances. The proposed NILM system is tested using a 100 Hz private dataset, since high-frequency sampling data are not provided in the above-mentioned public datasets. It is important to stress that, in order to conduct a fair comparison between the different approaches, all metrics should be taken into consideration. However, this is not possible, since results for all metrics are not always provided in the corresponding literature. Therefore, a direct comparison should be carried out with caution.
From the fridge results in Table 8, it can be seen that the proposed algorithm presents high performance on most metrics. In particular, the method presents the third-best MAE, being inferior only to PCNN AE and PCNN LSTM. Regarding energy estimation, the RE metric is low (equal to 0.19), and the proposed method is outperformed only by the CNN [19] and the WGRU [22] algorithms.
Finally, the proposed solution presents the highest precision in terms of status estimation. In particular, the fridge status has been falsely identified as ON (real status was OFF) in the fewest cases among all examined NILM solutions. On the other hand, the proposed method presents moderate performance in terms of recall (0.80), since the Autoencoder, CNN [19] and LSTM [21] algorithms achieve better results. This is mainly attributed to the design of the proposed power estimation algorithm. The fridge status may be falsely considered OFF prior to an actual turning-off, due to similar power step-down recordings caused by appliances other than the target one. A possible solution is to determine the fridge pulse duration. However, this is practically infeasible, since the fridge pulse duration varies significantly with the temperature difference between the inside and the outside of the appliance. Finally, by ranking all methods in terms of the F1-score and accuracy, it can be seen that the proposed method is the second-best and first, respectively, among all examined solutions (where the corresponding metrics were available).
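The premature-OFF effect described above can be reproduced with a deliberately simplified constant-power pulse model. The sketch below is not the paper's power estimation algorithm; it only assumes that the pulse height equals the turn-on step and that the pulse ends at the first step-down of comparable magnitude. The function name, the tolerance value and the synthetic trace are hypothetical.

```python
def estimate_pulse(aggregate_w, on_index, tolerance=0.2):
    """Model the target appliance as a constant-power pulse.

    Assumptions (illustrative only): the pulse height is the step-up seen at
    on_index, and the pulse ends at the first step-down whose magnitude lies
    within `tolerance` (relative) of that step-up.
    """
    step_up = aggregate_w[on_index] - aggregate_w[on_index - 1]
    estimate = [0.0] * len(aggregate_w)
    for i in range(on_index, len(aggregate_w)):
        delta = aggregate_w[i] - aggregate_w[i - 1]
        is_matching_step_down = (
            i > on_index
            and delta < 0
            and abs(abs(delta) - step_up) <= tolerance * step_up
        )
        if is_matching_step_down:
            break  # the appliance is considered OFF from here on
        estimate[i] = step_up
    return estimate


# Synthetic aggregate: a 100 W fridge runs during samples 2..8, while another
# 110 W appliance runs during samples 4..5 and switches off at sample 6.
aggregate = [50.0, 50.0, 150.0, 150.0, 260.0, 260.0, 150.0, 150.0, 150.0, 50.0, 50.0]
fridge_estimate = estimate_pulse(aggregate, on_index=2)
on_samples = sum(1 for w in fridge_estimate if w > 0)
```

With this trace the 110 W step-down at sample 6 is mistaken for the fridge turning off, so only 4 of the 7 ON samples are recovered. Precision is unaffected, but recall drops, which is consistent with the behaviour discussed for the fridge.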
By analysing the washing machine results in Table 9, it can be observed that the proposed method presents a relatively high MAE; seven out of the fourteen examined methods perform better. Regarding energy estimation, the proposed method can be considered the second-best in terms of RE, following the seq2point implementation [20]. Moreover, the proposed method presents the highest precision and the lowest recall among the examined solutions. This is due to the fact that the proposed system is specifically designed to detect the most energy-intensive and shortest process of the appliance, i.e., heating. The rest of the washing machine operation cycles, e.g., drum spinning and rinsing, are not taken into account, since they are low-energy, longer-duration processes and thus of less importance. This implies that the proposed NILM system can accurately estimate the washing machine energy consumption (low RE value) but predicts the appliance status outside the water heating process as OFF, resulting in low recall and high MAE. Some of the current state-of-the-art NILM systems can indeed detect these less energy-intensive processes. However, this results in an increased number of FP and consequently in low precision. Note that the low precision (although the highest among the examined solutions) is attributed to the fact that the transient response of the heating process is similar to that of other household appliances and may thus lead to an increased number of FP predictions. Finally, the F1-score and accuracy metrics rank the proposed method as the third- and fourth-best, respectively, among the examined solutions (where metrics were available).
Finally, regarding the microwave oven (Table 10), the proposed method outperforms the examined NILM methods, presenting the lowest MAE and RE as well as the highest precision, F1-score and accuracy. Better results by other methods are observed only in terms of recall. This is due to the fact that the proposed system cannot detect the microwave oven standby mode of operation. However, the power consumption during this period can be considered negligible. It is also important to note that in NILM, from a user-experience point of view, precision is considered more important than recall; missing an appliance event is preferable to detecting an appliance event that has not actually occurred. In this sense, missing standby modes is preferable to predicting false microwave end-uses. The superiority of the proposed method for the microwave oven is based on the following: (a) the microwave transient response pattern is unique and can thus be easily identified, and (b) the microwave oven end-use duration is short, varying from a few seconds to minutes; thus, the number of possible turning-off events caused by other appliances that may degrade the power estimation algorithm performance is very limited.

Computational and Memory Efficiency
The proposed methodology is designed to be memory and computationally efficient. The first part, i.e., the event detector, calculates the power difference over time. The second part, i.e., the classifier, is triggered only when a significant power step-up is detected. If the classifier detects a target appliance, the power estimation algorithm is enabled. This event-based approach can be considered computationally efficient compared to other solutions that operate continuously, i.e., even when no turn-on event occurs. Furthermore, the transient response classifier consists of 54,377 parameters. This is a small number compared to other end-to-end deep learning models requiring a number of parameters in the order of millions, e.g., the models proposed in [19] range from 1 million to more than 150 million parameters. Therefore, the proposed NILM system can be considered memory efficient.
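As a rough illustration of the memory claim, the parameter counts quoted above can be converted to storage footprints. The sketch below assumes 4 bytes per parameter (32-bit floats); the actual storage format is not stated in the text, and quantized weights would be smaller.

```python
BYTES_PER_PARAM = 4  # assumption: 32-bit floating-point weights

PARAMS_PROPOSED = 54_377          # proposed transient response classifier
PARAMS_REF_MIN = 1_000_000        # lower end of the range reported for [19]
PARAMS_REF_MAX = 150_000_000      # upper end of the range reported for [19]


def footprint_kib(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Weight storage in KiB for a model with n_params parameters."""
    return n_params * bytes_per_param / 1024


proposed_kib = footprint_kib(PARAMS_PROPOSED)        # about 212 KiB
ref_min_mib = footprint_kib(PARAMS_REF_MIN) / 1024   # about 3.8 MiB
ref_max_mib = footprint_kib(PARAMS_REF_MAX) / 1024   # about 572 MiB
size_ratio = PARAMS_REF_MIN / PARAMS_PROPOSED        # about 18x
```

Under this assumption the classifier weights occupy roughly 212 KiB, against roughly 3.8 MiB to 572 MiB for the range reported for [19].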
The only drawback is the use of 100 Hz active power data to recognize appliance turn-on events when a transient occurs. However, this feature is important to enable the real-time application of the proposed NILM system, contrary to other approaches requiring power data over more extended periods (minutes to hours) in order to identify which appliance is operating. Moreover, it must be noted that the 100 Hz time series is used only when an event is detected, and only a 6 s window is extracted. Based on the above, it is evident that the proposed system can operate on the edge without the need for high-end microprocessors.
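The event-driven control flow described in this section can be sketched as follows. This is a minimal illustration, not the actual implementation: the threshold value, the choice to start the 6 s window at the event sample, and the stub classifier and estimator are all assumptions, and a single 100 Hz series is used for simplicity.

```python
def detect_step_ups(power_w, threshold_w):
    """Indices where power rises by at least threshold_w between consecutive samples."""
    return [i for i in range(1, len(power_w))
            if power_w[i] - power_w[i - 1] >= threshold_w]


def extract_window(power_w, start, fs_hz=100, window_s=6):
    """Slice a window_s-second window (600 samples at 100 Hz) starting at the event."""
    return power_w[start:start + fs_hz * window_s]


def run_pipeline(power_w, threshold_w, classify, estimate):
    """Invoke the classifier only on detected step-ups, the estimator only on a match."""
    results = []
    for i in detect_step_ups(power_w, threshold_w):
        label = classify(extract_window(power_w, i))
        if label is not None:  # a target appliance was recognized
            results.append((i, label, estimate(power_w, i)))
    return results


# Synthetic 10 s trace at 100 Hz: 50 W baseline, then a 1450 W step at t = 3 s.
trace = [50.0] * 300 + [1500.0] * 700
classifier_calls = []


def stub_classifier(window):
    """Hypothetical stand-in for the CNN; records the window length it receives."""
    classifier_calls.append(len(window))
    return "target"


def stub_estimator(power_w, i):
    """Hypothetical stand-in: uses the step height as the appliance power."""
    return power_w[i] - power_w[i - 1]


events = run_pipeline(trace, threshold_w=1000.0,
                      classify=stub_classifier, estimate=stub_estimator)
```

Over the 1000 samples of this trace the classifier is invoked exactly once, on a 600-sample window, which is the source of the computational saving compared to models that run on every sample.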

Discussion-Towards Scalable Real-Time NILM Services
As already reported, the proposed methodology is implemented as a real-time scalable solution with minimum hardware requirements, thus allowing utilities to perform a large-scale deployment. However, some criteria need to be met from an industry perspective before such a service is massively adopted. Coming up with the correct blend of characteristics is not a trivial issue. So, it is no surprise that no real-time NILM solution based on sub-second energy data resolution has been rolled out at scale (>50 K end-users) globally yet. In this section, four necessary criteria are investigated and we examine whether the proposed methodology meets them.

1. First of all, as expected, comes the accuracy metric. Accuracy usually refers to a weighted combination of (i) correctly detected events, (ii) precise energy consumption estimation for the detected appliance events and (iii) minimized FP. Energy companies and electricity consumers usually trust a NILM service when its accuracy exceeds 90% and when they are not receiving reports for appliances/activities that never actually occurred.

2. Second comes the data resolution and, as a result, the data volumes required for an accurate NILM output. As mentioned above in Section 3, sub-second data granularity is needed for real-time appliance identification. Note that most of the solutions presented in the literature deal with kHz or even MHz data. Considering as a rule of thumb that 1-s resolution data from separate phases in a 3-phase installation result in almost 1 GB of data being produced per year, we realize that moving into the kHz resolution range makes data parsing, storing and analysing a rather complicated, costly and therefore non-scalable option.

3. Next in the list comes the computational/RAM efficiency of such a service. Although the recent trend was to move everything to the cloud, NILM vendors and energy companies now realise that such a decision is not always the most cost-effective; quite the opposite, actually. Running, for example, the whole service for ~100 k end users on the cloud can increase cloud operation costs so much that no business case can be built on top of a NILM layer, no matter how accurate it is. So, the key to unlocking scalability opportunities here is to build a system that is so efficient that it can run on the edge instead of the cloud.

4. Strictly connected to the hardware constraints of the previous point comes the hardware cost. Traditionally, sub-second data can be acquired only via DIN meter hardware installed in the metering cabinet (it is only recently that a few smart-meter manufacturers have made >1 Hz resolutions available through their S1 port [53]). On the other hand, utilities and energy retail companies see NILM as a great customer engagement tool on top of which they can build value-added services, and they usually tend to offer it as a freemium service. As a result, hardware cost has to be as low as possible and ideally within the companies' customer retention and acquisition budgets.
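The data-volume rule of thumb in the second criterion can be checked with a back-of-the-envelope calculation. The sketch below assumes about 10 bytes per stored sample; this record size is not given in the text and is chosen only because it reproduces the quoted figure of almost 1 GB per year. It also assumes continuous storage of every sample, which is a worst case.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def yearly_volume_gb(fs_hz, phases=3, bytes_per_sample=10):
    """Yearly raw data volume in GB for continuous per-phase sampling.

    Assumption: bytes_per_sample = 10 is an illustrative record size only.
    """
    return fs_hz * phases * SECONDS_PER_YEAR * bytes_per_sample / 1e9


volume_1hz = yearly_volume_gb(1)         # about 0.95 GB per year
volume_100hz = yearly_volume_gb(100)     # about 95 GB per year
volume_1khz = yearly_volume_gb(1_000)    # about 946 GB per year
volume_10khz = yearly_volume_gb(10_000)  # about 9.5 TB per year
```

Under these assumptions the volume scales linearly with the sampling frequency, so a single installation sampled continuously at several kHz would produce terabytes per year, which illustrates why kHz-range solutions are considered hard to scale.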
In Figure 7, three of the criteria mentioned above are analyzed for the proposed system, i.e., accuracy, sampling frequency and computational burden. As we can see, scalability-related criteria #1 and #2 are met; the proposed system presents accuracy higher than 90% in all examined cases by utilizing a sampling frequency of 100 Hz (see results in Section 4). Although this frequency is high, it is still considerably lower than the resolution of several kHz used in most state-of-the-art real-time implementations [33][34][35][39]. To that end, it is an excellent "do a lot with a little" decision to take. Regarding criterion #3, i.e., computational and memory efficiency, as demonstrated in Section 4.5, the optimized design can run efficiently on the edge and even on low-cost chipsets. Specifically, in Figure 7, it is assumed that the "High" value refers to expensive algorithms, incorporating a large number of parameters, that cannot be easily integrated into a low-cost microprocessor. On the other hand, the "Very Low" value refers to low computational complexity algorithms that can be integrated and run on a low-cost microprocessor. The proposed system lies between the "Low" and "Very Low" areas. Criterion #4 is expected to be met as a consequence of #3. However, such an investigation falls outside the scope of this paper.

Conclusions
In this paper, a novel real-time event-based energy disaggregation methodology is introduced. Initially, a simple event-detection algorithm is proposed to find time instants when an appliance is turned-on and extract transient responses at 100 Hz. Next, a convolutional neural network classifier identifies if a transient response was caused by a target appliance. Finally, a power estimation algorithm is implemented considering appliance end-uses as pulses of constant power. Experimental results show a promising performance for specific appliances.
Unlike most relevant papers in the literature, the proposed non-intrusive load monitoring system can identify in real time when an appliance is turned on, based on its information-rich transient response sampled at 100 Hz. Furthermore, it is delay-free since, once a target appliance has been turned on, the active power can be directly calculated. Moreover, the system is computationally and memory efficient and can be integrated into smart meters.
The proposed approach can be used for a significant number of appliances with negligible error. However, energy consumption of specific appliances, e.g., heat pump, tumble dryer, including many states of operation, cannot be calculated by the proposed methodology. For such cases, dedicated algorithms should be implemented. As future steps, a more robust power estimation algorithm will be examined for multi-state appliance uses. Additionally, the proposed methodology will be tested on more types of appliances.
Funding: This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH-CREATE-INNOVATE (project: T2EDK-03898).

Data Availability Statement:
The publicly available BLUED [48] dataset, as well as a private dataset obtained from NET2GRID BV that is not publicly available, have been used in this study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: