Concatenate Convolutional Neural Networks for Non-Intrusive Load Monitoring across Complex Background

Non-Intrusive Load Monitoring (NILM) provides a way to acquire detailed energy consumption and appliance operation status through a single sensor, which has been proven to save energy. Further, besides load disaggregation, advanced applications (e.g., demand response) need to recognize on/off events of appliances instantly. In order to shorten the time delay for users to acquire the event information, it is necessary to analyze extremely short period electrical signals. However, the features of those signals are easily submerged in complex background loads, especially in cross-user scenarios. Through experiments and observations, it can be found that the feature of background loads is almost stationary in a short time. On the basis of this result, this paper provides a novel model called the concatenate convolutional neural network to separate the feature of the target load from the load mixed with the background. For the cross-user test on the UK Domestic Appliance-Level Electricity dataset (UK-DALE), it turns out that the proposed model remarkably improves accuracy, robustness, and generalization of load recognition. In addition, it also provides significant improvements in energy disaggregation compared with the state-of-the-art.


Introduction
Energy consumption has long been a major global concern, and accurate, efficient load monitoring can help to reduce it. Load monitoring has two branches: intrusive load monitoring (ILM) and non-intrusive load monitoring (NILM). The major difference between them is the number of sensors. ILM requires at least one sensor at each appliance to monitor its load individually, whereas NILM requires only one sensor on the bus of each house. NILM is physically simpler at the cost of more complex algorithms, and NILM-based approaches are therefore more widely researched.
NILM has two main objectives: energy disaggregation and load recognition. Conventional energy disaggregation aims to obtain the energy consumption of every individual appliance over its entire operation cycle, and some research discards event detection and disaggregates energy consumption at the hour level [1]. Its main concern is the approximate power consumption of each appliance in a certain period, so the exact on/off time of appliances is unknown. In online load recognition, appliances are detected from the incomplete operation cycle, that is, the transient-state process of on/off events. In addition, the result of load recognition further benefits energy disaggregation. Some advanced applications in the smart grid, such as demand response, need to acquire appliance operation status for remote household control [2]. In demand response, the power supply side uses incentive mechanisms (e.g., prices that change over time) to improve end users' energy consumption patterns, which requires demand and supply to be balanced in real time [3].
Real-time load recognition requires short sample windows and short execution time. The sample window is usually on the order of seconds, while the execution time is less than a second. The former therefore has the more significant impact on real-time performance, so the transient-state process should be captured and recognized. A high sampling rate is necessary to acquire sufficient information from transient-state features. Features at sufficiently high frequencies (i.e., 100 kHz or higher), such as electromagnetic interference (EMI) signatures, can easily distinguish different appliances, but EMI features can only be transmitted within a few meters, which is not suitable for household or industrial use. This paper uses 2 kHz features, which introduces interference from background loads, so the emphasis of this paper is on eliminating the influence of complex background loads. Different houses contain diverse types of appliances, and even appliances of the same type vary by brand across households. It is hard for researchers to collect all combinations of appliances.
Briefly, household electrical signals are multi-source single-channel data. Similar problems arise in multi-source signal processing, such as the cocktail party problem in speech recognition and blind source separation (BSS) in wireless communications. Multichannel observations are necessary in these areas: whether the data is two-channel audio-visual data [4] or a two-channel observed audio signal [5], the data source itself makes separation possible. In speech recognition methods, noiseless samples are used for training and for comparison with the separated samples, and artificial communication signals are separable in time or frequency, or are orthogonal to each other. Unlike in these problems, individual appliance samples can hardly be obtained in the single-channel scenario by non-intrusive approaches. To extract the feature of the target appliance from the load mixed with background loads, this paper utilizes the fact that the electrical signal is approximately stationary in a short time; in other words, loads are strongly correlated during this period.
Owing to the powerful feature extraction ability of deep learning, this paper proposes an efficient network called the concatenate convolutional neural network. The model combines signal processing and pattern recognition to eliminate the influence of background loads in single-channel data and to recognize different appliances. The development of deep neural networks makes it possible to extract features from high-dimensional spectrograms. A series of deep convolutional networks for image classification have been proposed, such as Extreme Inception (Xception) [6], Residual Net (ResNet) [7], and Dense Convolutional Network (DenseNet) [8]. In this paper, high-frequency current data is converted to spectrograms by the Short Time Fourier Transform (STFT) and used as the model input. Two proven networks are used as the embedding layer to extract spectrogram features. However, it is problematic to apply object recognition methods to the model input without modification. In computer vision, the object to be recognized covers the background, so the background has little substantial impact on recognition, whereas the foreground appliance in a spectrogram is blurred by the superimposition of background loads.
The main contributions of this paper are as follows.
(1) The model eliminates the impact of background loads to a certain extent and achieves an F1-score of 89.0% in the classification task.
(2) The proposed approach improves the performance of energy disaggregation, especially on multi-state appliances and programmable appliances. It increases the F1-score by 3-73% and reduces the mean absolute error (MAE) by 3.1-24.2 watts.
The proposed model is evaluated on the UK Domestic Appliance-Level Electricity dataset (UK-DALE) [9] and the Building-Level fUlly labeled dataset for Electricity Disaggregation (BLUED) [10]. The proposed model is also compared with several works in terms of energy disaggregation. The remainder of this paper introduces the research status of NILM and the proposed network model. Next, the results of the classification and energy disaggregation experiments are presented. Finally, this paper provides conclusions and future work.

Related Work
NILM was proposed by Hart more than 20 years ago [11], and numerous related research projects have been carried out since then. In the demand side management (DSM) program, the home energy management system comprises active appliances and passive appliances [12]. The active appliances consist of energy sources and energy storage systems [13,14]. As in most NILM methods, only passive appliances are considered in this paper. Approaches to improving classification accuracy or energy disaggregation results are closely related to data acquisition and feature selection.
The sampling rate in data acquisition can be roughly divided into high-frequency and low-frequency, which directly affects the selection of features and approaches. Low-frequency data is typically used for approaches based on steady-state features, whereas high-frequency data is used for transient-state analysis. Low-frequency data is easy to acquire and suitable for energy disaggregation [15,16]. The fundamental frequency is either 50 Hz or 60 Hz in most countries, so sampling at several Hertz or lower is not suitable for spectrum analysis. High-frequency sampling at over 1 kHz captures features of higher harmonics and thus helps to recognize different loads.
Prior work [17] showed that the power spectrum computed from 15 kHz transient-state data can identify different loads with the same steady-state active and reactive power.
The higher the sampling frequency, the richer the information obtained. Multiple appliances can be distinguished by high-frequency signatures. Previous research showed that when the sampling rate was as high as 1 MHz, the EMI signatures of almost all appliances were distinguishable on the spectrograms [18], but the signatures could not be transmitted over a long distance, even within a house of a few tens of square meters. Some datasets [19,20] provide low-noise data measured with no background load, and on such data there is no need to address these problems with complex networks.
In the field of power, features are almost always derived, directly or indirectly, from current, voltage, and power. These time-domain or frequency-domain features are used in various methods, mainly common machine learning algorithms such as support vector machine (SVM) [21], k-nearest neighbors (kNN) [21], decision tree [22], wavelet design [23], evolutionary algorithm [24], adaptive boost [24], graph signal processing (GSP) [25], and hidden Markov model (HMM) [16,26,27]. Although some methods show good robustness under noisy conditions [26,27], that noise is different from background loads. The signal noise in these works derives from measurement error and voltage fluctuation, which prove to have less impact (see Section 3).
The progress of deep learning has had a positive impact on feature extraction. Deep neural networks have been exploited to learn from high-dimensional images or sequences. Auto-encoder (AE) [15,28], long short-term memory (LSTM) [15,22], and sequence-to-point (Seq2point) neural networks [29] have been applied to the NILM task. The model input of all these methods is a low-frequency time series, and the influence of background loads is not considered. Their target is offline energy disaggregation, whereas the proposed approach can eliminate background loads and recognize different appliances. Experiments in Section 4 compare results with and without processing of the background load. Some low-frequency NILM approaches perform load classification or disaggregation in real time [26,30], and the high-frequency approach in this paper can also achieve short sample windows and short execution time.

Concatenate Convolutional Neural Networks
The core idea of this paper is to use features extracted by convolutional neural networks (CNNs) and then eliminate the influence of background loads with concatenate networks. The interference with recognition is divided into two parts: voltage fluctuation and multiform background loads. When the voltage fluctuation is regarded as noise, estimation on the UK-DALE dataset shows that the signal-to-noise ratio (SNR) of voltage waveforms is 53 dB. For the measurement error, current and voltage waveforms have an SNR of 90 dB [9]. Background loads, on the other hand, can act as strong noise on the target appliance. For example, for a 1000 W appliance, the SNR is 10 dB when background loads are only 100 W, and 0 dB when background loads are 1000 W. The influence of background loads is evidently much stronger than that of the voltage fluctuation. Therefore, the main purpose of this paper is to eliminate background loads.
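The two SNR figures quoted for background loads follow from treating the background power as noise on the target power. A minimal check, assuming the usual power-ratio definition of SNR in decibels (the function name `snr_db` is illustrative, not from the paper):

```python
import math

def snr_db(target_power_w: float, background_power_w: float) -> float:
    """SNR in dB when the background load is treated as noise on the target load."""
    return 10.0 * math.log10(target_power_w / background_power_w)

# Values from the text: a 1000 W appliance against 100 W and 1000 W of background load.
snr_light_background = snr_db(1000.0, 100.0)    # 10 dB
snr_heavy_background = snr_db(1000.0, 1000.0)   # 0 dB
print(snr_light_background, snr_heavy_background)
```

Both values are far below the 53 dB and 90 dB quoted for voltage fluctuation and measurement error, which is why the background load is the dominant interference.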

Problem Statement
Originally, the inspiration for this paper comes from the parallel circuit shown in Figure 1. Assume that the main circuit current is $I_0$, the branch currents are $I_1$ and $I_2$, and the background load and the target load are $Z_1$ and $Z_2$, respectively. When the switch K is turned off, we have $I_{0(off)} = I_{1(off)}$, and $I_{0(off)}$ is known. After the switch K is turned on, we have $I_{0(on)} = I_{1(on)} + I_{2(on)}$ according to Kirchhoff's current law (KCL), and $I_{0(on)}$ is known. In the ideal case, the effect of $Z_2$ on $Z_1$ and the fluctuation of the $Z_1$ branch are ignored, thus $I_{1(off)} = I_{1(on)}$, and $I_{2(on)}$ can be expressed by:
$$I_{2(on)} = I_{0(on)} - I_{0(off)} \tag{1}$$
However, the experiments shown in Figure 2 prove that Equation (1) does not strictly hold. Figure 2a shows the microwave spectrogram without background loads, which is measured in the laboratory. Figure 2b shows the microwave spectrogram in the UK-DALE dataset, and Figure 2c shows the spectrogram calculated by spectral estimation based on the hypothesis of Equation (1). Although there is a strong correlation between Figure 2a and Figure 2c, spectral estimation does not eliminate the background load precisely: there are still intermittent spectral lines before the appliance is turned on. Besides, compared to Figure 2a, some detail components are eliminated in Figure 2c after the appliance is turned on. These phenomena show that the background load has a certain degree of stationarity, but it is not exactly unchanged, and $I_{1(off)}$ is not equal to $I_{1(on)}$. Accordingly, the background load needs to be estimated more reasonably. To this end, this paper proposes a novel approach, the concatenate deep neural network, to estimate the features of $I_{2(on)}$ indirectly, which can be represented as:
$$X_{2(on)} = d_\varphi\left(X_{0(on)}, X_{1(on)}\right), \quad X_{1(on)} = s_\phi\left(X_{0(off)}, X_{0(on)}\right) \tag{2}$$
where $s_\phi$ is the function to extract the similar part of two features $\{X_{0(off)}, X_{0(on)}\}$, $d_\varphi$ is the function to extract the different part of two features $\{X_{0(on)}, X_{1(on)}\}$, and $X$ with different subscripts are features of spectrograms computed from the corresponding current, which can be calculated by:
$$X = f_\theta(SI) \tag{3}$$
where $SI$ is the spectrogram. The function $f_\theta$ is used to extract the features of the mixed load spectrogram and the background spectrogram.
The functions $s_\phi$ and $d_\varphi$ are inspired by Code Division Multiple Access (CDMA), which allows multiple users to transmit independent information within the same bandwidth simultaneously and uses orthogonal spreading codes to distinguish and extract signals from different users [31].
In the circuit model, the main circuit current $I_0(t)$ is given by:
$$I_0(t) = \sum_{k=1}^{K} I_k(t) \tag{4}$$
where $K$ is the number of branches. The branch current $I_k(t)$ is expressed as:
$$I_k(t) = I_k^{rated}\, c_k(t) \tag{5}$$
where $I_k^{rated}$ is the rated current of the load on the k-th branch, which is a constant, and $c_k(t)$ represents the time-varying noise function of the load.
Since appliances are relatively independent in construction and operation, their noise functions are almost uncorrelated with each other and can act as the spreading code in CDMA.
The i-th branch current is recovered by multiplying the main circuit current by the noise function $c_i(t)$:
$$I_0(t)\, c_i(t) = \sum_{k=1}^{K} I_k^{rated}\, c_k(t)\, c_i(t) \tag{6}$$
As with the spreading codes designed for CDMA, the noise functions are assumed to be orthogonal in the ideal case, where
$$c_k(t)\, c_i(t) = \begin{cases} 1, & k = i \\ 0, & k \neq i \end{cases} \tag{7}$$
Therefore the branch current is restored as:
$$I_0(t)\, c_i(t) = I_i^{rated} \tag{8}$$
Unfortunately, unlike CDMA, it is difficult to get the precise "spreading code" for each appliance in practice, and background loads usually consist of multiple loads. Therefore network fitting is used to recover the branch current. The first step is to extract the spreading code, and the second step is to reconstruct the feature of the branch current. Then the simplified form of Equation (4) is:
$$I_0(t) = I_B(t) + I_L(t) = I_B^{rated}\, c_B(t) + I_L^{rated}\, c_L(t) \tag{9}$$
where $I_B$ and $I_L$ represent the branch current of background loads and of the load to be recognized (i.e., the target load), respectively. $c_B(t)$ and $c_L(t)$ are weakly correlated. In practice, it is easy to get the previous background loads $I_{B'}(t)$ before the target load is turned on. On the hypothesis that background loads are stationary in a short time, the relationship between the noise functions $c_{B'}(t)$, $c_B(t)$ and $c_L(t)$ is stated as:
$$c_{B'}(t) = c_B(t), \quad c_{B'}(t)\, c_L(t) = 0 \tag{10}$$
In fact, such estimation is not rigorous, because stationarity does not mean complete equality, and weak correlation does not mean strict independence. Thus the similarity learning module $s_\phi$ is used to fit Equation (10), and the feature of background loads is obtained by:
$$X_{1(on)} = s_\phi\left(X_{0(off)}, X_{0(on)}\right) \tag{11}$$
For the branch current, $I_L(t)$ can be calculated by subtraction through Equation (9). For the corresponding feature, the difference learning module $d_\varphi$ is used to fit the subtraction operation and obtain the feature of $I_L(t)$ by:
$$X_{2(on)} = d_\varphi\left(X_{0(on)}, X_{1(on)}\right) \tag{12}$$
In summary, the model consists of an embedding module, a similarity learning module, a difference learning module, and a classifier, which together realize the complete process of feature extraction, feature selection, and classification.
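As a toy numerical illustration of the CDMA analogy (not the paper's method), the sketch below recovers each branch's rated current from the summed mains current by multiplying with that branch's code. It assumes ideal ±1 Walsh-Hadamard codes as stand-ins for the noise functions and takes orthogonality in the time-averaged sense used by CDMA receivers; the rated currents are invented for the example.

```python
import numpy as np

# Rows of a 4x4 Hadamard matrix: mutually orthogonal +/-1 codes (hypothetical "noise functions").
codes = np.array([
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
], dtype=float)[1:]                      # use three of the four codes

rated = np.array([0.5, 4.0, 8.7])        # hypothetical rated currents (A) of three branches
branch = rated[:, None] * codes          # I_k(t) = I_k^rated * c_k(t)
mains = branch.sum(axis=0)               # I_0(t) = sum over k of I_k(t)

# Despread: multiply the mains current by each code and average over the code length.
recovered = (mains * codes).mean(axis=1)
print(recovered)
```

With real appliances no such exact codes are available, which is the motivation the text gives for learning the similarity and difference functions instead.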

Model Architecture
Based on the previous hypothesis, each spectrogram image is split into two blocks representing the background load and the mixed load, as shown in Figure 3. The two blocks are input into the network simultaneously, as shown in Figure 4. The CNNs in the embedding module are placed at the front end of the network to convert the image matrix into a vector, which is $f_\theta$ in Equation (3); here the module comprises two networks $\{f_{\theta_1}, f_{\theta_2}\}$ with the same structure and different parameters. The similarity learning module $s_\phi$ is used to generate the similar part of the concatenated feature and obtain the background feature behind the mixed feature. We refer to this as the "implicit background", distinct from the "explicit background" extracted from the background-only load. The concatenation is channel-wise. The difference learning module $d_\varphi$ converts the features of the mixed load and the "implicit background" to the target feature $X_{2(on)}$. The final classifier determines the label of the target load from the target feature maps. The loss function is the cross-entropy function:
$$\mathcal{L} = -\sum_{i=1}^{C} y_i \log Z_i \tag{13}$$
where $y_i$ is the i-th bit of the one-hot label $y$ of $C$ classes, and $Z_i$ is the i-th element of the network output $Z$ with the softmax activation, which can be represented as:
$$Z = \mathrm{softmax}\left(h_\sigma\left(X_{2(on)}\right)\right) \tag{14}$$
where $h_\sigma$ is the classifier that maps the obtained target spectrogram feature to the C-dimensional vector, and softmax is the activation.
Figure 5 presents the detailed architecture of the proposed network with the embedding module omitted. The similarity learning module resembles a residual block, although it is not identical to one. The shortcut connection here aims to convey information about the mixed feature. Each convolutional layer (Conv) comprises a 256-filter convolution of kernel size 7 × 2 and stride 1. The kernel size is determined by the embedding vector computed by the CNN. In all experiments, the size of the embedding vector (the background feature and the mixed feature) is 7 × 2 × 256. The similarity learning module is followed by an average-pooling layer (AvgPool) with kernel size 7 × 2 to compress the feature map in height and width. The output size of the two fully connected (FC) layers in the difference learning module is 256. For the classifier, the two FC layers are 64 and C dimensional, respectively.
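The softmax output and cross-entropy loss described above can be evaluated numerically as follows. This is a minimal NumPy sketch; the logits stand in for the classifier output and are made up for illustration, with C = 7 to match the seven appliance classes used later in the paper.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector of class scores."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(one_hot: np.ndarray, probs: np.ndarray, eps: float = 1e-12) -> float:
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return float(-np.sum(one_hot * np.log(probs + eps)))

# Hypothetical classifier output for one sample, C = 7 classes.
logits = np.array([2.0, 0.5, -1.0, 0.0, 0.0, 1.0, -0.5])
y = np.zeros(7)
y[0] = 1.0                       # true class is index 0

z = softmax(logits)
loss = cross_entropy(y, z)
print(z.round(3), round(loss, 4))
```

Because the label is one-hot, the loss reduces to the negative log-probability assigned to the true class.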

Experiments and Discussion
The proposed approach is compared with two baselines and evaluated on the UK-DALE dataset and the BLUED dataset to test its universality. An energy disaggregation result on the UK-DALE dataset is also provided.

Dataset and Data Preprocessing
Both the real and imaginary parts of spectrograms are taken as model input; in other words, complex spectrograms are computed instead of power spectrograms or magnitude spectrograms. The two-channel data is more expressive not only because it contains phase information, but also because spectrograms are additive in the complex domain. On this account, the background load can, as envisaged, be removed from the mixed load.
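The additivity argument can be checked directly: the Fourier transform is linear, so the complex spectrum of a sum equals the sum of the complex spectra, whereas magnitudes do not add. The sketch below uses random signals as hypothetical stand-ins for one frame of background and target current.

```python
import numpy as np

rng = np.random.default_rng(0)
background = rng.standard_normal(448)   # hypothetical background current frame
target = rng.standard_normal(448)       # hypothetical target current frame

spec_bg = np.fft.rfft(background)
spec_tg = np.fft.rfft(target)
spec_mix = np.fft.rfft(background + target)

# Complex spectra add exactly (up to rounding); magnitude spectra do not.
complex_additive = np.allclose(spec_mix, spec_bg + spec_tg)
magnitude_additive = np.allclose(np.abs(spec_mix), np.abs(spec_bg) + np.abs(spec_tg))
print(complex_additive, magnitude_additive)
```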
Every switching event contains 7 seconds of current data, that is, 14,000 data points. As shown in Figure 6, plenty of samples are drawn to confirm that background loads are stationary in a 7-second window in most cases. Note that 15-second spectrograms are drawn in this paper in order to give a more complete demonstration, but this does not affect the 7-second spectrograms actually used in experiments. The size of an original 7-second spectrogram is 224 × 100 × 2: the first two dimensions represent frequency and time, and the last dimension represents the real and imaginary parts. The switching point of each sample is located slightly after the midpoint of the time axis. As shown in Figure 3, two blocks are split from the original image as the proposed network input, and the splitting line is at the midpoint of the time axis. Every block of the input image has a size of 224 × 50 × 2.
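The shapes above can be reproduced with a short STFT sketch. The paper does not state its STFT parameters, so the hop of 140 samples, the 448-point Hann-windowed frames, the zero-padding of the tail, and the dropping of the Nyquist bin are all assumptions chosen only so that 14,000 samples yield a 224 × 100 × 2 array; the test signal is synthetic.

```python
import numpy as np

FS = 2000                       # Hz, sampling rate used in the paper
N_SAMPLES = 14_000              # 7-second window
N_BINS, N_FRAMES = 224, 100     # spectrogram size given in the paper
HOP = N_SAMPLES // N_FRAMES     # 140 samples per step (assumed)
N_FFT = 2 * N_BINS              # 448-point frames (assumed)

def complex_spectrogram(current: np.ndarray) -> np.ndarray:
    """Return a (224, 100, 2) array holding real and imaginary STFT parts."""
    padded = np.pad(current, (0, N_FFT - HOP))          # so exactly N_FRAMES frames fit
    window = np.hanning(N_FFT)
    frames = np.stack([padded[i * HOP: i * HOP + N_FFT] * window
                       for i in range(N_FRAMES)])
    spec = np.fft.rfft(frames, axis=1)[:, :N_BINS].T    # (freq, time), Nyquist bin dropped
    return np.stack([spec.real, spec.imag], axis=-1)

# Synthetic current: 50 Hz background, with an extra load switched on after 3.6 s.
t = np.arange(N_SAMPLES) / FS
current = np.sin(2 * np.pi * 50 * t)
current[t > 3.6] += 4.0 * np.sin(2 * np.pi * 50 * t[t > 3.6])

image = complex_spectrogram(current)
background_block = image[:, :N_FRAMES // 2, :]   # first half: background load only
mixed_block = image[:, N_FRAMES // 2:, :]        # second half: background plus target
print(image.shape, background_block.shape, mixed_block.shape)
```

Splitting at the midpoint places the switching point, which lies slightly after it, inside the mixed block.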

UK-DALE
Most of our experiments are based on the UK-DALE dataset. The UK-DALE dataset contains electrical data from 5 houses for up to 655 days. Aggregate (mains) and individual appliance (submetered) power data were recorded at 1/6 Hz for each house. Houses 1, 2, and 5 are selected as the original data source because only these 3 houses contain 16 kHz aggregate voltage and current data; the 16 kHz data is downsampled to 2 kHz to draw spectrograms. The dataset authors did not record 16 kHz data for the submetered channels. The switching events are detected with the help of the submetered data (6-second time interval). For all experiments on UK-DALE, house 1 and house 5 data are used as the training set, and house 2 data is used as the test set to measure the generalization ability of our model. Seven classes of appliances are used for classification: kettle, fridge, dish washer (DW), microwave (MW), washing machine (WM), laptop or monitor (LCD), and running machine (RM). The number of appliance events in each house and the appliance spectrograms are shown in Table 1 and Figure 6, respectively.

BLUED
Unlike the UK-DALE dataset, the BLUED dataset collects data from a single house for only 8 days. The house also has dozens of appliances, so each appliance has very few samples. Table 2 shows the appliances selected to facilitate comparison with the published work based on Fast Shapelets [32]. Raw data (12 kHz) is also downsampled to 2 kHz to draw spectrograms of the same size as those in the UK-DALE dataset. The validation set accounts for 33% of the data, and 3-fold cross-validation is adopted.

Experimental Infrastructure
All neural networks in classification tasks are implemented using TensorFlow and trained on NVIDIA GeForce GTX 1080 GPUs.
In this paper, all networks are trained using the Adam optimizer [33] with an initial learning rate of 0.001, except where noted later.

The Recognition Result on UK-DALE
Two prevalent networks are chosen as the embedding module of the model: Xception and DenseNet-121. We name these models Concatenate-CNNs. Meanwhile, two baselines are compared with the proposed networks. The model inputs and descriptions are shown in Table 3. The first baseline classifies appliances without any treatment of background loads. The original 224 × 100 × 2 spectrogram is input into the CNN without the concatenated features and the subsequent networks.
The second baseline deals with the background load by spectral estimation (SE). The model input is the estimated target spectrogram shown in Figure 2c, and the network is the same as in the first baseline; we call this CNN-SE. In this case, the background current is assumed to be invariant during the window. Specifically, the background feature is obtained by calculating the average spectrum of the first 3 seconds of the entire 7-second window, and the target spectrogram is obtained by subtracting the calculated background spectrogram from the original spectrogram.
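The spectral estimation baseline can be sketched in a few lines. The function below is illustrative rather than the authors' code: it assumes the subtraction is performed on the complex spectrogram and that the first 3 seconds map proportionally onto time frames; the input spectrogram is synthetic.

```python
import numpy as np

def spectral_estimation(spec: np.ndarray,
                        background_seconds: float = 3.0,
                        window_seconds: float = 7.0) -> np.ndarray:
    """Subtract the mean spectrum of the leading background segment.

    spec: complex spectrogram of shape (freq, time).
    """
    n_frames = spec.shape[1]
    n_background = int(n_frames * background_seconds / window_seconds)
    background = spec[:, :n_background].mean(axis=1, keepdims=True)
    return spec - background

# Synthetic example: a perfectly stationary background plus a target from frame 50 onward.
spec = np.tile(np.full((224, 1), 2.0 + 1.0j), (1, 100))
spec[:, 50:] += 5.0                      # hypothetical target load contribution
estimate = spectral_estimation(spec)
print(np.abs(estimate[:, :50]).max(), np.abs(estimate[:, 50:] - 5.0).max())
```

The subtraction is exact here only because the synthetic background never changes; with real data the residual background left in the estimate is what the paper's results attribute the weak performance of this baseline to.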
In Concatenate-DenseNet-121 and its two baselines, the growth rate is set to 32 and the compression rate is 0.5. Other parameters have been described in the previous section.
Three metrics are used to evaluate the performance. 'Recall' indicates the proportion of samples of one class that are correctly recognized, which is given by:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{15}$$
'Precision' represents the proportion of samples recognized as one class that truly belong to that class, which is calculated by:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$
where true positives (TP) are the number of events correctly classified when the appliance was on, false positives (FP) are the number of events classified as on while the appliance was off, and false negatives (FN) are the number of events classified as off while the appliance was on. The F1-score combines recall and precision. Here appliance events are classified without estimating the power consumption, thus the F1-score is reported [34], which is given by:
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{17}$$
The performance of these models on the UK-DALE dataset is reported in Table 4. The performance of the concatenate models depends heavily on the embedding module, and the concatenate structure improves the classification result on this basis. For both CNNs used as embedding layers, the Concatenate-CNN models perform better than the two baselines in average F1-score, as well as in recall and precision for most appliances. The proposed model achieves balanced F1-scores across appliances. However, the second baseline (spectral estimation) is only slightly better than the first baseline for Xception-SE, and DenseNet-121-SE is even worse than DenseNet-121. Accordingly, Concatenate-CNN models eliminate background loads better, and the results make clear that background loads are not exactly stationary. It is not rigorous to estimate the target spectrogram by directly subtracting the background spectrogram (baseline 2).
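The three metrics can be computed from event counts as in the sketch below. The counts are invented for illustration and are not results from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for one appliance class.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

The guards against zero denominators matter for rare classes, where a model may predict no events of a class at all.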
In Table 5, six options of network parameters in the similarity learning module and the difference learning module are compared. The small differences among models 1/2/3/4 show that the proposed model is insensitive to the FC output size/Conv filter size. Model 4 is marginally better than model 2, but its much larger number of parameters increases model complexity and inference time. Model 2 is slightly better than model 6 and significantly better than model 5, which indicates that the Conv kernel size has an observable effect on the model. To sum up, model 2 is selected for the rest of this paper.
To further visualize the effectiveness of the model, the network's response to two samples of the same class at different layers is shown in Figure 7; the mixed feature and the target feature are indicated in Figure 4. The mixed feature shown is averaged in height and width, retaining channel data. The difference of features is obtained by calculating the absolute value of the difference between the two columns on the left. Within a class, the target features are more similar than the mixed features; that is to say, the difference of target features (DF_target) is smaller than the difference of mixed load features (DF_mixed).

The Recognition Result on BLUED
The BLUED dataset contains many types of appliances with few samples. To avoid overfitting, the model parameters of Concatenate-DenseNet-121 trained on UK-DALE are therefore retained and applied to the BLUED dataset, and the model is then trained on the BLUED data for 30 epochs at a small learning rate of 0.0001.
The result of classification on the BLUED dataset is shown in Table 6. The Concatenate-DenseNet-121 distinguishes 6 classes of appliances, whereas the Fast Shapelets algorithm [32] trains a single

The Energy Disaggregation Result on UK-DALE
The main goal of this paper is to recognize the type of appliances, and energy disaggregation is considered a by-product. Switching events recognized by the appliance recognition algorithm determine where the appliance cycle is located, so the appliance recognition results have a great impact on energy disaggregation. On-events are recognized by the Concatenate-DenseNet-121 network, and off-events are then recognized through the power of the previous on-events. After the on/off times are determined, the disaggregated active power data is derived by translation and smoothing of the aggregate data. The entire process of energy disaggregation is summarized in Algorithm 1, and Figure 8 shows an example of energy disaggregation.
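The procedure described above can be sketched as follows. This is one possible reading under stated assumptions, not the authors' implementation: the paper does not spell out the matching rule or what "translation and smoothing" mean in detail. Here a falling edge is matched to a pending on-event when its power drop is within a threshold of the on-event's power step, "translation" is taken as subtracting the pre-event baseline, and "smoothing" as a short moving average. The function name, the event tuple layout, and all numeric values are hypothetical.

```python
import numpy as np

def disaggregate(aggregate, on_events, threshold_w=200.0, smooth=3):
    """Estimate per-appliance power from mains power and recognized on-events.

    aggregate: 1-D sequence of mains active power (W).
    on_events: list of (t_on, label, power_step_w) from the classifier (hypothetical layout).
    Returns a dict mapping label to an array with the same length as aggregate.
    """
    aggregate = np.asarray(aggregate, dtype=float)
    steps = np.diff(aggregate)
    pending = list(on_events)
    estimates = {}
    for t in np.flatnonzero(steps < -threshold_w):          # falling edges
        t_off = int(t) + 1
        drop = -steps[t]
        for event in pending:
            t_on, label, on_step_w = event
            if 0 < t_on < t_off and abs(drop - on_step_w) <= threshold_w:
                # Translation: remove the baseline present just before the on-event.
                cycle = np.clip(aggregate[t_on:t_off] - aggregate[t_on - 1], 0.0, None)
                # Smoothing: short moving average over the operation cycle.
                if len(cycle) >= smooth:
                    cycle = np.convolve(cycle, np.ones(smooth) / smooth, mode="same")
                power = estimates.setdefault(label, np.zeros_like(aggregate))
                power[t_on:t_off] = cycle
                pending.remove(event)
                break
    return estimates

# Synthetic mains signal: 100 W background with a 2000 W kettle on for samples 5..14.
aggregate = np.array([100.0] * 5 + [2100.0] * 10 + [100.0] * 5)
result = disaggregate(aggregate, [(5, "kettle", 2000.0)])
kettle = result["kettle"]
print(kettle.round(1))
```

The moving average with zero-padded edges lowers the first and last samples of each cycle, a side effect of this particular smoothing choice.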

Algorithm 1 Work Flow of Energy Disaggregation
Require:
  The aggregate power data at time t, $P_A(t)$
  The on-events recognized by the classification task, $E_{on} = \{E_{on}(1), \ldots, E_{on}(n)\}$
  The power of individual appliance l, $P_l$
1: Set threshold $T_l$ for appliance l
2: Detect a falling edge e
3: if e matches the power of $E_{on}(k) \in E_{on}$ then
4:   Determine the appliance label l and on/off time $t_{on}$/$t_{off}$ of an operation cycle
5:   for i = $t_{on}$ to $t_{off}$ do
6:

Five target appliances, that is, those of the classification task except LCD and RM, are selected for energy disaggregation. The performance is presented on house 2 appliances with training on house 1 and house 5 data. True or false binary judgments no longer depend only on the classification of switching events, but on whether each data point matches, so the F1-score here takes power estimation into account and is called the M-fscore in [34]. Mean absolute error (MAE) is introduced to measure the error at every single point [15]. Figure 9 shows the disaggregation result on house 2. The proposed model is compared with the best result in Kelly and Knottenbelt's paper [15], that is, the "Rectangles" architecture, in terms of F1-score and MAE, and the performance is also compared with the result of AE [28] in F1-score and of the seq2point network [29] in MAE.
The energy disaggregation results of the recognition model outperform those of the networks dedicated to energy disaggregation. With high disaggregation accuracy, concatenate CNNs can be used as an auxiliary tool for energy disaggregation. Compared with the other two methods, all the metrics are greatly improved; in particular, MAE is reduced for DW, MW, and WM.
The result shows that once the type of switching event is determined, the power waveform does not need a point-to-point reconstruction with neural networks or other approaches. The nuances in waveforms have a small effect, whereas incorrect appliance recognition has a more critical influence. The events in the proposed approach are different from those in Kelly and Knottenbelt's paper [15]. The number of washing machine events in this paper (Table 1) far exceeds theirs (less than 600). This is because multiple state transitions in each entire operation cycle are monitored in this paper. Locating every state transition greatly improves the result of energy disaggregation. Multi-state and programmable appliances will be a trend in the future, and it is difficult to capture and train on the entire operating cycle due to the numerous combinations of states. It is worth mentioning that the approach in this paper recognizes appliances from high-frequency data, while the other methods in this section use low-frequency data. High-frequency data is conducive to locating and distinguishing appliance events, and eliminating the influence of background loads facilitates energy disaggregation.

Conclusions
The concatenate convolutional neural networks proposed in this paper apply well to appliance recognition and energy disaggregation. The proposed models are evaluated on two real-world datasets: UK-DALE and BLUED. Experimental results show the capacity of the proposed model, which can partly resist the interference of the background load on both large and small sample sets. Besides, the approach presents strong generalization ability in appliance recognition and energy disaggregation. The hypothesis of short-time stationarity is a premise of the proposed model, and it also restricts the window length of samples. However, there are still exceptions in which the transient-state process of an appliance lasts more than three seconds (e.g., running machine). Therefore, we intend to overcome this restriction and estimate target features from longer mixed-load windows in the future.

Figure 2. Spectrograms of microwave. (a) The spectrogram with no background load. (b) The spectrogram with background loads. (c) The estimated spectrogram.

Figure 3. The background load part and the mixed load part of a complete spectrogram.
Figure 4.

Figure 5. The detailed architecture of the similarity learning and difference learning modules. The addition operation in the similarity learning module is element-wise.

Figure 6. Spectrograms drawn from appliances in the UK Domestic Appliance-Level Electricity dataset (UK-DALE).

Figure 7. An example of the response of the Concatenate-DenseNet-121 network at different layers. The model input is two kettle samples with different background loads. For each layer, the output is reshaped to form a 2D image.

Figure 8. An example of energy disaggregation. The red dashed box stands for interference from other appliances. The blue circle stands for on/off points of an operation cycle.

Figure 9. Energy disaggregation results on the UK-DALE dataset.
Table 7 records the average execution time (AET) of the Concatenate-DenseNet-121 model. The CPU option means the model is executed with an Intel Core i3 processor and 8 GB RAM. The GPU option has been stated in Section 4.2. The AET consists of the inference time of the classification task and the running time of the disaggregation task. The AET of the proposed model is short enough to meet the requirement of continuous operation.

Table 1. The number of appliance events per house in the UK Domestic Appliance-Level Electricity dataset (UK-DALE).

Table 2. Appliances used in the Building-Level fUlly labeled dataset for Electricity Disaggregation (BLUED).

Table 3. Model input and description.

Table 4. Comparison of the proposed model and two baselines on house 2 data of the UK-DALE dataset with recall (%), precision (%), and F1-score (%). Best results are shown in bold.

Table 5. F1-score (%) on house 2 data of the UK-DALE dataset. Fully connected (FC) output size/Conv filter size and Conv kernel size are tuned in Concatenate-DenseNet-121.

Table 6. The performance of classification on the BLUED dataset.
phase, which only needs to distinguish 3 classes at a time. Nevertheless, the proposed approach improves recall by 3.5%, precision by 19.1%, and F1-score by 11.8%. The result on the BLUED dataset shows the validity of the model on small samples.

Table 7. The average execution time (AET) of the proposed model.