1. Introduction
In recent years, the world energy demand has increased due to the population growth and economic development [
1] and it is expected that it will further increase in the next decades [
2]. The energy demand worldwide is annually increasing both in the residential and the industrial sector with households consuming approximately 40% of the world’s consumed energy [
3,
4]. The technological development of the last decades has led to low costs for buying electrical appliances and the automation of tasks and procedures both in industry and in households, thus it is estimated that the electric power needs will further grow and the average number of electrical appliances per household will significantly increase within the next two decades [
4]. It is estimated that approximately 20% of the energy consumed in the residential sector could be saved by changing consumers’ behavior and by improving the existing poor operational strategies [
5,
6]. Moreover, the development of smart grids and energy demand management systems as well as the fluctuation of power generation due to the increasing percentage of power generated by renewable energies units can confine the issue of annually increasing energy demands [
7,
8]. These changes in energy demand and generation are challenging for network operators and power generation units, since power needs are becoming less stable and less predictable while at the same time energy demand increases [
9,
10]. To address the above mentioned challenges, precise monitoring of electrical energy consumption in the residential sector is needed [
10], as well as proper energy demand prediction and management [
9]. At the moment energy consumption monitoring is mostly done by measuring the aggregated energy consumption in the form of monthly bills and therefore does not address the above-mentioned issues.
The measurement of energy consumption is performed using smart meters (SM). Smart meters measure the voltage drop over a device or a circuit and the current flowing through it at a predefined sampling rate with the sampling period varying from milliseconds to minutes [
11]. The shorter the sampling period, the more accurately the temporal information of the energy consumption signal is recorded. However, a high sampling frequency increases the amount of data acquired per time unit and requires hardware that supports high-frequency A/D conversion, which in general increases the cost of hardware [
12] and might not lead to better disaggregation results [
13]. Most commercial smart meters must use a sampling rate in the order of seconds for the transmission and storage of energy data for several months or years to be feasible and to keep the corresponding hardware costs relatively low.
Energy consumption should not be monitored at a household level but rather at the device level, in order to detect faulty device operation and inefficient or suboptimal operational strategies and thus maximize improvements in terms of energy savings as shown in [
14]. To measure energy consumption at device level, energy usage has to be measured either for each device separately using one smart meter per device or the household aggregated energy consumption (sum of energy consumption from several devices measured at one central point e.g., the power inlet of a household) has to be disaggregated to device level using computational algorithms. When using only one sensor (smart meter) to disaggregate the total consumed energy and to extract energy consumption on the appliance level the task is called non-intrusive load monitoring (NILM) as introduced in [
15]. In the NILM approach the energy disaggregation task is expressed as a single-channel source separation problem, where the smart meter is the only input channel measuring the total power consumption and the goal is to find the inverse of the aggregation function to calculate the energy consumption per device. In intrusive load monitoring (ILM) one smart meter per device is used, thus measuring the energy consumption directly from each device. Compared to ILM, NILM has the advantage of requiring less hardware (ILM uses one smart meter per device, which is impractical for most households) and of being more acceptable to consumers with respect to privacy preservation [
7,
16]. NILM approaches assume that there is a single observation (smart meter measurements) and multiple unknowns (power consumption of electrical devices) making the disaggregation problem highly under-determined and difficult to solve without any further constraints.
Several approaches for NILM have been proposed in the literature. In these approaches one or multi-state electrical devices have been modeled by finite-state machines, i.e., with steady energy consumption behavior per operational state [
15,
17,
18]. In contrast to one/multi-state devices, there is no established approach in detecting appliances with continuous power consumption or with non-linear behavior and a highly-varying power signature [
19,
20]. Researchers have addressed this issue by using high-frequency features or wavelets to detect transient device behavior; however, these come at the cost of more expensive hardware and higher computational requirements [
12,
20,
21]. Therefore, most approaches use disaggregation algorithms with sampling periods in the order of seconds to minutes, in combination with temporal information (e.g., factorial hidden Markov models (FHMM) [
22,
23]) to identify appliances with varying power consumption [
12,
24]. Furthermore, special filtering techniques (e.g., Kalman filters [
25]) with time-varying coefficients and probabilistic approaches using appliance grouping [
26] have been proposed to address the issue of modeling devices with continuous or non-linear characteristics. The NILM approaches can briefly be classified into methods with and without source separation (SS). Approaches without SS are based on the decomposition of the aggregated signal to a sequence of feature vectors, which will be classified to device labels by a machine learning (ML) algorithm (e.g., artificial neural networks (ANN) [
27], decision trees (DT) [
28], hidden Markov models (HMM) [
22], k-nearest neighbors (KNN) [
29], support vector machines (SVM) [
30]) or by a predefined set of rules and thresholds [
31,
32]. Furthermore, recent research in deep learning and big data has led to a significant increase of use of data-driven approaches using large scale datasets (e.g., AMPds [
33]). Approaches based on convolutional neural networks (CNNs) [
34,
35,
36], recurrent neural networks (RNNs) [
37,
38] and long short-term memory (LSTM) networks [
37,
39] have been proposed in the literature, while denoising autoencoders (dAEs) [
40] and gated recurrent units (GRUs) [
36] have also been used. Approaches with SS are based on single-channel source separation algorithms (e.g., non-negative matrix factorization [
41], sparse component analysis [
42]) to extract the consumption of each device from the aggregated signal by using additional constraints (e.g., sparseness or sum-to-one [
43]) during the optimization procedure. The features extracted from the aggregated signal in approaches with and without SS strongly depend on the sampling frequency, with either macroscopic (for low sampling frequency) or microscopic (for high sampling frequency) features being extracted. Macroscopic features are mainly active and reactive power, while statistical values from the active or reactive power (e.g., mean, median, variance or energy) can be estimated as well [
44]. Microscopic features can be current harmonics or transient energy [
31,
45] and require high-sampling frequency to be calculated (1 kHz and above).
In addition to the above-mentioned machine learning-based NILM solutions, approaches using template matching have been proposed. More specifically, in [
46] dynamic time warping (DTW) was used to detect transient signatures for NILM and a weighted DTW was proposed and evaluated for different sampling frequencies. In [
47] a hybrid detection approach utilizing FHMMs and DTW-based iterative subsequence clustering was introduced for generating subsequences to refine initial estimates provided by the FHMM. In [
48] load disaggregation was performed using DTW-based subsequence search, iteratively disaggregating one appliance at a time in order of decreasing energy consumption. In [
49] a DTW-based pattern matching approach was proposed and its performance was compared to HMMs and DTs.
In this paper, an architecture based on elastic matching algorithms for non-intrusive load monitoring is proposed. In contrast to machine learning-based approaches, which require a significant amount of data to train a model, elastic matching-based approaches have no model training process and perform recognition using template matching. Except for a few papers [
46,
47,
48,
49] that have used only the DTW algorithm for NILM, no previous work on the evaluation of elastic matching algorithms for energy disaggregation has been published in the literature. In the proposed architecture, in addition to DTW, several other elastic matching algorithms have been used, namely the global alignment kernel, soft dynamic time warping, minimum variance matching and all common subsequences. The remainder of this article is organized as follows. In
Section 2 five different elastic matching algorithms are reviewed. In
Section 3 the proposed architecture for energy disaggregation using elastic matching is presented. In
Section 4 and
Section 5 the experimental setup and evaluation results are described, respectively. In
Section 6 we conclude this work.
2. Elastic Matching Algorithms
In the context of energy disaggregation five different elastic matching algorithms, which can be used to compare any two time series of unequal lengths, are reviewed. These are the DTW algorithm, which has been used before in the NILM task [
46,
47,
48,
49], as well as the global alignment kernel (GAK), the soft dynamic time warping (sDTW), the minimum variance matching (MVM) and the all common subsequences (ACS), which have not been used before in the NILM task. GAK, sDTW, MVM and ACS algorithms were chosen as they offer additional degrees of freedom on the warping path [
50,
51,
52] compared to the DTW algorithm.
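Considering the aggregated power consumption signal acquired by a smart meter, let $\mathbf{x} = (x_1, \ldots, x_N)$ be a sequence of length N, where $x_i$ is the $i$-th sample of $\mathbf{x}$, and let $\mathbf{y} = (y_1, \ldots, y_M)$ be a second sequence of length M, where $y_j$ is the $j$-th sample of $\mathbf{y}$, with $1 \le i \le N$ and $1 \le j \le M$. Furthermore, let $\Delta(\mathbf{x},\mathbf{y}) = [\delta(x_i, y_j)]_{ij} \in \mathbb{R}^{N \times M}$ be an arbitrary cost matrix, where $\delta$ is a distance metric (e.g., Euclidean distance, Manhattan distance or Kullback–Leibler (KL) distance), and let $\langle A, \Delta(\mathbf{x},\mathbf{y}) \rangle$ be the inner product of matrix A with the cost matrix $\Delta(\mathbf{x},\mathbf{y})$, where A is an alignment matrix with $a_{ij}$ being the alignment score between the $i$-th element of $\mathbf{x}$ and the $j$-th element of $\mathbf{y}$.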
2.1. Dynamic Time Warping
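Based on the above, using the cost matrix $\Delta(\mathbf{x},\mathbf{y})$ and the different alignment matrices A, $\mathrm{DTW}(\mathbf{x},\mathbf{y})$ is the minimum accumulated cost between $\mathbf{x}$ and $\mathbf{y}$ for all possible warping paths in the $N \times M$ search space. Accordingly the minimum cost is defined as in Equation (1) and the recursive update rule for finding the optimal warping path is given in Equation (2) [51,53].
$$\mathrm{DTW}(\mathbf{x},\mathbf{y}) = \min_{p}\, c_p(\mathbf{x},\mathbf{y}), \qquad c_p(\mathbf{x},\mathbf{y}) = \sum_{l=1}^{L} \delta(x_{i_l}, y_{j_l}) \tag{1}$$
$$D(i,j) = \delta(x_i, y_j) + \min\{D(i-1,j-1),\ D(i-1,j),\ D(i,j-1)\} \tag{2}$$
where $c_p(\mathbf{x},\mathbf{y})$ is the accumulated cost associated with any warping path $p = (p_1, \ldots, p_L)$ from $(1,1)$ to $(N,M)$ with path-length L and point $p_l = (i_l, j_l)$. Furthermore the initial conditions for the accumulated cost are set as follows: $D(0,0) = 0$, $D(i,0) = \infty$ for $1 \le i \le N$ and $D(0,j) = \infty$ for $1 \le j \le M$.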
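The recursion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `dtw`, the default absolute-difference cost and the use of plain lists for scalar active-power samples are assumptions made for the example.

```python
import math


def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Return the minimum accumulated alignment cost between sequences x and y.

    `dist` is the local cost between two samples (assumed here to be the
    absolute difference; any other distance can be passed in).
    """
    n, m = len(x), len(y)
    # D[i][j] holds the accumulated cost of aligning x[:i] with y[:j].
    # Initial conditions: D[0][0] = 0, all other border cells are infinite.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            # Recursive update: local cost plus the cheapest predecessor
            # (diagonal match, or a step along one of the two sequences).
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated sample can be absorbed by the warping path, whereas a sample-by-sample comparison is not even defined for sequences of unequal length.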
2.2. Global Alignment Kernel
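Extending the previous definition of DTW in Section 2.1, the global alignment (GA) kernel is defined as the exponentiated soft-minimum of all alignment distances and can be written as in Equation (3) [50]
$$k_{GA}^{\gamma}(\mathbf{x},\mathbf{y}) = \sum_{A \in \mathcal{A}_{N,M}} \exp\!\left(-\frac{\langle A, \Delta(\mathbf{x},\mathbf{y}) \rangle}{\gamma}\right) \tag{3}$$
where $\mathcal{A}_{N,M}$ denotes the set of all alignment matrices and $\gamma > 0$ is the smoothing parameter of the kernel. Compared to DTW, $k_{GA}^{\gamma}$ incorporates the whole spectrum of costs $\{\langle A, \Delta(\mathbf{x},\mathbf{y}) \rangle,\ A \in \mathcal{A}_{N,M}\}$ and thus provides a richer representation than the absolute minimum over the set $\mathcal{A}_{N,M}$, as considered by DTW [50].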
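To make the idea of summing over all alignments concrete, the following sketch computes the sum of exp(-cost/gamma) over every warping path with the same dynamic-programming grid used for DTW. It is an illustration under stated assumptions: the function name, the absolute-difference local cost and the plain (non-logarithmic) arithmetic are choices made here, and the local kernel used in [50] may differ.

```python
import math


def global_alignment_kernel(x, y, gamma=1.0, dist=lambda a, b: abs(a - b)):
    """Sum exp(-path_cost / gamma) over all warping paths between x and y."""
    n, m = len(x), len(y)
    # K[i][j] is the kernel value restricted to the prefixes x[:i], y[:j].
    K = [[0.0] * (m + 1) for _ in range(n + 1)]
    K[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.exp(-dist(x[i - 1], y[j - 1]) / gamma)
            # The exponential of a sum of local costs is a product, so the
            # "min" of DTW becomes a sum over the three predecessors.
            K[i][j] = local * (K[i - 1][j - 1] + K[i - 1][j] + K[i][j - 1])
    return K[n][m]
```

With all local costs equal to zero the result simply counts the warping paths (three for two sequences of length two). For long frames the products underflow quickly, so a practical implementation would typically work in the log domain.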
2.3. Soft Dynamic Time Warping
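As described in [51], Equations (1) and (3) can be computed using a single algorithm. The generalized $\min^{\gamma}$ operator, with the smoothing parameter $\gamma \ge 0$, can be written as in Equation (4) and is referred to as soft dynamic time warping $\mathrm{dtw}^{\gamma}$.
$$\mathrm{dtw}^{\gamma}(\mathbf{x},\mathbf{y}) = \min{}^{\gamma}\{\langle A, \Delta(\mathbf{x},\mathbf{y}) \rangle,\ A \in \mathcal{A}_{N,M}\}, \qquad \min{}^{\gamma}\{a_1, \ldots, a_n\} = \begin{cases} \min_{i \le n} a_i, & \gamma = 0 \\ -\gamma \log \sum_{i=1}^{n} e^{-a_i/\gamma}, & \gamma > 0 \end{cases} \tag{4}$$
where the original DTW score is recovered by setting $\gamma = 0$, while for $\gamma > 0$ a scaled version of GAK can be written as $k_{GA}^{\gamma} = \exp(-\mathrm{dtw}^{\gamma}/\gamma)$.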
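A minimal sketch of the soft-minimum idea follows, assuming scalar samples and an absolute-difference local cost (both choices made for the example, as are the function names). The hard minimum of DTW is replaced by a smoothed minimum, computed with the usual log-sum-exp shift for numerical stability.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: the hard min for gamma == 0, else -gamma*log(sum(exp(-v/gamma)))."""
    lowest = min(values)
    if gamma == 0 or math.isinf(lowest):
        return lowest
    # Shift by the smallest value so that every exponent is <= 0.
    total = sum(math.exp(-(v - lowest) / gamma) for v in values)
    return lowest - gamma * math.log(total)


def soft_dtw(x, y, gamma=1.0, dist=lambda a, b: abs(a - b)):
    """Soft-DTW score between x and y; gamma == 0 gives the classic DTW cost."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            R[i][j] = cost + soft_min(
                [R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]], gamma
            )
    return R[n][m]
```

With `gamma=0` the function reduces to the hard-minimum recursion, and for `gamma > 0` the value `exp(-soft_dtw(x, y, gamma) / gamma)` equals the sum of `exp(-cost / gamma)` over all warping paths, which is how one routine can serve both scores.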
2.4. Minimum Variance Matching
In contrast to DTW, sDTW and GAK, MVM does not only seek the optimal alignment between the two full sequences
and
, but also considers the alignment of subsequences. Thus MVM tries to find a subsequence
of length
N such that
best matches
. To formally describe MVM the difference matrix
r between the two sequences
and
is defined as follows [
52]:
Furthermore,
is treated as a directed graph with the following links [
52]:
Using Equations (
6) and (
7) the least-value path in terms of the linkcost and pathcost can be written as described by Equations (
8) and (
9).
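The subsequence-matching idea behind MVM can be illustrated with a simplified dynamic program. This sketch is an assumption-laden reduction, not the algorithm of [52] as published: each query sample is matched to exactly one target sample, matched target indices must be strictly increasing, skipped target samples cost nothing, and the link cost is taken to be the squared difference. Any elasticity limit on the number of skipped samples and any normalization of the final path cost are omitted. The function name `mvm` is hypothetical.

```python
import math


def mvm(query, target):
    """Match every query sample to a strictly increasing subsequence of target.

    Returns (cost, indices) where indices[i] is the target position matched
    to query[i] and cost is the sum of squared differences along the path.
    """
    n, m = len(query), len(target)
    if n == 0 or n > m:
        raise ValueError("query must be non-empty and no longer than target")
    # prev[j]: best cost of matching query[:i+1] with query[i] paired to target[j].
    prev = [(query[0] - target[j]) ** 2 for j in range(m)]
    back = []  # back[i-1][j]: target index matched to query[i-1] on the best path
    for i in range(1, n):
        cur = [math.inf] * m
        ptr = [-1] * m
        best, best_k = math.inf, -1
        for j in range(i, m):
            # Running minimum over all admissible predecessors k < j.
            if prev[j - 1] < best:
                best, best_k = prev[j - 1], j - 1
            cur[j] = best + (query[i] - target[j]) ** 2
            ptr[j] = best_k
        back.append(ptr)
        prev = cur
    # The path may end anywhere in the target, so trailing samples can be skipped.
    j = min(range(m), key=prev.__getitem__)
    cost, path = prev[j], [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return cost, path
```

In the call `mvm([1, 2, 3], [1, 9, 2, 3])` the outlier 9 in the target is skipped and the remaining samples match exactly, which is the behavior that distinguishes this family of methods from DTW, where every sample of both sequences must be aligned.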
2.5. All Common Subsequences
As proposed in [
54] the number of all common subsequences
, of any two sequences
and
, can be found using dynamic programming. Specifically let
be the number of common subsequences then:
and consequently
.
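As an illustration of counting common subsequences by dynamic programming, the sketch below uses a standard inclusion-exclusion recurrence. It is not necessarily the recurrence given in [54]: this version counts pairs of matching index selections (including the empty one), which coincides with the number of distinct common subsequences only when no value repeats. The tolerance parameter `tol`, used to decide when two real-valued power samples count as equal, is also an assumption introduced for the example.

```python
def count_common_subsequences(a, b, tol=0.0):
    """Count matching (index-selection) pairs of common subsequences of a and b.

    Two samples are treated as equal when they differ by at most `tol`.
    The empty subsequence is included in the count.
    """
    n, m = len(a), len(b)
    # C[i][j]: count for the prefixes a[:i] and b[:j]; an empty prefix
    # shares only the empty subsequence, hence the initial value 1.
    C = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Selections that do not use both a[i-1] and b[j-1] (inclusion-exclusion).
            C[i][j] = C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
            if abs(a[i - 1] - b[j - 1]) <= tol:
                # Selections that end with the matched pair (a[i-1], b[j-1]).
                C[i][j] += C[i - 1][j - 1]
    return C[n][m]
```

A larger count indicates more shared structure between the two frames, so the count acts as a similarity rather than a distance and the best match is the reference frame that maximizes it.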
3. NILM Using Elastic Matching
Considering a set of
M-1 known devices each consuming power
with
, the aggregated power
measured by the sensor will be
where
is a ‘ghost’ power consumption (noise) consumed by one or more unknown devices and
f is the aggregation function. In NILM the goal is to find precise estimations
of the power consumption of each device
m using an estimation method
with minimal estimation error and
, i.e.,
In the proposed approach the minimization is performed using a database of power consumption signatures built from frames of the aggregated signal
and their corresponding ground-truth information for each appliance, providing estimates
for each
. The block diagram of the proposed NILM architecture is illustrated in
Figure 1.
As illustrated in
Figure 1 the proposed approach consists of three steps, namely preprocessing, framing and template matching using an elastic matching algorithm. During the training phase the energy consumption of each of the
M devices,
, of a household and the aggregated consumption,
, are recorded from smart meters (denoted as SM). The acquired measurements (
M+1 time-synchronous signals) are preprocessed using a filter to remove outliers and static noise from the smart meters, frame blocked in frames
,
, of constant length
with
being the number of frames and grouped, i.e., every stored aggregated energy consumption frame (reference frame) is stored together with the corresponding time-synchronous energy consumption frames of each of the
M devices, into a table
,
. Finally all tables
are stored in a database
. During the operational phase only the aggregated signal
is measured from a (central/main) smart meter. Similarly to the training phase, the aggregated signal
is initially preprocessed and frame blocked in frames of the same constant length
, with
t being the number of the frame of the aggregated signal during operation. Each frame
is then compared against all aggregated power consumption reference frames
stored in the database
W using an elastic matching algorithm
and from the best matching reference frame the
M device frames are used for numerical estimation,
, of the power consumption of each of the
M devices as described in Equations (
14) and (
15).
In both the training and operational phase, only the active power samples of the device and aggregated signals were used since not all elastic matching algorithms can align multidimensional time-series data [
52,
54].
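The training and operational phases described above can be sketched as follows. This is a simplified illustration under several assumptions: the preprocessing filter is omitted, frames are non-overlapping, only the single best-matching reference frame is used, and all function names (`frame_signal`, `build_database`, `disaggregate`) are hypothetical. A compact DTW is included only so that the example is self-contained; any of the elastic matching algorithms of Section 2 could be passed as `distance`.

```python
import math


def dtw(x, y):
    """Compact DTW with absolute-difference cost, used here as the matcher."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]
            )
    return D[n][m]


def frame_signal(signal, length):
    """Split a signal into consecutive non-overlapping frames of equal length."""
    return [signal[k:k + length] for k in range(0, len(signal) - length + 1, length)]


def build_database(aggregate, devices, length):
    """Training phase: store each aggregate frame with its device frames."""
    agg_frames = frame_signal(aggregate, length)
    dev_frames = [frame_signal(d, length) for d in devices]
    return [
        (agg_frames[k], [frames[k] for frames in dev_frames])
        for k in range(len(agg_frames))
    ]


def disaggregate(aggregate, database, length, distance=dtw):
    """Operational phase: return the device frames of the best reference match."""
    estimates = []
    for frame in frame_signal(aggregate, length):
        _, device_frames = min(database, key=lambda entry: distance(frame, entry[0]))
        estimates.append(device_frames)
    return estimates
```

A toy usage with two devices and frames of two samples: `db = build_database([1, 1, 5, 5], [[1, 1, 0, 0], [0, 0, 5, 5]], 2)` followed by `disaggregate([5, 4, 1, 1], db, 2)` returns the stored device frames of the closest reference for each incoming frame. Because every incoming frame is compared against every stored frame, the cost of the operational phase grows linearly with the size of the database.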
5. Experimental Results
The performance was evaluated in terms of estimation accuracy (
) considering device operation in state level with a double counting of errors as proposed in [
55], i.e.,
where
T is the number of disaggregated frames and
M the number of appliances including the ‘ghost’ device. The five different elastic matching algorithms described in
Section 2 were evaluated on the REDD database using all houses and all available data. Specifically a 10-fold cross validation protocol was followed, with 90% of the data being used for building the signature database and 10% of the data for evaluating the proposed elastic matching-based NILM architecture. The evaluation results are tabulated in
Table 6.
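The accuracy measure can be illustrated with a short function. The form used here, one minus the total absolute estimation error divided by twice the total true consumption, is the commonly used variant of this metric and is assumed to correspond to the definition in [55]; the factor of two reflects the double counting of errors, since power wrongly assigned to one device is also missing from another. The function name and the list-of-lists layout (one row per frame, one column per appliance) are choices made for the example.

```python
def estimation_accuracy(estimated, actual):
    """Estimation accuracy over T frames and M appliances.

    `estimated[t][m]` and `actual[t][m]` hold the estimated and true power
    of appliance m in frame t. Assumes the total true consumption is non-zero.
    """
    error = sum(
        abs(e - a)
        for est_row, act_row in zip(estimated, actual)
        for e, a in zip(est_row, act_row)
    )
    total = sum(abs(a) for act_row in actual for a in act_row)
    return 1.0 - error / (2.0 * total)
```

A perfect estimate yields 1.0, while assigning all consumption to the wrong appliance, for example `estimation_accuracy([[0, 10]], [[10, 0]])`, yields 0.0.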
As can be seen in
Table 6 MVM outperforms all other evaluated elastic matching algorithms across all datasets as well as on average, increasing disaggregation accuracy by approximately 2.7% and resulting in an absolute average disaggregation accuracy of 80.93%. Furthermore, sDTW offered a slight improvement over the DTW baseline system, with a performance increase of 0.8% and a total disaggregation accuracy of 78.95%. Moreover, GAK’s average performance was slightly lower than the baseline DTW (−1.0%), with the REDD-2 and REDD-5 datasets performing significantly lower than DTW. ACS was observed to perform significantly lower than DTW across all houses as well as on average, which is probably because ACS forces matching of subsequences and has neither a soft margin like sDTW/GAK nor the ability to skip outliers like MVM [
62]. It is worth mentioning that the energy disaggregation accuracy of the REDD-5 dataset is above 80% for both DTW and MVM despite the limited amount of available data for this household.
Furthermore results on the device level are presented for house two of the REDD database. REDD-2 was chosen as all appliances were metered over the whole recording period and there are no gaps in the measurements. For the purpose of direct comparison with previous studies we additionally tested our proposed methodology on five selected loads from the REDD database, so called deferrable loads, defined in [
63]. These loads (reported as deferrable loads), namely the refrigerator, the lighting, the dishwasher, the microwave and the furnace (not available in REDD-2), were proposed as they contain a significant amount of the total consumed energy and were used in previous publications [
56,
63]. For evaluating estimation accuracy on device level Equation (
20) is modified by eliminating the summation over
M appliances, resulting in Equation (
21).
The results are tabulated in
Table 7, with the last row presenting the average disaggregation accuracy computed according to Equation (
21) and the second column presenting the percentage of the total energy consumed by each appliance.
As can be seen in
Table 7 DTW in general offers good performance for appliances with one/multi-state behavior (e.g., refrigerator, microwave or dishwasher) and performs poorly for devices operating for long durations and without many state changes (e.g., lighting or kitchen-outlets), which is in agreement with the evaluation results in [
49]. MVM was found to improve the disaggregation accuracy of appliances with long operational duration owing to its ability to match subsequences without being restricted to aligning the corresponding first and last samples of the two sequences, as in the case of DTW alignment. Furthermore as stated in [
64] MVM allows the skipping of outliers that are present in the test series
and thus is able to handle noisy data better compared to DTW. In detail lighting and kitchen-outlets showed the largest improvements with 11.1% (10.5%) and 8.3%, respectively. Moreover the detection of ghost power, which usually appears in the aggregated signal and has a high variance due to possibly several unknown devices working in parallel was further improved achieving disaggregation accuracy of 90.96%.
The best performing elastic matching algorithm, MVM, is compared to other methods proposed in the literature that have been evaluated on the REDD database. It is worth mentioning that the number of datasets used across previous studies was not the same, thus MVM performance has been calculated for each dataset setup (datasets 1,2,3,4,6; dataset 2; deferrable loads of dataset 2; fridge of dataset 2). Also, the split of the data into training/test subsets is not the same in the literature, thus only a rough comparison is possible. The results are tabulated in
Table 8.
As can be seen in
Table 8 the best performing elastic matching algorithm MVM outperforms all other reported approaches on the REDD-1/2/3/4/6 dataset setup. Similarly, the results for the REDD-5 dataset setup show the advantage of elastic matching over machine learning-based approaches when only limited training data exist. Considering the REDD-2 dataset setup with deferrable loads, which was initially proposed in [
63], the proposed methodology using elastic matching outperforms all reported methodologies. The exception is the method of Makonin et al. [
56] that utilized HMM sparsity, which performed 2.9% better than our proposed MVM, however the approach in [
56] is specifically designed for deferrable loads and performances using all appliances of each house of the REDD database are not reported. Considering the latest deep learning techniques using CNNs, our MVM-based elastic matching approach performed 7.3% better for the fridge only REDD-2 dataset setup in [
38].
For the purpose of direct comparison of the above-presented evaluation results with the previous evaluation of the DTW algorithm, the approach presented in [
49], which uses DTW and was evaluated on houses REDD-1,2,6, was used. In detail, the approach presented in [
49] uses a train/test data splitting, with the first week of every dataset used for training and the rest for testing as well as a lower sampling rate of 1 min, thus the results have been recalculated according to the setup of [
49]. Furthermore the approach is event-based thus performance is measured using the
-score as defined in Equation (
22),
and a set of thresholds is used to decide if a device is operating within each frame or not. In Equation (
22)
,
and
are the True Positives, False Negatives and False Positives for each identified turned on appliance combination. As thresholds are not explicitly given for all devices in [
49], the decision threshold for our evaluation was empirically set to 25 W, as in [
48,
72]. The results are tabulated in
Table 9.
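The event-based scoring can be sketched as below, assuming the usual definition of the F1-score in terms of true positives, false negatives and false positives. The sketch makes a per-frame on/off decision for a single device by comparing a frame-level power value against the 25 W threshold; the function name, the strict "greater than" comparison and the per-device (rather than per appliance combination) counting are simplifying assumptions made for the example.

```python
def f1_score(estimated, actual, threshold=25.0):
    """F1-score of on/off decisions obtained by thresholding power values (in W).

    `estimated` and `actual` hold one power value per frame for one device.
    Returns 0.0 when the device is never on in either series.
    """
    tp = fp = fn = 0
    for est, act in zip(estimated, actual):
        est_on, act_on = est > threshold, act > threshold
        if est_on and act_on:
            tp += 1
        elif est_on:
            fp += 1
        elif act_on:
            fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

For example, `f1_score([30, 0, 40, 0], [30, 30, 0, 0])` gives 0.5: one frame is correctly detected as on, one is missed and one is a false alarm. Frames in which both series are off do not enter the score, which is why it is less sensitive than plain accuracy to devices that are idle most of the time.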
As can be seen in
Table 9 the
-scores of [
49] and of our DTW implementation are almost identical, with only a 0.5% difference on average, most probably owing to the different preprocessing and threshold settings (their parameter values are not given in [
49]). In this experiment MVM also outperforms all other elastic matching algorithms and improves the average disaggregation accuracy by 2.4%, resulting in a total average disaggregation accuracy of 89.19% in terms of the
-score. In agreement with the previous evaluations presented in
Table 9 sDTW again offers slight performance improvement, while GAK performs slightly worse compared to the baseline DTW, achieving average disaggregation accuracies of 86.84% and 86.26%, respectively. Furthermore, ACS shows a significant performance decrease when compared to DTW, resulting in an average disaggregation accuracy of 79.11%. The highest performance increase is observed for the REDD-1 dataset, improving the energy disaggregation
-score by 4.2% when using MVM as the elastic matching algorithm.