Modelling the Disaggregated Demand for Electricity in Residential Buildings Using Artificial Neural Networks (Deep Learning Approach)

The paper addresses the issue of modelling the demand for electricity in residential buildings with the use of artificial neural networks (ANNs). Real data for six houses in Switzerland fitted with measurement meters were used in the research. Their original frequency of 1 Hz (one-second readings) was re-sampled to a frequency of 1/600 Hz, which corresponds to a period of ten minutes. Out-of-sample forecasts verified the ability of ANNs to disaggregate electricity usage for specific applications (electricity receivers). Four categories of electricity consumption were distinguished: (i) fridge, (ii) washing machine, (iii) personal computer, and (iv) freezer. Both standard ANNs with a multilayer perceptron architecture and newer types of networks based on deep learning were used. The simulations included over 10,000 ANNs with different architectures (numbers of neurons and structures of their connections), types and numbers of input variables, formulas of activation functions, training algorithms, and other parameters. The research confirmed the possibility of using ANNs to model the disaggregation of electricity consumption based on low-frequency data, and suggested ways to build highly optimised models.


Introduction
According to the Kyoto protocol of 2008, the electricity used in buildings constitutes 40% of global consumption [1]. A significant part of this is used in residential buildings. Households in the European Union were estimated to be responsible for more than 27% of total energy consumption in 2017, which makes them the second largest source of demand; only transport consumes more energy [2].
Knowing the time patterns of energy demand is crucial from the point of view of managing and optimising energy consumption. Optimisation processes are understood as those leading to a reduction in electricity consumption and electricity acquisition costs (e.g., in the case of zonal tariffs). It is estimated that potential electricity savings triggered by consumer behaviour alone (by consumers who possess detailed knowledge about their usage) can range from 5% to 15% [3]. Creating, on this basis, personalised guidelines for the operation of electricity receivers can lead to estimated savings of at least 12% [4]. Regardless of the adopted objective, a comprehensive approach requires data for the devices that generate demand. In practice, only aggregate data from smart meters are usually available. These do not provide information on the sources of demand for electricity and, thus, on actions that may lead to the optimisation of electricity consumption. The aim of this study is to analyse the possibility of modelling the disaggregated demand for electricity at the level of residential buildings with the use of artificial neural networks (ANNs), based on time patterns as well as the relation between real and apparent power. The rationale behind considering the different kinds of power consumed by appliances has been confirmed, e.g., by Figueiredo et al. [5] and Esa et al. [6]. The final objective is to predict the activity of a given appliance, understood as its real power consumption above a certain threshold identified in the research. Real power consumption was measured by a sensor located in the plug of a given appliance. The real and apparent power of total consumption in a house were measured for the three single phases separately and then summed up. As noted by Yu et al. [7], most smart meters in use nowadays make measurements at relatively low frequencies, ranging from 1 Hz to 1/900 Hz.
It is, therefore, desirable to create systems based on data sampled at a relatively low frequency.
The novelty of the described approach lies in the following:
• the application of data with a relatively low sampling rate of 1/600 Hz to model the disaggregated demand for electricity;
• the use of different types of ANNs with real and apparent power, as well as selected time and date variables;
• the use of the difference between apparent and real power as an input variable in an ANN model.
The estimation of energy consumption for selected key demand sources would potentially enable the implementation of energy management systems without the need to install individual meters for each consumption point. This potentially means not only lower costs for the energy management system but also greater opportunity to increase the popularity (and application) of the systems on the market. Effective (precise) analyses may concern both individual households (end users of energy) [8]-as presented in this study-and the electricity consumption of entire buildings (Liu et al. [9] used the spatiotemporal pattern network (STPN), Henriet et al. [10] showed differences in electricity consumption patterns between residential and commercial buildings pointing out higher periodicity for the latter).
Previous studies on electricity demand in buildings have confirmed its close connection with a number of measurable factors. Chen et al. [1] mention, among others: (i) zone temperature measurements, (ii) node temperature measurements, (iii) lighting schedule, (iv) in-room appliance schedules, and (v) room occupancies. Untypical patterns of electricity consumption may be caused by the malfunction of a device (as examined by Rashid et al. [11]). It should be noted that differences in the structure of energy consumption justify the use of different independent variables. For example, in Canada, 63% of energy is used for space heating, while in the United States this amounts to only 22%. The energy consumption rates for space cooling are 2% and 9% respectively [12]. Therefore, in the first case, weather data are much more desirable in the model than in the second.
Knowledge of the above-mentioned demand determinants in real time is related to the need for installing technically advanced (expensive) measurement infrastructure connected to the database system. For this reason, numerous studies have focused on the development and implementation of more affordable systems based only on values of total electricity demand. This disaggregation technique is also referred to as non-intrusive load monitoring (NILM), and its origins date back to 1984 [13] and Hart's publication with the same name [14].
Kolter and Johnson [15] used the factorial hidden Markov model (FHMM) to obtain a percentage of correct answers (called Accuracy; definition in Section 2.5, Formula (5)) for the test set (data from a period of two weeks) ranging from 46.6% to 82%, depending on the model and house. High-frequency data of up to 15 kHz, sub-sampled to a ten-second interval for the purposes of the model evaluation, were used in the study. Cominola et al. [4] extended the two-state FHMM with a trace pattern correction using Iterative Subsequence Dynamic Time Warping (ISDTW). In addition, the research was divided into two periods, summer and winter, due to the seasons' different energy consumption patterns. The analyses were based on data at a one-minute interval. Bonfigli et al. [16] proposed a modified version of the FHMM. Instead of the original algorithm, the additive factorial approximate maximum a posteriori, a new approach based on the structural variational approximation method and the Viterbi algorithm was used for the disaggregation. This translated into an increase in the accuracy of forecasts ranging from 2.5% to 14.9%, depending on the case study. Azaza and Wallin [17] used a method based on finite state machines (FSM). The results obtained during the estimation of the activity of seven appliances were characterised by an absolute average error ranging from 5.75% to 21.4%, depending on the modelled appliance. To a large extent, this was related to the type of dataset used. The Building-level Fully-labelled Dataset for Electricity Disaggregation (BLUED) yielded results with higher precision than the Reference Energy Disaggregation Dataset (REDD). The authors speculate that this is due to the different sampling rates of the above-mentioned sets, 60 Hz and 1 Hz respectively. The research by Tomkins et al. [18], using the hinge-loss Markov random field (HL-MRF), also confirmed the high sensitivity of models to the parameters of the acquired data. For REDD and for the Pecan Street dataset (Dataport), the F1-Measure was 0.722 and 0.505 or 0.503 (depending on the model) respectively. Data from Dataport had a lower sampling rate of 1/60 Hz or 1/3600 Hz. Schirmer et al. [19] tested five different elastic matching algorithms in NILM based on REDD. The minimum variance matching (MVM) achieved the best results as measured by both Accuracy and F1-Measure (definition in Section 2.6, Formula (6)), at 87.58% and 89.19% respectively. As noted by Schirmer et al. [19], in contrast to the algorithms they used, approaches based on machine learning require a much larger dataset to train the model. De Paiva Penha and Castro [20] applied the convolutional neural network (CNN) approach to model the activity of six appliances in six houses using data from the REDD. The authors used networks with three convolution layers and a single dense layer. Data were divided between the training, validation, and test sets in a proportion of 60%, 20%, and 20% respectively. The F1-Measure was 0.93. In one of their proposed models, Liu et al. [9] demonstrated the legitimacy of using variables such as indoor and outdoor temperature, which can translate into increased precision for the model. Wu and Wang [21] applied a CNN extended by the concatenation operation to separate the feature of the target load by extracting it from the load mixed with the background. The proposed technique was combined with two prevalent networks: Extreme Inception (Xception) and Densely Connected Convolutional Network (DenseNet-121). The models were positively evaluated on the REDD and the UK Domestic Appliance-Level Electricity dataset (UK-DALE), with average F1-Measure values of 85.1% and 89.0% respectively.

Methodology
The study used real data from the Electricity Consumption and Occupancy (ECO) dataset, collected from six houses in Switzerland over a period of approximately eight months at a frequency of 1 Hz. Among other smart-meter data, the dataset contains the sum of real power over all phases and the values of real power, current, and voltage for each phase separately. The plug meters collected the values of real power consumed by an appliance. Detailed information on the structure of energy demand in each property has been given, among others, in [22]. Since the original frequency was higher than assumed for the purposes of the study, it was necessary to reduce it. A frequently used time interval is 15 min (a frequency of 1/900 Hz) [23]. In this study, higher-frequency data (with potentially higher practical usefulness) were used. As in [1], an interval of ten minutes (a frequency of 1/600 Hz) was applied. To carry out this transformation, arithmetic averages of the one-second readings of power consumption were calculated, both for the overall data and for the values representing power consumption by specific devices.
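The averaging step described above can be sketched with pandas; the column name and the synthetic readings below are assumptions for illustration, not the ECO dataset itself:

```python
import numpy as np
import pandas as pd

# One day of synthetic 1 Hz smart-meter readings (column name assumed).
idx = pd.date_range("2013-01-01", periods=86_400, freq="s")
rng = np.random.default_rng(0)
df = pd.DataFrame({"real_power_w": rng.uniform(50, 300, len(idx))}, index=idx)

# Re-sample from 1 Hz to 1/600 Hz by taking the arithmetic mean
# of the 600 one-second readings in each ten-minute window.
df_10min = df.resample("10min").mean()

print(len(df_10min))  # 144 ten-minute intervals per day
```

The same operation is applied both to the smart-meter totals and to the plug-meter readings of individual appliances.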
Previous research confirmed the suitability of using different types of power in non-intrusive appliance load monitoring. Dong et al. [24] empirically showed the ranges of real and reactive power values in the case of particular appliances. By applying clustering, they defined the clusters for selected appliances, as well as calculating the average values of both power types. The well-known interdependence between real, reactive, and apparent power (its graphic representation is commonly called a power triangle) enables the calculation of each, just by knowing the other two. The data used in this research were based on real and apparent power. The calculation and application of reactive power in the model would mean the use of input variables with strong autocorrelation and high values of the variance inflation factor (VIF). It was expected that this would result in the deterioration of the model's quality. For this reason, only real and apparent power, as well as the differences between them, were used in the research. Figure 1 shows an example of both types of power and their differences for house no. 1.
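The power-triangle relation mentioned above can be written explicitly. With real power $P$, reactive power $Q$, and apparent power $S$:

```latex
S^{2} = P^{2} + Q^{2}
\qquad\Longrightarrow\qquad
Q = \sqrt{S^{2} - P^{2}}
```

Computing $Q$ this way would add an input strongly collinear with $S$ and $P$ (hence the high VIF values noted above), which is why only $P$, $S$, and the difference $S - P$ were used as model inputs.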

Multilayer Perceptron
The creation of ANNs dates back to 1943, when a mathematical model of neurons was developed and presented by McCulloch and Pitts [26] (Figure 2a). In 1958, Rosenblatt [27] published a paper describing the model of a unidirectional network, in which neurons were grouped into successive so-called layers, with signals flowing-as the name indicates-in only one direction (from input to output). A multilayer perceptron (MLP) is composed of three types of layers: (i) a single input layer, (ii) hidden layers, (iii) an output layer ( Figure 2b). The last two types of layers are subject to the learning process, which means that the neurons placed in them have the ability to acquire knowledge by modifications in the so-called weights in neurons (real numbers marked as w in Figure 2a).
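The one-directional signal flow through an MLP can be sketched with numpy; the layer sizes below are illustrative assumptions, not the networks used in the study:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Illustrative sizes: 8 inputs, one hidden layer of 5 neurons, 1 output.
W1, b1 = rng.normal(size=(5, 8)), np.zeros(5)
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)

def mlp_forward(x):
    # Signals flow in one direction only: input -> hidden -> output.
    h = sigmoid(W1 @ x + b1)     # hidden layer (learnable weights W1)
    return sigmoid(W2 @ h + b2)  # output layer (learnable weights W2)

y = mlp_forward(rng.uniform(size=8))
print(y.shape)  # (1,)
```

Training adjusts the weights W1 and W2 (the values marked as w in Figure 2a); the input layer itself carries no learnable weights.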

Deep Neural Networks
Due to the numerous limitations of the primary architecture, more advanced ANNs based on modified training algorithms, generally referred to as deep neural networks (DNNs), have been gaining popularity in recent years. This name covers many types of networks built on the basis of different types of layers. Although the beginnings of deep learning algorithms in ANNs date back to 1965 [28], they have only recently grown very popular. The reasons behind this surge of interest include technological developments and the increasing use of high-performance graphics processing units (GPUs) to accelerate computing relative to CPUs. A DNN with relatively low complexity and a high level of similarity to the MLP is a network containing dense layers. Its neurons commonly use the rectified linear unit (ReLU) as an activation function (defined by Formula (1)). Among other popular functions are SoftPlus (calculated with Formula (2)), SoftSign, SoftMax, Sigmoid, and Tanh. The respective graphs are shown in Figure 3.
The multitude of different types of layers in a DNN enables the creation of more complex structures. An example is the convolutional neural network (CNN). Commonly used for image classification [29], CNNs also work well in time series analyses [30]. Also noteworthy is their ability to extract low-, mid-, and high-level features [31].
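The two activation functions referenced by Formulas (1) and (2) have standard definitions, which can be sketched as:

```python
import numpy as np

def relu(x):
    # Formula (1): ReLU(x) = max(0, x)
    return np.maximum(0.0, x)

def softplus(x):
    # Formula (2): SoftPlus(x) = ln(1 + e^x), a smooth approximation of ReLU
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))  # [0. 0. 3.]
print(softplus(x))
```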

Convolutional Neural Network
CNNs are predisposed to extracting high-level features from data by convolution, a mathematical operation for merging two datasets. Using a convolution kernel, results are generated in the form of a feature map. In a CNN, in addition to convolution layers, there are typically such layers as: (i) pooling, most often using the maximum or average function (selecting the maximal or calculating the average value from a data area with the dimensions of the applied filter), to reduce the spatial amount of input data [32]; (ii) flatten, transforming a two-dimensional dataset into a vector (one-dimensional data) so that it can be sent to (iii) a dense layer.
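A minimal numpy sketch of these three layer types (an illustration of the operations, not the implementation used in the study):

```python
import numpy as np

def conv1d(signal, kernel):
    # Convolution layer: slide the kernel over the signal to build a feature map.
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(len(signal) - k + 1)])

def max_pool1d(feature_map, size=2):
    # Pooling layer: keep the maximum of each non-overlapping window,
    # reducing the spatial amount of the data.
    n = len(feature_map) // size
    return feature_map[:n * size].reshape(n, size).max(axis=1)

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 3.0])
fmap = conv1d(signal, np.array([1.0, -1.0]))  # simple change-detecting kernel
pooled = max_pool1d(fmap)
flat = pooled.ravel()  # flatten: a one-dimensional vector for a dense layer
print(fmap, pooled, flat)
```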
DNNs built of three dense layers (3dl-DNN) and CNNs were applied in this research. The DNNs use mainly the rectified linear unit (ReLU). The possibility of using other activation functions, such as SoftPlus, SoftSign, SoftMax, Sigmoid, and Tanh, was also tested during the research process. Figure 4 presents the structure of the DNNs used in the research.
The hardware used for the calculations related to the DNNs (both 3dl-DNN and CNN) included a computer equipped with two GPUs with a total computing power of about 26.8 GFLOPS (billions of floating point operations per second), 8704 CUDA cores, and a thermal design power (TDP) of over 500 W. The hardware capacity required by the traditional ANNs involved mainly two CPUs with a total of 16 physical cores (32 threads) clocked at 3.6 GHz and a TDP rating of 300 W.

Neural Networks Structure in Modelling Two-State Appliances Activity
Modelling device activity is a classification issue. For this reason, the ANN structure must enable the status of a given device to be assigned to one of the classes (as a minimum). As this research has distinguished between two different states of devices (active and inactive), the final objective reveals a dichotomous problem. There are three popular approaches based on ANNs, which involve a different number of neurons in the output layer and the stage at which the final classification is made. Figure 5 shows their structure, as adapted to the requirements of this study.


Model (a) has two outputs, and only one class must be chosen. This problem can be solved by choosing the class for which the output value is closer to one. In model (b), only a single neuron is used in the output layer. When its value is equal to zero, it represents one state (e.g., inactivity), whereas when it is equal to one it represents the state of activity. As a continuous activation function is used in the neuron, the ANN generates output in the form of real numbers in some range, e.g., from zero to one for Sigmoid. It is necessary to set a threshold (typically equal to 0.5) below which the answer is understood as zero (inactive) and above which the answer is considered as one (active). A similar approach is applied in model (c), in which the output values of real power consumed by an appliance are assigned to a class by an external classifier working according to the rules applied in model (b), with thresholds as presented in Section 2.5. Model (c) was used in this paper. Figure 6 presents the algorithm for classifying real and estimated appliance activity by the external classifier. In the research, the same threshold was used to assign an appliance activity to one of the two classes on the basis of both the real data and the modelled device operation.
Using two different values of thresholds in the external classifier could generate more accurate results; however, such an approach was rejected for the study due to the risk of not maintaining full objectivity. Two outputs allow the use of such a model in non-dichotomous issues and take into account the situation when two classes can be chosen simultaneously.
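The external classifier's rule can be sketched as a simple threshold test; the sample values below are hypothetical, with the 10 W threshold taken from the fridge entry in Table 2:

```python
def classify_activity(power_values, threshold):
    """External classifier: label an appliance active (1) when its
    (measured or estimated) real power exceeds the threshold, else
    inactive (0). The same threshold is applied to both the real
    and the modelled consumption."""
    return [1 if p > threshold else 0 for p in power_values]

# Hypothetical ten-minute average readings [W] and the 10 W fridge threshold.
estimated = [3.2, 14.8, 55.0, 9.9, 11.0]
print(classify_activity(estimated, threshold=10.0))  # [0, 1, 1, 0, 1]
```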

Pre-Assumed Networks Structure and Parameters
The number of neurons in the input layer was adjusted to an optimal set of independent variables (experimentally selected). The output (dense) layer always consisted of one neuron, as all models had a single dependent variable. This study uses three-layer MLPs (single hidden layer) trained by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. Sigmoid and Tanh were used as activation functions. The number of neurons in the first two dense layers of the 3dl-DNN was selected experimentally. Dropout units of 0.3 were used after dense layers to reduce interdependent learning among the neurons. In the CNNs, a single convolution layer was applied. The number of epochs for the 3dl-DNN and CNN was limited to 200 and 350 respectively.
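The dropout units of 0.3 mentioned above can be sketched in numpy; the inverted-dropout variant shown here (rescaling surviving activations at training time) is an assumption, as the paper does not specify the implementation:

```python
import numpy as np

def dropout(activations, rate=0.3, rng=None, training=True):
    # Dropout: during training, randomly zero a fraction `rate` of the
    # activations and rescale the rest (inverted dropout) to reduce
    # interdependent learning among neurons. At inference, pass through.
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

acts = np.ones(1000)
dropped = dropout(acts, rate=0.3, rng=np.random.default_rng(0))
print((dropped == 0).mean())  # roughly 0.3 of units are zeroed
```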


Selection and Preparation of Input and Output Variables of Models
All available data were divided into three sets (training, validation, and test) in the manner introduced by De Paiva Penha and Castro [20], i.e., in a proportion of 60%, 20%, and 20% respectively. Data from the training set were used to train the ANNs. The aim of the validation set was to select the optimal model from all networks of a specific type differing in parameters (e.g., number of neurons, activation functions). The third set, the test set, was used to determine performance metrics. Due to the methodology used, the tests are of an out-of-sample type. The data structure in all three sets was identical.
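The 60/20/20 split can be sketched as follows; the chronological ordering of the records is an assumption consistent with the out-of-sample character of the tests:

```python
def split_60_20_20(records):
    # Split time-ordered records into training (60%), validation (20%),
    # and test (20%) sets; the test set is therefore out-of-sample.
    n = len(records)
    i, j = int(n * 0.6), int(n * 0.8)
    return records[:i], records[i:j], records[j:]

train, val, test = split_60_20_20(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```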
The independent variables used in this research in the MLP and 3dl-DNN are: (i) the sum of real power over all phases in ten minutes [W]; (ii) variable (i) delayed by ten minutes; (iii) the sum of apparent power over all phases in ten minutes [VA]; (iv) variable (iii) delayed by ten minutes; (v) the difference between variables (iii) and (i); (vi) variable (v) delayed by ten minutes; (vii) a natural number from the range [1, 144] representing the daily number of the ten-minute interval; (viii) a dummy variable indicating a weekend day (1) or a weekday (0). The structure of the datasets (consisting of the input variables and the output one) with numerical data is presented in Table 1. Due to the structure of the artificial neuron, as well as the ANN functioning principle, the modelling errors generated by a single cell are multiplied and summed in the cells of the next layer, increasing the total model error. Therefore, the number of input variables was a compromise between a relatively simple structure of the model and the necessity to use the data required for an accurate analysis. (Notes to Table 1: the numbers on the inputs refer to the numbers and names given in Section 2.5; the thresholds were set as presented in Table 2, e.g., 10 W in the case of the fridge.) In the case of the CNN, two-dimensional data had to be prepared so that they could be input into the convolution layer. The second dimension consisted of delayed values of the time series. The current value (relative to the analysed moment) and five historical values were introduced, giving two-dimensional data with a total of six elements in the second dimension. The dependent variable was the real power measured from a plug (each model made predictions from one plug meter alone). For all properties, the readings from the plug meters did not cover the entire period of the smart meter measurements.
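A sketch of how inputs (i)-(viii) and the six-element CNN window could be constructed with pandas; the column names and the synthetic values are assumptions for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=288, freq="10min")  # two days
rs = np.random.default_rng(0)
df = pd.DataFrame({
    "real_w": rs.uniform(100, 500, len(idx)),       # (i) sum of real power [W]
    "apparent_va": rs.uniform(120, 550, len(idx)),  # (iii) sum of apparent power [VA]
}, index=idx)

df["real_w_lag"] = df["real_w"].shift(1)            # (ii) delayed by ten minutes
df["apparent_va_lag"] = df["apparent_va"].shift(1)  # (iv)
df["diff_va_w"] = df["apparent_va"] - df["real_w"]  # (v) apparent minus real
df["diff_va_w_lag"] = df["diff_va_w"].shift(1)      # (vi)
df["interval"] = df.index.hour * 6 + df.index.minute // 10 + 1  # (vii) 1..144
df["weekend"] = (df.index.dayofweek >= 5).astype(int)           # (viii)

# CNN input: the current value plus five historical values (six elements
# in the second dimension); the first rows contain NaNs from the lags.
cnn_window = np.stack([df["real_w"].shift(k).to_numpy() for k in range(6)], axis=1)
print(df["interval"].min(), df["interval"].max())  # 1 144
print(cnn_window.shape)  # (288, 6)
```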
Limiting the analysis to those days for which a full set of data was available (including the delayed input variables (ii) and (iv)) would mean using datasets of insufficient size. This would result both in problems with proper ANN training and in a decrease in the reliability of the study. For this reason, only selected categories of electricity receivers with a sufficient number of plug meter readings (presented in Table 2) were analysed. The data were divided into three subsets: training, validation, and test (a similar approach was used by Buddhahai et al. [33]). Figure 8 shows an example of the arithmetic average of plug meter readings (with a measurement frequency of 1 Hz) recorded over ten minutes. The course of real power consumption is highly cyclical (appliances typically have their own specific work cycles [34]); however, certain differences between cycles might be observed. Some segments in the cycle are constant while others are not. This behaviour is typical of many household appliances and has been described by Seevers et al. [35]. Some detailed differences can easily be observed in the case of electricity receivers. Based on [36], Liu et al. [37] identified four types of devices: (i) on-off appliances (e.g., lamp, toaster); (ii) finite state machines (e.g., washing machine, stove burner); (iii) continuously variable consumer devices (with fluctuations in electricity consumption during activity, e.g., power drill, dimmer lights); and (iv) permanent consumer devices (with constant electricity demand, e.g., hardwired smoke detector, telephone sets). A similar categorisation was introduced by Hamid et al. [38]. In this study, all devices were analysed as two-state appliances (on-off, type (i), or low-high consumption, a subtype of type (ii)).
Owing to the fact that over the ten-minute period it is possible that a given device remains both fully active (high demand for electricity) and in a low (or zero) state of current consumption (standby mode or inactive), it was necessary to determine the threshold of real power above which the device was classified as active. Due to the varying electricity demand characteristics of each type of device, these thresholds had to be set individually (third column in Table 2).
On the one hand, the use of a ten-minute period eliminates (to an extent) the problem of real power consumption fluctuations (described by Welikala et al. [38]) through the process of averaging. As a result, it is easier to identify the activity of an appliance. On the other hand, this runs the risk of improperly including several minutes of inactivity into the activity period (and vice versa). Therefore, it was crucial to determine the thresholds as precisely as possible.
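The thresholding step reduces to a simple comparison over the ten-minute averages. In this sketch, only the 10 W fridge value is taken from the text; any other entries in the threshold table would come from Table 2 and are not reproduced here.

```python
import numpy as np

# Per-appliance activity thresholds [W]; only the fridge value (10 W)
# is stated in the text, so the dictionary is left otherwise empty.
THRESHOLDS_W = {"fridge": 10.0}

def label_activity(mean_power_w: np.ndarray, threshold_w: float) -> np.ndarray:
    """Classify each ten-minute interval as active (1) when the mean
    real power drawn from the plug exceeds the appliance threshold."""
    return (mean_power_w > threshold_w).astype(int)
```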

Performance Metrics
The Accuracy and F1-Measure metrics were used to assess the precision of the model's operation. In order to determine the latter, two scores called Recall and Precision, defined analogously to those applied by Kolter and Jaakkola [39], were used. Recall is the percentage of actual activity cases that were correctly classified, while Precision is the percentage of correct classifications among all cases classified as active. If the designations of the numbers of correct and incorrect estimations according to Table 3 are adopted, the Recall and Precision scores can be determined using Formulas (3) and (4) [40]:

Recall = TP/(TP + FN) (3)

Precision = TP/(TP + FP) (4)

Accuracy is the percentage of all correct estimations (without taking into account which device state they concern). It is calculated according to Formula (5):

Accuracy = (TP + TN)/(TP + FP + TN + FN) (5)

F1-Measure is the harmonic mean of Precision and Recall [16]. It is determined using Formula (6):

F1-Measure = 2 × Precision × Recall/(Precision + Recall) (6)

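A minimal sketch computing Formulas (3)-(6) from the confusion-matrix counts of Table 3:

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recall (3), Precision (4), Accuracy (5), and F1-Measure (6)
    from the counts of correct and incorrect estimations."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}
```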

Results
The conducted empirical research showed that (in the DNNs) ReLU is the most effective activation function for most of the issues analysed. The second-best activation function was SoftPlus. The hyperparameters of the best networks are listed in Table 4, and their structure parameters are presented in Table 5. Table 6 presents the results achieved by the best MLP and 3dl-DNN models based on six input variables ((i), (ii), (v), (vi), (vii), and (viii)) and by the best CNN models using four input variables ((i), (v), (vii), and (viii)). Figure 9 shows two performance metrics (Accuracy and F1-Measure) calculated separately for each type of appliance (jointly for all the houses in which it was analysed) and for each of the three methods used.



Discussion
The analysis of the results achieved by the DNN models shows that the highest precision occurred when estimating the electricity consumption generated by the washing machine. Both the CNN and the MLP obtained a similar accuracy of estimation. High-precision estimation was obtained by all models for fridge electricity consumption. The reasons for this are to be found mainly in the cyclicality of the appliance's activity, as well as the availability of numerous patterns (at night, devices such as personal computers or washing machines are hardly ever active; similar observations were made by Parson et al. [41]). The worst results were achieved when modelling the energy consumption of the personal computer. It should be assumed that, in addition to the difficulty of creating time patterns of activity using ANNs, this is due to the fact that its electricity demand is low in relation to total electricity consumption (Figure 10). Only the CNN was able to obtain acceptable values of accuracy. This could be attributed to the CNN's ability to recognise complex patterns, as well as its use of input data from a longer time range (five delayed ten-minute intervals). It should be noted, however, that the low value of F1-Measure was caused by a large number of cases of actual personal computer activity that were incorrectly classified as inactive. The opposite estimation error characterised the results obtained by the MLP. Depending on the modelled appliance activity and the house, different networks achieved the best results. In the case of the CNNs, three were based on the SoftPlus activation function in neurons of dense layers.
Other modifications included the number of filters and neurons in the first dense layer. This implies not only the need to train models for each appliance individually, but also to optimise their structure and other parameters.
The three tested models achieved similar accuracy, with a slight advantage for the deep learning approach. In 81.8% of the simulations, the accuracy of the DNNs was higher than that of the MLP (in accordance with the numbers in Table 6). The CNN and the 3dl-DNN were the best in 54.5% and 27.3% of cases, respectively. In 18.2% of the cases, however, the winning model was the MLP.

Conclusions and Future Work
The research confirmed the possibility of modelling the disaggregated demand for electricity at the level of individual households (houses) on the basis of low-frequency data from smart meters, extended by time variables and by real and apparent power.
The simulations showed that when modelling specific appliances, some ANN types may not be able to estimate their activity precisely (e.g., in the case of the fridge in house no. 4). Out of the three models, the one with the highest values of performance metrics can be chosen. It is recommended to create hybrid solutions combining different types of ANNs (and potentially other estimation methods) not only as part of cascading solutions, but primarily as models working in parallel. In this scenario, the winning network would emerge on the basis of tests conducted on the validation dataset.
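The parallel selection described above, in which the winning network per appliance emerges from tests on the validation dataset, can be sketched as follows; `validate` is a hypothetical helper returning a fitted model's accuracy on the validation subset, not a function from the study.

```python
def pick_winner(models: dict, validate) -> str:
    """Choose, per appliance, the model type (e.g. MLP, 3dl-DNN, CNN)
    with the highest validation accuracy. `models` maps a model name to
    a fitted model; `validate` scores one model (hypothetical helper)."""
    return max(models, key=lambda name: validate(models[name]))
```

In this parallel scheme all candidate networks are trained on the same data, and only the validation score decides which one produces the final predictions for a given appliance and house.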
Future research should be conducted on other ANN models, especially of the DNN type (e.g., with more convolution layers, as in [42] (up to two layers) and [43] (five layers)). Due to the high precision of Long Short-Term Memory networks (LSTMs) in NILM (which has been demonstrated, among others, by Kim and Lee [44] and Le and Kim [45]), models combining both CNNs and LSTMs are also worth exploring (e.g., similar to those proposed by Bhanja and Das [46], Almonacid-Olleros et al. [47], and Kim and Cho [48]). It would be reasonable to create models based on data with a shorter time interval. This would increase the practical value of the models and allow the modelling of the demand for electricity of devices with a short time of activity, e.g., microwave ovens. Further research is also desirable on the possibility of using the presented solution as one component of hybrid models based on the analysis and classification of time patterns with a high sampling frequency.
The models presented in the study aim primarily at determining the sources (components) of aggregated demand. Their future applications, however, are much wider. For example, their adaptation to the detection of anomalies in the functioning of electricity receivers before their total failure or the subsequent destruction of their technical infrastructure should be considered (similar to those presented in [49]).
Funding: This research received no external funding which could have any impact on the obtained results and the conclusions drawn. Financial support was granted for the proofreading of the final version of the paper.