Modeling and Simulation of Household Appliances Power Consumption

Featured Application: The method presented in this paper predicts the consumption of household appliances by modeling their behavior and simulating it accordingly.

Abstract: The consumption of household appliances tends to increase. Therefore, energy efficiency measures are urgently needed to reduce power consumption. In recent years, various methods have been used to predict household electricity consumption. As a novelty, this paper proposes a method for predicting the consumption of household appliances by evaluating statistical distributions (Kolmogorov-Smirnov test and Pearson's χ² test). To test the validity of the evaluations, a set of random values was first simulated for each hour, and their respective averages were calculated. These were compared with the averages of the real values for each hour. With the exception of HVAC during working days, good results were obtained. For the refrigerator, the maximum error was 3.91%, while for the lighting it was 4.27%. For the point consumptions, the accuracy was even higher, with an error of 1.17% for the dryer, while for the washing machine and dishwasher the minimum errors were less than 1%. The error results indicate that the applied methodology is acceptable for modeling household appliance consumption and, consequently, for predicting it. However, these consumptions can only be extrapolated to dwellings with a similar surface area and number of inhabitants.


Introduction
Currently, the growth in energy consumption and the need to address the pollution resulting from its generation are of concern to consumers and suppliers [1]. Governments have enacted legislation on circular economy and environmental protection [2,3]. Generally speaking, energy efficiency relates to both energy and climate policy [4]. Improving energy efficiency in buildings can significantly reduce their environmental impact as well as provide economic savings to consumers [5,6].
In residential buildings, household appliances consume a considerable amount of energy, resulting in high electricity bills [7]. This consumption shows an increasing trend [8,9], despite improvements in appliance efficiency in recent years [10]. Households in the European Union account for 26% of the final energy consumption, yet their share in demand response (DR) systems is practically nonexistent [11]. Moreover, the energy consumption of refrigerators and freezers accounts for about half of that of the corresponding households [12], while HVAC is responsible for around 20-30% [13].
To cope with this high energy consumption, it can be of great help to predict or estimate how much each appliance may consume, i.e., its future behavior in those terms. An accurate prediction of the consumption at each hour of the day can be used to make decisions that reduce it as much as possible. In [14], an adaptive predictive control algorithm, using mixed linear programming, is proposed. Moreover, in [15], a two-stage control algorithm for the centralized management of residential loads is proposed to ensure their control. In [16], several regression models were established to analyze the determinants of end-use energy consumption. Further, in [17], a microgrid model was used to estimate the consumption of washing machines, dishwashers and dryers. Moreover, in [18], a model for detecting the behavior of building occupants was developed to estimate building consumption. In [19], three prediction models were proposed to estimate HVAC consumption as a function of occupants, electrical equipment and lighting. Similarly, in [20], a hidden Markov model was used to model a system and estimate the power values of household appliances. In [21], an SVM-based segmentation method was applied to dishwashers and washing machines. Moreover, in [22], a calibration method was employed to integrate POE data. In [23], a fuzzy logic controller was employed to estimate HVAC consumption. In [24], the authors establish a residential user evaluation system based on an evaluation model by selecting indicators related to user characteristics and electricity consumption data. In [25], an innovative customer preference-based appliance scheduling framework is presented. In [26], real-time simulations are provided by using finite element analysis programs. In [27], a multi-agent simulator capable of emulating different profiles of consumers and equipment is proposed. In [28], a harmonic coupled dynamic admittance matrix model based on voltage and current data was established to predict the consumption of household appliances. In [29], a domestic energy management model based on time perspective theory was developed, incorporating energy storage devices and flexible and smart appliances. In [30], the factors associated with the variation of total daily energy consumption measured by smart meters in different British households were investigated, including weather conditions, demographics and user attitudes. In [31], a study was conducted on the impact of teleworking in the aftermath of the COVID-19 pandemic on air-conditioning consumption using an ordinal logistic regression model. In [32], a real-time dynamic pricing method was proposed to determine the hourly electricity prices and schedule the electricity consumption of smart appliances. In [33], an equivalent thermal model of the HVAC system of a reference house was established. In [34], an air conditioner on/off state prediction model, combined with the diversity of occupant behavior, was established. In [35], Lasso regression was used to estimate the consumption of electrical appliances, electric heating, cooling and lighting of building parks in Wallonia. In [36], a Pecan Street dataset was used to group building occupants according to the energy they consume per household appliance, with a subsequent load profile development. In [37], air-conditioning usage patterns of three climate zones were analyzed and developed. In [38], a thermal model was developed to calculate the regulating power provided by air conditioners. In [39], an optimal load control strategy for air conditioners was proposed. In [40], a model was established to characterize the Spanish electricity consumption considering typical appliances and key parameters of vulnerable households. In [41], a model of commonly used household appliances was constructed and a user satisfaction evaluation index was established. In [42], it was proposed to find the optimal delay time for the most energy-consuming equipment according to the priority of each appliance.
All of the above methodologies lack a descriptive statistical analysis of each household appliance for each hour of the day.
As an alternative to the previous solutions, this paper proposes evaluating the statistical distributions followed by each household appliance in a house [43] and using them to generate random values that are as close as possible to the real measured values. To check the applicability of the assessment, the averages of sets of simulated values are compared with the averages of the sets of measured values, which shows that the result is accurate.
All the above papers provide different forms of modeling, but none of them deals in depth with the statistical distributions that the consumptions may follow in each period. As a novelty, this paper aims to discover which statistical distribution the consumption of household appliances follows in 10-min periods, from the most basic ones (such as the Normal, Exponential or Weibull distribution) to the less well-known ones (Rayleigh or Generalized Pareto), applying the Kolmogorov-Smirnov test and the χ² test. To guarantee a correct evaluation, the averages of sets of random values are compared with the averages of real values for each hour, checking that the difference between them is acceptable.
Obtaining the statistical distribution of the power consumption of household appliances is of interest for several reasons: it avoids dealing with errors, missing data or out-of-scale values; it can simulate data corresponding to one, three or even ten years in order to assess all possible scenarios; it can be statistically combined with other models in order to obtain more complex theoretical developments; and, last but not least, derived results may be obtained from the type of distribution and its parameters (mean, mode, variance, quantiles, etc.).
This article is organized as follows: Section 2 describes the methodology applied to the two types of consumption considered, i.e., the mathematical models and procedures used to obtain the results. Section 3 describes the case study in detail. Section 4 presents the results after applying the methodology, together with a discussion of the results. Finally, Section 5 presents the conclusions of the work carried out.

Methodology
The power consumption values of the different household appliances are processed according to the type of consumption, i.e., whether or not it is continuous over a long time.

Continuous Power Consumption Household Appliances
The starting data correspond to each 10-min period of the day, from 00:00-00:10 to 23:50-00:00. The distribution of consumption for each period is evaluated. First, every set of power consumption values is evaluated using the Kolmogorov-Smirnov test [44] to check whether it follows a Normal distribution. To do so, the mean (μ) and standard deviation (σ) of each sample are calculated, and a theoretical Normal cumulative distribution function, expressed in Equation (1), is used.
This theoretical cumulative function is compared with the observed cumulative frequency. First, the maximum upper difference (D⁺) is obtained, as in Equation (2).
Subsequently, the maximum lower difference (D⁻) is calculated, as expressed in Equation (3).
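For reference, in the usual formulation of the Kolmogorov-Smirnov test, the Normal cumulative distribution function and the upper and lower differences take the form sketched below. The notation is an assumption: x_(i) denotes the i-th smallest of the n observations, and i/n and (i-1)/n are the observed cumulative frequencies just after and just before x_(i).

```latex
% Assumed standard forms; the notation x_{(i)}, n is illustrative.
\begin{align}
F(x)  &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x}
         \exp\!\left(-\frac{(t-\mu)^{2}}{2\sigma^{2}}\right) dt \\
D^{+} &= \max_{1 \le i \le n} \left( \frac{i}{n} - F\!\left(x_{(i)}\right) \right) \\
D^{-} &= \max_{1 \le i \le n} \left( F\!\left(x_{(i)}\right) - \frac{i-1}{n} \right)
\end{align}
```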
The larger of the upper and lower differences is the maximum absolute difference (D) between the theoretical cumulative function and the observed cumulative frequency, as in Equation (4).

$$D = \max\left(D^{+}, D^{-}\right) \qquad (4)$$

D is compared with Dα, the maximum difference allowed according to the level of significance (α) and the type of distribution. Dα, Equation (5), is calculated by checking Tables 1 and 2 for the values of cα and k(n).

$$D_{\alpha} = \frac{c_{\alpha}}{k(n)} \qquad (5)$$

The significance level selected for these distributions is 0.05 (95% confidence). In Table 1, the coefficient of significance (cα) is selected according to the model, the amount of data if the model is Weibull, and the level of significance. In Table 2, the expression for k(n) is selected according to the distribution, and k(n) is then calculated from the amount of data. In this way, Dα is obtained. As can be seen in Table 2, the expression for k(n) differs depending on the distribution model, from the Normal distribution to the Weibull distribution.
If D < Dα, the null hypothesis H0 of Normality is accepted, so the corresponding period is considered to follow a Normal distribution. The 10-min periods whose distribution rejects the null hypothesis of Normality are assessed to check whether they follow an Exponential distribution, using an analogous process. The differences are the cumulative distribution function, expressed in Equation (6), where λ is the rate parameter, the values of cα, and the way k(n) is calculated according to Table 2.
Again, the Exponential distribution condition is checked, and the sets that do not fit are assessed against a Weibull distribution using the same process with its respective cumulative distribution function, formulated in Equation (7), where β is the scale factor and α is the shape factor.

$$F(x) = 1 - \exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right] \qquad (7)$$

The sets of values that are still not associated with any of the three distributions tested so far are evaluated against the Lognormal, Logistic, Loglogistic, Gamma, Generalized Pareto and Rayleigh distributions until one of these is found to fit.
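As an illustration of the Kolmogorov-Smirnov steps described above, the following minimal Python sketch computes the upper difference, the lower difference and D for a sample, and compares D with Dα = cα/k(n). All function names are hypothetical. The values of cα and k(n) in the usage lines are placeholders of the kind commonly tabulated for the Normal case with estimated parameters; they are not taken from Tables 1 and 2, which should be used in practice.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """Theoretical Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(x, rate):
    """Exponential CDF with rate parameter lambda."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def weibull_cdf(x, scale, shape):
    """Weibull CDF with scale factor (beta) and shape factor (alpha)."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x >= 0 else 0.0


def ks_statistic(sample, cdf):
    """Return (D+, D-, D) for a sample against a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Upper difference: observed cumulative frequency i/n minus theory.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # Lower difference: theory minus observed frequency (i-1)/n.
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus, max(d_plus, d_minus)


def follows_distribution(sample, cdf, c_alpha, k_n):
    """Accept H0 when D is below the allowed difference c_alpha / k(n)."""
    _, _, d = ks_statistic(sample, cdf)
    return d < c_alpha / k_n


# Usage with synthetic data (assumed values, for illustration only).
rng = random.Random(42)
sample = [rng.gauss(250.0, 30.0) for _ in range(45)]
mu, sigma = statistics.mean(sample), statistics.stdev(sample)
n = len(sample)
k_n = math.sqrt(n) - 0.01 + 0.85 / math.sqrt(n)  # placeholder expression
print(follows_distribution(sample, lambda x: normal_cdf(x, mu, sigma),
                           c_alpha=0.895, k_n=k_n))
```

The same `ks_statistic` function can be reused for the Exponential and Weibull checks by passing the corresponding CDF, which mirrors the sequential evaluation described in the text.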
The method used for the last six distributions is Pearson's χ² test [45]. The process begins by grouping the data into five or more classes that cover the whole possible range of values of the variable, and the observed frequency Oi is calculated for each class.
Subsequently, the probability density function of the corresponding model is calculated. The Lognormal distribution is defined by Equation (8), while the Logistic distribution is given by Equation (9), where μl is the location factor and s is the scale factor.
The Loglogistic, Gamma, Generalized Pareto and Rayleigh distributions are given by Equations (10)-(13), respectively, where λa and αd are the skewness and distribution shape factors, respectively.
If a period cannot be represented by any of the distributions assessed, it is associated with the distribution of the nearest period, because the two are very likely to follow the same distribution.
Once all the series of values have an associated distribution, the expected frequency is calculated from the amount of data n, as defined in Equation (14).

$$E_{i} = n\, f(x_{i}) \qquad (14)$$

Finally, χ², defined in Equation (15), is calculated and compared with χ²α(k-r-1), where k is the number of classes and r is the number of parameters on which each distribution depends. If χ² is less than or equal to χ²α(k-r-1), the null hypothesis H0 for the distribution being evaluated is accepted.
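The acceptance rule of Pearson's test can be illustrated with the short Python sketch below. It assumes the usual form of the statistic, i.e., the sum over classes of (Oi - Ei)²/Ei, and takes the critical value as an input read from a χ² table. The function names and the frequencies in the usage lines are hypothetical.

```python
def degrees_of_freedom(k, r):
    """Degrees of freedom k - r - 1 for k classes and r fitted parameters."""
    return k - r - 1


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def accepts_h0(observed, expected, critical_value):
    """Accept H0 when the statistic does not exceed the critical value."""
    return chi_square_statistic(observed, expected) <= critical_value


# Usage: five classes and a two-parameter distribution give 2 degrees of
# freedom; 5.991 is the tabulated critical value for alpha = 0.05.
observed = [18, 22, 20, 21, 19]   # hypothetical class counts
expected = [20, 20, 20, 20, 20]   # hypothetical expected frequencies
print(degrees_of_freedom(k=5, r=2))
print(accepts_h0(observed, expected, critical_value=5.991))
```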
Figure 1 shows a flowchart of the sequence of evaluations carried out before the following process.
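The sequence of evaluations, including the rule of borrowing the distribution of the nearest period, can be sketched in Python as follows. This is only an outline under stated assumptions: the individual tests are passed in as hypothetical callables returning True when H0 is accepted, ties between equally distant periods are resolved in favor of the earlier period, and periods are not wrapped around midnight.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Order of evaluation described in the text.
ORDER = ["normal", "exponential", "weibull", "lognormal", "logistic",
         "loglogistic", "gamma", "generalized_pareto", "rayleigh"]


def first_accepted(sample: Sequence[float],
                   tests: Dict[str, Callable[[Sequence[float]], bool]]
                   ) -> Optional[str]:
    """Return the first distribution in ORDER whose test accepts H0."""
    for name in ORDER:
        test = tests.get(name)
        if test is not None and test(sample):
            return name
    return None


def fill_from_nearest(assigned: List[Optional[str]]) -> List[str]:
    """Give unassigned periods the distribution of the nearest period."""
    known = [i for i, name in enumerate(assigned) if name is not None]
    if not known:
        raise ValueError("no period has an accepted distribution")
    filled = []
    for i, name in enumerate(assigned):
        if name is None:
            # Nearest assigned period; the earlier one wins a tie.
            nearest = min(known, key=lambda j: (abs(j - i), j))
            name = assigned[nearest]
        filled.append(name)
    return filled


# Usage with hypothetical results for four consecutive 10-min periods.
print(fill_from_nearest(["normal", None, None, "gamma"]))
```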
Once all the curves have an associated distribution, a set of 300 random values is generated for each 10-min period according to its associated distribution, and the averages of these sets are compared with the averages of the real values to verify whether each simulation is close to reality (Figure 2).
The percentage of error in each case is calculated by Equation (16).
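A minimal Python sketch of this comparison is given below. It assumes that the percentage of error is the absolute difference between the simulated and real averages relative to the real average; the parameter names and numerical values are hypothetical, and only three of the distributions are included. Note that Python's `random.weibullvariate` takes the scale first and the shape second.

```python
import random
import statistics


def simulate_period(distribution, params, size=300, rng=None):
    """Draw `size` random consumptions for one 10-min period."""
    if rng is None:
        rng = random.Random()
    samplers = {
        "normal": lambda: rng.gauss(params["mu"], params["sigma"]),
        "exponential": lambda: rng.expovariate(params["rate"]),
        # weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        "weibull": lambda: rng.weibullvariate(params["scale"],
                                              params["shape"]),
    }
    draw = samplers[distribution]
    return [draw() for _ in range(size)]


def percentage_error(simulated_mean, real_mean):
    """Assumed form: |simulated - real| / |real| * 100."""
    return abs(simulated_mean - real_mean) / abs(real_mean) * 100.0


# Usage with hypothetical parameters and a hypothetical measured average.
values = simulate_period("normal", {"mu": 120.0, "sigma": 15.0},
                         rng=random.Random(7))
real_average = 118.5
print(round(percentage_error(statistics.mean(values), real_average), 2))
```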

Discontinuous Power Consumption Household Appliances
The washing machine, dishwasher and dryer only consume energy while they are running. The first step consists of counting how many times each appliance is used per day. The duration of each use, its start time and the periods of the day in which it occurs are registered as well.
The uses with the same consumption duration are grouped together and evaluated in the same way as the continuous consumption appliances. The essential difference is that, for discontinuous appliances, only the moments when consumption is taking place are considered, and consequently shorter time intervals, whereas for continuous appliances the whole day is evaluated.
Once the power consumption curves are associated with a distribution, the following process is carried out:
1. Simulation of the number of power consumptions of each appliance, where the probability of each integer value is based on its ratio in the previous count.
2. If the simulated value is greater than or equal to 1, the duration of each use and its start time are simulated, also based on the data previously collected.
3. Simulation of 300 sets of random consumptions according to the duration of the consumptions and their associated distribution, comparing their average value with the average of the real values.
4. Calculation of the percentage of error by Equation (16).
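Steps 1 and 2 above can be sketched in Python as follows. The sketch assumes that the number of uses, the durations and the start periods are drawn independently from their empirical frequencies; the function names and the recorded data in the usage lines are hypothetical.

```python
import random
from collections import Counter


def empirical_sampler(observations, rng):
    """Return a function drawing values with their observed frequencies."""
    counts = Counter(observations)
    values = list(counts)
    weights = [counts[v] for v in values]
    return lambda: rng.choices(values, weights=weights, k=1)[0]


def simulate_day(daily_counts, durations, start_periods, rng=None):
    """Simulate one day: a list of (start_period, duration_min) uses.

    daily_counts  -- number of uses observed on each recorded day
    durations     -- observed durations of the uses, in minutes
    start_periods -- observed 10-min period index at which uses started
    """
    if rng is None:
        rng = random.Random()
    n_uses = empirical_sampler(daily_counts, rng)()   # step 1
    draw_start = empirical_sampler(start_periods, rng)
    draw_duration = empirical_sampler(durations, rng)
    # Step 2: only when n_uses >= 1 are start time and duration drawn.
    return [(draw_start(), draw_duration()) for _ in range(n_uses)]


# Usage with hypothetical records of a dryer.
day = simulate_day(daily_counts=[0, 1, 1, 2],
                   durations=[40, 40, 30],
                   start_periods=[108, 120],
                   rng=random.Random(3))
print(day)
```

The consumption within each simulated use would then be drawn from the distribution associated with that duration, as in step 3.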
Figure 3 shows a flowchart of the entire process for treating the point consumptions, from the different counts to the calculation of errors, including the evaluation of distributions.
The simulation was carried out with the consumption data of a house of 199 m² with three occupants, located in Vancouver. The results can be extrapolated to homes with an equivalent surface area, an equal number of inhabitants and a similar climatic zone. If these conditions change, the results would be different. Thus, if the surface area were larger, the consumption would increase. On the other hand, if it were a single dwelling, consumption would be lower. However, the methodology proposed in this paper can be applied if the number of consumption values available is significant and the values correspond to the same appliances that have been evaluated. Moreover, in some cases, such as the HVAC, the climate data influence the models obtained. This fact is not taken into account in this research work but can be of interest for future work.

Case study
The methodology explained in the previous section was applied to the instantaneous consumption data per second of different household appliances, extracted from [43], namely lighting, refrigerator, HVAC, dryer, washing machine and dishwasher.
The input data are the consumption of different household appliances in a house located in Vancouver. The powers were measured for 63 days, from 6 March 2016 to 7 May 2016. The dwelling consists of two floors, with a total of 199 m² and three people living in it. Before starting the statistical evaluation, the consumption was grouped into 10-min periods and then separated into three groups, i.e., working days, Saturdays and Sundays. The last preliminary step was to distinguish between continuous and discontinuous consumption, given that the treatment of the latter is more complex.
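The grouping step can be illustrated with the Python sketch below. It assumes that the per-second readings falling in the same 10-min period of the same day are averaged into a single value, which is one plausible reading of the grouping described above; the function names and the readings in the usage lines are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def day_type(day):
    """Classify a date or datetime as working day, Saturday or Sunday."""
    weekday = day.weekday()  # Monday = 0 ... Sunday = 6
    if weekday == 5:
        return "saturday"
    if weekday == 6:
        return "sunday"
    return "working_day"


def period_index(ts, minutes=10):
    """Index of the period within the day (0 to 143 for 10 min)."""
    return (ts.hour * 60 + ts.minute) // minutes


def group_readings(readings, minutes=10):
    """Map (day type, period) to one averaged power value per day.

    readings -- iterable of (datetime, power) pairs
    """
    per_slot = defaultdict(list)
    for ts, power in readings:
        per_slot[(ts.date(), period_index(ts, minutes))].append(power)
    grouped = defaultdict(list)
    for (day, period), values in sorted(per_slot.items()):
        # Assumption: readings within a period are averaged.
        grouped[(day_type(day), period)].append(mean(values))
    return dict(grouped)


# Usage with three hypothetical readings on two working days.
readings = [
    (datetime(2016, 3, 7, 0, 0, 0), 100.0),
    (datetime(2016, 3, 7, 0, 5, 0), 200.0),
    (datetime(2016, 3, 8, 0, 1, 0), 50.0),
]
print(group_readings(readings))
```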
The reason for evaluating 10-min periods is a trade-off between accuracy and computational cost. On the one hand, in a 10-min period, an appliance is not very likely to experience a large number of consumption changes. If the period were 20 min, there would be a risk of adding a larger error, as it could cover more than two phases of operation, e.g., of a dishwasher. On the other hand, if the period were reduced to 5 min, the accuracy would increase, but the computational cost would double, as twice as many periods would have to be modeled. Thus, a period of 10 min offers an adequate balance.
The refrigerator, lighting and HVAC have a considerable minimum consumption at all times, while the dryer, dishwasher and washing machine only consume a considerable amount when they are running; thus, they are considered point loads, or discontinuous consumptions.
A summary of the data can be seen in Table 3, with the type of consumption, type of days and amount of data.

Results
Figure 4 compares the simulated average values with the real average values for each 10-min period of lighting during the weekdays. In general, the trends in both graphs are very similar, since the off-peak hours, between 08:00 and 13:00, coincide, and the peak hours, around 07:00, are the same. However, there are some significant differences between the simulation and the measured situation, especially at 15:00, just at the end of the hours of minimum consumption.
The error of the simulation with respect to the measured values is 3.81%, an acceptable value, which makes the simulation valid and close to the real situation.
Figure 5 shows the comparison for lighting on Saturdays. As in the previous graph, several values coincide and the trend is the same, but there are also two or three consumption peaks with a somewhat more significant difference than in the previous case, which raises the error to 4.26%; it is still a valid simulation that is close to reality. In the case of Sundays, the two curves follow slightly different trends (Figure 6), with significant differences at some specific points. The error produced in this simulation is 4.27%, very close to that of Saturdays; thus, all the simulations for the three lighting cases are acceptable.
In the case of the refrigerator during working days (Figure 7), the same trend and generally insignificant differences can be seen, except for the first peak. In many cases, the values are practically the same or very close. The error is 3.91%; thus, the simulation is considered successful. On Saturdays (Figure 8), the simulation of refrigerator consumption is even tighter than on weekdays. Differences are generally minimal, leading to an error of 3.33%. For Sundays (Figure 9), the same happens as with the simulations for Saturdays, with minimal deviations and a high coincidence. Nevertheless, the error amounts to 3.61%, which is not very significant. The three simulation graphs for the refrigerator consumption fit the real situation closely.
In the case of HVAC on weekdays (Figure 10), the same general trend between the simulation and the real values can also be observed, but in this case there are more differences, although they are not particularly significant. These inaccuracies are responsible for an error of 5.55%, the highest error of all simulations. For Saturdays (Figure 11), alongside a high number of exact coincidences, there are also a few considerably significant deviations. However, the error produced in this case is smaller than for weekdays and drops to 4.72%. Finally, in the HVAC simulation for Sundays (Figure 12), the trends are practically the same and show a high number of coinciding values, and consequently a lower error (4.03%). The three HVAC simulations are also considered acceptable despite being the ones with the highest error.
In the case of the dryer during working days (Figure 13), the most frequent duration is a 40-min consumption, as can be seen in the graph. In this case, the simulation practically coincides with the real average, producing an error of 1.17%. The graphs for other durations have not been included, as those durations are very unlikely and therefore have low significance. In the case of the washing machine during weekdays (Figure 14), the most frequent consumption durations are approximately 10 and 70 min. In this case, the simulations are also close to the average of the real data, with errors of 1.04% and 0.78% for the 10-min and 70-min durations, respectively. Both errors are low and, therefore, valid. For the dishwasher during weekdays (Figure 15), 30 and 40 min were the most frequent durations. In the two cases, the errors (2.33% for 30 min and 0.78% for 40 min) are admissible, so the two approximations are also valid.
Table 4 shows a summary of the results obtained for each appliance, with the errors for each simulation. Given the results of comparing the simulations, the applied methodology can be considered valid to predict the consumption of any home where the appliances are the same as those evaluated, i.e., lighting, HVAC, refrigerator, dryer, washing machine and dishwasher. Moreover, the results can be extrapolated to homes whose surface area is equivalent to the one studied (199 m²) and whose number of inhabitants is the same (three in this case). If these characteristics change, the results would no longer be applicable. However, the methodology can be applied in the same manner and, after a proper validation, the results obtained would be considered valid.

Conclusions
The consumption of six household appliances in 10-min periods, based on the data provided, was assessed and simulated using an associated distribution. Three of the appliances consume continuously (lighting, refrigerator and HVAC), while the remaining appliances consume only in some periods of the day (dryer, washing machine and dishwasher). In order to establish the best model for each case, a methodology was proposed, which consisted of checking, in order of expectation, whether the data could be approximated by statistical distributions. In the continuous consumption case, the best results were obtained for the refrigerator, then the lighting, and finally the HVAC, while for the discontinuous consumptions the best results were obtained for the washing machine, then the dryer, and finally the dishwasher.
The simulation of the continuous consumptions can be extrapolated to any house, as they correspond to the most demanded appliances in most of them. For the dryer, washing machine and dishwasher, i.e., the discontinuous ones, the use can vary widely, since the duration can be longer or shorter. In these cases, it was decided to analyze the most frequent durations, as they are more significant. Generally speaking, good results were obtained and, therefore, the simulation curves could also be extrapolated to other houses.

Symbols and Acronyms
Symbols are used throughout the paper and are included here for reference.

Figure 1. Flowchart of the sequence of statistical distribution evaluations.

Figure 2. Flowchart of the comparison between random and real consumptions.

Figure 3. Flowchart of the treatment of the point consumptions.

Figure 4. Random (blue) and real (orange) average consumption of the lighting during working days.

Figure 5. Random (blue) and real (orange) average consumption of the lighting during Saturdays.

Figure 6. Random (blue) and real (orange) average consumption of the lighting during Sundays.

Figure 7. Random (blue) and real (orange) average consumption of the refrigerator during working days.

Figure 8. Random (blue) and real (orange) average consumption of the refrigerator during Saturdays.

Figure 9. Random (blue) and real (orange) average consumption of the refrigerator during Sundays.

Figure 10. Random (blue) and real (orange) average consumption of HVAC during working days.

Figure 11. Random (blue) and real (orange) average consumption of HVAC during Saturdays.

Figure 12. Random (blue) and real (orange) average consumption of HVAC during Sundays.

Figure 13. Random (blue) and real (orange) average consumption of the dryer for a duration of 40 min during working days.

Figure 14. Random (blue and yellow) and real (orange and purple) average consumption of the washing machine for durations of 10 and 70 min during working days.

Figure 15. Random (blue and purple) and real (red and orange) average consumption of the dishwasher for durations of 30 and 40 min during working days.

Table 1. cα values according to the model and the significance level.

Table 2. k(n) according to the distribution function.

Table 3. Summary of the evaluated appliances with their type of consumption, type of days and amount of data.

Table 4. Summary of the error values.