Demand Response Alert Service Based on Appliance Modeling

Demand response has been widely developed in recent years to increase efficiency and decrease costs in the electric power sector by shifting energy use and smoothing the load curve, thus ensuring benefits for all participating parties. This paper introduces a Demand Response Alert Service (DRAS) that can optimize the interaction between energy industry parties and end users by sending the minimum number of relatable alerts needed to achieve the desired transformation of the load curve. The service creates appliance models for certain deferrable appliances based on past-usage measurements and prioritizes households according to the probability that their appliances will be used. Several variations of the appliance model are examined with respect to the probabilistic association of appliance usage on different days. The service is evaluated in a peak-shaving scenario in which either one or more appliances per household are involved. The results demonstrate a significant improvement compared with a random selection of end users, promising increased participation and engagement. Indicatively, in terms of the Area Under the Curve (AUC) index, the proposed method achieves, in all the studied scenarios, an improvement ranging between 41.33% and 64.64% compared to the baseline scenario. In terms of the F1 score, the respective improvement reaches up to 221.05%.


Background
A series of measures have been suggested in recent years for the demand side of Power Systems (PSs), aiming to balance the variability of the Renewable Energy Sources (RESs) introduced on their production side, decrease the respective costs, and increase overall system efficiency. These measures comprise what is widely referred to as Demand Side Management (DSM) [1]. One of the main DSM objectives is the transformation of load curves through different strategies, such as peak shaving, strategic conservation or load growth, valley filling, load shifting, or flexible load shape [2]. An important concept within DSM is Demand Response (DR) [3]. DR involves providing control signals to the demand side to shift energy usage at critical times, e.g., times of extremely high demand, high non-dispatchable production, or when grid reliability is jeopardized. DR can potentially add stability to the system and decrease the need for coal- and gas-fired spinning reserves, as well as decrease operating costs and wholesale prices.
DR programs are either incentive-based or price-based [4] (the latter also referred to as time-based [1]). In incentive-based programs, users are rewarded for changes in energy use relative to a baseline level, while in price-based programs varying prices are used to promote changes in energy use [5]. Incentive-based DR includes Direct Load Control (DLC), Emergency DR (EDR), Interruptible/Curtailable (I/C) service, Capacity Market (CAP), Demand Bidding/Buyback (DB), and Ancillary Service (AS) market clearing programs. Price-based DR, on the other hand, includes Time of Use (TOU), dynamic TOU, and Real-Time Pricing (RTP) programs. Combinations or variations of these programs have also been developed.
DR has so far been applied mostly in the industrial sector. However, there has been rising interest in the DR potential of the residential sector. The US energy markets show an increasing number of both time- and incentive-based residential DR programs in recent years [6]. The implementation of these DR programs in the USA is facilitated by a steadily increasing penetration rate of Advanced Meters (AMs, also referred to as Smart Meters (SMs)) in the residential sector (57% in 2020), as well as by a series of regulatory and legislative changes. In the EU, DR faces a series of policy and regulatory obstacles. Several EU member states have not yet seriously engaged with DR reforms, and their regulatory structures do not yet actively support the participation of the demand side in the markets [7]. In addition, there is a serious delay in the penetration of SMs in residential installations (expected to reach 80% by 2020 but currently reaching only 43%). Furthermore, a lack of interoperability has been highlighted among the different types of AMs already installed [8].
Residential DR programs are implemented either through automated control systems or through manual shifting by the end users. In Automated Demand Response (ADR) programs, an Energy Consumption Scheduler (ECS) [9] or a Home Energy Management System (HEMS) [10] is required, together with an SM installation, to optimize the schedules of the respective appliances based on different constraints, optimization goals, and pricing signals. In some ADR programs, users may override such control. ADR is developing slowly due to the high cost of the required infrastructure. Another important open issue is the end users' perceived loss of privacy or control due to automation [11].
In manual DR, end users are expected to shift their consumption in response to an incentive or a price signal. The end users' response to a DR event in such programs depends heavily on their demographics [12], as well as on their knowledge and perception of their energy use, which is often limited or incorrect [13]. A potential drawback of manual DR is that, if end users receive the same incentives or price signals, they could shift their appliance use in a similar way and create new peaks in total consumption [14].
EDR is a potentially promising manual DR program type for peak shaving. An opt-out pilot program of this type, run in California, USA, in 2015 [15], enrolled 74,900 households and used personalized messaging based on neighbor comparison to reduce peak demand during four event days called "Summer Saving Days", with no monetary incentives. The peak demand reduction declined from about 2.2% on the first event to about 1.2% on the fourth. As noted by the authors, "this decline over time is approximately linear and might result from a coincidence of other temporal factors with the performance of the program (e.g., weather) or it might result from the fact that the impact of DR messaging weakens with repeated exposure". A similar approach, with monetary incentives, was adopted in a 2017 pilot study in China with 20,000 enrolled households, which were invited via an SM on certain days to participate in a 1.5 h peak reduction [16]. Households replying "YES" were considered policy-responding households and received monetary subsidies based on their electricity savings during the response period. The pilot study reported an effective participation of 11%, with average household electricity savings of 0.86 kWh during the peak period. "Defeat the Peak" was another peak-shaving program, implemented in the USA in 2018 with 16,000 participating households, which used a collective charity donation as the end users' incentive [17]. Six peak events were set throughout the program, and consumers were asked to curtail energy use during specific time periods (3 or 4 h in the afternoon). However, significant savings were reported only when the event coincided with the day of the respective annual peak demand.

Related Work
In all the above programs, alerts included the straightforward recommendation that users should reduce energy use during a DR event. Pratt and Erickson suggested that more specific recommendations could have increased the impact of their program [17]. Moreover, as mentioned earlier, [15] reports that user fatigue due to frequent or unrelatable DR recommendations may be another factor affecting participation during DR events. However, creating specific, personalized recommendations requires effective modeling of energy use.
Bottom-up end-use modeling has been widely used in recent years as part of effective DR prediction and semi- or fully automated DR rescheduling. Variations of Markov models [18,19], neural networks [20], classifiers [21], and event-based approaches have been used in the literature [22-25] to obtain appliance models that can successfully estimate appliance energy use or potential flexibility.
Another approach to this end is appliance-level power consumption forecasting using different machine learning methods, e.g., deep learning [26,27] or Conditional Hidden Semi-Markov Models (CHSMMs) [28]. The forecasts in these papers, though, are either short term (one time interval ahead, be it a minute or an hour) or rely on external features other than power consumption (humidity, precipitation, temperature, cloud cover, etc.) [29]. Our approach differs in that it performs next-day forecasting of the ON/OFF states of appliances; in addition, the resulting model is explainable and can therefore be more easily examined by an expert for validity.
An approach closer to our method is that of [30], which performs behavioral energy consumption pattern mining for appliance-appliance and appliance-time associations derived from smart meter data, and uses association rules and clustering to provide insight into consumers' energy consumption decision patterns. That method, though, would need to be extended to mine patterns that include both the type of day and the time of day, along with a way to rank households for receiving a personalized recommendation.
Finally, probability distributions of appliance use can be used as preference indicators in appliance scheduling optimization problems. For example, in [31], the hourly energy consumption curve of an appliance type is transformed into a probability distribution by converting the average energy consumption of appliances (per household) into an appliance use distribution using histogram construction algorithms. Such an approach, though, is too simplistic and, as we will see in Section 3.4, performs worse.
Huber et al. [24] went one step further by demonstrating a first comparison between a histogram, a pattern search, and a Bayesian inference algorithm for the modeling of appliance use. These models, however, are aimed toward the effective modeling of a single end user. In a DR implementation, an aggregator has to identify the part of a consumer group that will most probably respond to a DR request. In this context, the end-use modeling problem becomes rather a problem of probability ranking within this consumer group.

Contribution
For these reasons, [25] presented a targeted messaging mechanism that aims to improve DR communication services by suggesting appliance-based DR actions to end users during a peak DR event, based on their past appliance usage. To this end, it is assumed that data on past energy use are available through an SM, and that appliance data are available either via Non-Intrusive Load Monitoring (NILM) algorithms or via smart plugs. As a first step, appliance profiles are created for each major DR appliance and end user based on past-usage data. Only the users with the highest expected consumption for this appliance during the examined peak period are selected to receive the DR alert. This paper extends that work by testing the efficiency of three different appliance models, in a single-appliance and a multi-appliance scenario, on two different publicly available datasets and a variety of appliance types. In the multi-appliance scenario, end users also receive a specific recommendation on which appliance to turn off. The contributions of this work are therefore as follows:
• Different variations of a probabilistic, data-driven appliance model that allows users to be ranked based on their past appliance usage are presented;
• In the multi-appliance scenario, appliance-based personalized DR alerts are offered, which can help end users choose the right actions for reducing energy use during the DR event;
• Overall, an improved data-driven, appliance-based DRAS is introduced to increase participation in manual DR programs;
• The DRAS is evaluated on real-world data and on different appliances and households.
The process for the creation of appliance models is presented in Section 2. The evaluation on the real-world data is presented in Section 3, which is followed by the Discussion and Conclusions sections.

Method
The approach is based on measurements of individual major household appliances, which are used to develop probabilistic appliance use models. For the purposes of this work, the required measurements are obtained from publicly available residential electrical energy datasets. A training period is set, and appliance models are created based on the appliance usage data during this period. Different single- and multi-appliance scenarios are examined. The part of the dataset that does not belong to the training period is used to evaluate the DRAS.

Appliances
When analyzing the potential for shifting residential load for DR, it is important to recognize and categorize residential appliances in terms of their DR potential based on the following aspects: contribution to total load, flexibility over time, and variability of energy use [32]. Electric Vehicles (EVs) can serve as ideal DR appliances, since their recharging is energy demanding, interruptible, and possible at different charging rates. For this reason, specific rates and rate programs for electric vehicles have been introduced in the US markets in recent years. Another popular solution in the US market is the introduction of smart thermostats and the respective DR programs for HVAC, pool pumps, and water heaters, which are all common loads in US residences. Wet or washing appliances, i.e., washing machines, dishwashers, and tumble dryers, have also been widely used in market research as DR appliances [33]. These are deferrable in time but not variable in terms of energy use. As reported by [34], wet appliances can play an important part in DR, particularly in Europe [35].

Appliance Modeling
This study builds upon an earlier work on a probabilistic appliance modeling scheme for DR alerts [25] by examining variations of the initial model and reassessing the approach for more single-appliance and multi-appliance scenarios.
Two states are assigned to each appliance (ON/OFF). The ON state is attributed to an appliance if its mean power during a 15 min period (a typical SM data resolution) exceeds a representative threshold. In this work, a threshold of 20 W (mean power over a 15 min period) was used without loss of generality, as it was sufficient to distinguish the ON state of the studied appliances from measurement noise. One model is created for each appliance, under the modeling assumption that the use of each appliance is independent of the use of the other appliances. The model aims to estimate the probability, $P(I(t_1, t_2) \mid d)$, that an appliance is active during the time interval $I(t_1, t_2)$, between times $t_1$ and $t_2$, on a particular calendar day, $d$. To estimate $P(I(t_1, t_2) \mid d)$, we make a series of modeling approximations. First, we model the probability, $P(D \mid d)$, of the event $D$ that the appliance is used on day $d$. Specifically, let $T(d)$ be the type of day (e.g., "Monday" or "Business Day"). Our model assumes a regular usage pattern such that the probability of using the appliance on $d$ depends only on the type of day, i.e., $P(D \mid d) \approx P(D \mid T(d))$, written as $P(D \mid T)$ for notational simplicity. Types of days are further discussed in Section 2.4.
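A minimal sketch of the ON/OFF rule, assuming NumPy and raw readings at 1 min resolution (the input rate and the helper name `on_off_states` are illustrative assumptions): readings are averaged over non-overlapping 15 min slots, and a slot is ON when its mean power exceeds 20 W.

```python
import numpy as np

THRESHOLD_W = 20.0       # ON threshold on the 15 min mean power (value from the text)
SAMPLES_PER_SLOT = 15    # assumption: one reading per minute -> 15 readings per slot


def on_off_states(power_w, samples_per_slot=SAMPLES_PER_SLOT, threshold_w=THRESHOLD_W):
    """Return one boolean ON/OFF state per 15 min slot (hypothetical helper)."""
    power_w = np.asarray(power_w, dtype=float)
    n_slots = len(power_w) // samples_per_slot
    # Drop a trailing partial slot, then average each block of readings.
    slots = power_w[: n_slots * samples_per_slot].reshape(n_slots, samples_per_slot)
    return slots.mean(axis=1) > threshold_w


# 15 min idle, then a 5 min burst at 2 kW, then 15 min of 5 W standby noise.
readings = [0.0] * 15 + [2000.0] * 5 + [0.0] * 10 + [5.0] * 15
print(on_off_states(readings).tolist())  # [False, True, False]
```

The second slot shows why the rule works on slot averages: even a short high-power cycle clears 20 W, while low-level standby noise does not.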
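Assuming the appliance is used on $d$, the next step is to model the probability that the appliance is active during a specific time interval of the day. To do that, we first model the event $A(t)$ that the appliance is active at time $t$. This is true if the appliance started consuming power before $t$ and its usage duration was long enough for it to still be active at $t$. If $S$ is the usage start time and $U$ the usage duration, the event $A(t)$ can be approximated by

$$A(t) \approx \bigcup_{k=0}^{k_{\max}} \Big( \{ t-(k+1)\Delta t < S \le t-k\Delta t \} \cap \{ U > k\Delta t \} \Big) \tag{1}$$

for sufficiently small durations, $\Delta t$, where $t_{\max} = k_{\max}\Delta t$ is the maximum appliance usage duration. Notice that the events in the union of Equation (1) are mutually exclusive, since they correspond to disjoint ranges of $S$. Then, assuming the duration of the usage of an appliance is independent of its usage start time, the probability of $A(t)$, i.e., that the appliance is active at time $t$ of a usage day, is

$$P(A(t) \mid D) \approx \sum_{k=0}^{k_{\max}} P\big(t-(k+1)\Delta t < S \le t-k\Delta t \mid D\big)\, P(U > k\Delta t) \approx \int_{0}^{t_{\max}} f_S(t-u \mid D) \int_{u}^{t_{\max}} f_U(v)\, dv\, du \tag{2}$$

where $f_S(s \mid D)$ is the probability density function (PDF) of $S$, given that the appliance is used on that day, and $f_U(u)$ is the PDF of the duration, $U$.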
If we work with discrete random variables and probability mass functions, the same calculation can be approximated through

$$P(A(t) \mid D) \approx \sum_{k=0}^{k_{\max}} P(S = t-k\Delta t \mid D)\, P(U > k\Delta t) \tag{3}$$

Similarly, the event $I(t_1, t_2)$ that the appliance is active at some time during the interval $[t_1, t_2]$ is equivalent to the event

$$I(t_1, t_2) = \{ t_1 \le S \le t_2 \} \cup A(t_1) \tag{4}$$

i.e., the appliance either started during $[t_1, t_2]$ or it started sometime before $t_1$ but its usage duration was long enough for it to be active at $t_1$ and afterward. Thus, the probability that the appliance is active during $[t_1, t_2]$ of day $d$ is

$$P(I(t_1, t_2) \mid d) \approx P(D \mid T) \left[ \int_{t_1}^{t_2} f_S(s \mid D)\, ds + P(A(t_1) \mid D) \right] \tag{5}$$

We have therefore defined the probability used to rank consumers in decreasing order of the probability of using the target appliance during the interval $I$, in terms of $P(D \mid T)$, $f_S(s \mid D)$, and $f_U(u)$. These distributions can be estimated from the available data, as described in the following section.
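The discrete form of the calculation above can be sketched on a grid of 15 min slots (96 per day). The sketch assumes that `start_pmf[s]` already holds P(S = s | D) and that `duration_pmf[j]` is the probability that the usage duration falls in the j-th 15 min bin, so that P(U > kΔt) is the sum of the bins j ≥ k; the function name and this indexing convention are illustrative assumptions, and usage carried over from the previous day is ignored.

```python
import numpy as np


def prob_active_in_interval(p_day, start_pmf, duration_pmf, t1, t2):
    """Approximate P(I(t1, t2) | d) on 15 min slots; t1..t2 are inclusive slot indices.

    p_day           : P(D | T) for the day type of d
    start_pmf[s]    : P(S = s | D), s = 0..95
    duration_pmf[j] : P(duration in the j-th 15 min bin), j = 0 -> 0-15 min
    """
    start_pmf = np.asarray(start_pmf, dtype=float)
    # Survival function: surv[k] = P(U > k * dt) = sum of duration_pmf[j] for j >= k.
    surv = np.cumsum(np.asarray(duration_pmf, dtype=float)[::-1])[::-1]
    # Term 1: the appliance starts inside [t1, t2].
    p_start_inside = start_pmf[t1 : t2 + 1].sum()
    # Term 2: it started k >= 1 slots before t1 and is still running at t1
    # (k = 0 is already counted in term 1, so it is skipped to avoid double counting).
    p_carry_over = sum(start_pmf[t1 - k] * surv[k] for k in range(1, len(surv)) if t1 - k >= 0)
    return float(p_day * (p_start_inside + p_carry_over))


# Toy example: 17:00-19:00 corresponds to slots 68..75.
start = np.zeros(96)
start[66] = 0.5  # half of the usage days start at 16:30
start[70] = 0.5  # the other half start at 17:30
durations = [0.2, 0.3, 0.5]  # 0-15, 15-30, 30-45 min
print(prob_active_in_interval(0.8, start, durations, 68, 75))  # about 0.6
```

In the toy example, the 16:30 start only counts when the run lasts 30-45 min (probability 0.5), which is exactly the carry-over term.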

Estimating the PDFs
Given a dataset with active power consumption measurements, $P_a(t)$, for appliance $a$, we identify appliance activation events as intervals for which $P_a$ exceeds a threshold, i.e., $P_a(t) > P_0$. In the experiments of Section 3, we set the threshold $P_0$ to 20 W. Then, the probability $P(D \mid T)$ is the percentage of days of type $T$ that include at least one consumption interval for $a$.
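As an illustration, activation events and P(D|T) can be extracted from a matrix of daily ON/OFF slot states (one row per training day, e.g., obtained by thresholding at 20 W). The helper names and the day-type labels below are hypothetical.

```python
from collections import defaultdict

import numpy as np


def activation_events(on_day):
    """Return (start_slot, length_in_slots) for every contiguous run of ON slots in one day."""
    events, start = [], None
    for slot, is_on in enumerate(on_day):
        if is_on and start is None:
            start = slot                      # a new run begins
        elif not is_on and start is not None:
            events.append((start, slot - start))
            start = None                      # the run has ended
    if start is not None:                     # run still ON at the end of the day
        events.append((start, len(on_day) - start))
    return events


def p_use_given_day_type(on, day_types):
    """P(D|T): share of training days of each type with at least one ON slot."""
    used, total = defaultdict(int), defaultdict(int)
    for day_on, day_type in zip(on, day_types):
        total[day_type] += 1
        used[day_type] += int(np.any(day_on))
    return {t: used[t] / total[t] for t in total}


print(activation_events([0, 1, 1, 0, 0, 1]))  # [(1, 2), (5, 1)]
```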
The PDF $f_S$ is modeled as a univariate Gaussian Mixture Model (GMM):

$$f_S(s \mid D) = \sum_{k=1}^{K} a_k\, \mathcal{N}(s; \mu_k, \sigma_k) \tag{6}$$

where $s$ is the start time, $K$ is the number of mixture components, $a_k$ are the mixture coefficients with $\sum_{k=1}^{K} a_k = 1$, and $\mathcal{N}(s; \mu_k, \sigma_k)$ is a univariate Gaussian distribution with mean $\mu_k$ and standard deviation $\sigma_k$. In our experiments, we set a fixed $K = 10$ for all appliances. The GMM parameters are estimated with the Expectation-Maximization (EM) algorithm on the start times of all appliance events in the training set.
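The EM fit can be sketched in plain NumPy as follows. This is a minimal illustration, not the authors' implementation: start times are expressed in hours, the means are initialised at evenly spaced quantiles of the data, and a floor on the standard deviations (0.25 h, an assumption) keeps components from collapsing onto repeated start times. A library implementation such as scikit-learn's `GaussianMixture` could be used instead. The fitted density is then discretised into 96 slot probabilities and renormalised so that they sum to 1.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_gmm_1d(x, k=10, n_iter=200, min_sigma=0.25):
    """Fit a univariate K-component GMM to start times x (hours) with EM."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    sigma = np.full(k, max(x.std(), min_sigma))
    a = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of component j for sample i, shape (n, k).
        dens = a * normal_pdf(x[:, None], mu, sigma)
        r = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: re-estimate mixture weights, means and standard deviations.
        nk = np.maximum(r.sum(axis=0), 1e-12)
        a = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, min_sigma)
    return a, mu, sigma


def start_pmf_from_gmm(a, mu, sigma, n_slots=96):
    """Discretise f_S into P(S = slot | D) using the density at each slot centre."""
    hours_per_slot = 24.0 / n_slots
    centres = (np.arange(n_slots) + 0.5) * hours_per_slot
    pmf = (a * normal_pdf(centres[:, None], mu, sigma)).sum(axis=1) * hours_per_slot
    return pmf / pmf.sum()


rng = np.random.default_rng(0)
starts = np.concatenate([rng.normal(8.0, 0.3, 200), rng.normal(19.0, 0.3, 200)])
pmf = start_pmf_from_gmm(*fit_gmm_1d(starts))
print(round(pmf[28:36].sum() + pmf[72:80].sum(), 2))  # most mass in 07-09 h and 18-20 h
```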
Finally, we approximate $f_U$ via a probability mass function giving the probability that the appliance usage duration falls in each 15 min interval (0-15 min, 15-30 min, etc.). This is also computed from the set of all activation events of the appliance in the training set.
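Continuing the illustration, the duration PMF can be obtained by counting event lengths. Here an event spanning n ON slots is assigned to bin n - 1 (bin 0 = 0-15 min), and durations are capped at `max_bins` bins (16 bins, i.e., an assumed maximum duration of 4 h); both conventions are assumptions rather than details given in the text.

```python
import numpy as np


def duration_pmf(event_lengths_slots, max_bins=16):
    """PMF over 15 min duration bins; an event of n ON slots falls into bin n - 1."""
    lengths = np.asarray(event_lengths_slots, dtype=int)
    if lengths.size == 0:
        raise ValueError("no activation events in the training set")
    bins = np.clip(lengths - 1, 0, max_bins - 1)   # longer events go to the last bin
    counts = np.bincount(bins, minlength=max_bins).astype(float)
    return counts / counts.sum()


print(duration_pmf([1, 2, 2, 4], max_bins=4).tolist())  # [0.25, 0.5, 0.0, 0.25]
```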
[Figure 1a]

Variations of the Model
One can consider many variations of this basic model, depending on the volume of data available for training. For example, different options for the type of day, $T$, include using "workdays" vs. "non-workdays" or differentiating among the individual days of the week (Mon, Tue, Wed, Thu, Fri, Sat, and Sun). The former assumes a regular usage pattern that is similar across all business days and, separately, across all non-business days. The latter makes this assumption only for the same day of the week (e.g., Thursdays) and requires more training data. Similarly, another option is to include the type of day in the calculation of the PDF $f_S$ of the appliance usage start time (in the basic model, it is independent of $T$).
Another option explored in the experiments of this work is to include in the model a variable, $Y$, denoting whether the appliance under study was used on the previous day. The use of $Y$ affects the probability that the appliance is used on a particular day, i.e.,

$$P(D \mid d) \approx P(D \mid T, Y) \tag{7}$$

The distribution $P(D \mid T, Y)$ is calculated from the activation events in the data.
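A sketch of estimating P(D|T, Y) by counting, assuming a chronological list of daily usage flags (True if the appliance had at least one activation event that day) and matching day-type labels; the first day is skipped because its Y is unknown. Passing the same label for every day gives the variant in which the probability depends only on Y. The function name is hypothetical.

```python
from collections import defaultdict


def p_use_given_type_and_prev(used, day_types):
    """Estimate P(D | T, Y) with Y = 1 if the appliance was used on the previous day."""
    hits, total = defaultdict(int), defaultdict(int)
    for i in range(1, len(used)):   # day 0 has no previous day, so it is skipped
        key = (day_types[i], int(used[i - 1]))
        total[key] += 1
        hits[key] += int(used[i])
    return {key: hits[key] / total[key] for key in total}


used = [True, False, True, True, False]
# With a single day type: after a usage day (Y = 1), 1 of 3 days is a usage day;
# after a non-usage day (Y = 0), 1 of 1.
print(p_use_given_type_and_prev(used, ["any"] * len(used)))
```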

Performance Metrics
As far as performance metrics are concerned, Wenninger et al. [21] highlighted the weakness of accuracy as a metric for evaluating the appliance modeling results of event-based algorithms. Pereira and Nunes [35] provide an extensive list of performance metrics for event-based algorithms. Given the nature of the problem at hand, precision, recall, F1 score, and AUC were selected as evaluation metrics. In this study, precision and recall are defined as follows:

$$\text{Precision} = \frac{\text{appliances selected by the DRAS and turned ON during the DR event}}{\text{all appliances selected by the DRAS}}$$

$$\text{Recall} = \frac{\text{appliances selected by the DRAS and turned ON during the DR event}}{\text{all appliances turned ON during the DR event}}$$

The F1 score is the harmonic mean of precision and recall.
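For illustration, these metrics can be computed from the set of alerted households and the set of households whose appliance was ON during the event. The AUC is computed here as the ROC AUC of the ranking scores, i.e., the probability that a randomly chosen positive household outranks a randomly chosen negative one (ties counting one half); that the AUC used in the paper refers to the ROC curve is an assumption, and the function names are hypothetical.

```python
import numpy as np


def selection_metrics(selected, turned_on):
    """Precision, recall and F1 of a set of alerted households (sets of IDs)."""
    tp = len(selected & turned_on)                      # alerted and actually ON
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(turned_on) if turned_on else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def roc_auc(scores, labels):
    """P(random positive scores higher than random negative), ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


print(selection_metrics({"h1", "h2", "h3"}, {"h2", "h3", "h4", "h5"}))
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Because the AUC is computed over the full ranking, it does not depend on how many households are alerted, whereas precision, recall, and F1 do.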

Dataset
The DRAS was evaluated on the available wet appliances (i.e., washing machines, dishwashers, and tumble dryers) in the REFIT [36] and HES [37] datasets. The data were downsampled to a 15 min sampling period. The REFIT dataset included 20 washing machines, 15 dishwashers, and 12 tumble dryers. The HES dataset contained 22 washing machines, 16 dishwashers, and 11 tumble dryers. Each dataset was restricted to the common period for which measurements were available for all houses (341 days for REFIT and 168 days for HES). Each dataset was then divided equally (50/50) into a training and an evaluation period.

Single-Appliance DR Scenario
In the single-appliance DR scenario, it is assumed that only one type of appliance per household participates in the DR program. The DRAS sends alerts to a pre-specified number of households (2, 4, 6, 8, and 10 out of the 20 REFIT or 26 HES houses), asking them to avoid using this specific appliance.
The households that receive an alert are selected according to the probability, produced by the appliance use model, that the specific appliance will be used in each household within the specific timeframe. This selection is tested on the evaluation periods of the considered datasets. To further assess the model's ability to select the appropriate households, a random selection process is also performed on the evaluation periods (results averaged over five runs), and its results are compared with those of the proposed model. In the following tables, the suffix "_s" in the evaluation metrics denotes results obtained through selection by the proposed model, while the suffix "_r" denotes results obtained by random selection. The different models used are presented in the following subsections.
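The two selection procedures can be sketched as follows: the proposed model picks the k households with the highest predicted probability ("_s"), while the baseline draws k households uniformly at random and averages precision and recall over five runs ("_r"). Household identifiers, the seed, and the function names are illustrative assumptions.

```python
import numpy as np


def top_k_households(probs, k):
    """'_s' selection: the k households with the highest predicted probability of use."""
    return set(sorted(probs, key=probs.get, reverse=True)[:k])


def random_baseline(households, turned_on, k, runs=5, seed=0):
    """'_r' selection: mean precision and recall of k random households over several runs."""
    rng = np.random.default_rng(seed)
    precisions, recalls = [], []
    for _ in range(runs):
        chosen = set(rng.choice(households, size=k, replace=False).tolist())
        tp = len(chosen & turned_on)
        precisions.append(tp / k)
        recalls.append(tp / len(turned_on) if turned_on else 0.0)
    return float(np.mean(precisions)), float(np.mean(recalls))


probs = {"h1": 0.9, "h2": 0.1, "h3": 0.5, "h4": 0.7}
print(top_k_households(probs, 2))              # {'h1', 'h4'} (set order may vary)
print(random_baseline(list(probs), {"h1", "h3"}, k=2))
```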

Workdays versus Non-Workdays
The model assumes in this case that the probability of using an appliance on any given day depends only on whether it is a workday or a non-workday (weekends and holidays). This approach was evaluated on the wet appliances of both datasets. The results for the washing machines of both datasets are presented in Table 1; they correspond to an evening peak period between 17:00 and 19:00. In all cases presented in Table 1, the proposed model results are significantly better than the random selection results. In terms of AUC, a metric that does not depend on the number of selected households, the model results exhibit a 54% and 44% improvement in REFIT and HES, respectively, compared with random selection. As expected, as the number of alerts increases, recall increases while precision decreases, with the F1 score indicating the best balance between four and six alerts. At the same time, the values of these metrics are expected to be lower for HES than for REFIT, as the former includes more households than the latter, while the numbers of selected households are the same for both datasets.

Seven Day Types
The model assumes in this case that the probability of the use of an appliance on any given day depends only on the day of the week, i.e., Monday to Sunday. This model was evaluated on all wet appliances for both datasets. As in the previous case study, the following results correspond to an evening peak period between 17:00 and 19:00.
In general, the same observations can be made in Tables 2-4 as in the previous case. Moreover, for the washing machines, the results are comparable to (and in some cases slightly worse than) those of the previous scenario.

Different Peak Periods
DR programs for peak shaving have mostly targeted the early evening peak, but the frequency of alerts and the duration of DR events vary between programs (2 h, 2.5 h, 3 h, and 4 h durations). For the 7-day type model and the REFIT dataset, different peak periods were examined for all wet appliances. Apart from the evening peak used in the previous case studies, an early morning peak (07:00-09:00) was also chosen, as it is relevant for UK datasets [38,39]. Different evening peak slots were also investigated (17:00-19:00, 17:00-20:00, and 16:00-20:00) to reflect the different approaches of DR programs.
Overall, the three appliances studied in the REFIT dataset show different results, as can be seen in Table 5, which in turn provide information about the typical use of each appliance. For example, the washing machines and tumble dryers exhibit better results (i.e., they are used more consistently) in the afternoon (AUC = 0.788 and 0.792, respectively, for the 16:00-20:00 slot), while the dishwashers exhibit better results in the morning (AUC = 0.772 for the 07:00-09:00 slot).

Previous-Day Usage
The model assumes in this case that the probability of using an appliance on any given day depends only on whether it was used on the previous day (Y = 1 in Equation (7)) or not (Y = 0 in Equation (7)). The model was evaluated on the washing machines for the 17:00-19:00 peak period in both datasets. For the REFIT dataset, the best F1 score was 0.316, for six households, with an AUC of 0.725. For the workdays/non-workdays model, the F1 score for a selection of six households was 0.367, while the best F1 score, 0.379, was obtained for a selection of four households; in both of the latter cases, the AUC reached 0.770.
The results in this case (Table 6) are worse than those obtained when the probability of appliance use is calculated for workdays/non-workdays or for individual days.

Multi-Appliance DR Scenario
The selection of a pre-defined number of houses as receivers of a DR alert comes with a risk. For example, for appliances used infrequently in a household, selecting a relatively large number of houses as alert receivers will result in a number of meaningless alerts (to houses with a very low probability of using the appliance of interest in the specific timeframe). This issue can be avoided by dynamically selecting the houses that will receive a DR alert based on a threshold on the probability that the appliance of interest will be used in the desired timeframe. Within this context, in the multi-appliance scenario presented here, alerts are sent based on a probability threshold (0.1, 0.15, 0.2, 0.25, or 0.3), resulting in a different number of receivers for each day. In this scenario, it is assumed that more than one appliance per household participates in the DR implementation. All wet appliances in the REFIT dataset and all wet appliances and water heaters in the HES dataset are considered. The 7-day type model is applied and the 17:00-19:00 peak period is used. The model is tested on its ability to predict the households in the following scenarios:
1. Any one of the selected appliances will be used. Each residence receives a general alert asking it to lower consumption. A result is considered a True Positive (TP) if any of the studied appliances is ON in the respective household during the DR event;
2. A specific appliance among the selected ones will be used. Each residence receives a personalized alert to turn off the appliance with the highest calculated probability during the peak period. A result is a TP only if this appliance is ON during the DR event.
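One plausible way to implement the threshold-based selection is sketched below; the text does not specify which probability is compared with the threshold, so the rule here is an assumption. In Scenario 1, a household is alerted if the probability that at least one studied appliance is used, computed as 1 - Π(1 - p_a) under the independence assumption made in the appliance modeling, reaches the threshold; in Scenario 2, the highest single-appliance probability is compared with the threshold and that appliance is named in the alert. Household and appliance labels are hypothetical.

```python
import math


def select_alerts(probs, threshold, scenario):
    """probs[house][appliance] = predicted probability of use during the DR window.

    Returns {house: appliance to mention in the alert} for the alerted households.
    """
    alerts = {}
    for house, p in probs.items():
        best = max(p, key=p.get)                                    # most likely appliance
        if scenario == 1:
            score = 1.0 - math.prod(1.0 - q for q in p.values())    # P(any appliance used)
        else:
            score = p[best]
        if score >= threshold:
            alerts[house] = best
    return alerts


def true_positives(alerts, on_during_event, scenario):
    """on_during_event[house] = set of appliances that were ON during the DR event."""
    if scenario == 1:   # TP if any studied appliance was ON
        return sum(1 for h in alerts if on_during_event.get(h))
    # Scenario 2: TP only if the recommended appliance was ON
    return sum(1 for h, app in alerts.items() if app in on_during_event.get(h, set()))


probs = {"h1": {"wm": 0.3, "dw": 0.1}, "h2": {"wm": 0.05, "td": 0.08}, "h3": {"dw": 0.22}}
print(select_alerts(probs, 0.2, scenario=1))   # h1 and h3 are alerted
print(select_alerts(probs, 0.25, scenario=2))  # only h1, asked about "wm"
```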
The results in this case (Table 7) are significantly improved compared with the single-appliance scenario. For the REFIT dataset, the F1 score reaches up to 0.513 (for Scenario 1, i.e., any of the studied appliances, with an average of 9.719 alerts sent), while in Scenario 2 (the corresponding single-appliance case; 7-day type model, 17:00-19:00 peak period, REFIT dataset), the F1 score is 0.353 for 10 alerts sent.

Ablation Study
To check whether a simpler model, in which only one of the two components of the ranking probability is retained, could be as accurate as the complete model presented in Section 2, we ran the following ablation study:
• For REFIT, for the time interval 17:00-19:00 and focusing on the washing machine, removing the probability of the appliance being used on a given day type (by setting this probability equal to 1 for all day types) decreased the AUC from 0.77 to 0.72;
• Again for REFIT, replacing the probability distribution of the appliance being ON at a specific time of day with a uniform distribution, rendering it uninformative, decreased the AUC from 0.77 to 0.68;
• The same experiments on HES decreased the AUC from 0.72 to 0.70 in the first case and from 0.72 to 0.67 in the second.
This indicates that both components contribute meaningfully to the forecasting ability of the complete model.

Discussion
Overall, the proposed models perform significantly better than random selection (in all cases, the AUC ranges between 0.651 and 0.788). More importantly, there are valid reasons to believe that the results can be further improved with more data. For example, as mentioned above, the REFIT dataset includes 341 usable days, of which 170 were used for training. For the workdays/non-workdays model, this amounts to about 121 workday samples and 49 non-workday samples used to train the proposed model. For the 7-day type model, however, which differentiates each individual day, this training set corresponds to an average of only about 24 samples per day type.
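The sample counts above follow from splitting the 170 training days across day types. The sketch below reproduces that arithmetic. It assumes the training days are spread evenly over the week (5 workdays and 2 non-workdays per 7 days) and ignores public holidays. Both are simplifying assumptions, which is why the text reports "about" 121, 49, and 24.

```python
TRAIN_DAYS = 170  # REFIT training days, out of 341 usable days (from the text)

def samples_per_class(train_days: int) -> dict[str, float]:
    # ASSUMPTION: days are spread evenly over the week (5 workdays : 2 non-workdays);
    # public holidays are ignored in this sketch.
    return {
        "workday": train_days * 5 / 7,
        "non-workday": train_days * 2 / 7,
        "each individual weekday": train_days / 7,
    }

for name, n in samples_per_class(TRAIN_DAYS).items():
    print(f"{name}: ~{n:.1f} samples")
# workday: ~121.4 samples
# non-workday: ~48.6 samples
# each individual weekday: ~24.3 samples
```

Moving from two day types to seven divides the data available per estimated distribution by roughly 2.5 to 5. This is the data-scarcity argument for expecting the 7-day type model to benefit most from longer measurement campaigns.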
The scenarios for the different time slots studied yield similar results in terms of the effectiveness of the proposed model. This is partly to be expected, as the selected time slots coincide with reported peak periods. Nevertheless, it is interesting to note that the respective results can offer practical information for selecting specific appliances according to the time slot used in a DR implementation. Naturally, this area requires further research to reach more concrete conclusions.
The same holds for the calculation of the appliance use probability based on the operation status of the appliance on the previous day. Compared to random selection, this model performs significantly better, albeit not as well as the workdays/non-workdays and 7-day type models. Still, its results show that there is merit in considering the previous use of an appliance when calculating its use probability. The results may improve significantly if the previous-use consideration is extended to cover more days in the past.
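One way to picture the previous-day model, and the suggested extension to more past days, is as a conditional frequency estimate over a binary daily usage record. The sketch below is a hypothetical illustration rather than the paper's formula. With `k=1` it estimates P(used today | used or not used yesterday). Larger `k` conditions on the usage pattern of the last `k` days, which is the extension proposed above. The function name and the 10-day record are invented for illustration.

```python
from collections import defaultdict

def conditional_use_probs(daily_use: list[int], k: int = 1) -> dict[tuple[int, ...], float]:
    """Estimate P(used today | usage pattern on the previous k days) from a
    binary per-day record (1 = appliance used in the peak window)."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [days observed, days used]
    for t in range(k, len(daily_use)):
        pattern = tuple(daily_use[t - k:t])
        counts[pattern][0] += 1
        counts[pattern][1] += daily_use[t]
    return {pat: used / seen for pat, (seen, used) in counts.items()}

usage = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]  # hypothetical 10-day record
print(conditional_use_probs(usage, k=1))  # {(1,): 0.25, (0,): 0.4}
print(conditional_use_probs(usage, k=2))  # {(1, 0): 0.0, (0, 0): 1.0, (0, 1): 0.5, (1, 1): 0.0}
```

In the toy record, the roughly three-day usage cycle is invisible with one day of history but shows up clearly with two. The number of patterns grows as 2^k, though, and the pattern (1, 1) is seen only once here. Longer histories therefore need more training data, the same constraint discussed above for the 7-day type model.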
Regarding the multi-appliance scenarios, the results show that the respective models can serve as a basis for practical DR implementations, as they combine multiple appliances of interest, a concept for providing personalized alerts, and a method for dynamically selecting the households that receive meaningful DR alerts.

Conclusions
This paper presented a DRAS that is part of a peak-shaving DR program. Using appliance models built from past appliance measurements, the DRAS chooses the households with the highest probability of using their DR appliances. Three alternatives were examined for modeling appliance use, along with different types of appliances and peak periods. Specifically, the appliance use probability was calculated by differentiating between workdays and non-workdays, as well as among the individual days of the week. The third modeling alternative related the appliance use probability to its previous use (i.e., whether the appliance was used on the previous day). Two main scenarios were examined: a single-appliance scenario and a multi-appliance scenario. The multi-appliance scenario performed particularly well for both datasets. These results are encouraging for the creation of personalized approaches to DR services that could make DR peak-shaving programs more attractive and increase participation in them. Specifically, in terms of the AUC, the scenarios studied with the proposed models yield values between 0.651 and 0.788, a clear improvement over the baseline scenario, whose values range between 0.48 and 0.52. Moreover, in terms of the F1 score, the proposed models improve on the baseline results by up to 221.05%.
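The percentage improvements are read here as relative improvements over the random-selection baseline. This is an assumed definition, and the specific model/baseline pairs behind each percentage are not restated in this section:

$$
\text{Improvement}\ (\%) = \frac{M_{\text{model}} - M_{\text{baseline}}}{M_{\text{baseline}}} \times 100,
$$

where $M$ denotes the evaluation metric (AUC or F1 score). Under this reading, an F1 improvement of 221.05% means that the model's F1 score is about 3.21 times the baseline's ($1 + 2.2105 = 3.2105$).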
The work presented here opens up several pathways for future research. For example, more research is needed on additional and different DR appliances to identify the best combination of meaningful DR alerts for each household. Moreover, integrating end users' compliance into the model could greatly improve its real-life performance. A future implementation should also address end users' privacy concerns [40]. A potential lack of trust could lead to a high dropout rate from DR programs and should be taken into account in real-life implementations.