# Deep Reinforcement Learning for Autonomous Water Heater Control


## Abstract


## 1. Introduction

#### 1.1. Literature Review

#### 1.2. Research Gaps

#### 1.3. Study Contributions

## 2. Background

#### 2.1. Reinforcement Learning

#### 2.2. Heat Pump Water Heaters

#### 2.3. The CTA-2045 Standard

## 3. Methodology

#### 3.1. Simulator Development

#### 3.2. Reinforcement Learning Problem Formulation

#### 3.3. Reinforcement Learning Training

**Algorithm 1.** DQN training.

```
Initialize replay memory with capacity N
Initialize the policy network with random weights θ
Initialize the counter: ξ = 0
for episode = 1, …, M do
    Get the initial state s_t
    Copy the weights of the policy network to the target network every K episodes
    for t = 1, 1+dt, …, T do
        Scale state values to [0, 1]
        Update ε: ε = ε_end + (ε_start − ε_end) · exp(−ξ/ε_decay)
        With probability ε, select a random action a_t;
        otherwise, select action a_t using the policy network
        Execute action a_t in the water heater model
        Observe reward r_t and next state s_{t+1}
        Store the transition (s_t, a_t, r_t, s_{t+1}) in replay memory
        Move to the next state: s_t := s_{t+1}
        Sample a batch of transitions (s, a, r, s′) from replay memory
        Calculate the state–action values Q(s, a; θ) using the policy network
        Calculate the expected state–action values:
            y = r                                 if s′ is terminal
            y = r + γ · max_{a′} Q(s′, a′; θ̂)     otherwise
        Calculate the loss between Q(s, a; θ) and y using Equation (17)
        Perform a gradient descent step and update the weights θ
        Update the counter: ξ := ξ + 1
    end
end
```
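The supporting pieces of Algorithm 1, the ε-greedy schedule, the replay memory, and the expected state–action values $y$, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; all names (including the `target_q` callable) are hypothetical, and the constants are the hyperparameter values reported for training.

```python
import math
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

# Reported hyperparameters (see the DQN hyperparameter table)
EPS_START, EPS_END, EPS_DECAY = 0.5, 0.03, 140_000
GAMMA = 0.99

def epsilon(xi: int) -> float:
    """Exploration rate after xi control steps (Algorithm 1)."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-xi / EPS_DECAY)

class ReplayMemory:
    """Fixed-capacity transition buffer; the oldest entries are evicted first."""
    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def td_targets(batch, target_q, gamma=GAMMA):
    """Expected state–action values y: r for terminal transitions,
    r + γ · max_a' Q(s', a'; θ̂) otherwise. target_q(s') stands in for the
    target network and returns one Q-value per action."""
    return [t.reward if t.done
            else t.reward + gamma * max(target_q(t.next_state))
            for t in batch]
```

The policy-network forward pass, loss, and gradient step (Equation (17)) would be layered on top of these pieces in a deep learning framework.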

#### 3.4. Rule-Based, MPC-Based, and Optimization-Based Controllers

**Algorithm 2.** Rule-based controller.

```
for t = 1, 1+dt, …, T do
    if mean(λ_{t:t+dt}) < mean(λ_{1:T}) and sum(V_{t:t+dt}) > 0 then
        send Load up
    else if mean(λ_{t:t+dt}) < mean(λ_{1:T}) and sum(V_{t:t+dt}) = 0 then
        send Normal
    else
        send Shed
end
```
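Algorithm 2 translates directly into a short Python function. The function and command labels below are illustrative, and the comparisons (a below-average price combined with an imminent draw triggers Load up, a below-average price with no draw triggers Normal) are an assumption consistent with the controller's goal of preheating cheaply ahead of demand:

```python
import statistics

def rule_based_command(prices, draws, t, dt):
    """Pick the CTA-2045 command for step t per Algorithm 2.
    prices: per-minute electricity prices λ over the horizon 1..T
    draws:  per-minute hot water usage volumes V
    Returns "load_up", "normal", or "shed" (illustrative labels)."""
    price_now = statistics.mean(prices[t:t + dt])
    price_avg = statistics.mean(prices)
    upcoming_draw = sum(draws[t:t + dt])
    if price_now < price_avg and upcoming_draw > 0:
        return "load_up"   # cheap power and imminent demand: preheat
    elif price_now < price_avg and upcoming_draw == 0:
        return "normal"
    else:
        return "shed"      # expensive power: let the tank coast
```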

#### 3.5. Performance Testing

## 4. Experiments and Results

#### 4.1. Water Heater Experiments and Simulator Results

#### 4.2. Price and Hot Water Usage Profiles

#### 4.3. Training Results

#### 4.4. Testing Results

The RL agents achieved higher average COP_{HP} values and lower element usage than the baseline. For example, RL agent #2 used the elements for 106 min and had an average COP_{HP} of 5.39, whereas the baseline used the elements for 121 min and had an average COP_{HP} of 4.79. RL agent #1, however, was not able to reduce element usage: having no information about future hot water draws, it could not preheat before large draws. Figure 8 shows the operation of RL agent #1 on a sample day. As shown in the figure, the agent sent normal and load up commands only when the price was low and thus avoided higher-price periods; for example, it heated the water by sending load up commands before and after the price peak around 15:00 and consequently avoided that peak. In addition, the agent sent shed commands most of the time so as to operate at lower temperatures and higher COP_{HP} values. The baseline, unlike RL agent #1, did no heating before or after peak-price periods and therefore could not avoid the peak price at 17:00.

The rule-based controller achieved the lowest COP_{HP} values: it simply operated the water heater at high temperatures to minimize element use, at the expense of heat pump efficiency. The RL agents outperformed their corresponding MPC controllers for all look-ahead periods (30 min, 1 h, and 2 h). For example, the RL agent with 2 h of future hot water usage and electricity price information (RL agent #4) achieved a cost of USD 0.88, while MPC #4 achieved USD 0.94 using the same information. Figure 9 and Figure 10 show the operations of RL agent #4 and MPC #4, respectively. Overall, the RL agents maintained a good balance between element usage and COP_{HP}. The MPC-based controllers achieved the highest COP_{HP} values, which, however, increased their element use. For example, RL agent #4 had an average COP_{HP} of 5.47, while MPC #4, its equivalent, had an average COP_{HP} of 5.61; yet RL agent #4 used the elements for only 54 min while MPC #4 used them for 84 min. As a result, despite very close performance, RL agent #4 cost 6¢ less than MPC #4 over the five test days due to lower element usage. The cost of RL agent #4 was also very close to that of the optimization-based controller. These results show that RL can reduce electricity cost without any prior knowledge of the water heater and its power usage, and can adapt to unseen price signals and water usage profiles.

## 5. Conclusions and Limitations

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Nomenclature

| Abbreviations | |
| --- | --- |
| BEMS | Building Energy Management System |
| COP | Coefficient of Performance |
| DHW | Domestic Hot Water |
| DNN | Deep Neural Network |
| DOE | Department of Energy |
| DP | Dynamic Programming |
| DQN | Deep Q-Network |
| DR | Demand Response |
| FQI | Fitted Q-Iteration |
| GA | Genetic Algorithm |
| GPM | Gallons Per Minute |
| HPWH | Heat Pump Water Heater |
| HVAC | Heating, Ventilation, and Air Conditioning |
| MDP | Markov Decision Process |
| MPC | Model Predictive Control |
| RL | Reinforcement Learning |
| RMSE | Root Mean Squared Error |
| RTP | Real-Time Pricing |
| TOU | Time-of-Use |

| Symbols | |
| --- | --- |
| $\dot{Q}$ | Heat added |
| $\dot{m}$ | Mass flow rate |
| $\Delta T$ | Temperature change |
| $m$ | Water mass |
| $s1$ | Heat pump cycle |
| $s2$ | Lower element cycle |
| $s3$ | Upper element cycle |
| $V$ | Hot water usage volume |
| $C$ | Heat fraction |
| $N$ | Node |
| $T$ | Temperature |
| $UA$ | Standby heat loss coefficient |
| $\lambda$ | Electricity price |

| Subscripts | |
| --- | --- |
| amb | Ambient |
| E | Element |
| HP | Heat pump |
| set | Setpoint |
| w | Water |

## References


| RL Agent # | State Variables |
| --- | --- |
| 1 | ${T}_{w}{(1)}_{t},\dots,{T}_{w}{(6)}_{t}$, $s{1}_{t}$, $s{2}_{t}$, $s{3}_{t}$, $\mathrm{mean}({\lambda}_{t:t+15}), \mathrm{mean}({\lambda}_{t+15:t+2\times 15})$ |
| 2 | ${T}_{w}{(1)}_{t},\dots,{T}_{w}{(6)}_{t}$, $s{1}_{t}$, $s{2}_{t}$, $s{3}_{t}$, $\mathrm{mean}({\lambda}_{t:t+15}), \mathrm{mean}({\lambda}_{t+15:t+2\times 15})$, $\mathrm{mean}({V}_{t:t+15}), \mathrm{mean}({V}_{t+15:t+2\times 15})$ |
| 3 | ${T}_{w}{(1)}_{t},\dots,{T}_{w}{(6)}_{t}$, $s{1}_{t}$, $s{2}_{t}$, $s{3}_{t}$, $\mathrm{mean}({\lambda}_{t:t+15}),\dots,\mathrm{mean}({\lambda}_{t+5\times 15:t+6\times 15})$, $\mathrm{mean}({V}_{t:t+15}),\dots,\mathrm{mean}({V}_{t+5\times 15:t+6\times 15})$ |
| 4 | ${T}_{w}{(1)}_{t},\dots,{T}_{w}{(6)}_{t}$, $s{1}_{t}$, $s{2}_{t}$, $s{3}_{t}$, $\mathrm{mean}({\lambda}_{t:t+15}),\dots,\mathrm{mean}({\lambda}_{t+7\times 15:t+8\times 15})$, $\mathrm{mean}({V}_{t:t+15}),\dots,\mathrm{mean}({V}_{t+7\times 15:t+8\times 15})$ |
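For illustration, the state vector of RL agent #2 (six node temperatures, three cycle states, and two 15-min means each of price and draw volume, 13 values in total) can be assembled as below. The function names and the min–max bounds are hypothetical; the scaling to [0, 1] is the step named in Algorithm 1:

```python
def minmax_scale(x, lo, hi):
    """Scale a raw value into [0, 1] given assumed bounds (Algorithm 1)."""
    return (x - lo) / (hi - lo)

def build_state_agent2(tank_temps, s1, s2, s3, prices, draws, t):
    """State for RL agent #2: T_w(1..6), s1..s3, then 15-min means of the
    electricity price and the hot water draw volume for the next two
    intervals. prices/draws are per-minute series; t is the current minute."""
    mean = lambda xs: sum(xs) / len(xs)
    return (list(tank_temps) + [s1, s2, s3]
            + [mean(prices[t:t + 15]), mean(prices[t + 15:t + 30])]
            + [mean(draws[t:t + 15]), mean(draws[t + 15:t + 30])])
```

Agents #3 and #4 extend the same pattern to six and eight 15-min intervals of price and draw forecasts, respectively.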

| Instrument Type | Instrument Name | Error Range |
| --- | --- | --- |
| Thermistor | Omega 44031 10 kOhm Precision | ±0.01 °C |
| Ambient temperature sensor | Campbell Scientific HC2S3 | ±0.01 °C |
| Power meter | WattNode Pulse | ±0.5% |
| Flow meter | Omega FTB4607 | ±2.0% |

| Parameter | Explanation | Value | Unit |
| --- | --- | --- | --- |
| ${c}_{p}$ | Specific heat of water | 4.184 | $\mathrm{J}/(\mathrm{g}\cdot\mathrm{K})$ |
| $m(N)$ | Water mass in a node | 41.7 | kg |
| ${T}_{amb}$ | Ambient temperature | 21.5 | °C |
| ${T}_{inlet}$ | Inlet water temperature | 23.9 | °C |
| ${T}_{set}$ | Temperature setpoint | 51 | °C |
| ${P}_{HP}$ | Heat pump rated power | 400 | W |
| ${P}_{E}$ | Element rated power | 4500 | W |
| COP_{HP} | COP of heat pump | $-0.004\times {T}_{w}{(5)}^{2}+0.19\times {T}_{w}(5)+3.56$ | N/A |
| COP_{E} | COP of element | 0.99 | N/A |
| $UA(1)\dots UA(6)$ | Standby heat loss coefficients | 0.04, 0.03, 0.03, 0.03, 0.03, 0.06 | $\mathrm{kJ}/(\mathrm{min}\cdot\mathrm{K})$ |
| $\beta$ | Water tank wall coefficient | 1.12 | N/A |
| $s$ | Perceived distance | 3.7 | N/A |
| $p$ | Concavity parameter | 1.06 | N/A |
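The quadratic COP_{HP} model from the table can be evaluated directly; the sign of its linear term is assumed positive here, since a negative sign would yield physically implausible (negative) COP values over the operating range. The curve peaks near 24 °C and declines at higher tank temperatures, which is why the agents favor shed commands:

```python
def cop_hp(t_w5: float) -> float:
    """Heat pump COP as a function of the node-5 water temperature (°C):
    COP_HP = -0.004 * T^2 + 0.19 * T + 3.56 (linear-term sign assumed +)."""
    return -0.004 * t_w5**2 + 0.19 * t_w5 + 3.56
```

At the 51 °C setpoint this model gives a COP of about 2.8, versus about 5.7 at 30 °C, consistent with the agents' preference for operating at lower tank temperatures.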

| Parameter | Value |
| --- | --- |
| Batch size | 32 |
| Discount rate ($\gamma$) | 0.99 |
| ${\epsilon}_{start}$ | 0.5 |
| ${\epsilon}_{end}$ | 0.03 |
| ${\epsilon}_{decay}$ | 140,000 |
| Target network update frequency ($K$) | Every episode |
| Number of episodes ($M$) | 125 |
| Episode length ($T$) | 61 days |
| Replay memory capacity ($N$) | 25,000 |
| DNN structure | [L, 512, 512, 3] ^{1} |
| DNN optimizer | RMSprop |
| Optimizer learning rate | 0.0001 |
| DNN loss function | Mean squared error (MSE) |
| Control time step ($dt$) | 15 min |

^{1} L is the number of state variables.
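The [L, 512, 512, 3] structure implies a fairly small Q-network with three output heads, one per CTA-2045 command. A quick parameter count (helper name illustrative) makes the scale concrete:

```python
def mlp_param_count(widths):
    """Trainable parameters (weights + biases) of a fully connected network
    with the given layer widths, e.g. [L, 512, 512, 3] from the table."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))
```

For L = 13 (the 13 state variables of RL agent #2) this gives 271,363 parameters, small enough to train comfortably on commodity hardware.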

| Operation Strategy | Look Ahead | Electricity Cost | Element Usage | Average COP_{HP} |
| --- | --- | --- | --- | --- |
| Baseline | N/A | USD 6.04 | 121 min | 4.79 |
| RL agent #1 | 30 min * | USD 5.03 | 226 min | 5.30 |
| RL agent #2 | 30 min | USD 4.37 | 106 min | 5.39 |
| RL agent #3 | 1 h | USD 3.55 | 73 min | 5.47 |
| RL agent #4 | 2 h | USD 3.44 | 71 min | 5.58 |

| Operation Strategy | Look Ahead | Electricity Cost | Element Usage | Average COP_{HP} |
| --- | --- | --- | --- | --- |
| Baseline | N/A | USD 1.36 | 75 min | 4.88 |
| Rule-based | N/A | USD 1.10 | 58 min | 4.63 |
| RL agent #2 | 30 min | USD 1.10 | 68 min | 5.35 |
| RL agent #3 | 1 h | USD 0.89 | 58 min | 5.27 |
| RL agent #4 | 2 h | USD 0.88 | 54 min | 5.47 |
| MPC #2 | 30 min | USD 1.31 | 157 min | 5.76 |
| MPC #3 | 1 h | USD 1.04 | 104 min | 5.70 |
| MPC #4 | 2 h | USD 0.94 | 84 min | 5.61 |
| Optimization | 5 days | USD 0.81 | 47 min | 5.48 |



## Share and Cite

**MDPI and ACS Style**

Amasyali, K.; Munk, J.; Kurte, K.; Kuruganti, T.; Zandi, H.
Deep Reinforcement Learning for Autonomous Water Heater Control. *Buildings* **2021**, *11*, 548.
https://doi.org/10.3390/buildings11110548
