Centralized and Decentralized Optimal Control of Variable Speed Heat Pumps

Utility service providers are often challenged with the synchronization of thermostatically controlled loads. Load synchronization, as a result of naturally occurring and demand-response events, has the potential to damage power distribution equipment. Because thermostatically controlled loads constitute most of the power consumed by the grid at any given time, the proper control of such devices can lead to significant energy savings and improved grid stability. The contribution of this paper is the development of an optimal control algorithm for commonly used variable speed heat pumps. By means of selective peer-to-peer communication, our control architecture allows for the regulation of home temperatures while simultaneously minimizing aggregate power consumption, and aggregate load volatility. An optimal centralized controller is also explored and compared against its decentralized counterpart.


Introduction
The goal of a 100% renewable electric supply system presents significant challenges to the organizations responsible for maintaining the reliability and resilience of electric grids. Although grid-level battery storage is often touted as the solution to integrating high penetrations of variable generating resources [1,2] (e.g., solar PV and wind generators), there is a growing body of research pointing to the potential for flexibility and control of demand to play a significant role in grid operations going forward. In this regard, thermostatically controlled loads (TCLs) are some of the most sensible areas to explore because they make up a significant portion of demand on any given electric grid and there is a great deal of flexibility in when they actually draw power.
To realize the true potential of TCLs as a resource to aid in grid operations, it is widely understood that numerous loads must be aggregated and controlled in a coordinated fashion. Many researchers have proposed methods of centralized control of aggregated TCL's and the problem of synchronization often arises [3]. The most common example of synchronization occurs during load-shedding demand-response events. Many utilities have programs which homeowners agree to allow the grid operators to occasionally turn off their air conditioning compressor in return for a rebate or reduction in rates. When the grid operator anticipates that they will have a difficult time meeting load requests, then many compressors can be shut off, thus shedding that load. The problem arises when the demand-response event is released. At this point, a large portion of the compressors will cycle "on" because they have no doubt risen in temperature during the demand-response event. This often results in an immediate peak that is higher than the peak they were attempting to avoid. Subsequently, a period of oscillations occurs, during which time the aggregate load experiences large oscillations.
This paper explores innovative approaches to controlling an aggregation of TCLs by applying a combination of optimum control theory and localized communication between

Background
In the United States, including many other modernized countries, the majority of generated electricity is therefore consumed by thermostatically controlled loads (TCLs). These TCLs, defined by their ability to store thermal energy (and are controlled as such), loosely resemble that of a leaky battery. For instance, water-heaters, a type of TCL, heats inlet water in an insulated storage tank to within a prescribed temperature range where it awaits its use. Including water-heaters, other prominent TCLs, defined by their high Energies 2021, 14, 4012 3 of 18 energy consumption, are HVAC systems and refrigerator units. It should be noted these TCL devices, including many others, operate on a similar premise. Typically, TCLs are governed by simplistic toggle condition, like that of an on/off controller, also referred to as a Bang-Bang control. Because thermal energy is stored within a medium, there exists a level flexibility as to when such energy is consumed/replenished. Through the proper control of such devices, as will be demonstrated in this work, certain beneficial aggregate characteristics may be achieved for instance the reduction in aggregate power consumption and its associated volatility. In the following sections a mathematical model for a particular TCL, that being a variable speed heat pump (VSHP) is developed. This mathematical model provides a causal relationship between the control action of a population of VSHPs and the aggregate effects of power consumption. Although this research focuses on a particular TCL, its application may be expanded other similarly governed devices.

Dynamics
This paper uses the well-established second-order equivalent thermal parameter (EPT) model to describe the temperature dynamics of a residential home. By means of the thermal circuit shown in Figure 1, two coupled first order differential Equations are formed. and, where T A and T M denote the interior air and lumped mass temperatures of a residential home. Adjacent to the residential control volume is the surrounding outside air temperature T o . With regard to control theory, both the outside temperature T o and the internal heat generating elements, denoted Q M , represent disturbances to this thermal system. Rejection of such disturbances is accomplished via the home's HVAC system, denoted Q A . In particular, Q A represents heat being removed from the interior air. The elegance of this ETP model is that it is fully defined by four measurable parameters, that being the thermal capacities C A , C M and the conductive properties R 1 , R 2 . As will be further discussed in Section 2.4, a parameter estimation model will be presented, which provides systematic method to estimate these thermal parameters. Through the manipulation of Equations (1) and (2) a second-order differential equation is formed in terms of T A and its derivativesṪ A andT A , Within Equation (3), the terms Q A andQ A are replaced with ηm and ηṁ respectively. The negative constant term η < 0 represents the heat removal capacity of the homes HVAC system and m, a controllable parameter, scales η according to the governing control algorithm. Q M , similar to that of Q A , represents alternative heat sources/sinks. In a typical residential setting, Q M includes, but is not limited to, heat generated by home appliances, solar radiation, and the dissipation of heat between the home's lumped mass and ground. Based on the minimal effects of Q M , especially with regard to Q A := ηm, its inclusion will be neglected in this study.  Equation (3) is further abbreviated by replacing each constant term with elements of a parameter vector θ ∈ R 5 .
For the sake of clarity, a VSHP regulates the indoor air temperature, T A , by adjusting its cooling capacity, m, between being fully on or completely off. This range of values may be mathematically defined as the continuous set m ∈ [0, 1]. A goal of this research is the development of a controller which provides certain beneficial characteristics to an electric utility service provider. These characteristics are the minimization of aggregate power consumption, defined as and the instantaneous rate of change of aggregate power consumption,Ṗ agg . Along with the aggregate power design constraints, this controller must also maintain indoor air temperatures at, or near, its associated set-point temperature T sp . To satisfy the combination of these three control requirements, a novel approach is taken. Instead of directly controlling the cooling capacity of a VSHP by means of its control variable m, this study opts to additionally control its derivativeṁ. To distinguishṁ as a controllable parameter,  (4), incorporating this novel control approach, is defined as follows, where x = T AṪA and, Through the application of backward Euler method, Equation (6) is therefore rewritten in a discrete format as, where ∆t and k denote the step-length and simulation time-step respectively.

Decentralized Network Communication
Much of the proposed literature in smart grid systems, particularly TCLs, are constructed using a centralized framework. In such a framework, a utility service provider determines the control action of a population of TCLs, typically by means of tracking an aggregate reference signal. This methodology, including other centralized controllers, have shown great performance benefits. However, there are several glaring drawbacks to a centralized framework. Some of the more prominent challenges include its scalability, vulnerability to cyber-attack, and inherent lack of consumer privacy. To address these challenges, a decentralized framework is introduced in this paper. A decentralized framework relies on the autonomy of participating agent/device to calculate their own control action. As will be demonstrated, the performance of a decentralized controller is further improved through the localized communication between neighboring TCLs.
To address the privacy concerns of participating end-users, communication between VSHPs is limited to what might be ascertained if one were to open their window and listen to when a neighbor's compressor is cycled off/on. For this reason, each VSHP can only communicate with four other neighboring VSHPs. As a result of this communication, a VSHP's control action m is partially influenced by the previous control actions of its neighbor set N p , where N p is the set of all VSHPs connected to the p th agent. The entire population set is similarly defined as N = {1, · · · , N}, which is the set all homes in the given population, N p ⊆ N .
Conventionally, communication is modeled via the elements of an adjacency matrix, Based on the communication constraints proposed above, a random regular graph with a connection degree N cd = 4, such as the one shown in Figure 2, is used to simulate a population of VSHPs.

Demand Response
In addition to the implementation of structured communication, this study also observes the effects of a demand-response event on the proposed controller(s) defined Section 3. A demand-response event, also referred to as a conservation event, is a circumstance where power consumption is systematically controlled by the electric utility to maintain both power generating equipment and the means of distribution within safe operating conditions. Many types of demand-response events exist ranging from emergency protocol to adaptive price control and other related ancillary services. Correspondingly, this study implements the following demand-response event in an effort to test the governing controller with strenuous and unpredictable conditions. Over the time-span ∆t DR , centered about the warmest outside temperature peak T o,max , all VSHPs are prevented from conditioning their associated home. This demand-response event, aside from long-term power outages, represents a worst-case scenario regarding aggregate power consumption. During this demand-response event indoor air temperatures will no doubt rise. Upon reinstatement of power, all VSHPs will independently begin cooling their respective homes resulting in a characteristically large spikes in aggregate power consumption.
To prime the readers of the challenges associated with this demand-response event, a population of VSHP's are simulated, whose control action, m, is governed by a typical proportional-integral-derivative (PID) controller. This PID controller represents what would likely be used to regulate a population of independently operated VSHP devices. With the system diagram of Figure 3, in conjunction with the controller defined by Equation (8), a sample population of VSHPs are simulated. The results of this simulation are graphically presented as the subplots of Figure 4. As previously mentioned, a spike in aggregate power consumption is observed. This spike in aggregate power consumption is further reduced at the expense of a larger temperature deviation from the customers set-point preference. The gains of the PID controller in Equation (8) are judicially selected to balance each home's indoor temperature and the aggregate power of the population. This means that we performed an empirical optimization of the PID gains such that the temperature and power response of the system is the best that we could achieve. This study aims to find a more optimal solution which minimizes spikes in aggregate power consumption while simultaneously maintaining home temperatures at or near their associated temperature set-point, T sp .
As depicted in Figure 3, a control action u(t), is determined based on the closed loop error signal e(t) prior to being constrained via the saturation limits of the maximum cooling capacity, m ∈ [0, 1].

Parameter Estimation
With regard to a physical setting, the thermal parameters C A , C M , R 1 , and R 2 , which define the ETP model, are likely to be measured with some statistical error. Given the proposed ETP model is capable of accurately predicting the dynamics of a residential home, these measurement errors will, by deduction, result in suboptimal controller performance. To account for these initial measurement errors, including other errors that might arise, a recursive least squares (RLS) algorithm is employed to systematically update the thermal parameter vector θ ∈ R 5 . Using similar notation presented in [22], an RLS algorithm is constructed by first redefining Equation (4) as, where the control input, y k , and regressor, ϕ k , terms are respectively defined as, Whether physically measured, or in the case of this simulation, generated about a known statistical distribution, all elements within the parameter vector θ 0 must be known prior to running the RLS algorithm. Before the RLS algorithm can predict thermal parameters with sufficient accuracy, both control input and regressor terms are to be collected over the set of initial time-steps k ∈ {1, · · · , k s } to form the following control input and regressor matrices, The last time-step, k s ∈ N, is chosen such that Φ k s Φ k s is non-singular. Given the starting value θ 0 and the adequately populated control input and regressor matrices, which are used to form the inverse covariance matrix, P k s = (Φ k s Φ k s ) −1 , the RLS algorithm has the following sequence of events, Within Algorithm 1, λ ∈ R denotes the exponential forgetting factor. Unlike the sustained plant dynamics of this study, the thermal properties of a residential home are expected to change over time. The inclusion of this forgetting factor allows for long-term adaptation to the present dynamics. It should be noted, as λ → 1 the RLS algorithm with exponential forgetting becomes the vanilla RLS algorithm. Like many other feedback systems, rejection of system level noise is an important step to achieve stable and robust performance. A low-pass filter is applied, via software, to the newly estimated thermal parameters therefore damping system level noise.
Algorithm 1 RLS with Exponential Forgetting.

Controller
The culmination of this research is the development of a decentralized controller, and to a lesser extent, centralized controller for a population of VSHPs. In brief, two frameworks are presented in Sections 3.1 and 3.2, which optimally schedule the control actions of a population of VSHPs by means of model predictive control (MPC). As shown in Figure 5, an MPC solves for the control action(s) m i ∀i ∈ H such that the model dynamics are satisfied and the objective penalty is minimized. Next, the first control action m i 1 :=k is therefore sent to the VSHP device. Assuming the system's response is observable, the measured/predicted states are returned to the controller serving as the initial conditions for the following iteration. This sequence is then repeated until a termination condition is met.
Two indices are used in Sections 3.1 and 3.2 to denote time, that being k and i k . Of the two indices, k denotes the current time-step of the simulation, while i k represents the step(s) of the controllers predicted horizon which symbolically begins at the k th index.

Decentralized Controller (Dc)
In this section, a decentralized controller is developed, whose control action, m, is partially influenced by control actions of its neighbor set N p . The controller depicted in Figure 5 is mathematically defined as the following quadratic optimization program, This optimization program minimizes the objective function, J(·), with respect to the state, x, and control, m and σ, decision variables. Through the manipulation of these decision variables, a horizon of control actions are calculated such that the model dynamics, initial conditions, and saturation limits are upheld. To ensure continuity between time-steps, an equality constraint is placed between the initial state values, x i k =1 , and the measured/predicted response of the plant, X o . Likewise, m is constrained between zero and one, therefore preserving its associated VSHP devices within safe operating conditions. Lastly, the model dynamics of Equation (7), is expressed as a set of linear equality constraints, where the function f CD (·) is defined as, Energies 2021, 14, 4012 10 of 18 As shown in Equation (13), the objective function, J(·), is the summation of each time-step's cumulative objective penalty, DC (·), defined as, Each penalty term of Equation (15) is dedicated to a certain attribute of the desired response. Based the preferences of an electric utility service provider and its end-users alike, this controller must maintain indoor air temperatures while simultaneously minimizing the aggregate power consumption signal and its associated volatility.
The first objective penalty term, provides a means for optimizer to group aggregate effects via peer-to-peer (P2P) communication. This is done so by minimizing the difference in control action, m, between the p th agent and its neighbor set N p . In an effort to reduce aggregate power consumption, defined by Equation (5), individual power consumption is minimized with the penalty term −βηm i k . In addition, a home's indoor air temperature, T A , is maintained at or near its set-point temperature, T sp , via the soft constraint γT i k . As depicted in Figure 6, the double hinge function,T i k , is mathematically expressed as, If the indoor air temperature, T A , deviates above δ + or below δ − , a proportional cost will be accrued. Similarly, the minimization of the penalty term, ζ(Ṫ i k A ) 2 , reduces the rate at which temperature changes in the resulting simulation. Lastly, to ensure the ramp-rate of aggregate power consumption is maintained, the penalty term, τ(σ i k ) 2 , is minimized. This term results in a smoothing effect in aggregate power consumption.  Each of the five objective penalty terms are accompanied by a real numbered objective constant, that being α, β, γ, ζ, and τ. Much like knobs on a dial, these objective constants scale their respective objective penalty term. The qualitative performance of the controller is determined by the relative scale of each objective parameter regarding one another. By deduction, said performance may be tuned in accordance to the attributes described above.

Centralized Controller (Cc)
Much like the decentralized controller defined above, the preceding centralized controller uses a similarly structured optimization program to determine the control actions of a population of VSHPs. However, unlike the decentralized controller, this centralized controller has omniscient knowledge of the states and control actions of its population. Furthermore, during simulation, this optimization program simultaneously solves for the control actions for an entire population set. In this formulation, the notion of network communication is less meaningful as all information is distributed to and from the electric utility service provider, much like a star graph. This quadratic optimization program has the form, min X, M, oe are defined as the collection of all state and control decision variables at the i th k time-step. As may become apparent, the number of decision variables is now directly proportional to the size of the population being simulated. Due to this increased number of decision variables, the associated complexity of the controller so to rises.
The dynamic model is simulated with a set of equality constraints between the incremented state values and the function f CC (·). Similar to f DC (·) of Equation (14), f CC (·) is now an expressed function of the p th home, for brevity, its redefinition is omitted. Furthermore, the objective penalty function, CC (·), which is summed over each time-step i k , has the form, The last four objective penalty terms of Equation (18), are similar to that of Equation (15). However, the penalty term, is used to minimize the difference in aggregate control effort between adjacent time-step. Observably, this penalty term smooths aggregate power consumption among VSHP devices. Because Equations (16) and (19) focus on the grouping of control actions, they both are scaled the objective constant α. The other objective constants β, γ, ζ, and τ scale their associated objective penalty term.

Case Study
HVAC units are typically sold in half ton increments, where one ton of cooling is defined as the amount of heat required to freeze/melt 2000 pounds of water in a 24-h period. Prior to installing a VSHP, the discrete tonnage is chosen according to thermodynamic properties of the space required to condition. Similar to how a residential HVAC system is chosen, in this study a VSHP's cooling capacity, η ≤ 0, is determined based on the time required to cool that home's indoor air temperature, T A , from the upper dead-band, δ + , to the lower dead-band, δ − . To assure population heterogeneity, the thermal parameters C A , C M , R 1 , and R 2 are generated about a known statistical distribution. This Gaussian distribution is defined by the mean and standard deviation values listed in Table A1. All other simulation parameters, including objective constants, are therefore listed in Table A2.
As previously stated, T o and its derivativeṪ o represent a disturbance to the thermal system. The proposed controllers, shown in Section 3, use a receding horizon approach to determine control action, m, for a corresponding VSHP. This receding horizon controller necessarily requires that T o andṪ o be known prior to simulation. Based on modern metrological forecasting techniques, it is reasonable to assume this outside temperature data are known over the current finite simulation horizon H. Conversely, in a simulated environment, like the one presented in this paper, outside temperature data are queued from the National Solar Radiation Database (NSRDB) in the form of a Typical Meteorological Year (TMY) [23]. This TMY dataset, among other qualitative properties, provides hourly ambient outside temperature data. As the name suggests, this dataset represents the most usual weather conditions for a given region and is well suited for the application of weather prediction.
The simulation results generated via Algorithms 2 and 3 are compared using several quantitative metrics. These performance metrics provide a key insight into the loadshedding capability of each control framework. The first two metric, denoted P i and P f , describe the peak power drawn before and after the simulated demand response event. Each term is expressed as a ratio between the peak power demand and the total consumable power within the system. For clarity, the total consumable power is the amount of power demanded, assuming all VSHPs operate at full capacity, i.e., m k p = 1 ∀k, p ∈ K, N . The next performance metric is a measurement of the total energy associated with the aggregate power consumption signals observed in Figures 7b and 8b. In the context of this paper the energy consumed is defined as, To ensure this consumed energy is intrinsically represented, it too is expressed, in Table 1, as a ratio by dividing it with the total consumable energy within the system. As a point of reference, the standard PID controller, shown in Figure 4, is also compared using the three performance metrics described above.       Based on the simulation results of Figures 7 and 8, in addition the quantitative metrics listed in Table 1, both control frameworks are observed to similar performance attributes. Unlike the PID controller defined in Section 2.3, both optimal control frameworks reduce the load synchronization effects caused by the demand-response event. Furthermore, Figures 7b and 8b show a smooth gradual increase in aggregate power consumption. This reduction in P f is partially attributed to the scheduling of control actions accomplished via the minimization the objective penalty terms (16) and (19). For the other quantitative metrics, P i and E T , little change is observed between control frameworks.
A key difference between the centralized and decentralized controllers defined in Section 3, is the computational complexity associated with each framework. For the centralized controller, the number of decision variables within the optimization program of Equation (17) is directly proportional to the number of homes being simulated, N. As the number of simulated homes increases, so too increases the time required to compute the population of control actions, assuming all auxiliary features remain constant. An advantageous feature of the decentralized framework is that the computational complexity required to solve a control action for a VSHP remains constant. This assertion is exemplified by the simulation runtimes listed in Table 2. The proposed decentralized controller necessarily requires that each VSHP compute its own control action. By way of Algorithm 1, elements of the thermal parameter vector, θ ∈ R 5 , are systematically updated to improve the accuracy and resilience of the predictive model. These incremental updates provide an ETP model the wherewithal to accurately mimic the dynamics of its respective plant. To show convergence between the model and plant dynamics an error signal is generated for each element of the thermal parameter vector. This error signal is defined as the difference between the plant and model parameter values, i.e., θ p − θ m . Each error signal is graphically depicted by the subplots of Figures 9 and 10. Due to its initialization process, these thermal parameters show a tendency to deviate from the plant values during early time-steps of the simulation. We also observe that before the demandresponse event, these thermal parameters tend not to converge upon their respective plant value. This failure to converge is partially attributed to the lack of persistence of excitation. After the demand-response event, parameters then converge upon their desired plant value. In a real setting, the true thermal parameters of the proposed ETP model, which defines the plant dynamics, are likely uncharacterizable by a linear approximation. Moreover, it is reasonable to assume the thermal characteristics of a home are capable of change by way of renovation or degradation. For these reasons, two update condition are employed to help determine periods of stable prediction. The first update condition is κ(P k ) ≤ c 1 , where κ(·) is the condition number of P k , as defined in Algorithm 1, and c 1 is an empirically determined constant. In Figures 9 and 10, it may be observed that the system parameters change rapidly during the initial stages of the system response. This may be attributed to the condition number κ(P k ) being relatively large. The second indicator used to predict convergence is the rate at which each thermal parameter varies. Thermal parameters are therefore updated when the condition,θ k ≤ c 2 , holds. similar to the value of c 1 , the value of c 2 is empirically determined. The combination of these two conditions attempts to predict when the thermal parameters are capable of being updated. Each thermal parameter must be initialized within an approximate region of the true parameter characteristics.

Remark 1.
It is a standard result of adaptive control theory that when the inputs to a control system do not excite sufficient modes of the system, parameter estimation does not converge to their correct values. For a detailed exposure, the reader is invited to see Chapter 2.4 (page 63) of [22].

Conclusions
In this paper, we develop a decentralized control framework to optimally schedule the control actions of a population of VSHPs. Albeit contradictory, this control framework balances end-user temperature requirements with an electric utility service provider's desire to supply smooth predictable power. We show through minimal network communication that our decentralized control framework performs on par with a similarly structured centralized controller with omniscient knowledge of the state and control actions of its population. Based on the quantitative metrics presented in Section 4, we also show this decentralized control framework alleviates the load aggregations experienced by more traditional PID controllers. In an effort to improve the accuracy of our controller, we implement a recursive least squares algorithm to adaptively update the parameters defining the second-order ETP model. Although failing to converge under certain simulated conditions, this RLS algorithm proved useful in the convergence of parameters when stimulated with a demand response event. Over longer simulated time periods, by means of persistence of excitation, we conclude that thermal parameters will eventually approach a desired value such that the dynamic model mimics the response of the plant. Further experimental validation is needed to show the efficacy of our simulation. Such experiments will be the focus of future research. Due to the approvals needed by local utilities, regulators, and participants alike, significant planning is required to perform the necessary experiments. This research serves as an important precursor to future experimental studies.

Conflicts of Interest:
The authors declare no conflict of interest.