Article

Stochastic Optimal Control Problem and Sensitivity Analysis for a Residential Heating System

by Maalvladédon Ganet Somé 1,2,* and Japhet Niyobuhungiro 1,3

1 Department of Mathematics, School of Science, College of Science and Technology, University of Rwanda, Kigali P.O. Box 4285, Rwanda
2 African Institute for Mathematical Sciences, Ghana, Legon, Accra, 1st Shoppers Street, Spintex, Accra P.O. Box LGDTD 20046, Ghana
3 National Council for Science and Technology, Kigali P.O. Box 2285, Rwanda
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(3), 489; https://doi.org/10.3390/math14030489
Submission received: 22 November 2025 / Revised: 17 December 2025 / Accepted: 19 December 2025 / Published: 30 January 2026

Abstract

We consider a network of a residential heating system (RHS) composed of two types of agents: a prosumer and a consumer. Both are connected to a community heating system (CHS), which supplies non-intermittent thermal energy for space heating and domestic hot water. The prosumer utilizes a combination of solar thermal collectors and CHS heat, whereas the consumer depends entirely on the CHS. Any excess heat generated by the prosumer can either be stored on-site or fed back into the CHS. Weather conditions, modeled as a common noise term, affect both agents simultaneously. The prosumer’s objective is to minimize the expected discounted total cost, taking into account storage charging and discharging losses as well as uncertainties in future heat production and demand. This leads to a stochastic optimal control problem addressed through dynamic programming techniques. Scenario-based analyses are then performed to examine how different parameters influence both the value function and the resulting optimal control strategies. For a common noise coefficient $\sigma_0 = 0.4$, the prosumer incurs an approximate 16.08% increase in the aggregated discounted cost relative to the case of no common noise. For a discharging efficiency $\eta_E = 1/0.9$, the maximum aggregated discounted cost increases by approximately 1.85% compared with perfect discharging efficiency. Similarly, for a charging efficiency $\eta_E = 0.9$, we observe an approximate 1.94% increase in the aggregated discounted cost compared with perfect charging efficiency. Furthermore, we derive insights into the maximum expected discounted investment that a consumer would need to make in renewable technologies in order to transition into a prosumer.

1. Introduction

In recent decades, global weather patterns have shifted markedly, largely due to human-driven factors such as extensive fossil fuel use and deforestation, both of which have significantly increased the global carbon footprint. Projections from the United Nations Department of Economic and Social Affairs indicate that, by 2050, urbanization will reach 68%, with more than 40 metropolitan areas exceeding 10 million inhabitants [1]. As noted in [1], cities currently account for over two-thirds of global CO$_2$ emissions and energy consumption. These combined pressures (accelerating urbanization, climate change, and rising energy demand) underscore the urgent need to transition away from a business-as-usual economic model toward one that prioritizes sustainability and energy efficiency across all sectors. According to the International Energy Agency (IEA), buildings contribute roughly 30% of global final energy use and 26% of energy-related emissions. Consistent with [2], in the United States and many European countries, 18–30% of total energy consumption is associated with the thermal energy needs of buildings. In this study, we consider a residential heating system (RHS) composed of two agents, a prosumer and a pure consumer (hereafter simply “consumer”), as illustrated in Figure 1. In this setting, a prosumer is an entity capable of producing and consuming thermal energy, whereas a consumer only uses energy supplied to it. The RHS provides heat for space heating and domestic hot water demands. The prosumer is equipped with a solar thermal collector for local heat generation, an internal storage (IS), and an external storage (ES), both modeled as water-based storage tanks for simplicity [2]. The IS serves short-term storage needs, while the ES accommodates longer-term storage. Because solar production and residual thermal demand are highly intermittent and weather-dependent, additional support is required to ensure a stable and sustainable thermal supply.
To address this, both buildings are connected to a community heating system (CHS). In this work, we consider a low-temperature bidirectional CHS (LTB-CHS) [1,3,4,5] and assume, for simplicity, that the temperature in the distribution pipes remains constant. The LTB-CHS is a relatively recent concept in district energy networks, offering benefits such as improved integration of renewable heat sources, simplified coupling with distributed technologies, and reduced carbon emissions. Its bidirectional capability allows buildings to inject surplus heat into the network, lowering peak central production requirements and enabling economic incentives whereby building owners can sell excess heat. Since the supply temperature in an LTB-CHS is lower than that in conventional systems, a heat pump (HP) is required to upgrade the delivered heat to the desired level.
In our modeling approach, daily temperature variations influence both the prosumer and consumer and are represented by a common noise process. Following [2,6,7], we define residual demand as the imbalance between thermal supply and demand. We assume that the prosumer can meet this residual demand either via the CHS or by using the ES, whereas the consumer relies solely on the CHS. When production falls short, the prosumer either purchases heat from the CHS or releases it from the ES. Conversely, during periods of surplus production, the prosumer may store the excess in the ES (subject to capacity limits) or sell it to the CHS. The consumer, lacking local production and storage, always faces unmet demand that must be satisfied exclusively through CHS purchases. Consistent with [2], we assume that the CHS is responsible for fulfilling the residual thermal demands of both agents, even though this demand fluctuates with the common noise. Moreover, energy transfer between the IS and ES, as well as energy exports from the prosumer to the CHS, requires an ordinary pump (OP). The prosumer aims to minimize the expected total discounted cost associated with meeting its thermal demand, while accounting for uncertain weather conditions and variability in solar output.
Including both agents in our framework enables us to determine the maximum expected investment that would make it economically reasonable for a consumer to adopt distributed renewable heating technologies and storage infrastructure, thereby transitioning into a prosumer.
Previous work, such as [2], has formulated an RHS model and solved a stochastic optimal control problem using semi-Lagrangian and dynamic programming approaches. In Ref.  [7], the authors addressed a similar problem for a stand-alone system with geothermal storage. However, these studies did not explicitly model the dependence of residual demand on daily temperature variations via a common noise process, nor did they include ES charging and discharging inefficiencies. Our model fills this gap by incorporating these features, which increase realism. Models with common noise are advantageous because they reflect uncertainties that affect all agents simultaneously, potentially altering both qualitative and quantitative system behavior. Applications of common noise models in the energy sector include [6,8,9,10,11,12,13].
Unlike in [2,7], we incorporate ES charging and discharging efficiencies, which account for storage losses when heat is stored or released. For simplicity, we assume that these efficiency factors depend solely on the residual demand, as in [14].
Our contributions are as follows. First, we model a consumer connected to the CHS alongside the prosumer. Second, we introduce a common noise in the dynamics of the residual demands, which captures the impact of daily temperature variations on the residual demand. Third, we include charging and discharging efficiencies in the dynamics of the ES temperature level, since losses are observed in practice whenever the ES is charged or discharged. Fourth, we perform a sensitivity analysis with respect to the common noise, charging, and discharging parameters to evaluate their impact on the expected total discounted cost. We also show that the value function is a viscosity solution of the associated partial differential equation (PDE). We derive the discrete-time state-dependent control set from the state constraint without imposing charging and discharging thresholds; in this work, the state-dependent control intervals depend on the charging and discharging efficiencies. We compare the costs of satisfying residual demands for the consumer and the prosumer, and we estimate the expected investment that a consumer would need to undertake to transition into a prosumer. Finally, we perform extensive numerical experiments and provide economic interpretations.
The remainder of this paper is structured as follows: Section 2 introduces the model, including state variables, control variables, constraints, and the heat price specification. Section 3 presents the optimal control problems for both agents. Section 4 and Section 5 describe the semi-Lagrangian discretization and numerical results, including sensitivity analysis. Section 6 concludes and outlines directions for future research.

2. Model Formulation

In what follows, we study a model that includes both a prosumer and a consumer connected to a CHS. The prosumer is able to generate thermal energy locally while also drawing from, or supplying heat to, the CHS. In contrast, the consumer has neither local production capabilities nor on-site storage units. We present a mathematical framework and corresponding optimization problems for both agents. Throughout this paper, we work on a fixed probability space $(\Omega, \mathcal{F}, \mathbb{P})$ endowed with a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ that satisfies the usual conditions and supports two independent standard Brownian motions $W^R$ and $W^0$. The parameter $T > 0$ denotes the finite planning horizon.
An RHS is designed to supply thermal energy for both space heating and domestic hot water in a building. The prosumer is equipped with a solar collector, two storage units, a heat pump, and ordinary pumps. Thermal energy generated by the solar collector is first directed to the internal storage (IS) to meet immediate demand, while the external storage (ES) serves as a long-term buffer to handle periods of insufficient or excess production. We describe the imbalance between thermal supply and demand, resulting from stochastic fluctuations in solar collector output, by a stochastic process $R = (R(t))_{t \in [0,T]}$. This process is decomposed as $R(t) = \mu^R(t) + Z^R(t)$, where $\mu^R(t)$ denotes the seasonal average and $Z^R(t)$ represents random deviations from this mean.
The IS is modeled as a cylindrical tank whose water temperature must remain within prescribed bounds, $i_{\min}$ and $i_{\max}$. In this study, we treat the IS as a black box and do not model its internal thermodynamic behavior explicitly. The ES is also represented as a cylindrical water tank with a fixed mass of stored water, which can be heated up to a maximum temperature $\overline{q}$ through transfers of thermal energy from the IS. Conversely, it can be cooled to a minimum temperature $\underline{q}$ when heat is withdrawn to satisfy the residual demand. For simplicity, we assume that charging and discharging the ES involves only ordinary pumps.
At any time $t \in [0,T]$, the prosumer’s state variables consist of the exogenous, deseasonalized residual demand $Z^R$ and the endogenous ES temperature level $Q$. We therefore define the prosumer’s state process as $X = (Z^R, Q)$.
For the consumer, the state is given solely by the deseasonalized residual demand $\tilde{Z}$, which is entirely exogenous.
Residual Demand: The deseasonalized residual demand processes $Z^R$ and $\tilde{Z}$, associated with the prosumer and the consumer, respectively, evolve according to
$$
\begin{aligned}
dZ^R(t) &= -\kappa_R Z^R(t)\,dt + \sigma_R(t)\,dW^R(t) + \sigma_0\,dW^0(t),\\
d\tilde{Z}(t) &= -\kappa_0 \tilde{Z}(t)\,dt + \sigma_0\,dW^0(t),
\end{aligned}
\tag{1}
$$
where $W^R$ represents the idiosyncratic noise capturing uncertainties specific to the prosumer’s heat demand, and $W^0$ is the common noise accounting for shared weather-driven fluctuations.
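To illustrate how the common noise couples the two agents, the following minimal sketch simulates both residual demand processes with an Euler–Maruyama step. It assumes mean-reverting dynamics with a constant $\sigma_R$ (the model allows a time-dependent $\sigma_R(t)$); the function name, argument names, and the choice of discretization are illustrative assumptions, not taken from the paper.

```python
import math
import random


def simulate_residual_demands(kappa_r, kappa_0, sigma_r, sigma_0,
                              z_r0, z_c0, horizon, n_steps, seed=0):
    """Euler-Maruyama paths of (Z^R, Z~) sharing the common noise W^0.

    Assumption: sigma_r is constant over time (hypothetical simplification).
    Returns two lists of length n_steps + 1.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    sqrt_dt = math.sqrt(dt)
    z_r, z_c = [z_r0], [z_c0]
    for _ in range(n_steps):
        dw_r = rng.gauss(0.0, 1.0) * sqrt_dt  # idiosyncratic increment (prosumer only)
        dw_0 = rng.gauss(0.0, 1.0) * sqrt_dt  # common increment (both agents)
        z_r.append(z_r[-1] - kappa_r * z_r[-1] * dt
                   + sigma_r * dw_r + sigma_0 * dw_0)
        z_c.append(z_c[-1] - kappa_0 * z_c[-1] * dt + sigma_0 * dw_0)
    return z_r, z_c
```

With `sigma_r = 0` and identical mean-reversion speeds and initial values, the two paths coincide, which is a quick way to confirm that the same increment of $W^0$ drives both agents.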
The seasonal behavior of the residual demands is modeled by bounded and deterministic seasonality functions $\mu^R, \tilde{\mu} : [0,T] \to \mathbb{R}$ for the prosumer and the consumer, respectively. Typical choices of $\hat{\mu} \in \{\mu^R, \tilde{\mu}\}$ include
$$
\hat{\mu}(t) = c_0 + c\,t + \sum_{j=1}^{m_1} \left[ c_j \cos\!\left(\frac{2\pi}{\rho_j}(t - t_j)\right) + d_j \sin\!\left(\frac{2\pi}{\rho_j}(t - t_j)\right) \right],
$$
where $c_0$ is the long-term mean residual demand; $c_j$, $d_j$ represent the amplitudes of seasonality component $j$ of the cosine and sine terms, respectively; $\rho_j$, $t_j$ represent the length and a reference time of seasonality component $j$, respectively; $c$ is the coefficient of the linear term; and $m_1$ is the number of seasonality components.
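The seasonality function above translates directly into code. The sketch below is a plain transcription under the assumption that each component is passed as a tuple `(c_j, d_j, rho_j, t_j)`; the function and argument names are hypothetical.

```python
import math


def seasonality(t, c0, c_lin, components):
    """Evaluate mu_hat(t) = c0 + c_lin * t + sum of cosine/sine components.

    components: iterable of (c_j, d_j, rho_j, t_j), where rho_j is the period
    and t_j the reference time of component j (illustrative data layout).
    """
    value = c0 + c_lin * t
    for c_j, d_j, rho_j, t_j in components:
        phase = 2.0 * math.pi * (t - t_j) / rho_j
        value += c_j * math.cos(phase) + d_j * math.sin(phase)
    return value
```

For example, a single daily component with period 24 h would be passed as `[(c_1, d_1, 24.0, t_1)]`.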
Since the prosumer is equipped with a local production unit, which can lead to overproduction, the fact that $R(t)$ can take negative values is consistent with our modeling perspective. Similarly, we model $\tilde{R}(t) = \tilde{\mu}(t) + \tilde{Z}(t)$ and choose the parameters of $\tilde{\mu}(t)$ and $\tilde{Z}(t)$ such that there exists $\nu \ll 1$ with $\mathbb{P}(\tilde{R}(t) < 0) < \nu$. Below, $R > 0$ is referred to as unsatisfied demand and $R < 0$ as overproduction.
The control $\alpha \in [0,1]$ denotes the proportion of residual demand satisfied by the CHS. As in Ref. [2], we assume the decomposition $R(t) = \alpha(t) R(t) + (1 - \alpha(t)) R(t)$, where $\alpha(t) R(t)$ is the residual demand satisfied by the CHS, and $(1 - \alpha(t)) R(t)$ is the residual demand satisfied by the ES. Satisfying the residual demand via the CHS means either buying (for $R > 0$) or selling (for $R < 0$). Likewise, satisfying the residual demand via the ES means either discharging (for $R > 0$) or charging (for $R < 0$).
External Storage: In the following, we consider as storage a water tank where the water mass and specific heat capacity are given by $m_Q$ [kg] and $c_P$ [kWh/(kg K)], respectively. The surface area of the tank is denoted as $A$ [m$^2$]. From the heat fluxes and the Newton cooling law with the heat transfer coefficient $\gamma$ [kW/(m$^2$ K)], the change in the thermal energy of the water inside the ES is given by
$$
dQ(t) = -\frac{1}{m_Q c_P} \Big( \eta_E(t)\,(1 - \alpha(t))\,R(t) + A\gamma\,(Q(t) - \underline{q}) \Big)\,dt, \qquad Q(0) = Q_0.
\tag{4}
$$
The first term $\eta_E(t)(1 - \alpha(t))R(t)$ represents the ES’s charging or discharging rate. Indeed, for an unsatisfied demand ($R(t) > 0$) and a non-empty ES ($Q(t) > \underline{q}$), the prosumer can discharge the storage, and this term is positive whenever $\alpha(t) < 1$. For an overproduction ($R(t) < 0$) and a non-full ES ($Q(t) < \overline{q}$), the prosumer can charge the storage, and this term is negative whenever $\alpha(t) < 1$. The second term $A\gamma(Q(t) - \underline{q})$ is the rate of heat transfer with the environment. $\eta_E$ represents the charging or discharging efficiency of the ES. Letting $\eta_E^C, \eta_E^D \in (0,1)$, we model $\eta_E$ for simplicity as in [14] by
$$
\eta_E(t) := \eta_E(R(t)) =
\begin{cases}
\eta_E^C, & \text{if } R(t) < 0,\\
1/\eta_E^D, & \text{if } R(t) \ge 0.
\end{cases}
$$
Remark 1. 
  • A large $\gamma$ implies large losses per unit of time, prohibiting long-term storage. Similarly, a small $\gamma$ implies small losses per unit of time, allowing for long-term storage.
  • In the case $\alpha = 0$, the prosumer does not satisfy its residual demand via the CHS. In the event of an unsatisfied demand ($R(t) > 0$), $\alpha = 1$ implies that the prosumer satisfies all of the residual demand via the CHS. However, for an overproduction ($R(t) < 0$), $\alpha = 1$ implies that the prosumer sells all of the surplus to the CHS.
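The ES dynamics and the efficiency switch can be checked numerically with a short sketch. It assumes the sign convention in which discharging and heat losses both lower the storage temperature; all function and argument names are hypothetical.

```python
def efficiency(residual, eta_c, eta_d):
    """eta_E as a function of the residual demand R: charging efficiency
    for overproduction (R < 0), inverse discharging efficiency otherwise."""
    return eta_c if residual < 0 else 1.0 / eta_d


def storage_drift(q, residual, alpha, eta_c, eta_d,
                  m_q, c_p, area, gamma, q_min):
    """Right-hand side dQ/dt of the ES temperature equation.

    alpha is the share of the residual demand handled by the CHS, so the
    share (1 - alpha) is charged to or discharged from the ES.
    """
    eta = efficiency(residual, eta_c, eta_d)
    es_rate = eta * (1.0 - alpha) * residual   # > 0 discharging, < 0 charging
    loss_rate = area * gamma * (q - q_min)     # heat exchange with the environment
    return -(es_rate + loss_rate) / (m_q * c_p)
```

With `alpha = 1` and an empty storage (`q == q_min`), the drift vanishes, which matches the remark that all residual demand is then handled by the CHS.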
State and Control Constraints: In order to reflect the physical limitations of the external storage, we require that $Q(t) \in [\underline{q}, \overline{q}]$, where $\underline{q}$ and $\overline{q}$ are the minimum and maximum storage levels, respectively. $Q(t) = \underline{q}$ is referred to as an empty storage, while $Q(t) = \overline{q}$ is a full storage.
The operational limitations experienced by the prosumer give rise to state-dependent control constraints:
1. If $Q(t) = \underline{q}$, only charging is feasible. Therefore,
  • If $R(t) > 0$, all residual demand is satisfied by the CHS, implying $\alpha(t) = 1$.
  • If $R(t) < 0$, there are no control constraints other than the requirement $\alpha(t) \in [0,1]$.
2. If $Q(t) = \overline{q}$, we examine the following cases:
  • If $R(t) > 0$, there are no control constraints other than the requirement $\alpha(t) \in [0,1]$.
  • If $R(t) < 0$, (4) requires that $\eta_E (1 - \alpha(t)) R(t) + A\gamma (Q(t) - \underline{q}) \ge 0$, which implies that $\alpha(t) \ge 1 + \dfrac{A\gamma(\overline{q} - \underline{q})}{R(t)\,\eta_E^C} =: \hat{\chi}(t, R(t))$. In the following, we define $\chi(t, R(t)) = \hat{\chi}(t, R(t)) \vee 0$ and choose $\alpha(t) = \chi(t, R(t)) = \chi(t)$.
Hence, the state-dependent set of feasible controls, $U(t, z^R, q)$, is defined by
$$
U(t, z^R, q) =
\begin{cases}
[0,1], & q > \underline{q},\ z^R \ge -\mu^R(t),\\
\{1\}, & q = \underline{q},\ z^R \ge -\mu^R(t),\\
[0,1], & q < \overline{q},\ z^R < -\mu^R(t),\\
[\chi(t), 1], & q = \overline{q},\ z^R < -\mu^R(t).
\end{cases}
$$
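The case distinction for the feasible control set can be sketched as a small function returning the lower and upper bounds of the interval. The residual demand $R = \mu^R(t) + z^R$ is passed directly, so $z^R \ge -\mu^R(t)$ corresponds to `residual >= 0`; the function name and signature are illustrative assumptions.

```python
def feasible_controls(q, residual, q_min, q_max, eta_c, area, gamma):
    """Return (lower, upper) bounds of the feasible control interval.

    residual is R = mu^R(t) + z^R; q is the current ES temperature level.
    """
    if residual >= 0:
        # Empty storage: nothing to discharge, so everything comes from the CHS.
        return (1.0, 1.0) if q <= q_min else (0.0, 1.0)
    if q >= q_max:
        # Full storage and overproduction: the ES level must not increase.
        chi_hat = 1.0 + area * gamma * (q_max - q_min) / (residual * eta_c)
        return (max(chi_hat, 0.0), 1.0)
    return (0.0, 1.0)
```

Because `residual < 0` in the last case, `chi_hat` is always below 1, and the `max` with 0 keeps the lower bound inside $[0,1]$.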
Heat Price Formulation: We introduce time-varying heat buying and selling prices, denoted by $P_{buy}(t)$ and $P_{sell}(t)$, respectively, for all $t \in [0,T]$. We assume a contractual agreement between the prosumer and the CHS, binding the latter to always satisfy the residual demand of the former. To incentivize the CHS, it retains the right to determine the values of $P_{buy}(t)$ and $P_{sell}(t)$. We model $P_{buy}$ as a deterministic function composed of a constant baseline and multiple seasonal fluctuations. $P_{sell}$ is a fixed markdown from the buying price through the spread parameter $\xi > 0$. Therefore, we obtain
$$
P_{buy}(t) = \ell_0 + \sum_{j=1}^{m_2} \ell_j \cos\!\left(\frac{2\pi}{\rho_j^S}\,(t - t_j^S)\right), \qquad P_{sell}(t) = P_{buy}(t) - \xi.
$$
The constant term $\ell_0$ [kWh] corresponds to the long-term average heat price, while each cosine term introduces a seasonal component with amplitude $\ell_j$ [kWh], period $\rho_j^S$ [h], and phase shift $t_j^S$ [h]; $m_2$ represents the number of seasonalities. The sum in the heat price model means that it is not driven by a simple seasonal pattern but by the combined effects of several independent and interpretable cycles, each corresponding to a different physical, operational, or economic rhythm of the heating system. The deterministic seasonal structure makes prices transparent and predictable for the prosumer, while the positive spread guarantees economic viability for the CHS. The bid–ask spread reflects transaction costs, operational constraints, or the market power of the CHS. It ensures that the CHS is compensated for providing balancing and backup services.
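As a quick illustration of the price model, the following sketch evaluates the buying and selling prices. Each seasonal component is assumed to be given as a tuple `(ell_j, rho_j, t_j)`; the names are hypothetical and no parameter values from the paper are implied.

```python
import math


def buy_price(t, base, components):
    """P_buy(t): constant baseline plus cosine seasonalities.

    components: iterable of (amplitude, period, phase_shift).
    """
    return base + sum(
        amp * math.cos(2.0 * math.pi * (t - shift) / period)
        for amp, period, shift in components
    )


def sell_price(t, base, components, spread):
    """P_sell(t): buying price marked down by the positive spread xi."""
    return buy_price(t, base, components) - spread
```

By construction, `buy_price(t, ...) - sell_price(t, ...)` equals the spread at every time, so selling back to the CHS never pays more than buying from it.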
Assumption 1. 
Let $P_c$ and $i_d$ denote the heat pump inlet and outlet temperatures, respectively. Similar to [2], we assume that the condition $\underline{q} > i_d > i_{\min} > P_c$ holds.
Assumption 1 implies that no heat pump is necessary when satisfying the residual demand via the ES. This justifies the use of ordinary pumps in our model.

3. Problem Formulation

In this section, we formulate both the prosumer’s and consumer’s problems. The prosumer’s objective is to satisfy its heating and hot water demand while minimizing the cost of running the pumps and payments to the CHS.

3.1. Prosumer’s Problem

In the following, we present a mathematical framework to determine both the value and the optimal control strategy for the prosumer’s problem. This leads to the formulation of a stochastic optimal control problem for which the state process is defined as $X = (Z^R, Q) \in \mathcal{X} = \mathbb{R} \times [\underline{q}, \overline{q}]$, representing the relevant system variables. From the dynamics described in the first equation in (1) and in (4), the state process evolves as follows:
$$
\begin{aligned}
dZ^R(t) &= f(Z^R(t))\,dt + \sigma_R\,dW^R(t) + \sigma_0\,dW^0(t), & Z^R(0) &= Z_0^R \in \mathbb{R},\\
dQ(t) &= h(Z^R(t), Q(t), \alpha(t))\,dt, & Q(0) &= Q_0 \in [\underline{q}, \overline{q}],
\end{aligned}
\tag{7}
$$
where $f(z^R) = -\kappa_R z^R$ and $h(z^R, q, \upsilon) = -\dfrac{1}{m_Q c_P} \Big( \eta_E(t)\,(1 - \upsilon)\,(\mu^R(t) + z^R) + A\gamma\,(q - \underline{q}) \Big)$.
Assumption 2. 
Let $(z^R, q) \in \mathbb{R} \times [\underline{q}, \overline{q}]$ and $q_1, q_2 \in [\underline{q}, \overline{q}]$.
1. The function $h(z^R, q, \upsilon)$ is $\mathcal{F}$-measurable.
2. The function $h(z^R, q, \upsilon)$ is continuous for almost all $z^R$.
3. For almost all $z^R$, there exists $C \in \mathbb{R}$ such that $|h(z^R, q_1, \upsilon) - h(z^R, q_2, \upsilon)| \le C\,|q_1 - q_2|$.
As in [15], this assumption ensures that the random ordinary differential equation (RODE) in (7) is well defined.

Cost Formulation

The costs incurred for operating the prosumer’s system are divided into a running cost and a terminal cost. Together, they define the total cost that needs to be minimized.
Running Cost: This consists of the cost (revenue) of satisfying the residual demand via the CHS, along with the cost of electricity consumption for operating the pumps when interacting with both the CHS and the ES. At $t \in [0,T]$, the prosumer can either purchase or sell thermal energy from or to the CHS at prices $P_{buy}(t)$ and $P_{sell}(t)$, respectively, to satisfy its residual demand. Let $\varphi_0$ denote the cost (revenue) per unit of time of buying (selling) thermal energy from (to) the CHS, defined by
$$
\varphi_0(t, z^R, \upsilon) =
\begin{cases}
\upsilon\,(\mu^R(t) + z^R)\,P_{buy}(t), & z^R \ge -\mu^R(t),\\
\upsilon\,(\mu^R(t) + z^R)\,P_{sell}(t), & z^R < -\mu^R(t).
\end{cases}
$$
Since we assume that the prosumer is connected to an LTB-CHS, we incur an additional cost for the electricity consumption due to the use of a heat pump when satisfying the residual demand via the CHS. Let $b_1$ and $b_2$ [K$^{-1}$] denote positive constants penalizing a high water flow rate and a high temperature difference between $P_c$ and $i_d$, respectively. Letting $S$ denote the electricity price, the cost of electricity consumption $\varphi_1$ incurred when interacting with the CHS is modeled by
$$
\varphi_1(t, z^R, \upsilon) =
\begin{cases}
\upsilon\,(\mu^R(t) + z^R)\,\big(b_1 + b_2 (i_d - P_c)\big)\,S, & z^R \ge -\mu^R(t),\\
-\upsilon\,(\mu^R(t) + z^R)\,b_1\,S, & z^R < -\mu^R(t).
\end{cases}
$$
For $z^R \ge -\mu^R(t)$, the first case of $\varphi_1$ is the cost per unit of time of the electricity consumed by the heat pump to raise the temperature from $P_c$ to $i_d$. On the other hand, for $z^R < -\mu^R(t)$, the second case of $\varphi_1$ is the cost per unit of time of the electricity consumed by the ordinary pump when selling thermal energy to the CHS.
The cost of satisfying the residual demand via the ES comes from the cost of electricity consumption to operate the ordinary pumps and is given by
$$
\varphi_2(t, z^R, \upsilon) = (1 - \upsilon)\,|\mu^R(t) + z^R|\,b_1\,S.
$$
Therefore, the running cost $\Gamma(t, z^R, \upsilon)$ is given by $\Gamma(t, z^R, \upsilon) = \varphi_0(t, z^R, \upsilon) + \varphi_1(t, z^R, \upsilon) + \varphi_2(t, z^R, \upsilon)$.
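The three components of the running cost can be combined in a few lines. The sketch below takes the residual demand $R = \mu^R(t) + z^R$ and the current prices as plain numbers; it assumes that the pumping cost in the overproduction case is counted as a positive cost (that is, proportional to $-R$), and all names are hypothetical.

```python
def running_cost(residual, alpha, p_buy, p_sell, elec_price,
                 b1, b2, i_d, p_c):
    """Running cost Gamma = phi_0 + phi_1 + phi_2 for one time instant."""
    if residual >= 0:
        # Buying from the CHS and running the heat pump from P_c up to i_d.
        chs_energy = alpha * residual * p_buy
        chs_pump = alpha * residual * (b1 + b2 * (i_d - p_c)) * elec_price
    else:
        # Selling to the CHS: negative energy cost (revenue), ordinary pump only.
        chs_energy = alpha * residual * p_sell
        chs_pump = -alpha * residual * b1 * elec_price
    # Ordinary pump for charging or discharging the ES.
    es_pump = (1.0 - alpha) * abs(residual) * b1 * elec_price
    return chs_energy + chs_pump + es_pump
```

For instance, with `alpha = 0` the cost reduces to the ES pumping term, while with `alpha = 1` and overproduction the result is negative whenever the sales revenue exceeds the pumping cost.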
Terminal Cost: In the context of finite-time horizon models with terminal time $T > 0$, it is customary to incorporate a terminal cost. In our context, this depends on the ES storage level at time $T$ and is denoted by $\Phi(Q(T))$. This terminal cost can reflect a contractual agreement for the ES temperature to be at an agreed level at $T$. Let $P_{liq}$, $P_{pen}$ denote the liquidation price and penalty price, respectively. Indeed, at terminal time, the prosumer can sell the leftover thermal energy at $P_{liq}$ per unit of thermal energy. Similarly, the prosumer can be penalized for failing to keep the ES at a certain level at $T$, where $P_{pen}$ is the penalty price per unit of thermal energy. The liquidation price $P_{liq}$ and penalty price $P_{pen}$ satisfy the conditions $P_{liq} < P_{sell}(T)$ and $P_{pen} > P_{buy}(T)$, respectively. In the following, we let $q_{ref}$ denote a pre-agreed temperature level in the ES, such that
$$
\Phi(q) =
\begin{cases}
\dfrac{P_{pen}\,m_Q c_P\,(q_{ref} - q)}{\eta_E^C}, & q < q_{ref},\\[2mm]
-\eta_E^D\,P_{liq}\,m_Q c_P\,(q - q_{ref}), & q \ge q_{ref}.
\end{cases}
$$
In this model, the terminal cost $\Phi(q)$ now depends on the charging and discharging efficiency term $\eta_E(R(t))$.
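A direct transcription of the terminal cost is sketched below. It assumes that the shortfall branch is divided by the charging efficiency and that the surplus branch enters as a negative cost (revenue) scaled by the discharging efficiency; function and argument names are hypothetical.

```python
def terminal_cost(q, q_ref, p_pen, p_liq, m_q, c_p, eta_c, eta_d):
    """Terminal cost Phi(q) for an ES temperature level q at time T."""
    if q < q_ref:
        # Shortfall: penalty on the energy needed to reach q_ref,
        # inflated by charging losses.
        return p_pen * m_q * c_p * (q_ref - q) / eta_c
    # Surplus: liquidation revenue, reduced by discharging losses.
    return -eta_d * p_liq * m_q * c_p * (q - q_ref)
```

The function is continuous at `q == q_ref`, where both branches give zero.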
The performance criterion $J_P : [0,T] \times \mathcal{X} \times \mathcal{U} \to \mathbb{R}$ denotes the expected aggregated total discounted cost over the time interval $[0,T]$ and is defined as follows:
$$
J_P(t, x, \alpha) = \mathbb{E}_{t,x}\!\left[ \int_t^T e^{-\delta(s - t)}\,\Gamma(s, Z^R(s), \alpha(s))\,ds + e^{-\delta(T - t)}\,\Phi(Q(T)) \right],
\tag{9}
$$
where $x = (z^R, q)$, $\delta \ge 0$ is a discount rate and $\mathbb{E}_{t,x}[\,\cdot\,]$ is the conditional expectation given that at initial time $t$, $X(t) = x$.
Assumption 3. 
For any admissible control $\alpha$, the functions $\Gamma$ and $\Phi$ satisfy the condition
$$
\mathbb{E}\!\left[ \int_t^T |\Gamma(s, Z^R(s), \alpha(s))|\,ds + |\Phi(Q(T))| \right] < \infty.
$$
Given that we want to use dynamic programming techniques, we restrict ourselves to Markov controls defined by $\alpha(t) = \theta(t, X(t))$ for all $t \in [0,T]$, with a measurable function $\theta : [0,T] \times \mathcal{X} \to \mathcal{U}$, which is called a decision rule and is adapted to the filtration $\mathbb{F}$. We denote by $\mathcal{A}$ the class of admissible controls, defined as follows:
$$
\mathcal{A} = \Big\{ (\alpha(t))_{t \in [0,T]} \;\Big|\; \alpha \text{ is } \mathbb{F}\text{-progressively measurable},\ \alpha(t) = \theta(t, X(t))\ \ \forall\, t \in [0,T],\ \theta(t,x) \in U(t,x), \text{ and Equations (7) and (9) are well defined} \Big\}.
$$
The prosumer seeks to minimize $J_P$ over all admissible controls. The value of the prosumer’s problem for all $(t,x) \in [0,T] \times \mathcal{X}$ is given by
$$
V(t,x) = \inf_{\alpha \in \mathcal{A}} J_P(t, x, \alpha).
\tag{12}
$$

3.2. Consumer’s Problem

We discuss the mathematical framework to determine the value of the consumer’s problem. We recall that the consumer is not equipped with local production or storage units and is always subject to a positive residual demand. As a result, it always satisfies all residual demand via the CHS. The state process of the consumer is given by
$$
d\tilde{Z}(t) = f_2(\tilde{Z}(t))\,dt + \sigma_0\,dW^0(t), \qquad \tilde{Z}(0) = \tilde{Z}_0 \in \mathbb{R},
$$
where $f_2(\tilde{z}) = -\kappa_0 \tilde{z}$.
Similar to the prosumer’s case, the consumer’s running cost consists of the cost of purchasing thermal energy from the CHS and the cost of operating the heat pump to raise the temperature from $P_c$ to $\tilde{i}_d$. Let $\tilde{\varphi}_0$ and $\tilde{\varphi}_1$ denote the cost per unit of time of purchasing thermal energy from the CHS and that of electricity consumption to run the heat pump, respectively, which are defined as follows:
$$
\tilde{\varphi}_0(t, \tilde{z}) = \tilde{R}(t)\,P_{buy}(t),
$$
$$
\tilde{\varphi}_1(t, \tilde{z}) = \tilde{R}(t)\,\big(b_1 + b_2 (\tilde{i}_d - P_c)\big)\,S,
$$
where $\tilde{R}(t) = \tilde{\mu}(t) + \tilde{Z}(t)$. Therefore, $\tilde{\Gamma}(t, \tilde{z}) = \tilde{\varphi}_0(t, \tilde{z}) + \tilde{\varphi}_1(t, \tilde{z})$.
Since the consumer is not equipped with a local storage unit, there is no terminal cost. The performance criterion $J_C : [0,T] \times \mathbb{R} \to \mathbb{R}_+$ denotes the expected aggregated total discounted cost over the time interval $[0,T]$ and is defined as follows:
$$
J_C(t, \tilde{z}) = \mathbb{E}_{t,\tilde{z}}\!\left[ \int_t^T e^{-\delta(s - t)}\,\tilde{\Gamma}(s, \tilde{Z}(s))\,ds \right],
$$
where $\delta \ge 0$ is a discount rate and $\mathbb{E}_{t,\tilde{z}}[\,\cdot\,]$ is the conditional expectation given that at initial time $t$, $\tilde{Z}(t) = \tilde{z}$. The value of the consumer’s problem for all $(t, \tilde{z}) \in [0,T] \times \mathbb{R}$ is given by
$$
\tilde{V}(t, \tilde{z}) = J_C(t, \tilde{z}).
$$
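Because the consumer has no control and $\tilde{\Gamma}$ is linear in $\tilde{R}$, the expectation can be moved inside the integral. For mean-reverting dynamics with speed $\kappa_0$, the conditional mean is $\mathbb{E}_{t,\tilde{z}}[\tilde{Z}(s)] = \tilde{z}\,e^{-\kappa_0 (s - t)}$, so $J_C$ reduces to a one-dimensional integral. The sketch below evaluates it with the trapezoidal rule; the quadrature choice, the function name, and the argument `unit_pump_cost` (standing for $(b_1 + b_2(\tilde{i}_d - P_c))\,S$) are illustrative assumptions.

```python
import math


def consumer_value(t, z, horizon, delta, kappa_0,
                   mu_tilde, p_buy, unit_pump_cost, n_steps=1000):
    """Trapezoidal approximation of J_C(t, z).

    mu_tilde and p_buy are callables of time; unit_pump_cost is the
    electricity cost of the heat pump per unit of purchased heat.
    """
    ds = (horizon - t) / n_steps

    def integrand(s):
        expected_residual = mu_tilde(s) + z * math.exp(-kappa_0 * (s - t))
        unit_cost = p_buy(s) + unit_pump_cost
        return math.exp(-delta * (s - t)) * expected_residual * unit_cost

    total = 0.5 * (integrand(t) + integrand(horizon))
    for k in range(1, n_steps):
        total += integrand(t + k * ds)
    return total * ds
```

With constant seasonality and price, zero discounting, and `z = 0`, the result is simply the mean demand times the unit cost times the remaining horizon, which gives an easy sanity check.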

4. Semi-Lagrangian Discretization

In this section, we recall the discrete-time numerical scheme in [2]. We focus on the prosumer’s problem, since the consumer’s problem can be solved directly. We start by discussing the state discretization.
State Discretization: Let $N_t$, $N_z$, and $N_q$ denote the number of grid subintervals in the $t$-, $z^R$-, and $q$-directions, respectively. For computational reasons, we truncate the domain $\mathcal{X} = \mathbb{R} \times [\underline{q}, \overline{q}]$ to $\hat{\mathcal{X}} = [\underline{z}^R, \overline{z}^R] \times [\underline{q}, \overline{q}]$, such that for a tolerance $\epsilon \ll 1$, $\mathbb{P}(Z^R(t) \in [\underline{z}^R, \overline{z}^R]) \ge 1 - \epsilon$ for all $t \in [0,T]$. Here, $\underline{z}^R$ and $\overline{z}^R$ represent the minimum and maximum values of the deseasonalized residual demand $Z^R$ on the truncated domain, respectively. Given the asymptotic standard deviation $s_0 = \sqrt{\dfrac{\sigma_R^2 + \sigma_0^2}{2\kappa_R}}$ of $Z^R$, the 3-$\sigma$ rule motivates $\underline{z}^R = -3 s_0$ and $\overline{z}^R = 3 s_0$.
Let $t_0 < t_1 < \dots < t_{N_t}$, $z_0^R < z_1^R < \dots < z_{N_z}^R$, and $q_0 < q_1 < \dots < q_{N_q}$ be a finite number of grid points in the $t$-, $z^R$-, and $q$-directions. We define
$$
\hat{G} = \hat{G}_t \times \hat{G}_z \times \hat{G}_q = [t_0, \dots, t_{N_t}] \times [z_0^R, \dots, z_{N_z}^R] \times [q_0, \dots, q_{N_q}],
$$
where $t_n = t_0 + n\,\Delta t$, $z_i^R = \underline{z}^R + i\,\Delta z^R$, $q_k = \underline{q} + k\,\Delta q$, for $n \in \hat{G}_n = \{0, 1, \dots, N_t\}$, $i = 0, \dots, N_z$ and $k = 0, \dots, N_q$, as a 3-dimensional equidistant grid on $\hat{\mathcal{X}}$ with the temporal and spatial step sizes
$$
t_{n+1} - t_n =: \Delta t = \frac{T - t_0}{N_t}, \qquad \Delta z^R = \frac{\overline{z}^R - \underline{z}^R}{N_z}, \qquad \Delta q = \frac{\overline{q} - \underline{q}}{N_q}.
$$
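The truncation and the equidistant grid can be set up as follows. This is a minimal sketch assuming a constant $\sigma_R$ in the asymptotic standard deviation; the function name and return layout are hypothetical.

```python
import math


def build_grid(t0, horizon, n_t, n_z, n_q,
               sigma_r, sigma_0, kappa_r, q_min, q_max):
    """Return (t_grid, z_grid, q_grid) as lists of equidistant points.

    The z-range is truncated to [-3 * s0, 3 * s0], where s0 is the
    asymptotic standard deviation of the deseasonalized residual demand.
    """
    s0 = math.sqrt((sigma_r ** 2 + sigma_0 ** 2) / (2.0 * kappa_r))
    z_min, z_max = -3.0 * s0, 3.0 * s0
    dt = (horizon - t0) / n_t
    dz = (z_max - z_min) / n_z
    dq = (q_max - q_min) / n_q
    t_grid = [t0 + n * dt for n in range(n_t + 1)]
    z_grid = [z_min + i * dz for i in range(n_z + 1)]
    q_grid = [q_min + k * dq for k in range(n_q + 1)]
    return t_grid, z_grid, q_grid
```

Each direction with $N$ subintervals yields $N + 1$ grid points, so the returned lists have lengths `n_t + 1`, `n_z + 1`, and `n_q + 1`.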
We now recall the discrete-time numerical scheme discussed in [2]. This is an alternative to the semi-Lagrangian approach introduced in [16] and extended in [17,18]. It is a finite difference scheme based on the theory presented in [19].
The control problem in (12) can be solved through dynamic programming techniques, which rely on the following dynamic programming principle (DPP).
Theorem 1 
(DPP). For all $(t,x) \in [0,T) \times \mathcal{X}$, $h > 0$ and $t + h \le T$, the value function $V(t,x)$ satisfies the DPP
$$
V(t,x) = \inf_{\alpha \in \mathcal{A}} \mathbb{E}_{t,x}\!\left[ \int_t^{t+h} e^{-\delta(s - t)}\,\Gamma(s, Z^R(s), \alpha(s))\,ds + e^{-\delta h}\,V(t + h, X(t + h)) \right].
$$
Proof. 
For a proof, we refer to [20].    □
The following result shows that the value function V, defined in (12), is a viscosity solution of an associated partial differential equation (PDE).
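Proposition 1. 
Let $\Theta(t, x, \upsilon)$ denote any of the functions $f(z^R)$, $h(z^R, q, \upsilon)$, $\Gamma(t, z^R, \upsilon)$, $\Phi(q)$. Then, there exists $L > 0$ such that
$$
|\Theta(t, x, \upsilon) - \Theta(t, \hat{x}, \upsilon)| \le L\,|x - \hat{x}|, \qquad \forall\, t \in [0,T],\ x, \hat{x} \in \mathcal{X},\ \upsilon \in \mathcal{U},
$$
$$
|\Theta(t, 0, \upsilon)| \le L, \qquad \forall\, (t, \upsilon) \in [0,T] \times \mathcal{U}.
$$
Therefore, the value function $V$ is a viscosity solution of the PDE
$$
\frac{\partial V}{\partial t} + \mathcal{L} V + \inf_{\upsilon \in U(t, z^R, q)} \big\{ \mathcal{L}^q V + \Gamma(t, z^R, \upsilon) \big\} = 0,
$$
with the terminal condition $V(T, z^R, q) = \Phi(q)$ and
$$
\mathcal{L} V = f(z^R)\,\frac{\partial V}{\partial z^R} + \frac{1}{2}\big(\sigma_R^2(t) + \sigma_0^2\big)\,\frac{\partial^2 V}{\partial (z^R)^2} - \delta V, \qquad \mathcal{L}^q V = h(z^R, q, \upsilon)\,\frac{\partial V}{\partial q}.
$$
Proof. 
First, we note that the functions $f(z^R)$, $h(z^R, q, \upsilon)$, $\Gamma(t, z^R, \upsilon)$, $\Phi(q)$ are piecewise linear in the states $z^R$ and $q$. Hence, there exists $L > 0$ such that
$$
|\Theta(t, x, \upsilon) - \Theta(t, \hat{x}, \upsilon)| \le L\,|x - \hat{x}|, \qquad \forall\, t \in [0,T],\ x, \hat{x} \in \mathcal{X},\ \upsilon \in \mathcal{U}.
$$
In addition, we have $|\Theta(t, 0, \upsilon)| \le L$ for all $(t, \upsilon) \in [0,T] \times \mathcal{U}$. Finally, the result follows from Theorem 2.3 in [21].    □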
Idea of the Scheme: We begin by fixing the triple $(t_n, z_i^R, q_k)$ and assume that, for $t \in [t_n, t_{n+1})$, the deseasonalized residual demand remains constant, i.e., $Z^R(t) = z_i^R$, where $z_i^R$ is fixed and known. Next, we assume that the control is constant over this interval, i.e., $\alpha(t) = \alpha_{i,k}^n =: \upsilon$, which is fixed but unknown. Under these assumptions, and using the explicit solution, we compute the arrival point $Q_{k(i,n)}^{\upsilon, n+1}$ of the ES temperature level, given that at time $t_n$, the storage level was $q_k$, the residual demand was $\mu^R(t_n) + z_i^R$, and constant action $\upsilon$ was taken. We then apply the DPP in two steps: In the first step, using the above discretization, we obtain an approximation of the optimal control $\alpha_{i,k}^{*n} =: \upsilon^*$. In the second step, we substitute this approximate optimal control in the DPP and “release” the deseasonalized residual demand $Z^R$. Finally, by applying the Feynman–Kac formula to the second step, we derive a one-period PDE, which we discretize using finite difference methods to obtain a discrete-time scheme. This scheme is implemented in Algorithm 1.
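The first step of the scheme (choosing an approximately optimal control at a fixed grid node) can be sketched as a search over a finite set of candidate controls. This is only an illustration under stated assumptions: the candidate set, the linear interpolation of the next-step value function in the $q$-direction, and the clamping at the grid boundaries are choices made here, not details taken from Algorithm 1. The callables `stage_cost` and `arrival` are hypothetical placeholders for $\Gamma$ and the arrival point $Q_{k(i,n)}^{\upsilon,n+1}$.

```python
import math


def interpolate(q, q_grid, values):
    """Piecewise-linear interpolation on an increasing grid, clamped at the ends."""
    if q <= q_grid[0]:
        return values[0]
    if q >= q_grid[-1]:
        return values[-1]
    for k in range(len(q_grid) - 1):
        if q <= q_grid[k + 1]:
            w = (q - q_grid[k]) / (q_grid[k + 1] - q_grid[k])
            return (1.0 - w) * values[k] + w * values[k + 1]


def best_control(candidates, stage_cost, arrival, q_grid, v_next, delta, dt):
    """Return (control, cost) minimizing the one-step cost-to-go.

    candidates: feasible controls to test (e.g. a discretized U(t, z, q));
    stage_cost(u): running cost per unit of time under control u;
    arrival(u): ES level reached at the next time point under control u;
    v_next: next-step values on q_grid for the fixed residual demand node.
    """
    best = None
    for u in candidates:
        total = stage_cost(u) * dt + math.exp(-delta * dt) * interpolate(
            arrival(u), q_grid, v_next)
        if best is None or total < best[1]:
            best = (u, total)
    return best
```

In a full backward recursion, this search would be repeated for every node $(z_i^R, q_k)$ at each time step before the second step of the scheme is applied.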
From the above discussion, we formulate the following assumptions:
Assumption 4 
(Piecewise Constant Control). For $n=0,1,\dots,N_t-1$ and $t\in[t_n,t_{n+1})$, the control $\alpha$ and the associated decision rule $\theta$ are kept constant between two consecutive grid points of the time discretization. That is,
$$\alpha(t)=\alpha(t_n)\quad\text{and}\quad\theta(t,X(t))=\theta(t_n,X(t_n)).$$
Algorithm 1: Backward Recursion Algorithm
Assumption 5 
(Piecewise Constant Model Parameters). For $n=0,1,\dots,N_t-1$ and $t\in[t_n,t_{n+1})$, the time-dependent seasonality $\mu_R$ and the efficiency parameter $\eta_E$ are kept constant between two consecutive grid points of the time discretization. That is,
$$\mu_R(t)=\mu_R(t_n)=\mu_n^R,\quad \mu_R(T)=\mu_{N_t}^R,\quad \eta_E(t)=\eta_E(t_n)=\eta_{E,n},\quad \eta_E(T)=\eta_{E,N_t}.$$
Assumptions 4 and 5 mean that the control and the time-dependent parameters of the system are adjusted only at the discrete time points and remain constant between two consecutive time points. This is also consistent with practice.
Let $Q^{\upsilon}(t)$ denote the solution of the ODE (7) for the ES temperature level on the time interval $[t_n,t_{n+1})$ for a fixed but unknown control $\upsilon$, satisfying
$$dQ^{\upsilon}(t)=h\left(Z^R(t),Q^{\upsilon}(t),\upsilon\right)dt,\quad Q^{\upsilon}(t_n)=q_k,\quad t\in[t_n,t_{n+1}). \tag{22}$$
Lemma 1. 
Let Assumptions 4 and 5 hold and $\lambda=\frac{A\gamma}{m_Q c_P}$. Then, for $n=0,1,\dots,N_t$, $i=0,1,\dots,N_z$, $k=0,1,\dots,N_q$, $t\in[t_n,t_{n+1})$ and $Q^{\upsilon}(t_n)=q_k$, the closed-form solution of $Q^{\upsilon}$ is given by
$$Q^{\upsilon}(t)=q_k e^{-\lambda(t-t_n)}+\left[\frac{\eta_{E,n}(\upsilon-1)(\mu_n^R+z_i^R)}{A\gamma}+\underline{q}\right]\left(1-e^{-\lambda(t-t_n)}\right). \tag{23}$$
Moreover, letting $t=t_{n+1}$ and $\theta_{i,k}^{n}=\frac{A\gamma q_k-\left[\eta_{E,n}(\upsilon-1)(\mu_n^R+z_i^R)+A\gamma\underline{q}\right]}{\Delta q}\,\frac{1-e^{-\lambda\Delta t}}{A\gamma}$, (23) becomes
$$Q_{k(i,n)}^{\upsilon,n+1}=q_k-\theta_{i,k}^{n}\,\Delta q. \tag{24}$$
Proof. 
Let $\lambda=\frac{A\gamma}{m_Q c_P}$. For $n=0,1,\dots,N_t$, $i=0,1,\dots,N_z$ and $k=0,1,\dots,N_q$, assume that $Q^{\upsilon}(t_n)=q_k$. Under Assumptions 4 and 5, (22) is a linear first-order ODE with constant coefficients and source term, which is solved to obtain the desired result. Now, substituting $t=t_{n+1}$ in (23) yields
$$Q_{k(i,n)}^{\upsilon,n+1}=q_k e^{-\lambda\Delta t}+\left[\frac{\eta_{E,n}(\upsilon-1)(\mu_n^R+z_i^R)}{A\gamma}+\underline{q}\right]\left(1-e^{-\lambda\Delta t}\right). \tag{25}$$
Rearranging the terms in (25) yields (24). □
$Q_{k(i,n)}^{\upsilon,n+1}$ denotes the ES temperature level at time $t_{n+1}$, given that, at time $t_n$, the ES was at the level $q_k$ with a residual demand $\mu_n^R+z_i^R$ and an action $\upsilon$ was taken.
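As a quick numerical illustration of Lemma 1, the following Python sketch evaluates the closed-form arrival point and the shift $\theta_{i,k}^n$, and compares them with a fine explicit Euler integration. It is only a sketch under stated assumptions: the function names are hypothetical, the right-hand side used in the Euler loop is the linear ODE implied by the closed form (it is not quoted from the model section), and all parameter values are placeholders rather than the calibrated values of Table 1.

```python
import math

def arrival_point(q_k, r, v, eta, dt, A_gamma, mQ_cP, q_low):
    """Closed-form ES level at t_{n+1} for constant residual demand r and control v."""
    decay = math.exp(-(A_gamma / mQ_cP) * dt)        # e^{-lambda * dt}
    q_star = eta * (v - 1.0) * r / A_gamma + q_low   # stationary level of the linear ODE
    return q_k * decay + q_star * (1.0 - decay)

def theta(q_k, r, v, eta, dt, dq, A_gamma, mQ_cP, q_low):
    """Shift theta such that the arrival point equals q_k - theta * dq."""
    decay = math.exp(-(A_gamma / mQ_cP) * dt)
    numerator = A_gamma * q_k - (eta * (v - 1.0) * r + A_gamma * q_low)
    return numerator / dq * (1.0 - decay) / A_gamma

def euler_arrival(q_k, r, v, eta, dt, A_gamma, mQ_cP, q_low, steps=10_000):
    """Explicit Euler for dQ/dt = h; h is assumed linear, as implied by the closed form."""
    q, step = q_k, dt / steps
    for _ in range(steps):
        h = (eta * (v - 1.0) * r - A_gamma * (q - q_low)) / mQ_cP
        q += step * h
    return q

# Hypothetical parameters: discharging (r > 0), so eta = 1 / eta_D with eta_D = 0.9.
params = dict(q_k=60.0, r=50.0, v=0.5, eta=1.0 / 0.9, dt=1.0,
              A_gamma=2.0, mQ_cP=100.0, q_low=20.0)
dq = 2.0
q_next = arrival_point(**params)
th = theta(dq=dq, **params)
print(q_next, params["q_k"] - th * dq, euler_arrival(**params))
```

The three printed numbers should agree up to the Euler discretization error, which confirms that the shift $\theta_{i,k}^n$ is just a rescaled version of the closed-form displacement.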
Discrete-Time State-Dependent Control Constraints: In order for $Q_{k(i,n)}^{\upsilon,n+1}$ to always satisfy the condition $Q_{k(i,n)}^{\upsilon,n+1}\in[\underline{q},\overline{q}]$, we reformulate $\mathcal{U}$ to adapt to the discrete-time setting, where the control can only be adjusted at the end of the time interval $[t_n,t_{n+1})$. We obtain
$$\mathcal{U}_d^{n}(z_i^R,q_k)=\left\{\upsilon\in\mathcal{U}(t_n,z_i^R,q_k)\ \middle|\ Q_{k(i,n)}^{\upsilon,n+1}\in[\underline{q},\overline{q}]\right\}.$$
In the following lemma, we give the full expression of $\mathcal{U}_d^{n}(z_i^R,q_k)$.
Lemma 2. 
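Under Assumptions 4 and 5, the set of discrete-time state-dependent constraints is
$$\mathcal{U}_d^{n}(z_i^R,q_k)=\begin{cases}
\left[\max(0,\upsilon_{min}^{d}),\ \min(1,\upsilon_{max}^{d})\right], & q_k>\underline{q},\ \mu_n^R+z_i^R\ge 0,\\
\{1\}, & q_k=\underline{q},\ \mu_n^R+z_i^R\ge 0,\\
\left[\max(0,\upsilon_{min}^{c}),\ \min(1,\upsilon_{max}^{c})\right], & q_k<\overline{q},\ \mu_n^R+z_i^R<0,\\
\left[\max(\chi(t_n),\upsilon_{min}^{c}),\ \min(1,\upsilon_{max}^{c})\right], & q_k=\overline{q},\ \mu_n^R+z_i^R<0,
\end{cases}$$
where
$$\begin{aligned}
\upsilon_{min}^{d}&=1+\frac{\eta_E^D A\gamma\left[(\underline{q}-q_k e^{-\lambda\Delta t})-\underline{q}(1-e^{-\lambda\Delta t})\right]}{(1-e^{-\lambda\Delta t})(\mu_n^R+z_i^R)}, &
\upsilon_{max}^{d}&=1+\frac{\eta_E^D A\gamma\left[(\overline{q}-q_k e^{-\lambda\Delta t})-\underline{q}(1-e^{-\lambda\Delta t})\right]}{(1-e^{-\lambda\Delta t})(\mu_n^R+z_i^R)},\\
\upsilon_{min}^{c}&=1+\frac{A\gamma\left[(\overline{q}-q_k e^{-\lambda\Delta t})-\underline{q}(1-e^{-\lambda\Delta t})\right]}{\eta_E^C(1-e^{-\lambda\Delta t})(\mu_n^R+z_i^R)}, &
\upsilon_{max}^{c}&=1+\frac{A\gamma\left[(\underline{q}-q_k e^{-\lambda\Delta t})-\underline{q}(1-e^{-\lambda\Delta t})\right]}{\eta_E^C(1-e^{-\lambda\Delta t})(\mu_n^R+z_i^R)}.
\end{aligned}$$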
Proof. 
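From the condition $Q_{k(i,n)}^{\upsilon,n+1}\in[\underline{q},\overline{q}]$, we obtain
$$A\gamma\left[\frac{\underline{q}-q_k e^{-\lambda\Delta t}}{1-e^{-\lambda\Delta t}}-\underline{q}\right]\le\eta_E(\upsilon-1)(\mu_n^R+z_i^R)\le A\gamma\left[\frac{\overline{q}-q_k e^{-\lambda\Delta t}}{1-e^{-\lambda\Delta t}}-\underline{q}\right].$$
If $\mu_n^R+z_i^R\ge 0$, then $\eta_E=\frac{1}{\eta_E^D}$, and we obtain $\upsilon\in[\upsilon_{min}^{d},\upsilon_{max}^{d}]$. Similarly, if $\mu_n^R+z_i^R<0$, then $\eta_E=\eta_E^C$, and we have $\upsilon\in[\upsilon_{min}^{c},\upsilon_{max}^{c}]$. Therefore, for $\mu_n^R+z_i^R\ge 0$, $\mathcal{U}_d^{n}(z_i^R,q_k)=\mathcal{U}(t_n,z_i^R,q_k)\cap[\upsilon_{min}^{d},\upsilon_{max}^{d}]$, and for $\mu_n^R+z_i^R<0$, $\mathcal{U}_d^{n}(z_i^R,q_k)=\mathcal{U}(t_n,z_i^R,q_k)\cap[\upsilon_{min}^{c},\upsilon_{max}^{c}]$. □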
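The feasible interval of Lemma 2 can also be obtained numerically, because the arrival point is affine in $\upsilon$. The Python sketch below does this directly from the closed form of Lemma 1 rather than from the explicit bounds. The function names and parameter values are hypothetical, and the continuous-time set $\mathcal{U}(t_n,z_i^R,q_k)$ is simply passed in as an interval `base` (default $[0,1]$), so the special lower bound $\chi(t_n)$ is not modeled here.

```python
import math

def arrival(q_k, r, v, eta, dt, A_gamma, mQ_cP, q_low):
    """Closed-form arrival point of the ES level (Lemma 1)."""
    decay = math.exp(-(A_gamma / mQ_cP) * dt)
    q_star = eta * (v - 1.0) * r / A_gamma + q_low
    return q_low + (q_k - q_low) * decay + (q_star - q_low) * (1.0 - decay)

def feasible_controls(q_k, r, eta, dt, A_gamma, mQ_cP, q_low, q_high, base=(0.0, 1.0)):
    """Controls v in `base` whose arrival point stays in [q_low, q_high]; None if empty."""
    decay = math.exp(-(A_gamma / mQ_cP) * dt)
    a = q_low + (q_k - q_low) * decay              # arrival level for v = 1
    b = eta * r * (1.0 - decay) / A_gamma          # slope of the arrival level in (v - 1)
    if b == 0.0:                                   # zero residual demand: v has no effect
        return base if q_low <= a <= q_high else None
    v1 = 1.0 + (q_low - a) / b
    v2 = 1.0 + (q_high - a) / b
    lo = max(base[0], min(v1, v2))
    hi = min(base[1], max(v1, v2))
    return (lo, hi) if lo <= hi else None

# Hypothetical values: empty storage with unsatisfied demand forces v = 1.
empty_case = feasible_controls(20.0, 50.0, 1.0 / 0.9, 1.0, 2.0, 100.0, 20.0, 80.0)
# Small storage close to the lower bound: only a narrow band of discharging is allowed.
tight_case = feasible_controls(25.0, 50.0, 1.0 / 0.9, 1.0, 2.0, 1.0, 20.0, 80.0)
print(empty_case, tight_case)
```

In the first call the interval collapses to $\{1\}$, matching the case $q_k=\underline{q}$, $\mu_n^R+z_i^R\ge 0$ of the lemma; in the second, the lower end of the interval is the control that drives the storage exactly to $\underline{q}$.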
One-Step Terminal Value Problem: Starting from the DPP as in [22], we derive the following proposition, which is analogous to Theorem 4.3 in [2]. This result provides the one-step approximate optimal control together with the associated terminal value problem. The latter is then solved numerically to determine the value and optimal strategy of the prosumer’s optimization problem.
Proposition 2. 
Let $(t_n,z_i^R,q_k)$ be fixed, and suppose that Assumptions 4 and 5 hold. Then, starting from the DPP, the one-step approximate optimal control $\upsilon^*$ is given by
$$\upsilon^*=\underset{\upsilon\in\mathcal{U}_d^{n}(z_i^R,q_k)}{\arg\min}\left\{\int_{t_n}^{t_{n+1}}e^{-\delta(t-t_n)}\,\Gamma(t,z_i^R,\upsilon)\,dt+e^{-\delta\Delta t}\,V\!\left(t_{n+1},z_i^R,Q_{k(i,n)}^{\upsilon,n+1}\right)\right\}. \tag{27}$$
In addition, setting $H(t,z^R)=V(t,z^R,Q^{\upsilon^*})$, we obtain the one-step terminal value problem
$$\begin{cases}\dfrac{\partial H}{\partial t}(t,z^R)+\mathcal{L}H(t,z^R)+\Gamma(t,z^R,\upsilon^*)=0, & \text{on } [t_n,t_{n+1})\times\mathcal{X},\\[4pt] H(t_{n+1},z^R)=V\!\left(t_{n+1},z^R,Q_{k(i,n)}^{\upsilon^*,n+1}\right),\end{cases} \tag{28}$$
where $\mathcal{L}H=f(z^R)\frac{\partial H}{\partial z^R}+\frac{1}{2}\left(\sigma_R^2(t)+\sigma_0^2\right)\frac{\partial^2 H}{\partial (z^R)^2}-\delta H$.
Proof. 
The proof is similar to that of Theorem 4.3 in [2], taking $\sigma^2=\sigma^2(t)+\sigma_0^2$. □
Positivity Condition: In order to solve the one-step terminal value problem (28) using numerical techniques, we first discretize the differential operator $\mathcal{L}$. We let $\Lambda$ denote the discretization operator of $\mathcal{L}$ and denote $\vartheta_i=-\kappa_R z_i^R$. Given that the sign of $\vartheta_i$ changes according to $z_i^R$ and cannot easily be determined, we apply the upwind discretization for the convection term $\frac{\partial V}{\partial z^R}$. Subsequently, we apply the central second-order finite difference for the diffusion term $\frac{\partial^2 V}{\partial (z^R)^2}$. Letting $\sigma_R(t_n)=\sigma_{R,n}$, we obtain
$$\Lambda V_{i,k}^{n}=\frac{1}{2}\left(\sigma_{R,n}^2+\sigma_0^2\right)\frac{V_{i-1,k}^{n}-2V_{i,k}^{n}+V_{i+1,k}^{n}}{(\Delta z^R)^2}-\delta V_{i,k}^{n}+\begin{cases}\vartheta_i\,\dfrac{V_{i,k}^{n}-V_{i-1,k}^{n}}{\Delta z^R}, & \vartheta_i\ge 0,\\[6pt] \vartheta_i\,\dfrac{V_{i+1,k}^{n}-V_{i,k}^{n}}{\Delta z^R}, & \vartheta_i<0,\end{cases}$$
$$\phantom{\Lambda V_{i,k}^{n}}=A_i V_{i+1,k}^{n}-B_i V_{i,k}^{n}+C_i V_{i-1,k}^{n},$$
where the expressions of $A_i$, $B_i$, and $C_i$ change depending on the sign of $\vartheta_i$. The following result gives us the upper bound for $\Delta z^R$ and defines the positivity condition for the coefficients $A_i$, $B_i$, and $C_i$.
Lemma 3 
(Positivity Condition). For $i=1,\dots,N_z-1$ and $n=0,\dots,N_t-1$, the coefficients $A_i$, $B_i$, and $C_i$ remain positive provided
$$\Delta z^R\le\frac{\sqrt{2\kappa_R\left(\sigma_{R,n}^2+\sigma_0^2\right)}}{6\kappa_R}.$$
Proof. 
Similar to the proof of Proposition 4.4 in [2]. □
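To make the positivity condition concrete, the Python sketch below reads the coefficients $A_i$, $B_i$, $C_i$ off the discretized operator $\Lambda$ and checks their sign on a grid. Several ingredients are assumptions made only for this illustration: the drift is taken as $\vartheta_i=-\kappa_R z_i^R$, the $z^R$-grid is truncated at three stationary standard deviations, $\pm 3\sqrt{(\sigma_{R,n}^2+\sigma_0^2)/(2\kappa_R)}$, and all numerical values and function names are hypothetical.

```python
import math

def scheme_coefficients(z, sigma2, kappa, delta, dz):
    """Return (A_i, B_i, C_i) with Lambda V = A V_{i+1} - B V_i + C V_{i-1}."""
    s = 0.5 * sigma2 / dz ** 2          # diffusion weight, sigma2 = sigma_R^2 + sigma_0^2
    drift = -kappa * z                  # assumed form of vartheta_i
    if drift >= 0.0:                    # one-sided difference using V_i and V_{i-1}
        return s, 2.0 * s + delta - drift / dz, s - drift / dz
    return s + drift / dz, 2.0 * s + delta + drift / dz, s   # uses V_{i+1} and V_i

def dz_bound(sigma2, kappa):
    """Upper bound on the z-step from the positivity condition."""
    return math.sqrt(2.0 * kappa * sigma2) / (6.0 * kappa)

kappa, sigma2, delta = 0.5, 0.41, 0.01                 # hypothetical values
z_max = 3.0 * math.sqrt(sigma2 / (2.0 * kappa))        # assumed truncation of the grid
dz = 0.9 * dz_bound(sigma2, kappa)
n_z = round(2.0 * z_max / dz)
interior = [-z_max + i * dz for i in range(1, n_z)]    # interior nodes i = 1..N_z - 1
all_positive = all(min(scheme_coefficients(z, sigma2, kappa, delta, dz)) > 0.0
                   for z in interior)
too_coarse = scheme_coefficients(-1.5, sigma2, kappa, delta, 3.0 * dz_bound(sigma2, kappa))
print(all_positive, too_coarse)
```

With a step below the bound, all interior coefficients are positive; with a step three times the bound, the coefficient $C_i$ turns negative at a node far from the origin, which is exactly what the condition rules out.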
CFL Condition: In the following, we formulate the Courant–Friedrichs–Lewy (CFL) condition, as introduced in [23]. It relates the spatial step size $\Delta q$ to the time step $\Delta t$ and ensures that the arrival point $Q_{k(i,n)}^{\upsilon,n+1}$ always lies within the interval defined by the grid points $q_{k-1}=q_k-\Delta q$ and $q_{k+1}=q_k+\Delta q$ adjacent to $q_k$. This requirement translates into the following inequalities:
$$\begin{aligned}
\underline{q}:=q_0&\le\underline{q}-\theta_{i,0}^{n}\Delta q\le q_1:=\underline{q}+\Delta q, && \theta_{i,0}^{n}\le 0,\\
q_{k-1}&\le q_k-\theta_{i,k}^{n}\Delta q\le q_{k+1}, && k=1,\dots,N_q-1,\\
\overline{q}-\Delta q:=q_{N_q-1}&\le\overline{q}-\theta_{i,N_q}^{n}\Delta q\le q_{N_q}:=\overline{q}, && \theta_{i,N_q}^{n}\ge 0.
\end{aligned}$$
These inequalities yield the CFL condition relating the time step size $\Delta t$ to the ES step size $\Delta q$, which ensures the stability of the derived numerical scheme. Let $\underline{\mu}^R=\min_{n\in\{0,1,\dots,N_t\}}\mu_n^R$ and $\overline{\mu}^R=\max_{n\in\{0,1,\dots,N_t\}}\mu_n^R$. Then, from the above inequalities and the state-dependent set of feasible controls, the CFL condition is given by
$$\Delta q\ge\frac{\Delta t}{m_Q c_P}\left[\max\left\{-\eta_E^C\left(\underline{\mu}^R+\underline{z}^R\right),\ \frac{\overline{\mu}^R+\overline{z}^R}{\eta_E^D}\right\}+A\gamma\left(\overline{q}-\underline{q}\right)\right].$$
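The role of the CFL condition can be checked numerically: if $\Delta q$ is at least the bound, the shift $\theta_{i,k}^n$ stays in $[-1,1]$, so the arrival point never jumps past a neighboring grid point. The Python sketch below reads the bound as $\Delta q\ge\frac{\Delta t}{m_Q c_P}[\max\{\cdot\}+A\gamma(\overline{q}-\underline{q})]$, which is an interpretation of the condition, and uses hypothetical function names and parameter values.

```python
import math

def cfl_min_dq(dt, mQ_cP, A_gamma, eta_c, eta_d, r_min, r_max, q_low, q_high):
    """Smallest storage step allowed for a given time step (assumed reading of the CFL bound)."""
    power = max(-eta_c * r_min, r_max / eta_d)     # strongest charging / discharging flow
    return dt / mQ_cP * (power + A_gamma * (q_high - q_low))

def theta(q_k, r, v, eta, dt, dq, A_gamma, mQ_cP, q_low):
    """Shift theta such that the arrival point equals q_k - theta * dq (Lemma 1)."""
    decay = math.exp(-(A_gamma / mQ_cP) * dt)
    numerator = A_gamma * q_k - (eta * (v - 1.0) * r + A_gamma * q_low)
    return numerator / dq * (1.0 - decay) / A_gamma

# Hypothetical values; r_min < 0 is the strongest overproduction, r_max > 0 the strongest demand.
dt, mQ_cP, A_gamma, eta_c, eta_d = 1.0, 100.0, 2.0, 0.9, 0.9
r_min, r_max, q_low, q_high = -40.0, 50.0, 20.0, 80.0
dq = cfl_min_dq(dt, mQ_cP, A_gamma, eta_c, eta_d, r_min, r_max, q_low, q_high)

worst = 0.0
for q_k in (q_low, 0.5 * (q_low + q_high), q_high):
    for v in (0.0, 0.5, 1.0):
        for r, eta in ((r_max, 1.0 / eta_d), (r_min, eta_c)):
            worst = max(worst, abs(theta(q_k, r, v, eta, dt, dq, A_gamma, mQ_cP, q_low)))
print(dq, worst)
```

For these values the largest $|\theta_{i,k}^n|$ is slightly below one, attained for a full storage that is fully discharged under the strongest demand.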
Interpolation: As noted in [2], the arrival point $Q_{k(i,n)}^{\upsilon,n+1}$ does not necessarily coincide with a grid point $q_k\in\hat{\mathcal{G}}_q$. This prompts an interpolation of $V(t_{n+1},z_i^R,Q_{k(i,n)}^{\upsilon,n+1})$ based on the function values $V_{i,k}^{n+1}$ at the grid points of $\hat{\mathcal{G}}$. Following [17], a linear interpolation is sufficient to construct a monotone difference scheme. In what follows, we denote by $V_{k(i,n)}^{n+1}$ the interpolated values of $V(t_{n+1},z_i^R,Q_{k(i,n)}^{\upsilon,n+1})$ defined in the result below.
Proposition 3. 
Let $\theta_{i,k}^{n}=\frac{A\gamma q_k-\left[\eta_{E,n}(\upsilon-1)(\mu_n^R+z_i^R)+A\gamma\underline{q}\right]}{\Delta q}\,\frac{1-e^{-\lambda\Delta t}}{A\gamma}$, and assume that the CFL condition holds. Then, the interpolated value $V_{k(i,n)}^{n+1}$ is given by
$$V_{k(i,n)}^{n+1}=D_{i,k}^{(q,n)}V_{i,k}^{n+1}+F_{i,k}^{(q,n)}V_{i,k-1}^{n+1}+H_{i,k}^{(q,n)}V_{i,k+1}^{n+1}, \tag{33}$$
for $i=0,\dots,N_z$, $k=1,\dots,N_q-1$, $n=0,\dots,N_t-1$, where
$$D_{i,k}^{(q,n)}=1-|\theta_{i,k}^{n}|,\quad F_{i,k}^{(q,n)}=\frac{\theta_{i,k}^{n}+|\theta_{i,k}^{n}|}{2}\quad\text{and}\quad H_{i,k}^{(q,n)}=-\frac{\theta_{i,k}^{n}-|\theta_{i,k}^{n}|}{2}.$$
Furthermore, for $k=0$ and $k=N_q$, we obtain
$$V_{0(i,n)}^{n+1}=D_{i,0}^{(q,n)}V_{i,0}^{n+1}+H_{i,0}^{(q,n)}V_{i,1}^{n+1},\qquad V_{N_q(i,n)}^{n+1}=D_{i,N_q}^{(q,n)}V_{i,N_q}^{n+1}+F_{i,N_q}^{(q,n)}V_{i,N_q-1}^{n+1}.$$
Proof. 
See Appendix C.3 in [2]. □
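The weights of Proposition 3 amount to ordinary linear interpolation between $q_k$ and the neighbor lying in the direction of the shift. A minimal Python sketch, with hypothetical function names and test data, is given below; it assumes $|\theta_{i,k}^n|\le 1$, i.e., that the CFL condition holds.

```python
def interpolation_weights(theta):
    """Weights (D, F, H) on V_k, V_{k-1}, V_{k+1} for the arrival point q_k - theta * dq."""
    D = 1.0 - abs(theta)
    F = 0.5 * (theta + abs(theta))      # equals theta when theta >= 0, else 0
    H = -0.5 * (theta - abs(theta))     # equals -theta when theta < 0, else 0
    return D, F, H

def interpolate(V_row, k, theta):
    """Interpolated value at the arrival point, given values V_row on the q-grid."""
    D, F, H = interpolation_weights(theta)
    value = D * V_row[k]
    if F:                               # only touch k - 1 when it actually carries weight
        value += F * V_row[k - 1]
    if H:                               # only touch k + 1 when it actually carries weight
        value += H * V_row[k + 1]
    return value

# Hypothetical data: a function that is linear in q is reproduced exactly.
dq = 2.0
grid = [20.0 + j * dq for j in range(6)]
V_row = [3.0 * q + 1.0 for q in grid]
down = interpolate(V_row, 3, 0.25)      # arrival point below q_3
up = interpolate(V_row, 3, -0.4)        # arrival point above q_3
edge = interpolate(V_row, 0, -0.1)      # lower boundary, where theta <= 0
print(down, up, edge)
```

Because the weights are non-negative and sum to one, the interpolation cannot create new extrema, which is the property needed for a monotone scheme.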
Now, we discretize the one-step terminal value problem (28) to obtain
$$\frac{V_{i,k}^{n+1}-V_{i,k}^{n}}{\Delta t}+\Lambda V_{i,k}^{n}+\Gamma(t_n,z_i^R,\upsilon^*)=0.$$
Finally, we obtain the fully implicit scheme given by
$$\begin{cases}
V_{i,k}^{n}-\Delta t\left[A_i V_{i+1,k}^{n}-B_i V_{i,k}^{n}+C_i V_{i-1,k}^{n}\right]=V_{k(i,n)}^{n+1}+\Delta t\,\Gamma(t_n,z_i^R,\upsilon^*),\\[4pt]
Q_{k(i,n)}^{\upsilon,n+1}=q_k-\theta_{i,k}^{n}\,\Delta q,\\[4pt]
\upsilon_{i,k}^{*n}=\underset{\upsilon\in\mathcal{U}_d^{n}(z_i^R,q_k)}{\arg\min}\left\{\displaystyle\int_{t_n}^{t_{n+1}}e^{-\delta(t-t_n)}\,\Gamma(t,z_i^R,\upsilon)\,dt+e^{-\delta\Delta t}\,V_{k(i,n)}^{n+1}\right\},\\[4pt]
\mathcal{U}_d^{n}(z_i^R,q_k)=\left\{\upsilon\in\mathcal{U}(t_n,z_i^R,q_k)\ \middle|\ Q_{k(i,n)}^{\upsilon,n+1}\in[\underline{q},\overline{q}]\right\},\\[4pt]
V_{i,k}^{N_t}=\Phi(q_k).
\end{cases} \tag{35}$$
From the first equation in (35), we form a system of linear algebraic equations from which the values of $V_{i,k}^{n}$ are obtained. To this end, we set $\Gamma_{i,k}^{n+1}=\Gamma(t_n,z_i^R,\upsilon^*)$ and assume that, for fixed $n$ and $k$, the optimal strategy $\upsilon^*=\upsilon_{i,k}^{*n}$, the CFL condition, and the terms in (33) for all $i$ are known. We denote by $\Psi_{i,k}^{n+1}$ the known right-hand side of the difference equation (35) for fixed $t_n$, $q_k$ and for all $i$. Hence, for $i=1,\dots,N_z-1$, $k=1,\dots,N_q-1$, $n=0,1,\dots,N_t-1$, we obtain
$$(1+\Delta t\,B_i)V_{i,k}^{n}-\Delta t\,C_i V_{i-1,k}^{n}-\Delta t\,A_i V_{i+1,k}^{n}=\Psi_{i,k}^{n+1}.$$
At the boundary $q=\underline{q}$, i.e., $k=0$, we recall that $\theta_{i,0}^{n}\le 0$. Similarly, at the boundary $q=\overline{q}$, i.e., $k=N_q$, we have $\theta_{i,N_q}^{n}\ge 0$. Therefore, we have the following:
$$V_{k(i,n)}^{n+1}=\begin{cases}\left(1-\theta_{i,k}^{n}\right)V_{i,k}^{n+1}+\theta_{i,k}^{n}V_{i,k-1}^{n+1}, & \theta_{i,k}^{n}\ge 0,\ k=1,\dots,N_q,\\[4pt] \left(1+\theta_{i,k}^{n}\right)V_{i,k}^{n+1}-\theta_{i,k}^{n}V_{i,k+1}^{n+1}, & \theta_{i,k}^{n}<0,\ k=0,\dots,N_q-1.\end{cases}$$
Boundary Conditions: To ensure that the one-step terminal value problem, formulated as a PDE, is well posed, we specify boundary conditions at $z^R=\underline{z}^R,\overline{z}^R$ and $q=\underline{q},\overline{q}$. These conditions arise from truncating the computational domain of the PDE from $\mathcal{X}$ to a bounded domain $\hat{\mathcal{X}}$. Accordingly, we require that
$$\frac{\partial^2 V}{\partial (z^R)^2}(t,\underline{z}^R,q)=0,\quad\frac{\partial^2 V}{\partial (z^R)^2}(t,\overline{z}^R,q)=0,\quad(t,q)\in(0,T)\times[\underline{q},\overline{q}], \tag{38}$$
$$\frac{\partial^2 V}{\partial q^2}(t,z^R,\underline{q})=0,\quad\frac{\partial^2 V}{\partial q^2}(t,z^R,\overline{q})=0,\quad(t,z^R)\in(0,T)\times[\underline{z}^R,\overline{z}^R].$$
We note that the corner values $V_{0,0}^{n}$, $V_{0,N_q}^{n}$, $V_{N_z,0}^{n}$, and $V_{N_z,N_q}^{n}$ are computed using previously obtained values $V_{i,k}^{n}$ for $i=0,\dots,N_z$, $k=0,\dots,N_q$, so that
$$V_{0,0}^{n}=2V_{1,0}^{n}-V_{2,0}^{n},\quad V_{0,N_q}^{n}=2V_{0,N_q-1}^{n}-V_{0,N_q-2}^{n},\quad V_{N_z,0}^{n}=2V_{N_z,1}^{n}-V_{N_z,2}^{n},\quad V_{N_z,N_q}^{n}=2V_{N_z,N_q-1}^{n}-V_{N_z,N_q-2}^{n}. \tag{40}$$
Combining (40), the boundary conditions, and the difference equation (35), we proceed to the matrix formulation below.
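Matrix Formulation: Simultaneously varying the indices $i,k$, for $n$ fixed in the difference equation (35), and using (40), we obtain the following system of linear equations:
$$G\,\mathbf{V}_k^{n}=\boldsymbol{\Psi}_k^{n+1},\quad\text{for } n=0,\dots,N_t-1,\ k=1,\dots,N_q-1, \tag{41}$$
where $\mathbf{V}_k^{n}=(V_{1,k}^{n},\dots,V_{N_z-1,k}^{n})^{T}$, $\boldsymbol{\Psi}_k^{n+1}=(\Psi_{1,k}^{n+1},\dots,\Psi_{N_z-1,k}^{n+1})^{T}$, with $G$ an $(N_z-1)\times(N_z-1)$ tridiagonal matrix given as
$$G=\begin{pmatrix}
e_1 & p_1 & 0 & \cdots & 0 & 0\\
h_2 & e_2 & p_2 & \cdots & 0 & 0\\
0 & h_3 & e_3 & \ddots & 0 & 0\\
\vdots & & \ddots & \ddots & \ddots & \vdots\\
0 & 0 & \cdots & h_{N_z-2} & e_{N_z-2} & p_{N_z-2}\\
0 & 0 & \cdots & 0 & h_{N_z-1} & e_{N_z-1}
\end{pmatrix},$$
where
$$\begin{aligned}
e_1&=1+\Delta t\,B_1-2\Delta t\,C_1, & p_1&=\Delta t\,(C_1-A_1),\\
e_k&=1+\Delta t\,B_k,\quad h_k=-\Delta t\,C_k, & p_k&=-\Delta t\,A_k,\quad k=2,\dots,N_z-2,\\
e_{N_z-1}&=1+\Delta t\,B_{N_z-1}-2\Delta t\,A_{N_z-1}, & h_{N_z-1}&=-\Delta t\,(C_{N_z-1}-A_{N_z-1}).
\end{aligned}$$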
It now remains to obtain the values $V_{0,k}^{n}$, $V_{N_z,k}^{n}$, $V_{i,0}^{n}$, and $V_{i,N_q}^{n}$. To do so, we substitute the boundary conditions (38) in the difference equation (35) to obtain
$$\frac{\partial V}{\partial t}(t,z^R,q)+\mathcal{L}_1 V(t,z^R,q)+\Gamma(t,z^R,\upsilon^*)=0,$$
where for $i=0$, $\mathcal{L}_1 V=-\kappa_R\,\underline{z}^R\frac{\partial V}{\partial z^R}-\delta V$, while for $i=N_z$, $\mathcal{L}_1 V=-\kappa_R\,\overline{z}^R\frac{\partial V}{\partial z^R}-\delta V$. Following similar steps as above, we obtain the following for $z^R=\underline{z}^R$:
$$K\,\mathbf{V}_0^{n}=\boldsymbol{\Psi}_0^{n+1}+\Delta t\,A_0\,\mathbf{V}_1^{n},\quad\text{for } n=0,\dots,N_t-1, \tag{44}$$
where $\mathbf{V}_0^{n}=(V_{0,1}^{n},\dots,V_{0,N_q-1}^{n})^{T}$, $\boldsymbol{\Psi}_0^{n+1}=(\Psi_{0,1}^{n+1},\dots,\Psi_{0,N_q-1}^{n+1})^{T}$, and $\mathbf{V}_1^{n}=(V_{1,1}^{n},\dots,V_{1,N_q-1}^{n})^{T}$.
Similarly, for $z^R=\overline{z}^R$, we have
$$M\,\mathbf{V}_{N_z}^{n}=\boldsymbol{\Psi}_{N_z}^{n+1}+\Delta t\,C_{N_z}\,\mathbf{V}_{N_z-1}^{n},\quad\text{for } n=0,\dots,N_t-1, \tag{45}$$
where $\mathbf{V}_{N_z}^{n}=(V_{N_z,1}^{n},\dots,V_{N_z,N_q-1}^{n})^{T}$, $\boldsymbol{\Psi}_{N_z}^{n+1}=(\Psi_{N_z,1}^{n+1},\dots,\Psi_{N_z,N_q-1}^{n+1})^{T}$, and $\mathbf{V}_{N_z-1}^{n}=(V_{N_z-1,1}^{n},\dots,V_{N_z-1,N_q-1}^{n})^{T}$. Letting $I_{N_q-1}$ denote the $(N_q-1)\times(N_q-1)$ identity matrix, $K$ and $M$ are $(N_q-1)\times(N_q-1)$ diagonal matrices given by
$$K=(1+\Delta t\,B_0)\,I_{N_q-1},\quad M=(1+\Delta t\,C_{N_z})\,I_{N_q-1}.$$
Substituting $k=0$ and $k=N_q$, respectively, in the difference equation (35), we obtain the following for $n=0,\dots,N_t-1$:
$$G\,\mathbf{V}_0^{n}=\boldsymbol{\Psi}_0^{n+1},\quad\text{for } i=1,\dots,N_z-1,\qquad G\,\mathbf{V}_{N_q}^{n}=\boldsymbol{\Psi}_{N_q}^{n+1},\quad\text{for } i=1,\dots,N_z-1, \tag{47}$$
where $\mathbf{V}_0^{n}=(V_{1,0}^{n},\dots,V_{N_z-1,0}^{n})^{T}$, $\boldsymbol{\Psi}_0^{n+1}=(\Psi_{1,0}^{n+1},\dots,\Psi_{N_z-1,0}^{n+1})^{T}$, $\mathbf{V}_{N_q}^{n}=(V_{1,N_q}^{n},\dots,V_{N_z-1,N_q}^{n})^{T}$, and $\boldsymbol{\Psi}_{N_q}^{n+1}=(\Psi_{1,N_q}^{n+1},\dots,\Psi_{N_z-1,N_q}^{n+1})^{T}$. Further details can be obtained in [2].
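For a fixed time step and storage level, the interior system with the tridiagonal matrix $G$ can be solved in linear time. The Python sketch below assembles the three diagonals from given coefficients $A_i$, $B_i$, $C_i$ and solves the system with the Thomas algorithm. This is an illustrative choice of solver, not necessarily the one used by the authors; the boundary rows are written as they result from eliminating $V_{0,k}^n=2V_{1,k}^n-V_{2,k}^n$ and $V_{N_z,k}^n=2V_{N_z-1,k}^n-V_{N_z-2,k}^n$, and all names and numbers are hypothetical.

```python
def build_G(A, B, C, dt):
    """Diagonals (h, e, p) of G; lists A, B, C hold the coefficients for i = 1..N_z - 1."""
    m = len(B)
    e = [1.0 + dt * B[j] for j in range(m)]      # main diagonal
    h = [-dt * C[j] for j in range(m)]           # sub-diagonal (h[0] is unused)
    p = [-dt * A[j] for j in range(m)]           # super-diagonal (p[m-1] is unused)
    e[0] -= 2.0 * dt * C[0]                      # first row after eliminating V_0
    p[0] = dt * (C[0] - A[0])
    e[-1] -= 2.0 * dt * A[-1]                    # last row after eliminating V_{N_z}
    h[-1] = -dt * (C[-1] - A[-1])
    return h, e, p

def thomas(h, e, p, rhs):
    """Solve the tridiagonal system by forward elimination and back substitution."""
    m = len(e)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = p[0] / e[0], rhs[0] / e[0]
    for j in range(1, m):
        denom = e[j] - h[j] * cp[j - 1]
        cp[j] = p[j] / denom
        dp[j] = (rhs[j] - h[j] * dp[j - 1]) / denom
    x = [0.0] * m
    x[-1] = dp[-1]
    for j in range(m - 2, -1, -1):
        x[j] = dp[j] - cp[j] * x[j + 1]
    return x

def apply_G(h, e, p, x):
    """Multiply G by x, used here only to check the residual."""
    m = len(e)
    return [e[j] * x[j]
            + (h[j] * x[j - 1] if j > 0 else 0.0)
            + (p[j] * x[j + 1] if j < m - 1 else 0.0) for j in range(m)]

# Hypothetical coefficients and right-hand side for N_z - 1 = 4 interior nodes.
A, B, C = [1.0, 1.2, 0.8, 1.1], [2.5, 2.6, 2.4, 2.7], [1.1, 0.9, 1.3, 1.0]
psi = [1.0, 2.0, 3.0, 4.0]
h, e, p = build_G(A, B, C, dt=0.1)
V = thomas(h, e, p, psi)
print(V, apply_G(h, e, p, V))
```

The same routine can be reused for each $k$, since $G$ does not depend on the storage index once the coefficients at time $t_n$ are known.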
Backward Recursion Algorithm: The approximate optimal control presented in (27), along with the value functions in (41), (44), (45), and (47), can be computed using Algorithm 1, proceeding backward in time from the terminal time step $N_t$. To determine $\upsilon^*=\upsilon_{i,k}^{*n}$, we take into account the storage level and distinguish between scenarios of unsatisfied demand and overproduction. This classification indicates which subinterval of $\mathcal{U}_d^{n}$ should be evaluated. Subsequently, an optimization procedure is carried out to identify the control value that minimizes the objective function. The resulting control is then interpreted as the optimal decision for the given state variables at grid points $z_i^R$ and $q_k$. This procedure is applied iteratively across all combinations of grid points to recover the complete set of optimal controls.

5. Numerical Results

In this section, we discuss the numerical solution for the prosumer’s problem. The results are based on the implementation of Algorithm 1 to find the optimal strategies and the value functions as well as studying the properties of the obtained value functions. As the terminal condition, we consider a penalization problem modeled as
$$\Phi(q)=\begin{cases}P_{pen}\,\dfrac{m_Q c_P\,(q_{pen}-q)}{\eta_E^C}, & q<q_{pen},\\[6pt] 0, & q\ge q_{pen}.\end{cases} \tag{48}$$
For the purpose of the numerical simulations, the seasonality function and heat price are modeled, respectively, by
$$\mu_R(t)=c_0+c\cos\left(\frac{2\pi t}{\rho}\right),\quad P_{buy}(t)=\ell_0+\ell\cos\left(\frac{2\pi t}{\rho}\right),\quad P_{sell}(t)=P_{buy}(t)-\xi. \tag{49}$$
The full description of the model parameter values is given in Table 1.
Parameters $\gamma$, $c_0$, $c$, $b_1$, and $b_2$ are calibrated to the model, and the idea for the calibration is provided in [2].
Terminal Cost: Figure 2 shows the terminal cost $\Phi(q)$ as a function of the residual demand $r$ and the ES storage level $q$. Here, the terminal cost is formulated as a penalization problem. By construction as in (48), we observe that the value function is constant with respect to the residual demand at the terminal time. However, if the storage level falls below the reference temperature $q_{ref}=40$ °C, the prosumer incurs a cost that is proportional to the deviation from $q_{ref}$ and scaled by the charging efficiency $\eta_E^C$. This cost increases the further the storage temperature is below the reference. For temperatures above the reference, no additional cost is incurred, and the storage is effectively considered to have no residual value.
Value Function and Optimal Strategy for Time-Dependent Heat Buying and Selling Prices: Figure 3 depicts the value functions and optimal strategies of the prosumer at the initial time $t=0$ and on day $t=362$. From the top- and bottom-left panels, we can observe that as the ES temperature increases, the prosumer incurs lower costs. Conversely, as the residual demand increases, the prosumer incurs higher costs. The top-right panel shows that, for an unsatisfied demand ($r\ge 0$), the prosumer satisfies all residual demand via the CHS if the ES is empty; otherwise, it discharges the ES. In the case of overproduction ($r<0$), all excess production is sold to the CHS. From the bottom-right panel, we can note that, for an unsatisfied demand, the prosumer satisfies all residual demand via the CHS when $q<40$ °C. For $q\ge 40$ °C, all residual demand is satisfied by the ES. In the case of overproduction, the prosumer first stores some energy in the ES and subsequently sells any remaining excess to the CHS. Since the horizon is one year (365 days), with a penalization cost at the terminal time, we can observe that on day 362, the prosumer begins filling the storage close to the penalty temperature $q_{ref}=40$ °C to avoid higher terminal costs.
Figure 4 shows the prosumer's value functions and optimal strategies as functions of the residual demand $r$ and the ES temperature level $q$ on day $t=230$. In the top-left panel, we can observe that, for a given residual demand, the prosumer incurs lower costs as the ES temperature increases. The expected aggregated discounted cost is higher for the strongest unsatisfied residual demand and lower for the strongest overproduction. On the other hand, the bottom-left panel shows that the cost increases with respect to the residual demand, with the highest cost incurred for an empty storage and the smallest cost for a full storage. In the top-right panel, we can observe that, for both the strongest and smallest unsatisfied demand, the prosumer first purchases thermal energy from the CHS. Afterwards, for a sufficiently filled ES, it discharges the storage to meet the unsatisfied demand. For the strongest overproduction, the prosumer stores all excess production in the ES and only sells to the CHS if the ES is full. In the bottom-right panel, for both an empty and a 50%-full ES, the prosumer stores the excess production in the ES in the case of overproduction and satisfies all unsatisfied demand via the CHS in the case of an unsatisfied residual demand. For a full ES, the prosumer compensates for the heat loss of the ES to the environment in the case of overproduction. For an unsatisfied demand, all residual demand is satisfied by the ES.
Value Function and Optimal Strategy for Constant Heat Buying and Selling Prices: For the case of constant heat buying and selling prices, we focus on the value function and optimal strategies as functions of the residual demand $r$ and the ES temperature level $q$. Let $P_{buy}^{C}$ and $P_{sell}^{C}$ denote the constant heat buying and selling prices, respectively. In the following, we model $P_{buy}^{C}$ and $P_{sell}^{C}$ as
$$P_{buy}^{C}=\max_{t\in[0,T]}P_{buy}(t),\quad P_{sell}^{C}=P_{buy}^{C}-\xi,$$
where $P_{buy}$, $P_{sell}$, and $\xi$ are the same as in (49). In Figure 5, we focus on the top- and bottom-right panels since the top- and bottom-left panels have the same interpretation as in Figure 4. In the top-right panel of Figure 5, for both the strongest and the smallest unsatisfied demand, the prosumer discharges the ES if it is not empty; otherwise, all residual demand is satisfied by the CHS, at a higher cost. For the strongest overproduction, all excess thermal energy is sold to the CHS for revenue. As shown in the bottom-right panel, for an empty ES and overproduction, the prosumer sells all excess production to the CHS, whereas for an unsatisfied demand, all residual demand is satisfied by the CHS. For a half-full or full ES, the prosumer sells all excess production to the CHS in the event of overproduction and discharges the ES in the event of an unsatisfied demand.
Let $V^{max}$ denote the maximum value function of the prosumer for time-dependent heat buying and selling prices, $V_{Constant}^{max}$ the maximum value function of the prosumer for constant heat buying and selling prices, and $\tilde{V}^{max}$ the maximum value function of the consumer for time-dependent heat buying and selling prices.
In Table 2, we note that the prosumer incurs a higher expected aggregated discounted cost under the time-dependent heat prices model than with its constant counterpart. This is due to the fact that the time-dependent heat prices model reflects the changes in heat prices during cold and warm seasons, which is not observed in the constant pricing case. Since the consumer is not equipped with heat production and storage units, it relies solely on the CHS for its unsatisfied residual demand, which leads to a much higher expected aggregated discounted cost. From the values of $\tilde{V}^{max}$ and $V^{max}$, we can compute the expected consumer investment cost into heat production and storage units in order to reduce the cost of satisfying its heating and hot water demands. This investment cost is denoted by $\tilde{C}_{invest}=\tilde{V}^{max}-V^{max}$.

5.1. Sensitivity Analysis

In this section, we perform a sensitivity analysis on $\sigma_0$, $\eta_E^C$, and $\eta_E^D$ to study their impact on the expected aggregated discounted cost. The results presented correspond to the case of time-dependent heat buying and selling prices. All parameters are fixed, with values given in Table 1, except for $\sigma_0$, $\eta_E^C$, and $\eta_E^D$, which are varied. Throughout this section, we let $V^{max}$ denote the maximum value function, as in Table 2.
In Table 3, we can observe that as the impact of the weather conditions increases, the aggregated discounted cost becomes higher. This is the case because higher values of $\sigma_0$ increase the residual demand, which, in turn, increases the prosumer's cost. Thus, the more pronounced the impact of weather conditions is on the heating system, the higher the aggregated discounted cost the prosumer will incur. For example, for $\sigma_0=0$, the prosumer incurs a maximum cost of $V^{max}=1346.4$ EUR, whereas for $\sigma_0=0.4$, $V^{max}=1562.9$ EUR, which is approximately a $16.08\%$ increase from the case of no common noise.
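The quoted percentage can be reproduced from the two costs stated above. The short Python check below uses only those two reported values; the helper name is hypothetical.

```python
def relative_increase(base, new):
    """Relative increase of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

v_no_noise = 1346.4      # V^max in EUR for sigma_0 = 0, as reported in the text
v_noise = 1562.9         # V^max in EUR for sigma_0 = 0.4, as reported in the text
increase = round(relative_increase(v_no_noise, v_noise), 2)
print(increase)          # 16.08
```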
In Table 4, we can observe that the better the discharging efficiency, the smaller the cost incurred by the prosumer. This is due to the fact that, for a weak discharging efficiency, the prosumer, when satisfying residual demand via the ES, discharges more thermal energy to compensate for the loss during the process. A better discharging efficiency therefore reduces losses during the discharge and contributes to the reduction in the prosumer's aggregated discounted cost. For a discharging efficiency $\eta_E=\frac{1}{0.9}$, we can observe an increase of approximately $1.85\%$ in the aggregated discounted cost as compared to the case of a perfect discharging efficiency $\eta_E=1$.
As the charging efficiency improves in Table 5, we can observe that the prosumer incurs a smaller aggregated discounted cost. For a weak charging efficiency, not all overproduction can be stored in the ES, due to the losses to the environment during charging. Therefore, as the charging efficiency improves, more thermal energy can be stored in the ES for further use. For a charging efficiency of 0.9, we observe an approximate $1.94\%$ increase in the aggregated discounted cost as compared to a perfect charging efficiency.

5.2. Optimal Path of ES Level

In Figure 6, we show the optimal path of the ES temperature level $Q$ together with the residual demand $R$, the seasonality function $\mu_R$, and the absolute optimal control $\alpha^* R$ for a time-dependent heat price. At the initial time, we assume that the prosumer starts with a full storage. Hence, for an unsatisfied residual demand, the prosumer discharges the ES until it becomes empty. For overproduction, the prosumer slowly charges the ES and drives it almost full in late summer. We further observe that for $R>0$, the prosumer discharges the ES to satisfy the unsatisfied demand and charges the ES for $R<0$.

6. Summary and Outlook

In this work, we investigated a stochastic optimal control problem involving a prosumer and a consumer, both connected to a CHS. The prosumer is equipped with a local renewable heat production source (solar collector) and a local storage unit. The ES allows the prosumer to store excess thermal energy for future use. However, we assume that it is subject to charging and discharging efficiencies. In contrast, the consumer has no local production or storage unit and is therefore constantly subject to an unsatisfied demand. We focused primarily on the prosumer problem, as the consumer always satisfies all residual demand directly via the CHS. The prosumer's problem was formulated as a mathematical optimization problem, which we solved using semi-Lagrangian techniques. We presented the numerical results for both a time-dependent and a constant heat price formulation and observed that, under the former price formulation, the prosumer incurred a higher maximum expected aggregated discounted cost than in the latter price formulation. We further performed a sensitivity analysis on the common noise and charging/discharging efficiency parameters. For the case of the common noise, we noted that for $\sigma_0=0.4$, the maximum expected aggregated discounted cost increases by approximately $16.08\%$ as compared to no common noise. Meanwhile, for the discharging efficiency, we observed that for $\eta_E=\frac{1}{0.9}$, the prosumer incurs a cost about $1.85\%$ higher than in the case of a perfect discharging efficiency. Also, for the charging efficiency, we noted that for $\eta_E=0.90$, the maximum expected aggregated discounted cost increases by approximately $1.94\%$ as compared to the case of a perfect charging efficiency. Our numerical results also highlight the value function and the optimal strategies of the prosumer. Finally, for the consumer, we evaluated the maximum expected investment required in local heat production and storage to transform into a prosumer.
Looking ahead, it would be interesting to incorporate additional weather factors, such as ambient temperature, to better model the residual demand. Furthermore, modeling electricity prices as a stochastic differential equation (SDE) could provide more realistic dynamics. These extensions are currently under investigation in ongoing research.

Author Contributions

Conceptualization, M.G.S.; Methodology, M.G.S.; Software, M.G.S.; Validation, M.G.S.; Formal Analysis, M.G.S. and J.N.; Writing—Original Draft, M.G.S.; Writing—Review and Editing, M.G.S. and J.N.; Supervision, J.N.; Funding Acquisition, M.G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada, www.idrc.ca (accessed on 21 November 2025); and with financial support from the Government of Canada, provided through Global Affairs Canada (GAC), www.international.gc.ca (accessed on 21 November 2025).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gjoka, K.; Rismanchi, B.; Crawford, R.H. Fifth-generation district heating and cooling systems: A review of recent advancements and implementation barriers. Renew. Sustain. Energy Rev. 2023, 171, 112997.
  2. Ganet Somé, M. Stochastic optimal control of prosumers in a district heating system. arXiv 2025, arXiv:2501.09088.
  3. Bilardo, M.; Sandrone, F.; Zanzottera, G.; Fabrizio, E. Modelling a fifth generation bidirectional low temperature district heating and cooling (5gdhc) network for nearly zero energy district (nzed). Energy Rep. 2021, 7, 8390–8405.
  4. Bünning, F.; Wetter, M.; Fuchs, M.; Müller, D. Bidirectional low temperature district energy systems with agent-based control: Performance comparison and operation optimization. Appl. Energy 2018, 209, 502–515.
  5. Li, H.; Wang, S.J. Challenges in smart low-temperature district heating development. Energy Procedia 2014, 61, 1472–1475.
  6. Alasseur, C.; Ben Taher, I.; Matoussi, A. An extended mean field game for storage in smart grids. J. Optim. Theory Appl. 2020, 184, 644–670.
  7. Takam, P.H.; Wunderlich, R. Cost-optimal management of a residential heating system with a geothermal energy storage under uncertainty. Int. J. Dyn. Control 2025, 13, 424.
  8. Djehiche, B.; Barreiro-Gomez, J.; Tembine, H. Price dynamics for electricity in smart grid via mean-field-type games. Dyn. Games Appl. 2020, 10, 798–818.
  9. Dumitrescu, R.; Leutscher, M.; Tankov, P. Energy transition under scenario uncertainty: A mean-field game of stopping with common noise. Math. Financ. Econ. 2024, 18, 233–274.
  10. Elie, R.; Hubert, E.; Mastrolia, T.; Possamaï, D. Mean–field moral hazard for optimal energy demand response management. Math. Financ. 2021, 31, 399–473.
  11. Escribe, C.; Garnier, J.; Gobet, E. A mean field game model for renewable investment under long-term uncertainty and risk aversion. Dyn. Games Appl. 2024, 14, 1093–1130.
  12. Frihi, Z.E.O.; Choutri, S.E.; Barreiro-Gomez, J.; Tembine, H. Hierarchical mean-field type control of price dynamics for electricity in smart grid. J. Sys. Sci. Complex. 2022, 35, 1–17.
  13. Fujii, M.; Takahashi, A. A mean field game approach to equilibrium pricing with market clearing condition. SIAM J. Control Optim. 2022, 60, 259–279.
  14. Verrilli, F.; Srinivasan, S.; Gambino, G.; Canelli, M.; Himanka, M.; Del Vecchio, C.; Sasso, M.; Glielmo, L. Model predictive control-based optimal operations of district heating system with thermal energy storage and flexible loads. IEEE Trans. Autom. Sci. Eng. 2016, 14, 547–557.
  15. Neckel, T.; Rupp, F. Random Differential Equations in Scientific Computing; Walter de Gruyter: Berlin, Germany, 2013.
  16. D’Halluin, Y.; Forsyth, P.A.; Labahn, G. A semi-lagrangian approach for american asian options under jump diffusion. SIAM J. Sci. Comput. 2005, 27, 315–345.
  17. Chen, Z.; Forsyth, P.A. A semi-lagrangian approach for natural gas storage valuation and optimal operation. SIAM J. Sci. Comput. 2008, 30, 339–368.
  18. Ware, A. Accurate semi-lagrangian time stepping for stochastic optimal control problems with application to the valuation of natural gas storage. SIAM J. Financ. Math. 2013, 4, 427–451.
  19. Duffy, D.J. Finite Difference Methods in Financial Engineering: A Partial Differential Equation Approach; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  20. Yong, J.; Zhou, X.Y. Stochastic Controls: Hamiltonian Systems and HJB Equations; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999; Volume 43.
  21. Touzi, N. Stochastic Control Problems, Viscosity Solutions and Application to Finance; Scuola normale Superiore; Springer: Berlin/Heidelberg, Germany, 2007.
  22. Pham, H. Continuous-Time Stochastic Control and Optimization with Financial Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009; Volume 61.
  23. Courant, R.; Friedrichs, K.; Lewy, H. Über die partiellen differenzengleichungender mathematischen physik. Math. Ann. 1928, 100, 32–74. [Google Scholar] [CrossRef]
Figure 1. A model with a consumer and a prosumer connected to a community heating system.
Figure 2. Terminal cost function $\Phi(q) = V(t_{N_t}, x)$ for a penalization problem.
Figure 3. Value functions and optimal strategies of the prosumer at $t = 0$ and $t = 362$ days for time-dependent heat buying and selling prices.
Figure 4. Value functions (left) and optimal strategies (right) as functions of the residual demand r and ES temperature level q at t = 230 days for time-dependent heat buying and selling prices.
Figure 5. Value functions (left) and optimal strategies (right) as functions of the residual demand r and ES temperature level q at t = 230 days for constant heat buying and selling prices.
Figure 6. Optimal path of the ES $Q$ (blue), seasonality function $\mu_R$ (magenta), residual demand $R$ (red), and optimal absolute control $\alpha^{*} R$ (green). The minimum and maximum storage levels are represented by dotted lines (reddish orange and golden yellow, respectively).
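Table 1. Model and discretization parameters.

| Parameters | Values | Units | Parameters | Values | Units |
|---|---|---|---|---|---|
| $\kappa_R$ | 0.025 | h$^{-1}$ | $S$ | 0.335 | kWh |
| $\sigma_R$, $\sigma_0$ | 0.005, 0.4 | kWh | $P_{pen}$ | 0.325 | kWh |
| $c_0$, $c$ | 0.37, 1.00 | kW | $\xi$ | 0.02 | kWh |
| $\ell_0$, $\ell$ | 0.17, 0.15 | kWh | | | |
| $\rho$ | 8760 | h | $\underline{z}_R$, $\overline{z}_R$ | −5.37, 5.37 | kW |
| $b_1$ | 0.01 | | $T$, $\Delta t$ | 8760, 1 | h |
| $b_2$ | 0.012 | K$^{-1}$ | $N_t$ | 8760 | |
| $m_Q$ | 7854 | kg | $\eta_E^D$, $\eta_E^C$ | 0.95, 0.95 | |
| $c_P$ | 0.0012 | kWh/(kg K) | $N_z$, $N_q$ | 85, 60 | |
| $A$ | 21.99 | m$^2$ | $P_c$, $i_d$ | 20, 25 | °C |
| $\gamma$ | $2.34 \times 10^{-4}$ | kW/(m$^2$ K) | $\underline{q}$, $q_{pen}$, $\overline{q}$ | 25, 40, 85 | °C |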
Table 2. Maximum value function for an imperfect efficiency with time-dependent and constant heat buying and selling prices, in EUR, for a prosumer and a consumer.

| Value Function | Numerical Value |
|---|---|
| $V^{\max}$ | 1562.9 |
| $V^{\max}_{Constant}$ | 1465.1 |
| $\tilde{V}^{\max}$ | 40,474.0 |
Table 3. Maximum value function from varying common noise volatility coefficient $\sigma_0$ under time-dependent heat buying and selling prices, in EUR.

| $\sigma_0$ | 0 | 0.1 | 0.15 | 0.25 | 0.35 | 0.4 |
|---|---|---|---|---|---|---|
| $V^{\max}$ | 1346.4 | 1377.3 | 1400.8 | 1459.4 | 1527.1 | 1562.9 |
Table 4. Maximum value function for varying $\eta_E^D$ under time-dependent heat buying and selling prices, in EUR.

| $\eta_E^D$ | 0.90 | 0.92 | 0.94 | 0.96 | 1 |
|---|---|---|---|---|---|
| $V^{\max}$ | 1574.3 | 1570.2 | 1565.5 | 1559.9 | 1545.7 |
Table 5. Maximum value function for varying $\eta_E^C$ under time-dependent heat buying and selling prices, in EUR.

| $\eta_E^C$ | 0.90 | 0.92 | 0.94 | 0.96 | 1 |
|---|---|---|---|---|---|
| $V^{\max}$ | 1575.2 | 1570.7 | 1565.7 | 1559.7 | 1545.2 |
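
The relative cost increases quoted in the abstract can be recomputed from the endpoint values in Tables 3–5. The following is a minimal sketch of that arithmetic only; the helper name `relative_increase_pct` is hypothetical and is not taken from the paper.

```python
def relative_increase_pct(value: float, baseline: float) -> float:
    """Percentage increase of `value` over `baseline` (hypothetical helper)."""
    return 100.0 * (value - baseline) / baseline


# Table 3: sigma_0 = 0.4 compared with no common noise (sigma_0 = 0).
common_noise = relative_increase_pct(1562.9, 1346.4)

# Table 4: discharging efficiency 0.90 compared with perfect efficiency (1).
discharging = relative_increase_pct(1574.3, 1545.7)

# Table 5: charging efficiency 0.90 compared with perfect efficiency (1).
charging = relative_increase_pct(1575.2, 1545.2)

print(f"common noise: {common_noise:.2f} %")
print(f"discharging:  {discharging:.2f} %")
print(f"charging:     {charging:.2f} %")
```

Under these inputs the three ratios come out at roughly 16.08 %, 1.85 %, and 1.94 %, in line with the figures reported in the abstract.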