Stochastic Optimal Control Problem and Sensitivity Analysis for a Residential Heating System

Somé, Maalvladédon Ganet; Niyobuhungiro, Japhet

doi:10.3390/math14030489

by Maalvladédon Ganet Somé ¹,²,* and Japhet Niyobuhungiro ¹,³

¹ Department of Mathematics, School of Science, College of Science and Technology, University of Rwanda, Kigali P.O. Box 4285, Rwanda

² African Institute for Mathematical Sciences, Ghana, Legon, Accra, 1st Shoppers Street, Spintex, Accra P.O. Box LGDTD 20046, Ghana

³ National Council for Science and Technology, Kigali P.O. Box 2285, Rwanda

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(3), 489; https://doi.org/10.3390/math14030489

Submission received: 22 November 2025 / Revised: 17 December 2025 / Accepted: 19 December 2025 / Published: 30 January 2026

(This article belongs to the Special Issue Intelligent Optimization and Control Modeling in Power and Energy System)

Abstract

We consider a network of a residential heating system (RHS) composed of two types of agents: a prosumer and a consumer. Both are connected to a community heating system (CHS), which supplies non-intermittent thermal energy for space heating and domestic hot water. The prosumer utilizes a combination of solar thermal collectors and CHS heat, whereas the consumer depends entirely on the CHS. Any excess heat generated by the prosumer can either be stored on-site or fed back into the CHS. Weather conditions, modeled as a common noise term, affect both agents simultaneously. The prosumer’s objective is to minimize the expected discounted total cost, taking into account storage charging and discharging losses as well as uncertainties in future heat production and demand. This leads to a stochastic optimal control problem addressed through dynamic programming techniques. Scenario-based analyses are then performed to examine how different parameters influence both the value function and the resulting optimal control strategies. For a common noise coefficient $σ_{0} = 0.4$, the prosumer incurs an approximate 16.08% increase in the aggregated discounted cost relative to the case of no common noise. For a discharging efficiency $η_{E} = \frac{1}{0.9}$, the maximum aggregated discounted cost increases by approximately 1.85% compared to perfect discharging efficiency. Similarly, for a charging efficiency $η_{E} = 0.9$, we observe an approximate 1.94% increase in the aggregated discounted cost compared to perfect charging efficiency. Furthermore, we derive insights into the maximum expected discounted investment that a consumer would need to make in renewable technologies in order to transition into a prosumer.

Keywords:

residential heating system; prosumer; stochastic optimal control; dynamic programming; storage efficiency; thermal energy storage

MSC:

93E20; 49L20; 91G80; 65N06

1. Introduction

In recent decades, global weather patterns have shifted markedly, largely due to human-driven factors such as extensive fossil fuel use and deforestation, both of which have significantly increased the global carbon footprint. Projections from the United Nations Department of Economic and Social Affairs indicate that, by 2050, urbanization will reach 68%, with more than 40 metropolitan areas exceeding 10 million inhabitants [1]. As noted in [1], cities currently account for over two-thirds of global $\mathrm{CO}_{2}$ emissions and energy consumption. These combined pressures (accelerating urbanization, climate change, and rising energy demand) underscore the urgent need to transition away from a business-as-usual economic model toward one that prioritizes sustainability and energy efficiency across all sectors. According to the International Energy Agency (IEA), buildings contribute roughly 30% of global final energy use and 26% of energy-related emissions. Consistent with [2], in the United States and many European countries, 18–30% of total energy consumption is associated with the thermal energy needs of buildings.

In this study, we consider a residential heating system (RHS) composed of two agents, a prosumer and a pure consumer (hereafter simply “consumer”), as illustrated in Figure 1. In this setting, a prosumer is an entity capable of producing and consuming thermal energy, whereas a consumer only uses energy supplied to it. The RHS provides heat for space heating and domestic hot water demands. The prosumer is equipped with a solar thermal collector for local heat generation, an internal storage (IS), and an external storage (ES), both modeled as water-based storage tanks for simplicity [2]. The IS serves short-term storage needs, while the ES accommodates longer-term storage.

Because solar production and residual thermal demand are highly intermittent and weather-dependent, additional support is required to ensure a stable and sustainable thermal supply. To address this, both buildings are connected to a community heating system (CHS). In this work, we consider a low-temperature bidirectional CHS (LTB-CHS) [1,3,4,5] and assume, for simplicity, that the temperature in the distribution pipes remains constant. The LTB-CHS is a relatively recent concept in district energy networks, offering benefits such as improved integration of renewable heat sources, simplified coupling with distributed technologies, and reduced carbon emissions. Its bidirectional capability allows buildings to inject surplus heat into the network, lowering peak central production requirements and enabling economic incentives whereby building owners can sell excess heat. Since the supply temperature in an LTB-CHS is lower than that in conventional systems, a heat pump (HP) is required to upgrade the delivered heat to the desired level.

In our modeling approach, daily temperature variations influence both the prosumer and consumer and are represented by a common noise process. Following [2,6,7], we define residual demand as the imbalance between thermal supply and demand. We assume that the prosumer can meet this residual demand either via the CHS or by using the ES, whereas the consumer relies solely on the CHS. When production falls short, the prosumer either purchases heat from the CHS or releases it from the ES. Conversely, during periods of surplus production, the prosumer may store the excess in the ES (subject to capacity limits) or sell it to the CHS. The consumer, lacking local production and storage, always faces unmet demand that must be satisfied exclusively through CHS purchases. Consistent with [2], we assume that the CHS is responsible for fulfilling the residual thermal demands of both agents, even though this demand fluctuates with respect to the common noise. Moreover, energy transfer between the IS and ES, as well as energy exports from the prosumer to the CHS, requires an ordinary pump (OP). The prosumer aims to minimize the expected total discounted cost associated with meeting its thermal demand, while accounting for uncertain weather conditions and variability in solar output. In contrast, the consumer—lacking local production or storage—continuously faces unmet thermal demand that must be supplied entirely from the CHS.

Including both agents in our framework enables us to determine the maximum expected investment that would make it economically reasonable for a consumer to adopt distributed renewable heating technologies and storage infrastructure, thereby transitioning into a prosumer.

Previous work, such as [2], has formulated an RHS model and solved a stochastic optimal control problem using semi-Lagrangian and dynamic programming approaches. In Ref. [7], the authors addressed a similar problem for a stand-alone system with geothermal storage. However, these studies did not explicitly model the dependence of residual demand on daily temperature variations via a common noise process, nor did they include ES charging and discharging inefficiencies. Our model fills this gap by incorporating these features, which increase realism. Models with common noise are advantageous because they reflect uncertainties that affect all agents simultaneously, potentially altering both qualitative and quantitative system behavior. Applications of common noise models in the energy sector include [6,8,9,10,11,12,13].

Unlike in [2,7], we incorporate ES charging and discharging efficiencies, which account for storage losses when heat is stored or released. For simplicity, we assume that these efficiency factors depend solely on the residual demand, as in [14].

Our contributions are as follows. First, we model a consumer connected to the CHS alongside the prosumer. Second, we introduce a common noise in the dynamics of the residual demands, which captures the impact of daily temperature variations on the residual demand. Third, we include charging and discharging efficiencies in the dynamics of the ES temperature level, since losses are observed in practice whenever the ES is charged or discharged. Fourth, we perform a sensitivity analysis on the common noise, charging, and discharging parameters to evaluate their impact on the expected total discounted cost. We also show that the value function is a viscosity solution of the associated partial differential equation (PDE). We derive the discrete-time state-dependent control set from the state constraint without imposing charging and discharging thresholds; in this work, the state-dependent control intervals depend on the charging and discharging efficiencies. We compare the costs of satisfying residual demands for both the consumer and the prosumer, and we estimate the expected investment that a consumer would need to undertake to transition into a prosumer. Finally, we perform extensive numerical experiments and provide economic interpretations.

The remainder of this paper is structured as follows: Section 2 introduces the model, including state variables, control variables, constraints, and the heat price specification. Section 3 presents the optimal control problems for both agents. Sections 4 and 5 describe the semi-Lagrangian discretization and the numerical results, including the sensitivity analysis. Section 6 concludes and outlines directions for future research.

2. Model Formulation

In what follows, we study a model that includes both a prosumer and a consumer connected to a CHS. The prosumer is able to generate thermal energy locally while also drawing from, or supplying heat to, the CHS. In contrast, the consumer has neither local production capabilities nor on-site storage units. We present a mathematical framework and corresponding optimization problems for both agents. Throughout this paper, we work on a fixed probability space $(Ω, \mathcal{F}, \mathbb{P})$ endowed with a filtration $\mathbb{F} = (\mathcal{F}_{t})_{t \in [0, T]}$ that satisfies the usual conditions and supports two independent standard Brownian motions $W_{R}$ and $W^{0}$. The parameter $T > 0$ denotes the finite planning horizon.

An RHS is designed to supply thermal energy for both space heating and domestic hot water in a building. The prosumer is equipped with a solar collector, two storage units, a heat pump, and ordinary pumps. Thermal energy generated by the solar collector is first directed to the internal storage (IS) to meet immediate demand, while the external storage (ES) serves as a long-term buffer to handle periods of insufficient or excess production. We describe the imbalance between thermal supply and demand, resulting from stochastic fluctuations in solar collector output, by a stochastic process $R = (R(t))_{t \in [0, T]}$. This process is decomposed as $R(t) = μ^{R}(t) + Z^{R}(t)$, where $μ^{R}(t)$ denotes the seasonal average and $Z^{R}(t)$ represents random deviations from this mean.

The IS is modeled as a cylindrical tank whose water temperature must remain within prescribed bounds, $i_{\min}$ and $i_{\max}$. In this study, we treat the IS as a black box and do not model its internal thermodynamic behavior explicitly. The ES is also represented as a cylindrical water tank with a fixed mass of stored water, which can be heated up to a maximum temperature $\bar{q}$ through transfers of thermal energy from the IS. Conversely, it can be cooled to a minimum temperature $\underline{q}$ when heat is withdrawn to satisfy the residual demand. For simplicity, we assume that charging and discharging the ES involves only ordinary pumps.

At any time $t \in [0, T]$, the prosumer’s state variables consist of the exogenous, deseasonalized residual demand $Z^{R}$ and the endogenous ES temperature level $Q$. We therefore define the prosumer’s state process as $X = (Z^{R}, Q)^{⊺}$.

For the consumer, the state is given solely by the deseasonalized residual demand $\tilde{Z}$, which is entirely exogenous.

Residual Demand: The deseasonalized residual demand processes $Z^{R}$ and $\tilde{Z}$, associated with the prosumer and the consumer, respectively, evolve according to

d Z^{R}(t) = - κ_{R} Z^{R}(t) \, dt + σ_{R}(t) \, d W_{R}(t) + σ_{0} \, d W^{0}(t),

(1)

d \tilde{Z}(t) = - κ_{0} \tilde{Z}(t) \, dt + σ_{0} \, d W^{0}(t),

(2)

where

W_{R}

represents the idiosyncratic noise capturing uncertainties specific to the prosumer’s heat demand, and

W^{0}

is the common noise accounting for shared weather-driven fluctuations.
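The coupling through the common noise in (1) and (2) can be illustrated with a minimal simulation sketch. It assumes a constant $σ_{R}$ and an Euler-Maruyama time discretization; the function name, its arguments, and any parameter values passed to it are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_residual_demands(kappa_r, kappa_0, sigma_r, sigma_0,
                              z0, z0_tilde, horizon, n_steps, seed=0):
    """Euler-Maruyama paths of Z^R (prosumer) and Z-tilde (consumer).

    Assumption: sigma_r is constant in time, whereas (1) allows sigma_R(t).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    z = np.empty(n_steps + 1)
    z_tilde = np.empty(n_steps + 1)
    z[0], z_tilde[0] = z0, z0_tilde
    for n in range(n_steps):
        dw_r = rng.normal(0.0, np.sqrt(dt))  # idiosyncratic increment (prosumer only)
        dw_0 = rng.normal(0.0, np.sqrt(dt))  # common increment, shared by both agents
        z[n + 1] = z[n] - kappa_r * z[n] * dt + sigma_r * dw_r + sigma_0 * dw_0
        z_tilde[n + 1] = z_tilde[n] - kappa_0 * z_tilde[n] * dt + sigma_0 * dw_0
    return z, z_tilde
```

Because the same increment `dw_0` enters both updates, a weather shock moves both residual demands in the same direction, which is the defining feature of the common noise.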

The seasonal behavior of the residual demands is modeled by bounded and deterministic seasonality functions $μ^{R}, \tilde{μ} : [0, T] \to \mathbb{R}$ for the prosumer and the consumer, respectively. A typical choice for $\hat{μ} \in \{μ^{R}, \tilde{μ}\}$ is

\hat{μ}(t) = c_{0} + c t + \sum_{j = 1}^{m_{1}} \left( c_{j} \cos\left(\frac{2 π}{ρ_{j}} (t - t_{j})\right) + d_{j} \sin\left(\frac{2 π}{ρ_{j}} (t - t_{j})\right) \right),

(3)

where

c_{0}

is the long-term mean residual demand;

c_{j}, d_{j}

represent the amplitudes of seasonality component j of the cosine and sine terms, respectively;

ρ_{j}, t_{j}

represent the length and a reference time of seasonality component j, respectively; c is the coefficient of the linear term; and

m_{1}

is the number of seasonality components.
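As a small sketch, the seasonality function in (3) can be evaluated as follows. The function name and argument names are hypothetical, and the coefficient lists are assumed to have the same length $m_{1}$.

```python
import numpy as np

def seasonal_mean(t, c0, c, cos_amp, sin_amp, periods, shifts):
    """Evaluate mu_hat(t) from Equation (3) for a scalar or array t."""
    t = np.asarray(t, dtype=float)
    value = c0 + c * t  # long-term mean plus linear trend
    for c_j, d_j, rho_j, t_j in zip(cos_amp, sin_amp, periods, shifts):
        phase = 2.0 * np.pi * (t - t_j) / rho_j
        value = value + c_j * np.cos(phase) + d_j * np.sin(phase)
    return value
```

For instance, a single daily component would use `periods=[24.0]` if time is measured in hours, which is an assumption about units made here for illustration only.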

Since the prosumer is equipped with a local production unit, which can lead to overproduction, $R(t)$ must be able to take both signs, and the decomposition $R(t) = μ^{R}(t) + Z^{R}(t)$ allows for this. Similarly, for the consumer we model $\tilde{R}(t) = \tilde{μ}(t) + \tilde{Z}(t)$ and choose the parameters of $\tilde{μ}(t)$ and $\tilde{Z}(t)$ such that $P(\tilde{R}(t) < 0) < ν$ for some $ν \ll 1$, so that the consumer’s residual demand is positive with high probability. Below, $R > 0$ is referred to as unsatisfied demand and $R < 0$ as overproduction.

The control $α(t) \in [0, 1]$ denotes the proportion of residual demand satisfied by the CHS. As in Ref. [2], we assume the decomposition $R(t) = α(t) R(t) + (1 - α(t)) R(t)$, where $α(t) R(t)$ is the residual demand satisfied by the CHS, and $(1 - α(t)) R(t)$ is the residual demand satisfied by the ES. Satisfying the residual demand via the CHS means either buying (for $R > 0$) or selling (for $R < 0$). Likewise, satisfying the residual demand via the ES means either discharging (for $R > 0$) or charging (for $R < 0$).

External Storage: In the following, we consider as storage a water tank whose water mass and specific heat capacity are given by $m_{Q}$ [kg] and $c_{P}$ [kWh/(kg K)], respectively. The surface area of the tank is denoted by $A$ [m²]. From the heat fluxes and Newton’s law of cooling with the heat transfer coefficient $γ$ [kW/(m² K)], the change in the thermal energy of the water inside the ES is given by

d Q(t) = - \frac{1}{m_{Q} c_{P}} \left( η_{E}(t) (1 - α(t)) R(t) + A γ (Q(t) - \underline{q}) \right) dt, \quad Q(0) = Q_{0}.

(4)

The first term $η_{E}(t) (1 - α(t)) R(t)$ represents the ES’s charging or discharging rate. Indeed, for an unsatisfied demand ($R(t) > 0$) and a non-empty ES ($Q(t) > \underline{q}$), the prosumer discharges the storage whenever $α(t) < 1$. For an overproduction ($R(t) < 0$) and a non-full ES ($Q(t) < \bar{q}$), the prosumer charges the storage whenever $α(t) < 1$. The second term $A γ (Q(t) - \underline{q})$ is the rate of heat transfer with the environment.

Here, $η_{E}$ represents the charging or discharging efficiency of the ES. Letting $η_{E}^{C}, η_{E}^{D} \in (0, 1)$, we model $η_{E}$ for simplicity as in [14] by

η_{E}(t) := η_{E}(R(t)) = \begin{cases} η_{E}^{C}, & \text{if } R(t) < 0, \\ 1 / η_{E}^{D}, & \text{if } R(t) \geq 0. \end{cases}

(5)
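The efficiency switch in (5) and the drift of the ES temperature in (4) can be sketched as two small functions. The names and the argument order are assumptions made for this illustration; `r` stands for the residual demand $R(t)$ and `alpha` for the CHS share $α(t)$.

```python
def storage_efficiency(r, eta_c, eta_d):
    """Efficiency factor from (5): eta_c when charging (r < 0), 1/eta_d otherwise."""
    return eta_c if r < 0 else 1.0 / eta_d

def storage_drift(q, r, alpha, eta_c, eta_d, m_q, c_p, area, gamma, q_min):
    """Right-hand side dQ/dt of (4) for ES level q, residual demand r, CHS share alpha."""
    eta = storage_efficiency(r, eta_c, eta_d)
    exchange = eta * (1.0 - alpha) * r      # > 0: discharging, < 0: charging
    losses = area * gamma * (q - q_min)     # heat transfer with the environment
    return -(exchange + losses) / (m_q * c_p)
```

With $η_{E}^{D} < 1$, covering one unit of unsatisfied demand drains more than one unit from the storage, and with $η_{E}^{C} < 1$ only a fraction of one unit of surplus ends up stored.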

Remark 1.

A large γ implies large losses per unit of time, prohibiting long-term storage. Conversely, a small γ implies small losses per unit of time, allowing for long-term storage.
In the case $α = 0$, the prosumer does not use the CHS and handles its entire residual demand via the ES. In the event of an unsatisfied demand ($R (t) > 0$), $α = 1$ implies that the prosumer satisfies all of the residual demand via the CHS. For an overproduction ($R (t) < 0$), $α = 1$ implies that the prosumer sells all of the surplus heat to the CHS.

State and Control Constraints: To reflect the physical limitations of the external storage, we require that $Q(t) \in [\underline{q}, \bar{q}]$, where $\underline{q}$ and $\bar{q}$ are the minimum and maximum storage levels, respectively. We refer to $Q(t) = \underline{q}$ as an empty storage and to $Q(t) = \bar{q}$ as a full storage.

The operational limitations experienced by the prosumer give rise to state-dependent control constraints:

1. If $Q(t) = \underline{q}$, only charging is feasible. Therefore,

If $R (t) > 0$, all residual demand is satisfied by the CHS, implying $α (t) = 1$.
If $R (t) < 0$, there are no control constraints other than the requirement $α (t) \in [0, 1]$.

2. If $Q(t) = \bar{q}$, we examine the following cases:

If $R (t) > 0$, there are no control constraints other than the requirement $α (t) \in [0, 1]$.
If $R (t) < 0$, the storage level must not increase, so (4) requires $η_{E}^{C} (1 - α (t)) R (t) + A γ (\bar{q} - \underline{q}) \geq 0$, which implies $α (t) \geq 1 + \frac{A γ (\bar{q} - \underline{q})}{R (t) η_{E}^{C}} =: \hat{χ} (t, R (t))$. In the following, we define $χ (t, R (t)) = \hat{χ} (t, R (t)) \lor 0$, abbreviated as $χ (t)$, and require $α (t) \geq χ (t)$.

Hence, the state-dependent set of feasible controls, $U(t, z^{R}, q)$, is defined by

U(t, z^{R}, q) = \begin{cases} [0, 1], & q > \underline{q}, \; z^{R} \geq - μ^{R}(t), \\ \{1\}, & q = \underline{q}, \; z^{R} \geq - μ^{R}(t), \\ [0, 1], & q < \bar{q}, \; z^{R} < - μ^{R}(t), \\ [χ(t), 1], & q = \bar{q}, \; z^{R} < - μ^{R}(t). \end{cases}
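The case distinction defining the feasible control set can be sketched as a function returning the lower and upper bounds of the admissible interval. The function name and signature are hypothetical; `r` denotes $μ^{R}(t) + z^{R}$, so that `r >= 0` corresponds to $z^{R} \geq - μ^{R}(t)$.

```python
def feasible_controls(q, r, q_min, q_max, eta_c, area, gamma):
    """Return (lower, upper) bounds of the feasible set for the CHS share alpha."""
    if r >= 0:
        # Unsatisfied demand: an empty storage cannot be discharged, so alpha = 1.
        return (1.0, 1.0) if q <= q_min else (0.0, 1.0)
    if q >= q_max:
        # Overproduction with a full storage: charging may at most offset the losses.
        chi_hat = 1.0 + area * gamma * (q_max - q_min) / (r * eta_c)
        return (max(chi_hat, 0.0), 1.0)
    # Overproduction with spare capacity: no additional constraint.
    return (0.0, 1.0)
```

The division by `r * eta_c` is only reached for `r < 0`, so no division by zero can occur in this sketch.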

Heat Price Formulation: We introduce time-varying heat buying and selling prices, denoted by $P_{\mathrm{buy}}(t)$ and $P_{\mathrm{sell}}(t)$, respectively, for all $t \in [0, T]$. We assume a contractual agreement between the prosumer and the CHS that binds the latter to always satisfy the residual demand of the former. As an incentive, the CHS retains the right to determine the values of $P_{\mathrm{buy}}(t)$ and $P_{\mathrm{sell}}(t)$. We model $P_{\mathrm{buy}}$ as a deterministic function composed of a constant baseline and multiple seasonal fluctuations, and $P_{\mathrm{sell}}$ as a fixed markdown from the buying price through the spread parameter $ξ > 0$. Therefore, we obtain

P_{\mathrm{buy}}(t) = \ell_{0} + \sum_{j = 1}^{m_{2}} \ell_{j} \cos\left(\frac{2 π}{ρ_{j}^{S}} (t - t_{j}^{S})\right), \quad \text{and} \quad P_{\mathrm{sell}}(t) = P_{\mathrm{buy}}(t) - ξ.

(6)

The constant term $\ell_{0}$ [€/kWh] corresponds to the long-term average heat price, while each cosine term introduces a seasonal component with amplitude $\ell_{j}$ [€/kWh], period $ρ_{j}^{S}$ [h], and phase shift $t_{j}^{S}$ [h]; $m_{2}$ represents the number of seasonal components. The sum in the heat price model means that the price is not driven by a single seasonal pattern but by the combined effects of several independent and interpretable cycles, each corresponding to a different physical, operational, or economic rhythm of the heating system. The deterministic seasonal structure makes prices transparent and predictable for the prosumer, while the positive spread guarantees economic viability for the CHS. The bid–ask spread reflects transaction costs, operational constraints, or the market power of the CHS. It ensures that the CHS is compensated for providing balancing and backup services.
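The price model in (6) can be sketched as follows. The function name and arguments are hypothetical, and any numerical values passed to it would be illustrative rather than the calibration used in the paper.

```python
import numpy as np

def heat_prices(t, base, amplitudes, periods, shifts, spread):
    """Return (P_buy(t), P_sell(t)) from (6) for a scalar or array t."""
    t = np.asarray(t, dtype=float)
    seasonal = sum(a * np.cos(2.0 * np.pi * (t - s) / p)
                   for a, p, s in zip(amplitudes, periods, shifts))
    p_buy = base + seasonal
    return p_buy, p_buy - spread  # selling price is a fixed markdown
```

Since the spread is constant, the selling price inherits every seasonal cycle of the buying price and is merely shifted downward by $ξ$.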

Assumption 1.

Let $P_{c}$ and $i_{d}$ denote the heat pump inlet and outlet temperatures, respectively. Similar to [2], we assume that the condition $\underline{q} > i_{d} > i_{\min} > P_{c}$ holds.

Assumption 1 implies that no heat pump is necessary when satisfying the residual demand via the ES. This justifies the use of ordinary pumps in our model.

3. Problem Formulation

In this section, we formulate both the prosumer’s and consumer’s problems. The prosumer’s objective is to satisfy its heating and hot water demand while minimizing the cost of running the pumps and payments to the CHS.

3.1. Prosumer’s Problem

In the following, we present a mathematical framework to determine both the value and the optimal control strategy for the prosumer’s problem. This leads to the formulation of a stochastic optimal control problem for which the state process is defined as $X = (Z^{R}, Q) \in \mathcal{X} = \mathbb{R} \times [\underline{q}, \bar{q}]$, representing the relevant system variables. From the dynamics described in (1) and (4), the state process evolves as follows:

\begin{aligned} d Z^{R}(t) &= f(Z^{R}(t)) \, dt + σ_{R}(t) \, d W_{R}(t) + σ_{0} \, d W^{0}(t), & Z^{R}(0) &= Z_{0}^{R} \in \mathbb{R}, \\ d Q(t) &= h(Z^{R}(t), Q(t), α(t)) \, dt, & Q(0) &= Q_{0} \in [\underline{q}, \bar{q}], \end{aligned}

(7)

where $f(z^{R}) = - κ_{R} z^{R}$ and $h(z^{R}, q, υ) = - \frac{1}{m_{Q} c_{P}} \left( η_{E}(t) (1 - υ) (μ^{R}(t) + z^{R}) + A γ (q - \underline{q}) \right)$, with $η_{E}(t)$ evaluated at $R(t) = μ^{R}(t) + z^{R}$ as in (5).

Assumption 2.

Let

(z^{R}, q) \in R \times [\underset{̲}{q}, \bar{q}]

and

q_{1}, q_{2} \in [\underset{̲}{q}, \bar{q}]

.

1. The function $h (z^{R}, q, υ)$ is $\mathcal{F}$-measurable.
2. The function $h (z^{R}, q, υ)$ is continuous for almost all $z^{R}$.
3. For almost all $z^{R}$, there exists $C \in \mathbb{R}$ such that $| h (z^{R}, q_{1}, υ) - h (z^{R}, q_{2}, υ) | \leq C | q_{1} - q_{2} |$.

As in [15], this assumption ensures that the random ordinary differential equation (RODE) in (7) is well defined.

Cost Formulation

The costs incurred for operating the prosumer’s system are divided into a running cost and a terminal cost. Together, they define the total cost that needs to be minimized.

Running Cost: This consists of the cost (revenue) of satisfying the residual demand via the CHS, along with the cost of electricity consumption for operating the pumps when interacting with both the CHS and the ES. At $t \in [0, T]$, the prosumer can either purchase thermal energy from the CHS at price $P_{\mathrm{buy}}(t)$ or sell thermal energy to it at price $P_{\mathrm{sell}}(t)$ to satisfy its residual demand. Let $φ_{0}$ denote the cost (revenue) per unit of time of buying (selling) thermal energy from (to) the CHS, defined by

φ_{0}(t, z^{R}, υ) = \begin{cases} υ (μ^{R}(t) + z^{R}) P_{\mathrm{buy}}(t), & z^{R} \geq - μ^{R}(t), \\ υ (μ^{R}(t) + z^{R}) P_{\mathrm{sell}}(t), & z^{R} < - μ^{R}(t). \end{cases}

Since we assume that the prosumer is connected to an LTB-CHS, it incurs an additional cost for the electricity consumed by the heat pump when satisfying the residual demand via the CHS. Let $b_{1}$ and $b_{2}$ [K⁻¹] denote positive constants penalizing a high water flow rate and a high temperature difference between $P_{c}$ and $i_{d}$, respectively. Letting $S$ denote the electricity price, the cost of electricity consumption $φ_{1}$ incurred when interacting with the CHS is modeled by

φ_{1}(t, z^{R}, υ) = \begin{cases} υ (μ^{R}(t) + z^{R}) (b_{1} + b_{2} (i_{d} - P_{c})) S, & z^{R} \geq - μ^{R}(t), \\ - υ (μ^{R}(t) + z^{R}) b_{1} S, & z^{R} < - μ^{R}(t). \end{cases}

For $z^{R} \geq - μ^{R}(t)$, the first case of $φ_{1}$ is the cost per unit of time of the electricity consumed by the heat pump to raise the temperature from $P_{c}$ to $i_{d}$. For $z^{R} < - μ^{R}(t)$, the second case is the cost per unit of time of the electricity consumed by the ordinary pump when selling thermal energy to the CHS; the leading minus sign makes this cost nonnegative, since $μ^{R}(t) + z^{R} < 0$ in this case.

The cost of satisfying the residual demand via the ES comes from the cost of electricity consumption to operate the ordinary pumps and is given by

φ_{2} (t, z^{R}, υ) = (1 - υ) | μ^{R} (t) + z^{R} | b_{1} S .

Therefore, the running cost

Γ (t, z^{R}, υ)

is given by

Γ (t, z^{R}, υ) = φ_{0} (t, z^{R}, υ) + φ_{1} (t, z^{R}, υ) + φ_{2} (t, z^{R}, υ)

.
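A minimal sketch of the running cost $Γ = φ_{0} + φ_{1} + φ_{2}$ follows, written in terms of the residual demand `r` $= μ^{R}(t) + z^{R}$ and the prices at the current time. All names are assumptions made for illustration.

```python
def running_cost(r, alpha, p_buy, p_sell, b1, b2, i_d, p_c, elec_price):
    """Running cost Gamma for residual demand r and CHS share alpha."""
    if r >= 0:
        phi0 = alpha * r * p_buy                                # heat purchased from the CHS
        phi1 = alpha * r * (b1 + b2 * (i_d - p_c)) * elec_price  # heat pump electricity
    else:
        phi0 = alpha * r * p_sell                               # negative: revenue from selling
        phi1 = -alpha * r * b1 * elec_price                     # ordinary pump electricity
    phi2 = (1.0 - alpha) * abs(r) * b1 * elec_price             # pumping to or from the ES
    return phi0 + phi1 + phi2
```

Only `phi0` can be negative (when heat is sold); the two electricity terms are nonnegative for any `alpha` in $[0, 1]$.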

Terminal Cost: In finite-horizon models with terminal time $T > 0$, it is customary to incorporate a terminal cost. In our context, this depends on the ES storage level at time $T$ and is denoted by $Φ(Q(T))$. This terminal cost can reflect a contractual agreement for the ES temperature to be at an agreed level at $T$. Let $P_{\mathrm{liq}}$ and $P_{\mathrm{pen}}$ denote the liquidation price and penalty price, respectively. At terminal time, the prosumer can sell the leftover thermal energy at $P_{\mathrm{liq}}$ per unit of thermal energy. Similarly, the prosumer can be penalized for failing to keep the ES at the agreed level at $T$, where $P_{\mathrm{pen}}$ is the penalty price per unit of thermal energy. The liquidation price and penalty price satisfy the conditions $P_{\mathrm{liq}} < P_{\mathrm{sell}}(T)$ and $P_{\mathrm{pen}} > P_{\mathrm{buy}}(T)$, respectively. In the following, we let $q_{\mathrm{ref}}$ denote a pre-agreed temperature level in the ES, such that

Φ(q) = \begin{cases} \frac{P_{\mathrm{pen}} m_{Q} c_{P} (q_{\mathrm{ref}} - q)}{η_{E}^{C}}, & q < q_{\mathrm{ref}}, \\ - η_{E}^{D} P_{\mathrm{liq}} m_{Q} c_{P} (q - q_{\mathrm{ref}}), & q \geq q_{\mathrm{ref}}. \end{cases}

(8)

In this model, the terminal cost $Φ(q)$ depends on the charging and discharging efficiencies $η_{E}^{C}$ and $η_{E}^{D}$ that define $η_{E}$ in (5).
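The terminal cost in (8) can be sketched as a short function; the name and argument order are hypothetical.

```python
def terminal_cost(q, q_ref, p_pen, p_liq, eta_c, eta_d, m_q, c_p):
    """Terminal cost Phi(q) from (8) for the ES level q at time T."""
    if q < q_ref:
        # Shortfall: the missing energy is penalized, inflated by the charging loss.
        return p_pen * m_q * c_p * (q_ref - q) / eta_c
    # Surplus: the leftover energy is liquidated, reduced by the discharging loss.
    return -eta_d * p_liq * m_q * c_p * (q - q_ref)
```

Both branches vanish at `q == q_ref`, so the terminal cost is continuous in the storage level.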

The performance criterion

J_{P}

:

[0, T] \times X \times U \to R

denotes the expected aggregated total discounted cost over the time interval

[0, T]

and is defined as follows:

J_{P} (t, x, α) = E_{t, x} [\int_{t}^{T} e^{- δ (s - t)} Γ (s, Z^{R} (s), α (s)) d s + e^{- δ (T - t)} Φ (Q (T))],

(9)

where

x = (z^{R}, q)

,

δ \geq 0

is a discount rate and

E_{t, x} [\cdot]

is the conditional expectation given that at initial time t,

X (t) = x

.

Assumption 3.

For any admissible control α, the functions Γ and Φ satisfy the condition

\mathbb{E} \left[ \int_{t}^{T} | Γ(s, Z^{R}(s), α(s)) | \, ds + | Φ(Q(T)) | \right] < \infty.

(10)

Since we want to use dynamic programming techniques, we restrict ourselves to Markov controls of the form $α(t) = θ(t, X(t))$ for all $t \in [0, T]$, with a measurable function $θ : [0, T] \times \mathcal{X} \to U$, which is called a decision rule, such that the control is adapted to the filtration $\mathbb{F}$. We denote by $\mathcal{A}$ the class of admissible controls, defined as follows:

\mathcal{A} = \left\{ (α(t))_{t \in [0, T]} \;\middle|\; α \text{ is } \mathbb{F}\text{-progressively measurable}, \; α(t) = θ(t, X(t)), \; θ(t, x) \in U(t, x) \; \forall t \in [0, T], \text{ and Equations (7) and (9) are well defined} \right\}.

(11)

The prosumer seeks to minimize

J_{P}

over all admissible controls. The value of the prosumer’s problem for all

(t, x) \in [0, T] \times X

is given by

V (t, x) = inf_{α \in A} J_{P} (t, x, α) .

(12)

3.2. Consumer’s Problem

We discuss the mathematical framework to determine the value of the consumer’s problem. We recall that the consumer is not equipped with local production or storage units and is always subject to a positive residual demand. As a result, it always satisfies all residual demand via the CHS. The state process of the consumer is given by

d \tilde{Z}(t) = f_{2}(\tilde{Z}(t)) \, dt + σ_{0} \, d W^{0}(t), \quad \tilde{Z}(0) = \tilde{Z}_{0} \in \mathbb{R},

(13)

where

f_{2} (\tilde{z}) = - κ_{0} \tilde{z}

.

Similar to the prosumer’s case, the consumer’s running cost consists of the cost of purchasing thermal energy from the CHS and the cost of operating the heat pump to raise the temperature from $P_{c}$ to $\tilde{i}_{d}$. Let $\tilde{φ}_{0}$ and $\tilde{φ}_{1}$ denote the cost per unit of time of purchasing thermal energy from the CHS and that of electricity consumption to run the heat pump, respectively, which are defined as follows:

\tilde{φ}_{0}(t, \tilde{z}) = (\tilde{μ}(t) + \tilde{z}) P_{\mathrm{buy}}(t),

(14)

\tilde{φ}_{1}(t, \tilde{z}) = (\tilde{μ}(t) + \tilde{z}) (b_{1} + b_{2} (\tilde{i}_{d} - P_{c})) S,

(15)

where $\tilde{μ}(t) + \tilde{z}$ is the value of the residual demand $\tilde{R}(t) = \tilde{μ}(t) + \tilde{Z}(t)$ when $\tilde{Z}(t) = \tilde{z}$. Therefore, $\tilde{Γ}(t, \tilde{z}) = \tilde{φ}_{0}(t, \tilde{z}) + \tilde{φ}_{1}(t, \tilde{z})$.

Since the consumer is not equipped with a local storage unit, there is no terminal cost. The performance criterion

J_{C} : [0, T] \times R \to R^{+}

denotes the expected aggregated total discounted cost over the time interval

[0, T]

and is defined as follows:

J_{C} (t, \tilde{z}) = E_{t, \tilde{z}} [\int_{t}^{T} e^{- δ (s - t)} \tilde{Γ} (s, \tilde{Z} (s)) d s],

(16)

where

δ \geq 0

is a discount rate and

E_{t, \tilde{z}} [\cdot]

is the conditional expectation given that at initial time t,

\tilde{Z} (t) = \tilde{z}

. The value of the consumer’s problem for all

(t, \tilde{z}) \in [0, T] \times R

is given by

\tilde{V} (t, \tilde{z}) = J_{C} (t, \tilde{z}) .

(17)
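Because $\tilde{Γ}$ is linear in $\tilde{z}$ and the conditional mean of the Ornstein-Uhlenbeck process in (13) is $E_{t, \tilde{z}} [\tilde{Z}(s)] = \tilde{z} e^{- κ_{0} (s - t)}$, the expectation in (16) reduces to a deterministic time integral. The following sketch evaluates that integral with the trapezoidal rule. It is an illustration under these observations, not the numerical method used in the paper; the function name, the callables `mu_tilde` and `p_buy`, and the quadrature resolution are assumptions.

```python
import numpy as np

def consumer_value(t, z, horizon, kappa_0, delta, mu_tilde, p_buy,
                   b1, b2, i_d_tilde, p_c, elec_price, n_grid=2001):
    """Approximate J_C(t, z) in (16) using the closed-form conditional mean of Z-tilde.

    mu_tilde and p_buy are callables that accept a NumPy array of times.
    """
    s = np.linspace(t, horizon, n_grid)
    mean_demand = mu_tilde(s) + z * np.exp(-kappa_0 * (s - t))  # E[R-tilde(s)]
    unit_cost = p_buy(s) + (b1 + b2 * (i_d_tilde - p_c)) * elec_price
    integrand = np.exp(-delta * (s - t)) * mean_demand * unit_cost
    # Composite trapezoidal rule on the uniform grid s.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s)))
```

Note that the common noise coefficient $σ_{0}$ does not enter this expression, since only the mean of $\tilde{Z}$ matters for a cost that is linear in the state.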

4. Semi-Lagrangian Discretization

In this section, we recall the discrete-time numerical scheme in [2]. We focus on the prosumer’s problem, since the consumer’s problem can be solved directly. We start by discussing the state discretization.

State Discretization: Let

N_{t}, N_{z}

, and

N_{q}

denote the number of grid points in the t-,

z^{R}

-, and q-directions, respectively. For computational reasons, we truncate the domain

X = R \times [\underset{̲}{q}, \bar{q}]

to

\hat{X} = [{\underset{̲}{z}}^{R}, {\bar{z}}^{R}] \times [\underset{̲}{q}, \bar{q}]

, such that for a tolerance

ϵ ≪ 1, P (Z^{R} (t) \in [{\underset{̲}{z}}^{R}, {\bar{z}}^{R}]) \geq 1 - ϵ

, for all

t \in [0, T]

.

{\underset{̲}{z}}^{R}

and

{\bar{z}}^{R}

represent the minimum and maximum values of the deseasonalized residual demand

Z^{R}

, respectively. Given the asymptotic standard deviation

s_{0} = \sqrt{\frac{σ_{R}^{2} + σ_{0}^{2}}{2 κ_{R}}}

of

Z^{R}

, the 3-

σ

rule motivates

{\underset{̲}{z}}^{R} = - 3 s_{0}

and

{\bar{z}}^{R} = 3 s_{0}

.
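The expression for $s_{0}$ can be checked with a short calculation, sketched here under the simplifying assumption that $σ_{R}$ is constant in time (Equation (1) allows a time-dependent $σ_{R}(t)$). Since $W_{R}$ and $W^{0}$ are independent, the Itô isometry gives

```latex
\begin{aligned}
Z^{R}(t) &= Z^{R}_{0} e^{-κ_{R} t}
  + \int_{0}^{t} e^{-κ_{R}(t-s)} \left( σ_{R}\, dW_{R}(s) + σ_{0}\, dW^{0}(s) \right), \\
\operatorname{Var}\left[ Z^{R}(t) \right]
  &= \left( σ_{R}^{2} + σ_{0}^{2} \right) \int_{0}^{t} e^{-2 κ_{R}(t-s)}\, ds
   = \frac{σ_{R}^{2} + σ_{0}^{2}}{2 κ_{R}} \left( 1 - e^{-2 κ_{R} t} \right)
   \xrightarrow[t \to \infty]{} s_{0}^{2}.
\end{aligned}
```

For a centered Gaussian random variable with standard deviation $s_{0}$, the interval $[-3 s_{0}, 3 s_{0}]$ carries a probability of about 0.9973, which corresponds to a tolerance of roughly $ϵ \approx 0.0027$ in the stationary regime.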

Let $t_{0} < t_{1} < \dots < t_{N_{t}}$, $z_{0}^{R} < z_{1}^{R} < \dots < z_{N_{z}}^{R}$, and $q_{0} < q_{1} < \dots < q_{N_{q}}$ be finite sets of grid points in the t-, $z^{R}$-, and q-directions, respectively. We define

\begin{matrix} \hat{G} & = {\hat{G}}_{t} \times {\hat{G}}_{z} \times {\hat{G}}_{q} = [t_{0}, \dots, t_{N_{t}}] \times [z_{0}^{R}, \dots, z_{N_{z}}^{R}] \times [q_{0}, \dots, q_{N_{q}}], \end{matrix}

where

t_{n} = t_{0} + n Δ t

,

z_{i}^{R} = {\underset{̲}{z}}^{R} + i Δ z^{R}

,

q_{k} = \underset{̲}{q} + k Δ q

, for

n \in {\hat{G}}_{n} = {0, 1, \dots, N_{t}}

,

i = 0, \dots, N_{z}

and

k = 0, \dots, N_{q}

, as a 3-dimensional equidistant grid on

\hat{X}

with the temporal and spatial step sizes

t_{n + 1} - t_{n} = : Δ t = \frac{T - t_{0}}{N_{t}}, Δ z^{R} = \frac{{\bar{z}}^{R} - {\underset{̲}{z}}^{R}}{N_{z}}, Δ q = \frac{\bar{q} - \underset{̲}{q}}{N_{q}} .
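The truncation and the equidistant grid described above can be sketched as follows, assuming a constant $σ_{R}$ so that $s_{0}$ is well defined. The function name and signature are hypothetical.

```python
import numpy as np

def build_grid(t0, horizon, q_min, q_max, sigma_r, sigma_0, kappa_r, n_t, n_z, n_q):
    """Equidistant grids in t, z^R and q, truncating z^R with the 3-sigma rule."""
    s0 = np.sqrt((sigma_r**2 + sigma_0**2) / (2.0 * kappa_r))  # asymptotic std of Z^R
    t_grid = np.linspace(t0, horizon, n_t + 1)         # step (T - t0) / N_t
    z_grid = np.linspace(-3.0 * s0, 3.0 * s0, n_z + 1)  # step 6 * s0 / N_z
    q_grid = np.linspace(q_min, q_max, n_q + 1)         # step (q_max - q_min) / N_q
    return t_grid, z_grid, q_grid
```

Each grid has one more point than the number of intervals, matching the index ranges $n = 0, \dots, N_{t}$, $i = 0, \dots, N_{z}$, and $k = 0, \dots, N_{q}$.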

We now recall the discrete-time numerical scheme discussed in [2]. This is an alternative to the semi-Lagrangian approach introduced in [16] and extended in [17,18]. It is a finite difference scheme based on the theory presented in [19].

The control problem in (12) can be solved through dynamic programming techniques, which rely on the following dynamic programming principle (DPP).

Theorem 1

(DPP). For all

(t, x) \in [0, T) \times X, h > 0

and

t + h \leq T

, the value function

V (t, x)

satisfies the DPP

\begin{matrix} V (t, x) & = inf_{α \in A} E_{t, x} [\int_{t}^{t + h} e^{- δ (s - t)} Γ (s, Z^{R} (s), α (s)) d s + e^{- δ h} V (t + h, X (t + h))] . \end{matrix}

(18)

Proof.

For a proof, we refer to [20]. □

The following result shows that the value function V, defined in (12), is a viscosity solution of an associated partial differential equation (PDE).

Proposition 1.

Let

Θ (t, x, υ)

stand for each of the functions

f (z^{R}), h (z^{R}, q, υ), Γ (t, z^{R}, υ), Φ (q)

. Then, there exists

L > 0

such that

\begin{matrix} | Θ (t, x, υ) - Θ (t, \hat{x}, υ) | & \leq L | x - \hat{x} |, \forall t \in [0, T], x, \hat{x} \in X, υ \in U, \end{matrix}

(19)

\begin{matrix} | Θ (t, 0, υ) | & \leq L, \forall (t, υ) \in [0, T] \times U . \end{matrix}

(20)

Therefore, the value function V is a viscosity solution of the PDE

\begin{matrix} \frac{\partial V}{\partial t} + L V + inf_{υ \in U (t, z^{R}, q)} {L_{q} V + Γ (t, z^{R}, υ)} & = 0, \end{matrix}

(21)

with the terminal condition

V (T, z^{R}, q) = Φ (q)

and

L V = f (z^{R}) \frac{\partial V}{\partial z^{R}} + \frac{1}{2} (σ_{R}^{2} (t) + σ_{0}^{2}) \frac{\partial^{2} V}{\partial {(z^{R})}^{2}} - δ V

,

L_{q} V = h (z^{R}, q, υ) \frac{\partial V}{\partial q}

.

Proof.

First, we note that the functions

f (z^{R}), h (z^{R}, q, υ), Γ (t, z^{R}, υ), Φ (q)

are linear in the states

z^{R}

and q. Then, there exists

L > 0

, such that

\begin{matrix} | Θ (t, x, υ) - Θ (t, \hat{x}, υ) | & \leq L | x - \hat{x} |, \forall t \in [0, T], x, \hat{x} \in X, υ \in U . \end{matrix}

In addition, we have

| Θ (t, 0, υ) | \leq L, \forall (t, υ) \in [0, T] \times U

. Finally, the result follows from Theorem 2.3 in [21]. □

Idea of the Scheme: We begin by fixing the triple

(t_{n}, z_{i}^{R}, q_{k})

and assume that, for

t \in [t_{n}, t_{n + 1})

, the deseasonalized residual demand remains constant, i.e.,

Z^{R} (t) = z_{i}^{R}

, where

z_{i}^{R}

is fixed and known. Next, we assume that the control is constant over this interval, i.e.,

α (t) = α_{i, k}^{n} = : υ

, which is fixed but unknown. Under these assumptions, and using the explicit solution, we compute the arrival point

Q_{k (i, n)}^{υ, n + 1}

of the ES temperature level, given that at time

t_{n}

, the storage level was

q_{k}

, the residual demand was

μ^{R} (t_{n}) + z_{i}^{R}

, and constant action

υ

was taken. We then apply the DPP in two steps: In the first step, using the above discretization, we obtain an approximation of the optimal control

α_{i, k}^{* n} = : υ^{*}

. In the second step, we substitute this approximate optimal control in the DPP and “release” the deseasonalized residual demand

Z^{R}

. Finally, by applying the Feynman–Kac formula to the second step, we derive a one-period PDE, which we discretize using finite difference methods to obtain a discrete-time scheme. This scheme is implemented in Algorithm 1.

From the above discussion, we formulate the following assumptions:

Assumption 4

(Piecewise Constant Control). For

n = 0, 1, \dots, N_{t} - 1

and

t \in [t_{n}, t_{n + 1})

, the control α and the associated decision rule θ are kept constant between two consecutive grid points of the time discretization. That is,

α (t) = α (t_{n}) and θ (t, X (t)) = θ (t_{n}, X (t_{n})) .

Algorithm 1: Backward Recursion Algorithm

Assumption 5

(Piecewise Constant Model Parameters). For

n = 0, 1, \dots, N_{t} - 1

and

t \in [t_{n}, t_{n + 1})

, the time-dependent seasonality

μ^{R}

and the efficiency parameter

η_{E}

are kept constant between two consecutive grid points of the time discretization. That is,

\begin{matrix} μ^{R} (t) & = μ^{R} (t_{n}) = μ_{n}^{R}, μ^{R} (T) = μ_{N_{t}}^{R}, \\ η_{E} (t) & = η_{E} (t_{n}) = η_{E, n}, η_{E} (T) = η_{E, N_{t}} . \end{matrix}

Assumptions 4 and 5 state that the control and the system’s time-dependent parameters are adjusted only at the discrete time points and remain constant between two consecutive time points. Such piecewise-constant operation is also consistent with practice.

Let

Q^{υ} (t)

denote the solution of the ODE (7) for the ES temperature level on the time interval

[t_{n}, t_{n + 1})

for a fixed but unknown control

υ

, satisfying

\begin{matrix} d Q^{υ} (t) & = h (Z^{R} (t), Q^{υ} (t), υ) d t, Q^{υ} (t_{n}) = q_{k}, t \in [t_{n}, t_{n + 1}) . \end{matrix}

(22)

Lemma 1.

Let Assumptions 4 and 5 hold and

λ = \frac{A γ}{m_{Q} c_{P}}

. Then, for

n = 0, 1, \dots, N_{t} - 1, i = 0, 1, \dots, N_{z}

,

k = 0, 1, \dots, N_{q}

,

t \in [t_{n}, t_{n + 1})

and

Q^{υ} (t_{n}) = q_{k}

, the closed-form solution of

Q^{υ}

is given by

Q^{υ} (t) = q_{k} e^{- λ (t - t_{n})} + (\frac{η_{E, n} (υ - 1) (μ_{n}^{R} + z_{i}^{R})}{A γ} + \underset{̲}{q}) (1 - e^{- λ (t - t_{n})}) .

(23)

Moreover, letting

t = t_{n + 1}

and

θ_{i, k}^{n} = \frac{A γ q_{k} - [η_{E, n} (υ - 1) (μ_{n}^{R} + z_{i}^{R}) + A γ \underset{̲}{q}]}{Δ q} \frac{1 - e^{- λ Δ t}}{A γ}

, (23) becomes

Q_{k (i, n)}^{υ, n + 1} = q_{k} - θ_{i, k}^{n} Δ q .

(24)

Proof.

Let

λ = \frac{A γ}{m_{Q} c_{P}}

. For

n = 0, 1, \dots, N_{t} - 1, i = 0, 1, \dots, N_{z}

and

k = 0, 1, \dots, N_{q}

, assume that

Q^{υ} (t_{n}) = q_{k}

. Under Assumptions 4 and 5, (22) is a linear first-order ODE with constant coefficients and source term, which is solved to obtain the desired result. Now, substituting

t = t_{n + 1}

in (23) yields

\begin{matrix} Q_{k (i, n)}^{υ, n + 1} = q_{k} e^{- λ Δ t} + (\frac{η_{E, n} (υ - 1) (μ_{n}^{R} + z_{i}^{R})}{A γ} + \underset{̲}{q}) (1 - e^{- λ Δ t}) \end{matrix}

(25)

Rearranging the terms in (25) yields (24). □

Q_{k (i, n)}^{υ, n + 1}

denotes the ES temperature level at time

t_{n + 1}

knowing that, at time

t_{n}

, the ES was at the level

q_{k}

with a residual demand

μ_{n}^{R} + z_{i}^{R}

and an action

υ

was taken.
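The closed form can be checked numerically. The sketch below evaluates (25) and compares it with an explicit Euler integration of the linear ODE. Note that the right-hand side used in `euler_arrival` is inferred from (23) and is an assumption, since it is not copied from (7). The quantity `theta` is obtained directly from (24) as the displacement measured in units of Δq. All numerical values are hypothetical.

```python
import math

def arrival_point(q_k, upsilon, r, eta, A_gamma, m_c, q_low, dt):
    """Closed-form ES level at t_{n+1}, cf. (25), with r = mu_n^R + z_i^R."""
    lam = A_gamma / m_c                       # lambda = A * gamma / (m_Q * c_P)
    decay = math.exp(-lam * dt)
    q_inf = eta * (upsilon - 1.0) * r / A_gamma + q_low  # long-run level
    return q_k * decay + q_inf * (1.0 - decay)

def euler_arrival(q_k, upsilon, r, eta, A_gamma, m_c, q_low, dt, n_sub=20000):
    """Explicit Euler integration of the linear ODE implied by (23) (assumed form)."""
    q, h = q_k, dt / n_sub
    for _ in range(n_sub):
        q += h * (eta * (upsilon - 1.0) * r - A_gamma * (q - q_low)) / m_c
    return q

# Hypothetical values, for illustration only (unsatisfied demand, r > 0).
args = dict(q_k=50.0, upsilon=0.3, r=2.0, eta=1.0 / 0.9,
            A_gamma=0.5, m_c=40.0, q_low=20.0, dt=1.0)
dq = 0.5
q_next = arrival_point(**args)
theta = (args["q_k"] - q_next) / dq   # so that q_next = q_k - theta * dq, cf. (24)
q_euler = euler_arrival(**args)
```

In this example the storage is discharged, so `theta` is positive and the arrival point lies between the grid point below and the current one.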

Discrete-Time State-Dependent Control Constraints: In order for

Q_{k (i, n)}^{υ, n + 1}

to always satisfy the condition

Q_{k (i, n)}^{υ, n + 1} \in [\underset{̲}{q}, \bar{q}]

, we reformulate

U

to adapt to the discrete-time setting, where the control can only be adjusted at the end of the time interval

[t_{n}, t_{n + 1})

. We obtain

U_{d}^{n} (z_{i}^{R}, q_{k}) = {υ \in U (t_{n}, z_{i}^{R}, q_{k}) | Q_{k (i, n)}^{υ, n + 1} \in [\underset{̲}{q}, \bar{q}]} .

(26)

In the following lemma, we give the full expression of

U_{d}^{n} (z_{i}^{R}, q_{k})

.

Lemma 2.

Under Assumptions 4 and 5, the set of discrete-time state-dependent constraints is

U_{d}^{n} (z_{i}^{R}, q_{k}) = \{\begin{matrix} [max (0, υ_{m i n}^{d}), min (1, υ_{m a x}^{d})], & q_{k} > \underset{̲}{q}, μ_{n}^{R} + z_{i}^{R} \geq 0, \\ \{1\}, & q_{k} = \underset{̲}{q}, μ_{n}^{R} + z_{i}^{R} \geq 0, \\ [max (0, υ_{m i n}^{c}), min (1, υ_{m a x}^{c})], & q_{k} < \bar{q}, μ_{n}^{R} + z_{i}^{R} < 0, \\ [max (χ (t_{n}), υ_{m i n}^{c}), min (1, υ_{m a x}^{c})], & q_{k} = \bar{q}, μ_{n}^{R} + z_{i}^{R} < 0, \end{matrix}

where

\begin{matrix} υ_{m i n}^{d} & = 1 + \frac{η_{E}^{D} A γ [(\underset{̲}{q} - q_{k} e^{- λ Δ t}) - \underset{̲}{q} (1 - e^{- λ Δ t})]}{(1 - e^{- λ Δ t}) (μ_{n}^{R} + z_{i}^{R})}, \\ υ_{m a x}^{d} & = 1 + \frac{η_{E}^{D} A γ [(\bar{q} - q_{k} e^{- λ Δ t}) - \underset{̲}{q} (1 - e^{- λ Δ t})]}{(1 - e^{- λ Δ t}) (μ_{n}^{R} + z_{i}^{R})}, \\ υ_{m i n}^{c} & = 1 + \frac{A γ [(\bar{q} - q_{k} e^{- λ Δ t}) - \underset{̲}{q} (1 - e^{- λ Δ t})]}{η_{E}^{C} (1 - e^{- λ Δ t}) (μ_{n}^{R} + z_{i}^{R})}, \\ υ_{m a x}^{c} & = 1 + \frac{A γ [(\underset{̲}{q} - q_{k} e^{- λ Δ t}) - \underset{̲}{q} (1 - e^{- λ Δ t})]}{η_{E}^{C} (1 - e^{- λ Δ t}) (μ_{n}^{R} + z_{i}^{R})} . \end{matrix}

Proof.

From the condition

Q_{k (i, n)}^{υ, n + 1} \in [\underset{̲}{q}, \bar{q}]

, we obtain

\begin{matrix} \frac{A γ (\underset{̲}{q} - q_{k} e^{- λ Δ t})}{1 - e^{- λ Δ t}} - A γ \underset{̲}{q} & \leq η_{E} (υ - 1) (μ_{n}^{R} + z_{i}^{R}) \\ & \leq \frac{A γ (\bar{q} - q_{k} e^{- λ Δ t})}{1 - e^{- λ Δ t}} - A γ \underset{̲}{q} . \end{matrix}

If

μ_{n}^{R} + z_{i}^{R} \geq 0

,

η_{E} = \frac{1}{η_{E}^{D}}

, and we obtain

υ \in [υ_{m i n}^{d}, υ_{m a x}^{d}]

. Similarly, if

μ_{n}^{R} + z_{i}^{R} < 0

,

η_{E} = η_{E}^{C}

, and we have

υ \in [υ_{m i n}^{c}, υ_{m a x}^{c}]

. Therefore, for

μ_{n}^{R} + z_{i}^{R} \geq 0

,

U_{d}^{n} (z_{i}^{R}, q_{k}) = U (t_{n}, z_{i}^{R}, q_{k}) \cap [υ_{m i n}^{d}, υ_{m a x}^{d}]

, and for

μ_{n}^{R} + z_{i}^{R} < 0

,

U_{d}^{n} (z_{i}^{R}, q_{k}) = U (t_{n}, z_{i}^{R}, q_{k}) \cap [υ_{m i n}^{c}, υ_{m a x}^{c}]

. □
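The feasible interval can also be computed directly from the closed form (25), which is affine in υ. The following sketch does so by inverting (25) at the two storage bounds and intersecting the result with [0, 1]. It is a simplified illustration: it ignores the lower bound χ(t_n) and any further restriction contained in the original set U, and the function name and numerical values are hypothetical.

```python
import math

def feasible_controls(q_k, r, eta, A_gamma, m_c, q_low, q_high, dt):
    """Interval of upsilon in [0, 1] whose arrival point stays in [q_low, q_high].

    Uses (25) written as Q(upsilon) = base + slope * (upsilon - 1),
    with r = mu_n^R + z_i^R. Returns None if no control is feasible.
    """
    decay = math.exp(-A_gamma / m_c * dt)
    base = q_low + (q_k - q_low) * decay          # arrival point for upsilon = 1
    slope = eta * r * (1.0 - decay) / A_gamma
    if slope == 0.0:                              # r = 0: the control has no effect
        return (0.0, 1.0) if q_low <= base <= q_high else None
    a = 1.0 + (q_low - base) / slope              # control that lands on q_low
    b = 1.0 + (q_high - base) / slope             # control that lands on q_high
    lo, hi = max(0.0, min(a, b)), min(1.0, max(a, b))
    return (lo, hi) if lo <= hi else None

# Hypothetical values: unsatisfied demand (r > 0, eta = 1 / eta_D).
common = dict(r=2.0, eta=1.0 / 0.9, A_gamma=0.5, m_c=40.0,
              q_low=20.0, q_high=80.0, dt=1.0)
empty_storage = feasible_controls(q_k=20.0, **common)   # only upsilon = 1 remains
half_storage = feasible_controls(q_k=50.0, **common)    # the whole interval [0, 1]
```

For an empty storage and unsatisfied demand, the interval collapses to the single point υ = 1, in line with the second case of Lemma 2.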

One-Step Terminal Value Problem: Starting from the DPP as in [22], we derive the following proposition, which is analogous to Theorem 4.3 in [2]. This result provides the one-step approximate optimal control together with the associated terminal value problem. The latter is then solved numerically to determine the value and optimal strategy of the prosumer’s optimization problem.

Proposition 2.

Let

(t_{n}, z_{i}^{R}, q_{k})

be fixed, and suppose that Assumptions 4 and 5 hold. Then, starting from the DPP, the one-step approximate optimal control

υ^{*}

is given by

υ^{*} = \underset{υ \in U_{d}^{n} (z_{i}^{R}, q_{k})}{arg min} \{\int_{t_{n}}^{t_{n + 1}} e^{- δ (t - t_{n})} Γ (t, z_{i}^{R}, υ) d t + e^{- δ Δ t} V (t_{n + 1}, z_{i}^{R}, Q_{k (i, n)}^{υ, n + 1})\} .

(27)

In addition, setting

H (t, z^{R}) = V (t, z^{R}, Q^{υ^{*}})

, we obtain the one-step terminal value problem

\begin{matrix} \frac{\partial H}{\partial t} (t, z^{R}) + L H (t, z^{R}) + Γ (t, z^{R}, υ^{*}) & = 0, on [t_{n}, t_{n + 1}) \times X \\ H (t_{n + 1}, z^{R}) & = V (t_{n + 1}, z^{R}, Q_{k (i, n)}^{υ^{*}, n + 1}), \end{matrix}

(28)

where

L H = f (z^{R}) \frac{\partial H}{\partial z^{R}} + \frac{1}{2} (σ_{R}^{2} (t) + σ_{0}^{2}) \frac{\partial^{2} H}{\partial {(z^{R})}^{2}} - δ H

.

Proof.

The proof is similar to that of Theorem 4.3 in [2], taking

σ^{2} = σ_{R}^{2} (t) + σ_{0}^{2}

. □
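The minimization in (27) can be approximated by a search over a finite set of candidate controls. The sketch below is one possible realization, not the authors' implementation: it assumes that the running cost is frozen at its value at t_n over the interval, so that the discounted integral has a closed form, and it takes the running cost and the continuation value as user-supplied callables (`running_cost` and `continuation_value` are hypothetical names).

```python
import math

def one_step_control(candidates, running_cost, continuation_value, delta, dt):
    """Grid-search version of (27) for one node (t_n, z_i, q_k).

    running_cost(u):       Gamma for control u, held constant on [t_n, t_{n+1}).
    continuation_value(u): V(t_{n+1}, z_i, Q^{u, n+1}), e.g. obtained by interpolation.
    Returns the minimizing candidate and the corresponding cost.
    """
    # Exact integral of exp(-delta * s) over [0, dt] for a constant integrand.
    weight = dt if delta == 0.0 else (1.0 - math.exp(-delta * dt)) / delta
    discount = math.exp(-delta * dt)
    best_u, best_cost = None, float("inf")
    for u in candidates:
        cost = weight * running_cost(u) + discount * continuation_value(u)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u, best_cost

# Toy example with made-up cost functions, for illustration only.
grid = [j / 10 for j in range(11)]            # candidate controls in [0, 1]
u_star, cost_star = one_step_control(
    grid,
    running_cost=lambda u: 2.0 * u,
    continuation_value=lambda u: 10.0 * (u - 0.6) ** 2,
    delta=0.0,
    dt=1.0,
)
```

In practice, the candidate set would be a discretization of the feasible interval from Lemma 2, and a finer grid or a one-dimensional optimizer could replace the exhaustive search.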

Positivity Condition: In order to solve the one-step terminal value problem (28) using numerical techniques, we first discretize the differential operator

L

. We let

Λ

denote the discretization operator of

L

and denote

ϑ_{i} = - κ_{R} z_{i}^{R}

. Given that the sign of

ϑ_{i}

changes according to

z_{i}^{R}

and cannot easily be determined, we apply the upwind discretization for the convection term

\frac{\partial V}{\partial z^{R}}

. Subsequently, we apply the central second-order finite difference for the diffusion term

\frac{\partial^{2} V}{\partial {(z^{R})}^{2}}

. Letting

σ_{R} (t_{n}) = σ_{R, n}

, we obtain

Λ V_{i, k}^{n} = \frac{1}{2} (σ_{R, n}^{2} + σ_{0}^{2}) \frac{V_{i - 1, k}^{n} - 2 V_{i, k}^{n} + V_{i + 1, k}^{n}}{{(Δ z^{R})}^{2}} - δ V_{i, k}^{n} + \{\begin{matrix} ϑ_{i} \frac{V_{i, k}^{n} - V_{i - 1, k}^{n}}{Δ z^{R}}, & ϑ_{i} \geq 0, \\ ϑ_{i} \frac{V_{i + 1, k}^{n} - V_{i, k}^{n}}{Δ z^{R}}, & ϑ_{i} < 0, \end{matrix}

= A_{i} V_{i + 1, k}^{n} - B_{i} V_{i, k}^{n} + C_{i} V_{i - 1, k}^{n},

(29)

where the expressions of

A_{i}

,

B_{i}

, and

C_{i}

change depending on the sign of

ϑ_{i}

. The following result gives us the upper bound for

Δ z^{R}

and defines the positivity condition for the coefficients

A_{i}, B_{i}

, and

C_{i}

.

Lemma 3

(Positivity Condition). For

i = 1, \dots, N_{z} - 1

and

n = 0, \dots, N_{t} - 1

, the coefficients

A_{i}, B_{i}

, and

C_{i}

remain positive provided

Δ z^{R} \leq \frac{\sqrt{2 κ_{R} (σ_{R, n}^{2} + σ_{0}^{2})}}{6 κ_{R}} .

(30)

Proof.

Similar to the proof of Proposition 4.4 in [2]. □
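To make the positivity condition concrete, the sketch below reads the coefficients off the two branches of (29) and checks their signs on the interior nodes. The explicit expressions are derived here from (29) and are not quoted from the source. A constant volatility is assumed, the truncation bounds are taken as ±3 s_0, and all parameter values are hypothetical.

```python
import math

def scheme_coefficients(z_i, kappa_R, sigma_sq, delta, dz):
    """Coefficients (A_i, B_i, C_i) of (29), with sigma_sq = sigma_R^2 + sigma_0^2."""
    diff = 0.5 * sigma_sq / dz**2
    conv = -kappa_R * z_i / dz             # vartheta_i / dz
    if conv >= 0.0:                         # first branch of (29)
        return diff, 2.0 * diff + delta - conv, diff - conv
    return diff + conv, 2.0 * diff + delta + conv, diff  # second branch

def max_dz(kappa_R, sigma_sq):
    """Upper bound (30) on the step size in the z-direction."""
    return math.sqrt(2.0 * kappa_R * sigma_sq) / (6.0 * kappa_R)

def all_positive(N_z, kappa_R=0.2, sigma_sq=0.41, delta=1e-4):
    """Check A_i, B_i, C_i >= 0 on the interior nodes i = 1, ..., N_z - 1."""
    s0 = math.sqrt(sigma_sq / (2.0 * kappa_R))
    z_low, dz = -3.0 * s0, 6.0 * s0 / N_z
    return all(
        min(scheme_coefficients(z_low + i * dz, kappa_R, sigma_sq, delta, dz)) >= 0.0
        for i in range(1, N_z)
    )

fine_ok = all_positive(20)     # dz below the bound (30)
coarse_ok = all_positive(10)   # dz above the bound (30)
```

With the bounds ±3 s_0, condition (30) amounts to requiring at least 18 intervals in the z-direction, independently of the parameter values; the grid with 20 intervals passes the check, while the grid with 10 intervals does not.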

CFL Condition: In the following, we formulate the Courant–Friedrichs–Lewy (CFL) condition, as introduced in [23]. It relates the spatial step size

Δ q

to the time step

Δ t

and ensures that the arrival point

Q_{k (i, n)}^{υ, n + 1}

consistently lies within the interval defined by the neighboring grid points

q_{k - 1} = q_{k} - Δ q

and

q_{k + 1} = q_{k} + Δ q

adjacent to

q_{k}

. This requirement translates into the following inequalities:

\begin{matrix} \underset{̲}{q} : = q_{0} & \leq \underset{̲}{q} - θ_{i, 0}^{n} Δ q \leq q_{1} : = \underset{̲}{q} + Δ q, & θ_{i, 0}^{n} \leq 0, \\ q_{k - 1} & \leq q_{k} - θ_{i, k}^{n} Δ q \leq q_{k + 1}, & k = 1, \dots, N_{q} - 1, \\ \bar{q} - Δ q : = q_{N_{q} - 1} & \leq \bar{q} - θ_{i, N_{q}}^{n} Δ q \leq q_{N_{q}} : = \bar{q}, & θ_{i, N_{q}}^{n} \geq 0, \end{matrix}

(31)

The inequalities in (31) lead to a condition relating the time step size

Δ t

to the ES step size

Δ q

, which ensures the stability of the derived numerical scheme. Let

{\underset{̲}{μ}}^{R} = min_{n \in {0, 1, \dots, N_{t}}} μ_{n}^{R}

,

{\bar{μ}}^{R} = max_{n \in {0, 1, \dots, N_{t}}} μ_{n}^{R}

. Therefore, from the above inequalities and the state-dependent set of feasible controls, the CFL condition is given by

Δ q \geq \frac{Δ t}{m_{Q} c_{P}} max (- η_{E}^{C} ({\underset{̲}{μ}}^{R} + {\underset{̲}{z}}^{R}), \frac{{\bar{μ}}^{R} + {\bar{z}}^{R}}{η_{E}^{D}} + A γ (\bar{q} - \underset{̲}{q})) .

(32)
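For a given time step, the bound (32) can be evaluated directly. The sketch below is a plain transcription of (32); the function name and the parameter values are hypothetical.

```python
def cfl_min_dq(dt, m_c, eta_C, eta_D, mu_min, mu_max, z_low, z_high,
               A_gamma, q_low, q_high):
    """Smallest admissible ES step size according to the CFL condition (32)."""
    charging = -eta_C * (mu_min + z_low)                        # strongest overproduction
    discharging = (mu_max + z_high) / eta_D + A_gamma * (q_high - q_low)
    return dt / m_c * max(charging, discharging)

# Hypothetical values, for illustration only.
dq_min = cfl_min_dq(dt=1.0, m_c=40.0, eta_C=0.9, eta_D=0.9,
                    mu_min=-1.0, mu_max=3.0, z_low=-3.0, z_high=3.0,
                    A_gamma=0.5, q_low=20.0, q_high=80.0)
```

Any grid with Δq at least as large as `dq_min` keeps the arrival point within one grid cell of the departure point for this choice of Δt.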

Interpolation: As noted in [2], the arrival point

Q_{k (i, n)}^{υ, n + 1}

does not necessarily coincide with a grid point

q_{k} \in {\hat{G}}_{q}

. This prompts an interpolation of

V (t_{n + 1}, z_{i}^{R}, Q_{k (i, n)}^{υ, n + 1})

based on the function values

V_{i, k}^{n + 1}

at the grid points of

\hat{G}

. Following [17], a linear interpolation is sufficient to construct a monotone difference scheme. In what follows, we denote by

V_{k (i, n)}^{n + 1}

, the interpolated values of

V (t_{n + 1}, z_{i}^{R}, Q_{k (i, n)}^{υ, n + 1})

defined in the result below.

Proposition 3.

Let

θ_{i, k}^{n} = \frac{A γ q_{k} - [η_{E, n} (υ - 1) (μ_{n}^{R} + z_{i}^{R}) + A γ \underset{̲}{q}]}{Δ q} \frac{1 - e^{- λ Δ t}}{A γ}

, and assume that the CFL condition holds. Then, the interpolated value

V_{k (i, n)}^{n + 1}

is given by

\begin{matrix} V_{k (i, n)}^{n + 1} & = D_{i, k}^{(q, n)} V_{i, k}^{n + 1} + F_{i, k}^{(q, n)} V_{i, k - 1}^{n + 1} + H_{i, k}^{(q, n)} V_{i, k + 1}^{n + 1}, \end{matrix}

(33)

for

i = 0, \dots, N_{z}, k = 1, \dots, N_{q} - 1, n = 0, \dots, N_{t} - 1

and where

D_{i, k}^{(q, n)} = 1 - | θ_{i, k}^{n} |, \quad F_{i, k}^{(q, n)} = \frac{θ_{i, k}^{n} + | θ_{i, k}^{n} |}{2}, \quad \text{and} \quad H_{i, k}^{(q, n)} = - \frac{θ_{i, k}^{n} - | θ_{i, k}^{n} |}{2}

.

Furthermore, we obtain that for

k = 0

and

k = N_{q}

,

\begin{matrix} V_{0 (i, n)}^{n + 1} & = D_{i, 0}^{(q, n)} V_{i, 0}^{n + 1} + H_{i, 0}^{(q, n)} V_{i, 1}^{n + 1}, \\ V_{N_{q} (i, n)}^{n + 1} & = D_{i, N_{q}}^{(q, n)} V_{i, N_{q}}^{n + 1} + F_{i, N_{q}}^{(q, n)} V_{i, N_{q} - 1}^{n + 1} . \end{matrix}

Proof.

See Appendix C.3 in [2]. □
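The weights in (33) are those of a linear interpolation between the two grid points that enclose the arrival point. A minimal sketch, with hypothetical function names, is given below; it also checks that a function that is linear in q is reproduced exactly.

```python
def interpolation_weights(theta):
    """Weights (D, F, H) of (33) for V_{i,k}, V_{i,k-1} and V_{i,k+1}."""
    if abs(theta) > 1.0:
        raise ValueError("CFL condition violated: |theta| > 1")
    D = 1.0 - abs(theta)
    F = 0.5 * (theta + abs(theta))     # equals theta if theta >= 0, else 0
    H = -0.5 * (theta - abs(theta))    # equals -theta if theta < 0, else 0
    return D, F, H

def interpolate(v_prev, v_mid, v_next, theta):
    """Interpolated value at the arrival point q_k - theta * dq."""
    D, F, H = interpolation_weights(theta)
    return D * v_mid + F * v_prev + H * v_next

# Check on a function that is linear in q (made-up example).
value = lambda q: 2.0 + 3.0 * q
q_k, dq = 50.0, 0.5
v_down = interpolate(value(q_k - dq), value(q_k), value(q_k + dq), theta=0.4)
v_up = interpolate(value(q_k - dq), value(q_k), value(q_k + dq), theta=-0.25)
```

Since the three weights are non-negative and sum to one whenever the CFL condition holds, the interpolated value is a convex combination of neighboring grid values.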

Now, we discretize the one-step terminal value problem (28) to obtain

\frac{V_{i, k}^{n + 1} - V_{i, k}^{n}}{Δ t} + Λ V_{i, k}^{n} + Γ (t_{n}, z_{i}^{R}, υ^{*}) = 0 .

(34)

Finally, we obtain the fully implicit scheme given by

\begin{matrix} V_{i, k}^{n} - Δ t \{A_{i} V_{i + 1, k}^{n} - B_{i} V_{i, k}^{n} + C_{i} V_{i - 1, k}^{n}\} = V_{k (i, n)}^{n + 1} + Δ t Γ (t_{n}, z_{i}^{R}, υ^{*}), \\ Q_{k (i, n)}^{υ, n + 1} = q_{k} - θ_{i, k}^{n} Δ q, \\ υ_{i, k}^{* n} = \underset{υ \in U_{d}^{n} (z_{i}^{R}, q_{k})}{arg min} \{\int_{t_{n}}^{t_{n + 1}} e^{- δ (t - t_{n})} Γ (t, z_{i}^{R}, υ) d t + e^{- δ Δ t} V_{k (i, n)}^{n + 1}\}, \\ U_{d}^{n} (z_{i}^{R}, q_{k}) = \{υ \in U (t_{n}, z_{i}^{R}, q_{k}) | Q_{k (i, n)}^{υ, n + 1} \in [\underset{̲}{q}, \bar{q}]\}, \\ V_{i, k}^{N_{t}} = Φ (q_{k}) . \end{matrix}

(35)

From the first equation in (35), we want to form a system of linear algebraic equations that are useful to obtain the values of

V_{i, k}^{n}

. In that regard, we set

Γ_{i, k}^{n + 1} = Γ (t_{n}, z_{i}^{R}, υ^{*})

and assume that the CFL condition holds and that, for fixed n and k, the optimal strategy

υ^{*} = υ_{i, k}^{* n}

and the terms in (33) are known for all i. We denote by

Ψ_{i, k}^{n + 1}

the known right-hand side of the difference equation (35) for fixed

t_{n}, q_{k}

and for all i. Hence, for

i = 1, \dots, N_{z} - 1,

k = 1, \dots, N_{q} - 1, n = 0, 1, \dots, N_{t} - 1

, we obtain

(1 + Δ t B_{i}) V_{i, k}^{n} - Δ t C_{i} V_{i - 1, k}^{n} - Δ t A_{i} V_{i + 1, k}^{n} = Ψ_{i, k}^{n + 1},

(36)

At the boundary

q = \underset{̲}{q}

, i.e.,

k = 0

, we recall that

θ_{i, 0}^{n} \leq 0

. Similarly, at the boundary

q = \bar{q}

, i.e.,

k = N_{q}

, we have

θ_{i, N_{q}}^{n} \geq 0

. Therefore, we have the following

V_{k (i, n)}^{n + 1} = \{\begin{matrix} (1 - θ_{i, k}^{n}) V_{i, k}^{n + 1} + θ_{i, k}^{n} V_{i, k - 1}^{n + 1} & θ_{i, k}^{n} \geq 0, k = 1, \dots, N_{q}, \\ (1 + θ_{i, k}^{n}) V_{i, k}^{n + 1} - θ_{i, k}^{n} V_{i, k + 1}^{n + 1} & θ_{i, k}^{n} < 0, k = 0, \dots, N_{q} - 1 . \end{matrix}

(37)

Boundary Conditions: To ensure that the one-step terminal value problem, formulated as a PDE, is well posed, we specify boundary conditions at

z^{R} = {\underset{̲}{z}}^{R}, {\bar{z}}^{R}

and

q = \underset{̲}{q}, \bar{q}

. These conditions arise from truncating the computational domain of the PDE from

X

to a bounded domain

\hat{X}

. Accordingly, we require that

\frac{\partial^{2} V}{\partial {(z^{R})}^{2}} (t, {\underset{̲}{z}}^{R}, q) = 0, \frac{\partial^{2} V}{\partial {(z^{R})}^{2}} (t, {\bar{z}}^{R}, q) = 0, (t, q) \in (0, T) \times [\underset{̲}{q}, \bar{q}] .

(38)

\frac{\partial^{2} V}{\partial q^{2}} (t, z^{R}, \underset{̲}{q}) = 0, \frac{\partial^{2} V}{\partial q^{2}} (t, z^{R}, \bar{q}) = 0, (t, z^{R}) \in (0, T) \times [{\underset{̲}{z}}^{R}, {\bar{z}}^{R}] .

(39)

We note that the corner values

V_{0, 0}^{n}, V_{0, N_{q}}^{n}, V_{N_{z}, 0}^{n}

, and

V_{N_{z}, N_{q}}^{n}

are computed using previously obtained values

V_{i, k}^{n}

for

i = 0, \dots, N_{z}

,

k = 0, \dots, N_{q}

, so that

\begin{matrix} V_{0, 0}^{n} & = 2 V_{1, 0}^{n} - V_{2, 0}^{n}, \\ V_{0, N_{q}}^{n} & = 2 V_{0, N_{q} - 1}^{n} - V_{0, N_{q} - 2}^{n}, \\ V_{N_{z}, 0}^{n} & = 2 V_{N_{z}, 1}^{n} - V_{N_{z}, 2}^{n}, \\ V_{N_{z}, N_{q}}^{n} & = 2 V_{N_{z}, N_{q} - 1}^{n} - V_{N_{z}, N_{q} - 2}^{n} . \end{matrix}

(40)
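One way to read these extrapolation formulas, assuming that the vanishing second derivative in (38) is discretized with the standard three-point difference at the first interior node, is sketched below for the boundary in the z-direction. The same elimination of the boundary value accounts for the modified first row of the matrix G in (42); the last row follows analogously.

```latex
% Zero second derivative at the lower z-boundary, three-point stencil (an assumption):
\frac{V_{0,k}^{n} - 2 V_{1,k}^{n} + V_{2,k}^{n}}{(\Delta z^{R})^{2}} = 0
\quad\Longrightarrow\quad
V_{0,k}^{n} = 2 V_{1,k}^{n} - V_{2,k}^{n}.
% Substituting this into (36) for i = 1 removes the boundary value:
(1 + \Delta t\, B_{1} - 2 \Delta t\, C_{1})\, V_{1,k}^{n}
  + \Delta t\, (C_{1} - A_{1})\, V_{2,k}^{n} = \Psi_{1,k}^{n+1}.
```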

Combining (40), the boundary conditions, and the difference equation (35), we proceed to the matrix formulation below.

Matrix Formulation: Simultaneously varying the indices

i, k

, for n fixed in the difference equation (35), and using (40), we obtain the following system of linear equations:

G V_{k}^{n} = 𝚿_{k}^{n + 1}, for n = 0, \dots, N_{t} - 1, k = 1, \dots, N_{q} - 1,

(41)

where

V_{k}^{n} = {(V_{1, k}^{n}, \dots, V_{N_{z} - 1, k}^{n})}^{T}

,

𝚿_{k}^{n + 1} = {(Ψ_{1, k}^{n + 1}, \dots, Ψ_{N_{z} - 1, k}^{n + 1})}^{T}

, with G an

(N_{z} - 1) \times (N_{z} - 1)

tridiagonal matrix given as

G = [\begin{matrix} e_{1} & p_{1} & 0 & 0 & \dots & 0 & 0 \\ h_{2} & e_{2} & p_{2} & 0 & \dots & 0 & 0 \\ 0 & h_{3} & e_{3} & p_{3} & \dots & 0 & 0 \\ \dots & \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & h_{N_{z} - 2} & e_{N_{z} - 2} & p_{N_{z} - 2} \\ 0 & 0 & 0 & \dots & 0 & h_{N_{z} - 1} & e_{N_{z} - 1} \end{matrix}]

(42)

where

\begin{matrix} e_{1} & = 1 + Δ t B_{1} - 2 Δ t C_{1}, & p_{1} = Δ t (C_{1} - A_{1}), \\ e_{i} & = 1 + Δ t B_{i}, h_{i} = - Δ t C_{i}, & p_{i} = - Δ t A_{i}, i = 2, \dots, N_{z} - 2, \\ e_{N_{z} - 1} & = 1 + Δ t B_{N_{z} - 1} - 2 Δ t A_{N_{z} - 1}, & h_{N_{z} - 1} = - Δ t (C_{N_{z} - 1} - A_{N_{z} - 1}) . \end{matrix}
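Because G is tridiagonal, each system in (41) can be solved in linear time. The sketch below assembles the three diagonals of (42) from given coefficient lists and solves the system with the Thomas algorithm. The choice of solver is an assumption made for illustration (the source does not state which solver is used), no pivoting is performed, and the function names are hypothetical.

```python
def build_G(A, B, C, dt):
    """Sub-, main and super-diagonal of G in (42).

    A, B, C are lists indexed by i = 0, ..., N_z (only i = 1, ..., N_z - 1 are used).
    Requires N_z >= 3.
    """
    n_z = len(A) - 1
    rows = range(1, n_z)
    sub = [-dt * C[i] for i in rows]
    diag = [1.0 + dt * B[i] for i in rows]
    sup = [-dt * A[i] for i in rows]
    # First and last rows absorb the linear extrapolation at the z-boundaries.
    diag[0] = 1.0 + dt * B[1] - 2.0 * dt * C[1]
    sup[0] = dt * (C[1] - A[1])
    diag[-1] = 1.0 + dt * B[n_z - 1] - 2.0 * dt * A[n_z - 1]
    sub[-1] = -dt * (C[n_z - 1] - A[n_z - 1])
    return sub, diag, sup

def thomas_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system; sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for j in range(1, n):                      # forward elimination
        denom = diag[j] - sub[j] * c[j - 1]
        c[j] = sup[j] / denom
        d[j] = (rhs[j] - sub[j] * d[j - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for j in range(n - 2, -1, -1):             # back substitution
        x[j] = d[j] - c[j] * x[j + 1]
    return x

# Made-up constant coefficients with N_z = 6, for illustration only.
A, B, C = [1.0] * 7, [2.01] * 7, [1.0] * 7
sub, diag, sup = build_G(A, B, C, dt=0.1)
rhs = [1.0, 2.0, 3.0, 4.0, 5.0]
V = thomas_solve(sub, diag, sup, rhs)
```

Since G does not depend on k, the same three diagonals can be reused for every storage level within a time step when the coefficients are fixed.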

It now remains to obtain the values

V_{0, k}^{n}, V_{N_{z}, k}^{n}, V_{i, 0}^{n}

, and

V_{i, N_{q}}^{n}

. To do so, we substitute the boundary conditions (38) in the difference equation (35) to obtain

\begin{matrix} \frac{\partial V}{\partial t} (t, z^{R}, q) + L_{1} V (t, z^{R}, q) + Γ (t, z^{R}, υ^{*}) & = 0, \end{matrix}

(43)

where for

i = 0, L_{1} V = - κ_{R} {\underset{̲}{z}}^{R} \frac{\partial V}{\partial z^{R}} - δ V

, while for

i = N_{z}, L_{1} V = - κ_{R} {\bar{z}}^{R} \frac{\partial V}{\partial z^{R}} - δ V

. Following similar steps as above, we obtain the following for

z^{R} = {\underset{̲}{z}}^{R}

:

K V_{0}^{n} = 𝚿_{0}^{n + 1} + Δ t A_{0} V_{1}^{n}, for n = 0, \dots, N_{t} - 1,

(44)

V_{0}^{n} = {(V_{0, 1}^{n}, \dots, V_{0, N_{q} - 1}^{n})}^{T}

,

𝚿_{0}^{n + 1} = {(Ψ_{0, 1}^{n + 1}, \dots, Ψ_{0, N_{q} - 1}^{n + 1})}^{T}

,

V_{1}^{n} = {(V_{1, 1}^{n}, \dots, V_{1, N_{q} - 1}^{n})}^{T}

.

Similarly, for

z^{R} = {\bar{z}}^{R}

, we have

M V_{N_{z}}^{n} = 𝚿_{N_{z}}^{n + 1} + Δ t C_{N_{z}} V_{N_{z} - 1}^{n}, for n = 0, \dots, N_{t} - 1,

(45)

V_{N_{z}}^{n} = {(V_{N_{z}, 1}^{n}, \dots, V_{N_{z}, N_{q} - 1}^{n})}^{T}

,

𝚿_{N_{z}}^{n + 1} = {(Ψ_{N_{z}, 1}^{n + 1}, \dots, Ψ_{N_{z}, N_{q} - 1}^{n + 1})}^{T}

,

V_{N_{z} - 1}^{n} = {(V_{N_{z} - 1, 1}^{n}, \dots, V_{N_{z} - 1, N_{q} - 1}^{n})}^{T}

. Letting

I_{N_{q} - 1}

denote the

(N_{q} - 1) \times (N_{q} - 1)

identity matrix, K and M are

(N_{q} - 1) \times (N_{q} - 1)

diagonal matrices given by

K = (1 + Δ t B_{0}) I_{N_{q} - 1}, M = (1 + Δ t C_{N_{z}}) I_{N_{q} - 1} .

(46)

Substituting

k = 0

and

k = N_{q}

, respectively, in the difference equation (35), we obtain the following for

n = 0, \dots, N_{t} - 1

:

\begin{matrix} G V_{0}^{n} & = 𝚿_{0}^{n + 1}, for i = 1, \dots, N_{z} - 1, \\ G V_{N_{q}}^{n} & = 𝚿_{N_{q}}^{n + 1}, for i = 1, \dots, N_{z} - 1, \end{matrix}

(47)

V_{0}^{n} = {(V_{1, 0}^{n}, \dots, V_{N_{z} - 1, 0}^{n})}^{T}

,

𝚿_{0}^{n + 1} = {(Ψ_{1, 0}^{n + 1}, \dots, Ψ_{N_{z} - 1, 0}^{n + 1})}^{T}

,

V_{N_{q}}^{n} = {(V_{1, N_{q}}^{n}, \dots, V_{N_{z} - 1, N_{q}}^{n})}^{T}

,

𝚿_{N_{q}}^{n + 1} = {(Ψ_{1, N_{q}}^{n + 1}, \dots, Ψ_{N_{z} - 1, N_{q}}^{n + 1})}^{T}

. Further details can be found in [2].

Backward Recursion Algorithm: The approximate optimal control presented in (27), along with the value functions in (41), (44), (45), and (47), can be computed using Algorithm 1, proceeding backward in time from the terminal time step

N_{t}

. To determine

υ^{*} = υ_{i, k}^{* n}

, we take into account the storage level and distinguish between scenarios of unsatisfied demand and overproduction. This classification indicates which subinterval of

U_{d}^{n}

should be evaluated. Subsequently, an optimization procedure is carried out to identify the control value that minimizes the objective function. The resulting control is then interpreted as the optimal decision for the given state variables at grid points

z_{i}^{R}

and

q_{k}

. This procedure is applied iteratively across all combinations of grid points to recover the complete set of optimal controls.

5. Numerical Results

In this section, we discuss the numerical solution of the prosumer’s problem. The results are obtained by implementing Algorithm 1 to compute the optimal strategies and the value functions, whose properties we then study. As the terminal condition, we consider a penalization problem modeled as

Φ (q) = \{\begin{matrix} \frac{P_{p e n} m_{Q} c_{P} (q_{p e n} - q)}{η_{E}^{C}}, & q < q_{p e n}, \\ 0, & q \geq q_{p e n} . \end{matrix}

(48)
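The terminal penalty (48) is straightforward to evaluate. The sketch below is a direct transcription; the parameter values in the example are hypothetical and do not come from Table 1.

```python
def terminal_cost(q, q_pen, P_pen, m_c, eta_C):
    """Terminal penalty (48), with m_c standing for the product m_Q * c_P."""
    if q < q_pen:
        return P_pen * m_c * (q_pen - q) / eta_C
    return 0.0

# Hypothetical values, for illustration only.
cost_below = terminal_cost(q=30.0, q_pen=40.0, P_pen=0.1, m_c=40.0, eta_C=0.9)
cost_above = terminal_cost(q=55.0, q_pen=40.0, P_pen=0.1, m_c=40.0, eta_C=0.9)
```

The penalty grows linearly with the temperature shortfall and is scaled by the inverse of the charging efficiency, so a less efficient storage is penalized more for the same shortfall.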

For the purpose of the numerical simulations, the seasonality function and heat price are modeled, respectively, by

\begin{matrix} μ^{R} (t) & = c_{0} + c cos (\frac{2 π t}{ρ}), P_{b u y} (t) = 𝓁_{0} + 𝓁 cos (\frac{2 π t}{ρ}), P_{s e l l} (t) = P_{b u y} (t) - ξ . \end{matrix}

(49)
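The three functions in (49) share the same cosine form and can be sketched as follows. The coefficient values are placeholders for illustration and are not the calibrated values.

```python
import math

def seasonal(t, level, amplitude, period):
    """Common form level + amplitude * cos(2 * pi * t / period) used in (49)."""
    return level + amplitude * math.cos(2.0 * math.pi * t / period)

def mu_R(t, c0, c, rho):
    """Seasonality of the residual demand."""
    return seasonal(t, c0, c, rho)

def price_buy(t, l0, l, rho):
    """Heat buying price."""
    return seasonal(t, l0, l, rho)

def price_sell(t, l0, l, rho, xi):
    """Heat selling price: the buying price reduced by the spread xi."""
    return price_buy(t, l0, l, rho) - xi

# Hypothetical coefficients, for illustration only.
rho = 365.0
winter_demand = mu_R(0.0, c0=1.0, c=2.0, rho=rho)        # t = 0: cosine at its maximum
summer_demand = mu_R(rho / 2.0, c0=1.0, c=2.0, rho=rho)  # half a period later
spread = price_buy(100.0, 0.1, 0.02, rho) - price_sell(100.0, 0.1, 0.02, rho, xi=0.03)
```

With a positive amplitude, both the seasonal demand and the buying price peak at t = 0 and reach their minimum half a period later.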

The model parameter values are listed in Table 1.

Parameters

γ, c_{0}, c, b_{1}

, and

b_{2}

are calibrated, and the calibration procedure is outlined in [2].

Terminal Cost: Figure 2 shows the terminal cost

Φ (q)

as a function of the residual demand r and the ES storage level q. Here, the terminal cost is formulated as a penalization problem. By construction as in (48), we observe that the value function is constant with respect to the residual demand at the terminal time. However, if the storage level falls below the reference temperature

q_{r e f}

= 40 °C, the prosumer incurs a cost proportional to the deviation from

q_{r e f}

and inversely proportional to the charging efficiency

η_{E}^{C}

. This cost increases the further the storage temperature is below the reference. For temperatures above the reference, no additional cost is incurred, and the storage is effectively considered to have no residual value.

Value Function and Optimal Strategy for Time-Dependent Heat Buying and Selling Prices: Figure 3 depicts the value functions and optimal strategies of the prosumer at the initial time

t = 0

and on day

t = 362

. From the top- and bottom-left panels, we can observe that as the ES temperature increases, the prosumer incurs lower costs. Conversely, as the residual demand increases, the prosumer incurs higher costs. The top-right panel shows that, for an unsatisfied demand (

r \geq 0

), the prosumer satisfies all residual demand via the CHS if the ES is empty; otherwise, it discharges the ES. In the case of overproduction (

r < 0

), all excess production is sold to the CHS. From the bottom-right panel, we can note that, for an unsatisfied demand, the prosumer satisfies all residual demand via the CHS when

q < 40

°C. For

q \geq 40

°C, all residual demand is satisfied by the ES. In the case of overproduction, the prosumer first stores some energy in the ES and subsequently sells any remaining excess to the CHS. Since the horizon is one year (365 days), with a penalization cost at the terminal time, we can observe that on day 362, the prosumer begins filling the storage close to the penalty temperature

q_{r e f} = 40

°C to avoid higher terminal costs.

Figure 4 shows the prosumer’s value functions and optimal strategies as functions of the residual demand r and the ES temperature level q on day

t = 230

. In the top-left panel, we can observe that, for a given residual demand, the prosumer incurs lower costs as the ES temperature increases. The expected aggregated discounted cost is higher for the strongest unsatisfied residual demand and lower for the strongest overproduction. On the other hand, the bottom-left panel shows that the cost increases with respect to the residual demand, with the highest cost incurred for an empty storage. The smallest cost is incurred for a full storage. In the top-right panel, we can observe that, for both the strongest and smallest unsatisfied demand, the prosumer first purchases thermal energy from the CHS. Afterwards, for a sufficiently filled ES, it discharges the storage to meet the unsatisfied demand. For the strongest overproduction, the prosumer stores all excess production in the ES and only sells to the CHS if the ES is full. In the bottom-right panel, for both an empty and

50 %

-full ES, the prosumer stores the excess production in the ES in the case of overproduction and covers all unsatisfied demand via the CHS otherwise. For a full ES, the prosumer only compensates for the heat loss of the ES to the environment in the case of overproduction, whereas for an unsatisfied demand, all residual demand is satisfied by the ES.

Value Function and Optimal Strategy for Constant Heat Buying and Selling Prices: For the case of constant heat buying and selling prices, we focus on the value function and optimal strategies as functions of the residual demand r and the ES temperature level q. Let

P_{b u y}^{C}

and

P_{s e l l}^{C}

denote the constant heat buying and selling prices, respectively. In the following, we model

P_{b u y}^{C}

and

P_{s e l l}^{C}

as

P_{b u y}^{C} = max_{t \in [0, T]} P_{b u y} (t), \quad P_{s e l l}^{C} = P_{b u y}^{C} - ξ,

(50)

where

P_{b u y}, P_{s e l l}

, and

ξ

are the same as in (49). In Figure 5, we focus on the top- and bottom-right panels since the top- and bottom-left panels have the same interpretation as in Figure 4. In the top-right panel of Figure 5, for both the strongest and the smallest unsatisfied demand, the prosumer discharges the ES if it is not empty; otherwise, all residual demand is satisfied by the CHS, at a higher cost. For the strongest overproduction, all excess thermal energy is sold to the CHS for revenue. As shown in the bottom-right panel, for an empty ES, the prosumer sells all excess production to the CHS in the case of overproduction and satisfies all residual demand via the CHS in the case of unsatisfied demand. For a half-full or full ES, the prosumer again sells all excess production to the CHS in the event of overproduction, whereas for an unsatisfied demand, it discharges the ES.

Let

V^{m a x}

denote the maximum value function of the prosumer for time-dependent heat buying and selling prices.

V_{C o n s t a n t}^{m a x}

denotes the maximum value function of the prosumer for constant heat buying and selling prices, and

{\tilde{V}}^{m a x}

is the maximum value function of the consumer for time-dependent heat buying and selling prices.

In Table 2, we note that the prosumer incurs a higher expected aggregated discounted cost under the time-dependent heat prices model than with its constant counterpart. This is due to the fact that the time-dependent heat prices model reflects the changes in heat prices during cold and warm seasons, which is not observed in the constant pricing case. Since the consumer is not equipped with heat production and storage units, it relies solely on the CHS for its unsatisfied residual demand, hence leading to a much higher expected aggregated discounted cost. From the values of

{\tilde{V}}^{m a x}

and

V^{m a x}

, we can compute the expected consumer investment cost into heat production and storage units in order to reduce the cost of satisfying its heating and hot water demands. This investment cost is denoted by

{\tilde{C}}_{i n v e s t} = {\tilde{V}}^{m a x} - V^{m a x}

.

5.1. Sensitivity Analysis

In this section, we perform a sensitivity analysis on

σ_{0}, η_{E}^{C}

, and

η_{E}^{D}

to study their impact on the expected aggregate discounted cost. The results presented correspond to the case of time-dependent heat buying and selling prices. All parameters are fixed, with values given in Table 1, except for

σ_{0}, η_{E}^{C}

and

η_{E}^{D}

, which are varied. Throughout this section, we let

V^{m a x}

denote the maximum value function, as in Table 2.

In Table 3, we can observe that as the impact of the weather conditions increases, the aggregated discounted cost becomes higher. This is the case because higher values of

σ_{0}

increase the residual demand, which, in turn, increases the prosumer’s cost. Thus, the more pronounced the impact of weather conditions is on the heating system, the higher the aggregated discounted cost the prosumer will incur. For example, for

σ_{0} = 0

, the prosumer incurs a maximum cost of

V^{m a x} = 1346.4

EUR whereas for

σ_{0} = 0.4

,

V^{m a x} = 1562.9

EUR, which is approximately a

16.08 %

increase from the case of no common noise.
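The quoted percentage follows from the two cost values reported above, as the short check below shows. The helper name `relative_increase` is hypothetical.

```python
def relative_increase(base, value):
    """Relative increase of value over base, in percent."""
    return 100.0 * (value - base) / base

# Maximum costs (in EUR) quoted in the text for sigma_0 = 0 and sigma_0 = 0.4.
increase_sigma0 = relative_increase(1346.4, 1562.9)
```

Rounded to two decimals, `increase_sigma0` equals 16.08, matching the figure stated in the text.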

In Table 4, we can observe that the better the discharging efficiency, the smaller the cost incurred by the prosumer. For a weak discharging efficiency, the prosumer, when satisfying residual demand via the ES, must discharge more thermal energy to compensate for the loss during the process. A better discharging efficiency therefore reduces the losses during discharge and thereby lowers the prosumer’s aggregated discounted cost. For a discharging efficiency

η_{E} = \frac{1}{0.9}

, we can observe an increase of approximately

1.85 %

in the aggregated discounted cost as compared to the case of a perfect discharging efficiency

η_{E} = 1

.

As the charging efficiency improves in Table 5, we can observe that the prosumer incurs a smaller aggregated discounted cost. For a weak charging efficiency, not all overproduction can be stored in the ES, due to the losses to the environment during charging. Therefore, as the charging efficiency improves, more thermal energy can be stored in the ES for later use. For a charging efficiency of 0.9, we observe an approximate

1.94 %

increase in the aggregated discounted cost as compared to a perfect charging efficiency.

5.2. Optimal Path of ES Level

In Figure 6, we show the optimal path of the ES temperature level Q together with the residual demand R, the seasonality function $μ^{R}$, and the absolute optimal control $α^{*} R$ for a time-dependent heat price. At the initial time, we assume that the prosumer starts with a full storage. Hence, when residual demand is unsatisfied, the prosumer discharges the ES until it becomes empty. When there is overproduction, the prosumer slowly charges the ES and brings it close to full in late summer. We further observe that the prosumer discharges the ES to cover the unsatisfied demand for $R > 0$ and charges the ES for $R < 0$.
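
The charging and discharging behaviour along the path can be mimicked with a one-step energy balance. The sketch below is an illustration under strong assumptions, not the model solved in this work: it treats the ES as a sensible-heat store with thermal capacity $m_{Q} c_{P}$ (values from Table 1), ignores heat losses to the environment as well as the charging and discharging efficiencies, and clips the temperature to the storage bounds. The function name `step_storage` and the sign convention (positive control means discharge) are ours.

```python
M_Q = 7854.0                 # storage mass in kg (Table 1)
C_P = 0.0012                 # specific heat capacity in kWh/(kg K) (Table 1)
Q_MIN, Q_MAX = 25.0, 85.0    # storage temperature bounds in degrees Celsius (Table 1)


def step_storage(q: float, control_kw: float, dt_h: float = 1.0) -> float:
    """Advance the ES temperature by one time step of length `dt_h` hours.

    control_kw > 0 discharges the storage, control_kw < 0 charges it.
    Losses and efficiencies are deliberately ignored in this sketch.
    """
    q_next = q - control_kw * dt_h / (M_Q * C_P)
    # Keep the temperature inside the admissible range.
    return min(max(q_next, Q_MIN), Q_MAX)


# Start from a full storage and cover 1 kW of residual demand for one hour.
q_after_discharge = step_storage(Q_MAX, control_kw=1.0)
# Absorb 1 kW of overproduction for one hour from a partially filled storage.
q_after_charge = step_storage(40.0, control_kw=-1.0)
print(f"{q_after_discharge:.3f}, {q_after_charge:.3f}")
```

With the Table 1 values, one hour at 1 kW changes the temperature by roughly 0.1 K, which is consistent with the slow charging over the summer described above.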

6. Summary and Outlook

In this work, we investigated a stochastic optimal control problem involving a prosumer and a consumer, both connected to a CHS. The prosumer is equipped with a local renewable heat production source (solar collector) and a local storage unit. The ES allows the prosumer to store excess thermal energy for future use; however, we assume that it is subject to charging and discharging efficiencies. In contrast, the consumer has no local production or storage unit and is therefore constantly subject to an unsatisfied demand. We focused primarily on the prosumer problem, as the consumer always satisfies all residual demand directly via the CHS. The prosumer’s problem was formulated as a mathematical optimization problem, which we solved using semi-Lagrangian techniques. We presented the numerical results for both a time-dependent and a constant heat price formulation and observed that the prosumer incurred a higher maximum expected aggregated discounted cost under the time-dependent price than under the constant price. We further performed a sensitivity analysis on the common noise and charging/discharging efficiency parameters. For the common noise, we noted that for $σ_{0} = 0.4$, the maximum expected aggregated discounted cost increases by approximately 16.08% as compared to no common noise. For the discharging efficiency, we observed that for $η_{E} = \frac{1}{0.9}$, the prosumer incurs a cost about 1.85% higher than in the case of a perfect discharging efficiency. For the charging efficiency, we noted that for $η_{E} = 0.90$, the maximum expected aggregated discounted cost increases by approximately 1.94% as compared to the case of a perfect charging efficiency. Our numerical results also highlight the value function and the optimal strategies of the prosumer. Finally, for the consumer, we evaluated the maximum expected investment required in local heat production and storage to transform into a prosumer.

Looking ahead, it would be interesting to incorporate additional weather factors, such as ambient temperature, to better model the residual demand. Furthermore, modeling electricity prices as a stochastic differential equation (SDE) could provide more realistic dynamics. These extensions are currently under investigation in ongoing research.

Author Contributions

Conceptualization, M.G.S.; Methodology, M.G.S.; Software, M.G.S.; Validation, M.G.S.; Formal Analysis, M.G.S. and J.N.; Writing—Original Draft, M.G.S.; Writing—Review and Editing, M.G.S. and J.N.; Supervision, J.N.; Funding Acquisition, M.G.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was carried out with the aid of a grant from the International Development Research Centre, Ottawa, Canada, www.idrc.ca (accessed on 21 November 2025); and with financial support from the Government of Canada, provided through Global Affairs Canada (GAC), www.international.gc.ca (accessed on 21 November 2025).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gjoka, K.; Rismanchi, B.; Crawford, R.H. Fifth-generation district heating and cooling systems: A review of recent advancements and implementation barriers. Renew. Sustain. Energy Rev. 2023, 171, 112997.
Ganet Somé, M. Stochastic optimal control of prosumers in a district heating system. arXiv 2025, arXiv:2501.09088.
Bilardo, M.; Sandrone, F.; Zanzottera, G.; Fabrizio, E. Modelling a fifth generation bidirectional low temperature district heating and cooling (5gdhc) network for nearly zero energy district (nzed). Energy Rep. 2021, 7, 8390–8405.
Bünning, F.; Wetter, M.; Fuchs, M.; Müller, D. Bidirectional low temperature district energy systems with agent-based control: Performance comparison and operation optimization. Appl. Energy 2018, 209, 502–515.
Li, H.; Wang, S.J. Challenges in smart low-temperature district heating development. Energy Procedia 2014, 61, 1472–1475.
Alasseur, C.; Ben Taher, I.; Matoussi, A. An extended mean field game for storage in smart grids. J. Optim. Theory Appl. 2020, 184, 644–670.
Takam, P.H.; Wunderlich, R. Cost-optimal management of a residential heating system with a geothermal energy storage under uncertainty. Int. J. Dyn. Control 2025, 13, 424.
Djehiche, B.; Barreiro-Gomez, J.; Tembine, H. Price dynamics for electricity in smart grid via mean-field-type games. Dyn. Games Appl. 2020, 10, 798–818.
Dumitrescu, R.; Leutscher, M.; Tankov, P. Energy transition under scenario uncertainty: A mean-field game of stopping with common noise. Math. Financ. Econ. 2024, 18, 233–274.
Elie, R.; Hubert, E.; Mastrolia, T.; Possamaï, D. Mean–field moral hazard for optimal energy demand response management. Math. Financ. 2021, 31, 399–473.
Escribe, C.; Garnier, J.; Gobet, E. A mean field game model for renewable investment under long-term uncertainty and risk aversion. Dyn. Games Appl. 2024, 14, 1093–1130.
Frihi, Z.E.O.; Choutri, S.E.; Barreiro-Gomez, J.; Tembine, H. Hierarchical mean-field type control of price dynamics for electricity in smart grid. J. Sys. Sci. Complex. 2022, 35, 1–17.
Fujii, M.; Takahashi, A. A mean field game approach to equilibrium pricing with market clearing condition. SIAM J. Control Optim. 2022, 60, 259–279.
Verrilli, F.; Srinivasan, S.; Gambino, G.; Canelli, M.; Himanka, M.; Del Vecchio, C.; Sasso, M.; Glielmo, L. Model predictive control-based optimal operations of district heating system with thermal energy storage and flexible loads. IEEE Trans. Autom. Sci. Eng. 2016, 14, 547–557.
Neckel, T.; Rupp, F. Random Differential Equations in Scientific Computing; Walter de Gruyter: Berlin, Germany, 2013.
D’Halluin, Y.; Forsyth, P.A.; Labahn, G. A semi-lagrangian approach for american asian options under jump diffusion. SIAM J. Sci. Comput. 2005, 27, 315–345.
Chen, Z.; Forsyth, P.A. A semi-lagrangian approach for natural gas storage valuation and optimal operation. SIAM J. Sci. Comput. 2008, 30, 339–368.
Ware, A. Accurate semi-lagrangian time stepping for stochastic optimal control problems with application to the valuation of natural gas storage. SIAM J. Financ. Math. 2013, 4, 427–451.
Duffy, D.J. Finite Difference Methods in Financial Engineering: A Partial Differential Equation Approach; John Wiley & Sons: Hoboken, NJ, USA, 2013.
Yong, J.; Zhou, X.Y. Stochastic Controls: Hamiltonian Systems and HJB Equations; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999; Volume 43.
Touzi, N. Stochastic Control Problems, Viscosity Solutions and Application to Finance; Scuola Normale Superiore; Springer: Berlin/Heidelberg, Germany, 2007.
Pham, H. Continuous-Time Stochastic Control and Optimization with Financial Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009; Volume 61.
Courant, R.; Friedrichs, K.; Lewy, H. Über die partiellen Differenzengleichungen der mathematischen Physik. Math. Ann. 1928, 100, 32–74.

Figure 1. A model with a consumer and a prosumer connected to a community heating system.

Figure 2. Terminal cost function $Φ(q) = V(t_{N_{t}}, x)$ for a penalization problem.

Figure 3. Value functions and optimal strategies of the prosumer at $t = 0, 362$ days for time-dependent heat buying and selling prices.

Figure 4. Value functions (left) and optimal strategies (right) as functions of the residual demand r and ES temperature level q at $t = 230$ days for time-dependent heat buying and selling prices.

Figure 5. Value functions (left) and optimal strategies (right) as functions of the residual demand r and ES temperature level q at $t = 230$ days for constant heat buying and selling prices.

Figure 6. Optimal path of the ES Q (blue), seasonality function $μ^{R}$ (magenta), residual demand R (red), and optimal absolute control $α^{*} R$ (green). The minimum and maximum storage levels are represented by dotted lines (reddish orange and golden yellow, respectively).

Table 1. Model and discretization parameters.

Parameters	Values	Units	Parameters	Values	Units
$κ_{R}$	0.025	$h^{-1}$	$S$	0.335	$\frac{€}{kWh}$
$σ_{R}, σ_{0}$	0.005, 0.4	$\frac{kW}{\sqrt{h}}$	$P_{pen}$	0.325	$\frac{€}{kWh}$
$c_{0}, c$	0.37, 1.00	$kW$	$ξ$	0.02	$\frac{€}{kWh}$
			$\ell_{0}, \ell$	0.17, 0.15	$\frac{€}{kWh}$
$ρ$	8760	$h$	$\underline{z}^{R}, \bar{z}^{R}$	−5.37, 5.37	$kW$
$b_{1}$	0.01		$T, Δt$	8760, 1	$h$
$b_{2}$	0.012	$K^{-1}$	$N_{t}$	8760	
$m_{Q}$	7854	$kg$	$η_{E}^{D}, η_{E}^{C}$	0.95, 0.95	
$c_{P}$	0.0012	$\frac{kWh}{kg\,K}$	$N_{z}, N_{q}$	85, 60	
$A$	21.99	$m^{2}$	$P_{c}, i_{d}$	20, 25	°C
$γ$	$2.34 \times 10^{-4}$	$\frac{kW}{m^{2} K}$	$\underline{q}, q_{pen}, \bar{q}$	25, 40, 85	°C
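
Two derived quantities help to interpret Table 1. The sketch below is a back-of-the-envelope check under stated assumptions rather than part of the original analysis: (i) the ES is treated as a sensible-heat store, so that its usable capacity is $m_{Q} c_{P} (\bar{q} - \underline{q})$; (ii) the deseasonalized residual demand is assumed to follow an Ornstein-Uhlenbeck process with mean-reversion speed $κ_{R}$, driven by two independent Brownian motions with volatilities $σ_{R}$ and $σ_{0}$, so that its stationary standard deviation is $\sqrt{(σ_{R}^{2} + σ_{0}^{2}) / (2 κ_{R})}$. All variable names are ours.

```python
import math

# Values taken from Table 1.
m_q, c_p = 7854.0, 0.0012        # storage mass in kg, specific heat in kWh/(kg K)
q_min, q_max = 25.0, 85.0        # storage temperature bounds in degrees Celsius
kappa_r = 0.025                  # mean-reversion speed in 1/h
sigma_r, sigma_0 = 0.005, 0.4    # volatilities in kW/sqrt(h)

# Assumption (i): sensible-heat storage.
thermal_capacity = m_q * c_p                           # kWh per kelvin
usable_capacity = thermal_capacity * (q_max - q_min)   # kWh between empty and full

# Assumption (ii): stationary standard deviation of an OU process
# with two independent noise sources.
stationary_std = math.sqrt((sigma_r**2 + sigma_0**2) / (2.0 * kappa_r))

print(f"thermal capacity: {thermal_capacity:.2f} kWh/K")    # prints 9.42 kWh/K
print(f"usable capacity:  {usable_capacity:.1f} kWh")       # prints 565.5 kWh
print(f"stationary std:   {stationary_std:.2f} kW")         # prints 1.79 kW
print(f"3 x std:          {3 * stationary_std:.2f} kW")     # prints 5.37 kW
```

Under these assumptions, three stationary standard deviations amount to about 5.37 kW, which coincides with the truncation bounds $\underline{z}^{R}, \bar{z}^{R}$ listed in Table 1. This suggests, but does not prove, that the grid bounds were chosen in this way.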

Table 2. Maximum value function for an imperfect efficiency with time-dependent and constant heat buying and selling prices, in EUR, for a prosumer and a consumer.

Value Function	Numerical Value
$V^{max}$	1562.9
$V_{Constant}^{max}$	1465.1
$\tilde{V}^{max}$	40,474.0

Table 3. Maximum value function from varying common noise volatility coefficient

σ_{0}