Variables Reduction in Sequential Resource Allocation Problems

Abstract: This paper presents a general framework to address diverse, notoriously difficult problems arising in the areas of optimal resource management, exploitation of natural reserves, pension fund valuation, environmental protection, and storage operation. Using some common abstract features of this problem class, we present a technique which provides a significant reduction of decision variables. As an application, we discuss a battery storage control problem to show how a decision problem, which is practically unsolvable in its original formulation, can be treated by our method.


Introduction
Optimization of sequential decision-making under uncertainty arises in different fields, such as finance, economics, robotics, manufacturing, and telecommunication. These problems are frequently discussed under the framework of real options, as applications of optimal stochastic control. Thereby, the real options viewpoint (see [1,2]) highlights the freedom in the choice of decisions and consistently uses operational flexibility for strategy optimization. The classical contributions [3,4] were among the first to place sequential decision optimization into the real options context, with applications in manufacturing [5], investment planning [6], mining [7], and commodities [8]. In these works, a formulation of decision problems in continuous time was preferred: using a jump-diffusion setting and its well-established stochastic control toolbox, the so-called Hamilton-Jacobi-Bellman (HJB) equations (see [9]) and Backward Stochastic Differential Equations (BSDE) (see [10]) became applicable [11][12][13]. In discrete-time modeling, sequential decision problems have been routinely treated in terms of Markov decision theory [14,15], whose approximate techniques are known as Approximate Dynamic Programming (see [16]) and comprise diverse heuristic numerical approaches.
In this paper we discuss a specific class of discrete-time stochastic control problems from the viewpoint of real options, and use a certain managerial flexibility which can be expressed in terms of a virtual reserve price. We show that in the optimal regime at any time, the actions are connected to each other (in a certain sense) via virtual reserve price. This observation can be used to significantly reduce the range of actions available for decision choice by excluding at each step those which cannot be optimal. More specifically, we consider an abstract but generic situation which frequently arises in sequential decision-making, when a certain activity (production plan) must be optimally chosen to meet the right balance between the current costs of the activity and the consumption of some resource. While all activity costs have an immediate effect, the impact of resource consumption is uncertain and becomes relevant in the future. In practice, one is frequently confronted with a huge range of possibilities to choose activities, and their combinations form a complex space whose structure is usually determined by numerous inter-relations. Although a solution to such sequential decision problem can be theoretically obtained in terms of the standard backward induction, its practical implementation is virtually infeasible due to the high complexity of decisions.
It turns out that a certain transformation can be of great help in such situations. In particular, under specific but natural assumptions, we show how the decision variables space can be reduced to a simple one-parameter family. Such reduction is achieved by a solution to a separate deterministic optimization problem which is usually easily obtainable. Using such re-formulation, diverse numerical techniques for backward induction can be applied. In this context, in our particular situations, the Bellman's optimality principle turns out to be equivalent to certain fixed-point equations, which might lead to alternative ways to further simplify computational efforts in the backward induction.
The paper is organized as follows. Section 2 reviews the dynamic programming (Bellman) principle underlying classical and stochastic dynamic programming and explains how our method is placed within this context and how it is used in practice. Section 3 presents the problem and our approach to variables reduction within a general framework. Section 4 introduces an application to battery control, which is revisited in Section 5 to obtain a variables reduction for battery storages. Section 6 is devoted to the formal justification of our techniques, while Section 7 concludes.

Dynamic Principle for Optimal Switching and Reduction of Decision Variables
In applications, sequential decision-making is usually addressed under the framework of discrete-time stochastic control. The theory of Markov decision processes and dynamic programming provides a variety of methods to deal with such questions. In generic situations, deriving analytical solutions for even the simplest decision processes may be cumbersome (see [10,14,16]). Furthermore, since closed-form solutions to practically important control problems are usually unavailable, numerical approximations have become popular among practitioners to obtain approximately optimal control policies. Although a huge variety of computational methods has been developed for this purpose, typical real-world problems are usually too complex for the existing solution techniques, in particular if the state dimension of the underlying controlled evolution is high. Let us review the finite-horizon Markov decision theory following [17].

Controlled Markov processes:
Consider a random dynamics on a finite time horizon $0, \dots, T$ whose state $x$ evolves in $E$ and is controlled by actions $a$ from an action set $A$. For each $a \in A$, we assume that $K^a_t(x, dx')$ is a stochastic transition kernel on $E$. A mapping $\pi_t : E \to A$ which describes the action that the controller takes at time $t$ is called a decision rule. A sequence of decision rules $\pi = (\pi_t)_{t=0}^{T-1}$ is called a policy. For each initial point $x_0 \in E$ and each policy $\pi = (\pi_t)_{t=0}^{T-1}$, there exists a probability measure $P^{x_0,\pi}$ and a stochastic process $(X_t)_{t=0}^{T}$ satisfying the initial condition $P^{x_0,\pi}(X_0 = x_0) = 1$ such that
$$P^{x_0,\pi}(X_{t+1} \in B \mid X_0, \dots, X_t) = K^{\pi_t(X_t)}_t(X_t, B) \qquad (1)$$
holds for each $B \subset E$ at all times $t = 0, \dots, T-1$, i.e., given that the system is in state $X_t$ at time $t$, the action $a = \pi_t(X_t)$ is used to pick the transition probability $K^{a=\pi_t(X_t)}_t(X_t, \cdot)$ which randomly drives the system from $X_t$ to $X_{t+1}$ with the distribution $K^{\pi_t(X_t)}_t(X_t, \cdot)$. Let us use $\mathcal{K}^a_t$ to denote the one-step transition operator associated with the transition kernel $K^a_t$ when the action $a \in A$ is chosen. In other words, for each action $a \in A$ the operator $\mathcal{K}^a_t$ acts on functions $v$ by
$$(\mathcal{K}^a_t v)(x) = \int_E v(x')\, K^a_t(x, dx'), \qquad x \in E, \qquad (2)$$
whenever the above integrals are well-defined.

Costs of control:
For each time $t = 0, \dots, T-1$, we are given the $t$-step reward function $r_t : E \times A \to \mathbb{R}$, where $r_t(x, a)$ represents the reward for applying an action $a \in A$ when the state of the system is $x \in E$ at time $t$. At the end of the time horizon, at time $T$, it is assumed that no action can be taken. Here, if the system is in a state $x$, a scrap value $r_T(x)$, described by a pre-specified scrap function $r_T : E \to \mathbb{R}$, is collected. Given an initial point $x_0$, the goal is to maximize the expected finite-horizon total reward, in other words, to find the argument $\pi^* = (\pi^*_t)_{t=0}^{T-1}$ such that
$$\pi^* = \operatorname{argmax}_{\pi \in \mathcal{A}} \mathbb{E}^{x_0,\pi}\Big(\sum_{t=0}^{T-1} r_t(X_t, \pi_t(X_t)) + r_T(X_T)\Big), \qquad (3)$$
where $\mathcal{A}$ is the set of all policies, and $\mathbb{E}^{x_0,\pi}$ denotes the expectation over the controlled Markov chain defined by (1).

Decision optimization:
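The maximization (3) is well-defined under diverse additional assumptions (see [14], p. 199). The calculation of the optimal policy is addressed in the following setting. For $t = 0, \dots, T-1$, introduce the Bellman operator
$$\mathcal{T}_t v(x) = \sup_{a \in A}\big(r_t(x, a) + \mathcal{K}^a_t v(x)\big), \qquad x \in E,$$
which acts on each measurable function $v : E \to \mathbb{R}$ where the integrals $\mathcal{K}^a_t v$ for all $a \in A$ exist. Furthermore, consider the Bellman recursion
$$v_T = r_T, \qquad v_t = \mathcal{T}_t v_{t+1} \quad \text{for } t = T-1, \dots, 0.$$
Under appropriate assumptions, there exists a recursive solution $(v^*_t)_{t=0}^{T}$ to the Bellman recursion, which gives the so-called value functions and determines an optimal policy $\pi^*$ via
$$\pi^*_t(x) = \operatorname{argmax}_{a \in A}\big(r_t(x, a) + \mathcal{K}^a_t v^*_{t+1}(x)\big), \qquad x \in E, \quad t = 0, \dots, T-1.$$

Stochastic switching:
Consider now a Markov decision model whose state evolution consists of one controllable and one uncontrollable component. To be more specific, we assume that the state space $E = P \times Z$ is the product of a space $P$ (operation modes) and a set $Z$ (states of environment), which is a subset $Z \subset \mathbb{R}^d$ of the Euclidean space. We suppose that at each time $t = 0, \dots, T-1$ the mode component $p \in P$ is driven by actions $a \in A$ in terms of a deterministic function
$$\alpha_t : P \times Z \times A \to P, \qquad (p, z, a) \mapsto \alpha^{p,z}_t(a),$$
where $\alpha^{p,z}_t(a) \in P$ stands for the new mode if the action $a \in A$ was taken at time $t = 0, \dots, T-1$ in the state $(p, z) \in E$. In this setting, the transition operators are given by
$$\mathcal{K}^a_t v(p, z) = \mathbb{E}\big(v(\alpha^{p,z}_t(a), Z_{t+1}) \mid Z_t = z\big), \qquad (p, z) \in E,$$
for $t = 0, \dots, T-1$ and $a \in A$, where $(Z_t)_{t=0}^{T}$ denotes the uncontrolled environment component.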
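For finite state and action sets, the Bellman recursion reviewed in this section reduces to a few array operations. The following is a minimal sketch under that assumption; the function name `backward_induction`, the array layout, and the two-state example are hypothetical choices made for illustration and are not taken from [17].

```python
import numpy as np

def backward_induction(r, r_T, K):
    """Finite-horizon Bellman recursion on finite state and action sets.

    r   : array (T, n_states, n_actions), rewards r_t(x, a)
    r_T : array (n_states,), scrap values r_T(x)
    K   : array (T, n_actions, n_states, n_states), K[t, a, x, y] = K_t^a(x, {y})
    """
    T, n_states, _ = r.shape
    v = np.empty((T + 1, n_states))
    policy = np.empty((T, n_states), dtype=int)
    v[T] = r_T
    for t in range(T - 1, -1, -1):
        # q[x, a] = r_t(x, a) + sum_y K_t^a(x, y) * v_{t+1}(y)
        q = r[t] + np.einsum("axy,y->xa", K[t], v[t + 1])
        policy[t] = q.argmax(axis=1)
        v[t] = q.max(axis=1)
    return v, policy

# Hypothetical two-state example: action 0 stays, action 1 switches the state.
T = 2
stay = np.eye(2)
switch = np.array([[0.0, 1.0], [1.0, 0.0]])
K = np.tile(np.stack([stay, switch]), (T, 1, 1, 1))
r = np.tile(np.array([[0.0, -0.5], [1.0, 0.5]]), (T, 1, 1))
r_T = np.array([0.0, 2.0])

v, policy = backward_induction(r, r_T, K)
print(v[0], policy[0])
```

In this toy example the maximization runs over two actions only. The difficulty addressed in this paper arises when the maximization over the action set in each step of the loop becomes intractable.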
Variables reduction: In the above context of stochastic switching, the application of the Bellman operator may cause numerical difficulties, particularly if the action space $A$ is high-dimensional and fragmented. Unfortunately, this situation appears frequently in practice, where high-dimensional vectors of decision variables are usually subject to numerous feasibility inter-dependencies. In such a framework, the problem may become unsolvable due to the difficulty of maximization over a complex action set $A$. The main contribution of our paper is to provide a significant reduction of the set of actions which are relevant for maximization. Under specific assumptions described in this paper, we show that there exists a one-parameter family (curve) $A^{p,z}_t \subset A$, which depends on the current time $t = 0, \dots, T-1$ and state $(p, z) \in E$, such that the domain of maximization reduces from $A$ to $A^{p,z}_t$. This yields, instead of the infeasible maximization (7), a new problem which is significantly simpler and usually admits a (numerical) solution. To determine the curve $A^{p,z}_t \subset A$ of relevant actions, a separate deterministic optimization problem must be solved. Its solution is usually obtained explicitly and provides interesting economic insights.
Contribution of this work: Using the standard Bellman principle, we explore an abstract but natural framework to reduce a potentially very large space of decision variables (actions) to a one-parameter family. Although our technique is entirely placed within the traditional Bellman principle of stochastic/classic dynamic programming and addresses a relatively narrow problem class, a wide range of applications is covered, including mining operations, pension fund management, and emission control. This approach can serve as a basis for efficient numerical algorithms for notoriously difficult and important problems from practice.

Optimal Control via Virtual Resource Price
Let us introduce the required framework more precisely. Consider an agent who is confronted with the following problem. At the beginning $t = 0, 1, \dots, T-1$ of each decision epoch, an activity plan (work schedule) $\xi_t$ is to be determined. Thereby, all costs of this work plan must be optimally balanced against their resource consumption/generation. Suppose that the limited resources are described in terms of the state variable $e \in I$ which stands for the current resource shortage and can vary within a certain interval $I \subset \mathbb{R}$. For instance, if the resource under consideration is a commodity in a storage, then $e \in I$ stands for the amount of commodity required to fill the storage to its maximal capacity. The other state variable $z \in \mathbb{R}^d$ is supposed to represent the situation in the surrounding environment. Let us agree that this environmental state variable $z$ is relevant for decisions, but cannot be influenced by the agent's actions. For instance, for commodity storage, $z$ may comprise the driving market factors of the commodity price evolution. We further assume that the environment state occurs at any time $t = 0, \dots, T$ as a realization $z = Z_t$ of an $\mathbb{R}^d$-valued Markov process $(Z_t)_{t=0}^{T}$ whose dynamics carries all information relevant for decisions.
Having observed at time t = 0, . . . , T − 1 the resource level e ∈ I and the realization of the environmental state variable z = Z t , the agent selects a plan ξ ∈ Ξ from the set Ξ of all feasible activity plans. This choice yields an immediate cost C e,z t (ξ) and causes an immediate resource consumption E e,z t (ξ) via pre-specified functions C e,z t , E e,z t on Ξ which may depend on the recent state (e, z) ∈ I × R d . While all costs are accumulated, the resource level is carried over to the next decision time t + 1 as e + E e,z t (ξ) and will influence the decision at this time. The availability of resources becomes crucial at the end t = T of the planning horizon, when a certain terminal costs C T (e, z) must be paid, which depend on the total resource level e ∈ I and on the state z ∈ R d of the environment. Under additional assumptions, such control problems are solved in terms of the so-called value functions (V t ) T t=0 which are obtained via Bellman recursions as for all arguments (e, z) ∈ I × R d representing the current situation at decision time. Please note that we implicitly agreed to penalize the violation of the admissible resource level in (10) by infinity, gaining the freedom to interpret the value function for all arguments e ∈ R with possible values +∞ in order to avoid watching restriction e + E e,z t (ξ) ∈ I in (11). The idea underlying our variable reduction scheme is based on the realization that if the resources had a market price, then an agent would minimize all activity costs taking into account the monetary value of the consumed resources. To realize this concept, we suppose that some entity (a regulatory body) could charge a resources price A ∈ R at any time t, depending on the situation (e, z). 
In the presence of such a virtual price $A \in \mathbb{R}$, the decision-maker would examine the virtual charges for the resource consumption and choose an activity accordingly, by obtaining
$$X^{e,z}_t(A) = \operatorname{argmin}_{\xi \in \Xi}\big(C^{e,z}_t(\xi) + A\,E^{e,z}_t(\xi)\big). \qquad (12)$$
Obviously, such a mapping $A \mapsto X^{e,z}_t(A)$ represents an optimal activity depending on the current state $(e, z) \in I \times \mathbb{R}^d$ and time $t = 0, \dots, T-1$, given the resource price $A \in \mathbb{R}$. To some degree, this mapping can be interpreted as the willingness to save resources by following a less profitable strategy in response to an increased value of the resource. For ease of understanding, let us postpone the discussion concerning the existence of the minimizer in (12) and the properties of the relation $A \mapsto X^{e,z}_t(A)$ which are crucial for the targeted results. The important assumption for now is that for each state $(e, z)$ there exists a bounded interval $\mathcal{A}^{e,z} \subset \mathbb{R}$ such that the one-parameter family of activities
$$\{X^{e,z}_t(A) : A \in \mathcal{A}^{e,z}\} \qquad (13)$$
contains only the "best candidates" (for the purpose of minimization in (11)). They can be used in the Bellman recursion, replacing the minimization in (11) by (16) as follows:
$$v_t(e, z) = \min_{A \in \mathcal{A}^{e,z}} \big(C^{e,z}_t(X^{e,z}_t(A)) + \tilde v_{t+1}(e + E^{e,z}_t(X^{e,z}_t(A)), z)\big). \qquad (16)$$
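Please note that now the minimization in (16) must be performed merely over a curve (13) instead of over the whole space $\Xi$ as in (11). In practical applications, such a reduction can provide a reasonable (numerical) approach to a virtually unsolvable problem. The main question here is whether the value functions obtained from the reduced recursion (14)-(16) coincide with those of the original recursion (9)-(11), that is, whether
$$v_t = V_t \quad \text{for all } t = 0, \dots, T. \qquad (17)$$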
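The reduction from the full activity space to a price-indexed curve can be checked numerically on a one-step toy problem. The sketch below is an illustration under stated assumptions, not the model of this paper: the cost, consumption, and continuation functions are hypothetical, and the price-dependent activity in (12) is assumed to be the minimizer of the activity cost plus price times resource consumption. For these toy functions that minimizer is available in closed form.

```python
import numpy as np

e = 0.2  # current resource shortage (hypothetical value)

def cost(f, b):
    # Toy immediate cost C(xi) for xi = (f, b): imbalance penalty plus trading cost.
    return (f + b - 1.0) ** 2 + 0.5 * f ** 2

def consumption(f, b):
    # Toy resource consumption E(xi): only the battery action b uses the resource.
    return b

def future_value(x):
    # Toy expected value function: convex and non-decreasing in the shortage x.
    return np.maximum(x, 0.0) ** 2

def best_response(A):
    # X(A) = argmin over (f, b) of C(f, b) + A * E(f, b),
    # solved by hand from the first-order conditions of this toy cost.
    return A, 1.0 - 1.5 * A

# Full problem: brute-force search over a two-dimensional grid of plans.
grid = np.linspace(-1.0, 2.0, 301)
F, B = np.meshgrid(grid, grid, indexing="ij")
full_min = np.min(cost(F, B) + future_value(e + consumption(F, B)))

# Reduced problem: search only along the one-parameter curve A -> X(A).
prices = np.linspace(0.0, 2.0, 2001)
f_A, b_A = best_response(prices)
reduced = cost(f_A, b_A) + future_value(e + consumption(f_A, b_A))
A_star = prices[np.argmin(reduced)]
reduced_min = reduced.min()

print(full_min, reduced_min, A_star)
```

In this example, the two-dimensional search and the one-dimensional search over prices return the same minimal value (0.36, up to rounding), attained at the price 0.6. This price also equals the slope of the toy continuation value at the resulting resource level, in line with the fixed-point interpretation of the virtual price.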
In the following, we work out all conditions required for this. Before we turn to the illustration of our technique by battery storage management in Sections 4 and 5, followed by proofs in Section 6, let us present all assumptions which ensure the validity of the assertion (17).
Suppose that all information, relevant for decision-making, is carried by R d -valued Markov process (Z t ) T t=0 which is realized on a filtered probability space (Ω, F , P, (F t ) T t=0 ). As introduced above, we assume that the resource level can vary within a certain bounded interval I ⊂ R. Having selected an activity ξ from the set Ξ of all feasible activity plans at time t = 0, . . . , T − 1 in the state (e, z) ∈ I × R d , the costs C e,z while the terminal costs are determined by All our considerations rely on additional assumptions on the functions (18) which we formulate next. To ensure that the minimization in (11) is well-defined, let us agree that there exists an idle activity which does not consume any resources: To ease our argumentation, we suppose that Furthermore, let us propose a mild technical assumption for each z ∈ R d , ξ ∈ Ξ and e, e ∈ I with e < e there exists To determine the desired minimizer ξ * = X e,z t (A * ) to (11), we rely on the following natural technical assumption for each (e, z) ∈ I × R d and t = 0, . . . , T − 1, the function A → E e,z t (X e,z t (A)) is continuous on A e,z , strictly decreasing, and possesses a root.
Finally, we require some convexity properties in the sense that the set Ξ is convex and for each z ∈ R d , t = 0, . . . , T − 1 the functions (e, ξ) → C e,z t (ξ), (e, ξ) → E e,z t (ξ) are convex on I × Ξ, furthermore e → C T (e, z) is convex and non-decreasing on I.
As mentioned above, the advantage of our approach is that a simpler form (14)-(16) of Bellman recursion occurs, whose efficient (numerical) treatment may be easily possible, unlike that of the original problem (9)- (11). Indeed, it turns out that such simplification solves the original problem in the following sense: In view of the above result, the practical solution now requires obtaining for each t = 0, . . . , T − 1 the optimal decision via minimization in the simplified Bellman recursion where the virtual price for the optimal regime is obtained via π * t (e, z) = A * as a minimizer A * = argmin A∈A e,z C e,z t (X e,z t (A)) +Ṽ t+1 (e + E e,z t (X e,z t (A)), z) .
Remark 1. We also prove that the optimal decision π * t (e, z) = A * can be obtained as a solution to the fixed-point equation A * ∈ ∇Ṽ t+1 (e + E e,z t (X e,z t (A * )), z), meaning that A * must be a sub-gradient of the expected value function, i.e., in the optimal regime, the virtual resource price must always be equal to the marginal change rate of the value function with respect to the resource level. The economic interpretation of this insight is natural: when choosing an activity plan, the agent increases the resource consumption to a level where the value loss caused by this consumption starts taking over the instant revenue of the activity.
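The fixed-point equation can be motivated by a short first-order calculation. The sketch below is only heuristic: it assumes that all functions are differentiable, that the minimizers are interior, and that the price-dependent activity in (12) minimizes $C^{e,z}_t(\xi) + A\,E^{e,z}_t(\xi)$ over $\xi \in \Xi$. The symbol $\partial_\xi$ for the derivative in $\xi$ and the prime for the derivative of $\tilde V_{t+1}$ in its first argument are introduced here for illustration only.

```latex
\begin{align*}
&\text{Optimality in (11) at } \xi^*: &
\partial_\xi C^{e,z}_t(\xi^*)
  + \tilde V_{t+1}'\bigl(e + E^{e,z}_t(\xi^*), z\bigr)\,\partial_\xi E^{e,z}_t(\xi^*) &= 0, \\
&\text{Optimality in (12) at } \xi = X^{e,z}_t(A): &
\partial_\xi C^{e,z}_t(\xi) + A\,\partial_\xi E^{e,z}_t(\xi) &= 0.
\end{align*}
```

Both conditions coincide for the price $A^* = \tilde V_{t+1}'(e + E^{e,z}_t(\xi^*), z)$. Under these smoothness assumptions, the optimal plan therefore lies on the price-indexed curve, $\xi^* = X^{e,z}_t(A^*)$, and the price $A^*$ equals the marginal value of the resource. The sub-gradient formulation in Remark 1 covers the general case without differentiability.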

Battery Storage Control
To illustrate our variables reduction technique, we introduce a model for a battery storage installation operated within a deregulated electricity market. First, let us introduce some technical characteristics. The battery capacity stands for the maximal energy stored (measured in MWh), whereas the battery power is the amount of electrical power (measured in MW) the installation can provide at any moment (see [18]). The conditions under which batteries are operated affect their performance in terms of the so-called cycle life, which can be defined as the number of cycles completed before the effective battery capacity falls below 60% of its nominal capacity. In this respect, the Depth of Discharge (DoD) is essential. To give the reader a quantitative understanding of this phenomenon, let us consider the example of a lithium-ion battery. Assuming that each charge/discharge cycle causes battery deterioration, one would assume that completely emptying a battery (which corresponds to 100% DoD) 200 times is roughly equivalent (in terms of performance decline) to 400 cycles at 50% DoD and 600 cycles at 33% DoD. However, the actual behavior is different. Usually, a lithium-type battery serves longer than 400 cycles at 50% DoD and significantly longer than 600 cycles at 33% DoD (see [19]). In our model, we suppose that visiting deep discharge states is costly since it affects battery life. For this reason, we suggest including a user-defined function which penalizes deep discharge states accordingly. Figure 1 illustrates the dependence of battery life, measured in charge cycles, on the depth of discharge. Beyond preventing deep discharge, further operational improvements encompass avoiding a fully charged battery (by reducing the voltage when charging) and diminishing the so-called charge/discharge rate, which stands for the maximal electric flow during charging/discharging.
We assume that the agent attempts to optimally manage an electricity storage by sequential decisions on the amount of energy procured/purchased through the intraday market and on the optimal charge/discharge of batteries. Obviously, these decisions must take into account the current electricity price, market state, storage level, and all costs and technical restrictions.
Consider an energy retailer facing the obligation to satisfy an unknown energy demand of its customers at times t = 0, . . . , T − 1, while renewable energy sources produce a random electricity amount given a certain battery storage. To manage a potential energy imbalance, appropriate forward positions are taken in advance and decisions are made to drive the battery storage. This is a typical dynamic control problem, since at any time t = 0, . . . , T an action must be chosen (encompassing a simultaneous energy trade and battery control) which immediately causes some costs but also determines a transition to the next system state (future battery level and market conditions). Clearly, one needs to optimally balance the recent costs against those incurred in the future, based on the current situation. The problems of this type are naturally formulated and solved in terms of dynamic programming.

Remark 2.
Please note that we investigate the problem of electricity storage management within an abstract context which is applicable to most of the deregulated electricity markets, possibly with minor adaptation. Namely we consider energy trading on two time scales. The longer-term trading is realized by forwards or futures or by energy delivery contracts, for instance from the day-ahead trading. In practice, these long-term positions are constantly adjusted on a short-term scale using intraday trading or balancing procedures. This two-scale structure is universal and inherent to any deregulated electricity market and we usually observe significant differences between both scales considering their prices, liquidity, and spread.
Consider an agent who serves an energy consumer whose random demand is (partially) covered from a renewable energy generation facility whose energy output is also random. We denote by D t the cumulative electricity demand of such facility (which stands for demand if D t < 0 and surplus if D t > 0) for the delivery period t. Assume that the time point t = 0, . . . , T − 1 corresponds to the beginning of the period t and agree that the demand D t is observed after t, at the end of the delivery period t. Let us agree to model D t = d t + ε t , with a zero-mean random variable ε t standing for the deviation of the realized demand D t from its prediction d t , which is observable at time t. Moreover, assume that at each time t = 0, . . . , T, the producer can take a forward position which attempts to cover D t . Let us describe such position as d t + f t where f t stands for the deviation of the total amount traded forward from the prediction d t . In generic situations, the quantity f t can be considered to be a "safety margin" which must be purchased on the top of predicted demand d t to avoid a potential energy shortage during delivery interval. However, we do not assume that f t or d t must always be positive. Moreover, introduce the control variable b t , standing for the decision to transfer the energy amount |b t | from/to the battery, where b t > 0 and b t < 0 represent discharging and charging actions, respectively. Here, we agree that these control actions must be decided at time t (immediately before the delivery period t starts). With these assumptions, the energy to be balanced during delivery period t = 0, . . . , T using electricity grid is Please note that on the right-hand side of this equation, the quantities f t and b t must be chosen at t whereas ε t becomes observable after time t.
Now we turn to storage control costs and introduce electricity prices $\Psi_t = (\Psi^+_t, \Psi^0_t, \Psi^-_t)$.

Remark 3. Please note that $\Psi^0_t$ stands for the price of energy from the long-term market. In the sense of the above remark, this price can represent a forward, futures, or day-ahead price of electrical energy prior to delivery, depending on modeling.
While the forward price Ψ 0 t is listed prior to the delivery period t and applies to energy traded in advance, the balancing prices Ψ + t , Ψ − t are determined during delivery period t and apply for purchase and procurement of the grid energy. Usually, it holds that In practice, the price range [Ψ + t , Ψ − t ] can be wide, meaning that Ψ + t is significantly lower than Ψ 0 t which is lower than Ψ − t . This issue makes any balancing using grid energy potentially unfavorable. For this reason, the agent attempts to meet the demand as precisely as possible using a combination of the energy from the forward position and from the battery. More specifically, we suppose that the costs, associated with taking forward position, are given by where q > 0 is a coefficient, representing the elasticity of the forward price with respect to contract volume. The total costs associated with energy balancing during the period t = 0, . . . , T is given by Please note that this quantity is observable after t and is controlled by the variables f t and b t which must be chosen at t.
Let us precisely formulate the assumptions on the observable random variables concerning the time of their observation. Suppose that the processes (d t ) T t=0 , ( f t ) T t=0 , (b t ) T t=0 , (ε t ) T t=0 are given on a filtered probability space (Ω, F , P, (F t ) T+1 t=0 ) where F t represents the information available at time point t, just before the start of the delivery period t. According to the above modeling, we suppose that for t = 0, . . . , T the variables d t , f t , b t , Ψ 0 t are F t -measurable, and ε t , Ψ + t , Ψ − t are F t+1 -measurable. Let us suppose that and ε t are conditionally independent, given F t , for t = 0, . . . , T, and denote by E t (·) the expectation conditioned on F t for t = 0, . . . , T. Applying such conditional expectation E t (·) to (32), we use the conditional independence (4) to obtain In this equation, the prices expected prior to delivery are denoted by To simplify our energy storage management, we further agree that the distribution of prediction errors does not depend on recent information, in the sense that ε t and F t are independent, t = 0, . . . , T.
This natural assumption yields a compact form for the expected costs (34): with functions h + and h − explicitly computable from In view of (34)-(37), the expected costs of (32) depend on the energy control variables f and b as Please note that these costs are expected at time t and can be changed by appropriate adjustment of decision variables f , b ∈ R.
For an agent concerned with the minimization of all costs accumulated within the decision period ranging from t = 0 to t = T, the dynamical aspects are important. Specifically, the decision at time t to use energy b from the storage changes the storage level, which has a distinct impact on the availability of energy in the future, influencing all following decisions.
To formulate our storage control as a dynamic programming problem, we assume that a Markov dynamics (Z t ) T t=0 on (Ω, F , P, (F t ) T t=0 ) carries all relevant information. Therefore, we suppose that (Z t ) T t=0 is a Markovian process which takes values in R d . This process describes the evolution of all relevant state variables of the environment. In particular, we assume that the expected prices are represented by function ψ : R d → R 3 of state variables, whose components ψ = (ψ − In accordance to (30), we require for t = 0, . . . , T that Furthermore, we suppose that at any time t = 0, . . . , T, the conditional expectation of the next period's demand is described in terms of a deterministic function Besides the state of the environment z ∈ R d , the other important state variable is the current storage level e ∈ I. Having denoted the minimal and the maximal energy amounts of the battery by e and e respectively, we suppose that the storage level e represents by the amount e of energy, which is needed to fully charge the battery. With this interpretation, our variable is the resource level e, which takes values in the interval In view of (34)-(39), the expected costs of (32) are now written in terms of a function of the resource level e ∈ I, the state variable z ∈ R d , and the control variables f , b ∈ R for t = 0, . . . , T. Please note that in (42) we include the costs χ(e) of deep discharge corresponding to the resource level e ∈ I, modeled by an increasing and convex function χ : I → R.

Remark 4.
Ideally, an understanding of the chemistry of a particular battery will suggest some generic penalization function. However, in practice, there are diverse approaches to assess (in economic terms) how the battery life is affected by visiting a deep discharge state. Here, the usual cycle life graph (as in Figure 1) does not detail all aspects required by our approach. The point is that the cycle life is determined by charge/discharge periods oscillating linearly between the minimal and the maximal capacity, rather than by a certain strategy run within a random environment. Such cycle life diagrams do not provide any information on whether it could be worth visiting a deep discharge state for a short period of time in order to catch an electricity price spike. Given diverse battery storage technologies, there is no simple way of determining an appropriate penalization function. For this reason, the authors suggest a pragmatic approach: For a pre-specified penalization as depicted in Figure 2, the user calculates the corresponding optimal strategy, which must be examined and assessed in simulations. If required by the simulation results, the penalty function can be altered, followed by another round of optimization and simulations. Such attempts can be repeated until satisfactory results (considering battery life expectation, peak load response, and total revenue) are reached.

Now, let us propose a Bellman recursion for our battery control problem. Recall that in our modeling, the state variables at time t = 0, . . . , T comprise a market situation described by the realization z = Z t of the environment state process and the current resource level e ∈ I. Having supposed that at the final date t = T, in the environment state z = Z T the entire battery storage content e − e can be sold at the market price ψ 0 T (z), we agree that the terminal costs function is This quantity determines the value function V T at the final time T by Prior to the terminal time, for t = 0, . . . , T − 1, the backward induction yields the value functions where the minimum must be taken over the set Ξ(e, z) of all admissible controls due to the restriction that the energy transfer from/to the storage is limited by the storage capacity. However, in our approach, many arguments rely on the assumption that the set of admissible decisions does not depend on the current situation. To replace the minimization over admissible controls Ξ(e, z) in (11) by a minimization over an unrestricted set, we introduce a penalization for the violation of ( f , b) ∈ Ξ(e, z), having in mind that for a sufficiently strong penalization it will never be optimal to violate the restriction e + b ∈ I. This concept is realized by the following recursions: Please note that with this definition, the functions (V t ) T t=0 satisfy (45)-(48) if and only if they fulfill (49)-(51).
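The effect of replacing a state-dependent admissible set by a penalization can be seen in a small numerical sketch. The functions and numbers below are hypothetical and serve only as an illustration; in particular, the sketch assumes an infinite penalty outside the interval I, which is one way to make the penalization "sufficiently strong".

```python
import numpy as np

I_LOW, I_HIGH = 0.0, 1.0   # hypothetical bounds of the resource interval I
e = 0.4                    # hypothetical current resource level

def stage_cost(b):
    # Toy immediate cost of the battery action b (illustration only).
    return (b - 2.0) ** 2

def value_on_interval(x):
    # Toy continuation value, meaningful only for resource levels x in I.
    return x ** 2

def value_penalized(x):
    # Same function extended to all of R, with +inf outside I.
    inside = (x >= I_LOW) & (x <= I_HIGH)
    return np.where(inside, x ** 2, np.inf)

actions = np.linspace(-2.0, 2.0, 4001)

# (a) Minimise over admissible actions only, i.e. those with e + b in I.
admissible = actions[(e + actions >= I_LOW) & (e + actions <= I_HIGH)]
constrained_min = np.min(stage_cost(admissible) + value_on_interval(e + admissible))

# (b) Minimise over all actions, using the penalized continuation value.
penalized_min = np.min(stage_cost(actions) + value_penalized(e + actions))

print(constrained_min, penalized_min)
```

Both minimizations return the same value, because every inadmissible action receives an infinite total cost in (b) and is therefore never selected. The advantage of (b) is that the set over which one minimizes no longer depends on the current state.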

Variables Reduction for Battery Control
In this section, we adopt and use our variable reduction technique to the specific situation described in Section 4. Therefore, we show that all assumptions of Theorem 1 are fulfilled. Recall from our modeling (44), (42) that the set of activities is given by with the interpretation that f and b represent the energy from trading and that from the battery, respectively. On this account, we have assumed that the costs and resource consumption functions are given by Here the coefficient q > 0 represents the price elasticity whereas the functions h + t and h − t are defined by (37) as in terms of the distribution P ε t of the demand prediction error ε t , this distribution is non-random by assumption (35). Define the function and observe that u → h t (z, u) is convex and non-increasing on R for each z ∈ R d due to 0 ≤ ψ + giving the minimizer From this, we proceed with that shows the desired convexity.
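In what follows, we recall the notion $\nabla g$ of the so-called sub-gradient of a convex function $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ ($n \in \mathbb{N}$), which is defined at each point $u \in \mathbb{R}^n$ as the family of linear functionals $\nabla g(u) = \{\, l : \mathbb{R}^n \to \mathbb{R} \text{ linear} : g(u) + l(u' - u) \le g(u') \text{ for all } u' \in \mathbb{R}^n \,\}$. For a function $V : I \times \mathbb{R}^d \to \mathbb{R}$ as in (65), which is convex in the first component on $I$, we agree to consider its sub-gradient in the first component only, for arguments in $I$. More precisely, for $(e, z) \in I \times \mathbb{R}^d$ we write $\nabla V(e, z)$ to denote the set of linear functionals $l : \mathbb{R} \to \mathbb{R}$ satisfying $V(e, z) + l(e' - e) \le V(e', z)$ for all $e' \in \mathbb{R}$.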
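As a small illustration of the sub-gradient (a standard textbook example, not taken from the battery model), consider the convex and non-decreasing function $g(u) = \max(u, 0)$ on $\mathbb{R}$. Using the usual sub-gradient inequality $g(u) + l(u' - u) \le g(u')$ for all $u' \in \mathbb{R}$, and identifying each linear functional $l(x) = a x$ with its slope $a$, one obtains:

```latex
\[
\nabla g(u) =
\begin{cases}
\{\, x \mapsto 0 \,\}, & u < 0, \\
\{\, x \mapsto a x : a \in [0, 1] \,\}, & u = 0, \\
\{\, x \mapsto x \,\}, & u > 0 .
\end{cases}
\]
```

At the kink $u = 0$ the inequality reads $a u' \le \max(u', 0)$ for all $u'$, which forces $a \le 1$ (take $u' > 0$) and $a \ge 0$ (take $u' < 0$). All slopes are non-negative, as one expects for a non-decreasing convex function.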
Now, let us elaborate on the fixed-point property of the virtual resource price in the optimal regime in the spirit of the remark before Section 4.
Proof. Suppose on the contrary that $A^* \notin \nabla \tilde V(\ )$: then there exists $\delta \in \mathbb{R}$ with First, we assume that $\delta > 0$; then there exists $A_0 < A^*$ such that and since $\tilde V$ is convex, the above inequality holds for all intermediate points However, for sufficiently large $n \in \mathbb{N}$ we obtain $A^+_n > A_0$ such that which gives a contradiction to (75) for $n \in \mathbb{N}$ sufficiently large to satisfy $\Delta^+_n \in\, ]0, \delta[$. A similar argument in the alternative situation $\delta < 0$ also leads to a contradiction and completes the proof.
Finally, let us combine the results of Lemmas 1-4 to obtain our technique for control variables reduction. Consider a problem defined in Section under the assumptions formulated in Section 3. Let us prove Theorem 1.

Proof.
We proceed by induction to show that the functions $(V_t(\cdot, z))_{t=0}^T$ and $(\tilde V_t(\cdot, z))_{t=0}^T$ are convex and non-decreasing on $I$ for each $z \in \mathbb{R}^d$.
Since for each $z \in \mathbb{R}^d$ the function is convex and non-decreasing by (24), the initialization (10) ensures that $\tilde V_T(\cdot, z)$ is convex and non-decreasing on $I$. Please note that all conditions of Lemma 1 are fulfilled by assumption; thus for $t = T-1$ we conclude that $V_{T-1}(\cdot, z)$ is also convex and non-decreasing on $I$ for each $z \in \mathbb{R}^d$, and calculating the conditional expectation, we observe that $\tilde V_{T-1}(\cdot, z)$ is also convex and non-decreasing on $I$ for each $z \in \mathbb{R}^d$. Proceeding inductively for $t = T-1, \dots, 0$ with the same argumentation, the assertion (78) follows.
Now we turn to the main claim (25). Since (9), (10) coincide with (14), (15), we obtain $\tilde V_T = \tilde v_T$. With this, we apply Lemma 3, whose assumption (65) is satisfied because of (78), which ensures that $\tilde V_T = \tilde v_T$ is convex and non-decreasing in the first argument on $I$. The further conditions (21), (23) of this lemma also hold. Moreover, since we have supposed in (16) that the minimum is reached, say at $A^* = A^{*,e,z}$, the fixed-point property (66) $A^{*,e,z} \in \nabla \tilde V_T(e + E^{e,z}_{T-1}(X^{e,z}_{T-1}(A^{*,e,z})), z)$ holds for all $(e, z) \in I \times \mathbb{R}^d$. Using Lemma 2, we conclude that $\xi^*_{T-1} = X^{e,z}_{T-1}(A^{*,e,z})$ is a minimizer to (10), showing that $V_{T-1} = v_{T-1}$. Repeating the argumentation for $t = T-1, \dots, 1$, the desired assertion $v_t = V_t$ for $t = T, \dots, 0$ follows inductively.

Conclusions
New technologies have triggered growing attention to optimization algorithms for the operational management of energy storage facilities. In a generic setting, a typical electricity retailer determines an optimal strategy for purchase, procurement, generation, and storage of electrical power while taking into account a fluctuating energy price, storage costs, limited storage capacity, and uncertain production rates. Such problems are challenging, since numerous decision and control variables must be considered within a potentially high-dimensional setting. This paper addresses a variables reduction technique for such problems. Within an abstract but natural framework, we show that a certain Legendre-type transform can be applied to equivalently reformulate the original strategy optimization as a stochastic control problem driven by a single one-parameter family of decision variables. It turns out that the presented technique is sufficiently general and can be applied to solve diverse dynamic resource allocation problems involving scarce reserves. These problems encompass a wide range of important areas including mining operations, pension fund management, and emission abatement. The authors believe that their approach can be used as a starting point for efficient numerical algorithms and will address this topic in future research.