A Neural Network Monte Carlo Approximation for Expected Utility Theory

Abstract: This paper proposes an approximation method to create an optimal continuous-time portfolio strategy based on a combination of neural networks and Monte Carlo, named NNMC. This work is motivated by the increasing complexity of continuous-time models and stylized facts reported in the literature. We work within expected utility theory for portfolio selection with constant relative risk aversion utility. The method extends a recursive polynomial exponential approximation framework by adopting neural networks to fit the portfolio value function. We developed two network architectures and explored several activation functions. The methodology was applied to four settings: a 4/2 stochastic volatility (SV) model with two types of market price of risk, a 4/2 model with jumps, and an Ornstein–Uhlenbeck 4/2 model. In only one case was the closed-form solution available, which helps for comparisons. We report the accuracy of the various settings in terms of optimal strategy, portfolio performance and computational efficiency, highlighting the potential of NNMC to tackle complex dynamic models.


Introduction
Optimally allocating a collection of financial investments such as stocks, bonds and commodities has been a topic of concern to financial institutions and shareholders at least since the pioneering work of Markowitz on mean-variance portfolio theory in 1952. That work revealed the potential of diversification and laid the foundations for the development of portfolio analysis in both academia and industry. These initial results were in discrete time, but it was not long before continuous-time portfolio decisions were produced in the alternative paradigm of expected utility theory, as can be seen in Merton (1969). Merton assumed that the investor is able to continuously adjust their position and that the stock price process is modelled by a geometric Brownian motion (GBM). The optimal trading strategy and consumption policy that maximize the investor's expected utility were obtained in closed form by solving a Hamilton-Jacobi-Bellman equation.
The beauty and practicality of this continuous-time solution have led many researchers onto this path, producing optimal closed-form strategies for a wide range of models. For example, Kraft (2005) considered the stochastic volatility (SV) model of Heston (1993). Flor and Larsen (2014) constructed a portfolio of stocks and fixed-income market products to hedge the interest rate risk. Explicit solutions in the presence of regime switching, stochastic interest rate and stochastic volatility were presented in Escobar et al. (2017), and the positive performance of their portfolio was confirmed by an empirical study. For the commodities asset class, Chiu and Wong (2013) modelled a mean-reverting risky asset by an exponential Ornstein-Uhlenbeck (OU) process and solved the investment problem for an insurer subject to random payments of insurance claims.
These models are particular cases of the quadratic-affine family (see Liu (2006)), one of the broadest models solvable in closed-form. The value function for a model in We designed two architectures enriching an embedded quadratic-affine structure, and we considered three types of activation functions.
Given the lack of closed-form solutions for SV 4/2 models, we used them as our toy examples in the implementations. In particular, we first implemented our methodology in the solvable case (i.e., GBM 4/2 with a solvable market price of risk, MPR), so that its accuracy and efficiency could be demonstrated before applying it to the unsolvable cases: the GBM 4/2 model with stochastic jumps, the GBM 4/2 model with proportional instantaneous volatility MPR, and the OU 4/2 model. Furthermore, we numerically show which network architecture is preferable in each case.
The paper is organized as follows. Section 2 introduces the dynamic portfolio choice problem, and presents the neural network architectures, activation functions and parameter training details. The step-by-step algorithm of our methodology is provided in Section 3. Sections 4 and 5 apply the methodology to the GBM 4/2 and the OU 4/2 models. Section 6 concludes.

Problem Setting and Architectures of the Deep Learning Model
We consider a frictionless market consisting of a money market account (cash, $M$) and one stock ($S$). We assume the stock price follows a generalized diffusion process incorporating a one-dimensional state variable $X$. All the processes are defined on a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a right-continuous filtration $\{\mathcal{F}_t\}_{t\in[0,T]}$ and are summarized by the stochastic differential equations (SDE) in (1). Here, $B_t$ and $B^X_t$ are Brownian motions with correlation $\rho$; $r(X_t)$ is the interest rate; $\theta(X_t, S_t)$ and $\sigma(X_t, S_t)$ are the drift and diffusion coefficients of the stock price; $a(X_t)$ and $b(X_t)$ are measurable functions of the state variable $X_t$; $N_t$ is a pure-jump process independent of $B_t$ and $B^X_t$ with stochastic intensity $\lambda_N X_t$ for a constant $\lambda_N > 0$; and $\mu_N > -1$ denotes the jump size.
We consider an investor with risk preference represented by a constant relative risk aversion (CRRA) utility:

$$U(W) = \frac{W^{1-\gamma}}{1-\gamma}. \tag{2}$$

Investors can adjust their allocation at a predetermined set of rebalancing times $(0, \Delta t, 2\Delta t, \ldots, T - \Delta t)$. The investors wish to derive a portfolio strategy $\pi$ (percentage of wealth allocated to the stock) that maximizes their expected utility of terminal wealth, in other words, $E(U(W_T))$. The value function, representing the investor's conditional expected utility, has the following representation:

$$V(t, W, S, X) = \max_{\pi} E\left(U(W_T) \mid W_t = W, S_t = S, X_t = X\right) = \frac{W^{1-\gamma}}{1-\gamma} f(t, S, X). \tag{3}$$

The value function is separated into a wealth factor $\frac{W^{1-\gamma}}{1-\gamma}$ and a state variable function $f$. The NNMC estimates the state variable function $f$ with a neural network model $NN$ and computes the optimal strategy $\pi^*_t$ with the Bellman principle.
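The separability of the value function can be illustrated with a few lines of Python. This is a minimal sketch only: the function names `crra_utility` and `value_function`, the restriction to $\gamma > 0$, $\gamma \neq 1$, and the numerical value used for $f$ are assumptions made for illustration and are not taken from the paper.

```python
def crra_utility(wealth: float, gamma: float) -> float:
    """CRRA utility U(W) = W^(1 - gamma) / (1 - gamma)."""
    if wealth <= 0.0:
        raise ValueError("wealth must be positive")
    if gamma <= 0.0 or gamma == 1.0:
        # Assumption of this sketch: the log-utility limit gamma = 1 is excluded.
        raise ValueError("this sketch assumes gamma > 0 and gamma != 1")
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def value_function(wealth: float, f_state: float, gamma: float) -> float:
    """Separable value V = U(W) * f, with f_state standing in for f(t, S, X)."""
    return crra_utility(wealth, gamma) * f_state


# Doubling wealth rescales the value by 2^(1 - gamma), whatever f is.
ratio = value_function(2.0, 0.9, gamma=3.0) / value_function(1.0, 0.9, gamma=3.0)
```

Because wealth enters only through the factor $W^{1-\gamma}/(1-\gamma)$, the network only has to learn the dependence on the state variables, which is the design choice exploited throughout the method.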

Architectures of the Deep Learning Model
In this section, we present two neural network architectures to fit the value function. According to the separable property of the value function shown in (3), the only unknown component is the state variable function $f$, which is therefore the target function for the neural network. The architectures of the networks are built around exponential polynomial functions, which are the most common form of solvable investor value functions and are used in the PAMC method. This property of the proposed networks ensures that the new method generalizes PAMC.
The neural network is expected to achieve a better fit than a polynomial regression if the true state variable function is significantly different from the exponential polynomial function. Furthermore, we designed an initialization method for networks, which is better than a random initialization in terms of portfolio value function fitting.

Sum of Exponential Network
We first introduce the sum of exponential polynomials neural network (SEN), illustrated in Figure 1. The number of inputs depends on the number of state variables; for simplicity, we take two inputs as an example. The first hidden layer computes the monomials of the inputs. The second hidden layer forms linear combinations of the neurons in the first layer, where the weights are fitted in NNMC. An exponential activation function is applied to the second layer. The final output is a linear combination of exponential polynomials, so a single exponential polynomial is a special case of this neural network. We denote the sum of exponential network by $NN_{SEN}$; the next proposition states the estimation of the corresponding optimal allocation.
Proposition 1. Given the SEN approximation of the value function at the next rebalancing time $t + \Delta t$ (i.e., $NN_{SEN}[t + \Delta t, S_t, X_t]$), the optimal strategy at time $t$ is given by which is the solution of: where: when $S_t$ follows a jump process, i.e., $\sigma(X_t, S_t) = 0$.
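Proof. It follows similarly to Theorem 1 in . According to the Bellman principle:

$$V(t, W_t, S_t, X_t) = \max_{\pi_t} E\left(V(t + \Delta t, W_{t+\Delta t}, S_{t+\Delta t}, X_{t+\Delta t}) \mid \mathcal{F}_t\right).$$

We substitute $V(t + \Delta t, W_{t+\Delta t}, S_{t+\Delta t}, X_{t+\Delta t})$ with $\frac{W_{t+\Delta t}^{1-\gamma}}{1-\gamma} NN_{SEN}(t + \Delta t, S_{t+\Delta t}, X_{t+\Delta t})$ and expand the right-hand side of the equation with respect to $W$, $S$ and $X$; $V(t, W_t, S_t, X_t)$ is then written as a function of the strategy $\pi_t$. Equation (5) is obtained from the first-order condition.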
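The SEN forward pass described in this section can be sketched as follows for two inputs. This is an illustrative sketch, not the authors' implementation: the function names, the ordering of the monomials, the presence of a constant term in each polynomial and all numerical weights are assumptions.

```python
import math
from itertools import product


def monomials(x: float, y: float, degree: int) -> list[float]:
    """First hidden layer: all monomials x^i * y^j with 1 <= i + j <= degree."""
    return [x**i * y**j
            for i, j in product(range(degree + 1), repeat=2)
            if 1 <= i + j <= degree]


def sen_forward(x, y, poly_weights, poly_biases, a, b, degree):
    """Sum-of-exponentials output: sum_k a[k] * exp(P_k(x, y)) + b."""
    feats = monomials(x, y, degree)
    out = b
    for w_k, c_k, a_k in zip(poly_weights, poly_biases, a):
        p_k = c_k + sum(w * m for w, m in zip(w_k, feats))  # second hidden layer
        out += a_k * math.exp(p_k)                          # exponential activation
    return out


# With a = [1, 0] and b = 0 the network collapses to one exponential polynomial.
# For degree 1 the monomials are ordered [y, x].
example = sen_forward(1.0, 2.0,
                      poly_weights=[[0.2, -0.1], [0.0, 0.0]],
                      poly_biases=[0.05, 0.0],
                      a=[1.0, 0.0], b=0.0, degree=1)
```

The example shows why a single exponential polynomial is nested in the architecture: switching off all but one output weight recovers it exactly.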

Improving Exponential Network
The architecture of an improving exponential network (IEN) is exhibited in Figure 2. The target function of the IEN is the log of the state variable function $f$ (i.e., $\ln f$). The neural network consists of three parts. Node 1 is a polynomial with the output denoted by $V_1$. Node 2 is an artificial neural network with an arbitrary number of hidden layers and neurons; we denote its output by $V_2$. Node 3 is a single-layer network with a sigmoid function, which computes a proportion $p \in [0, 1]$. The final output is the weighted average of the first two nodes, $pV_1 + (1 - p)V_2$. The second node is the complement to the exponential polynomial function. Moreover, the similarity between the true value function and the exponential polynomial function is measured by $p$, which is fitted within the NNMC methodology. Therefore, the network automatically adjusts the weights on the exponential polynomial function and its supplement according to the generated data. Finally, the state variable function $f$ is computed as

$$f = \exp\left(pV_1 + (1 - p)V_2\right) = \left(e^{V_1}\right)^{p}\left(e^{V_2}\right)^{1-p},$$

which is the geometric weighted average of the exponentiated outputs of nodes 1 and 2. Letting $NN_{IEN}$ denote the IEN, the estimation of the optimal strategy is given in the next proposition.
Proposition 2. Given the IEN approximation of the log value function at time $t + \Delta t$ (i.e., $NN_{IEN}[t + \Delta t, S_t, X_t]$), the optimal strategy at time $t$ is given by which is the solution of: where: when $S_t$ follows a diffusion process, in other words, $\lambda_N = 0$.
Proof. The proof follows similarly to Proposition 1.
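The three-node structure of the IEN can be sketched for a single state input as follows. This is a hedged illustration: the function names, the choice of ReLU in node 2, the layer sizes and every numerical weight are assumptions for the example and do not reproduce the fitted networks of the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ien_nodes(x, poly_coef, hidden_w, hidden_b, out_w, out_b, gate_w, gate_b):
    """Return (V1, V2, p) for a single state input x."""
    v1 = sum(c * x**k for k, c in enumerate(poly_coef))                  # node 1: polynomial
    hidden = [max(0.0, w * x + c) for w, c in zip(hidden_w, hidden_b)]  # ReLU hidden layer
    v2 = out_b + sum(w * h for w, h in zip(out_w, hidden))              # node 2: small network
    p = sigmoid(gate_w * x + gate_b)                                     # node 3: mixing weight
    return v1, v2, p


def ien_state_function(x, **params):
    """f = exp(p * V1 + (1 - p) * V2); the network itself is fitted to ln f."""
    v1, v2, p = ien_nodes(x, **params)
    return math.exp(p * v1 + (1.0 - p) * v2)


# Hypothetical weights, chosen only to exercise the code.
params = dict(poly_coef=[0.1, -0.3], hidden_w=[0.5, -0.2], hidden_b=[0.0, 0.1],
              out_w=[0.4, 0.3], out_b=-0.05, gate_w=1.5, gate_b=0.2)
f_value = ien_state_function(0.04, **params)
```

Working on the log scale keeps $f$ positive by construction and lets the mixing weight $p$ interpolate smoothly between the exponential polynomial and its complement.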

Initialization, Stopping Criterion and Activation Function
In this section, we give more details on training the neural networks. The initialization of weights is the first step of network training and may significantly impact the goodness of fit. A good initialization helps keep the network's weights from converging to a poor local minimum and avoids slow convergence. Random initialization is the usual choice because the interpretability of a network is typically weak. In contrast, both the SEN and the IEN are extensions of an exponential polynomial function, so we suggest taking advantage of the results from the polynomial regression. The neural network then searches for a minimum near the exponential polynomial function used in PAMC, which ensures consistency. The polynomial regression initialization achieves superior results to the random initialization.
The coefficients of the exponential polynomial are first obtained with a regression model. The output of the SEN is a linear combination of exponential polynomial functions, $a_1 \exp(P^1_n(x, y)) + a_2 \exp(P^2_n(x, y)) + \cdots + a_n \exp(P^n_n(x, y)) + b$; we substitute the coefficients from the polynomial regression into $P^1_n(x, y)$ and set $a_1 = 1$, $a_2 = a_3 = \cdots = a_n = b = 0$. For the initialization of the IEN, we substitute the coefficients into the first node and artificially make $p = 0$.
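The regression-based initialization of the SEN can be sketched as follows for one state variable and a polynomial of degree 1. This is a minimal sketch under stated assumptions: the regression is run on the logarithm of strictly positive targets, and the function names `fit_exp_linear` and `init_sen`, the dictionary layout and the synthetic data are hypothetical.

```python
import math


def fit_exp_linear(xs, vs):
    """Ordinary least squares fit of ln(v) = c0 + c1 * x (requires every v > 0)."""
    logs = [math.log(v) for v in vs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    c1 = sxl / sxx
    c0 = mean_l - c1 * mean_x
    return c0, c1


def init_sen(c0, c1, n_exponentials=2):
    """Put the regression fit in the first exponential and switch the others off."""
    zeros = [0.0] * (n_exponentials - 1)
    return {
        "poly_biases": [c0] + zeros,   # constant terms of each polynomial
        "poly_weights": [c1] + zeros,  # linear terms of each polynomial
        "a": [1.0] + zeros,            # a_1 = 1, a_2 = ... = a_n = 0
        "b": 0.0,
    }


# Synthetic targets generated from exp(0.3 - 0.5 x); the fit should recover them.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
vs = [math.exp(0.3 - 0.5 * x) for x in xs]
c0, c1 = fit_exp_linear(xs, vs)
start = init_sen(c0, c1)
```

Starting from these weights, the network output initially coincides with the fitted exponential polynomial, so training can only improve on the regression fit in terms of training loss.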
The training process minimizes the mean squared error (MSE) between the network's output and the simulated expected utility, and the sample data are split into a training set and a test set to reduce overfitting. Adam is a gradient-based optimization algorithm that combines the best properties of the AdaGrad and RMSProp algorithms to handle sparse gradients on noisy problems and provides excellent convergence speed. We applied Adam on the training set to update the network's weights, and the test set MSE was computed and subsequently recorded. The test set MSE was expected to converge, so training stopped when the difference between the moving average of the 100 most recent test set MSEs and the most recent test set MSE fell below a predetermined threshold, which was set at 0.00001 in the implementation.
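The stopping criterion can be written as a small helper. This is a sketch under two assumptions that the text does not settle: the moving average includes the most recent value, and training never stops before a full window of 100 test set MSEs has been recorded. The name `should_stop` is hypothetical.

```python
def should_stop(test_mse_history, window=100, tol=1e-5):
    """Stop when the moving average of the last `window` test MSEs is within
    `tol` of the most recent test MSE."""
    if len(test_mse_history) < window:
        return False  # not enough history yet
    recent = test_mse_history[-window:]
    moving_average = sum(recent) / window
    return abs(moving_average - recent[-1]) < tol


flat_history = [0.01] * 100                              # converged test MSE
falling_history = [1.0 / (k + 1) for k in range(100)]    # still improving
```

In a training loop, the test set MSE would be appended to the history after each update and `should_stop` checked before the next one.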
The number of exponential polynomials is a hyperparameter in the SEN. We let the SEN be a sum of two exponential polynomial functions for simplicity. Node 2 in the IEN is an artificial neural network, which complements node 1 when the value function significantly deviates from an exponential polynomial function. The number of hidden layers and neurons, as well as the activation function of node 2, are freely determined before fitting the value function. We assume node 2 is a single-layer network with 10 neurons, and we implement several activation functions for comparison purposes: the logistic (sigmoid) function, $\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$; the rectified linear unit (ReLU), $\mathrm{ReLU}(x) = \max(0, x)$; and the exponential linear unit (ELU), $\mathrm{ELU}(x) = x$ for $x > 0$ and $\mathrm{ELU}(x) = \alpha(e^{x} - 1)$ for $x \le 0$, with $\alpha > 0$.

Notation and Algorithm of the Methodology
In this section, we clarify the notation and the step-by-step algorithm. Table 1 displays a summary of the notation.

Algorithm
We first generate the paths of the stock price $S^m_t$ and state variable $X^m_t$. The method starts from $t = T - \Delta t$ (i.e., the last rebalancing time before the terminal date). We compute the optimal strategy $\pi^m_{T-\Delta t}$ given $W_0$, $S^m_{T-\Delta t}$ and $X^m_{T-\Delta t}$ using Equation (5) or Equation (10). The regression target $\hat v^m$ is then computed from the simulated terminal wealth; its logarithm is the target when using IEN. The network $NN(T - \Delta t, X, S)$, approximating the state variable function, is trained with the input $(X^m_{T-\Delta t}, S^m_{T-\Delta t})$ and output $\hat v^m$. We conduct a similar procedure at each rebalancing point and recursively approximate the value function and optimal strategy until the inception of the portfolio. To evaluate the expected utility, we regenerate the paths of the stock price and state variables. The path-wise optimal strategy is computed from $NN(t, X, S)$, so the optimal terminal wealth is easy to obtain. The average of the utility of optimal terminal wealth approximates the expected utility. Algorithms 1 and 2 present the pseudo code for NNMC using SEN and IEN, respectively. Simulation variance reduction methods, such as antithetic variates, could be incorporated into both algorithms to reduce the standard error of the estimated expected utility.

Algorithm 1: NNMC-SEN
Input: $S_0$, $W_0$, $X_0$
Output: optimal trading strategy $\pi^*_0$ and expected utility $\hat V(0, W_0, S_0, X_0)$
Compute the optimal allocation $\pi^m_{T-\Delta t}$ with Equation (5);
Train a network with input $(X^m_{T-\Delta t}, S^m_{T-\Delta t})$ and output $\hat v^m$; denote the network by $NN(T - \Delta t, X, S)$;
Train a new network with input $(X^m_t, S^m_t)$ and output $\hat v^m$; denote it by $NN(t, X, S)$;
Compute $\pi^*_0$ with $NN(\Delta t, X, S)$ and Equation (5);
15: Generate new paths of $S^z_t$, $X^z_t$ for $z = 1, \ldots, N_0$; use the estimation of the value function $NN(t, X, S)$ to compute $\pi^z_t$ and $W^z_T$.

Algorithm 2: NNMC-IEN
5: Simulate wealth $\hat W^{m,n}_T(\pi^m_{T-\Delta t})$ given $W_0$, $S^m_{T-\Delta t}$, $\pi^m_{T-\Delta t}$ and $X^m_{T-\Delta t}$ at $T - \Delta t$ for $n = 1, \ldots, N$;
Train the network with input $(X^m_{T-\Delta t}, S^m_{T-\Delta t})$ and output $\hat v^m$; denote the network by $NN(T - \Delta t, X, S)$;
Compute the optimal allocation $\pi^m_t$ with $NN(t + \Delta t, X, S)$ and Equation (10) given $W_0$, $S^m_t$ and $X^m_t$;
10: Simulate wealth $\hat W^{m,n}_{t+\Delta t}(\pi^m_t)$, $\hat S^{m,n}_{t+\Delta t}$ and $\hat X^{m,n}_{t+\Delta t}$ given $W_0$, $S^m_t$, $\pi^m_t$ and $X^m_t$ at $t$ for $n = 1, \ldots, N$;
Train a new network with input $(X^m_t, S^m_t)$ and output $\hat v^m$; denote it by $NN(t, X, S)$;
Compute $\pi^*_0$ with $NN(\Delta t, X, S)$ and Equation (10);
16: Generate new paths of $S^z_t$, $X^z_t$ for $z = 1, \ldots, N_0$; use the estimation of the transformed value function $NN(t, X, S)$ to compute $\pi^z_t$ and $W^z_T$;
17: The expected utility is $\hat V(0, W_0, S_0, X_0) = \frac{1}{N_0}\sum_{z=1}^{N_0} U(W^z_T)$.
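The final evaluation step, averaging the utility of simulated terminal wealth, can be sketched together with antithetic variates. To keep the example self-contained and checkable, it uses a deliberately simple stand-in: a constant allocation $\pi$ under geometric Brownian motion with continuous rebalancing, for which the expected CRRA utility is known in closed form. The model, the parameter values and the function names are assumptions for illustration and are not one of the paper's settings.

```python
import math
import random


def expected_utility_antithetic(pi, gamma, mu, r, sigma, horizon,
                                w0=1.0, n_pairs=20000, seed=0):
    """Monte Carlo estimate of E[U(W_T)], pairing each normal draw z with -z."""
    rng = random.Random(seed)
    drift = (r + pi * (mu - r) - 0.5 * (pi * sigma) ** 2) * horizon
    vol = pi * sigma * math.sqrt(horizon)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        for shock in (z, -z):  # antithetic pair
            w_T = w0 * math.exp(drift + vol * shock)
            total += w_T ** (1.0 - gamma) / (1.0 - gamma)
    return total / (2 * n_pairs)


def expected_utility_exact(pi, gamma, mu, r, sigma, horizon, w0=1.0):
    """Closed form for the same constant-allocation GBM benchmark."""
    rate = r + pi * (mu - r) - 0.5 * gamma * (pi * sigma) ** 2
    return w0 ** (1.0 - gamma) / (1.0 - gamma) * math.exp((1.0 - gamma) * rate * horizon)


kwargs = dict(pi=0.75, gamma=2.0, mu=0.08, r=0.02, sigma=0.2, horizon=1.0)
estimate = expected_utility_antithetic(**kwargs)
benchmark = expected_utility_exact(**kwargs)
```

In NNMC the terminal wealth would instead come from the path-wise strategy implied by the fitted networks; only the averaging and the pairing of shocks carry over directly.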

Application to 4/2 Model
Grasselli (2017) unified the 1/2 and 3/2 SV models and proposed the 4/2 SV model. The 4/2 model better captures the evolution of the implied volatility surface and uniformly bounds the instantaneous variance away from zero when the weights on the 1/2 and 3/2 factors are positive. We implement NNMC on the 4/2 model and report the optimal allocation, expected utility and the annualized CER. Three versions of the 4/2 model are considered; all are specific cases of the generalized model (1). The first assumes a market price of risk proportional to the square root of the volatility driver; in this case, the value function and the optimal allocation are solvable in closed form. The second incorporates stochastic jumps into the 4/2 model, while the last uses the preferred setting for the market price of risk in the economics/finance literature (i.e., proportional to the instantaneous volatility). The parameters used in this section are presented in Table 2 (see Note 1) and are estimated from the S&P 500 and its volatility index (VIX) in Cheng and Escobar-Anel (2021).
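An annualized CER can be backed out of an estimated expected utility as sketched below. The definition used here, $U(W_0(1 + \mathrm{CER})^T) = \hat V$, is a common convention and is an assumption of this sketch; the function name `annualized_cer` is likewise hypothetical.

```python
def annualized_cer(expected_utility, gamma, horizon, w0=1.0):
    """Solve U(w0 * (1 + CER)^T) = expected_utility for CER under CRRA utility."""
    # Invert U(W) = W^(1 - gamma) / (1 - gamma) to get the certainty-equivalent wealth.
    certainty_equivalent = ((1.0 - gamma) * expected_utility) ** (1.0 / (1.0 - gamma))
    # Annualize the growth from w0 to the certainty-equivalent wealth.
    return (certainty_equivalent / w0) ** (1.0 / horizon) - 1.0


# A sure 5% annual growth over two years, seen through gamma = 3.
sure_utility = (1.05 ** 2) ** (1.0 - 3.0) / (1.0 - 3.0)
cer = annualized_cer(sure_utility, gamma=3.0, horizon=2.0)
```

Expressing results as a rate rather than a utility level makes portfolios comparable across risk aversion levels and horizons.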

A Solvable Case
Cheng and Escobar-Anel (2021) found the closed-form solution for an optimal dynamic portfolio when the stock price follows a 4/2 model with a market price of risk linear in the square root of the volatility driver, $\sqrt{X_t}$. The dynamics of the stock price $S_t$ and volatility driver $X_t$ are exhibited in (16). Solving the associated Hamilton-Jacobi-Bellman (HJB) equation gives the optimal trading strategy and value function in terms of functions $a(T - t)$ and $b(T - t)$, with auxiliary parameters $k_0 = \frac{1-\gamma}{\gamma}\lambda_S^2$, $k_1 = \kappa_X - \frac{1-\gamma}{\gamma}\rho_{SX}\sigma_X\lambda_S$, $k_2 = \sigma_X^2 + \frac{(1-\gamma)\sigma_X^2\rho_{SX}^2}{\gamma}$ and $k_3 = k_1^2 - k_0 k_2$. The closed-form solution (see (18)) reveals that the value function in this case is an exponential linear function. Hence, we set the degree of polynomial to 1 when implementing NNMC with both the SEN and the IEN.
Table 3 compares the optimal allocation, expected utility and CER from NNMC, the embedded PAMC and the theoretical solution. PAMC takes the least computational time. The optimal allocation obtained from PAMC is more accurate than the results from NNMC, while the differences in expected utility and CER are not significant. Furthermore, SEN slightly outperforms IEN in terms of the accuracy of optimal allocation and computational efficiency. Moreover, the ReLU activation function is superior to the sigmoid and ELU functions when the IEN is applied. We repeat the estimation of expected utility (i.e., steps 14-16 in NNMC-SEN and steps 15-17 in NNMC-IEN) after the value function and optimal strategy are obtained. All approximation methods have similar standard deviations of the estimated expected utility and CERs. Moreover, the standard deviation decreases with the risk aversion level $\gamma$, which indicates that our approximation is more accurate for more risk-averse investors.
Table 3. Results for the 4/2 model with a market price of risk $\lambda_S\sqrt{X_t}$. We report the optimal weights, expected utility and CER obtained with the theoretical result and with the approximation methods for different levels of risk aversion $\gamma$. The standard deviation of estimated expected utility and CER from 100 runs is displayed in parentheses.
Figure 3 displays the expected utility and CER as a function of time to maturity $T$ when $\gamma = 2$. The expected utility increases with maturity $T$ as expected, while the CER decreases. Expected utilities from PAMC, NNMC and the theoretical solution are visually the same. The comparison in portfolio performance is clearer when showing the CER: PAMC and NNMC produce CERs that are slightly smaller than the theoretical result. Furthermore, ELU seems to be inferior to the ReLU and sigmoid functions, and the CER obtained from NNMC with the ELU activation function is slightly smaller than the results from other methods when the investment horizon is small.
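Path generation for a 4/2 model can be sketched with an Euler scheme. The sketch assumes the standard 4/2 specification of Grasselli (2017), with instantaneous volatility $a\sqrt{X_t} + b/\sqrt{X_t}$ and a square-root process for $X_t$, combined with a market price of risk $\lambda_S\sqrt{X_t}$, so that the excess return is $\lambda_S(aX_t + b)$. The Euler step is applied to the log price, $X_t$ is floored at a small positive number, and all parameter values and names are hypothetical rather than the calibrated ones.

```python
import math
import random


def simulate_42_path(s0=100.0, x0=0.04, r=0.02, lam_s=2.0, a=0.9, b=0.01,
                     kappa=5.0, theta=0.04, sigma_x=0.3, rho=-0.7,
                     horizon=1.0, n_steps=252, seed=0):
    """Simulate one path of (S, X) and return the terminal values."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    s, x = s0, x0
    for _ in range(n_steps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        db_x = math.sqrt(dt) * z1
        db_s = math.sqrt(dt) * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)  # correlated shock
        sqrt_x = math.sqrt(x)
        vol = a * sqrt_x + b / sqrt_x              # 4/2 instantaneous volatility
        drift = r + lam_s * (a * x + b)            # r + lam_s * sqrt(x) * vol
        s *= math.exp((drift - 0.5 * vol**2) * dt + vol * db_s)  # Euler step on ln S
        x += kappa * (theta - x) * dt + sigma_x * sqrt_x * db_x  # Euler step on X
        x = max(x, 1e-8)                           # keep the volatility driver positive
    return s, x


s_T, x_T = simulate_42_path(seed=1)
# With no volatility-of-volatility and X started at its mean, X stays constant.
_, x_flat = simulate_42_path(sigma_x=0.0, x0=0.04, theta=0.04)
```

A finer simulation step than the rebalancing interval reduces the discretization bias of the Euler scheme without changing the set of decision times.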

An Unsolvable Case, 4/2 Model with Jumps
We then extended the 4/2 model to account for stochastic jumps. The dynamics of the stock price and volatility driver are summarized by an SDE in which the volatility and market price of risk are the same as in the 4/2 model given in (16). $N_t$ is an independent Poisson process with intensity $\lambda_N X_t$, $\mu_N$ is the jump size, and $\lambda_Q X_t$ captures the market price of jump risk.
We used the set of jump risk parameters given in Liu and Pan (2003): $\lambda_N = \lambda_Q = 0.1/\theta_X$ and $\mu_N = 0.1$. Notably, the stock is expected to jump once every 10 years if $X_t$ stays at its mean level $\theta_X$. The degree of polynomial in PAMC and NNMC was chosen to be 1. In this case, the optimal strategy cannot be explicitly solved given the approximation of the value function at the next rebalancing time (see Propositions 1 and 2), and it is therefore obtained by the Newton-Raphson method in NNMC. The optimal allocation, expected utility and CER obtained with NNMC and PAMC are reported in Table 4. When the stock follows the 4/2 model with jumps, PAMC is the fastest method, followed by NNMC-SEN. Moreover, the accuracy of the estimated expected utility and CER from PAMC and NNMC is similar; the standard deviations of these approximation methods differ little. Figure 4 exhibits the expected utility and CER as a function of investment horizon $T$. Portfolios with a longer investment horizon are expected to achieve a better performance (i.e., higher expected utility), while the CER decreases with $T$.
Table 4. Results for the 4/2 model with stochastic jumps. We report the optimal weights, expected utility and CER obtained via the approximation methods for different levels of risk aversion $\gamma$. The standard deviation of estimated expected utility and CER from 100 runs is displayed in parentheses.
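The role of the Newton-Raphson method can be illustrated on a toy first-order condition. The condition below is the textbook one for a CRRA investor facing a constant-coefficient jump-diffusion, $g(\pi) = m - \gamma\sigma^2\pi + \lambda\mu_N(1 + \pi\mu_N)^{-\gamma} = 0$, where $m$ is the excess drift; it is a stand-in chosen for illustration and is not Equation (5) or Equation (10) of the paper. All names and parameter values are hypothetical.

```python
def newton_raphson(g, dg, x0, tol=1e-10, max_iter=50):
    """Find a root of g starting from x0, using the derivative dg."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def jump_foc(pi, excess, gamma, sigma, lam, mu_n):
    """Toy first-order condition in the allocation pi."""
    return excess - gamma * sigma**2 * pi + lam * mu_n * (1.0 + pi * mu_n) ** (-gamma)


def jump_foc_prime(pi, excess, gamma, sigma, lam, mu_n):
    """Derivative of the toy first-order condition with respect to pi."""
    return -gamma * sigma**2 - gamma * lam * mu_n**2 * (1.0 + pi * mu_n) ** (-gamma - 1.0)


def solve_allocation(excess, gamma, sigma, lam, mu_n):
    """Start from the no-jump (Merton) allocation and refine it."""
    args = (excess, gamma, sigma, lam, mu_n)
    merton = excess / (gamma * sigma**2)
    return newton_raphson(lambda p: jump_foc(p, *args),
                          lambda p: jump_foc_prime(p, *args),
                          merton)


pi_star = solve_allocation(excess=0.05, gamma=2.0, sigma=0.2, lam=0.1, mu_n=0.1)
```

Because the derivative of this condition is strictly negative wherever $1 + \pi\mu_N > 0$, the root is unique there, and starting from the no-jump allocation typically needs only a few iterations.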

An Unsolvable Case, Market Price of Risk Proportional to Volatility
In this section, we consider an excess return proportional to the instantaneous variance. The dynamics are given in (21), and a closed-form solution has not yet been found. We report the optimal allocation and expected utility from PAMC and NNMC, and investigate the impact of maturity $T$. The degree of polynomial in PAMC and NNMC remains 1. Table 5 reports the optimal allocation, expected utility and CER from PAMC and NNMC. PAMC is still the most efficient method, followed by NNMC-SEN. All methods achieve similar portfolio performance in terms of the expected utility and CER as well as the corresponding standard deviation. Figure 5 plots the expected utility and CER versus maturity $T$ when $\gamma = 2$, which further verifies the non-significant difference in expected utility and CER obtained from the methods.
Table 5. We report the estimation of optimal weights, expected utility and CER obtained via approximations given different levels of risk aversion $\gamma$. The standard deviation of estimated expected utility and CER from 100 runs is displayed in parentheses.
Figure: (a) shows the expected utilities obtained with theoretical results and approximation methods versus investment horizon $T$; (b) shows the CERs versus investment horizon $T$ given $\gamma = 2$.

Application to the OU 4/2 Model
Motivated by the 4/2 stochastic volatility model and the mean-reverting price pattern popular among various asset classes (e.g., commodities, exchange rates, volatility indexes), Escobar-Anel and Gong (2020) defined an Ornstein-Uhlenbeck 4/2 (OU 4/2) stochastic volatility model for volatility index option and commodity option valuation. Equation (22) presents the dynamics involved in the OU 4/2 model, which is a specific case of (1). The parameters used in this section are reported in Table 6; they are estimated from data on a gold exchange-traded fund (ETF) and the volatility index of the gold ETF in Escobar-Anel and Gong (2020). There are two state variables in the OU 4/2 model; hence, the number of inputs in both the SEN and the IEN is 2. Furthermore, the degree of polynomial in PAMC and NNMC is 2.
SEN performs worse than IEN when fitting the value function with the OU 4/2 model. Sometimes, SEN significantly deviates from the true value function, which results in poor portfolio performance and the occurrence of negative terminal wealth. Therefore, we excluded the results from NNMC-SEN in this section. Table 7 compares the optimal allocation, expected utility and CER obtained for the OU 4/2 model. PAMC and NNMC-IEN produce similar optimal allocations, both outperforming NNMC-SEN. Furthermore, we also estimated the standard deviation of expected utility and CER, which demonstrates that NNMC leads to a less volatile estimation of expected utility and CER than PAMC in most cases. In contrast to the results for the 4/2 model, IEN is more efficient than SEN. We conclude that IEN is suitable for models with a complex structure and multiple state variables. The expected utility and CER as functions of the maturity $T$ when $\gamma = 2$ are plotted in Figure 6. Both the expected utility and CER increase with $T$. The expected utility and CER obtained from PAMC and NNMC-IEN visually overlap and are slightly higher than those of NNMC-SEN. Moreover, the selection of the activation function in IEN makes little difference.
Table 7. Results for the OU 4/2 model. We report the estimation of optimal weights, expected utility and CER obtained via approximations for different levels of risk aversion $\gamma$. The standard deviation of estimated expected utility and CER from 100 runs is provided in parentheses.

Conclusions
This paper investigated fitting the value function in expected utility dynamic portfolio choice using a deep learning model. We proposed two architectures for the neural network, which extend the broadest solvable family of value functions (i.e., the exponential polynomial function). We measured the accuracy and efficiency of various types of NNMC methods on the 4/2 model and the OU 4/2 model. The difference in optimal allocation, expected utility and CER is insignificant when the stock price follows the 4/2 model. The embedded PAMC is more efficient than NNMC owing to its smaller parameter space. Furthermore, when considering the OU 4/2 model, NNMC-SEN is inferior to a polynomial regression (PAMC) and to NNMC-IEN in terms of expected utility and CER.
In summary, NNMC benefits from the popular exponential polynomial representation (embedded PAMC method) to propose a network architecture flexible enough to reach beyond affine models. Although the best setting, NNMC-IEN (ELU), is not as efficient as PAMC, neural networks demonstrate the way to tackle more advanced models along the lines of Markov switching, Lévy processes and fractional Brownian processes.
Author Contributions: The authors contributed equally. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest: The authors declare no conflict of interest.
Note 1: $\Delta_{re} t$ is the portfolio rebalancing interval; $1/\Delta_{re} t$ indicates the rebalancing frequency. The Euler method with step size $\Delta_{si} t$ is applied in generating the stock price and state variables.