Long- and Medium-Term Financial Strategies on Equities Using Dynamic Bayesian Networks

Abstract: Devising a financial trading strategy that allows for long-term gains is a very common problem in finance. This paper aims to formulate a mathematically rigorous framework for the problem and to compare and contrast the results obtained. The main approach considered is based on Dynamic Bayesian Networks (DBNs). Within the DBN setting, a long-term as well as a medium-term trading strategy are considered and applied to twelve equities from developed and developing markets. It is concluded that both the long-term and the medium-term strategies proposed in this paper outperform the benchmark buy-and-hold (B&H) trading strategy. Despite the clear advantages of these trading strategies, the limitations of the model are discussed along with possible improvements.


Introduction
In this paper, Dynamic Bayesian Networks (DBNs) are used to study the problem of obtaining and testing a financial strategy whose return is higher than that of the buy-and-hold strategy for a given equity. A Bayesian Network (BN) is a compact graphical representation of a joint probability density function (PDF) that makes use of conditional independence and can be used to model a system at a single time instant. DBNs extend the BN to more than one time period; that is, DBNs are a temporally driven extension of BNs. DBNs offer several benefits over other models, such as basic time series models. Firstly, DBNs can incorporate hidden (or latent) variables, which either govern, or are thought to govern, the observations of the observable random variable. These hidden variables are random variables whose true value cannot be measured directly, sometimes due to instrumental inadequacies, and other times simply because the existence of these variables is only hypothesized. Examples of such hidden variables are the intelligence of an individual (which cannot be measured precisely using any instrument) or the state (bull or bear) of a financial market (whose existence itself is hypothetical). Secondly, since they are an offshoot of BNs, DBNs allow for compact representations of an otherwise possibly cumbersome joint distribution. Russell and Norvig [1] argue that, due to conditional independence, representing a joint distribution as a DBN may eliminate redundant terms, which, in turn, also makes parameter estimation simpler. Murphy [2] states that another benefit of DBNs is the ease with which variations to the model at hand can be introduced. DBNs can represent both basic Hidden Markov Models (HMMs) and HMMs with variations, such as HMMs with mixture-of-Gaussian outputs, Auto-regressive HMMs, Input-Output HMMs and Hierarchical HMMs (HHMMs). DBNs can also be used to represent Kalman Filter Models where, contrary to HMMs, the hidden state is continuous rather than discrete. Lastly, DBNs can answer questions about different types of reasoning or inference, such as diagnostic inference (from effects to causes), causal inference (from causes to effects), intercausal inference (between causes of the same effect), and mixed inference (a mixture of any two of the above types of inference).
In this paper, we use the price-earnings (PE) ratio in the models discussed. Various papers in the literature apply the PE ratio to examine investment strategies, most notably Basu [3]. The ratio has also been used in a vast range of other work, including that of Lleo [4] and Angelini [5]. Chang and Tian [6] use BNs to model the qualitative and quantitative relationships between several variables that affect the dynamics of the S&P 500 stock index, with the aim of optimizing trading decisions, namely when to open a short position and when to invest in a long position. Damiano et al. [7] base their work on a previous study, that of Tayal [8], which employs HHMMs to analyze financial data. The results obtained by Tayal are reproduced in Damiano et al.'s paper, and further insight is given. Damiano et al. [7] conclude that probabilistic inference allows the identification of two distinct states in high-frequency data that are mainly marked by buying and selling pressure.
The rest of this paper is structured as follows. Section 2 introduces the theoretical underpinnings of DBNs. In Section 3, DBNs are applied to twelve equities from both developed and developing markets, and the return on investment is presented. Both developed and developing markets are considered in order to test model robustness under different economic conditions and growth patterns. Finally, in the concluding section, the key points and results obtained are highlighted.

Theoretical Framework
The model used in this paper is based on the work by Wang [22] and exploits two behavioral finance phenomena: behavioral volatility and mean reversion. Behavioral volatility refers to the phenomenon whereby market-related events, such as irrational trades executed by inexperienced traders, cause the trading price of a stock to deviate from its 'true' value. Events like these continually affect the market price either until their effects cancel each other out, or until rational investors offset them with their rational trades. This return towards the 'true' value is known as mean reversion. The theory of mean reversion is applicable not only to the price of a stock, but to any price-related metric, such as the price-earnings (PE) ratio (share price divided by the earnings per share). As a result, one can decide to buy (or sell) a stock only when some metric that follows the mean reversion theory is far away from the mean, knowing that the metric will return to its 'true' value at some point in the future.
A Directed Graphical Model (DGM) is a representation of a probabilistic model that uses conditional independence assumptions through a directed graph, where the vertices of the graph represent random variables and the arcs represent conditional dependence between pairs of random variables (note that the terms "state", "random variable", "node" and "vertex" may be used interchangeably). An arc from vertex A to vertex B represents the statement "A causes B" (Murphy [2]). DGMs are convenient due to their compactness in exhibiting a joint probability distribution: for $N$ binary random variables, the general closed form of the joint probability distribution of these random variables may need $O(2^N)$ parameters, whereas the graphical model may give the same information using fewer parameters due to the omission of terms via conditional independence statements.
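The compactness argument can be illustrated with a small count. The sketch below compares the number of free parameters of an unrestricted joint table over N binary variables with the number needed when the variables are assumed to form a chain X1 -> X2 -> ... -> XN. The chain structure and the function names are assumptions made purely for illustration; they are not the model used later in the paper.

```python
def full_joint_params(n: int) -> int:
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1


def chain_params(n: int) -> int:
    """Free parameters when n binary variables form a chain X1 -> ... -> Xn.

    X1 needs one parameter; every later node needs one parameter per value
    of its single parent, i.e. two.
    """
    return 1 + 2 * (n - 1)


for n in (3, 10, 20):
    print(n, full_joint_params(n), chain_params(n))
```

The full table grows exponentially in N, while the chain grows linearly, which is the saving that conditional independence statements provide.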
A Bayesian Network is a DGM that represents a set of random variables $X_1, \ldots, X_n$ and their conditional dependencies through a directed acyclic graph (DAG), denoted by $G = (V, E)$. Each vertex in $V$ represents a random variable $X_i$, and each directed edge in $E$ represents the conditional dependence a random variable has on another. A conditional probability distribution is associated with each node $X_i$, and the joint probability distribution on the vertex set $V$ is given by
$$P[X_1, \ldots, X_n] = \prod_{i=1}^{n} P[X_i \mid \mathrm{Pa}(X_i)],$$
where $\mathrm{Pa}(X_i)$ denotes the set of parents of $X_i$ in $G$. The generalization of a BN to multiple time slices gives rise to the definition of a DBN. A Dynamic Bayesian Network is defined to be a pair $(B_0, B_{\rightarrow})$, where $B_0$ is the BN representing the prior probability (or the initial state probability), that is, the probability distribution of the random variables at time 0, and $B_{\rightarrow}$ is a two-slice temporal BN which describes transition probabilities from time $t-1$ to time $t$ for any node $X$ in the vertex set $V$ of the DAG $G = (V, E)$, denoted by $P[X_t \mid X_{t-1}]$. The joint probability distribution of the vertex set $V$ over all time slices is given by:
$$P[V_{0:T}] = P[V_0] \prod_{t=1}^{T} P[V_t \mid V_{t-1}],$$
where $V_{t_1:t_2}$ refers to the set of all vertices indexed from time $t_1$ up to time $t_2$. A DBN can be parametrized by its transition matrix $A_t = P[X_t = j \mid X_{t-1} = i]$ and its prior distribution $P[X_0 = i]$. If the transition probabilities are assumed to be constant for all time slices ($A_t = A$), then the DBN is said to be homogeneous, and its joint distribution function is much more compact. When representing a DBN pictorially, two or three time slices are typically shown (the initial time slice and the subsequent one or two), since its structure is assumed to replicate throughout time.
An important property of DAGs is that "nodes can be ordered such that parents come before children. This is called a topological ordering, and it can be constructed from any DAG" (Murphy [23]). Given such an ordering, the Ordered Markov property (or Local Markov property) is defined to be the "assumption that a node only depends on its parents, not on all its predecessors in the ordering" (Murphy [23]). This assumption is in fact a generalization of the Markov property for Markov chains.
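A topological ordering can be computed directly with Python's standard library. The following is a minimal sketch on a hypothetical two-slice network with hidden nodes X1, X2 and observed nodes Y1, Y2; the node names and the helper `respects_parents` are illustrative assumptions.

```python
from graphlib import TopologicalSorter

# Each key is a node; its value is the set of that node's parents.
parents = {
    "X2": {"X1"},  # hidden state at slice 2 depends on slice 1
    "Y1": {"X1"},  # observation at slice 1
    "Y2": {"X2"},  # observation at slice 2
}


def respects_parents(order, parents):
    """Check that every parent appears before each of its children."""
    position = {node: i for i, node in enumerate(order)}
    return all(position[p] < position[child]
               for child, ps in parents.items() for p in ps)


order = list(TopologicalSorter(parents).static_order())
print(order, respects_parents(order, parents))
```

Several valid orderings usually exist for the same DAG; any of them satisfies the parents-before-children requirement.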

Inference for DBNs
The objective in inference is to infer the value of the latent states given the observations of the observable states, that is, inferring the marginals $P[X_t = i \mid y_{1:\tau}]$. If $\tau = t$, the process is known as filtering (or 'now-casting'); if $\tau > t$, it is smoothing; and if $\tau < t$, one is performing prediction (or forecasting). A commonly used inference algorithm for DBNs is the forward-backward algorithm. In this algorithm, dynamic programming is implemented through two steps, known as passes, that run in opposite directions: one runs forward in time, whilst the other runs backward. Note that it is assumed that the transition probability matrix, emission probability matrix, and prior probabilities are all known. In the forward pass, the value of $\alpha_t(i) := P[X_t = i \mid y_{1:t}]$ is found in a recursive manner. In the backward pass, the value of $\beta_t(i) := P[y_{t+1:T} \mid X_t = i]$ is also found recursively but moving in reverse chronological order (from time $T$ to time 2). After the forward and backward passes are complete, the value for $\gamma_t(i) := P[X_t = i \mid y_{1:T}]$ can finally be obtained:
$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j} \alpha_t(j)\,\beta_t(j)},$$
where the normalizing constants $c_t = P[y_t \mid y_{1:t-1}]$ of the forward pass give the likelihood of the data as $P[y_{1:T}] = \prod_{t=1}^{T} c_t$.
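The two passes can be sketched in a few lines of plain Python. This is a minimal illustration for a generic discrete hidden state with known parameters, not the implementation used in this paper; the function name, the argument layout (`lik[t][i]` holding the likelihood of observation t under state i) and the toy numbers are assumptions.

```python
def forward_backward(prior, trans, lik):
    """Return (alpha, beta, gamma, c) for a discrete hidden chain.

    prior[i]    : P[X_1 = i]
    trans[i][j] : P[X_t = j | X_{t-1} = i]
    lik[t][i]   : likelihood of the t-th observation given X_t = i
    """
    T, K = len(lik), len(prior)
    alpha, c = [], []
    for t in range(T):  # forward pass, normalized at every step
        if t == 0:
            pred = list(prior)
        else:
            pred = [sum(alpha[t - 1][i] * trans[i][j] for i in range(K))
                    for j in range(K)]
        unnorm = [pred[j] * lik[t][j] for j in range(K)]
        c.append(sum(unnorm))
        alpha.append([x / c[t] for x in unnorm])
    # Backward pass; left unscaled for clarity, so it can underflow on long series.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(trans[i][j] * lik[t + 1][j] * beta[t + 1][j]
                             for j in range(K))
    gamma = []
    for t in range(T):  # combine both passes and renormalize
        w = [alpha[t][i] * beta[t][i] for i in range(K)]
        total = sum(w)
        gamma.append([x / total for x in w])
    return alpha, beta, gamma, c


alpha, beta, gamma, c = forward_backward(
    [0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.1], [0.6, 0.3], [0.1, 0.9]])
print(gamma)
```

At the final time point smoothing and filtering coincide, so the last row of `gamma` equals the last row of `alpha`.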

Learning DBNs
In this context, learning refers to the parameter estimation process. In parameter estimation, learning can be tackled either through a Maximum Likelihood (ML) approach or through the Maximum A Posteriori (MAP) approach. If using ML, then the data are used to obtain parameter estimates via the solution of the optimization problem $\theta^*_{ML} = \arg\max_{\theta} \log P[Y \mid \theta]$, where $\theta$ is the set of parameters to be estimated. Typically, $\theta$ contains the transition matrix and parameters pertaining to the probability distribution used in the emission matrix. On the other hand, if using the MAP method, the optimization problem above is adjusted slightly to become $\theta^*_{MAP} = \arg\max_{\theta} \{\log P[Y \mid \theta] + \log P[\theta]\}$, where $P[\theta]$ is the parameter prior distribution. The optimization problems above are solved through an adaptation of the Expectation-Maximization (EM) algorithm known as the Baum-Welch algorithm, proven to give a local optimum (Baum et al. [24], Dempster et al. [25]). It uses the forward-backward algorithm as a subroutine. Hence, in cases where the model parameters are unknown, the Baum-Welch algorithm is first used to estimate (or learn) the model parameters. Then, the forward-backward algorithm is used to infer the posterior marginals.

Application of Theory
The main hypothesis of the model is that the stock price of a firm is not always equal to the firm's 'true' intrinsic value. Utilizing the phenomena of behavioral volatility and mean reversion, temporary effects that cause stock metrics to deviate from their 'true' value are classified into two types: short-term effects (lasting a few days) and medium-term effects (lasting several weeks).
It is hypothesized that the fundamental value of a company $i$ at time $t$, denoted by $P^*_{i,t}$, is directly proportional to its annual earnings at time $t$, denoted by $E_{i,t}$, with the fundamental PE ratio, denoted by $PE^*_{i,t}$, acting as the constant of proportionality: $P^*_{i,t} = PE^*_{i,t}\, E_{i,t}$. On the other hand, the observed PE ratio (openly available in the public domain), denoted by $PE_{i,t}$, is given by $PE_{i,t} = P_{i,t} / E_{i,t}$, where $P_{i,t}$ denotes the actual trading price of a company $i$ at time $t$.
Note that from this point onward, the index $i$ is dropped, as the analysis on stocks is performed univariately. The model equation that results from the above is given by:
$$y_t = \ln PE^* + \ln(1 + Z_t) + \varepsilon_t, \tag{1}$$
where $y_t = \ln(P_t / E_t)$, $\{Z_t\}_{t \in \mathbb{N}}$ is a discrete-time Markov chain modeling the medium-term noise effects, and $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ is a random variable modeling the short-term noise effects. Because of the logarithm on the left-hand side of the model equation, the model is only applicable to firms that have positive earnings throughout the period under study. Furthermore, one must note that $y_t$ is an observable quantity (since both $P_t$ and $E_t$ are); however, $PE^*$ and $Z_t$ are not. It is assumed that both $Z_t$ and $PE^*$ are discrete-valued; hence, $Z_t \in \{a_1, \ldots, a_M\}$ and $PE^* \in \{b_1, \ldots, b_N\}$, where $M$ and $N$ represent the number of possible latent states of $Z_t$ and $PE^*$, respectively.
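The conditional independence assumptions used in this model are presented next:
$$P[Z_t \mid Z_{1:t-1}, y_{1:t-1}, PE^*] = P[Z_t \mid Z_{t-1}], \qquad P[y_t \mid Z_{1:t}, y_{1:t-1}, PE^*] = P[y_t \mid Z_t, PE^*].$$
For $t \ge 2$ and $r, s \in \{1, \ldots, M\}$, the matrix $W := [w_{rs}]_{M \times M}$ is defined to be the transition probability matrix, where $w_{rs} := P[Z_t = a_s \mid Z_{t-1} = a_r]$. Note that $w_{rs} \in [0, 1]$ and $\sum_{s=1}^{M} w_{rs} = 1$. For $t \ge 1$, $m \in \{1, \ldots, M\}$ and $n \in \{1, \ldots, N\}$, the matrix $D_t := [d_{mn}(y_t)]_{M \times N}$ is defined to be the emission probability matrix at time $t$, where $d_{mn}(y_t)$ is the density of $y_t$ given $Z_t = a_m$ and $PE^* = b_n$, that is, the density of $\mathcal{N}(\ln[b_n(1 + a_m)], \sigma^2)$ evaluated at $y_t$. For $m \in \{1, \ldots, M\}$ and $n \in \{1, \ldots, N\}$, the vectors $u := (u_m)_{m=1,\ldots,M}$ and $v := (v_n)_{n=1,\ldots,N}$ are defined to be the initial probability vectors that contain the initial probability distributions, where $u_m := P[Z_1 = a_m]$ and $v_n := P[PE^* = b_n]$. These prior distributions serve to incorporate any expert knowledge that the researcher may have available. A graphical representation of the model described above can be found in Figure 1.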
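To make the emission matrix concrete, the sketch below evaluates the Gaussian density with mean ln[b_n(1 + a_m)] and variance sigma^2 at an observed log PE ratio. The function name `emission_matrix` and the grids chosen for a and b are hypothetical values for illustration only; they are not the values used in the analysis.

```python
import math


def emission_matrix(y_t, a, b, sigma2):
    """Return the M x N matrix of Gaussian densities d_mn(y_t).

    Entry (m, n) is the N(ln[b_n (1 + a_m)], sigma2) density at y_t.
    Requires 1 + a_m > 0 and b_n > 0 so that the logarithm is defined.
    """
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return [[norm * math.exp(-(y_t - math.log(b_n * (1.0 + a_m))) ** 2
                             / (2.0 * sigma2))
             for b_n in b]
            for a_m in a]


a_grid = [-0.2, 0.0, 0.2]   # hypothetical medium-term noise states
b_grid = [10.0, 15.0, 20.0]  # hypothetical fundamental PE values
D_t = emission_matrix(math.log(15.0), a_grid, b_grid, sigma2=0.01)
for row in D_t:
    print([round(x, 4) for x in row])
```

With an observed PE ratio of 15, the largest density is attained at the pair (a = 0, b = 15), whose implied PE ratio matches the observation exactly.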
Having laid out the principles needed for inference, the aim of the analysis becomes clearer: to infer the value of $PE^*$ so as to estimate the fundamental price of a stock and formulate a trading strategy based on this knowledge. Furthermore, inferring the value of $Z_t$ is also useful, as it can be used to test an alternative trading strategy. In this context, the data used will be split into a training and a test set. The set of model parameters is given by $\theta = \{W, u, v, \sigma^2\}$. The space of possible parameters is given by the set:
$$\Theta = \Big\{\theta : w_{rs} \ge 0,\ \sum_{s=1}^{M} w_{rs} = 1\ \forall r;\ u_m \ge 0,\ \sum_{m=1}^{M} u_m = 1;\ v_n \ge 0,\ \sum_{n=1}^{N} v_n = 1;\ \sigma^2 > 0\Big\}. \tag{6}$$

Figure 1. Representation of the model described above using a DBN, where $y_t$ is observable and $PE^*$ and $Z_t$ are latent, discrete variables.

Learning and inference then follow. Since the parameters in $\theta$ are unknown, they will first need to be estimated (learning procedure). As mentioned, the algorithm used to obtain the MAP estimates is the Baum-Welch algorithm. After the unknown model parameters are estimated, the forward-backward algorithm is run with the parameter estimates to infer the constant value of $PE^*$ for the training set and test set, and to infer the value of $Z_t$ through smoothing for the training set and through filtering for the test set.

Inference with Known Parameters
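As per the forward pass of the forward-backward algorithm, the filtering probabilities are defined, $\forall\, t \in \{1, \ldots, T\}$, $m \in \{1, \ldots, M\}$, $n \in \{1, \ldots, N\}$, by
$$\alpha_{tmn} := P[Z_t = a_m, PE^* = b_n \mid y_{1:t}],$$
and satisfy
$$\alpha_{1mn} = \frac{u_m v_n\, d_{mn}(y_1)}{c_1}, \qquad \alpha_{tmn} = \frac{d_{mn}(y_t)}{c_t} \sum_{r=1}^{M} w_{rm}\, \alpha_{(t-1)rn} \quad (t \ge 2),$$
where $c_t$ is the constant that makes $\sum_{m}\sum_{n} \alpha_{tmn} = 1$. After defining the filtering probabilities, the smoothing probabilities are given through the estimate denoted by $\gamma_{tmn}$:
$$\gamma_{tmn} := P[Z_t = a_m, PE^* = b_n \mid y_{1:T}].$$
Next, as per the backward pass of the forward-backward algorithm, the definition of $\beta_{tmn}$ is given so as to be able to obtain the values for $\gamma_{tmn}$:
$$\beta_{tmn} := P[y_{t+1:T} \mid Z_t = a_m, PE^* = b_n], \qquad \beta_{Tmn} := 1.$$
Therefore, as per the forward-backward algorithm, one has:
$$\gamma_{tmn} = \frac{\alpha_{tmn}\,\beta_{tmn}}{\sum_{r=1}^{M}\sum_{s=1}^{N} \alpha_{trs}\,\beta_{trs}}.$$
What remains to be derived are the expressions for $\beta_{tmn}$ $\forall\, t \in \{1, \ldots, T-1\}$. For $t = T - 1$:
$$\beta_{(T-1)mn} = \sum_{s=1}^{M} w_{ms}\, d_{sn}(y_T).$$
For $\beta_{tmn}$ $\forall\, t \in \{1, \ldots, T-2\}$, $m \in \{1, \ldots, M\}$, $n \in \{1, \ldots, N\}$:
$$\beta_{tmn} = \sum_{s=1}^{M} w_{ms}\, d_{sn}(y_{t+1})\, \beta_{(t+1)sn}.$$
With expressions found for both the smoothing and filtering probabilities, the most probable values of the latent variables $PE^*$ and $Z_t$ are found through marginalization:
$$\widehat{PE^*} = \arg\max_{b_n} \sum_{m=1}^{M} \gamma_{Tmn},$$
where $\sum_{m} \gamma_{Tmn} = P[PE^* = b_n \mid y_{1:T}]$. To find the estimate $\hat{Z}_t$ for the latent state $Z_t$, smoothing is used on the training set, whilst filtering is applied on the test set:
$$\hat{Z}_t = \begin{cases} \arg\max_{a_m} \sum_{n=1}^{N} \gamma_{tmn}, & y_t \in X_{train}, \\ \arg\max_{a_m} \sum_{n=1}^{N} \alpha_{tmn}, & y_t \in X_{test}, \end{cases}$$
where $X_{train}$ and $X_{test}$ denote the training set and test set, respectively.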
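The forward recursion over the joint latent state (Z_t, PE*) and the arg-max step through marginalization can be sketched as follows. This is a minimal illustration under the definitions of W, D_t, u and v given earlier, and not the implementation used in this paper; the function names and the toy numbers are hypothetical. Since PE* is constant, a transition only changes the Z component of the joint state.

```python
def filter_joint(D, W, u, v):
    """Filtering probabilities alpha[t][m][n] and normalizers c[t].

    D[t][m][n] : emission density of the t-th observation for (a_m, b_n)
    W[r][s]    : P[Z_t = a_s | Z_{t-1} = a_r]
    u[m], v[n] : initial distributions of Z_1 and PE*
    """
    M, N = len(u), len(v)
    alpha, c = [], []
    for t, D_t in enumerate(D):
        if t == 0:
            pred = [[u[m] * v[n] for n in range(N)] for m in range(M)]
        else:
            prev = alpha[-1]
            pred = [[sum(W[r][m] * prev[r][n] for r in range(M))
                     for n in range(N)] for m in range(M)]
        unnorm = [[pred[m][n] * D_t[m][n] for n in range(N)] for m in range(M)]
        c_t = sum(sum(row) for row in unnorm)
        alpha.append([[x / c_t for x in row] for row in unnorm])
        c.append(c_t)
    return alpha, c


def most_probable_states(post_t, a, b):
    """Marginalize a joint posterior over (m, n) and return the arg-max pair."""
    M, N = len(a), len(b)
    z_post = [sum(post_t[m]) for m in range(M)]
    pe_post = [sum(post_t[m][n] for m in range(M)) for n in range(N)]
    m_star = max(range(M), key=z_post.__getitem__)
    n_star = max(range(N), key=pe_post.__getitem__)
    return a[m_star], b[n_star]


# Toy example: two states each, emissions strongly favoring (a_2, b_2).
a, b = [-0.1, 0.1], [10.0, 20.0]
W = [[0.9, 0.1], [0.1, 0.9]]
D = [[[0.1, 0.1], [0.1, 5.0]], [[0.1, 0.1], [0.1, 5.0]]]
alpha, c = filter_joint(D, W, [0.5, 0.5], [0.5, 0.5])
print(most_probable_states(alpha[-1], a, b))
```

The same `most_probable_states` helper can be applied to smoothing probabilities on the training set, since it only requires a joint posterior over (m, n) at a given time point.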

Learning Unknown Parameters
The optimization problem in learning is given by $\theta_{MAP} = \arg\max_{\theta \in \Theta} P[\theta \mid y_{1:T}]$ and is solved using the Baum-Welch algorithm (note that in the forthcoming expressions, $\theta$ should technically be written with a hat superscript since it is a set of parameter estimates):
1. Choose an initial parameter set $\theta^{(0)}$ and a tolerance $\delta > 0$.
2. While $\lVert \theta^{(j)} - \theta^{(j-1)} \rVert > \delta$, solve the constrained maximization problem $\theta^{(j+1)} = \arg\max_{\theta \in \Theta} \{ Q(\theta; \theta^{(j)}) + \ln P[\theta] \}$, where $Q(\theta; \theta^{(j)})$ is the expected complete-data log-likelihood computed under $\theta^{(j)}$.
An expression in closed form can be obtained for $Q(\theta; \theta^{(j)})$ by using the smoothing probabilities. In the argument of the maximization problem above, $\ln P[\theta]$ is the logarithm of the prior distribution $P[\theta]$, where $P[\theta] = P[u]P[v]P[W]P[\sigma^2]$ since the independence of priors is assumed. Within this prior distribution, two types of expert knowledge can be included: prior knowledge of the ballpark value of $PE^*$ and prior knowledge of the persistence of the medium-term noise effects, which are encoded through the prior $P[v]$ and the prior $P[W]$, respectively.
The prior for the vector $v$ is represented by the Dirichlet distribution:
$$f(v) = \frac{\Gamma(k_1 + k_2 + \cdots + k_N)}{\Gamma(k_1)\Gamma(k_2)\cdots\Gamma(k_N)} \prod_{n=1}^{N} v_n^{k_n - 1}.$$
The values $k_n$ for $n \in \{1, \ldots, N\}$ intuitively correspond to the degree of belief an expert has in the event that $b_n$ is the 'true' value for $PE^*$. For this analysis, $k_n = 1\ \forall\, n$. The prior for the matrix $W$ is also derived from the Dirichlet distribution, $f(W) = \prod_{m=1}^{M} f(w_m)$, where $w_m = (w_{im})_{i=1,\ldots,M}$ and
$$f(w_m) = \frac{\Gamma(k_{1m} + k_{2m} + \cdots + k_{Mm})}{\Gamma(k_{1m})\Gamma(k_{2m})\cdots\Gamma(k_{Mm})} \prod_{i=1}^{M} w_{im}^{k_{im} - 1}.$$
Since $W$ is the transition matrix for the hidden Markov chain $\{Z_t\}_{t \in \mathbb{N}}$, the diagonal entries of $W$ represent the probability that a particular state persists (stays as is at the next time point). The greater the value of $Z_t$ (or, correspondingly, $a_m$), the greater that state's persistence. In this analysis, the off-diagonal entries are set to 0 (the reader is referred to Wang [22] for more information on the values of the off-diagonals). Taking all the above into consideration, the part of the log-prior $\ln P[\theta]$ that depends on $v$ and $W$ becomes
$$\ln P[v] + \ln P[W] = s_1 + \sum_{n=1}^{N} (k_n - 1) \ln v_n + \sum_{m=1}^{M} \sum_{i=1}^{M} (k_{im} - 1) \ln w_{im},$$
where $s_1 = \ln \frac{\Gamma(k_1 + k_2 + \cdots + k_N)}{\Gamma(k_1)\Gamma(k_2)\cdots\Gamma(k_N)} + \sum_{m=1}^{M} \ln \frac{\Gamma(k_{1m} + k_{2m} + \cdots + k_{Mm})}{\Gamma(k_{1m})\Gamma(k_{2m})\cdots\Gamma(k_{Mm})}$. The constant $s_1$ is only included for completeness' sake; it is rendered irrelevant when maximizing in the Baum-Welch algorithm.
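The effect of the Dirichlet parameters can be checked numerically. The sketch below evaluates a Dirichlet log-density with `math.lgamma`; the function name and the example parameter values are illustrative assumptions and are not the values used in the analysis.

```python
import math


def dirichlet_log_pdf(p, k):
    """Log-density of a Dirichlet(k) distribution at the probability vector p.

    All entries of p must be strictly positive and all entries of k positive.
    """
    log_norm = math.lgamma(sum(k)) - sum(math.lgamma(k_i) for k_i in k)
    return log_norm + sum((k_i - 1.0) * math.log(p_i) for k_i, p_i in zip(k, p))


# With k_n = 1 for all n the prior is flat: the density does not depend on p.
print(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))
print(dirichlet_log_pdf([0.6, 0.3, 0.1], [1.0, 1.0, 1.0]))

# A large first parameter favors vectors that put most mass on the first
# entry, which is one way of encoding a belief in state persistence.
print(dirichlet_log_pdf([0.8, 0.1, 0.1], [8.0, 1.0, 1.0]))
print(dirichlet_log_pdf([0.1, 0.8, 0.1], [8.0, 1.0, 1.0]))
```

The flat case shows why setting every k_n to 1 makes the prior on v uninformative: only the normalizing constant remains, and it drops out of the maximization.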
With the priors set, the constrained optimization problem is now fully defined. Note that only the equality constraints in (6) will be considered, as the inequality constraints turn out to be satisfied regardless. Therefore, the method of Lagrange multipliers can be used to solve this optimization problem, which yields closed-form estimators for the four parameters in question ($u$, $v$, $W$ and $\sigma^2$), expressed in terms of the quantities computed in the forward-backward algorithm.

Methodology of Analysis and Results
Twelve equities were chosen to be analyzed in this paper (see Table 1): nine from a developed market (US) and three from emerging markets (Brazil and China). The training set contains data from the 1st of January 2011 to the 31st of December 2019, whilst the test set contains data from the 1st of January 2020 to the 30th of September 2020. Using a 12-month rolling Sharpe ratio as a point of reference, all equities displayed average or above-average returns (with respect to an S&P benchmark) in the test period, with the exception of the two underperforming Brazilian equities [26]. Two datasets were collected for each stock for the above-mentioned period: daily price data and quarterly earnings per share data. After the preliminary data cleaning and preparation phase, the initial values of the vectors $a = (a_m)$ and $b = (b_n)$ are set. Priors of the vector $v$ and matrix $W$ are set as discussed earlier; the prior for $u$ is set to be the discrete uniform distribution, and the initial value of $\sigma^2$ is set to 5. Parameter learning is then performed before inference of the latent states $PE^*$ and $Z_t$.
Simulation of the trading strategies follows. The trading strategies proposed by Wang [22] and used in this paper will be compared to the benchmark B&H strategy. For both of the proposed trading strategies, let $I_t$ denote the amount of cash available at time $t$; let $N_t$ denote the units of a security held at time $t$; let $T_{train}$ represent the size of the training set; and let $T$ represent the size of the dataset (sum of sizes of training and test sets). Both the long-term and the medium-term trading strategies can be described by three possible courses of action (labeled (i), (ii) and (iii) below) at time $t$; that is, courses of action (i) through (iii) are common to both trading strategies:
(i) If $PE_t \le A_t(1 - Tr)$ and $I_t > 0$, then buy the security using all the available cash $I_t$. Hence, $N_{t+1} = I_t / P_t$ and $I_{t+1} = 0$.
(ii) If $PE_t \ge A_t(1 + Tr)$ and $I_t = 0$, then sell all the units held $N_t$. Hence, $N_{t+1} = 0$ and $I_{t+1} = P_t N_t$.
(iii) If neither (i) nor (ii) is satisfied, do not execute any trades. Hence, $N_{t+1} = N_t$ and $I_{t+1} = I_t$.
Note that $A_t$ is considered to be a baseline and depends on the trading strategy. The threshold value $Tr \in (0, 1)$ acts as a sensitivity gauge defining how much the investor wants to allow $PE_t = P_t / E_t$ to deviate from the baseline $A_t$ before triggering a particular course of action in the trading strategies. This is clear from how the courses of action are defined. The total profit at the end of the trading period is given by $I_T + P_T N_T - I_{T_{train}+1}$.
The first trading strategy is the so-called 'long-term strategy', where trading is performed with respect to the constant value of $PE^*$, so $A_t = PE^*$. The alternative strategy is the 'medium-term strategy', where trading is performed with respect to the dynamic values of $PE^*(1 + Z_t)$, where each $Z_t$ is dynamically estimated through filtering. Therefore, for the medium-term strategy, $A_t = PE^*(1 + Z_t)$.
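Courses of action (i) through (iii) and the profit formula can be simulated with a short loop. The following is a minimal sketch, not the code used to produce the results in this paper; the function name `simulate_strategy`, the toy price series and the constant baseline are hypothetical. No fees or commissions are modeled, in line with the analysis.

```python
def simulate_strategy(prices, pe, baseline, tr, initial_cash=10_000.0):
    """Apply rules (i)-(iii) and return (profit, list of (time, action))."""
    if not (len(prices) == len(pe) == len(baseline)):
        raise ValueError("all series must have the same length")
    cash, units = initial_cash, 0.0
    trades = []
    # A decision at time t fixes the holdings at t + 1, so the last time
    # point is only used to value the final position.
    for t in range(len(prices) - 1):
        if pe[t] <= baseline[t] * (1.0 - tr) and cash > 0.0:
            units, cash = cash / prices[t], 0.0       # rule (i): buy
            trades.append((t, "buy"))
        elif pe[t] >= baseline[t] * (1.0 + tr) and cash == 0.0:
            cash, units = prices[t] * units, 0.0      # rule (ii): sell
            trades.append((t, "sell"))
        # rule (iii): otherwise hold, nothing changes
    profit = cash + prices[-1] * units - initial_cash
    return profit, trades


# Toy data: constant earnings of 10 per share, so PE_t = P_t / 10.
prices = [100.0, 110.0, 120.0, 90.0, 130.0]
pe = [p / 10.0 for p in prices]
baseline = [11.0] * len(prices)  # e.g. a constant PE* for the long-term strategy
profit, trades = simulate_strategy(prices, pe, baseline, tr=0.05)
buy_and_hold = 10_000.0 / prices[0] * prices[-1] - 10_000.0
print(trades, round(profit, 2), round(buy_and_hold, 2))
```

For the medium-term strategy, the same function can be called with a time-varying baseline built from the filtered estimates of Z_t.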
The results on the BLK and ITUB stocks are presented in graphical detail in this paper. Firstly, the long-term strategy on the BLK stock data suggests that the investor buys the stock at time point $2265 = T_{train} + 1$ and holds the stock for the rest of the period. Clearly, this strategy coincides with the B&H strategy and, as a result, the profit from the long-term strategy equals the profit from the B&H strategy, namely USD 1298.04, a 12.98% return on the initial investment of USD 10,000. On the other hand, the medium-term strategy suggests that the investor buys the stock at time points 2265, 2301 and 2437, sells it at time points 2289 and 2431, and holds it for the rest of the time points. This would yield a profit of USD 3391.85, a 33.92% return on the initial investment of USD 10,000 in the nine-month period that the test set covers. This means that the medium-term strategy beats the B&H strategy by 20.94 percentage points. Figure 2 illustrates the long-term strategy and medium-term strategy.
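The returns quoted for the BLK stock follow directly from the reported profits and the initial investment, as the short check below shows. The variable names are illustrative; the figures are those reported in the text.

```python
initial_investment = 10_000.00
bh_profit = 1298.04   # buy-and-hold (and long-term) profit on BLK, in USD
mt_profit = 3391.85   # medium-term profit on BLK, in USD

bh_return = 100.0 * bh_profit / initial_investment
mt_return = 100.0 * mt_profit / initial_investment
gap = mt_return - bh_return  # difference in percentage points

print(round(bh_return, 2), round(mt_return, 2), round(gap, 2))
```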
Next, we discuss the ITUB stock data. The long-term strategy suggests that the investor buys the stock at time point 2308 and holds the stock for the rest of the period. Implementing this strategy would yield a loss of USD 3986.16, which equates to a 39.86% loss on the initial investment of USD 10,000. The buy-and-hold strategy, however, would yield a greater loss of USD 5584.67. In contrast, the medium-term strategy suggests that the investor buys the stock at time points 2313, 2315 and 2376, sells it at time points 2314 and 2372, and holds it for the rest of the time points. This would actually yield a profit of USD 909.81. Although this is only a 9.10% return on the initial investment of USD 10,000, the medium-term strategy allows the investor to make a profit in a period when the stock is actually crashing. Figure 3 illustrates the long-term strategy and medium-term strategy. More generally, we see in Table 1 that the medium-term strategy has been consistently superior (for various thresholds) for BLK, COST, HD, ITUB, MA, MCD, NVDA and UNH. For ITUB, for all but one threshold, the strategy turns a slight profit even though a loss is registered for the other strategies. The long-term strategy has been consistently superior for NTES and SAN. Neither strategy has offered any advantage on ADBE, while for AAPL, the success of the medium-term strategy depends on the choice of threshold. Some further analysis of these results is given in the conclusion.

Conclusions
Through the use of DBNs, the model for stock movement by Wang [22] is built for our equity dataset. This model includes two latent states: one modeling the medium-term noise effects, and the other modeling the true fundamental PE ratio of a firm, the latter assumed to be constant throughout the period under study. The forward-backward algorithm and the Baum-Welch algorithm (a variant of the EM algorithm) are used to perform parameter learning and inference. Based on this fitted model, the long- and medium-term strategies are applied to the twelve stocks studied here, with nine of these stocks trading on a developed market and the rest on emerging ones. Overall, the long-term and medium-term strategies outperform the benchmark B&H trading strategy 17 and 31 times, respectively, out of a total of 48 experiment runs for each strategy (four for each of the twelve stocks). The strategies proposed by Wang only lose out to the B&H four and three times, respectively. Furthermore, the outperformance of these trading strategies is substantial. Whereas the average profit over all stocks when using the B&H is 20.83%, the average profit over all stocks and over all thresholds for the long-term strategy is 27.78% and that for the medium-term strategy is 36.23%. Lastly, these trading strategies provide the investor with trading suggestions that, in certain cases, can even turn a loss under the B&H strategy into a profit, as was shown to be the case for the ITUB stock. This stock was falling, but the medium-term strategy still yielded a profit on the sum invested. All these results are displayed in Table 1.
As with any statistical model, the model implemented in this work has room for improvement. The first and most significant limitation is that only stocks that had a positive EPS during the period under study can be modeled. This is due to the left-hand side of the model equation, that is, Equation (1). In certain times, such as during pandemics or recessions, it may be considerably difficult to find a firm that has not registered a loss in at least one quarter of the period under study. A second limitation is the lack of use of expert knowledge. In real-world investing, expertise in the field is considered highly valuable, and only seasoned investors are generally advised to trade actively. Since the model in this paper allows for the incorporation of expert knowledge, it is indeed a limitation that such knowledge is not made use of. A related limitation is that no trading fees or commissions are taken into account in this analysis. Although the proposed model allows for commissions to be considered in the trading strategies, no knowledge of the actual values of fees or commissions charged was available at the time of writing, and hence, such expenses could not be taken into account. It is well known that trading fees can accumulate when numerous trades are executed, to the point where a profit turns into a loss once these fees are accounted for, which would make the buy-and-hold strategy more profitable.
Improvements on the above-mentioned limitations can add value to the model and should be considered in future works. In addition, another improvement that would make the model more applicable in real-life scenarios is the incorporation of some variable that measures the liquidity of a stock. This is because the suggestions provided by the trading strategies on when to buy and sell a stock are rendered useless if the stock itself is not liquid. Prices may change by the time the stock becomes liquid enough for an actual trading opportunity to arise, and if the price changes significantly, the suggestion itself no longer applies. In this study, liquidity was not envisaged to be of concern since all 12 stocks considered are large-cap stocks.
The choice of historical data used is always an important decision because the length of the time series plays a major role. In principle, longer time series are preferred to shorter ones, but if the historical data contain changes in regime, this may inhibit the model in its forecasting performance. Furthermore, future regime changes that may occur in the test period may also impact the model's forecasting performance.
Finally, other improvements that can be implemented in future studies are further and deeper experimentation and testing, and the possibility of short-selling stock. To begin with, experimentation at a portfolio level can be implemented with the stocks studied here to understand how the profits change when a group of stocks is considered together; for instance, samples of nine stocks from the twelve studied here can be taken to form portfolios, and the 'optimal' portfolio can be identified. Apart from this, more time can be spent on cross-validation in future studies to further improve on the suggestions provided by the trading strategies in this work. Also, the properties of all the estimators used could be derived in future works, to better understand their behavior. Finally, this strategy does not allow the short-selling of stock; extensions to the model which allow for this could be proposed.

Figure 2. Simulated strategies for the BLK stock.

Figure 3. Simulated strategies for the ITUB stock.

Table 1. Results, showing the return on investment as a percentage of the sum invested. Values in brackets are negative.