TPLVM: Portfolio Construction by Student's $t$-process Latent Variable Model

Optimal asset allocation is a key topic in modern finance theory. To realize the optimal asset allocation for an investor's risk aversion, various portfolio construction methods have been proposed. Recently, applications of machine learning have been growing rapidly in the area of finance. In this article, we propose the Student's $t$-process latent variable model (TPLVM) to describe non-Gaussian fluctuations of financial time series by lower-dimensional latent variables. Subsequently, we apply the TPLVM to the minimum-variance portfolio as an alternative to existing nonlinear factor models. To test the performance of the proposed portfolio, we construct minimum-variance portfolios of global stock market indices based on the TPLVM and on the Gaussian process latent variable model. By comparing these portfolios, we confirm that the proposed portfolio outperforms the one based on the existing Gaussian process latent variable model.


Introduction
Estimation of the covariance matrix of time series plays a dominant role in applications of modern financial theory. The optimization of the mean-variance portfolio, one of the pioneering works of modern finance theory [1], is based on the covariance matrix of the multi-dimensional time series of asset returns. Since asset returns are modelled by non-stationary stochastic processes, the covariance matrix should be estimated as a time-dependent symmetric matrix. In practice, we often estimate the covariance matrix by empirical time averaging, because complete information on the underlying probability space is not available. It has been pointed out, however, that time averaging often causes serious estimation errors in the covariance matrix when the number of assets is large [2,3]. To overcome this problem, several inference methods have been proposed from the viewpoint of random matrix theory [4,5].
With the aid of recently developed machine learning techniques, the accuracy of the estimation of the covariance matrix can be improved [6,7]. Furthermore, applications of machine learning techniques have been spreading in both theoretical and practical financial problems [8,9]. The prediction of future prices has been implemented with deep neural networks of various architectures [10,11]. The Gaussian process is used as a model of the dynamics of the covariance matrix of multi-dimensional time series. In the literature of option pricing theory, the volatility of a risky asset has been modelled by the Gaussian process [12]. In particular, the application of machine learning techniques to portfolio optimization has attracted the interest of both academia and industry [13,14].
In the field of mathematical finance, stochastic volatility models have been utilized to estimate the dynamic covariance matrix of asset returns. One of the most popular conditional volatility models is the generalized autoregressive conditional heteroscedasticity (GARCH) model [15], which describes the volatility clustering of asset returns. To introduce a time-varying correlation structure into these conditional volatility models, the dynamic conditional correlation (DCC) GARCH model has been proposed [16]. The parameters of the GARCH and DCC GARCH models can be estimated by the method of maximum likelihood.
On the other hand, in the machine learning literature, several kinds of latent variable models can be utilized to infer the dynamics of the covariance matrix. Recently, the Gaussian process latent variable model (GPLVM) has been applied to the problem of portfolio optimization, where latent variables are introduced as factors of the asset returns. Namely, this model can be interpreted as a latent variable factor model [17].
Despite these practical applications, the assumptions behind the use of the GPLVM in finance should be reconsidered, because the GPLVM assumes that the observed data follow the Gaussian distribution. In most financial problems, the return of assets is regarded as the observed variable. It is well known that the fluctuations of asset returns follow non-Gaussian distributions [18]. To describe such fluctuations, several fat-tailed distributions have been presented and applied to financial time series. Thus, the GPLVM should be extended to fat-tailed distributions when it is used for financial problems.
In this paper, we propose the Student's t-process latent variable model (TPLVM) as an extension of the GPLVM. This model is based on the Student's t-distribution, which is a symmetric fat-tailed distribution. Since the Student's t-distribution converges to the Gaussian distribution in the limit of infinite degrees of freedom, the TPLVM includes the GPLVM as a special case. To use the TPLVM in practice in the same way as the GPLVM, we derive its predictive distribution in closed form and an estimator of the hyperparameters by variational inference in the Bayesian sense.
The remainder of this paper is organized as follows. Chapter 2 gives a brief introduction to the GPLVM, including the Gaussian process and the concept of kernel functions. In Chap. 3, we introduce the formulation of the TPLVM, which consists of the kernel functions, the predictive distribution and the variational inference for estimating hyperparameters. As a preparation for the financial application, we explain the basics of the factor model and portfolio optimization in Chap. 4. Chapter 5 implements portfolio optimization, where we compare the performance of the GPLVM and the TPLVM. Chapter 6 is dedicated to conclusions and future work.
2 Short review of Gaussian process

Gaussian process
The Gaussian process, a kind of stochastic process, is a non-parametric machine learning method [19,20]. It was first introduced to describe random dynamics such as Brownian motion, the erratic movement of pollen particles on a water surface [21]. Without loss of generality, the argument of the Gaussian process can be extended from one-dimensional time to a multi-dimensional feature space. In this chapter, we provide a short review of the Gaussian process for multi-dimensional features as a preparation for the proposed model.
For a sequence of input features $\{x_1, x_2, \dots, x_n\}$, a stochastic process $f(\cdot)$ is a Gaussian process when the sequence of random variables $\{f(x_1), f(x_2), \dots, f(x_n)\}$ follows a multivariate Gaussian distribution. In general, the form of the multivariate Gaussian distribution is determined by the mean vector and the covariance matrix. Likewise, the Gaussian process is specified by the mean and covariance functions of the input features. Thus, the Gaussian process is regarded as an infinite-dimensional representation of the Gaussian distribution.
The mean and covariance functions are defined as follows:

$$m(x) = E[f(x)], \qquad k(x, x') = E\big[(f(x) - m(x))(f(x') - m(x'))\big],$$

where the operator $E[\cdot]$ denotes expectation, and $m(\cdot)$ and $k(\cdot, \cdot)$ are the mean and covariance functions, respectively. The mean vector and covariance matrix of the Gaussian process for a given dataset $\mathcal{D} = \{x_1, \dots, x_n\}$ are represented by

$$m = (m(x_1), \dots, m(x_n))^\top, \qquad K_{ij} = k(x_i, x_j) \quad (i, j = 1, \dots, n).$$

In this situation, the stochastic process $f(\cdot)$ is the Gaussian process expressed as $f \sim GP(m, K)$. The covariance function is symmetric and positive definite, and is thus also called a kernel function. In the literature of the Gaussian process, the covariance matrix is often called the kernel matrix. The mathematical characteristics of kernel functions are explained in [22].
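Given an additional dataset $\mathcal{D}^* = \{x^*_1, \dots, x^*_{n^*}\}$, the predictive distribution of the conditional Gaussian process is also given by the Gaussian process $GP(f^*, K^*)$, where

$$f^* = m^* + K_{\mathcal{D}^*\mathcal{D}} K_{\mathcal{D}\mathcal{D}}^{-1} (f - m), \tag{5}$$

$$K^* = K_{\mathcal{D}^*\mathcal{D}^*} - K_{\mathcal{D}^*\mathcal{D}} K_{\mathcal{D}\mathcal{D}}^{-1} K_{\mathcal{D}\mathcal{D}^*}. \tag{6}$$

Here $f$ and $m$ are the vectors of function values and means on the original dataset $\mathcal{D}$, $m^*$ is the mean vector on $\mathcal{D}^*$, and $K_{AB}$ denotes the kernel matrix whose entries are $k(x, x')$ for $x \in A$ and $x' \in B$. In Eqs. (5) and (6), it is seen that the covariance functions propagate the information about $\mathcal{D}$ to $\mathcal{D}^*$. Hence, the covariance functions play the dominant role in the use of the Gaussian process.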
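The conditional mean and covariance of a Gaussian process can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes a zero mean function and a squared-exponential kernel, and the names `rbf_kernel`, `gp_predict`, `lengthscale` and `jitter` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B (assumed kernel)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def gp_predict(X, f, X_star, lengthscale=1.0, jitter=1e-8):
    """Conditional mean and covariance of a zero-mean GP at X_star given f at X."""
    K = rbf_kernel(X, X, lengthscale=lengthscale) + jitter * np.eye(len(X))
    K_s = rbf_kernel(X_star, X, lengthscale=lengthscale)
    K_ss = rbf_kernel(X_star, X_star, lengthscale=lengthscale)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))  # K^{-1} f
    V = np.linalg.solve(L, K_s.T)                        # L^{-1} K_s^T
    mean = K_s @ alpha
    cov = K_ss - V.T @ V
    return mean, cov

X = np.linspace(0.0, 1.0, 5)[:, None]
f = np.sin(2.0 * np.pi * X[:, 0])
mean_train, cov_train = gp_predict(X, f, X, lengthscale=0.2)
mean_new, cov_new = gp_predict(X, f, np.array([[0.1], [0.6]]), lengthscale=0.2)
```

Predicting at the training inputs reproduces the observed values with almost zero variance, while new inputs receive a positive predictive variance that depends only on the kernel and the input locations.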

Gaussian process latent variable model
In the literature of big data analysis, it is often expected that observed variables can be explained by lower-dimensional latent variables. For this purpose, various methods of dimension reduction have been developed. One of the most popular methods is principal component analysis (PCA), which extracts latent variables by the singular value decomposition. To extend PCA to nonlinear and random data, the Gaussian process latent variable model (GPLVM) has been proposed [23]. The GPLVM expresses the nonlinear relation between observed and latent variables through the covariance function. The randomness is assumed to originate from the Gaussian distribution.
To describe an observed variable $y \in \mathbb{R}^D$, we introduce a latent variable $x \in \mathbb{R}^Q$ with $Q < D$, and a nonlinear map $f: \mathbb{R}^Q \to \mathbb{R}^D$ with a $D$-dimensional noise $\epsilon \sim N(0, \sigma_0 I)$ as

$$y = f(x) + \epsilon.$$

For this latent variable model, we assume that the nonlinear map $f(\cdot)$ is sampled from the Gaussian process as $f \sim GP(0, K)$. This model is known as the GPLVM. For the sake of brevity, we introduce notations for the sets of latent and observed variables as

$$X = (x_1, \dots, x_N)^\top \in \mathbb{R}^{N \times Q}, \qquad Y = (y_1, \dots, y_N)^\top \in \mathbb{R}^{N \times D}.$$

Assuming that the columns $y_{:,d}$ of the observed matrix $Y$ are independent and identically distributed Gaussian samples whose covariance is given by the covariance function of the latent variable matrix $X$, the probability density function of the GPLVM is introduced as follows:

$$p(Y \mid X) = \prod_{d=1}^{D} N\big(y_{:,d} \mid 0,\; K + \sigma_0 I\big), \qquad K_{ij} = k(x_i, x_j).$$

In the GPLVM, the hyperparameters of the covariance function and the latent variables are inferred by existing methods such as gradient methods, variational inference and Markov chain Monte Carlo methods.
3 Proposed model: Student's t-process latent variable model

Introduction of the Student's t-process
The Gaussian process has diverse applications in the fields of computer science, robotics and others. However, it is poorly suited to problems in finance because the fluctuations of financial data follow non-Gaussian distributions with fat tails. It is thus necessary to extend the methods of the Gaussian process to non-Gaussian stochastic processes with fat tails.
For this purpose, the Student's t-process was proposed as a generalization of the Gaussian process [24]. This stochastic process follows the Student's t-distribution, whose tails show power-law behaviour. As with the Gaussian process, the Student's t-process is specified by the mean and covariance functions. Given the mean and covariance functions, the probability density function of the Student's t-process for the vector $f = (f(x_1), \dots, f(x_n))^\top$ is defined as

$$TP(m, K; \nu) = \frac{\Gamma\!\left(\frac{\nu + n}{2}\right)}{\big((\nu - 2)\pi\big)^{n/2}\, \Gamma\!\left(\frac{\nu}{2}\right)}\, |K|^{-1/2} \left(1 + \frac{(f - m)^\top K^{-1} (f - m)}{\nu - 2}\right)^{-\frac{\nu + n}{2}},$$

where $\Gamma(\cdot)$ is the gamma function and the real parameter $\nu > 2$ is the degrees of freedom. In this setting, the stochastic process $f(\cdot)$ is the Student's t-process expressed as $f \sim TP(m, K; \nu)$. Note that the Student's t-process converges to the Gaussian process in the limit $\nu \to \infty$.
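The convergence to the Gaussian case and the heavier tails can be checked numerically. The sketch below assumes the parameterization of the multivariate Student's t-density in which $K$ is the covariance matrix (valid for $\nu > 2$); the function names `mvt_logpdf` and `mvn_logpdf` are hypothetical helpers, not part of any library.

```python
import math
import numpy as np

def mvt_logpdf(f, m, K, nu):
    """Log-density of the multivariate Student's t with covariance K (assumes nu > 2)."""
    n = len(f)
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, f - m)
    beta = float(z @ z)                          # (f - m)^T K^{-1} (f - m)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log |K|
    return (
        math.lgamma(0.5 * (nu + n)) - math.lgamma(0.5 * nu)
        - 0.5 * n * math.log((nu - 2.0) * math.pi)
        - 0.5 * log_det
        - 0.5 * (nu + n) * math.log1p(beta / (nu - 2.0))
    )

def mvn_logpdf(f, m, K):
    """Log-density of the multivariate Gaussian, used for comparison."""
    n = len(f)
    L = np.linalg.cholesky(K)
    z = np.linalg.solve(L, f - m)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + float(z @ z))

K = np.array([[1.0, 0.3], [0.3, 1.0]])
m = np.zeros(2)
typical = np.array([0.5, -0.2])   # a point near the mean
extreme = np.array([6.0, -6.0])   # a point far in the tail

gap_large_nu = abs(mvt_logpdf(typical, m, K, 1e6) - mvn_logpdf(typical, m, K))
tail_t = mvt_logpdf(extreme, m, K, 3.0)
tail_gauss = mvn_logpdf(extreme, m, K)
```

For a very large $\nu$ the two log-densities nearly coincide, whereas for a small $\nu$ the Student's t-density assigns a much higher log-density to the extreme point than the Gaussian does.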
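The conditional distribution of the Student's t-process can also be derived analytically and is given as a conditional Student's t-distribution. Namely, we can update the mean and covariance functions and the degrees of freedom from the conditional distribution. Let $f_1$ be the $n_1$ observed values with mean vector $m_1$ and kernel matrix $K_{11}$, let $m_2$ and $K_{22}$ be the mean vector and kernel matrix on the new inputs, and let $K_{21} = K_{12}^\top$ be the cross kernel matrix. Through cumbersome calculations, the renewal formulas of the mean and covariance functions and the degrees of freedom are derived as follows:

$$\tilde{m}_2 = m_2 + K_{21} K_{11}^{-1} (f_1 - m_1),$$

$$\tilde{K}_{22} = \frac{\nu + \beta_1 - 2}{\nu + n_1 - 2} \left(K_{22} - K_{21} K_{11}^{-1} K_{12}\right), \qquad \beta_1 = (f_1 - m_1)^\top K_{11}^{-1} (f_1 - m_1),$$

$$\tilde{\nu} = \nu + n_1.$$

It is seen that the renewal formula of the covariance function explicitly depends on the number of observed variables, a property that does not appear in the case of the Gaussian process. Hence, the Student's t-process is regarded as utilizing prior information more effectively than the Gaussian process.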
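A minimal numerical sketch of such an update is given below. It assumes a zero mean function and the commonly used rule in which the Gaussian-process conditional covariance is rescaled by $(\nu + \beta_1 - 2)/(\nu + n_1 - 2)$ with $\beta_1 = f_1^\top K_{11}^{-1} f_1$; the function name `tp_conditional` and the toy kernel matrix are illustrative assumptions.

```python
import numpy as np

def tp_conditional(K11, K21, K22, f1, nu):
    """Zero-mean Student's t-process update: returns mean, covariance, dof and scale."""
    n1 = len(f1)
    alpha = np.linalg.solve(K11, f1)                  # K11^{-1} f1
    beta = float(f1 @ alpha)                          # f1^T K11^{-1} f1
    mean = K21 @ alpha
    gp_cov = K22 - K21 @ np.linalg.solve(K11, K21.T)  # Gaussian-process conditional covariance
    scale = (nu + beta - 2.0) / (nu + n1 - 2.0)       # data-dependent rescaling
    return mean, scale * gp_cov, nu + n1, scale

# Toy kernel matrix: the first two points are observed, the third is predicted.
K = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
K11, K21, K22 = K[:2, :2], K[2:, :2], K[2:, 2:]
nu = 5.0

_, cov_calm, nu_calm, scale_calm = tp_conditional(K11, K21, K22, np.array([0.1, -0.2]), nu)
_, cov_wild, nu_wild, scale_wild = tp_conditional(K11, K21, K22, np.array([3.0, -3.0]), nu)
```

With small observed values the predictive covariance shrinks below the Gaussian-process value, while large observed values inflate it; the Gaussian-process conditional covariance is the same in both cases.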

Student's t-process latent variable model
To extend the GPLVM to stochastic processes following non-Gaussian distributions, we propose the Student's t-process latent variable model (TPLVM). Suppose that an observed variable $y \in \mathbb{R}^D$ is explained by a low-dimensional latent variable $x \in \mathbb{R}^Q$ ($Q < D$) through a nonlinear map $f: \mathbb{R}^Q \to \mathbb{R}^D$, $f \sim TP(m, K; \nu)$. The TPLVM is then introduced as follows: The nonlinear dependency on the latent matrix $X \in \mathbb{R}^{N \times Q}$ is given through the covariance matrix. It is expected that the TPLVM provides a robust estimation, especially for observed data with large fluctuations, because the Student's t-distribution can accommodate observations that deviate strongly from the Gaussian distribution.
As with the GPLVM, the latent variables and hyperparameters of the TPLVM can be estimated from its likelihood. The logarithmic likelihood of the TPLVM is given as follows: By means of existing optimization methods, we can estimate the latent variables, the hyperparameters of the covariance function and the degrees of freedom. However, it is known that the optimization of the covariance function with respect to the latent variables often induces numerical instability because of its complexity. Hence, we should carefully select the initial values of the optimization procedure and repeat it with diverse seeds of the initial values to avoid being trapped in local minima.
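The role of the initial values can be illustrated with a small sketch that evaluates a negative log-likelihood at several random initializations of the latent variables before any optimizer is run. Everything here is an assumption made for illustration: the columns of $Y$ are treated as independent multivariate Student's t vectors sharing one kernel matrix, the kernel is squared-exponential, and the names `kernel`, `tplvm_neg_log_lik`, `theta1`, `theta2` and `noise` are hypothetical.

```python
import math
import numpy as np

def kernel(X, theta1=1.0, theta2=1.0):
    """Squared-exponential kernel matrix on the latent points (assumed form)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return theta1 * np.exp(-0.5 * d2 / theta2**2)

def tplvm_neg_log_lik(X, Y, nu, noise=1e-2):
    """Negative log-likelihood, assuming independent t-distributed columns of Y."""
    n, d = Y.shape
    K = kernel(X) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    Z = np.linalg.solve(L, Y)                # whitened columns
    beta = np.sum(Z**2, axis=0)              # one quadratic form per column
    const = (
        math.lgamma(0.5 * (nu + n)) - math.lgamma(0.5 * nu)
        - 0.5 * n * math.log((nu - 2.0) * math.pi)
        - 0.5 * log_det
    )
    log_lik = d * const - 0.5 * (nu + n) * np.sum(np.log1p(beta / (nu - 2.0)))
    return -log_lik

rng = np.random.default_rng(0)
Y = rng.standard_t(df=4, size=(20, 3))       # synthetic heavy-tailed observations

# Several seeds for the initial latent variables (Q = 1); keep the most promising one.
starts = [np.random.default_rng(seed).normal(size=(20, 1)) for seed in range(5)]
losses = [tplvm_neg_log_lik(X0, Y, nu=4.0) for X0 in starts]
best_start = starts[int(np.argmin(losses))]
```

In practice each start would be passed to a gradient-based optimizer and the best final value kept; the sketch only shows that different seeds already give different objective values.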

Variational inference
To overcome the shortcomings of the maximum-likelihood method, we utilize the method of variational inference [25]. Instead of optimizing the logarithmic likelihood in Eq. (15), we consider the posterior $p(X|Y) = p(Y|X)p(X)/p(Y)$ in the Bayesian sense. In solving the optimization problem with respect to the posterior, we approximate $p(X|Y)$ by $q(X)$. As a measure of the difference between two probability density functions, we introduce the Kullback-Leibler (KL) divergence as follows:

$$\mathrm{KL}\big[q(X)\,\|\,p(X|Y)\big] = \int q(X) \log \frac{q(X)}{p(X|Y)}\, dX.$$

With the use of Bayes' theorem, the KL divergence is alternatively represented as

$$\mathrm{KL}\big[q(X)\,\|\,p(X|Y)\big] = -\int q(X) \log \frac{p(Y|X)\,p(X)}{q(X)}\, dX + \log p(Y). \tag{17}$$

Since the second term on the right-hand side of Eq. (17) does not depend on $q(\cdot)$, minimizing the KL divergence only requires maximizing the integral $\int q(X) \log \big(p(Y|X)p(X)/q(X)\big)\, dX$ in the first term, which is known as the evidence lower bound (ELBO). The ELBO provides a lower bound of the log evidence $\log p(Y)$ because the KL divergence is non-negative. Therefore, this procedure fits the observed data at the same time. Indeed, maximizing the ELBO gives the best explanation of the observed data by the latent variable of reduced dimension $Q$.
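The relation between the evidence, the ELBO and the KL divergence can be verified on a toy problem. The sketch below replaces the continuous latent variable by a discrete one with three states so that all integrals become sums; the probabilities are made-up numbers used only for illustration.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])            # p(X) over three latent states
likelihood = np.array([0.10, 0.40, 0.70])    # p(Y | X) for one fixed observation Y
evidence = np.sum(likelihood * prior)        # p(Y)
posterior = likelihood * prior / evidence    # p(X | Y) by Bayes' theorem

def elbo(q):
    """Evidence lower bound: sum_x q(x) log( p(Y|x) p(x) / q(x) )."""
    return np.sum(q * (np.log(likelihood * prior) - np.log(q)))

def kl(q, p):
    """Kullback-Leibler divergence KL[q || p] for discrete distributions."""
    return np.sum(q * (np.log(q) - np.log(p)))

q = np.array([0.2, 0.3, 0.5])                # an arbitrary approximating distribution
gap = np.log(evidence) - elbo(q)             # equals KL[q || posterior]
```

The gap between the log evidence and the ELBO equals the KL divergence, and it vanishes when `q` coincides with the exact posterior.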

4 Problem formulation in finance

Factor model
Arbitrage pricing theory [26] assumes that the $D$-day expected return of an asset $r_n \in \mathbb{R}^D$ is explained by the factor model as

$$r_n = \alpha_n + F \beta_n + \epsilon, \tag{18}$$

where $\alpha_n \in \mathbb{R}^D$ is an excess return, $\beta_n \in \mathbb{R}^Q$ is a vector of weight coefficients, $F \in \mathbb{R}^{D \times Q}$ is a factor matrix, and $\epsilon \in \mathbb{R}^D$ is an error term with zero mean and a finite covariance. The factor model states that the return of the asset originates from the returns of $Q$ factors. In fact, without the excess return $\alpha_n$, the expected return of the factor model is derived as follows:

$$E[r_n] = E[F]\, \beta_n.$$

The special case of this formula with only one factor is known as the capital asset pricing model, which is a cornerstone of modern finance theory [27].
The weight coefficients $\beta_n$ in the factor model in Eq. (18) can be interpreted as latent variables which explain the return of the asset. Based on this idea, we introduce a nonlinear factor model as

$$r_n = f(\beta_n) + \epsilon. \tag{20}$$

This model is regarded as a latent variable counterpart of the nonlinear factor model [10]. Here, we employ the Student's t-process as the model of the nonlinear mapping $f: \mathbb{R}^Q \to \mathbb{R}^D$. In other words, the nonlinear factor model in Eq. (20) is given by the TPLVM. The nonlinear correlation of the latent variable factors depends on the specific form of the covariance function of the TPLVM, and the predicted return of the asset can be inferred from the predictive distribution. Furthermore, the nonlinear factor model can be interpreted as a dimension reduction model when $Q < D$. Hence, we can expect to obtain the essential lower-dimensional variable which explains the dynamics of the return of the asset.

Portfolio theory
Markowitz established modern portfolio theory on the mean-variance portfolio. In this theory, a portfolio consists of multiple asset classes such as stocks, bonds, currencies and commodities, with their optimal allocations based on both the individual and the correlated risks of the assets.
The mean-variance portfolio is formulated as a constrained quadratic programming problem with respect to the following objective function: Here, $w \in \mathbb{R}^D$ is the vector of weight coefficients of the portfolio, $K \in \mathbb{R}^{D \times D}$ is the covariance matrix of the returns, $\lambda$ is a Lagrange multiplier, $r$ is the return of the portfolio and $\mu$ is the expected return of the portfolio. In practice, the expected return of the portfolio is hard to estimate. Therefore, the constraint on the expected return is dropped, and the mean-variance portfolio is often replaced by the minimum-variance portfolio with an empirically estimated covariance matrix.
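A minimum-variance portfolio can be sketched in a few lines once a covariance matrix is available. The sketch assumes that the only constraint is the budget constraint (weights sum to one, short selling allowed), in which case the weights are proportional to $K^{-1}\mathbf{1}$; the function name `min_variance_weights` and the covariance values are illustrative, and the experiment in this paper may impose further constraints.

```python
import numpy as np

def min_variance_weights(K):
    """Minimize w^T K w subject to sum(w) = 1 (assumption: no other constraints)."""
    ones = np.ones(K.shape[0])
    x = np.linalg.solve(K, ones)     # K^{-1} 1
    return x / (ones @ x)            # normalize so that the weights sum to one

# Illustrative covariance matrix of three assets (made-up numbers).
K = np.array([[0.040, 0.006, 0.000],
              [0.006, 0.090, 0.010],
              [0.000, 0.010, 0.010]])
w = min_variance_weights(K)
equal = np.full(3, 1.0 / 3.0)

var_min = w @ K @ w                  # variance of the minimum-variance portfolio
var_equal = equal @ K @ equal        # variance of the equally weighted portfolio
w_diag = min_variance_weights(np.diag([0.04, 0.01]))
```

For uncorrelated assets the weights are inversely proportional to the variances, which is what `w_diag` shows for the two-asset diagonal case.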

Experiment
In this section, we test the performance of the minimum-variance portfolio based on the TPLVM by comparing it with that based on the GPLVM. Before proceeding, we explain the experimental dataset of our performance test.
With the use of the historical returns of the stock indices, we construct the minimum-variance portfolios based on the GPLVM (Port$_G$) and the TPLVM (Port$_t$). The covariance matrix of each portfolio is estimated by the covariance function with 120 past samples. As the kernel function, we utilize the exponential kernel defined as follows: Here, $\theta_l$ ($l = 1, 2$) are hyperparameters. For the sake of brevity, the dimension of the latent variables is fixed to $Q = 1$. Under these conditions, we compare the performance of Port$_G$ and Port$_t$ by the annualized return (Return), the annualized risk defined as the standard deviation of the return (Risk), and the return/risk ratio (R/R) defined as the return divided by the risk.
Here, $R^P_t$ indicates the GPLVM or TPLVM portfolio return at time $t$, and $\mu_P = (1/T) \sum_{t=1}^{T} R^P_t$ denotes the average return of the GPLVM or TPLVM portfolio.
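The three performance measures can be computed from a return series as sketched below. The annualization convention is an assumption: returns are taken to be monthly, the mean is multiplied by 12 and the sample standard deviation by $\sqrt{12}$; the function name `performance`, the parameter `periods_per_year` and the return values are hypothetical.

```python
import numpy as np

def performance(returns, periods_per_year=12):
    """Annualized return, annualized risk and their ratio (assumed conventions)."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()                                          # average periodic return
    ann_return = periods_per_year * mu
    ann_risk = np.sqrt(periods_per_year) * r.std(ddof=1)   # sample standard deviation
    return ann_return, ann_risk, ann_return / ann_risk

returns = [0.01, 0.03, -0.01, 0.01]    # made-up monthly portfolio returns
ann_return, ann_risk, ratio = performance(returns)
```

A higher ratio means more return per unit of risk, which is the quantity used to compare the two portfolios.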
Table 2 shows the performance of the portfolios in terms of the annualized return, the risk and the return/risk ratio. The sample period is separated into the anterior half period (Jun 2008 - Jun 2013) and the posterior half period (Jul 2013 - Jun 2019). Note that the anterior half period contains the global financial crisis of 2007-2008. As seen in this table, Port$_t$ outperforms Port$_G$ in both half periods. In particular, the difference in the annual return in the anterior half period is larger than that in the posterior half period. The market volatility fluctuated intensively during the global financial crisis, whereby the non-Gaussian nature of the global stock market clearly emerged. In such a situation, the TPLVM is an appropriate model to describe the intermittent volatility fluctuations. Thus, we can construct a robust portfolio by the TPLVM-based minimum-variance portfolio.

Conclusion
In the literature of Bayesian machine learning, the Gaussian process has been developed and applied to diverse areas including finance. It is, however, well known that historical financial data follow non-Gaussian distributions. The Student's t-process was proposed as a generalization of the Gaussian process to model observed data following non-Gaussian distributions with fat tails.
In this paper, we proposed the TPLVM by incorporating latent variables into the Student's t-process. The TPLVM can be used to reduce the number of explanatory variables for data following non-Gaussian distributions with fat tails. The nonlinear correlation of the TPLVM is modelled by prescribed kernel functions. The hyperparameters of the TPLVM can be determined by the method of maximum likelihood. As a robust parameter optimization, we presented the method of variational inference for the TPLVM, which utilizes the information of the prior distribution of the latent variables.
The problem of portfolio optimization has been studied in both academia and industry. We applied the TPLVM to portfolio optimization with the use of the minimum-variance portfolio. To test the performance of the proposed portfolio, we carried out an empirical analysis on global stock market data and compared Port$_G$ with Port$_t$. It was shown that Port$_t$ outperforms Port$_G$ over the whole test period because Port$_t$ can capture the non-Gaussian nature of the global stock market, especially in the period of the global financial crisis.
The TPLVM can be applied to other risk-based portfolios such as risk parity [28], maximum risk diversification [29], and complex valued risk diversification [30]. These applications are expected to perform better than conventional ones. In addition, the TPLVM can be modified into a latent variable dynamical model to capture the nature of historical volatility fluctuations. These directions of research are left for future work.

Table 2: Performance of Port$_G$ and Port$_t$