Multitask Learning Based on Least Squares Support Vector Regression for Stock Forecast

Various factors make stock market forecasting difficult and arduous. Single-task learning models fail to achieve good results because they ignore the correlation between multiple related tasks. Multitask learning methods can capture the cross-correlation among subtasks and achieve a satisfactory learning effect by training all tasks simultaneously. With this motivation, we assume that the related tasks are close enough to share a common model while each also retains its own independent model. Based on this hypothesis, we propose a multitask learning least squares support vector regression (MTL-LS-SVR) algorithm and an extension, EMTL-LS-SVR. Theoretical analysis shows that these models can be converted to linear systems. A Krylov-Cholesky algorithm is introduced to determine the optimal solutions of the models. We tested the proposed models by applying them to forecasts of the Chinese stock market index trend and the stock prices of five state-owned banks. The experimental results demonstrate their validity.


Introduction
The stock market is an indispensable part of the securities and financial industries. It reflects a country's economic situation and development. However, the behavior of a stock market is affected by many factors, such as financial and economic policy, business development, and investor psychology [1]. Due to complex internal factors and the changing economic environment, forecasting the stock market is a challenge for researchers of financial data mining [2]. Traditional stock market forecasting methods include securities investment analysis, nonlinear dynamic methods, time series mining, and statistical modeling [3].
Securities investment analysis requires close attention to international and domestic events along with keen market insight [4,5]. Considering the nonlinear character of the financial trading market, some nonlinear forecasting models have been established on the basis of nonlinear dynamical theory [6,7]. Chen et al. [8] studied a multifactor time series model for stock index forecasting, which achieved a low root mean square error. Some statistical methods require strong modeling and statistical capabilities, which general investors find difficult [9].
Many machine learning and deep learning methods have been used to forecast the stock market and analyze stock prices. Chalvatzis et al. [10] developed a high-performance stock trading architecture that integrates neural networks and trees to enhance investment profitability. Zhang et al. [11] presented a model based on support vector regression (SVR) and a modified firefly algorithm to forecast stock prices. A back propagation (BP) neural network was extended to predict the stock market [12,13], but it fell easily into local optima. Song et al. [14] developed a deep learning model with hundreds of input features to forecast stock price fluctuation. To explore the impact of historical events, a stacked long short-term memory network was adopted to predict stock market behavior [15]; however, this inevitably built many irregular hidden network layers. Li et al. [16] proposed MISA-K-ELM, which integrates mutual-information-based sentiment analysis with a kernel extreme learning machine to forecast stock prices. Dash et al. [17] proposed a self-evolving recurrent neuro-fuzzy inference system to predict irregular financial time series data. Text mining was applied to analyze financial articles and investor sentiment to predict daily stock market behavior [18,19]. Mohanty et al. [20] proposed a hybrid model combining an autoencoder (AE) and a kernel extreme learning machine (KELM); its prime advantage over the conventional SAE is robust prediction across different financial markets with reduced error. These methods performed well at forecasting stock market trends, but they ignored the essential relatedness among stock data.
As an important and ongoing issue in machine learning, multitask learning has attracted significant attention in many fields, such as supervised learning, semi-supervised learning, reinforcement learning, and multiview learning [21]. According to the concept of multitask learning, there is shared useful information among multiple related tasks, hence the learning effect of each task can be enhanced. When there are intrinsic relations among subtasks, the learning effect of all tasks can be greatly improved by learning them simultaneously. Gao [22] adopted clustered multitask support vector regression (MT-SVR) for age estimation from facial images, which required the solution of large-scale quadratic programming problems. Li et al. [23] proposed the multitask proximal support vector machine (MTPSVR), which incurs a lower computational cost than MT-SVR. Xu et al. [24] applied the multitask least squares support vector machine (MTLS-SVM) to analyze three components of broomcorn samples. Nevertheless, MTLS-SVM conflates the common bias with the bias in each subtask decision function, and it cannot flexibly select an appropriate kernel function for different information.
In this paper, we develop a multitask learning assumption under which each subtask model can be obtained by solving a common model and an independent model. We then propose multitask learning least squares support vector regression (MTL-LS-SVR), under the assumption that the same kernel function is employed in the common model and the independent models. In addition, we propose an extension of MTL-LS-SVR, EMTL-LS-SVR, which not only considers the internal cross-correlation among subtasks but can also select different kernel functions for the common model and the independent models. These features improve the prediction performance more efficiently than other training algorithms. Next, we present the Krylov-Cholesky algorithm to solve the proposed models, which greatly improves the training speed. Finally, the proposed models are applied to forecast the Chinese stock market index trend and the stock price movements of five state-owned banks.

Least Squares Support Vector Regression
SVR attempts to minimize the generalization error bound under the structural risk minimization principle.
It is based on a generalized linear regression function f(x) = ω^T ϕ(x) + b. The inequality constraints in SVR are transformed to equality constraints by least squares, which greatly improves the efficiency of training LS-SVR [26]. Given a set of input samples, LS-SVR trains the generalized linear regression function to complete the regression prediction by a nonlinear mapping. The decision function of LS-SVR can be obtained by solving the following optimization problem:

min_{ω,b,ξ} J(ω, ξ) = (1/2) ω^T ω + (C/2) ξ^T ξ
s.t. y_i = ω^T ϕ(x_i) + b + ξ_i, i = 1, 2, …, n,    (1)

where J(•) denotes the objective function, the symbol T represents the transpose of a matrix or vector, and ϕ(•) is a nonlinear mapping from the original input space to the feature space. C is a penalty coefficient, and ξ = [ξ_1, ξ_2, …, ξ_n]^T is a slack vector reflecting whether the samples can be located in the ε-tube. (ω, b) is the generalized weight vector to be solved, and x and y are the sample attributes and tags, respectively. The Lagrange function method transforms the quadratic programming problem (1) into the linear system

[0, 1_n^T; 1_n, Q] [b; α] = [0; y],    (2)

where 1_n is an n × 1 vector of ones, and α and b are the Lagrange multiplier vector and threshold. Q = Ω + E_n/C ∈ R^{n×n} is a positive definite matrix, and E_n is an n-dimensional identity matrix. Ω is an n × n matrix with elements Ω_{i,j} = ϕ(x_i)^T ϕ(x_j) = K(x_i, x_j), where K(•, •) is the kernel function corresponding to ϕ(•). Solving the linear system (2) gives us the regression function

f(x) = Σ_{i=1}^n α_i K(x_i, x) + b.    (3)

Extension of Multitask Learning Least Squares Support Vector Regression
Suppose we have T (T > 1) learning tasks that are distinct but have good internal cross-correlation. For every task t, there are m_t training data {(x_t^i, y_t^i)}_{i=1}^{m_t}, where x_t^i ∈ R^d and y_t^i ∈ R. Hence, we have m = ∑_{t=1}^T m_t training data in total. Multitask learning aims to train the subtasks at the same time and to use the effective information among related tasks to improve the generalization ability of the regression model. Since the multiple tasks are related yet different, there is shared information among all tasks as well as private information belonging to each subtask itself. We assume that all tasks share a common model ρ_0 and that each subtask has an independent submodel η_t, t = 1, 2, …, T. The regression function corresponding to the t-th subtask can then be expressed as ρ_t = ρ_0 + η_t. To clearly illustrate multitask learning, Figure 1 shows a block diagram of our proposed models. We first establish MTL-LS-SVR and its extension, EMTL-LS-SVR. Next, we present a Krylov-Cholesky algorithm to solve large-scale multitask learning problems.

MTL-LS-SVR
In the multitask learning model MTL-LS-SVR, the subtask model is represented as

f_t(x) = ω_0^T ϕ(x) + υ_t^T ϕ(x) + b_0 + b_t, t = 1, 2, …, T,    (4)

where ω_0 and ϕ(•) are, respectively, the normal vector and nonlinear mapping function of the common model ρ_0, and υ_t is the normal vector of the independent model η_t. b_0 is the bias of the common hyperplane, and b_t is the threshold difference between the hyperplane corresponding to the t-th subtask and the common hyperplane. υ_t tends to zero if the subtasks are closely related, and ω_0 tends to zero otherwise.
The MTL-LS-SVR model can be determined by solving the following optimization problem:

min J = (1/2) ω_0^T ω_0 + (λ/2T) ∑_{t=1}^T υ_t^T υ_t + (C/2) ∑_{t=1}^T ξ_t^T ξ_t
s.t. y_t = A_t(ω_0 + υ_t) + (b_0 + b_t) I_{m_t} + ξ_t, t = 1, 2, …, T,    (5)

where ξ_t represents the slack variable vector for the t-th subtask and C is a positive regularization coefficient. The dataset of the t-th subtask is mapped to the feature space by the nonlinear mapping ϕ(•), which is denoted by A_t = [ϕ(x_t^1), ϕ(x_t^2), …, ϕ(x_t^{m_t})]^T, and I_{m_t} is a column vector of ones. For all the learning tasks, the task-coupling parameter λ balances the trade-off between the shared information and the private information among all tasks. In particular, the greater the value of λ, the stronger the degree of association among subtasks; otherwise, the degree of association is weaker. The subtasks are trained at the same time because they share some internal information.
The Lagrange function of the quadratic programming problem (5) is constructed with a nonnegative Lagrange multiplier vector α = [α_1^T, α_2^T, …, α_T^T]^T ∈ R^m. According to the Karush-Kuhn-Tucker (KKT) conditions, we derive the linear system

[0, H^T; H, Q^(1)] [b_0; α] = [0; y],    (7)

where Q^(1) = AA^T + (T/λ) Ω^(1) + E_m/C is an m × m positive definite matrix, A = [A_1; A_2; …; A_T] is the row-wise stack of the mapped data of all tasks, Ω^(1) = blkdiag(Ω_1, Ω_2, …, Ω_T) is a block-wise diagonal matrix whose t-th block has entries K(x_t^i, x_t^j), and E_m is an m-dimensional identity matrix. H is a column vector of ones, and b_0 is the threshold of the common hyperplane. Solving the linear system (7) gives us the Lagrange multiplier vector α and bias term b_0, together with the regression parameters corresponding to the common hyperplane and the private information. Since the KKT conditions yield ω_0 = A^T α and υ_t = (T/λ) A_t^T α_t, the decision function of MTL-LS-SVR can be obtained as

f_t(x) = ∑_{j=1}^m α_j K(x_j, x) + (T/λ) ∑_{i=1}^{m_t} α_{t,i} K(x_t^i, x) + b_0 + b_t.    (8)
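The structure of Q^(1) can be sketched numerically: the term AA^T is the kernel matrix over all stacked samples, while (T/λ)Ω^(1) keeps only the within-task diagonal blocks. The following is a minimal sketch with hypothetical function names, assuming a precomputed kernel matrix and samples stacked in task order.

```python
import numpy as np

def mtl_lssvr_solve(K_full, task_sizes, y, C=10.0, lam=1.0):
    """Solve the bordered system of Eq. (7):
       [0, H^T; H, Q1] [b0; alpha] = [0; y],
    where Q1 = K_full + (T/lam) * blkdiag(within-task blocks) + E_m/C
    and K_full is the m x m kernel matrix over all tasks' samples."""
    m = K_full.shape[0]
    T = len(task_sizes)
    # Block-diagonal part: each task keeps only its own kernel block
    Omega = np.zeros((m, m), dtype=float)
    start = 0
    for mt in task_sizes:
        Omega[start:start + mt, start:start + mt] = K_full[start:start + mt,
                                                           start:start + mt]
        start += mt
    Q1 = K_full + (T / lam) * Omega + np.eye(m) / C
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Q1
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]   # b0, alpha
```

The returned pair satisfies H^T α = 0 and Q^(1) α + b_0 H = y, i.e. exactly the two block rows of system (7).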

EMTL-LS-SVR
In the MTL-LS-SVR model, selecting the same kernel function for the common model and the independent models cannot effectively distinguish their essential differences. Therefore, we propose an extension of MTL-LS-SVR, EMTL-LS-SVR. The regression function corresponding to the t-th subtask can be represented as

f_t(x) = ω_0^T ϕ(x) + υ_t^T φ(x) + b_0 + b_t,    (9)

where ρ_t represents the joint model of the t-th subtask, and ρ_0 and η_t respectively denote the common model and the private model. ω_0 and ϕ(•) are, respectively, the normal vector and nonlinear mapping function of the common model, and υ_t and φ(•) are those of the independent model η_t. Since ϕ(•) and φ(•) are different nonlinear mappings, different kernel functions are applied to process the shared information and the private information.
According to the above analysis, the optimization problem of EMTL-LS-SVR can be obtained as

min J = (1/2) ω_0^T ω_0 + (λ/2T) ∑_{t=1}^T υ_t^T υ_t + (C/2) ∑_{t=1}^T ξ_t^T ξ_t
s.t. y_t = A_t ω_0 + B_t υ_t + (b_0 + b_t) I_{m_t} + ξ_t, t = 1, 2, …, T,    (10)

where the dataset of the t-th subtask is additionally mapped to a second feature space by the nonlinear mapping φ(•), which we denote by B_t = [φ(x_t^1), φ(x_t^2), …, φ(x_t^{m_t})]^T. λ, T, ξ_t, A_t, ϕ(•), I_{m_t}, b_0, and b_t have the same meanings as in the quadratic programming problem (5).
The corresponding Lagrange function of the quadratic programming problem (10) is constructed with the nonnegative Lagrange multiplier vector α_t for each subtask. Setting the gradient of the Lagrange function with respect to ω_0, υ_t, b_0, b_t, ξ_t, and α_t to zero yields the KKT conditions, from which we obtain the linear system

[0, H^T; H, Q^(2)] [b_0; α] = [0; y],    (13)

where Q^(2) = AA^T + (T/λ) Ω^(2) + E_m/C is a positive definite matrix, and Ω^(2) is a block-wise diagonal matrix whose t-th diagonal block has entries K_t(x_t^i, x_t^j). H, b_0, and α_{m×1} have the same meanings as in the linear system (7). By solving the linear system (13), we obtain the Lagrange multiplier vector α_{m×1} and threshold b_0, and the regression parameters corresponding to the common model and the independent submodels can also be determined. Therefore, the decision function of EMTL-LS-SVR can be obtained as

f_t(x) = ∑_{j=1}^m α_j K_0(x_j, x) + (T/λ) ∑_{i=1}^{m_t} α_{t,i} K_t(x_t^i, x) + b_0 + b_t,    (14)

where K_0(•, •) and K_t(•, •) are the respective kernel functions in the common model and the independent models. It is obvious that EMTL-LS-SVR reduces to MTL-LS-SVR if and only if ϕ(•) is equivalent to φ(•).

Krylov-Cholesky Algorithm
The linear systems (7) and (13) each contain m + 1 equations and are difficult to solve directly because their coefficient matrices are not positive definite. By using Krylov methods [27], we can eliminate b_0 and convert these systems to a form involving only the positive definite matrices Q^(1) and Q^(2) (for MTL-LS-SVR and EMTL-LS-SVR, respectively), in which s = H^T (Q^(i))^{-1} H is a positive number. We will need to calculate the inverses of the large matrices Q^(i), which can be time-consuming. In fact, Q^(i) is positive definite and symmetric, hence its inversion can be simplified by the Cholesky factorization method [28]. We therefore develop a Krylov-Cholesky algorithm to solve the proposed models. To describe the model establishment process, Figure 2 shows the flowchart of the proposed multitask learning models. The Krylov-Cholesky algorithm steps are listed as follows:

(1) Convert the linear system (7) or (13) to the following form using Krylov methods:

b_0 = (H^T Q^{-1} y) / s,  α = Q^{-1} (y − b_0 H),    (16)

where s = H^T Q^{-1} H is a positive number and Q ∈ R^{m×m} stands for the positive definite and symmetric matrix Q^(1) or Q^(2);
(2) Apply the Cholesky factorization method to decompose Q into Q = LL^T, where the elements l_ij of the lower-triangular matrix L can be determined from Q;
(3) Calculate L^{-1}, and thus Q^{-1} = (L^{-1})^T L^{-1}.

According to the Krylov-Cholesky algorithm, we can obtain the optimal solutions of linear systems (7) and (13) by solving system (16). The Krylov methods convert the original linear system to a new one with a sparser coefficient matrix. The Cholesky factorization method decomposes Q into the product of a lower-triangular matrix L and its upper-triangular conjugate transpose L^T, so that computing Q^{-1} reduces to inverting L. The optimal solutions of linear systems (7) and (13) are then determined by solving for b_0 and α, respectively.
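Steps (1)-(3) above can be sketched in a few lines of numpy. This is a minimal sketch with a hypothetical function name; Q stands for Q^(1) or Q^(2), H is the all-ones vector, and the two triangular solves replace an explicit inversion of Q.

```python
import numpy as np

def krylov_cholesky_solve(Q, H, y):
    """Solve [0, H^T; H, Q][b0; alpha] = [0; y] for positive definite Q
    via b0 = H^T Q^{-1} y / s, alpha = Q^{-1}(y - b0 H), s = H^T Q^{-1} H."""
    L = np.linalg.cholesky(Q)                 # step (2): Q = L L^T

    def q_solve(b):
        # step (3): apply Q^{-1} through two triangular solves
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    q_y = q_solve(y)                          # Q^{-1} y
    q_H = q_solve(H)                          # Q^{-1} H
    s = H @ q_H                               # s = H^T Q^{-1} H > 0
    b0 = (H @ q_y) / s                        # step (1), Eq. (16)
    alpha = q_y - b0 * q_H                    # alpha = Q^{-1}(y - b0 H)
    return b0, alpha
```

Substituting back, the solution satisfies both block rows of the bordered system: H^T α = 0 and Q α + b_0 H = y.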

Experiments
To verify the effectiveness of the proposed multitask learning models, we compared them to SVR, LS-SVR, MTPSVR [23], and MTLS-SVR [24]. Experiments were performed in MATLAB R2016a on a PC with an Intel Core i5-2500 CPU (3.30 GHz) and 8 GB of RAM. In our experiments, the radial basis function kernel is employed in MTL-LS-SVR and the four comparative models. For EMTL-LS-SVR, we used kernel functions K_0 and K_t in the common model and independent models, respectively, as in Equation (14). Three combinations were used:

(1) K_0 is a linear kernel and K_t is a polynomial kernel;
(2) K_0 is a linear kernel and K_t is a radial basis function kernel;
(3) K_0 is a polynomial kernel and K_t is a radial basis function kernel.
For convenience, "L", "P", and "R", respectively, represent the linear kernel K(x_i, x_j) = ⟨x_i, x_j⟩, the polynomial kernel K(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d, and the radial basis function kernel K(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ²)). In this paper, "L + P", "L + R", and "P + R" denote the three kernel function combinations, hence the last three multitask learning models are denoted as EMTL-LS-SVR(L + P), EMTL-LS-SVR(L + R), and EMTL-LS-SVR(P + R), respectively.
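To make the kernel combinations concrete, the sketch below (hypothetical function names, our own construction) assembles the EMTL-LS-SVR coefficient matrix for the "L + R" combination: the shared kernel K_0 acts on all stacked samples, while the task kernel K_t fills only the within-task diagonal blocks.

```python
import numpy as np

def linear_kernel(X1, X2):
    # "L": K(x_i, x_j) = <x_i, x_j>
    return X1 @ X2.T

def rbf_kernel(X1, X2, sigma=1.0):
    # "R": K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2*sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def emtl_coefficient_matrix(X, task_sizes, C=10.0, lam=1.0,
                            k0=linear_kernel, kt=rbf_kernel):
    """Sketch of Q2 for EMTL-LS-SVR(L + R): common kernel K0 over all
    stacked samples plus (T/lam) times the block-diagonal task kernel Kt."""
    m = X.shape[0]
    T = len(task_sizes)
    Q2 = k0(X, X).astype(float)
    start = 0
    for mt in task_sizes:
        Xt = X[start:start + mt]
        Q2[start:start + mt, start:start + mt] += (T / lam) * kt(Xt, Xt)
        start += mt
    return Q2 + np.eye(m) / C
```

Swapping `k0` and `kt` for other kernels yields the "L + P" and "P + R" variants; the resulting matrix stays symmetric positive definite thanks to the E_m/C term.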

Parameter Selection
In general, parameters are crucial to the performance of the model. There are two kernel parameters (σ, d) and the regularization coefficient C in the compared algorithms, and additionally a task-coupling parameter λ in the multitask learning models. To train the models with appropriate parameters, we set the search ranges of σ, d, C, and λ in advance. The grid search method was applied to find the best parameters and avoid overfitting or underfitting [29]. The datasets were normalized. About 80% of the instances were randomly chosen from the entire dataset to train the model, and the remaining 20% formed the test set. Ten-fold cross-validation was used on the training set to search for the optimal parameters, and the regression accuracy was the average value obtained from 20 independent experiments.
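The grid search with ten-fold cross-validation can be sketched as follows. This is an illustrative sketch only: the grids are hypothetical, and a simple kernel ridge model stands in for the LS-SVR solver so the example stays self-contained.

```python
import numpy as np
from itertools import product

def kfold_indices(n, k=10, seed=0):
    # Shuffle indices once, then split into k nearly equal folds
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_rmse(X, y, C, sigma, k=10):
    # Ten-fold cross-validation RMSE of an RBF kernel ridge stand-in
    folds = kfold_indices(len(y), k)
    errs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        K = np.exp(-((X[tr][:, None] - X[tr][None, :]) ** 2).sum(-1)
                   / (2 * sigma ** 2))
        alpha = np.linalg.solve(K + np.eye(len(tr)) / C, y[tr])
        Kte = np.exp(-((X[te][:, None] - X[tr][None, :]) ** 2).sum(-1)
                     / (2 * sigma ** 2))
        errs.append(np.sqrt(np.mean((Kte @ alpha - y[te]) ** 2)))
    return np.mean(errs)

def grid_search(X, y, C_grid, sigma_grid):
    # Pick the (C, sigma) pair minimizing the cross-validated RMSE
    return min(product(C_grid, sigma_grid),
               key=lambda p: cv_rmse(X, y, *p))
```

In the paper's setting, λ and the polynomial degree d would simply be extra axes of the same `product` grid.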

Evaluation Criteria
Several evaluation indicators were chosen to assess the experimental results and evaluate our models. Define l and k as the numbers of training and testing samples, respectively. Let y_i and ŷ_i be the true and predicted values of sample x_i, and let ȳ = (1/k) ∑_{i=1}^k y_i. We used the following indicators to evaluate the algorithms:

MAE = (1/k) ∑_{i=1}^k |y_i − ŷ_i|,
RMSE = sqrt((1/k) ∑_{i=1}^k (y_i − ŷ_i)²),
SSE/SST = ∑_{i=1}^k (y_i − ŷ_i)² / ∑_{i=1}^k (y_i − ȳ)²,
SSR/SST = ∑_{i=1}^k (ŷ_i − ȳ)² / ∑_{i=1}^k (y_i − ȳ)².

Generally, the smaller the values of MAE, RMSE, and SSE/SST, the better the algorithm performance. SSR/SST increases as SSE/SST decreases [30]. "Accuracy ± S" denotes the average regression accuracy of 20 experiments plus or minus the standard deviation.
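The four indicators are straightforward to compute; a minimal numpy sketch (function names are ours):

```python
import numpy as np

def mae(y, yhat):
    # Mean absolute error over the k test samples
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    # Root mean squared error
    return np.sqrt(np.mean((y - yhat) ** 2))

def sse_sst(y, yhat):
    # Residual sum of squares over total sum of squares (smaller is better)
    return np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def ssr_sst(y, yhat):
    # Explained sum of squares over total sum of squares (larger is better)
    return np.sum((yhat - np.mean(y)) ** 2) / np.sum((y - np.mean(y)) ** 2)
```

For a perfect prediction, SSE/SST is 0 and SSR/SST is 1, consistent with the criteria described above.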

Forecast of Security of Stock Market Investment Environment
The running trend of the stock market index directly determines the security of the investment environment. Whether to prevent systemic structural risks or to harvest dividends from stock market investments, accurate forecasts of the stock market index provide much meaningful information. In this experiment, we verified the rationality of our models on stock index datasets by applying them to forecast the opening values of the stock market indices. The stock index datasets comprised historical data of the Shanghai Securities Composite Index (SSEC), SZSE Composite Index (SZI), Growth Enterprise Index (CNT), and SSE SME Composite Index (SZSMEPI). In the development history of the Chinese stock market, the crash effect of a rapid change from a bull market to a bear market was worse than that of some international events, such as the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) in 2012 and the 2019-nCoV outbreak in early 2020. Therefore, we selected an entire evolutionary period from a bull market to a bear market in the Chinese stock market, with historical data covering 1352 trading days from 25 June 2013 to 4 January 2019. The data from each trading day were used as a sample point, with nine indicators: opening index, highest index value, lowest index value, closing index, index changing margin, index changing ratio, trading volume, trading amount, and previous day's closing price. The four major stock market indices together compose the stock market of China, and they are affected by factors such as national policies, trade, and the international situation. We regard the above four stock indices as four subtasks, which are distinctive but interrelated, conforming to the setting of the multitask learning method.
Using the multitask learning method to analyze the opening stock market index makes full use of the cross-correlation among different stock indices to obtain more accurate predictions. Figure 3 shows the intraday time series of the four indices on 1 June 2015. The horizontal axis represents the trading time (240 minutes), and the vertical axis represents the change rate of each index on that day (taking the previous day's closing index as the baseline). Figure 3 shows that the movements of the four stock market indices are roughly the same, and they reach their highest and lowest points of the day at approximately the same times. These facts reflect their internal relationships.

To verify the performance of the proposed regression models, the eight algorithms were used to perform 20 independent runs on the stock index datasets. The experimental results of the eight algorithms in predicting the stock indices are summarized in Table 1. For the MAE criterion, SVR only has good results on the CNT dataset. The proposed multitask learning methods always produce the smallest SSE/SST and the largest SSR/SST, and EMTL-LS-SVR in particular achieves significantly better performance. For the RMSE criterion, our MTL-LS-SVR models also achieve the best prediction results. In summary, SVR and MTLS-SVR have comparable learning effects, while the learning results of LS-SVR and MTPSVR are unsatisfactory compared to the other regression methods. The results in Table 1 demonstrate that the learning effect of multitask learning models on different datasets can be further improved by selecting appropriate kernel functions and adjusting the relevant hyperparameters. Through experimental comparison, it is easy to find that when there is internal correlation among the learning tasks, multitask learning models obtain much better prediction results than single-task learning methods. Selecting appropriate kernel functions for different information when establishing the regression models also greatly improves the learning effect. Figures 4-7 present the prediction results of the comparative algorithms on the four stock index datasets. In Figures 4a-7a, it can be seen that the eight methods produced apparent differences in the prediction results near the 480th day, which should be due to the mutual influences among stock indices. For clarity, the forecasting details of the stock indices for 20 continuous trading days around the 480th day are shown in Figures 4b-7b. It can be observed from Figures 4b-7b that MTPSVR and LS-SVR have larger deviations than MTL-LS-SVR. In addition, SVR and MTLS-SVR are comparable in learning and superior to MTPSVR, but they are still inferior to the proposed models. In Figures 4b-6b, the four comparative regression models show obvious prediction deviations; in particular, MTPSVR produces relatively large prediction errors on many trading days.
The prediction results for stock indices shown in Figures 4-7 and Table 1 further confirm the superior regression capability and robust performance of MTL-LS-SVR and EMTL-LS-SVR.

Forecasting Opening Prices of Five Major Banks
Accurate forecasts of the stock market index help us to analyze future changes in the investment environment, and using the stock market index to guide real trading is a most critical issue for many investors. The five state-owned banks are important pillars of the Chinese banking industry, and their development is influenced by the country's macroeconomic policies and the development of state-owned enterprises. Therefore, multitask learning models can be used to predict the banks' stock price trends. In this experiment, we applied the proposed models to predict the stock price trends of the five major state-owned banks. The bank datasets included 1346 days of trading data of the Industrial and Commercial Bank of China (ICBC), Agricultural Bank of China (ABC), Bank of China (BOC), China Construction Bank (CCB), and Bank of Communications (BCM) from 1 January 2014 to 10 July 2019. The data included eleven attribute indicators: opening price, highest price, lowest price, closing price, price changing margin, price changing ratio, trading volume, trading amount, trading amplitude, trading turnover rate, and previous day's closing price. Five interrelated but different learning tasks were thus trained simultaneously and used to confirm the accuracy of our proposed models. In the experiment, the opening price of the day is the dependent variable, and the remaining ten indicators of the previous day are the independent variables. The stock opening prices on the 1346 trading days are shown in Figure 8. It can be seen that the opening prices of the five major banks almost always fluctuated in the same direction, confirming a strong internal correlation among them. To evaluate the forecast results of the proposed models, the eight algorithms were used to perform 20 independent runs on the bank datasets. Table 2 shows the average prediction results of the 20 independent runs. Figures 9-13 show the predictive effects of the different regression methods for ICBC, ABC, BOC, CCB, and BCM, respectively. To more clearly distinguish the forecast effects of the different models on the bank datasets, only the prediction results of the 300 continuous trading days from 28 June 2017 to 13 September 2018 are shown in Figures 9a-13a. To further estimate the performance of the models, we selected some distinct forecast areas covering 5% of the continuous trading days, which are marked by the red dash-dot lines in Figures 9a-13a. The comparison is shown in Figures 9b-13b. In these figures, the red dashed lines with four different hollow symbols show the prediction results of the four comparative methods, and the blue solid lines with four different solid markers show those of our proposed multitask learning models.
The experimental results of the eight algorithms on the bank datasets are summarized in Table 2. For the MAE criterion, SVR has good results on the ABC and CCB datasets. Overall, our EMTL-LS-SVR models obtain the best results among the different regression models on the remaining criteria. It can also be seen from Table 2 that EMTL-LS-SVR(L + P) achieves a better learning effect on the ABC and BOC datasets, whereas EMTL-LS-SVR(P + R) performs better on the other three banks' data.
Figures 9-13 show the forecasting results of the regression algorithms on the opening prices of the five major state-owned banks. To better distinguish the predictive effects of the different models on the bank datasets, Figures 9b-13b present some significantly different areas, which are drawn based on the natural exponential function values corresponding to the source data. From Figures 9b-11b, we can see that the fitting degrees of MTL-LS-SVR and EMTL-LS-SVR are preferable on the ICBC, ABC, and BOC stock datasets, whereas the prediction results obtained by LS-SVR, MTLS-SVR, and MTPSVR apparently differ from the real data. In Figures 12b and 13b, the predictions of MTL-LS-SVR and EMTL-LS-SVR show a slight deviation, perhaps because there is only a weak internal cross-correlation between CCB and BCM and the other state-owned banks. The fitting degree of SVR on CCB is better than that on the other state-owned banks, but it produces a high error on the 997th day in Figure 11b. MTPSVR also produces larger prediction deviations on the 998th day, as shown in Figure 10b. As shown in Figures 9b and 12b, both LS-SVR and MTLS-SVR produce apparent prediction deviations over many trading days. In Figure 13b, MTPSVR produces relatively large prediction deviations over many trading days, and MTLS-SVR also produces an unsatisfactory prediction on the 997th day.

The experimental results in Table 2 and Figures 9-13 generally show that EMTL-LS-SVR and MTL-LS-SVR have better learning ability and more robust performance than the other models. The prediction results of the different regression models for stock market indices and bank stock prices further verify their advantages. In summary, our proposed multitask learning models can not only infer stock market crash signals but also accurately forecast stock price fluctuations. These results indicate that multitask learning models can capture the internal relationships among subtasks and have more robust performance than single-task regression algorithms. In other words, multitask learning methods can use more information than single-task learning methods. Therefore, MTL-LS-SVR and EMTL-LS-SVR achieve better learning effects.

Discussion
In the experiments section, the proposed multitask learning models were applied to forecast the Chinese stock market index trend and the stock prices of five state-owned banks. To further discuss the performance differences between the algorithms, the Friedman test and its corresponding Bonferroni-Dunn test [31] are employed. For simplicity, we only analyze the prediction results of the different algorithms on the two experimental datasets based on MAE and RMSE. Table 3 lists the average ranks of all algorithms on the two experimental datasets. The Friedman test results are computed based on the two statistics χ²_F and F_F. Under the null hypothesis that all the algorithms are equivalent, the Friedman statistics can be computed by the following equations:

χ²_F = (12N / (K(K + 1))) [ ∑_{i=1}^K R_i² − K(K + 1)²/4 ],    (21)
F_F = ((N − 1) χ²_F) / (N(K − 1) − χ²_F),    (22)

where N denotes the number of experimental datasets and K denotes the number of comparative algorithms. R_i = (1/N) ∑_{j=1}^N r_i^j is the average rank of the i-th algorithm on the N experimental datasets, and r_i^j represents the rank of the prediction result of the i-th algorithm on the j-th experimental dataset among the K algorithms. F_F is distributed according to the F-distribution with (K − 1) and (K − 1)(N − 1) degrees of freedom.
For this experiment, K = 8 and N = 9. Based on Equations (21) and (22), we obtain χ²_F ≈ 47.937 and F_F ≈ 25.459 for the MAE criterion, and χ²_F ≈ 47.937 and F_F ≈ 25.459 for the RMSE criterion, where F_F is distributed according to the F-distribution with (K − 1) = 7 and (K − 1)(N − 1) = 56 degrees of freedom. The critical value of F(7, 56) at significance level α = 0.05 is 2.178. For both the MAE and RMSE criteria, the values of F_F are much larger than the critical value; thus the null hypothesis is rejected and the eight algorithms have significant differences. Further, it can be seen from Table 3 that the proposed multitask learning models rank lower than the other comparative algorithms, and EMTL-LS-SVR(L + R) obtains the smallest average rank for both the MAE and RMSE criteria.
For further pairwise comparison, the Bonferroni-Dunn test is used [31]. The performance of two models is significantly different if their average ranks differ by more than the critical difference $CD = q_\alpha \sqrt{K(K+1)/(6N)}$. For this experiment, $q_\alpha = 2.829$ for $\alpha = 0.1$ and $K = 8$, which gives $CD = 2.829 \times \sqrt{8 \times 9/(6 \times 9)} \approx 3.267$; that is, at a 90% confidence level two models differ significantly when their rank difference exceeds CD. Table 4 reports the average rank deviations between EMTL-LS-SVR(L + R) and each of the other comparative algorithms. The "Tag" column records the relation between d(a − b) and the CD value: "Tag" is 1 when d(a − b) is larger than CD; otherwise, "Tag" is 0. From Table 4 we can see that, in terms of the MAE criterion, the average rank differences between LS-SVR, MTPSVR, MTLS-SVR, and EMTL-LS-SVR(L + R) are larger than the critical value, which shows that the performance of EMTL-LS-SVR(L + R) is significantly better than that of LS-SVR, MTPSVR, and MTLS-SVR. However, there is only a slight deviation between EMTL-LS-SVR(L + R) and SVR. For the RMSE criterion, the performance of EMTL-LS-SVR(L + R) is superior to that of the two single-task learning methods, MTPSVR, and MTLS-SVR. Additionally, for both MAE and RMSE, there are only slight deviations between EMTL-LS-SVR(L + R) and the other three variants, MTL-LS-SVR, EMTL-LS-SVR(L + P), and EMTL-LS-SVR(P + R). In summary, the advantages of MTL-LS-SVR and its extension EMTL-LS-SVR are confirmed both by the experimental analysis and from a statistical testing perspective. The superiority of the MTL-LS-SVR and EMTL-LS-SVR models stems from their ability to effectively capture the correlation among multiple learning tasks, which improves the predictive performance of the model. Meanwhile, selecting appropriate kernel functions for the shared information and the private information handles the different kinds of information more effectively, which gives our proposed models strong robustness. To be fair, traditional algorithms can achieve a better learning effect on small-scale problems, while deep learning models have an advantage in dealing with large-scale data mining problems [32]. Therefore, how to effectively integrate multitask learning and deep learning to solve real-world scenarios is also an attractive issue.
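The critical-difference threshold of the Bonferroni-Dunn test is a one-line formula. The sketch below uses the standard form $CD = q_\alpha \sqrt{K(K+1)/(6N)}$ with the values quoted in the text ($q_\alpha = 2.829$, $K = 8$, $N = 9$); the function name is ours.

```python
import math

def critical_difference(q_alpha: float, K: int, N: int) -> float:
    """Bonferroni-Dunn critical difference: CD = q_alpha * sqrt(K(K+1)/(6N))."""
    return q_alpha * math.sqrt(K * (K + 1) / (6.0 * N))

# q_alpha is read from a table of critical values for the chosen alpha and K.
cd = critical_difference(q_alpha=2.829, K=8, N=9)
# Two algorithms are judged significantly different when their average
# ranks (e.g., from Table 3) differ by more than cd.
```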

Conclusions
In this paper, we proposed the assumption that multiple related tasks share a common model while each retains its own independent model. Based on this assumption, we developed the MTL-LS-SVR model and an extension, EMTL-LS-SVR. MTL-LS-SVR makes good use of the advantages of least squares support vector regression and multitask learning. Meanwhile, the regularization parameter λ is introduced in MTL-LS-SVR and EMTL-LS-SVR to balance the shared information and private information among learning tasks. When learning tasks are related, superior performance can be achieved by adjusting λ and selecting appropriate kernel functions. Additionally, a Krylov-Cholesky algorithm is presented to optimize the solution procedure of the proposed models, which reduces the time needed to solve large-scale multitask learning problems. We tested the proposed models on the two stock datasets and compared the experimental results of the different algorithms; the results show that the EMTL-LS-SVR model achieves a superior prediction effect and more robust performance than the single-task learning methods.
A limitation of the MTL-LS-SVR algorithm is that the correlations among learning tasks must be evaluated in advance when using it for prediction and analysis in relevant real-world scenarios; otherwise, the learning effect may be weakened by a potential negative-transfer effect. Considering the advantages of neural networks, determining how to effectively apply deep learning techniques to multitask learning problems will be important future work for us.
Algorithm excerpt (Krylov-Cholesky): Solve R and τ from QR = H and Qτ = Y, respectively, and record the corresponding solutions R* and τ*; (5) calculate s = H^T R*; (6) obtain the optimal solution b.
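The surviving closing steps of the algorithm can be sketched as follows. This is a minimal reconstruction under the assumption that Q is the symmetric positive-definite system matrix of the linear system, with H and Y the right-hand-side blocks; the paper's final expression for b in terms of τ* and s is not reproduced in this excerpt, so the sketch stops at s.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def final_steps(Q, H, Y):
    """Closing steps of the solver: factor Q once via Cholesky, reuse the
    factor for both right-hand sides, then form s = H^T R*."""
    factor = cho_factor(Q)            # Q = L L^T; Q assumed symmetric positive definite
    R_star = cho_solve(factor, H)     # R* solves Q R = H
    tau_star = cho_solve(factor, Y)   # tau* solves Q tau = Y
    s = H.T @ R_star                  # s = H^T R*
    return R_star, tau_star, s
```

Factoring Q once and reusing the factor for both solves is what saves time over solving the two systems independently.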

Figure 3. Movement of stock market indices on 1 June 2015.

Figures 4-7 show the predicted results for SSEC, SZI, CNT, and SZSMEPI, respectively. Considering the large number of training samples, only the forecast results of 400 continuous trading days during the peak period (11 September 2014 to 5 May 2016) are shown in Figures 4a, 5a, 6a and 7a. To further compare the models, we enlarged the distinct parts labeled by the red dash-dot lines, covering 5% of the continuous trading days, and present them in Figures 4b, 5b, 6b and 7b. The red dashed lines with four different hollow symbols denote the prediction results of the compared algorithms, and the blue solid lines with four different solid symbols represent the results of our proposed multitask learning models.

Figure 4. Predictions of different regression models on opening index for SSEC (a) Original figure, (b) Enlarged figure.

Figure 5. Predictions of different regression models on opening index for SZI (a) Original figure, (b) Enlarged figure.

Figure 6. Predictions of different regression models on opening index for CNT (a) Original figure, (b) Enlarged figure.

Figure 7. Predictions of different regression models on opening index for SZSMEPI (a) Original figure, (b) Enlarged figure.

Figures 4-7 present the prediction results of the comparative algorithms on the four stock index datasets. In Figures 4a, 5a, 6a and 7a, it can be seen that the eight methods produce apparent differences in their prediction results near the 480th day, which should be due to the mutual influences among stock indices. For clarity, the forecasting details of the stock indices for 20 continuous trading days around the 480th day are shown in Figures 4b, 5b, 6b and 7b. It can be observed from Figures 4b, 5b, 6b and 7b that MTPSVR and LS-SVR have larger deviations than MTL-LS-SVR. In addition, SVR and MTLS-SVR are comparable in learning capability and superior to MTPSVR, but they are still inferior to the proposed models. In Figures 4b, 5b and 6b, the four comparative regression models show obvious prediction deviations. In particular, MTPSVR produces relatively large prediction errors on many trading days. The prediction results for stock indices shown in Figures 4-7 and Table 1 further confirm the superior regression capability and robust performance of MTL-LS-SVR and EMTL-LS-SVR.

Figure 8. Change of stock opening prices of five state-owned banks.

Figure 9. Predictions of different regression models on stock opening price for ICBC (a) Original figure, (b) Enlarged figure.


Figure 10. Predictions of different regression models on stock opening price for ABC (a) Original figure, (b) Enlarged figure.

Figure 11. Predictions of different regression models on stock opening price for BOC (a) Original figure, (b) Enlarged figure.

Figure 12. Predictions of different regression models on stock opening price for CCB (a) Original figure, (b) Enlarged figure.

Figure 13. Predictions of different regression models on stock opening price for BCM (a) Original figure, (b) Enlarged figure.

Table 1 lists the average results of 20 independent experiments.

Table 1. Performance comparisons of eight algorithms on four major stock market indices.

Table 2. Performance comparisons of eight algorithms on stock data of China's five state-owned banks.

Table 3. Average ranks of all algorithms in the Friedman test on the two experimental datasets.