Prediction of the Change Points in Stock Markets Using DAE-LSTM

Abstract: Since the creation of stock markets, there have been attempts to predict their movements, and new prediction methodologies have been devised. According to a recent study, when the Russell 2000 industry index starts to rise, stocks belonging to the corresponding industry in other countries also rise accordingly. Based on this empirical result, this study seeks to predict the start date of industry uptrends using the Russell 2000 industry index. The proposed model in this study predicts future stock prices using a denoising autoencoder (DAE) long short-term memory (LSTM) model and predicts the existence and timing of future change points in stock prices through Pettitt’s test. The results of the empirical analysis confirmed that this proposed model can find the change points in stock prices within 7 days prior to the start date of actual uptrends in selected industries. This study contributes to predicting a change point through a combination of statistical and deep learning models, and the methodology developed in this study could be applied to various financial time series data for various purposes.


Introduction
Since the creation of stock markets, there have been attempts to predict their movements, and new prediction methodologies have been devised. Since the 2000s, stock market predictions have been made using various machine learning algorithms beyond qualitative and quantitative methods. Research using multiple algorithms such as the hidden Markov model (HMM), artificial neural network (ANN), genetic algorithm (GA), and support vector machine (SVM) or using a combined model for stock market prediction has been conducted [1-3]. Recently, due to the development of deep learning technology, studies applying deep learning to financial markets have been actively conducted. Financial market prediction using deep neural networks (DNNs) based on the improvement of PC computational capabilities was first attempted in [4], and financial market prediction using RNNs suitable for time series data was attempted in [5]. Since then, studies have shown that LSTM solves the vanishing gradient problem of recurrent neural networks (RNNs) and is more suitable for predicting financial time series [6,7].
The Russell 2000 Index, which has been released since 1984 by US investment advisor Frank Russell, is an index that includes 2000 stocks that are among the top 1001 to 3000 stocks based on market capitalization among 10,000 US listed companies. The number of constituent stocks reaches 2000, but the sum of the market caps of all Russell 2000 stocks is only a single-digit share of the market caps of all stocks on the New York Stock Exchange. In this light, the Russell 2000 stocks include many small-scale companies with high growth potential. Unlike the Nasdaq 100 Index, which contains large technology stocks of the top 100 companies by market capitalization, the Russell 2000 Index includes stocks from various industries with proportions greater than 10 percent, such as finance, IT, health care, manufacturing, and consumer discretionary goods. A recent study revealed that when the Russell 2000 industry index begins to rise, stocks belonging to the relevant industry in other countries follow the Russell 2000 industry index and enter a trending period [8]. Based on this empirical result, this study seeks to predict the date when the Russell 2000 index starts to rise by sector.
In this paper, the denoising autoencoder (DAE), long short-term memory (LSTM), and Pettitt's test are used to predict the rising change points of the Russell 2000 index by industry. First, we use the DAE to remove the noise from Russell 2000 index data and predict future closing prices using LSTM. Then, Pettitt's test is performed based on the predicted closing prices to detect the change points.
The empirical results show that the model proposed in this paper is good at detecting rising change points for representative industries. The proposed model could predict a change point approximately 7 days before an actual change point date for several industry indices including consumer discretionary, consumer staples, information technology, and health care industries.
Until now, there have been attempts to find change points in past stock price data [8] or to predict future stock price movements through deep learning. However, no research to date has attempted to predict future stock price change points. Therefore, this study aims to predict future stock price flows through DAE-LSTM and predict the existence and timing of future change points through Pettitt's test.
The remainder of this study is organized as follows. Section 2 presents the literature review, and Section 3 presents the data and methodology used in this paper. Section 4 describes the empirical study, and Section 5 presents the conclusions of our study.


Denoising Autoencoder (DAE)
An autoencoder is a type of unsupervised machine learning model that learns to reproduce its input at its output. As shown in Figure 1, the autoencoder is a neural network structure that consists of an encoder that compresses the input data and a decoder that restores the data. Because only the important information survives the compression in the encoder, the encoder can be used as a feature extractor. The denoising autoencoder (DAE), devised by Vincent et al. [9], is a model that preserves this important information and removes noise by discarding nonessential characteristics. After noise is added to the input data, two methods can be used in the learning process: training the network to reconstruct the original, noise-free data, or applying dropout to the input. In this study, the model was trained to approximate the original data after adding noise to the input data [10]. Then, when data are passed through the trained network, the noise is removed, and data that retain the important features are obtained.
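To make the idea concrete, here is a minimal numpy sketch of denoising-autoencoder training on a synthetic series (a toy single-hidden-layer network, not the architecture used in this study): Gaussian noise is added to each input window, and the network is trained to reconstruct the clean window.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "closing price" series scaled to [0, 1] (a stand-in for real index data).
t = np.linspace(0, 4 * np.pi, 200)
clean = (np.sin(t) + 1) / 2

# Sliding 5-day windows (matching the paper's sequence length of 5).
win = 5
X_clean = np.stack([clean[i:i + win] for i in range(len(clean) - win)])
X_noisy = X_clean + rng.normal(0, 0.05, X_clean.shape)  # Gaussian corruption

# One-hidden-layer autoencoder: encoder compresses 5 -> 3, decoder restores 3 -> 5.
h = 3
W1 = rng.normal(0, 0.1, (win, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, win)); b2 = np.zeros(win)
sig = lambda z: 1 / (1 + np.exp(-z))

def forward(X):
    H = sig(X @ W1 + b1)          # encoder keeps only salient features
    return H, sig(H @ W2 + b2)    # decoder reconstructs the window

_, out0 = forward(X_noisy)
init_loss = np.mean((out0 - X_clean) ** 2)

# Plain gradient descent: the input is NOISY, the target is the CLEAN window.
lr = 0.5
for _ in range(3000):
    H, out = forward(X_noisy)
    err = out - X_clean
    d_out = err * out * (1 - out)
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ d_out / len(X_noisy); b2 -= lr * d_out.mean(0)
    W1 -= lr * X_noisy.T @ d_hid / len(X_noisy); b1 -= lr * d_hid.mean(0)

_, denoised = forward(X_noisy)
final_loss = np.mean((denoised - X_clean) ** 2)
```

Passing new noisy windows through the trained network then yields denoised windows that retain the essential shape of the series.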

Long Short-Term Memory (LSTM)
LSTM is a type of RNN used in deep learning and was designed to solve the vanishing gradient problem, which is a problem in traditional RNNs [11]. LSTM stores the information of the previous step in a memory cell and determines how much of the past content will be forgotten through the input gate, forget gate, and output gate. The current information is added to the result and delivered to the next point in time.
LSTM networks are suitable for classification, processing, or prediction using time series data. In addition, an advantage of LSTM is that it is insensitive to the length of the input data compared to other algorithms that handle time series data, such as the RNN and HMM. In this study, the tanh function was used as the activation function.
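For illustration, a single LSTM step can be written out in numpy as follows (random, untrained weights; all variable names are ours). The three gates regulate how much of the past is forgotten and how much new information enters the memory cell.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the input gate,
    forget gate, output gate, and candidate cell content, in that order."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b           # pre-activations, shape (4n,)
    i = sigmoid(z[0:n])                  # input gate: write new information
    f = sigmoid(z[n:2*n])                # forget gate: keep/discard the past
    o = sigmoid(z[2*n:3*n])              # output gate: expose the cell state
    g = np.tanh(z[3*n:4*n])              # candidate cell content
    c = f * c_prev + i * g               # memory cell carries long-term info
    h = o * np.tanh(c)                   # hidden state (tanh activation)
    return h, c

# Run a 5-step sequence of scalar "prices" through a 4-unit cell.
d, n = 1, 4
W = rng.normal(0, 0.1, (4 * n, d))
U = rng.normal(0, 0.1, (4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for price in [0.2, 0.4, 0.5, 0.3, 0.6]:
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
```

In practice the weights are learned by backpropagation through time; the point here is only the gating mechanism.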

Pettitt's Test
Pettitt's test is a nonparametric test used in several hydroclimatological studies to detect rapid changes in the mean distribution of a variable of interest. This test is based on the Mann-Whitney two-sample test (rank-based test) and can detect a single shift at an unknown time point [12,13]. Pettitt's test, introduced as one of Csorgo and Horvath's change-point detection methodologies, is defined as follows [14].
For a series of random variables X1, X2, . . . , Xn, two sets of random variables {X1, X2, . . . , Xk*} with the cumulative distribution function F1(X) and {Xk*+1, Xk*+2, . . . , Xn} with the cumulative distribution function F2(X) have a change point at k* when F1(X) ≠ F2(X). A test for changes in the distribution uses a nonparametric approach. The alternative hypothesis is established as H1: 1 ≤ k* < n, that is, a change point exists; against the null hypothesis H0: k* = n, that is, a change point does not exist. This approach devised by Pettitt [15] has been applied to detect change points in various continuous data. A test to detect a change point uses the following equations:

U(k, n) = Σ_{i=1}^{k} Σ_{j=k+1}^{n} sgn(Xi − Xj)  (1)

where sgn(x) is 1 for x > 0, 0 for x = 0, and −1 for x < 0. The statistic U(k, n) indicates whether the distributions of the two time series {X1, . . . , Xk} and {Xk+1, . . . , Xn} are the same. For U(k, n) at 1 ≤ k < n, Pettitt's test statistic is defined as follows:

Kn = max_{1≤k<n} |U(k, n)|  (2)

The limiting distribution of Kn is approximated by the p-value

p ≈ 2 exp(−6Kn² / (n³ + n²))  (3)
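The test above translates directly into code. Below is a small numpy implementation for illustration (the function name and return convention are ours, not from the original study):

```python
import numpy as np

def pettitt_test(x):
    """Pettitt's change-point test: returns the most likely split index
    (last index of the first segment, 1-based) and the approximate
    two-sided p-value 2*exp(-6*K^2 / (n^3 + n^2))."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sgn = np.sign(x[:, None] - x[None, :])            # sgn[i, j] = sgn(x_i - x_j)
    # U(k, n): rank statistic comparing {x_1..x_k} with {x_{k+1}..x_n}.
    U = np.array([sgn[:k + 1, k + 1:].sum() for k in range(n - 1)])
    K = np.abs(U).max()                               # Pettitt's test statistic
    k_star = int(np.abs(U).argmax()) + 1
    p = min(1.0, 2.0 * np.exp(-6.0 * K**2 / (n**3 + n**2)))
    return k_star, p

# Synthetic series with a mean shift after observation 50.
rng = np.random.default_rng(42)
series = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])
k_star, p = pettitt_test(series)
```

On this synthetic series the detected split lands at the injected shift and the p-value is far below 0.05.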

Russell 2000 Index
Managed by FTSE Russell, the Russell 2000 Index was developed in 1984 by the Frank Russell Company. The Russell index family is largely divided into three indices [16]. These are the Russell 1000 index, which consists of the top 1000 stocks by market capitalization; the Russell 2000 index, which consists of the top 1001 to 3000 stocks by market capitalization; and the Russell 3000 index, which consists of the top 3000 stocks by market capitalization. Among these, the Russell 2000 index is the most widely used small-cap index and the second most used benchmark index after the S&P 500 [17].

Materials and Methodology
The experiment in this study used the daily data of the Russell 2000 sector index from 1 January 2000 to 31 December 2019 provided by Bloomberg. The starting date of the Russell 2000's trend-up period by industry was used as the starting point of the empirical study [8].
As shown in Figure 2, the model proposed in this study is a deep learning combination model for predicting a date when a change point will appear, and the prediction process is largely composed of three parts. In step 1, after removing the noise from the data through the stacked denoising autoencoder, LSTM is used to predict the closing price of the next day, t + 1. In step 2, the closing prices of days t + 2, t + 3, . . . , t + 30 are predicted through a moving window model that uses the prediction data generated in step 1 as the input. In step 3, Pettitt's test is performed with the time series data created in step 2 to find the change points.

DAE-LSTM-Based Closing Price Prediction Model
In step 1, time point t is determined, and the closing price at time t + 1 is predicted using the DAE and LSTM. From the 5 years of closing price data preceding time t, the first 4 years are used as the training dataset and the remaining 1 year as the validation dataset, and the autoencoder is trained to map the noise-added data back to the original data. Following previous studies that add noise following a normal distribution when adopting a DAE [18,19], we use normally distributed noise. Then, the original data are passed through the trained autoencoder to obtain data from which the nonessential noise has been removed. The denoised data obtained here are used as the input for the LSTM to obtain the closing price at point t + 1.
In the prediction model of step 1, the sequence length used both during training and for the inputs required for prediction is set to the five immediately preceding days. All input data are subjected to min-max normalization to improve the learning ability of the model. The final predicted value is calculated by denormalizing the value output by the DAE-LSTM model.
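The normalization and denormalization steps might be sketched as follows (illustrative helper functions; the scaler is fit on the training window only to avoid look-ahead):

```python
import numpy as np

def minmax_fit(train):
    """Fit min-max scaling parameters on the training window only."""
    return float(np.min(train)), float(np.max(train))

def minmax_apply(x, lo, hi):
    """Scale values into [0, 1] using the fitted parameters."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_invert(x_scaled, lo, hi):
    """Denormalize model outputs back to price units."""
    return np.asarray(x_scaled, dtype=float) * (hi - lo) + lo

prices = np.array([101.0, 103.5, 99.2, 104.8, 102.1])
lo, hi = minmax_fit(prices)
scaled = minmax_apply(prices, lo, hi)
restored = minmax_invert(scaled, lo, hi)
```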
A recent study demonstrated that compared with other machine learning algorithms including multiple linear regression (MLR), support vector regression (SVR), ANN and LSTM, the proposed DAE-LSTM model achieves the best prediction accuracy [20].

Repeat the Closing Price Prediction Using LSTM from t + 2 to t + 30
In this step, the process of predicting the data one point forward is repeated from t + 2 to t + 30. The predicted value for day t + 1 is appended to the input window to obtain the predicted value for day t + 2, and the predicted value for day t + 3 is obtained by appending the predicted value for day t + 2 in turn. This process is repeated until the predicted value for t + 30 is obtained, yielding a predicted time series from time t + 1 to time t + 30.
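A sketch of this recursive one-step-ahead loop, with a toy stand-in for the trained DAE-LSTM predictor and a 5-day window (both assumptions for illustration):

```python
import numpy as np

def recursive_forecast(history, predict_next, horizon=30, win=5):
    """Roll a one-step model forward: each new prediction is appended to
    the input window and fed back in, yielding t+1 ... t+horizon.
    `predict_next` stands in for the trained one-step predictor."""
    window = list(history[-win:])
    preds = []
    for _ in range(horizon):
        y_hat = predict_next(np.array(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]   # slide the window forward one day
    return np.array(preds)

# Toy stand-in model: predicts the mean of the window (NOT the real LSTM).
toy_model = lambda w: float(w.mean())
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
path = recursive_forecast(series, toy_model, horizon=30, win=5)
```

Note that errors compound in such recursive schemes, since later predictions are conditioned on earlier predicted values rather than observed ones.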

Predict Change Points Using Pettitt's Test
In this step, Pettitt's test is performed using the predicted time series data obtained through step 1 and step 2 as input data to determine whether there is a change point and to find the date of the change point. Previous studies show that prediction power in financial markets improves when a neural network is applied after clustering using Pettitt's test [21,22]. If the p value of Pettitt's test is less than 0.05, a statistically significant change point is obtained; if the p value is greater than 0.05, there is no significant change point in the corresponding period.

Empirical Study
This study examines four sectors included in the Russell 2000 index: consumer discretionary (consumer discret), consumer staples, information technology (IT), and health care. Table 1 shows the start date of the rising trend by industry. Among the uptrend start dates for each industry used in the previous study [8], only nonoverlapping ones were used. The experiment was conducted using the same process for each industry and consisted of three processes. First, Pettitt's test was applied to the five-year period before the start of the general trend of each sector. After Pettitt's test was conducted for the entire period, it was conducted again on the subperiods divided at the corresponding change point. This process was repeated until the p value of Pettitt's test was greater than 0.05 and no change point remained in the corresponding period. The intervals between change points were then measured and averaged. The average interval between change points at the 95% confidence level was 25.32 days, and a change point was detected within 30 days. Therefore, 15 days before the start date of the uptrend was set as time t in step 1, and the closing prices from t + 1 to t + 30 were predicted in step 2.
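The repeated segmentation described here can be sketched as a recursive splitting loop (a compact Pettitt helper plus recursion; the threshold, minimum segment length, and names are illustrative, not the study's exact procedure):

```python
import numpy as np

def pettitt(x):
    """Compact Pettitt's test: returns (split index k, approximate p-value)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sgn = np.sign(x[:, None] - x[None, :])
    U = np.array([sgn[:k + 1, k + 1:].sum() for k in range(n - 1)])
    K = np.abs(U).max()
    return int(np.abs(U).argmax()), min(1.0, 2 * np.exp(-6 * K**2 / (n**3 + n**2)))

def split_recursively(x, start=0, alpha=0.05, min_len=10):
    """Repeatedly split a series at significant change points (p < alpha)
    until no segment contains a significant change point; returns the
    absolute indices of the detected change points."""
    if len(x) < min_len:
        return []
    k, p = pettitt(x)
    if p >= alpha:
        return []
    cut = start + k + 1
    return (split_recursively(x[:k + 1], start, alpha, min_len)
            + [cut]
            + split_recursively(x[k + 1:], cut, alpha, min_len))

# Synthetic series with one mean shift at index 60.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 60), rng.normal(5, 1, 60)])
cuts = split_recursively(series)
```

The gaps between consecutive detected cuts correspond to the change-point intervals that were averaged to obtain the 25.32-day figure.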
The time t set according to the above criteria for each sector is shown in Table 2.
After dividing the data based on the selected time t in Table 2, the noise in the data was removed through the DAE. Five years of daily data before time t were used: the first 4 years as the training dataset and the following 1 year as the validation dataset. In the learning process, the sequence length was set to 5, the learning rate to 0.0001, and the number of learning iterations to 10,000. The loss function was the MSE (mean squared error), the optimization function was the Adam optimizer, and batch normalization was used to prevent overfitting during training. The denoised closing price data for each industry were obtained by feeding the data from the 5 years before point t into the trained model. Figures 3-6 show the denoised closing prices and actual closing prices of the sector indices. Figure 3 shows the graphs for the consumer discretionary sector index from 17 June 2003 to 16 March 2007, and Figure 4 shows the graphs for the consumer staples sector index from 18 July 2006 to 17 July 2010. Figures 5 and 6 show the graphs for the IT sector index and the health care sector index from 17 June 2011 to 16 June 2015, respectively.
The denoised data obtained through the DAE are divided into a training dataset with 4 years of data and a validation dataset with 1 year of data, and then the LSTM model is trained. In the LSTM model training, the sequence length, learning rate, number of learning iterations, loss function, and optimization function were the same as those used in the DAE. Using the data up to time t as input, the predicted value at time t + 1 was obtained. Figure 7 shows the predicted and actual prices of the consumer discretionary sector index from 17 June 2007 to 31 July 2008 when the validation dataset was used as the input data of the trained model. Similarly, Figure 8 shows the predicted and actual prices of the consumer staples sector index from 18 July 2010 to 17 August 2011 when the validation dataset was used as the input data of the trained model. Finally, Figures 9 and 10 show the predicted and actual prices of the IT sector index and the health care sector index from 17 June 2015 to 31 July 2015 when the validation dataset was used as the input data of the trained model, respectively.


Repeat the Closing Price Prediction Using LSTM from t + 2 to t + 30
Repeating the same prediction as that performed in step 1, we obtain the predicted closing prices at t + 2, t + 3, . . . , and t + 30. Given the denoised and predicted values from t + 1 to t + 30, we calculated the mean squared error (MSE) for each sector to evaluate the DAE-LSTM model. The MSE is a loss function calculated as the square of the error, which is the difference between the predicted value and the actual value. The MSE is calculated as follows:

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²  (4)

In Equation (4), n is the number of values, y_i is an actual value, and ŷ_i is a predicted value. Figures 11-14 show the LSTM prediction errors for the consumer discretionary, consumer staples, IT, and health care sectors, respectively. In Figures 11-14, the blue (orange) line presents the MSE of the training (validation) set.
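As a concrete check, Equation (4) is straightforward to compute:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, as in Equation (4): the average of the squared
    differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Example: one prediction off by 1 over three days -> MSE = 1/3.
print(mse([100.0, 102.0, 101.0], [100.0, 102.0, 102.0]))  # ≈ 0.3333
```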



Prediction of the Change Points Using Pettitt's Test
Change point detection was performed using Pettitt's test, as described in step 3, to check whether a significant change point existed in the predicted and actual data, and the date of a change point was recorded.
As shown in Figure 15, in the case of the consumer discretionary sector index, Pettitt's test was conducted on data from 17 June 2006 to 16 July 2008. During this time period, 6 November 2007 was detected as a change point, as shown in Figure 15a. Then, as shown in Figure 15b, Pettitt's test was performed on the data from 6 November 2007 to 16 July 2008. In this case, 3 January 2008 was detected as a change point. By repeating this process, the first date of a change point detected after the set time t in Table 2 was selected as the final predicted change point.

In the same way as for the consumer discretionary sector index, Pettitt's test was conducted for the consumer staples sector index (Figure 16), the IT sector index (Figure 17), and the health care sector index (Figure 18) to select each final predicted change point.

Table 3 reports the dates when the change points occurred in the predicted and actual time series data by sector. It is interesting to note that the date of the predicted change point is found within 7 days prior to the start date of the trending upward phase for all sectors.


Discussion and Concluding Remarks
The purpose of this study was to use deep learning models to predict change points that will appear in the future using the Russell 2000 industry index. Pettitt's test, a statistical model, has traditionally been used only to find change points that occurred in the past in historical data. This study therefore contributes by predicting change points through a combination of statistical and deep learning models.
As shown in the empirical results, we were able to find a change point close to the actual start date of the uptrend for all sectors. Accurate prediction of change points in stock prices provides useful information, and the methodology that combines a deep learning model (DAE-LSTM) and a statistical model (Pettitt's test) developed in this study can be utilized as a portfolio allocation strategy for investors in stock markets.
There are various assets traded in financial markets, and an enormous number of models and techniques for achieving efficient portfolios have been developed in the literature. Financial assets, investment techniques, and investors are critical components in the efficiency of financial markets, and efficient financial markets are well known to play an important role in sustaining economic growth. Investors in stock markets are able to achieve more efficient portfolios using our deep learning models. In this sense, the approach developed in this paper appears to contribute to the efficiency of financial markets and, hence, plays a role in sustaining economic growth.
It is expected that the methodology developed in this study can be applied not only to sector index data but also to individual stock data or various financial time series data, such as exchange rates, interest rates, and various macroeconomic indicators, to predict change points and use them for various purposes.
However, this study has potential limitations. The empirical results are limited to Russell 2000 stocks in four sectors. Based on our DAE-LSTM methodology, future research can enrich the topic by developing a new model that can be utilized for various types of financial time series data from global markets. In addition, the LSTM prediction error could be reduced in future studies by utilizing deep learning models with better prediction performance.