A Labeling Method for Financial Time Series Prediction Based on Trends

Time series prediction has been widely applied in the finance industry, for example to stock market price and commodity price forecasting, and machine learning methods have been widely used for financial time series prediction in recent years. How to label financial time series data, which determines the prediction accuracy of machine learning models and ultimately the final investment returns, is a hot topic. Existing labeling methods for financial time series mainly label data by comparing the current data with those of a short time period in the future. However, financial time series data are typically non-linear with obvious short-term randomness, so these labeling methods fail to capture the continuous trend features of financial time series data, leading to a difference between their labeling results and real market trends. In this paper, a new labeling method called "continuous trend labeling" is proposed to address this problem. In the feature preprocessing stage, a new method is proposed that avoids the look-ahead bias present in traditional data standardization or normalization processes. Then, a detailed logical explanation is given, the definition of continuous trend labeling is proposed, and an automatic labeling algorithm is presented to extract the continuous trend features of financial time series data. Experiments on the Shanghai Composite Index, the Shenzhen Component Index, and several Chinese stocks showed that our labeling method clearly outperforms state-of-the-art labeling methods in terms of classification accuracy and other classification evaluation metrics. The results also indicate that deep learning models such as LSTM and GRU are better suited to the prediction of financial time series data.


Introduction
A time series is a set of observations, each one being recorded at a specific time [1]. Prediction of time series data is a relatively complex task: since many factors affect time series data, it is difficult to predict their trend accurately. Time series forecasting aims at solving various problems, especially in the financial field [2]. In the course of financial market development, a large number of studies have shown that the market is non-linear and chaotic [3][4][5]; this holds especially for financial time series data such as the prices of stocks, foreign exchange, and commodities, which are sensitive to external shocks and tend to fluctuate violently. Such time series data often have strong non-linear characteristics [6]. Better prediction of financial market trends is therefore of great significance for reducing investment risk and making financial decisions.
There have been many studies on the prediction of financial markets; in particular, we focus on the efficient market hypothesis [7] and the literature on market efficiency [8]. Sigaki et al. applied permutation entropy and statistical complexity over sliding time windows of price log returns to daily data of Chinese stock markets, and the results showed that the logarithmic change of the stock price (logarithmic return) is less predictable than the volatility [65]. It can be inferred that a model trained on data labeled by comparing logarithmic rates of return will not yield good prediction results. Since the continuous trend of time series data is often driven by intrinsic factors, it is more sustainable and predictable. For example, rising or falling stock prices are often influenced by the company's intrinsic value and the macroeconomic environment: it is difficult to forecast whether a stock price will rise or fall on a particular day, while forecasting its continuous trend is relatively reliable. The rise and fall of commodity prices are often driven by macroeconomic factors and demand, so forecasting their trends is more feasible. The foreign exchange market is influenced by interest rates and the international environment, and its trend often shows continuity over time. These intrinsic factors lead to the persistence and predictability of trends, and the probability of a trend changing due to random factors is relatively low in the short term.
Therefore, this paper proposes a novel method to define and extract the continuous trend features of time series, aiming to support decisions by predicting trend changes. The prediction of continuous trends is more in line with the way financial time series actually behave, focuses on the prediction of trend changes, and is more consistent with the investment habits of individual investors. In addition, an algorithm that automatically labels the data according to a given parameter is presented to label the data used to train machine learning models. The prediction results were compared with those obtained by the traditional time series labeling method: the classification results of the automatic labeling algorithm were significantly better, and when investment strategies were constructed and their returns compared, the method proposed in this paper achieved much better performance.
The paper is organized as follows: In Section 1, the labeling method proposed in this paper is explained in detail, along with a description of the automatic labeling algorithm. In Section 2, different values of the parameter ω are used for data labeling, and the results of the different parameters in four groups are analyzed. In Section 3, the classification results of six machine learning models under the different labeling methods are compared, and investment strategies based on the proposed labeling method are constructed to compare the results with those of the traditional labeling method and the buy-and-hold strategy. Finally, a short discussion is provided in Section 4 and the conclusions in Section 5. A detailed flowchart of the study is shown in Figure 1.

Learning Algorithms
In order to verify the effectiveness of the labeling algorithm proposed in this paper, six machine learning models were selected for the experiments: four traditional machine learning models and two deep learning models suited to processing sequence data. These models can be briefly described as follows.

Logistic Regression (LOGREG)

Logistic regression is a mathematical modeling approach that can be used to describe the relationship of several variables to a dichotomous dependent variable [66,67]. LOGREG theory is simple and easy to understand, but it lacks robustness and accuracy when there is noise in the data [68].

Random Forest (RF)
The random forest classifier is an ensemble classifier that generates multiple decision trees using randomly selected subsets of the training samples and variables [69]. The random forest algorithm, proposed by Breiman in 2001, has been extremely successful as a general-purpose classification and regression method [70]. RF is one of the most powerful ensemble methods, with high performance when dealing with high-dimensional data [71].

K-Nearest Neighbor (KNN)

The K-nearest-neighbor (KNN) method is a non-parametric classification method, which is simple but effective in many cases [72][73][74]. It can identify many market conditions, including both mean reversion and trend following [75].
Support Vector Machine (SVM)

The support vector machine (SVM) is a special learning algorithm characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution [76]. It has many advanced features, with good generalization and fast computing capabilities [77]. SVM adopts a risk function composed of the empirical error and a regularization term derived from the principle of structural risk minimization, which makes it a promising method for predicting financial time series [78]. The fundamental motivation for SVM is that it can accurately predict time series data even when the underlying system process is non-linear, non-stationary, and not defined a priori [79]. The SVM kernel mechanism maps non-linear data into a high-dimensional space in which they become linearly separable, and SVM performs well on time series classification and prediction [80].

Long Short-Term Memory (LSTM)
LSTM is a kind of recurrent neural network suitable for processing and predicting events with relatively long intervals and delays in time series. It was first proposed by Hochreiter and Schmidhuber in 1997 [81]. The ingenuity of LSTM is that the weights of the self-loop can be changed through the input gate, forget gate, and output gate. In this way, even when the model parameters are fixed, the integration scale at different time points can be changed dynamically, thereby avoiding vanishing gradients [82][83][84].

Gated Recurrent Unit (GRU)
GRU is a variant of LSTM, first proposed by Cho et al. in 2014 [85]. The difference between GRU and LSTM is that a single "update" gate replaces the input gate and the forget gate to control the state of the cell. The advantages of this design are simplified computation and excellent expressive ability of the model [86,87]. The parameters of the models above are shown in Table 1.

Table 1. The related parameter information of the six models mentioned in this paper.

LOGREG: The penalty was set to "L2".
RF: The maximum number of iterations parameter N was set to 10 in this study.
KNN: The parameter N was set to 20.
SVM: The RBF kernel function was used; the regularization parameter C was set to 2 and the kernel parameter σ was also set to 2.
GRU: Hidden size = 50; number of layers = 2; the optimization function is Adam with learning rate = 0.02 and betas = (0.9, 0.999).

Time series data gradually accumulate with the passage of time, so a sliding window parameter λ is needed. To allow each data vector to retain some historical information, the vector dimension is extended according to the sliding window parameter; the purpose of this expansion is to let the current vector contain the historical price information within the length of the sliding window. The sliding window parameter λ was set to 11 in this paper as an empirical parameter chosen in combination with the trend characteristics of China's stock market. The closing price was selected for feature processing and model training.
The vector dimension expansion was carried out after determining the sliding window parameter. The formulas are shown in Equations (1) and (2), where x, X, y, x_i, and y_i represent the raw data, the expanded matrix data [88], the label vector of the expanded data X, the closing price on the i-th day, and the i-th label of the extended vectors, respectively. After dimension expansion, the one-dimensional data containing only one closing price per day were extended to λ-dimensional data.
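As an illustration, the dimension expansion can be sketched in Python as follows; the function name and the exact row layout of the matrix X are assumptions of this sketch, but each row holds the λ consecutive closing prices ending on the corresponding day:

```python
import numpy as np

def expand_window(x, lam=11):
    """Expand a 1-D closing-price series into overlapping lam-dimensional rows.

    Row i holds the lam consecutive closing prices ending on day i, so each
    sample carries lam - 1 days of history; the first lam - 1 days cannot be
    expanded and are dropped.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < lam:
        raise ValueError("series shorter than the sliding window")
    return np.stack([x[i - lam + 1 : i + 1] for i in range(lam - 1, len(x))])

prices = np.arange(1.0, 21.0)   # 20 synthetic closing prices: 1.0 ... 20.0
X = expand_window(prices, lam=11)
print(X.shape)                  # (10, 11)
```

With λ = 11 as in the paper, a series of n prices yields n − 10 samples, which matches the λ − 1 rows lost from the training set noted later.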

Feature Processing Method without Look-Ahead Bias
A key step in data preprocessing is data standardization or normalization. Traditional time series normalization or standardization often needs access to all of the data, which usually introduces a look-ahead bias [57,89]. This paper proposed a new feature extraction method that retains the relevant information while reducing the absolute data size; at the same time, the original data are scaled to a relatively stable range of fluctuation. The feature vector based on the mean-deviation-rate depends on historical data only and therefore has no look-ahead bias. Instead of standardizing or normalizing over all of the data, the sliding window parameter λ was used to dynamically calculate a mean-deviation-rate, i.e., the deviation from the window mean, thereby solving the problem of look-ahead bias. For each λ-dimensional vector, whose first λ − 1 entries are historical closing prices, the mean of the vector was subtracted from each entry and the result was divided by this mean. The feature f_ij was computed as in Equations (3) and (4), where x_ij denotes the closing price in the data matrix X and M_λs denotes the mean of the closing prices within the corresponding sliding window period λ. After preprocessing the original data with the above formulas, the feature matrix F was obtained (the dates of the data are not listed), as shown in Equation (5).
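The mean-deviation-rate described above (subtract the window mean, then divide by it) can be sketched as follows; the function name is an assumption, and only prices inside each window enter the calculation, which is what removes the look-ahead bias:

```python
import numpy as np

def mean_deviation_rate(X):
    """Scale each lam-dimensional row by its deviation from the row mean.

    Only prices inside the sliding window are used, so no future data can
    leak into the features (no look-ahead bias).
    """
    X = np.asarray(X, dtype=float)
    M = X.mean(axis=1, keepdims=True)  # window mean, one value per row
    return (X - M) / M

row = np.array([[90.0, 100.0, 110.0]])  # toy window with mean 100
print(mean_deviation_rate(row))         # [[-0.1  0.   0.1]]
```

Because each row is normalized by its own historical mean, the features stay in a stable range even as absolute price levels drift over the years.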

Definition of Continuous Trend Labeling
When the market evolves with a continuous trend, it can be divided into a rising market and a falling market. Investors should buy and hold the target (stocks or commodities) in a rising market and hold a short position in a falling market if a short mechanism exists; if not, they should sell the target in a falling market. Their position should not change until the forecast of the market trend is about to change [90]. In order to distinguish continuous trends, this paper provides the following definitions of continuously rising and continuously falling trends. First, the peak points and trough points of the historical data in a time period were put into vectors h and l, where t represents the number of peak points and m the number of trough points in Equation (6). A TD index was used to measure the degree of trending of the time series data; its value reflects the fluctuation amplitude between two adjacent peak and trough points, as given in Equations (7) and (8). By comparing the fluctuation parameter ω with the TD value, a continuous trend was defined as a fluctuation between two adjacent peak and trough points whose amplitude exceeds the given threshold parameter ω; otherwise, the fluctuation was considered a normal fluctuation without a continuous trend. The latest lowest and highest prices were selected as the basis of calculation, and a rise of the market above, or a fall of the market below, the proportion ω was defined as a continuous rising or a continuous falling trend, respectively. Then, for model training, all data labels were set to 1 in a period of upward trend and to −1 in a period of downward trend (or to 0 for the deep learning models, whose training may report an error for negative labels). An example is given in Figure 2; the time series data are part of the data analyzed in the next step of the paper.
The calculation of TD for L4H5 and H8L9 is shown in the figure. From the perspective of actual performance, it was hoped that L4H5 (H8L9) would be viewed as an overall continuous upward (downward) trend, with the corresponding data labels belonging to the same category, which is more in line with the law of market operation. However, the traditional labeling method was obviously noisy in the L4H5 (H8L9) section and did not conform to the law of market operation, and the same was true for the rest of the series.
Figure 2. Definition of continuous trend labeling: the market was divided into two categories, a rising market and a falling market, based on the TD index. "label1" denotes the labeling results of the traditional labeling methods, and "label2" denotes the results of the labeling method based on the trend definition proposed in this paper. Over any period, such as the trend L4H5, our labeling method gives the data a single direction label, whereas the traditional labeling methods label the data in both directions within the same period, which is not in line with reality; the same holds for H8L9.
The trend of a market is often the combined result of fundamentals and the economic environment. Because the duration of a trend is relatively long, trend prediction is more in line with actual investment behavior in practice. Traditional research methods focus on predicting rising and falling prices over a short future time period, evaluating the regression or classification performance of an established model and thus only the direction of short-term fluctuations. In practice, these methods are often unworkable, especially once actual operation costs and market capacity are taken into account. Therefore, it is theoretically sound to define the continuous up and down trends of the market and to adopt a machine learning model that predicts the direction of the market trend while ignoring normal fluctuations, which is more in line with the law of financial market operation.
The label vector set y is obtained from the labeling operations carried out once the parameter ω is given. However, different investors judge the continuous trend differently, even for the same stock or commodity market, so the way historical data are labeled is also individual: differences in capital, risk tolerance, investment decision-making cycle, and other factors lead to diverse investment methods and styles, and hence to distinct definitions of the continuous trend of the market. As a result, investors can label historical data with their own parameter and train models to obtain the model best suited to guide their investment. In this paper, an automatic labeling algorithm is proposed for this purpose.
When the market rises above a certain proportion parameter ω from the current lowest point or recedes from the current highest point to a certain proportion parameter ω, the two segments are labeled as rising and falling segments, respectively. The labeling result is unique as long as the proportion threshold parameter ω is given for the labeling process. The value of ω in this paper was 0.15, which was obtained from the following analysis in the paper. The algorithm for the automatic labeling process based on a given parameter ω is presented in Algorithm 1.

Initialization of related variables:
FP = x_1, the first price obtained by the algorithm; x_H = x_1, used to mark the highest price; HT = t_1, used to mark the time when the highest price occurs; x_L = x_1, used to mark the lowest price; LT = t_1, used to mark the time when the lowest price occurs; Cid = 0, used to mark the current direction of labeling; FP_N = 0, the index of the highest or lowest point obtained initially.
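A minimal Python sketch of such an automatic labeling procedure is given below. It follows the idea of Algorithm 1 (confirm an uptrend once the price rises ω above the running low, a downtrend once it falls ω below the running high, and back-fill labels to the turning point), but the exact control flow, the handling of the initial segment, and the variable names are assumptions of this sketch rather than a transcription of the paper's pseudocode:

```python
def trend_labels(prices, omega=0.15):
    """Label each point 1 (continuous uptrend) or -1 (continuous downtrend)."""
    labels = [0] * len(prices)
    x_h = x_l = prices[0]   # running highest / lowest price
    ht = lt = 0             # indices where they occurred
    cur_dir = 0             # current trend direction (0 = undecided)
    last = 0                # index up to which labels are already fixed
    for i, p in enumerate(prices):
        if cur_dir >= 0:
            if p > x_h:
                x_h, ht = p, i
            if p <= x_h * (1 - omega):
                # fall of omega from the running high: the segment up to the
                # high was an uptrend; a downtrend starts after the high
                for j in range(last, ht + 1):
                    labels[j] = 1
                last, cur_dir = ht + 1, -1
                x_l, lt = p, i
        if cur_dir <= 0:
            if p < x_l:
                x_l, lt = p, i
            if p >= x_l * (1 + omega):
                # rise of omega from the running low: downtrend ends at the low
                for j in range(last, lt + 1):
                    labels[j] = -1
                last, cur_dir = lt + 1, 1
                x_h, ht = p, i
    # Tail (and a series with no confirmed trend) keeps the current direction.
    for j in range(last, len(prices)):
        labels[j] = cur_dir if cur_dir != 0 else 1
    return labels

labels = trend_labels([100, 90, 80, 100, 120, 100], omega=0.15)
print(labels)   # [1, -1, -1, 1, 1, -1]
```

With ω = 0.15, a 15% move from the running extreme flips the label, matching the threshold value chosen later in the paper.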

Data Description
This paper mainly analyzed the trend changes of China's Shanghai Stock Composite Index (stock code 000001) and Shenzhen Component Index (stock code 399001) and forecasted the trend of the stock market to guide investment analysis.
Shanghai Composite Stock Exchange Index (SSCI) is short for "Shanghai Stock Index" or "Shanghai Stock Composite Index". Its component stocks are all listed on the Shanghai Stock Exchange, including A and B stocks. SSCI reflects the changes of the stock prices on the Shanghai Stock Exchange. SSCI was officially released with a base value of 100 on 19 December 1990 and has since reflected the overall trend of the Chinese stock market [91].
Shenzhen Component Stock Exchange Index (SZCI) refers to the weighted composite stock price index compiled by the Shenzhen Stock Exchange and covers all stocks listed on the Shenzhen Stock Exchange. The Shenzhen Component Index was compiled and published by the Shenzhen Stock Exchange on 3 April 1991 with a base index of 100 points.
As the barometer of China's economy, Shanghai Stock Index and Shenzhen Component Index have a profound impact on all walks of life. China's stock market provides financing facilities for enterprises and is crucial for the development of these enterprises. SSCI and SZCI play an important role in the national economic development at macro and micro levels and have great significance in the prediction of important index trends. Therefore, the correct prediction of the trend of China's stock market not only can guide the formulation of relevant economic policies, but also can better prevent financial risks, facilitate the dynamic flow of capital, and allow funds to flow to enterprises in need.
This paper mainly analyzed the trend of the two stock indexes in China using daily stock transaction data collected from China RichDataCenter (the data can be downloaded from GitHub (https://github.com/justbeat99/Daily-Stock-Data-Set) or from China RichDataCenter (http://www.licai668.cn/content/showContent.asp?titleNo=365)). Prices are backward-adjusted: prices before ex-rights and ex-dividend events are kept unchanged, while later prices are raised accordingly, so that the series remains comparable across dividends and splits. In order to further test the method proposed in this paper, three stocks with sufficient data were also randomly selected to test the trend prediction ability of the models, and investment strategies were then established to test the investment performance. These three stocks were Founder Technology Group Corp. (stock code 600601), Shenzhen Cau Technology Co., Ltd. (stock code 000004), and Shanghai Fenghwa Group Co., Ltd. (stock code 600615).

Input Setup
Combined with the trading rules and the volatility level of China's stock market, the sliding window parameter λ = 11 was chosen as an empirical parameter. To keep the experiments comparable, all parameters of the machine learning models remained the same across the comparison experiments and were not optimized. In segmenting the training and test sets, two considerations were balanced: the strategies established in later steps need enough test data for a convincing net yield rate curve, while a larger training set gives the models better learning performance. This paper therefore adopted a manual segmentation that splits the test and training sets in a ratio close to 1:1 (in fact, it was verified that the models converged once the training set contained close to 1000 samples). The actual data available for the training set are reduced by λ − 1 raw data points owing to the sliding window parameter λ: the first λ − 1 data points contain no further historical data for expansion and therefore cannot be used for training. The relevant data are shown in Table 2.

Comparison Experiments
In order to compare the prediction results of the labeling method proposed in this paper with those of the traditional time series labeling method, four groups of comparative experiments were designed. Since this paper constructs investment strategies based on the forecast direction, the regression analysis was also reduced to a high-low direction forecasting problem for comparison with the proposed method. The traditional labeling method labels the data by comparing the closing price X_{t+m} at a certain horizon with the closing price X_t; in the comparative experiments of this paper, m = 1, 3, 5, 10. Table 3 shows the labeling rules of the proposed method and of the comparison experiments, where X_t represents the closing price at time t and Label_t the label of the t-th data point. E is the experiment using the labeling method proposed in this paper, and C1, C3, C5, and C10 represent the comparative experiments.

Experiment Name: Condition for Label_t = 1
E: the data point lies in a continuous rising trend identified by the automatic labeling algorithm
C1: X_{t+1} > X_t
C3: X_{t+3} > X_t
C5: X_{t+5} > X_t
C10: X_{t+10} > X_t
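For comparison, the traditional m-day-ahead labeling used in experiments C1, C3, C5, and C10 can be sketched as follows; the handling of ties and of the final m days (which have no look-ahead close) is an assumption of this sketch:

```python
def traditional_labels(closes, m=1):
    """Label day t by comparing the close m days ahead with the close at t.

    Returns 1 when X_{t+m} > X_t and -1 otherwise; the last m days are
    dropped because their look-ahead close does not exist.
    """
    return [1 if closes[t + m] > closes[t] else -1
            for t in range(len(closes) - m)]

print(traditional_labels([10, 11, 10.5, 12], m=1))   # [1, -1, 1]
```

Note that these labels flip with every short-term fluctuation, which is exactly the noisiness the continuous trend labeling is designed to avoid.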

Statistical Metrics
In order to evaluate the prediction efficiency of the trained models, six statistical metrics, namely accuracy, recall, precision, F1_score, AUC, and NYR, were selected to evaluate the prediction classification results [92,93]. The former five are classification metrics that are maximized when the model generates no false positive or false negative predictions, as shown in Table 4 below [94]; NYR is the profit metric. AUC can objectively reflect the ability to predict positive and negative samples comprehensively and, to a certain extent, eliminates the influence of sample skew on the results.
In order to compare the profitability of the constructed strategies, the net yield rate NYR was used to evaluate the strategies.
True positives (TP) denote the successful identification of the correct class (positive samples), true negatives (TN) the successful classification of negative samples, false positives (FP) the incorrect classification of negative samples as positive, and false negatives (FN) the positive samples incorrectly predicted as negative [92,95]. Accuracy (Acc) is the most basic evaluation metric in classification problems and is defined as the percentage of correct results in the total sample. In the AUC formula, x_i^+ and x_j^− represent data points with positive and negative labels, respectively, f is a generic classification model, 1 is an indicator function equal to 1 when f(x_i^+) ≥ f(x_j^−) and 0 otherwise, N^+ (resp. N^−) is the number of data points with positive (resp. negative) labels, and M = N^+ · N^− is the number of pairs with opposite labels (x^+, x^−). The AUC value ranges from 0 to 1: the higher the value, the better the model, and a random-guess model has an AUC of 0.5 [96]. NYR represents the cumulative return of the investment strategy, where R_j denotes the daily return of a stock, HD the number of days a position is held in a buy-and-sell process, and NT the total number of transactions of a buy-and-sell strategy.

Analysis of Threshold Parameters
The selection of the parameter ω depends on the situation of the actual investor (including but not limited to investment capital, risk tolerance, and trading frequency) and the volatility of the corresponding market target. In order to evaluate the proposed labeling method objectively, a more objective procedure was used to determine the value of ω, with the four traditional machine learning models used for the analysis. Different values of ω were compared, and the value with relatively better classification results was chosen as the basis for the subsequent comparative experiments and strategy construction. The parameter ω was varied from 0.05 to 0.5 with a step size of 0.05 to label the SSCI and SZCI data with the proposed automatic labeling algorithm, and the four traditional machine learning models were trained accordingly. During training, 10-fold cross-validation was used and the mean accuracy over the 10 folds was recorded. Figure 3 shows the classification metrics of the four models for the different values of ω on the SSCI and SZCI data. In the figure, Ag stands for "average", A for accuracy, P for precision, R for recall, and F1 for F1_score; the X axis shows the parameter value and the Y axis the corresponding classification metric. To balance the results of the four traditional machine learning models, plot f averages their classification results; it can be seen from the figure that a threshold parameter in the range 0.1-0.2 performs better.

It can be seen from plots a to e in Figure 3 that, as the parameter ω increased from 0.05 to 0.5, the values of accuracy, precision, and AUC of the classification results gradually decreased, while the values of recall and F1_score did not decrease significantly. When ω was between 0.05 and 0.25, accuracy and AUC remained above 0.6 and precision above 0.55, indicating relatively high classification accuracy and a well-behaved ROC curve. The results of plot f were consistent with those of plots a to e, indicating that classification performance was best for parameters between 0.05 and 0.25; this suggests choosing ω close to 0.05. However, in order not to overemphasize the role of ω, the suboptimal value ω = 0.15, the average of the range 0.05 to 0.25, was chosen instead. Therefore, a price rise or fall of more than 15% was regarded as an upward or downward trend, respectively, and the data were labeled by the automatic labeling algorithm accordingly.

Classification Results and Analysis
In this part, the trained LOGREG, RF, KNN, SVM, LSTM, and GRU models with the parameter ω = 0.15 were used to compare the classification results on the two stock indexes and three stocks. To distinguish the effects of the four traditional machine learning models and the two deep learning models, the experimental results were listed separately. Table 5 shows the average accuracy of the 10-fold cross-validation of the four traditional machine learning models carried out on the training set. The "Average_Accuracy" column represents the average accuracy of the four traditional machine learning models on the same stock, and could more objectively reflect the experimental results of the various labeling methods. It can be clearly seen that the average classification accuracy of experiment E was close to 0.7 for all four machine learning models across all stock indices and stocks, far exceeding the results of C1, C3, C5, and C10, and was consistent with the "Average_Accuracy" values. The average accuracy of experiments C1, C3, C5, and C10 was significantly lower: except for the C1 experiment on 600,601, whose average accuracy was above 0.6, all other results were between 0.51 and 0.56, indicating much poorer performance than the proposed method. Thus, the labeling method proposed in this paper was more in line with the characteristics of the market and more valuable for the models; the four traditional machine learning models trained on the training set generated by the proposed labeling method showed obviously better results.
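The "Average_Accuracy" computation described above can be sketched as follows. This is a hypothetical reconstruction, not the paper's code: `make_classification` is a stand-in for the labeled stock features, and default hyperparameters are assumed for the four models.

```python
# Sketch: average 10-fold cross-validation accuracy over the four models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for one stock's labeled feature matrix and trend labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "LOGREG": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}
# Per-model mean accuracy over the 10 folds.
per_model = {name: cross_val_score(m, X, y, cv=10, scoring="accuracy").mean()
             for name, m in models.items()}
# The paper's "Average_Accuracy": mean over the four models.
average_accuracy = np.mean(list(per_model.values()))
```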
The performance of the trained models on the test set was also verified. Table 6 shows the corresponding AUC values. It can be clearly seen that the AUC values of experiment E far exceed the results of C1, C3, C5, and C10 on all tested stocks, indicating that the data labeled by the automatic labeling algorithm generalized well with excellent learning performance. Moreover, the results of C1, C3, C5, and C10 were basically around 0.5 (although the C1 result on 600,601 was above 0.6), indicating much poorer performance than that of experiment E.
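As a reminder of how the AUC values in Table 6 are obtained, a minimal sketch is shown below. The labels and scores are made-up illustrative values, not data from the paper; in practice the scores would be each model's predicted probability for the up-trend class on the test set.

```python
from sklearn.metrics import roc_auc_score

# Illustrative test labels (1 = up trend, 0 = down trend) and
# hypothetical predicted probabilities for the up-trend class.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.7, 0.4, 0.2, 0.6, 0.8]

# AUC = probability that a random up-trend sample is ranked
# above a random down-trend sample.
auc = roc_auc_score(y_true, y_score)
```

An AUC near 0.5 (as for C1, C3, C5, and C10) means the ranking is no better than chance, while values well above 0.6 indicate genuinely learned structure.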
ROC curves were also drawn for all experiments to provide a better visual comparison; however, due to space limitations, only the AUC values are reported here in Table 6.
The four metrics of the classification results, precision, recall, accuracy, and F1_score, of the models KNN, LOGREG, RF, and SVM are given in Tables 7 and 8, respectively. Meanwhile, the averages of these four metrics over the four models, used to evaluate the test results more objectively, are given in Table 9. In Table 7, it can be seen that on the test sets of the two stock indices SSCI and SZCI and the three stocks, the Accuracy and F1_score values of experiment E were the highest among all model results, and the classification accuracy was higher than that of C1, C3, C5, and C10. As far as Precision was concerned, the value of experiment E for 600,601 under the model LOGREG was 0.6881, slightly lower than the corresponding C1 result of 0.6899, but the difference was not significant; the precision of experiment E for 000,004 under LOGREG was 0.7889, slightly lower than the corresponding C1 value of 0.8571; and the precision of experiment E for 600,601 under SVM was 0.6942, slightly lower than the corresponding C1 value of 0.7055. As far as recall was concerned, the value of experiment E under LOGREG and SVM was slightly lower than that of the comparison experiments, but all other values of experiment E were the highest among all recall values. Moreover, the data showed that the F1_score values of C1, C3, C5, and C10 for 000,004 and 600,615 were very low under LOGREG and SVM, indicating that in experiments C1, C3, C5, and C10 the machine learning models did not learn effective patterns from the training data and therefore performed poorly on the test set.
To better analyze the results, the differences between the machine learning algorithms, as well as the variation caused by sample differences, were taken into account, and the corresponding classification metrics of the four machine learning models were averaged, as shown in Table 9. The table clearly shows that all corresponding metric values of experiment E were the highest, and the average Accuracy values were basically around 0.6-0.7, indicating that the models predicted the market trends of the test set with high accuracy. These experimental results show that the labeling method proposed in this paper is in line with the laws of the market and suitable for labeling time series data with trend characteristics, and that the machine learning models trained in this study performed well in prediction. Table 10 shows the classification results of the two deep learning models, LSTM and GRU. It can be clearly seen that the accuracy values of the two deep learning models on SSCI and SZCI exceed 0.7, basically around 0.72. These accuracy results are better than those of the four traditional machine learning models, and the other four metrics are also significantly better. The accuracy results for 600,615 even reach 0.74 and 0.75 on LSTM and GRU, respectively. Between LSTM and GRU, the differences in the metrics are not significant, and the overall results are better than those of the four traditional machine learning algorithms. Moreover, the values of all metrics of experiment E far surpass the results of C1, C3, C5, and C10, again confirming the effectiveness of our proposed algorithm.

Implementation of Strategies
In this part, the trend prediction results of the models on the test set were constructed into investment strategies to test the investment efficiency of the models based on the different labeling methods. To reflect the investment process as closely as possible to the actual situation, the price changes caused by market fluctuations, the available amount of funds, the purchasable volume, and the issue of positions were all taken into account. This is because when the account balance fell below the initial balance, the position available for purchase would be reduced accordingly. Based on the above considerations, this paper proposed the following hypotheses to construct the strategies:

Hypothesis 1.
SSCI and SZCI were tradable stock indexes. The profit and loss were settled according to the absolute index points, and the contract multiplier was 1.

Hypothesis 2.
The strategies constructed by experiments E, C1, C3, C5, and C10 and the buy-and-hold strategy would be compared with each other based on the net yield rate.
The initial balance for the strategies of the models and the buy-and-hold investment strategy was one million. The investment strategies constructed by the models would take a full position in stock index contracts or stocks in an upward trend and would sell all contracts or stocks in a downward trend, with the actual amounts that could be bought or sold calculated according to the balance. The buy-and-hold investment strategy bought the stock index contracts or stocks at the very beginning and held them until the end of the test period, when they would be sold.
For the strategy construction, based on the two hypotheses above, the investment strategies constructed by the models conducted buying and selling operations according to the predicted labels of the stock data. In fact, the traditional regression approach ultimately builds strategies according to the direction of forecasted price fluctuations, so the strategies based on C1, C3, C5, and C10 can be considered to cover the strategies built from regression predictions. Since there is no short-selling mechanism in China's stock market (leaving aside the securities lending mechanism for the time being), the strategies constructed by the models would buy stock index contracts or stocks in a predicted rising trend and sell all holdings when the market was about to decline. The strategies were constructed as follows: if the predicted classification label was 1, a buying operation would be performed; if the predicted label was -1, a selling operation would be performed. Table 11 shows the investment results on the two stock indices SSCI and SZCI and the three stocks for the strategies of the four traditional machine learning models, together with the buy-and-hold strategy. In terms of Average_NYR, the net yield rate of experiment E was the highest for all stock indices and stocks, far exceeding that of the buy-and-hold strategy. The Average_NYR values of C1, C3, C5, and C10 on SSCI and SZCI exceeded the buy-and-hold strategy but were lower than the results of experiment E. For 600,601 and 000,004, some Average_NYR values of C1, C3, C5, and C10 were larger than those of the buy-and-hold strategy and some were lower, lacking the obvious robustness and stability of the results of experiment E.
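The trading rule described above can be sketched as a simple backtest loop. This is an illustrative reconstruction under simplifying assumptions not stated in the paper (whole-share purchases, no transaction costs, no slippage); the function name and interface are hypothetical.

```python
def backtest(prices, labels, initial_balance=1_000_000):
    """Sketch of the strategy rule: go fully long on a predicted
    up label (+1), liquidate on a down label (-1), never short.
    Position size is limited by the available cash balance.
    Assumes whole-share purchases and zero transaction costs."""
    cash, shares = initial_balance, 0
    for price, label in zip(prices, labels):
        if label == 1 and cash > 0:
            qty = cash // price        # as many whole shares as affordable
            shares += qty
            cash -= qty * price
        elif label == -1 and shares > 0:
            cash += shares * price     # sell all holdings
            shares = 0
    final = cash + shares * prices[-1]     # mark remaining holdings to market
    net_yield_rate = (final - initial_balance) / initial_balance
    return net_yield_rate
```

Note that when the balance falls below the initial balance, `cash // price` naturally reduces the position available for purchase, matching the position constraint described above.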
For 600,615, experiment E was the only one whose Average_NYR exceeded the result of the buy-and-hold strategy, while the values of C1, C3, C5, and C10 were all far below it. Moreover, for each of the four machine learning models individually, it can be clearly seen that the results of experiment E were the best, especially for 600,601 and 600,615, both of which far exceeded those of C1, C3, C5, and C10. The data also show that the net yield rate to maturity of 000,004 and 600,615 in experiment E was much higher than the corresponding results of C1, C3, C5, and C10. These results on the net yield rate to maturity demonstrate the advantage of the proposed labeling method.
Table 11. Net yield rate to maturity (NYR) for each strategy of the four traditional machine learning models. The data represent the cumulative rate of return at the end of the test period; "Average_NYR" represents the average value over the four traditional models.
Table 12 shows the investment results on the two stock indices SSCI and SZCI and the three stocks for the strategies of LSTM and GRU. For SSCI and SZCI, the Average_NYR of experiment E not only far exceeds the results of C1, C3, C5, and C10, but also exceeds those of the four traditional machine learning models. In terms of individual stock profit rates, only stock 600,601, with a yield rate of 412.25%, exceeds the average result of the four traditional machine learning models. However, considering that SSCI and SZCI are more representative, that the differences between individual stocks are relatively large, and the results of the previous classification metrics, it is believed that LSTM and GRU are better than traditional machine learning models at classifying time series data. These results once again confirm the superiority of our labeling method.

The net yield rate curves for SSCI and SZCI of the four traditional machine learning models and the two deep learning models are shown in Figures 4 and 5, respectively. The curves of experiment E were almost always above all other curves at each time point, indicating that the profitability of experiment E was higher than that of the other experiments. In fact, only the net yield rate of C5 in subfigure d of Figure 4 was similar to the corresponding result of experiment E. In terms of the net yield rate to maturity, the corresponding results of experiment E were still the best. The results of subfigure h in Figure 4 showed that the poor performance of SVM in C1, C3, C5, and C10 was due not only to the difference in variety, but also to differences in algorithm models and parameters.
At the same time, it was noticed that when the stock indexes SSCI and SZCI rose as a whole, each strategy made different levels of profit, while when SSCI and SZCI fell as a whole, each strategy showed a reduced net yield rate. Therefore, a model that could better predict the trend could buy the corresponding target when the market trend was about to rise and sell it in advance when the market trend was about to fall. The experimental results showed that the models based on the labeling method proposed in this paper achieved better results than the traditional labeling methods for time series trend prediction, demonstrating that the proposed method was superior at locating the trend characteristics of financial time series data.
Figure 5. NYR curves of SSCI and SZCI. (a,b) show the results of LSTM and GRU on SSCI, respectively; (c,d) show the results of LSTM and GRU on SZCI, respectively. The X axis is the date and the Y axis is the net yield rate. "BAH" is short for the "buy-and-hold" strategy. The figure shows the cumulative rate of return of the different experiments at each time point.


Conclusions
This paper proposed a novel data labeling method called CTL to extract the continuous trend features of financial time series data. In the feature preprocessing stage, this paper proposed a new method that avoids the look-ahead bias encountered in traditional data standardization or normalization processes. Then, an automatic labeling algorithm was developed to extract the continuous trend features of financial time series data, and the extracted trend features were used in four supervised machine learning methods and two deep learning models for financial time series prediction. The experiments performed on two stock indexes and three stocks demonstrated that CTL was superior to state-of-the-art data labeling methods in terms of classification accuracy and several other metrics. Furthermore, the net yield rate obtained by the strategies built on the financial time series predictions of CTL was much higher than that of the other labeling methods, and far exceeded that of the buy-and-hold strategy, which represents the maturity return of the index itself.