A New Trend Pattern-Matching Method of Interactive Case-Based Reasoning for Stock Price Predictions

Abstract: In this paper, we suggest a new case-based reasoning method for stock price prediction that uses the knowledge of traders to select similar past patterns among the nearest neighbors obtained from a traditional case-based reasoning machine. This method thus overcomes a limitation of conventional case-based reasoning, which does not consider how the retrieved neighbors resemble the target in terms of their graphical pattern. We show how the proposed method can be used when traders find similar time series patterns among the nearest cases. To this end, we suggest an interactive prediction system in which traders apply their individual knowledge to select similar patterns from the neighbors automatically recommended by case-based reasoning. We demonstrate how traders can use a graphical interface to select a similar pattern that serves as an exemplar for the target. These concepts are investigated against the backdrop of a practical application involving the prediction of three individual stock prices, i.e., Zoom, Airbnb, and Twitter, as well as the prediction of the Dow Jones Industrial Average (DJIA). The prediction results are verified against a random walk model based on the RMSE and Hit ratio. The results show that the proposed technique is more effective than the random walk model, although it does not statistically surpass it.


Introduction
Case-based reasoning (CBR) is a popular methodology in knowledge-based systems that uses similar past problems to solve new problems [1,2]. Many data mining methods such as regression, ARIMA (autoregressive integrated moving average), k-NN (k-nearest neighbors), and SVM (support vector machine) have been applied to stock price prediction. Recently, deep learning techniques such as LSTM and RNN have also been extensively applied to the task of predicting financial variables [3,4]. However, there is a paucity of research on stock prediction using k-NN or CBR techniques [2].
In this paper, we suggest a new case-based reasoning method for stock price prediction that uses the knowledge of traders to select similar past patterns among the nearest neighbors returned by a traditional case-based reasoning machine. Conventional CBR relies on a Euclidean distance to find the nearest neighbors; it does not consider how similar those neighbors are to the target case when previous trends and the current target trend are compared graphically, and it may not reflect the direction of stock prices or the political and economic factors at work during the relevant period. The proposed method overcomes these limitations. We develop a distance measurement for retrieving neighbors based on that of Chun and Ko [2]. We then suggest an interactive prediction system in which users apply their individual knowledge to select similar patterns among the automatically recommended nearest neighbors by comparing them with the target time series; the selected cases are then given new weights in producing the prediction. This paper presents how CBR can be applied to stock prediction using a user-interactive CBR that selects similar cases to serve as exemplars for the target.
These concepts are investigated against the backdrop of a practical application involving the prediction of a stock market index.
The rest of this paper is organized as follows. Section 2 reviews CBR as a knowledge discovery technique. Section 3 introduces the proposed technique, which is called interactive CBR. Section 4 presents the case study. Section 5 discusses the results of the study. Finally, the concluding remarks are presented in Section 6.

Case-based Reasoning
Case-based reasoning (CBR) is an approach for solving a new problem by remembering a previous similar situation and by reusing information and knowledge of that situation [3]. This concept assumes that similar problems have similar solutions, so CBR is an appropriate method for a practical domain focused on real cases rather than on rules or knowledge to solve problems. The retrieval step can be stated as follows.
Step 1. Begin with the current case x_t. Seek the J neighboring cases in the past that are closest to x_t according to the distance function D ≡ d[x_t, x_i].
One of the issues in using conventional CBR is how to find the optimal neighbors and how to choose the window size of the target data in a time series. Chun and Park [9] suggested a model to dynamically find the optimal neighbors for each target case. Chun and Ko [2] proposed a new similarity measure, termed a shape distance, which compares how similar the rise and fall signs of a target case and of its possible neighbors are to each other.
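The sign-based comparison described above can be sketched in code. The snippet below is a minimal illustration of a shape-style distance that counts disagreements between the rise/fall signs of a candidate window and the target; the exact measure of Chun and Ko [2] may differ in weighting and normalization, so this is an assumption-laden sketch, not their published formula.

```python
import numpy as np

def shape_distance(target, candidate):
    """Count positions where the rise/fall sign of the candidate
    window disagrees with the target window (illustrative sketch
    of a sign-comparison measure; not the exact published metric)."""
    t_sign = np.sign(np.diff(target))
    c_sign = np.sign(np.diff(candidate))
    return int(np.sum(t_sign != c_sign))

def nearest_neighbors(target, history, window, k):
    """Slide a window over the history and return the start indices
    of the k windows with the smallest shape distance to the target."""
    scores = [
        (shape_distance(target, history[i:i + window]), i)
        for i in range(len(history) - window + 1)
    ]
    scores.sort()
    return [i for _, i in scores[:k]]
```

For example, a window whose rises and falls match the target exactly has distance zero and is ranked first, regardless of the absolute price levels.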

Interactive Case-Based Reasoning and the Time Series Pattern-Matching Method
In this paper, we propose a user-interactive selection method that selects the nearest neighbors by comparing their graphical patterns with the target case. In financial forecasting problems, the Hit ratio of a stock prediction can be an important decision tool for investing money in the stock market. We developed a distance measurement for retrieving neighbors based on that of Chun and Ko [2].
Several machine learning algorithms, such as deep learning, consume too many computing resources to be used at the web front-end. It is also impossible to filter the dataset that the user requires because the results fetched from the back-end have already been preprocessed. Interactive CBR can select similar graphical patterns among the neighbors that a traditional CBR machine recommends. Time series data may be characterized by patterns of behavior in terms of the volatility of rises and falls together with trading volumes. Thus, neighbors with similar price trends can be chosen by visually comparing the automatically recommended neighbors with the target case. Figure 2 shows a configuration diagram of the interactive CBR system and presents the procedure for selecting the nearest neighbors using interactive CBR. The procedure for reselecting the nearest neighbors recommended by traditional CBR is as follows. The server crawls stock data, collecting it from websites such as Yahoo Finance and other financial information intermediaries. When a user accesses the server through the client, the server sends the stock price data, which have not been processed separately, to the client. To configure the CBR machine, the user sets a few CBR-related options such as the learning period, the number of neighbors, and the size of the time series (window size). The data are then processed in the client and displayed to the user, who reviews the corresponding neighbors that the CBR machine has recommended. The user selects the patterns that are similar to the target pattern and obtains the predicted value using the reselected neighbors.
Figure 3 shows four neighbors that the CBR machine has recommended. The first two neighbors are somewhat different from the target case, whereas the last two neighbors look similar to the target case. Thus, interactive CBR finally chooses the last two cases as the nearest neighbors for the stock price prediction.
The interactive CBR system has the advantage that many users can retrieve the processing results with only a small amount of server computation because the actual calculation is performed at the front-end client, even when many users access the server. In addition, if the user has sufficient computing power, the CBR machine can return the prediction result in less time than the network communication would take. Thus, a user can quickly obtain new results from the CBR machine after changing the model options.
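The reselection step described above can be sketched as follows. This is a simplified illustration, not the paper's exact weighting scheme: the trader keeps a subset of the machine-recommended windows, and the forecast is the average of the values that followed the kept windows. The function and parameter names are illustrative assumptions.

```python
import numpy as np

def predict_from_selection(history, neighbor_starts, selected, window):
    """After the CBR machine recommends neighbor windows (start
    indices in neighbor_starts), the trader keeps only the indices
    in `selected`; the forecast is the average of the next values
    following the chosen windows. A sketch of the reselection idea;
    the paper applies new weights to the selected cases instead."""
    kept = [s for s in neighbor_starts if s in selected]
    if not kept:  # fall back to all machine recommendations
        kept = list(neighbor_starts)
    next_values = [history[s + window] for s in kept
                   if s + window < len(history)]
    return float(np.mean(next_values))
```

In this sketch, an empty user selection falls back to the plain CBR forecast, so the interactive step can only refine, never discard, the machine's recommendation.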

The Data
This case study intended to investigate the effect of the proposed technique on predictive performance in forecasting a stock market. The case study involved the prediction of three individual stocks, i.e., Zoom, Airbnb, and Twitter, as well as the Dow Jones Industrial Average (DJIA).

Model Construction
Exploratory plots for the raw data series of the three individual stock prices and the Dow Jones Industrial Average (DJIA) are shown in Figures 4-7. In constructing the predictive model for the three individual stock prices and the DJIA, the input variables were first transformed. For financial variables, stationarity can often be obtained through a logarithmic and differencing operation [10]. Thus, a differencing procedure was performed. For example, the opening value at time t (Open_t) could be transformed to dlOpen_t (lOpen_t − lOpen_t−1) through a logarithmic and differencing procedure. Other input variables such as High_t, Low_t, and Close_t were also transformed to dlHigh_t, dlLow_t, and dlClose_t, as shown in Figure 8. These variables could then be used by the prediction engine of interactive CBR to produce a predicted value of dlClose_t. Finally, a predicted value of the closing price at t + 1 (pClose_t+1) was obtained through a de-transforming procedure that applies the predicted dlClose value back to the previous actual closing price at t (Close_t), reversing the logarithmic and differencing transformation. Figure 8 presents an overview of the preprocessing and postprocessing for producing a prediction value with interactive CBR.
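One common way to realize this preprocessing is sketched below. The paper phrases the postprocessing as applying the predicted change back to the previous closing price; for a log-differenced series, the exact inverse multiplies the previous close by the exponential of the predicted log difference, which is the convention assumed in this sketch.

```python
import numpy as np

def log_diff(series):
    """Log-difference transform: dlX_t = log(X_t) - log(X_{t-1})."""
    logs = np.log(np.asarray(series, dtype=float))
    return logs[1:] - logs[:-1]

def detransform(prev_close, predicted_dlclose):
    """Invert the log-difference to recover a price forecast:
    pClose_{t+1} = Close_t * exp(predicted dlClose_{t+1})."""
    return prev_close * np.exp(predicted_dlclose)
```

For instance, a predicted log difference of ln(1.1) applied to a close of 100 recovers a forecast of 110, confirming that the two functions are exact inverses.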

Results of the Study and Discussion
The performance results of the predictive models, i.e., the random walk (RW) method and interactive CBR (ICBR) with the proposed neighbor selection method, are presented in Tables 1-4. (For much of this century, the random walk model of stock prices has served as a pillar of accepted wisdom in financial economics. One implication of the random walk model is that obvious patterns in the economy are already incorporated in the valuation of stock prices and financial markets; this is the rationale behind technical analysis, which forecasts stock prices based solely on variables pertaining to the market itself.) The CBR model performance was evaluated using the RMSE and HR (Hit ratio). Table 1 summarizes the RMSE results when the data were not preprocessed; these raw data were used directly for the CBR prediction. CBR with raw data did not produce an improved performance compared with the RW in any combination of the models. Table 2 summarizes the RMSE results after the data were preprocessed. CBR with preprocessed data outperformed the RW, and the model performed best when the number of neighbors was two and the window size was sixty. Figure 9 shows these results using a heat map graph. To compare the difference in predictive power between the preprocessed and the non-preprocessed data, preprocessing was performed with a log and differencing operation. Among the CBR results without preprocessing, none had superior predictive power to the RW, whereas the preprocessed results showed enhanced predictive power compared with the RW in a specific section.
Figure 9. Heat map comparison of the performances of the models between the raw data and the preprocessed data.

Table 3 summarizes the RMSE results for the stock price prediction of Airbnb with the preprocessed data. CBR with preprocessed data outperformed the RW, and the model performed best when the number of neighbors was two and the window size was thirty. Figure 10 shows these results using a heat map graph.
Table 4 summarizes the RMSE results for the stock price prediction of Twitter with the preprocessed data. CBR with the preprocessed data outperformed the RW, and the model performed best when the number of neighbors was twenty and the window size was one hundred and twenty. Figure 11 shows these results using a heat map graph.

Figure 11. Heat map comparison of the performances of the models between the raw data and the preprocessed data for the stock price prediction of Twitter.

Table 5 summarizes the RMSE results for the Dow Jones Industrial Average (DJIA) prediction with the preprocessed data. CBR with the preprocessed data outperformed the RW, and the model performed best when the number of neighbors was ten and the window size was five. Figure 12 shows these results using a heat map graph.
Figure 12. Heat map comparison of the performances of the models between the raw data and the preprocessed data for the DJIA price prediction.

Figures 9-12 show several interesting results. In the case of Airbnb and Zoom, the predictive performances were superior to the RW with a small number of neighbors, whereas in the case of Twitter, the predictive power was notable when the number of neighbors was larger. In general, superior models with a low RMSE were seen when the window sizes were 5, 10, 30, 60, and 120. The window size also seemed to be a factor of greater importance than the number of neighbors in the Dow Jones Industrial Average prediction. Table 6 presents the results of the RMSE and the t-test for the difference in performance between the RW and CBR methods. The CBR models seemed to surpass the RW model; however, they did not exhibit a statistically significant performance difference in terms of the RMSE.

Table 6. Root mean squared error (RMSE) and pairwise t-tests for the best models.

Figures 13-16 show heat maps of the Hit rates, i.e., the proportion of correct forecasts, for the prediction of the three stock prices and the Dow Jones Industrial Average (DJIA) in the test data. The Hit rates show how effectively CBR predicted the direction of the price changes for the closing prices of these three stocks and the DJIA. The areas colored in yellow indicate the models where CBR had a superior performance to the RW.
For the Hit ratio prediction, many models showed superior results to the RW model regardless of the window size and the number of neighbors. Figures 13, 15, and 16 (the Zoom and Twitter stock prices as well as the Dow Jones Industrial Average prediction, respectively) show that the CBR models outperformed the RW, whereas the RW outperformed the CBR models for Airbnb in Figure 14. Compared with the Airbnb models, the predictive performances of the models for Zoom, Twitter, and the Dow Jones Industrial Average prediction outperformed the RW model in many combinations of window size and number of neighbors. The predictive power of the Airbnb model seemed to be relatively low due to the lack of training data, Airbnb having only recently been listed on the stock exchange.
Table 7 summarizes the Hit rates, i.e., the proportion of correct forecasts, for the best models shown in Table 6. The Hit rates show how effectively CBR predicted the direction of the price changes for the closing prices. Figures 13-16 show that several models had clearly highlighted results. For a consistent comparison, we tested the Hit ratio of the best models in Table 6, i.e., those that showed the best RMSE performances. Table 7 indicates that CBR seemed to be more effective than the RW model in terms of the Hit ratio. We tested the null hypothesis H0; however, the proposed CBR did not produce a statistically improved performance at a level of p < 0.1. The test statistic is z = (p_1 − p_2) / sqrt(p(1 − p)(1/n_1 + 1/n_2)), where p_i are the sample proportions, π_i are the population proportions, n_i are the sample sizes for the groups, and p is a pooled estimate of the proportion of success in a sample of both groups, p = (n_1 p_1 + n_2 p_2)/(n_1 + n_2).
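The evaluation metrics and the pooled two-proportion test can be sketched as follows; the function and variable names are illustrative, not from the paper, and the Hit ratio convention (comparing the predicted and actual direction of change from the previous actual close) is an assumption consistent with the description above.

```python
import math
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted series."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def hit_ratio(actual, predicted):
    """Proportion of steps where the predicted direction of change
    (relative to the previous actual close) matches the actual one."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    actual_dir = np.sign(np.diff(a))
    pred_dir = np.sign(p[1:] - a[:-1])
    return float(np.mean(actual_dir == pred_dir))

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic with
    p = (n1*p1 + n2*p2) / (n1 + n2), as in the text."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se
```

For example, comparing Hit rates of 0.6 and 0.5 over 100 test days each yields a z statistic of about 1.42, below the usual p < 0.1 threshold (z ≈ 1.645), matching the kind of non-significant result reported here.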

Generally speaking, the performance of CBR was superior to that of the RW, but the difference was not statistically significant. The reason we did not obtain a statistically significant result seemed to be that the periods of the training data and the test data were insufficient, as one company had only recently been listed on the stock exchange. For the DJIA prediction, the period of the training data, which started in January 2015, was longer than that of the other three stock prices, but the test period was the same. Although the predictive power was greater than that of the RW, the statistically verifiable data size was small and thus did not produce a significant result.

Concluding Remarks and Future Work
In this paper, we proposed interactive CBR for selecting similar patterns among neighbors that case-based reasoning recommended. Concepts were investigated against the backdrop of a practical application involving the prediction of the individual stock prices of Zoom, Airbnb, and Twitter as well as the Dow Jones Industrial Average. The results of the case study are summarized as follows:

•
The best model of the proposed technique was more effective than the random walk model.

•
The proposed method did not surpass the random walk model without preprocessing, whereas it outperformed the random walk model in terms of the RMSE and Hit ratio after preprocessing (such as logarithms and differencing).

•
In the case of Airbnb and Zoom, the predictive performances were superior to the random walk model with a small number of neighbors, whereas in the case of Twitter, the predictive power was notable when the number of neighbors was large.

•
In general, superior models with lower RMSEs were seen when the window sizes were 5, 10, 30, 60, and 120. The window size was a factor with a greater importance than the number of neighbors in the Dow Jones Industrial Average prediction.

•
For the Hit ratio prediction, many models showed superior results to the random walk model regardless of the window size and number of neighbors. Compared with the Airbnb models, the predictive performances of the models for Zoom and Twitter as well as the Dow Jones Industrial Average prediction outperformed the random walk model in many combinations of window size and number of neighbors.

•
The proposed method was not seen to statistically surpass the random walk model in terms of the RMSE and Hit ratio. The reason seemed to be that the statistically verifiable data size was small due to one of the companies we tested only recently being listed on the stock market exchange.
The proposed method, therefore, showed the potential to enhance predictability. Thus, in future research, we propose a two-step filtering method that selects similar patterns among the neighbors that a CBR machine recommends. Interactive CBR could also be implemented with an automatic filtering method that uses the selected similar patterns without human expert knowledge, which would improve the predictability of interactive CBR.