Forecasting Net Income Estimate and Stock Price Using Text Mining from Economic Reports

: This paper proposes and analyzes a methodology for forecasting movements of analysts' net income estimates and of stock prices. We achieve this by applying natural language processing and neural networks to analyst reports. In a pre-experiment, we applied our method to extract opinion sentences from analyst reports while classifying the remaining parts as non-opinion sentences. Then, we performed two additional experiments. First, we employed our proposed method to forecast the movements of analysts' net income estimates by inputting the opinion and non-opinion sentences into separate neural networks. Besides the reports, we inputted the trend of the net income estimate to the networks. Second, we employed our proposed method to forecast the movements of stock prices. Consequently, we found differences between securities firms in whether analysts' net income estimates tend to be forecasted by opinions or by facts in analyst reports. Furthermore, the trend of the net income estimate was found to be effective for the forecast, in addition to the analyst report itself. However, in the experiments of forecasting stock price movements, the distinction between opinion and non-opinion sentences was not effective.


Introduction
According to the Japan Exchange Group's (JPX) research, the number of individual shareholders is rising in Japan (https://www.jpx.co.jp/markets/statistics-equities/examination/01.html). In particular, the number of individual investors in Japan reached 49.67 million in 2016 and increased further by 1.62 million in 2017, exceeding 50 million for the first time. The number of individual investors is expected to continue increasing. Recently, the stock prices of most companies have been on the rise due to the effects of Abenomics (Abenomics refers to the economic policies advocated by Japanese Prime Minister Shinzō Abe since the general election of December 2012) and the Olympic Games scheduled for 2020.
Investors need to examine a great deal of information to invest in target companies. However, the sources of information are diverse, and the process of collecting the information necessary for investment is complicated. On a company's website, there are various reports, namely financial statements, briefing materials for financial results, annual reports, and securities reports, on the investor relations page. Searching for a company's name with a search engine returns various news reports. Moreover, Internet message boards for financial markets contain investors' opinions on financial information and stock price movements. Furthermore, in recent years, people's comments on social networking sites, such as Twitter, Facebook, and Instagram, have been reflecting investor sentiment. Bollen et al. showed that mood states obtained from tweets are relevant for forecasting the Dow Jones Industrial Average (DJIA) [1]. Advances in computation help us refer to such information; on the other hand, it is becoming difficult for investors to find the information appropriate for their investments.
In this environment, it would be interesting to investigate whether the contents of analyst reports have predictive power for the future movement of a stock price. An analyst report is a report written by analysts to evaluate an individual company by considering news, press releases, stock valuations, and macroeconomic trends. Therefore, we consider analyst reports to subsume the information sources listed above. In this study, we analyzed the texts of analyst reports for forecasting trends of stock prices. In particular, we aimed at forecasting the sign of the stock price's excess return to the market and the extent of the stock price's volatility, both of which are crucial aspects of stock price trends.
Furthermore, we classified analyst reports by brokerage company and evaluated their effectiveness for each brokerage company, as the style and content of these reports depend on the company. We applied several word-embedding models developed from various resources; therefore, we experimented with a variety of different data. Figure 1 shows the flow of the experiments in this paper. We performed three experiments. In the first experiment, we formulated a model to extract opinion sentences using 2213 sentences from analyst reports (we refer to these sentences as analyst report sentences). Section 5 discusses this experiment; we consider it a pre-experiment for the remaining two. The proposed model distinguishes the opinion and non-opinion sentences in the analyst report set, which comprises 17,356 analyst reports. Second, we forecasted the analyst's revision of the net income estimate using the opinion and non-opinion sentences extracted from the analyst reports and the trends of the net income estimate. We show this experiment in Section 6.1. Third, we forecasted movements of excess returns and volatilities with the opinion and non-opinion sentences. Section 6.2 shows this experiment.

Figure 1. Flow of experiments in this paper. This paper focuses on three experiments: opinion sentence extraction, net income estimate forecast, and stock price forecast. The opinion sentence extraction is conducted as a pre-experiment.

Related Works
There are various studies on financial text mining for the prediction of financial markets [2]. Bollen et al. showed that mood states obtained from tweets are relevant for forecasting the DJIA [1].
They applied OpinionFinder and G-POMS to extract seven public moods from tweets and applied self-organizing fuzzy neural networks for forecasting; consequently, they could predict rises and drops with an accuracy of more than 80%. They found that mood states in terms of positive or negative mood are not effective for forecasting, but those labeled "Calm" are. Schumaker et al. proposed and analyzed a machine-learning method for forecasting stock prices by analyzing financial news articles [3]. Their model forecasted price direction and stock prices using the articles as a resource. Schumaker et al. later extended their approach with sentiment analysis [4]. They estimated stock prices after the release of financial news articles using an SVM. Koppel et al. proposed a method for classifying the news stories of a company according to their apparent impact on the performance of the company's stock price [5]. Low et al. proposed a semantic expectation-based knowledge extraction methodology for extracting causal relations, using WordNet as a thesaurus for extracting terms representing movement concepts [6]. Ito et al. proposed a neural network model for visualizing online financial textual data [7,8]. Their proposed model acquired word sentiment and its category. Milea et al. predicted the MSCI euro index (upwards, downwards, or constant) based on fuzzy grammar fragments extracted from reports published by the European Central Bank [9]. Wuthrich et al. predicted daily movements of five indices using news articles published on the Internet [10]. They constructed prediction rules from a combination of news articles, index values, and keywords, and found that textual information as bag-of-words, in addition to numeric time-series data, increases the quality of the input. Bar-Haim et al. proposed a framework for identifying expert investors and used it for predicting stock price rises from stock tweets by applying an SVM classifier [11].
They trained a classifier that directly learned the relationship between the content of a tweet and stock prices; a user whose tweets discriminated rises and falls of the stock price was identified as an expert. They then constructed a classifier trained only on the tweets of the identified experts. Guijarro et al. analyzed the impact of investors' mood on market liquidity [12]. They performed sentiment analysis of tweets related to the S&P 500 Index. Vu et al. proposed a method using a Decision Tree classifier to predict the daily price movements of four famous tech stocks [13]. They applied sentiment analysis, semantic orientation (SO), and the movements of previous days as features for tweets, and predicted with an accuracy of more than 75%. Oliveira et al. constructed sentiment and attention indicators extracted from microblogs and utilized them to predict daily stock market variables [14]. They tested five machine learning-based methods for financial tweet sentiment classification with the indicators. Zhang et al. proposed a context-aware deep embedding network to detect financial opinions behind texts extracted from Twitter [15]. They jointly learned and exploited user embeddings and the texts. Ranco et al. analyzed the effects of the sentiments of tweets about companies on the prices of the DJIA 30 by applying SVM [16]. They found a dependence between stock price returns and Twitter sentiments. Smailović et al. showed causality between the sentiment polarity of tweets and the daily return of closing prices [17]. The authors applied sentiment derived from an SVM model to classify the tweets into positive, negative, and neutral categories. Our proposed method uses a combination of several documents, such as analyst reports and the Wikipedia corpus, for forecasting stock price movements.
Regarding financial text mining for the Japanese language, Sakaji et al. proposed a method to automatically extract basis expressions that indicate economic trends from newspaper articles using a statistical method [18]. In addition, Sakaji et al. proposed an unsupervised approach to discover rare causal knowledge from financial statement summaries [19]. Their method extracted basis expressions and causal knowledge using syntactic patterns. Kitamori et al. proposed a method for extracting and classifying sentences indicating business performance forecasts and economic forecasts from summaries of financial statements [20]. This classification method was based on a neural network using a semi-supervised approach. Hirano et al. proposed a generalized scheme for selecting related stocks for themed mutual funds [21,22]. Their methodology used some Japanese documents, such as Japanese financial summaries, news articles, and webpages.
The financial text mining studies above each targeted a single kind of prediction. In contrast, our method uses movements of both the net income estimate and the stock price as the target data.

Data
In this section, we describe the procedure for collecting the data for the experiments. In Section 3.1, we present the analyst reports that we use for the experiments. In Section 3.2, we present the analyst net income estimate and its trend. In Section 3.3, we present the excess return and the volatility.

Analyst Reports
We use two types of analyst reports: analyst report sentences and analyst report set. The analyst report sentences comprise 2213 sentences and are randomly extracted from 10,100 analyst reports issued in 2017. We use the analyst report sentences to construct the extracting model for opinion sentences. The analyst report set comprises 17,356 reports, issued from January 2016 to February 2018. We distinguish opinion sentences from non-opinion sentences in these reports with the extracting model. We use these opinion and non-opinion sentences for net income estimate forecast (Section 6.1) and stock price forecast (Section 6.2).

Dataset for Net Income Forecast
In Section 6.1, we use the most recent trend in net income estimates and a change rate to forecast whether the analyst's net income estimate will be higher or lower than a threshold. We first calculate the estimated net income. Let NI(t) be the estimated net income of a brand at some point t, calculated as the net income estimate for the forward 12 months by blending the net income estimate of the current fiscal year with that of the next fiscal year. We apply this blending to prevent a jump when crossing the accounting period, which would occur if we used only the net income estimate of either the current or the next fiscal year. Consider the example as of 31 May 2020. For many March-settlement companies, analysts estimate the net income of the current fiscal year ending March 2021 and that of the next fiscal year from April 2021 to March 2022. Then, the net income estimate for the next 12 months is calculated by blending the 10-month portion of the current year's net income estimate with the 2-month portion of the next year's net income estimate. The net income estimate is calculated using Equation (1), where NI_cur is the net income estimate of the current fiscal year and NI_next is that of the next fiscal year.
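The 12-month-forward blend in the example above can be sketched as follows; the function name and the explicit month-count argument are illustrative, not the paper's notation:

```python
def forward_12m_estimate(ni_current, ni_next, months_left):
    """Blend the current- and next-fiscal-year net income estimates into a
    12-month-forward figure, weighting each by the months it covers.
    `months_left` is the number of months remaining in the current fiscal
    year as of the estimation date."""
    return (months_left / 12.0) * ni_current + ((12 - months_left) / 12.0) * ni_next

# As of 31 May for a March-settlement company, 10 months of the current
# fiscal year remain: 10/12 of the current estimate plus 2/12 of the next.
ni = forward_12m_estimate(ni_current=1200.0, ni_next=2400.0, months_left=10)  # ~1400
```

As the estimation date moves through the fiscal year, the weights shift smoothly from the current-year estimate to the next-year estimate, which is what prevents the jump at the period boundary.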
In addition, we calculate the most recent trend in the analyst's net income estimates. At some point t, let the trend be a change rate of the net income estimate NI(t) over the average of the net income estimates, such as the net income for the past three months (30 days ago, 60 days ago, and 90 days ago). This trend is represented in Equations (2) and (3).
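A minimal sketch of the trend calculation, assuming Equations (2) and (3) amount to the change rate of NI(t) over the average of the estimates 30, 60, and 90 days earlier:

```python
def estimate_trend(ni_now, ni_30d, ni_60d, ni_90d):
    """Change rate of the current estimate NI(t) relative to the average of
    the estimates 30, 60, and 90 days earlier."""
    avg = (ni_30d + ni_60d + ni_90d) / 3.0
    return (ni_now - avg) / avg

# A rising estimate yields a positive trend (10% above the 3-month average):
trend = estimate_trend(110.0, 100.0, 100.0, 100.0)
```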
To calculate the rate of change in the estimated net income, we consider the estimates 2, 4, 6, 8, 10, and 12 weeks after the publication date of the analyst report. For example, the rate of change two weeks after the publication date, FR(14), is calculated using Equation (4).
In this paper, the forecast periods are 2, 4, 6, 8, 10, and 12 weeks. In the experiment, we performed binary classification according to the rate of change of the analyst's net income estimate FR. The threshold of FR is calculated using Equation (5).
This is obtained using a linear approximation of the medians of the rates of change of the estimated net income for 2, 4, 6, 8, 10, and 12 weeks in training and validation data.
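A sketch of the change rate FR(d) and of fitting the threshold line through the medians by least squares; the median values below are hypothetical, and the exact form of Equation (5) may differ:

```python
def change_rate(ni_future, ni_now):
    """Rate of change FR(d) of the net income estimate d days after publication."""
    return (ni_future - ni_now) / ni_now

def threshold_line(medians_by_days):
    """Fit threshold(d) = a*d + b by least squares through the medians of
    FR(d) over the training and validation data, one median per horizon d."""
    xs = sorted(medians_by_days)
    ys = [medians_by_days[x] for x in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda d: a * d + b

# Hypothetical medians of FR for the horizons of 2, 4, ..., 12 weeks (in days):
thr = threshold_line({14: 0.001, 28: 0.002, 42: 0.003, 56: 0.004, 70: 0.005, 84: 0.006})
```

Each forecast horizon is then classified against `thr(d)`, so longer horizons get proportionally larger thresholds.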

Dataset for Stock Price Forecast
We collect stock prices from the publication date of each analyst report to the day two weeks later (14 days, i.e., 10 business days), together with the Tokyo Stock Price Index (TOPIX) for the same period. Analyst reports are issued after the close of trading because they have a huge impact on the market; the information in the reports is therefore incorporated into the market on the next day. For this reason, we obtain stock prices and TOPIX from the day after the publication date. Using these values, we calculate excess returns. Specifically, using the price of a brand on the issue date of the analyst report, C_0, the price 10 business days later, C_10, TOPIX on the issue date, T_0, and TOPIX 10 business days later, T_10, we calculate the excess return using Equation (6).
The excess return is used because the distribution of simple stock price returns can be skewed toward the positive side around 2017, when Japan was still in a long-term economic recovery. Moreover, for institutional investors, who are evaluated by performance relative to their benchmarks, the predictability of excess returns is important. We use 1 and 0 to represent positive and negative excess returns, respectively, for each analyst report; each analyst report usually provides information on one specific company. We calculate the excess returns and labels in the same way for 4 weeks (20 business days), 6 weeks (30 business days), 8 weeks (40 business days), 10 weeks (50 business days), and 12 weeks (60 business days). Table 1 shows the numbers of reports with positive or negative excess returns. In addition, we calculate the historical volatility of each stock's excess return to the market. The purpose of this paper is to examine whether we can retrieve information from analyst reports that is useful for investors aiming to beat the market. Therefore, we use excess returns to the market and volatilities of excess returns as the targets for the neural networks to forecast. We obtain stock prices and TOPIX index values, C_0, C_1, ..., C_9, C_10 and T_0, T_1, ..., T_9, T_10, for the 10 business days after the issue dates. Volatility is the standard deviation (SD) of the daily differences, expressed in Equation (7). We label with 1 the data whose absolute value of volatility is higher than the median and with 0 the data whose absolute value is lower than the median; the median level depends on the input data.
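A sketch of Equations (6) and (7) as described above; the daily-difference reading of Equation (7) is our interpretation, since the equation itself was not reproduced:

```python
from statistics import pstdev

def excess_return(c0, c10, t0, t10):
    """Equation (6): stock return minus TOPIX return over the window."""
    return (c10 / c0 - 1.0) - (t10 / t0 - 1.0)

def excess_volatility(prices, topix):
    """One reading of Equation (7): the SD of the daily stock-minus-TOPIX
    returns over the 10-business-day window."""
    diffs = [(prices[i] / prices[i - 1] - 1.0) - (topix[i] / topix[i - 1] - 1.0)
             for i in range(1, len(prices))]
    return pstdev(diffs)

# Stock up 5%, market up 2% over the window -> roughly +3% excess return:
er = excess_return(100.0, 105.0, 1000.0, 1020.0)
```

A stock that moves exactly with the market has zero excess return and zero excess volatility, which is the sense in which both targets are market-relative.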

Methodology
In this section, we introduce our proposed method, which uses neural networks. Figure 2 shows an overview diagram of our method. First, we construct 200-dimensional word embeddings [23]. The embedding is performed in two parts: decomposing sentences into words (the Japanese language does not have spaces between the words in a sentence) and converting each word into a vector, called a distributed representation. For the former part, we use MeCab (available at https://taku910.github.io/mecab/) with the dictionary mecab-ipadic-NEologd [24][25][26]. For the latter part, we use Global Vectors for Word Representation (GloVe) (available at https://nlp.stanford.edu/projects/glove/) [27].
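As a toy illustration of the second step (looking up a distributed representation for each token), the following sketch maps tokens to vectors with a dictionary. The zero-vector fallback for out-of-vocabulary words is our assumption, since the paper does not state its OOV handling, and the 3-dimensional vectors stand in for the 200-dimensional GloVe ones:

```python
def embed_sentence(tokens, vectors, dim=200):
    """Map each token produced by the tokenizer to its word vector; tokens
    not in the vocabulary get the zero vector."""
    zero = [0.0] * dim
    return [vectors.get(tok, zero) for tok in tokens]

# Toy 3-dimensional "embeddings" standing in for 200-dimensional GloVe vectors:
vecs = embed_sentence(["業績", "予想"], {"業績": [0.1, 0.2, 0.3]}, dim=3)
# vecs[0] is the known vector; vecs[1] is the zero vector for the OOV word.
```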
Figure 2. Overview of our method. Sentences are split into words using MeCab. Words are converted to word embeddings using GloVe. We input these word embeddings to a bidirectional LSTM or GRU. The outputs of the hidden layers are weighted by the attention mechanism. We input the weighted output to the MLP and softmax function. Then, the probability of each label is output.
Second, we input the word embeddings to recurrent neural networks (RNNs), which perform well in natural language processing tasks. Among RNN variants, the Long Short-Term Memory (LSTM) [28,29] and the gated recurrent unit (GRU) [30] show high performance; therefore, we employ these models for opinion sentence extraction. We use bi-directional LSTMs and GRUs: a common single-directional LSTM or GRU uses only past information for learning, whereas a bi-directional one uses both past and future context.
To align the sequence lengths, we pad inputs shorter than the longest sequence with 200-dimensional zero vectors. Between the LSTM or GRU layers and the multi-layer perceptron (MLP) layers, we place a self-attention mechanism. This helps us determine which parts the forecasting model stresses to make accurate forecasts. Hidden state vectors produced by the LSTM or GRU are propagated to the self-attention mechanism, whose outputs are propagated to the MLP layers. The last MLP layer outputs the probabilities of labels 1 and 0, and the label with the higher probability is adopted.
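The padding step can be sketched as follows; the function name is illustrative, and toy vectors stand in for the 200-dimensional embeddings:

```python
def pad_batch(batch, dim=200):
    """Right-pad every sequence of word vectors with dim-dimensional zero
    vectors so that all sequences match the longest one in the batch."""
    max_len = max(len(seq) for seq in batch)
    zero = [0.0] * dim
    return [seq + [zero] * (max_len - len(seq)) for seq in batch]

# Two toy "sentences" of 200-dimensional vectors, lengths 3 and 1:
padded = pad_batch([[[1.0] * 200] * 3, [[1.0] * 200]])
# Both sequences now have length 3; the shorter one ends in zero vectors.
```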
We describe our method for LSTM in detail. We define the LSTM that processes a sentence from its beginning as LSTM_f (forward) and the one that processes it from its end as LSTM_b (backward). For each input, our method computes h_i^f = LSTM_f(e_i) and h_i^b = LSTM_b(e_i), where n is the number of input words and e_i is the vector of the i-th input word. We define h_i as the concatenation of h_i^b and h_i^f: h_i = [h_i^b; h_i^f]. Here, h_i ∈ R^{2m}, where m is the number of units in the hidden layer. The attention weight α_i corresponding to h_i is calculated using Equation (11): α = softmax(u), with u = w^T tanh(W_h H + b_h). Here, H = (h_1, h_2, ..., h_n) is the matrix formed by concatenating the hidden-layer vectors, u ∈ R^n, w is a weight vector, W_h is a weight matrix, and b_h is a bias vector. We weight h_i by the attention weight α_i and calculate the output s of the attention mechanism as follows (Equation (12)): s = Σ_{i=1}^{n} α_i h_i. Here, s ∈ R^{2m}. Then, s is entered into the MLP layers in Equations (13) and (14): u = tanh(W_s s + b_s) and Y = softmax(W_u u + b_u). Here, u ∈ R^l, W_s and W_u are weight matrices, and b_s and b_u are bias vectors. l is the number of units in the middle layer of the MLP, and Y is the output layer, denoted as Y = (y_1, y_2). y_1 and y_2 each take a real value between 0 and 1, and their sum is 1 through the softmax function; they represent the probabilities of the classes. Finally, our proposed method selects the label having the maximum value in the output layer Y as the output. Figure 3 shows an overview diagram of our proposed method with inputs of opinion and non-opinion sentences. To extend the model of Figure 2 to these two inputs, we update Equations (12) and (13) so that the two attention outputs are combined before the MLP: s = [Σ_i α_i^{op} h_i^{op}; Σ_i α_i^{non} h_i^{non}] and u = tanh(W_s s + b_s). Here, h_i^{op} is a hidden layer of LSTM_opinion, which takes the opinion sentences as inputs, h_i^{non} is a hidden layer of LSTM_nonopinion, which takes the non-opinion sentences as inputs, and α_i^{op} and α_i^{non} are the corresponding attention weights.
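The attention pooling between the RNN and the MLP can be illustrated with a simplified sketch in which each hidden state is scored by a dot product with a context vector w, omitting the tanh(W_h H + b_h) projection of Equation (11); all names are illustrative:

```python
import math

def attention_pool(hidden_states, w):
    """Self-attention pooling between the RNN and the MLP: score each hidden
    state h_i against a context vector w, softmax the scores into weights
    alpha_i, and return s = sum_i alpha_i * h_i together with the weights."""
    scores = [sum(wj * hj for wj, hj in zip(w, h)) for h in hidden_states]
    mx = max(scores)
    exps = [math.exp(sc - mx) for sc in scores]  # numerically stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    s = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return s, alphas

# Three toy hidden states standing in for the 2m-dimensional h_i:
s, alphas = attention_pool([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], w=[1.0, 1.0])
# The third state scores highest, so it receives the largest weight.
```

Inspecting `alphas` is what lets the model show which parts of a report were stressed for a given forecast.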

Opinion Sentence Extraction (Pre-Experiment)
We use opinion and non-opinion sentences in our experiments of net income estimate forecast and stock price forecast. We formulate an opinion sentence extraction model using 2213 sentences (analyst report sentences) to distinguish between opinion and non-opinion sentences. In addition, we compare several word-embedding models formulated using various resources. Thus, we analyze which resource for a word-embedding model would be relevant for financial text mining.

Data
In this section, we introduce the analyst report sentences and the corpora used in this experiment. First, we extract 100 reports randomly from the 10,100 analyst reports issued in 2017. Then, we manually classify the 2213 sentences in these reports into opinion and non-opinion sentences. Here, an opinion sentence is defined as a sentence containing an analyst's forecast of a variable, such as a rating for future stock prices, forecasted sales or net earnings for the next year, or the background of current sales. In this research, a non-opinion sentence is a sentence about facts, such as past business results. Table 2 shows examples of opinion and non-opinion sentences. After manual tagging, 1188 sentences are labeled as opinion sentences, while the remaining 1025 sentences are labeled as non-opinion sentences. Table 2. Typical examples of opinion and non-opinion sentences in analyst reports. English follows Japanese.

Opinion: 2Q実績を踏まえ，業績予想を下方修正する． (We will revise our earnings forecast downwards based on 2Q results.)

In our experiments, we used the following five corpora to create the word embeddings. For the comparison methods, we create a list of all the words in all the sentences used, make for each sentence a 0 vector whose dimension equals the length of the list, and set to 1 the index in the list of each word appearing in the sentence (a bag-of-words representation).

Experiments
Regarding the task of learning to distinguish between opinion and non-opinion sentences, the inputs were the vectors of the words in a sentence. Among the 2213 sentences, we used 70%, 10%, and 20% for training, validation, and testing, respectively. We varied hyperparameters such as the type of RNN model, the number of epochs, the number of hidden layers of the RNN, the number of inner layers of the MLP, the mini-batch size, the learning rate, and the corpus. The RNN models used in this experiment are LSTM and GRU. We also performed this task using comparison methods, namely, a Linear Support Vector Machine (SVM) and Random Forest (RF). Table 3 shows the results for each model and corpus. Our method achieved the best result in this experiment using the corpus from the analyst report set. With this model, we split the sentences in the analyst report set (consisting of 17,356 reports) into opinion and non-opinion sentences. We used the corpus created from the analyst report set for the main experiments. Table 3. Results of opinion sentence extraction (pre-experiment). The evaluation index is Macro-F1. SVM and RF do not use word embeddings, so the corpus is indicated by a hyphen (-).


Experiments of Forecasting Net Incomes and Stock Prices
We performed two experiments: forecasting movements of analyst net income estimates and forecasting movements of stock prices. We present the results and discussion of these two experiments in the next two sections, respectively.

Forecasting Movements of Analyst Net Income Estimates
In this experiment, we forecasted the rise or fall of the analyst's net income estimates. We inputted the opinion and non-opinion sentences split in Section 5 and the trend described in Section 3.2. There are four types of inputs from the analyst reports:
• All sentences
• Only opinion sentences
• Only non-opinion sentences
• Opinion and non-opinion sentences separately
We inputted the trend into a hidden layer of the MLP. Among these reports, we used 64%, 16%, and 20% for training, validation, and testing, respectively.
To align the sequence lengths, we padded inputs shorter than the longest sequence with 200-dimensional zero vectors. To reduce the effect of padding, we limited the number of words in the input sentences: 530 words when inputting all sentences of the analyst reports, 370 words when inputting only opinion sentences, and 250 words when inputting only non-opinion sentences. When the number of words exceeded the criterion, we inputted the report from its beginning up to the criterion length. We set each criterion length so that 90% of the reports could be input without being cut in the middle. We inputted each type of analyst report by broker. That is, we prepared four types of inputs (only opinion sentences, only non-opinion sentences, both opinion and non-opinion sentences separately, and all sentences without the opinion/non-opinion distinction) for five brokers (Brokers A-E), which led to 20 types of input in total. In addition, we took long/short strategies: we took a long (buy) position in a stock whose rate of change of the net income estimate was forecasted to be higher than the threshold and a short (sell) position in one forecasted to be lower. We then calculated the excess return expected when each position is closed (sold back/bought back) after the forecasting period. We used PyTorch (version 1.3.1) for the implementation, optuna (version 0.19.0) for parameter selection, cross-entropy as the loss function, and Adam as the optimization algorithm. We also performed this task with the comparison methods, i.e., SVM and RF, and used two-sided p-values to compare the results statistically.
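The 90% criterion-length rule can be sketched as follows; the function name and the percentile-style computation are our assumptions about how such a cutoff would be derived from the corpus:

```python
import math

def criterion_length(report_lengths, coverage=0.9):
    """Smallest word-count cutoff such that `coverage` of the reports fit
    without truncation (how a 530/370/250-word limit could be derived)."""
    ordered = sorted(report_lengths)
    idx = math.ceil(coverage * len(ordered)) - 1
    return ordered[idx]

# With report lengths 1..10 words, 9 of the 10 reports (90%) fit at 9 words:
criterion_length(list(range(1, 11)))  # -> 9
```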

Forecasting Movements of Stock Prices
In this experiment, we performed the following three tasks: distinguishing between positive and negative excess returns, distinguishing between high and low volatilities, and the multitask of distinguishing positive or negative returns together with the broker that issued the report. In all three tasks, the conditions on the inputs of the analyst reports were the same as those in Section 6.1: we experimented with four input types and five brokers and limited the sequence length. We also performed these tasks with the comparison methods SVM and RF and used two-sided p-values to compare the results statistically.
In the multitask, we performed two distinctions simultaneously: the distinction between positive and negative excess returns and the distinction of brokers. Together with the two output probabilities of positive and negative returns (illustrated as numbers at the top of Figure 2), we had five outputs from the output layer of the MLP to distinguish the five brokers. We applied the softmax function to the five outputs and selected the broker with the highest probability.
In the distinction of the excess returns, we also took long/short strategies. When the excess return of a stock was forecasted to be positive, we took a long (buy) position; when it was forecasted to be negative, we took a short (sell) position. Table 4 shows the summary obtained by taking the average of the results by broker and input sentence. Table 5 shows the summary for each index and the results of the comparison methods. Figures 4-8 show the time-series excess returns of the long/short strategies with the results obtained by the brokers. Table 4. Summary by broker in forecasting analyst net income estimates with our method. Macro-F1 is the measure.

Tables 6 and 7 show the summary obtained by taking the average of the results by broker and input sentence. Tables 8 and 9 show the summaries with the results of the comparison methods. Figures 9-13 show the graphs of the excess returns gained by long/short strategies with the results obtained by the brokers. Table 10 shows the results of the multitask. Table 6. Summary by broker in forecasting excess returns with our method. Macro-F1 is the measure.


Forecasting Movements of Analyst Net Income Estimates
In Table 4, for Brokers A, D, and E, F1 was highest when inputting only non-opinion sentences. For Broker B, F1 was highest when inputting opinion and non-opinion sentences separately. For Broker C, F1 was highest when inputting only opinion sentences. Analysts at Broker C appear to revise their estimated incomes based on long-term views expressed as opinion sentences in their reports, while analysts at Brokers A, D, and E put more weight on facts, mainly released by the target companies, to revise their estimates. The basis of the forecast thus differs by broker. In this experiment, the distinction between opinion and non-opinion sentences was effective for forecasting. A high F1 of about 0.90 was obtained when inputting non-opinion sentences or all sentences of Broker D, whereas no result for Brokers A and E reached an F1 of 0.70; Broker D is therefore considered easy to analyze, while Brokers A and E are difficult. Compared with the baselines, most results of our method were clearly better than those of RF and of the RNN without the trend input in terms of p-values, and our method performed slightly better than SVM. As the accuracy was higher than that of the RNN without the trend input, adding the trend is effective. Moreover, an SVM inputting only the sentences of analyst reports also obtained a high F1, so the contents of analyst reports themselves were effective for the forecast. Further improvement in the F1 score could be obtained by analyzing both the analyst reports and the trend; for instance, constructing hierarchical attentions [31], instead of inserting the trend into a hidden layer of the MLP, would reveal how effective the trend and the analyst reports each are. The returns of the long/short strategy were more often positive than negative, but the graphs were not monotone.
In the results for Broker D, the inputs of non-opinion sentences, which have high F1 scores, increase monotonically in return, but those of all sentences, with relatively high F1 scores, fall below 0 in the 6th week. This indicates a low correlation between F1 scores and returns, which may be because the movement of a stock price is affected not only by the direction of the net income estimate but also by other factors, such as political and economic conditions.

Forecasting Movements of Stock Prices
The average F1 score for the excess return was 0.52 and that for the volatility was 0.60. There were no significant differences among the three input methods: inputting only opinion sentences, inputting only non-opinion sentences, and inputting opinion and non-opinion sentences separately. The distinction between opinion and non-opinion sentences was thus ineffective for forecasting stock price movements, although the best input type differed by broker. We could not find advantages of our method over the comparison methods in terms of p-values, so the superiority of our method was not established in this experiment. As discussed in Section 8.1, excess returns and volatilities contain random elements, as stock prices do, making them difficult even for humans to forecast; no method forecasted them accurately. The returns (Figures 9-13) did not show any notable features, such as a monotone increase overall.
In the multitask experiment, the F1 score did not considerably exceed 0.5, the value expected under random guessing. The multitask setting did not help because the issuing broker had no significant effect on forecasting stock price movements; forecasting stock price movements is already a difficult task, and jointly learning the issuing broker appears to have made it even harder.
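The multitask setup, predicting the excess-return direction and the issuing broker from a shared representation, can be sketched as hard parameter sharing with two output heads. The layer sizes and the assumption of five brokers (A-E) are illustrative, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def multitask_forward(x, W_shared, W_return, W_broker):
    """A shared hidden layer feeds two task heads: one predicts the
    excess-return direction, the other the issuing broker."""
    h = np.tanh(x @ W_shared)            # shared representation
    p_return = softmax(h @ W_return)     # head 1: up/down
    p_broker = softmax(h @ W_broker)     # head 2: broker A-E
    return p_return, p_broker

d_in, d_h = 16, 8                        # hypothetical sizes
W_shared = rng.normal(size=(d_in, d_h))
W_return = rng.normal(size=(d_h, 2))
W_broker = rng.normal(size=(d_h, 5))

p_ret, p_brk = multitask_forward(rng.normal(size=d_in),
                                 W_shared, W_return, W_broker)
```

Training would sum the cross-entropy losses of both heads; the result above suggests that when the auxiliary task (broker identification) is uninformative for the main task, sharing parameters can hurt rather than help.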

Conclusions
This study aimed at extracting unique information from analyst reports. We proposed a method to forecast analysts' net income estimates and movements of stock prices from opinion and non-opinion sentences extracted from analyst reports, using a combination of RNNs, an attention mechanism, and an MLP. Under the assumption that analysts' opinion sentences are effective for forecasting net income estimates and stock price movements, we first distinguished opinion from non-opinion sentences in the reports, achieving an F1 score above 0.8. In this pre-experiment, word embeddings trained on the corpus of analyst reports achieved the best performance.
Next, using the opinion and non-opinion sentences extracted from analyst reports, together with the trend of the analyst's estimate, we forecasted whether the analyst's net income estimate would be above or below a threshold. The best-performing input differed by broker: for some it was only opinion sentences, for others only non-opinion sentences, and for others opinion and non-opinion sentences input separately. This difference likely reflects whether a broker's analysts base their net income estimates on opinions or on facts; the basis of the estimates is thought to differ by broker. In forecasting estimated net incomes, the distinction between opinion and non-opinion sentences was effective, and dividing the inputs by broker was effective because of the differences observed between brokers. Inputting the trend, information outside the analyst reports, was also effective, as the F1 score of our method was higher than that of the method without the trend input. We also calculated returns under long/short strategies; however, their correlation with F1 was low.
Finally, we forecasted the movements of stock prices from the opinion and non-opinion sentences in analyst reports, using excess returns and volatilities as targets. The F1 scores were around 0.5 for excess returns and 0.6 for volatilities, and forecasting accuracy did not improve. We attribute this to the difficulty of forecasting from the current-situation analysis contained in analyst reports alone. Furthermore, we performed multitask learning, jointly learning the issuing broker and the sign of the excess return; however, the F1 score remained around 0.5, and no higher accuracy was obtained. In forecasting stock price movements, the distinction between opinion and non-opinion sentences was not effective. In this research, we focused mainly on opinion and non-opinion sentences; beyond these, we also obtained results broken down by other indices, such as week and broker, and future work can investigate the relationships among them.
In this study, we forecasted with analyst reports and the trend. However, other indices could be added not only at the hidden layer of the MLP but also at other points in the network. Analyst reports are expected to differ in their bases depending on the individual analyst who wrote them rather than the issuing broker; experiments grouped by analyst would therefore allow comparing results across analysts. Moreover, performance may be improved by restructuring the network to incorporate analyst information. Because the effect of an analyst report on stock prices varies with the analyst's reputation, adding such information may make it possible to account both for differences in reputation and for differences in what each analyst emphasizes as the basis of the forecast.