Detecting and Analyzing Politically-Themed Stocks Using Text Mining Techniques and Transfer Entropy—Focus on the Republic of Korea’s Case

Politically-themed stocks mainly refer to stocks that benefit from the policies of politicians. This study gave the empirical analysis of the politically-themed stocks in the Republic of Korea and constructed politically-themed stock networks based on the Republic of Korea’s politically-themed stocks, derived mainly from politicians. To select politically-themed stocks, we calculated the daily politician sentiment index (PSI), which means politicians’ daily reputation using politicians’ search volume data and sentiment analysis results from politician-related text data. Additionally, we selected politically-themed stock candidates from politician-related search volume data. To measure causal relationships, we adopted entropy-based measures. We determined politically-themed stocks based on causal relationships from the rates of change of the PSI to their abnormal returns. To illustrate causal relationships between politically-themed stocks, we constructed politically-themed stock networks based on causal relationships using entropy-based approaches. Moreover, we experimented using politically-themed stocks in real-world situations from the schematized networks, focusing on politically-themed stock networks’ dynamic changes. We verified that the investment strategy using the PSI and politically-themed stocks that we selected could benchmark the main stock market indices such as the KOSPI and KOSDAQ around political events.


Introduction
Thematic stocks generally refer to a group of stocks related to a particular topic and exist in various areas, including politics, science, technology, entertainment, environment, health care, and resource development. Thematic stocks in the same theme have the characteristic of synchronizing their stock price movements in the same direction. Investors invest in thematic stocks expecting high returns in the short term rather than stable returns from a long-term perspective. For this reason, thematic stocks' price movements are generally decided by stock-related news or investors' investing behaviors rather than by firms' performance or financial status. In particular, stocks tied up with political themes are called politically-themed stocks, which are generally benefited from policies and politicians. By this definition, there can be two types of politically-themed stocks: policy-related politicallythemed stocks and politician-related politically-themed stocks.
First, policy-related politically-themed stocks are defined as stocks that are expected to benefit from the policy direction of politicians or political parties. In the case of Joe Biden, the 46th U.S. president recently elected, environmental issues were cited as the most significant issue: he signed an executive order to rejoin the U.S. into the Paris climate agreement. As a result, companies in related industries such as solar, wind, and hydrogen power generation along with electric vehicles were expected to benefit as politically-themed stocks, since those industries can be supported by his eco-friendly policies. Likewise,

Research Period
In this study, the research period was set from 1 January 2016, to 30 June 2020. We selected this period because all three primary elections in Korea were held at intervals of a year, and unique political situations (impeachment of the Eighteenth President) were included in this research period. The five major political events that occurred during the period in Korea are in Table 1. We used web search data and included search volumes and associated search keywords in this study. First of all, we selected 12 politicians in Korea for our research subjects. These selected 12 politicians with experience in presidential elections or politicians who had been mentioned as strong candidates in presidential elections were selected in the research periods [1][2][3][4][5][6][7][8][9]. Daily search volume data of politicians in the research period were originated from Naver Trends and Google Trends. As we mentioned before, we selected those two search engines because their search shares in Korea occupied over 85% within the research period. All 12 chosen politicians were de-identified using letters A to L randomly because this paper's purpose was not to comment on a particular politician or to encourage the sale of stocks for a specific company. The brief information of 12 selected politicians are in Table 2.

Strongly Mentioned as a Presidential Candidate from the Press
We noted that the larger the theme-related search volume data, the more these thematic stocks are usually known as leading stocks. Therefore, we searched through keywords such as 'politically-themed stocks of politician A' or 'politician A's politically-themed stocks'. Using collected search volume results based on those keywords, we designated the politician-related stocks whose search volumes were more than 1% of the search volume of most searched politician-related stocks as the politically-themed stock candidates from the search results. For example, if the search volume of the most searched politicianrelated stocks is 200,000, then the politician-related stocks whose search volumes are over 2000 are designated as the politically-themed stock candidates. As a result, we obtained 189 politically-themed stock candidates. A range of 10 to 21 stocks was allocated as politicalthemed stock candidates for 12 politicians each. As mentioned before, 189 politicallythemed stock candidates were de-identified, just as politicians. According to the politicianrelated search volumes, they were numbered in descending order-for example, the stock of the seventh most searched politically-themed stock of politician A is labeled A-7. Using de-identified numbers, we observed whether the actual quantity of causal relationships or network measures are related to the politician-related search volumes. Figure 1 summarizes the number of politically-themed stock candidates for each of the politicians and the ratio of each stock's politician-related search volume to the most searched stock's politician-related search volumes. randomly because this paper's purpose was not to comment on a particular politician or to encourage the sale of stocks for a specific company. The brief information of 12 selected politicians are in Table 2. We noted that the larger the theme-related search volume data, the more these thematic stocks are usually known as leading stocks. Therefore, we searched through keywords such as 'politically-themed stocks of politician A' or 'politician A's politicallythemed stocks'. Using collected search volume results based on those keywords, we designated the politician-related stocks whose search volumes were more than 1% of the search volume of most searched politician-related stocks as the politically-themed stock candidates from the search results. For example, if the search volume of the most searched politician-related stocks is 200,000, then the politician-related stocks whose search volumes are over 2000 are designated as the politically-themed stock candidates. As a result, we obtained 189 politically-themed stock candidates. A range of 10 to 21 stocks was allocated as political-themed stock candidates for 12 politicians each. As mentioned before, 189 politically-themed stock candidates were de-identified, just as politicians. According to the politician-related search volumes, they were numbered in descending order-for example, the stock of the seventh most searched politically-themed stock of politician A is labeled A-7. Using de-identified numbers, we observed whether the actual quantity of causal relationships or network measures are related to the politician-related search volumes. Figure 1 summarizes the number of politically-themed stock candidates for each of the politicians and the ratio of each stock's politician-related search volume to the most searched stock's politician-related search volumes.

Text Data
We randomly collected another 35,019 articles from the political section of Naver News and Google News from 2015 to 2020 and labeled them with sentiment lexicons. We labeled sentiment values of those articles 1 if the summation of lexicons in an article is positive and −1 if the summation of lexicons in an article is negative. We assumed that each article has only one sentiment in one article: positive or negative. After labeling those data, we used them as training data for conducting sentiment analysis. Additionally, we collected 216,843 text data related to 12 politicians in training days of research periods from the political section of Naver News and Google News. To collect the politician-related text data, we collected the text data including the politicians' names. Using the trained model, we conducted sentiment analysis using these collected text data for calculating the daily sentiment score of 12 politicians.

Abnormal Return Data
In this study, we verified politically-themed stocks using the methods that try to calculate politicians' positive and negative reputation quantitatively in the form of PSI and checked causal relationships to abnormal returns of their politically-themed stocks. Accordingly, articles, editorials, and comments related to politicians were selected as text data for performing sentiment analysis to avoid strong biases of unfiltered documents in private spaces such as social network services and blogs. Then, we performed sentiment analysis of those collected text data and calculated the PSI for politicians within the research period. Using the calculated PSI, we verified causal relationships from the ROC of the PSI to abnormal returns of politically-themed stock candidates. Then, the stocks with statistically significant causal relationships were selected as politically-themed stocks.
In this study, we used abnormal returns of stocks to reduce a stock market's influence and focus on politically-themed stocks' performances on the individual level. The abnormal return of an asset is the subtraction of the expected return derived from the asset pricing model from the historical return. To calculate the abnormal returns of stocks, we collected the closing price data of 189 politically theme stock candidates in the research period and computed abnormal returns using the market model (MM).
MM was introduced by Brown and Warner [14] to examine the properties of daily stock returns and how particular characteristics of these data affect event study methodologies. Various finance studies have used this model to derive individual stocks' abnormal returns [9,15]. The equation is as follows: In Equation (1), E(r i,t ) is the return of stock i on day t and E(r m,t ) is the expected daily return to the market portfolio of risky assets on day t. α i and β i are the intercept and the slope of the fitted line derived from linear regression results. We estimated the values of α i and β i using ordinary least squares (OLS) estimators in the original research. Therefore, ε i,t is the error term (a random variable) with expectation zero and finite variance. Additionally, ε i,t is uncorrelated to the market return E(r m,t ) and firm return E(r i,t ) with i = j, homoskedastic and not autocorrelated. The daily returns of Korean Securities Dealers Automated Quotations (KOSDAQ) and Korea Composite Stock Price Index (KOSPI) indices are used as E(r m ) in this study. As a result, we can derive abnormal returns with estimated coefficientsα i andβ i from Equation (2). In Equation (2), r i,t is the actual return of stock i on day t and r m,t is the actual return of the market index on day t.

Preliminary Analysis
We used the abnormal return dataset derived from the market model for measuring causal relationships from ROC of the PSI to abnormal returns of politically-themed stocks. This dataset consists of 1101 daily abnormal return data of 189 politically-themed stock candidates. We checked elementary statistics and performed the Shapiro-Wilk test and Jarque-Bera test for normality, the Ljung-Box test for 10-order serial autocorrelation in squared returns. Additionally, the augmented Dickey-Fuller (ADF) test, Phillips and Perron (PP) test, and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for stationarity and the White test for homoskedasticity. The detailed results of abnormal return data are in Table A1 of Appendix A.
We summarized our diagnostic results in Figure 2. Figure 2 shows that our abnormal return data exhibit departure from normality based on the Shapiro-Wilk test and Jarque-Bera test. The two test results show that our data do not meet normality regardless of sample sizes at all three significance levels at the rate 0.1, 0.05, and 0.1. Additionally, all abnormal return data also turned out stationary according to ADF, PP, and KPSS tests at all three significance levels. We chose the ADF test results as the representative of those three stationary tests for brevity. In Figure 2, the bar charts mean that the number of stocks that are satisfied with the statistical properties of data that are written at the left side at the significance level 0.1, 0.05, and 0.01 each.  Table  A1 of Appendix A. We summarized our diagnostic results in Figure 2. Figure 2 shows that our abnormal return data exhibit departure from normality based on the Shapiro-Wilk test and Jarque-Bera test. The two test results show that our data do not meet normality regardless of sample sizes at all three significance levels at the rate 0.1, 0.05, and 0.1. Additionally, all abnormal return data also turned out stationary according to ADF, PP, and KPSS tests at all three significance levels. We chose the ADF test results as the representative of those three stationary tests for brevity. In Figure 2, the bar charts mean that the number of stocks that are satisfied with the statistical properties of data that are written at the left side at the significance level 0.1, 0.05, and 0.01 each.

Sentiment Analysis
Sentiment analysis is a text-mining methodology that analyzes the attitude or inclination of writing or talking to identify sentiments on a particular subject, usually the text data. It is mainly used to determine the positive and negative opinions of texts such as articles, movie reviews, or posts from social network services.
From the aspect of tasks and applications, sentiment analysis has been applied in broad areas such as subjectivity classification, polarity determination, multilingual and cross-lingual sentiment analysis, cross-domain sentiment analysis, opinion spam detection, corpora creation, opinion word, aspects extraction, etc. Concerning the finance domain, researchers have used sentiment analysis to predict the movements of stock prices, support decisions, and anticipate risks.
We assumed that sentiments within one document extracted from articles, editorials, comments, and posts, were probability-distributed to calculate the PSI using the sentiment analysis. We used a machine learning-based approach for performing sentiment analysis implementation: the long short-term memory (LSTM) [16] model. LSTM is one of the recurrent neural network (RNN) techniques that are well-suited to time-series data,

Sentiment Analysis
Sentiment analysis is a text-mining methodology that analyzes the attitude or inclination of writing or talking to identify sentiments on a particular subject, usually the text data. It is mainly used to determine the positive and negative opinions of texts such as articles, movie reviews, or posts from social network services.
From the aspect of tasks and applications, sentiment analysis has been applied in broad areas such as subjectivity classification, polarity determination, multilingual and cross-lingual sentiment analysis, cross-domain sentiment analysis, opinion spam detection, corpora creation, opinion word, aspects extraction, etc. Concerning the finance domain, researchers have used sentiment analysis to predict the movements of stock prices, support decisions, and anticipate risks.
We assumed that sentiments within one document extracted from articles, editorials, comments, and posts, were probability-distributed to calculate the PSI using the sentiment analysis. We used a machine learning-based approach for performing sentiment analysis implementation: the long short-term memory (LSTM) [16] model. LSTM is one of the recurrent neural network (RNN) techniques that are well-suited to time-series data, presented by Hochreiter and Schmidhuber, and has been usually used in language-based machine learning and speech recognition. LSTM can be used to categorize or predict future data with macroscopic attention to historical data and has been used as one of the most commonly used machine learning models to conduct sentiment analysis of Korean texts in recent years [17,18]. Especially, the bidirectional LSTM (BiLSTM) model is used in this Entropy 2021, 23, 734 9 of 43 study because BiLSTM is known as the proper model for sentiment analysis using the Korean language [19,20].
LSTM generates positive rates and negative rates of documents. Using the sentiment analysis results, the PSI of politicians can be defined as in Equation (3) so that the search volume data and sentiment analysis results can be considered simultaneously.
In Equation (3), PSI i, t is the PSI of the politician i at time t, P j i, t is the positive rate and N j i, t is the negative rate of the j-th document of politician i at time t from sentiment analysis results which hold P j i, t + N j i, t = 1. V i, t means the number of documents related to politician i used in analysis at time t, and SV i, t is the T-score of search volume data of politician i at time t. We converted search volume data to T-score because their rate of changes (ROC) are too extreme to use. We also transformed original related terms of P j i, t and N j i, t from scale [−1, 1] to [0, 1] for the convenience of calculation. We obtained the daily PSIs of politicians by calculating (3) and computed their rate of changes to demonstrate causality between the abnormal returns of politically-themed stock candidates. Using the property that P j i, t + N j i, t = 1, we can convert the PSI to the rightmost expression of (3). We obtained the daily PSI of politicians by calculating (3) and computed their ROC to demonstrate causal relationships between abnormal returns of politically-themed stock candidates.

Transfer Entropy
To select and analyze politically-themed stocks, we needed to measure causal relationships quantitatively. However, general causal relationships represented by Granger causality [21] required some assumptions of data like normality, stationary, and linearity. However, the results of the preliminary analysis show that the abnormal returns of politically-themed stock candidates were not satisfied with normality. This is similar to the known natures of return-based data of stocks, in that they do not usually satisfy these properties [22][23][24]. Therefore, we tried to utilize econophysics and information theories, which can be used without the assumptions mentioned above. From those theories, we can consider not only linear relationships but also non-linear relationships between objectives to measure causal relationships [25,26]. Accordingly, we used the concept of transfer entropy (TE) proposed by Schreiber [27], which are the entropy-based measures. In detail, we used effective transfer entropy (ETE) using Shannon entropy in this study.
TE is a non-parametric measure for verifying the information transfer amount between two variables based on Shannon entropy. Due to its feature to quantify causal relationships within systems and to identify source variables and target variables efficiently, TE has been spotlighted and widely used not only in the information or physics field but also in neuroscience, electrical engineering, chemical engineering. TE has been widely used to determine causal relationships between financial assets and markets in finance [28]. Specifically, general stock market indices, exchange rates, stock price, sector index, and cryptocurrency has been researched using TE. In the 2000s, Marschinki and Kantz [29] reported the causal relationship between the German DAX Xetra Stock Index (DAX) and Dow Jones Industrial Average. Kwon and Yang [30] showed the directionality of the information transfer and found that the market indices influence individual stocks in the US stock market. After the 2000s, Dimpfl and Peter [31] analyzed the causal relationship of the credit default swap market relative to the corporate bond market for the pricing of credit risk and the dynamic relation between market risk and credit risk proxied by the VIX and the iTraxx Europe from the perspective of pre-crisis, crisis and post-crisis periods based on ETE.
Moreover, Sandoval [32] used ETE to examine the causal relationship among 197 worldwide financial companies. Sensoy et al. [33] investigated the strength and direction of information flow between exchange rates and stock prices in several emerging countries using ETE. Based on TE, Bekiros et al. [34] investigated the network dynamics in US equity and commodity markets, and Lim et al. [35] analyzed the information flow between industrial sectors in credit default swaps and stock markets in the US based on TE from the aspects of intra-structures and inter-structures. Recently, Jang et al. [36] studied the causal relationship between Bitcoin, gold, S&P 500 index, and US dollars using TE, and Yue et al. [37,38] analyzed information transfers between stock market sectors in China and compared between the US and China stock markets. These prior studies can support our idea to use TE as the measure of causal relationships.
Based on the entropy concepts mentioned earlier, conditional entropy can be defined as the expected value of the entropies of the conditional distributions that are averaged over the conditioning random variable. Especially, the conditional entropy of a discrete random variable Y given a discrete random variable X can be expressed as: Based on the above definition, we can define the general form of (k, l)-history TE between two discrete time series X t and Y t for x (k) t = (x t , .., x t−k+1 ) and y (l) t = y t , .., y t−l+1 . The general (k, l)-history transfer entropy can be expressed as follows [28]: Y→X (t) is nonnegative, and for stationary processes we can drop t, which is the time dependency argument. Accordingly, TE  of the credit default swap market relative to the corporate bond market for the pricing of credit risk and the dynamic relation between market risk and credit risk proxied by the VIX and the iTraxx Europe from the perspective of pre-crisis, crisis and post-crisis periods based on ETE. Moreover, Sandoval [32] used ETE to examine the causal relationship among 197 worldwide financial companies. Sensoy et al. [33] investigated the strength and direction of information flow between exchange rates and stock prices in several emerging countries using ETE. Based on TE, Bekiros et al. [34] investigated the network dynamics in US equity and commodity markets, and Lim et al. [35] analyzed the information flow between industrial sectors in credit default swaps and stock markets in the US based on TE from the aspects of intra-structures and inter-structures. Recently, Jang et al. [36] studied the causal relationship between Bitcoin, gold, S&P 500 index, and US dollars using TE, and Yue et al. [37,38] analyzed information transfers between stock market sectors in China and compared between the US and China stock markets. These prior studies can support our idea to use TE as the measure of causal relationships.
Based on the entropy concepts mentioned earlier, conditional entropy can be defined as the expected value of the entropies of the conditional distributions that are averaged over the conditioning random variable. Especially, the conditional entropy of a discrete random variable Y given a discrete random variable X can be expressed as: Based on the above definition, we can define the general form of (k, l)-history TE between two discrete time series X t and Y t for x t (k) = (x t , . . , x t−k+1 ) and y t (l) = (y t , . . , y t−l+1 ). The general (k, l)-history transfer entropy can be expressed as follows [28]: where i = {x t+1 , x t (k) , y t (l) }. TE Y→X (k,l) (t) is nonnegative, and for stationary processes we can drop t, which is the time dependency argument. Accordingly, TE Y→X (k,l) (t) is the information about the future state of X i which can be obtained by subtracting information retrieved from only X t (k) from information gathered from both X t (k) and Y t (l) . The schematic representation of transfer entropy is in Figure 3.  In this study, we focused on the TE under the following conditions of two lags k = l = 1, which is commonly selected because these settings about lags can be safely assumed on the weak form of the efficient market hypothesis (EMH) and the random walk behavior of stock prices [34,36]. Then, we can express the equation of (1,1)-history TE as follows: where i = {x t+1 , x t , y t }.
However, TE has a disadvantage of finite size effects caused by samples. To complement this demerit, we used ETE in this study. To calculate the effective transfer entropy between two time-series data, we shuffled each element of time-series randomly and designated it as the new time-series sets. Then, we obtained the transfer entropy from this selected time-series called randomized transfer entropy (RTE) and subtracted RTE from the originally calculated TE to eliminate noises. This procedure destroys the time-series dependencies of Y and the statistical dependencies between Y and X. Then, the result of this subtraction is ETE, which is as follows. To assess the statistical significance of estimated TE values, we adopt a Markov block bootstrap method proposed in Dimpfl and Peter [31] and tried an average of 100 shuffles from each calculation of ETE. With time delay k and l, the ETE from two random variables X ., Y t−l+1 ) can be defined as: where Y (i) refers to the shuffled series of Y for all t. We used ETE twice in this study. First, we used ETE to verify causal relationships from ROC of PSI to abnormal returns for selecting politically-themed stocks of politicians. Second, we demonstrated ETE to derive a causal relationship between selected politically-themed stocks to construct each politician's politically-themed stock network.
To calculate transfer entropy, the data needed to be discretized; thus, we used histogrambased estimation since histogram-based estimation is one of the most common discretization methods for continuous random variables to calculate transfer entropy [29][30][31][32][33][34][35]. We used the histogram-based estimation to calculate entropy measures. In particular, we considered histograms with equally spaced bins. For a continuous random variable, the bin width is determined by selecting an appropriate number of bins in the sample range using mean squared error. Due to their robustness of non-Gaussian data, we decided to use Doane's equation [39] and the Freedman-Diaconis rule [40]. These two models can complement each other to set a histogram's bin width and the number of bins in a histogram.

Complex Network Analysis
Complex network analysis is usually used to describe a high degree of interdependence between objects. From this point of view, there can be various network analysis applications to financial systems. Most of the existing studies using network theories concentrate on analyzing correlation, financial stability, and contagion phenomena. Moreover, most of the financial network studies researched network effects rather than network formation [41]. Recently, several papers were published in new research areas such as market analysis, social networks, investment decisions, investment banking, and microfinance. We analyzed politically-themed stock networks on the network level and node level to identify causal influence in politically-themed stock networks.
We used network density (ND) and average ETE to analyze and summarize network dynamics through many periods. These network-level measures have been generally used to analyze stock markets [42][43][44][45] and can effectively illustrate the states of politicallythemed stock networks.
(1). Network Density (ND) The network density (ND) can be defined as the number of edges K to the number of possible connections in a network with N edges. The idea of network density comes from the binomial coefficient. In sum, the ND of the directed graph can be expressed as: Entropy 2021, 23, 734 12 of 43 (2). Average Effective Transfer Entropy To compare the ETE between politically-themed stock networks, we introduced a measure that describes the average ETE. Let w ij be elements of N × N weight matrix Then, the average value is defined as: We compared the strength of causal relationships at the network level with this measure.
There are many kinds of node-level measures to analyze networks. Especially, the concept of centrality is important in node-level network analysis. There are many types of centrality measures, and we focused on centrality measures that indicate direct influences of politically-themed stocks. For this reason, we considered centrality measurements considering the intuitive influence of individual nodes. In other words, degree centrality (DC), node strength (NS), and PageRank (PR) are the methods used to analyze politicallythemed stock networks based on ETE [46]. To compare with other politically-themed stock networks' results, we used the normalized versions of node-level measures.

(3). Degree Centrality
In unweighted directed networks, degree centrality represents the total number of edges through which a node is connected with other nodes. However, nodes of weighted directed networks have either ingoing edges, outgoing edges, or both. Generally, a node with many other nodes from it is called a hub, and a node with many other nodes pointing at it is called an authority. Therefore, the degree centrality is analyzed in two measures in weighted directed networks: in-degree (DC in ) and out-degree (DC out ). In this study, we divided two original degree centrality measures by N − 1 to compare with other politicallythemed stock networks on the same scale as we mentioned. In sum, DC in and DC out of the j-th stock can be defined as (9) and (10), respectively: (11) where a ij are the elements of the adjacency matrix A ∈ R N×N . For the adjacency matrix A, if there is a link from the stock i to the stock j, a ij = 1 and a ij = 0 otherwise. Additionally,

(4). Node Strength
In general, node strength is the sum of the weights of links connected to the node. In weighted directed networks, the in-strength is the sum of inward link weights, and the out-strength is the sum of outward link weights. NS represents the influence features in politically-themed stock networks and can be calculated in two measures: in-strength (NS in ) and out-strength (NS out ) in the same as degree centrality. We also divided it by N − 1 to compare it with other politically-themed stock networks. NS in , and NS out of the j-th stock can be defined as (12) and (13), respectively: (13) where w ij are the elements of weight matrix

(5). PageRank
PageRank [47] is an algorithm used by Google Search to rank web pages in their search engine results and is one of the famous eigenvector-based metrics that consider all network paths to determine node importance. PageRank has been introduced to rank web pages from the web graph initially. PageRank improved the disadvantages of similar metrics such as the eigenvector centrality and the Katz centrality. PageRank defines a link analysis method for a directed network to evaluate a user's influence. In finance, PageRank is recently used as the systematic measure [48,49], and stock market analysis [50,51]. Suppose there are N stocks and A ∈ R N×N denotes the adjacency matrix for the politically-themed stock network. In mathematical terms, we can obtain PageRank values of politicallythemed stocks as the form of the column vector r ∈ R N [52]: In Equation (14), I ∈ R N×N is the identity matrix, 1 ∈ R N is the column vector whose elements are all one, and is the number of outward edges started from node i. α is the damping factor, which ranges between zero and one. We set α = 0.85, which is the original study's conventional value [47]. We use the power method to approximate eigenvalues and calculate PageRank values of politically-themed stock networks.

Empirical Analysis
In this section, we will select and analyze politically-themed stocks empirically. First, we will calculate and select politically-themed stocks of politicians in Sections 3.1 and 3.2. Then, in Section 3.3, we will construct politically-themed stock networks for verifying leading stocks, which is the first known characteristic of politically-themed stocks in Korea. In Section 3.4, we will analyze the politically-themed stocks' price changes from the perspective of information flows and network dynamics. After checking the price change of politically-themed stocks, we will develop an investment strategy using politically-themed stocks and back-test the developed strategy. These final two sections are for finding out the second characteristic of politically-themed stocks in Korea: the strong price fluctuation with positive CAR effect.

Calculating the Politician Sentiment Index (PSI)
As we mentioned in 2.2.2, we randomly collected another 35,019 articles from the political section from 2015 to 2020, labeled −1 and 1 to them based on sentiment lexicons, and used them as training data. Additionally, a total of 216,843 text data related to 12 politicians were used as test data to calculate the PSI. Keyword search volume data are based on daily search volume data from Naver Trends and Google Trends.
We used a bi-directional LSTM model for conducting sentiment analysis. An LSTM layer with 128 internal units was used to train the machine-learning model. The activation function was a sigmoid function because it is the appropriate activation function for the binary classification problem, such as sentiment analysis. The maximum epoch, the optimizer, and the loss function were 50, Adam [53] with the 0.001 learning rate and binary cross-entropy, respectively. As a result of training the model, the model's accuracy after 20 epochs was 90.21%, using the early-stopping method for preventing overfitting. The final epoch was 20 by the result of using the early-stopping method. The accuracy and loss curves of the trained model can be found in Figure 4.
binary classification problem, such as sentiment analysis. The maximum epoch, the optimizer, and the loss function were 50, Adam [53] with the 0.001 learning rate and binary cross-entropy, respectively. As a result of training the model, the model's accuracy after 20 epochs was 90.21%, using the early-stopping method for preventing overfitting. The final epoch was 20 by the result of using the early-stopping method. The accuracy and loss curves of the trained model can be found in Figure 4. After combining the results of sentiment analysis and politicians' search volume data using Equation (3), we obtained the politicians' PSI. If all articles collected on a particular date are classified positively, they have a value of 1 at the date. Additionally, if all articles collected on a particular date are classified negatively, they have a value of 0 at the date. We illustrated the PSI in Figure 5 and marked out grey-colored bars, which represent before and after a month of main political events, which we aforementioned in Table 1. The saturation of the gray-colored bar means the political significance of political events. As we mentioned, the PSI of politicians belonging to left-wing parties is high overall, starting with the change of the impeachment of the Eighteenth President, which can be interpreted as the result of the Korean political regime's transfer to politicians belonging to left-wing parties. Taking the political power of politicians belonging to left-wing parties after the nineteenth presidential election, the overall strength continues, but it tends to fall slightly lower than the time around the impeachment of the Eighteenth President. After combining the results of sentiment analysis and politicians' search volume data using Equation (3), we obtained the politicians' PSI. If all articles collected on a particular date are classified positively, they have a value of 1 at the date. Additionally, if all articles collected on a particular date are classified negatively, they have a value of 0 at the date. We illustrated the PSI in Figure 5 and marked out grey-colored bars, which represent before and after a month of main political events, which we aforementioned in Table 1. The saturation of the gray-colored bar means the political significance of political events. As we mentioned, the PSI of politicians belonging to left-wing parties is high overall, starting with the change of the impeachment of the Eighteenth President, which can be interpreted as the result of the Korean political regime's transfer to politicians belonging to left-wing parties. Taking the political power of politicians belonging to left-wing parties after the nineteenth presidential election, the overall strength continues, but it tends to fall slightly lower than the time around the impeachment of the Eighteenth President. The dataset, which includes 12 politicians' ROC of the PSI, comprises 1101 rows fo each politician. Table 3 shows the descriptive statistics in accordance with the ROC of PS We checked elementary statistics and performed the Shapiro-Wilk test and Jarque-Ber test for normality, the Ljung-Box test with 10-order serial autocorrelation in squared re turns, and the White test for homoskedasticity, the same as the abnormal return data.
Our statistical test results for ROC of PSI show departure from normality based o the Shapiro-Wilk and Jarque-Bera test at all three significance levels. Additionally, all 1 ROC of PSI data were also stationary according to the augmented Dickey-Fuller (ADF test, Phillips and Perron (PP) test, and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tes We included the ADF test results as the representative of those three stationary tests fo the same reason in Section 2.2.4.
Moreover, we fitted the 12 ROC of PSI data to determine the best-fitting distributio The dataset, which includes 12 politicians' ROC of the PSI, comprises 1101 rows for each politician. Table 3 shows the descriptive statistics in accordance with the ROC of PSI. We checked elementary statistics and performed the Shapiro-Wilk test and Jarque-Bera test for normality, the Ljung-Box test with 10-order serial autocorrelation in squared returns, and the White test for homoskedasticity, the same as the abnormal return data.  (10), DF τ and LM is the test statistic of the Shapiro-Wilk test and Jarque-Bera test for normality, the Ljung-Box test with 10-order serial autocorrelation in squared abnormal return data, the augmented Dickey-Fuller test, and White test each. ***, **, * means that the null hypothesis of normality, no autocorrelation in residuals, non-stationary, and homoscedasticity is rejected at the 10 percent, 5 percent, and 1 percent significance level each.
Our statistical test results for ROC of PSI show departure from normality based on the Shapiro-Wilk and Jarque-Bera test at all three significance levels. Additionally, all 12 ROC of PSI data were also stationary according to the augmented Dickey-Fuller (ADF) test, Phillips and Perron (PP) test, and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. We included the ADF test results as the representative of those three stationary tests for the same reason in Section 2.2.4.
Moreover, we fitted the 12 ROC of PSI data to determine the best-fitting distribution among well-known continuous distributions. This experiment was conducted using Python 'SciPy' package's all available distributions and the criterion for selecting the best-fitting distribution is Kullback-Leibler divergence loss. We can find that all of the 12 ROC of PSI data are not normally distributed, and the results are in Table 4. These results give justification for our purpose to use TE once again. Table 4. Best-fit distribution in the rate of change (ROC) of PSIs.

Politician
Best-Fit Distribution

Selecting Politically-Themed Stocks
We measured the information flow between the PSI and politically-themed stock candidates, including hidden cause-effects based on the ETE with k = l = 1 case for assuming important hypotheses of stock markets aforementioned in Section 2.3.2.
In Table 5, there are the number and proportion of stocks that have ETE-based causal relationships from the ROC of PSI to abnormal returns. We inferred that ETE detected additional causal relationships between the ROC of PSI and abnormal returns from the number of stock candidates with statistically significant causal relationships in Table 5. From the results in Section 3.1, we chose politically-themed stocks that are statistically significant at α = 0.01, considering the robustness of the results. Additionally, we excluded stocks whose TE values were zero when they were rounded to the fourth decimal place. As a result, the 159 stocks, which are 84.13% of politically-themed stock candidates, were selected as the politically-themed stocks for constructing politically-themed stock networks. The selection results containing 12 politicians and their ETE values are shown in Figures 6  and 7, respectively. In sum, unlike known facts and qualitative selection methodologies of prior research, not all politically-themed stock candidates are selected as politicallythemed stocks based on this study's quantitative selection methodologies. Additionally, greater than or equal to 60% of politically-themed stock candidates were designated as politically-themed stocks of politicians, as shown in Figure 6. Additionally, the ratio of selected politically-themed stocks of politicians to politically-themed stock candidates of 12 politicians did not proportionate to the number of politically-themed stock candidates of politicians. Based on the above results, we conducted statistical analyses to measure correlations between the politically-themed stocks' de-identified numbers and the rank of ETE amounts of politically-themed stocks. As we mentioned, the politically-themed stock's deidentified number means the rank of politician-related search volumes in the particular politician's politically-themed stocks. We sorted the politically-themed stocks' ETE in descending order and ranked them to measure the correlation between the two ranks. We used the Spearman rank coefficient and Kendall rank correlation coefficient to identify the monotonic relationships between politician-related search volumes and ETE amounts. You can see the example of measuring two rank correlations measures between Politician H's politically-themed stocks' de-identified numbers and ETE rank values in descending order in Table 6.   Based on the above results, we conducted statistical analyses to measure correlations between the politically-themed stocks' de-identified numbers and the rank of ETE amounts of politically-themed stocks. As we mentioned, the politically-themed stock's de-identified number means the rank of politician-related search volumes in the particular politician's politically-themed stocks. We sorted the politically-themed stocks' ETE in descending order and ranked them to measure the correlation between the two ranks. We used the Spearman rank coefficient and Kendall rank correlation coefficient to identify the monotonic relationships between politician-related search volumes and ETE amounts. You can see the example of measuring two rank correlations measures between Politician H's politically-themed stocks' de-identified numbers and ETE rank values in descending order in Table 6. As a result, there are significant positive monotonic relationships between the ranks of politically-themed stocks and ETE, as shown in Figure 8. We may consider these results as evidence to support the idea that a politically-themed stock known to have a strong relationship with a politician can be more affected by the politician's reputation. In other words, the stocks whose politician-related search volumes are large and known as leading stocks of politically-themed stocks are strongly linked to politicians. As a result, there are significant positive monotonic relationships between the ranks of politically-themed stocks and ETE, as shown in Figure 8. We may consider these results as evidence to support the idea that a politically-themed stock known to have a strong relationship with a politician can be more affected by the politician's reputation. In other words, the stocks whose politician-related search volumes are large and known as leading stocks of politically-themed stocks are strongly linked to politicians.

Developing Politically-Themed Stock Networks
We illustrated politically-themed stock networks based on ETE using weighted directed networks. The number of statistically significant stocks generated at each significance level is given in Table 7. As a result, we finally illustrated politically-themed stock networks of 12 politicians with statistically significant ETE at significance level α = 0.01, which is demonstrated in Figure 9. We chose the significance level by referring to previous research results, including TE and ETE [23][24][25][26][27][28][29][30][31][32]. In Figure 9, the direction of arrows represents the causal link and the width of arrows represents the size of ETE. Additionally, the size of nodes was proportionate to the summation of two degree-centrality measures, and the width of edges was in proportion to the summation of the estimated ETE.
Based on Figure 9, we demonstrated network-level analysis, as mentioned in Section 2.3.3. No politically-themed stock is isolated in all politically-themed stock networks. This result means that all selected politically-themed stocks are linked to other politically-

Developing Politically-Themed Stock Networks
We illustrated politically-themed stock networks based on ETE using weighted directed networks. The number of statistically significant stocks generated at each significance level is given in Table 7. As a result, we finally illustrated politically-themed stock networks of 12 politicians with statistically significant ETE at significance level α = 0.01, which is demonstrated in Figure 9. We chose the significance level by referring to previous research results, including TE and ETE [23][24][25][26][27][28][29][30][31][32]. In Figure 9, the direction of arrows represents the causal link and the width of arrows represents the size of ETE. Additionally, the size of nodes was proportionate to the summation of two degree-centrality measures, and the width of edges was in proportion to the summation of the estimated ETE.          From the fact that more than three-quarters of causal relationships are statistically significant, we tried to analyze the impacts of individual stocks in politically-themed stocks in detail. We conducted the node-level network analyses, and their results are shown in Figures A1-A4 of Appendix B. We divided in-strength plots and out-strength plots due to problems in plotting two attributes that overlap.
From Figures A1-A4, we can identify the existence of leading stocks in politicallythemed stock networks from unequal values of in-degree, out-degree, in-strength, outstrength, and PageRank. Additionally, all in-degree and out-degree centrality values are over 0.5, which means that over 50% of possible connection exists. This result means these politically-themed stock networks can be considered as highly connected networks. These unequal network measure values imply that some price movements of politically-themed stocks influence other stocks' price movements. Especially, the stocks that have high outdegree, out-strength, and PageRank values can be regarded as leading stocks, which is one of the main characteristics of politically-themed stocks in Korea [6][7][8][9]. These leading stocks act as the leading role of politically-themed stocks in politically-themed stock networks. Table 8 includes the information of the politically-themed stocks which can be the leading stocks of politically-themed stock networks based on their out-degree, out-strength, and PageRank values. Table 8. The average rank correlation between node-level attributes and de-identified numbers of politically-themed stocks. Moreover, if we define a stock whose abnormal returns are heavily influenced by other politically-themed stocks as the following stock, then the following stocks of politicallythemed stock networks are also included. The selection criteria of leading stocks and following stocks are that they meet the top five for all mentioned network measures in each politically-themed stock network. As a result, these selected leading stocks and following stocks have many intersections. In detail, 47.27% of selected leading stocks or following stocks based on network measures act as both leading and following stocks. These selected stocks can be interpreted as having very close relationships with other politically-themed stocks in their politically-themed stock networks. Additionally, if politically-themed stocks in a particular politically-themed stock network have very similar node strength and centrality measure values, no stock is classified as leading stocks or following stocks based on our criteria. In other words, if all the stocks at the top-ranked stocks of each network measure values are all different, there can be no leading stocks or following stocks in a politically-themed stock network. However, two or more stocks in all politically-themed networks are classified as leading stocks or following stocks. Therefore, this result indicates that these stocks play a very important role in the politically-themed stock network from the perspective of node-level network measures. Furthermore, these selected stocks could be considered as the first-ordered investigating stocks to check the violation of stock market regulations from the perspective of financial regulatory authorities.

Network Dynamics Before and after Political Events
In this section, we checked the network dynamics before and after political events to find evidence of politically-themed stocks' dramatic price changes around political events. We set up six experimental points. Five experimental points are the timepoints that five political events occurred in the research. The other experimental point is the control point in time to compare with the other five political event periods' results. We considered one control point in time and selected it on April 17, 2019, because the third Wednesday in April is Korea's official election period if there is an election to be held in that year. In sum, the research periods for conducting network dynamics analysis before and after political events are shown in Table 9. Various studies show that meaningful changes occurred before and after political events in Korea, like positive CAR before political events and losses after political events [6][7][8][9]54]. To identify politically-themed stock networks' change of dynamics, we used sliding window methods to detect politically-themed stock networks' network dynamics with the 10-day step size before and after political events. We set the range for observing their network dynamics as 60 trading days before and after political events with reference to prior studies [1][2][3][4][5][6][7][8][9]. Then, we developed three window sizes: a 20-day window, a 40-day window, and a 60-day window to examine the effects of politically-themed stocks from various angles by setting up different window sizes, shown in Figure 12 [9]. J J-1 J-2 J-13 J-20 J-1 J-4 J-20 K K-8 K-9 K-20 K-9 K-11 K-20 L L-1 L-6 L-12 L-6 L-8 L-11 L-12

Network Dynamics Before and after Political Events
In this section, we checked the network dynamics before and after political events to find evidence of politically-themed stocks' dramatic price changes around political events. We set up six experimental points. Five experimental points are the timepoints that five political events occurred in the research. The other experimental point is the control point in time to compare with the other five political event periods' results. We considered one control point in time and selected it on April 17, 2019, because the third Wednesday in April is Korea's official election period if there is an election to be held in that year. In sum, the research periods for conducting network dynamics analysis before and after political events are shown in Table 9.
Various studies show that meaningful changes occurred before and after political events in Korea, like positive CAR before political events and losses after political events [6][7][8][9]54]. To identify politically-themed stock networks' change of dynamics, we used sliding window methods to detect politically-themed stock networks' network dynamics with the 10-day step size before and after political events. We set the range for observing their network dynamics as 60 trading days before and after political events with reference to prior studies [1][2][3][4][5][6][7][8][9]. Then, we developed three window sizes: a 20-day window, a 40-day window, and a 60-day window to examine the effects of politically-themed stocks from various angles by setting up different window sizes, shown in Figure 12 [9].  As a result, we can obtain the insight that the average ETE values with 20-day windows are clearly larger than those of 40-day windows and 60-day windows based on Figure 13. We can obtain the insight that the average ETE values with 20-day windows show more volatile movements with stronger asymmetry information flows than others. Moreover, most average ETE values are gently decreasing before political events and increasing after political events. The decreasing trends before political events can be interpreted as As a result, we can obtain the insight that the average ETE values with 20-day windows are clearly larger than those of 40-day windows and 60-day windows based on Figure 13. We can obtain the insight that the average ETE values with 20-day windows show more volatile movements with stronger asymmetry information flows than others. Moreover, most average ETE values are gently decreasing before political events and increasing after political events. The decreasing trends before political events can be interpreted as the change from information flows due to leading politically-themed stocks to simultaneous movements from the perspective of daily abnormal returns. As a result, these trends of average ETE values can be elucidated that causal relationships go back to their normal states after political events. This result can be translated to the unique movements derived from investors expecting profits before and right after political events. These periods are commonly known as the most profitable periods [6][7][8][9]54].
after political events. The decreasing trends before political events can be interpreted as the change from information flows due to leading politically-themed stocks to simultaneous movements from the perspective of daily abnormal returns. As a result, these trends of average ETE values can be elucidated that causal relationships go back to their normal states after political events. This result can be translated to the unique movements derived from investors expecting profits before and right after political events. These periods are commonly known as the most profitable periods [6][7][8][9]54]. In conclusion, we can find out the trends for increasing ETE before and after political events. For the control period (Period 5), the results showed the opposite movement of the political events period or oscillated without apparent direction, making it hard to define. Therefore, the increasing ETE around political events may support those rapid stock movements, which is the characteristic of politically-themed stocks in Korea. This phenomenon can be translated that politically-themed stocks affected each other stronger and have more information flows than normal periods.

Investment Strategy Based on PSIs and Politically-Themed Stock Networks
Based on the previous papers about politically-themed stocks and the results of previous sections [1][2][3][4][5][6][7][8][9]54], we obtained the insight of dramatical price changes of politicallythemed stocks with a positive CAR effect of politically-themed stocks occurring around political events. Especially, the size of politically-themed stocks' positive CAR effect occurred more positively in Korea if the politician receives benefits from political events [6][7][8][9]. Based on these results, we developed an investment strategy using the PSI and politically-themed stock networks as the class of assets [55] to verify the possibility of benchmarking KOSPI and KOSDAQ, which are the major market indices of the stock markets in Korea. The strategy can be described as performing the following steps: Step 1: Check the daily ROC of the PSI and pick the largest increasing PSI.
Step 2: Select the politically-themed stock networks of the politician whose PSI are selected In conclusion, we can find out the trends for increasing ETE before and after political events. For the control period (Period 5), the results showed the opposite movement of the political events period or oscillated without apparent direction, making it hard to define. Therefore, the increasing ETE around political events may support those rapid stock movements, which is the characteristic of politically-themed stocks in Korea. This phenomenon can be translated that politically-themed stocks affected each other stronger and have more information flows than normal periods.

Investment Strategy Based on PSIs and Politically-Themed Stock Networks
Based on the previous papers about politically-themed stocks and the results of previous sections [1][2][3][4][5][6][7][8][9]54], we obtained the insight of dramatical price changes of politicallythemed stocks with a positive CAR effect of politically-themed stocks occurring around political events. Especially, the size of politically-themed stocks' positive CAR effect occurred more positively in Korea if the politician receives benefits from political events [6][7][8][9]. Based on these results, we developed an investment strategy using the PSI and politicallythemed stock networks as the class of assets [55] to verify the possibility of benchmarking KOSPI and KOSDAQ, which are the major market indices of the stock markets in Korea. The strategy can be described as performing the following steps: Step 1: Check the daily ROC of the PSI and pick the largest increasing PSI.
Step 2: Select the politically-themed stock networks of the politician whose PSI are selected in the first stage.
Step 3: Optimize the portfolio with the stocks in the selected politically-themed stock networks.
To apply this strategy, we assumed the daily rebalancing period and the weights of stocks were obtained from the solution of portfolio optimization problems to maximize the Sharpe ratio. Maximizing the Sharpe ratio in the portfolio optimization problem is one of the most commonly used portfolio optimization problems. The Sharpe ratio has advantages that consider the return and risk in the objective function simultaneously and can be computed directly from any observed time-series of returns regardless of additional information from stocks. The portfolio optimization problem that maximizes the Sharpe ratio is as follows [56,57]: where w ∈ R N is the column vector consists of the weight of assets, µ ∈ R N is the mean return vector, Σ∈ R N×N is the covariance matrix. The column vectors r f ∈ R N and 1 ∈ R N have elements all consisting of the risk-free rate and one, respectively. The first constraint means that the sum of portfolio weight is one, and the second one implies that short-selling is not allowed, which is hard to carry out by individual investors. We used the stock return data from before 20 trading days to calculate the average return vector and the covariance matrix. The South Korea 10 Years Government Bond was used as the risk-free rate [58]. We also considered transaction costs to prevent overestimating performance, critical to the daily rebalancing period portfolio optimization problem. In other words, we made an assumption that we sell all of our stocks, which we held at the before period, and buy new stocks when we rebalance our portfolio. This assumption was applied to the case that the selected politically-themed stock network is the same for several days in a row. Therefore, the cumulative return data of the portfolio are the lower bound of the portfolio's performance. We also optimized the portfolio from all politically-themed stocks without considering the PSI and to where their networks belong. Additionally, we considered the reformulation version considering computational times and achieving the convexity of the objective function [59]. The portfolio with our investment strategy can obtain additional profit over both market indices before and around the political events, as shown in Figure 14. Moreover, the portfolio's performance based on the PSI and politically-themed stock networks increased sharply in the presidential and legislative elections, which are the most important political events in Korea. In 2017, there were two significant political issues: the impeachment of the Eighteenth President and the nineteenth presidential election; we can identify the upward tendency in the pre-political event and the downward trend in the post-political event more accurately. Before and after the control point in time, the portfolio's performance using the PSI and politically-themed stocks was worse than any other portfolio. Despite downtrends after political events, the portfolio's final performance using the PSI and politically-themed stocks exceeded the cumulative returns of KOSPI and KOSDAQ through the whole research period.
ical event more accurately. Before and after the control point in time, the portfolio formance using the PSI and politically-themed stocks was worse than any other po Despite downtrends after political events, the portfolio's final performance using t and politically-themed stocks exceeded the cumulative returns of KOSPI and KO through the whole research period. Moreover, the portfolio of all politically-themed stocks without using the PSI a derived strategy always showed the lowest performance in almost all research p These results implicate that the investment in politically-themed stocks without an egy is precarious and fragile. This low performance is originated from the fact tha politically-themed stocks have small caps, and their financial stabilities and intrin ues are unverified.
After all, these results can verify the positive CAR effect before and after p events of prior studies [1][2][3][4][5][6][7][8][9]. Additionally, eccentric movements and changes of cally-themed stocks occurred before and after political events simultaneously. Th sults may be considered as evidence of the Korean stock markets' instability, and is to previous studies of politically-themed stocks in Korea from the perspective of t bility of the Korean stock market [6][7][8][9]54].

Discussion
In this study, we developed a novel method for selecting politically-themed by verifying information flows from positive and negative reputations of politic their stock returns. After selecting politically-themed stocks, we tried to verify tw acteristics of politically-themed stocks using politically-themed stocks: the existe leading stocks and the occurrence of positive CAR effect with strong price fluc around political events. Moreover, the portfolio of all politically-themed stocks without using the PSI and the derived strategy always showed the lowest performance in almost all research periods. These results implicate that the investment in politically-themed stocks without any strategy is precarious and fragile. This low performance is originated from the fact that most politically-themed stocks have small caps, and their financial stabilities and intrinsic values are unverified.
After all, these results can verify the positive CAR effect before and after political events of prior studies [1][2][3][4][5][6][7][8][9]. Additionally, eccentric movements and changes of politicallythemed stocks occurred before and after political events simultaneously. These results may be considered as evidence of the Korean stock markets' instability, and is similar to previous studies of politically-themed stocks in Korea from the perspective of the stability of the Korean stock market [6][7][8][9]54].

Discussion
In this study, we developed a novel method for selecting politically-themed stocks by verifying information flows from positive and negative reputations of politicians to their stock returns. After selecting politically-themed stocks, we tried to verify two characteristics of politically-themed stocks using politically-themed stocks: the existence of leading stocks and the occurrence of positive CAR effect with strong price fluctuation around political events.
We set the research period from 1 January 2016, to 30 June 2020. We selected this period because all three primary elections in Korea were held, and a unique political situation (impeachment of the Eighteenth President) is included in this research period. We selected 12 politicians with experience in presidential elections or politicians who had been mentioned as strong candidates in presidential elections for the research periods [1][2][3][4][5][6][7][8][9]. All 12 selected politicians were de-identified using letters A to L.
Using collected politician-related search volume data, we designated the stocks whose search volumes were more than 1% of the search volume of the most searched stocks as the politically-themed stock candidates from the search results. In sum, we obtained 189 politically-themed stock candidates. A range of 10 to 21 stocks were allocated as politicallythemed stock candidates for 12 politicians. Like politicians, 189 politically-themed stock candidates were de-identified.
After collecting data, we first calculated abnormal returns based on the market model. Each abnormal return dataset consisted of 1101 daily abnormal return data of 189 politicallythemed stock candidates. We checked statistical tests for abnormal return datasets. We summarized statistical test results in Figure 2, and they show that our abnormal return dataset exhibits departure from normality and non-stationarity.
We conducted sentiment analysis to calculate the PSI in this study. We randomly collected another 35,019 articles from the political section from 2015 to 2020, labeled them with sentiment lexicons, and used them as training data. Additionally, we designated 216,843 documents related to 12 politicians to the text data and performed sentiment analysis. An LSTM layer with 128 internal units was used to train the machine-learning model for sentiment analysis. The final accuracy of the machine learning model was 90.21%, as shown in Figure 4. After combining the results of sentiment analysis and politicians' search volume data, we finally obtained the politicians' PSI. The calculated PSI is shown in Figure 5. After calculating the ROC of the PSI, we conducted the same statistical tests on the abnormal return data, and found that all the ROC of the PSI data are non-Gaussian and stationary, as shown in Tables 3 and 4.
To measure the causal relationships between the two non-Gaussian data, we used TE calculated from histogram estimation. TE does not assume normality and linear autoregression between variables, which are the core assumptions of conventional economics [16,17]. This attribute makes it possible to test causal relationships between non-linearly interacting variables. Therefore, it has been widely used in many places to set apart the system's source variables and target variables [22,[54][55][56][57]. In particular, financial markets are complex systems that express collective phenomena based on the interacting individual agents, so TE can better detect inter-causal relationships than the Granger causality test [21,[60][61][62]. Especially, we used ETE in experiments because ETE is the complementary measure of transfer entropy in terms of finite-size effects caused by samples. We used the histogram-based estimation to calculate entropy measures After computing ETE from the ROC of the PSI to abnormal returns, we chose politicallythemed stocks that are statistically significant at α = 0.01 considering the robustness of results. Additionally, we excluded stocks whose TE values were zero when they were rounded to the fourth decimal place. As a result, the 159 stocks, which are 84.13% of politically-themed stock candidates, were selected as the politically-themed stocks for constructing politically-themed stock networks. The results of politicians' politically-themed stocks and their ETE are shown in Figures 6 and 7. As a result, contrary to conventional thought, not all politically-themed stock candidates are selected as politically-themed stocks based on this study's methodologies. Greater than or equal to 60% of politically-themed stock candidates are designated as politically-themed stocks of politicians, as shown in Figure 6. Additionally, the ratio of selected politically-themed stocks to politically-themed stock candidates did not proportionate to politically-themed stock candidates.
We illustrated politically-themed stock networks based on ETE using weighted directed networks, as shown in Figure 9. Then, we calculated the network density, average values, and frequency distributions to analyze and summarize politically-themed stock networks dynamically at the network level to obtain insights from the politically-themed stock networks. These network-level measures are generally used to analyze stock markets [32,41,42]. From the perspective of node-level network analyses, we focused on centrality measures, which measure politically-themed stocks' direct influence. For this reason, we considered centrality measurements considering intuitive influence to analyze node-level networks. In other words, in-degree, out-degree centrality, in-strength, out-strength, and PageRank are the methods used to analyze politically-themed stock networks at the node level [48,49]. The overall network density was 0.7981. In other words, 79.81% of all possible edges exist. It indicates that politically-themed stock networks based on ETE are a dense graph, and various causal relationships are included in politically-themed stock networks.
From Figures A1-A4, we can identify the existence of leading stocks in politicallythemed stock networks from unequal values of in-degree, out-degree, in-strength, outstrength, and PageRank. Additionally, almost all in-degree and out-degree values were over 0.5, which means that over 50% of the possible connections were connected. This result means politically-themed stock networks in this study can be considered as highly connected networks. Table 8 includes the information of the politically-themed stocks which can be the leading stocks of politically-themed stock networks based on their outdegree, out-strength, and PageRank values. The selection criteria are those that met the top five for all corresponding network measures in each politically-themed stock network. As a result, these selected leading stocks and following stocks have many intersections. In detail, 47.27% of selected leading stocks or following stocks based on network measures act as both leading and following stocks. These selected stocks can be interpreted as having very close relationships with other politically-themed stocks belonging to politically-themed stock networks. Additionally, these selected stocks could be considered as the first-order-ofinvestigation stocks to check the violation of stock market regulations from the perspective of financial regulatory authorities.
We can find out the trends for increasing ETE before and after political events. Additionally, we checked the network dynamics before and after political events for finding evidence of politically-themed stocks' dramatic price changes around political events. We checked those network dynamics using average ETE values of politically-themed stocks in the same politically-themed stock network with sliding window methods. After conducting analyses, we can obtain the insight that the average ETE values with 20-day windows show more volatile movements with stronger asymmetry information flows than others. Moreover, average ETE values are gently increasing after political events.
Based on the previous papers about politically-themed stocks and results of previous sections [1][2][3][4][5][6][7][8][9]54], we obtained the insight that the abnormal returns of politically-themed stocks occurred before and after political events. Especially, the size of politically-themed stocks' CAR occurred more positively in Korea if the politician receives benefits from political events [6][7][8][9]. From this insight, we developed the investment strategy using the PSI and politically-themed stock networks as the class of assets [55]. Using this investment strategy, we tried to benchmark the KOSPI and KOSDAQ, which are the major market indices of the stock market in Korea. As a result, the portfolio with our investment strategy can obtain additional profit than both market indices before and around the political events, as shown in Figure 14. Moreover, the portfolio's performance based on the PSI and politically-themed stock networks increased sharply in the presidential and legislative elections, which are the most important political events in Korea. Additionally, there are two significant political issues: the impeachment of the Eighteenth President and the nineteenth presidential election in 2017. We can identify the upward tendency in the prepolitical event and the downward trend in the post-political event more accurately in 2017. The optimal portfolio based on all politically-themed stocks without using the PSI and politically-themed stock networks almost always underperformed than other portfolios in the entire research period. In the control period, the portfolio's performance using the PSI and politically-themed stocks was worse than any other portfolio. Plus, the portfolio of all politically-themed stocks without using tge PSI and the derived strategy always showed the lowest performance in almost all research periods.

Conclusions
Politically-themed stocks are generally defined as stocks that are expected to benefit from policies and politicians. By this definition, there can be two types of politicallythemed stocks: policy-related politically-themed stocks and politician-related politically-themed stocks. Especially, politician-related politically-themed stocks are defined as stocks that are privately related to major stakeholders related to politicians. At this time, it means that the major stakeholders of a particular company have a form of relationship that has no relation with the politicians' policy direction. Instead of their policy direction, those stakeholders are connected to politicians based on blood ties, regional ties, academic ties, etc. In this research, we focused on the politician-related politically-themed stocks, not on the ones benefiting from the politicians' policies, but on the ones benefiting from politicians' connections. Most empirical studies of these politically-themed stocks have claimed that two representative characteristics of politically-themed stocks exist: leading stocks and the positive CAR effect.
This study attempted to select politically-themed stocks as information flows from the positive or negative reputation of politicians to abnormal returns in order to supplement the methodologies of existing research. In the process, politically-themed stocks were selected as the ones with statistically significant causal relationships from the positive or negative reputation of politicians. We then used the selected politically-themed stocks to validate the two characteristics: the presence of leading stocks and the occurrence of positive CAR effects around political events.
To select politically-themed stocks, we calculated the daily PSI, which means politicians' daily reputation using politicians' search volume data and politicians' sentiment analysis results. Using the calculated PSI, we determined politically-themed stocks based on causal relationships from ROC of the PSI to the abnormal returns derived from the market model between the politically-themed stock candidates using ETE after the above process. To examine the relationships between politically-themed stocks, we constructed politically-themed stock networks based on causal relationships using entropy-based approaches. Using the constructed politically-themed stock networks, we validate the two characteristics as we mentioned before based on the network measures and network dynamics.
The contributions of this study can be summarized as three aspects. First, most prior research determined the politically-themed stocks from the occurrence of positive CAR during the political event periods. However, these results are unclear as to whether the occurrence of abnormal returns is influenced by politicians' actions. Additionally, the stocks designated as politically-themed stocks in those studies were subjectively set by the researchers. However, the affected stocks were politically-themed and found that some stocks had known politically-themed stocks based on the definition of politically-themed stocks. Second, we verified the leading stocks from unequal values of the network analysis results that ETE of politically-themed stock networks show. Finally, to validate the positive CAR effect of prior research, we analyzed the network dynamics around political events. From the results, we found the trends of increasing ETE before and after political events. This clear trend means that the politically-themed stocks' price movement trends its change around politically-themed stocks. Based on the insight from the network dynamics, we built investment portfolios using the PSI, and as a result, we found that the investment portfolio consists of politically-themed stocks using the PSI that can benchmark the index during the election period. These results can verify the CAR-related results of prior studies and the existence of eccentric movements and changes before and after political events simultaneously. The last two points can be evidence of Korean stock markets' instability, similar to previous studies of politically-themed stocks in Korea.
We have three points that summarize this study's limitations. First, we limited the factors affecting the abnormal returns of stocks only to the PSI. However, abnormal returns of stocks are the combined results of numerous factors in the real world. The second limitation is that the computed PSI can be different because the PSI values depend on the sentiment analysis results, which can be changed by the text data, machine-learning models, and hyperparameters. We adopted bidirectional LSTM as a machine-learning model for sentiment analysis because LSTM is one of the most commonly used methods in the recent 5 years in sentiment analysis based on the Korean language. However, well-performed machine-learning models like the transformer [61], BERT [62], and hybrid model [63] are released nowadays. For example, if those models are used for conducting sentiment analysis, it is believed that they may produce better results to detect politicallythemed stocks based on the methodology of this paper. Finally, the computation time for constructing politically-themed stock networks was not considered. If someone builds a larger politically-themed stock networks to verify the network dynamics in this study, computation time will be an essential factor to consider. Thus, it is necessary to analyze further factors affecting abnormal returns and optimize factors related to calculation time, considering the consistency of results to construct advanced types of politicallythemed networks.
Consequently, we developed a novel method to select the politically-themed stocks of politicians quantitatively based on sentiment analysis and entropy-based information flow measures. Additionally, we empirically analyzed the main characteristics of politicallythemed stocks and developed politically-themed stock networks focusing on their causal relationships. Besides, this study's contribution is that we came up with a new method to verify common knowledge of politically-themed stocks.
As a follow-up task, we will conduct studies to find other kinds of thematic stocks that can be analyzed from this perspective and develop novel methods for detecting non-thematic stocks' relationships based on information theory and behavioral finance. Data Availability Statement: Data available on request due to restrictions eg privacy or ethical. The data presented in this study are available on request from the corresponding author. The data are not publicly available due to political contents in this paper.

Conflicts of Interest:
The authors declare no conflict of interest.    Notes: W, JB, Q 2 (10), DF τ , and LM are the test statistics of the Shapiro-Wilk test and Jarque-Bera test for normality, the Ljung-Box test for 10-order serial autocorrelation in squared abnormal return data, the augmented Dickey-Fuller test, and the White test, respectively. ***, **, * means that the null hypothesis of normality, no autocorrelation in residuals, nonstationary, and homoscedasticity is rejected at the 10 percent, 5 percent, and 1 percent significance level, respectively.    Figure B2. In-degree of politically-themed stocks. Note: The characters of subfigures mean the politicians' assigned alphabet for de-identification.