Symbolic Encoding Methods with Entropy-Based Applications to Financial Time Series Analyses

Symbolic encoding of information is the foundation of Shannon’s mathematical theory of communication. The concept of the informational efficiency of capital markets is closely related to the issue of information processing by equity market participants. Therefore, the aim of this comprehensive research is to examine and compare a battery of methods based on symbolic coding with thresholds and the modified Shannon entropy in the context of stock market efficiency. As these methods are especially useful in assessing the market efficiency in terms of sequential regularity in financial time series during extreme events, two turbulent periods are analyzed: (1) the COVID-19 pandemic outbreak and (2) the period of war in Ukraine. Selected European equity markets are investigated. The findings of empirical experiments document that the encoding method with two 5% and 95% quantile thresholds seems to be the most effective and precise procedure in recognizing the dynamic patterns in time series of stock market indices. Moreover, the Shannon entropy results obtained with the use of this symbolic encoding method are homogenous for all investigated markets and unambiguously confirm that the market informational efficiency measured by the entropy of index returns decreases during extreme event periods. Therefore, we can recommend the use of this STSA method for financial time series analyses.


Introduction
The use of proper symbolic encoding of information is a foundation of the mathematical theory of communication proposed by Shannon in 1948 [1]. Daw et al. [2] indicated that data analysis tools referred to as symbolization or symbolic time series analysis (STSA) involve the transformation of raw time series data into a series of discretized symbols that are processed to extract information about the generating process. Among others, Buhlmann [3] and Schittenkopf et al. [4] emphasized that a discretization strategy transforming real data time series into symbolic streams is an effective complexity reduction tool. However, there is no general rule for locating an optimal partition for actual data [5], and therefore many symbolic encoding procedures have been proposed in the literature and the results are not homogenous [2,3,6-13].
The well-known concept of informational efficiency of capital markets, which is strictly connected with the Efficient Market Hypothesis (EMH) [14], is closely related to the issue of information processing and depends on the representation of information by equity market participants. Specifically, the EMH defines an efficient market as one in which new information is quickly and correctly reflected in current security prices [15]. The traditional taxonomy of information sets distinguishes between three forms of informational efficiency: (1) weak-form efficiency, (2) semi-strong-form efficiency, and (3) strong-form efficiency. Weak-form efficiency means that the information set includes only the history of prices or returns. Semi-strong-form efficiency denotes that the information set includes all publicly available information. The final one, strong-form efficiency, means that the information set includes all information known to any market participant (see, e.g., [16]). Gulko [17] emphasized that the idea of market efficiency is linked to the concept of entropy. The author proposed the so-called entropic market hypothesis, which states that the entropy maximization may be a basic property of efficient pricing.
In his seminal paper, Shannon [1] documented that information content could be measured by entropy. Generally speaking, communication systems can be roughly classified into three main categories: discrete, continuous, and mixed. A typical case of a discrete system of communication is telegraphy, since both the message and the signal are a sequence of discrete symbols. In this sense, each discrete time series constitutes a discrete source of information that can be encoded with the use of symbols [1]. The Shannon entropy definition is grounded on a symbolic representation of the information with the respective estimated probabilities. According to the literature, the modified Shannon entropy approach based on symbolic encoding with thresholds is especially useful in assessing market (in)efficiency in terms of sequential regularity in financial time series during extreme event periods [3].
Within turbulent periods, regularity in financial time series increases in terms of the existence of patterns in stock and index returns. For instance, Risso [10] documented that market trends (both up and down) that are common during extreme event periods usually reduce the entropy of daily financial time series due to more frequent repeated patterns. Specifically, stock market crashes create declining trends in financial time series, which reduce the entropy but increase time series regularity. In general, the chances for price prediction rise within a crisis and economic downturns. To sum up, various methods based on entropy enable investors to evaluate the aforementioned problems (see, e.g., [18] and the references therein).
The main research question of this study can be formulated as follows: which STSA method is the most effective one in entropy-based applications in financial markets, especially in the context of informational market efficiency? Therefore, the aim of this comprehensive research is to evaluate several methods based on the modified Shannon entropy and a symbolic representation of discrete time series in financial market analyses.
In order to answer the research question, two turbulent periods are analyzed: (1) the COVID-19 pandemic outbreak and (2) the period of war in Ukraine. Fifteen selected European equity markets are investigated. As the analyzed sample periods are not long, the methods that allow for assessing market (in)efficiency within long-time periods (such as the Hurst exponent [19]) are not appropriate in this case.
The added value of this study is derived from the novel methodological and empirical findings that have not been presented in the literature thus far. The contribution is twofold. First of all, to the authors' knowledge, this is the first comprehensive piece of research that examines and compares a battery of symbolic encoding methods with thresholds in empirical analyses concerning the informational efficiency of financial markets. Moreover, after symbolization, the dynamic structure in real data is recognized by symbol sequences and symbol sequence histograms of relative frequencies, which provide a convenient way to observe possible patterns in time series. The findings document that the encoding method with two 5% and 95% quantile thresholds seems to be the most effective and precise procedure in assessing dynamic patterns in time series of stock market indices. Therefore, we can recommend the use of this STSA method for financial time series analyses.
Second, the research hypothesis that the market informational efficiency measured by the modified Shannon entropy of daily index returns decreases during extreme event periods is assessed. To examine this hypothesis, changes in entropy values for the preturbulent and turbulent periods are estimated. The comparative results are homogenous for both pairs of the pre-event and event sub-periods and they confirm that there is no reason to reject the research hypothesis. The results support the evidence that stock market efficiency measured by entropy decreases during extreme events as the sequential regularity in time series increases in such cases. This conclusion is important for academics and practitioners and it is consistent with the existing literature which documents that turbulent periods are usually found to reduce the entropy of financial markets (e.g., [18,20-23]).
The rest of this study is organized as follows. Section 2 presents a brief literature review. Section 3 describes the methodological background concerning symbolic encoding methods as well as the modified Shannon entropy approach based on symbolic representations of time series. Section 4 contains real data descriptions. Section 5 presents the experimental studies and compares empirical results on the European stock markets. The last section summarizes and discusses the main findings and indicates some further research directions. The paper is supplemented with three appendixes.

Brief Literature Review
The literature contains several studies that utilize entropy-based procedures with symbolic encoding in various applications in financial market analyses. However, the number of these studies is rather limited. For instance, Brida and Punzo [5] constructed an artificial economy of the Italian macro-regions and focused on the STSA approach and the modified Shannon entropy to analyse the six-regime dynamics. Risso [10] investigated the daily informational efficiency of five stock markets by using the local entropy and a symbolic time series analysis with one threshold. Mensi et al. [24] evaluated the time-varying degree of the weak-form efficiency of the crude oil market using the modified Shannon entropy and the STSA approach. Sensoy et al. [12] assessed the strength and direction of information flow between exchange rates and stock prices in emerging markets by the effective transfer entropy and symbolic encoding method with two thresholds. Risso [11] applied a symbolic time series analysis with one threshold and the Shannon entropy in order to measure and rank the informational efficiency of twenty developed and emerging stock markets within the world. Mensi et al. [25] examined two worldwide crude oil benchmarks and used the Shannon entropy based on the STSA procedure to rank the market-level efficiency. Oh et al. [26] explored the degree of uncertainty in the return time series of several market indices based on the Shannon entropy, which incorporated the contribution of possible patterns. Ahn et al. [6] used the Shannon entropy based on symbolic encoding with one threshold to analyse the effect of stock market uncertainty on economic fundamentals in China. Shternshis et al. [13] proposed a computational methodology to estimate the Shannon entropy of high-frequency data to study the informational efficiency of Exchange Traded Funds (ETF). The authors considered symbolic encoding methods with one and two thresholds. 
Olbrys and Majewska [9] applied two different STSA procedures with one threshold and the Shannon entropy to rank European stock markets' informational (in)efficiency during the COVID-19 pandemic. In a recent paper, Brouty and Garcin [27] determined the amount of information contained in the time series of price returns by using the Shannon entropy applied to symbolic representations of time series.
Another strand of the literature explores the topic of stock market efficiency during turbulent periods with the use of various entropy-based methods. For instance, Wang and Wang [22] documented that the informational efficiency of the S&P 500 index substantially decreased during the COVID-19 extreme event. Ozkan [21] investigated six developed equity markets during the COVID-19 pandemic outbreak and found that all markets deviated from market efficiency within this extreme event period. Ortiz-Cruz et al. [20] indicated (based on the multi-scale Approximate Entropy procedure) that returns from crude oil markets were less uncertain during economic downturns. Olbrys and Majewska [18] utilized a different approach, i.e., the Sample Entropy algorithm, to estimate the sequential regularity and entropy of the daily time series of 36 stock market indices within two extreme event periods: (1) the Global Financial Crisis in 2007-2009 and (2) the COVID-19 pandemic outbreak in 2020-2021. In general, the empirical results support the hypothesis that the regularity in financial time series usually increases, while the entropy and informational efficiency of stock markets usually decrease during various turbulence periods due to the existence of patterns in returns. In this context, Billio et al. [28] proposed an entropy-based early warning indicator for systemic risk and documented the forecasting ability of entropy measures in predicting banking crises.

Methodological Background
This section presents the theoretical background concerning the modified Shannon entropy approach based on symbolic encoding with thresholds. As mentioned in the Introduction, this method is especially useful in assessing the regularity of discrete financial time series during extreme event periods [3].
In this research, the time series of stock market indices are investigated. The returns of indices are calculated as daily logarithmic rates of return:
$$r_t = \ln P_t - \ln P_{t-1},$$
where $P_t$ is the closing value of the particular market index on day t.
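The return calculation can be illustrated with a minimal sketch. The function name `log_returns` and the closing values are hypothetical and serve only as an illustration; the sketch uses the Python standard library rather than the pandas and numpy tools mentioned later in the paper.

```python
import math


def log_returns(prices):
    """Return daily logarithmic returns ln(P_t) - ln(P_{t-1}) for a price list."""
    if any(p <= 0 for p in prices):
        raise ValueError("closing values must be strictly positive")
    # Pair each closing value with its predecessor and take the log difference.
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical closing values of an index on five consecutive trading days.
closing_values = [100.0, 101.0, 99.5, 99.5, 102.0]
returns = log_returns(closing_values)
print(returns)
```

A series of T closing values yields T - 1 returns, and the returns sum to the log return over the whole window.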

Symbolic Encoding with One Threshold
Symbolic time series analyses (STSA) are utilized in many applications. The main idea is that the values of given discrete time series data are transformed into a finite set of symbols, thus obtaining a finite string. This operation is just a translation into a finite precision language [5]. Schittenkopf et al. [4] showed that the discretization of financial time series can effectively filter the data and reduce the noise. Ahn et al. [6] pointed out that an STSA makes it possible to capture time-varying patterns in stock returns by transforming the real data into a limited number of symbols which reflect the dynamic rise-fall pattern of several consecutive returns. Risso [29] reviewed various applications of the STSA methods in social sciences. Letellier [30] emphasized that symbolic sequence analyses are a useful tool for characterizing any kind of dynamical behaviour with symbols, and a so-called threshold-crossing technique could be used. We adapt Letellier's definition of a sequence of symbols for returns of stock market indices (Definition 1):
Definition 1. A sequence $\{s_t\}$ of symbols is defined according to:
$$s_t = \begin{cases} 0, & r_t < r_c \\ 1, & r_t \geq r_c \end{cases}$$
where $r_c$ is the critical point (threshold) of return time series $\{r_t\}$.
In the literature, various threshold values $r_c$ are taken into consideration, for instance: Method 3: $r_c$ = median [8].
The finite set A = {0, 1} of possible n = 2 symbols is called an alphabet, while each subset of a sequence of symbols is called a word [1,2,5,31]. A sequence of consecutive returns is symbolized as a sequence of 0s and 1s [6].
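A minimal sketch of threshold-crossing encoding with one threshold is given here. It assumes that returns below the threshold map to 0 and returns at or above it map to 1; this orientation and the treatment of ties are assumptions of the sketch. The function name `encode_one_threshold` and the sample returns are hypothetical.

```python
import statistics


def encode_one_threshold(returns, threshold):
    """Map each return to 0 (below the threshold) or 1 (at or above it).

    The orientation of the symbols and the tie rule are assumptions.
    """
    return [1 if r >= threshold else 0 for r in returns]


# Hypothetical daily log returns.
returns = [0.012, -0.004, 0.000, -0.021, 0.007]

# Median threshold, as in Method 3 mentioned in the text.
r_c = statistics.median(returns)
symbols = encode_one_threshold(returns, r_c)
print(r_c, symbols)
```

Passing a different `threshold` value gives other one-threshold variants without changing the encoder.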

Symbolic Encoding with Two Thresholds
To code the same raw data, one can use different discrete coding alphabets corresponding to different levels of discretization. In the case of two thresholds, the set A = {0, 1, 2} of possible symbols is an alphabet. The alphabet size is equal to n = 3. Therefore, a sequence of consecutive returns is symbolized as a sequence of 0s, 1s and 2s. Daw et al. [2] emphasized that the binary symbolization for n = 2 is convenient in many cases, but higher values of n > 2 correspond to an increasingly refined discrimination of measurement details.
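The definition of a sequence of symbols for returns of stock market indices in the case of two thresholds is given as follows (Definition 2):
Definition 2. A sequence $\{s_t\}$ of symbols is defined according to:
$$s_t = \begin{cases} 0, & r_t < \theta_1 \\ 1, & \theta_1 \leq r_t \leq \theta_2 \\ 2, & r_t > \theta_2 \end{cases}$$
where $\theta_1$ and $\theta_2$ are the thresholds of return time series $\{r_t\}$.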
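Based on the literature, various thresholds $\theta_1$, $\theta_2$ are used in empirical research, for instance:
1. Method 1: $\theta_1$ and $\theta_2$ are the 5% and 95% sample quantiles.
2. Method 2: $\theta_1$ and $\theta_2$ are the 2.5% and 97.5% sample quantiles.
3. Method 3: $\theta_1$ and $\theta_2$ are the sample tertiles.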
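The two-threshold encoding with quantile thresholds might be sketched as follows. The sketch assumes that symbol 1 marks returns between the two thresholds (boundaries included), 0 marks returns below the lower threshold and 2 marks returns above the upper one. The helper names and the sample data are hypothetical. The quantile estimator (`statistics.quantiles` with the "inclusive" method, i.e., linear interpolation between order statistics) is also an assumption, since the text does not specify one.

```python
import statistics


def quantile_thresholds(returns, lower_pct=5, upper_pct=95):
    """Return the lower and upper sample percentiles (integer percentages)."""
    # n=100 yields 99 cut points; cut point i-1 is the i-th percentile.
    cuts = statistics.quantiles(returns, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def encode_two_thresholds(returns, theta_1, theta_2):
    """Map returns to 0 (below theta_1), 1 (between), or 2 (above theta_2)."""
    symbols = []
    for r in returns:
        if r < theta_1:
            symbols.append(0)
        elif r > theta_2:
            symbols.append(2)
        else:
            symbols.append(1)
    return symbols


# Hypothetical returns: 21 evenly spaced values from -0.10 to 0.10.
returns = [i / 100 for i in range(-10, 11)]
theta_1, theta_2 = quantile_thresholds(returns)  # 5% and 95% quantiles
symbols = encode_two_thresholds(returns, theta_1, theta_2)
print(theta_1, theta_2, symbols)
```

With 5% and 95% quantiles, roughly 90% of the observations in the sample receive the symbol 1, so the symbols 0 and 2 flag only the tails of the return distribution.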

Symbolic Sequence Dynamics and Histograms
According to the symbolic dynamics literature, after symbolization, the next step in identification of temporal patterns in time series is the construction of symbol sequences (words). If each possible sequence is represented in terms of a unique identifier, each result creates a new time series referred to as a code series [2]. The choice of a specific decimal number can be arbitrary [6].
Let n > 1 be the number of possible symbols and k ≥ 1 be the length of a code sequence (i.e., a word). Hence, there are $n^k$ possible words of k symbols that can occur in the given symbolic data sequence. For instance, in the case of n = 2 and k = 2, the number of possible patterns is equal to $n^k = 2^2 = 4$, and all possible words are {(1, 1), (1, 0), (0, 1), (0, 0)}. These words can be represented by natural numbers {1, 2, 3, 4} [5].
The temporal structure of observed data is revealed by the relative frequency of each possible symbol sequence (word). The observed dynamics can be described by a k-histogram of relative frequencies. The empirical distribution presented in a symbol-sequence histogram allows for a comparison of coded sequences. Generally speaking, direct visual presentation of the frequencies with symbol-sequence histograms provides a convenient way for observing and determining possible patterns in time series [2,5,26].
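The enumeration of words and their codes can be sketched with `itertools.product`. The function name `enumerate_words` is hypothetical, and the lexicographic numbering used here is only one possible choice; since the assignment of codes is arbitrary, it need not coincide with the numbering used in the paper's tables.

```python
from itertools import product


def enumerate_words(n_symbols, k):
    """Assign a natural-number code to each of the n**k words of length k."""
    alphabet = range(n_symbols)
    # Words are generated in lexicographic order; codes start at 1.
    return {word: code for code, word in enumerate(product(alphabet, repeat=k), start=1)}


codes = enumerate_words(2, 2)
for word, code in codes.items():
    print(code, word)

# The ternary alphabet with words of length 3 gives 3**3 = 27 codes.
print(len(enumerate_words(3, 3)))
```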
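A minimal sketch of how the relative frequencies behind a k-histogram could be computed is shown here. It assumes that words are read with an overlapping window that moves one symbol at a time; the text does not state whether windows overlap, so this is an assumption. The function name `word_frequencies` and the symbol string are hypothetical.

```python
from collections import Counter


def word_frequencies(symbols, k):
    """Relative frequency of each observed word of length k (overlapping windows)."""
    words = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical symbol string from a two-threshold encoding.
symbols = [1, 1, 1, 0, 1, 1, 1, 2, 1, 1]
freqs = word_frequencies(symbols, 3)

# Text rendering of the histogram, most frequent word first.
for word, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(word, f"{freq:.3f}", "#" * round(freq * 40))
```

Only words that actually occur appear in the result; unobserved words have frequency zero and are omitted.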

The Modified Shannon Entropy Approach Based on the Symbolic Representation of Time Series
As mentioned in the Introduction, entropy is a widely used measure that summarizes the information content of a probability distribution. Specifically, the Shannon information entropy [1] quantifies the expected value of information contained in a discrete distribution. The Shannon entropy of k-th order (Definition 3) is an information theoretic measure for symbol-sequence frequencies.
Definition 3. The Shannon entropy of k-th order, H(k), is defined according to:
$$H(k) = -\sum_{i} p_i \log_2 p_i,$$
where $p_i$ is the probability of finding the i-th sequence of length k.
The probability $p_i$ is approximated by the number of times the i-th sequence is found in the original symbolic string divided by the number of all non-zero sequences of length k [5]. This means that $p_i$ is calculated based on the histogram of symbol sequence frequencies.
For increasingly longer sequences from a finite-length time series, the entropy given by Definition 3 tends to be underestimated [2]. Therefore, following Brida and Punzo [5], we use the definition of the modified Shannon entropy $H_s(k)$ based on the symbolic representation of time series (Definition 4). The $H_s(k)$ is a normalized form of the Shannon entropy H(k) (Definition 3).

Definition 4. The modified (normalized) Shannon entropy $0 \leq H_s(k) \leq 1$ based on symbolic representations of time series is defined according to:
$$H_s(k) = -\frac{1}{\log_2 N} \sum_{i} p_i \log_2 p_i,$$
where N is the total number of observed sequences of length k with non-zero frequency, i is the index of a sequence and $p_i$ is the probability of finding the i-th sequence of length k. It is assumed that $0 \cdot \log_2 0 = 0$.
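The normalized entropy could be computed along the following lines. This is a sketch under stated assumptions: words are counted with overlapping windows, N is taken as the number of distinct observed words, and a string in which only one word occurs is assigned entropy 0 (the normalization by the base-2 logarithm of N is undefined for N = 1, so this convention is an assumption). The function name `modified_shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def modified_shannon_entropy(symbols, k):
    """Shannon entropy of words of length k, normalized by log2 of the
    number of distinct observed words, so that the result lies in [0, 1]."""
    words = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    if not words:
        raise ValueError("symbol string is shorter than the word length k")
    counts = Counter(words)
    total = len(words)
    n_observed = len(counts)
    if n_observed == 1:
        # Assumed convention: a single repeated word is perfectly regular.
        return 0.0
    # Only observed words enter the sum, so the 0 * log2(0) case never arises.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_observed)


# Four words, each observed equally often: maximal normalized entropy.
uniform = modified_shannon_entropy([0, 0, 1, 1, 0, 0, 1, 1, 0], k=2)
# One word repeated throughout: minimal entropy.
constant = modified_shannon_entropy([1] * 10, k=3)
# A mostly regular string with a few interruptions lies in between.
mixed = modified_shannon_entropy([1, 1, 1, 0, 1, 1, 1, 2, 1, 1], k=3)
print(uniform, constant, mixed)
```

Lower values indicate that a few words dominate the string (higher regularity), while values close to 1 indicate that the observed words are nearly equally frequent.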

Real Data Description
This section describes the real datasets that have been used in empirical research. In order to ensure the coherence and comparability of empirical findings, both datasets include daily observations of the main stock market indices for the same fifteen European countries that have been chosen in the context of the Russian invasion of Ukraine. These countries are France, the United Kingdom, Germany, Finland, Norway, Turkey and the so-called 'Bucharest 9' (NATO Eastern flank states, i.e., Poland, Hungary, Czechia, Romania, Bulgaria, Lithuania, Estonia, Latvia and Slovakia). The choice of the selected countries can be justified as follows: (1) France, the United Kingdom, Germany and Turkey have taken an active part in diplomatic efforts concerning the Russian-Ukrainian conflict, (2) Finland and Norway border Russia and (3) all members of the so-called 'Bucharest 9' NATO Eastern flank states were either part of the former Soviet Union (USSR) or members of the defunct Soviet-led Warsaw Pact.

The COVID-19 Pandemic Outbreak
The first four-year sample comprises the two-year pre-COVID-19 pandemic period (from January 2018 to December 2019) and the two-year COVID-19 pandemic period (from January 2020 to December 2021). Since there is no unanimity in determining the COVID-19 pandemic period among researchers, in this study it is assumed that this period comprised two years (2020-2021), since on 30 January 2020, the COVID-19 outbreak was declared as a Public Health Emergency of International Concern by the World Health Organization (WHO), while on 11 March 2020, the WHO officially declared the COVID-19 outbreak to be a global pandemic [32]. Table 1 includes brief information about the analyzed indices in the order of decreasing value of market capitalization (in EUR billion) on 31 December 2020, as well as the summarized statistics for the daily logarithmic rates of return within the entire first sample period and two investigated sub-periods.

The War in Ukraine
The second two-year sample comprises the one-year pre-war period in Ukraine (from 24 February 2021 to 23 February 2022) and the one-year war period in Ukraine (from 24 February 2022 to 24 February 2023). Analogous to Table 1, the subsequent Table 2 presents brief information about the analyzed indices in the same order and the summarized statistics for the daily logarithmic rates of return within the entire second sample period and two investigated sub-periods.

Empirical Experiments
This section presents various empirical experiments concerning symbolic encoding with thresholds and entropy-based comparative analyses in the context of sequential regularity in financial time series. The selected fifteen European stock markets are investigated within the turbulent periods including the COVID-19 pandemic outbreak and the war in Ukraine.
Computations were conducted with a dedicated program. To perform the calculations and generate graphs, Jupyter Notebook, a web-based interactive computing platform, and the pandas, numpy, math, matplotlib and itertools libraries were used. Jupyter (formerly known as IPython Notebook) allows users to create documents that include code and visualizations, while the libraries enable users to process data and generate graphs efficiently.
The findings of real data experiments document that the modified Shannon entropy based on the encoding method with two 5% and 95% quantile thresholds seems to be the most effective explicit procedure in assessing the dynamic patterns in the time series of stock market indices. This method evaluates extreme returns during turbulent periods much more appropriately than other methods, and the empirical results are especially homogenous for all investigated equity markets within all analyzed sub-periods.

Symbolic Dynamic Patterns in Financial Time Series
As mentioned in Section 3.3, direct identification of symbolic dynamic patterns in time series consists of two steps. The first step encompasses symbolic encoding with one or two thresholds based on Definition 1 or Definition 2, respectively. The next step is the construction of symbol sequences (words). Each possible sequence is represented in terms of a unique identifier (code) given by a natural number. Table 3 reports the number of all possible sequences (words) in the case of both definitions. Table 4 exemplifies the assigned codes of all possible sequences in the case of the alphabet A = {0, 1, 2} (n = 3) and the k = 3 length of a code sequence ($n^k = 3^3 = 27$ natural numbers). Sequence No. 8 (i.e., (1, 1, 1)) is marked in bold as the one most frequently observed (see Tables 5 and 6).

Table 4. The codes of sequences for the alphabet A = {0, 1, 2} and k = 3.

A Code Sequence Length
In this research, three different numbers (k = 3, 4, 5) are utilized as the length of a code sequence. The amount of all calculations is large; thus, only selected results are displayed in this paper. The remaining empirical findings are available upon request.

Symbol-Sequence Histograms
As pointed out in Section 3.3, a dynamic structure in real data can be expressed by the relative frequency of each possible symbol sequence. The observed dynamics can be illustrated by a k-histogram of relative frequencies. Therefore, in this subsection, selected histograms are presented. For comparison, Figure 1 shows symbol-sequence histograms (k = 3) based on Definition 2 and three different STSA methods with two thresholds for the CAC40 (France) index, within the pre-COVID-19 and COVID-19 periods, respectively. Furthermore, Figure 2 exemplifies the appropriate histograms for the same index within the pre-war and war periods in Ukraine.
The evidence shows that Method 1 with two 5% and 95% quantile thresholds specifies extreme returns during turbulent periods more accurately than the other two encoding methods with two thresholds. Method 2 (with two 2.5% and 97.5% quantile thresholds) is too restrictive, while Method 3 (with sample tertiles as the thresholds) collects index returns similarly to the methods with one threshold, given by Definition 1. This observation was commented on in the previous subsection.
The additional Figures A1-A4 (Appendix B) further express comparative analyses of symbol-sequence histograms based on Definition 2 and Method 1 with two thresholds (5% and 95%) within two pairs of turbulent periods: (1) the pre-COVID-19 and the COVID-19 pandemic periods and (2) the pre-war and war periods in Ukraine.

The Modified Shannon Entropy Comparative Results: The COVID-19 Pandemic Outbreak
In this subsection, the comparative entropy results during the pre-COVID-19 and COVID-19 periods for the fifteen analyzed stock market indices are presented and discussed. Table 7 includes the findings for the modified Shannon entropy based on three different methods of symbolic encoding with one threshold given by Definition 1 (for sequences of length k = 3). The columns entitled 'Change' report changes in the Shannon entropy before and during the COVID-19 pandemic period. The down arrows show an entropy decrease, while the up arrows illustrate an entropy increase. As one can observe, the results are rather mixed and heterogenous, and they are not in line with expectations, since the literature documents that the market informational efficiency measured by entropy of index returns usually decreases during extreme event periods [18,20-23]. Therefore, there is no reason to recommend the use of encoding methods with one threshold for financial time series analyses within extreme event periods.
Table 8 contains the empirical results for the modified Shannon entropy based on three different methods of symbolic encoding with two thresholds given by Definition 2 (for sequences of length k = 3). It is evident that the modified Shannon entropy values given by Definition 4 depend on the choice of the encoding procedure. This is rather obvious since the lower entropy values determined by Definition 4 are directly connected to a higher level of regularity in time series, expressed by symbol sequences. Conversely, higher entropy values represent a lower level of regularity. Therefore, the entropy values obtained from Method 2 are the lowest, while the results from Method 3 are the highest. These results are associated with the demonstration graphs presented in Figures 1 and 2 (Section 5.2). It is worthwhile noting that the results are homogenous for all investigated stock market indices.
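The 'Change' comparison between a pre-event and an event sub-period might be sketched as follows. The sketch makes several assumptions that the text does not confirm: the quantile thresholds are estimated separately within each sub-period, words are counted with overlapping windows, and the quantile estimator is `statistics.quantiles` with the "inclusive" method. All function names are hypothetical, and the input series are simulated, so the printed values carry no empirical meaning.

```python
import math
import random
import statistics
from collections import Counter


def encode_two_thresholds(returns, lower_pct=5, upper_pct=95):
    """Encode returns as 0/1/2 using sample percentiles of the same series."""
    cuts = statistics.quantiles(returns, n=100, method="inclusive")
    theta_1, theta_2 = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [0 if r < theta_1 else 2 if r > theta_2 else 1 for r in returns]


def modified_shannon_entropy(symbols, k):
    """Entropy of length-k words, normalized by log2 of the number observed."""
    words = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    counts = Counter(words)
    if len(counts) < 2:
        return 0.0  # assumed convention when at most one word is observed
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


def entropy_change(pre_returns, event_returns, k=3):
    """Entropy before and during the event, the difference, and its direction."""
    h_pre = modified_shannon_entropy(encode_two_thresholds(pre_returns), k)
    h_event = modified_shannon_entropy(encode_two_thresholds(event_returns), k)
    change = h_event - h_pre
    direction = "down" if change < 0 else "up" if change > 0 else "none"
    return h_pre, h_event, change, direction


# Simulated (not real) daily returns for two sub-periods of 250 trading days.
random.seed(0)
pre = [random.gauss(0.0, 0.01) for _ in range(250)]
event = [random.gauss(0.0, 0.02) for _ in range(250)]

h_pre, h_event, change, direction = entropy_change(pre, event)
print(f"pre={h_pre:.4f} event={h_event:.4f} change={change:+.4f} ({direction})")
```

Calling `entropy_change` with `k=4` or `k=5` gives the corresponding comparison for longer words.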

Modified Shannon Entropy Comparative Results: The War in Ukraine
Similarly to the previous subsection, this subsection describes and discusses the modified Shannon entropy comparative results during the pre-war and war periods in Ukraine for the fifteen analyzed stock market indices. The general conclusions are very similar.
Firstly, the results reported in Table 9 (the modified Shannon entropy based on three different methods of symbolic encoding with one threshold given by Definition 1, k = 3) are diverse and ambiguous, and they are not in line with expectations. Hence, we cannot advocate the use of STSA methods with one threshold in financial time series analyses within turbulent periods.
Secondly, the findings displayed in Table 10 (the modified Shannon entropy based on three different methods of symbolic encoding with two thresholds given by Definition 2, k = 3) are much better compared to those in Table 9, specifically for Method 1 (with two 5% and 95% quantile thresholds). In the case of this method, the pronounced decrease in the modified Shannon entropy for all investigated stock markets is visible in the fourth column ('Change') in Table 10. The obtained results are much more homogenous compared to those in Table 9, especially for Method 1 (the fourth column). The up arrows are rare and they are visible only in the case of three markets (i.e., the U.K., Poland and Estonia). Hence, we can recommend the use of Method 1 with two 5% and 95% quantile thresholds in assessing stock market efficiency.
It is important to note that the obtained results decidedly support the research hypothesis. Therefore, we can assert that the recommendation of the use of the STSA method with two 5% and 95% quantile thresholds is well founded.
Furthermore, additional comparative findings of Method 1 (the 5% and 95% sample quantiles) for sequences of length k = 4 and k = 5 within the pre-event and event periods for the fifteen analyzed stock market indices are reported in Tables A4 and A5 (Appendix C). The obtained results indicate that the choice of the sequence length k is a minor issue as the results for k = 4 and k = 5 are very similar to those for k = 3. However, the visualization of the results by k-histograms is much more difficult for k = 4 and k = 5, as the total number of possible sequences is large (see Table 3).

Conclusions
The purpose of this empirical study was to investigate and compare various methods based on the symbolic representation of discrete time series and the modified Shannon entropy in assessing stock market informational efficiency in terms of sequential regularity in financial time series. Fifteen European stock markets within two extreme event periods (i.e., the COVID-19 pandemic outbreak and the war in Ukraine) were analyzed. The markets were selected in the context of the Russian invasion of Ukraine. To capture the sequential dynamics in daily time series of equity market indices, changes in the Shannon entropy values before and during the particular extreme event were calculated and compared. The research hypothesis that the stock market efficiency measured by entropy usually decreases during turbulent periods was examined with the use of six different variants of STSA methods.
The research contribution of our paper to the discussion concerning stock market informational efficiency is twofold. Firstly, the most pronounced and consistent empirical results were obtained with the use of the STSA method with two thresholds (the 5% and 95% sample quantiles). This method was the best and most unambiguous in assessing the stock market efficiency measured by the modified Shannon entropy. Moreover, the empirical findings confirmed no reason to reject the proposed research hypothesis, since the entropy of stock market indices visibly decreased during both turbulent periods. This well-justified observation is consistent with the existing literature, and it is the second important contribution of our study.
The obtained comparative findings were especially unambiguous within the pre-COVID-19 and COVID-19 sub-periods. This evidence is rather obvious. It is worth recalling that the European stock markets were affected by the COVID-19 pandemic outbreak at the same time and to a similar extent, as opposed to the influence of the war in Ukraine.
It is worth mentioning that our research relates to the literature concerning the weak form of market informational efficiency, since the used information sets include only the history of index returns. Stock market index returns contain the influence of public information and, during various extreme event periods, all public information is especially important for investors and determines investment decisions. Among others, Lim and Brooks [15] emphasized that the empirical findings of market efficiency are rather heterogenous, as the EMH remains an elusive concept. Generally speaking, the topic is interesting and valid for academics and practitioners, and the recommended STSA method might be used as a useful tool in systems that support investment decisions.
The potential limitations of our research are mainly related to the choice of the investigated European stock markets. However, these limitations are not very significant as this choice is well justified (see Section 4) and the obtained empirical findings are homogenous.
Since the topic of stock market informational efficiency measured by entropy is strictly connected to the problem of market dynamics and volatility, a promising direction for further research could be an extensive assessment of STSA methods that incorporate volatility estimates (e.g., [4]). The motivation for such research can be, for instance, the study conducted by Gradojevic and Caric [33]. The authors emphasize that although volatility and entropy are related measures of market risk and uncertainty, entropy can be more useful in predictive modeling.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

STSA: Symbolic Time Series Analysis
EMH: Efficient Market Hypothesis
COVID-19: COVID-19 pandemic

Table A1 reports the comparative results for k = 5 and the sequence (1, 1, 1, 1, 1) (Definition 2, Method 1). Analogously to the cases for k = 3 and k = 4 documented in Tables 5 and 6, the empirical findings are also homogenous for all investigated stock markets and all sub-periods. The evidence shows that the sequence (1, 1, 1, 1, 1) is the most frequent path. The percentage number of this sequence is high and it varies between 55.9% and 72.4%. The sequence (1, 1, 1, 1, 1) means that five successive daily stock index returns are not extreme as they lie between the sample thresholds of θ₁ = 5% and θ₂ = 95%.

Table A1. Dynamic patterns of symbolic encoding with two thresholds (Method 1: the 5% and 95% sample quantiles) for k = 5 and the sequence (1,1,1,1,1) within the pre-event and event periods (for the 15 analyzed stock market indices).
Columns: Pre-War in Ukraine, War in Ukraine, Pre-COVID-19, COVID-19. Row: All Sequences for k = 5.

The following Tables A2 and A3 demonstrate the dynamic patterns of symbolic encoding with two thresholds in the case of Definition 2 and Method 2 (the 2.5% and 97.5% sample quantiles) for sequences of length k = 3 and k = 4 within the pre-event and event periods. Although the results are unambiguous for all fifteen analyzed stock market indices, they reveal that this STSA method is too restrictive, as the percentage numbers of sequences (1, 1, 1) (k = 3) and (1, 1, 1, 1) (k = 4) are very high. These numbers vary between 79.7% and 89.0% (Table A2) and between 76.4% and 90.6% (Table A3). This means that Method 2 with θ₁ = 2.5% and θ₂ = 97.5% as the thresholds is not very useful, since it favours extremely high or low daily returns and such returns are relatively rare.
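As a rough point of reference for these percentages, one can consider an idealized benchmark that is not part of the study: if the symbols were independent and exactly a fraction θ₂ − θ₁ of returns fell between the two thresholds, the share of the all-ones sequence of length k would be a simple power. The independence assumption is purely illustrative, since real return series need not satisfy it.

```latex
P\bigl((1,\dots,1)\bigr) = (\theta_2 - \theta_1)^k
\\[4pt]
\text{5\% and 95\% thresholds: } 0.90^3 \approx 0.729,\quad 0.90^4 \approx 0.656,\quad 0.90^5 \approx 0.590
\\[4pt]
\text{2.5\% and 97.5\% thresholds: } 0.95^3 \approx 0.857,\quad 0.95^4 \approx 0.815
```

Under this benchmark, narrower tails mechanically raise the share of the all-ones sequence, which agrees in direction with the higher percentages reported for Method 2.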

Appendix B. Symbol-Sequence Histograms: Additional Figures
Figures A1-A4 present comparative analyses of symbol-sequence histograms based on Definition 2 and Method 1 with two thresholds (5% and 95%) within two pairs of turbulence periods: (1) the pre-COVID-19 and the COVID-19 pandemic periods and (2) the pre-war and the war periods in Ukraine. Examples of appropriate histograms for six stock markets are reported; however, the histograms for the remaining markets are very similar and they are available upon request (due to space restrictions). The empirical results visualized by histograms are documented in Tables 5 and 6 and they are unambiguous for all investigated indices. The main evidence shows that the sequence (1, 1, 1) (see Table 4) is the most frequently observed. This sequence means that three successive daily stock index returns are not extremely high or low, but they lie between the sample quantile thresholds of θ₁ = 5% and θ₂ = 95% (Definition 2). Moreover, the histograms document that the number of zero-frequency sequences is relatively high for all indices and all investigated periods.
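A symbol-sequence histogram of the kind described above can be sketched as follows. The example assumes a three-symbol alphabet {0, 1, 2} and overlapping words of length k = 3, which gives 27 possible sequences; the function name and the toy symbol stream are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import product


def word_histogram(symbols, k=3, alphabet=(0, 1, 2)):
    """Relative frequency of every possible length-k word, zeros included.

    Assumes len(symbols) >= k so that at least one word exists.
    """
    words = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    counts = Counter(words)
    total = len(words)
    return {w: counts.get(w, 0) / total for w in product(alphabet, repeat=k)}


# Toy symbol stream (illustrative only, not data from the study).
symbols = [1, 1, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1]
hist = word_histogram(symbols, k=3)

most_frequent = max(hist, key=hist.get)
zero_words = [w for w, f in hist.items() if f == 0.0]
print(most_frequent, hist[most_frequent], len(zero_words))
```

Enumerating all possible words, rather than only the observed ones, is what makes the zero-frequency sequences visible in the histogram.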

Appendix C. The Modified Shannon Entropy: Additional Comparative Results
Tables A4 and A5 report additional comparative findings of Method 1 (the 5% and 95% sample quantiles) for sequences of length k = 4 and k = 5 within two pairs of turbulence periods: (1) the pre-COVID-19 and the COVID-19 pandemic periods (Table A4) and (2) the pre-war and war periods in Ukraine (Table A5). The findings are homogenous and they document that the choice of the sequence length k is a minor problem as the results for k = 4 and k = 5 are similar to those for k = 3 (see Tables 8 and 10). This evidence once again confirms that Method 1 with two 5% and 95% quantile thresholds (for sequences of length k = 3, 4, 5) is worth a recommendation in financial time series applications.

Table A4. The modified Shannon entropy based on symbolic encoding with two thresholds given by Definition 2. Comparative empirical findings of Method 1 (the 5% and 95% sample quantiles) for sequences of length k = 4 and k = 5 within the pre-COVID-19 and COVID-19 periods for the 15 analyzed stock market indices.