On the Similarity and Dependence of Time Series

Abstract: In this paper, we address the problem of evaluating interrelation among time series. Interrelation is measured using a similarity index. We suggest a new index based on the fuzzy transform (F-transform), which has been proven to remove frequencies higher than a given threshold and to reduce random noise significantly. The F-transform also provides an estimation of the slope of a time series over an imprecisely delineated time interval. We prove some properties of the suggested index and show its ability to measure similarity (and thus interrelation) on a selection of real financial time series. The method is well interpretable and easy to adjust.


Introduction
Time series have been widely studied for many years (for some recent references, see [1][2][3][4][5]). In 2006, Yang and Wu [6] rated mining information from time series as one of the top ten challenging data mining problems due to its particular properties. One of its subareas is assessing similarity between time series, i.e., the degree to which a given time series resembles another one. This assessment is the core of many tasks, such as retrieval, clustering, classification, and even forecasting [7].
Analysis of time series in its theoretical and practical aspects is a crucial part of the study of stock markets. Empirical research started in 1933 [8] and focused on the analysis of the stock market as a single independent time series, often referred to as univariate time series analysis. In this case, the financial time series consists of single observations recorded sequentially over equal time increments. However, in recent decades, worldwide economies have become increasingly related to each other. Various phenomena such as politics, social media platforms, and even pandemics can influence a set of financial time series similarly. Nowadays, there are likely to be groups of stocks that follow similar time-based patterns of behavior simultaneously or with some time delay. Therefore, a crucial question is raised: "What are all the stocks that behave similarly to a given stock A?" Nevertheless, devising a proper similarity measure to find similar behavior among time series is a non-trivial task [3].
Besides Euclidean distance measures, many others can be found in the literature [9][10][11][12][13][14][15]. In 1999, Mantegna suggested a methodology that has become the standard one, adopted and followed by many researchers in different areas. As of 2020, his paper and his book [16] have been cited several thousand times. To calculate the distance (similarity) among assets, Mantegna recommends the use of correlation among returns. In continuation of his work, many researchers aim to improve his method with respect to the clustering algorithm or the distance measure itself. The methodology can be briefly described as follows.
Let $N$ be the number of assets and $P_i(t)$ be the price of asset $i$, $1 \le i \le N$, at time $t$. The log-return $r_i(t)$ of an asset is calculated as follows:
$$r_i(t) = \log P_i(t) - \log P_i(t-1).$$
To determine the distance (similarity) between each pair $(i, j)$ of assets, he suggests computing the correlation of returns and then converting the correlation coefficients $\rho_{ij}$ into a distance using the following equation:
$$d_{ij} = \sqrt{2\,(1 - \rho_{ij})}.$$
He employs a minimum spanning tree (MST) to cluster the most similar assets in the form of a tree. A distinctive indexed hierarchy is another product of the resulting MST, corresponding to the one given by the dendrogram obtained using the single linkage clustering algorithm.
However, there are a few concerns regarding this standard methodology. The biggest problem is instability, which can be partially caused by the MST algorithm or by the correlation coefficient. Note that the correlation is not applied to the financial time series but to their returns; therefore, the dependence of the price values might be lost. Moreover, it is known that the Pearson linear correlation is sensitive to outliers and generally not suitable for probability distributions other than the Gaussian one. It is also challenging to interpret linkage changes over time since there is a high level of statistical uncertainty associated with the correlation estimation [17]. One might expect that the higher the correlation coefficient between an asset pair, the more reliable their link should be. However, in [18], the authors show that this hypothesis is not always satisfied in practice. Mantegna [19] concludes that a better approach would use a distance measure other than the expected square deviation, one that is distribution-free. Thus, researchers have investigated different measures of distance from specific viewpoints.
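The steps of this standard methodology can be sketched in a few lines of Python. This is a minimal illustration rather than Mantegna's implementation: the toy prices are hypothetical, the function names are chosen here for readability, and the conversion $d_{ij} = \sqrt{2(1-\rho_{ij})}$ is assumed as the correlation-to-distance formula. The MST is built with a plain version of Prim's algorithm.

```python
import math

def log_returns(prices):
    """Log-returns r(t) = ln P(t) - ln P(t-1) of one price series."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(rho):
    """Assumed conversion d = sqrt(2 (1 - rho)); clipped against rounding."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))

def minimum_spanning_tree(dist):
    """Prim's algorithm on a full symmetric distance matrix; returns edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leading from the current tree to a new vertex.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Hypothetical toy prices of three assets (illustrative values only).
prices = [
    [100.0, 101.0, 103.0, 102.0, 105.0, 107.0],
    [50.0, 50.6, 51.5, 51.1, 52.4, 53.5],
    [80.0, 79.0, 77.5, 78.4, 76.0, 74.9],
]
returns = [log_returns(p) for p in prices]
dist = [[correlation_distance(pearson(a, b)) for b in returns] for a in returns]
tree = minimum_spanning_tree(dist)
```

In this toy sample, the first two assets move together and are therefore linked first, while the third asset moves against them and ends up far from both.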
The similarity between two time series should not be computed based on their values only but also based on the corresponding slopes. This is important specifically in the stock market because relative variations in the price values affect the trading performance. A positive slope indicates an uptrend, and a negative slope indicates a downtrend. The question is how the slopes can be included. A possible and very reasonable solution is provided by the fuzzy transform (F-transform). Recall that we distinguish degrees of the F-transform: the zero-degree F-transform provides components giving information about the average values of the given function in a specified area, while the first-degree F-transform provides an estimation of the tangent of the given function in a specified area.
This paper aims to suggest a new similarity index between two time series that combines both the values of the time series and their slopes. We prove some properties of this index and demonstrate its behavior on a selection of real financial time series.
The structure of this paper is as follows. Section 2 is an overview of the main principles of the fuzzy transform and its properties. In Section 3, we introduce the new similarity index and prove some of its properties. In Section 4, we demonstrate our index on several financial time series.

The Principle of F-Transform
The fuzzy (F-)transform is a universal approximation technique introduced by I. Perfilieva in [20,21]. Its fundamental idea consists of two steps: the direct F-transform, which maps a continuous function $f$ to a finite vector of components, and the inverse F-transform, which maps this vector back to a function $\hat{f}$ approximating $f$. The F-transform has the following properties:
• it is a universal approximator,
• it is able to filter out high frequencies and to reduce noise [22],
• it provides an estimation of average values of derivatives over an imprecisely specified area [23],
• its computational complexity is polynomial.
The parameters of the F-transform can be set in such a way that the approximating function $\hat{f}$ has the desired properties.
These properties make the F-transform suitable for applications in various tasks when processing time series. More about F-transform and its applications can be found in [24].

Fuzzy Partition
Recall that by a fuzzy set, we understand a function $A : U \to [0, 1]$, where $U$ is a set (a universe) and $[0, 1]$ is the interval of reals understood as a set of truth degrees. The element $A(u) \in [0, 1]$ is called the membership degree of $u \in U$ in the fuzzy set $A$. In general, $[0, 1]$ is the support of a certain algebra (cf. Section 2.3) whose operations induce operations with fuzzy sets (we refer the reader to the extensive literature on fuzzy set theory and fuzzy logic, for example [24] and the citations therein). The support of a fuzzy set $A$ is the set
$$\mathrm{Supp}(A) = \{u \in U \mid A(u) > 0\}.$$
The fundamental step of the F-transform procedure is forming a fuzzy partition $\mathcal{A}$ of the domain $[a, b]$, which is a finite set of fuzzy sets $\mathcal{A} = \{A_0, \ldots, A_n\}$ defined over nodes $c_0, \ldots, c_n \in [a, b]$ such that
$$a = c_0 < c_1 < \cdots < c_n = b$$
and for each $k = 0, \ldots, n$, $A_k(c_k) = 1$. Furthermore, each fuzzy set $A_k$, $k = 1, \ldots, n-1$, has the support $(c_{k-1}, c_{k+1})$. For $k = 0$ and $k = n$, we consider only halves of the functions $A_k$, i.e., $A_0$ has the support $(c_0, c_1)$ and $A_n$ the support $(c_{n-1}, c_n)$. A typical h-uniform triangular fuzzy partition is depicted in Figure 1.
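A triangular fuzzy partition of this kind can be sketched as follows. The function name `triangular_partition` and the example interval are assumptions made for illustration only; the sketch builds equidistant nodes and triangular membership functions that peak at their nodes.

```python
def triangular_partition(a, b, n):
    """Return nodes c_0..c_n and triangular membership functions A_0..A_n on [a, b]."""
    h = (b - a) / n
    nodes = [a + k * h for k in range(n + 1)]

    def make(c):
        # Triangle with peak 1 at c and support (c - h, c + h), restricted to [a, b];
        # the edge functions A_0 and A_n are therefore only half triangles.
        def membership(x):
            if x < a or x > b:
                return 0.0
            return max(0.0, 1.0 - abs(x - c) / h)
        return membership

    return nodes, [make(c) for c in nodes]

# Example: the interval [0, 12] with n = 4, i.e., nodes 0, 3, 6, 9, 12 and h = 3.
nodes, A = triangular_partition(0.0, 12.0, 4)
degrees_at_4 = [A_k(4.0) for A_k in A]   # only A_1 and A_2 are non-zero at x = 4
```

At every point of the interval, at most two neighbouring membership functions are non-zero, and their degrees add up to 1.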

Remark 1.
In general, the fuzzy sets $A_k \in \mathcal{A}$ must fulfill five axioms, namely normality, locality (bounded support), continuity, unimodality, and orthogonality, where the latter is $\sum_{k=0}^{n} A_k(x) = 1$ for all $x \in [a, b]$. For a precise formulation of these axioms, see [20,24].
A fuzzy partition $\mathcal{A}$ is called h-uniform if the nodes $c_0, \ldots, c_n$ are h-equidistant, i.e., $c_{k+1} = c_k + h$ for all $k = 0, \ldots, n-1$, where $h = (b - a)/n$, and the fuzzy sets $A_1, \ldots, A_{n-1}$ are shifted copies of a single generating function $A$.

Zero Degree Fuzzy Transform
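After determining the fuzzy partition $A_0, \ldots, A_n \in \mathcal{A}$, we construct the direct F-transform of a continuous function $f$ as a vector $F^0[f] = (F^0_0[f], \ldots, F^0_n[f])$, where the $k$-th component is
$$F^0_k[f] = \frac{\int_a^b f(x)\,A_k(x)\,dx}{\int_a^b A_k(x)\,dx}, \qquad k = 0, \ldots, n.$$
One can see that each $F^0_k[f]$ is a weighted average of the values $f(x)$, where the weights are the membership degrees $A_k(x)$. The inverse F-transform of $f$ is the function
$$\hat{f}(x) = \sum_{k=0}^{n} F^0_k[f]\,A_k(x), \qquad x \in [a, b],$$
which approximates $f$. All the details and full proofs can be found in [20,21].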
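For a time series observed at integer time points, the integrals are naturally replaced by sums. The following sketch assumes a triangular h-uniform partition with integer $h$ and nodes $c_k = kh$, and computes each component as the weighted average $\sum_t X(t)A_k(t) / \sum_t A_k(t)$; the function names are chosen here for illustration and do not come from the cited works.

```python
def tri(x, c, h):
    """Triangular membership function with peak at node c and half-width h."""
    return max(0.0, 1.0 - abs(x - c) / h)

def f0_direct(series, h):
    """Zero-degree components F_k over the nodes c_k = k*h, k = 0..n."""
    p = len(series) - 1          # time points are 0..p; p is assumed divisible by h
    comps = []
    for k in range(p // h + 1):
        w = [tri(t, k * h, h) for t in range(p + 1)]
        comps.append(sum(x * wt for x, wt in zip(series, w)) / sum(w))
    return comps

def f0_inverse(comps, h, length):
    """Inverse F-transform: sum_k F_k * A_k(t) for t = 0..length-1."""
    return [
        sum(F * tri(t, k * h, h) for k, F in enumerate(comps))
        for t in range(length)
    ]

# Example: a linear series on t = 0..12 with h = 3 (nodes 0, 3, 6, 9, 12).
series = [2.0 + 0.5 * t for t in range(13)]
components = f0_direct(series, 3)
smoothed = f0_inverse(components, 3, len(series))
```

For this linear series, the inner components coincide with the values at the nodes (3.5, 5.0, 6.5), whereas the two edge components are shifted because only half of the membership function is available there.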

Higher Degree Fuzzy Transform
The components $F^0_k[f]$ of the zero-degree F-transform (in the sequel, we will write $F^0$-transform) are real numbers. If we replace $F^0_k[f]$ by polynomials of the $m$-th degree, $m \ge 0$, we obtain a higher-degree F-transform ($F^m$-transform). A detailed description of this F-transform including full proofs of its properties can be found in [21]. It is important to note that the $F^1$-transform enables us to also estimate derivatives of the given function $f$ over an imprecisely specified area. The components of the $F^1$-transform are linear functions
$$F^1_k[f](x) = \beta^0_k[f] + \beta^1_k[f]\,(x - c_k)$$
with the coefficients $\beta^0_k, \beta^1_k$ given by
$$\beta^0_k[f] = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x)\,A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} A_k(x)\,dx}, \qquad \beta^1_k[f] = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x)\,(x - c_k)\,A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} (x - c_k)^2 A_k(x)\,dx}.$$
Note that $\beta^0_k[f] = F^0_k[f]$, cf. (3). The $F^1$-transform enjoys the properties stated in Theorem 1 (see [21]). In comparison with the $F^0$-transform, it is more precise. Let us remark that in general, we can define the F-transform of any degree $m$. Of course, for higher $m$, it is more complex but also more precise. In practice, it is sufficient to consider only $m \in \{0, 1, 2\}$.
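The coefficients of the first-degree components can be illustrated on sampled data. The sketch below is an assumption-laden discrete analogue: it uses triangular membership functions over integer nodes $c_k = kh$ with $h \ge 2$, takes $\beta^0_k$ as the weighted average of the values and $\beta^1_k$ as the weighted sum of $X(t)(t - c_k)$ divided by the weighted sum of $(t - c_k)^2$, and replaces integrals by sums. The function name `f1_coefficients` is hypothetical.

```python
def tri(x, c, h):
    """Triangular membership function with peak at node c and half-width h."""
    return max(0.0, 1.0 - abs(x - c) / h)

def f1_coefficients(series, h):
    """Return (beta0, beta1) for the inner nodes c_k = k*h, k = 1..n-1."""
    n = (len(series) - 1) // h   # len(series) - 1 is assumed divisible by h
    beta0, beta1 = [], []
    for k in range(1, n):
        c = k * h
        ts = range(c - h, c + h + 1)          # time points under A_k
        w = [tri(t, c, h) for t in ts]
        beta0.append(sum(series[t] * wt for t, wt in zip(ts, w)) / sum(w))
        num = sum(series[t] * (t - c) * wt for t, wt in zip(ts, w))
        den = sum((t - c) ** 2 * wt for t, wt in zip(ts, w))   # > 0 for h >= 2
        beta1.append(num / den)
    return beta0, beta1

# Example: X(t) = t^2 on t = 0..20 with h = 5 (inner nodes 5, 10, 15).
series = [float(t * t) for t in range(21)]
beta0, beta1 = f1_coefficients(series, 5)
```

For the parabola, the slope coefficients equal the derivative $2c_k$ at the inner nodes (10, 20, 30), while the value coefficients exceed $c_k^2$ by a constant offset (here 4), which illustrates that the averages are approximations depending on $h$.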
The following theorem is important for mining information from time series [24].
Theorem 2. If $f$ is four-times continuously differentiable on $[a, b]$, then for each $k = 1, \ldots, n-1$,
$$\beta^0_k[f] = f(c_k) + O(h^2), \qquad \beta^1_k[f] = f'(c_k) + O(h^2).$$
Thus, each $F^1$-transform component provides a weighted average of values of the function $f$ in the area around the node $c_k$ (7), and also a weighted average of slopes (22) of $f$ in the same area.
Lemma 1. If $\mathcal{A}$ is an h-uniform fuzzy partition and each basic function $A_k \in \mathcal{A}$ has a triangular shape, then the following can be proved:

Lemma 2. Let A be an h-uniform fuzzy partition with triangular membership functions and let
Proof. (a) Using (10) we have: (b) Using (11) we have:

Time Series and F-Transform
A time series is a stochastic process (see [25,26])
$$X : T \times \Omega \to \mathbb{R},$$
where $\Omega$ is a set of elementary random events and $T = \{0, \ldots, p\} \subset \mathbb{N}$ is a finite set whose elements are interpreted as time moments. Statistical models assume that each $X(t)$, $t \in T$, is a random variable having a specific distribution function. Fuzzy techniques are based on the following decomposition model:
$$X(t, \omega) = TC(t) + S(t) + R(t, \omega),$$
where $TC(t)$ is a trend-cycle that can be further decomposed into trend and cycle, i.e., $TC(t) = Tr(t) + C(t)$. The seasonal component $S(t)$ is a mixture of $r$ periodic functions with frequencies $\lambda_1, \ldots, \lambda_r$ and constants $P_j$, $j = 1, \ldots, r$. Without loss of generality, we can assume that the frequencies are ordered $\lambda_1 < \cdots < \lambda_r$. Note that $TC$ and $S$ are ordinary non-stochastic functions. Only $R$ is random noise, i.e., a stationary stochastic process such that the mean $E(R(t, \omega)) = 0$ and the variance $\mathrm{Var}(R(t, \omega)) < \sigma$, $t \in T$.
In practice, we always have only one realization of the time series at our disposal. Formally, this means that we fix $\omega \in \Omega$. Then, we write
$$X(t) = TC(t) + S(t) + R(t) \qquad (14)$$
and understand that $X$ is an ordinary real (or complex) valued function.
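Let us now assume that (14) hides the real time series $Z$ (15), which retains only the lower frequencies $\lambda_1, \ldots, \lambda_{r-s}$ of the seasonal component, where $0 < s < r$, and whose noise does not exceed $R$. In other words, we assume that the time series $X$ is "spoiled" by the frequencies $\lambda_{r-s+1}, \ldots, \lambda_r$ that are too high (they may correspond, e.g., to a certain unwelcome volatility) and that are removed in $Z$.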
The following theorem demonstrates the power of the F-transform for time series analysis.
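Theorem 3. Let $X(t)$ be a realization of the stochastic process (14) and $Z$ be its smoothed version (15). Let $\mathcal{A}$ be a fuzzy partition over the set of equidistant nodes (2) with the distance $h$. Let the fuzzy sets $A_k \in \mathcal{A}$ be $N$-times differentiable and put $d = h\lambda_{r-s}$.
(a) The corresponding inverse F-transform $\hat{X}$ of $X(t)$ estimates $Z(t)$, $t \in T$, with an error bounded in terms of $\omega(2h, Z)$, the modulus of continuity of $Z$ w.r.t. $2h$ (the modulus of continuity of a function $f : [a, b] \to \mathbb{R}$ is $\omega(h, f) = \sup\{|f(x) - f(y)| \mid |x - y| \le h,\ x, y \in [a, b]\}$).
(b) For the noise $\hat{R}$ obtained from $R$ by the F-transform, $E(\hat{R}(t)) = E(R(t)) = 0$ and $\mathrm{Var}(\hat{R}(t)) \le \mathrm{Var}(R(t)) < \sigma$.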
The details of the proof of this theorem can be found in [20,27,28,29]. The theorem holds for both the $F^0$- and the $F^1$-transform. It follows from this theorem that the F-transform filters out frequencies higher than a given threshold and reduces the noise $R$. In [27], it was even proved that if the correlation function of $R$ has a quick decay, then $\lim_{h \to \infty} \mathrm{Var}(\hat{R}(t)) = 0$, where $\hat{R}$ denotes the noise after the F-transform.
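The filtering effect can be observed on a toy example. The sketch below is not a proof of the theorem and rests on assumptions chosen for illustration: a linear trend $0.1t$ is "spoiled" by the alternating term $(-1)^t$, which stands in for a frequency that is too high, and the zero-degree components are computed over a triangular partition with $h = 2$ using sums instead of integrals.

```python
def tri(x, c, h):
    """Triangular membership function with peak at node c and half-width h."""
    return max(0.0, 1.0 - abs(x - c) / h)

def f0_components(series, h):
    """Zero-degree components over the nodes c_k = k*h (discrete weighted averages)."""
    p = len(series) - 1          # p is assumed divisible by h
    comps = []
    for k in range(p // h + 1):
        w = [tri(t, k * h, h) for t in range(p + 1)]
        comps.append(sum(x * wt for x, wt in zip(series, w)) / sum(w))
    return comps

# Trend 0.1*t plus an alternating high-frequency disturbance (-1)^t.
noisy = [0.1 * t + (-1) ** t for t in range(21)]
components = f0_components(noisy, 2)

# Inner components (k = 1..n-1) compared with the undisturbed trend at the nodes.
inner_errors = [
    abs(F - 0.1 * (2 * k)) for k, F in enumerate(components[1:-1], start=1)
]
```

In this constructed case, the weights 0.5, 1, 0.5 around each inner node cancel the alternating term, so the inner components reproduce the trend values at the nodes up to rounding.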

Remark 2.
The fuzzy partition $\mathcal{A}$ is not constructed over the discrete set of natural numbers $T = \{0, \ldots, p\}$ but over the interval of real numbers $[0, p]$.
It follows from Theorem 3 that $\hat{X} \approx Z$, i.e., we can estimate the real time series $Z$ with high fidelity. First, we set a proper fuzzy partition and compute the $F^0$-transform of $X(t)$. Then, we compute the inverse $\hat{X}$, which approximates the real time series $Z$. According to [22], the partition should be set depending on a natural number $q$ (in practice, it is sufficient to put $q \in \{1, 2\}$). The frequencies $\lambda_1, \ldots, \lambda_r$ can be found using the well-known periodogram; see [25,26].

Remark 3.
The edge components $F_0[X]$, $F_n[X]$ are distorted because only half of the corresponding basic functions are used. This problem can be solved in two ways: either we confine ourselves to the complete components $F_k[X]$ for $k = 1, \ldots, n-1$, or we artificially prolong $T$ to the left and right by $h$ and extrapolate the corresponding values, $X(t_{-i}) = X(t_i)$ for $i = 1, \ldots, h$ on the left side of $T$, and similarly on the right side.

Fuzzy Equality
Let $\langle [0, 1], \vee, \wedge, \otimes, \to, 0, 1 \rangle$ be an algebra of truth values where $\vee = \max$, $\wedge = \min$, $a \otimes b = \max(0, a + b - 1)$, and $a \to b = \min(1, 1 - a + b)$. This algebra is called the standard Łukasiewicz algebra (of course, there are also other algebras used as algebras of truth values for fuzzy set theory and fuzzy logic; however, the Łukasiewicz algebra has a prominent position because of its good properties in many respects and, therefore, we confine our theory to it only). The operations with fuzzy sets are defined using its operations [30,31].
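The two non-trivial operations of this algebra are easy to state in code. The sketch below assumes the standard Łukasiewicz definitions $a \otimes b = \max(0, a + b - 1)$ and $a \to b = \min(1, 1 - a + b)$; the function names are chosen here for illustration. It also checks, on a small grid, the adjunction property $a \otimes b \le c$ if and only if $a \le b \to c$, which links the two operations.

```python
def t_norm(a, b):
    """Lukasiewicz conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def residuum(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

# Grid of truth degrees 0, 1/8, ..., 1 (exactly representable as floats).
grid = [i / 8 for i in range(9)]

# Adjunction: a (x) b <= c  holds exactly when  a <= (b -> c).
adjunction_holds = all(
    (t_norm(a, b) <= c) == (a <= residuum(b, c))
    for a in grid for b in grid for c in grid
)
```

Note that the conjunction of two fairly high degrees can still be low, e.g., `t_norm(0.75, 0.5)` gives 0.25, which is why chaining approximate equalities weakens the result.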
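A binary fuzzy relation on $U$ is a fuzzy set $R : U \times U \to [0, 1]$. It is reflexive if $R(x, x) = 1$, symmetric if $R(x, y) = R(y, x)$, and transitive if $R(x, y) \otimes R(y, z) \le R(x, z)$ for all $x, y, z \in U$. It is separated if $R(x, y) = 1$ implies $x = y$ (fuzzy equality in the degree 1 reduces to the classical equality).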

Definition 2.
(i) A binary fuzzy relation $\doteq$ on $U$ is a fuzzy symmetry if it is reflexive and symmetric. (ii) A fuzzy symmetry is a fuzzy equality if it is also transitive.
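As a small worked example, the relation $x \doteq y = \max(0, 1 - |x - y|)$ on real numbers can be checked numerically for reflexivity, symmetry, and transitivity with respect to the Łukasiewicz conjunction. The relation and the test grid are assumptions chosen for illustration; they are not taken from the construction of the index itself.

```python
def t_norm(a, b):
    """Lukasiewicz conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def approx_equal(x, y):
    """Degree to which x equals y: 1 - |x - y|, truncated at 0."""
    return max(0.0, 1.0 - abs(x - y))

# Grid -2, -1.75, ..., 2 (multiples of 1/4 are exact in floating point).
grid = [i / 4 for i in range(-8, 9)]

reflexive = all(approx_equal(x, x) == 1.0 for x in grid)
symmetric = all(
    approx_equal(x, y) == approx_equal(y, x) for x in grid for y in grid
)
# Transitivity: (x = y) (x) (y = z) <= (x = z), a fuzzy form of the triangle inequality.
transitive = all(
    t_norm(approx_equal(x, y), approx_equal(y, z)) <= approx_equal(x, z)
    for x in grid for y in grid for z in grid
)
```

The transitivity check succeeds because $|x - z| \le |x - y| + |y - z|$; the same argument pattern appears whenever a similarity is defined as one minus a distance.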

Similarity of Time Series
As already mentioned, many kinds of similarity indexes have been introduced. Most of them are based on the distance between the values of time series (cf., e.g., [2,32,33,34]). The problem is that all such indexes are necessarily distorted by random noise. Consequently, the real shape of the time series is hidden. A solution to these difficulties can be given by the fuzzy transform.
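Let us consider two time series $X, Y$ with the same time domain $T$. Since the time series values can fall within very different ranges, we first normalize both time series to make their values comparable. The normalization is done w.r.t. the maximal values $\bar{X} = \max\{|X(t)| \mid t \in T\}$, $\bar{Y} = \max\{|Y(t)| \mid t \in T\}$. Then, we put
$$X_N(t) = \frac{X(t)}{\bar{X}}, \qquad (18)$$
$$Y_N(t) = \frac{Y(t)}{\bar{Y}}, \qquad t \in T. \qquad (19)$$
Let us choose two numbers $h_0, h_1 > 0$, compute the corresponding numbers of nodes $n_0$ and $n_1$ from $|T|$, and form an $h_0$-uniform fuzzy partition $\mathcal{A}$ and an $h_1$-uniform fuzzy partition $\mathcal{B}$ of the time domain. Let us compute the components of the zero- and first-degree F-transforms of the time series (18) and (19):
$$F^0[X_N] = (F^0_1[X_N], \ldots, F^0_{n_0-1}[X_N]), \qquad F^0[Y_N] = (F^0_1[Y_N], \ldots, F^0_{n_0-1}[Y_N]),$$
$$F^1[X_N] = (F^1_1[X_N], \ldots, F^1_{n_1-1}[X_N]), \qquad F^1[Y_N] = (F^1_1[Y_N], \ldots, F^1_{n_1-1}[Y_N]),$$
where the zero-degree components are computed on the basis of the fuzzy partition $\mathcal{A}$, and the first-degree ones on the basis of $\mathcal{B}$. Note that in the direct F-transforms above, we omitted the first and the last components. For further processing, we need only the coefficients $\beta^0_k$, $k = 1, \ldots, n_0 - 1$, and $\beta^1_k$, $k = 1, \ldots, n_1 - 1$, defined in (5) and (6). Based on that, we form for each time series $X_N, Y_N$ two new reduced time series, namely a time series of values and a time series of tangents:
$$\beta^0_X = (\beta^0_{X,1}, \ldots, \beta^0_{X,n_0-1}), \qquad \beta^1_X = (\beta^1_{X,1}, \ldots, \beta^1_{X,n_1-1}),$$
$$\beta^0_Y = (\beta^0_{Y,1}, \ldots, \beta^0_{Y,n_0-1}), \qquad \beta^1_Y = (\beta^1_{Y,1}, \ldots, \beta^1_{Y,n_1-1}).$$
Hence, $\beta^0_X$ and $\beta^0_Y$ are time series of average values of the respective time series $X_N, Y_N$ over the imprecisely specified areas $A_k \in \mathcal{A}$, $k = 1, \ldots, n_0 - 1$, and $\beta^1_X$ and $\beta^1_Y$ are time series of average values of tangents of the time series $X_N, Y_N$ over the imprecisely specified areas $B_k \in \mathcal{B}$, $k = 1, \ldots, n_1 - 1$.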
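The preprocessing described above (normalization followed by the two reduced series) can be sketched as follows. The sketch makes several assumptions that are not fixed by the text: triangular membership functions, integer $h_0, h_1 \ge 2$ with nodes at multiples of $h$, $n = (|T| - 1) / h$ rounded down, and sums in place of integrals. All function names are hypothetical.

```python
def tri(x, c, h):
    """Triangular membership function with peak at node c and half-width h."""
    return max(0.0, 1.0 - abs(x - c) / h)

def normalize(series):
    """Divide by the maximal absolute value so that all values lie in [-1, 1]."""
    m = max(abs(x) for x in series)
    return [x / m for x in series]

def reduced_values(series, h):
    """Weighted local averages (value coefficients) at inner nodes c_k = k*h."""
    n = (len(series) - 1) // h
    out = []
    for k in range(1, n):
        c = k * h
        ts = range(c - h, c + h + 1)
        w = [tri(t, c, h) for t in ts]
        out.append(sum(series[t] * wt for t, wt in zip(ts, w)) / sum(w))
    return out

def reduced_slopes(series, h):
    """Weighted local slopes (tangent coefficients) at inner nodes c_k = k*h."""
    n = (len(series) - 1) // h
    out = []
    for k in range(1, n):
        c = k * h
        ts = range(c - h, c + h + 1)
        w = [tri(t, c, h) for t in ts]
        num = sum(series[t] * (t - c) * wt for t, wt in zip(ts, w))
        den = sum((t - c) ** 2 * wt for t, wt in zip(ts, w))
        out.append(num / den)
    return out

# Hypothetical series on t = 0..30, reduced with h0 = 3 and h1 = 5.
x = [10.0 + 2.0 * t for t in range(31)]
x_n = normalize(x)
beta0_x = reduced_values(x_n, 3)    # 9 value coefficients
beta1_x = reduced_slopes(x_n, 5)    # 5 slope coefficients
```

Because the series is normalized first, the slope coefficients are expressed relative to the maximal absolute value of the series (here 2/70 per time step), which makes slopes of differently scaled series comparable.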
Definition 3. Let X, Y be time series (14). Then the index of similarity of two time series is the number where ϕ is a common normalization factor assuring that both |β 1 X,k |, |β 1 Y,k | ≤ 1 for all k = 1, . . . , n 1 − 1 and κ 0 , κ 1 are sensitivity constants.
The suggested similarity index thus considers not only distances between average values of time series but also distances between average values of tangents in the same areas. The constants κ 0 , κ 1 increase or decrease sensitivity of values and slopes of the compared time series. Clearly, if κ 0 > 1 then S(X, Y) is more sensitive to differences between the corresponding values of X, Y, while κ 1 > 1 does the same for their slopes.
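To make the role of the constants tangible, the following sketch implements one hypothetical index with the properties described above. It is explicitly not the formula of Definition 3: it is an assumed stand-in that subtracts from 1 the mean absolute difference of the value coefficients, weighted by $\kappa_0$, and the mean absolute difference of the slope coefficients divided by $\varphi$, weighted by $\kappa_1$, and truncates the result at 0.

```python
def similarity_sketch(b0x, b0y, b1x, b1y, kappa0, kappa1, phi):
    """Hypothetical similarity in [0, 1]; NOT the index of Definition 3.

    b0x, b0y: value coefficients of the two normalized series
    b1x, b1y: slope coefficients of the two normalized series
    phi:      common normalization factor for the slopes (assumed > 0)
    """
    d0 = sum(abs(u - v) for u, v in zip(b0x, b0y)) / len(b0x)
    d1 = sum(abs(u - v) for u, v in zip(b1x, b1y)) / (phi * len(b1x))
    return max(0.0, 1.0 - kappa0 * d0 - kappa1 * d1)

# Illustrative coefficient vectors (made-up numbers).
values_x, values_y = [0.5, 0.6], [0.5, 0.7]
slopes_x, slopes_y = [0.1, -0.1], [0.1, 0.1]

same = similarity_sketch(values_x, values_x, slopes_x, slopes_x, 2.5, 2.0, 0.2)
s_xy = similarity_sketch(values_x, values_y, slopes_x, slopes_y, 2.5, 2.0, 0.2)
s_yx = similarity_sketch(values_y, values_x, slopes_y, slopes_x, 2.5, 2.0, 0.2)
mild = similarity_sketch(values_x, values_y, slopes_x, slopes_y, 1.0, 0.5, 0.2)
```

With the larger constants, the opposite slopes at the second node already push the sketch to zero, whereas smaller constants leave a positive degree; this mirrors the described effect of increasing or decreasing sensitivity.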
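The normalization factor $\varphi$ can be specified, e.g., as follows. Let $X, Y, Z$ be time series (14) and put $\varphi_X = \max\{|\beta^1_{X,k}| \mid k = 1, \ldots, n_1 - 1\}$, and similarly $\varphi_Y$, $\varphi_Z$. Then $\varphi_{XY} = \max\{\varphi_X, \varphi_Y\}$ is a normalization factor common for $X, Y$, and $\varphi_{XYZ} = \max\{\varphi_X, \varphi_Y, \varphi_Z\}$ is a normalization factor common for $X, Y, Z$.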
The following is immediate. Proof. (a) If X = Y then, obviously, S(X, Y) = 1. The symmetry follows immediately from the properties of absolute value.
(b) Let ϕ be a common normalization factor for X, Y, Z. After rewriting, we obtain If the left-hand side is equal to 0, then the inequality is trivially fulfilled. By the assumption, we have to verify that which holds using the triangular inequality and the properties of ordered groups.

Demonstration
In this section, we apply the similarity index (26) to real data. In all cases, we used the $F^0$-transform with $h_0 = 3$ and the sensitivity constant $\kappa_0 = 2.5$, and the $F^1$-transform with $h_1 = 5$ and the sensitivity constant $\kappa_1 = 2$. These parameters were set according to expert opinion based on practical experience. The value $h_0 = 3$ was chosen so as not to distort the real course of the time series too much, since a longer $h$ leads to greater smoothing. Similarly, $h_1 = 5$ follows from the idea that the slope should be evaluated over a larger area since, otherwise, it may be unconvincing. The parameters $\kappa_0, \kappa_1$ were estimated by testing the behavior of the index. The conditions for setting all four parameters are still a topic of further investigation.
For the demonstration, we were inspired by the study [35]. In that paper, Junior et al. examine which indices are the strongest influencers among 83 international market indices. Their results suggest that France and Germany were among the top indices sending out information to other markets. Based on the correlation dependency, the Czech Republic was among the top information receivers. Therefore, we chose these international stock exchange indices, together with the Russian market, as a sample for demonstration. The data contain daily adjusted closing prices for 2016 (the four international stock exchange indices, namely Prague (PX), Paris (FCHI), Frankfurt (GDAXI), and Moscow (MOEX), were obtained from Yahoo Finance). Several notable events, such as the fall of oil prices and the Brexit announcement and vote, heavily affected the stock market and its interrelations in 2016. Therefore, it is valuable to investigate the underlying relations. Figure 2 shows the daily closing prices of the four above-mentioned markets. Due to different national holidays, each market contains some missing values, which were omitted for the similarity measurement. Using the suggested similarity index, we evaluate pairwise similarities between the stocks and compare them with known coefficients based on rank correlations, namely Spearman, Kendall, and Hoeffding's D. The results are summarized in Table 1. The results reveal several interesting relations. The most similar stock to Prague is the Paris market, while Moscow has the lowest similarity. Another interesting relation concerns the Frankfurt market: the pair (Paris-Frankfurt) has a higher similarity than (Prague-Frankfurt). These relations are also visible in Figure 2. In Figures 3-6, we demonstrate the behavior of the normalized prices for the mentioned similarity indices.
To see whether the similarity index (26) reacts to very dissimilar time series, we artificially distorted the Moscow market and also artificially inverted the price values of the Prague market. The comparisons (Prague-Moscow distorted) and (Prague-Prague inverted) are in Figures 7 and 8. As one expects, there is zero similarity between Prague and distorted Moscow. However, the similarity between Prague and its inverted version remained nonzero, which is caused mainly by the found similarities between their corresponding slopes.
The rank correlation coefficients measure a monotonic relationship. We chose them because they are less sensitive to outliers. Hoeffding's D correlation is a measure of a non-linear and non-monotonic relationship. Note that the sign of Hoeffding's D coefficient has no interpretation.
If the Spearman (and Kendall) coefficient is close to 0 and the Hoeffding coefficient is high, then the relationship is probably non-monotonic and non-linear. In our case, this did not happen. According to the statistical tests, all the correlation coefficients are non-zero (the p-values are practically zero) if the similarity index (26) is high. In the case S(Prague, Moscow-distorted) = 0, the correlation coefficients are statistically zero. However, when comparing (Prague-Prague inverted), the statistical tests show negative dependence, while S(Prague, Prague artificial inverse) = 0.34. This is correct because the correlation coefficients measure dependence, while (26) measures similarity. The slopes of both time series have opposite signs and so, they decrease the value of the similarity (26). However, when inspecting Figure 7, one can see similarity in values of both time series and, therefore, the index still has non-zero value.
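The distinction between dependence and similarity can be illustrated with a rank correlation computed on a series and its inverted copy. The sketch below implements the Spearman coefficient as the Pearson correlation of ranks (ties receive average ranks); the data are made-up numbers, not the market data used in this section.

```python
import math

def ranks(values):
    """Ranks 1..n; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up series, a monotone transform of it, and its inverted copy.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
cubed = [v ** 3 for v in x]
inverted = [max(x) - v for v in x]

rho_monotone = spearman(x, cubed)      # close to +1: same ordering
rho_inverted = spearman(x, inverted)   # close to -1: reversed ordering
```

A coefficient near -1 signals a strong (negative) dependence even though the two series run in opposite directions, which is exactly the situation where a dependence measure and a similarity measure are expected to disagree.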

Conclusions
In this paper, we developed a new method for measuring similarity between time series. The method is based on the application of the fuzzy transform. The index compares $F^0$-transform and $F^1$-transform components. While the former measures similarity between time series values after the highest frequencies have been removed and the noise reduced, the latter measures similarity between slopes of the time series in short local periods.
We demonstrated the application of our index to four real financial time series and two artificial ones. Experimental results confirm its ability to measure the similarity between time series.
Further work will focus on the extension of the method to time series of various lengths. Another direction is to judge the evolution of the similarity throughout the years. We also plan to extend this idea to measuring dependence between time series.