(Multiscale) Cross-Entropy Methods: A Review

Cross-entropy was introduced in 1996 to quantify the degree of asynchronism between two time series. In 2009, a multiscale cross-entropy measure was proposed to analyze the dynamical characteristics of the coupling behavior between two sequences on multiple scales. Since their introduction, many improvements and other methods have been developed. In this review, we present the state of the art in cross-entropy measures and their multiscale approaches.

The multiscale approach to entropy measures was proposed by Costa et al. to analyze the complexity of a time series [14]. In 2009, Yan et al. proposed a multiscale approach for cross-entropy methods to quantify the dynamical characteristics of coupling behavior between two sequences on multiple scale factors [15]. Other multiscale procedures have since been published with different cross-entropy methods [16,17]. Multiscale cross-entropy methods have recently been used in different research fields, including medicine [18,19,20,21], finance [6,9], civil engineering [22], and the environment [23].
Cross-entropy methods and their multiscale approaches are used to obtain information on the possible relationship between two time series. For example, Wei et al. applied percussion entropy to the amplitude of digital volume pulse signals and changes in R-R intervals of successive cardiac cycles to assess baroreflex sensitivity [18]. Results showed that the method is able to identify the markers of diabetes through the nonlinear coupling behavior of the two cardiovascular time series. Moreover, Zhu and Song computed cross-fuzzy entropy on a vibration time series to assess the bearing performance degradation process of a motor [13]. Results showed that the method detects the trend of the bearing degradation process over the whole lifetime. In addition, Wang et al. applied multiscale cross-trend sample entropy to analyze the asynchrony between air quality impact factors (fine particulate matter, nitrogen dioxide, ...) and the air quality index (AQI) in different regions of China [23]. Results showed that the degree of synchrony between fine particulate matter and AQI is

Cross-Approximate Entropy
Cross-approximate entropy (cross-ApEn), introduced by Pincus and Singer [1], quantifies the asynchrony between two time series. For two vectors u and v of length N, cross-ApEn is computed as:

$$\text{cross-ApEn}(m, r, N)(v\|u) = \Phi^{m}(r)(v\|u) - \Phi^{m+1}(r)(v\|u),$$

where

$$\Phi^{m}(r)(v\|u) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \log C_i^{m}(r)(v\|u)$$

and $C_i^{m}(r)(v\|u)$ is the number of sequences of m consecutive points of u that are approximately (within a resolution r) the same as sequences of the same length of v. One major drawback of this approach is that $C_i^{m}(r)(v\|u)$ must not be equal to zero, because its logarithm would be undefined. This is why cross-ApEn is not well suited to short time series. Furthermore, it is direction-dependent because $\Phi^{m}(r)(v\|u)$ is generally not equal to its direction conjugate $\Phi^{m}(r)(u\|v)$ [2]. The value of cross-ApEn computed from two signals can be interpreted as a degree of synchrony or mutual relationship.
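As an illustration, a minimal Python sketch of cross-ApEn is given below. It assumes the usual form cross-ApEn = Φ^m − Φ^(m+1), a Chebyshev (maximum) distance between sequences, and a count C normalized by the number of sequences, as in the definition of Pincus and Singer. The function name `cross_apen` is hypothetical, and the quadratic-time loops are chosen for readability rather than speed.

```python
import math


def cross_apen(u, v, m, r):
    """Sketch of cross-ApEn(m, r, N)(v||u) for two equal-length sequences.

    Assumptions (not taken from the review): Chebyshev distance between
    sequences and a match count normalized by the number of sequences.
    """
    n = len(u)
    if len(v) != n:
        raise ValueError("u and v must have the same length")

    def phi(dim):
        n_seq = n - dim + 1  # number of sequences of `dim` consecutive points
        total = 0.0
        for i in range(n_seq):
            # Count the sequences of v lying within r of the i-th sequence of u.
            matches = 0
            for j in range(n_seq):
                dist = max(abs(u[i + k] - v[j + k]) for k in range(dim))
                if dist <= r:
                    matches += 1
            if matches == 0:
                # log(0) is undefined: this is the drawback noted in the text.
                raise ValueError("no match found: cross-ApEn is undefined")
            total += math.log(matches / n_seq)
        return total / n_seq

    return phi(m) - phi(m + 1)
```

Swapping the arguments `u` and `v` generally changes the result, which reflects the direction dependence mentioned above. The explicit error makes the undefined case visible instead of returning a misleading number.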

Binarized Cross-Approximate Entropy
Binarized cross-approximate entropy (XBinEn), introduced by Škorić et al. [5] in 2017, is an evolution of cross-ApEn to quantify the similarity between two time series. It has the advantage of being faster than cross-ApEn. XBinEn encodes a time series divided into vectors of length m. For two vectors u and v of length N, the XBinEn algorithm follows these six steps:

1. Binary encoding series are obtained as:

The time lag t allows a vector decorrelation to be performed;

The probability $p_k^m(r)$ that a vector is within the distance r from a particular vector is estimated:

6. XBinEn is finally obtained as:

where $\Phi^{(m)}(r, N, t) = \sum_{k=0}^{2^m - 1} P_X^{(m)}(k) \cdot \ln\left(p_k^m(r)\right)$.
This method gives almost the same results as cross-ApEn for time series that are not short, while being computationally more efficient. Its main disadvantage is that it cannot identify small signal changes. XBinEn is suited to environments where processor resources and energy are limited, but it is not a substitute for cross-ApEn [5]. It is proposed for cases in which the cross-ApEn procedure cannot be applied. The value of XBinEn computed from two signals can be interpreted as a degree of relationship between a related pair of time series.

Cross-Sample Entropy
Cross-sample entropy (cross-SampEn) quantifies the degree of asynchronism of two time series. This method was introduced by Richman and Moorman in 2000 to overcome the limitations of cross-ApEn (see Section 2.1.1) [2]. Cross-SampEn is a conditional probability measure: it quantifies the probability that a sequence of m consecutive points (called a sample) of a time series u that matches a sequence of the same length of another time series v will still match it when the length of both sequences is increased by one point (m + 1). For two vectors u and v, cross-SampEn is computed as:

$$\text{cross-SampEn}(m, r, N)(v\|u) = -\ln\left(\frac{A^{m}(r)(v\|u)}{B^{m}(r)(v\|u)}\right),$$

where m is the sample length, N is the length of the vectors u and v, and $A^{m}(r)(v\|u)$ and $B^{m}(r)(v\|u)$ are, respectively, the probabilities that a sequence of u and a sequence of v will match for m + 1 and m points (within a tolerance r).
For two time series u and v of length N, cross-SampEn can also be written as:

$$\text{cross-SampEn}(m, r, N) = -\ln\left(\frac{n^{(m+1)}}{n^{(m)}}\right),$$

where $n^{(m)}$ represents the total number of sequences of m consecutive points of u that match with other sequences of m consecutive points of v.
The main difference between cross-ApEn and cross-SampEn is that cross-SampEn shows relative consistency whereas cross-ApEn does not. Unlike cross-ApEn, cross-SampEn is not direction-dependent. However, cross-SampEn sometimes yields undefined values for short time series, namely when no matching sequences are found. The value of cross-SampEn computed from two time series can be interpreted as a measure of similarity of the two time series.
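The counting form of cross-SampEn can be sketched in a few lines of Python. This is a minimal illustration, assuming a Chebyshev distance and N − m sequences for both lengths m and m + 1; the names `count_matches` and `cross_sampen` are hypothetical, and returning NaN for the undefined case is a design choice made here, not one prescribed by the review.

```python
import math


def count_matches(u, v, dim, r, n_seq):
    """Count pairs (i, j) whose sequences of length `dim` are within r."""
    total = 0
    for i in range(n_seq):
        for j in range(n_seq):
            if max(abs(u[i + k] - v[j + k]) for k in range(dim)) <= r:
                total += 1
    return total


def cross_sampen(u, v, m, r):
    """Sketch of cross-SampEn = -ln(n^(m+1) / n^(m)) for equal-length inputs."""
    n = len(u)
    if len(v) != n:
        raise ValueError("u and v must have the same length")
    n_seq = n - m  # assumption: same number of sequences for m and m + 1
    n_m = count_matches(u, v, m, r, n_seq)
    n_m1 = count_matches(u, v, m + 1, r, n_seq)
    if n_m == 0 or n_m1 == 0:
        return float("nan")  # undefined value, typical of short series
    return -math.log(n_m1 / n_m)
```

Because the matching pairs (i, j) are counted over both series at once, `cross_sampen(u, v, m, r)` and `cross_sampen(v, u, m, r)` give the same value, in line with the absence of direction dependence noted above.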

Modified Cross-Sample Entropy
Modified cross-sample entropy (mCSE), introduced by Yin and Shang in 2015, was developed to detect the asynchrony of financial time series [4]. Inspired by the generalized sample entropy proposed by Silva and Murta, Jr. [25], the authors adapted this method to cross-SampEn. The method combines cross-SampEn and nonadditive statistics. For two vectors u and v of length N, mCSE is computed as:

where m is the sample length, q is the entropic index, and $n_i^{(m)}$ is the number of times that the distance between vectors

The value of mCSE computed from two time series can be interpreted as a degree of synchrony between the two time series, and it can illustrate some intrinsic relations between them.

Modified Cross-Sample Entropy Based on Symbolic Representation and Similarity
Modified cross-sample entropy based on symbolic representation and similarity (MCSEBSS), introduced by Wu et al. in 2018, was developed to quantify the degree of asynchrony of two financial time series with various trends (stock markets from different areas) [6]. In comparison with cross-SampEn, this method reduces the probability of obtaining undefined entropies and is more robust to noise. For two vectors u and v of length N, MCSEBSS is computed as:

where m is the sample length and $n^{(m)}$ is the number of template matches obtained by comparing $s(u_m(i), v_m(j))$ and r. For

where count(i, j) is obtained by the function f defined as:

The parameter r must be fixed between $\frac{m-n}{m+1}$ and $\frac{m-n}{m}$, where n is the maximum number of zeros obtained with count(i, j) to consider u and v similar.
The value of MCSEBSS computed from two time series can be interpreted as a degree of asynchrony of the two time series. A low cross-entropy value indicates a strong synchrony between two signals.

Kronecker-Delta-Based Cross-Sample Entropy
The Kronecker-delta-based cross-sample entropy (KCSE), introduced by He et al. in 2018, was developed to define the dissimilarity between two time series [7]. KCSE is based on the Kronecker-delta function $\delta_{x,y}$ that returns 1 if two variables are equal and 0 otherwise. For two vectors u and v of length N, KCSE is calculated as:

where

The authors show that KCSE classifies financial data better than multidimensional scaling based on the Chebyshev distance [7]. The value of KCSE computed from two time series can be interpreted as a degree of irregularity between the two time series.

Permutation-Based Cross-Sample Entropy
The permutation-based cross-sample entropy (PCSE), introduced by He et al. in 2018, is quite similar to KCSE (see Section 2.2.4) [7]. Only a permutation step has been added. For two vectors u and v of length N, PCSE is calculated as:

The KrD function is defined in Section 2.2.4. The two vectors $permuX_m(i)$ and $permuY_m(i)$ are obtained by a permutation algorithm defined with the permutation entropy [26]. Video S1 shows an example of a permutation algorithm.
PCSE shows better results than KCSE for synthetic data (ARFIMA model) [7]. However, the two approaches give the same results for financial data [7]. The authors show that KCSE classifies financial data better than multidimensional scaling based on the Chebyshev distance [7]. The value of PCSE computed from two time series can be interpreted as a degree of irregularity between the two time series.
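The permutation step can be illustrated with the ordinal-pattern mapping used in permutation entropy. The sketch below is only an assumption about the convention (indices listed in ascending order of value, ties broken by position); the exact convention of [7] and [26] may differ, and the names `ordinal_pattern` and `permute_templates` are hypothetical.

```python
def ordinal_pattern(vector):
    """Return the indices of `vector` listed in ascending order of value.

    Ties are broken by position (an assumption made for this sketch).
    """
    return tuple(sorted(range(len(vector)), key=lambda k: (vector[k], k)))


def permute_templates(series, m):
    """Map every window of m consecutive points to its ordinal pattern."""
    return [ordinal_pattern(series[i:i + m])
            for i in range(len(series) - m + 1)]


# Example: windows of length 3 taken from a short series.
patterns = permute_templates([4, 7, 9, 10, 6, 11, 3], 3)
```

Working on ordinal patterns rather than on amplitudes makes the comparison depend only on the ordering of neighboring values.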

Cross-Trend Sample Entropy
Inspired by MCSEBSS (see Section 2.2.3), Wang et al. developed the cross-trend sample entropy (CTSE) to quantify the synchronism between two time series with strong trends [23]. For two time series u and v of length N, CTSE is calculated with the following four-step algorithm:

1. The two time series are symbolized as:

where $\tilde{u}$ and $\tilde{v}$ are, respectively, the trends of u and v obtained by polynomial fitting (linear, quadratic, or higher order).

2. The template vectors $u_m$ and $v_m$ are constructed as:

where $0 \le k \le m - 1$ and $1 \le i \le N - m$.

3. The similarity between $x_m(i)$ and $y_m(i)$ is calculated as:

where the i-th symbol vector $C_m$ is determined with f, a symbolic function between two template vectors $u_m$ and $v_m$, as:

4. CTSE is finally computed as:

where $n^{(m)}$ is obtained by comparing $d(x_m(i), y_m(j))$ within a tolerance r for $1 \le i \le N - m$.
CTSE has two advantages over MCSEBSS: it is more sensitive to differences in dynamical characteristics between two signals, and it works well with signals that contain trends (linear, quadratic, cubic, and sinusoidal) [23]. The value of CTSE computed from two time series can be interpreted as an indicator of the dynamical structure of two time series with potential trends.

Cross-Distribution Entropy
In 2018, Wang and Shang introduced the cross-distribution entropy (cross-DistEn) to quantify the complexity between two cross-sequences [9]. To generalize the standard statistical mechanics, the authors replaced the standard distribution entropy (DistEn) based on Shannon entropy by DistEn based on Tsallis entropy [9]. The authors showed that cross-DistEn illustrates the relationships between two vectors better than cross-SampEn does [9]. For two time series u and v of length N, cross-DistEn follows these four steps:

1. The state-space is reconstructed by building (N − m + 1) vectors X(i) and Y(i) with $X(i) = \{u(i), u(i+1), \ldots, u(i+m-1)\}$ and $Y(i) = \{v(i), v(i+1), \ldots, v(i+m-1)\}$, where m is the intended size of the vectors X(i) and Y(i);
2. The distance matrix $D = \{d_{i,j}\}$ is built, with $d_{i,j}$ being the Chebyshev distance between the vectors X(i) and Y(j), defined as:

$$d_{i,j} = \max_{0 \le k \le m-1} \left| u(i+k) - v(j+k) \right|;$$

3. The probability density is estimated by computing the empirical probability density function of the matrix D with the histogram approach. If the histogram has M bins, the probability of each bin is $P_t$, with $1 \le t \le M$;
4. The cross-distribution entropy based on the Tsallis entropy is computed as:

where q is the order of the Tsallis entropy and a is the logarithm base of the entropy computation.
The main advantage of cross-DistEn is that it is suited to short time series. With financial data, cross-DistEn illustrates the relationship between signals better than cross-SampEn [9]. The value of cross-DistEn computed from two time series can be interpreted as a degree of linkage of the two time series.
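The four steps can be sketched as follows. This is a hedged illustration only: the histogram uses equal-width bins, and the last step uses the textbook Tsallis entropy (1 − Σ P^q)/(q − 1), which is an assumption and may differ from the normalization and logarithm base a used in [9]. The function name `cross_disten` and the parameter name `n_bins` (standing for M) are hypothetical.

```python
import math


def cross_disten(u, v, m, n_bins, q):
    """Sketch of cross-DistEn with a textbook Tsallis entropy (assumption)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    n_vec = len(u) - m + 1  # step 1: number of vectors X(i) and Y(j)

    # Step 2: Chebyshev distance between every X(i) and every Y(j).
    distances = [max(abs(u[i + k] - v[j + k]) for k in range(m))
                 for i in range(n_vec) for j in range(n_vec)]

    # Step 3: empirical probabilities from an equal-width histogram.
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        idx = 0 if width == 0 else min(int((d - lo) / width), n_bins - 1)
        counts[idx] += 1
    probs = [c / len(distances) for c in counts if c > 0]

    # Step 4: Tsallis entropy of order q (Shannon entropy in the limit q = 1).
    if q == 1:
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)
```

All N − m + 1 vectors of each series contribute to the histogram, which is consistent with the suitability for short time series noted above.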

Permutation Cross-Distribution Entropy
The permutation cross-distribution entropy (PCDE), introduced by He et al. in 2019, is a variant of cross-DistEn (see Section 2.3.1) [10]. The permutation makes it possible to characterize fluctuations and prevents spatial distances from affecting the results. The PCDE algorithm is the same as that of cross-DistEn, detailed in Section 2.3.1, except that a step is added before step 2 to permute X(i) and Y(j) with the permutation algorithm mentioned in Section 2.2.5. The distance matrix is therefore constructed with the permuted vectors. The value of PCDE computed from two time series can be interpreted as a degree of dissimilarity between the two time series.

Cross-Conditional Entropy
Cross-conditional entropy (CCE), introduced by Porta et al. in 1999, quantifies the degree of coupling between two signals [8]. A corrected conditional entropy has been introduced to improve on the approximate entropy, which suffers from limitations when a finite number of samples is considered [27]. CCE is an adaptation of the corrected conditional entropy. For two signals u = {u(i), i = 1, ..., N} and v = {v(i), i = 1, ..., N}, CCE is computed as:

$$\mathrm{CCE}_{v/u}(L) = -\sum_{L-1} p(u_{L-1}) \sum_{i/L-1} p(v(i)/u_{L-1}) \log p(v(i)/u_{L-1}),$$

where L is the length of the pattern extracted to be compared, $p(u_{L-1})$ is the joint probability of the pattern $u_{L-1}(i) = (u(i), \ldots, u(i - L + 2))$, and $p(v(i)/u_{L-1})$ is the probability of the sample v(i) given the pattern $u_{L-1}(i)$. If a mixed pattern composed of one sample of v and L − 1 samples of u, $(v(i), u(i), \ldots, u(i - L + 2)) = (v(i), u_{L-1})$, is defined, and with the Shannon entropy $E(u_L) = -\sum_{L} p(u_L) \log p(u_L)$, CCE can also be described as:

$$\mathrm{CCE}_{v/u}(L) = E(v(i), u_{L-1}) - E(u_{L-1}).$$

For a limited number of samples, the estimate of CCE always decreases to zero as L increases. To solve this problem, a modification has been introduced as:

$$\mathrm{CCE}_{v/u}(L) = \widehat{\mathrm{CCE}}_{v/u}(L) + \mathrm{perc}_{v/u}(L) \cdot \hat{E}(v),$$

where $\mathrm{perc}_{v/u}$ is the ratio of mixed patterns found only once over the total number of mixed patterns, and $\widehat{\mathrm{CCE}}_{v/u}(L)$ and $\hat{E}(v)$ are, respectively, the estimates of $\mathrm{CCE}_{v/u}(L)$ and E(v) based on the considered limited dataset. CCE can be defined as a measure of the unpredictability of one signal when the second is observed, because it quantifies the amount of information carried by one signal that cannot be derived from the other. It is not fully a measure of synchronization. The main disadvantage of CCE is that it is not well suited to short time series.

Cross-Fuzzy Entropy
Cross-fuzzy entropy (C-FuzzyEn), introduced by Xie et al. in 2010 [3], is an adaptation of fuzzy entropy, introduced by Chen et al. [28], that quantifies the synchrony or similarity of patterns between two signals [3]. C-FuzzyEn is an improvement of cross-SampEn that is better suited to short time series and more robust to noise. For two time series u and v of length N, C-FuzzyEn is obtained with the following three-step algorithm:

1. The distance $d_{ij}^m$ between $X_i^m$ and $Y_j^m$ is computed as:

where m is the number of consecutive data points to compare, and v(i) are calculated as:

2. The synchrony or similarity degree $D_{ij}^m$ is computed as $D_{ij}^m = \mu(d_{ij}^m, n, r)$, where $\mu(d_{ij}^m, n, r)$ is the fuzzy function obtained as:

where r and n determine the width and the gradient of the boundary of the exponential function, respectively;
3. Finally, C-FuzzyEn is computed as:

The value of C-FuzzyEn computed from two time series can be interpreted as the synchronicity of patterns.

Joint-Permutation Entropy
Joint permutation entropy (JPE), introduced by Yin et al. in 2019, quantifies the synchronism between two time series. It is based on permutation entropy, which consists of comparing neighboring values of each point and mapping them to ordinal patterns to quantify the complexity of a signal [26]. For two signals u and v, JPE is computed as the Shannon entropy of the d! × d! distinct motif combinations $\{(\pi_i^{d,t}, \pi_j^{d,t})\}$:

$$\mathrm{JPE}(d, t) = -\sum_{i,j} p(\pi_i^{d}, \pi_j^{d}) \log p(\pi_i^{d}, \pi_j^{d}),$$

where d is the embedded dimension and $p(\pi_i^{d}, \pi_j^{d})$ is the joint probability of $\{(\pi_i^{d,t}, \pi_j^{d,t})\}$ appearing in $X_l^{d,t} = \{u_l, u_{l+t}, \ldots, u_{l+(d-1)t}\}$ and $Y_l^{d,t} = \{v_l, v_{l+t}, \ldots, v_{l+(d-1)t}\}$. It is defined as:

$$p(\pi_i^{d}, \pi_j^{d}) = \frac{\left\| \{ l : l \le T, \ \mathrm{type}(X_l^{d,t}) = \pi_i^{d,t}, \ \mathrm{type}(Y_l^{d,t}) = \pi_j^{d,t} \} \right\|}{T},$$

where T = N − (d − 1)t, type(·) corresponds to the map from pattern space to symbol space, and ||·|| corresponds to the cardinality of a set. The main advantages of JPE are its simplicity, its robustness, and its low computational cost. The value of JPE computed from two time series can be interpreted as a degree of correlation between the two time series [29].
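A short Python sketch may help to see how the joint probabilities are obtained. It assumes that ordinal patterns are encoded as index tuples sorted by value (ties broken by position) and that the natural logarithm is used; both are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter


def ordinal_pattern(vector):
    """Indices of `vector` in ascending order of value (ties by position)."""
    return tuple(sorted(range(len(vector)), key=lambda k: (vector[k], k)))


def joint_permutation_entropy(u, v, d, t):
    """Shannon entropy of the joint ordinal patterns of u and v."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    total = len(u) - (d - 1) * t  # T = N - (d - 1) t embedded vectors
    pairs = Counter()
    for l in range(total):
        x = [u[l + k * t] for k in range(d)]  # X_l^{d,t}
        y = [v[l + k * t] for k in range(d)]  # Y_l^{d,t}
        pairs[(ordinal_pattern(x), ordinal_pattern(y))] += 1
    # Only joint patterns that actually occur contribute to the sum.
    return -sum((c / total) * math.log(c / total) for c in pairs.values())
```

A single pass over the two series and a dictionary of counts are enough, which is in line with the low computational cost mentioned above.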

Multiscale Procedures
To study entropy or cross-entropy measures of time series across scales, a multiscale procedure can be used. In this part we detail, in chronological order, three multiscale methods: the coarse-graining, the time-shift, and the composite coarse-graining approaches.

Coarse-Graining Procedure
In 2002 Costa et al. introduced the coarse-graining procedure to analyze the complexity, defined by the analysis of the irregularity through scale factors [14]. This method is an improvement, better suited to biological time series, of the coarse-graining procedure introduced by Zhang [30]. This procedure has been used in multiscale entropy and cross-entropy methods [6,9,15,20,31,32,33]. For each scale factor, this procedure derives a set of vectors illustrating the system dynamics. For a monovariate discrete signal x of length N, the coarse-grained time series $y^{(\tau)}$ is calculated as:

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+1}^{j\tau} x_i,$$

where τ is the scale factor and $1 \le j \le N/\tau$. The length of the coarse-grained vector is N/τ. An example of the coarse-graining procedure is presented in Figure 1A.

Figure 1. (A) shows the coarse-graining procedure (modified from [34]), (B) shows the time-shift procedure, and (C) illustrates the composite coarse-graining procedure (modified from [35]).
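The coarse-graining procedure amounts to averaging consecutive, non-overlapping windows of τ points. A minimal Python sketch is given below; the function name `coarse_grain` is hypothetical, and dropping the trailing samples that do not fill a complete window is an assumption of this sketch.

```python
def coarse_grain(x, tau):
    """Average non-overlapping windows of `tau` points.

    Trailing samples that do not fill a complete window are dropped, so the
    output has floor(N / tau) points.
    """
    n_out = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_out)]


# Example: a 7-point signal at scale factors 2 and 3.
scale_2 = coarse_grain([1, 2, 3, 4, 5, 6, 7], 2)  # [1.5, 3.5, 5.5]
scale_3 = coarse_grain([1, 2, 3, 4, 5, 6, 7], 3)  # [2.0, 5.0]
```

At τ = 1 the procedure returns the original signal, and the series becomes shorter as τ grows.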

Time-Shift Procedure
Like the coarse-graining procedure, the time-shift procedure is used to decompose a signal through different scale factors and to perform a multiscale analysis. While the coarse-graining procedure averages the time series over several interval scales, the time-shift procedure applies time shifting to the time series. The main disadvantage of the coarse-graining procedure is the loss of pattern information hidden in the time series. To overcome this limitation, Pham used Higuchi's fractal dimension (HFD) [36] and proposed a new multiscale analysis [37]. The time-shift procedure illustrates the fractal dimension of a signal. This method has recently been used with entropy and cross-entropy measures [17,37,38,39]. HFD shows stable numerical results for stationary, non-stationary, deterministic, and stochastic time series [40]. For a monovariate discrete signal x of length N, the β time-shift signal $y_\beta^{(\tau)}$ is calculated as:

$$y_\beta^{(\tau)} = \left( x_\beta, x_{\beta+\tau}, x_{\beta+2\tau}, \ldots, x_{\beta+\lfloor (N-\beta)/\tau \rfloor \tau} \right).$$

For each time scale τ, τ time-shift time series are computed (β = 1, 2, ..., τ). An illustration of the time-shift procedure is presented in Figure 1B.
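The time-shift procedure can be sketched with list slicing. The sketch assumes the Higuchi-style construction in which the β-th series keeps every τ-th sample starting from the β-th one; the function name `time_shift` is hypothetical.

```python
def time_shift(x, tau):
    """Return the tau time-shifted series of x for scale factor tau.

    The series for beta = 1, ..., tau keeps x_beta, x_(beta + tau), ...
    (1-based indices), an assumption based on Higuchi's construction.
    """
    return [x[beta - 1::tau] for beta in range(1, tau + 1)]


# Example: a 7-point signal at scale factor 3 gives three series.
shifted = time_shift([1, 2, 3, 4, 5, 6, 7], 3)  # [[1, 4, 7], [2, 5], [3, 6]]
```

No averaging is involved, so every original sample appears in exactly one of the τ series.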

Composite Coarse-Graining Procedure
The coarse-graining procedure, introduced by Costa et al. [14], increases the variance of estimated entropy values at large scales. To overcome this limitation, Wu et al. introduced in 2013 a composite coarse-graining procedure [35]. This method has been used with entropy and cross-entropy measures [16,32]. For a monovariate discrete signal x of length N, the k-th composite coarse-grained time series $y_k^{(\tau)}$ is computed as:

$$y_{k,j}^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+k}^{j\tau+k-1} x_i,$$

where $1 \le j \le N/\tau$. For each time scale τ, τ composite coarse-grained time series are computed ($1 \le k \le \tau$). An illustration of the composite coarse-graining procedure is presented in Figure 1C.
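The composite procedure repeats the coarse-graining with τ different starting points. The Python sketch below is illustrative only: the function name `composite_coarse_grain` is hypothetical, and keeping only complete windows for each starting point is an assumption of this sketch.

```python
def composite_coarse_grain(x, tau):
    """Return the tau composite coarse-grained series of x.

    The k-th series (k = 1, ..., tau) averages non-overlapping windows of
    `tau` points starting at sample k; incomplete windows are dropped.
    """
    series = []
    for k in range(1, tau + 1):
        start = k - 1  # 0-based offset of the k-th series
        n_out = (len(x) - start) // tau
        series.append([sum(x[start + j * tau:start + (j + 1) * tau]) / tau
                       for j in range(n_out)])
    return series


# Example: a 7-point signal at scale factor 2 gives two series.
composite = composite_coarse_grain([1, 2, 3, 4, 5, 6, 7], 2)
```

The first series (k = 1) coincides with the coarse-grained series of Costa et al.; the additional series provide further estimates at the same scale.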

Generalization
Multiscale cross-entropy (MCE) methods consist of applying a cross-entropy measure for each scale factor obtained by a specific procedure. For each scale factor τ, MCE is computed as:

$$\mathrm{MCE}(\tau) = \frac{1}{k} \sum_{i=1}^{k} \mathrm{crossEn}\left( X_i^{(\tau)}, Y_i^{(\tau)} \right),$$

where $X^{(\tau)}$ and $Y^{(\tau)}$ are computed with a multiscale procedure (see Section 3), k is the number of time series that are generated by the multiscale procedure (k = 1 for the coarse-graining procedure and k = τ for the time-shift and the composite coarse-graining procedures), and crossEn is the cross-entropy method used (see Section 2). Table 2 shows the multiscale cross-entropy methods that can be generalized with Equation (35). Before the computation of MCE, a pre-treatment can be performed. For example, the asymmetric multiscale cross-SampEn (AMCSE) method [33] decomposes each signal into two, one for the positive trends and the other for the negative trends, before applying a coarse-graining procedure and cross-SampEn.
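The generalization can be read as a small higher-order function: a multiscale procedure produces k series per signal, a cross-entropy measure is applied to each pair, and the k values are averaged. The sketch below assumes this averaging; all names are hypothetical, and `mean_abs_difference` is only a placeholder standing in for crossEn (it is not an entropy measure).

```python
def coarse_grain(x, tau):
    """Coarse-graining procedure: one series per scale (k = 1)."""
    n_out = len(x) // tau
    return [[sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_out)]]


def time_shift(x, tau):
    """Time-shift procedure: tau series per scale (k = tau)."""
    return [x[beta::tau] for beta in range(tau)]


def mean_abs_difference(x, y):
    """Placeholder for crossEn (NOT an entropy): swap in a real measure."""
    n = min(len(x), len(y))
    return sum(abs(a - b) for a, b in zip(x, y)) / n


def multiscale_cross_entropy(u, v, tau, procedure, cross_en):
    """Average cross_en over the k series pairs produced at scale tau."""
    xs = procedure(u, tau)
    ys = procedure(v, tau)
    values = [cross_en(x, y) for x, y in zip(xs, ys)]
    return sum(values) / len(values)


# Example call with the placeholder measure at scale factor 2.
value = multiscale_cross_entropy([1, 2, 3, 4], [2, 3, 4, 5], 2,
                                 coarse_grain, mean_abs_difference)
```

Passing the procedure and the measure as arguments mirrors the way the methods of Sections 2 and 3 can be combined freely.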

Particular Cases
Some multiscale cross-entropy methods do not follow the generalization previously introduced. In this part we detail three particular methods, in chronological order: the adaptive multiscale cross-SampEn, the refined composite multiscale cross-SampEn, and the percussion entropy.

Adaptive Multiscale Cross-Sample Entropy
The adaptive multiscale cross-sample entropy (AMCSE), introduced by Hu and Liang in 2011, assesses the nonlinear interdependency between different visual cortical areas [41]. The method uses the multivariate empirical mode decomposition (MEMD), introduced by Rehman and Mandic [42], to decompose two time series into intrinsic mode functions (IMFs) that represent the oscillation modes embedded in the data. For two time series u and v, AMCSE is calculated with the following three-step algorithm:

1. The MEMD on u and v is performed to obtain N IMFs;
2. The scales of data are computed in two directions, fine-to-coarse $S_{f2c}^{\tau}$ and coarse-to-fine $S_{c2f}^{\tau}$, with the following two equations:

The two directions can be used separately or in tandem to reveal the underlying dynamics of complex time series;
3. For each scale factor τ, cross-SampEn (see Section 2.2.1) is applied between the two scales of data ($S_{f2c}^{\tau}$ and $S_{c2f}^{\tau}$) extracted from vectors u and v.

Refined Composite Multiscale Cross-Sample Entropy
Yin et al. introduced in 2016 the composite multiscale cross-sample entropy (CMCSE), which follows the generalization (see Section 4.1) with the composite coarse-graining procedure and cross-SampEn [16]. The main disadvantage of this method is that cross-SampEn generates undefined values when the number of matched samples is zero. To overcome this limitation, Yin et al. introduced the refined CMCSE (RCMCSE). This method leads to better results with short time series. For two time series u and v of length N, RCMCSE is computed with the following three-step algorithm:

1. Coarse-grained time series are obtained with the composite coarse-graining procedure detailed in Section 3.3;
2. For a scale factor τ, the numbers of matched vector pairs, $n_{k,\tau}^{m}$ and $n_{k,\tau}^{m+1}$, are calculated for all coarse-grained vectors;
3. For each scale factor τ, RCMCSE is computed as:

$$\mathrm{RCMCSE}(u, v, \tau, m, r) = -\ln\left( \frac{\sum_{k=1}^{\tau} n_{k,\tau}^{m+1}}{\sum_{k=1}^{\tau} n_{k,\tau}^{m}} \right),$$

where m is the dimension of the matched vector pairs and r is the distance tolerance for the matched vector pairs.
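The three steps can be sketched as follows. The sketch assumes that the match counts are summed over the τ composite series before the logarithm is taken, with a Chebyshev distance and complete windows only; the function names are hypothetical and the loops favor clarity over speed.

```python
import math


def composite_coarse_grain(x, tau):
    """Step 1: the tau composite coarse-grained series of x."""
    series = []
    for start in range(tau):
        n_out = (len(x) - start) // tau
        series.append([sum(x[start + j * tau:start + (j + 1) * tau]) / tau
                       for j in range(n_out)])
    return series


def count_matches(x, y, dim, r, n_seq):
    """Step 2: count vector pairs of length `dim` lying within r."""
    total = 0
    for i in range(n_seq):
        for j in range(n_seq):
            if max(abs(x[i + k] - y[j + k]) for k in range(dim)) <= r:
                total += 1
    return total


def rcmcse(u, v, tau, m, r):
    """Step 3: -ln of the ratio of the summed match counts (a sketch)."""
    total_m = 0
    total_m1 = 0
    for x, y in zip(composite_coarse_grain(u, tau),
                    composite_coarse_grain(v, tau)):
        n_seq = len(x) - m
        total_m += count_matches(x, y, m, r, n_seq)
        total_m1 += count_matches(x, y, m + 1, r, n_seq)
    if total_m == 0 or total_m1 == 0:
        return float("nan")  # undefined only if no series yields a match
    return -math.log(total_m1 / total_m)
```

Summing the counts before taking the logarithm means that a single composite series without matches no longer makes the result undefined, which is the motivation for the refinement.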

Percussion Entropy
Wu et al. introduced, in 2013, the multiscale small-scale entropy index ($\mathrm{MEI}_{SS}$), which is obtained by summing the values of entropy for the first five scale factors [43]. Percussion entropy, introduced by Wei et al. in 2019, allows one to quantify a percussion entropy index (PEI) [18]. The method was introduced to assess baroreflex sensitivity. PEI compares the similarity in tendency of change between two time series. This index has been compared to $\mathrm{MEI}_{SS}$. For two time series u and v of length N, PEI is computed with the following three-step algorithm:
PEI is calculated as:

where $\phi^{m} = \ln \sum_{\tau=1}^{n_\tau} P_\tau^{m}$ and $n_\tau$ is the number of scales to consider. Wei et al. [18] chose $n_\tau = 5$ in accordance with $\mathrm{MEI}_{SS}$. This algorithm is a generalization of the method developed by Wei et al. [18] for specific time series: amplitudes of successive digital volume pulse signals and changes in R-R intervals of successive cardiac cycles. To date, it has not been used to process other kinds of signals.

Conclusions
In this review we presented the state of the art in cross-entropy measures, multiscale procedures, and multiscale cross-entropy methods. Multiscale cross-entropy methods offer interesting perspectives for time series analysis. Furthermore, all the cross-entropy methods detailed in this review can be turned into multiscale cross-entropy methods with the multiscale procedures presented here.