Article

Evaluating Methods for Detrending Time Series Using Ordinal Patterns, with an Application to Air Transport Delays

by Felipe Olivares, F. Javier Marín-Rodríguez, Kishor Acharya and Massimiliano Zanin *
Instituto de Física Interdisciplinar y Sistemas Complejos (CSIC-UIB), Campus UIB, 07122 Palma, Spain
* Author to whom correspondence should be addressed.
Entropy 2025, 27(3), 230; https://doi.org/10.3390/e27030230
Submission received: 26 December 2024 / Revised: 17 February 2025 / Accepted: 20 February 2025 / Published: 23 February 2025

Abstract

Functional networks have become a standard tool for the analysis of complex systems, allowing the unveiling of their internal connectivity structure while only requiring the observation of the system’s constituent dynamics. To obtain reliable results, one (often overlooked) prerequisite involves the stationarity of an analyzed time series, without which spurious functional connections may emerge. Here, we show how ordinal patterns and metrics derived from them can be used to assess the effectiveness of detrending methods. We apply this approach to data representing the evolution of delays in major European and US airports, and to synthetic versions of the same, obtaining operational conclusions about how these propagate in the two systems.

1. Introduction

In the past few decades, functional complex networks have emerged as a powerful tool for the analysis of the dynamics of real-world systems. Many of these are complex, in the sense of being composed of multiple elements interacting in a non-linear way, such that the global dynamics cannot be inferred by the behavior of the individual constituents [1]. At the same time, these interactions are seldom observable; hence, the problem is how to describe the structure created by leveraging only limited information. The solution often entails reconstructing complex networks [2], where nodes represent the individual elements of the system and links between pairs of nodes are established whenever their dynamics are shown to be connected. In other words, the dynamics of the elements are assumed to be a function of these connections, and detecting the latter from the former becomes an inverse problem. This approach has been applied in multiple scientific fields, spanning from neuroscience [3,4,5] and genetics [6,7,8] to climate [9,10]. It has more recently been used to unveil the patterns behind the propagation of delays in air transport [11,12,13], a problem impacting the cost-efficiency and safety of the system, and with an important negative impact on the environment [14,15]. In general terms, the analysis is conducted by extracting time series representing the evolution of delays at each airport and then applying a statistical test evaluating the presence of correlations or causalities between pairs of time series. The results of these tests can also be represented as a complex network, allowing more detailed topological analyses [2,16,17].
One of the fundamental prerequisites for obtaining representative functional networks is the stationarity of the time series used in their reconstruction. From a statistical perspective, a time series is said to be stationary when its statistical properties do not depend explicitly on time, so that it has no predictable patterns; e.g., there must be no trends or seasonalities. Please note that any measure averaged over a non-stationary time series will not be accurate and that standard tests (including the t- and F-tests) and models (like the Autoregressive Moving Average, ARMA) may yield incorrect results. Intuitively, time series representing the evolution of delays at a given airport are expected to be highly non-stationary; delays are usually correlated with the traffic volume, such that they will be higher at noon and lower at night and early morning. This creates regular oscillations that can be misinterpreted as correlations, or even causalities, between pairs of time series.
To illustrate this latter point, the top left panel of Figure 1 reports the fraction of pairs of airports for which four metrics detect a flow of information, for airports in Europe and the US. Details about the data and the four metrics are presented in Section 2 below. Results indicate that the propagation of delays forms a complete (all-to-all) network. While not impossible, this result has been obtained using raw time series of delays and may thus be the consequence of their nonstationarity. Interestingly, when applying two classical stationarity tests, i.e., the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) [18] and the Augmented Dickey–Fuller [19] unit root tests, only the former detects the presence of regular trends. Additionally, adding a small amount of observational noise confounds both tests, while the four metrics still yield a large fraction of causality links; see the top right panel of Figure 1.
In order to further validate the role of stationarity, we create surrogate time series by maintaining the same values observed within each day but randomly shuffling the days in each time series—see the bottom left panel of Figure 1 for a graphical representation. In these surrogate series, values at the same position will correspond to delays at the same time but on different days; hence, any detected relationship must be due to the trends within them and not to true propagation instances. As reported in the bottom right panel, many relationships are still detected. In short, ways to detrend these time series and to evaluate the final performance are imperative, in order to obtain reliable functional networks.
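The day-shuffling surrogate described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `shuffle_days`, the `hours_per_day` parameter, and the toy data are hypothetical, and the series is assumed to be stored as a flat list covering a whole number of days.

```python
import random

def shuffle_days(series, hours_per_day=24, seed=None):
    """Return a surrogate in which whole days are permuted while the
    hourly values inside each day keep their original order."""
    if len(series) % hours_per_day != 0:
        raise ValueError("series length must be a whole number of days")
    rng = random.Random(seed)
    # Split the flat series into one block per day.
    days = [series[i:i + hours_per_day]
            for i in range(0, len(series), hours_per_day)]
    rng.shuffle(days)  # permute the days, not the hours
    return [value for day in days for value in day]

# Toy example: 3 "days" of 4 "hours" each.
toy = [10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33]
surrogate = shuffle_days(toy, hours_per_day=4, seed=1)
```

Because the within-day profile is untouched, any relationship detected between two such surrogates can only come from the shared daily trend.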
Looking at the results of Figure 1, one may be tempted to use these daily shuffled time series as a benchmark—i.e., if the number of detected links is low, it may be concluded that the detrending method is acceptable. This is, nevertheless, not a universal solution, as it depends both on the metric used to assess relationships and on the data. Again, referring to the bottom right panel of Figure 1, it can be appreciated that the Granger Causality (GC, light gray bars) yields substantially different results to the Mutual Information (MI, dark gray bars), even for the same data; and results are always worse for the US, probably due to the greater length of the time series. Consequently, when using GC on the European data, one may be misled to believe that the data are (at least partially) already detrended; but this would result in major biases when considering other metrics or datasets. In other words, obtaining few functional links with a given metric does not guarantee that the underlying time series are truly stationary; false connections may be found in different conditions.
In this contribution, we tackle the problem from an alternative perspective. Given a set of detrending methods, we ask whether independent metrics can be used to assess the number of temporal patterns in the detrended time series. The underlying hypothesis is that whenever these metrics detect no such patterns, the detrending process is effective, and hence, no spurious functional connections should be detected. Conversely, if patterns remain, the time series could not be assumed to have been effectively detrended, and the obtained functional connectivities should, therefore, not be trusted. Among the many possibilities available in the literature, we tackle this problem using ordinal patterns, i.e., a symbolic representation of time series values focusing on the relative amplitude of neighboring data points [20,21]. These patterns then allow the description of the serial dependence structure present in the studied time series in a simple way while providing advantages that include easy interpretability and low computational cost [22,23]. Two versions of the same are compared—respectively, a traditional (see Section 2.2.2) and a continuous one (Section 2.2.3), the latter being able to better incorporate amplitude information [24]. As previously hinted, we test this hypothesis using data representing the evolution of delays at major European and US airports (see Section 2.4) and synthetic versions of the same, with the aim of describing propagation patterns between them. The results presented in Section 3 depict a complex scenario in which detrending methods commonly used in other contexts here underperform. Conversely, when knowledge about the underlying system is included, highly stationary time series can be obtained, which can be used to extract interesting conclusions about the delay propagation process—as discussed in Section 4. 
In short, we here demonstrate that ordinal patterns-based metrics can provide a model-free methodology to evaluate and validate detrending algorithms.

2. Materials and Methods

In this section, we introduce the methods and data considered in this work. Following the structure of the analysis, these include the algorithms for detrending the data (see Section 2.1), the metrics to evaluate the correctness of the detrending process (Section 2.2), and the procedure for reconstructing functional connectivity (Section 2.3). We finally introduce, in Section 2.4, the real air transport data on which the methodology will be tested.

2.1. Detrending Methods

In order to remove the trends present in the considered time series, we use several methods, some representing standard statistical approaches and others developed using knowledge of the problem at hand, specifically the fact that delays have daily and weekly trends [25].
  • Identity (Ident). As a reference, in the following analyses, we include the results corresponding to not performing any detrending on the time series.
  • Delta (Delta). Simple detrending process based on evaluating the distance between the value at a given hour and the expected value observed for the same weekday and at the same hour: $\Delta x(t) = x(t) - \langle x(t + 24 \cdot 7 \cdot k) \rangle$, with $k \in \mathbb{Z}$ and $\langle \cdot \rangle$ representing the average.
  • Independent Component Analysis (ICA). The time series are detrended by subtracting the three main components detected throughout all the airports by an Independent Component Analysis [26]. While computationally more costly than other solutions, it presents the advantage of being able to detect trends with variable periods, provided they are shared by multiple airports.
  • Second Derivative (SecD). The second difference of the time series, i.e., $x''(t) = x(t) + x(t-2) - 2x(t-1)$. This approach is customary in the literature when no information about the nature of the underlying periodic trends is available.
  • Z-Score by day (ZScore24). Detrending based on a Z-Score, defined as: $z(t) = \left( x(t) - \langle \tilde{x}(t) \rangle \right) / \sigma_{\tilde{x}(t)}$. Here, $\tilde{x}(t)$ represents the set of values observed on different days at the same hour, i.e., $x(t + 24 \cdot k)$ with $k \in \mathbb{Z}$ for hourly data. In turn, $\langle \cdot \rangle$ and $\sigma_{\cdot}$ represent, respectively, the average and the standard deviation operators. The Z-Score encodes how much the observed value deviates from the expectation, in this case from the delay observed at the same hour on other days; unlike the Delta approach, it also takes into account the variability of the data.
  • Z-Score by week (ZScore724). Same as the ZScore24, but taking as reference the delays observed at the same hour on the same day of the week. $\tilde{x}(t)$ is thus here defined as $x(t + 24 \cdot 7 \cdot k)$ with $k \in \mathbb{Z}$. ZScore724 should, therefore, also detrend with respect to weekly patterns, e.g., weekdays vs. weekends.
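As a rough illustration of the period-based methods in the list above, the following sketch implements the Delta, SecD, and Z-Score variants for an hourly series stored as a plain list. The function names, the use of the population standard deviation, the inclusion of the current value in its own reference set (k = 0), and the handling of zero variance are all assumptions made for this example, not details taken from the study.

```python
from statistics import mean, pstdev

def second_difference(x):
    """SecD: x(t) + x(t-2) - 2 x(t-1), defined for t >= 2."""
    return [x[t] + x[t - 2] - 2 * x[t - 1] for t in range(2, len(x))]

def _reference(x, t, period):
    """Values at the same phase as t, i.e. x(t + period * k) for all valid k."""
    return x[t % period::period]

def delta(x, period=24 * 7):
    """Delta: distance from the average at the same weekday and hour."""
    return [x[t] - mean(_reference(x, t, period)) for t in range(len(x))]

def zscore(x, period):
    """Z-Score against same-phase values (period=24: ZScore24, 24*7: ZScore724)."""
    out = []
    for t in range(len(x)):
        ref = _reference(x, t, period)
        sd = pstdev(ref)
        # Assumption: a constant reference set maps to 0 instead of dividing by 0.
        out.append((x[t] - mean(ref)) / sd if sd > 0 else 0.0)
    return out

# A perfectly periodic toy series is flattened to zeros by Delta.
periodic = [1, 2, 3, 4] * 3
flat = delta(periodic, period=4)
```

With real hourly data one would call `zscore(x, 24)` or `zscore(x, 24 * 7)`; the toy call uses a period of 4 only to keep the example short.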

2.2. Evaluating the Detrending Process

As outlined in the introduction, the quantity of information contained in the time series is here assessed through two metrics, the JSD and the COP, both based on the application of the concept of ordinal patterns. In what follows, we first introduce the latter one and then define the two metrics.

2.2.1. Ordinal Patterns in Time Series

Bandt and Pompe introduced an encoding scheme that maps a raw time series onto a corresponding sequence of symbols called ordinal patterns [27]. This scheme is simple, model-free, and strongly resilient to noise [28,29,30]. Several applications pioneered by Prof. Osvaldo Rosso have confirmed the success of this symbolization technique in the evaluation of information quantifiers [31,32,33]. The existence of forbidden [28,34] and missing ordinal patterns [35,36] has been exploited in areas ranging from finance [36,37], semiconductor lasers [38], atmospheric turbulence [36], and chaotic optoelectronic systems [39] to hydrology [36]. Furthermore, ordinal patterns have been used to quantify persistence [40], symmetry [40], irreversibility [41,42], and serial dependences [22,23] in time series.
Given a time series $X(t) = \{x_t;\ t = 1, \ldots, M\}$, sub-windows (or segments) of it, composed of $D$ values, can be mapped into vectors. Specifically, $D$ consecutive ($\tau = 1$) or non-consecutive ($\tau > 1$) values starting at time $t$ can be transformed into a vector $(x_t, x_{t+\tau}, \ldots, x_{t+(D-1)\tau})$ of dimension $D$. Each element of the previous vector is then replaced by its relative ranking, from zero for the smallest value up to $D-1$ for the largest one. Next, this vector is itself encoded as an ordinal pattern, representing the permutation $\pi_i$ of $\{0, 1, \ldots, D-1\}$ describing the relative amplitude (strength) of each element in the considered sub-window. Finally, all ordinal patterns, calculated for all possible sub-windows, are synthesized into a probability distribution $\#(\pi_i)$, usually normalized by the total number of ordinal patterns $M - (D-1)\tau$. Please note that the condition $M \gg D!$ must be satisfied in order to obtain reliable statistics [27].
To illustrate how we build the ordinal pattern probability distribution in this study, let us consider an example comprising a synthetic hourly mean delay sequence over $n$ days, as shown in the left panel of Figure 2. For a pattern length $D = 3$, we extract a set of ordinal patterns for each day, representing the ordinal information of the first $D$ hours. After normalizing by $n$, a probability distribution is obtained. By repeating this procedure with sliding windows of length $D$ and overlap of $D - 1$, we can generate a set of 24 distributions representing the ordinal probabilities of each hour of the $n$ days. Please note that for the 24th hour, we consider the first 2 h of the following day. We follow the recipe suggested in [27] for breaking ties by adding a small amount of Gaussian noise (zero mean). This latter ingredient is important due to the inactivity periods, for which the hourly mean delay equals zero for several consecutive hours.
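The encoding just described can be sketched as follows. The function names are hypothetical, and ties are resolved here by order of appearance, which is a simplification chosen for brevity.

```python
from collections import Counter
from itertools import permutations

def ordinal_pattern(window):
    """Replace each value by its rank (0 = smallest, D-1 = largest)."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)

def ordinal_distribution(x, D=3, tau=1):
    """Relative frequency of each of the D! patterns in the series x."""
    n_windows = len(x) - (D - 1) * tau  # total number of sub-windows
    counts = Counter(
        ordinal_pattern([x[t + k * tau] for k in range(D)])
        for t in range(n_windows)
    )
    # Unobserved patterns are reported with probability zero.
    return {p: counts[p] / n_windows for p in permutations(range(D))}

dist = ordinal_distribution([4, 7, 9, 10, 6, 11, 3], D=3)
```

In this toy series the five sub-windows yield the patterns (0, 1, 2) twice, (1, 2, 0) twice, and (1, 0, 2) once; the remaining three patterns never appear.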
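A minimal sketch of this per-hour construction is given below, assuming the series is a flat list of hourly values. The function names, the noise amplitude `noise_sd`, and the choice to skip windows of the last day that would need a non-existent following day are assumptions of this example.

```python
import random
from collections import Counter

def ranks(window):
    """Rank of each value in the window (0 = smallest)."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    out = [0] * len(window)
    for rank, idx in enumerate(order):
        out[idx] = rank
    return tuple(out)

def hourly_distributions(x, D=3, hours=24, noise_sd=1e-6, seed=0):
    """One ordinal-pattern distribution per hour of the day, built across days."""
    rng = random.Random(seed)
    # Tiny zero-mean Gaussian noise breaks ties, e.g. runs of zero delay.
    noisy = [v + rng.gauss(0.0, noise_sd) for v in x]
    n_days = len(noisy) // hours
    result = []
    for h in range(hours):
        counts = Counter()
        for d in range(n_days):
            start = d * hours + h
            window = noisy[start:start + D]  # may spill into the next day
            if len(window) == D:  # last day: no following day to borrow from
                counts[ranks(window)] += 1
        total = sum(counts.values())
        result.append({p: c / total for p, c in counts.items()})
    return result

# Toy example: 3 "days" of 4 "hours" with an identical daily profile.
toy = [1, 2, 3, 0] * 3
per_hour = hourly_distributions(toy, D=3, hours=4)
```

Since every toy day is identical, each hour is dominated by a single pattern, which is the kind of cross-day regularity the metrics are meant to detect.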

2.2.2. Jensen–Shannon Divergence of Ordinal Patterns

The permutation Jensen–Shannon Distance (JSD in short) of ordinal patterns was recently introduced as a versatile metric for assessing the degree of similarity between the symbolic ordinal sequence statistics of two time series [43]. This metric relies on evaluating the similarity between the ordinal pattern probability distributions $P = \{p_1, \ldots, p_N\}$ and $Q = \{q_1, \ldots, q_N\}$ associated with the two time series under analysis, using the Jensen–Shannon divergence [44], providing a quantitative measure of their statistical resemblance,
$$D_{JS}(P, Q) = S\left(\frac{P + Q}{2}\right) - \frac{S(P)}{2} - \frac{S(Q)}{2}, \qquad (1)$$
where $S(P) = -\sum_{i=1}^{N} p_i \ln p_i$ is the classical Shannon entropy. The JSD is hence obtained by calculating the square root of Equation (1), and its normalized version reads
$$\mathrm{JSD}(P, Q) = \sqrt{\frac{D_{JS}(P, Q)}{\ln 2}}. \qquad (2)$$
Higher values of the JSD suggest greater dissimilarity between the symbolic representations of two time series, while lower values indicate closer similarity. Intuitively, signals originating from the same underlying dynamics are expected to yield small ordinal distance values approaching zero but not exactly zero due to finite-size effects [43]. Additionally, it allows for easy hypothesis testing regarding the dynamical nature of any given time series by calculating its JSD relative to reference time series generated in alignment with a null model or surrogate sequences [43,45].
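For concreteness, the normalized distance can be computed from two probability vectors as in the sketch below. The function names are hypothetical, and the clamp to zero is only a guard against tiny negative values caused by floating-point rounding.

```python
import math

def shannon_entropy(p):
    """S(P) = -sum p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence, normalized by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    d_js = shannon_entropy(m) - shannon_entropy(p) / 2 - shannon_entropy(q) / 2
    return math.sqrt(max(d_js, 0.0) / math.log(2))

same = jensen_shannon_distance([0.5, 0.5], [0.5, 0.5])       # identical
disjoint = jensen_shannon_distance([1.0, 0.0], [0.0, 1.0])   # no overlap
```

Identical distributions give a distance of zero, while distributions with disjoint support reach the upper bound of one.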
In the present analysis, we calculate the JSD between each of the 24 ordinal pattern distributions (one per hour, as illustrated in Figure 2) and their shuffled surrogates. These randomized sequences are constrained realizations that satisfy the null hypothesis, i.e., stationarity and the absence of temporal correlations across days. Moreover, in such a way, finite-size and amplitude distribution effects are taken into account. Finally, to obtain one value for each airport, we average the JSD over the 24 h.

2.2.3. Continuous Ordinal Patterns

Continuous Ordinal Patterns [24] (COPs) are a recently proposed modification of the original idea of ordinal patterns previously described, in which the output, instead of being discrete (i.e., a single ordinal pattern), is continuous [24]. Specifically, given a COP and a sub-window of the time series, this approach entails calculating the distance between both. When compared to classical ordinal patterns, this yields the advantage of naturally including the (local) amplitude information of the time series under analysis. On the other hand, the pattern moves from being an output of the analysis to being an input of the same. A brief discussion of the COP methodology is reported here for the sake of completeness; the interested reader can find additional details in [24].
We start by defining a continuous ordinal pattern $\pi$ as a set of $D$ values $\pi = (\pi_0, \pi_1, \ldots, \pi_{D-1})$ normalized in the range $[-1, 1]$. $D$ is here the embedding dimension, and it has the same meaning as the embedding dimension of ordinal patterns. We further denote by $s$ the sub-window of length $D$ of the original time series $X$ that is currently under analysis; note that this segment is also normalized in the range $[-1, 1]$. Given both $\pi$ and $s$, we can then define a distance $\phi_\pi$ assessing how well the former represents the evolution of the data in the latter:
$$\phi_\pi = \frac{1}{2D} \sum_{i=1}^{D} d_i = \frac{1}{2D} \sum_{i=1}^{D} |\pi_i - s_i|.$$
Here, the subscript $i$ denotes the $i$th element. Please note that $\phi_\pi = 0$ implies that $s$ and $\pi$ are exactly equal, or, in other words, that the pattern $\pi$ is a perfect representation of the dynamics within the sub-window $s$. On the other hand, values of $\phi_\pi$ close to one imply that $s$ and $\pi$ have substantially different dynamics. $1 - \phi_\pi$ is then a metric assessing how well the pattern $\pi$ represents the dynamics of the time series, and this can further be averaged over all possible segments of length $D$ of the analyzed time series to quantify how important $\pi$ is to understand its dynamics. Hence, the larger the value of $1 - \phi_\pi$ (and the smaller $\phi_\pi$ is), the more the pattern $\pi$ is present in the data.
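The distance defined above can be sketched as follows. Two points are assumptions of this example rather than details stated in the text: the normalization to [-1, 1] is taken to be a linear min-max rescaling, and a flat window (all values equal) is mapped to zeros. The function names are hypothetical.

```python
def normalize(values):
    """Linearly rescale values to the range [-1, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumption: flat windows map to zeros
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

def phi(pattern, window):
    """Distance between a continuous ordinal pattern and one sub-window."""
    s = normalize(window)
    D = len(pattern)
    return sum(abs(p - v) for p, v in zip(pattern, s)) / (2 * D)

def mean_phi(pattern, x):
    """Average of phi over all consecutive sub-windows of length D."""
    D = len(pattern)
    values = [phi(pattern, x[t:t + D]) for t in range(len(x) - D + 1)]
    return sum(values) / len(values)

rising = phi((-1.0, 0.0, 1.0), [10, 20, 30])   # pattern matches the window
falling = phi((1.0, 0.0, -1.0), [10, 20, 30])  # pattern is the mirror image
```

A rising pattern applied to a rising window gives a distance of zero, while the mirrored pattern gives 2/3 for D = 3.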
Please note that, up to this point, the COP $\pi$ has been given as an input of the analysis; it is then necessary to define a way of finding the best $\pi$ for the problem at hand. In this study, we resort to a simple strategy previously proposed in Refs. [24,46], based on testing a large set (here, 250) of random patterns $\pi$. For each test, the distance between the values of $\phi_\pi$ obtained in the original time series and those in a randomly shuffled version of the same is estimated through a Kolmogorov–Smirnov two-sample test. Finally, the $\pi$ yielding the largest difference, and thus the pattern according to which the time series under analysis is farthest from a random sequence, is retained.
In synthesis, given a single time series $X$, the full process involves: (i) finding a COP of dimension $D$ for which the distance between $X$ and a randomly shuffled version of it is maximal by testing multiple random COPs; (ii) calculating the median of $\phi_\pi$ using such COP; and finally, (iii) taking the maximum $\phi_\pi$ obtained across the multiple COPs (in what follows denoted as $COP$ for simplicity) as an index for the presence of some non-trivial structures in the time series. The whole process is illustrated in the right panel of Figure 2, representing, from top to bottom: the original time series $X$; an arbitrary COP of size $D = 3$; the extraction and normalization of the first sub-window, also of length $D = 3$, and corresponding to the red box; and finally, the calculation of the complete $\phi_\pi$, in which the first value corresponds to the previous sub-window.
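The random-search step can be sketched as below. This is an illustrative reading of the procedure, not the reference implementation: random patterns are drawn uniformly and rescaled to [-1, 1], the Kolmogorov-Smirnov statistic is computed by hand as the largest gap between the two empirical distribution functions, and the function returns the selected pattern together with that statistic and the median of the distances. All function names and these choices are assumptions.

```python
import random
from statistics import median

def normalize(values):
    """Linearly rescale to [-1, 1]; flat inputs map to zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

def phi_series(pattern, x):
    """phi for every consecutive sub-window of length D."""
    D = len(pattern)
    return [sum(abs(p - v) for p, v in zip(pattern, normalize(x[t:t + D]))) / (2 * D)
            for t in range(len(x) - D + 1)]

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def best_cop(x, D=3, n_patterns=250, seed=0):
    """Pick the random pattern that best separates x from a shuffled copy."""
    rng = random.Random(seed)
    shuffled = x[:]
    rng.shuffle(shuffled)
    best_ks, best_pattern = -1.0, None
    for _ in range(n_patterns):
        pattern = normalize([rng.uniform(-1, 1) for _ in range(D)])
        ks = ks_statistic(phi_series(pattern, x), phi_series(pattern, shuffled))
        if ks > best_ks:
            best_ks, best_pattern = ks, pattern
    return best_pattern, best_ks, median(phi_series(best_pattern, x))

sawtooth = [t % 4 for t in range(200)]
pattern, ks, med = best_cop(sawtooth, n_patterns=50, seed=1)
```

In practice a library routine such as SciPy's two-sample KS test could replace the hand-written statistic; it is spelled out here only to keep the example dependency-free.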

2.2.4. Metric Normalization

In the previous sections, we have defined two ways of detecting the presence of patterns in time series and, hence, whether they can be considered random. Understanding these metrics entails an additional challenge: while they measure a distance from randomness, a completely random time series would also give positive distances due to finite-size effects. In order to solve this, both the $JSD$ and the $COP$ calculated from a given time series are here normalized according to the average values obtained by the same metrics in a large ensemble of surrogate (randomly shuffled) time series. In other words, the original time series are randomly shuffled 50 times; the $JSD$ and the $COP$ are calculated for each one of these surrogates and averaged, respectively obtaining $JSD_{rnd}$ and $COP_{rnd}$; finally, normalized versions of the metrics are defined as the ratios $JSD / JSD_{rnd}$ and $COP / COP_{rnd}$. Those normalized values will have an expected value of 1 in the case of random time series and will be greater than 1 whenever non-random structures are present.
To further validate the obtained results, we have also considered an alternative normalization method based on using both the average and the standard deviation of the values obtained in the surrogate time series to define a Z-Score; in what follows, these two will be, respectively, denoted as $JSD_Z$ and $COP_Z$. This provides a more precise quantification of the distance from randomness, especially in the case of short time series, for which the variability both in the ordinal pattern frequencies and in the COP is higher. The length of the time series considered in this study (see Section 2.4) is large enough to not impact the results, and the relationship between $JSD$ and $COP$ and their Z-Score counterparts is almost linear. Still, for the sake of generalizability, both normalizations will be evaluated below.
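Both normalizations follow the same recipe and can be sketched generically. In the example below, `metric` is any function mapping a series to a number; the `smoothness` function is a hypothetical stand-in used only to make the example runnable, not one of the two metrics of this work.

```python
import random
from statistics import mean, stdev

def surrogate_normalize(metric, x, n_surrogates=50, seed=0):
    """Return (ratio, z) of metric(x) against randomly shuffled surrogates."""
    rng = random.Random(seed)
    observed = metric(x)
    values = []
    for _ in range(n_surrogates):
        s = x[:]
        rng.shuffle(s)  # destroys temporal structure, keeps the amplitudes
        values.append(metric(s))
    mu, sd = mean(values), stdev(values)
    ratio = observed / mu        # expected to be close to 1 for random series
    z = (observed - mu) / sd     # Z-Score variant of the same comparison
    return ratio, z

def smoothness(x):
    """Stand-in metric (assumption): high when consecutive values are close."""
    steps = [abs(b - a) for a, b in zip(x, x[1:])]
    return 1.0 / (1.0 + mean(steps))

daily = [t % 24 for t in range(240)]  # strongly structured toy series
ratio, z = surrogate_normalize(smoothness, daily)
```

For the structured toy series the ratio is well above 1 and the Z-Score is large and positive, as expected when non-random structure is present.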

2.3. Assessing Functional Connectivity

Once the time series have been detrended, the next step in the analysis involves the detection of functional connectivity between pairs of them or, in the context of the present study, the detection of instances of delay propagation. Thus, given two time series $X$ and $Y$ for which we want to detect connectivity $X \rightarrow Y$, the analysis is conducted by applying the following four functional tests:
  • Rank Correlation (RC). Spearman’s Rank Correlation between the two analyzed time series, calculated over shifted time series $x(t)$, $y(t + \lambda)$, with $\lambda \in \{0, 1, \ldots, 5\}$, to account for the time required by delays to propagate. The $\lambda$ yielding the lowest p-value is the one selected.
  • Granger Causality (GC). The GC [47] is one of the best-known exponents of predictive causality [48] and assesses whether the inclusion of information about the driving element X helps predict the future dynamics of the driven element Y. As originally proposed, an autoregressive-moving-average (ARMA) model is used for the prediction. Two variants are constructed, forecasting Y by, respectively, introducing or not data about X’s past. Finally, the two models’ residuals are compared through an F-test, yielding a p-value indicating whether the presence of information about X is relevant—and, hence, whether a causality relationship is present.
  • Mutual Information (MI). MI is an information-theoretic measure that captures the amount of information shared between any two random variables. The Shannon information of $X$ and $Y$, respectively denoted as $H(X)$ and $H(Y)$, represents the corresponding amount of potential information or the degree of uncertainty [49]. MI quantifies how much of the uncertainty in $Y$ is reduced or explained after knowing the full information of $X$, i.e.,
    $$I(X : Y) = \sum_{y} \sum_{x} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},$$
    where $p(x, y)$ represents the joint probability distribution, while $p(x)$ and $p(y)$ denote the marginal probability distributions. MI is particularly interesting for investigating the relationship between two variables because, in contrast to cross-correlation, it is sensitive to non-linear dependencies. In this work, MI is estimated using the Kraskov–Stögbauer–Grassberger (KSG) method [50], which is well suited for continuous random variables following non-parametric (or unknown) distributions.
  • Transfer Entropy (TE). TE is also an information-theoretic measure that captures the amount of directional information flow from a source variable (e.g., $X$) to a target variable (e.g., $Y$) [51]. It measures the amount of information contained in the past states of a source process (i.e., $X^{-}$) about the future state of the target process (i.e., $Y$), given that the past states of the target (i.e., $Y^{-}$) are known:
    $$TE_{X \rightarrow Y} = \sum p(y_n, y_n^{-}, x_n^{-}) \log \frac{p(y_n \mid x_n^{-}, y_n^{-})}{p(y_n \mid y_n^{-})}.$$
    TE is a model-free directional measure that captures the causal relationship, in the Wiener sense, between two variables [52]. The KSG algorithm developed for MI has been adapted to compute the TE [53], and the same adaptation has been used in this work.
Please note that the result of applying these four tests is a p-value; in the case of the MI and the TE, this is obtained by comparing the measured information to what is obtained in a set of $10^3$ surrogate (randomly shuffled) time series. As a final step, the presence of a propagation link between two airports is accepted whenever such p-value is below a significance threshold $\alpha = 0.01$.
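The surrogate-based p-value used for MI and TE can be sketched generically. Since a KSG estimator is too long for a short example, the statistic below is the absolute Pearson correlation, a deliberately simple stand-in; the function names and the (k + 1) / (n + 1) correction, a common convention that avoids p-values of exactly zero, are likewise assumptions of this example.

```python
import math
import random

def abs_pearson(x, y):
    """Stand-in statistic (assumption): absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / math.sqrt(vx * vy)

def surrogate_p_value(stat, x, y, n_surrogates=1000, seed=0):
    """Fraction of shuffled surrogates whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    exceed = 0
    for _ in range(n_surrogates):
        xs = x[:]
        rng.shuffle(xs)  # shuffling the source destroys any real coupling
        if stat(xs, y) >= observed:
            exceed += 1
    return (exceed + 1) / (n_surrogates + 1)

# Toy pair: y is a noisy copy of x, so a link should be detected.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [v + 0.1 * rng.gauss(0, 1) for v in x]
p = surrogate_p_value(abs_pearson, x, y)
link_detected = p < 0.01  # significance threshold alpha = 0.01
```

Any of the four functional metrics could be passed as `stat`, provided it returns a value that grows with the strength of the relationship.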

2.4. Data on Airport Dynamics

In order to validate the analysis here proposed on a real-world scenario, we consider two complementary data sets describing the hourly evolution of delays in the top-50 airports in Europe and the US. Information has, respectively, been obtained from EUROCONTROL’s R&D Data Archive, a public repository of European historical flights made available for research purposes and freely accessible at https://www.eurocontrol.int/dashboard/rnd-data-archive, (accessed on 17 January 2024); and from the Reporting Carrier On-Time Performance database of the Bureau of Transportation Statistics, U.S. Department of Transportation, freely accessible at https://www.transtats.bts.gov, (accessed on 17 January 2024). Please note that both data sets have different temporal scopes due to limitations at source: while data are limited for the EU to four months (i.e., March, June, September, and December) of five years (2015–2019), the US dataset includes all months for the same five years. A list of the airports included in the study, alongside some basic statistics on operations, is reported in Table A1 and Table A2. From these raw data, arrival delay time series have been extracted, calculated as the difference between the actual and scheduled landing times of each flight, and averaged at each destination airport and each hour of the day.
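The extraction of hourly arrival delay series can be sketched as follows. The record layout (airport label, scheduled and actual landing times), the use of seconds as unit, and the choice to bin each flight by its scheduled hour are assumptions made for illustration; neither source database is accessed here.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean_delay(flights):
    """Average arrival delay (seconds) per (airport, hour) bin.

    `flights` is an iterable of (airport, scheduled, actual) tuples with
    datetime landing times. Binning by scheduled hour is an assumption.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for airport, scheduled, actual in flights:
        hour = scheduled.replace(minute=0, second=0, microsecond=0)
        totals[(airport, hour)] += (actual - scheduled).total_seconds()
        counts[(airport, hour)] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical records for two airports labelled "A" and "B".
flights = [
    ("A", datetime(2019, 9, 1, 10, 5), datetime(2019, 9, 1, 10, 15)),
    ("A", datetime(2019, 9, 1, 10, 40), datetime(2019, 9, 1, 10, 40)),
    ("B", datetime(2019, 9, 1, 11, 0), datetime(2019, 9, 1, 10, 55)),
]
delays = hourly_mean_delay(flights)
```

Early arrivals produce negative delays, as for airport "B" in the toy data.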
It is interesting to note that the two considered data sets are highly heterogeneous. Firstly, as already discussed, the European one does not cover a continuous span of time. Secondly, flights and their associated delays are reported differently [54]: in the US case, flights correspond to those operated by certified US air carriers accounting for at least one percent of domestic scheduled passenger revenues (as opposed to all flights of the European case), and their associated delays are as reported by the airline (as opposed to by a central organization, i.e., EUROCONTROL in the European case).

3. Results

We start the analysis of the results by evaluating the fraction of functional links detected in the daily shuffled surrogate time series as a function of the two evaluation metrics and under different procedures for detrending. Please note that, due to the use of surrogate time series, no functional connectivity can be present; hence, those links that are detected represent how much the functional metrics are misled by the residual nonstationarity of the time series—this fraction is therefore called “confusion” in what follows.
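Given the p-values obtained on day-shuffled surrogates, the confusion reduces to a simple fraction. The dictionary layout and the airport labels below are hypothetical.

```python
def confusion(p_values, alpha=0.01):
    """Fraction of airport pairs with a (spurious) link, i.e. p-value < alpha.

    `p_values` maps ordered pairs (source, target) to the p-value obtained
    on surrogate data, where no true propagation can exist.
    """
    links = [p < alpha for (src, dst), p in p_values.items() if src != dst]
    return sum(links) / len(links)

# Hypothetical p-values for four ordered pairs of airports.
toy_p_values = {
    ("A", "B"): 0.001,
    ("B", "A"): 0.200,
    ("A", "C"): 0.005,
    ("C", "A"): 0.500,
}
toy_confusion = confusion(toy_p_values)
```

Here two of the four pairs fall below the threshold, giving a confusion of 0.5.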
The top left panel of Figure 3 reports the evolution of the confusion as a function of the $JSD$, in a way that will be common to all panels in subsequent figures, e.g., Figure 3 and Figure 4. Specifically, each point represents the result obtained using a combination of detrending strategies and functional connectivity metrics; the former is indicated by the shape of the marker, and the latter by its color (see legends in the bottom part). Please note that, according to the initial hypothesis of this work, the more a time series is detrended, the lower the quantity of information in it detected by the $JSD$, and, consequently, the lower should be the fraction of times that the functional connectivity metrics become confused by the nonstationarity. In other words, one would expect a positive correlation between the two, or at least no points corresponding to a low $JSD$ and large confusion. As can be appreciated in Figure 3, this is not the case, with many points being located in the top left quadrant. In other words, even when the permutation pattern-based metric identifies no clear structures in the data (note the minimum $JSD \approx 1.5$), these still have residual nonstationarity that results in spurious causality relations.
The reason for this negative result is easily identifiable by recalling that permutation patterns are designed to assess the presence of temporal structures in the data at the cost of disregarding the amplitude of the same. To illustrate, the top right panel of Figure 3 reports the same results when the original time series of delays are daily normalized using a Z-Score; in other words, the segment of 24 h corresponding to one day and airport is transformed to have an average of zero and a standard deviation of one. The same results as a function of $JSD_Z$, divided according to European and US airports, are reported in the bottom panels of the same figure. An S-shaped curve readily emerges, suggesting that most of the seasonalities are lost using this second normalization process. However, it is worth noting that this assessment is conducted by relying on the detected functional connectivity, i.e., a posteriori.
A solution to this problem may come from the use of a modified version of the permutation pattern paradigm, taking into account the local amplitude structure, i.e., the concept of COP. Figure 4 then reports the results obtained as a function of $COP$, in a way similar to Figure 3. Specifically, the top left panel reports the confusion as a function of the $COP$ when networks are reconstructed with the raw time series; the value of $COP$ remains high, suggesting the presence of residual patterns. The same metric drops in the two following panels. Specifically, the top middle one corresponds to the daily amplitude normalization, i.e., the same as the right panel of Figure 3. Next, the top right panel corresponds to a normalization in which the value of the delay observed at one airport at one given hour is normalized, using a Z-Score, against the set of delays in all other airports at the same time. Please note that this second normalization accounts for the amplitude of delays across the whole system, as opposed to one single airport. As can be seen, in both cases, the value of the $COP$, at least for the Z-Score detrending methods (brown and gray markers), drops close to or below $2.0$; the residual number of functional relationships is also significantly reduced. Results for the latter normalization as a function of $COP_Z$, divided according to European and US airports, are reported in the bottom panels of the same figure.
In order to confirm the validity of these results, we tested the previous insights using a synthetic dataset constructed by following the expected behavior of delays. We started by calculating the average evolution of the hourly delays at London Heathrow across September 2019; see the solid black line in the left panel of Figure 5. From this profile, a synthetic time series was generated by adding a noise drawn from a distribution N(0.0, 420.0) that repeats every seven days, plus an additional noise drawn from a distribution N(0.0, 140.0). In other words, the time series of an airport is given by a repeated daily pattern, modulated by a weekly noise of ≈1/3 of the total signal amplitude, and with an additional random component with an amplitude of 1/3 of the previous one. A graphical representation of seven days is included in the left panel of Figure 5; see the thin colored lines. Finally, ten time series were created, representing the dynamics of ten airports across 30 days, each one including a random time shift to simulate, e.g., the fact that some airports may have their peak time at different hours of the day. Please note that these time series are highly non-stationary (indeed, the main component repeats every day) and that, therefore, functional connections are detected between them unless they are properly pre-processed. When the same detrending processes are applied to these time series, the results are as expected (see right panel of Figure 5); specifically, Z-Score724 and Z-Score24 are the two best approaches, in agreement with the way the data were synthesized. Additionally, the low values of confusion confirm that no false positives are generated.
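The generation process can be sketched as follows. Several details are assumptions, since the text does not fix them: the second parameter of N(·, ·) is read as a standard deviation, the weekly noise is taken as one value per hour of the week and drawn independently for each airport, the time shift is implemented as a circular shift of 0 to 23 hours, and the daily profile is a made-up placeholder for the Heathrow average, which is not reproduced here.

```python
import numpy as np

HOURS = 24

def synthetic_airport(profile, n_days, rng, weekly_sd=420.0, extra_sd=140.0):
    """Daily profile + noise repeating every 7 days + independent noise, time-shifted."""
    profile = np.asarray(profile, dtype=float)
    n = n_days * HOURS
    t = np.arange(n)
    daily = profile[t % HOURS]                               # same profile every day
    weekly_noise = rng.normal(0.0, weekly_sd, size=7 * HOURS)
    weekly = weekly_noise[t % (7 * HOURS)]                   # repeats every seven days
    extra = rng.normal(0.0, extra_sd, size=n)                # additional random component
    shift = int(rng.integers(0, HOURS))                      # airport-specific peak hour
    return np.roll(daily + weekly + extra, shift)

# Placeholder daily profile (hypothetical values, arbitrary units).
profile = 600.0 + 400.0 * np.sin(np.pi * np.arange(HOURS) / HOURS) ** 2
rng = np.random.default_rng(42)
data = np.array([synthetic_airport(profile, 30, rng) for _ in range(10)])
```

The resulting array has one row per synthetic airport and 720 hourly values per row, matching the ten airports and 30 days mentioned in the text.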
Given the previous results, one may be tempted to conclude that the solution is clear: the two variants of the Z-Score detrending, in conjunction with the Z-Score amplitude normalization, yield time series with sufficiently weak residual trends. This is, nevertheless, not the full picture, and two additional and complementary questions must be addressed. First, is a COP < 2 low enough? As discussed in Section 2.2.4, this value is equivalent to a Z-Score of ≈6, and hence to the presence of structures that are highly statistically significant. Second, does this detrending/normalization preserve enough information about the real functional relationships?
In order to answer these two questions, we calculated the number of functional links obtained when analyzing the real data, i.e., without any daily shuffling, for the four functional metrics, and using the combination of Z-Score724 detrending and Z-Score amplitude normalization. Results, reported in the left and central panels of Figure 6, indicate that a large number of functional links are still detected. The case of the US (central panel) is interesting in that a high functional link density is obtained, much higher than in the EU case, with MI even yielding a fully connected network, which may indicate that not all trends are actually removed. This nevertheless seems to be partly due to the much larger quantity of data available for the US; as described in Section 2.4, these include all months of the five considered years, as opposed to four months per year in the European dataset. When the US time series are pruned to match the days of the European dataset, the link density observed for the US drops substantially; see the black bars in the central panel of Figure 6. This is, of course, to be expected: the lower the quantity of available data, the more difficult it is to detect functional relationships, and hence the lower the obtained link density; see also the right panel of the same figure.
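To make the notions of link density and of daily shuffled surrogates concrete, the sketch below computes the fraction of airport pairs with a significant rank correlation, before and after shuffling days independently at each airport. It is an illustration under stated assumptions, not the authors' pipeline: the significance level `alpha`, the large-sample normal approximation used for the p-value of the rank correlation, and the toy data are all choices made here, and the other three metrics (GC, MI, TE) are not covered.

```python
import math
import numpy as np

def ranks(x):
    # Ordinal ranks; ties are broken by position, which is adequate for continuous data.
    return np.argsort(np.argsort(x)).astype(float)

def rank_correlation_pvalue(x, y):
    rho = np.corrcoef(ranks(x), ranks(y))[0, 1]
    z = rho * math.sqrt(len(x) - 1)            # large-sample normal approximation
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

def link_density(data, alpha=0.01):
    """Fraction of pairs of rows in `data` with a significant rank correlation."""
    n = data.shape[0]
    links = sum(
        rank_correlation_pvalue(data[i], data[j]) < alpha
        for i in range(n) for j in range(i + 1, n)
    )
    return links / (n * (n - 1) / 2)

def daily_shuffle(series, rng, hours_per_day=24):
    """Randomly reorder whole days, keeping the hours within each day intact."""
    days = np.asarray(series).reshape(-1, hours_per_day)
    return days[rng.permutation(days.shape[0])].ravel()

rng = np.random.default_rng(7)
profile = 400.0 * np.sin(np.pi * np.arange(24) / 24) ** 2        # shared daily trend (made up)
noise_only = rng.normal(0.0, 100.0, size=(10, 30 * 24))          # no trend, no coupling
trended = np.tile(profile, 30) + rng.normal(0.0, 100.0, size=(10, 30 * 24))
shuffled = np.array([daily_shuffle(s, rng) for s in trended])

confusion = link_density(shuffled)     # links that survive daily shuffling are spurious
baseline = link_density(noise_only)
```

In this toy setting the shuffled series share nothing but the daily profile, so any link detected among them is a false positive; this is the quantity referred to as confusion in the figures.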

4. Discussion and Conclusions

This contribution tackled two complementary questions: whether ordinal patterns can be used to evaluate the stationarity of time series and, if so, how suitable standard detrending methods are in the context of air transport delay data. The discussion of the results must, therefore, be addressed from this two-fold perspective.
On the one hand, ordinal patterns seem to be a good quantifier of the quantity of temporal structure in the data and, hence, of their non-stationarity. They work better than classical statistical tests [18,19] and may be the solution to some known problems these pose [55]. As frequently discussed in the literature, the original version of this metric disregards the amplitude of the time series and, therefore, cannot detect non-stationarities that are amplitude-based [56,57]. This can be improved by resorting to modified versions of the same that include local or global amplitude information. We have here considered Continuous Ordinal Patterns (see Figure 4); the reader should nevertheless take into account that other alternatives exist, including weighted permutation patterns [58], generalized patterns [59], or slope entropy [60]. In summary, any given detrending method can easily be evaluated using an ordinal-based approach, and this conclusion is of general applicability beyond the specific dataset considered here.
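The amplitude blindness of standard ordinal patterns can be checked directly. The sketch below uses plain ordinal patterns with an embedding dimension of 3 and the Jensen-Shannon divergence between pattern distributions; the dimension, the function names, and the comparison against a uniform distribution are assumptions made for illustration, and the code does not reproduce the exact JSD or COP metrics defined in Section 2.

```python
from itertools import permutations
import numpy as np

def ordinal_distribution(x, d=3):
    """Relative frequency of each ordinal pattern of length d in the series x."""
    x = np.asarray(x, dtype=float)
    index = {p: k for k, p in enumerate(permutations(range(d)))}
    counts = np.zeros(len(index))
    for t in range(len(x) - d + 1):
        pattern = tuple(np.argsort(x[t:t + d], kind="stable").tolist())
        counts[index[pattern]] += 1
    return counts / counts.sum()

def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def entropy(w):
        w = w[w > 0]
        return -np.sum(w * np.log2(w))
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

rng = np.random.default_rng(3)
x = rng.normal(size=500)
p_original = ordinal_distribution(x)
p_rescaled = ordinal_distribution(np.exp(x))       # monotone change of amplitude
p_trend = ordinal_distribution(np.arange(50.0))    # pure increasing trend
uniform = np.full(6, 1.0 / 6.0)
trend_divergence = jensen_shannon_divergence(p_trend, uniform)
```

Any strictly increasing transformation of the values leaves the pattern distribution untouched, whereas a temporal trend concentrates the probability on a few patterns and is therefore easily detected.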
On the other hand, we have evaluated a suite of detrending methods commonly used to reconstruct functional networks representing the propagation of delays in air transport, and the results depict a complex scenario. First, a detrending method alone is not enough to obtain stationary data; see the left panels of Figure 3 and Figure 4. This can be explained by taking into account the presence of confounding factors that may affect the observed values, both within the same day and across multiple airports of the system. To illustrate, consider a day with adverse weather throughout the continent. Delays will be higher than expected, both in a given airport for multiple hours (possibly the whole day) and across all the airports of the affected region. A detrending method alone cannot account for this, and its use would result in many functional links that do not necessarily correspond to true propagations. This problem is easily solved by incorporating an additional normalization of the values, either through time (right panel of Figure 3 and central panel of Figure 4) or through airports (right panel of Figure 4). In other words, the problem at hand requires a multivariate-style detrending, something not common in the literature [61]. It is worth noting that, to the best of our knowledge, the proposed normalizations have hitherto never been applied in this context.
On a negative note, we also observed that the use of this combination of detrending/normalization methods is not enough to ensure a complete filtering of the time series. Specifically, the COP, while lower than in the raw time series, is still quite high (see Figure 4 and Figure 7); this results in a link density > 0.1 for daily shuffled time series, and in suspiciously high connectivity for the raw data (see Figure 6). This is more prominent in the case of the US. Beyond the availability of more data, we hypothesize that this is a consequence of the larger geographical dispersion of this air transport system and of the presence of multiple time zones, which may hinder the Z-Score normalization across airports. Thus, when considering the problem of air transport delay propagations, the conclusion that can be drawn here is that better (possibly tailored) methods for data detrending are needed.
It is worth noting that some detrending approaches, which are common in the literature, yield results that are actually worse than performing no pre-processing; see, for instance, the cases of SecD in Figure 3 and ICA in Figure 4, respectively yielding higher JSD and COP. This may be due to underlying hypotheses that are not fulfilled by the system under analysis. To illustrate, ICA expects components in the signals to be synchronized across all of them [26]; yet, rush hours at airports may be different, especially when these are located in different time zones.
As a final point, some interesting operational conclusions can be drawn from Figure 6. Even when pruning the time series to include the same number of days, higher link densities are obtained for the US; in addition, this difference is especially notable for MI and TE. This indicates that the US air transportation system is more prone to propagate delays, possibly in a more non-linear and complex way. Please note that this may be caused by multiple factors. To illustrate, the EU strongly relies on strategic Air Traffic Flow Management (ATFM) regulations, which aim at avoiding airborne holdings, while the US equivalent Ground Delay Program (GDP) works on shorter time scales [62]. Additionally, delays are reported in slightly different ways, as these are calculated in Europe using the last-filed flight plans and thus already contain strategic ATFM adjustments [54]. Lastly, even the origin of the data may have an impact: while in Europe, delays are calculated by a central authority, in the US, they are reported by individual airlines, which may follow different standards for their estimation.
In conclusion, the detrending of real-world time series has been shown to be a complex problem in which the idiosyncrasies of the studied system have a non-negligible impact on the validity of the process. Ordinal patterns-based metrics can nevertheless be used to validate the detrending process by quantifying the remaining amount of temporal structures.

Author Contributions

Conceptualization, M.Z.; methodology, F.O., F.J.M.-R., K.A. and M.Z.; formal analysis, F.O., F.J.M.-R., K.A. and M.Z.; writing—original draft preparation, F.O., F.J.M.-R., K.A. and M.Z.; writing—review and editing, F.O., F.J.M.-R., K.A. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 851255). This work was partially supported by the María de Maeztu project CEX2021-001164-M, funded by the MICIU/AEI/10.13039/501100011033.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. The original data analyzed in the study are openly available at https://www.eurocontrol.int/dashboard/rnd-data-archive (accessed on 13 July 2023) and https://www.transtats.bts.gov (accessed on 24 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Information about European airports. As well as ICAO codes and official names, the table includes the average number of hourly departures and arrivals using the operations available in the dataset.
Rank | Code | Name | Departures | Arrivals
1 | EGLL | Heathrow Airport | 26.39 | 26.41
2 | LFPG | Charles de Gaulle Airport | 26.97 | 26.91
3 | EHAM | Amsterdam Airport Schiphol | 27.37 | 27.31
4 | EDDF | Frankfurt am Main Airport | 26.92 | 26.89
5 | LEMD | Adolfo Suárez Madrid-Barajas Airport | 21.93 | 21.90
6 | LEBL | Josep Tarradellas Barcelona-El Prat Airport | 17.77 | 17.76
7 | EDDM | Munich Airport | 22.13 | 22.14
8 | EGKK | Gatwick Airport | 15.83 | 15.79
9 | LIRF | Leonardo da Vinci-Fiumicino Airport | 17.17 | 17.17
10 | LFPO | Orly Airport | 12.96 | 12.94
11 | EIDW | Dublin Airport | 12.24 | 12.22
12 | LSZH | Zurich Airport | 14.15 | 14.11
13 | EKCH | Copenhagen Airport | 14.62 | 14.64
14 | LEPA | Palma de Mallorca Airport | 10.97 | 10.99
15 | LPPT | Lisbon Airport | 11.00 | 11.01
16 | ENGM | Oslo Airport, Gardermoen | 13.75 | 13.75
17 | EGCC | Manchester Airport | 10.84 | 10.82
18 | EGSS | London Stansted Airport | 10.01 | 10.00
19 | LOWW | Vienna International Airport | 13.91 | 13.89
20 | ESSA | Stockholm Arlanda Airport | 13.31 | 13.31
21 | EBBR | Brussels Airport | 12.54 | 12.51
22 | LIMC | Malpensa Airport | 10.49 | 10.49
23 | EDDL | Düsseldorf Airport | 11.96 | 11.94
24 | LGAV | Athens International Airport | 10.62 | 10.63
25 | EDDT | Berlin Tegel Airport | 10.18 | 10.17
26 | LEMG | Málaga Airport | 6.82 | 6.82
27 | EPWA | Warsaw Chopin Airport | 9.21 | 9.21
28 | LSGG | Geneva Airport | 9.63 | 9.63
29 | EDDH | Hamburg Airport | 8.16 | 8.16
30 | LKPR | Václav Havel Airport Prague | 7.80 | 7.81
31 | EGGW | Luton Airport | 6.84 | 6.85
32 | LHBP | Budapest Ferenc Liszt International Airport | 5.78 | 5.77
33 | EGPH | Edinburgh Airport | 6.87 | 6.86
34 | LEAL | Alicante-Elche Airport | 4.94 | 4.94
35 | LFMN | Nice Côte d’Azur Airport | 7.22 | 7.23
36 | LROP | Henri Coanda International Airport | 6.20 | 6.19
37 | EDDK | Cologne Bonn Airport | 7.27 | 7.26
38 | LIME | Orio al Serio International Airport | 4.69 | 4.68
39 | UKBB | Boryspil International Airport | 4.88 | 4.89
40 | EGBB | Birmingham Airport | 6.01 | 6.00
41 | LPPR | Porto Airport | 4.73 | 4.73
42 | EDDS | Stuttgart Airport | 6.45 | 6.45
43 | LIPZ | Venice Marco Polo Airport | 4.99 | 4.99
44 | LFLL | Lyon-Saint-Exupéry Airport | 6.30 | 6.30
45 | LICC | Catania-Fontanarossa Airport | 3.66 | 3.66
46 | LIRN | Naples Airport | 3.85 | 3.85
47 | EGPF | Glasgow Airport | 4.86 | 4.84
48 | LFBO | Toulouse-Blagnac Airport | 5.14 | 5.11
49 | LFML | Marseille Provence Airport | 5.21 | 5.20
50 | LIML | Linate Airport | 5.88 | 5.88
Table A2. Information about US airports. As well as IATA codes and official names, the table includes the average number of hourly departures and arrivals using the operations available in the dataset.
Rank | Code | Name | Departures | Arrivals
1 | ATL | Hartsfield-Jackson Atlanta International Airport | 41.63 | 41.61
2 | DEN | Denver International Airport | 25.41 | 25.38
3 | DFW | Dallas Fort Worth International Airport | 25.42 | 25.35
4 | LAX | Los Angeles International Airport | 23.76 | 23.77
5 | ORD | Chicago O’Hare International Airport | 31.42 | 31.37
6 | PHX | Phoenix Sky Harbor International Airport | 17.81 | 17.79
7 | MSP | Minneapolis-Saint Paul International Airport | 14.95 | 14.95
8 | CLT | Charlotte Douglas International Airport | 16.17 | 16.14
9 | SEA | Seattle-Tacoma International Airport | 14.88 | 14.88
10 | SFO | San Francisco International Airport | 18.77 | 18.77
11 | JFK | John F. Kennedy International Airport | 11.69 | 11.68
12 | IAH | George Bush Intercontinental Airport | 17.04 | 17.00
13 | MCO | Orlando International Airport | 14.51 | 14.51
14 | EWR | Newark Liberty International Airport | 13.48 | 13.46
15 | LAS | Harry Reid International Airport | 17.10 | 17.12
16 | FLL | Fort Lauderdale-Hollywood International Airport | 9.87 | 9.87
17 | BOS | General Edward Lawrence Logan International Airport | 14.34 | 14.36
18 | DTW | Detroit Metropolitan Wayne County Airport | 14.45 | 14.47
19 | MIA | Miami International Airport | 8.31 | 8.31
20 | LGA | LaGuardia Airport | 12.84 | 12.80
21 | IAD | Washington Dulles International Airport | 5.37 | 5.37
22 | BWI | Baltimore/Washington International Thurgood Marshall Airport | 10.96 | 10.96
23 | PHL | Philadelphia International Airport | 9.53 | 9.53
24 | SAN | San Diego International Airport | 9.28 | 9.28
25 | MDW | Chicago Midway International Airport | 9.45 | 9.43
26 | SLC | Salt Lake City International Airport | 11.91 | 11.93
27 | DCA | Ronald Reagan Washington National Airport | 10.32 | 10.32
28 | TPA | Tampa International Airport | 7.89 | 7.90
29 | PDX | Portland International Airport | 6.58 | 6.59
30 | STL | St. Louis Lambert International Airport | 6.47 | 6.48
31 | BNA | Nashville International Airport | 6.78 | 6.79
32 | AUS | Austin-Bergstrom International Airport | 6.04 | 6.05
33 | HNL | Daniel K. Inouye International Airport | 5.44 | 5.44
34 | SJC | San José International Airport | 5.50 | 5.50
35 | MCI | Kansas City International Airport | 5.24 | 5.25
36 | DAL | Dallas Love Field | 7.65 | 7.63
37 | SMF | Sacramento International Airport | 4.97 | 4.98
38 | MSY | Louis Armstrong New Orleans International Airport | 5.40 | 5.40
39 | SNA | John Wayne Airport | 4.56 | 4.57
40 | RDU | Raleigh-Durham International Airport | 4.88 | 4.89
41 | RSW | Southwest Florida International Airport | 3.42 | 3.42
42 | PIT | Pittsburgh International Airport | 3.78 | 3.79
43 | HOU | William P. Hobby Airport | 6.17 | 6.16
44 | IND | Indianapolis International Airport | 3.75 | 3.76
45 | SAT | San Antonio International Airport | 3.87 | 3.88
46 | SJU | Luis Muñoz Marín International Airport | 2.84 | 2.85
47 | CLE | Cleveland Hopkins International Airport | 4.36 | 4.37
48 | OAK | Oakland International Airport | 5.54 | 5.53
49 | CVG | Cincinnati/Northern Kentucky International Airport | 3.10 | 3.11
50 | CMH | John Glenn Columbus International Airport | 3.44 | 3.45

References

  1. Anderson, P.W. More Is Different: Broken symmetry and the nature of the hierarchical structure of science. Science 1972, 177, 393–396.
  2. Strogatz, S.H. Exploring complex networks. Nature 2001, 410, 268–276.
  3. Bullmore, E.; Sporns, O. Complex brain networks: Graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 2009, 10, 186–198.
  4. Sporns, O. Structure and function of complex brain networks. Dialogues Clin. Neurosci. 2013, 15, 247–262.
  5. Park, H.J.; Friston, K. Structural and functional brain networks: From connections to cognition. Science 2013, 342, 1238411.
  6. Lee, I.; Date, S.V.; Adai, A.T.; Marcotte, E.M. A probabilistic functional network of yeast genes. Science 2004, 306, 1555–1558.
  7. Lezon, T.R.; Banavar, J.R.; Cieplak, M.; Maritan, A.; Fedoroff, N.V. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. Proc. Natl. Acad. Sci. USA 2006, 103, 19033–19038.
  8. Boone, C.; Bussey, H.; Andrews, B.J. Exploring genetic interactions and networks with yeast. Nat. Rev. Genet. 2007, 8, 437–449.
  9. Tsonis, A.A.; Roebber, P.J. The architecture of the climate network. Phys. A Stat. Mech. Its Appl. 2004, 333, 497–504.
  10. Donges, J.F.; Zou, Y.; Marwan, N.; Kurths, J. The backbone of the climate network. Europhys. Lett. 2009, 87, 48007.
  11. Zanin, M.; Belkoura, S.; Zhu, Y. Network analysis of Chinese air transport delay propagation. Chin. J. Aeronaut. 2017, 30, 491–499.
  12. Pastorino, L.; Zanin, M. Air delay propagation patterns in Europe from 2015 to 2018: An information processing perspective. J. Phys. Complex. 2021, 3, 015001.
  13. Guo, Z.; Hao, M.; Yu, B.; Yao, B. Detecting delay propagation in regional air transport systems using convergent cross mapping and complex network theory. Transp. Res. Part E Logist. Transp. Rev. 2022, 157, 102585.
  14. Carlier, S.; De Lépinay, I.; Hustache, J.C.; Jelinek, F. Environmental impact of air traffic flow management delays. In Proceedings of the 7th USA/Europe air traffic management research and development seminar (ATM2007), Barcelona, Spain, 2–5 July 2007; Volume 2, p. 16.
  15. Peterson, E.B.; Neels, K.; Barczi, N.; Graham, T. The economic cost of airline flight delay. J. Transp. Econ. Policy (JTEP) 2013, 47, 107–121.
  16. Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
  17. Costa, L.d.F.; Rodrigues, F.A.; Travieso, G.; Villas Boas, P.R. Characterization of complex networks: A survey of measurements. Adv. Phys. 2007, 56, 167–242.
  18. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178.
  19. Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431.
  20. Zanin, M.; Olivares, F. Ordinal patterns-based methodologies for distinguishing chaos from noise in discrete time series. Commun. Phys. 2021, 4, 190.
  21. Leyva, I.; Martínez, J.H.; Masoller, C.; Rosso, O.A.; Zanin, M. 20 years of ordinal patterns: Perspectives and challenges. Europhys. Lett. 2022, 138, 31001.
  22. Bandt, C.; Shiha, F. Order patterns in time series. J. Time Ser. Anal. 2007, 28, 646–665.
  23. Weiß, C.H. Non-parametric tests for serial dependence in time series based on asymptotic implementations of ordinal-pattern statistics. Chaos Interdiscip. J. Nonlinear Sci. 2022, 32, 093107.
  24. Zanin, M. Continuous ordinal patterns: Creating a bridge between ordinal analysis and deep learning. Chaos Interdiscip. J. Nonlinear Sci. 2023, 33, 033114.
  25. Acharya, K.; Olivares, F.; Zanin, M. How representative are air transport functional complex networks? A quantitative validation. Chaos Interdiscip. J. Nonlinear Sci. 2024, 34, 043133.
  26. Lee, T.W.; Lee, T.W. Independent Component Analysis; Springer: Berlin/Heidelberg, Germany, 1998.
  27. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102.
  28. Amigó, J.M.; Zambrano, S.; Sanjuán, M.A. True and false forbidden patterns in deterministic and random dynamics. Europhys. Lett. 2007, 79, 50001.
  29. Zunino, L.; Soriano, M.C.; Fischer, I.; Rosso, O.A.; Mirasso, C.R. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2010, 82, 046212.
  30. Rosso, O.A.; Carpi, L.C.; Saco, P.M.; Ravetti, M.G.; Plastino, A.; Larrondo, H.A. Causality and the entropy–complexity plane: Robustness and missing ordinal patterns. Phys. A Stat. Mech. Its Appl. 2012, 391, 42–55.
  31. Rosso, O.A.; Larrondo, H.; Martin, M.T.; Plastino, A.; Fuentes, M.A. Distinguishing noise from chaos. Phys. Rev. Lett. 2007, 99, 154102.
  32. Zunino, L.; Soriano, M.C.; Rosso, O.A. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2012, 86, 046210.
  33. Amigó, J.M.; Rosso, O.A. Ordinal methods: Concepts, applications, new developments, and challenges-In memory of Karsten Keller (1961–2022). Chaos Interdiscip. J. Nonlinear Sci. 2023, 33, 080401.
  34. Amigó, J.M.; Kocarev, L.; Szczepanski, J. Order patterns and chaos. Phys. Lett. A 2006, 355, 27–31.
  35. Carpi, L.C.; Saco, P.M.; Rosso, O.A. Missing ordinal patterns in correlated noises. Phys. A Stat. Mech. Its Appl. 2010, 389, 2020–2029.
  36. Olivares, F.; Zunino, L.; Pérez, D.G. Revisiting the decay of missing ordinal patterns in long-term correlated time series. Phys. A Stat. Mech. Its Appl. 2019, 534, 122100.
  37. Zunino, L.; Zanin, M.; Tabak, B.M.; Pérez, D.G.; Rosso, O.A. Forbidden patterns, permutation entropy and stock market inefficiency. Phys. A Stat. Mech. Its Appl. 2009, 388, 2854–2864.
  38. Tiana-Alsina, J.; Buldu, J.M.; Torrent, M.; García-Ojalvo, J. Quantifying stochasticity in the dynamics of delay-coupled semiconductor lasers via forbidden patterns. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2010, 368, 367–377.
  39. Olivares, F.; Zunino, L.; Soriano, M.C.; Pérez, D.G. Unraveling the decay of the number of unobserved ordinal patterns in noisy chaotic dynamics. Phys. Rev. E 2019, 100, 042215.
  40. Bandt, C. Statistics and contrasts of order patterns in univariate time series. Chaos Interdiscip. J. Nonlinear Sci. 2023, 33, 033124.
  41. Martínez, J.H.; Herrera-Diestra, J.L.; Chavez, M. Detection of time reversibility in time series by ordinal patterns analysis. Chaos Interdiscip. J. Nonlinear Sci. 2018, 28, 123111.
  42. Zanin, M.; Rodríguez-González, A.; Menasalvas Ruiz, E.; Papo, D. Assessing time series reversibility through permutation patterns. Entropy 2018, 20, 665.
  43. Zunino, L.; Olivares, F.; Ribeiro, H.V.; Rosso, O.A. Permutation Jensen-Shannon distance: A versatile and fast symbolic tool for complex time-series analysis. Phys. Rev. E 2022, 105, 045310.
  44. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
  45. Olivares, F.; Zunino, L.; Zanin, M. Markov-modulated model for landing flow dynamics: An ordinal analysis validation. Chaos Interdiscip. J. Nonlinear Sci. 2023, 33, 033142.
  46. Zanin, M. Augmenting granger causality through continuous ordinal patterns. Commun. Nonlinear Sci. Numer. Simul. 2024, 128, 107606.
  47. Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438.
  48. Diebold, F.X. Independent Component Analysis; Elements of Forecasting; Cengage Learning: Mason, OH, USA, 1998.
  49. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 623–656.
  50. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138.
  51. Schreiber, T. Measuring Information Transfer. Phys. Rev. Lett. 2000, 85, 461–464.
  52. Wiener, N. The theory of prediction. In Modern Mathematics for the Engineer; Beckenbach, E.F., Ed.; McGraw-Hill: New York, NY, USA, 1956; pp. 125–139.
  53. Gomez-Herrero, G.; Wu, W.; Rutanen, K.; Soriano, M.; Pipa, G.; Vicente, R. Assessing Coupling Dynamics from an Ensemble of Time Series. Entropy 2010, 17, 1958–1970.
  54. Cook, A.; Belkoura, S.; Zanin, M. ATM performance measurement in Europe, the US and China. Chin. J. Aeronaut. 2017, 30, 479–490.
  55. Raffalovich, L.E. Detrending time series: A cautionary note. Sociol. Methods Res. 1994, 22, 492–519.
  56. Azami, H.; Escudero, J. Amplitude-aware permutation entropy: Illustration in spike detection and signal segmentation. Comput. Methods Programs Biomed. 2016, 128, 40–51.
  57. Cuesta Frau, D. Permutation entropy: Influence of amplitude information on time series classification performance. Math. Biosci. Eng. 2019, 16, 6842–6857.
  58. Fadlallah, B.; Chen, B.; Keil, A.; Príncipe, J. Weighted-permutation entropy: A complexity measure for time series incorporating amplitude information. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2013, 87, 022911.
  59. Stosic, D.; Stosic, D.; Stosic, T.; Stosic, B. Generalized weighted permutation entropy. Chaos Interdiscip. J. Nonlinear Sci. 2022, 32.
  60. Cuesta-Frau, D. Slope entropy: A new time series complexity estimator based on both symbolic patterns and amplitude information. Entropy 2019, 21, 1167.
  61. Knight, M.I.; Nunes, M.; Nason, G. Modelling, detrending and decorrelation of network time series. arXiv 2016, arXiv:1603.03221.
  62. Eurocontrol; FAA. Comparison of Air Traffic Management Related Operational and Economic Performance U.S.—Europe; Technical Report; 2024. Available online: https://www.eurocontrol.int/publication/comparison-air-traffic-management-related-operational-and-economic-performance (accessed on 25 December 2024).
Figure 1. Nonstationarity and its consequences in the estimation of functional connectivity. (Top left) Fraction of pairs of time series for which functional connectivity is detected, for Europe (left bars) and the US (right bars), and four different connectivity metrics. See Section 2 for details on the data and methods. (Top right) Fraction of times a functional link is detected between the time series of Paris Charles de Gaulle and London Heathrow as a function of the amplitude of a Gaussian additive noise of zero mean and standard deviation σ. Please note that the amplitude is normalized, such that σ = 1 corresponds to the standard deviation of the time series. The dashed and dotted gray lines correspond to the fraction of times the time series are detected as stationary by, respectively, KPSS [18] and Augmented Dickey–Fuller [19] tests. (Bottom left) Graphical representation of the creation of daily shuffled surrogate time series; each color represents the data of one day. (Bottom right) Fraction of pairs of time series for which functional connectivity is detected when daily shuffled surrogates are used. The meaning of bars and colors is the same as in the top left panel. In all cases, RC: Rank Correlation; GC: Granger Causality; MI: Mutual Information; TE: Transfer Entropy.
Figure 2. Graphical representations of the two methods for evaluating the detrending process. (Left) Illustration of building the ordinal pattern distribution over n days of hourly mean delay sequences. Red and green boxes indicate the data used to calculate the first two distributions. (Right) Calculation of the ϕ_π for a time series of delays using the COP methodology. See main text for details.
Figure 3. Confusion, i.e., fraction of times a functional relationship is detected between two airports when data are daily shuffled, as a function of the JSD of the detrended time series. The top left and top right panels correspond, respectively, to the original time series and to the same time series daily normalized in amplitude using a Z-Score (see main text for details). Bottom panels further report the results as a function of the JSD_Z for the daily normalized time series, distinguishing between Europe (bottom left) and the US (bottom right). Shape and colors of the markers respectively indicate the functional metric and the detrending procedure; solid and empty markers further correspond to EU and US, see bottom legends.
Figure 4. Confusion, i.e., the fraction of times a functional relationship is detected between two airports when data are daily shuffled, as a function of the COP of the detrended time series. Top panels correspond, respectively, to the original time series, to the series normalized daily in amplitude using a Z-Score, and to the series normalized across airports; see main text for details. Bottom panels further report the results as a function of the COP_Z for the Z-Score normalized time series, distinguishing between Europe (bottom left) and the US (bottom right). The shape and color of the markers indicate, respectively, the functional metric and the detrending procedure; solid and empty markers further correspond to the EU and the US, see bottom legends. Please note that points corresponding to the SecD (light blue) are not visible because this procedure yields values of COP > 8 and COP_Z > 15.
Figure 5. Analysis of synthetic data. The left panel reports a representation of the global average delay profile (thick black line) across the 24 h of the day and of one realization of the delay for the seven days of the week (thin colored lines). The right panel reports the confusion as a function of the COP_Z on the synthetic time series. See the main text for details on the generation process. The shape and color of the markers indicate, respectively, the functional metric and the detrending procedure.
Figure 6. Real functional connectivity. (Left and center) Fraction of pairs of time series for which functional connectivity is detected, for Europe (left panel) and the US (central panel), and four different connectivity metrics. In the latter case, black bars report the link density obtained when US time series are pruned to match European ones. (Right) Evolution of the detected link density in the US system as a function of the percentage of data used in the evaluation. Original time series have been processed using a combination of Z-Score724 detrending and Z-Score amplitude normalization.
Figure 7. Distribution of the COP of time series for all airports in Europe (left panel) and the US (right panel), when these are detrended using the Z-Score24 (left graph, blue) and Z-Score724 (right graph, orange) methods, and with a Z-Score amplitude normalization. Within each distribution, the middle horizontal lines indicate the median (solid line) and the mean (dashed line). The dashed black horizontal lines indicate the median value observed in the raw data, i.e., before any detrending/normalization. Finally, the dotted lines correspond to COP = 2 and are included as a visual reference.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Olivares, F.; Marín-Rodríguez, F.J.; Acharya, K.; Zanin, M. Evaluating Methods for Detrending Time Series Using Ordinal Patterns, with an Application to Air Transport Delays. Entropy 2025, 27, 230. https://doi.org/10.3390/e27030230

