SynthATDelays: A Minimalist Python Package for the Generation of Synthetic Air Transport Delay Data

Büth, Carlson Moses; Zanin, Massimiliano

doi:10.3390/aerospace12100900


by Carlson Moses Büth ¹,² and Massimiliano Zanin ³,*

¹ Institute for Cross-Disciplinary Physics and Complex Systems (IFISC), CSIC-UIB, Edifici Instituts Universitaris de Recerca, Campus UIB, 07122 Palma de Mallorca, Spain

² Department of Computational Linguistics, University of Zurich, 8050 Zurich, Switzerland

³ Institute for Cross-Disciplinary Physics and Complex Systems (IFISC), CSIC-UIB, Edifici Complex de Recerca de les Illes Balears, Parc Bit, 07120 Palma de Mallorca, Spain

* Author to whom correspondence should be addressed.

Aerospace 2025, 12(10), 900; https://doi.org/10.3390/aerospace12100900

Submission received: 13 August 2025 / Revised: 18 September 2025 / Accepted: 4 October 2025 / Published: 6 October 2025

(This article belongs to the Section Air Traffic and Transportation)


Abstract

Within the endeavour of describing and analysing delays and their propagations in air transport, a major limitation is represented by the validation of the obtained results. While this can be overcome through synthetic models, those available in the literature mostly aim at simulating the system in a detailed and realistic way, resulting in high complexity and substantial computational costs. We here present SynthATDelays, a minimalist and modular Python package designed to simulate a virtual customisable air transport system and to provide synthetic delay data under tuneable conditions; it is thus designed to support the validation of data-based studies and pipelines. We describe its internal structure and provide examples about how scenarios can be designed and executed. We further show how it can be used to tackle two relevant questions, i.e., the role of operational buffer times in the absorption of delays and the comparison and optimisation of causality tests to detect the propagation thereof.

Keywords:

air transport; delays; delay propagation; Python package

1. Introduction

In a way similar to virtually all fields of science and technology, in recent years data analysis has become an essential toolkit in the endeavour of describing and improving air transport. As a prototypical example, a large body of the literature has been devoted to data-based analyses of delays and their propagations due to their relevance in the efficiency of the system, in its environmental impact, and in the customers’ perceived value [1,2]. These range from the statistical characterisation of delays [3,4,5,6,7] to analyses aiming at describing relationships with external factors (e.g., adverse weather [8,9,10,11]), at extracting propagation patterns (e.g., through functional networks [12,13,14,15,16]), or at creating models able to forecast delays of future operations [17,18,19,20].

In spite of many undeniable successes, describing and analysing the dynamics of delays from real data entails an important limitation: the validation of the results. Given the inability to execute what-if scenarios or modify the system, researchers must trust that their findings accurately represent the underlying dynamics. To illustrate, suppose that a functional network analysis identifies a given airport as the main propagator of delays. In other scientific fields, such a result would be validated (or rejected) by designing and performing a specific experiment under controlled conditions. Yet, if only real data can be used, it is highly impractical to verify whether adding resources to that airport would result in a dampened propagation—modifying the way an airport operates for the sake of science is unrealistic at best. Given this state of affairs, it is imperative to have a good understanding (and hence trust) of the data analysis pipeline. This can be achieved in two ways: through knowledge about the theoretical foundations of the used metrics, such as, e.g., under which conditions causality can be detected between time series; and by using synthetic data, which must be both realistic and tuneable.

The air transport community has a long history of developing models to simulate the dynamics of the system. From the scientific side, a handful of open-source agent-based models focusing on the macroscale movements of aircraft and passengers have been proposed, such as the one in Ref. [21] and Mercury [22]. Along this line, it is also worth mentioning Bluesky [23] and AirTrafficSim [24]; compared to the previous ones, these focus on the microscale and allow for testing hypotheses related to Air Traffic Control (ATC) while still being of potential usefulness in simulating the macroscopic dynamics. Finally, several commercial products are available, such as, e.g., the Total Airspace and Airport Modeler (TAAM), developed by Jeppesen; the Reorganized ATC Mathematical Simulator (RAMS), by EUROCONTROL; and the RAMS Plus, developed by ISA Software. Beyond these complete simulation suites, it is worth mentioning research works focusing on the prediction and modelling of air traffic [25,26,27,28] or the creation and optimisation of aircraft schedules [29,30,31]. In both cases, these can be seen as complementary tools to improve our capacity to simulate the dynamics of the real system. All the above-mentioned examples share one commonality: they aim at providing the most realistic representation of the evolution of the system; in other words, given the real planned operations for one day, they try to yield the real outcome of that same day. This of course results in high complexity and computational costs.

The attentive reader will have noted an important research gap: the need for simpler tools for the validation of data analyses. To illustrate, the researcher wanting to use a causality metric to describe the propagation of delays between two airports may firstly need to validate that metric using a known baseline, i.e., synthetic data of known characteristics. Many (if not all) causality metrics also include parameters that have to be optimised [32]. While such optimisation and validation can be performed using the aforementioned simulators, this comes at a high cost, both computationally and in terms of the time required to prepare scenarios of known characteristics. In short, a simpler solution is needed to support these analyses, as customarily carried out in other research fields, such as, e.g., neuroscience [33,34,35].

In this contribution we present SynthATDelays, a Python package designed to produce synthetic delay information from highly tuneable scenarios. Compared to the aforementioned options, these scenarios are not aimed at mimicking the behaviour of the real system but rather at testing, in a minimalist way, specific conditions and hypotheses, and how subsequent analyses are able to capture these. In other words, this package allows for creating minimal toy models to test specific aspects of delay dynamics, including only the elements that are essential for the analysis, while avoiding (or at least minimising) the complexity of the real system. To illustrate, the researcher can use this package to create a hypothetical system composed of a few airports, and generate time series representing the evolution of the average delays when changing their capacity. Similarly, different events and conditions can be simulated, e.g., the appearance of delays in specific routes, the dependence of different flights on the same crew, or the length of the buffer time between subsequent operations. This is achieved through a parsimonious yet modular structure that allows for extensive customisation and also a reduced computational cost, enabling the batch analysis of numerous realisations in tens of seconds.

The remainder of the paper firstly describes the way that the simulations are performed (Section 2), including which options are available to the user, how the individual operations are numerically simulated, and how the results are organised. Next, Section 3 reports several examples of the use of the library: from a step-by-step tutorial, including the configuration of the initial flight network and the inclusion of different types of delays (Section 3.1), to different analyses of relevance in the context of functional network reconstruction (Section 3.2 and Section 3.3). Section 4 discusses some additional technical considerations, like how to install the package, its dependencies, and an analysis of the computational cost. Finally, Section 5 draws conclusions and proposes future development steps.

2. The Structure and Internal Logic of the Package

As depicted in Figure 1, the package is organised around three main blocks: the specification of the scenario to be executed, whose properties are centralised in a single class; its numerical simulation, handled by a single function; and the extraction of results at different levels of granularity. These three blocks are further described below.

2.1. Setting Up the Simulation

All information required to perform a simulation has to be introduced in an instance of the Options_Class class. This can be performed either by manually populating it or by using several predefined scenarios provided in the library—an example of this will be illustrated in Section 3. In what follows we provide an overview of the available options; additional details can be found in the documentation [36].

Airports, including their number, their capacity (in operations per hour), and the flight time required to travel between them. Note that geographical information is not modelled; it is therefore possible to model non-Euclidean networks of flights.
Aircraft, including their number, the minimum turnaround time between subsequent operations, and the buffer time left to recover delays.
The route operated by each aircraft, defined as a list of airports that are visited sequentially. This makes it possible to define, for instance, simple hub-and-spoke operations (e.g., $a \to b \to a$), more complex variations of the same (e.g., $a \to b \to a \to c \to a$), or triangular routes (e.g., $a \to b \to c \to a$).
Delay generation. Delays can be defined to either affect flights on specific routes or all flights landing at specific airports. The package provides several predefined functions to calculate their magnitude, including the use of delay profiles synthesised from real operations at major European airports; alternatively, the user can define custom ones—see the documentation for details and examples [36].
Dependencies between routes. These model situations in which two flights operate different routes with one airport in common (e.g., $a \to b$ and $b \to c$), such that the latter cannot take off until the former has landed. They can thus be used to simulate the presence of connecting passengers, or the crew having to change aircraft. This further allows for modelling the propagation of delays between airports that are not directly connected by a flight.
Additional options, including, e.g., the duration of nights, i.e., of a period in which flights cannot operate; or the number of days to be simulated.

2.2. The Simulation

The previously defined options are used as input for the main function performing the simulation (ExecSimulation; see the documentation [36]). This, in turn, is composed of two major phases.

The first phase entails defining the scheduling of all flights in the scenario. For each aircraft, its operations are created by following the corresponding route; the scheduled take-off time is calculated as the scheduled time of the previous landing, plus the turnaround time and the buffer time; conversely, the flight duration is obtained from the time distance between the corresponding pair of airports. Next, if so defined in the options, flights that should depart during the night hours are moved to the first hour of the morning. Note that, in this phase of the simulation, delays are not accounted for.
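The scheduling rule described above can be sketched in a few lines. This is only an illustration written for this text and not the package's own code: the function `build_schedule`, its arguments, the cyclic interpretation of the route, the 06:00 start, and the 22:00 to 06:00 night window are all assumptions made for the example.

```python
import math

def build_schedule(route, flight_time, turnaround, buffer, n_flights,
                   first_departure=6.0, night=(22.0, 6.0)):
    """Return (origin, destination, sched_departure, sched_landing) tuples.

    All times are in hours since midnight of the first simulated day.
    `route` is treated as a closed loop, e.g. ["a", "b"] for a -> b -> a.
    """
    night_start, night_end = night
    schedule = []
    departure = first_departure
    for i in range(n_flights):
        origin = route[i % len(route)]
        destination = route[(i + 1) % len(route)]
        # Departures falling into the night are moved to the morning.
        hour = departure % 24.0
        day = math.floor(departure / 24.0)
        if hour >= night_start:
            departure = (day + 1) * 24.0 + night_end
        elif hour < night_end:
            departure = day * 24.0 + night_end
        landing = departure + flight_time[(origin, destination)]
        schedule.append((origin, destination, departure, landing))
        # Next take-off: previous scheduled landing + turnaround + buffer.
        departure = landing + turnaround + buffer
    return schedule

flight_time = {("a", "b"): 1.0, ("b", "a"): 1.0}
plan = build_schedule(["a", "b"], flight_time, turnaround=1.0, buffer=0.5,
                      n_flights=3)
# Scheduled departures: 6.0, 8.5 and 11.0 h

late = build_schedule(["a", "b"], flight_time, turnaround=1.0, buffer=0.5,
                      n_flights=2, first_departure=21.0)
# The second flight would leave at 23.5 h and is moved to 30.0 h,
# i.e. 06:00 on the second day.
```

No delays enter this computation, in line with the description of the first phase.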

As a final step of the schedule creation, pairs of flights operated by different aircraft are checked and, where applicable, linked to simulate dependencies due to connecting passengers or shared crews. This is performed, with a probability defined by the user, whenever the departure of a flight operating between a and b is close in time to the landing of another flight from c to a; the latter aircraft is then bringing the connecting passengers to a, and the former has to wait for them. This dependency between the two flights is stored for future use.
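A possible way of expressing this linking step is sketched here. The function `link_flights`, the dictionary layout of the flights, the one-hour `window` used to decide what "close in time" means, and the restriction to one dependency per flight are assumptions of this example, not details taken from the package.

```python
import random

def link_flights(flights, probability, window=1.0, seed=0):
    """Map the index of a waiting flight to the index of the flight it waits for.

    Each flight is a dict with keys 'aircraft', 'origin', 'destination',
    'departure' and 'landing' (scheduled times, in hours).
    """
    rng = random.Random(seed)
    links = {}
    for i, outbound in enumerate(flights):
        for j, inbound in enumerate(flights):
            if outbound["aircraft"] == inbound["aircraft"]:
                continue  # only flights of different aircraft are linked
            same_airport = inbound["destination"] == outbound["origin"]
            gap = outbound["departure"] - inbound["landing"]
            if same_airport and 0.0 <= gap <= window and i not in links:
                if rng.random() < probability:
                    links[i] = j
    return links

flights = [
    {"aircraft": 0, "origin": "c", "destination": "a",
     "departure": 8.0, "landing": 9.0},
    {"aircraft": 1, "origin": "a", "destination": "b",
     "departure": 9.5, "landing": 10.5},
    {"aircraft": 2, "origin": "a", "destination": "b",
     "departure": 12.0, "landing": 13.0},
]
always = link_flights(flights, probability=1.0)  # {1: 0}
never = link_flights(flights, probability=0.0)   # {}
```

With a probability of one, only the second flight is linked: it leaves a half an hour after the flight from c lands there, whereas the third one departs three hours later and falls outside the window.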

With this information ready, the simulation proper is executed based on a discrete-event model with incremental time progression. At each time t, which is incremented in steps of one minute, the condition of each aircraft is updated according to the following rules—also represented as a flowchart in Figure 2:

If the aircraft is idle, find its first scheduled flight not already executed; if its scheduled departure time is less than or equal to t, the aircraft is added to the airport queue, which is used to account for its limited capacity. The program also checks for dependencies, and the flight is activated only if the preceding one has already landed. The actual landing time is calculated by adding the distance between the departure and arrival airports and any other en-route or airport delays defined in the options of the scenario. Finally, the status of the aircraft is updated to airborne.
If the aircraft is airborne and the current time t is greater than or equal to the landing time, the status of the aircraft is changed to that corresponding to the turnaround process.
Finally, if the aircraft is performing the turnaround and the time passed is greater than the minimum turnaround time, the status of the aircraft is changed back to idle, and the whole process repeats.
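The three rules can be condensed into a minimal state machine for a single aircraft. This is a simplified sketch under stated assumptions: airport queues and dependencies between flights are left out, the turnaround ends as soon as the minimum turnaround time has elapsed, and the names `simulate_aircraft`, `sched_dep`, `duration`, and `extra_delay` are hypothetical.

```python
def simulate_aircraft(flights, min_turnaround, horizon):
    """Advance one aircraft minute by minute; return (departure, landing) pairs.

    `flights` is a list of dicts with 'sched_dep', 'duration' and
    'extra_delay', all expressed in minutes.
    """
    status, next_flight = "idle", 0
    landing = ready_at = 0
    log = []
    for t in range(horizon):
        if status == "airborne" and t >= landing:
            status = "turnaround"
            ready_at = landing + min_turnaround
        if status == "turnaround" and t >= ready_at:
            status = "idle"
        if status == "idle" and next_flight < len(flights):
            flight = flights[next_flight]
            if flight["sched_dep"] <= t:
                # Actual landing = departure + flight time + en-route delay.
                landing = t + flight["duration"] + flight["extra_delay"]
                log.append((t, landing))
                status = "airborne"
                next_flight += 1
    return log

# Schedule built with a 45 min turnaround and a 30 min buffer:
# second departure at 90 + 45 + 30 = 165 min.
def two_flights(delay):
    return [{"sched_dep": 0, "duration": 90, "extra_delay": delay},
            {"sched_dep": 165, "duration": 90, "extra_delay": 0}]

absorbed = simulate_aircraft(two_flights(20), min_turnaround=45, horizon=600)
propagated = simulate_aircraft(two_flights(50), min_turnaround=45, horizon=600)
# absorbed[1][0] == 165: the 20 min delay fits within the buffer
# propagated[1][0] == 185: 50 min exceeds the buffer by 20 min
```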

Note that the buffer time is implicitly used to reduce delays after landing by being included in the scheduling but not in the real operations. To illustrate, suppose that both the minimum turnaround and buffer times are set to 60 min and that the aircraft was scheduled to land at 15:00, but encountered a 30 min delay. The subsequent departure would have been scheduled at 17:00. This is calculated as 15:00 with the addition of one hour each for turnaround and buffer. Consequently, the aircraft is prepared for departure at 16:30. This time is derived from the actual landing time of 15:30, incremented by the 60 min minimum turnaround time. As a result, 30 min of the 60 min buffer is utilised to mitigate the delay. This ensures that the next flight departs punctually.
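The arithmetic of this example can be reproduced directly. The helper `next_departure` is a hypothetical function written for this illustration, and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta

def next_departure(sched_landing, arrival_delay, min_turnaround, buffer):
    """Return (scheduled departure, ready time, resulting departure delay)."""
    scheduled = sched_landing + min_turnaround + buffer
    ready = sched_landing + arrival_delay + min_turnaround
    actual = max(scheduled, ready)  # aircraft never leave before schedule
    return scheduled, ready, actual - scheduled

landing = datetime(2025, 1, 1, 15, 0)  # arbitrary day, landing scheduled at 15:00
hour = timedelta(minutes=60)

scheduled, ready, departure_delay = next_departure(
    landing, timedelta(minutes=30), hour, hour)
# scheduled -> 17:00, ready -> 16:30, departure_delay -> 0 min

_, _, large_delay = next_departure(landing, timedelta(minutes=90), hour, hour)
# A 90 min arrival delay exceeds the 60 min buffer: 30 min are propagated.
```

In other words, the delay passed on to the next flight is the arrival delay minus the buffer time, floored at zero.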

Conversely, three conditions can cause delays to propagate: (i) when the delays are larger than the buffer time; (ii) when delays result in the concentration of operations in the same time window, thus surpassing the capacity of the airport; and (iii) when flights have dependencies between them, such that one of them has to wait for the arrival of a second one. By tuning the respective parameters (i.e., the buffer time, the airport capacity, and the probability of having dependencies), the researcher can manipulate the effective delay propagation in the system.

2.3. Analysis of the Results

Once the simulation is completed, its results can be accessed in two ways, depending on the granularity required by the subsequent analyses.

On the one hand, information about all executed flights is stored in an array of Flight_Class objects, including the corresponding (scheduled and actual) departure and landing times. These objects are further referenced in lists representing all aircraft and airports in the system, which can be accessed to obtain statistics at those levels. To illustrate, the user can extract the delay of all flights operated by an aircraft or of all flights landing at a given airport.

On the other hand, the package provides a function for obtaining macroscale time series that are commonly used in the context of delay analysis: average departure and landing delays across all airports, and number of landing and departure operations (AnalyseResults; see the documentation [36]). In all cases, the resolution of these time series can be customised, thus supporting multiscale analyses.
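To make the idea of a macroscale time series with a customisable resolution concrete, the following sketch aggregates individual landings into average delays per time window. It is not the AnalyseResults function: the name `average_delay_series`, the tuple layout of the flights, the binning by actual landing time, and the value 0.0 for windows without landings are assumptions of this example.

```python
def average_delay_series(flights, airport, horizon, window=60):
    """Average landing delay at `airport` per time window.

    `flights` holds (destination, scheduled_landing, actual_landing) tuples,
    with times in integer minutes; `window` is the resolution in minutes.
    """
    n_bins = -(-horizon // window)  # ceiling division
    totals = [0.0] * n_bins
    counts = [0] * n_bins
    for destination, scheduled, actual in flights:
        if destination != airport or not 0 <= actual < horizon:
            continue
        k = actual // window
        totals[k] += actual - scheduled
        counts[k] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

flights = [("a", 60, 72), ("a", 90, 96), ("b", 60, 115), ("a", 180, 210)]
hourly = average_delay_series(flights, "a", horizon=240)
# [0.0, 9.0, 0.0, 30.0]
two_hourly = average_delay_series(flights, "a", horizon=240, window=120)
# [9.0, 30.0]
```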

3. Examples

After this introduction to the structure of the package, we here show how it can be used for research purposes. We start with a basic tutorial on the use of the different functions (Section 3.1) and then tackle two relevant questions: how delay propagation is modulated by the buffer time and by links between flights (Section 3.2), and the impact of using different metrics for detecting the propagation of delays (Section 3.3).

3.1. Step-by-Step Tutorial

In this section we are going to provide a step-by-step example of the use of the library; the interested reader will find additional details in the documentation of the same, including a set of tutorials that illustrate the main functions in a user-friendly way [36].

The simplest way of setting up a new simulation is through one of the provided scenarios, i.e., predefined self-contained configurations, which can later be tuned by the researcher according to their needs. To illustrate, we will start with the scenario named Scenario_RandomConnectivity, in which a set of airports are randomly connected by flights. The code, including initial imports, is reported in Listing 1.

Listing 1: Basic initialisation code, using one of the predefined scenarios.

This scenario accepts four parameters: the number of airports to be simulated (here set to 6), the number of aircraft connecting them (here 40), the buffer time (here 0.25 h, i.e., 15 min), and the initial random number seed. The result, saved in the variable Options, is an instance of a class containing all the information needed to perform the simulation. To illustrate, accessing the variable Options.airportCapacity will yield the array array([30., 30., 30., 30., 30., 30.]); in other words, all six airports are initialised with a maximum capacity of 30 operations per hour. This options object can already be passed to the main function performing the simulation, as shown in Listing 2.

Listing 2: Call to the function to execute the simulation, taking as input the options created in Listing 1.

The output is composed of three lists: executedFlights, with individual flights that have been simulated; and Airports and Aircraft, which aggregate information for each airport and aircraft, respectively.

Before delving deeper into the results, let us add some delays. As previously explained, these can be of two types, affecting routes or airports, respectively. Starting from the former, we can add normally distributed delays to all routes using the function ERD_Normal, as shown in Listing 3.

Listing 3: Code to define enroute delays, execute the simulation, and finally extract high-level results about the delay evolution.

The fourth line defines the parameters to be passed to the delay function, namely the set of origin and destination airports defining those routes (with $-1$ indicating all airports) and the average and standard deviation of the normal distribution from which delays are drawn. Note that this function adds a delay proportional to the duration of each flight; hence, a random value of 0.05 implies that the total duration is increased by 5%.
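The proportional nature of this delay can be illustrated with a short sketch. It does not reproduce ERD_Normal; the function `enroute_duration` and the numerical values are hypothetical and only show how a random relative delay translates into flight time.

```python
import random

def enroute_duration(nominal, mean, std, rng):
    """Flight time after adding a delay proportional to the nominal duration."""
    relative_delay = rng.gauss(mean, std)  # e.g. 0.05 means +5%
    return nominal * (1.0 + relative_delay)

rng = random.Random(42)

# With zero spread, a 2 h flight with a relative delay of 0.05 takes 2.1 h.
fixed = enroute_duration(2.0, 0.05, 0.0, rng)

# With some spread, individual flights may also arrive early (negative delay),
# while the average delay stays close to 5% of the nominal duration.
sample = [enroute_duration(2.0, 0.05, 0.02, rng) - 2.0 for _ in range(10_000)]
mean_delay = sum(sample) / len(sample)
```

Since negative draws shorten the flight, arrival delays generated in this way can take both signs.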

With this defined, we can submit the results to the AnalyseResults function, which synthesises high-level time series from the individual operations (see the last three lines of Listing 3). To illustrate, we can access the variable allResults.avgArrivalDelay[:, 0] to obtain a time series of the average arrival delay per hour in the first airport, and allResults.avgDepartureDelay[:, 0] for the equivalent at departure. Both time series are represented in the left panel of Figure 3. Note how departure delays are only positive as, by construction, aircraft cannot take off before the scheduled time, and furthermore note how, thanks to the buffer time, most of the arrival delays are absorbed before the next departure.

We can next add some delays at a specific airport, in this case, the first one, using the function AD_AbsNormal; as the name implies, delays are calculated as the absolute value of random numbers drawn from a normal distribution. The code for this is included in Listing 4.

Listing 4: Definition of airport-based delays.

The same time series are now represented in the right panel of Figure 3. As is to be expected, delays are now substantially larger and mostly positive; still, the buffer time is able to partially compensate for them.

3.2. What Is the Impact of the Buffer Time and of Links Between Flights?

The role of the buffer time is easily described: it must reduce the propagation of delays, and hence their total amount; yet, one may be interested in describing the exact transition, i.e., whether increasing the buffer time decreases the total delay linearly or otherwise. To achieve this aim, we here start from a random connectivity scenario of six airports and 80 aircraft, and with two sources of enroute delays: a Gaussian one across all routes and an exponential one (see the function ERD_Disruptions) only affecting the route $a_1 \to a_2$ ($a_1$ and $a_2$ being two airports randomly chosen in each simulation). We further add links between flights, i.e., situations in which one of them cannot depart until a second one has arrived; this is performed across all pairs of routes, and with a variable probability (between 0% and 20%). Note that these simulation parameters, such as, e.g., the number of airports and aircraft, have been chosen to yield a system with high-enough traffic; at the same time, they are fully tuneable, as previously shown, and the user can change their values to analyse specific conditions of interest.

The results, as a function of the buffer time and of the percentage of linked flights, are depicted in Figure 4. Specifically, the top panels report the evolution of the total delay time, i.e., the sum of the landing delay of all flights (top left panel), and of the standard deviation of the same (top right panel). It can be appreciated that, as expected, the total delay drops with increasing buffer times, and that such a drop is mostly linear. On the other hand, the bottom left and right panels report the number of functional links (i.e., pairs of airports between which a propagation is detected), as yielded by the Granger causality and transfer entropy tests—details on these are included below. The number of functional links presents a sharper phase transition: it drops to zero (or at least, to a very small number) as soon as the buffer time surpasses a threshold. Finally, and also not surprisingly, both the total delay and the threshold are larger for increasing percentages of linked flights. This is highlighted in the insets of the bottom panels, representing the minimum buffer time required to reduce the number of functional links by half, as a function of the fraction of linked flights. Conclusions on these results will be drawn below.

3.3. Which Functional Metric Ought to Be Used?

One of the open questions in the reconstruction of functional networks of delay propagations [12,13,14,15,16] is which functional metric should be used, among the many alternatives available in the literature. In other words, given two time series representing the evolution of delays at two airports a and b, the aim is to apply a metric on these time series able to detect whether the delays in a have an impact on b. Even when restricting to directional causality metrics, i.e., discarding correlation metrics and other non-directed tests, several alternatives are available, each one offering advantages and limitations. We here test a few options by resorting to a minimal scenario composed of three airports (a, b, and c), connected by a group of five aircraft operating between airports a and b, and by a similar group between b and c. Delays are added according to a Gaussian distribution, with airport a further experiencing higher delays between 11:00 and 13:00. Finally, all flights operating the segment $b \to c$ are linked to flights operating $a \to b$.

The objective of this scenario is to assess whether a propagation of delays is detected between airports a and c. As delays mostly appear on the former, but both airports are not connected by direct flights, the only source of propagation must be the linkage between flights—in other words, we are only measuring reactionary delays.

Functional links, i.e., potential propagation instances between a and c, are evaluated by applying the following metrics and tests on the average hourly arrival delay time series:

Granger Causality (GC). The GC test was initially proposed to test for causal relations in economic time series, but has since found wide-ranging applications in various scientific and technical fields [37,38]. It is based on the idea of “predictive causality” [39], i.e., situations in which past values of one time series provide statistically significant information about future values of another time series beyond what is explained by the past values of the latter. The test compares two linear models: a restricted model that predicts future values of a time series using only its own past values, and an unrestricted model that also incorporates past values from the second time series. Whenever the unrestricted model has a significantly higher prediction accuracy, the second time series is said to “Granger-cause” the first one. Note that, in spite of its popularity, it is strictly not a test for causality, as highlighted by Granger himself [40]; or, to use the words of Ref. [41], “Granger causality is designed to measure effect, not mechanism”.
Continuous Ordinal Patterns (COPs). This is a method based on pre-processing the time series under analysis using Continuous Ordinal Patterns (COPs) [42] to then apply the same GC test as described above [43]. COPs are patterns (here of length 4) that are compared against sub-windows of the original time series; they thus quantify the presence of specific non-linear structures, making these explicit for the GC test, and hence overcoming the linear nature of the latter.
Transfer Entropy (TE). The TE from X to Y is defined as the amount of uncertainty reduced in the future values of Y by knowing the past values of X, after considering the past values of Y [44]. Such uncertainty is calculated as an entropy, which is in turn calculated using two estimators of the underlying probability distributions.
- Ordinal estimator: the probability distribution is obtained by mapping the time series into an ordinal space, created by “permutation patterns”, i.e., the rank order of values inside small sub-windows of the original series [45]. We here consider pattern lengths (also called embedding dimensions) of $D = 3$, 4, and 5.
- Metric estimator: this approach, also called the Kozachenko–Leonenko estimator [46], uses a nearest-neighbours approach to estimate the entropy of a continuous random variable as the expectation of the logarithm of the density. We here use $k = 4$, 6, and 8 nearest neighbours.
The implementation of this metric corresponds to the one included in the infomeasure Python package [47].
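As an illustration of the mapping underlying the ordinal estimator, the sketch below extracts permutation patterns of length D from a time series and computes the entropy of their distribution. It only covers this first building block and is neither a transfer entropy estimator nor the infomeasure implementation; the function names are hypothetical, and ties are broken by position in the window, which is an assumption.

```python
from collections import Counter
from math import log2

def ordinal_pattern(window):
    """Indices of the window sorted by value (ties broken by position)."""
    return tuple(sorted(range(len(window)), key=window.__getitem__))

def pattern_distribution(series, D):
    """Relative frequency of each permutation pattern of length D."""
    counts = Counter(ordinal_pattern(series[i:i + D])
                     for i in range(len(series) - D + 1))
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}

def permutation_entropy(series, D):
    """Shannon entropy (in bits) of the pattern distribution."""
    probs = pattern_distribution(series, D)
    return -sum(p * log2(p) for p in probs.values())

rising = [1, 2, 3, 4, 5, 6]
zigzag = [1, 3, 2, 4, 3, 5, 4, 6]

h_rising = permutation_entropy(rising, D=3)  # one pattern only: 0 bits
h_zigzag = permutation_entropy(zigzag, D=3)  # two equiprobable patterns: 1 bit
```

Larger values of D distinguish more patterns (up to D! of them) but require longer time series to populate the distribution, which is one plausible reason for the sensitivity of the results to this parameter.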

Additional parameters of the scenario include the time separation between airports, being random numbers drawn from a uniform distribution $U(0.9, 1.2)$, in hours; an hourly capacity of 30 operations for all airports; and a minimum turnaround time and buffer time of 0.2 and 0.3, respectively, also in hours. In all cases, the maximum lag, i.e., the maximum number of hours that may be taken for the delays to propagate, is set to four. Additionally, the TE values have been converted to p-values using permutation tests with 100 randomly shuffled time series; note that the GC test, and hence the COP one, natively yield p-values.
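The conversion of a raw statistic into a p-value through shuffled surrogates can be sketched as follows. For brevity, the statistic used here is a simple lagged covariance acting as a stand-in for the TE; the function names, the synthetic data, and the (count + 1)/(n + 1) convention for the p-value are assumptions of this example rather than details of the analysis reported here.

```python
import random

def lagged_dependence(x, y, lag=1):
    """Stand-in statistic: absolute covariance between x[t] and y[t + lag]."""
    xs, ys = x[:-lag], y[lag:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs))

def permutation_p_value(x, y, statistic, n_shuffles=100, seed=0):
    """Fraction of shuffled sources scoring at least as high as the original."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_shuffles):
        shuffled = x[:]
        rng.shuffle(shuffled)  # destroys the temporal structure of the source
        if statistic(shuffled, y) >= observed:
            exceed += 1
    return (exceed + 1) / (n_shuffles + 1)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
# y follows x with a lag of one step, plus a small amount of noise.
y = [0.0] + [x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
z = [rng.gauss(0, 1) for _ in range(200)]  # unrelated to y

p_coupled = permutation_p_value(x, y, lagged_dependence)
p_independent = permutation_p_value(z, y, lagged_dependence)
```

With 100 shuffles the smallest attainable p-value under this convention is 1/101, which bounds the resolution of the resulting p-values from below.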

The results for all tests, in terms of the evolution of the median $\log_{10}$ of p-values, are depicted in Figure 5. Several interesting conclusions can be drawn. First of all, the selection of the parameter in the TE with the ordinal estimator has a major impact, defining whether the propagation is detected or not; yet, the same does not occur in the case of the metric estimator. Secondly, and not surprisingly, all tests benefit from longer time series, something that is challenging to achieve in the case of real data due to the non-stationarity of the system, as previously discussed in Ref. [48].

Not less importantly, the GC test is the one yielding the smallest p-values. Inasmuch as smaller p-values indicate stronger statistical evidence of a causal relationship, this result can have a two-fold explanation. On the one hand, the GC test has generally weaker requirements in terms of the minimum time series length; hence, the weak results of the TE may be due to a lack of data, but this cannot explain the weak results of COPs. On the other hand, this may point to the fact that the underlying propagation phenomenon is a linear one; non-linear tests, such as COPs and TE, may be less suited for detecting it. This suggests an interesting hypothesis: while the propagation of delays between pairs of routes may be a linear process, the systemic propagation, i.e., as seen in the macroscale across multiple routes, becomes non-linear, as illustrated by the phase transitions of Figure 4. Notably, this hypothesis could here be considered thanks to the flexibility of the model; evaluating the same in real data would be a challenging task at best.

4. Additional Technical Considerations

The package is freely available both in the PyPI (https://pypi.org/project/synthatdelays/, accessed on 1 October 2025) and Conda-forge (https://github.com/conda-forge/synthatdelays-feedstock, accessed on 1 October 2025) repositories; instructions for its installation are available in the aforementioned links. It supports Python versions 3.11 and above, including the latest Python 3.14, and only uses standard external libraries, NumPy [49] and Statsmodels [50] being its only dependencies. The code has been extensively tested, with a coverage (at the time of writing) of 99%. The source code is freely available in a GitLab repository (https://gitlab.com/MZanin/synth-at-delays, accessed on 1 October 2025), alongside documentation and feedback tools; the reader is encouraged to submit bug reports and improvement requests through it.

As previously mentioned, one of the advantages of this package is the reduced computational cost necessary to perform a simulation. To illustrate this, Figure 6 reports the time required to run a complete simulation; results were obtained using a single core of a 3.8 GHz Intel Core i7 processor. It can be appreciated that the run time scales almost linearly with the number of flights, while the number of airports has a negligible impact. Notably, small simulations only require a few seconds to complete, thus making complete statistical characterisations viable even on standard laptops.

5. Discussion and Conclusions

In this contribution we presented and described SynthATDelays, a Python package designed to generate realistic, yet minimalist, synthetic delay data of an air transport system. It provides a tool to establish lab conditions for working with air traffic delay data, thus filling the missing safe ground between delay data and their analysis and facilitating the verification of analysis programs and scientific workflows. This approach enables the empirical testing of hypotheses concerning aircraft delays, as well as our capacity to articulate such delays and the underlying causes. In essence, it contributes to the scientific robustness of associated data-based analyses.

Beyond presenting the technical characteristics of the package, we also illustrated how it can be used to answer specific research questions. In Section 3.2 we proposed an analysis of the evolution of delays and hence of the capacity of the system to absorb them when the operational buffer time is changed; and in Section 3.3 we compared the performance of three causality tests to detect the propagation of delays. Note that the last one is especially relevant in the context of reconstructing functional networks. Most existing studies [13,14,15,16] only apply one test, arbitrarily chosen by the researcher according to criteria like the length of available time series, yet these tests inherently measure different aspects of the signal, yielding potentially complementary views of the problem. As shown in Figure 5, linear methods seem to outperform non-linear ones; additionally, the results are not sensitive to the parameter k of the transfer entropy with a metric estimator, but are sensitive to the parameter D of the ordinal version. In short, these examples illustrate how the model can be used to make informed decisions about the tools used in the data analysis.

While the simplicity of the proposed model is by design and addresses a current research gap, the practitioner must also be aware of its inherent limitations. Specifically, SynthATDelays is not intended to simulate all aspects of the air transport system. To illustrate, airport queues are simplified and the corresponding capacity does not take into account runway configurations or weather conditions; similarly, we suppose that flights do not encounter limitations while airborne, such as capacity restrictions at specific sectors or airspaces. This package does not substitute existing simulation models, like those cited in the introduction, but rather aims at complementing them when the objective is to validate specific data-based analysis tools.

As a final point, the characteristics and functionalities described here correspond to version 1.0.0 of the package, and these may have evolved by the time of reading. We are specifically planning to extend the customisation options available to the user, including tuneable buffer times per airport and route, and tuneable aircraft performance. We further invite the community to submit suggestions and feature requests through the GitLab repository.

Author Contributions

Conceptualisation, M.Z.; software, C.M.B. and M.Z.; validation, C.M.B.; data curation, M.Z.; writing—original draft preparation, C.M.B. and M.Z.; writing—review and editing, C.M.B. and M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 851255). This work was partially supported by the María de Maeztu project CEX2021-001164-M funded by the MICIU/AEI/10.13039/501100011033 and FEDER, EU.

Data Availability Statement

The software library described in this work is freely available at https://gitlab.com/MZanin/synth-at-delays, alongside code to reproduce the shown examples.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Carlier, S.; De Lépinay, I.; Hustache, J.C.; Jelinek, F. Environmental impact of air traffic flow management delays. In Proceedings of the 7th USA/Europe Air Traffic Management Research and Development Seminar (ATM2007), Barcelona, Spain, 2–5 July 2007; Volume 2, p. 16.
2. Peterson, E.B.; Neels, K.; Barczi, N.; Graham, T. The economic cost of airline flight delay. J. Transp. Econ. Policy 2013, 47, 107–121.
3. Cao, Y.; Zhu, C.; Wang, Y.; Li, Q. A method of reducing flight delay by exploring internal mechanism of flight delays. J. Adv. Transp. 2019, 2019, 7069380.
4. Wang, Y.; Cao, Y.; Zhu, C.; Wu, F.; Hu, M.; Duong, V.; Watkins, M.; Barzel, B.; Stanley, H.E. Universal patterns in passenger flight departure delays. Sci. Rep. 2020, 10, 6890.
5. Mitsokapas, E.; Schäfer, B.; Harris, R.J.; Beck, C. Statistical characterization of airplane delays. Sci. Rep. 2021, 11, 7855.
6. Wang, Z.; Liao, C.; Hang, X.; Li, L.; Delahaye, D.; Hansen, M. Distribution prediction of strategic flight delays via machine learning methods. Sustainability 2022, 14, 15180.
7. Olivares, F.; Zanin, M. Quantifying Deviations from Gaussianity with Application to Flight Delay Distributions. Entropy 2025, 27, 354.
8. Schultz, M.; Lorenz, S.; Schmitz, R.; Delgado, L. Weather impact on airport performance. Aerospace 2018, 5, 109.
9. Zanin, M.; Zhu, Y.; Yan, R.; Dong, P.; Sun, X.; Wandelt, S. Characterization and prediction of air transport delays in China. Appl. Sci. 2020, 10, 6165.
10. de Oliveira, M.; Eufrásio, A.B.R.; Guterres, M.X.; Murça, M.C.R.; de Arantes Gomes, R. Analysis of airport weather impact on on-time performance of arrival flights for the Brazilian domestic air transportation system. J. Air Transp. Manag. 2021, 91, 101974.
11. Rodríguez-Sanz, Á.; Cano, J.; Rubio Fernandez, B. Impact of weather conditions on airport arrival delay and throughput. Aircr. Eng. Aerosp. Technol. 2022, 94, 60–78.
12. Zanin, M. Can we neglect the multi-layer structure of functional networks? Phys. A Stat. Mech. Its Appl. 2015, 430, 184–192.
13. Pastorino, L.; Zanin, M. Air delay propagation patterns in Europe from 2015 to 2018: An information processing perspective. J. Phys. Complex. 2021, 3, 015001.
14. Zhang, X.; Zhao, S.; Mei, H. Analysis of airport risk propagation in Chinese air transport network. J. Adv. Transp. 2022, 2022, 9958810.
15. Pastorino, L.; Zanin, M. Local and Network-Wide Time Scales of Delay Propagation in Air Transport: A Granger Causality Approach. Aerospace 2023, 10, 36.
16. Chen, S.; Du, W.; Liu, R.; Cao, X. Finding spatial and temporal features of delay propagation via multi-layer networks. Phys. A Stat. Mech. Its Appl. 2023, 614, 128526.
17. Rebollo, J.J.; Balakrishnan, H. Characterization and prediction of air traffic delays. Transp. Res. Part C Emerg. Technol. 2014, 44, 231–241.
18. Monmousseau, P.; Delahaye, D.; Marzuoli, A.; Féron, E. Predicting and analyzing US air traffic delays using passenger-centric data-sources. In Proceedings of the ATM 2019, 13th USA/Europe Air Traffic Management Research and Development Seminar, Vienna, Austria, 17–21 June 2019.
19. Liu, Y.; Liu, Y.; Hansen, M.; Pozdnukhov, A.; Zhang, D. Using machine learning to analyze air traffic management actions: Ground delay program case study. Transp. Res. Part E Logist. Transp. Rev. 2019, 131, 80–95.
20. Huynh, T.K.; Cheung, T.; Chua, C. A systematic review of flight delay forecasting models. In Proceedings of the 2024 7th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, Vietnam, 25–26 July 2024; pp. 533–540.
21. Grether, D.; Fürbas, S.; Nagel, K. Agent-based modelling and simulation of air transport technology. Procedia Comput. Sci. 2013, 19, 821–828.
22. Delgado, L.; Gurtner, G.; Weiszer, M.; Bolic, T.; Cook, A. Mercury: An open source platform for the evaluation of air transport mobility. In Proceedings of the 13th SESAR Innovation Days, Seville, Spain, 27–30 November 2023.
23. Hoekstra, J.M.; Ellerbroek, J. Bluesky ATC simulator project: An open data and open source approach. In Proceedings of the 7th International Conference on Research in Air Transportation, Philadelphia, PA, USA, 20–24 June 2016; Volume 131, p. 132.
24. Hui, K.Y.; Nguyen, C.H.; Lui, G.N.; Liem, R.P. AirTrafficSim: An open-source web-based air traffic simulation platform. J. Open Source Softw. 2023, 8, 4916.
25. Lin, Y.; Zhang, J.W.; Liu, H. Deep learning based short-term air traffic flow prediction considering temporal–spatial correlation. Aerosp. Sci. Technol. 2019, 93, 105113.
26. Yan, Z.; Yang, H.; Li, F.; Lin, Y. A deep learning approach for short-term airport traffic flow prediction. Aerospace 2021, 9, 11.
27. Wang, T.; Chen, J.; Lü, J.; Liu, K.; Zhu, A.; Snoussi, H.; Zhang, B. Synchronous spatiotemporal graph transformer: A new framework for traffic data prediction. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10589–10599.
28. Xu, Q.; Pang, Y.; Liu, Y. Air traffic density prediction using Bayesian ensemble graph attention network (BEGAN). Transp. Res. Part C Emerg. Technol. 2023, 153, 104225.
29. Chen, X.; Yu, H.; Cao, K.; Zhou, J.; Wei, T.; Hu, S. Uncertainty-aware flight scheduling for airport throughput and flight delay optimization. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 853–862.
30. Wei, M.; Yang, S.; Wu, W.; Sun, B. A multi-objective fuzzy optimization model for multi-type aircraft flight scheduling problem. Transport 2024, 39, 313–322.
31. Xu, Y.; Wandelt, S.; Sun, X. Airline scheduling optimization: Literature review and a discussion of modelling methodologies. Intell. Transp. Infrastruct. 2024, 3, liad026.
32. Zhou, S.; Xie, P.; Chen, X.; Wang, Y.; Zhang, Y.; Du, Y. Optimization of relative parameters in transfer entropy estimation and application to corticomuscular coupling in humans. J. Neurosci. Methods 2018, 308, 276–285.
33. Sipahi, R.; Porfiri, M. Improving on transfer entropy-based network reconstruction using time-delays: Approach and validation. Chaos Interdiscip. J. Nonlinear Sci. 2020, 30, 023125.
34. Ursino, M.; Ricci, G.; Magosso, E. Transfer entropy as a measure of brain connectivity: A critical analysis with the help of neural mass models. Front. Comput. Neurosci. 2020, 14, 45.
35. Novelli, L.; Lizier, J.T. Inferring network properties from time series using transfer entropy and mutual information: Validation of multivariate versus bivariate approaches. Netw. Neurosci. 2021, 5, 373–404.
36. SynthATDelays Documentation. Available online: https://gitlab.com/MZanin/synth-at-delays/-/wikis/Home/ (accessed on 15 August 2025).
37. Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438.
38. Shojaie, A.; Fox, E.B. Granger causality: A review and recent advances. Annu. Rev. Stat. Its Appl. 2022, 9, 289–319.
39. Diebold, F.X. Elements of Forecasting; South-Western College Pub.: Cincinnati, OH, USA, 1998.
40. Granger, C.W. Causality, cointegration, and control. J. Econ. Dyn. Control 1988, 12, 551–559.
41. Barrett, A.B.; Barnett, L. Granger causality is designed to measure effect, not mechanism. Front. Neuroinform. 2013, 7, 6.
42. Zanin, M. Continuous ordinal patterns: Creating a bridge between ordinal analysis and deep learning. Chaos Interdiscip. J. Nonlinear Sci. 2023, 33, 033114.
43. Zanin, M. Augmenting granger causality through continuous ordinal patterns. Commun. Nonlinear Sci. Numer. Simul. 2024, 128, 107606.
44. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461.
45. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102.
46. Kozachenko, L. Sample estimate of the entropy of a random vector. Probl. Pered. Inform. 1987, 23, 9.
47. Büth, C.M.; Acharya, K.; Zanin, M. Infomeasure: A Comprehensive Python Package for Information Theory Measures and Estimators. Sci. Rep. 2025, 15, 29323.
48. Acharya, K.; Olivares, F.; Zanin, M. How representative are air transport functional complex networks? A quantitative validation. Chaos Interdiscip. J. Nonlinear Sci. 2024, 34, 043133.
49. Harris, C.R.; Millman, K.J.; Van Der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
50. Seabold, S.; Perktold, J. Statsmodels: Econometric and statistical modeling with python. SciPy 2010, 7, 92–96.

Figure 1. Logical organisation of the package: from left to right, the user starts by defining the scenario and its multiple options; then executes the simulation; and then extracts results at different granularity levels. The middle and bottom rows further report the main functions involved and the classes in which data are stored.

Figure 2. Schematic representation of the logic behind the update of the status of each aircraft. See main text for details.

Figure 3. Graphical representation of the time series generated with the code described in Section 3. Left and right panels depict the results without and with airport-based delays, respectively. Blue and orange lines correspond to arrival and departure average delays for the first airport.

Figure 4. Evolution of the total landing delay (top panels), and number of functional links detected by the Granger causality (bottom left) and the transfer entropy (bottom right) as a function of the buffer time (in hours, X axis) and the percentage of linked flights (line colours and styles; see legends). Left and right top panels report the mean and standard deviation of the total delay, respectively, in thousands of hours. Insets in the bottom panels depict the evolution of the minimum buffer time required to reduce the number of functional links by 50% as a function of the percentage of linked flights. See main text for the definition of the simulation scenario.

Figure 5. Evolution of the $\log_{10}$ of the p-values yielded by several functional metrics for time series generated by the synthetic model described in Section 3.3. From left to right, the metrics include the GC and COP, and the TE with two different estimators; see main text for details. For the TE, different lines correspond to different values of the estimator parameter; see the legend. The results are obtained as the median over 200 independent realisations.

Figure 6. Analysis of the computational cost. The left panel reports the average time required to run a full simulation as a function of the number of aircraft (different lines; see text on top of them) and airports (X axis). The right panel reports the same information as a function of the number of flights; each point is thus the average of a curve on the left panel. Results correspond to the average over 20 independent realisations.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
