Analysis of the Structure and Dynamics of European Flight Networks

We analyze the structure and dynamics of the flight networks of 50 airlines active in the European airspace in 2017. Our analysis shows that the concentration of node degree differs markedly among airlines, reflecting the heterogeneity of airline business models. We obtain an unsupervised classification of airlines by performing a hierarchical clustering that uses, as a similarity measure, a correlation coefficient computed between the average occurrence profiles of the 4-motifs of airline networks. The hierarchical tree is highly informative with respect to properties of the different airlines (for example, the number of main hubs, participation in intercontinental flights, regional coverage, and the nature of the airline as commercial, cargo, leisure or rental). The 4-motif patterns are therefore distinctive of each airline and reflect information about the main determinants of different airlines. This information is different from what can be found by looking at the overlap of directed links.


Introduction
The air transportation system (ATS) is a socio-technical system that has been analyzed as a complex network for many years [1,2]. The ATS has been analyzed at different geographical scales (see, for example, studies covering the ATSs of China [3], Europe [4] and the U.S. [5]) and at different resolutions, from the airport-flight network down to the network of the reference points used in the definition of flight routes (called navigation points) [6].
In the majority of studies, the ATS is investigated by setting up a flight network where nodes are airports and flights connecting airports are links. Flight networks have been investigated as undirected and/or directed networks (in the latter case, each link points from the departure airport to the arrival airport), and as unweighted and/or weighted networks [7]. Several studies have considered the problem of the resilience of the ATS to failures and attacks [5,8,9,10]. Other studies have selected a subset of links (labeled as the "backbone" of the ATS) presenting statistical properties that are not consistent with a specific null hypothesis [11,12], making the ATS one of the first systems where statistically validated networks [13] have been investigated.
The ATS is a complex system composed of well-defined subunits. In fact, flights are operated by different airlines that compete and collaborate with one another. Since 2010, the ATS has been analyzed by distinguishing the role of its subunits (i.e., by analyzing properties of the flight networks of single airlines [5]). Moreover, the presence of different flight networks for different airlines made this system a natural candidate for the study of so-called multiplex networks, which are networks where nodes can have multiple kinds of relations called layers. In fact, the ATS was one of the first socio-technical systems described as a multiplex, where layers represent flights operated by different airlines [14,15].
Flight networks have been investigated from different perspectives and at different scales [16][17][18], for example, by considering basic network metrics, topology of the degree distribution, resilience to attacks or failures, community detection of large clusters, and the computation and analysis of network motifs. Motifs are subnetworks with a specified number of nodes and a specified shape, classified up to isomorphism. Such structures were first investigated in studies of social networks [19]. In these earlier studies, they were primarily investigated as triads (i.e., as subnetworks of three nodes) and put in relation with the properties of the degree sequence. At the beginning of this century, such structures were also investigated in biological systems under the name of motifs [20]. By considering isomorphic motifs (i.e., subnetworks where the identity of the nodes is not taken into account when considering the shape of the subnetwork), there are 13 distinct subnetworks with 3 nodes, or 3-motifs. This number grows rapidly when the subnetwork includes more nodes. For subnetworks with 4 nodes (or 4-motifs), one counts 199 isomorphic motifs [20].
Network motifs have been investigated in flight networks both in studies comparing the informativeness of network detection in several types of complex networks [21] and in studies fully focused on the static and dynamic characteristics of the flight networks [22][23][24][25]. In this study, we investigate the temporal evolution of 3-motifs and 4-motifs for the 50 European airlines with the highest number of flights in the European Civil Aviation Conference (ECAC) airspace in the year 2017. By investigating the number and temporal evolution of the 3- and 4-motifs, we are able to perform an unsupervised classification of the 50 airlines, indicating that the main differences among airlines are due to their regional specialization (including the ability to perform intercontinental flights) and to their business model. We observe that the business model of each airline ranges between the two stylized models of hub-and-spoke and point-to-point [26,27]. In a hub-and-spoke model, one or more airports act as "hubs", i.e., as special airports directly connecting all remaining airports. In a hub-and-spoke structure with a single hub, the network therefore has a star topology with the hub at the center of the star and all the other airports acting as leaves of the network. In the point-to-point structure, all the airports are equivalent and the network is characterized by direct pairwise connections between airports.
The main goal of our investigation is a reliable and effective classification of airlines. The classification is obtained by an unsupervised methodology that only takes into account the information about the airline flights. We hypothesize that the business model of each airline induces specific constraints on its flight network. These constraints are reflected in the motif occurrence of each airline. Our network analysis shows that European airlines present a heterogeneous profile distributed between the two boundaries of hub-and-spoke and point-to-point business models. The heterogeneity is clearly shown by using a measure of concentration of degree in the degree sequence. Specifically, as a measure of concentration, we use an adapted version of the Herfindal-Hirshman index [28,29]. For the sake of simplicity, in the remaining text, we will call this index by the more traditional, although imprecise, name of Herfindal index. The time evolution of motifs shows that the basic temporal unit of the flight schedule is the week. Differences in the degree concentration observed during winter and summer schedules are detected, but their amount is negligible for most airlines. Average values of the motif occurrences may therefore be a useful proxy of the average behavior of the airlines over a calendar year. By using average values of the 4-motif occurrences, we are able to obtain an unsupervised classification of airlines. The obtained hierarchical clustering shows that the presence of a given number of hubs, together with the presence or absence of intercontinental flights, characterizes groups of airlines. On the other hand, a hierarchical clustering based on a similarity measure estimated from the co-presence of two airlines on the same origin-destination flights is poorly informative.
The paper is organized as follows. In Section 2, we discuss the data used in our analysis and the metrics and methods used to characterize flight networks. In Section 3, we present our results about the heterogeneity of the degree concentration and our results about the structure and time evolution of 3- and 4-motifs for the different airlines. Average 4-motif occurrences are used to perform an unsupervised clustering of the 50 airlines providing an informative hierarchical cluster. In Section 4, we discuss our results.

Data and Methods
We investigate the flight networks of the 50 biggest commercial airlines flying over the European flight zone. Specifically, we consider all flights that occurred during the period from 1 January 2017 to 31 December 2017.
A flight network is a network where nodes are airports and links are flights that occurred in a given time interval. Since each flight goes from a departure airport to an arrival airport, flight networks can be described as directed weighted networks (where the weight of a link is the number of flights that occurred from airport i to airport j in the chosen time interval). In this study, we treat flight networks as directed networks and disregard the weights of the links. Networks are computed using daily and weekly time intervals.
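The construction described above can be sketched in a few lines. This is a minimal illustration, assuming each flight record is a tuple (airline, day, origin, destination); the record layout, the function name and the sample values are hypothetical and do not reflect the actual DDR format.

```python
from collections import defaultdict


def build_daily_networks(flights):
    """Group flights into one directed, unweighted network per (airline, day).

    Each network is stored as a set of (origin, destination) pairs, so
    repeated flights on the same route collapse into a single directed link.
    """
    networks = defaultdict(set)
    for airline, day, origin, destination in flights:
        networks[(airline, day)].add((origin, destination))
    return dict(networks)


# Hypothetical records: (airline code, ISO date, origin, destination)
flights = [
    ("KLM", "2017-09-01", "AMS", "CDG"),
    ("KLM", "2017-09-01", "AMS", "CDG"),  # second flight on the same route
    ("KLM", "2017-09-01", "CDG", "AMS"),
    ("RYR", "2017-09-01", "STN", "DUB"),
]
networks = build_daily_networks(flights)
```

Weekly networks would follow from the same function by passing a week label in place of the day.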
The flight network of each airline and its metrics are analyzed both in their time evolution and in their subunits. Specifically, we investigate the degree sequence of each airline for each day. In our analysis, we primarily focus on the concentration of the highest degree values on a limited set of airports usually described as "hubs". This is performed by adapting the Herfindal index, i.e., a well-known measure of concentration, to the degree sequence. The subunit analysis is carried out by considering all isomorphic small networks with 3 or 4 nodes. These subnetworks are called motifs in the biological literature, and triads or subnetworks in the social science literature.
We compare pairs of airlines both by considering the links (i.e., the flights) they operate on a specific day or week and by considering the motifs they present on a specific day or on average over the full year. Similarity between airlines is then estimated and interpreted by extracting hierarchical trees from the selected similarity matrix.

Flight Data
Our dataset comprises all the flights that, even partly, crossed the ECAC airspace during the entire year 2017. Data were obtained from EUROCONTROL (http://www.eurocontrol.int, accessed on 4 February 2022), the European public institution that coordinates and plans air traffic control for all of Europe.
Specifically, we obtained access to the Demand Data Repository (DDR) from which one can obtain all flights followed by any aircraft in the ECAC airspace. Data about flights contain several types of information. In the present study, we just focus on the origin-destination of each flight crossing the ECAC airspace at a given time.
By considering that our focus is on the specific characteristics of airlines, in the present study, we investigate flights of the major 50 airlines performing flights in the ECAC airspace in 2017. In our set, we do not consider Air Berlin because this airline ceased operations on 27 October 2017. Since 2016, Germanwings has been a lease operator for its sister company Eurowings. In our set, we are not considering Germanwings flights. The selected airlines have performed 65.7% of the total number of flights of 2017, which corresponds to approximately 3000 flights per company per month on average. The list of the 50 airlines is provided in Appendix A. The large majority of airlines are commercial airlines. There are 24 flag carrier airlines, 14 low cost carrier (LCC) airlines, 6 regional airlines, 2 leisure airlines, 2 scheduled airlines, 1 cargo airline and 1 rental airline.

Herfindal Index
The Herfindal index [28] was introduced in the economic literature to measure the amount of competition among industrial firms. As such, it has also been used as an indicator of concentration, as large firms usually contribute more to the Herfindal index than smaller ones. In the context of complex networks, the Herfindal index can be defined as
$$H = \sum_{i=1}^{N} \left( \frac{d_i}{2m} \right)^2, \qquad (1)$$
where d_i is the degree of node i, N is the number of nodes and 2m is twice the number of directed links.
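A minimal sketch of this concentration measure, assuming that it takes the form H = Σ_i (d_i / 2m)^2 and that the degree d_i counts both incoming and outgoing links, so that the degrees sum to 2m. The function names and the toy networks are hypothetical.

```python
from collections import Counter


def herfindal_index(links):
    """Sum over nodes of (d_i / 2m)^2 for a collection of directed links."""
    links = set(links)
    degree = Counter()
    for origin, destination in links:
        degree[origin] += 1       # outgoing link
        degree[destination] += 1  # incoming link
    two_m = 2 * len(links)
    return sum((d / two_m) ** 2 for d in degree.values())


def star_network(n_spokes):
    """One hub connected in both directions to n_spokes other airports."""
    links = set()
    for s in range(n_spokes):
        links.add(("HUB", f"A{s}"))
        links.add((f"A{s}", "HUB"))
    return links


def complete_network(n):
    """Every ordered pair of n airports is connected (pure point-to-point)."""
    return {(f"A{i}", f"A{j}") for i in range(n) for j in range(n) if i != j}


h_star = herfindal_index(star_network(100))
h_complete = herfindal_index(complete_network(10))
```

Under these assumptions, the star network gives a value slightly above 0.25, while the fully connected network gives 1/N, so the index separates the two stylized business models.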

Motifs Detection
The investigation of subnetworks of fixed size (also called motifs) has a long history. Originally investigated as triads and put in relation with the properties of the degree sequence in the investigation of social networks [19], they were then also introduced in biology where the term "motif" was used for the first time [20].
In network analysis, a motif of size k is a structure of k nodes not necessarily all linked between each other, as, for example, in Figure 1. Motifs are different from cliques. A clique is defined in undirected networks, and it is a subgraph such that every two distinct vertices are adjacent.
For size k = 3, there are 13 isomorphic 3-motifs. Figure 1 shows all of them together with the classification scheme used in [20]. The number of isomorphic 4-motifs is 199, much larger than 13. As for the 3-motifs, we use the classification of [20]. The shape of each 4-motif can be found in the motifs dictionary that can be downloaded from the website of the Uri Alon laboratory.
Network motif analysis can be performed by computational or analytical approaches. In our investigation, we considered a computational approach as it allows for the exact count of network motifs. Computational approaches usually follow a three-step procedure that can be summarized as follows:
• Search and enumerate occurrences of a topology with fixed size in the observed network;
• Classify topologies by their isomorphic classes;
• Calculate statistical significance for each isomorphic class by comparing occurrences with those in a random ensemble.
In particular, we used the mfinder software [30] developed by the Uri Alon laboratory.
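The first two steps of the procedure can be illustrated with a brute-force sketch. This is not the mfinder algorithm: it simply inspects every set of k nodes, so it is only practical for small networks. The labeling rule is an assumption made here: each subnetwork is labeled by the smallest integer obtained by reading its k × k adjacency matrix row by row as a binary number, minimized over all node orderings. Under this assumption, a bidirectional star receives the labels 78 (k = 3) and 4382 (k = 4) and the fully bidirectional triangle receives the label 238, in agreement with the shapes described in the text. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_id(nodes, links):
    """Smallest integer encoding of the adjacency matrix over all node orderings."""
    best = None
    for order in permutations(nodes):
        value = 0
        for u in order:
            for v in order:
                value = (value << 1) | ((u, v) in links)
        if best is None or value < best:
            best = value
    return best


def weakly_connected(nodes, links):
    """True if the nodes form one piece when link directions are ignored."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in nodes - seen:
            if (u, v) in links or (v, u) in links:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def count_motifs(links, k):
    """Count connected induced k-node subnetworks, grouped by canonical label."""
    links = set(links)
    nodes = sorted({n for link in links for n in link})
    counts = Counter()
    for subset in combinations(nodes, k):
        if weakly_connected(subset, links):
            counts[canonical_id(subset, links)] += 1
    return counts


# Toy hub-and-spoke network: one hub, four spokes, all links bidirectional.
hub_links = set()
for spoke in ["A", "B", "C", "D"]:
    hub_links.add(("H", spoke))
    hub_links.add((spoke, "H"))

three_motifs = count_motifs(hub_links, 3)
four_motifs = count_motifs(hub_links, 4)
```

The third step (statistical significance against a random ensemble) is not covered by this sketch.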

Average Linkage Clustering Analysis
We assess the similarity between each pair of the n airlines by estimating the correlation between the average occurrences of the 4-motifs of the two airlines. The average is computed over the 365 days of the year. To take into account the large interval of values observed for the different 4-motifs, we use the Spearman correlation coefficient. Therefore, by starting from the matrix of records obtained by averaging the occurrence of each 4-motif, we estimate a correlation matrix and we use the correlation ρ_ij as a measure of similarity between airlines i and j.
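The rank-based correlation used here can be sketched as follows. This is a plain-Python illustration, assuming each airline is represented by a list of average occurrences with one entry per 4-motif; tied values receive their average rank. The function names and the sample profiles are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical average occurrences spanning several orders of magnitude.
profile_a = [1.0, 10.0, 100.0, 1000.0]
profile_b = [2.0, 3.0, 50.0, 7000.0]
rho = spearman(profile_a, profile_b)
```

Because only ranks enter the calculation, motifs with very large occurrences do not dominate the similarity, which is the motivation given in the text for choosing this coefficient.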
From the correlation values, we compute a distance according to the relation d_ij = 2(1 − ρ_ij). This distance is then used to extract a hierarchical tree with the average linkage method.
The average linkage cluster analysis is a hierarchical clustering procedure [31,32]. The procedure gives as an output a rooted tree or dendrogram. In this procedure, at each step, when two elements or one element and a cluster or two clusters p and q merge in a wider single cluster t, the distance d tr between the new cluster t and any cluster r is recursively determined as the average distance between any element of t and any other element of cluster r.
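The merging rule can be illustrated with a naive implementation that recomputes the average distance between clusters from the original distance matrix at every step. It is a sketch for small inputs, not an optimized routine. The correlation values are hypothetical, the distance is computed as d_ij = 2(1 − ρ_ij) as written in the text, and the format of the merge record is a choice made here for illustration.

```python
from itertools import combinations


def average_linkage(dist):
    """Return the list of merges (cluster_p, cluster_q, height, size).

    Original elements are clusters 0..n-1; each merge creates a new cluster
    with the next free integer label.
    """
    clusters = {i: [i] for i in range(len(dist))}
    next_label = len(dist)
    merges = []

    def average_distance(p, q):
        # Mean distance over all pairs with one element in p and one in q.
        total = sum(dist[i][j] for i in clusters[p] for j in clusters[q])
        return total / (len(clusters[p]) * len(clusters[q]))

    while len(clusters) > 1:
        height, p, q = min(
            (average_distance(p, q), p, q)
            for p, q in combinations(sorted(clusters), 2)
        )
        clusters[next_label] = clusters.pop(p) + clusters.pop(q)
        merges.append((p, q, height, len(clusters[next_label])))
        next_label += 1
    return merges


# Hypothetical correlation matrix for three airlines.
rho = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.3],
    [0.5, 0.3, 1.0],
]
dist = [[2 * (1 - r) for r in row] for row in rho]
merges = average_linkage(dist)
```

In this toy case, airlines 0 and 1 merge first, and the resulting cluster joins airline 2 at the average of the two remaining distances.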

Herfindal Index
Our first analysis determines the daily flight network of each investigated airline. The day is defined as the calendar day in Central European Time. For illustrative purposes, we show the networks of the nine biggest airlines on 1 September 2017 in Figure 2. This day has been retrospectively selected as an example of a day with routine operational activities. For each flight network, we extract the degree sequence by considering the network as a directed network. The average values over the year of the number of nodes N (i.e., the number of airports where the airline flies), the number of directed links E (i.e., the number of distinct origin-destination flights), the minimum degree, median degree, mean degree, maximum degree, standard deviation of the degree and Herfindal index are shown in Table 1. The metrics shown in Table 1 are basic and standard, with the exception of the adaptation of the Herfindal index as an indicator of the concentration of the degree sequence in one or more of the nodes.
Given the definition of Equation (1), a pure hub-and-spoke setting of flights would imply a Herfindal index of 0.25 for large values of N. This is what we observe (see Table 1). Our analysis can therefore confirm that flight network characteristics are deeply related to the business organization of each airline, with a prominent role played by the choice of a hub-and-spoke versus a point-to-point structure and with a role played by the number of hubs characterizing the flight network.
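The 0.25 limit can be checked with a short worked calculation. It assumes that Equation (1) has the form H = Σ_i (d_i / 2m)^2, that the degree d_i counts both incoming and outgoing links, and that every connection is flown in both directions; these are assumptions made here for illustration. The last line treats the opposite idealized case, in which every pair of the N airports is directly connected.

```latex
% Single hub with N-1 spokes, all links bidirectional: m = 2(N-1) directed links.
% Fully connected network: d_i = 2(N-1) for every node and m = N(N-1).
\begin{align}
d_{\mathrm{hub}} &= 2(N-1), \qquad d_{\mathrm{spoke}} = 2, \qquad 2m = 4(N-1), \\
H_{\mathrm{star}} &= \left(\frac{2(N-1)}{4(N-1)}\right)^{2}
  + (N-1)\left(\frac{2}{4(N-1)}\right)^{2}
  = \frac{1}{4} + \frac{1}{4(N-1)}
  \;\xrightarrow[N\to\infty]{}\; \frac{1}{4}, \\
H_{\mathrm{complete}} &= N\left(\frac{2(N-1)}{2N(N-1)}\right)^{2}
  = \frac{1}{N}
  \;\xrightarrow[N\to\infty]{}\; 0 .
\end{align}
```

Under these assumptions, values close to 0.25 point to a single dominant hub, while values approaching zero point to a point-to-point structure.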
In the next section, we investigate 3-motifs to better characterize similarity and differences among the flight networks of airlines.

3-Motifs
We have computed the number of 3-motifs present on daily flight networks for all 50 airlines. In Figure 4, we show a color code map of the occurrence of the 13 isomorphic 3-motifs for the 9 largest airlines.
The occurrence of each 3-motif presents large variability among the different types of motifs and is correlated with properties of the flight networks such as the number of nodes, the number of links, the number of bidirectional links and the topological structure of the network. The most common 3-motif is motif 78. The prevalence of this motif shows that a hub-like structure and bidirectional links are essential ingredients of all flight networks. The other 3-motif with only bidirectional links, i.e., 3-motif 238, is significantly present in airlines whose flight networks have a pronounced point-to-point structure, such as Ryanair, EasyJet and Vueling, or in airlines having more than a single hub, such as Lufthansa, Turkish and Scandinavian Airlines. The 3-motifs with only unidirectional links are rarely observed (see average occurrence values of 3-motifs 6, 12, 36, 38 and 98). Some of the 3-motifs with mixed types of links are significantly present (for example, 3-motifs 14 and 74), while others are rarely observed (as in the case of 3-motifs 102 and 108).
The profile of occurrence of the 3-motifs in different airlines is certainly informative. However, the number of 3-motifs is somewhat limited and therefore it is useful to consider motifs of larger size. In the next section, we investigate the occurrence of 4-motifs.
Figure 4. Color code map of the occurrence of the 13 isomorphic 3-motifs, labeled according to [20]. A zero occurrence is indicated with a −1 value. Each panel refers to one of the nine biggest airlines. The airline name is indicated on top of the panel.

Daily Occurrence of 4-Motifs
We compute the occurrence of all 4-motifs for the daily flight networks of the 50 biggest airlines. In Figure 5, we show a color code map of the occurrence of the 199 isomorphic 4-motifs for the 9 largest airlines.
The profile of 4-motifs is richer than that of the 3-motifs. Occurrences of the 4-motifs span about 5 orders of magnitude. For this reason, in Figure 5, we show the decimal logarithm to provide a comprehensive overview of the results. Airlines characterized by the presence of a single hub, such as KLM, present only a very limited number of 4-motifs with occurrence different from zero. Airlines with a business model closer to a point-to-point structure, such as Ryanair and EasyJet, present a higher number of observed 4-motifs. The other airlines, characterized by a different number of hubs, present an intermediate behavior between the two extremes. In addition to the presence or absence of a given motif on a given day, Figure 5 also shows a time variation of the occurrence of a given motif. To investigate the main frequencies associated with this time variation, we compute the periodogram of the occurrence of a set of 4-motifs. Specifically, we consider the twelve 4-motifs with the highest occurrence averaged over all considered days. In Figure 6, we show the power spectrum of the time evolution of the occurrence of the top twelve 4-motifs of Ryanair. For all 4-motifs, frequency peaks are detected for f = 0.14 day⁻¹ and for its second and third harmonics. The main frequency f = 0.14 day⁻¹ (approximately 1/7 day⁻¹) corresponds to a weekly cycle, and the second and third harmonics correspond to cycles repeating twice and three times per week (periods of about 3.5 and 2.3 days). Therefore, the main underlying periodicity is the weekly periodicity, as already observed in the estimation of the Herfindal index (see periodicity observed in Figure 3).
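The way a weekly schedule shows up in a periodogram can be illustrated on synthetic data. The sketch below uses a naive discrete Fourier transform and a made-up daily series that repeats every 7 days; the series values and function name are hypothetical and are not taken from the dataset.

```python
import cmath
import math


def periodogram(series):
    """Return (frequencies in 1/day, power) using a naive DFT of the centered series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


# Synthetic motif occurrences: weekday plateau with a weekend dip, 52 weeks.
week = [100, 100, 100, 100, 100, 60, 70]
series = week * 52

freqs, power = periodogram(series)
peak_frequency = freqs[power.index(max(power))]
```

For this synthetic series, the power is concentrated at 1/7 day⁻¹ and at its multiples 2/7 and 3/7 day⁻¹, which mirrors the fundamental and harmonic peaks described in the text.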
All the other airlines have top 10 4-motifs presenting a high number of bidirectional links. The 4-motif with the highest occurrence for all the top 9 airlines (and indeed the top 4-motif for 49 of the 50 airlines) is 4-motif 4382. This motif presents three bidirectional links originating from the same node. As for 3-motif 78, the large occurrence of this motif reflects the fact that at least one important airport is used as a hub by the airline generating the network. The other 4-motifs composed of only bidirectional links (i.e., 4-motifs 4698, 4958, 13260 and 13278) are compatible with a point-to-point structure or with a hub-and-spoke structure in the presence of at least two hubs. In fact, these 4-motifs are not observed for KLM and are observed at the highest ranks for more point-to-point oriented airlines such as Ryanair, EasyJet and Vueling. They are also present when more than one hub is present as, for example, in the case of Lufthansa or Scandinavian Airlines. The ranking of the 4-motifs can therefore be used to evaluate the similarity of airline flight networks, and we investigate this possibility in the next section.
Figure 7. Shape of the 10 top 4-motifs observed for the 9 largest airlines (see Table 2). The 4-motifs are labeled according to [20]. Left column groups 4-motifs with only unidirectional links. Right column groups 4-motifs with only bidirectional links. Center column groups 4-motifs with both unidirectional and bidirectional links.

Similarity of 4-Motif Profile
We use the information about the occurrence of 4-motifs to obtain a categorization of airlines by using the methodology of Section 2.4. It is worth recalling here that, given this specific purpose, it is not necessary for us to keep the information about the specific airports that are present in a motif. In fact, since we are interested in extracting a clustering of airlines by using the structural information about the 4-motifs, only the isomorphic motifs are relevant for us.
The result of our analysis is shown in Figure 8. The hierarchical tree of Figure 8 is highly informative with respect to the clustering of groups of airlines. One airline markedly distinct from all others is NetJets Transportes Aéreos, S.A. (NJE). This is the only airline of the set providing rental of jets, and its separation from all the others indicates that the flights of this rental company generate 4-motifs that are quite distinct from those of all other airlines. An inspection of the hierarchical tree indicates the presence of clusters of airlines presenting a certain similarity among them and a degree of dissimilarity from the other airlines. Here, we comment on some of them. One cluster is the cluster of KLM, Aeroflot (AFL) and Brussels Airlines (BEL). These three airlines have a single large hub, as testified by a Herfindal index very close to 0.25 (see Table 1). Another cluster comprises Delta Air Lines (DAL), American Airlines (AAL) and United Airlines (UAL). These three airlines are American airlines primarily performing intercontinental flights.

Airline Networks Overlap
It is worth estimating whether the similarity between 4-motif occurrences could just be due to the overlap between the links of the flight networks of airlines. We rule out, to a large extent, this possibility by investigating the degree of overlap between all pairs of airline networks. Our investigation is conducted by estimating the Jaccard measure J(net_1, net_2) between each pair of airline networks net_1 and net_2. The Jaccard similarity is defined as
$$J(\mathrm{net}_1, \mathrm{net}_2) = \frac{|E_1 \cap E_2|}{|E_1 \cup E_2|},$$
where |E_1 ∩ E_2| is the number of directed links appearing in both flight networks, and |E_1 ∪ E_2| is the number of directed links that appear in at least one of the two networks.
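A minimal sketch of this overlap measure, assuming each network is given as a collection of directed (origin, destination) links. The function name, the airport codes and the convention of returning 0 for two empty networks are choices made here for illustration.

```python
def jaccard(links_1, links_2):
    """Jaccard similarity between two sets of directed links."""
    e1, e2 = set(links_1), set(links_2)
    union = e1 | e2
    if not union:
        return 0.0  # convention chosen here for two empty networks
    return len(e1 & e2) / len(union)


# Hypothetical weekly networks of two airlines.
net_1 = {("AMS", "CDG"), ("CDG", "AMS"), ("AMS", "FCO")}
net_2 = {("AMS", "CDG"), ("CDG", "LHR")}
overlap = jaccard(net_1, net_2)
```

Because links are directed, ("AMS", "CDG") and ("CDG", "AMS") count as different links, so two airlines overlap only on routes they both fly in the same direction.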
To take into account the weekly variability of flight schedules, we have performed this analysis by considering the weekly schedule of each airline. The results obtained at the daily level show a degree of similarity of the same order or less. In Figure 9, we show the average linkage hierarchical tree obtained by using the Jaccard measure as a similarity measure. The hierarchical tree is poorly informative and only a very limited number of small clusters can be highlighted. This is in marked contrast with what we have obtained in the previous section, where the similarity measure between airlines was obtained from the analysis of 4-motifs. The hierarchical tree shown in Figure 9 is representative of the hierarchical trees obtained for all weeks of 2017.
In summary, we are the first to use the Herfindal index to characterize each airline operating in a given period. Moreover, by using the Herfindal index together with 3- and 4-motif analysis, we are able to achieve an unsupervised classification of airlines, clarifying the main characteristics of each airline.

Discussion and Conclusions
In the present study, we have analyzed the structure and dynamics of flight networks of 50 airlines performing most of the flights that occurred in the European airspace in 2017. Our analysis of directed flight networks shows that the degree concentration of the different networks is quite heterogeneous among the different airlines. We have been able to quantify this heterogeneity by using an adapted version of a classic measure of concentration, i.e., the Herfindal index. The Herfindal index provides a simple and reliable estimation of the closeness of the airline network to the reference models classified as hub-and-spoke and point-to-point. It can also be informative about the number of main hubs that are present in a network with a hub-and-spoke structure and multiple hubs. It is worth noting that the European ATS presents a very heterogeneous set of airline companies. In other words, business optimization performed at the level of a single airline generates different business models that eventually coexist in the global system. The time evolution of the different networks presents a basic time cycle that is a weekly cycle. This basic timescale is evident both from the analysis of the time evolution of the Herfindal index and from the analysis of the time evolution of the occurrence of 3-motifs and 4-motifs. The summer-winter cycle primarily detected in the number of flights occurring daily or weekly does not significantly affect the long-term time evolution of the Herfindal index and the occurrence of 3-motifs and 4-motifs. These indicators are therefore more related to the type of business model followed by the airline than to the specific origin-destination links or number of flights operated in a given time interval.
In summary, an unsupervised classification based on hierarchical clustering, obtained by using a correlation coefficient between the occurrence profiles of the 4-motifs of airline networks as a similarity measure, is highly informative with respect to the properties of the different airlines (for example, the number of main hubs, their participation in intercontinental flights, their regional coverage, and their nature as commercial, cargo, leisure or rental airlines). The 4-motifs are therefore distinctive of the airlines and reflect information about the main determinants of the different airlines. This information is distinct from that obtained from the overlap of directed links.
Such results indicate that a reliable and effective classification of airlines can be obtained by an unsupervised methodology that only takes into account the information about the airline flights. This is an important result given that the characterization of airlines and their business models has become a fundamental part of modern air transportation systems. An appropriate airline categorization is important not only for practitioners but also because it influences the passengers' perception. A clear advantage of our approach is its flexibility: it can directly reflect any repositioning of an airline within the general landscape of airlines that follows from a change in its business model, as reflected in its flight plans.