Article

Unsupervised Clustering of Cities Using Commercial Air Traffic: A Proxy for Economic Connectivity

1 URJC Technology Research Center for Data, Complex Networks and Cybersecurity Sciences, Rey Juan Carlos University, Plaza de Manuel Becerra, 14, 28028 Madrid, Spain
2 Science, Computing and Technology Department, School of Architecture, Engineering and Design, Universidad Europea de Madrid, Calle Tajo, S/N, 28670 Villaviciosa de Odón, Spain
3 Department of Computer Science and Technology, Universidad Internacional de La Rioja, 26006 Logroño, Spain
4 Independent Researcher, Madrid, Spain
5 Laboratory of Mathematical Computation on Complex Networks and Their Applications, Rey Juan Carlos University, Calle Tulipán, 28933 Móstoles, Spain
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(10), 1654; https://doi.org/10.3390/math14101654
Submission received: 19 February 2026 / Revised: 23 April 2026 / Accepted: 28 April 2026 / Published: 13 May 2026
(This article belongs to the Special Issue Modeling of Processes in Transport Systems)

Abstract

This paper proposes a data-driven framework to identify and rank economically connected cities by using commercial air traffic as a proxy for urban economic connectivity. The study is motivated by the limitations of traditional city classifications, which often rely on costly, multidimensional socioeconomic indicators, and by the need for scalable alternatives based on open mobility data. Using daily flight frequencies for 2022 between 213 cities included in the GaWC classification, we built a time series for each origin–destination pair and clustered these temporal profiles with an unsupervised method. The resulting clusters were used to define the layers of a multiplex network, where each layer represents a different pattern of flight connectivity. City importance was then estimated through Multiplex PageRank, which allows temporal behavior and multilayer network structure to be combined in a single ranking scheme. Rather than introducing a new standalone algorithm, this paper contributes a reproducible analytical pipeline that integrates time-series clustering with multiplex centrality analysis using open aviation data. The results show that the ranking obtained is broadly aligned with established classifications such as GaWC, supporting the idea that commercial flight dynamics can provide a useful proxy for economic interconnectedness. The proposed approach offers a simple and replicable tool for comparative urban analysis, although the results should be interpreted with caution given the limited post-pandemic period covered by the data.

1. Introduction

1.1. Cities Classification

The 11th United Nations (UN) Sustainable Development Goal highlights the importance of making cities and human settlements inclusive, safe, resilient and sustainable [1]. Cities play a central role in this objective, as they concentrate a large share of the global population and act as key drivers of economic activity [2].
However, cities are complex systems that cannot be described through a single perspective. On the one hand, they have a social dimension, related to quality of life, the use of public space, levels of education and income, or their ability to attract both residents and visitors. On the other hand, there is a governmental dimension, which includes aspects such as public services, budget allocation, governance quality or transparency. Finally, cities also present a business and economic dimension, linked to innovation capacity, company performance, internationalisation and talent attraction.
Because of this complexity, many efforts have been made to classify cities according to different sets of variables. These classifications are commonly used to support urban planning and policy-making, as they provide a way to compare cities and identify strengths and weaknesses. Traditionally, such classifications rely on heterogeneous data sources, including demographic indicators (population, age structure, education), economic variables (income, number of firms), characteristics of the built environment (real estate prices, land use), or infrastructure networks (transport, energy or water systems).
Some well-known examples include the ranking proposed by a global real estate company, which evaluates city competitiveness based on factors such as investment attractiveness, quality of life or innovation capacity [3]. The Global Power City Index (GPCI), developed by the Institute for Urban Strategies of the Mori Memorial Foundation, considers six dimensions: economy, research and development, cultural interaction, liveability, environment and accessibility [4]. Another widely used reference is the Globalisation and World Cities (GaWC) network [5], which focuses on international connectedness and models cities as part of a global network based on flows of advanced producer services [5,6].
In parallel, the increasing digitisation of economic and social activities has led to the generation of large amounts of data, some of which have been made publicly available by institutions and companies. These data sources open new possibilities to analyse cities from alternative perspectives. In this work, we explore one of these sources: global commercial flight data between civilian airports.
The objective of this paper is to analyse whether patterns extracted from this mobility data can provide insights into the role of cities within the global system, and to what extent they align with existing classifications. In particular, we compare our results with the GaWC 2020 classification [5], which is used here as a proxy for economic interconnectedness. This choice is mainly motivated by the availability of consistent and publicly accessible data for that specific year.

1.2. Contribution and Novelty

This work aims to provide an alternative perspective on the classification of cities by exploiting mobility data derived from global commercial flights. Unlike traditional approaches, which rely on socio-economic, demographic or infrastructural indicators, our proposal is based on the analysis of temporal patterns extracted from real-world interactions between cities. In particular, we model the dynamics of flight connections as time series and explore their structural properties to identify similarities between urban areas.
The main novelty of this article lies in the combination of two elements. First, we use a data-driven approach based on large-scale, openly available mobility data, which allows us to capture functional relationships between cities beyond static indicators. Second, we apply time series clustering techniques, including shape-based and visibility graph-based methods, to uncover different types of temporal behaviours associated with these connections.
From a methodological point of view, this approach enables the identification of groups of cities that share similar interaction patterns over time, offering a complementary view to existing classifications. In addition, the comparison with established rankings, such as GaWC, allows us to assess the extent to which mobility-based patterns align with more traditional measures of global urban importance.
Overall, the contribution of this work is twofold: on the one hand, it introduces a novel framework for city classification based on temporal interaction data; on the other hand, it provides empirical evidence on how these data-driven groupings relate to widely used global city rankings.

2. Background and Related Works

2.1. City Classifications

The classification of cities at a global scale has received considerable attention in fields such as urban studies, geography and socio-economic analysis. Over time, different approaches have been proposed to rank cities according to multiple dimensions, including economic performance, quality of life, infrastructure or levels of connectivity.
One of the most widely used frameworks is the Global Cities Index by A.T. Kearney, which evaluates cities through indicators such as business activity, human capital, information exchange, cultural experience and political engagement [7]. This index highlights the relevance of international connectivity and integration in shaping the global role of cities.
Similarly, the Global Power City Index (GPCI), developed by the Mori Memorial Foundation, ranks cities based on their overall attractiveness and functional power, considering aspects such as economy, research and development, cultural interaction, liveability, environment and accessibility [8]. Compared to other rankings, GPCI explicitly tries to capture the multidimensional nature of urban competitiveness.
Another well-known reference is the Global Liveability Index by the Economist Intelligence Unit, which focuses on quality of life by assessing stability, healthcare, culture, education and infrastructure [9]. This index is often used in practice to inform investment strategies and urban policy decisions.
At a more structural level, reports such as the United Nations’ Urban Agglomerations provide insights into urban growth, population distribution and metropolitan expansion [10]. These studies are particularly useful to understand long-term urbanisation dynamics and the challenges associated with rapidly growing cities.
From an academic perspective, the concept of global cities introduced by Saskia Sassen emphasizes the role of major urban areas as key nodes in the global economy, especially in sectors such as finance, trade and advanced services [11]. In a similar direction, the World Cities Report by UN-Habitat analyses urbanisation processes and their implications for sustainability and resilience, offering policy-oriented insights [12].
Overall, these contributions reflect the diversity of criteria used to classify cities and highlight the inherent complexity of urban systems. They also suggest that no single framework is sufficient on its own, and that combining different perspectives remains necessary to better understand how cities evolve within an increasingly interconnected world.

2.2. Time-Based Behaviour Classification

The classification of temporal data, especially when dealing with numerous elements, has been an area of significant research due to the increasing availability of time-series data across various fields. Temporal data classification involves analysing sequences of data points collected over time to identify patterns, detect anomalies, and make predictions.
One foundational approach in temporal data classification is the use of Hidden Markov Models (HMMs), which have been widely applied to sequential data in areas such as speech recognition, bioinformatics and finance [13]. HMMs are effective for recognising temporal patterns and modelling the probabilistic relationship between observed sequences and underlying hidden states.
Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks, have become prominent tools for temporal data classification. RNNs are designed to handle sequential data by maintaining a memory of previous inputs, making them suitable for tasks like language modeling, machine translation, and anomaly detection in time-series data [14]. LSTMs, in particular, address the vanishing gradient problem in standard RNNs, enabling the modeling of long-term dependencies.
Time-series classification has also been enhanced by the application of Convolutional Neural Networks (CNNs), which, although traditionally used in image processing, can learn local patterns and hierarchical features within sequences when applied to temporal data [15]. This approach has been particularly effective in activity recognition and fault detection.
Clustering techniques such as k-means, combined with similarity measures such as Dynamic Time Warping (DTW), have been employed to group temporal data. DTW allows for flexible matching of time-series data by aligning sequences that may vary in speed or timing [16]. These methods have applications in gesture recognition, medical diagnosis, and financial market analysis.
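To make the idea of flexible alignment concrete, the following is a minimal Python sketch of the classic dynamic-programming formulation of DTW, using the absolute difference as the local cost and no warping-window constraint. It is a textbook illustration, not the implementation used in [16], and the two weekly profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, step in a only, step in b only
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


# Two hypothetical weekly flight-count profiles with the same shape, shifted by one day
x = [0, 5, 5, 0, 0, 5, 5]
y = [0, 0, 5, 5, 0, 0, 5]
print(dtw_distance(x, y))                     # 0.0
print(sum(abs(p - q) for p, q in zip(x, y)))  # 15 (pointwise L1 distance, no alignment)
```

Because DTW may match one point with several points of the other series, the one-day shift costs nothing here, whereas a pointwise comparison penalises it heavily.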
In recent years, the use of graphs in temporal data classification has gained attention. Graph-based methods leverage the structural information of data points and their temporal relationships. For instance, Dynamic Graph Neural Networks (DGNNs) have been developed to model temporal changes in graph structures, providing a powerful tool for analysing dynamic networks [17]. DGNNs can capture the evolving relationships between nodes over time, making them suitable for applications in social network analysis, traffic prediction, and biological networks.
Another interesting line of work applies unsupervised time series clustering methods, such as k-visibility and k-Shape, and then aggregates the results into a multiplex graph to obtain richer, aggregated information for each node [18,19,20].
These studies highlight the diverse methodologies and applications of temporal data classification, particularly the use of graph-based methods. The continuous development of machine learning algorithms and computational techniques promises to further improve the accuracy and applicability of temporal data classification in various domains.

3. Methodology

This study aims to classify and rank nodes using their temporal behavior within a complex network. The use of multiplex networks to order time series attributes has already been studied [18,19,20]. This new methodology is based on the use of the Multiplex PageRank algorithm to cluster network nodes instead of using network metrics separately in a machine learning solution.
First, we employ time series-based clustering to group similar temporal patterns, using clustering algorithms such as k-visibility. By clustering the time series data, we can identify distinct temporal behaviors and patterns that recur over time.
Following the clustering, we construct a temporal behavior multiplex complex network. In this network, each layer corresponds to a different temporal behavior identified in the clustering step; nodes represent the entities of the original network, and each edge is placed in the layer whose temporal behavior matches the time series of that interaction. This multiplex network structure allows us to model the multi-dimensional nature of temporal interactions within the data, providing a more comprehensive representation of the underlying temporal behaviors.
Finally, we apply a version of the Multiplex PageRank algorithm to rank the nodes within this complex network [21,22,23]. The Multiplex PageRank algorithm extends the traditional PageRank by considering the multiple layers of the network and weighting the PageRank contributions of each layer based on the number of edges or other relevant metrics. This approach enables us to obtain a holistic measure of node importance, taking into account both the temporal behavior and the complex interactions across different layers of the network.

3.1. Time Series Based Time Relations Between Nodes

In this study, time series are constructed to represent the interactions between every pair of nodes in the network. For each origin–destination pair, a temporal sequence is generated based on the observed number of connections over time, resulting in a time series that captures the evolution of their relationship. In this way, the static structure of the graph is extended with a temporal dimension, allowing for each edge to be described not only by its existence but also by its dynamics. These time series therefore provide a detailed representation of how interactions between nodes change over time.
The definition of these time series is a critical step in the analysis, as it directly influences the results obtained in subsequent stages. In particular, the choice of the initial and final dates of the observation window, as well as the temporal aggregation frequency (e.g., daily, weekly), determines the shape and variability of the series. Different choices may emphasise or smooth temporal patterns, potentially leading to different clustering outcomes and network structures. Therefore, the temporal configuration of the data must be carefully considered, as it plays a key role in shaping the final results of the methodology.
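To make the effect of the aggregation frequency concrete, the sketch below sums a daily series into weekly totals. The helper name `aggregate` and the series values are hypothetical; the point is that a weekday/weekend alternation clearly visible in the daily data disappears entirely after weekly aggregation, which is exactly the kind of smoothing that can change downstream clustering.

```python
def aggregate(daily_counts, period):
    """Sum consecutive blocks of `period` days; a trailing incomplete block is dropped."""
    n_blocks = len(daily_counts) // period
    return [sum(daily_counts[k * period:(k + 1) * period]) for k in range(n_blocks)]


# Hypothetical route: 3 flights on weekdays and 1 on weekend days, over four weeks
week = [3, 3, 3, 3, 3, 1, 1]
daily = week * 4

print(aggregate(daily, 1)[:7])  # [3, 3, 3, 3, 3, 1, 1]  (daily view keeps the weekly cycle)
print(aggregate(daily, 7))      # [17, 17, 17, 17]       (weekly view is flat)
```

Dropping the incomplete trailing block is one possible convention; the choice of observation window interacts with it, since a window that is not a whole number of periods loses its last days.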

3.2. An Alternative Way to Cluster Time Series

Time series clustering has been approached through a wide variety of methods in the literature. Among the most commonly used families are feature-based approaches, in which each series is represented through a set of descriptive statistics or shape indicators; model-based methods, which rely on fitted stochastic or dynamical models; distance-based approaches, which compare series directly through similarity measures such as Euclidean distance or Dynamic Time Warping; and, more recently, deep-learning-based methods, which learn latent representations from raw temporal data. Each of these families has advantages and limitations in terms of interpretability, computational cost, robustness to noise, and ability to capture local or global temporal patterns.
A recent line of research proposes transforming time series into graph representations in order to capture their structural properties. In this context, visibility graphs (VG) provide a natural bridge between time series analysis and complex network theory [24,25]. A visibility graph maps a univariate time series into a connected and undirected network by establishing edges between pairs of data points that satisfy a geometric visibility criterion [26]. This transformation encodes the temporal structure of the series into a network topology, allowing for the use of graph-theoretic metrics to characterise temporal behaviour.
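As an illustration, the sketch below builds a visibility graph under the natural visibility criterion: two observations $(t_a, y_a)$ and $(t_b, y_b)$ are linked if every intermediate observation $(t_c, y_c)$ lies strictly below the straight line joining them, i.e. $y_c < y_b + (y_a - y_b)\,\frac{t_b - t_c}{t_b - t_a}$. We assume this is the criterion meant here; [26] may use a variant (for example, horizontal visibility). This naive version is cubic in the series length and is only meant for small examples.

```python
from itertools import combinations


def natural_visibility_graph(series):
    """Edge set of the natural visibility graph of a univariate series.

    Observations are placed at t = 0, 1, ..., n-1. Nodes a < b are linked when
    every intermediate point c lies strictly below the segment joining a and b.
    """
    edges = set()
    for a, b in combinations(range(len(series)), 2):
        ya, yb = series[a], series[b]
        visible = all(
            series[c] < yb + (ya - yb) * (b - c) / (b - a)
            for c in range(a + 1, b)
        )
        if visible:  # consecutive points are always visible, so the graph is connected
            edges.add((a, b))
    return edges


# Hypothetical daily flight counts for one city pair
counts = [2, 0, 3, 1, 4]
print(sorted(natural_visibility_graph(counts)))
# [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
```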
Building on this idea, the k-visibility approach has been recently proposed as a clustering method specifically designed for time series [27]. The method consists of three main steps. First, each time series is transformed into its corresponding visibility graph, thus obtaining a network representation of the temporal signal. Second, a set of structural features is extracted from each graph, such as degree-based measures, density, and other topological descriptors that capture both local and global properties of the series. These features provide a compact representation of the temporal dynamics in terms of network structure. Finally, the resulting feature vectors are grouped using the k-means algorithm, which partitions the data into a predefined number of clusters by minimising the within-cluster variance [28].
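A compact pure-Python sketch of the three steps follows. It is not the reference implementation of [27]: the feature set (graph density, maximum degree and degree spread, both normalised by $n-1$), the deterministic farthest-first initialisation and the fixed number of Lloyd iterations are simplifying assumptions chosen for illustration, and the route profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def visibility_degrees(series):
    """Degree of each node in the natural visibility graph (naive, cubic time)."""
    n = len(series)
    deg = [0] * n
    for a, b in combinations(range(n), 2):
        if all(series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
               for c in range(a + 1, b)):
            deg[a] += 1
            deg[b] += 1
    return deg


def vg_features(series):
    """Hypothetical feature vector: density, normalised max degree, normalised degree spread."""
    n = len(series)
    deg = visibility_degrees(series)
    density = sum(deg) / (n * (n - 1))  # sum(deg) = 2|E|, so this is 2|E| / (n(n-1))
    return [density, max(deg) / (n - 1), pstdev(deg) / (n - 1)]


def sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first initialisation (n_iter >= 1)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


# Hypothetical daily profiles over two weeks: two steady routes, two sparse bursty ones
series = {
    ("A", "B"): [3, 3, 3, 3, 3, 1, 1] * 2,
    ("A", "C"): [4, 4, 4, 4, 4, 2, 2] * 2,
    ("B", "C"): [0, 0, 0, 9, 0, 0, 0] * 2,
    ("C", "D"): [0, 0, 0, 0, 8, 0, 0] * 2,
}
features = [vg_features(s) for s in series.values()]
labels = kmeans(features, k=2)
print(dict(zip(series, labels)))
# {('A', 'B'): 0, ('A', 'C'): 0, ('B', 'C'): 1, ('C', 'D'): 1}
```

Note that the natural visibility graph is unchanged by a positive rescaling or a constant shift of the series, so the routes A–B and A–C, whose profiles differ only by a constant offset, obtain identical features: the grouping reflects shape rather than traffic volume.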
The key advantage of k-visibility lies in its ability to capture structural similarities between time series that may not be evident through direct temporal comparison. Since relevant patterns such as periodicity, trends, or abrupt changes are reflected in the topology of the visibility graph, the method provides a notion of similarity based on structural organisation rather than pointwise alignment. In addition, the graph-based representation allows the method to scale efficiently to long time series, as highlighted in the comparative analysis presented in [27], where k-visibility achieves clustering performance comparable to other widely used approaches while offering improved computational efficiency.
For these reasons, k-visibility is adopted in this study as the method to cluster time series according to their temporal behaviour. Its graph-based formulation is consistent with the network perspective of this work and enables a natural integration of the clustering results into the subsequent multiplex network construction.

3.3. Temporal Behavior Multiplex Complex Network

Complex network theory studies systems composed of multiple interacting components, typically referred to as nodes or vertices. The interactions between these nodes are represented by edges that link them [29,30]. When these interactions exhibit different behaviours according to a specific dimension of analysis, the network can be extended into a multilayer structure, known as a multiplex network, where each layer captures a distinct type of relationship.
In this work, a multiplex network is constructed in which each layer corresponds to one cluster obtained from the time series clustering process. The time series represent the temporal evolution of the interactions between pairs of nodes, i.e., the edges of the original graph. After clustering these time series, edges whose temporal behaviour belongs to the same cluster are assigned to the same layer. In this way, each layer of the multiplex network groups together edges with similar temporal dynamics. As a result, the final structure is a multiplex graph with as many layers as clusters identified in the time series clustering stage, where nodes are shared across layers and edges are distributed according to their temporal behaviour.
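The layer assignment can be sketched as a grouping of edges by cluster label. The routes and labels below are hypothetical stand-ins for the output of the clustering stage. The node set is the union of all cities and is shared by every layer, so a city without edges in some layer is still a node of that layer, simply isolated there.

```python
from collections import defaultdict


def build_multiplex(edge_labels):
    """Group directed edges (origin, destination) into layers keyed by cluster label.

    Returns the shared node set and a dict {layer label: list of edges}.
    """
    layers = defaultdict(list)
    nodes = set()
    for (origin, destination), label in edge_labels.items():
        layers[label].append((origin, destination))
        nodes.update((origin, destination))
    return nodes, dict(layers)


# Hypothetical cluster labels for four routes
edge_labels = {("A", "B"): 0, ("B", "A"): 0, ("A", "C"): 1, ("C", "D"): 1}
nodes, layers = build_multiplex(edge_labels)
print(sorted(nodes))  # ['A', 'B', 'C', 'D']
print(layers)         # {0: [('A', 'B'), ('B', 'A')], 1: [('A', 'C'), ('C', 'D')]}
```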

3.4. Multiplex PageRank Order

Once the multiplex graph has been constructed, the next step is to quantify the relative importance of each node by means of a multiplex PageRank approach. The objective of this procedure is to obtain a single relevance score for every node while preserving the information provided by the different layers of the multiplex structure.
PageRank was originally introduced as a centrality measure to evaluate the relevance of nodes in a graph according to both the quantity and the quality of their connections [21]. In contrast to local measures that only depend on direct neighbors, PageRank assigns higher importance to nodes that are connected to other relevant nodes. This recursive property makes it particularly suitable for identifying influential nodes in complex systems. In the context of multiplex networks, where interactions are distributed across several layers, this idea can be extended by computing a PageRank score in each layer and then combining these scores into a single global indicator [22,23].
Let $G = (V, E, L)$ be a multiplex graph, where $V$ is the set of nodes, $E$ is the set of edges, and $L$ is the set of layers. Each layer represents one specific type of relationship, or in our case, one cluster of temporal flight-connectivity patterns. For each layer $l \in L$, let $E^{(l)} \subseteq E$ denote the set of edges belonging to layer $l$.
For every layer $l \in L$, a transition matrix $M^{(l)}$ is defined, where each entry $M^{(l)}_{ij}$ represents the probability of moving from node $i$ to node $j$ within layer $l$. This matrix is obtained by normalising the adjacency structure of the corresponding layer so that each row defines a probability distribution over outgoing links. Based on this transition matrix, the PageRank vector $P^{(l)}$ associated with layer $l$ is computed as
$$P^{(l)} = \alpha \left(M^{(l)}\right)^{\top} P^{(l)} + (1 - \alpha)\, v,$$
where $\alpha \in (0, 1)$ is the damping factor, typically fixed at $0.85$, and $v$ is the teleportation vector, usually assumed to be uniform. The transpose appears because $M^{(l)}$ is row-stochastic, so the score of node $j$ collects the contributions of the nodes $i$ that link to it. The damping factor controls the balance between two processes: following the observed network connections and randomly jumping to any node in the graph. This mechanism ensures the existence and stability of the ranking even in the presence of disconnected components; nodes with no outgoing edges in a layer, whose rows cannot be normalised, are commonly handled by redistributing their probability mass uniformly over all nodes.
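A minimal power-iteration sketch of the per-layer computation is given below, assuming a directed, unweighted layer, uniform teleportation and $\alpha = 0.85$. Each node passes its current score in equal parts along its outgoing links, which corresponds to $M^{(l)}_{ij} = 1/\deg^{+}(i)$. Nodes with no outgoing edge in the layer spread their score uniformly; this is a common convention and an assumption here, not necessarily the authors' choice. The function name `layer_pagerank` and the toy layer are hypothetical.

```python
def layer_pagerank(nodes, edges, alpha=0.85, tol=1e-12, max_iter=1000):
    """PageRank of one unweighted, directed layer by power iteration.

    nodes: list of node ids shared by all layers; edges: list of distinct (i, j) pairs.
    Dangling nodes (no outgoing edge in this layer) spread their score uniformly.
    """
    n = len(nodes)
    out = {u: [] for u in nodes}
    for i, j in edges:
        out[i].append(j)
    p = {u: 1.0 / n for u in nodes}  # uniform start; the teleportation vector v is uniform too
    for _ in range(max_iter):
        dangling = sum(p[u] for u in nodes if not out[u])
        new = {u: (1.0 - alpha) / n + alpha * dangling / n for u in nodes}
        for u in nodes:
            for w in out[u]:
                new[w] += alpha * p[u] / len(out[u])  # transition probability 1 / outdeg(u)
        if sum(abs(new[u] - p[u]) for u in nodes) < tol:
            return new
        p = new
    return p


nodes = ["A", "B", "C"]
scores = layer_pagerank(nodes, [("A", "B"), ("B", "A"), ("C", "A")])
print({u: round(s, 3) for u, s in scores.items()})  # {'A': 0.486, 'B': 0.464, 'C': 0.05}
```

Node C has no incoming links, so it only receives the teleportation share $(1-\alpha)/3 = 0.05$, while A, which is reached from both B and C, obtains the highest score.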
The result of this step is a PageRank vector for each layer, reflecting node importance with respect to a specific temporal pattern of flight connectivity. However, since the aim of this study is to derive a global ranking for the whole multiplex structure rather than independent rankings for each layer, these layer-specific vectors must be aggregated.
To combine the different layers, a weight $w^{(l)}$ is assigned to each one according to its relative number of edges. The rationale behind this choice is that denser layers represent more observed interactions and, therefore, should contribute more strongly to the final score. Formally, the weight of layer $l$ is defined as
$$w^{(l)} = \frac{|E^{(l)}|}{\sum_{m \in L} |E^{(m)}|}.$$
By construction, these weights satisfy
$$\sum_{l \in L} w^{(l)} = 1.$$
Finally, the overall multiplex PageRank vector $P$ is obtained as the weighted sum of the PageRank vectors computed for each layer:
$$P = \sum_{l \in L} w^{(l)} P^{(l)}.$$
This final vector provides a global measure of node importance across the whole multiplex network. In this way, a node receives a high score not only because it is central in one particular layer, but also because it can be consistently relevant across several layers, especially those with greater structural weight. Therefore, the resulting ranking integrates two complementary dimensions: the centrality of a node within each temporal connectivity pattern and the relative relevance of each pattern within the multiplex structure.
In the present study, this multiplex PageRank score is interpreted as an indicator of the relative relevance of each city within the network of commercial flight interactions. Cities with higher scores occupy structurally important positions in the multiplex system, either because they are strongly connected within highly populated layers or because they maintain relevant positions across several distinct temporal connectivity patterns. This makes multiplex PageRank an appropriate tool to summarise, in a single ranking, the multilayer organisation of the air transport network derived from the clustering stage.
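Once the layer PageRank vectors are available, the aggregation step takes only a few lines. In the sketch below, the edge counts and per-layer scores are hypothetical placeholders (in practice they would come from the per-layer PageRank computation, over a node set shared by all layers). It computes the weights $w^{(l)} = |E^{(l)}| / \sum_{m} |E^{(m)}|$, forms the weighted sum, and sorts the nodes into a ranking.

```python
def multiplex_pagerank(layer_edge_counts, layer_scores):
    """Combine per-layer PageRank vectors using edge-proportional layer weights."""
    total_edges = sum(layer_edge_counts.values())
    weights = {l: m / total_edges for l, m in layer_edge_counts.items()}
    nodes = next(iter(layer_scores.values())).keys()  # all layers share the same nodes
    combined = {u: sum(weights[l] * layer_scores[l][u] for l in layer_scores) for u in nodes}
    return weights, combined


# Hypothetical inputs: two layers with 30 and 10 edges
edge_counts = {"layer_0": 30, "layer_1": 10}
scores = {
    "layer_0": {"A": 0.5, "B": 0.3, "C": 0.2},
    "layer_1": {"A": 0.05, "B": 0.25, "C": 0.7},
}
weights, P = multiplex_pagerank(edge_counts, scores)
ranking = sorted(P, key=P.get, reverse=True)
print(weights)  # {'layer_0': 0.75, 'layer_1': 0.25}
print(ranking)  # ['A', 'C', 'B']
```

Because each layer vector sums to one and the weights sum to one, the combined vector $P$ also sums to one. City C overtakes B because it dominates the smaller layer, while A stays first thanks to its position in the layer with the largest weight.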
The remainder of this article is organised as follows: Section 4 explains the steps followed to analyse the commercial flight data, Section 4.3 presents the results obtained with this machine learning-based methodology, Section 5 summarises the main insights drawn from the analysis and, finally, Section 6 suggests promising directions for future research.

4. Evaluation and Results

The objective of this article is to develop a quick but reliable classification of cities based on the number of commercial flight connections in which they are either origin or destination, and to compare the results obtained with existing classifications of how economically connected cities are. Table 1 summarises the key steps performed.

4.1. Data

The data set analysed in this article comes from Zenodo [31], an open science platform launched by CERN, the European Organisation for Nuclear Research. Zenodo is an implementation of the OpenAIRE project within the Open Data policy of the European Commission (EC). Specifically, this article uses commercial flight data provided to Zenodo by the OpenSky Network, a non-profit association dedicated to improving the security, reliability, and efficiency of airspace usage, which provides open access to real-world air traffic control data. The entire data set, available at https://zenodo.org/ (accessed on 1 January 2026), consists of flights from January 2019 to December 2022. This data repository was initially compiled to analyse the impact of the COVID-19 pandemic on commercial aviation worldwide. The data set includes all commercial flights taking place between major cities across the globe. In this research, the cities under study correspond to the list of cities classified by GaWC in 2020 [5]. To increase the temporal coverage of the analysis, this study uses flight data from December 2021 to December 2022.
The methodology followed by the GaWC network is based on corporate data describing the level of integration and influence of a city in an interconnected world. The GaWC classification indicates the importance of cities as nodes in the world city network, i.e., their degree of integration into economic globalisation, including seasonal industries such as tourism.

4.2. Evaluation Process

4.2.1. First Step: Time Series Creation

The time series used in this study are created from commercial flight connection data published through open repositories such as Zenodo [31]. This dataset includes detailed information on commercial flights between various cities around the world. The analysis focuses on extracting and structuring the daily temporal patterns of flights between pairs of cities connected by air traffic.
This analysis uses information on 213 major cities listed in the GaWC classification [5].
To begin, the flight data are preprocessed to ensure accuracy and consistency, which involves cleaning the data to handle missing values. For each pair of cities connected via commercial flights, we create a temporal profile of daily connections that captures the number of flights occurring each day between the two cities, providing a comprehensive view of their flight activity over time. On days with no flights, the value of the time series is 0.
All flights between each specific pair of cities are aggregated using the following format:
  • Source city;
  • Destination city;
  • Date: Day of the week;
  • Number of flights per day.
By structuring the data in this format, we can create a time series for each origin–destination pair. This time series data reflects the daily number of flights between the cities, allowing us to analyse the temporal dynamics of their flight connections.
This pre-processing step ensures that the time series data are structured and ready for further analysis. By capturing the temporal patterns of flight activity, we can gain insights into how air traffic flows between different cities and how these patterns vary over time, as illustrated in Figure 1 and Figure 2.
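The aggregation and zero-filling described above can be sketched as follows. The record layout (origin, destination, departure date) and the helper name `daily_series` are assumptions for illustration, not the schema of the Zenodo/OpenSky files. The observation window is passed explicitly so that every series has the same length, with a 0 on each day without flights.

```python
from collections import Counter
from datetime import date, timedelta


def daily_series(flights, start, end):
    """Build {(origin, destination): [flights per day]} over the closed window [start, end].

    Records dated outside the window are ignored.
    """
    counts = Counter((o, d, day) for o, d, day in flights)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=k) for k in range(n_days)]
    pairs = {(o, d) for o, d, _ in flights}
    return {
        (o, d): [counts[(o, d, day)] for day in days]  # Counter yields 0 for days without flights
        for (o, d) in sorted(pairs)
    }


# Hypothetical flight records: (origin city, destination city, departure date)
flights = [
    ("A", "B", date(2022, 1, 1)),
    ("A", "B", date(2022, 1, 1)),
    ("A", "B", date(2022, 1, 3)),
    ("B", "A", date(2022, 1, 2)),
]
series = daily_series(flights, date(2022, 1, 1), date(2022, 1, 3))
print(series)  # {('A', 'B'): [2, 0, 1], ('B', 'A'): [0, 1, 0]}
```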
The resulting time series data forms the basis for further analysis in subsequent steps. This data will be used to construct a temporal behavior multiplex complex network, which will model the multi-dimensional and interdependent nature of temporal interactions within the dataset. Ultimately, this network will enable us to apply the Multiplex PageRank algorithm for node ranking, providing a comprehensive measure of the importance of each city within the global flight network.

4.2.2. Second Step: k-Visibility-Based Time Series Clustering

The second step consists of using the k-visibility algorithm to cluster the obtained time series according to their temporal flight profiles. The k-visibility algorithm is particularly suitable for this task because it groups time series according to the structural properties of their visibility graphs, which reflect shape features such as periodicity, trends and abrupt changes that are crucial for identifying patterns in temporal behaviour. By applying it, we group the time series into clusters, each representing a distinct temporal flight profile, that is, a similar relationship in terms of commercial flights between two cities. This step allows us to capture the inherent temporal dynamics and identify patterns of flight activity between city pairs. As Figure 3, Figure 4 and Figure 5 show, different city pairs exhibit different temporal patterns.
The k-visibility algorithm requires the number of clusters k to be defined in advance. In order to select an appropriate value, we evaluated different options using two internal validation metrics: the silhouette score and the Calinski–Harabasz index.
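For reference, the two internal validation metrics can be written in a few lines of pure Python for small data sets; library implementations such as scikit-learn's `silhouette_score` and `calinski_harabasz_score` would normally be used instead. The silhouette of a point is $(b-a)/\max(a,b)$, where $a$ is its mean distance to the other members of its cluster and $b$ its mean distance to the nearest other cluster; the Calinski–Harabasz index is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The toy data are hypothetical; in the study, these scores would be computed on the k-visibility feature vectors for each candidate $k$.

```python
from math import dist
from statistics import mean


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); assumes at least two clusters.

    Points in singleton clusters score 0, following the usual convention.
    """
    clusters = set(labels)
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lj) in enumerate(zip(points, labels)) if lj == li and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)
        b = min(
            mean(dist(p, q) for q, lj in zip(points, labels) if lj == c)
            for c in clusters if c != li
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def calinski_harabasz(points, labels):
    """Between- over within-cluster dispersion, scaled by (n - k) / (k - 1); needs 1 < k < n."""
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    overall = [mean(col) for col in zip(*points)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [mean(col) for col in zip(*members)]
        between += len(members) * dist(centroid, overall) ** 2
        within += sum(dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical 1-D feature values split into two well-separated clusters
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(round(silhouette(points, labels), 4))  # 0.8997
print(calinski_harabasz(points, labels))     # 200.0
```

Both metrics reward compact, well-separated clusters, but the Calinski–Harabasz index is unbounded while the silhouette lies in $[-1, 1]$, so the two curves are best compared by their shape across $k$ rather than by their absolute values.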
Figure 6 and Figure 7 show the evolution of both metrics for values of k between 2 and 10. As can be observed, the silhouette score reaches its highest value at low k, but then it progressively decreases as the number of clusters increases. However, from k = 4 to k = 6, the decrease becomes smoother, suggesting that the main structure of the data is already captured in that range. After k = 6, the drop becomes more pronounced, indicating that adding more clusters leads to less consistent groupings.
A similar behavior can be observed for the Calinski–Harabasz index. Although the maximum is obtained for very small values of k, there is a local improvement around k = 6, after which the metric decreases again. This suggests that k = 6 still preserves a reasonable separation between clusters while avoiding an excessive fragmentation of the data.
Taking both metrics into account, k = 6 was selected as a compromise. Although it is not the global optimum of either metric, it offers a good balance between cluster quality and interpretability. In addition, six clusters yield a multiplex network with a manageable number of layers, which is important for the subsequent analysis.
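The selection procedure behind Figures 6 and 7 can be sketched as a loop over k that clusters the data and records both metrics. This is a self-contained illustration, not the authors' code: it assumes each time series has already been reduced to a numeric feature vector, uses a plain Lloyd-style k-means with deterministic farthest-point initialisation as a stand-in clustering stage, and implements the two metrics from their standard definitions. The toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [list(X[0])]
    while len(centers) < k:
        far = max(X, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in X]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[j] = centroid(members)
    return labels


def silhouette(X: List[Vector], labels: List[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b); singleton clusters contribute 0."""
    total = 0.0
    for i, p in enumerate(X):
        same = [math.dist(p, q) for j, q in enumerate(X) if labels[j] == labels[i] and j != i]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(X, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(X)


def calinski_harabasz(X: List[Vector], labels: List[int]) -> float:
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k))."""
    n, clusters = len(X), sorted(set(labels))
    k = len(clusters)
    c = centroid(X)
    between = within = 0.0
    for lab in clusters:
        members = [p for p, other in zip(X, labels) if other == lab]
        cj = centroid(members)
        between += len(members) * math.dist(cj, c) ** 2
        within += sum(math.dist(p, cj) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


def sweep(X: List[Vector], ks: range) -> Dict[int, Tuple[float, float]]:
    """Silhouette and Calinski-Harabasz scores for each candidate k."""
    results = {}
    for k in ks:
        labels = kmeans(X, k)
        results[k] = (silhouette(X, labels), calinski_harabasz(X, labels))
    return results


# Hypothetical feature vectors: three well-separated groups of four points.
X = [(x + dx, y + dy) for x, y in [(0, 0), (10, 0), (0, 10)]
     for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]]
for k, (sil, ch) in sweep(X, range(2, 6)).items():
    print(f"k={k}: silhouette={sil:.3f}, Calinski-Harabasz={ch:.1f}")
```

On such clean toy data both metrics peak at the same k; on real flight profiles, as discussed above, the curves need not agree, which is when a compromise value becomes necessary.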

4.2.3. Third Step: Building a Single Network per Cluster

The third step builds one network layer for each of the clusters identified in the second step. In the layer associated with a given cluster, the nodes are cities and an edge joins two cities whenever the time series of commercial flights between them was assigned to that cluster by the k-visibility algorithm. In this way, the flight profiles between cities give rise to a temporal behavior multiplex network.
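A minimal sketch of this step, assuming the cluster label of each city pair is already available as a hypothetical dictionary mapping an unordered pair of cities to a cluster index: each cluster becomes one undirected layer, stored as an adjacency dictionary.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]
Layer = Dict[str, Set[str]]


def build_layers(labels: Dict[Pair, int]) -> Dict[int, Layer]:
    """One undirected layer per cluster: an edge joins two cities whenever
    the flight series between them was assigned to that cluster."""
    layers: Dict[int, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for (a, b), cluster in labels.items():
        layers[cluster][a].add(b)
        layers[cluster][b].add(a)
    # Convert the nested defaultdicts into plain dictionaries.
    return {c: {city: set(nbrs) for city, nbrs in layer.items()} for c, layer in layers.items()}


# Hypothetical cluster labels for three city pairs.
labels = {
    ("London", "Madrid"): 0,
    ("London", "New York"): 1,
    ("Madrid", "New York"): 1,
}
layers = build_layers(labels)
print(layers[0])  # {'London': {'Madrid'}, 'Madrid': {'London'}}
```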

4.2.4. Fourth Step: Creating a Multiplex Complex Network

The fourth step overlays all these cluster-based network layers into a multiplex network, as Figure 8 displays. The nodes shared by more than one layer are the cities that take part in more than one type of cluster, that is, in more than one temporal pattern of flights between pairs of cities. Consequently, a city connected by flights to several other cities appears in as many layers as there are distinct time series profiles among its routes, while all routes whose flight profiles fall in the same k-visibility cluster are represented within the same layer.
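The overlay can be illustrated by recording, for each city, the layers in which it has at least one connection. The sketch assumes layers are stored as a hypothetical mapping from cluster index to a list of city-pair edges; it shows that a city with three routes but only two distinct temporal profiles belongs to two layers.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def node_layers(layers: Dict[int, List[Edge]]) -> Dict[str, Set[int]]:
    """Map each city to the set of layers (temporal profiles) it belongs to."""
    membership: Dict[str, Set[int]] = {}
    for layer_id, edges in layers.items():
        for a, b in edges:
            membership.setdefault(a, set()).add(layer_id)
            membership.setdefault(b, set()).add(layer_id)
    return membership


# Hypothetical layers: London has three routes but only two temporal profiles.
layers = {
    0: [("London", "Madrid"), ("London", "Berlin")],
    1: [("London", "New York")],
}
membership = node_layers(layers)
print(len(membership["London"]), len(membership["Madrid"]))  # 2 1
```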

4.2.5. Fifth Step: A New Clustering Based on Complex Network Parameters

Once the multiplex network has been created, a new ordering of the cities is obtained with Multiplex PageRank, which takes into account the connectivity of each of the resulting layers, as captured by parameters such as their degree distributions.
The ranking produced by this new clustering process is compared against the GaWC 2020 classification of cities [5].
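The ranking step can be sketched as follows. The exact Multiplex PageRank variant is not restated here, so, as an assumption, the sketch implements the multiplicative variant of Halu et al. [22]: in layer B, a random walker moves towards a neighbour with probability proportional to that neighbour's centrality x in layer A, while teleportation stays uniform. Chaining this two-layer update across all layers in a fixed order, and renormalising the score vector at each iteration, are further simplifying assumptions. Layers are given as symmetric adjacency matrices over the same ordered list of cities.

```python
from typing import List, Optional

Matrix = List[List[float]]


def biased_pagerank(adj: Matrix, bias: Optional[List[float]] = None,
                    alpha: float = 0.85, tol: float = 1e-12,
                    max_iter: int = 1000) -> List[float]:
    """Multiplicative update on a symmetric adjacency matrix A:
        X_i = alpha * sum_j x_i A_ij X_j / G_j + (1 - alpha) / N,
    with G_j = sum_r A_rj x_r (or 1 when that sum is 0), where x = bias.
    Without `bias` (x = 1) this is ordinary PageRank, up to the handling
    of isolated nodes."""
    n = len(adj)
    x = bias if bias is not None else [1.0] * n
    g = [sum(adj[r][j] * x[r] for r in range(n)) or 1.0 for j in range(n)]
    score = [1.0 / n] * n
    for _ in range(max_iter):
        new = [alpha * x[i] * sum(adj[i][j] * score[j] / g[j] for j in range(n))
               + (1 - alpha) / n for i in range(n)]
        total = sum(new)
        new = [v / total for v in new]  # renormalise (absorbs mass of isolated nodes)
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new
    return score


def multiplex_pagerank(layers: List[Matrix], alpha: float = 0.85) -> List[float]:
    """Chain the two-layer update: each layer is biased by the previous result."""
    score = biased_pagerank(layers[0], alpha=alpha)
    for adj in layers[1:]:
        score = biased_pagerank(adj, bias=score, alpha=alpha)
    return score


layer_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical layer: city 1 is the hub
layer_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # hypothetical layer: city 0 is the hub
scores = multiplex_pagerank([layer_a, layer_b])
print([round(s, 3) for s in scores])
```

In layer B alone, cities 1 and 2 are interchangeable; the multiplex score ranks city 1 above city 2 because it is the hub of layer A, which is the kind of cross-layer information a single-layer ranking cannot capture.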

4.2.6. Sixth Step: Assessing the Gap Between Clustering and GaWC 2020

In this sixth step, the gap between the classification of cities obtained with the proposed clustering and the GaWC 2020 classification [5] is analysed. Each analysed city is tagged with the number of the cluster in which it is included and with its GaWC category.
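The gap is examined graphically in Figure 9. As a complementary and purely illustrative check, one could encode the GaWC categories on an ordinal scale and compute a Spearman rank correlation with the ranking scores. The ordinal encoding below follows the GaWC tier ladder (using the category labels of Table A1), and the four input cities are hypothetical.

```python
from typing import Dict, List

# GaWC tiers from lowest to highest; a larger level means a higher tier.
GAWC_ORDER = ["Sufficiency", "High Sufficiency", "Gamma -", "Gamma", "Gamma +",
              "Beta -", "Beta", "Beta +", "Alpha -", "Alpha", "Alpha +", "Alpha ++"]
LEVEL: Dict[str, int] = {name: i for i, name in enumerate(GAWC_ORDER)}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical scores and GaWC categories for four cities.
scores = [0.08, 0.06, 0.04, 0.02]
categories = ["Alpha ++", "Alpha", "Beta", "Alpha -"]
rho = spearman(scores, [LEVEL[c] for c in categories])
print(rho)  # 0.8
```

Average ranks matter here because many cities share a GaWC category, so the ordinal variable is heavily tied.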

4.3. Results

This article uses commercial flight data published on Zenodo by the OpenSky Network [31], a non-profit association dedicated to improving the security, reliability and efficiency of airspace usage, which provides open access to real-world air traffic control data. The full dataset available on Zenodo lists flights from January 2019 to October 2022. The cities studied are those classified by GaWC in 2020 [5], and the analysis uses flight data from December 2021 to December 2022.
Applying the k-visibility algorithm to the time series built from commercial flights between GaWC-classified cities yields a set of time series clusters. A network layer is then built for each cluster, creating the multiplex network depicted in Figure 8.
Finally, the ordering obtained with Multiplex PageRank is compared with the segmentation proposed by GaWC. As Figure 9 shows, the proposed order is broadly related to the distribution of GaWC categories; the full ranking is listed in Appendix A (Table A1).

5. Conclusions

This paper proposes a data-driven approach for classifying cities based on a single, widely available variable: the number of commercial flights between them over time. By combining time series clustering with multiplex network construction and the application of a Multiplex PageRank algorithm, the methodology provides a systematic way to extract structural and temporal information from air traffic data and translate it into a global ranking of cities.
The results suggest that commercial flight activity can serve as a meaningful proxy for economic interconnectedness between cities. The temporal dynamics of flight connections capture not only the intensity of interactions but also their evolution over time, reflecting patterns that may be associated with business activity, tourism flows, and broader economic relationships. In this sense, the proposed framework offers a simple yet informative representation of urban connectivity using open and continuously available data.
A key finding of this study is that the ranking derived from this approach shows a notable level of agreement with established classifications such as GaWC, which are based on a wide range of socioeconomic indicators. This result indicates that air traffic flows, despite their simplicity, contain significant information about the position of cities within the global economic network. Therefore, they can be considered as an efficient proxy variable that approximates more complex and resource-intensive classification systems.
Beyond the specific results, the main contribution of this work lies in the integration of time series clustering and multiplex network analysis into a reproducible framework that can be applied to large-scale datasets. This approach enables the incorporation of temporal behaviour into network-based city classification, providing additional insight compared to static analyses.
Finally, some limitations should be acknowledged. The results depend on the temporal window and aggregation choices used to construct the time series, and the analysis is restricted to a specific period of post-pandemic recovery. Future work should extend the temporal scope of the data, explore the robustness of the results under different methodological choices, and incorporate additional sources of information to further validate and enrich the proposed framework.

6. Future Work

Future work should focus on several directions to further validate and extend the proposed framework. First, it is necessary to analyse the temporal stability of the identified clustering patterns by extending the time horizon of the study. Incorporating data from additional years would allow for assessments of whether the observed structures persist over time or are influenced by short-term effects, such as seasonal variations or post-pandemic recovery dynamics.
In addition, future research should include robustness and sensitivity analyses to evaluate how the results are affected by key methodological choices, such as the temporal aggregation level, the clustering parameters, and the construction of the multiplex network. Comparing the proposed approach with alternative time series clustering methods and network-based ranking techniques would also help us to better understand its relative performance and limitations.
Another promising direction is the integration of additional data sources to complement air traffic information. Incorporating variables related to trade, financial flows, or digital connectivity could provide a more comprehensive representation of economic relationships between cities and improve the interpretability of the results. Finally, the potential application of this framework in real-world scenarios, such as urban planning or economic policy design, should be further explored to assess its practical relevance and usability.

Author Contributions

Conceptualization, S.I.-P., J.M. and R.C.; Methodology, J.M.; Validation, S.I.-P.; Formal analysis, S.I.-P., A.P., J.M. and R.C.; Investigation, A.P.; Data curation, A.P.; Writing—original draft, S.I.-P. and A.P.; Writing—review & editing, A.P.; Supervision, R.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partially supported by project M3680 (URJC Grant) and URJC Technology Research Center for Data, Complex Networks and Cybersecurity Sciences.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Relationship Between City Classification and Cluster

Table A1 complements the graphical information provided by Figure 9. For every analysed city, it lists its value in the proposed ranking (Order) and its GaWC classification.
Table A1. Order and classification.
City | Order | Classification
London | 0.08286846032237187 | Alpha ++
Geneva | 0.07443346944376086 | Beta -
Amsterdam | 0.0640311485472075 | Alpha
Rome | 0.06272622117803714 | Beta +
Zurich | 0.059457103478881314 | Alpha -
New York | 0.057458527589325285 | Alpha ++
Dublin | 0.05724235397558691 | Alpha -
Dubai | 0.05658105701611917 | Alpha +
Vienna | 0.056562771004442326 | Alpha -
Madrid | 0.05508190351389255 | Alpha
Chicago | 0.054343526782845354 | Alpha
San Francisco | 0.05359542133773153 | Alpha -
Los Angeles | 0.053359361054424494 | Alpha
Barcelona | 0.053218120610391204 | Beta +
Toronto | 0.053154342523634816 | Alpha
Munich | 0.05295166029620492 | Alpha -
Atlanta | 0.052337183159727164 | Beta +
Houston | 0.05229750559164148 | Beta +
Warsaw | 0.05202740481081898 | Alpha -
Berlin | 0.05108195786499423 | Beta +
Vancouver | 0.05064916548130127 | Beta +
Miami | 0.04984980710746199 | Beta +
Moscow | 0.04950083570987314 | Alpha
Melbourne | 0.0493447599593919 | Alpha -
Hamilton | 0.04933122604408509 | Sufficiency
Doha | 0.04892998159033505 | Beta +
Seoul | 0.04889266781144161 | Alpha -
Bangkok | 0.04769197872509893 | Alpha -
Seattle | 0.046409723586760675 | Beta
Boston | 0.046161418553560304 | Alpha -
Prague | 0.045493063047348195 | Alpha -
Stockholm | 0.045111252642368135 | Alpha -
Hong Kong | 0.04432908152293485 | Alpha +
Las Vegas | 0.04388323321294666 | Sufficiency
Dallas | 0.043475950319007264 | Beta +
San Jose | 0.043362969954320034 | Gamma +
Richmond | 0.04289839315069728 | Sufficiency
Nashville | 0.04280799468756276 | Gamma
Orlando | 0.04261330152303014 | Gamma +
Cleveland | 0.0419601541029099 | Gamma
Philadelphia | 0.04178297880557582 | Beta
Tel Aviv | 0.04143022443422701 | Beta +
Copenhagen | 0.04130741235484866 | Beta +
Lisbon | 0.04128806060803853 | Alpha -
Jacksonville | 0.04124202992516507 | Sufficiency
Luxembourg | 0.040893552180865385 | Alpha -
Denver | 0.040669085064177275 | Beta
San Juan | 0.040618586713831076 | Gamma
Bristol | 0.04049824177416359 | Gamma
Hamburg | 0.03993221755549518 | Beta +
Rotterdam | 0.039855674652522925 | Gamma +
Charlotte | 0.039494232277918054 | Gamma +
Minneapolis | 0.03934253076644363 | Beta -
Abu Dhabi | 0.038866302047305984 | Beta
Harrisburg | 0.03885923636316671 | Sufficiency
Nottingham | 0.038794498827355015 | Sufficiency
Tampa | 0.03857338126913577 | Beta -
Budapest | 0.03853051797677236 | Beta +
Louisville | 0.03833570221927531 | Sufficiency
Detroit | 0.03815124510486706 | Beta -
Indianapolis | 0.037855814821912664 | High Sufficiency
St Louis | 0.037672017145690975 | Gamma +
Pittsburgh | 0.03766462678032102 | Sufficiency
Cincinnati | 0.0371035889011297 | High Sufficiency
Calgary | 0.03692338364300565 | Beta -
Hartford | 0.03690840354645314 | High Sufficiency
Milwaukee | 0.0367741464244136 | Gamma
San Diego | 0.03669574478603722 | Beta -
Austin | 0.03650480182235696 | Beta -
San Antonio | 0.03650290551743829 | High Sufficiency
Tokyo | 0.03650198576955331 | Alpha +
Omaha | 0.03636969435822775 | Sufficiency
Baltimore | 0.03632311011153943 | Gamma +
Rochester | 0.036279263200245014 | Sufficiency
Hannover | 0.03620889650644066 | Sufficiency
Ottawa | 0.03619416173659745 | Gamma
Singapore | 0.0360913840676057 | Alpha +
Edinburgh | 0.0359190002916075 | Beta -
Winnipeg | 0.03472888521343283 | Sufficiency
Belgrade | 0.03415683225112791 | Beta -
Manchester | 0.0339024319033121 | Beta -
Buffalo | 0.03389670596796734 | Sufficiency
Halifax | 0.03378558135045927 | Sufficiency
Phoenix | 0.03372698570892201 | Gamma +
Kansas City | 0.0334291640020074 | Gamma
Salt Lake City | 0.033134995477773606 | Gamma
Katowice | 0.032517345647030724 | Gamma
New Orleans | 0.032439988489116854 | Sufficiency
Oklahoma City | 0.032127289386656835 | Sufficiency
Riyadh | 0.031800307136017544 | Alpha -
Bremen | 0.03162525811361897 | Sufficiency
Sacramento | 0.03142183614189984 | Gamma
Porto | 0.03137757732315148 | Gamma +
Bratislava | 0.031375557905014236 | Beta -
Antwerp | 0.030967543516767802 | Gamma +
Bern | 0.030607249723362973 | Sufficiency
Bologna | 0.030144829536453508 | Sufficiency
Sofia | 0.029709255054424982 | Beta -
Dortmund | 0.029644632204705496 | Sufficiency
Riga | 0.029252345598577277 | Gamma +
Johannesburg | 0.02915801475157052 | Alpha -
Athens | 0.02864262217064819 | Beta
Liverpool | 0.0284676808010074 | Sufficiency
Beirut | 0.028250814699272417 | Beta +
Naples | 0.028064376500578594 | Sufficiency
Manama | 0.02756003332447534 | Beta
Linz | 0.027021655818063886 | Sufficiency
Perth | 0.026990556186950496 | Beta
Bogota | 0.02689251004692141 | Beta +
Casablanca | 0.02683791724053132 | Beta
Aberdeen | 0.02646106854636992 | Sufficiency
Auckland | 0.025708871229435704 | Beta +
Guatemala City | 0.025365426512437637 | Beta -
Saskatoon | 0.02521562027344878 | Sufficiency
Ankara | 0.02521001639787949 | Gamma
Bergen | 0.02495574739484696 | Sufficiency
Tallinn | 0.024845092118531587 | Sufficiency
St Petersburg | 0.02411279050663235 | Beta -
Hobart | 0.024093889922556483 | Sufficiency
Southampton | 0.023903446766727406 | Sufficiency
Belfast | 0.023831713358544015 | Gamma +
Vilnius | 0.023782909056041357 | Gamma
Rio De Janeiro | 0.02361985772028295 | Beta
Edmonton | 0.02303696872514104 | Gamma
Tbilisi | 0.023016213237420584 | Gamma +
Palermo | 0.02294885407228306 | Sufficiency
Bucharest | 0.02282137352925363 | Beta +
Dresden | 0.02267443435137656 | Sufficiency
Strasbourg | 0.022528303323522087 | High Sufficiency
Cape Town | 0.022200545081857863 | Beta
Dakar | 0.022168513236686036 | Gamma
Bangalore | 0.021789984148050712 | Alpha -
Mannheim | 0.021518364544600113 | Sufficiency
Brisbane | 0.021391516643504263 | Beta +
Algiers | 0.0202442916881039 | Gamma +
Santo Domingo | 0.020155261529681657 | Gamma
Nantes | 0.020023951536892046 | Gamma
Tijuana | 0.01991741061666742 | High Sufficiency
Campinas | 0.019902625500367303 | Sufficiency
Jakarta | 0.01969045109948221 | Alpha
Milan | 0.0196762707788752 | Alpha
Chennai | 0.019533095656494665 | Beta
Zagreb | 0.019363183256297923 | Beta -
Quito | 0.017758067275733466 | Beta -
Paris | 0.01736805994147254 | Alpha +
Penang | 0.017340358273442248 | Gamma
Lima | 0.01727872675749759 | Beta +
Hsinchu City | 0.01646824546568633 | Sufficiency
Bandar Seri Begawan | 0.01631468654446834 | Sufficiency
Mumbai | 0.015521624504952853 | Alpha
Amman | 0.01547147557941693 | Beta -
Kolkata | 0.015400317347536404 | Gamma +
Puebla | 0.01535922032786452 | High Sufficiency
Sapporo | 0.015261245673173506 | Sufficiency
Buenos Aires | 0.015232115675365367 | Alpha -
Almaty | 0.014984884452562667 | Beta -
Florence | 0.014694926689170334 | Sufficiency
Palo Alto | 0.014560678452952922 | Sufficiency
Beijing | 0.014536044861634486 | Alpha +
Johor Bahru | 0.014331293431319371 | High Sufficiency
Minsk | 0.01429453044117685 | Sufficiency
San Salvador | 0.014030190532607671 | Beta -
Lausanne | 0.013995868834706021 | Gamma
Sheffield | 0.01391287973601087 | Sufficiency
Adelaide | 0.013755700420283453 | Gamma +
Essen | 0.013325056137340283 | Sufficiency
Wellington | 0.01324033099673921 | Gamma
Des Moines | 0.013221083580422479 | Sufficiency
Canberra | 0.01300342314622525 | Sufficiency
Sydney | 0.01262797263037952 | Alpha
Seville | 0.012527808170143715 | High Sufficiency
Baku | 0.012520871259587734 | Gamma +
Kazan | 0.012451313614213397 | Sufficiency
Christchurch | 0.012238600121566454 | Sufficiency
Alexandria | 0.012017270007625987 | Sufficiency
Chongqing | 0.011431250006635103 | Beta
Nagoya | 0.010896511423048318 | Gamma
Fukuoka | 0.010878705655987482 | Sufficiency
Glasgow | 0.010635469102946087 | Gamma +
Bilbao | 0.010509069264136698 | Gamma
Delhi | 0.010355074776758306 | Alpha -
Quebec | 0.010107302384586828 | Sufficiency
Mexico City | 0.010026663986918817 | Alpha
Leeds | 0.010006573448390682 | High Sufficiency
Cairo | 0.009940217118253297 | Beta +
Kobe | 0.009000636715833071 | Sufficiency
Colombo | 0.008963064190464786 | Gamma
Porto Alegre | 0.008867443755905311 | High Sufficiency
Brussels | 0.008424884727944483 | Alpha
Santa Cruz | 0.008122907322718389 | Sufficiency
Memphis | 0.008043895580189328 | Sufficiency
Tianjin | 0.007892524154485051 | Beta
Santiago | 0.0076805595773575715 | Alpha -
Genoa | 0.007659533292661303 | Sufficiency
Lahore | 0.007578194417243795 | Beta -
Labuan | 0.007167180100622703 | Sufficiency
Pretoria | 0.006984753682292396 | Sufficiency
Haifa | 0.0069319941242844026 | Sufficiency
Oslo | 0.006257340317531866 | Beta
Sarajevo | 0.0062469842743976 | Sufficiency
Recife | 0.005867610566214577 | Sufficiency
Kuala Lumpur | 0.005837948534315372 | Alpha
Raleigh | 0.0055803061322488206 | High Sufficiency
Curitiba | 0.005473568133516429 | High Sufficiency
Panama City | 0.005396258878329846 | Beta
Bursa | 0.005193983806947503 | Sufficiency
Gothenburg | 0.005088458778952795 | Gamma
Guayaquil | 0.005060757746209759 | Gamma
Montevideo | 0.005015562400077605 | Beta

References

1. United Nations. The Sustainable Development Goals Report 2022. 2022. Available online: https://unstats.un.org/sdgs/report/2022/The-Sustainable-Development-Goals-Report-2022.pdf (accessed on 27 April 2026).
2. World Economic Forum. Accelerating Urban Inclusion for a Just Recovery. Insight Report. 2022. Available online: https://www3.weforum.org/docs/WEF_C4IR_GFC_on_Cities_Inclusion_2022.pdf (accessed on 27 April 2026).
3. JLL. The State of Global Cities. 2019. Available online: https://www.us.jll.com/content/dam/jll-com/documents/pdf/research/jll-demand-and-disruption-in-global-cities-2019-v1.pdf (accessed on 27 April 2026).
4. The Mori Memorial Foundation. What Is the GPCI? Global Power City Index 2022. 2022. Available online: https://mori-m-foundation.or.jp/english/ius2/gpci2/index.shtml (accessed on 27 April 2026).
5. GaWC. Globalization and World Cities Research Network. 2020. Available online: https://gawc.lboro.ac.uk/gawc-worlds/gawc-data/ (accessed on 27 April 2026).
6. Taylor, P.J. Specification of the World City Network. Geogr. Anal. 2001, 33, 181–194.
7. Global Cities Index 2019; A.T. Kearney: Chicago, IL, USA, 2019.
8. Global Power City Index 2019; Mori Memorial Foundation Institute for Urban Strategies: Tokyo, Japan, 2019.
9. Global Liveability Index 2019; Economist Intelligence Unit: London, UK, 2019.
10. World Urbanization Prospects: The 2018 Revision; United Nations: New York, NY, USA, 2018.
11. Sassen, S. The Global City: New York, London, Tokyo; Princeton University Press: Princeton, NJ, USA, 2001.
12. World Cities Report 2020: The Value of Sustainable Urbanization; UN-Habitat: Nairobi, Kenya, 2020.
13. Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286.
14. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
15. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585.
16. Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the KDD Workshop, Seattle, WA, USA, 31 July–1 August 1994; Volume 10, pp. 359–370.
17. Skarding, J.; Gabrys, B.; Musial, K. Foundations and modeling of dynamic networks using dynamic graph neural networks: A survey. IEEE Access 2021, 9, 79143–79168.
18. Iglesias-Perez, S.; Criado, R. Temporal metagraph: A new mathematical approach to capture temporal dependencies and interactions between different entities over time. Chaos Solitons Fractals 2023, 175, 113940.
19. Iglesias-Perez, S.; Moral-Rubio, S.; Criado, R. A new approach to combine multiplex networks and time series attributes: Building intrusion detection systems (IDS) in cybersecurity. Chaos Solitons Fractals 2021, 150, 111143.
20. Iglesias-Perez, S.; Moral-Rubio, S.; Criado, R. Combining multiplex networks and time series: A new way to optimize real estate forecasting in New York using cab rides. Phys. A Stat. Mech. Its Appl. 2023, 609, 128306.
21. Page, L. The Pagerank Citation Ranking: Bringing Order to the Web; Technical report; Stanford Digital Library Technologies Project: Stanford, CA, USA, 1998.
22. Halu, A.; Mondragón, R.J.; Panzarasa, P.; Bianconi, G. Multiplex pagerank. PLoS ONE 2013, 8, e78293.
23. Pedroche, F.; García, E.; Romance, M.; Criado, R. Sharp estimates for the personalized multiplex PageRank. J. Comput. Appl. Math. 2018, 330, 1030–1040.
24. Lacasa, L.; Luque, B.; Ballesteros, F.; Luque, J.; Nuno, J.C. From time series to complex networks: The visibility graph. Proc. Natl. Acad. Sci. USA 2008, 105, 4972–4975.
25. Luque, B.; Lacasa, L.; Ballesteros, F.; Luque, J. Horizontal visibility graphs: Exact results for random time series. Phys. Rev. E 2009, 80, 046103.
26. Partida, A.; Criado, R.; Romance, M. Visibility Graph Analysis of IOTA and IoTeX Price Series: An Intentional Risk-Based Strategy to Use 5G for IoT. Electronics 2021, 10, 2282.
27. Iglesias-Perez, S.; Partida, A.; Criado, R. The advantages of k-visibility: A comparative analysis of several time series clustering algorithms. AIMS Math. 2024, 9, 35551–35569.
28. Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1979, 28, 100–108.
29. Newman, M.E. The structure and function of complex networks. SIAM Rev. 2003, 45, 167–256.
30. Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M.; Hwang, D.U. Complex networks: Structure and dynamics. Phys. Rep. 2006, 424, 175–308.
31. OpenSky Network. Crowdsourced Air Traffic Data from the OpenSky Network 2020. 2022. Available online: https://zenodo.org/record/7323875/#.Y4aLdtfMKUk (accessed on 27 April 2026).
Figure 1. Evolution of flights between Amsterdam and London.
Figure 2. Evolution of flights between New York and London.
Figure 3. Weekly evolution of flights between Berlin and London.
Figure 4. Weekly evolution of flights between Madrid and London.
Figure 5. Weekly evolution of flights between New York and London.
Figure 6. Silhouette evolution per number of clusters.
Figure 7. Calinski–Harabasz index evolution per number of clusters.
Figure 8. Graphical representation of how the multiplex network is created from a collection of time series.
Figure 9. Distribution of GaWC city categories across the ranking obtained.
Table 1. Steps of the proposed methodology specifying the action for each step and the data source used.
Step | Action | Data source
1 | Creation of the time series between each pair of cities | Commercial flight data
2 | k-visibility to create clusters | Flights' time series
3 | Multiplex network with the edges of the same cluster in each layer | Time series clusters
4 | Multiplex PageRank sorting of city importance | Multiplex network
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Iglesias-Perez, S.; Partida, A.; Murillo, J.; Criado, R. Unsupervised Clustering of Cities Using Commercial Air Traffic: A Proxy for Economic Connectivity. Mathematics 2026, 14, 1654. https://doi.org/10.3390/math14101654

AMA Style

Iglesias-Perez S, Partida A, Murillo J, Criado R. Unsupervised Clustering of Cities Using Commercial Air Traffic: A Proxy for Economic Connectivity. Mathematics. 2026; 14(10):1654. https://doi.org/10.3390/math14101654

Chicago/Turabian Style

Iglesias-Perez, Sergio, Alberto Partida, Juan Murillo, and Regino Criado. 2026. "Unsupervised Clustering of Cities Using Commercial Air Traffic: A Proxy for Economic Connectivity" Mathematics 14, no. 10: 1654. https://doi.org/10.3390/math14101654

APA Style

Iglesias-Perez, S., Partida, A., Murillo, J., & Criado, R. (2026). Unsupervised Clustering of Cities Using Commercial Air Traffic: A Proxy for Economic Connectivity. Mathematics, 14(10), 1654. https://doi.org/10.3390/math14101654
