High-Order Community Detection in the Air Transport Industry: A Comparative Analysis among 10 Major International Airlines

Community detection in a complex network is an ongoing field. While the air transport network has gradually formed as a complex system, the topological and geographical characteristics of airline networks have become crucial in understanding the network dynamics and airports’ roles. This research tackles the highly interconnected parts in weighted codeshare networks. A dataset comprising ten major international airlines is selected to conduct a comparative analysis. The result confirms that the clique percolation method can be used in conjunction with other metrics to shed light on air transport network topology, recognizing patterns of inter- and intra-community connections. Moreover, the topological detection results are interpreted and explained from a transport geographical perspective, with the physical airline network structure. As complex as it may seem, the airline network tends to be a relatively small system with only a few high-order communities, which can be characterized by geographical constraints. This research also contributes to the literature by capturing new insights regarding the topological patterns of the air transport industry. Particularly, it reveals the wide hub-shifting phenomenon and the possibility of airlines with different business models sharing an identical topology profile.


Introduction
Topology describes properties of space that are preserved under continuous deformations, while network topology explores the way components arrange and connect within a system [1]. With the tremendous growth of the complex networks theory and application, the air transport network has gradually formed as a complex system of flights, which considers airports and direct flights as vertices and edges [2].
Vertices or nodes with network-specific roles emphasize the determinants of the network topology and performance [3]. In a complex network, communities usually represent the multiple subgroups or clusters, which consist of groups of vertices that locally, densely interconnect, but sparsely connect to other groups [4,5]. In other words, nodes more heavily connect within the community, rather than across communities [6]. For instance, a community in the transportation industry may exist as several cities, which are frequently connected by bus, train, or flights. Further, communities may have features including motifs and cliques, where nodes are divided based on their qualities or their relationships with other nodes [7]. Subsequently, the existence of communities evidences the hierarchy among the interactions and features within the network.
While the core nodes and communities act as a pivotal part in the system, community detection facilitates the uncovering of hidden relationships, revealing the interconnections and inter-dependencies among multiple parts of the complex aviation network [8,9]. In this sense, community detection is of great value in classifying the functions of nodes and
• This paper examines the applicability and the robustness of the weighted clique percolation method in the commercial world, with a sample of ten major airlines with different business models.
• This paper expands the research scope by taking both codeshare agreements and the flight weights into account.
• New insights in air transport geographical and topological patterns include the following:
  1. The detected high-order communities can be interpreted purely based on geographical information.
  2. The wide-spread topological hub-shifting phenomenon is observed, resulting in inconsistency between topological gateway airports and the actual airline hubs.
  3. It is possible that airlines with the different business models and network sizes share an identical topology profile.
This paper is structured by first reviewing the fundamental concepts of network science and community detection methods (Sections 1 and 2). Section 3 briefly explains the computational tool implemented and the sources of data in this research. They are followed by a network analysis of ten selected airlines, with a special focus on identifying and examining the community configuration and influential shared nodes (Section 4). The results are then interpreted and explained from a transport geographical point of view, to obtain an in-depth understanding of the network dynamics and airports' roles. Section 5 discusses the findings and results, emphasizing the new insights, and Section 6 concludes this paper.

Literature Review
The study of traffic dynamics has become one of the most successful applications of the complex network theory [12]. Table 1 summarizes a brief comparison of community detection methods by the year of publication. Some of them have proven their effectiveness in the transport industry, and they will be further reviewed in this section.

Table 1. A brief comparison of community detection methods.

Category | Reference | Year | Approach | Sketch
Low-order community detection | [13] | 2002 | Based on betweenness | Could handle both weighted and directed graphs; improved the speed of the algorithm
Low-order community detection | [14] | 2004 | Based on shortest path betweenness | Tested for undirected unweighted edge; could handle more complicated network types
Low-order community detection | [3] | 2005 | Based on the modularity | Proposed by Newman and Girvan [14]; tested for undirected unweighted graph
Low-order community detection | [15] | 2007 | Based on successive neighborhoods | Potentially faster than most community finding algorithms; not as precise as Girvan and Newman's method [14]
Low-order community detection | [7] | 2013 | Degree-based core-vertex algorithm | Detected overlapping communities

Traffic Dynamics from a Low-Order Perspective
Academics have developed a significant number of mathematical tools and computer algorithms to detect community structures. However, most of them focused on the low-order connection patterns of individual nodes and edges. For example, the traditional techniques revealed the underlying community structure by removing edges, based on the shortest path, betweenness, or successive neighborhoods [13-15]. Others tried to overcome the limitations of the conventional ones and proposed new methods, such as the degree-based core-vertex algorithm and local community neighborhood ratio function [7,18]. Specifically, Guimerà et al. identified the multi-community structure of the global airport network, supporting the anomalous values of centrality [3]. They insisted that the community structure cannot be explained exclusively with geographical factors. Jia et al. pointed out the spatial pattern of the airport network modular structure over time in the United States [27]. However, those structures were not characterized by geographical constraints, which is consistent with Guimerà et al. [3]. Yang et al. applied hierarchical cluster analysis to compare the community configurations between high-speed rail and airline networks [4]. Similarly, the four subgroups identified in the dendrogram for airlines cannot be explained with geographical factors, unlike the clusters that were observed in the high-speed rail network. Additionally, no clear pattern of links between the cities' demographic information and clusters was found in the airline networks. Wu et al. improved the Clauset-Newman-Moore modularity maximization algorithm and proposed a route-traffic-based method to detect communities in airline networks [20]. They claimed that full-service airlines had fewer communities consisting of more airports, while the low-cost carriers illustrated the opposite.

High-Order Community Detection in Aviation
A high-order connection usually refers to a sub-network, which is a graph within a larger graph, whose vertex set is a subset of the vertex set, and edge set is a subset of the edge set. A complete subgraph in the network is also called a clique. It is a common network topology class and one of the basic concepts in the mathematical area of graph theory. It requires every pair of distinct nodes to be connected by a unique edge in a simple undirected graph or a pair of unique edges in each direction [28]. For instance, a three-node clique denotes a triangle, while a four-node clique represents a quadrilateral with two diagonal lines.
Typically, the clique structures in the airline network represent a group of highly connected destinations with overall better connectivity. In general, airline hubs or bases obtain the highest connectivity within the country and to the hubs of other corresponding airlines [29]. Consequently, cliques exist more commonly among hub airports. In return, the cliques in the airline network also imply the market potential and strategic position of those distinct destinations. In this sense, proper measurements for those cliques could help an airline to facilitate its core market and adjust its strategy when necessary.
Subsequently, clique-based methods have become popular for airlines to quantify the contributions of their highly connected destinations. For instance, Cardillo et al. have displayed the characteristics of three-node cliques in a multi-layer airline network [29]. They highlighted that the merging of different airline networks generates a large density of triangles, which represent the three-node cliques in the graph. They not only noticed the opportunity brought by the self-connections between major airlines and low-cost carriers, but also believed that one airline can hardly provide round trips for all the flights in those triangles. In fact, the ever-growing codeshare partnerships allow an airline to market its partners' products and services, and maximize the profits on more routes, without investing any additional capacity. Since the interline tickets are commonly bonded with codeshare agreements, it is not appropriate to merge airline networks randomly. Instead, the impacts of the aggregating codeshare networks need to be further examined.
Furthermore, the existing literature generally did not consider the patterns of the high-order connections, such as communities formed by cliques. Although Huang et al. tried to tackle this problem by proposing a higher-order multi-layer community detection method [25], their approach is motif-driven rather than clique-driven. Therefore, quantitatively characterizing the clique communities becomes essential to shed light on the high-order structures in the network.

The Applicability and the Robustness of the Existing Community Detection Methods
The topological positions and the properties of nodes directly affect the interactions and spreading phenomena in the network [30]. A node participating in more than one community is a common phenomenon in complex networks. In the air transport industry, the shared nodes reflect a few central gateway airports. They connect different regions with a high density of routes, propagating passengers, cargo, and even diseases to a large portion of the network. Guimerà et al. claimed that the airports connecting different communities are typically hubs within their low-order communities [3]. Rather than being classified into one community, those airports are, if not demanded equally, proportionately needed by both sides. In this sense, it is the shared nodes that also indicate the existence of community overlaps. Accordingly, methods and algorithms have been proposed to detect overlapping community structures using modularity, spectral, and matrix factorization [17].
In addition, previous studies have usually constructed the aviation network as unweighted and undirected. Their network frameworks decrease the complexity of the air transport system by reducing the model to pair-wise interactions, without considering the particularities of the structure. Although it helps to stay focused on the structural properties of connections, and identify the most relevant mechanisms, additional information is naturally neglected under those circumstances, including flight schedule, aircraft type, and operator [31,32]. While the traffic en route affects the network aspect spatially, and impacts the passenger rerouting choices, it is as important as the topological characters [20]. Since flights are not equivalent, the dynamics of weights along the routes should be considered proportionally, by either flight frequency or passenger number [2]. Cui et al. investigated the fully connected subgraphs with the clustering coefficient and the belonging degrees [21,22]. However, no evidence shows the robustness of those methods in weighted networks. Li et al. accounted for unweighted and weighted networks, and they extracted the maximal cliques to find overlapping vertices or bridge vertices between communities [23]. However, they simply merged two maximal cliques into a larger subgraph for weighted graphs, which is incapable of quantitatively characterizing the effects of weights during the calculation.
To overcome the above-mentioned challenges, this paper attempts to fill the gap and tackle the highly interconnected parts in a weighted network, by introducing a clique-based community detection method to the air transport industry.

Weighted Clique Percolation Method
A clique is a complete subgraph of the network, in which every pair of distinct nodes is connected. The original clique percolation method builds communities from cliques of size k (k-cliques). A k-clique, with k(k − 1)/2 connected pairs, represents the strongest possible coupling of k nodes with unweighted links. When a link is removed from a (k + 1)-clique, two k-cliques sharing (k − 1) nodes remain. Two k-cliques that share (k − 1) nodes are defined as adjacent.
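The adjacency rule can be checked on a toy graph. The following is a minimal sketch, not the tool used in this study: the node labels and the helper names `find_k_cliques` and `are_adjacent` are illustrative assumptions, and the brute-force enumeration is only practical for very small graphs.

```python
from itertools import combinations


def find_k_cliques(nodes, edges, k):
    """Return every k-node subset whose k(k-1)/2 pairs are all linked."""
    linked = {frozenset(edge) for edge in edges}
    return [
        frozenset(group)
        for group in combinations(sorted(nodes), k)
        if all(frozenset(pair) in linked for pair in combinations(group, 2))
    ]


def are_adjacent(clique_a, clique_b):
    """Two k-cliques are adjacent when they share exactly k - 1 nodes."""
    k = len(clique_a)
    return len(clique_b) == k and len(clique_a & clique_b) == k - 1


k = 3
nodes = ["A", "B", "C", "D"]              # hypothetical labels
edges = list(combinations(nodes, 2))      # a (k + 1)-clique: 6 links
edges.remove(("A", "D"))                  # remove one link
triangles = find_k_cliques(nodes, edges, k)
print(triangles, are_adjacent(*triangles))
```

After the link between A and D is removed, only the triangles containing at most one of A and D survive, and they overlap in the remaining k − 1 nodes.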
Based on that, Farkas et al. applied an extension of the original algorithm to find modules in a weighted network [33]. They tended to include a k-clique into a module only when it had an intensity (I) larger than a fixed threshold value ($I_0$). The weight of a subgraph, defined as the subgraph intensity, is implemented by the geometric mean of the weights of its links. Consequently, a weighted clique community is defined as a maximal set of k-cliques with intensities higher than $I_0$. While modules can be reached via a series of k-clique adjacency connections, the overlaps between the communities are allowed. The intensity of a k-clique (C) is written as follows:

$$I(C) = \left( \prod_{i<j;\; i,j \in C} w_{ij} \right)^{2/(k(k-1))}$$

where k(k − 1)/2 and $w_{ij}$ denote the edge number and the weight between node i and j, respectively.
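The intensity filter and the adjacency rule described above can be combined in a few lines. The sketch below is an illustration under stated assumptions rather than the implementation used in this study: the function names, the toy node labels, and the weights are all hypothetical, the graph is treated as undirected, and cliques are enumerated by brute force, which would not scale to a real codeshare network.

```python
from itertools import combinations
from math import prod


def clique_intensity(clique, weights):
    """Geometric mean of the k(k-1)/2 link weights inside a k-clique."""
    pair_weights = [weights[frozenset(pair)] for pair in combinations(clique, 2)]
    return prod(pair_weights) ** (1.0 / len(pair_weights))


def weighted_clique_communities(weights, k, i0):
    """Merge adjacent k-cliques whose intensity exceeds the threshold i0."""
    nodes = sorted({node for edge in weights for node in edge})
    # Brute-force enumeration: fine for a toy graph, too slow for real networks.
    kept = [
        frozenset(group)
        for group in combinations(nodes, k)
        if all(frozenset(pair) in weights for pair in combinations(group, 2))
        and clique_intensity(group, weights) > i0
    ]
    # Union-find: two kept cliques are joined when they share k - 1 nodes.
    parent = list(range(len(kept)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(kept)), 2):
        if len(kept[a] & kept[b]) == k - 1:
            parent[find(a)] = find(b)

    communities = {}
    for index, clique in enumerate(kept):
        communities.setdefault(find(index), set()).update(clique)
    return sorted(communities.values(), key=lambda c: (-len(c), sorted(c)))


# Hypothetical weekly frequencies on an undirected toy network.
toy_weights = {
    frozenset(edge): weight
    for edge, weight in [
        (("A", "B"), 2), (("A", "C"), 4), (("B", "C"), 8),
        (("B", "D"), 8), (("C", "D"), 8),
        (("D", "E"), 27), (("D", "F"), 27), (("E", "F"), 27),
    ]
}

loose = weighted_clique_communities(toy_weights, k=3, i0=3.0)
strict = weighted_clique_communities(toy_weights, k=3, i0=5.0)
print(loose, strict)
```

In the toy network, node D belongs to both communities at either threshold, which is the kind of overlap the method permits. Raising the threshold drops the weakest triangle and shrinks the first community.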
Therefore, defining an optimal $I_0$ for each k becomes the key for the clique percolation algorithm. If $I_0$ is too big, the program will exclude all k-cliques, whereas a small $I_0$ includes all k-cliques and can hardly detect any communities.
Ideally, the size distribution of the communities follows a power law. When the number of communities is small, Farkas et al. proposed to instead optimize $I_0$ based on the variance (χ) of the community [33]. In its definition, n represents all the communities identified in the network. More precisely, $n_\alpha$ denotes a group of communities excluding the largest one, while $n_\beta$ denotes a group of communities that exclude $n_\alpha$ and the largest one. As a result, the maximal variance (χ) is associated with the optimized $I_0$ for each respective k.
When the network or the number of communities is too small to establish a stable estimate of χ, the entropy based on Shannon information becomes another option [34]. The $I_0$ that has the maximum entropy for the respective k is then taken as the optimal threshold for that k. The entropy can be defined as follows:

$$H = -\sum_{m=1}^{N} p_m \log p_m$$

where N denotes the number of communities and $p_m$ denotes the probability of being in community m.
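The entropy criterion can be illustrated with a short sketch. Two points are assumptions not fixed by the text: the probability of being in a community is taken as that community's size divided by the sum of all community sizes, and the logarithm is taken to base 2. The function names and the candidate thresholds with their community sizes are hypothetical.

```python
from math import log2


def community_entropy(community_sizes):
    """Shannon entropy of the community size distribution (base 2, assumed)."""
    total = sum(community_sizes)
    # Assumption: p_m is the share of community m in the summed sizes.
    probs = [size / total for size in community_sizes if size > 0]
    return -sum(p * log2(p) for p in probs)


def pick_threshold(sizes_by_threshold):
    """Return the candidate threshold whose partition has maximum entropy."""
    return max(
        sizes_by_threshold,
        key=lambda i0: community_entropy(sizes_by_threshold[i0]),
    )


# Hypothetical community sizes obtained at three candidate thresholds.
candidates = {0.1: [30, 28], 0.2: [50, 5], 0.3: [40]}
best_i0 = pick_threshold(candidates)
print(best_i0, community_entropy(candidates[best_i0]))
```

Entropy is highest when the communities are similar in size, so the criterion favors thresholds that neither merge everything into one community nor leave a single dominant one.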
Lastly, a permutation test is implemented to examine if the entropy is higher than expected by chance. The test creates permutations for the network, and extracts the highest entropy for each k, before calculating the confidence interval of the entropy. By comparing the entropy with the upper bound of the confidence interval, the optimized $I_0$ for the respective k can be spotted and interpreted.
The research methodology is summarized in Figure 1.

Dataset
To verify the effectiveness of the proposed method, this study conducts a comparative analysis with the top ten airline groups by passengers carried in 2019, including American Airlines Group, Delta Air Lines, Southwest Airlines, United Airlines, Ryanair, China Southern Airlines, Lufthansa Group, China Eastern Airlines, International Airlines Group (IAG) and Air China Group.
Intersect holdings have gained increasing attention and popularity among airlines. Instead of merging several holding airlines into one, some groups prefer to maintain airlines' brands and liveries and operate as their subsidiaries. Therefore, the biggest airline is selected from each group to enable knowledge discovery and pattern detection. For example, the scheduled flights of British Airways will be analyzed on behalf of the complex network of IAG.
Among the selected airlines, Southwest Airlines and Ryanair (including Ryanair Sun) are low-cost carriers operating point-to-point networks by themselves, whereas the rest are full-service carriers operating hub-and-spoke networks with their codeshare partners. Hence, the two groups will be investigated separately to explore the highly interconnected parts and to provide insights into the topological differences between business models.
As discussed in Section 2.2, the codeshare agreement benefits an airline considerably. It allows the airline to publish and market flights operated by partners under its flight number as part of the published timetable. Those agreements dramatically influence the airline network configuration and reshape the market dynamics worldwide [35]. Yet, most of the research considers the transportation network as an isolated system [36]. Other academics tried to tackle this issue from a multi-layer perspective. By corresponding each layer to a different airline, Hong and Liang calculated topological parameters for Chinese airlines and conducted a comparative analysis among them [26]. Similarly, Li et al. unveiled the multi-layer structure in the aviation industry with a special focus on communities bigger than ten nodes [37]. Likewise, Cardillo et al. sketched the structural properties of the air transport system in Europe [29]. They noticed that the topological properties of the airline network have resulted from multi-layer characters rather than single layers. Although they compared the networks of major airlines and low-cost ones, they only took operating carriers into account, which left the codeshare system remaining almost unexplored. Therefore, it is necessary to devote efforts to and explore the way in which codeshare partners are reshaping topological properties in the aggregate network.
A weekly scheduled non-stop flight dataset (from 1 August 2019 to 7 August 2019) is obtained from OAG (OAG is a global travel data provider with headquarters in the UK. It provides flight information data, including schedules, flight status, connection times and industry reference codes, such as airport codes), including origin, destination, operating and codeshare carriers of each flight. Because the actual passenger number is not available via multiple sources, this research weights each flight by the weekly frequency for the selected airlines accordingly. Hence, the relationships between airports are defined by both topological structures and a traffic-driven indicator. Last, but not least, this study will be primarily focused on airport level rather than city level in order to identify the key players in a multi-airport system. Hence, each airport represents a vertex, while each direct flight connecting an airport pair serves as an edge. Table 2 presents a summary of the transport network statistics for the chosen airlines and their codeshare networks. The number of nodes and edges measures the size of each network, where a node represents an airport, and an edge connects a pair of airports. While the edge-to-node ratio illustrates the average degree, the density investigates the ratio of the actual number of edges to the total possible number of edges. Although airline groups are selected based on the passenger number, the size of the individual airline network varies from one to another. For instance, eight legacy carriers fly to, on average, 200 destinations by themselves, whereas huge gaps are observed in the number of edges connecting those airports. More specifically, MU connects 237 airports with 1711 unique airport pairs, resulting in the highest edge-to-node ratio (7.22). On the contrary, BA connects 208 airports with 453 unique connections, achieving the lowest edge-to-node ratio (2.18). 
The low average degree of BA represents a loosely connected network. The limited number of edges further confirms the lack of connections, probably for most destinations in BA's network. A higher edge-to-node ratio represents a generally better connection within the network, which can usually be confirmed by the density results. Nonetheless, those metrics are not always consistent, since the calculation of density magnifies the weight of nodes. Precisely, low-cost carriers obtain a relatively higher ratio with decentralized network structures, when compared with full-service carriers. Although FR has a higher edge-to-node ratio, it obtains only half of the density of WN. This is because the gap in the number of their edges outweighs the one in the nodes in the density calculation.
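The two size metrics discussed above can be reproduced from the node and edge counts alone. The sketch below uses the MU and BA counts quoted in the text. The function names are illustrative, and whether the reported density treats airport pairs as directed or undirected is not stated here, so the `directed` flag is an assumption left to the reader.

```python
def edge_to_node_ratio(num_nodes, num_edges):
    """Edges per node, used in the text as a proxy for the average degree."""
    return num_edges / num_nodes


def density(num_nodes, num_edges, directed=False):
    """Actual edges divided by the number of possible edges."""
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible /= 2  # each unordered airport pair counts once
    return num_edges / possible


# Counts quoted in the text for the airlines' own networks.
mu_ratio = edge_to_node_ratio(237, 1711)
ba_ratio = edge_to_node_ratio(208, 453)
print(round(mu_ratio, 2), round(ba_ratio, 2))
```

Because the denominator of the density grows with the square of the node count while the ratio grows only linearly, a network can rank higher on one metric and lower on the other.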

Network Properties for Selected Airlines
With the wide exchange of codeshare agreements, the airline network has become more complex than ever before. Overall, the network density declines with the increase in codeshare partnerships. By that measure, all codeshare systems remain fairly sparse. Regarding the average degree, the results are quite conflicting. While the partnerships lower the edge-to-node ratio for AA, UA, CZ, and MU, other airlines witness dramatic growths in the ratio. This illustrates that the number of nodes and edges does not change proportionately for most carriers when aggregating networks with their codeshare partners. Particularly, the change rate in edges is usually smaller than the square of the rate in nodes, which leads to the drop in density. It is also noticeable that the gaps become smaller, in terms of the sizes among codeshare networks, which may indicate wide homogeneous competition in the airline industry.
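The link between the change rates and the drop in density can be made explicit with a short derivation. As an assumption for illustration, let aggregation with codeshare partners multiply the number of nodes N by a factor a and the number of edges E by a factor b, and let N be large. The undirected form of the density is used, but the ratio is the same in the directed case.

```latex
d = \frac{2E}{N(N-1)} \approx \frac{2E}{N^{2}} \quad (N \gg 1),
\qquad
\frac{d'}{d} \approx \frac{2bE}{(aN)^{2}} \cdot \frac{N^{2}}{2E} = \frac{b}{a^{2}}
```

Density therefore falls whenever b < a². With hypothetical factors a = 2 and b = 3, the edge-to-node ratio rises by b/a = 1.5 while the density shrinks to roughly 0.75 of its original value, which is how the two metrics can move in opposite directions.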

Clique Percolation Community Detection Process
The traditional static network framework limits the studies to certain properties of these networks. For instance, it allows identifications for the bottlenecks or the clusters of destinations without measuring the dynamic characteristic of the aviation system [32]. In contrast, the clique percolation method proposes an algorithm to detect the interaction patterns of cliques. Although Eustace et al. worried that the number and the size of k-clique may affect the quality of the detected communities [18], the nature of the airline network limits the cliques to three-/four-node communities in most cases. Subsequently, this study mainly examined the network dynamics of three-/four-clique communities in the system.
Initially, the maximum edge weight is tested as the upper limit for $I_0$, as recommended by Farkas et al. [33]. For instance, the maximum edge weight of 119 is set as the upper limit for AA's codeshare network, and $I_0$ is varied in steps of 0.1.
Although the airline codeshare networks seem to be sophisticated, UA's network, for example, can be divided into a maximum of ten communities. When the optimal $I_0$ is identified by the emergence of the giant component, a small number of communities may lead to an unstable threshold. As a result, the maximal variance (χ) is associated with the optimal $I_0$ for three-/four-clique community identifications for AA, BA, CZ, DL, and MU. It also helps to detect the three-clique communities for CA, LH, UA, and WN. Taking AA as an example, for k = 3, the maximal variance equals 4.25, which leads to an optimal intensity ($I_0$ = 14.2). Among 583 airports in AA's codeshare network, 270 airports are identified and classified into three three-clique communities, while 313 nodes are isolated, including 49 nodes found in three-clique communities and 264 nodes outside cliques. LAX and SEA are identified as the shared nodes, which interconnect the coexisting structural subgraphs in the system. Similarly, the optimal $I_0$ (9.1) is identified at the point of maximal variance (χ = 2.69) for AA's four-clique communities. Increasing k raises the number of isolated nodes (385). Of these, 382 are outside four-node cliques and were sparsely connected to the network originally. Only 198 airports are identified in the three four-clique communities, while three airports (GRU, LAX, and MIA) are detected as shared nodes.
When detecting four-clique communities for CA, LH, UA, and WN, the number of communities tends to be too small to establish a stable estimate of χ. In this case, entropy becomes the primary indicator in finding the optimal $I_0$ for the respective k. For instance, the maximum entropy for CA equals 1.002, which is higher than the upper bound of the 95% confidence interval (see Table 3). It indicates that the entropy is higher than expected by chance. Therefore, the $I_0$ at this point (0.1) is taken as optimal for k = 4. Then, the airports can be classified into two four-clique communities with two shared nodes (PEK and PVG). Similarly, four-clique communities are detected for LH, UA, and WN. Since no stable variance is calculated for FR, both its three-clique and four-clique communities are detected based on the maximum entropy. However, neither of them passes the permutation test. Therefore, no high-order community is identified in FR's network.

Community Detection Results and Airline Network Configurations
The different airline operating patterns lead to different network topological and community structures [20]. Unlike the structures identified in low-order communities, the clique community detection results show that most of the codeshare networks consist of three three-clique groups (see Table 4). The fewer groups identified in the four-clique community, for LH, UA, and CA, suggest an overall better connection among all the cliques. In contrast, low-cost airlines rarely have high-order communities, since their networks are combined with rolling hubs and direct origin-destination pairs. Most airlines have one big well-connected community that is covered by their own capacity and one or two small communities that are possibly guaranteed by their codeshare partners. This confirms that the partnership offers a bypass for an airline to extend its network coverage with limited capacity and traffic rights. Nevertheless, the four-clique communities detected in BA's network are limited to several key airports in each group, which is similar to the configuration of WN's three-clique communities. Regardless of business model and network size, the similarity in the communities suggests the possibility of them sharing an identical topology profile. More specifically, Figure 2 demonstrates the airports detected in high-order communities. The community groups are highlighted with different colors (yellow, orange, or red, when applicable). Regardless of the overlapping areas, the communities detected in the airline network are separated based on geographical information. This can be explained by the cliques formed in the high-order communities. An airline tends to partner with the one that provides complementary advantages regionally. Hence, the merging of networks not only supplies existing cliques from the partners, but also generates a large density of new triangles.
Basically, the geographical location of the partners' network results in the geographical separation of clique communities. However, the separation does not necessarily mean geographical isolation by countries or continents. Taking BA as an example, its airports are divided into three groups for three-clique communities. Particularly, most of the airports in the biggest community are located in the US and Europe, which represents BA's home advantage across Europe and the trans-Atlantic market. Another two communities are identified in Asia-Pacific (HKG, MEL, and SYD) and Africa (CPT, DUR, JNB, and PLZ). Likewise, four-clique communities are further dissociated into three groups, purely consisting of BA's home ground, but the isolation of MAD with three shared nodes (JFK, DUB, and LHR) does not reflect a loose connection in the network. Rather, MAD is the headquarters of Iberia, the flag carrier of Spain, and of other wholly owned subsidiaries of IAG. The separation indicates a rather dense connectivity among the four nodes. On the other hand, LH, another European carrier, demonstrates its worldwide coverage and mature network via three-clique communities, and overall better connectivity through a single four-clique community.

Airports' Roles in Codeshare Network: Hub Shifting or Hub Concentration
In the physical airline network, the hub is usually highly connected within the country and to the hubs of other corresponding airlines [29]. This is the result of airline aggregations and alliances decisions under a series of legal, commercial, and technical considerations [38]. Further, this concept indicates the hierarchy among airports, particularly when a carrier runs a multi-hub system. Therefore, hub airports become crucial to airlines' strategic resource optimization.
Topologically, the hub airport, connecting different parts of the network, usually refers to the shared node in the overlapped communities. Hence, at least two communities are necessary for the identification of the overlapping area. Meanwhile, if no shared node is identified, this indicates either the topological isolation of each community, or only one well-connected community detected in the network. In fact, this issue becomes especially vital in an aggregated codeshare network. Whether the hub of an airline can be identified as the shared node reflects the airline's strategic position in the cooperation. To be more specific, if a partner's hub is identified as the shared node, it reveals a possible hub shifting in the codeshare network. This could result from a partner with strong market power, or from the airline losing its dominant position in connecting different regions. In this sense, discovering the influential shared nodes helps airlines to not only recognize the pattern of intercommunity and intracommunity connections, but also understand its position in the network and control the network dynamics regionally.
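Once overlapping communities are available, the shared nodes follow directly from the membership lists. The following is a minimal sketch under the assumption that each community is given as a set of airport labels. The function name and the toy memberships are hypothetical and do not reproduce any airline's detected communities.

```python
from collections import Counter


def shared_nodes(communities):
    """Return the nodes that belong to two or more communities."""
    counts = Counter(
        node for community in communities for node in set(community)
    )
    return sorted(node for node, n in counts.items() if n >= 2)


# Hypothetical memberships: GATE sits in the overlap of two communities.
toy_communities = [
    {"HOME", "P", "Q", "GATE"},
    {"GATE", "R", "S"},
    {"T", "U", "V"},
]
overlap = shared_nodes(toy_communities)
print(overlap)
```

An empty result corresponds to the two cases named above: either the communities are topologically isolated from one another, or only one community was detected.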
One main function of the clique percolation method is to identify the overlapping areas. The results show that the hub-shifting phenomenon is widely observed among six full-service airlines. For instance, although LHR and LGW are BA's hubs in London, three airports (DUB, JFK, and LHR) are marked as influential among its four-clique communities. Since JFK and DUB are the hub airports of AA and Aer Lingus (EI), the result shows the partners' contribution in the complex codeshare network, such as enhancing BA's trans-Atlantic and major European markets. More specifically, EI is the flag carrier and the second largest airline in Ireland, and is now a wholly owned subsidiary of IAG. The detection of DUB confirms the operating strategy of IAG as a group, together with the oneworld alliance partnership with AA. Similarly, only two of AA's ten hubs in the United States (LAX and MIA) are marked. SEA is identified as the shared node, along with LAX, among three-clique communities, while GRU, LAX, and MIA are found to be influential among four-clique communities. Since SEA and GRU are the hub airports of Alaska Air Group (AS and QX) and LATAM Brasil (JJ), it can be concluded that codesharing inside and outside the alliance provides complementary advantages for the airline network, and leads to the hub shifting.
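The idea of shared nodes in overlapping clique communities can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes a brute-force clique search that only suits toy graphs, it assumes that a clique's intensity is the geometric mean of its edge weights, and the function names, the `min_intensity` parameter, and the placeholder node labels are all hypothetical.

```python
from itertools import combinations
from math import prod


def clique_communities(weights, k, min_intensity=0.0):
    """Return k-clique communities (k >= 2) of a small weighted, undirected graph.

    weights maps frozenset({u, v}) -> edge weight (e.g. flight frequency).
    """
    nodes = sorted({n for edge in weights for n in edge})
    cliques = []
    # Brute-force enumeration of k-cliques; only sensible for toy graphs.
    for combo in combinations(nodes, k):
        pairs = [frozenset(p) for p in combinations(combo, 2)]
        if all(p in weights for p in pairs):
            # Assumed intensity: geometric mean of the clique's edge weights.
            intensity = prod(weights[p] for p in pairs) ** (1 / len(pairs))
            if intensity >= min_intensity:
                cliques.append(frozenset(combo))

    # Union-find over cliques: two k-cliques are adjacent if they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each community is the union of the nodes of its connected cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return [frozenset(g) for g in groups.values()]


def shared_nodes(communities):
    """Nodes that belong to more than one community (the overlapping area)."""
    seen, shared = set(), set()
    for community in communities:
        shared |= seen & community
        seen |= community
    return shared


# Toy graph with placeholder labels: two triangles that meet only at node "H".
toy = {
    frozenset(edge): weight
    for edge, weight in {
        ("A", "B"): 10, ("A", "H"): 10, ("B", "H"): 10,
        ("C", "D"): 1, ("C", "H"): 1, ("D", "H"): 1,
    }.items()
}
communities = clique_communities(toy, k=3)
overlap = shared_nodes(communities)  # nodes bridging the communities
strong_only = clique_communities(toy, k=3, min_intensity=5)
```

In this toy graph the two triangles share a single node, so they form two separate three-clique communities and the shared node plays the topological hub role described above. Raising the intensity threshold removes the weakly weighted triangle, and with it the overlap.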
Similar results are found in CA's network. Specifically, PEK and PVG are detected in the overlapped area among the four-clique communities, whereas CA's original hubs are PEK and CTU. Unlike the previous examples, PVG is not a hub airport operated by one of CA's partners outside China. Instead, the outperformance of PVG establishes the international market power of Shanghai as a possibly geopolitically based core region, connecting China to the rest of the world [39]. As the aviation sector changes constantly, CA may encounter more hub shifting with the emerging multi-airport configurations, particularly after the opening of Beijing Daxing International Airport (PKX) and Chengdu Tianfu International Airport (TFU).
The newly identified influential airports expand the airline's connectivity by providing additional transit opportunities and flights covering more regions. However, if only the partners' hubs are identified as influential, it may sound an alarm for the airline, indicating the loss of its position in connecting the complete subgraphs across regions. This issue has been found in the networks of CZ, DL, and UA. In particular, CZ hubs in PEK, PKX, and CAN, operating approximately 3000 domestic flights daily. However, evidence shows that AMS is the only shared node among CZ's three-clique communities, linking Asia and Europe. Meanwhile, three airports in the United States (JFK, LAX, and SFO) tend to be isolated from the other two groups, without any overlapping area. Although CZ's home ground is well connected, there is no gateway airport located in its country of registration. This gap diminishes CZ's effort in connecting China to the rest of the world via the "Canton Route". Additionally, only a secondary hub (URC) is marked as another shared node among four-clique communities, raising the alarm for this Guangzhou-based carrier. Likewise, CAN, ICN, and PVG are found to be influential among three-clique communities in DL's network, none of which is one of DL's hub airports in the United States. Similarly, AKL, PEK, and SYD are found to be critical among UA's three-clique communities.
A previous study claims that medium-sized airports are more strategic in connecting different parts of the network than larger ones, due to the architecture of the air transport system [32]. Existing research also suggests that the global hubs are not necessarily the gateway airports in low-order communities, and that the most connected cities are not necessarily the most central ones [3]. These statements are also supported by the identification of URC and AKL in the high-order communities. Each serving just over 20 million passengers annually, URC and AKL seem much less significant than mega airports. Nonetheless, their geographical and topological positions fit the pattern of network dynamics, and support both intercommunity and intracommunity connections.
On the contrary, the hub airports of LH and MU prove their strategic position in the codeshare networks. FRA and PVG outrank other hubs in three-clique communities, and are the only influential nodes for LH and MU, respectively. This also indicates that those two airports are more significant in their multi-hub systems. In four-clique communities, the results are mixed. The fact that no shared nodes are found in LH's network illustrates strong local connectivity across the single community. By contrast, the expansion of influential nodes (CDG, PVG, SIN, and SYD) in MU's system demonstrates the partners' contribution to improving network efficiency and connectivity worldwide. In particular, the geographical locations of those hub airports are ideal for connecting different continents.
Lastly, it would be reasonable for no shared node to be identified in low-cost carriers' networks, since they usually operate decentralized systems. However, four influential airports are found in WN's three-clique communities, which implies a topological difference between the two major low-cost carriers. This can be explained by the geographical configurations and airline networks of the United States and Europe.
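The distinction between hub shifting and hub concentration discussed in this section can be summarized with a short, purely illustrative sketch. The function name and the four labels are hypothetical and coarser than the discussion above. The hub sets and shared nodes are taken from the text, and LH's other hubs are omitted because they are not listed there.

```python
def classify_overlap(own_hubs, shared):
    """Coarse label for how an airline's own hubs relate to the shared nodes."""
    own_hubs, shared = set(own_hubs), set(shared)
    if not shared:
        return "no shared node"
    if shared <= own_hubs:
        # Every shared node is one of the airline's own hubs.
        return "hub concentration"
    if shared & own_hubs:
        # Own hubs appear alongside other airports (e.g. partners' hubs).
        return "partial hub shifting"
    # None of the airline's own hubs is a shared node.
    return "full hub shifting"


# (own hubs, shared nodes) as reported in the text for each case.
cases = {
    "BA (four-clique)": ({"LHR", "LGW"}, {"DUB", "JFK", "LHR"}),
    "CZ (three-clique)": ({"PEK", "PKX", "CAN"}, {"AMS"}),
    "LH (three-clique)": ({"FRA"}, {"FRA"}),  # other LH hubs are not listed in the text
    "LH (four-clique)": ({"FRA"}, set()),
}
labels = {
    name: classify_overlap(hubs, shared)
    for name, (hubs, shared) in cases.items()
}
```

A label of this kind only compares two sets. It does not capture why a node is shared, which still requires the geographical and commercial interpretation given in the text.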

Findings and Discussion
This research aims to assess the underlying patterns in the high-order communities, and to extract the backbone of the airline network structure with a weighted clique percolation method. Ten airlines are selected from the top ten airline groups worldwide, to conduct a comparative analysis and verify the effectiveness of the proposed method.
Firstly, this study summarizes the patterns of major airline networks with statistical values, which illustrate the variations in the average degree and density of the selected airline networks. It also notes the proportionate change in nodes and edges, which may introduce uncertainty into the density.
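One possible reading of the remark on density is sketched below. It assumes that each codeshare network is treated as a simple undirected graph with $n$ airports and $m$ routes. The symbols $n$, $m$, and the scaling factor $c$ are introduced here for illustration only.

```latex
\begin{align}
\langle k \rangle &= \frac{2m}{n}, &
\rho &= \frac{2m}{n(n-1)} = \frac{\langle k \rangle}{n-1}, \\
n' = cn,\; m' = cm \;&\Rightarrow\;
\langle k \rangle' = \langle k \rangle, &
\rho' &= \frac{\langle k \rangle}{cn-1} \approx \frac{\rho}{c} \quad (n \gg 1).
\end{align}
```

Under this reading, two networks with the same average degree can show very different densities simply because they differ in size, so density comparisons across airlines of different sizes should be made with caution.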
Then, the weighted clique percolation method is introduced to analyze the high-order interaction and clustering properties. Typically, most of the codeshare networks consist of three high-order communities, whereas low-cost airlines seldom have any high-order community, due to their network structure and lack of partnerships. Meanwhile, the community configuration of BA is close to that of WN, with several key airports in each group. Regardless of the business model and network size, the similarity in high-order community structures suggests the possibility of them sharing an identical topology profile, which is the opposite of what previous studies in low-order communities have found [20]. Moreover, the communities detected by this method are separated based on geographical information, which has not been achieved by other techniques. Basically, the geographical location of the partners' network results in the geographical separation of clique communities.
The influential nodes in the overlapping area help airlines to recognize airports' roles in the network and control the network dynamics. However, the results are mixed. This study observes a wide hub-shifting phenomenon among six legacy airlines. The shifting can be classified into three types. The first type combines some of the airline's hubs with their partners', as for AA and BA. This result shows the complementary advantages brought by partners. In the second type, found in CA's network, no partner's hub outside China was identified. In particular, the outperformance of PVG establishes the international market power of Shanghai, and CA should pay attention to the emerging multi-airport configurations in China. In the third type, only partners' hubs were identified, as in the networks of CZ, DL, and UA, which may sound an alarm that these airlines are losing dominant positions in the codeshare network. Aside from shifting, the concentration in FRA and PVG proves these hubs' strategic positions, as they outperform other hub airports in the system. In MU's four-clique communities, it is also noticeable that the influential nodes extend to CDG, PVG, SIN, and SYD, offering worldwide connections enabled by codeshare partnerships.
There has been very limited research targeting high-order communities in the airline network. Hence, it is not easy to conduct a comparative analysis among the published results from the limited available research. However, it is noticeable that some of the findings are consistent with the patterns that the previous literature has detected in the low-order structures. For instance, this paper identifies two medium-sized airports as gateway airports, which confirms the arguments of Guimerà et al. and Rocha [3,32].

Contribution
Network science has been commonly applied as a quantitative tool to contribute to a better understanding of the various layers of the aviation system. The existing literature usually considers the network of an operating carrier as a single layer, but leaves the codeshare system almost unexplored. The codeshare network is worthy of deeper investigation, as it is the result of airline aggregation and alliance decisions, which are crucial to airlines' strategic resource optimization. Hence, this study fills in knowledge gaps by uniquely taking the codeshare network into account, and addressing its effects on airline topology and transport geography. More importantly, the rarely discussed industry-specific issues are explored, based on the reality of the airline networks in the commercial world, rather than defining communities algorithmically.
Because it affects the network spatially and influences passengers' rerouting choices, the traffic en route is as important as the topological characteristics for airlines and airports. However, very limited literature has investigated the aviation network as a weighted system. This paper contributes to the existing literature by considering the dynamics of weights along the routes. This research strengthens the rationale and accuracy of the analysis by taking flight frequency into account, and expands the research scope from topological structure to the real world.
Last, but not least, this study examines the applicability and the robustness of the weighted clique percolation method with a case study, testing ten major airlines with different business models. The results indicate that the clique-based analysis is quite distinct from what the existing literature found with low-order methods, which reveals new insights into air transport geographical and topological patterns. First, the result suggests that the high-order communities detected in the airline systems are separated based on geographical information, which has not been achieved by other techniques. Second, the topological hub-shifting phenomenon is observed, revealing that the topological gateway airports are not always consistent with the actual hub airports of the airlines. Third, unlike previous studies on low-order communities [20], no clear patterns in the high-order network topology and community structures are identified between legacy carriers and low-cost airlines. In contrast, this research reflects the possibility of airlines with different business models and network sizes sharing an identical topology profile. This overall ensemble of unique inputs that were applied in this study separates it from other studies.

Limitations and Future Work of the Study
This research explores the spatial distribution of the community structure. However, the analysis leaves two issues open, which can be addressed in future research. First, there is no commonly accepted standard to evaluate the detection of communities [22]. Second, this study targets the dynamics of a static network. A study of the temporal network would be more meaningful in evaluating epidemic outbreaks and traffic dynamics, especially during the pandemic [40]. Therefore, further research is necessary to fill these gaps.

Conclusions
To yield insightful results revealing the organization of complex aviation systems, this study first summarizes the patterns of major airline networks. The statistical values support the variations in the average degree and density of the selected airline networks, including legacy carriers and low-cost ones. It is also worth noting that the proportionate change in nodes and edges may bring uncertainty to the calculation of density.
This study then introduces a weighted clique percolation method to the airline industry, to assess and interpret the network structures topologically. As complex as it may seem, the airline network tends to be a relatively small system with only a few high-order communities. Legacy carriers follow the hub-and-spoke structure to improve the coverage of airports and maximize efficiency, whereas low-cost airlines seem to lose interest in the centralized network. However, there are certain topological similarities between them.
A comparative analysis confirms that the proposed method can be used in conjunction with other metrics to shed light on air transport network topology, and it may become one of the preferred ways to measure airline networks. The results quantify and interpret the high-order communities with geographical characteristics, while emphasizing the hub-shifting and hub-concentration phenomena at the level of an aggregate codeshare network. Although airlines do not usually make decisions based on topological factors, the new insights reveal the connections between topological patterns and the physical and geographical perspective. Specifically, the geographical separation of the high-order communities confirms the regional complementary advantages brought by partners. On the other hand, the hub-shifting phenomenon indicates the lower hierarchy of the airline in the codeshare network. Since the hub-shifting phenomenon signals that an airline is losing its position in the codeshare partnership, efforts are necessary for the airline to facilitate its core market, and therefore adjust its strategy and physical network accordingly.

Acknowledgments:
The authors are grateful to the anonymous reviewers and the editor for their valuable suggestions, which improved the manuscript considerably.

Conflicts of Interest:
The authors declare no conflict of interest regarding the publication of this paper.