Research on the Topological Properties of Air Quality Index Based on a Complex Network

Abstract: To analyze the dynamic characteristics of air quality and to support effective measures for preventing and avoiding harm from air pollution, air quality index (AQI) time series data was selected and transformed into a symbol sequence consisting of the characters (H, M, L) through a coarse graining process; each 6-symbol series was then treated as one vertex, in time order, to construct the AQI directed-weighted network; finally, the centrality, clusterability, and ranking of the AQI network were analyzed. The results indicated that vertex strength versus cumulative strength distribution, and vertex strength versus strength rank, followed power law distributions, and that the AQI network is a scale-free network. Only 17 vertices possessed a higher weighted clustering coefficient; meanwhile, the weighted clustering coefficient and vertex strength did not show a strong correlation. The AQI network did not have an obvious central tendency towards intermediaries in general, but 20.55% of vertices accounted for nearly 1/2 of the intermediaries, and differences among vertices still existed. The mean distance of 68.4932% of vertices was 6.120-9.973, the AQI network did not have obvious small-world phenomena, the conversion of AQI patterns presented the characteristics of periodicity and regularity, and 20.2055% of vertices had high proximity prestige. The vertices fell into six islands, and the AQI pattern indicating heavy or serious air pollution lasting six days always lingered for a long time. The number of triads 2-012 was the largest, and the AQI network followed the transitivity model. The study has instructional significance for understanding how air quality in Beijing changes over time, and opens a new way for time series prediction research. Additionally, the factors causing the change of topological properties should be analyzed in future research.


Introduction
Air pollution consists of hazardous chemicals released into the atmosphere by natural and/or anthropogenic activities. The change of atmospheric composition is attributed to the combustion of fossil fuels [1]. Air pollutants, such as carbon monoxide (CO), respirable particulate matter (PM2.5 and PM10), nitrogen oxides (NOx), ozone (O3), sulphur dioxide (SO2), nitrogen dioxide (NO2), and nitric oxide (NO), differ in their reaction properties, chemical composition, time of disintegration, emission, and ability to diffuse over short or long distances. Air pollution has both acute and chronic impacts on human health, and causes or aggravates numerous organic and systemic diseases, such as heart disease, respiratory irritation, lung cancer, chronic bronchitis, and acute respiratory infections, which bring about premature mortality and reduce life expectancy [2][3][4]. Therefore, air pollution is a fundamental issue of global concern. Atmospheric pollution in China is very serious. The "Environmental Performance Index: 2016 Report" released by Yale University showed that, although China is the world's second largest economy, its air quality was the second-lowest in the world, only slightly better than [...] crude oil future prices and spot price from 25 November 2002 to 24 September 2010 [25]. Wan, Shu, and Guo (2012) presented a method to model frequent patterns and their interaction relationships in sequences based on a complex network [26]. Zhou, Gong, Zhi, and Feng (2008) collected surface temperature data from 160 Chinese weather observations and investigated the topology of Chinese climate networks by using a complex network [27]. Until now, few scholars have established an AQI network and used complex network theory to study how the AQI changes. Therefore, this study analyzes the AQI time series data with complex network theory to fill this research gap.
This study selected the time series data of the AQI in Beijing from 1 November 2013 to 31 October 2017; transformed it into a symbol sequence consisting of three characters {H, M, L} through the coarse graining process; then defined 292 AQI patterns as vertices to create one directed-weighted network of the AQI via a sliding window; and finally analyzed the topological properties of vertex strength, strength distribution, weighted clustering coefficient, betweenness centralization, islands, distance, proximity prestige, ranking, and triads.
The rest of this article is organized as follows. The data and complex network theory are introduced, and the AQI complex network is constructed in Section 2. The topological properties of the AQI complex network are analyzed in Section 3. Finally, conclusions are summarized and future research is suggested in Section 4.

Data Description
The data used in the study is the AQI of Beijing from 1 November 2013 to 31 October 2017 (Figure 1), which was obtained from the China National Environmental Monitoring Centre (http://www.cnemc.cn/).

Complex Network Theory
The complex network is a system composed of a mass of vertices and edges, which considers distinct elements or actors represented by vertices (or nodes) and the interaction or relationship between the elements or actors as edges (or links), usually shown as a graph to describe phenomena in social or natural sciences. Nowadays, the complex network is widely used in various scientific fields, such as biological networks, telecommunication networks, social networks, cognitive and semantic networks, and computer networks. It draws on theories and methods including graph theory from mathematics, information visualization from computer science, statistical mechanics from physics, social structure from sociology, and inferential modeling from statistics [28].
Complex networks are very different from traditional statistical methods. For example, in social sciences, traditional statistical methods carry out research based on attribute data, such as gender, age, income, attitudes, shared values, etc. However, individuals live in a particular social environment, and their behavior is affected by others. Statistical methods, like a "meat grinder", divorce individuals from the social context in which they live and assume that there are no relations between individuals.
In contrast, the complex network conducts research on the basis of relational data, and captures attitudes and behaviors determined by social structures through relationship analysis [29].
The characteristic analysis of a complex network mainly includes centrality analysis, clusterability analysis, ranking analysis, and network type analysis. Centrality analysis finds influential vertices, clusterability analysis discovers cohesive subgroups in the network, and ranking analysis extracts discrete ranks from relations. The scale-free network and small-world network are the most common network types (Table 1). The measurement indicators of centrality analysis include degree centrality, closeness centrality, and betweenness centrality. Degree centrality is a simple measure that counts how many neighbors a vertex has: the more neighbors, the more important the vertex. Closeness centrality measures the mean distance from a vertex to other vertices. Betweenness centrality measures the extent to which a vertex lies on paths between other vertices. In a weighted network, degree centrality and closeness centrality are usually measured by vertex strength and the weighted clustering coefficient, respectively.
Network clusterability analysis is an important way to understand the network structure and function. Within a complex network, different clusters often exist. The internal vertices interact a lot, while there are few interactions between dissimilar clusters. The island analysis is employed to detect the network structure and internal clusters in this study.
In a directed network, the relationship direction is not very important to brokerage, but central to ranking. Ranking is connected to asymmetric relations. If one vertex receives many choices and reciprocates few choices, it is deemed as enjoying more prestige. Patterns of asymmetric choices may reveal the hierarchy of layers in a directed network. In network ranking analysis, prestige and triadic analysis are used to extract discrete ranks from relations.
There is a large class of scale-free (SF) networks, so called because zooming in on any part of the degree distribution does not change its shape. The degree distribution of a scale-free network is heterogeneous and follows a "power-law", which means that a few vertices, called hubs, have a large number of connections with other vertices and play a leading role in the network, while most vertices have only a few connections. A straight line on a log-log plot is strong evidence for a power-law, with the slope of the straight line corresponding to the power-law exponent.
Small-world networks are also a common network type, in which most vertices are not neighbors of one another, yet any vertex can be reached from every other vertex in a small number of hops or steps. A small-world network is characterized by a shorter characteristic path length and a larger clustering coefficient [30].

Vertex Strength
In a directed network, the number of arcs a vertex receives is called its in-degree, and the number of arcs it emits is called its out-degree. In a directed-weighted network, vertex strength measures the importance and impact of a vertex, and the strength distribution shows the dispersion and variation of vertex strength. The vertex strength is defined as follows:
$$S_j = \sum_{i \in N_j} w_{ij} \qquad (1)$$
In Formula (1), $w_{ij}$ denotes the weight of edge (i, j), and $N_j$ represents the set of all other vertices connected to vertex j.
The strength distribution is the proportion of the total vertex strength accounted for by the current vertex. It is calculated as follows:
$$P(s_j) = \frac{S_j}{S} \qquad (2)$$
In Formula (2), $S_j$ is the strength of vertex j, and $S$ is the sum of all vertex strengths. A special statistical indicator, the weighted frequency W of an n-same-symbol series, is defined in this study. It is the probability that one air quality level, represented by the symbol H, M, or L, lasts for n days, and it is used to measure the duration and level of air pollution. The weighted frequency W of an n-same-symbol series is defined as Formula (3):
$$W = \sum_{i} m_i P(s_i) \qquad (3)$$
In Formula (3), $m_i$ is the number of times the n-same-symbol series occurs in AQI pattern i, and $P(s_i)$ is the strength distribution of that pattern.
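As a minimal sketch of Formulas (1) and (2), the snippet below sums the weights of arcs arriving at each vertex and divides by the total. The edge dictionary, the function names, and the choice to count incoming arcs only are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

def vertex_strength(edges):
    """Sum the weights of the arcs arriving at each vertex (in-strength)."""
    strength = defaultdict(int)
    for (_, target), weight in edges.items():
        strength[target] += weight
    return dict(strength)

def strength_distribution(strength):
    """Share of the total vertex strength held by each vertex."""
    total = sum(strength.values())
    return {vertex: s / total for vertex, s in strength.items()}

# Hypothetical arcs (source pattern, target pattern) -> conversion count.
edges = {
    ("LLLLLL", "LLLLLL"): 2,
    ("LLLLLL", "LLLLLM"): 1,
    ("LLLLLM", "LLLLMM"): 1,
}
strength = vertex_strength(edges)
distribution = strength_distribution(strength)
print(strength)
print(distribution)
```

With these toy arcs, LLLLLL holds half of the total strength and the other two patterns a quarter each.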

Weighted Clustering Coefficient
The weighted clustering coefficient of a vertex measures its degree of association with surrounding vertices in a weighted network. The higher the clustering coefficient, the more constant the contact between adjacent vertices. Traditional calculation methods do not consider edge weight, so Onnela et al. and Holme et al. proposed definitions of the weighted clustering coefficient for weighted networks [31,32]. Our study defines the weighted clustering coefficient in Formula (4), where $k_i$ is the vertex strength of vertex i, $s_i$ is the degree of vertex i, $w_{ij}$ is the weight of edge (i, j), and the product $a_{ij} a_{jk} a_{ki}$ indicates whether ties exist among three vertices: a value of 0 indicates that the ties do not all exist, and a value of 1 indicates that they do, so that the three vertices form a triangle in the network.
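One form consistent with the symbols listed for Formula (4) is the weighted clustering coefficient of Barrat et al. The expression below assumes that form and keeps this section's naming ($k_i$ for vertex strength, $s_i$ for degree; the summation indices j and k are vertices). It is an assumption about the shape of the formula, not a transcription of it.

$$C_i^{w} = \frac{1}{k_i \,(s_i - 1)} \sum_{j,k} \frac{w_{ij} + w_{ik}}{2}\, a_{ij}\, a_{jk}\, a_{ki}$$

Under this form, each triangle through vertex i contributes the average weight of the two edges that join i to the other two vertices, and the prefactor normalizes by the strength and degree of i.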

Betweenness Centralization
A geodesic from u to v is the shortest path between two vertices, and the length of the geodesic is called the distance from u to v. If one vertex is the intermediary of information communication or resource flow, it is more important, because removing this vertex will destroy communication ties and break resource flow in the network. The intermediary of one vertex is measured by the concept of betweenness centrality.
In general, the proportion of all geodesics between other vertices in the network that include a vertex is called the betweenness centrality of this vertex, and the betweenness centralization is the variation in the betweenness centrality of vertices divided by the maximum variation in betweenness centrality scores possible in a network of the same size.
Let $C_k(i, j)$ be the number of geodesics of a vertex pair (i, j) that include vertex k, and let $C(i, j)$ be the total number of geodesics of vertex pairs in the network; the betweenness centrality of vertex k is then defined as Formula (5).
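The computation can be sketched as follows, assuming the standard Freeman form in which the betweenness of k sums $C_k(i, j)/C(i, j)$ over pairs of other vertices, and using Brandes' algorithm on an unweighted directed graph. The adjacency dictionary and vertex names are hypothetical, scores are left unnormalized, and each neighbor list is assumed to contain no duplicates.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed graph {u: [v, ...]}."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting geodesics (sigma).
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj.get(u, []):
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Two geodesics lead from a to d: one through b, one through c.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
scores = betweenness(graph)
print(scores["b"], scores["c"])  # 0.5 0.5
```

In the toy graph, b and c each lie on one of the two geodesics from a to d, so each receives half of that pair's contribution.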

Island
Most exploration technology of cohesive subgroups is based on the number of neighbors, but the island analysis is based on multiplicity or value of edges. The island concept was introduced by John Scott [33], who defined an island as one maximal subnetwork containing edges with a multiplicity equal to or greater than m and vertices which are incident with these edges. In an island, vertices are connected by edges of multiplicity m or higher to at least one other vertex.
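A simplified sketch of the idea: keep only arcs whose multiplicity is at least m, ignore direction, and group the vertices that remain connected. This omits the minimum and maximum island size used by dedicated network software, and the edge weights below are hypothetical.

```python
def islands(edges, m):
    """Group vertices joined by arcs of weight >= m (direction ignored)."""
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (u, v), weight in edges.items():
        if weight >= m:
            parent[find(u)] = find(v)

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    # Largest island first, vertices sorted for stable output.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))

# Hypothetical arcs with their multiplicities.
edges = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 1,
         ("D", "E"): 4, ("F", "A"): 1}
print(islands(edges, 2))  # [['A', 'B', 'C'], ['D', 'E']]
```

Vertex F is tied to the rest only by an arc of multiplicity 1, so at m = 2 it belongs to no island.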

Proximity Prestige
There are three criteria for measuring the importance of a vertex: in-degree, input domain, and proximity prestige. In-degree only takes account of direct choices and leaves out indirect choices, so it is a very strict measure of prestige. The input domain of a vertex is the number or percentage of all other vertices that are connected directly or indirectly by a path to this vertex. However, in a well-connected network, the input domain of a vertex often contains all or almost all other vertices; it does not distinguish very well between vertices, so the input domain of a vertex is not a perfect measure of prestige [29].
In this case, we assume nominations by close neighbors are more important than distant neighbors, limit the input domain to direct neighbors or to neighbors at maximum distance two, and finally propose proximity prestige to estimate the importance of a vertex. The proximity prestige considers all connected vertices, and weights each connected vertex by its path distance to the vertex. The proximity prestige of a vertex is defined as the proportion of all vertices (except itself) in its input domain, divided by the mean distance from all vertices in its input domain.
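The definition above can be sketched with a breadth-first search along reversed arcs. The graph, the function name, and the convention of assigning 0 to a vertex with an empty input domain are assumptions made for illustration.

```python
from collections import deque

def proximity_prestige(adj):
    """Proximity prestige for a directed graph {u: [v, ...]}."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    reverse = {v: [] for v in nodes}
    for u, nbrs in adj.items():
        for v in nbrs:
            reverse[v].append(u)
    n = len(nodes)
    prestige = {}
    for target in nodes:
        # Distances from every vertex that can reach `target`.
        dist = {target: 0}
        queue = deque([target])
        while queue:
            u = queue.popleft()
            for w in reverse[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        domain = len(dist) - 1  # input domain size, excluding target
        if domain == 0:
            prestige[target] = 0.0
        else:
            mean_distance = sum(dist.values()) / domain
            prestige[target] = (domain / (n - 1)) / mean_distance
    return prestige

# Chain a -> b -> c: c is reached by both other vertices.
pp = proximity_prestige({"a": ["b"], "b": ["c"], "c": []})
print(pp["a"], pp["b"], round(pp["c"], 4))  # 0.0 0.5 0.6667
```

Vertex c has the full input domain at mean distance 1.5, giving 1/1.5; vertex b is reached by half of the other vertices at distance 1, giving 0.5.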

Triadic Analysis
A pair of vertices and the lines between them is a dyad. Dyads fall into two categories: symmetric and asymmetric. A symmetric dyad means equivalence, a vertex is supposed to reciprocate the choices that it receives, while an asymmetric dyad signifies ranking, one vertex chooses the other but this choice is not reciprocated. Both mutual choices and mutual absent choices are symmetric. For analyzing the topological structure of a directed network, we proceed from dyads to triads and list 16 basic types of triads [29].
Triad type is identified by an M-A-N number of three digits, proposed by Davis, Holland, and Leinhardt [34]. The three digits respectively refer to the number of mutual positive dyads (M), the number of asymmetric dyads (A), and the number of null dyads (N). In triads with the same M-A-N digits, a letter is added to represent the direction of asymmetric choice: C for cyclic, U for up, D for down, and T for transitive (Figure 2).
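The M-A-N digits can be computed by classifying the three dyads of every vertex triple. The sketch below is a simplification: it omits the direction letter (C, U, D, T), so it yields ten M-A-N classes rather than all 16 triad types. The vertices and arcs are hypothetical.

```python
from collections import Counter
from itertools import combinations

def man_census(nodes, arcs):
    """Count triads by their M-A-N digits (direction letter omitted)."""
    arcs = set(arcs)
    census = Counter()
    for triple in combinations(sorted(nodes), 3):
        mutual = asymmetric = null = 0
        for u, v in combinations(triple, 2):
            forward, backward = (u, v) in arcs, (v, u) in arcs
            if forward and backward:
                mutual += 1
            elif forward or backward:
                asymmetric += 1
            else:
                null += 1
        census[f"{mutual}{asymmetric}{null}"] += 1
    return census

nodes = ["A", "B", "C", "D"]
arcs = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "C")]
census = man_census(nodes, arcs)
print(dict(census))
```

Here the triple (A, B, C) has three asymmetric dyads (030), (A, B, D) has one asymmetric and two null dyads (012), and the two triples containing the mutual pair C-D are both 111.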

AQI Network Construction
To explore the AQI fluctuation with time, we transform the time series of the AQI into a directed-weighted network of the AQI.

Data Coarse-Grained Processing
The object of study is the daily rise and fall of the AQI, so the AQI is first coarse-grained to convert the continuous time series data into a sequence of fluctuation-state symbols. Coarse graining discards small details, divides the range of the original time series data into a finite number of sub-intervals, and uses one character to represent all values within a sub-interval. Because the symbol states are limited, the treatment is more conducive to revealing the nature of AQI fluctuation. The accuracy of the coarse-grained treatment determines the validity of the research conclusions, so the symbol types should be few, should represent the stage characteristics of the AQI, and should be independent of each other.
Too fine a classification is not conducive to the discovery of AQI wave patterns. Therefore, according to the fluctuation range (Formula (6)), the AQI data series is abstracted into three symbols, H, M, and L, and transformed into the symbol sequence $CS_i$, as shown in Table 2.
$$CS_i = \begin{cases} H, & \mathrm{AQI}_i > 200 \\ M, & 100 < \mathrm{AQI}_i \le 200 \\ L, & \mathrm{AQI}_i \le 100 \end{cases} \qquad (6)$$
In Formula (6), the symbols H, M, and L respectively denote heavy or serious air pollution, mild or moderate air pollution, and excellent or good air quality.
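The coarse graining step can be sketched as a threshold mapping, assuming the class boundaries the paper gives for the three symbols (AQI ≤ 100 for L, 100 < AQI ≤ 200 for M, AQI > 200 for H). The function name and the sample readings are illustrative, not taken from the Beijing data.

```python
def coarse_grain(aqi_values):
    """Map each daily AQI value to one of the symbols H, M, or L."""
    symbols = []
    for aqi in aqi_values:
        if aqi > 200:
            symbols.append("H")   # heavy or serious pollution
        elif aqi > 100:
            symbols.append("M")   # mild or moderate pollution
        else:
            symbols.append("L")   # excellent or good air quality
    return "".join(symbols)

# Hypothetical daily AQI readings.
sample_aqi = [45, 100, 101, 180, 200, 201, 320, 75]
print(coarse_grain(sample_aqi))  # LLMMMHHL
```

The boundary values 100 and 200 fall into the lower class in each case, matching the inequalities above.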

Definition of AQI Pattern
After the coarse-grained processing of the AQI, the time series data is transformed into the symbol sequence $CS_t$, which exhibits the different rising or falling amplitudes of the AQI. The time series and the symbol sequence $CS_t$ of the AQI are treated as equivalent. The symbol sequence $CS_t$ is a string of daily symbols, each drawn from {H, M, L}.
Series of 1 to 10 symbols were each considered as the unit pattern, and the symbol sequence $CS_t$ was transformed into AQI patterns through a sliding operation. The calculations reveal that taking a 6-symbol series as one pattern shows a stronger regularity (Table 3). Finally, 292 patterns are obtained from the symbol sequence $CS_t$ by taking a 6-symbol series as one pattern. Each pattern evolves on the basis of the previous pattern, showing the characteristics of memorability and diversification. In theory, a 6-symbol series over three symbols admits $3^6 = 729$ distinct patterns, of which 292 appear in $CS_t$.
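The sliding operation can be sketched as a window of six symbols moved along the sequence, assuming it advances one day at a time. The symbol sequence below is hypothetical.

```python
def sliding_patterns(symbols, length=6):
    """Return every window of `length` consecutive symbols, in time order."""
    return [symbols[i:i + length]
            for i in range(len(symbols) - length + 1)]

sequence = "LLLLLLLMMH"  # hypothetical 10-day symbol sequence
patterns = sliding_patterns(sequence)
print(patterns)            # ['LLLLLL', 'LLLLLL', 'LLLLLM', 'LLLLMM', 'LLLMMH']
print(len(set(patterns)))  # 4 distinct patterns, i.e. 4 vertices
print(3 ** 6)              # 729 theoretically possible patterns
```

Consecutive windows share five of their six symbols, which is why each pattern evolves from the previous one.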

Construction of the AQI Directed-Weighted Network
Each pattern is regarded as one vertex and each conversion between patterns is treated as one edge, which yields a directed-weighted network of the AQI. The edge (arrow) between vertices indicates the conversion direction, and the edge size indicates the conversion frequency between patterns. In a directed-weighted network, the in-degree is the number of times other vertices are converted to the current vertex, and the out-degree is the number of times the current vertex points to other vertices. Because arcs between vertices are generated in chronological order, the in-degree and out-degree of a vertex are equal (except for the first and last vertex), so our study sets the number of received arcs as the edge weight of the AQI network. The entire modeling process is visualized in Figure 3. In the AQI network, a vertex representing air pollution (H or M) lasting six days is set to red, a vertex representing air pollution (H or M) lasting 3-5 days is set to yellow, and a vertex representing air pollution (H or M) lasting 1-2 days is set to blue. Figure 3 shows that conversions are more frequent and connections denser between vertices such as LLLLLL and LLLLLL, LLLLLL and LLLLLM, MMMMMM and MMMMMM, and LLMMLL and LMMLLL.
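A minimal sketch of this construction counts each transition between consecutive patterns as one unit of arc weight. The pattern list and the use of a plain `Counter` as the edge container are illustrative assumptions.

```python
from collections import Counter

def build_network(patterns):
    """Count conversions between consecutive patterns as weighted arcs."""
    return Counter(zip(patterns, patterns[1:]))

# Hypothetical patterns in chronological order.
patterns = ["LLLLLL", "LLLLLL", "LLLLLL", "LLLLLM", "LLLLMM"]
edges = build_network(patterns)
for (source, target), weight in edges.items():
    print(f"{source} -> {target}  weight={weight}")
```

A pattern that follows itself produces a self-loop, as with LLLLLL here, and a sequence of n patterns always yields n - 1 conversions in total.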

Statistic Analysis
The study period spans four years and four seasonal cycles. Among the 1457 days in four years, the symbols H, M, and L respectively appear 162 times (11.12%), 513 times (35.21%), and 782 times (53.67%). Pollution days (H or M) account for 46.33% of the four years, which indicates that air pollution is serious in Beijing. In the annual statistics, the symbols H and M decrease and the symbol L increases, which demonstrates that the overall air quality in Beijing improves gradually (Figure 4). In the seasonal statistics, heavy or serious air pollution (H) is most likely to occur in winter, mild or moderate air pollution (M) often appears in spring or summer, and the best air quality (L) recurs mostly in autumn (Figure 5).
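The reported shares follow directly from the symbol counts, as this short check shows; the counts are those stated above and the variable names are illustrative.

```python
counts = {"H": 162, "M": 513, "L": 782}  # symbol counts reported above
total = sum(counts.values())
shares = {s: round(100 * c / total, 2) for s, c in counts.items()}
pollution_share = round(100 * (counts["H"] + counts["M"]) / total, 2)

print(total)            # 1457
print(shares)           # {'H': 11.12, 'M': 35.21, 'L': 53.67}
print(pollution_share)  # 46.33
```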

Vertex Strength
In the AQI network, the vertex strength represents the importance of a vertex and reflects the links received from other vertices. The larger the vertex strength, the higher the probability that the vertex occurs and the more important the AQI pattern it represents. Meanwhile, if a huge gap in vertex strength exists, the strength distribution will exhibit a "power-law" distribution, which means that a few vertices have a dominant effect in the AQI network.
The vertex strength and strength distribution of the directed-weighted network of the AQI in Beijing are calculated and shown in Table 4. Table 4 illustrates that the vertices LLLLLL, LLLLLM, and MLLLLL have maximum strength. The vertex strength of LLLLLL, which means excellent or good air quality (AQI ≤ 100) lasting six days, is 129, and its strength distribution is only 8.7466%. However, the vertices representing air pollution on at least one day in six account for 91.2534%. The vertex strength and strength distribution of vertex HHHHHH, which signifies heavy or serious air pollution (AQI > 200) lasting six days, are seven and 0.4821%, and its vertex strength is ranked 46th among the 292 vertices. The vertex strength and strength distribution of vertex MMMMMM, which indicates mild or moderate air pollution (100 < AQI ≤ 200) lasting six days, are 34 and 2.3426%, and its vertex strength is ranked seventh among the 292 vertices. These facts illustrate that air pollution in Beijing is generally serious, but the best or worst air quality is rare, and mild or moderate air pollution often occurs.
Sorted by vertex strength in descending order, among 292 vertices, the strength of the first 40 vertices is above eight, the strength distribution is above 0.5510%, and the cumulative strength distribution is 61.1570%. The strength of the first 20 vertices is above 16, the strength distribution is above 1.1019%, and the cumulative strength distribution is 45.5234%, which indicates the conversions of the AQI patterns frequently occur from the first 20 vertices to other vertices, or from other vertices to the first 20 vertices, or among 20 vertices. The strength distribution of the last 204 vertices is less than 0.25%, and the cumulative strength distribution is less than 20.6612%, which implies most AQI patterns represented by vertices rarely appear.
All vertices were listed in ascending order of vertex strength, and the vertex strength and cumulative strength distribution (CSD) were log-transformed to obtain the variables log(s) and log(csd). A linear regression equation was then established with log(s) as the independent variable and log(csd) as the dependent variable, and Figure 6 was drawn with log(s) on the X axis and log(csd) on the Y axis. The obtained equation was y = −0.2996x + 0.0084 with R² = 0.8309. The equation and Figure 6 showed that the vertex strength and cumulative strength distribution of the AQI network were in line with a "power-law" distribution. Similarly, it was found that the vertex strength and cumulative strength distribution of the first 40 vertices (Figure 7) and the last 252 vertices (Figure 8) also conformed to the "power-law". The linear regression equation of the first 40 vertices was y = −0.741x + 0.5205 with R² = 0.9592, and the linear regression equation of the last 252 vertices was y = −0.1682x − 0.01 with R² = 0.9259.
All vertices were then presented in descending order of vertex strength. The vertex strength and ranking were log-transformed to obtain the variables log(s) and log(r), and a linear regression equation was established with log(s) as the dependent variable and log(r) as the independent variable. The obtained equation was y = −1.0504x + 2.5181 with R² = 0.9528. The results demonstrated that the strengths and rankings of vertices in the AQI network also followed the "power-law" distribution (Figure 9).
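The log-log regression can be sketched with ordinary least squares on base-10 logarithms. The base of the logarithm, the function name, and the toy strengths (chosen to follow an exact power law) are assumptions; the paper's own data is not reproduced here.

```python
import math

def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns slope, intercept, R^2."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - mean_y) ** 2 for y in ly)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical ranks and strengths obeying s = 100 / r exactly.
ranks = [1, 2, 3, 4, 5]
strengths = [100 / r for r in ranks]
slope, intercept, r_squared = loglog_fit(ranks, strengths)
print(round(slope, 4), round(intercept, 4), round(r_squared, 4))
```

For data that follows a power law exactly, the slope equals the exponent (here close to −1) and R² is close to 1; real strength data will scatter around the fitted line.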
The fact that vertex strength and cumulative strength distribution, and vertex strength and ranking, all followed the "power-law" distribution showed that a few vertices played a leading role in the AQI network. Evidence from Table 4 showed that these hub vertices were LLLLLL, LLLLLM, MLLLLL, LLLLMM, LLLMMM, MMLLLL, MMMMMM, LMLLLL, LLMLLL, and LLLMLL; in particular, the AQI pattern MMMMMM has a larger vertex strength, which once again verified that air pollution in Beijing is mainly moderate pollution.
Long-term observation found that air pollution and excellent weather both had persistent features, often lasting many days, so the weighted frequency W of the n-same-symbol series was calculated to analyze the pollution duration and level. The weighted frequency of the n-same-symbol (H, M, or L) series in the first 40 vertices was calculated according to Formula (3) and is given in Table 5. Among the n-same-symbol series, HHHHH denotes the probability that symbol H lasts five days in six, and the other series are read in the same manner. Table 5 shows that as the sequence length increases, the weighted frequency of the n-same-symbol series decreases gradually. The weighted frequency of the n-same-symbol series for L is the highest, which indicates that the air quality in Beijing is dominated by excellent or good weather. The weighted frequencies of the n-same-symbol series for L and M are far above that for H, which demonstrates that a mild or moderate level of air pollution (M) is more likely to occur than heavy or serious air pollution (H). The weighted frequencies of the same-symbol series HHHH and HHHHH are 0, which indicates that the probability of heavy or serious air pollution lasting for four or five days is very low, close to 0. The weighted frequency of the same-symbol series LLLLL is 0.26, which means there is a low probability of excellent or good air quality for five days.
Overall, excellent or good weather dominates, but air pollution intermittently recurs in Beijing.

Closeness Centrality
In the AQI network, the weighted clustering coefficient quantifies the degree of closeness between a vertex and its adjacent vertices. The larger the weighted clustering coefficient of a vertex, the more frequent and easy the conversion from the AQI pattern represented by this vertex to other AQI patterns.
The statistics show that 17 vertices have a nonzero weighted clustering coefficient, and the vertices HHHHHL, HHHHHH, LLLLLL, and MMMMMM possess a higher weighted clustering coefficient. The AQI patterns they represent are closer to other patterns and convert to them more frequently and easily. In particular, the vertex MMMMMM has both a larger vertex strength and a larger weighted clustering coefficient, which means that the AQI pattern of moderate air pollution (M) lasting six days holds an important position in the AQI network (Table 6). Plotting the vertex strength on the X axis against the weighted clustering coefficient on the Y axis shows that the correlation between the two variables is not strong, and the AQI network presents complicated polymorphism (Figure 10).

Betweenness Centrality
In an AQI network, the betweenness centrality of a vertex indicates the influence of the AQI pattern represented by this vertex as a communication hub, so controlling the intermediary vertex will cut off the path of air pollution diffusion. The betweenness centrality also helps to better understand the conversion process of AQI patterns, and provides a theoretical basis for formulating the treating measures of air pollution.
The betweenness centrality of each vertex in the AQI network of Beijing was calculated and sorted in descending order, as shown in Table 7. The vertex with the highest betweenness centrality is LLLMHH, followed by MLLLMM, LLLLMH, MLLLLM, HLLLLM, LLLLMM, LMLLLM, LLLMMM, MMLLLL, and LLLMMH. It is observed that the intermediary role of every vertex is not strong. The betweenness centralization of the whole AQI network is 14.80%, indicating the AQI network does not have an obvious central tendency towards intermediaries. It is difficult to control the spread of air pollution by controlling some intermediary vertices.
The cumulative distribution curve of betweenness centrality appears in Figure 11, which demonstrates that the slope is not steep, and the overall distribution of betweenness centrality is uniform. While the slope decreases gradually towards the end, differences in the intermediation of the vertices still exist. The statistics also show that the cumulative distribution of the top 60 vertices reaches 49.09%, indicating that 20.55% of vertices account for nearly 1/2 of the intermediaries in the AQI network. Paying attention to these vertices and taking actions to control them will have great significance in preventing the diffusion of air pollution.
Figure 11. The cumulative distribution of betweenness centrality.

Structural Clusterability Analysis
The clusterability analysis reveals how many clusters exist in the AQI network, and which AQI patterns often appear together. The multiple edges are more institutional and less personal, so we use island analysis to explore network clusters. When the minimum island size is set to two and the maximum island size is set to 291, the vertices of the AQI network fall into six islands (Table 8). In Table 8, island 0, represented by HHHHHM, means the corresponding vertices do not belong to any category. Islands 1, 3, 4, and 6 are respectively represented by LLLHHL, LHHLLL, HHHHML, and LMHLLL, and each contains only two vertices. Island 5 contains three vertices. However, island 2, represented by HHHHHH, contains 132 vertices, which reveals that the AQI pattern HHHHHH, meaning heavy or serious air pollution lasting six days, is very cohesive; many AQI patterns emerge around it, which once again shows that the air pollution in Beijing is very serious and, more significantly, that heavy or serious air pollution always lingers for a long time. Figure 12 visually exhibits the six islands of the AQI network. Island 2 is the largest one; more connections occur between vertices in the same island than between vertices in different islands, and the more multiple connections there are, the thicker the line.

Prestige
The vertex with a higher ranking receives many choices from other vertices, so the AQI pattern represented by it tends to be an endpoint or residential location in the air pollution diffusion process. On the contrary, one vertex with a lower ranking is often a starting point or a transit point in the air pollution diffusion process.
As shown in Table 9, the AQI network is a fully connected network, and the input domain of each vertex contains all other vertices (291), so the proximity prestige is equal to the inverse of the mean distance. Table 10 reveals that the mean distance of 68.4932% of vertices is 6.120-9.973, which indicates that the AQI network does not have obvious small-world phenomena, and that the conversion of AQI patterns presents the characteristics of periodicity and regularity. Table 11 illustrates that 64 (21.9178%) vertices have higher proximity prestige and a shorter mean distance to other vertices. The AQI patterns they represent stay a long time in the AQI fluctuation process, dominating the diffusion of air pollution.
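Written out under the definition of proximity prestige given earlier, the reduction to an inverse mean distance is a one-line calculation. The symbols $PP(v)$, $I_v$ (input domain of v), and $\bar{d}_v$ (mean distance from the input domain to v) are notation introduced here for illustration, with n = 292 vertices.

$$PP(v) = \frac{|I_v| / (n - 1)}{\bar{d}_v} = \frac{291 / 291}{\bar{d}_v} = \frac{1}{\bar{d}_v}$$

For the reported range of mean distances, $\bar{d}_v = 6.120$ corresponds to $PP(v) \approx 0.163$ and $\bar{d}_v = 9.973$ to $PP(v) \approx 0.100$.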

Triadic Analysis
The triad census of the AQI network is compared with the chance distribution of triad types; if several triad types of the AQI network occur more frequently than expected by chance, the corresponding triad types may guide or influence the structure of the AQI network. When the number of forbidden triads is less than expected by chance, balance theory can be used to explain the network structure, which divides the network types into balance, clusterability, ranked clusters, transitivity, or hierarchical clusters. Table 12 indicates that the chi-square statistic is statistically significant at the 0.001 level, so the AQI network is clearly different from a random network. The frequency of three of the five forbidden triads (7-111D, 8-111U, 11-201) is less than expected by chance. Two triads (2-012, 9-030T) occur more often than expected by chance; among them, the number of triads 2-012 is 127,635, above the chance-expected number of 126,531.02, so a transitivity model seems to be the best choice for the AQI network. However, many other triads (300, 16-300, 1-003, 4-021D, 5-021U, 12-120D, 13-120U, 14-120C, 15-210) appear less often than expected by chance. This casts some doubt on the reliability of the chi-square measure.

Conclusions and Discussion
This study converted time series data into a symbol sequence through the coarse graining process; established the directed-weighted network of the AQI; then analyzed the centrality, clusterability, and ranking of the AQI network. The main results and conclusions are summarized as follows.
The statistics show that air pollution in Beijing is serious, but air quality improves gradually. The AQI in Beijing has seasonal variations; heavy or serious air pollution mostly recurs in winter, and the excellent or good air quality often appears in autumn. The statistical results are consistent with the subjective experience, perhaps because the meteorological conditions of autumn in Beijing are better and favorable for the diffusion of air pollutants.
The vertex strength and cumulative strength distribution, and the vertex strength and ranking, follow a "power-law" distribution; the AQI network is a scale-free network, which means that only a few AQI patterns, represented by so-called super vertices that play a leading role in the AQI network, appear frequently. The best or worst air quality in Beijing is rare; mild or moderate air pollution often occurs. The probability of heavy or serious air pollution lasting for 4 or 5 days is very low, close to 0.
Seventeen vertices have a weighted clustering coefficient greater than 0. The vertex MMMMMM has both a larger vertex strength and a larger weighted clustering coefficient, holding an important position in the AQI network. The correlation between vertex strength and weighted clustering coefficient is not strong, and the AQI network presents complicated polymorphism.
The AQI network does not have an obvious central tendency towards intermediaries, but differences in the intermediation of the vertices still exist; 20.55% of vertices account for nearly 1/2 of the intermediaries in the AQI network. Restraining the diffusion of air pollution by controlling intermediary vertices is difficult, though not impossible. The vertices of the AQI network fall into six islands; the largest island, represented by HHHHHH, contains 132 vertices, which means the AQI pattern of heavy or serious air pollution lasting six days is very cohesive, always lingering for a long time.
The AQI network is a fully connected network, which does not have obvious small-world phenomena; the mean distance of 68.4932% vertices is 6.120-9.973, and the conversion of AQI patterns presents the characteristics of periodicity and regularity. The 64 vertices had high proximity prestige and dominated the AQI network. They are often the endpoints or residential locations in the air pollution diffusion process, dominating the diffusion of air pollution. The number of triads 2-012 is the largest, and the AQI network seems to follow the transitivity model, however, many other triads appear less than expected by chance, which casts some doubt on the reliability of the chi-square measure.
Air pollution is a major environmental problem. Our study first applies complex network theory to analyze the AQI, and reveals the fluctuation law and internal mechanism of the AQI, which can provide evidence for formulating countermeasures to prevent and control air pollution. Meanwhile, our study also presents a new approach for time series prediction, contributing to existing studies. In different areas and at different times, air quality is affected by different factors, such as atmospheric conditions, fossil fuel emissions, landform features, and measures for the prevention and control of pollution. Although it reveals the topological properties hidden behind the AQI time series, this study does not analyze the factors that lead to changes in air quality. Constructing a model of influencing factors to reveal the causes of AQI pattern variation over time is a direction for future research.