A Dynamic Global–Local Spatiotemporal Graph Framework for Multi-City PM2.5 Long-Term Forecasting

Huang, Yao; Zhu, Xianxun; Wang, Rui; Xie, Yanan; Fong, Simon

doi:10.3390/rs17162750

by Yao Huang ¹, Xianxun Zhu ¹, Rui Wang ¹,*, Yanan Xie ¹ and Simon Fong ²

¹ School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China

² Faculty of Science and Technology, University of Macau, Macau 999078, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(16), 2750; https://doi.org/10.3390/rs17162750

Submission received: 11 June 2025 / Revised: 5 August 2025 / Accepted: 6 August 2025 / Published: 8 August 2025

(This article belongs to the Special Issue Remote Sensing and Climate Pollutants)


Abstract

Accurate PM_2.5 prediction is essential for effective urban air quality management. However, existing methods often struggle to capture the complex, nonlinear, and coupled spatiotemporal dynamics of long-term air pollution evolution. Most existing models rely on short-term observations and overlook long-range temporal trends and inter-station dependencies, which limits their ability to capture the spatiotemporal evolution of air pollution. To address these challenges, we propose a novel dynamic global–local spatiotemporal graph framework for long-term PM_2.5 forecasting across multiple cities. Specifically, we introduce a Multi-Station iTransformer (MS-iTransformer) module to capture long-term temporal dependencies from station-specific historical sequences. To globally model evolving inter-city relationships, we design a bilinear spatiotemporal attention (BSTA) module that adaptively builds dynamic spatiotemporal graphs using bilinear spatial and temporal attention. Furthermore, we propose a Graph-Enhanced Spatiotemporal Module (GESM) to capture localized spatiotemporal dependencies through graph convolution and recurrent modeling. Experimental results on three real-world air quality datasets show that our model outperforms widely adopted baseline approaches across PM_2.5 forecasting tasks: the MAE and RMSE decrease by 1.7665 and 1.8578, respectively, the FAR is reduced by 0.0312, and the CSI and R² improve by 0.0194 and 0.0260, respectively. These results indicate that the proposed method achieves accurate air quality forecasting by effectively capturing long-term temporal trends, dynamic spatial dependencies, and localized spatiotemporal interactions.

Keywords: PM_2.5 prediction; graph neural network; global modeling; local modeling

1. Introduction

Recently, air pollution has intensified globally to emerge as a major threat to public health and the ecological environment. PM_2.5 poses one of the most serious risks due to its extremely fine particle size, which allows it to penetrate the respiratory tract, reach the lungs, and even enter the bloodstream, causing severe harm to human health [1]. Therefore, PM_2.5 is widely regarded as one of the most hazardous air pollutants [2,3]. Accurately predicting the trends of PM_2.5 concentration has become a central challenge in air quality monitoring and environmental management. Traditional monitoring methods are no longer sufficient for the increasing complexity of air pollution sources. We employ advanced spatiotemporal modeling and intelligent forecasting techniques to enable efficient, accurate, and long-term dynamic prediction of PM_2.5 concentrations.

PM_2.5 concentration forecasting comprises short-term and long-term prediction. The former is primarily influenced by rapidly changing factors (such as temperature and wind speed) and sudden emission events [4]. In contrast, long-term forecasting is shaped by seasonal patterns and annual climate variations [5,6], and requires a comprehensive understanding of dependencies across multiple temporal scales to accurately capture the dynamic evolution of air quality. Currently, long-term PM_2.5 forecasting methods face a variety of external interferences, such as fluctuations in meteorological conditions and human activities [7]. Further, they often require the integration of multi-source heterogeneous data, including meteorological variables, pollutant indices, and other exogenous and anthropogenic factors, which significantly increases the complexity of model construction and places greater demands on generalization capabilities [8]. Recurrent Neural Networks (RNNs) [9], Long Short-Term Memory (LSTM) networks [10], and Gated Recurrent Units (GRUs) [11] have shown advantages in handling small-scale or short-sequence tasks due to their relatively simple structures and fewer parameters. However, they struggle to fully capture complex temporal features and long-range dependencies in long-sequence tasks [12,13]. To address this, Zhu et al. [14] designed an attention mechanism-based parallel network model, which extracts short-term and long-term temporal features to capture complex temporal dependencies and improve the accuracy of PM_2.5 concentration forecasts. Fang et al. [15] introduced a decomposing-ensemble and spatiotemporal attention model, which decomposes mixed-mode time series into single-mode series and automatically assigns weights to spatiotemporal factors to enhance prediction precision.

In addition, Wen et al. [16] integrated CNNs with LSTM variants to capture spatiotemporal dependencies. Zhang et al. [17] extracted local features using CNNs and employed a spatiotemporal attention mechanism to assign different weights to various time steps and spatial regions, thereby improving the model's sensitivity to dynamic spatiotemporal variations. However, the spatial relationships between monitoring stations are difficult to characterize accurately in Euclidean space, as they are influenced not only by distance but also by a variety of non-structural factors such as topography, terrain, and wind direction [18,19]. Therefore, the spatial dimension of PM_2.5 concentration forecasting is better modeled in a non-Euclidean space [20].

Graph Neural Networks (GNNs) have emerged as a widely used approach for modeling relationships in non-Euclidean spaces, enabling information propagation and feature extraction through connections between nodes [21]. In recent years, GNNs have demonstrated outstanding performance in numerous real-world tasks, such as classification, regression, and clustering, by integrating graph structures with node attributes [22]. In transportation networks, GNNs model spatial dependencies between roads and are used for traffic flow prediction and congestion mitigation [23]. In bioinformatics and drug discovery, GNNs leverage molecular graphs to enhance protein modeling, molecular property prediction, and drug–target interaction analysis [24,25]. In social networks, GNNs support node classification, community detection, and relationship prediction [26]. In recommender systems, GNNs integrate user–item interaction graphs to improve recommendation accuracy and robustness [27]. Further, GNNs have also found applications in natural language processing [28] and brain disease analysis [29], demonstrating their ability to flexibly capture interactions based on graph structures and to support long-range dependency modeling [30,31,32].

In spatiotemporal data analysis, GNNs not only represent complex structures but also capture dynamic spatiotemporal dependencies, demonstrating strong adaptability and generalization ability. Zhang et al. [33] leveraged GNNs to extract meteorological features and employed a Gray Wolf Optimization (GWO) algorithm to adaptively optimize model parameters, yielding remarkable advantages, especially in handling complex external environmental factors. Liu et al. [34] integrated a spatial graph modeling module with a gated continuous-time forecasting cell for long-term PM_2.5 concentration prediction, which jointly models inter-city spatial dependencies and temporal evolution to improve adaptability to environmental conditions and complex meteorological conditions. Zhao et al. [35] developed a forecasting model integrating mixed graph convolutional GRU and a self-attention network, enhancing both the accuracy and stability of long-term forecasts.

The PM_2.5 concentration forecasting methods based on GNNs and temporal modeling have made significant advancements [30,31]. However, these methods still face several limitations. The spatiotemporal variations in air quality exhibit highly nonlinear and strongly coupled dynamics at both global and local scales. Meanwhile, the majority of existing long-term forecasting models primarily depend on short-term historical data, thereby neglecting the intrinsic long-term trends and periodicities [36]. Consequently, the predictions tend to be overly sensitive to short-term fluctuations, and due to the temporal variability of influencing factors across different periods, these models often struggle to capture the evolving temporal dependencies effectively. This challenge impedes the ability to simultaneously model long-range dependencies and fine-grained local interactions with sufficient accuracy [37,38]. To address these challenges, we propose a long-term PM_2.5 forecasting approach across multiple cities based on a dynamic global–local spatiotemporal graph framework. The main contributions of this work are summarized as follows:

(1): We propose an MS-iTransformer module to capture long-term trends in PM_2.5 sequences. The time series of each station is fed into an individual iTransformer to learn station-specific temporal dependencies. The MS-iTransformer improves the accuracy and robustness of long-term forecasts by station-wise normalization and multi-station self-attention.
(2): We propose a BSTA module to capture global spatiotemporal dynamic dependencies across all cities within a region. By integrating spatial and temporal bilinear attention mechanisms, BSTA adaptively constructs a dynamic Spatiotemporal Dynamic Graph (STDG) that models the evolving inter-city spatial correlations over time.
(3): We propose a GESM to capture localized spatiotemporal dependencies for fine-grained air quality prediction. The GESM aggregates neighbor information via graph convolution and models short-term temporal dynamics using recurrent units to effectively learn local interaction patterns across both spatial and temporal dimensions.

The rest of the paper is structured as follows: Section 2 introduces the study area and available data; Section 3 details the proposed methodology; Section 4 presents the experimental results and discussion; and Section 5 concludes this paper.

2. Study Area and Available Data

This paper selects 184 urban areas in China as the research object and conducts spatiotemporal prediction analysis of the PM_2.5 concentration, as shown in Figure 1. The formation and diffusion of air pollution are jointly affected by a variety of environmental and geographical factors, including meteorological variables, such as temperature, humidity, precipitation, wind speed, and air pressure, as well as distance between cities and terrain characteristics [39].

Initially, we considered a total of 17 meteorological variables in our model, including average temperature, meridional and zonal wind speed, relative humidity, precipitation, surface pressure, and others. To minimize the impact of irrelevant or redundant features on model performance, we employed random forest feature importance ranking and Pearson correlation coefficient (PCC) analysis, as shown in Figure 2 and Figure 3. Figure 2 shows the feature importance scores from the random forest—higher bars indicate greater influence. Figure 3 illustrates the correlation strength between each feature and PM_2.5 concentration based on PCC.
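The PCC screening step described above can be illustrated with a minimal sketch. The function name and the toy values below are hypothetical and chosen only for illustration; they are not taken from the dataset, and the random forest ranking is not reproduced here.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has undefined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values for illustration only (not from the dataset).
pm25 = [35.0, 60.0, 90.0, 120.0, 150.0]
boundary_layer_height = [1500.0, 1100.0, 800.0, 500.0, 300.0]

pcc = pearson_correlation(boundary_layer_height, pm25)
print(f"PCC(boundary_layer_height, PM2.5) = {pcc:.3f}")  # negative in this toy case
```

A feature whose absolute PCC with PM_2.5 is close to zero would be a candidate for removal, although the exact selection rule used in the paper is not stated.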

Specifically, we select eight features with strong physical significance and high statistical correlation to PM_2.5 concentrations [40]:

Boundary_layer_height: negatively correlated with PM_2.5; a higher boundary layer facilitates vertical diffusion of pollutants.
K-index: indicates tropospheric instability; higher values reflect better diffusion conditions, usually resulting in lower PM_2.5 levels.
u_component_of_wind+950 and v_component_of_wind+950: represent horizontal wind speed at 950 hPa (~500 m altitude), where PM_2.5 tends to accumulate; stronger winds help disperse pollutants.
2m_temperature: influences PM_2.5 through cold front activities and ventilation efficiency.
Surface pressure: strongly associated with vertical stability; higher pressure may lead to stratified layers that trap pollutants.
Relative_humidity+950: water vapor contributes to PM_2.5 formation and particle growth.
Total_precipitation: reduces PM_2.5 through wet scavenging and downward airflow, showing a strong negative correlation.

Further, the regional pollution not only comes from local emissions, but is also significantly affected by the transmission effect of neighboring regions. For example, the PM_2.5 levels of some cities in North China are often affected by the long-distance transmission of industrial emissions from Beijing and Tianjin [41]. The spatial diffusion of pollutants usually depends on factors such as wind direction, wind speed, atmospheric boundary layer height, and geographical proximity, which together shape a complex spatiotemporal propagation mechanism. Therefore, it is necessary to comprehensively consider the spatial correlation between cities and the potential impact of external inputs on local pollution levels.

The edge attributes connecting adjacent city nodes consist of $w_{s}$, $w_{d}$, $dr_{f}$, $d_{b}$, and $a_{c}$; detailed descriptions of these features are provided in Table 1.

To more realistically simulate pollutant transmission pathways between cities, we introduce the advection coefficient $a_{c}$ [40], as in Equation (1), to model dynamic inter-city pollution transport. This coefficient accounts for the interaction among wind direction, wind speed, and inter-city distance, effectively representing potential pollutant transmission routes.

$$a_{c} = \mathrm{ReLU}\left(\frac{|w_{s}|}{d_{b}} \cos(dr_{f} - w_{d})\right) \tag{1}$$

When the wind speed is high and its direction aligns with the vector between two cities (i.e., $\cos(dr_{f} - w_{d})$ approaches 1), and the distance between the cities is short, the pollutant transport is stronger, resulting in a higher coefficient value. Conversely, if the wind direction opposes the inter-city vector, the wind speed is low, or the distance is long, the coefficient becomes smaller, indicating weaker pollutant transport capability.
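The behavior of Equation (1) can be checked numerically with a small sketch. The function and argument names are hypothetical, and the units (degrees for both directions, kilometers for distance) are assumptions made for this example; the mapping of $w_{s}$, $w_{d}$, $dr_{f}$, and $d_{b}$ to wind speed, wind direction, inter-city direction, and inter-city distance is inferred from the surrounding text.

```python
import math


def advection_coefficient(wind_speed, wind_dir_deg, city_dir_deg, distance_km):
    """a_c = ReLU(|w_s| / d_b * cos(dr_f - w_d)).

    Assumptions: both directions are given in degrees, distance in kilometers.
    """
    angle = math.radians(city_dir_deg - wind_dir_deg)
    value = abs(wind_speed) / distance_km * math.cos(angle)
    return max(0.0, value)  # ReLU clips opposing-wind cases to zero


# Wind aligned with the inter-city vector, short distance.
aligned = advection_coefficient(10.0, 90.0, 90.0, 50.0)
# Wind blowing in the opposite direction.
opposed = advection_coefficient(10.0, 270.0, 90.0, 50.0)
# Aligned wind, but the cities are four times farther apart.
far = advection_coefficient(10.0, 90.0, 90.0, 200.0)

print(aligned, opposed, far)
```

The aligned case gives the largest coefficient, the opposing wind is clipped to zero by the ReLU, and the longer distance scales the coefficient down, which matches the qualitative description above.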

3. Methodology

Figure 4 illustrates the overall architecture of the proposed model, which consists of three main components: First, the MS-iTransformer module is proposed to capture long-term temporal trends from historical data at individual monitoring stations. Second, a BSTA module is proposed to dynamically construct a global inter-city spatiotemporal graph, thereby modeling spatial and temporal dependencies at the global scale. Finally, the GESM uncovers localized spatiotemporal dependencies through graph convolution and recurrent modeling.

3.1. Global Temporal Feature Extraction

Figure 5 shows that we utilize historical PM_2.5 observations from multiple cities and employ an iTransformer to predict PM_2.5 trends for each city over multiple future time steps. Specifically, given the historical PM_2.5 of multiple cities over the past $L$ time steps $(t, t - 1, \dots, t - L)$, the model forecasts PM_2.5 of these cities for the next $P$ time steps. The core of the iTransformer lies in its ability to learn effective representations of historical PM_2.5 variations in each city and to enhance prediction accuracy by modeling the dynamic correlations among cities. The historical PM_2.5 time series of each city is passed through an embedding module to convert it into tokens that represent its characteristic temporal features. For the historical PM_2.5 series $X$ of each city, the predicted PM_2.5 features $Y'$ are obtained via the steps in Equation (2):

$$\begin{array}{l} h_{n}^{0} = \mathrm{Embedding}(X), \\ H^{l + 1} = \mathrm{TrmBlock}(H^{l}), \quad l = 0, 1, \dots, L - 1, \\ Y' = \mathrm{Projection}(h_{n}^{L}), \end{array} \tag{2}$$

where $H = \{h_{1}, \dots, h_{N}\} \in \mathbb{R}^{N \times D}$ is the feature representation of latent space for $N$ cities, and $D$ denotes the dimension of each token. Subsequently, the iTransformer employs a multi-city self-attention mechanism to model the tokens of different cities and learn their dynamic relationships. The feature extraction is first performed on the time series of each city to obtain a comprehensive representation $H = \{h_{1}, \dots, h_{N}\} \in \mathbb{R}^{N \times D}$. Then, the self-attention module uses linear mappings to generate query (Q), key (K), and value (V) [42]. The corresponding Q and K vectors of a given city are denoted as $q_{i}, k_{j} \in \mathbb{R}^{d_{k}}$. Each element of the score matrix before the Softmax operation in the attention computation is calculated as in Equation (3):

$$A_{i, j} = \left(QK^{T} / \sqrt{d_{k}}\right)_{i, j} \propto q_{i}^{T} k_{j} \tag{3}$$

Since the features of each city are normalized along the feature dimension before input, each element in the score matrix partially reflects the correlation of historical PM_2.5 trends between different cities. However, traditional normalization methods are not well-suited for multi-city PM_2.5 prediction tasks, as different cities are influenced by varying pollution sources, geographic environments, and climatic conditions. Applying a uniform normalization across all cities introduces non-causal noise and may cause temporal lag effects, which hinders learning the local dynamic patterns of each city and degrades prediction performance. Therefore, the historical PM_2.5 of each city is individually normalized to a standard normal distribution (mean 0, variance 1), as defined in Equation (4):

$$\mathrm{LayerNorm}(H) = \left\{ \frac{h_{n} - \mathrm{Mean}(h_{n})}{\sqrt{\mathrm{Var}(h_{n})}} \,\middle|\, n = 1, \dots, N \right\} \tag{4}$$

where $\mathrm{Mean}(h_{n})$ represents the mean and $\sqrt{\mathrm{Var}(h_{n})}$ denotes the standard deviation of the historical PM_2.5 of the $n$-th city. After individual normalization, a feed-forward network is applied to the entire historical PM_2.5 of each city. Finally, the latent space features are mapped to PM_2.5 at future time steps by a linear projection.
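To make the effect of station-wise normalization on the attention scores concrete, the following sketch standardizes each city's series separately, as in Equation (4), and then computes the pre-softmax scores of Equation (3). It is a simplified illustration rather than the authors' implementation: the raw series stand in for the embedded tokens, the query and key projections are assumed to be identity maps, a small `eps` is added for numerical safety, and the series values are made up.

```python
import math


def standardize(series, eps=1e-8):
    """Normalize one city's series to zero mean and unit variance."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return [(x - mean) / math.sqrt(var + eps) for x in series]


def score_matrix(tokens):
    """Pre-softmax scores A[i][j] = q_i . k_j / sqrt(d_k).

    Assumption: identity query/key projections, so q_i = k_i = token i.
    """
    d_k = len(tokens[0])
    return [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in tokens]
        for q in tokens
    ]


# Made-up toy series, one row per city.
history = [
    [10.0, 20.0, 30.0, 40.0],      # city 0: rising trend, low level
    [100.0, 200.0, 300.0, 400.0],  # city 1: rising trend, high level
    [40.0, 30.0, 20.0, 10.0],      # city 2: falling trend
]

tokens = [standardize(series) for series in history]
scores = score_matrix(tokens)

for row in scores:
    print([round(v, 3) for v in row])
```

After per-city normalization, cities 0 and 1 become nearly identical despite their different pollution levels, so their score is high, while the opposite trend of city 2 yields a negative score. This is the sense in which the score matrix reflects trend correlation rather than absolute concentration.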

3.2. Global Spatiotemporal Dependency of Auxiliary Features

Many existing spatial models use static graphs based on fixed distance or correlation, which fail to account for evolving inter-city pollution transport paths driven by dynamic wind patterns, shifting emission sources, or seasonal meteorology [37]. To capture the dynamic spatiotemporal correlations among stations within a region across different time periods, we propose a BSTA based on a bilinear spatiotemporal attention mechanism. The BSTA adaptively constructs an STDG with temporal dynamics to effectively mine the evolving spatiotemporal dependencies among multiple stations, thereby enhancing the regional generalization ability and spatiotemporal modeling capacity. Figure 6 illustrates the structure of the BSTA, comprising the temporal bilinear attention (TBA) and spatial bilinear attention (SBA) module to model dynamic correlations in the temporal and spatial dimensions, respectively.

Temporal bilinear attention module: The TBA module first models the correlations between different time steps along the temporal dimension to highlight the characteristics of key moments in the sequence. Let the input temporal sequence be $X \in \mathbb{R}^{L \times N \times d}$, where $L$ denotes the number of historical time steps, $N$ represents the number of monitoring stations, and $d$ denotes the feature dimension at each time step. The temporal attention matrix $E \in \mathbb{R}^{T \times T}$ is defined as in Equations (5)–(7):

$$l_{time} = (X^{T} \cdot G_{1}) \cdot G_{2} \tag{5}$$

$$r_{time} = G_{3} \cdot X \tag{6}$$

$$E = \mathrm{Softmax}(V_{e} \cdot \mathrm{sigmoid}(l_{time} \cdot r_{time} + b_{e})) \tag{7}$$

where $\mathrm{sigmoid}$ denotes the activation function, $X$ is the input features, and the learnable parameters include $V_{e}, b_{e} \in \mathbb{R}^{T \times T}$, $G_{1} \in \mathbb{R}^{N}$, $G_{2} \in \mathbb{R}^{d_{m} \times N}$, and $G_{3} \in \mathbb{R}^{d_{m}}$ (these shapes use $T$ for the number of time steps and $d_{m}$ for the feature dimension). This bilinear attention mechanism integrates both the station and feature dimension information to capture the global correlations across different time steps. Subsequently, we normalize the temporal attention matrix $E$ to obtain the temporal attention weights. Finally, the input feature tensor $X$ is multiplied by $E$ to obtain the weighted temporal feature matrix $X_{E} \in \mathbb{R}^{N \times T \times d_{m}}$ as in Equation (8):

$$X_{E} = X \cdot E \tag{8}$$

Spatial bilinear attention module: The SBA module is capable of perceiving the influence of temporal dynamics on spatial relationships. After obtaining the temporally weighted features $X_{E}$, we further model the dynamic correlations among different monitoring stations in the spatial dimension. We construct a spatial attention matrix using a bilinear operation as in Equations (9)–(11):

$$l_{s} = (X_{E} \cdot Q_{1}) \cdot Q_{2} \tag{9}$$

$$r_{s} = Q_{3} \cdot X_{E} \tag{10}$$

$$S = \mathrm{Softmax}(V_{s} \cdot \mathrm{sigmoid}(l_{s} \cdot r_{s}^{T} + b_{s})) \tag{11}$$

where $V_{s}, b_{s} \in \mathbb{R}^{N \times N}$, $Q_{1} \in \mathbb{R}^{T}$, $Q_{2} \in \mathbb{R}^{d_{m} \times T}$, and $Q_{3} \in \mathbb{R}^{d_{m}}$ are learnable parameters, and $S$ is the normalized spatial attention weight matrix.

The spatial attention weight matrix constitutes an STDG that varies with the input sequence and is directly utilized in downstream graph-structured modeling tasks, such as graph convolution.

Based on the above analysis, the BSTA jointly models the spatiotemporal dependencies across multiple monitoring stations through TBA and SBA mechanisms to dynamically generate an STDG structure, thereby effectively enhancing its ability to represent and learn complex spatiotemporal structures across multiple regions.

Our BSTA module introduces a dynamic graph construction mechanism via bilinear attention, which jointly considers spatial and temporal relevance. Specifically, bilinear temporal attention reweights the input features to emphasize temporally salient patterns, which are then used in bilinear spatial attention to infer adaptive spatiotemporal graphs. This allows the model to adaptively capture changing inter-city relationships over time—for example, temporary downwind pollution transfer from one city to another. Thus, BSTA complements the limitations of static or implicitly encoded spatial methods by offering flexible, data-driven graph adaptation.
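The spatial bilinear attention of Equations (9)–(11) can be sketched in plain Python to show how the parameter shapes combine into an $N \times N$ graph. This is a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to be indexed as station × feature × time so that the stated shapes of $Q_{1}$, $Q_{2}$, and $Q_{3}$ contract correctly, the softmax is assumed to be row-wise, and all weights are random placeholders rather than trained values.

```python
import math
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax_rows(m):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spatial_bilinear_attention(x_e, q1, q2, q3, v_s, b_s):
    """S = softmax(V_s . sigmoid(l_s . r_s^T + b_s)).

    Assumption: x_e is indexed as x_e[station][feature][time].
    """
    n, f, t = len(x_e), len(x_e[0]), len(x_e[0][0])
    # (X_E . Q1): contract the time axis -> N x F
    xq1 = [
        [sum(x_e[i][c][s] * q1[s] for s in range(t)) for c in range(f)]
        for i in range(n)
    ]
    l_s = matmul(xq1, q2)  # (N x F)(F x T) -> N x T
    # Q3 . X_E: contract the feature axis -> N x T
    r_s = [
        [sum(q3[c] * x_e[i][c][s] for c in range(f)) for s in range(t)]
        for i in range(n)
    ]
    r_s_t = [list(col) for col in zip(*r_s)]  # T x N
    prod = matmul(l_s, r_s_t)  # N x N
    act = [[sigmoid(prod[i][j] + b_s[i][j]) for j in range(n)] for i in range(n)]
    return softmax_rows(matmul(v_s, act))


def rand():
    return random.uniform(-0.5, 0.5)


random.seed(0)
N, F, T = 4, 3, 5  # toy sizes: stations, features, time steps
x_e = [[[rand() for _ in range(T)] for _ in range(F)] for _ in range(N)]
q1 = [rand() for _ in range(T)]
q2 = [[rand() for _ in range(T)] for _ in range(F)]
q3 = [rand() for _ in range(F)]
v_s = [[rand() for _ in range(N)] for _ in range(N)]
b_s = [[rand() for _ in range(N)] for _ in range(N)]

S = spatial_bilinear_attention(x_e, q1, q2, q3, v_s, b_s)
```

Because `S` is recomputed from each input window, feeding a different `x_e` yields a different weighted graph, which is what makes the resulting STDG dynamic.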

3.3. Local Spatiotemporal Feature Extraction

The pollutant diffusion and meteorological transport between neighboring regions exhibit significant spatial correlations in practical urban air quality forecasting scenarios. Therefore, we construct a spatial topology graph based on the adjacency relationships among cities to capture these inter-regional interactions and enhance representational capacity. Figure 7 shows the structure of the GESM. Specifically, the $N$ cities are defined as a set $C = \{c_{1}, c_{2}, \dots, c_{N}\}$, where $c_{n}$ denotes the time series of monitoring data for city $n$. For a target city $n$ at a given time $t$, the model considers not only the historical observations of the city but also the historical data of its first-order and higher-order neighbors based on the spatial graph structure. These observations over the past $L$ time steps $(t, t - 1, \dots, t - L)$ form the input features used to predict the corresponding values of $c_{n}$ over the next $P$ time steps $(t + 1, t + 2, \dots, t + P)$.

We introduce a GNN to capture the latent spatial dependencies between cities, which dynamically learns the interaction mechanisms between nodes by integrating both the spatial topological structure and node feature information, thereby enhancing the representation of the spatiotemporal air quality dynamics. The GNN typically consists of aggregation, update, and iteration. Specifically, the feature of a single city node may be sparse and insufficient to accurately reflect its future air quality trends. Therefore, the GNN aggregates information from neighboring nodes of a target node, leveraging their features to compensate for the insufficiency of single-node representations. Then, the aggregated features from neighboring nodes are fused with the original features of the target node to update its representation through weighted summation, nonlinear transformations, or gated mechanisms. Finally, the GNN performs multiple rounds of aggregation and update, progressively incorporating broader neighborhood information for each node until the feature representations converge or it reaches a predefined number of iterations.

The graph convolution is a commonly used implementation of the GNN in practical applications [43]. In this study, we employ a Graph Convolutional Network (GCN) to capture the spatial dependencies among monitoring stations. Specifically, we construct the normalized Laplacian matrix based on the adjacency matrix $A$ and the degree matrix $D$, both obtained during the preprocessing stage. The adjacency matrix $A$ is computed using Vincenty’s formula [44] to accurately quantify the geodesic distances between spatial nodes. The normalized Laplacian is then formulated as in Equation (12):

$$\hat{A} = D^{-\frac{1}{2}} (D - A) D^{-\frac{1}{2}} \tag{12}$$

where the matrix $\hat{A}$ is regarded as the aggregation operator in each update iteration. For example, an $m$-layer GCN is computed as in Equation (13):

$$\begin{array}{l} F_{t}^{(1)} = \mathrm{ReLU}(\hat{A} F_{t}^{(0)} W^{(0)}) \\ F_{t}^{(2)} = \mathrm{ReLU}(\hat{A} F_{t}^{(1)} W^{(1)}) \\ \dots \\ F_{t}^{(m)} = \mathrm{ReLU}(\hat{A} F_{t}^{(m - 1)} W^{(m - 1)}) \end{array} \tag{13}$$

where $F_{t}^{(m)}$ denotes the feature matrix of all nodes (i.e., cities) at time step $t$ after the $m$-th iteration, and $W^{(m - 1)}$ is a learnable weight matrix. The embedding feature matrix includes the spatial dependencies among cities, which is reflected in two key aspects: (1) Each row of the matrix denotes the embedding feature of a city at time step $t$, which no longer corresponds to the raw air quality or meteorological indicators, and its dimensionality has typically changed. The features of neighboring nodes are aggregated through multiplication with the normalized Laplacian matrix $\hat{A}$ during each update, maintaining the original feature dimension. This is followed by multiplication with a non-square learnable weight matrix $W^{(\cdot)}$ that performs feature dimension transformation, typically mapping the features into a lower-dimensional space. (2) The aforementioned matrix represents the spatial feature embeddings of all cities at time step $t$, capturing the spatial relationships among features. To fully model temporal dynamics, the same graph convolution operation is applied to the city features over the past $L$ time steps $(t, t - 1, \dots, t - L)$, forming a feature sequence $F_{t}^{(m)}, F_{t - 1}^{(m)}, \dots, F_{t - L}^{(m)}$ that incorporates historical temporal information. Subsequently, we form a new input feature tensor from the temporal features and global spatiotemporal features, which, together with the hidden state $h_{n}$ from the previous time step, is fed into the GRU to update the hidden state. The updated hidden state serves as the final feature representation for the current time step and is followed by a fully connected layer that generates the prediction tensor $x_{n}$, which is appended to the prediction sequence pm25_pred, thereby producing the predicted PM_2.5 concentrations for all cities over the next $M$ time steps.
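The graph convolution stage can be illustrated with a small sketch that builds the symmetric normalized Laplacian $D^{-1/2}(D - A)D^{-1/2}$ and applies two layers of the form $\mathrm{ReLU}(\hat{A} F W)$. Several simplifications are assumed: the adjacency matrix is binary rather than derived from Vincenty distances, the weights are fixed toy values rather than learned, and the GRU and fully connected stages are omitted.

```python
import math


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_laplacian(adj):
    """D^{-1/2} (D - A) D^{-1/2} for a symmetric adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [
        [
            inv_sqrt[i] * ((deg[i] if i == j else 0.0) - adj[i][j]) * inv_sqrt[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def gcn_layer(a_hat, features, weight):
    """One graph convolution layer: ReLU(A_hat F W)."""
    aggregated = matmul(a_hat, features)    # mix neighbor features, same width
    projected = matmul(aggregated, weight)  # change the feature dimension
    return [[max(0.0, v) for v in row] for row in projected]


# Toy graph: three cities on a line (0 - 1 - 2); binary adjacency is an assumption.
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
a_hat = normalized_laplacian(adj)

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N x 2 node features at time t
w0 = [[0.5, -0.5, 1.0], [1.0, 0.5, -1.0]]         # 2 x 3 toy weights
w1 = [[1.0], [0.5], [-0.5]]                       # 3 x 1 toy weights

hidden = gcn_layer(a_hat, features, w0)  # N x 3
output = gcn_layer(a_hat, hidden, w1)    # N x 1
```

Multiplying by `a_hat` keeps the feature width unchanged, while the non-square weight matrices change it, which corresponds to the two roles described in aspect (1) above. In the full model, the same layers would be applied to each of the past time steps before the sequence is passed to the GRU.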

4. Experimental Setting and Results Analysis

4.1. Experimental Setting

Dataset: The KnowAir dataset [40] contains air quality monitoring data spanning four years, from 1 January 2015 to 31 December 2018, covering 184 cities (nodes) across China. To evaluate the effectiveness of our model in multi-city PM_2.5 concentration prediction, we use historical data from the past 3 h to forecast PM_2.5 trends for the next 72 h across all cities.

To assess the performance of our model in different scenarios, we divide the dataset into datasets 1, 2, and 3, as detailed in Table 2.

Dataset 1: The entire dataset is split into training, validation, and testing sets with a ratio of 2:1:1 to evaluate overall air quality prediction performance.

Dataset 2: This subset focuses on winter high-pollution periods and is partitioned equally (1:1:1). Due to increased emissions from winter heating combined with frequent northerly or northwesterly winds, PM_2.5 emissions and long-range transport become more severe and prediction becomes more challenging, so this subset evaluates adaptability under extreme pollution conditions.

Dataset 3: This subset predicts PM_2.5 concentrations of the subsequent month by using the first three months, with a split ratio of 3:1:1, which mainly serves to evaluate the model performance on long-term trend forecasting [45].

By employing these dataset partitions, we are able to comprehensively evaluate the adaptability and generalization ability of the model across diverse prediction scenarios.

Experimental Settings: We perform hyperparameter tuning using grid search and refer to the initialization strategies of baseline models. The Adam optimizer is used to adaptively adjust the learning rate based on gradient magnitudes, improving the stability and convergence speed of training. The batch size is set to 32, and the number of epochs is 50. The learning rate is set to 0.005, and the weight decay is 0.0001. To prevent overfitting and improve training efficiency, an early stopping mechanism is introduced to monitor performance on the validation set. Specifically, training is terminated early if the validation performance does not improve for 10 consecutive epochs, thereby avoiding unnecessary training and saving computational resources. Additionally, the Mean Squared Error (MSE) is employed as the loss function to quantify the difference between predicted and actual PM_2.5 concentrations, serving as the optimization objective during model training.
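The early stopping rule with the stated settings (at most 50 epochs, patience of 10) can be sketched as follows. The function name is hypothetical, and the list of validation losses is made up to stand in for a real training loop; only the stopping logic is illustrated.

```python
def early_stopping_epoch(val_losses, max_epochs=50, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on the best value
    for `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs) - 1, best_epoch


# Made-up validation losses: improvement for three epochs, then a plateau.
losses = [5.0, 4.0, 3.0] + [3.5] * 20
stop_epoch, best_epoch = early_stopping_epoch(losses)
print(stop_epoch, best_epoch)
```

In practice the model weights from `best_epoch` would typically be restored before testing, although the source does not state whether this is done.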

Evaluation Metrics: To comprehensively evaluate the prediction accuracy and fitting performance, this paper utilizes the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²) [46] as evaluation metrics, which are defined as in Equation (14):

$$\begin{array}{l} \mathrm{RMSE} = \sqrt{\frac{\sum_{j = 1}^{n} (X_{j} - \hat{X}_{j})^{2}}{n}} \\ \mathrm{MAE} = \frac{\sum_{j = 1}^{n} |X_{j} - \hat{X}_{j}|}{n} \\ R^{2} = 1 - \frac{\sum_{j = 1}^{n} (X_{j} - \hat{X}_{j})^{2}}{\sum_{j = 1}^{n} (X_{j} - \bar{X})^{2}} \end{array} \tag{14}$$

where $n$ is the number of samples, $X_{j}$ and $\hat{X}_{j}$ denote the actual and predicted PM_2.5 concentrations of the $j$-th sample, respectively, and $\bar{X}$ represents the mean PM_2.5 concentration across all samples.
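A short worked example of these three metrics, using the standard definitions of RMSE, MAE, and R² and made-up values chosen so the arithmetic is easy to follow by hand:

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up toy values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 3.0, 3.0]

print(rmse(actual, predicted))      # sqrt(2 / 4)
print(mae(actual, predicted))       # 2 / 4
print(r2_score(actual, predicted))  # 1 - 2 / 5
```

Here the residual sum of squares is 2 and the total sum of squares around the mean 2.5 is 5, so R² is 0.6; a perfect prediction would give RMSE and MAE of 0 and R² of 1.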

Further, we use the Critical Success Index (CSI) and False Alarm Rate (FAR) [33] to comprehensively evaluate model performance under pollution threshold conditions in air quality prediction tasks. The CSI measures the accuracy in predicting rare pollution events (e.g., high PM_2.5 concentrations), with higher values indicating better detection capability. The FAR assesses the frequency of incorrect pollution event predictions, where lower values suggest more reliable performance in forecasting such events. Specifically, we binarize the predicted and observed PM_2.5 concentrations into 0/1 according to whether they exceed the PM_2.5 threshold of 75 μg/m³, which is specified in the ambient air quality standards of China [40] as the criterion for good air quality.
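The thresholded metrics can be sketched from a contingency table of hits, misses, and false alarms. The source does not spell out the formulas, so this example assumes the definitions commonly paired in forecast verification: CSI = hits / (hits + misses + false alarms) and FAR = false alarms / (hits + false alarms). The function name and the concentration values are made up for illustration.

```python
def csi_and_far(observed, predicted, threshold=75.0):
    """Binarize at `threshold` and return (CSI, FAR).

    Assumed definitions:
      CSI = hits / (hits + misses + false_alarms)
      FAR = false_alarms / (hits + false_alarms)
    """
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1
        elif obs_event and not pred_event:
            misses += 1
        elif pred_event and not obs_event:
            false_alarms += 1
    csi_den = hits + misses + false_alarms
    far_den = hits + false_alarms
    csi = hits / csi_den if csi_den else 0.0
    far = false_alarms / far_den if far_den else 0.0
    return csi, far


# Made-up concentrations in ug/m^3 for illustration only.
observed = [80.0, 60.0, 90.0, 30.0, 76.0]
predicted = [85.0, 78.0, 70.0, 20.0, 100.0]

csi, far = csi_and_far(observed, predicted)
print(csi, far)
```

In this toy case there are two hits, one miss, and one false alarm, so CSI is 2/4 and FAR is 1/3. Correct rejections (both values below the threshold) do not enter either metric, which is why these scores suit rare pollution events.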

4.2. Experimental Results and Analysis

4.2.1. Comparison Evaluation with Different Models

To comprehensively evaluate the effectiveness of the proposed model, we compare it against several types of baseline models, including single-sequence forecasting models, hybrid deep learning models, and deep learning models optimized with swarm intelligence. Specifically, MLP, LSTM [47], and GRU [48] represent single-sequence forecasting models; GC-LSTM [49] and PM_2.5-GNN [40] fall under hybrid deep learning models; and GWO-GART [33] is a deep learning model enhanced through swarm intelligence optimization. The EGCFC [34] method combines a hybrid graph convolutional GRU with a self-attention network. Tables 3, 4, and 5 show the performance comparison between the proposed model and the baseline methods on datasets 1, 2, and 3, respectively, with each value representing the average result over 10 independent runs.

Tables 3, 4, and 5 show that the traditional models (MLP, LSTM, and GRU) generally underperform the more advanced architectures in prediction accuracy: the MLP shows the weakest performance, while the LSTM and GRU are slightly better at capturing long-term dependencies within time series. Because PM_2.5 concentrations are influenced by multiple complex factors, single neural network models exhibit limited expressive power. In contrast, the hybrid methods (GAT-GRU, GC-LSTM, and PM_2.5-GNN) capture both spatial and temporal dependencies, which generally improves prediction performance.

In particular, PM_2.5-GNN enhances the interactions among nodes, improving predictive accuracy over GC-LSTM. GWO-GART integrates the GWO algorithm and surpasses PM_2.5-GNN on most metrics, demonstrating the benefit of swarm intelligence in model optimization. Moreover, the EGCFC model strengthens the capacity to capture long-term spatiotemporal dependencies, outperforming the above methods. Our proposed model further improves the ability to capture complex spatiotemporal patterns in long-term PM_2.5 forecasting. Specifically, the MS-iTransformer module captures long-range temporal trends specific to each monitoring site through station-wise temporal encoding and multi-station self-attention, improving robustness and accuracy over extended periods; the BSTA module introduces a bilinear attention mechanism that dynamically constructs spatiotemporal dependency graphs, enabling the model to adaptively learn evolving inter-city correlations across time; and the GESM leverages graph convolution and recurrent modeling to extract fine-grained local spatial and temporal dependencies, enabling precise modeling of localized air pollution dynamics. Therefore, our proposed method jointly captures long-term temporal trends, dynamic global spatial dependencies, and localized spatiotemporal interactions, which yields better multi-city PM_2.5 forecasting performance than the existing methods.

Specifically, compared to the EGCFC model on dataset 1 (Table 3), the MAE and RMSE of the proposed method are decreased by 0.7537 and 0.7751, respectively; the CSI and R² are improved by 0.0071 and 0.0159, respectively; and the FAR is decreased by 0.0272. On dataset 2 (Table 4), the MAE and RMSE are decreased by 1.0849 and 1.0547, respectively; R² is improved by 0.0188; and the FAR is decreased by 0.0440. On dataset 3 (Table 5), the MAE and RMSE are decreased by 1.7665 and 1.8578, respectively; the CSI and R² are improved by 0.0194 and 0.0260, respectively; and the FAR is decreased by 0.0312. These results demonstrate the robustness, adaptability, and generalization capability of the proposed model under diverse forecasting scenarios.
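The differences quoted above follow directly from the mean values in Tables 3, 4, and 5. The short sketch below reproduces the arithmetic; the dictionary layout and the helper name `improvement` are illustrative choices, while the numbers are copied from the EGCFC and "ours" rows of the tables.

```python
# Mean values of EGCFC and the proposed model, copied from Tables 3-5.
RESULTS = {
    "dataset 1": {
        "EGCFC": {"MAE": 14.8373, "RMSE": 19.0834, "CSI": 0.4959, "FAR": 0.2567, "R2": 0.6168},
        "ours": {"MAE": 14.0836, "RMSE": 18.3083, "CSI": 0.5030, "FAR": 0.2295, "R2": 0.6327},
    },
    "dataset 2": {
        "EGCFC": {"MAE": 23.9113, "RMSE": 30.3868, "CSI": 0.5493, "FAR": 0.2650, "R2": 0.5314},
        "ours": {"MAE": 22.8264, "RMSE": 29.3321, "CSI": 0.5396, "FAR": 0.2210, "R2": 0.5502},
    },
    "dataset 3": {
        "EGCFC": {"MAE": 32.9, "RMSE": 41.17, "CSI": 0.6338, "FAR": 0.2635, "R2": 0.5467},
        "ours": {"MAE": 31.1335, "RMSE": 39.3122, "CSI": 0.6532, "FAR": 0.2323, "R2": 0.5727},
    },
}

# For error-type metrics a decrease is an improvement; for CSI and R2 an increase is.
LOWER_IS_BETTER = {"MAE", "RMSE", "FAR"}


def improvement(dataset, metric):
    """Gain of the proposed model over EGCFC; positive means the proposed model is better."""
    base = RESULTS[dataset]["EGCFC"][metric]
    ours = RESULTS[dataset]["ours"][metric]
    gain = base - ours if metric in LOWER_IS_BETTER else ours - base
    return round(gain, 4)
```

A negative return value means EGCFC is better on that metric, which is the case for the CSI on dataset 2 (0.5396 vs. 0.5493 in Table 4).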

4.2.2. Comparison Forecast Performance in a Representative City

To validate the effectiveness of our proposed model at the city level, Figure 8 compares the predicted and actual PM_2.5 concentrations on dataset 3 across different models. We select Xianyang city for this comparison because it presents multiple pollution sources and significant PM_2.5 fluctuations.

It can be seen from Figure 8 that our model demonstrates higher fitting accuracy compared to other methods, with its prediction curve closely aligning with the observed values. Notably, our model maintains strong predictive performance during periods of high PM_2.5 concentrations. These results indicate that our model outperforms others in long-term forecasting tasks, particularly under complex pollution conditions.

To further validate the effectiveness of our proposed method, we evaluate the correlation between the predicted and actual PM_2.5 concentrations. Figure 9 presents the scatter plot of predictions for Xianyang city, where the solid line is the linear regression fit and the dashed line is the reference line. Our model demonstrates a significantly higher correlation between predicted and observed values compared to other models. The predicted points and the fitted regression line are notably closer to the reference line, accurately capturing the variation patterns of PM_2.5 concentrations in long-term forecasting tasks. Particularly, our model consistently outperforms others under high PM_2.5 concentration scenarios. In contrast, the predictions of other models show larger deviations from the actual values, highlighting the superior accuracy of our proposed model even under complex environmental conditions.

Further, we also compare prediction performance in Yanan, which is located in a sparse monitoring region, as shown in Figure 10. To elaborate, the sparse spatial layout around Yanan (Figure 1) limits the available contextual information from neighboring stations, thereby weakening the spatial dependencies that can be effectively captured by the model. In addition, the observed PM_2.5 series in Yanan exhibits strong fluctuations and irregular patterns, as shown in Figure 10, where the MLP model yields a low R² value of 0.1062. This suggests that unstable and noisy temporal dynamics further hinder predictive accuracy. These findings highlight that both limited spatial connectivity and temporal instability can negatively affect model performance. Nevertheless, our model demonstrates robustness under such adverse conditions, outperforming baseline methods in both stable and unstable environments.

The results reveal that station density and data stability both impact predictive performance. However, our model consistently maintains superior results in both scenarios. Consequently, the model generalizes well to both densely and sparsely instrumented areas, as evidenced by the strong correlation and low error metrics achieved in both cases.

Nevertheless, we also recognize potential limitations. The current model is primarily trained and tested on PM_2.5 data. While the architecture is flexible and can be extended to other pollutants (e.g., NO₂ and O₃), this transfer may require retraining and feature re-selection to account for different physical and chemical characteristics.

4.2.3. Comparison of Our Model and Existing Methods at Multiple Time Steps

Long-term prediction typically extends beyond 48 h [33]. Because meteorological influences and pollution dynamics accumulate over multiple days, we use 60 h and 72 h forecasts as representative examples of long-term forecasting. Table 6 presents the evaluation results of MAE and RMSE on dataset 3. The proposed model significantly outperforms the other methods in both metrics. For example, compared to the EGCFC method for the 60 h and 72 h forecasting tasks, the MAE is reduced by 1.7849 and 2.4503, respectively, and the RMSE is reduced by 2.5580 and 1.8578, respectively. These results demonstrate that our method improves prediction accuracy over the other models, which we attribute to the explicit modeling of dynamic spatiotemporal relationships: it enhances the ability to track evolving pollutant transport patterns rather than relying solely on short-term autoregressive signals.

Our method also demonstrates marked advantages in detecting and handling extreme pollution events. Table 7 presents the evaluation results of the CSI and FAR metrics on dataset 3. Our model achieves the best CSI among all compared methods, with particularly notable improvements in mid-term to long-term forecasting. Specifically, the CSI is improved by 0.0154 in the 60 h forecast and by 0.0194 in the 72 h forecast, suggesting a higher true positive rate in predicting pollution exceedances. These results demonstrate the superior performance of our proposed model in long-term prediction tasks, which is attributed to its precise modeling of complex inter-city dependencies across both spatial and temporal dimensions. For the FAR metric, our proposed model performs comparably to EGCFC in short-term predictions (3–24 h) but exhibits significant advantages in mid-term to long-term predictions (24–72 h), where the FAR is decreased by 0.0321 at 60 h and by 0.0312 at 72 h, meaning fewer false alarms in severe cases. This robustness under extreme conditions reflects the effectiveness of our spatial edge design, particularly the advection coefficient, which encodes physically plausible pollution transport based on wind direction and city-to-city proximity. This allows the model to anticipate pollutant incursions even when the local conditions alone do not strongly indicate a pollution rise.

Table 8 presents the comparison results of the R² metric on dataset 3. As with the FAR metric, our proposed model performs comparably to EGCFC in short-term predictions (3–24 h). However, the proposed method demonstrates a significant advantage in mid-term to long-term predictions (24–72 h), with R² improvements of 0.0044 and 0.0260 at 60 and 72 h, respectively, indicating that our model better captures the underlying variance in PM_2.5 concentrations over extended horizons. This advantage is closely related to our model's ability to represent both short-range and long-range pollutant transport patterns, which become increasingly important in long-term forecasting. Baseline models often rely on temporal continuity or local correlations, whose predictive power diminishes beyond 24 h. In contrast, our approach leverages both physically grounded features (e.g., boundary layer height, K-index, and wind components) and adaptive graph structures based on advection coefficients, enabling the model to simulate evolving pollution dynamics across city networks. This facilitates more accurate extrapolation beyond immediate historical data. In addition, the spatiotemporal attention mechanisms embedded in our architecture allow for flexible reweighting of relevant historical and spatial information, dynamically adapting to different atmospheric and emission regimes. This ensures that critical dependencies, such as long-range transport during stagnant weather or delayed cross-city pollution drift, are not overlooked in multi-step forecasting.

4.2.4. Comparison of Model Runtime and Complexity

To evaluate the computational efficiency of our proposed model, we report both runtime (in seconds) and complexity, as shown in Table 9. All models are tested under the same experimental environment to ensure fairness. Table 9 shows that although simple models like MLP, GRU, and LSTM have very low runtime and computational complexity, they also show significantly lower prediction accuracy (as discussed in Section 4.2.1). Our model, while more complex than traditional RNN-based models (e.g., GRU and LSTM), demonstrates competitive computational efficiency when compared to other graph-based models, such as PM_2.5-GNN, EGCFC, and especially GWO-GART. Specifically, our model achieves a runtime of 1001.20 s with 52.008 G FLOPs and only 0.090 M parameters, which is about 11% faster than EGCFC (1125.36 s) and substantially more efficient than GWO-GART. Compared to PM_2.5-GNN, our model has slightly higher FLOPs (52.008 G vs. 51.330 G) and a longer runtime, primarily due to the richer modeling components and edge feature mechanisms. However, it remains well within a practical range for real-world forecasting scenarios. These results demonstrate that the proposed method provides a favorable trade-off between model expressiveness and computational feasibility, making it suitable for large-scale spatiotemporal applications such as forecasting air pollution across an urban network of 184 cities.
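As a quick check of the relative figures quoted above, the runtime saving over EGCFC and the FLOPs overhead relative to PM_2.5-GNN can be worked out from the values stated in the text (rounded to three decimals):

```latex
\begin{aligned}
\text{runtime saving vs. EGCFC} &= \frac{1125.36 - 1001.20}{1125.36} = \frac{124.16}{1125.36} \approx 0.110 \quad (\approx 11\%) \\
\text{FLOPs overhead vs. PM}_{2.5}\text{-GNN} &= \frac{52.008 - 51.330}{51.330} = \frac{0.678}{51.330} \approx 0.013 \quad (\approx 1.3\%)
\end{aligned}
```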

4.3. Ablation Study

We conduct ablation experiments on datasets 1, 2, and 3, as shown in Table 10, to evaluate the contribution of the key modules in our proposed model. Specifically, we design the following ablation variants:

w/o GESM: This variant removes the GESM, which includes graph convolution and recurrent units used to capture localized spatial and short-term temporal dependencies. Compared with this variant, the full model achieves significant improvements in the MAE and RMSE on dataset 1, with R² and CSI increasing by 0.0024 and 0.0012, respectively, and the FAR decreasing by 0.0097. Table 10 further shows that the improvement of the full model over this variant is more notable on datasets 2 and 3. Many baselines either model spatial and temporal dependencies separately or overlook localized spatiotemporal patterns (e.g., city-specific short-term meteorological events). The GESM integrates graph convolution with gated recurrent mechanisms to jointly model short-term dynamics and localized structural dependencies. This is particularly effective in scenarios where a city experiences a sudden weather shift or emission spike that does not immediately propagate to others. The GESM ensures that such localized signals are captured and learned effectively, reducing false alarms and enhancing response sensitivity.

w/o MS-iTransformer: This variant removes the MS-iTransformer module, which is designed to capture long-term temporal trends from the historical data of individual stations. Compared with this variant, the full model achieves significant improvements in the MAE and RMSE on dataset 3, with R² and CSI increasing by 0.0227 and 0.0144, respectively, and the FAR decreasing by 0.0147. Further, the improvement of the full model over this variant is more notable on datasets 1 and 2. Traditional time-series models (e.g., LSTM and GRU) focus primarily on short-term or fixed-length temporal windows. These methods struggle to learn long-term pollutant accumulation trends or delayed meteorological influences (e.g., distant precipitation or multi-day wind patterns) [30]. The MS-iTransformer overcomes this by leveraging a station-specific transformer structure, which enables the model to extract long-range temporal dependencies without being constrained by fixed memory lengths. This is particularly beneficial for recognizing multi-day pollution buildup or lagged meteorological effects, which often influence air quality trends beyond 48 h.

Notably, the proposed model achieves the best performance across all datasets, with the MAE and RMSE reduced by 4.0328 and 4.6769, the CSI increased by 0.0364, the FAR decreased by 0.0740, and R² improved by 0.0630 compared to the baseline model on dataset 3. The ablation study results demonstrate that the spatial feature and global feature modules are effective in enhancing long-term prediction performance.

5. Conclusions

This paper proposes a novel dynamic global–local spatiotemporal graph framework for long-term PM_2.5 forecasting across multiple cities. Specifically, the MS-iTransformer module captures station-specific long-term temporal trends, the BSTA module dynamically models inter-city spatiotemporal dependencies, and the GESM learns fine-grained local interactions. By jointly modeling long-range temporal patterns, global spatial dynamics, and localized spatiotemporal relationships, the proposed model demonstrates superior performance in multi-step PM_2.5 prediction, showing significant improvements over existing methods on real-world multi-city air quality datasets. On dataset 3, compared with EGCFC, the MAE, RMSE, and FAR are decreased by 1.7665, 1.8578, and 0.0312, respectively, and the CSI and R² are improved by 0.0194 and 0.0260, respectively.

Author Contributions

Conceptualization, R.W.; methodology, Y.H.; validation, Y.H.; formula derivation, Y.H., and R.W.; writing—original draft preparation, Y.H.; writing—review and editing, Y.H., Y.X., and S.F.; visualization, X.Z.; funding acquisition, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (grant numbers 61771299 and 62071286).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We thank the Ministry of Ecology and Environment of China for providing the PM_2.5 concentration data for each city, and the ERA5 atmospheric reanalysis project for offering the meteorological and environmental indicators used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Feng, Y.; Castro, E.; Wei, Y.; Jin, T.; Qiu, X.; Dominici, F.; Schwartz, J. Long-term exposure to ambient PM_2.5, particulate constituents and hospital admissions from non-respiratory infection. Nat. Commun. 2024, 15, 1518.
2. Lim, S.; Bassey, E.; Bos, B.; Makacha, L.; Varaden, D.; Arku, R.E.; Baumgartner, J.; Brauer, M.; Ezzati, M.; Kelly, F.J.; et al. Comparing human exposure to fine particulate matter in low and high-income countries: A systematic review of studies measuring personal PM_2.5 exposure. Sci. Total Environ. 2022, 833, 155207.
3. Han, D.; Guo, Y.; Wang, J.; Zhao, B. Global disparities in indoor wildfire-PM_2.5 exposure and mitigation costs. Sci. Adv. 2025, 11, eads4360.
4. Bae, M.; Kang, Y.; Kim, E.; Kim, S.; Kim, S. A multifaceted approach to explain short-and long-term PM_2.5 concentration changes in Northeast Asia in the month of January during 2016–2021. Sci. Total Environ. 2023, 880, 163309.
5. Wei, J.; Wang, J.; Li, Z.; Kondragunta, S.; Anenberg, S.; Wang, Y.; Zhang, H.; Diner, D.; Hand, J.; Lyapustin, A.; et al. Long-term mortality burden trends attributed to black carbon and PM_2.5 from wildfire emissions across the continental USA from 2000 to 2020: A deep learning modelling study. Lancet Planet. Health 2023, 7, e963–e975.
6. Lin, M.D.; Liu, P.Y.; Huang, C.W.; Lin, Y.H. The application of strategy based on LSTM for the short-term prediction of PM_2.5 in city. Sci. Total Environ. 2024, 906, 167892.
7. Zhu, S.; Tang, J.; Zhou, X.; Li, P.; Liu, Z.; Zhang, C.; Zou, Z.; Li, T.; Peng, C. Research progress, challenges, and prospects of PM_2.5 concentration estimation using satellite data. Environ. Rev. 2023, 31, 605–631.
8. Ma, Z.; Dey, S.; Christopher, S.; Liu, R.; Bi, J.; Balyan, P.; Liu, Y. A review of statistical methods used for developing large-scale and long-term PM_2.5 models from satellite data. Remote Sens. Environ. 2022, 269, 112827.
9. Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 2001, 5, 2.
10. Graves, A.; Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; MIT Press: Cambridge, MA, USA, 2012; pp. 37–45.
11. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555.
12. Waqas, M.; Humphries, U.W. A critical review of RNN and LSTM variants in hydrological time series predictions. MethodsX 2024, 13, 102946.
13. Zhou, H.; Li, J.; Zhang, S.; Zhang, S.; Yan, M.; Xiong, H. Expanding the prediction capacity in long sequence time-series forecasting. Artif. Intell. 2023, 318, 103886.
14. Zhu, J.; Deng, F.; Zhao, J.; Zheng, H. Attention-based parallel networks (APNet) for PM_2.5 spatiotemporal prediction. Sci. Total Environ. 2021, 769, 145082.
15. Fang, S.; Li, Q.; Karimian, H.; Liu, H.; Mo, Y. DESA: A novel hybrid decomposing-ensemble and spatiotemporal attention model for PM_2.5 forecasting. Environ. Sci. Pollut. Res. 2022, 29, 54150–54166.
16. Wen, C.; Liu, S.; Yao, X.; Peng, L.; Li, X.; Hu, Y.; Chi, T. A novel spatiotemporal convolutional long short-term neural network for air pollution prediction. Sci. Total Environ. 2019, 654, 1091–1099.
17. Zhang, K.; Yang, X.; Cao, H.; Thé, J.; Tan, Z.; Yu, H. Multi-step forecast of PM_2.5 and PM10 concentrations using convolutional neural network integrated with spatial–temporal attention and residual learning. Environ. Int. 2023, 171, 107691.
18. Teutscher, D.; Bukreev, F.; Kummerländer, A.; Simonis, S.; Bächler, P.; Rezaee, A.; Hermansdorfer, M.; Krause, M.J. A digital urban twin enabling interactive pollution predictions and enhanced planning. Build. Environ. 2025, 281, 113093.
19. Zhang, D.; Martin, R.V.; Bindle, L.; Li, C.; Eastham, S.D.; van Donkelaar, A.; Gallardo, L. Advances in simulating the global spatial heterogeneity of air quality and source sector contributions: Insights into the global South. Environ. Sci. Technol. 2023, 57, 6955–6964.
20. Chen, X.; Zhang, Y.; Wang, Y.; Zhang, L.; Yi, Z.; Zhang, H.; Mathiopoulos, P.T. A spatiotemporal interpolation graph convolutional network for estimating PM_2.5 concentrations based on urban functional zones. IEEE Trans. Geosci. Remote Sens. 2022, 61, 1–14.
21. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
22. Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10466–10485.
23. Peng, H.; Wang, H.; Du, B.; Bhuiyan, M.Z.A.; Ma, H.; Liu, J.; Wang, L.; Yang, Z.; Du, L.; Wang, S.; et al. Spatial temporal incidence dynamic graph neural networks for traffic flow forecasting. Inf. Sci. 2020, 521, 277–290.
24. Li, Y.; Liang, W.; Peng, L.; Zhang, D.; Yang, C.; Li, K.C. Predicting drug-target interactions via dual-stream graph neural network. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 21, 948–958.
25. Zhang, Y.; Hu, Y.; Han, N.; Yang, A.; Liu, X.; Cai, H. A survey of drug-target interaction and affinity prediction methods via graph neural networks. Comput. Biol. Med. 2023, 163, 107136.
26. Kumar, S.; Mallik, A.; Khetarpal, A.; Panda, B.S. Influence maximization in social networks using graph embedding and graph neural network. Inf. Sci. 2022, 607, 1617–1636.
27. Sharma, K.; Lee, Y.C.; Nambi, S.; Salian, A.; Shah, S.; Kim, S.W.; Kumar, S. A survey of graph neural networks for social recommender systems. ACM Comput. Surv. 2024, 56, 1–34.
28. Chen, C.; Wu, Y.; Dai, Q.; Zhou, H.Y.; Xu, M.; Yang, S.; Han, X.; Yu, Y. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10297–10318.
29. Klepl, D.; Wu, M.; He, F. Graph neural network-based eeg classification: A survey. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 493–503.
30. Ye, Y.; Cao, Y.; Dong, Y.; Yan, H. A Graph Neural Network and Transformer-based model for PM_2.5 prediction through spatiotemporal correlation. Environ. Model. Softw. 2025, 191, 106501.
31. Chen, Y.; Wu, Y.; Zhang, S.; Yuan, K.; Huang, J.; Shi, D.; Hu, S. Regional PM_2.5 prediction with hybrid directed graph neural networks and Spatio-temporal fusion of meteorological factors. Environ. Pollut. 2025, 366, 125404.
32. Chang-Silva, R.; Tariq, S.; Loy-Benitez, J.; Yoo, C. Smart solutions for urban health risk assessment: A PM_2.5 monitoring system incorporating spatiotemporal long-short term graph convolutional network. Chemosphere 2023, 335, 139071.
33. Zhang, C.; Wang, S.; Wu, Y.; Zhu, X.; Shen, W. A long-term prediction method for PM_2.5 concentration based on spatiotemporal graph attention recurrent neural network and grey wolf optimization algorithm. J. Environ. Chem. Eng. 2024, 12, 111716.
34. Zhang, C.; Li, X.; Sheng, H.; Shen, Y.; Xie, W.; Zhu, X. Long-term prediction method for PM_2.5 concentration using edge channel graph attention network and gating closed-form continuous-time neural networks. Process Saf. Environ. Prot. 2024, 189, 356–373.
35. Zhao, G.; Yang, X.; Shi, J.; He, H.; Wang, Q. A PM_2.5 spatiotemporal prediction model based on mixed graph convolutional GRU and self-attention network. Environ. Pollut. 2025, 368, 125748.
36. Zheng, W.; Hu, J. Multivariate time series prediction based on temporal change information learning method. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 7034–7048.
37. Chen, X.; Hu, Y.; Dong, F.; Chen, K.; Xia, H. A multi-graph spatial-temporal attention network for air-quality prediction. Process Saf. Environ. Prot. 2024, 181, 442–451.
38. Hu, W.; Zhang, Z.; Zhang, S.; Chen, C.; Yuan, J.; Yao, J.; Zhao, S.; Guo, L. Learning spatiotemporal dependencies using adaptive hierarchical graph convolutional neural network for air quality prediction. J. Clean. Prod. 2024, 459, 142541.
39. Liu, X.; Chang, M.; Zhang, J.; Wang, J.; Gao, H.; Gao, Y.; Yao, X. Rethinking the causes of extreme heavy winter PM_2.5 pollution events in northern China. Sci. Total Environ. 2021, 794, 148637.
40. Wang, S.; Li, Y.; Zhang, J.; Meng, Q.; Meng, L.; Gao, F. PM_2.5-GNN: A domain knowledge enhanced graph neural network for PM_2.5 forecasting. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2020; pp. 163–166.
41. Pang, N.; Gao, J.; Che, F.; Ma, T.; Liu, S.; Yang, Y.; Zhao, P.; Yuan, J.; Liu, J.; Xu, Z.; et al. Cause of PM_2.5 pollution during the 2016-2017 heating season in Beijing, Tianjin, and Langfang, China. J. Environ. Sci. 2020, 95, 201–209.
42. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. In Proceedings of the NIPS′17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
43. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
44. Kipf, T.N. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
45. Abuouelezz, W.; Ali, N.; Aung, Z.; Altunaiji, A.; Shah, S.B.; Gliddon, D. Exploring PM_2.5 and PM10 ML forecasting models: A comparative study in the UAE. Sci. Rep. 2025, 15, 9797.
46. Chen, M.H.; Chen, Y.C.; Chou, T.Y.; Ning, F.S. PM_2.5 concentration prediction model: A CNN–RF ensemble framework. Int. J. Environ. Res. Public Health 2023, 20, 4077.
47. Van Houdt, G.; Mosquera, C.; Nápoles, G. A review on the long short-term memory model. Artif. Intell. Rev. 2020, 53, 5929–5955.
48. Weerakody, P.B.; Wong, K.W.; Wang, G.; Ela, W. A review of irregular time series data handling with gated recurrent neural networks. Neurocomputing 2021, 441, 161–178.
49. Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A hybrid model for spatiotemporal forecasting of PM_2.5 based on graph convolutional neural network and long short-term memory. Sci. Total Environ. 2019, 664, 1–10.

Figure 1. The 184 urban areas included in this study, the pentagram denotes the background.

Figure 2. Feature analysis based on random forest.

Figure 3. Feature analysis based on Pearson correlation coefficient.

Figure 4. The proposed overall framework.

Figure 5. MS-iTransformer module.

Figure 6. The structure of BSTA.

Figure 7. GESM.

Figure 8. Comparison of predicted and observed PM_2.5 concentrations in Xianyang. Panels (a–h) correspond to MLP, LSTM, GRU, GC-LSTM, PM_2.5-GNN, GWO-GART, EGCFC, and our proposed model.

Figure 9. Comparison scatter plot of predicted and observed PM_2.5 concentrations in Xianyang.

Figure 10. Comparison of predicted and observed PM_2.5 concentrations in Yanan. Panels (a–h) correspond to MLP, LSTM, GRU, GC-LSTM, PM_2.5-GNN, GWO-GART, EGCFC, and our proposed model.

Table 1. Data attributes.

Nodes/Edges	Feature	Unit
Nodes	k_index	K
	2m_temperature	K
	surface_pressure	Pa
	total_precipitation	m
	boundary_layer_height	m
	relative_humidity+950	%
	u_component_of_wind+950	m/s
	v_component_of_wind+950	m/s
Edges	wind_speed_of_source_city ($w_{s}$)	km/h
	wind_direction_of_target_city ($w_{d}$)	°
	direction_from_source_city_to_target_city ($d r_{f}$)	°
	distance_between_source_city_and_target_city ($d_{b}$)	km
	advection_coefficient	%

Table 2. KnowAir dataset.

Dataset	Training Set	Validation Set	Test Set
Dataset 1	1 January 2015–31 December 2016	31 December 2016–31 December 2017	31 December 2017–31 December 2018
Dataset 2	1 November 2015–28 February 2016	1 November 2016–28 February 2017	1 November 2017–28 February 2018
Dataset 3	1 September 2016–30 November 2016	30 November 2016–30 December 2016	30 December 2016–31 January 2017

Table 3. The performance comparison between our model and baseline methods on dataset 1.

Dataset	Model	Train Loss	Validate Loss	Test Loss	MAE	RMSE	CSI	FAR	R²
1	MLP	0.5624 ±0.0085	0.5269 ±0.0088	0.5537 ±0.0092	18.1240 ±0.1940	22.6306 ±0.1950	0.4252 ±0.0047	0.3502 ±0.0145	0.4651 ±0.0089
	LSTM	0.4229 ±0.0037	0.4287 ±0.0017	0.4571 ±0.0024	16.1657 ±0.1289	20.4499 ±0.1110	0.4615 ±0.0037	0.3038 ±0.0079	0.5585 ±0.0023
	GRU	0.4286 ±0.0020	0.4266 ±0.0021	0.4512 ±0.0018	16.0839 ±0.1229	20.3553 ±0.1049	0.4653 ±0.0042	0.3029 ±0.0110	0.5642 ±0.0017
	GC-LSTM	0.4098 ±0.0030	0.4206 ±0.0016	0.4411 ±0.0031	15.9462 ±0.1132	20.1977 ±0.1063	0.4702 ±0.0046	0.3049 ±0.0138	0.5739 ±0.0030
	PM_2.5-GNN	0.3972 ±0.0042	0.3987 ±0.0035	0.4185 ±0.0042	15.4801 ±0.1412	19.6491 ±0.1355	0.4852 ±0.0036	0.2897 ±0.0114	0.5957 ±0.0041
	GWO-GART	0.3621 ±0.0043	0.4013 ±0.0038	0.4229 ±0.0041	15.3649 ±0.1534	19.6826 ±0.1415	0.4891 ±0.0046	0.2753 ±0.0155	0.5912 ±0.0039
	EGCFC	0.3384 ±0.0081	0.3858 ±0.0032	0.3997 ±0.0038	14.8373 ±0.0050	19.0834 ±0.1511	0.4959 ±0.0049	0.2567 ±0.0150	0.6168 ±0.0045
	ours	0.3567 ±0.0034	0.3601 ±0.0017	0.3756 ±0.0018	14.0836 ±0.0676	18.3083 ±0.0851	0.5030 ±0.0068	0.2295 ±0.0093	0.6327 ±0.0044

Table 4. The performance comparison between our model and baseline methods on dataset 2.

Dataset	Model	Train Loss	Validate Loss	Test Loss	MAE	RMSE	CSI	FAR	R²
2	MLP	0.6409 ±0.0066	0.6372 ±0.0083	0.6523 ±0.0096	28.4975 ±0.3455	35.1934 ±0.3549	0.4628 ±0.0116	0.3081 ±0.0116	0.3770 ±0.0092
	LSTM	0.4464 ±0.0140	0.5172 ±0.0065	0.5459 ±0.0107	25.8818 ±0.3199	32.3494 ±0.3496	0.5114 ±0.0090	0.2975 ±0.0076	0.4785 ±0.0102
	GRU	0.4584 ±0.0070	0.5068 ±0.0031	0.5333 ±0.0065	25.4581 ±0.2491	31.8953 ±0.2443	0.5142 ±0.0059	0.2958 ±0.0097	0.4906 ±0.0062
	GC-LSTM	0.4336 ±0.0102	0.5136 ±0.0055	0.5410 ±0.0098	25.7895 ±0.2607	32.2493 ±0.2876	0.5125 ±0.0065	0.2933 ±0.0082	0.4832 ±0.0093
	PM_2.5-GNN	0.4379 ±0.0079	0.4855 ±0.0032	0.5110 ±0.0044	24.9161 ±0.2012	31.2798 ±0.1915	0.5258 ±0.0052	0.2906 ±0.0086	0.5119 ±0.0042
	GWO-GART	0.4319 ±0.0068	0.4847 ±0.0031	0.4941 ±0.0042	24.1134 ±0.1944	30.5662 ±0.1871	0.5338 ±0.0053	0.2729 ±0.0083	0.5285 ±0.0043
	EGCFC	0.4005 ±0.0072	0.4787 ±0.0032	0.4912 ±0.0042	23.9113 ±0.1932	30.3868 ±0.1862	0.5493 ±0.0054	0.2650 ±0.0084	0.5314 ±0.0044
	ours	0.3477 ±0.0112	0.4429 ±0.0025	0.4542 ±0.0076	22.8264 ±0.3842	29.3321 ±0.3881	0.5396 ±0.0123	0.2210 ±0.0111	0.5502 ±0.0115

Table 5. The performance comparison between our model and baseline methods on dataset 3.

Dataset	Model	Train Loss	Validate Loss	Test Loss	MAE	RMSE	CSI	FAR	R²
3	MLP	0.6229 ±0.0101	0.7502 ±0.0171	0.5570 ±0.0108	38.1941 ±0.3776	46.4208 ±0.3766	0.5665 ±0.0050	0.3125 ±0.0094	0.4110 ±0.0114
	LSTM	0.4386 ±0.0060	0.5471 ±0.0066	0.4862 ±0.0124	36.3341 ±0.6214	44.3482 ±0.6369	0.6096 ±0.0038	0.3070 ±0.0054	0.4859 ±0.0131
	GRU	0.4600 ±0.0113	0.5525 ±0.0104	0.4717 ±0.0082	35.8335 ±0.3977	43.7062 ±0.4276	0.6105 ±0.0039	0.3091 ±0.0079	0.5012 ±0.0086
	GC-LSTM	0.4358 ±0.0068	0.5535 ±0.0124	0.4822 ±0.0100	36.2248 ±0.5390	44.2294 ±0.5000	0.6055 ±0.0040	0.3099 ±0.0073	0.4901 ±0.0106
	PM_2.5-GNN	0.4401 ±0.0081	0.5147 ±0.0086	0.4636 ±0.0128	35.1663 ±0.6300	42.9891 ±0.6633	0.6168 ±0.0031	0.3063 ±0.0077	0.5097 ±0.0135
	GWO-GART	0.4125 ±0.0076	0.5034 ±0.0084	0.4462 ±0.0123	34.3000 ±0.6151	42.5200 ±0.6570	0.6276 ±0.0032	0.2956 ±0.0074	0.5278 ±0.0139
	EGCFC	0.3945 ±0.0073	0.4913 ±0.0082	0.4311 ±0.0119	32.9000 ±0.5896	41.1700 ±0.6352	0.6338 ±0.0032	0.2635 ±0.0066	0.5467 ±0.0145
	ours	0.3690 ±0.0194	0.4585 ±0.0041	0.4322 ±0.0073	31.1335 ±0.5273	39.3122 ±0.5665	0.6532 ±0.0074	0.2323 ±0.0116	0.5727 ±0.0136
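
The headline gains quoted in the abstract (MAE and RMSE lower by 1.7665 and 1.8578, FAR lower by 0.0312, CSI and R² higher by 0.0194 and 0.0260) appear to correspond to the gap between our model and the strongest baseline, EGCFC, on dataset 3. The short sketch below reproduces that arithmetic from the mean values in Table 5. The helper name `gains` and the dictionary layout are illustrative choices, not part of the original work, and the ± spreads are ignored.

```python
# Mean values for dataset 3, copied from Table 5 (the ± spreads are not used here).
egcfc = {"MAE": 32.9, "RMSE": 41.17, "CSI": 0.6338, "FAR": 0.2635, "R2": 0.5467}
ours = {"MAE": 31.1335, "RMSE": 39.3122, "CSI": 0.6532, "FAR": 0.2323, "R2": 0.5727}

# Error-type metrics improve when they decrease; CSI and R2 improve when they increase.
LOWER_IS_BETTER = {"MAE", "RMSE", "FAR"}


def gains(baseline, model):
    """Return per-metric improvement of `model` over `baseline` (positive means better)."""
    out = {}
    for name, base in baseline.items():
        if name in LOWER_IS_BETTER:
            diff = base - model[name]
        else:
            diff = model[name] - base
        out[name] = round(diff, 4)
    return out


dataset3_gains = gains(egcfc, ours)
for metric, value in dataset3_gains.items():
    print(f"{metric}: {value:.4f}")
```

The same helper can be pointed at the dataset 1 or dataset 2 rows of Tables 3 and 4 to see how the margin over EGCFC changes across datasets.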

Table 6. MAE and RMSE of different models at different periods.

Model	Metric	+3 h	+6 h	+12 h	+24 h	+36 h	+48 h	+60 h	+72 h
MLP	MAE	11.1459	18.0483	25.8614	32.5212	35.6843	37.3319	37.4012	38.1941
MLP	RMSE	15.7626	23.7607	32.4327	39.9506	43.6865	45.7564	45.1916	46.4208
LSTM	MAE	10.0366	15.9801	22.7599	29.0874	32.6889	34.7104	34.8446	36.3341
LSTM	RMSE	14.1938	21.1434	28.8040	36.1058	40.4199	42.9563	42.3574	44.3482
GRU	MAE	10.1385	16.0489	22.6351	28.8163	32.4622	34.3892	34.7412	35.8335
GRU	RMSE	14.3379	21.2472	28.6710	35.7979	40.1031	42.5039	42.2465	43.7062
GC-LSTM	MAE	10.2688	16.0763	22.5396	28.7403	32.3440	34.6697	34.9810	36.2248
GC-LSTM	RMSE	14.5223	21.2557	28.5414	35.7349	40.0674	42.9681	42.5618	44.2294
PM_2.5-GNN	MAE	9.9502	15.7211	21.9645	27.9514	31.8948	33.6419	33.9457	35.1663
PM_2.5-GNN	RMSE	14.0718	20.8274	27.8935	34.8443	39.4647	41.7387	41.3908	42.9891
GWO-GART	MAE	9.7014	15.3325	21.4217	27.2566	31.1105	32.7984	33.1064	34.3000
GWO-GART	RMSE	13.7721	20.3798	27.2782	34.0619	38.5830	40.8396	41.5091	42.5200
EGCFC	MAE	9.5024	15.0137	20.9761	26.6936	30.4595	32.1280	32.4181	33.5838
EGCFC	RMSE	13.6028	20.1214	26.9477	33.6629	38.1245	40.3239	40.9876	41.1700
Ours	MAE	9.4753	14.8783	20.8946	26.3614	29.1230	31.0708	30.6332	31.1335
Ours	RMSE	13.4000	19.8016	26.7688	33.3437	36.9280	39.5794	38.4296	39.3122

Table 7. CSI and FAR of different models at different periods.

Model	Metric	+3 h	+6 h	+12 h	+24 h	+36 h	+48 h	+60 h	+72 h
MLP	CSI	0.8803	0.8071	0.7240	0.6516	0.6110	0.5922	0.5821	0.5665
MLP	FAR	0.0582	0.1042	0.1667	0.2304	0.2637	0.2856	0.3037	0.3125
LSTM	CSI	0.8914	0.8298	0.7628	0.6974	0.6579	0.6331	0.6252	0.6096
LSTM	FAR	0.0627	0.1054	0.1506	0.2048	0.2409	0.2665	0.2916	0.3070
GRU	CSI	0.8897	0.8270	0.7617	0.7006	0.6598	0.6374	0.6274	0.6105
GRU	FAR	0.0616	0.1038	0.1545	0.2089	0.2474	0.2715	0.2928	0.3091
GC-LSTM	CSI	0.8895	0.8272	0.7631	0.7001	0.6597	0.6318	0.6208	0.6055
GC-LSTM	FAR	0.0590	0.1030	0.1510	0.2038	0.2428	0.2709	0.2943	0.3099
PM_2.5-GNN	CSI	0.8917	0.8303	0.7692	0.7091	0.6692	0.6476	0.6322	0.6168
PM_2.5-GNN	FAR	0.0619	0.1084	0.1543	0.2034	0.2504	0.2674	0.2895	0.3063
GWO-GART	CSI	0.8920	0.8319	0.7702	0.7101	0.6716	0.6553	0.6411	0.6276
GWO-GART	FAR	0.0608	0.1073	0.1531	0.1914	0.2485	0.2511	0.2595	0.2956
EGCFC	CSI	0.8929	0.8332	0.7735	0.7100	0.6753	0.6623	0.6442	0.6338
EGCFC	FAR	0.0636	0.1043	0.1550	0.1814	0.2359	0.2393	0.2493	0.2635
Ours	CSI	0.8961	0.8399	0.7788	0.7232	0.6910	0.6641	0.6596	0.6532
Ours	FAR	0.0550	0.0886	0.1204	0.1647	0.1889	0.2031	0.2172	0.2323
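
For readers less familiar with the categorical scores in Table 7, the following is a minimal sketch of how CSI and FAR are conventionally computed from a contingency count of pollution events. It uses the standard definitions, CSI = hits / (hits + misses + false alarms) and FAR = false alarms / (hits + false alarms). The event threshold of 75 µg/m³, the function name `csi_far`, and the sample values are assumptions made for illustration only and are not taken from this section.

```python
def csi_far(observed, predicted, threshold=75.0):
    """Compute CSI and FAR for exceedance events (value > threshold).

    The default threshold is an assumed value for illustration, not one
    stated in this section.
    """
    hits = misses = false_alarms = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs > threshold
        pred_event = pred > threshold
        if obs_event and pred_event:
            hits += 1            # event observed and forecast
        elif obs_event:
            misses += 1          # event observed but not forecast
        elif pred_event:
            false_alarms += 1    # event forecast but not observed
    csi_den = hits + misses + false_alarms
    far_den = hits + false_alarms
    csi = hits / csi_den if csi_den else float("nan")
    far = false_alarms / far_den if far_den else float("nan")
    return csi, far


# Made-up PM2.5 concentrations (µg/m³), purely to exercise the function.
observed = [80.0, 90.0, 60.0, 40.0, 100.0]
predicted = [85.0, 70.0, 78.0, 30.0, 95.0]
csi, far = csi_far(observed, predicted)
print(f"CSI = {csi:.4f}, FAR = {far:.4f}")
```

In this toy case there are two hits, one miss, and one false alarm, so CSI is 2/4 and FAR is 1/3. A higher CSI and a lower FAR both indicate better event forecasts, which is the direction of the trends reported in Table 7.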

Table 8. R² of different models at different periods.

Model	+3 h	+6 h	+12 h	+24 h	+36 h	+48 h	+60 h	+72 h
MLP	0.8929	0.7988	0.6708	0.5443	0.4786	0.4385	0.4365	0.4110
LSTM	0.9069	0.8318	0.7301	0.6265	0.5581	0.5147	0.5272	0.4859
GRU	0.9063	0.8308	0.7321	0.6293	0.5605	0.5231	0.5268	0.5012
GC-LSTM	0.9059	0.8329	0.7366	0.6340	0.5649	0.5168	0.5233	0.4901
PM_2.5-GNN	0.9094	0.8390	0.7465	0.6456	0.5749	0.5352	0.5395	0.5097
GWO-GART	0.9091	0.8487	0.7483	0.6461	0.5821	0.5467	0.5532	0.5278
EGCFC	0.9092	0.8443	0.7518	0.6498	0.5946	0.5655	0.5769	0.5467
ours	0.9123	0.8435	0.7522	0.6595	0.6058	0.5687	0.5813	0.5727
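
As a reading aid for Table 8, the sketch below computes the coefficient of determination in its common form, R² = 1 − SS_res / SS_tot. This is an assumed definition: the section does not state which variant the authors use (some works report the squared correlation instead). The function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared forecast errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values, purely to exercise the function.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2 = r_squared(observed, predicted)
print(f"R2 = {r2:.4f}")
```

Here SS_res is 26 and SS_tot is 500, giving R² = 0.948. Under this definition, the decline of R² with forecast horizon in Table 8 reflects forecast errors growing relative to the variance of the observations.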

Table 9. The model runtime and complexity on dataset 3.

Model	Runtime (s)	FLOPs (G)	Params (M)
MLP	395.36	0.104	0.001
GRU	369.08	0.932	0.007
LSTM	334.44	1.221	0.009
GC-LSTM	631.06	0.837	0.006
PM_2.5-GNN	832.31	51.330	0.020
GWO-GART	108,000	52.430	0.091
EGCFC	1125.36	55.430	0.103
Ours	1001.20	52.008	0.090

Table 10. Quantitative results of ablation study on datasets 1, 2, and 3.

Dataset	Model	MAE	RMSE	CSI	FAR	R²
1	Baseline	15.4801 ±0.1412	19.6491 ±0.1355	0.4852 ±0.0036	0.2897 ±0.0114	0.5957 ±0.0041
	w/o GESM	14.1623 ±0.0656	18.4013 ±0.0707	0.5018 ±0.0095	0.2392 ±0.0162	0.6303 ±0.0042
	w/o MS-iTransformer	14.1053 ±0.0618	18.3386 ±0.0653	0.5017 ±0.0068	0.2348 ±0.0124	0.6324 ±0.0030
	Ours	14.0836 ±0.0676	18.3083 ±0.0851	0.5030 ±0.0068	0.2295 ±0.0093	0.6327 ±0.0044
2	Baseline	24.9161 ±0.2012	31.2798 ±0.1915	0.5258 ±0.0052	0.2906 ±0.0086	0.5119 ±0.0042
	w/o GESM	22.9963 ±0.2168	29.5258 ±0.2950	0.5365 ±0.0073	0.2349 ±0.0063	0.5477 ±0.0099
	w/o MS-iTransformer	22.8453 ±0.2401	29.3219 ±0.2482	0.5393 ±0.0125	0.2238 ±0.0168	0.5486 ±0.0101
	Ours	22.8264 ±0.3842	29.3321 ±0.3881	0.5396 ±0.0123	0.2210 ±0.0111	0.5502 ±0.0115
3	Baseline	35.1663 ±0.6300	42.9891 ±0.6633	0.6168 ±0.0031	0.3063 ±0.0077	0.5097 ±0.0135
	w/o GESM	32.4696 ±0.5980	40.5141 ±0.5914	0.6330 ±0.0073	0.2566 ±0.0128	0.5542 ±0.0139
	w/o MS-iTransformer	32.2708 ±0.4987	40.4062 ±0.5761	0.6388 ±0.0061	0.2470 ±0.0153	0.5500 ±0.0139
	Ours	31.1335 ±0.5273	39.3122 ±0.5665	0.6532 ±0.0074	0.2323 ±0.0116	0.5727 ±0.0136

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

