Article

A Lightweight Spatiotemporal Graph Framework Leveraging Clustered Monitoring Networks and Copula-Based Pollutant Dependency for PM2.5 Forecasting

by Mohammad Taghi Abbasi 1, Ali Asghar Alesheikh 1 and Fatemeh Rezaie 1,2,*
1 Department of GIS, Faculty of Geodesy and Geomatics Engineering, K. N. Toosi University of Technology, Tehran 19967-15433, Iran
2 Department of Geophysical Exploration, Korea University of Science and Technology, 217 Gajeong-ro, Yuseong-gu, Daejeon 34113, Republic of Korea
* Author to whom correspondence should be addressed.
Land 2025, 14(8), 1589; https://doi.org/10.3390/land14081589
Submission received: 27 June 2025 / Revised: 31 July 2025 / Accepted: 1 August 2025 / Published: 4 August 2025
(This article belongs to the Section Land Innovations – Data and Machine Learning)

Abstract

Air pollution threatens human health and ecosystems, making timely forecasting essential. The spatiotemporal dynamics of pollutants, shaped by various factors, challenge traditional methods. Therefore, spatiotemporal graph-based deep learning has gained attention for its ability to capture spatial and temporal dependencies within monitoring networks. However, many existing models, despite their high predictive accuracy, face computational complexity and scalability challenges. This study introduces the clustered and lightweight spatio-temporal graph convolutional network with gated recurrent unit (ClusLite-STGCN-GRU), a hybrid model that integrates spatial clustering based on pollutant time series for graph construction, Copula-based dependency analysis for selecting relevant pollutants to predict PM2.5, and graph convolution combined with gated recurrent units to extract spatiotemporal features. Unlike conventional approaches that require learning or dynamically updating adjacency matrices, ClusLite-STGCN-GRU employs a fixed, simple cluster-based structure. Experimental results on Tehran air quality data demonstrate that the proposed model not only achieves competitive predictive performance compared to more complex models, but also significantly reduces computational cost (by up to 66% in training time, 83% in memory usage, and 84% in the number of floating-point operations), making it suitable for real-time applications and offering a practical balance between accuracy, interpretability, and efficiency.

1. Introduction

Urbanization and industrialization, though vital to socioeconomic progress, have significantly worsened air quality worldwide [1]. In 2024, the World Health Organization (WHO) reported that nearly 7.92 billion people worldwide are exposed to polluted air, leading to 6.7 million deaths each year due to its harmful effects on cardiovascular and respiratory health, including conditions such as asthma [2]. Air pollution is a complex mixture of particulate matter and gaseous contaminants, with key constituents including particulate matter less than 10 μm (PM10) and 2.5 μm (PM2.5) in diameter, along with sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO), and ozone (O3) [3]. Among these, PM2.5 is the most concerning due to its strong association with an increased risk of lung cancer and other serious health effects [4]. Given the absence of a comprehensive strategy for the complete eradication of air pollution, the development of forecasting models and early warning systems is essential [5]. This is particularly important for critical pollutants like PM2.5, as they play a key role in formulating effective mitigation policies and protecting public health [6].
Governments have established networks of air quality monitoring stations (AQMSs) to provide accurate data on current conditions and historical trends of air pollution. However, one of the main limitations of these networks is their inability to forecast future pollutant concentrations [7]. Air pollutant concentration forecasting based on AQMS data faces several challenges, including spatial dependencies among AQMSs [8], temporal dependencies on historical data [9], and the influence of meteorological conditions [10]. Additionally, nonlinear chemical interactions, such as NO2 oxidation and secondary O3 formation, complicate predictions [11]. These complexities underscore the necessity of developing spatiotemporal models that can simultaneously capture spatial and temporal dependencies, account for the influence of meteorological conditions on pollutant concentrations, and identify the key pollutants affecting the target variable, such as PM2.5.
Advancements in technology and increased Graphics Processing Unit (GPU) power have transformed deep learning into an advanced branch of machine learning, capable of analyzing complex data and modeling spatiotemporal relationships [12]. As a result, hybrid deep learning models combining spatial and temporal learning are gaining interest, with graph convolutional networks (GCNs) used for capturing spatial dependencies, and recurrent neural networks (RNNs), temporal convolutional networks (TCNs), or attention mechanisms employed to model temporal dependencies [13].
Over the past few years, a growing number of studies have explored various architectures and techniques to improve the accuracy and efficiency of spatiotemporal air pollution forecasting models [14]. For example, Guan et al. [7] introduced a multi-branch TGCN model to extract features from different meteorological variables, improving short- and long-term PM2.5 forecasts. Chen et al. [15] developed an adaptive adjacency matrix learned from meteorological and point of interest (POI) data, which significantly reduced prediction errors compared to fixed distance-based graphs. Hierarchical spatial modeling was explored by Hu et al. [16], who showed that multi-scale graph aggregation improves accuracy. Other works, such as that of Zeng et al. [17], combined convolution with spatial attention mechanisms to capture both local and global spatial dependencies, while Liu et al. [18] enhanced the representation of nodes with few connections using fine-grained graph convolutions. Zhao et al. [19] developed dynamic graph structures that incorporate wind field data, capturing spatial dependencies more effectively through geographically informed directed graphs. Huang et al. [20] proposed a hybrid Transformer-GCN model, GCN-FFPformer, which combines spatial feature extraction through GCN with frequency-domain temporal modeling using an enhanced transformer based on fast Fourier transform (FFT). Zeng et al. [21] tackled the over-smoothing issue by introducing DAGJN, a dynamic adaptive graph jump network that integrates multi-head self-attention to better capture spatiotemporal dependencies. Table S1 summarizes the features of these papers for air pollutant concentration prediction.
With the growing body of research in this area, recent studies (Table S1) have prioritized enhancing prediction accuracy. Nevertheless, these gains in accuracy frequently lead to greater model complexity, resulting in more challenging implementation, longer processing times, and a substantial increase in the number of model parameters [22]. In real-world applications, where resources are limited and rapid prediction is required for horizons such as 1 to 72 h ahead, high accuracy alone is not sufficient [23]. Models must be not only accurate but also lightweight and simple [24]. Therefore, balancing accuracy and ease of deployment is a fundamental challenge in the design of spatiotemporal models.
To address this, a novel model, ClusLite-STGCN-GRU (clustered and lightweight spatio-temporal graph convolutional network with gated recurrent unit), is proposed for near-future air pollution forecasting. The main objective of this model is to achieve high prediction accuracy while reducing computational complexity. To this end, a previously developed clustering framework [25] is employed to partition the AQMS network into a set of local subgraphs. This clustering process is based on time–frequency analysis, dimensionality reduction through principal component analysis (PCA), and agglomerative hierarchical clustering. This decomposition offers three key advantages: (1) it enhances training stability by isolating more stationary spatial patterns, which aligns well with the inherently local nature of graph convolutional filters; (2) it substantially lowers computational costs by eliminating the need for dynamic adjacency matrix learning and by limiting connections to within-cluster interactions; and (3) it implicitly captures meteorological influences by clustering AQMSs based on full PM2.5 time series over the study period (2019–2022), without requiring explicit weather data. Since pollutant patterns are shaped by factors like wind and temperature, similar time series often reflect similar underlying climate conditions. In addition, Copula models are utilized to identify pollutants that exhibit significant dependencies on PM2.5 (the target variable), and only these pollutants are considered as inputs. This reduces the input size, accelerates training, and enables a lightweight and efficient framework. Experimental results demonstrate that ClusLite-STGCN-GRU outperforms all baseline models that either neglect spatial dependencies or use CNNs to extract spatial information among AQMSs. It achieves accuracy comparable to more complex graph-based models for short-term forecasts (up to 8 h ahead), while consistently outperforming them over longer horizons (from 12 to 72 h), thereby improving accuracy while reducing complexity.
The remainder of this paper is organized into the following six sections: a description of the study area and dataset along with the problem definition, the theoretical background, a detailed explanation of the proposed model, a comparative analysis of the prediction results and model performance, and finally, discussion and conclusions.

2. Materials and Problem Definition

2.1. Study Area and Data Description

Our study area is Tehran, the capital and most densely populated city in Iran (Figure 1a). This metropolis spans approximately 18,814 square kilometers and has a population of about 16 million, making it the second-largest city in the Middle East after Cairo [26]. Located at an elevation of 1200 m above sea level on the southern slopes of the Alborz Mountains, Tehran is naturally constrained to the north by these mountains, which limit the dispersion of air pollutants in that direction. However, the city opens to expansive plains in the south, which allow for some pollutant dispersion. The prevailing winds, coming mainly from the west and southwest, carry pollutants from industrial zones and densely populated areas to other parts of the city. Tehran has a dry climate, receiving only about 200 mm of rainfall per year and averaging 25 °C, which, along with rapid population growth, industrial expansion, and heavy traffic, has made Tehran one of the most polluted megacities globally [27].
In Tehran, air quality data are published by Tehran Municipality (https://www.tehran.ir/, accessed on 31 July 2025) and the Department of Environment (DOE, https://www.doe.ir/, accessed on 31 July 2025). In this study, hourly concentration data for PM2.5, PM10, NO2, SO2, CO, and O3 were collected from these two organizations, covering the period from 00:00 on 1 January 2019 to 23:00 on 31 December 2022. Additionally, meteorological observations, including temperature, pressure, humidity, dew point temperature, and wind components (wind_x and wind_y), were obtained from the Iran Meteorological Organization (https://www.irimo.ir/, accessed on 31 July 2025), which is responsible for monitoring atmospheric conditions. The meteorological data were collected for the same time period and matched to AQMSs using the inverse distance weighting (IDW) interpolation method. It is worth mentioning that in our proposed model, meteorological data are not used directly as inputs but are collected only for baseline models that cannot implicitly capture their effects. Our model accounts for these impacts indirectly through time series clustering. Table 1 summarizes the descriptive statistics of the relevant variables, and the spatial distribution of AQMSs and meteorological stations is shown in Figure 1b.

2.2. Data Preprocessing

Air quality sensors in real-world conditions face challenges such as outliers and missing data due to factors like adverse weather conditions, equipment wear, and power outages [14]. Missing data hinders the model training process as models require complete data for learning. On the other hand, outliers cause the model to deviate from the underlying patterns and reduce prediction accuracy [28]. Additionally, due to scale differences in the variables measured by AQMSs (Table 1), data normalization is essential. This process ensures the uniformity of feature scales, thereby guaranteeing that each feature has an equal impact on the modeling process [29]. Finally, appropriate data splitting into training, validation, and test sets is crucial for accurately evaluating model performance and avoiding issues such as overfitting. Preprocessing procedures for imputing missing data and removing outliers followed the method proposed by Abbasi et al. [25]. Min–max normalization was applied to ensure consistent feature scaling. The dataset was sequentially split into training, validation, and test sets, covering the first 70%, the next 15%, and the final 15% of the data, respectively.
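The scaling and splitting steps can be illustrated with a minimal sketch. It assumes NumPy, a synthetic array standing in for the hourly observations, and scaling parameters estimated on the training block only. The function names are hypothetical, and the text does not state which block the min–max parameters were fitted on, so that choice is an assumption.

```python
import numpy as np

def chronological_split(data, train_frac=0.70, val_frac=0.15):
    """Split a time-ordered array into train/validation/test blocks without shuffling."""
    n = len(data)
    n_train = int(round(n * train_frac))
    n_val = int(round(n * val_frac))
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

def fit_min_max(train):
    """Per-feature minimum and range, estimated on the training block only (assumption)."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)  # guard against constant features
    return lo, span

def apply_min_max(data, lo, span):
    """Scale features with previously fitted parameters."""
    return (data - lo) / span

# Synthetic stand-in for hourly observations (rows: hours, columns: features).
rng = np.random.default_rng(0)
series = rng.gamma(shape=2.0, scale=15.0, size=(1000, 3))

train, val, test = chronological_split(series)
lo, span = fit_min_max(train)
train_n, val_n, test_n = (apply_min_max(x, lo, span) for x in (train, val, test))
```

Fitting the scaler on the training block keeps information from later periods out of training. Validation and test values may then fall slightly outside [0, 1].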

2.3. Problem Formulation

The objective of this study was to predict the spatiotemporal air quality at the locations of AQMSs using historical data from the station itself and its neighboring AQMSs. To achieve this, the data from these AQMSs are treated as signals recorded every hour of the day, with the domain of these signals represented as a graph. In this graph, AQMSs are nodes, and the observations from each station serve as the features of those nodes. The spatial relationships between AQMSs are captured through the adjacency matrix, which defines the edges of the graph. Ultimately, the spatiotemporal deep learning model, by learning a mapping function F, predicts air quality for the next τ hours at each node based on graph G and the historical node features of length T:
$$\left[X_{t_0+1},\, X_{t_0+2},\, \ldots,\, X_{t_0+\tau}\right] = F_{\theta}\left(X_{t_0-T+1},\, X_{t_0-T+2},\, \ldots,\, X_{t_0};\, G\right) \tag{1}$$
where $\theta$ represents the learnable parameters and $X$ represents the air quality features observed at AQMSs, with each $X_t$ being a matrix of features recorded at all stations at time $t$, capturing pollutant levels and related environmental factors.
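The mapping above pairs each block of $T$ historical snapshots with the following $\tau$ snapshots. A minimal sketch of how such input and target pairs could be built is given below. It assumes NumPy and a feature array of shape (time, stations, features); `make_windows` is a hypothetical helper, not part of the original study.

```python
import numpy as np

def make_windows(X, T, tau):
    """Build sliding-window samples from X with shape (time, stations, features).

    Returns inputs of shape (samples, T, N, P) and targets of shape (samples, tau, N, P).
    """
    n_samples = X.shape[0] - T - tau + 1
    inputs = np.stack([X[s:s + T] for s in range(n_samples)])
    targets = np.stack([X[s + T:s + T + tau] for s in range(n_samples)])
    return inputs, targets

# Toy series: 20 time steps, 2 stations, 1 feature.
X = np.arange(20 * 2 * 1, dtype=float).reshape(20, 2, 1)
inputs, targets = make_windows(X, T=5, tau=3)
```

In this toy case the first sample uses time steps 0 to 4 as history and time steps 5 to 7 as targets.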

3. Theoretical Background

3.1. GCNs

GCNs are a class of deep learning models designed for graph-structured data. Unlike traditional neural networks that process Euclidean data (e.g., images, text, and videos), GCNs are capable of capturing the topological structure of graphs and learning node representations [30]. They are categorized into spectral and spatial graph convolutions [31]. Spectral methods use the eigendecomposition of the graph Laplacian, where the eigenvectors of the Laplacian act as the graph’s Fourier basis, similar to how sine and cosine functions work in Fourier space. Although the convolution operation in Fourier space is equivalent to a Hadamard multiplication, it is computationally expensive due to the need for eigenvalue decomposition [32]. Spatial methods have limitations compared to spectral methods in capturing long-range dependencies and global structural patterns. However, they operate without the need for spectral decomposition, directly aggregating local neighborhood features, which enhances scalability and computational efficiency for large-scale graphs [33].
The evolution of GCNs began with the models proposed by Bruna et al. [34] and Henaff et al. [35], which formulated convolution in the spectral domain using the eigendecomposition of the graph Laplacian matrix $L$. However, the dense nature of $L$ incurs a computational complexity of $O(n^3)$, making it impractical for large-scale graphs. To address this, Defferrard et al. [36] proposed ChebNet, which utilizes Chebyshev polynomial approximations to eliminate the need for eigendecomposition, thereby reducing the computational complexity to $O(K|E|)$, where $K$ is the polynomial order and $|E|$ is the number of edges. Building on this, Kipf & Welling [37] introduced a simplified first-order model ($K = 1$) with linear complexity $O(|E|)$, enabling scalable learning on large graphs through localized filtering. This process is defined by the following equation:
$$Z = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta \tag{2}$$
where $\tilde{A} = A + I_N$ is the adjacency matrix with self-loops; $\tilde{D}$ is the degree matrix of $\tilde{A}$; $X$ is the input feature matrix; $\Theta$ is the trainable weight matrix; and $Z$ is the output. This formulation ensures an efficient balance between scalability and expressive power.
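As an illustration of the first-order propagation rule above, the following sketch applies one graph convolution to a three-node toy graph. It assumes NumPy; `gcn_layer` is a hypothetical helper, and the identity features and weights are chosen only so that the output equals the normalized adjacency matrix.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One first-order graph convolution: D~^(-1/2) (A + I) D~^(-1/2) X Theta."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    deg = A_tilde.sum(axis=1)                 # degrees of the self-looped graph
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return A_hat @ X @ Theta

# Toy graph: nodes 0 and 1 are connected, node 2 is isolated.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
X = np.eye(3)       # one-hot node features
Theta = np.eye(3)   # identity weights, so Z equals the normalized adjacency
Z = gcn_layer(A, X, Theta)
```

Each connected node receives an equal mix of its own and its neighbor's features, while the isolated node keeps only its own.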

3.2. RNNs

In 1906, Andrey Markov introduced the Markov chain model, where the future state of a system depends solely on its current state, without reference to prior states [38]. In 1982, John Hopfield introduced RNNs, enabling the modeling of longer sequence dependencies, moving beyond the reliance on just the current state [39]. These networks are particularly effective for tasks that involve time-series data or sequences, as they can maintain information over time and use it for predictions [40]. The Elman RNN (ERNN) [41] represents one of the pioneering RNN architectures specifically designed to capture sequence dependencies. In this architecture, the current input $x_t$ is integrated with the hidden state $h_{t-1}$ from the previous time step, with both components being weighted by learnable parameters. The hidden state at time step $t$, denoted as $h_t$, is computed using the following equation:
$$h_t = \tanh\left(W_h h_{t-1} + W_x x_t + b\right) \tag{3}$$
where $\tanh$ is the non-linear activation function; $W_h$ represents the weight matrix for the previous hidden state; $W_x$ is the weight matrix for the current input; and $b$ is the bias term. ERNN is effective in modeling short-term dependencies but suffers from the vanishing gradient problem, which restricts its ability to capture long-term patterns [42].
The long short-term memory (LSTM) [43] was introduced to address the vanishing gradient problem in the ERNN by using gating mechanisms to preserve long-term dependencies [44]. The LSTM architecture consists of three primary gates: input, forget, and output. Each gate applies a linear transformation to the current input $x_t$ and the previous hidden state $h_{t-1}$, followed by a sigmoid activation function to produce gating values between 0 and 1:
$$g_t = \sigma\left(W_g h_{t-1} + U_g x_t + b_g\right) \tag{4}$$
where $g_t \in \{f_t, i_t, o_t\}$ represents the output of the forget, input, or output gate, respectively. Here, $W_g$ and $U_g$ are learnable weight matrices applied to the previous hidden state and current input, respectively, and $b_g$ is a bias vector. These gates regulate the information flow through the memory cell. The forget gate $f_t$ controls which parts of the previous cell state $C_{t-1}$ are retained, while the input gate $i_t$ determines the contribution of the new candidate state $\tilde{C}_t$. The cell state is updated as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$
The output gate $o_t$ then determines the hidden state:
$$h_t = \tanh\left(C_t\right) \odot o_t \tag{6}$$
where $\odot$ denotes element-wise multiplication.
This structure enables LSTM to effectively preserve long-term dependencies and mitigate the vanishing gradient problem [45].
The gated recurrent unit (GRU) [46] is a simpler variant of the LSTM that uses only two gates: update and reset, reducing model complexity while maintaining the ability to capture long- and short-term dependencies [47,48]. Each gate is computed as follows:
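$$g_t = \sigma\left(W_g h_{t-1} + U_g x_t + b_g\right) \tag{7}$$
where $g_t \in \{z_t, r_t\}$ corresponds to the update ($z_t$) and reset ($r_t$) gates, respectively. The candidate hidden state $\tilde{h}_t$ and the final hidden state $h_t$ are updated as follows:
$$\tilde{h}_t = \tanh\left(W \cdot \left[r_t \odot h_{t-1},\, x_t\right]\right) \tag{8}$$
$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{9}$$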
This structure allows GRU to effectively regulate information flow without a separate memory cell, making it computationally efficient while mitigating the vanishing gradient problem [49]. The architectures of ERNN, LSTM, and GRU are shown in Figure 2a–c, respectively.
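To make the GRU update concrete, the sketch below implements a single step following the gate, candidate, and interpolation equations above. It assumes NumPy, randomly initialized weights, and hypothetical parameter names (`W_z`, `U_z`, `b_z`, and so on). It is an illustration, not the configuration used in the experiments.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update; p is a dict of weights with hypothetical names."""
    z = sigmoid(p["W_z"] @ h_prev + p["U_z"] @ x_t + p["b_z"])     # update gate
    r = sigmoid(p["W_r"] @ h_prev + p["U_r"] @ x_t + p["b_r"])     # reset gate
    h_cand = np.tanh(p["W"] @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                         # interpolation

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; the scale 0.1 is an arbitrary choice for the demo."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "W_z": s * rng.standard_normal((n_hidden, n_hidden)),
        "U_z": s * rng.standard_normal((n_hidden, n_in)),
        "b_z": np.zeros(n_hidden),
        "W_r": s * rng.standard_normal((n_hidden, n_hidden)),
        "U_r": s * rng.standard_normal((n_hidden, n_in)),
        "b_r": np.zeros(n_hidden),
        "W": s * rng.standard_normal((n_hidden, n_hidden + n_in)),
    }

params = init_params(n_in=3, n_hidden=4)
sequence = np.random.default_rng(1).standard_normal((10, 3))  # 10 steps, 3 features

h = np.zeros(4)
for x_t in sequence:
    h = gru_step(x_t, h, params)
```

When the update gate is close to zero the previous hidden state passes through almost unchanged, which is how the unit retains information over long spans.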

4. Proposed Model and Experimental Settings

4.1. Model Design

The proposed model in this study is a hybrid deep learning framework based on the combination of GCN and GRU networks, designed to simultaneously predict PM2.5 concentrations at all AQMSs for a forecast horizon of 1 to 72 h. As shown in Figure 3, the proposed model consists of five separate blocks, namely Copula-based dependency analysis, pollutant time series clustering, graph convolution layers, GRU layers, and output PM2.5 prediction layers.

4.1.1. Copula-Based Dependency Analysis Block

The concentrations of air pollutants are often influenced by common emission sources such as traffic and industrial activities, and under certain meteorological conditions, complex chemical interactions may occur among them [50,51]. These factors result in statistical dependencies among pollutants, where variations in the concentration of one pollutant are associated with changes in others. Therefore, dependency analysis helps identify pollutants related to PM2.5 (the target pollutant) and prevents the inclusion of all pollutants in the model, thereby reducing the number of input parameters. Methods such as Pearson and Spearman correlation coefficients are commonly used for dependency analysis between variables. However, these methods are only capable of detecting linear (Pearson) or monotonic (Spearman) dependencies and have limited effectiveness in identifying critical dependency structures—particularly when pollutant concentrations simultaneously reach very high or very low levels. In this context, Copula models provide a more flexible statistical framework for modeling the dependency structure among random variables without requiring specific assumptions about their marginal distributions [52]. These models possess a strong ability to analyze nonlinear dependencies and examine the joint behavior of variables under extreme conditions (tail dependencies) [53].
In the Copula-based modeling approach, each marginal variable is first transformed into a standard uniform distribution over the interval [0, 1] using its empirical cumulative distribution function (ECDF)—a process known as marginal transformation. Subsequently, the dependence structure among the transformed variables is estimated using an appropriate Copula function [54]. Table 2 presents the most commonly used Copula families, the types of dependence they capture, and their typical applications in the context of air pollution. It is worth noting that in addition to the Copulas listed in Table 2, Copulas can be rotated by 90 or 180 degrees to better capture asymmetric dependence patterns, especially when tail dependence is observed only in specific quadrants of the joint distribution. Such rotations enhance the flexibility of the Copula framework, making it particularly valuable in environmental applications where dependencies may not be symmetric or uniformly distributed.
Key characteristics of dependency structures in all Copula models include Kendall’s tau rank correlation coefficient ($\tau$), upper tail dependence ($\lambda_{upper}$), and lower tail dependence ($\lambda_{lower}$). The $\tau$ coefficient quantifies the overall strength and direction of association between two variables and ranges from −1 to +1. A positive $\tau$ suggests that increases in one pollutant are generally associated with increases in another (positive association), while a negative $\tau$ indicates that an increase in one variable tends to coincide with a decrease in the other (negative association or inverse trend). In contrast, $\lambda_{upper}$ and $\lambda_{lower}$, both ranging from 0 to 1, quantify the probability of co-occurrence of extreme events in the upper and lower tails of the joint distribution, respectively, providing critical insights into the tail-dependent behavior of pollutants [55].
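The quantities described above can be approximated non-parametrically, which may help build intuition before a Copula family is fitted. The sketch below assumes NumPy and synthetic data; the helper names are hypothetical. It computes ECDF-based pseudo-observations, Kendall's tau from pairwise concordance, and finite-threshold estimates of the tail dependence coefficients. It does not fit any of the parametric families in Table 2, and the threshold `q = 0.95` is an arbitrary choice.

```python
import numpy as np

def pseudo_observations(x):
    """Marginal transformation: map a sample to (0, 1) via rank / (n + 1)."""
    ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n (ties broken by position)
    return ranks / (len(x) + 1.0)

def kendall_tau(x, y):
    """Kendall's tau from pairwise concordance; O(n^2), adequate for a sketch."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (dx * dy).sum() / (n * (n - 1))

def empirical_tail_dependence(u, v, q=0.95):
    """Finite-threshold estimates of upper and lower tail dependence."""
    upper = np.mean((u > q) & (v > q)) / (1.0 - q)
    lower = np.mean((u <= 1.0 - q) & (v <= 1.0 - q)) / (1.0 - q)
    return upper, lower

# Synthetic, positively dependent series (stand-ins, not the Tehran measurements).
rng = np.random.default_rng(0)
pm25 = rng.gamma(2.0, 15.0, size=500)
pm10 = 1.8 * pm25 + rng.gamma(2.0, 5.0, size=500)

u, v = pseudo_observations(pm25), pseudo_observations(pm10)
tau = kendall_tau(u, v)
lam_upper, lam_lower = empirical_tail_dependence(u, v)
```

Because Kendall's tau depends only on ranks, it takes the same value on the raw series and on the pseudo-observations.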

4.1.2. Pollutant Time Series Clustering Block

AQMSs measure pollutant concentrations at fixed sampling intervals, resulting in multivariate time series data for each pollutant across multiple AQMSs. Herein, an adjacency matrix is constructed separately for each pollutant by identifying the time series of AQMSs—which serve as graph nodes—that exhibit similar levels and patterns of variation. The adjacency matrix is defined such that only AQMSs within the same cluster are connected, while no connections exist between AQMSs belonging to different clusters. Moreover, pollutants that do not exhibit a distinct spatial clustering pattern—that is, those whose concentration levels and variation trends are similar across all urban AQMSs—are excluded from the model inputs due to their lack of added spatial value. This strategy removes the need to learn the adjacency matrix during training and implicitly captures meteorological effects, as AQMSs experiencing similar weather conditions often exhibit similar pollution dynamics [56]. By removing explicit meteorological features and pollutants without spatial cluster structures, and employing a fixed adjacency matrix, the model benefits from reduced input dimensionality and fewer trainable parameters.
To achieve this, a methodology inspired by a prior study [25] was adopted, consisting of the following steps: (1) transformation from the time domain to the time–frequency domain to capture the amplitude and frequency of pollutant concentration variations; (2) use of PCA to extract a low-dimensional feature representation from the transformed signals; and (3) agglomerative hierarchical clustering (AHC) to group AQMSs into clusters based on the extracted features. For further methodological details, the reader is referred to Abbasi et al. [25].
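The three steps can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Abbasi et al. [25]: a short-time Fourier transform stands in for the time–frequency transform, PCA is computed with a singular value decomposition, and Ward linkage is chosen for the agglomerative step. The window length, number of components, and synthetic signals are arbitrary, and the helper names are hypothetical. NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def stft_magnitude(x, win=48, hop=24):
    """Short-time Fourier magnitudes of one series, flattened to a feature vector."""
    window = np.hanning(win)
    frames = [x[s:s + win] * window for s in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1)).ravel()

def pca_reduce(F, n_components=2):
    """Project centered features onto the leading principal components via SVD."""
    Fc = F - F.mean(axis=0)
    U, S, _ = np.linalg.svd(Fc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def cluster_stations(series, n_clusters):
    """series has shape (stations, time); returns one cluster label per station."""
    feats = np.stack([stft_magnitude(x) for x in series])
    scores = pca_reduce(feats)
    Z = linkage(scores, method="ward")        # agglomerative merge tree
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Six synthetic stations: three with a 24 h cycle, three with an 8 h cycle.
rng = np.random.default_rng(0)
t = np.arange(24 * 30)
daily = [np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size) for _ in range(3)]
fast = [np.sin(2 * np.pi * t / 8) + 0.1 * rng.standard_normal(t.size) for _ in range(3)]
labels = cluster_stations(np.array(daily + fast), n_clusters=2)
```

Stations whose series share the same dominant cycle end up in the same cluster, which is the property the fixed adjacency matrix relies on.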

4.1.3. Graph Convolution Block

A key strength of using GCNs lies in their ability to simultaneously process information from multiple AQMSs while accounting for spatial dependencies among them. In this study, this capability is realized by constructing graphs in which each AQMS is considered as a node (a total of N nodes corresponding to the total number of AQMSs), and each node is associated with a P-dimensional feature vector. This feature vector is composed of variables selected by two prior blocks: pollutants that exhibit non-zero dependency with PM2.5 in all three parts of the distribution—namely, central dependency, upper tail dependency, and lower tail dependency—as well as pollutants that display distinct spatial clustering behavior. For each selected pollutant, a separate sequence of graphs is constructed over the study period, such that each time step corresponds to a snapshot graph. As discussed in Section 4.1.2, the structure of each graph is defined by an adjacency matrix specific to the corresponding pollutant, which is constructed through spatial clustering of its concentration time series across all AQMSs. In this setting, the edges are binary and undirected, and are defined as follows:
$$A_{ij} = \begin{cases} 1, & \text{if } i \neq j \text{ and nodes } i, j \text{ belong to the same cluster} \\ 0, & \text{otherwise} \end{cases} \tag{10}$$
This explicit construction of the adjacency matrix removes the need for learning graph connectivity parameters during model training, thereby reducing model complexity compared to approaches that require dynamically learned or time-varying adjacency matrices. Within the GCN framework, the node feature vectors are combined using this adjacency matrix to produce spatially embedded outputs.
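A minimal sketch of this construction, assuming NumPy and a vector of cluster labels with one entry per AQMS (the label values below are invented for illustration, and `cluster_adjacency` is a hypothetical helper):

```python
import numpy as np

def cluster_adjacency(labels):
    """Binary, undirected adjacency: 1 for distinct nodes in the same cluster, else 0."""
    labels = np.asarray(labels)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)   # no self-connections (i != j)
    return A

# Five stations: two in cluster 1, three in cluster 2 (illustrative labels).
A = cluster_adjacency([1, 1, 2, 2, 2])
```

The resulting matrix is block-structured: each cluster forms a fully connected subgraph and there are no edges between clusters.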

4.1.4. GRU Block

In the graph convolution block, for each pollutant, a distinct graph is constructed at every time step based on its spatial clustering. Graph convolution is applied separately on each pollutant’s graph, and their outputs are concatenated to form a combined spatial embedding vector. This sequence of embeddings is then input to a GRU block to capture temporal dependencies. As discussed in Section 3.2, the GRU employs two gating mechanisms—the update and reset gates—to selectively retain or discard past information, thereby modeling the temporal dynamics of the data. This mechanism, when applied to the output of the graph convolution block, is mathematically defined as follows:
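$$\begin{aligned} g_t &= \sigma\left(W_g h_{t-1} + U_g\, \mathrm{GCN}(X_t, A) + b_g\right); \\ \tilde{h}_t &= \tanh\left(W \cdot \left[r_t \odot h_{t-1},\, \mathrm{GCN}(X_t, A)\right]\right); \\ h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{aligned} \tag{11}$$
where $\mathrm{GCN}(X_t, A)$ denotes the output of the GCN at time step $t$ given the input features $X_t$ and the adjacency matrix $A$.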

4.1.5. Output Block

After extracting spatiotemporal features through the graph convolution and GRU blocks, the output block employs fully connected (FC) layers to map the learned hidden representations to the target prediction dimensions. For multi-step forecasting over a prediction horizon of $\tau$ time steps (the horizon defined in Section 2.3), the output is calculated as follows:
$$\hat{Y}_{t+1:t+\tau} = W F + b \tag{12}$$
where $\hat{Y}_{t+1:t+\tau}$ denotes the predicted pollutant concentrations (e.g., PM2.5) for the next $\tau$ time steps; $F$ represents the spatiotemporal feature vector extracted from the previous layers; and $W$ and $b$ are the weight matrix and bias vector of the fully connected layers, respectively.
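Taken together, the graph convolution, GRU, and output blocks can be sketched as a single forward pass. The sketch assumes NumPy, random untrained weights, one input channel per pollutant, and a GRU whose weights are shared across nodes. These choices, the layer sizes, and all parameter names are assumptions made for illustration and are not taken from the original implementation.

```python
import numpy as np

def cluster_adjacency(labels):
    """Binary within-cluster adjacency with a zero diagonal."""
    labels = np.asarray(labels)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def normalize_adjacency(A):
    """Symmetric normalization with self-loops, as in the first-order GCN rule."""
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d[:, None] * A_tilde * d[None, :]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, A_list, p):
    """X: (T, N, P) history; A_list: one (N, N) adjacency per pollutant.

    Returns predictions of shape (N, horizon).
    """
    T, N, P = X.shape
    A_hats = [normalize_adjacency(A) for A in A_list]
    h = np.zeros((N, p["W_z"].shape[0]))
    for t in range(T):
        # Graph convolution per pollutant, then concatenation along the feature axis.
        g = np.concatenate(
            [A_hats[k] @ X[t, :, k:k + 1] @ p["Theta"][k] for k in range(P)], axis=1)
        z = sigmoid(h @ p["W_z"].T + g @ p["U_z"].T + p["b_z"])    # update gate
        r = sigmoid(h @ p["W_r"].T + g @ p["U_r"].T + p["b_r"])    # reset gate
        h_cand = np.tanh(np.concatenate([r * h, g], axis=1) @ p["W"].T)
        h = (1.0 - z) * h + z * h_cand
    return h @ p["W_out"].T + p["b_out"]                           # fully connected output

rng = np.random.default_rng(0)
T, N, P, F, H, horizon = 6, 4, 2, 3, 5, 3   # illustrative sizes
E = P * F                                    # width of the concatenated embedding
A_list = [cluster_adjacency([0, 0, 1, 1]), cluster_adjacency([0, 1, 1, 1])]
params = {
    "Theta": [0.5 * rng.standard_normal((1, F)) for _ in range(P)],
    "W_z": 0.1 * rng.standard_normal((H, H)),
    "U_z": 0.1 * rng.standard_normal((H, E)),
    "b_z": np.zeros(H),
    "W_r": 0.1 * rng.standard_normal((H, H)),
    "U_r": 0.1 * rng.standard_normal((H, E)),
    "b_r": np.zeros(H),
    "W": 0.1 * rng.standard_normal((H, H + E)),
    "W_out": 0.1 * rng.standard_normal((horizon, H)),
    "b_out": np.zeros(horizon),
}
X = rng.random((T, N, P))
Y_hat = forward(X, A_list, params)
```

Because the adjacency matrices are fixed, the only trainable quantities in this sketch are the graph convolution weights, the GRU weights, and the output layer.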

4.2. Model Validation

To validate the performance of the proposed model in predicting PM2.5 concentrations, four statistical metrics were used: mean absolute error (MAE), root mean square error (RMSE), coefficient of determination ($R^2$), and Index of Agreement (IA). These metrics were selected not only for their widespread use in the regression literature but also for their fit to the specific characteristics of air pollution data and the analytical needs of this domain [21,57,58,59]. In air pollution forecasting, the accuracy of predictions is critical for decision-making related to air quality and public health. MAE measures the average absolute error, while RMSE emphasizes large errors, helping evaluate the model’s performance during significant concentration changes. $R^2$ assesses the proportion of variance in the observed data that is predictable from the model, offering an interpretable measure of model fit across varying concentration levels. IA measures the overall agreement between the predicted and actual values, serving as a dimensionless metric that reflects the model’s overall accuracy in predicting data behavior. Their definitions are given in Equations (13)–(16):
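$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| o_i - p_i \right| \tag{13}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( o_i - p_i \right)^2} \tag{14}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( o_i - p_i \right)^2}{\sum_{i=1}^{n} \left( o_i - \bar{o} \right)^2} \tag{15}$$
$$IA = 1 - \frac{\sum_{i=1}^{n} \left( o_i - p_i \right)^2}{\sum_{i=1}^{n} \left( \left| p_i - \bar{o} \right| + \left| o_i - \bar{o} \right| \right)^2} \tag{16}$$
where $o_i$ and $p_i$ are the observed (ground truth) and predicted values, respectively, and $\bar{o}$ is the mean of the $n$ observed values.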
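The four metrics translate directly into code. The sketch below assumes NumPy and uses a small made-up pair of observed and predicted vectors; the function names are hypothetical.

```python
import numpy as np

def mae(o, p):
    """Mean absolute error."""
    return np.mean(np.abs(o - p))

def rmse(o, p):
    """Root mean square error."""
    return np.sqrt(np.mean((o - p) ** 2))

def r2(o, p):
    """Coefficient of determination relative to the observed mean."""
    return 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)

def index_of_agreement(o, p):
    """Index of Agreement (dimensionless)."""
    o_bar = o.mean()
    denom = np.sum((np.abs(p - o_bar) + np.abs(o - o_bar)) ** 2)
    return 1.0 - np.sum((o - p) ** 2) / denom

# Made-up values for illustration only.
observed = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 39.0])

scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r2(observed, predicted),
    "IA": index_of_agreement(observed, predicted),
}
```

For these values the absolute errors are 2, 2, 3, and 1, so the MAE is 2.0, while the RMSE is slightly larger because the error of 3 is weighted more heavily.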

4.3. Model Evaluation

In this subsection, the performance of the proposed model is evaluated. The aim is to assess the accuracy, robustness, and reliability of the model in predicting PM2.5 concentrations, in comparison with baseline models. To ensure a fair and consistent comparison, identical training, validation, and testing datasets are used across all models. Moreover, the evaluation is conducted using the performance metrics introduced in Section 4.2. For optimal performance, the hyperparameters of each model are fine-tuned using a grid search strategy. The proposed model is evaluated by comparing it with seven baseline models:
(1) GRU competitive method
As one of the competitive prediction methods, this approach relies solely on the temporal dependencies of observational data from AQMSs to forecast future values, without considering the spatial dependencies among AQMSs. This approach can highlight the impact of ignoring the spatial dimension in predicting PM2.5 levels. In this method, the pollutant observations from AQMSs, along with the relevant meteorological data as auxiliary information, are fed into the network. The model makes predictions separately for each AQMS, and then evaluation metrics are calculated as the average across all AQMSs. Thus, the input tensor is three-dimensional, consisting of batch size, time step, and number of features. What enables these models to effectively simulate temporal dependencies and variations in pollutants over time is the allocation of a separate dimension to historical data as time step. The structure of the GRU model for the problem of predicting PM2.5 concentrations at 11 AQMSs in Tehran is shown in Figure S1.
(2) LSTM competitive method
As another competitive prediction method, this method—similar to GRU—focuses exclusively on the temporal dependencies in individual AQMS time series, while ignoring the spatial dependencies among AQMSs. LSTM uses three gating mechanisms (input, forget, and output), which result in a larger parameter space and increased computational overhead compared to GRU, which only uses two gates (update and reset) with a simpler architecture, as discussed in Section 3.2. This configuration highlights the trade-off between enhanced temporal modeling capabilities and the associated computational complexity. Similar to the GRU model, pollutant observations and auxiliary meteorological data are fed into the network in a three-dimensional tensor (batch size, time step, features). Each AQMS is modeled independently, and the prediction accuracy is evaluated as the average across all AQMSs. The structure of the proposed LSTM model is shown in Figure S2.
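The difference in parameter space between the two recurrent cells can be checked with simple arithmetic. The sketch below counts the trainable parameters of a single recurrent layer, assuming the PyTorch layout with two bias vectors per weight block; the input size of 8 and hidden size of 64 are hypothetical and are not the tuned values reported in Table S2.

```python
def recurrent_layer_params(input_size, hidden_size, n_blocks, n_bias_vectors=2):
    """Count trainable parameters of one recurrent layer.

    n_blocks: number of stacked weight blocks (3 for GRU: reset, update,
    candidate; 4 for LSTM: input, forget, output, cell candidate).
    n_bias_vectors: 2 matches the input-side and hidden-side biases that
    PyTorch keeps per block (an assumption about the implementation).
    """
    per_block = (hidden_size * input_size        # input-to-hidden weights
                 + hidden_size * hidden_size     # hidden-to-hidden weights
                 + n_bias_vectors * hidden_size) # biases
    return n_blocks * per_block


gru_params = recurrent_layer_params(input_size=8, hidden_size=64, n_blocks=3)
lstm_params = recurrent_layer_params(input_size=8, hidden_size=64, n_blocks=4)
ratio = lstm_params / gru_params
```

For equal input and hidden sizes, the LSTM layer therefore carries 4/3 of the parameters of the GRU layer, which is the source of its additional computational overhead.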
(3) GRU and LSTM with multi-head attention competitive method
In standard GRU and LSTM architectures, the trainable weights (parameters) are shared across all time steps, although the hidden states vary at each step. Although this weight-sharing strategy promotes efficiency and temporal generalization, it limits the model’s ability to adaptively focus on specific time steps during sequence processing. By integrating a multi-head attention mechanism applied over the full sequence of GRU and LSTM outputs, these models can learn to assign distinct attention weights to each time step. This enables them to emphasize relevant temporal segments and capture more complex dependencies. Specifically, the input tensor has the shape (batch size, time step, number of features), and the attention mechanism computes the query, key, and value matrices from the full output sequence of the LSTM or GRU. Attention scores are calculated using scaled dot-product operations, normalized through the softmax function, and applied to weight the value vectors accordingly. The resulting multi-head outputs are concatenated and projected back to the original hidden size dimension. For the final prediction, the representation obtained from the attention mechanism corresponding to the last time step is passed through a fully connected layer to estimate PM2.5 concentrations. This approach enables the model to focus on more important parts of the temporal data, although it increases computational complexity. The overall architectures of the models are shown in Figures S3 and S4.
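The attention computation described above can be sketched in a few lines. For brevity, this sketch uses a single head and feeds the recurrent outputs directly as queries, keys, and values, omitting the learned projection matrices and the concatenation of multiple heads; these simplifications, and the toy hidden states, are assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return attended outputs and the attention weights per query."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query with every time step, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w_t * v[j] for w_t, v in zip(w, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy recurrent outputs: 3 time steps, hidden size 2 (hypothetical values).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(hidden, hidden, hidden)
last_step_repr = outputs[-1]  # representation passed on to the dense layer
```

Each row of `weights` sums to one and shows how strongly the corresponding time step attends to every other step.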
(4) CNN-GRU competitive method
The hybrid CNN-GRU architecture, which functions as a competitive forecasting method, simultaneously models spatial and temporal dependencies. In this approach, the irregular graph-based structure of AQMSs is transformed into a regular spatial grid. This model allows for evaluating the impact of neglecting the true spatial topology on the model’s performance. The model’s input is a four-dimensional tensor, where each time step corresponds to a two-dimensional image representing the spatial distribution of AQMS data. For each AQMS, its three geographically nearest neighbors are placed adjacent to it, forming a local region of data. In the first stage, an independent convolutional filter with a kernel size of (1 × 4) is directionally applied along the width axis for each time step (which will later be set to 24). This design enables the model to learn local patterns across horizontally adjacent AQMSs while preserving the vertical structure of the image (height dimension). The extracted spatial features are then combined with auxiliary data (i.e., meteorological variables) and passed through two consecutive GRU layers to capture temporal dependencies. Finally, the GRU output is processed through fully connected layers to produce a two-dimensional output matrix, where each row corresponds to the predictions for a specific AQMS, and each column represents the predicted PM2.5 concentration for one of the next 8 time steps. The structure of this model is shown in Figure S5.
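The neighbor arrangement that feeds the (1 × 4) kernel can be sketched as follows. This is an assumed reading of the description: each AQMS is placed first in a row, followed by its three nearest neighbors, so that one kernel position spans the station and its local region. The helper name `neighbor_rows` and the planar coordinates are hypothetical.

```python
import math


def neighbor_rows(coords, k=3):
    """For each station, return [station, nearest, 2nd nearest, ...] indices."""
    rows = []
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        rows.append([i] + others[:k])
    return rows


# Hypothetical planar station coordinates in km.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0), (10.0, 0.0)]
rows = neighbor_rows(coords)
# Each row has width 4, matching the (1 x 4) convolution kernel.
```

Filling each row with the stations' measurements at a given hour yields one image per time step, which is the regular grid the convolution operates on.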
(5) Distance-based GCN-GRU and wind-driven dynamic GAT-GRU
Two other competitive methods considered to compare the performance in forecasting PM2.5 concentration are based on combining graph convolutional networks (GCNs) or graph attention networks (GATs) with a GRU block. The goal of these methods is to evaluate the performance of using static graphs and dynamic graphs with graph attention mechanisms for modeling spatial dependencies among AQMSs, in comparison to our proposed model. The distance-based GCN-GRU model differs from our proposed model in the formulation of the static adjacency matrix: a connection between two AQMSs is established only if their pairwise distance falls below a predefined threshold, and the corresponding edge weight is assigned as the inverse of their distance, as defined by the following:
$$A_{ij} = \begin{cases} \dfrac{1}{d_{ij}}, & \text{if } i \neq j \text{ and } d_{ij} < R \\ 0, & \text{otherwise} \end{cases}$$
where $A_{ij}$ denotes the weight of the edge between nodes (AQMSs) $i$ and $j$; $d_{ij}$ is the Euclidean distance between these two AQMSs; and $R$ is the fixed distance threshold, set to 5 km.
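The static adjacency rule above can be sketched as follows. The station coordinates are hypothetical planar positions in kilometers, and the function name `distance_adjacency` is introduced here for illustration only.

```python
import math


def distance_adjacency(coords_km, radius_km=5.0):
    """Inverse-distance adjacency with a hard distance threshold."""
    n = len(coords_km)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in this formulation
            d = math.dist(coords_km[i], coords_km[j])
            # Strict inequality, as in the definition; d > 0 guards against
            # division by zero for coincident coordinates.
            if 0.0 < d < radius_km:
                adj[i][j] = 1.0 / d
    return adj


# Hypothetical station positions (km).
stations = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (10.0, 10.0)]
A = distance_adjacency(stations)
```

Stations 1 and 2 are exactly 5 km apart in this toy layout, so they remain unconnected, while the distant fourth station is isolated.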
The wind-driven dynamic GAT-GRU model differs from our proposed model not only in the formulation of the adjacency matrix, but also in its use of GAT instead of GCN. In this model, the graph structure, i.e., the presence of directed edges between nodes, is dynamically determined at each time step based on wind direction. Specifically, an edge from node $i$ to node $j$ is created if the wind at AQMS $i$ points toward AQMS $j$, formalized by the following:
$$\cos\left(\theta_{ij}(t)\right) = \frac{\mathbf{w}_i(t)}{\lVert \mathbf{w}_i(t) \rVert} \cdot \frac{\mathbf{p}_j - \mathbf{p}_i}{\lVert \mathbf{p}_j - \mathbf{p}_i \rVert} > 0$$
Edge weights are computed by the following:
$$A_{ij}(t) = \lVert \mathbf{w}_i(t) \rVert \cdot \cos\left(\theta_{ij}(t)\right) \cdot \exp\left(-\frac{d_{ij}^2}{2\sigma^2}\right)$$
where $\mathbf{w}_i(t)$ denotes the wind vector at AQMS $i$ at time $t$; $\mathbf{p}_i$ and $\mathbf{p}_j$ are the coordinates of AQMSs $i$ and $j$, respectively; and $\sigma$ is a distance smoothing hyperparameter. This formulation yields a sparse, directed, and temporally dynamic graph. The GAT layer assigns learnable attention coefficients to edges, reflecting their relative importance in node feature aggregation and enabling the model to learn an optimal graph structure at each time step. In both models, after extracting spatial features through GCN or GAT layers, the outputs are fed into a GRU block to model temporal dependencies in the data. The final structure of each model includes fully connected layers to generate the final prediction of PM2.5 concentration at different AQMSs. The overall architecture of these models is shown in Figures S6 and S7.
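A single directed edge weight under the wind-driven rule can be sketched as below. The sketch assumes that the wind vector points in the direction the air is moving, that the wind magnitude (speed) scales the weight, and that the distance term decays as a Gaussian; the function name and all numeric values are hypothetical.

```python
import math


def wind_edge_weight(wind_i, p_i, p_j, sigma):
    """Weight of the directed edge i -> j at one time step (0.0 if no edge)."""
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    d = math.hypot(dx, dy)                       # distance between stations
    speed = math.hypot(wind_i[0], wind_i[1])     # magnitude of the wind vector
    if d == 0.0 or speed == 0.0:
        return 0.0                               # direction undefined: no edge
    cos_theta = (wind_i[0] * dx + wind_i[1] * dy) / (speed * d)
    if cos_theta <= 0.0:
        return 0.0                               # wind does not blow toward j
    return speed * cos_theta * math.exp(-d * d / (2.0 * sigma * sigma))


# Hypothetical wind (2, 0) at station i, blowing along the +x axis.
downwind = wind_edge_weight((2.0, 0.0), (0.0, 0.0), (3.0, 0.0), sigma=3.0)
upwind = wind_edge_weight((2.0, 0.0), (3.0, 0.0), (0.0, 0.0), sigma=3.0)
crosswind = wind_edge_weight((2.0, 0.0), (0.0, 0.0), (0.0, 3.0), sigma=3.0)
```

Only the downwind pair receives a non-zero weight, which is what makes the resulting graph sparse and directed; recomputing the weights at every hour makes it dynamic.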

4.4. Experimental Settings

The experiments in this study were conducted on a computing system running Windows 11. The hardware configuration included an NVIDIA GeForce RTX 3050 Ti GPU and 16 GB of RAM. On the software side, Python 3.10.4 was used along with PyTorch 2.4.1 and PyTorch Geometric (PyG) 2.6.1 to implement and train all models. The pollutant concentration data were collected by AQMSs at an hourly sampling rate; therefore, each time step was considered equivalent to one hour. A window of the 24 previous time steps (equivalent to 24 h), referred to as the time step, was used to predict future PM2.5 concentrations. This value was selected using the Fourier transform, which converts the pollutant concentration data from the time domain to the frequency domain. As shown in Figure S8, the frequency corresponding to the 24 h cycle had the largest contribution in this domain. The goal of the prediction was to forecast PM2.5 concentrations simultaneously across all AQMSs at future horizons of [1, 2, 4, 8, 12, 24, 48, 72] hours. All models were trained using the mean squared error (MSE) loss function with a weight decay of 0.0001, a batch size of 64, and a maximum of 400 training epochs. To reduce computational cost and prevent overfitting, early stopping with a patience value of 5 was employed. The remaining hyperparameters of each model were independently tuned using a grid search to identify the combination that yields the best model performance. The final selected hyperparameter values for each model are presented in Table S2.
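The frequency-domain selection of the window length can be illustrated with a small discrete Fourier transform. The sketch below uses a synthetic signal with 24 h and 12 h cycles rather than the Tehran measurements, and a direct O(n²) transform instead of an FFT library; both choices are assumptions made to keep the example self-contained.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest non-zero frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]        # drop the zero-frequency term
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic hourly signal over 10 days: strong daily and weaker 12 h cycle.
signal = [30.0
          + 10.0 * math.sin(2.0 * math.pi * t / 24.0)
          + 3.0 * math.sin(2.0 * math.pi * t / 12.0)
          for t in range(240)]
period = dominant_period(signal)
```

On this synthetic input, the strongest spectral peak sits at the daily cycle, which mirrors the reasoning used to fix the window at 24 time steps.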

5. Results

5.1. PM2.5 Dependency on Other Pollutants

The concentrations of PM2.5 and five other major air pollutants (PM10, NO2, SO2, CO, and O3) were monitored hourly at each AQMS. To obtain a unified time series for the entire study area, the hourly average concentration of each pollutant across all AQMSs was calculated. This process yielded a single continuous hourly time series spanning the study period. Using this aggregated time series, the Copula modeling framework was employed to quantify the statistical dependencies between PM2.5 (as the target pollutant) and the other pollutants.
This analysis showed that PM2.5 exhibits varying statistical dependencies with other pollutants (Table 3). PM10 showed the strongest positive dependence with PM2.5 (τ = 0.665), with symmetric tail dependence ($\lambda_{upper} = \lambda_{lower} = 0.229$), modeled using a Student's t copula. This strong association is primarily attributed to the physical nature of the pollutants, as PM2.5 is a subset of PM10, and to their common anthropogenic sources such as vehicular emissions, road dust resuspension, and industrial activities [60]. This explains their consistent co-occurrence across both mild and extreme pollution levels. NO2 and SO2 exhibited moderate positive dependencies with PM2.5 (with Kendall's τ values of 0.424 and 0.388, respectively), both modeled using a Student's t copula with non-zero upper and lower tail dependencies. These results indicate a significant co-occurrence of high (or low) concentrations of these gaseous pollutants alongside PM2.5, which can be attributed to common anthropogenic sources such as vehicular emissions and industrial activities [9]. CO showed a slightly weaker dependence with PM2.5 (τ = 0.386), with both upper and lower tail dependence coefficients equal to zero, as modeled using a Frank copula. This indicates a general positive association without significant tail dependence. The relationship between O3 and PM2.5 was negative, with a Kendall's τ of −0.262, modeled using a 90° rotated Gumbel copula, which exhibited no tail dependence. This negative association aligns with previous findings in the literature [55,61,62], as the photochemical processes responsible for ozone formation typically occur under atmospheric conditions that differ from those leading to elevated PM2.5 levels.
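Kendall's τ, the rank-based dependence measure reported above, can be computed by counting concordant and discordant pairs. The sketch below implements the tau-a variant, which ignores tie corrections; whether the study used this variant is not stated, and the sample concentrations are hypothetical. Fitting the copula families themselves is not shown.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1      # both series move in the same direction
            elif s < 0:
                discordant += 1      # the series move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical hourly city-wide averages, for illustration only.
pm25 = [12.0, 18.0, 25.0, 31.0, 40.0]
pm10 = [30.0, 41.0, 39.0, 62.0, 75.0]
o3 = [50.0, 44.0, 46.0, 30.0, 21.0]
tau_pm10 = kendall_tau(pm25, pm10)
tau_o3 = kendall_tau(pm25, o3)
```

A positive value indicates that the two pollutants tend to rise and fall together, while a negative value indicates opposing behavior, as reported for O3.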

5.2. Spatial Clustering of AQMSs

Spatial clustering analysis using AHC was conducted on the time series of air pollutants recorded at AQMSs within the study area. Internal validation indices, including the Silhouette score, Dunn index, and Calinski–Harabasz index, were employed to determine the optimal number of clusters. The validity of the resulting clusters was assessed using the q-index and the non-central F-test. The findings revealed that for O3, CO, and SO2, the temporal behavior of pollutant concentrations was not significantly influenced by the spatial location of the AQMSs, resulting in the formation of a single cluster encompassing all AQMSs. However, for the pollutants NO2, PM10, and PM2.5, the analysis showed that their temporal concentration patterns were influenced by the spatial locations of the AQMSs, resulting in the formation of five, four, and five clusters, respectively, each exhibiting distinct spatial patterns, as shown in Figure 4a–c.
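Of the internal validation indices mentioned above, the Silhouette score is the simplest to sketch. The example assumes a precomputed pairwise distance matrix between station time series; how that distance is defined in the study is not specified here, so the toy one-dimensional features and the helper name `silhouette_score` are assumptions.

```python
def silhouette_score(dist, labels):
    """Mean silhouette over all items, given a pairwise distance matrix."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy features standing in for station time-series summaries.
features = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(u - v) for v in features] for u in features]
good = silhouette_score(dist, [0, 0, 1, 1])
bad = silhouette_score(dist, [0, 1, 0, 1])
```

Evaluating the score for each candidate number of clusters and keeping the highest value is the usual way such an index guides the choice of cluster count.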
For NO2 (Figure 4a), five distinct clusters were identified. Cluster 1 (blue) includes AQMSs located in the central–southwestern urban–industrial zones with high population and emission density; Cluster 2 (green) consists of AQMSs situated in eastern peripheral areas as well as one central station, reflecting both regional transport and localized urban effects; Cluster 3 (maroon) contains only the Rey AQMS, located in the southern part of the city, likely influenced by nearby power plants and industrial facilities; Cluster 4 (pastel green) encompasses AQMSs in the northern part of the city, where urban infrastructure and topographic influences may shape NO2 patterns; and Cluster 5 (yellow) includes a single centrally located AQMS, suggesting site-specific temporal variability. For PM10 (Figure 4b), four distinct clusters were identified, with most AQMSs within each cluster exhibiting close spatial proximity. This spatial coherence suggests that local environmental conditions and emission sources play a dominant role in shaping PM10 distribution patterns. Nevertheless, two AQMSs deviated from the general spatial clustering pattern. Mantagheh 22 AQMS, despite being located in the western part of the city, was grouped with AQMSs in the southern region, suggesting similar temporal trends likely driven by shared emission sources, long-range pollutant transport, or dominant wind directions. Piroozi AQMS, although geographically situated in the eastern zone, displayed temporal behavior more aligned with central urban stations, indicating that urban activity intensity or localized environmental conditions exerted a stronger influence than its physical location. For PM2.5 (Figure 4c), five distinct clusters were identified that, similar to PM10, mostly consist of AQMSs with significant spatial proximity within each cluster, reflecting the strong influence of local environmental conditions and shared pollutant sources in shaping similar temporal patterns. 
However, some deviations from this pattern were observed: Mantagheh 22 AQMS, despite being located in western Tehran, was grouped in Cluster 2 (dark green) along with AQMSs from central areas, indicating similar temporal trends likely influenced by pollutant transport or common pollution sources such as industrial activities or prevailing wind directions. Additionally, Masoudieh 22 AQMS, although situated at the eastern boundary of Tehran, was grouped with northern AQMSs in Cluster 3 (yellow), reflecting temporal alignment with northern city patterns. Furthermore, Piroozi AQMS, also geographically located at the eastern boundary of Tehran, was grouped in Cluster 4 (light green) alongside central city AQMSs, suggesting that the intensity of urban activities or local environmental conditions had a stronger influence than its physical location.
Based on the dependency analysis (Section 5.1) and the clustering analysis (Section 5.2), among the six main pollutants measured by the AQMSs, NO2 and PM10 were selected as input variables to the model in addition to PM2.5 as the target pollutant. They were chosen because of their non-zero dependency with PM2.5 across all three parts of the distribution (central, upper tail, and lower tail) and because of their clustered spatial patterns across the AQMSs in Tehran.

5.3. Comparison of Model Performance Across Spatial Clustering of AQMSs

Table 4 summarizes the performance of the proposed ClusLite-STGCN-GRU and seven baseline models on the test set across forecasting horizons from +1 to +72 h, using the validation metrics introduced in Section 4.2. Table S3 further details the performance across the training, validation, and test datasets. Additionally, the correlation between observed and predicted PM2.5 values for 1, 2, 4, 8, 12, 24, 48, and 72 h ahead forecasting across various models on the training and test datasets is shown in Figures S9–S16. The best results are marked in bold. As can be seen, the non-hybrid baseline models (i.e., the first four competitive methods) exhibit lower forecasting accuracy than the hybrid architectures incorporating CNN, GCN, or GAT components (i.e., the remaining three competitive methods and the proposed model). These findings align with the main hypothesis of this study, which assumes that air pollutant concentrations and dispersion patterns are governed not only by the temporal dynamics of individual AQMS observations but also by spatial interactions among AQMSs. Additionally, although prediction errors increase with longer forecasting horizons in all models, our proposed model exhibits a lower rate of error growth compared to the baseline models.
Among the non-hybrid baseline models, GRU demonstrated the best balance between simplicity, computational efficiency, and forecasting accuracy across all horizons. While incorporating multi-head attention into GRU led to moderate improvements in short-term predictions (up to 8 h), it performed slightly worse than GRU in long-term forecasts, despite the added complexity. In contrast, both LSTM and LSTM with multi-head attention consistently underperformed compared to their GRU counterparts, indicating limited benefits from the added complexity under the current configuration.
Among the hybrid baseline models, CNN-GRU, which uses CNNs to extract spatial dependencies, performed worse than graph-based models (GCN- or GAT-based). This suggests that converting the irregular distribution of AQMSs into a regular Euclidean grid—as required by CNNs—results in a loss of spatial precision, thereby reducing forecasting accuracy. Nevertheless, it still outperformed purely temporal models, highlighting the importance of incorporating spatial features, even when the spatial representation is suboptimal.
Among graph-based baseline models, wind-driven dynamic GAT-GRU showed better performance in short-term predictions (up to 8 h). This performance is likely due to the model’s ability to adaptively and dynamically model spatiotemporal relationships influenced by wind, highlighting the crucial role of wind in rapid dispersion of pollutants over short periods. However, in longer-term horizons (from 12 h onward), the advantage of wind-based model diminished. In these intervals, performance of wind-driven dynamic GAT-GRU decreased compared to the ClusLite-STGCN-GRU model, indicating that the effect of wind plays a less significant role in longer horizons, where unpredictable factors such as cumulative uncertainty, atmospheric mixing, and broader environmental variables are more influential. This trend confirms that although wind is an important factor in accuracy of short-term forecasts, broader temporal and spatial factors dominate in long-term predictions. In contrast, the clustering approach in the ClusLite-STGCN-GRU model, which groups AQMSs based on temporal and behavioral similarities of pollutant patterns, demonstrated significant effectiveness particularly in medium-term (+8 to +12 h) and long-term (+24 to +72 h) horizons. By simplifying complex spatial relationships and focusing on similar behavioral patterns, this model was able to maintain stability and appropriate accuracy in predictions.

5.4. Comparison Results of Model Complexity

In this subsection, the complexity of the ClusLite-STGCN-GRU model is further analyzed. Given that the proposed model is a hybrid graph-based deep learning model, the comparison was limited to two similar deep learning approaches: distance-based GCN-GRU and wind-driven dynamic GAT-GRU. Table 5 presents the comparison results from various aspects of computational complexity, including training and inference time, memory consumption, number of floating-point operations (FLOPs), and number of parameters. Among them, the forward and backward propagation time (FBP), measured in seconds (s), evaluates the offline training speed per epoch, while the forward propagation time (FP), in milliseconds (ms), reflects the online inference speed per sample. Total epoch time (seconds) accounts for the entire duration of one training epoch, including computation and data handling. FLOPs, a unitless measure, indicate the number of floating-point operations needed to process one sample—a lower value means a lighter, faster model. The number of parameters, also unitless, represents the count of trainable weights and biases, reflecting model capacity. Inference memory allocated, reported in megabytes (MB), shows the RAM or GPU memory required during prediction.
Overall, the results in Table 5 highlight the superior computational efficiency of the ClusLite-STGCN-GRU model compared to the other two graph-based deep learning baselines. It achieves 12% faster training per epoch compared to distance-based GCN-GRU and is 65.7% faster than wind-driven dynamic GAT-GRU. In terms of online inference speed, ClusLite-STGCN-GRU is significantly faster—66.8% and 79.7% faster, respectively—than the two baselines. Moreover, it reduces inference memory usage by 82.72% compared to wind-driven dynamic GAT-GRU, and its FLOP count is up to 84.29% lower, indicating substantially lighter computations. Although its number of parameters is slightly higher (about 1.8% more than distance-based GCN-GRU), this does not affect performance; in fact, the model still achieves a 15.3% reduction in total epoch time over distance-based GCN-GRU and a 65.7% reduction compared to wind-driven dynamic GAT-GRU. These results confirm its overall advantage in time efficiency, scalability, and suitability for real-time deployment.

6. Discussion

Table 6 compares the results of our study with those of previous research conducted in the same study area, Tehran. The results presented correspond to daily PM2.5 predictions. Although all the mentioned studies focus on PM2.5 prediction, they differ in the datasets, time periods, and prediction models used, which limits the direct comparability of the results. Nevertheless, with these limitations acknowledged, the comparison indicates that the proposed method achieves improved performance in PM2.5 prediction.
Nabavi et al. [63] employed random forest to estimate daily PM2.5 concentrations in Tehran using satellite-based 10 km resolution merged dark target and deep blue (DB_DT) aerosol optical depth (AOD) along with meteorological data from 2011 to 2016. To enhance the relationship between satellite AOD and surface-level PM2.5, the study incorporated relative humidity adjustments and normalized AOD values using planetary boundary layer height (PBLH), supported by aerosol layer height (ALH) data derived from 159 CALIPSO profiles. While the model achieved a moderate level of accuracy (RMSE = 17.52 μg/m3, R2 = 0.68), it exhibited reduced performance in the summer and in the northern and eastern regions of Tehran, likely due to the absence of variables representing secondary aerosol formation and long-range pollutant transport mechanisms. Zamani Joharestani et al. [64] improved upon this by implementing XGBoost on data collected from 2015 to 2018, incorporating a comprehensive set of 23 features including ground-measured PM2.5, satellite AOD at 3 km resolution, meteorological parameters, and geographical information. Their model achieved an RMSE of 13.58 μg/m3 and an MAE of 9.93 μg/m3, with a maximum R2 of 0.81 after eliminating irrelevant features. However, they found that including satellite-derived AOD reduced model performance (R2 dropped to 0.63–0.67), suggesting that AOD may not significantly enhance PM2.5 prediction in dense urban environments such as Tehran, particularly when high-resolution ground and meteorological data are available.
In the most similar study, Faraji et al. [47] proposed a hybrid deep learning model that integrates three-dimensional convolutional neural networks (3D CNNs) with gated recurrent units (GRUs) to simultaneously capture spatiotemporal dependencies in PM2.5 data. The model was applied to air quality data collected from multiple AQMSs across Tehran between 2016 and 2019, and its performance was directly compared with machine learning models such as SVR and standalone deep learning models including ANN, LSTM, and GRU. For daily predictions, their proposed model achieved an RMSE of 15.21 µg/m3 and an MAE of 12.00 µg/m3, outperforming other models. However, the relatively higher error values compared to some other studies are likely due to the greater complexity of the data during the study period—such as stronger fluctuations in pollutant concentrations or limited availability of high-quality auxiliary data—rather than a limitation of the model architecture itself. In comparison, our proposed ClusLite-STGCN-GRU model, applied to more recent data from 2019 to 2022, achieved an RMSE of 13.45 μg/m3 and an MAE of 9.24 μg/m3, demonstrating the best performance among all studies conducted in Tehran. Although variations in study periods, features, and data quality limit direct comparison, the improved performance of our model can be attributed to the effective integration of spatial clustering with spatiotemporal graph convolutional networks. This approach better captures the complex spatial and temporal dependencies among AQMSs, resulting in enhanced accuracy of pollutant modeling, as confirmed by comparisons with competitive baseline models.

7. Conclusions

The irregular distribution of AQMSs and the spatiotemporal dynamics of air pollution have made graph-based spatiotemporal modeling an effective approach for accurate air quality forecasting. In such models, node features and edge weights vary over time, resulting in dynamic graph structures. However, many of the existing models rely on techniques such as time-varying adjacency matrices, adjacency matrices learned during training, or the combination of multiple adjacency matrices to achieve high accuracy. Although these approaches achieve good results, they often lead to increased computational complexity, implementation challenges, and reduced efficiency in practical applications. To address these limitations, a new lightweight spatiotemporal prediction model, ClusLite-STGCN-GRU, is proposed. The experimental results show that our proposed model, by simplifying the graph structure through clustering of AQMSs and feature selection based on spatiotemporal dependencies, achieves an effective balance between prediction accuracy, computational efficiency, and implementation simplicity. In medium- and long-term forecasting horizons (8 to 72 h), the model shows more stable and accurate performance compared to more complex graph-based models. Although there is a decrease in short-term (1 to 8 h) accuracy compared to models with dynamic and complex structures such as wind-driven dynamic GAT-GRU, the proposed model still delivers comparable and acceptable performance. From a computational standpoint, ClusLite-STGCN-GRU demonstrates superior efficiency, with reductions of up to 65.7% in training time, 82.7% in memory usage, and 84.3% in FLOPs compared to baseline models, making it well-suited for real-time applications. These results confirm that the proposed model establishes an effective balance between accuracy, simplicity, and computational efficiency, making it a practical choice for spatiotemporal air pollution forecasting.
To further evaluate the generalizability and robustness of the proposed model, future studies should investigate its performance across diverse geographic regions with varying topographical, climatic, and pollution-source characteristics. In addition, the effective deployment of this model in real-world environments, its integration with intelligent big data platforms, and its role in the development of comprehensive smart urban management systems are of great importance. To build such systems, leveraging the complementary strengths of these two types of models—dynamic models such as wind-driven dynamic GAT-GRU and clustering-based models such as ClusLite-STGCN-GRU—can prove beneficial. While dynamic models like wind-driven dynamic GAT-GRU offer higher accuracy for short-term predictions, their computational complexity may hinder deployment in real-time and rapid-response scenarios. Optimizing and simplifying such models can facilitate their use in short-term forecasting tasks. In contrast, clustering-based models such as ClusLite-STGCN-GRU are better suited for medium- to long-term forecasting, as they capture stable patterns based on temporal and behavioral similarities among monitoring stations. This performance distinction highlights the potential for designing hybrid intelligent systems that integrate both approaches at different levels of decision-making. Moreover, given the generalizable graph-based structure of the proposed model, it can also be applied to other domains such as weather forecasting, urban traffic flow analysis, and transportation demand estimation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/land14081589/s1, Figure S1: GRU predictive model framework; Figure S2: LSTM predictive model framework; Figure S3: GRU with multi-head attention predictive model framework; Figure S4: LSTM with multi-head attention predictive model framework; Figure S5: CNN-GRU predictive model framework; Figure S6: distance-based GCN-GRU predictive model framework; Figure S7: wind-driven dynamic GAT-GRU predictive model framework; Figure S8: (a) hourly PM2.5 concentration signal in the time domain; (b) frequency domain representation highlighting dominant 24 h, 12 h, and 6 h cycles; Figures S9–S16: correlation between observed and predicted PM2.5 values in 1, 2, 4, 8, 12, 24, 48, and 72 h ahead forecasting, respectively, by different models on the training and test datasets: (a) GRU, (b) LSTM, (c) GRU with multi-head attention, (d) LSTM with multi-head attention, (e) CNN-GRU, (f) distance-based GCN-GRU, (g) wind-driven dynamic GAT-GRU, (h) ClusLite-STGCN-GRU; Table S1: features of the papers devoted to the implementation of spatiotemporal hybrid deep learning models for air quality prediction; Table S2: details of the experimental settings; Table S3: model results on the three datasets: training, validation, and test.

Author Contributions

Conceptualization, M.T.A.; methodology, M.T.A., A.A.A. and F.R.; data curation, M.T.A.; formal analysis, M.T.A. and A.A.A.; investigation, M.T.A., A.A.A. and F.R.; project administration, M.T.A.; validation, M.T.A., A.A.A. and F.R.; visualization, M.T.A. and F.R.; writing—original draft preparation, M.T.A.; writing—review and editing, A.A.A. and F.R.; supervision, A.A.A. and F.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this paper are available at the following GitHub repository: https://github.com/m-t-abbasi/ClusLite-STGCNGRU, accessed on 31 July 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, J.; Fu, M.; Wang, L.; Liang, Y.; Tang, F.; Li, S.; Wu, C. Impact of Urban Shrinkage on Pollution Reduction and Carbon Mitigation Synergy: Spatial Heterogeneity and Interaction Effects in Chinese Cities. Land 2025, 14, 537.
  2. World Health Organization (WHO). Available online: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed on 31 July 2025).
  3. Habibi, R.; Alesheikh, A.A.; Mohammadinia, A.; Sharif, M. An Assessment of Spatial Pattern Characterization of Air Pollution: A Case Study of CO and PM2.5 in Tehran, Iran. ISPRS Int. J. Geo-Inf. 2017, 6, 270.
  4. Liu, F.; Jia, S.; Ma, L.; Lu, S. Spatiotemporal Dynamic Evolution of PM2.5 Exposure from Land Use Changes: A Case Study of Gansu Province, China. Land 2025, 14, 795.
  5. Song, Y.; Mao, H.; Li, H. Spatio-Temporal Modeling for Air Quality Prediction Based on Spectral Graph Convolutional Network and Attention Mechanism. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; IEEE: New York, NY, USA, 2022; pp. 1–9.
  6. Liu, Z.; Fang, Z.; Hu, Y. A Deep Learning-Based Hybrid Method for PM2.5 Prediction in Central and Western China. Sci. Rep. 2025, 15, 10080.
  7. Guan, Q.; Wang, J.; Ren, S.; Gao, H.; Liang, Z.; Wang, J.; Yao, Y. Predicting Short-Term PM2.5 Concentrations at Fine Temporal Resolutions Using a Multi-Branch Temporal Graph Convolutional Neural Network. Int. J. Geogr. Inf. Sci. 2024, 38, 778–801.
  8. Wang, Z.; Hu, K.; Wang, Z.; Yang, B.; Chen, Z. Impact of Urban Neighborhood Morphology on PM2.5 Concentration Distribution at Different Scale Buffers. Land 2024, 14, 7.
  9. Faridi, S.; Niazi, S.; Yousefian, F.; Azimi, F.; Pasalari, H.; Momeniha, F.; Mokammel, A.; Gholampour, A.; Hassanvand, M.S.; Naddafi, K. Spatial Homogeneity and Heterogeneity of Ambient Air Pollutants in Tehran. Sci. Total Environ. 2019, 697, 134123.
  10. Mun, H.; Li, M.; Jung, J. Spatial-Temporal Characteristics and Influencing Factors of Particulate Matter: Geodetector Approach. Land 2022, 11, 2336.
  11. Alharbi, B.H.; Alduwais, A.K.; Alhudhodi, A.H. An Analysis of the Spatial Distribution of O3 and Its Precursors during Summer in the Urban Atmosphere of Riyadh, Saudi Arabia. Atmos. Pollut. Res. 2017, 8, 861–872.
  12. Hu, Y.; Li, Q.; Shi, X.; Yan, J.; Chen, Y. Domain Knowledge-Enhanced Multi-Spatial Multi-Temporal PM2.5 Forecasting with Integrated Monitoring and Reanalysis Data. Environ. Int. 2024, 192, 108997.
  13. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat. Deep Learning and Process Understanding for Data-Driven Earth System Science. Nature 2019, 566, 195–204.
  14. Abbasi, M.T.; Alesheikh, A.A.; Lotfata, A.; Azizi, Z. Hybrid Graph Convolutional Networks for Air Quality Prediction: A Systematic Review of Foundations, Challenges, and Opportunities. Int. J. Environ. Sci. Technol. 2025.
  15. Chen, Q.; Ding, R.; Mo, X.; Li, H.; Xie, L.; Yang, J. An Adaptive Adjacency Matrix-Based Graph Convolutional Recurrent Network for Air Quality Prediction. Sci. Rep. 2024, 14, 4408.
  16. Hu, W.; Zhang, Z.; Zhang, S.; Chen, C.; Yuan, J.; Yao, J.; Zhao, S.; Guo, L. Learning Spatiotemporal Dependencies Using Adaptive Hierarchical Graph Convolutional Neural Network for Air Quality Prediction. J. Clean. Prod. 2024, 459, 142541.
  17. Zeng, Q.; Cao, Y.; Fan, M.; Chen, L.; Zhu, H.; Wang, L.; Li, Y.; Liu, S. Fine Particulate Matter Concentration Prediction Based on Hybrid Convolutional Network with Aggregated Local and Global Spatiotemporal Information: A Case Study in Beijing and Chongqing. Atmos. Environ. 2024, 333, 120647.
  18. Liu, H.; Han, Q.; Lu, D.; Sheng, J.; Sui, S.; Sun, H. Fine-Grained Graph Convolutional Network with Learning-Based Bi-Relational Graph for Spatiotemporal Forecasting. Expert Syst. Appl. 2025, 265, 125959.
  19. Zhao, Q.; Liu, J.; Yang, X.; Qi, H.; Lian, J. Spatiotemporal PM2.5 Forecasting via Dynamic Geographical Graph Neural Network. Environ. Model. Softw. 2025, 186, 106351.
  20. Huang, Y.; Han, F.; Feng, Q. A Novel Model for Predicting PM2.5 Concentrations Utilizing Graph Convolutional Networks and Transformer. IEEE Access 2025.
  21. Zeng, Q.; Zeng, H.; Fan, M.; Chen, L.; Tao, J.; Zhang, Y.; Zhu, H.; Liu, S.; Zhu, Y. Adaptive Graph-Generating Jump Network for Air Quality Prediction Based on Improved Graph Convolutional Network. Atmos. Pollut. Res. 2025, 16, 102488.
  22. Wang, P.; Zhang, H.; Cheng, S.; Zhang, T.; Lu, F.; Wu, S. A Lightweight Spatiotemporal Graph Dilated Convolutional Network for Urban Sensor State Prediction. Sustain. Cities Soc. 2024, 101, 105105.
  23. Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban Computing: Concepts, Methodologies, and Applications. ACM Trans. Intell. Syst. Technol. 2014, 5, 1–55.
  24. Van, N.H.; Van Thanh, P.; Tran, D.N.; Tran, D.-T. A New Model of Air Quality Prediction Using Lightweight Machine Learning. Int. J. Environ. Sci. Technol. 2023, 20, 2983–2994.
  25. Abbasi, M.T.; Alesheikh, A.A.; Jafari, A.; Lotfata, A. Spatial and Temporal Patterns of Urban Air Pollution in Tehran with a Focus on PM2.5 and Associated Pollutants. Sci. Rep. 2024, 14, 25150.
  26. Kalankesh, L.R.; Khajavian, N.; Soori, H.; Vaziri, M.H.; Saeedi, R.; Hajighasemkhan, A. Association Metrological Factors with Covid-19 Mortality in Tehran, Iran (2020-2021). Int. J. Environ. Health Res. 2024, 34, 1725–1736.
  27. Taksibi, F.; Khajehpour, H.; Saboohi, Y. On the Environmental Effectiveness Analysis of Energy Policies: A Case Study of Air Pollution in the Megacity of Tehran. Sci. Total Environ. 2020, 705, 135824.
  28. Zhu, J.; Ge, Z.; Song, Z.; Gao, F. Review and Big Data Perspectives on Robust Data Mining Approaches for Industrial Process Modeling with Outliers and Missing Data. Annu. Rev. Control 2018, 46, 107–133.
  29. Singh, D.; Singh, B. Feature Wise Normalization: An Effective Way of Normalizing Data. Pattern Recognit. 2022, 122, 108307.
  30. Jiang, W.; Luo, J. Graph Neural Network for Traffic Forecasting: A Survey. Expert Syst. Appl. 2022, 207, 117921.
  31. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81.
  32. Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph Convolutional Networks: A Comprehensive Review. Comput. Soc. Networks 2019, 6, 1–23.
  33. Wu, G.; Al-qaness, M.A.A.; Al-Alimi, D.; Dahou, A.; Abd Elaziz, M.; Ewees, A.A. Hyperspectral Image Classification Using Graph Convolutional Network: A Comprehensive Review. Expert Syst. Appl. 2024, 257, 125106.
  34. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral Networks and Locally Connected Networks on Graphs. arXiv 2013, arXiv:1312.6203.
  35. Henaff, M.; Bruna, J.; LeCun, Y. Deep Convolutional Networks on Graph-Structured Data. arXiv 2015, arXiv:1506.05163.
  36. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. Adv. Neural Inf. Process. Syst. 2016, 29.
  37. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
  38. Shin, J.; Yeon, K.; Kim, S.; Sunwoo, M.; Han, M. Comparative Study of Markov Chain with Recurrent Neural Network for Short Term Velocity Prediction Implemented on an Embedded System. IEEE Access 2021, 9, 24755–24767.
  39. Hopfield, J.J. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci. USA 1982, 79, 2554–2558.
  40. Farmanifard, S.; Alesheikh, A.A.; Sharif, M. A Context-Aware Hybrid Deep Learning Model for the Prediction of Tropical Cyclone Trajectories. Expert Syst. Appl. 2023, 231, 120701.
  41. Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211.
  42. Hamedi, H.; Alesheikh, A.A.; Panahi, M.; Lee, S. Landslide Susceptibility Mapping Using Deep Learning Models in Ardabil Province, Iran. Stoch. Environ. Res. Risk Assess. 2022, 36, 4287–4310.
  43. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  44. Fan, J.; Li, R.; Zhao, M.; Pan, X. A BiLSTM-Based Hybrid Ensemble Approach for Forecasting Suspended Sediment Concentrations: Application to the Upper Yellow River. Land 2025, 14, 1199.
  45. Hakim, W.L.; Nur, A.S.; Rezaie, F.; Panahi, M.; Lee, C.-W.; Lee, S. Convolutional Neural Network and Long Short-Term Memory Algorithms for Groundwater Potential Mapping in Anseong, South Korea. J. Hydrol. Reg. Stud. 2022, 39, 100990.
  46. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078.
  47. Faraji, M.; Nadi, S.; Ghaffarpasand, O.; Homayoni, S.; Downey, K. An Integrated 3D CNN-GRU Deep Learning Method for Short-Term Prediction of PM2.5 Concentration in Urban Environment. Sci. Total Environ. 2022, 834, 155324.
  48. Li, M.; Yan, Y. Comparative Analysis of Machine-Learning Models for Soil Moisture Estimation Using High-Resolution Remote-Sensing Data. Land 2024, 13, 1331.
  49. Xiong, B.; Tang, J.; Li, Y.; Zhou, P.; Zhang, S.; Zhang, X.; Dong, C.; Gooi, H.B. A Flow-Rate-Aware Data-Driven Model of Vanadium Redox Flow Battery Based on Gated Recurrent Unit Neural Network. J. Energy Storage 2023, 74, 109537.
  50. Szramowiat-Sala, K.; Marczak-Grzesik, M.; Karczewski, M.; Kistler, M.; Giebl, A.K.; Styszko, K. Chemical Investigation of Polycyclic Aromatic Hydrocarbon Sources in an Urban Area with Complex Air Quality Challenges. Sci. Rep. 2025, 15, 6987.
  51. Yang, L.; Wang, G.; Wang, Y.; Wang, Y.; Ma, Y.; Zhang, X. A Rapid Computational Method for Quantifying Inter-Regional Air Pollutant Transport Dynamics. Atmosphere 2025, 16, 163.
  52. Joe, H. Dependence Modeling with Copulas; CRC Press: Boca Raton, FL, USA, 2014; ISBN 1466583223.
  53. Lyu, M.-Z.; Fei, Z.-J.; Feng, D.-C. Copula-Based Cloud Analysis for Seismic Fragility and Its Application to Nuclear Power Plant Structures. Eng. Struct. 2024, 305, 117754.
  54. Pan, S.; Joe, H. Predicting Times to Event Based on Vine Copula Models. Comput. Stat. Data Anal. 2022, 175, 107546.
  55. Zhang, J.; Li, Y.; Liu, C.; Wu, B.; Shi, K. A Study of Cross-Correlations between PM2.5 and O3 Based on Copula and Multifractal Methods. Phys. A Stat. Mech. Its Appl. 2022, 589, 126651.
  56. Zhang, Y. Dynamic Effect Analysis of Meteorological Conditions on Air Pollution: A Case Study from Beijing. Sci. Total Environ. 2019, 684, 178–185.
  57. Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A Hybrid Model for Spatiotemporal Forecasting of PM2.5 Based on Graph Convolutional Neural Network and Long Short-Term Memory. Sci. Total Environ. 2019, 664, 1–10.
  58. Zhou, H.; Zhang, F.; Du, Z.; Liu, R. A Theory-Guided Graph Networks Based PM2.5 Forecasting Method. Environ. Pollut. 2022, 293, 118569.
  59. Wang, H.; Zhang, L.; Wu, R.; Cen, Y. Spatio-Temporal Fusion of Meteorological Factors for Multi-Site PM2.5 Prediction: A Deep Learning and Time-Variant Graph Approach. Environ. Res. 2023, 239, 117286.
  60. Pillai, P.S.; Babu, S.S.; Moorthy, K.K. A Study of PM, PM10 and PM2.5 Concentration at a Tropical Coastal Station. Atmos. Res. 2002, 61, 149–167.
  61. Wang, P.; Guo, H.; Hu, J.; Kota, S.H.; Ying, Q.; Zhang, H. Responses of PM2.5 and O3 Concentrations to Changes of Meteorology and Emissions in China. Sci. Total Environ. 2019, 662, 297–306.
  62. Chuang, M.-T.; Chou, C.C.-K.; Lin, C.-Y.; Lee, J.-H.; Lin, W.-C.; Chen, Y.-Y.; Chang, C.-C.; Lee, C.-T.; Kong, S.S.-K.; Lin, T.-H. A Numerical Study of Reducing the Concentration of O3 and PM2.5 Simultaneously in Taiwan. J. Environ. Manag. 2022, 318, 115614.
  63. Nabavi, S.O.; Haimberger, L.; Abbasi, E. Assessing PM2.5 Concentrations in Tehran, Iran, from Space Using MAIAC, Deep Blue, and Dark Target AOD and Machine Learning Algorithms. Atmos. Pollut. Res. 2019, 10, 889–903.
  64. Zamani Joharestani, M.; Cao, C.; Ni, X.; Bashir, B.; Talebiesfandarani, S. PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data. Atmosphere 2019, 10, 373.
Figure 1. (a) Study area; (b) distribution of AQMSs.
Figure 2. Architectures of (a) Elman RNN, (b) LSTM, and (c) GRU.
Figure 3. The framework of our proposed model for PM2.5 prediction.
Figure 4. Cluster patterns of (a) NO2, (b) PM10, and (c) PM2.5.
Table 1. Descriptive statistics of variables.

| Variable | Unit | Range | Mean | St. Dev. |
|---|---|---|---|---|
| PM2.5 | μg/m³ | [0.167, 249.724] | 30.680 | 20.309 |
| PM10 | μg/m³ | [0.677, 697.977] | 76.780 | 46.340 |
| SO2 | ppb | [0.051, 142.400] | 7.561 | 6.472 |
| NO2 | ppb | [0.565, 301.055] | 48.597 | 22.942 |
| O3 | ppb | [0.0680, 213.035] | 20.702 | 20.204 |
| CO | ppm | [0.0075, 15.7300] | 1.871 | 1.269 |
| Temperature | °C | [−7.631, 40.888] | 17.770 | 10.191 |
| Pressure | mbar | [956.452, 1037.462] | 1011.236 | 8.928 |
| Humidity | % | [2.479, 99.147] | 36.656 | 20.766 |
| Dew point temperature | °C | [−26.819, 24.372] | 0.061 | 5.308 |
| Wind_x | m/s | [−11.653, 7.269] | −0.874 | 1.408 |
| Wind_y | m/s | [−18.558, 9.383] | −0.188 | 1.936 |
Table 2. Copula families and their dependence characteristics.

| Copula Family | Tail Dependence | Symmetry | Type of Dependence Captured | Typical Use Case in Air Pollution |
|---|---|---|---|---|
| Clayton | Lower tail (strong) | Asymmetric | Captures stronger association in low extremes | Simultaneous decrease in pollutant concentrations |
| Gumbel | Upper tail (strong) | Asymmetric | Captures stronger association in high extremes | Joint increase in pollutants under severe pollution events |
| t-Student | Both tails (moderate/strong) | Symmetric | Models symmetric tail dependence | Extreme events with co-movements in both directions |
| Gaussian | None (only linear correlation) | Symmetric | Captures linear correlation but no tail dependence | Mild/moderate dependence under normal conditions |
| Frank | No tail dependence (moderate) | Symmetric | Captures moderate dependence across the whole range | Balanced and non-extreme pollutant interactions |
Table 3. Copula-based dependence measures between PM2.5 and other pollutants.

| Pollutant | Fitted Copula Model | τ | λ_upper | λ_lower |
|---|---|---|---|---|
| O3 | Rotated Gumbel 90° | −0.262 | 0 | 0 |
| CO | Frank | 0.386 | 0 | 0 |
| NO2 | t-Student | 0.424 | 0.136 | 0.136 |
| SO2 | t-Student | 0.388 | 0.189 | 0.189 |
| PM10 | t-Student | 0.665 | 0.229 | 0.229 |
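
The tail-dependence entries in Tables 2 and 3 can be related to Kendall's τ through standard closed forms. The following is a minimal Python sketch, not the authors' code: it assumes the textbook one-parameter relations for the Clayton and Gumbel families and the sine relation between τ and the correlation parameter of elliptical (Gaussian, t-Student) copulas. All function names are hypothetical, and τ = 0.5 is an illustrative value rather than a fitted one. The tail coefficients of the t-Student copula in Table 3 additionally depend on the degrees of freedom, which are not reported in this section, so they are not recomputed here.

```python
import math


def clayton_from_tau(tau):
    """Return (theta, lower-tail dependence) of a Clayton copula, 0 < tau < 1."""
    theta = 2.0 * tau / (1.0 - tau)      # standard tau-to-theta relation
    lower_tail = 2.0 ** (-1.0 / theta)   # the upper tail is zero for Clayton
    return theta, lower_tail


def gumbel_from_tau(tau):
    """Return (theta, upper-tail dependence) of a Gumbel copula, 0 <= tau < 1."""
    theta = 1.0 / (1.0 - tau)
    upper_tail = 2.0 - 2.0 ** (1.0 / theta)  # the lower tail is zero for Gumbel
    return theta, upper_tail


def elliptical_rho_from_tau(tau):
    """Correlation parameter of a Gaussian or t-Student copula from Kendall's tau."""
    return math.sin(math.pi * tau / 2.0)


# Illustrative value only (an assumption, not taken from Table 3).
tau_example = 0.5
print("Clayton (theta, lambda_lower):", clayton_from_tau(tau_example))
print("Gumbel  (theta, lambda_upper):", gumbel_from_tau(tau_example))

# Kendall's tau reported for the PM2.5-PM10 pair in Table 3.
print("t-Student rho for tau = 0.665:", elliptical_rho_from_tau(0.665))
```

The sketch only illustrates why Clayton and Gumbel are described as asymmetric in Table 2: each has a non-zero coefficient in one tail only, whereas the t-Student rows of Table 3 report equal upper and lower values.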
Table 4. Results of models on the test dataset.

| Model | Metric | +1 h | +2 h | +4 h | +8 h | +12 h | +24 h | +48 h | +72 h |
|---|---|---|---|---|---|---|---|---|---|
| GRU | IA | 0.886 | 0.835 | 0.761 | 0.708 | 0.647 | 0.606 | 0.543 | 0.513 |
| | R2 | 0.688 | 0.585 | 0.463 | 0.360 | 0.319 | 0.277 | 0.229 | 0.202 |
| | MAE (μg/m³) | 6.761 | 7.912 | 9.187 | 10.121 | 10.432 | 10.655 | 10.976 | 11.175 |
| | RMSE (μg/m³) | 10.208 | 11.780 | 13.438 | 14.709 | 15.170 | 15.647 | 16.194 | 16.550 |
| LSTM | IA | 0.868 | 0.812 | 0.737 | 0.662 | 0.629 | 0.578 | 0.527 | 0.504 |
| | R2 | | | | | | | | |
| | MAE (μg/m³) | 6.911 | 8.008 | 9.186 | 10.072 | 10.337 | 10.652 | 10.935 | 11.044 |
| | RMSE (μg/m³) | 10.548 | 12.050 | 13.542 | 14.684 | 15.140 | 15.593 | 16.012 | 16.251 |
| GRU with multi-head attention | IA | 0.892 | 0.835 | 0.756 | 0.689 | 0.657 | 0.604 | 0.519 | 0.480 |
| | R2 | 0.691 | 0.581 | 0.456 | 0.355 | 0.322 | 0.270 | 0.189 | 0.142 |
| | MAE (μg/m³) | 6.688 | 7.893 | 9.287 | 10.370 | 10.611 | 10.814 | 11.383 | 11.703 |
| | RMSE (μg/m³) | 10.121 | 11.788 | 13.478 | 14.722 | 15.128 | 15.705 | 16.614 | 17.171 |
| LSTM with multi-head attention | IA | 0.877 | 0.825 | 0.748 | 0.669 | 0.638 | 0.585 | 0.503 | 0.471 |
| | R2 | 0.663 | 0.562 | 0.437 | 0.329 | 0.294 | 0.245 | 0.162 | 0.128 |
| | MAE (μg/m³) | 7.090 | 8.160 | 9.440 | 10.458 | 10.739 | 11.021 | 11.608 | 11.783 |
| | RMSE (μg/m³) | 10.569 | 12.052 | 13.697 | 14.983 | 15.389 | 15.922 | 16.847 | 17.255 |
| CNN-GRU | IA | 0.914 | 0.863 | 0.811 | 0.748 | 0.735 | 0.674 | 0.585 | 0.549 |
| | R2 | 0.740 | 0.627 | 0.546 | 0.451 | 0.434 | 0.359 | 0.247 | 0.200 |
| | MAE (μg/m³) | 6.376 | 7.778 | 8.822 | 9.783 | 9.909 | 10.257 | 11.018 | 11.359 |
| | RMSE (μg/m³) | 9.838 | 11.783 | 13.045 | 14.313 | 14.421 | 14.854 | 15.966 | 16.440 |
| Distance-based GCN-GRU | IA | 0.921 | 0.876 | 0.811 | 0.766 | 0.743 | 0.720 | 0.614 | 0.592 |
| | R2 | 0.748 | 0.644 | 0.554 | 0.466 | 0.431 | 0.386 | 0.239 | 0.224 |
| | MAE (μg/m³) | 6.185 | 7.396 | 8.476 | 9.258 | 9.593 | 9.880 | 10.992 | 11.089 |
| | RMSE (μg/m³) | 9.686 | 11.510 | 12.922 | 14.147 | 14.485 | 14.539 | 16.066 | 16.211 |
| Wind-driven dynamic GAT-GRU | IA | 0.935 | 0.892 | 0.843 | 0.786 | 0.752 | 0.715 | 0.567 | 0.546 |
| | R2 | 0.802 | 0.717 | 0.632 | 0.551 | 0.506 | 0.457 | 0.284 | 0.241 |
| | MAE (μg/m³) | 5.613 | 6.763 | 7.856 | 8.848 | 9.184 | 9.471 | 10.796 | 10.972 |
| | RMSE (μg/m³) | 8.485 | 10.306 | 11.790 | 12.999 | 13.515 | 13.721 | 15.625 | 16.059 |
| ClusLite-STGCN-GRU | IA | 0.920 | 0.884 | 0.842 | 0.788 | 0.776 | 0.752 | 0.633 | 0.624 |
| | R2 | 0.765 | 0.687 | 0.617 | 0.533 | 0.512 | 0.475 | 0.323 | 0.317 |
| | MAE (μg/m³) | 5.905 | 7.008 | 8.001 | 8.909 | 9.062 | 9.244 | 10.458 | 10.461 |
| | RMSE (μg/m³) | 9.373 | 10.797 | 11.981 | 13.212 | 13.400 | 13.450 | 15.148 | 15.210 |
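
For readers who wish to reproduce the four scores reported in Table 4 on their own forecasts, a minimal Python sketch follows. It is not the authors' implementation. It assumes that IA is Willmott's index of agreement and that R2 is the coefficient of determination; the exact definitions used in the paper may differ. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return IA, R2, MAE, and RMSE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    sse = sum(e * e for e in errors)                 # sum of squared errors
    mae = sum(abs(e) for e in errors) / n            # mean absolute error
    rmse = math.sqrt(sse / n)                        # root mean squared error

    # Coefficient of determination (assumed form of R2).
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst

    # Willmott's index of agreement (assumed form of IA).
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    ia = 1.0 - sse / potential

    return {"IA": ia, "R2": r2, "MAE": mae, "RMSE": rmse}


# Hypothetical PM2.5 values in μg/m³, for illustration only.
scores = evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

IA and R2 are dimensionless, whereas MAE and RMSE carry the unit of the target (μg/m³ in Table 4), which is why only the latter two are comparable with the earlier Tehran studies.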
Table 5. Comparison of computational complexity from different aspects among hybrid graph-based deep learning models.

| Model | FBP (s) | FP (ms) | Total Epoch Time (s) | Inference Memory Allocated (MB) | FLOPs | Number of Parameters |
|---|---|---|---|---|---|---|
| Distance-based GCN-GRU | 383.35 | 11.03 | 440.65 | 920.40 | 78,312,960 | 252,844 |
| Wind-driven dynamic GAT-GRU | 995.87 | 18.01 | 1089.40 | 4166.76 | 105,318,400 | 176,304 |
| ClusLite-STGCN-GRU | 337.27 | 3.66 | 373.41 | 720.09 | 16,549,904 | 257,288 |
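
The relative savings quoted in the abstract (up to 66% in training time, 83% in memory usage, and 84% in floating-point operations) appear to correspond to the comparison against the wind-driven dynamic GAT-GRU row of Table 5. The short Python sketch below checks this arithmetic. The dictionary keys and the helper `percent_reduction` are hypothetical names, and the pairing of "training time" with the total epoch time column is an assumption.

```python
def percent_reduction(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values read from Table 5.
gat_gru = {"epoch_time_s": 1089.40, "memory_mb": 4166.76, "flops": 105_318_400}
cluslite = {"epoch_time_s": 373.41, "memory_mb": 720.09, "flops": 16_549_904}

reductions = {
    key: percent_reduction(gat_gru[key], cluslite[key]) for key in gat_gru
}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% lower than the wind-driven dynamic GAT-GRU")
```

Against the distance-based GCN-GRU the same calculation gives smaller savings, which is consistent with the abstract's "up to" wording.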
Table 6. A comparison of the findings of our study with those of earlier research conducted in Tehran.

| Authors | Publication Year | Study Period | Model | Evaluation Criteria |
|---|---|---|---|---|
| Nabavi et al. [63] | 2019 | 2011–2016 | Machine Learning (Random Forest) | RMSE = 17.52 μg/m³; MAE = not mentioned |
| Zamani Joharestani et al. [64] | 2019 | 2015–2018 | Machine Learning (XGBoost) | RMSE = 13.58 μg/m³; MAE = 9.93 μg/m³ |
| Faraji et al. [47] | 2022 | 2016–2019 | Deep Learning (3D CNN-GRU) | RMSE = 15.21 μg/m³; MAE = 12.00 μg/m³ |
| Ours | - | 2019–2022 | Deep Learning (ClusLite-STGCN-GRU) | RMSE = 13.45 μg/m³; MAE = 9.24 μg/m³ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
