1. Introduction
On a global scale, urbanization and industrialization have markedly intensified environmental pollution [1,2]. In urban systems, atmospheric pollution is inherently a spatial–temporal phenomenon, shaped by heterogeneous emission sources, land-use patterns, population exposure, and rapidly changing meteorological conditions. Traditional geospatial technologies, including ground monitoring networks, remote sensing, geographic information systems (GISs), and meteorological reanalysis products, provide essential infrastructure for observing, mapping, and interpreting the spatial and temporal dimensions of urban air quality [3,4,5]. Among various pollutants, ground-level ozone (O3) is a secondary pollutant formed through photochemical reactions in the troposphere and has emerged as a critical environmental challenge in many metropolises [6]. Unlike stratospheric ozone, which protects against ultraviolet radiation, ground-level ozone is harmful to human health and ecosystems. Prolonged exposure to elevated ground-level ozone has been associated with adverse health outcomes, including respiratory diseases such as chronic obstructive pulmonary disease (COPD), and with premature mortality [7]. These health burdens translate into substantial socioeconomic costs and highlight the need for decision-relevant information that is both spatially explicit and temporally actionable. China has made sustained efforts in air pollution control, and substantial progress has been achieved for several pollutants [8,9,10]. However, ozone remains a persistent challenge because of its nonlinear formation mechanisms and strong sensitivity to meteorological variability, which often produce pronounced spatial contrasts within and between cities. Therefore, developing accurate forecasting frameworks that leverage geospatial observations to capture urban-scale spatiotemporal dynamics of ground-level ozone is essential for air-quality management, public health protection, and sustainable urban environmental governance [11].
Models for predicting ozone concentrations fall mainly into three categories: numerical forecast models, statistical forecast models, and machine learning models [12,13,14]. Numerical forecast models, such as the Community Multiscale Air Quality (CMAQ) model and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), have been widely used to predict air pollutants [15]. However, uncertainties in initial and boundary meteorological conditions, imperfect parameter configurations, and dataset limitations frequently introduce inaccuracies into their predictions [16,17,18,19].
Statistical models, such as the Autoregressive Moving Average (ARMA) model [21], the Autoregressive Integrated Moving Average (ARIMA) model [22], and the Multiple Linear Regression (MLR) model [23], were adopted early because of their flexibility and practicality [20]. These models rest on linear assumptions, and the time series models among them (ARMA and ARIMA) operate on a single variable; they can capture linear relationships but cannot handle complex nonlinear problems [24,25].
More recently, machine learning has attracted attention as a tool for predicting ozone concentrations because of its strong data-mining capabilities. Compared with traditional statistical models, machine learning models can process complex multidimensional data and nonlinear problems. Representative models include artificial neural networks (ANNs) [26], Support Vector Machines (SVMs) [27], and Random Forests (RFs) [28]. These models are easy to use, readily accommodate diverse data sources, and offer flexible input and output features. However, as data volume and complexity have grown substantially, the computational burden of large, high-dimensional datasets limits the performance of traditional machine learning models. Deep learning, an emerging branch of machine learning, has quickly become an efficient way to process complex multidimensional data [29]. For example, Ebrahim Eslami developed a Convolutional Neural Network (CNN) model for 24 h ozone prediction in Seoul, achieving high accuracy with a mean Index of Agreement (IOA) of 0.84–0.89 across 25 stations [30]. Building on this line of work, subsequent studies have sought not only to model temporal patterns directly from observations but also to harness and correct information from chemistry transport models, moving from purely data-driven forecasting to hybrid, model-guided approaches. Ahmed Khan Salman developed and evaluated a Temporal Convolutional Neural Network (TCNN) to bias-correct hourly ozone forecasts from the CMAQ model over South Korea for a 3-day horizon [31]. A complementary line of work has leveraged recurrent architectures whose gating can preserve and reuse context across many hours. Matteo Sangiorgio employed Long Short-Term Memory (LSTM) networks to predict hourly concentrations at 20 stations in the Alpine region [32]. However, purely recurrent setups, even with gated memory, can under-represent cross-site interactions and uneven spatial influences [33]. To explicitly encode inter-station relations and adaptively weight salient temporal cues, Zhang and Hou proposed a spatiotemporal neural network for 24 h ozone prediction that fuses graph convolution, bidirectional LSTM, and an attention mechanism, achieving the best performance among the compared models [34]. Extending this progression from spatiotemporal fusion toward greater interpretability, recent work has coupled explicit time series decomposition with sequence models to separate structure from noise before prediction. Mu and Bi developed an interpretable hybrid model that combines Seasonal-Trend decomposition using Loess (STL) with a Transformer; the model first decomposes the ozone time series into trend, seasonal, and residual components via STL and achieved promising results [35].
Recent graph neural network (GNN)-based ozone prediction studies generally define inter-station connectivity by fixed geographic proximity, effectively freezing the graph structure along the temporal dimension. Meanwhile, recent advances have explored learning adaptive or dynamic spatial dependencies from data, such as adaptive hierarchical graph convolution networks [36], dynamic graph neural networks with learnable edge attributes [37], and graph convolutional recurrent networks based on adaptive adjacency matrices [38]. Although these approaches make spatial dependency modeling more flexible, most existing studies still do not explicitly incorporate wind-driven transport physics when constructing time-varying inter-station connectivity for ozone forecasting.
However, ozone transport is markedly anisotropic: air masses carry pollutants downwind, so the influence of an upwind station on a downwind station differs fundamentally from the reverse. Because wind speed and direction dynamically modulate horizontal advection and diffusion, the influence of each station on its neighbors fluctuates in real time with the evolving flow field. This motivates replacing static topologies with a dynamic wind-sensitive graph whose edge connections and weights update in response to instantaneous meteorological conditions. One concrete pathway is a GNN–Transformer architecture in which graph operations ingest the time-varying, directionally weighted topology to represent instantaneous inter-station coupling, while a Transformer captures longer-range temporal dependencies and, via attention, adaptively emphasizes the periods and sources that matter most. Although such a design is well aligned with the physics of pollutant transport and is likely to enhance predictive skill, few ozone prediction studies explicitly construct dynamic wind-driven graphs from real-time wind fields.
The main aim of this study is to develop a novel spatiotemporal hybrid deep learning model called the Wind Speed and Direction-Based Dynamic Spatial Graph Attention Network (WSDST-GAT). Concretely, WSDST-GAT constructs a time-varying directed monitoring graph whose connectivity is determined by the instantaneous wind field, orienting edges along the downwind pathway and scaling their strengths as a function of wind speed and direction. On this evolving topology, a Graph Attention Network computes adaptive inter-station coefficients to aggregate information from dynamically relevant neighbors, enabling rich extraction of spatial signals under changing flow conditions. For the temporal backbone, we employ a Transformer encoder–decoder that leverages multi-head attention to capture long-range dependencies in the hourly series, assigning greater weight to pivotal periods while maintaining sequence coherence. The resulting architecture supports both point forecasting of hourly ozone concentrations and the range estimation of concentration intervals. To translate station-level forecasts into operational, city-wide ozone fields with quantified uncertainty, we integrate a co-kriging module. Rather than serving as a mere visualization step, this spatial component models a physically informed continuous surface conditioned on meteorology, thereby bridging point forecasts and geospatial decision needs.
2. Materials and Methods
2.1. Study Area
Beijing, the economic, social, and cultural center of China, was selected as the study area. Beijing is located in the northern part of the North China Plain, bordered by the Yanshan Mountains to the north and the Taihang Mountains to the west. Its terrain is higher in the northwest and flatter in the east. Beijing has a typical northern temperate semi-humid continental monsoon climate: summers are hot and rainy, winters are cold and dry, and spring and autumn are relatively short. Ozone is readily generated through chemical reactions of nitrogen oxides and volatile organic compounds under favorable meteorological conditions, including intense solar radiation, low humidity, high temperature, and low wind speeds. Owing to its particular location and local geographical conditions, Beijing has experienced periods in which ozone was the primary air pollutant. The study area encompasses Beijing’s 16 administrative districts: Yanqing (YQ), Miyun (MY), Huairou (HR), Changping (CP), Pinggu (PG), Shunyi (SY), Haidian (HD), Chaoyang (CY), Tongzhou (TZ), Daxing (DX), Fangshan (FS), Mentougou (MTG), Shijingshan (SJS), Fengtai (FT), Dongcheng (DC), and Xicheng (XC).
2.2. Data Sources
Hourly air pollutant concentrations from 1 January 2022 to 31 December 2022 were obtained from the Real-time Air Quality Release System of the Beijing Ecological Environment Monitoring Center (https://www.bjmemc.com.cn/ (accessed on 11 October 2025)). The system provides hourly ground-level concentrations of O3, CO, NO2, SO2, PM2.5, and PM10. Meteorological data were derived from the ERA5-Land dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) (https://cds.climate.copernicus.eu/ (accessed on 11 October 2025)) [39]. ERA5-Land is a global reanalysis product; reanalysis combines model data with worldwide observations from sources such as satellites, radars, and weather stations, using the laws of physics, to generate a globally complete and consistent dataset. The following variables were extracted: 2 m dewpoint temperature (d2m), 2 m temperature (t2m), surface pressure (sp), 10 m u-component of wind (u10), and 10 m v-component of wind (v10). The monitoring stations in the study area are shown in Figure 1. All spatial data were processed in the WGS 84 geographic coordinate reference system, and the great-circle distances and azimuth angles required for wind-aligned graph construction were derived from these coordinates.
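The distances and azimuths mentioned above can be computed directly from WGS 84 latitude/longitude pairs. The sketch below is a minimal illustration, assuming a spherical Earth (haversine formula with the mean radius of 6371.0088 km) and the initial great-circle bearing measured clockwise from north; the function names are hypothetical, and this is not necessarily the exact geodesic routine used in this study.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_azimuth_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360.0
```

For a station pair (i, j), `initial_azimuth_deg` returns the direction from i to j, which is the quantity compared against the wind direction when the wind-driven graph is built (Section 2.6.1).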
2.3. Data Preprocessing
Hourly air quality observations were obtained from 35 monitoring stations in Beijing. All stations measure six pollutants: CO, NO2, PM10, PM2.5, SO2, and O3. The latitude and longitude coordinates of each station are listed in Table A1.
In addition to pollutant concentrations, key meteorological variables were collected, including temperature, humidity (represented by the 2 m dewpoint temperature), surface pressure, and wind components. These variables describe different atmospheric states and reflect the underlying climatic conditions that influence ozone formation and dispersion. From a statistical perspective, their distributions exhibit substantial variability across seasons and time scales, providing essential explanatory signals for modeling atmospheric dynamics.
All records were converted to a unified hourly timeline in UTC. Obvious invalid values, such as sentinel codes and negative concentrations, were treated as missing. This preprocessing step was applied independently to each variable at each station to ensure data consistency.
Table 1 summarizes the statistical characteristics of both pollution and meteorological variables, including minimum values (Min), maximum values (Max), median values (Median), and standard deviation (St.Dev).
From a climatic perspective, these statistical measures quantitatively describe the variability in atmospheric states during the study year. The wide ranges observed in temperature and wind components reflect seasonal transitions and synoptic-scale variability, while dispersion statistics capture the dynamic fluctuations that influence ozone formation and transport.
These descriptive statistics provide a mathematical characterization of the climatic background under which the proposed model is evaluated.
2.3.1. Outlier Processing
To mitigate spurious spikes, we applied a Hampel filter within a centered 24 h window to each station–variable series.
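$$\tilde{x}_t = \begin{cases} m_t, & \text{if } \left|x_t - m_t\right| > k\,\mathrm{MAD}_t,\\ x_t, & \text{otherwise},\end{cases}$$
where $x_t$ is the hourly value at time $t$; $m_t$ is the rolling median within a centered window of width $w$ ($w = 24$ h); $\mathrm{MAD}_t$ is the rolling median absolute deviation computed within the same window; and $k$ is the outlier threshold.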
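A minimal pure-Python sketch of this filter follows, assuming that flagged values are replaced by the rolling median and that missing values (`None`) are skipped when computing window statistics. The default threshold `k = 3.0` is a hypothetical choice, since the value used in this study is not stated. Many implementations also scale the MAD by 1.4826 so that it estimates the standard deviation under normality; that factor can be folded into `k`.

```python
from statistics import median


def hampel(series, w=24, k=3.0):
    """Hampel filter: replace x_t by the rolling median m_t when |x_t - m_t| > k * MAD_t.

    Missing values (None) are skipped in the window statistics and left as None.
    For an even width w the window covers t - w//2 ... t - w//2 + w - 1, i.e. one step
    more in the past than in the future, clipped at the series boundaries.
    """
    n = len(series)
    out = list(series)
    for t, x in enumerate(series):
        if x is None:
            continue
        start = max(0, t - w // 2)
        end = min(n, t - w // 2 + w)
        window = [v for v in series[start:end] if v is not None]  # always contains x
        m = median(window)
        mad = median(abs(v - m) for v in window)
        if abs(x - m) > k * mad:
            out[t] = m  # spike: replace by the rolling median
    return out
```

With `w=24` (the 24 h window above), each hour is compared with the 12 preceding and 11 following hours.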
2.3.2. Missing-Value Imputation
After outlier handling, data gaps for each station and variable are filled using a two-stage sequential strategy. First, forward/backward filling is applied to bridge short-term gaps. If gaps persist, a 24 h rolling median is then used for interpolation. This approach balances local temporal coherence with robustness in handling longer data gaps.
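The two-stage strategy can be sketched as follows. The threshold `max_gap` that separates "short" gaps from longer ones is a hypothetical parameter (the study does not specify it), and in this sketch the rolling median is computed only from originally observed values in a centered window of width `w`.

```python
from statistics import median


def impute(series, max_gap=3, w=24):
    """Stage 1: forward/backward fill runs of None no longer than max_gap (hypothetical).
    Stage 2: fill any remaining gaps with the median of observed values in a centred window."""
    out = list(series)
    n = len(out)
    t = 0
    while t < n:
        if out[t] is not None:
            t += 1
            continue
        end = t
        while end < n and out[end] is None:
            end += 1  # [t, end) is one run of missing values
        if end - t <= max_gap:
            # forward fill from the previous value; backward fill if the run starts the series
            fill = out[t - 1] if t > 0 else (out[end] if end < n else None)
            for i in range(t, end):
                out[i] = fill
        t = end
    filled = list(out)
    for t in range(n):
        if out[t] is None:
            start = max(0, t - w // 2)
            stop = min(n, t - w // 2 + w)
            window = [v for v in series[start:stop] if v is not None]
            if window:
                filled[t] = median(window)
    return filled
```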
2.3.3. Normalization
Because the feature variables differ substantially in physical meaning and numerical range, normalization is necessary before they are fed into the model. Normalization can accelerate model convergence and improve predictive performance. All variables were normalized using min–max scaling, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, prior to model training.
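A sketch of min–max scaling with its inverse, which is needed to report errors in μg/m³, is given below. Fitting the minimum and maximum on the training split only is an assumption made here to avoid leaking information from the validation and test periods; the text does not state which period the statistics were taken from.

```python
def fit_minmax(train_values):
    """Return (min, max) estimated on the training split only (assumed, to avoid leakage)."""
    lo, hi = min(train_values), max(train_values)
    if hi == lo:
        raise ValueError("constant feature: min-max scaling is undefined")
    return lo, hi


def transform(values, lo, hi):
    """x' = (x - x_min) / (x_max - x_min)."""
    return [(v - lo) / (hi - lo) for v in values]


def inverse_transform(scaled, lo, hi):
    """Map scaled values back to physical units."""
    return [s * (hi - lo) + lo for s in scaled]
```

Values outside the training range map outside [0, 1], which is expected under a chronological split and is harmless for the network.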
2.4. Supporting Statistical and Geospatial Methods
This subsection introduces the supporting methodological components used for exploratory dependency characterization and input construction, volatility-stratified evaluation, as well as spatial field reconstruction.
2.4.1. Dependency Characterization and Input Construction
Spatiotemporal Dependency Characterization
To characterize temporal persistence and spatial coherence in ozone observations prior to model development, two standard exploratory statistics were computed. First, for each station, the autocorrelation function (ACF) was calculated to assess temporal dependence within a 100 h lag window. Second, to quantify inter-station association, pairwise Pearson correlation coefficients were computed across all station pairs using contemporaneously aligned hourly ozone series.
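Both statistics are standard. The sketch below uses the usual (biased) sample ACF estimator and the Pearson coefficient, assuming complete, time-aligned series, that is, after the preprocessing in Section 2.3; the function names are illustrative.

```python
import math


def acf(x, max_lag=100):
    """Sample autocorrelation r_k for k = 0..max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(min(max_lag, n - 1) + 1)]


def pearson(x, y):
    """Pearson correlation of two equally long, time-aligned series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `pearson` on every station pair yields the 35 × 35 correlation matrix, and `acf(x, 100)` covers the 100 h lag window.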
2.4.2. Meteorological Volatility Index
To quantify atmospheric instability and group the test hours discussed in Section 3, we introduce the meteorological volatility index ($\mathrm{MVI}_t$). This index measures the intensity of temporal fluctuations in wind speed over a sliding window and is defined as the rolling standard deviation
$$\mathrm{MVI}_t = \sqrt{\frac{1}{W}\sum_{\tau=t-W+1}^{t}\left(x_{\tau} - \bar{x}_t\right)^2}, \qquad \bar{x}_t = \frac{1}{W}\sum_{\tau=t-W+1}^{t} x_{\tau},$$
where $x_t$ denotes the observed meteorological value (wind speed) at time step $t$, $W$ is the size of the sliding window, and $\bar{x}_t$ represents the moving average over the window $W$.
Based on the distribution of $\mathrm{MVI}_t$, we categorize the test samples into stability groups. Specifically, periods in which $\mathrm{MVI}_t$ exceeds the 80th percentile of the historical distribution are classified as high volatility, whereas periods falling below the 20th percentile represent steady conditions.
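The index and the percentile-based grouping can be sketched as follows, assuming a trailing window, a population standard deviation, and linear-interpolation percentiles computed on a reference (historical) sample. The label `intermediate` for hours between the 20th and 80th percentiles is hypothetical, since only the two extreme groups are defined above.

```python
import math


def rolling_volatility(x, W):
    """Population std of x over the trailing window t-W+1..t; None until a full window exists."""
    out = [None] * len(x)
    for t in range(W - 1, len(x)):
        win = x[t - W + 1 : t + 1]
        mean = sum(win) / W
        out[t] = math.sqrt(sum((v - mean) ** 2 for v in win) / W)
    return out


def percentile(values, p):
    """Linear-interpolation percentile, p in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify(mvi_test, mvi_reference):
    """'high' above the 80th pct, 'steady' below the 20th pct, else 'intermediate' (hypothetical)."""
    p80 = percentile(mvi_reference, 80)
    p20 = percentile(mvi_reference, 20)
    return ["high" if v > p80 else "steady" if v < p20 else "intermediate" for v in mvi_test]
```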
2.4.3. Co-Kriging Method
We improve ozone interpolation by co-kriging with physically related meteorological variables: 2 m dewpoint temperature, 2 m temperature, surface pressure, 10 m u-component of wind, and 10 m v-component of wind. All variables are time-aligned and standardized.
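Let the primary variable be ozone $Z_0(\mathbf{s})$ and the secondary set $\{Z_1(\mathbf{s}), \dots, Z_p(\mathbf{s})\}$, with $p = 5$. Under second-order stationarity of residuals, the direct and cross semivariograms follow a Linear Model of Coregionalization (LMC) [41]:
$$\gamma_{ij}(\mathbf{h}) = \sum_{m=1}^{M} b_{ij}^{(m)}\, g_m(\mathbf{h}), \qquad i, j = 0, 1, \dots, p,$$
where $g_m(\mathbf{h})$ are authorized basic structures, and each coregionalization matrix $\mathbf{B}_m = \big[b_{ij}^{(m)}\big]$ is positive semidefinite.
The ordinary co-kriging predictor at $\mathbf{s}_0$ is
$$\hat{Z}_0(\mathbf{s}_0) = \sum_{i=0}^{p}\sum_{\alpha=1}^{n_i} \lambda_{i\alpha}\, Z_i(\mathbf{s}_{i\alpha}),$$
with weights solving
$$\begin{bmatrix} \boldsymbol{\Gamma} & \mathbf{F} \\ \mathbf{F}^{\top} & \mathbf{0} \end{bmatrix}\begin{bmatrix}\boldsymbol{\lambda} \\ \boldsymbol{\mu}\end{bmatrix} = \begin{bmatrix}\boldsymbol{\gamma}_0 \\ \mathbf{e}_0\end{bmatrix},$$
where $\boldsymbol{\Gamma} = \big[\gamma_{ij}(\mathbf{s}_{i\alpha} - \mathbf{s}_{j\beta})\big]$ contains the direct and cross semivariances between data locations, $\boldsymbol{\gamma}_0 = \big[\gamma_{i0}(\mathbf{s}_{i\alpha} - \mathbf{s}_0)\big]$ contains the semivariances between the data and the target, $\mathbf{F}$ is block diagonal with one vector of ones $\mathbf{1}_{n_i}$ per variable, $\mathbf{e}_0 = (1, 0, \dots, 0)^{\top}$, and the Lagrange multipliers $\boldsymbol{\mu} = (\mu_0, \dots, \mu_p)^{\top}$ enforce $\sum_{\alpha}\lambda_{0\alpha} = 1$ and $\sum_{\alpha}\lambda_{i\alpha} = 0$ for $i \geq 1$. The co-kriging variance is
$$\sigma_{\mathrm{CK}}^2(\mathbf{s}_0) = \sum_{i=0}^{p}\sum_{\alpha=1}^{n_i}\lambda_{i\alpha}\,\gamma_{i0}(\mathbf{s}_{i\alpha} - \mathbf{s}_0) + \mu_0.$$
Anisotropy aligned with prevailing winds can be introduced in the basic structures $g_m(\mathbf{h})$ to capture directional transport.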
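To make the co-kriging system concrete, the sketch below solves ordinary co-kriging for any number of variables under a deliberately simplified LMC with a single isotropic exponential structure of unit sill and range `a`, so that $\gamma_{ij}(h) = b_{ij}\,g(h)$ with $g(h) = 1 - e^{-h/a}$. The single structure, the isotropy, and the dense pure-Python solver are assumptions for illustration only; a production workflow would fit several (possibly wind-aligned anisotropic) structures to empirical variograms and use a linear-algebra library.

```python
import math


def exp_structure(h, a):
    """Exponential basic structure with unit sill: g(h) = 1 - exp(-h / a)."""
    return 1.0 - math.exp(-h / a)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ordinary_cokriging(sites, values, target, B, a):
    """Ordinary co-kriging with gamma_ij(h) = B[i][j] * g(|h|).

    sites[i] / values[i]: coordinates and data of variable i (i = 0 is the primary, ozone).
    Returns (estimate, co-kriging variance) at `target`.
    """
    data = [(i, s, z) for i in range(len(sites)) for s, z in zip(sites[i], values[i])]
    n, q = len(data), len(sites)
    A = [[0.0] * (n + q) for _ in range(n + q)]
    rhs = [0.0] * (n + q)
    for r, (i, si, _) in enumerate(data):
        for c, (j, sj, _) in enumerate(data):
            A[r][c] = B[i][j] * exp_structure(math.dist(si, sj), a)
        A[r][n + i] = A[n + i][r] = 1.0  # block-diagonal F of ones
        rhs[r] = B[i][0] * exp_structure(math.dist(si, target), a)
    rhs[n] = 1.0  # primary weights sum to 1; secondary weights sum to 0
    sol = solve(A, rhs)
    lam, mu = sol[:n], sol[n:]
    estimate = sum(l * z for l, (_, _, z) in zip(lam, data))
    variance = sum(l * g for l, g in zip(lam, rhs[:n])) + mu[0]
    return estimate, variance
```

Because ordinary co-kriging without a nugget is an exact interpolator, predicting at a primary data location should return the observed value with zero variance, which makes a convenient sanity check.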
2.5. Proposed WSDST-GAT Model
Figure 2 shows the detailed architecture of WSDST-GAT.
2.5.1. Problem Definition
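The current time is $t$. We aim to predict hourly ozone concentrations over a future horizon of $H$ steps,
$$\hat{\mathbf{Y}}_{t+1:t+H} = \left(\hat{\mathbf{y}}_{t+1}, \dots, \hat{\mathbf{y}}_{t+H}\right),$$
where $\mathbf{y}_{\tau} \in \mathbb{R}^{N}$ denotes the ozone concentrations at all $N$ stations at time $\tau$. The prediction is conditioned on a historical window of length $L$ (from $t-L+1$ to $t$) that includes air-quality features, meteorological drivers, and wind-driven dynamic graphs.
Specifically, we partition the inputs as
$$\mathcal{X}_{t-L+1:t} = \left(\mathbf{X}^{\mathrm{aq}}_{t-L+1:t},\; \mathbf{X}^{\mathrm{met}}_{t-L+1:t},\; \mathcal{A}_{t-L+1:t}\right).$$
Here, $\mathbf{X}^{\mathrm{aq}}_{t-L+1:t} \in \mathbb{R}^{L \times N \times d_{\mathrm{aq}}}$ collects station-level air-quality predictors (CO, NO2, PM2.5, PM10) with $d_{\mathrm{aq}} = 4$; $\mathbf{X}^{\mathrm{met}}_{t-L+1:t} \in \mathbb{R}^{L \times N \times d_{\mathrm{met}}}$ contains meteorology, including 2 m dewpoint temperature, 2 m temperature, surface pressure, 10 m u-component of wind, and 10 m v-component of wind ($d_{\mathrm{met}} = 5$); and $\mathcal{A}_{t-L+1:t} = \left(\mathbf{A}_{t-L+1}, \dots, \mathbf{A}_{t}\right)$ denotes the sequence of time-varying directed adjacency matrices constructed from instantaneous wind speed/direction and inter-station distances.
The proposed model is a mapping
$$f_{\Theta}: \mathcal{X}_{t-L+1:t} \mapsto \left(\hat{\mathbf{Y}}_{t+1:t+H},\; \hat{\mathbf{Q}}_{t+1:t+H}\right),$$
where $f_{\Theta}$ is parameterized by the wind-aware dynamic GAT and Transformer, and $\hat{\mathbf{Q}}_{t+1:t+H} = \{\hat{\mathbf{Y}}^{(q)}_{t+1:t+H}\}_{q \in \mathcal{Q}}$ denotes the quantile forecasts for a set of quantile levels $\mathcal{Q}$ when prediction intervals are required.
$N$ denotes the number of stations; $L$ denotes the input length; $H$ denotes the forecast horizon; $d_{\mathrm{aq}}$ and $d_{\mathrm{met}}$ denote the feature dimensions of the air-quality and meteorological inputs, respectively; $\mathbf{A}_{\tau} \in \mathbb{R}^{N \times N}$ denotes the directed adjacency at hour $\tau$; and $\Theta$ denotes the trainable parameters of WSDST-GAT.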
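The windowing implied by this problem definition can be sketched as follows: each training sample pairs the $L$ most recent hours with the next $H$ hours. Each element of `series` can be any per-hour record (for example, all station features at that hour); the function name and the toy values of $L$ and $H$ in the check are illustrative only.

```python
def make_windows(series, L, H):
    """Split a time-ordered sequence into (input, target) pairs.

    Inputs cover steps t-L+1..t and targets cover t+1..t+H, for every t where both fit,
    giving len(series) - L - H + 1 samples.
    """
    samples = []
    for t in range(L - 1, len(series) - H):
        x = series[t - L + 1 : t + 1]
        y = series[t + 1 : t + 1 + H]
        samples.append((x, y))
    return samples
```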
2.5.2. Spatial Feature Extraction Module
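At each hour $\tau$, we operate on a wind-aware, time-varying directed graph $\mathcal{G}_{\tau}$, whose adjacency $\mathbf{A}_{\tau}$ encodes downwind-oriented connectivity, distance decay, and wind-speed gain. Given node features $\mathbf{h}_i^{(0)}(\tau)$, a dynamic Graph Attention Network computes layer-wise embeddings $\mathbf{h}_i^{(\ell)}(\tau)$ via edge-biased attention [42]:
$$e_{ij}^{(\ell)}(\tau) = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i^{(\ell)}(\tau) \,\|\, \mathbf{W}\mathbf{h}_j^{(\ell)}(\tau) \,\|\, \phi\big(A_{ji}(\tau)\big)\right]\right) + b_{ji}(\tau),$$
$$\alpha_{ij}^{(\ell)}(\tau) = \frac{\exp\big(e_{ij}^{(\ell)}(\tau)\big)}{\sum_{k \in \mathcal{N}_i(\tau)} \exp\big(e_{ik}^{(\ell)}(\tau)\big)}, \qquad \mathbf{h}_i^{(\ell+1)}(\tau) = \rho\!\left(\sum_{j \in \mathcal{N}_i(\tau)} \alpha_{ij}^{(\ell)}(\tau)\, \mathbf{W}\mathbf{h}_j^{(\ell)}(\tau)\right),$$
for $\ell = 0, \dots, L_s - 1$, with parallel multi-head attention and residual connections plus layer normalization for stability. The outputs $\mathbf{z}_i(\tau) = \mathbf{h}_i^{(L_s)}(\tau)$ serve as spatial embeddings for the temporal module. Here, $\mathbf{h}_i^{(\ell)}(\tau)$ is the embedding of node $i$ at layer $\ell$ and hour $\tau$; $\mathbf{W}$ is a linear projection; $\mathbf{a}$ is the attention scorer; $\phi(\cdot)$ maps the edge weight; $b_{ji}(\tau)$ is a scalar bias that injects the wind topology; $\rho(\cdot)$ is a nonlinearity; $\mathcal{N}_i(\tau)$ is the neighbor set of $i$ at hour $\tau$, i.e., the stations with an edge pointing to $i$; and $L_s$ is the number of spatial layers.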
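A minimal single-head sketch of one edge-biased attention layer on the directed wind graph follows. For illustration only, it assumes that the edge-weight mapping is the identity, that the scalar wind-topology bias equals `beta` times the edge weight, that each station also attends to itself through a self-loop of weight 1, and that the nonlinearity is tanh; multi-head attention, residual connections, and layer normalization are omitted. Splitting the attention vector into `a_dst`, `a_src`, and `a_edge` is equivalent to applying one vector to the concatenated inputs.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def gat_layer(H, A, W, a_dst, a_src, a_edge, beta=1.0):
    """One single-head, edge-biased graph attention layer (illustrative sketch).

    H[i]: feature list of station i; A[j][i] > 0 means a directed edge j -> i
    (station i lies downwind of j). Returns (new_H, attention), where
    attention[i] maps each neighbour j of i to its weight alpha_ij.
    """
    n = len(H)
    Wh = [matvec(W, h) for h in H]  # shared linear projection
    new_H, attention = [], []
    for i in range(n):
        edge = {j: (1.0 if j == i else A[j][i]) for j in range(n) if j == i or A[j][i] > 0}
        # a^T [Wh_i || Wh_j || edge weight], plus a bias carrying the wind topology
        scores = {
            j: leaky_relu(dot(a_dst, Wh[i]) + dot(a_src, Wh[j]) + a_edge * w) + beta * w
            for j, w in edge.items()
        }
        top = max(scores.values())  # subtract the max for a numerically stable softmax
        expd = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(expd.values())
        alpha = {j: e / total for j, e in expd.items()}
        agg = [sum(alpha[j] * Wh[j][d] for j in alpha) for d in range(len(Wh[i]))]
        new_H.append([math.tanh(v) for v in agg])  # nonlinearity: tanh (assumed)
        attention.append(alpha)
    return new_H, attention
```

Stacking this layer once per hour on that hour's adjacency matrix gives the time-varying spatial embeddings consumed by the temporal module.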
2.5.3. Temporal Feature Extraction Module
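For each station $i$, we form a length-$L$ sequence $\mathbf{Z}_i = \big(\mathbf{z}_i(t-L+1), \dots, \mathbf{z}_i(t)\big)$ from the spatial module and encode it with a Transformer to capture multi-scale temporal dynamics and long-range dependencies [43]:
$$\mathbf{U}_i = \mathrm{Encoder}\big(\mathbf{Z}_i + \mathbf{P}\big), \qquad \mathbf{D}_i = \mathrm{Decoder}\big(\mathbf{Q}_H + \mathbf{P}_H,\; \mathbf{U}_i\big),$$
with sinusoidal positional encodings on inputs and queries and a pre-norm Transformer. For multi-step forecasting, a non-autoregressive decoder with $H$ queries maps the encoded history to the horizon. Here, $\mathbf{Z}_i$ is the station-wise embedding sequence; $\mathbf{P}$ (and $\mathbf{P}_H$ for the queries) is the positional encoding; $\mathbf{U}_i$ is the encoded representation; $\mathbf{Q}_H$ is the set of $H$ queries; and $\mathbf{D}_i$ are the decoder outputs for the $H$ forecast steps at station $i$.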
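The standard sinusoidal encoding of the original Transformer (Vaswani et al., 2017) is sketched below. The text does not specify the exact variant, so the base of 10000 and the sine/cosine interleaving are assumptions, and the function name is illustrative.

```python
import math


def sinusoidal_encoding(length, d_model):
    """PE[pos][2k] = sin(pos / 10000^(2k/d)), PE[pos][2k+1] = cos(pos / 10000^(2k/d))."""
    pe = [[0.0] * d_model for _ in range(length)]
    for pos in range(length):
        for i in range(0, d_model, 2):  # i plays the role of 2k
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

The encoding is added element-wise to the $L$ input embeddings (and, analogously, to the $H$ decoder queries); because it is deterministic, it adds no trainable parameters.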
2.5.4. Fusion Module
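We fuse the spatial context $\mathbf{c}_i$ and temporal encoding $\mathbf{u}_i$ with a gated linear unit (GLU):
$$\mathbf{f}_i = \big(\mathbf{W}_1[\mathbf{c}_i \,\|\, \mathbf{u}_i]\big) \odot \sigma\big(\mathbf{W}_2[\mathbf{c}_i \,\|\, \mathbf{u}_i]\big).$$
A point head and optional quantile heads then produce $H$-step forecasts:
$$\hat{\mathbf{y}}_{i,t+1:t+H} = \mathrm{Head}_{\mathrm{pt}}(\mathbf{f}_i), \qquad \hat{\mathbf{y}}^{(q)}_{i,t+1:t+H} = \mathrm{Head}_{q}(\mathbf{f}_i), \quad q \in \mathcal{Q}.$$
Here, ‖ is concatenation, ⊙ is the Hadamard product, $\sigma(\cdot)$ is the logistic sigmoid, and $\mathbf{W}_1$ and $\mathbf{W}_2$ are learnable matrices. $\mathrm{Head}_{\mathrm{pt}}(\cdot)$ and $\mathrm{Head}_{q}(\cdot)$ denote linear mappings that produce, respectively, mean forecasts and $q$-quantiles.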
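A minimal sketch of the GLU fusion and a linear head on plain Python lists follows, assuming the gated form (value) ⊙ sigmoid(gate) without bias terms; the weight matrices in the check are toy inputs, not trained parameters.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def glu_fuse(c, u, W1, W2):
    """f = (W1 [c || u]) * sigmoid(W2 [c || u]), element-wise (biases omitted)."""
    x = list(c) + list(u)  # concatenation [c || u]
    value = matvec(W1, x)
    gate = [sigmoid(v) for v in matvec(W2, x)]
    return [v * g for v, g in zip(value, gate)]


def linear_head(W, f):
    """Linear head mapping the fused vector to H horizon values (point or one quantile)."""
    return matvec(W, f)
```

The sigmoid gate lets the network decide, per feature, how much of the fused spatial and temporal signal passes through to the forecast heads.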
2.6. Process Details
The overall experimental workflow of the proposed WSDST-GAT model comprises four primary stages: data preprocessing, spatial–temporal representation learning, model training, and evaluation and visualization. The detailed procedure is described below.
2.6.1. Construction of Dynamic Spatial Graphs
At each hourly step $t$, a wind-driven dynamic graph was constructed in which nodes correspond to air-quality stations and edges represent the directional influence determined by the instantaneous wind field.
Specifically, the wind direction determines the orientation of pollutant transport. For any pair of stations $(i, j)$, the azimuth angle $\beta_{ij}$ describes the direction from station $i$ to station $j$. When the angular difference between $\beta_{ij}$ and the wind direction $\theta_t$ is small, station $j$ lies approximately downwind of station $i$, indicating potential pollutant transport from $i$ to $j$.
Meanwhile, wind speed controls the overall strength of atmospheric advection, thereby modulating the intensity of inter-station influence. Stronger winds enhance horizontal transport and amplify the effective connectivity between directionally aligned stations.
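Accordingly, the edge weight $w_{ij}(t)$ from station $i$ to station $j$ at hour $t$ was defined as a function of three factors: a distance decay governed by $d_{ij}$ and $d_0$, the downwind alignment between $\theta_t$ and $\beta_{ij}$, and the wind speed at hour $t$. Here, $d_{ij}$ denotes the great-circle distance between stations $i$ and $j$; $\beta_{ij}$ represents the azimuth angle from station $i$ to station $j$; $\theta_t$ denotes the instantaneous wind direction, expressed as the direction toward which the wind blows; and $d_0$ controls the spatial decay scale.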
This formulation ensures that wind direction determines edge orientation through downwind alignment, while wind speed regulates the magnitude of spatial interaction under varying meteorological conditions. The resulting time-varying adjacency matrix dynamically adapts to the evolving atmospheric transport pathways.
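The description above fixes the ingredients of the edge weight but not a closed form. The sketch below uses one plausible, hypothetical form, $w_{ij}(t) = v_t \max\{0, \cos(\beta_{ij} - \theta_t)\}\exp(-d_{ij}/d_0)$, where $v_t$ is the wind speed, $\theta_t$ the direction toward which the wind blows, $\beta_{ij}$ the azimuth from station $i$ to station $j$, $d_{ij}$ their great-circle distance, and $d_0$ a decay scale; only stations within ±90° of the downwind direction receive an edge. It also assumes a single domain-wide wind vector per hour, converted from ERA5-Land `u10` (eastward) and `v10` (northward) components, and `d0_km = 20.0` is an arbitrary illustrative value.

```python
import math


def wind_speed_direction(u10, v10):
    """Wind speed (m/s) and the direction TOWARD which the wind blows,
    in degrees clockwise from north (u10 > 0 eastward, v10 > 0 northward)."""
    speed = math.hypot(u10, v10)
    toward_deg = math.degrees(math.atan2(u10, v10)) % 360.0
    return speed, toward_deg


def wind_adjacency(dist_km, azimuth_deg, u10, v10, d0_km=20.0):
    """Directed adjacency A[i][j] = weight of the edge i -> j at one hour (hypothetical form).

    dist_km[i][j] and azimuth_deg[i][j] are precomputed station-pair matrices.
    """
    speed, theta = wind_speed_direction(u10, v10)
    n = len(dist_km)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            alignment = math.cos(math.radians(azimuth_deg[i][j] - theta))
            if alignment > 0.0:  # j lies within +/-90 degrees of the downwind direction
                A[i][j] = speed * alignment * math.exp(-dist_km[i][j] / d0_km)
    return A
```

Recomputing `wind_adjacency` every hour yields the sequence of time-varying directed adjacency matrices; a per-station wind vector could be substituted for the domain-wide one without changing the structure.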
2.6.2. Model Training
The learning process integrates spatial and temporal dependencies in an end-to-end framework. The dynamic Graph Attention Network (GAT) extracts spatial correlations among stations under varying flow fields, while the Transformer captures long-range temporal dependencies from the sequential embeddings. The two feature streams are fused via a GLU and passed to fully connected layers for the final prediction. The model is optimized using a composite loss function:
$$\mathcal{L} = \mathrm{MSE} + \mathcal{L}_{Q},$$
where MSE is the mean squared error of the point forecasts and $\mathcal{L}_{Q}$ is the quantile regression loss used for probabilistic range estimation. The Adam optimizer was used with an initial learning rate of 1.00 × 10⁻³ and a batch size of 32. Early stopping based on the validation Root Mean Square Error (RMSE) was applied to prevent overfitting.
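The composite objective can be sketched as follows, assuming equal weighting of the MSE and quantile terms and the standard pinball loss $\rho_q(u) = \max\{qu, (q-1)u\}$ with $u = y - \hat{y}^{(q)}$. The default quantile levels are hypothetical, because the levels used to form the intervals are not stated; in PyTorch the same expressions would be written with tensor operations so that they remain differentiable.

```python
def pinball(y, y_hat_q, q):
    """Quantile (pinball) loss for one observation at level q in (0, 1)."""
    u = y - y_hat_q
    return max(q * u, (q - 1) * u)


def composite_loss(y, y_point, y_quant, quantiles=(0.05, 0.5, 0.95)):
    """MSE on point forecasts + mean pinball loss over samples and quantile levels.

    y, y_point: lists of floats; y_quant[q]: list of q-quantile forecasts.
    The quantile levels and the unit weight on the pinball term are assumptions.
    """
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, y_point)) / n
    lq = sum(pinball(a, yq, q) for q in quantiles for a, yq in zip(y, y_quant[q]))
    return mse + lq / (n * len(quantiles))
```

The asymmetric pinball loss penalizes under-prediction with weight q and over-prediction with weight 1 − q, which is what pushes each head toward its target quantile.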
2.7. Experimental Settings
All experiments were implemented in Python 3.9 and PyTorch 1.12 and executed on a workstation equipped with an NVIDIA GeForce RTX 4060 Ti GPU (16 GB memory), an Intel Core i7-10700K CPU, and 32 GB of RAM running Windows 11. The computational environment was configured to ensure reproducibility and consistency across all runs.
2.7.1. Dataset Division
The complete dataset, consisting of hourly air quality and meteorological records from January 2022 to December 2022, was divided into three subsets in chronological order:
Training set (70%)—used for model parameter learning;
Validation set (15%)—used for hyperparameter tuning and early stopping;
Test set (15%)—used exclusively for performance evaluation.
The chronological split avoids temporal leakage and evaluates the model on unseen future periods.
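The chronological 70/15/15 division can be expressed as index ranges over the time-ordered records. Rounding the split points to the nearest integer is an assumption of this sketch; for reference, 2022 contains 8760 hours.

```python
def chronological_split(n, train=0.70, val=0.15):
    """Index ranges [start, stop) for train/validation/test, preserving time order."""
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return (0, n_train), (n_train, n_train + n_val), (n_train + n_val, n)
```

Using `round` rather than `int` avoids off-by-one split points caused by floating-point products such as `0.7 * n` landing just below an integer.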
2.7.2. Baseline Models for Comparison
A comprehensive evaluation was conducted to validate the proposed WSDST-GAT, which included a comparison with several widely adopted baseline models.
LSTM [44]: captures temporal dependencies via a recurrent structure;
GRU [45]: a simplified recurrent model for short-term dynamics;
GCN-LSTM [46]: incorporates static spatial correlations with temporal recurrence;
ST-GAT [47]: a Graph Attention Network with static adjacency for spatiotemporal learning;
ST-Transformer [48]: a Transformer-based spatiotemporal sequence model without dynamic graph adaptation;
STD-GCN [49]: a dynamic directed graph convolutional network with wind-field-based adjacency.
All baseline models were trained under identical data splits, optimization settings, and evaluation metrics for fair comparison.
2.7.3. Evaluation Metrics
In order to fully assess point and interval forecast performance, multiple evaluation indicators were employed in this study.
Point Forecast Metrics
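For deterministic prediction of ozone concentration, we used three widely adopted statistical indicators: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of Determination (R2).
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},$$
where $y_i$ and $\hat{y}_i$ denote the observed and predicted ozone concentrations, respectively; $\bar{y}$ is the mean of the observed values; and $N$ is the number of samples.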
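A direct transcription of the three point metrics follows. Computing them on concentrations transformed back to μg/m³, rather than on min–max-scaled values, is assumed here because the reported errors are in physical units.

```python
import math


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in y)              # total sum of squares
    return 1.0 - ss_res / ss_tot
```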
Interval Forecast Metrics
For probabilistic or range-based forecasting, we evaluated the model using Prediction Interval Coverage Probability (PICP) and Prediction Interval Normalized Average Width (PINAW).
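Prediction Interval Coverage Probability (PICP):
$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad c_i = \begin{cases}1, & y_i \in \left[\hat{y}_i^{\mathrm{lo}}, \hat{y}_i^{\mathrm{up}}\right],\\ 0, & \text{otherwise},\end{cases}$$
where $\hat{y}_i^{\mathrm{lo}}$ and $\hat{y}_i^{\mathrm{up}}$ denote the predicted lower and upper bounds. PICP measures the percentage of actual observations covered by the predicted interval.
Prediction Interval Normalized Average Width (PINAW):
$$\mathrm{PINAW} = \frac{1}{N R}\sum_{i=1}^{N}\left(\hat{y}_i^{\mathrm{up}} - \hat{y}_i^{\mathrm{lo}}\right),$$
where $R = \max_i y_i - \min_i y_i$ represents the range of observed ozone concentrations. PINAW reflects the average interval width normalized to the data range; smaller PINAW values indicate tighter intervals.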
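The interval metrics can be computed as follows. Intervals are treated as closed in this sketch, so an observation lying exactly on a bound counts as covered (an assumption).

```python
def picp(y, lower, upper):
    """Fraction of observations inside the closed interval [lower, upper]."""
    hits = sum(1 for a, lo, up in zip(y, lower, upper) if lo <= a <= up)
    return hits / len(y)


def pinaw(y, lower, upper):
    """Mean interval width normalised by the observed range R = max(y) - min(y)."""
    R = max(y) - min(y)
    return sum(up - lo for lo, up in zip(lower, upper)) / (len(y) * R)
```

The two metrics should be read together: widening every interval raises PICP trivially but is penalized by a larger PINAW.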
3. Results and Discussion
3.1. Exploratory Spatiotemporal Characteristics and Feature Screening Results
To better understand the intrinsic statistical properties of ozone concentration and justify the modeling design, we first examine its temporal persistence, spatial coherence, and nonlinear dependency with candidate predictors. The results of these exploratory analyses are presented below.
3.1.1. Temporal Dependency Analysis
Figure 3 illustrates the ACF of hourly ozone concentration over a 100 h lag window.
Distinct peaks are observed at lags of approximately 24 and 48 h, indicating pronounced daily periodicity. At several stations, the autocorrelation coefficients exceed 0.5 at these lags, reflecting strong persistence from one day to the next.
This cyclical behavior is consistent with the photochemical formation mechanism of ozone, which is modulated by solar radiation and temperature variations. The substantial temporal dependency supports incorporating long-range sequence modeling mechanisms in the predictive framework.
3.1.2. Spatial Correlation Analysis
The spatial association between monitoring stations is visualized in Figure 4, which presents the Pearson correlation matrix of hourly ozone concentrations across all 35 stations.
The majority of station pairs exhibit moderate-to-strong positive correlations, indicating substantial spatial coherence in ozone variability across the study area. This pattern suggests that ozone concentrations are influenced not only by local emissions but also by regional transport and atmospheric mixing processes.
Such spatial dependency justifies the adoption of graph-based modeling structures capable of explicitly encoding inter-station interactions.
3.1.3. Mutual Information-Based Feature Screening
Figure 5 reports the estimated mutual information (MI) values between ozone concentration and candidate predictors.
Most meteorological and pollutant variables show non-negligible (including nonlinear) dependence on ozone, indicating potential explanatory relevance. In contrast, SO2 shows near-zero MI values, suggesting a minimal contribution to ozone variability during the study period.
Based on these results, SO2 was excluded from the input feature set, and nine predictors were retained per station. This selection balances information richness against redundancy, which is intended to improve generalization while maintaining computational efficiency.
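The MI estimator used in this study is not specified. As an illustration only, the sketch below computes a simple plug-in estimate (in nats) from an equal-width two-dimensional histogram; k-nearest-neighbour estimators are a common, less binning-sensitive alternative. The bin count is a hypothetical default.

```python
import math
from collections import Counter


def _bin(values, bins):
    """Equal-width bin index for each value; a constant variable falls into a single bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]


def mutual_information(x, y, bins=10):
    """Plug-in MI: sum over cells of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    bx, by = _bin(x, bins), _bin(y, bins)
    n = len(x)
    pxy = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())
```

A near-zero value, as reported for SO2, means that knowing the predictor's bin barely changes the distribution of ozone bins.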
3.2. Overall Performance Evaluation
The predictive capability of the proposed WSDST-GAT was evaluated comprehensively in terms of both point and interval forecasts. The evaluation metrics include MAE, RMSE, and R2 for point prediction, as well as PICP and PINAW for interval prediction. Together, these metrics assess the accuracy, reliability, and uncertainty calibration of the model.
3.2.1. Point Forecasting: Comparative Benchmarking Across Models
Table 2 summarizes the quantitative ozone prediction results of all models, and Figure 6 compares their MAE, RMSE, and R2 values as bar charts. The proposed WSDST-GAT outperforms all baseline methods overall, namely LSTM, GRU, GCN-LSTM, ST-GAT, ST-Transformer, and STD-GCN, with an MAE of 5.25 μg/m³ and an RMSE of 9.58 μg/m³. These results indicate that integrating dynamic graphs driven by wind speed and direction effectively captures evolving spatial dependencies, while the Transformer-based temporal encoder improves the extraction of long-term patterns.
WSDST-GAT demonstrates a marked improvement over static graph models such as ST-GAT, underscoring the benefits of employing a wind-aware dynamic topology in which edges between monitoring stations adjust adaptively based on hourly wind field data.
Conventional static distance-based graphs implicitly assume isotropic and time-invariant spatial interactions, where inter-station influence depends solely on geographic proximity. However, atmospheric pollutant transport is inherently anisotropic and strongly modulated by evolving meteorological conditions, particularly wind speed and direction. Static topologies therefore fail to reflect the directional and time-varying nature of advection-driven dispersion processes.
In contrast, the proposed dynamic graph explicitly incorporates wind-aware connectivity, allowing spatial interactions to vary according to real-time flow conditions. This dynamic structure captures both the directional alignment of downwind transport and the intensity of wind-driven dispersion, leading to spatial representations that are more physically consistent and better aligned with atmospheric transport mechanisms. Consequently, the model achieves improved predictive performance and enhanced physical interpretability.
3.2.2. Prediction Intervals: Comparative Benchmarking Across Models
WSDST-GAT quantifies prediction uncertainty by generating a prediction interval for each forecast time step. A well-calibrated prediction interval balances coverage reliability against sharpness, remaining narrow while consistently encompassing the true observations. To evaluate the uncertainty quantification of the proposed model, Table 3 compares the interval forecasting results of WSDST-GAT with those of the baseline models.
Figure 7 provides a visual comparison of interval forecast performance. WSDST-GAT produces the most reliable and compact prediction intervals, achieving the highest PICP (94.01%) and the lowest PINAW (0.174), which reflects superior uncertainty calibration. A high PICP indicates that the constructed prediction intervals cover the majority of true observations, demonstrating strong reliability. Meanwhile, a relatively low PINAW suggests that the intervals remain sufficiently narrow, avoiding overly conservative uncertainty estimates. The combination of high coverage and compact width implies that the model achieves a favorable balance between reliability and precision in probabilistic forecasting. Notably, by incorporating wind field dynamics, the model adaptively widens its uncertainty bands under unstable meteorological conditions, such as rapid changes in wind speed or direction, thereby enhancing robustness in complex atmospheric scenarios.
In general, these findings validate that WSDST-GAT achieves high deterministic accuracy and strong probabilistic reliability, offering a physically interpretable and data-driven framework suitable for real-time ozone prediction and early-warning systems.
3.3. Temporal Forecasting Analysis
Longer-horizon prediction of ozone concentrations gives government agencies more time to formulate policy responses to potential severe ozone episodes. Accordingly, the forecast horizon is extended from the next hour to the entire following day. This section further investigates the temporal modeling capacity of WSDST-GAT by evaluating its performance across multiple forecasting horizons and examining how well it captures short-term fluctuations and diurnal patterns of ozone concentrations. Taking the XCGY station as an example, the 24 h prediction results are shown in Figures 8 and 9, which present the forecast accuracy of the proposed model and the baselines across different horizons. As expected, prediction errors gradually increase with longer forecast horizons owing to the accumulation of temporal uncertainty. Nevertheless, WSDST-GAT consistently outperforms all comparison models at every horizon, showing superior stability and robustness.
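The horizon-wise comparison amounts to computing one error score per lead time. The sketch below assumes forecasts are stored as one vector of H values (steps t+1 to t+H) per issue time and uses MAE as the score; the figures may also report other metrics, and the function name is hypothetical.

```python
def mae_by_horizon(y_true, y_pred):
    """Return one MAE per forecast step.

    y_true, y_pred: lists of samples, each a list of H values for steps t+1 .. t+H.
    """
    horizon = len(y_true[0])
    maes = []
    for h in range(horizon):
        # Absolute errors of every sample at lead time h+1.
        errors = [abs(obs[h] - pred[h]) for obs, pred in zip(y_true, y_pred)]
        maes.append(sum(errors) / len(errors))
    return maes


# Two hypothetical forecast issues with a 3-step horizon (a 24-step horizon works the same way).
obs = [[10.0, 20.0, 30.0], [12.0, 22.0, 32.0]]
pred = [[11.0, 18.0, 35.0], [12.0, 25.0, 28.0]]
per_step = mae_by_horizon(obs, pred)
print(per_step)
```

Plotting such a per-step vector for each model gives the kind of error-versus-horizon curve in which error growth with lead time becomes visible.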
These results indicate that WSDST-GAT maintains stable predictive performance even at longer forecast horizons. Combining a Transformer-based temporal encoder with a dynamic graph-driven spatial representation allows the model to capture both long-range temporal dependencies and short-term variations in the ozone concentration series. In particular, the multi-head self-attention mechanism enables the model to assign greater weights to the most informative historical periods, thereby improving the temporal coherence of its predictions.
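To make the weighting of historical periods concrete, the sketch below implements single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, over a sequence of hourly feature vectors. A Transformer layer runs several such heads in parallel with separate projections and concatenates their outputs; one head is shown for brevity, and the projection matrices are hypothetical inputs rather than the trained WSDST-GAT weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, w):
    """Project a row vector (length d_in) with a d_in x d_out matrix."""
    return [sum(vec[i] * w[i][j] for i in range(len(vec))) for j in range(len(w[0]))]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over T time steps.

    x: list of T feature vectors; w_q, w_k, w_v: projection matrices (d_model x d_k).
    Returns (outputs, weights), where weights[t][s] is how much step t attends to step s.
    """
    queries = [matvec(v, w_q) for v in x]
    keys = [matvec(v, w_k) for v in x]
    values = [matvec(v, w_v) for v in x]
    scale = math.sqrt(len(w_k[0]))  # sqrt(d_k)
    outputs, weights = [], []
    for q in queries:
        alpha = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        weights.append(alpha)
        # Output for this step: attention-weighted sum of value vectors.
        outputs.append([sum(alpha[s] * values[s][j] for s in range(len(values)))
                        for j in range(len(values[0]))])
    return outputs, weights


# Three hypothetical hourly feature vectors and identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
hours = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
out, attn = self_attention(hours, identity, identity, identity)
print([round(a, 3) for a in attn[0]])
```

In this toy sequence the third hour, whose key aligns most strongly with the first hour's query, receives the largest weight in the first row of the attention matrix; each row sums to one.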
3.4. Spatial Prediction Performance
To comprehensively evaluate the spatial generalization ability of WSDST-GAT, we assess its predictive performance at 35 air-quality monitoring stations distributed across Beijing. For each model, the MAE and R² are calculated at the station level. Figure 10 compares the MAE and R² values of the baseline approaches with those of the proposed WSDST-GAT.
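The station-level scores follow the usual definitions: MAE is the mean absolute error, and R² = 1 - SS_res / SS_tot, computed separately over each station's test series. A minimal sketch, where the station identifiers and values are hypothetical:

```python
def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def station_scores(obs_by_station, pred_by_station):
    """Map each station id to its (MAE, R^2) over the test period."""
    return {
        sid: (mae(obs_by_station[sid], pred_by_station[sid]),
              r2(obs_by_station[sid], pred_by_station[sid]))
        for sid in obs_by_station
    }


# Hypothetical stations "S1" and "S2" with three test hours each.
scores = station_scores(
    {"S1": [10.0, 20.0, 30.0], "S2": [5.0, 5.0, 15.0]},
    {"S1": [10.0, 20.0, 30.0], "S2": [6.0, 4.0, 15.0]},
)
print(scores)
```

Because R² is normalized by each station's own variance, a station with a flat concentration series can show a low R² even when its MAE is small, so the two scores are complementary.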
The results demonstrate that WSDST-GAT consistently achieves the lowest MAE among all competing models, confirming its superior robustness when applied to geographically heterogeneous monitoring sites. Conventional recurrent models such as LSTM exhibit the largest errors, primarily due to their inability to encode explicit inter-station spatial dependencies. GCN-LSTM and ST-GAT partially alleviate this limitation by incorporating graph structures; however, their spatial graphs remain static and therefore cannot accommodate the inherently time-varying nature of atmospheric transport. ST-Transformer achieves further improvements by integrating spatiotemporal attention mechanisms, yet it still relies on a fixed or quasi-static representation of spatial connectivity.
In contrast, WSDST-GAT employs a dynamically evolving, wind-aware directed graph, where edge directions are aligned with downwind flows, and edge weights are adaptively modulated by wind speed and directional alignment. This enables the model to better capture the anisotropic and temporally varying transport processes that govern ozone formation and dispersion. Consequently, WSDST-GAT yields more spatially coherent predictions not only within densely populated urban areas but also at suburban and peripheral stations that are strongly influenced by regional atmospheric transport. These findings underscore the importance of dynamically modeling wind-driven topological structures to improve spatial prediction accuracy in ozone forecasting tasks.
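The wind-aware edge construction described above can be sketched as follows. This is an illustrative assumption, not the exact formulation used in WSDST-GAT: an edge i to j is created only when station j lies downwind of station i, and its weight is the product of the wind speed at i, the cosine alignment between the wind vector and the i-to-j displacement, and a hypothetical exponential distance decay (`decay_km`, `max_dist_km`). Recomputing the matrix every hour yields the time-varying graph over which a graph attention layer would then aggregate neighbors.

```python
import math


def wind_aware_adjacency(coords, u, v, decay_km=30.0, max_dist_km=60.0):
    """Hypothetical wind-aware directed adjacency for one time step.

    coords: list of (x_km, y_km) station positions on a local planar grid.
    u, v:   eastward/northward 10 m wind components (m/s) at each station.
    Returns A where A[i][j] > 0 means information flows from station i to station j.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        speed = math.hypot(u[i], v[i])
        if speed == 0.0:
            continue  # calm air: no directed transport edge from this station
        for j in range(n):
            if i == j:
                continue
            dx = coords[j][0] - coords[i][0]
            dy = coords[j][1] - coords[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0 or dist > max_dist_km:
                continue
            # Cosine between the wind vector at i and the direction from i to j.
            alignment = (u[i] * dx + v[i] * dy) / (speed * dist)
            if alignment <= 0.0:
                continue  # j is not downwind of i
            adj[i][j] = speed * alignment * math.exp(-dist / decay_km)
    return adj


# Three hypothetical stations under a uniform westerly wind of 5 m/s.
adj = wind_aware_adjacency([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)],
                           [5.0, 5.0, 5.0], [0.0, 0.0, 0.0])
print(adj)
```

Under a westerly wind the station to the east receives an edge from the western station, but not the reverse, which is the anisotropy a distance-only graph cannot express.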
3.5. Prediction Intervals: Case Study and Volatility-Stratified Analysis
While point forecasts provide estimates of the expected ozone concentration, they do not convey the inherent uncertainty arising from complex meteorological dynamics and stochastic emission processes. To overcome this limitation, the proposed WSDST-GAT model is extended to generate prediction intervals, offering probabilistic representations of the forecast uncertainty at each time step.
Figure 11 presents an example of the predicted ozone concentration intervals at Xicheng Guanyuan station in July 2022. The blue shaded area represents the 90% prediction interval, and the solid line indicates the predicted mean. Most actual observations lie within the shaded region, indicating that the intervals are well calibrated.
The predictive bands widen adaptively during periods associated with strong convective activity or abrupt changes in wind.
As shown in Table 4, when the test hours are grouped by the meteorological volatility index, the mean interval width increases from 99.44 to 112.61 with increasing volatility, indicating that higher atmospheric instability translates into greater forecast uncertainty. Conversely, under relatively steady conditions the intervals are narrower, showing that the model modulates its confidence according to the underlying atmospheric stability.
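The volatility-stratified comparison can be sketched as follows, taking the per-hour meteorological volatility index as a precomputed input (its construction is not reproduced here). Splitting the hours into equal-size bins by rank is an assumption; the grouping behind Table 4 may use different cut points, and the function name is hypothetical.

```python
def mean_width_by_volatility(volatility, lower, upper, n_groups=3):
    """Sort test hours by a volatility index, split them into n_groups equal-size bins
    (lowest volatility first), and return the mean prediction-interval width per bin."""
    if len(volatility) < n_groups:
        raise ValueError("need at least one hour per group")
    order = sorted(range(len(volatility)), key=lambda t: volatility[t])
    size = len(order) // n_groups
    means = []
    for g in range(n_groups):
        # The last bin also absorbs leftover hours when the count is not divisible.
        stop = len(order) if g == n_groups - 1 else (g + 1) * size
        widths = [upper[t] - lower[t] for t in order[g * size:stop]]
        means.append(sum(widths) / len(widths))
    return means


# Four hypothetical test hours split into a calm and a volatile group.
print(mean_width_by_volatility([0.1, 0.9, 0.2, 0.8], [0, 0, 0, 0], [10, 30, 12, 28], n_groups=2))
```

A monotonic increase of the returned means from the first to the last bin is the pattern the text reports.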
3.6. Ablation Study
To evaluate the contribution of each major component within the proposed WSDST-GAT, a comprehensive ablation study was conducted. Three key modules were selectively removed or replaced to analyze their individual impact on model performance under identical training and evaluation settings. The following variants were designed:
1. Without Dynamic Graph (Static-GAT): The dynamic wind-driven adjacency matrix was replaced with a static spatial distance matrix.
2. Without Transformer (GAT-only): The Transformer-based temporal encoder was replaced with a multi-layer LSTM.
3. Without Meteorological Fusion: Meteorological variables were removed, leaving only pollutant concentrations as input.
The quantitative results of these variants are reported in Table 5.
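The first variant replaces the wind-driven graph with a static distance matrix whose exact form is not specified here. One common choice, assumed in the sketch below with hypothetical `sigma_km` and `cutoff` values, is a thresholded Gaussian kernel of the great-circle distance between stations given as (latitude, longitude) pairs.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def static_distance_adjacency(stations, sigma_km=20.0, cutoff=0.1):
    """Symmetric, time-invariant adjacency: w_ij = exp(-(d_ij / sigma)^2), zeroed below cutoff."""
    n = len(stations)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = haversine_km(*stations[i], *stations[j])
            w = math.exp(-(d / sigma_km) ** 2)
            adj[i][j] = w if w >= cutoff else 0.0
    return adj


# Hypothetical station coordinates (lat, lon): two nearby sites and one distant site.
adj = static_distance_adjacency([(39.90, 116.40), (39.95, 116.45), (40.50, 117.50)])
print([[round(w, 3) for w in row] for row in adj])
```

Unlike the wind-aware graph, this matrix is symmetric and identical at every hour, so it cannot distinguish upwind from downwind neighbors; this is the property the ablation isolates.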
Ablation results consistently validate the necessity of each core component within the WSDST-GAT architecture. Removing any of the three modules leads to a noticeable decrease in predictive accuracy, demonstrating that each component contributes uniquely to the overall modeling capability.
From a climatic and statistical perspective, the removal of meteorological variables leads to a consistent increase in RMSE and MAE and a reduction in R². This performance degradation indicates that ozone concentration cannot be sufficiently explained by pollutant autocorrelation alone.
Meteorological variables explain additional variance related to atmospheric thermodynamics and transport processes. Excluding them reduces the model’s capacity to capture climate-driven variability, confirming the quantitative contribution of climatic information to predictive performance.
As for the other two components, replacing the dynamic graph with a static matrix weakens the model’s ability to represent wind-driven anisotropic transport, while substituting the Transformer with an LSTM limits its ability to model long-range temporal dependencies.
Overall, these findings demonstrate that the integration of dynamic spatial modeling, temporal attention mechanisms, and meteorological feature fusion forms a coherent framework that jointly captures spatial transport dynamics, temporal persistence, and climate-driven variability. This synergy enables WSDST-GAT to achieve superior prediction accuracy and robust generalization in ozone forecasting.
3.7. Spatial Enhancement Module: Co-Kriging Field Reconstruction Based on Meteorological Covariates
Spatial enhancement is implemented as a post-processing step: the co-kriging module does not participate in model parameter optimization; instead, it operates on the predicted station-level ozone concentrations to reconstruct spatially continuous city-scale fields with quantified uncertainty. In this study, we employ co-kriging interpolation. This technique not only uses geographic coordinates (longitude and latitude) to model spatial correlation but also leverages multi-source data by integrating key meteorological covariates, including 2 m dewpoint temperature (d2m), 2 m temperature (t2m), surface pressure (sp), and the 10 m zonal and meridional wind components (u10 and v10).
The co-kriging method exploits the spatial cross-correlation between the target variable (ozone) and the auxiliary variables (the meteorological fields). Starting from the point forecasts at monitoring stations, it accounts not only for the distance and orientation between observation points, derived from their longitude and latitude, but also for closely related, continuously distributed meteorological information. The result is a physically meaningful, information-rich continuous concentration surface. This fine-scale regional field reflects the underlying dynamics of atmospheric dispersion and transport and extends estimates, with associated uncertainty, to areas with sparse or no monitoring stations, thereby improving spatial completeness and supporting spatially informed environmental decision-making.
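Full co-kriging requires modeling direct and cross-covariances for ozone and each covariate, which is usually done with a dedicated geostatistics package. As a simpler, related technique that also uses meteorological covariates, the sketch below implements kriging with external drift (KED), in which the covariates enter as a linear trend rather than through cross-covariances. It is not the co-kriging implementation used in this study; the exponential covariance, the `sill` and `range_km` values, and the planar kilometre coordinates are all assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ked_predict(xy, z, cov_obs, xy0, cov0, sill=1.0, range_km=15.0):
    """Kriging with external drift at one target location.

    xy: station (x_km, y_km); z: station ozone forecasts; cov_obs: per-station covariate
    tuples (e.g. t2m); xy0, cov0: target location and its covariates.
    Covariance model (assumed): C(h) = sill * exp(-h / range_km). Returns (estimate, variance).
    """
    n, p = len(z), len(cov_obs[0]) + 1

    def cov(a, b):
        return sill * math.exp(-math.dist(a, b) / range_km)

    drift = [[1.0] + list(f) for f in cov_obs]  # constant term + covariates
    size = n + p
    a = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            a[i][j] = cov(xy[i], xy[j])
        for k in range(p):
            a[i][n + k] = a[n + k][i] = drift[i][k]  # unbiasedness constraints
    c0 = [cov(xy[i], xy0) for i in range(n)]
    f0 = [1.0] + list(cov0)
    sol = solve(a, c0 + f0)
    w, mu = sol[:n], sol[n:]  # kriging weights and Lagrange multipliers
    estimate = sum(wi * zi for wi, zi in zip(w, z))
    variance = sill - sum(wi * ci for wi, ci in zip(w, c0)) - sum(mk * fk for mk, fk in zip(mu, f0))
    return estimate, variance


# Five hypothetical stations with forecast ozone and a 2 m temperature covariate.
xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
z = [60.0, 70.0, 65.0, 80.0, 72.0]
t2m = [(25.0,), (26.5,), (25.5,), (27.0,), (26.0,)]
print(ked_predict(xy, z, t2m, (3.0, 7.0), (25.8,)))
```

Evaluating `ked_predict` on every cell of a regular grid, with that cell's covariate values, yields a continuous concentration surface together with a kriging variance map. At a station location the estimator reproduces the station value with zero variance, as an exact interpolator should.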
Figure 12 shows the three-hour, spatially continuous kriging interpolation forecast results. These spatially enhanced outputs provide valuable support for environmental policymaking and air-quality early warning. The continuous concentration surface enables the identification of pollution hotspots and facilitates the spatial detection and analysis of exceedances against regulatory thresholds. This offers a direct scientific basis for issuing health alerts, targeting regulatory inspections, and optimizing the allocation of emission reduction resources. By converting discrete point forecasts into complete, continuous regional concentration fields, the framework bridges station-based observations and city-scale geospatial decision-making, delivering high-resolution maps for actionable urban air-quality management and governmental environmental decision support.
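Threshold-based exceedance analysis on an interpolated field can be sketched as follows, assuming each grid cell carries a kriging mean and variance and that the predictive distribution in each cell is Gaussian. The threshold and the probability cut-off are hypothetical inputs; the applicable regulatory limit would be substituted in practice.

```python
import math
from statistics import NormalDist


def exceedance_probability(mean_field, var_field, threshold):
    """Per-cell probability that ozone exceeds `threshold`, assuming a Gaussian
    predictive distribution N(mean, var) in every grid cell."""
    probs = []
    for mean_row, var_row in zip(mean_field, var_field):
        row = []
        for m, v in zip(mean_row, var_row):
            if v <= 0.0:
                # Zero variance: fall back to a deterministic comparison.
                row.append(1.0 if m > threshold else 0.0)
            else:
                row.append(1.0 - NormalDist(m, math.sqrt(v)).cdf(threshold))
        probs.append(row)
    return probs


def hotspot_cells(probs, min_prob=0.5):
    """(row, col) indices of cells whose exceedance probability reaches min_prob."""
    return [(r, c) for r, row in enumerate(probs) for c, p in enumerate(row) if p >= min_prob]


# Hypothetical 2 x 2 grid (ug/m3) and a hypothetical threshold of 200.
mean_grid = [[150.0, 210.0], [190.0, 260.0]]
var_grid = [[100.0, 100.0], [0.0, 400.0]]
probs = exceedance_probability(mean_grid, var_grid, threshold=200.0)
print(hotspot_cells(probs))
```

Reporting a probability rather than a yes/no flag lets alerts distinguish cells that are marginally above the limit from cells that exceed it with high confidence.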
4. Conclusions
This study proposed a novel hybrid deep learning framework, the Wind Speed and Direction-Based Dynamic Spatiotemporal Graph Attention Network (WSDST-GAT), for the hourly prediction and interval estimation of ozone concentration in Beijing. By integrating physical meteorological knowledge with advanced neural network architectures, the model effectively captures both the spatial anisotropy of pollutant transport and the temporal dependencies of atmospheric processes.
The main innovations and contributions of this work are summarized as follows: (1) a wind-driven, time-varying directed graph is constructed to encode downwind transport, where edge directions follow the instantaneous wind field and edge weights reflect wind speed; meanwhile, a Graph Attention Network adaptively aggregates information from relevant neighbors; (2) a Transformer encoder–decoder is employed to capture multi-scale temporal patterns and long-range dependencies in hourly ozone series via multi-head attention; (3) a spatiotemporal model is developed to jointly support deterministic point forecasting and probabilistic interval estimation for risk-aware early warning; (4) a co-kriging module is integrated to convert station-level forecasts into continuous city-wide ozone fields with quantified uncertainty by incorporating key meteorological covariates.
Comprehensive experiments based on hourly data from 35 air-quality monitoring stations in Beijing demonstrated that the proposed model achieved superior point prediction performance, with an R² of 0.957, an MAE of 5.25 μg/m³, and an RMSE of 9.58 μg/m³. Furthermore, the probabilistic interval predictions showed good calibration and sharpness, with a PICP of 94.01% and a PINAW of 0.174, confirming the model’s reliability for uncertainty estimation. Comparison with classical models and the ablation study verified that each structural component (dynamic graph, Transformer, and meteorological fusion) contributed substantially to the model’s predictive capability.
This study develops and evaluates the proposed model on an existing dataset. Because of limited data availability, the performance assessment may be constrained to some extent. Future work will incorporate larger-scale, multi-year data to enable a more comprehensive evaluation of model performance and to further examine its generalization capability and stability.
Although this study focuses on Beijing, the proposed framework is not region-specific. The wind-aware dynamic graph construction and spatiotemporal modeling architecture can be extended to other cities or regions, provided that fundamental air-quality and meteorological monitoring data are available. When applied to different geographical settings, minor adjustments, such as recalibration of spatial decay parameters or adaptation to local monitoring network density, may be required. Nevertheless, the overall modeling strategy remains applicable across regions with diverse terrain and meteorological characteristics.
The proposed WSDST-GAT provides an effective data-driven framework for fine-grained ozone forecasting under dynamic meteorological conditions. By explicitly encoding wind-driven transport and spatial neighborhood interactions, the model links atmospheric dynamics with the spatial–temporal observations of urban air-quality evolution. In this sense, WSDST-GAT supports understanding the spatial and temporal dimensions of urban systems through geospatial technologies: it can ingest observations from routine monitoring networks and meteorological fields from multi-source data and translate them into the location-specific, time-resolved ozone forecasts that urban air-quality management requires.
Overall, WSDST-GAT demonstrates a strong synergy between meteorological drivers and deep learning, offering a practical tool for urban air pollution early warning, smart environmental protection, and sustainable atmospheric governance.