Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations

Zheng, Guanglei; Wan, Yuchai; Zhang, Xun; Liu, Xiansheng

doi:10.3390/ijgi15040171

Open AccessArticle

Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations

¹

School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing 100048, China

²

Institute of Environmental Health and Pollution Control, Guangdong University of Technology, Guangzhou 510006, China

^*

Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(4), 171; https://doi.org/10.3390/ijgi15040171

Submission received: 2 February 2026 / Revised: 31 March 2026 / Accepted: 9 April 2026 / Published: 14 April 2026

(This article belongs to the Topic Advances in Sensor Data Fusion and AI for Environmental Monitoring)

Download

Browse Figures

Versions Notes

Abstract

Reliable air-quality monitoring is essential for urban exposure assessment and environmental policy, yet many downstream applications are hindered by sparse regulatory stations and noisy real-world measurements. While diffusion models have shown promise for probabilistic spatiotemporal imputation, common conditioning strategies can be brittle: purely input-based conditioning may drift from sparse constraints, whereas hard clamping can introduce a clean–noisy mismatch and propagate corrupted readings during reverse sampling. In this work, we propose STGPD (SpatioTemporal Graph Posterior Diffusion), a probabilistic framework that formulates city-scale pollutant reconstruction as posterior sampling on a graph-structured spatiotemporal field. STGPD enforces noise-aware soft consistency by re-noising visible observations to the current diffusion level and fusing a noise-matched measurement term with the model prior via variance-weighted fusion under an explicit observation-noise model. To improve spatial extrapolation in heterogeneous urban environments, we further construct a dual-view graph that combines geographic proximity with functional similarity derived from static descriptors. Experiments on real-world monitoring data in Augsburg, Germany, for PM₁₀ and NO₂ show that STGPD provides a robust probabilistic reconstruction framework under extreme sparsity, station outages, and synthetic sensor-noise injection in this sparse-monitoring case study. Compared with strong deterministic and diffusion-based baselines, STGPD achieves improved reconstruction accuracy (MAE/RMSE) and better-calibrated uncertainty estimates (CRPS) under the current evaluation protocols.

Keywords:

air-quality reconstruction; spatiotemporal imputation; diffusion models; graph neural networks; posterior sampling; uncertainty quantification

1. Introduction

Fine-grained urban air-quality fields are increasingly used in geospatial decision-making, including exposure assessment, urban planning, and policy evaluation, where neighborhood-scale variation often matters more than city-wide averages [1,2]. Yet, from a geo-information perspective, these applications depend on reconstructing a spatially continuous spatiotemporal field from observations that are sparse, irregular, and sometimes unreliable. In operational settings, the monitoring infrastructure that provides regulatory-grade supervision is both limited and fragile: stations are expensive to deploy, and long gaps due to outages, maintenance, or communication failures are common [3]. This creates a persistent gap between the spatial detail required by downstream geospatial analyses and the coverage of reliable measurements.

We study this problem in Augsburg, Germany, where the goal is to reconstruct city-scale PM₁₀ and NO₂ fields over a dense grid from only a handful of fixed stations, leveraging exogenous covariates (meteorology, traffic) and static geographic descriptors. With so few station time series constraining a latent city-wide field, the reconstruction is strongly underdetermined and naturally calls for a probabilistic treatment: rather than returning a single surface completion, we seek a distribution of plausible spatiotemporal fields conditioned on sparse and potentially noisy measurements. Such uncertainty-aware field inference is especially relevant under practical stressors, including single-station availability, extended station outages, and sensor corruption where point estimates can be misleading. Classical geostatistical and GIS interpolation methods such as Kriging [4] and IDW [5] remain useful baselines, but their smoothness and stationarity assumptions are often strained in heterogeneous urban environments.

Recent learning-based spatiotemporal models provide more flexible inductive biases by capturing non-linear dependencies over time and space. In air-quality applications, architectures based on spatiotemporal attention, graph neural networks, and multi-scale representations have been proposed for forecasting and inference [6,7,8,9,10,11]. Related advances on sensor networks, ranging from diffusion-convolution recurrent designs to graph convolutional backbones, offer effective mechanisms for combining temporal modeling with spatial message passing [12,13,14,15,16], and Transformer variants provide complementary tools for long-range temporal structure [17,18]. Still, most of these methods are trained as deterministic predictors: under extreme sparsity they tend to over-smooth unconstrained regions, degrade sharply during long outages, and provide limited or poorly calibrated uncertainty [19]. For urban environmental mapping and geo-information applications, a probabilistic reconstruction approach is better aligned with this regime because it can represent multiple field realizations that are consistent with the available evidence.

Diffusion models are a particularly attractive probabilistic family because they can represent complex conditional distributions through sampling [20,21,22,23,24,25]. However, in sparse sensing settings, reconstruction quality depends critically on how measurement constraints are enforced during reverse sampling. If observations enter only through denoiser inputs, sampling trajectories can drift from sparse constraints; if one instead applies hard replacement, inserting clean values into noisy intermediate states induces a clean–noisy mismatch and can propagate corrupted readings when measurements are unreliable [26,27]. From a Bayesian viewpoint, conditioning should approximate posterior sampling under an explicit likelihood, connecting to diffusion-based inverse-problem methods that guide sampling with measurement constraints [28,29,30,31,32,33].

We therefore propose STGPD (SpatioTemporal Graph Posterior Diffusion), a geo-information oriented posterior sampling framework for city-scale air-quality field reconstruction on graph-structured spatiotemporal domains. STGPD performs constrained posterior-guided sampling with a noise-aware soft consistency update: at each step, it combines the model proposal with a noise-matched measurement term via variance-weighted Gaussian fusion, rather than rigid replacement, improving stability under sensor noise. To better capture heterogeneous urban structure, STGPD further constructs a dual-view graph that combines geographic proximity with functional similarity derived from multi-scale geographic descriptors, enabling information propagation between distant yet functionally similar regions. Across stress tests involving extreme sparsity, station outages, and synthetic noise injection, STGPD improves both reconstruction accuracy and uncertainty calibration, while remaining compatible with fast solvers and conditional samplers [34,35,36].

Our main contributions are threefold:

We formulate city-scale air-quality reconstruction as geo-information field inference via diffusion-based posterior sampling on a graph-structured spatiotemporal domain.
We propose a noise-aware, step-wise soft-consistency mechanism that enforces measurement fidelity without clean–noisy mismatch, improving robustness under unreliable sensors.
We introduce a dual-view (geographic + functional) graph construction using multi-scale geographic descriptors and demonstrate consistent gains in accuracy and uncertainty calibration under multiple real-world stress scenarios.

2. Related Work

2.1. Spatiotemporal Imputation and Graph Neural Networks

Spatiotemporal imputation is a long-standing problem in urban computing and environmental monitoring, where observations are sparse, irregularly distributed, and often missing over extended periods. From a geo-information perspective, this problem corresponds to reconstructing a spatially continuous environmental field over an urban domain from sparse sensor networks, closely related to spatial interpolation and environmental mapping in GIS. Classical geostatistical interpolation methods such as Kriging [4] and Inverse Distance Weighting (IDW), also known as Shepard interpolation [5] provide strong baselines under smoothness and stationarity assumptions, but can degrade under non-linear dynamics and strong spatial heterogeneity. With the rise of deep learning, Graph Neural Networks (GNNs) have become a primary paradigm for modeling spatial dependencies on irregular sensor networks [19]. Representative methods such as GRIN [37] and related graph-recurrent designs [38,39] couple message passing with temporal modeling to propagate information across space and time.

Recent work also develops spatiotemporal attention and Transformer architectures for long-range dependency modeling on urban sensor networks, primarily in forecasting tasks such as traffic flow prediction [40,41]. In parallel, uncertainty-aware spatiotemporal GNNs explicitly model predictive uncertainty to support risk-sensitive decision making. Despite their effectiveness, most deterministic graph imputers are optimized for point estimates and may over-smooth when the conditioning set is extremely sparse, while providing limited or poorly calibrated uncertainty under deployment stressors. Moreover, many pipelines rely on a single graph definition and coarse static context, whereas heterogeneous urban environments often benefit from richer spatial context and graph construction that better reflects both geographic proximity and functional similarity. Related IJGI studies also emphasize spatial non-stationarity in air-quality mapping, motivating geographically adaptive learning such as geographically weighted neural networks [42].

2.2. Diffusion Models for Spatiotemporal Learning

Generative diffusion models have emerged as competitive approaches for probabilistic imputation and spatiotemporal generation due to their ability to model complex conditional distributions [25]. Recent work in the geo-information literature also highlights the growing role of deep learning integrated with GIS and remote sensing for high-resolution air-pollution mapping [43]. CSDI [20] introduced conditional score-based diffusion for probabilistic time-series imputation, and PriSTI [21] extended this paradigm to spatiotemporal settings by incorporating spatial priors during denoising. Subsequent advances improve conditioning stability and reconstruction quality through self-adaptive noise scaling [22], non-autoregressive conditional diffusion, and refinement-based strategies [23]. In parallel, spatiotemporal foundation models based on diffusion and Transformer backbones suggest the promise of scaling generative architectures for open-world spatiotemporal learning [44,45,46].

A recurring challenge is measurement fidelity under extreme sparsity: if observation information enters only through network inputs, iterative sampling may drift away from sparse observations unless consistency is enforced throughout the reverse trajectory. This issue is central in air-quality monitoring and geospatial environmental mapping, where adherence to available station measurements is often non-negotiable for downstream spatial analysis.

2.3. Inverse Problems and Physics-Informed Generative Modeling

Reconstructing a high-resolution spatiotemporal field from sparse measurements can be viewed as an inverse problem, where observations arise from a measurement operator applied to the latent field. In computer vision, diffusion posterior sampling methods such as DPS [28] and DDRM [29] show that diffusion priors can be guided by observation constraints during reverse sampling, including extensions for linear inverse solvers [30] and Gaussian reformulations that clarify the role of likelihood terms [32]. A complementary direction develops physics-informed diffusion models that incorporate governing equations or physical regularizers as soft constraints [45,47,48].

Directly transferring these ideas to graph-structured spatiotemporal sensing is nontrivial. Computer-vision inverse problems and inpainting-style conditioning typically operate on dense pixel grids with relatively dense measurement operators and often assume random or spatially contiguous masks, whereas geospatial sensing provides observations only at a small set of station nodes over an irregular, often graph-structured domain that mixes sensor locations and urban grids. In this setting, the measurement operator corresponds to highly sparse sampling or masking rather than a dense imaging operator, and enforcing observation fidelity must respect the diffusion noise level. Consequently, naive replacement of clean station observations into noisy intermediate diffusion states can create a clean–noisy mismatch and propagate corrupted readings when measurements are unreliable. Such issues are well recognized in practical GIS-based air-quality modeling, where ground-station measurements often suffer from noise, calibration uncertainty, and missing segments [49]. In contrast to inpainting-style conditioning [26] or linearly corrupted variants [31], our STGPD performs step-wise, noise-aware observation consistency by re-noising measurements to the current diffusion level and adaptively fusing the measurement branch with the model’s prior proposal under an explicit sensor-noise parameter, enabling stable posterior sampling under extreme sparsity and sensor corruption.

2.4. Uncertainty Quantification and Sampling Efficiency

For air-quality monitoring, uncertainty quantification can be as important as point accuracy, since downstream decisions benefit from calibrated probabilistic estimates [50]. In urban environmental mapping and geo-information applications, spatially explicit uncertainty is often required for exposure assessment and risk-aware planning, making calibrated probabilistic evaluation particularly important. Diffusion models provide uncertainty via sampling, but ensuring calibration and statistically meaningful probabilistic scores (e.g., CRPS-based evaluation) remains an active topic [51,52]. A practical limitation is inference cost: iterative reverse diffusion can be expensive and may hinder real-time deployment. Accelerated samplers such as DDIM [34] and DPM-Solver++ [35] reduce the number of function evaluations, while conditional sampling strategies improve efficiency under constraints [36]. In this work, we integrate fast solvers with constrained posterior sampling to balance inference speed, reconstruction fidelity, and uncertainty quality, while maintaining strict consistency with sparse observations.

3. Methodology

We first introduce the Augsburg dataset and preprocessing pipeline, followed by graph construction and the proposed STGPD framework. The study area and monitoring configuration are shown in Figure 1. An overview of the proposed STGPD framework is shown in Figure 2. We focus on two policy-relevant pollutants, PM₁₀ and NO₂. Since grid-level ground truth is unavailable, station measurements are the only supervised reference for training and evaluation. Grid nodes are included to support city-scale field reconstruction, but they do not contribute to the supervised loss or evaluation metrics unless stated otherwise.

3.1. Study Area and Data Sources

The study area is Augsburg, Bavaria, Germany, a mid-sized city with heterogeneous land-use patterns and traffic corridors, which provides a challenging setting for reconstruction from sparse monitoring, as shown in Figure 1. All spatial layers (station locations, city boundary, and grid-cell centroids) are processed in a consistent coordinate reference system, and distances used in graph construction are computed accordingly. The reconstruction domain consists of

N_{g} = 456

grid-cell centroids within the Augsburg administrative boundary. Building on this spatial setup, we next describe the pollutant measurements and the auxiliary covariates used for reconstruction.

Hourly concentrations of PM₁₀ and NO₂ were collected from 1 July 2016 to 31 December 2020 at four fixed monitoring stations: Augsburg/Königsplatz, Augsburg/Bourges-Platz, Augsburg/Karlstraße, and Augsburg University of Applied Sciences (UAS). Measurements contain missing segments due to outages and are kept as missing in the aligned dataset. In addition to these station readings, we incorporate time-varying exogenous covariates that are available across the domain.

We use hourly meteorological variables, including pressure, temperature, relative humidity, precipitation, wind speed, and wind direction, together with traffic intensity provided by the City of Augsburg and calendar indicators such as hour-of-day, day-of-week, weekend and holiday flags. Following the experimental assumptions used throughout the paper, including the leave-one-out evaluation, meteorology and calendar features are treated as globally available covariates for all nodes, both stations and grid cells, reflecting their availability from dense external products or administrative records. Traffic features are aligned to the hourly timeline and associated with nodes according to the spatial support provided by the data source, for example by mapping measurements to nearby road segments or aggregating them to the spatial units used by the provider. Finally, we assign each node a static descriptor to summarize multi-scale geographic context and support spatial extrapolation.

To represent local-to-regional spatial context, each node is assigned a static descriptor vector computed from concentric buffers with 19 radii spanning 0.1–5.0 km. Within each radius, we compute summary statistics of geographic context, including land-use categories and indicators related to the built environment and roads, and concatenate them across scales. These descriptors are computed for both monitoring stations and grid-cell centroids, enabling the model to use consistent spatial context when propagating information from stations to grids. The geographic descriptors are computed from standard GIS layers extracted from OpenStreetMap (OSM), including land-use and land-cover data as well as indicators related to the built environment and roads, aggregated within each buffer radius. The proposed framework does not explicitly ingest a source-resolved emission inventory. Instead, source-related spatial heterogeneity is represented indirectly through traffic covariates, meteorological forcings, and multi-scale geographic descriptors, which together capture local traffic influence, broader urban functional differences, and spatial context relevant to transport and dispersion. Table 1 summarizes the station names and site types, measurement method, and QA/QC information of the four fixed monitoring stations used in this study.

3.2. Preprocessing and Input Tensors

We next describe how all sources are aligned and transformed into the tensors used by the denoiser and the posterior sampling procedure. All data sources are aligned to a common hourly timeline. Pollutant channels may be missing and are retained as missing values in the raw aligned data. Exogenous covariates are synchronized by timestamp. Static descriptors are matched to each node by location.

Before modeling, we apply standard transformations and encodings to ensure numerical stability and avoid representational artifacts. Traffic intensity is transformed with log1p. Wind direction is encoded by its sine and cosine components to avoid angular discontinuities. Calendar features are encoded as standard time indicators.

All continuous variables (pollutants, meteorology, traffic, static descriptors) are standardized using z-score normalization. Normalization statistics are computed only on the training split and then applied to test splits to prevent leakage. In implementation, missing pollutant entries are filled with zeros after normalization to avoid propagating NaN values; the corresponding observation mask ensures that these placeholders do not contribute to the supervised objective. All deterministic metrics (MAE/RMSE/

R^{2}

) are reported in the de-normalized scale.

With the aligned and normalized inputs, we define the target field, observations, and covariates used throughout the paper. Let T be the number of hourly timestamps and

N = N_{s} + N_{g}

the number of nodes, where

N_{s} = 4

stations and

N_{g} = 456

grid-cell centroids. We define:

Latent field: $X_{0} \in R^{T \times N \times 2}$ , with two channels corresponding to PM₁₀ and NO₂.
Observations: $Y \in R^{T \times N \times 2}$ , denoting the aligned observation tensor; only station-node entries are observed, while all grid-node entries are treated as missing.
Time-varying covariates: $U \in R^{T \times N \times C_{u}}$ , including meteorological variables, traffic intensity, and calendar features, where $C_{u}$ denotes the number of dynamic covariate channels.
Static descriptors: $S \in R^{N \times C_{s}}$ , representing multi-scale geographic descriptors shared across time, where $C_{s}$ denotes the number of static descriptor channels.

We define a pollutant observation mask

M_{raw} \in {0, 1}^{T \times N \times 2}

, where

M_{raw, t, i, c} = 1

indicates that pollutant channel c at node i and time t is available in the raw dataset (after natural missingness), and 0 otherwise. Exogenous covariates are not masked.

For re-masking evaluation, we define a station-only observation mask

M_{st}

by restricting

M_{raw}

to station nodes (and setting all grid-node entries to zero). We then split the observed station entries into a visible mask

M_{keep}

and a target mask

M_{tgt}

such that

M_{keep} ⊙ M_{tgt} = 0, M_{keep} + M_{tgt} = M_{st} .

We define the visible observation tensor as

Y_{vis} = Y ⊙ M_{keep}

.

During training under the re-masking objective, the denoiser is conditioned on

(Y_{vis}, M_{keep}, U, S, G)

and the supervised loss is computed only on target entries indexed by

M_{tgt}

. Crucially, during inference (testing), the consistency update is applied only on visible entries by setting the consistency mask

M_{cons} = M_{keep}

and using

Y_{vis}

in all re-noising and fusion steps; target entries in

M_{tgt}

are never used during sampling and are reserved solely for metric computation, preventing any ground-truth leakage into the reverse diffusion trajectory. For convenience,

M_{st}

is represented in the full-node tensor shape by assigning zero entries to all grid nodes.

3.3. Graph Construction

We encode spatial structure with a weighted graph

G = (V, E)

, whose edge weights are defined as follows. To reflect both geographic proximity and functional similarity, we fuse two affinity matrices:

W = α {\tilde{W}}_{geo} + (1 - α) {\tilde{W}}_{sem},

(1)

where

α \in [0, 1]

, and

{\tilde{W}}_{geo}, {\tilde{W}}_{sem}

are affinities normalized to comparable scales prior to fusion (e.g., by row-wise normalization or other monotone rescaling). Specifically, we first construct

W_{geo}

and

W_{sem}

, and then obtain

{\tilde{W}}_{geo} = Norm (W_{geo})

and

{\tilde{W}}_{sem} = Norm (W_{sem})

. Here

Norm (\cdot)

denotes a monotone rescaling used to place the two affinity matrices on comparable scales prior to fusion; in our implementation we apply row-wise normalization,

Norm {(W)}_{i j} = \frac{W_{i j}}{\sum_{j^{'}} W_{i j^{'}} + ϵ}

, where

ϵ

is a small constant for numerical stability.

We compute

W_{geo}

using a Gaussian kernel over pairwise distances

d_{i j}

:

W_{i j}^{geo} = exp (- \frac{d_{i j}^{2}}{2 σ_{d}^{2}}) .

(2)

This proximity-based view encourages local information propagation on the urban grid; however, purely distance-based connectivity can be insufficient in heterogeneous cities where distant locations may share similar land-use and built-environment characteristics. To complement geographic proximity, we therefore introduce a functional-similarity affinity derived from static descriptors.

We compute

W_{sem}

from the static descriptor vectors. Let

s_{i}

be the normalized descriptor of node i. We define a nonnegative similarity:

W_{i j}^{sem} = max (0, \frac{〈 s_{i}, s_{j} 〉}{∥ s_{i} ∥ ∥ s_{j} ∥}),

(3)

which allows edges between functionally similar locations even when they are geographically distant. After constructing

W_{geo}

and

W_{sem}

, we fuse them using

α

and then sparsify the resulting affinity to obtain a tractable graph.

For efficiency and to avoid isolated nodes, we sparsify the fused affinity by retaining the top-k neighbors per node, symmetrize by

W \leftarrow max (W, W^{⊤})

, add self-loops, and apply row normalization before message passing. To ensure connectivity among monitoring sites, we preserve all station–station edges during sparsification so that station nodes form a fully connected subgraph, while pruning is applied to all other pairs.

When wind-driven transport effects are modeled, we further modulate edge strengths using wind speed and direction to obtain a time-dependent graph

G_{t}

, followed by the same sparsification and normalization. Unless stated otherwise,

α

and k are selected on the validation split.

3.4. STGPD: Spatiotemporal Graph Posterior Diffusion

We treat reconstruction as posterior sampling on graph-structured spatiotemporal fields under sparse observations. Let

X_{0} \in R^{T \times N \times 2}

denote the latent pollutant field over all nodes and times. Observed station readings follow:

Y = M_{st} ⊙ X_{0} + M_{st} ⊙ E, E \sim N (0, σ_{y}^{2} I),

(4)

where

M_{st}

is the station-only mask induced by

M_{raw}

. Grid-node pollutant entries are treated as unobserved; supervision is provided only by station entries, and the training loss is computed on the target subset introduced by re-masking (Section 4). Building on this observation model, we first learn a conditional diffusion prior over the full latent field and then enforce step-wise consistency with the visible observations during reverse-time sampling.

3.4.1. Diffusion Prior with a Graph-Temporal Denoiser

We learn a conditional diffusion prior over

X_{0}

. With a noise schedule

{β_{t}}_{t = 1}^{K}

, define

α_{t} = 1 - β_{t}

and

{\bar{α}}_{t} = \prod_{i = 1}^{t} α_{i}

. The forward process is:

X_{t} = \sqrt{{\bar{α}}_{t}} X_{0} + \sqrt{1 - {\bar{α}}_{t}} ϵ, ϵ \sim N (0, I) .

(5)

We train a denoiser

ϵ_{θ} (X_{t}, t, C)

to predict

ϵ

, where the conditioning set is

C = (Y ⊙ M_{c}, M_{c}, U, S, G or G_{t})

. Here,

ϵ_{θ} (\cdot)

denotes the neural network, parameterized by

θ

, that predicts the diffusion noise residual

ϵ

. Here,

M_{c}

denotes the conditioning mask: under the re-masking protocol,

M_{c} = M_{keep}

; otherwise

M_{c} = M_{raw}

. The denoiser follows a bidirectional graph-recurrent design: graph message passing captures spatial dependencies on

G

, and bidirectional recurrent updates model temporal evolution in both directions.

Training minimizes the standard noise-prediction loss on pollutant channels:

L_{prior} = E_{X_{0}, t, ϵ} [∥ ϵ - ϵ_{θ} (X_{t}, t, C) ∥_{2}^{2}] .

(6)

When grid-level ground truth is unavailable, the loss is evaluated only on supervised station entries and, under re-masking, restricted to the target subset indexed by

M_{tgt}

(details in Section 4). Once this prior is learned, we draw samples by reverse diffusion; however, under sparse sensing, unconstrained sampling can drift away from the observed station values, motivating an explicit consistency mechanism.

3.4.2. Noise-Aware Observation Consistency During Sampling

Unconstrained reverse diffusion samples from the learned prior and may drift when observations are sparse. STGPD enforces consistency by updating the reverse trajectory at every step, using only the visible observations specified by the re-masking protocol.

A standard DDPM-style proposal from

t \to t - 1

is:

X_{t - 1}^{prior} = \frac{1}{\sqrt{α_{t}}} (X_{t} - \frac{β_{t}}{\sqrt{1 - {\bar{α}}_{t}}} ϵ_{θ} (X_{t}, t, C)) + σ_{t} z, z \sim N (0, I) .

(7)

We then combine this prior proposal with an observation-based term in a noise-matched manner. To avoid inserting clean values into a noisy state and to ensure a leakage-free re-masking evaluation protocol, we first define the visible observation tensor

Y_{vis} = Y ⊙ M_{keep}

and re-noise these visible observations to the current diffusion level:

X_{t - 1}^{meas} = \sqrt{{\bar{α}}_{t - 1}} Y_{vis} + \sqrt{1 - {\bar{α}}_{t - 1}} ξ, ξ \sim N (0, I) .

(8)

Hard Consistency (Noise Matched)

We enforce consistency strictly on the visible subset. Let

M_{cons} = M_{keep}

be the consistency mask. The update is:

X_{t - 1} = M_{cons} ⊙ X_{t - 1}^{meas} + (1 - M_{cons}) ⊙ X_{t - 1}^{prior} .

(9)

In this hard-consistency variant, we treat the (re-noised) visible observations as exact constraints on the visible subset, i.e., we directly replace the corresponding entries after noise matching.

Soft Consistency (Noise-Aware Fusion)

When measurements are noisy, strict clamping can overfit corrupted readings. To make the role and scope of the proposed update explicit, we emphasize that the soft-consistency mechanism is intended as a practical approximation to posterior-guided reverse sampling under an explicit observation-noise assumption, rather than as a closed-form derivation of the exact reverse-time Bayesian posterior of the full diffusion process. Its purpose is to mitigate the clean–noisy mismatch caused by direct replacement of observations and to provide a noise-aware balance between measurement consistency and prior-guided denoising. Assuming additive observation noise with variance

σ_{y}^{2}

, the effective variance of the noise-matched observation term at step

t - 1

is:

ν_{t - 1}^{2} = (1 - {\bar{α}}_{t - 1}) + {\bar{α}}_{t - 1} σ_{y}^{2} .

(10)

On observed entries, we fuse the prior proposal and the measurement term via a Gaussian product-of-experts:

X_{t - 1}^{post} = \frac{ν_{t - 1}^{- 2} X_{t - 1}^{meas} + σ_{t}^{- 2} X_{t - 1}^{prior}}{ν_{t - 1}^{- 2} + σ_{t}^{- 2}} .

(11)

The state is then updated using the same consistency mask

M_{cons}

:

X_{t - 1} = M_{cons} ⊙ X_{t - 1}^{post} + (1 - M_{cons}) ⊙ X_{t - 1}^{prior} .

(12)

Under this interpretation,

X_{t - 1}^{meas}

and

X_{t - 1}^{prior}

can be viewed as two Gaussian experts: the former carries the noise-matched observational information, while the latter represents the prior-guided reverse proposal produced by the denoiser and sampler. Their precision-weighted fusion is therefore best understood as a practical Gaussian approximation to the local posterior update, not as a claim of exact Bayesian optimality under the full generative model. As t decreases, diffusion noise diminishes and the constraint naturally tightens; the update approaches hard consistency when

σ_{y}^{2}

is small. This interpretation is also consistent with the empirical behavior observed in the sensitivity analysis of

σ_{y}

(Section 4.6.3): moderate nonzero values provide a better balance between observational conditioning and prior guidance than strict hard clamping, whereas excessively large values weaken the contribution of measurements. Unless stated otherwise,

σ_{y}

is defined in the normalized space. When

σ_{y}

is specified in physical units, it is converted to the normalized space by dividing by the training-split standard deviation of the corresponding pollutant.

During re-masking evaluation, the target entries indexed by

M_{tgt}

are treated as latent variables and excluded from

M_{cons}

and

Y_{vis}

, ensuring that they do not influence the reverse diffusion trajectory. The denoiser conditioning similarly uses

M_{c} = M_{keep}

. With this step-wise consistency update in place, we can use different samplers to trade off speed and accuracy, and obtain uncertainty estimates by repeated posterior sampling.

3.4.3. Fast Sampling and Uncertainty Estimation

The consistency updates in Equations (9)–(12) are applied after each solver step and are therefore compatible with different samplers. In experiments, we report results for DDPM and accelerated solvers (DDIM, DPM-Solver++) under varying numbers of function evaluations (NFE). Following the experimental definition in Section 4.1, no clamping refers to standard conditional diffusion where observations enter only through the denoiser inputs (conditioning), without explicit replacement or fusion on the reverse trajectory.

Uncertainty is obtained by Monte Carlo sampling. Given S posterior samples

{X_{0}^{(s)}}_{s = 1}^{S}

, we estimate moments by:

\hat{E} [X_{0} ∣ Y] = \frac{1}{S} \sum_{s = 1}^{S} X_{0}^{(s)}, \hat{Var} [X_{0} ∣ Y] = \frac{1}{S - 1} \sum_{s = 1}^{S} {(X_{0}^{(s)} - \hat{E} [X_{0} ∣ Y])}^{2} .

(13)

Deterministic metrics use the posterior mean, while probabilistic metrics (e.g., CRPS) are computed from the sample set.

Training follows the standard conditional diffusion formulation with the noise-prediction objective. For each mini-batch, a diffusion step

t \sim Uniform ({1, \dots, K})

is sampled and

ϵ \sim N (0, I)

is drawn to obtain the noisy state

X_{t}

via Equation (5). Observed station pollutant entries are randomly partitioned into a visible subset and a target subset, forming

M_{keep}

and

M_{tgt}

with

M_{keep} + M_{tgt} = M_{st}

. The denoiser is conditioned on

C = (Y ⊙ M_{keep}, M_{keep}, U, S, G or G_{t})

, and parameters

θ

are optimized by minimizing Equation (6), with supervision restricted to the target entries indexed by

M_{tgt}

(station entries only).

3.5. Implementation Details

For reproducibility, we provide additional implementation details here. The proposed STGPD framework was implemented in PyTorch 2.0 and trained on a workstation equipped with a single NVIDIA V100 GPU (NVIDIA Corporation, Santa Clara, CA, USA) with 24 GB VRAM, in a Linux environment (Ubuntu 22.04). The core denoiser adopts a bidirectional graph-recurrent architecture with a hidden dimension of 64. For the diffusion process, we employed a linear noise schedule with

K = 1000

steps, where

β_{1} = 1 \times 10^{- 4}

and

β_{K} = 0.02

. During inference, DPM-Solver++ was used with

N F E = 20

to accelerate sampling while maintaining reconstruction fidelity. All major hyperparameters, including the learning rate, batch size, graph construction settings, and soft-consistency configuration, were selected based on validation performance. The reported learning rate was retained after preliminary validation-based tuning because it yielded stable convergence in all reported experiments. In particular, the soft-consistency parameter

σ_{y}

was set to 0.05 in the normalized space after sensitivity analysis, so as to provide a practical balance between diffusion-prior guidance and sparse observational constraints. These details are reported to improve transparency and to help readers better assess the robustness and reproducibility of the proposed framework.

4. Experiments and Results

We evaluate STGPD using the Augsburg dataset. Since exhaustive grid-level ground truth is unavailable, quantitative evaluation relies on station-level cross-validation under controlled re-masking and leave-one-station-out protocols. Therefore, the reported results provide indirect but practically relevant evidence for probabilistic grid-field reconstruction under sparse observational constraints, rather than exhaustive verification of all grid-level predictions. We assess performance across four dimensions: (1) reconstruction accuracy under standard sparsity; (2) spatial generalization via station hold-out tests; (3) robustness to extreme data loss; and (4) resilience to sensor noise injection and sampling-efficiency trade-offs. We next describe the experimental protocol and implementation details before presenting results for each evaluation dimension.

4.1. Experimental Setup

To evaluate temporal generalization, the dataset is partitioned chronologically to prevent information leakage from future timestamps. The training, validation, and testing splits follow a standard temporal order. Deterministic metrics are computed on de-normalized values (

μ g / m^{3}

) to reflect physical magnitudes, while probabilistic scores (CRPS) are reported in the normalized space to facilitate comparison between PM₁₀ and NO₂.

We simulate sparse supervision using a randomized re-masking strategy on station observations. Let

M_{st}

denote the binary mask of available station entries. On the test split, we randomly partition observed station entries into two disjoint subsets: a conditioning set (

M_{keep}

) used as model input, and a target set (

M_{tgt}

) reserved strictly for evaluation. The protocol is:

Input: the model is conditioned on $Y ⊙ M_{keep}$ together with exogenous covariates (meteorology, traffic, static descriptors) and the graph structure.
Output: the model reconstructs the full spatiotemporal field; error metrics are computed only on entries indexed by $M_{tgt}$ .

This protocol ensures that target values never influence the sampling trajectory, providing a leakage-free assessment of imputation performance. Unless specified otherwise, 20% of available station entries are assigned to

M_{tgt}

. All methods share the same fixed re-masking partition on the test split, generated with a single random seed.

To contextualize performance, we compare STGPD against baselines from three categories:

1.: Spatial Interpolation: IDW and Kriging, applied per timestamp using station coordinates and the available observations in $M_{keep}$ . These methods use only same-timestep information and ignore temporal context.
2.: Deterministic Deep Learning: BRITS [53] and GRIN [37], which learn spatiotemporal dependencies but output single-point estimates.
3.: Probabilistic Diffusion Models: state-of-the-art diffusion imputers including CSDI [20], PriSTI [21], SaSDim [22], RDPI [23], and CoFILL [24].

STGPD employs a DDPM sampler with

K = 200

diffusion steps by default. The model is trained using Adam with a learning rate of

10^{- 4}

and a batch size of 32. Early stopping is applied based on validation loss.All experiments were executed on a workstation equipped with a single NVIDIA V100 GPU (NVIDIA Corporation, Santa Clara, CA, USA). For clarity, the observational time series are split chronologically into training, validation, and test subsets, and only station observations are used as supervised references during model optimization and quantitative evaluation. In contrast, the city-grid nodes serve as unlabeled inference targets without exhaustive ground-truth values. Under the standard evaluation protocol, only observed station entries in the test period are re-masked and used as targets, while naturally missing entries remain excluded unless explicitly involved in a dedicated masking protocol.

4.2. Performance Under Random Re-Masking

We first report overall imputation performance under the standard random re-masking setting. Table 2 reports deterministic metrics under random re-masking. Across both pollutants, STGPD achieves the lowest RMSE/MAE and the highest

R^{2}

. Table 3 reports CRPS for diffusion-based models; STGPD yields the lowest CRPS, indicating improved probabilistic quality under the same supervision.Although the absolute margin over the strongest baseline is moderate under the standard re-masking setting, repeated-run significance tests indicate that the improvement is statistically reliable rather than attributable to random fluctuation alone (Figure 3). Moreover, the practical contribution of STGPD is not limited to average point-error reduction, but also lies in uncertainty-aware conditioning and more stable behavior under sparse, noisy, and partially missing observational settings. This behavior is consistent with the method design, in which observational information is incorporated through noise-aware soft consistency rather than rigid replacement during reverse diffusion.

4.3. Station Outage Extrapolation: Leave-One-Out

We next evaluate spatial extrapolation under full station outages via 4-fold LOO evaluation. In each fold, we mask all pollutant observations from one station for the entire test horizon, while keeping its node in the graph. Exogenous covariates and static descriptors remain available at the held-out station. We evaluate a single model trained with the standard random re-masking objective, without retraining per fold.

Table 4 reports RMSE on the held-out station for each fold, and Table 5 reports the corresponding normalized CRPS. Figure 4 visualizes absolute error distributions. STGPD yields lower errors and tighter interquartile ranges than PriSTI across stations, demonstrating robust spatial extrapolation under station outages.

4.4. Uncertainty Analysis and Interpretation of Grid-Level Reconstructions

We acknowledge that the station-based evaluation protocols used in this study provide only indirect evidence for grid-level reconstruction quality in fully unmonitored areas. Because exhaustive gridded ground truth is unavailable, the reconstructed city-scale fields should be interpreted as probabilistic spatial inferences constrained by station observations, rather than as fully verified deterministic surfaces.

To better justify the grid-level reconstructions, we added an uncertainty analysis based on the posterior standard deviation of diffusion samples. Figure 5 presents a representative example of the posterior mean field, the corresponding posterior standard deviation field, and the relationship between uncertainty and the distance to the nearest monitoring station. The posterior mean illustrates the inferred city-scale concentration pattern under sparse observational constraints, while the posterior standard deviation provides a spatially explicit diagnostic of confidence in the reconstruction.

A clear spatial pattern can be observed: uncertainty is generally lower near observed stations and increases in less constrained regions, especially where station support is sparse and spatial extrapolation is required. This behavior is consistent with the design of the proposed conditioning mechanism. Grid nodes close to monitoring stations are more strongly constrained by the available observations through the soft-consistency update, whereas nodes farther away rely more heavily on the learned spatiotemporal prior. As a result, posterior uncertainty naturally increases with extrapolation distance.

The positive relationship between posterior uncertainty and the distance to the nearest monitoring station shown in Figure 5c further supports this interpretation. Although this analysis does not replace direct full-field validation, it provides an interpretable confidence diagnostic for identifying where the reconstructed field is more strongly supported by observations and where it should be interpreted with greater caution. Under extremely sparse monitoring conditions, such uncertainty-aware interpretation is particularly important for practical use of city-scale reconstructed air-quality surfaces.

4.5. Robustness to Extreme Sparsity

We further stress-test robustness under extreme information loss, considering both spatial sparsity (fewer conditioning stations) and temporal sparsity (fewer observed timestamps).

We reduce the number of conditioning stations from four down to one. For each

k \in {1, 2, 3, 4}

, we select k stations as conditioning stations and mask pollutant observations from the remaining stations in the conditioning mask, while keeping all nodes and covariates available. Evaluation is performed on the same target set indexed by

M_{tgt}

, so the reported results remain directly comparable across different k and consistent with the standard re-masking setting. This design isolates the effect of reduced conditioning information while keeping the evaluation target distribution fixed.

We first form

M_{tgt}

by masking 20% of observed station entries on the test split. From the remaining entries

M_{keep}

, we further subsample a proportion

p \in {100 %, 50 %, 10 %, 1 %}

as the effective conditioning mask. All models are evaluated on the same targets indexed by

M_{tgt}

. Figure 6 shows that STGPD degrades most gracefully as information decreases, maintaining a clear gap over baselines under both extreme spatial and temporal sparsity.

4.6. Efficiency and Noise-Aware Consistency

Finally, we analyze the trade-off between sampling efficiency and robustness, focusing on how different consistency strategies behave under fewer function evaluations and under explicit sensor-noise injection.

4.6.1. Sampling Efficiency with Fast Solvers

We evaluate DPM-Solver++ with different numbers of function evaluations (NFE): 10, 15, 20, 50, and 200. We compare STGPD (Soft Clamping) against Hard Clamping, Naive Replacement, and No Clamping. All variants are evaluated under the same station-target re-masking protocol. As NFE decreases, all methods exhibit the expected accuracy–efficiency trade-off. Across the tested settings, Soft Clamping consistently provides the most stable conditioning behavior while retaining competitive accuracy at low NFE. This behavior is consistent with the role of the soft-consistency mechanism, which is designed to balance observational conditioning against prior-guided denoising rather than enforcing rigid replacement of measurements.

4.6.2. Robustness to Sensor Noise

We inject additive Gaussian noise into conditioning observations at inference time:

\tilde{Y} = Y + η

, with

η \sim N (0, σ_{noise}^{2})

and

σ_{noise} \in {0, 5, 10, 20} μ g / m^{3}

. We compare Soft Clamping (adaptive

σ_{y}

) against Hard Clamping, Naive Replacement, and No Clamping.

Figure 7 shows that Soft Clamping remains stable across noise levels, whereas Hard Clamping and Naive Replacement degrade as they force the sampler to fit corrupted measurements. This result is consistent with the noise-aware design of the proposed fusion step, in which the confidence assigned to measurements is adjusted through

σ_{y}

instead of treating all observations as equally reliable.

4.6.3. Sensitivity to the Observation-Noise Parameter $σ_{y}$

To further clarify the role of the soft-consistency parameter, we vary

σ_{y}

while keeping all other settings fixed and evaluate both reconstruction accuracy (RMSE) and probabilistic quality (CRPS). Figure 8 shows that moderate nonzero values of

σ_{y}

provide the best balance between observational conditioning and diffusion-prior guidance. In contrast, hard clamping (

σ_{y} = 0

) over-trusts the observations, whereas excessively large values weaken conditioning and degrade performance. These results support interpreting

σ_{y}

as an effective observation-noise parameter that controls the confidence assigned to conditioning measurements, rather than as an exact estimate of raw sensor error.

4.7. Interpreting the Learned Spatial Receptive Field

To provide a complementary interpretability view, we analyze the learned attention weights over 19 buffer radii (0.1–5.0 km) used in static geographic descriptors. Figure 9 indicates that NO₂ places relatively more weight on localized context, whereas PM₁₀ exhibits a broader receptive field, consistent with their expected spatial characteristics.

The distinct scale sensitivities of NO₂ and PM₁₀ can be plausibly linked to differences in their dominant urban atmospheric processes. NO₂ is more strongly associated with local traffic-related emissions and near-road concentration gradients, so descriptors extracted at relatively small buffer radii are expected to be more informative. In contrast, PM₁₀ is influenced not only by local sources but also by broader-scale processes such as regional transport, background concentrations, and resuspension. This may help explain why PM₁₀ exhibits weaker importance at very small radii (

< 0.3

km) and a comparatively stronger contribution from larger contextual scales. Therefore, the observed scale-dependent descriptor weights are broadly consistent with the differing spatial representativeness of traffic-dominated versus more regionally modulated pollutants.

5. Conclusions

We studied city-scale spatiotemporal air-quality reconstruction in Augsburg from sparse and unreliable station measurements, focusing on PM₁₀ and NO₂. We proposed STGPD, a graph-structured diffusion framework that formulates reconstruction as constrained posterior-guided sampling and enforces observation consistency throughout the reverse diffusion trajectory. The key component is a noise-aware soft-consistency update that re-noises measurements to the current diffusion level and adaptively relaxes constraints under sensor corruption, avoiding the clean–noisy mismatch introduced by hard replacement.

Across all evaluated stress scenarios, including reduced station availability, heavy temporal missingness, full station outages, and explicit sensor-noise injection, STGPD shows lower error growth and more stable uncertainty behavior than the evaluated baselines. These gains are driven by variance-weighted fusion, which balances the model prior with a noise-matched observation term instead of forcing exact agreement with potentially corrupted readings. In addition, the learned multi-scale spatial attention highlights pollutant-specific receptive fields across buffer radii, providing an interpretable view of which spatial scales contribute most to reconstruction. These properties suggest that STGPD may be useful for urban exposure assessment and policy-relevant environmental mapping under sparse monitoring, although further validation beyond the present case study remains necessary. Extending the evaluation to additional cities and incorporating richer transport-aware graph constructions are natural next steps.

This study also has several limitations. First, the analysis is based on a single-city case study with a very sparse monitoring network, which limits the strength of broader generalization claims across cities, climatic conditions, or monitoring-density regimes. Second, although the proposed framework reconstructs dense city-grid air-quality fields, exhaustive gridded ground truth is unavailable; therefore, grid-level performance in fully unmonitored areas can only be assessed indirectly through station-based validation protocols and uncertainty analysis. In this sense, the current evaluation provides supporting evidence for probabilistic spatial reconstruction under sparse observational constraints, rather than exhaustive verification of all grid-level predictions. Third, source-related heterogeneity is represented indirectly through geographic descriptors and exogenous covariates rather than through explicit source-resolved emission modeling. Consequently, the present results should be interpreted as evidence from a promising sparse-monitoring case study rather than as definitive proof of universal superiority or broad external generalization. Future work should extend the analysis to additional cities, denser monitoring settings, and independent spatial references where possible.

Author Contributions

Conceptualization, Guanglei Zheng and Yuchai Wan; methodology, Guanglei Zheng and Xun Zhang; software, Guanglei Zheng; validation, Guanglei Zheng and Xun Zhang; formal analysis, Guanglei Zheng; investigation, Guanglei Zheng and Xiansheng Liu; data curation, Guanglei Zheng and Xiansheng Liu; writing—original draft preparation, Guanglei Zheng; writing—review and editing, Yuchai Wan, Xun Zhang and Xiansheng Liu; visualization, Guanglei Zheng; supervision, Yuchai Wan; project administration, Yuchai Wan. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Guangdong Basic and Applied Basic Research Foundation (2026A1515030034) and the National Natural Science Foundation of China (42407566).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available from the corresponding author upon reasonable request. Some data may be subject to third-party restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

STGPD	SpatioTemporal Graph Posterior Diffusion
PM₁₀	Particulate Matter with diameter ≤ 10 μm
NO₂	Nitrogen Dioxide
GNN	Graph Neural Network
DDPM	Denoising Diffusion Probabilistic Models
LOO	Leave-One-Out
CRPS	Continuous Ranked Probability Score
RMSE	Root Mean Square Error

References

Zhang, T.; Zheng, B.; Huang, R. Adaptive high-resolution mapping of air pollution with a novel implicit 3D representation approach. npj Clim. Atmos. Sci. 2025, 8, 180. [Google Scholar] [CrossRef]
Lin, H.; Li, S.; Niu, J.; Yang, J.; Wang, Q.; Li, W.; Liu, S. Estimation of Ultrahigh Resolution PM2.5 in Urban Areas by Using 30 m Landsat-8 and Sentinel-2 AOD Retrievals. Remote Sens. 2025, 17, 2609. [Google Scholar] [CrossRef]
Lolli, S. Urban PM2.5 concentration monitoring: A review of recent advances in ground-based, satellite, model, and machine learning integration. Urban Clim. 2025, 63, 102566. [Google Scholar] [CrossRef]
Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. S. Afr. Inst. Min. Metall. 1951, 52, 119–139. [Google Scholar]
Shepard, D. A two-dimensional interpolation function for irregularly-spaced data. In ACM ‘68: Proceedings of the 1968 23rd ACM National Conference; Association for Computing Machinery: New York, NY, USA, 1968; pp. 517–524. [Google Scholar]
Liang, Y.; Xia, Y.; Ke, S.; Wang, Y.; Wen, Q.; Zhang, J.; Zheng, Y.; Zimmermann, R. AirFormer: Predicting Nationwide Air Quality in China with Transformers. In AAAI’23/IAAI’23/EAAI’23: Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2023. [Google Scholar]
Deng, Z.; Du, X.; Yang, H. MSTGAN: A Multi-scale Spatiotemporal Graph Adversarial Network for Air Quality Prediction. Inf. Sci. 2024, 676, 120804. [Google Scholar] [CrossRef]
Chen, Z.; Zhao, J. Air Quality Prediction Based on Dynamic Graph Convolutional Network with Spatiotemporal Attention Mechanism. In 2024 4th International Conference on Electronic Information Engineering and Computer Science (EIECS); IEEE: New York, NY, USA, 2024; pp. 740–744. [Google Scholar]
Feng, Y.; Wang, Q.; Xia, Y.; Huang, J.; Zhong, S.; Liang, Y. Spatio-temporal field neural networks for air quality inference. arXiv 2024, arXiv:2403.02354. [Google Scholar] [CrossRef]
Lin, Z.; Wang, S.; Xu, J.; Shi, P.; Ma, Y.; Wang, Y.; Zhang, G. A graph neural networks approach predicted spatiotemporal changes of ozone concentrations in the Yangtze River Delta (China). Environ. Res. Commun. 2025, 7, 091004. [Google Scholar] [CrossRef]
Huo, T.; Li, H.; Zhang, H.; Liu, W.; Sun, S.; Huang, Z.; Xie, W. Air pollutant prediction by spatial-temporal information reconstruction and fusion. Results Eng. 2025, 28, 107393. [Google Scholar] [CrossRef]
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. arXiv 2018, arXiv:1707.01926. [Google Scholar] [CrossRef]
Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2017, arXiv:1709.04875. [Google Scholar]
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In IJCAI’19: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI); AAAI Press: Washington, DC, USA, 2019. [Google Scholar]
Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In KDD ‘20: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2020; pp. 753–763. [Google Scholar]
Zheng, C.; Fan, X.; Wang, C.; Qi, J. GMAN: A graph multi-attention network for traffic prediction. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1234–1241. [Google Scholar] [CrossRef]
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Proc. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
Jin, M.; Koh, H.Y.; Wen, Q.; Zambon, D.; Alippi, C.; Webb, G.I.; King, I.; Pan, S. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10466–10485. [Google Scholar] [CrossRef]
Tashiro, Y.; Song, J.; Song, Y.; Ermon, S. CSDI: Conditional Score-Based Diffusion Models for Probabilistic Time Series Imputation. Proc. Adv. Neural Inf. Process. Syst. 2021, 34, 24804–24816. [Google Scholar]
Liu, M.; Li, Y.; Zhang, G.; Wang, Y.; Liu, J.; Li, G. PriSTI: A Conditional Diffusion Framework for Spatiotemporal Imputation. In 2023 IEEE 39th International Conference on Data Engineering (ICDE); IEEE: New York, NY, USA, 2023. [Google Scholar]
Zhang, S.; Wang, S.; Tan, X.; Liu, R.; Zhang, J.; Wang, J. Sasdim: Self-adaptive noise scaling diffusion model for spatial time series imputation. arXiv 2023, arXiv:2309.01988. [Google Scholar] [CrossRef]
Liu, Z.; Zhao, X.; Song, Y. RDPI: A Refine Diffusion Probability Generation Method for Spatiotemporal Data Imputation. Proc. AAAI Conf. Artif. Intell. 2025, 39, 12255–12263. [Google Scholar] [CrossRef]
He, W.; Lin, R.; Shen, L.; Kwok, J. Filling the Missings: Spatiotemporal Data Imputation by Conditional Diffusion. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Montreal, QC, Canada, 16–22 August 2025. [Google Scholar]
Yang, Y.; Jin, M.; Wen, H.; Zhang, C.; Liang, Y.; Ma, L.; Wang, Y.; Liu, C.; Yang, B.; Xu, Z.; et al. A survey on diffusion models for time series and spatio-temporal data. ACM Comput. Surv. 2026, 58, 196. [Google Scholar] [CrossRef]
Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Van Gool, L. Repaint: Inpainting using denoising diffusion probabilistic models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2022; pp. 11461–11471. [Google Scholar]
Wu, Q.; Han, M.; Jiang, T.; Jiang, C.; Luo, J.; Jiang, M.; Fan, H.; Liu, S. Realistic noise synthesis with diffusion models. Proc. AAAI Conf. Artif. Intell. 2025, 39, 8432–8440. [Google Scholar] [CrossRef]
Chung, H.; Kim, J.; Mccann, M.T.; Klasky, M.L.; Ye, J.C. Diffusion posterior sampling for general noisy inverse problems. arXiv 2022, arXiv:2209.14687. [Google Scholar]
Kawar, B.; Elad, M.; Ermon, S.; Song, J. Denoising diffusion restoration models. Proc. Adv. Neural Inf. Process. Syst. 2022, 35, 23593–23606. [Google Scholar]
Dou, Z.; Song, Y. Diffusion posterior sampling for linear inverse problem solving: A filtering perspective. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024. [Google Scholar]
Aali, A.; Daras, G.; Levac, B.; Kumar, S.; Dimakis, A.G.; Tamir, J.I. Ambient diffusion posterior sampling: Solving inverse problems with diffusion models trained on corrupted data. arXiv 2024, arXiv:2403.08728. [Google Scholar] [CrossRef]
Yismaw, N.; Kamilov, U.S.; Asif, M.S. Gaussian is all you need: A unified framework for solving inverse problems via diffusion posterior sampling. IEEE Trans. Comput. Imaging 2025, 11, 1020–1030. [Google Scholar] [CrossRef]
Li, J.; Wang, C. Efficient Diffusion Posterior Sampling for Noisy Inverse Problems. SIAM J. Imaging Sci. 2025, 18, 1468–1492. [Google Scholar] [CrossRef]
Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
Lu, C.; Zhou, Y.; Bao, F.; Chen, J.; Li, C.; Zhu, J. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. Mach. Intell. Res. 2025, 22, 730–751. [Google Scholar] [CrossRef]
Zhao, Z.; Luo, Z.; Sjölund, J.; Schön, T. Conditional sampling within generative diffusion models. Philos. Trans. A 2025, 383, 20240329. [Google Scholar] [CrossRef]
Cini, A.; Marisca, I.; Alippi, C. Multivariate Time Series Imputation by Graph Neural Networks. arXiv 2021, arXiv:2108.00298. [Google Scholar]
Ma, M.; Xie, P.; Teng, F.; Wang, B.; Ji, S.; Zhang, J.; Li, T. HiSTGNN: Hierarchical spatio-temporal graph neural network for weather forecasting. Inf. Sci. 2023, 648, 119580. [Google Scholar] [CrossRef]
Yao, S.; Huang, B. Spatiotemporal interpolation using graph neural network. Ann. Am. Assoc. Geogr. 2023, 113, 1856–1877. [Google Scholar] [CrossRef]
Liu, D.; Wang, J.; Shang, S.; Han, P. MSDR: Multi-step dependency relation networks for spatial temporal forecasting. In KDD ‘22: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1042–1050. [Google Scholar]
Jiang, J.; Han, C.; Zhao, W.X.; Wang, J. Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. Proc. AAAI Conf. Artif. Intell. 2023, 37, 4365–4373. [Google Scholar] [CrossRef]
Wang, D.; Cao, J.; Zhang, B.; Zhang, Y.; Xie, L. A Novel Flexible Geographically Weighted Neural Network for High-Precision PM2.5 Mapping across the Contiguous United States. ISPRS Int. J. Geo-Inf. 2024, 13, 217. [Google Scholar] [CrossRef]
Kaveh, M.; Mesgari, M.S.; Kaveh, M. A Novel Evolutionary Deep Learning Approach for PM2.5 Prediction Using Remote Sensing and Spatial–Temporal Data: A Case Study of Tehran. ISPRS Int. J. Geo-Inf. 2025, 14, 42. [Google Scholar] [CrossRef]
Yuan, Y.; Han, C.; Ding, J.; Jin, D.; Li, Y. Urbandit: A foundation model for open-world urban spatio-temporal learning. arXiv 2024, arXiv:2411.12164. [Google Scholar]
Liang, Y.; Wen, H.; Xia, Y.; Jin, M.; Yang, B.; Salim, F.; Wen, Q.; Pan, S.; Cong, G. Foundation models for spatio-temporal data science: A tutorial and survey. In KDD ‘25: Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2; Association for Computing Machinery: New York, NY, USA, 2025; pp. 6063–6073. [Google Scholar]
Fang, Y.; Miao, H.; Liang, Y.; Deng, L.; Cui, Y.; Zeng, X.; Xia, Y.; Zhao, Y.; Pedersen, T.B.; Jensen, C.S.; et al. Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review. arXiv 2025, arXiv:2506.01364. [Google Scholar] [CrossRef]
Shu, D.; Li, Z.; Barati Farimani, A. Physics-Informed Diffusion Models. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna Austria, 7–11 May 2024. [Google Scholar]
Zhou, N.; Hong, W.; Wang, H.; Zheng, J.; Wang, Q.; Song, Y.; Zhang, X.P.; Li, Y.; Chen, X. STeP-Diff: Spatio-Temporal Physics-Informed Diffusion Models for Mobile Fine-Grained Pollution Forecasting. arXiv 2025, arXiv:2512.04385. [Google Scholar] [CrossRef]
Delavar, M.R.; Gholami, A.; Shiran, G.R.; Rashidi, Y.; Nakhaeizadeh, G.R.; Fedra, K.; Hatefi Afshar, S. A novel method for improving air pollution prediction based on machine learning approaches: A case study applied to the capital city of Tehran. ISPRS Int. J. Geo-Inf. 2019, 8, 99. [Google Scholar] [CrossRef]
Chen, K.; Li, G.; Li, H.; Wang, Y.; Wang, W.; Liu, Q.; Wang, H. Quantifying uncertainty: Air quality forecasting based on dynamic spatial-temporal denoising diffusion probabilistic model. Environ. Res. 2024, 249, 118438. [Google Scholar] [CrossRef] [PubMed]
Gneiting, T.; Raftery, A.E.; Westveld, A.H., III; Goldman, T. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev. 2005, 133, 1098–1118. [Google Scholar] [CrossRef]
Pang, T.; Lu, C.; Du, C.; Lin, M.; Yan, S.; Deng, Z. On calibrating diffusion probabilistic models. Proc. Adv. Neural Inf. Process. Syst. 2023, 36, 49234–49249. [Google Scholar]
Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; Li, Y. BRITS: Bidirectional Recurrent Imputation for Time Series. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 2–8 December 2018. [Google Scholar]

Figure 1. Study area and monitoring stations. Left: location of Augsburg in Germany. Right: Augsburg boundary with air-quality monitoring stations and traffic monitoring locations used to derive traffic covariates.

Figure 2. Overview of STGPD. The model learns a conditional diffusion prior on a fused geographic-semantic graph and enforces step-wise observation consistency via noise-matched re-noising with hard clamping or noise-aware soft fusion, enabling posterior sampling and uncertainty estimation over the city grid.

Figure 3. Repeated-run significance tests under the standard random re-masking setting. RMSE distributions across repeated runs are shown for STGPD and the strongest baseline CoFILL on PM₁₀ and NO₂. The performance differences are statistically significant for both pollutants, indicating that the observed improvement is unlikely to be explained by random fluctuation alone.

Figure 4. LOO absolute error distributions for PM₁₀(a) and NO₂ (b) across stations (S1–S4). STGPD shows lower median errors and tighter spreads compared to GRIN and PriSTI.

Figure 5. Uncertainty analysis for a representative reconstruction case. (a) Posterior mean field. (b) Posterior standard deviation field. (c) Relationship between posterior uncertainty and distance to the nearest monitoring station. Triangles indicate monitoring stations. Uncertainty is generally lower near observed stations and increases in less constrained regions, providing an interpretable confidence diagnostic for grid-level reconstructions under sparse monitoring conditions.

Figure 6. Robustness to sparsity. Subfigure (a): PM₁₀ under spatial sparsity (varying observed station count). Subfigure (b): PM₂ under temporal sparsity (varying retained observation ratio). Subfigure (c): NO₂ under spatial sparsity (varying observed station count). Subfigure (d): NO₂ under temporal sparsity (varying retained observation ratio).

Figure 7. Robustness to sensor noise for PM₁₀ (a) and NO₂ (b). Soft (STGPD) denotes noise-aware soft clamping with adaptive

σ_{y}

, while Soft (Fixed) uses a fixed

σ_{y}

.

Figure 7. Robustness to sensor noise for PM₁₀ (a) and NO₂ (b). Soft (STGPD) denotes noise-aware soft clamping with adaptive

σ_{y}

, while Soft (Fixed) uses a fixed

σ_{y}

.

Figure 8. Sensitivity of STGPD to the effective observation-noise parameter

σ_{y}

. Subfigure (a): RMSE as a function of

σ_{y}

. Subfigure (b): CRPS as a function of

σ_{y}

. A moderate nonzero

σ_{y}

yields the best performance in both RMSE and CRPS, while hard clamping (

σ_{y} = 0

) and overly large

σ_{y}

lead to degraded results.

Figure 8. Sensitivity of STGPD to the effective observation-noise parameter

σ_{y}

. Subfigure (a): RMSE as a function of

σ_{y}

. Subfigure (b): CRPS as a function of

σ_{y}

. A moderate nonzero

σ_{y}

yields the best performance in both RMSE and CRPS, while hard clamping (

σ_{y} = 0

) and overly large

σ_{y}

lead to degraded results.

Figure 9. Normalized multi-scale spatial attention weights over buffer radii (0.1–5.0 km) for NO₂ and PM₁₀.

Table 1. Summary of the fixed monitoring stations and measurement information used in this study.

Station/Site Type	Measurement Method	QA/QC
Königsplatz (urban traffic)	Routine hourly monitoring	Available; re-masked
Bourges-Platz (urban background)	Routine hourly monitoring	Available; re-masked
Karlstraße (urban traffic)	Routine hourly monitoring	Available; re-masked
UAS (urban background)	Routine hourly monitoring	Available; re-masked

Table 2. Deterministic performance on pseudo-missing station entries under re-masking. Bold indicates the best result and underlining indicates the second-best result. ↓ indicates lower is better for RMSE/MAE; ↑ indicates higher is better for

R^{2}

.

Table 2. Deterministic performance on pseudo-missing station entries under re-masking. Bold indicates the best result and underlining indicates the second-best result. ↓ indicates lower is better for RMSE/MAE; ↑ indicates higher is better for

R^{2}

.

Model	PM₁₀			NO₂
Model	RMSE↓	MAE↓	$R^{2} ↑$	RMSE↓	MAE↓	$R^{2} ↑$
IDW	10.21	6.98	0.71	12.56	8.71	0.68
Kriging	9.87	6.72	0.73	12.11	8.39	0.70
BRITS	8.55	5.81	0.79	10.88	7.55	0.76
GRIN	8.12	5.50	0.81	10.34	7.18	0.78
CSDI	7.89	5.32	0.82	10.01	6.95	0.79
PriSTI	7.65	5.15	0.83	9.75	6.70	0.79
SaSDim	7.72	5.20	0.83	9.80	6.75	0.80
RDPI	7.68	5.18	0.83	9.78	6.72	0.80
CoFILL	7.62	5.12	0.83	9.72	6.68	0.81
STGPD	7.51	5.05	0.84	9.58	6.62	0.82

Table 3. Probabilistic performance (Normalized CRPS) on pseudo-missing station entries (

S = 20

samples). Bold indicates the best result; underlining indicates the second-best result; ↓ indicates lower is better.

Table 3. Probabilistic performance (Normalized CRPS) on pseudo-missing station entries (

S = 20

samples). Bold indicates the best result; underlining indicates the second-best result; ↓ indicates lower is better.

Model	CRPS↓
Model	PM₁₀	NO₂
CSDI	0.085	0.098
PriSTI	0.081	0.093
SaSDim	0.082	0.094
RDPI	0.080	0.092
CoFILL	0.081	0.093
STGPD	0.079	0.091

Table 4. LOO probabilistic performance (RMSE) on the held-out station. Bold indicates the best result; underlining indicates the second-best result; ↓ indicates lower is better.

Model	PM₁₀ RMSE↓					NO₂ RMSE↓
Model	S1	S2	S3	S4	Avg.	S1	S2	S3	S4	Avg.
GRIN	9.12	8.48	8.93	8.71	8.81	11.54	10.92	11.28	11.15	11.22
PriSTI	8.55	7.92	8.33	8.14	8.24	10.83	10.25	10.61	10.38	10.52
STGPD	8.21	7.68	8.02	7.89	7.95	10.47	9.88	10.29	10.12	10.19

Table 5. LOO probabilistic performance (CRPS) on the held-out station. Bold indicates the best result; underlining indicates the second-best result; ↓ indicates lower is better.

Model	CRPS↓ (per Held-Out Station)				Avg.
Model	S1	S2	S3	S4	Avg.
PriSTI	0.090	0.083	0.087	0.085	0.086
STGPD	0.086	0.081	0.084	0.083	0.083

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Share and Cite

MDPI and ACS Style

Zheng, G.; Wan, Y.; Zhang, X.; Liu, X. Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations. ISPRS Int. J. Geo-Inf. 2026, 15, 171. https://doi.org/10.3390/ijgi15040171

AMA Style

Zheng G, Wan Y, Zhang X, Liu X. Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations. ISPRS International Journal of Geo-Information. 2026; 15(4):171. https://doi.org/10.3390/ijgi15040171

Chicago/Turabian Style

Zheng, Guanglei, Yuchai Wan, Xun Zhang, and Xiansheng Liu. 2026. "Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations" ISPRS International Journal of Geo-Information 15, no. 4: 171. https://doi.org/10.3390/ijgi15040171

APA Style

Zheng, G., Wan, Y., Zhang, X., & Liu, X. (2026). Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations. ISPRS International Journal of Geo-Information, 15(4), 171. https://doi.org/10.3390/ijgi15040171

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Noise-Aware Diffusion for City-Scale Air-Quality Reconstruction from Sparse Monitoring Stations

Abstract

1. Introduction

2. Related Work

2.1. Spatiotemporal Imputation and Graph Neural Networks

2.2. Diffusion Models for Spatiotemporal Learning

2.3. Inverse Problems and Physics-Informed Generative Modeling

2.4. Uncertainty Quantification and Sampling Efficiency

3. Methodology

3.1. Study Area and Data Sources

3.2. Preprocessing and Input Tensors

3.3. Graph Construction

3.4. STGPD: Spatiotemporal Graph Posterior Diffusion

3.4.1. Diffusion Prior with a Graph-Temporal Denoiser

3.4.2. Noise-Aware Observation Consistency During Sampling

Hard Consistency (Noise Matched)

Soft Consistency (Noise-Aware Fusion)

3.4.3. Fast Sampling and Uncertainty Estimation

3.5. Implementation Details

4. Experiments and Results

4.1. Experimental Setup

4.2. Performance Under Random Re-Masking

4.3. Station Outage Extrapolation: Leave-One-Out

4.4. Uncertainty Analysis and Interpretation of Grid-Level Reconstructions

4.5. Robustness to Extreme Sparsity

4.6. Efficiency and Noise-Aware Consistency

4.6.1. Sampling Efficiency with Fast Solvers

4.6.2. Robustness to Sensor Noise

4.6.3. Sensitivity to the Observation-Noise Parameter σ y

4.7. Interpreting the Learned Spatial Receptive Field

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

4.6.3. Sensitivity to the Observation-Noise Parameter $σ_{y}$