AI-Driven Particulate Matter Forecasting and Spatial Estimation in the CityAirQ Urban Monitoring Network

Gasan, Carol-Luca; Tudose, Dan; Ruse, Laura

doi:10.3390/su18125985


by Carol-Luca Gasan, Dan Tudose and Laura Ruse *

Faculty of Automatic Control and Computers, National University of Science and Technology POLITEHNICA Bucharest, 060042 Bucharest, Romania

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(12), 5985; https://doi.org/10.3390/su18125985

Submission received: 1 May 2026 / Revised: 4 June 2026 / Accepted: 8 June 2026 / Published: 11 June 2026

(This article belongs to the Section Air, Climate Change and Sustainability)


Abstract

Urban air-quality monitoring networks are often sparse, leaving coverage gaps where particulate matter (PM) concentrations cannot be directly observed. This paper extends the CityAirQ pollution tracking platform and its mobile air-quality device prototype by introducing an AI-based benchmark for two Bucharest station networks across three deployment-oriented tasks: multi-station temporal forecasting (Task A), leave-one-station-out same-day spatial estimation (Task B), and a preliminary mobile-site prediction pilot at an uncalibrated location (Task C). The benchmark compares machine-learning models, including ensemble tree methods, recurrent neural networks, and lightweight graph-inspired architectures, evaluated under a unified time-aware rolling protocol. In Task A, the proposed Advanced Stage 0–3 pipeline achieves the best overall MAE (7.12 μg/m³), a 4.7% reduction relative to Random Forest (7.47 μg/m³), while the Seasonal naïve (10.41 μg/m³), Persistence (11.51 μg/m³), neural, and graph-inspired references perform worse under recursive forecasting. In Task B, the neighbour-only Random Forest reaches a mean $R^{2}$ of 0.873 on the classic four-station network and a median $R^{2}$ of 0.734 on the ten-station city-scale extension. Task C is reported as an exploratory six-day prediction pilot, not as deployment-grade validation: no co-located EPA FRM/FEM or equivalent reference monitor was available at the mobile location ℓ. The historical-transfer Random Forest retained a sample-limited positive PM_2.5 association with the raw mobile readings ($r = 0.432$, $n = 6$), while a strict one-day-ahead online persistence predictor reduced PM_2.5 MAE from 40.58 to 20.00 μg/m³ on the five forecastable mobile days. Ultimately, accurate PM monitoring empowers sustainable urban planning, helping to mitigate exposure risks and supporting long-term public health and environmental sustainability initiatives.

Keywords: particulate matter forecasting and spatial estimation; sparse urban monitoring; spatiotemporal machine learning; low-cost sensor calibration; spatial transfer; sustainability; sustainable urban planning

1. Introduction

Air pollution is a major public health concern; particulate matter (PM) concentrations are associated with adverse respiratory and cardiovascular outcomes [1,2]. Forecasting systems can support municipal mitigation actions, early-warning services, and individual exposure-aware decisions [3]. Despite recent progress in machine-learning-based time-series modelling, evaluations in data-scarce urban settings—where station networks are small and observations may be sparse—remain relatively underexplored.

This paper builds upon CityAirQ, a pollution tracking system for urban environments previously introduced by Dinica et al. [4]. The original CityAirQ platform provides an end-to-end IoT infrastructure, including custom portable sensing devices, a mobile application, and a cloud data pipeline, for real-time PM monitoring in metropolitan areas. A related IEEE conference paper documents the low-cost wearable mobile monitoring device used for personal exposure mapping and city-scale data collection [5]. The present work extends that foundation by developing and benchmarking AI-based models that operate over the sparse station networks typical of CityAirQ deployments. This paper reserves the term forecasting for future-time prediction and uses spatial estimation or reconstruction when current-day neighbouring station measurements are used to estimate PM concentrations at unobserved locations.

The models evaluated in this study are drawn from machine learning and deep learning. They include ensemble methods such as Random Forests, which combine many decision trees to produce robust predictions; recurrent neural networks (RNNs), including long short-term memory (LSTM) networks, which learn temporal patterns in sequential data; and graph neural networks (GNNs), which model spatial relationships between monitoring stations. Classical statistical approaches (linear regression) and a kernel-based method (support vector regression) round out the comparison. Together, these models learn patterns from historical data and use them to estimate future or spatially unmeasured PM concentrations.

A deployment-oriented benchmark of these models for sparse PM forecasting and spatial estimation in Bucharest is presented. The evaluation is structured around three scenarios reflecting realistic deployment conditions: (i) multi-station temporal forecasting at existing instrumented sites, (ii) same-day spatial reconstruction at a held-out station using neighbour readings, and (iii) a six-calendar-day exploratory field pilot with a low-cost mobile sensor at an uncalibrated location. Across all three settings, a unified time-aware evaluation protocol is employed to ensure that results are comparable and internally consistent.

Contributions

The main contributions of this paper are as follows. First, the paper proposes an advanced residual spatiotemporal pipeline (Stages 0–3) for forecasting and spatial estimation in sparse station networks. The pipeline combines engineered transport and seasonal features, PCA latent temporal factors, and a spatial interpolation meta-learner, and is evaluated on both Task A (temporal forecasting) and Task B (spatial estimation). Second, three evaluation tasks are formalised corresponding to temporal forecasting, same-day spatial estimation, and unseen-location deployment via mobile sensing, and a unified time-aware rolling protocol is applied across all of them. Third, a multivariate LSTM is compared against classical regressors, two lightweight graph-inspired reference implementations (DCRNN-style and STGCN-style), and two simple reference baselines (Seasonal naïve and Persistence) under a rolling-window protocol for 30-day recursive forecasting. Fourth, leave-one-station-out spatial estimation is evaluated on a four-station reference network and extended to a Sensor.Community city-scale network comprising the ten stations with the highest PM coverage using a fully split-aware imputation pipeline. Finally, a preliminary low-cost mobile-site prediction pilot is presented together with explicit QA/QC limitations, including the absence of a co-located EPA FRM/FEM or equivalent reference monitor at the deployment site and the need for formal calibration before deployment-grade claims.

2. State of the Art

2.1. Background: AI Models for Forecasting and Spatial Estimation

This section provides a concise overview of the AI model families used in this paper, intended to make the paper accessible to readers who may not be familiar with specific architectures.

2.1.1. Ensemble Tree Methods

Random Forest [6] is an ensemble learning method that builds a large number of decision trees during training and aggregates their outputs. Each tree is trained on a random bootstrap sample of the data and uses a random subset of features at each split, which reduces variance and improves generalisation. Random Forests are robust to overfitting, require minimal hyperparameter tuning, and handle non-linear relationships naturally, making them strong baselines in low-data regimes.

HistGradientBoosting [7] (scikit-learn’s histogram-based gradient boosting, inspired by LightGBM [8]) is a gradient boosting variant that builds trees sequentially, where each tree corrects the residual errors of the previous ensemble. Binning continuous features into histograms substantially reduces memory usage and training time, making it well-suited to the multi-output, feature-rich setting of the proposed advanced pipeline.

2.1.2. Classical and Neural Models

Linear Regression (LR) fits a linear mapping from input features to the target variable by minimising squared residuals. Despite its simplicity, LR captures linear auto-correlations effectively and serves as a strong and interpretable baseline.

Support Vector Regression (SVR) [9] extends the support vector machine to regression by finding a function that lies within an ε-tube of the training targets. An RBF kernel implicitly maps inputs to a high-dimensional space, allowing SVR to capture non-linear patterns while controlling model complexity via regularisation.

Long Short-Term Memory (LSTM) [10,11] is a recurrent neural network architecture designed to model sequential data. LSTM cells use gating mechanisms (input, forget, and output gates) to selectively retain or discard information over long time horizons, mitigating the vanishing-gradient problem that affects standard RNNs. In this paper, a multi-layer LSTM takes the full multi-station PM tensor as input and predicts next-day values at all stations simultaneously.

2.1.3. Graph Neural Network Models

Graph Neural Networks (GNNs) extend neural networks to irregular graph-structured data, where nodes represent entities (e.g., monitoring stations) and edges encode relationships (e.g., spatial proximity or wind-driven transport). In spatiotemporal settings, GNNs propagate information across the graph at each time step, allowing the model to fuse temporal patterns with spatial context.

Diffusion Convolutional Recurrent Neural Networks (DCRNNs) [12] model spatial dependencies as a diffusion process on a directed graph. Random-walk diffusion operators propagate node features before they are processed by a gated recurrent encoder–decoder. The lightweight DCRNN-style approximation used in this paper adopts the same diffusion-mixing idea with a GRU backbone.

Spatiotemporal Graph Convolutional Networks (STGCNs) [13] interleave graph convolution layers (which aggregate neighbourhood features on the graph) with temporal 1D convolution layers (which capture local time-domain patterns). The lightweight STGCN-style implementation used here follows this structure with a simplified single graph-convolution mixing step.

2.2. Related Work

Air-quality forecasting is commonly framed as a multivariate time-series prediction problem with strong spatiotemporal dependence across monitoring locations. Classical statistical approaches such as ARIMA [14] provide interpretable baselines but often struggle with non-linear dynamics, regime shifts, and heterogeneous station behaviour. Recurrent neural networks, and LSTM models in particular, were introduced to mitigate vanishing gradients and to capture longer temporal dependencies via gating mechanisms [10].

Recent work increasingly models station networks as graphs and applies spatiotemporal graph neural networks (STGNNs) to couple temporal dynamics with spatial information. Representative architectures include diffusion convolutional recurrent networks (DCRNNs), which propagate information on directed graphs via random-walk diffusion within an encoder–decoder sequence model [12], and spatiotemporal graph convolutional networks (STGCNs), which interleave temporal convolutions with graph convolutions to capture local spatial correlation [13]. Although originally developed for traffic-sensor forecasting, these architectures transfer directly to air-quality station networks given the shared “sensors on a graph” structure.

For air-quality prediction specifically, recent studies report consistent gains from graph-based models and attention mechanisms. Hybrid graph deep networks have been applied to short-horizon PM_2.5 forecasting by aggregating neighbourhood spatiotemporal information [15]. Dynamic geographical graph neural networks construct directional graphs from wind fields and combine them with static distance-based graphs to improve interpretability and accuracy in PM_2.5 forecasting [16]. Multi-scale dynamic GNN approaches employing learnable station clustering and multi-head attention have also demonstrated strong performance in sparse regional networks [17]. Beyond GNNs, transformer-style attention is increasingly used to model long-range temporal interactions and to fuse heterogeneous covariates; measurable error reductions have been reported when augmenting operational air-quality forecasters with attention blocks [18].

An important but less-studied challenge is distribution shift: GNN-based models have been shown to suffer disproportionately when training and test periods exhibit different statistical regimes, often performing worse than simpler non-graph baselines under temporal distribution shift [19]. This finding reinforces the motivation for including classical regressors as strong reference points in the evaluation.

The current state of the art for station-network air-quality forecasting therefore combines (i) explicit graph structure (static, learned, or meteorology-driven dynamic adjacency), (ii) expressive temporal backbones (dilated convolutions, gated recurrent units, or transformers), and (iii) covariate conditioning on weather, traffic, and calendar effects. In contrast, this paper focuses on PM-only baselines and transfer settings to isolate the contribution of station-to-station information and feature engineering, and positions the experiments as a lightweight alternative to STGNN-based forecasters in data-limited scenarios.

Recent PM forecasting studies reinforce this positioning. LSTM variants have been extended with spatial and auxiliary inputs for air-pollutant concentration prediction [20,21], while GC–LSTM, graph-based LSTM, domain-knowledge-enhanced GNNs, dynamic-wind graph convolution, graph-reinforcement ensembles, graph transformers, adaptive scalable STGCN, and GCN–E-LSTM hybrids have been proposed to capture multi-station spatiotemporal dependence [22,23,24,25,26,27,28,29]. These studies motivate graph-aware comparisons, but they also highlight that full state-of-the-art STGNN implementations require richer covariates and larger station networks than the sparse CityAirQ setting considered here.

Despite the surge of deep learning approaches, tree-based ensemble methods remain highly competitive for PM forecasting when training data are limited. Random Forests and gradient boosting models have been shown to match or exceed neural alternatives in small-network and low-data regimes [30,31], largely because their inductive bias towards piecewise-constant functions is well-suited to the non-linear but low-sample-count setting found in sparse urban networks. A comprehensive review of deep learning architectures for PM_2.5 concentration prediction confirms that hybrid methods combining feature engineering with neural components tend to outperform single-architecture approaches, particularly when training data are scarce [32]. This motivates the inclusion of tree-based ensembles as primary baselines rather than as afterthoughts.

Low-cost sensor calibration is a complementary and increasingly important problem. The monitoring literature emphasises that low-cost sensors can increase spatial density but require careful field calibration, reference co-location, and QA/QC before quantitative regulatory interpretation [33,34,35,36,37,38,39]. PMSA-class and similar optical particle counters are subject to humidity interference, particle composition bias, and drift over time [40,41]; machine-learning calibration against co-located reference monitors has been shown to substantially reduce these errors, though calibration transfer across environments remains an open challenge [42,43,44,45,46,47]. The mobile hardware considered here follows a low-cost wearable sensing design previously reported for CityAirQ-style deployments [5]. Device-level environmental compensation and internal sensor cross-checks can improve operational stability, but they are not equivalent to field calibration against a regulatory-grade PM reference monitor. Task C in this paper therefore addresses only an exploratory uncalibrated prediction setting and its results are explicitly framed as indicative rather than definitive.

3. Materials and Methods

3.1. Problem Formulation and Tasks

Let $S = \{s_{1}, \dots, s_{N}\}$ denote a set of fixed monitoring stations, with $N = 4$ for the classic network and $N = 10$ for the city extension. For each station s, the observed data form a daily multivariate time series

$$x_{t}^{(s)} \in \mathbb{R}^{3}, \qquad x_{t}^{(s)} = \left[ x_{t,\mathrm{PM}_{1}}^{(s)},\; x_{t,\mathrm{PM}_{2.5}}^{(s)},\; x_{t,\mathrm{PM}_{10}}^{(s)} \right]. \tag{1}$$

Given a historical window of length L days, the goal is to predict future pollutant levels for a horizon of H days. The study defines three tasks reflecting distinct deployment conditions.

3.2. Task A: Multi-Station Temporal Forecasting

Given the historical observations from the full station network, predict future values at all stations jointly:

$$f_{\theta} : (X_{t-L+1}, \dots, X_{t}) \mapsto (\hat{X}_{t+1}, \dots, \hat{X}_{t+H}), \tag{2}$$

where $X_{t} \in \mathbb{R}^{N \times 3}$ is the full network observation at day t. All compared models use the flattened $N \times 3$ network tensor as their input; no model is restricted to a single station’s history.

3.3. Task B: Leave-One-Station-Out Spatial Estimation

Train using all stations except a held-out target $s^{\star}$, then estimate the PM values at $s^{\star}$ for each day in the test window using same-day neighbour readings and short-lag engineered features. Because the feature set includes current-day neighbour measurements, this task is same-day spatial estimation (spatial reconstruction/nowcasting) rather than sequential forecasting. It probes whether a model can generalise across urban locations when the target station has no training data, while requiring that neighbour measurements are available on the estimation day. Task B is therefore a spatiotemporal same-day estimation problem: temporal information enters through lag features, while the target-day estimate relies on neighbouring stations rather than a future target measurement.

3.4. Task C: Mobile Sensing Field Deployment

Deploy a mobile sensor at a new location ℓ not represented in the training set and predict daily PM levels at that location. In this task, no co-located reference monitor is available at ℓ; consequently, the experiment is treated as a preliminary prediction and QA/QC pilot rather than as a deployment-scale validation. The setup assumes that the historical daily data of the fixed stations up to and including the day prior to deployment are available. After the first mobile daily median has been observed, a strictly one-day-ahead online predictor may also use the previous mobile daily median to forecast the next day. The mobile sensor records continuously for six calendar days at an uncovered location under highly variable weather conditions, providing a reference-free time series for exploratory prediction analysis. Accordingly, the mobile series is reported only as raw pilot evidence and is not used to claim calibrated ambient PM concentrations.

3.5. Operational Setup

The following fixed parameters are used throughout all tasks. Fixed-station observations are represented as daily medians. Task C mobile readings are recorded at approximately five-minute resolution and aggregated to daily medians for model comparison. For Task A, the input window is $L = 10$ days and the recursive horizon is $H = 30$ days, with a multi-output vector $(\mathrm{PM}_{1}, \mathrm{PM}_{2.5}, \mathrm{PM}_{10})$ predicted at each step for all N stations. For Task B, a fixed chronological hold-out of the most recent 180 days is used, with current-day neighbour values permitted as features in the main same-day reconstruction setting and excluded in the lagged-only sensitivity analysis.

3.6. Data

3.6.1. Classic Four-Station Network (AQICN-Derived)

A four-station network in Bucharest ($N = 4$, stations S1–S4) derived from publicly available AQICN air-quality time series [48] is used. For each station, daily medians are computed for PM₁, PM_2.5, and PM₁₀ and aligned by date to form a continuous multivariate sequence. Station locations are listed in Table 1.

3.6.2. Sensor.Community City Network

To study city-scale heterogeneity and transfer, a Bucharest-wide dataset is constructed from public Sensor.Community [49] measurements within a broad geographic bounding box. The snapshot spans 18 months (2024–2025) and yields 15 outdoor stations, all reporting PM_2.5 and PM₁₀; PM₁ is available only for the station equipped with an SPS30 sensor (SC89277). For transfer experiments, the ten stations with the highest joint PM_2.5/PM₁₀ day coverage are selected to reduce instability from sparse histories ($N = 10$).

3.6.3. Mobile Sensing Dataset

Task C evaluates a six-day outdoor deployment (24–29 April 2026) of a CityAirQ mobile sensor derived from the previously published mobile monitoring-device architecture for low-cost wearable air-quality measurement [5]. The sensor was placed at an outdoor location ℓ (44.3898° N, 26.1176° E) not covered by either the classic four-station network or the Sensor.Community network. The deployment window spans six calendar dates, comprising four complete measurement days and two partial boundary days (the first and last). The deployment period was specifically chosen to include a weekend and highly variable weather conditions, ensuring that the dataset captures a realistic and dynamic range of atmospheric states rather than an artificially constant baseline. The sensor records continuously, producing a reference-free time series of PM₁, PM_2.5, and PM₁₀ at the unseen site. Table 2 summarises the deployment parameters.

3.6.4. Quality Assurance and Calibration Status

The fixed-station datasets are public observational time series and are used after timestamp parsing, daily aggregation, chronological splitting, and split-aware imputation. For the CityAirQ mobile sensor, invalid sentinel values (−1.0) are removed before aggregation, timestamps are converted to Bucharest local time, and daily medians are computed from approximately five-minute readings. The mobile deployment at ℓ was not co-located with an EPA Federal Reference Method (FRM), Federal Equivalent Method (FEM), or equivalent regulatory-grade PM instrument. Therefore, Task C should be interpreted as an exploratory prediction and data-quality pilot rather than as a reference-grade validation study. Device-level environmental compensation and the short BMV080/PMSA008 paired comparison are used only as internal low-cost sensor QA/QC checks; they do not replace a field co-location against a reference monitor under the same environmental conditions as the deployment site. Consequently, the raw mobile measurements are not treated as calibrated reference observations and are used only to assess exploratory prediction behaviour and data-quality limitations.
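The mobile-sensor cleaning steps described above can be illustrated with a minimal sketch. It assumes readings arrive as `(timestamp, value)` pairs whose timestamps have already been converted to Bucharest local time; the function name `daily_medians` and the toy readings are hypothetical and are not taken from the CityAirQ code base or data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

SENTINEL = -1.0  # invalid-reading marker mentioned in the text


def daily_medians(readings):
    """Drop sentinel readings and return {date: median} per calendar day.

    `readings` is an iterable of (local_timestamp, value) pairs; timestamps
    are assumed to be in Bucharest local time already.
    """
    by_day = defaultdict(list)
    for ts, value in readings:
        if value == SENTINEL:
            continue  # invalid reading: excluded before aggregation
        by_day[ts.date()].append(value)
    return {day: median(values) for day, values in sorted(by_day.items())}


# Toy five-minute readings (illustrative values only).
readings = [
    (datetime(2026, 4, 24, 23, 50), 12.0),
    (datetime(2026, 4, 24, 23, 55), -1.0),
    (datetime(2026, 4, 25, 0, 0), 20.0),
    (datetime(2026, 4, 25, 0, 5), 30.0),
    (datetime(2026, 4, 25, 0, 10), 40.0),
]
result = daily_medians(readings)
```

Removing the sentinel before taking the median matters most on partial boundary days, where a few invalid readings could otherwise shift the daily value.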

For the classic network, missing days are filled with temporal gap-aware imputation followed by forward/backward fill. For the city top-10 experiment, a complete daily grid is enforced using a hybrid imputation pipeline: short-gap temporal smoothing, seasonal day-of-year/month medians per station, same-day cross-station pollutant means, and final forward/backward fill. This guarantees that each selected station has PM_2.5 and PM₁₀ values on every day of the experiment window.

For Task A models, MinMax scaling is fit on the training portion of each rolling subset and applied to the corresponding test slice. The baseline inputs use only the three PM variables. Tree-based regressors are largely scale-insensitive; MinMax scaling primarily affects the neural baselines. An explicit MinMax vs. Standard/Robust scaling ablation is left as follow-up work.
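As a small illustration of fitting the scaler on the training slice only, the sketch below uses plain Python rather than any particular library. The helper names `fit_minmax` and `apply_minmax` and the toy values are assumptions made for this example.

```python
def fit_minmax(train_values):
    """Return the (min, max) pair learned from the training slice only."""
    return min(train_values), max(train_values)


def apply_minmax(values, lo, hi):
    """Scale values with training-slice bounds; a constant slice maps to 0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


train = [10.0, 20.0, 30.0]
test = [25.0, 40.0]  # 40 lies outside the training range

lo, hi = fit_minmax(train)           # bounds never see the test slice
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)
```

Because the bounds come from the training slice, test values may fall outside [0, 1]; that is expected and avoids leaking test-period extremes into the scaler.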

3.6.5. Split-Aware Imputation

In all reported experiments, imputation statistics (seasonal medians, cross-station means) are fitted exclusively on the training fold of each split and applied to the held-out portion without any look-ahead. Station selection (top-10 by coverage) was determined prior to any split construction, using only the full-period day counts as a coverage criterion, which does not depend on the PM values themselves. In the leave-one-station-out experiments (Task B), the target station’s own values are excluded from same-day cross-station imputation within the hold-out window, and imputation statistics for the target station are fitted only on the pre-test history. Forward fill is strictly causal by construction; backward fill, applied only to short initial gaps at the start of a station’s history, represents a minor and unavoidable boundary artefact. This split-aware handling follows standard missing-data practice: imputation model parameters must be estimated on the available training information rather than on held-out responses [50,51].
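The split-aware principle can be sketched for one component of the hybrid pipeline, the seasonal (monthly) median. This is a simplified illustration under stated assumptions: rows are `(month, value)` pairs for a single station and pollutant, `None` marks a missing day, and the function names are hypothetical.

```python
from statistics import median


def fit_monthly_medians(train):
    """Learn one median per calendar month from training rows only.

    `train` is a list of (month, value) pairs; value may be None (missing).
    """
    by_month = {}
    for month, value in train:
        if value is not None:
            by_month.setdefault(month, []).append(value)
    return {m: median(v) for m, v in by_month.items()}


def impute(rows, monthly_medians):
    """Fill missing values using statistics fitted on the training fold."""
    return [
        (month, monthly_medians.get(month) if value is None else value)
        for month, value in rows
    ]


train = [(1, 30.0), (1, 50.0), (1, None), (2, 20.0)]
test = [(1, None), (2, 25.0), (2, None)]

stats = fit_monthly_medians(train)  # test rows never enter the fit
train_filled = impute(train, stats)
test_filled = impute(test, stats)
```

The observed test value of 25.0 for month 2 has no influence on the fill value used for the missing month-2 test day, which is what rules out look-ahead.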

Table 3 reports a linear vs. hybrid ablation under this split-aware protocol, confirming that the hybrid variant materially outperforms linear imputation (mean $R^{2}$ 0.687 vs. 0.520; median $R^{2}$ 0.729 vs. 0.629), with no inflated advantage from test-period look-ahead.

The hybrid variant is retained because it materially improves transfer stability on this dataset. All main results reported in Section 4.4, Section 4.5 and Section 4.6 use the split-aware hybrid imputation.

3.7. Models and Feature Construction

This section describes all models compared in the paper. It first introduces the proposed candidate model, then the Task A baselines (LSTM, classical regressors, simple reference baselines, and graph-inspired implementations), and finally the spatial estimation setup used for Task B. Full architectural and hyperparameter details for the Advanced Stage 0–3 pipeline are given in Section 3.11.

3.8. Proposed Candidate Model

The model proposed as a candidate for practical deployment is the Advanced Stage 0–3 pipeline. Its prediction core is a multi-output HistGradientBoosting residual regressor augmented with inverse-distance transport features, seasonal and statistical feature engineering, PCA latent temporal factors, and a RidgeCV-based spatial interpolation meta-learner. All other Task A models (LSTM, LR, RF, SVR, DCRNN-style, STGCN-style, Seasonal naïve, and Persistence) serve as reference baselines, and the advanced pipeline is also evaluated on Task A (see Section 4.2) to provide a complete benchmark. Architecture and feature-engineering details are deferred to Section 3.11, after the baseline models have been described.

3.9. Task A: Multi-Station Temporal Forecasting Models

3.9.1. Sequence Construction

Supervised examples are constructed using a sliding window of length $L = 10$ days. For each day t, the model input is the full network tensor of past PM values, with shape $(L, N \times 3) = (10, 12)$ for the four-station network, and the target is the next-day multi-station vector $X_{t+1} \in \mathbb{R}^{N \times 3}$. To produce a 30-day forecast, recursive (iterative) prediction is applied: each predicted day is appended to the input window and used to predict the next step, repeated for $H = 30$ steps. Within each rolling subset, MinMax scaling is fit on the training slice and applied to the test slice.
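The windowing and recursive roll-out described above can be sketched as follows. The one-step model `mean_of_window` is only a placeholder standing in for any fitted regressor, and the toy series is synthetic; both are assumptions for illustration, not part of the benchmark.

```python
def make_windows(series, L=10):
    """Build (window, next_day) pairs from a list of daily network vectors."""
    return [(series[t - L:t], series[t]) for t in range(L, len(series))]


def recursive_forecast(predict_next, window, H=30):
    """Roll a one-step predictor forward H days, feeding predictions back in."""
    window = list(window)
    forecasts = []
    for _ in range(H):
        y_hat = predict_next(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop oldest day, append the prediction
    return forecasts


def mean_of_window(window):
    """Placeholder one-step model: per-variable mean over the window."""
    n_vars = len(window[0])
    return [sum(day[k] for day in window) / len(window) for k in range(n_vars)]


# Synthetic network series: 40 days, 12 variables (4 stations x 3 pollutants).
series = [[float(t + k) for k in range(12)] for t in range(40)]
pairs = make_windows(series, L=10)
forecast = recursive_forecast(mean_of_window, series[-10:], H=30)
```

After ten steps the window consists entirely of predicted days, which is why errors can compound over the 30-day horizon under this protocol.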

3.9.2. LSTM Forecaster

The neural baseline is a multivariate LSTM designed for daily PM prediction on the full four-station tensor. The network receives flattened sequences of shape $(10, 12)$ (4 stations × 3 pollutants) and processes them through two LSTM layers with 192 hidden units each, with dropout regularisation (rate 0.1) applied between layers. A dense output layer with 12 units produces station-by-pollutant predictions. Training proceeds for up to 80 epochs with batch size 32, a 0.2 validation split, and Adam (learning rate $5 \times 10^{-4}$) minimising MSE; MAE is tracked as a secondary metric.

3.9.3. Classical Baselines

Three classical predictors are implemented under the same windowed input regime (full network tensor as input). Linear regression (LR) applies ordinary least squares on flattened windows. The Random Forest regressor (RF) uses 300 trees (n_estimators=300, random_state=42). Support vector regression (SVR) employs an RBF kernel with per-pollutant models (C=1, epsilon=0.01). All three are applied recursively to generate multi-step forecasts.

3.9.4. Simple Reference Baselines

To anchor the evaluation against methods that require no fitted model, two reference baselines are included. The Persistence baseline forecasts every future step as the last observed value in the input window, i.e., $\hat{x}_{t+h} = x_{t}$ for all $h \in \{1, \dots, 30\}$. The Seasonal naïve baseline forecasts each future step using the historical day-of-year median, computed on the training portion of each rolling subset. Models that do not clearly improve over these baselines should be treated as ineffective for the task.
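Both reference baselines are simple enough to sketch in a few lines. The example below handles a single variable; the `fallback` argument for days of the year that never occur in the training portion is an assumption added here, since the text does not say how such days are handled.

```python
from statistics import median


def persistence_forecast(window, H=30):
    """Repeat the last observed value for every horizon step."""
    return [window[-1]] * H


def fit_doy_medians(train):
    """Median per day-of-year from (day_of_year, value) training pairs."""
    by_doy = {}
    for doy, value in train:
        by_doy.setdefault(doy, []).append(value)
    return {doy: median(values) for doy, values in by_doy.items()}


def seasonal_naive_forecast(doy_medians, future_doys, fallback):
    """Look up the training median for each future day-of-year.

    `fallback` (an assumption of this sketch) is used for unseen days.
    """
    return [doy_medians.get(doy, fallback) for doy in future_doys]


persist = persistence_forecast([5.0, 7.0, 9.0], H=3)
doy_medians = fit_doy_medians([(100, 10.0), (100, 14.0), (101, 20.0)])
seasonal = seasonal_naive_forecast(doy_medians, [100, 101, 102], fallback=15.0)
```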

3.9.5. Lightweight Graph-Inspired Reference Implementations

To compare with the architectures discussed in Section 2.2, two simplified graph-based implementations are included for the four-station network. These are lightweight approximations designed to probe the value of graph-based information flow in a small four-station setting; they are not faithful reproductions of the published DCRNN or STGCN architectures and should not be interpreted as such.

The DCRNN-style variant constructs a static adjacency matrix A from inverse Haversine distances between stations; diffusion mixing applies A and $A^{2}$ to the input, followed by a two-layer GRU and a dense output head, trained with the same MSE objective, Adam optimiser, and batch size as the LSTM. The STGCN-style variant uses the same adjacency for a single graph-convolution mixing step, followed by temporal 1D convolutions and a dense output head. Both implementations use the same rolling protocol ($L = 10$, $H = 30$, 5 splits) and recursive forecasting. Comparisons against these implementations reflect the value of lightweight graph mixing in this small-network regime, not performance relative to the state-of-the-art models from which they draw inspiration.
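A rough sketch of the adjacency construction and the diffusion-mixing step is given below. Several details are assumptions, because the text does not specify them: the adjacency is row-normalised with a zero diagonal, the one-hop and two-hop signals are returned separately rather than combined, and the station coordinates are hypothetical placeholders rather than the locations in Table 1.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def inverse_distance_adjacency(coords):
    """Row-normalised inverse-distance adjacency with a zero diagonal."""
    n = len(coords)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i][j] = 1.0 / haversine_km(*coords[i], *coords[j])
        total = sum(A[i])
        A[i] = [a / total for a in A[i]]  # assumption: rows sum to one
    return A


def matmul(A, X):
    """Plain-Python matrix product A @ X."""
    return [[sum(A[i][k] * X[k][c] for k in range(len(X)))
             for c in range(len(X[0]))] for i in range(len(A))]


def diffusion_mix(A, X):
    """Return one-hop (A X) and two-hop (A^2 X) mixed node features."""
    one_hop = matmul(A, X)
    two_hop = matmul(A, one_hop)
    return one_hop, two_hop


# Hypothetical station coordinates (not the paper's actual sites).
coords = [(44.40, 26.10), (44.45, 26.05), (44.43, 26.15), (44.48, 26.12)]
A = inverse_distance_adjacency(coords)
X = [[10.0, 20.0, 30.0] for _ in range(4)]  # 4 stations x 3 pollutants
one_hop, two_hop = diffusion_mix(A, X)
```

With row normalisation, a field that is identical at every station passes through the mixing unchanged, which is a convenient sanity check on the adjacency.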

3.10. Task B: Spatial Estimation Feature Construction

Let $s^{\star}$ be the held-out station and $S \setminus \{s^{\star}\}$ the neighbour set. For each day t, a feature vector is constructed from the neighbour stations comprising five groups: (i) current-day neighbour values of all PM variables (same-day readings, which makes the task spatial estimation rather than forecasting); (ii) long-tail lags for each neighbour variable at offsets of 1, 2, 3, 5, 7, 10, 14, and 21 days; (iii) short rolling statistics over a 10-day window (mean, standard deviation, minimum, maximum), shifted by one day to preserve causality; (iv) neighbour aggregates per pollutant (mean, minimum, maximum, standard deviation across all neighbours); and (v) seasonal signals in the form of sine and cosine of day-of-year to capture annual periodicity.

A Random Forest regressor is used for estimation given its ability to model non-linear interactions with limited tuning (n_estimators=500, max_depth=24, min_samples_leaf=1, random_state=42). For each held-out station and pollutant, the model is fit on the pre-test history and evaluated on the most recent 180 days.
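The five feature groups can be sketched for a single pollutant and a single day as shown below. This is an illustrative reading of the description, not the authors' implementation: the feature names, the use of population standard deviation, and the 365.25-day period are assumptions, and the neighbour series are synthetic.

```python
import math
from statistics import mean, pstdev

LAGS = (1, 2, 3, 5, 7, 10, 14, 21)


def task_b_features(neighbours, t, day_of_year, window=10):
    """Feature dict for day index t from neighbour series of one pollutant.

    `neighbours` maps station id -> list of daily values; t must be >= 21 so
    that the longest lag exists.
    """
    feats = {}
    today = []
    for sid, series in neighbours.items():
        feats[f"{sid}_now"] = series[t]            # (i) same-day reading
        today.append(series[t])
        for lag in LAGS:                           # (ii) long-tail lags
            feats[f"{sid}_lag{lag}"] = series[t - lag]
        past = series[t - window:t]                # (iii) window excludes day t
        feats[f"{sid}_roll_mean"] = mean(past)
        feats[f"{sid}_roll_std"] = pstdev(past)
        feats[f"{sid}_roll_min"] = min(past)
        feats[f"{sid}_roll_max"] = max(past)
    feats["nb_mean"] = mean(today)                 # (iv) neighbour aggregates
    feats["nb_min"] = min(today)
    feats["nb_max"] = max(today)
    feats["nb_std"] = pstdev(today)
    angle = 2 * math.pi * day_of_year / 365.25     # (v) annual cycle
    feats["doy_sin"] = math.sin(angle)
    feats["doy_cos"] = math.cos(angle)
    return feats


# Synthetic neighbour series for three stations over 30 days.
neighbours = {
    "S1": [float(d) for d in range(30)],
    "S2": [2.0 * d for d in range(30)],
    "S3": [10.0] * 30,
}
feats = task_b_features(neighbours, t=25, day_of_year=120)
```

Note that the held-out station never appears in `neighbours`; only its target values would be paired with these features when fitting the regressor.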

3.11. Advanced Pipeline Architecture (Stages 0–3)

This section provides full architectural and hyperparameter details for the proposed candidate model: the Advanced Stage 0–3 pipeline. Results for this pipeline appear in Section 4.1 (Task A temporal forecasting), Section 4.4 and Section 4.5 (spatial estimation), and Section 4.6 (mobile-site prediction pathway for Task C).

3.12. Experiment Stages

The pipeline is organised into four progressive stages, each adding capabilities on top of the previous one. Table 4 gives a compact overview and the subsections that follow describe each component in detail.

3.13. Inverse-Distance Transport Features

To model pollutant transport between stations, inverse-distance weighted signals are computed using Haversine distances. For a directed pair i → j and exponent α ∈ {0.5, 1.0, 1.5}, the weight is

w_{i \to j} = \frac{1}{d(i, j)^{α} + ε}, (3)

where d(i, j) is the Haversine distance between stations i and j and ε is a small positive constant that keeps the weight finite.

Using one-day-lagged neighbour values (to preserve strict causality), two “received-from-neighbour” features per pollutant p are constructed:

\mathrm{received\_sum}_{j, p, α}(t) = \sum_{i \neq j} w_{i \to j} \cdot y_{i, p}(t - 1), (4)

\mathrm{received\_max}_{j, p, α}(t) = \max_{i \neq j} \left( w_{i \to j} \cdot y_{i, p}(t - 1) \right). (5)
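
Equations (3)–(5) can be sketched in a few lines. The example below is illustrative only: the helper names, the mean Earth radius of 6371 km, and the value of ε (`eps=1e-6`) are assumptions, since the text does not state them.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def transport_features(coords, lagged, j, alpha=1.0, eps=1e-6):
    """Return (received_sum, received_max) for target station j.

    coords: dict station -> (lat, lon)
    lagged: dict station -> y_i(t-1), the one-day-lagged value of one pollutant
    """
    contributions = []
    for i, (lat, lon) in coords.items():
        if i == j:
            continue
        d = haversine_km(lat, lon, *coords[j])
        w = 1.0 / (d ** alpha + eps)          # Equation (3)
        contributions.append(w * lagged[i])
    # Equations (4) and (5)
    return sum(contributions), max(contributions)
```

Calling `transport_features` once per exponent in {0.5, 1.0, 1.5} yields the per-α feature pairs described above.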

3.14. Residual Modelling and Latent Temporal Factors

The full feature stack includes: dense lags (1–14 days), Fibonacci lags (1, 2, 3, 5, 8, 13), rolling statistics over windows of 3/7/14/30 days (mean, median, standard deviation, maximum, minimum), exponentially weighted mean (EWM) features with half-lives of 7/14/30 days, global station aggregates, pairwise station differences, and cyclic time encodings (day-of-year and day-of-week sine/cosine, plus a linear trend). A 90-day lagged rolling mean b(t) serves as the seasonal baseline; all models in Stages 0–3 predict the residual r(t) = y(t) − b(t). Figure 1 and Figure 2 summarise the advanced pipeline and transport-feature construction.

Stage 0 applies a log1p residual transform (with an additive offset to handle negative residuals). Stage 3 switches to a signed-power transform with exponent 1.15 and applies volatility-aware sample weights:

w(t) = 1 + 2 \cdot \mathrm{rank}\left( \max_{p} y_{p}(t) \right) + 3 \cdot \mathrm{rank}\left( \max_{p} \left| y_{p}(t) - b_{p}(t) \right| \right). (6)
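
A hedged sketch of the Stage 3 transform and of Equation (6) follows. Two points are assumptions, because the text does not define them: the signed-power transform is taken to be sign(r)·|r|^γ with γ = 1.15, and rank(·) is taken to be a fractional rank in (0, 1] with ties sharing their average rank. All function names are hypothetical.

```python
def signed_power(r, gamma=1.15):
    """Assumed form of the transform: sign(r) * |r|**gamma."""
    return (1 if r >= 0 else -1) * abs(r) ** gamma

def inverse_signed_power(z, gamma=1.15):
    """Inverse of signed_power, used to map predictions back to residuals."""
    return (1 if z >= 0 else -1) * abs(z) ** (1.0 / gamma)

def pct_rank(values):
    """Fractional rank in (0, 1]; tied values share their average rank."""
    n = len(values)
    out = []
    for v in values:
        lower = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        out.append((lower + (equal + 1) / 2) / n)
    return out

def sample_weights(y, b):
    """Equation (6) under the rank assumption above.

    y, b: lists of per-day dicts {pollutant: value} (observed, baseline).
    """
    level = [max(day.values()) for day in y]
    deviation = [max(abs(yd[p] - bd[p]) for p in yd) for yd, bd in zip(y, b)]
    return [1 + 2 * rl + 3 * rd
            for rl, rd in zip(pct_rank(level), pct_rank(deviation))]
```

Under this reading, weights range from just above 1 to 6, so days with both high concentrations and large departures from the seasonal baseline count most during fitting.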

The residual regressor is a multi-output HistGradientBoosting model with hyperparameters selected by 14 iterations of randomised search over a TimeSeriesSplit with 4 folds and a cross-validation test size of 120 days: loss=absolute_error, max_depth=10, max_leaf_nodes=95, learning_rate=0.06, min_samples_leaf=20, l2_regularization=1e-2, max_iter=300. The final model is evaluated on a 180-day hold-out window.

Latent temporal factors are derived by applying PCA to the lag-1 feature matrix and retaining four components (traffic_component_1–4). The explained variance ratios are approximately [0.67, 0.238, 0.069, 0.017], so the first two components capture around 91% of the lag-1 structure. Figure 3 shows the latent-factor construction step used to generate the traffic-component inputs.

3.15. Spatial Interpolation Meta-Learner (Stage 3)

The Stage 3 meta-learner builds inverse-distance weighted features of neighbour predictions together with their summary statistics (mean, minimum, maximum, quantiles), following the common geostatistical intuition that nearby observations carry stronger local information [52]. It then trains a RidgeCV model per station. Base and interpolated predictions are blended as

\hat{y}_{\mathrm{blend}} = β \cdot \hat{y}_{\mathrm{base}} + (1 - β) \cdot \hat{y}_{\mathrm{interp}}, (7)

where β is tuned on training MAE. Hyperparameter search uses TimeSeriesSplit (4 splits, cross-validation test size 120 days) and the same 180-day hold-out for final evaluation.
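
One simple way to tune β in Equation (7) is a grid search over [0, 1] that minimises training MAE. The sketch below assumes a step of 0.05 and the function name `tune_beta`; the text states only that β is tuned on training MAE, not how.

```python
def mae(y_true, y_pred):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def tune_beta(y_train, base, interp, grid=None):
    """Return the beta in `grid` that minimises training MAE of the blend."""
    grid = grid or [k / 20 for k in range(21)]   # 0.00, 0.05, ..., 1.00 (assumed)

    def blend(beta):
        # Equation (7): convex combination of base and interpolated predictions
        return [beta * yb + (1 - beta) * yi for yb, yi in zip(base, interp)]

    return min(grid, key=lambda beta: mae(y_train, blend(beta)))
```

For example, if the base predictions are uniformly 2 μg/m³ too high and the interpolated predictions uniformly 2 μg/m³ too low, the search returns β = 0.5, where the two offsets cancel.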

3.16. Evaluation Protocol and Metrics

All reported evaluations use chronological splits to avoid look-ahead. Task A uses five rolling 30-day forecast windows, whereas Tasks B and C use fixed hold-out evaluations matched to their spatial-estimation and mobile-deployment settings. The metrics below are reported in the original PM units whenever possible.

3.17. Rolling Window Evaluation (Task A)

A rolling testing strategy improves robustness by measuring performance across multiple recent intervals. For each station dataset, five temporal subsets are constructed. The k-th subset (k = 1, …, 5) includes all data up to day T − (k − 1) × 30, where T is the final available day. Within each subset, the model is trained on all days except the last 30, and tested on those final 30 days. Because successive subsets shift the test window back by 30 days each time, the five test windows cover non-overlapping 30-day periods; however, training sets across subsets share earlier data and are therefore correlated. The mean MAE across the five subsets is reported as the primary aggregate and is complemented with median and IQR dispersion statistics rather than significance tests.
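
The subset construction can be expressed as index ranges. This is a minimal sketch with zero-based day indices, so the final available day T corresponds to index `n_days - 1`; the function name `rolling_splits` is hypothetical.

```python
def rolling_splits(n_days, n_subsets=5, horizon=30):
    """Yield (train_range, test_range) for rolling subsets k = 1..n_subsets.

    Subset k contains days [0, end) with end = n_days - (k - 1) * horizon;
    its last `horizon` days form the test window and the rest is training.
    """
    for k in range(1, n_subsets + 1):
        end = n_days - (k - 1) * horizon
        test_start = end - horizon
        if test_start <= 0:
            raise ValueError("not enough history for this subset")
        yield range(0, test_start), range(test_start, end)
```

With 400 days of data, the five test windows are days 370–399, 340–369, 310–339, 280–309, and 250–279: disjoint test periods, with every training range a prefix of the one before it.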

3.18. Hold-Out Evaluation (Tasks B and C)

For Task B (leave-one-station-out estimation on both the classic and city networks), the most recent 180 days are held out for testing, with the model trained on all earlier data. For Task C, the fixed-station model is trained on historical daily station data and evaluated on the six daily medians from the mobile deployment. A complementary online prediction baseline is evaluated on the five days after the first mobile daily median, using only the previous mobile daily median as input. No reference-calibration split is claimed for the deployment site.

3.19. Metrics

Mean absolute error (MAE) is the primary metric, as it is interpretable in the original μg/m³ units and less dominated by large individual errors than RMSE [53,54]. RMSE and R² are also reported in summary tables. Because MAE alone can miss trend misalignment (phase shifts), time-series prediction plots are used to complement numerical summaries. Horizon-wise error curves (t + 1 through t + 30) are a relevant extension for future analysis of long-horizon degradation.
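
For reference, the three metrics can be written out directly. The sketch below uses the standard definitions, with R² computed against the mean of the evaluated window, so it is negative whenever the forecast is worse than predicting that mean. The numbers in the usage note are synthetic.

```python
import math

def mae(y, yhat):
    """Mean absolute error, in the units of y."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error; penalises large individual errors more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Coefficient of determination relative to the mean of y; can be negative."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

For observations [10, 20, 30], the forecast [12, 18, 33] gives R² = 0.915, whereas a flat forecast of 30 gives R² = −1.5.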

4. Results

4.1. Multi-Station Temporal Forecasting (Task A)

The mean MAE across the five rolling subsets for each model, station, and pollutant is reported in Table 5. Two observations carry across all models: (i) Station S2 consistently achieves the lowest absolute errors, likely reflecting a more homogeneous local PM regime, and (ii) Station S4 consistently produces the highest errors, a pattern revisited in Section 4.5 when discussing spatial transfer. All models are evaluated under the same 30-day recursive forecasting protocol. Figure 4 and Figure 5 summarise the aggregate model comparison and a representative Station S1 forecast.

Despite its simplicity, linear regression outperforms the LSTM at every station, suggesting that the 10-day input window and the approximately linear structure of PM auto-correlation are sufficient for capturing much of the predictable variance. SVR is competitive with linear regression but shows higher variance; it is notably strong at S2 (PM_2.5 MAE 3.77 μg/m³) but substantially weaker at S4, indicating sensitivity to the scale of PM episodes. Random Forest is the strongest trained classical baseline. Both lightweight graph implementations produce the highest errors among the learning-based models; because they are simplified graph-inspired references rather than faithful DCRNN/STGCN implementations, this result should be interpreted only as evidence about these lightweight implementations in this small-network setting.

4.2. Advanced Pipeline on Task A

The proposed Advanced Stage 0–3 pipeline is evaluated under the same rolling protocol (L = 10, H = 30, 5 subsets) to provide a complete Task A benchmark. The pipeline is applied in its residual-forecasting mode: the 90-day rolling mean seasonal baseline b(t) is subtracted, the HistGradientBoosting residual regressor (with transport and PCA features) predicts the residual, and b(t) is re-added for the final forecast. Recursion is applied identically to the other models. Results are included in Table 5.

The advanced pipeline achieves an overall MAE of 7.118 μg/m³, reducing error by 4.7% relative to Random Forest (7.469 μg/m³). The aggregate gain is consistent across pollutants. Per-pollutant means are PM₁: 5.089 (RF: 5.369), PM_2.5: 7.543 (RF: 7.941), and PM₁₀: 8.721 (RF: 9.097), representing relative reductions of 5.2%, 5.0%, and 4.1%, respectively.
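
The relative reductions follow directly from the quoted MAE values, computed as (RF − advanced) / RF. The short check below uses only the figures stated in this section; the function name is hypothetical.

```python
def relative_reduction(reference, candidate):
    """Percentage by which `candidate` lowers the error of `reference`."""
    return 100.0 * (reference - candidate) / reference

# MAE values quoted in the text, in μg/m³: (Random Forest, advanced pipeline)
reported = {
    "overall": (7.469, 7.118),
    "PM1": (5.369, 5.089),
    "PM2.5": (7.941, 7.543),
    "PM10": (9.097, 8.721),
}

reductions = {
    name: round(relative_reduction(rf, adv), 1)
    for name, (rf, adv) in reported.items()
}
```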

4.3. Summary

Across all stations and pollutants, the advanced pipeline achieves the best overall MAE (7.118 μg/m³), followed by Random Forest (7.469), linear regression (8.054), SVR (8.231), and LSTM (9.142). The DCRNN-style reference (9.828) still has a lower error than the Seasonal naïve baseline (10.406), while the STGCN-style reference (10.550) is slightly weaker than Seasonal naïve but still stronger than Persistence (11.512). Table 6 reports median/IQR dispersion across station–pollutant cells, and Table 7 reports RMSE and R² for the tabular baselines.

Negative out-of-sample R-squared values mean that, on these short recursive windows, the fitted model explains less variance than a simple test-window mean benchmark; they do not indicate an invalid calculation.

4.4. Spatial Estimation on the Classic Four-Station Network (Task B)

Leave-one-station-out spatial estimation on the classic Bucharest network (N = 4) is evaluated using the Random Forest feature pipeline described in Section 3.10. For each of the four held-out stations, the model is trained on the remaining three and evaluated on the most recent 180-day hold-out window.

Across all four stations and three pollutants, the mean R² is 0.873, confirming that same-day neighbour readings carry substantial information for spatial reconstruction in this small, geographically compact network. Station S4, which consistently shows the highest forecasting errors in Task A, also yields the lowest R² in the spatial estimation task, suggesting that its PM regime is less faithfully captured by measurements at the other three stations. Station S2, the most homogeneous site in Task A, is also the most accurately reconstructed.

The high mean R² should be interpreted with care: because all four stations lie within a small geographic area and share the same seasonal PM cycle, a substantial portion of the explained variance reflects shared temporal structure (the network-wide winter-high, summer-low pattern) rather than purely spatial information transfer. This point is revisited quantitatively in Section 4.5 via the permutation importance analysis on the larger city network.

4.5. City-Scale Sensor.Community Extension

To assess whether the Task B spatial estimation pipeline scales to a larger and more heterogeneous network, the pipeline is replicated on the Sensor.Community city dataset (N = 10 stations). The feature set and Random Forest configuration are identical to those in Section 3.10 (n_estimators=500, max_depth=24, min_samples_leaf=1), and the same 180-day hold-out is used. All results in this section are obtained with the fully split-aware hybrid imputation described in Section 3.6. The proposed advanced pipeline is also evaluated (Entry B) against the neighbour-only Random Forest (Entry A) on this larger setting.

Across the ten stations, the median overall R² is 0.734 (mean 0.691). SC84029 is the strongest station and SC87013 the weakest. Figure 6 and Figure 7 visualise the spatial estimation performance for these two boundary cases.

Table 8 reports sensitivity variants for the same city-scale transfer setting. The full same-day model and the current-neighbour-only variant perform similarly, while the lagged-only and seasonal-only variants degrade substantially. This indicates that same-day neighbour information is a major contributor to Task B and that the task should be interpreted as spatial reconstruction/nowcasting, not strict future-time forecasting. The full model has a small mean bias (−0.26 μg/m³ overall), but the boundary-case figures show station-dependent behaviour: SC84029 tends to slightly underestimate, whereas SC87013 tends to overestimate.

Feature Influence Analysis

To quantify the contribution of each engineered feature, permutation importance is applied to the Random Forest transfer models (top-10 stations, PM_2.5/PM₁₀ targets). Importance is measured as the mean MAE increase on the hold-out set when each feature is randomly permuted, averaged across station/pollutant combinations. Figure 8 and Figure 9 summarise the strongest baseline features and their category-level contributions.
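
The importance measure described above can be sketched generically. This is an illustrative, model-agnostic version: the function name, the number of repeats, and the seed are assumptions, and `predict` stands in for any fitted model's prediction function, not the authors' Random Forest.

```python
import random

def mae(y, yhat):
    """Mean absolute error of two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean MAE increase on (X, y) when each feature column is shuffled.

    predict: callable mapping a list of feature rows to a list of predictions.
    X: list of rows (each a list of feature values); y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mae(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Replace only this column, leaving all other features intact
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mae(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores receives an importance of exactly zero, because shuffling it leaves every prediction unchanged. Averaging the returned values across station/pollutant tasks gives the aggregate ranking described in the text.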

For the baseline entry (Table 9), the cosine day-of-year encoding (dayofyear_cos) is the single most important feature, followed by the neighbour PM₁₀ and PM_2.5 means. Notably, the seasonal term (importance 0.43) outweighs the strongest spatial neighbour feature (0.30), which has a meaningful implication: a substantial fraction of the model’s accuracy derives from seasonal context rather than from the actual neighbour readings on a given day.

Applying PCA to the importance matrix across station/pollutant tasks (Figure 10) shows that PC1 explains 24.85% of variance and PC1–PC5 together explain 63.17%, indicating moderate but not dominant task-level homogeneity.

For the advanced entry (Table 10), the seasonal term drops out of the top five, and neighbour aggregate and station-specific lag features rise in relative importance. The transport flow category, absent from the baseline top-five, contributes positively in the advanced pipeline (Figure 11). Figure 12 and Figure 13 provide the corresponding ranked-feature and PCA views for the advanced entry.

Table 11 summarises the stepwise advanced-pipeline ablation on the city-scale transfer task. The gains over the baseline are measurable: the selected advanced configuration reduces mean MAE from 3.918 to 3.860 μg/m³ (1.5% lower error) and raises mean R² from 0.699 to 0.720 (3.0% relative increase). This supports the pipeline as a useful refinement over a strong baseline while keeping the interpretation proportional to the observed effect size.

4.6. Mobile Sensing and Field Deployment (Task C)

Task C addresses the deployment scenario in which a low-cost mobile sensor is placed at a new location ℓ not covered by any existing monitoring network and daily PM levels must be predicted at that location. Because no co-located reference monitor was available at ℓ, this task is evaluated as an exploratory prediction pilot with two strictly predictive settings: (i) a historical-transfer Random Forest trained on the 15-station network and applied to the mobile daily series using only calendar and lagged PM features; and (ii) a one-day-ahead online persistence predictor that forecasts day d from the mobile daily median observed on day d − 1. The online predictor is therefore scored only after the first mobile day has been observed.
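
The online persistence setting (ii) amounts to shifting the daily series by one day and scoring from the second day onward. In the sketch below the function names are hypothetical, bias is defined as forecast minus observation, and the series in the usage note is synthetic, not the deployment data.

```python
def online_persistence(daily_medians):
    """One-day-ahead persistence: the forecast for day d is the median of day d-1.

    Returns (targets, forecasts) aligned from the second day onward, because
    the first day has no previous observation to forecast from.
    """
    targets = daily_medians[1:]
    forecasts = daily_medians[:-1]
    return targets, forecasts

def mae_and_bias(targets, forecasts):
    """Return (MAE, mean signed error), with error = forecast - observed."""
    errors = [f - t for t, f in zip(targets, forecasts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    bias = sum(errors) / len(errors)
    return mae, bias
```

A six-day series therefore yields five scored forecasts. For a synthetic series such as [100, 90, 90, 95, 10, 12], the plateau days contribute small errors while the abrupt drop contributes an error of 85, giving an MAE of 20.4 and a positive bias of 17.6.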

4.6.1. Six-Calendar-Day Field Deployment

The CityAirQ mobile sensor is deployed outdoors for six calendar days (24–29 April 2026) at location ℓ (44.390° N, 26.118° E), comprising four complete measurement days and two partial boundary days (the first and last). This site is not represented in either the classic four-station network or the Sensor.Community city network. Figure 14 shows the deployment site relative to the historical fixed stations. The sensor records PM₁, PM_2.5, and PM₁₀ at approximately five-minute intervals throughout the deployment window.

The deployment window is deliberately chosen to span both weekdays and a weekend (Saturday 25 April and Sunday 26 April), ensuring that the dataset captures the reduced-traffic conditions typical of non-working days alongside normal weekday patterns. In addition, the deployment period coincides with substantial meteorological variability: temperature ranges from 14.9 to 36.6 °C, relative humidity from 14.6 to 42.8%, and barometric pressure from 998 to 1017 hPa. A notable pressure trough on 26–27 April (dropping below 1000 hPa) is accompanied by elevated PM concentrations, while the subsequent pressure recovery on 28 April coincides with a sharp decline in all three PM channels. These conditions ensure that the field dataset avoids artificial constancy and reflects a realistic range of atmospheric states.

Table 12 lists the deployment-site metadata, and Figure 15 summarises the PM and meteorological time series recorded during the field deployment.

Table 13 summarises the daily statistics of the raw mobile sensor readings. Median PM_2.5 remains elevated at 84–107 μg/m³ during the first four days, then drops sharply to 7 μg/m³ on day 5 following the weather shift. Within-day variability (standard deviation) is highest on the first and fifth days, when the sensor captures both pre- and post-transition conditions. Figure 16 visualises the daily distributions as box plots, highlighting the contrast between the high-PM weekend/early-week period and the post-frontal clean-air days.

A notable feature of the data is the very low within-day variance on days 2–3 (Saturday–Sunday), during which all three PM channels report nearly constant values (PM₁ ≈ 52, PM_2.5 ≈ 84, PM₁₀ ≈ 95 μg/m³). This plateau may reflect either genuine atmospheric stagnation under low-wind weekend conditions or sensor-level quantisation at these concentration levels; a co-located reference instrument is needed to disambiguate the two explanations.

4.6.2. Exploratory Prediction at the Unseen Location

To evaluate prediction at location ℓ, the historical-transfer Random Forest is trained exclusively on the extended city-scale network (comprising 15 historical stations, including both the classic network and Sensor.Community nodes), then evaluated against the six daily median concentrations recorded by the mobile sensor. The model does not use same-day mobile readings as predictors; the first-day lags are seeded from the historical fixed-station distribution, and later predictions use only previously observed mobile daily medians. A strict online persistence predictor is also reported, which is available from the second deployment day onward and predicts each daily median from the previous mobile daily median. Table 14 reports the effective sample size and uncertainty estimates. The online predictor reduces MAE substantially because the short deployment contains multi-day plateaus, but it still misses the abrupt day 5 drop and should be interpreted only as a short-horizon prediction baseline.

A negative bias means that predictions are below the raw mobile median on average; the high MAE values therefore reflect both transfer error and the absence of reference calibration at the mobile site. Positive bias for the online persistence baseline reflects overestimation after the abrupt clean-air transition.

The same pilot diagnostics give additional rank and scale-normalised context. For the historical-transfer RF, Spearman correlations are 0.648, −0.031, and −0.185 for PM₁, PM_2.5, and PM₁₀, with normalised MAE values of 0.634, 0.653, and 0.666, respectively. For the online persistence predictor, Spearman correlation is approximately 0.645 for all three pollutants, with normalised MAE values of 0.337, 0.376, and 0.349. Figure 17 compares the signed prediction errors for the historical-transfer and online-persistence predictors.

4.6.3. Prediction Interpretation

The contrast between the two Task C predictors is informative. The historical-transfer Random Forest captures a sample-limited positive trend but shows a raw-scale offset, especially during the elevated-PM plateau. The online persistence predictor is simpler but better matched to the local mobile series once one previous daily median is available, reducing MAE by 53.3% for PM₁, 50.7% for PM_2.5, and 54.3% for PM₁₀ relative to the historical-transfer RF. This gain should not be read as deployment-grade validation: the test set contains only five forecastable online points, and the persistence predictor overestimates the abrupt day 5 clean-air transition. The result instead indicates that any operational Task C predictor should combine historical spatial transfer with rapid local adaptation after deployment.

5. Discussion

5.1. Task A: Temporal Forecasting

The advanced pipeline achieves the best overall MAE (7.118 μg/m³) on Task A, with Random Forest (7.469), linear regression (8.054), and SVR (8.231) as the closest alternatives. The Seasonal naïve (10.406) and Persistence (11.512) references are substantially weaker. The 4.7% MAE reduction relative to Random Forest shows that, in this sparse daily setting, feature engineering adds measurable value over a strong tabular baseline without changing the overall difficulty of the task. The LSTM (9.142) outperforms both lightweight graph-inspired references, but the simplified graph-mixing models do not support broad claims about state-of-the-art STGNNs because they are intentionally lightweight references rather than full reproductions.

5.2. Task B: Same-Day Spatial Estimation

The neighbour-only Random Forest achieves consistently high R² on the classic network (mean 0.873, Section 4.4), with performance remaining meaningful but more variable across the ten Sensor.Community stations (median 0.734, Section 4.5). The performance gap between the classic and city settings reflects the increased heterogeneity of the larger network: stations with shorter histories or greater geographic isolation (e.g., SC87013) show substantially lower R². An important caveat is that the use of current-day neighbour readings as features makes this task spatial reconstruction rather than forecasting; a strictly lagged feature set would be harder but more realistic for a purely predictive evaluation.

5.3. Advanced Pipeline

On Task A, the advanced pipeline achieves the best overall MAE (7.118 μg/m³), reducing error by 4.7% relative to Random Forest (7.469 μg/m³). On the city-scale spatial estimation task (Task B), it further improves over the Baseline in both mean MAE (3.860 vs. 3.918 μg/m³; 1.5% reduction) and mean R² (0.720 vs. 0.699; 3.0% relative increase). Permutation importance confirms that the largest contributors to Task B performance remain seasonal encoding and neighbour aggregate features; transport flow features add an incremental positive contribution in the advanced pipeline. Together, these results support the Advanced Stage 0–3 pipeline as the proposed candidate model for both temporal forecasting and spatial estimation tasks, with bounded but consistent improvements over strong baselines.

5.4. Task C: Mobile Sensing and Field Deployment

The six-day outdoor deployment (24–29 April 2026) at an uncovered site provides a preliminary transfer and data-quality pilot for an entirely new location. The mobile dataset reveals substantial environmental variability across the deployment window: a sustained high-PM period coinciding with a pressure trough and warm conditions, followed by a sharp concentration drop after a frontal passage. The inclusion of a weekend ensures that the data are not artificially constant, capturing reduced-traffic emission patterns alongside normal weekday behaviour.

In the exploratory prediction sub-experiment (Section 4.6), the historical-transfer model trained on the 15-station city network yields positive Pearson correlations for all three pollutants (r = 0.432 for PM_2.5, n = 6). The reported bootstrap intervals are descriptive bounds for a deliberately short pilot sample, so the correlation estimates are interpreted together with MAE and bias rather than as stand-alone significance tests. The signed error estimates indicate a raw-scale offset between the fixed-station transfer model and the mobile sensor. A strict one-day-ahead online persistence predictor improves MAE once a prior mobile daily median is available, but it remains a short-window baseline and misses the abrupt day 5 regime shift. A co-located reference instrument at location ℓ is therefore the critical next step for calibration and conclusive validation.

5.5. Limitations

Task B uses same-day neighbour readings, making it spatial reconstruction rather than forecasting; the lagged-only sensitivity analysis confirms that the task becomes substantially harder when same-day information is removed. The Task C pilot is based on uncalibrated mobile readings, as no co-located reference instrument was available at location ℓ during the deployment. The sensor quantisation behaviour observed on the weekend plateau (days 2–3) requires investigation before the mobile readings can be used for quantitative calibration. Finally, all experiments are conducted on a single city, and performance on networks with different spatial density, topography, or emission profiles may differ.

6. Conclusions

The central empirical lesson of this benchmark is that, in a sparse urban PM network at daily resolution, recent concentration history and careful feature engineering matter at least as much as model expressiveness. The proposed Advanced Stage 0–3 pipeline achieves the best Task A overall MAE (7.118 μg/m³), a 4.7% reduction relative to Random Forest (7.469 μg/m³). This indicates that the 30-day recursive forecasting setting is difficult and that strong tabular and simple reference baselines are essential. The lightweight DCRNN-style and STGCN-style references are weaker in this small-network experiment, but the result should not be generalised to full STGNN architectures.

The Task B spatial estimation results tell a complementary story: when same-day neighbour readings are available, a neighbour-only Random Forest achieves consistently high R² (mean 0.873 on the classic network, median 0.734 on the ten-station city extension). The dominance of seasonal encoding in permutation importance, however, indicates that part of this accuracy reflects a shared temporal structure (the network-wide winter-high, summer-low PM cycle). The sensitivity analysis supports this interpretation: the full same-day and current-neighbour-only variants remain strong, while the lagged-only and seasonal-only variants degrade substantially.

For practitioners, the results suggest that deploying a well-engineered machine-learning model with seasonal features and neighbour aggregates is a strong starting point for PM monitoring in data-scarce cities. The Advanced Stage 0–3 pipeline extends this baseline with residual modelling, transport features, and spatial interpolation, showing consistent gains across both forecasting and spatial estimation tasks. All experiments use fully split-aware imputation, ensuring that reported performance is not inflated by test-period look-ahead.

The Task C results demonstrate a feasible mobile-site prediction pathway under realistic field conditions, but not deployment-grade accuracy. The six-day deployment captured a dynamic range of conditions, including weekend reduced-traffic patterns, a pressure-driven PM excursion, and a post-frontal clean-air period. Historical-transfer prediction using the 15-station city network achieved positive sample-limited Pearson correlations for all three pollutants, with descriptive intervals reflecting the n = 6 pilot size and a measurable raw-scale offset. A strict one-day-ahead online persistence predictor reduced MAE after the first mobile day, showing the value of rapid local adaptation, but the evaluation still contains only five forecastable online points. Longer deployment windows and co-located reference instruments are necessary before calibration or regulatory claims can be made.

Future work should evaluate the advanced pipeline on more cities and longer time periods; extend the Task C field deployment to include co-located reference measurements and a formal calibration protocol; investigate the sensor quantisation behaviour observed during the weekend plateau; and further expand the lagged-only Task B evaluation across more cities. Ultimately, refining these AI-driven monitoring methods directly supports the sustainability agenda by providing cities with accessible, data-driven tools to combat air pollution and safeguard public health.

Author Contributions

Conceptualisation, C.-L.G., D.T. and L.R.; Methodology, C.-L.G.; Software, C.-L.G.; Validation, C.-L.G.; Writing—original draft, C.-L.G.; Writing—review and editing, C.-L.G., D.T. and L.R.; Supervision, D.T. and L.R.; Project administration, D.T. and L.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The AQICN data used for the classic four-station network are publicly available at https://aqicn.org (accessed on 1 January 2026). The Sensor.Community data used for the city-scale extension are publicly available at https://sensor.community (accessed on 1 January 2026). The mobile-sensor deployment data will be made available by the authors upon reasonable request.

Acknowledgments

The authors acknowledge the AQICN project and the Sensor.Community network for providing open access to the air-quality data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM_2.5 and PM₁₀), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide; WHO: Geneva, Switzerland, 2021.
Burnett, R.; Chen, H.; Szyszkowicz, M.; Spadaro, J.V. Global Estimates of Mortality Associated with Long-Term Exposure to Outdoor Fine Particulate Matter. Proc. Natl. Acad. Sci. USA 2018, 115, 9592–9597.
Zhang, Y.; Bocquet, M.; Mallet, V.; Seigneur, C.; Baklanov, A. Real-Time Air Quality Forecasting, Part I: History, Techniques, and Current Status. Atmos. Environ. 2012, 60, 632–655.
Dinica, M.; Popescu, D.; Tudose, D.; Dumitru, B.; Ruse, L.; Pitale, A.; Preda, M. CityAirQ—Pollution Tracking System. Sustainability 2025, 17, 4062.
Dinica, M.; Tudose, D.; Ruse, L.; Pitale, A. Mobile Air Quality Monitoring Device. In Proceedings of the 2024 23rd RoEduNet Conference: Networking in Education and Research (RoEduNet), Bucharest, Romania, 19–20 September 2024; pp. 1–6.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 3146–3154.
Smola, A.J.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. 2004, 14, 199–222.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018.
Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640.
Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2015.
Teng, X.; Liu, J.; Yi, X.; Zhang, Y. PM_2.5 Concentration Prediction: A Hybrid Graph Deep Learning Model. Expert Syst. Appl. 2023, 229, 120453.
Zhao, J.; Zhang, M.; Li, Y. Spatiotemporal PM_2.5 Forecasting via Dynamic Geographical Graph Neural Network. Environ. Model. Softw. 2025, 186, 106351.
Yang, H.; Wang, W.; Li, G. Multi-Scale Dynamic Graph Neural Network for PM_2.5 Concentration Prediction. PLoS ONE 2024, 19, e0338392.
Bodendorfer, N. A HEART for the Environment: Transformer-Based Spatiotemporal Modelling for Air Quality Prediction. arXiv 2025, arXiv:2502.19042.
Zhou, Y.; Longa, A.; Rozemberczki, B.; Liò, P. PM_2.5 Forecasting under Distribution Shift. Artif. Intell. Earth Space Sci. 2024, 2, 100029.
Li, X.; Peng, L.; Yao, X.; Cui, S.; Hu, Y.; You, C.; Chi, T. Long Short-Term Memory Neural Network for Air Pollutant Concentration Predictions: Method Development and Evaluation. Environ. Pollut. 2017, 231, 997–1004.
Huang, C.-J.; Kuo, P.-H. A Deep CNN–LSTM Model for Particulate Matter (PM_2.5) Forecasting in Smart Cities. Sensors 2018, 18, 2220.
Qi, Y.; Li, Q.; Karimian, H.; Liu, D. A Hybrid Model for Spatiotemporal Forecasting of PM_2.5 Based on Graph Convolutional Neural Network and Long Short-Term Memory. Sci. Total Environ. 2019, 664, 1–10.
Gao, X.; Li, W. A Graph-Based LSTM Model for PM_2.5 Forecasting. Atmos. Pollut. Res. 2021, 12, 101150.
Wang, S.; Li, Y.; Zhang, J.; Meng, Q.; Meng, L.; Gao, F. PM_2.5-GNN: A Domain Knowledge Enhanced Graph Neural Network for PM_2.5 Forecasting. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2020; pp. 163–166.
Zhou, H.; Zhang, F.; Du, Z.; Liu, R. Forecasting PM_2.5 Using Hybrid Graph Convolution-Based Model Considering Dynamic Wind-Field to Offer the Benefit of Spatial Interpretability. Environ. Pollut. 2021, 273, 116473.
Liu, X.; Qin, M.; He, Y.; Mi, X.; Yu, C. A New Multi-Data-Driven Spatiotemporal PM_2.5 Forecasting Model Based on an Ensemble Graph Reinforcement Learning Convolutional Network. Atmos. Pollut. Res. 2021, 12, 101197. [Google Scholar] [CrossRef]
Zhang, Z.; Zhang, S.; Zhao, X.; Chen, L.; Yao, J.; Lu, Y. Temporal Difference-Based Graph Transformer Networks for Air Quality PM_2.5 Prediction: A Case Study in China. Front. Environ. Sci. 2022, 10, 924986. [Google Scholar] [CrossRef]
Ni, Q.; Wang, Y.; Yuan, J. Adaptive Scalable Spatio-Temporal Graph Convolutional Network for PM_2.5 Prediction. Eng. Appl. Artif. Intell. 2023, 126, 107080. [Google Scholar] [CrossRef]
Mohammadzadeh, A.K.; Salah, H.; Jahanmahin, R.; Hussain, A.E.A.; Butler, L. Spatiotemporal Integration of GCN and E-LSTM Networks for PM_2.5 Forecasting. Mach. Learn. Appl. 2024, 15, 100521. [Google Scholar] [CrossRef]
Haseeb, M.; Tahir, Z.; Mahmood, S.A.; Arif, H.; Almutairi, K.F.; Soufan, W.; Tariq, A. Comparative Analysis of Machine Learning Models for Predicting PM_2.5 Concentrations. J. Atmos. Sol.-Terr. Phys. 2024, 263, 106338. [Google Scholar] [CrossRef]
Ravindiran, G.; Karthick, K.; Rajamanickam, S.; Datta, D.; Das, B.; Shyamala, G.; Hayder, G.; Maria, A. Ensemble Stacking of Machine Learning Models for Air Quality Prediction. iScience 2025, 28, 111894. [Google Scholar] [CrossRef]
Jin, Y.; Chen, X.; Yang, Q. Deep-Learning Architecture for PM_2.5 Concentration Prediction: A Review. Environ. Adv. 2024, 16, 100519. [Google Scholar] [CrossRef]
Snyder, E.G.; Watkins, T.H.; Solomon, P.A.; Thoma, E.D.; Williams, R.W.; Hagler, G.S.W.; Shelow, D.; Hindin, D.A.; Kilaru, V.J.; Preuss, P.W. The Changing Paradigm of Air Pollution Monitoring. Environ. Sci. Technol. 2013, 47, 11369–11377. [Google Scholar] [CrossRef] [PubMed]
Kumar, P.; Morawska, L.; Martani, C.; Biskos, G.; Neophytou, M.; Di Sabatino, S.; Bell, M.; Norford, L.; Britter, R. The Rise of Low-Cost Sensing for Managing Air Pollution in Cities. Environ. Int. 2015, 75, 199–205. [Google Scholar] [CrossRef] [PubMed]
Castell, N.; Dauge, F.R.; Schneider, P.; Vogt, M.; Lerner, U.; Fishbain, B.; Broday, D.; Bartonova, A. Can Commercial Low-Cost Sensor Platforms Contribute to Air Quality Monitoring and Exposure Estimates? Environ. Int. 2017, 99, 293–302. [Google Scholar] [CrossRef]
Morawska, L.; Thai, P.K.; Liu, X.; Asumadu-Sakyi, A.; Ayoko, G.; Bartonova, A.; Bedini, A.; Chai, F.; Christensen, B.; Dunbabin, M.; et al. Applications of Low-Cost Sensing Technologies for Air Quality Monitoring and Exposure Assessment: How Far Have They Gone? Environ. Int. 2018, 116, 286–299. [Google Scholar] [CrossRef]
Karagulian, F.; Barbiere, M.; Kotsev, A.; Spinelle, L.; Gerboles, M.; Lagler, F.; Redon, N.; Crunaire, S.; Borowiak, A. Review of the Performance of Low-Cost Sensors for Air Quality Monitoring. Atmosphere 2019, 10, 506. [Google Scholar] [CrossRef]
Lewis, A.C.; von Schneidemesser, E.; Peltier, R.E. (Eds.) Low-Cost Sensors for the Measurement of Atmospheric Composition: Overview of Topic and Future Applications; World Meteorological Organization: Geneva, Switzerland, 2018; Available online: https://www.ccacoalition.org/resources/low-cost-sensors-measurement-atmospheric-composition-overview-topic-and-future-applications (accessed on 1 January 2026).
Concas, F.; Mineraud, J.; Lagerspetz, E.; Varjonen, S.; Liu, X.; Puolamäki, K.; Nurmi, P.; Tarkoma, S. Low-Cost Outdoor Air Quality Monitoring and Sensor Calibration: A Survey and Critical Analysis. ACM Trans. Sens. Netw. 2021, 17, 1–44.
Badura, M.; Batog, P.; Drzeniecka-Osiadacz, A.; Modzel, P. Evaluation of Low-Cost Sensors for Ambient PM_2.5 Monitoring. J. Sens. 2018, 2018, 5096540.
Malings, C.; Tanzer, R.; Hauryliuk, A.; Saha, P.K.; Robinson, A.L.; Presto, A.A.; Subramanian, R. Fine Particle Mass Monitoring with Low-Cost Sensors: Corrections and Long-Term Performance Evaluation. Aerosol Sci. Technol. 2020, 54, 160–174.
Ni, J.; Chen, Y.; Gu, Y.; Fang, X.; Shi, P. An Improved Hybrid Transfer Learning-Based Deep Learning Model for PM_2.5 Concentration Prediction. Appl. Sci. 2022, 12, 3597.
Zimmerman, N.; Presto, A.A.; Kumar, S.P.N.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L.; Subramanian, R. A Machine Learning Calibration Model Using Random Forests to Improve Sensor Performance for Lower-Cost Air Quality Monitoring. Atmos. Meas. Tech. 2018, 11, 291–313.
Spinelle, L.; Gerboles, M.; Villani, M.G.; Aleixandre, M.; Bonavitacola, F. Field Calibration of a Cluster of Low-Cost Available Sensors for Air Quality Monitoring. Part A: Ozone and Nitrogen Dioxide. Sens. Actuators B Chem. 2015, 215, 249–257.
Jiao, W.; Hagler, G.; Williams, R.; Sharpe, R.; Brown, R.; Garver, D.; Judge, R.; Caudill, M.; Rickard, J.; Davis, M.; et al. Community Air Sensor Network (CAIRSENSE) Project: Evaluation of Low-Cost Sensor Performance in a Suburban Environment in the Southeastern United States. Atmos. Meas. Tech. 2016, 9, 5281–5292.
Rai, A.C.; Kumar, P.; Pilla, F.; Skouloudis, A.N.; Di Sabatino, S.; Ratti, C.; Yasar, A.; Rickerby, D. End-User Perspective of Low-Cost Sensors for Outdoor Air Pollution Monitoring. Sci. Total Environ. 2017, 607–608, 691–705.
Mead, M.I.; Popoola, O.A.M.; Stewart, G.B.; Landshoff, P.; Calleja, M.; Hayes, M.; Baldovi, J.J.; McLeod, M.W.; Hodgson, T.F.; Dicks, J.; et al. The Use of Electrochemical Sensors for Monitoring Urban Air Quality in Low-Cost, High-Density Networks. Atmos. Environ. 2013, 70, 186–203.
AQICN. World Air Quality Index Project. Available online: https://aqicn.org (accessed on 1 January 2026).
Sensor.Community. Open Environmental Data with Open Hardware. Available online: https://sensor.community/en/ (accessed on 1 January 2026).
Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 3rd ed.; Wiley: Hoboken, NJ, USA, 2019.
van Buuren, S. Flexible Imputation of Missing Data, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2018.
Cressie, N.A.C. Statistics for Spatial Data, revised ed.; Wiley: New York, NY, USA, 1993.
Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance. Clim. Res. 2005, 30, 79–82.
Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 3rd ed.; OTexts: Melbourne, Australia, 2021; Available online: https://otexts.com/fpp3/ (accessed on 1 January 2026).

Figure 1. Architecture overview of the Advanced Stage 0–3 pipeline. Blue boxes represent data-processing and modelling stages, tan side boxes represent validation or calibration inputs, and arrows indicate the direction of data flow.

Figure 2. Received-from-neighbour transport features (received_sum and received_max). Blue nodes denote neighbouring stations, the green node denotes the target station, orange arrows denote directed transport contributions, and the equations show one-day-lagged inputs $(t-1)$.
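The two aggregates named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes plain inverse-distance weights and uses made-up neighbour identifiers, distances, and concentrations; the function name `received_features`, the weight form, and all numbers are illustrative assumptions rather than the exact definitions used in the paper, which may also include directional terms.

```python
def received_features(prev_day_pm, distance_km, eps=1e-6):
    """Aggregate one-day-lagged neighbour readings for a target station.

    prev_day_pm: {neighbour_id: PM concentration observed at day t-1}
    distance_km: {neighbour_id: distance from that neighbour to the target}
    The inverse-distance weight 1/d is an assumed, illustrative choice.
    """
    contributions = {
        nid: prev_day_pm[nid] / (distance_km[nid] + eps)
        for nid in prev_day_pm
    }
    return {
        "received_sum": sum(contributions.values()),  # total weighted inflow
        "received_max": max(contributions.values()),  # strongest single inflow
    }


# Hypothetical neighbours of one target station (all values are made up).
pm_t_minus_1 = {"N1": 20.0, "N2": 30.0, "N3": 10.0}
dist = {"N1": 2.0, "N2": 5.0, "N3": 1.0}
features = received_features(pm_t_minus_1, dist)
```

Because only values from day t-1 enter the aggregates, the features remain available at forecast time without using same-day observations.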

Figure 3. Construction of PCA latent temporal factors from lagged multi-station features. Blue boxes show preprocessing, the green box contains the retained traffic-component features, the tan box marks the residual-model inputs, and arrows show the transformation sequence.

Figure 4. Task A overall MAE comparison across models. Bars are coloured by model family, with blue tones for the proposed/tabular models, orange/red tones for neural and graph-inspired models, and grey tones for simple reference baselines. The dashed blue line marks the Advanced Pipeline benchmark MAE; units are μg/m³.

Figure 5. Task A sample 30-day recursive forecast vs. actual values for Station S1 (PM_2.5).
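A 30-day recursive forecast of the kind shown in Figure 5 can be sketched as a loop in which each prediction is fed back as the newest lag. The sketch below is a generic illustration: the function names, the seven-lag window, and the two toy one-step models are assumptions, not the models evaluated in the paper.

```python
def recursive_forecast(history, step_model, horizon=30, n_lags=7):
    """Roll a one-step-ahead model forward for `horizon` steps.

    After each step the prediction replaces the oldest lag, so later
    steps depend on earlier predictions rather than on observations.
    """
    window = list(history[-n_lags:])
    forecast = []
    for _ in range(horizon):
        y_hat = step_model(window)
        forecast.append(y_hat)
        window = window[1:] + [y_hat]  # the prediction becomes the newest lag
    return forecast


def persistence(window):
    """Toy one-step model: repeat the most recent value."""
    return window[-1]


def mean_of_lags(window):
    """Toy one-step model: average of the lag window."""
    return sum(window) / len(window)


# Made-up daily PM history (µg/m³) for illustration only.
history = [12.0, 15.0, 11.0, 14.0, 18.0, 16.0, 13.0]
flat = recursive_forecast(history, persistence)
smooth = recursive_forecast(history, mean_of_lags)
```

Under recursion a persistence model yields a flat line at the last observed value, which is one reason errors can grow over a 30-day window.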

Figure 6. Task B spatial estimation at the strongest city-scale station (SC84029): sample predictions vs. observed hold-out values. Negative values, where shown on axes or annotations, use the mathematical minus sign −.

Figure 7. Task B spatial estimation at the weakest city-scale station (SC87013): sample predictions vs. observed hold-out values.

Figure 8. Top features by permutation importance, baseline entry. Bar length represents the mean MAE increase after permutation; the bar colours distinguish the ranked features for readability.

Figure 9. Feature influence by category, baseline entry. Bar colours identify feature categories, and bar length represents average permutation importance; seasonal and neighbour-based categories dominate.

Figure 10. PCA of feature importance patterns across tasks, baseline entry. Green markers denote individual station–pollutant tasks projected into the first two principal components.

Figure 11. Feature influence by category, advanced entry. Bar colours identify feature categories, and bar length represents average permutation importance; neighbour aggregates remain dominant and transport flow contributes positively.

Figure 12. Top features by permutation importance, advanced entry. Blue bars denote the ranked engineered features, with bar length proportional to mean MAE increase after permutation.

Figure 13. PCA of feature importance patterns across tasks, advanced entry. Green markers denote individual station–pollutant tasks projected into the first two principal components.

Figure 14. Task C spatial map showing the mobile deployment location ℓ, the classic S1–S4 stations, and the Sensor.Community stations used for city-scale transfer.

Figure 15. PM time series and meteorological conditions at deployment site ℓ over the measurement window (24–29 April 2026, Bucharest local time). (a) PM₁; (b) PM_2.5; (c) PM₁₀; (d) min–max-normalised meteorology. Solid blue, orange, and red lines show hourly median PM concentrations, translucent bands show hourly min–max ranges, the solid red meteorology line shows temperature, the dashed blue line shows relative humidity, and the dotted grey line shows barometric pressure. Orange-shaded columns indicate weekend days (Saturday–Sunday). The sharp PM decline on 28 April coincides with a pressure recovery following the 26–27 April trough.
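The quantities plotted in Figure 15 (hourly medians, hourly min–max bands, and min–max-normalised meteorology) can be derived from the five-minute readings with a few lines of code. This is a minimal sketch with made-up readings; the function names and the handling of a constant series are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def hourly_summary(readings):
    """Group (timestamp, value) pairs by hour; return median, min, max."""
    buckets = defaultdict(list)
    for ts, value in readings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {
        hour: {"median": median(vals), "min": min(vals), "max": max(vals)}
        for hour, vals in sorted(buckets.items())
    }


def min_max_normalise(values):
    """Rescale values to [0, 1]; a constant series maps to zeros (assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up five-minute PM readings and pressure values, for illustration.
readings = [
    (datetime(2026, 4, 24, 10, 0), 60.0),
    (datetime(2026, 4, 24, 10, 5), 62.0),
    (datetime(2026, 4, 24, 10, 10), 58.0),
    (datetime(2026, 4, 24, 11, 0), 55.0),
]
summary = hourly_summary(readings)
scaled_pressure = min_max_normalise([1008.0, 1012.0, 1016.0])
```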

Figure 16. Daily box plots of PM readings at site ℓ. (a) PM₁; (b) PM_2.5; (c) PM₁₀. Blue, orange, and red boxes identify the three pollutant channels, box centres mark medians, boxes show interquartile ranges, whiskers show non-outlier ranges, dots mark outliers, and orange-shaded columns mark weekend days. The elevated, low-variance readings on days 2–4 contrast sharply with the post-frontal drop on day 5.

Figure 17. Task C daily prediction error distribution for the historical-transfer Random Forest (RF) and the one-day-ahead online persistence predictor (Pers.). Blue boxes show RF errors, orange boxes show persistence errors, circles show outlying daily errors, and the dashed horizontal line marks zero prediction error.

Table 1. Classic network station metadata.

Station	Location	Lat.	Lon.
S1	Aleea Politehnicii	44.4437	26.0519
S2	Strada Pirotehniei	44.4393	26.0493
S3	Strada Valea Calugareasca	44.4106	26.1106
S4	Strada Soldat Ion Ciocodeica	44.3869	26.1194
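Spatial features depend on inter-station distances, which can be computed from the coordinates in Table 1. The sketch below uses the haversine great-circle formula; the choice of haversine and of a 6371 km mean Earth radius are assumptions, since this section does not state the distance metric used.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

# Latitude and longitude in degrees, copied from Table 1.
STATIONS = {
    "S1": (44.4437, 26.0519),
    "S2": (44.4393, 26.0493),
    "S3": (44.4106, 26.1106),
    "S4": (44.3869, 26.1194),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# One entry per unordered station pair.
distances = {
    (p, q): haversine_km(STATIONS[p], STATIONS[q])
    for p in STATIONS
    for q in STATIONS
    if p < q
}
```

With these coordinates, S1 and S2 lie roughly half a kilometre apart, whereas S1 and S4 are separated by about 8 km, so the classic network mixes a very close pair with more distant stations.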

Table 2. Mobile sensor deployment summary.

Parameter	Value
Field deployment	6 calendar days at uncovered outdoor location ℓ
Sampling frequency	Approximately five-minute readings
Pollutants	PM₁, PM_2.5, PM₁₀
Weather conditions	Highly variable (includes a weekend)
Reference at ℓ	None; no co-located EPA FRM/FEM or equivalent reference monitor

Table 3. City imputation ablation (top-10 transfer, 180-day hold-out, split-aware statistics).

Imputation	Mean MAE	Mean $R^{2}$	Median $R^{2}$
Linear (split-aware)	5.0512	0.5198	0.6287
Hybrid (split-aware)	4.0234	0.6871	0.7289
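The idea behind split-aware linear imputation can be sketched as follows: gaps are interpolated separately inside the training and hold-out segments, so no hold-out value is used to fill a training gap. The function names, the edge-gap rule, and the data are assumptions for illustration; the hybrid variant in Table 3 is not reproduced here because this section does not specify it.

```python
def interpolate_linear(series):
    """Fill None gaps linearly; gaps at the edges copy the nearest value."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def split_aware_impute(series, split_index):
    """Impute the training and hold-out segments independently."""
    train = interpolate_linear(series[:split_index])
    holdout = interpolate_linear(series[split_index:])
    return train, holdout


# Made-up daily PM series with missing days (None).
daily_pm = [10.0, None, 14.0, None, None, 30.0, None, 26.0]
train, holdout = split_aware_impute(daily_pm, split_index=4)
```

In this toy series, a split-unaware interpolation would fill the last training day using the hold-out value 30.0; the split-aware version instead carries the last training observation forward.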

Table 4. Summary of the four pipeline stages and the capabilities added at each stage.

Stage	What It Adds
Stage 0 (Base residual)	Advanced feature stack (lags, rolling statistics, seasonal encoding, transport features); multi-output HistGradientBoosting predicting seasonal residuals.
Stage 1 (Transfer eval.)	Evaluates the Stage 0 pipeline under leave-one-station-out on the classic four-station network.
Stage 2 (Tuning)	Randomised hyperparameter search on Stage 0 using TimeSeriesSplit; selects the best configuration on a validation tail.
Stage 3 (Spatial blend)	Adds a RidgeCV spatial interpolation meta-learner on top of Stage 2 residual predictions; blends base and interpolated outputs.

Table 5. Per-station MAE for all Task A models (mean across 5 rolling subsets, μg/m³).

Model	Pollutant	S1	S2	S3	S4
Advanced Pipeline	PM₁	4.489	3.756	5.698	6.413
	PM_2.5	7.287	4.612	7.201	11.071
	PM₁₀	9.179	4.958	8.442	12.284
Random Forest	PM₁	4.711	3.909	5.993	6.864
	PM_2.5	7.656	4.827	7.489	11.792
	PM₁₀	9.423	5.304	8.744	12.917
Linear Regression	PM₁	5.548	3.256	6.296	7.757
	PM_2.5	8.929	4.019	7.782	13.588
	PM₁₀	11.325	4.380	9.103	14.669
SVR	PM₁	6.150	3.113	6.132	7.593
	PM_2.5	10.190	3.773	7.668	13.375
	PM₁₀	12.970	4.103	8.986	14.714
LSTM	PM₁	5.567	3.574	7.792	8.870
	PM_2.5	9.177	4.447	9.765	15.723
	PM₁₀	11.589	4.829	11.228	17.144
DCRNN-style (ref.)	PM₁	6.196	3.860	8.002	9.445
	PM_2.5	10.239	4.650	10.094	16.964
	PM₁₀	13.059	5.266	11.619	18.538
STGCN-style (ref.)	PM₁	6.495	3.963	9.058	9.767
	PM_2.5	11.006	4.952	11.364	18.154
	PM₁₀	13.968	5.386	12.883	19.600
Seasonal naïve	PM₁	6.840	4.730	8.630	10.070
	PM_2.5	10.790	5.540	11.240	16.830
	PM₁₀	13.120	6.090	12.580	18.410
Persistence	PM₁	7.520	5.190	9.480	11.230
	PM_2.5	12.030	6.110	12.510	18.460
	PM₁₀	14.470	6.830	13.970	20.340
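The overall MAE values quoted in the abstract can be cross-checked against Table 5. The sketch below assumes that the overall MAE is the unweighted mean of the twelve station×pollutant cells; under that assumption the table reproduces 7.12 μg/m³ for the Advanced Pipeline, 7.47 μg/m³ for Random Forest, and the 4.7% relative reduction.

```python
from statistics import mean

# Cells copied from Table 5 in the order PM1, PM2.5, PM10 for S1..S4 (µg/m³).
TABLE_5 = {
    "Advanced Pipeline": [4.489, 3.756, 5.698, 6.413,
                          7.287, 4.612, 7.201, 11.071,
                          9.179, 4.958, 8.442, 12.284],
    "Random Forest": [4.711, 3.909, 5.993, 6.864,
                      7.656, 4.827, 7.489, 11.792,
                      9.423, 5.304, 8.744, 12.917],
    "Seasonal naive": [6.840, 4.730, 8.630, 10.070,
                       10.790, 5.540, 11.240, 16.830,
                       13.120, 6.090, 12.580, 18.410],
    "Persistence": [7.520, 5.190, 9.480, 11.230,
                    12.030, 6.110, 12.510, 18.460,
                    14.470, 6.830, 13.970, 20.340],
}

# Unweighted mean over the twelve cells (assumed aggregation rule).
overall = {model: mean(cells) for model, cells in TABLE_5.items()}

# Relative MAE reduction of the Advanced Pipeline versus Random Forest.
reduction_pct = 100.0 * (
    overall["Random Forest"] - overall["Advanced Pipeline"]
) / overall["Random Forest"]
```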

Table 6. Task A MAE dispersion across station×pollutant cells (median and IQR).

Model	Median MAE	IQR
Advanced Pipeline	6.807	3.755
Random Forest	7.177	4.018
Linear Regression	7.770	4.403
SVR	7.631	5.261
LSTM	9.023	5.936
DCRNN-style (ref.)	9.769	6.016
STGCN-style (ref.)	10.386	6.936
Seasonal naïve	10.430	6.385
Persistence	11.630	7.045
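The dispersion statistics in Table 6 can be recomputed from the per-cell values in Table 5. The sketch below does this for the Advanced Pipeline row and assumes the linear-interpolation ("inclusive") quartile convention; with that convention the median and IQR agree with Table 6 to within rounding.

```python
from statistics import median, quantiles

# Advanced Pipeline cells from Table 5 (twelve station×pollutant MAEs, µg/m³).
advanced_cells = [4.489, 3.756, 5.698, 6.413,
                  7.287, 4.612, 7.201, 11.071,
                  9.179, 4.958, 8.442, 12.284]

med = median(advanced_cells)

# The quartile convention is an assumption; "inclusive" interpolates
# linearly between order statistics of the observed sample.
q1, _, q3 = quantiles(advanced_cells, n=4, method="inclusive")
iqr = q3 - q1
```

Other quartile conventions, such as the "exclusive" method, would give a different IQR on only twelve cells, so the convention should be stated alongside the table.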

Table 7. Task A RMSE and $R^{2}$ summary for fitted tabular baselines (mean across rolling splits). Negative $R^{2}$ values reflect the difficulty of variance-normalised scoring on short 30-day recursive forecast windows.

Model	RMSE	$R^{2}$
Random Forest	9.337	−1.156
Linear Regression	10.165	−1.679
SVR	10.443	−1.066
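The negative values in Table 7 follow from the definition $R^{2} = 1 - SS_{res}/SS_{tot}$: when the observed series varies little within a window, $SS_{tot}$ is small and even a moderate offset drives $R^{2}$ below zero. The sketch below illustrates this with made-up numbers; it is not a reproduction of the paper's evaluation code.

```python
from math import sqrt


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up low-variance window with a forecast offset by a few µg/m³.
observed = [10.0, 11.0, 10.0, 11.0]
predicted = [13.0, 13.0, 13.0, 13.0]

window_r2 = r2_score(observed, predicted)   # strongly negative
window_rmse = rmse(observed, predicted)     # still only about 2.5 µg/m³
```

The example shows why a modest RMSE and a strongly negative $R^{2}$ can coexist on short forecast windows.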

Table 8. Task B city-scale sensitivity variants on the ten-station network.

Variant	Mean MAE	Mean RMSE	Mean $R^{2}$	Median $R^{2}$
Full same-day feature set	3.921	6.655	0.698	0.741
Current-neighbour only	3.864	6.655	0.694	0.739
Lagged-only	5.706	9.152	0.458	0.473
Seasonal-only	6.766	10.071	0.282	0.424

Table 9. Top features by permutation importance, baseline entry (mean MAE increase on hold-out set).

Feature	Importance
`dayofyear_cos`	0.4316
`PM10_neighbor_mean`	0.3005
`PM2.5_neighbor_mean`	0.1821
`PM2.5_neighbor_min`	0.0561
`PM10_neighbor_min`	0.0501
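Permutation importance, as reported in Table 9, measures how much the hold-out MAE increases when one feature column is shuffled. The sketch below implements that idea in plain Python for a toy model; the function names, the number of repeats, and the data are assumptions, and the paper's own implementation may differ.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean MAE increase after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mae(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            permuted_mae = mae(y, [predict(row) for row in X_perm])
            increases.append(permuted_mae - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that only uses the first feature."""
    return 2.0 * row[0]


# Made-up hold-out set: feature 0 drives the target, feature 1 is ignored.
X_holdout = [[float(i), float((7 * i) % 5)] for i in range(20)]
y_holdout = [2.0 * row[0] for row in X_holdout]
importance = permutation_importance(toy_model, X_holdout, y_holdout)
```

A feature that the model ignores receives an importance of exactly zero, whereas shuffling an informative feature raises the MAE.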

Table 10. Top features by permutation importance, advanced entry (mean MAE increase on hold-out set).

Feature	Importance
`PM10_neighbor_mean`	0.1999
`PM2.5_neighbor_mean`	0.1664
`PM2.5_neighbor_min`	0.0899
`PM10_neighbor_min`	0.0765
`SC69599_PM2.5_max`	0.0533

Table 11. Advanced-pipeline ablation on city-scale Task B.

Configuration	Mean MAE	Mean RMSE	Mean $R^{2}$
Baseline	3.918	6.644	0.699
Advanced raw	3.886	6.599	0.704
Advanced selected	3.860	6.544	0.720
Advanced pruned	3.861	6.625	0.714

Table 12. Field deployment site metadata.

Parameter	Value
Location	South Bucharest (uncovered site)
Latitude	44.390° N
Longitude	26.118° E
Deployment dates	24–29 April 2026
Total days	6 (4 full days + 2 partial boundary days)
Sampling interval	≈5 min
Total readings	767

Table 13. Daily summary statistics of PM readings at site ℓ (raw mobile sensor, μg/m³).

Day	Weekday	PM₁ Median	PM₁ Std	PM_2.5 Median	PM_2.5 Std	PM₁₀ Median	PM₁₀ Std
1	Fri	61.0	14.6	107.0	29.0	113.0	29.9
2	Sat	52.0	<0.1	84.0	<0.1	95.0	<0.1
3	Sun	52.0	<0.1	84.0	<0.1	95.0	<0.1
4	Mon	52.0	2.8	84.0	3.6	95.0	5.1
5	Tue	5.0	18.6	7.0	33.3	8.0	35.2
6	Wed	5.0	<0.1	7.0	<0.1	8.0	<0.1
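The daily medians above are sufficient to recompute a one-day-ahead persistence predictor. The sketch below assumes that the target is the daily median, that the prediction for each day is the previous day's median, and that bias is defined as predicted minus observed; under these assumptions the MAE, bias, and Pearson r agree with the online-persistence rows of Table 14 (n = 5).

```python
from math import sqrt

# Daily medians from Table 13 (days 1..6, µg/m³).
DAILY_MEDIAN = {
    "PM1": [61.0, 52.0, 52.0, 52.0, 5.0, 5.0],
    "PM2.5": [107.0, 84.0, 84.0, 84.0, 7.0, 7.0],
    "PM10": [113.0, 95.0, 95.0, 95.0, 8.0, 8.0],
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def persistence_scores(series):
    """Score the rule 'tomorrow equals today' on a daily series."""
    predicted, observed = series[:-1], series[1:]
    errors = [p - o for p, o in zip(predicted, observed)]  # assumed sign
    return {
        "n": len(errors),
        "mae": sum(abs(e) for e in errors) / len(errors),
        "bias": sum(errors) / len(errors),
        "r": pearson_r(predicted, observed),
    }


scores = {name: persistence_scores(vals) for name, vals in DAILY_MEDIAN.items()}
```

MAE and bias coincide because every persistence error is non-negative in this window: the only large error comes from the drop between days 4 and 5.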

Table 14. Task C exploratory prediction estimates at site ℓ.

Method	Pollutant	n	Pearson r	95% CI	$p_{perm}$	MAE	Bias
Hist. RF	PM₁	6	0.663	[−0.223, 0.999]	0.094	23.995	−18.091
Online pers.	PM₁	5	0.649	[0.250, 1.000]	0.295	11.200	11.200
Hist. RF	PM_2.5	6	0.432	[−0.972, 0.997]	0.461	40.582	−27.942
Online pers.	PM_2.5	5	0.660	[0.250, 1.000]	0.295	20.000	20.000
Hist. RF	PM₁₀	6	0.351	[−0.983, 0.994]	0.637	45.946	−31.697
Online pers.	PM₁₀	5	0.651	[0.250, 1.000]	0.151	21.000	21.000

Note: Confidence intervals are descriptive bootstrap intervals for the short pilot sample and are used to contextualise the point estimates rather than as stand-alone significance claims.
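A descriptive bootstrap interval of the kind mentioned in the note can be sketched with a percentile bootstrap over paired daily values. The resampling scheme, the number of resamples, the rule for skipping degenerate resamples, and the data below are all assumptions for illustration; they are not the paper's procedure or results.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson r, or None when either sequence has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return max(-1.0, min(1.0, sxy / sqrt(sxx * syy)))  # guard rounding


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Pearson r over paired samples."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip resamples where r is undefined
            stats.append(r)
    stats.sort()
    lower = stats[int((alpha / 2) * (len(stats) - 1))]
    upper = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lower, upper


# Made-up daily predictions and observations (six days, µg/m³).
predicted = [30.0, 42.0, 38.0, 25.0, 12.0, 18.0]
observed = [35.0, 50.0, 41.0, 40.0, 10.0, 9.0]
ci_low, ci_high = bootstrap_ci(predicted, observed)
```

With five or six paired points, many resamples repeat the same days, so the interval tends to be very wide and can reach the bound of 1, which is consistent with treating such intervals as descriptive only.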

