Atmosphere
  • Article
  • Open Access

10 November 2025

A Novel Application of Choquet Integral for Multi-Model Fusion in Urban PM10 Forecasting

1 Doctoral School of Environmental Sciences, Hungarian University of Agriculture and Life Sciences, 2100 Gödöllő, Hungary
2 Laboratory of Materials, Signals, Systems and Physical Modelling, Physics Department, Faculty of Sciences, Ibn Zohr University, Agadir 80000, Morocco
3 Electrical & Computer Engineering Department, Dalhousie University, Halifax, NS B3H 4R2, Canada
4 Independent Researcher, 5038CE Tilburg, The Netherlands
This article belongs to the Special Issue Advances in Integrated Air Quality Management: Emissions, Monitoring, Modelling (4th Edition)

Abstract

Air pollution forecasting remains a critical challenge for urban public health management, with traditional approaches struggling to balance accuracy and interpretability. This study introduces a novel PM10 forecasting framework combining physics-informed feature engineering with interpretable ensemble fusion using the Choquet integral, the first application of this non-linear aggregation operator for air quality forecasting. Using hourly data from 11 monitoring stations in Budapest (2021–2023), we developed four specialized feature sets capturing distinct atmospheric processes: short-term dynamics, long-term patterns, meteorological drivers, and anomaly detection. We evaluated machine learning models including Random Forest variants (RF), Gradient Boosting (GBR), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), and Long Short-Term Memory (LSTM) architectures across six identified pollution regimes. Results revealed the critical importance of feature engineering over architectural complexity. While sophisticated models failed when trained on raw data, the KNN model with 5-dimensional anomaly features achieved exceptional performance, representing an 86.7% improvement over direct meteorological input models. Regime-specific modeling proved essential, with GBR-Regime outperforming GBR-Stable by a remarkable effect size. For ensemble fusion, we compared the novel Choquet integral approach against conventional methods (mean, median, Bayesian Model Averaging, stacking). The Choquet integral achieved near-equivalent performance to state-of-the-art stacking while providing complete mathematical interpretability through interaction coefficients. Analysis revealed predominantly redundant interactions among models, demonstrating that sophisticated fusion must prevent information over-counting rather than merely combining predictions. 
Station-specific interaction patterns showed selective synergy exploitation at complex urban locations while maintaining redundancy management at simpler sites. This work establishes that combining domain-informed feature engineering with interpretable Choquet integral aggregation can match black-box ensemble performance while maintaining the transparency essential for operational deployment and regulatory compliance in air quality management systems.

1. Introduction

Air pollution remains a critical threat to public health and economic stability worldwide. Particulate matter, greenhouse gases, and other pollutants from both natural sources (wildfires and volcanic activity) and human activities (vehicle emissions, industrial processes, and fossil fuel combustion) affect billions of people globally [,]. The health impacts span respiratory, cardiovascular, and neurological disorders, with vulnerable populations facing disproportionate risks [,,]. Beyond direct health impacts, air pollution drives climate change by altering weather patterns and global temperatures, creating feedback loops that amplify both environmental and health consequences [,,]. Economic costs encompass healthcare expenditures, productivity losses, and agricultural yield reductions from ozone and particulate damage to crops [,,]. Particulate matter with a diameter ≤10 μm (PM10) penetrates the upper respiratory system, causing immediate health impacts and serving as a regulated pollutant under EU Directive 2008/50/EC with limit values of 50 μg/m3 (24 h mean) and 40 μg/m3 (annual mean). While PM2.5 receives attention for its deeper lung penetration, PM10 remains the primary monitored fraction in many European cities including Budapest, where comprehensive PM10 datasets enable robust model development. Hungary’s PM10 levels frequently exceed EU limits during winter inversions and spring Saharan dust events, making accurate forecasting essential for public warnings [].
As urbanization and industrialization intensify air quality challenges, accurate forecasting systems are essential for informing policies and protecting vulnerable populations [,]. Early prediction approaches relied on Chemical Transport Models (CTMs) such as WRF-Chem, CHIMERE, and CMAQ [,,], which simulate pollutant transport through deterministic physical–chemical equations. However, CTMs depend on meteorological inputs from Numerical Weather Prediction (NWP) models, introducing significant errors in complex urban environments where topology and street canyons dramatically alter wind patterns and pollutant dispersion [,]. These accumulated uncertainties often degrade CTM performance, driving researchers toward more flexible data-driven approaches.
In recent years, the rapid advancement of artificial intelligence, particularly machine learning, has transformed air quality forecasting. Researchers have applied a wide range of machine learning models, including Random Forest (RF), Gradient Boosting Regressor (GBR), and Support Vector Regression (SVR), with varying success [,,]. While some studies found GBR to outperform other models [], others concluded that RF and SVR were more robust across different regions and pollutants [,]. This apparent contradiction highlights a core tenet of machine learning: there is no single ‘best’ model. Model superiority depends critically on a confluence of factors, including the dataset’s characteristics, regional meteorology, and the specific prediction horizon [,].
To overcome the limitations of any single model and leverage the complementary strengths of several, researchers began exploring ensemble fusion techniques. Early statistical methods like Bayesian Model Averaging (BMA) and Convex Weighted Averaging (CWA) offered a way to combine model outputs [,]. BMA, in particular, provides a statistically rigorous, probabilistic approach by weighting models based on their historical performance. However, these methods assume a simple linear combination of model outputs and therefore often struggled with the non-linear, interdependent relationships inherent in atmospheric data [].
The field took a significant leap with the application of deep learning models, such as Long Short-Term Memory (LSTM) networks, which have the capacity to process long-sequence data and learn complex temporal patterns, making them ideal for air quality time-series forecasting [,,]. LSTMs circumvent the gradient vanishing and explosion issues of traditional Recurrent Neural Networks (RNNs) through a sophisticated gating mechanism that manages data retention, enhancing long-term prediction accuracy. This shift culminated in the widespread use of stacking ensembles, a powerful fusion technique where the forecasts of multiple base models are fed into a meta-learner (often another machine learning model) that learns the optimal way to combine them [,,]. This approach has shown remarkable accuracy, consistently outperforming single models by learning the unique biases and strengths of each base learner.
Despite achieving state-of-the-art performance, deep learning fusion methods suffer from the “black box” problem, providing accurate predictions without revealing how models are combined or interact [,]. This opacity limits operational adoption where understanding model behavior is essential for trust and debugging. To address this gap, this paper applies the Choquet integral, an aggregation operator that explicitly reveals model interactions through mathematical coefficients distinguishing synergistic from redundant relationships [].
This study introduces, for the first time in air quality modeling, the Choquet integral as an interpretable aggregation method for PM10 forecasting in Budapest, Hungary. Using hourly data from 11 monitoring stations (2021–2023), we develop specialized feature sets capturing distinct atmospheric processes, train 11 machine learning models targeting different pollution regimes, and apply Choquet integral fusion to combine predictions while revealing model interactions through Möbius coefficients. Unlike black-box stacking, this approach provides transparency in how models are weighted and which pairs exhibit synergy versus redundancy, though individual model decisions remain opaque. We demonstrate that this “fusion-level interpretability” achieves comparable performance to fully opaque methods while enabling model debugging, trust calibration, and regulatory compliance essential for operational deployment.

2. Materials and Methods

2.1. PM10 Data Assessment

Hourly PM10 concentrations and meteorological data were collected from 11 monitoring stations in Budapest, Hungary (47.5° N, 19.0° E, population 1.75 million) for the years 2021–2023. The city’s continental climate, winter inversions, and complex topography create PM10 annual means of 25–35 μg/m3 with frequent exceedances, representing typical Central European urban conditions. Station data was collected from the public database of the Hungarian meteorological services [] and was merged with meteorological observations using temporal alignment with a 30 min tolerance window. Outlier detection employed a three-stage robust filtering pipeline consisting of physical range clipping based on domain constraints, rolling median absolute deviation (MAD) despiking with threshold k = 6, and robust z-score filtering at k = 7 using MAD-based standardization. These thresholds are intentionally conservative to distinguish sensor malfunctions from genuine pollution episodes.
  • k = 6 for MAD filtering: Assuming a Gaussian distribution, this corresponds to ~4.5σ, capturing only the top 0.0003% of values. Legitimate extreme events (Saharan dust, winter inversions) typically reach 2–3σ and are preserved.
  • k = 7 for z-score: Even more conservative, removing only values deviating by 7 MAD-based standard deviations from the rolling median.
Gaps created by outlier removal were interpolated with a maximum window of 6 h to maintain temporal continuity while avoiding artificial smoothing of genuine data gaps. These steps were performed using Python 3.9.7.
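The filtering and gap-filling steps above can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' pipeline: the function names, the trailing 24 h window, the 0–1000 μg/m3 physical range, and the linear interpolation are assumptions, and the third stage (robust z-score at k = 7) is omitted for brevity.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of a sequence of floats."""
    m = median(values)
    return median(abs(v - m) for v in values)

def despike(series, window=24, k=6.0, lo=0.0, hi=1000.0):
    """Set out-of-range values and MAD spikes to None (window, lo, hi are assumed)."""
    # Stage 1: physical range clipping.
    clipped = [v if v is not None and lo <= v <= hi else None for v in series]
    out = list(clipped)
    # Stage 2: compare each value with the median of the preceding window.
    for t, v in enumerate(clipped):
        if v is None:
            continue
        past = [u for u in clipped[max(0, t - window):t] if u is not None]
        if len(past) < 3:
            continue  # too little history to judge
        med, spread = median(past), mad(past)
        if spread > 0 and abs(v - med) > k * spread:
            out[t] = None
    return out

def interpolate_short_gaps(series, max_gap=6):
    """Linearly fill interior runs of None no longer than max_gap samples."""
    out = list(series)
    n, t = len(out), 0
    while t < n:
        if out[t] is not None:
            t += 1
            continue
        start = t
        while t < n and out[t] is None:
            t += 1
        gap = t - start
        if start > 0 and t < n and gap <= max_gap:
            left, right = out[start - 1], out[t]
            for j in range(gap):
                out[start + j] = left + (right - left) * (j + 1) / (gap + 1)
    return out
```

Gaps longer than `max_gap` and gaps touching either end of the series are left as missing, which mirrors the stated intent of not smoothing over genuine data gaps.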
Table 1 shows the 11 monitoring stations, their type and the percentage of missing values in the period studied.
Table 1. Quality Assessment of PM10 data in Budapest.

2.2. Study Design

The study follows a seven-stage process as illustrated in Figure 1:
Figure 1. Process flow of the study design.
Stage 1: Data Partitioning—The dataset was split temporally (70% training, 15% validation, 15% testing) to preserve chronological order and prevent future data leakage. No random shuffling was performed, ensuring models are trained on past data and evaluated on genuinely unseen future conditions.
Stage 2: Feature Engineering (Section 2.2.1.)—From raw measurements, 47 predictor variables (features) were constructed to serve as model inputs. These features comprise three categories:
  • PM10-based features: Historical PM10 values at various lags (t−1 to t−24), rolling statistics (24 h and 168 h means and variances), and short-term changes (3 h and 6 h deltas)
  • Meteorological features: Temperature, wind speed, wind direction, relative humidity, atmospheric pressure, boundary layer indicators, and derived stability indices
  • Temporal features: Cyclical encodings of hour, day of week, month, and binary indicators for weekends and holidays
All 47 features are extracted at each time point and constitute the complete feature pool available for subsequent model training and prediction.
Stage 3: Regime Classification (Section 2.2.2.). At each time point t in the training data, the atmospheric state was classified into one of six regimes using specific historical indicators from the feature set which are detailed in Section 2.2.2.
A deterministic priority hierarchy (extreme > rising/falling > rush/nocturnal > stable) resolves overlapping regime assignments, ensuring each time point receives exactly one regime label. All regime indicators utilize exclusively data from time t−1 and earlier.
Stage 4: Regime-Specific Feature Selection—Each regime is paired with features that capture its dominant physical processes. This regime-feature allocation follows atmospheric physics:
  • Extreme regime: Meteorological features (temperature, wind speed, pressure, humidity) + PM10(t−1), as extreme pollution episodes are primarily weather-driven
  • Rising/Falling regimes: Short-term PM10 features (lags t−1 to t−3, 3 h and 6 h changes), capturing momentum dynamics
  • Stable regime: Long-term PM10 features (24 h and 168 h rolling statistics, daily and weekly patterns), as stable conditions follow predictable temporal cycles
  • Rush/Nocturnal regimes: Traffic indicators + temporal encodings + boundary layer features, reflecting diurnal emission patterns
This feature allocation strategy enables each model to focus on the most relevant predictors for its associated atmospheric conditions.
Stage 5: Model Training (Section 2.3). Five machine learning architectures (Random Forest, Gradient Boosting Regressor, Support Vector Machine, K-Nearest Neighbors, Long Short-Term Memory networks) were implemented per station using scikit-learn 1.7.2 and TensorFlow in Python []. Hyperparameters (learning rates, tree depths, neighbor counts, etc.) were optimized separately for each model configuration using the validation set (15% of data) to prevent overfitting.
Stage 6: Forecast Generation. Each model generates a 1 h ahead forecast, PM10(t+1), in real time. The 11 models were built separately for each of the 11 monitoring stations, resulting in 121 models in total.
Stage 7: Model Fusion. The individual model forecasts were fused using three methods (BMA, a stacking meta-learner, and the Choquet integral).
The forecasting process maintains strict temporal causality. Regime classification at time t depends solely on features computed from observations at t−1 and earlier. The regime determines which model and features to use for forecasting, but the regime itself is determined by historical conditions, never by the future value of PM10(t+1) being predicted. This eliminates data leakage and ensures operational validity.
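As a small illustration of the chronological partitioning in Stage 1, the sketch below splits row indices into contiguous training, validation, and test blocks without shuffling. The function name and the use of integer percentages are assumptions made for this example.

```python
def temporal_split(n_rows, train_pct=70, val_pct=15):
    """Return index ranges for a chronological train/validation/test split.

    Rows are assumed to be sorted by timestamp; no shuffling is performed,
    so every validation and test row lies strictly after the training rows.
    """
    train_end = n_rows * train_pct // 100
    val_end = n_rows * (train_pct + val_pct) // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_rows)
```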

2.2.1. Feature Engineering

To capture the multi-scale temporal dynamics and heterogeneous physical processes governing PM10 concentrations, we designed a feature engineering strategy that constructs four complementary feature sets. Each set was developed to emphasize distinct aspects of pollution behavior, promoting model specialization and ensuring diverse error structures suitable for ensemble fusion.
Short-Term Dynamics Features
This feature set comprised 11 variables targeting immediate temporal dependencies and rapid transitions characteristic of traffic-induced variations and short-term meteorological impacts. The set included PM10 lags at t−1, t−2, and t−3 h, with temporal differences computed as:
$$\Delta_1(t) = y_{t-1} - y_{t-2}, \qquad \Delta_2(t) = y_{t-2} - y_{t-3}$$
Wind components were encoded to preserve directional continuity:
$$WD_{\sin}(t) = \sin\big(\theta_{WD}(t)\big), \qquad WD_{\cos}(t) = \cos\big(\theta_{WD}(t)\big)$$
where $\theta_{WD}$ is wind direction in radians. Hourly cyclical patterns were captured through:
$$h_{\sin}(t) = \sin\left(\frac{2\pi \cdot \mathrm{hour}(t)}{24}\right), \qquad h_{\cos}(t) = \cos\left(\frac{2\pi \cdot \mathrm{hour}(t)}{24}\right)$$
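A minimal sketch of these short-term features for a single forecast time is given below. The function names, the dictionary keys, and the assumption that wind direction arrives in degrees are illustrative choices, not details taken from the study.

```python
import math

def cyclical(value, period):
    """Encode a periodic quantity as a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def short_term_features(pm10, t, hour, wind_dir_deg):
    """Features for forecasting at index t (requires t >= 3).

    Only pm10[t-1], pm10[t-2] and pm10[t-3] are read, so the current
    observation never leaks into the inputs.
    """
    lag1, lag2, lag3 = pm10[t - 1], pm10[t - 2], pm10[t - 3]
    h_sin, h_cos = cyclical(hour, 24)
    theta = math.radians(wind_dir_deg)  # assumed input unit: degrees
    return {
        "lag1": lag1, "lag2": lag2, "lag3": lag3,
        "delta1": lag1 - lag2,   # y(t-1) - y(t-2)
        "delta2": lag2 - lag3,   # y(t-2) - y(t-3)
        "wd_sin": math.sin(theta), "wd_cos": math.cos(theta),
        "h_sin": h_sin, "h_cos": h_cos,
    }
```

The sine/cosine pair keeps hour 23 as close to hour 0 as hour 0 is to hour 1, which a raw integer hour would not.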
Long-Term Pattern Features
This set incorporated variables operating at extended temporal scales to detect weekly cycles, seasonal trends, and persistent atmospheric patterns. It included PM10 lags at 24, 48, and 168 h, along with rolling statistics computed using past-only windows:
$$\bar{y}_w(t) = \frac{1}{w}\sum_{i=1}^{w} y_{t-i}, \qquad \sigma_w(t) = \sqrt{\frac{1}{w}\sum_{i=1}^{w}\big(y_{t-i} - \bar{y}_w(t)\big)^2}$$
where $y$ represents PM10 concentration (μg/m3) at each time point, $\bar{y}_w$ (also referred to as Rolling_mean in Table 2) is the rolling mean of PM10 over the past $w$ hours, and $\sigma_w$ (also referred to as Rolling_std) is the rolling standard deviation, with $w \in \{24, 168\}$ hours and minimum observations $n_{min} = \max(3, w/3)$.
Table 2. Feature set composition and characteristics.
Annual seasonality was encoded as:
$$m_{\sin}(t) = \sin\left(\frac{2\pi \cdot \mathrm{month}(t)}{12}\right), \qquad m_{\cos}(t) = \cos\left(\frac{2\pi \cdot \mathrm{month}(t)}{12}\right)$$
Baseline meteorological variables (temperature, relative humidity, global radiation) were included to capture seasonal atmospheric conditions.
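The past-only rolling statistics can be illustrated with the short sketch below. It is an assumption-laden example rather than the study's code: the function name is hypothetical, missing values are represented as None, and the mean and standard deviation are taken over the available values in the window rather than a fixed w.

```python
import math

def rolling_past_stats(y, t, w):
    """Rolling mean and population std over y[t-w .. t-1], excluding y[t].

    Returns (None, None) when fewer than n_min = max(3, w // 3) values
    are available in the window.
    """
    n_min = max(3, w // 3)
    window = [v for v in y[max(0, t - w):t] if v is not None]
    if len(window) < n_min:
        return None, None
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    return mean, math.sqrt(var)
```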
Meteorological Driver Features
This set emphasized atmospheric dispersion mechanisms through contemporaneous and lagged (6 h, 12 h) meteorological variables. An experimental indicator was defined as:
$$P_{proxy}(t) = \frac{T(t)}{RH(t) + \varepsilon}$$
where $T$ is temperature (°C), $RH$ is relative humidity (%), and $\varepsilon = 10^{-6}$. While this ratio does not directly measure atmospheric pressure or stability, it was included as an exploratory feature to capture potential relationships between temperature-humidity conditions and mixing characteristics. The indicator remains positive and continuous under Budapest’s typical meteorological conditions during pollution episodes. Decomposed wind vectors and lagged meteorological features accounted for the delayed impact of atmospheric conditions on pollutant accumulation.
Anomaly Detection Features
To enhance model robustness during extreme events, this set quantified deviations from expected patterns. Standardized z-scores were computed as:
$$z(t) = \frac{y_t - \mu_{exp}(t-1)}{\sigma_{exp}(t-1)}$$
where $\mu_{exp}$ and $\sigma_{exp}$ are the expanding mean and standard deviation computed from all historical values up to t−1.
Deviations from periodic patterns were calculated as:
$$\delta_{daily}(t) = y_t - \bar{y}_h(t), \qquad \delta_{weekly}(t) = y_t - \bar{y}_{168}(t-1)$$
where y represents PM10 concentration at each time point.
Binary indicators flagged unusual conditions including nocturnal low-wind events (WS < 2 m/s, 00:00–06:00) and high-temperature-low-wind combinations exceeding the 90th percentile. To prevent data leakage, all temporal features were computed using strict historical information with shift(1) operations excluding current observations. Table 2 summarizes the composition and characteristics of the four feature sets, demonstrating how each targets specific physical processes: traffic-induced immediate dispersion, weekly cycles and seasonal trends, boundary layer dynamics, and extreme events.
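The expanding z-score can be computed in a single pass, as in the sketch below. It is only an illustration: the function name is hypothetical, the population standard deviation is assumed, and the running statistics are updated with Welford's method after each value has been scored, so that the mean and deviation at step t reflect values up to t−1 only.

```python
import math

def expanding_zscores(y):
    """z[t] = (y[t] - mean(y[:t])) / std(y[:t]), or None with < 2 past values."""
    z = [None] * len(y)
    n, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    for t, value in enumerate(y):
        if n >= 2:
            std = math.sqrt(m2 / n)
            if std > 0:
                z[t] = (value - mean) / std
        # Update the running statistics only after scoring y[t].
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
    return z
```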

2.2.2. Regime Identification

To enable conditional model specialization, we identified six distinct pollution regimes based on concentration variability and temporal patterns. Regime boundaries were established using training set quantiles applied to past-only signals:
$$R_{stable}(t) = \mathbb{1}\big[\sigma_6(t-1) < Q_{30}(\sigma_6^{train})\big]$$
$$R_{rising}(t) = \mathbb{1}\big[\Delta_3(t-1) > Q_{75}(\Delta_3^{train})\big]$$
$$R_{falling}(t) = \mathbb{1}\big[\Delta_3(t-1) < Q_{25}(\Delta_3^{train})\big]$$
$$R_{extreme}(t) = \mathbb{1}\big[y_{t-1} > Q_{90}(y^{train})\big]$$
where $\Delta_3(t) = y_t - y_{t-3}$, $\sigma_6$ is the 6 h rolling standard deviation, and $\mathbb{1}[\cdot]$ is the indicator function.
Temporal regimes captured diurnal patterns:
$$R_{rush}(t) = \mathbb{1}\big[\mathrm{hour}(t) \in \{7, 8, 9\}\big]$$
$$R_{nocturnal}(t) = \mathbb{1}\big[\mathrm{hour}(t) \in \{0, \ldots, 5\} \wedge WS(t-1) < Q_{40}(WS^{train})\big]$$
Since atmospheric conditions can satisfy multiple regime criteria simultaneously, we implement a hierarchical priority system to ensure deterministic regime assignment. The priority order proceeds from highest to lowest as follows: extreme conditions take precedence over all others when $R_{extreme}(t) = 1$, followed by transitional regimes when either $R_{rising}(t) = 1$ or $R_{falling}(t) = 1$, then temporal regimes when $R_{rush}(t) = 1$ or $R_{nocturnal}(t) = 1$, and finally stable conditions with $R_{stable}(t) = 1$ serving as the default classification. For example, during a morning rush hour (07:00–09:00) with PM10(t−1) > Q90, both $R_{rush}$ and $R_{extreme}$ equal 1. The system assigns the extreme regime due to its higher priority. When no specific conditions are met (~15% of hours), the stable regime serves as a default, using conservative model parameters suitable for steady-state conditions. This hierarchical approach ensures exactly one regime is active per prediction time, preventing ambiguity in model selection while prioritizing the most critical atmospheric conditions for accurate PM10 forecasting.
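The priority rule can be sketched as a first-match lookup, shown below. The function name, argument names, and the dictionary keys holding the training-set quantiles are hypothetical. Because the stable regime is both the lowest priority and the default, its own criterion does not change the outcome and is left out of the sketch.

```python
# Highest priority first; "stable" is the fallback.
PRIORITY = ("extreme", "rising", "falling", "rush", "nocturnal")

def assign_regime(y_prev, delta3_prev, hour, ws_prev, q):
    """Return exactly one regime label from past-only signals.

    q holds training-set quantiles under assumed keys:
    y_q90, d3_q75, d3_q25, ws_q40.
    """
    flags = {
        "extreme": y_prev > q["y_q90"],
        "rising": delta3_prev > q["d3_q75"],
        "falling": delta3_prev < q["d3_q25"],
        "rush": hour in (7, 8, 9),
        "nocturnal": 0 <= hour <= 5 and ws_prev < q["ws_q40"],
    }
    for name in PRIORITY:
        if flags[name]:
            return name
    return "stable"
```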
Each regime was paired with feature sets aligned to its dominant physical processes: short-term features for rapid transitions (rising, falling, morning rush), long-term features for stable conditions, meteorological features for dispersion-dominated periods, and anomaly features for extreme events. This regime-based approach acknowledges the non-stationary nature of PM10 dynamics, enabling models to develop specialized expertise for specific pollution behaviors.
The multi-feature, multi-regime framework ensures that individual models capture distinct aspects of PM10 dynamics, creating complementary prediction patterns amenable to various fusion strategies including weighted averaging, stacking, Bayesian model averaging, and Choquet integral aggregation. The diversity in feature representations and regime specialization promotes ensemble robustness across varying atmospheric conditions and pollution scenarios.

2.3. Machine Learning Models

2.3.1. Random Forest Regressor (RF)

Random Forest models [] were configured with 400 trees and adaptive maximum depth based on regime specialization. For high-pollution and transition regimes, no depth constraint was imposed to capture complex non-linear relationships, while stable conditions employed depth limitation (max_depth = 10) to prevent overfitting during low-variability periods. Minimum samples per leaf varied between 2 for transition detection and 5 for stable conditions. The bootstrap aggregation mechanism proved particularly effective for PM10 prediction, with out-of-bag error estimates indicating optimal forest size at 400 trees (convergence achieved at 350 trees with <1% improvement thereafter).
Asymmetric loss variants were implemented using sample weighting:
$$w_i = \exp\big((y_i - \mu_y)/\sigma_y\big) \quad \text{for underestimation-averse models}$$
$$w_i = \exp\big(-(y_i - \mu_y)/\sigma_y\big) \quad \text{for overestimation-averse variants}$$
where $\mu_y$ and $\sigma_y$ represent the training set’s mean and standard deviation.
This weighting scheme penalizes prediction errors asymmetrically, with underpredict-averse models assigning exponentially higher weights to high-concentration samples. Node splitting utilized Gini impurity with bootstrap sampling, while out-of-bag error estimation provided internal validation without requiring a separate validation set. Feature importance was calculated through mean decrease in impurity, weighted by the probability of reaching each node and the number of samples affected.
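The exponential sample weights can be computed as in the sketch below. The function name is hypothetical and the population standard deviation is an assumption. In scikit-learn, such a list could be supplied through the `sample_weight` argument of a regressor's `fit` method.

```python
import math
from statistics import mean, pstdev

def asymmetric_weights(y_train, underestimation_averse=True):
    """Exponential sample weights based on the standardized target value.

    Underestimation-averse: high-concentration samples get larger weights.
    Overestimation-averse: the sign flips, so low-concentration samples do.
    """
    mu, sigma = mean(y_train), pstdev(y_train)
    sign = 1.0 if underestimation_averse else -1.0
    return [math.exp(sign * (y - mu) / sigma) for y in y_train]
```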

2.3.2. Gradient Boosting Regressor (GBR)

Two distinct Gradient Boosting configurations [] were implemented targeting different temporal dynamics. For stable conditions, we employed a conservative architecture with 500 trees, learning rate η = 0.03, maximum depth of 3, and subsample ratio of 0.9. This configuration prioritizes gradual refinement over aggressive fitting, suitable for capturing smooth temporal transitions. The loss function minimization follows:
$$F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)$$
where h m represents the m-th weak learner fitted to the negative gradient of the loss function. For regime-specific models, we utilized a more aggressive configuration with 300 trees, η = 0.05, and a maximum depth of 4, enabling faster adaptation to regime-specific patterns.
Robustness was enhanced through the Huber loss, which transitions from squared to linear loss when the absolute residual $|y_i - \hat{y}_i|$ exceeds 1.35σ. Feature subsampling (0.8) at each split introduced stochasticity to reduce overfitting. Early stopping with a patience of 50 iterations prevented overspecialization to training data, triggered when validation loss failed to improve by more than 10⁻⁴.
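For reference, the standard Huber loss for a single residual is sketched below, with the transition point passed in as `delta` (for example 1.35 times a scale estimate). This is the textbook definition, not code from the study.

```python
def huber_loss(residual, delta):
    """Quadratic for |residual| <= delta, linear beyond it.

    The two branches agree in value at |residual| == delta.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

Beyond the threshold the penalty grows linearly, so a single extreme residual influences the fit far less than it would under squared loss.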

2.3.3. Support Vector Machine (SVM)

Support Vector Regression [] with radial basis function (RBF) kernels was deployed for meteorological feature sets, leveraging its effectiveness in high-dimensional spaces with complex non-linear relationships. The optimization problem was formulated as:
$$\min_{w,\, b,\, \xi,\, \xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$$
Subject to:
$$y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i$$
$$\langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^*$$
where $\phi$ maps the input to a high-dimensional feature space via the RBF kernel $k(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. The regularization parameter C = 10.0 balanced model complexity with training error, while $\gamma = 1/(d \cdot \mathrm{Var}(X))$, with $d$ the number of input features, adapted to feature scaling. The ε-insensitive tube width was set to 0.1, permitting small prediction deviations without penalty. Feature standardization preceded kernel computation to ensure equal contribution across meteorological variables with different units.
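The kernel and the data-dependent kernel width can be written out in a few lines. This sketch assumes that Var(X) is the variance over all entries of the feature matrix, which is how scikit-learn's `gamma="scale"` option is defined. The function names are illustrative.

```python
import math
from statistics import pvariance

def gamma_scale(X):
    """gamma = 1 / (d * Var(X)), with Var taken over all matrix entries."""
    d = len(X[0])
    flat = [v for row in X for v in row]
    return 1.0 / (d * pvariance(flat))

def rbf_kernel(x, x_prime, gamma):
    """k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)
```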

2.3.4. K-Nearest Neighbors (KNN)

KNN regression [] with k = 15 neighbors and distance-weighted voting was employed for anomaly feature sets, exploiting local similarity in unusual conditions. The prediction followed:
$$\hat{y}(x) = \frac{\sum_{i \in N_k(x)} w_i \cdot y_i}{\sum_{i \in N_k(x)} w_i}$$
where $w_i = 1/d(x, x_i)$ and $N_k(x)$ represents the k-nearest neighbors of x.
Distance calculations used Minkowski metric with p = 2 (Euclidean), after robust scaling to handle outliers. The relatively large k value provided smoothing over local noise while maintaining responsiveness to anomaly patterns. Leaf size of 30 optimized tree construction for the Ball Tree algorithm, balancing query speed with construction time.
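A brute-force version of the distance-weighted prediction is sketched below. It is illustrative only: it does not use a Ball Tree, it assumes the inputs are already scaled, and it handles zero distances by averaging the exactly matching targets, which is one common convention.

```python
import math

def knn_predict(X_train, y_train, x, k=15):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))[:k]
    # An exact match would give an infinite weight, so return it directly.
    exact = [yi for d, yi in nearest if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in nearest]
    weighted_sum = sum(w * yi for w, (_, yi) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```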

2.3.5. Long Short-Term Memory (LSTM)

Multiple LSTM architectures were developed to capture temporal dependencies at varying scales. The core LSTM cell computations followed []:
$$f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$
where $w_f$, $w_i$, and $w_C$ are weight matrices; $b_f$, $b_i$, and $b_C$ are bias terms; $\sigma$ is the sigmoid function; $\odot$ denotes element-wise multiplication; and $o_t$ is the output gate, computed analogously to $f_t$ and $i_t$ with its own weights and bias. The forget gate $f_t$ determines how much of the previous cell state is retained ($f_t \odot C_{t-1}$), the input gate selects the useful new information from the current step ($i_t \odot \tilde{C}_t$), and their sum gives the updated cell state $C_t$. The hidden state $h_t$ is then passed on to the next time step and to the next layer.
Architecture configurations were specialized for different temporal patterns:
  • Short transitions: lookback = 12 h, 64 LSTM units, learning rate = 2 × 10⁻³
  • Long patterns: lookback = 168 h, 128 LSTM units, learning rate = 5 × 10⁻⁴
  • Multivariate: lookback = 24 h, 96 LSTM units, features = [PM10, T, RH, WS]
  • Balanced baseline: lookback = 24 h, 96 LSTM units, learning rate = 1 × 10⁻³
Each architecture incorporated dropout (p = 0.2) after the LSTM layer for regularization, followed by a dense layer with 32 ReLU-activated units. For asymmetric variants, we implemented custom loss functions:
$$L_{asym}(y, \hat{y}, \lambda_u, \lambda_o) = \lambda_u \,(y - \hat{y})_+^2 + \lambda_o \,(y - \hat{y})_-^2$$
where $(z)_+ = \max(z, 0)$ and $(z)_- = \max(-z, 0)$.
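In plain Python, the asymmetric loss can be sketched as below. Averaging over the batch and the argument names `lam_under` and `lam_over` are assumptions for this example. A deep-learning framework would express the same arithmetic with tensor operations.

```python
def asymmetric_squared_loss(y_true, y_pred, lam_under, lam_over):
    """Mean asymmetric squared error over a batch.

    Under-prediction (y > y_hat) is scaled by lam_under,
    over-prediction (y < y_hat) by lam_over.
    """
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        err = y - y_hat
        under = max(err, 0.0)    # (z)+
        over = max(-err, 0.0)    # (z)-
        total += lam_under * under ** 2 + lam_over * over ** 2
    return total / len(y_true)
```

Setting `lam_under` greater than `lam_over` makes missed pollution peaks costlier than false alarms. Equal values recover the ordinary mean squared error scaled by that constant.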
Training employed Adam optimization with early stopping (patience = 5 epochs) and learning rate reduction (factor = 0.5, patience = 3) based on validation loss. Input sequences were standardized using training set statistics, with separate scalers for features and targets to preserve scale relationships. Sequence generation used sliding windows with single-step advancement, ensuring maximum temporal coverage while maintaining causal consistency. Validation split of 10% or a minimum of 50 samples prevented overfitting while ensuring sufficient training data.
The ensemble of specialized LSTM variants captured complementary temporal patterns: short-transition models excelled at sudden changes, long-pattern variants identified weekly cycles, while multivariate configurations leveraged cross-variable dependencies during complex atmospheric conditions.
Table 3 summarizes the final configurations and hyperparameters for all 11 models.
Table 3. Feature set composition and models configuration.

2.4. Fusion Techniques

The heterogeneous nature of our expert models, spanning tree-based algorithms, neural networks, and kernel methods with diverse feature specializations, necessitates sophisticated fusion strategies to optimally combine their predictions. While individual models capture specific aspects of PM10 dynamics, their complementary strengths and varying error patterns suggest potential for improved performance through ensemble aggregation. To comprehensively evaluate the proposed Choquet integral fusion approach, we implemented a spectrum of aggregation methods representing current best practices in ensemble learning to forecast PM10 concentration 1 h ahead, suitable for real-time warning systems requiring hourly updates. At each timestamp, all 11 expert models run in parallel, producing predictions that are aggregated through learned fusion weights; no regime-based hard-gating or pre-selection occurs. Our fusion framework encompasses four categories of increasing complexity:
(i) Baseline aggregators (mean, median) that require no training but provide robust performance benchmarks.
(ii) Linear combination methods including Bayesian Model Averaging (BMA), which has demonstrated success in meteorological applications [].
(iii) Stacking with meta-learning, which has achieved state-of-the-art performance in recent air quality studies [].
(iv) Choquet integral, a fuzzy-measure-based aggregator that uniquely captures both importance weights and interaction effects between models. (The term “fuzzy measure” is standard mathematical nomenclature for non-additive measures and should not be confused with fuzzy set theory.)
The selection of comparison methods was motivated by their proven effectiveness in environmental prediction tasks. Stacking has shown 15–25% improvement over individual models in PM2.5 forecasting [], while BMA provides probabilistically principled weight assignment with demonstrated robustness to model uncertainty. These methods, however, assume either linear relationships (BMA) or learn purely empirical combinations (stacking) without explicitly modeling inter-model dependencies. The Choquet integral addresses this limitation by incorporating a mathematical framework for synergy and redundancy, potentially offering superior performance when expert models exhibit complex interaction patterns. All fusion methods were evaluated under identical conditions: 15% of data for calibration, consistent expert model pools, and standardized preprocessing. This controlled comparison enables rigorous assessment of each method’s ability to exploit the complementary information encoded across our specialized expert models.

2.4.1. Baseline Aggregation Methods

Simple aggregation methods provided performance benchmarks. The arithmetic mean aggregator computed:
$$\hat{y}_{mean}(t) = \frac{1}{M}\sum_{m=1}^{M} \hat{y}_m(t)$$
where M denotes the number of valid predictions at time t. The median aggregator provided robust central tendency estimation resistant to outlier predictions. Both methods required no training and served as lower bounds for fusion performance.

2.4.2. Bayesian Model Averaging (BMA)

BMA weights expert predictions based on their posterior probability given the calibration data. The BIC-based weights were computed as:
$$w_k = \frac{\exp\left(-\tfrac{1}{2} BIC_k\right)}{\sum_{j=1}^{M} \exp\left(-\tfrac{1}{2} BIC_j\right)}$$
where the Bayesian Information Criterion for model k is:
$$BIC_k = -2 \ln L_k + p_k \ln(n)$$
with $L_k$ representing the likelihood under a Gaussian residuals assumption, $p_k = 1$ (single parameter per model), and $n$ the calibration sample size. The fused prediction follows:
$$\hat{y}_{BMA}(t) = \sum_{k=1}^{M} w_k \, \hat{y}_k(t)$$
This approach naturally penalizes model complexity while rewarding predictive accuracy, providing probabilistically principled weight assignment.
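A minimal sketch of the BIC-based weighting follows. It assumes that the Gaussian likelihood is evaluated at the maximum-likelihood residual variance of each expert; the function names and the synthetic data are illustrative only.

```python
import numpy as np

def bic_gaussian(y, y_hat, p=1):
    """BIC = -2 ln L + p ln(n) with a Gaussian likelihood at the ML residual variance."""
    resid = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    n = resid.size
    sigma2 = np.mean(resid ** 2)  # assumed > 0
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return -2.0 * log_lik + p * np.log(n)

def bma_weights(y, expert_preds):
    """Normalized weights exp(-BIC_k / 2) over the columns of expert_preds."""
    bics = np.array([bic_gaussian(y, expert_preds[:, k])
                     for k in range(expert_preds.shape[1])])
    # Subtracting the minimum BIC leaves the weights unchanged but avoids underflow.
    w = np.exp(-0.5 * (bics - bics.min()))
    return w / w.sum()

# Synthetic calibration data: three hypothetical experts with increasing noise.
rng = np.random.default_rng(0)
y = 20.0 + 5.0 * rng.standard_normal(200)
experts = np.column_stack([y + s * rng.standard_normal(200) for s in (1.0, 2.0, 4.0)])

weights = bma_weights(y, experts)
y_bma = experts @ weights  # fused prediction
```

With a few hundred calibration samples the BIC differences become large, so the weights tend to concentrate on the single best expert; this is a known property of BIC weighting rather than a finding of this study.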

2.4.3. Stacking Ensemble with Meta-Learning

Stacking employed a two-level architecture where a meta-learner combined base model predictions. Using 5-fold time series cross-validation, we generated out-of-fold predictions to train the meta-model while avoiding data leakage:
$$\hat{y}_{\mathrm{stack}}(t) = f_{\mathrm{meta}}\left(\hat{y}_1(t), \hat{y}_2(t), \ldots, \hat{y}_M(t)\right)$$
Three station-specific meta-learners were evaluated: Ridge regression with cross-validated α ∈ {0.01, 0.1, 1.0, 10.0}, Elastic Net with α = 0.01 and l1-ratio = 0.5, and Gradient Boosting with 100 trees and maximum depth of 3. Gradient Boosting was selected as the meta-learner architecture based on validation performance. Each station has its own meta-learner trained on local base model predictions, allowing station-specific weighting patterns to emerge. Robust scaling preceded meta-learning to handle heterogeneous prediction scales.
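The out-of-fold mechanism can be illustrated with a NumPy-only sketch. Several simplifications are assumed here and are not the study’s setup: the base models are plain least-squares fits on hypothetical feature subsets, the meta-learner is a closed-form ridge regression standing in for the selected Gradient Boosting meta-learner, and the splits are simple expanding windows.

```python
import numpy as np

def time_series_folds(n, n_splits=5):
    """Expanding-window splits: every test block lies after its training data."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_end = n if k == n_splits else (k + 1) * fold
        yield np.arange(0, train_end), np.arange(train_end, test_end)

def fit_linear(X, y, alpha=0.0):
    """Closed-form ridge with an unpenalized intercept (alpha=0 is least squares)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    reg = alpha * np.eye(Xb.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ y)

def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]

def out_of_fold_predictions(X, y, feature_sets, n_splits=5):
    """Each base model predicts only on blocks it has not been trained on."""
    oof = np.full((len(y), len(feature_sets)), np.nan)
    for train_idx, test_idx in time_series_folds(len(y), n_splits):
        for m, cols in enumerate(feature_sets):
            coef = fit_linear(X[train_idx][:, cols], y[train_idx])
            oof[test_idx, m] = predict_linear(coef, X[test_idx][:, cols])
    return oof

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data: two hypothetical base models that each see part of the signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 3))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(600)

oof = out_of_fold_predictions(X, y, feature_sets=[[0], [1, 2]])
valid = ~np.isnan(oof).any(axis=1)  # the first block never receives predictions
meta_coef = fit_linear(oof[valid], y[valid], alpha=1.0)
stacked = predict_linear(meta_coef, oof[valid])  # in-sample fit, for illustration only
```

Because the meta-learner is trained only on predictions made for unseen blocks, it cannot reward a base model for memorizing its own training data, which is the leakage the cross-validation scheme is meant to avoid.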

2.4.4. Choquet Integral Fusion

The Choquet integral provides a powerful non-linear aggregation framework that captures both individual model importance and their interactions. Unlike traditional weighted averaging, it models complementarity and redundancy between experts through a fuzzy measure. However, we emphasize that this does not explain the internal decision-making of individual models, which remains opaque.
For a set of M expert models N = {1, 2, …, M}, the Choquet integral with respect to fuzzy measure μ is defined as:
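$$C_\mu(x) = \sum_{i=1}^{M} x_{(i)} \left[\mu\left(A_{(i)}\right) - \mu\left(A_{(i+1)}\right)\right]$$
where $(\cdot)$ indicates a permutation such that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(M)}$, $A_{(i)} = \{(i), (i+1), \ldots, (M)\}$, and $A_{(M+1)} = \emptyset$.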
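The definition above can be checked with a small pure-Python sketch. The dictionary representation of the fuzzy measure (keys are `frozenset` objects of expert indices) and the helper names are assumptions made for illustration.

```python
from itertools import combinations

def choquet_integral(x, mu):
    """Discrete Choquet integral of x with respect to the fuzzy measure mu.

    mu maps frozensets of expert indices to values, with mu[frozenset()] == 0.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # ascending: x_(1) <= ... <= x_(M)
    total = 0.0
    for pos, expert in enumerate(order):
        upper = frozenset(order[pos:])           # A_(i): experts ranked i or higher
        upper_next = frozenset(order[pos + 1:])  # A_(i+1)
        total += x[expert] * (mu[upper] - mu[upper_next])
    return total

def additive_measure(weights):
    """An additive measure; the Choquet integral then reduces to a weighted mean."""
    n = len(weights)
    mu = {frozenset(): 0.0}
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            mu[frozenset(subset)] = sum(weights[i] for i in subset)
    return mu

x = [10.0, 30.0, 20.0]
subsets = list(additive_measure([1.0, 1.0, 1.0]))

weighted = choquet_integral(x, additive_measure([0.5, 0.3, 0.2]))
# Two extreme non-additive measures recover the maximum and the minimum.
as_max = choquet_integral(x, {s: (1.0 if s else 0.0) for s in subsets})
as_min = choquet_integral(x, {s: (1.0 if len(s) == 3 else 0.0) for s in subsets})
```

The weighted mean, the maximum, and the minimum all arise as special cases, which illustrates how the choice of measure alone controls the aggregation behaviour.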
To maintain tractability while capturing interactions, we employed the 2-additive Choquet integral using the Möbius transform representation:
$$\mu(S) = \sum_{T \subseteq S} m(T)$$
where the Möbius mass m is restricted to:
  • Singletons: $m(\{i\})$, representing individual importance
  • Pairs: $m(\{i,j\})$, representing pairwise interactions
  • Empty set and larger subsets: $m(\emptyset) = 0$ and $m(T) = 0$ for $|T| > 2$
The Choquet integral then simplifies to:
$$C_\mu(x) = \sum_{i=1}^{M} m(\{i\})\, x_i + \sum_{\{i,j\} \subseteq N} m(\{i,j\}) \min(x_i, x_j)$$
To ensure the fuzzy measure remains monotonic (adding experts never decreases the measure), we impose:
$$m(\{i\}) \ge 0, \quad \forall i \in N$$
$$m(\{i\}) + m(\{i,j\}) \ge 0, \quad \forall i, j \in N,\ i \ne j$$
Additionally, the normalization constraint ensures $\sum_{T \subseteq N} m(T) = 1$.
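The 2-additive form and the constraints stated above can be sketched as follows. The Möbius masses in the example are hypothetical values chosen to satisfy the constraints; they are not coefficients from the study.

```python
import numpy as np

def choquet_2additive(x, m_single, m_pair):
    """2-additive Choquet integral from Möbius masses.

    m_single: sequence of m({i}); m_pair: dict {(i, j): m({i, j})} with i < j.
    """
    x = np.asarray(x, dtype=float)
    value = float(np.dot(m_single, x))
    for (i, j), m_ij in m_pair.items():
        value += m_ij * min(x[i], x[j])
    return value

def satisfies_constraints(m_single, m_pair, tol=1e-9):
    """Check the monotonicity and normalization conditions as stated in the text."""
    m_single = np.asarray(m_single, dtype=float)
    if np.any(m_single < -tol):
        return False
    for (i, j), m_ij in m_pair.items():
        if m_single[i] + m_ij < -tol or m_single[j] + m_ij < -tol:
            return False
    total = m_single.sum() + sum(m_pair.values())
    return abs(total - 1.0) <= tol

# Hypothetical masses: two redundant pairs (negative interaction terms).
m_single = [0.5, 0.4, 0.3]
m_pair = {(0, 1): -0.1, (1, 2): -0.1}

fused = choquet_2additive([10.0, 30.0, 20.0], m_single, m_pair)
constant = choquet_2additive([7.0, 7.0, 7.0], m_single, m_pair)
```

When all experts agree, normalization guarantees that the fused value equals the common prediction, as the second call illustrates.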
The Möbius coefficients were learned by minimizing the mean squared error on calibration data:
$$\min_{m} \frac{1}{n_{\mathrm{cal}}} \sum_{t=1}^{n_{\mathrm{cal}}} \left(y_t - C_\mu(x_t)\right)^2$$
subject to the monotonicity and normalization constraints. We employed two optimization strategies:
  • COBYLA (Constrained Optimization BY Linear Approximations), from SciPy version 1.7.3 under Python 3.9.7: A derivative-free method suitable for constrained optimization, with maximum iterations set to 2000.
  • Differential Evolution: A global optimization method with population-based search, using bounds [0, 1] for singletons and [−0.5, 0.5] for pairs.
The optimization was initialized with equal singleton weights $m(\{i\}) = 1/M$ and zero pairwise interactions. To balance model diversity with quality, we evaluated Choquet integrals using the k-best experts based on calibration RMSE, with k ∈ {3, 5, 7, 10, all}. This approach prevents dilution from poorly performing models while maintaining sufficient diversity.
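The COBYLA variant of this fit might look like the sketch below, which is not the authors’ implementation. It exploits the fact that the 2-additive Choquet integral is linear in the Möbius masses. Assumed details: the normalization equality is written as two opposing inequalities because COBYLA accepts inequality constraints, the initial trust-region radius `rhobeg=0.1` is an arbitrary choice, and the data are synthetic. Differential Evolution is not shown.

```python
import numpy as np
from itertools import combinations
from scipy.optimize import minimize

def mobius_design(X):
    """Columns: the M expert predictions, then min(x_i, x_j) for every pair i < j."""
    pairs = list(combinations(range(X.shape[1]), 2))
    mins = [np.minimum(X[:, i], X[:, j]) for i, j in pairs]
    return np.column_stack([X] + mins), pairs

def fit_choquet_cobyla(X, y, maxiter=2000):
    M = X.shape[1]
    design, pairs = mobius_design(X)
    # Initialization from the text: equal singleton weights, zero interactions.
    theta0 = np.concatenate([np.full(M, 1.0 / M), np.zeros(len(pairs))])

    def mse(theta):
        return float(np.mean((y - design @ theta) ** 2))

    cons = []
    for i in range(M):  # m({i}) >= 0
        cons.append({"type": "ineq", "fun": lambda th, i=i: th[i]})
    for p, (i, j) in enumerate(pairs):  # m({s}) + m({i, j}) >= 0 for s in {i, j}
        for s in (i, j):
            cons.append({"type": "ineq", "fun": lambda th, s=s, p=p: th[s] + th[M + p]})
    # Normalization sum(m) = 1 expressed as two inequalities.
    cons.append({"type": "ineq", "fun": lambda th: th.sum() - 1.0})
    cons.append({"type": "ineq", "fun": lambda th: 1.0 - th.sum()})

    res = minimize(mse, theta0, method="COBYLA", constraints=cons,
                   options={"maxiter": maxiter, "rhobeg": 0.1})
    return res.x, pairs, mse(theta0), mse(res.x)

# Synthetic calibration set: three hypothetical experts with different noise levels.
rng = np.random.default_rng(0)
truth = 20.0 + 8.0 * rng.standard_normal(300)
X = truth[:, None] + rng.standard_normal((300, 3)) * np.array([1.0, 3.0, 6.0])

theta, pairs, mse_init, mse_fit = fit_choquet_cobyla(X, truth)
```

The first M entries of `theta` are the singleton masses and the remaining entries follow the order of `pairs`; COBYLA may leave small constraint violations, so a feasibility check on the returned vector is advisable.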
The Shapley value provides a game-theoretical interpretation of each expert’s contribution:
$$\phi_i = m(\{i\}) + \frac{1}{2} \sum_{j \ne i} m(\{i,j\})$$
which represents the average marginal contribution of expert i across all possible coalitions.
The interaction between experts i and j was also assessed and is directly given by the Möbius mass:
$m(\{i,j\}) > 0$: Synergistic interaction (complementary expertise)
$m(\{i,j\}) < 0$: Redundancy (overlapping information)
$m(\{i,j\}) = 0$: Independent contributions
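As an illustration of how these quantities are read off the Möbius masses, the sketch below computes Shapley values and labels each pair. The masses are hypothetical, and the tolerance used to decide that an interaction is zero is an assumption.

```python
import numpy as np

def shapley_values(m_single, m_pair):
    """phi_i = m({i}) + 0.5 * sum over j != i of m({i, j}) for a 2-additive measure."""
    phi = np.array(m_single, dtype=float)
    for (i, j), m_ij in m_pair.items():
        phi[i] += 0.5 * m_ij
        phi[j] += 0.5 * m_ij
    return phi

def interaction_label(m_ij, tol=1e-8):
    """Classify a pairwise Möbius mass by its sign."""
    if m_ij > tol:
        return "synergy"
    if m_ij < -tol:
        return "redundancy"
    return "independent"

# Hypothetical masses for three experts (they sum to 1).
m_single = [0.5, 0.4, 0.3]
m_pair = {(0, 1): -0.1, (1, 2): -0.1, (0, 2): 0.0}

phi = shapley_values(m_single, m_pair)
labels = {pair: interaction_label(v) for pair, v in m_pair.items()}
```

Under the normalization constraint the Shapley values sum to one, so they can be read as shares of the fused prediction attributable to each expert.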

3. Results

3.1. Feature Engineering and Model Architecture Analysis

The comprehensive evaluation of 11 specialized models across monitoring stations revealed highly significant performance stratification by architecture class (Kruskal–Wallis H = 51.16, p < 0.001). This indicates fundamental differences in how architectures capture PM10 dynamics. K-Nearest Neighbors with anomaly-detection features achieved the best performance (RMSE = 1.80 ± 0.71 μg/m3, R2 = 0.979), as seen in Table 4 and Figure 2, representing at least a 60.8% improvement over the average performance of all individual models and an 86.7% improvement over the worst-performing SVR model. The Gradient Boosting comparison provides compelling evidence for regime-based modeling. GBR-Regime (RMSE = 4.60 ± 1.20 μg/m3) performed far better than GBR-Stable (RMSE = 10.82 ± 2.16 μg/m3; paired t-test: t(10) = −13.61, p < 0.001, Cohen’s d = −4.10). This effect size of −4.10 is exceptionally large by any standard. Cohen classified effect sizes as small (d = 0.2), medium (d = 0.5), and large (d ≥ 0.8), making our observed effect substantially larger than typical environmental modeling effects []. To our knowledge, standardized effect sizes are rarely reported in atmospheric ML model comparisons, which typically focus on RMSE, MAE, and R2 metrics with statistical significance assessed via paired t-tests. This large effect size quantifies the severe performance penalty of ignoring regime transitions in atmospheric forecasting, demonstrating that the assumption of stationarity fundamentally undermines model accuracy beyond what traditional metrics alone reveal []. The stable variant’s R2 = 0.331 versus the regime-specific R2 = 0.81 corresponds to a loss of about 48 percentage points of explained variance when atmospheric regime transitions are ignored. Random Forest architectures exhibited statistical invariance to asymmetric loss functions (ANOVA F = 0.00, p = 0.9952), though the Friedman test detected subtle ranking differences (χ2 = 6.73, p = 0.01273).
The contrast between parametric and non-parametric tests suggests that while mean performances are nearly identical, the models exhibit different failure patterns across stations. The negligible difference across variants (ΔRMSE = 2.06%) suggests that the variance reduction from bootstrap aggregation outweighs any benefit of targeted loss weighting, consistent with the theoretical expectation that ensemble methods resist prediction bias.
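The effect size reported above for GBR-Regime versus GBR-Stable is numerically consistent with the paired-samples relation d = t/√n, with n = 11 stations. Whether this is the exact formula used in the analysis is an assumption; the short check below only reproduces the arithmetic.

```python
import math

def cohens_d_from_paired_t(t_stat, n_pairs):
    """Paired-samples effect size: mean difference divided by the SD of the differences."""
    return t_stat / math.sqrt(n_pairs)

# t(10) = -13.61 across 11 stations, as reported in the text.
d_gbr = cohens_d_from_paired_t(-13.61, 11)
```

Rounded to two decimals, `d_gbr` matches the reported value of −4.10.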
Table 4. Cross-station evaluation metrics of model architectures.
Figure 2. RMSE distributions by feature-set expert (PM10) across stations.
LSTM architectures showed statistically equivalent performance across different lookback windows. The 24 h configuration achieved the lowest error (RMSE = 6.01 ± 3.08 μg/m3, R2 = 0.782), while both shorter (12 h: RMSE = 6.08 ± 3.15 μg/m3) and longer (168 h: RMSE = 6.10 ± 2.98 μg/m3) contexts were marginally worse (values are RMSE ± standard deviation). Statistical comparison between the extreme contexts (12 h vs. 168 h: t = −0.16, p = 0.8759) indicates no significant difference, suggesting information saturation beyond diurnal cycles.
The comparison between 24 h and 24 h+ meteorological inputs (t = −3.84, p = 0.0032) reveals a paradoxical performance penalty from additional information. The multivariate LSTM (RMSE = 11.71 ± 4.84 μg/m3, R2 = 0.18) represents catastrophic failure, with performance 94.7% worse than the optimal univariate configuration. This degradation, despite the theoretical advantages of multivariate inputs, indicates that conflicting temporal scales between meteorological (synoptic: 72–120 h) and pollution (diurnal: 24 h) signals create irreconcilable optimization challenges in the shared recurrent state space.
Despite near-identical mean performance across asymmetric RF variants (short-term feature: 5.85 ± 3.09, underpredict-averse: 5.94 ± 3.09, overpredict-averse: 5.97 ± 2.98 μg/m3), the models serve distinct operational purposes. The underpredict-averse variant reduces Type II errors during pollution episodes by 2.0% compared to the standard configuration, which is critical for public health warnings where false negatives carry higher costs than false positives. The invariance across loss functions (all pairwise p > 0.9952) suggests that the 400-tree ensemble with maximum depth 10 has reached an information-theoretic ceiling for the short-term feature space. This plateau at R2 ≈ 0.79–0.80 across all variants indicates that approximately 20% of PM10 variance remains irreducible noise or requires features beyond the current 11-dimensional short-term dynamics representation.
SVR-RBF’s catastrophic performance (RMSE = 13.59 ± 2.24 μg/m3, R2 = −0.048) warrants detailed examination as a cautionary case. The negative R2 indicates a squared error 4.8% larger than that obtained by predicting the unconditional mean, representing complete model failure. With n = 11 stations and 14-dimensional meteorological features, the sample-to-dimension ratio of 0.79 falls below the theoretical threshold for RBF kernel convergence in high-dimensional spaces. The model’s inability to generalize stems from the curse of dimensionality in kernel space. With Gaussian RBF kernels, the effective number of parameters grows exponentially with feature dimension, requiring O(exp(d)) samples for consistent estimation. Our configuration with d = 14 and n-effective ≈ 11 × 8760 h creates a severely underdetermined system where regularization dominates, forcing near-constant predictions.
The exceptional performance of KNN-Anomaly (RMSE = 1.80 μg/m3) validates instance-based learning for non-stationary atmospheric systems. With k = 15 neighbors and distance weighting, the model implicitly performs local polynomial regression in the 5-dimensional anomaly space. The dramatic improvement over global models suggests that PM10 dynamics exhibit local linearity in deviation space despite global non-linearity in absolute concentration space. Analysis of the nearest neighbor sets during extreme events (PM10 > 90th percentile) would likely reveal temporal clustering, where similar deviations from seasonal/diurnal means identify analogous atmospheric conditions regardless of absolute concentration levels. This scale-invariant similarity metric explains the model’s robust performance across the 10-fold concentration range observed across stations. However, despite KNN-Anomaly’s superiority, significant performance gaps emerge under specific conditions that motivate ensemble fusion approaches. Station-specific analysis reveals that KNN-Anomaly’s performance degrades at high-traffic locations (RMSE increases to 2.64 μg/m3 at Erzsebet square). Similarly, during rapid morning transitions (06:00–09:00), LSTM-Short captures temporal derivatives that KNN-Anomaly’s similarity-based approach misses, reducing prediction lag by 1.3 h. These complementary failure modes where no single model dominates across all spatiotemporal conditions establish the theoretical foundation for fusion methods.
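The distance-weighted nearest-neighbour predictor discussed above (k = 15) can be sketched in NumPy. The five synthetic features stand in for the anomaly features, whose exact definitions are not reproduced here, and the small `eps` guarding against division by zero is an implementation assumption.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=15, eps=1e-12):
    """Distance-weighted k-nearest-neighbour regression with Euclidean distances."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    preds = np.empty(len(X_query))
    for q, xq in enumerate(X_query):
        dist = np.linalg.norm(X_train - xq, axis=1)
        idx = np.argsort(dist)[:k]      # indices of the k closest training points
        w = 1.0 / (dist[idx] + eps)     # closer neighbours receive larger weights
        preds[q] = np.sum(w * y_train[idx]) / np.sum(w)
    return preds

# Synthetic 5-dimensional feature space (placeholder for the anomaly features).
rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 5))
y_train = X_train.sum(axis=1) + 0.1 * rng.standard_normal(500)

seen = knn_predict(X_train, y_train, X_train[:1])                  # a training point
unseen = knn_predict(X_train, y_train, rng.standard_normal((20, 5)))
```

Each prediction is a convex combination of observed targets, so the model cannot extrapolate beyond the range of concentrations seen in training.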

3.2. Performance of Fusion Techniques

The performance comparison reveals that the Stacking ensemble and Choquet Integral with 5 experts (denoted as Choquet-k5 in the rest of the paper) achieve statistical equivalence despite the 0.16 μg/m3 nominal difference, as shown in Figure 3. The pairwise significance test confirms no significant difference (p > 0.05). This statistical equivalence is remarkable given their fundamentally different approaches: Stack employs black-box non-linear learning while Choquet uses transparent fuzzy measure aggregation. The effect size (Cohen’s d = −0.3) between Stack and Choquet-k5 falls below the threshold for a medium effect (|d| < 0.5), supporting practical equivalence. In contrast, both methods show very large effects compared to BMA (d > 3.0) and mean aggregation (d > 3.5), establishing them as a distinct performance tier. This two-tier structure, sophisticated fusion (Stack/Choquet) versus simple aggregation (BMA/mean), suggests that the choice between Stack and Choquet should be based on secondary considerations rather than raw performance. Table 5 shows the performance of fusion techniques at 11 stations.
Figure 3. Scatter plots of different fusion methods in 4 stations on the held-out test set: (a) Erzsébet square station; (b) Honvéd station; (c) Budatétény station; (d) Széna square station.
Table 5. Performance of fusion techniques at 11 stations.
The marginal performance difference between the Stacking ensemble and Choquet-k5 represents a 9.6% RMSE increase, within typical measurement uncertainty for PM10 sensors (±10–15%). This small practical difference must be weighed against the Choquet integral’s substantial interpretability advantages: interaction matrices revealing synergies and redundancies, and mathematical guarantees through fuzzy measure theory.
For operational deployment requiring regulatory compliance or stakeholder communication, the ability to explain why specific predictions were made often outweighs marginal accuracy improvements. The Choquet integral provides complete algorithmic transparency; every prediction can be decomposed into individual and interaction contributions, while Stack remains an opaque combination of 100 regression trees.
The Choquet Integral’s performance was strongly sensitive to the number of included expert models (K), evaluated across five ensemble sizes, K ∈ {3, 5, 7, 10, 13}, as shown in Figure 4. This analysis revealed a non-monotonic relationship between ensemble size and prediction accuracy, challenging the conventional assumption that larger ensembles necessarily yield superior performance. Performance metrics across all 11 monitoring stations showed marked improvement from K = 3 to K = 5, followed by stabilization. With K = 3 (top three experts: KNN anomaly, Short-term RF, RF-Underpredict Averse), the Choquet Integral achieved RMSE = 3.14 ± 0.62 μg/m3 and R2 = 0.94 ± 0.03 (mean ± standard deviation) at Budatétény, for example. Expanding to K = 5 by including LSTM balanced and RF-Underpredict Averse yielded RMSE = 1.82 ± 0.39 μg/m3 and R2 = 0.98 ± 0.01, representing a 42.1% error reduction. This improvement was statistically significant across all stations (paired t-test: t(10) = 8.73, p < 0.001, Cohen’s d = 2.63), indicating a very large effect size. Further ensemble expansion showed diminishing returns; detailed results are shown in Table S1 in the Supplementary Materials.
Figure 4. Station-specific prediction error (1/RMSE) for Choquet Integral fusion with different ensemble sizes (K = 3, 5, 7, 10, 13).
Station-specific analysis revealed consistent K = 5 optimality across diverse urban environments. At high-traffic Erzsébet square, performance improved from RMSE = 3.97 μg/m3 (K = 3) to 2.57 μg/m3 (K = 5), then degraded to 5.63 μg/m3 (K = 13). Suburban Budatétény showed similar patterns: 3.14, 1.82 and 5.31 μg/m3 for K = 3, 5, and 13, respectively. The universal K = 5 optimum across heterogeneous stations suggests this threshold reflects fundamental information-theoretic constraints rather than site-specific characteristics.
The performance plateau beyond K = 5 aligns with the interaction analysis findings. Among the 15 pairwise interactions in the K = 5 configuration, 11 (73.3%) exhibited negative Möbius coefficients, indicating redundancy. The five models selected at K = 5 represented distinct modeling paradigms: instance-based (KNN anomaly), tree ensemble (Short-term RF, RF variants), and recurrent neural (LSTM balanced), maximizing architectural diversity. In contrast, models added beyond K = 5 primarily consisted of alternative LSTM configurations and regime-specific variants sharing substantial feature overlap with existing ensemble members. Computational complexity analysis revealed practical advantages of the K = 5 configuration. The number of Möbius parameters scales as $K + K(K-1)/2$, yielding 6, 15, 28, 55, and 91 parameters for K = 3, 5, 7, 10, and 13, respectively. Optimization convergence time increased super-linearly, with K = 5 requiring 3.2 s versus 18.7 s for K = 13 using COBYLA optimization. The K = 5 configuration thus achieved 96.4% of K = 13’s accuracy with 16.5% of the parameters and 17.1% of the computation time.
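The parameter counts and the two ratios quoted above follow directly from the scaling rule, as this short check shows. The timings are the values reported in the text, not measurements made by the snippet.

```python
def n_mobius_params(k):
    """Singletons plus unordered pairs in a 2-additive measure over k experts."""
    return k + k * (k - 1) // 2

counts = {k: n_mobius_params(k) for k in (3, 5, 7, 10, 13)}

param_share = 100.0 * counts[5] / counts[13]  # parameters at K = 5 relative to K = 13
time_share = 100.0 * 3.2 / 18.7               # reported COBYLA times in seconds
```

Both shares round to the percentages given in the paragraph (16.5% and 17.1%).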
Comparison with unconstrained fusion methods contextualized these findings. Stack ensemble using all 19 available models achieved RMSE = 1.66 ± 0.36 μg/m3, only 8.5% better than Choquet-K5 despite unlimited non-linear capacity and 4.75× more base models. This marginal improvement, within PM10 measurement uncertainty (±10–15%), validates that information saturation occurs at approximately 5 complementary models for this application. The Choquet Integral’s explicit redundancy penalization through negative interaction coefficients enabled near-optimal performance with a minimal model subset, whereas Stack required the full ensemble to implicitly learn these redundancies through its meta-learner.

3.3. Interpretability of Choquet Integral

The Choquet integral’s handling of model interactions reveals its fundamental strength in PM10 forecasting, as presented in Figure 5: the ability to simultaneously exploit synergies (red sectors in the figure) where they exist and penalize redundancies (blue sectors in the figure) where they dominate. This dual capability explains the method’s consistent performance (RMSE = 1.83 ± 0.39 μg/m3) across diverse urban environments, achieving near-optimal results through mathematically principled aggregation rather than black-box optimization. The predominance of negative interactions in our ensemble (10 out of 12 top cross-station interactions showing redundancy) demonstrates the Choquet integral’s essential role in preventing information over-counting. Traditional fusion methods like simple averaging or weighted means would treat redundant LSTM variants (LSTM-Long × LSTM-Short: m = −0.08) as independent information sources, effectively triple-counting the same temporal patterns. The Choquet integral’s negative interaction coefficients automatically correct this over-representation, assigning appropriate collective weight to the LSTM family while preventing dominance by architectural repetition. This redundancy management proves particularly valuable given that Random Forest variants with different loss functions (RF-Underpredict × RF-Overpredict: m = −0.05 to −0.15) converge to nearly identical decision boundaries. Without the Choquet integral’s explicit redundancy penalization, these models would artificially inflate confidence in tree-based predictions. The framework’s ability to identify and downweight overlapping information explains its competitive performance against stacking (1.83 vs. 1.67 μg/m3 RMSE), despite stacking’s advantage of unrestricted non-linear optimization. The Choquet integral achieves comparable accuracy through transparent, interpretable interaction modeling rather than an opaque meta-learner.
Figure 5. Synergy and redundancy interactions of base models in each station.
While redundancy dominance might seem problematic, the Choquet integral’s selective synergy exploitation at critical stations and conditions demonstrates its sophisticated adaptation to local dynamics. At complex urban stations like Honvéd and Erzsébet, where interaction rose diagrams show substantial red (synergistic) sectors, the framework successfully identifies and amplifies complementary information. The Anomaly (KNN) model’s positive interactions at these locations capture precisely the non-linear urban canyon effects that create prediction challenges. The synergy between Anomaly detection and other models is not uniformly distributed but emerges exactly where needed, at stations with irregular emission patterns and complex building-induced turbulence. This spatial selectivity represents intelligent fusion: the Choquet integral does not force synergy where none exists (as at simple stations like Budatétény) but exploits it where available.
The Choquet integral’s interaction patterns reflect information complementarity rather than specific atmospheric scales. Negative coefficients (redundancy) occur between models using overlapping feature sets, for instance, multiple LSTM variants processing similar temporal patterns. Positive coefficients (synergy) emerge between models capturing orthogonal information, such as KNN-Anomaly’s deviation-based features versus RF’s absolute concentration features. This aligns with information theory: redundant signals should be downweighted while complementary signals deserve combined consideration. The framework’s ability to maintain performance despite 70% redundant interactions demonstrates robust handling of real-world ensemble challenges. Rather than degrading under information overlap, the Choquet integral leverages its Möbius representation to optimally weight the 30% unique information while preventing redundancy-induced overconfidence. This robustness explains its consistent performance across diverse stations despite varying synergy/redundancy ratios. The Choquet integral’s success in PM10 forecasting stems from its unique ability to handle the dual challenges of modern ensemble systems: exploiting genuine complementarity while preventing redundancy amplification. Its performance parity with state-of-the-art stacking (9.6% RMSE difference within measurement uncertainty), combined with complete interpretability, establishes it as the optimal framework for operational air quality systems. The predominance of redundant interactions does not diminish the Choquet integral’s value; it validates its necessity. In a domain where physical constraints force different models toward similar solutions, blind aggregation amplifies noise while sophisticated frameworks like the Choquet Integral extract signal.

4. Discussion

This study presents the first application of Choquet Integral fusion for air quality forecasting, demonstrating that interpretable ensemble methods can match black-box performance while providing transparency essential for operational deployment. Three key insights emerge with important implications for atmospheric machine learning [,].
The dominance of feature engineering over model architecture challenges current trends toward increasingly complex neural networks []. The 86.7% performance gap between models using engineered features versus raw meteorological inputs suggests that domain knowledge encoding remains more valuable than architectural sophistication [,]. This finding aligns with recent critiques of “big data” approaches in environmental science, where physical understanding often trumps computational power. The minimal contribution of the temperature-humidity ratio (Pproxy, <2% feature importance) demonstrates an important characteristic of our framework: robustness to individual weak features when the feature space includes physically meaningful variables. While Pproxy exhibited theoretical limitations, its negligible impact validates that ensemble methods naturally suppress poorly justified features through their aggregation mechanisms. However, this robustness does not excuse the inclusion of theoretically flawed features in operational systems. Future implementations should prioritize physically interpretable indicators such as Richardson number, bulk Richardson number, or direct boundary layer height measurements. The success of our meteorological feature set demonstrates that proper atmospheric parameterization, rather than exploratory ratios, drives predictive performance.
The failure of multivariate LSTM despite theoretical advantages particularly highlights how incorrect inductive biases can overwhelm additional information [,]. Multivariate LSTM’s failure (R2 = 0.183 vs. 0.782 for univariate) stems from three statistical issues: (1) Parameter complexity increases 4-fold while training samples remain fixed, reducing effective samples per parameter below 100; (2) Gradient interference occurs when backpropagation attempts to optimize for variables with conflicting temporal dynamics (PM10’s 24 h periodicity versus wind’s 72 h persistence); (3) LSTM’s shared recurrent weights cannot accommodate multiple temporal scales simultaneously [,,]. The architecture assumes all inputs share similar temporal evolution, but atmospheric variables operate at fundamentally different frequencies. This explains why separate univariate models outperform theoretically superior multivariate approaches []. Future research should prioritize developing physically consistent feature spaces rather than pursuing model complexity.
On the other hand, the dramatic performance gap between GBR-Regime and GBR-Stable illuminates a fundamental issue in atmospheric modeling: the stationarity assumption. Traditional models assume statistical properties (mean, variance, autocorrelation) remain constant over time [,]. This fails for PM10 because:
  • Morning rush hours show rapid concentration increases (non-stationary mean) [,].
  • Stable nocturnal inversions exhibit low variance, while afternoon mixing shows high variance.
  • Autocorrelation changes between stagnant episodes and windy periods.
GBR-Stable applies uniform parameters regardless of conditions, attempting to fit a single model to fundamentally different atmospheric states. In contrast, GBR-Regime adapts its predictions based on identified atmospheric regimes, effectively treating PM10 as a switching process rather than a stationary time series. The 4.10 effect size, exceptionally large for environmental modeling, quantifies the penalty of ignoring atmospheric regime transitions [,,]. In other words, the superior performance of GBR-Regime stems from its adaptive feature selection: during stable periods, it uses long-term patterns (24 h/168 h statistics), during rapid transitions, it switches to short-term momentum features (recent lags and changes), and during extreme events, it relies on meteorological drivers. In contrast, GBR-Stable applies the same long-term features regardless of atmospheric state, forcing a single model to fit fundamentally different pollution dynamics.
Choquet Integral’s ability to explicitly manage redundancy addresses a fundamental but overlooked challenge in ensemble learning [,]. The predominance of negative Möbius coefficients (70% of interactions) reveals that most models converge toward similar solutions due to atmospheric physics constraints. This contradicts the common assumption that ensemble diversity inherently improves performance []. Instead, our results suggest that preventing information over-counting is more critical than exploiting synergies. The framework’s transparency through Shapley values and interaction matrices enables diagnosis of model failures and targeted improvements impossible with black-box stacking.
The optimal ensemble size of K = 5 has important practical implications. The 42% performance improvement from K = 3 to K = 5, followed by saturation, suggests fundamental limits to the independent information available from PM10 observations. This finding could guide operational system design, as maintaining 5 well-chosen models reduces computational costs by 62% compared to exhaustive ensembles while preserving 96% of performance gains []. The consistency of this optimum across diverse urban environments suggests it reflects information-theoretic constraints rather than site-specific characteristics.
Our approach’s limitations point toward future research directions. The static Möbius coefficients assumption may oversimplify seasonal dynamics, particularly during extreme events when atmospheric processes shift. Adaptive Choquet Integrals with time-varying coefficients could capture these changes, though maintaining theoretical guarantees while enabling adaptation presents mathematical challenges. Integration of satellite and mobile sensor data could overcome current performance ceilings but would require hierarchical fusion frameworks preserving interpretability across spatial scales [,].
The performance plateau at R2 ≈ 0.98 raises fundamental questions about predictability limits in urban atmospheric systems. This ceiling, consistent across diverse architectures and comprehensive feature engineering, likely reflects irreducible uncertainty from sub-grid turbulence, stochastic emissions, and measurement noise. Breaking through may require paradigm shifts such as physics-informed neural networks embedding conservation laws or hybrid models combining deterministic chemistry with statistical corrections.
While individual base models remain black boxes, the Choquet integral provides three levels of operational transparency essential for public health deployment:
  • Model reliability indicators: Shapley values reveal which models dominate under current conditions (e.g., KNN-Anomaly weighted 45% during Saharan dust events while routine traffic models drop to 10%).
  • Failure diagnostics: When predictions fail, Möbius coefficients reveal which model combinations caused the error. If LSTM-Short and RF-Standard show strong positive interaction (synergy) during a missed peak, operators know these models are amplifying each other’s errors.
  • Confidence assessment: Large negative interactions (redundant models agreeing) indicate higher confidence; disagreement among typically synergistic models signals uncertainty.
Practical example: A public health alert might state: “Forecast: 85 μg/m3. High confidence. Anomaly detection (weight: 0.4) and meteorological models (0.3) indicate atmospheric stagnation. Traffic models downweighted (0.1 each) for holiday period.” This does not explain why models predict 85 μg/m3 (models remain opaque), but it demonstrates which models drive the prediction and whether weighting is appropriate, enabling officials to calibrate trust and response accordingly. This “fusion-level interpretability” distinguishes our approach from fully opaque stacking while acknowledging that we do not reveal underlying atmospheric mechanisms.
The broader implications extend beyond air quality to environmental machine learning generally. Many environmental systems exhibit similar characteristics: physical constraints creating model redundancy, multiple scales of relevant processes, and requirements for interpretable predictions. The Choquet Integral framework could apply to climate modeling, hydrological forecasting, or ecosystem monitoring where ensemble fusion is common, but interpretability remains challenging.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos16111274/s1, Figure S1: Scatter plots of different fusion methods for the rest of the stations; Table S1: Metrics of Choquet-K7, Choquet-K10 and Choquet-K13.

Author Contributions

Conceptualization, H.B., A.A., N.O. and A.M.; methodology, H.B.; software, H.B. and A.A.; validation, H.B., A.A. and A.M.; formal analysis, H.B.; investigation, H.B.; resources, H.B., A.A.; data curation, H.B.; writing—original draft preparation, H.B.; writing—review and editing, A.A., A.M. and N.O.; visualization, H.B.; supervision, G.G.; project administration, G.G.; funding acquisition, G.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Research Excellence Program 2025 of the Hungarian University of Agriculture and Life Sciences, grant number KKP2025.

Data Availability Statement

Data are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
