A Physics-Informed Multimodal Deep Learning Framework for City-Scale Air-Quality and Health-Risk Prediction

Alhawiti, Khaled M.

doi:10.3390/systems14030320


by

Khaled M. Alhawiti

Faculty of Computers and Information Technology, University of Tabuk, Tabuk 47512, Saudi Arabia

Systems 2026, 14(3), 320; https://doi.org/10.3390/systems14030320

Submission received: 2 February 2026 / Revised: 9 March 2026 / Accepted: 14 March 2026 / Published: 18 March 2026


Abstract

Accurate and interpretable air-quality prediction remains a critical challenge for environmental health management due to complex, nonlinear interactions among emissions, meteorology, and atmospheric chemistry. This study presents a hybrid physics-informed and multimodal deep learning framework for city-scale air-quality and health-risk prediction. The framework combines a Gaussian plume dispersion model with a residual CNN-LSTM network that learns data-driven corrections while preserving physical consistency. Multimodal open datasets, including ground-based pollutant sensors, meteorological records, and satellite-derived aerosol and temperature features, are jointly fused to improve spatiotemporal fidelity. An Exposure Health Index module further links predicted pollutant fields with respiratory morbidity indicators, providing a quantitative bridge between atmospheric variability and health outcomes. Using open-source datasets from Riyadh, Jeddah, and Dammam, the proposed approach achieves up to 25% lower mean absolute error and $R^{2}$ values above 0.85 compared with physics-only and purely data-driven baselines. Explainability analyses using SHAP and spatial attention highlight physically plausible drivers and confirm feature relevance. The results demonstrate that physics-guided residual learning can unify deterministic dispersion modeling and multimodal inference, providing a transparent, scalable, and reproducible foundation for air-quality forecasting and health-risk assessment.

Keywords:

air quality forecasting; exposure health index; multimodal data fusion; physics informed learning; residual CNN-LSTM

1. Introduction

Air pollution remains one of the most critical environmental health challenges of the twenty-first century, contributing to an estimated seven million premature deaths annually according to the World Health Organization (WHO) [1,2,3]. Fine particulate matter (PM₂.₅) and gaseous pollutants such as NO₂, SO₂, and O₃ have been strongly linked to cardiopulmonary diseases, asthma, and reduced life expectancy. Accurate, high-resolution prediction of pollutant concentrations is therefore essential for exposure assessment, early-warning systems, and evidence-based policy interventions [4]. However, urban air-quality dynamics are highly nonlinear and spatio-temporally variable, governed by complex interactions among emissions, meteorology, and chemical transformations. This complexity presents a long-standing modeling challenge for both deterministic physics-based and purely data-driven approaches [5].

Conventional physics-based dispersion models, such as Gaussian plume, Gaussian puff, or Eulerian grid solvers, offer interpretability and physical consistency but often require detailed emission inventories and boundary layer parameters that are difficult to obtain in real time. Their simplified assumptions on turbulence, deposition, and chemistry can lead to underestimation of episodic pollution events and poor adaptability to evolving urban conditions. In contrast, machine-learning and deep-learning models (e.g., CNNs, LSTMs, Transformers) have shown strong predictive capability by learning complex correlations directly from observational data [6,7,8]. Yet, they remain largely black-box systems sensitive to data gaps, unable to extrapolate beyond the training distribution, and often violating known physical laws [9]. This trade-off between interpretability and accuracy motivates the need for hybrid frameworks that integrate physical domain knowledge with data-driven learning [10].

Recent advances in physics-informed neural networks (PINNs) and hybrid AI have shown promise in embedding physical constraints within deep architectures, enabling improved generalization and physically consistent predictions [11]. In the atmospheric sciences, such integration has been applied to turbulence modeling, weather forecasting, and aerosol transport; however, applications to urban air-quality prediction and exposure health risk estimation remain limited. Most existing studies focus on pollutant forecasting alone, without explicitly coupling dispersion physics, multimodal observational data, and health impact modeling within a unified and transparent framework [12,13].

To address these gaps, this study proposes a hybrid physics-informed and multimodal data-driven deep learning framework for city-scale air-quality and health-risk prediction. The framework combines a Gaussian-plume dispersion model with a residual CNN-LSTM network that learns data-driven corrections while preserving physical consistency. Multimodal inputs, including ground-based measurements, meteorological parameters, and satellite-derived aerosol and temperature features, are jointly fused to enhance spatial and temporal coverage. The resulting hybrid predictions are further linked to public health indicators through an Exposure Health Index, providing a quantitative bridge between atmospheric variability and respiratory outcomes.

The main contributions of this research are summarized as follows.

A hybrid physics-deep learning framework is developed that integrates a Gaussian plume dispersion prior with a residual CNN-LSTM network. The approach preserves physical consistency while enhancing nonlinear predictive capability across diverse meteorological conditions.
A multimodal open-data fusion strategy combines ground-based sensors, meteorological records, and satellite-derived aerosol and temperature features, improving spatial coverage, temporal fidelity, and reproducibility.
Explainability and interpretability are achieved through SHapley Additive exPlanations (SHAP) and convolutional spatial-attention analysis, ensuring that the learned representations remain physically meaningful.
A coupled exposure-health module links predicted pollutant concentrations with public health indicators using an Exposure-Health Index, establishing a quantitative bridge between atmospheric variability and respiratory risk.

Using open datasets from three major Saudi cities (Riyadh, Jeddah, and Dammam), the proposed framework is evaluated across multiple meteorological regimes, demonstrating improved predictive accuracy, robustness, and interpretability compared with baseline physics-only and deep-learning models. The study contributes a reproducible, open-data foundation for trustworthy air-quality forecasting and data-driven environmental health analysis.

2. Literature Review

Accurate air-quality prediction and exposure–health assessment have long been recognized as challenging problems in environmental modeling due to the inherently nonlinear, multiscale, and data-sparse nature of atmospheric processes. Previous research has addressed this problem through three dominant paradigms: (i) physics-based deterministic models that emphasize mechanistic interpretability, (ii) data-driven deep-learning architectures that capture complex correlations from observational data, and (iii) hybrid or physics-informed frameworks that integrate the strengths of both approaches.

In parallel, recent studies have explored Internet of Things (IoT)–based healthcare optimization frameworks that integrate real-time sensor monitoring and data-driven analytics to improve patient monitoring, disease prediction, and healthcare resource management [14,15,16,17]. Such IoT-enabled monitoring systems highlight the growing importance of integrating environmental sensing with health analytics to support data-driven exposure and public-health assessment. This section reviews the key developments across these domains, highlighting recent advances in dispersion modeling, multimodal data fusion, and physics-guided machine learning for air-quality forecasting and health-risk prediction.

Hettige et al. [18] proposed AirPhyNet, a physics-guided neural architecture that embeds advection-diffusion equations into a graph-based deep-learning framework for air-quality forecasting. The model integrates domain physics with latent spatio-temporal representations, enhancing both accuracy and interpretability. Experimental results on multiple benchmark datasets demonstrated superior performance over conventional black-box networks, with up to 10% reduction in prediction error and improved physical consistency across varying lead times and data-sparsity conditions.

Chen et al. [19] proposed an interpretable physics-informed deep neural network (IPMDNN) for joint estimation of O₃, PM₂.₅, and PM₁₀. The model incorporates a physics-constrained loss, self-attention, and pollutant interaction modules to ensure physical consistency and interpretability. Applied across China (2019–2020), it achieved $R^{2}$ values above 0.87 for all pollutants, demonstrating superior accuracy and computational efficiency compared with conventional models.

Thakur and Patel et al. [20] employed a physics-informed LSTM (PI-LSTM) model to predict spatiotemporal PM₂.₅ and CO₂ concentrations across multiple zones in a residential apartment. The framework incorporated interzonal transport dynamics and door-configuration effects to improve indoor pollutant forecasting. Validation against experimental data showed that PI-LSTM outperformed conventional LSTM for PM₂.₅, particularly near emission sources, highlighting the benefit of embedding physical transport constraints in indoor air-quality prediction.

Cao et al. [21] applied a physics-based machine learning (PBML) framework for urban air pollution prediction using decadal traffic, meteorological, and pollutant datasets from Norwegian cities (2009–2018). By incorporating physical constraints into statistical learning, PBML reported improved accuracy and interpretability compared with LSTM and linear regression baselines. The study suggests that PBML can support long-horizon and data-efficient air-quality prediction at hyperlocal urban scales.

Wang et al. [22] proposed a physics-informed hierarchical data-driven predictive control framework for optimizing building HVAC operation while maintaining indoor air quality. Their method uses a physics-informed input convex neural network to model indoor environmental dynamics and guide predictive control. Simulation results reported more than 35% reduction in total cooling load and around 70% reduction in airside coil energy, indicating that the model can balance energy efficiency with air-quality constraints.

Li et al. [23] proposed a physics-inspired deep graph learning approach that embeds fluid dynamics principles into a multilevel graph neural network for fine-scale air-quality assessment. By encoding spatiotemporal dependencies and physically consistent transport behavior, the method reported 11–22% higher extrapolation accuracy than baseline machine learning models across multiple pollutants in China. The results highlight improved reliability and physical plausibility for pollutant prediction under distribution shifts.

Naeini et al. [24] introduced PINN-DT, a hybrid framework that integrates Physics-Informed Neural Networks (PINNs), Digital Twins (DTs), and blockchain technology for real-time energy optimization in smart buildings. The model leverages PINNs to embed physical energy constraints within Deep Reinforcement Learning control loops, enhancing interpretability and predictive accuracy. Validated using IoT-based energy datasets, the framework achieved $R^{2} = 0.978$, reduced energy costs by 35%, and improved renewable utilization by 40%, demonstrating its capability for secure and physics-consistent energy management in smart grid environments.

Recent studies have also explored robust spatio-temporal learning under irregular or incomplete input conditions. For example, Buckchash et al. [25] proposed a scalable online learning framework for handling haphazard input streams, where input features may appear, disappear, or become unreliable over time. Their work introduces a self-attention-based architecture (HapNet) that adapts to dynamic feature availability and achieves competitive performance across multiple streaming benchmarks.

As summarized in Table 1, most existing physics-guided learning approaches have improved prediction accuracy and interpretability but remain limited in scope and integration. Current frameworks are largely domain-specific, lacking comprehensive modeling of pollutant dispersion, multimodal data fusion, and direct linkage to health impacts. The proposed Hybrid Physics-Deep Learning Framework overcomes these limitations by uniting physical dispersion dynamics with deep neural inference and exposure-health coupling, providing an interpretable, scalable, and unified solution for air-quality prediction and health-risk assessment.

3. Hybrid Physics-Informed Multimodal Deep Learning Framework

The proposed hybrid physics-informed and data-driven framework integrates physical dispersion modeling with multimodal environmental and health data to enhance the accuracy and interpretability of air-quality and health-risk prediction. As shown in Figure 1, the framework comprises six interconnected modules. The physics-based dispersion model simulates pollutant transport using Gaussian plume or advection-diffusion equations and provides physics-derived priors. These priors are fused with multimodal datasets, including ground-sensor, meteorological, satellite, and demographic inputs, within a residual CNN-LSTM network that learns spatio-temporal pollutant dynamics and corrects deviations from physical predictions. The model outputs are processed in an exposure health estimation layer, which computes exposure indices and applies XGBoost regression to predict health outcomes. An explainability and evaluation module employs SHAP analysis, attention maps, and performance metrics (MAE, RMSE, $R^{2}$) for model interpretation and validation. Finally, an applications layer translates predictive outputs into actionable tools for AQI mapping, health forecasting, emission-policy assessment, and adaptive alerting, with feedback loops enabling continual model refinement and scenario-based policy integration. The modules are described in the following subsections: Section 3.1 covers data acquisition and preprocessing, Section 3.2 presents the physics-based dispersion model, Section 3.3 introduces the CNN-LSTM prediction module, Section 3.4 describes the health-risk estimation component, Section 3.5 explains the multimodal fusion strategy, and Section 3.6 presents the evaluation and explainability methods.

Initially, heterogeneous open datasets are collected and synchronized on a common spatio-temporal grid. Ground-level pollutant concentrations (PM₂.₅, PM₁₀, NO₂, SO₂, CO, and O₃) are obtained from the KAPSARC Open Data Portal [26,27] and the OpenAQ network [28], while meteorological variables (temperature, humidity, wind speed and direction, pressure, and rainfall) are derived from the Saudi Hourly Weather Dataset hosted on OpenDataSoft [29]. Satellite-based aerosol and radiative features such as Aerosol Optical Depth (AOD) and Land Surface Temperature (LST) are extracted from the NASA MODIS MAIAC and Copernicus Sentinel-5P (TROPOMI) products [30]. Population and respiratory-health indicators are taken from the WHO Global Health Observatory and Kaggle Public Health Statistics [31], ensuring the entire workflow is based on freely accessible data.

Next, the physics-informed module applies a Gaussian plume or advection-diffusion equation to simulate baseline pollutant dispersion using the open meteorological and emission data. This physically derived field serves as a prior that encodes atmospheric transport dynamics. The deep-learning module, implemented as a hybrid CNN-LSTM network, receives the physics outputs together with pollutant histories, meteorological parameters, and satellite-derived features. Finally, predicted pollutant levels are combined with open demographic and health indicators to estimate an Exposure-Health Index that reflects respiratory-risk levels. By coupling physically interpretable dispersion dynamics with data-driven adaptability and using exclusively open datasets, the framework provides a transparent, reproducible foundation for city-scale air-quality forecasting and public-health assessment.

3.1. Data Acquisition and Processing

The proposed framework relies entirely on open-access multimodal datasets that collectively describe the atmospheric, meteorological, and health-related conditions across major Saudi cities. All datasets are harmonized on a common spatio-temporal grid to enable joint learning between physical and data-driven modules.

Table 2 summarizes the datasets employed in this study, including the variables extracted, temporal coverage, spatial resolution, and open-data access points. Data streams from the sources are first aligned to a uniform temporal index (hourly to daily aggregation) using mean and peak pollutant interpolation. Outliers beyond three standard deviations are clipped, and negative or missing values are imputed via linear or spline interpolation. Meteorological and pollutant series are standardized (zero-mean, unit-variance) to ensure numerical stability during network training.
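The cleaning steps described above (three-sigma clipping, gap filling, and standardization) can be illustrated with a minimal standard-library sketch. The function names, the choice of linear rather than spline interpolation, and the treatment of negative readings as missing are assumptions made for illustration. This is not the study's actual pipeline, which would typically be vectorized.

```python
import math
import statistics


def mark_missing(values, nonnegative=True):
    """Replace None, NaN, and (optionally) negative readings with None."""
    cleaned = []
    for v in values:
        if v is None or math.isnan(v) or (nonnegative and v < 0):
            cleaned.append(None)
        else:
            cleaned.append(float(v))
    return cleaned


def clip_outliers(values, n_sigma=3.0):
    """Clip observed values to mean +/- n_sigma population standard deviations."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.pstdev(observed)
    lo, hi = mu - n_sigma * sd, mu + n_sigma * sd
    return [None if v is None else min(max(v, lo), hi) for v in values]


def interpolate_gaps(values):
    """Fill None entries linearly between the nearest observed neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:        # gap at the start: hold the first observation
            filled[i] = values[right]
        elif right is None:     # gap at the end: hold the last observation
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def zscore(values):
    """Standardize to zero mean and unit variance (constant series map to 0)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


raw = [10, 12, None, 16, -1, 20]   # hypothetical hourly PM readings
filled = interpolate_gaps(clip_outliers(mark_missing(raw)))
standardized = zscore(filled)
```

For variables that can legitimately be negative, such as temperature in degrees Celsius, `nonnegative=False` would be the appropriate setting.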

Spatial harmonization was achieved by mapping each air-quality station to its nearest meteorological and satellite grid cell (≤5 km radius). Satellite AOD and LST rasters were resampled using bilinear interpolation to align with ground-station coordinates. Population and health indicators were subsequently joined at the administrative city level through spatial joins based on latitude-longitude centroids.

After data cleaning and spatio-temporal harmonization, a unified feature matrix was built to capture atmospheric dynamics, pollutant behaviour, and health exposure factors. Temporal dependencies were introduced through lagged pollutant terms ($t - 1$, $t - 3$, $t - 24$) and moving averages, while meteorological coupling of wind, humidity, and temperature represented dispersion and stability effects. Satellite variables such as Aerosol Optical Depth (AOD) and Land Surface Temperature (LST) were fused to extend spatial coverage beyond the ground network.

Physics-informed features, including Gaussian plume dispersion coefficients ($\sigma_{y}$, $\sigma_{z}$) and wind-weighted transport distances, provided interpretable priors on local diffusion. Demographic and health indicators (population density and normalized morbidity rates) were added as exposure-weighting factors. All features were standardized using z-score normalization and partitioned chronologically into 70% training, 15% validation, and 15% testing sets to preserve temporal integrity.
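As an illustration of the lagged terms and the chronological 70/15/15 partition described above, the following sketch builds lag features from a single hourly series and splits the rows without shuffling. The helper names and the integer-percentage arguments are hypothetical choices for this example only.

```python
def make_lagged_rows(series, lags=(1, 3, 24)):
    """Return (features, target) pairs, where features holds series[t - k] for each lag k."""
    start = max(lags)  # first index for which every lag is available
    rows = []
    for t in range(start, len(series)):
        features = [series[t - k] for k in lags]
        rows.append((features, series[t]))
    return rows


def chronological_split(rows, train_pct=70, val_pct=15):
    """Split rows in time order; no shuffling, so later data never leaks into training."""
    n = len(rows)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return rows[:i], rows[i:j], rows[j:]


hourly = list(range(124))            # stand-in for an hourly pollutant series
rows = make_lagged_rows(hourly)
train, val, test = chronological_split(rows)
```

Because the split is positional, every validation and test target lies strictly later in time than every training target.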

3.2. Physics-Based Module

Building upon the preprocessed and feature-engineered multimodal dataset, the physics-based module provides a physically interpretable foundation for pollutant dispersion and serves as the prior component of the hybrid framework. While data-driven networks can capture nonlinear dependencies, they often lack the ability to represent atmospheric transport governed by fluid-dynamic principles. The physics-based model bridges this gap by explicitly simulating how emitted pollutants disperse and dilute under varying meteorological conditions, producing baseline concentration estimates that guide the deep-learning correction stage.

The module employs a simplified Gaussian plume dispersion formulation, widely used in mesoscale air-quality modeling for near-ground emission and transport analysis. For a continuous point source, the steady-state concentration distribution at a location $(x, y, z)$ downwind of the emission source is expressed as:

$$C_{p}(x, y, z) = \frac{Q}{2 \pi u \sigma_{y} \sigma_{z}} \exp\left(-\frac{y^{2}}{2 \sigma_{y}^{2}}\right) \left[\exp\left(-\frac{(z - H)^{2}}{2 \sigma_{z}^{2}}\right) + \exp\left(-\frac{(z + H)^{2}}{2 \sigma_{z}^{2}}\right)\right] \tag{1}$$

where $Q$ is the emission rate (g s⁻¹), $u$ is the mean wind speed (m s⁻¹), $H$ is the effective stack height (m), and $\sigma_{y}$ and $\sigma_{z}$ represent horizontal and vertical dispersion coefficients (m) that vary with atmospheric stability class and downwind distance $x$. Meteorological parameters from the Saudi Hourly Weather Dataset [29], particularly wind speed, direction, temperature, and boundary-layer stability, are used to compute these coefficients following Pasquill-Gifford relationships. Emission rates are approximated from the KAPSARC air pollutant emission dataset [26] and normalized by industrial sector activity levels.

For each pollutant species (PM₂.₅, PM₁₀, NO₂, SO₂, CO, and O₃), the model generates baseline concentration fields across discretized spatial grids surrounding the monitoring stations. These physics-derived outputs encode the dominant atmospheric transport, diffusion, and decay behavior under specific meteorological regimes. Although the Gaussian plume formulation simplifies complex urban turbulence and chemical transformations, it provides a physically meaningful prior that constrains the learning space of the neural network. The resulting simulated concentrations are stored as additional input channels to the deep-learning module, enabling the hybrid system to exploit both physical consistency and data-driven adaptability in subsequent prediction stages.
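Equation (1) can be evaluated directly once the dispersion coefficients are known. The sketch below takes $\sigma_{y}$ and $\sigma_{z}$ as inputs rather than computing them from a stability class, and all numerical values are hypothetical and chosen only to exercise the formula.

```python
import math


def gaussian_plume(y, z, Q, u, H, sigma_y, sigma_z):
    """Steady-state concentration (g m^-3) from Eq. (1) for a continuous point source.

    y, z     : crosswind and vertical receptor coordinates (m)
    Q        : emission rate (g s^-1)
    u        : mean wind speed (m s^-1)
    H        : effective stack height (m)
    sigma_y, sigma_z : dispersion coefficients (m) at the receptor's downwind distance
    """
    lateral = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Second exponential is the ground-reflection (image source) term.
    vertical = (math.exp(-(z - H) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + H) ** 2 / (2 * sigma_z ** 2)))
    return Q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Hypothetical ground-level source and centreline receptor.
c_centre = gaussian_plume(y=0.0, z=0.0, Q=100.0, u=5.0, H=0.0,
                          sigma_y=20.0, sigma_z=10.0)
```

For a ground-level source and receptor on the plume centreline, both vertical exponentials equal one, so the expression reduces to $Q / (\pi u \sigma_{y} \sigma_{z})$. This is a convenient sanity check on any implementation.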

3.3. Deep Learning Module

Following the generation of baseline concentration fields by the physics-based model, the deep learning module refines these estimates by learning data-driven correction patterns that deterministic formulations miss, using multimodal information derived from ground measurements, meteorological variables, and satellite observations. The hybrid coupling enables the network to learn both the deterministic transport structure encoded in the physics model and the stochastic variability present in the observational data.

The proposed architecture adopts a convolutional neural network-long short-term memory (CNN-LSTM) framework to jointly exploit spatial and temporal dependencies in the multimodal dataset. Convolutional layers extract localized spatial correlations among pollutant, meteorological, and satellite-derived features, while LSTM layers model temporal dynamics, seasonality, and lagged dependencies in pollutant evolution. The multimodal feature tensor $x_{t}$ at time $t$ includes ground-level pollutant concentrations, meteorological parameters (temperature, humidity, wind speed and direction, and pressure), satellite-based aerosol optical depth (AOD) and land-surface temperature (LST), together with physics-model predictions. All heterogeneous features are standardized and combined into a unified input representation prior to model training.

The CNN-LSTM architecture consists of two convolutional layers followed by an LSTM temporal modeling block and a fully connected regression layer. The convolutional component employs 32 and 64 filters with kernel size $3 \times 3$, ReLU activation, and batch normalization. The temporal component contains an LSTM layer with 128 hidden units operating on sliding temporal windows of length $T = 24$ h to capture diurnal pollutant dynamics. Dropout with a rate of 0.2 is applied before the final dense layer to mitigate overfitting.

The hybrid integration is implemented through a residual learning formulation expressed as

$$\hat{y}_{\mathrm{hybrid}} = y_{\mathrm{physics}} + f_{\theta}(x_{t}) \tag{2}$$

where $y_{\mathrm{physics}}$ denotes the pollutant concentration estimated by the dispersion model and $f_{\theta}(\cdot)$ represents the nonlinear mapping learned by the CNN-LSTM network with parameters $\theta$. The model is optimized using the composite loss function

$$\mathcal{L} = \mathrm{MAE}(y_{\mathrm{obs}}, \hat{y}_{\mathrm{hybrid}}) + \lambda \left\| \nabla \hat{y}_{\mathrm{hybrid}} - \nabla y_{\mathrm{physics}} \right\|_{2}^{2} \tag{3}$$

where $\lambda$ controls the balance between prediction accuracy and physical smoothness.
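To make Equations (2) and (3) concrete, the sketch below evaluates the residual prediction and the composite loss on a one-dimensional grid. Two points are assumptions rather than details taken from the text: the gradient operator is approximated by a forward difference along a single axis, and the value of $\lambda$ (here 0.1) is a placeholder.

```python
def hybrid_prediction(y_physics, correction):
    """Eq. (2): physics prior plus the learned residual f_theta(x_t)."""
    return [p + r for p, r in zip(y_physics, correction)]


def forward_diff(field):
    """Forward-difference approximation of the gradient along one grid axis."""
    return [b - a for a, b in zip(field, field[1:])]


def composite_loss(y_obs, y_hybrid, y_physics, lam=0.1):
    """Eq. (3): MAE data term plus a squared L2 penalty on the gradient mismatch."""
    n = len(y_obs)
    mae = sum(abs(o - h) for o, h in zip(y_obs, y_hybrid)) / n
    grad_gap = [gh - gp for gh, gp in
                zip(forward_diff(y_hybrid), forward_diff(y_physics))]
    penalty = sum(g * g for g in grad_gap)
    return mae + lam * penalty


y_physics = [1.0, 2.0, 3.0, 4.0]
# A spatially constant correction leaves the gradient of the prior unchanged.
smooth = hybrid_prediction(y_physics, [0.5, 0.5, 0.5, 0.5])
# An oscillating correction changes the gradient and is penalized.
rough = hybrid_prediction(y_physics, [0.0, 1.0, 0.0, 1.0])
loss_smooth = composite_loss(smooth, smooth, y_physics)
loss_rough = composite_loss(rough, rough, y_physics)
```

Under this reading, the penalty does not constrain the magnitude of the correction, only how much it distorts the spatial structure of the physics-derived field.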

To ensure reproducibility, the dataset was chronologically divided into training (70%), validation (15%), and testing (15%) subsets to preserve temporal dependencies in the pollutant time series. Input sequences were generated using sliding windows of length $T = 24$ h, where each window predicts the pollutant concentration at the subsequent time step. Hyperparameters were selected through validation-based grid search. The final model configuration used the Adam optimizer with a learning rate of $10^{-4}$ and batch size of 64. Training was performed for a maximum of 100 epochs with early stopping based on validation MAE (patience = 15 epochs and minimum improvement threshold $10^{-4}$). The model was implemented in the PyTorch (v2.9.1) framework and trained on NVIDIA GPU hardware.
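The windowing and early-stopping rules stated above can be sketched independently of any deep-learning library. The class and function names below are hypothetical, and the stopping logic is one common interpretation of "patience" and "minimum improvement", not a reproduction of the study's training loop.

```python
def sliding_windows(series, T=24):
    """Pair each length-T window with the value at the next time step."""
    return [(series[i:i + T], series[i + T]) for i in range(len(series) - T)]


class EarlyStopping:
    """Stop when validation MAE has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=15, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_mae):
        """Record one epoch's validation MAE; return True if training should stop."""
        if val_mae < self.best - self.min_delta:
            self.best = val_mae
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


windows = sliding_windows(list(range(30)), T=24)

# Short patience here only to keep the illustration compact.
stopper = EarlyStopping(patience=3, min_delta=1e-4)
decisions = [stopper.step(m) for m in [1.0, 0.9, 0.9, 0.90005, 0.89995]]
```

In the toy history, the last three epochs never improve on 0.9 by more than the threshold, so the third of them triggers the stop.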

Upon training, the CNN-LSTM produces refined pollutant forecasts and air-quality indices ($AQI_{t}$) at hourly to daily resolution. The resulting pollutant estimates, combined with meteorological and demographic variables, are subsequently used in the health-risk estimation module.

3.4. Health-Risk Estimation Module

Using the pollutant prediction produced by the hybrid CNN-LSTM framework, the health-risk module evaluates population exposure levels and models their association with respiratory health outcomes. While the previous stages focus on atmospheric prediction, this component translates predicted air-quality indices into measurable public-health indicators by integrating demographic, pollutant, and epidemiological information obtained from open data sources.

The module first computes a composite Exposure-Health Index (EHI) that aggregates the contributions of multiple pollutants according to their relative toxicological importance. The exposure index at time $t$ is defined as:

$$EHI_{t} = \sum_{i = 1}^{n} w_{i} C_{i}(t) \tag{4}$$

where $C_{i}(t)$ represents the predicted concentration of pollutant $i$ at time $t$, and $w_{i}$ denotes pollutant-specific weighting coefficients derived from WHO air-quality guidelines [31]. The EHI thus reflects the cumulative inhalation burden across particulate and gaseous pollutants.
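Equation (4) is a weighted sum over pollutants, as the following sketch shows. The weights used here are placeholder values for illustration only. The text states that the actual coefficients are derived from WHO guidelines but does not list them, so no attempt is made to reproduce them.

```python
def exposure_health_index(concentrations, weights):
    """Eq. (4): weighted sum of predicted pollutant concentrations at one time step.

    concentrations : dict mapping pollutant name -> predicted concentration C_i(t)
    weights        : dict mapping pollutant name -> weighting coefficient w_i
    """
    missing = set(concentrations) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    return sum(weights[name] * c for name, c in concentrations.items())


# Placeholder weights and concentrations (hypothetical, not from the study).
weights = {"PM2.5": 0.5, "NO2": 0.25}
predicted = {"PM2.5": 40.0, "NO2": 20.0}
ehi = exposure_health_index(predicted, weights)
```

Raising an error for a pollutant without a weight avoids silently dropping a species from the index.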

To model the association between exposure and health outcomes, the time series of $EHI_{t}$ is aggregated at daily or weekly resolution and paired with open-source health indicators such as respiratory-admission counts and morbidity rates from the WHO Global Health Observatory and Kaggle public-health datasets [31]. A nonlinear regression model based on gradient-boosted trees (XGBoost) is trained to map exposure levels to health responses:

$$\hat{H}_{t} = g_{\phi}(EHI_{t}, z_{t}) \tag{5}$$

where $\hat{H}_{t}$ is the predicted health-risk indicator, $z_{t}$ is the vector of auxiliary features (temperature, humidity, population density, and lagged exposures), and $g_{\phi}(\cdot)$ denotes the trained regression model with parameters $\phi$.

Model performance is evaluated using standard statistical metrics, including the coefficient of determination ($R^{2}$), root-mean-square error, and Pearson correlation between predicted and observed health indicators. The resulting health-risk estimations provide an interpretable quantitative link between air-quality dynamics and respiratory health outcomes, enabling the hybrid framework to serve as a comprehensive tool for both environmental forecasting and public-health assessment.

To account for the delayed physiological response to air pollution exposure, a lag-correlation analysis is performed in which pollutant predictions are temporally aligned with health indicators using a short-term lag structure. Specifically, exposure levels are aggregated and evaluated with lag windows of 1–3 days prior to the reported health outcomes. This approach reflects established epidemiological evidence indicating that respiratory hospital admissions often increase within several days following elevated particulate matter concentrations. The lag analysis therefore enables the model to capture short-term exposure-response relationships between predicted pollutant fields and respiratory morbidity indicators while maintaining temporal consistency between environmental and health datasets.
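The 1–3 day lag analysis amounts to correlating exposure on day $t - k$ with the health indicator on day $t$. The sketch below computes the Pearson correlation for each lag. The function names and the synthetic series, in which the outcome simply echoes exposure two days later, are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lag_correlation(exposure, outcome, lags=(1, 2, 3)):
    """Correlate exposure at day t - k with the outcome at day t, for each lag k >= 1."""
    return {k: pearson(exposure[:-k], outcome[k:]) for k in lags}


# Synthetic daily series: the outcome repeats the exposure with a two-day delay.
exposure = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
outcome = [0.0, 0.0] + exposure[:-2]
by_lag = lag_correlation(exposure, outcome)
best_lag = max(by_lag, key=by_lag.get)
```

On real data the correlations would be far from perfect, and the lag with the highest value is only suggestive of the dominant delay.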

3.5. Multimodal Data Fusion

The preceding modules operate on heterogeneous data sources that capture complementary aspects of the atmospheric and health system. To effectively integrate these information streams, the proposed framework employs a multimodal data-fusion strategy that combines ground-sensor, meteorological, satellite, and physics-based features within a unified representation. This fusion process enables the model to exploit spatial, temporal, and cross-domain dependencies that would otherwise remain uncorrelated across separate modalities.

In the proposed design, data fusion is implemented at two hierarchical levels: feature-level (early) fusion and decision-level (late) fusion. In the early-fusion stage, raw features from ground stations (PM₂.₅, PM₁₀, NO₂, SO₂, CO, O₃), meteorological variables (temperature, humidity, wind speed/direction, and pressure), satellite-based aerosol optical depth (AOD) and land-surface temperature (LST), and physics-derived pollutant estimates are concatenated into a composite input tensor:

$$X_{t} = [C_{t}, M_{t}, S_{t}, P_{t}] \tag{6}$$

where $C_{t}$ represents observed pollutant concentrations, $M_{t}$ denotes meteorological variables, $S_{t}$ contains satellite-derived attributes, and $P_{t}$ corresponds to physics-model outputs at time $t$. The combined feature tensor $X_{t}$ is fed into the CNN-LSTM framework, enabling unified learning of spatial and temporal dependencies across multiple data modalities.

To enhance generalization and robustness, a secondary late-fusion stage is applied at the decision level. Here, independent sub-models trained on different data subsets (e.g., sensor-only, meteorology-only, satellite-only) produce partial predictions $\hat{y}_{t}^{(k)}$, which are subsequently aggregated through a weighted-ensemble formulation:

$$\hat{y}_{\mathrm{fusion}} = \sum_{k = 1}^{K} \alpha_{k} \hat{y}_{t}^{(k)} \tag{7}$$

where $\alpha_{k}$ denotes the weight assigned to each sub-model prediction, constrained such that $\sum_{k = 1}^{K} \alpha_{k} = 1$. The ensemble weights are optimized on the validation dataset to minimize overall mean absolute error.
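One simple way to choose the weights in Equation (7) is an exhaustive search over a discretized simplex, scoring each candidate by validation MAE. The text does not specify the optimizer, so the grid search, the non-negativity restriction on $\alpha_{k}$, and the toy predictions below are all assumptions made for illustration.

```python
from itertools import product


def fuse(preds, weights):
    """Eq. (7): weighted sum of K sub-model prediction series."""
    return [sum(w * p[t] for w, p in zip(weights, preds))
            for t in range(len(preds[0]))]


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_ensemble_weights(preds, y_val, steps=20):
    """Grid-search non-negative weights summing to 1 that minimize validation MAE."""
    K = len(preds)
    best_err, best_w = float("inf"), None
    for combo in product(range(steps + 1), repeat=K - 1):
        if sum(combo) > steps:
            continue  # remaining weight would be negative
        w = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        err = mae(y_val, fuse(preds, w))
        if err < best_err:
            best_err, best_w = err, w
    return best_w, best_err


# Toy validation data: one sub-model biased high, one biased low, one biased higher.
y_val = [10.0, 20.0, 30.0, 40.0]
sensor_only = [12.0, 22.0, 32.0, 42.0]
met_only = [8.0, 18.0, 28.0, 38.0]
sat_only = [15.0, 25.0, 35.0, 45.0]
alpha, val_err = fit_ensemble_weights([sensor_only, met_only, sat_only], y_val)
```

The search cost grows quickly with the number of sub-models, so for larger $K$ a constrained optimizer would be a more practical choice.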

The hierarchical fusion process aligns low-level correlations and high-level modality trends, allowing both physical interactions and abstract representations to inform the final output. By integrating physically constrained inputs with observational and remote-sensing data, the proposed multimodal fusion enhances predictive accuracy, mitigates data sparsity from individual sources, and improves the interpretability of spatial and temporal pollution patterns that drive health-risk variability.

3.6. Model Evaluation and Explainability

The performance of the proposed hybrid framework is evaluated by comparing predicted pollutant concentrations and exposure indicators with observed measurements. The evaluation considers both predictive accuracy and model interpretability to ensure that the hybrid approach remains consistent with observed environmental patterns and underlying atmospheric processes.

To prevent temporal data leakage, the dataset is chronologically divided into training (70%), validation (15%), and testing (15%) subsets. The testing set is used for final performance assessment. For cross-city transfer learning, the model trained on the Riyadh dataset is evaluated on unseen datasets from Jeddah and Dammam. Seasonal evaluation is also conducted to assess model robustness under different meteorological conditions.

Quantitative performance is assessed using three standard statistical metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and the coefficient of determination ($R^{2}$), defined as

\begin{aligned} \mathrm{MAE} &= \frac{1}{N} \sum_{i=1}^{N} | y_{i} - \hat{y}_{i} | \\ \mathrm{RMSE} &= \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}} \\ R^{2} &= 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}} \end{aligned}

(8)

where $y_{i}$ and $\hat{y}_{i}$ denote observed and predicted values, $\bar{y}$ is the sample mean, and $N$ is the number of testing samples. Metrics are computed for individual pollutants ($PM_{2.5}$, $PM_{10}$, $NO_{2}$, $SO_{2}$, $CO$, and $O_{3}$) as well as aggregated air-quality and exposure indicators. Model performance is compared across three configurations: (i) physics-based model only, (ii) CNN–LSTM model only, and (iii) the proposed hybrid physics-informed framework.
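The three metrics in Equation (8) translate directly into code. The sketch below uses plain Python and made-up observed and predicted values purely to show the arithmetic.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not taken from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = (mae(observed, predicted),
          rmse(observed, predicted),
          r_squared(observed, predicted))
```

For these values the absolute errors are 2, 2, 3, and 3, giving MAE = 2.5, RMSE = sqrt(6.5), and R² = 1 - 26/500 = 0.948.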

Explainability analysis is used to examine the interpretability and physical consistency of the model. Feature attribution is computed using SHapley Additive exPlanations (SHAP) to quantify the contribution of each input variable to the predicted pollutant concentrations. Wind speed, humidity, and aerosol optical depth consistently emerge as dominant predictors, which aligns with established atmospheric transport and aerosol processes. Spatial attention patterns derived from the CNN component identify regions that most strongly influence dispersion corrections, while temporal inspection of the LSTM component captures persistence and lagged dependencies in pollutant dynamics. At the output stage, partial dependence analysis is used to examine nonlinear relationships between predicted exposure levels and respiratory morbidity indicators.
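To make the attribution idea concrete, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating feature coalitions. It is not the SHAP library and not the model used in the study: the toy model, its coefficients, and the baseline input are invented, and the enumeration scales exponentially with the number of features, which is why practical tools rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions for one input `x` relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model: PM2.5 falls with wind speed, rises with AOD,
# and humidity interacts with AOD. All coefficients are invented.
def toy_model(features):
    wind, humidity, aod = features
    return 60.0 - 4.0 * wind + 30.0 * aod + 0.2 * humidity * aod


x = [2.0, 50.0, 1.0]         # wind (m/s), relative humidity (%), AOD
baseline = [5.0, 30.0, 0.5]  # reference input
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that SHAP summaries rely on.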

Model robustness is further assessed through sensitivity tests in which primary meteorological inputs are perturbed by $\pm 10\%$. Across these analyses, the hybrid framework demonstrates stable feature attribution patterns and physically plausible response behavior, supporting the reliability and interpretability of the proposed approach. To ensure robustness of the reported results, each experiment was repeated across multiple random initialization seeds. The performance metrics reported in this study correspond to the mean values obtained across these runs, which reduces the influence of stochastic variations during model training.
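A perturbation test of this kind can be sketched as below: one input feature is scaled by 0.9 and 1.1 and the resulting change in MAE is reported relative to the unperturbed run. The stand-in model, feature names, and data are hypothetical and serve only to show the procedure.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_sensitivity(model, inputs, targets, feature, rel_change=0.10):
    """Percentage change in MAE when `feature` is scaled by (1 +/- rel_change)."""
    base_mae = mae(targets, [model(row) for row in inputs])
    changes = {}
    for sign in (1, -1):
        scale = 1.0 + sign * rel_change
        perturbed = [dict(row, **{feature: row[feature] * scale}) for row in inputs]
        new_mae = mae(targets, [model(row) for row in perturbed])
        label = f"{sign * rel_change:+.0%}"
        changes[label] = 100.0 * (new_mae - base_mae) / base_mae
    return base_mae, changes


# Hypothetical stand-in for a trained predictor; coefficients are invented.
def toy_model(row):
    return 80.0 - 5.0 * row["wind_speed"] + 0.3 * row["humidity"]


inputs = [
    {"wind_speed": 2.0, "humidity": 40.0},
    {"wind_speed": 4.0, "humidity": 30.0},
    {"wind_speed": 6.0, "humidity": 20.0},
]
targets = [84.0, 70.0, 54.0]  # made-up observations
base_mae, changes = mae_sensitivity(toy_model, inputs, targets, "wind_speed")
```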

4. Results and Discussion

This section presents the experimental evaluation and discussion of the proposed hybrid physics-informed and multimodal deep learning framework. The results validate the framework’s accuracy, robustness, interpretability, and health-risk relevance using open-source datasets from major Saudi cities as a case study. All experiments are conducted on the datasets described in Table 2.

4.1. Model Performance Evaluation

Quantitative evaluation is performed to compare the predictive performance of three model configurations: (i) physics-based dispersion model, (ii) data-driven deep-learning model, and (iii) the proposed hybrid physics-informed model. Table 3 summarizes the MAE, RMSE, and coefficient of determination ($R^{2}$) for each pollutant species ($PM_{2.5}$, $PM_{10}$, $NO_{2}$, $SO_{2}$, $CO$, and $O_{3}$). The hybrid model consistently achieves the lowest MAE and RMSE across all pollutants. In Table 3, its MAE and RMSE are roughly 23–25% lower than those of the standalone CNN-LSTM model, and $R^{2}$ rises from 0.84 to 0.91.
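The relative differences between configurations can be recomputed directly from the Table 3 entries. The short sketch below does this arithmetic; the helper name is illustrative and the numbers are copied from Table 3.

```python
def relative_change(baseline, proposed):
    """Percentage change of `proposed` relative to `baseline`."""
    return 100.0 * (proposed - baseline) / baseline


# Values copied from Table 3.
table3 = {
    "Physics-based": {"MAE": 12.6, "RMSE": 17.4, "R2": 0.72},
    "CNN-LSTM": {"MAE": 8.9, "RMSE": 13.2, "R2": 0.84},
    "Hybrid": {"MAE": 6.7, "RMSE": 10.1, "R2": 0.91},
}

for name in ("Physics-based", "CNN-LSTM"):
    for metric in ("MAE", "RMSE", "R2"):
        change = relative_change(table3[name][metric], table3["Hybrid"][metric])
        print(f"Hybrid vs {name}, {metric}: {change:+.1f}%")
```

With these values, the hybrid MAE is about 24.7% below the CNN-LSTM MAE and about 46.8% below the physics-based MAE, while the hybrid $R^{2}$ is about 8.3% higher than the CNN-LSTM value in relative terms.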

The temporal prediction capability of the proposed framework is illustrated in Figure 2 and Figure 3, which show observed and predicted hourly $PM_{2.5}$ concentrations for Riyadh during January 2023. Each figure compares the physics-based dispersion model, the deep-learning-only model, and the proposed hybrid physics-informed multimodal model against measured data.

As shown in Figure 2, the hybrid model closely follows observed $PM_{2.5}$ concentrations throughout the month, accurately reproducing both diurnal variations and episodic peaks. The physics-only model tends to underestimate during stagnant wind periods, while the deep-learning model smooths short-term fluctuations. In contrast, the hybrid framework effectively merges the strengths of both approaches, achieving improved temporal accuracy and lower bias.

A magnified view for one representative week (10–17 January 2023) is presented in Figure 3. The hybrid model maintains high correlation with observations, capturing transient peaks linked to morning inversion and evening emission events. Overall, the results confirm that integrating physical dispersion priors with data-driven learning significantly enhances short-term air-quality forecasting performance.

The hybrid model exhibits markedly improved temporal fidelity in reproducing observed daily $PM_{2.5}$ trends for Riyadh during 2022–2023. As illustrated in Figure 4, the hybrid configuration accurately captures both the amplitude and phase of diurnal and episodic variations, including transient dust peaks that the physics-only and CNN-LSTM models tend to smooth or lag. This close alignment indicates that the hybrid formulation successfully couples meteorological drivers, emission dynamics, and atmospheric dispersion behavior, leading to more realistic short-term pollutant evolution across varying boundary-layer regimes.

Figure 5 presents the regression analysis between the observed and predicted $PM_{2.5}$ concentrations generated by the hybrid physics-informed learning framework. The results reveal a strong linear association characterized by a regression slope approaching unity and a coefficient of determination of $R^{2} = 0.97$. This high degree of correlation indicates that the hybrid configuration reproduces the observed pollutant variability with minimal amplitude distortion or systematic offset. The preservation of near-linear proportionality across the full concentration domain demonstrates that the model successfully integrates deterministic dispersion dynamics with nonlinear data-driven corrections, thereby enhancing predictive coherence while maintaining physical consistency within the concentration-response space.

To assess the robustness of the performance improvements, a paired t-test was conducted on the prediction errors across the test dataset. The results indicate that the hybrid model significantly outperforms both the physics-only and CNN–LSTM baselines (p < 0.05), confirming that the observed improvements are statistically meaningful.
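A paired comparison of this kind can be sketched as follows. The snippet computes the paired t statistic from per-sample absolute errors and, as a stated simplification, approximates the two-sided p-value with a normal distribution, which is reasonable only for large test sets. The error series are synthetic and the function names are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic and degrees of freedom; positive when errors_a exceed errors_b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


def approx_two_sided_p(t_stat):
    """Normal approximation to the two-sided p-value (large-sample assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Synthetic absolute errors for a baseline and a hybrid model (illustration only).
baseline_err = [9.0 + (i % 5) for i in range(40)]
hybrid_err = [7.0 + ((i * 3) % 4) for i in range(40)]

t_stat, dof = paired_t_statistic(baseline_err, hybrid_err)
p_value = approx_two_sided_p(t_stat)
```

For small samples, the p-value should instead be taken from the t distribution with `dof` degrees of freedom.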

4.2. Ablation Study of Multimodal Fusion Strategy

To evaluate the contribution of the proposed multimodal fusion strategy, a series of ablation experiments were conducted comparing different fusion configurations. Four model variants were evaluated, i.e., (i) sensor-only input using ground monitoring data, (ii) early feature-level fusion combining sensor, meteorological, satellite, and physics-derived inputs, (iii) late decision-level fusion using ensemble aggregation of modality-specific predictors, and (iv) the proposed hierarchical fusion integrating both stages.

As shown in Table 4, incorporating multimodal inputs improves predictive performance compared with the sensor-only configuration. Early fusion enhances spatial and meteorological feature interaction, while late fusion improves robustness through ensemble aggregation. The hierarchical fusion strategy achieves the best performance, reducing MAE to 6.7 μg/m³ and increasing $R^{2}$ to 0.91.

4.3. Spatial and Temporal Analysis

The spatial prediction capability of the proposed framework was further examined using a representative dust event over Riyadh. Figure 6 presents the predicted $PM_{2.5}$ concentration fields obtained from the physics-only dispersion model, the CNN-LSTM model, and the hybrid physics-informed model. Each subfigure shows the spatial plume morphology over the Riyadh metropolitan area (24.4°–25.1° N, 46.4°–47.2° E) under prevailing northwesterly (Shamal) winds.

As shown in Figure 6a, the physics-based model reproduces the general downwind dispersion pattern but underestimates concentrations near the emission core owing to simplified boundary-layer representation. The CNN-LSTM model in Figure 6b captures the overall spatial variability more effectively but produces slightly diffused concentration gradients. The hybrid model in Figure 6c combines the strengths of both approaches, accurately resolving high-concentration zones near the city-center while maintaining realistic plume spreading along the northwest-southeast axis. The improved spatial alignment and physical agreement of the predictions validate the effectiveness of combining physics-informed representations with neural learning for extreme dust-event forecasting.

4.4. Model Robustness and Transferability

To assess the robustness and generalization capability of the proposed hybrid framework, transfer-learning and sensitivity experiments were conducted across multiple Saudi cities with differing emission and meteorological profiles. The model trained on Riyadh data was fine-tuned and evaluated on Jeddah and Dammam datasets using the same open-source data streams.

Table 5 summarizes the cross-city performance of the physics-only, deep-learning-only, and hybrid configurations. The hybrid model consistently achieved the highest $R^{2}$ values (>0.85) and the lowest MAE, confirming effective transfer of learned dispersion and emission patterns. The physics-only model demonstrated limited adaptability due to simplified parameterization of surface roughness and atmospheric stability, while the deep-learning model exhibited higher variance under unseen meteorological conditions. By contrast, the hybrid approach maintained stability and physical coherence across all test domains.

A sensitivity analysis was further performed by perturbing key meteorological inputs, including wind speed and relative humidity, by $\pm 10\%$. The hybrid model’s MAE degraded by less than 5%, whereas the MAE of the pure deep-learning model degraded by up to 15%. These results indicate that physics-guided learning improves both baseline precision and tolerance to data uncertainties, enabling scalable and reliable air-quality prediction across varied Saudi environments.

4.5. Explainability and Feature Importance

The internal reasoning of the hybrid model was further examined using explainability techniques to ensure that its predictions are physically interpretable and aligned with established atmospheric processes. Two complementary analyses were performed: feature attribution through SHapley Additive exPlanations (SHAP) and spatial attention visualization from the convolutional layers. Figure 7a presents the SHAP feature-importance summary derived from the test dataset. The most influential predictors include wind speed, relative humidity, and aerosol optical depth (AOD), followed by temperature, surface pressure, and the physics-prior $PM_{2.5}$ field. The observed feature influences align with known atmospheric behavior during dust events, where strong winds facilitate particle transport and elevated humidity accelerates aggregation and deposition. Satellite-based AOD contributes significantly by capturing coarse-mode aerosol loadings when ground-level measurements are limited, demonstrating the value of multimodal fusion.

To further interpret spatial dependencies, attention weights were extracted from the convolutional encoder and visualized as shown in Figure 7b. The map highlights areas of high model focus concentrated around central Riyadh and along the northwest-southeast transport corridor, corresponding to the dominant Shamal wind flow. The identified high-attention zones align with major emission hotspots and urban corridors, demonstrating that the model captures physically meaningful spatial structures instead of spurious correlations. Combined SHAP and attention analyses validate the hybrid model’s predictive accuracy and transparency, supporting its interpretability for operational air-quality forecasting.

4.6. Seasonal and Regime-Based Evaluation

Table 6 summarizes the performance of all three model configurations across distinct meteorological regimes, confirming the hybrid framework’s consistent accuracy and adaptability under contrasting atmospheric conditions. During winter, when stable boundary layers and higher humidity are prevalent, the hybrid model achieved low error (MAE = 7.1 μg/m³, $R^{2}$ = 0.88), consistent with improved representation of inversion-driven pollutant accumulation relative to both baselines.

Under summer convective regimes, where turbulent mixing and photochemical reactions dominate, the hybrid model maintained high fidelity ($R^{2}$ = 0.89), while the CNN-LSTM exhibited mild underestimation of midday peaks. During dust events, the hybrid model outperformed the alternatives by leveraging wind information and satellite-derived AOD features, reducing RMSE from 20.3 μg/m³ (physics-only) to 13.1 μg/m³ and improving the representation of peak event levels. On clean-air days, the hybrid model maintained the best precision (MAE = 6.2 μg/m³, $R^{2}$ = 0.90) with low bias, indicating stable generalization across varying pollution regimes. Overall, these results suggest that the hybrid design combines dispersion-based structure with nonlinear residual learning to deliver accurate air-quality forecasts under diverse meteorological conditions.

4.7. Exposure Health Correlation Analysis

The predictive utility of the hybrid model was further evaluated by linking its outputs to population-level health indicators. Figure 8 presents the correlation between the predicted Exposure Health Index and observed respiratory admission rates for three major Saudi cities: Riyadh, Jeddah, and Dammam. Each plot demonstrates a positive exposure response trend, confirming the model’s capacity to reproduce realistic health-impact dynamics associated with fine-particulate pollution.

As shown in Figure 8a, the Riyadh case exhibits the strongest relationship ($R^{2} = 0.83$), reflecting the dominance of dust- and traffic-related particulates in the inland urban environment. Jeddah (Figure 8b) shows a slightly lower correlation ($R^{2} = 0.80$), attributed to marine aerosol influences and coastal dispersion, while Dammam (Figure 8c) yields a comparable $R^{2} = 0.82$, linked to industrial emissions and humid boundary conditions. Across all cities, the nonlinear pattern indicates short-term respiratory sensitivity to $PM_{2.5}$ exposure, consistent with established epidemiological evidence. The results confirm that the hybrid framework not only predicts pollutant concentrations with high fidelity but also provides meaningful health-risk correlations that can inform preventive air-quality policies.

5. Discussion

The proposed hybrid physics-informed deep learning framework improves both predictive accuracy and interpretability relative to conventional approaches. Across the evaluated case studies, the hybrid configuration reduced mean absolute error by approximately 20–30% and achieved $R^{2}$ values above 0.85, outperforming both the physics-only and CNN–LSTM baseline models. By incorporating dispersion-based priors and learning nonlinear residual corrections from multimodal inputs, the model preserves physically plausible behavior while adapting to complex environmental variability.

Explainability analysis indicates that wind speed, humidity, and aerosol optical depth consistently emerge as the most influential predictors, which aligns with established atmospheric transport and aerosol formation processes. The exposure–health analysis further demonstrates strong associations ($R^{2} = 0.80$–$0.83$) between predicted pollutant variability and respiratory morbidity indicators, supporting the potential usefulness of the framework for environmental health assessment.

Despite these promising results, several limitations remain. Prediction accuracy may decrease during rapidly evolving atmospheric events such as intense dust storms or sudden emission spikes, where pollutant concentrations change faster than the temporal window captured by the model. Additionally, missing or noisy sensor observations may affect the reliability of multimodal inputs and lead to slight underestimation of extreme concentration values.

The physics module also relies on a simplified Gaussian plume formulation, which assumes steady-state atmospheric conditions and homogeneous turbulence. While this approximation provides computational efficiency and reasonable large-scale dispersion estimates, it may not fully capture complex airflow patterns present in dense urban environments, such as street-canyon circulation, building-induced turbulence, or rapidly changing wind conditions. Under such conditions, the plume formulation may underestimate localized concentration gradients. The hybrid learning component partially compensates for these limitations by learning residual corrections from observational data, although incorporating higher-resolution urban dispersion models represents an important direction for future work.
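For reference, the textbook steady-state Gaussian plume with ground reflection, and the idea of adding a learned residual to it, can be sketched as below. The dispersion widths are passed in directly rather than derived from a stability class, the additive form of the correction is an assumption, and the residual is a supplied number standing in for the output of the CNN-LSTM; none of the parameter values come from the study.

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Steady-state Gaussian plume concentration with ground reflection.

    q: emission rate (g/s), u: wind speed (m/s), y: crosswind offset (m),
    z: receptor height (m), h: effective source height (m),
    sigma_y, sigma_z: dispersion widths (m) at the receptor's downwind distance.
    Returns concentration in g/m^3.
    """
    crosswind = math.exp(-y ** 2 / (2.0 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2.0 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2.0 * sigma_z ** 2)))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


def hybrid_prediction(physics_prior, residual_correction):
    """Physics prior plus a learned residual (assumed additive here)."""
    return physics_prior + residual_correction


# Illustrative ground-level source and receptor (all values invented).
centerline = gaussian_plume(q=100.0, u=4.0, y=0.0, z=0.0, h=0.0,
                            sigma_y=50.0, sigma_z=25.0)
off_axis = gaussian_plume(q=100.0, u=4.0, y=100.0, z=0.0, h=0.0,
                          sigma_y=50.0, sigma_z=25.0)
calm = gaussian_plume(q=100.0, u=2.0, y=0.0, z=0.0, h=0.0,
                      sigma_y=50.0, sigma_z=25.0)
corrected = hybrid_prediction(centerline, residual_correction=0.002)
```

Because concentration scales with 1/u, halving the wind speed doubles the predicted value, and the formula becomes unreliable as wind speed approaches zero. This illustrates why a steady-state plume needs data-driven correction under stagnant or rapidly changing conditions.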

Another limitation relates to the spatial aggregation of demographic and health indicators used in the exposure-health analysis. While the proposed framework generates high-resolution pollutant predictions, the available health and demographic datasets are aggregated at the administrative city level. Consequently, the exposure–response relationships reflect population-level trends rather than fine-scale neighborhood variability. Future studies integrating higher-resolution health and demographic data could further improve the alignment between predicted exposure patterns and localized population risk.

Overall, the results highlight the value of physics-guided learning for improving predictive stability and interpretability without sacrificing accuracy. The use of open data sources and a scalable implementation also suggests potential applicability for regional air-quality forecasting, exposure assessment, and near real-time health risk monitoring.

6. Conclusions

This study developed and evaluated a hybrid physics-informed deep learning framework for air-quality and health-risk prediction using fully open-source environmental datasets. The model integrates dispersion-based priors with multimodal inputs, including meteorological, satellite, and demographic variables, to improve spatiotemporal fidelity and interpretability. Across three Saudi cities, the proposed approach consistently outperformed baseline models, achieving up to 25% lower mean absolute error and $R^{2}$ values above 0.85 for $PM_{2.5}$ prediction. Explainability analyses using SHAP and attention visualization indicate that wind speed, humidity, and aerosol optical depth are among the dominant drivers, consistent with expected transport and aerosol behavior. Linking predicted pollutant fields with health indicators produced strong exposure-response associations ($R^{2} = 0.80$–$0.83$) across Riyadh, Jeddah, and Dammam, supporting the potential to connect atmospheric variability with respiratory morbidity signals. Overall, the results suggest that physics-guided learning can bridge deterministic dispersion structure with data-driven inference to produce accurate and interpretable air-quality forecasts using open data. Future work will focus on near real-time deployment, uncertainty quantification, and regional-scale modeling to support adaptive environmental monitoring and early warning applications.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found at the KAPSARC Data Portal and related public data repositories referenced in the manuscript. No new datasets were generated during the study.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Rahman, M.H.; Dipa, S.A.; Hasan, K.; Hasan, M.M. Health at risk: Respiratory, cardiovascular, and neurological impacts of air pollution. Innov. Environ. Econ. 2025, 1, 56–69.
2. Vallero, D.A. Fundamentals of Air Pollution; Academic Press: Cambridge, MA, USA, 2025.
3. Perveen, S.; Abbas, S.; Safdar, N. Strategies for Protecting Air and Water Quality in the Twenty-First Century. In Blue Sky, Blue Water: Strategies for Protecting Air and Water Quality in the 21st Century; Springer: Berlin/Heidelberg, Germany, 2025; pp. 457–474.
4. Rajesh, M.; Babu, R.G.; Moorthy, U.; Easwaramoorthy, S.V. Machine learning-driven framework for real-time air quality assessment and predictive environmental health risk mapping. Sci. Rep. 2025, 15, 28801.
5. Chibueze Izah, S. Smart Technologies in Environmental Monitoring: Enhancing Real-Time Data for Health Management. In Innovative Approaches in Environmental Health Management: Processes, Technologies, and Strategies for a Sustainable Future; Springer: Berlin/Heidelberg, Germany, 2025; pp. 199–224.
6. Yafooz, W. Enhancing Business Intelligence with Hybrid Transformers and Automated Annotation for Arabic Sentiment Analysis. Int. J. Adv. Comput. Sci. Appl. 2024, 15.
7. Rasool, M.; Noorwali, A.; Ghandorh, H.; Ismail, N.A.; Yafooz, W.M. Brain tumor classification using deep learning: A state-of-the-art review. Eng. Technol. Appl. Sci. Res. 2024, 14, 16586–16594.
8. Yafooz, W.; Alsaeedi, A.; Alluhaibi, R.; Abdel-Hamid, M.E. Enhancing multi-class web video categorization model using machine and deep learning approaches. Int. J. Electr. Comput. Eng. 2022, 12, 3176.
9. Tasioulis, T.; Karatzas, K. Reviewing explainable artificial intelligence towards better air quality modelling. In Environmental Informatics; Springer: Berlin/Heidelberg, Germany, 2023; pp. 3–19.
10. Schmitz, S.; Towers, S.; Villena, G.; Caseiro, A.; Wegener, R.; Klemp, D.; Langer, I.; Meier, F.; von Schneidemesser, E. Unravelling a black box: An open-source methodology for the field calibration of small air quality sensors. Atmos. Meas. Tech. 2021, 14, 7221–7241.
11. Khalid, S.; Yazdani, M.H.; Azad, M.M.; Elahi, M.U.; Raouf, I.; Kim, H.S. Advancements in physics-informed neural networks for laminated composites: A comprehensive review. Mathematics 2024, 13, 17.
12. Ren, Z.; Zhou, S.; Liu, D.; Liu, Q. Physics-informed neural networks: A review of methodological evolution, theoretical foundations, and interdisciplinary frontiers toward next-generation scientific computing. Appl. Sci. 2025, 15, 8092.
13. Klapa Antonion, X.W.; Raissi, M.; Joshie, L. Machine learning through physics–informed neural networks: Progress and challenges. Acad. J. Sci. Technol. 2024, 9, 2024.
14. Babar, F.F.; Jamil, F.; Babar, F.F. Intelligent handling of noise in federated learning with co-training for enhanced diagnostic precision. In Proceedings of the International Conference on Computational Collective Intelligence; Springer: Berlin/Heidelberg, Germany, 2024; pp. 279–291.
15. Jamil, F.; Jamil, H. Toward Intelligent Ethnicity Recognition and Face Anonymization: An IncepX-Ensemble Model. In Proceedings of the International Conference on Computational Collective Intelligence; Springer: Berlin/Heidelberg, Germany, 2024; pp. 243–255.
16. Gana, D.; Jamil, F. DAG-based swarm learning approach in healthcare: A survey. IEEE Access 2025, 13, 13796–13815.
17. Jamil, F.; Ahmad, S. Federated Swarm Intelligence for Adversarial Threat Mitigation through Self-Healing Anomaly Consensus Networks. In Proceedings of the 2025 IEEE 36th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC); IEEE: Piscataway, NJ, USA, 2025; pp. 1–7.
18. Hettige, K.H.; Ji, J.; Xiang, S.; Long, C.; Cong, G.; Wang, J. Airphynet: Harnessing physics-guided neural networks for air quality prediction. arXiv 2024, arXiv:2402.03784.
19. Chen, B.; Hu, J.; Wang, Y.; Feng, T.; Sun, W.; Feng, Z.; Yang, G.; Wang, H. An interpretable physics-informed deep learning model for estimating multiple air pollutants. GISci. Remote Sens. 2025, 62, 2482272.
20. Thakur, A.K.; Patel, S. Predicting spatiotemporal concentrations in a multizonal residential apartment using conventional and Physics-informed deep learning approach. ACS ES&T Air 2025, 2, 1996–2008.
21. Cao, C.; Debnath, R.; Alvarez, R.M. Physics-based machine learning for predicting urban air pollution using decadal time series data. Environ. Res. Commun. 2025, 7, 051009.
22. Wang, X.; Dong, B. Physics-informed hierarchical data-driven predictive control for building HVAC systems to achieve energy and health nexus. Energy Build. 2023, 291, 113088.
23. Li, L.; Wang, J.; Franklin, M.; Yin, Q.; Wu, J.; Camps-Valls, G.; Zhu, Z.; Wang, C.; Ge, Y.; Reichstein, M. Improving air quality assessment using physics-inspired deep graph learning. npj Clim. Atmos. Sci. 2023, 6, 152.
24. Kazemi Naeini, H.; Shomali, R.; Pishahang, A.; Hasanzadeh, H.; Asadi, S.; Gholizadeh Lonbar, A. PINN-DT: Optimizing Energy Consumption in Smart Building Using Hybrid Physics-Informed Neural Networks and Digital Twin Framework with Blockchain Security. Sensors 2025, 25, 6242.
25. Buckchash, H.; Biswas, B.M.; Agarwal, C.R.; Prasad, D.D.K. Hedging is not all you need: A simple baseline for online learning under haphazard inputs. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
26. KAPSARC Data Portal. Emissions of Air or Water Pollutants Dataset (2010–2018). Contains Saudi Arabia Emissions of Air and Water Pollutants for the Period 2010–2018, Compiled from the General Authority for Statistics, Riyadh, Saudi Arabia. 2018. Available online: https://data.kapsarc.org/explore/assets/emissions-of-air-or-water-pollutants2018/ (accessed on 25 August 2025).
27. DatasetEngineer, K. Riyadh Air Quality Dataset (2021–2023). Air Quality Measurements from Riyadh-Includes PM, Gas, Meteorological Data. 2023. Available online: https://www.kaggle.com/datasets/datasetengineer/riyadh-air-quality-dataset-2021-2023-by-kapsarc/data (accessed on 20 August 2025).
28. OpenAQ. OpenAQ Location 3185012 (Riyadh)—Air Quality Sensor Data. Measures PM1, PM2.5, PM10, RH, Temperature, Particle Counts. 2024. Available online: https://explore.openaq.org/locations/3185012 (accessed on 26 August 2025).
29. National Oceanic and Atmospheric Administration (NOAA). Saudi Arabia Hourly Climate Integrated Surface Data. 2024. Includes Hourly Observations of Wind, Sky Condition, Visibility, Air Temperature, Dew Point, and Sea Level Pressure for Multiple Stations Across Saudi Arabia (Last Five Years). Available online: https://datasource.kapsarc.org/explore/assets/saudi-hourly-weather-data/ (accessed on 22 August 2025).
30. NASA MODIS Science Data Support Team. MODIS/Terra+Aqua Land Aerosol Optical Depth Daily L2G Global 1km SIN Grid [Data Set]. 2022. Available online: https://www.earthdata.nasa.gov/data/catalog/lancemodis-mcd19a2n-6.1nrt (accessed on 5 August 2025).
31. Malaiarasugraj. Global Health Statistics. Global Health Indicator Data (e.g., Mortality, Disease, Demographics) from Various Sources. 2023. Available online: https://www.kaggle.com/datasets/malaiarasugraj/global-health-statistics (accessed on 20 August 2025).

Figure 1. Proposed hybrid physics-informed and data-driven framework for air-quality and health-risk prediction. The framework integrates physics-based dispersion modeling with multimodal data in a residual CNN-LSTM architecture for pollutant forecasting, exposure health estimation, and decision support.

Figure 2. Comparison of observed and predicted $PM_{2.5}$ concentrations in Riyadh during January 2023 using the Riyadh air-quality dataset (2021–2023) from the KAPSARC and OpenAQ monitoring networks with meteorological inputs from the Saudi Hourly Weather Dataset. Results compare the physics-based model, CNN–LSTM model, and the proposed hybrid framework.

Figure 3. Weekly comparison of observed and predicted $PM_{2.5}$ concentrations in Riyadh from 10–17 January 2023 using the Riyadh air-quality dataset (KAPSARC/OpenAQ) with corresponding meteorological inputs from the Saudi Hourly Weather Dataset. Predictions are generated by the physics-only, CNN–LSTM, and hybrid models.

Figure 4. Observed versus predicted $PM_{2.5}$ concentrations in Riyadh (2022–2023) using ground monitoring data from the KAPSARC and OpenAQ air-quality networks with meteorological inputs from the Saudi Hourly Weather Dataset.

Figure 5. Regression analysis between observed and predicted $PM_{2.5}$ concentrations using the Riyadh air-quality dataset (2021–2023). The results correspond to the hybrid physics-informed CNN–LSTM model configuration.

Figure 6. Model comparison of predicted $PM_{2.5}$ distributions over Riyadh during a dust event. Consistent color scaling (50–250 μg/m³) and prevailing wind vectors highlight spatial dispersion patterns.

Figure 7. Explainability analysis of the hybrid model. (a) SHAP feature importance summary showing dominant meteorological, satellite, and physics-prior predictors. (b) Spatial attention map highlighting high-influence regions corresponding to urban and downwind zones over Riyadh.

Figure 8. Correlation between predicted Exposure Health Index and observed respiratory admission indicators across major Saudi cities. (a) Riyadh correlation between predicted Exposure Health Index and respiratory admission indicators ($R^{2} = 0.83$). (b) Jeddah correlation between predicted Exposure Health Index and respiratory admission indicators ($R^{2} = 0.80$). (c) Dammam correlation between predicted Exposure Health Index and respiratory admission indicators ($R^{2} = 0.82$). The exposure–response association is strongest in Riyadh and remains positive in Jeddah and Dammam under differing meteorological and emission conditions.

Table 1. Comparative Technical Evaluation of Related and Proposed Frameworks.

Study	Physical Coherence	Model Transparency	Multimodal Integration	Real-Time Scalability	Health-Risk Coupling	Overall Advancement
[18]	✓	✓	✓	✗	✗	Physics-guided GNN improves accuracy but computationally heavy.
[19]	✓	✓	✗	✗	✗	High interpretability and pollutant synergy; limited cross-domain scalability.
[20]	✓	✗	✗	✗	✗	Indoor PI-LSTM captures interzonal transport but lacks scalability.
[21]	✓	✗	✓	✓	✗	PBML integrates traffic and meteorology data for long-term prediction.
[22]	✓	✗	✓	✓	✗	PINN-based hierarchical control enhances energy efficiency; no pollutant focus.
[23]	✓	✗	✓	✗	✗	Deep GNN embedding of fluid dynamics improves fine-scale assessment.
[24]	✓	✓	✓	✓	✗	Secure and physics-consistent energy optimization using PINN–DT–Blockchain.
Proposed Model	✓	✓	✓	✓	✓	Unified hybrid framework combining physics-informed dispersion, multimodal fusion, and exposure–health linkage.

Table 2. Summary of datasets used for atmospheric, meteorological, and health data integration.

Dataset (Reference)	Temporal Resolution	Spatial Resolution	Sampling Frequency	Measurement Units	Update Frequency	Data Samples (Approx.)	Data Format
[26,27]	Hourly (2021–2023)	Station-level (∼5 km)	Hourly	μg/m³ (pollutants), AQI (index)	Monthly	∼26,000 hourly records per station	CSV/JSON
[28]	Hourly	Station-level	Hourly	μg/m³	Near real-time	∼15,000 hourly readings per site	API/CSV
[29]	Hourly (2017–2023)	Station-level	Hourly	°C, %, m/s, hPa, mm	Monthly	>50 million hourly records	CSV/NetCDF
[30]	Daily (1–5 km grids)	1–5 km gridded	Daily	AOD (unitless), K, W/m²	Continuous (NRT)	∼10⁷ pixel-level observations/day	HDF5/GeoTIFF
[31]	Annual to monthly	Administrative boundary	Monthly–Annual	Population count, rates per 100k	Annual	∼500–1000 records	CSV/XLSX

Table 3. Model performance comparison for pollutant prediction using the Riyadh air-quality dataset (2021–2023) with multimodal inputs including ground pollutant measurements, meteorological variables, and satellite-derived aerosol features.

Model	MAE (μg/m³)	RMSE (μg/m³)	$R^{2}$
Physics-based (Gaussian plume)	12.6	17.4	0.72
Deep Learning (CNN-LSTM)	8.9	13.2	0.84
Hybrid (Proposed)	6.7	10.1	0.91
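
The relative gains implied by Table 3 can be checked with a few lines of arithmetic. The following is a minimal sketch; the dictionary layout and the helper name `relative_reduction` are illustrative assumptions rather than part of the study's code, and the numbers are copied from the table.

```python
# Values copied from Table 3 (Riyadh, 2021-2023); units are ug/m^3 for MAE and RMSE.
table3 = {
    "Physics-based (Gaussian plume)": {"mae": 12.6, "rmse": 17.4},
    "Deep Learning (CNN-LSTM)": {"mae": 8.9, "rmse": 13.2},
    "Hybrid (Proposed)": {"mae": 6.7, "rmse": 10.1},
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


hybrid = table3["Hybrid (Proposed)"]
reductions = {
    name: {
        metric: round(relative_reduction(row[metric], hybrid[metric]), 1)
        for metric in ("mae", "rmse")
    }
    for name, row in table3.items()
    if name != "Hybrid (Proposed)"
}

for name, row in reductions.items():
    print(f"{name}: MAE -{row['mae']}%, RMSE -{row['rmse']}%")
```

With these values, the hybrid model's MAE is about 24.7% lower than the CNN-LSTM baseline and about 46.8% lower than the Gaussian plume baseline. The first figure is in line with the reduction of up to 25% quoted in the abstract.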

Table 4. Ablation comparison of multimodal fusion strategies for PM$_{2.5}$ prediction using the Riyadh air-quality dataset (2021–2023) under identical training settings for the CNN–LSTM architecture.

Model Configuration	MAE (μg/m³)	$R^{2}$
Sensor-only	9.8	0.82
Early fusion	8.1	0.87
Late fusion	7.6	0.88
Hierarchical fusion (Proposed)	6.7	0.91

Table 5. Cross-city transfer-learning performance using the Saudi urban air-quality datasets from Riyadh, Jeddah, and Dammam with identical hybrid model configurations.

Training City	Testing City	Model	MAE (μg/m³)	$R^{2}$
Riyadh	Jeddah	Physics-only	12.3	0.76
Riyadh	Jeddah	CNN-LSTM	9.7	0.84
Riyadh	Jeddah	Proposed	7.9	0.89
Riyadh	Dammam	Physics-only	13.1	0.73
Riyadh	Dammam	CNN-LSTM	10.2	0.82
Riyadh	Dammam	Proposed	8.2	0.87

Table 6. Seasonal and event-based validation of the proposed hybrid model. The hybrid model outperforms both baselines in every regime, including high-humidity winter stagnation and dust outbreaks, supporting the benefit of combining dispersion priors with multimodal data-driven corrections.

Regime	Model	MAE (μg/m³)	RMSE (μg/m³)	$R^{2}$
Winter (stable, high humidity)	Physics-only	12.8	16.9	0.74
Winter (stable, high humidity)	CNN–LSTM	9.4	13.5	0.83
Winter (stable, high humidity)	Hybrid (Proposed)	7.1	10.2	0.88
Summer (high temperature, convective mixing)	Physics-only	10.9	15.1	0.77
Summer (high temperature, convective mixing)	CNN–LSTM	8.6	12.8	0.84
Summer (high temperature, convective mixing)	Hybrid (Proposed)	6.8	9.9	0.89
Dust Events	Physics-only	16.5	20.3	0.71
Dust Events	CNN–LSTM	13.2	17.0	0.79
Dust Events	Hybrid (Proposed)	9.8	13.1	0.85
Clean Days	Physics-only	9.0	12.4	0.80
Clean Days	CNN–LSTM	7.6	10.7	0.85
Clean Days	Hybrid (Proposed)	6.2	9.1	0.90
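
To see how the hybrid model's advantage over the CNN–LSTM baseline varies by regime, the MAE columns of Table 6 can be compared directly. The following is a small sketch; the variable names are hypothetical and the values are copied from the table.

```python
# (CNN-LSTM MAE, hybrid MAE) per regime, copied from Table 6; units are ug/m^3.
table6_mae = {
    "Winter": (9.4, 7.1),
    "Summer": (8.6, 6.8),
    "Dust Events": (13.2, 9.8),
    "Clean Days": (7.6, 6.2),
}

# Percentage MAE reduction of the hybrid model relative to CNN-LSTM.
mae_reduction_pct = {
    regime: round(100.0 * (cnn - hyb) / cnn, 1)
    for regime, (cnn, hyb) in table6_mae.items()
}

largest_gain = max(mae_reduction_pct, key=mae_reduction_pct.get)
smallest_gain = min(mae_reduction_pct, key=mae_reduction_pct.get)

for regime, pct in mae_reduction_pct.items():
    print(f"{regime}: {pct}% lower MAE than CNN-LSTM")
print(f"Largest gain: {largest_gain}; smallest gain: {smallest_gain}")
```

On these numbers the reduction ranges from about 18.4% on clean days to about 25.8% during dust events, so the relative benefit of the dispersion prior appears largest in the regime where both baselines perform worst.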
