Integration of Explainable Artificial Intelligence into Hybrid Long Short-Term Memory and Adaptive Kalman Filter for Sulfur Dioxide (SO2) Prediction in Kimberley, South Africa

Agbehadji, Israel Edem; Obagbuwa, Ibidun Christiana

doi:10.3390/atmos16050523


by Israel Edem Agbehadji ¹,* and Ibidun Christiana Obagbuwa ²,*

¹ The Centre for Global Change, Faculty of Natural and Applied Sciences, Sol Plaatje University, Kimberly 8301, South Africa

² Department of Computer Science and Information Technology, Faculty of Natural and Applied Sciences, Sol Plaatje University, Kimberly 8301, South Africa

* Authors to whom correspondence should be addressed.

Atmosphere 2025, 16(5), 523; https://doi.org/10.3390/atmos16050523

Submission received: 27 March 2025 / Revised: 26 April 2025 / Accepted: 28 April 2025 / Published: 29 April 2025

(This article belongs to the Special Issue Applications of Artificial Intelligence in Atmospheric Sciences)


Abstract

Air pollution remains an environmental issue in some countries and contributes to health problems globally. Although several machine learning and deep learning models are used to analyze air pollutants, model interpretability remains a challenge. In addition, the dynamic and time-varying nature of air pollutants introduces noise into measurements, which makes air pollutant prediction (e.g., of Sulfur Dioxide (SO₂) concentration) inaccurate and degrades model performance. Recent advances in artificial intelligence (AI), particularly explainable AI, offer transparency and trust in deep learning models. Organizations that use traditional machine learning and deep learning models are therefore confronted with the question of how to integrate explainable AI into air pollutant prediction systems. In this paper, we propose a novel approach that integrates explainable AI (xAI) into long short-term memory (LSTM) models, addresses the noise with an Adaptive Kalman Filter (AKF), and includes a causal inference analysis. The LSTM captures the long-term dependencies in daily air pollutant concentration and meteorological datasets (between 2008 and 2024) for the City of Kimberley, South Africa, which are analyzed at multiple time steps. The proposed model (AKF_LSTM_xAI) was compared with LSTM, the Gated Recurrent Unit (GRU), and LSTM-multilayer perceptron (LSTM-MLP) at different time steps. For the one-day time step, the root mean square error (RMSE) was 0.382 for AKF_LSTM_xAI, compared with 2.122 for LSTM, 3.602 for LSTM-MLP, and 2.309 for GRU. The SHapley Additive exPlanations (SHAP) values reveal “Relative_humidity_t0” as the most influential variable in predicting the SO₂ concentration, whereas the Local Interpretable Model-Agnostic Explanations (LIME) values suggest that high “wind_speed_t0” reduces the predicted SO₂ concentration.

Keywords:

air pollution; long short-term memory; adaptive Kalman filter; deep learning; machine learning; explainable artificial intelligence; local interpretable model-agnostic explanations; SHapley Additive exPlanations

1. Introduction

Air pollution is an environmental issue that significantly affects public health and ecosystems worldwide. According to the World Health Organization (WHO), air pollution is responsible for over four million premature deaths annually, due to respiratory and cardiovascular diseases [1]. For instance, Carbon Monoxide (CO) emissions alone affect cardiovascular organs, which leads to chronic diseases. Thus, an increase in air pollution and its concentration levels poses a significant threat to human health [2]. Yeshe, Wairooy [3] indicate a direct correlation between human population and human activities, such that an increase in human population also increases human activities, which then affect the air quality.

Developing economies bear much of the burden of air pollution. The overarching objective of some developing economies is to contain or eliminate the impact of air pollution on their economies [4]. From a global perspective, interventions to compensate developing economies for the emissions from developed nations include a dedicated global climate fund for sustainable climate activities and advanced technologies to mitigate air pollution [5]. In recent years, the Centre for Global Change, South Africa, has initiated programs that focus on monitoring global change risks and vulnerabilities across arid landscapes for local impacts. In localizing the impact on the Northern Cape Province, air pollution is identified as one of the challenges that requires intervention. Maisha, Mulovhedzi [6] suggested that South Africa still uses numerical weather and climate models that are optimal in the locations or provinces in which they were developed, which creates potential biases and limitations in other locations. Consequently, leveraging local expertise to enhance the decade-old “Conformal Cubic Atmospheric Model (CCAM)”, developed in Australia at the Commonwealth Scientific and Industrial Research Organisation (CSIRO), is being championed in South Africa.

The Northern Cape is an arid landscape and the largest province in the Republic of South Africa. It covers an area of 372,889 km², with an estimated population of 1,193,780. The capital city is Kimberley, and surrounding towns include Upington, Springbok, Kuruman, and De Aar. The Northern Cape is rich in minerals and also has fertile agricultural land. The mandate to monitor ambient air quality rests with the ambient monitoring stations of each province, which are required to deploy the hardware needed to measure air quality [7]. In Kimberley, the leading air pollution sources include diamond mining activities, vehicle emissions, and industrial facilities, with particulate matter (PM₂.₅) being a primary concern [8]. The activities of large-scale diamond mining companies in Kimberley negatively impact the environment by causing land degradation, air pollution, and biodiversity loss [9]. The South African Air Quality Information System (SAAQIS) is one of the platforms that provides real-time air quality data. In 2024, annual measurements identified particulates (PM₁₀) and Sulfur Dioxide (SO₂) as the two most prevalent pollutants in South Africa [10].

In previous years, statistical models such as ARIMA (AutoRegressive Integrated Moving Average) were used to predict air pollution. More recently, statistical models have been combined with machine learning (ML) techniques such as support vector machines (SVMs), random forests, and decision trees to improve the accuracy of air pollutant prediction models [11]. However, air pollution is complex enough that traditional statistical methods are unable to give accurate predictions even during peak air pollution events [12]. These complexities result from multiple factors, including spatial–temporal dependencies and meteorological factors [13]. Thus, statistical models are often deficient in capturing the dynamic and nonlinear nature of air pollution patterns, especially over long periods. ML models, in turn, are prone to overfitting and high computational complexity when large volumes of data are used [14]. This usually leads to model hybridization, which leverages the strengths of the individual models, offsets their weaknesses, and finds an optimal weighted value for the model’s network architecture [15,16]. Furthermore, combining two models has been reported to reduce the prediction error [17].

In recent years, advanced models such as deep learning models have been introduced to capture complex, dynamic, and nonlinear characteristics of air pollution over long-term dependencies in time-series data, which is crucial in predicting air pollution. Air pollutant prediction as a time-series problem is extremely difficult to accurately predict [18]. Meanwhile, one of the challenges in using these deep learning models for air pollution concentration analysis is the time-varying noise or uncertainty in measurements of air pollutants. There is uncertainty when capturing real-time weather and pollutant concentrations over time and when determining which air pollutant models should be accounted for to improve performance or ensure performance does not degrade. Deep learning models provide an approach to improve prediction accuracy, and such models include convolution-neural-network-based LSTM models to predict and reduce forecast error in meteorological datasets. Again, a temporal convolutional network with a bi-gated recurrent unit (TCN-BiGRU) has also been utilized to predict air pollutants. Therefore, the accurate prediction of air pollutant concentration level is vital in implementing timely interventions to mitigate the harmful effects on humans [19].

Explainable artificial intelligence (xAI) is one of the emerging fields in artificial intelligence that offers explanations on complex models used in time-series analysis. For instance, combining a deep learning model with another model could introduce additional network structure such that the knowledge of how these models contribute to any predictive task becomes a challenge. Thus, the utilization of explainable artificial intelligence in model hybridization provides the needed explanation to help experts or research scientists, in air pollution modeling, understand the black box structure of integrated models when applied in any air pollution predictive task.

This research proposes a novel air pollutant prediction model that leverages the LSTM model. The LSTM model helps predict time-series data, such as air pollutant levels, which are measured in multi-time steps [20]. The dynamic and time-varying nature of air pollutants and meteorological variables creates noise in the data, which is addressed by the Adaptive Kalman Filter (AKF). The standard Kalman Filter assumes that the noise is constant, which is not effective in dynamic, time-varying air pollutant prediction tasks. Thus, in this research, an AKF is employed to adjust the LSTM output so that the noise model for the observed pollutant and meteorological dataset reflects the time-varying nature of the air pollutant concentration. Consequently, the AKF can continue providing accurate predictions in long-term dependency data. However, understanding how and why models make certain predictions in complex tasks is imperative. Explainable artificial intelligence (xAI) is therefore integrated into the proposed model, referred to as AKF_LSTM_xAI, to support the trust and interpretability of predictions. Ref. [21] proposed LSTM-KF for the prediction of train trajectories, where the Kalman Filter (KF) replaced the multilayer perceptron (MLP) for nonlinear analysis; its limitation was that the Kalman Filter assumed constant noise. Considering the dynamic nature of air pollutants and meteorological factors, a constant-noise Kalman Filter might not effectively address the varying nature of air pollutant concentrations. Our scientific novelty thus lies in integrating emerging models with deep learning models to address this limitation of existing air quality prediction models. The remainder of the paper is organized as follows: Section 2 (review of literature), Section 3 (materials and methods), Section 4 (results), Section 5 (discussions), and Section 6 (conclusion and future work).

2. Review of Literature

2.1. Long Short-Term Memory (LSTM)

The long short-term memory (LSTM) model is a deep learning time-series model that captures long-term dependencies in data. LSTM models use gates as the mechanism to control the flow of information through data sequences. Li, Chen [22] employed an LSTM-based model for predicting air quality, with improved accuracy in predicting particulate matter (PM₂.₅) concentration levels in urban areas. Similarly, Tsai, Zeng [23] applied a recurrent neural network (RNN) model, the broader family of which LSTM is a variant, to air pollutant analysis. While RNN models show promising results individually, they often struggle to incorporate long-term dependencies because of the vanishing gradient problem.

Studies have proposed a hybrid model to combine the strengths of different learning models. For example, He and Guo [24] combined RNNs with other neural network architectures for air quality prediction, leading to an improvement in prediction accuracy. Generally, hybrid models outperform single models in terms of accuracy and stability [25]. Similarly, other hybrid models that combine LSTMs with convolutional neural networks (CNNs) have been tested to address both spatial and temporal features of air quality data [26].

Hu, Lu [27] indicated that chemical transport models, such as the Community Multiscale Air Quality (CMAQ) model, which are commonly used for air quality prediction, often exhibit biases compared to observed data. This bias can be seen in the measurement error produced by the CMAQ models [28]. Chemical transport models consume high computing resources, whereas ML models are challenged with establishing causality, thus leading to the use of explainable ML and causal inference approaches to support data-driven approaches to identifying the causal relationship in air pollution data [29]. Thus, the hybridization of CMAQ with deep learning models (e.g., CNN and LSTM) has been explored to address the bias and improve predictive performance [30].

Karaiskos, Munian [31] identified the weakness of hybrid multilayer perceptron (MLP) and LSTM algorithms for indoor air pollutant prediction as an inability to predict precise times. Subsequently, a hybrid LSTM-RNN model was utilized to predict the level of indoor pollution a day ahead. Zhang, Zou [32] indicated that while LSTM relies on gated units as the mechanism to establish the relationship among features, it ignores the correlation that exists between the gate units. This correlation is useful because it helps to understand the degree of correlation between random variables in time-series analysis. Thus, the Read-first LSTM (RLSTM) model was proposed for temporal feature extraction, with the RLSTM used as the encoder and an LSTM used as the decoder, so that the model can predict pollutants within 24 h.

Aggarwal and Toshniwal [33] indicated that deep learning models such as LSTM help to understand the complex dependencies in time-series prediction and that more layers can readily be added to improve performance. However, deeper models have a higher computation cost and require extensive hardware capabilities. Therefore, the optimal setting of hyperparameters is key to avoiding model overfitting, and a stochastic optimization method was utilized for this purpose.

Data availability to train ML/DL models has been one of the challenges in designing air quality predictive models. One of the approaches to address this challenge is transfer learning models and federated learning. Alazmi and Rakha [34] also suggest a shift to mobile monitoring tools to harness data from a variety of locations to develop predictive models for air pollutants such as particulate matter. Unfortunately, employing air pollutant monitoring tools such as low-cost sensors to collect high spatiotemporal data is less accurate, thus leading to the use of multiple datasets [35]. Furthermore, statistical methods are also used to calibrate low-cost sensors, thus helping reduce the biases and improve the prediction accuracy of the models used for predicting PM_2.5 [36]. Again, the sparse nature of mobile monitoring methods makes it challenging to recover high-resolution pollutant levels across a wide area [37]. The use of computer-generated or synthetic data for model training has also been leveraged to address the data gap [38]. Some of the challenges, such as missing data encountered during data preprocessing, hinder time-series data analysis [39,40]; this also occurs when the quality of the data from low-cost sensors is noisy [41]. Thus, there appears to be a shift from using fixed-site datasets to synthetic to mobile and low-cost data-harnessing devices.

A gated recurrent unit (GRU) is a type of RNN and an alternative to LSTM networks. GRUs aim to solve the vanishing gradient problem in traditional RNNs [21] while being simpler and more computationally efficient than the LSTM model. LSTMs and GRUs are two improved RNN models for time-series prediction problems. To further improve the prediction accuracy of temporal convolutional networks (TCNs), they are hybridized with bidirectional gated recurrent units (biGRUs) [42]. In this hybrid, the TCN extracts higher-level feature information from longer time-series data, and the biGRU captures past and future information about features to achieve more accurate predictive outcomes. However, the performance degrades in the long term due to the random guessing of parameters as time increases [42]. Table 1 presents a summary of deep learning models for predicting air pollutants.

The capturing of long-term dependencies in time-series data has been one of the challenges in air quality prediction models [52]. The attention mechanism has been one of the approaches used in LSTM models because of its capability to focus attention on important features in the dataset [53]. However, models based on attention mechanisms end up learning patterns that are specific to noisy data rather than the underlying inherent pattern or relationship in the data that the model sought to capture [54]. Though adding an attention mechanism enhances the LSTM model’s performance in any predictive task [55], the model’s interpretability is essential to successfully enhance air pollutant prediction. However, the deep learning models are black boxes, meaning it is often difficult to understand the decision-making process leading to outcomes from deep learning models [56].

2.2. Adaptive Kalman Filter

The Adaptive Kalman Filter, as an extension of the Kalman Filter technique, allows the filter parameters to change over time based on the observed data. Adaptive Kalman Filtering (AKF) provides an effective method to filter out noise in time-series data. The changes in weather and environmental variables necessitate a dynamic method of modeling air pollutant prediction systems [57]. A Kalman Filter is an optimized recursive or autoregressive data-processing algorithm that estimates the state of a linear system from noisy measurements [58]. The traditional Kalman Filter presumes that the noise in the process and the measurements is known and constant. Zhou, Wang [59] proposed the Kalman–attention–LSTM model for estimating the air quality index (AQI), where the Kalman Filter integrates measurement data with predictions from another model (e.g., WRF-CMAQ). In this approach, the LSTM–attention mechanism was applied as a single-factor prediction model for AQI prediction.

Christakis, Tsakiridis [60] suggested that the application of the Kalman Filter ensures the reliability of measurement error from low-cost sensors used in air quality monitoring. The air quality prediction model proposed by Song, Huang [61] applied the Kalman Filter as a linear estimator and the LSTM as the static prediction model. In this model, the Kalman Filter provides the dynamic model to adjust observation data based on the predicted values of the long short-term memory recurrent neural network. Ahmad and Alkhammash [57] indicate that noise significantly impedes accurate time-series data analysis.

Yang, Chen [62] applied a seasonal gated recurrent unit (SGRU), which is a time-series prediction model, with the Kalman Filter technique. In this approach, the Kalman Filter was applied for data preprocessing to achieve an accurate forecast of CO concentration. Zhou, Wang [59] presented the Kalman–attention–LSTM prediction model, where the Kalman Filter was used as a data-fusing (that is, using environmental and simulation data) technique and the LSTM–attention model was used as a single factor for the prediction model. The Adaptive Kalman Filter dynamically adjusts the filtering parameter of the model and ensures a reliable and accurate analysis. The advantage of the Adaptive Kalman Filter is the computational efficiency (accuracy of 95.4%), such that it is suitable in resource-constrained low-cost sensors used to capture air pollutants in real time. Table 2 shows the summary of Kalman Filter approaches.

2.3. Explainable Artificial Intelligence (xAI)

Explainable AI (xAI) is a set of methods and techniques in artificial intelligence (AI) that help to make the decisions of a system transparent, understandable, and interpretable [69]. Researchers have found that ML models are often unstable in terms of accuracy because of the interplay of the multiple variables used as inputs to the training models [70]. In this regard, xAI methods have been proposed to help gain insight into the inherent network structures of ML models. For instance, xAI methods such as SHapley Additive exPlanations (SHAP) help to interpret the results of ML/AI models. Deterministic Local Interpretable Model-Agnostic Explanations (DLIME) is an enhanced version of LIME (Local Interpretable Model-Agnostic Explanations) for interpreting complex deep learning models, specifically designed to explain individual predictions [71]. The simplicity of LIME makes it more suitable for ML models. While SHAP provides global and local feature attributions based on cooperative game theory, LIME focuses on creating a local surrogate model that is interpretable and can explain why a complex model made a particular prediction. ML models such as XGBoost, when combined with SHAP, provide an effective tool to explain the relationship between input features and predicted variables [72]. Moreover, correlation and causation should be carefully distinguished. For instance, while SHAP reveals correlations, correlation among variables does not imply causality. Deep learning models for time-series analysis, however, handle complexity among variables and aim to identify and infer causal relationships among variables [73,74]. Thus, deep learning models provide not only the correlation but also the underlying cause-and-effect relationship that exists in predictive models [74]. The caveat is that not every correlation implies causation, whereas causation implies correlation. Therefore, commencing with a correlation analysis incurs some risks with the deep learning method.

Zhao, Lin [75] revealed that SHAP is effective at showing the influence of meteorological variables on particulate matter (that is, PM₂.₅). Zhong, Xiao [76] indicated that integrating SHAP with ML helps in determining the factors that can influence another day’s “pollen” concentration, and their thresholds, when meteorological data and vegetation phenology are analyzed. Thus, explainable AI supports factor analysis between environmental variables and particulate matter [77]. Similarly, SHAP and ensemble ML have been explored to understand the impact of factors (such as meteorology, sources, and chemical compositions) on ozone concentration levels [78,79]. Generally, xAI helps to understand which features in the input data are important in arriving at the final output. However, Minh, Wang [69] indicated that the challenge with xAI is the trade-off between performance and explainability.

Oliveira, Franco [80] attempted to include transparency in the LSTM model using two explainability approaches, that is, an “ante-hoc” approach with an attention layer and the “post-hoc” approach using SHapley Additive exPlanations (SHAP), for air pollutant analysis. It revealed that the attention layer recognized CO and NO₂ as the most important features, whereas the SHAP highlighted NO₂ as the major contributor to the air quality predictions, followed by particulate matter (PM_2.5) and CO.

Deep learning models are complex and also use nonlinear activation functions. Despite the application of deep learning models in air quality prediction, their network structures are not easily interpretable by humans; therefore, they are considered a “black box” [81]. Consequently, fuzzy-embedded RNN (FE-RNN) is applied to improve the interpretability of the underlying neural network model. While using an enhanced RNN, and combined with explainable AI, the fuzzy logic optimized the hyperparameter to enhance the model’s prediction [82]. Again, deep learning models in high-dimensional space are more complicated, making interpretation more challenging [69]. Thus, to develop a specific model for the air pollutants prediction task, it is required that a model be trained with air pollution data (that is, meteorological and air pollutants), such that the model can learn from inherent variables in each pollutant datum. By so doing, a model can learn from many tasks together. Table 3 provides a summary of explainable AI and LSTM models.

3. Materials and Methods

3.1. Proposed Method

The methodology combines the LSTM model, the Adaptive Kalman Filter (AKF), and explainable AI (xAI) into a hybrid referred to as AKF_LSTM_xAI. The capabilities of the individual models are leveraged to provide a robust model for air pollutant prediction. Figure 1 presents the structure of the proposed hybrid AKF_LSTM_xAI model.

3.2. Model Integration Steps

The steps to integrate the proposed AKF_LSTM_xAI model can be expressed as follows:

3.2.1. Step 1: Raw Input Data

The raw input data are air pollutant concentrations: particulate matter (PM₂.₅ and PM₁₀, in micrograms per cubic meter (µg/m³)), Nitrogen Dioxide (NO₂, µg/m³), Nitrogen Oxide (NO), ozone (O₃, µg/m³), Carbon Monoxide (CO, in milligrams per cubic meter (mg/m³)), and Sulfur Dioxide (SO₂, µg/m³), as well as meteorological data: ambient temperature (°C), relative humidity (g/m³), and wind speed (km/h). There were 6210 instances with 10 features.

3.2.2. Step 2: Data Preprocessing

Data are cleaned, with missing values handled by interpolation, normalized to the range (0, 1) using “MinMaxScaler”, formatted, and fed as the input sequence into the LSTM layer. The data used to train the LSTM model were formatted as sequences of the past M time steps, and the LSTM learns to map each sequence to the values at the n-day future time step.
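The preprocessing described above can be sketched in plain Python for a single series. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `interpolate_missing`, `min_max_scale`, and `make_windows` are hypothetical, the interpolation is assumed to be linear, only one variable is handled (the study uses 10 features), and the target is assumed to be the single value n steps after each window.

```python
from typing import List, Optional, Tuple


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; gaps at the edges copy the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(series[right]))
        elif right is None:
            filled.append(float(series[left]))
        else:
            weight = (i - left) / (right - left)
            filled.append(series[left] + weight * (series[right] - series[left]))
    return filled


def min_max_scale(series: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def make_windows(series: List[float], m: int, n: int) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of m past values with the value n steps after the window."""
    inputs, targets = [], []
    for start in range(len(series) - m - n + 1):
        inputs.append(series[start:start + m])
        targets.append(series[start + m + n - 1])
    return inputs, targets
```

For example, `make_windows(min_max_scale(interpolate_missing(raw)), m=7, n=1)` would produce week-long input windows with one-day-ahead targets; the window length 7 is only an illustrative choice.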

3.2.3. Step 3: LSTM Layer (Feature Extraction and Prediction)

The LSTM network processes the input data sequentially. It learns the long-term dependencies in the data, making it useful for time-series forecasting or prediction tasks. The output of the LSTM at each time step is the initial state prediction of the air pollutants, which is computed by Equation (1):

$$x_{t}^{LSTM} = f_{LSTM}(x_{t}, h_{t-1}) \tag{1}$$

where $f_{LSTM}$ is the LSTM’s mapping from the inputs $x_{t}$ at time step $t$ and the hidden state $h_{t-1}$ at time step $t-1$.
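Equation (1) treats the LSTM as a function of the current input and the previous hidden state. As a rough sketch of what one such step computes, the following Python shows a single-unit LSTM cell in its standard textbook form. The names `lstm_step` and `params` and the scalar weights are illustrative assumptions, not identifiers or values from this study. The standard cell also carries a cell state, which Equation (1) leaves implicit.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float, params: dict):
    """One step of a single-unit LSTM cell with scalar input and state.

    params maps each gate name to a tuple (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w, u, b = params[name]
        return activation(w * x_t + u * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate cell content

    c_t = f * c_prev + i * g
    h_t = o * math.tanh(c_t)
    return h_t, c_t
```

Running `lstm_step` over a window of past values, carrying `h_t` and `c_t` forward, and passing the last hidden state through an output layer would give a prediction in the spirit of Equation (1).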

3.2.4. Step 4: Adaptive Kalman Filter Layer (State Estimation and Correction)

The AKF leverages the unique memory feature of LSTM to “store” the information contained in the time-series data. In this regard, the AKF technique adjusts the sequence from the LSTM layer [61]. The Kalman Filter equation is expressed as follows:

A. Prediction Step:

The prediction step consists of state prediction and covariance prediction, which are expressed by Equations (2) and (3). State prediction is expressed in Equation (2):

$$\dot{x}_{t} = A x_{t-1} + B u_{t} \tag{2}$$

Covariance prediction is expressed in Equation (3):

$$\dot{P}_{t} = A P_{t-1} A^{T} + Q \tag{3}$$

B. Update Step:

The update step is expressed by Equations (4)–(6). The Kalman gain is expressed in Equation (4):

$$K_{t} = \dot{P}_{t} (\dot{P}_{t} + R)^{-1} \tag{4}$$

The state update is expressed by Equation (5):

$$x_{t} = \dot{x}_{t} + K_{t} (y_{t} - \dot{x}_{t}) \tag{5}$$

Finally, the covariance update is expressed by Equation (6):

$$P_{t} = (I - K_{t}) \dot{P}_{t} \tag{6}$$

where $\dot{x}_{t}$ and $x_{t}$ are the predicted and updated states, respectively; $\dot{P}_{t}$ and $P_{t}$ are the predicted and updated covariance matrices; $K_{t}$ is the Kalman gain; $A$ is the state transition matrix; $B$ is the control input matrix; $u_{t}$ is the control vector (external inputs); and $y_{t}$ is the actual measurement at time $t$ (e.g., the observed pollutant concentration). $Q$ and $R$ are the process and measurement noise covariances, which are updated dynamically based on the estimation error and performance obtained through learning. Thus, $Q$ and $R$ are expressed by Equations (7) and (8):

$$Q_{t} = \alpha \cdot (y_{t} - \dot{x}_{t})^{2} + (1 - \alpha) \cdot Q_{t-1} \tag{7}$$

$$R_{t} = \beta \cdot (y_{t} - \dot{x}_{t})^{2} + (1 - \beta) \cdot R_{t-1} \tag{8}$$

where $\alpha$ and $\beta$ are forgetting factors between 0 and 1 that control how quickly the air pollutant prediction model adapts.
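Equations (2)–(8) can be sketched for a single pollutant series as follows. This is a scalar illustration under stated assumptions, not the authors' implementation: the state transition A is taken as 1 by default, there is no control input (B u_t = 0), the initial values `q0`, `r0`, and `p0` are arbitrary, the Q and R adapted at step t are used from step t + 1 onward, and the R update is assumed to use β as its own forgetting factor in both terms. The function name `adaptive_kalman_filter` is hypothetical.

```python
from typing import List, Optional


def adaptive_kalman_filter(
    measurements: List[float],
    a: float = 1.0,        # scalar state transition A (assumed)
    q0: float = 1e-3,      # initial process noise Q (assumed)
    r0: float = 1e-1,      # initial measurement noise R (assumed)
    alpha: float = 0.05,   # forgetting factor for Q
    beta: float = 0.05,    # forgetting factor for R
    x0: Optional[float] = None,
    p0: float = 1.0,
) -> List[float]:
    """Filter a 1-D series with a Kalman filter whose Q and R adapt over time."""
    if not measurements:
        return []
    x = measurements[0] if x0 is None else x0
    p, q, r = p0, q0, r0
    estimates = []
    for y in measurements:
        # Prediction step, Eqs. (2)-(3), with no control input.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update step, Eqs. (4)-(6).
        innovation = y - x_pred
        k = p_pred / (p_pred + r)
        x = x_pred + k * innovation
        p = (1.0 - k) * p_pred
        # Noise adaptation, Eqs. (7)-(8): exponential averages of the squared innovation.
        q = alpha * innovation ** 2 + (1.0 - alpha) * q
        r = beta * innovation ** 2 + (1.0 - beta) * r
        estimates.append(x)
    return estimates
```

Larger values of `alpha` and `beta` make Q and R follow recent innovations more closely, while values near zero keep them close to their initial settings, which approaches the constant-noise Kalman Filter.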

During the integration phase, the output of the LSTM, $x_{t}^{LSTM}$, was used as the predicted state in the Kalman Filter’s update equation. The prediction can then be expressed by Equation (9):

$$\dot{x}_{t} = x_{t-1} + K (x_{t}^{LSTM} - x_{t-1}) \tag{9}$$

whereas the update can be expressed by Equation (10):

$$x_{t} = \dot{x}_{t} + K_{t} (y_{t} - \dot{x}_{t}) \tag{10}$$

where $y_{t}$ is the actual measurement at time step $t$ and $K_{t}$ is the adaptive Kalman gain, which is adapted based on the prediction error, refining the estimates at each iteration and thus helping reduce the noise in the model as new data are fed in.

Therefore, the final output is the updated state $x_{t}$, which is the refined prediction from both the LSTM and the AKF, ensuring that the noise and uncertainty in the LSTM’s predictions are minimized. Thus, in the hybrid model AKF_LSTM, the LSTM provides a temporal prediction and the AKF is responsible for fine-tuning the prediction with updated covariance information. The final output is expressed by Equation (11):

$$x_{t}^{final} = AKF(x_{t}^{LSTM}, K_{t}, y_{t}) \tag{11}$$

where $x_{t}^{final}$ is the final output after the AKF model’s correction was applied to the LSTM prediction, $K_{t}$ is the Kalman gain, and $y_{t}$ is the actual air pollutant concentration. Thus, while the LSTM learns the long-term dependencies from the sequential data, the AKF adjusts the predictions based on the historical measurement corrections, leading to a more robust model, especially in a noisy environment such as an air pollutant environment.
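One possible reading of Equations (9)–(11) is sketched below for a scalar series, with a list of precomputed LSTM outputs standing in for the trained network. Several choices here are assumptions rather than details given in the text: the gain K in Equation (9) is taken as a fixed blending weight `k_blend`, the covariance recursion uses A = 1, and the initial values `q0`, `r0`, and `p0` are arbitrary. The function name `akf_lstm_correct` is hypothetical.

```python
from typing import List


def akf_lstm_correct(
    lstm_preds: List[float],
    observations: List[float],
    k_blend: float = 0.5,   # assumed fixed gain K in Eq. (9)
    q0: float = 1e-3,
    r0: float = 1e-1,
    alpha: float = 0.05,
    beta: float = 0.05,
    p0: float = 1.0,
) -> List[float]:
    """Refine LSTM predictions with an adaptive Kalman correction (scalar sketch)."""
    if len(lstm_preds) != len(observations):
        raise ValueError("lstm_preds and observations must have equal length")
    if not observations:
        return []
    x = observations[0]
    p, q, r = p0, q0, r0
    refined = []
    for x_lstm, y in zip(lstm_preds, observations):
        # Eq. (9): move the previous state toward the LSTM output.
        x_pred = x + k_blend * (x_lstm - x)
        # Covariance prediction and adaptive gain, Eqs. (3)-(4) with A = 1.
        p_pred = p + q
        k_t = p_pred / (p_pred + r)
        # Eq. (10): correct the prediction with the measurement.
        innovation = y - x_pred
        x = x_pred + k_t * innovation
        p = (1.0 - k_t) * p_pred
        # Eqs. (7)-(8): adapt the noise terms from the squared innovation.
        q = alpha * innovation ** 2 + (1.0 - alpha) * q
        r = beta * innovation ** 2 + (1.0 - beta) * r
        refined.append(x)   # plays the role of x_t^final in Eq. (11)
    return refined
```

Because Equation (10) uses $y_{t}$, the correction at step t can only be applied once the measurement for that step is available.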

3.2.5. Step 5: Explainable AI (xAI) Layer

Explainable AI provides insights into the model, showing which factors (such as weather conditions or seasonal trends) contribute most to air quality levels on a particular day. This layer provides transparency and interpretability to the predictions of the combined AKF_LSTM model. Techniques used by the xAI method include the following:

In terms of SHAP for global interpretability, SHAP values are used to interpret the output of the LSTM by attributing the contribution or influence of each input feature (e.g., the air pollutant concentration) to the model’s final prediction. SHAP decomposes the prediction into additive contributions from each feature across the entire sequence. The SHAP value represents the influence of $x_{t}^{(j)}$ on the overall prediction. The SHAP value $\phi_{j}(t)$ is the value for the j-th feature at time step t. The Shapley value $\phi_{j}$ of the overall prediction is the sum of the contributions of feature j across all T time steps, which can be expressed by Equation (12):

$$\phi_{j} = \sum_{t=1}^{T} \phi_{j}(t) \tag{12}$$

At each time step t, the Shapley value $\phi_{j}(t)$ can be computed by Equation (13):

$$\phi_{j}(t) = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{j\}, t) - f(S, t) \right] \tag{13}$$

where N is the set of all features and $S \subseteq N \setminus \{j\}$ is a subset of features that excludes feature j. $f(S \cup \{j\}, t)$ is the prediction of the AKF_LSTM using the features in the subset $S \cup \{j\}$ at time step t, and $f(S, t)$ is the prediction of the AKF_LSTM when using only the subset of features S at time step t. Therefore, the term $f(S \cup \{j\}, t) - f(S, t)$ represents the marginal contribution of feature j at time step t when it is added to the subset S. The contributions are aggregated across all time steps to understand the overall impact of each feature on the model’s prediction.
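Equations (12) and (13) can be checked on a small example by enumerating every feature subset. This is a sketch only: exact enumeration grows exponentially with the number of features, so SHAP implementations rely on approximations in practice, and `toy_model` with its weights is a hypothetical additive stand-in for the AKF_LSTM prediction function f.

```python
from itertools import combinations
from math import factorial


def shapley_at_t(f, features, j, t):
    """Exact Shapley value of feature j at time step t, as in Equation (13)."""
    others = [k for k in features if k != j]
    n = len(features)
    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Marginal contribution of j when added to subset S
            total += weight * (f(s | {j}, t) - f(s, t))
    return total


def shapley_over_time(f, features, j, time_steps):
    """Sum the per-step values over all time steps, as in Equation (12)."""
    return sum(shapley_at_t(f, features, j, t) for t in time_steps)


# Hypothetical additive model: each present feature adds weight * (t + 1).
WEIGHTS = {"rh": 0.5, "no": 0.3, "ws": -0.2}


def toy_model(subset, t):
    return sum(WEIGHTS[k] * (t + 1) for k in subset)


features = list(WEIGHTS)
print(shapley_over_time(toy_model, features, "rh", range(3)))
```

For an additive model the Shapley value of a feature equals its own term, so the aggregated value for `"rh"` over three time steps is 0.5 × (1 + 2 + 3) = 3.0, up to floating-point rounding.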

The use of LIME for local interpretability helps to understand how the LSTM model, which could otherwise be hard to interpret due to its sequential nature, makes decisions on input variables. The goal of LIME is to understand why the LSTM model made a particular prediction, for example when predicting the SO₂ concentration at a given time step. The perturbed instances $x'_{i}$, where i = 1, …, N, are drawn around the instance x as expressed in Equation (14):

$$x' \sim \mathcal{N}(x, \sigma) \tag{14}$$

where σ is the small perturbation noise. For each perturbed instance $x'$, the AKF_LSTM prediction $\hat{y}'$ is expressed in Equation (15):

$$\hat{y}' = f_{\text{LSTM}}(x') \tag{15}$$

After generating perturbed instances and corresponding predictions from the AKF_LSTM model, LIME fits a regression model to approximate the model’s local behavior as expressed in Equation (16):

$$\hat{y}' = \theta^{T} x' + b \tag{16}$$

Thus, the loss function minimized by the surrogate model is expressed in Equation (17):

$$L(\theta, b) = \sum_{i=1}^{N} \left( \hat{y}'_{i} - \left( \theta^{T} x'_{i} + b \right) \right)^{2} + \lambda \|\theta\|_{1} \tag{17}$$

where $x'_{i}$ is the i-th perturbed input, θ is the set of coefficients (weights) of the surrogate regression model, b is the bias term of the surrogate regression model, $L(\theta, b)$ is the loss function used to train the surrogate regression model, N is the number of perturbed instances used to train the surrogate model, λ is the regularization hyperparameter, $\|\theta\|_{1}$ is the L1 penalization term of the surrogate model, and $\hat{y}'_{i}$ is the AKF_LSTM prediction for the i-th perturbed instance, which the surrogate regression model approximates. The loss function value is scaled between 0 and 1.
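The procedure in Equations (14) to (17) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the LIME library's algorithm or the authors' code: `lime_surrogate`, `toy_model`, the coordinate-descent solver, and the values of `sigma`, `lam`, `n_samples`, and `n_iter` are all hypothetical choices, and `toy_model` stands in for the trained AKF_LSTM.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lime_surrogate(black_box, x, n_samples=300, sigma=0.1, lam=0.01,
                   n_iter=50, seed=0):
    """Fit an L1-penalized linear surrogate around the instance x."""
    rng = random.Random(seed)
    d = len(x)
    # Equation (14): Gaussian perturbations around x
    xs = [[xi + rng.gauss(0.0, sigma) for xi in x] for _ in range(n_samples)]
    # Equation (15): query the black-box model on each perturbed instance
    ys = [black_box(xp) for xp in xs]
    # Centering removes the unpenalized bias from the optimization
    means = [sum(col) / n_samples for col in zip(*xs)]
    y_mean = sum(ys) / n_samples
    xc = [[v - m for v, m in zip(row, means)] for row in xs]
    yc = [y - y_mean for y in ys]
    theta = [0.0] * d
    # Coordinate descent on the loss in Equation (17)
    for _ in range(n_iter):
        for j in range(d):
            rho, z = 0.0, 0.0
            for row, y in zip(xc, yc):
                partial = sum(theta[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (y - partial)
                z += row[j] ** 2
            theta[j] = soft_threshold(rho, lam / 2.0) / z
    b = y_mean - sum(t * m for t, m in zip(theta, means))
    return theta, b


def toy_model(features):
    """Hypothetical black box: the third feature has no influence."""
    return 2.0 * features[0] - 1.0 * features[1] + 0.5


theta, b = lime_surrogate(toy_model, [0.6, 0.3, 0.5])
print(theta, b)
```

The signs of the fitted weights play the role of the positive and negative bars in a LIME plot, and the L1 penalty pushes the weight of an uninfluential feature toward zero.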

3.2.6. Step 6: Model Evaluation

The model’s performance evaluation metrics (see Table 4) are the root mean square error (RMSE), mean square error (MSE), and coefficient of determination (R²).
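For reference, the three metrics can be computed as follows. This is a minimal sketch using the standard definitions; the function names and the sample values are illustrative and are not data from the study.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mse(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Lower MSE and RMSE and an R² closer to 1 indicate predictions closer to the measured concentrations.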

3.2.7. Step 7: Final Prediction/Output

The output is the final prediction from the model. This prediction is interpretable through the xAI techniques, allowing users to understand the reasons behind the model’s behavior and decision. Thus, by combining LSTM and AKF with xAI (SHAP and LIME), we intend to create a robust approach for predicting air pollutant concentration for the study area. This approach is valuable for systems used in environmental modeling, where both accurate predictions and model interpretability are crucial.

3.3. Model Parameter Description

Table 5 presents the parameters of the models. These models were implemented in Python version 3.10, chosen for its programming flexibility, and run on an Intel Core i3 processor (Intel Corporation, Santa Clara, CA, USA).

4. Results

4.1. Explainable AI: SHAP Analysis on AKF_LSTM_xAI Model

Figure 2 shows the findings from applying the xAI model. The SHAP value is shown for the one-day time step. It can be observed that relative humidity at the one-day time step (with the time step indexed at one) has a high feature value. In essence, Figure 2 shows how each feature influences the prediction of SO₂ concentrations using the proposed AKF_LSTM_xAI model.

In Figure 2, points are colored blue or red. Each point shows a single instance of how a feature value affects the prediction of SO₂. The x-axis shows the SHAP value, that is, how much each feature contributes to the SO₂ prediction (positive values push the model output higher and negative values push it lower), and the y-axis lists the features sorted by importance from top to bottom. Thus, features at the bottom of the plot have a lower impact on the AKF_LSTM_xAI model’s prediction of SO₂ concentration. Red-colored features suggest that an increase in those feature values leads to a higher predicted SO₂, whereas blue-colored features mean that an increase in their feature value leads to a lower predicted SO₂ concentration. Furthermore, the color shows the average impact of the feature across all the predictions: blue feature points have a low average impact, whereas red feature points have a high average impact. The color intensity on the y-axis shows the level of impact of each feature on the AKF_LSTM_xAI model’s prediction of SO₂. “Relative_humidity_t0” is found at the top of the plot and also has a widely spread SHAP value; hence, it is the most influential feature.

Table 6 shows the global feature importance values that give a more general overview of the most important features across all the predictions.

From Table 6, it can be observed that “Relative_humidity_t0” has a SHAP value of 0.000177438, making it the most impactful feature for the one-day time step prediction of SO₂ concentration.

4.2. Explainable AI: LIME Analysis on AKF_LSTM_xAI Model

LIME explains how the features (that is, NO, NO₂, CO, O₃, PM₁₀, PM₂.₅, wind speed, ambient temperature, and relative humidity) contributed to SO₂ concentration prediction. LIME focuses on the local explanation and provides feature importance values for the individual predictions. Figure 3 shows the LIME results for the one-day time step.

In Figure 3, the green and orange bars represent features with positive and negative contributions, respectively. A green bar indicates that increasing the feature value tends to increase the predicted SO₂ concentration; for example, increasing PM₂.₅ at time step t0 leads to a higher predicted SO₂ concentration. An orange bar suggests that increasing the value of the feature tends to decrease the predicted SO₂ concentration; for example, a higher “Wind_speed” at time step t0 is likely to lead to a lower predicted SO₂ concentration. A horizontally longer bar shows a greater effect of a feature on the predicted SO₂ concentration. Thus, it can be observed that “PM2.5_t0” has a horizontally longer positive bar, whereas “Wind_Speed_t0” has a negative value (shown by the orange bar). The proposed model therefore considered that a higher PM₂.₅ concentration could increase the SO₂ concentration prediction, while a higher wind speed decreases it at any specific instance. Table 7 shows the LIME feature importance and weights of all the features at the one-day time step.

Table 8 shows the loss function value of the proposed model during the local explanation, which indicates how well the local interpretable model based on linear regression fits the behavior of the original model on the data instances. A loss function value of 0.6052 was recorded. In this study, a loss function value between 0.5 and 1.0 is considered reasonable, which suggests that the surrogate approximates the model with reasonable complexity.

4.3. Models’ Prediction over Time

During the experiment, the AKF was applied to the preprocessed data, which were then normalized with the “MinMaxScaler” to generate the time-series data used to train the LSTM model within the proposed AKF_LSTM_xAI model, thus reducing noise and helping to stabilize the model. Figure A1 and Figure A2 show the actual and predicted SO₂ concentration for the M-day time step for the proposed AKF_LSTM_xAI model and comparative models, respectively. It can be observed that the AKF_LSTM_xAI produced predictions that were close to the actual SO₂ concentration (Figure A1). Again, the proposed model demonstrated a good performance relative to the comparative models (Figure A2).

Table 9 presents the performance evaluation of the proposed model and comparative models using the MSE, R², and RMSE for different sliding windows (M-day time steps), where M represents the number of previous days of data used to train the models.
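The M-day sliding window can be illustrated on a single series. This sketch makes several assumptions: `min_max_scale` is a standard-library stand-in for scikit-learn's `MinMaxScaler`, the target is taken to be the day immediately after each window, and only one variable is shown, whereas the study uses several pollutant and meteorological features.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(series, m):
    """Pair each run of m consecutive days with the following day."""
    inputs = [series[i:i + m] for i in range(len(series) - m)]
    targets = [series[i + m] for i in range(len(series) - m)]
    return inputs, targets


# Illustrative daily values only.
daily = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = min_max_scale(daily)
inputs, targets = make_windows(scaled, m=2)
print(inputs)   # [[0.0, 0.25], [0.25, 0.5], [0.5, 0.75]]
print(targets)  # [0.5, 0.75, 1.0]
```

A larger M gives the model more history per sample but yields fewer training samples from the same series.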

Figure A3, Figure A4 and Figure A5 show plots of the MSE, RMSE, and R² of the models, respectively. It can be observed that the proposed AKF_LSTM_xAI model achieved the lowest MSE and RMSE values. The proposed model also achieved a high R² value, which indicates how well the model fits the actual data. For instance, for the one-day time step, the R² of AKF_LSTM_xAI was 0.991, compared with 0.786 for LSTM, 0.385 for LSTM_MLP, and 0.747 for GRU.

The proposed model addresses temporal patterns, dynamic estimation, and the smoothing of prediction and offers an explanation of which features contributed to the prediction, thereby tackling the aspects of correlation. In this research, correlation is considered only as an indicator, not as proof, of cause and effect. For instance, the high LIME value of PM₂.₅ does not mean that PM₂.₅ causes SO₂; however, it suggests a strong association in the dataset. Thus, a causal inference model was employed to understand whether changes in SO₂ were caused by the PM₂.₅ concentration (Figure 4 and Table 10).

The p-value was 0.98, and the estimated effect was 2.94082. When a random common cause was introduced, the effect changed slightly to 2.98081. Based on the p-value recorded, there was no causal relationship or effect between SO₂ and PM₂.₅. Further experiments were conducted using the selected features (numbers one, two, and three) whose feature weights are above 0.06 (see Table 7), shown as column 1 of Table 11, whereas the results for all features are presented in column 2.

Table 11 shows the summary of causal effects and p-values. In column 1, the estimated causal effect was 0.25432; after adding a random common cause, a new effect value of 0.25435 was generated and a p-value of 0.88 was recorded. These results suggest a minimal effect, indicating that the difference is not statistically significant. Figure 4 shows the map of each of the features with SO₂ and PM₂.₅.

Furthermore, all the features were considered to understand the causal inferences with SO₂ and PM₂.₅ (Figure 5). As shown in column 2, the estimated effect was 0.110716, whereas the new effect was 0.110714. Thus, the random common cause that was introduced in the model (new causal effect) barely changed the estimate. Moreover, a p-value of 0.90 was recorded, which signifies that there is no statistically significant change in the model.
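The idea behind adding a random common cause can be sketched as follows. The paper does not state which estimator or library produced the values above, so this example substitutes a plain linear-regression effect and synthetic data; `refute_with_random_common_cause`, the variable names, and the simulated effect of 2.0 are assumptions for illustration only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y regressed on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x with an intercept."""
    beta, mx, my = slope(x, y), mean(x), mean(y)
    return [(b - my) - beta * (a - mx) for a, b in zip(x, y)]


def refute_with_random_common_cause(treatment, outcome, seed=0):
    """Re-estimate the effect after controlling for an unrelated random variable."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in treatment]
    original = slope(treatment, outcome)
    # Frisch-Waugh-Lovell: remove w from both variables, then regress residuals
    adjusted = slope(residuals(w, treatment), residuals(w, outcome))
    return original, adjusted


# Synthetic data (not the study's measurements): the outcome depends on the treatment.
data_rng = random.Random(1)
pm25 = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
so2 = [2.0 * p + data_rng.gauss(0.0, 1.0) for p in pm25]
original_effect, new_effect = refute_with_random_common_cause(pm25, so2)
print(original_effect, new_effect)
```

If an estimate is stable, controlling for a variable that is unrelated to both treatment and outcome should leave it almost unchanged, which is the pattern reported in Table 10 and Table 11.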

5. Discussion

The hybrid AKF_LSTM_xAI model was developed and tested on daily air pollutant concentrations and meteorological datasets. Different air pollutants and meteorological variables were considered to predict the SO₂ concentration.

Table 9 compares the models (LSTM, GRU, LSTM_MLP, and AKF_LSTM_xAI) in terms of the RMSE, MSE, and R². A lower RMSE indicates that the predicted SO₂ is closer to the actual SO₂ concentration. For the one-day time step, AKF_LSTM_xAI achieved an RMSE of 0.382, compared with 2.122 for LSTM, 3.602 for LSTM_MLP, and 2.309 for GRU. Furthermore, the R² value of AKF_LSTM_xAI is 0.991, compared with 0.786 for LSTM, 0.385 for LSTM_MLP, and 0.747 for GRU, suggesting a better fit of the model to the actual data. Generally, the AKF_LSTM_xAI model demonstrated better performance in all the M-day time steps. For instance, using a 30-day time step for the prediction of SO₂ concentration yielded the following R² values: AKF_LSTM_xAI (0.870), LSTM (0.739), LSTM_MLP (0.760), and GRU (0.682). Thus, when multiple days of historical data are used, the proposed model still demonstrates better performance because of the Adaptive Kalman Filter capability. Zhou, Wang [59] applied a Kalman Filter, an attention mechanism, and an LSTM model to air pollutant concentration prediction; evaluated with metrics such as the RMSE and R², their model also produced smaller error values than the other models. Song, Huang [61] suggested that hybridizing LSTM with a Kalman Filter provides a better prediction of air pollutants than a single LSTM model for long-term characteristics. In this regard, this research also suggests that hybridizing LSTM and an Adaptive Kalman Filter for SO₂ concentration prediction can provide better performance.

The LIME and SHAP plots give a more comprehensive understanding of how the proposed model uses the variables or features to predict the SO₂ concentration levels. By integrating explainable AI models into the AKF_LSTM model, this study provided insight into how the model made decisions, which is crucial to enhance the model’s transparency and build trust among stakeholders. In so doing, the air pollution value chain stakeholders can be assured of the model’s performance when explainable AI is integrated into legacy LSTM-based models for air pollutant concentration prediction. For instance, for the one-day time step, Figure 2 suggests “Relative_humidity_t0”, with a SHAP value of approximately 0.00018, as the most influential variable in predicting the SO₂ concentration levels. “NO_t0” was the second most influential variable, with a SHAP value of approximately 0.000104. In terms of the LIME values, Figure 3 suggests that a high “wind_speed_t0” decreases the predicted SO₂ concentration. Again, the LIME value reveals that a higher PM₂.₅ concentration could increase the SO₂ concentration prediction for the one-day time step. In terms of feature importance, PM₂.₅ had a weight of approximately 0.093, followed by O₃ (0.08) and others, and “Wind_speed” was considered less important, with a LIME value of approximately −0.00056. Furthermore, the loss function value that shows the complexity of the AKF_LSTM_xAI model was approximately 0.61, which suggests that the proposed model had reasonable complexity. LIME provides a local explanation of specific data points, which may not necessarily generalize to all instances. For example, the local explanations for the 30-day time step for these features may not be the same as those for the one-day time step.

Given these two explanations (LIME and SHAP), it can be noted that “Wind_Speed_t0” has the lowest value in both explanations. This suggests that each explanation method identified this feature as the least important. LIME indicates that PM2.5_t0 > 0.62 added approximately 0.093 to the prediction, which is a clear threshold rule, whereas SHAP indicates that relative humidity contributed 0.00018, which is very small and near zero. In this regard, PM₂.₅ is more impactful for the prediction than relative humidity. Thus, SHAP gives a precise contribution, whereas LIME gives an interpretable rule. Furthermore, LIME gave a clearer insight for the specific prediction (PM₂.₅ > 0.62 is impactful), and SHAP suggests that relative humidity has very minimal influence. In view of this, these two explainable AI models serve different purposes, which could be useful to different stakeholders within the value chain of air pollutant prediction. Therefore, combining both LIME and SHAP in the proposed model is beneficial to different stakeholders. Beriwal and Ayeelyan [50] noted that the LIME model provides a better explanation and causal inference, which help with the impact analysis of air pollutant prediction from multiple locations.

The analysis of the causal inference between SO₂ and PM₂.₅ (based on LIME results) shows very minimal impact. Therefore, based on the results (see Table 10 and Table 11), it could be inferred that the proposed model is robust and trustworthy. Having implemented the proposed model for SO₂ concentration prediction, this study suggests applying the proposed model to different air quality prediction tasks, as it provides the underlying structure that could help research scientists and air monitoring stations deal with noise in dynamic and time-varying air pollutants within the Northern Cape Province and at other air monitoring stations in the Republic of South Africa. Moreover, including both LIME and SHAP in deep learning models could provide an explanation to different stakeholders within the value chain of air pollution monitoring and analysis.

6. Conclusions

In South Africa, air pollution emanates from different sources, including industries and vehicles. While regulatory agencies have enacted policies to control the impact of air pollutants on humans and the environment, the use of advanced technologies to capture air pollutant concentration has received attention from the research community. This study demonstrates the effectiveness of a hybrid AKF_LSTM_xAI model for predicting SO₂ air pollutant concentration levels. This study suggests that by integrating the Adaptive Kalman Filter into LSTM and xAI, the errors in the original data are filtered to improve performance. The AKF reduces the prediction error of the LSTM model, while explainable AI (e.g., SHAP and LIME) offers insight into the intricacies of the model’s structure and an explanation of the predictions. Thus, the combination of LSTM, Kalman Filter, and explainable AI provides a powerful approach for handling time-series data in complex systems. Additionally, causal inference analysis was conducted to analyze the impact of the air pollutants. In this regard, while the LSTM handles temporal dependencies, the Adaptive Kalman Filter improves state estimation, and xAI provides insights into the decision-making process, making the model more transparent and trustworthy. The advantage of the proposed model is that it allows accurate predictions and explains which features are driving those predictions, thus increasing transparency and trust in the model. Future research should consider the integration of emerging explainable AI models with hybrid LSTM and other types of Kalman Filter models. Finally, given that most air pollution monitoring stations are utilizing low-cost sensors, the proposed model can be integrated with low-cost sensor models to improve the quality of data from low-cost sensors.

Author Contributions

Conceptualization and methodology, software, curation, writing—original draft preparation, I.E.A.; writing—review and editing, supervision, project administration, funding acquisition, I.C.O. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the Centre for Global Change, Sol Plaatje University, with the National Research Foundation (NRF) (Number: 136097).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on: https://doi.org/10.6084/m9.figshare.28607666.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Network
SVM	Support vector machine
k-NN	k-nearest neighbor
CNN	Convolutional Neural Networks
LSTM	Long short-term memory
GRU	Gated recurrent unit
biGRU	bidirectional gated recurrent units
WRF-LETKF	Weather Research and Forecasting- Local Ensemble Transform Kalman Filter
CMAQ	Community Multiscale Air Quality
GAM	Generalized Additive Model
LRP	Layer-wise Relevance Propagation
LIME	Local Interpretable Model-Agnostic Explanations
TCN	Temporal Convolutional Network
SHAP	SHapley additive exPlanations
L2X	Learning to eXplain
RNNs	Recurrent Neural Networks
MAPE	Mean Absolute Percentage Error
SGRU	seasonal gated recurrent unit
MLP	Multilayer perceptron
AKF_LSTM_xAI	Adaptive Kalman Filter long short-term memory explainable artificial intelligence

Appendix A

Figure A1. 2-day time step of AKF_LSTM_xAI model.

Figure A2. 2-day time step of all models.

Figure A3. Models’ MSE value.

Figure A4. Models’ RMSE value.

Figure A5. Models’ R² value.

References

World Health Organization. Air Pollution. 2021. Available online: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed on 20 February 2025).
Yeswanth, A.V.S.; Nandanwar, H.; Sharma, K.; Chauhan, A. A data-driven: Design, modeling analytics approach for smart IoT based air pollution monitoring system. In Proceedings of the 2023 2nd International Conference on Informatics, ICI 2023, Noida, India, 23–25 November 2023.
Yeshe, D.; Wairooy, I.K.; Makalew, B.A. Predictive Analytics for Future Air Pollution Levels Based on Population Growth. In Procedia Computer Science; Elsevier: Amsterdam, The Netherlands, 2024.
Oliva, P.; Alexianu, M.; Nasir, R. Suffocating Prosperity: Air Pollution and Economic Growth in Developing Countries; International Growth Centre: London, UK, 2019; pp. 1–9.
Clark, R.; Reed, J.; Sunderland, T. Bridging funding gaps for climate and sustainable development: Pitfalls, progress and potential of private finance. Land Use Policy 2018, 71, 335–346.
Maisha, T.; Mulovhedzi, P.T.; Rambuwani, G.T.; Makgati, L.N.; Barnes, M.; Lekoloane, L.; Engelbrecht, F.A.; Ndarana, T.; Mbokodo, I.L.; Xulu, N.G.; et al. The Development of a Locally Based Weather and Climate Model in Southern Africa; Water Research Commission: Pretoria, South Africa, 2025; pp. 1–197.
Department of Environmental Affairs. National Air Quality Indicator—Monthly Data Report for the Northern Cape Province; The South African Government’s Department of Environment, Forestry and Fisheries: Pretoria, South Africa, 2019; pp. 1–15.
Becker, D.; Alfeus, A.; Molnár, P.; Boman, J.; Wichmann, J. Ambient PM2.5, soot, black carbon and organic carbon levels in Kimberley, South Africa. Clean Air J. 2024, 34, 2.
Charumbira, S.; Ncube, A. An Environmental Disaster: A Critical Review of Kimberlite Diamond Mining in Kimberley, South Africa. Int. J. Disaster Risk Reduct. 2022, 20.
Forestry Fisheries and the Environment, National Air Quality Indicator—Monthly Data Report for the Northern Cape Province; The Department of Forestry, Fisheries and the Environment: Pretoria, South Africa, 2024; pp. 1–15.
Wang, T.; Zhang, Y.; Wang, F. Air quality prediction using machine learning methods: A review. In Environmental Modelling & Software; Elsevier: Amsterdam, The Netherlands, 2019; Volume 120, pp. 1–10.
Alkhodaidi, A.; Attiah, A.; Mhawish, A.; Hakeem, A. The Role of Machine Learning in Enhancing Particulate Matter Estimation: A Systematic Literature Review. Technologies 2024, 12, 198.
Zhao, Z.; Qin, J.; He, Z.; Li, H.; Yang, Y.; Zhang, R. Combining forward with recurrent neural networks for hourly air quality prediction in Northwest of China. Environ. Sci. Pollut. Res. 2020, 27, 28931–28948.
AbdElkader, A.G.; ZainEldin, H.; Saafan, M.M. Optimizing wind power forecasting with RNN-LSTM models through grid search cross-validation. Sustain. Comput. Inform. Syst. 2025, 45, 101054.
Bhardwaj, D.; Ragiri, P.R. A Deep Learning Approach to Enhance Air Quality Prediction: Comparative Analysis of LSTM, LSTM with Attention Mechanism and BiLSTM. In Proceedings of the 2024 IEEE Region 10 Symposium, TENSYMP 2024, New Delhi, India, 27–29 September 2024.
Agbehadji, I.E.; Obagbuwa, I.C. Systematic Review of Machine Learning and Deep Learning Techniques for Spatiotemporal Air Quality Prediction. Atmosphere 2024, 15, 1352.
Dalal, S.; Lilhore, U.K.; Faujdar, N.; Samiya, S.; Jaglan, V.; Alroobaea, R.; Shaheen, M.; Ahmad, F. Optimising air quality prediction in smart cities with hybrid particle swarm optimization-long-short term memory-recurrent neural network model. IET Smart Cities 2024, 6, 156–179.
Zhao, L.; Li, Z.; Qu, L. A novel machine learning-based artificial intelligence method for predicting the air pollution index PM2.5. J. Clean. Prod. 2024, 468, 143042.
Al-Eidi, S.; Amsaad, F.; Darwish, O.; Tashtoush, Y.; Alqahtani, A.; Niveshitha, N. Comparative Analysis Study for Air Quality Prediction in Smart Cities Using Regression Techniques. IEEE Access 2023, 11, 115140–115149.
Naresh, G.; Indira, B. Air Pollution Prediction using Multivariate LSTM Deep Learning Model. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 211–220.
Ahmad, E.; He, Y.; Luo, Z.; Lv, J. A Hybrid Long Short-Term Memory and Kalman Filter Model for Train Trajectory Prediction. IEEE Trans. Intell. Transp. Syst. 2024, 25, 7125–7139.
Li, Z.; Chen, L.; Zhang, Y. Air pollution prediction based on LSTM neural networks: A case study of PM2.5 in China. J. Environ. Manag. 2019, 245, 150–159.
Tsai, Y.-T.; Zeng, Y.-R.; Chang, Y.-S. Air Pollution Forecasting Using RNN with LSTM. In Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Athens, Greece, 12–15 August 2018; pp. 1074–1079.
He, Z.; Guo, Q. Comparative Analysis of Multiple Deep Learning Models for Forecasting Monthly Ambient PM2.5 Concentrations: A Case Study in Dezhou City, China. Atmosphere 2024, 15, 1432.
Sun, Q.; Zhu, Y.; Chen, X.; Xu, A.; Peng, X. A hybrid deep learning model with multi-source data for PM2.5 concentration forecast. Air Qual. Atmos. Health 2021, 14, 503–513.
Gilik, A.; Ogrenci, S.; Özme, A. Air quality prediction using CNN+LSTM-based hybrid deep learning architecture. Environ. Sci. Pollut. Res. 2022, 29, 1–19.
Hu, M.; Lu, X.; Chen, Y.; Li, Z.; Wang, Y.; Fung, J.C. AirQFormer: Improving regional air quality forecast with a hybrid deep learning model. Sustain. Cities Soc. 2025, 119, 106133.
Sayeed, A.; Eslami, E.; Lops, Y.; Choi, Y. CMAQ-CNN: A new-generation of post-processing techniques for chemical transport models using deep neural networks. Atmos. Environ. 2022, 273, 118961.
Wang, L.; Chen, B.; Ouyang, J.; Mu, Y.; Zhen, L.; Yang, L.; Tang, L. Causal-inference machine learning reveals the drivers of China’s 2022 ozone rebound. Environ. Sci. Ecotechnol. 2025, 24, 100524.
Hong, H.; Choi, I.; Jeon, H.; Kim, Y.; Lee, J.B.; Park, C.H.; Kim, H.S. An Air Pollutants Prediction Method Integrating Numerical Models and Artificial Intelligence Models Targeting the Area around Busan Port in Korea. Atmosphere 2022, 13, 1462.
Karaiskos, P.; Munian, Y.; Martinez-Molina, A.; Alamaniotis, M. Indoor air quality prediction modeling for a naturally ventilated fitness building using RNN-LSTM artificial neural networks. Smart Sustain. Built Environ. 2024.
Zhang, B.; Zou, G.; Qin, D.; Lu, Y.; Jin, Y.; Wang, H. A novel Encoder-Decoder model based on read-first LSTM for air pollutant prediction. Sci. Total Environ. 2021, 765, 144507.
Aggarwal, A.; Toshniwal, D. A hybrid deep learning framework for urban air quality forecasting. J. Clean. Prod. 2021, 329, 129660.
Alazmi, A.; Rakha, H. Assessing and Validating the Ability of Machine Learning to Handle Unrefined Particle Air Pollution Mobile Monitoring Data Randomly, Spatially, and Spatiotemporally. Int. J. Environ. Res. Public Health 2022, 19, 10098.
Ali, S.; Alam, F.; Potgieter, J.; Arif, K.M. Leveraging Temporal Information to Improve Machine Learning-Based Calibration Techniques for Low-Cost Air Quality Sensors. Sensors 2024, 24, 2930.
Zhao, S.; Lin, H.; Wang, H.; Liu, G.; Wang, X.; Du, K.; Ren, G. Spatiotemporal distribution prediction for PM2.5 based on STXGBoost model and high-density monitoring sensors in Zhengzhou High Tech Zone, China. J. Environ. Manag. 2025, 373, 123682.
Wang, Y.Z.; He, H.D.; Huang, H.C.; Yang, J.M.; Peng, Z.R. High-resolution spatiotemporal prediction of PM2.5 concentration based on mobile monitoring and deep learning. Environ. Pollut. 2025, 364, 125342.
Marwala, T.; Fournier-Tombs, E.; Stinckwich, S. The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development; United Nations University: Tokyo, Japan, 2023; pp. 1–5.
Hua, V.; Nguyen, T.; Dao, M.S.; Nguyen, H.D.; Nguyen, B.T. The impact of data imputation on air quality prediction problem. PLoS ONE 2024, 12, 9.
Agbehadji, I.E.; Millham, R.C.; Fong, S.J.; Yang, H. Bioinspired computational approach to missing value estimation. Math. Probl. Eng. 2018, 2018, 9457821.
Teh, H.Y.; Kempa-Liehr, A.W.; Wang, K.I.-K. Sensor data quality: A systematic review. J. Big Data 2020, 7, 11.
Shi, T.; Li, P.; Yang, W.; Qi, A.; Qiao, J. Application of TCN-biGRU neural network in PM2.5 concentration prediction. Environ. Sci. Pollut. Res. Int. 2023, 30, 119506–119517.
Yuan, P.; Mei, Y.; Zhong, Y.; Xia, Y.; Fang, L. A Hybrid Deep Learning Model for Predicting PM2.5. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing, ICSP 2022, Wuhan, China, 22–25 April 2022.
Faydi, M.; Zrelli, A.; Ezzedine, T. Smart Environment Monitoring Systems for PM2.5 Prediction Using Deep Learning Models in Smart City. In Proceedings of the 2023 International Symposium on Networks, Computers and Communications, ISNCC 2023, Doha, Qatar, 23–26 October 2023.
Guo, Q.; He, Z.; Wang, Z. Monthly climate prediction using deep convolutional neural network and long short-term memory. Sci. Rep. 2024, 14, 17748.
Lin, Y.C.; Lin, Y.T.; Chen, C.R.; Lai, C.Y. Meteorological and traffic effects on air pollutants using Bayesian networks and deep learning. J. Environ. Sci. 2025, 152, 54–70.
Kaveh, M.; Mesgari, M.S.; Kaveh, M. A Novel Evolutionary Deep Learning Approach for PM2.5 Prediction Using Remote Sensing and Spatial–Temporal Data: A Case Study of Tehran. ISPRS Int. J. Geo-Inf. 2025, 14, 42.
Wang, H.S.; Jwo, D.J.; Gao, Z.H. Towards Explainable Artificial Intelligence for GNSS Multipath LSTM Training Models. Sensors 2025, 25, 978.
Ganguli, I.; Nakum, M.; Das, B.; Kshetrimayum, N. Comprehensive Analysis of Air Quality Trends in India Using Machine Learning and Deep Learning Models. In Proceedings of the ICDCN 2025 the 26th International Conference on Distributed Computing and Networking, Hyderabad, India, 4–7 January 2025.
Beriwal, S.; Ayeelyan, J. Decoding Pollution: A Federated Learning-Based Pollution Prediction Study with Health Ramifications Using Causal Inferences. Electronics 2025, 14, 350.
Agbehadji, I.E.; Obagbuwa, I.C. Mode Decomposition Bi-Directional Long Short-Term Memory (BiLSTM) Attention Mechanism and Transformer (AMT) Model for Ozone (O3) Prediction in Johannesburg, South Africa. Forecast. 2025, 7, 15.
Niu, M.; Zhang, Y.; Ren, Z. Deep Learning-Based PM2.5 Long Time-Series Prediction by Fusing Multisource Data—A Case Study of Beijing. Atmosphere 2023, 14, 340.
Ran, X.; Shan, Z.; Fang, Y.; Lin, C. An LSTM-Based Method with Attention Mechanism for Travel Time Prediction. Sensors 2019, 19, 861.
Wang, Z.; Xu, C.; Tan, Y.P.; Yuan, J. Attention-Aware Noisy Label Learning for Image Classification. In Computer Vision and Pattern Recognition; Elsevier: Amsterdam, The Netherlands, 2020; pp. 1–10.
Chen, H.; Yang, J.; Fu, X.; Zheng, Q.; Song, X.; Fu, Z.; Wang, J.; Liang, Y.; Yin, H.; Liu, Z.; et al. Water Quality Prediction Based on LSTM and Attention Mechanism: A Case Study of the Burnett River, Australia. Sustainability 2022, 14, 13231.
Pantiskas, L.; Verstoep, K.; Bal, H. Interpretable Multivariate Time Series Forecasting with Temporal Attention Convolutional Neural Networks. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020.
Ahmad, R.; Alkhammash, E.H. Online Adaptive Kalman Filtering for Real-Time Anomaly Detection in Wireless Sensor Networks. Sensors 2024, 24, 5046.
Singh, R.; Mehra, R.; Sharma, L. Design of Kalman Filter for Wireless Sensor Network. In Proceedings of the 2016 International Conference on Internet of Things and Applications (IOTA), Pune, India, 22–24 January 2016; IEEE: New York, NY, USA, 2016.
Zhou, H.; Wang, T.; Zhao, H.; Wang, Z. Updated Prediction of Air Quality Based on Kalman-Attention-LSTM Network. Sustainability 2023, 15, 356.
Christakis, I.; Tsakiridis, O.; Kandris, D.; Stavrakas, I. A Kalman Filter Scheme for the Optimization of Low-Cost Gas Sensor Measurements. Electronics 2024, 13, 25. [Google Scholar] [CrossRef]
Song, X.; Huang, J.; Song, D. Air quality prediction based on LSTM-kalman model. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC 2019), Chongqing, China, 24–26 May 2019; pp. 1–5. [Google Scholar]
Yang, C.H.; Chen, P.H.; Wu, C.H.; Yang, C.S.; Chuang, L.Y. Deep learning-based air pollution analysis on carbon monoxide in Taiwan. Ecol. Inform. 2024, 80, 102477. [Google Scholar] [CrossRef]
Kataria, A.; Puri, V. AI- and IoT-based hybrid model for air quality prediction in a smart city with network assistance. IET Netw. 2022, 11, 221–233. [Google Scholar] [CrossRef]
Yang, S.C.; Cheng, F.Y.; Wang, L.J.; Wang, S.H.; Hsu, C. Impact of lidar data assimilation on planetary boundary layer wind and PM2.5 prediction in Taiwan. Atmos. Environ. 2022, 277, 119064. [Google Scholar] [CrossRef]
Lee, K.; Yu, J.; Lee, S.; Park, M.; Hong, H.; Park, S.Y.; Choi, M.; Kim, J.; Kim, Y.; Woo, J.-H.; et al. Development of Korean Air Quality Prediction System version 1 (KAQPS v1) with focuses on practical issues. Geosci. Model Dev. 2020, 13, 1055–1073. [Google Scholar] [CrossRef]
Kong, L.; Tang, X.; Zhu, J.; Wang, Z.; Pan, Y.; Wu, H.; Wu, L.; Wu, Q.; He, Y.; Tian, S. Improved Inversion of Monthly Ammonia Emissions in China Based on the Chinese Ammonia Monitoring Network and Ensemble Kalman Filter. Environ. Sci. Technol. 2019, 53, 12529–12538. [Google Scholar] [CrossRef]
Kong, L.; Tang, X.; Zhu, J.; Wang, Z.; Li, J.; Wu, H.; Wu, Q.; Chen, H.; Zhu, L.; Wang, W.; et al. A 6-year-long (2013-2018) high-resolution air quality reanalysis dataset in China based on the assimilation of surface observations from CNEMC. Earth Syst. Sci. Data 2021, 13, 529–570. [Google Scholar] [CrossRef]
Kong, L.; Tang, X.; Wang, Z.; Zhu, J.; Li, J.; Wu, H.; Wu, Q.; Chen, H.; Zhu, L.; Wang, W.; et al. Changes in air pollutant emissions in China during two clean-air action periods derived from the newly developed Inversed Emission Inventory for Chinese Air Quality (CAQIEI). Earth Syst. Sci. Data 2024, 16, 4351–4387. [Google Scholar] [CrossRef]
Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: A comprehensive review. Artif. Intell. Rev. 2022, 55, 3503–3568. [Google Scholar] [CrossRef]
Abdollahi, A.; Pradhan, B. Explainable artificial intelligence (XAI) for interpreting the contributing factors feed into the wildfire susceptibility prediction model. Sci. Total Environ. 2023, 879, 163004. [Google Scholar] [CrossRef] [PubMed]
Zafar, M.R.; Khan, N. Deterministic Local Interpretable Model-Agnostic Explanations for Stable Explainability. Mach. Learn. Knowl. Extr. 2021, 3, 525–541. [Google Scholar] [CrossRef]
Dillon, E.; Lundberg, S.; LaRiviere, J.; Roth, J.; Syrgkanis, V. Be careful when interpreting predictive models in search of causal insights. In A Joint Article About Causality and Interpretable Machine Learning with 2018; Judea Pearl: Los Angeles, CA, USA, 2018; Available online: https://shap.readthedocs.io/en/latest/example_notebooks/overviews/Be%20careful%20when%20interpreting%20predictive%20models%20in%20search%20of%20causal%20insights.html (accessed on 17 April 2025).
Li, Z.; Guo, X.; Qiang, S. A survey of deep causal models and their industrial applications. Artif. Intell. Rev. 2024, 57, 298. [Google Scholar] [CrossRef]
Lagemann, K.; Lagemann, C.; Taschler, B.; Mukherjee, S. Deep learning of causal structures in high dimensions under data limitations. Nat. Mach. Intell. 2023, 5, 1306–1316. [Google Scholar] [CrossRef]
Zhao, C.; Lin, Z.; Yang, L.; Jiang, M.; Qiu, Z.; Wang, S.; Gu, Y.; Ye, W.; Pan, Y.; Zhang, Y.; et al. A study on the impact of meteorological and emission factors on PM2.5 concentrations based on machine learning. J. Environ. Manag. 2025, 376, 124347. [Google Scholar] [CrossRef]
Zhong, J.; Xiao, R.; Wang, P.; Yang, X.; Lu, Z.; Zheng, J.; Jiang, H.; Rao, X.; Luo, S.; Huang, F. Identifying influence factors and thresholds of the next day’s pollen concentration in different seasons using interpretable machine learning. Sci. Total Environ. 2024, 935, 173430. [Google Scholar] [CrossRef]
Zhang, Y.; Sun, Q.; Liu, J.; Petrosian, O. Long-Term Forecasting of Air Pollution Particulate Matter (PM2.5) and Analysis of Influencing Factors. Sustainability 2024, 16, 19. [Google Scholar] [CrossRef]
Zhang, L.; Wang, L.; Ji, D.; Xia, Z.; Nan, P.; Zhang, J.; Li, K.; Qi, B.; DU, R.; Sun, Y.; et al. Explainable ensemble machine learning revealing the effect of meteorology and sources on ozone formation in megacity Hangzhou, China. Sci. Total Environ. 2024, 922, 171295. [Google Scholar] [CrossRef]
Zhang, C.; Xie, Y.; Shao, M.; Wang, Q.G. Application of machine learning to analyze ozone sensitivity to influencing factors: A case study in Nanjing, China. Sci. Total Environ. 2024, 929, 172544. [Google Scholar] [CrossRef]
Oliveira, P.; Franco, F.; Bessa, A.; Durães, D.; Novais, P. Employing Explainable AI Techniques for Air Pollution: An Ante-Hoc and Post-Hoc Approach in Dioxide Nitrogen Forecasting. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2025. [Google Scholar]
Tan, J.C.M.; Cao, Q.; Quek, C. FE-RNN: A fuzzy embedded recurrent neural network for improving interpretability of underlying neural network. Inf. Sci. 2024, 663, 120276. [Google Scholar]
Talaat, F.M. Explainable Enhanced Recurrent Neural Network for lie detection using voice stress analysis. Multimed Tools Appl. 2024, 83, 32277–32299. [Google Scholar] [CrossRef]
Kiran, A.; Kumar, H.; Sivanandam, S.; Senthilvel, P.G.; Lalitha, R.V.S.; Reddy, C.P. Explainable AI in thermal modelling enhancing precision in thermal gradient monitoring for additive manufacturing using LSTM networks. Therm. Sci. Eng. Prog. 2025, 60, 103465. [Google Scholar] [CrossRef]
Ndao, M.L.; Youness, G.; Niang, N.; Saporta, G. Improving predictive maintenance: Evaluating the impact of preprocessing and model complexity on the effectiveness of eXplainable Artificial Intelligence methods. Eng. Appl. Artif. Intell. 2025, 144, 110144. [Google Scholar] [CrossRef]
Sunu Fathima, T.H.; Kovoor, B.C. Explainable AI Insights into a Time Series Weather Prediction Model Using Stacked LSTM. In Lecture Notes in Networks and Systems; Springer International Publishing AG: Cham, Switzerland, 2025. [Google Scholar]

Figure 1. Proposed AKF_LSTM_xAI structure.

Figure 2. Performance result of SHAP analysis (AKF_LSTM_xAI model).

Figure 3. Performance results of LIME analysis (AKF_LSTM_xAI model).

Figure 4. Causal inference between the feature in column 1 with SO₂ and PM₂.₅.

Figure 5. Causal inference between all features with SO₂ and PM₂.₅.

Table 1. Summary of deep learning models for predicting air pollutants.

Author	Hybrid Model	Pollutant	Problem Identified
Yuan, Mei [43]	Hybrid deep learning model (simple-RNN, LSTM, GRU, and TCN)	PM₂.₅	How to extract features effectively from a large amount of relevant monitoring data
Hong, Choi [30]	CMAQ model and RNN-LSTM	PM₂.₅ and O₃	Lack of measures to identify and minimize the effects of air pollution
Faydi, Zrelli [44]	CNN for feature extraction and LSTM for temporal sequence prediction	PM₂.₅	To improve performance in smart monitoring systems used for air pollution
Shi, Li [42]	TCN with biGRU	PM₂.₅	The random guess in RNN neural networks leads to performance issues
Guo, He [45]	Hybrid CNN-LSTM	Meteorological variables	Reducing forecasting error for a one-month-ahead time step
Lin, Lin [46]	LSTM with GAM and Bayesian network	CO, NO, NO₂, NOₓ, O₃, PM₁₀, and PM₂.₅; meteorological variables analyzed by considering both rainfall amount and patterns	Impact of traffic factors on air quality
Kaveh, Mesgari [47]	Geographic information systems (GIS), remote sensing (RS), and a hybrid LSTM architecture; orchard algorithm (OA) with LSTM to optimize air pollution forecasting	PM₂.₅ concentrations, meteorological data, topographical features, and satellite imagery	Gradient-based methods face limitations such as getting trapped in local minima and high computational costs
Wang, Jwo [48]	RNNs with LSTM cells with LRP	-	Understanding and interpreting deep learning models in Global Navigation Satellite System (GNSS)
Ganguli, Nakum [49]	RNN, LSTM, and GRU. ML models (ARIMA and SARIMA) for air quality prediction. RNN performs better	PM₂.₅ levels	Forecasting of air quality
Beriwal and Ayeelyan [50]	Federated learning that employed VGG-19 deep learning model with causal inference for model interpretability	PM₂.₅ and PM₁₀	To monitor and predict PM₂.₅ and PM₁₀ from multiple locations, with impact analysis
Agbehadji and Obagbuwa [51]	BiLSTM with attention transformer mechanism with mode decomposition approach	O₃	Predict the nonlinear nature of O₃ concentration in Johannesburg

Table 2. Types of Kalman Filters.

Author	Research Purpose	Comparative Models	Approach	Pollutant	Evaluation Method	Accuracy Recorded
Zhou, Wang [59]	Air pollutant concentration prediction	RNN, GRU, LSTM, attention-LSTM, and Kalman-LSTM	Kalman Filter, attention, and LSTM model	SO₂, NO₂, PM₁₀, PM₂.₅, and CO	SE, RMSE, MAE, and R-square	Smaller error values and a better R-square
Kataria and Puri [63]	Air quality index prediction	ANN, SVM, k-NN, CNN, LSTM, CNN-LSTM, ensemble model	A Kalman Filter removes unwanted noise from data collected via sensors. Proposed model: CNN-LSTM with the Bayesian optimization algorithm (CNN-LSTM-BOA)	CO and PM₂.₅	MAE, RMSE, coefficient of determination (R²), and accuracy score	Over 97% accuracy
Yang, Cheng [64]	Impact of lidar on PM₂.₅ concentration and the wind fields	-	WRF-LETKF framework coupled with the CMAQ model	Lidar-retrieved PM₂.₅	-	-
Lee, Yu [65]	An air quality prediction system was developed for the main air quality criteria species in South Korea	The DA run was compared with the CMAQ simulations	Data assimilation (DA) of optimal interpolation (OI) with Kalman Filter	PM₁₀, PM₂.₅, CO, O₃, and SO₂	Index of agreement	-
Kong, Tang [66]; Kong, Tang [67]; Kong, Tang [68]	The uncertainties in predictive systems	-	Ensemble Kalman Filter and the Nested Air Quality Prediction Modeling System	NH₃ emissions with large uncertainty	-	-
Song, Huang [61]	Prediction of time-series data with long-term and short-term characteristics	LSTM	LSTM and Kalman Filtering	CO, NO₂, C₆H₆	RMSE and R-square	The LSTM-Kalman model is better than the LSTM

Table 3. Summary of explainable AI and LSTM models.

Author	Research Focus	Approach
Kiran, Kumar [83]	Enhancing the measurement accuracy of thermal gradients in additive manufacturing	Explainable artificial intelligence (xAI) with LSTM networks. The framework includes LRP and SHAP.
Ndao, Youness [84]	Impact of data preprocessing and model complexity	LSTM predicts the remaining useful life (RUL). Explainable artificial intelligence (xAI) is used to understand the relationship between the input data and the predicted RUL. Three post hoc, local, model-agnostic xAI methods, namely LIME, SHAP, and Learning to eXplain (L2X), are considered in the context of RUL prediction.
Sunu Fathima and Kovoor [85]	The availability of a large amount of diverse weather data is a challenge for traditional models	Short-term temperature based on a Stacked LSTM. Explainable AI using SHapley Additive exPlanations (SHAP) to determine the influence of different features on the predicted values.

Table 4. Performance evaluation metrics.

Metric	Formula	Explanation of Variables
MSE	$MSE = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \bar{y}_{i})^{2}$	$\bar{y}_{i}$ is the predicted value from the model, and $y_{i}$ is the actual value of the target variable
RMSE	$RMSE = \sqrt{MSE}$	Square root of the MSE, expressed in the same units as the target variable
R²	$R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \bar{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \overset{=}{y})^{2}}$	$\overset{=}{y}$ is the mean of the actual target values, and $N$ represents the number of data points
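
The three metrics in Table 4 can be computed in a few lines. The following is a minimal sketch in plain Python; the function names and the toy values are illustrative and are not taken from the study. R² is computed with the standard definition, one minus the residual sum of squares divided by the total sum of squares around the mean of the actual values.

```python
import math


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration only (not from the Kimberley dataset).
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(mse(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Because R² compares the model against a constant prediction at the mean, it becomes negative whenever the model does worse than that baseline.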

Table 5. Hyperparameter tuning.

Model/Parameter	Value
LSTM	Hidden units = 50, Batch size = 32
Dense	1
Optimizer	“adam”
Epoch	100
Activation function	“relu”
AKF process noise	0.01
AKF measurement noise	0.1
Initial error covariance	1
Time step	1–5, 10, 20, 30
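
To show how the three AKF settings in Table 5 enter a filter, here is a minimal scalar sketch in plain Python. It is not the authors' implementation: the random-walk state model, the innovation-based rule for re-estimating the measurement noise, the forgetting factor `alpha`, the floor `r_floor`, and the names `AdaptiveKalman1D` and `smooth` are all assumptions made for illustration. Only the three default values (0.01, 0.1, and 1) come from Table 5.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveKalman1D:
    """Scalar Kalman filter whose measurement noise is re-estimated online."""
    q: float = 0.01        # process noise (Table 5)
    r: float = 0.1         # initial measurement noise (Table 5)
    p: float = 1.0         # initial error covariance (Table 5)
    x: float = 0.0         # initial state estimate (assumption)
    alpha: float = 0.95    # forgetting factor for adapting r (assumption)
    r_floor: float = 1e-6  # keeps r strictly positive (assumption)

    def update(self, z: float) -> float:
        # Predict: random-walk state model, so only the uncertainty grows.
        p_pred = self.p + self.q
        # Innovation: measurement minus predicted state.
        innovation = z - self.x
        # Adapt r: E[innovation^2] = p_pred + r, so innovation^2 - p_pred samples r.
        r_sample = innovation ** 2 - p_pred
        self.r = max(self.r_floor, self.alpha * self.r + (1 - self.alpha) * r_sample)
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + self.r)
        self.x += gain * innovation
        self.p = (1 - gain) * p_pred
        return self.x


def smooth(values, x0=None):
    """Run the filter over a sequence, starting from the first value by default."""
    start = values[0] if x0 is None else x0
    akf = AdaptiveKalman1D(x=start)
    return [akf.update(z) for z in values]
```

In a pipeline like the one described in this paper, `smooth` could be applied to a noisy series such as the LSTM output or the raw SO₂ measurements. A larger `q` makes the filter follow the data more closely, while a larger `r` makes it trust the data less.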

Table 6. SHAP feature importance (AKF_LSTM_xAI model).

Features	Value
Relative_humidity_t₀	0.000177438
NO_t₀	0.000103743
CO_t₀	7.01776 × 10⁻⁵
NO₂_t₀	4.52497 × 10⁻⁵
Ambient_Temperature_t₀	4.25577 × 10⁻⁵
O₃_t₀	2.50957 × 10⁻⁵
PM_2.5_t₀	1.57543 × 10⁻⁵
PM₁₀_t₀	1.44274 × 10⁻⁵
Wind_speed_t₀	0
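
The values in Table 6 are small in absolute terms, so it can help to express them as shares of the total. The sketch below does this in plain Python, assuming the listed values are non-negative importance scores that can meaningfully be summed (this section does not state how they were aggregated). The dictionary keys are plain-text spellings of the feature names.

```python
# SHAP importance values transcribed from Table 6.
shap_importance = {
    "Relative_humidity_t0": 0.000177438,
    "NO_t0": 0.000103743,
    "CO_t0": 7.01776e-5,
    "NO2_t0": 4.52497e-5,
    "Ambient_Temperature_t0": 4.25577e-5,
    "O3_t0": 2.50957e-5,
    "PM2.5_t0": 1.57543e-5,
    "PM10_t0": 1.44274e-5,
    "Wind_speed_t0": 0.0,
}

# Normalize so the shares sum to 1, then rank from most to least influential.
total = sum(shap_importance.values())
shares = {name: value / total for name, value in shap_importance.items()}
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

for name, share in ranked:
    print(f"{name:<24s} {share:6.1%}")
```

Under this normalization, relative humidity accounts for roughly 36% of the summed importance, followed by NO at about 21%.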

Table 7. LIME feature importance (AKF_LSTM_xAI model).

Feature#	Feature	Weight
0	PM_2.5_t₀ > 0.62	0.0926935
1	O₃_t₀ > 0.61	0.0816015
2	NO₂_t₀ > 0.61	0.0779279
3	PM₁₀_t₀ > 0.62	0.0660164
4	Relative_humidity_t₀ > 0.58	0.0318761
5	Ambient_Temperature_t₀ > 0.58	0.0281854
6	NO_t₀ > 0.60	0.0225153
7	CO_t₀ > 0.61	0.00873191
8	Wind_speed_t₀ > 0.40	−0.00055597

Table 8. Loss value of the explanation (AKF_LSTM_xAI model).

Loss Function	Value
MSE	0.6052

Table 9. Performance of AKF_LSTM_xAI compared with other models for the M-day time step.

	RMSE				MSE				R²
M-Day Time Step	AKF_LSTM_xAI	LSTM	LSTM_MLP	GRU	AKF_LSTM_xAI	LSTM	LSTM_MLP	GRU	AKF_LSTM_xAI	LSTM	LSTM_MLP	GRU
1	0.382	2.122	3.602	2.309	0.146	4.504	12.972	5.332	0.991	0.786	0.385	0.747
2	0.805	2.249	3.064	2.837	0.648	5.061	9.386	8.054	0.960	0.760	0.554	0.618
3	0.828	2.262	4.608	3.086	0.685	5.114	21.239	9.522	0.957	0.757	−0.007	0.548
4	0.755	2.323	2.715	2.762	0.570	5.396	7.3712	7.627	0.965	0.744	0.650	0.638
5	1.015	2.307	2.423	2.916	1.030	5.323	5.872	8.504	0.936	0.748	0.721	0.597
10	1.152	2.526	3.745	2.540	1.326	6.378	14.053	6.453	0.918	0.696	0.332	0.693
20	1.395	2.434	3.274	2.124	1.946	5.927	10.718	4.511	0.879	0.717	0.489	0.785
30	1.145	2.333	2.228	2.576	2.094	5.446	5.0105	6.634	0.870	0.739	0.760	0.682
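
Since Table 4 defines RMSE as the square root of MSE, the two columns of Table 9 can be cross-checked against each other. The sketch below does this for the AKF_LSTM_xAI column. The helper name `inconsistent_rows` and the tolerance of 0.005, chosen to allow for three-decimal rounding, are assumptions.

```python
import math

# (time step, reported RMSE, reported MSE) for AKF_LSTM_xAI, transcribed from Table 9.
akf_rows = [
    (1, 0.382, 0.146), (2, 0.805, 0.648), (3, 0.828, 0.685), (4, 0.755, 0.570),
    (5, 1.015, 1.030), (10, 1.152, 1.326), (20, 1.395, 1.946), (30, 1.145, 2.094),
]


def inconsistent_rows(rows, tol=0.005):
    """Return rows whose reported RMSE differs from sqrt(MSE) by more than tol."""
    flagged = []
    for step, rmse, mse in rows:
        implied_rmse = math.sqrt(mse)
        if abs(rmse - implied_rmse) > tol:
            flagged.append((step, rmse, round(implied_rmse, 3)))
    return flagged


flagged = inconsistent_rows(akf_rows)
print(flagged)
```

With the values as transcribed, only the 30-day row is returned: 1.145² ≈ 1.311, whereas the listed MSE is 2.094. The table alone does not indicate which of the two figures is the intended one.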

Table 10. Causal inference of SO₂ and PM₂.₅.

Summary	Value
Estimated effect	2.94082
New effect	2.98081
p-value	0.98

Table 11. Summary of p-values and causal effects.

Summary of Effects	Column 1	Column 2
Estimated effect	0.25432	0.110716
New effect	0.25435	0.110714
p-value	0.88	0.90

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

