1. Introduction
As a rapidly developing country, China has made significant progress in economic growth. However, this progress has come with serious air pollution challenges due to insufficient environmental protection during the rapid urbanization process in recent years [
1]. With the increasing severity of environmental pollution, the accuracy of air quality prediction has become particularly important [
2]. In response to this need, many researchers have proposed air quality prediction methods based on deep learning and hybrid models. The existing literature shows that traditional statistical models struggle to handle the nonlinear relationships in air quality data [
3], whereas deep learning models—especially hybrid models—have proven effective in nonlinear fitting and have been widely applied in this field.
Janarthanan et al. [
4] proposed a hybrid deep learning model based on Support Vector Regression (SVR) and Long Short-Term Memory (LSTM) networks for predicting the Air Quality Index (AQI) of Chennai, India. Their results demonstrated that their model outperformed existing technologies, providing precise air quality values for specific urban locations. Wang et al. [
5] introduced a hybrid spatiotemporal model combining Graph Attention Networks (GATs) and Gated Recurrent Units (GRUs) for predicting and controlling regional composite air pollution. The results showed that their method effectively handled complex spatiotemporal correlations, demonstrating higher accuracy than traditional spatiotemporal prediction methods in predicting pollutants such as PM2.5 and O3 across various regions. Tang et al. [
6] proposed a novel hybrid prediction model combining Variational Mode Decomposition (VMD) with Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, Sample Entropy, and Gated Recurrent Units (CEEMDAN-SE-GRU). Their experimental results indicated that their hybrid model significantly outperformed traditional single models and other hybrid models in air quality prediction accuracy, particularly when dealing with complex temporal features and non-stationary signals.
Wu et al. [
7] presented a deep neural network framework with an attention mechanism (ADNNet) for AQI prediction. Their results showed that ADNNet outperformed current state-of-the-art models (such as LSTM, N-BEATS, Informer, Autoformer, and VMD-TCN). Qian et al. [
8] proposed an evolutionary deep learning model based on XGBoost feature selection and Gaussian data augmentation for AQI prediction. Their results demonstrated that the method achieved higher prediction accuracy compared to traditional models, particularly in handling data noise and temporal dependencies. Chen et al. [
9] proposed a novel model based on complex networks to analyze the pollutant transmission mechanisms during severe pollution events. Their experimental results demonstrated that the model effectively identified the transmission paths of pollutants between regions, particularly during heavy pollution events, where complex network methods captured key nodes and paths of pollutant interactions, providing new insights into pollutant transmission during severe pollution events.
Nikpour et al. [
10] proposed a hybrid deep learning model based on Informer, called Gelato, for multivariable air pollution prediction. Experimental results showed that the Gelato model achieved higher accuracy and stability compared to traditional methods (such as LSTM, GRU, and Transformer) in predicting air pollution across multiple cities. Kalantari et al. [
11] compared shallow learning and deep learning methods for AQI prediction, concluding that while shallow learning models were more computationally efficient for small datasets and simple problems, deep learning models provided higher prediction accuracy when handling complex temporal features and multivariable data. Yu et al. [
12] proposed a multigranularity spatiotemporal fusion Transformer model (MGSFformer) for air quality prediction. The study demonstrated that MGSFformer excelled in capturing the spatiotemporal features of pollutant concentration changes, making it particularly suitable for complex urban environments and dynamic pollution source prediction. Tao et al. [
13] introduced a hybrid AQI prediction model built on a three-stage decomposition technique. The results confirmed the feasibility of the three-stage hybrid approach, showing its significant advantages in prediction accuracy.
Udristioiu et al. [
14] proposed a hybrid machine learning method for predicting particulate matter (PM) and AQI, demonstrating that the hybrid model outperformed single-algorithm models in prediction accuracy and generalization ability, providing stable predictions across multiple urban environments. Huang et al. [
15] studied the spatiotemporal dynamic interactions and formation mechanisms of air pollution in the Central Plains city cluster of China. By analyzing air pollution data from that city cluster, the study explored the interactions and spatial transmission mechanisms of air pollution between cities, providing new perspectives on the propagation and formation of pollution in Chinese city clusters and offering theoretical support for regional air quality management and policy development.
Dey [
16] proposed a city AQI prediction method based on multivariable Convolutional Neural Networks (CNNs) and customized stacked Long Short-Term Memory (LSTM) models. Experimental results showed that their combined model performed exceptionally well in air quality prediction tasks across multiple cities, demonstrating significant improvements in prediction accuracy and generalization ability compared to traditional LSTM and other deep learning models (such as GRU and RNN). Li et al. [
17] proposed an evolutionary deep learning model based on an improved Grey Wolf Optimization (GWO) algorithm and Deep Belief Network–Extreme Learning Machine (DBN-ELM). The results indicated that the hybrid model significantly outperformed traditional machine learning and other deep learning models, especially in handling complex nonlinear relationships and temporal features. Gokul et al. [
18] applied AI technology to analyze the spatiotemporal air quality in Hyderabad, India, and predict PM2.5 concentrations. Their research found that meteorological changes and traffic density were significant factors influencing PM2.5 concentrations. Fan et al. [
19] proposed a real-time monitoring method combining machine learning and computer vision for urban street canyon air quality. Experimental results showed that their method effectively captured the spatial heterogeneity of pollutant concentrations in street canyon areas, providing higher accuracy and timeliness than traditional single-sensor monitoring methods. Ahmed et al. [
20] introduced an advanced prediction model based on deep learning that integrated satellite remote sensing water-climate variables for AQI prediction. Their results showed that the model significantly improved AQI prediction accuracy compared to traditional methods that only used ground monitoring data, especially for long-term and large-scale air quality prediction. Rabie et al. [
21] proposed a hybrid framework combining CNN and Bidirectional Long Short-Term Memory (Bi-LSTM) for high-resolution AQI prediction in megacities. Experimental results demonstrated that the CNN-Bi-LSTM hybrid model effectively predicted AQI in megacities, significantly improving prediction accuracy and generalization ability compared to using LSTM or CNN alone.
The hybrid machine learning models discussed above have shown good results in air quality prediction, mainly for specific cities or regions. However, there is insufficient research on air quality prediction for cities with varying climates, geographical locations, and topographical features. Therefore, this study proposes a more versatile deep learning model, a hybrid framework combining Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Kolmogorov–Arnold Networks (KANs), to improve the accuracy of Air Quality Index (AQI) prediction in cities with geographical and climatic differences, tailored in particular to the needs of major cities in China. The approach accounts for variations in pollution levels, terrain features, and weather conditions across different urban areas. To this end, the research utilized pollution data from five cities in different parts of China to build a large dataset. Additionally, the study aimed to gain a deeper understanding of the pollution patterns in the selected cities, monitor fluctuations in pollution levels, and optimize the computational efficiency of the developed model. Through this work, a deep learning model was established that can accurately predict the AQI for cities with geographical and climatic disparities. To achieve these objectives, the study collected concentration data for six pollutants (PM2.5, PM10, SO2, NO2, CO, and O3) and AQI values, sourced from air quality monitoring websites. Missing values were first imputed and the data were tested for normality; then, to improve data quality and reduce noise, Gaussian filtering was applied during the data preprocessing stage. Subsequently, four deep learning algorithms (CNN-LSTM-KAN, CNN-LSTM, LSTM-KAN, and LSTM) were developed to analyze the data and predict pollutant behavior. These algorithms were used to study the pollution distribution patterns of major cities in China over a five-year span and to assess the accuracy of each prediction. In addition, a correlation analysis was performed between the various features, identifying both the similarities and differences in pollution patterns across cities in different parts of China.
2. Methodology
2.1. Study Area
This study selected five cities in China for analysis: Shanghai, Shenzhen, Chengdu, Beijing, and Wuhan. The locations of these cities are shown in
Figure 1. The selection of these cities as the research focus was based on the following reasons and benefits:
Geographical coverage: These five cities are located in distinct regions of China: Shanghai in the east, Shenzhen in the south, Chengdu in the west, Beijing in the north, and Wuhan in the central part of the country. This diverse geographical distribution provided a broader perspective for air quality prediction, ensuring the model was not biased by regional characteristics and enhancing its generalizability across the country.
Diversity of economic and industrial activities: The five cities differ significantly in terms of their economic development and industrial structures. Shanghai and Shenzhen, as economic and industrial hubs, are characterized by high levels of traffic and industrial emissions; Beijing is influenced by its winter heating systems; Chengdu has a unique geographical and industrial landscape; and Wuhan represents the central region of China. Each city thus presents distinct pollution sources, allowing for a comprehensive investigation of the impact of various air pollution factors on air quality.
Climate diversity: The climate types in these cities vary widely: Shanghai and Shenzhen have humid climates, Beijing experiences a continental monsoon climate, Chengdu lies in a basin with its characteristic climate, and Wuhan has a subtropical monsoon climate. These climatic differences affect air quality in distinct ways, enabling the model to incorporate a wide range of climatic influences and enhancing its ability to adapt to varying weather conditions.
2.2. Data Collection
This study used daily data from the China Air Quality Online Monitoring and Analysis Platform, covering the five-year period from 30 September 2019 to 30 September 2024, for each of the five selected cities, yielding 1829 data points per city. The dataset included the average levels of six pollutants monitored at all air quality monitoring stations in each city: PM2.5, PM10, SO2, NO2, CO, and O3. Additionally, the average value of the Air Quality Index (AQI) was recorded for each monitoring station. According to the “Ambient Air Quality Standards” (GB 3095-2012), the concentration data for each pollutant were determined based on the hourly average values.
The AQI is a widely used indicator for assessing air quality in many countries. The primary benchmarks for AQI measurement include the standards set by the US Environmental Protection Agency (EPA), the European Union (EU), the World Health Organization (WHO), as well as national standards of China and other countries [
22]. Although these standards are largely similar in terms of calculation methods for the AQI and the 0–500 scale range, they differ in the average measurement periods for various pollutants and the allowable concentration thresholds over time, which results in variations between the standards. The main reasons behind these discrepancies lie in the differing objectives of the standards and the strategies adopted by policymakers. In general, compared to China’s national standards, the EPA and EU standards specify lower allowable concentrations of pollutants and lower AQI threshold values. While EPA standards are widely accepted globally, China maintains its own national AQI standards, which diverge in several aspects. China’s AQI formula is as follows:
$$\mathrm{IAQI}_p=\frac{\mathrm{IAQI}_{Hi}-\mathrm{IAQI}_{Lo}}{BP_{Hi}-BP_{Lo}}\left(C_p-BP_{Lo}\right)+\mathrm{IAQI}_{Lo}$$
where $C_p$ is the real-time concentration of pollutant $p$, $BP_{Hi}$ is the upper limit of the corresponding concentration interval, $BP_{Lo}$ is the lower limit of the corresponding concentration interval, $\mathrm{IAQI}_{Hi}$ is the upper limit of the corresponding sub-index, and $\mathrm{IAQI}_{Lo}$ is the lower limit of the corresponding sub-index. We calculated the $\mathrm{IAQI}_p$ of each pollutant and took the maximum value as the AQI:
$$\mathrm{AQI}=\max\left\{\mathrm{IAQI}_1,\mathrm{IAQI}_2,\ldots,\mathrm{IAQI}_n\right\}$$
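To make the piecewise-linear mapping concrete, here is a minimal Python sketch of the sub-index and AQI calculation. The breakpoint table is the standard Chinese 24 h PM2.5 one and is shown for that pollutant only; all function and variable names are illustrative.

```python
# Sketch of the IAQI/AQI calculation; breakpoints are
# (concentration, sub-index) pairs for 24 h PM2.5 in ug/m3.
PM25_BREAKPOINTS = [(0, 0), (35, 50), (75, 100), (115, 150),
                    (150, 200), (250, 300), (350, 400), (500, 500)]

def sub_index(c, breakpoints):
    """Linearly interpolate the IAQI for concentration c within its interval."""
    for (bp_lo, iaqi_lo), (bp_hi, iaqi_hi) in zip(breakpoints, breakpoints[1:]):
        if bp_lo <= c <= bp_hi:
            return (iaqi_hi - iaqi_lo) / (bp_hi - bp_lo) * (c - bp_lo) + iaqi_lo
    return 500.0  # concentrations above the top breakpoint are capped

def aqi(sub_indices):
    """The AQI is the maximum of the pollutant sub-indices (IAQI)."""
    return max(sub_indices)

print(aqi([sub_index(60.0, PM25_BREAKPOINTS)]))  # PM2.5 of 60 ug/m3 -> 81.25
```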
2.3. Data Preprocessing
Due to equipment malfunctions, extreme weather conditions, and other factors, there were a small number of missing values in the collected data. Since the quantity of missing data was minimal, and considering that the Air Quality Index (AQI) is a time-series dataset, linear interpolation was employed to fill in the missing values by interpolating between adjacent data points.
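As a minimal sketch (assuming the records are held in a pandas structure), the interpolation step looks like this on a toy daily series with a two-day gap:

```python
# Gap filling by linear interpolation between adjacent days; in the study
# this is applied to each pollutant column of the daily dataset.
import numpy as np
import pandas as pd

s = pd.Series([52.0, np.nan, np.nan, 61.0, 58.0],
              index=pd.date_range("2020-01-01", periods=5))
filled = s.interpolate(method="linear")  # 52, 55, 58, 61, 58
print(filled)
```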
The collected data were tested for normality using the Shapiro–Wilk test, which is a commonly used method to assess whether a dataset follows a normal distribution. The Shapiro–Wilk test is particularly suitable for small sample sizes (typically less than 2000 data points) and is widely used in statistics and data science to validate the assumption of normality. By examining the test statistic and p-value, we could determine whether the data significantly deviated from a normal distribution.
The Shapiro–Wilk test statistic measures the goodness-of-fit between the observed data and a normal distribution; a value closer to 1 indicates a better fit. The p-value tests the null hypothesis that the data follow a normal distribution, against the commonly used significance level of 0.05. If the p-value is greater than 0.05, we fail to reject the null hypothesis, meaning there is no evidence that the data deviate from normality. Conversely, if the p-value is less than or equal to 0.05, we reject the null hypothesis, indicating that the data do not follow a normal distribution. Based on the test results presented in
Table 1, it was found that the data did not follow a normal distribution.
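A minimal sketch of this check with scipy.stats.shapiro, run here on a synthetic right-skewed series of the same length as each city's record (1829 points):

```python
# Shapiro-Wilk normality check; in the study it is run per city on the AQI
# and each pollutant column. The synthetic data here are AQI-like stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
aqi = rng.lognormal(mean=4.0, sigma=0.4, size=1829)  # skewed, AQI-like values

w, p = stats.shapiro(aqi)
print(f"W={w:.4f}, p={p:.3g} ->", "normal" if p > 0.05 else "not normal")
```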
2.4. Gaussian Filtering
Due to various factors, the collected data exhibited high variance, which presented a challenge for machine learning models. One common type of noise found in the data was random or white noise, which is inherently unpredictable. To address this issue, several noise reduction techniques have been proposed, with one of the most well-known and straightforward methods being the application of frequency-domain filters. These filters process the frequency components of a signal, thereby providing valuable insights into the structure and behavior of the signal.
Gaussian filtering [
23] is a linear filtering technique commonly used for smoothing images, with the goal of reducing noise and fine details. It is widely applied in image processing and computer vision, where it smooths the image by calculating a weighted average of neighboring pixels. The weights for this average are determined by a Gaussian function (i.e., a normal distribution). This method effectively preserves the overall structure of the image while removing noise. In addition to its applications in image processing, Gaussian filtering also plays a significant role in time-series analysis, particularly for data smoothing, noise reduction, and handling non-stationary data.
While Gaussian filtering is traditionally used in image processing, its noise-reduction and smoothing capabilities are equally applicable to time-series data. In this study, a Gaussian filter with a variance of 2 was applied to process the AQI feature in the dataset. The comparison between the original AQI data and the Gaussian-filtered AQI is shown in
Figure 2. The filtering process can be expressed as the convolution of the series with a one-dimensional Gaussian kernel:
$$G(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right)$$
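For illustration, the smoothing can be reproduced with SciPy's one-dimensional Gaussian filter. The text specifies “a variance of 2”; the sketch assumes this maps to the filter's sigma parameter (if it is literally a variance, sigma would instead be √2):

```python
# Smoothing the AQI series with a 1-D Gaussian filter; the raw series here
# is a synthetic stand-in (sinusoid plus white noise).
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
aqi_raw = 80 + 10 * np.sin(np.linspace(0, 20, 1829)) + rng.normal(0, 8, 1829)
aqi_smooth = gaussian_filter1d(aqi_raw, sigma=2)  # assumed sigma = 2
```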
2.5. Min–Max Normalization
In data processing and machine learning, to enable better comparison, it is common practice to scale the original data by mapping the minimum value to −1 and the maximum value to 1, with other values proportionally scaled between these two extremes. This technique is referred to as feature scaling, or more specifically, Min–Max normalization.
Min–Max normalization transforms the original data features into a specific range, here between −1 and 1, based on the minimum and maximum values of the data. This scaling ensures that the features are proportionally adjusted within the desired range, facilitating more effective comparison and integration with machine learning models.
The formula for Min–Max normalization is as follows:
$$X_{\mathrm{norm}}=\frac{X-X_{\min}}{X_{\max}-X_{\min}}\times\left(\mathrm{max}-\mathrm{min}\right)+\mathrm{min}$$
where $X$ is the original feature value, $X_{\min}$ is the minimum value of the feature, $X_{\max}$ is the maximum value, and $\mathrm{min}=-1$ and $\mathrm{max}=1$ are the bounds of the target range.
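A minimal sketch of this scaling with scikit-learn's MinMaxScaler, including the inverse transform used later to map predictions back to AQI units:

```python
# Min-Max scaling to [-1, 1]; the fitted scaler is reused to invert
# model predictions back to the original AQI scale.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

aqi = np.array([35.0, 80.0, 150.0, 62.0]).reshape(-1, 1)  # toy AQI column
scaler = MinMaxScaler(feature_range=(-1, 1))
aqi_scaled = scaler.fit_transform(aqi)            # values now in [-1, 1]
aqi_back = scaler.inverse_transform(aqi_scaled)   # recovers the original values
```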
2.6. Evaluation Metrics
This study used four indicators to evaluate the prediction performance of the model, namely, the RMSE, $R^2$, MAE, and MAPE. Their formulas are as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}$$
$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}$$
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$
$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|$$
where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, $n$ is the sample size, and $\bar{y}$ is the mean of the samples. The primary distinction between the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE) stems from their error quantification methodologies and domain-specific applicability. The MAE computes the absolute deviation between predicted and observed values, expressed in the original data unit. It is robust to outliers, accommodates zero-valued observations, and is preferred in scenarios prioritizing absolute error magnitude (e.g., inventory management systems); however, its lack of scale invariance hinders direct comparisons across heterogeneous datasets. The MAPE, by contrast, quantifies relative error as a percentage of the observed value, providing scale-agnostic interpretability (e.g., a “5% deviation”). This facilitates cross-dimensional performance benchmarking (e.g., sales volume vs. user growth predictions) and aligns with business communication standards. Nevertheless, the MAPE suffers from two critical limitations: (i) mathematical indefiniteness when actual values equal zero and (ii) asymmetric error amplification for small-denominator observations, which systematically biases predictions towards underestimation. To establish a rigorous dual evaluation framework, this study adopted both the MAE and MAPE as complementary metrics; their combined use enabled a holistic assessment of model accuracy across absolute and relative error dimensions, mitigating individual metric biases and enhancing interpretative robustness.
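For reference, the four metrics can be written directly in NumPy, mirroring the formulas above:

```python
# Plain-NumPy versions of the four evaluation metrics used in this study.
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100  # undefined when y contains 0

y_true = np.array([80.0, 95.0, 110.0])
y_pred = np.array([82.0, 90.0, 115.0])
print(rmse(y_true, y_pred), r2(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```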
2.7. Research Methodology
2.7.1. Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a class of deep learning models originally designed for image and visual tasks that have since been widely adopted across various domains [
24]. CNNs utilize a layered architecture composed of convolutional layers and pooling layers, which work together to progressively extract hierarchical spatial features from input data. The key components of a CNN include the following layers:
Convolutional layer: This layer applies convolution operations using filters (or kernels) to the input data, extracting local features. Convolution essentially performs a local weighted summation, where each filter is designed to detect different features in the input data. This process enables the network to learn spatial hierarchies of features [
25].
Activation function: Nonlinear activation functions, such as ReLU (Rectified Linear Unit), are applied to the output of the convolutional layer to introduce nonlinearity into the model. This enhances the network’s capacity to model complex patterns and relationships in the data.
Pooling layer: The pooling layer, typically placed after convolutional layers, reduces the dimensionality of the data through downsampling, which helps decrease computational load and provides translation invariance for the features. Common pooling techniques include Max Pooling and Average Pooling, which retain the most important features while reducing the overall size of the data.
Fully connected layer: After the convolutional and pooling layers extract high-level features, these are usually passed to a fully connected layer for classification or regression tasks. The fully connected layer connects every node from the previous layer to all nodes in the current layer, enabling the model to learn more complex decision boundaries within the feature space.
In this study, the traditional fully connected layer was replaced with KANs (Kolmogorov–Arnold Networks), which offer enhanced capabilities for modeling complex, nonlinear relationships in the data.
2.7.2. Long Short-Term Memory
Long Short-Term Memory (LSTM) networks are a specialized type of Recurrent Neural Network (RNN) that are particularly effective at capturing long-term and short-term dependencies in time-series data [
26]. LSTM addresses the issues of vanishing and exploding gradients found in traditional RNNs, making it more suitable for handling long sequences of data.
The core of an LSTM unit is its gating mechanism, which controls the flow of information, allowing it to retain long-term dependencies while discarding irrelevant data. An LSTM unit consists of the following gates:
Forget gate: The forget gate decides which information should be “forgotten” or retained from the cell state. It takes the current input and the previous hidden state to compute the output, which represents the proportion of information to keep or discard. The equation for the forget gate is as follows:
$$f_t=\sigma\left(W_f\cdot\left[h_{t-1},x_t\right]+b_f\right)$$
where $f_t$ is the output of the forget gate, indicating the proportion to be retained or forgotten (value between 0 and 1); $W_f$ and $b_f$ are the weight and bias parameters of the forget gate; and $\sigma$ is the sigmoid function.
Input gate: The input gate controls which new information is allowed to be added to the cell state. It includes two components. The candidate cell state
$$\tilde{C}_t=\tanh\left(W_C\cdot\left[h_{t-1},x_t\right]+b_C\right)$$
generates new candidate information at the current moment, where $b_C$ is the bias term for the candidate cell state, used to adjust the generation of the candidate state and control the intensity of new information generation. The input gate
$$i_t=\sigma\left(W_i\cdot\left[h_{t-1},x_t\right]+b_i\right)$$
determines which parts of the candidate cell state will be added to the current cell state, where $b_i$ is the bias term for the input gate, used to adjust the activation level of the input gate and control how much new information is added.
Then, it updates the cell state:
$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t$$
where $C_t$ is the current cell state, which represents the important information of the sequence data up to the current moment. The output gate determines the hidden state $h_t$ at the current moment, which is used to output the result of the current time step and also serves as one of the inputs of the next time step. The calculation formulas are as follows:
$$o_t=\sigma\left(W_o\cdot\left[h_{t-1},x_t\right]+b_o\right)$$
$$h_t=o_t\odot\tanh\left(C_t\right)$$
where $o_t$ is the output of the output gate, which controls which part of the cell-state information is output; $h_t$ is the hidden state at the current moment, which is passed to the next time step as the output of the LSTM; and $b_o$ is the bias term for the output gate, used to adjust the activation level of the output gate and control how much information is output to the hidden state.
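To tie the gate equations together, the sketch below executes a single LSTM time step in plain NumPy with randomly initialized (untrained) parameters; shapes and names are illustrative.

```python
# One LSTM time step, mirroring the gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """W and b hold the parameters of the gates f, i, c (candidate), and o."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate
    i = sigmoid(W["i"] @ z + b["i"])   # input gate
    g = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c = f * c_prev + i * g             # cell-state update
    o = sigmoid(W["o"] @ z + b["o"])   # output gate
    h = o * np.tanh(c)                 # new hidden state
    return h, c

hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(hidden, hidden + inputs)) for k in "fico"}
b = {k: np.zeros(hidden) for k in "fico"}
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
```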
LSTM networks are particularly useful for time-series data, where they can learn long-term trends and seasonal variations, which is critical for tasks like AQI prediction. In this study, the LSTM network processed features extracted by the CNN and performed further time-series modeling. Stacking multiple layers of LSTM enhanced the model’s representational power, enabling it to capture more complex temporal patterns.
2.7.3. Kolmogorov–Arnold Networks
Kolmogorov–Arnold Networks (KANs) [
27] are a type of neural network based on the Kolmogorov–Arnold Superposition Theorem, designed to approximate complex high-dimensional functions by decomposing them into lower-dimensional mappings and their combinations. The theorem asserts that any continuous multivariate function can be represented as a finite superposition of continuous univariate functions. For any continuous function $f(x_1,\ldots,x_n)$, it can be expressed in terms of finitely many continuous functions $\Phi_q$ and $\phi_{q,p}$, that is:
$$f\left(x_1,\ldots,x_n\right)=\sum_{q=0}^{2n}\Phi_q\left(\sum_{p=1}^{n}\phi_{q,p}\left(x_p\right)\right)$$
In the KAN model, a network is constructed to approximate complex high-dimensional mappings using multiple learnable nonlinear transformations. Unlike a Multi-Layer Perceptron (MLP), which applies fixed nonlinear activation functions (such as ReLU, sigmoid, or tanh) at its nodes, a KAN places learnable univariate functions, typically parameterized as splines, on the edges of the network. In practice, the KAN architecture optimizes these function parameters to minimize the expected error, adjusting them to better approximate the target function.
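For intuition, the sketch below implements a deliberately simplified KAN-style layer in PyTorch: each edge applies its own learnable univariate function, here parameterized as a radial-basis expansion with trainable coefficients rather than the B-splines used in reference KAN implementations. It is an illustrative stand-in under those assumptions, not the authors' implementation.

```python
# Simplified KAN-style layer: one learnable univariate function per edge,
# built from a fixed Gaussian basis with trainable coefficients.
import torch
import torch.nn as nn

class SimpleKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.register_buffer("centers", torch.linspace(-1, 1, n_basis))
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x):  # x: (batch, in_dim)
        # Evaluate the basis functions on every input coordinate.
        phi = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.5)
        # Sum the per-edge univariate functions into each output unit.
        return torch.einsum("bif,oif->bo", phi, self.coef)

y = SimpleKANLayer(32, 1)(torch.randn(4, 32))  # -> shape (4, 1)
```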
In this study, a KAN was used as a module following the LSTM layers, replacing traditional fully connected layers. While conventional fully connected layers map features to output spaces through linear transformations, a KAN uses a more sophisticated structure to integrate features from different layers, thereby capturing the nonlinear relationships in the data more effectively. Specifically, a KAN enhances the model’s generalization ability, reduces overfitting, and better handles the nonlinear relationships inherent in time-series data. The introduction of the KAN module aimed to overcome the limitations of traditional LSTM models, boosting the model’s capacity to learn from complex temporal data. By incorporating the KAN module at the final stage of the model, the network could leverage higher-dimensional features, leading to more accurate AQI predictions. The hybrid model structure is illustrated in
Figure 3.
2.8. Model Construction and Training
In this study, the model combined a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM) network, and Kolmogorov–Arnold Network (KAN) to predict the Air Quality Index (AQI); the training and test sets were divided in a ratio of 8:2. The core structure of the model consisted of multiple layers designed to efficiently extract features from time-series data and capture both short-term and long-term temporal dependencies. The hybrid model’s training and prediction process is illustrated in
Figure 4. Below is a detailed description of the model architecture:
The first part of the model was the Convolutional Neural Network (CNN), which was used to extract local features from the AQI time-series data. The CNN was designed to adapt to the temporal nature of the data, focusing on capturing time-based patterns and trends within the daily AQI sequence. This was achieved using specialized 1D convolutional layers that detected temporal patterns and trends in the AQI data.
The input data first passed through a 1D convolutional layer with a kernel size of 3, one input channel, and 32 output channels [
28]. The convolutional layer applied a sliding-window operation over the input sequence to extract local temporal features. The result of these convolution operations was a feature map that contained the local temporal correlations within the AQI data. A ReLU activation function was then applied to introduce nonlinearity and enhance the model’s expressive capability. Following this, a Max Pooling layer was used to reduce the dimensionality of the feature map while retaining the essential temporal features. This helped reduce computational complexity and prevent overfitting. The extracted features were then passed to the LSTM layer.
The LSTM layer was responsible for capturing both short-term and long-term dependencies in the time-series data. In this model, the LSTM layer consisted of 32 hidden units and was configured with two layers. The LSTM utilized its memory cell (cell state) to process temporal relationships within the sequence, enabling it to capture complex dynamic changes in the AQI data over time. The LSTM was unidirectional, processing the sequence in the forward direction only, from past time steps toward future ones.
To prevent overfitting, a dropout layer was introduced with a dropout rate of 0.2. Dropout works by randomly deactivating a fraction of neurons, forcing the network to learn more robust feature representations and improving the model’s generalization ability. After the LSTM layer, a Kolmogorov–Arnold Network (KAN) was incorporated. The KAN applied learnable univariate functions to assess the significance of features and assign them different weights, thereby emphasizing critical moments in the AQI sequence. This allowed the model to focus more precisely on key time periods, enhancing prediction accuracy. The input to the KAN layer was the output from the LSTM layer, and the output of the KAN layer generated the final prediction.
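Putting the pieces together, here is a minimal PyTorch sketch of the stack described above, using the stated hyperparameters (kernel size 3, 32 channels, two LSTM layers with 32 hidden units, dropout 0.2) and reusing the SimpleKANLayer sketch from Section 2.7.3 as a stand-in head. The authors' exact KAN head may differ, and the dropout placement (here between the stacked LSTM layers) is one plausible reading of the text.

```python
# Sketch of the CNN-LSTM-KAN stack; SimpleKANLayer is the illustrative
# stand-in defined in the Section 2.7.3 snippet.
import torch
import torch.nn as nn

class CNNLSTMKAN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=3, padding=1),  # local temporal features
            nn.ReLU(),
            nn.MaxPool1d(2),                             # halve the time axis
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=32,
                            num_layers=2, batch_first=True, dropout=0.2)
        self.head = SimpleKANLayer(32, 1)  # KAN in place of a dense layer

    def forward(self, x):                   # x: (batch, lookback, 1)
        z = self.conv(x.transpose(1, 2))    # -> (batch, 32, lookback // 2)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])        # predict from the last time step

model = CNNLSTMKAN()
print(model(torch.randn(8, 20, 1)).shape)   # torch.Size([8, 1])
```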
During training, the model used the Mean Squared Error (MSE) as the loss function, and the Adam optimizer was used to update the model parameters. The learning rate (lr) for the Adam optimizer was set to 0.01, which helped accelerate the convergence process. The model’s parameters were adjusted via backpropagation at each training epoch to minimize the loss function.
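A corresponding training-loop sketch under the stated settings (MSE loss, Adam with lr = 0.01; full-batch updates are assumed here for simplicity), reusing the model above; X_train and y_train are the window tensors built in the snippet that follows the input description below.

```python
# Training loop sketch: MSE loss, Adam at lr = 0.01, 200 epochs.
import torch

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    pred = model(X_train)            # forward pass over all training windows
    loss = criterion(pred, y_train)  # mean-squared error
    loss.backward()                  # backpropagation
    optimizer.step()                 # Adam parameter update
```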
The model was trained over 200 epochs, with the training loss recorded at each epoch to optimize the parameters. After training, the model’s performance was validated using a test set. The evaluation metrics included the Root-Mean-Square Error (RMSE), R2, Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). These metrics assessed the model’s prediction accuracy, explanatory power, and robustness across different datasets.
The model’s input consisted of preprocessed AQI time-series data. During preprocessing, the data were first denoised using Gaussian filtering and then normalized using MinMaxScaler to scale them to the range of [−1, 1]. The training and test sets were split in an 80:20 ratio, and the data were divided into time windows: each window contained the past 20 days of data (lookback = 20), which were used to predict the next AQI value. The output was the predicted AQI, which was then reverse-normalized to the original scale for comparison with the actual data.
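A sketch of the window construction and the 80:20 split, using a synthetic stand-in for the scaled AQI series; make_windows is an illustrative helper, not the authors' code.

```python
# Sliding windows: each sample is the previous 20 days of scaled AQI and the
# target is the following day's value; the first 80% of windows train the model.
import numpy as np
import torch

def make_windows(series, lookback=20):
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return (torch.tensor(X, dtype=torch.float32).unsqueeze(-1),
            torch.tensor(y, dtype=torch.float32).unsqueeze(-1))

series = np.random.default_rng(1).uniform(-1, 1, size=1829)  # stand-in for scaled AQI
X, y = make_windows(series)
split = int(0.8 * len(X))
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]
```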
After model training was complete, several evaluation metrics were used to test its performance. First, the RMSE was calculated for both the training and test sets to assess prediction accuracy. A lower RMSE value indicated more accurate predictions. Next, R2 was used to evaluate the model’s ability to explain the variance in the data. A higher R2 value signified better explanatory power. Additional metrics such as MAE and MAPE further validated the model’s robustness across different datasets.
In summary, this model combined a CNN for local feature extraction, LSTM for capturing temporal dependencies, and a KAN for enhancing key feature learning. This hybrid approach enabled the model to effectively predict the AQI and served as a high-precision tool for air quality monitoring.
2.9. Correlation Analysis
To identify the similarities and differences in pollution patterns across various cities in China and to explore the deeper relationships between the Air Quality Index (AQI) and other air pollutants, we calculated the correlations between features, particularly focusing on the correlation between the AQI and the other pollutants. This was accomplished using a correlation matrix, which helped to identify the most significant features for AQI prediction.
The Pearson correlation coefficient test is a statistical method used to assess the strength and direction of the linear relationship between two variables. It quantifies the correlation using the Pearson correlation coefficient, which ranges from −1 to 1. This method is widely employed in scientific research, social sciences, and data analysis to determine whether significant correlations exist between variables. The correlation between the AQI and the other six features is illustrated in the figure.
The formula is as follows:
$$r=\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}}\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^{2}}}$$
where $x_i$ and $y_i$ represent the values of the two variables, and $\bar{x}$ and $\bar{y}$ represent their means. The range of $r$ is $[-1,1]$.
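In practice, this analysis reduces to pandas' corr(), which computes the Pearson coefficient by default; the columns below are synthetic stand-ins for the real monitoring data.

```python
# Feature-correlation sketch: Pearson coefficients between the AQI and the
# pollutant columns, ranked by strength of association.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
pm25 = rng.uniform(5, 150, 1829)
df = pd.DataFrame({"PM2.5": pm25,
                   "PM10": pm25 * 1.4 + rng.normal(0, 10, 1829),
                   "AQI":  pm25 * 1.1 + rng.normal(0, 15, 1829)})
print(df.corr()["AQI"].sort_values(ascending=False))
```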
4. Conclusions
Air quality prediction plays an integral role in monitoring and predicting pollution levels, contributing significantly to urban sustainability. Given the growing concerns regarding air pollution, accurate prediction of air quality is essential for timely decision-making and urban management. This study addressed the critical challenge of air quality prediction in heterogeneous environments through the development of a CNN-LSTM-KAN hybrid deep learning framework, yielding three key theoretical contributions:
1. Methodological innovation in data governance: The non-Gaussian distribution characteristics of air quality data, validated by Shapiro–Wilk testing (p < 0.05), informed the development of an integrated preprocessing framework combining Gaussian filtering with Min–Max normalization. This approach significantly enhanced model convergence speed and noise suppression efficacy.
2. Architectural breakthrough in model design: The novel integration of Kolmogorov–Arnold Networks (KANs) with attention mechanisms, replacing conventional fully connected layers, established a dynamic feature weighting system that demonstrated superior complex pattern recognition capabilities. Comparative experiments revealed the model’s RMSE reductions of 44.7–59.6% against the baseline LSTM in representative cities (Beijing and Shenzhen), with R2 values reaching 0.92–0.99.
3. Environmental heterogeneity decoding: Cross-city validation identified PM2.5/PM10 as universal dominant predictors (correlation coefficients: 0.76–0.89), while simultaneously detecting ozone sensitivity in coastal cities like Shenzhen, providing theoretical foundations for region-specific pollution control strategies.
Practical implementation value: The proposed framework successfully overcame the generalization limitations of conventional models in spatiotemporally heterogeneous environments. The synergistic combination of multi-scale feature extraction (CNN-LSTM modules) and dynamic feature reconstruction (KAN modules) established an extensible technical foundation for smart-city air quality warning systems. The model’s differentiated performance in high-variability (e.g., Shenzhen) and high-pollution scenarios (e.g., Beijing) confirmed its strong adaptability to complex urban ecosystems. While significant progress has been achieved, several critical directions warrant further investigation: (1) multimodal data fusion: integration of spatiotemporal covariates, including meteorological parameters and traffic patterns, to develop coupled natural-social system prediction paradigms; and (2) computational efficiency optimization: creation of lightweight Transformer-based architectures enhanced with federated learning for multi-city collaborative prediction.