Article

A Symmetry-Driven Hybrid Framework Integrating ITTAO and sLSTM-Attention for Air Quality Prediction

1 College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang 550025, China
2 College of Environmental Science and Engineering, China West Normal University, Nanchong 637002, China
3 School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
4 School of Humanities, Guizhou University of Finance and Economics, Guiyang 550025, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(8), 1369; https://doi.org/10.3390/sym17081369
Submission received: 23 July 2025 / Revised: 19 August 2025 / Accepted: 20 August 2025 / Published: 21 August 2025
(This article belongs to the Section Computer)

Abstract

Air pollution poses a threat to public health, ecosystem stability, and sustainable development. Accurate air quality prediction is essential for environmental protection and achieving sustainability. This study proposes a symmetry-driven hybrid framework that integrates an Improved Triangulation Topology Aggregation Optimizer (ITTAO) with a Stable Long Short-Term Memory (sLSTM) network and an attention mechanism to achieve high-precision air quality prediction. Three enhancement strategies are introduced to improve the optimization capability of the TTAO algorithm. Experiments with CEC2017 standard functions validate the ITTAO algorithm’s superior convergence and global search ability. ITTAO then optimizes the hyperparameters of the sLSTM-Attention model, resulting in the ITTAO-sLSTM-Attention model. Four air quality datasets from diverse regions in China verify the model’s performance, demonstrating that the proposed model outperforms seven swarm intelligence-optimized sLSTM-Attention models and six machine learning models. Compared to the LSTM model, ITTAO-sLSTM-Attention reduces RMSE by 23.47%, 13.23%, 19.69%, and 26.46% across four cities, confirming its enhanced accuracy and generalization. Finally, an interactive air quality prediction system based on the ITTAO-sLSTM-Attention model and PyQt is developed, offering a user-friendly tool for air quality prediction.

1. Introduction

Air pollution is a major and pervasive issue worldwide. It profoundly impacts livelihoods and economic development while significantly hindering the sustainable development of human society [1]. Atmospheric pollutants severely impact human health, causing conditions such as respiratory and cardiovascular diseases, and may even result in premature death [2]. Accurate air quality predictions, which reflect air conditions in real time, provide crucial decision support for relevant departments, promote environmental protection efforts, support sustainable management practices, and aid in long-term policy-making for sustainable development [3]. Increasingly, scholars both domestically and internationally are dedicated to studying air quality indicators and continuously optimizing air quality prediction models.
In the field of air pollution concentration prediction, existing research mainly focuses on two directions: statistical methods and data-driven modeling [4]. Statistical methods are grounded in statistical principles and aim to analyze and infer the characteristics and trends in data. Commonly used models include autoregressive integrated moving-average models and state-space models. These methods offer strong interpretability and statistical rigor; however, their modeling capabilities are limited when faced with nonlinear relationships and high-dimensional complex data [5,6,7]. Unlike statistical methods, data-driven modeling approaches can effectively mine complex relationships and trends in past pollutant concentration levels, enabling accurate predictions of future pollutant levels from past observations and present conditions. Data-driven approaches mainly involve machine learning and deep learning methods [8]. Several studies have proposed machine learning-based models to predict air quality and pollutant concentrations, including support vector regression (SVR), Random Forest Regression (RFR), decision trees, artificial neural networks (ANNs), and XGBoost. These models typically combine meteorological, environmental, and pollutant data to forecast air quality indices such as PM2.5 and the AQI [9,10,11,12,13,14]. However, these traditional machine learning models are only suitable for scenarios with simple structures and sufficient data, and their prediction accuracy becomes limited when dealing with sudden events or complex spatiotemporal relationships [15]. Deep learning advancements, particularly Long Short-Term Memory (LSTM) networks, have become pivotal in air quality prediction due to their capacity to model temporal dependencies and nonlinear dynamics. Studies show that LSTMs surpass traditional models such as SVR, RFR, and XGBoost [16,17]. Although LSTM performs strongly in various applications and overcomes the gradient vanishing and explosion problems of traditional recurrent neural networks, it still has inherent limitations: it may suffer from gradient decay when handling very long time series, and it struggles to capture long-term dependencies during training, which in turn affects its performance and stability.
To address the aforementioned issues with the LSTM model, researchers have mainly adopted two improvement strategies. The first strategy involves integrating additional mechanisms with the LSTM model to enhance its performance. For instance, CNN and attention mechanisms have been combined with LSTM to improve AQI prediction accuracy [18], and CNN+LSTM hybrids have been introduced to forecast air pollutant concentrations in urban areas [19]. Moreover, a hybrid model combining CNN, LSTM, and time-series decomposition via empirical mode decomposition (EMD) has been proposed to improve prediction accuracy, particularly in the Lanzhou dataset [20]. The second approach focuses on modifying the original LSTM architecture to enhance its performance. Examples include the use of Bidirectional LSTM (BiLSTM) for hourly PM2.5 forecasting, which processes information in both directions, capturing more comprehensive time-series features [21], and stacked LSTM models that improve accuracy by capturing complex dependencies across different time scales [22]. Additionally, the sLSTM model, which introduces an exponential activation function and stabilization factors, has been shown to outperform traditional LSTM models in prediction accuracy and stability [23]. While both enhancement methods have contributed to some improvement in the LSTM model’s performance, its overall effectiveness is still largely dependent on hyperparameters.
Numerous studies have shown that optimizing the hyperparameters of air quality prediction models using swarm intelligence algorithms is an effective way to improve predictive performance. Various algorithms, such as BWO [24], APSO [25], SSA [26], ICGO [27], and BSMO [28], have been applied to optimize key hyperparameters in different models, including Informer, CNN-Bi-LSTM, LSTM, and BiLSTM. These optimizations have significantly improved prediction accuracy in terms of metrics like AQI, mean absolute error (MAE), and root mean square error (RMSE) for datasets in cities such as Beijing, Wuhan, and Chengdu. However, it is important to note that the “no free lunch theorem” in swarm intelligence algorithms suggests that no single algorithm is universally optimal for all types of problems. Thus, selecting the appropriate swarm intelligence algorithm depending on the specific application scenario is crucial.
Although there has been a considerable amount of research combining swarm intelligence optimization algorithms with time-series prediction models for air quality forecasting, most studies are still limited to pairing traditional optimization algorithms, such as PSO and Genetic Algorithms, with conventional prediction models, such as SVR, XGBoost, LSTM, and GRU. These methods still face limitations in prediction accuracy. To address these limitations, this paper proposes a symmetry-driven hybrid framework and applies it to air quality prediction. The framework integrates the ITTAO, the sLSTM model, and an attention mechanism: the sLSTM component provides temporal symmetry-aware modeling capabilities, while the attention mechanism enables global balance and symmetry-guided feature extraction. By introducing the structurally symmetric ITTAO optimization algorithm to further optimize the model's hyperparameters, prediction accuracy is improved.
Different from existing studies, the ITTAO algorithm proposed in this paper is an improvement of the Triangulation Topology Aggregation Optimizer (TTAO) algorithm, enhancing its global optimization ability. The sLSTM, as a relatively new deep learning model, has not been widely applied in air quality prediction. This paper bridges this gap by combining sLSTM with the attention mechanism for air quality forecasting. Furthermore, most current air quality prediction research focuses on theoretical model development while neglecting the practical application of these models. This paper not only focuses on theoretical model innovation but also combines it with practical application, proposing a more practically valuable prediction system, thus bridging the gap between theory and practice. The key contributions are as follows:
  • In order to enhance the optimization ability of the TTAO, this paper introduces Random Walk Strategy Using Lévy Flight, Differential Evolution Strategy, and Dimensional Pinhole Imaging Inverse Learning Strategy at different stages of the algorithm. Extensive experiments on the CEC2017 standard test function set demonstrate that the ITTAO algorithm exhibits superior convergence accuracy and global search capability compared to other classic optimization algorithms.
  • To improve the accuracy of the sLSTM model in sustainable air quality prediction, this paper incorporates the attention mechanism into the sLSTM model, resulting in the creation of the sLSTM-Attention hybrid model. The learning rate and dropout rate of the model are optimized using the ITTAO algorithm, leading to the development of the ITTAO-sLSTM-Attention model. This model is then compared with various swarm intelligence-optimized sLSTM-Attention models and other classic machine learning models across four cities. The results indicate that the ITTAO-sLSTM-Attention model exhibits superior predictive accuracy and generalization performance.
  • To bridge the gap between theory and practice for the ITTAO-sLSTM-Attention model, this paper presents an interactive prediction system developed using PyQt. The system features an intuitive user interface and integrates functional modules such as model training, hyperparameter settings, data prediction, and result visualization. It aims to provide users with a more convenient and efficient tool for sustainable air quality prediction.

2. Methodology

2.1. Triangulation Topology Aggregation Optimizer

The TTAO algorithm is a novel optimization method inspired by the theory of similar triangles in plane geometry [29]. It is designed by constructing dynamically evolving similar triangle topology units that embody inherent structural symmetry and by aggregating high-quality vertex information. Through this design, TTAO is able to effectively solve optimization problems and obtain near-optimal solutions.
(1) Initialization
In the initialization stage of the TTAO algorithm, given the population size N and the variable dimension D, each vertex of the triangular topology is treated as a search agent, and the total population N is divided into N/3 triangular topological units (TTUs). In this phase, the N/3 first-type individuals are randomly generated within the feasible region; the expression for generating each individual is

$$ x_{i,1}^{j}(t) = r\left(U_{i,1}^{j} - L_{i,1}^{j}\right) + L_{i,1}^{j}, \quad j = 1, 2, 3, \ldots, D, \tag{1} $$

where $x_{i,1}^{j}$ denotes the position of the first-type vertex agent in the i-th TTU, with i an integer between 1 and N/3 and j indexing the j-th dimension; t is the current generation number, r is a random number in the range [0, 1], and $U_{i,1}^{j}$ and $L_{i,1}^{j}$ are the upper and lower bounds of the j-th dimension's variable.
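To make this initialization concrete, the following minimal NumPy sketch implements Equation (1); the function and variable names are illustrative rather than taken from the authors' code.

```python
# Minimal sketch of the TTAO initialization in Equation (1); names are
# illustrative, not from the paper's code.
import numpy as np

def init_first_vertices(pop_size, dim, lower, upper):
    """Randomly place the N/3 first-type vertices inside the feasible region."""
    n_units = pop_size // 3              # one first vertex per TTU
    r = np.random.rand(n_units, dim)     # r ~ U[0, 1], drawn per dimension
    return r * (upper - lower) + lower   # x = r(U - L) + L, Equation (1)

# example: 30 agents in 10 dimensions bounded by [-100, 100]
x1 = init_first_vertices(30, 10, np.full(10, -100.0), np.full(10, 100.0))
```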
(2) Formation of Triangular Topological Units
The creation of TTUs involves transforming between polar coordinates and the Cartesian coordinate system. Starting from the first vertex, a new direction vector of length $l \cdot f(\theta(t))$ is defined in polar coordinates and then converted to Cartesian coordinates using trigonometric functions to define the second vertex. This direction vector is then rotated counterclockwise by $\pi/3$ and converted again to Cartesian coordinates to define the third vertex. As shown in Equations (2) and (3), the mathematical expressions for these vertices are given by

$$ x_{i,2}^{j}(t) = x_{i,1}^{j}(t) + l \cdot f\!\left(\theta_{i,1}^{j}(t)\right), \tag{2} $$

$$ x_{i,3}^{j}(t) = x_{i,1}^{j}(t) + l \cdot f\!\left(\theta_{i,1}^{j}(t) + \frac{\pi}{3}\right), \tag{3} $$

where $l = 9 \cdot e^{-t/T}$ denotes the size of the TTU, with T being the maximum number of iterations. The direction vectors of the other two edges, $f(\theta_{i,1}^{j}(t))$ and $f(\theta_{i,1}^{j}(t) + \pi/3)$, are derived from the first-type vertex in the j-th dimension of the i-th TTU, where $\theta_{i,1}^{j}(t)$ is a random number within the range $[0, \pi]$.
Each TTU is aggregated into a fourth vertex, generated through a linear weighting of the three types of vertices. The fourth vertex is defined as
$$ x_{i,4}^{j}(t) = r_1\, x_{i,1}^{j}(t) + r_2\, x_{i,2}^{j}(t) + r_3\, x_{i,3}^{j}(t), \tag{4} $$

where $r_1$, $r_2$, and $r_3$ are random numbers in the range [0, 1], subject to the constraint $r_1 + r_2 + r_3 = 1$. As a result, a fourth-type search agent is placed within each TTU.
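The construction of Equations (2)–(4) can be sketched as follows; since the paper does not spell out the directional function f, taking it as the per-dimension cosine of the random angle is an assumption here, as is the Dirichlet draw used to obtain weights that sum to one.

```python
# Sketch of the TTU construction in Equations (2)-(4). Taking f as the
# per-dimension cosine of the random angle is an assumption; the weights
# r1, r2, r3 are drawn from a Dirichlet distribution so that they sum to 1.
import numpy as np

def build_ttu(x1, t, T):
    """Derive the second, third, and aggregated fourth vertices of each TTU."""
    n_units, dim = x1.shape
    l = 9.0 * np.exp(-t / T)                               # shrinking unit size
    theta = np.random.uniform(0.0, np.pi, (n_units, dim))  # theta ~ U[0, pi]
    x2 = x1 + l * np.cos(theta)                            # Equation (2)
    x3 = x1 + l * np.cos(theta + np.pi / 3)                # Equation (3)
    w = np.random.dirichlet(np.ones(3), size=n_units)      # r1 + r2 + r3 = 1
    x4 = w[:, [0]] * x1 + w[:, [1]] * x2 + w[:, [2]] * x3  # Equation (4)
    return x2, x3, x4
```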
(3) Generic Aggregation
Generic aggregation emphasizes the exploration stage, during which information on excellent agents in different TTUs is collected and new feasible solutions are created. Information exchange occurs between the best agent in each TTU and the best agent of a randomly selected unit, a mechanism inspired by the gene-hybridization information exchange of genetic algorithms. A linear combination with different weights is applied to the dimension variables of the two agents, creating a new agent along the connection between the two better vertices, as shown in Equation (5):

$$ X_{i,new1}^{j}(t+1) = r_4\, X_{i,best}^{j}(t) + (1 - r_4)\, X_{rand,best}^{j}(t), \tag{5} $$

where $r_4$ is a random number in the range [0, 1], and $X_{i,best}$ and $X_{rand,best}$ denote the best agent of the i-th TTU and the best agent of a randomly selected triangular unit at the t-th iteration, respectively. The fitness value of $X_{i,new1}$ is then compared with those of the optimal and suboptimal search agents of the unit, and both agents are updated accordingly. Considering a minimization problem, the updated optimal and suboptimal agents at iteration t + 1 are given by Equation (6):

$$ \begin{cases} X_{i,best}(t+1) = X_{i,new1}(t), & \text{if } f(X_{i,new1}(t)) < f(X_{i,best}(t)) \\ X_{i,sbest}(t+1) = X_{i,new1}(t), & \text{if } f(X_{i,new1}(t)) < f(X_{i,sbest}(t)), \end{cases} \tag{6} $$

where $X_{i,sbest}(t)$ denotes the suboptimal agent of the i-th TTU at iteration t, and $f(\cdot)$ is the objective function of the given problem.
(4) Local Aggregation
During the local aggregation phase, each TTU forms a motion-vector difference from its optimal and suboptimal vertices, which perturbs the optimal agent's position in both direction and step size within a local area. Each TTU is therefore re-searched within a specific local region. To implement this exploitation step for each TTU, the new agent is computed as

$$ X_{i,new2}^{j}(t+1) = X_{i,best}^{j}(t+1) + \alpha \left( X_{i,best}^{j}(t+1) - X_{i,sbest}^{j}(t+1) \right), \tag{7} $$

where $\alpha$ is the adjustment parameter for the aggregation range, calculated as

$$ \alpha = \ln\!\left( \frac{e - e^{3}}{T - 1}\, t + e^{3} - \frac{e - e^{3}}{T - 1} \right). \tag{8} $$

To guide convergence in a better direction, the fitness values of the agents are compared before and after local exploration. As shown in Equation (9), if the new agent performs better than the original, its position is updated; otherwise, no update occurs:

$$ X_{i,best}(t+1) = \begin{cases} X_{i,new2}(t+1), & \text{if } f(X_{i,new2}(t+1)) < f(X_{i,best}(t+1)) \\ X_{i,best}(t+1), & \text{otherwise}. \end{cases} \tag{9} $$

2.2. Improved Triangulation Topology Aggregation Optimizer

2.2.1. Random Walk Strategy Using Lévy Flight

In the initialization phase of the TTAO algorithm, a new agent $x_4$ is generated for each TTU through a linear weighted combination of its three vertex agents $x_1$, $x_2$, and $x_3$. The search direction of this agent depends entirely on the geometric relationships of the current triangular structure. However, this generation mechanism has a narrow search direction: the new solution is always confined within the spatial range of the three known points and lacks effective exploration of unknown regions. This limitation often leads to a fixed search path, causing the algorithm to fall into local regions and undergo repetitive iterations. To overcome this directional constraint, a Lévy flight strategy is introduced after the generation of $x_4$. Lévy flight, based on a heavy-tailed distribution, allows long-distance jumps that break the search constraints of a fixed direction, thereby expanding the solution distribution range and enhancing the global exploration capability of the population [30]. More importantly, the Lévy flight is applied to the $x_4$ generated from the local structure, ensuring that the jump starts from a high-quality point and avoiding the ineffective searches caused by blind random jumps. The update formula is

$$ x_{i,new4}^{j}(t) = x_{i,4}^{j}(t) + Levy \times e^{-\lambda t / T}, \tag{10} $$

where $e^{-\lambda t / T}$ is the control factor, $\lambda$ controls the decay rate, and $Levy$ represents the random walk step generated by the Lévy distribution, calculated as

$$ Levy = \frac{\mu}{|\nu|^{1/\beta}}, \tag{11} $$

where $\mu \sim N(0, \sigma^{2})$, $\nu \sim N(0, 1)$, and

$$ \sigma = \left[ \frac{\Gamma(1+\beta)\, \sin\!\left(\frac{\pi \beta}{2}\right)}{\beta\, \Gamma\!\left(\frac{1+\beta}{2}\right) 2^{\frac{\beta - 1}{2}}} \right]^{\frac{1}{\beta}}, \tag{12} $$

in which $\Gamma$ denotes the Gamma function, and $\beta = 1.5$ is the default constant.
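A sketch of the Lévy perturbation in Equations (10)–(12) follows; the Mantegna-style sampling matches the stated distributions of μ and ν, while the decay rate λ is left as a free parameter since the text does not fix its value.

```python
# Sketch of the Levy perturbation, Equations (10)-(12), using Mantegna-style
# sampling (mu ~ N(0, sigma^2), nu ~ N(0, 1)); the decay rate lam (lambda)
# is a free parameter whose value the text does not fix.
import numpy as np
from math import gamma, sin, pi

def levy_step(shape, beta=1.5):
    """Draw Levy-distributed steps, Equations (11)-(12)."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (beta * gamma((1 + beta) / 2) * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    mu = np.random.normal(0.0, sigma, shape)
    nu = np.random.normal(0.0, 1.0, shape)
    return mu / np.abs(nu) ** (1 / beta)

def levy_perturb(x4, t, T, lam=1.0):
    """Perturb the aggregated fourth vertex x4, Equation (10)."""
    return x4 + levy_step(x4.shape) * np.exp(-lam * t / T)
```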

2.2.2. Differential Evolution Strategy

In the generic aggregation stage, the TTAO algorithm generates new solutions by linearly combining two agents with different weights. Although this achieves information interaction between agents through linear combinations across dimensions, the new solutions are restricted to the linear space between the two agents: nonlinear regions of the solution space are difficult to explore, and the interaction lacks more complex mechanisms. This makes it hard to fully exploit the information in the TTUs, leaving the algorithm with insufficient local search ability. To enhance local search, a differential evolution strategy is introduced into generic aggregation. This strategy is an evolutionary method based on population differences; it conducts a fine-grained search through mutation and crossover operations [31]. The mutation is expressed as

$$ v_i(t+1) = x_{i,best}(t) + F \cdot \left( x_{rand1,best}(t) - x_{rand2,best}(t) \right), \tag{13} $$

where F is a random scaling factor generated in the range [0.5, 1], and $x_{rand1,best}(t)$ and $x_{rand2,best}(t)$ are the best agents of two randomly chosen TTUs at iteration t. The crossover operation is expressed as

$$ u_i^{j}(t+1) = \begin{cases} v_i^{j}(t+1), & \text{if } \mathrm{rand}(j) \le CR \text{ or } j = j_{rand} \\ x_{i,best}^{j}(t+1), & \text{otherwise}, \end{cases} \tag{14} $$

where $\mathrm{rand}(j) \in (0, 1)$ generates an independent random number for each dimension, $CR$ is the crossover probability, and $j_{rand} \in \{1, 2, \ldots, D\}$ is a random integer ensuring that at least one dimension comes from the mutation vector $v_i(t+1)$. Using the greedy criterion, the current agent $x_{i,new1}(t+1)$ is compared with the new agent $u_i(t+1)$ generated by mutation and crossover, and the agent with the lower fitness value is selected according to Equation (15):

$$ x_{i,new1}(t+1) = \begin{cases} u_i(t+1), & \text{if } f(u_i(t+1)) < f(x_{i,new1}(t+1)) \\ x_{i,new1}(t+1), & \text{otherwise}. \end{cases} \tag{15} $$
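The mutation, crossover, and greedy selection of Equations (13)–(15) can be sketched as below; the crossover probability CR = 0.9 is an assumed value, as the paper does not report it.

```python
# Sketch of the differential evolution refinement, Equations (13)-(15).
# x_new1 holds the agents from Equation (5); x_best holds each unit's best
# agent; f is the objective function. CR = 0.9 is an assumed value.
import numpy as np

def de_refine(x_new1, x_best, f, CR=0.9):
    n_units, dim = x_best.shape
    out = x_new1.copy()
    for i in range(n_units):
        r1, r2 = np.random.choice(n_units, 2, replace=False)
        F = np.random.uniform(0.5, 1.0)                # random scaling factor
        v = x_best[i] + F * (x_best[r1] - x_best[r2])  # mutation, Equation (13)
        j_rand = np.random.randint(dim)
        mask = np.random.rand(dim) <= CR
        mask[j_rand] = True                            # keep >= 1 mutant dimension
        u = np.where(mask, v, x_new1[i])               # crossover, Equation (14)
        if f(u) < f(x_new1[i]):                        # greedy selection, Equation (15)
            out[i] = u
    return out
```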

2.2.3. Dimensional Pinhole Imaging Inverse Learning Strategy

To address the problem that the TTAO algorithm becomes stuck in local extrema, with the optimal agent of a TTU remaining unchanged over many iterations, this paper introduces a dimensional pinhole imaging inverse learning strategy [32]. This strategy integrates the traditional opposition-based learning method with the pinhole imaging principle from optics. By performing an imaging-based inverse mapping of the current optimal individual in each dimension, it effectively reduces the coupling interference between dimensions, thereby enhancing the local search ability and the ability to escape local optima. Specifically, the method takes the current optimal solution as a reference in each dimension, expands the search space by constructing its "imaging inverse solution", and introduces an adjustable control parameter n to tune the mapping magnification, thereby achieving flexible control of the search stride. The inverse mapping process can be expressed as

$$ x_{i,inv}^{j}(t) = \frac{U_i^{j} + L_i^{j}}{2} + \frac{\dfrac{U_i^{j} + L_i^{j}}{2} - x_{i,best}^{j}(t)}{2n}, \tag{16} $$

$$ x_{i,best}(t) = \begin{cases} x_{i,inv}(t), & \text{if } f(x_{i,inv}(t)) < f(x_{i,best}(t)) \\ x_{i,best}(t), & \text{otherwise}, \end{cases} \tag{17} $$

where $x_{i,inv}(t)$ represents the position of the imaging inverse solution of the optimal agent in the i-th TTU at the t-th iteration. The pseudo-code of ITTAO is summarized in Algorithm 1.
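In code, and under the reconstruction of Equation (16) above, the strategy reduces to a few lines; the magnification n = 2 below is an illustrative default, not a value reported in the paper.

```python
# Sketch of the pinhole-imaging inverse solution, Equations (16)-(17).
# The magnification parameter n = 2 is an assumed default.
import numpy as np

def pinhole_inverse(x_best, lower, upper, f, n=2.0):
    """Inverse-map one unit's best agent and keep the better of the two."""
    mid = (upper + lower) / 2.0
    x_inv = mid + (mid - x_best) / (2.0 * n)          # Equation (16)
    return x_inv if f(x_inv) < f(x_best) else x_best  # greedy test, Equation (17)
```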

2.3. sLSTM-Attention Model

The sLSTM-Attention model proposed in this study is a hybrid framework that integrates an sLSTM network with a multi-head attention mechanism. The sLSTM component is designed to capture dynamic variations and long-term dependencies in air quality time-series data [33], demonstrating a temporal symmetry-aware modeling capability. Meanwhile, the multi-head attention mechanism allocates weights to different time steps to selectively focus on informative historical inputs [34]. This mechanism enables symmetry-aware interactions across attention heads and introduces a balanced temporal representation, thereby reflecting both structural and temporal symmetry in the feature extraction process. By combining the temporal symmetry-aware modeling capability of sLSTM with the globally symmetric design of the attention mechanism, the overall architecture is structured to support stable and effective air quality forecasting. The algorithmic flow of the sLSTM-Attention model is shown in Figure 1.
Algorithm 1 Improved triangulation topology aggregation optimizer
Require: Population size N, maximum number of iterations T, dimension of variables D.
Ensure: The optimal position $X_{best}$ and its fitness value $f_{best}$.
 Initialize each TTU point $X_1$.
 while $t < T$ do
   Update the TTU points $X_2$, $X_3$, and $X_4$ according to Equations (2), (3), and (10), and perform boundary checks.
   for $i = 1$ to $N/3$ do
     Calculate the fitness value of each agent, sort the vertices in each TTU by fitness, and record $X_{i,best}(t)$ and $X_{i,sbest}(t)$.
   end for
   for $i = 1$ to $N/3$ do
     Update the agent position using Equation (15), and update $X_{i,best}$ and $X_{i,sbest}$ using Equation (6).
   end for
   for $i = 1$ to $N/3$ do
     Update the agent position using Equation (7), and update $X_{i,best}$ using Equation (9).
   end for
   for $i = 1$ to $N/3$ do
     Update the agent positions using Equation (17).
   end for
   Calculate the fitness value of $X_{best}(t)$ and of the remaining agents, and sort the positions by fitness.
   Update $X_1$ using the top $N/3$ search agents.
 end while
 return $X_{best}$ and $f_{best}$

2.3.1. sLSTM Model

sLSTM is an enhanced variant of the traditional LSTM, designed to improve the model's ability to capture long-sequence dependencies and to strengthen its stability and generalization in practical applications. Unlike traditional LSTM, sLSTM introduces an exponential gating mechanism and a normalized state mechanism [23,33]. The exponential gating mechanism replaces the sigmoid activation functions of the forget gate and the input gate with exponential functions, significantly increasing the response amplitude of the gating units to the input signal; this allows the model to better capture sudden changes in the input sequence and enhances its sensitivity to key time steps, making it well suited to prediction tasks with pronounced time-series fluctuations, such as traffic flow and air quality. Meanwhile, sLSTM introduces a state normalization mechanism: by normalizing the state units, it helps to address gradient explosion and vanishing gradients during training, thereby improving numerical stability and convergence efficiency for long-sequence modeling. The corresponding states are represented as
$$ f_t = \exp\!\left( \bar{f}_t + m_{t-1} - m_t \right), \quad \bar{f}_t = W_f^{T} x_t + r_f h_{t-1} + b_f, \tag{18} $$

$$ i_t = \exp\!\left( \tilde{i}_t - m_t \right), \quad \tilde{i}_t = W_i^{T} x_t + r_i h_{t-1} + b_i, \tag{19} $$

$$ m_t = \max\!\left( \log(f_t) + m_{t-1},\ \log(i_t) \right), \tag{20} $$

$$ n_t = f_t\, n_{t-1} + i_t, \tag{21} $$

where $f_t$ and $i_t$ represent the output values of the forget gate and the input gate, respectively, $\exp(\cdot)$ is the exponential activation function, $x_t$ is the input at time t, $h_{t-1}$ is the hidden state at time t − 1, W and r are weight matrices, and b is the bias term. Because the exponential activation function is prone to overflow, the sLSTM structure adopts an additional state $m_t$ to stabilize the gating; the normalizer state $n_t$ controls the scale of the information flow and stabilizes the learning process of the network:

$$ z_t = \varphi(\tilde{z}_t), \quad \tilde{z}_t = W_z^{T} x_t + r_z h_{t-1} + b_z, \tag{22} $$

$$ o_t = \sigma(\tilde{o}_t), \quad \tilde{o}_t = W_o^{T} x_t + r_o h_{t-1} + b_o, \tag{23} $$

where $z_t$ and $o_t$ are the temporary memory cell and the output gate, consistent with the original LSTM, and $\varphi(\cdot)$ and $\sigma(\cdot)$ denote the tanh and sigmoid activation functions, respectively.

$$ c_t = f_t\, c_{t-1} + i_t\, z_t, \tag{24} $$

$$ h_t = o_t\, \tilde{h}_t, \quad \tilde{h}_t = \frac{c_t}{n_t}. \tag{25} $$

Based on the information processing above, the cell-state update is given by Equation (24), and the hidden state $h_t$ at time t is computed from the cell state $c_t$ and the normalized state $n_t$ as shown in Equation (25). The structure of an sLSTM cell is depicted in Figure 2.
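For reference, the following PyTorch sketch implements one sLSTM step along Equations (18)–(25); the fused weight layout and zero-initialized states are implementation choices rather than specifications from the paper.

```python
# Minimal PyTorch sketch of one sLSTM step following Equations (18)-(25).
import torch
import torch.nn as nn

class SLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.W = nn.Linear(input_size, 4 * hidden_size)   # input weights W
        self.R = nn.Linear(hidden_size, 4 * hidden_size)  # recurrent weights r

    def forward(self, x_t, state):
        h, c, n, m = state  # hidden, cell, normalizer, stabilizer states
        f_bar, i_bar, z_bar, o_bar = (self.W(x_t) + self.R(h)).chunk(4, dim=-1)
        m_new = torch.maximum(f_bar + m, i_bar)  # stabilizer state, Equation (20)
        f = torch.exp(f_bar + m - m_new)         # exponential forget gate, Equation (18)
        i = torch.exp(i_bar - m_new)             # exponential input gate, Equation (19)
        n_new = f * n + i                        # normalizer state, Equation (21)
        z = torch.tanh(z_bar)                    # candidate memory, Equation (22)
        o = torch.sigmoid(o_bar)                 # output gate, Equation (23)
        c_new = f * c + i * z                    # cell state, Equation (24)
        h_new = o * (c_new / n_new)              # hidden state, Equation (25)
        return h_new, (h_new, c_new, n_new, m_new)

# usage: one step with batch size 8 and 7 pollutant features
cell = SLSTMCell(input_size=7, hidden_size=64)
z0 = torch.zeros(8, 64)
h1, state1 = cell(torch.randn(8, 7), (z0, z0, z0, z0))
```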

2.3.2. Multi-Head Attention Mechanism

The attention mechanism is a method designed to improve a neural network's ability to prioritize crucial information in the input data [35]. By weighting different parts or features of the data according to their importance, attention can be focused on the most informative content; this is especially effective for extracting useful information from long sequences and complex inputs. The output states of the sLSTM are weighted in the attention layer, as shown in Equations (26)–(28). The query and key vectors are denoted Q and K, respectively, and the value vector is denoted V:

$$ Q = h_{slstm} W_Q, \tag{26} $$

$$ K = h_{slstm} W_K, \tag{27} $$

$$ V = h_{slstm} W_V, \tag{28} $$

where $h_{slstm}$ represents the hidden-state output of the sLSTM layer, and $W_Q$, $W_K$, and $W_V$ are learnable weight matrices. The attention scores are calculated as

$$ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V, \tag{29} $$

where $d_k$ is the dimension of the key (i.e., the feature dimension) and $K^{T}$ is the transpose of the key. With h heads, the final multi-head self-attention output is

$$ h_{\text{multi-head}} = \mathrm{Concat}(h_1, h_2, \ldots, h_h)\, W_o, \tag{30} $$

where $h_1, h_2, \ldots, h_h$ are the outputs of the individual heads and $W_o$ is the linear transformation matrix. After concatenating the outputs of the multiple heads, the final output is obtained through this linear transformation.
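Since the projections in Equations (26)–(28) followed by Equations (29) and (30) constitute standard scaled dot-product multi-head self-attention, PyTorch's built-in module can stand in for them; the shapes below are illustrative, with four heads matching the setting reported in Section 3.2.3.

```python
# Standard multi-head self-attention over the sLSTM outputs,
# Equations (26)-(30); shapes are illustrative.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

h_slstm = torch.randn(32, 30, 64)   # (batch, time steps, sLSTM features)
# self-attention: Q, K, V are all linear projections of the sLSTM output
context, weights = attn(h_slstm, h_slstm, h_slstm)
print(context.shape)                # torch.Size([32, 30, 64])
```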

2.4. ITTAO-sLSTM-Attention Model

In practical applications, the performance of the sLSTM-Attention model is influenced by the configuration of hyperparameters. Proper tuning of these hyperparameters is crucial to optimize the model’s effectiveness. However, traditional methods, such as manual tuning or grid search, are often inefficient and computationally expensive when dealing with large hyperparameter spaces. To resolve this, the ITTAO algorithm is employed to optimize the key hyperparameters of the sLSTM-Attention model, thus enhancing its prediction accuracy. During the hyperparameter optimization process, the learning rate and dropout rate are selected as the optimization variables for the ITTAO algorithm. These two hyperparameters have a direct effect on the model’s convergence, stability, and its ability to mitigate overfitting. The learning rate controls the step size of model weight updates, while the dropout rate affects the regularization effect of the neural network. A well-balanced dropout rate can prevent overfitting and enhance the model’s generalization ability. In this study, RMSE is used as the fitness function. By minimizing the RMSE, the ITTAO algorithm is effectively directed towards identifying the optimal hyperparameter configuration, thereby improving the model’s prediction performance. The overall workflow of the ITTAO-sLSTM-Attention model is shown in Figure 3. The detailed steps are as follows:
  • Data preparation: This includes generating the training and testing sets, data preprocessing, and data standardization.
  • Model construction: The sLSTM-Attention model proposed in this paper is constructed.
  • Hyperparameter optimization: The learning rate and dropout rate of the sLSTM-Attention model are optimized using the ITTAO algorithm. This process includes the initialization of the ITTAO algorithm, the calculation of fitness values with RMSE as the evaluation metric, and the updating of the optimization variables according to the fitness values, ultimately returning the optimal model parameter configuration. The detailed algorithm update process is shown in Algorithm 1, and a minimal sketch of the fitness function is given after this list.
  • Model training: Based on the optimal hyperparameters obtained from ITTAO optimization, these parameters are used to train the sLSTM-Attention model.
  • Prediction results and error evaluation: Based on the trained model, prediction results are obtained and errors are calculated.
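As a reference for the hyperparameter optimization step, the following is a minimal sketch of the RMSE fitness minimized by ITTAO; train_and_predict stands in for the full sLSTM-Attention training pipeline, and the search bounds are illustrative assumptions.

```python
# Minimal sketch of the RMSE fitness used by ITTAO; train_and_predict
# is a placeholder for the sLSTM-Attention training pipeline, and the
# search bounds below are illustrative, not values from the paper.
import numpy as np

def rmse_fitness(params, train_and_predict, y_val):
    """Train with a candidate (learning rate, dropout) pair; return RMSE."""
    lr, dropout = params                                # the two optimized variables
    y_pred = train_and_predict(lr=lr, dropout=dropout)  # user-supplied pipeline
    err = np.asarray(y_val, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))            # lower is better

# illustrative ITTAO search bounds: [learning rate, dropout rate]
bounds_low, bounds_high = np.array([1e-4, 0.1]), np.array([1e-2, 0.5])
```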

2.5. Evaluation Metrics

This paper evaluates the performance of the model by examining the differences between the actual values and the predicted values. To provide a more comprehensive assessment, three evaluation indicators are selected: MAE, RMSE, and MAPE. The calculation methods for each evaluation indicator are as follows:
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{31} $$

$$ RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} } \tag{32} $$

$$ MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{33} $$

In Equations (31)–(33), n denotes the total number of observations, $y_i$ represents the true value of the i-th observation, and $\hat{y}_i$ is the corresponding predicted value.
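These three indicators translate directly into NumPy, as the following sketch shows.

```python
# Direct NumPy translation of Equations (31)-(33).
import numpy as np

def evaluate(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    err = y_true - np.asarray(y_pred, dtype=float)
    return {
        "MAE": float(np.mean(np.abs(err))),                    # Equation (31)
        "RMSE": float(np.sqrt(np.mean(err ** 2))),             # Equation (32)
        "MAPE": float(100.0 * np.mean(np.abs(err / y_true))),  # Equation (33)
    }
```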

3. Experimental Results and Discussion

3.1. Performance Analysis of the ITTAO Algorithm

3.1.1. Test Functions and Basic Configurations

To systematically evaluate the performance of the ITTAO algorithm on complex optimization problems, this paper designs and implements a comparative experiment based on the CEC2017 benchmark test function set [36]. This function set covers four typical types of test problems: F1 is a unimodal function, F3 to F9 are simple multimodal functions, F10 to F19 are mixed functions, and F20 to F30 are composite functions. The experiment was performed on a standardized system with an Intel Core i5-12500H processor, 32 GB of RAM, and 64-bit Windows 11. All programs are written in Python 3.11.7. The deep learning models involved, including sLSTM, LSTM, and GRU, are implemented in the PyTorch framework, version 2.4.0+cu118. To ensure the scientific rigor and repeatability of the comparison, this paper selects seven representative and widely used swarm intelligence optimization algorithms as controls: PSO [37], the Whale Optimization Algorithm (WOA) [38], the Grasshopper Optimization Algorithm (GOA) [39], SSA [40], Teaching–Learning-Based Optimization (TLBO) [41], the Gravitational Search Algorithm (GSA) [42], and TTAO. Table 1 presents the parameter configurations of the comparison algorithms. All algorithms were run under identical experimental conditions, with a population size of 30 and a maximum of 1000 iterations. Additionally, to minimize the effect of randomness, 30 independent trials were conducted for each test function [43,44].

3.1.2. Analysis of Statistical Results

Table S1 (Supplemental Material) presents the convergence results, including the mean and standard deviation, from experiments on 30-dimensional test functions, offering a statistical evaluation of the algorithm’s performance.
For the unimodal function F1, its main purpose is to test the ITTAO’s basic search ability and convergence speed in simple optimization problems. The ITTAO algorithm performs exceptionally well in F1, with a mean lower than all comparison algorithms and a smaller variance, indicating its strong convergence and stability, and its ability to effectively handle simple optimization tasks.
For the simple multimodal functions F3–F9, which contain multiple local minima, the aim is to test the algorithm's global search ability and its ability to escape local optima. The average performance of the ITTAO algorithm on the F3, F4, F5, F6, and F7 functions is the best. On F4 and F7 in particular, the variance of ITTAO is lower than that of the other compared algorithms, demonstrating its stability in escaping local optima. Although ITTAO failed to achieve the best mean on F8 and F9, its stability outperformed GOA, WOA, SSA, and TTAO, indicating that ITTAO offers superior global optimization ability and stability when handling multimodal functions.
For the hybrid functions F10–F19, they combine the characteristics of multiple classic optimization problems, aiming to test whether the algorithm can handle different types of optimization challenges, especially in terms of adaptability and flexibility in complex solution spaces. Hybrid functions have a more complex solution space structure, so the algorithm needs to maintain openness while exploring new solutions. The ITTAO algorithm performs well in the F10, F11, F13, F14, F15, F17, F18, and F19 functions. Although it does not fully achieve the global optimum, its mean is lower than that of the comparison algorithms, demonstrating strong local optimization ability. Especially in the F13, F14, and F15 functions, the stability of the ITTAO algorithm is superior to other comparison algorithms, indicating that it can perform global search well and avoid local optima in these functions with complex solution spaces.
For the composite functions F20–F30, which are composed of multiple sub-problems, the objective is to test the comprehensive ability of the algorithm in high-dimensional, multimodal, and irregular solution spaces, especially the ability to achieve a balance between global and local search. Composite functions not only contain multiple local minima in high-dimensional spaces, but also often incorporate different scales and diverse solution space features. The mean value and stability of the ITTAO algorithm on F20, F22, F26, F27, F28, and F30 are outstanding, outperforming the comparison algorithms, indicating that it can effectively handle complex problems in high-dimensional and multimodal solution spaces. However, on the F21, F23, F24, and F29 functions, although the mean of the ITTAO algorithm is better, its stability fails to surpass that of algorithms such as SSA, GOA, and WOA. In the F25 function, although the mean and variance of ITTAO are not lower than those of the SSA algorithm, its results are close to those of SSA, and it outperforms other compared algorithms in both mean and stability, demonstrating its global optimization potential in complex compound problems.
Overall, the ITTAO algorithm outperforms the other seven algorithms in both local and global search capabilities. It shows stronger performance on 25 out of 29 test functions, validating the proposed strategy’s effectiveness.
As illustrated in Figure 4, the ITTAO algorithm achieves the best average ranking across all four types of benchmark test functions. It ranks first in unimodal, simple multimodal, mixed, and composite functions, with average rankings of 1.00, 1.29, 1.20, and 1.09, respectively, significantly outperforming the comparison algorithms, including PSO, WOA, GOA, SSA, TLBO, GSA, and TTAO.
As shown in Figure 5, the ITTAO algorithm exhibits exceptional convergence performance across all test functions. Whether it is unimodal functions such as F1; simple multimodal functions such as F3, F6, and F9; or more complex mixed and combined functions such as F13, F14, F22, F26, and F28, ITTAO can reduce the objective function value at a faster rate and continuously optimize throughout the entire iteration process. It maintains the leading position among all the compared algorithms.

3.1.3. Wilcoxon Test

To evaluate the performance differences between the ITTAO algorithm and TTAO, as well as six other mainstream algorithms, a Wilcoxon rank-sum test is conducted on the optimization results of each algorithm on the CEC2017 benchmark function set. At a 95% confidence level, a p-value below 0.05 indicates a statistically significant performance difference between ITTAO and the comparison algorithm [45]. The statistical results are presented in Table 2. For 26 of the 29 test functions, the p-values between ITTAO and TTAO are below 0.05, indicating that ITTAO is significantly superior to TTAO on the vast majority of functions; no significant difference is observed on only 3 functions. In comparison with the six mainstream algorithms, namely, PSO, WOA, GOA, SSA, TLBO, and GSA, ITTAO achieved statistically significant advantages on 29, 29, 29, 29, 26, and 29 functions, respectively; against TLBO, the p-values of three functions exceeded 0.05, indicating no significant difference on those functions, while the remaining algorithms were significantly surpassed by ITTAO on the vast majority or all of the functions. Overall, the Wilcoxon rank-sum analysis shows that ITTAO outperforms TTAO and the other mainstream swarm intelligence algorithms on the vast majority of test functions, confirming the effectiveness and advancement of ITTAO as an improved algorithm and providing technical support for optimizing the hyperparameters of the sLSTM-Attention model.
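For reproducibility, the test itself can be carried out with SciPy's rank-sum implementation; in the sketch below, the run arrays are placeholders for the 30 recorded best-fitness values of two algorithms on a single function.

```python
# Rank-sum significance test as used in Section 3.1.3; the arrays are
# placeholders for 30 independent run results on one CEC2017 function.
import numpy as np
from scipy.stats import ranksums

ittao_runs = np.random.rand(30)      # placeholder best-fitness values
ttao_runs = np.random.rand(30) + 0.5

stat, p_value = ranksums(ittao_runs, ttao_runs)
if p_value < 0.05:                   # 95% confidence level
    print("statistically significant difference between the two algorithms")
```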

3.2. Air Quality Prediction Using ITTAO-sLSTM-Attention

3.2.1. Data Source and Preprocessing

To assess the effectiveness and generalization capability of the ITTAO-sLSTM-Attention model in air quality prediction, this study uses historical air quality data from four representative Chinese cities between January 2020 and December 2024. The selected cities, i.e., Chengdu, Beijing, Nanjing, and Xi’an, offer a diverse representation of different geographic locations, climate types, levels of economic development, and pollution characteristics [2]. Chengdu, located in southwestern China, suffers from poor atmospheric diffusion due to its basin topography. Beijing, as the nation’s political, economic, and cultural hub, exemplifies a typical northern megacity with significant air pollution issues. Nanjing, located along the Yangtze River, combines high industrialization with a humid climate. Xi’an, a typical inland city in the northwest, is notably affected by dust storms and pollution during the heating season. These cities exhibit significant differences in spatial distribution and environmental characteristics, providing a robust basis for verifying the model’s applicability and stability across diverse geographical and climatic contexts. The air quality data used in this study are publicly available and were retrieved from the Tianqi Houbao (https://www.tianqihoubao.com/, accessed on 12 May 2025). The dataset comprises the AQI and six key pollutants: PM2.5, PM10, NO2, CO, SO2, and O3. The AQI, a key indicator for measuring air quality, is widely used by the public to assess air quality [9,18,27]. In the modeling process of this study, the AQI is used as the target variable for prediction, while the remaining pollutants serve as input features for the model. Table 3 presents the statistical information of the air quality datasets for the four cities.
Before model training, the raw data undergoes preprocessing. Missing values in the data are filled using cubic spline interpolation. This method constructs a cubic polynomial function for each interpolation interval, approximating the data points in a continuous and smooth manner [46]. It effectively estimates the missing data while preserving the overall smoothness of the trend. Subsequently, min–max normalization is applied to standardize the data. This method linearly maps the original data to the range [0, 1], which improves the model’s stability during training and facilitates effective learning of the weights and biases. The min–max normalization is shown in Equation (34):
$$ X' = \frac{X - \min(X)}{\max(X) - \min(X)}, \tag{34} $$

where X represents the original data value and X′ is the normalized value.
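A compact pandas sketch of this preprocessing pipeline is given below, with hypothetical file and column names.

```python
# Sketch of the preprocessing: cubic-spline interpolation for missing
# values, then min-max scaling per Equation (34). File and column names
# are hypothetical.
import pandas as pd

df = pd.read_csv("chengdu_air_quality.csv")
cols = ["AQI", "PM2.5", "PM10", "NO2", "CO", "SO2", "O3"]

# cubic spline interpolation: fit a smooth cubic polynomial across each gap
df[cols] = df[cols].interpolate(method="spline", order=3)

# min-max normalization, Equation (34): map each column onto [0, 1]
df[cols] = (df[cols] - df[cols].min()) / (df[cols].max() - df[cols].min())
```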

3.2.2. Comparison of AQI Prediction Results of sLSTM-Attention Optimized by ITTAO and Benchmark Algorithms

To balance optimization accuracy and computational efficiency, the population size for ITTAO and the benchmark comparison algorithms is set to 20, with a maximum of 40 iterations [25,28], enabling an efficient search for the optimal hyperparameter configuration of the sLSTM-Attention model.
As shown in Table 4, the comparison results of the sLSTM-Attention models optimized by different benchmark algorithms for air quality prediction across four cities are presented. According to the results, for Chengdu, the RMSE, MAE, and MAPE of the ITTAO-sLSTM-Attention model were 9.6691, 6.1955, and 12.62%, respectively. Compared to the TTAO-sLSTM-Attention model, these three metrics decreased by 5.04%, 9.79%, and 10.38%, respectively. In Beijing, the ITTAO-sLSTM-Attention model showed a reduction of 6.54%, 11.26%, and 16.44% in RMSE, MAE, and MAPE, respectively. For Nanjing, the reduction rates were 10.94%, 16.16%, and 10.47%, respectively. In Xi’an, they decreased by 6.2%, 16.47%, and 10.47%, respectively. Although the improvement magnitudes of various indicators vary across different cities, the ITTAO-sLSTM-Attention model consistently shows a more significant performance improvement compared to the TTAO-sLSTM-Attention model in all cities. This is attributed to the fact that the improved TTAO algorithm can optimize model parameters more efficiently and stably, effectively avoiding local optima and obtaining superior solutions.
Figure 6, Figure 7 and Figure 8 show the rankings of the evaluation indicators of each algorithm across the different cities. In terms of the RMSE indicator, the ITTAO-sLSTM-Attention model ranks first in all cities. For the other two evaluation indicators, the ITTAO-sLSTM-Attention model ranked second in Nanjing but first in the other cities. This indicates that the overall performance of the ITTAO-sLSTM-Attention model across the prediction tasks is superior to that of the other comparison models, including those optimized by PSO, WOA, GOA, SSA, TLBO, GSA, and TTAO. It is worth noting that the rankings of the TTAO-sLSTM-Attention model vary considerably across cities, and it usually ranks behind the WOA-sLSTM-Attention and SSA-sLSTM-Attention models. This further validates the effectiveness and superiority of the ITTAO algorithm.
Figure 9 shows the convergence curves of the ITTAO algorithm and the benchmark algorithms when searching for the optimal parameters of the sLSTM-Attention model on the Nanjing dataset. The fitness value of the ITTAO algorithm drops rapidly within the first two iterations, mainly owing to the Lévy flight random walk strategy and the differential evolution strategy, which endow the algorithm with stronger global and local optimization capabilities. Notably, even when the search falls into a local optimum, ITTAO demonstrates a strong ability to break away from it. In contrast, the other algorithms suffer from slow convergence and low accuracy, with consistently high fitness values throughout the iteration process, failing to reach the convergence level of ITTAO.
Figure 10 shows the fitting of the ITTAO-sLSTM-Attention model with other benchmark models on the Nanjing air quality dataset. The red fitting curve of the ITTAO-sLSTM-Attention model closely matches the black curve representing the true values, highlighting the model’s excellent predictive performance.

3.2.3. Comparison of Air Quality Prediction Results Between the ITTAO-sLSTM-Attention Model and Benchmark Machine Learning Models

A total of six benchmark models were used for prediction and compared with ITTAO-sLSTM-Attention: sLSTM-Attention, sLSTM, LSTM [17], BiLSTM [21], CNN-LSTM [19], and GRU [47]. The hyperparameters of each model were configured as follows: a learning rate of 0.0001, a batch size of 64, 64 hidden units, 50 training epochs, a dropout ratio of 0.2, and the Adam optimizer [16,17]. For the CNN-LSTM model, the number of filters in the convolutional layer was set to 32 and the kernel size to 3 [17]. For the attention mechanism, the number of heads was set to four [18]. Each model was trained and tested on the data of each of the four cities, and the RMSE, MAE, and MAPE results are shown in Table 5.
As shown in Table 5, ITTAO-sLSTM-Attention outperforms sLSTM-Attention on all three indicators: RMSE, MAE, and MAPE. Across the four cities, the error values of ITTAO-sLSTM-Attention were significantly lower than those of the other models, demonstrating higher prediction accuracy. This advantage stems mainly from the ITTAO algorithm's effective optimization of the sLSTM-Attention hyperparameters in the different urban scenarios, which enhances the model's predictive ability. Compared with the other models, such as sLSTM and LSTM, ITTAO-sLSTM-Attention consistently maintains excellent performance, with RMSE, MAE, and MAPE values significantly lower than those of these models. In the comparison with the traditional LSTM in particular, the RMSE for the four cities decreased by 23.47%, 13.23%, 19.69%, and 26.46%, respectively, while the MAPE decreased by 33.21%, 26.99%, 31.33%, and 29.11%, respectively, demonstrating the significant advantages of ITTAO-sLSTM-Attention. Meanwhile, the indicators of the sLSTM-Attention model are also generally superior to those of the other benchmark models in the four cities; this is attributed to the combination of the sLSTM model and the multi-head attention mechanism, which better captures the global information of the data and thereby enhances predictive ability. It can be observed that the running time of the proposed ITTAO-sLSTM-Attention model is significantly longer than that of the comparison models during training, because the ITTAO algorithm requires multiple iterations to optimize the hyperparameters of the sLSTM-Attention model, whereas the comparison models use fixed hyperparameter values and thus train faster. However, the proposed model outperforms the comparison models on the evaluation metrics across the different scenarios, showing that the increased training time buys improved prediction accuracy at a higher computational cost. Therefore, in practical applications, traditional models are more suitable when quick predictions at low computational cost are required, whereas the proposed ITTAO-sLSTM-Attention model is the better choice when higher prediction accuracy is desired, offering more accurate predictions at the cost of longer training times.
Figure 11 presents Taylor diagrams comparing the prediction results of all models across the four cities, visually illustrating the relationship between the predicted and actual values [48]. The ITTAO-sLSTM-Attention model has the smallest angle to the observation point in all four cities and the highest correlation coefficient with the observed values. Its standard deviation is also closer to that of the observed values than the other benchmark models, indicating that its prediction range aligns more closely with the distribution of the observations and yields more accurate predictions. This further demonstrates the validity and generalization of the ITTAO-sLSTM-Attention model for air quality prediction in different cities.
To evaluate the performance of the ITTAO-sLSTM-Attention model in multi-step prediction of future air quality index values, we conducted experiments with prediction horizons of 1, 3, 5, and 7 days. The experimental results in Table S2 (Supplementary Material) show that the ITTAO-sLSTM-Attention model outperforms the other compared models on all three evaluation metrics. However, as the prediction horizon increases, the model's performance shows a downward trend. This phenomenon can be attributed to the inherent randomness of the air quality sequence and unpredictable noise. For instance, sudden pollution incidents (such as wildfires) may cause a sharp increase or decrease in the air quality index, and human activities can also cause short-term fluctuations in the AQI [49]. Because the intensity and distribution of such events are usually difficult to predict, they significantly affect model performance. Despite this degradation in multi-step prediction, the ITTAO-sLSTM-Attention model still maintains excellent performance and outperforms the other models at all horizons. In particular, as the horizon increases, ITTAO-sLSTM-Attention consistently outperforms sLSTM-Attention, which in turn outperforms sLSTM, indicating that both the multi-head attention mechanism and the ITTAO-optimized hyperparameter strategy effectively improve the performance of sLSTM in air quality prediction tasks.
Figure 12 illustrates the performance trend bar charts of different models for air quality prediction in Chengdu, evaluated across multiple metrics and various prediction steps. As shown in Figure 12, the prediction error of all models increases as the prediction horizon becomes longer, reflecting the increasing difficulty of multi-step forecasting tasks. Compared with the baseline models, the ITTAO-sLSTM-Attention model consistently maintains relatively lower errors across all metrics and prediction steps, demonstrating its stable and superior predictive capability.

3.2.4. ITTAO-sLSTM-Attention-Based Air Quality Prediction System

In the above experiments, this paper verified the effectiveness and superiority of the proposed ITTAO-sLSTM-Attention model. Considering its practical application requirements in air quality prediction, we have developed the ITTAO-sLSTM-Attention air quality prediction system based on this model. This system adopts a hierarchical design for the front- and back-ends, and the overall process is shown in Figure 13. The front-end core layer is responsible for user interaction, including functions such as data input, feature and model selection, parameter setting, visual display, and user-defined prediction, ensuring that the operation process is simple, intuitive, and easy to use. The core layer at the back-end undertakes core computing tasks such as data preprocessing, feature engineering, model hyperparameter optimization (implemented through the ITTAO optimization algorithm), model training and evaluation, parameter saving, and final prediction execution. The front-end and back-end work in coordination through clear data flow and control flow, forming a complete closed-loop process from data loading, feature engineering, model configuration, training optimization to displaying results and user prediction, thereby providing users with an efficient and user-friendly air quality prediction and analysis experience.
Figure 14 shows the PyQt system interface applied to air quality prediction in Chengdu. Specifically: (a) After a dataset is uploaded, the system automatically performs data preprocessing and visualizes the result for each feature in the display bar on the right; the figure shows the original and processed data curves, with the processed points (missing values and outliers) marked by red dots. (b) The user selects the ITTAO-sLSTM-Attention model and sets the corresponding parameters; the system also provides other benchmark models to choose from. In this example, the window size is set to 30, the batch size to 64, and the number of training epochs to 50. The right-hand display bar shows the training loss curve, as well as the convergence curves of RMSE, MAE, MAPE, and R2 on the training set, while the log panel records detailed information about the training process. (c) After the user clicks the "Model Test" button, the test results are displayed: the log area on the left reports the RMSE, MAE, and MAPE on the test set, while the display bar on the right plots the comparison between the true values and the model's predictions. (d) A user-defined prediction view: users can predict the air quality for the next 7 days, presented in a table with the corresponding 7-day prediction curve drawn below; the system also displays the statistical information of each feature of the uploaded test set. Although the proposed air quality prediction system shows promising results, further validation in diverse environments and real-world applications is still necessary.

4. Conclusions

This paper proposes a symmetry-driven hybrid framework for air quality prediction by integrating the ITTAO optimizer with an sLSTM-Attention model. Initially, the structurally symmetric ITTAO was improved by integrating three strategies to enhance its optimization capability. Seven classic swarm intelligence optimization algorithms were selected and compared on the standard benchmark function set CEC2017. The experimental results demonstrate that the proposed ITTAO exhibits significant advantages in solution accuracy, stability, and robustness. Subsequently, the sLSTM-Attention model was developed by combining the temporal symmetry-aware modeling capability of sLSTM with the global balance and symmetry-guided feature extraction ability of the attention mechanism. Leveraging the advantages of ITTAO, the optimizer was applied to tune the hyperparameters of the sLSTM-Attention model, focusing on the dropout ratio and learning rate. Consequently, the ITTAO-sLSTM-Attention air quality prediction model was constructed. To verify its predictive performance, air quality monitoring data from four cities were selected for experimental analysis. The experiments compared the ITTAO-sLSTM-Attention model with sLSTM-Attention models optimized by seven alternative swarm intelligence algorithms. The results demonstrated that ITTAO more effectively optimized the hyperparameters of the sLSTM-Attention model. Furthermore, the ITTAO-sLSTM-Attention model outperformed six common machine learning models across all evaluation metrics, showcasing its superior feature extraction and temporal dependency modeling capabilities for air quality prediction. Finally, to facilitate practical use, an interactive air quality prediction system based on the ITTAO-sLSTM-Attention model was developed. The system supports model training, parameter tuning, prediction, and result visualization with user-friendly operation. The ITTAO-sLSTM-Attention model shows significant advantages in air quality prediction, but there are limitations in its generalization ability, computational efficiency, and the deployment of the prediction system. Future research will focus on optimizing the model to improve its generalization in different environments and reduce computational costs. Additionally, to enhance the functionality and scalability of the prediction system, integration with existing air quality monitoring networks will be considered.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/sym17081369/s1, Table S1: The evaluation results of ITTAO and the comparison algorithms on the test functions; Table S2: Performance indicators of multi-step predictions of each model in different cities.

Author Contributions

Conceptualization, Y.L. and K.Z.; methodology, Y.L. and K.Z.; software, Y.L. and B.L.; validation, K.Z. and B.Y.; data curation, C.T. and K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, Y.L. and F.S.; visualization, F.S. and B.Y.; funding acquisition, C.T. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Guizhou University of Finance and Economics Innovation Exploration and Academic Emerging Project under Grant 2024XSXMB06, and in part by the National Natural Science Foundation of China under Grant 62061007.

Data Availability Statement

The data source is available at the website provided in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dong, J.; Zhang, Y.; Hu, J. Short-term air quality prediction based on EMD-transformer-BiLSTM. Sci. Rep. 2024, 14, 20513. [Google Scholar] [CrossRef] [PubMed]
  2. Huang, R.; Hu, R.; Chen, H. A novel hybrid model for air quality prediction via dimension reduction and error correction techniques. Environ. Monit. Assess. 2025, 197, 96. [Google Scholar] [CrossRef]
  3. Liu, T.; You, S. Analysis and forecast of Beijing’s air quality index based on ARIMA model and neural network model. Atmosphere 2022, 13, 512. [Google Scholar] [CrossRef]
  4. Ali, U.; Bano, S.; Shamsi, M.H.; Sood, D.; Hoare, C.; Zuo, W.; Hewitt, N.; O’Donnell, J. Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach. Energy Build. 2024, 303, 113768. [Google Scholar] [CrossRef]
  5. Pant, A.; Joshi, R.C.; Sharma, S.; Pant, K. Predictive modeling for forecasting air quality index (AQI) using time series analysis. Avicenna J. Environ. Health Eng. 2023, 10, 38–43. [Google Scholar] [CrossRef]
  6. Hu, Y.; Ding, Y.; Jiang, W. Geographically Aware Air Quality Prediction Through CNN-LSTM-KAN Hybrid Modeling with Climatic and Topographic Differentiation. Atmosphere 2025, 16, 513. [Google Scholar] [CrossRef]
  7. Ahmed, M.; Zhang, X.; Shen, Y.; Ahmed, T.; Ali, S.; Ali, A.; Gulakhmadov, A.; Nam, W.-H.; Chen, N. Low-cost video-based air quality estimation system using structured deep learning with selective state space modeling. Environ. Int. 2025, 199, 109496. [Google Scholar] [CrossRef] [PubMed]
  8. Song, Q.; Zou, J.; Xu, M.; Xi, M.; Zhou, Z. Air quality prediction for Chengdu based on long short-term memory neural network with improved jellyfish search optimizer. Environ. Sci. Pollut. Res. 2023, 30, 64416–64442. [Google Scholar] [CrossRef]
  9. Liaw, J.-J.; Chen, K.-Y. Using high-frequency information and RH to estimate AQI based on SVR. Sensors 2021, 21, 3630. [Google Scholar] [CrossRef]
  10. Ketu, S. Spatial air quality index and air pollutant concentration prediction using linear regression based recursive feature elimination with random forest regression (RFERF): A case study in India. Nat. Hazards 2022, 114, 2109–2138. [Google Scholar] [CrossRef]
  11. Wang, Y.; Kong, T. Air quality predictive modeling based on an improved decision tree in a weather-smart grid. IEEE Access 2019, 7, 172892–172901. [Google Scholar] [CrossRef]
  12. Jin, X.-B.; Wang, Z.-Y.; Gong, W.-T.; Kong, J.-L.; Bai, Y.-T.; Su, T.-L.; Ma, H.-J.; Chakrabarti, P. Variational Bayesian network with information interpretability filtering for air quality forecasting. Mathematics 2023, 11, 837. [Google Scholar] [CrossRef]
  13. Ma, J.; Yu, Z.; Qu, Y.; Xu, J.; Cao, Y. Application of the XGBoost machine learning method in PM2.5 prediction: A case study of Shanghai. Aerosol Air Qual. Res. 2020, 20, 128–138. [Google Scholar] [CrossRef]
  14. Zhang, H.; Srinivasan, R.; Yang, X. Simulation and analysis of indoor air quality in Florida using time series regression (TSR) and artificial neural networks (ANN) models. Symmetry 2021, 13, 952. [Google Scholar] [CrossRef]
  15. Huang, W.; Li, T.; Liu, J.; Xie, P.; Du, S.; Teng, F. An overview of air quality analysis by big data techniques: Monitoring, forecasting, and traceability. Inf. Fusion 2021, 75, 28–40. [Google Scholar] [CrossRef]
  16. Seng, D.; Zhang, Q.; Zhang, X.; Chen, G.; Chen, X. Spatiotemporal prediction of air quality based on LSTM neural network. Alex. Eng. J. 2021, 60, 2. [Google Scholar] [CrossRef]
  17. Jiao, Y.; Wang, Z.; Zhang, Y. Prediction of air quality index based on LSTM. In Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 24–26 May 2019; pp. 17–20. [Google Scholar]
  18. Li, Y.; Jiang, T.; Gu, H.; Lu, W.; Wu, Q.; Yu, Y. Air Quality Index Prediction Based on CNN-LSTM-Attention Hybrid Modeling. In Proceedings of the 2023 International Conference on the Cognitive Computing and Complex Data (ICCD), Huaian, China, 21–22 October 2023; pp. 175–180. [Google Scholar]
  19. Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air quality prediction using CNN+LSTM-based hybrid deep learning architecture. Environ. Sci. Pollut. Res. 2022, 29, 11920–11938. [Google Scholar] [CrossRef]
  20. Guo, Z.; Jing, X.; Ling, Y.; Yang, Y.; Jing, N.; Yuan, R.; Liu, Y. Optimized air quality management based on air quality index prediction and air pollutants identification in representative cities in China. Sci. Rep. 2024, 14, 17923. [Google Scholar] [CrossRef]
  21. Zhang, M.; Wu, D.; Xue, R. Hourly prediction of PM2.5 concentration in Beijing based on Bi-LSTM neural network. Multimed. Tools Appl. 2021, 80, 24455–24468. [Google Scholar] [CrossRef]
  22. Hossain Sani, S.; Shopon, M.; Hossain Khan, M.; Hasan, M.; Mridha, M.F. Short-term and Long-term Air Quality Forecasting Technique Using Stacked LSTM. In Proceedings of the 6th International Conference on Communication and Information Processing, Tokyo, Japan, 27–29 November 2020; pp. 165–171. [Google Scholar]
  23. Kafi, F.; Yousefi, E.; Ehteram, M.; Ashrafi, K. Stabilized Long Short Term Memory (SLSTM) model: A new variant of the LSTM model for predicting ozone concentration data. Earth Sci. Inform. 2025, 18, 311. [Google Scholar] [CrossRef]
  24. Dong, X.; Li, D.; Wang, W.; Shen, Y. BWO-CAformer: An improved Informer model for AQI prediction in Beijing and Wuhan. Process Saf. Environ. Prot. 2025, 195, 106800. [Google Scholar] [CrossRef]
  25. Zhu, X.; Zou, F.; Li, S. Enhancing Air Quality Prediction with an Adaptive PSO-Optimized CNN-Bi-LSTM Model. Appl. Sci. 2024, 14, 5787. [Google Scholar] [CrossRef]
  26. Zhao, X.; Li, C.; Zou, X.; Du, X.; Ismail, A. Passenger Flow Prediction for Rail Transit Stations Based on an Improved SSA-LSTM Model. Mathematics 2024, 12, 3556. [Google Scholar] [CrossRef]
  27. Liu, Y.; Zheng, R.; Yu, B.; Liao, B.; Song, F.; Tang, C. An Improved Chaotic Game Optimization Algorithm and Its Application in Air Quality Prediction. Axioms 2025, 14, 235. [Google Scholar] [CrossRef]
  28. Aarthi, C.; Ramya, V.J.; Falkowski-Gilski, P.; Divakarachari, P.B. Balanced Spider Monkey Optimization with Bi-LSTM for Sustainable Air Quality Prediction. Sustainability 2023, 15, 1637. [Google Scholar] [CrossRef]
  29. Zhao, S.; Zhang, T.; Cai, L.; Yang, R. Triangulation topology aggregation optimizer: A novel mathematics-based meta-heuristic algorithm for continuous optimization and engineering applications. Expert Syst. Appl. 2024, 238, 121744. [Google Scholar] [CrossRef]
  30. Che, Z.; Peng, C.; Yue, C. Optimizing LSTM with multi-strategy improved WOA for robust prediction of high-speed machine tests data. Chaos Solitons Fractals 2024, 178, 114394. [Google Scholar] [CrossRef]
  31. Zeng, N.; Song, D.; Li, H.; You, Y.; Liu, Y.; Alsaadi, F.E. A competitive mechanism integrated multi-objective whale optimization algorithm with differential evolution. Neurocomputing 2021, 432, 170–182. [Google Scholar] [CrossRef]
  32. Liu, Z.; Li, X.; Lu, Z.; Meng, X. IWOA-RNN: An improved whale optimization algorithm with recurrent neural networks for traffic flow prediction. Alex. Eng. J. 2025, 117, 563–576. [Google Scholar] [CrossRef]
  33. Beck, M.; Pöppel, K.; Spanring, M.; Auer, A.; Prudnikova, O.; Kopp, M.; Klambauer, G.; Brandstetter, J.; Hochreiter, S. xLSTM: Extended long short-term memory. arXiv 2024, arXiv:2405.04517. [Google Scholar]
  34. Kumar, A.; Narapareddy, V.T.; Srikanth, V.A.; Malapati, A.; Neti, L.B.M. Sarcasm detection using multi-head attention based bidirectional LSTM. IEEE Access 2020, 8, 6388–6397. [Google Scholar] [CrossRef]
  35. Wei, Y.; Liu, H. Convolutional long-short term memory network with multi-head attention mechanism for traffic flow prediction. Sensors 2022, 22, 7994. [Google Scholar] [CrossRef] [PubMed]
  36. Chakraborty, S.; Sharma, S.; Saha, A.K.; Saha, A. A novel improved whale optimization algorithm to solve numerical optimization and real-world applications. Artif. Intell. Rev. 2022, 55, 4605–4716. [Google Scholar] [CrossRef]
  37. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  38. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  39. Saremi, S.; Mirjalili, S.; Lewis, A. Grasshopper optimisation algorithm: Theory and application. Adv. Eng. Softw. 2017, 105, 30–47. [Google Scholar] [CrossRef]
  40. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  41. Rao, R.V. Teaching-Learning-Based Optimization Algorithm; Springer: Cham, Switzerland, 2016. [Google Scholar]
  42. Rashedi, E.; Nezamabadi-Pour, H.; Saryazdi, S. GSA: A gravitational search algorithm. Inf. Sci. 2009, 179, 2232–2248. [Google Scholar] [CrossRef]
  43. Elmogy, A.; Miqrish, H.; Elawady, W.; El-Ghaish, H. ANWOA: An adaptive nonlinear whale optimization algorithm for high-dimensional optimization problems. Neural Comput. Appl. 2023, 35, 22671–22686. [Google Scholar] [CrossRef]
  44. Chen, Q.; Yao, G.; Yang, L.; Liu, T.; Sun, J.; Cai, S. Research on ship replenishment path planning based on the modified whale optimization algorithm. Biomimetics 2025, 10, 179. [Google Scholar] [CrossRef]
  45. Wang, H.; Tang, J.; Pan, Q. MSI-HHO: Multi-strategy improved HHO algorithm for global optimization. Mathematics 2024, 12, 415. [Google Scholar] [CrossRef]
  46. Revathi, A.R.; Arockia, A.A. An Overview of Predictive Analysis for Air Quality Measurements and Visualization. In Proceedings of the 2024 International Conference on Computing and Data Science (ICCDS), Chennai, India, 26–27 April 2024; pp. 1–6. [Google Scholar]
  47. Wang, X.; Yan, J.; Wang, X.; Wang, Y. Air quality forecasting using the GRU model based on multiple sensors nodes. IEEE Sens. Lett. 2023, 7, 1–4. [Google Scholar] [CrossRef]
  48. Wang, Y.; Liu, K.; He, Y.; Wang, P.; Chen, Y.; Xue, H.; Huang, C.; Li, L. Enhancing air quality forecasting: A novel spatio-temporal model integrating graph convolution and multi-head attention mechanism. Atmosphere 2024, 15, 418. [Google Scholar] [CrossRef]
  49. Zhou, H.; Yan, Y. Research on a hybrid deep learning model based on two-stage decomposition and an improved whale optimization algorithm for air quality index prediction. Eng. Appl. Comput. Fluid Mech. 2025, 19, 2507753. [Google Scholar] [CrossRef]
Figure 1. Process of sLSTM-Attention model.
Figure 2. Architecture of an sLSTM cell.
Figure 3. The overall process of the ITTAO-sLSTM-Attention model.
Figure 4. The average ranking of each algorithm in different types of test functions.
Figure 5. Convergence curves of each algorithm under different test functions. (a) corresponds to unimodal functions, (b–d) correspond to simple multimodal functions, (e–h) correspond to hybrid functions, and (i–l) correspond to composite functions.
Figure 6. RMSE ranking of different models for air quality prediction across four cities.
Figure 7. MAE ranking of different models for air quality prediction across four cities.
Figure 8. MAPE ranking of different models for air quality prediction across four cities.
Figure 9. The convergence curves of parameter optimization for each model in Nanjing City.
Figure 10. Comparison of predicted and actual AQI values for each model in Nanjing City.
Figure 11. Taylor graphs for prediction results of the ITTAO-sLSTM-Attention model and the benchmark machine learning models in different cities. (a–d) correspond, respectively, to Chengdu, Beijing, Nanjing, and Xi’an.
Figure 12. Trends of different indicators in various models of Chengdu with different prediction steps.
Figure 13. Design diagram of the air quality prediction system.
Figure 14. Interface of ITTAO-sLSTM-Attention-based air quality prediction system.
Table 1. Parameter settings for the tested algorithms.

Name   Parameters            Values
PSO    W, c1, c2             0.9 to 0.4, 2, 2
WOA    b, a, l               1, [0, 2], [−1, 1]
GOA    L, f, c               1.5, 0.5, [0, 1]
SSA    PD, SD                0.2, 0.1
TLBO   TF                    1 or 2
GSA    G0, α                 100, 20
TTAO   r0, r1, r2, r3, r4    [0, 1]
Table 2. Wilcoxon test results (p-values, scientific notation) between the ITTAO algorithm and compared algorithms.

       PSO        WOA        GOA        SSA        TLBO       GSA        TTAO
F1     2.87E−11   2.87E−11   2.87E−11   2.87E−11   4.28E−07   2.87E−11   4.91E−06
F3     4.58E−06   2.87E−11   2.87E−11   4.75E−03   2.87E−11   2.87E−11   1.93E−06
F4     2.87E−11   2.87E−11   2.87E−11   1.21E−01   3.85E−02   2.87E−11   8.82E−05
F5     4.49E−08   2.87E−11   2.87E−11   5.77E−11   3.70E−06   2.87E−11   6.57E−06
F6     9.18E−14   6.86E−18   6.86E−18   7.73E−18   7.37E−05   6.86E−18   5.35E−14
F7     6.86E−18   6.86E−18   6.86E−18   6.86E−18   8.75E−11   6.86E−18   4.24E−11
F8     1.43E−15   6.86E−18   6.86E−18   1.33E−17   8.81E−08   6.86E−18   1.26E−02
F9     5.53E−11   2.55E−17   2.01E−17   8.72E−18   1.06E−03   9.83E−18   7.68E−04
F10    2.37E−09   6.86E−18   6.86E−18   1.84E−06   8.80E−12   6.86E−18   3.63E−01
F11    6.86E−18   6.86E−18   6.86E−18   4.57E−15   5.02E−07   6.86E−18   5.35E−03
F12    6.86E−18   6.86E−18   6.86E−18   6.86E−18   5.58E−03   6.86E−18   4.16E−03
F13    6.88E−16   6.86E−18   6.86E−18   6.86E−18   6.74E−01   6.86E−18   8.11E−02
F14    1.52E−06   8.21E−18   6.86E−18   1.72E−06   3.77E−08   6.86E−18   1.93E−04
F15    3.22E−05   6.86E−18   6.86E−18   6.86E−18   1.26E−02   6.86E−18   9.45E−03
F16    2.75E−11   6.86E−18   6.86E−18   2.94E−15   2.73E−04   6.86E−18   3.37E−02
F17    2.23E−15   1.59E−17   6.86E−18   1.04E−16   6.99E−01   6.86E−18   4.82E−16
F18    1.08E−02   1.79E−17   6.86E−18   5.39E−07   2.41E−06   6.86E−18   3.59E−02
F19    7.01E−08   6.86E−18   6.86E−18   6.86E−18   1.33E−02   6.86E−18   5.58E−01
F20    1.15E−09   9.29E−15   2.01E−17   1.73E−10   1.05E−10   3.23E−17   2.84E−03
F21    2.47E−16   6.86E−18   6.86E−18   2.40E−17   6.47E−03   6.86E−18   2.31E−03
F22    1.55E−07   5.16E−17   3.03E−14   4.06E−12   1.36E−05   6.86E−18   1.26E−02
F23    6.86E−18   6.86E−18   6.86E−18   1.04E−16   8.76E−05   6.86E−18   8.03E−05
F24    6.86E−18   6.86E−18   6.86E−18   1.39E−13   8.18E−06   6.86E−18   8.09E−06
F25    6.86E−18   6.86E−18   6.86E−18   6.08E−02   6.00E−07   6.86E−18   2.61E−07
F26    2.25E−07   6.86E−18   6.86E−18   1.28E−15   9.92E−06   6.86E−18   1.13E−06
F27    6.92E−17   6.86E−18   6.86E−18   1.65E−16   1.32E−03   6.86E−18   1.29E−03
F28    6.86E−18   6.86E−18   6.86E−18   4.06E−02   1.24E−12   6.86E−18   3.49E−12
F29    6.88E−16   6.86E−18   6.86E−18   1.47E−16   2.64E−01   6.86E−18   7.09E−02
F30    1.04E−17   6.86E−18   6.86E−18   6.86E−18   1.14E−03   6.86E−18   2.98E−11
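For reference, p-values such as those in Table 2 can be obtained by applying a Wilcoxon test to the best objective values collected over repeated independent runs of two algorithms on the same function. A minimal sketch with the rank-sum variant and illustrative data follows; the paper’s exact test configuration may differ:

    import numpy as np
    from scipy.stats import ranksums

    rng = np.random.default_rng(42)
    # 30 independent runs per algorithm on one function (illustrative values).
    ittao_runs = rng.normal(loc=100.2, scale=0.1, size=30)
    rival_runs = rng.normal(loc=105.0, scale=2.0, size=30)

    stat, p_value = ranksums(ittao_runs, rival_runs)
    print(f"p = {p_value:.2e}")  # p < 0.05 indicates a significant difference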
Table 3. Data and statistical information of four cities.

                         Chengdu        Beijing        Nanjing        Xi’an
Type    Symbol   Unit    Max     Min    Max     Min    Max     Min    Max     Min
Input   PM2.5    μg/m³   188     3      208     1      134     3      294     4
        PM10     μg/m³   399     5      1563    4      502     6      1116    8
        NO2      μg/m³   13      1      13      1      16      2      27      2
        CO       mg/m³   1.55    0.26   2.33    0.07   1.61    0.28   2.12    0.25
        SO2      μg/m³   92      5      88      1      106     5      93      6
        O3       μg/m³   144     3      200     2      178     4      152     4
Output  AQI      —       278     11     471     11     282     11     464     1
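Per-feature maxima and minima like those in Table 3 are exactly what min-max normalization consumes during preprocessing. Assuming the common [0, 1] scaling for LSTM-family inputs (a typical choice, not a detail confirmed here), a minimal pandas sketch:

    import pandas as pd

    def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
        """Scale each pollutant column to [0, 1] using its own min and max."""
        return (df - df.min()) / (df.max() - df.min())

    # Toy values only; real inputs would be the monitoring series themselves.
    df = pd.DataFrame({"PM2.5": [188.0, 3.0, 40.0], "SO2": [92.0, 5.0, 20.0]})
    print(min_max_scale(df))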
Table 4. Evaluation results of the ITTAO-sLSTM-Attention model and the benchmark models. Each column gives the sLSTM-Attention model optimized by the named algorithm.

City     Metric   ITTAO     TTAO      TLBO      GSA       PSO       WOA       GOA       SSA
Chengdu  RMSE     9.6691    10.1826   9.7684    9.9987    10.6061   10.0472   12.0738   9.9340
         MAE      6.1955    6.8860    6.5147    6.7967    7.5487    6.8215    9.5128    6.3692
         MAPE     12.62%    14.09%    13.05%    13.77%    14.60%    15.20%    22.58%    13.31%
Beijing  RMSE     9.2070    9.8521    10.2551   9.6399    9.6254    9.4654    10.5076   9.9666
         MAE      6.7044    7.5505    7.9742    7.3157    7.1823    7.4095    8.0599    7.5945
         MAPE     14.17%    17.08%    18.12%    16.65%    15.67%    15.46%    20.43%    17.01%
Nanjing  RMSE     7.3518    8.8294    8.2980    7.8076    7.4393    7.7455    7.9834    7.9767
         MAE      5.0078    7.2052    6.3283    5.8040    4.8889    5.3336    5.5360    5.1906
         MAPE     11.21%    17.61%    14.94%    13.72%    11.75%    12.00%    12.65%    11.14%
Xi’an    RMSE     8.6288    9.4930    9.6946    12.0535   9.3095    8.8342    10.2745   10.2745
         MAE      5.7291    6.7199    7.2448    9.4564    6.9736    6.0747    7.1876    7.5513
         MAPE     8.46%     10.03%    10.92%    14.34%    10.47%    9.06%     10.93%    11.89%
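Tables 4 and 5 report RMSE, MAE, and MAPE; a minimal NumPy sketch of these standard definitions follows (the paper’s exact implementation is not shown here):

    import numpy as np

    def rmse(y_true, y_pred):
        """Root mean squared error."""
        return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

    def mae(y_true, y_pred):
        """Mean absolute error."""
        return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

    def mape(y_true, y_pred):
        """Mean absolute percentage error, in percent."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

    # Toy AQI values, purely illustrative.
    actual, predicted = [100, 120, 90], [98, 125, 85]
    print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))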
Table 5. The evaluation results of the ITTAO-sLSTM-Attention model and the benchmark machine learning models.

City     Metric             ITTAO-sLSTM-Attention   sLSTM-Attention   sLSTM      LSTM       BiLSTM     CNN-LSTM   GRU
Chengdu  RMSE               9.6691                  10.7194           11.2653    12.6346    12.9266    10.4963    11.2957
         MAE                6.1955                  7.1573            8.0308     8.5086     8.5458     6.9863     8.1390
         MAPE               12.6297%                15.4566%          18.2234%   18.9092%   18.6987%   14.9360%   17.8583%
         Running Time (s)   2581.25                 26.73             21.51      18.34      20.56      23.67      18.32
Beijing  RMSE               9.2070                  9.9934            10.6111    11.7445    11.5034    11.1279    10.0414
         MAE                6.7044                  7.6021            8.3294     9.6718     9.3110     8.8889     7.6968
         MAPE               14.1784%                16.7536%          19.4207%   23.1820%   22.0361%   20.5829%   17.4201%
         Running Time (s)   2432.17                 25.72             20.47      18.02      19.52      23.21      18.16
Nanjing  RMSE               7.3518                  7.9808            9.8077     9.1850     8.0070     7.8015     8.8617
         MAE                5.0078                  5.8507            7.4301     6.6148     5.3837     5.1605     6.4532
         MAPE               11.2071%                13.9877%          18.9729%   16.3209%   12.3919%   11.4542%   16.0052%
         Running Time (s)   2723.55                 25.67             20.11      19.72      23.86      25.41      20.59
Xi’an    RMSE               8.6288                  9.2009            10.4861    11.7340    9.1246     12.2160    11.7522
         MAE                5.7291                  6.5227            7.6187     8.5446     6.4056     9.5266     8.7587
         MAPE               8.4626%                 9.7991%           11.2464%   11.9373%   9.1916%    13.9910%   12.3041%
         Running Time (s)   2673.72                 27.29             25.54      20.52      21.62      23.12      20.23
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
