1. Introduction
Air quality in China has deteriorated markedly amid rapid economic growth, and PM2.5 pollution has become increasingly severe. These fine particles carry a multitude of harmful substances, remain suspended for long durations, and can travel considerable distances. They therefore act as carriers of various harmful pollutants in the air, significantly degrading air quality and visibility, while also posing serious threats to human health, including impaired renal function, coronary artery disease, asthma, lung cancer, and other respiratory diseases [1,2,3,4]. Therefore, building a high-precision PM2.5 concentration prediction model is of key significance for achieving proactive warnings of air pollution and precise management and control. An accurate prediction model can quantify the spatiotemporal evolution patterns of air pollutants, providing quantitative decision-making support for the sustainable management of urban ecological environments. Its prediction results can directly support practical actions such as the dynamic formulation of pollution control measures, the scientific optimization of emission reduction plans, and the efficient allocation of environmental governance resources. Research on PM2.5 concentration prediction has been validated and applied in practical scenarios such as urban air environment monitoring and regional sustainable development assessment, both domestically and internationally, becoming an important technological support for the promotion of sustainable development in the ecological environment field.
At present, approaches to forecasting PM2.5 concentrations generally fall into two broad categories: mechanistic and machine learning models. Mechanistic approaches rely on fundamental physical and chemical principles. They simulate the diffusion, transport, and deposition processes of pollutant plumes, with the ultimate goal of estimating ambient pollutant concentrations [5,6,7,8]. These methods provide good interpretability. However, such models require prior knowledge of emission sources, meteorological conditions, and geographical features, and rely on empirical parameters and assumptions to estimate future concentration changes, which limits their applicability to specific scenarios [9]. For example, AERMOD is an empirical model primarily designed for simulating and predicting small-scale pollutant dispersion [5]. Third-generation air quality models require complete emission inventories and accurate meteorological fields as inputs, which renders them inapplicable in most cases [6]. Although mechanism-based methods can simulate the physical evolution and interregional transport of atmospheric components, their prediction accuracy is largely constrained by the difficulty of obtaining precise input data as well as sufficient knowledge of emission sources, reaction mechanisms, and chemical kinetics [10,11].
Compared with mechanism-based approaches, machine learning methods are more effective at capturing the nonlinear characteristics of PM2.5 and have become a research hotspot worldwide. Traditional machine learning encompasses a wide range of predictive models, among which Support Vector Machines (SVMs) stand out as representative and widely applied approaches. By employing Particle Swarm Optimization (PSO) to optimize a hybrid kernel SVM, Zhang et al. developed a PM2.5 prediction model that achieved strong accuracy and efficiency [12]. The field of PM2.5 prediction has also seen the application of the Extreme Learning Machine (ELM), a rapid learning algorithm for Single Hidden Layer Feedforward Neural Networks (SLFNs), whose optimized variants have demonstrated promising predictive performance [13,14]. Studies further indicate that meteorological factors, such as surface pressure, precipitation, and temperature, as well as other pollutants including PM10, CO, and SO2, play significant roles in influencing PM2.5 concentration levels [15,16]. Although traditional machine learning methods exhibit certain capabilities in handling nonlinear data, they are limited in extracting deeper representations from large-scale datasets [17]. Building on traditional machine learning, deep learning provides a new paradigm for atmospheric time series prediction by enabling hierarchical feature learning from large inputs. The deep learning field encompasses a variety of specialized architectures, notably Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs). RNNs have demonstrated particular advantages in time series modeling [18]. LSTM, the predominant RNN variant, is extensively applied to PM2.5 prediction [19,20]. By introducing memory cells and gating mechanisms, LSTM effectively addresses the long-term dependency problem and retains temporal information. Designed as a streamlined alternative to LSTM, the Gated Recurrent Unit (GRU) reduces model complexity and accelerates training, which has led to its wide adoption in PM2.5 prediction tasks [21,22,23,24]. However, the relationship between PM2.5 concentrations and meteorological factors is characterized by a dynamic interplay, in which complexity stems from the interconnectedness of multiple variables rather than from any single factor. Considered independently, these factors cannot fully capture the interactive effects of multiple variables on PM2.5 concentrations, which limits prediction accuracy.
To elevate the accuracy of predictive models, numerous scholars have explored a range of avenues, such as optimizing the internal architecture of existing models and integrating heterogeneous model frameworks into a unified hybrid system. A prominent example of such optimized architectures is the Bidirectional Long Short-Term Memory (BiLSTM) network, an advanced iteration of the traditional LSTM that mitigates predictive deviations by capturing sequential features in both forward and backward directions [25]. Other studies have focused on parameter optimization of LSTM models using optimization algorithms and their improved versions, including the Genetic Algorithm (GA) [26], Quantum Particle Swarm Optimization (QPSO) [27], Bayesian Optimization (BO) [28], and the Whale Optimization Algorithm (WOA) [29]. These algorithms perform global searches to automatically determine optimal hyperparameters (e.g., training epochs, learning rate, and batch size), thereby avoiding inefficient manual tuning and improving stability and accuracy. In addition, hybrid neural networks have been developed to exploit the complementary strengths of different architectures, improving accuracy, generalization, adaptability to complex data, and robustness against noise. For example, the fusion of convolutional and recurrent networks has emerged as a viable strategy to enhance the efficiency of prediction tasks in deep learning applications [30,31,32]. Ding et al. [33] proposed a hybrid model integrating LSTM and a weighted Random Forest (RF) for PM2.5 prediction. In this framework, RF was employed to assess the importance of input variables (e.g., temperature, wind speed, and historical PM2.5 concentrations) and to identify and select the most relevant features. A fully connected network (FCN) was then employed to assign weights to these features, quantifying their relative influence. Finally, the LSTM processed the weighted feature sequence to capture long-term temporal dependencies, enabling it to predict PM2.5 concentrations over the subsequent six hours.
In order to optimize model performance and improve prediction accuracy, previous studies have explored model enhancements and the fusion of different neural networks. Although some progress has been achieved, limitations remain in capturing complex feature relationships within the data. With the widespread application of attention mechanisms across various fields [34,35], the combination of data preprocessing and machine learning has provided new opportunities for improving prediction performance. Through probabilistic weighting, attention mechanisms extract long-term dependencies and minimize information loss, thereby enhancing overall predictive accuracy. It has been confirmed that PM2.5 prediction models combined with attention mechanisms achieve improved accuracy and generalization [36,37,38]. For example, aiming at two-day-ahead forecasts of PM2.5, Zhang et al. [39] developed a hybrid CNN–BiLSTM–attention model, designed as a multi-modal fusion system that synthesizes strategies from convolutional, recurrent, and attention-based neural processing. Experiments on data from Shunyi District, Beijing, show that this model outperforms Lasso regression, ridge regression, and XGBoost in both short-term and long-term predictions. It is noteworthy that integrating an attention mechanism into a model can improve prediction accuracy, but it requires calculating attention weights at each time step [25,38,40].
In addition, the integration of data preprocessing techniques has been shown to improve PM2.5 prediction accuracy. This is achieved by simplifying raw sequences and decomposing them into distinct subsequences, each bearing different feature information [37]. Notably, researchers have developed hybrid models by fusing neural networks with Empirical Mode Decomposition (EMD), its derivatives, and Variational Mode Decomposition (VMD); these models have delivered significant improvements in PM2.5 prediction accuracy [37,38,41,42]. To address the challenges of PM2.5 concentration prediction within 0–24 h, Teng et al. [43] introduced a hybrid model that incorporates EMD and Sample Entropy (SE) into a BiLSTM framework. Experimental results indicated that this model outperformed single deep learning models, with short-term (within 6 h) prediction accuracy improved by at least 50%. These studies indicate that employing hybrid data decomposition techniques, coupled with further refinement through secondary decomposition, can achieve more detailed feature extraction and thus enhance the model's predictive accuracy [44].
Overall, despite the significant progress made in PM2.5 prediction, numerous limitations remain unresolved in current research. For PM2.5 concentration time series characterized by seasonality and influenced by multiple complex factors, existing models fail to effectively address non-stationarity and lack refined decomposition of fluctuating components, often leading to mode mixing. Additionally, most models have a limited ability to focus on key information such as meteorological factors, face difficulties in multi-source data fusion, and some hybrid models suffer from structural redundancy. Although attention mechanisms have been introduced, they are often simply combined with existing models without fully exploiting their advantages. Techniques such as data decomposition and attention can provide models with higher-quality inputs and enable efficient fusion of multi-source data, collectively enhancing the capacity of a model to process sophisticated data structures; nevertheless, substantial improvements are still required to realize deep collaboration among these techniques and to enhance real-time prediction. Therefore, building a real-time PM2.5 concentration prediction model that can accurately track dynamic atmospheric changes while offering high adaptability, high interpretability, and high precision is not only a practical requirement for precise air pollution control and sustainable ecological management, but also a key scientific challenge that urgently needs to be addressed.
To address these challenges, this paper proposes a high-accuracy PM2.5 concentration prediction method, with the technical process shown in Figure 1. Using PM2.5 monitoring data from Guangzhou and Shenzhen from 2020 to 2022 as the research object, after completing data supplementation, the seasonal variation characteristics of PM2.5 concentration are first analyzed. The correlations of pollutant factors (CO and other co-occurring pollutants) and meteorological factors (precipitation, temperature, atmospheric pressure, etc.) with PM2.5 concentration are explored, thereby revealing the driving mechanisms of PM2.5 concentration changes. Subsequently, the preprocessed dataset is divided and normalized. Employing the "decomposition–prediction–reconstruction" modeling approach, the data are input into the OVMD–PeepholeLSTM–attention model (hereafter referred to as PeepholeLSTM-OA) to obtain the final PM2.5 concentration predictions. Finally, quantitative evaluation indicators such as MAE, RMSE, and R2 are calculated to comprehensively assess the predictive performance of the model.
3. Research Methods
In the current research on time series prediction of PM2.5 concentrations, the original monitoring data are highly non-stationary and significantly affected by noise, and it is difficult for a single deep learning model to accurately capture the key features and dynamic relationships of the sequence. To further improve the accuracy and robustness of prediction models and make them more applicable to practical regional air pollution control needs, this paper proposes a PM2.5 concentration prediction model, PeepholeLSTM-OA. This model integrates Optimal Variational Mode Decomposition (OVMD), the Peephole Long Short-Term Memory network (PeepholeLSTM), and an attention mechanism (AM). OVMD effectively mitigates the non-stationarity and noise interference in the original data, and the attention mechanism is introduced into the PeepholeLSTM model, enabling the model to dynamically focus on the parts of the input sequence most relevant to the current output, thereby enhancing model performance.
3.1. Peephole Long Short-Term Memory Network (Peephole LSTM)
Peephole Long Short-Term Memory (PeepholeLSTM) is a variant of Long Short-Term Memory networks (LSTMs) [49]. It extends and optimizes the traditional LSTM, thereby enhancing its capability to process complex data. The structure of the PeepholeLSTM unit is illustrated in Figure 5. The PM2.5 concentration time series exhibits obvious long-term dependencies and phased abrupt changes. In actual monitoring data, under a relatively stable background level, PM2.5 concentrations often show short-term spikes or rapid accumulation followed by slow decay, driven by changes in meteorological conditions (such as stagnant weather or temperature inversion) and sudden surges in anthropogenic emissions, and there is usually a long interval between such high-pollution events. This characteristic means that, when updating the gating state, a model needs to accurately perceive whether the current change is sufficient to break the existing pollution accumulation state. The standard LSTM gating mechanism relies only on the current input and the previous hidden state. In long-span sequences, the influence of the cell state on gating decisions is indirect, which can easily lead to insufficient perception of the pollution accumulation level; in particular, near concentration spikes or turning points, gate updates may lag. PeepholeLSTM introduces direct access to the previous cell state in the input gate, forget gate, and output gate, enabling gating decisions to explicitly sense the current PM2.5 accumulation level and evolution trend. The update formulas for the gating unit states are provided in Equations (2)–(4).
where f_t, i_t, and o_t denote the forget gate, input gate, and output gate at time step t, respectively; σ(·) is the Sigmoid activation function; W represents the weight matrix corresponding to each gate; c_{t−1} is the previous cell state; h_{t−1} is the previous hidden state; x_t is the current input; and b denotes the bias vector.
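As a concrete illustration of the gating logic above, the following NumPy sketch implements a single PeepholeLSTM step. It is a minimal illustration under stated assumptions, not the study's implementation: the weight layout (each gate reading a concatenation of cell state, hidden state, and input) and the function names are ours, and the output gate here peeks at the updated cell state, as in the classic peephole formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x_t, h_prev, c_prev, W, b):
    """One PeepholeLSTM step: the gates 'peek' at the cell state, so gating
    decisions directly sense the accumulated level (e.g., pollution buildup)."""
    zf = np.concatenate([c_prev, h_prev, x_t])   # forget/input gates see c_{t-1}
    f_t = sigmoid(W["f"] @ zf + b["f"])          # forget gate
    i_t = sigmoid(W["i"] @ zf + b["i"])          # input gate
    g_t = np.tanh(W["c"] @ np.concatenate([h_prev, x_t]) + b["c"])  # candidate
    c_t = f_t * c_prev + i_t * g_t               # cell-state update
    zo = np.concatenate([c_t, h_prev, x_t])      # output gate sees the NEW cell state
    o_t = sigmoid(W["o"] @ zo + b["o"])          # output gate
    h_t = o_t * np.tanh(c_t)                     # hidden state
    return h_t, c_t
```

With hidden size H and input size D, W["f"], W["i"], and W["o"] have shape (H, 2H + D) and W["c"] has shape (H, H + D); a full sequence is processed by iterating this step over time.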
3.2. Attention Mechanism
The core operation of the attention mechanism is computing a weight vector that quantifies the importance of each input sequence element to the current output, essentially implementing a weighted processing procedure. Its core idea is to enable the model to dynamically focus on the parts of the input sequence that are most relevant to the current output, by increasing the weight of important sequence elements and reducing the weight of unimportant ones, thereby improving the model's performance. Especially in time series tasks, the attention mechanism helps the model automatically select the historical data most relevant to the prediction, reducing reliance on all inputs. At the same time, it can alleviate the issues of uneven temporal contributions and redundant multivariate input information in the formation process of PM2.5, thereby enhancing the model's ability to capture key stages of pollution evolution. Combined with the cumulative memory characteristics of PeepholeLSTM, the attention mechanism further strengthens the model's capability to represent critical moments and key driving factors, which partly explains the improvement in the model's predictive performance. Standard attention mechanisms typically involve three vectors: the Query vector (Q), which represents the current task's focus information; the Key vector (K), which indexes input sequence elements for matching with Q; and the Value vector (V), which contains each element's actual information and is aggregated via weighted summation. The attention mechanism can generally be divided into the following steps:
- (1)
Calculate the attention scores. The attention score is derived from a similarity function between Q and K. Common methods for calculating similarity are shown in Table 1.
- (2)
Calculate attention weights. To obtain the final attention weights, the raw attention scores are normalized through the application of a softmax function. These weights represent the relative importance of each input element to the current output.
- (3)
Weighted Summation. Use the attention weights to perform a weighted sum of the Value vectors (V), resulting in the final attention value.
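The three steps above can be sketched compactly with scaled dot-product similarity (one of the common similarity choices; the scaling by the key dimension and the function names are our illustrative assumptions, not prescribed by the original text):

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis."""
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Steps (1)-(3): similarity scores -> softmax weights -> weighted sum of V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (1) attention scores (scaled dot product)
    weights = softmax(scores)         # (2) normalized attention weights
    return weights @ V, weights       # (3) weighted summation of the Values
```

When all keys are equally similar to the query, the weights are uniform and the output reduces to the mean of the Value vectors, which is a useful sanity check.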
3.3. Optimal Variational Mode Decomposition (OVMD)
The Variational Mode Decomposition (VMD) algorithm is an adaptive signal processing and modal decomposition method that can decompose a time series into
K intrinsic mode functions (IMFs) with different central frequencies and bandwidths, thereby achieving effective separation of the original signal [
50]. The expression of an amplitude–frequency modulated signal is given as follows:
u_k(t) = A_k(t) cos(φ_k(t)), k = 1, 2, …, K,
where K denotes the number of decomposition modes, u_k(t) represents the modal component, A_k(t) is its amplitude, and φ_k(t) denotes the instantaneous phase of the mode component u_k(t).
Through continuous iterative updates of modal components and central frequencies, it decomposes the original data into K intrinsic mode functions (IMFs) with distinct frequency characteristics. The efficacy of VMD and the subsequent predictive accuracy of the model are largely influenced by the choice of K. An insufficient number of decomposed modes will lead to insufficient decomposition and ineffective identification of data patterns. Conversely, an excessive modal decomposition can result in mode mixing between adjacent components. Therefore, in this study, the Optimal Variational Mode Decomposition (OVMD) algorithm was employed to decompose the monitoring data.
Unlike VMD, which requires empirical setting of hyperparameters, the OVMD algorithm determines the optimal value of K by analyzing the distribution of the center frequencies of the IMF components. In this way, the decomposition avoids both insufficient separation and mode mixing. The update step size τ is a core hyperparameter used in the OVMD decomposition process to control the convergence speed of signal decomposition iterations and to balance decomposition accuracy against computational efficiency. Its value directly affects the Residual Evaluation Index (REI), which in turn determines the rationality of the OVMD decomposition results: if τ is too large, it can lead to insufficient decomposition and larger residuals, while if it is too small, it increases the computational load and may cause over-decomposition. After the optimal K is obtained, the OVMD method reconstructs the IMF components generated under different step sizes and calculates the corresponding REI. The step size that yields the minimum REI value is selected as the optimal parameter, with τ searched over the range [0, 1].
In this study, during the sample construction phase for PM2.5 concentration prediction, the original dataset was first divided into a training set and a test set according to time order. OVMD decomposition was performed only on the training set to determine the optimal number of modes K, the update step τ, and the other core parameters. Subsequently, the decomposition parameters determined from the training set were directly applied to the independent test set for decomposition and subsequent prediction. A sliding window method was used throughout both the training and test sets to partition input and output sequences. The length of the sliding window was determined based on data characteristics and model input requirements, and the window moved continuously during iterations, which effectively reduced the impact of boundary effects on the error of real-time prediction results. The main decomposition procedure of OVMD is as follows:
- (1)
The optimal number of decomposition modes (K) is determined by iteratively computing and analyzing the distribution of modal center frequencies across candidate K values.
- (2)
The modal component sequences are reconstructed, and the Residual Evaluation Index (REI) between the reconstructed sequence and the original sequence is calculated to determine the optimal step size. The calculation of REI is given in Equation (8).
where U denotes the number of decomposed modes, f represents the original signal, and N is the total number of signal samples.
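The τ selection procedure can be sketched as follows. This is a sketch under stated assumptions: `vmd_decompose` is a hypothetical placeholder for any VMD implementation returning a (K, N) array of modes, and the REI is implemented here as the mean absolute residual between the reconstruction (sum of IMFs) and the original signal, which is one plausible reading of Equation (8).

```python
import numpy as np

def rei(imfs, f):
    """Residual Evaluation Index: mean absolute residual between the
    reconstructed signal (sum of IMFs) and the original signal f
    (one plausible reading of Equation (8))."""
    return np.mean(np.abs(imfs.sum(axis=0) - f))

def select_tau(f, K, vmd_decompose, taus=None):
    """Search tau over [0, 1] in steps of 0.01 and keep the value that
    minimizes the REI. `vmd_decompose(f, K, tau) -> (K, N) array` is a
    placeholder for a concrete VMD routine."""
    taus = np.arange(0.0, 1.01, 0.01) if taus is None else taus
    scores = [rei(vmd_decompose(f, K, tau), f) for tau in taus]
    return taus[int(np.argmin(scores))]
```

In practice the same grid search is run once on the training set, and the selected (K, τ) pair is then reused to decompose the test set.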
3.4. Model Construction
The PeepholeLSTM-OA model adopts a “decomposition–prediction–reconstruction” framework, as illustrated in
Figure 6.
In the decomposition stage, OVMD is employed to decompose the monitoring data, transforming the complex time series into multiple relatively stationary subsequences (IMFs). This adaptive decomposition alleviates non-stationarity and noise in the raw data, thereby providing high-quality inputs for subsequent prediction. In the prediction stage, PeepholeLSTM is used as the base learner. Compared with the traditional LSTM, PeepholeLSTM introduces direct connections from the cell state to the input, forget, and output gates, which enables the network to capture long-term dependencies more accurately within the gating mechanism and enhances its ability to model complex time series. Furthermore, an attention mechanism is incorporated into the PeepholeLSTM framework. The attention mechanism assigns weights to the contributions of different IMF components, allowing the model to focus on critical information while preserving more detailed data features, thus providing a solid foundation for performance improvement. In the reconstruction stage, the prediction results of all IMF components are aggregated through weighted summation to generate the complete PM2.5 concentration sequence. Through this "decomposition–prediction–reconstruction" process, the PeepholeLSTM-OA model achieves comprehensive extraction and utilization of multi-scale features, effectively addressing the non-linearity, non-stationarity, and multiple spatiotemporal characteristics of PM2.5 concentration series. Consequently, the model improves forecasting performance and evaluation metrics, offering a more reliable technical framework for air quality prediction and early warning. The specific prediction procedure is detailed as follows:
- (1)
PM2.5 monitoring data were processed for missing value imputation, feature selection, and normalization. These data, together with pollutant and meteorological factors, were used as model inputs (x), and the dataset was split into a training set and a test set in an 8:2 ratio.
- (2)
The training set time series were decomposed using OVMD. The optimal number of modes K was determined by iteratively evaluating the center frequencies of the IMF components, and the update step size τ was set by minimizing the REI. After obtaining the OVMD decomposition parameters, the test set was decomposed into K intrinsic mode functions (IMF 1, IMF 2, …, IMF K) with different frequency characteristics. Each component captured different features, such as trends, periodic variations, and high-frequency fluctuations in the original data, thus achieving data decomposition.
- (3)
Each IMF subsequence was predicted separately by a PeepholeLSTM network, which leveraged the preceding 12 h of data to forecast the hourly PM2.5 concentration at the subsequent time step, yielding prediction results for each IMF component. In this framework, the PeepholeLSTM captured trend and seasonal patterns while learning nonlinear temporal dependencies and long-term correlations.
- (4)
An attention mechanism was applied to assign weights to each IMF prediction according to its importance, generating a weighted output.
- (5)
The PeepholeLSTM-OA model predictions were fused and reconstructed on the test set. Model performance was evaluated using the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2).
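The sample construction and reconstruction logic of the procedure above can be sketched as follows. This is an illustrative sketch: `predictors` stands in for the fitted per-IMF PeepholeLSTM–attention learners (any object with a `.predict` method), and plain summation is used for the reconstruction step, whereas the full model additionally applies attention-derived weights.

```python
import numpy as np

def sliding_windows(series, lookback=12):
    """Sample construction: the previous 12 h form the input window and
    the next hour is the prediction target."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

def predict_and_reconstruct(imfs, predictors, lookback=12):
    """Predict each IMF with its own model and sum the component forecasts
    to reconstruct the PM2.5 series."""
    parts = []
    for imf, model in zip(imfs, predictors):
        X, _ = sliding_windows(imf, lookback)
        parts.append(np.asarray(model.predict(X)))
    return np.sum(parts, axis=0)   # reconstruction by (weighted) summation
```

Because every window shares the same lookback, the per-IMF forecasts are aligned in time and can be summed elementwise.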
3.5. Error Evaluation Indices
To evaluate model performance, the coefficient of determination (R2) was used to measure regression fitting accuracy. This metric ranges from 0 to 1, with values close to 1 signifying strong alignment between observed and predicted data. The MAE and RMSE were also computed to quantify the PeepholeLSTM-OA model's predictive performance and generalization capability for PM2.5 concentrations. The formulas for R2, MAE, and RMSE are shown in Equations (9)–(11), respectively, as follows:
where n denotes the number of data samples, ŷ_i represents the predicted value, y_i denotes the observed value, and ȳ is the mean of the observed values.
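The three evaluation indices are straightforward to compute; a minimal NumPy sketch of Equations (9)–(11):

```python
import numpy as np

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(y) - np.asarray(yhat)))

def rmse(y, yhat):
    """Root mean square error."""
    return np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2))

def r2(y, yhat):
    """Coefficient of determination: 1 minus the ratio of the residual
    sum of squares to the total sum of squares."""
    y, yhat = np.asarray(y), np.asarray(yhat)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Lower MAE and RMSE and an R2 closer to 1 indicate better agreement between predicted and observed concentrations.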
4. Analysis of Prediction Results
For neural network and machine learning models, parameter selection significantly affects training cost and predictive performance. Proper parameter settings can enhance model training efficiency, improve prediction accuracy, and reduce the risk of overfitting. Among network parameters, the number of neurons is critical: increasing the number of neurons can improve the model's fitting ability for complex time series, but excessive neurons may lead to overfitting and higher computational cost. Epoch denotes the number of times the entire training set is processed; an appropriate value balances generalization and computational efficiency. Batch size refers to the number of samples used to update the model in each iteration, and powers of two (e.g., 32, 64, 128) are commonly chosen to optimize computational performance. In this study, the neural network models (LSTM, GRU, and PeepholeLSTM) were configured with a single hidden layer containing 50 neurons and a fully connected output layer, and were trained with the Adam optimizer at a learning rate of 0.001. To reduce overfitting, a dropout rate of 0.2 and an L2 weight decay coefficient of 1 × 10−4 were applied. The ReLU activation function and the mean squared error (MSE) loss function were used, with 100 training epochs and a batch size of 64. In addition, an early stopping strategy was adopted: when the validation loss does not decrease for 10 consecutive epochs, training is terminated automatically to avoid ineffective training and overfitting.
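The early stopping rule just described reduces to a small piece of bookkeeping; the sketch below illustrates only the stopping logic (the function name and the 1-based epoch convention are ours, not taken from the original implementation):

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the 1-based epoch after which training stops: the first epoch
    at which the validation loss has not improved for `patience` consecutive
    epochs. Returns len(val_losses) if the criterion never triggers."""
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best = loss, 0   # new best: reset the patience counter
        else:
            since_best += 1              # no improvement this epoch
            if since_best >= patience:
                return epoch             # patience exhausted: stop training
    return len(val_losses)
```

In frameworks such as Keras this corresponds to an `EarlyStopping` callback monitoring the validation loss with `patience=10`.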
For the support vector regression (SVR) model, the kernel type is the most critical parameter, as it maps input data into a high-dimensional feature space to handle nonlinear relationships. The kernel bandwidth (gamma) controls the influence range of individual samples: a large gamma may cause overfitting, whereas a small gamma produces a smoother kernel, making the model more sensitive to overall trends. The penalty parameter (C) regulates the tolerance for training errors; a large C can lead to overfitting, while a small C may result in underfitting. In this study, the SVR model was configured with a radial basis function (RBF) kernel, C set to 100, and gamma set to 0.1. The model parameters are shown in
Table 2.
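For reference, the stated SVR configuration maps directly onto scikit-learn; the sketch below uses toy random data purely for illustration (the lagged-window shape is an assumption, not the study's dataset):

```python
import numpy as np
from sklearn.svm import SVR

# SVR configured as in this study: RBF kernel, penalty C = 100, kernel bandwidth gamma = 0.1
svr = SVR(kernel="rbf", C=100, gamma=0.1)

# Toy usage: fit on random lagged windows (samples x 12 lags) and predict
X = np.random.default_rng(0).random((50, 12))
y = X[:, -1]               # illustrative target: the most recent lag
svr.fit(X, y)
pred = svr.predict(X)
```

A larger C forces the fit closer to the training points (risking overfitting), while a smaller gamma widens each sample's influence and smooths the regression surface.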
Data from the Guangzhou and Shenzhen stations were separately input into the SVR, LSTM, GRU, PeepholeLSTM, and PeepholeLSTM–attention models for training and PM2.5 concentration prediction. The first 80% of the data were used for model training to capture temporal patterns and trends, while the remaining 20% were reserved as the test set to evaluate model performance. A time step of 12 was applied, using the previous 12 h to predict the PM2.5 concentration in the 13th hour. The fitting between the predicted and observed values for both stations is shown in Figure 7. Due to the large number of hourly data points, the comparison plot is not visually clear; therefore, 24 h averaged PM2.5 concentrations were calculated to generate a daily comparison plot. To further illustrate differences among models, enlarged plots of six representative weeks are additionally provided.
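The 24 h averaging used for the daily comparison plots can be expressed compactly (a sketch; the function name is ours):

```python
import numpy as np

def daily_means(hourly, hours_per_day=24):
    """Collapse an hourly PM2.5 series into daily averages for plotting
    (drops a trailing partial day, if any)."""
    hourly = np.asarray(hourly, dtype=float)
    n_days = len(hourly) // hours_per_day
    trimmed = hourly[: n_days * hours_per_day]
    return trimmed.reshape(n_days, hours_per_day).mean(axis=1)
```

Each output point is the mean of one full day of hourly values, which smooths the curves enough to compare the models visually.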
Figure 7 visualizes the predictive performance of the different models by plotting the actual against the predicted PM2.5 concentrations on the test sets. The black curve represents observed values, while the other colors indicate predictions from the SVR, LSTM, GRU, PeepholeLSTM, and PeepholeLSTM–attention models. Overall, predictions from all five models approximately match the observed values for both the Guangzhou and Shenzhen stations, capturing the general trends of PM2.5 variations. However, the SVR model exhibits noticeably poorer performance than the deep learning models. Although SVR captures the overall trend, its predictions at peak PM2.5 values often deviate substantially from the observations and may even show trends opposite to the actual data. The LSTM, GRU, and PeepholeLSTM models demonstrate similar performance, effectively reflecting the true trends and outperforming SVR. Nevertheless, these models still show some discrepancies at peak PM2.5 values. Introducing the attention mechanism into the PeepholeLSTM model (PeepholeLSTM–attention) leads to a further improvement in predictive performance.
Although the combined PeepholeLSTM–attention model improved predictive performance, peak values were still not well captured. Therefore, OVMD was applied to decompose the time series, and the resulting IMF components were predicted using the PeepholeLSTM–attention model. The final prediction of PM2.5 concentrations was obtained by reconstructing the predicted results of the IMFs.
OVMD was first applied to the Guangzhou station data. As shown in Figure 8, when the number of modes K reached 7, the center frequencies of the IMFs for both the Guangzhou and Shenzhen stations became stable, and further increases in K did not lead to significant changes. Thus, K = 7 was selected as the optimal decomposition number.
The optimal update step τ was determined by minimizing the REI value. τ was made to traverse the interval [0, 1] with a search step of 0.01, and the Residual Evaluation Index (REI) was calculated for each candidate value. By comparing the REI values across the entire interval, the τ that minimizes the REI was selected as the optimal update step for OVMD decomposition. This ensures that the decomposition process is efficient and the decomposition results are accurate, providing high-quality feature inputs for subsequent model training. As shown in Figure 9, the REI reached its minimum at τ = 0.85 for Guangzhou and τ = 0.88 for Shenzhen.
Accordingly, for both stations, OVMD was set to K = 7, with τ = 0.85 for Guangzhou and τ = 0.88 for Shenzhen. The final decomposition results are shown in Figure 10. As can be seen from the figure, at both the Guangzhou and Shenzhen stations, the waveform fluctuations of the IMF1 component are the densest and markedly stronger than those of the subsequent components, a typical high-frequency characteristic. This component has the fastest amplitude changes and the highest frequency, reflecting the short-term, rapid random fluctuations and high-frequency variations caused by sudden pollution events in the PM2.5 data. The waveform fluctuations of the IMF7 component are minimal, with overall changes being gentle, showing the long-term variation trend of PM2.5 concentration, a typical low-frequency trend characteristic. The fluctuation frequencies of the other components lie between the high-frequency and trend components, containing certain short-term fluctuations as well as some trend changes, making them transitional mid-frequency components.
After OVMD decomposition, seven IMF components were obtained for each station, and each was predicted individually by feeding it sequentially into the PeepholeLSTM–attention model with a time step of 12. The PM2.5 concentration predictions for the two sites improved greatly, especially in capturing the actual trends and peak positions more accurately. The prediction results for each IMF component are shown in Figure 11. The results indicate that the model has fully grasped the characteristic patterns of each IMF component (including the long-term stable trend of the low-frequency components and the short-term fluctuation features of the high-frequency components) and can accurately capture the variation details of each component. This enables precise prediction of the actual trends and peak positions of PM2.5 concentrations, and also validates the rationality and effectiveness of combining OVMD decomposition with the PeepholeLSTM–attention model.
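The per-component setup, a sliding window of 12 past values used to predict the next value of each IMF, can be sketched with plain NumPy windowing (the PeepholeLSTM–attention model itself is omitted here):

```python
import numpy as np

def make_windows(series, step=12):
    # Build (input window, next value) training pairs for one IMF series,
    # matching the time step of 12 used for the per-IMF predictors.
    X = np.stack([series[i:i + step] for i in range(len(series) - step)])
    y = series[step:]
    return X, y

imf = np.arange(20.0)              # stand-in for one decomposed IMF series
X, y = make_windows(imf, step=12)
print(X.shape, y.shape)            # (8, 12) (8,)
print(y[0])                        # 12.0: the value right after the first window
```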
To further reveal how the attention mechanism functions in predicting each IMF component, and to clarify the model's focus on features at different time steps of the input sequence, this study also visualizes the attention weight heatmaps produced during the prediction of each modal component. Since the high-frequency IMFs directly reflect the core fluctuation patterns of pollution events, accurately capturing their changes is key to improving prediction accuracy. Therefore, this paper presents only the attention weight heatmap for the high-frequency IMF1 component, which exhibits strong predictive performance (Figure 12). The analysis shows that the model exhibits markedly differentiated attention patterns across the input time-step features of the different modal components; notably, the IMF components carry the highest weights compared with the other 11 features, and this attention pattern aligns closely with the physical significance of each IMF component. For pollutants like PM2.5, which have strong temporal inertia, the model inherently assigns higher attention weights to time steps closer to the prediction moment, while the weights for more distant time steps decay rapidly. Combined with the historical PM2.5 data, when sudden changes in PM2.5 concentration occur (such as unexpected heavy pollution episodes or abrupt drops in concentration), the attention weights exhibit noticeable 'abnormal peak shifts'.
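The recency pattern described above, higher weights near the prediction moment, can be illustrated with a softmax over per-step attention scores; the scores below are synthetic, not values taken from Figure 12:

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

# Synthetic scores that rise toward the prediction moment, mimicking the
# temporal-inertia pattern the heatmap shows for PM2.5.
scores = np.linspace(0.0, 2.0, 12)      # one score per input time step
weights = softmax(scores)
print(int(weights.argmax()))            # 11: the most recent step dominates
print(round(float(weights.sum()), 6))   # 1.0: the weights form a distribution
```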
The reconstructed PM2.5 concentration predictions obtained by combining all IMF components are presented in Figure 13. The results demonstrate superior overall predictive performance: the PeepholeLSTM-OA model closely approximates the observed data at both stations, accurately fitting abrupt changes as well as peak and trough values, thereby achieving accurate prediction of PM2.5 concentration variations.
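Reconstruction is simply the pointwise sum of the predicted IMF series; a minimal sketch with made-up numbers:

```python
import numpy as np

# Rows = predicted IMF components (three shown for brevity; the paper uses
# seven), columns = forecast time steps; the PM2.5 forecast is the
# column-wise sum of the component predictions.
pred_imfs = np.array([[1.0, 2.0, 3.0],
                      [0.5, 0.5, 0.5],
                      [0.1, -0.1, 0.0]])
forecast = pred_imfs.sum(axis=0)
print(forecast)  # [1.6 2.4 3.5]
```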
To comprehensively assess model performance in PM2.5 concentration prediction, we not only visualized the prediction outputs to gauge their ability to track concentration fluctuations but also employed evaluation metrics to quantify the true effectiveness of these predictions. These metrics more directly reflect model stability, sensitivity, and fitting effectiveness. The MAE, RMSE, and R² for each model are presented in Table 3. The results consistently indicate that the deep learning approaches (LSTM, GRU, and PeepholeLSTM) outperform SVR for the Shenzhen case, consistent with the results at Guangzhou, showing that deep learning approaches are better suited to long time-series prediction tasks than traditional machine learning methods. Among the individual models, GRU achieved results comparable to LSTM, while PeepholeLSTM performed best, a pattern observed consistently at both the Guangzhou and Shenzhen sites. When the attention mechanism was integrated into the PeepholeLSTM, the combined model (PeepholeLSTM–attention) improved further on all three metrics, though the gains were marginal. Finally, with the introduction of the OVMD decomposition algorithm, the PeepholeLSTM-OA model achieved significant improvements over the single PeepholeLSTM: at the Guangzhou site, the MAE decreased by about 39%, the RMSE by about 45%, and R² increased by 0.0457; at the Shenzhen site, the MAE decreased by about 45%, the RMSE by about 51%, and R² increased by 0.0765. These results indicate that the model's stability, sensitivity to large errors, and fitting capability were substantially enhanced.
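The three metrics follow their standard definitions and can be computed as below; the observed and predicted values are illustrative, not the paper's data:

```python
import numpy as np

def mae(y, p):
    return np.mean(np.abs(y - p))

def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))

def r2(y, p):
    # Coefficient of determination: 1 minus residual over total sum of squares.
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([30.0, 35.0, 40.0, 50.0])   # illustrative observed PM2.5 (μg/m³)
p = np.array([29.0, 36.0, 39.0, 52.0])   # illustrative predictions
print(mae(y, p))                          # 1.25
print(round(float(rmse(y, p)), 4))        # 1.3229
print(round(float(r2(y, p)), 4))          # 0.968
```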
To gain a more intuitive understanding of the prediction error distribution of the PeepholeLSTM-OA model, this study plotted the raw error distribution (Figure 14). According to the statistics, the error range for the Guangzhou station is [−11.56, 7.60] μg/m³, with a standard deviation of 1.675 μg/m³ and a mean of −0.791 μg/m³; for the Shenzhen station, the error range is [−14.41, 12.66] μg/m³, with a standard deviation of 1.413 μg/m³ and a mean of −0.105 μg/m³. From this, it can be concluded that the model delivers excellent PM2.5 concentration prediction performance.
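The reported error statistics (range, standard deviation, mean bias) reduce to the following NumPy calls; the error values here are made up, not the Figure 14 data:

```python
import numpy as np

# Illustrative prediction errors (predicted - observed) in μg/m³.
errors = np.array([-1.2, 0.4, -0.8, 2.1, -1.0])
print(errors.min(), errors.max())      # the error range endpoints
print(round(float(errors.std()), 3))   # population std (np.std defaults to ddof=0)
print(round(float(errors.mean()), 3))  # mean bias; negative = underprediction on average
```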
In summary, the OVMD–PeepholeLSTM–attention model proposed in this paper performs excellently in predicting PM2.5 concentrations at both sites. All evaluation metrics surpass those of the comparison models, with particularly significant advantages in capturing sudden concentration changes, peak positions, and long-term trends, thereby validating the rationality and effectiveness of the model's design.
To further verify that these strong performance metrics are genuine, assess the model's generalization ability, and rule out the risk of overfitting, this study conducted K-fold cross-validation experiments. A 5-fold strategy was used: the original training set was randomly divided into five equally sized subsets, and in each round four subsets served as the training set and one as the validation set. The training and validation process was repeated five times, and the evaluation metrics across the five validation runs were used to assess the model's stability and to rule out random errors arising from a single training session. The experimental results are shown in Table 4. The 5-fold cross-validation results show that the model's average R² across the five validations exceeds 0.94, and the average MAE and RMSE differ little from the single-training results, with no significant fluctuations, indicating that the model is stable and that random errors from a single training session have been ruled out.
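The 5-fold split can be sketched without any ML library; the random seed and fold handling below are assumptions for illustration:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    # Randomly permute the sample indices, then split them into k near-equal
    # folds, mirroring the paper's 5-fold cross-validation setup.
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

folds = kfold_indices(100, k=5)
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # fit the model on train_idx, evaluate MAE/RMSE/R² on val_idx here
print([len(f) for f in folds])  # [20, 20, 20, 20, 20]
```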
The analysis of the above experiments indicates that the prediction results of the PeepholeLSTM-OA model are highly reliable. It performs excellently in terms of stability, sensitivity to anomalous data, and the ability to uncover data patterns, making it practically valuable for predicting changes in PM2.5 concentrations.