1. Introduction
In recent years, with the rapid advancement of China’s industrialisation and the continuous expansion of its population, air pollution has become an increasingly severe issue, exerting significant negative impacts on public health and socio-economic development [1,2]. Accurate air quality forecasting facilitates effective prevention, control, and mitigation of atmospheric pollution [3]. While individual pollutant concentrations can be used to assess air quality, the public and policymakers struggle to interpret these data to determine current air quality conditions. The Air Quality Index (AQI), however, condenses the concentrations of six pollutants (PM2.5, PM10, CO, SO2, O3, and NO2) into a single numerical value, providing a more intuitive reflection of overall air quality.
The atmospheric system is inherently dynamic, non-linear, and non-stationary, which significantly increases the complexity of AQI forecasting. Consequently, developing an appropriate forecasting framework is crucial [4]. Existing AQI forecasting techniques are primarily categorised into four types [5]: physical methods, statistical methods, artificial intelligence methods, and hybrid models. Physical models, based on physicochemical principles, are commonly used to simulate the dispersion and transport of atmospheric pollutants. However, their primary drawbacks are high computational complexity, substantial time requirements, and limited predictive accuracy [6]. Traditional statistical methods, such as the Autoregressive Integrated Moving Average (ARIMA) model [7] and the Grey Model [8], can infer short-term trends from historical sequences but struggle to capture the pronounced non-linear and non-stationary characteristics of AQI data [9]. Artificial intelligence methods, particularly deep learning approaches such as convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), gated recurrent units (GRU) [10], and the more recent Transformer model, have been widely applied in air quality forecasting owing to their robust non-linear fitting capabilities and ability to learn temporal features [11]. For instance, Ansari A et al. [12] conducted a study in Azamgarh, India, using 8760 hourly air quality measurements (PM2.5, PM10, NO2, SO2) and meteorological data from July 2022 to June 2023, employing multiple statistical methods to demonstrate significant pollutant variations across different timescales. By comparing six deep learning models, including Transformers and LSTMs, they found that feedforward neural networks (FNNs) achieved the best hourly AQI prediction performance (MAE 2.89, RMSE 4.99, R² = 0.9971), with PM2.5, NO2, and SO2 exerting the strongest influence on the predictions. However, single-model approaches remain significantly constrained in handling the multi-scale characteristics and complex spatiotemporal dependencies inherent in air quality time series [13].
Presently, to overcome the limitations of single models in terms of accuracy and adaptability, deep learning-based hybrid models have emerged as the mainstream approach for AQI forecasting [14]. Nguyen A T et al. [15] proposed a hybrid deep learning model that integrates Attention Convolutional Neural Networks, ARIMA, LSTM enhanced by Quantum Particle Swarm Optimisation, and XGBoost. Utilising Seoul air quality data from 2021–2022, they achieved AQI prediction through a two-stage process (ARIMA fitting for linear components + hybrid model processing for non-linear components). Results demonstrated significant improvements over traditional models in terms of metrics such as MSE, MAE, and R², with superior performance at both city and site levels. Qian S et al. [16] proposed a hybrid deep learning model integrating XGBoost feature selection, Gaussian data augmentation, an improved manta ray foraging optimisation algorithm, and TCN-GRU. By synergistically enhancing spatio-temporal feature extraction and model robustness across multiple components, it significantly reduced prediction errors compared to baseline models in AQI forecasting for four cities, providing reliable support for air quality early warning systems.
With technological advancements, numerous scholars have incorporated signal decomposition techniques such as Ensemble Empirical Mode Decomposition (EEMD) into AQI forecasting. By decomposing AQI data into frequency-based components, these methods better capture multi-scale characteristics and enhance prediction accuracy [17]. Wang K et al. [18] developed an integrated model (IAMSSA-VMD-SSA-LSTM) for AQI forecasting. This combines an improved adaptive variant sparrow search algorithm (IAMSSA) to optimise variational mode decomposition (VMD) parameters with a sparrow search algorithm (SSA) to optimise LSTM. After decomposing the non-linear, non-stationary AQI sequence into multiple intrinsic mode functions (IMFs) and residuals (RES), they modelled these components separately. Across data from Chengdu, Guangzhou, and Shenyang, the model achieved MAE, RMSE, MAPE, and R² values of 3.692, 4.909, 6.241, and 0.981, respectively, outperforming comparison models such as LSTM and SSA-LSTM. Qian S et al. [19] proposed the STL–Metis–MHBA–TC hybrid model for AQI forecasting. This approach processes raw sequences via seasonal-trend decomposition (STL), employs the Metis algorithm to optimise hyperparameters of TimesNet and Crossformer for component prediction, and then fuses results using an enhanced honey badger algorithm (MHBA). This significantly enhances forecasting accuracy compared to Transformer-based models. Owing to AQI’s complex characteristics, including strong non-stationarity, some single-stage decomposition methods fail to effectively address high-frequency disturbances and often retain noise interference. Others may yield suboptimal decomposition outcomes due to heavy reliance on parameters [20]. Consequently, researchers have explored two-stage decomposition strategies that combine the strengths of both approaches: initial decomposition followed by refined processing of complex components to enhance feature extraction quality [21]. Tang C et al. [22] utilised AQI data from the Beijing–Tianjin–Hebei region to propose two foundational AQI prediction models: one based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) with GRU, and another hybrid model incorporating VMD and Sample Entropy (SE) optimisation. The improved model achieved a maximum R² of 0.984 and a mean absolute error (MAE) of 2.476 across multiple regions, validating its reliability and applicability.
Consequently, this paper proposes a hybrid prediction model integrating CEEMDAN, SE, VMD, and Transformer–BiLSTM. The Crested Porcupine Optimizer (CPO) algorithm is employed to optimise the VMD parameters, enhancing decomposition accuracy. The model’s performance is validated using AQI data from Beijing. Experimental results demonstrate that the CEEMDAN–SE–CPO–VMD–Transformer–BiLSTM framework significantly outperforms traditional models in both prediction accuracy and efficiency. To further enhance prediction precision, a least squares support vector machine (LSSVM) is introduced for error correction, and the corrected forecasts provide a useful reference for environmental management.
The innovations and primary contributions of this study are as follows:
- (1) Proposal of a novel hybrid prediction framework: We designed the CEEMDAN–SE–CPO–VMD–Transformer–BiLSTM hybrid prediction model. CEEMDAN decomposition enhances data utilisation, SE optimises signal reconstruction, the CPO algorithm adaptively optimises VMD parameters to enhance decomposition accuracy, Transformer–BiLSTM collaboratively captures long- and short-term temporal features, and LSSVM error correction further improves prediction accuracy. Empirical analysis demonstrates the framework’s superior performance in AQI forecasting, significantly outperforming traditional and single-decomposition models.
- (2) Validation of the hybrid framework’s superiority and robustness: Cross-season comparison experiments using Beijing AQI data from spring, summer, autumn, and winter 2023 confirmed the proposed model’s superiority across metrics including MAE, RMSE, MAPE, and R², alongside its universality and stability across seasonal scenarios. LSSVM error correction substantially optimised prediction outcomes.
- (3) Optimisation of VMD parameter selection: Addressing the subjective nature of selecting VMD parameters (mode number K and penalty factor α), we introduce CPO for adaptive optimisation. This enhances the accuracy of the secondary decomposition, providing high-quality inputs for Transformer–BiLSTM forecasting and LSSVM error correction, and thus ensuring model robustness and prediction reliability.
2. Materials and Methods
2.1. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [23] is an enhanced signal decomposition method derived from Empirical Mode Decomposition (EMD). By repeatedly introducing adaptive white noise during decomposition, it significantly alleviates the mode mixing inherent in EMD. Its decomposition is also complete, enabling precise reconstruction of the original signal. The specific steps of this method are as follows:
First, K realisations of zero-mean white noise are added to the AQI signal $x(t)$:
$$x^{i}(t) = x(t) + \varepsilon_0 w^{i}(t), \quad i = 1, 2, \ldots, K$$
where $\varepsilon_0$ represents a Gaussian white noise weighting factor and $w^{i}(t)$ denotes the $i$-th generated white noise component.
Subsequently, EMD is performed on each $x^{i}(t)$, and the average of the first decomposed modal components is taken as the first modal component obtained from CEEMDAN decomposition:
$$IMF_1(t) = \frac{1}{K}\sum_{i=1}^{K} E_1\left(x^{i}(t)\right), \quad r_1(t) = x(t) - IMF_1(t)$$
where $IMF_1(t)$ represents the first modal component of this decomposition, $r_1(t)$ denotes the residual, and $E_j(\cdot)$ denotes the $j$-th mode extracted by EMD.
After adding specific adaptive noise to the residual obtained at the $k$-th stage of the decomposition, the EMD continues:
$$IMF_{k+1}(t) = \frac{1}{K}\sum_{i=1}^{K} E_1\left(r_k(t) + \varepsilon_k E_k\left(w^{i}(t)\right)\right), \quad r_{k+1}(t) = r_k(t) - IMF_{k+1}(t)$$
where $IMF_{k+1}(t)$ denotes the $(k+1)$-th modal component of this decomposition.
Finally, if the residual signal obtained from the $k$-th decomposition is a monotonic signal, the iteration stops and the CEEMDAN algorithm concludes.
2.2. Sample Entropy
Decomposition generates numerous components whose practical significance is difficult to determine. To assess their complexity, we use Sample Entropy (SE), proposed by Richman and Moorman [24]. SE is a non-linear dynamic method for assessing data complexity: the simpler the time series, the smaller the SE value; the more complex the series, the larger the SE value. SE computation consists of three steps: first, determining the subsequence length and similarity threshold; second, constructing the subsequences; and finally, calculating similarity and probability. In our experiments, we employed the SE algorithm to compute the entropy of each intrinsic mode function (IMF). Based on these entropy values, we assessed the randomness of each IMF and used them as the basis for merging and reconstructing components, generating three types of components: low-frequency, high-frequency, and trend components. To further optimise component classification, we employed the K-means clustering algorithm to categorise components into high-frequency, medium-frequency, and low-frequency groups. This approach reduces the number of components and improves computational efficiency.
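The three SE steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `sample_entropy`, the template length `m = 2`, and the threshold `r = 0.2 × std` are common defaults assumed here, not values reported in this paper.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) of a 1-D series.

    m is the subsequence (template) length and r = r_factor * std(x) is the
    similarity threshold. Both defaults are assumptions for illustration.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Step 2: build n - m overlapping subsequences of the given length.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Step 3: Chebyshev distance to every later template
            # (self-matches are excluded by construction).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)       # similar pairs of length m
    a = count_matches(m + 1)   # similar pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # undefined when no matches are found
    return float(-np.log(a / b))
```

Applied to each IMF, a regular component (for example a slow oscillation) should return a small value and a noise-like component a large one, which is the ordering the clustering step relies on.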
2.3. Crested Porcupine Optimizer
The Crested Porcupine Optimizer (CPO) [25] is a novel bio-inspired meta-heuristic optimisation algorithm. Its core lies in constructing an optimisation framework by simulating four typical defence behaviours of crested porcupines: visual defence, acoustic defence, olfactory defence, and physical attack. These behaviours are hierarchically designed to balance exploration and exploitation: the first two strategies (visual and acoustic) facilitate global search, while the latter two (olfactory and physical) enhance local refinement. This design endows CPO with three critical advantages:
Strong global exploration: Effectively avoids local optima by mimicking the porcupine’s threat assessment mechanisms;
Rapid convergence: Achieves fast optimisation through cyclic population reduction, which dynamically adjusts search intensity;
Low computational cost: Reduces traversal time in complex parameter spaces via adaptive defence strategy switching.
The detailed mathematical formulations of CPO, including its position update rules and cyclic population reduction technique, are given in Reference [26]. This algorithm has been validated in various engineering applications for its robustness and efficiency in solving high-dimensional optimisation problems. The specific workflow is illustrated in Figure 1.
2.4. Variational Mode Decomposition
Variational Mode Decomposition (VMD) is a method for decomposing complex signals [27]. Unlike EMD and its variants, VMD has a rigorous mathematical derivation and strong noise robustness, which ensures decomposition accuracy and stability. Consequently, VMD effectively mitigates the mode mixing issue inherent in EMD [28]. It adaptively decomposes non-stationary and highly complex signals into several relatively stable sub-modes. The specific steps are detailed in Reference [29].
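For orientation, the constrained variational problem that VMD solves is usually written as below. This is the standard textbook formulation, not a derivation taken from this paper; the symbols $f(t)$ (input signal), $u_k(t)$ (the $k$-th mode), $\omega_k$ (its centre frequency), and $\lambda(t)$ (Lagrange multiplier) are introduced here for illustration, and $\alpha$ is the conventional symbol for the penalty factor.

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

$$\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

The two quantities that must be fixed in advance are visible here: the mode number $K$ sets how many sub-modes are extracted, and the penalty factor $\alpha$ weights the bandwidth term against reconstruction fidelity. An unsuitable choice of either tends to produce mode mixing or redundant modes, which is why these two parameters are the ones tuned by an optimiser.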
2.5. Transformer–BiLSTM
AQI forecasting necessitates processing multidimensional time-series data. Traditional RNNs struggle to model long-term sequence dependencies, while Transformers excel at capturing global correlations but lack sufficient short-term dynamic capture capability. Hence, this paper proposes the Transformer–BiLSTM model, which synergistically learns to extract long- and short-term features through self-attention mechanisms and BiLSTMs, thereby enhancing the precision and robustness of AQI predictions. The model structure and principles are as follows:
- (1) Position Encoding: Embedding temporal information into the model
The Transformer first applies positional encoding to the input AQI time-series features, embedding temporal sequence information into the model. The formulas are:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
where $pos$ denotes the temporal step position, $i$ indexes the pairs of encoding dimensions, and $d_{model}$ represents the embedding dimension. Position encoding enables the model to capture the temporal sequence of AQI features, preventing the loss of temporal information inherent in the Transformer’s “unordered list” processing approach.
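As a concrete illustration, the sketch below builds the standard sinusoidal positional encoding, in which even dimensions use a sine and odd dimensions a cosine of `pos / 10000^(2i/d_model)`. It assumes this standard form and an even embedding dimension; the function name `positional_encoding` and the example sizes are hypothetical.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding of shape (seq_len, d_model).

    Assumes d_model is even. Row `pos` is added to the feature vector
    of time step `pos` before the self-attention layers.
    """
    pos = np.arange(seq_len)[:, None]            # time-step index, shape (seq_len, 1)
    two_i = np.arange(0, d_model, 2)[None, :]    # even dimension indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions
    return pe

# Example: encode a 24-step (one-day, hourly) window with 8 embedding dimensions.
pe = positional_encoding(24, 8)
```

Because every time step receives a distinct pattern, two windows containing the same values in a different order no longer look identical to the attention layers.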
- (2) Transformer Self-Attention: Capturing Global Dependencies
Self-attention is the core of the Transformer. The input to self-attention is represented by matrix X. Via linear transformations that produce the query matrix (Q), key matrix (K), and value matrix (V), it calculates the correlation weights between AQI features across different time steps (e.g., the correlation between pollutant concentrations at the current time and those at historical times, or with historical AQI values).
$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $\mathrm{Attention}(Q, K, V)$ represents the attention score, $X$ denotes the input matrix, and $W^{Q}$, $W^{K}$, and $W^{V}$ are the trainable matrices within the model; $QK^{T}$ characterises the similarity between queries and keys (i.e., feature relevance); $\sqrt{d_k}$, with $d_k$ the key dimension, is a scaling factor to prevent excessive inner products. Multi-Head Attention further maps features to multiple subspaces, capturing multi-scale global dependencies. The formula is:
$$\mathrm{MultiHead}(X) = \mathrm{Concat}(head_1, \ldots, head_h)W^{O}, \quad head_i = \mathrm{Attention}\left(XW_i^{Q}, XW_i^{K}, XW_i^{V}\right)$$
where $head_i$ denotes the output of the $i$-th head, and $W^{O}$ represents the linear transformation matrix. Through multi-head attention, the model can simultaneously learn long-range correlations within AQI sequences.
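The computation can be made concrete with a small NumPy sketch of standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, and its multi-head extension. The weights below are random stand-ins for trained parameters, and the function names and shapes are illustrative assumptions rather than details of the model used in this study.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - np.max(z, axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # weights[t, s] is how strongly time step t attends to time step s.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

def multi_head_attention(X, heads, W_o):
    """heads is a list of (W_q, W_k, W_v) tuples, one per head."""
    outputs = [self_attention(X, *h)[0] for h in heads]
    return np.concatenate(outputs, axis=-1) @ W_o

# Example: 6 time steps, 4 features, 2 heads of width 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
heads = [tuple(rng.standard_normal((4, 2)) for _ in range(3)) for _ in range(2)]
W_o = rng.standard_normal((4, 4))
Y = multi_head_attention(X, heads, W_o)   # shape (6, 4)
```

Each row of `weights` sums to one, so every output step is a weighted average over all time steps in the window, which is what lets the Transformer relate distant hours directly.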
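- (3) BiLSTM Decoder: Learning Local Temporal Dependencies
The global feature vector processed by the Transformer serves as input to the BiLSTM, which comprises a forward LSTM and a backward LSTM: the forward LSTM processes the sequence chronologically to learn forward temporal relationships; the backward LSTM processes the sequence in reverse order to learn backward temporal relationships.
The core of LSTM lies in its gating mechanism (forget gate $f_t$, input gate $i_t$, and output gate $o_t$). The forward computation process of LSTM is as follows:
$$\begin{aligned} f_t &= \sigma\left(W_f[h_{t-1}, x_t] + b_f\right) \\ i_t &= \sigma\left(W_i[h_{t-1}, x_t] + b_i\right) \\ \tilde{c}_t &= \tanh\left(W_c[h_{t-1}, x_t] + b_c\right) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t &= \sigma\left(W_o[h_{t-1}, x_t] + b_o\right) \\ h_t &= o_t \odot \tanh(c_t) \end{aligned}$$
where $x_t$ represents the AQI feature sequence at time step $t$ (incorporating contextual information from pollutant concentrations, meteorological factors, and historical AQI data), $c_t$ is the cell state, and $\sigma$ is the sigmoid function. The hidden state $h_t$ captures short-term AQI dynamics. Through bidirectional processing, BiLSTM can simultaneously learn “past → present” and “present → past” relationships, enhancing its modelling capability for short-term fluctuations.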
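The bidirectional pass can be sketched in NumPy using the standard LSTM cell equations (forget, input, and output gates plus a tanh candidate state). This is an illustrative forward pass with randomly initialised weights, not the trained network; the packing of the four gates into a single weight matrix and the names `lstm_pass` and `bilstm` are assumptions made for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(X, W, U, b, reverse=False):
    """Run one LSTM direction over X of shape (T, n_in).

    W: (n_in, 4 * n_hidden), U: (n_hidden, 4 * n_hidden), b: (4 * n_hidden,)
    hold the forget, input, output, and candidate blocks side by side.
    Returns the hidden states, shape (T, n_hidden), in original time order.
    """
    n_hidden = U.shape[0]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    H = np.zeros((len(X), n_hidden))
    steps = range(len(X) - 1, -1, -1) if reverse else range(len(X))
    for t in steps:
        z = X[t] @ W + h @ U + b
        f, i, o, g = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # gates in (0, 1)
        g = np.tanh(g)                                 # candidate cell state
        c = f * c + i * g                              # cell state update
        h = o * np.tanh(c)                             # hidden state
        H[t] = h
    return H

def bilstm(X, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at each time step."""
    forward = lstm_pass(X, *fwd_params)
    backward = lstm_pass(X, *bwd_params, reverse=True)
    return np.concatenate([forward, backward], axis=1)
```

At the first time step, the forward half of the output has seen only that step, while the backward half has already summarised the whole remaining window, which is the sense in which the two directions are complementary.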
- (4) Output Layer: AQI Prediction
The BiLSTM output is mapped to a single-dimensional prediction value via a fully connected layer, and model training is completed using regression loss. Ultimately, the model outputs the predicted AQI value for the next time step.
The Transformer–BiLSTM model provides a more accurate and robust approach for AQI forecasting through synergistic learning of “global correlations + local dynamics”.
2.6. Prediction Framework
To address the non-stationarity and complex temporal characteristics in AQI forecasting, this paper proposes an integrated AQI prediction workflow combining a two-stage modal decomposition method based on CEEMDAN-SE-CPO-VMD with an ensemble Transformer–BiLSTM model. This workflow comprises three stages: preprocessing of raw AQI data, signal processing via secondary modal decomposition based on CEEMDAN-SE-CPO-VMD, and training and testing of the integrated Transformer–BiLSTM model. The prediction steps are as follows:
- (1) Data Preprocessing
Collect historical AQI data and feature vectors (pollutant concentrations such as PM2.5, PM10, NO2, SO2, CO, O3, and meteorological factors including wind speed, wind direction, temperature, and humidity). Preprocess the raw data to construct standardised input sequences, providing a high-quality data foundation for subsequent decomposition and modelling.
- (2) CEEMDAN-SE-CPO-VMD Secondary Modal Decomposition
Apply the CEEMDAN algorithm to decompose the original non-stationary AQI time series into multiple IMF components. Calculate the SE for each IMF component to quantify sequence complexity. Using the SE values of each IMF as feature vectors, apply the K-means clustering algorithm to partition the IMF components into three clusters (high-, medium-, and low-frequency Co-IMF). The cluster with the highest complexity (Co-IMF1) is selected and further decomposed into finer-grained sub-components (VMD-IMF) using VMD to uncover micro-oscillation patterns. To avoid arbitrary parameter selection in VMD, CPO is employed to optimise the modal number K and penalty factor α.
- (3) Ensemble Transformer–BiLSTM Prediction and Result Reconstruction
The sub-components obtained from VMD and the remaining clusters from CEEMDAN decomposition are fed into a Transformer–BiLSTM hybrid prediction model for training and forecasting. The Transformer captures global correlations within the AQI sequence (e.g., pollutant concentration trends across days) via self-attention mechanisms, while the BiLSTM models local temporal dependencies (e.g., short-term pollutant fluctuations). Finally, the predicted values of all subcomponents (VMD-decomposed subcomponents and remaining Co-IMFs) are linearly superimposed to reconstruct the final prediction of the original AQI time series.
This approach effectively captures both short-term and long-term dynamic characteristics of AQI sequences through integrated modelling using secondary modal decomposition and Transformer–BiLSTM, thereby enhancing prediction accuracy and robustness. The prediction model workflow is illustrated in Figure 2.
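The grouping and reconstruction logic of steps (2) and (3) can be sketched as follows. This is a simplified illustration under stated assumptions: the IMFs and their SE values are taken as given, a tiny hand-written 1-D K-means stands in for a library implementation, and the VMD and forecasting stages are omitted. The function names `kmeans_1d` and `group_imfs` are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k=3, n_iter=100):
    """Tiny 1-D k-means; centres start at evenly spaced quantiles."""
    values = np.asarray(values, dtype=float)
    centres = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assign each value to its nearest centre.
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        new_centres = np.array([
            values[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

def group_imfs(imfs, entropies, k=3):
    """Sum IMFs that fall in the same entropy cluster into Co-IMFs.

    imfs has shape (n_imfs, T). The returned list is ordered from the
    highest-entropy cluster (Co-IMF1) to the lowest.
    """
    labels, centres = kmeans_1d(entropies, k)
    order = np.argsort(centres)[::-1]
    return [imfs[labels == j].sum(axis=0) for j in order]
```

Since every IMF lands in exactly one cluster, the Co-IMFs sum back to the same signal as the original IMFs. The same additivity is what justifies the final step of the workflow: forecasts made separately for each sub-component can be linearly superimposed to give the AQI prediction.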
3. Data Sources and Processing
3.1. Data Sources and Preprocessing
Beijing, situated in the northern part of the North China Plain, experiences a temperate monsoon climate with distinct seasonal characteristics. Summers are hot and rainy, while winters are cold and dry. Daily temperature variations are moderate, and the region enjoys abundant sunshine. Precipitation is predominantly concentrated in summer, and spring is prone to sandstorms. Beijing’s industrial structure centres on the tertiary sector, prioritising modern services, high-tech industries, and cultural and creative industries. Heavy industry accounts for a relatively low proportion, though certain areas still host energy processing and high-end manufacturing sectors. Significant disparities exist in pollutant emissions between urban and suburban zones. Regarding pollution sources, Beijing’s atmospheric environment is primarily influenced by two factors: natural factors, notably spring sandstorms and surface dust pollution; and anthropogenic factors, chiefly motor vehicle exhaust emissions, industrial pollutant discharges, and energy consumption during the winter heating season.
This study uses hourly AQI, pollutant, and meteorological data from Beijing spanning 1 June to 31 August 2023, comprising 2208 data points. Pollutants include PM2.5, PM10, SO2, NO2, O3, and CO, while meteorological factors encompass air temperature, humidity, atmospheric pressure, and wind speed. AQI and pollutant data were sourced from the National Urban Air Quality Real-time Publishing Platform of the China National Environmental Monitoring Centre (CNEMC) [Online]. Available: https://air.cnemc.cn:18007/ (accessed on 15 October 2025), while meteorological data were sourced from the National Meteorological Information Centre (NMIC) [Online]. Available: http://data.cma.cn/ (accessed on 15 October 2025).
The descriptive statistics presented in Table 1 offer a comprehensive overview of the dataset and reveal the dominant influence of specific environmental and meteorological parameters on the AQI during the summer of 2023 in Beijing.
The average AQI was only 48.84, with a maximum of 150, indicating air quality was predominantly excellent or good and no heavy pollution episodes occurred. Among the six criteria pollutants, ozone (O3) exhibited the highest mean concentration (101.90 μg/m3) and by far the greatest variability (Std. Dev = 54.90 μg/m3), while primary pollutants such as PM2.5 (mean: 20.84 μg/m3), PM10 (mean: 39.86 μg/m3), NO2 (mean: 16.57 μg/m3), and CO (mean: 0.45 mg/m3) remained at remarkably low levels. This pattern suggests that secondary photochemical processes, rather than direct emissions, were the primary drivers of AQI variations during this period.
Temperature emerged as a key meteorological driver, with a mean of 28.17 °C and moderate variability (Std. Dev = 4.82 °C), creating favourable conditions for intense solar radiation and the rapid photochemical formation of ozone from precursor pollutants. Relative humidity averaged 60.44% with marked dispersion (Std. Dev = 25.08%), reflecting frequent convective rainfall events, which effectively scavenged particulate matter through wet deposition, thereby suppressing PM concentrations. Wind speed was relatively low (mean: 2.24 m/s, Std. Dev = 1.24 m/s), indicating generally calm conditions, which limited long-range pollutant transport and favoured the local accumulation of photochemically produced pollutants. Atmospheric pressure showed minor fluctuations (Std. Dev = 4.62 hPa), consistent with the stable dominance of the East Asian summer monsoon and the absence of strong subsidence inversions characteristic of colder seasons.
Taken together, the interplay of high temperatures, moderate-to-high humidity coupled with frequent precipitation, low wind speeds, and stable pressure patterns created a meteorological regime in which temperature-driven ozone production overwhelmingly dominated AQI variability, while particulate matter was efficiently removed by wet deposition. This contextual understanding of the dataset, alongside the specific roles played by these environmental and meteorological parameters, lays a solid foundation for subsequent correlation analysis and modelling stages.
During data preprocessing, missing data caused by equipment failure were first addressed using mean imputation to ensure data integrity and minimise its impact on the model. Subsequently, outliers arising from equipment malfunctions, transmission errors, or extreme weather were detected using the Z-score method. Given the limited sample size, outliers with |Z| > 3 were corrected via threshold-based correction. Next, to eliminate the impact of differing variable dimensions on analysis and accelerate the convergence of machine learning algorithms, Min-Max normalisation was applied to scale the data to the [0, 1] range. Finally, the 2208 empirical data points were divided into a training set (1766 records) and a test set (442 records) in an 8:2 ratio in chronological order.
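The preprocessing chain described above can be sketched for a single variable as follows. Several details are assumptions made for illustration: missing values are represented as NaN, "threshold-based correction" is read as clipping to mean ± 3 standard deviations, and the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(series, z_thresh=3.0, train_ratio=0.8):
    """Impute, correct outliers, normalise, and split one series chronologically."""
    x = np.asarray(series, dtype=float).copy()

    # 1. Mean imputation of missing values (assumed to be stored as NaN).
    x[np.isnan(x)] = np.nanmean(x)

    # 2. Z-score outlier correction: values with |Z| > z_thresh are clipped
    #    to the threshold (one possible reading of "threshold-based correction").
    mu, sigma = x.mean(), x.std()
    x = np.clip(x, mu - z_thresh * sigma, mu + z_thresh * sigma)

    # 3. Min-Max normalisation to [0, 1].
    span = x.max() - x.min()
    x = (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # 4. Chronological split, with no shuffling.
    n_train = int(len(x) * train_ratio)
    return x[:n_train], x[n_train:]
```

With 2208 hourly records and an 8:2 ratio, this split gives 1766 training and 442 test points, matching the counts stated above. A common alternative, not described in this paper, is to compute the scaling constants on the training portion only so that no information from the test period enters the normalisation.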
3.2. Feature Selection
AQI is influenced by multiple factors. Using all influencing factors as model inputs risks increasing computational workload and data redundancy, making feature selection crucial. Correlation analysis among features enables the identification of optimal subsets from numerous features, eliminating non-critical factors and significantly enhancing model training efficiency. Given the pronounced influence of weather conditions on air quality, Pearson’s correlation coefficient was employed to investigate the relationships among the concentrations of the six primary pollutants, the meteorological variables, and AQI. This coefficient measures the strength of the linear association between two variables, calculated as their covariance divided by the product of their respective standard deviations. Results range from −1 to 1, with values closer to 1 indicating a stronger positive correlation and values closer to −1 indicating a stronger negative correlation. Using the concentrations of the six AQI-influencing pollutants as baseline data, combined with four categories of meteorological data, this method was applied after screening and preprocessing. This ultimately yielded a Pearson correlation coefficient heatmap (Figure 4), revealing the interaction mechanisms between pollutants and meteorological factors while enabling the selection of relevant characteristics.
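The coefficient defined above (covariance divided by the product of the standard deviations) can be computed directly, as in the sketch below. The helper `rank_by_correlation` and the dictionary layout of the features are hypothetical conveniences for illustration, and no selection threshold is implied.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def rank_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first.

    features maps a variable name to a 1-D array aligned with target.
    """
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Ranking by the absolute value keeps strongly negative relationships visible alongside positive ones. Computing the same coefficient for every pair of candidate features produces the matrix shown in the heatmap and exposes collinearity between inputs.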
As shown in Figure 4, the correlation coefficient between SO2 and AQI is merely 0.117, while that for NO2 is 0.065, indicating extremely weak linear correlations with AQI. The correlation coefficient between humidity and AQI was −0.361, between atmospheric pressure and AQI was −0.100, and between wind speed and AQI was 0.149. These meteorological variables also exhibited relatively weak correlations with AQI, indicating limited direct explanatory power for AQI. Conversely, PM2.5 exhibits a moderate positive correlation with AQI (correlation coefficient 0.565), while PM10 shows a strong positive correlation (0.662). O3 demonstrates the strongest positive correlation with AQI (0.751), and CO exhibits a moderate positive correlation (0.380). Air temperature also exhibits a moderate positive correlation with AQI (0.585). Notably, while PM2.5 and PM10 demonstrate extremely strong collinearity (correlation coefficient 0.907), their influence on AQI differs due to their distinct particle size dimensions (as pollutants of different diameters). O3 and CO show weaker collinearity with other pollutants, providing independent types of pollution information. Although air temperature exhibits strong collinearity with O3 (correlation coefficient 0.778), as a meteorological variable it maintains a close association with AQI through unique mechanisms. Consequently, PM2.5, PM10, O3, CO, and air temperature were selected as predictive features for AQI.