3.2. Seasonal Analysis
Although the random forest method can effectively capture the overall nonlinear relationships between air temperature and various pollutants, and quantitatively assess feature importance, it still has certain limitations in revealing spatiotemporal heterogeneity. In particular, meteorological conditions and pollutant emission mechanisms vary significantly across seasons, which can profoundly influence the interactions between temperature and pollutants. For example, during autumn and winter, increased heating demand, rising energy consumption, and a lowered atmospheric boundary layer height lead to intensified pollutant emissions and poorer dispersion conditions. These changes enhance the feedback effect of pollutants on temperature variation. Therefore, taking seasonal factors into account is crucial for a deeper understanding of the complex relationships between pollutants and temperature and for improving the generalizability and predictive accuracy of the model.
To further explore the linear relationships between air temperature and the six major pollutants, this study conducted a correlation analysis at the seasonal scale. The results (see Figure 6) show that the correlations between temperature and each pollutant vary significantly across seasons. Specifically, during spring and summer, the correlations are generally weak, with most failing to reach statistical significance, indicating a limited influence of pollutants on temperature during these periods. As the season transitions into autumn, ozone (O3) exhibits a significant positive correlation with temperature (r = 0.58). This may be attributed to the gradual decrease in temperature and the stabilization of atmospheric conditions, which favor the accumulation of ozone; the stable atmospheric environment reduces ozone dispersion, thereby enhancing its impact on local temperature. In winter, the correlation pattern becomes more complex: nitrogen dioxide (NO2), particulate matter (PM10), carbon monoxide (CO), and sulfur dioxide (SO2) show significant positive correlations with temperature. This is likely related to increased emissions from coal combustion and industrial activities during the heating season. Additionally, the stable atmospheric stratification and frequent temperature inversions in winter limit the vertical dispersion of pollutants, leading to their accumulation near the surface and further strengthening their feedback effects on temperature variations. These findings suggest that temperature variations are not only influenced by seasonal factors but are also significantly modulated by air pollutants.
Figure 7 shows that the concentrations of PM2.5, PM10, NO2, CO, and SO2 are relatively high in autumn and winter and lower in spring and summer, while O3 exhibits the opposite seasonal trend, peaking in summer and reaching its lowest levels in winter. Correlation analysis indicates that the relationship between pollutants and temperature is generally weak in spring and summer, whereas a significant positive correlation is observed between O3 and temperature in autumn (r = 0.58, p < 0.05). It is noteworthy that, although O3 concentrations are highest in summer, their correlation with temperature is not statistically significant, suggesting that temperature is not the sole or primary driver of O3 variation; O3 formation is also influenced by other factors, such as solar radiation intensity, precursor gas concentrations, and relative humidity. In winter, concentrations of PM10, NO2, CO, and SO2 increase markedly and exhibit statistically significant positive correlations with temperature. However, this does not imply that higher pollutant concentrations directly cause increases in temperature; rather, it more likely reflects a co-varying trend under similar meteorological conditions. During winter, temperature inversion layers frequently occur, vertical atmospheric mixing is suppressed, and pollutants tend to accumulate near the surface. In addition, low wind speeds, poor dispersion conditions, and high humidity contribute to the so-called "pollutant retention effect".
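As a hedged illustration, the seasonal correlation analysis above can be reproduced along the following lines; the sketch assumes a pandas DataFrame df with a datetime index, a temperature column, and one column per pollutant (all column names hypothetical):

```python
import pandas as pd
from scipy.stats import pearsonr

# Map calendar months to meteorological seasons.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

df["season"] = df.index.month.map(SEASONS)

# Pearson r and p-value between temperature and each pollutant, per season.
for season, group in df.groupby("season"):
    for pollutant in ["PM2.5", "PM10", "NO2", "CO", "SO2", "O3"]:
        r, p = pearsonr(group["temperature"], group[pollutant])
        print(f"{season:6s} {pollutant:5s} r = {r:+.2f}, p = {p:.3f}")
```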
3.3. Analysis of Temperature Forecasting Accuracy
The temperature data, other meteorological factors, and six major air pollutant datasets were all obtained from the Huiju Atmospheric Platform. The data span from 1 January 2015 to 31 December 2023, with daily observations, totaling 3280 valid records. To ensure the reliability of model training and prediction, the data were chronologically divided into a training set (80%, from 1 January 2015 to 14 March 2022) and a testing set (20%, from 15 March 2022 to 31 December 2023).
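The chronological split can be expressed in a few lines; this is a minimal sketch assuming the cleaned records sit in a date-sorted pandas DataFrame data:

```python
# Chronological 80/20 split: no shuffling, so the test period
# strictly follows the training period in time.
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]
```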
An initial analysis of the raw temperature series was conducted. As shown in Figure 8a, the original temperature sequence exhibits a certain trend but contains several data points with significant deviations. To eliminate the influence of scale and enhance model training efficiency, normalization was applied using the method defined in Equation (2). The normalized sequence is shown in Figure 8b; it retains the overall trend of the data while effectively reducing the dispersion among data points and compressing the values into the range [0, 1]. This normalization improves the training stability of the CNN-LSTM model and accelerates convergence toward optimal parameters.
$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

where $x_{\text{norm}}$ is the normalized value, $x$ is the original value, $x_{\max}$ denotes the maximum, and $x_{\min}$ denotes the minimum within the dataset.
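A minimal sketch of Equation (2) applied to the temperature series, assuming a NumPy array temps:

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a series into [0, 1] per Equation (2)."""
    return (x - x.min()) / (x.max() - x.min())

temps_norm = min_max_normalize(temps)  # values now lie in [0, 1]
```

In practice, the minimum and maximum should be taken from the training split only and reused on the test split, so that no information leaks from the test period into training.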
To ensure fairness in the performance comparison among the models, this study standardized the input features uniformly: all models employed a feature set comprising two meteorological factors, atmospheric pressure and visibility, along with historical temperature data. This eliminates performance bias caused by differences in feature information across models. In addition, recognizing that the lookback window (i.e., the time step) may significantly affect prediction accuracy, this study determined the optimal lookback value by combining autocorrelation function (ACF) analysis with empirical validation via grid search, as shown in Figure 9. Specifically, the Mean Squared Error (MSE) of model predictions on the testing set was evaluated under various lookback settings; as shown in Figure 9a, a lookback of 7 yielded the lowest MSE (0.0267). The ACF plot (Figure 9b) also exhibited significant autocorrelation at lag = 7, reflecting the long-memory characteristics of the temperature series. These findings provide both theoretical justification and empirical support for the subsequent CNN-LSTM-RF model design and temporal dependency modeling. A lookback of 7 means that the model inputs consist of the observations from the previous seven consecutive days. To further improve generalization and mitigate overfitting, dropout regularization was applied during training with a rate of 0.2, meaning that 20% of the neurons are randomly deactivated in each training iteration, which enhances the model's stability and generalization performance on unseen data.
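The lookback selection can be sketched as follows; fit_cnn_lstm is a hypothetical stand-in for the training routine, and train_series/test_series are assumed to be the normalized NumPy arrays (shown univariate for brevity):

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def make_windows(series, lookback):
    """Slice a 1-D series into (samples, lookback) inputs and next-day targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# ACF of the training series: lags with strong autocorrelation are
# natural lookback candidates (Figure 9b highlights lag = 7).
autocorr = acf(train_series, nlags=30)

# Grid search: retrain for each candidate lookback and keep its test-set MSE.
mse_by_lookback = {}
for lookback in range(2, 15):
    X_tr, y_tr = make_windows(train_series, lookback)
    X_te, y_te = make_windows(test_series, lookback)
    model = fit_cnn_lstm(X_tr, y_tr)  # hypothetical helper
    mse_by_lookback[lookback] = float(np.mean((model.predict(X_te) - y_te) ** 2))

best_lookback = min(mse_by_lookback, key=mse_by_lookback.get)  # 7 in Figure 9a
```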
Table 2 provides a detailed overview of the model architecture along with the output shapes and parameter configurations. The input layer has an output shape of (None, 7, 3), indicating that it accepts sequential data with a time step length of 7 and a feature dimension of 3, and contains no trainable parameters. Next, the Conv1D layer employs 32 one-dimensional convolutional filters of size 3 for feature extraction. With 3 input channels and 32 output channels, a stride of 1, and no padding (padding = 0), the output shape becomes (None, 32, 5), a reduction of 2 time steps compared to the input; this layer introduces 320 trainable parameters. The ReLU activation function provides nonlinear transformation capability while maintaining the same output shape as the convolutional layer (None, 32, 5); it effectively mitigates the vanishing gradient problem and accelerates convergence. Following this, the MaxPool1D layer performs downsampling with a pool size of 3 and a stride of 1, compressing the output shape to (None, 32, 3), which reduces computational complexity while enhancing feature representation. Subsequently, a dimension permutation rearranges the tensor into the input format required by the LSTM layer, i.e., (batch, seq_len, features). The LSTM component consists of two layers, each with 32 units, producing an output shape of (None, 3, 32) and containing 16,640 trainable parameters; it captures the long-term dependencies in the temperature time series. The following fully connected layer applies a linear mapping to transform the LSTM output into a single prediction value, with an output shape of (None, 1) and 33 parameters. Finally, the CNN-LSTM module outputs the temperature prediction with a shape of (None, 1). The total number of parameters is 16,993, reflecting the model's efficiency in temporal feature extraction and sequence modeling.
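The architecture in Table 2 maps onto a PyTorch sketch like the following. Layer sizes are taken from the table; everything else (framework, training details) is an assumption. Note that PyTorch's LSTM keeps two bias vectors per layer and therefore counts 16,896 LSTM parameters, whereas the 16,640 in Table 2 follows the single-bias convention.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """CNN-LSTM module following the shapes in Table 2 (input: 7 steps x 3 features)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=3, out_channels=32, kernel_size=3)  # 320 params
        self.pool = nn.MaxPool1d(kernel_size=3, stride=1)
        self.lstm = nn.LSTM(input_size=32, hidden_size=32, num_layers=2,
                            dropout=0.2, batch_first=True)  # dropout between LSTM layers
        self.fc = nn.Linear(32, 1)  # 33 params

    def forward(self, x):                         # x: (batch, 7, 3)
        x = x.permute(0, 2, 1)                    # -> (batch, 3, 7), channels first
        x = self.pool(torch.relu(self.conv(x)))   # -> (batch, 32, 5) -> (batch, 32, 3)
        x = x.permute(0, 2, 1)                    # -> (batch, 3, 32) for the LSTM
        out, _ = self.lstm(x)                     # -> (batch, 3, 32)
        return self.fc(out[:, -1, :])             # -> (batch, 1)
```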
To enhance the model’s environmental awareness, the PM2.5 and O3 concentration features are introduced as supplementary inputs. The output of the CNN-LSTM model, along with these two pollutant concentration features, forms the input feature set for the random forest model, with dimensions of (None, 3). This multi-source feature fusion strategy effectively improves the model’s ability to predict temperature changes.
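A hedged sketch of this fusion step; the array names are hypothetical, and n_estimators/random_state are illustrative defaults rather than reported settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stack the CNN-LSTM temperature prediction with the aligned PM2.5 and O3
# concentrations to form the (n_samples, 3) feature matrix described above.
X_rf = np.column_stack([cnn_lstm_pred, pm25, o3])

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_rf[:n_train], y_temp[:n_train])  # chronological split as before
y_pred = rf.predict(X_rf[n_train:])
```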
The choice of loss function plays a crucial role in determining the predictive accuracy of the model. In this study, the Mean Squared Error (MSE) is employed as the loss function. MSE evaluates model performance by calculating the average of the squared differences between the predicted and actual values. Due to its high sensitivity to large errors, MSE effectively amplifies significant deviations, making it well suited for tasks that demand high predictive precision. This, in turn, enables the model to more accurately capture the variation characteristics of the target variable.
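For reference, the loss is

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ is the observed temperature, $\hat{y}_i$ is the predicted temperature, and $n$ is the number of samples.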
Figure 10 illustrates the fitting performance of the CNN-LSTM model during the training phase, as well as the evolution of the loss function. The left panel shows the training set fitting curve, where the model’s predicted values (red curve) closely match the actual observations (blue curve). The model demonstrates excellent fitting ability, particularly in capturing the periodic fluctuations and extreme values of temperature, indicating its effectiveness in modeling the temporal patterns and seasonal characteristics of temperature data. The right panel displays the variation trend of the loss function (measured by Mean Squared Error, MSE) throughout the training process. As shown in the figure, the model achieved a substantial reduction in loss during the early training epochs, dropping rapidly from an initial value above 20 to nearly 1. The loss then gradually stabilized and eventually settled around 0.90, demonstrating good convergence behavior. This process reflects the model’s strong feature learning capability in the early training stage, efficiently capturing the main structural changes and intrinsic patterns in the data. Moreover, the loss function remains stable in the later stages of training without significant oscillations, suggesting that the model does not suffer from overfitting. In other words, the model has not simply memorized the training data but exhibits a certain degree of abstract generalization. This indicates that the model is likely to perform well on unseen data, validating the effectiveness and robustness of the CNN-LSTM hybrid architecture for modeling highly complex temperature time series.
Specifically, the CNN module extracts local variation patterns from the input features (atmospheric pressure, visibility, and historical temperature) through convolution operations, while the LSTM module captures both short- and long-term dependencies along the time dimension via its gating mechanism. The synergy between these two components significantly enhances the model's adaptability to non-stationary climate data. Furthermore, the steady decline and stabilization of the training loss provide additional evidence of the robustness and reliability of the CNN-LSTM structure in handling complex temperature series.
To further validate the CNN-LSTM model, Figure 11 presents the prediction results on the test set. The predicted curve aligns closely with the actual values, confirming the model's predictive capability. The output of this deep learning model is subsequently used as one of the input features for the random forest model, providing a reliable foundation for temperature prediction based on the concentrations of major air pollutants.
Finally, this study integrates the prediction results of the CNN-LSTM model with the two key pollutant features identified earlier (PM2.5 and O3) to construct a three-dimensional input feature set, which is then used as input to the random forest (RF) model to further enhance temperature prediction performance. This hybrid approach leverages the strengths of deep learning in time series modeling and the powerful nonlinear feature extraction capability of the RF model, significantly improving the accuracy and robustness of temperature forecasts. As shown in Figure 12a, the predicted values on the test set closely match the actual observations, indicating that the model possesses strong fitting capability and generalization performance. The residual time series plot in Figure 12b demonstrates that, although some fluctuations exist, the overall distribution of residuals is relatively stable, implying that the prediction errors are minor and within a controllable range; some deviations may result from inevitable systematic errors that cannot be fully eliminated during model fitting. Furthermore, the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots in Figure 12c show no significant autocorrelation beyond lag 1, suggesting that the model has effectively captured the underlying patterns in the data and that the remaining residuals resemble white noise. The QQ plot in Figure 12d reveals that the residuals are approximately normally distributed. Taken together, the residual analysis confirms that the CNN-LSTM-RF model achieves a satisfactory fit, with prediction errors primarily reflecting random disturbances, demonstrating the model's strong stability and reliability in temperature forecasting tasks.
Table 3 presents the performance comparison of the different models. The analysis shows that, compared with the individual models, the CNN-LSTM-RF hybrid model achieves a significant improvement in prediction performance, highlighting the advantages of deep learning approaches in temperature forecasting. Specifically, the random forest (RF) model achieved RMSE = 5.58, MAE = 4.42, and R² = 0.60, indicating some strength in nonlinear feature extraction but also revealing limitations in capturing the complexity of temperature patterns. The convolutional neural network (CNN) model shows a notable improvement, with RMSE = 3.43, MAE = 2.78, and R² = 0.86, demonstrating strong capability in extracting spatial features from the temperature data. The Long Short-Term Memory (LSTM) network further enhances predictive accuracy, reducing RMSE to 2.64 and MAE to 1.91 and increasing R² to 0.91, validating its superiority in modeling sequential data and capturing long-term dependencies. The CNN-LSTM model, which integrates the spatial feature extraction of CNNs with the temporal dependency modeling of LSTM, achieves RMSE = 2.39, MAE = 1.76, and R² = 0.93, indicating its comprehensive modeling ability in time series prediction tasks.
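The three metrics in Table 3 (and later Table 4) can be computed with scikit-learn; a minimal sketch:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return (RMSE, MAE, R²) as reported in Tables 3 and 4."""
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae = float(mean_absolute_error(y_true, y_pred))
    r2 = float(r2_score(y_true, y_pred))
    return rmse, mae, r2
```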
Ultimately, the CNN-LSTM-RF hybrid model demonstrated the best predictive performance, with the RMSE reduced to 0.88, the MAE lowered to 0.66, and the R² reaching 0.99. These results validate the model's superior capability in capturing both the spatial features and temporal dynamics of temperature data. The hybrid model effectively integrates the strengths of convolutional neural networks (CNNs) in spatial feature extraction and Long Short-Term Memory (LSTM) networks in temporal sequence modeling. The output of the CNN-LSTM structure is then combined with key pollutant features and fed into the random forest (RF) model, which significantly enhances the accuracy of temperature prediction influenced by atmospheric pollutants. By integrating deep learning with traditional machine learning methods, this study systematically explores the mechanisms through which air pollutants affect temperature variation. The proposed framework provides a novel approach for improving weather prediction accuracy and offers useful insights for research and applications in climate-related fields.
3.4. Comparison of Temperature Prediction Accuracy at Multiple Time Scales
Section 3.3 analyzed the performance of the CNN-LSTM-RF model for temperature prediction at the daily scale. To further explore the model's applicability and stability across temporal scales, especially its predictive capability for medium- to long-term forecasting (e.g., monthly average temperature), a multi-scale comparative analysis was conducted. Specifically, the preprocessed daily temperature data were aggregated into a monthly average temperature series using a moving average method. On this basis, the optimal time step (lookback) for the monthly scale was determined through autocorrelation function (ACF) analysis and empirical validation via grid search (see Figure 13). The results indicate that a lookback of 7 yields the lowest test set MSE (0.00796), while the ACF plot shows significant autocorrelation at lag = 7, revealing the long-memory characteristics of the monthly temperature sequence. This provides both theoretical justification and practical support for the subsequent construction of the monthly-scale CNN-LSTM-RF model and its modeling of temporal dependencies.
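The monthly aggregation can be sketched as follows; daily is a hypothetical pandas Series with a datetime index, and since the text describes a moving-average method, both a calendar-month mean and a 30-day moving average are shown for illustration:

```python
# Calendar-month means via resampling ...
monthly = daily.resample("MS").mean()

# ... or a 30-day moving average, closer to the moving-average
# aggregation described in the text.
smoothed = daily.rolling(window=30, min_periods=1).mean()
```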
Keeping the model structure selected at the daily scale unchanged, this study adjusted the temporal granularity of the input data to the monthly scale and retrained the model using combined input features, including PM2.5 and O3 concentrations as well as historical temperature data. To prevent overfitting during training, the same dropout regularization strategy as in the daily-scale modeling was adopted, with the dropout rate set to 0.2, aiming to improve the model's generalization ability and robustness in monthly-scale temperature prediction. On this basis, the output of the CNN-LSTM model was again used as a feature input to the random forest (RF) model. This fusion strategy effectively combines the advantages of deep learning in temporal sequence modeling with the strong generalization capability of ensemble learning in nonlinear regression tasks, enhancing the overall prediction accuracy and stability of the model.
Figure 14 presents the temperature sequence prediction results for Wuhan based on the monthly-scale CNN-LSTM-RF model. The overall trend of the predicted values is highly consistent with the actual observations, further verifying the effectiveness and robustness of the model in medium- to long-term temperature forecasting tasks.
Finally, the evaluation of the monthly-scale CNN-LSTM-RF hybrid model yields an RMSE of 1.0097, an MAE of 0.8771, and an R² of 0.9841. A detailed comparison with the daily-scale model is presented in Table 4. The results indicate that the CNN-LSTM-RF model maintains high predictive accuracy across temporal scales. Specifically, the RMSE and MAE at the daily (short-term) scale are lower than those at the monthly (medium- to long-term) scale, and the R² score is higher, indicating that the model captures short-term temperature fluctuations more accurately and stably. Although errors increase somewhat at the monthly scale, the model still maintains strong fitting ability and high predictive performance, showing good adaptability and reliability in capturing medium- to long-term temperature trends. Overall, the CNN-LSTM-RF model performs well across multiple temporal scales, demonstrating strong generalization capability and application potential.
It is worth noting that there are observable differences between the daily- and monthly-scale prediction results. To examine whether these differences are statistically significant, this study conducted a paired-sample t-test comparing the predicted annual average temperatures derived from the two time scales. The test yielded t = −3.5299 and p = 0.0242, which is significant at the 0.05 level, indicating that the predictions at the two temporal scales differ significantly. These findings imply that temperature prediction models constructed at different temporal scales may produce systematic deviations in their estimates. Researchers are therefore advised to select time scales and time steps appropriate to the application at hand and to adjust model structures and parameter settings accordingly, in order to enhance the scientific validity and reliability of climate forecasting outcomes.
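The significance test can be reproduced along these lines; annual_mean_daily and annual_mean_monthly are hypothetical arrays holding one paired per-year mean prediction from each model:

```python
from scipy import stats

# Paired-sample t-test on annual mean temperatures predicted
# at the daily and monthly scales (one paired value per year).
t_stat, p_value = stats.ttest_rel(annual_mean_daily, annual_mean_monthly)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # reported: t = -3.5299, p = 0.0242
```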