1. Introduction
In recent decades, heavy rainfall and flooding have become increasingly severe worldwide owing to changes in the monsoon climate and rapid human development, posing a significant threat to human survival and development [1,2]. Accurate daily precipitation prediction is not only a core issue in meteorological science but also a key support for social and economic development. For example, in flood prevention and disaster mitigation, forecasting heavy rainfall 24 h in advance can help government departments initiate emergency responses promptly, thereby reducing casualties and property damage. In agricultural irrigation, accurate precipitation forecasts can optimize water distribution and boost crop yields. In urban planning, precipitation predictions allow drainage systems to be designed effectively to prevent waterlogging. Additionally, for ecologically fragile areas (such as the Kunming Plateau Lake), precipitation prediction can provide a scientific basis for ecological water replenishment and biodiversity maintenance. Therefore, improving the accuracy of daily precipitation prediction is of great practical urgency and has extensive socio-economic value [3].
Precipitation forecasting methods can be broadly divided into two categories: process-based methods and data-driven methods. Process-based methods rely on a detailed understanding of atmospheric physical processes and offer intuitive explanations of precipitation formation mechanisms, which makes them particularly suitable for long-term forecasting and for simulating complex weather systems. Data-driven methods, by contrast, are highly adaptable: they can handle large-scale data and capture complex nonlinear relationships, which makes them suitable for real-time and short-term forecasting tasks [4].
Researchers worldwide have conducted extensive studies on precipitation prediction, ranging from early traditional statistical models such as the autoregressive integrated moving average (ARIMA) model to convolutional neural networks (CNNs) [5] and recurrent neural networks (RNNs) [6,7], which can capture the temporal and spatial features of precipitation. Long short-term memory (LSTM) is a special type of RNN whose unique “gate” structure alleviates the vanishing (and, to some extent, exploding) gradient problem encountered when training on long time series, improving the accuracy of long-term process simulation. Shen Haojun et al. [8] used LSTM to study summer precipitation in China, providing a reference for seasonal precipitation prediction. Kang et al. [9] used an LSTM model with multiple input variables to predict daily precipitation in Jingdezhen, Jiangxi Province. Han Ying et al. [10] combined the advantages of deep learning and broad learning to propose an improved LSTM model (LSTM-WBLS), which offers a novel approach to precipitation prediction. To address the low accuracy in predicting extreme precipitation values and no-rain days, Ling et al. [11] proposed a combined framework integrating support vector machines (SVMs), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and bidirectional long short-term memory networks (BiLSTMs) for daily precipitation prediction in the Poyang Lake Basin. In recent years, attention mechanisms have been widely applied and continuously refined in fields such as computer vision and speech recognition, and some scholars have applied them to precipitation forecasting to improve prediction accuracy. Cheng Yuxiang [12] proposed an attention-based BiLSTM model that learns weights for the meteorological factors; compared with traditional methods, this model performs better at predicting precipitation influenced by multiple meteorological factors. However, these methods have relatively limited feature-learning capacity and cannot fully capture complex spatiotemporal relationships, which limits their accuracy and generalization. In addition, when dealing with high-dimensional multi-source data, they may suffer from the curse of dimensionality, which makes the models difficult to train and optimize.
Therefore, combining the advantages of PCA, CNN-BiLSTM, and attention mechanisms, this paper proposes a daily precipitation prediction method based on a PCA-CNN-BiLSTM-Attention model, tailored to the nonlinear and temporal characteristics of precipitation data. First, PCA extracts the main features of the data, reduces redundant information, and improves the training efficiency and accuracy of the algorithm. Next, a convolutional neural network (CNN) captures the nonlinear local features of the precipitation data. A bidirectional long short-term memory (BiLSTM) layer then extracts bidirectional temporal dependencies from the sequence. The features produced by the BiLSTM hidden layer are fed into an attention mechanism, which weights the temporal features extracted by the BiLSTM according to their importance, reducing the interference of redundant information with the prediction results. Finally, comparative experiments demonstrate the reliability and effectiveness of the proposed method, which can provide a reference for agricultural and water conservancy departments in making water resource management decisions and thus reduce the risk of drought and flood disasters.
  2. Data and Methods
  2.1. Principal Component Analysis (PCA)
Principal component analysis (PCA) is a widely used dimensionality reduction method. Through an orthogonal transformation, it converts the original variables into a set of linearly uncorrelated variables known as principal components; representing the original data with fewer principal components achieves dimensionality reduction [13]. In meteorological data analysis, PCA removes the multicollinearity among the original meteorological variables, making complex datasets easier to interpret and process. By applying a linear transformation to historical meteorological data, PCA identifies the principal components that best capture the overall variation in the data. These components are sorted by their contribution to the total variance, and once their cumulative contribution rate exceeds a chosen threshold, they can be used in place of the original variables.
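Consider a sample dataset $X$ of dimension $n \times m$, where $n$ is the number of samples and each sample has $m$ features, representing $m$ attributes or indicators of the dataset and mapping the data into an $m$-dimensional space, as shown in Equation (1):

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} \tag{1}$$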
The steps of PCA are as follows:
Different features in the original dataset have different units and magnitudes. Standardization brings every variable onto the same scale, which prevents variables with large numerical values from dominating the analysis. The calculation is shown in Equation (2):

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2} \tag{2}$$

where $\bar{x}_j$ and $s_j$ are the mean and standard deviation of the $j$-th variable $x_j$ of dataset $X$, and $x_{ij}^{*}$ is the standardized value of $x_{ij}$, forming the standardized data matrix $X^{*} = (x_{ij}^{*})_{n \times m}$.
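The correlation coefficient matrix $R$ is calculated as shown in Equation (3):

$$R = (r_{jk})_{m \times m}, \quad r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x_{ij}^{*}\, x_{ik}^{*} \tag{3}$$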
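The eigenvalues $\lambda_i$ $(i = 1, 2, \ldots, m)$ of the correlation coefficient matrix $R$ are obtained from the characteristic equation $|\lambda I - R| = 0$ and sorted so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$; solving $(\lambda_i I - R)\,e_i = 0$ then gives the eigenvector $e_i$ corresponding to each $\lambda_i$.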
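The variance contribution rate $c_i$ and the cumulative variance contribution rate $C_k$ are calculated as shown in Equations (4) and (5):

$$c_i = \frac{\lambda_i}{\sum_{j=1}^{m} \lambda_j} \tag{4}$$

$$C_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{m} \lambda_j} \tag{5}$$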
The number of principal components $k$ is chosen as the smallest value for which the cumulative variance contribution rate reaches the preset threshold, and the principal components are then calculated from the component matrix of the influencing factors.
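The PCA steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `pca_by_correlation`, the use of the sample standard deviation (denominator n − 1), and the default 0.85 cumulative-contribution threshold are assumptions, since the paper does not state them here.

```python
import numpy as np

def pca_by_correlation(X, threshold=0.85):
    """PCA via the correlation matrix: standardize, build R, eigendecompose,
    and keep the smallest k whose cumulative variance contribution reaches
    `threshold`.

    X: array of shape (n, m), n samples by m meteorological variables.
    threshold: assumed cutoff (0.85 is a common choice, not from the paper).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardization (Equation (2)), sample standard deviation with ddof=1
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation coefficient matrix (Equation (3))
    R = X_std.T @ X_std / (n - 1)
    # Eigen-decomposition; eigh returns ascending eigenvalues, so re-sort
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Variance contribution and cumulative contribution (Equations (4)-(5))
    contrib = eigvals / eigvals.sum()
    cumulative = np.cumsum(contrib)
    # First index where cumulative >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(eigvals))
    # Principal component scores: project standardized data on top-k eigenvectors
    scores = X_std @ eigvecs[:, :k]
    return scores, contrib, cumulative, k
```

Because the correlation matrix of standardized data has a trace equal to m, the eigenvalues sum to the number of variables, and the variance of each score column equals its eigenvalue; this is a convenient sanity check when applying the sketch to station data.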
  2.2. Convolutional Neural Networks (CNNs)
A CNN is built from three types of layers (convolutional, pooling, and fully connected layers), which enables it to efficiently extract features relating precipitation to other factors. In a convolutional layer, filters slide over local regions of the input data and perform convolution operations, generating new feature maps. Pooling layers reduce the spatial dimensions of the feature maps while retaining the most salient features. Activation functions such as ReLU increase the nonlinear expressive power of the network. Fully connected layers integrate the features extracted by the convolutional layers and perform the final classification or regression task. The basic structure of a CNN is shown in Figure 1.
The convolution operation is shown in Equation (6):
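$$x_j^{L} = f\left(\sum_{i} x_i^{L-1} * k_{ij}^{L} + b_j^{L}\right) \tag{6}$$

where $x_j^{L}$ is the output of the $j$-th feature map in layer $L$, obtained by convolving the outputs $x_i^{L-1}$ of layer $L-1$ with different filters $k_{ij}^{L}$; $*$ denotes convolution; $b_j^{L}$ is the bias of each filter; and $f$ is the activation function applied after the convolution operation.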
The max-pooling operation is shown in Equation (7):
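$$p_j^{L}(t) = \max_{(t-1)s < r \le ts} x_j^{L-1}(r) \tag{7}$$

where $x_j^{L-1}$ is the feature map extracted by the previous convolutional layer, $p_j^{L}(t)$ is the $t$-th pooled value, and $s$ is the pooling window size (the number of values merged into one).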
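To make Equations (6) and (7) concrete, the following NumPy sketch applies a one-dimensional convolution over the time axis, a ReLU activation, and non-overlapping max pooling. The array layout, the use of "valid" convolution (no padding), and a pooling stride equal to the window size s are assumptions for illustration; like most deep-learning libraries, it computes cross-correlation rather than flipping the kernel.

```python
import numpy as np

def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution over time followed by ReLU.

    x: (T, C_in) input sequence; kernels: (C_out, K, C_in); biases: (C_out,).
    Returns an array of shape (T - K + 1, C_out).
    """
    T, _ = x.shape
    c_out, K, _ = kernels.shape
    out = np.empty((T - K + 1, c_out))
    for t in range(T - K + 1):
        window = x[t:t + K]  # (K, C_in) local region covered by the filters
        # Sum over kernel positions and input channels, once per filter
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + biases
    return np.maximum(out, 0.0)  # ReLU plays the role of f

def max_pool1d(h, s):
    """Non-overlapping max pooling with window size s.
    Trailing time steps that do not fill a whole window are dropped."""
    T, C = h.shape
    n = T // s
    return h[:n * s].reshape(n, s, C).max(axis=1)
```

For example, a single all-ones filter of width 2 on the series 0, 1, …, 9 produces the pairwise sums 1, 3, …, 17, and pooling those with s = 2 keeps 3, 7, 11, 15. The explicit loop keeps the correspondence with Equation (6) visible; a production model would use a framework's convolution layer instead.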
  2.3. Bidirectional Long Short-Term Memory Networks (BiLSTM)
Bidirectional long short-term memory (BiLSTM) networks are a specialized type of recurrent neural network (RNN) with two state-propagation paths: one running forward (past to future) and one running backward (future to past). This bidirectional architecture lets a BiLSTM use both past and future context at every time step and model the relationships between them through recurrent processing. By arranging LSTM units in opposite directions, a bidirectional training mechanism is established that exploits both past and future data.
The core LSTM model [14] treats each memory cell as the smallest unit of information processing; each cell contains three “gates” that continuously update its state: the forget gate $F_t$, the input gate $I_t$, and the output gate $O_t$. At time step $t$, each gate receives the current input $Z_t$ and the previous output $H_{t-1}$, which jointly determine the computation at time step $t$. The structure of an LSTM unit is illustrated in Figure 2.
The computation process is shown in Equation (8):
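$$\begin{aligned}
F_t &= \sigma\left(W_f Z_t + U_f H_{t-1} + b_f\right) \\
I_t &= \sigma\left(W_i Z_t + U_i H_{t-1} + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c Z_t + U_c H_{t-1} + b_c\right) \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \\
O_t &= \sigma\left(W_o Z_t + U_o H_{t-1} + b_o\right) \\
H_t &= O_t \odot \tanh\left(C_t\right)
\end{aligned} \tag{8}$$

where $W_f$, $W_i$, $W_c$, and $W_o$ are the input weight matrices of the forget gate, input gate, cell-state update, and output gate, respectively; $U_f$, $U_i$, $U_c$, and $U_o$ are the corresponding recurrent weight matrices applied to $H_{t-1}$; $b_f$, $b_i$, $b_c$, and $b_o$ are the corresponding biases; $F_t$, $I_t$, and $O_t$ are the outputs of the forget, input, and output gates; $\tilde{C}_t$ is the candidate cell state; $C_t$ is the cell state; $H_t$ is the hidden state (output) of the unit; $\sigma$ is the sigmoid function; and $\odot$ denotes element-wise multiplication.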
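A single LSTM time step can be written directly in NumPy. This sketch assumes each gate computes σ(W Z_t + U H_{t-1} + b) with separate input (`W`) and recurrent (`U`) weights stored in dictionaries keyed by gate; the function and argument names are hypothetical and chosen only to mirror the symbols of Equation (8).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(z_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W, U, b: dicts keyed by 'f', 'i', 'c', 'o' holding input weights (d_h, d_in),
    recurrent weights (d_h, d_h), and biases (d_h,) for each gate.
    Returns the new hidden state H_t and cell state C_t.
    """
    pre = {g: W[g] @ z_t + U[g] @ h_prev + b[g] for g in "fico"}
    f_t = sigmoid(pre["f"])              # forget gate F_t
    i_t = sigmoid(pre["i"])              # input gate I_t
    c_tilde = np.tanh(pre["c"])          # candidate cell state
    o_t = sigmoid(pre["o"])              # output gate O_t
    c_t = f_t * c_prev + i_t * c_tilde   # cell-state update C_t
    h_t = o_t * np.tanh(c_t)             # hidden state H_t
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved; this makes the role of the forget gate easy to see.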
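Assume that the first LSTM layer processes the sequence in chronological order and the second in reverse chronological order. At time step $t$, the hidden states of the forward and backward LSTMs are denoted $\overrightarrow{H}_t$ and $\overleftarrow{H}_t$, respectively. The layer-wise computation of the network is expressed in Equation (9):

$$\begin{aligned}
\overrightarrow{H}_t &= f\left(\overrightarrow{W} Z_t + \overrightarrow{V} \overrightarrow{H}_{t-1} + \overrightarrow{b}\right) \\
\overleftarrow{H}_t &= f\left(\overleftarrow{W} Z_t + \overleftarrow{V} \overleftarrow{H}_{t+1} + \overleftarrow{b}\right) \\
Y_t &= g\left(W_y\left[\overrightarrow{H}_t \oplus \overleftarrow{H}_t\right] + c\right)
\end{aligned} \tag{9}$$

where $W$, $V$, and $W_y$ are the weight matrices of the input layer, hidden layer, and output layer, respectively; $\overrightarrow{b}$, $\overleftarrow{b}$, and $c$ are bias terms; $\oplus$ denotes vector concatenation; $f$ denotes the recurrent transformation (in a BiLSTM, the LSTM cell computation of Equation (8)); $Y_t$ is the output at time step $t$; and $g$ is the output activation function.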
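The bidirectional pass amounts to running one LSTM over the sequence, a second LSTM over the reversed sequence, and concatenating the two hidden states at each time step. The NumPy sketch below assumes that the four gate weights are stacked in the order f, i, c, o and that both directions start from zero states; the function names are hypothetical, and it stops at the concatenated hidden states, leaving out the output layer g(·).

```python
import numpy as np

def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_pass(Z, W, U, b):
    """Run an LSTM over Z (T, d_in) from zero initial states.
    Gate parameters are stacked in the order f, i, c, o:
    W (4*d_h, d_in), U (4*d_h, d_h), b (4*d_h,). Returns (T, d_h)."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    H = []
    for z_t in Z:
        f, i, g, o = np.split(W @ z_t + U @ h + b, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        H.append(h)
    return np.stack(H)

def bilstm(Z, fwd_params, bwd_params):
    """Forward LSTM over Z, backward LSTM over reversed Z.
    The backward outputs are flipped back so that row t holds
    [forward H_t, backward H_t] concatenated for the same time step."""
    H_fwd = lstm_pass(Z, *fwd_params)
    H_bwd = lstm_pass(Z[::-1], *bwd_params)[::-1]
    return np.concatenate([H_fwd, H_bwd], axis=1)
```

A quick way to confirm the directionality is to perturb the last input: the forward half of the first row stays unchanged because it has not yet seen that input, while the backward half of the first row does change.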
  2.4. Attention Mechanism
The attention mechanism is inspired by the way the human brain processes information. In deep learning, it is primarily used to assign different weights to different parts of the input sequence, reflecting how important each part is for the output. This allows the model to focus on the most relevant parts of the input. The structure of an attention unit is illustrated in Figure 3.
The computation process is shown in Equation (10):
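$$\begin{aligned}
e_t &= V^{\top} \tanh\left(W h_t + U x_t + b\right) \\
\alpha_t &= \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)} \\
s &= \sum_{t=1}^{T} \alpha_t h_t
\end{aligned} \tag{10}$$

where $\alpha_t$ is the attention weight assigned to the BiLSTM hidden-layer output $h_t$ for the current input, $e_t$ is the corresponding attention score, $x_t$ is the input sequence at time step $t$, $h_t$ is the hidden state corresponding to $x_t$, $T$ is the sequence length, $s$ is the resulting weighted feature vector, and $V$, $W$, $U$, and $b$ are the model’s learnable parameters.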
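Attention of this kind can be sketched as additive attention pooling over the BiLSTM outputs. The minimal NumPy version below assumes the common additive score e_t = Vᵀ tanh(W h_t + U x_t + b), followed by a softmax over time steps and a weighted sum of the hidden states; the function name and parameter shapes are illustrative.

```python
import numpy as np

def additive_attention(H, X, V, W, U, b):
    """Additive attention pooling over time.

    H: (T, d_h) BiLSTM hidden outputs h_t; X: (T, d_x) inputs x_t.
    V: (d_a,), W: (d_a, d_h), U: (d_a, d_x), b: (d_a,).
    Returns the attention weights alpha (T,) and the weighted vector (d_h,).
    """
    scores = np.tanh(H @ W.T + X @ U.T + b) @ V    # one score e_t per time step
    scores = scores - scores.max()                  # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over time steps
    context = alpha @ H                             # sum_t alpha_t * h_t
    return alpha, context
```

If `V` is all zeros, every score is equal, the weights become uniform (1/T), and the output reduces to the plain mean of the hidden states; learning `V`, `W`, `U`, and `b` is what lets the model move weight toward informative days.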
  2.5. PCA-CNN-BiLSTM-Attention Model
Based on the above algorithms, a hydrological model that integrates PCA, CNN-BiLSTM, and attention mechanisms is proposed. The model’s workflow is shown in Figure 4, and the basic process is as follows:
(1) Collect and organize precipitation, temperature, and other meteorological spatial data, reconstruct the data into one-dimensional sequences, and normalize them with the min–max method to improve training stability.
(2) Use PCA to reduce the dimensionality of the normalized meteorological data and select the key principal components according to a predefined cumulative variance contribution threshold; these components represent the main spatial features of precipitation and temperature.
(3) Use a convolutional neural network (CNN) to extract local features from the time series. The resulting feature sequences are then fed into a bidirectional long short-term memory network (BiLSTM) to learn long- and short-term dependencies in the data.
(4) Apply an attention mechanism to the outputs of the BiLSTM to strengthen the model’s focus on important time steps and improve prediction accuracy. The attention-weighted features are fed into a fully connected layer, where linear regression predicts the precipitation at the target time step (L). Linear regression is a simple and effective way to map the high-dimensional features extracted by the CNN, BiLSTM, and attention layers to a scalar precipitation value at the target time step. Because it is part of the end-to-end deep learning framework, it keeps computation efficient and lets gradients propagate seamlessly during training.
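Steps (1) and (2) also imply a data-preparation stage that turns the daily series into supervised training samples. The sketch below shows min–max scaling and a sliding-window split; the look-back length, fitting the scaler on the training period only, and the function names are assumptions rather than details reported in the paper.

```python
import numpy as np

def min_max_fit(train):
    """Column-wise minimum and maximum, estimated on the training split only."""
    return train.min(axis=0), train.max(axis=0)

def min_max_transform(data, lo, hi):
    """Scale each column to [0, 1] using the fitted bounds."""
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns: avoid division by zero
    return (data - lo) / span

def make_windows(features, target, lookback):
    """Build inputs of shape (samples, lookback, n_features) and next-step targets.
    Sample i uses days i .. i+lookback-1 to predict the target on day i+lookback."""
    if len(features) <= lookback:
        raise ValueError("series must be longer than the look-back window")
    X = np.stack([features[i:i + lookback] for i in range(len(features) - lookback)])
    y = target[lookback:]
    return X, y
```

Fitting the scaling bounds on the training period alone avoids leaking information about the test period into training; the same bounds are then reused to transform validation and test data and to invert predictions back to millimetres.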
  2.6. Study Area
Kunming is located in central Yunnan, China, between latitudes 24° N and 27° N and longitudes 101° E and 104° E. Situated on the Yunnan–Guizhou Plateau, the region has elevations ranging from 692 to 4219 m. The underlying surface consists mainly of subtropical evergreen broad-leaved forest with some grassland. The region has a subtropical monsoon climate that is warm, humid, and distinctly seasonal. The East Asian monsoon strongly influences Kunming, resulting in a relatively uniform precipitation distribution. Average annual precipitation in Kunming ranges from 1000 to 1500 mm, most of which falls during the summer months. High-precision precipitation modeling and forecasting are therefore crucial for flood prevention, water resource management, and environmental protection in Kunming and its surrounding areas.
  2.7. Data
The data used in this study were obtained from the China Ground Cumulative Daily Value Dataset (V3.0) released by the China Meteorological Data Network. Daily observed meteorological data for Kunming City (station number 56778) from 1 January 1953 to 31 December 2019 were selected as the research object, giving a total of 24,472 days of records. The small number of missing values were filled by linear interpolation. Sample experimental data are presented in Table 1, and the partially missing data are shown in Table 2.
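Linear interpolation of short gaps can be done with `numpy.interp`, which interpolates linearly between known points and, by default, holds the first or last valid value constant beyond the ends of the series. A minimal sketch, assuming missing values are encoded as NaN and at least one valid value exists (the function name is hypothetical):

```python
import numpy as np

def fill_linear(values):
    """Fill NaN gaps in a daily series by linear interpolation between the
    nearest valid neighbours. Leading or trailing gaps take the nearest valid
    value, which is numpy.interp's default edge behaviour."""
    v = np.asarray(values, dtype=float).copy()
    missing = np.isnan(v)
    idx = np.arange(len(v))
    v[missing] = np.interp(idx[missing], idx[~missing], v[~missing])
    return v
```

For example, the series 1, NaN, 3, NaN, NaN, 6 is filled to 1, 2, 3, 4, 5, 6. For precipitation, interpolating across a gap next to a rainy day produces small nonzero amounts on days that may actually have been dry, which is one way imputation can bias the data.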
  4. Conclusions
In this study, principal component analysis (PCA), convolutional neural networks (CNNs), bidirectional long short-term memory networks (BiLSTMs), and attention mechanisms were integrated for daily precipitation forecasting. The method effectively extracts the spatial characteristics of meteorological elements and comprehensively captures the contextual information in the time series. A systematic evaluation of the PCA-CNN-BiLSTM-Attention model in the Kunming study area showed that it achieved a Nash–Sutcliffe efficiency (NSE) coefficient of 0.993 and, compared with the baseline model, reduced the RMSE and MAE by 67.31% and 58.12%, respectively. These results indicate that the model has strong applicability and robustness.
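For reference, the reported metrics can be computed as follows. The sketch assumes the standard definitions of NSE, RMSE, and MAE and interprets a reduction such as 67.31% as a relative reduction with respect to the baseline model; the function names are illustrative.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(obs, float) - np.asarray(sim, float)) ** 2)))

def mae(obs, sim):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(obs, float) - np.asarray(sim, float))))

def reduction_pct(baseline, proposed):
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (baseline - proposed) / baseline
```

An NSE of 1 means a perfect match and 0 means the model is no better than predicting the observed mean; unlike RMSE and MAE it is dimensionless, which is why it is often reported alongside them.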
The model can be integrated into an urban emergency management system to predict heavy rain events 24 h in advance and help the relevant departments activate emergency plans in time. For example, by combining the topographical data of Kunming City with historical flood records, the model can further optimize warning thresholds, dynamically assess flooding risks under different precipitation intensities, and provide a scientific basis for personnel evacuation and resource allocation.
The model in this study still has the following limitations. First, it is sensitive to data quality, and the imputation of missing values may introduce bias. Second, its prediction stability decreases for rare extreme weather events. Third, its computational complexity is high, and GPU acceleration is required to meet real-time requirements. This study explores the fusion of deep learning, principal component analysis, and attention mechanisms for precipitation prediction and provides a feasible new scheme for precipitation prediction research. Future work will integrate multi-source data, such as terrain elevation and satellite remote sensing, and explore lightweight model designs to improve generalization.