TransMambaCNN: A Spatiotemporal Transformer Network Fusing State-Space Models and CNNs for Short-Term Precipitation Forecasting

Zhang, Kai; Zhang, Guojing; Wang, Xiaoying

doi:10.3390/rs17183200


by Kai Zhang ^1,2, Guojing Zhang ^1,2,* and Xiaoying Wang ^3

^1 School of Computer Technology and Application, Qinghai University, Xining 810016, China
^2 Intelligent Computing and Application Laboratory of Qinghai Province, Qinghai University, Xining 810016, China
^3 School of Computer and Information Science, Qinghai Institute of Technology, Xining 810018, China
^* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(18), 3200; https://doi.org/10.3390/rs17183200

Submission received: 24 July 2025 / Revised: 8 September 2025 / Accepted: 14 September 2025 / Published: 16 September 2025

(This article belongs to the Special Issue Precipitation, Flood and Earthquake Events Monitoring, Simulation, Analysis and Early Warning by Advanced Environmental Remote Sensing and AI)



Highlights

What are the main findings?

We propose TransMambaCNN, a novel spatiotemporal model designed for short-term precipitation forecasting. It effectively integrates a Convolutional State-Space Module (C-SSM) to capture global dependencies with multi-scale Inception modules for local feature extraction, leading to significant improvements in multivariate precipitation forecasting capability.
It demonstrated superior performance over classical models on two benchmark datasets, with notable improvements in forecasting accuracy for challenging heavy rainfall events.

What is the implication of the main findings?

The dual-branch architecture with learnable parameters enables adaptation to diverse meteorological regions (coastal vs. plateau), significantly improving generalization in regional precipitation forecasting.
Multivariate analysis confirms the synergistic effects of integrating temperature, humidity, and wind speed data, advancing physically informed precipitation prediction methodologies.

Abstract

Deep learning for precipitation forecasting remains constrained by complex meteorological factors affecting accuracy. To address this issue, this paper proposes TransMambaCNN, a spatiotemporal transformer network fusing state-space models and CNNs for short-term precipitation forecasting. The core of the model is a Convolutional State-Space Module (C-SSM), which efficiently extracts spatiotemporal features from multi-source meteorological variables by replacing the self-attention mechanism in the Vision Transformer (ViT) with an Attentive State-Space Module (ASSM) and augmenting its feature extraction capacity with integrated depthwise convolution. Its dual-branch architecture consists of a global branch, where C-SSM captures long-range dependencies and global spatiotemporal patterns, and a local branch, which leverages multi-scale convolutions based on SimVP’s Inception structure to extract fine-grained local features. The deep fusion of these dual branches significantly enhances spatiotemporal feature representation. Experiments demonstrate that in southeastern China and adjacent marine areas (period of high precipitation: April–September), TransMambaCNN achieves a 13.38% and 47.67% improvement in Threat Score (TS) over PredRNN at thresholds of ≥25 mm and ≥50 mm, respectively. In the Qinghai Sanjiangyuan region of western China (a precipitation-scarce area), TransMambaCNN’s TS surpasses SimVP by 11.86 times at the ≥25 mm threshold.

Keywords:

short-term precipitation forecasting; state-space models; CNNs; dual-branch architecture

1. Introduction

Short-term precipitation forecasting stands as one of the core challenges in weather forecasting. Its accuracy directly impacts people’s daily lives, travel safety, industrial and agricultural production, commercial activities, and disaster prevention decision-making, with profound and far-reaching consequences [1,2]. However, achieving high-precision short-term precipitation forecasts remains a major challenge in meteorological science due to the extreme complexity of atmospheric dynamical processes and the frequent occurrence of precipitation events characterized by significant abruptness and high locality [3].

Traditional precipitation forecasting methods primarily rely on meteorological observations, empirical rules, and physical analysis. With the advancement of artificial intelligence (AI) technologies, current precipitation forecasting approaches can be broadly categorized into two types—Numerical Weather Prediction (NWP)-based methods [4,5] and radar/satellite real-time extrapolation-based methods [6,7,8,9]. NWP constructs partial differential equations describing atmospheric state evolution based on the physical laws of fluid dynamics and thermodynamics. Solving these highly complex nonlinear equations numerically provides relatively accurate short-to-medium-range precipitation forecasts. Nevertheless, the high complexity of NWP models demands substantial computational resources and the parameterization of physical processes [10,11]. Conversely, radar echo extrapolation-based precipitation forecasting methods are renowned for their speed and real-time capability. These methods utilize consecutive radar observations to extrapolate the motion and evolution trends of current echoes, generating forecasted echo maps for future timesteps [12]. The extrapolated future radar reflectivity is subsequently converted into precipitation estimates [13]. However, this conversion relies on empirical Z-R relationships to indirectly derive precipitation amounts [14]. Since this relationship inadequately characterizes the physical mechanisms linking reflectivity and precipitation intensity (such as dependence on precipitation type, raindrop size distribution evolution, and phase changes of solid precipitation), it leads to systematic intensity biases and false precipitation signals in the forecasts. The vigorous development of deep learning has significantly enhanced performances in many traditionally challenging tasks, including those within the meteorological domain. This technology plays a crucial role in addressing meteorological difficulties. 
Notably, framing precipitation forecasting as a video spatiotemporal sequence prediction problem has emerged as an important application direction and a key research focus [15].

Methods based on Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks and their improved variants, represent one of the core research directions for short-term precipitation forecasting. Shi et al. integrated the spatial feature extraction capability of Convolutional Neural Networks (CNNs) with the time-series modeling strength of LSTMs by replacing fully connected layers with convolutional layers, proposing the Convolutional LSTM (ConvLSTM) model [16], which significantly enhanced the accuracy of precipitation forecasting. Subsequently, the same team further introduced a positional encoding mechanism to develop the Trajectory Gated Recurrent Unit (TrajGRU) model [17], strengthening its spatial information processing capability. Wang et al. proposed the Predictive Recurrent Neural Network (PredRNN), centered on the Spatiotemporal LSTM (ST-LSTM) unit [18]. Their subsequent work, PredRNN++ [19], effectively mitigated the vanishing gradient problem through structural optimization and enhanced modeling capability for short-term spatiotemporal dynamics. Furthermore, the Memory In Memory (MIM) module proposed by Wang et al. effectively captured higher-order non-stationary dynamics in sequence evolution by leveraging the differential information between adjacent hidden states within the recurrent path [20]. Other representative models, such as SAC-LSTM [21], 3D-Conv-LSTM [22], EOST-LSTM [23], MotionRNN [24], TISE-LSTM [25], GAN-LSTM [26], and DB-RNN [27], have also optimized the RNN architecture from various dimensions. These studies primarily focus on innovations in the model structure itself.

Meanwhile, in the field of Convolutional Neural Network-based forecasting, SimVP [28] is a video prediction model built entirely on CNNs; its experiments demonstrate that even this simple architecture can achieve state-of-the-art results. Han employed a CNN-based U-Net framework for radar data-based precipitation nowcasting, achieving a performance nearly equivalent to that of the Trajectory GRU (TrajGRU) model [29]. Deng applied the U-Net framework to enhance summer precipitation forecasting in China [30]. Furthermore, models improved upon the U-Net framework, such as Broad-UNet [31], RainNet [32], Wf-UNet [33], and Mctu-net [34], have also been utilized for precipitation forecasting. However, the inherent tendency of CNNs to learn local spatial information reduces their ability to capture long-term spatiotemporal dependencies, thereby limiting their predictive accuracy.

In the field of attention mechanism-based forecasting, the RAP-Net [35] model proposed by Zhang embeds Regional Attention Blocks (RABs) to enhance the model’s capability for forecasting heavy rainfall regions. The SwinLSTM model [36] combines Swin Transformer [37] blocks with LSTM, demonstrating that learning global spatial dependencies is more beneficial for capturing spatiotemporal dependencies. The PrecipLSTM model [38] integrates a Spatial Local Attention Memory (SLAM) module and a Temporal Difference Memory (TDM) module with PredRNN, achieving state-of-the-art results on radar datasets. Additionally, Transformer-based improved models, such as TFT [39], DCTN [40], RR-Former [41], and TransLSTMUNet [42], are also widely applied in precipitation forecasting. The introduction of self-attention mechanisms provides a powerful tool for modeling complex long-range spatiotemporal dependencies within precipitation systems, significantly expanding the design space and potential of models.

Although the aforementioned deep learning-based radar echo extrapolation models have achieved significantly superior accuracy and speed for short-term nowcasting compared to traditional extrapolation methods, they still face critical challenges. A core limitation lies in their reliance on a single data source (sequences of radar echo images). Precipitation is the result of a complex physical process involving the interaction of multiple atmospheric factors such as humidity, temperature, air pressure, and wind speed. While single radar echo maps can intuitively reflect current precipitation intensity and distribution, they are inadequate for comprehensively characterizing the underlying three-dimensional thermodynamic structure of the weather systems driving their evolution. Furthermore, radar observations themselves are constrained by the Earth’s curvature, terrain occlusion, signal attenuation, and limitations in station density, rendering many regions (e.g., oceans, plateaus, and remote areas) without effective coverage or high-quality continuous data. More importantly, precipitation events in meteorology exhibit a long-tail distribution, where light rain or no rain constitutes the majority of cases, while heavy rain and rainstorms occur infrequently. This distribution makes many standard methods poorly suited, leading to reduced forecasting accuracy for heavy rain and rainstorms.

The recent introduction of S-Mamba has substantiated the utility of state-space models in forecasting [43]. To address the aforementioned challenges, this paper proposes a spatiotemporal Transformer network fusing state-space models and CNNs for short-term precipitation forecasting. This model utilizes multiple meteorological variables (including temperature, humidity, and wind speed) to predict precipitation over the next 24 h. To effectively resolve the complex relationships between diverse meteorological variable data and precipitation data across regions of varying elevation, we first conduct a proportional analysis of precipitation data from two datasets, categorizing rainfall into light rain, moderate rain, heavy rain, and rainstorm. Subsequently, this work introduces the Attentive State-Space Module (ASSM) [44] from the image domain into the Vision Transformer (ViT) [45] to form a Convolutional State-Space Module (C-SSM) for precipitation forecasting. Leveraging its global information modeling capability, the C-SSM addresses the need to capture spatiotemporal dependencies across large-scale meteorological systems, thereby improving the extraction of spatiotemporal features from multiple meteorological variables. Additionally, a dual-branch structure is employed to fuse the C-SSM and Inception modules, capturing local information at different scales and enhancing feature extraction capability. Learnable parameters are introduced to adjust the information flow within the Inception modules, enabling better adaptation to the meteorological characteristics of different regions and thus enhancing the model’s generalization capability for representing regional precipitation patterns. Furthermore, to better understand the influence of different meteorological variables on precipitation, a relative importance analysis is performed on multiple variables within the TransMambaCNN model. The main contributions of this paper are as follows:

The Convolutional State-Space Module (C-SSM) is applied to precipitation forecasting. Its global information modeling capability captures the spatial and temporal correlations of the large-scale meteorological systems that drive the evolution of precipitation.
This paper proposes a dual-branch structure. One branch uses the C-SSM to capture long-range spatiotemporal dependencies and global patterns; the other adopts the Inception structure of SimVP to extract multi-scale local detail features. Learnable parameters dynamically adjust and fuse the features of the two branches so that the model can adapt to the meteorological characteristics of different regions (e.g., coastal vs. plateau), improving its ability to generalize across regional precipitation patterns.
We conduct an experimental analysis of the multiple variables in the dataset, investigating their relative importance with the permutation importance method and using the stepwise addition of variables to assess how synergistic effects among variables affect precipitation forecast accuracy.

We demonstrate the effectiveness of TransMambaCNN by comparing its results on two benchmark datasets with those of classical precipitation prediction models. In addition to innovations in the model architecture, we also conducted experiments on multiple meteorological variables and analyzed the results (detailed in Section 5), validating the effectiveness of multivariate precipitation forecasting. We believe our work will advance the development of short-term precipitation forecasting, providing more accurate precipitation predictions for practical applications.

2. Methodology

Problem Description: To formally define the spatiotemporal forecasting task addressed by our model, we consider the problem of predicting a sequence of future 24 h cumulative precipitation maps based on a sequence of past atmospheric observations. The task is structured as a sequence-to-sequence learning problem, mapping an input tensor of historical data to an output tensor of future precipitation. This entails learning a mapping between input and output tensors structured along spatial dimensions (Height H and Width W) and batch dimension B.

The input tensor $X \in \mathbb{R}^{B \times T \times C \times H \times W}$, where T represents the number of hourly timesteps in the input look-back period and C denotes the number of input variables (e.g., temperature, humidity, and wind components), is obtained from the ERA5 dataset.

The output tensor $Y \in \mathbb{R}^{B \times T \times 1 \times H \times W}$ represents the single-target variable precipitation. The model uses input sequences from the past 24 h to predict the 24 h cumulative precipitation for the subsequent 24 h period.

2.1. TransMambaCNN

In precipitation forecasting, rainfall is closely related to multiple meteorological variables. The TransMambaCNN model aims to effectively integrate multi-source meteorological variables and collaboratively extract key spatiotemporal features to enhance forecasting accuracy. As shown in Figure 1, the model consists of three core modules: an encoder, a spatiotemporal feature fusion module, and a decoder. The encoder module first downsamples the input, compressing the spatial dimensions while integrating information from multiple meteorological variables and extracting high-level abstract features. This enhances the model’s ability to capture key driving patterns of precipitation. The spatiotemporal feature fusion module processes the features $f_{1}$ output by the encoder and the initial Inception block using a dual-branch structure: one branch utilizes the Convolutional State-Space Module (C-SSM) to capture long-range spatiotemporal dependencies, while the other branch employs the Inception structure from SimVP to extract local detail features through parallel multi-scale convolutional kernels. The outputs from both branches are fused to form a spatiotemporal representation $f_{i}$, improving the modeling capability for complex meteorological systems, especially extreme precipitation events.

Finally, the decoder module is responsible for upsampling the fused features F to recover spatial details. It skillfully incorporates skip connections from the encoder to supplement detail loss occurring during spatiotemporal modeling, ultimately outputting the predicted precipitation data Y. The encoder module is represented by the function Down; the spatiotemporal feature fusion module is represented by the function SFFM; and the decoder module is represented by the function Up. The corresponding equations are as follows:

$$ f_{1} = \mathrm{Inception}(\mathrm{Down}(X)) \tag{1} $$

$$ F = \mathrm{SFFM}(f_{1}) \tag{2} $$

$$ Y = \mathrm{Up}(F + \mathrm{Down}(X)) \tag{3} $$

The model processes input tensors of shape $X \in \mathbb{R}^{B \times T \times C \times H \times W}$ through three key transformation stages. The encoder first reshapes the input to $\mathbb{R}^{(B \times T) \times C \times H \times W}$ and produces an output in the original $\mathbb{R}^{B \times T \times C \times H \times W}$ format with modified channel and spatial dimensions. Before entering the spatiotemporal fusion module, the tensor is reshaped to $\mathbb{R}^{B \times (T \times C) \times H \times W}$, with the module output returning to the $\mathbb{R}^{B \times T \times C \times H \times W}$ format. Finally, the decoder takes $\mathbb{R}^{(B \times T) \times C \times H \times W}$ as input and generates predictions matching the target output shape $\mathbb{R}^{B \times T \times C \times H \times W}$. Throughout this pipeline, both the channel (C) and spatial (H, W) dimensions undergo progressive transformations; this description covers only tensor shape changes, not specific dimensional values or underlying operations.
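The reshaping sequence described above can be illustrated with a minimal NumPy sketch. The concrete sizes (B = 2, T = 24, C = 4, a 32 × 32 grid, 16 hidden channels, and a factor-of-two spatial reduction) are assumptions chosen only for illustration, and the encoder is replaced by a zero-filled stand-in, since only the tensor layouts are of interest here.

```python
import numpy as np

# Illustrative sizes only; these values are assumptions, not taken from the paper.
B, T, C, H, W = 2, 24, 4, 32, 32
x = np.zeros((B, T, C, H, W), dtype=np.float32)

# Encoder view: fold time into the batch axis so that 2D operations
# process every timestep as an independent image.
enc_in = x.reshape(B * T, C, H, W)

# Stand-in for the encoder: assume it maps C -> C_hid channels and halves H and W.
C_hid, H2, W2 = 16, H // 2, W // 2
enc_out = np.zeros((B * T, C_hid, H2, W2), dtype=np.float32)
enc_out = enc_out.reshape(B, T, C_hid, H2, W2)

# Fusion-module view: merge time and channels into a single channel axis.
fusion_in = enc_out.reshape(B, T * C_hid, H2, W2)

# The fusion output returns to the five-dimensional layout ...
fusion_out = fusion_in.reshape(B, T, C_hid, H2, W2)

# ... and the decoder again folds time into the batch axis.
dec_in = fusion_out.reshape(B * T, C_hid, H2, W2)

print(enc_in.shape, fusion_in.shape, dec_in.shape)
```

Because these are pure reshapes of contiguous arrays, no data is copied or reordered; only the interpretation of the axes changes between stages.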

2.2. Spatiotemporal Feature Fusion Module

The core components of the spatiotemporal feature fusion module include the Convolutional State-Space Module (C-SSM), the Inception module, and the Spatial Feature Transformation Layer (SFTLayer) [46]. As shown in Figure 1a, this module takes as input the feature information $f_{1}$ output by the encoder after downsampling and initial Inception processing. $f_{1}$ subsequently enters a dual-branch processing pipeline, whereby the C-SSM branch focuses on extracting long-range spatiotemporal dependency features, while the Inception branch captures multi-scale local fine-grained features. To effectively fuse these two types of complementary information and enable flexible regulation, the module introduces the SFTLayer for feature fusion. Notably, considering the variability in meteorological characteristics across different regions, not all local information is equally important. Therefore, the module applies a learnable parameter scaling mechanism to the output of the Inception branch. This mechanism dynamically adjusts the contribution weight of local features, thereby generating feature representations better adapted to region-specific meteorological patterns. Information from $f_{1}$ undergoes deep spatiotemporal modeling through multiple stacked units of C-SSM, Inception, and SFTLayer combinations, ultimately extracting and outputting a powerful fused spatiotemporal feature F. The C-SSM is represented by the function CSSM. The learnable scaling parameter is denoted by $S_{i}$, with its initial value displayed in the corresponding figure. The specific equations are as follows:

$$ f_{2} = \mathrm{SFTLayer}(\mathrm{CSSM}(f_{1}),\; S_{1} * \mathrm{Inception}(f_{1})) \tag{4} $$

$$ f_{i} = \mathrm{SFTLayer}(\mathrm{CSSM}(f_{i-1} + f_{i-2}),\; S_{i-1} * \mathrm{Inception}(f_{i-1})), \quad 6 \geq i \geq 3 \tag{5} $$

$$ F = S_{7} * \mathrm{CSSM}(f_{6} + f_{5}) + S_{6} * \mathrm{Inception}(f_{6}) \tag{6} $$
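The recursion in Equations (4)–(6) can be traced with a short sketch. The three operators below are placeholders, not the paper's modules: `cssm` and `inception` are identity functions and `sft_layer` is a plain sum. Only the wiring of the indices and the scaling parameters is meant to mirror the equations.

```python
import numpy as np

def cssm(x):
    """Placeholder for the global C-SSM branch (identity, for illustration)."""
    return x

def inception(x):
    """Placeholder for the local Inception branch (identity, for illustration)."""
    return x

def sft_layer(fa, fb):
    """Placeholder fusion (plain sum); the actual SFT layer is an affine modulation."""
    return fa + fb

def sffm(f1, scales):
    """Wire the branches as in Eqs. (4)-(6); `scales` maps i -> S_i for i = 1..7."""
    f = {1: f1}
    f[2] = sft_layer(cssm(f[1]), scales[1] * inception(f[1]))            # Eq. (4)
    for i in range(3, 7):                                                # Eq. (5), i = 3..6
        f[i] = sft_layer(cssm(f[i - 1] + f[i - 2]),
                         scales[i - 1] * inception(f[i - 1]))
    return scales[7] * cssm(f[6] + f[5]) + scales[6] * inception(f[6])   # Eq. (6)

scales = {i: 1.0 for i in range(1, 8)}   # hypothetical values; the paper learns these
F = sffm(np.ones((4, 8, 8)), scales)
print(F.shape)
```

Note that $S_{1}$ to $S_{5}$ scale the Inception output inside the fused stages, while $S_{6}$ and $S_{7}$ weight the two branches directly in the final, unfused sum.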

2.3. Convolutional State-Space Module

The C-SSM structure is similar to the ViT (Vision Transformer) architecture, primarily consisting of the following components: depthwise convolution, Norm (normalization), ASSM, depthwise convolution, Norm, MLP (multi-layer perceptron) structure, and residual connections. As shown in Figure 1b, depthwise convolution is first applied to the input features to effectively preserve positional sensitivity and ensure the retention of local spatial continuity characteristics. The core innovation lies in replacing the traditional causal state-space model with the non-causal ASSM. The ASSM eliminates the causality constraint, whereby the features of the current grid point depend solely on previously scanned points, enabling direct interaction between distant meteorological regions exhibiting similar patterns. This capability is crucial for capturing large-scale correlation patterns within precipitation systems. The specific processing workflow is as follows. Feature information $f_{i}$ first undergoes a residual connection with $f_{i-1}$. The summed features are subsequently fed into the depthwise convolution module. Afterward, the output passes through a normalization layer. The ASSM then performs the core spatiotemporal feature extraction on the normalized features. Following extraction, the features undergo another normalization step. Finally, the MLP layer further enhances the spatiotemporal modeling capabilities, outputting the processed spatiotemporal feature $FA_{i+1}$. The core computational flow is formally expressed as follows:

$$ f_{a} = f_{i} + f_{i-1}, \quad 6 \geq i \geq 2 \tag{7} $$

$$ f_{a} = f_{1} \tag{8} $$

$$ f_{a1} = f_{a} + \mathrm{Conv2d}(f_{a}) \tag{9} $$

$$ f_{a2} = \mathrm{ASSM}(\mathrm{Norm}(f_{a1})) + f_{a1} \tag{10} $$

$$ f_{a3} = f_{a2} + \mathrm{Conv2d}(f_{a2}) \tag{11} $$

$$ FA_{i+1} = \mathrm{MLP}(\mathrm{Norm}(f_{a3})) + f_{a3} \tag{12} $$
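As a rough sketch of this residual flow, the block below chains the stages in the order given in the text (depthwise convolution, Norm, ASSM, depthwise convolution, Norm, MLP). The final stage is written as an MLP, following that component order. All learned operators are passed in as callables and default to the identity, which is an assumption made purely so the data flow can be run; `channel_norm` is a hypothetical normalization over the channel axis, as the type of Norm is not specified here.

```python
import numpy as np

def channel_norm(x, eps=1e-5):
    """Hypothetical Norm: standardize each grid point across channels; x is (C, H, W)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def identity(x):
    return x

def cssm_block(f_i, f_prev=None, dwconv1=identity, assm=identity,
               dwconv2=identity, mlp=identity, norm=channel_norm):
    """Residual data flow of one C-SSM block; operators are stand-ins, not trained layers."""
    f_a = f_i if f_prev is None else f_i + f_prev   # Eq. (8) for the first block, else Eq. (7)
    f_a1 = f_a + dwconv1(f_a)                       # Eq. (9): depthwise conv with residual
    f_a2 = assm(norm(f_a1)) + f_a1                  # Eq. (10): ASSM on normalized features
    f_a3 = f_a2 + dwconv2(f_a2)                     # Eq. (11): second depthwise conv
    return mlp(norm(f_a3)) + f_a3                   # final stage: MLP on normalized features

ones = np.ones((3, 4, 4))
out = cssm_block(ones, ones, norm=identity)
print(out.shape)
```

Each stage adds its result to its own input, so an untrained or zero-output operator leaves the features unchanged rather than erasing them.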

Attentive State-Space Module

The Attentive State-Space Module (ASSM) [44] acts as the core block of our C-SSM. As shown in Figure 2, given the input feature $x \in \mathbb{R}^{H \times W \times C}$, where H and W are the height and width, respectively, and C is the channel dimension, we first apply the positional encoding on x to preserve the original structure information. After that, we employ Semantic-Guided Neighboring (SGN) to unfold the 2D sequences into 1D sequences for subsequent Attentive State-Space Equation (ASE) [44] modeling. Finally, another round of SGN is employed as the inverse operator of the previous one in order to fold the sequence back to the 2D form, followed by a linear projection to obtain the block output. More details are given below.

SGN-unfold transforms the 2D feature map into a 1D sequence $x_{i} \in \mathbb{R}^{L \times C}$, where L equals H × W, rearranging semantically similar information closer together in the sequence. This sequence $x_{i}$ is subsequently fed into the ASE (Attentive State-Space Equation), which outputs a new sequence $y_{i} \in \mathbb{R}^{L \times d}$. Finally, the SGN-fold operation reconstructs the 2D structure from this sequence, resulting in a feature map $y \in \mathbb{R}^{H \times W \times d}$, which is then processed by a linear projection layer to restore the original channel dimension, yielding the final output of shape $\mathbb{R}^{H \times W \times C}$.

Within the ASSM structure, the limitations of traditional state-space models are broken. Through SGN-unfold, ASE, and SGN-fold, information with similar semantics is linked together. This ensures that similar information remains adjacent during processing in the ASE, which incorporates global information even when that information has not been scanned yet in the unidirectional sequence. In precipitation forecasting, rainfall is highly regional, and extreme precipitation events are particularly rare. This capability of linking similar information enhances the capture of features critical for precipitation forecasting, which is exceptionally important when forecasting based on multiple meteorological variables, especially for extreme precipitation. Our experiments have also proven this point, demonstrating that it mitigates the issue of under-prediction (low bias) in precipitation forecasts.

$$ x_{i} = \mathrm{SGNunfold}(x) \tag{13} $$

$$ y_{i} = \mathrm{ASE}(x_{i}) \tag{14} $$

$$ y = \mathrm{Linear}(\mathrm{SGNfold}(y_{i})) \tag{15} $$
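One simple way to picture the unfold and fold pair in Equations (13) and (15) is as a permutation and its inverse. In the sketch below, `labels` is a hypothetical per-pixel integer semantic id; how such ids are obtained in the ASSM is not covered here. Sorting by label places similar positions next to each other in the 1D sequence, and the inverse permutation restores the 2D layout.

```python
import numpy as np

def sgn_unfold(x, labels):
    """Flatten x (H, W, C) to (L, C), ordering positions by a hypothetical semantic label."""
    H, W, C = x.shape
    flat = x.reshape(H * W, C)
    # Stable sort keeps the raster order among positions that share a label.
    order = np.argsort(labels.reshape(-1), kind="stable")
    return flat[order], order

def sgn_fold(seq, order, H, W):
    """Inverse of sgn_unfold: undo the permutation and restore the (H, W, d) layout."""
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return seq[inverse].reshape(H, W, seq.shape[-1])

x = np.arange(24, dtype=float).reshape(2, 3, 4)   # H = 2, W = 3, C = 4
labels = np.array([[2, 0, 1],
                   [0, 2, 1]])                    # illustrative semantic ids
seq, order = sgn_unfold(x, labels)
restored = sgn_fold(seq, order, 2, 3)
print(labels.reshape(-1)[order])                  # labels along the 1D sequence
```

In the full module, the sequence model (ASE) would run between the two calls, so that positions sharing a label interact as neighbors even when they are far apart on the grid.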

2.4. Inception Module

This module employs the Inception structure design from SimVP, serving a critical function within the spatiotemporal feature fusion module. As shown in Figure 1c, by performing parallel multi-scale feature extraction, it enhances the model’s ability to discern local meteorological patterns. Tailored to the dimensions of the feature maps, the module uses four distinct convolutional kernel sizes (3 × 3, 5 × 5, 7 × 7, and 11 × 11) in parallel. The implementation proceeds as follows. The input features $f_{i}$ are first processed through a 1 × 1 convolution for fundamental feature transformation. Subsequently, they are fed in parallel into four independent branches. Each branch executes a grouped convolution operation (GroupConv2d) using one of the four specified kernel sizes to extract multi-scale features, generating the corresponding scale-specific feature maps $f_{i3 \times 3}$, $f_{i5 \times 5}$, $f_{i7 \times 7}$, and $f_{i11 \times 11}$. Finally, these multi-scale representations are fused by element-wise summation, yielding the enhanced output features $FB_{i+1}$. This process can be represented as follows:

$$ f_{i3 \times 3} = \mathrm{GroupConv2d}_{3 \times 3}(\mathrm{conv}_{1 \times 1}(f_{i})) \tag{16} $$

$$ f_{i5 \times 5} = \mathrm{GroupConv2d}_{5 \times 5}(\mathrm{conv}_{1 \times 1}(f_{i})) \tag{17} $$

$$ f_{i7 \times 7} = \mathrm{GroupConv2d}_{7 \times 7}(\mathrm{conv}_{1 \times 1}(f_{i})) \tag{18} $$

$$ f_{i11 \times 11} = \mathrm{GroupConv2d}_{11 \times 11}(\mathrm{conv}_{1 \times 1}(f_{i})) \tag{19} $$

$$ FB_{i+1} = f_{i3 \times 3} + f_{i5 \times 5} + f_{i7 \times 7} + f_{i11 \times 11} \tag{20} $$
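The multi-scale pattern of Equations (16)–(20) can be reduced to a single-channel NumPy sketch. This is a simplification under stated assumptions: the 1 × 1 convolution collapses to a scalar weight, grouped convolution is not reproduced, and the kernels are normalized box filters rather than learned weights. What it does show is the shared input, the four kernel sizes, and the summation of same-sized outputs.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation of a 2D map x with an odd-sized square kernel."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, pad)                       # constant (zero) padding on both axes
    out = np.zeros(x.shape, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += kernel[di, dj] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def inception_sum(x, kernels, w_1x1=1.0):
    """Shared 1x1 transform, one branch per kernel size, branch outputs summed (Eq. 20)."""
    h = w_1x1 * x                             # single-channel stand-in for conv 1x1
    return sum(conv2d_same(h, kern) for kern in kernels.values())

# Hypothetical weights: normalized box filters for the four kernel sizes in the text.
kernels = {k: np.full((k, k), 1.0 / k**2) for k in (3, 5, 7, 11)}
x = np.ones((16, 16))
y = inception_sum(x, kernels)
print(y.shape)
```

Because every branch uses 'same' padding, all four outputs keep the input's spatial size, which is what makes the element-wise sum in Equation (20) well defined.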

2.5. SFT Layer

To further integrate the broad-range spatiotemporal correlation features $FA_{i+1}$ captured by the C-SSM module with the multi-scale local meteorological features $FB_{i+1}$ extracted by the Inception module, as shown in Figure 3, the model incorporates the spatial feature transformation layer (SFT layer) proposed by Wang et al. [46]. The SFT layer operates within the spatiotemporal fusion module, which processes input tensors of shape $\mathbb{R}^{B \times (T \times C) \times H \times W}$. This layer first concatenates $FA_{i+1}$ and $FB_{i+1}$ along the channel dimension (using torch.cat). Since the outputs from both parallel branches have identical shapes, no broadcasting operation is required during the fusion process. Subsequently, it utilizes two independent cascaded convolutional pathways to learn multiplicative modulation parameters ($\mathrm{mul}$) and additive modulation parameters ($\mathrm{add}$), respectively, from the concatenated features. Finally, dynamic modulation and the deep fusion of the main spatiotemporal features $FA_{i+1}$ by the conditional features $FB_{i+1}$ are achieved through the affine transformation $FA_{i+1} \times \mathrm{mul} + \mathrm{add}$, generating a more information-comprehensive integrated feature. The specific equations are as follows:

$$ \mathrm{mul} = \mathrm{sigmoid}(\mathrm{Conv2d}(\mathrm{LeakyReLU}(\mathrm{Conv2d}(\mathrm{cat}(FA_{i+1}, FB_{i+1}))))), \quad 5 \geq i \geq 1 \tag{21} $$

$$ \mathrm{add} = \mathrm{Conv2d}(\mathrm{LeakyReLU}(\mathrm{Conv2d}(\mathrm{cat}(FA_{i+1}, FB_{i+1})))), \quad 5 \geq i \geq 1 \tag{22} $$

$$ f_{i+1} = FA_{i+1} \times \mathrm{mul} + \mathrm{add}, \quad 5 \geq i \geq 1 \tag{23} $$
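A minimal NumPy sketch of the affine modulation in Equations (21)–(23) is given below. Several details are assumptions made for illustration: the convolutions are taken to be 1 × 1 (so each reduces to a matrix product over channels), the LeakyReLU slope is set to 0.1, the hidden width is arbitrary, and the weights are random rather than trained. A single sample of shape (C, H, W) stands in for the batched tensor.

```python
import numpy as np

def conv1x1(x, w, b):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def leaky_relu(x, slope=0.1):
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sft_layer(fa, fb, p):
    """Modulate the main features fa with parameters predicted from cat(fa, fb)."""
    z = np.concatenate([fa, fb], axis=0)                                   # channel concat
    mul = sigmoid(conv1x1(leaky_relu(conv1x1(z, p["w1m"], p["b1m"])),
                          p["w2m"], p["b2m"]))                             # Eq. (21)
    add = conv1x1(leaky_relu(conv1x1(z, p["w1a"], p["b1a"])),
                  p["w2a"], p["b2a"])                                      # Eq. (22)
    return fa * mul + add                                                  # Eq. (23)

def init_params(c, hidden, rng, scale=0.1):
    """Random weights for the two independent pathways ('m' for mul, 'a' for add)."""
    p = {}
    for tag in ("m", "a"):
        p["w1" + tag] = scale * rng.standard_normal((hidden, 2 * c))
        p["b1" + tag] = np.zeros(hidden)
        p["w2" + tag] = scale * rng.standard_normal((c, hidden))
        p["b2" + tag] = np.zeros(c)
    return p

rng = np.random.default_rng(0)
fa = rng.standard_normal((8, 16, 16))   # stand-in for the C-SSM branch output
fb = rng.standard_normal((8, 16, 16))   # stand-in for the Inception branch output
fused = sft_layer(fa, fb, init_params(8, 16, rng))
print(fused.shape)
```

The sigmoid bounds the multiplicative term between 0 and 1, so it acts as a per-position gate on the global features, while the additive term carries the local correction.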

3. Experiments

This section first introduces the datasets used and preprocessing procedures in Section 3.1, followed by the experimental setup and evaluation metrics in Section 3.2. Section 3.3 presents the experimental results and analysis of different models.

3.1. Datasets

This study utilizes the ERA5 reanalysis dataset (0.25° × 0.25° spatial resolution) released by the European Centre for Medium-Range Weather Forecasts (ECMWF) [47,48]. It focuses on the arid Sanjiangyuan region in Qinghai Province of western China and the humid southeastern China with its adjacent marine areas, aiming to achieve 24 h precipitation forecasting. As shown in Table 1, temperature, humidity, u-wind component, v-wind component, and precipitation were selected as predictive variables. During data processing, precipitation data were first processed using the ReLU function to filter out invalid negative values (aberrant precipitation data) and then multiplied by 1000 to convert the unit to millimeters (mm). Subsequently, to mitigate the effects of differing units and scales among various meteorological variables (temperature, humidity, and u-/v-wind components), all variables underwent normalization. This step helps balance data across different variables and elevation levels, thereby enhancing model training efficiency and forecast accuracy. Finally, after completing the aforementioned preprocessing and normalization steps, the data were converted into a format suitable for model input.
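The preprocessing steps described above can be sketched as follows. The precipitation handling follows the text (clip negatives with ReLU, then multiply by 1000 to convert to millimeters). The normalization method is not specified in this section, so the per-variable z-score with training-set statistics shown here is an assumption, as are the function names and the (N, C, H, W) array layout.

```python
import numpy as np

def precip_to_mm(tp):
    """Apply ReLU to remove invalid negative values, then scale by 1000 to get mm."""
    return np.maximum(tp, 0.0) * 1000.0

def fit_zscore(train):
    """Per-variable mean and std over samples and grid points; train is (N, C, H, W)."""
    mean = train.mean(axis=(0, 2, 3), keepdims=True)
    std = train.std(axis=(0, 2, 3), keepdims=True)
    return mean, std

def apply_zscore(x, mean, std, eps=1e-8):
    """Standardize with statistics fitted on the training split only."""
    return (x - mean) / (std + eps)

raw_precip = np.array([-1e-5, 0.0, 0.0123])
precip_mm = precip_to_mm(raw_precip)

rng = np.random.default_rng(0)
# Synthetic stand-in for four variables with very different scales.
train = rng.normal(loc=[[[[280.0]], [[0.01]], [[3.0]], [[-2.0]]]],
                   scale=[[[[10.0]], [[0.005]], [[5.0]], [[5.0]]]],
                   size=(32, 4, 8, 8))
mean, std = fit_zscore(train)
train_norm = apply_zscore(train, mean, std)
print(precip_mm)
```

Fitting the statistics on the training years and reusing them for validation and test data avoids leaking information from the evaluation periods into the inputs.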

3.2. Experimental Setup

This experiment utilized the preprocessed dataset described above, with the training set covering 2009–2018, the validation set spanning 2019–2021, and the test set encompassing 2022–2023. The learning rate was set to 0.0001, and the batch size was 8; the Adam optimizer was combined with a OneCycleLR learning rate schedule initialized at 0.0001, with a training cycle of 51 iterations. DropPath served as the sole regularization technique, while MSELoss was used as the objective function. To evaluate the performance of the proposed method, several metrics were employed: Mean Squared Error (MSE), Mean Absolute Error (MAE), Threat Score (TS), Probability of Detection (POD), and False Alarm Rate (FAR). For the 24 h precipitation forecast, precipitation thresholds corresponding to light rain (L.R), moderate rain (M.R), heavy rain (H.R), and rainstorm (R.S) were set at 0.1 mm, 10 mm, 25 mm, and 50 mm, respectively. Using these precipitation thresholds, the rainfall amounts were converted into binary values of 0 or 1 to calculate the metrics based on hits, misses, false alarms, and correct negatives, as illustrated in Table 2. These metrics were used to further assess the predictive capability of the model. The specific equations are as follows, where H, M, and FA denote the numbers of hits, misses, and false alarms, respectively; $y_{i}$ and $\hat{y}_{i}$ denote the observed and predicted precipitation; and n is the number of samples:

\begin{matrix} \mathrm{TS} = \frac{H}{H + M + \mathrm{FA}} \end{matrix}

(24)

\begin{matrix} \mathrm{POD} = \frac{H}{H + M} \end{matrix}

(25)

\begin{matrix} \mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} | \end{matrix}

(26)

\begin{matrix} \mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} ( y_{i} - \hat{y}_{i} )^{2} \end{matrix}

(27)

\begin{matrix} \mathrm{FAR} = \frac{\mathrm{FA}}{H + \mathrm{FA}} \end{matrix}

(28)
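Equations (24)–(28) can be evaluated with a few lines of code. The sketch below assumes flat sequences of observed and forecast 24 h precipitation in millimetres and treats a value as an event when it is greater than or equal to the threshold; the function names, and the choice to return NaN when a denominator is zero, are illustrative assumptions rather than the authors' implementation.

```python
def contingency_counts(observed, forecast, threshold):
    """Count hits, misses, false alarms and correct negatives at one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, fc in zip(observed, forecast):
        obs_event = obs >= threshold
        fc_event = fc >= threshold
        if obs_event and fc_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif fc_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def _ratio(numerator, denominator):
    # ASSUMPTION: report NaN when the score is undefined (zero denominator).
    return numerator / denominator if denominator > 0 else float("nan")


def categorical_scores(observed, forecast, threshold):
    """TS, POD and FAR as defined in Equations (24), (25) and (28)."""
    h, m, fa, _ = contingency_counts(observed, forecast, threshold)
    return {
        "TS": _ratio(h, h + m + fa),
        "POD": _ratio(h, h + m),
        "FAR": _ratio(fa, h + fa),
    }


def mae(observed, forecast):
    """Mean absolute error, Equation (26)."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)


def mse(observed, forecast):
    """Mean squared error, Equation (27)."""
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)


# Hypothetical values for illustration only (mm per 24 h).
observed = [0.0, 12.0, 30.0, 60.0, 5.0, 26.0]
forecast = [0.0, 15.0, 20.0, 55.0, 27.0, 40.0]
heavy_rain_scores = categorical_scores(observed, forecast, threshold=25.0)
```

For the hypothetical values above and the 25 mm threshold there are two hits, one miss, and one false alarm, so TS = 2/4, POD = 2/3, and FAR = 1/3.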

3.3. Experimental Results and Analysis

This section compares our proposed method, TransMambaCNN, with classical approaches including ViT [45], SimVP [28], TAU [49], ConvLSTM [16], PredRNN [18], MIM [20], PredRNN++ [19], and PredFormer [50]. To enhance precipitation forecasting, a statistical analysis of precipitation levels was conducted on the two preprocessed datasets spanning 2009–2023. As shown in Figure 4, precipitation was categorized into light rain (L.R), moderate rain (M.R), heavy rain (H.R), and rainstorm (R.S) using thresholds of 0.1 mm, 10 mm, 25 mm, and 50 mm, respectively. Figure 4a presents the statistics of 24 h precipitation from April to September for southeastern China and adjacent marine areas. The results indicate that despite selecting data from a season with frequent precipitation, heavy rain and rainstorm events remain relatively infrequent. Specifically, precipitation levels are concentrated in light rain events (53.93%), followed by no rain (28.08%) and moderate rain (12.21%), with heavy rain and rainstorm being scarcer (4.42% and 1.36%, respectively). Figure 4b shows the annual 24 h precipitation statistics for the Sanjiangyuan region in Qinghai Province of western China. In contrast to southeastern China and adjacent marine areas characterized by high temperatures, humidity, frequent typhoons, and heavier precipitation, the Sanjiangyuan region in Qinghai Province of western China exhibits a predominance of no rain and light rain events (42.14% and 53.93%, respectively). Heavy rain and rainstorm events are extremely rare, with rainstorm events registering 0% and heavy rain only 0.07%. Moderate rain accounts for 3.17%. A more detailed breakdown is shown in Table 3.

Although many of the aforementioned methods were designed for sequence forecasting, for a fair comparison they were all trained on the same dataset, with multiple meteorological variables (precipitation, temperature, humidity, and the u- and v-wind components) as inputs. As presented in Table 4, a comprehensive evaluation was conducted on both datasets, with MAE, MSE, SSIM, POD, FAR, and TS serving as evaluation metrics. The experimental results demonstrate that TransMambaCNN outperforms the other models overall. Specifically, for southeastern China and adjacent marine areas, the model achieved the lowest MSE (70.2023) and the second-lowest MAE (4.2595). Although its MAE is slightly higher than PredRNN’s 4.1062, its markedly lower MSE highlights the model’s strength in suppressing large errors, because MSE penalizes large deviations more heavily than MAE. The model’s SSIM is also relatively good on both datasets. The model performs particularly well for precipitation events of moderate rain and above. At the three key thresholds (≥10 mm, ≥25 mm, and ≥50 mm), its TS scores (0.4675, 0.3083, and 0.1750, respectively) lead all baseline models. Notably, its TS for extreme precipitation (≥50 mm) is 34.2% higher than that of the second-best model, PredFormer (0.1304). Additionally, as presented in Table 5, TransMambaCNN demonstrates clear advantages in extreme precipitation forecasting, particularly under high-threshold conditions (≥50 mm). In southeastern China and adjacent marine areas, the model achieved the highest POD (0.2199), a 40.15% improvement over PredFormer (0.1569), while maintaining a lower FAR (0.5384 vs. 0.5642 for PredFormer).
Similarly, in the Sanjiangyuan region of Qinghai, the model exhibited the best performance at the ≥25 mm threshold (POD 0.0197), which is 5.35 times higher than that of PredFormer (0.0031), along with a significantly reduced FAR (0.4898 vs. 0.8000). Compared to the baseline models, our solution not only significantly improves the detection of heavy precipitation but also keeps the false alarm rate at an acceptable level. This characteristic makes the model especially suitable for critical applications such as flood warning and extreme weather monitoring, providing more reliable technical support for disaster prevention and mitigation.
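The percentage and "times higher" figures quoted above are relative increases over the baseline score. As a quick arithmetic check, the sketch below recomputes them from the TS and POD values reported in the text; the helper name is an assumption introduced for illustration.

```python
def relative_improvement(model_score, baseline_score):
    """Fractional gain of model_score over baseline_score: (model - baseline) / baseline."""
    return (model_score - baseline_score) / baseline_score


# Scores quoted in the text (TransMambaCNN vs. PredFormer).
ts_gain_50mm = relative_improvement(0.1750, 0.1304)   # TS, >=50 mm, southeastern China
pod_gain_50mm = relative_improvement(0.2199, 0.1569)  # POD, >=50 mm, southeastern China
pod_gain_25mm = relative_improvement(0.0197, 0.0031)  # POD, >=25 mm, Sanjiangyuan

print(round(ts_gain_50mm * 100, 1))    # 34.2 (percent)
print(round(pod_gain_50mm * 100, 2))   # 40.15 (percent)
print(round(pod_gain_25mm, 2))         # 5.35 (times higher)
```

Read this way, "5.35 times higher" means the POD exceeds the baseline by 5.35 times the baseline value, that is, it is about 6.35 times as large.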

In the comprehensive evaluation for the Sanjiangyuan region in Qinghai Province of western China, the TS score was not calculated for the ≥50 mm threshold, since rainstorm events (≥50 mm) constitute 0% of precipitation in this region (Figure 4). The proposed model demonstrates clear advantages in both overall error control and the forecasting of precipitation events of moderate rain and above within this region. Specifically, it achieved the lowest MAE (0.9896) among the compared models, and its TS score at the key ≥25 mm threshold (0.0193) is superior to those of the baseline models. Heavy rain events (≥25 mm) constitute only 0.7% of precipitation in this region (Figure 4) and are therefore rare. PredRNN, which performs relatively well in southeastern China and adjacent marine areas, predicts heavy rainfall poorly here, where precipitation is extremely sparse, and the other baseline models behave similarly. In contrast, TransMambaCNN achieved a TS score of 0.0193 for ≥25 mm events, which is 5.22 times higher than the 0.0031 score of the second-best model, PredFormer. This highlights the model’s effectiveness in capturing low-probability heavy rain events within the complex high-altitude terrain of the Sanjiangyuan region in Qinghai Province, western China.

In summary, whether in southeastern China and adjacent marine areas with abundant and diverse precipitation intensities, or in the Sanjiangyuan region in Qinghai Province of western China characterized by scarce precipitation and complex terrain, the TransMambaCNN model demonstrates a significantly superior comprehensive performance for 24 h precipitation forecasting compared to the baseline models.

Furthermore, to provide a more intuitive demonstration of model performance, 24 h precipitation forecast maps from different models were compared against ground truth maps for both the southeastern China and adjacent marine areas dataset and the Sanjiangyuan region in Qinghai Province of western China dataset within their respective test sets. As shown in Figure 5, the precipitation map generated by TransMambaCNN for southeastern China and adjacent marine areas (Figure 5a) demonstrates a closer alignment with the true precipitation distribution, particularly excelling in the forecasting of heavy precipitation events. Similarly, the precipitation map for the Sanjiangyuan region in Qinghai Province of western China (Figure 5b) reveals that TransMambaCNN’s forecast is markedly closer to the ground truth. Other models exhibit deficiencies such as an overly extensive forecast range, leading to a high false alarm rate, or an overly confined scope, resulting in a high miss rate. In contrast, TransMambaCNN most accurately captures both the intensity and spatial extent of the precipitation. Notably, despite the significantly higher precipitation intensity in southeastern China and adjacent marine areas compared to the arid Qinghai Sanjiangyuan region of western China, TransMambaCNN maintains excellent performance across both regions, with its advantage in forecasting precipitation events being particularly prominent.

Beyond visual comparisons, we further analyzed the spatial distribution of the Threat Score (TS) for all models. Figure 6 displays the spatial TS distributions for multiple models at thresholds of ≥25 mm and ≥50 mm. Consistent with the data statistics, the TS spatial distribution was not calculated for the ≥50 mm threshold in the dataset of the Qinghai Sanjiangyuan region in western China. To ensure the stability of the TS spatial maps, TS was computed for each grid point across the entire test set, and the resulting scores were visualized. Each map is a direct spatial representation of model performance relative to ground truth, where darker colors indicate higher TS values and better forecast skill. The results for the southeastern China and adjacent marine areas dataset (Figure 6a) show that TransMambaCNN has clear advantages for heavy rain (≥25 mm) and especially for rainstorm events (≥50 mm). Similar results are shown for the dataset of the Qinghai Sanjiangyuan region in western China (Figure 6b), where TAU and ViT perform noticeably worse. For moderate rain forecasts (≥10 mm), while the MIM model shows more grid points with higher TS values (orange), the SimVP model exhibits a larger spatial extent of moderate TS values (yellow). TransMambaCNN effectively combines the strengths of both patterns. Its advantage becomes even more pronounced for heavy rain forecasting (≥25 mm). The TS spatial distribution maps across both datasets consistently highlight TransMambaCNN’s distinct superiority, particularly for predicting precipitation events.
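The per-grid-point TS described above can be sketched as follows: hits, misses, and false alarms are accumulated at each grid point over all test samples before the ratio is taken. The data layout (a list of 2-D grids, one per forecast day) and the use of NaN for grid points with no observed or forecast event are assumptions made for illustration.

```python
def ts_map(observed, forecast, threshold):
    """Threat Score at every grid point, accumulated over all time steps.

    observed, forecast: lists over time of 2-D grids (lists of rows), in mm.
    Returns a 2-D grid of TS values; NaN marks points with no event at all
    (ASSUMPTION: the paper does not state how such points are handled).
    """
    n_rows = len(observed[0])
    n_cols = len(observed[0][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            hits = misses = false_alarms = 0
            for obs_field, fc_field in zip(observed, forecast):
                obs_event = obs_field[r][c] >= threshold
                fc_event = fc_field[r][c] >= threshold
                if obs_event and fc_event:
                    hits += 1
                elif obs_event:
                    misses += 1
                elif fc_event:
                    false_alarms += 1
            denominator = hits + misses + false_alarms
            row.append(hits / denominator if denominator else float("nan"))
        result.append(row)
    return result


# Hypothetical 1 x 2 grid observed on two days (mm per 24 h).
observed = [[[30.0, 0.0]], [[26.0, 0.0]]]
forecast = [[[28.0, 0.0]], [[10.0, 0.0]]]
heavy_rain_ts = ts_map(observed, forecast, threshold=25.0)
```

Accumulating counts over the whole test set before dividing, instead of averaging per-day scores, avoids undefined daily values at grid points where the event rarely occurs.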

4. Ablation Study

4.1. Effect of C-SSM

To validate the effectiveness of the proposed C-SSM module, experimental evaluations were conducted on two datasets. The core of this work lies in integrating the ASSM module with a deep convolutional structure to construct the C-SSM module, and subsequently performing spatiotemporal modeling by stacking multiple such modules, forming the TransMamba architecture for precipitation forecasting. The experimental results, presented in Table 6, demonstrate that the Mamba-based TransMamba model comprehensively outperforms the standard Transformer-based ViT model on the precipitation forecasting task, exhibiting significant improvements in overall accuracy and event detection capability. Specifically, on both datasets, the MSE and MAE of TransMamba were significantly lower than those of ViT. In southeastern China and adjacent marine areas, the TS and POD scores for moderate rain, heavy rain, and rainstorm events showed substantial increases, with the advantages being particularly pronounced for rare events such as heavy rain and rainstorms. In the Qinghai Sanjiangyuan region of western China, the unique arid climate makes moderate and heavy rain rare events (rainstorms recorded zero occurrences in the 24 h statistics, hence TS and POD were not calculated for them), yet TransMamba still demonstrated clear improvements in TS and POD scores for moderate and heavy rain. In summary, the experimental results clearly validate the superior performance of the TransMamba model integrating the C-SSM module over ViT, especially regarding its predictive capability for heavy rain and rainstorm events.

4.2. Effect of the Dual-Branch Structure

To explore the effectiveness of the dual-branch integration structure combining a Convolutional State-Space Module and an Inception module, features from the Inception module were scaled and fused with the Convolutional State-Space Module using either an SFT layer or feature addition. These integrated modules were then stacked to form a spatiotemporal feature extraction architecture, resulting in the dual-branch TransMambaCNN model. Ablation studies removing either branch individually confirmed that TransMambaCNN outperformed models using either single branch alone. As shown in Table 7, TransMambaCNN exhibited a slightly higher MAE but lower MSE compared to TransMamba in southeastern China and adjacent marine areas. More importantly, it achieved significant improvements in TS and POD scores for moderate rain, heavy rain, and rainstorm events. In the Qinghai Sanjiangyuan region of western China, the model showed a slightly higher MSE but reduced MAE, along with notable enhancements in TS and POD scores for moderate and heavy rainfall. TransMambaCNN demonstrated clear advantages over M-CNN and TransMamba in both southeastern China and adjacent marine areas and the Qinghai Sanjiangyuan region of western China. In summary, the dual-branch TransMambaCNN model exhibits effective short-term precipitation forecasting capability and demonstrates significantly improved regional adaptability.

4.2.1. Effect of the $S_{i}$

This ablation study systematically evaluates the impact of parameter initialization strategies on precipitation prediction performance across two geographically distinct regions—southeastern China and adjacent marine areas and the Sanjiangyuan Region in Qinghai, China. As shown in Table 8, the experiment adopts different initialization values for each region based on their unique climatic characteristics.

For southeastern China and adjacent marine areas, we conduct a focused ablation on parameter $S_{6}$ (ranging from 0.1 to 1.0, with 0.1 increments), identifying $S_{6}$ = 0.6 as delivering an optimal performance across all precipitation thresholds (≥25 mm and ≥50 mm). In contrast, the Sanjiangyuan Region requires a uniform initialization of all parameters ($S_{1}$–$S_{6}$) due to its complex high-altitude precipitation patterns, with $S_{i}$ = 0.2 emerging as the most effective configuration. The results demonstrate significant regional disparities in parameter sensitivity, with southeastern China showing particular responsiveness to $S_{6}$ adjustments in the 0.4–0.6 range for heavier precipitation events, while the Sanjiangyuan Region achieves the best performance with parameter 0.2.

4.2.2. Effect of the Stacking Integration Unit

This ablation study evaluates the forecasting performance across varying precipitation thresholds by adjusting the number of stacking integration units (from N = 3 to N = 6) in both southeastern China and adjacent marine areas and the Sanjiangyuan Region (Qinghai, China). Comprehensive metrics including TS, POD, parameter count (M), and FLOPs (T) are summarized in Table 9. The results reveal notable regional differences in how model complexity affects performance. In southeastern China and adjacent marine areas, N = 5 achieves the best results across most precipitation levels, particularly for heavy precipitation (≥50 mm), where TS reaches 0.1750 and POD reaches 0.2199, significantly outperforming other configurations. Further increasing complexity to N = 6 leads to a clear degradation across all metrics. The Sanjiangyuan Region likewise performs best at N = 5 (e.g., TS = 0.2149 for ≥10 mm), but increasing model depth beyond this brings only negligible gains. Furthermore, N = 5 strikes a favorable balance in computational efficiency. Therefore, considering both predictive accuracy and computational cost, N = 5 is selected as the final configuration for the stacking integration unit.

5. Discussion

The generation and distribution of precipitation result from the combined influence of multiple spatiotemporally correlated meteorological variables. A deeper understanding of these variables is crucial for enhancing precipitation forecast accuracy. To this end, this study employed the permutation importance method to assess the relative importance (RI) of various meteorological predictor variables [51,52,53,54]. This method measures a variable’s importance by randomly permuting its data sequence while keeping other variables unchanged and observing the resultant decline in model predictive performance.

The experimental results reveal significant regional variations in the criticality of meteorological variables. For short-term precipitation forecasting in southeastern China and adjacent marine areas, total precipitation (TP) was identified as the most critical predictor. In the Sanjiangyuan region in Qinghai Province of western China, however, the key variables were 200 hPa relative humidity (R_humidity_200 hPa), 600 hPa relative humidity (R_humidity_600 hPa), and 600 hPa temperature (Temp_600 hPa). This discrepancy is closely linked to the region’s precipitation scarcity. In multivariate datasets of precipitation-scarce regions, the low frequency of precipitation events yields samples dominated by zero-precipitation values, which substantially increases the difficulty of constructing accurate precipitation models. Under these circumstances, thermodynamic variables characterizing the atmospheric state, such as temperature and humidity, gain importance. Humidity and temperature often reflect large-scale moisture transport and stability conditions, which are crucial for precipitation formation in arid and high-altitude regions like Sanjiangyuan. Because they describe atmospheric physical conditions relatively stably, they provide more informative signals from which models can infer the likelihood or potential intensity of precipitation. Conversely, the effective predictive information in historical precipitation values (TP) is diluted by the abundance of zero-value samples, which lowers the computed relative importance of TP. This finding indicates that in precipitation-scarce regions, constructing models based on multiple atmospheric state variables may be more effective than relying on historical precipitation itself.

The RI analysis results based on the TransMambaCNN model (Figure 7 and Figure 8) show that, for both southeastern China and adjacent marine areas and the Sanjiangyuan region in Qinghai Province of western China, the RI values of most variables exhibit a systematic upward trend as the precipitation intensity threshold increases. RI > 0 indicates that the variable contributes positively to the forecast, whereas RI < 0 suggests a potential inhibitory effect. Notably, significant negative RI outliers occurred under the 25 mm heavy precipitation threshold in the Sanjiangyuan region (Figure 8b). This is intrinsically linked to the extreme scarcity of local heavy precipitation samples: because the target events are so infrequent, predictions from permuted inputs may occasionally outperform those from the original sequence purely by chance during the repeated random permutation experiments, causing a “spurious improvement” in model performance. This, in turn, introduces a negative bias in the average RI. Supporting this, RI values for the 0 mm and 10 mm thresholds in the same region both approach zero, indicating minimal contribution from the corresponding variables in conventional precipitation forecasting. No such extreme negative values appear for the precipitation-prone southeastern China and adjacent marine areas, and together the two cases highlight the limitations of the RI method in regions with extremely sparse precipitation.

Furthermore, this study evaluated the collective importance of the same meteorological variable across different vertical levels (Figure 9). By simultaneously permuting the data for that variable across all height levels, its overall RI value was calculated. The experiment revealed that the RI from collective permutation was significantly higher than the results from single-level permutations. This phenomenon reveals that the same variable at different heights contributes collectively to precipitation formation mechanisms through vertical physical linkages, with its overall informational value far exceeding that of single-level features. This indicates that traditional single-level importance assessments may systematically underestimate the predictive potential of meteorological variables, emphasizing the scientific necessity of integrating the vertical structural information of variables within deep learning models.

Beyond RI analysis, we also conducted precipitation forecasting experiments using a stepwise variable addition approach (Figure 10). Models were incrementally enhanced by adding variables starting from TP. The results show that as variables were progressively added, the average MAE and MSE across all models generally decreased, underscoring the effectiveness of multivariate precipitation forecasting. However, it is noteworthy that during the addition of certain specific variables, some models occasionally exhibited increased errors. This indicates that while multivariate forecasting is overall beneficial, more variables are not invariably better; the benefits may be influenced by factors such as overfitting, variable redundancy, or model optimization constraints.
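The stepwise variable addition procedure can be expressed as a simple loop. In the sketch below, `train_and_evaluate` is a hypothetical callback that trains a model on the given variable subset and returns its error; the toy error values are placeholders chosen only to show a case where one addition increases the error, and the variable labels other than TP are illustrative.

```python
def stepwise_addition(ordered_variables, train_and_evaluate):
    """Evaluate growing variable subsets, adding one variable at a time."""
    history = []
    for k in range(1, len(ordered_variables) + 1):
        subset = tuple(ordered_variables[:k])
        history.append((subset, train_and_evaluate(subset)))
    return history


def additions_that_hurt(history):
    """Names of the added variables after which the error increased."""
    return [
        current[0][-1]
        for previous, current in zip(history, history[1:])
        if current[1] > previous[1]
    ]


# Placeholder errors keyed by subset size; NOT results from the paper.
TOY_ERRORS = {1: 5.0, 2: 4.0, 3: 4.5, 4: 3.0}


def toy_train_and_evaluate(subset):
    """Hypothetical stand-in for training a model and returning its MAE."""
    return TOY_ERRORS[len(subset)]


variables = ["TP", "Temp", "R_humidity", "U_wind"]  # starts from TP, as in the text
history = stepwise_addition(variables, toy_train_and_evaluate)
flagged = additions_that_hurt(history)
```

Flagging the steps where the error rises makes it easy to see which additions are candidates for redundancy or overfitting, even when the overall trend is downward.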

The relative importance score ($\mathrm{RI}_{i}$) of each meteorological predictor is computed as follows, where $i$ is the index of the permuted variable, $\mathrm{score}$ is the original TS score, and $\mathrm{score}_{i}$ is the TS score after permuting variable $i$:

\begin{matrix} \mathrm{RI}_{i} = ( \mathrm{score} - \mathrm{score}_{i} ) / \mathrm{score} \end{matrix}

(29)
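Equation (29) and the permutation procedure can be sketched as below. The sketch permutes the selected variables across samples, recomputes the score, and averages RI over several random permutations; passing several indices permutes them jointly, which mirrors the collective permutation of one variable across all vertical levels. The function names, the number of repeats, and the toy scoring rule are assumptions for illustration, not the authors' code.

```python
import random


def relative_importance(score_fn, inputs, targets, variable_indices,
                        n_repeats=5, seed=0):
    """Average RI = (score - score_perm) / score over random permutations.

    inputs: list of samples, each a list of per-variable values.
    variable_indices: variables permuted together; all other variables
    are left unchanged.
    """
    rng = random.Random(seed)
    base_score = score_fn(inputs, targets)
    if base_score == 0:
        raise ValueError("RI is undefined when the original score is zero")
    ri_values = []
    for _ in range(n_repeats):
        order = list(range(len(inputs)))
        rng.shuffle(order)
        permuted = [list(sample) for sample in inputs]
        for dst, src in enumerate(order):
            for j in variable_indices:
                permuted[dst][j] = inputs[src][j]
        perm_score = score_fn(permuted, targets)
        ri_values.append((base_score - perm_score) / base_score)
    return sum(ri_values) / len(ri_values)


def toy_ts(inputs, targets):
    """Toy stand-in for a trained model scored by TS:
    an event is forecast whenever variable 0 is >= 1."""
    hits = misses = false_alarms = 0
    for sample, observed_event in zip(inputs, targets):
        forecast_event = sample[0] >= 1.0
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else 0.0


# Hypothetical samples with two variables each.
inputs = [[0.0, 3.0], [2.0, 1.0], [0.0, 4.0], [1.5, 0.0], [0.0, 2.0], [3.0, 5.0]]
targets = [sample[0] >= 1.0 for sample in inputs]
ri_used = relative_importance(toy_ts, inputs, targets, [0])
ri_unused = relative_importance(toy_ts, inputs, targets, [1])
ri_joint = relative_importance(toy_ts, inputs, targets, [0, 1])
```

In this toy setup the second variable is ignored by the scoring rule, so its RI is exactly zero. With a real model and very rare target events, a permuted score can exceed the original one by chance, which is how negative RI values arise.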

The conclusions based on the analysis of forecast variables in this paper are as follows:

There are significant regional differences in the criticality of meteorological variables in short-term precipitation forecasting.
Jointly using a reasonable selection of meteorological variables from different altitude layers improves short-term precipitation forecasting performance.
Constructing models based on multiple atmospheric state variables may be more effective than relying on historical precipitation itself.
Meteorological variables in short-term precipitation forecasting are correlated, so adding variables does not necessarily improve model performance; a reasonable selection of meteorological variables is beneficial to precipitation forecasting.

6. Conclusions

This paper proposes TransMambaCNN, a spatiotemporal Transformer network integrating state-space models with CNNs for short-term precipitation forecasting. The core innovation of the model lies in the design of the Convolutional State-Space Module (C-SSM), which efficiently extracts spatiotemporal features from multi-source meteorological variables by replacing the self-attention mechanism in the Vision Transformer (ViT) with an Attentive State-Space Module (ASSM) and augmenting its feature extraction capacity with integrated depthwise convolution. Simultaneously, a dual-branch structure is employed, whereby the C-SSM branch captures global information, while the Inception branch supplements this by extracting local information and fine-grained details through parallel multi-scale convolutions. Additionally, the deep fusion mechanism enhances the representation capacity for complex precipitation systems. In comparative experiments on two datasets covering southeastern China and adjacent marine areas (high-precipitation area) and the Sanjiangyuan region in Qinghai Province of western China (low-precipitation area), TransMambaCNN demonstrated significant advantages across key evaluation metrics. Notably, its performance in forecasting precipitation events of moderate rain and above showed substantial improvement. However, the model tends to underestimate heavy precipitation relative to the observed values. Future work will explore the integration of radar, satellite, and ground observation data alongside the incorporation of ERA5 data for 2024 and 2025 to further enhance temporal generalization capabilities and real-world applicability. This effort will incorporate advanced verification metrics such as the Equitable Threat Score and Bias Score, along with confidence intervals and seasonal breakdowns, in order to improve the accuracy of extreme precipitation forecasting.
Concurrently, inference time will be evaluated as a key metric to ensure computational efficiency and practical feasibility for real-time deployment.

Author Contributions

Conceptualization: K.Z. and G.Z.; methodology: K.Z.; software: K.Z.; investigation: K.Z. and X.W.; writing—original draft preparation: K.Z. and G.Z.; writing—review and editing: K.Z. and X.W.; project administration: G.Z.; funding acquisition: X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Qinghai Province (No. 2023-ZJ-906M) and the National Natural Science Foundation of China (No. 62162053). This research was supported by the High-performance computing center of Qinghai University.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Ma, Z.; Zhang, H.; Liu, J. MM-RNN: A multimodal RNN for precipitation nowcasting. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4101914.
Cheng, Y.; Qu, H.; Wang, J.; Qian, K.; Li, W.; Yang, L.; Han, X.; Liu, M. A radar echo extrapolation model based on a dual-branch encoder–decoder and spatiotemporal GRU. Atmosphere 2024, 15, 104.
Tang, Y.; Zhou, J.; Pan, X.; Gong, Z.; Liang, J. Postrainbench: A comprehensive benchmark and a new model for precipitation forecasting. arXiv 2023, arXiv:2310.02676.
Grönquist, P.; Yao, C.; Ben-Nun, T.; Dryden, N.; Dueben, P.; Li, S.; Hoefler, T. Deep learning for post-processing ensemble weather forecasts. Philos. Trans. R. Soc. A 2021, 379, 20200092.
Peng, X.; Che, Y.; Chang, J. A novel approach to improve numerical weather prediction skills by using anomaly integration and historical data. J. Geophys. Res. Atmos. 2013, 118, 8814–8826.
Agrawal, S.; Barrington, L.; Bromberg, C.; Burge, J.; Gazen, C.; Hickey, J. Machine learning for precipitation nowcasting from radar images. arXiv 2019, arXiv:1912.12132.
Chen, L.; Cao, Y.; Ma, L.; Zhang, J. A deep learning-based methodology for precipitation nowcasting with radar. Earth Space Sci. 2020, 7, e2019EA000812.
Li, D.; Min, X.; Xu, J.; Xue, J.; Shi, Z. Assessment of three gridded satellite-based precipitation products and their performance variabilities during typhoons over Zhejiang, southeastern China. J. Hydrol. 2022, 610, 127985.
Xian, D.; Zhang, P.; Gao, L.; Sun, R.; Zhang, H.; Jia, X. Fengyun meteorological satellite products for earth system science applications. Adv. Atmos. Sci. 2021, 38, 1267–1284.
Jin, Q.; Zhang, X.; Xiao, X.; Wang, Y.; Meng, G.; Xiang, S.; Pan, C. Spatiotemporal inference network for precipitation nowcasting with multimodal fusion. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1299–1314.
Bauer, P.; Thorpe, A.; Brunet, G. The quiet revolution of numerical weather prediction. Nature 2015, 525, 47–55.
Woo, W.C.; Wong, W.K. Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere 2017, 8, 48.
Hu, J.; Yin, B.; Guo, C. METEO-DLNet: Quantitative precipitation nowcasting net based on meteorological features and deep learning. Remote Sens. 2024, 16, 1063.
Zeng, Z.; Wang, D.; Chen, Y. An investigation of convective features and ZR relationships for a local extreme precipitation event. Atmos. Res. 2021, 250, 105372.
Choi, Y.; Cha, K.; Back, M.; Choi, H.; Jeon, T. RAIN-F+: The data-driven precipitation prediction model for integrated weather observations. Remote Sens. 2021, 13, 3627.
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 1.
Shi, X.; Gao, Z.; Lausen, L.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Deep learning for precipitation nowcasting: A benchmark and a new model. Adv. Neural Inf. Process. Syst. 2017, 30.
Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Adv. Neural Inf. Process. Syst. 2017, 30.
Wang, Y.; Gao, Z.; Long, M.; Wang, J.; Yu, P.S. Predrnn++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. In Proceedings of the International Conference on Machine Learning—PMLR 2018, Stockholm, Sweden, 10–15 July 2018; pp. 5123–5132.
Wang, Y.; Zhang, J.; Zhu, H.; Long, M.; Wang, J.; Yu, P.S. Memory in memory: A predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9154–9162.
She, L.; Zhang, C.; Man, X.; Luo, X.; Shao, J. A self-attention causal lstm model for precipitation nowcasting. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Brisbane, Australia, 10–14 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 470–473.
Patel, V.; Degadwala, S. Deployment of 3D-Conv-LSTM for Precipitation Nowcast via Satellite Data. In Proceedings of the 2024 4th International Conference on Pervasive Computing and Social Networking (ICPCSN), Salem, India, 3–4 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 984–988.
He, G.; Wu, W.; Han, J.; Luo, J.; Lei, L. EOST-LSTM: Long Short-Term Memory Model Combined with Attention Module and Full-Dimensional Dynamic Convolution Module. Remote Sens. 2025, 17, 1103.
Wu, H.; Yao, Z.; Wang, J.; Long, M. MotionRNN: A flexible model for video prediction with spacetime-varying motions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15435–15444.
Zheng, C.; Tao, Y.; Zhang, J.; Xun, L.; Li, T.; Yan, Q. TISE-LSTM: A LSTM model for precipitation nowcasting with temporal interactions and spatial extract blocks. Neurocomputing 2024, 590, 127700.
Hong, L.; Modirrousta, M.H.; Hossein Nasirpour, M.; Mirshekari Chargari, M.; Mohammadi, F.; Moravvej, S.V.; Rezvanishad, L.; Rezvanishad, M.; Bakhshayeshi, I.; Alizadehsani, R.; et al. GAN-LSTM-3D: An efficient method for lung tumour 3D reconstruction enhanced by attention-based LSTM. CAAI Trans. Intell. Technol. 2023.
Ma, Z.; Zhang, H.; Liu, J. DB-RNN: An RNN for precipitation nowcasting deblurring. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5026–5041.
Gao, Z.; Tan, C.; Wu, L.; Li, S.Z. Simvp: Simpler yet better video prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 3170–3180.
Han, L.; Liang, H.; Chen, H.; Zhang, W.; Ge, Y. Convective precipitation nowcasting using U-Net model. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4103508.
Deng, Q.; Lu, P.; Zhao, S.; Yuan, N. U-Net: A deep-learning method for improving summer precipitation forecasts in China. Atmos. Ocean. Sci. Lett. 2023, 16, 100322.
Fernández, J.G.; Mehrkanoon, S. Broad-UNet: Multi-scale feature learning for nowcasting tasks. Neural Netw. 2021, 144, 419–427.
Ayzel, G.; Scheffer, T.; Heistermann, M.I. RainNet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev. 2020, 13, 2631–2644.
Kaparakis, C.; Mehrkanoon, S. Wf-unet: Weather fusion unet for precipitation nowcasting. arXiv 2023, arXiv:2302.04102.
Zhu, K.; Chen, H.; Han, L. Mct u-net: A deep learning nowcasting method using dual-polarization radar observations. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4665–4668.
Zhang, Z.; Luo, C.; Feng, S.; Ye, R.; Ye, Y.; Li, X. RAP-Net: Region attention predictive network for precipitation nowcasting. Geosci. Model Dev. Discuss. 2022, 2022, 1–19.
Tang, S.; Li, C.; Zhang, P.; Tang, R. Swinlstm: Improving spatiotemporal prediction accuracy using swin transformer and lstm. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 13470–13479.
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022.
Ma, Z.; Zhang, H.; Liu, J. Preciplstm: A meteorological spatiotemporal lstm for precipitation nowcasting. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4109108.
Civitarese, D.S.; Szwarcman, D.; Zadrozny, B.; Watson, C. Extreme precipitation seasonal forecast using a transformer neural network. arXiv 2021, arXiv:2107.06846.
Yang, H.; Zhang, Z.; Liu, X.; Jing, P. Monthly-scale hydro-climatic forecasting and climate change impact evaluation based on a novel DCNN-Transformer network. Environ. Res. 2023, 236, 116821.
Yin, H.; Guo, Z.; Zhang, X.; Chen, J.; Zhang, Y. RR-Former: Rainfall-runoff modeling based on Transformer. J. Hydrol. 2022, 609, 127781. [Google Scholar] [CrossRef]
Jiang, M.; Weng, B.; Chen, J.; Huang, T.; Ye, F.; You, L. Transformer-enhanced spatiotemporal neural network for post-processing of precipitation forecasts. J. Hydrol. 2024, 630, 130720. [Google Scholar] [CrossRef]
Wang, Z.; Kong, F.; Feng, S.; Wang, M.; Yang, X.; Zhao, H.; Wang, D.; Zhang, Y. Is mamba effective for time series forecasting? Neurocomputing 2025, 619, 129178. [Google Scholar] [CrossRef]
Guo, H.; Guo, Y.; Zha, Y.; Zhang, Y.; Li, W.; Dai, T.; Xia, S.T.; Li, Y. Mambairv2: Attentive state space restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 10–17 June 2025; pp. 28124–28133. [Google Scholar]
Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 606–615. [Google Scholar]
Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049. [Google Scholar] [CrossRef]
Soci, C.; Hersbach, H.; Simmons, A.; Poli, P.; Bell, B.; Berrisford, P.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Radu, R.; et al. The ERA5 global reanalysis from 1940 to 2022. Q. J. R. Meteorol. Soc. 2024, 150, 4014–4048. [Google Scholar] [CrossRef]
Tan, C.; Gao, Z.; Wu, L.; Xu, Y.; Xia, J.; Li, S.; Li, S.Z. Temporal attention unit: Towards efficient spatiotemporal predictive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18770–18782. [Google Scholar]
Tang, Y.; Qi, L.; Xie, F.; Li, X.; Ma, C.; Yang, M.H. PredFormer: Transformers are effective spatial-temporal predictive learners. In Proceedings of the ICLR 2025, Singapore, 24–28 April 2025. [Google Scholar]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Rasp, S.; Lerch, S. Neural networks for postprocessing ensemble weather forecasts. Mon. Weather. Rev. 2018, 146, 3885–3900. [Google Scholar] [CrossRef]
Li, W.; Pan, B.; Xia, J.; Duan, Q. Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol. 2022, 605, 127301. [Google Scholar] [CrossRef]
Zhou, K.; Sun, J.; Zheng, Y.; Zhang, Y. Quantitative precipitation forecast experiment based on basic NWP variables using deep learning. Adv. Atmos. Sci. 2022, 39, 1472–1486. [Google Scholar] [CrossRef]

Figure 1. TransMambaCNN overview. Spatiotemporal feature fusion module structure schematic. Conv State-Space Module (C-SSM) structure schematic. Inception module structure schematic.

Figure 2. Structure of ASSM [44]. The ASSM structure is a key component of C-SSM and provides global information to the C-SSM branches.

Figure 3. Structure of the SFT layer. The SFT layer acts as a feature fusion unit in the model, effectively fusing the feature information of the two branches.

Figure 4. The statistics of 24-hour precipitation for the dataset. (a) Southeastern China and adjacent marine areas; (b) Sanjiangyuan region in Qinghai, western China.

Figure 5. Visualization of real and forecast precipitation. (a) Southeastern China and adjacent marine areas; (b) Sanjiangyuan region, Qinghai, western China.

Figure 6. Comparison of spatial TS scores for different precipitation intervals across models and datasets. (a) Southeastern China and adjacent marine areas; (b) Sanjiangyuan region, Qinghai, western China.

Figure 7. RI analysis in terms of TS in southeastern China and adjacent marine areas. (a) RI scores of all variables for TS in TransMambaCNN. (b) The same RI scores as in (a) with the predictor variables whose RI scores exceed 0.2 removed, for a clearer presentation.

Figure 8. RI analysis in terms of TS in the Sanjiangyuan region in Qinghai Province of western China, split into two panels for a clearer presentation. (a) RI scores with the predictor variables whose RI scores are less than 0 removed. (b) RI scores with the predictor variables whose RI scores are more than 0 removed.

Figure 9. RI analysis in terms of TS, which is the collective RI of the same meteorological variable across different vertical levels. (a) Southeastern China and adjacent marine areas; (b) Sanjiangyuan region, Qinghai, western China.

Figure 10. Line plots of MAE and MSE as input variables are added step by step in different models. (a) Southeastern China and adjacent marine areas; (b) Sanjiangyuan region, Qinghai, western China.

Table 1. Meteorological elements and their abbreviations in the two datasets. Dataset 1 covers southeastern China and adjacent marine areas (April–September, 20.5–35.25°N, 108.25–123°E). Dataset 2 covers the Sanjiangyuan region in Qinghai Province of western China (January–December, 31–39.75°N, 89.25–103°E).

Dataset	Meteorological Element	Levels	Abbreviation
Dataset 1	Total precipitation	Surface	TP
	Humidity	200/500/700/850 hPa	R_humidity_200 hPa/R_humidity_500 hPa
			R_humidity_700 hPa/R_humidity_850 hPa
	Temperature	200/500/700/850 hPa	Temp_200 hPa/Temp_500 hPa
			Temp_700 hPa/Temp_850 hPa
	U-wind component	200/500/700/850 hPa	U-Wind_200 hPa/U-Wind_500 hPa
			U-Wind_700 hPa/U-Wind_850 hPa
	V-wind component	200/500/700/850 hPa	V-Wind_200 hPa/V-Wind_500 hPa
			V-Wind_700 hPa/V-Wind_850 hPa
Dataset 2	Total precipitation	Surface	TP
	Humidity	200/300/400/500/600 hPa	R_humidity_200 hPa/R_humidity_300 hPa
			R_humidity_400 hPa/R_humidity_500 hPa
			R_humidity_600 hPa
	Temperature	200/300/400/500/600 hPa	Temp_200 hPa/Temp_300 hPa
			Temp_400 hPa/Temp_500 hPa
			Temp_600 hPa
	U-wind component	200/300/400/500/600 hPa	U-Wind_200 hPa/U-Wind_300 hPa
			U-Wind_400 hPa/U-Wind_500 hPa
			U-Wind_600 hPa
	V-wind component	200/300/400/500/600 hPa	V-Wind_200 hPa/V-Wind_300 hPa
			V-Wind_400 hPa/V-Wind_500 hPa
			V-Wind_600 hPa

Table 2. Contingency table classifying forecast/observation pairs into hits, misses, false alarms, and correct negatives.

		Observation
		0	1
Forecast	0	correct negatives (CN)	misses (M)
Forecast	1	false alarms (FA)	hits (H)
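
The four cells of Table 2 are enough to compute the threshold-based scores reported in the later tables. The following is a minimal Python sketch, assuming the standard definitions TS = H/(H + M + FA), POD = H/(H + M), and FAR = FA/(H + FA). The function names, the toy arrays, and the choice to report an undefined ratio as 0.0 are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np


def contingency_counts(obs, fcst, threshold):
    """Count hits, misses, false alarms, and correct negatives at one threshold (mm)."""
    obs_event = np.asarray(obs) >= threshold    # observation = 1 in Table 2
    fcst_event = np.asarray(fcst) >= threshold  # forecast = 1 in Table 2
    hits = int(np.sum(fcst_event & obs_event))
    misses = int(np.sum(~fcst_event & obs_event))
    false_alarms = int(np.sum(fcst_event & ~obs_event))
    correct_negatives = int(np.sum(~fcst_event & ~obs_event))
    return hits, misses, false_alarms, correct_negatives


def _ratio(numerator, denominator):
    # Assumption: a ratio with a zero denominator is reported as 0.0.
    return numerator / denominator if denominator > 0 else 0.0


def skill_scores(hits, misses, false_alarms):
    """Return (TS, POD, FAR) under the standard contingency-table definitions."""
    ts = _ratio(hits, hits + misses + false_alarms)
    pod = _ratio(hits, hits + misses)
    far = _ratio(false_alarms, hits + false_alarms)
    return ts, pod, far


# Toy 24 h precipitation values in mm (hypothetical, for illustration only).
obs = np.array([0.0, 12.0, 30.0, 5.0, 11.0, 0.2])
fcst = np.array([0.0, 15.0, 8.0, 12.0, 10.0, 0.0])

h, m, fa, cn = contingency_counts(obs, fcst, threshold=10.0)
ts, pod, far = skill_scores(h, m, fa)
print(f"H={h} M={m} FA={fa} CN={cn} TS={ts:.4f} POD={pod:.4f} FAR={far:.4f}")
```

Correct negatives do not enter any of the three scores, which is why TS, POD, and FAR remain informative for rare heavy-rain events where almost every grid point is a correct negative.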

Table 3. Data distribution of precipitation categories by region and data type.

Region	Data	<0.1 mm	0.1–10 mm	10–25 mm	25–50 mm	≥50 mm
Sanjiangyuan Region (Qinghai, western China)	Training	0.4206	0.5478	0.0309	0.0007	0.0000
	Validation	0.4164	0.5481	0.0347	0.0008	0.0000
	Test	0.4431	0.5346	0.0314	0.0008	0.0000
Southeastern China and Marine Areas	Training	0.2766	0.5424	0.1226	0.0444	0.0140
	Validation	0.2779	0.5390	0.1250	0.0448	0.0132
	Test	0.3061	0.5238	0.1151	0.0425	0.0125
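
Category shares such as those in Table 3 can be obtained by binning every grid-point value and normalising the counts. The sketch below is one way to do that, assuming left-closed bins (for example, a value of exactly 10 mm falls in 10–25); the function name and the sample values are hypothetical.

```python
import numpy as np

# Interior bin edges in mm, matching the column boundaries of Table 3.
EDGES_MM = np.array([0.1, 10.0, 25.0, 50.0])
LABELS = ["<0.1", "0.1-10", "10-25", "25-50", ">=50"]


def category_fractions(precip_mm):
    """Return the share of values falling in each precipitation category."""
    values = np.asarray(precip_mm, dtype=float).ravel()
    # np.digitize with right=False assigns index i when edges[i-1] <= x < edges[i],
    # so index 0 is "<0.1" and index 4 is ">=50".
    idx = np.digitize(values, EDGES_MM, right=False)
    counts = np.bincount(idx, minlength=len(LABELS))
    return counts / values.size


# Hypothetical sample of 24 h precipitation values in mm.
sample = [0.0, 0.05, 0.1, 3.0, 9.9, 10.0, 24.0, 25.0, 49.0, 50.0]
fractions = category_fractions(sample)
for label, share in zip(LABELS, fractions):
    print(f"{label:>7}: {share:.4f}")
```

Applied separately to the training, validation, and test splits, the same function would yield one row of the table per split.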

Table 4. Performance comparison of precipitation forecasting models; “–” indicates no data for ≥50 mm threshold in the Sanjiangyuan region in Qinghai Province of western China. The best results are in bold.

Region	Model	Params (M)	FLOPs (T)	SSIM ↑	MSE ↓	MAE ↓	TS ↑ (≥0.1 mm)	TS ↑ (≥10 mm)	TS ↑ (≥25 mm)	TS ↑ (≥50 mm)
Southeastern China and Marine Areas	PredRNN [18]	23.859	16.491	0.4393	71.3980	4.1062	0.7177	0.4595	0.2719	0.1185
	ConvLSTM [16]	15.094	8.078	0.3710	75.9776	4.4554	0.6950	0.4511	0.2556	0.0940
	MIM [20]	45.769	25.551	0.4213	76.3462	4.2786	0.7144	0.4397	0.2492	0.0910
	PredRNN++ [19]	38.605	24.419	0.3822	75.4762	4.3130	0.6944	0.4518	0.2254	0.0503
	SimVP [28]	22.45	0.233	0.3806	71.6207	4.3681	0.7056	0.4662	0.2653	0.0875
	ViT [45]	52.179	0.439	0.3932	76.0649	4.2698	0.7051	0.4117	0.2115	0.1108
	TAU [49]	50.058	0.423	0.3554	75.0830	4.5516	0.7013	0.4170	0.2404	0.1056
	PredFormer [50]	25.393	1.162	0.3470	74.0522	4.5427	0.6986	0.4543	0.2805	0.1304
	TransMambaCNN	26.932	0.406	0.4323	70.2023	4.2595	0.7131	0.4675	0.3083	0.1750
Sanjiangyuan Region (Qinghai, western China)	PredRNN [18]	23.949	9.27	0.3507	4.5245	1.0543	0.6604	0.1673	0.0000	–
	ConvLSTM [16]	15.146	4.543	0.3300	4.8843	1.0950	0.6665	0.1747	0.0000	–
	MIM [20]	42.209	14.343	0.3331	4.4083	1.0440	0.6454	0.1883	0.0000	–
	PredRNN++ [19]	38.694	13.709	0.3622	4.5204	1.0420	0.7025	0.1807	0.0000	–
	SimVP [28]	22.452	0.131	0.3376	4.3580	1.0315	0.6811	0.2143	0.0015	–
	ViT [45]	52.181	0.247	0.2983	4.8069	1.1513	0.6202	0.1533	0.0000	–
	TAU [49]	50.061	0.238	0.3403	4.9105	1.0652	0.6212	0.1417	0.0000	–
	PredFormer [50]	25.426	0.637	0.3284	4.1968	1.0336	0.6481	0.2427	0.0031	–
	TransMambaCNN	26.935	0.229	0.3935	4.3214	0.9896	0.6898	0.2149	0.0193	–

Table 5. Comparison of probability of detection (POD) and false alarm ratio (FAR) across different regions and precipitation thresholds. “–” indicates no available data for the ≥50 mm threshold in Sanjiangyuan region. The best results are in bold.

Region	Model	POD ↑ (≥10 mm)	POD ↑ (≥25 mm)	POD ↑ (≥50 mm)	FAR ↓ (≥10 mm)	FAR ↓ (≥25 mm)	FAR ↓ (≥50 mm)
Southeastern China and Marine Areas	PredRNN [18]	0.6265	0.3427	0.1350	0.3671	0.4317	0.5082
	ConvLSTM [16]	0.6410	0.3303	0.1040	0.3964	0.4697	0.5065
	MIM [20]	0.6175	0.3085	0.1004	0.3957	0.4356	0.5087
	PredRNN++ [19]	0.6285	0.2659	0.0523	0.3835	0.4035	0.4350
	SimVP [28]	0.6452	0.3343	0.0957	0.3731	0.4376	0.4945
	ViT [45]	0.5277	0.2443	0.1210	0.3481	0.3886	0.4307
	TAU [49]	0.6243	0.2730	0.0985	0.3856	0.4042	0.4704
	PredFormer [50]	0.6306	0.3712	0.1569	0.3810	0.4655	0.5642
	TransMambaCNN	0.6566	0.4212	0.2199	0.3813	0.4651	0.5384
Sanjiangyuan Region (Qinghai, China)	PredRNN [18]	0.2011	0.0000	–	0.3272	0.0000	–
	ConvLSTM [16]	0.2279	0.0000	–	0.5719	1.0000	–
	MIM [20]	0.2282	0.0000	–	0.4816	0.0000	–
	PredRNN++ [19]	0.2215	0.0000	–	0.5046	0.0000	–
	SimVP [28]	0.2756	0.0016	–	0.5092	0.9091	–
	ViT [45]	0.1895	0.0000	–	0.5549	0.0000	–
	TAU [49]	0.1617	0.0000	–	0.4661	0.0000	–
	PredFormer [50]	0.3229	0.0031	–	0.5057	0.8000	–
	TransMambaCNN	0.2796	0.0197	–	0.5184	0.4898	–
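
Assuming TS, POD, and FAR follow the standard contingency-table definitions built from the cells of Table 2 (an assumption, since the formulas are not restated next to these tables), TS can be recovered from POD and FAR alone. This gives a quick consistency check between Tables 4 and 5:

```latex
\begin{aligned}
\mathrm{POD} &= \frac{H}{H+M}, \qquad 1-\mathrm{FAR} = \frac{H}{H+\mathrm{FA}},\\[4pt]
\frac{1}{\mathrm{TS}} &= \frac{H+M+\mathrm{FA}}{H}
  = \frac{H+M}{H} + \frac{H+\mathrm{FA}}{H} - 1
  = \frac{1}{\mathrm{POD}} + \frac{1}{1-\mathrm{FAR}} - 1 .
\end{aligned}
```

For TransMambaCNN at the ≥10 mm threshold in southeastern China (POD = 0.6566, FAR = 0.3813), this gives 1/TS ≈ 1.5230 + 1.6163 − 1 = 2.1393, so TS ≈ 0.467, which agrees with the 0.4675 reported in Table 4 up to rounding of the inputs.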

Table 6. The results of the C-SSM ablation experiment assessments, with the best metrics highlighted in bold. “–” indicates no data for the thresholds ≥50 mm in the Qinghai Sanjiangyuan region of western China.

Region	Model	MSE ↓	MAE ↓	TS ↑ (≥10 mm)	TS ↑ (≥25 mm)	TS ↑ (≥50 mm)	POD ↑ (≥10 mm)	POD ↑ (≥25 mm)	POD ↑ (≥50 mm)
Southeastern China and Adjacent Marine Areas	ViT	76.0649	4.2698	0.4117	0.2115	0.1108	0.5277	0.2443	0.1210
Southeastern China and Adjacent Marine Areas	TransMamba	70.6951	3.9924	0.4583	0.2945	0.1582	0.6056	0.3773	0.1945
Sanjiangyuan Region (Qinghai, western China)	ViT	4.8069	1.1513	0.1533	0.0000	–	0.1895	0.0000	–
Sanjiangyuan Region (Qinghai, western China)	TransMamba	4.2796	1.0007	0.2137	0.0141	–	0.2780	0.0142	–

Table 7. The results of the dual-branch structure ablation experiment assessments, with the best metrics highlighted in bold. “–” indicates no data for the thresholds ≥50 mm in the Qinghai Sanjiangyuan region of western China.

Region	Model	MSE ↓	MAE ↓	TS ↑ (≥10 mm)	TS ↑ (≥25 mm)	TS ↑ (≥50 mm)	POD ↑ (≥10 mm)	POD ↑ (≥25 mm)	POD ↑ (≥50 mm)
Southeastern China and Adjacent Marine Areas	TransMamba	70.6951	3.9924	0.4583	0.2945	0.1582	0.6056	0.3773	0.1945
	M-CNN	73.0374	4.1179	0.4430	0.2246	0.1088	0.5770	0.2644	0.1199
	TransMambaCNN	70.2023	4.2595	0.4675	0.3083	0.1750	0.6566	0.4212	0.2199
Sanjiangyuan Region (Qinghai, western China)	TransMamba	4.2796	1.0007	0.2137	0.0141	–	0.2780	0.0142	–
	M-CNN	4.3514	1.0325	0.2085	0.0085	–	0.2661	0.0087	–
	TransMambaCNN	4.3214	0.9896	0.2149	0.0193	–	0.2796	0.0197	–

Table 8. The results of the $S_{i}$ ablation experiment assessments, with the best metrics highlighted in bold. “–” indicates no data for the thresholds ≥50 mm in the Qinghai Sanjiangyuan region of western China.

Region	$S_{i}$ ($1 \le i \le 6$)	TS ↑ (≥10 mm)	TS ↑ (≥25 mm)	TS ↑ (≥50 mm)	POD ↑ (≥10 mm)	POD ↑ (≥25 mm)	POD ↑ (≥50 mm)
Southeastern China and Marine Areas	$S_{i} = \mathrm{Random}$	0.4594	0.2940	0.1608	0.6138	0.3795	0.1916
	$S_{6} = 0.1$	0.4520	0.2451	0.1205	0.5952	0.2879	0.1328
	$S_{6} = 0.2$	0.4539	0.2576	0.1448	0.6017	0.3122	0.1640
	$S_{6} = 0.3$	0.4620	0.2781	0.1593	0.6121	0.3525	0.1915
	$S_{6} = 0.4$	0.4730	0.3079	0.1435	0.6596	0.3983	0.1652
	$S_{6} = 0.5$	0.4547	0.2646	0.1583	0.6056	0.3278	0.1854
	$S_{6} = 0.6$	0.4675	0.3083	0.1750	0.6566	0.4212	0.2199
	$S_{6} = 0.7$	0.4620	0.2937	0.1682	0.6197	0.3814	0.2017
	$S_{6} = 0.8$	0.4614	0.2943	0.1584	0.6193	0.3802	0.1900
	$S_{6} = 0.9$	0.4544	0.2793	0.1492	0.6064	0.3536	0.1754
	$S_{6} = 1$	0.4454	0.2301	0.1297	0.5764	0.2715	0.1443
Sanjiangyuan Region (Qinghai, China)	$S_{i} = \mathrm{Random}$	0.2198	0.0084	–	0.2944	0.0087	–
	$S_{i} = 0.1$	0.2104	0.0023	–	0.2736	0.0024	–
	$S_{i} = 0.2$	0.2149	0.0193	–	0.2796	0.0197	–
	$S_{i} = 0.3$	0.1979	0.0008	–	0.2493	0.0008	–
	$S_{i} = 0.4$	0.2137	0.0144	–	0.2809	0.0149	–
	$S_{i} = 0.5$	0.2033	0.0061	–	0.2644	0.0063	–
	$S_{i} = 0.6$	0.2103	0.0123	–	0.2698	0.0126	–
	$S_{i} = 0.7$	0.2134	0.0069	–	0.2780	0.0071	–
	$S_{i} = 0.8$	0.2110	0.0100	–	0.2724	0.0102	–
	$S_{i} = 0.9$	0.2110	0.0098	–	0.2778	0.0102	–
	$S_{i} = 1$	0.2125	0.0023	–	0.2792	0.0024	–

Table 9. The results of the stacking integration unit ablation experiment assessments, with the best metrics highlighted in bold. “–” indicates no data for the thresholds ≥50 mm in the Qinghai Sanjiangyuan region of western China.

Region	Stacking Integration Unit = N	Parameters (M)	FLOPs (T)	TS ↑ (≥10 mm)	TS ↑ (≥25 mm)	TS ↑ (≥50 mm)	POD ↑ (≥10 mm)	POD ↑ (≥25 mm)	POD ↑ (≥50 mm)
Southeastern China and Marine Areas	N = 3	23.896	0.31	0.4592	0.2947	0.1523	0.6079	0.3762	0.1839
	N = 4	25.414	0.358	0.4665	0.2850	0.1412	0.6376	0.3702	0.1652
	N = 5	26.932	0.406	0.4675	0.3083	0.1750	0.6566	0.4212	0.2199
	N = 6	28.45	0.455	0.4464	0.2590	0.1246	0.5904	0.3221	0.1414
Sanjiangyuan Region (Qinghai, China)	N = 3	23.899	0.174	0.2135	0.0061	–	0.2776	0.0063	–
	N = 4	25.417	0.202	0.1991	0.0062	–	0.2514	0.0063	–
	N = 5	26.935	0.229	0.2149	0.0193	–	0.2796	0.0197	–
	N = 6	28.452	0.256	0.2009	0.0250	–	0.2536	0.0260	–

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

