Highlights
What are the main findings?
- The applicability of satellite precipitation products varies across different hydrological modeling frameworks.
- In China’s Xiangjiang River Basin, the Distributed Physics-informed Deep Learning (DPDL) model demonstrates better compatibility with satellite precipitation products compared to the SWAT model.
What are the implications of the main findings?
- The DPDL model can effectively capture watershed hydrological dynamics when driven by satellite precipitation products.
- The DPDL model attains the highest streamflow simulation accuracy when driven by GSMaP with specific training strategies, and demonstrates optimal robustness when forced with IMERG-F under varying strategies.
Abstract
Satellite precipitation products serve as valuable global data sources for hydrological modeling, yet their applicability across different hydrological models remains insufficiently explored. The distributed physics-informed deep learning model (DPDL), as a representative of emerging differentiable, physics-based hydrological models, requires a systematic evaluation of the suitability of multi-source precipitation products within its modeling framework. This study focuses on the Xiangjiang River Basin in southern China, where both a DPDL model and a Soil and Water Assessment Tool (SWAT) model were constructed. In addition, two model training strategies were designed: S1 (fixed parameters) and S2 (product-specific recalibration). Multiple precipitation products were used to drive both hydrological models, and their streamflow simulation performance was evaluated under different training schemes to analyze the compatibility between precipitation products and hydrological modeling frameworks. The results show that: (1) In the Xiangjiang River Basin of southern China, GSMaP demonstrated the best overall performance with a Critical Success Index of 0.70 and a correlation coefficient (Corr) of 0.79; IMERG-F showed acceptable accuracy with a Corr of 0.75 but had a relatively high false alarm rate (FAR) of 0.32; while CMORPH exhibited the most significant systematic underestimation with a relative bias (RBIAS) of −8.48%. (2) The DPDL model more effectively captured watershed hydrological dynamics, achieving a validation period correlation coefficient of 0.82 and a Nash–Sutcliffe efficiency (NSE) of 0.79, outperforming the SWAT model. However, the DPDL model showed a higher RBIAS of +16.69% during the validation period, along with greater overestimation fluctuations during dry periods, revealing inherent limitations of differentiable hydrological models when training samples are limited. 
(3) The S2 strategy (product-specific recalibration) improved the streamflow simulation accuracy for most precipitation products, with the maximum increase in the NSE coefficient reaching 15.8%. (4) The hydrological utility of satellite products is jointly determined by model architecture and training strategy. For the DPDL model, IMERG-F demonstrated the best overall robustness, while GSMaP achieved the highest accuracy under the S2 strategy. This study aims to provide theoretical support for optimizing differentiable hydrological modeling and to offer new perspectives for evaluating the hydrological utility of satellite precipitation products.
1. Introduction
Hydrological models serve as indispensable tools for understanding water cycle processes, predicting water resource variations, and supporting flood control, disaster mitigation, and climate change adaptation research [1,2]. Since the simple conceptual models of the 20th century, hydrological modeling has undergone substantial development driven by theoretical and technological advances, progressively evolving from lumped models to semi-distributed, distributed, and more recently, data-driven models [3].
Traditional physics-based models, such as the lumped conceptual Xinanjiang, Stanford, and Tank models, enable rapid assessment through simplified physical process representations. However, their inability to adequately capture spatiotemporal precipitation heterogeneity limits their accuracy in simulating complex water cycle processes [4,5]. In contrast, semi-distributed and distributed models represented by SWAT, TOPMODEL, and VIC provide more refined representations of spatiotemporal precipitation variations through grid-based computations and deliver richer hydrological variable outputs, significantly enhancing simulation accuracy [6,7]. However, this precision improvement comes at the cost of increased parametrization and computational demands [8], and their high computational complexity and dependence on data resources also constrain widespread application across large basins [9]. Simultaneously, Sun et al. (2024) point out that compressing rich information into a single-valued objective function is a primary cause of calibration difficulty in distributed models and a source of model uncertainty [10].
With breakthroughs in artificial intelligence and machine learning technologies, data-driven approaches have demonstrated strong potential in hydrological modeling [11,12,13]. Numerous studies indicate that AI techniques generally outperform most physics-based models in runoff prediction when large hydrometeorological datasets are available [14,15,16]. These methods do not rely on explicit physical mechanisms but instead model systems by mining complex nonlinear statistical relationships between inputs and outputs, evolving from early regression models to deep neural networks with thousands of parameters. Deep learning models represented by Long Short-Term Memory networks and Convolutional Neural Networks possess efficient capabilities for processing high-dimensional data and recognizing complex patterns [14,17]. However, some researchers point out that many data-driven studies select input-output variables with weak physical connections and neglect theoretically defined causal relationships [18,19], which makes it difficult for such models to address specific scientific questions [20]. Adera et al. (2024) systematically list the main drawbacks of data-driven models: potential unreliability under non-stationary conditions, requirement for substantial training data, occasional physically inconsistent results, lack of interpretability, and absence of clear representations of internal states and processes [21]. For these reasons, the mainstream scientific community remains cautious about using ML algorithms [22,23].
In recent years, differentiable hydrological models integrating physical mechanisms with data-driven advantages have gradually emerged [24]. These models use physical process models as their backbone, embedding regionalized neural network modules within differentiable physical structures to learn process variables and optimize parameters [17,25]. They combine physical consistency with adaptive learning capabilities, balancing physical mechanisms and interpretability while fully leveraging data-driven model advantages [26,27]. Zhang et al. (2025) embedded regionalized LSTM within a differentiable physical backbone (EXP-HYDRO) to simultaneously learn process variables and parameters, experimentally demonstrating that this framework achieves accuracy comparable to pure LSTM models in ungauged basins (median NSE difference only 0.021) [28]. Ouyang et al. (2025) constructed dXAJ and dXAJnn models, retaining the Xinanjiang model structure while incorporating LSTM for parameter learning, with experimental results directly proving that under data-limited conditions, both dXAJ and dXAJnn outperform traditionally calibrated models in runoff prediction accuracy [29]. Wang et al. (2024) [26] developed a novel framework that leverages neural networks to derive physically meaningful, spatially distributed parameters from watershed attributes for runoff simulation. In the Amazon Basin, this framework improved the simulation efficiency of runoff and total water storage by 41% and 35%, respectively, compared to the original physical model, HydroPy [26]. Sawadekar et al. (2025) demonstrated how neural networks (for data fusion) can be seamlessly coupled with process-based models (for hydrological modeling) to enable end-to-end training, ultimately achieving a substantial improvement in simulation accuracy [30]. Zhong et al. (2024) [31] proposed a distributed physics-informed DL model based on a distributed framework. 
It involves spatial discretization and establishing differentiable hydrological models for discrete sub-basins, combined with differentiable Muskingum method channel routing, ultimately yielding superior runoff simulation results compared to distributed hydrological models [31]. Despite the theoretical advantages of differentiable modeling, existing research predominantly focuses on North America and plateau regions, with limited studies on their applicability in typical climatic zones like humid southern China, urgently requiring regional adaptability exploration to promote global application [17,25,27,28,32,33].
In hydrological models, precipitation as the core driving data directly determines simulation performance quality [34,35,36]. Although traditional ground rain gauge and weather radar observations offer relatively high accuracy, their spatial discontinuity and sparse distribution limit the comprehensive capture of spatiotemporal precipitation dynamics. Despite radar’s high spatiotemporal resolution, it remains susceptible to weather conditions and detection network coverage constraints [37,38,39]. In recent years, the rapid development of satellite remote sensing technology has made satellite precipitation estimation products important precipitation data sources, offering advantages of global coverage, high spatiotemporal resolution, and efficient assimilation, becoming powerful supplements or even alternatives to ground observations in data-scarce regions [40,41,42].
Currently, numerous studies focus on precipitation dataset accuracy assessment, striving to identify optimal products suitable for different regions and models [43,44,45,46]. Despite continuous improvements in spatiotemporal resolution, applying satellite precipitation products in hydrological modeling still faces numerous challenges [47].
Firstly, constrained by sensor performance and retrieval algorithms, satellite products still exhibit systematic biases in complex precipitation events. Their accuracy is particularly insufficient for extreme and localized heavy precipitation, which leads to unstable extreme flow simulation performance when they are applied in hydrological models. Lyu et al. (2024) indicated systematic underestimation in IMERG V6 and systematic overestimation in IMERG V7, revealing how such precipitation errors directly cause flood peak overestimation [48]. Woods et al. (2023) showed that using satellite products for flood simulation leads to earlier flood onset, later termination, higher intensity, and longer duration [49]. Al Khoury et al. (2024) found that all precipitation products caused significant runoff underestimation (up to 79%) [50]. Despite the relatively poor accuracy of runoff simulations driven by precipitation products, systematic investigations into how precipitation data errors propagate into and affect hydrological modeling remain scarce [51].
Secondly, current research mostly focuses on precipitation product accuracy itself and runoff simulation accuracy with single hydrological models, lacking systematic evaluation of synergistic performance across “different hydrological model—different precipitation product” combinations, particularly lacking targeted assessment and tuning before use, hindering full realization of data-model matching potential. Gao et al. (2023) [43] pointed out three major limitations in studies on hydrological simulation performance of precipitation products: predominant use of lumped hydrological models neglecting spatial variability; hydrological models typically not recalibrated for each remote sensing product; and lack of systematic analysis of the suitability of various precipitation products for identifying flood characteristics [43].
Clearly, hydrological model selection significantly impacts runoff simulation accuracy evaluation for specific precipitation datasets. Furthermore, precipitation dataset performance may vary considerably depending on the applied region. However, to our knowledge, few studies have evaluated the hydrological utility of precipitation products using different hydrological models. Wan et al. (2025) comprehensively evaluated six widely used long-term precipitation datasets (MSWEP V2, GPCC, CPC, NCEP-2, MERRA-2, and ERA5) applied to conceptual hydrological models XAJ and GR4J regarding their performance in capturing extreme precipitation and flow across China [52]. Results indicated substantial uncertainty in the ability of precipitation datasets to capture extreme flows, primarily originating from the hydrological model used, particularly in mountainous basins where complex topography and climate make hydrological model structure especially influential on flow simulation. Therefore, deeply exploring the adaptability and error compensation mechanisms of different hydrological models to multi-source precipitation data holds significant theoretical and practical importance for enhancing the overall application effectiveness of hydrological models in complex environments.
Based on the aforementioned research gaps, this study selects the Xiangjiang River Basin as the study area, with a primary focus on investigating the synergistic performance between different precipitation data products and both differentiable hydrological models and traditional physics-based hydrological models. The research objectives are threefold: (1) to develop a differentiable hydrological model and evaluate its applicability in the humid region of southern China, thereby analyzing its distinctions from the SWAT model; (2) to design two model training strategies and assess the compatibility of multi-source precipitation products with hydrological models under these different strategies; and (3) to investigate the compensatory effects of hydrological models on errors inherent in precipitation products, particularly in the context of extreme flow simulations. This study aims to provide a theoretical foundation for the future optimization of differentiable hydrological modeling and to offer a scientific basis for water resources management and flood mitigation strategies within the Xiangjiang River Basin.
2. Study Area and Data
2.1. Study Area
The Xiangjiang River is located in Hunan Province, China, and serves as a major tributary in the middle reaches of the Yangtze River. Its main stem extends 948 km in length and drains a catchment area of approximately 94,721 km² (24°31′–28°45′N, 110°30′–114°00′E). The lower reaches of the basin form the core socioeconomic zone of Hunan Province, supporting the Changsha-Zhuzhou-Xiangtan urban agglomeration, a significant metropolitan region in central-eastern China. This area is characterized by a dense population and concentrated urban infrastructure (Figure 1).
Figure 1.
The location of the study area in China and the spatial distribution of hydrological stations and rain gauges.
The basin experiences a subtropical monsoon climate, with mean annual precipitation ranging from 1300 mm to 1800 mm. Precipitation is highly seasonal, concentrating during the summer months, particularly from June to August, when heavy rainfall events frequently occur. A notable example is the extreme event from 22 June to 2 July 2017, when widespread heavy-to-torrential rain affected Hunan Province, with an average rainfall accumulation of 197.3 mm. This event caused water levels at multiple gauging stations along the mainstem of the Xiangjiang River to exceed warning thresholds. The water level at the Changsha Station reached a record high, surpassing the previous historical record set in 1998. The flooding necessitated the evacuation of tens of thousands of residents, posing a severe threat to lives and property along the riverbanks.
The selection of the Xiangjiang River Basin as the study area not only allows for a robust evaluation of the differentiable hydrological model’s applicability in humid regions and an assessment of the impact of diverse precipitation data sources on hydrological simulations, but also carries significant scientific and practical value for enhancing regional disaster prevention and mitigation capabilities.
2.2. Satellite-Based Precipitation Products (SPPs)
Based on a systematic review and synthesis of previous research, this study selected five satellite precipitation products (SPPs), namely TMPA 3B42RT (hereafter TRMM-RT), TMPA 3B42V7 (hereafter TRMM), IMERG-F, GSMaP, and CMORPH. Multiple studies have validated the reliability and stability of these products across different regions and climatic conditions from the dual perspectives of data accuracy and hydrological model applicability. Precipitation data from the five SPPs for the period January 2001 to December 2018 were collected for further precipitation estimation accuracy evaluation and for driving the hydrological models.
The Tropical Rainfall Measuring Mission (TRMM) was a joint program between the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA), with the primary objective of monitoring and investigating the distribution of precipitation in the tropics [47]. This study utilizes the near-real-time product TRMM-RT and the post-real-time product TRMM. These products have a spatial resolution of 0.25° and a temporal resolution of 3 h (https://disc.gsfc.nasa.gov/datasets?keywords=TRMM, accessed on 8 April 2023). The post-real-time TRMM incorporates gauge-based precipitation data from the Global Precipitation Climatology Centre (GPCC) for bias correction, whereas TRMM-RT, although not corrected in this manner, employs a climate correction algorithm to eliminate errors.
The Global Precipitation Measurement (GPM) mission is the successor to TRMM. The Integrated Multi-satellite Retrievals for GPM (IMERG) algorithm generates high-precision precipitation estimates by merging microwave, infrared, and radar data [47]. It provides data with superior spatiotemporal resolution: a temporal resolution of 0.5 h and a spatial resolution enhanced to 0.1°. IMERG products are categorized into three sub-products: IMERG-E, IMERG-L, and IMERG-F. This study employs the gauge-calibrated IMERG-F product for subsequent analysis (https://gpm1.gesdisc.eosdis.nasa.gov/data/, accessed on 8 April 2023).
The Global Satellite Mapping of Precipitation (GSMaP), developed by JAXA, is a global satellite precipitation mapping system [53]. It combines passive microwave (PMW) and infrared (IR) remote sensing data and incorporates a Kalman filter technique to dynamically adjust precipitation estimates, aiming to provide high-accuracy, high-resolution global precipitation maps (https://sharaku.eorc.jaxa.jp/GSMaP/index.htm, accessed on 8 April 2023). This study uses the GSMaP_mvk product, which has a temporal resolution of 1 h and a spatial resolution of 0.1° × 0.1°.
The Climate Prediction Center Morphing technique (CMORPH) was developed by the NOAA Climate Prediction Center (CPC). It generates global, high spatiotemporal resolution precipitation data by morphing and integrating observations from IR and PMW sensors, making it suitable for studying precipitation and its spatiotemporal variations from mesoscale to interannual scales [54]. This study utilizes the bias-corrected version of CMORPH V1.0, which has a spatial resolution of 0.25° × 0.25° and covers the global latitude band from 60°S to 60°N (https://www.ncei.noaa.gov/data/cmorph-high-resolution-global-precipitation-estimates/access/daily/, accessed on 8 April 2023).
2.3. Ground-Based Precipitation and Hydrological Data
Daily ground-based precipitation and air temperature data for the period 2003–2018 were obtained from 45 national meteorological stations within the Xiangjiang River Basin. All meteorological data were sourced from the National Meteorological Information Center of the China Meteorological Administration (http://data.cma.cn, accessed on 8 April 2023). Potential evapotranspiration was calculated using the Hargreaves formula [55]. Corresponding daily streamflow data for the same period were collected from the Xiangtan Hydrological Station (112°52′E, 27°50′N), the outlet control station of the Xiangjiang River Basin, provided by the Hydrological and Water Resources Survey Center of Hunan Province.
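For illustration, a temperature-based Hargreaves estimate can be sketched as follows. The text does not state which variant of the formula was applied, so this sketch assumes the common Hargreaves-Samani form with extraterrestrial radiation computed from latitude and day of year following FAO-56; the function names and the example inputs are hypothetical.

```python
import math

def extraterrestrial_radiation_mm(lat_deg, day_of_year):
    """Extraterrestrial radiation as equivalent evaporation (mm/day), FAO-56 form.

    Valid for non-polar latitudes, where the sunset hour angle is defined.
    """
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)      # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))                      # sunset hour angle (rad)
    ra_mj = (24.0 * 60.0 / math.pi) * 0.0820 * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )                                                                     # MJ/(m2 d)
    return 0.408 * ra_mj                                                  # convert to mm/day

def hargreaves_pet(t_min, t_max, lat_deg, day_of_year):
    """Daily potential evapotranspiration (mm/day), assumed Hargreaves-Samani form."""
    t_mean = 0.5 * (t_min + t_max)
    t_range = max(t_max - t_min, 0.0)
    ra = extraterrestrial_radiation_mm(lat_deg, day_of_year)
    return max(0.0023 * ra * (t_mean + 17.8) * math.sqrt(t_range), 0.0)

# Hypothetical mid-July day near the basin outlet latitude
pet = hargreaves_pet(t_min=25.0, t_max=34.0, lat_deg=27.83, day_of_year=196)
```

The appeal of this formulation in the present setting is that it needs only daily minimum and maximum air temperature plus station latitude, which matches the station data described above.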
2.4. Other Data
Additional data include topography, land use, soil, and vegetation characteristics. Topographic data were derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM), a joint project of NASA and the National Geospatial-Intelligence Agency (NGA), with a spatial resolution of 90 m (http://www.gscloud.cn, accessed on 8 April 2023). Land use data were sourced from the “30 m annual land cover and its dynamics in China from 1990 to 2019” dataset [56]. Soil and vegetation characteristic data were obtained from the Land-Atmosphere Interaction Research Group at Sun Yat-sen University [57]. The soil data is part of a newly developed global dataset of soil, hydraulic, and thermal parameters, covering 90°N to 90°S and 180°W to 180°E. The vertical profile information for the soil data includes layers at depths of: 0–0.05 m, 0.05–0.15 m, 0.15–0.30 m, 0.30–0.60 m, 0.60–1.00 m, and 1.00–2.00 m. Key parameters provided in the dataset include saturated water content, volumetric heat capacity of soil solids in a unit soil volume, thermal conductivity of dry soils, and the volumetric fractions of sand, silt, and clay. More detailed information about this data source and the download link can be found at: http://globalchange.bnu.edu.cn/research/soil5.jsp, accessed on 8 April 2023. The vegetation characteristic data (LAI dataset) were generated by reprocessing the MODIS version 6.1 LAI products. The raw data used include the MODIS LAI Version 6.1 products MCD15A2H (covering 4 July 2002 to 2023), MOD15A2H (covering 18 February 2000 to 26 June 2002), and the MODIS Land Cover Type product MCD12Q1 (covering 2001 to 2023). More detailed information on the data source, validation, and evaluation results, and the download link can be found at: http://globalchange.bnu.edu.cn/research/laiv061, accessed on 8 April 2023. The average value for each attribute was calculated for each sub-basin. 
To characterize watershed heterogeneity, the minimum, maximum, and standard deviation of each variable were additionally computed for topographic attributes (except area) and soil properties.
2.5. Data Preprocessing
All precipitation, streamflow, and meteorological data underwent rigorous quality control procedures. For the SPPs, spatial interpolation and temporal aggregation were performed to ensure data completeness and consistency. For the ground-based precipitation data within the basin, missing values were interpolated, and potential outliers were identified and removed. Streamflow data processing similarly involved the removal of outliers and data standardization to facilitate comparison with model outputs.
Specifically, potential artificial recording errors or outliers in the streamflow and ground-based precipitation observation data were identified and eliminated. Missing values in the ground-based precipitation observations were interpolated using the Natural Neighbor method. This method, which weights estimates based on the areas of Thiessen polygons surrounding each gauge, effectively estimates precipitation at unsampled locations [58].
Given the differences in spatiotemporal resolutions among the various precipitation datasets, as well as the fact that these products are recorded in Coordinated Universal Time (UTC) while ground observations follow Beijing Time (UTC+8), this study first standardized the observation time to UTC+8. The precipitation product data were then aggregated to a daily scale. Following this, the spatial resolution was resampled to a uniform 0.1° × 0.1° grid using the bilinear interpolation method. This approach calculates the weighted average for a target point based on known precipitation values from its four nearest neighboring grid points through two successive linear interpolations, effectively generating a smooth and spatially continuous precipitation field. After unifying the spatiotemporal resolution of the precipitation products, the satellite precipitation product grid point data closest to the ground observation stations were selected, and the ground station data were matched with the grid precipitation data. This matched dataset was then used for further accuracy assessment of satellite precipitation products and the construction of precipitation estimation training sets.
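The time alignment and resampling steps can be sketched in a few lines. This is a minimal illustration rather than the processing chain actually used: it assumes that each sub-daily record is stamped in UTC, that a "day" is a Beijing-time calendar day, and that the grid coordinates are regular and ascending. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

def daily_totals_beijing(records):
    """Sum sub-daily (utc_datetime, precip_mm) records into UTC+8 calendar days."""
    totals = defaultdict(float)
    for t_utc, precip_mm in records:
        local_day = (t_utc + timedelta(hours=8)).date()
        totals[local_day] += precip_mm
    return dict(totals)

def bilinear(lats, lons, field, lat, lon):
    """Bilinear interpolation of field[i][j] (defined on lats[i], lons[j]) at (lat, lon)."""
    # Locate the cell whose four corners surround the target point
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # First interpolate along longitude on the two bounding rows, then along latitude
    south = field[i][j] * (1.0 - tx) + field[i][j + 1] * tx
    north = field[i + 1][j] * (1.0 - tx) + field[i + 1][j + 1] * tx
    return south * (1.0 - ty) + north * ty

# Hypothetical 0.25-degree cell resampled at its centre
coarse_lats = [27.0, 27.25]
coarse_lons = [112.0, 112.25]
coarse_field = [[0.0, 10.0], [20.0, 30.0]]
value = bilinear(coarse_lats, coarse_lons, coarse_field, 27.125, 112.125)
```

Because bilinear weights are non-negative and sum to one, the resampled field never exceeds the range of the four surrounding values, which is why the method smooths local precipitation maxima on the finer grid.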
During the setup of the hydrological model, the precipitation data from the SPPs needed to be spatially and temporally unified to the centroid of each sub-basin. Following established methods, a virtual station was created at each centroid location. The SPP grid data nearest to the centroid were directly selected as the precipitation data for that basin’s centroid. Subsequently, the precipitation time series for each sub-basin was organized, and a text file was generated for each virtual station containing the station ID, the start date of the precipitation time series, and all daily precipitation values for the entire simulation period. The data for each station were stored in separate text files for driving the hydrological models.
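The virtual-station step can be sketched as follows. The nearest-cell search assumes a regular latitude-longitude grid, and the file layout (station ID, start date, then one value per line) is a hypothetical arrangement chosen for illustration; the exact format used in the study is not specified in the text.

```python
from datetime import date

def nearest_cell(lats, lons, lat, lon):
    """Indices (i, j) of the grid cell centre closest to a sub-basin centroid."""
    i = min(range(len(lats)), key=lambda k: abs(lats[k] - lat))
    j = min(range(len(lons)), key=lambda k: abs(lons[k] - lon))
    return i, j

def virtual_station_text(station_id, start_date, daily_mm):
    """Text content for one virtual station: ID, start date, then daily values (mm)."""
    lines = [station_id, start_date.strftime("%Y%m%d")]
    lines += [f"{p:.1f}" for p in daily_mm]
    return "\n".join(lines) + "\n"

# Hypothetical grid and centroid
grid_lats = [27.05, 27.15, 27.25]
grid_lons = [112.05, 112.15]
cell = nearest_cell(grid_lats, grid_lons, 27.18, 112.07)
text = virtual_station_text("sub01", date(2003, 1, 1), [0.0, 12.34])
```

Writing the returned string to one file per station reproduces the one-file-per-virtual-station arrangement described above.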
3. Methodology
3.1. The Distributed Physics-Informed Deep Learning Model (DPDL)
Zhong et al. (2024) [31] adopted the differentiable parameter learning (dPL) framework proposed by Tsai et al. (2021) [24] and Feng et al. (2022) [17], along with the EXP-HYDRO model [59], to develop a distributed, physics-informed deep learning hydrological model (DPDL) (Figure 2). This model incorporates two state buckets (a snow bucket and a soil water bucket) and six physical parameters, primarily describing processes such as precipitation, snow accumulation and ablation, soil water dynamics, and subsurface and surface flow. Given the minimal influence of glacial runoff and snowmelt on the Xiangjiang River Basin, related computations were excluded from this study.
Figure 2.
Flow chart for this study.
Inspired by Feng et al. (2022) [17], Zhong et al. (2024) [31] introduced an additional parameter, β, into the evapotranspiration calculation to reflect vegetation effects. Following Mizukami et al. (2017) [60], a hillslope routing module was incorporated into the surface flow component, utilizing a Gamma distribution-based unit hydrograph for convolution calculations to derive the final surface runoff at the outlet of each sub-basin. In constructing the differentiable hydrological model, two primary categories of input data were systematically collected and prepared: watershed physical attributes for determining static parameters and meteorological time series data for inferring dynamic parameters.
The process began with the extraction of watershed static features and the derivation of static parameters. Following the delineation of sub-basins based on DEM data, detailed statistical indicators of physical characteristics were calculated for each sub-basin. The specific indicators encompassed:
Topographic features: Area, Mean Elevation, Mean Slope, Standard Deviation of Slope, Mean Aspect, Drainage Density, Main Channel Length, Main Channel Slope, and Elevation Roughness Index. Land use and vegetation features: Vegetation Cover Index (e.g., mean NDVI), Forest Area Ratio, Grassland Area Ratio, and Impervious Area Ratio. Soil and geological features: Soil Type Proportion (e.g., sand, loam, clay), Soil Depth, Saturated Hydraulic Conductivity, and Geological Type Index. Climatological statistical features: Mean Annual Precipitation, Mean Annual Temperature, Precipitation Seasonality Index, and Aridity Index.
These feature vectors, along with meteorological sequence data, served as input to a static parameter learning network composed of a 1D Convolutional (Conv1D) encoder and a Fully Connected Neural Network (FCNN). The Conv1D layer was employed to extract supplementary features reflecting watershed meteorological characteristics from the forcing data, thereby mitigating uncertainties inherent in watershed attribute selection. Subsequently, the learned meteorological features were concatenated with the watershed’s physical features. The fused features underwent a nonlinear transformation through the FCNN, ultimately outputting a set of static hydrological parameters for each sub-basin. These included parameters for watershed runoff generation, such as Snow bucket maximum storage (Smax_snow), Soil water bucket maximum storage (Smax_soil), Outflow coefficient (K), and Infiltration rate (f), as well as parameters for channel routing, namely the shape parameter (a) and time scale parameter (b) for the Gamma-based unit hydrograph of surface flow. The hyperparameters for this static parameter network can be specifically designed as a funnel-shaped Conv1D architecture (kernel counts 10/5/1, sizes 7/5/3) followed by an FCNN layer (hidden size 128), to achieve effective mapping from meteorological sequences to static parameters. These parameters remained constant during the simulation period, representing the intrinsic physical properties of the watersheds and channels.
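To make the role of the shape parameter (a) and time scale parameter (b) concrete, the following sketch builds a discrete Gamma unit hydrograph and convolves it with a runoff series. It is written in plain Python for readability and is not differentiable as written; the midpoint sampling, the renormalization of the weights, and the function names are assumptions made for this illustration.

```python
import math

def gamma_unit_hydrograph(a, b, n_steps):
    """Discrete unit hydrograph from a Gamma(shape=a, scale=b) density.

    The density is sampled at the midpoint of each daily step and the
    weights are renormalized so that they sum to one.
    """
    weights = []
    for k in range(n_steps):
        t = k + 0.5
        log_pdf = (a - 1.0) * math.log(t) - t / b - math.lgamma(a) - a * math.log(b)
        weights.append(math.exp(log_pdf))
    total = sum(weights)
    return [w / total for w in weights]

def route(runoff, uh):
    """Convolve a runoff series with the unit hydrograph (flow past the series end is dropped)."""
    out = [0.0] * len(runoff)
    for t, r in enumerate(runoff):
        for k, w in enumerate(uh):
            if t + k < len(out):
                out[t + k] += r * w
    return out

# Hypothetical parameter values and a single 10 mm runoff pulse
uh = gamma_unit_hydrograph(a=2.5, b=1.2, n_steps=15)
routed = route([10.0] + [0.0] * 19, uh)
```

Larger values of a delay the peak and larger values of b stretch the recession, so the two parameters together control the timing and attenuation of surface runoff at each sub-basin outlet.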
Next, meteorological time series data were processed for dynamic parameter inference. The collected long-term daily meteorological forcing data, specifically Precipitation, Temperature, and Potential Evapotranspiration, were fed into a Long Short-Term Memory (LSTM) network. The LSTM learned the temporal evolution patterns of meteorological conditions and their dynamic influence on hydrological process states, outputting a set of dynamic parameters at each time step. These included: a soil freeze–thaw parameter (controlling ice content in soil water), a soil water outflow control parameter (governing the recession rate of flow from the soil bucket), and an evapotranspiration vegetation correction parameter (for adjusting ET based on vegetation phenology). The dynamic parameter network was typically constructed using an LSTM (with a hidden size of 128) and often incorporated a relatively high Dropout rate (0.5) to enhance generalization capability.
Following runoff generation calculations for each sub-basin, the flow was routed through the river network to the watershed outlet. This study employed a differentiable framework coupled with the Muskingum method for flow routing. Within this framework, the storage time constant and weighting factor for each river reach were automatically determined by a lightweight FCNN (hidden size 32), taking inputs such as channel silt content, clay content, sand content, length, slope, and width.
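For readers unfamiliar with the Muskingum method, a minimal single-reach sketch is given below. In the framework described above the storage time constant and weighting factor are predicted per reach by a neural network; here they are simply passed in as fixed numbers, and the names `k`, `x`, and `dt` are illustrative. The classical coefficient form is assumed.

```python
def muskingum_coefficients(k, x, dt):
    """Classical Muskingum routing coefficients (they sum to one)."""
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom
    return c0, c1, c2

def muskingum_route(inflow, k, x, dt=1.0, initial_outflow=None):
    """Route an inflow hydrograph through one reach.

    k is the storage time constant and dt the time step (same units);
    x is the dimensionless weighting factor.
    """
    c0, c1, c2 = muskingum_coefficients(k, x, dt)
    outflow = [inflow[0] if initial_outflow is None else initial_outflow]
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow

# Hypothetical reach parameters and a single flood pulse
routed = muskingum_route([10.0, 10.0, 50.0, 10.0, 10.0, 10.0], k=1.5, x=0.2)
```

Every operation in the recursion is a sum or product, so the same update can be expressed in an automatic-differentiation library and trained jointly with the rest of the model, which is what makes the routing step compatible with end-to-end optimization.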
Finally, the watershed and channel static parameters determined by the FCNN, along with the dynamic parameters inferred at each time step by the LSTM network, jointly drove the differentiable EXP-HYDRO hydrological physical model. This completed the full-process simulation from meteorological inputs to runoff generation in each sub-basin, followed by routing through the river network to the main watershed outlet. The entire framework was optimized end-to-end via gradient backpropagation, utilizing metrics such as the Nash-Sutcliffe Efficiency (NSE) as the loss function, and employing techniques like gradient clipping and early stopping to prevent training instability and overfitting. This architecture ensures that the model maintains both physical mechanism constraints and spatiotemporal dynamic adaptability, even in data-scarce regions.
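An NSE-based objective can be sketched as follows. The text states only that NSE-type metrics serve as the loss, so the specific form used here (loss = 1 − NSE, to be minimized) is an assumption, and the function names are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_term

def nse_loss(observed, simulated):
    """Assumed training objective: minimizing 1 - NSE maximizes NSE."""
    return 1.0 - nse(observed, simulated)

# Hypothetical streamflow values
obs = [1.0, 2.0, 3.0, 4.0]
loss_perfect = nse_loss(obs, obs)
loss_mean = nse_loss(obs, [2.5, 2.5, 2.5, 2.5])
```

Because the squared error is normalized by the variance of the observations, high-flow days dominate the objective, which is one reason NSE-trained models can show larger relative errors during dry periods.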
3.2. SWAT Model (Benchmark Model)
This study also employed the Soil and Water Assessment Tool (SWAT) model as a benchmark. SWAT is a semi-distributed, process-based watershed-scale model widely used to assess the long-term impacts of land use and climate change on the hydrologic cycle [42]. The model divides the watershed into sub-basins based on a DEM and further delineates Hydrologic Response Units (HRUs) as the minimum computational units based on land cover, soil type, and slope.
The study primarily focuses on the hydrologic component, which is founded on the water balance equation:
$$SW_t = SW_0 + \sum_{i=1}^{t}\left(R_{day} - Q_{surf} - E_a - w_{seep} - Q_{gw}\right)$$
where $SW_t$ is the soil water content at time $t$, with $t$ expressed in days; $R_{day}$ is the precipitation (mm); $SW_0$ is the initial soil water content; $w_{seep}$ is the amount of water seeping from the soil profile; $Q_{surf}$ is the surface runoff; $E_a$ is the evapotranspiration; and $Q_{gw}$ is the groundwater runoff.
The hydrological calculations in the SWAT model follow the water balance equation, mainly including four aspects: surface runoff, evapotranspiration, soil water, and groundwater.
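Runoff can be predicted using the curve number method:
$$Q_{surf} = \frac{\left(R_{day} - 0.2S\right)^2}{R_{day} + 0.8S}$$
where $R_{day}$ is the daily rainfall (mm); $Q_{surf}$ is the runoff (mm); and $S$ is the retention parameter, which can be calculated using the following SCS equation:
$$S = 25.4\left(\frac{1000}{CN} - 10\right)$$
where $CN$ is the curve number under normal antecedent moisture conditions.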
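The curve number calculation can be sketched in a few lines. The sketch assumes the common SCS form in which the initial abstraction equals 0.2 times the retention parameter; the function name `scs_runoff` and the example values are illustrative and are not taken from the study.

```python
def scs_runoff(rain_mm, cn):
    """Daily surface runoff (mm) from daily rainfall (mm) with the SCS curve number method.

    Assumption: initial abstraction Ia = 0.2 * S, the conventional default.
    """
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter, mm
    ia = 0.2 * s                     # initial abstraction, mm
    if rain_mm <= ia:
        return 0.0                   # all rainfall is abstracted, no runoff
    return (rain_mm - ia) ** 2 / (rain_mm + 0.8 * s)


if __name__ == "__main__":
    # 50 mm of rain on a surface with CN = 80 (illustrative values)
    print(round(scs_runoff(50.0, 80.0), 2))
```

A higher curve number gives a smaller retention parameter and therefore more runoff for the same rainfall, which is why CN is usually among the most sensitive calibration parameters.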
The formula for calculating potential evapotranspiration is:
$$\lambda E = \frac{\Delta\left(H_{net} - G\right) + \rho_{air}\, c_p \left(e_z^0 - e_z\right)/r_a}{\Delta + \gamma\left(1 + r_c/r_a\right)}$$
where: $\lambda E$ is the latent heat flux density, MJ/(m²·d); $E$ is the evapotranspiration rate, mm/d; $\Delta$ is the slope of the saturation vapor pressure-temperature curve, kPa/°C; $H_{net}$ is the net radiation, MJ/(m²·d); $G$ is the ground heat flux density, MJ/(m²·d); $\rho_{air}$ is the air density, kg/m³; $c_p$ is the specific heat at constant pressure, MJ/(kg·°C); $e_z^0$ is the saturation vapor pressure, kPa; $e_z$ is the water vapor pressure, kPa; $\gamma$ is the psychrometric constant, kPa/°C; $r_c$ is the plant canopy resistance, s/m; $r_a$ is the aerodynamic resistance, s/m.
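The formula for calculating interflow is:
$$Q_{lat} = 0.024 \cdot \frac{2\, SW_{ly,excess}\, K_{sat}\, slp}{\phi_d\, L_{hill}}$$
where: $Q_{lat}$ is the outflow from the saturated zone, mm; $SW_{ly,excess}$ is the drainable volume of water stored in the saturated zone, mm; $K_{sat}$ is the saturated hydraulic conductivity of the soil, mm/h; $slp$ is the slope, m/m; $\phi_d$ is the total soil porosity; $L_{hill}$ is the slope length, m.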
The calculation formula for shallow groundwater is:
$$aq_{sh,i} = aq_{sh,i-1} + w_{rchrg,sh} - Q_{gw} - w_{revap} - w_{pump,sh}$$
where: $aq_{sh,i}$ is the shallow groundwater amount on day i, mm; $aq_{sh,i-1}$ is the shallow groundwater amount on day i − 1, mm; $w_{rchrg,sh}$ is the recharge amount, mm; $Q_{gw}$ is the baseflow generated, mm; $w_{revap}$ is the amount of water entering the soil zone on day i, mm; $w_{pump,sh}$ is the shallow extraction amount, mm.
The calculation formula for deep groundwater amount is:
$$aq_{dp,i} = aq_{dp,i-1} + w_{deep} - w_{pump,dp}$$
where: $aq_{dp,i}$ is the deep groundwater content on day i, mm; $aq_{dp,i-1}$ is the deep groundwater amount on day i − 1, mm; $w_{deep}$ is the recharge amount of the deep aquifer, mm; $w_{pump,dp}$ is the deep extraction amount, mm.
The topographic data, land use data, soil data, and sub-basin delineation used for the SWAT model in this study are consistent with those used for DPDL. This aims to provide a reference for evaluating precipitation products under different hydrological modeling frameworks.
3.3. Experimental Design
To evaluate the accuracy of different precipitation products in the Xiangjiang River Basin and their applicability in driving different hydrological models, this study designed the following experiments:
(1) Direct comparison of precipitation estimation accuracy: The accuracy of the precipitation products is evaluated by comparing gauge precipitation data with precipitation estimates from the corresponding grid cells [61]. For grid cells containing only a single gauge, a point-to-pixel evaluation is conducted [62]. For grid cells containing two or more gauges, a pixel-to-pixel evaluation is performed, using the arithmetic mean of all gauge data within the grid cell as the benchmark precipitation amount [63].
(2) Hydrological model setup and evaluation: The differentiable DPDL and SWAT hydrological models were set up for the Xiangjiang River Basin. Both models were calibrated using the first 9 years of the study period (1 January 2003 to 31 December 2011) and validated using the final 7 years (1 January 2012 to 31 December 2018).
Existing research indicates that hydrological models can, to some extent, compensate for errors in SPPs through the calibration process [64]. To evaluate the performance of each precipitation product for runoff modeling and the adaptability of different hydrological models to different SPPs, a comparative assessment was conducted using two strategies: S1 (Parameter-fixed) and S2 (Product-specific recalibration) [36].
S1 (Parameter-fixed): Model parameters are calibrated using gauge precipitation data and subsequently kept fixed. The model is then validated using each SPP. This assesses the suitability of the SPPs under consistent parameter conditions.
S2 (Product-specific recalibration): Each SPP is used directly for both model calibration and validation. This assesses the ability of different hydrological models to compensate for errors inherent in the SPPs.
(3) Evaluation of flood simulation capability: This study selected 20 flood events occurring between 2003 and 2018, naming them by their occurrence date (e.g., the flood event on 22 June 2017 is denoted “20170622”). Among these, 12 events fall within the calibration period and 8 within the validation period (Table 1). By calculating errors in peak flow and flood volume, the performance of the hydrological models driven by different precipitation products in simulating individual flood events was analyzed.
Table 1.
Flood events from 2003 to 2018 over the Xiangjiang River basin.
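The event naming convention and the event-scale errors used in experiment (3) can be sketched as follows. The text does not give the exact error formulas, so the relative peak-flow error and relative flood-volume error below are common definitions assumed here; the function names and the sample hydrographs are hypothetical.

```python
from datetime import date


def event_id(event_date):
    """Name a flood event by its occurrence date, e.g. 22 June 2017 -> '20170622'."""
    return event_date.strftime("%Y%m%d")


def flood_event_errors(sim, obs):
    """Relative peak-flow error and relative flood-volume error, both in percent.

    Assumption: sim and obs are event hydrographs on the same fixed time step,
    so the volume ratio reduces to the ratio of summed discharges.
    """
    peak_error = (max(sim) - max(obs)) / max(obs) * 100.0
    volume_error = (sum(sim) - sum(obs)) / sum(obs) * 100.0
    return peak_error, volume_error


if __name__ == "__main__":
    print(event_id(date(2017, 6, 22)))
    # Hypothetical three-step event hydrographs in m3/s
    print(flood_event_errors([100.0, 450.0, 300.0], [100.0, 500.0, 400.0]))
```

Negative values indicate that the simulation underestimates the observed peak or volume.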
3.4. Evaluation Metrics
(1) Categorical Metrics for Precipitation Detection [43]:
Probability of Detection (POD): Measures the proportion of actual events that were successfully detected; False Alarm Ratio (FAR): Measures the proportion of predicted events that were false alarms; Critical Success Index (CSI): A comprehensive measure evaluating the accuracy and detection capability of precipitation products in capturing precipitation events and the hydrological model in simulating runoff processes.
$$POD = \frac{TP}{TP + FN}, \qquad FAR = \frac{FP}{TP + FP}, \qquad CSI = \frac{TP}{TP + FN + FP}$$
where: TP (True Positives) is the number of precipitation events correctly detected by both the precipitation product/hydrological model and the ground observations; FN (False Negatives) is the number of events recorded by ground observations but missed by the precipitation product/hydrological model; FP (False Positives) is the number of events detected by the precipitation product/hydrological model but not confirmed by ground observations.
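The three categorical scores follow directly from counting TP, FN, and FP over paired daily values. The sketch below assumes a wet-day threshold of 0.1 mm/day, which is a common choice but is not stated in this section; the function name and sample series are illustrative.

```python
def categorical_scores(product, gauge, threshold=0.1):
    """Return (POD, FAR, CSI) for paired daily precipitation series in mm/day.

    Assumption: a day counts as a precipitation event when the value is at or
    above `threshold` (0.1 mm/day here, chosen for illustration).
    """
    tp = fn = fp = 0
    for p, g in zip(product, gauge):
        product_wet = p >= threshold
        gauge_wet = g >= threshold
        if product_wet and gauge_wet:
            tp += 1      # event detected by both
        elif gauge_wet:
            fn += 1      # observed event missed by the product
        elif product_wet:
            fp += 1      # product event not confirmed by the gauge

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    pod = ratio(tp, tp + fn)
    far = ratio(fp, tp + fp)
    csi = ratio(tp, tp + fn + fp)
    return pod, far, csi


if __name__ == "__main__":
    print(categorical_scores([0.0, 5.0, 2.0, 0.0, 1.0], [0.0, 4.0, 0.0, 3.0, 2.0]))
```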
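(2) Flow Simulation Metrics [36]:
Nash-Sutcliffe Efficiency (NSE): Measures the goodness-of-fit between simulated and observed values; Correlation Coefficient (Corr): Describes the linear correlation between simulations and observations; Kling-Gupta Efficiency (KGE): Combines correlation, bias, and variability into a single score.
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(S_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$
$$Corr = \frac{\sum_{i=1}^{n}\left(S_i - \bar{S}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(S_i - \bar{S}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}$$
$$KGE = 1 - \sqrt{\left(Corr - 1\right)^2 + \left(\alpha - 1\right)^2 + \left(\beta - 1\right)^2}$$
(3) Bias and Error Metrics [52]:
Relative Bias (RBIAS): Evaluates systematic overestimation or underestimation; Root Mean Square Error (RMSE): Measures the magnitude of simulation errors.
$$RBIAS = \frac{\sum_{i=1}^{n}\left(S_i - O_i\right)}{\sum_{i=1}^{n} O_i} \times 100\%$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - O_i\right)^2}$$
where: α is the bias ratio of the mean simulated streamflow to the mean observed streamflow; β is the variability ratio of the coefficient of variation of the simulated streamflow to the coefficient of variation of the observed streamflow; n is the number of time steps or events; $S_i$ and $O_i$ are the simulated and observed values at time i, respectively; and $\bar{S}$ and $\bar{O}$ are the mean simulated and observed values, respectively.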
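The continuous metrics can be computed with the standard library alone. The sketch below uses the textbook definitions of NSE, Pearson correlation, relative bias, and RMSE, and a KGE built from the correlation, the bias ratio α, and the variability ratio β described above; the function names and the population form of the standard deviation are choices made here for illustration.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    obs_mean = _mean(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - num / den


def corr(sim, obs):
    """Pearson correlation coefficient."""
    sm, om = _mean(sim), _mean(obs)
    cov = sum((s - sm) * (o - om) for s, o in zip(sim, obs))
    var_s = sum((s - sm) ** 2 for s in sim)
    var_o = sum((o - om) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def rbias(sim, obs):
    """Relative bias in percent; positive means overestimation."""
    return (sum(sim) - sum(obs)) / sum(obs) * 100.0


def rmse(sim, obs):
    """Root mean square error, in the units of the input series."""
    return math.sqrt(_mean([(s - o) ** 2 for s, o in zip(sim, obs)]))


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, bias ratio, and CV ratio."""
    sm, om = _mean(sim), _mean(obs)
    alpha = sm / om                                           # bias ratio
    cv_s = math.sqrt(_mean([(s - sm) ** 2 for s in sim])) / sm
    cv_o = math.sqrt(_mean([(o - om) ** 2 for o in obs])) / om
    beta = cv_s / cv_o                                        # variability ratio
    r = corr(sim, obs)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

A simulation that doubles every observed value illustrates why several metrics are needed: its correlation is perfect, yet its relative bias is +100% and its NSE is strongly negative.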
4. Results
4.1. Accuracy Assessment of Multi-Source Precipitation Products
The quantitative evaluation of the five satellite precipitation products reveals distinct systematic biases, as summarized in Table 2. Overall, GSMaP delivered the best performance, achieving the highest Corr of 0.79, the lowest RMSE of 6.88 mm/day, and the most robust precipitation detection capability with a CSI of 0.70. IMERG-F also demonstrated high accuracy with a Corr of 0.75 and showed a strong ability to detect precipitation events, evidenced by its POD of 0.71. However, its utility was somewhat constrained by a notably higher FAR of 0.32.
Table 2.
Performance assessment of the five SPPs.
In terms of systematic bias, the products diverged into two groups: IMERG-F and TRMM-RT displayed slight positive RBIAS of +3.95% and +4.53%, respectively, while CMORPH, GSMaP, and TRMM were characterized by negative RBIAS of −8.48%, −3.62%, and −1.79%, respectively. A noteworthy observation was the shift from the positive bias in TRMM-RT to the slight negative bias in its post-real-time counterpart, TRMM, underscoring the effectiveness of the post-real-time correction algorithm.
A further stratified evaluation based on rainfall intensity levels highlights significant performance degradation for all products as intensity increases (Table 3). GSMaP stands out for its remarkable robustness, consistently maintaining the highest Corr and the lowest RMSE across all intensity categories, closely followed by IMERG-F. Despite their superior performance in these metrics, both GSMaP and IMERG-F exhibit relatively pronounced negative RBIAS. Conversely, the TRMM series products (TRMM-RT and TRMM), while generally featuring lower absolute RBIAS, are outperformed in terms of Corr and RMSE.
Table 3.
Performance of the five SPPs across varying precipitation intensity levels.
All products showed a consistent, systematic performance decay with increasing precipitation intensity: Corr values declined, RMSE increased markedly, and the magnitude of underestimation (negative RBIAS) intensified. For heavy precipitation events exceeding 30 mm/day in particular, every product underestimated rainfall by more than 24% (RBIAS below −24%), while Corr dropped below 0.50 and RMSE approached 30 mm/day. This pattern underscores a fundamental difficulty that current satellite retrieval algorithms have in capturing the core structure and intensity of deep convective systems, and it is consistent with the findings of Meng and Zhao (2025) [36].
4.2. Hydrological Model Construction and Applicability Assessment
To enhance the accuracy of the SWAT model, it is crucial to select highly sensitive parameters, define suitable warm-up and calibration periods, and choose appropriate calibration methods. The LH-OAT (Latin hypercube and one-at-a-time) method was employed to select 12 highly sensitive parameters, as shown in Table 4.
Table 4.
Sensitivity parameters for the runoff in the Xiangjiang Basin.
Table 5 summarizes the performance metrics of the SWAT and DPDL models for daily streamflow simulation in the Xiangjiang River Basin. During both the calibration (2003–2011) and validation (2012–2018) periods, and when driven by gauge-based precipitation data, both models demonstrated satisfactory performance, with the Nash-Sutcliffe efficiency coefficient (NSE), Kling-Gupta efficiency coefficient (KGE), and correlation coefficient all exceeding 0.78. This indicates their strong capability in daily streamflow simulation and establishes a reliable foundation for the subsequent evaluation of the hydrological utility of multi-source precipitation products.
Table 5.
Performance of the models during calibration and validation periods.
A comparative analysis revealed distinct performance characteristics between the two model structures. The DPDL model demonstrated superior capability in capturing watershed streamflow dynamics, achieving a higher NSE of 0.83 and a lower RMSE of 839.24 m³/s during the calibration period compared to the SWAT model (NSE = 0.78, RMSE = 932.19 m³/s). This advantage in dynamic simulation was maintained during the validation period. However, the RBIAS values for the DPDL model were +14.36% and +16.69% during the calibration and validation periods, respectively, notably higher than those of the SWAT model (+5.82% and +10.61%). The SWAT model therefore performed better in maintaining the overall water balance.
The daily streamflow hydrographs (Figure 3) clearly illustrate the DPDL model’s superior capability in capturing watershed hydrological dynamics. The model achieved more accurate and stable simulations of major flood peaks, such as the June 2017 event, where its simulated values aligned more closely with observations compared to the SWAT model. This performance effectively mitigated the systematic overestimation or underestimation occasionally observed in the SWAT model. These findings confirm the effectiveness of the dynamic parameterization scheme in DPDL, consistent with observations made by Jarrin-Perez et al. (2025) [42] in the Little River Experimental Watershed. As noted by Wang et al. (2024) [26], physics-encoded deep learning frameworks can enhance model robustness during extreme events by integrating physical constraints into the learning process.
Figure 3.
Observed and simulated hydrographs of streamflow based on SWAT and DPDL.
However, some limitations of the DPDL model were observed. While it captured the general trends of daily streamflow well, it consistently overestimated low-flow periods and exhibited certain fluctuations in these periods. In contrast, the SWAT model demonstrated greater stability during low-flow conditions. This instability in DPDL may be attributed to the limited number of training samples available for daily streamflow simulation.
In summary, although the DPDL model demonstrated significant advantages in capturing watershed hydrological dynamics through its dynamic parameterization scheme, its data-driven nature resulted in persistent overestimation during dry periods and underestimation during flood events. Furthermore, when trained with limited daily streamflow samples, the model exhibited certain fluctuations in simulated hydrographs, highlighting the importance of sufficient training data for achieving stable performance with data-driven hydrological models.
4.3. Evaluation of Hydrological Utility for Different Precipitation Products
Table 6 summarizes the performance metrics of the SWAT and DPDL models under two distinct training strategies (S1 and S2) when driven by different SPPs.
Table 6.
Hydrological model performance under different training strategies driven by multi-source precipitation products.
Under the S1 strategy (parameter-fixed), streamflow simulation accuracy was strongly correlated with the inherent accuracy of the driving precipitation product. The GSMaP product, which demonstrated the highest original accuracy, consequently achieved the best hydrological simulation performance under this strategy (NSE = 0.73 for DPDL during the validation period). Notably, the differentiable hydrological model (DPDL) exhibited superior trend compensation capability. When driven by the same precipitation product, DPDL generally achieved higher NSE values than the SWAT model, indicating its successful capture of flow process dynamics and demonstrating that its dynamic parameters can flexibly respond to input errors. This finding aligns with reports by Song et al. (2024) [65]. However, DPDL showed less satisfactory performance in RBIAS and KGE metrics, systematically overestimating the overall water volume, which corroborates the findings discussed in previous sections.
The S2 strategy (product-specific retraining) significantly enhanced the simulation performance for most precipitation products and hydrological model combinations. This improvement was particularly pronounced in the DPDL model. For instance, for the lower-performance TRMM-RT product, the NSE of DPDL during the validation period increased from 0.57 under the S1 strategy to 0.66 under the S2 strategy, representing an improvement of 15.8%. More importantly, the S2 strategy effectively narrowed the simulation gap between precipitation products of different quality levels. Precipitation products with poorer inherent accuracy achieved the greatest improvement in streamflow simulation accuracy under the S2 strategy, while those with better inherent accuracy showed smaller improvements. This demonstrates that through product-specific retraining, the model can learn and correct the unique systematic biases inherent to different products. However, when driving both SWAT and DPDL models, the CMORPH product showed lower accuracy under the S2 strategy compared to the S1 strategy, which may be related to the specific error characteristics of this precipitation product.
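As a check on the arithmetic, the improvement quoted for TRMM-RT is the relative change in validation NSE between the two strategies:

$$\frac{NSE_{S2} - NSE_{S1}}{NSE_{S1}} = \frac{0.66 - 0.57}{0.57} \approx 0.158 = 15.8\%$$

This is a relative gain; the corresponding absolute gain is 0.09 NSE units.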
The essence of the S2 strategy lies in how the retraining process optimizes the model’s internal parameters, enabling it to adapt to the error characteristics of specific precipitation products. This process allows the model to acquire a “correction function” that is difficult to achieve with models that rely on fixed parameter sets, such as SWAT. Over the validation period, DPDL under the S2 strategy achieved an average NSE of 0.71, notably higher than the 0.63 achieved by the SWAT model, illustrating the potential and adaptability of the differentiable modeling framework in utilizing multi-source precipitation data.
In summary, compared to the SWAT model, the differentiable model (DPDL) with dynamic parameterization capability can effectively and nonlinearly compensate for input errors from satellite precipitation products, although this compensation may be detrimental for the RBIAS metric. Overall, the hydrological utility of precipitation products cannot be simply predicted based on their intrinsic accuracy alone; the coupling effects with specific hydrological models and training strategies must be considered.
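How recalibration on a product can absorb a systematic product bias is easiest to see with a deliberately simple toy model. The sketch below is not DPDL or SWAT: it assumes a one-parameter model Q = k·P fitted by least squares, a hypothetical satellite product that reports 90% of the gauge rainfall, and invented numbers throughout. Under S1 the parameter is fitted to gauge data and kept fixed; under S2 it is refitted to the product.

```python
def calibrate(precip, flow_obs):
    """Least-squares estimate of k in the toy model Q = k * P."""
    return sum(p * q for p, q in zip(precip, flow_obs)) / sum(p * p for p in precip)


def simulate(precip, k):
    return [k * p for p in precip]


def run_strategies(gauge_cal, spp_cal, flow_cal, spp_val):
    """Return validation simulations under S1 and S2 for one product."""
    k_s1 = calibrate(gauge_cal, flow_cal)  # S1: calibrated on gauge data, then fixed
    k_s2 = calibrate(spp_cal, flow_cal)    # S2: recalibrated on the product itself
    return simulate(spp_val, k_s1), simulate(spp_val, k_s2)


if __name__ == "__main__":
    gauge_cal = [10.0, 20.0, 30.0]
    spp_cal = [0.9 * p for p in gauge_cal]   # hypothetical product with a -10% bias
    flow_cal = [0.5 * p for p in gauge_cal]  # "observed" flow of the toy catchment
    spp_val = [36.0]                         # product value for 40 mm of true rain
    s1, s2 = run_strategies(gauge_cal, spp_cal, flow_cal, spp_val)
    print(s1, s2)  # true flow for 40 mm of rain would be 20
```

In this toy case S1 carries the product's 10% underestimation straight into the simulated flow, while S2 inflates the parameter enough to cancel it. A purely multiplicative bias is the easiest case; errors that vary with intensity or season cannot be absorbed by a single static parameter.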
Based on the daily streamflow hydrographs for different precipitation products under various strategies presented in Figure 4, the coupling effects of the two strategies with different hydrological models can be further revealed. Compared to the S1 strategy, streamflow simulations under the S2 strategy for both models demonstrate more sensitive responses to fluctuations in satellite precipitation inputs. Although flood peak simulations under S2 appear somewhat conservative, the hydrographs show significantly improved agreement with the rising and falling limbs of observed streamflow, indicating that this strategy better captures actual hydrological dynamics. In contrast, simulation results under the S1 strategy, while temporally smoother, show unstable performance in peak flow simulation, sometimes overestimating and sometimes underestimating observed values, reflecting how model performance under this strategy is more directly constrained by the absolute accuracy of the precipitation product itself.

Figure 4.
Comparison of simulated streamflow hydrographs across different training strategies of hydrological models forced by multi-source precipitation products.
Regarding differences in model structure, the DPDL model exhibits more conservative characteristics compared to the SWAT model, particularly showing systematic slight underestimation of flood peaks, but with concentrated and stable bias ranges. At the same time, its response to precipitation fluctuations under the S2 strategy is more sensitive. This characteristic may stem from its dynamic parameterization mechanism and data-driven underlying architecture, in which the model continuously updates internal states (such as soil moisture and water storage capacity) in real time based on input sequences. This observation is consistent with the low-flow fluctuations noted for the daily streamflow simulations in Section 4.2.
4.4. Evaluation of Hydrological Utility Based on Flood Events
We selected 20 typical flood events to conduct an in-depth evaluation of the comprehensive performance of five SPPs in driving both the SWAT and DPDL hydrological models under two training strategies. Analysis of the boxplots showing the distribution of flood event simulation error metrics across multiple scenarios (Figure 5) reveals that the DPDL model demonstrates robust performance in process simulation, achieving median NSE and KGE values of 0.65 and 0.58, respectively. These values are significantly higher than the corresponding values of 0.44 and 0.44 for the SWAT model. In terms of systematic bias control, DPDL also performs better, with a median absolute RBIAS of 14.9% compared to 16.8% for SWAT.
Figure 5.
Box plots of error metric distributions for flood event simulations under multiple scenarios.
The distribution characteristics of the metrics further highlight the advantages of DPDL. The interquartile ranges (25th–75th percentiles) of NSE and KGE for DPDL are 0.55 to 0.75 and 0.50 to 0.65, respectively. In contrast, the SWAT model exhibits much wider corresponding ranges of −0.54 to 0.65 and 0.15 to 0.65, with simulation failures occurring in multiple flood events. This strongly corroborates the significant advantage of the dynamic parameterization mechanism in complex flood event simulation, as proposed by Song et al. (2024) [65].
Analysis of Figure 5 indicates that the S2 strategy (product-specific retraining) can, to some extent, improve the median NSE and KGE of the DPDL model and narrow the performance gap between different SPPs. This confirms that through training, the model can learn product-specific biases and form a built-in “correction function”. However, for the RBIAS metric, the S2 strategy still underperforms the S1 strategy in most scenarios. Furthermore, the effect of the S2 strategy is not universally applicable to improving all metrics for every flood event, nor does it enhance the streamflow simulation accuracy for all precipitation products. This phenomenon is particularly evident in the SWAT model. For example, while the S2 strategy effectively improved the median NSE and KGE of streamflow simulations for TRMM and IMERG-F products in the DPDL model, it actually reduced accuracy in the SWAT model. For the CMORPH product, the S2 strategy failed to effectively improve its median NSE and KGE in either hydrological model. In summary, the advantages demonstrated by the S2 strategy in daily streamflow simulation are still partially reflected in event-scale flood simulation, but with notable differences.
In conclusion, model structure is a crucial determinant of simulation performance, the training strategy serves as a key tool for performance optimization, and the hydrological utility of satellite precipitation products (SPPs) is influenced by multiple factors that must be considered together. A single evaluation metric is insufficient to fully assess this utility and, as noted in Section 4.3, intrinsic product accuracy alone does not predict it. When driving the DPDL model in the Xiangjiang River Basin, GSMaP and IMERG-F demonstrated the best overall performance, with GSMaP showing the highest potential but greater performance volatility.
5. Discussion
This study reveals significant performance heterogeneity among different SPPs in the Xiangjiang River Basin. The GSMaP product performed best, achieving a CSI of 0.70 and the highest Corr (0.79), followed by IMERG-F. This indicates that the algorithms of GSMaP are particularly effective in capturing the spatiotemporal patterns of precipitation typical of this region. However, a critical limitation common to all SPPs was exposed under heavy rainfall conditions: when precipitation intensity exceeded 30 mm/day, the correlation coefficients for all products dropped below 0.5, accompanied by underestimation exceeding 24%. This systematic performance degradation in detecting high-intensity rainfall directly explains the fundamental challenge in accurately simulating peak discharges using SPPs.
Compared to the traditional process-based SWAT model, the differentiable model (DPDL) demonstrated a superior capacity in capturing watershed dynamic processes, as evidenced by consistently higher NSE and lower RMSE values during both the calibration and validation periods. This affirms the potential of dynamic parameterization schemes in representing complex hydrological nonlinearities. However, this enhancement came at the expense of the overall water balance. During the calibration period, the DPDL model’s RBIAS of +14.36% was substantially higher than the +5.82% of the SWAT model, indicating a systematic overestimation of streamflow. Furthermore, fluctuations observed in the hydrograph during low-flow periods reveal an inherent limitation of data-driven differentiable models when trained on limited data. Without sufficient physical constraints, these models may learn spurious relationships during low-flow seasons, where the signal-to-noise ratio is low, highlighting the challenge of balancing model flexibility with physical realism.
This study also acknowledges that the dynamic parameterization mechanism carries the risk of amplifying input noise and leading to overfitting when training data is limited, which may compromise the model’s physical consistency and generalization capability. Although regularization techniques have been employed to mitigate this issue, the current analysis has not yet systematically evaluated the relationship between training data volume and the stability of dynamic parameters. Future research should further quantify the robustness of data-driven modules through experiments controlling training data length and adding noise, thereby clarifying the advantages and limitations of dynamic parameterization in practical applications.
In the daily scale streamflow simulation, the architecture of the DPDL model itself provided an inherent compensation capacity for SPP uncertainties, outperforming the SWAT model under both the S1 (parameter-fixed) and S2 (product-specific retraining) strategies. The S2 strategy further amplified this advantage, elevating the mean NSE of the DPDL model to 0.71, which was significantly higher than the 0.63 achieved by the SWAT model. This demonstrates that the DPDL model can effectively learn to adjust its internal mappings to correct for product-specific systematic biases, thereby not only enhancing simulation accuracy but, more crucially, narrowing the performance gap among different SPPs. This “homogenization effect” reduces the dependency of hydrological simulations on the selection of a specific “best” product, thereby enhancing the robustness for operational applications. Nevertheless, the persistence of a high bias under the S2 strategy suggests that the model’s compensation mechanism may be incomplete, or that the model prioritizes fitting high-flow events at the expense of overall water balance during its learning process. Simultaneously, this study observed a clear difference in the effectiveness of the S2 training strategy between the DPDL and SWAT models. This is primarily attributed to the fundamental differences in structural flexibility and parameter adaptability between the two model types. The differentiable architecture and dynamic parameterization mechanism of DPDL allow it to flexibly adjust internal states through training, thereby more effectively learning and compensating for error patterns specific to different satellite precipitation products. In contrast, the static parameter structure of traditional hydrological models like SWAT limits their ability to adapt to complex, heterogeneous input errors. 
Furthermore, the regional and systematic biases inherent in the satellite data themselves, along with the nonlinear characteristics of error propagation through the model processes, collectively influence the final effectiveness of the retraining strategy. Therefore, the effectiveness of the strategy depends not only on the model’s own learning capability but also on the degree of match between the error characteristics of the input data and the model structure.
This study selects SWAT as the comparative benchmark to reveal the core differences between differentiable modeling and traditional semi-distributed physical models. It is important to note that SWAT primarily represents the physics-based semi-distributed modeling paradigm. Consequently, the conclusions of this study focus on the comparison between these two model types and do not encompass other architectures such as purely data-driven models (e.g., LSTM) or lumped conceptual models (e.g., GR4J).
Future research could systematically expand the comparison dimensions. For instance, comparisons with deep learning models could help distinguish the contributions of “physics-informed” and “data-driven” elements; comparisons with conceptual models could examine the impact of structural complexity on error propagation; and comparisons with other distributed models could verify the universality of the observed patterns. Such work would contribute to a more precise positioning of differentiable hydrological models within the model spectrum, clarify their strengths and applicable boundaries, and guide the development of this framework towards greater robustness and generality.
The effectiveness of the S2 strategy was markedly diminished and selective in flood event simulation. Although the DPDL model achieved the best overall performance in flood events (median NSE = 0.65, KGE = 0.58), the S2 strategy failed to consistently improve all metrics across all events and products, a phenomenon particularly evident within the SWAT framework. This discrepancy can be attributed to the differing nature of errors across time scales. Errors at the daily scale can be partially compensated for through continuous parameter adjustment, whereas the short-duration errors in SPPs during the genesis of flood events are too acute and non-stationary to be fully overcome by any calibration strategy.
6. Conclusions
This study conducted a systematic evaluation of the coupled performance between a differentiable hydrological model (DPDL) and multiple SPPs in the Xiangjiang River Basin, China, leading to the following main conclusions:
(1) The performance of SPPs is heterogeneous, and their capability to capture extreme events is insufficient. GSMaP exhibited the best overall statistical performance, but all SPPs have significant deficiencies in capturing heavy precipitation (>30 mm/day), which is the root cause of their inability to reliably simulate flood peaks.
(2) The DPDL model possesses superior process simulation capability but poses challenges in maintaining water balance. The differentiable model structure significantly outperformed the traditional SWAT model in simulating streamflow dynamics (NSE, RMSE). However, it consistently introduced a higher systematic bias (RBIAS > 14%) and low-flow instability, revealing a critical trade-off between process representation fidelity and physical consistency under limited training data.
(3) Model structure is a key factor determining bias compensation capacity. The DPDL model exhibited a stronger intrinsic ability to compensate for SPP uncertainties compared to SWAT. The product-specific retraining (S2) strategy effectively leveraged this capacity, not only further enhancing the daily scale simulation accuracy but, more importantly, homogenizing the performance differences among various SPPs, thereby reducing the sensitivity of simulation results to product choice.
(4) The effectiveness of the strategy is scale- and model-dependent. The utility of the S2 strategy is not universal. It proved highly effective for the daily scale streamflow simulation, but its improvements for flood event simulation were selective. Furthermore, the advantages of the S2 strategy were more pronounced in the structurally more flexible DPDL model than in the more rigid SWAT model.
Author Contributions
Conceptualization, S.Y. and C.J.; methodology, S.Y.; software, S.Y.; validation, S.Y. and Y.L.; formal analysis, S.Y.; investigation, Y.L.; resources, Y.L.; data curation, X.W. and S.Y.; writing—original draft preparation, S.Y.; writing—review and editing, C.J.; visualization, S.Y.; supervision, C.J.; project administration, C.J.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.
Funding
The study was supported by the National Natural Science Foundation of China (52079010), the Water Resources Science and Technology Program of Hunan Province (XSKJ2023059-06), the Key Research and Development Program Project of Hunan Province, China (2025AQ2014) and the Major Water Science and Technology Projects of Hunan Province (XSKJ2024064-2).
Data Availability Statement
The TRMM 3B42RT and 3B42V7 data are available at https://disc.gsfc.nasa.gov/datasets?keywords=TRMM, accessed on 8 April 2023. The GPM IMERG-F data are available at https://gpm1.gesdisc.eosdis.nasa.gov/data/, accessed on 8 April 2023. The GSMaP data are available at https://sharaku.eorc.jaxa.jp/GSMaP/index.htm, accessed on 8 April 2023. The CMORPH data are available at https://www.ncei.noaa.gov/data/cmorph-high-resolution-global-precipitation-estimates/access/daily/, accessed on 8 April 2023.
Acknowledgments
The authors would like to thank the China Meteorological Administration National Meteorological Information Center for the provision of the gauge-based precipitation and air temperature observations, and the Hydrological and Water Resources Survey Center of Hunan Province for providing the streamflow data. The authors would like to express gratitude to the National Aeronautics and Space Administration (NASA) and the Japan Aerospace Exploration Agency (JAXA) for providing the TRMM products; the NASA Goddard Earth Sciences Data and Information Services Center for providing the GPM IMERG data; JAXA for providing the GSMaP data; and the NOAA Climate Prediction Center for providing the CMORPH data. The authors also wish to express their gratitude to the anonymous reviewers for their valuable suggestions and comments.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Yang, D.; Yang, Y.; Xia, J. Hydrological Cycle and Water Resources in a Changing World: A Review. Geogr. Sustain. 2021, 2, 115–122. [Google Scholar] [CrossRef]
- Davenport, F.V.; Burke, M.; Diffenbaugh, N.S. Contribution of Historical Precipitation Change to US Flood Damages. Proc. Natl. Acad. Sci. USA 2021, 118, e2017524118. [Google Scholar] [CrossRef]
- Zubelzu, S.; Ghalkha, A.; Ben Issaid, C.; Zanella, A.; Bennis, M. Coupling Machine Learning and Physical Modelling for Predicting Runoff at Catchment Scale. J. Environ. Manag. 2024, 354, 120404. [Google Scholar] [CrossRef] [PubMed]
- Yang, S.; Yang, D.; Chen, J.; Santisirisomboon, J.; Lu, W.; Zhao, B. A Physical Process and Machine Learning Combined Hydrological Model for Daily Streamflow Simulations of Large Watersheds with Limited Observation Data. J. Hydrol. 2020, 590, 125206. [Google Scholar] [CrossRef]
- McMillan, H.K.; Booker, D.J.; Cattoën, C. Validation of a National Hydrological Model. J. Hydrol. 2016, 541, 800–815. [Google Scholar] [CrossRef]
- Clark, M.P.; Schaefli, B.; Schymanski, S.J.; Samaniego, L.; Luce, C.H.; Jackson, B.M.; Freer, J.E.; Arnold, J.R.; Moore, R.D.; Istanbulluoglu, E.; et al. Improving the Theoretical Underpinnings of Process-based Hydrologic Models. Water Resour. Res. 2016, 52, 2350–2365. [Google Scholar] [CrossRef]
- Semenova, O.; Beven, K. Barriers to Progress in Distributed Hydrological Modelling. Hydrol. Process. 2015, 29, 2074–2078. [Google Scholar] [CrossRef]
- Kollet, S.; Sulis, M.; Maxwell, R.M.; Paniconi, C.; Putti, M.; Bertoldi, G.; Coon, E.T.; Cordano, E.; Endrizzi, S.; Kikinzon, E.; et al. The Integrated Hydrologic Model Intercomparison Project, IH-MIP2: A Second Set of Benchmark Results to Diagnose Integrated Hydrology and Feedbacks. Water Resour. Res. 2017, 53, 867–890. [Google Scholar] [CrossRef]
- Moges, E.; Demissie, Y.; Larsen, L.; Yassin, F. Review: Sources of Hydrological Model Uncertainties and Advances in Their Analysis. Water 2020, 13, 28. [Google Scholar] [CrossRef]
- Sun, R.; Pan, B.; Duan, Q. Learning Distributed Parameters of Land Surface Hydrologic Models Using a Generative Adversarial Network. Water Resour. Res. 2024, 60, e2024WR037380. [Google Scholar] [CrossRef]
- Jiang, S.; Zheng, Y.; Wang, C.; Babovic, V. Uncovering Flooding Mechanisms Across the Contiguous United States Through Interpretive Deep Learning on Representative Catchments. Water Resour. Res. 2022, 58, e2021WR030185. [Google Scholar] [CrossRef]
- Sadler, J.M.; Appling, A.P.; Read, J.S.; Oliver, S.K.; Jia, X.; Zwart, J.A.; Kumar, V. Multi-Task Deep Learning of Daily Streamflow and Water Temperature. Water Resour. Res. 2022, 58, e2021WR030138. [Google Scholar] [CrossRef]
- Arsenault, R.; Martel, J.-L.; Brunet, F.; Brissette, F.; Mai, J. Continuous Streamflow Prediction in Ungauged Basins: Long Short-Term Memory Neural Networks Clearly Outperform Traditional Hydrological Models. Hydrol. Earth Syst. Sci. 2023, 27, 139–157. [Google Scholar] [CrossRef]
- Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Towards Learning Universal, Regional, and Local Hydrological Behaviors via Machine Learning Applied to Large-Sample Datasets. Hydrol. Earth Syst. Sci. 2019, 23, 5089–5110. [Google Scholar] [CrossRef]
- Sit, M.; Demiray, B.Z.; Xiang, Z.; Ewing, G.J.; Sermet, Y.; Demir, I. A Comprehensive Review of Deep Learning Applications in Hydrology and Water Resources. Water Sci. Technol. 2020, 82, 2635–2670. [Google Scholar] [CrossRef]
- Li, H.; Zhang, C.; Chu, W.; Shen, D.; Li, R. A Process-Driven Deep Learning Hydrological Model for Daily Rainfall-Runoff Simulation. J. Hydrol. 2024, 637, 131434. [Google Scholar] [CrossRef]
- Feng, D.; Liu, J.; Lawson, K.; Shen, C. Differentiable, Learnable, Regionalized Process-Based Models with Multiphysical Outputs Can Approach State-Of-The-Art Hydrologic Prediction Accuracy. Water Resour. Res. 2022, 58, e2022WR032404. [Google Scholar] [CrossRef]
- Mohammadi, B. Application of Machine Learning and Remote Sensing in Hydrology. Sustainability 2022, 14, 7586. [Google Scholar] [CrossRef]
- Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble Machine Learning Paradigms in Hydrology: A Review. J. Hydrol. 2021, 598, 126266. [Google Scholar] [CrossRef]
- Fang, K.; Shen, C.; Kifer, D.; Yang, X. Prolongation of SMAP to Spatiotemporally Seamless Coverage of Continental U.S. Using a Deep Learning Neural Network. Geophys. Res. Lett. 2017, 44, 11030–11039. [Google Scholar] [CrossRef]
- Adera, S.; Bellugi, D.; Dhakal, A.; Larsen, L. Streamflow Prediction at the Intersection of Physics and Machine Learning: A Case Study of Two Mediterranean-Climate Watersheds. Water Resour. Res. 2024, 60, e2023WR035790. [Google Scholar] [CrossRef]
- Kalantar, B.; Ueda, N.; Idrees, M.O.; Janizadeh, S.; Ahmadi, K.; Shabani, F. Forest Fire Susceptibility Prediction Based on Machine Learning Models with Resampling Algorithms on Remote Sensing Data. Remote Sens. 2020, 12, 3682. [Google Scholar] [CrossRef]
- Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep Learning in Environmental Remote Sensing: Achievements and Challenges. Remote Sens. Environ. 2020, 241, 111716. [Google Scholar] [CrossRef]
- Tsai, W.-P.; Feng, D.; Pan, M.; Beck, H.; Lawson, K.; Yang, Y.; Liu, J.; Shen, C. From Calibration to Parameter Learning: Harnessing the Scaling Effects of Big Data in Geoscientific Modeling. Nat. Commun. 2021, 12, 5988. [Google Scholar] [CrossRef] [PubMed]
- Feng, D.; Beck, H.; Lawson, K.; Shen, C. The Suitability of Differentiable, Physics-Informed Machine Learninghydrologic Models for Ungauged Regions and Climate Change Impact Assessment. Hydrol. Earth Syst. Sci. 2023, 27, 2357–2373. [Google Scholar] [CrossRef]
- Wang, C.; Jiang, S.; Zheng, Y.; Han, F.; Kumar, R.; Rakovec, O.; Li, S. Distributed Hydrological Modeling with Physics-Encoded Deep Learning: A General Framework and Its Application in the Amazon. Water Resour. Res. 2024, 60, e2023WR036170. [Google Scholar] [CrossRef]
- He, M.; Jiang, S.; Ren, L.; Cui, H.; Du, S.; Zhu, Y.; Qin, T.; Yang, X.; Fang, X.; Xu, C.-Y. Exploring the Performance and Interpretability of Hybrid Hydrologic Model Coupling Physical Mechanisms and Deep Learning. J. Hydrol. 2025, 649, 132440. [Google Scholar] [CrossRef]
- Zhang, C.; Li, H.; Hu, Y.; Shen, D.; Xu, B.; Chen, M.; Chu, W.; Li, R. A Differentiability-Based Processes and Parameters Learning Hydrologic Model for Advancing Runoff Prediction and Process Understanding. J. Hydrol. 2025, 661, 133594. [Google Scholar] [CrossRef]
- Ouyang, W.; Ye, L.; Chai, Y.; Ma, H.; Chu, J.; Peng, Y.; Zhang, C. A Differentiable, Physics-Based Hydrological Model and Its Evaluation for Data-Limited Basins. J. Hydrol. 2025, 649, 132471. [Google Scholar] [CrossRef]
- Sawadekar, K.; Song, Y.; Pan, M.; Beck, H.; McCrary, R.; Ullrich, P.; Lawson, K.; Shen, C. Improving Differentiable Hydrologic Modeling with Interpretable Forcing Fusion. J. Hydrol. 2025, 659, 133320. [Google Scholar] [CrossRef]
- Zhong, L.; Lei, H.; Yang, J. Development of a Distributed Physics-Informed Deep Learning Hydrological Model for Data-Scarce Regions. Water Resour. Res. 2024, 60, e2023WR036333. [Google Scholar] [CrossRef]
- Zhong, L.; Lei, H.; Gao, B. Developing a Physics-Informed Deep Learning Model to Simulate Runoff Response to Climate Change in Alpine Catchments. Water Resour. Res. 2023, 59, e2022WR034118. [Google Scholar] [CrossRef]
- Wi, S.; Steinschneider, S. Assessing the Physical Realism of Deep Learning Hydrologic Model Projections Under Climate Change. Water Resour. Res. 2022, 58, e2022WR032123. [Google Scholar] [CrossRef]
- Adam, J.C.; Lettenmaier, D.P. Adjustment of Global Gridded Precipitation for Systematic Bias. J. Geophys. Res. 2003, 108, 2002JD002499. [Google Scholar] [CrossRef]
- Wang, X.; Zhou, J.; Ma, J.; Luo, P.; Fu, X.; Feng, X.; Zhang, X.; Jia, Z.; Wang, X.; Huang, X. Evaluation and Comparison of Reanalysis Data for Runoff Simulation in the Data-Scarce Watersheds of Alpine Regions. Remote Sens. 2024, 16, 751. [Google Scholar] [CrossRef]
- Meng, H.; Zhao, T. Evaluation of the Hydrological Utility of the GPM IMERG Satellite Precipitation Products. Atmos. Res. 2025, 322, 108139. [Google Scholar] [CrossRef]
- Tian, F.; Hou, S.; Yang, L.; Hu, H.; Hou, A. How Does the Evaluation of the GPM IMERG Rainfall Product Depend on Gauge Density and Rainfall Intensity? J. Hydrometeorol. 2018, 19, 339–349. [Google Scholar] [CrossRef]
- Sharifi, E.; Steinacker, R.; Saghafian, B. Multi Time-Scale Evaluation of High-Resolution Satellite-Based Precipitation Products over Northeast of Austria. Atmos. Res. 2018, 206, 46–63. [Google Scholar] [CrossRef]
- Tan, M.L.; Santo, H. Comparison of GPM IMERG, TMPA 3B42 and PERSIANN-CDR Satellite Precipitation Products over Malaysia. Atmos. Res. 2018, 202, 63–76. [Google Scholar] [CrossRef]
- Javanmard, S.; Yatagai, A.; Nodzu, M.I.; BodaghJamali, J.; Kawamoto, H. Comparing High-Resolution Gridded Precipitation Data with Satellite Rainfall Estimates of TRMM_3B42 over Iran. Adv. Geosci. 2010, 25, 119–125. [Google Scholar] [CrossRef]
- Michelson, D.B. Systematic Correction of Precipitation Gauge Observations Using Analyzed Meteorological Variables. J. Hydrol. 2004, 290, 161–177. [Google Scholar] [CrossRef]
- Jarrin-Perez, F.; Jeong, J.; Bieger, K.; Roger, J.-C.; Choi, S. Evaluating IMERG-F Precipitation for SWAT Hydrologic Modeling in Data-Rich and Sparse Watersheds. Environ. Model. Softw. 2025, 192, 106574. [Google Scholar] [CrossRef]
- Gao, Z.; Tang, G.; Jing, W.; Hou, Z.; Yang, J.; Sun, J. Evaluation of Multiple Satellite, Reanalysis, and Merged Precipitation Products for Hydrological Modeling in the Data-Scarce Tributaries of the Pearl River Basin, China. Remote Sens. 2023, 15, 5349. [Google Scholar] [CrossRef]
- Zhang, Y.; Wu, C.; Yeh, P.J.-F.; Li, J.; Hu, B.X.; Feng, P.; Lei, Y. Evaluation of Multi-Satellite Precipitation Products in Estimating Precipitation Extremes over Mainland China at Annual, Seasonal and Monthly Scales. Atmos. Res. 2022, 279, 106387. [Google Scholar] [CrossRef]
- Lu, J.; Wang, K.; Wu, G.; Mao, Y. Evaluation of Multisource Datasets in Characterizing Spatiotemporal Characteristics of Extreme Precipitation from 2001 to 2019 in China. J. Hydrometeorol. 2024, 25, 515–539. [Google Scholar] [CrossRef]
- Xiang, Y.; Chen, J.; Li, L.; Peng, T.; Yin, Z. Evaluation of Eight Global Precipitation Datasets in Hydrological Modeling. Remote Sens. 2021, 13, 2831. [Google Scholar] [CrossRef]
- Jiang, S.; Ding, Y.; Liu, R.; Wei, L.; Liu, Y.; Ren, M.; Ren, L. Assessing the Potential of IMERG and TMPA Satellite Precipitation Products for Flood Simulations and Frequency Analyses over a Typical Humid Basin in South China. Remote Sens. 2022, 14, 4406. [Google Scholar] [CrossRef]
- Lyu, X.; Li, Z.; Li, X. Evaluation of GPM IMERG Satellite Precipitation Products in Event-Based Flood Modeling over the Sunshui River Basin in Southwestern China. Remote Sens. 2024, 16, 2333. [Google Scholar] [CrossRef]
- Woods, D.; Kirstetter, P.-E.; Vergara, H.; Duarte, J.A.; Basara, J. Hydrologic Evaluation of the Global Precipitation Measurement Mission over the U.S.: Flood Peak Discharge and Duration. J. Hydrol. 2023, 617, 129124. [Google Scholar] [CrossRef]
- Al Khoury, I.; Boithias, L.; Sivelle, V.; Bailey, R.T.; Abbas, S.A.; Filippucci, P.; Massari, C.; Labat, D. Evaluation of Precipitation Products for Small Karst Catchment Hydrological Modeling in Data-Scarce Mountainous Regions. J. Hydrol. 2024, 645, 132131. [Google Scholar] [CrossRef]
- Nanding, N.; Wu, H.; Tao, J.; Maggioni, V.; Beck, H.E.; Zhou, N.; Huang, M.; Huang, Z. Assessment of Precipitation Error Propagation in Discharge Simulations over the Contiguous United States. J. Hydrometeorol. 2021, 22, 1987–2008. [Google Scholar] [CrossRef]
- Wan, Y.; Li, D.; Sun, J.; Wang, M.; Liu, H. Evaluation of Six Latest Precipitation Datasets for Extreme Precipitation Estimates and Hydrological Application across Various Climate Regions in China. Atmos. Res. 2025, 315, 107932. [Google Scholar] [CrossRef]
- Zhu, Z.; Yong, B.; Ke, L.; Wang, G.; Ren, L.; Chen, X. Tracing the Error Sources of Global Satellite Mapping of Precipitation for GPM (GPM-GSMaP) Over the Tibetan Plateau, China. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2181–2191. [Google Scholar] [CrossRef]
- Helmi, A.M.M.; Abdelhamed, M.S.S. Evaluation of CMORPH, PERSIANN-CDR, CHIRPS V2.0, TMPA 3B42 V7, and GPM IMERG V6 Satellite Precipitation Datasets in Arabian Arid Regions. Water 2023, 15, 92. [Google Scholar] [CrossRef]
- Hargreaves, G.H.; Allen, R.G. History and Evaluation of Hargreaves Evapotranspiration Equation. J. Irrig. Drain Eng. 2003, 129, 53–63. [Google Scholar] [CrossRef]
- Yang, J.; Huang, X. 30 m Annual Land Cover and Its Dynamics in China from 1990 to 2019. Earth Syst. Sci. Data Discuss. 2021, 13, 3907–3925. [Google Scholar] [CrossRef]
- Dai, Y.; Wei, N.; Yuan, H.; Zhang, S.; Shangguan, W.; Liu, S.; Lu, X.; Xin, Y. Evaluation of Soil Thermal Conductivity Schemes for Use in Land Surface Modeling. J. Adv. Model. Earth Syst. 2019, 11, 3454–3473. [Google Scholar] [CrossRef]
- Sciuto, L.; Vanella, D.; Cirelli, G.L.; Consoli, S.; Licciardello, F.; Longo-Minnolo, G. Improving Runoff Estimation in Hydrological Models Using Remote Sensing and Climate Data Reanalysis in the Dittaino River Basin (Eastern Sicily, Italy). J. Hydrol. Reg. Stud. 2025, 60, 102569. [Google Scholar] [CrossRef]
- Patil, S.; Stieglitz, M. Modelling Daily Streamflow at Ungauged Catchments: What Information Is Necessary?: Modelling Daily Streamflow at Ungauged Catchments. Hydrol. Process. 2014, 28, 1159–1169. [Google Scholar] [CrossRef]
- Mizukami, N.; Clark, M.P.; Newman, A.J.; Wood, A.W.; Gutmann, E.D.; Nijssen, B.; Rakovec, O.; Samaniego, L. Towards Seamless Large-domain Parameter Estimation for Hydrologic Models. Water Resour. Res. 2017, 53, 8020–8040. [Google Scholar] [CrossRef]
- Gao, R.; Li, L.; Wang, Y.; Li, W.; Yun, Z.; Gai, Y. Improvements and Limitations of the Latest Version 8 of GSMaP Compared with Its Former Version 7 and IMERG V06 at Multiple Spatio-Temporal Scales in Mainland China. Atmos. Res. 2024, 308, 107517. [Google Scholar] [CrossRef]
- Zhou, Z.; Guo, B.; Xing, W.; Zhou, J.; Xu, F.; Xu, Y. Comprehensive Evaluation of Latest GPM Era IMERG and GSMaP Precipitation Products over Mainland China. Atmos. Res. 2020, 246, 105132. [Google Scholar] [CrossRef]
- Shen, Z.; Yong, B.; Yi, L.; Wu, H.; Xu, H. From TRMM to GPM, How Do Improvements of Post/near-Real-Time Satellite Precipitation Estimates Manifest? Atmos. Res. 2022, 268, 106029. [Google Scholar] [CrossRef]
- Wang, J.; Zhuo, L.; Han, D.; Liu, Y.; Rico-Ramirez, M.A. Hydrological Model Adaptability to Rainfall Inputs of Varied Quality. Water Resour. Res. 2023, 59, e2022WR032484. [Google Scholar] [CrossRef]
- Song, Y.; Knoben, W.J.M.; Clark, M.P.; Feng, D.; Lawson, K.; Sawadekar, K.; Shen, C. When Ancient Numerical Demons Meet Physics-Informed Machine Learning: Adjoint-Based Gradients for Implicit Differentiable Modeling. Hydrol. Earth Syst. Sci. 2024, 28, 3051–3077. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.