Remote Sensing
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

9 December 2025

Research on Multi-Source Precipitation Fusion Based on Classification and Regression Machine Learning Methods—A Case Study of the Min River Basin in the Eastern Source of the Qinghai–Tibet Plateau

1 Henan Agricultural Remote Sensing Big Data Development and Innovation Laboratory, Shangqiu Normal University, Shangqiu 476000, China
2 Yellow River Institute of Hydraulic Research, Yellow River Conservancy Commission (YRCC), Zhengzhou 450003, China
3 Research Center on Levee Safety Disaster Prevention, Ministry of Water Resources (MWR), Zhengzhou 450003, China
4 The National Key Laboratory of Water Disaster Prevention, Nanjing Hydraulic Research Institute, No. 225, Guangzhou Road, Nanjing 210029, China

Abstract

Satellite precipitation products often lack accuracy and adaptability in complex terrain. To address this, this study focused on the Min River Basin (MRB) on the eastern edge of the Qinghai–Tibet Plateau and established a two-step machine learning fusion framework that first identifies precipitation events and then quantitatively estimates their intensity. The framework incorporated five precipitation products (PERSIANN-CDR, CMORPH, GSMaP, IMERG, MSWEP), measured data, and environmental variables. The study compared the precipitation estimation performance of Random Forest (RF), Extreme Learning Machine (ELM), eXtreme Gradient Boosting (XGBoost), Bagging, and Double Machine Learning (DML) models, and analyzed the models' performance under different precipitation intensities and altitudes, as well as their variable sensitivity. The results showed that: (1) DML models outperformed Single Machine Learning (SML) models and the original precipitation products, with RF-Bagging performing best. The daily-scale Correlation Coefficient (CC) of RF-Bagging was over 50% higher than that of the original products, while the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) were reduced by more than 40% and 35%, respectively. (2) For moderate-to-heavy precipitation, the RF-Bagging and RF-RF models maintained a stable Critical Success Index (CSI) of 0.7. In high-altitude regions, their Probability of Detection (POD) approached 1 and their Heidke Skill Score (HSS) was 30–40% higher than in mid-altitude areas, significantly outperforming the other models and demonstrating strong adaptability to complex terrain. For light precipitation, the POD values of these two models were comparable to those of the other models, but their False Alarm Rate (FAR) was 15–20% lower, effectively mitigating precipitation false alarms. (3) GSMaP, IMERG, and MSWEP were the core input variables for all models. RF and ELM models depended more on environmental variables, while XGBoost and Bagging models relied more on satellite data. This framework can provide a technical reference for precipitation estimation in complex terrain and contribute to watershed water resource management as well as flood prevention and mitigation.
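
The two-step idea described in the abstract (identify precipitation events first, then estimate intensity only where an event is predicted) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes scikit-learn, takes "RF-Bagging" to mean a Random Forest classifier followed by a Bagging regressor, uses a hypothetical 0.1 mm wet/dry threshold, and runs on synthetic data standing in for the five products and the environmental variables. The function names `fit_two_step` and `predict_two_step` are illustrative only.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, RandomForestClassifier

# Assumed wet/dry threshold in mm/day; the abstract does not state one.
WET_THRESHOLD_MM = 0.1


def fit_two_step(X, y, seed=0):
    """Step 1: wet/dry classifier. Step 2: intensity regressor on wet samples."""
    wet = y >= WET_THRESHOLD_MM
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, wet)
    reg = BaggingRegressor(n_estimators=50, random_state=seed).fit(X[wet], y[wet])
    return clf, reg


def predict_two_step(clf, reg, X):
    """Return 0 where the classifier predicts dry, a regressed amount elsewhere."""
    is_wet = clf.predict(X).astype(bool)
    fused = np.zeros(X.shape[0])
    if is_wet.any():
        # Clip as a safeguard so the fused estimate is never negative.
        fused[is_wet] = np.clip(reg.predict(X[is_wet]), 0.0, None)
    return fused


# Synthetic stand-in data (NOT the study's data): 5 noisy "product" columns
# derived from a hidden truth, plus 2 unrelated "environmental" columns.
rng = np.random.default_rng(42)
n = 600
truth = np.where(rng.random(n) < 0.4, rng.gamma(2.0, 4.0, n), 0.0)
products = np.clip(truth[:, None] + rng.normal(0.0, 2.0, (n, 5)), 0.0, None)
env = rng.normal(0.0, 1.0, (n, 2))
X = np.hstack([products, env])

train, test = slice(0, 450), slice(450, n)
clf, reg = fit_two_step(X[train], truth[train])
fused = predict_two_step(clf, reg, X[test])
```

Training the regressor only on wet samples keeps the many zero-precipitation days from pulling intensity estimates toward zero, while the classifier alone decides whether an event occurred. This is one plausible reading of why a classification-plus-regression pairing could reduce false alarms for light precipitation.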
