Enhancing Streamflow Modeling in Data-Scarce Catchments with Similarity-Guided Source Selection and Transfer Learning

Gao, Yuxuan; Mandania, Rupal; Ma, Jun; Chen, Jack; Zhuang, Wuyi

doi:10.3390/w17182762


by Yuxuan Gao ¹,*, Rupal Mandania ², Jun Ma ³, Jack Chen ⁴ and Wuyi Zhuang

¹ Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

² School of Business and Economics, Loughborough University, Loughborough LE11 3TU, UK

³ CHN Energy Technology & Economics Research Institute, Beijing Changping District, Beijing 102211, China

⁴ CamDragon Co., Ltd., 2 Braybrooke Place, Cambridge CB1 3LN, UK

* Author to whom correspondence should be addressed.

Water 2025, 17(18), 2762; https://doi.org/10.3390/w17182762

Submission received: 30 June 2025 / Revised: 10 September 2025 / Accepted: 16 September 2025 / Published: 18 September 2025

(This article belongs to the Special Issue Computer Modelling Techniques in Environmental Hydraulics and Water Resource Engineering)


Abstract

Accurate streamflow modeling in data-scarce catchments remains a significant challenge due to the limited availability of historical records. Transfer Learning (TL), increasingly applied in hydrology, leverages knowledge from data-rich catchments (sources) to enhance predictions in data-scarce catchments (targets), opening new possibilities for hydrological prediction. Most existing TL approaches pre-train models on large-scale meteoro-hydrological datasets and show good generalizability across multiple target catchments. However, for a specific target catchment, it remains unclear which source catchments contribute most effectively to accurate prediction. Including many irrelevant sources may even degrade model performance. In this study, we investigated how source catchment selection affects TL performance by employing similarity-guided strategies based on three key factors: spatial distance, physical attributes, and flow regime characteristics. Using the CAMELS-GB dataset, we conducted comparative experiments by pre-training the networks with different ranked groups of source catchments and fine-tuning them on three target catchments representing distinct hydrological environments. The results showed that carefully selected small subsets (fewer than 40, or even as few as 10) of highly similar catchments can achieve comparable or better TL performance than using all 668 available source catchments. All three target catchments yielded better NSE results from source catchments with closer spatial proximity and more consistent flow regimes. The TL performance of physical attribute similarity-based selection varied depending on the attribute combinations, with those related to land cover, climate, and soil properties leading to superior performance. These findings highlight the importance of similarity-guided source selection in hydrological TL. In addition, they demonstrate ways to reduce computational costs while improving modeling accuracy in data-scarce regions.

Keywords:

hydrological predictions; data-scarce; deep learning; transfer learning; catchment similarity; source selection

1. Introduction

Accurate hydrological modeling is essential for flood prevention, water resource management, and evaluating the impacts of climate changes and human activities [1,2]. Reliable inputs are crucial to achieve accurate predictions, including meteorological data, hydrological records, and relevant physical attributes [3]. Generally, hydrological data are less abundant and comprehensive compared to meteorological data. This disparity arises partly because key hydrological processes, such as evaporation, infiltration, and subsurface flow, are inherently complex and challenging to measure [4]. Even for relatively easier-to-measure variables, such as streamflow, the global network of observation stations is much less developed than meteorological monitoring systems, leading to substantial data gaps in many regions [2].

Currently, the Global Runoff Data Centre (GRDC) contains streamflow data from 10,836 gauging stations worldwide. However, only a small subset of these stations meets the criteria for providing high-quality data essential for reliable hydrological modeling. For instance, Huang et al. identified only 1761 “high-quality” catchments globally from the GRDC dataset, using selected criteria such as catchment boundary quality, irrigation area, and a minimum of 10 years of continuous daily observations [1]. Among these criteria, the length of historical time-series records is a particularly critical factor influencing data reliability. Figure 1 illustrates the distribution of GRDC gauging stations based on their record length, highlighting significant disparities across regions [5]. In many developing countries, stations often have only 1–10 years of available data, posing substantial challenges for long-term hydrological forecasting in these areas.

To improve hydrological modeling accuracy in data-scarce or ungauged catchments, a commonly adopted solution is to transfer hydrological information from gauged areas to ungauged areas [6]. Existing approaches generally fall into two categories: (1) regionalization for traditional hydrological models, and (2) transfer learning for deep learning models.

For the first category, substantial progress has been made, especially since the Predictions in Ungauged Basins (PUB) decade [7]. Regionalization methods have been employed predominantly with conceptual models (e.g., GR4J [8,9]) and, to a lesser extent, with physically based models (e.g., SWAT [10]). These methods can be generally grouped into three types: (i) similarity-based methods, which rely on spatial proximity or similar catchment attributes (such as climate type, land use, and geology); (ii) regression-based methods, which establish regression relationships between model parameters and catchment descriptors; and (iii) hydrological signature-based methods, which leverage key information embedded in streamflow data (e.g., mean flow, flow percentile, and baseflow index). However, the main limitation of this approach lies not in its regionalization rationale but in the traditional hydrological models themselves, which inevitably oversimplify the nonlinear and complex nature of streamflow [11].

In contrast, recent deep learning (DL) models have demonstrated a strong ability to capture nonlinear patterns and relationships directly from observed meteorological and historical hydrological data [12]. Commonly used architectures in hydrological studies include Artificial Neural Networks (ANN) [13], Convolutional Neural Networks (CNN) [14], and Recurrent Neural Networks (RNN) [15], along with variants such as Gated Recurrent Unit (GRU) [16], Long Short-Term Memory (LSTM) [17,18], and Transformers [19]. However, these models are generally data-hungry, requiring a large amount of labeled data for effective training. In data-scarce regions, traditional (i.e., non-transferable) DL models are typically trained on limited labeled data, leading to degraded or even failed performance [20,21].

To address these challenges, Transfer Learning (TL), a technique in ML/DL, offers a new insight for data-scarce scenarios. TL enables the transfer of learned knowledge from a data-rich “source” domain (catchment) to a related data-scarce “target” domain (catchment) [22]. It can be applied to various real-world tasks, including regression (e.g., time-series regression in this study), classification, and clustering [23]. TL has been widely applied across diverse fields, such as image recognition [24], natural language processing [25], biology and medicine [26], economics [27], and military applications [28]. However, its application in hydrological studies remains relatively nascent, with research efforts emerging only in recent years.

Ma et al. used the Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) streamflow dataset, comprising 671 U.S. catchments, to pre-train an LSTM model, and transferred it to other continents with varying data densities, including Great Britain, Chile, and China. This approach enhanced overall model performance, demonstrating the feasibility of cross-continental knowledge transfer for streamflow prediction [20]. Similarly, Khoshkalam et al. leveraged the knowledge from the CAMELS dataset but tested the transferability on snow-dominated regions in Southern Quebec using the LSTM model, achieving improved accuracy in daily streamflow predictions [29]. Muhammad and Abba used a source selection strategy based on Dynamic Time Warping (DTW) and semantic entropy calculation to select 10 source catchments from the 438-catchment Model Parameter Estimation Experiment (MOPEX) dataset. They then applied a TL model with a Gated Recurrent Unit (TL + GRU) to improve the streamflow predictions for most catchments [30]. Xu et al. proposed a cross-regional interpretable machine learning (XGBoost) TL model to predict runoff in ungauged basins, leveraging flowmeter and catchment characteristic data from 5764 catchments across various climate zones in the Caravan dataset and achieving improved NSE values [31].

These studies demonstrate that current hydrological TL efforts predominantly rely on large-scale meteoro-hydrological datasets. This strategy offers the advantage of improving overall model performance across multiple target catchments due to the broad range of hydrological scenarios, resulting in strong generalizability. However, several key challenges remain:

(1) Local performance trade-off: There is a long-standing issue termed “negative transfer” in TL-related studies, which refers to the situation where leveraging source domain data undesirably reduces learning performance in the target domain [32]. It generally arises from four causes: large domain divergence, poor source data quality, poor target data quality, and inappropriate TL algorithms [33]. In the context of hydrological TL, domain divergence is particularly critical. When using all catchments from a large dataset, it remains unclear which catchments contribute positively or negatively to the final TL performance. While large datasets have been favored for their broad coverage of diverse hydrological events, their generalization can sometimes come at the cost of degraded local performance for specific target catchments [34]. In such cases, although fine-tuning helps the model adapt to local patterns, including an excessive number of weakly related catchments may reduce or even cancel out the positive impact of highly related catchments [34]. Therefore, this study aims to systematically investigate various source selection strategies to better quantify the impact of source-target similarity on transfer effectiveness.

(2) High computational cost: Based on our experiments and records from previous studies, pretraining the basic LSTM model on the CAMELS-GB dataset typically requires 6–10 h for one hyperparameter configuration, depending on the computational device [35]. When exploring more advanced TL architectures, such as domain adaptation networks, which align source and target domains in a high-dimensional feature space, such a large dataset imposes substantial computational demands and further increases the model training complexity.

(3) Limited availability of large-scale datasets in some regions: Although more countries have recently contributed to the expansion of the CAMELS dataset, many regions in the world still lack dense hydrological monitoring networks, making it difficult to compile long-term records. Beyond applying cross-continental transfer [20], this study also explores the potential of identifying a small number of locally and highly correlated source catchments to achieve comparable or even better TL performance.

To achieve these, this study ranks source catchments based on their similarity to the selected target catchments. Three commonly used similarity comparison strategies are employed: (1) spatial similarity (SS), (2) physical attributes similarity (PS), and (3) flow regime similarity (FS). After ranking, source catchments are sub-grouped by similarity level and used to train TL networks. Two baseline networks are included for comparison: a Non-Transfer Learning (NTL) network and an All-Source Transfer Learning (ASTL) network, which uses the full set of source datasets. Finally, this study aims to identify which similarity comparison strategy is more effective in guiding source selection for enhancing TL performance.

The remainder of this article is organized as follows: Section 2—Data and Methodology, Section 3—Results and Discussions of three similarity comparison strategies, Section 4—Limitations and Future Works, and Section 5—Conclusions.

2. Data and Methodology

2.1. Study Area and Dataset

All data used in this study originate from the CAMELS-GB dataset [36], a recently released, large-sample, long-term, daily dataset developed for hydrological modeling across Great Britain. CAMELS-GB comprises hydro-meteorological time series and static physical attributes for 671 catchments spanning from 1 October 1970 to 30 September 2015. It includes daily records of rainfall, potential evapotranspiration, temperature, radiation, humidity, and streamflow, alongside static attributes relating to climate, topography, human influences, hydrogeology, hydrometry, soils and land cover.

Following previous studies [35], daily precipitation (mm/day), potential evapotranspiration (mm/day), and temperature (°C) were used as dynamic inputs, along with 21 static inputs, to predict the daily catchment-specific discharge (mm/day). The daily precipitation data were derived from the CEH Gridded Estimates of Areal Rainfall dataset (CEH-GEAR), while the daily potential evapotranspiration and temperature were obtained from the Climate Hydrology and Ecology research Support System Potential Evapotranspiration dataset (CHESS-PE) and Meteorology dataset (CHESS-met). The daily streamflow was extracted from the UK National River Flow Archive (NRFA). The 21 selected static attributes are listed in Table 1.

Due to incomplete attribute data for two catchments (stations “18011” and “26006”), analyses in this study included data from 669 catchments. Of these, three were selected as target catchments. For each target catchment, only one year of training data was used to simulate severe real-world data scarcity [20]. Detailed time ranges of the training, validation, and testing datasets for the target catchments are provided in Table 2.

The target catchments were chosen based on the UK Köppen–Geiger climate classification zones [37] and rainfall map using the HadUK-Grid 1 km average annual rainfall data [38] (Figure 2). While the UK is predominantly characterized by a humid temperate oceanic climate (“Cfb” in Köppen classification), we strategically selected three target catchments to represent contrasting climate conditions. The detailed information of the target catchments was obtained from UK National River Flow Archive and presented in Table 3. The first target catchment (“39010”), underlain primarily by chalk with drift cover, exhibits a transition from rural headwaters to suburbanized lower reaches, and it represents a temperate humid climate. The second catchment (“12007”), situated in a mountainous region of metamorphic and granitic geology, receives high precipitation with frequent winter snow cover. In contrast, the third catchment (“33023”) is a relatively small catchment with low rainfall and predominantly agricultural land use, representing a relatively arid region.

For each target catchment, the remaining 668 catchments served as the pool of source catchments. The amount and selection of source catchments in different sub-experiments varied according to the catchment similarity analysis described in Section 2.2. Detailed data length information of the source catchment datasets is also shown in Table 2.

2.2. Catchment Similarity Comparison

The similarity between 668 source catchments and each target catchment was evaluated based on three factors: Spatial Similarity (SS), Physical Attribute Similarity (PS), and Flow Regime Similarity (FS). Detailed comparison strategies for each factor are described below:

(1) Spatial Similarity (SS): The haversine distances between the centroid of each of the 668 source catchments and the target catchment were calculated and subsequently ranked.

(2) Physical Attribute Similarity (PS): To identify the most significant physical attributes influencing TL outcomes while minimizing computational complexity, k-means clustering was applied to group the 21 static catchment attributes into distinct clusters. The optimal number of clusters (K) was determined through the elbow method and silhouette analysis, enhancing the statistical robustness and interpretability of attribute groupings [39]. Similarity was then assessed and ranked by calculating Euclidean distances using standardized attribute values within each identified cluster. As a baseline, similarity was also computed using all 21 attributes collectively without clustering, providing insights into the clustering’s impact on the similarity assessment effectiveness.

(3) Flow Regime Similarity (FS): The similarity of streamflow regimes was evaluated using shape-based time-series comparison techniques applied to standardized hydrographs. Weekly step hydrographs were used to remove magnitude differences, thus emphasizing similarities in temporal flow patterns. Dynamic Time Warping (DTW) with a Sakoe–Chiba band constraint was employed to measure the distances between source catchments and the target catchment streamflow, effectively capturing the overall temporal variability of flow series despite minor temporal shifts [40]. Catchments were subsequently ranked based on these similarity measurements.
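The flow regime comparison in item (3) can be sketched as follows. This is a minimal illustration, assuming z-score standardization and an absolute-difference local cost. The function names, the band width, and the toy series are hypothetical and are not taken from the study's implementation.

```python
import math


def zscore(series):
    """Standardize a series to zero mean and unit variance."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def dtw_sakoe_chiba(a, b, band):
    """DTW distance with the warping path restricted to |i - j| <= band."""
    n, m = len(a), len(b)
    band = max(band, abs(n - m))  # the band must be wide enough to reach (n, m)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy hydrographs (hypothetical values, not CAMELS-GB data).
target = zscore([1.0, 2.0, 5.0, 3.0, 1.0, 1.0])
shifted = zscore([1.0, 1.0, 2.0, 5.0, 3.0, 1.0])  # same shape, one step later
flat = zscore([2.0, 2.1, 2.0, 2.2, 2.1, 2.0])     # different regime

d_shifted = dtw_sakoe_chiba(target, shifted, band=1)
d_flat = dtw_sakoe_chiba(target, flat, band=1)
```

In this toy case the shifted series is matched almost perfectly within a one-step band, whereas the flat series remains distant, which is the behavior the band-constrained DTW is meant to provide for minor temporal shifts.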

For each factor, subsets comprising the top 1–10, 11–20, 21–30, 31–40 ranked source catchments were extracted for comparative experiments to analyze the influence of varying similarity levels on TL results. Additionally, comparisons involving progressively larger subsets (top 1–10, 1–20, 1–30, and 1–40 source catchments) were conducted to evaluate whether incorporating larger but less similar datasets could enhance the TL performance or adversely affect it.
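The spatial ranking and the subset construction described above can be sketched as follows. The catchment IDs and coordinates are synthetic, the Earth radius is an assumed constant, and the helper names are hypothetical; the study's own code may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_sources(target, sources):
    """Return source IDs ordered from nearest to farthest centroid."""
    return sorted(sources, key=lambda sid: haversine_km(*target, *sources[sid]))


def ranked_groups(ranked, size=10, n_groups=4):
    """Build disjoint (1-10, 11-20, ...) and cumulative (1-10, 1-20, ...) subsets."""
    disjoint = [ranked[k * size:(k + 1) * size] for k in range(n_groups)]
    cumulative = [ranked[:(k + 1) * size] for k in range(n_groups)]
    return disjoint, cumulative


# Synthetic example: 40 hypothetical sources placed progressively farther north.
target_centroid = (52.0, 0.0)
sources = {f"S{k:02d}": (52.0 + 0.1 * (k + 1), 0.0) for k in range(40)}

ranked = rank_sources(target_centroid, sources)
disjoint, cumulative = ranked_groups(ranked)
```

The same `ranked_groups` helper applies unchanged to rankings produced by the physical attribute or flow regime criteria, since it only depends on the ordered list of IDs.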

2.3. Transfer Learning Network and Model Training

In this study, a basic Long Short-Term Memory (LSTM) architecture was used for all TL experiments. The LSTM network, a specialized form of RNN, was first introduced by [41] and is widely recognized for its effectiveness in TL tasks involving hydrological time series [29,42]. The LSTM network processes the input features (X) at each time step, and each LSTM cell generates an output (h). The output of each cell is passed to a fully connected neural network (FCNN) layer to generate the final prediction output (Y). For streamflow prediction, the input features at each time step typically include meteorological variables and historical streamflow. The input and output sequence lengths are the lengths of the input feature sequence and the final prediction output, respectively. In a single LSTM cell at time step t, the cell state is primarily regulated by three gates: the forget gate ($f_{t}$), input gate ($i_{t}$), and output gate ($o_{t}$). The architecture and algorithm of the LSTM cell have been detailed extensively in previous studies [43,44], to which interested readers are referred.
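For orientation, the gate notation above corresponds to the widely used LSTM formulation sketched below. The input vector $x_{t}$, weight matrices $W$ and $U$, biases $b$, candidate state $g_{t}$, and cell state $c_{t}$ are symbols assumed here for illustration; this study defers the full details to [43,44].

```latex
\begin{aligned}
f_{t} &= \sigma\left(W_{f} x_{t} + U_{f} h_{t-1} + b_{f}\right) \\
i_{t} &= \sigma\left(W_{i} x_{t} + U_{i} h_{t-1} + b_{i}\right) \\
o_{t} &= \sigma\left(W_{o} x_{t} + U_{o} h_{t-1} + b_{o}\right) \\
g_{t} &= \tanh\left(W_{g} x_{t} + U_{g} h_{t-1} + b_{g}\right) \\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot g_{t} \\
h_{t} &= o_{t} \odot \tanh\left(c_{t}\right)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication. The forget gate scales the previous cell state, the input gate scales the new candidate, and the output gate controls how much of the cell state is exposed as $h_{t}$.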

Some advanced architectures, such as the Entity-Aware LSTM (EA LSTM) [18], modify the forget gate to rely solely on static catchment attributes rather than both static and dynamic data. The architectural modifications enhance the use of catchment attributes but substantially increase the training time (approximately nine-fold) [35]. Given that our study focuses on evaluating the impacts of source catchment selection strategies on TL performance, and considering computational resource constraints, the basic 1-layer LSTM architecture was selected as a practical and reliable network.

Fine-tuning based TL using LSTM involves a two-stage (pre-training and fine-tuning) process. First, the model is pretrained on a large set of source catchments, during which the network learns generalizable hydrological patterns. The learned parameters (weights and biases) are saved. Second, the pretrained model is fine-tuned using limited data from the target catchment. By selectively freezing or fine-tuning specific network layers’ weights and biases, the model adjusts to local hydrological conditions while retaining knowledge from the broader dataset.

Preliminary experiments were performed on a local workstation equipped with an Intel® Core™ i9 14900KF processor and one NVIDIA® GeForce RTX™ 4080 SUPER. To accelerate the extensive experiments, additional computational resources were leveraged through the Beijing Super Cloud Computing Center, utilizing containerized instances configured with NVIDIA RTX 4090 GPUs and 10 virtual CPU cores.

All TL experiments were implemented using the open-source Neuralhydrology codebase (written in Python), available at: https://github.com/neuralhydrology/neuralhydrology (accessed on 8 May 2025) [45]. The basic “cudalstm” network in model zoo was adopted for model pretraining, with no temporal lagging applied to input variables. Model hyperparameters were determined using a grid search strategy aimed at optimizing predictive performance. The hyperparameter search explored combinations of hidden layer sizes (64, 128, 256), dropout rates (0.2, 0.4, 0.6), input sequence lengths (90, 180, 365 days), and initial learning rates (10⁻³, 10⁻⁴). The models were trained using the Adam optimizer for up to 50 epochs, with an early stopping mechanism based on validation NSE to prevent overfitting.
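The size of the pretraining grid described above can be made explicit with a short sketch. The dictionary keys and helper names below are illustrative only and are not claimed to match the configuration keys of the Neuralhydrology codebase; the value lists are those stated in the text.

```python
from itertools import product

# Search space as listed in the text; key names are hypothetical.
search_space = {
    "hidden_size": [64, 128, 256],
    "dropout": [0.2, 0.4, 0.6],
    "seq_length": [90, 180, 365],
    "learning_rate": [1e-3, 1e-4],
}


def enumerate_configs(space):
    """Return every combination of the search space as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def select_best(results):
    """Pick the configuration with the highest validation NSE.

    `results` is a list of (config, validation_nse) pairs.
    """
    return max(results, key=lambda pair: pair[1])[0]


configs = enumerate_configs(search_space)
n_configs = len(configs)  # 3 * 3 * 3 * 2 combinations
```

Each pretraining experiment therefore involves 54 candidate configurations, which is one reason smaller source subsets reduce the overall computational burden.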

After identifying the optimal pretrained model for each experiment via grid search, fine-tuning was performed using one year of data from the target catchment. During fine-tuning, the “lstm” and “head” modules were set as the trainable modules. Two initial learning rates (5 × 10⁻³ and 5 × 10⁻⁴) were tested for each case. Each fine-tuning experiment was trained for up to 200 epochs, with an early stopping mechanism based on NSE used to select the optimal model for final evaluation on the test dataset. The optimal hyperparameter configurations for all the TL experiments are shown in Appendix A.

2.4. Experimental Settings

To facilitate a clear comparison of all experiments conducted in this study, Table 4 outlines each experiment’s name, similarity criterion used for source selection, and the corresponding ranking range of selected catchments. Two baseline experiments are included: Non-Transfer Learning (NTL), where only 1-year target catchment data were used to train the model without any pretraining, and All-Source Transfer Learning (ASTL), where the model was pretrained using all 668 source catchments and then fine-tuned with 1-year target data.

2.5. Evaluation Metrics

The following metrics were employed to evaluate model performance: Nash–Sutcliffe Efficiency (NSE), Kling–Gupta Efficiency (KGE), and three flow duration curve (FDC)-based signature indices: high-flow bias (%BiasFHV), mid-flow bias (%BiasFMS), and low-flow bias (%BiasFLV). %BiasFHV quantifies biases in the highest 2% of flows, %BiasFMS evaluates biases between the 20th and 70th percentile flows, and %BiasFLV measures biases in low flows exceeded 70% of the time, with positive values indicating underestimation and negative values indicating overestimation. NSE and KGE are commonly used hydrological metrics with values in the range (−∞, 1], where values closer to 1 indicate better predictive performance. The equations of all the evaluation metrics are as follows:

$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{n} (q_{o}^{t} - q_{p}^{t})^{2}}{\sum_{t=1}^{n} (q_{o}^{t} - \bar{q}_{o})^{2}} \tag{1}$$

where n is the number of observations, $q_{o}^{t}$ is the observed flow at time t, $q_{p}^{t}$ is the predicted flow at time t, and $\bar{q}_{o}$ is the mean value of observed flows [46].

$$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + \left(\frac{\sigma_{p}}{\sigma_{o}} - 1\right)^{2} + \left(\frac{\mu_{p}}{\mu_{o}} - 1\right)^{2}} \tag{2}$$

where r is the linear correlation coefficient between observed and predicted flows, $\sigma_{p}$ and $\sigma_{o}$ are the standard deviations of predicted and observed flows, and $\mu_{p}$ and $\mu_{o}$ are the mean values of predicted and observed flows [47].
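Equations (1) and (2) translate directly into code. The following is a minimal sketch using population standard deviations and plain Python lists; the function names are illustrative and the sample flows are made up.

```python
import math


def nse(obs, pred):
    """Nash-Sutcliffe Efficiency, Equation (1)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, pred):
    """Kling-Gupta Efficiency, Equation (2)."""
    n = len(obs)
    mu_o, mu_p = sum(obs) / n, sum(pred) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_p = math.sqrt(sum((p - mu_p) ** 2 for p in pred) / n)
    cov = sum((o - mu_o) * (p - mu_p) for o, p in zip(obs, pred)) / n
    r = cov / (sd_o * sd_p)  # linear correlation coefficient
    return 1.0 - math.sqrt((r - 1) ** 2 + (sd_p / sd_o - 1) ** 2 + (mu_p / mu_o - 1) ** 2)


# Made-up flows for illustration only.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
nse_perfect = nse(obs, obs)
nse_mean = nse(obs, [3.0] * 5)              # predicting the observed mean
kge_doubled = kge(obs, [2 * q for q in obs])  # perfectly correlated but biased
```

A model that always predicts the observed mean scores an NSE of exactly 0, which is why NSE values near the NTL baselines reported later indicate limited skill. Doubling every flow keeps r = 1 but penalizes both the variability and the mean terms of KGE.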

$$\%\mathrm{BiasFHV} = \frac{\sum_{h=1}^{H} (q_{p}^{h} - q_{o}^{h})}{\sum_{h=1}^{H} q_{o}^{h}} \times 100 \tag{3}$$

where h = 1, 2, …, H denotes the indices for high flows with exceedance probabilities lower than 0.02 [48].

$$\%\mathrm{BiasFMS} = \frac{[\log (q_{p}^{m1}) - \log (q_{p}^{m2})] - [\log (q_{o}^{m1}) - \log (q_{o}^{m2})]}{[\log (q_{o}^{m1}) - \log (q_{o}^{m2})]} \times 100 \tag{4}$$

where m1 and m2 are the lowest and highest flow exceedance probabilities of 0.2 and 0.7, defining the midsegment of the FDC [48].

$$\%\mathrm{BiasFLV} = -1 \cdot \frac{\sum_{l=1}^{L} [\log (q_{p}^{l}) - \log (q_{p}^{L})] - \sum_{l=1}^{L} [\log (q_{o}^{l}) - \log (q_{o}^{L})]}{\sum_{l=1}^{L} [\log (q_{o}^{l}) - \log (q_{o}^{L})]} \times 100 \tag{5}$$

where l = 1, 2, …, L refers to the indices of flow values within the low-flow segment of the FDC, corresponding to exceedance probabilities ranging from 0.7 to 1.0, with L denoting the index of the minimum flow [48].
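The three FDC-based indices in Equations (3)–(5) can be sketched as below. This is an illustrative reading of the equations, not the study's implementation: observed and predicted flows are each sorted independently into a flow duration curve, segment boundaries are mapped to list indices by simple rounding or truncation, and a small floor `eps` (an assumption) guards the logarithm against zero flows.

```python
import math


def _fdc(flows):
    """Flow duration curve: flows sorted from highest to lowest."""
    return sorted(flows, reverse=True)


def bias_fhv(obs, pred, h=0.02):
    """High-flow volume bias over the top h fraction of flows, Equation (3)."""
    n_high = max(1, round(h * len(obs)))
    o, p = _fdc(obs)[:n_high], _fdc(pred)[:n_high]
    return (sum(p) - sum(o)) / sum(o) * 100.0


def bias_fms(obs, pred, m1=0.2, m2=0.7):
    """Bias in the log-slope of the FDC midsegment, Equation (4)."""
    i1, i2 = int(m1 * len(obs)), int(m2 * len(obs))
    o, p = _fdc(obs), _fdc(pred)
    slope_o = math.log(o[i1]) - math.log(o[i2])
    slope_p = math.log(p[i1]) - math.log(p[i2])
    return (slope_p - slope_o) / slope_o * 100.0


def bias_flv(obs, pred, low=0.7, eps=1e-6):
    """Low-flow volume bias for exceedance probabilities above `low`, Equation (5)."""
    start = int(low * len(obs))
    o = [math.log(max(q, eps)) for q in _fdc(obs)[start:]]
    p = [math.log(max(q, eps)) for q in _fdc(pred)[start:]]
    vol_o = sum(x - o[-1] for x in o)  # o[-1] is the log of the minimum flow
    vol_p = sum(x - p[-1] for x in p)
    return -1.0 * (vol_p - vol_o) / vol_o * 100.0


# Made-up flows: predictions inflated uniformly by 10%.
obs = [float(k) for k in range(1, 101)]
pred = [1.1 * q for q in obs]
fhv, fms, flv = bias_fhv(obs, pred), bias_fms(obs, pred), bias_flv(obs, pred)
```

With a uniform 10% inflation, the high-flow index evaluates to 10 while the two log-based indices stay at zero, because a constant multiplicative factor cancels in differences of logarithms.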

3. Results and Discussion

3.1. Effects of Spatial Similarity on Transferability

For the three target catchments, the Non-Transfer Learning (NTL) network, trained solely on one year of target catchment data, yielded NSEs of 0.327, 0.425, and 0.277, respectively, indicating poor ability to capture hydrological patterns. These results serve as the lower thresholds, and any similarity-guided TL experiment performing below them is considered to exhibit negative transfer. In contrast, the All-Source Transfer Learning (ASTL) network, trained using data from all 668 source catchments (excluding the selected target catchment data), reached substantially higher NSEs of 0.792, 0.783, and 0.809, respectively. These represent the upper thresholds of performance. Any similarity-guided TL experiment approaching or exceeding these thresholds, while relying on a significantly smaller subset of source catchments, demonstrates superior transferability.

Then, seven spatial similarity-based TL experiments (SS1–SS7) were conducted for each target catchment. The spatial distributions of the target catchments and their selected source catchments are presented in Figure 3a–c. Figure 3d–f show the corresponding NSE results. The results for KGE, %BiasFHV, %BiasFMS, and %BiasFLV of all the experiments are provided in Appendix B, Appendix C and Appendix D. For clearer visualization of streamflow dynamics across different target catchments, we have also included the streamflow plots of SS1 and ASTL experiments in Appendix E.

According to the results of SS1–SS4, SS1 achieved the highest performance across all the target catchments. The three target catchments yielded NSEs of 0.856, 0.770, and 0.852, respectively, approaching or even surpassing their ASTL thresholds. However, as spatial similarity decreased from SS1 to SS4, transferability degraded. The decline was most pronounced for target catchment “39010”, where SS4 performed worse than the NTL threshold. To further investigate this phenomenon, we compared catchment characteristics between the ranked groups (SS1–SS4) and the target “39010” (Appendix F). The results show that SS4 exhibited significantly greater heterogeneity in mean gauged daily flow, maximum gauged daily flow, and catchment area. Such variability can introduce learning difficulties and conflicting parameter updates in TL, which was reflected in a substantially larger high-flow prediction bias (68.04%) compared with SS1–SS3.
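The threshold logic used throughout this section can be written down explicitly. The sketch below uses only the NSE values reported above and assumes that the values listed "respectively" follow the order in which the target catchments were introduced (“39010”, “12007”, “33023”); the function name and labels are illustrative.

```python
# NSE values reported in the text, keyed by target catchment.
NTL = {"39010": 0.327, "12007": 0.425, "33023": 0.277}
ASTL = {"39010": 0.792, "12007": 0.783, "33023": 0.809}
SS1 = {"39010": 0.856, "12007": 0.770, "33023": 0.852}


def classify(nse, ntl, astl):
    """Place an experiment relative to the NTL and ASTL thresholds."""
    if nse < ntl:
        return "negative transfer"
    if nse >= astl:
        return "matches or exceeds ASTL"
    return "between NTL and ASTL"


labels = {cid: classify(SS1[cid], NTL[cid], ASTL[cid]) for cid in SS1}
gaps = {cid: round(SS1[cid] - ASTL[cid], 3) for cid in SS1}
```

Under this reading, SS1 exceeds the ASTL threshold for “39010” and “33023” and falls 0.013 short for “12007”, using 10 source catchments instead of 668.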

These results highlight the potential of leveraging a few highly spatially similar catchments to achieve performance superior to ASTL, while also emphasizing the risk of degraded performance when only a small number of more distant catchments are used. In regions lacking large datasets, SS-guided TL could offer a practical alternative. However, further research is needed to identify the critical similarity threshold beyond which transferability diminishes, a threshold that is likely influenced by regional heterogeneity in hydrological conditions.

In addition, the results of SS5–SS7, which progressively included more catchments, showed that their performance does not always surpass that of SS1 but tends to be more stable than that of SS2–SS4. This implies that including the top 10 ranked catchments can offset the negative impact of adding lower-similarity ones. However, the ASTL experiments performed worse than some SS5–SS7 experiments, suggesting that including an excessively large and heterogeneous set of catchments can reduce the benefits of TL.

3.2. Effects of Physical Attributes Similarity on Transferability

To investigate the role of physical attributes in similarity-guided TL, 21 static catchment attributes were divided into four clusters based on k-means clustering. The optimal k = 4 was determined by statistical optimization (elbow method and silhouette analysis), while considering the hydrological interpretability. Visualizations of the elbow method, silhouette analysis, and the clustering results are shown in Appendix G. The list of attributes in each cluster is presented in Table 5.

Cluster 1 mainly comprises landcover and climate-related attributes, including indicators of vegetation type (e.g., percentage of deciduous woodland, crops, and urban area), precipitation seasonality and variability (e.g., frequency and duration of high or low precipitation events), and climatic variables (e.g., mean daily potential evapotranspiration). Cluster 2 contains attributes related to topography, soil, and snow dynamics, such as mean elevation, drainage slope, sand content, hydraulic conductivity, and fraction of snowfall. Cluster 3 consists solely of catchment area, capturing spatial scale of hydrological processes. Cluster 4 contains soil texture and porosity indicators, including percentages of silt and clay and volumetric porosity, reflecting water retention and infiltration capacity.
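How the choice of attribute cluster changes the source ranking can be illustrated with a small sketch of the within-cluster similarity measure from Section 2.2: attributes are z-scored across catchments, and Euclidean distance to the target is computed over the columns belonging to one cluster. The catchment IDs, attribute values, and column layout below are entirely hypothetical.

```python
import math


def standardize(table):
    """Z-score each attribute column across all catchments."""
    ids = list(table)
    n_attr = len(table[ids[0]])
    means = [sum(table[i][k] for i in ids) / len(ids) for k in range(n_attr)]
    stds = [
        math.sqrt(sum((table[i][k] - means[k]) ** 2 for i in ids) / len(ids)) or 1.0
        for k in range(n_attr)
    ]
    return {i: [(table[i][k] - means[k]) / stds[k] for k in range(n_attr)] for i in ids}


def rank_by_attributes(table, target_id, columns):
    """Rank sources by Euclidean distance to the target over selected columns."""
    z = standardize(table)
    t = z[target_id]

    def dist(cid):
        return math.sqrt(sum((z[cid][k] - t[k]) ** 2 for k in columns))

    return sorted((cid for cid in table if cid != target_id), key=dist)


# Hypothetical columns: [area (km2), clay (%), silt (%)].
attributes = {
    "T": [100.0, 20.0, 30.0],  # target
    "A": [105.0, 60.0, 10.0],  # similar area, different soil
    "B": [900.0, 21.0, 31.0],  # different area, similar soil
    "C": [500.0, 40.0, 20.0],
}

rank_area = rank_by_attributes(attributes, "T", columns=[0])     # area-only cluster
rank_soil = rank_by_attributes(attributes, "T", columns=[1, 2])  # soil-texture cluster
```

In this toy example the two clusters produce opposite rankings, which illustrates why similarity defined on a single attribute such as area can select very different sources from similarity defined on soil properties.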

Maps in Figure 4 show the spatial distributions of the source catchments selected from each cluster for each target catchment. Clusters 1, 2, 4, and the full attribute set (“All”) generally show spatial concentration surrounding the selected target catchment, reflecting regional similarities in land use, climate, and geology. Cluster 3, which is solely based on catchment area, exhibits no clear spatial patterns.

Before examining the characteristics of each cluster in detail, it is noteworthy that for the target catchment “12007”, the distributions of source catchments in PS-C1, PS-C2, and PS-All are highly similar to those in the SS experiments. Nearly all the selected source catchments are located near the Cairngorms National Park in Scotland, indicating that the land cover, climate, topography, and snow dynamics of this region are highly distinctive compared to the rest of the UK. Consistently, the NSE results of PS-C1, PS-C2, and PS-All are as stable and strong as those of the SS experiments, and they are highly competitive with the ASTL result (NSE = 0.783). Moreover, they outperform the results of the other two target catchments under the same similarity metrics. These findings suggest that for catchments that are strongly heterogeneous relative to other regions, the positive contribution of the full source dataset comes primarily from a small number of highly similar catchments. In such cases, comparable transferability can be more readily achieved by leveraging different physical similarity measures.

Compared with target catchment “12007”, the other two target catchments are more sensitive to the choice of similarity metric. In the PS-C1 experiments for target catchments “39010” and “33023” (Figure 4a,k), the top 10 catchments (PS-C1-1) exhibit strong transferability, achieving NSEs of 0.793 and 0.738, respectively. From PS-C1-1 through PS-C1-2 to PS-C1-4, performance generally decreases as similarity declines. Across the cumulative subsets (PS-C1-5 to PS-C1-7), performance rebounds as more data are included. These results indicate that high similarity in land cover and climate can enable effective TL, but performance is sensitive to declining similarity.
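The ranked-group design discussed above can be illustrated with a short sketch. It is not the authors' code: the Euclidean distance on z-scored attributes, the group size of 10, and the exact make-up of the cumulative groups (top 20, 30, and 40) are assumptions made for illustration, and the function names `rank_sources` and `ranked_groups` are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardize each attribute column to zero mean and unit variance."""
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


def rank_sources(target_id, attributes):
    """Return source IDs ordered from most to least similar to the target.

    attributes: dict of catchment ID -> list of attribute values, restricted
    to the attributes of the chosen cluster (assumed Euclidean distance).
    """
    ids = list(attributes)
    scaled = dict(zip(ids, zscore_columns([attributes[i] for i in ids])))
    target = scaled[target_id]
    dist = {i: math.dist(scaled[i], target) for i in ids if i != target_id}
    return sorted(dist, key=dist.get)


def ranked_groups(ranking, size=10):
    """Groups 1-4: disjoint rank blocks; groups 5-7: cumulative top blocks."""
    groups = {k + 1: ranking[k * size:(k + 1) * size] for k in range(4)}
    for k in (2, 3, 4):
        groups[k + 3] = ranking[:k * size]  # assumed: top 20, 30, 40
    return groups
```

Under these assumptions, group 1 corresponds to an experiment such as PS-C1-1, while groups 5 to 7 correspond to the cumulative subsets.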

For the PS-C2 experiments (Figure 4b,l), the top 10 subset shows no clear advantage, nor is there a consistent trend of decreasing performance with decreasing similarity. This may be because Cluster 2 includes a wide range of attributes (elevation, slope, soil, snow, and precipitation). An overall similarity derived from the entire cluster therefore struggles to identify the most relevant sources, particularly when the target catchment is not as hydrologically distinctive as catchment “12007”. Consequently, similarity rankings based on C2 may not effectively guide TL performance.

Regarding the PS-C3 results in Figure 4c,h,m, all seven ranked groupings yield lower NSEs than their ASTL upper thresholds. Some groupings for target catchment “39010” even fall below the NTL lower threshold, indicating negative transfer. This is likely attributable to one of the most common causes of negative transfer, namely large domain divergence [33]. These findings confirm that catchment area alone is an inadequate criterion for similarity assessment in TL.
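As a minimal sketch of how such a comparison could be scripted, the snippet below computes the Nash–Sutcliffe Efficiency in its standard form and labels an experiment relative to the NTL and ASTL thresholds. The helper `classify_transfer` and its label strings are hypothetical and only mirror the interpretation used in the text.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / variance of the observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def classify_transfer(nse_tl, nse_ntl, nse_astl):
    """Label a TL experiment against the NTL (lower) and ASTL (upper) thresholds."""
    if nse_tl < nse_ntl:
        return "negative transfer"
    if nse_tl >= nse_astl:
        return "matches or exceeds ASTL"
    return "positive transfer below ASTL"


# A perfect simulation scores 1; predicting the observed mean scores 0.
print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```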

For the PS-C4 results in Figure 4d,i,n, the top 10 catchments again show strong transfer performance (NSE = 0.769, 0.723, and 0.859), comparable to their ASTL thresholds. Although no strict pattern of decreasing performance with decreasing similarity is observed across ranked groups in target catchments “39010” and “33023”, most groups achieve strong performance close to the ASTL thresholds. The results suggest that soil structure and water retention properties are informative for guiding effective transfer. However, a clear threshold of attribute similarity—beyond which performance begins to decline—cannot yet be identified and requires further investigation.

In Figure 4e,j,o, where all 21 physical attributes are used, each subset achieves performance close to or better than the ASTL. As a result, when it is unclear which attributes to prioritize for similarity evaluation, using the full attribute set is a reliable alternative.

In summary, TL performance is sensitive to the selected attribute cluster. For regions lacking large datasets or aiming to reduce training costs with complex models, the following recommendations can be made: if an attribute cluster contains diverse and physically unrelated indicators (e.g., Cluster 2) or only includes a single feature with weak linkage to rainfall–runoff processes (e.g., Cluster 3—area), it is not recommended for guiding source selection. If the cluster relates to land cover and climate (e.g., Cluster 1), it can be used as a selection guide when high similarity is ensured. A more robust and safer choice is to use the full set of physical attributes, especially including variables such as soil porosity and infiltration capacity, as represented in Cluster 4.

3.3. Effects of Flow Regime Similarity on Transferability

Flow regime similarity was assessed by calculating Dynamic Time Warping (DTW) distances between standardized weekly-step hydrographs. The distributions of the top-ranked source catchments for each target catchment are presented in Figure 5a–c. According to the NSE results in Figure 5d–f, most ranked groups exhibit relatively stable performance that is comparable to the ASTL thresholds. Among them, the ranked groups for target catchment “39010” perform slightly better than those for target catchments “12007” and “33023”. According to the hydrograph comparisons and corresponding DTW values (Appendix H, Appendix I and Appendix J), DTW ranges from 0.153 to 0.493 for the top 40 source catchments of target “39010”, from 11.663 to 35.366 for target “12007”, and from 1.721 to 6.016 for target “33023”. These values indicate that nearly all the top 40 source catchments for target “39010” are highly similar to the target itself, while target “12007” has few source catchments with a highly similar flow regime anywhere in the dataset.
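The DTW-based ranking can be sketched as follows. This is an illustrative version only: it assumes weekly means of daily flow, z-score standardization, the classic unconstrained DTW recursion with an absolute-difference cost, and no path-length normalization. The study's exact DTW settings are not stated in this section, so distances from this sketch are not expected to reproduce the reported values.

```python
import math


def weekly_means(daily):
    """Average consecutive 7-day blocks; a trailing partial week is dropped."""
    n_weeks = len(daily) // 7
    return [sum(daily[7 * w:7 * w + 7]) / 7 for w in range(n_weeks)]


def standardize(series):
    """Z-score a series; a constant series maps to all zeros."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # insertion
                                    cost[i][j - 1],      # deletion
                                    cost[i - 1][j - 1])  # match
    return cost[len(a)][len(b)]


def rank_by_flow_regime(target_daily, sources_daily):
    """Return (source ID, DTW distance) pairs, most similar first."""
    target = standardize(weekly_means(target_daily))
    dist = {sid: dtw_distance(target, standardize(weekly_means(q)))
            for sid, q in sources_daily.items()}
    return sorted(dist.items(), key=lambda kv: kv[1])
```

Because the hydrographs are standardized first, a source whose flow is a scaled copy of the target's obtains a distance of zero, so the ranking reflects the shape of the regime rather than flow magnitude.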

Although target “12007” benefits greatly from surrounding catchments with similar physical attributes (Section 3.2), physical similarity does not always guarantee hydrological similarity. A previous study identified the major reasons behind this inconsistency: (1) such catchments often have quite specific hydrological behavior, or (2) the complex subsurface behavior is not accurately described by the available attributes [6]. In such cases, for catchments with unique hydrological characteristics (i.e., similar flow regimes are rarely seen across the dataset), physical attribute similarity can be prioritized as a more straightforward and effective alternative.

Apart from these special cases, flow regime similarity remains a promising selection criterion, given that it represents the final manifestation of catchment behavior and serves as a direct indicator of hydrological similarity.

It is also important to note that in this study, only one year of training data was available for the target catchment. Thus, DTW-based similarity was calculated using just one year of flow data for both the target and source catchments. In scenarios where longer training periods are available for the target catchment, comparing multi-year flow regime patterns could yield more robust and informative similarity assessments.

4. Limitations and Future Works

4.1. Limitations

Although this study included three target catchments covering temperate, snow-dominated, and relatively arid conditions, the UK is still predominantly characterized by a homogeneous temperate climate. Future studies could expand to regions on other continents with more diverse and extreme hydrological conditions to further enhance the generalizability of our findings.

4.2. Future Works

Future research can be pursued along the following directions:

(1) Extension to ungauged catchments

Since this study focuses on data-scarce catchments to enable a fair comparison with other large-dataset fine-tuning-based TL studies, the extent to which the similarity strategy can be extended to ungauged catchments remains uncertain. Therefore, future research will be undertaken to develop an end-to-end framework that integrates diverse similarity metrics for both data-scarce and ungauged catchments. Notably, for ungauged catchments, composite metrics can rely solely on spatial and physical attributes. Attribute weighting schemes can be applied to construct the final metrics. For instance, mutual information (MI) analysis can be used to quantify the dependence between physical attributes. Higher MI values indicate that one attribute contains more information about the others, suggesting potential redundancy that should be down-weighted [49]. Alternatively, Random Forest (RF) algorithms can be employed, as each decision tree selects the most discriminative attributes for node splitting, while irrelevant attributes may not be used if they lack predictive ability [40]. In addition, other TL architectures should be explored for ungauged catchments, such as domain adaptation, which transfers knowledge from labeled source catchments to unlabeled target catchments (those without flow records) by aligning their feature distributions [50].
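One way the MI-based weighting idea could look in practice is sketched below. This is a hypothetical scheme rather than the method of [49]: attributes are discretized into equal-width bins, pairwise MI is estimated from bin counts, and each attribute receives a weight of 1 / (1 + mean MI with the other attributes), normalized to sum to one. The binning, the weight formula, and the function names are all illustrative assumptions.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: single bin
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """MI (in nats) between two equally long sequences of discrete labels."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def redundancy_weights(columns, bins=4):
    """Down-weight attributes that share much information with the others.

    columns: dict of attribute name -> list of values (one per catchment).
    Assumed weighting: 1 / (1 + mean MI with other attributes), normalized.
    """
    names = list(columns)
    binned = {k: discretize(columns[k], bins) for k in names}
    raw = {}
    for k in names:
        others = [mutual_information(binned[k], binned[m])
                  for m in names if m != k]
        raw[k] = 1.0 / (1.0 + sum(others) / len(others))
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}
```

With such weights, two near-duplicate attributes would each contribute less to a composite distance than an attribute carrying independent information.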

(2) Extension to other source/target data scenarios

In this study, we assume that all the data in the CAMELS-GB dataset are of consistently high quality, and we use a fixed 20-year source record together with a 1-year target record. In the future, additional tests will be conducted under other data scenarios, such as (i) different lengths of source/target data [20] and (ii) source/target data with different missing data ratios. These tests would allow a better understanding of how data quantity and quality influence the overall transferability.

5. Conclusions

This study investigated the role of similarity-guided source catchment selection in enhancing transfer learning (TL) performance for streamflow modeling in data-scarce regions. The results from three target catchments demonstrate that carefully selecting a small number of similar source catchments can match or even surpass the performance of models trained on the entire source dataset. Specifically, spatial similarity-based selection showed effective transfer performance, with the top 10 nearest catchments yielding the best results. Physical attribute similarity showed varied effectiveness depending on the attribute combinations, with those related to land cover, climate, and soil properties leading to superior performance. Flow regime similarity can provide the most stable TL performance. However, for a target catchment in a unique hydrological environment (e.g., snow-dominated regions), catchments with highly similar flow regimes are rare across the UK, making spatial and physical attribute similarity more appropriate selection strategies. Overall, the findings highlight that effective source selection offers a substantial opportunity to reduce computational costs while improving streamflow modeling accuracy in data-scarce regions.

Author Contributions

Conceptualization, Y.G.; methodology, Y.G.; coding, Y.G.; result analysis, Y.G.; writing—original draft preparation, Y.G.; review, R.M., J.M. and J.C.; visualization, Y.G. and W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the EPSRC Centre for Doctoral Training in Future Infrastructure and Built Environment: Resilience in a Changing World (FIBE2) (Grant number EP/S02302X/1) and Unlocking Net-Zero Infrastructure (FIBE3) (Grant number EP/Y034643/1).

Data Availability Statement

CAMELS-GB data are available at https://catalogue.ceh.ac.uk/documents/8344e4f3-d2ea-44f5-8afa-86d2987543a9 (accessed on 22 January 2025). Neuralhydrology codebase is available at: https://github.com/neuralhydrology/neuralhydrology (accessed on 8 May 2025). The station metadata and information are from the UK National River Flow Archive.

Acknowledgments

We would like to thank Dongfang Liang of the University of Cambridge for his editorial support.

Conflicts of Interest

The author Jack Chen was employed by the company Cam Dragon Corporation Ltd., 2 Braybrooke Place, Cambridge, CB1 3LN as a research scientist. The author Wuyi Zhuang was funded by the EPSRC Centre for Doctoral Training in Future Infrastructure and Built Environment: Resilience in a Changing World (FIBE2) (Grant number EP/S02302X/1). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial Neural Networks
ASTL	All-Source Transfer Learning
%BiasFHV	Percent Bias in High-Flow Volume (top 2% of flows)
%BiasFLV	Percent Bias in Low-Flow Volume (flows exceeded 70% of the time)
%BiasFMS	Percent Bias in Mid-Flow Segment (20th to 70th percentile flows)
CNN	Convolutional Neural Network
DL	Deep Learning
DTW	Dynamic Time Warping
FS	Flow Similarity
GRU	Gated Recurrent Unit
KGE	Kling–Gupta Efficiency
LSTM	Long Short-Term Memory
NSE	Nash–Sutcliffe Efficiency
PS	Physical Attributes Similarity
RNN	Recurrent Neural Network
SS	Spatial Similarity
SWAT	Soil and Water Assessment Tool
TL	Transfer Learning

Appendix A

Table A1. The optimal hyperparameter configurations for all the transfer learning experiments.

Target “39010”	Pre-Train Hidden Size	Pre-Train Drop-Out	Fine-Tune lr *	Target “12007”	Pre-Train Hidden Size	Pre-Train Drop-Out	Fine-Tune lr *	Target “33023”	Pre-Train Hidden Size	Pre-Train Drop-Out	Fine-Tune lr *
ASTL	256	0.4	0.005	ASTL	256	0.4	0.005	ASTL	256	0.4	0.0005
SS1	256	0.2	0.0005	SS1	64	0.2	0.0005	SS1	256	0.2	0.0005
SS2	128	0.2	0.005	SS2	128	0.4	0.0005	SS2	256	0.4	0.0005
SS3	64	0.2	0.0005	SS3	256	0.2	0.0005	SS3	256	0.2	0.005
SS4	128	0.2	0.005	SS4	256	0.2	0.0005	SS4	256	0.4	0.0005
SS5	256	0.2	0.0005	SS5	64	0.2	0.005	SS5	128	0.2	0.0005
SS6	128	0.4	0.0005	SS6	256	0.2	0.0005	SS6	128	0.4	0.0005
SS7	256	0.4	0.0005	SS7	128	0.2	0.0005	SS7	64	0.4	0.0005
PS-C1-1	128	0.4	0.0005	PS-C1-1	128	0.4	0.0005	PS-C1-1	128	0.2	0.005
PS-C1-2	256	0.4	0.005	PS-C1-2	128	0.2	0.0005	PS-C1-2	256	0.4	0.0005
PS-C1-3	64	0.4	0.005	PS-C1-3	256	0.4	0.005	PS-C1-3	128	0.2	0.0005
PS-C1-4	128	0.4	0.005	PS-C1-4	128	0.2	0.005	PS-C1-4	64	0.4	0.005
PS-C1-5	128	0.2	0.005	PS-C1-5	256	0.2	0.005	PS-C1-5	128	0.4	0.0005
PS-C1-6	64	0.4	0.0005	PS-C1-6	256	0.2	0.0005	PS-C1-6	256	0.2	0.0005
PS-C1-7	128	0.4	0.005	PS-C1-7	128	0.2	0.0005	PS-C1-7	256	0.2	0.0005
PS-C2-1	256	0.2	0.005	PS-C2-1	64	0.2	0.005	PS-C2-1	256	0.2	0.0005
PS-C2-2	64	0.2	0.005	PS-C2-2	128	0.4	0.005	PS-C2-2	256	0.2	0.0005
PS-C2-3	256	0.4	0.005	PS-C2-3	128	0.4	0.005	PS-C2-3	256	0.2	0.0005
PS-C2-4	128	0.4	0.005	PS-C2-4	128	0.4	0.005	PS-C2-4	256	0.2	0.0005
PS-C2-5	256	0.2	0.005	PS-C2-5	128	0.2	0.005	PS-C2-5	128	0.2	0.0005
PS-C2-6	256	0.2	0.005	PS-C2-6	64	0.2	0.005	PS-C2-6	256	0.2	0.0005
PS-C2-7	64	0.4	0.005	PS-C2-7	128	0.4	0.0005	PS-C2-7	256	0.2	0.0005
PS-C3-1	256	0.2	0.0005	PS-C3-1	256	0.2	0.005	PS-C3-1	64	0.2	0.0005
PS-C3-2	64	0.2	0.0005	PS-C3-2	256	0.2	0.005	PS-C3-2	256	0.4	0.0005
PS-C3-3	128	0.6	0.0005	PS-C3-3	64	0.2	0.005	PS-C3-3	256	0.2	0.0005
PS-C3-4	128	0.4	0.0005	PS-C3-4	128	0.4	0.005	PS-C3-4	256	0.2	0.0005
PS-C3-5	128	0.4	0.0005	PS-C3-5	128	0.2	0.0005	PS-C3-5	256	0.4	0.0005
PS-C3-6	256	0.2	0.0005	PS-C3-6	128	0.2	0.005	PS-C3-6	256	0.2	0.0005
PS-C3-7	256	0.4	0.0005	PS-C3-7	128	0.2	0.005	PS-C3-7	128	0.2	0.0005
PS-C4-1	128	0.4	0.005	PS-C4-1	256	0.2	0.0005	PS-C4-1	256	0.4	0.0005
PS-C4-2	256	0.2	0.0005	PS-C4-2	256	0.4	0.0005	PS-C4-2	128	0.2	0.005
PS-C4-3	64	0.4	0.005	PS-C4-3	256	0.4	0.005	PS-C4-3	256	0.4	0.0005
PS-C4-4	256	0.2	0.0005	PS-C4-4	256	0.2	0.0005	PS-C4-4	256	0.4	0.0005
PS-C4-5	128	0.6	0.005	PS-C4-5	128	0.4	0.005	PS-C4-5	128	0.2	0.0005
PS-C4-6	256	0.2	0.0005	PS-C4-6	256	0.4	0.0005	PS-C4-6	256	0.2	0.0005
PS-C4-7	64	0.4	0.005	PS-C4-7	256	0.2	0.005	PS-C4-7	256	0.2	0.0005
PS-All-1	64	0.2	0.005	PS-All-1	128	0.6	0.0005	PS-All-1	64	0.2	0.005
PS-All-2	64	0.4	0.005	PS-All-2	128	0.2	0.0005	PS-All-2	256	0.2	0.005
PS-All-3	64	0.2	0.005	PS-All-3	128	0.2	0.005	PS-All-3	256	0.2	0.0005
PS-All-4	128	0.4	0.005	PS-All-4	128	0.2	0.005	PS-All-4	256	0.4	0.005
PS-All-5	128	0.2	0.005	PS-All-5	128	0.2	0.0005	PS-All-5	256	0.4	0.0005
PS-All-6	128	0.4	0.0005	PS-All-6	64	0.2	0.005	PS-All-6	128	0.2	0.0005
PS-All-7	256	0.2	0.0005	PS-All-7	256	0.2	0.0005	PS-All-7	256	0.4	0.005
FS1	128	0.2	0.0005	FS1	256	0.2	0.005	FS1	256	0.4	0.0005
FS2	256	0.2	0.005	FS2	256	0.2	0.005	FS2	256	0.2	0.0005
FS3	64	0.2	0.005	FS3	256	0.4	0.0005	FS3	64	0.2	0.0005
FS4	64	0.2	0.005	FS4	128	0.2	0.0005	FS4	256	0.2	0.005
FS5	256	0.2	0.0005	FS5	128	0.2	0.0005	FS5	128	0.2	0.005
FS6	256	0.2	0.0005	FS6	128	0.6	0.0005	FS6	256	0.4	0.0005
FS7	64	0.2	0.0005	FS7	256	0.2	0.0005	FS7	128	0.2	0.0005

Note: At the pre-training stage, an input sequence length of 365 and an initial learning rate of 10⁻³ are consistent across different experiments. * lr—initial learning rate.

Appendix B

Table A2. Full results of transfer learning experiments for target catchment “39010”.

Experiment Name	NSE	KGE	FHV	FMS	FLV	Experiment Name	NSE	KGE	FHV	FMS	FLV
ASTL	0.792	0.685	−21.899	39.229	−10.737	PS-C3-4	0.304	0.607	43.830	30.342	−80.694
SS1	0.856	0.888	−8.124	18.349	10.335	PS-C3-5	0.360	0.586	40.104	−9.988	−93.091
SS2	0.769	0.767	−12.536	24.486	−4.913	PS-C3-6	0.453	0.636	40.547	20.895	−42.181
SS3	0.502	0.731	31.628	24.840	17.592	PS-C3-7	0.309	0.627	45.366	48.372	155.275
SS4	0.181	0.388	68.037	15.843	20.520	PS-C4-1	0.769	0.806	−8.168	28.102	−15.551
SS5	0.824	0.771	−28.412	−4.974	−7.420	PS-C4-2	0.424	0.626	24.934	−1.786	5.601
SS6	0.823	0.832	18.052	−7.310	−36.116	PS-C4-3	0.818	0.811	11.192	27.945	−38.522
SS7	0.817	0.815	−21.867	−7.434	−4.179	PS-C4-4	0.770	0.798	19.026	−8.353	0.752
PS-C1-1 ¹	0.793	0.877	7.523	13.326	−81.105	PS-C4-5	0.509	0.613	−9.420	47.441	49.755
PS-C1-2	0.705	0.685	−21.676	28.645	8.217	PS-C4-6	0.811	0.820	16.228	10.664	−51.143
PS-C1-3	0.340	0.527	−20.574	51.293	−35.184	PS-C4-7	0.845	0.835	11.740	13.145	−85.960
PS-C1-4	0.451	0.506	−23.196	48.327	60.214	PS-All-1	0.753	0.873	12.440	27.636	31.743
PS-C1-5	0.643	0.559	−28.521	46.820	53.805	PS-All-2	0.691	0.702	15.986	34.777	−1.728
PS-C1-6	0.549	0.575	29.033	6.501	184.727	PS-All-3	0.815	0.847	−2.932	26.296	−6.449
PS-C1-7	0.811	0.760	−16.978	24.675	−11.191	PS-All-4	0.759	0.770	14.020	18.791	−6.073
PS-C2-1	0.425	0.688	10.306	59.616	33.914	PS-All-5	0.593	0.511	38.761	40.008	58.050
PS-C2-2	0.731	0.707	−22.381	32.323	30.893	PS-All-6	0.750	0.764	10.681	15.034	−11.331
PS-C2-3	0.410	0.385	−40.427	51.438	32.782	PS-All-7	0.737	0.810	1.694	4.264	−33.920
PS-C2-4	0.758	0.811	−8.346	21.643	18.065	FS1	0.788	0.879	5.669	20.534	−3.980
PS-C2-5	0.681	0.570	−30.476	47.249	146.256	FS2	0.766	0.731	20.981	30.359	−19.494
PS-C2-6	0.820	0.816	−13.325	22.418	16.849	FS3	0.775	0.865	0.471	24.857	−27.062
PS-C2-7	0.611	0.504	−27.614	59.413	66.902	FS4	0.780	0.784	15.359	18.069	30.714
PS-C3-1	0.587	0.691	37.171	19.774	551.846	FS5	0.815	0.890	−4.772	8.671	3.809
PS-C3-2	0.245	0.629	16.678	45.749	130.572	FS6	0.749	0.722	19.321	22.553	4.687
PS-C3-3	0.417	0.698	12.878	44.491	185.630	FS7	0.803	0.864	12.177	20.735	19.873

Note: ¹ C1, C2, C3, C4, and All in the PS series experiments represent Cluster 1, Cluster 2, Cluster 3, Cluster 4, and all 21 physical attributes, respectively.

Appendix C

Table A3. Full results of transfer learning experiments for target catchment “12007”.

Experiment Name	NSE	KGE	FHV	FMS	FLV	Experiment Name	NSE	KGE	FHV	FMS	FLV
ASTL	0.783	0.773	−23.096	−2.410	−42.67	PS-C3-4	0.653	0.669	−31.986	11.615	−1576.68
SS1	0.770	0.757	−22.791	−11.124	−1523.79	PS-C3-5	0.699	0.658	−37.148	−7.347	−256.63
SS2	0.723	0.710	−28.489	−12.123	−21.78	PS-C3-6	0.701	0.680	−34.931	9.526	−1610.20
SS3	0.685	0.664	−35.286	−1.462	−1622.01	PS-C3-7	0.707	0.681	−32.484	−3.877	−1618.00
SS4	0.679	0.649	−35.133	−5.897	−183.06	PS-C4-1	0.723	0.684	−33.116	−7.444	−1624.96
SS5	0.753	0.687	−29.008	−6.831	−51.30	PS-C4-2	0.716	0.702	−31.776	−2.321	−1611.69
SS6	0.798	0.756	−25.359	−4.553	8.63	PS-C4-3	0.674	0.617	−35.480	−1.539	−1606.39
SS7	0.788	0.732	−25.044	−13.046	−230.31	PS-C4-4	0.598	0.631	−30.848	−11.739	−61.40
PS-C1-1 ¹	0.754	0.739	−27.032	6.660	−1530.93	PS-C4-5	0.734	0.732	−29.999	6.012	−1474.69
PS-C1-2	0.722	0.668	−31.309	−16.907	−1629.12	PS-C4-6	0.740	0.687	−33.350	−7.065	−103.06
PS-C1-3	0.734	0.723	−28.772	6.776	−1556.62	PS-C4-7	0.747	0.730	−26.808	12.864	−354.92
PS-C1-4	0.695	0.692	−28.173	−13.624	−1621.58	PS-All-1	0.787	0.727	−25.008	−2.949	−78.73
PS-C1-5	0.777	0.727	−23.182	−4.695	−10.82	PS-All-2	0.684	0.665	−35.458	−4.354	−138.65
PS-C1-6	0.784	0.775	−21.434	−8.360	−181.71	PS-All-3	0.697	0.719	−26.368	−1.385	−58.54
PS-C1-7	0.764	0.709	−27.895	−13.992	17.91	PS-All-4	0.645	0.706	−26.428	−4.733	−398.90
PS-C2-1	0.788	0.769	−21.521	−3.747	−378.18	PS-All-5	0.796	0.735	−25.243	−7.577	−3.44
PS-C2-2	0.748	0.739	−28.076	6.780	−1567.92	PS-All-6	0.785	0.795	−17.065	−3.034	−1586.35
PS-C2-3	0.731	0.721	−25.189	−3.898	−1636.65	PS-All-7	0.770	0.747	−26.581	−6.082	−20.61
PS-C2-4	0.695	0.693	−28.909	2.253	−1519.99	FS1	0.726	0.725	−27.464	−1.281	−1538.04
PS-C2-5	0.741	0.747	−23.114	1.789	16.92	FS2	0.705	0.681	−30.606	−0.278	−1581.42
PS-C2-6	0.785	0.795	−17.065	−3.034	−1586.35	FS3	0.758	0.758	−23.960	2.970	−1568.33
PS-C2-7	0.800	0.769	−22.036	−8.027	−85.02	FS4	0.663	0.706	−24.801	−10.781	−1570.83
PS-C3-1	0.572	0.625	−27.801	−16.395	−4.45	FS5	0.788	0.753	−27.173	−1.492	−118.38
PS-C3-2	0.717	0.720	−26.684	0.266	−1586.41	FS6	0.771	0.715	−28.918	−8.875	−72.28
PS-C3-3	0.631	0.645	−31.095	−0.944	−39.96	FS7	0.819	0.773	−24.531	−2.931	−87.15

Note: ¹ C1, C2, C3, C4, and All in the PS series experiments represent Cluster 1, Cluster 2, Cluster 3, Cluster 4, and all 21 physical attributes, respectively.

Appendix D

Table A4. Full results of transfer learning experiments for target catchment “33023”.

Experiment Name	NSE	KGE	FHV	FMS	FLV	Experiment Name	NSE	KGE	FHV	FMS	FLV
ASTL	0.809	0.852	10.419	−1.685	−659.25	PS-C3-4	0.708	0.657	−19.693	75.371	−13.30
SS1	0.852	0.853	−7.332	−39.799	−119.12	PS-C3-5	0.595	0.644	−22.321	−30.817	−660.29
SS2	0.770	0.882	5.275	−17.452	−1103.49	PS-C3-6	0.602	0.538	−26.055	−25.569	−580.81
SS3	0.738	0.721	−31.311	−7.193	−1086.27	PS-C3-7	0.470	0.596	−8.465	−6.081	−318.09
SS4	0.767	0.876	4.179	−23.645	−988.32	PS-C4-1	0.859	0.857	12.982	1.080	−968.32
SS5	0.772	0.733	18.241	−11.152	−1050.00	PS-C4-2	0.725	0.702	−30.820	3.384	−845.92
SS6	0.788	0.845	16.173	−11.593	−1001.04	PS-C4-3	0.788	0.888	1.251	−4.058	−516.53
SS7	0.771	0.807	16.357	−16.822	−993.63	PS-C4-4	0.730	0.707	−22.199	17.628	−229.19
PS-C1-1 ¹	0.738	0.767	−16.615	−11.745	−988.65	PS-C4-5	0.666	0.766	29.059	−16.838	−857.66
PS-C1-2	0.565	0.737	33.916	−44.371	−1121.75	PS-C4-6	0.733	0.751	23.313	13.227	−728.21
PS-C1-3	0.482	0.498	56.729	−23.028	−118.30	PS-C4-7	0.674	0.664	35.314	−7.927	−597.19
PS-C1-4	0.416	0.341	−57.533	24.773	−129.78	PS-All-1	0.729	0.848	12.247	−30.168	−876.44
PS-C1-5	0.656	0.659	27.612	−4.368	−949.55	PS-All-2	0.667	0.744	−3.277	−10.305	−1029.47
PS-C1-6	0.847	0.890	16.546	−11.683	−1041.07	PS-All-3	0.747	0.825	−6.858	−7.433	−989.30
PS-C1-7	0.835	0.825	15.576	0.020	−914.39	PS-All-4	0.721	0.800	−3.850	−19.366	−1068.96
PS-C2-1	0.638	0.752	25.383	5.212	−839.24	PS-All-5	0.844	0.794	18.799	−22.091	−927.24
PS-C2-2	0.629	0.603	37.349	−15.415	−980.53	PS-All-6	0.774	0.739	26.207	−12.497	−1085.17
PS-C2-3	0.685	0.839	6.676	−12.829	−546.65	PS-All-7	0.787	0.822	−3.405	−10.106	−805.11
PS-C2-4	0.612	0.786	10.715	−49.881	−1046.64	FS1	0.796	0.869	5.455	−22.014	−1115.54
PS-C2-5	0.583	0.580	46.239	24.758	−357.63	FS2	0.747	0.863	11.949	−4.358	−1051.10
PS-C2-6	0.723	0.706	37.126	−5.632	−860.64	FS3	0.809	0.887	10.133	−7.287	−757.98
PS-C2-7	0.720	0.693	32.399	25.251	−796.59	FS4	0.688	0.834	4.509	−10.187	−1040.16
PS-C3-1	0.627	0.615	−39.004	−35.616	−900.16	FS5	0.687	0.572	−34.498	−37.387	39.26
PS-C3-2	0.638	0.792	15.845	−7.776	−697.17	FS6	0.797	0.768	24.600	−20.774	−1115.09
PS-C3-3	0.448	0.539	−11.343	−24.791	−718.80	FS7	0.760	0.748	23.093	−20.010	−1134.19

Note: ¹ C1, C2, C3, C4, and All in the PS series experiments represent Cluster 1, Cluster 2, Cluster 3, Cluster 4, and all 21 physical attributes, respectively.

Appendix E

Figure A1. Observed and predicted streamflow comparisons of representative TL experiments for (a) target catchment “39010”, (b) target catchment “12007”, and (c) target catchment “33023”. ASTL refers to the All-Source Transfer Learning experiment. SS1 refers to the experiment using the top 10 spatially similar source catchments.

Appendix F

Figure A2. Comparison of catchment characteristics between different ranked groups (SS1–SS4) and target catchment “39010”. Error bars represent standard deviation within each group (Data from the UK National River Flow Archive).

Appendix G

Figure A3. Visualizations of (a) the Elbow method, (b) Silhouette analysis, and (c) the clustering of 21 static physical attributes.

Appendix H

Figure A4. Comparison of weekly hydrographs and corresponding DTW values between target catchment “39010” and the selected source catchments.

Appendix I

Figure A5. Comparison of weekly hydrographs and corresponding DTW values between target catchment “12007” and the selected source catchments.

Appendix J

Figure A6. Comparison of weekly hydrographs and corresponding DTW values between target catchment “33023” and the selected source catchments.

References

Huang, P.; Wang, G.; Guo, L.; Mello, C.R.; Li, K.; Ma, J.; Sun, S. Most Global Gauging Stations Present Biased Estimations of Total Catchment Discharge. Geophys. Res. Lett. 2023, 50, e2023GL104253. [Google Scholar] [CrossRef]
Lavers, D.A.; Harrigan, S.; Andersson, E.; Richardson, D.S.; Prudhomme, C.; Pappenberger, F. A Vision for Improving Global Flood Forecasting. Environ. Res. Lett. 2019, 14, 121002. [Google Scholar] [CrossRef]
Do, H.X.; Westra, S.; Leonard, M. A Global-Scale Investigation of Trends in Annual Maximum Streamflow. J. Hydrol. 2017, 552, 28–43. [Google Scholar] [CrossRef]
Sivapalan, M. Prediction in Ungauged Basins: A Grand Challenge for Theoretical Hydrology. Hydrol. Process. 2003, 17, 3163–3170. [Google Scholar] [CrossRef]
Global Runoff Data Centre (GRDC). “Grdc Data Portal”. Federal Institute of Hydrology (BfG). Available online: https://portal.grdc.bafg.de/applications/public.html (accessed on 21 August 2025).
Oudin, L.; Kay, A.; Andréassian, V.; Perrin, C. Are Seemingly Physically Similar Catchments Truly Hydrologically Similar? Water Resour. Res. 2010, 46, W11558. [Google Scholar] [CrossRef]
Guo, Y.; Zhang, Y.; Zhang, L.; Wang, Z. Regionalization of Hydrological Modeling for Predicting Streamflow in Ungauged Catchments: A Comprehensive Review. WIREs Water 2020, 8, e1487. [Google Scholar] [CrossRef]
Arsenault, R.; Poissant, D.; Brissette, F. Parameter Dimensionality Reduction of a Conceptual Model for Streamflow Prediction in Canadian, Snowmelt Dominated Ungauged Basins. Adv. Water Resour. 2015, 85, 27–44. [Google Scholar] [CrossRef]
Zhang, Y.; Vaze, J.; Chiew, F.H.; Teng, J.; Li, M. Predicting Hydrological Signatures in Ungauged Catchments Using Spatial Interpolation, Index Model, and Rainfall–Runoff Modelling. J. Hydrol. 2014, 517, 936–948. [Google Scholar] [CrossRef]
Tegegne, G.; Kim, Y.-O. Modelling Ungauged Catchments Using the Catchment Runoff Response Similarity. J. Hydrol. 2018, 564, 452–466. [Google Scholar] [CrossRef]
Xu, Y.; Lin, K.; Hu, C.; Wang, S.; Wu, Q.; Zhang, L.; Ran, G. Deep Transfer Learning Based on Transformer for Flood Forecasting in Data-Sparse Basins. J. Hydrol. 2023, 625, 129956. [Google Scholar] [CrossRef]
Rahmani, F.; Shen, C.; Oliver, S.; Lawson, K.; Appling, A. Deep Learning Approaches for Improving Prediction of Daily Stream Temperature in Data-Scarce, Unmonitored, and Dammed Basins. Hydrol. Process. 2021, 35, e14400. [Google Scholar] [CrossRef]
Aghelpour, P.; Varshavian, V. Evaluation of Stochastic and Artificial Intelligence Models in Modeling and Predicting of River Daily Flow Time Series. Stoch. Environ. Res. Risk Assess. 2020, 34, 33–50. [Google Scholar] [CrossRef]
Ghimire, S.; Yaseen, Z.M.; Farooque, A.A.; Deo, R.C.; Zhang, J.; Tao, X. Streamflow Prediction Using an Integrated Methodology Based on Convolutional Neural Network and Long Short-Term Memory Networks. Sci. Rep. 2021, 11, 17497. [Google Scholar] [CrossRef]
Lee, S.; Lee, D. Improved Prediction of Harmful Algal Blooms in Four Major South Korea’s Rivers Using Deep Learning Models. Int. J. Environ. Res. Public Health 2018, 15, 1322. [Google Scholar] [CrossRef] [PubMed]
Le, X.-H.; Nguyen, D.-H.; Jung, S.; Yeon, M.; Lee, G. Comparison of Deep Learning Techniques for River Streamflow Forecasting. IEEE Access 2021, 9, 71805–71820. [Google Scholar] [CrossRef]
Sit, M.; Demiray, B.Z.; Xiang, Z.; Ewing, G.J.; Sermet, Y.; Demir, I. A Comprehensive Review of Deep Learning Applications in Hydrology and Water Resources. Water Sci. Technol. 2020, 82, 2635–2670. [Google Scholar] [CrossRef] [PubMed]
Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Benchmarking a Catchment-Aware Long Short-Term Memory Network (Lstm) for Large-Scale Hydrological Modeling. Hydrol. Earth Syst. Sci. Discuss. 2019, 2019, 1–32. [Google Scholar] [CrossRef]
Ghobadi, F.; Kang, D.S. Improving Long-Term Streamflow Prediction in a Poorly Gauged Basin Using Geo-Spatiotemporal Mesoscale Data and Attention-Based Deep Learning: A Comparative Study. J. Hydrol. 2022, 615, 20. [Google Scholar] [CrossRef]
Ma, K.; Feng, D.; Lawson, K.; Tsai, W.P.; Liang, C.; Huang, X.; Sharma, A.; Shen, C. Transferring Hydrologic Data across Continents—Leveraging Data-Rich Regions to Improve Hydrologic Prediction in Data-Sparse Regions. Water Resour. Res. 2021, 57, e2020WR028600. [Google Scholar] [CrossRef]
Yang, M.; Yang, Q.; Shao, J.; Wang, G.; Zhang, W. A New Few-Shot Learning Model for Runoff Prediction: Demonstration in Two Data Scarce Regions. Environ. Model. Softw. 2023, 162, 105659. [Google Scholar] [CrossRef]
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
Thompson, A. Transfer Learning with Time Series Prediction: Review. SSRN. Available online: https://ssrn.com/abstract=4214809 (accessed on 2 January 2025).
Yaqub, M.; Jinchao, F.; Ahmed, S.; Arshid, K.; Bilal, M.A.; Akhter, M.P.; Zia, M.S. Gan-Tl: Generative Adversarial Networks with Transfer Learning for Mri Reconstruction. Appl. Sci. 2022, 12, 8841. [Google Scholar] [CrossRef]
Wang, D.; Zheng, T.F. Transfer Learning for Speech and Language Processing. In Proceedings of the 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), Hong Kong, 16–19 December 2015. [Google Scholar] [CrossRef]
Alhares, H.; Tanha, J.; Balafar, M.A. Amtldc: A New Adversarial Multi-Source Transfer Learning Framework to Diagnosis of COVID-19. Evol. Syst. 2023, 14, 1101–1115. [Google Scholar] [CrossRef]
He, Q.-Q.; Pang, P.C.-I.; Si, Y.-W. Multi-Source Transfer Learning with Ensemble for Financial Time Series Forecasting. In Proceedings of the 2020 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Melbourne, VIC, Australia, 14–17 December 2020. [Google Scholar] [CrossRef]
Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A Survey of Transfer Learning. J. Big Data 2016, 3, 9.
Khoshkalam, Y.; Rousseau, A.N.; Rahmani, F.; Shen, C.; Abbasnezhadi, K. Applying Transfer Learning Techniques to Enhance the Accuracy of Streamflow Prediction Produced by Long Short-Term Memory Networks with Data Integration. J. Hydrol. 2023, 622, 129682.
Muhammad, A.U.; Abba, S.I. Transfer Learning for Streamflow Forecasting Using Unguaged MOPEX Basins Data Set. Earth Sci. Inform. 2023, 16, 1241–1264.
Xu, Y.; Lin, K.; Hu, C.; Wang, S.; Wu, Q.; Zhang, J.; Xiao, M.; Luo, Y. Interpretable Machine Learning on Large Samples for Supporting Runoff Estimation in Ungauged Basins. J. Hydrol. 2024, 639, 131598.
Wang, Z.; Dai, Z.; Póczos, B.; Carbonell, J. Characterizing and Avoiding Negative Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
Zhang, W.; Deng, L.; Zhang, L.; Wu, D. A Survey on Negative Transfer. IEEE/CAA J. Autom. Sin. 2023, 10, 305–329.
Nai, C.; Liu, X.; Tang, Q.; Liu, L.; Sun, S.; Gaffney, P.P.J. A Novel Strategy for Automatic Selection of Cross-Basin Data to Improve Local Machine Learning-Based Runoff Models. Water Resour. Res. 2024, 60, e2023WR035051.
Lees, T.; Buechel, M.; Anderson, B.; Slater, L.; Reece, S.; Coxon, G.; Dadson, S.J. Benchmarking Data-Driven Rainfall–Runoff Models in Great Britain: A Comparison of Long Short-Term Memory (LSTM)-Based Models with Four Lumped Conceptual Models. Hydrol. Earth Syst. Sci. 2021, 25, 5517–5534.
Coxon, G.; Addor, N.; Bloomfield, J.P.; Freer, J.; Fry, M.; Hannaford, J.; Howden, N.J.K.; Lane, R.; Lewis, M.; Robinson, E.L.; et al. Catchment Attributes and Hydro-Meteorological Timeseries for 671 Catchments across Great Britain (CAMELS-GB). NERC Environ. Inf. Data Cent. 2020.
Wilson, O.J.; Pescott, O.L. Köppen-Geiger Climate Classification Prediction Maps for the UK at 1 km Resolution, 1901–2080. NERC EDS Environ. Inf. Data Cent. 2023.
Met Office; Hollis, D.C.E.; Kendon, M.; Packman, S.; Doherty, A. HadUK-Grid Gridded Climate Observations on a 1 km Grid over the UK, v1.3.0.ceda (1836–2023). NERC EDS Centre for Environmental Data Analysis, 2024.
Humaira, H.; Rasyidah, R. Determining the Appropiate Cluster Number Using Elbow Method for K-Means Algorithm. In Proceedings of the 2nd Workshop on Multidisciplinary and Applications (WMA), Padang, Indonesia, 24–25 January 2018.
Yang, M.; Olivera, F. Classification of Watersheds in the Conterminous United States Using Shape-Based Time-Series Clustering and Random Forests. J. Hydrol. 2023, 620, 129409.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Yao, Y.; Zhao, Y.; Li, X.; Feng, D.; Shen, C.; Liu, C.; Kuang, X.; Zheng, C. Can Transfer Learning Improve Hydrological Predictions in the Alpine Regions? J. Hydrol. 2023, 625, 130038.
Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232.
Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–Runoff Modelling Using Long Short-Term Memory (LSTM) Networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022.
Kratzert, F.; Gauch, M.; Nearing, G.; Klotz, D. NeuralHydrology—A Python Library for Deep Learning Research in Hydrology. J. Open Source Softw. 2022, 7, 4050.
Nash, J.E.; Sutcliffe, J.V. River Flow Forecasting through Conceptual Models Part I—A Discussion of Principles. J. Hydrol. 1970, 10, 282–290.
Knoben, W.J.M.; Freer, J.E.; Woods, R.A. Technical Note: Inherent Benchmark or Not? Comparing Nash–Sutcliffe and Kling–Gupta Efficiency Scores. Hydrol. Earth Syst. Sci. 2019, 23, 4323–4331.
Yilmaz, K.K.; Gupta, H.V.; Wagener, T. A Process-Based Diagnostic Approach to Model Evaluation: Application to the NWS Distributed Hydrologic Model. Water Resour. Res. 2008, 44, 18.
Yan, L.; Lei, Q.; Jiang, C.; Yan, P.; Ren, Z.; Liu, B.; Liu, Z. Climate-Informed Monthly Runoff Prediction Model Using Machine Learning and Feature Importance Analysis. Front. Environ. Sci. 2022, 10, 1049840.
Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A Brief Review of Domain Adaptation. In Advances in Data Science and Information Engineering; Springer: Cham, Switzerland, 2021; pp. 877–894.

Figure 1. The distribution of Global Runoff Data Centre (GRDC) gauging stations based on their record length. Source: The Global Runoff Data Centre, 56068 Koblenz, Germany. Retrieved from: https://portal.grdc.bafg.de/applications/public.html [Accessed: 21 August 2025] [5].

Figure 2. (a) Köppen–Geiger climate classification zones. Cfb is temperate, no dry season, warm summer; Cfc is temperate, no dry season, cold summer; Csb is temperate, dry summer, warm summer; Dfc is cold, no dry season, cold summer; ET is polar tundra [37]. (b) Rainfall map based on the HadUK-Grid 1 km average annual rainfall data [38], with the locations of the three target catchments.

Figure 3. (a–c) Spatial distributions of the selected target catchment and corresponding source catchments in Spatial Similarity (SS) experiments; (d–f) NSE results of SS experiments along with the two baseline experiments (1 to 7 on the x-axis represent the sub-experiments using different numbers of top-ranked source catchments).

Figure 4. (a–o) Spatial distributions of selected source catchments and corresponding NSE results for three target catchments. Each column represents Physical Attribute Similarity (PS) experiments based on Cluster 1 (C1), Cluster 2 (C2), Cluster 3 (C3), Cluster 4 (C4), and the full attribute set (PS-All). The three rows correspond to the results for target catchments “39010”, “12007”, and “33023”, respectively.

Figure 5. (a–c) Spatial distributions of the selected target catchment and corresponding source catchments in Flow Regime Similarity (FS) experiments; (d–f) NSE results of FS experiments along with the two baseline experiments.

Table 1. The 21 static catchment attributes from the CAMELS-GB dataset used in this study [35,36].

Attribute Name	Description	Unit
“area”	catchment area	km²
“elev_mean”	catchment mean elevation	m a.s.l.
“dpsbar”	catchment mean drainage path slope	m/km
“sand_perc”	percentage sand	%
“silt_perc”	percentage silt	%
“clay_perc”	percentage clay	%
“porosity_hypres”	volumetric porosity (saturated water content estimated using a pedotransfer function based on silt, clay and organic fractions, bulk density and topsoil)	-
“conductivity_hypres”	saturated hydraulic conductivity (estimated using a pedotransfer function based on silt, clay and organic fractions, bulk density and topsoil)	cm h⁻¹
“soil_depth_pelletier”	depth to bedrock (maximum 50 m)	m
“frac_snow”	fraction of precipitation falling as snow (for days colder than 0 °C)	-
“dwood_perc”	percentage cover of deciduous woodland	%
“ewood_perc”	percentage cover of evergreen woodland	%
“crop_perc”	percentage cover of crops	%
“urban_perc”	percentage cover of suburban and urban	%
“low_prec_freq”	frequency of dry days (<1 mm day⁻¹)	days yr⁻¹
“low_prec_dur”	average duration of dry periods (number of consecutive days < 1 mm day⁻¹)	days
“high_prec_freq”	frequency of high precipitation days (≥5 times mean daily precipitation)	days yr⁻¹
“high_prec_dur”	average duration of high precipitation events (number of consecutive days ≥ 5 times mean daily precipitation)	days
“p_mean”	mean daily precipitation	mm day⁻¹
“pet_mean”	mean daily PET (Penman–Monteith equation without interception correction)	mm day⁻¹
“p_seasonality”	seasonality and timing of precipitation (estimated using sine curves to represent the annual temperature and precipitation cycles; positive (negative) values indicate that precipitation peaks in summer (winter) and values close to zero indicate uniform precipitation throughout the year)	-

Table 2. Detailed information on pre-training/fine-tuning, validating, and testing datasets.

Catchment Type	Time Range	Training/Fine-tuning Set (Days)	Validating Set (Days)	Testing Set (Days)
Source Catchments (668 catchments)	30 September 1989 to 30 September 2015	7305	1096	1096
Target Catchment	30 September 2008 to 30 September 2015	365	1096	1096
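
As a quick consistency check on Table 2, the sketch below compares the stated time ranges with the sum of the split lengths. It assumes that both endpoint dates are counted (an inclusive range), which the table does not state explicitly; the dictionary and function names are illustrative only.

```python
from datetime import date


def inclusive_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints (assumption)."""
    return (end - start).days + 1


# Split lengths in days, copied from Table 2.
SOURCE_SPLIT = {"train": 7305, "validate": 1096, "test": 1096}
TARGET_SPLIT = {"fine_tune": 365, "validate": 1096, "test": 1096}

source_days = inclusive_days(date(1989, 9, 30), date(2015, 9, 30))
target_days = inclusive_days(date(2008, 9, 30), date(2015, 9, 30))

# Under the inclusive-range assumption, the splits account for every day in each range.
print(source_days, sum(SOURCE_SPLIT.values()))
print(target_days, sum(TARGET_SPLIT.values()))
```

Under this assumption the three splits exactly tile each time range, so the target catchment sees a single year of fine-tuning data against twenty years of pre-training data per source catchment.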

Table 3. Details of the three target catchments (source: UK National River Flow Archive).

Target	Station Number	River	Location	Catchment Area (km²)	Elevation (mAOD ¹)	Primary Geology	SAAR ² (mm)	Dominant Land Use
1	“39010”	Colne	Denham	743	33.30–266.20	Chalk with Drift cover; clays in valleys; extensive gravel tracts	704	Rural in headwaters; suburban development in middle and lower reaches
2	“12007”	Dee	Mar Lodge	289	334.10–1308.90	Dalradian and Moinian metamorphic rocks; granite mountains; nearly half overlain by superficial deposits	1334	Mountainous with moorland and some forestry
3	“33023”	Lea Brook	Beck Bridge	101.8	7.90–122.30	Chalk with Boulder Clay cover (south, upper catchment); river terrace deposits (north)	579	Mixed agricultural

Note: ¹ mAOD: meters Above Ordnance Datum; ² SAAR: Standard Average Annual Rainfall (1961–1990).

Table 4. Summary of baseline and similarity-based transfer learning experiments.

Experiment Name	Similarity Criterion	Selected Source Catchments
NTL ¹	/	/
ASTL ²	/	All 668
SS1	Spatial Similarity	Top1–10
SS2		Top11–20
SS3		Top21–30
SS4		Top31–40
SS5		Top1–20
SS6		Top1–30
SS7		Top1–40
PS1	Physical Attributes Similarity	Top1–10
PS2		Top11–20
PS3		Top21–30
PS4		Top31–40
PS5		Top1–20
PS6		Top1–30
PS7		Top1–40
FS1	Flow Regime Similarity	Top1–10
FS2		Top11–20
FS3		Top21–30
FS4		Top31–40
FS5		Top1–20
FS6		Top1–30
FS7		Top1–40

Note: ¹ NTL—Non-transfer learning; ² ASTL—All-Source Transfer Learning.
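
The rank windows in Table 4 can be expressed compactly in code. The following is a minimal sketch, not the authors' implementation: it assumes a list of source catchment IDs already sorted from most to least similar to the target, and the function name, dictionary name, and catchment IDs are hypothetical.

```python
# Rank windows (1-indexed, inclusive) for sub-experiments 1 to 7 in Table 4.
RANK_WINDOWS = {
    1: (1, 10),
    2: (11, 20),
    3: (21, 30),
    4: (31, 40),
    5: (1, 20),
    6: (1, 30),
    7: (1, 40),
}


def build_experiments(ranked_sources, prefix):
    """Map experiment names (e.g. 'SS1') to the source catchments in each rank window.

    ranked_sources must be ordered from most to least similar to the target.
    """
    if len(ranked_sources) < 40:
        raise ValueError("need at least 40 ranked source catchments")
    return {
        f"{prefix}{index}": list(ranked_sources[first - 1:last])
        for index, (first, last) in RANK_WINDOWS.items()
    }


# Hypothetical ranking of the 668 source catchments for one target.
ranked = [f"catchment_{rank:03d}" for rank in range(1, 669)]
ss = build_experiments(ranked, "SS")
print(ss["SS2"][0], len(ss["SS7"]))
```

Sub-experiments 1 to 4 use disjoint blocks of ten catchments, which isolates the effect of similarity rank, while 5 to 7 are cumulative and show the effect of adding progressively less similar sources.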

Table 5. The attributes contained in each cluster.

Cluster	Attribute Names
Cluster 1 (C1)	“soil_depth_pelletier”, “dwood_perc”, “crop_perc”, “urban_perc”, “pet_mean”, “p_seasonality”, “high_prec_freq”, “low_prec_freq”, “high_prec_dur”, “low_prec_dur”
Cluster 2 (C2)	“elev_mean”, “dpsbar”, “sand_perc”, “conductivity_hypres”, “frac_snow”, “ewood_perc”, “p_mean”
Cluster 3 (C3)	“area”
Cluster 4 (C4)	“silt_perc”, “clay_perc”, “porosity_hypres”
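
The clusters in Table 5 partition the 21 attributes of Table 1, which the sketch below verifies. It also shows one plausible way to rank source catchments on a chosen attribute subset. The distance measure (Euclidean distance on standardized attributes) is an assumption made for illustration and is not taken from the tables; the function name and the attribute values in the usage example are hypothetical.

```python
import math

# Attribute clusters copied from Table 5.
CLUSTERS = {
    "C1": ["soil_depth_pelletier", "dwood_perc", "crop_perc", "urban_perc",
           "pet_mean", "p_seasonality", "high_prec_freq", "low_prec_freq",
           "high_prec_dur", "low_prec_dur"],
    "C2": ["elev_mean", "dpsbar", "sand_perc", "conductivity_hypres",
           "frac_snow", "ewood_perc", "p_mean"],
    "C3": ["area"],
    "C4": ["silt_perc", "clay_perc", "porosity_hypres"],
}
ALL_ATTRIBUTES = [name for names in CLUSTERS.values() for name in names]


def rank_sources(target, sources, attributes):
    """Return source IDs sorted by distance to the target (closest first).

    Assumption: similarity is Euclidean distance after scaling each attribute
    by its standard deviation across the target and all sources.
    """
    rows = [target] + list(sources.values())
    scale = {}
    for name in attributes:
        values = [row[name] for row in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        scale[name] = std or 1.0  # avoid division by zero for constant attributes

    def distance(row):
        return math.sqrt(sum(((row[n] - target[n]) / scale[n]) ** 2 for n in attributes))

    return sorted(sources, key=lambda cid: distance(sources[cid]))


# Hypothetical attribute values for one target and two sources (Cluster 4 only).
target = {"silt_perc": 30.0, "clay_perc": 20.0, "porosity_hypres": 0.45}
sources = {
    "far": {"silt_perc": 10.0, "clay_perc": 45.0, "porosity_hypres": 0.60},
    "near": {"silt_perc": 31.0, "clay_perc": 21.0, "porosity_hypres": 0.44},
}
ranking = rank_sources(target, sources, CLUSTERS["C4"])
print(len(ALL_ATTRIBUTES), ranking)
```

Passing `ALL_ATTRIBUTES` instead of a single cluster would correspond to the PS-All setting, again under the assumed distance measure.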

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
