Exploring Random Forest Machine Learning and Remote Sensing Data for Stream ﬂ ow Prediction: An Alternative Approach to a Process-Based Hydrologic Modeling in a Snowmelt-Driven Watershed

: Physically based hydrologic models require signi ﬁ cant e ﬀ ort and extensive information for development, calibration, and validation. The study explored the use of the random forest regression (RFR), a supervised machine learning (ML) model, as an alternative to the physically based Soil and Water Assessment Tool (SWAT) for predicting stream ﬂ ow in the Rio Grande Headwaters near Del Norte, a snowmelt-dominated mountainous watershed of the Upper Rio Grande Basin. Remotely sensed data were used for the random forest machine learning analysis (RFML) and RStu-dio for data processing and synthesizing. The RFML model outperformed the SWAT model in accuracy and demonstrated its capability in predicting stream ﬂ ow in this region. We implemented a customized approach to the RFR model to assess the model’s performance for three training periods, across 1991–2010, 1996–2010, and 2001–2010; the results indicated that the model’s accuracy improved with longer training periods, implying that the model trained on a more extended period is be tt er able to capture the parameters’ variability and reproduce stream ﬂ ow data more accurately. The variable importance (i.e., IncNodePurity) measure of the RFML model revealed that the snow depth and the minimum temperature were consistently the top two predictors across all training periods. The paper also evaluated how well the SWAT model performs in reproducing stream ﬂ ow data of the watershed with a conventional approach. The SWAT model needed more time and data to set up and calibrate, delivering acceptable performance in annual mean stream ﬂ ow simulation, with satisfactory index of agreement (d), coe ﬃ cient of determination (R 2 ), and percent bias (PBIAS) values, but monthly simulation warrants further exploration and model adjustments. The study recommends exploring snowmelt runo ﬀ hydrologic processes, dust-driven sublimation e ﬀ ects, and more detailed topographic input parameters to update the SWAT snowmelt routine for be tt er monthly ﬂ ow estimation. The results provide a critical analysis for enhancing stream ﬂ ow prediction, which is valuable for further research and water resource management, including snowmelt-driven semi-arid regions.


Introduction
Various modeling tools have been developed and widely used worldwide to predict hydrologic responses [1,2] and are deemed essential for water resource management [3,4], particularly in areas where the hydrologic data or information is limited [5][6][7]. Monitoring is more reliable but is constrained by limited time and resources to collect sufficient data for adequate systems analysis [8][9][10]. In contrast, hydrologic modeling can interpolate between the data gaps and improve understanding of the processes and the parameters [6,8,11]. Furthermore, it is cost-effective and time-saving to help estimate flows to support monitoring, especially when high accuracy is not urgent at the primary stage [12].
However, hydrological processes are sometimes difficult to explain due to the nonlinear climatic and hydrologic factors and the complex parameter relationships [1,13]. Many hydrologic models need improved methods for simulating streamflow across watersheds [13,14]. Although the overall performances of these models have been improved in recent times, the models still warrant further exploration in delineating spatially distributed hydrologic processes [13,15], and researchers should explore the various approaches in simulating hydrologic responses for different watersheds [16,17]. Much past and current research has focused on determining which model among differing model types provides improved predictions that are well matched with observed data [2,18,19].
Selecting an appropriate model for a specific application is critical, as different models have different characteristics and methods and are compatible with varying areas of study [14,20]. For example, Devia et al. (2015) evaluated five hydrologic models: VIC, TOPMODEL, HBV, MIKE SHE, and SWAT. The VIC model was suitable for agricultural water management in moist areas, whereas the MIKE SHE model was unsuitable for smaller watersheds due to the extensive data and physical parameters required. Besides, the SWAT model needed little calibration to achieve acceptable results, whereas the HBV model performed satisfactorily, and the TOPMODEL was successful in catchments with shallow soil and moderate topography [14]. Thus, finding a suitable tool for hydrographic (i.e., streamflow) prediction is challenging due to the unique dynamics of each basin, and there is no single model that is perfect for all basins [1,21]. Moreover, hydrologic factors influencing hydrography vary spatially and temporally [2,14]. Therefore, decision-makers in watershed management should adopt a suitable approach for acquiring reliable monitoring, modeling, and system characterization information, emphasizing the model's ability to capture spatiotemporal variability, suitability for specific time scales, and flexibility in accommodating different climatic conditions. Various types of models exist, from physically based distributed models to empirical ones [1,22]. Physically based models simulate flow in a river basin based on climatic and hydrologic variables, providing insights into river basin processes [23]. However, the physically-based models require considerable effort and extensive information for development, whereas semi-distributed models divide the watershed into units and capture spatial heterogeneity [24], providing flexibility in data requirements and making them more suitable for practical applications in data-limited regions [25]; a well-calibrated semi-distributed model can adequately represent hydrological processes with balance accuracy and computational efficiency and provide reliable predictions with fewer computational resources than a fully distributed model [25][26][27]. However, calibrating the numerous parameters makes the process complex and time-consuming but may sometimes produce different results from observed data due to model structure and parameter uncertainty [23,28]. Empirical models, on the other hand, can be appropriate when the data are limited or the physical processes are highly complex and uncertain [1,28]. Past studies also indicated that complex models were only sometimes the most accurate, whereas simple empirical models were more effective in reproducing observed flow [14,17]. Hence, we intend to identify simplicity that retains high accuracy.
Machine learning (ML) has emerged as an efficient alternative to physical processbased models due to its simplicity and ability to model complex non-linear systems and estimate variables such as streamflow from the other input variables [1, 28]. The study aims to assess the capability of an ML approach in predicting streamflow as an alternative to a semi-distributed model. This study compared the random forest machine learning (RFML) model with the Soil and Water Assessment Tool (SWAT) for estimating long-term streamflow in a snowmelt-led mountainous watershed.
Although both SWAT and RFML have been examined separately in hydrographic prediction in the past [1,23], limited research has compared them [1,28]. Recent studies have shown improved results with ML methods, including random forest (RF) [1,23]. Snowmelt runoff modeling (SRM) has been previously studied and used in this study area [29,30], and a water operations model-URGWOM (Upper Rio Grande Water Operations Model) has also been developed [31,32]. However, the suitability of SWAT, a widely used semi-distributed model, is still unexplored for the study area. SWAT represents the complexity of hydrological processes by combining physical understanding and empirical calibration [33]. Although initially designed for large agricultural basins, not for modeling the heterogeneous mountain basins [5,34], SWAT often performed poorly in mountainous locations and required additional routines to be efficient; many published approaches used integrated features within SWAT [35][36][37].
Nonetheless, evaluating SWAT's performance for such regions could be valuable for further research, providing information on water balance and related parameters [22]. However, rather than developing techniques or routines for model enhancement, the study's secondary objective is to evaluate how well the SWAT model performs in reproducing the streamflow of the study watershed with a conventional approach and to document this effort for future research. The RFML approach in predicting streamflow for this study area is novel, and the comparison with SWAT is also novel. Additionally, the study identified critical parameters and drivers affecting surface water supplies, which are critical for effectively modeling and monitoring water resource systems, particularly in semi-arid snowmelt-driven watersheds. The research outcomes have significant implications for water resource management and ecosystem services.

Study Watershed
The study watershed is Rio Grande Headwaters (RGH) near Del Norte (Figure 1) of the Upper Rio Grande (URG) basin of southwestern Colorado and northern New Mexico in the United States (US). The RGH is located at the upper reaches of the San Juan Mountains, in the upper part of the URG basin [30]. A snow-dominated hydrologic regime characterizes the watershed's hydrology. The watershed covers an area of approximately 3380 square kilometers and includes high-elevation alpine terrain, forested mountain slopes, and some lower-elevation agricultural regions. The elevation of the watershed ranges from 2434 to 4215 m above sea level, with an average of 3230.68 m. Most of the annual streamflow occurs during the spring and early summer as the snowpack melts, which can contribute significant flow to the RG downstream [38,39]. The URG basin exhibits variability in temperature and precipitation based on latitude and elevation. The watershed's annual average temperatures are −6 to −1 °C, ranging from −6 to 7 °C, and it receives an average annual precipitation of approximately 630 mm, with much of this falling as snow in the winter months [38,40]. A series of steep-sided valleys and ridges characterize the topography of the watershed. Overall, the complex landscape and topography of the watershed can pose challenges in accurately simulating flow [29]. Efficient management of water resources in the semi-arid Southwest is critical due to recurring droughts, which are expected to worsen with climate change [38,41,42]. Projected climate change and related impacts on water resources in the diverse US Southwest are expected to vary as a function of local elevation and even by hillslope orientation [29,39,43]. The runoff ratio reduction suggests a potential future decline in streamflow due to rising temperatures [41,42]. Climate change may significantly affect various hydro-geoclimatic factors of the RGH watershed, leading to a compounded decline in runoff, which serves as a vital upstream source and water supply for the entire URG basin. The URG basin's declining snowpack affects streamflow dynamics, potentially impacting agriculture, ecosystems, and socio-technical systems [38,41,43]. Long-term simulations through robust models are needed to efficiently predict streamflow and manage water resources in the region [44]. Temperature rise and precipitation variability are consistently projected across various models [18,45], whereas other crucial variables specific to local contexts are often overlooked. The study's underlying interest is exploring critical components/factors to the watershed's streamflow dynamics by analyzing the parameter sensitivity of SWAT and variable importance measures of RFML.

Prediction Methods and Predictor Variables
Different research groups use different variables and empirical techniques for runoff estimation. For example, the NRCS employed multiple linear regression models that establish a mathematical relationship between predictor and response variables expressed through equations [18]. However, linear methods are appropriate for only some limited cases [46]; conversely, ensemble decision tree-based algorithms are suitable for diverse data, can ignore irrelevant predictors, and handle both linear and non-linear mechanisms being interpretable [47]. RF is a unique ensemble method with inherent accuracy estimation and predictor importance measures [47,48]. We selected RF for its ability to identify and rank important predictors through variable importance measures. Furthermore, the RF algorithm can efficiently use data from diverse sources with different scales, effectively deals with multicollinearity, and does not require normal data distribution [49]. It can capture non-linear relationships despite some collinearity [50,51] since it does not depend on the linearity assumption between predictor and response variables, making it relatively robust to collinearity compared to other regression methods, such as linear regression [52].
Various predictor variables, including snow water equivalent (SWE), snow depth, precipitation, antecedent streamflow, soil moisture, sublimation, temperature, etc., have been used in various studies in predicting the streamflow of the region [18,38,39]. The Natural Resources Conservation Service (NRCS) uses several predictor variables, including snow water equivalent (SWE), precipitation, antecedent streamflow, temperature, groundwater levels, and soil water content, to forecast seasonal streamflow volume [18,53]. Studies indicated that minimum temperature is often significantly connected to snowmelt timing and rate, affecting snow cover's persistence and the snowmelt initiation time [45,54]. Elevated minimum temperatures accelerate snowmelt, leading to earlier peak snowmelt and lower overall snow water equivalent [39,55]. However, the minimum temperature and snowmelt relationship depends on many factors, such as snowpack characteristics, elevation, and regional climate patterns [29,54]. The increasing temperatures can also increase sublimation by enabling greater latent heat absorption into the snowpack, causing a reduction in snowpack and earlier runoff and reducing water supplies after runoff [45,55].
Snowpack properties and distribution are vital for water supplies in snowmelt-dominated river systems, acting as a reservoir and contributing significantly to runoff patterns [53,56]. Runoff can also be influenced by soil moisture, an essential factor in providing water from snowmelt runoff and precipitation [57,58]. Based on the literature review [18,38,39,53,54,58], we selected five non-mutually exclusive predictor variables-minimum temperature, snow depth, precipitation, soil moisture, and sublimation for streamflow prediction using the RFML model, as these variables can interact in complex ways and may co-occur to influence streamflow.
Several studies emphasized understanding the dynamics of variables' influence on estimating runoff [39,59,60] and discussed the potential applications of remote sensing techniques in monitoring variables' variability for improving prediction [37,59]. The advent of novel techniques in remote sensing has enhanced its capability to address the spatial and temporal variability of snow factors [39,60]. Many researchers, therefore, utilized remote sensing data and techniques such as synthetic aperture radar (SAR) and optical remote sensing (ORS) to monitor these variables' variability [59,61,62].

Random Forest Machine Learning (RFML)
RFML is a supervised ML method that creates multiple decision trees for the prediction model, where each tree is trained on a randomly selected subset of the available training data [48]. During prediction, the algorithm combines the predictions from each decision tree to generate a final prediction [1, 47,63]. In addition, the RFML can handle missing data and non-linear relationships between the predictors and the target variable [1,63].

RFML: Random Forest Regression (RFR)
RFR is a specific type of RFML that is effective for predicting continuous values [64]; it generates many decision trees on different data subsets, and the final prediction is made by averaging the results of all the individual trees. RFR has been widely used for hydrologic prediction utilizing meteorological and hydrological parameters; its feature of identifying the most critical predictor variables helps comprehend underlying hydrological processes [65][66][67]. Studies have demonstrated the superiority of RFR over other ML algorithms and traditional statistical methods in streamflow prediction [63], especially in regions with intricate hydrological processes and limited data availability [64,68]. Cho et al. (2019) successfully applied RFML predicting hydrographs in snowmelt-driven mountainous watersheds [69]. RFML has become popular in the remote sensing and hydrology communities due to its higher accuracy in streamflow predictions and flood risk management, which have traditionally been challenging with traditional approaches [69].

Data Description
The study used a 30 m × 30 m resolution digital elevation model (DEM) obtained from the Shuttle Radar Topography Mission (SRTM) of the United States Geological Survey (USGS) EarthExplorer and extracted it using QGIS 3.16 [70]. The watershed was delineated using the DEM and the operational USGS gauging station (Lat 37°41′19.0″, Lon 106°27′35.5″, NAD83) at Del Norte (08220000) (Appendix A). Monthly time-step data for selected predictors (i.e., minimum temperature, snow depth, precipitation, soil moisture, and sublimation) and response (i.e., streamflow) variables were gathered from January 1991 to December 2016 from various sources summarized in Table 1.  [76] To prepare the remote sensing data for analysis, we first disaggregated the original monthly raster cells to a resolution of 30 m × 30 m. Then, we clipped the resulting cells to the sub-watershed border. Subsequently, the sub-watershed responses were derived as the average of the disaggregated pixels, determining the monthly mean values of the variables of interest. All data processing and analysis were conducted using RStudio [77]. We use the raster packages' disaggregate () function that employs bilinear interpolation in raster resampling [78].
We extracted the randomForest package [79] to develop the RFR model. The target variable and input features are defined in the training set. The general process primarily splits the data into training and validation sets through bootstrapping and out-of-bag (OOB) sampling. The train () function is used for ML tasks, i.e., bootstrapping and out-ofbag (OOB) for cross-validation, reducing bias and overfitting. For optimizing the performance on the validation set, a grid or random search is performed over a range of values for the hyperparameters ntree (trees), mtry (variables randomly selected at each split), and maxnodes (maximum terminal nodes in each tree). We set the root mean square error (RMSE) as the training control object for a 10-fold cross-validation (method = "cv," number = 10); the model adjusts its parameters during training to minimize the RMSE, leading to more accurate predictions of the validation set. The 10-fold cross-validation process involves dividing the data into 10 subsets, training the model on nine subsets, and evaluating its performance on the remaining subset. The process is repeated 10 times, using a different subset as the validation set. The caret package used automatically selects the best combination of the hyperparameters based on the minimum RMSE value. We applied ridge regularization during model training to address overfitting concerns and ensure the model's robustness. Ridge regularization adds a penalty term to the loss function, which helps control model complexity and prevents overfitting [80]. The model returned by the train () function is trained on the entire dataset and, based on the average performance of the model on all the resampled data, provides a more robust estimate of the model's performance [64]. Finally, the trained model makes predictions on a separate validation set, and we evaluated its capacity to reproduce flow data on the validation set. Figure 2 presents a flowchart for this RF model connecting predictor and target variables. RFML addresses prediction uncertainty through bootstrapping to generate an ensemble of decision trees, which involves randomly sampling the dataset, allowing each decision tree to be trained on a different subset of the original data and estimating the variance of predictions across the trees. The OOB error estimate further evaluates the model's performance on the test data set, addressing prediction uncertainty. The default OOB sample size is one-third of the original dataset proposed by Bradley Efron [81]. However, the optimal proportion may vary depending on the dataset size and complexity. Therefore, experimenting with different proportions is crucial to determine the best fit for each situation.

Analytical Procedure
We implemented a customized approach to the RFR model for long-term streamflow prediction, assessing the model's performance for three training periods. Adapting the customized approach involves manually splitting the dataset into different periods for the training dataset and holding out a validation dataset not used in the model training. The predict () function in the code utilizes the trained model and the input variables' data to predict streamflow on the validation dataset. We evaluated the trained model's performance on a separate validation dataset not used during training or cross-validation. Three different training/validation data ratios were considered: (1) a 62.5-37.5% split (i.e. We assessed the impact of the length of the training data on the model's prediction accuracy by varying the training and validation data ratio and evaluating their performance using performance metrics. Using segregated data splits allows the model's ability to generalize for unseen future data to be investigated, which is crucial for hydrologic predictions. It also provides valuable insights into the model's adaptability to different climatic conditions and land use patterns, affecting streamflow. In other words, significant for model development and validation, we identified the impact of the training period's length on the hydrologic prediction using RFR. To the authors' knowledge, this analytical approach for exploring RFR in hydrographic prediction has yet to be documented in the literature.

Variable Importance
RF determines predictor variable importance by measuring the decrease in the impurity of the target variable when a particular predictor variable is included in the model. The IncNodePurity represents the total decrease in node impurity, weighted by the probability of reaching that node, averaged over all decision trees in the ensemble. A higher value of IncNodePurity indicates that the variable is more important for the model. The importance () function in the randomForest package computes the importance scores. The algorithm calculates the importance score for each predictor variable and averages these values over all trees to measure variable importance [66,79,82].

SWAT Hydrologic Model
SWAT is a well-known process-based hydrological model for simulating hydrologic cycles, sediment yield, water quality, and quantity in watersheds, developed by Texas A&M AgriLife Research and the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) [7,34]. The SWAT model incorporates both physically based and empirical elements, such as the curve number (CN) method for estimating runoff and requiring calibration for parameter adjustment [33,83]. The classification of SWAT varies among experts, with some considering it a physically based model with empirical components and others categorizing it as semi-physically based [84]. Primarily developed for large agricultural watersheds, SWAT is suitable for assessing the impact of long-term land use, management practices, and climate change on ungagged study basins [36,85]. SWAT requires input variables such as DEM, LULC, soil map, and weather data [86] and is a suitable hydrologic model for long-term simulations [4,87].
SWAT divides watersheds into sub-basins and hydrological response units (HRUs) based on land use, lithology, and slope. This creates a more accurate model, with different HRUs for each sub-basin. The hydrological cycle is simulated using the general water balance equation [85,88]. The SWAT water balance equation calculates the final water content (SWt) of a watershed based on the initial water content (SW0), precipitation (Rd), surface runoff (Qsur), evapotranspiration (Ea), unsaturated zone water accumulation (Wseed), and return flow (Qqw). The water balance equation is as follows: It represents the balance between water cycle components in a watershed or catchment. The SWAT water balance equation helps understand the hydrological processes in a watershed and is useful for evaluating the impacts of land use and climate change on water resources [85,88,89].
The study implemented the general process of SWAT 2012 hydrologic modeling, including data collection, hydrologic model setup, output data calibration, sensitivity analysis, and validation [33]. First, input data were collected, including climatic and hydrologic parameters; a hydrologic model was then set up to convert the input parameters (rainfall, temperature, relative humidity, wind speed, and solar radiation) into runoff or flow. Next, the simulated flow was calibrated with observed flow through selected parameters. Finally, a sensitivity analysis was conducted to rank the parameters' sensitivity, and the model was validated in the end. The study calibrated monthly and annual mean simulated flow (a warming period in 2001) with observed flow data from 2002 to 2010 and validated it from 2011 to 2015.

Input Data
The input data-topography, land use, soil, meteorology, and hydrography datawere collected from various sources/agencies; the data and the corresponding sources are given in Table 2. The observed flow was converted to cubic meters per second to match the SWAT model's flow output unit [88]. Similarly, the raster DEM, soil, and land use map were converted into a common NAD 1983 UTM Zone 13N coordinate system and resampled land use and soil raster to a 30 m × 30 m resolution to prepare for analysis in QSWAT. QSWAT, a QGIS plugin, is used to delineate the watershed (Appendix A), calculate hydrologic responses, and visualize SWAT outputs [70]. Table 3 summarizes the land use and soil information extracted for the watershed using land use land cover and soil data layers. Precipitation and temperature data were downloaded from the CFSR Global Weather database for the study watershed and re-formatted for the SWAT model. Other variables, such as wind speed, evapotranspiration, relative humidity, and solar radiation, were generated through the SWAT weather generator [88,89].

Calculation of Runoff Volume
The model used the SCS-CN (Soil Conservation Service-Curve Number) method, commonly used for estimating the runoff generated from a rainfall event in a particular area, preferably in suburban or rural areas [94]. The method is designed for a single storm event but can be scaled to find average annual runoff values. The curve number is based on the area's hydrologic soil group, land use, treatment, and hydrologic condition for a particular watershed, where hydrologic characteristics of soil and rainfall volume are known. The entire runoff hydrography can be produced as an outcome. When specific information on antecedent conditions is unavailable, the SCS-CN method is widely used to estimate precipitation [95]. Typically, the SCS model computes direct runoff with the help of the following relationship: where CN = weighted curve number, CN = curve number from 30 to 100, and N. A = area with curve number CNi. A is the total area of the watershed. At the same time, CN is the runoff curve based on hydrologic soil cover, a function of soil type, land cover, and antecedent moisture condition (AMC). Q is an actual runoff in mm, P is total rainfall in mm, and S is the potential maximum water retention by the soil in mm [96]. Based on the capacity of the soil for infiltration, soils are divided into four categories: A, B, C, and D, representing a strong infiltration capacity, a fairly high infiltration capacity, a moderate infiltration capacity, and a low infiltration capacity, respectively [97].

Model Calibration, Validation, and Performance Evaluation
The SUFI-2 algorithm provided by SWAT-CUP automatically calibrates the model parameters [83]. The simulation period is divided into warming-up (2001) Root mean standard deviation ratio (RSR) Percent bias (PBIAS%) ∑ ∑ 100 (8) Where Qobs is the observed streamflow, and Qsim is the simulated streamflow. The index of agreement (d) is a metric used to measure the agreement between observed and simulated values, ranging from 0 (no agreement) to 1 (perfect agreement) [98]. Although R-squared (coefficient of determination) is commonly used for performance evaluation in hydrological modeling, it has limitations, such as oversensitivity to high extreme values and insensitivity to additive and proportional differences between simulated and observed data [99]. The Nash-Sutcliffe efficiency (NSE) is a normalized statistic that estimates the relative magnitude of residual variance and measures how well the observed versus simulated plot fits on the 1:1 line [100,101]. Finally, the PBIAS measures the central tendency of the simulated data and indicates whether the model's performance is poor by identifying the inclination to be greater or smaller than the observed data [99]. Table 5 presents the objective function results, their statistical ranges, and optimal values. The SWAT model outputs were calibrated using a multi-objective approach for the selected parameters to achieve goodness-of-fit between observed and simulated flows; The sensitivity of parameters was determined to assess their impact on streamflow by regressing them against the average of the objective function values.

Sensitivity Analysis
Sensitivity analysis aims to determine the optimal range of parameters and rank their sensitivity [83]. The study performed a global sensitivity analysis using Latin hypercube sampling to determine the optimal parameter range and rank sensitivity. It is usually used in Monte Carlo simulation; it significantly reduces the number of runs necessary to achieve a reasonable outcome [102]. The t-stat provides a measure of sensitivity identifying relative significance, and the p-value determines the significance of the analysis (a value close to zero is more significant). A larger absolute value of the t-stat and a smaller p-value indicates a more sensitive parameter [83].

SWAT Model Performance Assessment
The study evaluated the performance of SWAT by comparing simulated monthly and yearly average flows with observed data using the selected objective functions. The results show that the monthly flow simulation needed more adjustments for better prediction accuracy, whereas the annual simulation was fairly acceptable.
We selected 22 parameters for streamflow calibration based on relevant literature of comparable study areas [36,37,103]. The original ranges and fitted values of these parameters are provided in Table 6. The primary ranges of parameters for monthly and annual simulations were the same; we also listed the fitted values in Table 6 and depicted the flow hydrographs in Figures 3 and 4. Once the calibrated parameters were validated, the simulated and observed flows were illustrated and compared for both the calibration and the validation periods, as shown in Figure 3. The simulation results showed a slight left shift in most flow peaks compared to the observed ones, implying that the simulated peak flows occurred a month or two earlier than the observed peak. This discrepancy suggests a timing issue or error between the observed and simulated flow [83,103], indicating that the simulated flow peak occurred slightly earlier, ranging from one month to two earlier than the observed flow peak. Figure  4 presents a comparison between simulated yearly mean discharge and observed discharge. The SWAT model's annual simulation results provided better accuracy (Table 7) with observed flows; however, a model overestimation (Figure 4) still appeared during calibration and validation periods. The goodness-of-fit metrics, presented in Table 7, provide a means of evaluating model performance and comparing the accuracy of yearly and monthly flow simulations.  Table 7 shows the goodness-of-fit metrics for the observed vs. simulated streamflow during the calibration and validation periods. The yearly simulation approach performed better, with higher R 2 , NS, and d values and a lower RSR value than the monthly simulation. On the other hand, the performance of the monthly simulation, with R 2 , NS, and d, was below satisfactory levels. The annual simulation was acceptable, with satisfactory R 2 and d at both the calibration and validation stages. However, monthly and yearly simulations overestimated the observed flow during calibration and validation but within acceptable limits (± 25) of PBIAS%. Despite being within the acceptable limit of ± 25, this implies that there might be some uncertainties in the input data or the model structure, which need to be further investigated and addressed to enhance the model's performance.

Calibration Validation
Therefore, during calibration, the sensitive parameters were identified through global sensitivity analysis. The analysis revealed that HRU SLP.hru (average slope steepness) was the most sensitive for monthly streamflow, followed by EPCO.bsn, Al-pha_BF.gw, SMFMX.bsn, and CH_K2, in that order of sensitivity (Appendix B), and these parameters significantly impacted the model output. Therefore, improving these parameters' measurement or estimation accuracy could enhance the model performance and the overall understanding of the watershed hydrological response.

RFML Model Performance Assessment
The study implemented the 10-fold cross-validation approach, applied ridge regularization, and presented metrics for the cross-validation and validation datasets to comprehensively evaluate the model's performance and its potential for overfitting. The RFML model was developed for three training periods to predict and validate 2011-2016 data. The models' performances were assessed based on the same evaluation metrics-d, R2, PBIAS%, NSE, and RSR-as shown in Table 8. The results indicate that the model performed well on the validation dataset, and consistency between cross-validated metrics and validation metrics suggests that the model generalized reasonably well to new, unseen data. The RFML model performed well in predicting the validation set in all three training periods, with a good R 2 (0.79-0.81) and NSE (0.791-0.846), indicating a good fit between observed and predicted values. The PBIAS% values ranged from 1.079% to 4.982%, indicating a slight underestimation of the flow. The ranges (0.918-0.956) of d values suggest that the model's predictions agreed with the observed data. The RSR values were low (0.487-0.394), implying that the model's predictions had low uncertainty. The overall results suggest that the RFML models performed well during the cross-validation and testing/validation stages and accurately predicted streamflow. The results also indicate that increasing the training period length improved the RFML model's performance. Although the improvement was not significant, the significant part was the improving nature, consistent with the increased training period, which indicates that the model became more capable of capturing variability and delivering better results with a more extended training period. The model trained in the most extended period (1991-2010) had the highest values of d, R2, and NSE and the lowest value of RSR among all three training periods, indicating that the model trained on the longest period was better able to capture the variability in streamflow data and produce more accurate predictions.

Variable Importance Assessment
This relative importance (IncNodePurity) ranking was significant to the study; it allowed for prioritizing critical predictors and understanding their relative contributions to the model's predictions. We analyzed the variable importance measure to identify key predictors that became significant during specific training periods, which allowed us to understand how the predictor rankings evolved and adapted to changing hydrological conditions, providing valuable insights into influential variables under different climatic and environmental settings. Incorporating the variable importance measure enhanced the interpretability of the models, making it valuable for decision-making in hydrological applications.
The IncNodePurity ( Figure 5) suggested that mean_snowDepth was the most influential predictor of the streamflow of the watershed; however, it was not correlated (Appendix C) with streamflow. The IncNodePurity importance measure also revealed that mean_snowDepth and Tmean were consistently the top predictors across all training periods, whereas Tmean also had a good correlation with streamflow. Figure 5 shows each training period's RF variable importance (i.e., IncNodePurity). Mean_ppt was weakly but interestingly negatively correlated with streamflow in the correlation table (Appendix C), which may require further investigation. The correlation between precipitation and streamflow can be negative in watersheds with high evapotranspiration rates or soils with high infiltration capacity. More rain can lead to less water in rivers and streams, negatively correlating precipitation and streamflow. In addition, a negative correlation can occur if precipitation falls as rain, causing early snowmelt or not accumulating as much. The relationship in mountainous basins dominated by snowmelt is complex and depends on temperature, snow accumulation, and snowmelt timing. Mean_soilMoisture and mean_ppt importance varied for different training periods and mean_sublimation was consistently the least important, according to IncNodePurity. Soil moisture and snow depth were also strongly correlated, whereas snow depth was the most important predictor variable. Therefore, we plotted the snow depth with the predicted and observed streamflow in Figure 6.  Figure 6 shows the relationship between snow depth and the observed/predicted streamflow. The peak flow occurred when the snow depth was diminishing, indicating that snowmelt is a significant factor in determining the timing and magnitude of the peak flow. This relationship is essential to the hydrological cycle in snow-dominated regions and can inform water resource management decisions. It is important to note that the peak snow depth/diminishing varied by a couple of months and did not consistently occur at the same time each year, influencing the streamflow and the peak flow timing in the basin.

Discussion
The SWAT monthly simulation exhibited a leftward shift in the flow peak compared to the observed data, indicating a timing error. Snowmelt timing likely played a crucial role in this case. The degree-day approach in the SWAT model assumes uniform snowmelt across the watershed [21,89], which may not be the case in some areas of this large mountainous watershed for topographic and meteorological factors. This approach defines snowmelt as happening when the temperature on a given day is higher than the melting temperature of snow [89]. However, this approach only takes the daily average temperature and does not consider the total amount of heat accumulating over time [21]. SWAT considers 0 °C a linear function of the difference between the average snowpackmaximum air temperature and the base or threshold temperature for snowmelt [89], but the dust in snow of the watershed can also change the mechanism, which can allow more solar radiation into the snowpack [104,105]. Absorbing more solar energy can enhance sublimation and affect snowmelt timing by accelerating the snowmelt rate [105,106]. The accurate data or prediction of these snowmelt characteristics is critical for the model prediction for this area (Appendix Figure A7). Many researchers, therefore, use remote sensing data or develop particular routines (i.e., a temperature index-based approach within SWAT) to capture spatial variability in snow accumulation and melting [21,[35][36][37]43]. For example, Debele et al. (2009) compared two methods for simulating snowmelt processesa physically based energy budget model and a simpler temperature index model within SWAT, and they found that the simpler temperature index model was sufficient for most practical applications and that the inclusion of ground surface slope and aspect could improve results [35]; this approach can be adopted to improve monthly simulation, as accurately representing slope steepness is also crucial in this watershed in simulating hydrologic response due to the sensitivity of the SLP.hru parameter.

SnowDepth Observed Streamflow Simulated StreamFlow
The HRU SLP.hru parameter is the most sensitive parameter for the watershed, and accurately representing slope steepness is crucial for capturing hydrological behavior. Incorporating more detailed topographic information for calibrating the SLP.hru parameter in a large watershed like the RGH may be challenging but necessary to improve model performance. Improving accuracy in estimating slope steepness using DEMs or remote sensing techniques can be utilized. Accurate data or prediction of these factors is critical for model accuracy, as slight changes can significantly change the model prediction.
The SWAT model warrants more time and data to set up and calibrate; still, a viable model delivering acceptable performance in simulating annual streamflow simulation with satisfactory d, R 2 (Appendix D), and PBIAS% values but monthly simulation warrants further exploration and model adjustments. In contrast, the RFML model showed good performance with a simplified procedure in predicting monthly flow using remotely sensed data and some predictor variables, as indicated by the high values of R 2 and NSE, which measure the model's accuracy in capturing the observed streamflow variability. Although the model tended to underestimate streamflow values (positive PBIAS%), it still predicted streamflow well during the validation period with different training lengths. A more extended training period enhanced the RFML model's performance, allowing it to capture more complex relationships between input and output variables, resulting in better predictions.
IncNodePurity of RFML determined the importance of snow depth as a predictor, which is intriguing despite not correlating with streamflow. The feature importance of RFR is typically calculated based on how much a particular feature contributes to reducing the impurity in the nodes of the decision trees. The RFML model can capture complex non-linear relationships between snow depth and streamflow, which traditional correlation analysis may not reveal. The peak flow observed after a month or two of diminishing snow depth, as shown in Figure 5, indicates that the variability in snowmelt rate posed challenges in establishing a direct correlation. Other factors like climate change and dust may contribute to this complexity, affecting the snowmelt rate. Here, the snowmelt (Appendix Figure A8) rate would be more likely to correlate with streamflow rather than snow depth. Although the IncNodePurity and global sensitivity analysis employ different methodologies, they both indicate the importance of snow parameters, with the sensitivity analysis identifying the SMFMX.bsn (melt factor) as one of the top sensitive parameters.

Scope of Future Study
We evaluated the sensitive parameter ranking of the SWAT model, which highlighted mountain slope (HRU SLP.hru, average slope steepness) and snowmelt factor (SMFMX.bsn) as essential parameters for model sensitivity. On the other hand, snow depth and min temperature consistently emerged as the top predictor variables from RFML variable importance analysis across different periods. Interestingly, altitude, min temperature, snow depth, and snowmelt rate are interconnected and mutually inclusive, influencing the streamflow dynamics, which provides a significant insight into the watershed response. The relative importance ranking, combined with the sensitive parameter analysis from SWAT, enhances understanding of the watershed's behavior, shedding light on the contribution of identifying critical parameters. Enhancing these parameters' estimation accuracy could significantly improve model performance and deepen understanding of watershed hydrological response, laying a solid foundation for future research.
The study focused on a watershed with specific characteristics critical for the entire basin; future studies can incorporate a broader dataset with multiple heterogeneous watersheds, allowing for a more comprehensive analysis of watershed behavior. Such an approach can enable a deeper understanding of the relative influence of parameters, enhancing its analysis by quantifying the impact of predictors on the model's error and incorporating lag effects in the analysis, which would provide a better understanding of temporal relationships between variables, such as the influence of winter soil moisture on summer flow.

Conclusions
The study assessed SWAT performance in a semi-arid snowmelt-driven region of the RGH, calibrating monthly and annual average streamflow from 2002 to 2010 with a oneyear (2001) warm-up period and validating from 2011 to 2015. The SWAT model produced acceptable results for the annual average streamflow simulation; however, more exploration and adjustments were required for the monthly simulation. The study recommends exploring snowmelt runoff hydrologic processes, dust-driven sublimation effects, and more detailed topographic input parameters to enhance the model for better flow estimation. This study serves as a reference, documenting a primary study with SWAT for the study area and identifying challenges that require the development of additional routines for model enhancement, foundational for future modeling efforts.
A critical aspect of the study was the utilization of remotely sensed datasets, such as SRTM, PRISM, and GES DISC, to generate derived datasets, which played a significant role in the investigation and provided valuable inputs for modeling efforts. The remote sensing data used in this study were explicitly derived for monthly response and subsequently utilized in RFML to explore reproducing streamflow data. As a result, the research presented an exclusive approach for deriving remote sensing data and using it in RFML, demonstrating the usefulness of the process for watershed modeling and monitoring.
Every hydrological model has specific applications and distinct limitations associated with uncertainties [9,86]. These uncertainties can be addressed by comparing different models for a given geographic region [2,19,28]. The study compared the effectiveness of the SWAT and the RFML models and provided insights into the strengths and limitations of their simulating streamflow in a snowmelt-dominated watershed. For example, SWAT can simulate flow in watersheds with no streamflow data and can be valuable for understanding underlying hydrologic processes. However, it takes large input data, including land use, soil type, topography, weather data, and other parameters influencing the hydrologic process; consequently, the model requires substantial time and effort. Therefore, it was crucial to pursue a more efficient alternative to reproduce streamflow closest to reality with as simple an approach as possible (i.e., the principle of parsimony). The RFML model demonstrated higher prediction accuracy than the SWAT model, utilizing a simplified procedure, even without requiring parameter tuning or predictor reduction.
Process-based hydrologic models can provide valuable insight into hydrologic processes and model parameters; however, simulating the process warrants much more information and time. A simpler machine learning model may be more suitable when the main goal is to predict or reproduce streamflow to support watershed management with limited information. Based on validation data metrics, RFML appears to be a desirable alternative or a complementary tool to process-based models. The RFML model outperformed the SWAT model in accuracy by utilizing remote sensing data of the hydroclimatic predictor variables of the snowmelt-driven watershed, where snowmelt runoff modeling is particularly challenging. This also opens up new scopes for research in hydrographic predictions.
Simulating hydrologic responses is crucial for operational hydrology and water resource management, especially in data-scarce regions. Although both models are useful, their performance can be improved by reducing uncertainty and enhancing the estimation accuracy of the critical parameters. Furthermore, integrating semi-distributed (i.e., SWAT) and ML approaches in predicting hydrologic responses can enhance monitoring and man-agement efforts. The results provide a critical analysis for enhancing the streamflow prediction needed for monitoring and managing water resources, including snowmelt-led semi-arid regions. In addition, the findings can aid in modeling hydrologic events such as climate change, land use change, floods, and droughts, leading to improved water security, ecosystem health, and sustainability.   PRECIPmm