Multi-Gene Genetic Programming Regression Model for Prediction of Transient Storage Model Parameters in Natural Rivers

: A Transient Storage Model (TSM) which considers the storage exchange process that induces an abnormal mixing phenomenon has been widely used to analyze solute transport in natural rivers. The primary step in applying TSM is calibration of four key parameters: ﬂow zone dispersion coefﬁcient ( K f ), main ﬂow zone area ( A f ), storage zone area ( A s ), and storage exchange rate ( α ); by ﬁtting the measured Breakthrough Curves (BTCs). In this study, to overcome the costly tracer tests that are necessary for parameter calibration, two dimensionless empirical models were derived, to estimate TSM parameters, using multi-gene genetic programming (MGGP) and principal components regression (PCR). A total of 128 datasets with complete variables from 14 published papers were chosen from an extensive meta-analysis and were applied to derivations. The performance comparison revealed that the MGGP-based equations yielded the superior prediction results. According to TSM analysis of ﬁeld experiment data from Cheongmi Creek, South Korea, although all assessed empirical equations produced acceptable BTCs in monotonous reach, the MGGP model was superior to the other models. The predicted BTCs obtained by the empirical models in some highly complicated reaches were biased due to misprediction of A s . Sensitivity analyses of MGGP models showed that the sinuosity is the most inﬂuential factor in K f , A s , and α , whlie A f is more sensitive to U / U ∗ . This study proves that the MGGP-based model can be used for economic TSM analysis, thus providing an alternative option to direct calibration and the inverse modeling initial parameters.


Introduction
Precise prediction of solute transport in natural streams is essential to management of water quality in rivers. To analyze the fate and transport process of solutes, tracer testing using a conservative or non-conservative tracer is a one straight forward method used in many solute transport studies. In order to analyze solute transport mechanisms with a tracer test, assessment and measurement of hydraulic and, geomorphic properties and breakthrough curves (BTCs) are necessary. The obtained BTCs are used for straight analysis and estimation of solute mixing parameters in mixing analysis models. However, in a natural river system, analysis of the fate of solutes requires quantification of the effects of transient storage zones, including factors such as bed material, pool-riffle, channel meander, artificial hydraulic structures, and aquatic vegetation ( Figure 1). In Figure 1, the red dashed lines indicate the hyporheic exchange, and solid navy lines indicate free flows through surface transient storages. These transient storage zones drastically influence flow structure and cause anomalous dispersion characteristics which cannot be simulated using a conventional one-dimensional advection dispersion equation model (1D-ADE) as shown in Figure 2. Thus, previous research has advocated models that account for the effect of transient storage rather than 1D-ADE in order to better represent skewed BTCs with long tails [1][2][3][4][5][6][7][8][9][10].  [11][12][13][14][15]).
In order to solve the anomalous mixing problem shown in Figure 2, many efforts have focused on the development of phenomenological models that simulate the BTCs observed in natural tracer tests [17]. Representative models that were developed over the past decades include the following: conventional Transient Storage Model (TSM) [10], multiple zone TSM [18], Fractional advection-dispersion equation model (FADE) [19], Modified advection-dispersion model (MADE) [20], Multirate Mass Transfer Model (MRMT) [21], Advective Storage Path Model (ASP) [22], Continuous Time Random Walk approach (CTRW) [23], Solute Transport in Rivers (STIR) [24], and Aggregated Dead Zone (ADZ) [25]. Those models are continuously updated in order to reflect real mixing phenomena. Among the phenomenological models that consider transient storage, TSM is the most widely used solute transport model. TSM imitates the complicated natural river mixing process due to transient storage by simplifying the perpendicular mass exchanges between two zones, a mobile zone (main flow zone) and an immobile zone (storage zone) as shown in Figure 3. TSM requires calibration of four key parameters, which are main flow zone dispersion coefficient (K f ), main flow zone area (A f ), storage zone area (A s ), and mass exchange rate (α), by inverse modeling in which best fit parameters can be found by matching simulated BTC to the measured curve. In particular, one-dimensional transport with inflow and storage (OTIS), which is a finite difference method numerical model, and its parameter estimator OTIS-P are common software tools used for TSM analysis [26]. However, many investigators have reported that the OTIS-P suffers from the equifinality problem that the calibrated TSM parameters may be local optima or unreasonable parameter sets [27,28]. Several investigators have attempted to overcome the local minima problem by adopting meta-heuristic optimization algorithms [16]. Even though the meta-heuristic optimization methods have an advantage over local search algorithms, there is a problem in that the estimated TSM parameter values differ given different curve similarity criteria [16]. From another point of view, many researchers have focused on identifying the properties that contribute to uncertainty in TSM parameter estimation [25,26,29,30]. For example, a study that took the TSM parameter combination point of view showed that the value of experimental Damkohler number (DaI), which is the ratio of solute advection and the storage effect, in a reasonable range [0. 5 10] has less uncertainty, so DaI is used as a reference value in reasonable TSM parameter estimation [29]. In order to overcome uncertainty, uncertainty software tools were developed using statistical approaches, such as Monte-Carlo analysis [28], and a generalized likelihood uncertainty estimation (GLUE) framework [30]. On the other hand, a recent study spotlighted the computational points of TSM, and proved that computational conditions such as grid size (dx) and computational time step (dt) affect the result of TSM parameter estimation since numerical models have numerical errors [31].
Overall, TSM parameter estimation using inverse modeling has a primary limitation in that it requires BTC measurement data and hydraulic data from tracer tests for every parameter estimation. For this reason, conventional parameter estimation methods are infeasible for mixing studies of large-scale rivers. For example, recent studies exploiting the solute transport models pay attention to the contaminant source identification problem [32][33][34]. However, such studies are conducted under the assumption that the mixing parameters are known since both 1D-ADE and TSM require calibration of parameters using inverse modeling, which is a high-cost method with a great deal of uncertainty. More specifically, users have to re-estimate TSM parameters when discharge or bed-form change, even at the same stations since hydraulic properties, which influence TSM parameters, have changed.
To overcome the drawbacks of parameter estimation with inverse modeling, empirical equations to calculate the properties of transient storage have been developed by analyzing their relationship with hydraulic features, such as velocity, shear stress, and shape-related factors in cross-section [4,10,[35][36][37][38][39][40][41]. Jackson et al. [15] comprehensively classified hydromorphic factors accounting for diverse surface transient storage (STS) factors that exert sudden changes in flow structure, apart from hyporheic transient storage (HTS). An additional accessible feature that explains STS in that article is a morphological feature, channel sinuosity (S n ). Similarly, most of empirical equations used to estimate hte 1D-ADE dispersion coefficient K ADE are functions of cross-sectional width (W), mean hydraulic depth (h), mean velocity (U), and shear velocity (U * ) [42][43][44][45][46][47][48][49]. Early works that took a regression approach to TSM parameter estimation focused on relating hydraulic properties using only storage area ratio = A s /A f . Recent studies produced extra parameters accounting for transient storage zone, such as K f , the storage residence time T sto = /α, and A s [37][38][39][40].
Plus, a few studies [38,39] considered K ADE as an influential factor accounting for transient storage, but K ADE cannot be obtained unless BTCs are available. More recently, a set of empirical equations for OTIS-based TSM parameters, K f , A s , and α, were proposed. Femeena et al. [40] carried out a meta-analysis of various published studies on river mixing tracer tests, and they included 1D-ADE dispersion coefficient, K ADE , values in the derivation of the empirical equation for the main flow zone dispersion coefficient, K f . It has been reported that K ADE has a larger value than K f for the same BTCs since TSM deforms BTCs by the composite effect of the four TSM parameters while 1D-ADE deforms BTCs with only K ADE [37]. The equations are based on non-linear regression analysis and finding equation forms by trial and error; equation forms uncovered by this approach can be restricted by the researcher's intuition, so it is difficult to find hidden non-linear relationships. Even though previous equations are valuable since they do not require much information for estimation of the three TSM parameters, there have been no efforts to identify the main flow zone area, A f , using the empirical model since many investigators regard it is a deterministic value once hydraulic properties are measured in the tracer test. Thus, a complete model with the four key transient storage parameters (K f , A f , A s , and α) does not yet exist.
As data-driven approaches emerge, recently developed models have employed machine learning methods (e.g., support vector machine (SVM), artificial neural networks (ANNs), and symbolic regression techniques). Symbolic regression techniques have the advantage of producing explicit forms of equations, while the other machine learning methods produce implicit results. Taking an example of the water resource problem, multi-gene genetic programming (MGGP) was successfully utilized in daily streamflow prediction [50], and prediction of K ADE [41]. In accordance with the comparison study, genetic programming (GP) showed superior results compared to SVM and particle swarm model selection [51], whereas SVMs and ANNs have the potential for over-fitting and challenge of kernel parameter tuning results [52].
This study has three main objectives. The first aim of this study was to develop new dimensionless empirical equations for the complete set of TSM parameters, K f , A f , A s , and α, that reflect the effects of both hydraulic and morphological features, using MGGP, which has been proven to be preferable in representing the non-linear behavior of the water quality variables as well as streamflow. In this study, we suggest new expressions for A f in the analysis since the main idea of TSM implies that A f can vary when other TSM parameters change, even though past efforts did not present an empirical equation for the main flow zone area A f . The second objective was to investigate the applicability of the proposed equations for a tracer injection experiment in a natural river, Cheongmi Creek, Korea. Consequently, one-at a time (OAT) sensitivity analysis was conducted for each empirical equation of the MGGP model in order to identify the major hydromorphic property of each TSM parameter. Thus, the objectives of this study were to present new alternative options for the TSM parameter estimation and to analyze the TSM parameters in terms of hydromorphic properties that are rather easily measured.

Transient Storage Model
The fundamental idea of a TSM is to simplify complex natural river into a free flow zone and an immobile storage zone, as illustrated in Figure 3. After the first introduction of a TSM, the executable program OTIS was developed; it additionally considers lateral inflow and reactive solutes [26], and it is far the most popular TSM implementation [17]. TSM consists of a main flow zone equation (Equation (1)) and a storage zone equation (Equation (2)), where solutes are exchanged perpendicularly between the two zones.
where Remarks for the TSM One-dimensional solute transport analysis is successfully applicable under the following three conditions: (1) statistically steady flow field; (2) constant cross-sectional area; (3) complete mixing over the cross-section [8]. However, even if the cross-sectional mixing is completed, the 1D-ADE simulation cannot be correct in complicated geometry, where skewed BTCs with a long tail are observed [45,53]. Nevertheless, if the characteristic length scale for variation is much larger than the channel variation, the longitudinal dispersion process in Equation (1) can be reasonable [53]. Complicated geometry accompanies storage zones where the solute is trapped. The streamflow area can be divided into the main flow zone and storage zone, and concentration difference induces the mass exchange mechanism between the two differed areas [54]. The TSM expresses the linear exchange process regarding the mass exchange coefficient and the storage area ratio, adopting the gradient induced mass exchange assumption (Equation (2)).
Since the TSM regards the cross-sectional area as a summation of the main flow zone and storage zone areas, it compels the assumption of instantaneous and uniform distribution of solute also in the storage zone. However, it is not easy to explain the simultaneous occurring of complex solute transport mechanisms (i.e., uniform distribution in the two cross-sectional areas, mass exchange, and no flow between adjacent storage areas) using this linear kinetic [10]. Despite, that a solute pulse is transiently trapped and release in storages in mountain streams. Accordingly, Equation (2), which models distributed transient storage zones along a flow path, is plausible [10].
As the coupled part of Equations (1) and (2) is replaced with residence time distribution (RTD), which determines how the tail of BTC decays, the TSM can be seen as the exponential distribution RTD model [22]. The exponential RTD is appropriate only under the well-mixed condition in both the main flow and storage zones [55]. Hence, the exponential RTD model underestimates the hyporheic exchange effect [56]. Experimental studies have proved that the power-law RTD fits better than the exponential RTD model, such as the TSM, as HTS contributes more [17,21,[56][57][58][59][60][61].

Multi-Gene Genetic Programming
As previously mentioned, to analyze pollutant mixing in rivers using TSM, the parameters of the Equations (1) and (2) need to be estimated by either inverse modeling or empirical equations. In this study, new empirical equations for the set of TSM parameters were proposed using a multi-gene genetic programming (MGGP) model.
Genetic programming (GP) is a specialized genetic algorithm (GA) which is an evolutionary technique that mimics natural evolutionary processes such as mutation and crossover. The main difference between GA and GP is that GP evolves tree-based data structures while the GA evolves numeric vectors.
GP attempts to find the model with the optimal fit by constructing and modifying trees consisting of functions and variables. First, a random population of an individual gene is generated. Once the population of genes is randomly generated and fitness function values are evaluated, each gene is modified based on the principles of natural evolution with mutations and crossovers, thus producing offspring. The mutation process picks branches, along with sub-nodes, and replaces each bunch with a randomly generated subtree as depicted in Figure 4. For the crossover operation, terminals or branched nodes of parent trees are randomly selected, and the selected points are exchanged as depicted in Figure 5. Those two operations are applied to models with low fitness after the fitness value of each function is evaluated. This evolution step is iterated until the termination criterion is met, enhancing the fitness of the models produced from GP. MGGP is a scaled symbolic regression method that is an advanced version of standard GP. The MGGP model is a linear weighted combination model consisting of individual gene-trees. The MGGP uses one or more gene-trees and calibrates coefficients of the genetrees using statistical regression methods such as least-square regression. A typical example of MGGP model is shown in Figure 6, in which b i are the coefficients of the gene-trees.
As in standard GP, which iteratively reproduces new models using crossover and mutation procedures, the MGGP algorithm contains crossovers and mutations as well. The evolutionary processes in MGGP are so-called high-level crossovers and mutations, whereas those of standard GP are called low-level processes. As a result of the operations done in MGGP, MGGP produces a bunch of equations, which are linear combinations of non-linear terms, without any pre-specified functional structure. In addition, the frequencies of variables in the obtained formulae reflect the relative importance of the variables. Subsequently, the MGGP approach offers more opportunity to catch nonlinearity associated with the phenomenon than does finding formula structures using the trial-and-error method.

Dimensional Analysis and Data Collection
In order to develop generalized equations, dimensional analysis was performed based on Buckingham's Pi theorem. As previously mentioned, past studies asserted that the hydromorphic properties contributing to the transient storage effect are morphological factors (such as channel meandering and cross-sectional shape), hydraulic conditions, and bed shear stress [13,15]. For example, Tonia and Buffington [13] explained that morphological parameters (e.g., channel meandering, pools, and riffle sequences) mainly drive the HTS exchange process. In addition, we summarized the relevant factors, which were easily obtained and the generally applcable properties, considering the classification of Jackson et al. [15], for STS, as in the following equation.
where ; S n is the channel sinuosity. According to [38,39], Froude number Fr and Reynolds number Re did not show discerning relationships with the TSM parameters. Femeena et al. [40] excluded bed slope due to measurement uncertainty. Nevertheless, slope-based shear velocity and sinuosity were included as input variables in order to take all available features into account. Consequently, the functional relationships between dimensionless variables can be summarized as Basically, a meta-analysis was performed in order to assemble sufficiently large TSM parameter values. OTIS-based TSM parameters were taken into consideration in the derivation of empirical equations. In particular, complete data sets, with the four TSM parameters (K f , A f , A s , and α) and hydromorphic properties (cross section, channel sinuosity, bed slope, and so on), were taken into account in the analysis. In the assembled data set, shear velcotiy, U * , was calculated using bed slope (Equation (5)).
Among 700 collections of meta-data, 128 datasets from 14 published papers [38,[62][63][64][65][66][67][68][69][70][71][72][73][74] were adopted for this study by selecting data which provided all variables of the Equation (4) and satisfied U/U * > 1. The meta-data list is available in the supplementary material. Noh et al. [16] assessed the applicability of optimization techniques for parameter estimation of the four TSM parameters. Using the parameter estimation setting, we exploited the meta-heuristic optimization method SC-SAHEL (Shuffled Complex-Self Adaptive Hybrid EvoLution) with a mean squared error (MSE), meta-data includes unpublished parameter estimation values from a tracer test conducted in Gam-Creek, Korea (Appendix A). A detailed description of the SC-SAHEL optimization algorithm is found in [75].
Even though the TSM parameters estimated by Cheong et al. [38] were based on the analytical solution of Hart [55], not OTIS, the estimated parameter values were included in the derivation under the assumption that the estimated parameter values would produce the same BTCs. Different expressions of the TSM parameters can be transformed using the following equations where is the storage zone ratio [-]; and T is the residence time [T].
The chosen TSM parameter data were randomly divided into a training set (90), and a test set (38) in order to derive new equations. Table 1 presents simple statistics (minimum, maximum, and mean) for the dimensionless variables of the training set and the test set. In addition to the table, the simple statistics are shown as the boxplots in Figure 7. As shown in this figure, the total data range of the training set embraces the data range of the test set.  In order to consider the multicollinearity between dimensionless hydromorphic variables, variance inflation factor (VIF) values were calculated. The VIF is one of the criteria necessary for the evaluation of linear dependency between input variables in the regression analysis. It can be calculated with the following equation.
where R 2 i is the calculated coefficient of variance for regression of the ith input variable on the other input variables. Generally, if VIF is greater than 10, then multicollinearity is said to be high. The calculated VIF values for the input variables W/h, U/U * , and S n are 1.002, 1.005, and 1.005, respectively. Thus, there is no significant multicollinearity problem in the regression.

Formulation by MGGP
New MGGP expressions for the TSM parameters were derived using GPTIPS, which is the Matlab library for MGGP [76]. MGGP evolves function terminals of the equations by following a specified function set. The four basic operators and power-based operators (power, square, cube, exp, and tanh) were nominated in the function terminal of MGGP. On the other hand, GPTIPS presents Pareto front models under two objective functions (model complexity and fitness) since over-fitting is a concern with high complexity models. In the same manner, MGGP was performed 200 times in a sequence of 500 generations for a population size of 500 for genetic operations in order to produce various forms of the Pareto front results. The maximum number of genes and the tree depth are directly related to the complexity of the produced equations since MGGP evolves in every iteration. Thus, in this study, the maximum depth and maximum number of genes were set to 6 and 4, respectively. The other hyperparameters, elitism, crossover relative parameters, and mutation parameters indicate the probability of a genetic operator's activation in each generation, and those parameter values were chosen based on the previous study. Table 2 summarizes the hyperparameter setting for MGGP in the present study.
The Pareto equations for each TSM parameter are given as follows.

Formulation by PCR-Based Regression
In order to test the empirical equations of the MGGP model, PCR-based regression equations were additionally derived using Matlab's LIBrary for Robust Analysis (LI-BRA) [77]. PCR is a classical option for multiple linear regression, as it is robust even with correlated data and it makes it easy to interpret the input variables. The standard regression model defines an equation in the form y = X β + reg , where y is the observation maxrix; X is the regressor matrix; β is the regression coefficient matrix; and reg is the regression error. The main distinction of PCR is that it determines model coefficients from A which is the eigenvector of X X . Using the orthogonality of A, the general regression model of PCR can be expressed as A scree plot was used to determine the number of principal components as shown in Figure 8). The recommended number of principal components is three since the cumulative eigenvalue is lower than 10% only with three principal components. The empirical models using PCR were derived as: The equations derived with both MGGP and PCR are in dimensionless form, but the MGGP equations had more complicated structures than did the PCR equations. The major difference between the two models was that the MGGP model considers the linear contribution of each input variable in addition to the nonlinear correlations. For example, U/U * had linear effects on Equations (11)- (12), and it could be expressed as independent terms. A s had two linear terms of W/h and U/U * , but α showed the most complicated formulation. Contrary to the MGGP model, we assumed that the structures of the four PCR equations are identical as per a pre-determined nonlinear power-law relationship. In addition, PCR equations using the total dataset were derived for expanded use but were not analyzed (see Appendix B). All derived equations are provided as MATLAB function files in the supplementary material.

Statistical Performance of the Models
In order to assess the performance of the proposed equations relative to published equations for the TSM parameters, simple regression Equations (20)- (22) presented in [40] were also considered in this study.
The equation set (Equations (20)- (22)) was noted as F2019 for brevity. Four performance criteria, accuracy, discrepancy ratio (DR), root mean squared error, coefficient of determination (R 2 ) and Pearson's correlation coefficient (ρ) were evaluated to compare the performance of the equations.
DR (Discrepancy Ratio = P (observed) Var(P (observed) )Var(P (predicted) ) (27) where, P i,(observed) and P i,(observed) are the ith components of observed and the predicted target TSM parameters, respectively; and P (observed) is the mean value of the observed TSM parameter. Each calculated performance criterion is shown in Table 3. The bold characters in Table 3 indicate the best model for each performance criterion. The MGGP model predicted K f and A f with the highest accuracy. Compared to the PCR and F2019 equations, MGGP predicted A s with a reasonable degree of accuracy, even though the PCR model gave a slightly better result. The prediction of α by MGGP showed low accuracy in both the training set and the test set. The accuracy of the other two models are about the same as that of MGGP for the prediction of α. The RMSE and R 2 results showed that the predictability of the MGGP model was good in terms of K f , A f , and α in the training set. The PCR equations were good at predicting K f , A f , and A s in the test set, and the F2019 model for A s showed the lowest RMSE value over the training set. Regarding the α test set, F2019 (Equation (22)) had the lowest RMSE and the highest R 2 values, but the MGGP model was the best model on average. The model with the best mean RMSE and R 2 was the MGGP model, except for A s .
In regards to ρ, most of the models were higher than 0.5, but low ρ values were observed in every model for prediction of α. In particular, ρ values for the training set using the MGGP formulae were closest to 1 for the training data, whereas PCR showed slightly more significant correlation in the test set. Still, the best performance of average ρ were given by the MGGP predictions for every TSM parameter.
Briefly, the equations derived by MGGP gave the best performance in the training set and averaging performance of the training and test sets. The PCR equations showed stable performance and intermediate results between those of MGGP and F2019. The equations for A s and α equations in F2019 performed fairly despite their simple formulations.
The prediction results are illustrated as acatter plots in Figures 9 and 10. Furthermore, Figures 11 and 12 present DR histograms, which represented the distributions of the predictions, of the training set and the test set, respectively. Figure 9a, Equation (20) overestimated K f , especially K f was large. As shown Figures 11 and 12, high-variance distributions were obtained using Equations (16) and (20)      From the results of the scatter plots and the DR histograms, the empirical equations predicted the two-dimensional shape variables (A f and A s ) more accurately than the solute mixing related variables (K f and α). It implies that the three hydromorphic variables, W/h, U/U * , and S n , are not enough to describe the complicated physical mixing process in natural rivers where a number of transient storages are arranged, as shown in Figure  1. Unfortunately, adopting more variables is not easy. Therefore, even recent empirical approaches to the 1D-ADE's dispersion coefficient are adopting W/h and U/U * due to the limitations of knowledge in physical process and measurement technique [41,48,49,78].
Large errors in predictions of K f and α may be produced from the TSM model error. As aforementioned, the TSM considers only the bulk shear dispersion in the main flow zone, and it treats transverse mass exchange at the storage zone boundary. This weakly twodimensional vision neglects the transverse dispersion even though it allows the transverse solute transport.
Subsequently, difficulty in the prediction of K f is inherent due to too simplified vision of the empirical equations as described above. The discrepancy in α can be explained in a similar sense since not only its order is too small but also α has trade-off interaction with K f [30].

Tracer Test description
To validate the suggested empirical equations, we used tracer test data, which was obtained from Cheongmi Creek, Yeoju-si, Gyeonggi-do, South Korea, in 2015 [16] . The experimental reach, a braided river with many storage zone areas (e.g., sand bars, meandering channels, and a bridge), was located downstream of Dangjin Bridge near the confluence with the Han River. The total length of the experimental reach was 3550 m, and it was divided into four sections denoted S1-S4 for measurement of concentration and five sections, denoted U1-U5 for measurement of hydraulic properties (Figure 13).   Figure 14 show the hydromorphic properties measured in the Cheongmi Creek tracer test. In the table, L IP is the flow distance from the injection point (IP). The shear velocity had been calculated using the slope from the Manning equation, and the applied Manning coefficient, 0.037, was determined by consulting the published guide [79]. Cross-sectional mean velocity and water depth were measured using RDI-StreamPro ADCP at all sites. The discharge during the tracer test was 2.26 m 3 /s, and the cross-sectional area was calculated, dividing measured discharge by mean velocity (Table 4). Sinuosity can be calculated with plan view. The calculated sinuosity of the specified reaches S1-S2, S2-S3, and S3-S4 are 1.0562, 1.0671, and 1.1207, respectively.  The experimental site is a meandering channel with sand bars at the inner banks. The meandering bends start at the sub-reaches S1-S2 and S3-S4. In sub-reach S3-S4, the river flows more rapidly and more sinuous than in sub-reach S1-S2. The sub-reach S2-S3 is the downstream half of the first bend where flow accelerates. In addition, The sub-section S2-S3 is the most complicated section due to the Hyeonsa Bridge, which accompanies vortical structure, sudden contraction, and expansion. Therefore, cross-sectional shape and velocity change abruptly between S2 and S3.
Rhodamine WT (RWT) (0.2 kg) was injected at multiple points considering the onedimensional fully-mixed condition (both horizontally and vertically) in a natural stream. The distance between the IP and S1 can be estimated using Equation (28) [80].
in which, L 0 is the distance from the injection point for complete mixing on cross-section [L]; n is the number of injection points in the lateral direction; and E z is the lateral mixing coefficient which is estimated from E z /HU * = 0.15 [45] [L 2 /T]. The fully mixed condition distance from the IP was estimated at 300 m using Equation (28). The distance between IP and S1 was set at 940 m to consider the storage effect in the braided channel, from a conservative perspective. RWT was measured using YSI-600OMS fluorometers; the measurement devices were calibrated beside the experimental river before being installed. In order to obtain crosssectional averaged concentrations, three or four sensors were fixed at laterally uniform distances at all measurement sites. During postprocessing, the recorded concentrations were corrected, taking into account stream temperature differences due to the day crossing by referencing the temperature at the start of the observation (Equation (29)) [81]. Also, the background concentration was removed in keeping with the corrected concentration considering the variation in temperature from Equation (29).
where C and C 0 are the RWT concentrations at temperatures t and t 0 , respectively [ppm]; t is the temperature of the stream during the measurement [ • C]; t 0 is the reference temperature; and n RWT is the temperature calibration coefficient of RWT (n RWT = 0.027) [82].

Simulation Results
In this subsection, TSM parameters and BTCs were predicted using the hydromorphic properties measured with the tracer test. Table 5 summarizes the calibrated TSM parameters via inverse modeling and the TSM parameters calculated using the three empirical models (MGGP, PCR, and F2019). In addition to the TSM parameters, DaI was calculated in each case to assess whether the estimation was reasonable. DaI is given by: where L reach is the reach length [L]. Every evaluated DaI value was in a range of [0. 1 10], which is a reasonable estimation range [62,74]. Since F2019 do not provide a formula for A f , reach averaged cross-sectional area was adopted instead. In the sub-reach S1-S2, a sand bar had migrated because of the sudden expansion of the channel. The mean cross-sectional area of the first sub-reach was 14.6009 m 2 , and the calibrated A f was smaller than the actual cross-section. The defect of the calibrated A f contributed 5.4298 m 2 to the storage zone area, but A f + A s was greater than the measured cross-section area due to the sand bar HTS. Moreover, the measured velocity at S1 (0.39 m/s) was more than three times faster than that at U2 (0.13 m/s) and S2 (0.11 m/s), so that the estimate of the effective area was low. Keeping in mind the A f s value used in the F2019, the MGGP model better predicted A f than the PCR model, in the sub-reach S1-S2. The F2019 and PCR models over-estimated K f ; the values in those models were 4.31 times, and 2.5 times higher, respectively. Every model under-estimated the storage parameters (A s and α).
The second sub-reach, S2-S3, included the Hyeonsa Bridge, which is an artificial STS. Due to the existence of the bridge pier, hydromorphic variables were extensively altered throughout the reach. Thus, DaI, which reflects the storage effect, was higher in this sub-reach than in other sub-reaches, despite the magnitudes of A s and α being half of those in compared to the reach S1-S2. The MGGP model was superior to F2019 and PCR models in the prediction of the four parameters, while all models over-estimated K f and under-estimated two storage parameters. The F2019 model predicted K f as 7.2714, which is six times higher than the calibrated value. The predictabilities of A s and α were reasonable in the order MGGP-F2019-PCR and MGGP-PCR-F2019, respectively.
The last sub-reach, S3-S4, was a meandering band reach with channel expansion. Unlike in the other sub-reaches, in this sub-reach, the measured cross-section was smaller than the calibrated value. Furthermore, using the PCR model for A s was the best choice in this reach, whereas the measured value was more reliable in the other reaches. In terms of K f , the MGGP model was most accurate, followed by PCR-F2019. The MGGP model predicted α accurately, but F2019 was better at predicting A s . The observed BTCs in the tracer test and the curves generated using the empirical equations are shown in Figure 15. The BTCs in sub-reaches S1-S2 and S2-S3 using the F2019 model had the largest R 2 s, followed by the PCR model and MGGP model. In the last sub-reach, the three models produced values with good accuracy, with R 2 > 0.99; and the R 2 of the MGGP BTC was the largest, 0.997. From the perspective of shape, the BTCs from F2019 had slow rises and steep decreases. The PCR and MGGP models reproduced steep rises and gentle tails. In the first two subreaches, S1-S2 and S2-S3, the peaks of the MGGP and PCR BTCs appeared slower than did the BTCs from F2019. Thus the F2019 model simulated the observed BTCs more precisely, even though the MGGP and PCR models predicted K f , A s , and α more accurately. All models simulated BTCs without a phase lag in the last sub-reach, while all models predicted A f with high accuracy. This result revealed that precise prediction of A f significantly affects the production of BTCs, which have accurate time-related features, such as the time to peak concentration, time to centroid, and so on.
Regarding both the estimation values and simulated BTCs, overall predictabilities were high in the order of S3-S4, S2-S3, and S1-S2. Contrary to, , which is given by Equation (7), followed the inverse order of the predictabilities. Especially, the BTCs and TSM parameters in S2-S3 were much successfully estimated than in S1-S2, although DaIbased uncertainty was higher in S2-S3 than in S1-S2. It implies that has a positive correlation with uncertainty in parameter estimation. The largest wetted area (A f , A s , , W, and h) and the slowest flow were observed in the first sub-reach, S1-S2, among the sub-reaches. As aforementioned, the TSM is inferior as the ratio HTS has more significance than STS, resulting in problematic prediction. Therefore, it hints that HTS was more influential than STS in the sub-reach S1-S2 since friction factor f and account for the hyporheic exchange particularly [71,83].

Sensitivity Analysis
As demonstrated in the preceding sections, the MGGP model stably predicted the TSM parameters. Hence, the MGGP model has been adopted as a benchmark model for the following sensitivity analysis. One-at-a-time (OAT) sensitivity analysis was performed to identify the extent to which a change in the input variable's of, ±20 % from the medians, affect the TSM parameters. For quantitative assessments, the elasticity (Equation (31)) and the sensitivity index (SI) (Equation (32)) were computed.
SI (Sensitivity Index) = max (parameter) − min (parameter) max (parameter) (32) where el is a gradient of the parameter using the median where input variables, and SI indicates the absolute rate of change when the variable changes by 40 %. The magnitude of a variable is reflected in the calculation of e var , though SI considers only the targeted parameter. Figure 16 presents spider plots and two sensitivity indices for each MGGP equation, since the MGGP model showed the highest correlation with our observations in every TSM parameter. Figure 16a shows that K f hu * is sensitive to S n which supports previous studies on mixing in meandering channels, in which that sinuosity had a strong positive relationship with secondary current [84,85], increasing the dispersion coefficient [86]. The other dimensionless input variables were also found to increase shear dispersion. The el of S n , 141, was the highest due to the small order, and it is 14 times more significant than the value of U/U * . In terms of SI, S n is the most sensitive variable, followed by W/h and U/U * . The SI values of W/h and U/U * were 0.135 and 0.29, respectively.
On the other hand, the main flow zone area, A f , was insensitive at low sinuosity, though it was nonlinear at high sinuosity. Thus, the general sensitivity of S n was not clear despite having the largest SI and el. However, the power of S n in the PCR model for A f (Equation (17)) was 0.0729, which is the smallest power, implying that the effect of S n on A f is not significant. U/U * and W/h yielded similar SI values, even though U/U * was nine times more sensitive than W/h in terms of el.
U/U * and S n were reciprocal to the storage zone area, A s , except for around the median value of U/U * which showed nonlinear behavior around the median value. Even though U/U * showed nonlinear behavior resulting in very large sensitivity criteria, the absolute power of the PCR A s equation was 0.6310, which is an intermediate value between that of the other two variables. In the spider plot, the influences of the input variables were significant in the order U/U * , S n , and W/h. Boundary shear stress affects a slow storage zone area in a wide river. In a river with a wide cross-sectional area, there are more likely to be transient storage areas, so a weakly positive correlation of W/h with A s is established.  Figure 16d shows different features from those of the PCR equation, Equation (19), illustrating that the mass exchange rate is negatively correlated with all input variables. The action of U/U * on α is most powerful at SI = 0.007. However, U/U * presented similar elasticity to S n with el = −3.4 × 10 −6 and 3.2 × 10 −6 . In total parameter adjustment, W/h was slightly more responsive than S n ; SI was in which 0.003 and 0.002, respectively. However, the elasticity of W/h was sixteen times weaker than that of the other hydromorphic variables. This result showed that a change in the exchange rate is mainly governed by U/U * since it intensifies the advective effect, and the turbulence intensity in the main flow zone.
From subplots (c) and (d) in Figure 16, as U/U * increases, A s decreases more rapidly than A f increases. It means that has a negative correlation with U/U * . In a similar approach, W/h is proportional to since el of W/h in A f was two times larger than in A s . This analogy about U/U * , the ratio of stream power and friction, in the storage parameters, A s and α, coincides with the observations that the U and f are the most relevant factor [62,71,83].
According to the graphs, the most significant features in the main flow zone dispersion and the storage zone processes are S n and U/U * , respectively. However, we note that the variance of S n is very small compared to W/h and U/U * . Regarding this, the contributions of U/U * would be remarkable to both K f and the transient storage effect. Thus the assumption that U/U * is a driving factor in the one-dimensional mixing process is still valid.

Conclusions
In the present study, we investigated how hydromorphic parameters interact with TSM parameters by analyzing empirical equations. In the main purpose of analysis, new equations for four TSM parameters (the main flow zone dispersion coefficient (K f ), the main flow zone area (A f ), the storage zone area (A s ), and the storage exchange rate (α)) were developed using MGGP and PCR. The newly proposed equations for the TSM parameters consist of dimensionless TSM parameters (K f /(hU * ), A f /(Wh), A s /Wh, and αh/U * ) and dimensionless input variables (U/U * , W/h, and S n ) which were derived from dimensional analysis. A total of 128 datasets were selected for the derivation of the equations in the meta-analysis. The two presented models and the model published by Femeena et al. were compared using performance criteria. Afterward, field assessment was carried out by matching observed BTCs from Cheongmi Creek and simulated BTCs using the empirical models. Consequently, OAT sensitivity analysis of the MGGP-based TSM parameter empirical equations was performed.
In terms of the performance criteria, the best model was the MGGP model in the training set. Besides, the MGGP model produced superior results as averaging performance criteria of the training set and test set. Next, the PCR model made stable prediction in the test dataset for K f , A f , and A s . The F2019 model kept over-estimating K f , but it was slightly better than the other two models in the prediction of α in the test set. Considering the overall performance indices, the MGGP equations stably predicted all TSM parameters.
The results of the instream application indicated that all empirical models generated acceptable BTCs. However, prediction of mean stream velocity, accounting for A f , over the complicated reach of a natural stream, using simple methods, remains infeasible.
In the sensitivity analysis section, which focused on the MGGP model, S n significantly affected K f . Change in A f was more responsive to U/U * , and W/h. The effect of U/U * on A s was considerable compared to the effects of the other dimensionless variables. The exchange rate, α, was mostly affected by the ratio of stream velocity and shear velocity, U/U * .
Regardless of the applicability of the developed equations, hidden relationships of nonapplied features of the transient storage effect may have been neglected in this study, since the adopted hydromorphic variables were insufficient to explain the complicated storage zones in real rivers. Nevertheless, we suggest the use of the new empirical equations to analyze TSM parameters as reference values in conventional inverse modeling or as an alternate approach in situations where direct inverse modeling is not accessible.
We wish to thank P.V. Femeena for providing supplementary data used to derive TSM models in their study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Description of the Gam-Creek Tracer Test
Additional tracer test was performed in Gam-Creek, Gimcheon-si, Gyeongsangbuk-do to collect the calibrated TSM parameters. Figures A1 and A2 show the plan view and obtained BTCs, respectively, in the tracer test. A bathymetry survey was conducted utilizing Real-Time Kinematic-Global Positioning System (RTK-GPS). The used RTK-GPS model is Sokkia GRX1. The used coordinate of the GPS is a GSR80 ellipsoid Traverse Mercator X-Y type (EPSG: 5186), which is a standard of the National Geographic Information Institute of Korea. Sontek FlowTracker and YSI-600OMS fluorometers are used to measure flow velocity and BTCs. The calibrated TSM parameters and the measured hydromorphic variables are showed in Table A1.

Appendix B. Derived PCR Equations Using Total Dataset
The below PCR-based equations derived using the total dataset are provided for those who want to use them in expanded use.