Comparison of Time Nonlocal Transport Models for Characterizing Non-Fickian Transport : From Mathematical Interpretation to Laboratory Application

Non-Fickian diffusion has been increasingly documented in hydrology and modeled by promising time nonlocal transport models. While previous studies showed that most of the time nonlocal models are identical with correlated parameters, fundamental challenges remain in real-world applications regarding model selection and parameter definition. This study compared three popular time nonlocal transport models, including the multi-rate mass transfer (MRMT) model, the continuous time random walk (CTRW) framework, and the tempered time fractional advection–dispersion equation (tt-fADE), by focusing on their physical interpretation and feasibility in capturing non-Fickian transport. Mathematical comparison showed that these models have both related parameters defining the memory function and other basic-transport parameters (i.e., velocity v and dispersion coefficient D) with different hydrogeologic interpretations. Laboratory column transport experiments and field tracer tests were then conducted, providing data for model applicability evaluation. Laboratory and field experiments exhibited breakthrough curves with non-Fickian characteristics, which were better represented by the tt-fADE and CTRW models than the traditional advection–dispersion equation. The best-fit velocity and dispersion coefficient, however, differ significantly between the tt-fADE and CTRW. Fitting exercises further revealed that the observed late-time breakthrough curves were heavier than the MRMT solutions with no more than two mass-exchange rates and lighter than the MRMT solutions with power-law distributed mass-exchange rates. Therefore, the time nonlocal models, where some parameters are correlated and exchangeable and the others have different values, differ mainly in their quantification of pre-asymptotic transport dynamics. In all models tested above, the tt-fADE model is attractive, considering its small fitting error and the reasonable velocity close to the measured flow rate.


Introduction
Non-Fickian or anomalous transport, where the plume variance grows nonlinearly in time, has been documented extensively for solute transport in heterogeneous aquifers [1][2][3][4], soils [5][6][7], and rivers [8,9].Non-Fickian transport for dissolved contaminants can occur at all scales, varying from field scale [10] to micro-and nano-scale media [11,12].Non-Fickian diffusion, which is characterized by slow diffusion (sometimes further classified as sub-diffusion), has long been believed to relate to sorption/desorption between mobile and immobile domains under an equilibrium assumption [13] or kinetic conditions [14,15], or mass exchange between flow regions with relatively high and low velocities [16].Note here the term "diffusion" contains both molecular diffusion and mechanical dispersion (representing the local variation of advection from the mean velocity), and hence "non-Fickian diffusion" is used interchangeable with "non-Fickian transport" in this study.
To capture non-Fickian transport induced by solute retention, various transport models have been developed, starting from the standard advection-dispersion equation (ADE) with either equilibrium or kinetic sorption [17,18] and the two-domain or two-site models proposed originally in chemical engineering [19].Time nonlocal transport models were then developed to capture solute retention in natural geologic media with intrinsic physical and chemical heterogeneity.There are at least three popular time nonlocal transport models, which are the multi-rate mass transfer (MRMT) model [16,20], the hydrologic version of the continuous time random walk (CTRW) developed by Berkowitz and colleagues (see the extensive review in [21] and the mathematical version of CTRW in [22]), and the tempered time fractional advection-dispersion equation (tt-fADE) model [23].Some of these models have been compared theoretically.For example, mathematical similarity between the CTRW framework, the tt-fADE, and the MRMT model was explored by [24] and [21] (whose main conclusion is discussed below for clarification), and the numerical approximation of the MRMT model using CTRW schemes was developed by [25].
While previous studies showed that most of the time nonlocal models are identical with correlated parameters, fundamental challenges remain in real-world applications regarding model selection and parameter definition.Although various stochastic models have been developed for three decades in hydrology, they have not become routine modeling tools, due to many reasons.For example, given well-controlled laboratory transport experiments using heterogeneous sand columns and conservative tracers (which have been widely used to understand real-world diffusion and the resultant transport dynamics that are non-Fickian), a newcomer faces the challenge of model selection: which time nonlocal model should be used to capture the observed non-Fickian dynamics under specific flow/transport conditions, such as conservative tracer transport in saturated, heterogeneous columns repacked in the laboratory with a stable, relatively high water flow rate?To our best knowledge, there is, unfortunately, no literature providing such an answer for this simple question.The above time nonlocal transport models were originally built upon different physical theories and contain different quantities and types of parameters.A better understanding of the potential benefits and limitations of different time nonlocal transport models as applied to non-Fickian dynamics, therefore, is required before they can be reliably applied for real-world applications and attract new users with limited knowledge in stochasticity.
This study will systematically evaluate the above (MRMT, CTRW, and tt-fADE) theoretical treatments of non-Fickian transport of conservative solutes, with respect to their parameters and ability to represent non-Fickian dispersion, especially the late-time tailing behavior, which is typical for hydrological processes in heterogeneous geological media.Late-time dynamics of contaminant transport is also a major factor in many environmental issues, such as groundwater contamination remediation and aquifer vulnerability assessment.We will then apply all the modeling methods for laboratory column transport experiments and field tracer tests.We emphasize here that the application of a time nonlocal transport model to capture non-Fickian transport is not new.What is new in this study is the quantitative, practical comparison of nonlocal transport models given experimental data.
Water 2018, 10, 778 3 of 28 The rest of the work is organized as follows.In Section 2, we review the above time nonlocal transport models.In Section 3, the time nonlocal transport models are applied to non-Fickian transport in multidimensional heterogeneous porous media.The applicability of those time nonlocal models is checked using laboratory experiments and field tests.Our laboratory experiments, where a conservative tracer moves through heterogeneous lightweight expanded clay aggregate (Leca) beads, exhibit typical non-Fickian diffusive behavior with elongated late-time tails.The field tracer tests also exhibit apparent tailing behaviors.Section 4 checks and compares all the time nonlocal models for characterizing the observed non-Fickian dynamics.Further analyses are shown in Section 5, where we group the transport models and briefly discuss parameter uncertainty.Conclusions are finally drawn in Section 6.

Review and Evaluation of Time Nonlocal Transport Models
The core of the time nonlocal models is the appropriate definition of the memory function, which controls the distribution of waiting times for contaminants trapped by immobile zones.In this section, we focus on the theoretical background, especially the memory function, for each model and explore potential correlation of critical parameters in different models.For example, previous studies emphasized that the tt-fADE is a specific form of the CTRW framework since the tt-fADE assumes a truncated power-law memory function, which is one of the memory functions previously assumed by the CTRW framework [21].Identical functionality was also pointed out for the MRMT and the CTRW framework (see Section 1).

Multi-Rate Mass Transfer Model
The MRMT model describes mass transfer between a mobile domain and any number of immobile domains with varying properties.The linear, multi-rate, first-order solute transport equations in the absence of sources/sinks can be written as [20]: where C m and C im,j [ML −3 ] represent the aqueous concentrations in the well-mixed mobile zone and the j-th well-mixed immobile zone, respectively; β j [dimensionless] is the capacity coefficient usually defined as the ratio of porosities of the j-th immobile and the mobile phases; v [LT −1 ] is the velocity vector; D [L 2 T −1 ] is the dispersion coefficient tensor; n [dimensionless] is the number of distinct immobile phases; and α j [T −1 ] is the first-order mass transfer rate (also called the rate coefficient) associated with the j-th immobile zone.When n = 1, Equation (1) reduces to the single-rate mass transfer model.The summation term in the left-hand side of (1) can be expressed as a convolution, leading to the time-nonlocal form [16,26]: where the symbol " * "denotes convolution, and g(t) [T −1 ] is a memory function defined by the weighted sum of the exponential decay from individual immobile zones [16]: where b(α) [T] is a density function of first-order rate coefficients.
Water 2018, 10, 778 4 of 28 In terms of similarities with the other nonlocal methods discussed below, the MRMT model captures the time nonlocality caused by the diffusion-limited transport of solutes in immobile zones.A practical advantage of this approach is that the memory function has explicit hydrogeological meaning and thus it may be calculated, fitted, or even predicted [27].Finally, the parameters v, D, and g(t) can be spatially variable, so the MRMT method can capture the local variation of solute transport velocity caused by heterogeneity.

Tempered Time Fractional Advection-Dispersion Equation Model
The tt-fADE is one analytic technique that accounts for the time nonlocality of the medium, and simultaneously accounts for convergence of a stochastic solute particle motion process (i.e., a CTRW) to a limit distribution.In particular, if the distribution of trapping times between the movement of solute particles has an infinite mean, then the overall transport equation has one fractional-order derivative representing "dispersion" in time, leading to the standard time fractional advection-dispersion equation (t-fADE) model.Assuming a power-law memory function to describe random waiting times in the immobile zones [28]: where Γ(•) is the Gamma function, and the exponent 0 < γ ≤ 1 (when γ approaches to 1, Γ(1 − γ) approaches to infinity, making no memory effects g(t) = 0, and the model would behave like the traditional ADE model with a retardation coefficient 1 + β).Then by definition, is a Caputo fractional derivative of order γ.Inserting ( 6) into the MRMT model ( 1) and assuming that the solute is initially placed in the mobile zone only, one obtains the following standard t-fADE describing the mobile and immobile solute transport: where C im denotes the overall chemical concentration in all immobile domains.Meerschaert et al. [23] generalized the t-fADE ( 7) and ( 8) by introducing an exponentially truncated power-law function, which is an incomplete Gamma function, as the memory function: where λ > 0 [T −1 ] is the truncation parameter in time.This modification leads to the tt-fADE: where C 0 m = C m (x, t = 0) denotes the initial source located only in the mobile phase.At a time Water 2018, 10, 778 5 of 28 the tail of the mobile-phase breakthrough curve (BTC) declines as a power law function: while at a much later time t >> 1/ λ, the slope of the mobile-phase BTC approaches negative infinity (i.e., the late-time BTC tail declines exponentially).Therefore, the value of λ controls the transition of the BTC late-time tail from a power-law function to exponential function [29].The use of the tt-fADE (10) and (11) to model mobile/immobile anomalous solute transport is motivated by four factors: (1) the equation governs the limits of known stochastic processes; (2) it describes a combination of first-order mass transfer models and reduces to known mobile/immobile equations in the integer order case; (3) the equation has tractable solutions that model the significant features of solute plume evolution in time and space; and (4) the equation is parsimonious, with no more parameters than the standard MRMT model (1) with multiple pairs of rate and capacity coefficients.To summarize, the tt-fADE has one major limitation compared to the MRMT model ( 1) and the CTRW model discussed below: the memory function embedded in the tt-fADE is a specific form (9), while the memory function used in the MRMT and the CTRW model can have different forms.Potential effects of this difference on the models' abilities to estimate real-world physical behavior is explored below, in Section 4, and discussed in Section 5.

Continuous Time Random Walk Framework
Derivation of the CTRW model in hydrologic sciences [21] starts from the generalized master equation with the kernel Φ defined as (in Laplace space (t → s)) [30]: where the tilde "~" denotes the Laplace transform, Ψ(x, s) is the Laplace transform of the joint density of jump length and duration, and φ(s) is the Laplace transform of the transition time or duration where the right-hand side term is valid for independent jump size and transition time, and f (k) denotes the jump size density.Berkowitz et al. [21] defined a memory function M to replace φ(s) in ( 15) where t 1 [T] denotes a "typical median transition time" for particles.Inserting ( 16) into (15), and taking the Laplace and Fourier inverse transform, the following well-known CTRW framework was obtained [21]: where v ψ denotes the average velocity, and D ψ is the dispersion coefficient.When deriving the CTRW (17) from the master Equation ( 15), the jump size density f (k) in (15) needs to be expanded as Water 2018, 10, 778 6 of 28 resulting in the spatially-averaged velocity v ψ and dispersion coefficient D ψ in (17): Therefore, here the "typical median transition time" t 1 is the mean waiting time: The CTRW memory function M defined by (16) has various forms to capture various breakthrough curves (BTCs).One popular form is the exponentially truncated power-law, defined by the transition time φ (see Equation (16) in [32]): where the factor B = {t (with τ 2 = t 2 /t 1 .)keeps the integral of φ(t) to be 1.For simplicity and direct comparison between models, we constrain the exponent ξ to be 0 < ξ < 1 in this study.The Laplace transform of ( 22) is (see Equation (17) in [32]) A simple manipulation shows the relationship between the MRMT memory function g(t) and the CTRW memory function M(t) in Laplace space: where θ M [dimensionless] and θ I [dimensionless] are the porosity in the mobile and (total) immobile domains, respectively.Inserting ( 16) and ( 23) into (24), we obtain: There is no known analytical solution for g in real time t, except for the following asymptote at late time t >> t 1 τ 2 : Based on (26), we obtain the late-time growth rate for p Therefore, for time t >> t 2 , the non-Fickian transport transitions to Fickian diffusion.The cutoff time scale t 2 in (22), as explained by [32], "corresponds to the largest heterogeneity length scale".In another numerical study by Willmann et al. [33], t 2 was also called "the late cutoff time".The above analysis shows that t 2 is functionally equivalent to the inverse of the truncation parameter λ used in the tt-fADE model (10) and (11).
For the intermediate time t 1 << t << t 2 , one can obtain the memory function g by solving (25) numerically (e.g., using the numerical inverse Laplace transform).This was done by [32], who found that the transition probability scales as a power-law function p(t) ∝ t −ξ−1 , where t 1 << t << t 2 (28) Comparing ( 28) and ( 13), we find that the power-law exponent ξ in the CTRW framework is functionally equivalent to the scale index γ in the tt-fADE model (10) and (11).
Hence, the parameters in the CTRW framework (17) (e.g., ξ and t 2 ) are related to the parameters in the tt-fADE model (10) and (11) (γ and λ), except for t 1 in (17), which may be estimated by the mean diffusive time.Parameters predicted by one model (such as the tt-fADE) may also help to improve the estimated parameters of the other (i.e., the CTRW framework).

Applications: Capturing Non-Fickian Transport in Multidimensional Porous Media
Here we apply the above time nonlocal transport models to capture non-Fickian transport observed in multidimensional, heterogeneous porous media.Well-controlled laboratory experiments of sand column transport were conducted, to provide data to evaluate the three nonlocal transport models reviewed above.We used a cylindrical organic glass (polymethyl methacrylate) tube filled with non-uniform "lightweight expanded clay aggregate" (Leca) beads to monitor solute transport in saturated porous media.Leca beads were selected since they contain high intra-granular porosity (and potential for sorption on the large surface area of clay particles), which can lead to retention for solute transport as often occurs in the field.The length of the Leca column was 100 cm with an internal diameter of 4 cm.The experimental apparatus was composed of inflow and outflow water tanks, a porous medium column, tubing, and a detection device (Figure 1).The diameter of the Leca beads varied from 1.0 to 2.0 mm.A pulse of Brilliant Blue FCF (BBF), which is a conservative organic compound typically used as colorant for foods, with a volume of 5 mL was injected into the column (from the bottom of the glass tube set vertically) at a concentration of 0.1 g/L, representing an instantaneous point source.For the relatively short time-scales of these experiments, we believe the mechanical dispersion of BBF is predominant, and effects of molecular diffusion are negligible.An ultraviolet visible light spectrophotometer was used to measure the absorbance of solute, and the absorbance was then converted to concentration.Continuous sampling provided tracer BTCs used to check the applicability of the nonlocal transport models.
Water 2018, 10, x FOR PEER REVIEW 7 of 28 For the intermediate time t1 << t << t2, one can obtain the memory function g by solving (25) numerically (e.g., using the numerical inverse Laplace transform).This was done by [32], who found that the transition probability scales as a power-law function , where t1 << t << t2.(28) Comparing ( 28) and ( 13), we find that the power-law exponent ξ in the CTRW framework is functionally equivalent to the scale index γ in the tt-fADE model (10) and (11).
Hence, the parameters in the CTRW framework (17) (e.g., ξ and t2) are related to the parameters in the tt-fADE model (10) and (11) (γ and λ), except for t1 in (17), which may be estimated by the mean diffusive time.Parameters predicted by one model (such as the tt-fADE) may also help to improve the estimated parameters of the other (i.e., the CTRW framework).

Applications: Capturing Non-Fickian Transport in Multidimensional Porous Media
Here we apply the above time nonlocal transport models to capture non-Fickian transport observed in multidimensional, heterogeneous porous media.Well-controlled laboratory experiments of sand column transport were conducted, to provide data to evaluate the three nonlocal transport models reviewed above.We used a cylindrical organic glass (polymethyl methacrylate) tube filled with non-uniform "lightweight expanded clay aggregate" (Leca) beads to monitor solute transport in saturated porous media.Leca beads were selected since they contain high intra-granular porosity (and potential for sorption on the large surface area of clay particles), which can lead to retention for solute transport as often occurs in the field.The length of the Leca column was 100 cm with an internal diameter of 4 cm.The experimental apparatus was composed of inflow and outflow water tanks, a porous medium column, tubing, and a detection device (Figure 1).The diameter of the Leca beads varied from 1.0 to 2.0 mm.A pulse of Brilliant Blue FCF (BBF), which is a conservative organic compound typically used as colorant for foods, with a volume of 5 mL was injected into the column (from the bottom of the glass tube set vertically) at a concentration of 0.1 g/L, representing an instantaneous point source.For the relatively short time-scales of these experiments, we believe the mechanical dispersion of BBF is predominant, and effects of molecular diffusion are negligible.An ultraviolet visible light spectrophotometer was used to measure the absorbance of solute, and the absorbance was then converted to concentration.Continuous sampling provided tracer BTCs used to check the applicability of the nonlocal transport models.Six flow rates (increasing from 0.4, 0.6, 0.8, 1.0, 1.2, to 1.4 mL/s; see Table 1) were carried out in the experiment.For each flow rate, three experimental runs were conducted where the BBF BTCs were collected at three sections of the sand column, varying from 50, 70, to 100 cm.During each run with the similar flow rate, the hydraulic gradient between the inlet and outlet was kept constant in time, and therefore the flow velocity for the three experimental runs should be close to each other.
Table 1.Measured and fitted parameters (using the advection-dispersion equation (ADE) model with equilibrium sorption) for Brilliant Blue FCF (BBF) transport through the "lightweight expanded clay aggregate" (Leca) beads at different flow rates and travel distances and field tests.In the legend, Q represents the flow rate; L denotes the travel distance (i.e., the length of the column); v* (=v/R) is the average flow velocity divided by the retardation coefficient; D* (=D/R) is the dispersion coefficient divided by the retardation coefficient; θ is the porosity; and RMSE stands for root mean square error between observed values and predicted values.The measured BTCs all exhibit late-time tailing behavior (see Figure 2 and Figures S1-S5 in the Supplementary File), which is one of the major characteristics of non-Fickian transport.

Model Fit and Comparison
Comparisons of the measured and best-fit BTCs using the above three time-nonlocal transport models and the classical ADE are shown in Figure 2 and Figures S6-S10 in the Supplementary File.In this section, we briefly introduce the model fitting process, and then compare the model results.

The ADE Model with Equilibrium Sorption
For comparison, we first use the classic ADE with equilibrium sorption to quantify the measured BTCs.The governing equation for one-dimensional chemical transport in groundwater with advection, dispersion, and retardation is [34]: which has the following solution with an instantaneous point source at the origin:

Model Fit and Comparison
Comparisons of the measured and best-fit BTCs using the above three time-nonlocal transport models and the classical ADE are shown in Figure 2 and Figures S6-S10 in the Supplementary File.In this section, we briefly introduce the model fitting process, and then compare the model results.

The ADE Model with Equilibrium Sorption
For comparison, we first use the classic ADE with equilibrium sorption to quantify the measured BTCs.The governing equation for one-dimensional chemical transport in groundwater with advection, dispersion, and retardation is [34]: which has the following solution with an instantaneous point source at the origin: where M [M] represents the initial mass injected into the column; A [L 2 ] is the cross-section area; and R [dimensionless] is the retardation coefficient.From a mathematical perspective, R acts as a rescaling factor in time.Hence, we reduce the three parameters v, D, and R in the ADE model to two parameters v* (=v/R) and D* (=D/R).The fitted values for parameters v* and D* are listed in Table 1.These two parameters were calibrated manually based on visual inspection.
In our Leca-column transport experiments, the relatively large flow velocity led to a large Peclet number (Pe >> 1) and the dispersion coefficient D relates to v via D = α v, where α [L] denotes the dispersivity.The relationship between dispersivity α and the travel distance in the laboratory experiments is shown in Figure 3.For a fixed flow rate, α varies with the travel distance without any fixed trend, perhaps due to varying dispersion over the relatively short travel distance.
where M [M] represents the initial mass injected into the column; A [L 2 ] is the cross-section area; and R [dimensionless] is the retardation coefficient.From a mathematical perspective, R acts as a rescaling factor in time.Hence, we reduce the three parameters v, D, and R in the ADE model to two parameters v* (=v/R) and D* (=D/R).The fitted values for parameters v* and D* are listed in Table 1.These two parameters were calibrated manually based on visual inspection.
In our Leca-column transport experiments, the relatively large flow velocity led to a large Peclet number (Pe >> 1) and the dispersion coefficient D relates to v via D = α v, where α [L] denotes the dispersivity.The relationship between dispersivity α and the travel distance in the laboratory experiments is shown in Figure 3.For a fixed flow rate, α varies with the travel distance without any fixed trend, perhaps due to varying dispersion over the relatively short travel distance.The relationship between dispersivity and flow rate in the laboratory experiments is shown in Figure 4.Although the best-fit dispersivity fluctuates with the flow rate, it generally increases with an increasing flow velocity.A larger flow velocity causes a wider spatial distribution of contaminant plume and increases the apparent dispersivity in the ADE model.This phenomenon is consistent with known non-Fickian solute transport behavior and indicates that the ADE model is performing as expected [33].The relationship between dispersivity and flow rate in the laboratory experiments is shown in Figure 4.Although the best-fit dispersivity fluctuates with the flow rate, it generally increases with an increasing flow velocity.A larger flow velocity causes a wider spatial distribution of contaminant plume and increases the apparent dispersivity in the ADE model.This phenomenon is consistent with known non-Fickian solute transport behavior and indicates that the ADE model is performing as expected [33].
In the BTCs measured at three travel distances with six different flow rates in the laboratory experiments and field tracer tests, the ADE model ( 29) captures the rapid increase of early-time BTCs, but overestimates the decline of the late-time BTC tails although the delay of transport due to retardation was considered in the ADE (29).This implies that the delayed transport observed in our laboratory experiments and field tests is more complex than equilibrium adsorption described in (29), where the sorbed or immobile concentration is a simple linear function of the dissolved concentration.In the BTCs measured at three travel distances with six different flow rates in the laboratory experiments and field tracer tests, the ADE model ( 29) captures the rapid increase of early-time BTCs, but overestimates the decline of the late-time BTC tails although the delay of transport due to retardation was considered in the ADE (29).This implies that the delayed transport observed in our laboratory experiments and field tests is more complex than equilibrium adsorption described in (29), where the sorbed or immobile concentration is a simple linear function of the dissolved concentration.

The MRMT Model
The MRMT model (1) has various specific forms, which might be useful for applications and therefore require further evaluation.For example, the immobile domain can be characterized as multiple layers with a distribution of diffusion rate coefficients or first-order mass-transfer rates, or be simplified as a layer with a single diffusion rate which can be attractive in applications due to its simplicity in manipulation.There is, however, no solid physical justification for the selection of any of these forms, given simply the limited information for a sand column like the one used in our laboratory.For systematic analyses of all potential mass transfer models, we selected the following four MRMT subsets with different mass-transfer formulations: (1) MRMT model 1 (single mass-transfer rate): the immobile zones can be simplified by a single, homogeneous first-order mass-transfer rate (which is also the single-rate double-porosity model); (2) MRMT model 2 (single diffusion rate): the immobile zones have a single diffusion rate for all layers; (3) MRMT model 3 (two mass-transfer rates): the immobile zones have two sets of rate coefficients; (4) MRMT model 4 (multiple mass-transfer rates): the immobile zones have a power-law distribution of (first-order mass-transfer) rate coefficients.
We use the STAMMT-L version 3.0 code [35] to solve the above four MRMT models.Best-fit results, which were calibrated manually, are shown in Figure 2 and Figures S6-S10 and discussed below.

The MRMT Model
The MRMT model ( 1) has various specific forms, which might be useful for applications and therefore require further evaluation.For example, the immobile domain can be characterized as multiple layers with a distribution of diffusion rate coefficients or first-order mass-transfer rates, or be simplified as a layer with a single diffusion rate which can be attractive in applications due to its simplicity in manipulation.There is, however, no solid physical justification for the selection of any of these forms, given simply the limited information for a sand column like the one used in our laboratory.
For systematic analyses of all potential mass transfer models, we selected the following four MRMT subsets with different mass-transfer formulations: (1) MRMT model 1 (single mass-transfer rate): the immobile zones can be simplified by a single, homogeneous first-order mass-transfer rate (which is also the single-rate double-porosity model); (2) MRMT model 2 (single diffusion rate): the immobile zones have a single diffusion rate for all layers; (3) MRMT model 3 (two mass-transfer rates): the immobile zones have two sets of rate coefficients; (4) MRMT model 4 (multiple mass-transfer rates): the immobile zones have a power-law distribution of (first-order mass-transfer) rate coefficients.
We use the STAMMT-L version 3.0 code [35] to solve the above four MRMT models.Best-fit results, which were calibrated manually, are shown in Figure 2 and Figures S6-S10 and discussed below.

MRMT Model 1 with a Single Mass-Transfer Rate
The general mobile-immobile (MIM) model with diffusion in the immobile zone can be written as [20]: where T(x,t) [ML −3 T −1 ] is a transient term accounting for rate-limited mass transfer between the mobile and immobile domains, and Da is the apparent diffusion coefficient.Here n denotes the dimensionality of the problem, and n = 1, 2, 3 denotes diffusion into layers, cylinders, and spheres, respectively.For a single rate first-order mass-transfer approximation, T(x,t) can be expressed as [35]: which is the single rate version of the MRMT model (1).The best-fit parameters using the single-rate MIM model ( 31)-( 34) are listed in Table 2.The fitting exercise showed the sensitivity of model parameters to the travel distance and water flow rate in the laboratory experiments.First, the dispersivity α L does not change with the travel distance (Table 2), since the plume expansion with time is captured by the mass transfer term in the model.This is different from the standard ADE model, where the dispersivity must increase with the travel distance to capture scale-dependent dispersion.31)-( 34)) at different flow rates and travel distances.In the legend, L denotes the travel distance (i.e., the length of the column); α L is the dispersivity; v is the flow velocity; β tot is the total capacity coefficient; and α stands for the mass transfer rate.Second, the average velocity used in the model is slightly larger than the measured BTC peak velocity and the ADE velocity (note that the ADE velocity is also larger than the real BTC peak velocity), probably due to the assumption that there might be an immobile domain interacting with the mobile domain.The parameters v and D in the MRMT and ADE models have different meanings and hence may not have the same values.Parameters v and D in the MRMT model refer to the mobile domain [36], and therefore the velocity in the MRMT model is generally larger than the ADE velocity, while the opposite is expected for the dispersion coefficient D. This holds true for all the MRMT formulations.
Water 2018, 10, 778 13 of 28 Third, the capacity coefficient β either remains constant or decreases very slightly with the travel distance, implying that the medium heterogeneity might not significantly change with the medium's length.
Fourth, the rate coefficient α increases slightly and the capacity coefficient β decreases slightly with an increasing flow rate at each control plane.This subtle change is likely due to the assumption that the faster water flows correspond to less volumetric proportion of immobile domains.A faster flow may decompose immobile domains and enhance the mass exchange between mobile and immobile domains, resulting in a larger mass exchange rate α.

MRMT Model 2 with a Single Diffusion Rate
The layered diffusion model is a specific case of the MRMT model [20] with the following rate and capacity coefficients: The best-fit parameters for model (35) and ( 36) are listed in Table 3.There is no apparent correlation between the mass transfer rate and the travel distance or water flow velocity.The same conclusion is found for the capacity coefficient.The best-fit parameters are listed in Table 4.The dispersivity α L remains constant, since solute plume expansion (due likely to solute retention) is mainly captured by mass exchange between mobile and the two immobile zones.Or in other words, the two pairs of parameters, α j (j = 1, 2) and β j (j = 1, 2), control the mass exchange.Their values fluctuate (without predictable trends) with the travel distance and flow rate in the laboratory experiments, although α 2 decreases with increasing α 1 for the same flow rate.There is no efficient way to directly measure the rate coefficient or capacity coefficient, which creates a challenge for relating MRMT model parameters to observable physical phenomena, and for providing information to MRMT models based on ancillary observations in heterogeneous media.The density of rate coefficient for MRMT model 4 can be defined as [16]: where α min [T −1 ] denotes the minimum rate coefficient, α max [T −1 ] is the upper boundary of the rate coefficient, and k is the exponent.
In the fitting parameters shown in Table 5, the dispersivity α L remains stable for the same reason mentioned above for the other MRMT models.Flow velocity used in this model can be approximated by the peak velocity for the measured BTC.The total capacity coefficient (β tot ) does not significantly change with the travel distance.The exponent k controls the slope of the late-time BTC in a log-log plot, and hence a larger k denotes faster decline of the late-time BTC.The best-fit k slightly increases with an increasing flow rate, due likely to the relatively faster decline of the late-time solute concentration under a larger water flow rate.It is also noteworthy that, on one hand, the minimum mass transfer rate (α min ) in (37) controls the maximum waiting time for solute particles, which is functionally equivalent to the inverse of the cutoff time scale t 2 in the CTRW framework and the truncation parameter λ in the tt-fADE model.In general, α min increases with an increasing flow rate (Table 5).Faster flow may accelerate the mass exchange between mobile and immobile domains, generating shorter mean residence times for solute particles in the immobile domain and leading to a greater mass transfer rate.On the other hand, the maximum mass transfer rate (α max ) in (37) defines the shortest waiting time (for solute particles between two displacements), whose impact on the late-time transport dynamics can be overwhelmed by the other smaller rates.Numerical results also show that α max apparently does not change with the travel distance and flow rate.Hence, α max can be kept constant for all cases (Table 5).
As shown in Figure 2 and Figures S6-S10, the MRMT model 4 captures the BTC late-time tail much better than the other mass-transfer formations with fewer rate coefficients.However, compared with the tt-fADE model, the simulated tail of the MRMT model 4 tends to be slightly heavier at the end of the modeling time.In other words, it slightly underestimates the mass transfer rate at the late time.Note that the memory function in the tt-fADE is not exactly the same as that in the MRMT model 4. The tt-fADE has exponentially-truncated power-law rate coefficients, while the MRMT model 4 simply deletes any rate coefficient larger than α max and smaller than α min .This subtle difference in memory functions between the tt-fADE and MRMT model 4 might be the reason for the differences observed here.In addition, the MRMT model 4 (with six model parameters) requires one more parameter than the tt-fADE model.
As shown in Figure 2 and Figures S6-S10, both the MRMT model 1 and model 2 can capture most characteristics of the observed BTCs, except for the late-time tailing.Therefore, we need more than one mass-transfer rate to fit the solute transport in the one-dimensional heterogeneous sand column and field tracer tests.Increasing the number of immobile domains improves the model's performance at late times, but more immobile domains can complicate the model application.To be specific, adding one more immobile domain adds two unknown parameters (a pair of capacity coefficient and mass exchange rate coefficient for that immobile domain).(10) and (11) The failure of the standard ADE model and the single rate mass transfer model in capturing the late-time BTC tailing motivates the application of the other time nonlocal transport models such as the tt-fADE model (10) and (11).A numerical solver of ( 10) and ( 11) can be found in [37].The best-fit results using this model are shown in Figure 2 and Figures S1-S5, where both the peak and the late time tailing can be captured simultaneously.

The tt-fADE Model
The best-fit parameters are shown in Table 6, where the best-fit velocity v is slightly larger than the measured peak velocity v peak , due to the fractional-order capacity coefficient β in the left-hand side of the tt-fADE Equations ( 10) and ( 11) when γ → 1: Table 6.The best-fit parameters for the tempered time fractional advection-dispersion equation (tt-fADE) model ( 10) and ( 11) at different flow rates and travel distances.In the legend, α is the dispersivity; γ is the time/scale index; β is the fractional-order capacity coefficient; and λ is the truncation parameter.Figures 5 and 6 show the fitted parameters versus the flow rate and travel distance in the laboratory experiments.The best-fit dispersivity α does not change significantly with the travel distance (Figure 5), which is different from the scale-dependent dispersion observed by other studies [38].
This discrepancy might be due to the following reasons: (1) the travel distance is too short to observe any stable trend of dispersion (tracer injection at the inlet may cause a boundary effect on transport and random deviation from the local mean velocity), and (2) the system is advection dominated (as a typical laboratory sand-column transport experiment) and therefore the estimates of D might be too uncertain to clearly show a subtle trend.In addition, the best-fit dispersivity increases with an increasing flow rate.The faster the water flows, the wider the plume becomes, requiring a larger dispersivity.This is consistent with the fitting results of the other models mentioned above.This discrepancy might be due to the following reasons: (1) the travel distance is too short to observe any stable trend of dispersion (tracer injection at the inlet may cause a boundary effect on transport and random deviation from the local mean velocity), and (2) the system is advection dominated (as a typical laboratory sand-column transport experiment) and therefore the estimates of D might be too uncertain to clearly show a subtle trend.In addition, the best-fit dispersivity increases with an increasing flow rate.The faster the water flows, the wider the plume becomes, requiring a larger dispersivity.This is consistent with the fitting results of the other models mentioned above.
The other model parameters, including the time index γ (representing residence time in the immobile domain), the capacity coefficient β (the ratio of immobile to mobile volume), and the truncation parameter λ (controlling the transition from power-law to exponential tailing), vary with the flow rate (Figure 6).First, the time index γ increases with an increasing flow rate, since a larger γ represents a shorter mean residence time for solute particles in the immobile domain (here we assume that the mean residence time in the immobile domain decreases with an increasing flow velocity).Second, the capacity coefficient β also increases with the flow rate, showing that the variation of the time index might overshadow the variation of the capacity coefficient.Third, the truncation parameter λ increases with the flow rate, implying that the non-Fickian transport converges to its Fickian asymptote more quickly if the flow rate is larger.This trend might be due to the decreased immobile portion with a higher flow rate in the short sand column.In addition, the truncation parameter λ is relatively small, in order to capture the relatively heavy late-time tails.
The above three parameters (γ, β, and λ), however, generally remain constant at different travel distances if the flow rate remains unchanged (Table 6).The packing procedure for the sand column was designed to yield a macroscopically homogeneous texture, and, indeed, the results imply that the statistics of medium properties (such as the sand size distribution, specific surface area, effective porosity, and/or internal structure) are spatially uniform.For example, a uniform fractional capacity coefficient β suggests that the ratio of mobile to immobile volume is constant in space.This phenomenon holds true for the field tracer tests.The spatial uniformity of these parameters simplifies The other model parameters, including the time index γ (representing residence time in the immobile domain), the capacity coefficient β (the ratio of immobile to mobile volume), and the truncation parameter λ (controlling the transition from power-law to exponential tailing), vary with the flow rate (Figure 6).First, the time index γ increases with an increasing flow rate, since a larger γ represents a shorter mean residence time for solute particles in the immobile domain (here we assume that the mean residence time in the immobile domain decreases with an increasing flow velocity).Second, the capacity coefficient β also increases with the flow rate, showing that the variation of the time index might overshadow the variation of the capacity coefficient.Third, the truncation parameter λ increases with the flow rate, implying that the non-Fickian transport converges to its Fickian asymptote more quickly if the flow rate is larger.This trend might be due to the decreased immobile portion with a higher flow rate in the short sand column.In addition, the truncation parameter λ is relatively small, in order to capture the relatively heavy late-time tails.
The above three parameters (γ, β, and λ), however, generally remain constant at different travel distances if the flow rate remains unchanged (Table 6).The packing procedure for the sand column was designed to yield a macroscopically homogeneous texture, and, indeed, the results imply that the statistics of medium properties (such as the sand size distribution, specific surface area, effective porosity, and/or internal structure) are spatially uniform.For example, a uniform fractional capacity coefficient β suggests that the ratio of mobile to immobile volume is constant in space.This phenomenon holds true for the field tracer tests.The spatial uniformity of these parameters simplifies the fitting procedure.For each flow rate with different sample distances, the time index and the capacity coefficient can be calibrated a single time for one sampling distance, and then kept constant for the other sampling distances.Therefore, when using the tt-fADE model (10) and (11), we only need to fit three parameters: velocity, dispersion coefficient, and truncation parameter.Fitting is further simplified by using the observed BTC peak velocity as a lower bound when predicting the velocity in the tt-fADE model.

The CTRW Model
The fitted results using the CTRW model (17) are shown in Figure 2 and Figures S1-S5.We used the CTRW MATLAB Toolbox Version 3.1 developed by [39].The CTRW framework with the truncated power-law transition time φ(t) (22) was used, as suggested by [21], since it can efficiently capture the transition from non-Fickian to Fickian transport [32].
This CTRW model contains five unknown parameters, ξ, t 1 , t 2 , v ψ , and D ψ .The time t 1 expressed by formula (21) represents the approximated mean transition time.The power law behavior in the BTC begins from t 1 and ends at t 2 .Knowledge of parameters gained in the fitting exercise of the tt-fADE model in Section 4.3 improves the predictability of the CTRW model.In particular, we set the power-law exponent ξ in the CTRW model equal to the time index γ in the tt-fADE model ( 10) and ( 11), and approximate the cutoff time scale t 2 in the CTRW framework using the inverse of the truncation parameter λ in the tt-fADE (see Section 2.3).Here a smaller ξ represents more disorder of the host system.The remaining three parameters, including the velocity v ψ , the dispersion coefficient D ψ , and the mean waiting time t 1 in can be fitted using the observed BTCs.The fitted parameters using the CTRW model are listed in Table 7.
Table 7.The best-fit parameters for the continuous time random walk (CTRW) model.In the legend, v ψ is the CTRW transport velocity, which can be different from the average pore velocity v; D ψ is the dispersion coefficient with the subscript ψ indicating CTRW interpretation; ξ is the power-law exponent; t 1 is the mean transition time; and t 2 is the truncation time scale.ξ is converted from γ in the tt-fADE model ( 10) and ( 11), and t 2 is converted from λ in the tt-fADE model.

Comparison of Transport Models
The transport models discussed in Section 4 can be classified into two main groups.The first group includes the ADE with equilibrium adsorption, the single mass-transfer rate MIM model, and the single diffusion rate MRMT model, which capture the observed early-time BTC and its peak, Water 2018, 10, 778 20 of 28 but underestimate the persistent late-time tail of each BTC.The second group includes the MRMT model with two or a power-law distributed rate coefficients, the tt-fADE, and the CTRW model with a truncated power-law memory function, which can capture the overall trend and positive skewness of the BTCs.The two rate coefficients in the MRMT model, however, might not be adequate to capture heavy late-time tailing for solute transport observed in the field.There is no efficient way to predict the capacity coefficient and the mass transfer rate in the model, especially when there are multiple sets of mass transfer rates due to various retention capabilities in natural soils with spatial variable hydraulic properties.In contrast, the tt-fADE and the CTRW model parameters are more directly relatable to physical features [29].
MRMT model: As articulated by [24], the MRMT model relates closely to the CTRW framework in capturing solute retention times.Particularly, the mean transition time t 1 in the CTRW framework is related to the inverse of the mean rate coefficient in the MRMT model with power-law distributed rate coefficients.Our fitting exercise in Section 4.4 shows that the BTC is not sensitive to α max .The mean transition time t 1 in the CTRW framework and the maximum boundary α max used in the MRMT model might not be needed when capturing the late-time tailing of solute transport (note that the cutoff time t 2 in the CTRW model or the minimum rate coefficient α min in the MRMT model play a more important role than t 1 or α max in affecting the late-time BTC), and therefore they may be removed from the fitting parameters to simplify the model applications.Note that the tt-fADE model does not need the lower-bound of retention times.In addition, the MRMT model with power-law distributed rate coefficients tends to slightly overestimate the late-time tailing in BTCs (Figure 2 and Figures S6-S10), implying that the actual mass transfer rates may decline faster than a power-law function at late times.
CTRW model: The CTRW framework has a complex relationship to the tt-fADE model.On one hand, as discussed in Section 2.3 and checked in Section 4.4, the power-law exponent ξ in the CTRW framework is functionally equivalent to the scale index γ in the tt-fADE, and the cutoff time scale t 2 in CTRW is equivalent to the inverse of the truncation parameter λ in the tt-fADE.On the other hand, the velocity and dispersion coefficient in the CTRW framework significantly differ from those in the tt-fADE model.Similarly, they are not directly related to the solution of the traditional ADE.Applications in Section 4.4 show that the average CTRW transport velocity v ψ (1.303 mm/s) is ~59% less than the average real peak velocity (3.18 mm/s) of the observed BTC and ~62% less than the average best-fit velocity (3.39 mm/s) in the tt-fADE model at the flow rate Q = 1.4 mL/s.For the field tracer tests, the CTRW transport velocity v ψ (0.3 mm/s) is two orders of magnitude larger than the real peak velocity (0.003 mm/s) of the observed BTC, which is similar to the best-fit velocity (0.004 mm/s) in the tt-fADE model.The velocity used in the tt-fADE model can be calculated by using Equation (38), instead of fitting.This procedure further eliminates the number of parameters in the tt-fADE model and makes the fitting more convenient.In addition, the dispersion coefficient in the CTRW framework D ψ cannot be kept constant under the same flow rate for different travel distances like the dispersion coefficient in the tt-fADE model for both laboratory experiments and field tracer tests.Therefore, the spatially averaged velocity defined by Formula ( 19) for the CTRW framework may differ from the actual pore-scale velocity.In the CTRW model, different from the traditional ADE, the advective, dispersive, and diffusive transport mechanisms are combined in the random walk formalism.The advective component and the dispersive component are calculated by spatial moments of the same joint probability density function (PDF) for particle transitions and hence cannot be disconnected [21].In particular, according to Equation (19), velocity in the CTRW model can be estimated by determining the characteristic time and mean distance, which is the first moment of the PDF of transition displacement.It is, however, difficult to predict the effective velocity v ψ without detailed knowledge of the porous medium.It is also noteworthy that the generalized master Equation (21) does not separate the effects of the spatially varying velocity field on solute particle displacement into an advective part and a dispersive part.The concept of CTRW therefore does not build an explicit relationship between real velocity and model velocity, and the same is true for the dispersion coefficient.In addition, the power-law exponent ξ can also affect the overall magnitude of solute plume expansion, an effect that intermingles with the dispersion coefficient D ψ and the effective velocity v ψ .The above analysis is consistent with the result in [40] that many parameters, particularly v ψ and ξ, in the CTRW model are correlated with each other.Their fitting exercises showed that different parameter combinations can lead to the same mean Eulerian velocity predicted by the model, implying that the estimated model parameters were not unique.
tt-fADE model: In the tt-fADE model, the best-fit velocity v can be larger than the plume peak's velocity v peak because the effective velocity used in the tt-fADE is adjusted by the elapsed time for solutes spent in retention, as represented by the fractional-order capacity coefficient β on the right-hand side of Equation (38).In this study, laboratory experiments with six flow rates and field tracer tests show that the best-fit velocity in the tt-fADE model is close to the measured BTC's peak velocity, since the capacity coefficient shown in Equation ( 38) is relatively small.Because the transport velocity used in the tt-fADE model (10) and ( 11) is not significantly different from the BTC's peak velocity, the tt-fADE model uses an independent parameter, the time index γ, to control the power-law distribution of the late-time BTC.This parameterization of the tailing is relatively simple as compared to the CTRW model, with three parameters, v ψ , D ψ , and ξ that contribute to time-nonlocal non-Fickian dispersion.We calculate the root mean square error (RMSE) for the laboratory experiments.Comparison of the RMSE (see Tables 1, 6 and 7) of the two models and the ADE model demonstrates that they both perform better than the standard ADE, especially in describing tailing behavior in a heterogeneous medium.When the Formula ( 38) is used, the tt-fADE model is more convenient to apply in practice than the CTRW framework, since the former contains fewer parameters.
Why the fitted four-parameter tt-fADE model may provide the best performance?This might be due to two reasons.First, according to the physical derivation of the tt-fADE in Section 2.2, the tt-fADE separates motion in the mobile zone (using the basic transport parameters v and D to define advection and Gaussian diffusive displacement) and retention in the immobile domains.When the tracer particle moves in the mobile domain, its average speed is v, with a finite time required to finish the jump.This physical separation might be reasonable in hydrogeological media.In the CTRW framework, the memory function defines the random waiting time between two subsequent jumps, while the motion can finish instantaneously (in other words, no physical time is required for the tracer particle to move).This physical discrepancy may also cause the discrepancy of the two basic transport parameters v and D between the two models, in addition to the spatial average parameters required by the CTRW framework.Second, the tt-fADE does not specify the lower-bound of the waiting time (using an additional parameter such as the lower-limit t 1 in the CTRW framework), since the slow advection can also affect the late-time BTC tail.Mass exchange can apparently affect the late-time BTC tail only when the diffusive time scale is much longer than the advective time scale, as pointed out by [16].This may explain why t 1 might not be needed in the CTRW framework.Further real-world tests are needed to check the above hypotheses.

Parameter Sensitivity
One example of parameter sensitivity is tested here for the tt-fADE (10) and (11).Sensitivity of the BTCs to variations of the four main parameters (D, γ, β, λ) in the tt-fADE model is shown in Figure 7.
First, the dispersion coefficient D has a subtle impact on the overall shape of BTCs.Decreasing D from 0.005 cm 2 /s to 0.0005 cm 2 /s results in similar BTCs after normalization (i.e., re-scaling), implying that trapping due to the immobile zones may account at least partially for the spatial expansion of solute plumes.When D increases from 0.005 cm 2 /s to 0.05 cm 2 /s, the BTC becomes wider and its shape slightly changes (Figure 7a,b).
Second, the time index γ controls the power-law slope of the late-time BTC.For example, when γ increases from 0.29 to 0.50 (representing the decrease of probability for long retention times), the late-time BTC becomes steeper, approaching relatively fast to its Gaussian asymptote (Figure 7c,d).
When γ decreases from 0.29 to 0.05, the BTC's late-time tail becomes heavier (i.e., with a gentler slope), although the overall BTC looks narrower (likely due to the normalization of BTCs).The simulated BTC with the time index γ equal to 0.29 gives the best fit.First, the dispersion coefficient D has a subtle impact on the overall shape of BTCs.Decreasing D from 0.005 cm 2 /s to 0.0005 cm 2 /s results in similar BTCs after normalization (i.e., re-scaling), implying that trapping due to the immobile zones may account at least partially for the spatial expansion of solute plumes.When D increases from 0.005 cm 2 /s to 0.05 cm 2 /s, the BTC becomes wider and its shape slightly changes (Figure 7a,b).
Second, the time index γ controls the power-law slope of the late-time BTC.For example, when γ increases from 0.29 to 0.50 (representing the decrease of probability for long retention times), the late-time BTC becomes steeper, approaching relatively fast to its Gaussian asymptote (Figure 7c,d).Third, the fractional-order capacity coefficient β shifts the BTC and expands the BTC's late-time tail.When β decreases (representing a decrease of the immobile zone volume or the immobile solute mass at equilibrium), the BTC becomes narrower and shifts to the left (representing a larger effective velocity).A faster drop is apparent in the late-time BTC tail with a smaller β.An opposite change of the BTC can be seen for an increasing β (Figure 7e,f).
Fourth, the truncation parameter λ affects the speed for the late-time BTC to transfer from a heavy tailed, power-law slope to an exponential tail.A larger truncation parameter means an earlier transform from non-Fickian to Fickian transport (Figure 7g,h).For the largest λ tested, the resultant BTC is the closest to the solution of the ADE model, as expected because the tt-fADE reduces to the ADE model for λ → ∞.

Application to Field Transport
A field trace transport test was conducted recently by Zheng [41], which provides field data to evaluate further the time nonlocal transport models and compare with the laboratory column experiments.The test site is in the Zhangjiawan Village, Zhangjiawan Town, southeast of the Tongzhou District, Beijing, China, with the longitude of 116.72 • and latitude 39.848 • .This experimental site is in the Chaobai River alluvial plain.The average annual precipitation in the vicinity is about 533 mm, and the evaporation is 1822 mm.It has a multi-layer aquifer structure.The aquifer is composed of gravelly coarse sand, coarse sand, fine sand, silty clay and clay layer in the test field and surrounding area.The hydraulic conductivity coefficient shows obvious heterogeneity, indicating that the aquifer develops a small-scale preferential flow channel network.
The subsurface network consisted of one injection well, one pumping well, and three monitoring wells with continuous multi-tubing (denoted as well 13, 23, and 33, respectively) (see Figure 8 for the study site).
All the five wells are along the same line of the general groundwater flow direction, and therefore a one-dimensional model may be used to approximate the overall transport.The injection well and the pumping well are separated by 8 m, and the three observation wells have a uniform interval of 2 m.Groundwater flows from the injection well to well 13, 23, 33, and to the pumping well.
Water 2018, 10, x FOR PEER REVIEW 23 of 28 When γ decreases from 0.29 to 0.05, the BTC's late-time tail becomes heavier (i.e., with a gentler slope), although the overall BTC looks narrower (likely due to the normalization of BTCs).The simulated BTC with the time index γ equal to 0.29 gives the best fit.Third, the fractional-order capacity coefficient β shifts the BTC and expands the BTC's late-time tail.When β decreases (representing a decrease of the immobile zone volume or the immobile solute mass at equilibrium), the BTC becomes narrower and shifts to the left (representing a larger effective velocity).A faster drop is apparent in the late-time BTC tail with a smaller β.An opposite change of the BTC can be seen for an increasing β (Figure 7e,f).
Fourth, the truncation parameter λ affects the speed for the late-time BTC to transfer from a heavy tailed, power-law slope to an exponential tail.A larger truncation parameter means an earlier transform from non-Fickian to Fickian transport (Figure 7g,h).For the largest λ tested, the resultant BTC is the closest to the solution of the ADE model, as expected because the tt-fADE reduces to the ADE model for λ → ∞.

Application to Field Transport
A field trace transport test was conducted recently by Zheng [41], which provides field data to evaluate further the time nonlocal transport models and compare with the laboratory column experiments.The test site is in the Zhangjiawan Village, Zhangjiawan Town, southeast of the Tongzhou District, Beijing, China, with the longitude of 116.72° and latitude 39.848°.This experimental site is in the Chaobai River alluvial plain.The average annual precipitation in the vicinity is about 533 mm, and the evaporation is 1822 mm.It has a multi-layer aquifer structure.The aquifer is composed of gravelly coarse sand, coarse sand, fine sand, silty clay and clay layer in the test field and surrounding area.The hydraulic conductivity coefficient shows obvious heterogeneity, indicating that the aquifer develops a small-scale preferential flow channel network.
The subsurface network consisted of one injection well, one pumping well, and three monitoring wells with continuous multi-tubing (denoted as well 13, 23, and 33, respectively) (see Figure 8 for the study site).All the five wells are along the same line of the general groundwater flow direction, and therefore a one-dimensional model may be used to approximate the overall transport.The injection well and the pumping well are separated by 8 m, and the three observation wells have a uniform interval of 2 m.Groundwater flows from the injection well to well 13, 23, 33, and to the pumping well.
Sodium bromide, a commonly used conservative tracer, was injected into the injection well at a depth of 11.8 m, and groundwater samples were taken from the three observation wells (Well 13, 23, 33) located downstream at a depth of ~10 m.The concentration of the injected solution was 1288 mg/L.The injection rate was about 0.3 m 3 /h, and the injection duration was 6 h.A peristaltic pump was used to collect the samples in a chronological order, and the sample concentration was measured using a MP523-06 bromide ion concentration meter.
The measured BTCs exhibit apparent late-time tailings (see Figure 9), similar to that observed in the laboratory column transport.Applications show that the time nonlocal models can capture part of the late-time BTC tail, but not the whole tail of BTC containing apparent noise.Due to the noise, it is also impossible to obtain a reliable RMSE.Although the apparent noise causes high uncertainty in model fitting, both the CTRW framework and the tt-fADE can capture a heavier late-time tail than the MRMT model with two sets of rate coefficients, and the measured BTCs do contain a high concentration at the very end of the sampling period (Figure 9).
In addition, all the measured BTCs show apparent early time tail, which cannot be captured by the tt-fADE or the CTRW framework with the time index between 0 and 1.It is, however, not a surprise, since the early arrivals are most likely due to fast motion of tracer particles along preferential flow paths, while the delayed arrivals are caused by solute retention due to mass exchange between the mobile and relatively immobile zones.At the field site, high-permeable sand constitutes the layer connecting the injection well and the three monitoring wells, likely forming the preferential channels.The time nonlocal transport models considered in this study were developed to capture solute retention, hence missing the early tail of the BTC.The spatiotemporal fADE may capture both the early and late time tails in the BTC [28,42], which will be explored in a future study.
It is also noteworthy that the flow velocity in real aquifers is several orders of magnitude smaller than that used in the laboratory experiments.Hence, the field transport is diffusion dominated, while the laboratory transport is advection dominated.This discrepancy might imply that the late-time BTC tail persists for groundwater flow with a broad range of Peclet numbers, which can be characterized by the time nonlocal models.However, for groundwater flow with a small Peclet number and potential preferential flow paths, the early time BTC tail may occur, which cannot be efficiently Sodium bromide, a commonly used conservative tracer, was injected into the injection well at a depth of 11.8 m, and groundwater samples were taken from the three observation wells (Well 13, 23, 33) located downstream at a depth of ~10 m.The concentration of the injected solution was 1288 mg/L.The injection rate was about 0.3 m 3 /h, and the injection duration was 6 h.A peristaltic pump was used to collect the samples in a chronological order, and the sample concentration was measured using a MP523-06 bromide ion concentration meter.
The measured BTCs exhibit apparent late-time tailings (see Figure 9), similar to that observed in the laboratory column transport.Applications show that the time nonlocal models can capture part of the late-time BTC tail, but not the whole tail of BTC containing apparent noise.Due to the noise, it is also impossible to obtain a reliable RMSE.Although the apparent noise causes high uncertainty in model fitting, both the CTRW framework and the tt-fADE can capture a heavier late-time tail than the MRMT model with two sets of rate coefficients, and the measured BTCs do contain a high concentration at the very end of the sampling period (Figure 9).
In addition, all the measured BTCs show apparent early time tail, which cannot be captured by the tt-fADE or the CTRW framework with the time index between 0 and 1.It is, however, not a surprise, since the early arrivals are most likely due to fast motion of tracer particles along preferential flow paths, while the delayed arrivals are caused by solute retention due to mass exchange between the mobile and relatively immobile zones.At the field site, high-permeable sand constitutes the layer connecting the injection well and the three monitoring wells, likely forming the preferential channels.The time nonlocal transport models considered in this study were developed to capture solute retention, hence missing the early tail of the BTC.The spatiotemporal fADE may capture both the early and late time tails in the BTC [28,42], which will be explored in a future study.
It is also noteworthy that the flow velocity in real aquifers is several orders of magnitude smaller than that used in the laboratory experiments.Hence, the field transport is diffusion dominated, while the laboratory transport is advection dominated.This discrepancy might imply that the late-time BTC tail persists for groundwater flow with a broad range of Peclet numbers, which can be characterized by the time nonlocal models.However, for groundwater flow with a small Peclet number and potential preferential flow paths, the early time BTC tail may occur, which cannot be efficiently captured by the time nonlocal transport models such as the tt-fADE or the CTRW framework with an index less than 1.

Figure 2 .
Figure 2. Laboratory tracer test: Comparison between the measured (symbols) and the modeled (lines) breakthrough curves (BTCs) using the advection-dispersion equation (ADE), the tempered time fractional advection-dispersion equation (tt-fADE) (red thick lines), the continuous time random walk (CTRW) (green dashed lines), and the multi-rate mass transfer (MRMT) model with the water flow rate Q = 1.4 mL/s.

Figure 2 .
Figure 2. Laboratory tracer test: Comparison between the measured (symbols) and the modeled (lines) breakthrough curves (BTCs) using the advection-dispersion equation (ADE), the tempered time fractional advection-dispersion equation (tt-fADE) (red thick lines), the continuous time random walk (CTRW) (green dashed lines), and the multi-rate mass transfer (MRMT) model with the water flow rate Q = 1.4 mL/s.

Figure 3 .
Figure 3.The best-fit dispersivity in the ADE model (29) versus the travel distance.

Figure 3 .
Figure 3.The best-fit dispersivity in the ADE model (29) versus the travel distance.

Figure 4 .
Figure 4.The best-fit dispersivity in the ADE model versus the water flow rate.

Figure 4 .
Figure 4.The best-fit dispersivity in the ADE model versus the water flow rate.

Figure 5 .
Figure 5. Dispersivity in the tt-fADE model changes with the travel distance and flow rate.

Figure 5 .
Figure 5. Dispersivity in the tt-fADE model changes with the travel distance and flow rate.

Figure 6 .
Figure 6.The tt-fADE model parameters change with the travel distance and flow rate.

Figure 6 .
Figure 6.The tt-fADE model parameters change with the travel distance and flow rate.

Water 2018 , 28 Figure 7 .
Figure 7. Sensitivity analysis for coefficients in the tt-fADE model.

Figure 7 .
Figure 7. Sensitivity analysis for coefficients in the tt-fADE model.
(a) Zhangjiawan test site location and distribution of wells

Figure 8 .
Figure 8. Study site map showing injection well, pumping well, and continuous multichannel tubing (CMT) wells.

Figure 8 .
Figure 8. Study site map showing injection well, pumping well, and continuous multichannel tubing (CMT) wells.

Table 2 .
The fitted parameters for the single mass-transfer rate mobile-immobile model (i.e., Equations (

Table 3 .
The fitted parameters for the single diffusion rate multi-rate mass transfer (MRMT) model at different flow rates and travel distances.In the legend, α d is the diffusion rate coefficient (α d = D a /a 2 , where Da is the apparent diffusion coefficient and a is the layer half-thickness).
4.2.3.MRMT Model 3 with Two Mass-Exchange RatesMRMT model 3 contains two pairs of coefficients: two rate coefficients (α 1 and α 2 ) and two capacity coefficients (β 1 and β 2 ) corresponding to the first and the second immobile domains, respectively.

Table 4 .
The fitted parameters for the two-set MRMT model at different flow rates and travel distances.In the legend, α 1 and β 1 represent the mass transfer rate and capacity coefficient of the first immobile domain, respectively; and α 2 and β 2 are the mass transfer rate and capacity coefficient of the second immobile domain, respectively.

Table 5 .
The best-fit parameters for the power-law MRMT model 4 at different flow rates and travel distances.In the legend, k stands for the slope of the late-time tail, and α min and α max stand for the lower and upper boundary of the mass transfer rates, respectively.