Multiscale Temporal Irreversibility of Streamﬂow and Its Stochastic Modelling

: We investigate the impact of time’s arrow on the hourly streamﬂow process. Although time asymmetry, i.e., temporal irreversibility, has been previously implemented in stochastics, it has only recently attracted attention in the hydrological literature. Relevant studies have shown that the time asymmetry of the streamﬂow process is manifested at scales up to several days and vanishes at larger scales. The latter highlights the need to reproduce it in ﬂood simulations of ﬁne-scale resolution. To this aim, we develop an enhancement of a recently proposed simulation algorithm for irreversible processes, based on an asymmetric moving average (AMA) scheme that allows for the explicit preservation of time asymmetry at two or more time-scales. The method is successfully applied to a large hourly streamﬂow time series from the United States Geological Survey (USGS) database, with time asymmetry prominent at time scales up to four days.


Introduction
The term "time's arrow" was coined by Eddington [1] to describe time directionality, which can be determined by studying the organization of atoms, molecules and bodies. He states "Let us draw an arrow arbitrarily. If as we follow the arrow we find more and more of the random element in the state of the world, then the arrow is pointing towards the future; if the random element decreases the arrow points towards the past. That is the only distinction known to physics. This follows at once if our fundamental contention is admitted that the introduction of randomness is the only thing which cannot be undone. I shall use the phrase 'time's arrow to express this one-way property of time which has no analogue in space. It is a singularly interesting property from a philosophical standpoint." The direction of time can be handled by the class of irreversible processes that destroy macroscopic information and are manifestations of the second law of thermodynamics. This law states that all natural processes generate entropy, a measure of uncertainty. The irreversible destruction of the macroscopic order defines what can be called the "thermodynamic" arrow of time.
However, there are irreversible processes that do not necessarily destroy macroscopic information. Rather they modify information in a manner that, again, by viewing their evolution, it becomes evident which way "time's arrow" is pointing at.
Many processes of interest to engineering and natural sciences are modelled using Gaussian linear stochastic processes. However such a modelling approach cannot reproduce irreversibility. Weiss [2] showed that if the process x(t) is Gaussian, then it is time reversible. As a result, a directional process cannot be Gaussian. He also showed that a discrete-time autoregressive moving-average (ARMA) process is reversible if and only if it is Gaussian. This conclusion is very important because it shows that stationary series that show evidence of directionality cannot be modelled by Gaussian ARMA models. Therefore, other models should be used to accurately model this behaviour [3]. 2 of 11 In hydrology, it is known that the ascending part of a hydrograph is steeper than the descending one [4]. This pattern clearly reflects time's arrow and can be modelled as a stochastic property. However, only few methods have been proposed in the hydrological literature to detect and statistically measure irreversibility. A method proposed by Psaradakis [5] is based on the evaluation of the probability of the differenced process. As the probability is positive, for time-irreversible processes there is a deviation of the median of the differenced process from zero. In another study, Müller et al. [6] propose a class of new tests for time irreversibility and suggest different ways to implement them. As index of asymmetry they use the third moment of differences of the empirical copulas rather than of the original time series. By performing simulations of combined sewer systems with original and time-reversed time series they found "significant deviations of more than 10%" among them. Serinaldi and Kilsby [7] used directed horizontal visibility graphs (DHVGs) to perform an analysis of the dynamics of streamflow fluctuations with focus on time irreversibility and long-range dependence. The recent study of irreversibility by Koutsoyiannis [3] states that time asymmetry requires the study of the third moment µ 3 and the coefficient of skewness C s of the process, original and differenced.
Attempts to simulate time irreversible streamflow processes have been even fewer. In this respect, Koutsoyiannis [3] proposed a model called AMA (Asymmetric Moving Average) in order to generate irreversible time series. Mathai and Mujumdar [8] have also built a model to simulate time irreversible streamflow at multiple sites. Multisitecorrelated streamflow states were generated, and then flow sequences were constructed by independently considering the ascension and recession limbs of the hydrograph at individual sites.
The above literature review highlights the importance of time asymmetry in hydrology and the need for a broader investigation for its stochastic modelling on a global scale. In this study, we use streamflow time series data from the United States Geological Survey (USGS) database [9] to investigate the irreversibility of the hourly streamflow process at scales of up to one hundred (hours). The asymmetry parameter that is employed is the same as in the study by Koutsoyiannis [3,10]: the ratio of the skewness coefficient of the differenced process to that of the original process. The aim is twofold: (a) to quantify the irreversibility of the hourly streamflow process and identify at which range of scales it should affect its modelling, and (b) to modify the existing method by Koutsoyiannis [3,10], which preserves irreversibility at the first scale only, and to make it capable of simultaneously preserving the irreversibility also at a second scale. Finally, the original and the extended method are verified by a case study, while the methodology is further discussed in the thesis by Vavoulogiannis [11].

Dataset
Time series were downloaded from the Water Department of USGS, the largest provider of in situ water data in the world. For the irreversibility investigation, we compiled a dataset consisting of 762 stations around the USA using the climata.usgs package. The data were downloaded at a 15 min resolution and were subsequently aggregated to the 1 h scale. The time series were downloaded for a period of five years (from November 2014 until November 2019) and had less than 10% missing values. For more information on the data, see Vavoulogiannis [11].

Definition of Time Irreversibility
A stochastic process x(t) is a collection of (usually infinitely many) random variables x indexed by t, typically representing time. In turn, a random variable x is an abstract mathematical entity, associated with a probability distribution function F(x) := P{x ≤ x} where x is any numerical value (i.e., a regular variable) [12]. (Notice that underlined symbols denote random variables). The stochastic process x(t) represents the evolution of the system over time, while a trajectory x(t) is a realization of x(t); if it is known at certain points t i , it is a time series. A stochastic process x(t), at a (continuous) time t, is characterized by its nth-order distribution function: The process is time reversible or time symmetric if its joint distribution does not change after a reflection of time about the origin [2], i.e., if for any n, t 1 ; t 2 ; . . . ; t n−1 ; t n , F(x 1 , x 1 , . . . , x n ; t 1 , t 2 , . . . , t n ) = F(x 1 , x 1 , . . . , x n ; −t 1 , −t 2 , . . . , −t n ) Here, we use Koutsoyiannis's method [3] to study time asymmetry as follows. First, the time-differenced stochastic process is defined in discrete-and continuous-time, respectively, as: We also define X κ as the cumulative process of x κ in discrete time: Based on the process (original or time-differenced) at scale 1, we may also define the averaged process at any scale k ≥ 1, e.g., the averaged original process x (k) is defined in discrete time as: It is easy to see that the first moment (mean) of the differenced process is always zero while the second one (variance) is always positive, and thus they do not provide indications on time asymmetry. Hence, the least-order moment that can be used to detect irreversibility is the third one. Using the second and the third moments, the skewness of the differenced process is calculated as: We further introduce the following index of time asymmetry, which is defined as the ratio of the skewness of the differenced process C s to that of the original process C s : This is found to be particularly helpful in the simulation process. A high positive value of this index denotes a large time asymmetry, whereas values close to 0 indicate time reversibility.

Multiscale Preservation of Time Irreversibility
The "hydrograph pattern", i.e., a steeper ascending limb and a mild descending limb, can be stochastically understood and modelled through the property of temporal irreversibility, hence bypassing the subjectivities involved in hydrograph modelling through conceptual models [3]. The conceptual basis behind streamflow time asymmetry is of less interest here since we can model it as a stochastic property.
The AMA (Asymmetric Moving Average) model proposed by Koutsoyiannis (2019) can deal with irreversibility [3,10]. It is based on filtering non-Gaussian white noise and can also preserve some other important stochastic characteristics such as the long-range dependence. For a simulation length q, we can write: The a j are internal coefficients of the generation scheme and not model parameters to be estimated from data, while V i−1 represents a white noise process. The above model can preserve irreversibility provided that the sequence of coefficients a j is not symmetric about 0. Our investigation of irreversibility suggests that it is prominent at scales up to the mark of 100 h (i.e., approximately four days) and approaches zero for larger scales. Therefore, it is critical to preserve irreversibility in a wide range of smaller scales. In the current study, the algorithm proposed by Koutsoyiannis [3,10], which preserves irreversibility at the basic scale, is modified and extended to simulate time series that preserve the irreversibility at both the first and the second scale. To this aim, it is first important to calculate the theoretical moments of the AMA model at the first and second scale. For a simulation length q, the AMA model of Equation (9) can also be expressed as: where b j = a j−q , w i = v i+q , with the latter being a lognormal white noise process. Other distributions can be used for even better accuracy [13]. For the first scale, we must calculate the second and third moments of the differenced and original sequences, respectively. The second moment of the original sequence is: whereas that of the differenced sequence is: where we set a −q−1 = 0, or, more generally, we set a n = 0 for any n out of the interval [−q, q]. The third moment of the original sequence is: whereas that of the differenced sequence is: Likewise, for the scale 2 we must calculate the second and third moments of the differenced and original sequences, respectively. We highlight that the process is first averaged at the second scale, according to Equation (5), and afterwards differenced and not the other way around. The second moment of the original sequence at scale 2 can be expressed as: whereas the second moment of the differenced sequence at scale 2 is: The third moment of the original sequence at scale 2 is: whereas the third moment of the differenced sequence at scale 2 is: After calculating the sample moments, optimization tools are used to estimate the parameters of the AMA model by minimizing the sum of squared errors between the sample (empirical) and the theoretical moments. The parameterization follows the same methodology as in Koutsoyiannis [10]. The function θ(ω) is defined as the smooth minimum of two hyperbolic functions θ i (ω) (i = 1, 2) of frequency ω, i.e.: where the symbols C ji ; i = 1, 2; j = 0, 1, 2 and ζ denote parameters to be determined by optimization.
After the function of θ(ω) is parameterized, we use it to perform the Fourier transform and express the real part of the result for j = 0, . . . , q. At last, we have the sequence of a η , i.e.,: where i is the imaginary unit, θ(ω) is an odd real function (meaning θ(−ω) = −θ(ω)). In turn, A R (ω) is defined as: where s d (ω) represents the power spectrum.

Stochastic Tools for Multiscale Dependence Characterization
For the characterization of the multiscale dependence, we use the climacogram stochastic metric, which expresses the quantification of change/variability in the scale domain, instead of the common lag (i.e., through the autocovariance or autocorrelation function) and frequency (i.e., through the power-spectrum) domains. The second-order climacogram is defined as the variance of the averaged process x(t) (Equation (5)) (assumed stationary) versus the averaging time scale k and is symbolized by γ(k) [14]. The climacogram is useful for detecting the long-term change (or else dependence, persistence, clustering) of a process's multiscale stochastic representation.
The statistical bias in the climacogram estimator can be calculated as follows. As shown in Koutsoyiannis's study [15], assuming that we have n = T/∆ observations of the averaged process x i (∆) , where the observation period T is an integer multiple of ∆, the time-resolution, the expected value of the empirical (sample) climacogramγ(∆) is: The climacogram is also related to the power spectrum and the autocovariance. However, the study by Dimitriadis and Koutsoyiannis [16] showed that the climacogram had the smallest estimation error among the three tools, while its bias could be computed simply and analytically. Additionally, the fact that its values are always positive is an advantage in stochastic modelling. Moreover, it is well-defined with an intuitive definition and is for the most part monotonic.
Another useful metric in the scale domain is the climacospectrum, which is a newly introduced stochastic tool. It is defined by Koutsoyiannis [17] as: where k represents the scale. The asymptotic behaviour of the second-order characteristics of a process for k → 0 and k → ∞ is characterized by two parameters, M and H, which are given by: where "#" represents the slope on a doubly logarithmic plot (a doubly logarithmic derivative). The climacospectrum has the following advantages [17]: In comparison with the power spectrum, it is superior with respect to its connection with the conditional entropy production. Specifically, it is more precise without exceptions. Additionally, the variance, on which the definition of the climacospectrum is based, is more closely related to uncertainty, and as a result to the entropy of the process, than the power spectrum and the autocovariance.
To apply the method to the streamflow series, first the effect of the annual cycle is approximately removed by multiplying the discharge values by 12 different coefficients, one per month, summing up to 1. These coefficients are estimated by minimizing the total variance of the transformed time series.
Then, the Filtered Hurst Kolmogorov model is fitted to the data in order to estimate the parameters H, M and a. The climacogram of the Filtered Hurst Kolmogorov process is given below [17]: where a is a scale parameter, and M and H are the fractal and Hurst parameters. The same calibration function is used for both the climacogram and the climacospectrum (empirical and theoretical), since the climacospectrum is more robust for the analysis of the finer scales and the climacogram is more robust for the analysis of the larger scales. The reason behind this is related to the theoretical context of these stochastic tools, as discussed earlier and in [16]. We note that both the climacogram and the climacospectrum were adapted for bias.
After that, the discrete power spectrum through the Fast Fourier Transform (FFT) and the AMA coefficient were calculated. The next step was to detect the scale-wise time irreversibility. We aggregate the data up to the 100 h scale and calculate the sample skewness of the differenced and original processes. Their ratio is the irreversibility index, as discussed above. Furthermore, the irreversibility of both scales 1 and 2 was also important to calculate since it was to be preserved later by the algorithm.
In the first case, i.e., when irreversibility is only preserved at scale one, optimization tools are used to find the parameters needed to estimate the constant θ. In the second case for the sequence θ(ω), which is defined as the smooth minimum of two hyperbolic functions of frequency, optimization was used again. The concept is that after building functions to calculate the sample and theoretical moments, computational tools have to minimize the difference between the sample (empirical) and the theoretical moments. In the second case, the difference is that this happens for two scales and that the square error is minimized. The output is the θ(ω) sequence. With knowledge of the power spectrum and θ(ω) we are able to calculate the AMA coefficients from Equation (20), i.e., the a η sequence. After this procedure, the synthetic time series can be simulated by applying Equation (9).
Specifically, this methodology generates time series that preserve irreversibility at two timescales along with the second-order dependence properties of the process. For example, if we choose as the basic scale the one-hour scale, the second scale for the irreversibility to be preserved is two hours. In Figure 1, the complete methodology is shown in steps.
for the sequence θ(ω), which is defined as the smooth minimum of two hyperbolic functions of frequency, optimization was used again. The concept is that after building functions to calculate the sample and theoretical moments, computational tools have to minimize the difference between the sample (empirical) and the theoretical moments. In the second case, the difference is that this happens for two scales and that the square error is minimized. The output is the θ(ω) sequence. With knowledge of the power spectrum and θ(ω) we are able to calculate the AMA coefficients from Equation (20), i.e., the sequence. After this procedure, the synthetic time series can be simulated by applying Equation (9).
Specifically, this methodology generates time series that preserve irreversibility at two timescales along with the second-order dependence properties of the process. For example, if we choose as the basic scale the one-hour scale, the second scale for the irreversibility to be preserved is two hours. In Figure 1, the complete methodology is shown in steps.

Irreversibility Investigation from a Large Dataset
Streamflow time series at a 15 min resolution were downloaded from the water department of the United States Geological Survey (USGS). In total, we studied 762 stations across the USA [11]. This particular dataset was chosen for three reasons: (1) it contains fine-scale streamflow data, which were necessary for the investigation of the temporal irreversibility; (2) the large spatial coverage of the database including different catchments allowed for a broad spectrum of investigation; and (3) the reliability of the USGS data.

Irreversibility Investigation from a Large Dataset
Streamflow time series at a 15 min resolution were downloaded from the water department of the United States Geological Survey (USGS). In total, we studied 762 stations across the USA [11]. This particular dataset was chosen for three reasons: (1) it contains fine-scale streamflow data, which were necessary for the investigation of the temporal irreversibility; (2) the large spatial coverage of the database including different catchments allowed for a broad spectrum of investigation; and (3) the reliability of the USGS data.
The results from the estimation of time irreversibility up to a timescale of 100 h are shown in Figure 2. The mean skewness ratios on scales 1 and 2 h are 2.51 and 1.9, respectively. The variance of the skewness ratio is 43.58 for scale 1 h, and 7.65 for scale 2 h. It is observed that the irreversibility is on average very pronounced at the first few scales and then slowly converges to zero at the scale of four days, where the process becomes approximately reversible. However, a large variability is noted among the stations.
The results from the estimation of time irreversibility up to a timescale of 100 h are shown in Figure 2. The mean skewness ratios on scales 1 and 2 h are 2.51 and 1.9, respectively. The variance of the skewness ratio is 43.58 for scale 1 h, and 7.65 for scale 2 h. It is observed that the irreversibility is on average very pronounced at the first few scales and then slowly converges to zero at the scale of four days, where the process becomes approximately reversible. However, a large variability is noted among the stations.

Case Study: Monocacy River
In Figure 3, the irreversibility of the Monocacy River at Bridgeport is shown scalewise. The ratio at scale 1 h is calculated as: = 1.390 and at scale 2 h as: = 1.197. As seen in Figure 3, we may reasonably assume that the physical process becomes reversible at scale 100 (approximately four days).

Case Study: Monocacy River
In Figure 3, the irreversibility of the Monocacy River at Bridgeport is shown scale-wise. The ratio at scale 1 h is calculated as: r 1 = 1.390 and at scale 2 h as: r 2 = 1.197. As seen in Figure 3, we may reasonably assume that the physical process becomes reversible at scale 100 (approximately four days).
The results from the estimation of time irreversibility up to a timescale of 100 h are shown in Figure 2. The mean skewness ratios on scales 1 and 2 h are 2.51 and 1.9, respectively. The variance of the skewness ratio is 43.58 for scale 1 h, and 7.65 for scale 2 h. It is observed that the irreversibility is on average very pronounced at the first few scales and then slowly converges to zero at the scale of four days, where the process becomes approximately reversible. However, a large variability is noted among the stations.

Case Study: Monocacy River
In Figure 3, the irreversibility of the Monocacy River at Bridgeport is shown scalewise. The ratio at scale 1 h is calculated as: = 1.390 and at scale 2 h as: = 1.197. As seen in Figure 3, we may reasonably assume that the physical process becomes reversible at scale 100 (approximately four days).   Next, for the preservation of time irreversibility at the second scale, the ( ) function needs to be estimated as the smooth minimum of two hyperbolic functions of frequency. The sequence that is found after optimization is shown in Figure 5. It may seem a continuous smooth line, but in reality it comprises 1024 × 2 + 1 coefficients . After estimating the process's dependence characteristics and time asymmetry, we produce synthetic series with a similar stochastic behaviour. In Figure 6a, we show the results from 100 simulations of 10,000 length for the original method, while the results of the modified one are shown in Figure 6b. At each scale we compare the average of the simulations to the target irreversibility. It is shown that the irreversibility targets are adequately achieved. It is also observed that the first method cannot efficiently achieve the second scale target as was expected. However, it is close to it.
Additional research could be directed towards quantifying time asymmetry worldwide and connecting it with conceptual characteristics of the basin, e.g., the surface area. Downstream stations were observed to have a higher time irreversibility index than the upstream ones. This could also be a topic of future research. Furthermore, other parametric odd functions such as the ( ) function could be explored to preserve Next, for the preservation of time irreversibility at the second scale, the θ(ω) function needs to be estimated as the smooth minimum of two hyperbolic functions of frequency. The sequence that is found after optimization is shown in Figure 5. It may seem a continuous smooth line, but in reality it comprises 1024 × 2 + 1 coefficients θ ω j .
As described before, the Filtered Hurst Kolmogorov model is fitted to the approximately deseasonalised time series through both the climacogram (Figure 4a) and climacospectrum (Figure 4b). The parameters were calculated as: = 19.399 h, = 0.628 and = 0.724. Next, for the preservation of time irreversibility at the second scale, the ( ) function needs to be estimated as the smooth minimum of two hyperbolic functions of frequency. The sequence that is found after optimization is shown in Figure 5. It may seem a continuous smooth line, but in reality it comprises 1024 × 2 + 1 coefficients . Figure 5. The sequence of values preserving irreversibility at both scales simultaneously for j > 0. (Note that for j = 0, θ(ω0) = 0 for j < 0, θ(ωj) = -θ(ω-j ).
After estimating the process's dependence characteristics and time asymmetry, we produce synthetic series with a similar stochastic behaviour. In Figure 6a, we show the results from 100 simulations of 10,000 length for the original method, while the results of the modified one are shown in Figure 6b. At each scale we compare the average of the simulations to the target irreversibility. It is shown that the irreversibility targets are adequately achieved. It is also observed that the first method cannot efficiently achieve the second scale target as was expected. However, it is close to it.
Additional research could be directed towards quantifying time asymmetry worldwide and connecting it with conceptual characteristics of the basin, e.g., the surface area. Downstream stations were observed to have a higher time irreversibility index than the After estimating the process's dependence characteristics and time asymmetry, we produce synthetic series with a similar stochastic behaviour. In Figure 6a, we show the results from 100 simulations of 10,000 length for the original method, while the results of the modified one are shown in Figure 6b. At each scale we compare the average of the simulations to the target irreversibility. It is shown that the irreversibility targets are adequately achieved. It is also observed that the first method cannot efficiently achieve the second scale target as was expected. However, it is close to it.
Additional research could be directed towards quantifying time asymmetry worldwide and connecting it with conceptual characteristics of the basin, e.g., the surface area. Downstream stations were observed to have a higher time irreversibility index than the upstream ones. This could also be a topic of future research. Furthermore, other parametric odd functions such as the θ(ω) function could be explored to preserve irreversibility at larger scales in an implicit manner. Scaling and time asymmetry is also a subject that requires further investigation. In any case, the proposed methodology is of such generality that it can be used to explicitly preserve the irreversibility at even larger scales along with the process's second-order (or higher) scaling behaviour. irreversibility at larger scales in an implicit manner. Scaling and time asymmetry is also a subject that requires further investigation. In any case, the proposed methodology is of such generality that it can be used to explicitly preserve the irreversibility at even larger scales along with the process's second-order (or higher) scaling behaviour.

Conclusions
Time's arrow has an important role in science and is often related to randomness and uncertainty. We have investigated the time asymmetry of streamflow data at fine timescales in order to assess the importance of taking it into account in streamflow modelling. The irreversibility of the streamflow process was quantified by the ratio of the values of the skewness of the differenced process and that of the original process . We have performed a large sample analysis of the USGS hourly streamflow (762 stations) and found the irreversibility index of the latter to be on the average equal to 2.5 at the 1 h scale and 1.9 at the 2 h scale. The process became approximately reversible around the timescale of four days, yet a large variability was observed among the stations.
Further, this study proposes a modification to the existing method by Koutsoyiannis [3] that preserves irreversibility only at one scale, making it capable of preserving the irreversibility explicitly at two scales. We have validated the proposed method by a case study of measured streamflow timeseries and found it to be successful and able to adequately preserve the irreversibility implicitly at even greater scales than the target ones.
Overall, our results suggest that temporal irreversibility is a marked property of the streamflow process that manifests itself up to several days in empirical records. The proposed modification of the AMA simulation method is an additional step towards achieving its multiscale stochastic modelling in order to generate realistic and theoretically consistent hydrographs.

Conclusions
Time's arrow has an important role in science and is often related to randomness and uncertainty. We have investigated the time asymmetry of streamflow data at fine timescales in order to assess the importance of taking it into account in streamflow modelling. The irreversibility of the streamflow process was quantified by the ratio of the values of the skewness of the differenced process C s and that of the original process C s . We have performed a large sample analysis of the USGS hourly streamflow (762 stations) and found the irreversibility index of the latter to be on the average equal to 2.5 at the 1 h scale and 1.9 at the 2 h scale. The process became approximately reversible around the timescale of four days, yet a large variability was observed among the stations.
Further, this study proposes a modification to the existing method by Koutsoyiannis [3] that preserves irreversibility only at one scale, making it capable of preserving the irreversibility explicitly at two scales. We have validated the proposed method by a case study of measured streamflow timeseries and found it to be successful and able to adequately preserve the irreversibility implicitly at even greater scales than the target ones.
Overall, our results suggest that temporal irreversibility is a marked property of the streamflow process that manifests itself up to several days in empirical records. The proposed modification of the AMA simulation method is an additional step towards achieving its multiscale stochastic modelling in order to generate realistic and theoretically consistent hydrographs.  Data Availability Statement: All the data used in this study are available online at https://waterdata. usgs.gov/nwis/rt (accessed on 7 April 2021).