Stochastic Analysis of Hourly to Monthly Potential Evapotranspiration with a Focus on the Long-Range Dependence and Application with Reanalysis and Ground-Station Data

: The stochastic structures of potential evaporation and evapotranspiration (PEV and PET or ETo) are analyzed using the ERA5 hourly reanalysis data and the Penman–Monteith model applied to the well-known CIMIS network. The latter includes high-quality ground meteorological samples with long lengths and simultaneous measurements of monthly incoming shortwave radiation, temperature, relative humidity, and wind speed. It is found that both the PEV and PET processes exhibit a moderate long-range dependence structure with a Hurst parameter of 0.64 and 0.69, respectively. Additionally, it is noted that their marginal structures are found to be light-tailed when estimated through the Pareto–Burr–Feller distribution function. Both results are consistent with the global-scale hydrological-cycle path, determined by all the above variables and rainfall, in terms of the marginal and dependence structures. Finally, it is discussed how the existence of, even moderate, long-range dependence can increase the variability and uncertainty of both processes and, thus, limit their predictability.


Introduction
Evapotranspiration is a paramount element in hydrology, with relevance in many aspects of the geosciences. From hydrological and agronomic perspectives, the potential evapotranspiration (PET) and (potential) evaporation (PEV) are key for water balance estimation, the assessment of crop water demand, and integrated rainfall-runoff modelling. PET [1] is defined as "the amount of water transpired in a given time by a short green crop, completely shading the ground, of uniform height and with adequate water status in the soil profile". A particular (reference) case thereof is the reference evapotranspiration (ETo), which refers to "the rate of evapotranspiration from a hypothetical reference crop with an assumed crop height of 0.12 m, a fixed surface resistance of 70 s/m, and an albedo of 0.23, closely resembling the evapotranspiration from an extensive surface of green (cool season) grass of uniform height, actively growing, well-watered, and completely shading the ground" [2]. Evaporation is the physical process by which liquid water enters the atmosphere as water vapor. In what follows, when we refer to all of the above processes, we use the acronym PE. We also note that PE may be different from the actual evapo(transpi)ration (in cases where there is not adequate water availability).
For the PE assessment, historically, many models have been developed highlighting the Penman-Monteith model as the most suitable [3]. One of the main shortcomings of estimating PE with the Penman-Monteith model is the requirement of a significant number of meteorological inputs such as, without distinction, temperature, radiation, relative so that we could compare its stochastic structures to the PET records, with a focus on the long-range dependence (LRD) behavior.
In Section 2, we introduce the methodology on the estimation of the marginal and second-order dependence structures, while in Section 3, we present the statistical characteristics of the selected stations as well as the results obtained from the analysis, with a focus on the marginal and the dependence structures. Finally, in Sections 4 and 5, we summarize our findings, and we discuss how the results may be consistent with the ones obtained from the hydrological-cycle path under HK dynamics, expanding from Gaussian to Pareto-type tail behavior, and from fractal and intermittent behavior at small scales to LRD behavior at large scales.

Metrics of Marginal and Dependence Structures
The estimators and models applied for both the marginal and the second-order dependence structures are part of the stochastic framework of the HK dynamics, with a focus on the LRD behavior [30][31][32][33][34][35], and they have been applied to turbulent and key hydrologicalcycle processes of global networks with resolutions spanning from small scales (relevant to the fractal behavior) to climatic scales (for a review, see [26]).
It has been shown that a flexible probability distribution function, which seems to fit well a great variety of key hydrological-cycle processes [25,26], with tail-behaviors ranging from Gaussian to Pareto, is the PBF distribution function [24,[36][37][38], i.e.: where x > d, d is a location parameter (in units of x), ζ and ξ are dimensionless shape parameters, and λ is a scale parameter (in units of x). It is noted that here the Dutch convention is adopted, where underlined symbols denote random variables and stochastic processes. The estimation of the parameters of the PBF distribution function for the identification of the marginal structure of the PE process is based on the first four statistical moments, and particularly on the central moments and coefficients (i.e., mean, variance, skewness, and kurtosis). It is stressed that, although the estimation from the classical moments of high order are unknowable, especially in the presence of heavy tails and LRD [25], the hourly PEV and the monthly PET processes are expected to be close to a light-tail behavior and, therefore, the estimation of skewness and kurtosis coefficients could be, in approximation, reliably estimated from data.
For the dependence structure of the PE processes, we select the climacogram metric, which is defined as the variance of the averaged process at the scale domain [7]. i.e.: where k is the scale (in units of x). (See discussion on the origins of the name, mathematical definitions, etc., in [26,39]) It has been shown that the climacogram estimator at the scale domain is a more powerful estimator than the autocovariance function at the lag domain or the powerspectrum at the frequency domain [34], while its classical estimator adjusted for bias is defined as [40]:γ 4 of 14 where κ = k/∆ is the dimensionless scale, ∆ is the time resolution of the process,μ is the mean of the process, n/κ is the integer part of n/κ, and x (κ) i is the i-th element of the averaged sample of the process at scale κ, i.e.: For the climacogram model, contained in the above estimator, we select a generalization of the HK model (for details and more sophisticated models, see [25,26]), which has been shown to well simulate processes from sub-hourly to over-annual resolutions, and from short-to long-term scales associated with fractal and LRD behaviors that exclude the drop of variance at the intermediate scales: where a is the variance of the process, q is a scale parameter (in units of the scale k), M is the fractal parameter, and H is the Hurst parameter indicative of the LRD of the process, i.e., for 0.5 < H < 1 the process exhibits LRD behavior, while for 0 < H < 0.5 it exhibits an anti-persistent behavior, and for H = 0.5 a white-noise behavior. Here, the standardized climacogram is used, i.e.,γ(k)/γ(1), because the effect of the sample variance is already accounted for through the marginal fitting. We also note that a Gaussian process with q → 0 and M = 0.5 coincides with the well-known fractional Gaussian noise model (e.g., [41]).

Data Extraction and Processing
For the analysis of the hourly PEV process, we use the reanalysis ensemble data extracted (access date at 29/10/2021; with coordinates S32-N42 and W115-E125) from the ERA5 [42] of the Centre for Medium-Range Weather Forecasts (ECMWF; https://cds.climate.copernicus.eu/ accessed on 1 October 2021) across California ( Figure 1) and for the period 1979-today (Table 1).    Observations for the PE processes are usually available in monthly or daily resolutions and usually only for short periods, while a global gridded dataset based on the ERA5 data has been recently released [43]. Here, we use two datasets and compare the marginal and dependence structures of the reanalysis PEV timeseries with the PET timeseries of coarser monthly resolution, extracted from a network of 41 ground stations (see details in Table 1 and Figures 2 and 3). Particularly, the monthly Penman-Monteith dataset of the CIMIS network is used, in which reference evapotranspiration and potential evapotranspiration coincide due to local surface and vegetation conditions. The samples of 41 meteorological stations (https://cimis.water.ca.gov/, accessed on 1 October 2021) are well-distributed across California (Figure 1) for the period 1983-2013 (Table 1), which corresponds to a maximum of 372 monthly values. The meteorological network has been developed in cooperation with Davis University, and the local environment of the meteorological stations allow accurate estimation of the PET.     To account for the impact of the double periodicity (diurnal and seasonal) of the PE processes on the dependence structure, we simulate the transformed process by applying a double standardization on the original timeseries. Particularly, we subtract the hourly and monthly means (Figures 4 and 5) and then we divide with the hourly and monthly standard deviations (Figures 6 and 7). Other transformation methods could be applied that take into consideration higher moments (e.g., [26]) such as skewness ( Figure 8) and kurtosis ( Figure 9) coefficients, or even more sophisticated ones [44]; however, as can be derived from Table 1 and Figures 6 and 7, the PE processes (especially the aggregated PET process) is close to a light-tail distribution, and therefore we do not expect any significant differences by applying those methods. After the double standardization, we de-stand-  To account for the impact of the double periodicity (diurnal and seasonal) of the PE processes on the dependence structure, we simulate the transformed process by applying a double standardization on the original timeseries. Particularly, we subtract the hourly and monthly means (Figures 4 and 5) and then we divide with the hourly and monthly standard deviations (Figures 6 and 7). Other transformation methods could be applied that take into consideration higher moments (e.g., [26]) such as skewness ( Figure 8) and kurtosis  Figure 9) coefficients, or even more sophisticated ones [44]; however, as can be derived from Table 1 and Figures 6 and 7, the PE processes (especially the aggregated PET process) is close to a light-tail distribution, and therefore we do not expect any significant differences by applying those methods. After the double standardization, we de-standardize each timeseries based on the total mean and standard deviation of the original timeseries (Table 1 and Figure 10). Finally, we fit the marginal and dependence models described in the previous section to each transformed timeseries, and the results are depicted and described in the next section. To account for the impact of the double periodicity (diurnal and seasonal) of the PE processes on the dependence structure, we simulate the transformed process by applying a double standardization on the original timeseries. Particularly, we subtract the hourly and monthly means (Figures 4 and 5) and then we divide with the hourly and monthly standard deviations (Figures 6 and 7). Other transformation methods could be applied that take into consideration higher moments (e.g., [26]) such as skewness ( Figure 8) and kurtosis (Figure 9) coefficients, or even more sophisticated ones [44]; however, as can be derived from Table 1 and Figures 6 and 7, the PE processes (especially the aggregated PET process) is close to a light-tail distribution, and therefore we do not expect any significant differences by applying those methods. After the double standardization, we de-standardize each timeseries based on the total mean and standard deviation of the original timeseries (Table 1 and Figure 10). Finally, we fit the marginal and dependence models described in the previous section to each transformed timeseries, and the results are depicted and described in the next section.

Results
The PBF marginal distribution function is fitted to each transformed timeseries (e.g., Figure 11), and the parameters for all transformed timeseries can be seen in Table 2. Note that the fit of the PBF to all timeseries is exceptionally good. From Table 2, it can be observed that the transformed PEV and PET processes exhibit a light-tail behavior. The average values of the shape parameters are estimated as ξ ≈ 0.04 and ζ ≈ 5.7 for the CIMIS dataset, and ξ = 0.08 and ζ = 7.6 for the ERA5 transformed timeseries. It has been shown [25] that the tail index, ξ, does not depend on the averaging scale. Therefore, the slight differences in the estimated values are either due to statistical uncertainty or to differences in the nature of the data. The PBF marginal distribution function is fitted to each transformed timeseries (e.g., Figure 11), and the parameters for all transformed timeseries can be seen in Table 2. Note that the fit of the PBF to all timeseries is exceptionally good. From Table 2, it can be observed that the transformed PEV and PET processes exhibit a light-tail behavior. The average values of the shape parameters are estimated as ξ ≈ 0.04 and ζ ≈ 5.7 for the CIMIS dataset, and ξ = 0.08 and ζ = 7.6 for the ERA5 transformed timeseries. It has been shown [25] that the tail index, ξ, does not depend on the averaging scale. Therefore, the slight differences in the estimated values are either due to statistical uncertainty or to differences in the nature of the data.
Additionally, the combined climacogram from all the empirical ones for the CIMIS transformed timeseries is depicted in Figure 12 and compared to the one from the ERA5 transformed timeseries depicted in Figure 13. It can be observed that a Hurst-Kolmogorov behavior is detected in both data sources, with a Hurst parameter of approximately 0.65. Specifically, the estimated parameters for the CIMIS dataset are H = 0.64 and q = 1.17 months (M is assumed to be 0.5 because the empirical climacogram is very close to an fGn process), and for the ERA5 timeseries they are H = 0.69, q = 19.7 h, and M = 0.8.  Table 2. Parameters of the marginal probability distribution function for all transformed timeseries of each station (note that the squared correlation coefficient is R 2 > 0.99 for all models). The symbols ξ, ζ, λ, and d correspond to Equation. (1).

Sequence
Name  Additionally, the combined climacogram from all the empirical ones for the CIMIS transformed timeseries is depicted in Figure 12 and compared to the one from the ERA5 transformed timeseries depicted in Figure 13. It can be observed that a Hurst-Kolmogorov behavior is detected in both data sources, with a Hurst parameter of approximately 0.65. Specifically, the estimated parameters for the CIMIS dataset are H = 0.64 and q = 1.17 months (M is assumed to be 0.5 because the empirical climacogram is very close to an fGn process), and for the ERA5 timeseries they are H = 0.69, q = 19.7 h, and M = 0.8.

Discussion
Here we discuss how the above results can contribute to the existing literature relating to the potential evaporation and evapotranspiration from the point of view of stochastics and, in particular, of the HK dynamics.
The stochastic analysis of the potential evaporation (PEV) and potential evapotranspiration (PET) presented is useful (a) to highlight the stochastic similarities between them, (b) to quantify the variability and uncertainty of these processes, and (c) to develop a stochastic model capable of simulating important stochastic characteristics, for purposes such as forecasting and risk management. The PEV timeseries is extracted in hourly resolution as a reanalysis ensemble over California and through the ERA5 network, while for

Discussion
Here we discuss how the above results can contribute to the existing literature relating to the potential evaporation and evapotranspiration from the point of view of stochastics and, in particular, of the HK dynamics.
The stochastic analysis of the potential evaporation (PEV) and potential evapotranspiration (PET) presented is useful (a) to highlight the stochastic similarities between them, (b) to quantify the variability and uncertainty of these processes, and (c) to develop a stochastic model capable of simulating important stochastic characteristics, for purposes such as forecasting and risk management. The PEV timeseries is extracted in hourly resolution as a reanalysis ensemble over California and through the ERA5 network, while for the PET, the high-quality CIMIS dataset with 41 stations is used over the same area for comparison.
The analysis of the above three tasks is performed based on the stochastic metrics and Hurst-Kolmgorov (HK) dynamics. Moreover, the marginal structures and secondorder dependence structures are compared to the structures of each other and of other key hydrological-cycle processes such as temperature, relative humidity, wind speed, streamflow, and precipitation, as analyzed from a global network of stations in [25].
In particular, and similar to the global analysis, it is illustrated how the Pareto-Burr-Feller (PBF) probability distribution function may well describe the marginal structure of both the hourly PEV and monthly PET. Additionally, both processes are shown to exhibit a light-tail behavior. However, it is noted that the shape parameters of the PBF (i.e., ξ and ζ), which characterize the type of the tail, are slightly smaller in the CIMIS data (i.e., overall mean from stations 0.04 and 5.7, respectively) as compared to the reanalysis data (i.e., 0.08 and 7.6, respectively), indicating a heavier tail for the latter.
Additionally, it is found that, similarly to the other key hydrological-cycle processes mentioned above, both PEV and PET processes exhibit long-range dependence, with a Hurst parameter of medium strength. In particularly, H is estimated as 0.65 and 0.68 for the PET and PEV processes, respectively, which is weaker than the ones for temperature, relative humidity, solar radiation, and wind speed (0.80-0.85 [25]) and stronger than the one for precipitation (i.e., 0.61 [25]) for the examined range of scales spanning from the hourly resolution to the climatic scales. This can be interpreted as an indication that the PET and PEV processes have a wider predictability time window than precipitation's, and narrower than the rest (i.e., entailing a higher degree of long-term unpredictability).
As a final remark, the need to apply a suitable stochastic model to reproduce important characteristics, such as LRD behavior, is stressed. The work shows the robust use of a stochastic framework to simulate the variability and uncertainty of a hydrometeorological process in emerging new practices and challenges: • Stochastic modelling of evapotranspiration at a fine time scale (e.g., hourly) is considered to be useful for numerous agronomist applications because it is strongly connected to the forecast of the plant water demands. In recent years of micro-farm techniques, the stochastic modelling of evapotranspiration, with sound physicalinterpretation, has tracked the attention of the scientific community in order to simulate more accurately the water-food-energy nexus. • A proper stochastic model for the simulation of the evapotranspiration should be based at a wide range of spatio-temporal scales and meteorological conditions; thus, a global-scale analysis is important in order to identify stochastic similarities so as to improve the simulation techniques. • Stochastic simulation of the error analysis between the modelled and the measured Penman-Monteith assessment could highly contribute to improving potential evapotranspiration estimates. • Stochastic PET modeling could offer a solid probabilistic frame for identifying the long-term trend of hydrometeorological components in horizons greater than the available records and thus is of potential interest for climatological studies.

Conclusions
A stochastic model is presented for hourly potential evaporation (PEV) and monthly potential evapotranspiration (PET) based on the ERA5 hourly reanalysis data and the Penman-Monteith model applied to the well-known CIMIS network.
It was found that both the marginal probability distributions of PEV and PET are lighttailed when estimated through the Pareto-Burr-Feller distribution function. Additionally, the long-range dependence of both the PEV and PET is found to be of moderate strength, quantified through a Hurst parameter of 0.64 and 0.69, respectively.
The above results reveal the stochastic similarities between the ground and reanalysis data series. Additionally, the results are shown to be consistent to the hydrological-path of the marginal and dependence structures of Hurst-Kolmogorov dynamics. In particular, both PET and PEV can be placed between the stochastic structures of temperature, relative humidity, solar radiation, and wind speed (i.e., strong LRD and light-to medium-tail) and the precipitation's structures (i.e., weak LRD and heavy tail). Finally, it is discussed how the existence of, even moderate, long-range dependence and tail distribution increase the variability and uncertainty of both processes, and thus limit their predictability.