Copula Bias Correction for Extreme Precipitation in Reanalysis Data over a Greek Catchment †

Projecting extreme precipitation events, which cause severe socioeconomic impacts with increasing frequency, with higher accuracy and reliability is considered a priority research topic in the scientific community. Although large-scale initiatives for monitoring meteorological and hydrological variables exist, the lack of data is still evident, particularly in regions with complex topography. This gap leads to the use of reanalysis data or data derived from regional climate models; however, both datasets are biased relative to observations, which results in inaccurate outputs in hydrological studies. The current research presents a newly developed statistical method for the bias correction of the maximum rainfall amount at the watershed scale. In particular, the proposed approach couples a spatial distribution method, namely Thiessen polygons, with a multivariate probabilistic distribution method, namely copulas, for the bias correction of the maximum precipitation. The case study area is the Nestos River basin, where several recorded extreme episodes have had direct impacts on the regional agricultural economy. Using daily data from three monitoring stations and daily reanalysis precipitation values from the grid points closest to these stations, the results demonstrate that the bias-corrected maximum precipitation totals (percentiles above 90%) are much closer to the observed maximum precipitation totals, whereas the respective reanalysis values underestimate them. The overall improvement of the output shows that the proposed Thiessen-copula method could constitute a significant asset to hydrologic simulations.


Introduction
Extreme precipitation episodes have been observed more frequently in recent decades, and their intensity is greater than in the past [1]. The impacts of these extremes are evident in many fields, such as the economy, society, agriculture and hydrology, which creates a need for reliable projections. According to Mao et al. [2], achieving higher accuracy in hydrological results requires correcting the bias between a model's values and the real values of several climate parameters, especially precipitation. Where observed data are lacking, the use of reanalysis data has been proposed as a scientifically proven solution. Bastola and Misra [3] in particular demonstrate that reanalysis data are useful for simulating realistic hydrological response at watershed scales. At the same time, although precipitation estimates from global reanalysis are dynamically consistent with large-scale circulation, they compare poorly with rain gauge estimates, since reanalysis precipitation is forecast by the reanalysis system and precipitation observations are not assimilated [4].
Numerous studies in the fields of insurance and finance have applied the copula method [5,6]. The utility of the copula method rests on its ability to analyze the dependence of two or more random variables that do not necessarily have the same distribution [7]. Additionally, copulas can capture the features of the dependence [8] and examine it beyond the linear association measured by other indices [2]. As such, copulas have recently been used widely in hydrology. Shiau [9] suggests copulas for drought analysis in order to overcome the problem that drought characteristics have different distributions. Several scientists use copulas for analyzing drought characteristics (e.g., severity and intensity) [10,11] or for correlating drought with other climate parameters such as precipitation [12]. Furthermore, Golian [13] used a bivariate copula function to study the rainfall-runoff simulations of a watershed in Iran, while Perera [14] used this method to study the interdependence between the Kelani River and Kotte Canal in Sri Lanka.
The present study investigates the combination of the copula probabilistic distribution method with the Thiessen polygon spatial distribution method to tackle the bias correction of extreme precipitation in reanalysis data in the hydrologically important Nestos River basin in Greece. Thiessen polygons have been widely used in hydrology for spatial interpolation since the work of Lee and Schachter in 1980 [15]. The results were evaluated using statistical tools such as Taylor diagrams and relative operating characteristic (ROC) curves. The latter are popular in clinical epidemiology, where they test the accuracy of a diagnosis [16]; in this study they are used to check the accuracy of the bias-corrected values.

Data
The present study uses daily precipitation data from four meteorological stations located in the Nestos catchment (Figure 1a). Apart from the observed precipitation records, ERA-Interim reanalysis data with a spatial resolution of 12.5 × 12.5 km were retrieved from the European Centre for Medium-Range Weather Forecasts (ECMWF) for the selected case study area. For every station, the closest continental grid point with similar topographic characteristics was selected. Both the reanalysis data and the observed records cover a period of nine consecutive years, i.e., from 1987 to 1995.
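The grid-point selection can be sketched as a nearest-neighbour search on great-circle distance. The snippet below is a minimal illustration, not the procedure used in the study: the coordinates, the `is_land` flag and the function names are hypothetical, and the check for similar topographic characteristics is not automated here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_land_grid_point(station, grid_points):
    """Return the land grid point closest to the station.

    station: (lat, lon); grid_points: list of dicts with 'lat', 'lon', 'is_land'.
    """
    candidates = [g for g in grid_points if g["is_land"]]
    if not candidates:
        raise ValueError("no continental grid point available")
    return min(
        candidates,
        key=lambda g: haversine_km(station[0], station[1], g["lat"], g["lon"]),
    )


# Hypothetical station and grid coordinates, for illustration only.
station = (41.30, 24.50)
grid = [
    {"lat": 41.250, "lon": 24.500, "is_land": True},
    {"lat": 41.375, "lon": 24.500, "is_land": True},
    {"lat": 41.300, "lon": 24.510, "is_land": False},  # closest, but flagged as sea
]
best = nearest_land_grid_point(station, grid)
```

In this toy grid the third point is nearest but is excluded by the land flag, so the first point is returned. A topographic similarity check (for example on elevation) would have to be applied on top of this distance ranking.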

Methodology
According to Nelsen [17], copulas are multivariate cumulative distribution functions that mathematically model the dependence between two or more variables using their marginal distributions. Assuming that X and Y are two random variables with marginal distributions F and G respectively, their joint distribution function H equals the copula function of their marginals: H(x,y) = C(F(x), G(y)). The central theorem of copulas is Sklar's theorem [18]. According to it, if the marginals of the studied variables are continuous, then the copula function C is uniquely defined; otherwise, C is unique on RanF × RanG (where Ran denotes the range). The importance of this theorem is that every joint distribution function can be decomposed into the marginals of the variables and a copula function, which completely describes their dependence.
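For reference, Sklar's decomposition and its inverse form can be written compactly as follows. The symbols H, C, F and G are those of the paragraph above; the uniform arguments u, v and the quasi-inverses F^{-1}, G^{-1} are notation introduced here, and the second line assumes continuous marginals.

```latex
\begin{align}
  H(x, y) &= C\bigl(F(x),\, G(y)\bigr), & x, y &\in \mathbb{R}, \\
  C(u, v) &= H\bigl(F^{-1}(u),\, G^{-1}(v)\bigr), & u, v &\in [0, 1].
\end{align}
```

The second identity is what makes copula fitting practical: once the marginals are transformed to uniform scores, the dependence can be modelled separately from the marginal behaviour.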
In the present study, the copula method was combined with Thiessen triangles, a variation of the Thiessen polygon method, to achieve a bias correction of total extreme precipitation between real observations and reanalysis data. First, three of the four available stations (stations 1, 2 and 4) were used for the analysis, while the remaining one (station 3) was used for evaluation. Specifically, the three selected stations form a triangle that contains the tested station (Figure 1b).
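Whether an evaluation point lies inside the triangle formed by three stations can be checked with barycentric coordinates. The sketch below is only a geometric illustration under the assumption of planar (projected) coordinates; the function names and the example coordinates are hypothetical and do not come from the study.

```python
def barycentric_coords(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c).

    All points are (x, y) tuples in a planar (projected) coordinate system.
    """
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def inside_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle (a, b, c)."""
    return all(w >= 0.0 for w in barycentric_coords(p, a, b, c))


# Hypothetical planar coordinates (km) for three vertex stations and an x-point.
st1, st2, st3 = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
x_point = (1.0, 1.0)
weights = barycentric_coords(x_point, st1, st2, st3)
```

For the toy coordinates above, the x-point is inside the triangle and the three weights sum to one.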
For the stations located at the triangle vertices, the absolute maximum and the monthly precipitation were used to model the dependence between them. Twelve copula families (Gaussian, Student t, Clayton, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, Tawn type 1 and Tawn type 2), including both Archimedean and elliptical families, were tested in order to select the one that best describes the dependence. The final selection was based on Akaike's information criterion [19] and the Bayesian information criterion. Thereafter, using the copula results from the stations at the triangle vertices and taking into account the distances between them and the evaluated station (x-point), a new copula family was defined (Figure 1c). The newly defined copula family mathematically describes the dependence between the mean and maximum precipitation at the x-point. Consequently, using the new copula family and the reanalysis mean precipitation data, the bias-corrected maximum extreme precipitation was calculated.
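The family-selection step can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it compares only three one-parameter families (Clayton, Survival Clayton and Frank), fits each by a coarse grid search on the pseudo-likelihood, and uses synthetic data generated in the snippet. All function names, the parameter grid and the sample are assumptions made for illustration.

```python
import math
import random


def clayton_logpdf(u, v, theta):
    """Log-density of the Clayton copula (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def survival_clayton_logpdf(u, v, theta):
    """Survival (180-degree rotated) Clayton: Clayton density at (1-u, 1-v)."""
    return clayton_logpdf(1.0 - u, 1.0 - v, theta)


def frank_logpdf(u, v, theta):
    """Log-density of the Frank copula (theta > 0, positive dependence)."""
    a = -math.expm1(-theta)  # 1 - exp(-theta)
    denom = a - math.expm1(-theta * u) * math.expm1(-theta * v)
    return math.log(theta * a) - theta * (u + v) - 2.0 * math.log(denom)


def pseudo_observations(values):
    """Rank-transform a sample to (0, 1) using rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return [r / (n + 1) for r in ranks]


def fit_and_score(logpdf, u, v, grid):
    """Grid-search maximum pseudo-likelihood; returns (theta, AIC, BIC)."""
    best_theta, best_ll = max(
        ((t, sum(logpdf(a, b, t) for a, b in zip(u, v))) for t in grid),
        key=lambda pair: pair[1],
    )
    k, n = 1, len(u)  # one parameter per family in this sketch
    return best_theta, 2 * k - 2 * best_ll, k * math.log(n) - 2 * best_ll


# Synthetic sample with upper-tail dependence (Survival Clayton, theta = 2).
random.seed(0)
theta_true, n = 2.0, 300
x, y = [], []
for _ in range(n):
    p, w = 1.0 - random.random(), 1.0 - random.random()  # uniforms in (0, 1]
    q = ((w ** (-theta_true / (1.0 + theta_true)) - 1.0) * p ** -theta_true
         + 1.0) ** (-1.0 / theta_true)  # conditional sampling of a Clayton pair
    x.append(1.0 - p)  # flipping both margins gives a Survival Clayton sample
    y.append(1.0 - q)

u, v = pseudo_observations(x), pseudo_observations(y)
grid = [0.1 * i for i in range(1, 101)]  # theta from 0.1 to 10.0
families = {
    "Clayton": clayton_logpdf,
    "Survival Clayton": survival_clayton_logpdf,
    "Frank": frank_logpdf,
}
scores = {name: fit_and_score(f, u, v, grid) for name, f in families.items()}
best_by_aic = min(scores, key=lambda name: scores[name][1])
```

Because every family here has a single parameter, AIC and BIC rank the candidates identically; the two criteria can disagree only when families with different numbers of parameters (such as the BB or Tawn families) are included.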

Total and Extreme Precipitation Analysis
The dependence between the absolute maximum and the monthly precipitation was assessed using copulas at three of the four studied stations. After testing several copula families, it was found that the Survival Clayton family best reflects the dependence for both station 1 and station 4, while the Frank copula reflects the dependence for station 2. The dependence is stronger for the Prasinada station (station 2), where Kendall's tau is almost 0.87, while it is almost 0.78 for the other two stations. Figure 2 visualizes that relationship, confirming that the Survival Clayton copula presents upper-tail dependence (stations 1 and 4) while the Frank copula presents neither upper- nor lower-tail dependence (station 2).
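To give a feel for what a Kendall's tau of about 0.78 implies for a Survival Clayton copula, the sketch below computes tau for a paired sample and converts a given tau into the copula parameter and the upper-tail dependence coefficient. The conversion uses the standard Clayton relations tau = theta / (theta + 2) and lambda = 2^(-1/theta); estimating theta by inverting tau is an assumption made here for illustration, since the estimation method of the study is not stated. The function names are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a for paired samples (O(n^2); tied pairs contribute 0)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return 2.0 * s / (n * (n - 1))


def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2) for the (Survival) Clayton copula."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1) for this inversion")
    return 2.0 * tau / (1.0 - tau)


def survival_clayton_upper_tail(theta):
    """Upper-tail dependence coefficient of the Survival Clayton copula."""
    return 2.0 ** (-1.0 / theta)


# Value reported in the text for the two Survival Clayton stations.
theta = clayton_theta_from_tau(0.78)
lam_u = survival_clayton_upper_tail(theta)
```

With tau = 0.78 this gives theta of roughly 7.1 and an upper-tail dependence coefficient of roughly 0.91, which indicates that very large monthly totals and very large maxima tend to occur together.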

Bias Correction Results
Reanalysis data present important biases relative to real observations, especially for precipitation parameters and extreme events. Consequently, the main purpose of this study is to reduce extreme rainfall biases between real and reanalysis data at the scale of a hydrological basin. Table 1 presents the observed, reanalysis and bias-corrected extreme precipitation indices for the station of Sidironero. Extreme precipitation indices are defined as the 90th, 95th and 99th percentiles of the monthly precipitation [20].
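The percentile-based indices can be computed as in the following minimal sketch. It assumes linear interpolation between order statistics, which is one of several common percentile conventions and may differ from the one used in the study; the function names and the input series are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    if not values:
        raise ValueError("empty sample")
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0  # fractional index into the sorted sample
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def extreme_indices(monthly_precip_mm):
    """90th, 95th and 99th percentiles of a monthly precipitation series (mm)."""
    return {f"P{q}": percentile(monthly_precip_mm, q) for q in (90, 95, 99)}


# Hypothetical series 0, 1, ..., 100 mm, chosen so the result is easy to verify.
indices = extreme_indices(list(range(101)))
```

Applying the same function to the observed, reanalysis and bias-corrected series yields the three columns that a comparison such as Table 1 requires.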
As can be seen from Table 1, for all indices the bias-corrected values are closer to the observed ones than the reanalysis data are. Additionally, as the percentile increases, the reanalysis values deviate further from the observations, while the bias-corrected values remain closer to the observed ones. This is also supported by the Taylor diagram (Figure 3a) of the studied data sets. In particular, the Taylor diagram shows that the correlation between observed and reanalysis data was almost zero, while after the bias correction the correlation increased to 0.5. Additionally, the root mean square error was reduced, while the variation increased. An additional evaluation of the results was conducted using ROC curves. According to Figure 3b, the area under the curve, which is an effective measure of accuracy [21], is larger for the bias-corrected values, indicating that the Thiessen-copula method can be used for the bias correction of extreme precipitation.
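The quantities summarized by a Taylor diagram and by the area under a ROC curve can be computed as sketched below. This is an illustrative implementation, not the evaluation code of the study: the function names and the sample values are hypothetical. The error shown is the centered root mean square difference used in the standard Taylor construction, and the event definition for the ROC analysis (for example, exceedance of a percentile threshold) is an assumption.

```python
import math


def taylor_stats(reference, model):
    """Return (sigma_ref, sigma_model, correlation, centered RMS difference)."""
    n = len(reference)
    mean_ref, mean_mod = sum(reference) / n, sum(model) / n
    d_ref = [r - mean_ref for r in reference]
    d_mod = [m - mean_mod for m in model]
    s_ref = math.sqrt(sum(d * d for d in d_ref) / n)
    s_mod = math.sqrt(sum(d * d for d in d_mod) / n)
    corr = sum(a * b for a, b in zip(d_ref, d_mod)) / (n * s_ref * s_mod)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(d_ref, d_mod)) / n)
    return s_ref, s_mod, corr, crmsd


def roc_auc(is_event, score):
    """AUC as the probability that an event outscores a non-event (ties count 0.5)."""
    pos = [s for e, s in zip(is_event, score) if e]
    neg = [s for e, s in zip(is_event, score) if not e]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed and modelled values (mm), for illustration only.
obs = [10.0, 40.0, 25.0, 80.0, 55.0]
mod = [12.0, 30.0, 20.0, 60.0, 58.0]
s_ref, s_mod, corr, crmsd = taylor_stats(obs, mod)

# Hypothetical event flags and scores for the ROC example.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The three Taylor statistics are linked by the law of cosines, crmsd² = s_ref² + s_mod² − 2·s_ref·s_mod·corr, which is why a single point on the diagram conveys correlation, variability and error at once.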

Discussion-Conclusions
The bias correction of extreme rainfall by coupling the copula method with a variation of the Thiessen polygon method is presented. The proposed method adjusts the extreme reanalysis precipitation data to observed data in a selected river basin in Greece. The method was evaluated by comparing the bias-corrected values with observed datasets using different statistical and visual methods.
Lafon et al. [22] and Teutschbein and Seibert [23] have studied and compared different bias correction methods in hydrology. Methods such as the linear scaling approach, the delta change method [24] or local intensity scaling [25] use simple mathematical equations for bias correction. However, they mainly focus on estimating mean values without extending to the whole distribution [2].
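As a point of comparison, multiplicative linear scaling of precipitation can be sketched in a few lines. This is a simplified illustration with hypothetical names and data: a single scaling factor is used, whereas applications typically compute one factor per calendar month.

```python
def linear_scaling(raw, obs_calib, raw_calib):
    """Scale model precipitation so its calibration-period mean matches observations."""
    mean_raw = sum(raw_calib) / len(raw_calib)
    if mean_raw == 0:
        raise ValueError("calibration-period model mean is zero")
    factor = (sum(obs_calib) / len(obs_calib)) / mean_raw
    return [p * factor for p in raw]


# Hypothetical calibration data (mm): same mean after scaling, different extremes.
observed = [0.0, 0.0, 30.0]
modelled = [5.0, 5.0, 5.0]
corrected = linear_scaling(modelled, observed, modelled)
```

In this toy case the corrected series reproduces the observed mean of 10 mm but its maximum stays at 10 mm against an observed 30 mm, which illustrates why mean-based corrections are limited for extremes.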
Additionally, as Yang et al. [26] mention, the accuracy for extremes is much lower even with more dynamic methods. In the same vein, Berg et al. [27] indicate the importance of bias correction of reanalysis products with regard to soil moisture, runoff and snow water equivalent in simulations covering North America.
The results of the present study show that the copula method combined with Thiessen triangles can be an accurate tool for the bias correction of rainfall extremes. The copula method also gave satisfactory results in the study of Mao et al. [2], in which it was used for the bias correction of model precipitation values in Germany. Additionally, Piani [28] proposes the use of copulas for a two-dimensional bias correction of temperature and precipitation in climate models. In accordance with that study, the success of the method derives from the ability of copulas to satisfactorily represent the dependence structure of the studied variables. Notably, the dependence structure is not the same at every station in a specific region, as is also observed at the three studied stations of the presented case study. Furthermore, the bias-corrected extreme precipitation values were not only much closer to the real ones but also had a higher correlation with the observed extremes and a lower root mean square error than the reanalysis data. The need for bias correction is more obvious in the case of climate change, where model outputs need to be bias-corrected before they can be used for climate change impact studies. For that purpose, various methods such as quantile mapping, the cumulative distribution function transform (CDF-t) and equidistant quantile matching are presented in the literature [29]. In conclusion, this investigation proposes the copula method in combination with the Thiessen triangles technique as a useful tool for the bias correction of extreme precipitation. It is also believed that this method would be a fruitful area for further application in large river basins.
Author Contributions: All authors conceived and initiated the study. G.L. analyzed the data. All authors contributed to the discussion and interpretation of the results and the writing of the manuscript.

Figure 1. (a) Map of Greece; the red circle marks the studied region. (b) Zoom of the Nestos region showing the four studied stations and the studied triangle (1 = Achladia, 2 = Prasinada, 3 = Sidironero, 4 = Toxotes). (c) St1 to St3 indicate the three stations at the triangle vertices; the x-point is the unknown station and dist1 to dist3 are the distances between the x-point and the three stations.

Figure 2. Structure of values upon which dependence is modelled by Survival Clayton or Frank copulas.

Figure 3. (a) Taylor diagram of extreme precipitation at Sidironero. The blue circle represents the reanalysis data and the red circle the bias-corrected data. (b) Relative operating characteristic (ROC) curves of extreme precipitation at Sidironero. The black line concerns the observed extreme values and the red line the bias-corrected values.

Table 1. Extreme precipitation indices (mm) from observed, reanalysis and bias-corrected data.