Physics Parameterization Selection in RCM and ESM Simulations Revisited: New Supporting Approach Based on Empirical Copulas

This study aims at a new supplementary approach to identify optimal configurations of physics parameterizations in regional climate models (RCMs) and earth system models (ESMs). Traditional approaches separately evaluate variable performance, which may lead to an inappropriate selection of physics parameterization combinations. Besides traditional approaches, we suggest an additional selection approach by considering the joint dependence structure (covariance structure) between key meteorological variables, i.e., precipitation P and temperature T. This is accomplished by empirical P and T copula functions and the χ2-test, and is demonstrated in two locations in Kenya with different major precipitation processes. It is shown that the selection based on traditional approaches alone may lead to nonoptimal decisions in terms of joint dependence structure between P and T. It was found that the copula-based approach may reduce the need for complex multivariate bias correction, as demonstrated using local intensity scaling for P and linear scaling for T. The new approach may contribute to improving RCM and ESM simulations and climate-impact studies worldwide.


Introduction
The fidelity of regional climate models (RCMs) and earth system models (ESMs) in reproducing the observed regional climate provides confidence for the generation of regional climate projections for the future. RCMs and ESMs are still the tools of choice to derive future climate projections for climate-change-impact studies, such as in hydrology and agriculture (e.g., Reference [1]). In these disciplines, precipitation P and temperature T (P and T) belong to the most important meteorological variables (e.g., References [2,3]) used for validation.
Recently, it was found that extreme precipitation markedly rises at higher atmospheric temperatures, faster than the rate of the atmosphere's water-holding capacity (Clausius-Clapeyron equation). As a consequence of temperature increases induced by climate change, intensification of (extreme) precipitation intensities with regionally varying magnitudes is expected [4].
Similarly to the atmosphere, such a link between P and T also exists for the lower atmosphere, next to Earth's surface (e.g., Reference [5]). From this, it becomes obvious that a correct representation of the joint distribution of surface P and T is a critical aspect for generating reliable RCM and ESM projections.
Even though their horizontal resolution is high (typically ranging in the order from 50 to 5 km), RCMs need to represent processes at smaller scales than those they can explicitly resolve. Besides land-surface processes, the most crucial processes to be parameterized in RCMs include radiation, convection, and cloud microphysics, partly with complex interactions. Oettli et al. [1], for instance, found that obtained results from two different physics configurations of the same RCM may be more distinct than the two different RCMs themselves. One of the key challenges is precipitation modeling: in an RCM, its generation is subdivided into a large-scale scheme, accounting for clouds and precipitation that result from atmospheric processes resolved by the models (e.g., cyclones and frontal systems), and a convection scheme, describing clouds and precipitation resulting from subgrid scale-convective processes [3]. Precipitation generation thereby involves many coupled processes between cumulus convection, cloud microphysics, radiation, land and ocean surface, and the planetary boundary layer [6].
The choice of the parameterization schemes has significant impact on climate projections (e.g., Reference [7]). It is therefore indispensable to identify a suitable physics parameterization combination prior to conducting a long-term RCM simulation (e.g., Reference [8]), but also short-term weather forecasts. Both for a full-factorial (e.g., References [9][10][11]) and a one-change-at-a-time approach (e.g., References [12][13][14]), analyses of the performance of different physical-parameter configurations are based on specific metrics, separately evaluated for the climate variables of interest. This is a severe limitation because it might lead to inappropriate selection of a configuration in terms of joint distributions of major variables, which may finally lead to complex and nonlinear biases in RCM simulations [15,16].
Our working hypothesis is that a better reflection of physical consistency, i.e., covariance between key climate variables in RCM and ESM simulations, allows to reduce complex biases that are difficult to correct with postprocessing bias-correction applications, also referred to as Model Output Statistics (MOS) in the following sections. A better reflection of joint distribution (and variability) would finally help boost the reliability of future RCM and ESM projections.
In this study, we employed the concept of empirical copulas to reflect the joint distribution (dependence structure) of key climate variables in RCM and ESM simulations. Here, we focus on precipitation P and temperature T. For this reason, we analyzed the empirical P and T copula from the observations and compared them to simulated P and T copulas of various RCM physics parameterization simulations, known to be the most sensitive, thus reflecting the highest uncertainties in the RCMs and ESMs. Better reflecting joint distribution and covariance helps increase the credibility of future RCM projections.
For many regions worldwide, physics parameterization studies reveal fundamental problems in selecting optimal RCM parameterization configurations: the selected configuration can better reproduce either P or T, and a compromise solution is necessary. Kenya is such a region, with a high demand for reliable climate projections but without common agreement on a suitable parameterization configuration in the literature [17][18][19].

Material and Methods
A short description of the used data and applied methods in this study is briefly given below.

Data
In this study, observational P and T data of the Tana River Basin (TRB) in Kenya were used. Due to the different characteristics of P and T within this region and due to the low number of missing values (<2%), we selected two different stations, namely, MERU and LAMU ( Figure 1) to demonstrate our approach. Specifically for the East African region, only a few efforts so far have been made to identify a suitable physics parameterization configuration (e.g., References [17,19]). However, sensitivity studies from the Coordinated Regional Climate Downscaling Experiment (CORDEX) are provided to asses the performance of physical parameterization for different domains worldwide. Kenya is covered by both the CORDEX-Africa [18] and the CODEX-MENA (e.g., [20,21]) region, but suitable CORDEX parameter combinations are usually selected based on performance statistics for the entire modelling domain, but not for single countries, such as Kenya.
For this reason, we applied the regional Weather Research and Forecasting (WRF) model in the domain from 12 • S to 13 • N, and from 22 • E to 52 • E, encompassing most of East Africa. For this domain, a horizontal resolution of 50 km was used in order to test several physics parameterization combinations. It should be remarked that a full-factorial approach could not be applied due to computational constraints, but simulations cover two relatively long simulation periods, (1) from 2005-2009 (five years) and (2) from 2011-2014 (four years). Both periods comprise a two-month period for spin-up for all WRF configuration simulations. More information about the specific WRF model setup is given in Kerandi et al. [19] and Kerandi et al. [22]. The WRF model is an RCM, but the same approach could be used for any ESM as well.
We are aware of the limitations in terms of the considered observation stations for validation. In this paper, however, the focus is on the presentation of the new approach rather than studying a large number of locations across the TRB.
From the different conducted WRF parameterization runs (Table 1), we extracted modeled P and T tuples from the RCM grid cells in which observation stations MERU and LAMU are located. Practically, only the positive pairs of P and T (i.e., pairs in which P > 0) could be considered (intermittent nature of precipitation). We also cross-checked the selected RCM grid cells by correlation analysis between RCM runs and observations prior to further analysis in order to exclude the displacement effects of the RCM simulations. Table 1. Weather Research and Forecasting (WRF) physics parameterization combinations used in this study (first letter of the abbreviation represents the cumulus convection scheme, the second the microphysics scheme, and the third the planetary boundary layer, which is the Asymmetric Convective Model (ACM) in all cases).

Empirical P and T Copula
Copulas have been extensively used in various fields of hydrology and meteorology, such as for improving the spatial resolution of modeled precipitation [29], and for bias-correcting the output from RCMs (e.g., References [30,31]). They have also been applied in the field of data fusion of different hydrometeorological datasets (e.g., References [32,33]).
Here, only a brief theoretical introduction to the concept of copulas is given, while a more comprehensive description can be found in Reference [34]. In general, copulas are (theoretical) functions that link bi-or multivariate distribution functions F (x 1 , ..., x n ) from independent and identically distributed (iid) random variables x 1 , ..., x n to their univariate marginal distributions F X i (x i ). This is expressed in Sklar's theorem [35]: with multivariate distribution F (x 1 , ..., x n ), copula C, and univariate marginals F (x 1 , ..., x n ). Multivariate probability density function (PDF) is then given through: Thus, dependency between two or more random variables is fully described by the Copula-PDF c, and independent from univariate marginal distributions f X . In this study, we focus on the bivariate case and only on observations (i.e., P and T observations). For this purpose, and under the assumption of a sufficient quantity of observational data, the dependence structure can be directly estimated from the data, i.e., the empirical copula. The description of the empirical copula, also known as Deheuvels' copula, was given in many publications (e.g., References [30,31,36]). Let {r 1 (1), . . ., r 1 (n)} and {r 2 (1), . . ., r 2 (n)} denote the rank space values that are derived from the fitted theoretical marginal distributions. Then, empirical copula C n is a rank-based estimator of C θ : with u = F X (x), v = F Y (y) and 1(. . .) denoting the indicator function and n the sample size. C n is a discontinuous approximation of C θ , to which it converges uniformly. Since it is completely nonparametric, it can be considered to be the most objective approximation of underlying true copula C n (e.g., Reference [37]), which makes it a suitable candidate for GoF test statistics. On the other hand, C n densities are estimated at discrete increments (lattice) to be defined between those values in order to circumvent the problem of nonuniqueness [36]. Thus, the empirical copula is expected to depend on the lattice.

Selection of Physics Parameterization Combination
The empirical copula from the observations is compared to the copulas from the different physics simulations by visual inspection and Pearson's χ 2 -test (e.g., Reference [38]), which compares an observed histogram with an expected PDF, and operates more naturally for discrete than for continuous random variables since, to implement it, the range must be divided into classes [38]. Division into classes is already given, as the empirical copula density is estimated at discrete incremental steps, i.e., the lattice (see Section 2.2). In our case, the observed C n can be directly compared with the simulated C s,p , in which p refers to the data of a specific WRF physical parameterization simulation. The χ 2 -test statistic involves the counts of C s,p data values within the predefined lattice in relation to the frequencies of the observed C n within the same lattice. As a rule of thumb, classes with small numbers of expected counts should be avoided [38]. For this reason, the test is repeated for various lattice configurations in order to check its robustness. Generally speaking, χ 2 -test examine the independence of each table dimension, yielding the test's p-value. Values of p close to zero cast doubt on the assumption of independence.

Performance Evaluation
In order to evaluate the performance of the selection based on the new approach (Section 3.2), two simple MOS are applied: linear scaling [39] for temperature, and local intensity scaling [40] for precipitation. For T, the mean bias of the model is calculated for each month m of the period used for calibration (2005)(2006)(2007)(2008)(2009), and is then added to the model data, i.e., the RCM data of the validation period (2011-2014) at every time step t (here, daily): where corr denotes the corrected time series, and obs and mod denote the observed and modeled time series, respectively. The local intensity scaling approach [40] has been used to debias P. It consists of two steps: first, wet-day frequencies of the model are matched with those of the observations, including monthly precipitation threshold P thresh,m ; second, a scaling factor based on the matched frequencies is derived to match the mean of the observed and modeled precipitation intensities. The correction is then performed following the binary assignment: if P mod,m (t) < P thresh,m P mod,m (t) · P obs,m P mod,m , otherwise.
In both cases, transfer functions are derived on a monthly basis (i.e., one for each calendar month) before these are applied to the daily time series of P and T, separately, for all WRF physics parameterization simulations. MOS schemes are calibrated for 2005-2009, and the merit for different RCM physics parameterization simulations is assessed for the 2011-2014 validation period by calculating the mean absolute error (MAE), the mean squared error (MSE), and the root mean square error (RMSE). Figure 2 illustrates the bivariate PDFs of the simulated and observed P and T pairs for the different physical parameterization combinations for LAMU. Visually, it is very difficult to observe remarkable differences between the different combinations, or to judge which parameterization combination is preferrable. For the MERU station, clear vertical displacement of the WRF simulations (i.e., a bias of T compared to the observations can be seen (Figure 3). Warm bias occurs due to MERU's high elevation, which is not well-reflected in the WRF simulations due to relatively coarse model resolution. Bias is highest for the WRF-BLA combination; for the other three combinations, however, no remarkable differences could be observed, thus making selection difficult.  Apart from looking at densities in the data space, empirical copula density allows to visualize the dependence structure in the rank space in more detail. It better discloses how and how strongly two variables are tied within their distribution ranges. Figures 4 and 5 illustrate the dependence structures between P and T for the LAMU and MERU stations, respectively.

Joint P and T Distribution for Different WRF Parameterization Simulations
Clear differences can be seen between both stations, indicating different regimes and precipitation-generation processes. From Figure 4, it can be seen that, for LAMU station, low precipitation values are not strongly related to low rainfall rates. The same holds true for high values. Enhanced densities can be found for low rainfall rates and high temperatures. Such features can be partly reflected by only a few of the analyzed parameterization simulations. From Figure 4, for instance, it becomes obvious that WRF-BLA, with its highest density at low precipitation and large temperature values, is not an optimal choice to represent the observed data. Additionally, it can be seen that the same cumulus parameterization scheme (here, Kain-Fritsch) leads to similar empirical copula density functions.

Selection of a Suitable WRF Parameter Combination
Besides visual inspection, χ 2 -test statistics were calculated so as to more objectively decide which parameterization scheme should be selected. Table 2 displays the minimum p-values for which the h 0 hypothesis (the modeled P and T data come from the same distribution as the observed data) can be rejected, i.e., small values close to zero cast doubt on independence. In this case, one would accept the h 1 hypothesis (modeled and observed P and T data come from different distributions). Based on the retained p-values of the χ 2 -test, one would select BLA parameterization for LAMU, and KWA for MERU. It can be seen that the test result may depend on the applied lattice to estimate the empirical copula densities, which, in turn, depends on the number of available data tuples. Classically (separate evaluations of P and T), one would have come to a different selection (Table 3). For MERU station, one would have selected KLA for both precipitation and temperature, whereas one would select BLA for precipitation and KWA for temperature.

MOS Impact on Different WRF Parameterization Simulations
Two different types of MOS were applied to the WRF parameterization simulations, namely, local intensity scaling for precipitation and linear scaling for temperature. For precipitation, the selected-best configuration could not significantly be improved by the MOS; in fact, in most cases, statistics became worse. The same tendencies could also be observed for temperature and MERU station. For LAMU, however, temperature statistics could still be improved for the selected best configuration by the MOS.
Interestingly, the MOS could not improve the performance statistics of any configuration suggested by joint selection (based on the empirical copulas). For the parameterization combinations suggested by the empirical copulas, namely, KWA for MERU and BLA for LAMU, performance decreased after application of the MOS approaches, except for LAMU temperature values. Here, the MOS lead to a remarkable improvement, which is very close to the best performing configuration (KWA). This indicates that the joint selection may lead to a better decision than traditional selection does. By means of simple linear MOS, the temperature biases of the identified combination obtained by the joint selection approach (BLA) could be significantly reduced. In-depth analyses of the impact of different bias-correction schemes go beyond the scope of this study, but will be the subject of subsequent analyses.

Discussion
Multiphysics ensemble RCM studies traditionally focus on the representation of the statistics of single hydrometeorological variables (e.g., References [9,10,14,18,[41][42][43]). As demonstrated by the present paper, this may lead to nonoptimal decisions on the parameterization combination with which to conduct RCM projections, induced by various performance measures for different hydrometeorological target variables.
Unlike the traditional selection procedure, the new joint evaluation approach specifically takes into account the covariance structure of RCM simulation variables such as P and T. Through this, the new approach can provide useful complementary knowledge for the identification of a suitable configuration combination. It is suggested to analyze empirical copula functions rather than the bivariate density plots in the data space [44], because only an empirical copula discloses significant differences in the joint dependence structure of the variables. Besides visual inspection, the χ 2 -test allows making an objective decision with respect to the (P and T) dependence structure. Results were found to be robust, with only little impact of the choice of the observed lattice. Note that a relatively large simulation period is required (significantly longer than one year) due to precipitation's intermittent nature and the inapplicability of the empirical copula in the case of P = 0 [30].
It is speculated that the new approach may reduce the need for complex multivariate postprocessing bias-correction approaches. However, one should test whether simple linear scaling approaches could further reduce biases, as shown for temperature at LAMU, or if performance further decreases compared to uncorrected series, since additional errors and uncertainties are introduced by MOS [15,16]. More in-depth studies are necessary to systematically check this speculation. Therefore, it is suggested to apply various MOS approaches with different levels of complexity and many more observation sites for validation than used in this conceptual study.
We conclude that the presented approach is a useful complement (besides traditional univariate evaluations) to identify suitable RCM physical parameterization combinations in order to obtain more reliable RCM simulations. Unlike the classical approach, it avoids ambiguity, since only one performance measure is considered, i.e., χ 2 -test statistics. The new approach, however, should only be considered as supplementary analysis. Classical approaches remain essential in the selection process, since they, e.g., also consider bias structures in a spatial sense rather than conduct pointwise evaluations. In theory, more variables than P and T could be implemented, depending on the needs and the available observation data.
Finally, the combination of the classical and the new joint evaluation approach may help to produce more reliable future climate projections, and may, therefore, contribute to the improvement of future climate-impact studies.
Funding: This study was partially funded by the Bavarian State Ministry for the Environment and Consumer Protection in the framework of the BIAS-II project (TKP01KPB-66747). N. Kerandi obtained funds from the National Commission for Science, Technology, and Innovation, and the German Academic Exchange Service.