Future Projections and Uncertainty Assessment of Precipitation Extremes in the Korean Peninsula from the CMIP6 Ensemble with a Statistical Framework

Scientists occasionally predict projected changes in extreme climate using multi-model ensemble methods that combine predictions from individual simulation models. To predict future changes in precipitation extremes in the Korean peninsula, we examined the observed data and 21 models of the Coupled Model Inter-Comparison Project Phase 6 (CMIP6) over East Asia. We applied generalized extreme value distribution (GEVD) to a series of annual maximum daily precipitation (AMP1) data. Multivariate bias-corrected simulation data under three shared socioeconomic pathway (SSP) scenarios—namely, SSP2-4.5, SSP3-7.0, and SSP5-8.5—were used. We employed a model weighting method that accounts for both performance and independence (PI-weighting). In calculating the PI-weights, two shape parameters should be determined, but usually, a perfect model test method requires a considerable amount of computing time. To address this problem, we suggest simple ways for selecting two shape parameters based on the chi-square statistic and entropy. Variance decomposition was applied to quantify the uncertainty of projecting the future AMP1. Return levels spanning over 20 and 50 years, as well as the return periods relative to the reference years (1973–2010), were estimated for three overlapping periods in the future, namely, period 1 (2021–2050), period 2 (2046–2075), and period 3 (2071–2100). From these analyses, we estimated that the relative increases in the observations for the spatial median 20-year return level will be approximately 18.4% in the SSP2-4.5, 25.9% in the SSP3-7.0, and 41.7% in the SSP5-8.5 scenarios, respectively, by the end of the 21st century. We predict that severe rainfall will be more prominent in the southern and central parts of the Korean peninsula.


Introduction
Heavy precipitation can have cascading effects on communities, infrastructure, agriculture, and livestock, as well as on economically and culturally important natural ecosystems. For example, extreme rain events can result in costly damage to wastewater treatment plants, culverts, and roads. Extreme precipitation can result in landslides and floods, accompanied with a loss of life and the deterioration of infrastructure. Thus, understanding and projecting heavy rainfall is of significant importance to climate change impact, adaptation, and vulnerability assessments. quantify uncertainty in an MME. In calculating the PI-weights, considering only one or two climate variables over a relatively small area can lead to an overfitting problem [36,40]. To avoid this problem, we thus consider five climate variables over East Asia, while our focus is the annual maximum daily precipitation (AMP1) over the Korean peninsula.
In applying the PI-weighting, we have to determine two shape parameters that control the strength of the weights. One way to select the shape parameters is a leave-one-out perfect model test [36,40], but it requires a huge amount of computing time. To overcome this computational problem, we suggest simple ways to determine these parameters based on the entropy and p-values of the chi-square statistic.
The remainder of this paper is structured as follows. The data construction and numerical models are described in Section 2. The statistical methods are briefly mentioned in Section 3. Section 4 describes the PI-weighting with the computational details, including simple ways for determining the shape parameters. The results of the model weights and projected future changes are presented in Sections 5 and 6, respectively. Section 7 describes the results of an uncertainty assessment and projection by latitude based on an analysis of variance. In Section 8, relative improvement of the PI-weighted method over the simple average is quantified using the skill score and prediction variance. Discussions are then given in Section 9, followed by a summary of the paper in Section 10. Details including technical specifics, tables, and figures are provided in the accompanying supplementary material (hereafter referred to as the Supplementary Materials) file.

Data and Simulation Models
We consider five climate variables over East Asia to avoid overfitting in calculating the PI-weights, while our focus is the annual maximum daily rainfall (AMP1) over the Korean peninsula in this study. Table 1 lists five climate variables.
Consecutive wet days and dry days are defined as consecutive days with daily precipitation of ≥1 and <1 mm, respectively [42,43]. Table S1 in the accompanying Supplementary Materials lists the 21 CMIP6 climate models used in this study. The considered scenarios are shared socioeconomic pathways SSP2-4.5, SSP3-7.0, and SSP5-8.5 [28]. Hereafter, we shorten the scenario names to SSP2, SSP3, and SSP5, although we use both versions interchangeably. Three overlapping periods are considered for future data, namely, period 1 (2021-2050), period 2 (2046-2075), and period 3 (2071-2100), abbreviated by P1, P2, and P3 in this study. To re-grid the common grid points of 1 • × 1 • , the iterative Barnes interpolation scheme [44] was employed for the observations and simulation data from the 21 models for each of five climate variables. The Barnes technique produces a rainfall field on a regular grid from irregularly distributed rainfall observation stations.
East Asia (EA) is considered by many authors (e.g., [4,14,15,45,46]) to be one of the regions most vulnerable to future increases in extreme weather and climate under global warming. For example, projected changes in extreme precipitation indices over the Asian monsoon domain are larger than over other monsoon domains, indicating the strong sensitivity of Asian monsoons to global warming [47].   Figure 1) in North Korea are too sparse, re-gridding using Barnes interpolation may not capture the locality of severe rainfall there. For a better re-gridding in North Korea, we used the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) reanalysis data [51] as auxiliary information. However, the AMP1 in APHRODITE has serious bias in its mean and variance, as shown in Figure S1. We thus correct it using nearby observations using the quantile mapping technique [52]. Examples of time series plots of the observations, APHRODITE data, and the bias-corrected APHRODITE data near the observational stations are shown in Figure S1. Then, the bias-corrected APHRODITE data and the observations at stations were used together for better interpolation in re-gridding. On the right side of Figure 1, the map of the KP shows the spatial distribution of 91 rainfall observation stations, the 80 APHRODITE grid points in North Korea, and the final 46 grids used in this study.

Generalized Extreme Value Distribution
The generalized extreme value distribution (GEVD) is widely used to analyze extreme univariate values. The three types of extreme value distributions are sub-classes of GEVD. The cumulative distribution function of the GEVD is as follows: when 1 + ξ(x − µ)/σ > 0, where µ, σ, and ξ are the location, scale, and shape parameters, respectively. The particular case for ξ = 0 in Equation (1) is the Gumbel distribution, whereas the cases for ξ > 0 and ξ < 0 are known as the Fréchet and the negative Weibull distributions, respectively [53]. It can be helpful to describe the changes in extremes in terms of the changes in extreme quantiles. These are obtained by inverting the following (1): Here, z p is known as the return level associated with the return period 1/p because level z p is expected to be exceeded, on average, once every 1/p years [53]. For example, a 20-year return level is computed as the 95th percentile of the fitted GEVD and a 50-year return level as the 98th percentile. Conversely to the above, the return period T(z) = 1/p for the given value z is obtained by calculating p = 1 − G(z). For the given value z, T(z) is sometimes called the expected waiting time, and the value p = 1 − G(z) is referred to as the exceedance probability of z.
Assuming the data approximately follow a GEVD, the parameters can be estimated by the maximum likelihood method [53,54] or the method of L-moment estimation. The Lmoment estimator is more efficient than the maximum likelihood estimator in small samples for typical shape parameter values [55]. The L-moment method is employed in this study using the "lmom" package in R [56] because a relatively small number of samples are analyzed for each comparison period. Moreover, the formulae used to obtain the L-moment estimator are simple compared to those for obtaining the maximum likelihood estimator, which needs an iterative optimization until convergence.

Bias Correction
Although simulations from climate or meteorological models provide significant information, the simulated data are associated with potential biases in that their statistical distribution differs from the distribution of the observations. This is partly because of unpredictable internal variability that differs from the observations, and because global climate models (GCMs) have a very low spatial resolution for being employed directly in most impact models [57,58]. For example, in GCM precipitation fields, the bias may be due to errors in convective parameterizations and unresolved subgrid-scale orography [59]. Bias-correction (BC) methods are commonly applied to transform the simulated data into new data with no or fewer statistical biases with respect to an observed time series. In this study, future simulation outputs are bias corrected to compute the intensity of extreme rainfall.
To correct the model outputs more efficiently by taking account of the dependency among variables or nearby grids, several multivariate BC methods have recently been proposed. In this study, we chose the multivariate bias correction (MBC) method by Cannon [59] among the many available BC methods [52]; it is a multivariate extension of quantile delta mapping (QDM). QDM has an advantage of preserving the approximate trends of the model data. In this study, the MBC is applied to the five climate variables provided in Table 1 to take into account the dependency among these variables. The MBC is applied one time to the future simulation data for all periods (2021-2100) based on the historical information, not for each period separately. More details are provided in the Supplementary Materials.

Performance and Independence Weighting
Knutti et al. [39] argued that the growing number of models with different characteristics and considerable interdependence finally justifies abandoning a strict model democracy. They provided at least five reasons for why PI-weighting is required. Brunner et al. [36] illustrated that PI-weighting leads to an increase in the investigated skill score for temperature and precipitation while minimizing the probability of overfitting (or overconfidence).
As the basic idea of PI-weighting, models that agree poorly with observations for a selected set of diagnostics receive less weight, as do models that largely duplicate existing models [39]. Weights are calculated for each model based on a combination of the distance D i (informing the performance) and the model similarity S ij (informing the dependence): with the total number of model runs M and the shape parameters σ D and σ S . The weights are normalized such that their sum equals 1. The details in computing the performance distance D i are given in the Supplementary Materials. The numerator represents the modeling skill when using a Gaussian weighting, where the weight decreases exponentially the farther away a model is from the observations. The denominator is the "effective repetition of a model" [32] and is intended to account for the model interdependency [39]. The details in computing the distance S ij are given in the next subsection.
The shape parameters define the strength of the weighting and the relative importance of the performance and independence [36]. Large values will lead to an almost equal weighting, whereas small values will lead to aggressive (or one-sided) weighting, giving a few models most of the weight. The shape parameters are often determined through a perfect model test (or a model-as-truth experiment) using the continuous rank probability score [36,40]. The perfect model test picks each model from a multi-model ensemble in turn and treats it as the true representation of the climate system. This leave-one-out procedure requires a huge amount of computing time. To address this computational problem, we consider relatively simple ways to determine the shape parameters in the next subsections.

Computing Independence Weights
If a model has no close neighbors, then all S ij (i = j) are large, and the denominator of the PI-weight is approximately one and has no effect. If two models i and j are identical, then S ij = 0 and the denominator equals two, so each model gets half the weight.
To calculate the model similarity S ij , we follow a technique among several methods proposed by Sanderson et al. [60]. A method employed in this study is based on the empirical orthogonal function (EOF) or principal component analysis. The following process is done for each grid: First, for each model, the historical data from 1850 to 2015 and the future simulation data from three scenarios are lined up as one time series data, as in Figure S2 in the Supplementary Materials. The bias correction is not applied to all data for this process. We can choose the historical data only, as was done by Brunner et al. [61], but we deploy all simulation data for a maximum use of available information. For each time series induced from each model, seven-year moving averages are obtained. Then, for each climate variable, a correlation matrix R among all M models is constructed by applying the Spearman correlation coefficients to those M numbers of the series of sevenyear moving averages. That is, R is the correlation matrix of M models, with size M × M.
A singular value decomposition (SVD) is performed on R 1/2 and truncated to t modes to obtain the dominant modes of multivariate ensemble variability such that where U is an orthogonal matrix of model loadings (size M by t) whose columns are the eigenvectors of the model correlation matrix R, λ (size t by t) are the eigenvalues of R, and V (size M by t) are the eigenvectors of R. The dimensions are sorted by decreasing eigenvalue, such that the basis set can be truncated to a smaller number of modes t [60]. Note that t is often determined by selecting a number of the eigenvalues greater than 1. The model loadings U now define a t-dimensional space (where t is the truncation length of the SVD) in which intermodel and observation-model Euclidean distances may be defined. The intermodel distances can then be measured in a Euclidean sense in the loadings matrix, such that the distances S ij between two models i and j can be expressed as [60] U(i, l) is interpreted as a correlation or a dependency of the model i to the l-th principal component. Thus, a small S ij value means high dependency or similarity between models i and j.
The above procedures are done separately for each of five climate variables listed in Table 1. Thus, we have five distances between models i and j, corresponding to five climate variables. Then, the final distances are obtained by averaging those five distances for models i and j. An example of the final distances S ij between two models i and j averaged from the five distances is given in Table S2 in the Supplementary Materials. A small value indicates high dependency or similarity between two models. In Table S2, the first four models (UKESM2, CanESM5, EC-Earth3-Veg, and EC-Earth3) show the highest similarity, whereas the last four models (MPI-ESM1-2-LR, GFDL-ESM4, INM-CM5-0, and MPI-ESM1-2-HR) show the lowest similarity.

Selection of σ S
To select an appropriate value of the shape parameter σ S for the I-weights, we consider an entropy-based approach. We denote I i (σ S ) as a normalized I-weight for model i and for the given σ S , as defined in the following: where .
The entropy of the I-weights as a measure of uncertainty [62] from these weights is defined by the following: as a function of σ S . When all I i (σ S )s are almost equal, the entropy has a high value. We thus expect the entropy to increase because σ S has a large value. Note that the calculation of S ij does not depend on σ S , and thus, the S ij values obtained are fixed for the entropy computation. The entropy is computed as σ S changes from 0.1 to 1.0 in increments of 0.01. Figure 2 presents the entropy function of σ S computed from the data used for this study, which indicates that it is at its minimum at σ S = 0.4. It is interesting to note that the entropy function increases as σ S decreases from 0.4 to zero. This is explained by looking into the similarity measure 1 + ∑ M j =i exp(− S ij σ S ). As σ S moves toward zero, this measure converges at one for all i. Thus, s i moves toward one, and I i is close to 1/M for all i. Because we want to have a shape parameter σ S that can differentiate the I-weights most distinctly with minimum uncertainty, the value σ S = 0.4 minimizing the entropy is chosen in this study.

Selection of σ D
To select an appropriate value of σ D for the P-weights, we attempted to use the entropy criteria again, but we were not fortunate enough to obtain the optimal result, as with σ S . Thus, a technique based on the p-value of the chi-square statistic [54] is considered in this study.
We denote P i (σ D ) as a normalized P-weight for model i and for the given σ D , which is defined as follows: where D i is the performance measure of the i-th model. For testing the hypothesis frame, the null hypothesis and the alternative hypothesis for i = 1, · · · , M are as follows: H 0 : all weights are equal ⇔ P i = 1 M for all i H 1 : some weights are not equal ⇔ P i = 1 M for some i. For the given P i , the chi-square statistic used to test the above hypothesis is as follows: Because we do not want to accept equal weights, σ D should be selected to reject the null hypothesis. That is, the p-value [54] obtained from the chi-square statistic should be less than a preassigned value α (significance level), e.g., α = 0.05. In addition, because we also do not want aggressive weights, a σ D can be selected as the maximum value of σ D in which we still reject H 0 with α level. That is, our selection is Here, χ 2 indicates a random variable of (9) under the equal P i weights. Although this selection assures the use of the least aggressive weights, it is still statistically significantly different from the equal weights.
The p-values are computed by a Monte-Carlo simulation in which random numbers of weights are generated from the Dirichlet distribution [63]. When the parameters are all equal to 1, the Dirichlet distribution is the same as the multivariate uniform distribution with values between 0 and 1, which represents the null hypothesis. We used the "MCMCpack" package [64] in R to generate the random weights that satisfy H 0 .
The detailed steps of computing the p-value for given σ D and χ 2 0 (σ D ) are: Step 1: Generate random weights P (k) i from the Dirichlet distribution with all parameters equal to 1 (under H 0 ), for i = 1, · · · , M; Step 2: Compute , and denote it χ 2 (k) ; Step 3: Iterate the above two steps K (=1000, for example) times; Step 4: where I[A] denotes the identity function, which takes 1 or 0, depending on if the condition A is satisfied or not. Note that P (k) i s generated in Step 1 do not depend on σ D . Figure 3 depicts the chi-square statistic values computed from AMP1 with some pvalues as σ D changes from 0.10 to 0.30 in increments of 0.01. We calculated the σ D for each of the five climate variables, and then calculated the average from those five σ D s. When α = 0.05, as is usually applied in testing a hypothesis in statistics, the averaged σ * D from five different σ D is 0.21. When α = 0.1, σ * D = 0.25. We use σ * D = 0.21 in this study.  Table S2 provides the similarity values S ij for certain models. Figure 4 shows the intermodel distance matrix for the 21 CMIP6 models considered in this study. The distances are obtained from the five climate variables listed in Table 1 for East Asia. Each box represents a pairwise combination, where red indicates a greater distance. According to Figure 4, models MPI-ESM1-2-HR, INM-CM5-0, GFDL-ESM4, MPI-ESM1-2-LR, and INM-CM4-8 were found to be the most independent, whereas models UKESM, CanESM5, EC-Earth3-Veg, and EC-Earth3 were found to be the most dependent.

PI-Weights
The normalized PI-weights are obtained using Equation (2) with σ S = 0.4 and σ D = 0.21. Figure 5 demonstrates the distributions of the P-, I-, and PI-weights. The variability of the I-weights is smaller than that of the P-weights.
The high P-weights of the CanESM5 and EC-Earth3-Veg models decrease in the PI-weights owing to the low I-weights. The PI-weights of the BCC-CSM2-MR, FGOALS-g3, and GFDL-ESM4 models increase owing to a relatively high independency. The PI-weight is not located in the middle of the P-and I-weights, but is close to the P-weight, except in a few cases. When the P-weight (I-weight) is almost the same as the equal weight, as in the GFDL-ESM4 (KACE-1-0-G and IPSL-CM6A-LR) model, it seems that the PI-weight is wholly influenced by the I-weight (P-weight). The performances for some of the models, such as ACCESS-ESM1-5, MIROC6, INM-CM5-0, and INM-CM4-8, are so low that their (even relatively high) I-weights do not affect the final weights. Based on this view, the performance is more influential to the PI-weights than the independency. Some of these observations may be changed if different σ S and σ D are used.

Results: Future Projection of Extreme Precipitation
Using the PI-weights obtained in the above section, the future extreme precipitations are projected by the MME. Note that the future climate data are used after the bias correction with the MBC method [59]. Figure 6 illustrates the time series plots of the nine-year moving averages of AMP1 in Seoul, for example, from the observations, from the PI-weighted ensemble of the historical data, and from the PI-weighted ensembles for the future data from the three SSP scenarios, with a 90% confidence band. In Figure 6, the line for the observations shows more variation than the lines of the historical data and the future projections by the MME. This means that the MME has smaller variance and can catch the signal more clearly than a single simulation model. The boxplot of the historical data after the BC is similar to that from the observations, whereas the boxplot before the BC is much smaller than that from the observations. The increasing trends from P1 to P3 are evident in every scenario. Summary statistics of the corresponding values of these boxplots are provided in Table S3.

Return Levels
These values are lower in the 20-year return levels than the results by Lee et al. [27], who used the BMA method with the CMIP5 models. The median values for P3 in Table S3 are higher than the results by Shin et al. [41], who used the CMIP6 models and a hybrid weighting method between the BMA and equal weighting. It seems that the hybrid weighting by Shin et al. [41] and the PI-weighting in this study are similar. These last two weighting schemes assigned relatively lower weights to the CMIP6 models with heavy future precipitation than a weighting method based on the performance only would give to the same models.     These rates of change are lower than the results by Lee et al. [27], who used the BMA method with the CMIP5 models. This is perhaps due to the differences in the reference period and the research methods, as well as the difference between the CMIP5 and CMIP6. For the 20-year return level, our result in the KP is approximately 1.8 to 2.0 times faster than the changes in the globally averaged value (10% in the SSP2-4.5 and 20% in the SSP5-8.5) reported by Kharin et al. [65].  These projections of extreme rainfall are similar to or less frequent than the results obtained by previous studies [23,27] based on CMIP5 models, i.e., approximately 1-in-11 (1in- 21) year and 1-in-7 (1-in-13) year events under SSP2 and SSP5, respectively. Shin et al. [41] realized the occurrence as approximately 1-in-10 (1-in-30) and 1-in-8 (1-in-17) year events under SSP2 and SSP5, respectively, which is similar to our projection.

Exceedance Probability and Waiting Time
Because of computational issues and the defects of the return period [66], the exceedance probability is often used as an alternative to the return period [67]. This is defined as Pr [Y(θ) > z], where z is a specified precipitation value and Y(θ) is a random variable following a GEVD with a parameter estimateθ. Here, Y(θ) depends on the models, periods, and scenarios.
The spatially averaged estimates of the exceedance probability are presented in Figure S6 and Table S6. There are relatively large differences in the exceedance probability of a downpour of 100 to 250 mm compared with that for over 250 mm, as shown in Figure S6. The differences between the past and future scenarios are distinct during the period P3.
From Table S6, the return period or expected waiting time (T(z)) until the reoccurrence of a specific AMP1 value (z) is computed by T(z) = 1/p(z), where p(z) is the exceedance probability of the AMP1 z. These values are listed in Table 2. For z = 200 mm of rainfall, for example, the expected waiting times until a reoccurrence are 11.1 years in the past, 13.0 years in the future period P1, 8.0 years in P2, and 5.9 years in P3 based on the SSP5 scenario. For the case of a z = 300 mm downpour, the expected waiting times are 78 years in the past, 76 years in P1, 50 years in P2, and 29 years in P3 based on the SSP5 scenario.

Expected Number of Reoccurring Years
Another quantity that we can obtain is the expected number of reoccurrences during a certain period. By multiplying 30 years by the exceedance probability p(z), we can estimate the expected frequency of such years over a period of 30 years in which we have more than z amount of AMP1 for a year. These values are given in Table S7. For z = 150 mm (200 mm), as a specific example, during the last 30 years, we have experienced 6.8 (2.7) years in which AMP1 was greater than 150 mm (200 mm). In addition, we are likely to have an expected number of years of 8.2 (2.3) for the future period P1, 10.1 (3.8) for P2, and 12.1 (5.1) for P3 under the SSP5 scenario.
From this comparison, particularly for AMP1 of 100 to 300 mm, the expected number of years of occurrence for the future periods under the SSP5 scenario increases by approximately 1.25 times that over the last 30 years for P2 and 1.91 times that for P3 by the end of 21st century. These results are based on the spatially averaged values. When the exceedance probability is considered for each grid, different local results will be obtained.

Results: Projection by Latitude and Quantifying Uncertainty
Three or four major sources of climate projection uncertainty might be considered when trying to understand uncertainties in projected metrics: (1) climate model, (2) emission scenario, and (3) internal variation or randomness unexplained by other sources [68]. We apply the analysis of variance (ANOVA) technique [69] in this study. Other methods considered in [67] can also be applied.

Variance Decomposition with Interaction
Hawkins and Sutton [68] applied a method of variance decomposition to quantify uncertainty when assuming no interactions between the sources. In addition to their assumption, we added the interaction term between the model and scenario. Other factors, i.e., (4) future period and (5) location, such as the latitude applied in this study, are also considered in the ANOVA model. Although the period and latitude may not be real sources of uncertainty, they can be included as independent variables in the ANOVA model because they may affect the response variable. Here, the response variable is a 20-year return level, for example. Figure 10 presents the proportions of variances contributed by each variable for each period. In P1, the variance by the model is the largest, whereas that by the scenario is the smallest. The proportion of internal variation (residual) decreases from P1 to P3. The variance proportion by model decreases from P1 to P3, whereas the variation due to the scenario increases significantly from P1 to P3. It is notable that the variation contributed by the interaction between the model and scenario is relatively large during period 1, but becomes smaller during period 3. The details of this variation are presented in Figure 11, where the return levels from the models generally increase in mean and variance as the scenario changes from SSP2 to SSP5. However, the return levels of some models, such as CESM2-WACCM, ACCESS-ESM1-5, NorESM2-MM, MIROC6, MRI-ESM2-0, GFDL-ESM4, and INM-CM5-0, decrease from SSP2 to SSP3, contrary to the expectation. Moreover, the return values of some models, such as MPI-ESM1-2-HR, MPI-ESM1-2-LR, NorESM2-LM, and FGOALS-g3, decrease from SSP3 to SSP5. The latter two "unexpected" situations make the interaction variance large in P1, as shown in Figure 12. Despite such "unexpected" situations still appearing in P3, the variations owing to the scenario and model themselves are relatively too large, and thus, the proportion of the interaction variance becomes relatively small in P3. Figure 13 depicts boxplots of the 20-year return levels (unit: mm) from the 21 CMIP6 models as the latitude changes from south to north. It is evident that the return levels decrease initially from 33 • to 36 • , but increase from 36 • to 38 • . However, they decrease from 38 • to 42 • more rapidly. In Figure 13, the solid lines around the boxes across the latitude are the spatial medians of the 20-year return levels for each latitude and the four periods (P0, P1, P2, and P3). Here, P0 indicates the reference period, and thus, the result is obtained from the observations. The dashed lines depict the values for the three scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5), in which three dashed lines locate the inside range of the solid lines. The spatial medians for the scenario (period) are obtained over all values across the periods (scenarios) for the 21 models. It is notable and interesting that the variation owing to the scenario is smaller than that from the period. Future projections fluctuate less than the observations.   The detailed values of the 20-year return levels of AMP1 and annual maximum fiveday precipitation (AMP5) by latitude are presented in Table 3. These localized values provide different information from the previous results based on the spatial medians and averages over the Korean peninsula (Table S3). Table 3 indicates that a much higher intensity occurs in the southern part of the Korean peninsula than in the northern part. When the areas below and above 38 • are considered the south and north, respectively, the south is likely to receive approximately 1.54 (1.72) times heavier rainfall in the 20-year return levels of AMP1 on average than the north (the spatial median) based on a rough calculation using the above tables. Although we did not obtain these values, the localized values of the return periods, the relative changes in the return levels, the exceedance probabilities, and the expected number of occurring years can be obtained for each latitude. Those values in the south would show higher intensity during extreme precipitation with shortened expected waiting times compared to the spatial medians throughout the Korean peninsula.

Comparison of PI-Weighted Ensembles to the Simple Average
For comparison between the PI-weighting scheme and the simple average for the multimodel ensemble, we consider two measures. The first one is the error index I 2 based on Baker and Taylor [70] for the reference (historical) period. The second is the weighted variance of return level prediction [71,72] for the future periods. In computing these measures, non-bias-corrected data were used.

Error Index for the Historical Period
The error index can measure combined errors, that is, over multiple variables, compared to the observed climate [40]. However, we only evaluate our target variable AMP1 over the reference period 1973-2010. In a first step, the normalized error is calculated for the T-year return level as the difference between the PI-weighted multimodel and the observations: where S n is the weighted multimodel T-year return level per grid point n. S n is calculated from the historical data that are not bias corrected. o n is the T-year return level obtained from the observations, and σ n is the standard deviation of o n . The corresponding e 2 eq is calculated for the nonweighted multimodel T-year return level. Note that, for each n, o n and σ n are unchanged between e 2 w and e 2 eq because those are computed from the observations on grid point n. Then, the error index is obtained by: This index can be useful to evaluate if the weighted mean is improved compared to the simple average [40]. The smaller the I 2 , the larger the improvement in the weighted average compared to the simple average for the target variable (AMP1) for the reference period.
The computed values of I 2 in this study are 0.622 for PI-weights and 0.627 for Pweights for the 20-year return level. Thus, there are 37.8% and 37.3% improvements in the PI-weighted and the P-weighted averages, respectively, compared to the nonweighted average. The decrease of the error index by adding I-weights from P-weights is 0.005, which is a 0.80% decrease relative to that of the P-weights. The results for the 50-year return level are similar to these.

Prediction Variance for the Future
The second measure to compare is from the weighted variance formula [71,72]: where r(T) is the T-year return level, r k (T) is the T-year return level from the k-th model, andr(T) = 1 K ∑ K k=1 r k (T) w k . The first term of (13) is the among-model variance, and the second one is within-model variance. Var(r k (T)) is estimated using a bootstrap technique in this study. Based on this variance, we quantify the skill of the PI-weighted method with the Brier skill score [54]: where Var PI is the variance of the T-year return level calculated by (13) from the PI-weights, and Var Eq is the variance calculated by (13) from the equal weights for each grid point. BSS is a relative quantity and shows a percentage improvement of the PI-weighted method over the simple average. Figure 14 shows the differences of variances of 20-year return levels calculated from the PI-weights and the equal weights plotted for 46 grid points for each future period and for each scenario. Var PI for the Y-axes is the variance calculated by Equation (13) from the PI-weights. Var Eq for the X-axes is the corresponding variance calculated from the equal weights. In the top panel, more points are located below the diagonal lines, which means that the variances calculated from the PI-weights are smaller more often than those from the simple average. The middle (bottom) panel shows the differences of between-model (within-model) variances. Var BM,PI and Var W M,PI for the Y-axis are between-model and within-model variances from the PI-weights, respectively, and Var BM,Eq and Var W M,Eq for the X-axis are the corresponding variances calculated from the equal weights. The within-model variances from both weights are similar, as seen in the bottom panel. Thus, we know that the differences in total variances in the top panel are mainly because of the differences of between-model variances. Table 4 reads the averaged values of BSS over 46 grid points, the averaged differences of variances (unit: mm 2 ) of the 20-year return level, and relative improvements (RIs) for three periods and for three scenarios. The BSS values range from −2.2% to 22.5%. The averages of BSS over all periods are 6.29% for SSP2, 15.2% for SSP3, and 13.1% for SSP5 (11.5% overall). These averages of BSS also increase as the period moves from P1 to P3. The BSS due to between-model variance is bigger than that due to within-model variance. Watching the sixth to the eighth columns of this table, we know that the positive differences of two variances (Var Eq − Var PI ) are mainly because of the differences of two between-model variances (Var BM,PI − Var BM,Eq ). These patterns are prominent for the SSP5-8.5 scenario with increasing values from P1 to P3. The last column shows the RI of the PI-weighted method over the simple average in the variances. The RI values range from −3.3% to 26.0%. The averages of RI over all periods are 7.45% for SSP2, 19.1% for SSP3, and 14.0% for SSP5 (13.5% overall). These averages of RI also increase as the period moves from P1 to P3.

Discussion
It is generally accepted that increasing greenhouse gases induce atmospheric temperature warming, which results in increasing equivalent potential temperatures and specific humidity, according to the Clausius-Clapeyron relationship [73]. The increase in atmospheric water vapor is the main factor in generating convective instability. In view of this insight, Kim et al. [26] obtained an indication that increasing the extreme precipitation over South Korea in the past and future scenarios is related more to a change in convective instability rather than to synoptic conditions. Another plausible explanation for the increase in the maximum precipitation over the Korean peninsula is the increase in both the frequency and strength of typhoons in the region. Typhoons have a greater influence on the southern area than the northern part. The occurrence of extreme rainfall in the southern part is related to more sources or variables than in the northern area [25,26].
We observed a scale discrepancy between the model grids and the observation stations, which may compromise the credibility of our results. Regional projections require a fine resolution of the simulation models, whereas some of the CMIP6 models have a coarse resolution. It is known that the model spread is one of the major sources of uncertainty in regional predictions (Hawkins and Sutton, 2009). However, large numbers of simulation models can reduce the uncertainty and provide a robust projection [74,75]. A total of twenty-one CMIP6 models with different scales used in this study can cover the study region with a finer spatial structure to increase the reliability of the projection.
The daily precipitation data consist of measurements from 00:00 to 24:00 throughout the day. In daily observations, the rainfall does not accumulate between 22:00 and 02:00. In the data used in this study, such precipitation is divided and recorded in two separate days. The actual serious daily risk due to heavy rainfall does not exactly depend on the precipitation over the time duration from 00:00 to 24:00 exclusively. It is therefore recommended to consider the AMP1 data based on the maximum precipitation during the 24 h movement. In this sense, the results presented in this study underestimate the actual intensity and frequency of AMP1. More realistic daily data, such those as obtained after moving for 24 h and the annual maximum of two (and several) days of precipitation, should be used in a future study for assessment of risk owing to extreme rainfall.
Brunner et al. [36,61] considered multiple observational (or reanalysis) datasets to include an estimate of the observational uncertainty. They proposed a novel approach to account for the observational spread and uncertainty in a multi-model weighting study, which can lead to robust results and a more precise uncertainty quantification. In addition, considering multiple observational datasets may address the problem in which the BC and performance-based weighting scheme utilize an excessive number of observations. We believe that using the observations twice in the BC and weight calculation is unadvisable. Xu et al. [35] considered a Bayesian weighting method that removes observations during the initial phase of the downscaling and adds them in the estimation of the posterior distribution. However, if the series of observations is sufficiently long to divide into two parts, we may use one part for the BC and the other part for the weight calculation. Although we did not apply these methods, this would be a good approach in a future study.
In determining the shape parameter σ D for the performance weights, we used the standardized return levels. However, as a different approach, the BMA method described in [27,41,72] can be employed. The BMA weighting does not need to standardize the return levels or determine the shape parameter σ D . It requires a bootstrap to estimate the variances of the return levels, which is straightforward in current computing facilities. Thus, calculating the P-weights based on the BMA, with an effort to minimize the probability of overconfidence, is recommendable.

Summary
We estimated the future changes in precipitation extremes within the Korean peninsula using observations, 21 multiple CMIP6 models, generalized extreme value distribution, the multivariate bias correction technique, and the model weighting method (PI-weighting), which account for both the performance and independence of the models. To avoid overfitting in the PI-weighting, we considered five climate variables for East Asia.
In applying the PI-weighting method, we suggest two ways of selecting two shape parameters, based on the p-value of the chi-square statistic and entropy. The suggested methods are simple and intuitively appealing, although they may need more justification for use in other studies.
From the analysis described this study, we realized that a 1-in-20 year (1-in-50 year) annual maximum daily precipitation within the Korean peninsula will likely become a 1-in-11 (1-in-26) year, a 1-in-8 (1-in-20) year, and a 1-in-7 (1-in-15) year event in terms of the median by the end of the 21st century under the SSP2-4.5, SSP3-7.0, and SSP5-8.5 scenarios, respectively, as compared to the observations from 1973 through 2010. These results are similar to or less frequent than those obtained by previous studies [23,27,41], but still, they predict more frequent and intensified extreme precipitation events by the end of the 21st century as compared to 1973 through 2010.
The expected frequency of the reoccurring years, particularly for AMP1 from 100 to 300 mm under the SSP5 scenario, is projected to increase by approximately 1.0 times that of the past 30 years for period 1 (2021-2050), approximately 1.25 times that for period 2 (2046-2075), and approximately 1.91 times that for period 3 (2071-2100).
From the analysis based on latitude, we found that extreme rainfall is more prominent in the southern and central parts of the peninsula. The downpour in the southern part is approximately 1.54 times heavier than that of the northern part and approximately 1.72 times that of the spatial median of the Korean peninsula. For example, for 150 mm of AMP1, the expected waiting times until reoccurrence in the southern part (spatial median of the peninsula) are 2.6 (4.4) years during the reference period, 2.1 (3.6) years during P1, 1.7 (3.0) years during P2, and 1.5 (2.5) years during P3 based on the SSP5-8.5 scenario.
Heavy rainfall can have a significant effect on human life, communities, infrastructure, agriculture and livestock, and natural ecosystems. Thus, in addressing the impact of climate change due to more frequent extreme precipitation events, governments and communities should prepare the proper infrastructure and systems more carefully and securely to prevent critical damage, such as loss of life from landslides and flooding.
Supplementary Materials: The following are available at https://www.mdpi.com/2073-4433/12/1 /97/s1. Table S1: The list of 21 CMIP6 (Coupled Model Intercomparison Project Phase 6) models analyzed in this study. Table S2: The similarity distance metric S ij between model i and model j. Table S3: Statistics of 20-year and 50-year return levels of the annual maximum daily precipitation (unit: mm) averaged over 46 grids in the Korean peninsula for the observations (OBS) and the future periods under the three SSP scenarios. Table S4: Relative change (unit: %) in 20-year and 50-year return levels of the annual maximum daily precipitation averaged over the Korean peninsula relative to 1973-2010. Table S5: Statistics of 20-year and 50-year return periods (unit: year) of the annual maximum daily precipitation averaged over 46 grids in the Korean peninsula. Table S6: Spatially averaged the exceedance probability over the Korean peninsula for the annual maximum daily precipitation (AMP1) from 100 mm to 500 mm, obtained from the observations (OBS) and the CMIP6 models. Table S7: The expected frequency of reoccurring years during 30 years for specific the annual maximum daily precipitation (AMP1) values from 100 mm to 500 mm in the Korean peninsula, obtained from the observations (OBS) and the CMIP6 models. Figure S1: Examples of time series plots of the observations (black line), APHRODITE data (red line), and the bias-corrected data (blue line) in the Korean peninsula. Figure S2: Arrangement of data and 7-year moving averages composed of the historical data from 1850 to 2010 and the future data for computing the Spearman correlation coefficient between models. Figure S3: Schematic box-plots of 50-year return levels of the annual maximum daily precipitation (unit: mm) averaged over 46 grids in the Korean peninsula for the future periods under the three SSP scenarios. Figure S4: Isopluvial maps of 50-year return levels of the annual maximum daily precipitation for 46 grids over the Korean peninsula. Figure S5: Isopluvial maps of for the relative changes (unit: %) of 20-year and 50 return levels relative to 1973-2010 for the annual maximum daily precipitation for 46 grids over the Korean peninsula. Figure S6: The exceedance probability plots for the annual maximum daily precipitation (AMP1) from 50 mm to 300 mm in the Korean peninsula, obtained from the observations (OBS) and the CMIP6 models. Figure S7: Interaction plots between 21 CMIP6 models and the latitude in which the latitude changes from 33 • to 43 • , for 20-year return levels (unit: mm) computed over the Korean peninsula.