Ensembles for Viticulture Climate Classiﬁcations of the Willamette Valley Wine Region

: Future climate projections provide an opportunity to evaluate cultivar climate classiﬁcation and preferred styles of wine production for a wine grape growing region. However, ensemble selection must account for downscaled archive model skills and interdependence rather than be arbitrary and subjective. Relatedly, methods for generalizing climate model choice remain uncertain, particularly for identifying optimal ensemble subsets. In this study we consider the complete archive of the thirty-two Coupled Model Intercomparison Project Phase 5 (CMIP5) daily Localized Constructed Analogs (LOCA) downscaled historic datasets and their observational data that were used for downscaling and bias corrections. We apply four model averaging methods to determine optimal ensembles for the computation of six common climate classiﬁcation indices for the Willamette Valley (WV) American Viticultural Area (AVA). Among the four methods evaluated, elastic-net regularization consistently performed best with identifying optimal ensemble subsets. Variation exists among the optimal ensembles computed for each of the six bioclimatic indices. However, a subset of approximately seven to ten climate models were consistently excluded across all six indices’ ensembles. While speciﬁc to the archive and wine region, optimal ensemble sizes were noticeably larger than ensemble sizes commonly employed in published studies. Results are reported such that they can be used by researchers to independently perform analyses involving any one of the six bioclimatic indices throughout the WV AVA while using historic and future LOCA CMIP5 climate projections. The data and methods employed herein are applicable for other wine regions.


Introduction
Climate is significant in determining cultivar suitability and the resulting wine profile typical for a wine grape growing region [1][2][3][4]. Temperature and water availability are recognized as two of the most important environmental factors influencing grapevine growth and the subsequent quality of berries and wine [5][6][7][8][9]. Various indices exist to classify viticulture climate [10], and several recent studies have evaluated the impacts of climate change to viticulture using bioclimatic indices and downscaled future climate projections for wine regions (see Table 1).
Quénol et al. [11] highlighted the importance of accounting for model uncertainty when performing studies that involve the application of future climate projections. Model simulations of the future from the Coupled Model Intercomparison Project Phase 5 (CMIP5) [12] archive are no longer necessarily considered as equally likely independent realizations of the future climate [13][14][15][16][17][18]. The models constituting the CMIP5 archive are now recognized as interdependent by virtue of shared codes and the use of similar parameterization schemes and calibration methods [17,19,20]. The assumption of archive model independence results in a multi-model mean that is biased by the duplicative information that is contributed by the similar models. This can confound assessments of model agreement about changes in future climate and alter the statistics of identified correlations. Ignoring model dependence can result in poor accuracy and poor uncertainty quantification. The number of effective models in the archive is conjectured to be less than the number of simulations [17,[21][22][23][24] and there is a need not only to better understand weighting strategies that constrain projection uncertainty [15][16][17][18], but also to efficiently and effectively generalize model choice in a manner that selects optimal ensemble subsets for specific applications [24,25].
A common approach to uncertainty quantification of future climate projections is to equally weight a multi-model ensemble, possibly formed by the subjective selection of a subset of the available models, wherein weights are assigned without any consideration of model skill [24]. While a given multi-model mean is generally considered a better choice than any single model [24,25], it is also potentially not optimal relative to alternative multi-model ensemble compositions and weights assignment strategies [13][14][15][16][17][18]. It has been noted that the equal weighted multi-model mean does not necessarily quantify projection uncertainty but rather is simply an uninformed ad hoc measure of ensemble spread [26].
It is now more commonplace to address the epistemic uncertainty associated with model choice by applying methods that account for model skill and interdependence wherein skill is measured by a comparison with observed data [15][16][17]. Weight assignments are further complicated by the learned understanding that individual model skill is dependent upon the modeling objective and study location [17]. For viticulture climate classification of a given area using future climate projections, this implies that weight assignments could vary by the bioclimatic index under consideration.
Sanderson et al. [17] introduced two weight assignment strategies to constrain projection uncertainty, one that accounted for model skill and another that combined the computed skill with an evaluation of model independence. These two approaches have been applied in recent studies [14,15,18]. A few recent studies have also applied Bayesian model averaging (BMA) to assign ensemble weights and constrain projection uncertainty [15,16,18]. BMA is attractive from the perspective that it robustly quantifies model uncertainty and its solution implicitly accounts for model interdependence. However, it is unattractive in that its application is highly compute intensive, and can also be technically challenging to diagnose, relative to most other model calibration strategies [27].
Herger et al. [24] distinguished between approaches that assign continuous weights to the entire ensemble archive with those that are focused on the identification of an ensemble subset. They applied mixed integer quadratic programming to efficiently and flexibly find optimal subsets while accounting for model interdependence. Yang et al. [25] used the Taylor diagram [28] to rank models and subsequently combined the higher performing models using simple averaging to find a nine member multi-model ensemble subset that best characterized the spatiotemporal variability of temperature and precipitation over China. In this study we apply a recent advance for regularizing linear models that is directed at automatic variable selection [29][30][31] to find prediction specific optimal ensemble subsets while simultaneously accounting for model skill and interdependence. Regularization of the ensemble linear model is needed given the thirty-two member multi-model archive we consider in this study and acknowledgment that there are 2 n − 1 possible models that involve subsets of n predictors [32]. If n = 32, then there are 4,294,967,295 possible models.
Specification of the selection criteria for the ensemble compositions or the ensemble weights is not common in published studies that report on viticultural climate impact assessments using downscaled CMIP5 ensemble subsets. The arbitrary selection and application of a single model or ensemble subset of downscaled CMIP5 models to quantify future climate impacts is most likely incomplete and suboptimal [24,25]. In this article we apply a directed equal weights strategy similar in design to that performed by Yang et al. [25], the two skill weighting strategies introduced by Sanderson et al. [17], and elastic-net regularization [33] to determine for each method optimal ensemble compositions and weights relevant for the computation of the growing season average temperature (GST) [34], growing degree days (GDD) [1,3], Huglin's index (HI) [35], cool night index (CI) [36], dryness index (DI) [37], and the Géoviticulture multicriteria climate classification (GMCC) [38] viticultural climate indices for a study region containing the Willamette Valley (WV) American Viticultural Area (AVA).
Heat unit accumulation is important for suitability assessment in that each cultivar has a minimum summed temperature requirement to reach maturity. A late-ripening cultivar will not achieve full ripening in a cool climate and an early ripening cultivar will ripen too quickly in a warm climate. Optimal climate suitability occurs where cultivars ripen at the end of the growing season [8]. The most common applied heat unit accumulation index is the Winkler index (WI), more often referred to as growing degree days (GDD) [1,3,10]. It is the sum of the mean temperatures above a specified base for the growing season, defined as 1 April-31 October in the northern hemisphere [1]. The Huglin index (HI) [35] is similar in form to the GDD method but gives extra weight to the maximum temperature, includes an adjustment factor to account for day length as a function of latitude, and as originally defined only sums for the months April-September. Huglin and Schneider [39] classified cultivars as a function of HI. The GST climate index is the mean of the observed maximum and minimum daily surface air temperature values from the first of April through the end of October (for the Northern Hemisphere). Jones [34] demonstrated broad correlation between the GST and cultivar ripening potential across many wine regions.
While temperature-based heat unit accumulation indices are most used to address viticultural suitability [3], Tonietto and Carbonneau [38] introduced the Géoviticulture multicriteria climate classification system (GMCC) to describe and compare the climates of vineyards worldwide. The GMCC combines three complementary viticulture indices that not only capture climatology but also relate well to qualitative potential and characteristics relevant to wine production. The first index, HI, accounts for heat unit accumulation. The second index is the cool night index (CI) which is defined as the mean of the minimum night temperatures for the month before harvest. The third GMCC index is the dryness index (DI). It is the aggregate vertical accounting of potential soil water availability for April to September whose computation considers precipitation, soil evaporation, and vine transpiration while excluding surface runoff and drainage. It functions as an indicator of the level of presence-absence of dryness for a given area. The GMCC is applicable at multiple spatial scales and includes a comprehensive companion database to support worldwide climatic group comparisons, and one potential use of it is the study of the evolution of viticultural climate as a function of climate change [38].
We consider the complete archive of thirty-two CMIP5 daily Localized Constructed Analogs (LOCA) downscaled historic datasets [40,41] of minimum surface air temperature, maximum surface air temperature, and precipitation (https://gdo-dcp.ucllnl.org/ downscaled_cmip_projections/ accessed on 2 December 2020) and their observational data that were used for downscaling and bias correction [42]. The method comparison provides an opportunity to learn about their relative performance. It also reveals optimal ensemble subset sizes for the computation of the GST, GDD, HI, CI, DI, and GMCC indices in the WV AVA. Moreover, by considering multiple indices, we can explore whether LOCA CMIP5 ensemble compositions and weights vary by viticulture climate classification index.
The Willamette Valley is of interest not only given its recent and consistent observed warm to hot vintages [43], but also due to the very limited previous study of climate change impacts to viticulture in the AVA [44]. To our knowledge, this is the first study to comprehensively evaluate a complete downscaled CMIP5 model archive to determine optimal ensemble compositions and weights for the computation of future projections of viticulture climate classification for a wine region. Acknowledging that this type of study is technically challenging and infrequent [24], our results are provided in a manner such that they can be directly used by others to reliably analyze the impacts of daily LOCA CMIP5 future climate projections to viticulture for the WV AVA.

Study Region
The study region consists of 553 model-observation comparison points which cover the 13,913 square kilometer Willamette Valley AVA which lies in the Willamette River Basin in the northwestern part of the State of Oregon (OR) in the U.S. (Figure 1). The WV AVA includes the entire main stem of the Willamette River, including parts of its Middle Fork and Coastal Fork which join near Springfield, OR just south of Eugene, OR. The Willamette River generally flows in a northly pattern to its confluence with the Columbia River in Portland, OR. The WV AVA is bounded to its west by the Oregon Coastal Range and to its east by the Cascade Range.
The Oregon Wine Board [54] reported for the 2019 vintage 25,197 harvested acres of the WV AVA's 25,452 planted acres. Total production for the WV AVA was estimated to be 73,560 tons, which accounted for approximately 69.7% of the total wine grape production for the State of Oregon. For the WV AVA, Pinot Noir, Pinot Gris, Chardonnay, Pinot Blanc, and Riesling accounted for 69.3, 16.9, 7.4, 1.2, and 1.2 percent of its total production, respectively.  [40,41] with their equivalently processed observed data counterparts [18,42].

Time Series Data
The complete archive of Coupled Model Intercomparison Project phase 5 (CMIP5) daily Localized Constructed Analogs (LOCA) multi-model historic  datasets [40,41] of minimum surface air temperature, maximum surface air temperature, and precipitation was collected from the Downscaled CMIP5 Climate and Hydrology Projections archive at https://gdo-dcp.ucllnl.org/downscaled_cmip_projections/ (accessed on 2 December 2020). The 32 models and modeling groups that provided the data used in this study are listed in Table S1. The observation-based 1/16° longitude-latitude gridded Livneh data product [41] was used to train and downscale the daily LOCA CMIP5 datasets which span 25.125° N to 52.875° N and −124.625° E to −67.000° E [40,55]. In this study, version 1.2 of the Livneh dataset at the 553 grid locations indicated in Figure 1 functioned as the observation dataset. It was processed and compared with its corresponding LOCA CMIP5 processed model historic data counterparts [18] to compute prediction specific (i.e., growing season average temperature (GST), growing degree days (GDD), Huglin index (HI), cool night index (CI), dryness index (DI), and Géoviticulture multicriteria climate classification (GMCC)) model-to-measurement evaluations and resulting ensemble composition and weights.

Viticulture Climate Indices and Processed Datasets for Prediction Specific Modeling Analyses
Calculation of any one of the six viticulture climate classification indices involves one or more variables wherein each individual variable is a unique processed base model/observation data product (i.e., precipitation, maximum surface air temperature, minimum surface air temperature). The model-observation datasets developed to support the prediction specific modeling analyses were not the specific viticultural climate indices but rather processed datasets that support their calculation as indicated in Equations (1)-(4), (5) and (7). The growing season average temperature, GST [34], growing degree days,  [40,41] with their equivalently processed observed data counterparts [18,42].

Time Series Data
The complete archive of Coupled Model Intercomparison Project phase 5 (CMIP5) daily Localized Constructed Analogs (LOCA) multi-model historic  datasets [40,41] of minimum surface air temperature, maximum surface air temperature, and precipitation was collected from the Downscaled CMIP5 Climate and Hydrology Projections archive at https://gdo-dcp.ucllnl.org/downscaled_cmip_projections/ (accessed on 2 December 2020). The 32 models and modeling groups that provided the data used in this study are listed in Table S1. The observation-based 1/16 • longitude-latitude gridded Livneh data product [41] was used to train and downscale the daily LOCA CMIP5 datasets which span 25.125 • N to 52.875 • N and −124.625 • E to −67.000 • E [40,55]. In this study, version 1.2 of the Livneh dataset at the 553 grid locations indicated in Figure 1 functioned as the observation dataset. It was processed and compared with its corresponding LOCA CMIP5 processed model historic data counterparts [18] to compute prediction specific (i.e., growing season average temperature (GST), growing degree days (GDD), Huglin index (HI), cool night index (CI), dryness index (DI), and Géoviticulture multicriteria climate classification (GMCC)) model-to-measurement evaluations and resulting ensemble composition and weights.

Viticulture Climate Indices and Processed Datasets for Prediction Specific Modeling Analyses
Calculation of any one of the six viticulture climate classification indices involves one or more variables wherein each individual variable is a unique processed base model/ observation data product (i.e., precipitation, maximum surface air temperature, minimum surface air temperature). The model-observation datasets developed to support the prediction specific modeling analyses were not the specific viticultural climate indices but rather processed datasets that support their calculation as indicated in Equations (1)-(4), (5) and (7). The growing season average temperature, GST [34], growing degree days, GDD [1,3], Huglin index, HI [35], and cool night index, CI [36], are defined in Equations (1)-(4), respectively.
Apr 1 T max , ∑ Oct 31 where n = 214, T max and T min , T max , T min , T mean , and K denote the number of days for the northern hemisphere growing season, the maximum and minimum daily surface air temperature data values in • C, the monthly means of the T max , the monthly means of the T min , the monthly means of the mean of the T max and T min , and an adjustment factor to account for day length as a function of latitude, respectively. In Equation (1) we see that the GST is a function of the sums of the T max and T min for the duration of the growing season.
Equations (2) and (3) emphasize that the GDD and HI are each a function of the T max and T min for their respective periods. The adjustment factor, K, for the calculation of the HI was not included in the right-hand side of Equation (3) since it does not vary by climate model. The CI is a function of the T min for the month of September (see Equation (4)). The dryness index [37] is a function of the April to September monthly means of the daily precipitation and maximum and minimum surface air temperature data, It is based on the following equation calculated monthly from April to September, where W, W o , P, T v , and E s represent the soil water reserve in millimeters (mm) at the end of a given month, the initial soil water reserve in mm, monthly precipitation in mm, monthly potential transpiration in mm, and monthly direct evaporation from the soil in mm [38]. The value for DI each year is the value of W at the end of September (See [38] for additional details about calculation of the DI). The Géoviticulture multicriteria climate classification (GMCC) is defined as the cross product of the HI, CI, and DI [38]; hence, Relative individual LOCA CMIP5 model performance may vary across each of a given prediction's respective functional variables [15,17,18,23]. For example, for computation of the GST, an individual LOCA CMIP5 model might possess better skill relative to the remaining members of the archive in summing daily maximum surface air temperatures from April through October but poor skill relative to the remaining members of the multi-model ensemble with summing the daily minimum surface air temperatures from April through October. Recently introduced methods for evaluating climate model skill require the consideration of each individual variable composing the prediction [15,17,18,23]. Table 2 summarizes the model/observation datasets that were developed to evaluate the performance of LOCA CMIP5 individual and multi-model ensembles for the calculation of the six viticulture climate classification indices throughout the WV AVA. Each processed model-observation dataset specified in Table 2 used data from the 553 model-observation grid locations indicated in Figure 1 for the entire 56 years (1950-2005) that define the LOCA CMIP5 historic period.
T max and T min , T max , T min , and P denote the maximum and minimum daily surface air temperature, monthly means of the T max , monthly means of the T min , and the mean monthly precipitation. a April to October. b April to September.

Modeling Averaging Methods
Model averaging involves combining models to improve prediction accuracy and reduce forecast uncertainty, preferably while simultaneously accounting for model skill and interdependence [15,16,25,56]. Eyring et al. [13] expressed the need for further study and better understanding of advanced model averaging methods for climate projections.

Model
The specified model is a general linear model without intercept, where M i and w i represent the i-th LOCA CMIP5 model and its assigned non-negative weight, respectively. The modeling objective is to minimize model-to-measurement misfit for each WV AVA viticulture climate classification index directed dataset defined in Table 2 by using one of four different methodologies as they are described in Sections 2.4.2-2.4.5.

Skill Weighting
Sanderson et al. [23] and Sanderson et al. [17] introduced a weight assignment strategy for developing climate multi-model ensembles that addressed the assumption regarding CMIP5 model democracy; namely, that individual members of the CMIP5 multi-model archive can vary in their skill by location and prediction. Computation of the root mean squared error (RMSE) between the observations and each model, by variable, enables the ranking of the individual ensemble members. The RMSE is given by where S, O, and N denote modeled data, their observed counterparts, and the number of model-observation comparisons. The skill weight assigned to the i-th LOCA CMIP5 model [17,23] is defined as where A, δ i(obs) , and D q denote a normalizing constant such that the entire set of individual model skill weights sums to one, the median skill for each model [15], and a parameter called the radius of model equality [23] which determines the degree to which models and E and E denote, for a given prediction specific functional variable, the model's RMSE and the mean RMSE across all ensemble members, respectively. A small value assigned to the radius of model equality will result in a weights assignment that favors an ensemble subset whose members possess the best median skill values; whereas, as D q → ∞ , computed weights will approach the multi-model mean which does not account for model skill.

Skill and Model Interdependence Weighting
Sanderson et al. [23] and Sanderson et al. [17] augmented their skill weight assignment strategy to also account for model similarity in attempts to develop ensemble compositions and weights that further reduce CMIP5 archive redundancy by virtue of shared parameterization schemes and training methods. As with their skill weight assignment strategy, RMSE values calculated between each model and its respective variable specific processed observations enables the calculation of independence weights for each LOCA CMIP5 ensemble member. A multi-variate inter-model distance matrix, δ, is computed as the equally weighted linear combination of normalized variable specific inter-model distance matrices that are each composed of their RMSE distances between each pair of models. For any two models i and j, a similarity score matrix is defined as where D u is a parameter defined as the radius of similarity which specifies the distance scale over which models should be considered similar and down-weighted for co-dependence. We choose a value of 0.5 for D u [15,23]. In theory, for two identical models, S ij = 1. Also, S ij → 0 as δ ij → ∞ . Using the similarity scores, the independence weight for the i-th LOCA CMIP5 model is defined as where n is the total number of models [15,17,23]. A weight to assign to each individual model that simultaneously accounts for its skill and interdependence is the product of its skill weight and independence weight normalized by A such that the sum of the w(i) across all models equals one [15,17,23].

Model Median Skill Directed Simple Model Averaging
For each prediction, the median skills computed for each model, δ i(obs) , were ranked and ensembles of size one to size equal to thirty-two were constructed using simple model averaging. This is like the approach presented by Yang et al. [25] who used skill scores obtained from application of the Taylor diagram [28] to build multi-model ensembles using simple model averaging.

Elastic-Net Regularization
In their climate model subset study, Herger et al. [24] mentioned lasso regression from the field of machine learning as an alternative method to their approach. Zou and Hastie [33] introduced the elastic-net penalty as a compromise between ridge [57,58] and Climate 2021, 9, 140 9 of 23 lasso [59] regression. Given observations y i , i = 1, . . . , n, an n × m matrix of covariates X, and an assumed linear model the elastic-net minimizes where λ is a non-negative regularization parameter that is tuned to weight the overall strength of the penalty, α ∈ [0, 1] is specified to control the penalty term to vary from ridge regression at α = 0 to lasso regression at α = 1, and w i is the weight assigned to the ith observation [29]. Ridge regression yields smooth solutions that include all the predictors, whereas application of lasso regression results in automatic variable selection (i.e., sparse, much more easily interpretable solutions) [32]. The elastic-net combines the two methods.
As α increases from 0 to 1 for a fixed λ, the number of zero-valued η j increases from 0 to the sparsity of the lasso [29]. In this study, variable selection was desired, therefore α was specified close to 1 for numerical stability [29]. With α specified close to 1, the elastic-net performs much like lasso regression while retaining ridge regression's capacity to collectively shrink the coefficients for any highly correlated covariables [29,60]. Cross validation (CV) was employed to ensure that the minimizing value for λ was properly located for each elastic-net application. Google Scholar reports 14,411 citations for [33]. Implementations of the elastic net exist in R, Matlab, Apache Spark, and SAS.

Results and Discussion
For each viticulture climate classification directed prediction (i.e., GST, GDD, HI, CI, DI, GMCC), individual LOCA CMIP5 model RMSEs computed by variable are listed in Table 3. Together with the specification of D q and D u , these RMSE values can be used to independently calculate prediction specific ensemble skill weights and independence weights.  Table 2, for calculation of the growing season average temperature (GST), growing degree days (GDD), Huglin index (HI), cool night index (CI), dryness index (DI), and the Géoviticulture multicriteria climate classification (GMCC) throughout the Willamette Valley American Viticultural Area during the LOCA CMIP5 defined historic period    Figure 2 presents viticulture climate classification directed LOCA CMIP5 model relative RMSE values, E * , and median skill values, δ i(obs) , also normalized such that their means are one. Each row of Figure 2 has its own color scale wherein its minimum, mean, and maximum are blue, white, and red, respectively. Hence, for any given row, models color coded with shades of blue/red possess more/less skill than the ensemble mean. The first two/third and fourth/fifth and sixth/fifth through seventh/fifth through eighth rows of Figure 2 are the variables specific for the calculation of the GST/GDD/HI/DI/GMCC, respectively. The ninth through fourteenth rows of Figure 2 are the computed median skill values for calculation of the GST, GDD, HI, CI, DI, and GMCC, respectively.
LOCA CMIP5 model median skill values vary by model and their individual rankings differ for each of the six viticulture climate classification indices that were considered for the Willamette Valley AVA. The results not only emphasize that it is necessary to account for model skill, but also, considering Equation (10), that skill weight assignments are prediction specific [17,23]. Model median skill values, δ i(obs) , are a function of their related relative RMSE values, E * . The results presented in Figure 2 also emphasize that individual model skill varies for each of the functional variables relevant for the calculation of the GST, GDD, HI, CI, DI, and GMCC, respectively. For example, for calculation of the GST, its related relative RMSE model rankings emphasize that individual models substantively differ in their relative capacities to sum the daily (1) maximum and (2) minimum surface air temperatures from the beginning of April to the end of October. Climate 2021, 9, x FOR PEER REVIEW 11 of 23 The color scale is shown for each row, and blues/reds represent models with more/less skill than the ensemble mean (white). Figure 3 is the multi-variate inter-model distance matrix that was developed using the RMSE values listed in Table 3 that are relevant for calculation of the GMCC viticulture climate classification index throughout the WV AVA. Together with specification of the radius of similarity, this matrix supports computation of model independence weights to account for, in addition to skill, archive model interdependence by down-weighting similar models [15,17,18,23]. Multi-variate inter-model distance matrices were also computed to calculate independence weights for the five other viticulture climate classification indices. In Figure 3, model pairs with small inter-model distances are denoted blue and considered dependent, whereas larger pairwise inter-model distances are red and signify independent models.
For all six viticulture climate classification indices considered in this study, calculated independence weights did not substantively alter results that were obtained which solely considered model skill weighting (see Table 4). This observation agrees with results presented by Wootten et al. [18]. Figure 4 is a plot of normalized independence weights that were calculated for each of the six viticulture climate classification indices. The normalized independence weights shown in Figure 4 emphasize that Equation (13) does not sufficiently down-weight model redundancy and only provides a slight nudge to weight the independent models more heavily. For this study, the slightly more heavily weighted independent models in general were also the models with poor relative skill. The skill as well as skill model interdependence weighting schemes generally produce weight assignments that cluster together [18].  Despite the previously mentioned observed differences with LOCA CMIP5 individual model skill for the calculation of six specific viticulture climate classification indices throughout the WV AVA, some observed similarities also exist. A strong consistency is observed for the rankings of model median skill values, and their related relative RMSE values, E * , for the GDD and HI, which is understandable given their similar functional variable definitions (see Equations (2) and (3), and Table 2). In addition, across all six viticulture climate classification indices, there exists a group of nine LOCA CMIP5 models (bcc-csm1-1-m, CanESM2, CCSM4, CMCC-CM, CMCC-CMS, CNRM-CM5, CSIRO-Mk3-6-0, EC-EARTH, and MPI-ESM-MR) the median skill of which, in each case, is consistently greater than its ensemble mean. Figure 3 is the multi-variate inter-model distance matrix that was developed using the RMSE values listed in Table 3 that are relevant for calculation of the GMCC viticulture climate classification index throughout the WV AVA. Together with specification of the radius of similarity, this matrix supports computation of model independence weights to account for, in addition to skill, archive model interdependence by down-weighting similar models [15,17,18,23]. Multi-variate inter-model distance matrices were also computed to calculate independence weights for the five other viticulture climate classification indices. In Figure 3, model pairs with small inter-model distances are denoted blue and considered dependent, whereas larger pairwise inter-model distances are red and signify independent models.  [40,41] for its defined historic period . Red matrix entries depict pairwise independent models, whereas entries colored blue denote pairwise models whose datasets are closer in agreement and demonstrate more co-dependence.  [40,41] for its defined historic period . Red matrix entries depict pairwise independent models, whereas entries colored blue denote pairwise models whose datasets are closer in agreement and demonstrate more co-dependence.
For all six viticulture climate classification indices considered in this study, calculated independence weights did not substantively alter results that were obtained which solely considered model skill weighting (see Table 4). This observation agrees with results presented by Wootten et al. [18]. Figure 4 is a plot of normalized independence weights that were calculated for each of the six viticulture climate classification indices. The normalized independence weights shown in Figure 4 emphasize that Equation (13) does not sufficiently down-weight model redundancy and only provides a slight nudge to weight the independent models more heavily. For this study, the slightly more heavily weighted independent models in general were also the models with poor relative skill. The skill as well as skill model interdependence weighting schemes generally produce weight assignments that cluster together [18]. Table 4. For the Willamette Valley American Viticultural Area (AVA) and growing season average temperature (GST), growing degree days (GDD), Huglin index (HI), cool night index (CI), dryness index (DI), and the Géoviticulture multicriteria climate classification (GMCC) viticulture climate classification indices, computed root mean squared error (RMSE) values obtained by comparing climate index specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [24,41] using the skill weighting, and skill and model interdependence weighting methods [17,23]. N denotes the number of LOCA CMIP5 models defining the ensemble.  Table 3 and graphically represented in Figure 2, were ranked to develop ensemble compositions of size one to thirty-two. For each prediction of interest (i.e., GST, GDD, HI, CI, DI, and GMCC), the RMSE was subsequently computed for each of these thirty-two model compositions using simple model averaging (see Figure 5 for GST and Figures S1-S5 for GDD, HI, CI, DI, and GMCC, respectively). As previously mentioned, this simple method is like the approach presented by Yang et al. [25]; however, instead of the ranked skill scores from the Taylor diagram [28] we used the ranked model median skills. Climate 2021, 9, x FOR PEER REVIEW 13 of 23   [40,41] with their equivalently processed observed data counterparts [24,41] using the skill weighting, and skill and model interdependence weighting methods [17,23]. N denotes the number of LOCA CMIP5 models defining the ensemble.

GST
GST GDD HI CI DI GMCC  Figure 5 includes a plot of the RMSEs computed using simple model averaging (SMA) for each of the thirty-two ensemble compositions formed from the ranked median skill values that are of relevance for the calculation of the growing season average temperature, GST, throughout the WV AVA. This median skill directed SMA plot demonstrates a quick drop from its single model RMSE to a minimum, associated with an ensemble subset of size 10, and subsequently rises to the ensemble mean composed of all thirty-two models. This pattern is generally also observed in Figures S1-S5; however, the ensemble sizes associated with the minimum RMSE for prediction of the GDD, HI, CI, DI, and GMCC are 11, 18, 12, 14, and 21, respectively. Figure 5 and Figure S1-S5 also include results from application of the skill weighting method [17,23], with each figure including results associated with different parameterizations (i.e., values for the radius of model equality) that effectively result in ensemble compositions of varying sizes when zero is defined for a computed weight with a value less than 1 × 10 −4 to estimate ensemble size for the given D q value. In each figure, as the radius of model equality is increased from its initially assigned value which yields the most parsimonious ensemble composition considered for the skill weighting method, RMSE values for the skill weighting ensembles drop to a minimum and subsequently rise toward the no skill ensemble mean. For the skill weighting method, its minimum is always composed of all 32 models (see Figures 5 and S1-S5). model was overfit [24,65]. Further related study is encouraged to better understand wine region specific model archive weighting strategies that not only select optimal ensemble subsets for viticulture climate impact assessments but that also constrain projection uncertainty. Although application of the elastic net identifies the best regularized solution and yields robust estimates of model uncertainty, one potentially promising direction to explore, for example, is to combine and compare the elastic net and Reliability Ensemble Averaging (REA) methods [33,65]. Combining results from application of the elastic net with the REA method would result in an REA model performance criterion that simultaneously accounts for model skill and interdependence and an REA model convergence criterion that is less likely to be confounded due to model replication [23].  [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble.
Herger et al. [24] emphasized that the skill and model interdependence weighting methods are directed at assigning continuous weights as opposed to the identification of optimal ensemble subsets. Results presented herein and also by Wootten et al. [18] reinforce this observation (see Figure 5 and Figure S1-S5 and Table 4). However, it is notable to observe that the model median skill directed simple model averaging method generally yields lower RMSE values than those computed from the skill weighting method for ensembles of comparable sizes (see Figure 5 and Figure S1-S5). The method also consistently finds an optimal ensemble subset with an RMSE value less than or equal in value to the skill weighting method minimizing RMSE that includes the complete archive (see Figure 5 and Figure S1-S5).
While relatively straightforward methods such as skill weighting, skill weighting including interdependence, and model median skill SMA can be proposed and applied to assign weights for model averaging, one can instead apply optimization and inference methods that involve a systematic and iterative search of parameter space, including implementation of formal stopping/convergence criteria, to find prediction specific optimal ensemble compositions and weights to account for skill and interdependence. For example, Massoud et al. [15], Massoud et al. [16], and Wootten et al. [18] applied BMA [61], which involves Markov Chain Monte Carlo sampling, and Herger et al. [24] employed mixed integer quadratic programming for model averaging. We apply a calibration method that to our knowledge has not previously been considered for combining multi-model climate projections; viz., the elastic-net penalty [33]. It can be readily configured to either find smooth solutions or parsimonious models while preserving fit with the observational dataset. Figure 5 and Figure S1-S5 further include results from applications of elastic-net regularization configured for automatic variable selection. Each elastic-net application was performed using the R software package 'glmnet' [29], employing k-fold CV with k = 32 (i.e., leave-one-out CV), α = 0.95, and w i = 1 for each observation. The best regularizing model is defined to be at the largest λ value within one standard error of the minimum [29,60,62]. Each figure includes the minimum, identified using leave-one-out cross validation, and its associated best regularized solution. The best regularized solution in each case differed only slightly from the minimizing solution. Elastic-net regularization consistently found a minimizing optimal ensemble subset that outperformed the other three model averaging methods considered in this study (see Figures 5 and S1-S5). From a practical perspective, for a given prediction, application of elastic-net regularization is less resource and time intensive to apply than the skill weighting, skill and model interdependence weighting, and the model median skill SMA methods.
The calibrated models listed in Table 5 are prediction specific (i.e., GST, GDD, HI, CI, DI, GMCC) weight assignments computed from cross validation directed applications of elastic-net regularization configured for automatic variable selection. Equation (8) can be applied using the weights listed in Table 5 to optimally compute, within the framework presented herein, the GST, GDD, HI, CI, DI, or GMCC for the WV AVA while using LOCA CMIP5 historical and future climate projections. For example, the T max and T min associated with the ensemble model M defined using the weights listed in the second column of Table 5 can be used to optimally compute the GST throughout the WV AVA using Equation (1). The model weights listed in Table 5 are based on comparisons with the Livneh dataset [42] for the LOCA CMIP5 defined historical period [40,41]. While not shown, closely matching summary statistics were also obtained for each respective viticulture climate index, for calibration and verification, from cross validation directed elastic-net regularization fits which held back twenty-five percent of the LOCA CMIP5 historical data.
The models listed in Table 5 for optimized computation of the GST, GDD, HI, CI, DI, and GMCC throughout the WV AVA further reinforce statements regarding CMIP5 archive democracy and redundancy [15][16][17], and the need to develop ensemble compositions and weights assignments for a specific modeling objective and location [17]. For each viticulture climate classification index, unique ensemble subsets with variable sizes and weights were identified which outperform the complete archive ensemble mean in each case. Table 6 summarizes computed correlations among the models listed in Table 5. The optimal ensembles identified for the GST and CI viticulture climate classification indices both demonstrate poor correlation with the remaining four indices. Strong correlations exist among the optimal ensembles identified for computation of the GDD, HI, DI, and GMCC indices for the WV AVA when w i = 1 for each observation. The general functional definitions for the GST and the CI are each unique relative to the remaining four indices which share common variables (see Equations (1)-(4), (6) and (7)). However, alternate values could be assigned to the w i , particularly to fit the DI and GMCC which include three and four distinct meteorological data requirements, respectively, to balance the contributions among the functional variables that define the indices [27,63]. For example, the final two columns of Table 5 include ensembles obtained for the DI and GMCC when the w i are assigned such that model-to-measurement misfit for each individual functional variable is equal in value at the start of the optimization [27,63]. With the w i assigned in this manner to fit the DI and GMCC, computed correlations among the ensembles decrease except for the CI and GMCC as expected (see Table 6).
Of the thirty-two models which constitute the entire LOCA CMIP5 archive, 10, 21, 21, 13, 22, and 21 of its members compose unique ensembles for optimal computation of the GST, GDD, HI, CI, DI, and GMCC throughout the WV AVA (see Table 5) when w i = 1 for each observation. Across these six viticulture climate classification index ensembles, seven LOCA CMIP5 models were consistently excluded from their compositions with a zero valued weights assignment (CESM1-BGC, CESM1-CAM5, GFDL-CM3, GFDL-ESM2G, GISS-E2-R, MIROC-ESM-CHEM, MPI-ESM-LR). If zero is defined for weights with a value less than 0.02, then the number of excluded models rises to ten (and includes EC-EARTH, IPSL-CM5A-LR, MIROC5). While unique to the WV AVA, the LOCA CMIP5 archive, and the six indices evaluated herein, these optimal ensemble composition sizes and weights differ relative to previously reported ensemble sizes and weights assignments which used downscaled climate models to evaluate the impacts of climate change to viticulture [45][46][47][48][49][50][51][52][53]. Table 5. Optimal weights obtained using elastic-net regularization [33] to assign when applying Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) historic and future climate projections [40,41] to compute the growing season average temperature (GST), growing degree days (GDD), Huglin index (HI), cool night index (CI), dryness index (DI), and the Géoviticulture multicriteria climate classification (GMCC) indices throughout the Willamette Valley American Viticultural Area (AVA). (* indicates that each functional observation group defining the index was balanced to be seen of equal importance at the start of the optimization).  The data and methods employed in our study are applicable for other wine regions. The approach outlined in this study for developing model-observation comparisons is of value from a practical perspective in that it is potentially far easier to compute and apply a viticulture climate classification directed optimal ensemble based on its underlying data types than the index itself, particularly for compute intensive indices such as the dryness index [37]. A potentially promising related research direction would be to explore updating the cultivar specific phenological understanding associated with one or more of the easy to compute and widely used temperature only based indices considered in this study by leveraging a recent contribution about the modeling of grapevine phenology [64]. Further application and evaluation of the two methods that were applied for the first time in this study to generalize climate model choice for the WV AVA using the LOCA CMIP5 archive (i.e., the simple model averaging method of [25] that uses the ranked model median skills and elastic-net regularization [33]) is encouraged and would not necessarily be limited to our study area or predictions of viticulture climate classification.
The optimal ensemble subsets computed in this study for each of the six indices are unique prediction and location specific model combinations that adhere to the principle of parsimony by simultaneously accounting for ensemble model skill and similarity for the LOCA CMIP5 historical period. Our approach to ensemble model selection is like other similar studies in that it depends on historical calibration training data [18,24,25]. It is preferable to use an optimal ensemble subset rather than its comparable no skill ensemble mean from the perspective that the parsimonious model combination does not contain duplicative information that can bias results and correlation statistics [23] and also in that the ensemble mean is prone to poor model accuracy and poor model uncertainty quantification [24]. The skill of each prediction and location specific parsimonious optimal ensemble subset for predicting future climates is dependent upon whether the calibrated model was overfit [24,65]. Further related study is encouraged to better understand wine region specific model archive weighting strategies that not only select optimal ensemble subsets for viticulture climate impact assessments but that also constrain projection uncertainty. Although application of the elastic net identifies the best regularized solution and yields robust estimates of model uncertainty, one potentially promising direction to explore, for example, is to combine and compare the elastic net and Reliability Ensemble Averaging (REA) methods [33,65]. Combining results from application of the elastic net with the REA method would result in an REA model performance criterion that simultaneously accounts for model skill and interdependence and an REA model convergence criterion that is less likely to be confounded due to model replication [23].

Conclusions
This study was motivated by recent contributions which underscored the need for prediction and location specific Coupled Model Intercomparison Project Phase 5 ensemble compositions and weights that account for model skill and similarity and that address ensemble subset selection. It was also motivated by the expressed need to further study methods that generalize climate model choice. This study considered the complete archive of the thirty-two Coupled Model Intercomparison Project Phase 5 Localized Constructed Analogs downscaled daily historic datasets. It explored, for the first time, ensemble subset selection for the optimal computation of the growing season average temperature, growing degree days, Huglin's index, cool night index, dryness index, and the multicriteria climate classification indices throughout the Willamette Valley American Viticultural Area. Four model averaging methods were evaluated including two recently introduced from the climate science community, viz., skill weighting, and skill and model interdependence weighting. Two additional methods, neither previously considered, that were also applied to generalize climate model selection included model median skill directed simple model averaging, a modification of a recently introduced approach to combine climate models, and elastic-net regularization, a formal optimization technique that is well designed for application to automatic variable selection problems. The scope of the method comparison performed in this study is limited to these four methods, the 553 model-observation data locations which cover the WV AVA as shown in Figure 1, and the model-observation datasets listed in Table 2. Further research of methods directed at ensemble subset selection is encouraged.
For a given viticulture climate classification index, application of the skill as well as skill model interdependence weighting methods involves evaluating the performance of each indices' individual functional variables. Application of the two climate model weighting strategies to compute the GST, GDD, HI, CI, DI, and GMCC throughout the WV AVA revealed that LOCA CMIP5 individual model skill varies by each indices' defined processed meteorological variables. Moreover, also considering independence in addition to skill did not substantively alter ensemble compositions or weights relative to solely considering model skill. While results obtained from their application did vary as a function of their parameterization, optimal ensembles from these two methods consistently, across all six indices, involved the complete archive and were less optimal than those obtained from the other two model averaging methods. Interestingly, model median skill directed SMA, which can be directly applied following application of the skill weighting strategy, was able to consistently compute LOCA CMIP5 ensemble subsets for each of the six viticulture climate classification indices. The RMSE values listed in Table 3 permit researchers to independently apply the skill weighting, skill weighting including interdependence, and model median skill directed SMA strategies to develop ensembles to compute the GST, GDD, HI, CI, DI, and GMCC indices throughout the WV AVA using historic and future LOCA CMIP5 climate projections.
Of the four model averaging methods considered in this study, results obtained from application of elastic-net regularization consistently outperformed the other three methods with finding optimal ensemble subsets from among the complete LOCA CMIP5 model archive. Its application, much like the other three methods, is straightforward. The LOCA CMIP5 ensemble compositions and weights did vary by viticulture climate classification index. While specific to the archive and location, it was noteworthy to observe that the optimal ensemble composition sizes and weights across the six viticulture climate classification indices differed relative to ensemble sizes and weights assignments reported in published studies that focused on climate change impacts to viticulture. The weights listed in Table 5, derived from applications of elastic-net regularization, permit researchers to independently develop ensembles to compute the GST, GDD, HI, CI, DI, and GMCC throughout the WV AVA using historic and future LOCA CMIP5 climate projections.
Future projections of the bioclimatic indices considered in this study, computed using optimal compositions and weights, enable robust, adaptive evaluations of existing and new sites for viticulture throughout the Willamette Valley American Viticultural Area. The climate model ensemble weights assignment strategies presented herein are transferable to other wine regions. Further related study is encouraged, not only for ensemble subset selection, but also to constrain forecast uncertainty. For example, while compute intensive for some indices such as the DI and the GMCC, further related study that seeks to find optimal ensemble subsets using the computed indices instead is also encouraged. This would enable comparison with the results contained herein as well as provide opportunity to find optimal ensembles across predefined sets of viticulture climate classification indices for a given wine region.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/cli9090140/s1, Figure S1: For the Willamette Valley American Viticultural Area (AVA) and growing degree days (GDD), computed root mean squared error (RMSE) values obtained by comparing GDD specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble, Figure S2: For the Willamette Valley American Viticultural Area (AVA) and Huglin index (HI), com-puted root mean squared error (RMSE) values obtained by comparing HI specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble, Figure S3: For the Willamette Valley American Viticultural Area (AVA) and cool night index (CI), computed root mean squared error (RMSE) values obtained by comparing CI specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble, Figure S4: For the Willamette Valley American Viticultural Area (AVA) and dryness index (DI), computed root mean squared error (RMSE) values obtained by comparing DI specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble, Figure S5: For the Willamette Valley American Viticultural Area (AVA) and Géoviticulture multicriteria climate classification (GMCC), computed root mean squared error (RMSE) values obtained by comparing GMCC specific processed daily Localized Constructed Analogs (LOCA) Coupled Model Intercomparison Project phase 5 (CMIP5) model historic datasets [40,41] with their equivalently processed observed data counterparts [18,42] using skill weighting, with various values for its radius of model equality, D q [17,23], model median skill directed simple model averaging (SMA), and elastic-net regularization [33]. N denotes the number of LOCA CMIP5 models defining the ensemble, Table S1: The 32 models and modeling groups that provided Coupled Model Intercomparison Project phase 5 (CMIP5) data used in this study.